What exactly is out of domain in the OOD splits?

Hi OCP team!

I looked at the OC20 and OC22 papers and the OCP GitHub repository. It’s not obvious what the exact difference is between the OOD validation splits and the i.i.d. split. The papers only mention that the OOD splits contain either “unseen” adsorbates or “unseen” catalysts, and I did not find more information about this. My main question is: what exactly is out of domain in the OOD splits? Does it mean that they contain different molecules (different combinations of atoms), or do the molecules in the OOD splits have different geometries? Could you please elaborate?

Hi -

In OC20 we have two axes for OOD: adsorbate and material composition. From our list of 82 adsorbates, we held out 7 for OOD validation and 7 for OOD test. For material composition, we enumerate the unique compositions of our bulks, e.g. (Ag, Cu), (Ag, Au), (Cu), etc., and hold some out for validation and some for test. The motivation here is that in practical applications you may want to study adsorbates and/or material compositions not present in OC20. This results in our 4 splits: ID, OOD-Ads (adsorbate), OOD-Cat (material composition), and OOD-Both (adsorbate + material composition).
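To make the four-way labelling concrete, here is a minimal sketch of the logic described above. It is not the actual OCP split-generation code: the function names, the `ads_X`-style adsorbate labels, and the held-out sets are all hypothetical placeholders, and it assumes a composition is simply the unordered set of elements in the bulk.

```python
def composition(elements):
    """Order-independent key for a bulk's set of elements (assumption:
    a composition is identified only by which elements are present)."""
    return frozenset(elements)


def split_label(adsorbate, bulk_elements, held_out_adsorbates, held_out_compositions):
    """Return the split a system would fall into, given the held-out sets."""
    ood_ads = adsorbate in held_out_adsorbates
    ood_cat = composition(bulk_elements) in held_out_compositions
    if ood_ads and ood_cat:
        return "OOD-Both"
    if ood_ads:
        return "OOD-Ads"
    if ood_cat:
        return "OOD-Cat"
    return "ID"


# Placeholder held-out sets for illustration only, NOT the real OC20 lists.
held_out_adsorbates = {"ads_X"}
held_out_compositions = {composition(["Ag", "Au"])}

examples = [
    ("ads_A", ["Cu"]),        # seen adsorbate, seen composition
    ("ads_X", ["Cu"]),        # held-out adsorbate
    ("ads_A", ["Au", "Ag"]),  # held-out composition (element order does not matter)
    ("ads_X", ["Ag", "Au"]),  # both held out
]
for ads, elems in examples:
    label = split_label(ads, elems, held_out_adsorbates, held_out_compositions)
    print(ads, elems, label)
```

With an empty `held_out_adsorbates` set, the same function only ever returns "ID" or "OOD-Cat", which corresponds to holding out material compositions alone.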

For OC22 we do the same thing, but only for material composition, and hence have only two splits: ID and OOD.

See the “Train/Test/Validation Splits” section of the OC20 SI for more details on the specific adsorbates.