What exactly is out of domain in the OOD splits?

hmdmahdavi · October 12, 2022, 5:14pm

Hi OCP team!

I looked at the OC20 and OC22 papers and the OCP GitHub repository. It’s not obvious what the exact difference is between the OOD validation splits and the i.i.d. split. The papers only mention that the OOD splits contain either “unseen” adsorbates or catalysts, and I did not find more information about this. My main question is: what exactly is out of domain in the OOD splits? Does it mean that they contain different molecules (different combinations of atoms), or do the molecules in the OOD splits have different geometries? Could you please elaborate?

mshuaibi · October 12, 2022, 10:14pm

Hi -

In OC20 we have two axes for OOD: adsorbate and material composition. From our list of 82 adsorbates, we held out 7 for OOD validation and 7 for OOD test. For material composition, we enumerate the unique compositions of our bulks (e.g. (Ag, Cu), (Ag, Au), (Cu)) and hold some out for validation and test, respectively. The motivation here is that in practical applications you may want to study adsorbates not present in OC20 and/or unique material compositions. This results in our 4 splits: ID, OOD-Ads (adsorbate), OOD-Cat (material composition), and OOD-Both (adsorbate + material composition).
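
To make the two axes concrete, here is a minimal Python sketch of how a single adsorbate + catalyst system could be mapped to one of the four splits. This is only an illustration of the logic described above, not the actual OCP split code: the function name `assign_split`, the placeholder adsorbate labels, and the choice of which composition is held out are all hypothetical.

```python
from typing import FrozenSet, Set


def assign_split(
    adsorbate: str,
    composition: FrozenSet[str],
    held_out_adsorbates: Set[str],
    held_out_compositions: Set[FrozenSet[str]],
) -> str:
    """Return the split name for one system (hypothetical helper)."""
    # Axis 1: is the adsorbate one of the held-out adsorbates?
    ads_ood = adsorbate in held_out_adsorbates
    # Axis 2: is the bulk's set of elements one of the held-out compositions?
    cat_ood = composition in held_out_compositions

    if ads_ood and cat_ood:
        return "OOD-Both"
    if ads_ood:
        return "OOD-Ads"
    if cat_ood:
        return "OOD-Cat"
    return "ID"


# Placeholder data: these labels and held-out choices are made up for the
# example and do NOT reflect the real OC20 held-out lists.
held_out_adsorbates = {"ads_X"}
held_out_compositions = {frozenset({"Ag", "Au"})}

examples = [
    ("ads_A", frozenset({"Ag", "Cu"})),  # seen adsorbate, seen composition
    ("ads_X", frozenset({"Ag", "Cu"})),  # unseen adsorbate only
    ("ads_A", frozenset({"Ag", "Au"})),  # unseen composition only
    ("ads_X", frozenset({"Ag", "Au"})),  # both unseen
]

results = [
    assign_split(ads, comp, held_out_adsorbates, held_out_compositions)
    for ads, comp in examples
]
print(results)
```

Note that in this sketch the composition is compared as a set of elements, so the geometry of a structure plays no role in deciding which split it lands in. Passing an empty `held_out_adsorbates` set reduces the logic to a single composition axis with only two outcomes.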

For OC22 we do the same thing but only for material composition and hence only have two splits - ID and OOD.

See “Train/Test/Validation Splits” of the OC20 SI for more details on the specific adsorbates.
