Data mapping information for test challenge set

It seems the ‘sid’ field in the test-challenge set has no meaning: it cannot be mapped through the data mapping in ocp/DATASET.md (Open-Catalyst-Project/ocp on GitHub) to get information such as ads_symbols.

Can you provide a similar mapping for the test-challenge set?
Some information is hard to infer from atomic_numbers alone. For example, CHO and COH are two different adsorbates, but they contain the same atoms.
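To illustrate the ambiguity, here is a minimal Python sketch. It compares only the adsorbate atoms' atomic numbers, which in OC20 are assumed here to be the atoms tagged 2. The arrays are hypothetical: any ordering of the same elements yields the same element counts, so composition alone cannot separate *CHO from *COH.

```python
from collections import Counter

# Atomic numbers: H=1, C=6, O=8.
# Hypothetical atomic_numbers for the adsorbate atoms only, written in the
# order suggested by each label (C-H-O vs C-O-H).
CHO = [6, 1, 8]
COH = [6, 8, 1]

def composition(atomic_numbers):
    """Count atoms per element, ignoring order and geometry."""
    return Counter(atomic_numbers)

same = composition(CHO) == composition(COH)
print(same)  # True: composition alone cannot tell *CHO from *COH
```

Telling the two apart requires connectivity or geometry, for example which atom binds to the surface, or an explicit label.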

Another small issue: in the test set, the atoms for ads_symbols ‘*NHN2’ are 111177 (atomic numbers 1, 1, 1, 1, 7, 7, i.e., four H and two N). Is that expected?

We did not release a similar mapping for the test-challenge split. sids are unique identifiers we use to keep track of systems and submissions. We can consider releasing this information once the challenge is over.
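For the public splits, the lookup works roughly as follows. This is a sketch under stated assumptions: the released mapping is treated as a dict keyed by "random<sid>" with an "ads_symbols" field per entry, and the one-entry dict below is an illustrative stand-in, not the real file. A test-challenge sid simply has no entry.

```python
from typing import Optional

# Illustrative stand-in for the OC20 data-mapping file (assumption: keys of the
# form "random<sid>", each value a dict with an "ads_symbols" field).
mapping = {
    "random1000415": {"ads_symbols": "*NH2NH2"},
}

def lookup_ads_symbols(mapping: dict, sid: int) -> Optional[str]:
    """Return the adsorbate label for a sid, or None if the sid has no entry."""
    entry = mapping.get(f"random{sid}")
    return entry["ads_symbols"] if entry is not None else None

print(lookup_ads_symbols(mapping, 1000415))  # *NH2NH2
print(lookup_ads_symbols(mapping, 10000))    # None: no entry for this sid
```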


Regarding the ‘*NHN2’ atoms: can you point me to the exact system (sid) and task where this shows up? Also, is this in the test or the test-challenge split? I can take a look and get back to you.

You can check sid=1000415 in the test_ood_ads split and sid=1008449 in the test_ood_both split,
and sid=10000 in the test-challenge split.


Sadly, our model depends on adsorbate types.
Also see this thread: Split of Test Challenge Data, post #3 by Jingtun.

Thanks! Let me look into both of these points and get back to you.

It looks like the database the adsorbates were pulled from had this adsorbate mislabeled. These should in fact be *NH2NH2, consistent with the atomic numbers of the atoms object. I didn’t notice any other mislabels among the rest of the adsorbates. I’ll update the mapping information in ocp/DATASET.md (Open-Catalyst-Project/ocp on GitHub). Thanks for pointing this out!
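One quick way to catch this kind of mislabel is to compare the element counts implied by each label with the atoms' atomic numbers. Below is a minimal sketch. The symbol table covers only H, C, N and O (assumed sufficient for OC20 adsorbates). The parser ignores parenthesized groups, so labels containing them would need extra handling.

```python
import re
from collections import Counter

# Minimal symbol -> atomic number table (assumption: H, C, N, O cover the adsorbates).
Z = {"H": 1, "C": 6, "N": 7, "O": 8}

def label_composition(label: str) -> Counter:
    """Element counts (keyed by atomic number) implied by a label like '*NH2NH2'.

    Sketch only: strips '*' and does not handle parenthesized groups.
    """
    counts = Counter()
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", label.replace("*", "")):
        counts[Z[symbol]] += int(count) if count else 1
    return counts

def label_matches_atoms(label: str, atomic_numbers) -> bool:
    """True if the label's composition equals the multiset of atomic numbers."""
    return label_composition(label) == Counter(atomic_numbers)

atoms = [1, 1, 1, 1, 7, 7]  # the "111177" atoms reported for '*NHN2'
print(label_matches_atoms("*NHN2", atoms))    # False: *NHN2 implies N3H
print(label_matches_atoms("*NH2NH2", atoms))  # True: N2H4
```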


We have considered your request and have made the corresponding metadata available here: NeurIPS test-challenge metadata. Good luck!