NeurIPS test-challenge metadata

Some users have requested metadata (adsorbate and bulk identity) for the challenge dataset, similar to what was released for OC20, particularly to distinguish adsorbates that share the same composition (e.g. *COH vs. *CHO). This information can already be derived from each data object's atomic_numbers and positions, but this mapping makes it much easier to access. We provide it as a convenience for approaches that need it.

Download link: https://dl.fbaipublicfiles.com/opencatalystproject/data/challenge_2021_data_mapping.pkl
(MD5 checksum: c71ebe004c351118882f3c1359baf8ec)
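Before unpickling, it is worth checking the download against the MD5 checksum above. The following is a minimal sketch using only the Python standard library; the function names (download, md5sum, load_mapping) and the local filename are illustrative choices, not part of any OC20 tooling. Because pickle can execute arbitrary code on load, only unpickle files from a source you trust.

```python
import hashlib
import pickle
import urllib.request

# URL and checksum as given in this post; the local filename is an arbitrary choice.
URL = "https://dl.fbaipublicfiles.com/opencatalystproject/data/challenge_2021_data_mapping.pkl"
EXPECTED_MD5 = "c71ebe004c351118882f3c1359baf8ec"


def download(dest="challenge_2021_data_mapping.pkl"):
    """Fetch the mapping file to a local path and return that path."""
    urllib.request.urlretrieve(URL, dest)
    return dest


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so memory use stays flat."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_mapping(path, expected_md5=EXPECTED_MD5):
    """Verify the checksum first, then unpickle the sid -> metadata dict."""
    actual = md5sum(path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}: got {actual}, expected {expected_md5}")
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Typical use would be `mapping = load_mapping(download())`. Checking the digest before calling pickle.load means a truncated or corrupted download fails with a clear error instead of a confusing unpickling exception.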

The first few entries look like this:

{0: {'bulk_mpid': 'mp-676799',
  'bulk_symbols': 'Ag8GeTe6',
  'ads_symbols': '*CH2*CH2'},
 1: {'bulk_mpid': 'mp-2400',
  'bulk_symbols': 'Na4S4',
  'ads_symbols': '*COHCHOH'},
 2: {'bulk_mpid': 'mp-35835',
  'bulk_symbols': 'Ag2Au4S4',
  'ads_symbols': '*COHCH2'},

The dictionary keys are the adsorbate+catalyst system IDs (the sid found in each data object), and each value contains:

  • bulk_mpid: Materials Project ID of the bulk material corresponding to the catalyst surface
  • bulk_symbols: chemical composition of the bulk counterpart
  • ads_symbols: chemical composition of the adsorbate counterpart
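Once loaded, the mapping is a plain dict keyed by sid, so lookups and filters are ordinary dictionary operations. The helpers below are a hypothetical sketch; metadata_for, sids_with_adsorbate and composition are not part of any OC20 package. metadata_for assumes the sid stored on a data object converts cleanly with int(), which covers plain integers and single-element tensors. composition counts elements in a formula string and ignores the * binding markers; it also shows why ads_symbols is useful, since *COH and *CHO have identical element counts.

```python
import re
from collections import Counter


def metadata_for(mapping, sid):
    """Return the metadata entry for a system id such as data.sid.

    Assumption: sid is an int or something int() accepts (e.g. a 1-element tensor).
    """
    return mapping[int(sid)]


def sids_with_adsorbate(mapping, ads_symbols):
    """Sorted system ids whose adsorbate string matches exactly, e.g. '*COH'."""
    return sorted(sid for sid, meta in mapping.items() if meta["ads_symbols"] == ads_symbols)


def composition(symbols):
    """Element counts parsed from a formula string; '*' markers are skipped."""
    counts = Counter()
    # Each match is an element symbol plus an optional count ('' means 1).
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", symbols):
        counts[element] += int(n) if n else 1
    return counts
```

For example, with the entries shown above, `metadata_for(mapping, 1)["bulk_mpid"]` gives 'mp-2400', and `composition("*COH") == composition("*CHO")` is True, so composition alone cannot tell those two adsorbates apart while their ads_symbols strings can.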

If you have any questions or concerns, please let us know.

– The OC20 Team
