Some have requested metadata (adsorbate and bulk identity) for the challenge dataset, similar to that released for OC20, particularly for distinguishing adsorbates (e.g. *COH vs *CHO). While this information is directly available from the dataset by looking at atomic_numbers and positions, this mapping makes it easier to obtain. We provide it as a convenience for approaches that might need this information.
Downloadable link: https://dl.fbaipublicfiles.com/opencatalystproject/data/challenge_2021_data_mapping.pkl (MD5 checksum: c71ebe004c351118882f3c1359baf8ec)
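The checksum above can be used to confirm that the download is intact. The following is a minimal sketch, assuming the file has already been saved locally; the helper names `md5_of_file` and `verify_download` and the local path are hypothetical and not part of any OC20 tooling.

```python
import hashlib

# MD5 checksum quoted in this post for challenge_2021_data_mapping.pkl
EXPECTED_MD5 = "c71ebe004c351118882f3c1359baf8ec"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

For example, `verify_download("challenge_2021_data_mapping.pkl")` should return True for an uncorrupted copy, assuming that is where the file was saved.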
An example entry is
{0: {'bulk_mpid': 'mp-676799',
     'bulk_symbols': 'Ag8GeTe6',
     'ads_symbols': '*CH2*CH2'},
 1: {'bulk_mpid': 'mp-2400',
     'bulk_symbols': 'Na4S4',
     'ads_symbols': '*COHCHOH'},
 2: {'bulk_mpid': 'mp-35835',
     'bulk_symbols': 'Ag2Au4S4',
     'ads_symbols': '*COHCH2'},
where the dictionary keys are adsorbate+catalyst system IDs (sid, found in the data object), and
- bulk_mpid: Materials Project ID of the bulk system corresponding to the catalyst surface
- bulk_symbols: Chemical composition of the bulk counterpart
- ads_symbols: Chemical composition of the adsorbate counterpart
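As an illustration of how the mapping might be used to tell adsorbates apart, here is a minimal sketch. It assumes the pickle deserializes to a plain dictionary shaped like the example above, with integer keys as shown; if the sid in your data objects has a different type, the keys may need converting. The helper names `load_mapping` and `sids_by_adsorbate` are hypothetical.

```python
import pickle
from collections import defaultdict


def load_mapping(path):
    """Load the sid -> metadata dictionary from the pickle file.

    Only unpickle files from a source you trust.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def sids_by_adsorbate(mapping):
    """Group system IDs by their 'ads_symbols' string."""
    groups = defaultdict(list)
    for sid, info in mapping.items():
        groups[info["ads_symbols"]].append(sid)
    return dict(groups)


# The three entries shown above, used here in place of the full file.
example = {
    0: {"bulk_mpid": "mp-676799", "bulk_symbols": "Ag8GeTe6", "ads_symbols": "*CH2*CH2"},
    1: {"bulk_mpid": "mp-2400", "bulk_symbols": "Na4S4", "ads_symbols": "*COHCHOH"},
    2: {"bulk_mpid": "mp-35835", "bulk_symbols": "Ag2Au4S4", "ads_symbols": "*COHCH2"},
}

groups = sids_by_adsorbate(example)
print(groups["*COHCH2"])  # [2]
```

With the full file, `sids_by_adsorbate(load_mapping(path))` would give the same grouping across all systems, assuming the structure matches the example.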
If you have any questions or concerns, please let us know.
– The OC20 Team