Question about different relaxed energies for the same system in the IS2RE dataset

Thanks so much for your fantastic work on Open-Catalyst-Dataset.

Recently, I have been trying to use the IS2RE dataset you provided. I found several entries that belong to the same adsorbate+slab system but have different relaxed energies, because their adsorption sites differ. An example follows.

For the PtHg4(1,0,0)+*O system in the OC20 IS2RE dataset, these two mapping entries have different energies:

random1285003: {'bulk_id': 1753, 'ads_id': 0, 'bulk_mpid': 'mp-936', 'bulk_symbols': 'PtHg4', 'ads_symbols': '*O', 'miller_index': (1, 0, 0), 'shift': 0.125, 'top': False, 'adsorption_site': ((3.24, 5.84, 25.58),), 'class': 0, 'anomaly': 0} – y_relaxed: 2.44910 eV
random1991216: {'bulk_id': 1753, 'ads_id': 0, 'bulk_mpid': 'mp-936', 'bulk_symbols': 'PtHg4', 'ads_symbols': '*O', 'miller_index': (1, 0, 0), 'shift': 0.125, 'top': False, 'adsorption_site': ((1.62, 5.84, 25.73),), 'class': 0, 'anomaly': 0} – y_relaxed: 1.84555 eV

For this case, I would like to know:

  1. Does the adsorption_site field in oc20_data_mapping.pkl refer to the final relaxed site or to the initial adsorption site?
  2. If y_relaxed is used as the metric for filtering materials, which value do you recommend using when a system has several entries: the minimum, or the average? I think the global minimum is more appropriate, since it corresponds to the most stable configuration.


Hi -

  1. This refers to the initial placements of the adsorbate on the slab.
  2. Correct, picking the global minimum is the correct procedure in cases like this, where you have identical adsorbate+slab configurations but varying sites.
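
As a rough illustration of picking the lowest-energy placement per system, here is a minimal sketch. It assumes the mapping is a dict keyed by system ID, as in the entries quoted above, and that the relaxed energies are available in a separate dict called `energies`, which is a hypothetical container. The function name and the choice of fields used to identify a system are also assumptions for illustration, not part of the OC20 tooling.

```python
from collections import defaultdict

# Fields assumed to identify one adsorbate+slab system. adsorption_site is
# deliberately left out so that different placements land in the same group.
SYSTEM_KEYS = ("bulk_mpid", "ads_symbols", "miller_index", "shift", "top")


def lowest_energy_per_system(mapping, energies):
    """Return {system_key: (sid, energy)} holding the lowest relaxed energy per system."""
    groups = defaultdict(list)
    for sid, meta in mapping.items():
        if sid not in energies:
            continue  # skip entries without a relaxed energy
        key = tuple(meta[k] for k in SYSTEM_KEYS)
        groups[key].append((sid, energies[sid]))
    # For each system, keep the placement with the minimum energy.
    return {key: min(entries, key=lambda e: e[1]) for key, entries in groups.items()}


# The two entries from the question, trimmed to the fields used here.
mapping = {
    "random1285003": {
        "bulk_mpid": "mp-936", "ads_symbols": "*O", "miller_index": (1, 0, 0),
        "shift": 0.125, "top": False, "adsorption_site": ((3.24, 5.84, 25.58),),
    },
    "random1991216": {
        "bulk_mpid": "mp-936", "ads_symbols": "*O", "miller_index": (1, 0, 0),
        "shift": 0.125, "top": False, "adsorption_site": ((1.62, 5.84, 25.73),),
    },
}
# Hypothetical container: relaxed energies in eV, keyed by system ID.
energies = {"random1285003": 2.44910, "random1991216": 1.84555}

best = lowest_energy_per_system(mapping, energies)
for key, (sid, energy) in best.items():
    print(key, sid, energy)
```

For the two entries above, this keeps random1991216, the placement with the lower relaxed energy. Note that the minimum is only over the placements present in the dataset, so it is the lowest sampled energy rather than a guaranteed global minimum.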

Some relevant work that you might find interesting: [2211.16486] AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials. There we explore how to find the global minimum across many different adsorption sites using OC20 models.