Question about different relaxed energies for the same system in the IS2RE dataset

Thanks so much for your fantastic work on Open-Catalyst-Dataset.

Recently, I have been trying to use the IS2RE dataset you provided. I found that several entries belong to the same adsorbate+slab system but have different relaxed energies, because their adsorption sites differ. For example:

For the PtHg4 (1,0,0) + *O system in the OC20 IS2RE dataset, these two mapping entries have different relaxed energies:

random1285003: {'bulk_id': 1753, 'ads_id': 0, 'bulk_mpid': 'mp-936', 'bulk_symbols': 'PtHg4', 'ads_symbols': '*O', 'miller_index': (1, 0, 0), 'shift': 0.125, 'top': False, 'adsorption_site': ((3.24, 5.84, 25.58),), 'class': 0, 'anomaly': 0} – y_relaxed: 2.44910 eV
random1991216: {'bulk_id': 1753, 'ads_id': 0, 'bulk_mpid': 'mp-936', 'bulk_symbols': 'PtHg4', 'ads_symbols': '*O', 'miller_index': (1, 0, 0), 'shift': 0.125, 'top': False, 'adsorption_site': ((1.62, 5.84, 25.73),), 'class': 0, 'anomaly': 0} – y_relaxed: 1.84555 eV

For this case, I would like to know:

  1. Does the adsorption site in oc20_data_mapping.pkl refer to the final relaxed site or to the initial adsorption site?
  2. If y_relaxed is used as the metric for screening materials, which value do you recommend using for such a system? Should I take the average? I personally think the global minimum is more appropriate, because it corresponds to the most stable configuration.


  1. The adsorption site in the mapping refers to the initial placement of the adsorbate on the slab, not the relaxed position.
  2. Correct, picking the global minimum is the right procedure in cases like this, where you have identical adsorbate+slab systems that differ only in the adsorption site.
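
As a minimal sketch of that procedure, the snippet below groups entries by adsorbate+slab system and keeps the lowest y_relaxed in each group. The two entries are copied from the question. The helper names (`system_key`, `min_energy_per_system`), the choice of fields that define "the same system", the separate `y_relaxed` dict, and skipping entries with a nonzero `anomaly` flag are all assumptions made for illustration, not part of the dataset tooling. In practice `mapping` would be the dict loaded from oc20_data_mapping.pkl and the energies would be read from the IS2RE data.

```python
# Entries copied from the question; a real `mapping` would come from
# oc20_data_mapping.pkl (loading is omitted here to stay self-contained).
mapping = {
    "random1285003": {
        "bulk_id": 1753, "ads_id": 0, "bulk_mpid": "mp-936",
        "bulk_symbols": "PtHg4", "ads_symbols": "*O",
        "miller_index": (1, 0, 0), "shift": 0.125, "top": False,
        "adsorption_site": ((3.24, 5.84, 25.58),), "class": 0, "anomaly": 0,
    },
    "random1991216": {
        "bulk_id": 1753, "ads_id": 0, "bulk_mpid": "mp-936",
        "bulk_symbols": "PtHg4", "ads_symbols": "*O",
        "miller_index": (1, 0, 0), "shift": 0.125, "top": False,
        "adsorption_site": ((1.62, 5.84, 25.73),), "class": 0, "anomaly": 0,
    },
}

# Hypothetical container: relaxed energies in eV keyed by system id.
y_relaxed = {"random1285003": 2.44910, "random1991216": 1.84555}


def system_key(meta):
    """Fields assumed to identify one adsorbate+slab system.

    The adsorption site is deliberately left out, so entries that differ
    only in the initial placement fall into the same group.
    """
    return (
        meta["bulk_mpid"],
        meta["ads_symbols"],
        meta["miller_index"],
        meta["shift"],
        meta["top"],
    )


def min_energy_per_system(mapping, y_relaxed, skip_anomalies=True):
    """Return {system_key: (sid, energy)} for the lowest-energy entry."""
    best = {}
    for sid, meta in mapping.items():
        if sid not in y_relaxed:
            continue  # no energy available for this entry
        if skip_anomalies and meta.get("anomaly", 0) != 0:
            continue  # assumption: flagged relaxations are excluded
        key = system_key(meta)
        energy = y_relaxed[sid]
        if key not in best or energy < best[key][1]:
            best[key] = (sid, energy)
    return best


best = min_energy_per_system(mapping, y_relaxed)
for key, (sid, energy) in best.items():
    print(key, sid, f"{energy:.5f} eV")
```

For the two entries above, this keeps random1991216 (1.84555 eV) as the representative of the PtHg4 (1,0,0) + *O system. Note that the minimum over the sites present in the dataset is only a lower bound estimate of the true global minimum, since not every possible site is sampled.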

Some relevant work that you might find interesting: "AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials" (arXiv:2211.16486). There we explore how to find the global minimum across many different adsorption sites using OC20 models.