Hi,
Thanks so much for your fantastic work on Open-Catalyst-Dataset.
Recently, I have been trying to use the IS2RE dataset you provided. I found several entries that belong to the same adsorbate+slab system but have different relaxed energies, because the adsorption sites differ. For example:
For the PtHg4(1,0,0)+*O system in the OC20 IS2RE dataset, the following two mapping entries have different energies:
random1285003: {'bulk_id': 1753, 'ads_id': 0, 'bulk_mpid': 'mp-936', 'bulk_symbols': 'PtHg4', 'ads_symbols': '*O', 'miller_index': (1, 0, 0), 'shift': 0.125, 'top': False, 'adsorption_site': ((3.24, 5.84, 25.58),), 'class': 0, 'anomaly': 0}
- y_relaxed: 2.44910 eV
random1991216: {'bulk_id': 1753, 'ads_id': 0, 'bulk_mpid': 'mp-936', 'bulk_symbols': 'PtHg4', 'ads_symbols': '*O', 'miller_index': (1, 0, 0), 'shift': 0.125, 'top': False, 'adsorption_site': ((1.62, 5.84, 25.73),), 'class': 0, 'anomaly': 0}
- y_relaxed: 1.84555 eV
For this case, I would like to know:
- Does the 'adsorption_site' field in oc20_data_mapping.pkl refer to the final relaxed site or to the initial adsorption site?
- If y_relaxed is used as the metric for filtering materials, which value do you recommend for a system with several sites: the minimum or the average? I personally think the global minimum is more appropriate because it corresponds to the most stable configuration.
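
To make the second question concrete, here is a minimal sketch of the aggregation I have in mind. It groups entries that share the same bulk, adsorbate, Miller index, shift, and top flag, then reports both the minimum and the mean y_relaxed per group. The `system_key` and `aggregate_energies` helpers and the `y_relaxed` dict are my own illustrative names, not part of the dataset tooling; the mapping fields and energies are copied from the two entries above.

```python
from collections import defaultdict

# Two entries copied from oc20_data_mapping.pkl (fields trimmed for brevity).
# In practice the full dict could be read with pickle.load from that file.
mapping = {
    "random1285003": {"bulk_mpid": "mp-936", "ads_symbols": "*O",
                      "miller_index": (1, 0, 0), "shift": 0.125, "top": False},
    "random1991216": {"bulk_mpid": "mp-936", "ads_symbols": "*O",
                      "miller_index": (1, 0, 0), "shift": 0.125, "top": False},
}

# Hypothetical container: y_relaxed values (eV) keyed by the same system IDs.
# How these are collected from the IS2RE data is assumed, not shown.
y_relaxed = {"random1285003": 2.44910, "random1991216": 1.84555}


def system_key(entry):
    """Identify an adsorbate+slab system, ignoring the adsorption site."""
    return (entry["bulk_mpid"], entry["ads_symbols"],
            entry["miller_index"], entry["shift"], entry["top"])


def aggregate_energies(mapping, y_relaxed):
    """Collect energies per system and summarize them."""
    groups = defaultdict(list)
    for sid, entry in mapping.items():
        if sid in y_relaxed:  # skip systems without a known energy
            groups[system_key(entry)].append(y_relaxed[sid])
    return {
        key: {"min": min(vals), "mean": sum(vals) / len(vals), "n": len(vals)}
        for key, vals in groups.items()
    }


summary = aggregate_energies(mapping, y_relaxed)
for key, stats in summary.items():
    print(key, stats)
```

With this grouping, filtering on "min" would follow the most stable site I have data for, while "mean" would mix in less favorable sites.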
Thanks!