I have downloaded the OC22 dataset in LMDB format and can read the individual entries. The output for a particular entry, say dataset[0], looks like Data(y=-566.62288864, pos=[109, 3], cell=[1, 3, 3], atomic_numbers=[109], natoms=109, force=[109, 3], fixed=[109], tags=[109], nads=1, sid=17914, fid=24, id='0_0', oc22=1)
How can I identify the adsorbate (the chemical species) that is present on the slab?
You can use the tags information to identify the adsorbate. Specifically, tags=2 corresponds to the adsorbate. So the adsorbate in that example would be the atoms whose entry in tags equals 2, i.e. the matching entries of atomic_numbers.
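As a rough illustration of the tag-based selection, here is a minimal sketch that turns the atoms tagged 2 into a formula string. The helper name adsorbate_formula and the small SYMBOLS table are my own assumptions for illustration, not part of the OC22 tooling, and the toy values below are made up rather than taken from a real entry.

```python
from collections import Counter

# Small lookup table (assumption: covers only a few light elements;
# extend it, or use a periodic-table library, for anything else).
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}


def adsorbate_formula(atomic_numbers, tags, adsorbate_tag=2):
    """Return a formula string for the atoms whose tag equals adsorbate_tag."""
    # Accept tensors/arrays (anything with .tolist()) as well as plain lists.
    z = atomic_numbers.tolist() if hasattr(atomic_numbers, "tolist") else list(atomic_numbers)
    t = tags.tolist() if hasattr(tags, "tolist") else list(tags)

    # Count atomic numbers only for atoms carrying the adsorbate tag.
    counts = Counter(int(zi) for zi, ti in zip(z, t) if int(ti) == adsorbate_tag)

    parts = []
    for number in sorted(counts):
        symbol = SYMBOLS.get(number, f"Z{number}")  # fall back to "Z<number>"
        n = counts[number]
        parts.append(symbol if n == 1 else f"{symbol}{n}")
    return "".join(parts)


# Toy example with made-up values (not a real OC22 entry).
# Note the oxygen with tag 1 belongs to the slab and is not counted.
example_z = [78, 78, 8, 8, 1]
example_tags = [0, 1, 1, 2, 2]
print(adsorbate_formula(example_z, example_tags))  # prints "HO"
```

With a real entry you would presumably call it as adsorbate_formula(data.atomic_numbers, data.tags), where data = dataset[0].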
You can also use the metadata provided with Open Catalyst 2022 (OC22) to look up all information for a particular system. The sid in your Data object can be used to query this mapping.
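A sketch of the sid lookup, assuming the metadata is distributed as a pickled dictionary keyed by integer sid. The file format, the helper names load_metadata and lookup_system, and the field names in the toy mapping are assumptions, so check them against the file you actually download.

```python
import pickle


def load_metadata(path):
    """Load the metadata mapping from a pickle file (assumed format)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def lookup_system(metadata, sid):
    """Return the metadata record for a system id, or None if it is missing."""
    # sid may come out of the Data object as a tensor or numpy scalar,
    # so normalize it to a plain int before using it as a key.
    return metadata.get(int(sid))


# Toy mapping with placeholder values (hypothetical keys and fields).
toy_metadata = {1: {"ads_symbols": "placeholder", "bulk_symbols": "placeholder"}}
print(lookup_system(toy_metadata, 1))
print(lookup_system(toy_metadata, 2))  # None: sid not present
```

With the real file this would look something like lookup_system(load_metadata(path_to_metadata), data.sid), where path_to_metadata points at the downloaded file.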