Reading the adsorbates in the OC22 dataset

Hi

I have downloaded the OC22 dataset in LMDB format and can read the individual entries. The output for a particular entry, say dataset[0], looks like:

Data(y=-566.62288864, pos=[109, 3], cell=[1, 3, 3], atomic_numbers=[109], natoms=109, force=[109, 3], fixed=[109], tags=[109], nads=1, sid=17914, fid=24, id='0_0', oc22=1)

How can I identify the adsorbate (the chemical species) that is present on the slab?

Hi -

You can use the tags information to identify the adsorbate. Specifically, atoms with tags == 2 belong to the adsorbate. So the adsorbate atoms in that example would be:

adsorbate_atomic_numbers = data.atomic_numbers[data.tags == 2]
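To turn those atomic numbers into a readable species, something like the sketch below may help. It is only an illustration: the helper names (to_list, adsorbate_formula), the small SYMBOLS table, and the toy input are assumptions made for this example and are not part of the OC22 tooling.

```python
from collections import Counter

# Minimal lookup for elements that often appear in adsorbates (an assumption
# made for this sketch). ase.data.chemical_symbols, a list indexed by atomic
# number, covers the full periodic table if ASE is installed.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}

ADSORBATE_TAG = 2  # per the reply above: tags == 2 marks adsorbate atoms


def to_list(values):
    """Accept a torch tensor, a NumPy array, or a plain sequence."""
    return values.tolist() if hasattr(values, "tolist") else list(values)


def adsorbate_formula(atomic_numbers, tags):
    """Return a simple formula string for the atoms tagged as adsorbate."""
    numbers = [
        int(z)
        for z, t in zip(to_list(atomic_numbers), to_list(tags))
        if int(t) == ADSORBATE_TAG
    ]
    # Unknown elements fall back to a "Z<number>" placeholder.
    counts = Counter(SYMBOLS.get(z, f"Z{z}") for z in numbers)
    # Symbols are emitted in alphabetical order; a count of 1 is left implicit.
    return "".join(f"{sym}{n if n > 1 else ''}" for sym, n in sorted(counts.items()))


# Toy stand-in for data.atomic_numbers and data.tags (not a real OC22 entry).
atomic_numbers = [78, 78, 78, 78, 8, 1]
tags = [0, 0, 1, 1, 2, 2]
formula = adsorbate_formula(atomic_numbers, tags)
print(formula)
```

With a real entry, the call would presumably be adsorbate_formula(data.atomic_numbers, data.tags); the int() conversion is there in case the atomic numbers are stored as floats.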

You can also use the following metadata, Open Catalyst 2022 (OC22), to look up all the information for a particular system. The sid in your Data object can be used to query this mapping.

Alternatively, if you want the raw ASE trajectories, see Open Catalyst 2022 (OC22).

Hi mshuaibi

Thank you. That helps!

Also, where on the website can I find documentation for all the tags that can be used with the Data object? Please point me to it.

Regards,
Reshma