OC22 structures from lmdb

I’m trying to construct ASE Atoms objects from OC22 LMDB files. Is there a method I can use for this? I’m assuming the ‘batch’ passed to batch_to_atoms here is the object you get after loading the dataset with OC22LmdbDataset?

Also, does the OC22 mapping indicate the adsorption site position on a surface? I remember that the OC20 mapping has ‘adsorption_site’ as a key.

Thanks!

If you specifically want to convert LMDB objects to ASE objects, the script you linked is the right one to use. You would need to do something like this:

from ocpmodels.datasets import OC22LmdbDataset, data_list_collater
from ocpmodels.common.relaxation.ase_utils import batch_to_atoms
from torch.utils.data import DataLoader
from tqdm import tqdm

dataset = OC22LmdbDataset({"src": "path/to/data/"})
dataloader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=8,
    collate_fn=data_list_collater,
    shuffle=False
)

for batch in tqdm(dataloader, total=len(dataloader)):
    atoms_object_list = batch_to_atoms(batch)
    # atoms_object_list is a list of ASE Atoms objects, one per structure in the batch; process accordingly

However, we have also released the full relaxation trajectories as ASE Atoms objects, which you can download and use directly instead: https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md#relaxation-trajectories-1. The mapping files at https://github.com/Open-Catalyst-Project/ocp/blob/main/DATASET.md#oc22-mappings can help you navigate the systems if you are interested in specific data.
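
To see which metadata fields a mapping file provides, and to pick out systems by those fields, something like the following rough sketch may help. It assumes the mapping is a pickled dictionary keyed by system ID with a metadata dictionary per entry. The toy file, its keys and values, and the helper names are all hypothetical, so what it prints says nothing about the contents of the real OC22 mapping.

```python
import os
import pickle
import tempfile

# Toy stand-in for a downloaded mapping file. The IDs, keys and values
# below are hypothetical; inspect the real file for its actual fields.
toy_mapping = {
    101: {"bulk_id": "mp-0000", "miller_index": (1, 0, 0), "nads": 1},
    102: {"bulk_id": "mp-0001", "miller_index": (1, 1, 0), "nads": 0},
}
mapping_path = os.path.join(tempfile.mkdtemp(), "toy_mapping.pkl")
with open(mapping_path, "wb") as f:
    pickle.dump(toy_mapping, f)


def load_mapping(path):
    """Load a pickled {system_id: metadata_dict} mapping."""
    with open(path, "rb") as f:
        return pickle.load(f)


def available_keys(mapping):
    """Return the sorted union of metadata keys over all entries."""
    keys = set()
    for entry in mapping.values():
        keys.update(entry)
    return sorted(keys)


def select_systems(mapping, **criteria):
    """Return the IDs of entries whose metadata matches every given key=value."""
    return [
        sid
        for sid, entry in mapping.items()
        if all(entry.get(key) == value for key, value in criteria.items())
    ]


mapping = load_mapping(mapping_path)
print(available_keys(mapping))
print("adsorption_site" in available_keys(mapping))
print(select_systems(mapping, nads=1))
```

Pointing load_mapping at the downloaded mapping file and printing available_keys is a quick way to check whether a field such as ‘adsorption_site’ is present.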

Thanks for the reply!

Does the OC22 mapping indicate the adsorption site too, as the OC20 mapping does?