Converting LMDB to ASE object

Hi, I already asked in a previous post how to convert LMDB to ASE objects, but it’s still not clear to me how to do this in practice.
The following script converts a data format called a “batch” into ASE objects.
But how can I get a “batch” object from an LMDB file?

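import torch
from ase import Atoms
from ase.calculators.singlepoint import SinglePointCalculator as sp
from ase.constraints import FixAtoms

def batch_to_atoms(batch):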
    n_systems = batch.neighbors.shape[0]
    natoms = batch.natoms.tolist()
    numbers = torch.split(batch.atomic_numbers, natoms)
    fixed = torch.split(batch.fixed, natoms)
    forces = torch.split(batch.force, natoms)
    positions = torch.split(batch.pos, natoms)
    tags = torch.split(batch.tags, natoms)
    cells = batch.cell
    energies = batch.y.tolist()

    atoms_objects = []
    for idx in range(n_systems):
        atoms = Atoms(
            numbers=numbers[idx].tolist(),
            positions=positions[idx].cpu().detach().numpy(),
            tags=tags[idx].tolist(),
            cell=cells[idx].cpu().detach().numpy(),
            constraint=FixAtoms(mask=fixed[idx].tolist()),
            pbc=[True, True, True],
        )
        calc = sp(
            atoms=atoms,
            energy=energies[idx],
            forces=forces[idx].cpu().detach().numpy(),
        )
        atoms.calc = calc  # attribute assignment replaces the deprecated set_calculator()
        atoms_objects.append(atoms)

    return atoms_objects

Hi -

Sorry for the delay on this! This script is a little specific to how our codebase is set up, but it can still be used. You can do something like this:

from ocpmodels.datasets import LmdbDataset, data_list_collater

dataset = LmdbDataset({"src": "path/to/lmdb/dataset"})

data = dataset[0]
batch = data_list_collater([data])
atoms = batch_to_atoms(batch)

Let me know if you have any issues with this. What dataset were you trying to convert to ASE? Was this in regard to the OCP Challenge? If so, we can consider releasing those ASE objects directly.

Thank you for the reply!
Yes, I asked this question because of the OCP Challenge. I would appreciate it if you could release the ASE objects directly, but I’ll try converting them myself as you advised. Thank you!!

Got it!

I have gone ahead and converted the data for you and added a public link at Open Catalyst Challenge. See “OC20-Dense-val-ase-format”. These are all the inputs, in ASE Atoms object format, for the OC20-Dense-Val set to be used for the challenge.

For “ground truth” targets, you can download the data under “OC20-Dense-val-trajectories”. Note that this set may contain fewer systems than the inputs, because anomalous systems (desorption, dissociation, etc.) and non-converged calculations had to be removed.
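
Since the targets may cover fewer systems than the inputs, it can help to pair the two sets explicitly before using them. The snippet below is only a sketch: it assumes you have already loaded both sets into dictionaries keyed by some system identifier, and both the helper name `pair_inputs_with_targets` and the example keys are hypothetical, not part of the released files.

```python
def pair_inputs_with_targets(inputs, targets):
    """Pair inputs with targets by key and list the inputs that have no target.

    `inputs` and `targets` are plain dicts keyed by a system identifier.
    How that identifier is derived from the downloaded files is an assumption
    left to the reader.
    """
    paired = {key: (inputs[key], targets[key]) for key in inputs if key in targets}
    unmatched = sorted(key for key in inputs if key not in targets)
    return paired, unmatched


# Hypothetical identifiers, for illustration only.
inputs = {"sys_a": "input A", "sys_b": "input B", "sys_c": "input C"}
targets = {"sys_a": "target A", "sys_c": "target C"}

paired, unmatched = pair_inputs_with_targets(inputs, targets)
print(len(paired), "systems have targets; missing:", unmatched)
```

Systems listed in `unmatched` are those whose targets were removed, so they should be skipped rather than treated as errors.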

Hope this helps!

I tried what you suggested, but it raises the following error.

from ocpmodels.datasets import LmdbDataset, data_list_collater

dataset = LmdbDataset({"src": "/home/hjung/Calculation/OpenCatalyst/train/data.0000.lmdb"})

data = dataset[0]
batch = data_list_collater([data])
atoms = batch_to_atoms(batch)
WARNING:root:LMDB does not contain edge index information, set otf_graph=True

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/rdkitenv/lib/python3.9/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
     78         try:
---> 79             return self[key]
     80         except KeyError:

~/anaconda3/envs/rdkitenv/lib/python3.9/site-packages/torch_geometric/data/storage.py in __getitem__(self, key)
    103     def __getitem__(self, key: str) -> Any:
--> 104         return self._mapping[key]
    105 

KeyError: 'neighbors'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-10-b03edaa8ba1d> in <module>
      3 data = dataset[0]
      4 batch = data_list_collater([data])
----> 5 atoms = batch_to_atoms(batch)

<ipython-input-2-a14d1f7cdce7> in batch_to_atoms(batch)
      1 def batch_to_atoms(batch):
----> 2     n_systems = batch.neighbors.shape[0]
      3     natoms = batch.natoms.tolist()
      4     numbers = torch.split(batch.atomic_numbers, natoms)
      5     fixed = torch.split(batch.fixed, natoms)

~/anaconda3/envs/rdkitenv/lib/python3.9/site-packages/torch_geometric/data/data.py in __getattr__(self, key)
    439                 "dataset, remove the 'processed/' directory in the dataset's "
    440                 "root folder and try again.")
--> 441         return getattr(self._store, key)
    442 
    443     def __setattr__(self, key: str, value: Any):

~/anaconda3/envs/rdkitenv/lib/python3.9/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
     79             return self[key]
     80         except KeyError:
---> 81             raise AttributeError(
     82                 f"'{self.__class__.__name__}' object has no attribute '{key}'")
     83 

AttributeError: 'GlobalStorage' object has no attribute 'neighbors'

So converting the LMDB to ASE objects still does not work. Do you have any idea how to solve this problem?
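
The traceback fails on `batch.neighbors`, which this LMDB does not store (hence the warning about missing edge index information). In `batch_to_atoms`, `neighbors` is only used to count the structures, and `batch.natoms` already has one entry per structure. A possible workaround, sketched below, is to count from `natoms` instead and to check up front which of the other attributes the batch actually provides. The helper names `missing_fields` and `count_systems` are made up for this sketch and are not part of ocpmodels.

```python
# Attributes that batch_to_atoms (as posted in this thread) reads from the batch,
# apart from `neighbors`.
REQUIRED_FIELDS = (
    "natoms", "atomic_numbers", "fixed", "force", "pos", "tags", "cell", "y",
)


def missing_fields(batch, required=REQUIRED_FIELDS):
    """Return the names in `required` that `batch` does not provide."""
    return [name for name in required if not hasattr(batch, name)]


def count_systems(batch):
    """Count structures from `natoms` (one entry per structure), not `neighbors`."""
    return len(batch.natoms.tolist())
```

With these, the first line of `batch_to_atoms` could become `n_systems = count_systems(batch)`, and calling `missing_fields(batch)` beforehand shows whether fields such as `force` or `y` are absent from your particular LMDB, in which case the matching lines of the function would need to be dropped or adapted.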

I’ve uploaded the ASE objects here: https://dl.fbaipublicfiles.com/opencatalystproject/data/neurips_2023/oc20dense_is2re_val_ase.tar.gz. See my previous post.