Converting LMDB to ASE object

jungsdao · August 9, 2023, 2:46pm

Hi, I already asked in a previous post how to convert an LMDB file to ASE objects, but it is still not clear to me how to do this in practice.
The following script is supposed to convert a data format called a batch to ASE objects.
But how can I get a “batch” object from an LMDB file?

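import torch
from ase import Atoms
from ase.calculators.singlepoint import SinglePointCalculator as sp
from ase.constraints import FixAtoms

def batch_to_atoms(batch):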
    n_systems = batch.neighbors.shape[0]
    natoms = batch.natoms.tolist()
    numbers = torch.split(batch.atomic_numbers, natoms)
    fixed = torch.split(batch.fixed, natoms)
    forces = torch.split(batch.force, natoms)
    positions = torch.split(batch.pos, natoms)
    tags = torch.split(batch.tags, natoms)
    cells = batch.cell
    energies = batch.y.tolist()

    atoms_objects = []
    for idx in range(n_systems):
        atoms = Atoms(
            numbers=numbers[idx].tolist(),
            positions=positions[idx].cpu().detach().numpy(),
            tags=tags[idx].tolist(),
            cell=cells[idx].cpu().detach().numpy(),
            constraint=FixAtoms(mask=fixed[idx].tolist()),
            pbc=[True, True, True],
        )
        calc = sp(
            atoms=atoms,
            energy=energies[idx],
            forces=forces[idx].cpu().detach().numpy(),
        )
        atoms.set_calculator(calc)
        atoms_objects.append(atoms)

    return atoms_objects

mshuaibi · August 21, 2023, 6:53pm

Hi -

Sorry for the delay on this! This script is a little specific to how our codebase is set up but it can still be used. You can do something like this:

from ocpmodels.datasets import LmdbDataset, data_list_collater

dataset = LmdbDataset({"src": "path/to/lmdb/dataset"})

data_object = dataset[0]
batch = data_list_collater([data_object])
atoms = batch_to_atoms(batch)

Let me know if you have any issues with this. What dataset were you trying to convert to ASE? Was this in regards to the OCP Challenge? If so, we can consider releasing those ASE objects directly.

jungsdao · August 28, 2023, 6:35pm

Thank you for the reply!
Yes, the reason I asked this question was related to the OCP Challenge. I would appreciate it if you could release the ASE objects directly, but I’ll try converting them myself as you advised. Thank you!!

mshuaibi · August 29, 2023, 5:52pm

Got it!

I have gone ahead and converted the data for you and added a link publicly at Open Catalyst Challenge. See - “OC20-Dense-val-ase-format”. These are all the inputs, in ASE atoms object format, for the OC20-Dense-Val set to be used for the challenge.

For “ground truth” targets, you can download the data under “OC20-Dense-val-trajectories”. Note that this set may contain fewer systems than the inputs because of the need to remove anomalous (desorption, dissociation, etc.) systems or non-converged calculations.

Hope this helps!

jungsdao · August 31, 2023, 3:05pm

I tried what you suggested, but it raises the following error.

from ocpmodels.datasets import LmdbDataset, data_list_collater

dataset = LmdbDataset({"src": "/home/hjung/Calculation/OpenCatalyst/train/data.0000.lmdb"})

data = dataset[0]
batch = data_list_collater([data])
atoms = batch_to_atoms(batch)

WARNING:root:LMDB does not contain edge index information, set otf_graph=True

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/rdkitenv/lib/python3.9/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
     78         try:
---> 79             return self[key]
     80         except KeyError:

~/anaconda3/envs/rdkitenv/lib/python3.9/site-packages/torch_geometric/data/storage.py in __getitem__(self, key)
    103     def __getitem__(self, key: str) -> Any:
--> 104         return self._mapping[key]
    105 

KeyError: 'neighbors'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-10-b03edaa8ba1d> in <module>
      3 data = dataset[0]
      4 batch = data_list_collater([data])
----> 5 atoms = batch_to_atoms(batch)

<ipython-input-2-a14d1f7cdce7> in batch_to_atoms(batch)
      1 def batch_to_atoms(batch):
----> 2     n_systems = batch.neighbors.shape[0]
      3     natoms = batch.natoms.tolist()
      4     numbers = torch.split(batch.atomic_numbers, natoms)
      5     fixed = torch.split(batch.fixed, natoms)

~/anaconda3/envs/rdkitenv/lib/python3.9/site-packages/torch_geometric/data/data.py in __getattr__(self, key)
    439                 "dataset, remove the 'processed/' directory in the dataset's "
    440                 "root folder and try again.")
--> 441         return getattr(self._store, key)
    442 
    443     def __setattr__(self, key: str, value: Any):

~/anaconda3/envs/rdkitenv/lib/python3.9/site-packages/torch_geometric/data/storage.py in __getattr__(self, key)
     79             return self[key]
     80         except KeyError:
---> 81             raise AttributeError(
     82                 f"'{self.__class__.__name__}' object has no attribute '{key}'")
     83 

AttributeError: 'GlobalStorage' object has no attribute 'neighbors'

So converting the LMDB to ASE objects still does not work. Do you have any idea how to solve this problem?

mshuaibi · August 31, 2023, 3:51pm

I’ve uploaded the ASE objects here: https://dl.fbaipublicfiles.com/opencatalystproject/data/neurips_2023/oc20dense_is2re_val_ase.tar.gz. See my previous post.
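
For anyone who still wants to run the conversion locally: the traceback above fails on the first line of batch_to_atoms, which reads the number of systems from batch.neighbors. That attribute is missing here, consistent with the warning that the LMDB does not contain edge index information. One possible workaround, not confirmed in this thread, is to take the number of systems from batch.natoms instead, since the function already converts it to a list with one entry per system. The sketch below is a dependency-free illustration of that idea and of the per-system splitting that torch.split(tensor, natoms) performs in the posted function. The helper name split_by_counts and the sample values are hypothetical.

```python
from itertools import accumulate


def split_by_counts(flat_values, counts):
    """Split a flat per-atom sequence into one chunk per system.

    Hypothetical helper that mirrors what torch.split(tensor, natoms)
    does in batch_to_atoms, using plain Python lists.
    """
    if sum(counts) != len(flat_values):
        raise ValueError("counts must sum to the number of per-atom entries")
    ends = list(accumulate(counts))   # exclusive end index of each system
    starts = [0] + ends[:-1]          # start index of each system
    return [flat_values[s:e] for s, e in zip(starts, ends)]


# Made-up stand-in for a collated batch: two systems with 2 and 3 atoms.
natoms = [2, 3]                       # plays the role of batch.natoms.tolist()
atomic_numbers = [8, 1, 29, 29, 6]    # plays the role of batch.atomic_numbers

# Assumption: one entry in natoms per system, so no `neighbors` is needed.
n_systems = len(natoms)
numbers_per_system = split_by_counts(atomic_numbers, natoms)
```

In batch_to_atoms itself, the corresponding change would be to compute natoms first and then set n_systems = len(natoms), leaving the rest of the function as posted. This assumes the remaining attributes the function reads (fixed, force, pos, tags, cell, y) are present in the LMDB entries.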
