Opening LMDB files

I would like to open the LMDB file to see how the data in it is structured. Is there sample code I can use to open it and access its contents?

You can refer to the end of this tutorial for help with interacting with the LMDBs - ocp/lmdb_dataset_creation.ipynb at master · Open-Catalyst-Project/ocp · GitHub. Note this sample code is for S2EF LMDBs (specifying the directory as src). If you’re interested in interacting with the IS2RE dataset, something like this should do:

from ocpmodels.datasets import SinglePointLmdbDataset
dataset = SinglePointLmdbDataset({"src": "path/to/is2re/data.lmdb"})
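Since the S2EF example takes a directory as src while the IS2RE example takes a single data.lmdb file, it can help to check what a path actually points to before handing it to a dataset class. Below is a minimal sketch using only the standard library; describe_lmdb_source is a hypothetical helper name, not part of ocpmodels, and it only looks at the filesystem without opening the LMDB.

```python
from pathlib import Path


def describe_lmdb_source(src):
    """Classify a path as a single .lmdb file, a directory, or missing.

    Hypothetical helper for illustration; it does not read the LMDB itself.
    """
    path = Path(src)
    if path.is_file() and path.suffix == ".lmdb":
        return "single file"
    if path.is_dir():
        # Count only files ending in .lmdb (lock files are ignored).
        count = len(list(path.glob("*.lmdb")))
        return f"directory with {count} .lmdb file(s)"
    return "not found"


# Placeholder path from the reply above; replace with your own location.
print(describe_lmdb_source("path/to/is2re/data.lmdb"))
```

If this reports "not found", the path is likely the problem rather than the dataset class.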

Thank you very much for the quick response.
I am still struggling to install ocpmodels.
I have tried:
conda install ocpmodels
in a Jupyter notebook, but it does not work. Is there a way around this?

The repo is not installable via a conda package. Please follow the details here for step-by-step instructions: GitHub - Open-Catalyst-Project/ocp: https://opencatalystproject.org/.

Hi Muhammad,

Thank you for all your help.

I hope it is not too much to ask for complete code to open and view the data in the LMDB files. I keep running into errors when I follow the sample code you provided, which seems to be for S2EF only. I want to interact with the IS2RE dataset. Please guide me.

I don’t know where in the sample code I am supposed to put the following:

from ocpmodels.datasets import SinglePointLmdbDataset
dataset = SinglePointLmdbDataset({"src": "path/to/is2re/data.lmdb"})

I look forward to hearing from you.

If you can share your error messages I can better help you. Navigating the IS2RE dataset is similar to the S2EF dataset:

  1. If you haven’t already done so, make sure to download the IS2RE LMDBs here - ocp/DATASET.md at master · Open-Catalyst-Project/ocp · GitHub. Uncompress the downloaded file.
  2. Clone the OCP repo onto your machine.
  3. Follow the installation instructions here to set up your conda environment - GitHub - Open-Catalyst-Project/ocp: https://opencatalystproject.org/. If you are using a non-GPU machine, use
    conda-merge env.common.yml env.cpu.yml > env.yml
    when you get to that step. Make sure to run pip install -e . from within the cloned ocp directory.
  4. Make note of the path to the downloaded and uncompressed IS2RE LMDBs, and find the data.lmdb file you’re interested in; let’s assume all/train/data.lmdb.
  5. Run the following:
from ocpmodels.datasets import SinglePointLmdbDataset
dataset = SinglePointLmdbDataset({"src": "all/train/data.lmdb"})
sample = dataset[0]
print(sample)

output: Data(atomic_numbers=[86], cell=[1, 3, 3], cell_offsets=[2964, 3], distances=[2964], edge_index=[2, 2964], fixed=[86], force=[86, 3], natoms=86, pos=[86, 3], pos_relaxed=[86, 3], sid=2472718, tags=[86], y_init=6.282500615000004, y_relaxed=-0.025550085000020317)
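To look at individual fields rather than the whole printed object, the attributes named in that output can be read directly from the sample (for example sample.pos or sample.y_relaxed). The sketch below is one way to summarize them; summarize_sample and FIELDS are hypothetical names introduced here, and the stand-in object at the bottom only mimics a real sample so the snippet runs without ocpmodels installed.

```python
from types import SimpleNamespace

# Field names copied from the printed Data(...) output above.
FIELDS = ["atomic_numbers", "pos", "pos_relaxed", "natoms",
          "sid", "y_init", "y_relaxed"]


def summarize_sample(sample, fields=FIELDS):
    """Return {field: short description} for the fields present on sample."""
    summary = {}
    for name in fields:
        if not hasattr(sample, name):
            continue  # skip fields this sample does not carry
        value = getattr(sample, name)
        shape = getattr(value, "shape", None)
        if shape is not None and len(shape) > 0:
            # Array-like values (e.g. tensors): report type and shape only.
            summary[name] = f"{type(value).__name__} with shape {tuple(shape)}"
        else:
            # Scalars and other plain values: show them as-is.
            summary[name] = repr(value)
    return summary


# Stand-in object for illustration; with ocpmodels installed you would
# pass dataset[0] instead.
demo = SimpleNamespace(natoms=86, sid=2472718)
print(summarize_sample(demo))
```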

Hi Muhammed,

I get this error when I try to run [from ocpmodels.datasets import SinglePointLmdbDataset]:

ModuleNotFoundError Traceback (most recent call last)
in
----> 1 from ocpmodels.datasets import SinglePointLmdbDataset

ModuleNotFoundError: No module named 'ocpmodels'

Have you followed the installation instructions first? These errors are caused by the repo or its dependencies not being installed properly.
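A quick way to confirm this from inside the notebook is to ask the running interpreter whether it can locate the package. The sketch below uses only the standard library; is_installed is a hypothetical helper name. One possible cause, stated here as an assumption, is that the notebook kernel runs a different Python environment than the conda environment where pip install -e . was executed, so the printed interpreter path is worth comparing against that environment.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the running interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Shows which Python the notebook kernel is actually using.
print("Interpreter:", sys.executable)
print("ocpmodels importable:", is_installed("ocpmodels"))
```

If the second line prints False, the import will fail with the same ModuleNotFoundError until the installation steps are completed in the environment that interpreter belongs to.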