Opening LMDB files

I would like to open the LMDB file to see how its data is structured. Is there sample code I can use to open it and access the contents?

You can refer to the end of this tutorial for help with interacting with the LMDBs - ocp/lmdb_dataset_creation.ipynb at master · Open-Catalyst-Project/ocp · GitHub. Note that this sample code is for S2EF LMDBs (where src is set to the directory). If you’re interested in interacting with the IS2RE dataset, something like this should do:

from ocpmodels.datasets import SinglePointLmdbDataset
dataset = SinglePointLmdbDataset({"src": "path/to/is2re/data.lmdb"})

Thank you very much for the quick response.
I am still struggling with how to install ocpmodels.
I have tried:
conda install ocpmodels
in a Jupyter notebook, but it does not work. Is there a way around this?

The repo is not available as a conda package. Please follow the step-by-step instructions here: GitHub - Open-Catalyst-Project/ocp: https://opencatalystproject.org/.

Hi Muhammad,

Thank you for all your help.

I don’t know if I’m asking too much, but could you share complete code to open and view the data in the LMDB files? I keep running into errors when I try to follow the sample code you provided, which seems to be only for S2EF. I want to interact with the IS2RE dataset. Please guide me.

I don’t know where in the sample code I should put the following:

from ocpmodels.datasets import SinglePointLmdbDataset
dataset = SinglePointLmdbDataset({"src": "path/to/is2re/data.lmdb"})

I look forward to hearing from you.

If you can share your error messages, I can better help you. Navigating the IS2RE dataset is similar to navigating the S2EF dataset:

  1. If you haven’t already done so, make sure to download the IS2RE LMDBs here - ocp/DATASET.md at master · Open-Catalyst-Project/ocp · GitHub. Uncompress the downloaded file.
  2. Clone the OCP repo onto your machine.
  3. Follow the installation instructions here to set up your conda environment - GitHub - Open-Catalyst-Project/ocp: https://opencatalystproject.org/. If you are using a non-GPU machine, use
    conda-merge env.common.yml env.cpu.yml > env.yml
    when you get to that step. Make sure to run pip install -e . from within the cloned ocp directory.
  4. Make note of the path to the downloaded and uncompressed IS2RE LMDBs, and find the specific data.lmdb file you’re interested in; let’s assume all/train/data.lmdb.
  5. Run the following:
from ocpmodels.datasets import SinglePointLmdbDataset
dataset = SinglePointLmdbDataset({"src": "all/train/data.lmdb"})
sample = dataset[0]
print(sample)

output: Data(atomic_numbers=[86], cell=[1, 3, 3], cell_offsets=[2964, 3], distances=[2964], edge_index=[2, 2964], fixed=[86], force=[86, 3], natoms=86, pos=[86, 3], pos_relaxed=[86, 3], sid=2472718, tags=[86], y_init=6.282500615000004, y_relaxed=-0.025550085000020317)
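
Since the question was about viewing the contents as a table, here is a minimal sketch that pulls a few scalar fields from the first samples into a list of dicts. It assumes only that the dataset supports len() and integer indexing (as dataset[0] above does) and that samples expose fields such as sid, natoms, y_init and y_relaxed as attributes, as in the printed output. The helper name to_records is hypothetical and not part of ocpmodels; tensor-valued fields like pos are left out on purpose.

```python
from types import SimpleNamespace


def to_records(dataset, fields=("sid", "natoms", "y_init", "y_relaxed"), limit=5):
    """Collect selected fields from the first `limit` samples.

    Hypothetical helper: works with any indexable dataset whose samples
    expose the requested fields as attributes (missing fields become None).
    """
    records = []
    for i in range(min(limit, len(dataset))):
        sample = dataset[i]
        records.append({name: getattr(sample, name, None) for name in fields})
    return records


if __name__ == "__main__":
    # Stand-in for SinglePointLmdbDataset, using values from the output above.
    fake_dataset = [
        SimpleNamespace(sid=2472718, natoms=86, y_init=6.282500615000004,
                        y_relaxed=-0.025550085000020317),
    ]
    for row in to_records(fake_dataset):
        print(row)
```

With the real dataset, and if pandas is installed, pandas.DataFrame(to_records(dataset, limit=100)) should give a table view. Some values may come back as tensors rather than plain numbers, in which case calling .item() on them before building the table is one option.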

Hi Muhammed,

I get this error when I try to run [from ocpmodels.datasets import SinglePointLmdbDataset]:

ModuleNotFoundError Traceback (most recent call last)
in
----> 1 from ocpmodels.datasets import SinglePointLmdbDataset

ModuleNotFoundError: No module named 'ocpmodels'

Have you followed the installation instructions first? These errors are caused by the repo or its dependencies not being installed properly.
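
A common cause of this error in Jupyter is that the notebook kernel runs a different Python interpreter than the conda environment where ocpmodels was installed. As a quick diagnostic (a sketch, not part of the OCP repo; the helper name module_available is hypothetical), the snippet below prints which interpreter is running and whether a few modules can be found from it, without actually importing them. The module names checked besides ocpmodels are just examples of packages the datasets rely on.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter this kernel or script is actually running.
    print("Python executable:", sys.executable)
    for name in ("ocpmodels", "lmdb", "torch"):
        status = "found" if module_available(name) else "NOT found"
        print(f"{name}: {status}")
```

If sys.executable does not point inside the OCP conda environment, activate that environment and start Jupyter from it (or register it as a Jupyter kernel with ipykernel) before retrying the import.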

Hi!

Do you know why this is happening? I installed ocpmodels properly, but I still can't run the code.
Thank you for your help.

Hi -

Try the following:

from ocpmodels.datasets import LmdbDataset

dataset = LmdbDataset({"src": "Documents/etc/etc/train"}) 

Some context here: when loading S2EF data, you only need to give the directory, not a specific *.lmdb file. When loading IS2RE data, you need to specify the exact .lmdb file. We’re working on making this easier to use, but hopefully this resolves your issue for the time being.
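
To make the directory-versus-file distinction explicit, here is a small hypothetical helper (not part of ocpmodels) that checks a path before building the {"src": ...} config: S2EF expects a directory of .lmdb files, IS2RE expects a single .lmdb file. If an IS2RE path turns out to be a directory, it lists the .lmdb files found under it so you can pick the exact one. The task names and error messages are assumptions for illustration.

```python
from pathlib import Path


def lmdb_src_config(path, task):
    """Build a {"src": ...} config after checking the path matches the task.

    Hypothetical helper:
      task="s2ef":  path must be a directory containing .lmdb files.
      task="is2re": path must be a single .lmdb file.
    """
    p = Path(path).expanduser()
    if task == "s2ef":
        if not p.is_dir():
            raise ValueError(f"S2EF expects a directory, got: {p}")
        if not any(p.glob("*.lmdb")):
            raise ValueError(f"No .lmdb files found in {p}")
    elif task == "is2re":
        if p.is_dir():
            # List candidate files so the user can choose the exact one.
            candidates = sorted(str(f) for f in p.rglob("*.lmdb"))
            raise ValueError(
                "IS2RE expects a single .lmdb file, got a directory. "
                f"Candidates: {candidates}"
            )
        if not (p.is_file() and p.suffix == ".lmdb"):
            raise ValueError(f"Not an existing .lmdb file: {p}")
    else:
        raise ValueError(f"Unknown task: {task!r}")
    return {"src": str(p)}
```

For example, if all/train/data.lmdb exists, lmdb_src_config("all/train/data.lmdb", task="is2re") returns {"src": "all/train/data.lmdb"}, which can then be passed to SinglePointLmdbDataset.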

Sir, could you please help me with the pip-equivalent commands for creating a virtual environment with the required OCP dependencies?
The installation instructions only cover setting up a conda environment. However, I cannot run conda on my cluster, so I need to use pip.

Thank you for your time!