Accessing the .cif structure files from the dataset

Dear OC2020 team,

First of all, I would like to thank you for providing the dataset and the codebase.

I have been trying to employ your database and code for my research. However, I am not sure how to navigate through the .lmdb data files. My current research requires CH* binding energies on different catalyst surfaces that are computed after incorporating spin polarization. For this I need the .cif structures for the CH* adsorbates from your database.

I have been studying your codebase and was able to read the .lmdb files, but I believe these files contain only the crystal graphs and not the .cif structures (please correct me if I am wrong). If possible, could you please help me obtain the .cif structure files for the CH* adsorbate? It would greatly help my research.

I am looking forward to hearing from you.

Thank you.

Sincerely,

Shambhawi


Hi @priya

Glad to hear you’re interested. The .lmdb files are structured to be used directly with the codebase. It sounds like you want an ASE-compatible structure. Although we don’t provide .cif structures, you can download the corresponding ASE-readable .extxyz structures here. You can then convert these structures to .cif format with a few lines:

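import ase.io

# The index ":" reads every frame in the file and returns a list of Atoms objects
structures = ase.io.read("random0000.extxyz", ":")
# Write all frames to a single .cif file
ase.io.write("random0000.cif", structures)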

As for parsing the dataset for CH* adsorbate systems, DATASET.md in the Open-Catalyst-Project/ocp repository on GitHub (master branch) has all the information you need to find what you’re looking for.
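As a rough illustration of that filtering step, here is a minimal sketch. It assumes you have a metadata mapping keyed by system ID (such as "random0000") in which each entry records the adsorbate under an "ads_symbols" field written like "*CH". The function name, the field name, and the adsorbate notation are all assumptions made for this example, so please check them against DATASET.md before relying on them. A toy dictionary stands in for the real metadata.

```python
def find_systems_by_adsorbate(mapping, adsorbate="*CH"):
    """Return the sorted system IDs whose metadata lists the given adsorbate.

    `mapping` is assumed to look like {system_id: {"ads_symbols": "*CH", ...}}.
    The "ads_symbols" key is an assumption; adjust it to the actual metadata.
    """
    return sorted(
        system_id
        for system_id, info in mapping.items()
        if info.get("ads_symbols") == adsorbate
    )


# Toy stand-in for the real metadata (hypothetical entries, not actual dataset values).
toy_mapping = {
    "random0000": {"ads_symbols": "*CH"},
    "random0001": {"ads_symbols": "*OH"},
    "random0002": {"ads_symbols": "*CH"},
}

ch_systems = find_systems_by_adsorbate(toy_mapping)
print(ch_systems)
```

The resulting IDs can then be matched to the corresponding .extxyz file names and converted to .cif as shown above.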

Let us know if you still need help.
