Questions about Data mapping information

Hello, OCP team!
Recently I have been working with the Data mapping information (a Python pickle) to process relevant data. I need the information about the slab and adsorbates for each of the systems in the OC22 dataset. Loading the pickle file gives a Python dictionary, but I’m confused about the meaning of one key (miller_index) in it. In DATASET.md, miller_index is described as a 3-tuple of integers indicating the Miller indices of the surface. I don’t quite understand this explanation. Can I interpret this key as the location of the adsorbate on the slab? For example, for the same slab, does a different miller_index mean that the adsorbate position on the slab is also different? Specifically, in an OER reaction, the adsorbates OH*, O*, and OOH* appear sequentially. Should OH*, O*, and OOH* all have the same miller_index?
Can you help explain in detail what this key (miller_index) means? Thank you.

Hi -

Not quite. When creating catalyst surfaces you must first start with some bulk material. This is identified as bulk_id in the data mapping. Starting from a bulk material, there are different ways one can “slice” that bulk material to create different surfaces, for example:

These different surfaces are defined with Miller indices, which, in simple terms, are just a notation for the different planes of the bulk crystal material. Different surfaces can perform differently, so it is important to study adsorbates placed on different Miller indices, e.g. Cu (100) vs. Cu (111).
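
To make this concrete for the data mapping, here is a minimal sketch of grouping systems by surface. The mapping below is a made-up stand-in for the real pickle: the system ids, the bulk_id values, and the ads_symbols field are illustrative assumptions, and only bulk_id and miller_index come from the discussion above. The idea is that OH*, O*, and OOH* placed on the same surface would share the same (bulk_id, miller_index) pair, because the key describes the slab and not the adsorbate position.

```python
import pickle
from collections import defaultdict

# Hypothetical stand-in for the loaded mapping: {system id: metadata}.
# All ids and values here are invented for illustration.
toy_mapping = {
    "sys_a": {"bulk_id": "bulk-1", "miller_index": (1, 0, 0), "ads_symbols": "OH"},
    "sys_b": {"bulk_id": "bulk-1", "miller_index": (1, 0, 0), "ads_symbols": "O"},
    "sys_c": {"bulk_id": "bulk-1", "miller_index": (1, 0, 0), "ads_symbols": "OOH"},
    "sys_d": {"bulk_id": "bulk-1", "miller_index": (1, 1, 1), "ads_symbols": "OH"},
}

# Round-trip through pickle to mimic reading the real file with pickle.load(f).
mapping = pickle.loads(pickle.dumps(toy_mapping))


def group_by_surface(mapping):
    """Group system ids by (bulk_id, miller_index), i.e. by surface."""
    surfaces = defaultdict(list)
    for system_id, info in mapping.items():
        key = (info["bulk_id"], tuple(info["miller_index"]))
        surfaces[key].append(system_id)
    return dict(surfaces)


surfaces = group_by_surface(mapping)
for surface, system_ids in surfaces.items():
    print(surface, system_ids)
```

In this toy data, the three OER intermediates on the (1, 0, 0) surface end up in one group, while the (1, 1, 1) system is a different surface of the same bulk.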

Hope that helps!

Thank you for your answer. I think I understand.
For the picture you shared, can you tell me the Miller indices of the last two structures? I don’t understand how they were created.
image

I don’t have the exact Miller indices for these; I mainly use them to illustrate different surfaces. For a better understanding of what various Miller indices correspond to, take a look at 1.2: Miller Indices (hkl) - Chemistry LibreTexts.
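
As a rough illustration of the notation: the Miller indices of a plane are obtained by taking the reciprocals of its intercepts with the crystal axes (in units of the lattice vectors) and scaling them to the smallest set of integers. A minimal sketch follows; the function name and the use of None for "the plane is parallel to this axis" are illustrative conventions, not part of any OCP tooling.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+


def miller_indices(intercepts):
    """Convert axis intercepts (in lattice-vector units) to Miller indices.

    Use None for an axis the plane never crosses (intercept at infinity).
    """
    # Reciprocal of each intercept; a parallel axis contributes 0.
    reciprocals = [
        Fraction(0) if x is None else 1 / Fraction(x) for x in intercepts
    ]
    if all(r == 0 for r in reciprocals):
        raise ValueError("a plane must cross at least one axis")

    # Clear the denominators, then divide out any common factor.
    scale = lcm(*(r.denominator for r in reciprocals))
    integers = [int(r * scale) for r in reciprocals]
    common = gcd(*integers)
    return tuple(i // common for i in integers)


print(miller_indices((1, None, None)))  # plane cutting only the first axis
print(miller_indices((1, 1, 1)))        # plane cutting all three axes equally
print(miller_indices((2, 3, None)))     # unequal intercepts, parallel to c
```

The first two calls correspond to the (100) and (111) surfaces mentioned earlier for Cu.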

Ok, I see. Thank you so much for helping me.

Now I have a new question. :disappointed_relieved:
If I have a sid for a material, or a bulk_id for that material, how do I find the LMDB Data object of that material? Right now I can only go through all the materials in the entire OC22 dataset until I find the matching one. Is there any other way?

You may have an easier time downloading the trajectories listed here: ocp/DATASET.md at main · Open-Catalyst-Project/ocp · GitHub. These are the full trajectories for the different sids, so you can get what you need there rather than searching the LMDBs.
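
If you do stay with the LMDBs, one way to avoid rescanning them for every query is to make a single pass and build a lookup table from sid to position. This is only a sketch under assumptions: it assumes each record exposes a sid attribute, and the toy list below (with made-up values) stands in for the LMDB-backed dataset. Positions are stored as lists in case one sid appears more than once.

```python
from types import SimpleNamespace

# Toy stand-in for an LMDB-backed dataset; the sids and natoms are invented.
dataset = [
    SimpleNamespace(sid=101, natoms=40),
    SimpleNamespace(sid=205, natoms=52),
    SimpleNamespace(sid=101, natoms=40),
    SimpleNamespace(sid=317, natoms=61),
]


def build_sid_index(dataset):
    """Scan the dataset once and map each sid to the positions holding it."""
    index = {}
    for position, record in enumerate(dataset):
        index.setdefault(record.sid, []).append(position)
    return index


sid_index = build_sid_index(dataset)

# Later lookups are dictionary accesses instead of full scans.
positions = sid_index.get(101, [])
records = [dataset[i] for i in positions]
print(positions, [r.natoms for r in records])
```

The index can be saved once (for example with pickle) and reused, so the full scan only has to happen a single time.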