Extracting Surface Energy Data from the OC22 Dataset

Hello OC22 community,

I’m currently working with the OC22 dataset and have developed a physics-based machine learning model that takes the surface energy of each slab as a key input feature. The OC22 paper mentions that “this enables the community to study other surface properties like surface energy,” which is exactly what I am interested in.

My goal is to extract surface energies for all clean surface models and use them in my machine learning model for high-throughput screening. As I understand it, I can locate each surface through its traj_id in the OC22 mapping and then read the quantities I need for the slab (such as its energy and surface area) from the last frame of its trajectory.
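To make the slab-side step concrete, here is a minimal sketch of what I have in mind. It assumes the trajectories are readable with ASE and that the surface lies in the plane of the first two lattice vectors; the names `slab_surface_area` and `read_final_frame` are my own and do not come from any OC22 tooling.

```python
import math


def slab_surface_area(cell):
    """Area spanned by the first two lattice vectors, |a x b|.

    `cell` is any 3x3 sequence of lattice vectors (rows). Assumes the
    surface normal points along the third lattice direction.
    """
    a, b = cell[0], cell[1]
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz)


def read_final_frame(traj_path):
    """Return (energy, area, n_atoms) for the last frame of a trajectory.

    Hypothetical helper: assumes an ASE-readable file whose last frame
    stores the total energy. ASE is imported lazily so the area helper
    above can be used without it.
    """
    from ase.io import read

    atoms = read(traj_path, index=-1)  # index=-1 selects the last frame
    energy = atoms.get_potential_energy()
    return energy, slab_surface_area(atoms.cell), len(atoms)
```

For an orthogonal 3 Å by 4 Å surface cell, `slab_surface_area` gives 12 Å².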

However, I’m uncertain whether the OC22 database provides information about the corresponding bulk for these surfaces. Bulk energy and composition information are essential for calculating surface energy.
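To show why the bulk reference matters, here is a minimal sketch of the calculation I would like to run. It assumes a stoichiometric slab with two equivalent surfaces, so that the surface energy is (E_slab - N * E_bulk_per_atom) / (2A); non-stoichiometric oxide slabs would need additional chemical potential terms that are not covered here. The function name and arguments are my own.

```python
# Conversion factor: 1 eV/Å^2 expressed in J/m^2.
EV_PER_A2_TO_J_PER_M2 = 16.02176634


def surface_energy(e_slab, n_slab_atoms, e_bulk_per_atom, area):
    """Surface energy in eV/Å^2 for a stoichiometric, symmetric slab.

    e_slab          : total energy of the relaxed slab (eV)
    n_slab_atoms    : number of atoms in the slab
    e_bulk_per_atom : bulk energy per atom at the same composition (eV)
    area            : area of one surface of the slab (Å^2)
    """
    if area <= 0:
        raise ValueError("area must be positive")
    # The factor of 2 accounts for the top and bottom surfaces of the slab.
    return (e_slab - n_slab_atoms * e_bulk_per_atom) / (2.0 * area)


# Illustrative numbers only, not taken from OC22.
gamma = surface_energy(e_slab=-100.0, n_slab_atoms=20,
                       e_bulk_per_atom=-5.1, area=10.0)
gamma_si = gamma * EV_PER_A2_TO_J_PER_M2  # same value in J/m^2
```

Without `e_bulk_per_atom` from a bulk calculation at a matching level of theory, this quantity cannot be evaluated, which is why I am asking about the bulk data.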

My questions are:

  1. How can I obtain bulk information from OC22 that corresponds to the surface structures?
  2. Has the OC22 team provided any convenient scripts or tools for extracting surface energy data for all surfaces in the dataset?
  3. Have other researchers in the community had similar requirements or experiences they could share?

Any insights, suggestions, or resources would be greatly appreciated. Thank you in advance for your help!