VASP parameters for OCP dataset generation

We have been trying to use some models trained on the OCP dataset in conjunction with our own data, and have run into some discrepancies. Mainly, we want to ensure that when we run VASP on our own systems of interest, the results are consistent with the ground truth in the OCP training data. Are the VASP settings in the code on this page exactly the ones used for all calculations in the training set? We ran VASP with these parameters on a few systems taken from the training data, and the results were not entirely consistent with those in the dataset.

Yes, that code was used for the OC20 dataset. Specifically: Open-Catalyst-Dataset/vasp.py at a7832a430957e9cba3f0e08ed213408114ec47ef · Open-Catalyst-Project/Open-Catalyst-Dataset · GitHub.

Are you comparing your values to the values in the LMDB? If so, those values won’t be consistent, because the LMDB values are adsorption energies rather than raw energies. You want to compare against the raw trajectories found here: ocp/DATASET.md at main · Open-Catalyst-Project/ocp · GitHub.

We are using the raw trajectories and calculating both the total and reference energies using VASP (with the settings from that file) and subtracting them to obtain the adsorption energies.

Are you seeing the discrepancy in raw energies? Adsorption energies? Or both? I’m trying to identify whether it’s a referencing issue or VASP.

To help debug, could you share the outputs of the adsorption energy calculation for one sample system (randomid?) that you’re trying to reproduce?

This sheet shows the results of our comparison on a handful of systems from the training set. We took the trajectories from the dataset page, then ran VASP calculations of a) the full initial system, b) the initial system without the adsorbate, and c) the adsorbate alone. For step c) we also tried using the E_gas table given in the paper instead of simulating the adsorbate energies. We then compared the resulting energies to the energies recorded in the final state of the trajectory and to the given reference energies.
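For clarity, the arithmetic we are applying can be sketched roughly as below. This is only an illustration: the function names are ours, the per-element reference energies and the total energies are placeholder numbers (not the values from the paper's E_gas table or from any real calculation), and we are assuming the gas reference is built as a sum of per-element reference energies over the adsorbate's atoms.

```python
from collections import Counter


def gas_reference_energy(adsorbate_symbols, atomic_refs):
    """Sum per-element reference energies over the atoms of the adsorbate.

    atomic_refs maps an element symbol to its reference energy in eV.
    (Assumption: the gas reference is a linear combination of
    per-element energies; check this against the paper's E_gas table.)
    """
    counts = Counter(adsorbate_symbols)
    return sum(n * atomic_refs[element] for element, n in counts.items())


def adsorption_energy(e_system, e_slab, e_gas):
    """E_ads = E(slab + adsorbate) - E(slab) - E(gas reference), all in eV."""
    return e_system - e_slab - e_gas


# Placeholder numbers for illustration only, not real DFT results.
atomic_refs = {"H": -3.0, "O": -7.0, "C": -7.0, "N": -8.0}
e_gas = gas_reference_energy(["C", "O"], atomic_refs)
e_ads = adsorption_energy(e_system=-310.5, e_slab=-295.0, e_gas=e_gas)
print(f"E_gas = {e_gas:.3f} eV, E_ads = {e_ads:.3f} eV")
```

Keeping the three terms separate like this makes it easier to see whether a mismatch comes from the system energy, the slab reference, or the gas reference.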

I’m trying to understand the spreadsheet you shared. A few questions and comments:

  1. Using the OCP E_gas values is the correct methodology. We don’t run a separate calculation of each adsorbate for the gas reference.

  2. Have you tried a sanity check with just single-point calculations before running the full optimization? For example, take the initial or relaxed structure of an OCP system, set NSW=0 in the INCAR, and compare the resulting energy with the value in the trajectory. It will be easier to debug starting from there, as there are some nuances to how long relaxations were actually allowed to run despite the INCAR settings.
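A single-point check along these lines could be sketched with ASE as follows. Treat it as a rough outline under some assumptions: the helper names and the `sp_check` directory are made up for illustration, the relaxation settings shown are placeholders (take the real ones from the vasp.py file linked above), the trajectory frame is assumed to carry its stored energy, and ASE's VASP calculator needs a working VASP command and pseudopotential path configured in your environment.

```python
def single_point_settings(relax_settings):
    """Return a copy of the relaxation settings with ionic steps disabled (NSW=0)."""
    settings = dict(relax_settings)
    settings["nsw"] = 0
    return settings


def compare_single_point(traj_path, relax_settings, frame=0, directory="sp_check"):
    """Recompute one trajectory frame with NSW=0 and compare with the stored energy.

    Returns (new_energy, stored_energy, difference) in eV.
    """
    # Imported here so the settings helper above works without ASE installed.
    from ase.io import read
    from ase.calculators.vasp import Vasp

    atoms = read(traj_path, index=frame)
    e_stored = atoms.get_potential_energy()  # energy saved with the frame

    atoms.calc = Vasp(directory=directory, **single_point_settings(relax_settings))
    e_new = atoms.get_potential_energy()  # launches a VASP single-point run
    return e_new, e_stored, e_new - e_stored


# Placeholder settings for illustration; not the actual OC20 flags.
relax_settings = {"nsw": 2000, "ibrion": 2, "encut": 350}
print(single_point_settings(relax_settings))
```

Use `frame=0` for the initial structure or `frame=-1` for the relaxed one. If the single-point energies agree but the relaxed energies do not, the difference more likely comes from how the relaxation was run than from the electronic settings.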