Using our own data from Vasp with the OCP codebase

kaselby · December 14, 2021, 5:35pm

Hello,

We are looking to test some of the models in the OCP repo on our own data, which was generated using VASP. How do we get our data into the right format for the models? I’ve looked at the dataset creation tutorial and the data preprocessing tutorial, but both seem to assume that the data was generated and saved by ASE. I’ve also looked at importing VASP runs into ASE, but it only seems to import the final, post-relaxation positions/energies, and the format here also requires the pre-relaxation positions. Do I need to write my own script to turn VASP’s outputs into a format parseable by ASE that has all the information we need, and then apply the pipelines shown in those tutorials? Or is there an easier way?

mshuaibi · December 14, 2021, 6:08pm

Hi,

Glad to hear you’re interested in using your own data with the OCP repo. All our data was generated with VASP and then converted via ASE. Assuming you have successfully run VASP, you should see an OUTCAR file that corresponds to the results. The following code will allow you to read in your VASP data for a particular system in a format ready to be used in the dataset creation tutorial:

import ase.io
# ":" selects every ionic step in the OUTCAR; the result is a list of ase.Atoms objects
data = ase.io.read("OUTCAR", ":")

Let me know if this works for you. Alternatively, if you share some code snippets from your end I can better help debug what could be different in our workflows.

kaselby · December 14, 2021, 7:22pm

I see. I had thought that the OUTCAR file only contained the positions for the atoms post-relaxation, and not their initial positions, which we need for the IS2RE/IS2RS tasks.

mshuaibi · December 14, 2021, 7:28pm

If you ran a relaxation via VASP your OUTCAR includes all states - initial position, intermediates, and final position. You can index out initial + final positions accordingly:

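import ase.io

data = ase.io.read("OUTCAR", ":")  # all ionic steps of the relaxation
initial_position = data[0]   # first frame: pre-relaxation structure
relaxed_position = data[-1]  # last frame: final structure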

Note - the presence of a full OUTCAR doesn’t necessarily mean a relaxed state was reached. The run could have hit the maximum number of steps and terminated there. We filtered out such systems by keeping only those where the max absolute force on the final state was less than 0.05 eV/Å.
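
A minimal sketch of such a filter follows. It assumes "max absolute force" means the largest absolute Cartesian force component over all atoms; the helper names `max_abs_force` and `is_relaxed` are hypothetical and are not part of the OCP repo. The functions accept any sequence of per-atom force vectors, including the array returned by `get_forces()` on an ASE `Atoms` object.

```python
def max_abs_force(forces):
    """Return the largest absolute Cartesian force component over all atoms."""
    return max(abs(component) for atom_force in forces for component in atom_force)


def is_relaxed(forces, fmax=0.05):
    """Return True if every force component is below fmax in magnitude.

    fmax is in the same units as the forces (eV/Å for ASE); 0.05 is the
    threshold quoted in the post above.
    """
    return max_abs_force(forces) < fmax


# Usage with ASE (assumes an OUTCAR from a VASP relaxation in the working directory):
#   import ase.io
#   data = ase.io.read("OUTCAR", ":")
#   keep_system = is_relaxed(data[-1].get_forces())
```

If your convergence criterion is defined on the per-atom force norm instead, replace the component-wise maximum with the maximum of the vector norms.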

kaselby · December 14, 2021, 7:29pm

Ah, okay, I must have misunderstood something then. Thanks very much!

yinlliang · November 7, 2023, 11:27am

Following up on this, I have a question: how do I put the processed data into the OC22LmdbDataset class? I loaded the class, but the result doesn’t seem to be usable as input to a model for training.
