How do I upload datasets derived from engineered descriptors (.csv files), model scripts (train, validation, test), and associated scripts to the evaluation server?

Hi,

I am finding it hard to work out how to upload to the eval server the datasets (train, validation, test) that I generated from the lmdb files as .csv files, which are then used by the training/validation/testing scripts to get the MAE/EwT metrics. I constructed new descriptors to transform the lmdb data (train/validation/test) into .csv files, and I run the training/validation/testing steps on those transformed files. Are there any basic, example-level guidelines on how to upload the derived datasets and the relevant scripts to the eval server? This link does not clearly explain what to do or how to proceed when the transformed datasets are .csv files: ocp/train.md at release · Open-Catalyst-Project/ocp · GitHub. Any help and detailed guidelines would be highly appreciated.

Thanks,
Rajarshi

Hey @rajarshiche, you only need to submit one file to the evaluation server — the predictions on the evaluation set — not the dataset files or any other scripts.

If this is for the Open Catalyst Challenge, see the “Submission Guidelines” section here: Open Catalyst Challenge. We describe the format of the file to be uploaded — it’s a numpy binary file with challenge_ids and challenge_energy arrays. There’s also a dummy submission file here that you can download and inspect.
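
For illustration, here is a minimal sketch of writing such a file with NumPy. It assumes the two array names mentioned above are the only keys needed; the helper name, file name, and values are hypothetical placeholders, and the dummy submission file remains the reference for the exact format.

```python
import os
import tempfile

import numpy as np


def save_challenge_predictions(path, ids, energies):
    """Write IDs and predicted energies to an .npz file (hypothetical helper)."""
    ids = np.asarray(ids, dtype=str)
    energies = np.asarray(energies, dtype=np.float64)
    if ids.shape != energies.shape:
        raise ValueError("need exactly one energy per ID")
    # Key names follow the two arrays named in the reply above.
    np.savez_compressed(path, challenge_ids=ids, challenge_energy=energies)


# Placeholder values only; real ones come from your model's predictions.
path = os.path.join(tempfile.mkdtemp(), "challenge_submission.npz")
save_challenge_predictions(path, ["1_1", "1_2", "1_3"], [-3.63920, -1.08237, 12.92103])

# Reload the file to inspect what would be uploaded.
loaded = np.load(path)
print(loaded.files)
```

Reloading the file with np.load before uploading is a cheap way to confirm that both arrays are present and have the same length.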

If this is not for the challenge, and you’re looking to submit to the usual OC20 validation or test evaluation servers, those are numpy binaries as well. Here are a few example submission files from GemNet-OC: S2EF on test, IS2RE on test.

Hope this helps, happy to answer follow-ups!

Thanks for the submission guidelines. I am looking to submit predictions for the IS2RE-Total task as attached (listed under Open Catalyst 2022 (OC22) in ocp/DATASET.md at main · Open-Catalyst-Project/ocp · GitHub), using the dataset that has test_id/test_ood/val_id/val_ood splits. Just to clarify: for each split, do I have to submit a binary .npz containing the following arrays?
{
"challenge_ids": array(['1_1', '1_2', …]),
"challenge_energy": array([-3.63920, -1.08237, 12.92103, …])
}

Hi -

Sorry about that. We are in the process of updating the script that creates the correct submission file. It can be found here: ocp/make_submission_file.py at predict_fp · Open-Catalyst-Project/ocp · GitHub. We plan to merge these changes soon. But the npz file that gets created and is used by the evaluation server will look something like this:

{
"id_ids": array(['1', ...]),
"id_energy": array([2342, ...]),
"ood_ids": array(['6', ...]),
"ood_energy": array([7935, ...]),
}
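
As a rough sketch, a file with that layout could be assembled by hand with NumPy as shown below. This is not the make_submission_file.py script itself; the helper name, the dict-based inputs, the file name, and the energy values are all hypothetical placeholders, and only the four key names come from the layout above.

```python
import os
import tempfile

import numpy as np


def build_split_arrays(id_preds, ood_preds):
    """Build the four arrays from two {system_id: energy} dicts (hypothetical helper)."""
    arrays = {}
    for split, preds in (("id", id_preds), ("ood", ood_preds)):
        ids = list(preds.keys())  # keep the order the predictions were given in
        arrays[f"{split}_ids"] = np.asarray(ids, dtype=str)
        arrays[f"{split}_energy"] = np.asarray([preds[i] for i in ids], dtype=np.float64)
    return arrays


# Placeholder predictions only; real ones come from your model.
arrays = build_split_arrays({"1": -3.5, "2": 0.25}, {"6": 1.75})

out_path = os.path.join(tempfile.mkdtemp(), "is2re_total_submission.npz")
np.savez_compressed(out_path, **arrays)

# Reload to confirm the keys match the layout above.
check = np.load(out_path)
print(sorted(check.files))
```

Keeping each ID next to its energy in a dict until the last step makes it harder for the two arrays of a split to drift out of alignment.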

Hope this helps!


Thanks for the clarification; it did the job. One follow-up question: do the id_energy and ood_energy values have to be scaled in a specific way, rather than being the absolute relaxed energy values, when uploading to the EvalAI server? Is there a specific scaling factor to be used for the EvalAI submission for OC22?