NeurIPS ‘21 Challenge Updates

Hi All -

With less than a month until the release of the test-challenge dataset split, we look forward to seeing what the community has to offer. We’d like to post a few updates addressing concerns people have raised previously (see the thread "IS2RE Leaderboard Concerns").

  1. We recognize that the resource availability of participants may vary drastically across research labs and industries. To encourage more participation, we will be recognizing two winners for the NeurIPS ‘21 Challenge based on: (1) the best overall performance and (2) the best performance using ONLY the IS2RE dataset (size 460,328; see ocp/DATASET.md on the master branch of the Open-Catalyst-Project/ocp GitHub repository). You will be prompted at submission time to specify whether or not you used only the IS2RE dataset.
  2. Participants submitting to track (2) are prohibited from using any other datasets and/or pretrained S2EF models. Data augmentation is permitted as long as it comes ONLY from the IS2RE dataset. Pretraining in any form that uses S2EF data will not be allowed for this track. Participants submitting to track (1) are free to use any dataset. Using DFT is prohibited for both tracks.
  3. Participants can submit to both tracks as long as the submissions follow the regulations mentioned above. We will be inviting the winners of each track for an oral presentation at NeurIPS. If a single team wins both tracks, we will additionally invite the second-place team of track 2 to present.
  4. An updated leaderboard will be released with the following additional column (among other minor additions): Dataset (mandatory): IS2RE-only vs Any

Please let us know if you have any questions or concerns. If you are unsure which track your approach falls under, we recommend you reach out to us sooner rather than later to avoid any future confusion.

Good Luck!

-The OCP Team


Question: Does the IS2RE-only leaderboard (track (2)) allow using the Relaxation Trajectories dataset as augmentation?

No - relaxation trajectories are what make up the S2EF dataset, so augmenting with them is not permitted for track 2. Augmentation can only come from direct transformations of the IS2RE dataset (rotations, interpolations, etc.); no external sources are permitted.
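For anyone wondering what a "direct transformation" could look like, here is a minimal sketch of a rotation augmentation. It is only an illustration, not code from the OCP repository: the function names, the plain-tuple representation of positions and cell vectors, and the choice to rotate about the z-axis are all assumptions made for this example. It relies on the relaxed energy being unchanged when the whole structure (atoms and cell together) is rigidly rotated.

```python
import math
import random


def rotate_about_z(vectors, angle):
    """Rotate a list of (x, y, z) vectors by `angle` radians about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in vectors]


def augment_sample(positions, cell, relaxed_energy, rng):
    """Return a rotated copy of one IS2RE-style sample (hypothetical layout).

    positions: list of (x, y, z) atomic coordinates
    cell: list of three lattice vectors, each an (x, y, z) tuple
    relaxed_energy: the target, left untouched because a rigid rotation
        of atoms and cell together does not change the energy
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return (
        rotate_about_z(positions, angle),
        rotate_about_z(cell, angle),
        relaxed_energy,
    )
```

Because the new sample is derived purely from an existing IS2RE structure, no external data enters the training set.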

Hi OCP official teams,

A) Is cross-validation on the whole dataset (train + validation set) permitted for the final competition solution?
B) Can we ensemble multiple models by using different seeds?

Thanks for your reply!

Yes - cross-validation and ensembling are both permitted.
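As a rough illustration of seed ensembling, the sketch below averages the predicted energies of several models trained with different seeds. The function name and the list-of-lists input format are assumptions for this example; a plain mean is just one simple way to combine models, not something prescribed by the challenge.

```python
from statistics import fmean


def ensemble_mean(per_model_predictions):
    """Average predictions element-wise across models.

    per_model_predictions: one list of predicted energies per model
    (e.g. one per random seed), all ordered by the same system IDs.
    """
    n = len(per_model_predictions[0])
    if any(len(preds) != n for preds in per_model_predictions):
        raise ValueError("all models must predict the same number of systems")
    # zip(*...) groups the predictions for each system across models.
    return [fmean(values) for values in zip(*per_model_predictions)]
```

For example, `ensemble_mean([[1.0, 2.0], [3.0, 4.0]])` returns `[2.0, 3.0]`.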

Hi,

What do you mean by cross-validation on this task?

Thanks.

By cross-validation I mean the general idea of using both the train and validation data during training in ways other than a classic train-val split (e.g. k-fold cross-validation). You can find several examples online of exactly how this is done. We haven’t used these methods in any of the OC20 results presented. I hope this answers your question.
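To make the idea concrete, here is a minimal k-fold sketch over the pooled train + validation samples. It is a generic illustration only (we have not used it for any OC20 results); the function name, the seed parameter, and the interleaved fold assignment are choices made for this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, held_out_indices) for each of k folds.

    The n_samples indices refer to the pooled train + validation data.
    Each sample is held out exactly once across the k folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out
```

One model is trained per fold on `train` and evaluated on `held_out`; the k resulting models can then be combined or used to pick hyperparameters.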

So on the IS2RE dataset we can use train + all validation splits (560K) for training?

Thanks

Yes, you may use both the training and validation set and still be included in the IS2RE-only dataset track.

Question: Does the IS2RE-only competition leaderboard (track (2)) allow using the test dataset (the one in IS2RE all, not the competition test dataset) as augmentation for training (e.g. for self-supervised learning)?

Test data of any kind is prohibited for training use in both tracks of the NeurIPS '21 Challenge. Such approaches are, however, permitted on the main OC20 leaderboard.