With less than a month until the release of the test-challenge dataset split, we look forward to seeing what the community has to offer. We'd like to post a few updates regarding some of the concerns people have raised in the past: IS2RE Leaderboard Concerns.
We recognize that the resources available to participants may vary drastically across research labs and industries. To encourage more participation, we will recognize two winners for the NeurIPS '21 Challenge, based on: (1) the best overall performance and (2) the best performance using ONLY the IS2RE dataset (size 460,328): ocp/DATASET.md at master · Open-Catalyst-Project/ocp · GitHub. You will be prompted at submission time to specify whether or not you used only the IS2RE dataset.
Participants submitting to track (2) are prohibited from using any other datasets and/or pretrained S2EF models. Data augmentation is permitted as long as it comes ONLY from the IS2RE dataset. Pretraining in any form that uses S2EF data will not be allowed for this track. Participants submitting to track (1) are free to use any dataset. Using DFT is prohibited for both tracks.
Participants can submit to both tracks as long as the submissions follow the regulations mentioned above. We will invite the winners of each track to give an oral presentation at NeurIPS. If a single team wins both tracks, we will additionally invite the second-place team of track 2 to present.
An updated leaderboard will be released with the following additional column (among other minor additions): Dataset (mandatory): IS2RE-only vs Any
Please let us know if you have any questions or concerns. If you are unsure which track your approach falls under, we recommend you reach out to us sooner rather than later to avoid any future confusion.
No. Relaxation trajectories are what comprise the S2EF dataset, so augmenting with them would not be permitted for track 2. Augmentation can only come from direct transformations of the IS2RE dataset (rotations, interpolations, etc.); no external sources are permitted here.
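To make "direct transformations" a bit more concrete, here is a minimal sketch of one such augmentation: rotating a structure's atomic positions about the z-axis while leaving its energy label unchanged. It assumes positions are plain (x, y, z) tuples rather than the actual OCP data format, and the names `rotate_about_z` and `augment` are hypothetical. For a periodic slab, the cell vectors would presumably need the same rotation applied.

```python
import math
import random


def rotate_about_z(positions, angle):
    """Rotate (x, y, z) coordinates by `angle` radians about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in positions]


def augment(positions, energy, rng):
    """Return a randomly rotated copy of the structure.

    The scalar energy label is passed through unchanged, since a rigid
    rotation of the whole system does not change its energy.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return rotate_about_z(positions, angle), energy


# Toy structure (illustrative numbers only, not taken from IS2RE).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 2.0, 1.0)]
rotated, label = augment(positions, -1.25, random.Random(0))
```

Only the input structure is transformed here; no additional frames or labels from outside the IS2RE data are introduced.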
A) Is cross-validation on the whole dataset (train + validation set) permitted for the final competition solution?
B) Can we ensemble multiple models by using different seeds?
By cross-validation I refer to the general idea of using both train+validation data during training in ways other than a classic train-val split (e.g. k-fold cross-validation). You can find several examples online of exactly how this is done. We haven't used these methods in any of the OC20 results presented. I hope this answers your question.
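For readers unfamiliar with the term, k-fold cross-validation over a pooled train+validation set can be sketched roughly as follows. This is only an illustration of the general idea, not something taken from the OCP codebase; the function name `k_fold_splits` and the toy sizes are assumptions.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, held_out_indices) for each of k folds.

    The indices refer to a single pool formed by concatenating the
    train and validation sets.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


# Toy example: 10 pooled samples, 5 folds.
splits = list(k_fold_splits(n_samples=10, k=5))
```

Each sample is held out exactly once, so a model is trained k times, each time on the remaining k-1 folds.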
Question: Does the IS2RE-only competition leaderboard (track (2)) allow using the test dataset (from IS2RE all, not the competition test dataset) as augmentation for training (e.g. for self-supervised learning)?
Test data of any kind is prohibited for training use in both tracks of the NeurIPS '21 Challenge. Such approaches are, however, permitted on the main OC20 leaderboard.