2nd Open Catalyst Challenge @ NeurIPS 2022

Hi all,

We’re very excited to announce the 2nd Open Catalyst Challenge at NeurIPS 2022!

This year’s challenge will be on the same task — Initial Structure to Relaxed Energy (IS2RE) prediction — and use the same training and validation data — OC20 — as last year.

Unlike last year, we’re planning to have a single track, and we’re allowing (and encouraging!) the use of both the IS2RE and the S2EF (Structure to Energy-Forces) 2M datasets for training.

We’ve found that relaxation-based IS2RE approaches (trained on S2EF-2M) consistently perform significantly better than direct IS2RE approaches, albeit at a higher compute cost.
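
For readers new to the distinction, here is a minimal sketch of the relaxation-based idea: repeatedly query an energy-and-forces model, move the atoms along the predicted forces until they are small, then report the energy of the final structure. Everything in it is an assumption for illustration: `toy_s2ef` is a hypothetical harmonic stand-in for a trained S2EF model, and the steepest-descent loop, step size, and force threshold are placeholder choices rather than the challenge baselines.

```python
import numpy as np

# Toy assumptions: two atoms sitting in harmonic wells around made-up sites.
K = 2.0  # toy spring constant
R0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # toy equilibrium positions


def toy_s2ef(positions):
    """Hypothetical stand-in for an S2EF model: returns (energy, forces)."""
    disp = positions - R0
    energy = 0.5 * K * float(np.sum(disp ** 2))
    forces = -K * disp  # force is the negative gradient of the energy
    return energy, forces


def relax_then_predict(initial_positions, s2ef=toy_s2ef, step=0.1,
                       fmax=0.01, max_steps=200):
    """Steepest-descent relaxation, then the energy of the relaxed structure."""
    pos = np.array(initial_positions, dtype=float)
    for _ in range(max_steps):
        _, forces = s2ef(pos)
        # Stop once the largest per-atom force norm drops below the threshold.
        if np.linalg.norm(forces, axis=1).max() < fmax:
            break
        pos = pos + step * forces
    relaxed_energy, _ = s2ef(pos)
    return relaxed_energy, pos


initial = R0 + np.array([[0.3, -0.2, 0.1], [0.0, 0.4, -0.1]])
energy, relaxed = relax_then_predict(initial)
```

A direct approach would instead map `initial` to an energy in a single model call. The loop above makes many calls per structure, which is where the extra cost comes from.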

To keep compute costs ~manageable, training on S2EF splits larger than 2M is not allowed for the challenge.

The submission deadline for the challenge is October 07, 2022. We’ll be releasing the test dataset for the challenge on September 21, 2022, ~two weeks before the deadline.

More details here: Open Catalyst Challenge.

We saw amazing participation and modeling improvements last year (thank you!), and look forward to more of the same this year!

Please let us know if you have any questions or concerns.

Good luck!

— Open Catalyst team

Hello there,

I have a question regarding this particular line: “Using DFT is not allowed.” Could you elaborate on what you mean by DFT? Do you mean the theory as a whole, codes and/or data derived from it, ideas borrowed from it, or something else? It would be good to be precise.

Thanks!

Hello there,

I have one question: to what extent can we use extra information in training? For example, periodic table information or basic knowledge about the atoms themselves. Or are we not allowed to use any information beyond what is provided in the dataset, i.e. only the atomic numbers?

Thanks!

@abhshkdz, could you comment? Thanks!

Thanks for the questions!

Just to reiterate our motivation for organizing the challenge: we want to encourage methods that can accelerate IS2RE pipelines and make them considerably faster than DFT. Using DFT (at a level of theory equivalent to that of the OC20 data) at test time would be similarly slow, and is hence not allowed.

Having said that, there might be other cheaper calculations (e.g. force fields, reactive force fields, some approximate tight binding methods) that are much faster. That’s fine to do, especially if these calculations take < 1 second per IS2RE prediction (simulating the entire relaxation may take slightly longer, < 10 seconds). We obviously don’t have a way to strictly enforce / check inference times since we just ask for predictions, but would appreciate it if you stick to the spirit and keep the ~1 second ballpark number in mind. Note that most tight binding methods are significantly more expensive than this.
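
As a rough self-check against the ~1 second ballpark, something like the sketch below could be used to time a pipeline locally. The function names and the `budget_s` parameter are hypothetical, `predict_fn` stands for whatever callable produces one IS2RE prediction, and none of this is an official checker.

```python
import time


def mean_seconds_per_prediction(predict_fn, inputs):
    """Run predict_fn on every input; return (mean wall-clock seconds, predictions)."""
    if len(inputs) == 0:
        raise ValueError("need at least one input to time")
    start = time.perf_counter()
    preds = [predict_fn(x) for x in inputs]
    elapsed = time.perf_counter() - start
    return elapsed / len(inputs), preds


def within_budget(seconds_per_prediction, budget_s=1.0):
    """True if the measured per-prediction time is under the chosen budget."""
    return seconds_per_prediction < budget_s


# Usage with a trivial placeholder "model".
per_pred, preds = mean_seconds_per_prediction(lambda x: 2 * x, [1, 2, 3])
ok = within_budget(per_pred)  # use budget_s=10.0 for full relaxations
```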

Other auxiliary features (e.g. Bader charges, other element properties as in the CGCNN paper, etc.) are also fine to use for training models, but it’s worth keeping in mind that some of these features (e.g. Bader charges) might not be available at test time. We recently released Bader charge data for OC20 training / validation here: ocp/DATASET.md at main · Open-Catalyst-Project/ocp · GitHub.
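
To illustrate what element-level features can look like, here is a small sketch that maps atomic numbers to a few tabulated properties. The table, its three columns (group, period, Pauling electronegativity), and the `featurize` helper are illustrative assumptions, not the CGCNN feature set. Because these values depend only on the atomic number, they remain available at test time, unlike Bader charges.

```python
import numpy as np

# Hypothetical mini-table: atomic number -> (group, period, Pauling electronegativity).
ELEMENT_PROPS = {
    1: (1, 1, 2.20),    # H
    6: (14, 2, 2.55),   # C
    8: (16, 2, 3.44),   # O
    29: (11, 4, 1.90),  # Cu
    78: (10, 6, 2.28),  # Pt
}


def featurize(atomic_numbers):
    """Return an (n_atoms, 3) array of per-atom element properties."""
    return np.array([ELEMENT_PROPS[z] for z in atomic_numbers], dtype=float)


feats = featurize([78, 6, 8])  # e.g. a Pt atom with C and O
```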

Hi,

The IS2RE data contains the relaxed structure for each initial structure. I want to confirm: are we allowed to use the relaxed structures? Thanks!

Limei

Yes, using relaxed structures is allowed.

Hello @mshuaibi, are IS2RE validation data allowed for training? Just curious.

Yep, you’re allowed to use IS2RE validation data for training.
