We’re thrilled to announce the 3rd Open Catalyst Challenge at NeurIPS 2023!
This year’s focus: computing adsorption energy — which builds on our past challenges and moves closer to practical applications.
An important quantity for screening catalysts is the adsorption energy, i.e., how strongly a molecule (the “adsorbate”) interacts with the catalyst’s surface. This year’s task is to find the adsorption energy given an adsorbate and a catalyst surface.
While innovation on this task can take various forms (more details on the website), one approach is: given a set of ML-relaxed structures, predict their energies and identify the global minimum energy structure and its corresponding energy (the adsorption energy). This could be accomplished with initial structure to relaxed energy (IS2RE) models, which were the focus of our previous challenges. So, participants could use models (or variants of models) from previous IS2RE challenges as a starting point.
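The approach above can be sketched in a few lines (a minimal, hypothetical sketch: `predict_energy` stands in for any trained IS2RE-style model, and the candidate structures are represented as plain dicts rather than real atomic structures):

```python
# Hypothetical sketch: given ML-relaxed candidate structures for one
# adsorbate/surface pair, predict each energy and keep the minimum.

def predict_energy(structure):
    # Placeholder for a trained energy model (e.g. an IS2RE-style
    # checkpoint); here we just read a precomputed value.
    return structure["predicted_energy"]

def adsorption_energy(candidates):
    """Return the lowest predicted energy and the structure that attains it."""
    best = min(candidates, key=predict_energy)
    return predict_energy(best), best

# Toy usage with dummy candidate configurations.
candidates = [
    {"id": "config_0", "predicted_energy": -0.42},
    {"id": "config_1", "predicted_energy": -1.07},
    {"id": "config_2", "predicted_energy": -0.88},
]
energy, structure = adsorption_energy(candidates)
# energy == -1.07, structure["id"] == "config_1"
```

In practice the candidate set would come from relaxing many initial adsorbate placements with an ML force field, and the energy predictions from a trained model; only the select-the-minimum step is shown literally here.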
This year participants are allowed to train on the OC20 IS2RE/S split and the 2M S2EF split, as well as the OC20-Dense ID split. OC20-Dense was part of the recent AdsorbML release and focuses on the identification of adsorption energies. The training data is limited to a subset of OC20 to keep the computational burden manageable.
The challenge deadline is Oct 06, 2023. We’ll be releasing the test dataset for the challenge on September 20, 2023, about two weeks before the deadline.
We will start hosting office hours on Wednesdays from 9am-10am PT and Thursdays from 4pm-5pm PT. We will kick off the first office hours Thursday (7/20) and Wednesday (7/26) with a presentation providing an overview of this year’s challenge and ways to approach it. The Zoom link for office hours is here: Launch Meeting
The winners’ announcement and presentations will be featured in the AI for Science workshop at NeurIPS this year — we’re really excited that a couple teams will be able to present their work in front of such an amazing audience!
Hi,
Regarding this statement from the original post: “The training data is limited to a subset of OC20 to keep the computational burden manageable.”, we are pre-computing additional training features using RDKit and PySCF for the OC20 training dataset. Would this be allowed?
Hi, can you provide more details on what features you would be pre-computing? Or what type of calculation you would be running with PySCF?
In general, we are not allowing participants to use DFT on the test set. So, if you would need to run a DFT calculation (or similar) with PySCF to make predictions on the test set, that approach would not be allowed.
Thank you for your reply. To clarify, we are not conducting any DFT calculations using PySCF. Instead, we are using PySCF for featurization at the atom and electron levels for the training data. We then train an intermediate network to predict these features during inference time. Our approach does not involve any PySCF calculations during inference.
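For concreteness, the general idea described here can be sketched as follows (everything below is hypothetical and illustrative: a least-squares linear map stands in for the intermediate network, and random arrays stand in for real descriptors and PySCF-derived features):

```python
import numpy as np

# Sketch of feature distillation: expensive featurization (e.g. PySCF)
# runs offline on training data only; an intermediate model learns to
# predict those features, so inference never calls the expensive tool.

rng = np.random.default_rng(0)

# Stand-in for cheap per-structure descriptors (available at inference).
X_train = rng.normal(size=(200, 8))
# Stand-in for expensive precomputed features (offline targets only).
W_true = rng.normal(size=(8, 3))
F_train = X_train @ W_true

# "Intermediate network": here just a least-squares linear map.
W_hat, *_ = np.linalg.lstsq(X_train, F_train, rcond=None)

# Inference: predict the expensive features from cheap inputs alone,
# then feed them to the downstream energy model.
X_test = rng.normal(size=(5, 8))
F_pred = X_test @ W_hat
```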
Hi - to clarify, are you referring to all the initial structures in ASE format (the equivalent of the LMDBs)? If so, we can release those as well for convenience.
Hi, thanks for the clarification. Yes, that approach would be allowed. One thing to keep in mind is that all models (including intermediate models) can only be trained on the permitted data listed on the challenge website in the dataset section. For example, if you are utilizing S2EF data it is limited to the 2M set.
Maybe a stupid question! During office hours you mentioned you would share some code we could start with. Did you publish it already? I can’t find it yet. Otherwise, do you think the AdsorbML code is a good starting point?
I found that the tutorials repo inside OCP is great for beginners like me! It seems to still be a work in progress, but the Docker image is a very good start: GitHub - Open-Catalyst-Project/tutorial
Hi, I am glad you were able to find some tutorial materials to get started with the OCP codebase! We have some additional tutorials here. If you prefer a conda install over Docker, we also have instructions for that here. The validation code that we referenced during the challenge overview is now available and can be found here. If you have more questions about getting started with the OCP codebase, feel free to stop by one of our office hours; we are happy to answer questions!
The challenge submission deadline has been extended by a week, from Oct. 6th to Oct. 13th. The test set will still be released on Sept. 20th.
The use of pre-trained LLMs such as Llama 2 will be allowed for the competition. Any additional training/fine-tuning of pre-trained LLMs should only use the permitted training data listed on the challenge website.
Hi everyone, we have released this year’s test set! Links for both LMDB and ASE formats can be found on the challenge website under the Challenge test data subsection. Submissions will be made through Eval AI and are open until Oct. 13th. Guidance on the submission process and the file format can be found under the submission guidelines section on the challenge website.
If you run into issues or have questions, post them here or join us for office hours on Wednesday 9-10am PT or Thursday 4-5pm PT (zoom link). Good luck to all the participants! We can’t wait to see your innovations!