Open Catalyst 2022 (OC22) Dataset: Oxide Electrocatalysis

We’re very excited to announce the release of the Open Catalyst 2022 (OC22) Dataset: Oxide Electrocatalysis.

Two years ago we released the Open Catalyst 2020 (OC20) Dataset, and we have been impressed by the progress the community has made since. While OC20 spanned a large chemical and material space, it did not include everything. Specifically, OC20 lacked oxide materials, a class of materials that plays an important role in green hydrogen production via the Oxygen Evolution Reaction (OER) and in other oxide chemistries. Today we’re releasing OC22 in the hope of encouraging the development of faster, more accurate models on even more complex systems. OC22 consists of ~60,000 DFT relaxations (~9M single point calculations) and took upwards of 20M compute hours. For reference, OC20 took ~70M compute hours and was almost 16x the size of OC22.

While OC20 is not yet a solved problem, we anticipate that OC22 will aid in the development of more generally applicable models and methods. Notably, OC22 changes the energy target to the DFT total energy instead of the adsorption energy. Predicting the DFT total energy is a more challenging task, but it also allows models to screen surface configurations, an important and necessary step for studying OER. We’ve released a new dataloader that allows you to explore the same task for OC20 as well.
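To make the difference between the two targets concrete, here is a minimal sketch of how an adsorption energy relates to DFT total energies, assuming the usual definition (total energy of the slab with adsorbate, minus the clean slab, minus a gas-phase reference for the adsorbate). The function names and all numbers are hypothetical and chosen for illustration; none of this comes from the OC22 code or data.

```python
# Illustrative sketch only: names and values are hypothetical, not from OC20/OC22.

def adsorption_energy(e_system: float, e_slab: float, e_gas_ref: float) -> float:
    """Adsorption energy from three DFT total energies (all in eV).

    Assumes the common definition:
        E_ads = E(slab + adsorbate) - E(clean slab) - E(gas-phase reference)
    """
    return e_system - e_slab - e_gas_ref


def most_stable_surface(slab_energies: dict[str, float]) -> str:
    """Return the label of the lowest total-energy slab.

    Simplification: a direct comparison like this only makes sense for
    slabs with the same composition.
    """
    return min(slab_energies, key=slab_energies.get)


# Made-up total energies (eV) for one adsorbate on one slab.
e_system = -310.25   # slab + adsorbate
e_slab = -300.00     # clean slab
e_gas_ref = -9.50    # gas-phase reference for the adsorbate

e_ads = adsorption_energy(e_system, e_slab, e_gas_ref)

# Made-up total energies for two terminations of the same slab composition.
terminations = {"termination_A": -300.00, "termination_B": -301.40}
best = most_stable_surface(terminations)
```

The adsorption energy subtracts out the clean-slab energy, so a model trained only on that target has no way to rank the two terminations. A total energy model predicts the slab energies themselves, which is what makes the surface comparison possible.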

One question that arises when new datasets are created is whether the new data complements existing datasets, or vice versa. In this work, we explore the extent to which OC20 can aid OC22, either via transfer learning or by jointly training on both datasets. We hope the existence of both datasets will also encourage the community to explore transfer learning strategies to aid catalyst applications more broadly.

For more details make sure to check out our paper.
Dataset download: ocp/ at main · Open-Catalyst-Project/ocp · GitHub

