New GemNet-dT code, results, model weights

abhshkdz · August 27, 2021, 7:17pm

Hi all,

We just released an implementation of GemNet-dT (following arxiv.org/abs/2106.08903) on the OCP repository along with pretrained model weights. This model achieves the best results we know of thus far across all OCP tasks (see leaderboards). This was made possible by Johannes Klicpera, who implemented GemNet in the OCP codebase over his summer internship with us. Thank you!

Specifically, improvements (averaged across all splits) compared to the next-best entry on the leaderboard:

— IS2RE energy MAE (via relaxation), 0.4342 —> 0.3997 (7.9%↑ relative)
— S2EF force MAE, 0.0297 —> 0.0242 (18.5%↑ relative)
— IS2RS AFbT, 21.8% —> 27.6% (26.6%↑ relative)
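For anyone who wants to check the relative numbers above, here is a minimal sketch of the arithmetic. The helper name `relative_improvement` is hypothetical and not part of the OCP codebase; it assumes the improvement is measured relative to the previous best value, with lower being better for the MAE metrics and higher being better for AFbT.

```python
def relative_improvement(old, new, lower_is_better=True):
    """Percent improvement of `new` over `old`, relative to `old`.

    Hypothetical helper for illustration only (not an OCP function).
    """
    delta = (old - new) if lower_is_better else (new - old)
    return 100.0 * delta / old


# Values copied from the list above.
is2re = relative_improvement(0.4342, 0.3997)                     # energy MAE, lower is better
s2ef = relative_improvement(0.0297, 0.0242)                      # force MAE, lower is better
is2rs = relative_improvement(21.8, 27.6, lower_is_better=False)  # AFbT, higher is better

print(f"IS2RE: {is2re:.1f}%  S2EF: {s2ef:.1f}%  IS2RS: {is2rs:.1f}%")
```

Rounded to one decimal place, this reproduces the 7.9%, 18.5%, and 26.6% figures quoted in the list.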

GemNet-dT is also relatively efficient. For S2EF, we’re able to fit batch sizes of up to 16 (or 32 with AMP) on NVIDIA 32GB V100 GPUs, compared to 8 for DimeNet++ and 3 for SpinConv. Training these models for 24 hours on 16 x V100s gets to 0.025 force MAE on val ID for GemNet, compared to 0.035 for DimeNet++ and 0.047 for SpinConv.

Also included as part of this code release is an implementation of SpinConv (following arxiv.org/abs/2106.09575) and several other improvements. Complete details are in the release notes on GitHub (Open-Catalyst-Project/ocp): Release v0.0.3: GemNet-dT, SpinConv, new data: MD, Rattled, per-adsorbate trajectories, etc.

Note that our team will not be entering GemNet, SpinConv, or any other model in the challenge we’re hosting at NeurIPS. We encourage everyone to refer to and/or build on any of this code for the challenge (or otherwise).

Thanks

hoaxingz · August 30, 2021, 4:49am

Thanks for the release! It seems like you guys are using a very large number of GPUs (64 or 16 V100s?) for multiple days to train the models. If we are students at a research institution with limited GPU resources (e.g. 1 or 2 V100 GPUs for training each model), is it feasible to participate in this competition and get good results, or would you say that large-scale compute is a strong prerequisite to get good results?

mshuaibi · August 30, 2021, 1:48pm

Hi - This is a common concern we’ve been hearing. We discuss it in more detail here: IS2RE Leaderboard Concerns.

TLDR - Models trained on the S2EF dataset (with the kind of compute you mentioned) that then run a relaxation to get the relaxed energy are currently the best-performing approach. Alternatively, training a model on the IS2RE dataset (~250x less data than the S2EF dataset) to directly predict the relaxed energy is something we’re also interested in for compute reasons (direct approaches are 200-400x faster at inference). To address this (not finalized yet), we are leaning towards awarding 2 teams: (1) the overall best performance, irrespective of the dataset/approach used, and (2) the best performance having trained only on the IS2RE dataset (~460k data points). This would allow teams without heavy compute to still compete without being at a significant disadvantage merely due to compute resources.

Let us know if there are any other concerns. We are constantly trying to make the competition as engaging as possible for the community.

ccc · September 5, 2021, 1:54pm

Thanks for sharing this awesome model!
How does GemNet perform on the validation set?

mshuaibi · September 6, 2021, 4:21pm

Hi -

Here are some validation numbers for GemNet:

IS2RE (relaxation)/IS2RS	Energy MAE (eV)	EwT	ADwT
ID	0.397	11.81%	58.21%

S2EF	Energy MAE (eV)	Forces MAE (eV/Å)	Forces Cos
ID	0.234	0.021	0.632
OOD-Ads	0.245	0.024	0.621
OOD-Cat	0.347	0.025	0.575
OOD-Both	0.405	0.032	0.605

SanZhang · September 10, 2021, 1:58pm

Here are the problems I ran into when trying to reproduce this: “Training these models for 24 hours on 16 x V100s gets to 0.025 force MAE on val ID for GemNet”.

It seems that the pre-trained GemNet used a batch size of 2048. However, 16 x V100s cannot fit such a large batch size, and I cannot find any code related to gradient accumulation.
After implementing gradient accumulation myself, the training loss does not look good.
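As a reference point for the gradient accumulation mentioned here, the following is a minimal pure-Python sketch of the property an accumulation loop has to preserve: for a mean-reduced loss and equally sized micro-batches, the accumulated gradient matches the full-batch gradient only if each micro-batch gradient is divided by the number of accumulation steps. The one-parameter model, the toy data, and the function names are all assumptions made for illustration; this is not OCP code and is not a diagnosis of the issue described in this post.

```python
def mse_grad(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the toy model y ~ w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate per-micro-batch gradients, scaling each by 1 / steps."""
    n = len(xs)
    assert n % micro_batch_size == 0, "sketch assumes equally sized micro-batches"
    steps = n // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Dividing by `steps` keeps the sum equal to the full-batch mean gradient.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


# Toy data, made up for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2]
w = 0.5

full = mse_grad(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(abs(full - accum) < 1e-9)
```

If the division by `steps` is left out, the accumulated gradient is `steps` times larger than the full-batch one, which acts like an unintended increase in the learning rate.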

BTW, could you please provide the TensorBoard log files so that we can compare the training process?
Thanks a lot!

mshuaibi · September 10, 2021, 3:18pm

Hi -

The pre-trained GemNet was trained on 64 x 32 GB V100 cards with AMP (--amp at the command line). This allowed for a batch size of 2048, so no gradient accumulation was necessary. At an effective batch size of 512 you should be able to get similar performance as well.
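The batch-size numbers above can be sanity-checked with a short sketch. It assumes plain data-parallel training where the effective batch size is the per-GPU batch size times the number of GPUs (times any accumulation steps), and it uses the per-GPU batch size of 32 with AMP quoted in the first post. The function names are hypothetical, not OCP APIs.

```python
def effective_batch_size(num_gpus, per_gpu_batch, accumulation_steps=1):
    """Samples contributing to one optimizer step under data parallelism."""
    return num_gpus * per_gpu_batch * accumulation_steps


def accumulation_steps_needed(target, num_gpus, per_gpu_batch):
    """Smallest number of accumulation steps that reaches `target` samples."""
    per_step = num_gpus * per_gpu_batch
    return -(-target // per_step)  # ceiling division on integers


full_run = effective_batch_size(num_gpus=64, per_gpu_batch=32)   # 64 x V100 with AMP
small_run = effective_batch_size(num_gpus=16, per_gpu_batch=32)  # 16 x V100 with AMP
steps = accumulation_steps_needed(2048, num_gpus=16, per_gpu_batch=32)

print(full_run, small_run, steps)
```

Under these assumptions, 64 GPUs give 2048 and 16 GPUs give 512 without any accumulation, while matching 2048 on 16 GPUs would take 4 accumulation steps.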

As far as logs, we will certainly look into this and see if it’s possible internally.

SanZhang · September 11, 2021, 2:57am

Thank you very much. You addressed my problems!

limei · November 29, 2021, 11:10pm

Hi,

Have you tried the GemNet-Q model? If so, could you share some training information and/or validation results?

Thanks a lot!

abhshkdz · December 7, 2021, 5:34pm

We’ve tried GemNet-Q, and it’s very hard to train a reasonably sized model given the memory requirements for quadruplets, so we’ve stopped pursuing GemNet-Q on OCP data.

We’re currently working on developing a more memory-efficient version of GemNet-T / Q, but that’s ongoing work, not public yet.

limei · December 7, 2021, 5:58pm

Thank you very much!
