Using GemNet versions and LMDB files

Good day, OCP team,

I am interested in using the GemNet versions in the fairchem repository due to their optimization for multiple GPUs. I would appreciate your guidance on using both versions of the model (gemnet_gp and gemnet_oc). My intention is to use them both for making predictions of energy and forces as well as for performing molecular dynamics. To be more concise, allow me to pose the following questions:

  1. What is the relevance of the LMDB files required for both models in the base.yml file?
  2. What information do both the s2ef files and the LMDB files contain? Do your versions of GemNet incorporate the use of information that Gasteiger’s original version did not use, or is the exact same information used for both performing inferences and dynamics?
  3. Are there specific LMDB files for s2ef to train and test the model? This question arises because, in the download_data.py file, the only files of this format for s2ef are apparently those marked as ‘test’.
  4. To use your versions of GemNet, do I need to convert my data into the .XYZ format used by the original model version, or convert my s2ef data to LMDB? After this conversion, is it only necessary to set the path in the ‘base.yml’ file to make full use of both versions? (I intend to use the model with my own data, for any molecular structure that requires investigation.) Do you have publicly available code to perform these conversions?
  5. What would be the reasons to choose the gemnet_gp model over the gemnet_oc model or vice versa?
  6. Is the command to use the gemnet_oc model analogous to that for gemnet_gp? (python main.py --mode train --config-yml configs/s2ef/all/gp_gemnet/gp-gemnet-xl.yml --distributed --num-nodes 32 --num-gpus 8 --gp-gpus 4)
  7. If its use is not analogous, how can I use the gemnet_oc model?

Thank you for your time and attention.

Hi -

  1. The LMDB files contain the training/validation set. They are used for training, validation, and prediction. Definitely check out our tutorial - Making LMDB Datasets (original format, deprecated for ASE LMDBs), which contains more information on how to create them.

  2. See above for details on what information is used. GemNet-OC was developed directly in our repo by Gasteiger, so yes, it uses the same information.

  3. The only difference between train and test is that the labels (energy and forces) are not stored for test, as those are what the model should be predicting.

  4. The tutorial linked above gives exact steps for creating these datasets. They don’t specifically need to be xyz. Let us know if you have any questions in the process.

  5. GemNet-GP was designed to scale to very large model sizes ([2203.09697] Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations). I would not use it; just focus on GemNet-OC, as it is very accurate and much faster.

  6. Yup, the command is exactly the same.
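
To make points 1 and 3 concrete, here is a minimal sketch of how a train split and a test split differ. It is only an illustration, not the fairchem implementation: a plain Python dict stands in for the LMDB key-value store, records are plain dicts rather than the graph objects the real datasets hold, and the helper names (make_record, write_split, has_labels), the field names, and the CO numbers are all made up for this example. The tutorial above remains the reference for the real format.

```python
import pickle


def make_record(atomic_numbers, positions, energy=None, forces=None):
    """Build one structure record; labels are attached only when provided."""
    record = {
        "atomic_numbers": list(atomic_numbers),
        "pos": [list(p) for p in positions],
    }
    if energy is not None:
        record["energy"] = float(energy)
    if forces is not None:
        record["forces"] = [list(f) for f in forces]
    return record


def write_split(records):
    """Pickle each record under an ASCII integer key.

    The returned dict is a stand-in (assumption) for an LMDB environment,
    which is also a key-value store of bytes -> bytes.
    """
    store = {}
    for idx, record in enumerate(records):
        store[str(idx).encode("ascii")] = pickle.dumps(record)
    store[b"length"] = pickle.dumps(len(records))
    return store


def has_labels(store, idx):
    """Return True if record `idx` carries both energy and forces."""
    record = pickle.loads(store[str(idx).encode("ascii")])
    return "energy" in record and "forces" in record


# Made-up CO molecule; the energy and force values are placeholders.
numbers = [6, 8]
positions = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.13]]

train_store = write_split([
    make_record(numbers, positions, energy=-1.0,
                forces=[[0.0, 0.0, 0.1], [0.0, 0.0, -0.1]])
])
test_store = write_split([make_record(numbers, positions)])  # no labels

print(has_labels(train_store, 0), has_labels(test_store, 0))
```

The structure information (atomic numbers and positions) is present in both splits; only the labels are missing from the test split, which is why the test files can still be used for prediction.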

I would encourage you to spend some time going through the documentation - FAIR-Chem overview. If you scroll down, you’ll find a variety of tutorials on the left-hand side that could be super helpful. For instance, if you are only interested in doing prediction and high throughput is not a priority, we have an ASE calculator interface that lets you use our models and run ASE simulations trivially - Simple simulations using the OCP ASE calculator.