Constructing Graphs from Catalyst-Adsorbate Systems

For all approaches, graph edges were determined by a nearest neighbor search limited by a cutoff radius of 6 Å, retaining up to the 50 nearest neighbors. When computing distances, periodic boundary conditions were taken into consideration.

I was wondering why this specific cutoff radius and neighbor limit were chosen. Is there any scientific provenance to them, and have you experimented with different values too?

Our cutoff radius was selected to ensure the relevant interactions were captured while maintaining computational tractability. Given that the underlying DFT did not consider long-range interactions, we felt comfortable with 6 Å, similar to other works: https://aip.scitation.org/doi/full/10.1063/1.4966192 and https://arxiv.org/pdf/1710.10324.pdf

Our nearest neighbor limit was incorporated to help with model efficiency. We ran the following experiment on a literature dataset before arriving at the value we felt comfortable with:

num_neighbors    energy MAE (eV)
12               0.5655
30               0.4931
50               0.4876
100              0.4843

We still provide the flexibility for users to modify these parameters if they choose to do so: see ocp/preprocess_ef.py at commit e6fdfb0d0194d50b4c500b1d1eea10f040821de3 in the Open-Catalyst-Project/ocp repository on GitHub.
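
To make concrete what the two parameters control, here is a minimal sketch of a cutoff-limited nearest neighbor search under periodic boundary conditions. It is not the code from the OCP repository: the function name build_edges, the brute-force search, and the restriction to an orthorhombic cell with positions already wrapped into the cell are all assumptions made for illustration.

```python
import itertools
import math


def build_edges(positions, cell_lengths, cutoff=6.0, max_neighbors=50):
    """Hypothetical helper: map each atom i to [(j, image_offset, distance), ...].

    Assumes an orthorhombic cell (edge lengths in Angstrom) and positions
    wrapped into [0, L) along each axis.
    """
    # Number of periodic images needed per axis to cover the cutoff sphere.
    reps = [math.ceil(cutoff / length) for length in cell_lengths]
    offsets = list(itertools.product(*(range(-r, r + 1) for r in reps)))

    edges = {}
    for i, pos_i in enumerate(positions):
        candidates = []
        for j, pos_j in enumerate(positions):
            for off in offsets:
                if i == j and off == (0, 0, 0):
                    continue  # skip the atom itself, but keep its periodic images
                shifted = [pos_j[k] + off[k] * cell_lengths[k] for k in range(3)]
                dist = math.dist(pos_i, shifted)
                if dist <= cutoff:
                    candidates.append((j, off, dist))
        # Keep only the closest max_neighbors candidates within the cutoff.
        candidates.sort(key=lambda edge: edge[2])
        edges[i] = candidates[:max_neighbors]
    return edges


# Two atoms near opposite faces of a 10 A cubic cell: they are 9 A apart
# inside the cell, but only 1 A apart through the periodic boundary.
positions = [(0.5, 0.5, 0.5), (9.5, 0.5, 0.5)]
edges = build_edges(positions, (10.0, 10.0, 10.0), cutoff=6.0, max_neighbors=50)
print(edges[0])
```

Without the periodic images, the two atoms in this example would fall outside the 6 Å cutoff and get no edge at all. The max_neighbors cap only matters in dense environments, where more than that many atoms (or images) sit inside the cutoff sphere.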

More recently, we ran the following experiment exploring the trade-off between max neighbors, cutoff radius, and performance for an identical, small DimeNet++ model. Disclaimer: tuning model hyperparameters for each of these combinations would likely change the numbers, for better or worse, but hopefully this gives you an idea.

Hope this helps!

Thank you, this is very useful!