OCP provides metadata for its training and validation data. Will the test dataset also include similar metadata? Also, will the relaxed structures in the traj files be released as part of the test dataset?
OCP metadata:
The metadata is all-encompassing: it contains information on all dataset splits, not just train/val. If you notice anything missing, let me know.
Trajectories will not be released for the test dataset, as they contain the target properties (energy/forces), which we are purposefully withholding to ensure no data leakage for evaluations.
Got it. So, for the final evaluation, we’ll be working with the relaxed structures in the form of an LMDB file.
I have another question. Will LLM-based models be evaluated separately? The announcement mentioned that they “may be” categorized differently.
If you’re referring to the ML-relaxed structures, yes, they’ll be in LMDB format. We’ll release the inputs in both LMDB and ASE formats.
Regarding LLMs, we’re hoping to evaluate all submissions together. However, LLMs are very new and it’s unclear how people may use them to tackle this problem, so we added that disclaimer in case some approaches are viewed as unfair when compared to traditional approaches and we need to group them in their own category.