Question about size of OC20 dataset

Hi there. I'm writing this post to ask a question about the OC20 dataset.

I understood from the OC20 paper that the dataset contains 640,081 relaxations and 133 million single-point calculations. However, the abstract of that same paper states 1,281,040 relaxations and 264.9 million single-point calculations. I'm confused about the roughly twofold difference between these numbers.

Can you explain why the numbers differ by about a factor of two?

Thanks for reading.

Hi -

The numbers quoted in the abstract - 1.28M relaxations and 264.9M single-point calculations - correspond to all of the data spanning the training, validation, and test sets. The 640,081 relaxations and 133M single-point calculations correspond specifically to the training set. Hope that clarifies it!
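
If you want to sanity-check which split you have on disk, here is a minimal sketch (assuming the standard LMDB distribution of the OC20 splits and the generic `lmdb` Python package; the file path is hypothetical) that counts the entries in one LMDB shard - summing this over all shards of a split gives its size:

```python
# Minimal sketch: count entries in a single OC20 LMDB shard.
# The path below is hypothetical - point it at one of the .lmdb files
# from the split you downloaded.
import lmdb

lmdb_path = "s2ef_train/data.0000.lmdb"  # hypothetical path, adjust to your download

env = lmdb.open(
    lmdb_path,
    subdir=False,    # OC20 LMDB shards are single files, not directories
    readonly=True,
    lock=False,
    readahead=False,
)
with env.begin() as txn:
    num_entries = txn.stat()["entries"]  # total key/value pairs in this shard

# Note: a shard may also store a metadata record (e.g. a length key),
# so the raw entry count can differ slightly from the number of samples.
print(f"{lmdb_path}: {num_entries} entries")
env.close()
```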