Optionally download dataset

Can we download only the “extxyz” files for specific system IDs, without having to download the whole dataset?

Unfortunately we don’t have this functionality at this time. You’ll need to download the entire dataset and extract the relevant system. Sorry for the inconvenience!
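
For the "extract the relevant system" step, here is a minimal sketch using only the Python standard library. It is not an official tool. It assumes you already have a per-frame list of system IDs in the same order as the frames in the extxyz file (for example, read from a companion text file if your download has one). The names `iter_extxyz_frames`, `select_frames`, `select_from_xz` and `frame_system_ids` are hypothetical. It relies only on the standard extxyz layout: an atom-count line, one comment line, then one line per atom.

```python
import lzma
from typing import Iterable, Iterator, List, Set


def iter_extxyz_frames(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each frame of an extxyz stream as a list of its lines."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(header)
        # One comment line plus n_atoms atom lines follow the count line.
        body = [next(it) for _ in range(n_atoms + 1)]
        yield [header] + body


def select_frames(
    lines: Iterable[str],
    frame_system_ids: Iterable[str],  # ASSUMPTION: one ID per frame, in file order
    wanted: Set[str],
) -> List[str]:
    """Return the text of every frame whose system ID is in `wanted`."""
    kept = []
    for sid, frame in zip(frame_system_ids, iter_extxyz_frames(lines)):
        if sid in wanted:
            kept.append("".join(frame))
    return kept


def select_from_xz(path: str, frame_system_ids: Iterable[str], wanted: Set[str]) -> List[str]:
    """Same as select_frames, but streams a compressed .extxyz.xz file."""
    with lzma.open(path, "rt") as fh:
        return select_frames(fh, frame_system_ids, wanted)


# Tiny in-memory demo with made-up frames and made-up IDs.
demo = "2\nframe A\nCu 0 0 0\nCu 1 0 0\n1\nframe B\nH 0 0 0\n"
kept = select_frames(demo.splitlines(keepends=True), ["random1", "random2"], {"random2"})
```

Because the file is streamed frame by frame, only the frames you keep are held in memory, but the full archive still has to be downloaded first.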

When I retrieve all system IDs from the file downloaded from "https://dl.fbaipublicfiles.com/opencatalystproject/data/oc20_data_mapping.pkl", some system IDs do not seem to appear in either the ‘extxyz’ (‘extxyz.xz’) or the ‘lmdb’ files. For example, when I collected all adsorbate+catalyst items with a pure copper surface, 355 entries in ‘oc20_data_mapping.pkl’ satisfied the criteria, but 33 of them are missing from the downloaded data. So, does the ‘oc20_data_mapping.pkl’ file include information about structures that do not exist in the dataset? Thanks!

The mapping file is a superset of the dataset. Test systems, for instance, appear in the mapping file but are not contained within the extxyz files. At some stage, some structures were also excluded, either for convergence reasons or for unrelated internal reasons. So yes, there may be systems that appear in the pickle file that aren’t contained in the downloaded data.
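
Since the mapping is a superset, one way to see exactly which entries are absent is a set difference between the IDs selected from the mapping and the IDs found in the downloaded files. The sketch below assumes the pickle loads as a dict keyed by system ID with a metadata dict per entry. The `bulk_symbols` field used in the demo filter, the sample IDs, and the helper names `load_mapping` and `split_by_presence` are illustrative assumptions, so check the keys in your own copy.

```python
import pickle
from typing import Callable, Dict, Iterable, Set, Tuple


def load_mapping(path: str) -> Dict[str, dict]:
    """Load the mapping pickle (assumed to be a dict: system ID -> metadata)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def split_by_presence(
    mapping: Dict[str, dict],
    present_ids: Iterable[str],
    keep: Callable[[dict], bool] = lambda meta: True,
) -> Tuple[Set[str], Set[str]]:
    """Split the mapping entries that pass `keep` into (found, missing)."""
    candidates = {sid for sid, meta in mapping.items() if keep(meta)}
    found = candidates & set(present_ids)
    return found, candidates - found


# In-memory demo; IDs and the "bulk_symbols" key are made up for illustration.
demo_mapping = {
    "random1": {"bulk_symbols": "Cu"},
    "random2": {"bulk_symbols": "Cu"},
    "random3": {"bulk_symbols": "Pt"},
}
downloaded_ids = ["random1", "random3"]
found, missing = split_by_presence(
    demo_mapping, downloaded_ids, keep=lambda meta: meta.get("bulk_symbols") == "Cu"
)
```

In the copper example from the question, the candidate set would have 355 entries and `missing` would hold the 33 IDs that are in the mapping but not in the downloaded data.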