All trained models, training sets, and artifacts generated by the models have been uploaded to Zenodo. The files are
publicly accessible at: https://zenodo.org/records/10642388. All files are
released under the CC-BY 4.0 license.
Each file listed below can be downloaded using the `download.py` script. For example, to download `cifs_v1_val.pkl.gz`:

    python bin/download.py cifs_v1_val.pkl.gz
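
If the script is not at hand, a file can probably also be fetched straight from the Zenodo record. The sketch below is not taken from `download.py`; it assumes Zenodo's usual `/records/<id>/files/<name>?download=1` URL pattern, and the helper names `zenodo_file_url` and `fetch` are hypothetical.

```python
import shutil
import urllib.request
from urllib.parse import quote

ZENODO_RECORD_ID = "10642388"  # record ID taken from the Zenodo URL above


def zenodo_file_url(name, record_id=ZENODO_RECORD_ID):
    """Build a direct-download URL (assumed Zenodo pattern, not from download.py)."""
    return f"https://zenodo.org/records/{record_id}/files/{quote(name)}?download=1"


def fetch(name, dest=None):
    """Stream one file from the record to disk and return the local path."""
    dest = dest or name
    with urllib.request.urlopen(zenodo_file_url(name)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks, so large archives are not held in memory
    return dest


# Example (requires network access):
# fetch("cifs_v1_val.pkl.gz")
```

Streaming with `shutil.copyfileobj` matters here because some of the archives hold millions of CIF files.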

| Name | Description | Download Link |
| --- | --- | --- |
| cifs_v1_orig.tar.gz | The original CIF file dataset containing 3,551,492 symmetrized CIF files. | download ↓ |
| cifs_v1_orig.pkl.gz | The contents of `cifs_v1_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| cifs_v1_dedup.tar.gz | The deduplicated original CIF dataset, containing 2,285,914 symmetrized CIF files. | download ↓ |
| cifs_v1_dedup.pkl.gz | The contents of `cifs_v1_dedup.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| cifs_v1_prep.tar.gz | The deduplicated and pre-processed original CIF dataset, containing 2,285,719 CIF files. | download ↓ |
| cifs_v1_prep.pkl.gz | The contents of `cifs_v1_prep.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| cifs_v1_train.tar.gz | The training split of the main dataset, containing 2,047,889 CIF files. | download ↓ |
| cifs_v1_train.pkl.gz | The contents of `cifs_v1_train.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| cifs_v1_val.tar.gz | The validation split of the main dataset, containing 227,544 CIF files. | download ↓ |
| cifs_v1_val.pkl.gz | The contents of `cifs_v1_val.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| cifs_v1_test.tar.gz | The test split of the main dataset, containing 10,286 CIF files. | download ↓ |
| cifs_v1_test.pkl.gz | The contents of `cifs_v1_test.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| tokens_v1_all.tar.gz | The tokens of the complete main dataset. | download ↓ |
| tokens_v1_train_val.tar.gz | The tokens of the training and validation sets of the main dataset. | download ↓ |
| starts_v1_train.pkl | The start indices for the tokenized training set structures of the main dataset. | download ↓ |
| starts_v1_val.pkl | The start indices for the tokenized validation set structures of the main dataset. | download ↓ |
| challenge_set_v1.zip | The structures of the challenge set. | download ↓ |

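
The `.pkl.gz` files are described above as serialized Python lists of (ID, CIF) 2-tuples. A minimal sketch of reading one, assuming the serialization is a standard gzip-compressed pickle (the function name `load_cif_pairs` is hypothetical):

```python
import gzip
import pickle


def load_cif_pairs(path):
    """Load a .pkl.gz file assumed to hold a pickled list of (ID, CIF) 2-tuples."""
    with gzip.open(path, "rb") as f:
        # Unpickling can execute arbitrary code; only load files from a trusted source.
        return pickle.load(f)


# Example, once the file has been downloaded:
# pairs = load_cif_pairs("cifs_v1_val.pkl.gz")
# cif_id, cif_text = pairs[0]
```

Note that the whole list is loaded into memory at once, which may be heavy for the larger splits.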

| Name | Description | Download Link |
| --- | --- | --- |
| crystallm_v1_small.tar.gz | Model with small architecture trained on the full main dataset. | download ↓ |
| crystallm_v1_large.tar.gz | Model with large architecture trained on the full main dataset. | download ↓ |
| crystallm_perov_5_small.tar.gz | Model with small architecture trained on the Perov-5 training set only. | download ↓ |
| crystallm_perov_5_large.tar.gz | Model with large architecture trained on the Perov-5 training set only. | download ↓ |
| crystallm_carbon_24_small.tar.gz | Model with small architecture trained on the Carbon-24 training set only. | download ↓ |
| crystallm_carbon_24_large.tar.gz | Model with large architecture trained on the Carbon-24 training set only. | download ↓ |
| crystallm_mp_20_small.tar.gz | Model with small architecture trained on the MP-20 training set only. | download ↓ |
| crystallm_mp_20_large.tar.gz | Model with large architecture trained on the MP-20 training set only. | download ↓ |
| crystallm_mpts_52_small.tar.gz | Model with small architecture trained on the MPTS-52 training set only. | download ↓ |
| crystallm_mpts_52_large.tar.gz | Model with large architecture trained on the MPTS-52 training set only. | download ↓ |
| crystallm_v1_minus_mpts_52_small.tar.gz | Model with small architecture trained on the full main dataset minus the MPTS-52 test and validation sets. | download ↓ |


| Name | Description | Download Link |
| --- | --- | --- |
| perov_5_train_orig.tar.gz | The original CIF files of the Perov-5 training set (symmetrized). | download ↓ |
| perov_5_train_orig.pkl.gz | The contents of `perov_5_train_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| perov_5_train_prep.pkl.gz | The pre-processed CIF files of the Perov-5 training set. | download ↓ |
| perov_5_val_orig.tar.gz | The original CIF files of the Perov-5 validation set (symmetrized). | download ↓ |
| perov_5_val_orig.pkl.gz | The contents of `perov_5_val_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| perov_5_val_prep.pkl.gz | The pre-processed CIF files of the Perov-5 validation set. | download ↓ |
| perov_5_test_orig.tar.gz | The original CIF files of the Perov-5 test set (symmetrized). | download ↓ |
| perov_5_test_orig.pkl.gz | The contents of `perov_5_test_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| perov_5_test_prep.pkl.gz | The pre-processed CIF files of the Perov-5 test set. | download ↓ |
| tokens_perov_5.tar.gz | The tokens of the Perov-5 training and validation sets. | download ↓ |
| starts_perov_5_train.pkl | The start indices for the tokenized structures of the Perov-5 training set. | download ↓ |
| prompts_perov_5_test.tar.gz | Text files containing prompts derived from the Perov-5 test set. | download ↓ |


| Name | Description | Download Link |
| --- | --- | --- |
| carbon_24_train_orig.tar.gz | The original CIF files of the Carbon-24 training set (symmetrized). | download ↓ |
| carbon_24_train_orig.pkl.gz | The contents of `carbon_24_train_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| carbon_24_train_prep.pkl.gz | The pre-processed CIF files of the Carbon-24 training set. | download ↓ |
| carbon_24_val_orig.tar.gz | The original CIF files of the Carbon-24 validation set (symmetrized). | download ↓ |
| carbon_24_val_orig.pkl.gz | The contents of `carbon_24_val_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| carbon_24_val_prep.pkl.gz | The pre-processed CIF files of the Carbon-24 validation set. | download ↓ |
| carbon_24_test_orig.tar.gz | The original CIF files of the Carbon-24 test set (symmetrized). | download ↓ |
| carbon_24_test_orig.pkl.gz | The contents of `carbon_24_test_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| carbon_24_test_prep.pkl.gz | The pre-processed CIF files of the Carbon-24 test set. | download ↓ |
| tokens_carbon_24.tar.gz | The tokens of the Carbon-24 training and validation sets. | download ↓ |
| starts_carbon_24_train.pkl | The start indices for the tokenized structures of the Carbon-24 training set. | download ↓ |
| prompts_carbon_24_test.tar.gz | Text files containing prompts derived from the Carbon-24 test set. | download ↓ |


| Name | Description | Download Link |
| --- | --- | --- |
| mp_20_train_orig.tar.gz | The original CIF files of the MP-20 training set (symmetrized). | download ↓ |
| mp_20_train_orig.pkl.gz | The contents of `mp_20_train_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| mp_20_train_prep.pkl.gz | The pre-processed CIF files of the MP-20 training set. | download ↓ |
| mp_20_val_orig.tar.gz | The original CIF files of the MP-20 validation set (symmetrized). | download ↓ |
| mp_20_val_orig.pkl.gz | The contents of `mp_20_val_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| mp_20_val_prep.pkl.gz | The pre-processed CIF files of the MP-20 validation set. | download ↓ |
| mp_20_test_orig.tar.gz | The original CIF files of the MP-20 test set (symmetrized). | download ↓ |
| mp_20_test_orig.pkl.gz | The contents of `mp_20_test_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| mp_20_test_prep.pkl.gz | The pre-processed CIF files of the MP-20 test set. | download ↓ |
| tokens_mp_20.tar.gz | The tokens of the MP-20 training and validation sets. | download ↓ |
| starts_mp_20_train.pkl | The start indices for the tokenized structures of the MP-20 training set. | download ↓ |
| prompts_mp_20_test.tar.gz | Text files containing prompts derived from the MP-20 test set. | download ↓ |


| Name | Description | Download Link |
| --- | --- | --- |
| mpts_52_train_orig.tar.gz | The original CIF files of the MPTS-52 training set (symmetrized). | download ↓ |
| mpts_52_train_orig.pkl.gz | The contents of `mpts_52_train_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| mpts_52_train_prep.pkl.gz | The pre-processed CIF files of the MPTS-52 training set. | download ↓ |
| mpts_52_val_orig.tar.gz | The original CIF files of the MPTS-52 validation set (symmetrized). | download ↓ |
| mpts_52_val_orig.pkl.gz | The contents of `mpts_52_val_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| mpts_52_val_prep.pkl.gz | The pre-processed CIF files of the MPTS-52 validation set. | download ↓ |
| mpts_52_test_orig.tar.gz | The original CIF files of the MPTS-52 test set (symmetrized). | download ↓ |
| mpts_52_test_orig.pkl.gz | The contents of `mpts_52_test_orig.tar.gz` as a serialized Python list of 2-tuples: (ID, CIF). | download ↓ |
| mpts_52_test_prep.pkl.gz | The pre-processed CIF files of the MPTS-52 test set. | download ↓ |
| tokens_mpts_52.tar.gz | The tokens of the MPTS-52 training and validation sets. | download ↓ |
| tokens_v1_minus_mpts_52.tar.gz | The tokens of the full main dataset minus the MPTS-52 validation and test sets. | download ↓ |
| starts_mpts_52_train.pkl | The start indices for the tokenized structures of the MPTS-52 training set. | download ↓ |
| prompts_mpts_52_test.tar.gz | Text files containing prompts derived from the MPTS-52 test set. | download ↓ |

Generated Benchmark CIF Files

| Name | Description | Download Link |
| --- | --- | --- |
| gen_perov_5_small_raw.tar.gz | CIF files generated with the Perov-5 small model starting from the Perov-5 test set prompts (n=20). | download ↓ |
| gen_perov_5_small.tar.gz | Pre-processed CIF files generated with the Perov-5 small model starting from the Perov-5 test set prompts (n=20). | download ↓ |
| gen_perov_5_large_raw.tar.gz | CIF files generated with the Perov-5 large model starting from the Perov-5 test set prompts (n=20). | download ↓ |
| gen_perov_5_large.tar.gz | Pre-processed CIF files generated with the Perov-5 large model starting from the Perov-5 test set prompts (n=20). | download ↓ |
| gen_carbon_24_small_raw.tar.gz | CIF files generated with the Carbon-24 small model starting from the Carbon-24 test set prompts (n=20). | download ↓ |
| gen_carbon_24_small.tar.gz | Pre-processed CIF files generated with the Carbon-24 small model starting from the Carbon-24 test set prompts (n=20). | download ↓ |
| gen_carbon_24_large_raw.tar.gz | CIF files generated with the Carbon-24 large model starting from the Carbon-24 test set prompts (n=20). | download ↓ |
| gen_carbon_24_large.tar.gz | Pre-processed CIF files generated with the Carbon-24 large model starting from the Carbon-24 test set prompts (n=20). | download ↓ |
| gen_mp_20_small_raw.tar.gz | CIF files generated with the MP-20 small model starting from the MP-20 test set prompts (n=20). | download ↓ |
| gen_mp_20_small.tar.gz | Pre-processed CIF files generated with the MP-20 small model starting from the MP-20 test set prompts (n=20). | download ↓ |
| gen_mp_20_large_raw.tar.gz | CIF files generated with the MP-20 large model starting from the MP-20 test set prompts (n=20). | download ↓ |
| gen_mp_20_large.tar.gz | Pre-processed CIF files generated with the MP-20 large model starting from the MP-20 test set prompts (n=20). | download ↓ |
| gen_mpts_52_large_raw.tar.gz | CIF files generated with the MPTS-52 large model starting from the MPTS-52 test set prompts (n=20). | download ↓ |
| gen_mpts_52_large.tar.gz | Pre-processed CIF files generated with the MPTS-52 large model starting from the MPTS-52 test set prompts (n=20). | download ↓ |
| gen_v1_minus_mpts_52_small_raw.tar.gz | CIF files generated with the small model trained on the full dataset minus the MPTS-52 test and validation sets, starting from the MPTS-52 test set prompts (n=20). | download ↓ |
| gen_v1_minus_mpts_52_small.tar.gz | Pre-processed CIF files generated with the small model trained on the full dataset minus the MPTS-52 test and validation sets, starting from the MPTS-52 test set prompts (n=20). | download ↓ |

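
The `.tar.gz` archives of CIF files can be read without unpacking them to disk. The sketch below is only illustrative: it assumes the archive members are regular files whose names end in `.cif` (the internal layout of the archives is not documented here), and the helper name `iter_cifs` is hypothetical.

```python
import tarfile


def iter_cifs(archive_path, suffix=".cif"):
    """Yield (member name, CIF text) for each matching file in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Skip directories and anything that does not look like a CIF file.
            if member.isfile() and member.name.endswith(suffix):
                yield member.name, tar.extractfile(member).read().decode("utf-8")


# Example, once the archive has been downloaded:
# for name, cif in iter_cifs("gen_mp_20_small.tar.gz"):
#     print(name, len(cif))
```

Because the function is a generator, only one CIF file is held in memory at a time. If the files use a different extension, pass it as `suffix`.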