SPMM: Structure-Property Multi-Modal learning for molecules

This is the GitHub repository for SPMM, a multi-modal pre-trained molecular model for a synergistic comprehension of molecular structure and properties. The details can be found in the following paper: Bidirectional Generation of Structure and Properties Through a Single Molecular Foundation Model. https://arxiv.org/abs/2211.10590

Molecular structures are given as SMILES strings, and we use 53 simple chemical properties to build a property vector (PV) for each molecule.

File description

data/: Contains the data used for the experiments in the paper.
Pretrain/: Contains the checkpoint of the pre-trained SPMM.
vocab_bpe_300.txt: Contains the SMILES tokens for the SMILES tokenizer.
property_name.txt: Contains the names of the 53 chemical properties.
normalize.pkl: Contains the mean and standard deviation of the 53 chemical properties that we used for PV.
calc_property.py: Contains the code to calculate the 53 chemical properties and build a PV for a given SMILES.
SPMM_models.py: Contains the SPMM model and its pre-training code.
SPMM_pretrain.py: Runs SPMM pre-training.
d_*.py: Code for the downstream tasks.
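
The role of the mean and standard deviation stored in normalize.pkl can be illustrated with a minimal sketch. It assumes the PV is standardized property by property as a z-score; the function names, the toy values, and the three-property vector are hypothetical, and the actual layout of normalize.pkl and the code in calc_property.py may differ.

```python
from typing import List, Sequence


def normalize_pv(raw: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Standardize each property as (value - mean) / std (assumed scheme)."""
    if not (len(raw) == len(mean) == len(std)):
        raise ValueError("raw, mean and std must have the same length")
    return [(x - m) / s for x, m, s in zip(raw, mean, std)]


def denormalize_pv(z: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Invert the standardization to recover property values in their original units."""
    return [v * s + m for v, m, s in zip(z, mean, std)]


# Toy example with three made-up properties; the real PV has 53 entries.
raw = [180.0, 2.0, 4.0]
mean = [300.0, 3.0, 2.0]
std = [100.0, 2.0, 1.0]

z = normalize_pv(raw, mean, std)
restored = denormalize_pv(z, mean, std)
print(z)
print(restored)
```

The inverse function is included because a predicted PV would need to be mapped back to physical units before it can be compared with calculated properties.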

Requirements

Run pip install -r requirements.txt to install the required packages.

Code running

Arguments can be passed on the command line or edited manually in the script being run.

Pre-training

python SPMM_pretrain.py --data_path './data/pretrain_20m.txt'

PV-to-SMILES generation
- deterministic: The model takes PVs from the molecules in input_file and generates molecules with those PVs. The generated molecules are written to generated_molecules.txt.
```
python d_pv2smiles_deterministic.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --input_file './data/pubchem_1k_unseen.txt'
```
- stochastic: The model takes one query PV and generates n_generate molecules with that PV. The generated molecules are written to generated_molecules.txt. Here, you need to build your input PV in the code; see the four examples that we included.
```
python d_pv2smiles_stochastic.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --n_generate 1000
```
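
After either generation mode, a quick sanity check on the output file can be useful. The sketch below counts total and distinct entries; it assumes generated_molecules.txt holds one SMILES string per line, which is not confirmed here, and the helper names are hypothetical. It compares raw strings, so two different SMILES spellings of the same molecule are counted as distinct.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_generated(lines: Iterable[str]) -> Dict[str, int]:
    """Count non-empty lines and distinct strings among them."""
    smiles = [ln.strip() for ln in lines if ln.strip()]
    counts = Counter(smiles)
    return {"total": len(smiles), "unique": len(counts)}


def summarize_file(path: str = "generated_molecules.txt") -> Dict[str, int]:
    """Apply the summary to a file (assumed format: one SMILES per line)."""
    with open(path, encoding="utf-8") as fh:
        return summarize_generated(fh)


# In-memory example so the sketch runs without the output file.
example = ["CCO\n", "CCO\n", "c1ccccc1\n", "\n"]
summary = summarize_generated(example)
print(summary)
```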

SMILES-to-PV generation

The model takes the query molecules in input_file and generates their PVs.

python d_smiles2pv.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --input_file './data/pubchem_1k_unseen.txt'

Attention visualization

The model takes a query molecule given as input_smiles and shows the attention map.

python attention_visualize.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --input_smiles 'CCN(C)CCC(O)C(c1ccccc1)c1ccccc1'

MoleculeNet + DILI prediction task

d_regression.py, d_classification.py, and d_classification_multilabel.py perform regression, binary classification, and multi-label classification tasks, respectively.

python d_regression.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --name 'esol'
python d_classification.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --name 'bbbp'
python d_classification_multilabel.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --name 'clintox'
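
When several of these tasks are run in sequence, the three commands above can be assembled programmatically. This is a small convenience sketch, not part of the repository: the `build_command` helper and the `TASKS` list are hypothetical, and only the script names, the `--checkpoint` and `--name` arguments, and the dataset names shown above are taken from this README.

```python
import shlex
import sys

CHECKPOINT = "./Pretrain/checkpoint_SPMM_20m.ckpt"

# (script, dataset name) pairs copied from the commands in this README.
TASKS = [
    ("d_regression.py", "esol"),
    ("d_classification.py", "bbbp"),
    ("d_classification_multilabel.py", "clintox"),
]


def build_command(script, name, checkpoint=CHECKPOINT):
    """Return the argument list for one downstream run."""
    return [sys.executable, script, "--checkpoint", checkpoint, "--name", name]


commands = [build_command(script, name) for script, name in TASKS]
for cmd in commands:
    # Print the shell-quoted command; to launch it, pass cmd to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```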

Forward/retro-reaction prediction tasks

d_rxn_prediction.py performs both forward and retro reaction prediction tasks on the USPTO-480k and USPTO-50k datasets.

e.g. forward reaction prediction, no beam search
```
python d_rxn_prediction.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --mode 'forward' --n_beam 1 
```
e.g. retro reaction prediction, beam search with k=3
```
python d_rxn_prediction.py --checkpoint './Pretrain/checkpoint_SPMM_20m.ckpt' --mode 'retro' --n_beam 3 
```
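
To make the effect of `--n_beam` concrete, here is a generic beam search over a toy next-token table. It is a textbook sketch, not the decoding code in d_rxn_prediction.py: the `beam_search` and `toy_model` names, the token table, and its probabilities are all made up for illustration. With a beam width of 1 the search reduces to greedy decoding, which is what "no beam search" refers to above.

```python
import math


def toy_model(prefix):
    """Hypothetical next-token log-probabilities for a given prefix."""
    table = {
        (): {"A": 0.6, "B": 0.4},
        ("A",): {"x": 0.55, "y": 0.45},
        ("B",): {"z": 0.9, "w": 0.1},
    }
    probs = table.get(prefix, {"<eos>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def beam_search(next_log_probs, n_beam, max_len, eos="<eos>"):
    """Keep the n_beam highest-scoring partial sequences at every step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:n_beam]:
            # Sequences that emitted the end token stop expanding.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # keep any sequences cut off by max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:n_beam]


greedy = beam_search(toy_model, n_beam=1, max_len=3)
wide = beam_search(toy_model, n_beam=2, max_len=3)
print(greedy[0][0], math.exp(greedy[0][1]))
print(wide[0][0], math.exp(wide[0][1]))
```

In this toy table the greedy search commits to "A" because it is the most likely first token and ends with a sequence of probability about 0.33, while a beam of width 2 also keeps "B" alive and finds the sequence of probability about 0.36. A wider beam costs more model evaluations per step in exchange for this kind of recovery.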

Acknowledgement

The code for BERT with cross-attention layers (xbert.py) is modified from the one in ALBEF.
The code for SMILES augmentation is taken from pysmilesutils.
