
TRIPLEX 🧬

Accurate Spatial Gene Expression Prediction by integrating Multi-resolution features, CVPR 2024. [arXiv]
Youngmin Chung, Ji Hun Ha, Kyeong Chan Im, Joo Sang Lee*


TRIPLEX is now integrated with HEST and CLAM for data preparation.

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • PyTorch Lightning 2.0+
  • CUDA-enabled GPU (recommended)

TRIPLEX is tested on Python 3.11, PyTorch 2.4.1, PyTorch Lightning 2.4.0, and CUDA 12.1.
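Before installing, you can quickly confirm that your Python version and GPU are visible (a generic check, not specific to TRIPLEX):

python --version
nvidia-smi   # should list at least one CUDA-capable GPU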


Installation

Clone this repository:

git clone https://github.com/NEXGEM/TRIPLEX.git
cd TRIPLEX

Create and activate a conda environment:

conda create -n TRIPLEX python=3.11
conda activate TRIPLEX

Install PyTorch:

pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121
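Optionally, a one-line check confirms that PyTorch was installed with CUDA support:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"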

Install HEST:

  • Dependencies (CUDA-related Python packages)
pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12==24.6.0 \
    dask-cudf-cu12==24.6.0 \
    cucim-cu12==24.6.0 \
    raft-dask-cu12==24.6.0
  • HEST
git clone https://github.com/mahmoodlab/HEST.git 
cd HEST 
pip install -e .

Install FlashAttention:

pip install flash-attn --no-build-isolation

Install remaining dependencies:

pip install -r requirements.txt
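As a final sanity check, the core dependencies should import cleanly (an illustrative check; module names follow the packages installed above):

python -c "import torch, pytorch_lightning, flash_attn; print('installation looks OK')"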

Project Structure

.
β”œβ”€β”€ config/                 # Configuration files for experiments
β”œβ”€β”€ docker/                 # Docker-related setup files
β”œβ”€β”€ figures/                # Figures
β”œβ”€β”€ src/                    # Source code
β”‚   β”œβ”€β”€ dataset/            # Dataset loading and preprocessing modules
β”‚   β”œβ”€β”€ model/              # Model architectures
β”‚   β”œβ”€β”€ preprocess/         # Code for preprocessing data
β”‚   β”œβ”€β”€ experiment/         # Code for organizing experiment results
β”‚   β”œβ”€β”€ main.py             # Main script for training, evaluation, and inference
β”‚   β”œβ”€β”€ utils.py            # Utility functions
β”œβ”€β”€ script/                 # Example scripts for running experiments
β”œβ”€β”€ README.md               # Project documentation
β”œβ”€β”€ requirements.txt        # Python dependencies

Usage

Preprocessing

Before training or inference, raw data must be preprocessed. Either run the individual commands below, or edit the paths in the shell scripts under script/ and use the one-step versions described at the end of this section:
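The commands in this section reference a few shell variables. A minimal sketch of plausible values (illustrative placeholders only, not shipped defaults):

# Illustrative placeholder values -- point these at your own data
RAW_DIR=/path/to/raw_wsis           # directory containing the raw WSI files
PROCESSED_DIR=./input/ST/andersson  # destination for processed data
PATCH_LEVEL=0                       # WSI pyramid level used for patching
EXTENSION='tif'                     # slide file extension (e.g. 'tif', 'svs')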

Create patches from WSI (only for inference)

python src/preprocess/CLAM/create_patches_fp.py \
        --source $RAW_DIR \
        --save_dir $PROCESSED_DIR \
        --patch_size 256 \
        --seg \
        --patch \
        --stitch \
        --patch_level $PATCH_LEVEL 
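On success, CLAM writes its outputs under the save directory, typically laid out as follows (based on upstream CLAM conventions; exact contents may vary):

$PROCESSED_DIR/
β”œβ”€β”€ masks/                    # tissue segmentation previews
β”œβ”€β”€ patches/                  # per-slide .h5 files with patch coordinates
β”œβ”€β”€ stitches/                 # downscaled stitched reconstructions
└── process_list_autogen.csv  # per-slide processing status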

Prepare patches and ST data

For training:

  • HEST
python src/preprocess/prepare_data.py --input_dir $RAW_DIR \
                                --output_dir $PROCESSED_DIR \
                                --mode hest
  • New data
python src/preprocess/prepare_data.py --input_dir $RAW_DIR \
                                --output_dir $PROCESSED_DIR \
                                --mode train

For inference:

python src/preprocess/prepare_data.py --input_dir $RAW_DIR \
                                --output_dir $PROCESSED_DIR \
                                --mode inference \
                                --patch_size 256 \
                                --slide_level 0 \
                                --slide_ext $EXTENSION

Get geneset for training (not needed for the HEST benchmark)

python src/preprocess/get_geneset.py --st_dir $PROCESSED_DIR'/adata' \
                                    --output_dir $PROCESSED_DIR

Extract image features using foundation model (UNI)

Global features:

  • Training
### Global features
python src/preprocess/extract_img_features.py  \
        --patch_dataroot $PROCESSED_DIR'/patches' \
        --embed_dataroot $PROCESSED_DIR'/emb/global' \
        --num_n 1 \
        --use_openslide
  • Inference
### Global features
python src/preprocess/extract_img_features.py  \
        --wsi_dataroot $RAW_DIR \
        --patch_dataroot $PROCESSED_DIR'/patches' \
        --embed_dataroot $PROCESSED_DIR'/emb/global' \
        --slide_ext $EXTENSION \
        --num_n 1 \
        --use_openslide 

Neighbor features:

### Neighbor features
python src/preprocess/extract_img_features.py  \
        --wsi_dataroot $RAW_DIR \
        --patch_dataroot $PROCESSED_DIR'/patches' \
        --embed_dataroot $PROCESSED_DIR'/emb/neighbor' \
        --slide_ext $EXTENSION \
        --use_openslide \
        --num_n 5
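Since the global and neighbor passes differ only in --num_n and the embedding directory, the two extractions can be combined in a small loop (a convenience sketch using the same placeholder variables as above):

# Run the global (--num_n 1) and neighbor (--num_n 5) passes back to back
for NUM_N in 1 5; do
    [ "$NUM_N" -eq 1 ] && SUBDIR=global || SUBDIR=neighbor
    python src/preprocess/extract_img_features.py \
            --wsi_dataroot $RAW_DIR \
            --patch_dataroot $PROCESSED_DIR'/patches' \
            --embed_dataroot $PROCESSED_DIR'/emb/'$SUBDIR \
            --slide_ext $EXTENSION \
            --use_openslide \
            --num_n $NUM_N
done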

One-step preprocessing

  • HEST bench data
bash script/01-preprocess_hest_bench.sh /path/to/hest/wsis ./input/bench_data/CCRCC 'tif'
  • Other HEST data
bash script/02-preprocess_hest.sh /path/to/hest/wsis ./input/ST/andersson 'tif'

Note

Reproducing our experiments: The ST datasets used in our experiments are already included in the HEST data. Run the scripts below to automatically download and preprocess them.

BC1 dataset (Andersson et al.):

bash script/02.1-preprocess_BC1.sh

BC2 dataset (Bryan et al.):

bash script/02.2-preprocess_BC2.sh

SCC dataset (Andrew et al.):

bash script/02.3-preprocess_SCC.sh
  • Your own ST data
bash script/03-preprocess_new.sh /path/to/raw ./input/path/to/processed 'tif' visium
  • Only images (for inference)
bash script/04-preprocess_for_inference.sh /path/to/raw ./input/path/to/processed 'svs' 0

πŸ“ˆ Training

To train the model using cross-validation, run the following command:

python src/main.py --config_name=<config_path> --mode=cv --gpu=1

Replace <config_path> with the path to your configuration file.
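For example (the config path shown here is hypothetical; use whichever YAML under config/ matches your dataset):

python src/main.py --config_name=config/ST/andersson/TRIPLEX.yaml --mode=cv --gpu=1   # hypothetical config path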

πŸ“Š Evaluation

To evaluate the model, run the following command:

python src/main.py --config_name=<config_path> --mode=eval --gpu=1

The most recent folder inside the log directory will be used for evaluation. The file pcc_rank.npy will be saved in the output directory.

To identify highly predictive genes (HPGs), use the following command:

python src/experiment/get_HPG.py --dataset=<dataset_name> --model=<model_name>

The file idx_top.npy will be saved in the output directory.

To evaluate the model including HPGs, run the evaluation command again:

python src/main.py --config_name=<config_path> --mode=eval --gpu=1
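Both output files are plain NumPy arrays and can be inspected directly (a minimal sketch; the paths assume the example output_dir from the Configuration section below):

python - <<'EOF'
import numpy as np
# Paths assume output_dir: output/pred/ST/andersson -- adjust to your run
pcc_rank = np.load("output/pred/ST/andersson/pcc_rank.npy")
idx_top = np.load("output/pred/ST/andersson/idx_top.npy")
print("pcc_rank:", pcc_rank.shape, "idx_top:", idx_top.shape)
EOF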

πŸ” Inference

To run inference:

python src/main.py --config_name=<config_path> --mode=inference --gpu=1 --model_path=<model_checkpoint_path>

Replace <model_checkpoint_path> with the path to your trained model checkpoint.


Configuration

Configurations are managed using YAML files located in the config/ directory. Each configuration file specifies parameters for the dataset, model, training, and evaluation. Example configuration parameters include:

GENERAL:
  seed: 2021
  log_path: ./logs
  
TRAINING:
  num_k: 8
  learning_rate: 1.0e-4
  num_epochs: 200
  monitor: PearsonCorrCoef
  mode: max
  early_stopping:
    patience: 20
  lr_scheduler:
    patience: 10
    factor: 0.1
  
MODEL:
  model_name: TRIPLEX 
  num_genes: 250
  emb_dim: 1024
  depth1: 1
  depth2: 5
  depth3: 4
  num_heads1: 4
  num_heads2: 8
  num_heads3: 4
  mlp_ratio1: 4
  mlp_ratio2: 4
  mlp_ratio3: 4
  dropout1: 0.4
  dropout2: 0.3
  dropout3: 0.3
  kernel_size: 3

DATA:
  data_dir: input/ST/andersson
  output_dir: output/pred/ST/andersson
  dataset_name: TriDataset
  gene_type: 'mean'
  num_genes: 1000
  num_outputs: 250
  cpm: True
  smooth: True
  
  train_dataloader:
    batch_size: 128
    num_workers: 4
    pin_memory: False
    shuffle: True

  test_dataloader:
    batch_size: 1
    num_workers: 4
    pin_memory: False
    shuffle: False

Modify these files as needed for your experiments.


Citation

@inproceedings{chung2024accurate,
  title={Accurate Spatial Gene Expression Prediction by integrating Multi-resolution features},
  author={Chung, Youngmin and Ha, Ji Hun and Im, Kyeong Chan and Lee, Joo Sang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11591--11600},
  year={2024}
}