MedCoSS

This is the official PyTorch implementation of our CVPR 2024 paper (Highlight) "Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning".

MedCoSS illustration

Requirements

CUDA 11.5
Python 3.8
PyTorch 1.11.0
cuDNN 8.3.2.44

Data Preparation

Pre-processing

  • Pre-training data
    • Report: Follow MGCA's procedure to pre-process the MIMIC-CXR dataset.
    • X-ray: Use Preprocess/MIMIC_CXR_JPG_Preprocess.py to pre-process the MIMIC-CXR dataset.
    • CT:
      • Use Preprocess/DL_save_nifti.py (on the downloaded files) to convert the PNG images to nii.gz format.
      • Use Preprocess/re_spacing_ITK.py to resample the CT volumes.
      • Use Preprocess/splitting_to_patches.py to extract about 125k sub-volumes; the pre-processed data are saved in DL_patches_v2/.
      • Use Preprocess/DeepLesion_Resize.py to resize the images.
    • MRI:
      • Use Preprocess/ADNI_Resize.py to resize the images.
      • Use Preprocess/ADNI_split_slice.py to extract about 59k sub-volumes.
    • Pathological imaging: Use Preprocess/TCGA_Preprocess.py to pre-process the seven TCGA datasets.
  • Fine-tuning data
    • PubMed20k dataset: no pre-processing required.
    • ChestXR dataset: no pre-processing required.
    • QaTav2 dataset: Use Preprocess/QaTav2.py to pre-process.
    • RICORD dataset: Use Preprocess/RICORD.py to pre-process. The data splits are available in /Downstream/Dim_3/RICORD/data_split.
    • LiTS dataset:
      • (1) Resample all data to a common spacing of 1.5 mm × 0.8 mm × 0.8 mm;
      • (2) Pre-process with the nnU-Net v1 framework.
    • VS dataset:
    • LA dataset: no pre-processing required.
    • NCH dataset: no pre-processing required.
    • GlaS dataset: Use Preprocess/GlaS.py to pre-process.
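The LiTS resampling step above can be sketched in a few lines. This is a minimal illustration using `scipy.ndimage.zoom`, not the repository's actual pipeline (which relies on ITK/nnU-Net tooling); the function name and the `order=1` (trilinear) interpolation choice are assumptions, and only the 1.5 mm × 0.8 mm × 0.8 mm target spacing comes from the steps above:

```python
import numpy as np
from scipy.ndimage import zoom

def resample_to_spacing(volume, src_spacing, dst_spacing=(1.5, 0.8, 0.8), order=1):
    """Resample a (z, y, x) volume from src_spacing to dst_spacing, both in mm.

    The zoom factor per axis is src/dst: halving the spacing doubles
    the number of voxels along that axis.
    """
    factors = [s / d for s, d in zip(src_spacing, dst_spacing)]
    return zoom(volume, factors, order=order)

# Example: a small CT volume with 5.0 x 0.7 x 0.7 mm voxels
ct = np.random.rand(8, 64, 64).astype(np.float32)
resampled = resample_to_spacing(ct, src_spacing=(5.0, 0.7, 0.7))
print(resampled.shape)  # (27, 56, 56): z upsampled, in-plane slightly downsampled
```

For segmentation masks, nearest-neighbour interpolation (`order=0`) would be used instead, so label values are not blended at boundaries.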

Pre-training

Pre-trained Model

Fine-tuning

  • Run sh run_ds.sh for fine-tuning. (A single GPU with 11 GB memory is sufficient; before running, update the data and checkpoint paths in the script.)

To do

  • Dataset Links
  • Pre-processing Code
  • Pre-training Code Release
  • Pre-trained Model
  • Fine-tuning Code Release
  • Continual pre-training on new data

Citation

If this code is helpful for your research, please cite:

@inproceedings{ye2024medcoss,
  title={Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning},
  author={Ye, Yiwen and Xie, Yutong and Zhang, Jianpeng and Chen, Ziyang and Wu, Qi and Xia, Yong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11114--11124},
  year={2024}
}

Acknowledgements

The whole framework is based on MAE, Uni-Perceiver, and MGCA.

Contact

Yiwen Ye (ywye@mail.nwpu.edu.cn)
