- Introduction
- Features and TODO
- Installation
- Training
- Evaluation and Inference
- Experiments
- Citations
- Acknowledgements
This is the code for the NeurIPS 2019 paper Region Mutual Information Loss for Semantic Segmentation.
This paper proposes a region mutual information (RMI) loss to model the dependencies among pixels. RMI uses one pixel and its neighbor pixels to represent this pixel. Then for each pixel in an image, we get a multi-dimensional point that encodes the relationship between pixels, and the image is cast into a multi-dimensional distribution of these high-dimensional points. The prediction and ground truth thus can achieve high order consistency through maximizing the mutual information (MI) between their multi-dimensional distributions.
- Support different segmentation models, i.e., DeepLabv3, DeepLabv3+, PSPNet
- Multi-GPU training
- Multi-GPU Synchronized BatchNorm
- Support different backbones, e.g., Mobilenet, Xception
- Model pretrained on MS-COCO
- Distributed training
We are open to pull requests.
Please install PyTorch-1.1.0 and Python3.6.5. We highly recommend you to use our established PyTorch docker image - zhaosssss/torch_lab.
docker pull zhaosssss/torch_lab:1.1.0
If you have not installed docker, see https://docs.docker.com/.
After you install docker and pull our image, you can cd
to script
directory and run
./docker.sh
to create a running docker container.
If you do not want to use docker, try
pip install -r requirements.txt
However, this is not suggested.
Generally, directories are organized as follow:
|
|--dataset (save the dataset)
|--models (save the output checkpoints)
|--github (save the code)
|--|
|--|--RMI (the RMI code repository)
|--|--|--crf
|--|--|--dataloaders
|--|--|--losses
...
-
Download PASCAL VOC training/validation data (2GB tar file) and augmented segmentation data, extract and put them in the
dataset
directory. -
cd
togithub
directory and clone the RMI repo.
As for the CamVid dataset, you can download at SegNet-Tutorial. This is a processed version of original CamVid dataset.
See script/train.sh
for detailed information.
Before start training, you should specify some variables in the script/train.sh
.
-
pre_dir
, where you save your output checkpoints. If you organize the dir as we suggest, it should bepre_dir=models
. -
data_dir
, where you save your dataset. Besides, you should put the lists of the images in the dataset in a certain directory, checkdataloaders/datasets/pascal.py
to find how we organize the input pipeline.
You can find more information about the arguments of the code in parser_params.py
.
python parser_params.py --help
usage: parser_params.py [-h] [--resume RESUME] [--checkname CHECKNAME]
[--save_ckpt_steps SAVE_CKPT_STEPS]
[--max_ckpt_nums MAX_CKPT_NUMS]
[--model_dir MODEL_DIR] [--output_dir OUTPUT_DIR]
[--seg_model {deeplabv3,deeplabv3+,pspnet}]
[--backbone {resnet50,resnet101,resnet152,resnet50_beta,resnet101_beta,resnet152_beta}]
[--out_stride OUT_STRIDE] [--batch_size N]
[--accumulation_steps N] [--test_batch_size N]
[--dataset {pascal,coco,cityscapes,camvid}]
[--train_split {train,trainaug,trainval,val,test}]
[--data_dir DATA_DIR] [--use_sbd] [--workers N]
...
[--rmi_pool_size RMI_POOL_SIZE]
[--rmi_pool_stride RMI_POOL_STRIDE]
[--rmi_radius RMI_RADIUS]
[--crf_iter_steps CRF_ITER_STEPS]
[--local_rank LOCAL_RANK] [--world_size WORLD_SIZE]
[--dist_backend DIST_BACKEND]
[--multiprocessing_distributed]
After you set all the arguments properly, you can simply cd
to RMI/script
and run
./train.sh
to start training.
- Monitoring the training process through tensorboard
tensorboard --logdir=your_logdir --port=your_port
- GPU memory usage
Training a DeepLabv3 model with output_stride=16
, crop_size=513
, and batch_size=16
needs 4 GTX 1080 GPUs (8GB)
or 2 GTX TITAN X GPUs (12 GB) or 1 TITAN RTX GPUs (24 GB).
See script/eval.sh
and script/inference.sh
for detailed information.
You should also specify some variables in the scripts.
-
data_dir
, where you save your dataset. -
resume
, where your checkpoints locate. -
output_dir
, where the output data will be saved.
Then run
./eval.sh
or
./inference.sh
Some selected qualitative results on PASCAL VOC 2012 val set. Segmentation results of DeepLabv3+&RMI have richer details than DeepLabv3+&CE, e.g., small bumps of the airplane wing, branches of plants, limbs of cows and sheep, and so on.
If our paper and code are beneficial to your work, please cite:
@inproceedings{2019_zhao_rmi,
author = {Shuai Zhao and
Yang Wang and
Zheng Yang and
Deng Cai},
title = {Region Mutual Information Loss for Semantic Segmentation},
booktitle = {NeurIPS},
year = {2019},
}
If other related work in our code or paper also helps you, please cite the corresponding papers.