PyTorch code for Dynamic Multimodal Instance Segmentation guided by natural language queries, ECCV 2018.
Example query: "A dark horse between three lighter horses"
To run this code, you need Python 3.6.*, PyTorch, Visdom, cupy, Cython, NumPy and Matplotlib. We recommend installing the Anaconda Python distribution and using conda to install the dependencies, as follows:
conda install matplotlib numpy cython
conda install pytorch torchvision cuda90 -c pytorch
conda install aria2 -c bioconda
pip install -U visdom opencv-python cupy-cuda90 pynvrtc tqdm
You will also need the ReferIt loader library, which you can clone from: https://github.com/andfoy/refer. To install it with pip, run:
pip install git+https://github.com/andfoy/refer.git
Finally, you will need to install the Simple Recurrent Unit (SRU):
pip install -U git+https://github.com/taolei87/sru.git@43c85ed --no-deps
Conda packages will be provided in future releases.
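After running the installation commands above, a quick check that the interpreter and the main dependencies are visible can save a failed training run later. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and the list of import names (for example `cv2` for opencv-python) is an assumption that you may need to adjust.

```python
import importlib.util
import sys

# Import names for the dependencies listed above. This mapping from package
# name to import name is an assumption of this sketch, not taken from the repo.
REQUIRED_MODULES = [
    "torch", "torchvision", "visdom", "cupy", "Cython",
    "numpy", "matplotlib", "cv2", "tqdm", "sru",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def is_python_36(version_info=sys.version_info):
    """Return True when the interpreter is a Python 3.6.* release."""
    return tuple(version_info[:2]) == (3, 6)


# Report the state of the current environment without raising.
print("Python 3.6.*:", is_python_36())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list of missing modules only shows that the packages can be found; it does not verify that the CUDA builds match your driver.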
Additionally, you must download the ReferIt, UNC, UNC+ and GRef datasets. To do so, we provide the download_dataset.sh bash script, which takes care of the required downloads:
bash download_data --path $PATH_TO_STORE_THE_DATASETS
Dataset Name | Original Name | Splits |
---|---|---|
referit | RefCLEF | train, val, trainval, test |
unc | RefCOCO | train, val, testA, testB |
unc+ | RefCOCO+ | train, val, testA, testB |
gref | RefCOCOg | train, val |
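When scripting several runs, it can help to validate the dataset and split names against the table above before launching anything. The sketch below simply transcribes that table into a dictionary; `DATASET_SPLITS` and `check_split` are hypothetical names introduced here and are not part of `dmn_pytorch`.

```python
# Dataset names and splits transcribed from the table above.
DATASET_SPLITS = {
    "referit": ("train", "val", "trainval", "test"),
    "unc": ("train", "val", "testA", "testB"),
    "unc+": ("train", "val", "testA", "testB"),
    "gref": ("train", "val"),
}


def check_split(dataset, split):
    """Return (dataset, split) if the pair appears in the table, else raise."""
    if dataset not in DATASET_SPLITS:
        raise ValueError("unknown dataset: {!r}".format(dataset))
    if split not in DATASET_SPLITS[dataset]:
        raise ValueError(
            "dataset {!r} has no split {!r}; available: {}".format(
                dataset, split, ", ".join(DATASET_SPLITS[dataset])
            )
        )
    return dataset, split


print(check_split("unc", "testA"))
```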
To train the model, you need to provide the path to the directory that contains the aforementioned datasets, as well as the other parameters required for training. To train the model with the low-resolution setup described in the original paper, run:
python -u -m dmn_pytorch.train --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --val $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --save-folder $PATH_TO_STORE_WEIGHT_SNAPSHOTS --snapshot $PATH_TO_THE_SNAPSHOT_FILE --accum-iters 1
To train the model at high resolution, add the --high-res and --upsamp-amplification 32 flags to the previous command. Note: The snapshot file must correspond to the low-resolution weights.
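If you launch training from a Python script rather than a shell, the low-resolution and high-resolution invocations described above can be assembled from one helper. This is a sketch under stated assumptions: `build_train_command` and `format_command` are hypothetical helpers, the paths in the example are placeholders, and only flags that appear in the commands above are used.

```python
import shlex
import sys


def build_train_command(data, dataset, val_split, save_folder, snapshot,
                        high_res=False):
    """Assemble the argument list for `python -u -m dmn_pytorch.train`."""
    cmd = [
        sys.executable, "-u", "-m", "dmn_pytorch.train",
        "--data", data,
        "--dataset", dataset,
        "--val", val_split,
        "--backend", "dpn92",
        "--num-filters", "10",
        "--lang-layers", "3",
        "--mix-we",
        "--save-folder", save_folder,
        "--snapshot", snapshot,
        "--accum-iters", "1",
    ]
    if high_res:
        # High-resolution training only adds these flags; the snapshot
        # should then point to low-resolution weights.
        cmd += ["--high-res", "--upsamp-amplification", "32"]
    return cmd


def format_command(cmd):
    """Render an argument list as a shell-safe string (Python 3.6 compatible)."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Placeholder paths, for illustration only.
example = build_train_command(
    "/path/to/datasets", "unc", "val",
    "/path/to/snapshots", "/path/to/snapshots/low_res.pth", high_res=True,
)
print(format_command(example))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.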
To inspect all the available parameters and their descriptions, run python -m dmn_pytorch.train --help. Please refer to the datasets table above for the dataset names and their available splits.
To evaluate the model, pass the --eval-first and --epochs 0 flags to dmn_pytorch.train, as follows:
python -u -m dmn_pytorch.train --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --val $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --save-folder $PATH_TO_STORE_WEIGHT_SNAPSHOTS --snapshot $PATH_TO_THE_SNAPSHOT_FILE --epochs 0 --eval-first
Additionally, you can visualize the results of the DMN model with a set of pretrained weights on visdom. To do so, run the dmn_pytorch.visdom_display script as follows:
python -m dmn_pytorch.visdom_display --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --split $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --num-images $NUMBER_OF_EXAMPLES_TO_DISPLAY --snapshot $PATH_TO_THE_SNAPSHOT_FILE --no-eval --visdom http://$HOST:$PORT --env $NAME_OF_THE_VISDOM_ENV
The pretrained weights provided below were trained in two phases. During the low-resolution phase, the DMN was trained on UNC for 24 epochs with a constant learning rate, and the resulting weights were then fine-tuned on the remaining datasets for 10 epochs. Finally, the high-resolution phase was run on all the datasets for a total of 4 epochs, starting from the weights of the previous phase.
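The schedule described above can be written down as data, which makes it easier to see how many epochs stand behind each released checkpoint. The epoch counts come from the paragraph above; the phase labels, the helper names, and the assumption that the 4 high-resolution epochs apply to each dataset separately are this sketch's own reading.

```python
# Epoch counts taken from the description above.
LOW_RES_UNC_EPOCHS = 24
LOW_RES_FINETUNE_EPOCHS = 10
HIGH_RES_EPOCHS = 4


def schedule_for(dataset):
    """Return the (phase, epochs) steps behind the weights for `dataset`."""
    steps = [("low-res training on unc", LOW_RES_UNC_EPOCHS)]
    if dataset != "unc":
        # The other datasets start from the UNC low-resolution weights.
        steps.append(
            ("low-res fine-tuning on " + dataset, LOW_RES_FINETUNE_EPOCHS)
        )
    # Assumption: the high-resolution phase runs 4 epochs per dataset.
    steps.append(("high-res training on " + dataset, HIGH_RES_EPOCHS))
    return steps


def total_epochs(dataset):
    """Sum the epochs over all phases for `dataset`."""
    return sum(epochs for _, epochs in schedule_for(dataset))


for name in ("unc", "referit"):
    print(name, schedule_for(name), total_epochs(name))
```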
Dataset | Examples | High-Resolution Pretrained Weights | Split | Performance (mIoU) |
---|---|---|---|---|
Referit | Referit Examples | Link | val | 0.5328 |
Referit | Referit Examples | Link | test | 0.5281 |
UNC | UNC Examples | Link | val | 0.4978 |
UNC | UNC Examples | Link | testA | 0.5484 |
UNC | UNC Examples | Link | testB | 0.4520 |
UNC+ | UNC+ Examples | Link | val | 0.3888 |
UNC+ | UNC+ Examples | Link | testA | 0.4425 |
UNC+ | UNC+ Examples | Link | testB | 0.3249 |
GRef | GRef Examples | Link | val | 0.3764 |
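For readers unfamiliar with the metric in the last column, the sketch below computes a mean intersection-over-union over pairs of binary masks. It assumes that mIoU is the plain average of per-example IoU scores; the evaluation code in `dmn_pytorch.train` may aggregate differently, so treat this as an illustration of the idea rather than a reproduction of the reported numbers. The function names and the convention for two empty masks are choices made here.

```python
def iou(pred, target):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention chosen for this sketch: two empty masks score 0.0.
    return intersection / union if union else 0.0


def mean_iou(pairs):
    """Average the IoU over an iterable of (prediction, target) mask pairs."""
    scores = [iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


half_overlap = ([[1, 1], [0, 0]], [[1, 0], [0, 0]])   # IoU = 1 / 2
full_overlap = ([[0, 1], [0, 1]], [[0, 1], [0, 1]])   # IoU = 2 / 2
print(mean_iou([half_overlap, full_overlap]))  # 0.75
```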
The DMN can be imported and used as a regular Python package in your scripts. To install it, you can use pip:
pip install -U .
Then you can import it as follows:
from dmn_pytorch import DMN
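Since the constructor arguments of `DMN` are not listed here, one way to discover them after installation is to inspect the class. The sketch below uses only the standard `inspect` module; `describe_constructor` is a hypothetical helper, and the import is guarded so the snippet also runs when the package is not installed.

```python
import inspect


def describe_constructor(cls):
    """Return the class name followed by its constructor signature."""
    return "{}{}".format(cls.__name__, inspect.signature(cls))


try:
    from dmn_pytorch import DMN
except ImportError:
    # The package (or one of its dependencies) is not installed.
    DMN = None

if DMN is not None:
    print(describe_constructor(DMN))
else:
    print("dmn_pytorch is not importable in this environment")
```

The same information, together with the parameter descriptions, is available through `python -m dmn_pytorch.train --help` for the training script.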
We follow the PEP8 and PEP257 style guidelines. Feel free to send a PR or open an issue if you have any problems or questions.
@article{margffoy2018dmn,
title={Dynamic Multimodal Instance Segmentation guided by natural language queries},
author={{Margffoy-Tuay}, E. and {P{\'e}rez}, J.~C. and {Botero}, E. and
{Arbel{\'a}ez}, P.},
journal={European Conference on Computer Vision (ECCV)},
year={2018}
}