Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries, ECCV 2018

BCV-Uniandes/DMS

dmn-pytorch

PyTorch code for Dynamic Multimodal Instance Segmentation guided by natural language queries, ECCV 2018.

Project Page

A dark horse between three lighter horses

Dependencies

To run this code, you must have Python 3.6.*, PyTorch, Visdom, cupy, Cython, Numpy and Matplotlib installed. We recommend installing the Anaconda Python distribution and using conda to install the dependencies, as follows:

conda install matplotlib numpy cython
conda install pytorch torchvision cuda90 -c pytorch
conda install aria2 -c bioconda
pip install -U visdom opencv-python cupy-cuda90 pynvrtc tqdm

You will also need the ReferIt loader library, which you can clone from https://github.com/andfoy/refer. To install it with pip, run:

pip install git+https://github.com/andfoy/refer.git

Finally, you will need to install the Simple Recurrent Unit (SRU):

pip install -U git+https://github.com/taolei87/sru.git@43c85ed --no-deps

Conda packages will be created in future releases.

Dataset download

You must also download the ReferIt, UNC, UNC+ and GRef datasets. To do so, we provide the download_dataset.sh bash script, which takes care of the required downloads.

bash download_data --path $PATH_TO_STORE_THE_DATASETS

Datasets

| Dataset Name | Original Name | Splits |
|---|---|---|
| referit | RefCLEF | train, val, trainval, test |
| unc | RefCOCO | train, val, testA, testB |
| unc+ | RefCOCO+ | train, val, testA, testB |
| gref | RefCOCOg | train, val |
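Because each dataset exposes a different set of splits, it can help to check a dataset/split pair before launching a long run. The following is a minimal sketch that copies the table above into a dictionary. The helper name `check_split` is hypothetical and is not part of dmn_pytorch.

```python
# Dataset names and their available splits, copied from the table above.
DATASET_SPLITS = {
    "referit": ("train", "val", "trainval", "test"),
    "unc": ("train", "val", "testA", "testB"),
    "unc+": ("train", "val", "testA", "testB"),
    "gref": ("train", "val"),
}


def check_split(dataset, split):
    """Hypothetical helper: raise ValueError if the pair is not in the table."""
    if dataset not in DATASET_SPLITS:
        raise ValueError(
            f"unknown dataset {dataset!r}; choose from {sorted(DATASET_SPLITS)}"
        )
    if split not in DATASET_SPLITS[dataset]:
        raise ValueError(
            f"{dataset!r} has no split {split!r}; "
            f"choose from {DATASET_SPLITS[dataset]}"
        )
    return dataset, split
```

For example, `check_split("unc", "testA")` returns the pair unchanged, while `check_split("gref", "testA")` raises a `ValueError`, since the table lists only train and val for gref.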

Training

To train the model, you need to provide the path to the directory that contains the aforementioned datasets, as well as the other parameters required for training. To train the model with the low-resolution setup described in the original paper, run:

python -u -m dmn_pytorch.train --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --val $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --save-folder $PATH_TO_STORE_WEIGHT_SNAPSHOTS --snapshot $PATH_TO_THE_SNAPSHOT_FILE --accum-iters 1

To train the model at high resolution, add the --high-res and --upsamp-amplification 32 flags to the previous command. Note: the snapshot file must correspond to the low-resolution weights.

To inspect all the available parameters and their description, please execute python -m dmn_pytorch.train --help. Please refer to the datasets table displayed above to get more information about the dataset names and their respective available splits.
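When running several experiments, the training command can be assembled from Python instead of edited by hand. The sketch below uses only the flags shown in this section. The helper name `build_train_command` and the example paths are hypothetical placeholders, and treating the high-resolution setup as a boolean switch is a convenience of this sketch, not something the package provides.

```python
import shlex
import sys


def build_train_command(data, dataset, val_split, save_folder, snapshot,
                        high_res=False):
    """Hypothetical helper: build the dmn_pytorch.train command as a list."""
    cmd = [
        sys.executable, "-u", "-m", "dmn_pytorch.train",
        "--data", data,
        "--dataset", dataset,
        "--val", val_split,
        "--backend", "dpn92",
        "--num-filters", "10",
        "--lang-layers", "3",
        "--mix-we",
        "--save-folder", save_folder,
        "--snapshot", snapshot,
        "--accum-iters", "1",
    ]
    if high_res:
        # High-resolution phase: the snapshot must hold low-resolution weights.
        cmd += ["--high-res", "--upsamp-amplification", "32"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own locations.
    cmd = build_train_command(
        "/path/to/datasets", "unc", "testA",
        "/path/to/snapshots", "/path/to/low_res_snapshot",
        high_res=True,
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets `subprocess.run` receive each argument as-is, with no shell quoting needed for paths that contain spaces.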

Evaluation

To evaluate the model, pass the --eval-first and --epochs 0 flags to dmn_pytorch.train as follows:

python -u -m dmn_pytorch.train --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --val $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --save-folder $PATH_TO_STORE_WEIGHT_SNAPSHOTS --snapshot $PATH_TO_THE_SNAPSHOT_FILE --epochs 0 --eval-first

Results Visualization

You can also visualize the results of the DMN model with a set of pretrained weights on visdom. To do so, run the dmn_pytorch.visdom_display script as follows:

python -m dmn_pytorch.visdom_display --data $PATH_TO_STORE_THE_DATASETS --dataset $DATASET --split $SPLIT_TO_EVALUATE --backend dpn92 --num-filters 10 --lang-layers 3 --mix-we --num-images $NUMBER_OF_EXAMPLES_TO_DISPLAY --snapshot $PATH_TO_THE_SNAPSHOT_FILE --no-eval --visdom http://$HOST:$PORT --env $NAME_OF_THE_VISDOM_ENV

Performance

The pretrained weights provided below were trained in two phases. In the low-resolution phase, the DMN was trained on UNC for 24 epochs with a constant learning rate, and the resulting weights were then fine-tuned on the remaining datasets for 10 epochs. In the high-resolution phase, training continued on all the datasets from the weights of the previous phase for a total of 4 epochs.

| Dataset | Examples | High-Resolution Pretrained Weights | Splits | Performance (mIoU) |
|---|---|---|---|---|
| Referit | Referit Examples | Link | val | 0.5328 |
| | | | test | 0.5281 |
| UNC | UNC Examples | Link | val | 0.4978 |
| | | | testA | 0.5484 |
| | | | testB | 0.4520 |
| UNC+ | UNC+ Examples | Link | val | 0.3888 |
| | | | testA | 0.4425 |
| | | | testB | 0.3249 |
| GRef | GRef Examples | Link | val | 0.3764 |
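For readers unfamiliar with the metric, the sketch below shows one common way to compute a mean IoU over binary segmentation masks. It assumes that mIoU is the average of per-example IoU values. The evaluation code in dmn_pytorch may aggregate differently (for instance, by accumulating intersections and unions over the whole split), so this is an illustration of the metric and not the repository's implementation. The function names are hypothetical.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; counting this as perfect agreement is a
        # convention chosen for this sketch.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(preds, targets):
    """Average of per-example IoU values (assumed definition of mIoU)."""
    return float(np.mean([mask_iou(p, t) for p, t in zip(preds, targets)]))
```

As a small check, a prediction `[[1, 1], [0, 0]]` against a target `[[1, 0], [0, 0]]` shares one pixel out of a union of two, giving an IoU of 0.5.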

External Installation

The DMN can be imported and used as a regular Python package in your scripts. To install it with pip, run the following from the repository root:

pip install -U .

Then you can import it as follows:

from dmn_pytorch import DMN

Contribution Guidelines

We follow the PEP8 and PEP257 style guidelines. Feel free to send a PR or create an issue if you have any problems or questions.

Citation

@article{margffoy2018dmn,
  title={Dynamic Multimodal Instance Segmentation guided by natural language queries},
  author={{Margffoy-Tuay}, E. and {P{\'e}rez}, J.~C. and {Botero}, E. and
	{Arbel{\'a}ez}, P.},
  journal={European Conference on Computer Vision (ECCV)},
  year={2018}
}