[arXiv] | [PDF]
This repository contains the code for the CVPR 2024 paper:
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
Shuting He, Henghui Ding
CVPR 2024
Please see INSTALL.md, then install the remaining dependencies:
pip install -r requirements.txt
python3 -m spacy download en_core_web_sm
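To sanity-check the setup, the optional snippet below (my own sketch, not part of the repo) just confirms that the spaCy model downloaded above can be loaded and run on a referring expression:

```python
import spacy

# Raises OSError if en_core_web_sm was not downloaded successfully.
nlp = spacy.load("en_core_web_sm")
doc = nlp("the person riding a bike")
print([(token.text, token.pos_) for token in doc])  # tokens and POS tags
```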
Obtain the output masks of the Val_u set:
python train_net_dshmp.py \
--config-file configs/dshmp_swin_tiny.yaml \
--num-gpus 8 --dist-url auto --eval-only \
MODEL.WEIGHTS [path_to_weights] \
OUTPUT_DIR [output_dir]
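This writes the predicted segmentation masks under OUTPUT_DIR. A quick way to check that inference produced output, assuming the masks are saved as PNG files (the exact directory layout is determined by the inference code; the path below is a placeholder):

```python
from pathlib import Path

output_dir = Path("output_dir")  # placeholder for your OUTPUT_DIR
masks = sorted(output_dir.rglob("*.png"))
print(f"found {len(masks)} mask files under {output_dir}")
```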
Obtain the J&F results on the Val_u set:
python tools/eval_mevis.py
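J&F is the standard video-segmentation metric: the mean of region similarity J (mask IoU) and boundary accuracy F. tools/eval_mevis.py computes both; the sketch below shows only the J part for a single frame pair, assuming binary NumPy masks:

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J metric: intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: count as a perfect match
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```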
Obtain the output masks of the Val set for CodaLab online evaluation:
python train_net_dshmp.py \
--config-file configs/dshmp_swin_tiny.yaml \
--num-gpus 8 --dist-url auto --eval-only \
MODEL.WEIGHTS [path_to_weights] \
OUTPUT_DIR [output_dir] DATASETS.TEST '("mevis_test",)'
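CodaLab submissions are typically uploaded as a single zip archive of the prediction folder. A minimal packaging sketch, assuming the masks were written to [output_dir]/inference (check the competition page for the exact layout the server expects):

```python
import shutil

# Zips output_dir/inference into submission.zip (illustrative paths only).
shutil.make_archive("submission", "zip", root_dir="output_dir/inference")
```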
First, download the backbone weights (model_final_86143f.pkl) and convert them using the scripts below:
wget https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/instance/maskformer2_swin_tiny_bs16_50ep/model_final_86143f.pkl
python tools/process_ckpt.py
python tools/get_refer_id.py
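For context, tools/process_ckpt.py adapts the Mask2Former checkpoint to this codebase. The pattern below is only a rough illustration of how a Detectron2-style .pkl checkpoint is typically loaded and re-saved with remapped keys; the actual renaming rules live in the script, and the key substitution shown here is purely hypothetical:

```python
import pickle

# Detectron2 .pkl checkpoints are pickled dicts, usually with a "model" key.
with open("model_final_86143f.pkl", "rb") as f:
    ckpt = pickle.load(f, encoding="latin1")
weights = ckpt.get("model", ckpt)

# Hypothetical remapping: rename parameters to the names the target model expects.
converted = {k.replace("sem_seg_head.", "head."): v for k, v in weights.items()}

with open("model_converted.pkl", "wb") as f:
    pickle.dump({"model": converted}, f)
```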
Then start training:
python train_net_dshmp.py \
--config-file configs/dshmp_swin_tiny.yaml \
--num-gpus 8 --dist-url auto \
MODEL.WEIGHTS [path_to_weights] \
OUTPUT_DIR [output_dir]
Note: we train on a machine with 8 NVIDIA RTX 3090 GPUs (1 sample per GPU); training takes about 17 hours.
Trained model weights are available on Google Drive.
This project is based on MeViS. Many thanks to the authors for their great work!
Please consider citing DsHmp if it helps your research.
@inproceedings{DsHmp,
title={Decoupling static and hierarchical motion perception for referring video segmentation},
author={He, Shuting and Ding, Henghui},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={13332--13341},
year={2024}
}