Vision Transformers are Good Mask Auto-Labelers
Shiyi Lan, Xitong Yang, Zhiding Yu, Zuxuan Wu, Jose M. Alvarez, Anima Anandkumar
Accepted by Conference on Computer Vision and Pattern Recognition (CVPR) 2023.
- Please refer to the Dockerfile in the root directory for the environment specification. We also provide a Docker image here.
python main.py
The training scripts are copied from MMDetection (mmdet).
To train a model, e.g. ResNet-50/SOLOv2, with 8 GPUs:
cd mmdet;
bash tools/dist_train.sh configs/MALMask/solov2_r50_fpn_3x_coco_mal.py 8
For more details, please refer to the documentation or GitHub repository of MMDetection.
python main.py --resume PATH/TO/WEIGHTS --label_dump_path PATH/TO/PSEUDO_LABELS_OUTPUT --not_eval_mask
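Once the pseudo-labels have been dumped, a quick sanity check can catch an empty or truncated file before it is used to train a downstream model. The sketch below assumes the dump is a COCO-style annotation JSON with top-level `images`, `annotations`, and `categories` keys; that layout is an assumption and is not stated above. The function names `summarize_pseudo_labels` and `load_and_summarize` are hypothetical helpers, not part of this repository.

```python
import json
from collections import Counter


def summarize_pseudo_labels(coco):
    """Summarize a COCO-style annotation dict (layout is an assumption)."""
    annotations = coco.get("annotations", [])
    # An annotation counts as "masked" if its segmentation field is non-empty.
    with_mask = sum(1 for ann in annotations if ann.get("segmentation"))
    per_category = Counter(ann["category_id"] for ann in annotations)
    return {
        "num_images": len(coco.get("images", [])),
        "num_annotations": len(annotations),
        "num_with_mask": with_mask,
        "per_category": dict(per_category),
    }


def load_and_summarize(path):
    """Load the JSON file at `path` and summarize it."""
    with open(path) as f:
        return summarize_pseudo_labels(json.load(f))
```

If `num_with_mask` is much smaller than `num_annotations`, the dump is probably incomplete and worth regenerating before training.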
To evaluate an instance segmentation model, e.g. ResNet-50/SOLOv2, with 8 GPUs:
bash tools/dist_test.sh configs/MALMask/solov2_r50_fpn_3x_coco_mal.py solov2_r50_fpn_3x_coco_essenco/latest.pth 8 --eval segm
To generate results of instance segmentation models, e.g. ResNet-50/SOLOv2, with 8 GPUs:
bash tools/dist_test.sh configs/MALMask/solov2_r50_fpn_3x_coco_mal.py solov2_r50_fpn_3x_coco_essenco/latest.pth 8 --format-only --options "jsonfile_prefix=work_dirs/solov2_r50_fpn_3x_coco_essenco/test-dev.json"
For more details, please refer to the documentation or GitHub repository of MMDetection.
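Before uploading generated results to an evaluation server, it can help to confirm that every entry is well formed. The sketch below assumes the output follows the standard COCO results format for instance segmentation, i.e. a JSON list of objects with `image_id`, `category_id`, `segmentation`, and `score`. The helper names and the default threshold are hypothetical. Note also that MMDetection 2.x generally treats `jsonfile_prefix` as a prefix and appends a suffix such as `.segm.json`, so check the actual file name in the output directory.

```python
import json

# Keys required by the COCO results format for instance segmentation.
REQUIRED_KEYS = ("image_id", "category_id", "segmentation", "score")


def find_malformed(results):
    """Return the indices of entries that miss at least one required key."""
    return [
        i for i, entry in enumerate(results)
        if any(key not in entry for key in REQUIRED_KEYS)
    ]


def filter_by_score(results, score_thr=0.3):
    """Keep only entries whose confidence is at least `score_thr`."""
    return [entry for entry in results if entry["score"] >= score_thr]


def load_results(path):
    """Load a results JSON file (expected to be a list of dicts)."""
    with open(path) as f:
        return json.load(f)
```

An empty list from `find_malformed` means every entry carries the fields an evaluator expects; `filter_by_score` is only useful for visual inspection, since COCO evaluation should receive the unfiltered results.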
ViT-MAE-base (COCO) | MAL-ViT-base (LVIS v1.0) |
---|---|
download | download |
MAL-ViT-base (COCO train2017) | MAL-ViT-base (LVIS v1.0 train) |
---|---|
download | download |
Encoder | Decoder | weights |
---|---|---|
ResNet-50 | SOLOv2 | download |
ResNet-101-DCN | SOLOv2 | download |
ResNeXt-101-DCN | SOLOv2 | download |
ConvNeXt-s | Cascade MR-CNN | download |
ConvNeXt-b | Cascade MR-CNN | download |
Swin-s | Mask2Former | download |
No, we do not use mask. Check this
You have to rebuild the Docker image yourself, since your NVIDIA driver version is likely different from ours and MMCV contains custom operators that must be compiled for your environment.
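After rebuilding the image, a short import check can show whether the compiled MMCV operators load in the new environment. This is a minimal sketch: `try_import` is a hypothetical helper, and it only verifies that a module imports, not that its kernels run correctly on your GPU.

```python
import importlib


def try_import(module_name):
    """Try to import `module_name` and return (ok, info).

    `info` is the module's version string on success, or the error
    type and message on failure.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # A stale binary can raise ImportError or OSError (undefined symbol).
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__version__", "unknown version")
```

Inside the container, calling `try_import("mmcv.ops")` should return `True` if the custom operators were built against the installed PyTorch and CUDA versions; a failure message mentioning an undefined symbol usually points to a version mismatch.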
Copyright © 2022, NVIDIA Corporation. All rights reserved.
This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.
The pre-trained models are shared under CC-BY-NC-SA-4.0. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing
This repository is partly based on PyTorch Image Models (timm), MMDetection, and DINO. It is built on PyTorch Lightning.