
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention

Official implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention" (Paper).

[Figures: OvSGG task and OvSGTR framework]

News

  • 2025.02: added checkpoints for the TPAMI version
  • 2024.10: our paper was recognized as a "Best Paper Candidate" (ECCV 2024, Milano, Italy)

Setup

For simplicity, you can directly run bash install.sh, which includes the following steps:

  1. Install PyTorch 1.9.1 and other dependencies (a quick environment check is sketched after these steps), e.g.,
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html # the +cu111 builds may need to change to match your CUDA driver version

pip install -r requirements.txt
  2. Install GroundingDINO and download the pre-trained weights:
cd GroundingDINO && python3 setup.py install && cd ..
mkdir -p $PWD/GroundingDINO/weights/

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth -O $PWD/GroundingDINO/weights/groundingdino_swint_ogc.pth

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth -O $PWD/GroundingDINO/weights/groundingdino_swinb_cogcoor.pth
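
After installation, a quick check (a minimal sketch, run inside the environment created above) confirms that the installed PyTorch build matches your CUDA driver:

import torch

print("torch:", torch.__version__)                   # expect 1.9.1+cu111 (or your chosen build)
print("built for CUDA:", torch.version.cuda)         # should be supported by your CUDA driver
print("CUDA available:", torch.cuda.is_available())  # False usually indicates a build/driver mismatch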

Dataset

- VG150
- COCO

Prepare the datasets under the folder data following the provided instructions.

Closed-set SGG

To train OvSGTR (w/ Swin-T) on VG150, run:

bash scripts/DINO_train_dist.sh vg ./config/GroundingDINO_SwinT_OGC_full.py  ./data  ./logs/ovsgtr_vg_swint_full ./GroundingDINO/weights/groundingdino_swint_ogc.pth

or

bash scripts/DINO_train_dist.sh vg ./config/GroundingDINO_SwinB_full.py  ./data  ./logs/ovsgtr_vg_swinb_full ./GroundingDINO/weights/groundingdino_swinb_cogcoor.pth 

to use the Swin-B backbone. You might need to change the default CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 in the script. Note that the effective batch size = per-GPU batch size (default 4 in the config files) × the number of GPUs; e.g., 4 × 8 GPUs gives an effective batch size of 32. For inference, run:

bash scripts/DINO_eval.sh vg [config file] [data path] [output path] [checkpoint]

or

bash scripts/DINO_eval_dist.sh vg [config file] [data path] [output path] [checkpoint]

with multiple GPUs (results from DINO_eval.sh and DINO_eval_dist.sh differ slightly due to how data is divided and gathered across GPUs).

[Figure: benchmark on closed-set SGG]

Checkpoints

| Backbone | R@20/50/100 | Checkpoint | Config |
|----------|-------------|------------|--------|
| Swin-T | 26.97 / 35.82 / 41.38 | link | config/GroundingDINO_SwinT_OGC_full.py |
| Swin-T (w/ pre-training on the [MegaSG](https://arxiv.org/pdf/2411.15435) dataset) | 27.34 / 36.27 / 41.95 | link | config/GroundingDINO_SwinT_OGC_full.py |
| Swin-B | 27.75 / 36.44 / 42.35 | link | config/GroundingDINO_SwinB_full.py |
| Swin-B (w/o frequency bias, focal loss) | 27.53 / 36.18 / 41.79 | link | config/GroundingDINO_SwinB_full_open.py |
| Swin-B (w/ pre-training on the MegaSG dataset) | 28.61 / 37.58 / 43.41 | link | config/GroundingDINO_SwinB_full_open.py |

OvD-SGG

For OvD-SGG mode, set sg_ovd_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovd.py). Following "Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning" and VS3, we split VG150 object categories into base objects (VG150_BASE_OBJ_CATEGORIES) and novel objects (VG150_NOVEL2BASE). For PredCls, set use_gt_box=True when calling the inference scripts (see the config sketch below).
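
For illustration, the relevant lines of such a config might look like the sketch below (only the flags named in this README are shown; all other fields follow the closed-set configs):

# Illustrative excerpt of an OvD-SGG config such as config/GroundingDINO_SwinT_OGC_ovd.py
sg_ovd_mode = True   # enable open-vocabulary object detection for SGG (base-only object supervision)
use_gt_box = False   # set True for PredCls-style inference with ground-truth boxes (see note above)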


Checkpoints

| Backbone | R@20/50/100 (Base+Novel) | R@20/50/100 (Novel) | Checkpoint | Config |
|----------|--------------------------|---------------------|------------|--------|
| Swin-T | 12.34 / 18.14 / 23.20 | 6.90 / 12.06 / 16.49 | link | config/GroundingDINO_SwinT_OGC_ovd.py |
| Swin-B | 15.43 / 21.35 / 26.22 | 10.21 / 15.58 / 19.96 | link | config/GroundingDINO_SwinB_ovd.py |
| Swin-T (w/ pre-training on the MegaSG dataset) | 14.33 / 20.91 / 25.98 | 10.52 / 17.30 / 22.90 | link | config/GroundingDINO_SwinT_OGC_ovd.py |
| Swin-B (w/ pre-training on the MegaSG dataset) | 15.21 / 21.21 / 26.12 | 10.31 / 15.78 / 20.47 | link | config/GroundingDINO_SwinT_OGC_ovd.py |

OvR-SGG

For OvR-SGG mode, set sg_ovr_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovr.py). Base predicate categories (VG150_BASE_PREDICATE) and novel predicate categories (VG150_NOVEL_PREDICATE) can be found in datasets/vg.py.
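
A minimal sketch (assuming the repository root is on PYTHONPATH) to inspect the predicate split:

# The two category lists are defined in datasets/vg.py, as noted above.
from datasets.vg import VG150_BASE_PREDICATE, VG150_NOVEL_PREDICATE

print(len(VG150_BASE_PREDICATE), "base predicates")
print(len(VG150_NOVEL_PREDICATE), "novel predicates:", VG150_NOVEL_PREDICATE)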

Checkpoints

| Backbone | R@20/50/100 (Base+Novel) | R@20/50/100 (Novel) | Checkpoint | Config | Pre-trained checkpoint | Pre-trained config |
|----------|--------------------------|---------------------|------------|--------|-------------------------|--------------------|
| Swin-T | 15.85 / 20.50 / 23.90 | 10.17 / 13.47 / 16.20 | link | config/GroundingDINO_SwinT_OGC_ovr.py | link | config/GroundingDINO_SwinT_OGC_pretrain.py |
| Swin-B | 17.63 / 22.90 / 26.68 | 12.09 / 16.37 / 19.73 | link | config/GroundingDINO_SwinB_ovr.py | link | config/GroundingDINO_SwinB_pretrain.py |
| Swin-T (pre-trained on MegaSG) | 19.38 / 25.40 / 29.71 | 12.23 / 17.02 / 21.15 | link | config/GroundingDINO_SwinT_OGC_ovr.py | link | config/GroundingDINO_SwinT_OGC_pretrain.py |
| Swin-B (pre-trained on MegaSG) | 21.09 / 27.92 / 32.74 | 16.59 / 22.86 / 27.73 | link | config/GroundingDINO_SwinB_ovr.py | link | config/GroundingDINO_SwinB_pretrain.py |

OvD+R-SGG

For OvD+R-SGG mode, set both sg_ovd_mode = True and sg_ovr_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovdr.py); a sanity-check sketch follows below.
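
To avoid accidentally launching with only one of the two flags enabled, a small pre-launch check can verify the config. This is a sketch assuming GroundingDINO's SLConfig loader and the flag names above:

# Hedged sanity check: load the OvD+R config and confirm both flags are set.
from groundingdino.util.slconfig import SLConfig

cfg = SLConfig.fromfile("config/GroundingDINO_SwinT_OGC_ovdr.py")
assert getattr(cfg, "sg_ovd_mode", False), "OvD+R-SGG requires sg_ovd_mode = True"
assert getattr(cfg, "sg_ovr_mode", False), "OvD+R-SGG requires sg_ovr_mode = True"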

Checkpoints

| Backbone | R@20/50/100 (Joint) | R@20/50/100 (Novel Object) | R@20/50/100 (Novel Relation) | Checkpoint | Config | Pre-trained checkpoint | Pre-trained config |
|----------|---------------------|----------------------------|------------------------------|------------|--------|-------------------------|--------------------|
| Swin-T | 10.02 / 13.50 / 16.37 | 10.56 / 14.32 / 17.48 | 7.09 / 9.19 / 11.18 | link | config/GroundingDINO_SwinT_OGC_ovdr.py | link | config/GroundingDINO_SwinT_OGC_pretrain.py |
| Swin-B | 12.37 / 17.14 / 21.03 | 12.63 / 17.58 / 21.70 | 10.56 / 14.62 / 18.22 | link | config/GroundingDINO_SwinB_ovdr.py | link | config/GroundingDINO_SwinB_pretrain.py |
| Swin-T (pre-trained on MegaSG) | 10.67 / 15.15 / 18.82 | 8.22 / 12.49 / 16.29 | 9.62 / 13.68 / 17.19 | link | config/GroundingDINO_SwinT_OGC_ovdr.py | link | config/GroundingDINO_SwinT_OGC_pretrain.py |
| Swin-B (pre-trained on MegaSG) | 12.54 / 17.84 / 21.95 | 10.29 / 15.66 / 19.84 | 12.21 / 17.15 / 21.05 | link | config/GroundingDINO_SwinB_ovdr.py | link | config/GroundingDINO_SwinB_pretrain.py |

Acknowledgement

Thanks to Scene-Graph-Benchmark.pytorch and GroundingDINO for their awesome code and models.

Citation

Please cite OvSGTR in your publications if it helps your research:

@inproceedings{chen2024expanding,
  title={Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention},
  author={Chen, Zuyao and Wu, Jinlin and Lei, Zhen and Zhang, Zhaoxiang and Chen, Changwen},
  booktitle={European Conference on Computer Vision (ECCV)},
  pages={108--124},
  year={2024}
}
