Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention

Official implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention" (Paper)

News

2025.02: Added checkpoints for the TPAMI version.
2024.10: Our paper has been recognized as a Best Paper Candidate at ECCV 2024 (Milano, Italy).

Setup

For simplicity, you can directly run bash install.sh, which includes the following steps:

Install PyTorch 1.9.1 and the other dependencies, e.g.,

pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html # the cu111 builds may need to be changed to match your CUDA driver version

pip install -r requirements.txt
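Before moving on, it can help to confirm that the pinned versions actually landed in the active environment. The following is a minimal, stdlib-only sketch (Python 3.8+ for `importlib.metadata`); the helper name `check_versions` is hypothetical and not part of this repository, and the pins simply mirror the pip command above.

```python
# Hypothetical helper: compare installed package versions with the pins above.
from importlib.metadata import PackageNotFoundError, version


def check_versions(expected):
    """Return {package: (expected, found_or_None)} for every mismatch."""
    problems = {}
    for pkg, want in expected.items():
        try:
            found = version(pkg)
        except PackageNotFoundError:
            found = None
        # Compare only the public version so "1.9.1+cu111" matches "1.9.1".
        if found is None or found.split("+")[0] != want:
            problems[pkg] = (want, found)
    return problems


if __name__ == "__main__":
    pins = {"torch": "1.9.1", "torchvision": "0.10.1", "torchaudio": "0.9.1"}
    for pkg, (want, found) in check_versions(pins).items():
        print(f"{pkg}: expected {want}, found {found}")
```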

Install GroundingDINO and download the pre-trained weights:

(cd GroundingDINO && python3 setup.py install)

mkdir -p $PWD/GroundingDINO/weights/

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth -O $PWD/GroundingDINO/weights/groundingdino_swint_ogc.pth

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth -O $PWD/GroundingDINO/weights/groundingdino_swinb_cogcoor.pth

Dataset

- VG150
- COCO

Prepare the datasets under the data folder following the instructions.

Closed-set SGG

To train OvSGTR (w. Swin-T) on VG150, run

bash scripts/DINO_train_dist.sh vg ./config/GroundingDINO_SwinT_OGC_full.py  ./data  ./logs/ovsgtr_vg_swint_full ./GroundingDINO/weights/groundingdino_swint_ogc.pth

or

bash scripts/DINO_train_dist.sh vg ./config/GroundingDINO_SwinB_full.py  ./data  ./logs/ovsgtr_vg_swinb_full ./GroundingDINO/weights/groundingdino_swinb_cogcoor.pth

to use the Swin-B backbone. You might need to change the default CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 in the script to match your machine. Note that the effective batch size equals the per-GPU batch size (4 by default in the config files) times the number of GPUs. For inference, run

bash scripts/DINO_eval.sh vg [config file] [data path] [output path] [checkpoint]

or

bash scripts/DINO_eval_dist.sh vg [config file] [data path] [output path] [checkpoint]

with multiple GPUs. Results from DINO_eval.sh and DINO_eval_dist.sh differ slightly because of how the data is divided across GPUs and gathered.
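Two multi-GPU details above, the effective batch size and the small gap between single- and multi-GPU evaluation, can be made concrete with a short stdlib-only sketch. The first function is just the per-GPU batch size times the number of listed devices. The other two mimic how `torch.utils.data.DistributedSampler` (with `shuffle=False`, `drop_last=False`) pads the index list so every GPU receives the same number of samples, which means a few images are evaluated twice unless duplicates are removed after gathering. Whether DINO_eval_dist.sh splits data exactly this way is an assumption, and all function names are hypothetical.

```python
# Hypothetical helpers illustrating two multi-GPU details; not part of OvSGTR.
import math
import os


def effective_batch_size(per_gpu_batch=4, visible_devices=None):
    """Per-GPU batch size times the number of GPUs in CUDA_VISIBLE_DEVICES."""
    if visible_devices is None:
        # Fall back to the script's default device list when the variable is unset.
        visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "0,1,2,3,4,5,6,7")
    num_gpus = len([d for d in visible_devices.split(",") if d.strip()])
    return per_gpu_batch * num_gpus


def shard_indices(num_samples, num_replicas):
    """Mimic DistributedSampler (no shuffle, no drop_last): pad, then stride."""
    total_size = math.ceil(num_samples / num_replicas) * num_replicas
    indices = list(range(num_samples))
    padding = total_size - num_samples
    if padding:
        # Repeat samples from the start so every rank gets the same count.
        indices += (indices * math.ceil(padding / num_samples))[:padding]
    return [indices[rank:total_size:num_replicas] for rank in range(num_replicas)]


def duplicates_after_gather(num_samples, num_replicas):
    """Number of extra (repeated) evaluations once all shards are concatenated."""
    gathered = [i for shard in shard_indices(num_samples, num_replicas) for i in shard]
    return len(gathered) - len(set(gathered))


if __name__ == "__main__":
    print(effective_batch_size(4, "0,1,2,3,4,5,6,7"))  # 32
    print(shard_indices(10, 4))  # [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
    print(duplicates_after_gather(10, 4))  # 2
```

With 10 images on 4 GPUs, images 0 and 1 are scored twice; averaged recall therefore shifts slightly compared with a single-process run.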

Checkpoints

backbone	R@20/50/100	Checkpoint	Config
Swin-T	26.97 / 35.82 / 41.38	link	config/GroundingDINO_SwinT_OGC_full.py
Swin-T (w. pre-trained on [MegaSG](https://arxiv.org/pdf/2411.15435) dataset)	27.34 / 36.27 / 41.95	link	config/GroundingDINO_SwinT_OGC_full.py
Swin-B	27.75 / 36.44 / 42.35	link	config/GroundingDINO_SwinB_full.py
Swin-B (w.o. frequency bias, focal loss)	27.53 / 36.18 / 41.79	link	config/GroundingDINO_SwinB_full_open.py
Swin-B (w. pre-trained on MegaSG dataset)	28.61 / 37.58 / 43.41	link	config/GroundingDINO_SwinB_full_open.py
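The R@20/50/100 columns report triplet recall at K. As a rough illustration of what that number measures, here is a simplified sketch under common SGG conventions: a ground-truth triplet is recalled if one of the K highest-scoring predictions has the same subject, predicate, and object labels and both boxes reach IoU ≥ 0.5, and per-image recalls are averaged over images. It omits details such as the graph constraint, and it is not the evaluator used by this repository; all names are hypothetical.

```python
# Simplified, hypothetical Recall@K for scene graph triplets.
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Triplet = Tuple[str, Box, str, str, Box]  # (subject, subject_box, predicate, object, object_box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matches(pred: Triplet, gt: Triplet, thr: float = 0.5) -> bool:
    """Same three labels and both boxes overlap the ground truth by >= thr IoU."""
    same_labels = pred[0] == gt[0] and pred[2] == gt[2] and pred[3] == gt[3]
    return same_labels and iou(pred[1], gt[1]) >= thr and iou(pred[4], gt[4]) >= thr


def recall_at_k(preds: Sequence[Tuple[float, Triplet]], gts: Sequence[Triplet], k: int) -> float:
    """Fraction of ground-truth triplets hit by the k highest-scoring predictions."""
    if not gts:
        return 0.0
    top_k = [t for _, t in sorted(preds, key=lambda p: p[0], reverse=True)[:k]]
    hits = sum(any(matches(p, g) for p in top_k) for g in gts)
    return hits / len(gts)


def dataset_recall_at_k(images: List[Tuple[list, list]], k: int) -> float:
    """Average per-image Recall@K over images with at least one ground truth."""
    scores = [recall_at_k(preds, gts, k) for preds, gts in images if gts]
    return sum(scores) / len(scores) if scores else 0.0
```

Because a larger K admits more predictions, R@20 ≤ R@50 ≤ R@100 always holds, as in every row of the tables.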

OvD-SGG

For OvD-SGG mode, set sg_ovd_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovd.py). Following "Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning" and VS3, we split the VG150 object categories into two parts: base objects (VG150_BASE_OBJ_CATEGORIES) and novel objects (VG150_NOVEL2BASE). For PREDCLS, set use_gt_box=True when calling the inference scripts.
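To make the base/novel object split concrete, the sketch below selects the ground-truth triplets that involve at least one novel object, assuming that is how a triplet qualifies for the "Novel" columns. The category names are illustrative placeholders, not the actual split defined by VG150_BASE_OBJ_CATEGORIES and VG150_NOVEL2BASE, and `novel_object_triplets` is a hypothetical helper.

```python
# Hypothetical filter for novel-object triplets in an OvD-SGG style split.
from typing import Iterable, List, Set, Tuple

LabelTriplet = Tuple[str, str, str]  # (subject, predicate, object)


def novel_object_triplets(triplets: Iterable[LabelTriplet],
                          base_objects: Set[str]) -> List[LabelTriplet]:
    """Keep triplets whose subject or object category is not a base category."""
    return [t for t in triplets
            if t[0] not in base_objects or t[2] not in base_objects]


if __name__ == "__main__":
    # Placeholder categories, not the real VG150 base/novel split.
    base = {"man", "horse", "shirt"}
    gts = [("man", "riding", "horse"), ("man", "holding", "kite"), ("dog", "on", "grass")]
    print(novel_object_triplets(gts, base))
```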

Checkpoints

backbone	R@20/50/100 (Base+Novel)	R@20/50/100 (Novel)	Checkpoint	Config
Swin-T	12.34 / 18.14 / 23.20	6.90 / 12.06 / 16.49	link	config/GroundingDINO_SwinT_OGC_ovd.py
Swin-B	15.43 / 21.35 / 26.22	10.21 / 15.58 / 19.96	link	config/GroundingDINO_SwinB_ovd.py
Swin-T (w. pre-trained on MegaSG dataset)	14.33 / 20.91 / 25.98	10.52 / 17.30 / 22.90	link	config/GroundingDINO_SwinT_OGC_ovd.py
Swin-B (w. pre-trained on MegaSG dataset)	15.21 / 21.21 / 26.12	10.31 / 15.78 / 20.47	link	config/GroundingDINO_SwinT_OGC_ovd.py

OvR-SGG

For OvR-SGG mode, set sg_ovr_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovr.py). The base predicate categories (VG150_BASE_PREDICATE) and novel predicate categories (VG150_NOVEL_PREDICATE) can be found in datasets/vg.py.
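In an open-vocabulary relation setting, the novel predicates should not be seen as supervision during training. A minimal sketch of that filtering step follows, assuming a hypothetical per-image annotation dict with "objects" and "relations" keys; the predicate names used in practice would come from VG150_BASE_PREDICATE / VG150_NOVEL_PREDICATE, and `drop_novel_relations` is not a function from this repository.

```python
# Hypothetical sketch: hide novel-predicate relations from a training annotation.
from typing import Dict, Set


def drop_novel_relations(annotation: Dict, novel_predicates: Set[str]) -> Dict:
    """Return a copy of one image's annotation without novel-predicate relations.

    In this sketch only relation entries are filtered; the object list is kept,
    so objects that took part in a novel relation still appear as objects.
    """
    kept = [r for r in annotation["relations"] if r["predicate"] not in novel_predicates]
    return {**annotation, "relations": kept}


if __name__ == "__main__":
    ann = {
        "objects": ["man", "horse"],
        "relations": [
            {"subject": 0, "object": 1, "predicate": "riding"},
            {"subject": 0, "object": 1, "predicate": "near"},
        ],
    }
    # "riding" is an illustrative novel predicate, not the actual split.
    print(drop_novel_relations(ann, {"riding"}))
```

Returning a copy keeps the original annotation intact, which is convenient when the same data is reused for evaluation on base and novel predicates.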

Checkpoints

backbone	R@20/50/100 (Base+Novel)	R@20/50/100 (Novel)	Checkpoint	Config	Pre-trained checkpoint	Pre-trained config
Swin-T	15.85 / 20.50 / 23.90	10.17 / 13.47 / 16.20	link	config/GroundingDINO_SwinT_OGC_ovr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B	17.63 / 22.90 / 26.68	12.09 / 16.37 / 19.73	link	config/GroundingDINO_SwinB_ovr.py	link	config/GroundingDINO_SwinB_pretrain.py
Swin-T (pretrained on MegaSG)	19.38 / 25.40 / 29.71	12.23 / 17.02 / 21.15	link	config/GroundingDINO_SwinT_OGC_ovr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B (pretrained on MegaSG)	21.09 / 27.92 / 32.74	16.59 / 22.86 / 27.73	link	config/GroundingDINO_SwinB_ovr.py	~~link~~	config/GroundingDINO_SwinB_pretrain.py

OvD+R-SGG

For OvD+R-SGG mode, set both sg_ovd_mode = True and sg_ovr_mode = True in the config file (e.g., config/GroundingDINO_SwinT_OGC_ovdr.py).

Checkpoints

backbone	R@20/50/100 (Joint)	R@20/50/100 (Novel Object)	R@20/50/100 (Novel Relation)	Checkpoint	Config	Pre-trained checkpoint	Pre-trained config
Swin-T	10.02 / 13.50 / 16.37	10.56 / 14.32 / 17.48	7.09 / 9.19 / 11.18	link	config/GroundingDINO_SwinT_OGC_ovdr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B	12.37 / 17.14 / 21.03	12.63 / 17.58 / 21.70	10.56 / 14.62 / 18.22	link	config/GroundingDINO_SwinB_ovdr.py	link	config/GroundingDINO_SwinB_pretrain.py
Swin-T (pretrained on MegaSG)	10.67 / 15.15 / 18.82	8.22 / 12.49 / 16.29	9.62 / 13.68 / 17.19	link	config/GroundingDINO_SwinT_OGC_ovdr.py	~~link~~	config/GroundingDINO_SwinT_OGC_pretrain.py
Swin-B (pretrained on MegaSG)	12.54 / 17.84 / 21.95	10.29 / 15.66 / 19.84	12.21 / 17.15 / 21.05	link	config/GroundingDINO_SwinB_ovdr.py	~~link~~	config/GroundingDINO_SwinB_pretrain.py

Acknowledgement

We thank Scene-Graph-Benchmark.pytorch and GroundingDINO for their excellent code and models.

Citation

Please cite OvSGTR in your publications if it helps your research:

@inproceedings{chen2024expanding,
  title={Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention},
  author={Chen, Zuyao and Wu, Jinlin and Lei, Zhen and Zhang, Zhaoxiang and Chen, Changwen},
  booktitle={European Conference on Computer Vision (ECCV)},
  pages={108--124},
  year={2024}
}
