Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li*, Xingao Li*, Changxing Ding, Xiangmin Xu

This paper has been accepted to CVPR 2024.

Preparation

Environment

1. Install the dependencies.

   ```shell
   pip install -r requirements.txt
   ```

2. Clone and build CLIP.

   ```shell
   git clone https://github.com/openai/CLIP.git && cd CLIP && python setup.py develop && cd ..
   ```

Dataset

1. HAA500 dataset

   Download the Haa500_v1_1 dataset and unzip it to the `DP-HOI/data/action` folder, then run `pre_haa500.py`:

   ```shell
   python ./pre_datasets/pre_haa500.py
   ```

2. Kinetics700 dataset

   Download the Kinetics700 dataset and unzip it to the `DP-HOI/data/action` folder, then run `pre_kinetics700.py`:

   ```shell
   python ./pre_datasets/pre_kinetics700.py
   ```

3. Flickr30k dataset

   Download the Flickr30k dataset and unzip it directly to the `DP-HOI/data/caption` folder.

4. VG dataset

   Download the VG dataset and unzip it directly to the `DP-HOI/data/caption` folder.

   Download the processed `annotations.zip` and unzip it to the `DP-HOI/data/caption/annotations` folder.

5. Objects365 dataset

   Download the Objects365 dataset and unzip it directly to the `DP-HOI/data/detection` folder.

6. COCO dataset

   Download the COCO dataset and unzip it directly to the `DP-HOI/data/detection` folder.

   Download the processed `coco_objects365_200k.json` and move it to the `DP-HOI/data/detection/annotations` folder.
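The merged detection annotation file is expected to follow the standard COCO format. For orientation, here is a minimal COCO-style record (field names follow the public COCO conventions; `coco_objects365_200k.json` itself may contain additional fields):

```python
import json

# Minimal COCO-style annotation structure (assumed layout based on the
# public COCO format; the actual file may carry more fields).
ann = {
    "images": [{"id": 1, "file_name": "000000000001.jpg",
                "height": 480, "width": 640}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [10.0, 20.0, 100.0, 200.0],  # [x, y, w, h]
                     "area": 20000.0, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "person"}],
}

# Round-trip through JSON to confirm the structure serializes cleanly.
restored = json.loads(json.dumps(ann))
print(len(restored["images"]), len(restored["annotations"]))  # → 1 1
```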

When you have completed the above steps, the pre-training dataset structure is:

```
DP-HOI
 |─ data
 |   └─ action
 |     └─ haa500
 |       |─ annotations
 |       |   |─ train_haa500.json
 |       |─ images
 |       |─ videos
 |     └─ kinetics-700
 |       |─ annotations
 |       |   |─ train_kinetics700.json
 |       |─ images
 |       |─ videos
 |   └─ caption
 |     └─ annotations
 |       |─ Flickr30k_VG_cluster_dphoi.json
 |       |─ triplets_category.txt
 |       |─ triplets_features.pth
 |     └─ Flickr30k
 |       |─ images
 |     └─ VG
 |       |─ images
 |   └─ detection
 |     └─ annotations
 |       |─ coco_objects365_200k.json
 |     └─ coco
 |       |─ images
 |       |─ annotations
 |       |   |─ instances_val2017.json
 |     └─ objects365
 |       |─ images
```
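Before launching a long pre-training run, it can be worth sanity-checking that the key files and folders above actually exist. This is an optional helper, not part of the repository (the path list is transcribed from the structure above):

```python
from pathlib import Path

# Key files/folders from the layout above, relative to the DP-HOI/ root.
EXPECTED = [
    "data/action/haa500/annotations/train_haa500.json",
    "data/action/kinetics-700/annotations/train_kinetics700.json",
    "data/caption/annotations/Flickr30k_VG_cluster_dphoi.json",
    "data/caption/Flickr30k/images",
    "data/caption/VG/images",
    "data/detection/annotations/coco_objects365_200k.json",
    "data/detection/coco/images",
    "data/detection/objects365/images",
]

def missing_paths(root):
    """Return the expected dataset paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    for p in missing_paths("."):
        print("missing:", p)
```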

Initial parameters

To speed up the pre-training process, consider using DETR's pre-trained weights for initialization. Download the pre-trained DETR detector model for ResNet50 and put it into the `params` directory.

Pre-training

After the preparation, you can start pre-training with the following command.

```shell
sh ./scripts/pretrain/train.sh
```

Fine-tuning

After pre-training, you can start fine-tuning with the following commands. An example of fine-tuning on HOICLIP is provided below.

```shell
python ./tools/convert_parameters.py \
        --finetune_model hoiclip \
        --load_path params/dphoi_res50_3layers.pth \
        --save_path params/dphoi_res50_hico_hoiclip.pth \
        --dataset hico \
        --num_queries 64
sh ./scripts/finetune/hoiclip/train_hico.sh
```
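`convert_parameters.py` adapts the pre-trained checkpoint to the fine-tuning model, which typically means renaming parameter keys and dropping heads the downstream model does not use. The general idea can be sketched on plain dictionaries (the key names and mapping here are hypothetical; the actual script's logic may differ):

```python
def convert_state_dict(state_dict, rename_map, drop_prefixes=()):
    """Rename checkpoint keys and drop ones the target model does not use.

    rename_map:    {old_prefix: new_prefix} applied to matching keys.
    drop_prefixes: keys starting with any of these are discarded.
    """
    out = {}
    for key, value in state_dict.items():
        if any(key.startswith(p) for p in drop_prefixes):
            continue
        new_key = key
        for old, new in rename_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        out[new_key] = value
    return out

# Hypothetical example: reuse the backbone/encoder, drop a pre-training head.
ckpt = {
    "backbone.conv1.weight": 1,
    "transformer.encoder.layer0.weight": 2,
    "caption_head.weight": 3,
}
converted = convert_state_dict(
    ckpt,
    rename_map={"transformer.encoder.": "encoder."},
    drop_prefixes=("caption_head.",),
)
print(sorted(converted))  # → ['backbone.conv1.weight', 'encoder.layer0.weight']
```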

Pre-trained model

You can also directly download the pre-trained DP-HOI model for ResNet50.

Results

HICO-DET

| Method | Full (D) | Rare (D) | Non-rare (D) | Model | Config |
| --- | --- | --- | --- | --- | --- |
| ours (UPT) | 33.36 | 28.74 | 34.75 | model | config |
| ours (PViC) | 35.77 | 32.26 | 36.81 | model | config |
| ours (CDN-S) | 35.00 | 32.38 | 35.78 | model | config |
| ours (CDN-S+CCS*) | 35.38 | 34.61 | 35.61 | model | config |
| ours (HOICLIP) | 36.56 | 34.36 | 37.22 | model | config |

D: Default, †: DN strategy from DN-DETR, *: data augmentation strategy from DOQ. The weights fine-tuned on HICO-DET for two-stage methods (e.g., UPT and PViC) can be downloaded here.

V-COCO

| Method | Scenario 1 | Model | Config |
| --- | --- | --- | --- |
| ours (GENs) | 66.6 | model | config |

Zero-shot HOI Detection Results

| Method | Type | Unseen | Seen | Full | Model | Config |
| --- | --- | --- | --- | --- | --- | --- |
| ours (HOICLIP) | UV | 26.30 | 34.49 | 33.34 | model | config |
| ours (HOICLIP) | RF-UC | 30.49 | 36.17 | 35.03 | model | config |
| ours (HOICLIP) | NF-UC | 28.87 | 29.98 | 29.76 | model | config |

Citation

Please consider citing our paper if it helps your research.

```bibtex
@inproceedings{disentangled_cvpr2024,
  author    = {Zhuolong Li and Xingao Li and Changxing Ding and Xiangmin Xu},
  title     = {Disentangled Pre-training for Human-Object Interaction Detection},
  booktitle = {CVPR},
  year      = {2024},
}
```

Acknowledgement

Our code is built on DETR, DN-DETR, and CLIP. We thank the authors for their contributions.
