Disentangled Pre-training for Human-Object Interaction Detection

Zhuolong Li*, Xingao Li*, Changxing Ding, Xiangmin Xu

This paper has been accepted to CVPR 2024.

Preparation

Environment

1. Install the dependencies.

   ```shell
   pip install -r requirements.txt
   ```

2. Clone and build CLIP.

   ```shell
   git clone https://github.com/openai/CLIP.git && cd CLIP && python setup.py develop && cd ..
   ```

Dataset

1. HAA500 dataset

   Download the Haa500_v1_1 dataset and unzip it to the `DP-HOI/data/action` folder, then run `pre_haa500.py`:

   ```shell
   python ./pre_datasets/pre_haa500.py
   ```

2. Kinetics700 dataset

   Download the Kinetics700 dataset and unzip it to the `DP-HOI/data/action` folder, then run `pre_kinetics700.py`:

   ```shell
   python ./pre_datasets/pre_kinetics700.py
   ```

3. Flickr30k dataset

   Download the Flickr30k dataset and unzip it directly to the `DP-HOI/data/caption` folder.

4. VG dataset

   Download the VG dataset and unzip it directly to the `DP-HOI/data/caption` folder.

   Download the processed `annotations.zip` and unzip it to the `DP-HOI/data/caption/annotations` folder.

5. Objects365 dataset

   Download the Objects365 dataset and unzip it directly to the `DP-HOI/data/detection` folder.

6. COCO dataset

   Download the COCO dataset and unzip it directly to the `DP-HOI/data/detection` folder.

   Download the processed `coco_objects365_200k.json` and move it to the `DP-HOI/data/detection/annotations` folder.
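The merged detection annotation file is expected to follow the standard COCO format. For orientation, here is a minimal COCO-style record (field names follow the public COCO conventions; `coco_objects365_200k.json` itself may contain additional fields):

```python
import json

# Minimal COCO-style annotation structure (assumed layout based on the
# public COCO format; the actual file may carry more fields).
ann = {
    "images": [{"id": 1, "file_name": "000000000001.jpg",
                "height": 480, "width": 640}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [10.0, 20.0, 100.0, 200.0],  # [x, y, w, h]
                     "area": 20000.0, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "person"}],
}

# Round-trip through JSON to confirm the structure serializes cleanly.
restored = json.loads(json.dumps(ann))
print(len(restored["images"]), len(restored["annotations"]))  # → 1 1
```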

When you have completed the above steps, the pre-training dataset structure is:

```
DP-HOI
 |─ data
 |   └─ action
 |     └─ haa500
 |       |─ annotations
 |       |   |─ train_haa500.json
 |       |─ images
 |       |─ videos
 |     └─ kinetics-700
 |       |─ annotations
 |       |   |─ train_kinetics700.json
 |       |─ images
 |       |─ videos
 |   └─ caption
 |     └─ annotations
 |       |─ Flickr30k_VG_cluster_dphoi.json
 |       |─ triplets_category.txt
 |       |─ triplets_features.pth
 |     └─ Flickr30k
 |       |─ images
 |     └─ VG
 |       |─ images
 |   └─ detection
 |     └─ annotations
 |       |─ coco_objects365_200k.json
 |     └─ coco
 |       |─ images
 |       |─ annotations
 |       |   |─ instances_val2017.json
 |     └─ objects365
 |       |─ images
```
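Before launching a long pre-training run, it can be worth sanity-checking that the key files and folders above actually exist. This is an optional helper, not part of the repository (the path list is transcribed from the structure above):

```python
from pathlib import Path

# Key files/folders from the layout above, relative to the DP-HOI/ root.
EXPECTED = [
    "data/action/haa500/annotations/train_haa500.json",
    "data/action/kinetics-700/annotations/train_kinetics700.json",
    "data/caption/annotations/Flickr30k_VG_cluster_dphoi.json",
    "data/caption/Flickr30k/images",
    "data/caption/VG/images",
    "data/detection/annotations/coco_objects365_200k.json",
    "data/detection/coco/images",
    "data/detection/objects365/images",
]

def missing_paths(root):
    """Return the expected dataset paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    for p in missing_paths("."):
        print("missing:", p)
```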

Initial parameters

To speed up the pre-training process, consider using DETR's pre-trained weights for initialization. Download the pre-trained DETR detector model for ResNet50 and put it into the `params` directory.

Pre-training

After the preparation, you can start pre-training with the following command.

```shell
sh ./scripts/pretrain/train.sh
```

Fine-tuning

After pre-training, you can start fine-tuning with the following commands. An example of fine-tuning on HOICLIP is provided below.

```shell
python ./tools/convert_parameters.py \
        --finetune_model hoiclip \
        --load_path params/dphoi_res50_3layers.pth \
        --save_path params/dphoi_res50_hico_hoiclip.pth \
        --dataset hico \
        --num_queries 64
sh ./scripts/finetune/hoiclip/train_hico.sh
```
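`convert_parameters.py` adapts the pre-trained checkpoint to the fine-tuning model, which typically means renaming parameter keys and dropping heads the downstream model does not use. The general idea can be sketched on plain dictionaries (the key names and mapping here are hypothetical; the actual script's logic may differ):

```python
def convert_state_dict(state_dict, rename_map, drop_prefixes=()):
    """Rename checkpoint keys and drop ones the target model does not use.

    rename_map:    {old_prefix: new_prefix} applied to matching keys.
    drop_prefixes: keys starting with any of these are discarded.
    """
    out = {}
    for key, value in state_dict.items():
        if any(key.startswith(p) for p in drop_prefixes):
            continue
        new_key = key
        for old, new in rename_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        out[new_key] = value
    return out

# Hypothetical example: reuse the backbone/encoder, drop a pre-training head.
ckpt = {
    "backbone.conv1.weight": 1,
    "transformer.encoder.layer0.weight": 2,
    "caption_head.weight": 3,
}
converted = convert_state_dict(
    ckpt,
    rename_map={"transformer.encoder.": "encoder."},
    drop_prefixes=("caption_head.",),
)
print(sorted(converted))  # → ['backbone.conv1.weight', 'encoder.layer0.weight']
```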

Pre-trained model

You can also directly download the pre-trained DP-HOI model for ResNet50.

Results

HICO-DET

| Method | Full (D) | Rare (D) | Non-rare (D) | Model | Config |
| --- | --- | --- | --- | --- | --- |
| ours (UPT) | 33.36 | 28.74 | 34.75 | model | config |
| ours (PViC) | 35.77 | 32.26 | 36.81 | model | config |
| ours (CDN-S) | 35.00 | 32.38 | 35.78 | model | config |
| ours (CDN-S+CCS*) | 35.38 | 34.61 | 35.61 | model | config |
| ours (HOICLIP) | 36.56 | 34.36 | 37.22 | model | config |

D: Default, †: DN strategy from DN-DETR, *: data augmentation strategy from DOQ. The weights fine-tuned on HICO-DET for two-stage methods (e.g., UPT and PViC) can be downloaded here.

V-COCO

| Method | Scenario 1 | Model | Config |
| --- | --- | --- | --- |
| ours (GENs) | 66.6 | model | config |

Zero-shot HOI Detection Results

| Method | Type | Unseen | Seen | Full | Model | Config |
| --- | --- | --- | --- | --- | --- | --- |
| ours (HOICLIP) | UV | 26.30 | 34.49 | 33.34 | model | config |
| ours (HOICLIP) | RF-UC | 30.49 | 36.17 | 35.03 | model | config |
| ours (HOICLIP) | NF-UC | 28.87 | 29.98 | 29.76 | model | config |

Citation

Please consider citing our paper if it helps your research.

```bibtex
@inproceedings{disentangled_cvpr2024,
  author    = {Zhuolong Li and Xingao Li and Changxing Ding and Xiangmin Xu},
  title     = {Disentangled Pre-training for Human-Object Interaction Detection},
  booktitle = {CVPR},
  year      = {2024},
}
```

Acknowledgement

Our code is built on DETR, DN-DETR, and CLIP. We thank the authors for their contributions.
