This repository contains an implementation of the models introduced in the paper Scene Graph Generation by Iterative Message Passing by Xu et al. The model takes an image and a graph proposal as input and predicts the object and relationship categories in the graph. The network is implemented using TensorFlow and the rest of the framework is in Python. Because the model is built directly on top of Faster-RCNN by Ren et al., a substantial amount of data processing code is adapted from the py-faster-rcnn repository.
If you find this work useful in your research, please consider citing:
```
@inproceedings{xu2017scenegraph,
  title={Scene Graph Generation by Iterative Message Passing},
  author={Xu, Danfei and Zhu, Yuke and Choy, Christopher and Fei-Fei, Li},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2017}
}
```
The project page is available at http://cs.stanford.edu/~danfei/scene-graph/.
- The framework does not include a region proposal network implementation. An RoI proposal database pre-extracted using the py-faster-rcnn framework is available for download.
- You need CUDA-compatible GPUs to run the framework. A CPU-compatible version will be released soon.
- You need at least 320 GB of free space to store the processed VisualGenome image dataset. A training script that reads image files directly will be released in the future. However, if you just want to test/visualize some sample predictions, you may download a subset of the processed dataset (mini-vg) following the instructions in the "Quick Start" section. The subset takes ~4 GB of space.
To get started with the framework, install the following dependencies:
- Python 2.7
- TensorFlow r0.11
- h5py
- numpy
- matplotlib
- scipy
- pyyaml
- easydict
- cython
- graphviz (optional, if you wish to visualize the graph structure)
- CUDA 7.5/8.0 (required, CPU-only version will be released soon)
- It is recommended that you install everything in an Anaconda environment, i.e., install Anaconda and run

  ```shell
  conda create -n scene-graph python=2.7
  source activate scene-graph
  ```
- Run

  ```shell
  pip install -r requirements.txt
  ```

  to install all the requirements except TensorFlow and CUDA. Follow the provided URL to install TensorFlow r0.11 (r0.10 and r0.12 also work). The code has not been tested on TensorFlow 1.0 and above, but may work once you convert all TF-related code using the official transition script.
- After you have installed all the dependencies, run the following commands to compile the nms and bbox libraries:

  ```shell
  cd lib
  make
  ```
- Follow this instruction to see whether you can use the pre-compiled roi-pooling custom op or have to compile the op yourself.
- Make sure you have installed all the dependencies and compiled the libraries.

- Run the `download.sh` script to download the mini-vg dataset and a model checkpoint:

  ```shell
  ./download.sh
  ```
- Run the following command to visualize a predicted scene graph. Set `GPU_ID` to the ID of the GPU you want to use, e.g. `0`:

  ```shell
  ./experiments/scripts/test.sh mini-vg -1 \
      dual_graph_vrd_final 2 \
      checkpoints/dual_graph_vrd_final_iter2.ckpt \
      viz_cls \
      GPU_ID
  ```
The scene graph dataset used in the paper is the VisualGenome dataset, although the framework can work with any scene graph dataset if converted to the desired format. Please refer to the dataset README for further instructions on converting the VG dataset into the desired format or downloading pre-processed datasets.
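As a rough illustration of what a converted scene-graph annotation looks like, the sketch below models one image's objects and relationship triplets as plain Python structures. The key names (`objects`, `relationships`) and the validation helper are assumptions for illustration only; the actual on-disk schema is specified in the dataset README.

```python
# Hypothetical illustration of a scene-graph annotation for a single image.
# The keys and layout below are assumptions, NOT the repository's actual
# schema -- consult the dataset README for the real converted format.

def validate_scene_graph(sg):
    """Check that every (subject, predicate, object) triplet refers to
    valid object indices in this image's object list."""
    num_objects = len(sg['objects'])
    for subj, pred, obj in sg['relationships']:
        assert 0 <= subj < num_objects and 0 <= obj < num_objects
        assert isinstance(pred, str)
    return True

example = {
    # one entry per object: category label and [x1, y1, x2, y2] box
    'objects': [
        {'label': 'man',   'box': [10, 20, 110, 220]},
        {'label': 'horse', 'box': [120, 40, 300, 230]},
    ],
    # triplets of (subject index, predicate label, object index)
    'relationships': [
        (0, 'riding', 1),
    ],
}
```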
Follow these steps to train a model:
- Prepare or download the full dataset.

- Download a Faster-RCNN model pretrained on the MS-COCO dataset and save the model to `data/pretrained/`.

- Edit the training script `experiments/scripts/train.sh` such that all paths agree with the files on your file system.

- To train the final model with inference iterations = 2, run:

  ```shell
  ./experiments/scripts/train.sh dual_graph_vrd_final 2 CHECKPOINT_DIRECTORY GPU_ID
  ```
The program saves a checkpoint to `checkpoints/CHECKPOINT_DIRECTORY/` every 50000 iterations. Training a full model on a desktop with an Intel i7 CPU, 64GB of memory, and a TitanX graphics card takes around 20 hours. You may use TensorBoard to visualize the training process. By default, the TF log directory is set to `checkpoints/CHECKPOINT_DIRECTORY/tf_logs/`.
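Since a long run leaves many checkpoints behind, you may want to pick the latest one programmatically. The sketch below assumes TensorFlow's default saver naming scheme (`model.ckpt-GLOBAL_STEP`); the exact file names `train.sh` produces may differ, so treat the pattern as an assumption and check your `checkpoints/` directory.

```python
import re

# Sketch: pick the most recent checkpoint from a directory listing, assuming
# TensorFlow's default naming scheme model.ckpt-GLOBAL_STEP. The pattern is
# an assumption -- verify it against the files train.sh actually writes.

def latest_checkpoint(filenames):
    """Return the checkpoint prefix with the highest global step, or None."""
    best, best_step = None, -1
    for name in filenames:
        m = re.match(r'(model\.ckpt-(\d+))', name)
        if m and int(m.group(2)) > best_step:
            best, best_step = m.group(1), int(m.group(2))
    return best

print(latest_checkpoint(['model.ckpt-50000.meta', 'model.ckpt-100000.meta']))
# -> model.ckpt-100000
```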
Follow these steps to evaluate a model:
- Prepare or download the full dataset or the mini-vg dataset.

- If you wish to evaluate a pre-trained model, first download a checkpoint from the "Checkpoints" section.

- Edit the evaluation script `experiments/scripts/test.sh` such that all paths agree with the files on your file system.

- To evaluate the final model with inference iterations = 2 using 100 images from the test set of the full VG dataset (use mini-vg for the mini VG dataset), run:

  ```shell
  ./experiments/scripts/test.sh vg 100 dual_graph_vrd_final 2 CHECKPOINT_PATH(.ckpt) all GPU_ID
  ```
Note that to reproduce the results presented in the paper, you have to evaluate the entire test set by setting the number of images to -1. The evaluation process takes around 10 hours. Setting the evaluation mode to `all` evaluates the model on all three tasks, i.e., `pred_cls`, `sg_cls`, and `sg_det`. You can also set the evaluation mode to an individual task.
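The tasks above are all scored with recall@K over relationship triplets. As a simplified illustration (not the repository's evaluation code, which also matches predicted boxes and labels against ground truth with an IoU threshold), the sketch below computes recall@K assuming exact triplet matches on confidence-sorted predictions:

```python
# Simplified recall@K over (subject, predicate, object) triplets.
# This is an illustrative sketch: the real evaluation additionally matches
# bounding boxes via IoU, which is omitted here for brevity.

def triplet_recall_at_k(gt_triplets, pred_triplets, k):
    """Fraction of ground-truth triplets recovered among the top-k
    predictions; predictions are assumed sorted by confidence."""
    if not gt_triplets:
        return 0.0
    top_k = set(pred_triplets[:k])
    hits = sum(1 for t in gt_triplets if t in top_k)
    return float(hits) / len(gt_triplets)

gt = [('man', 'riding', 'horse'), ('horse', 'on', 'grass')]
pred = [('man', 'riding', 'horse'), ('man', 'wearing', 'hat'),
        ('horse', 'on', 'grass')]
print(triplet_recall_at_k(gt, pred, 2))  # -> 0.5
print(triplet_recall_at_k(gt, pred, 3))  # -> 1.0
```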
Follow these steps to visualize a scene graph predicted by the model:
- Prepare or download the full dataset or the mini-vg dataset.

- If you wish to visualize predictions from a pre-trained model, first download a checkpoint from the "Checkpoints" section.

- Edit the evaluation script `experiments/scripts/test.sh` such that all paths agree with the files on your file system.

- To visualize the predicted graphs of the first 100 images in the test set of the full VG dataset (use mini-vg for the mini VG dataset), run:

  ```shell
  ./experiments/scripts/test.sh vg 100 dual_graph_vrd_final 2 CHECKPOINT_PATH(.ckpt) viz_cls GPU_ID
  ```
The `viz_cls` mode assumes ground-truth bounding boxes and predicts the object and relationship labels, which is the same setting as the `sg_cls` task. The `viz_det` mode uses the bounding boxes proposed by the region proposal network as the object proposals, which is the same setting as the `sg_det` task.
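Since graphviz is listed as an optional dependency for visualizing graph structure, a predicted graph can also be rendered by emitting Graphviz DOT source directly. The sketch below builds the DOT string by hand (no graphviz Python binding required); the example labels are hypothetical, not the framework's output format.

```python
# Sketch: render a predicted scene graph as Graphviz DOT source.
# The input representation (label list + index triplets) is an assumption
# for illustration, not the framework's internal prediction format.

def scene_graph_to_dot(objects, relationships):
    """objects: list of category labels.
    relationships: list of (subject_index, predicate, object_index)."""
    lines = ['digraph scene_graph {']
    for i, label in enumerate(objects):
        lines.append('  n%d [label="%s"];' % (i, label))
    for subj, pred, obj in relationships:
        lines.append('  n%d -> n%d [label="%s"];' % (subj, obj, pred))
    lines.append('}')
    return '\n'.join(lines)

dot = scene_graph_to_dot(['man', 'horse'], [(0, 'riding', 1)])
```

Write the resulting string to a file and render it with, e.g., `dot -Tpng graph.dot -o graph.png`.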
A TensorFlow checkpoint of the final model trained with 2 inference iterations:
`dual_graph_vrd_final_iter2_checkpoint.zip`
MIT License