Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation-CVPR 2018 Spotlight
This repository is the official implementation of the paper.
Links: [Paper][Oral Presentation]
By Dan Xu, Wei Wang, Hao Tang, Hong Liu, Nicu Sebe, Elisa Ricci
The code is implemented on top of the Caffe framework. Please first download and install the modified Caffe version. The code has been tested with CUDA 8.0, cuDNN 5.1, and Python 2.7. To install, follow these instructions:
First clone the repository:
git clone https://github.com/danxuhk/StructuredAttentionDepthEstimation.git
Then build caffe and pycaffe:
cd $Caffe_ROOT
cp Makefile.config.example Makefile.config
vim Makefile.config   ### change the necessary lines to add dependencies
sh install.sh
First download the KITTI raw data from the official website http://www.cvlibs.net/datasets/kitti/ to the folder ./StructuredAttentionDepthEstimation/data/KITTI. To generate the training data, run the following commands:
cd ./StructuredAttentionDepthEstimation/data
python save_16bitpng_gt.py
This generates a text file of training pairs, 'eigen_train_pairs.txt', under ./utils/filenames for use in the training phase.
For testing, the eigen split of 697 images is used.
Please first download the trained model from Google Drive, and put it under ./StructuredAttentionDepthEstimation/models. The saved testing results can also be downloaded from the same link. To test the trained model, follow these instructions:
cd ./StructuredAttentionDepthEstimation/prototxt
python gen_deploy_prototxt.py   ### generating a network definition for the deploy network
sh test.sh   ### testing and evaluating the model
We refine and fuse the multi-scale features derived from different deep semantic layers (e.g. res3d, res4f, res5c layers) using the proposed MeanFieldUpdate module as follows:
# the first meanfield updating
MeanFieldUpdate(n, n.res3d_dec, n.res5c_dec, 1, 1, feat_num)
MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf1, 2, 1, feat_num)
MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf1, 3, 1, feat_num)
# the second meanfield updating
MeanFieldUpdate(n, n.res3d_dec, n.updated_f3_mf1, 1, 2, feat_num)
MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf2, 2, 2, feat_num)
MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf2, 3, 2, feat_num)
# the third meanfield updating
MeanFieldUpdate(n, n.res3d_dec, n.updated_f3_mf2, 1, 3, feat_num)
MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf3, 2, 3, feat_num)
MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf3, 3, 3, feat_num)
# the fourth meanfield updating
MeanFieldUpdate(n, n.res3d_dec, n.updated_f3_mf3, 1, 4, feat_num)
MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf4, 2, 4, feat_num)
MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf4, 3, 4, feat_num)
# the fifth meanfield updating
MeanFieldUpdate(n, n.res3d_dec, n.updated_f3_mf4, 1, 5, feat_num)
MeanFieldUpdate(n, n.res4f_dec, n.updated_f1_mf5, 2, 5, feat_num)
MeanFieldUpdate(n, n.res5c_dec, n.updated_f2_mf5, 3, 5, feat_num)
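The fifteen calls above follow a regular pattern: each call takes the output of the previous one, cycling over the three scales for five iterations. The sketch below makes that pattern explicit by generating only the argument schedule; it does not call MeanFieldUpdate itself. The helper name mean_field_schedule is hypothetical and not part of the repository, and the sketch assumes, as the listing suggests, that the call for scale s in iteration t produces a blob named updated_f{s}_mf{t}.

```python
def mean_field_schedule(num_iters=5,
                        scales=("res3d_dec", "res4f_dec", "res5c_dec")):
    """Return the (target, source, scale_idx, iter_idx) tuples, in call order.

    Hypothetical helper for illustration only; it mirrors the unrolled
    listing and does not build any Caffe layers.
    """
    schedule = []
    # The very first call has no previous output, so the listing feeds it
    # the feature of the last scale (res5c_dec).
    source = scales[-1]
    for it in range(1, num_iters + 1):
        for s, target in enumerate(scales, start=1):
            schedule.append((target, source, s, it))
            # Assumed name of the blob produced by this call; it becomes
            # the source of the next call.
            source = "updated_f{}_mf{}".format(s, it)
    return schedule


if __name__ == "__main__":
    for target, source, s, it in mean_field_schedule():
        print("MeanFieldUpdate(n, n.{}, n.{}, {}, {}, feat_num)".format(
            target, source, s, it))
```

Run as a script, this prints the same fifteen calls as the unrolled listing, which is a convenient way to check the chaining when changing the number of iterations.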
Testing is fast, running at around 8 fps (near real-time), which is significantly faster than previous graphical-model-based approaches to single-image depth estimation. The table and the figure below show the quantitative and the qualitative results on KITTI respectively, using both the Eigen and the Garg crop. The results are not exactly the same as those reported in our paper, because we have further improved the accuracy.
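For readers unfamiliar with the two crops, the sketch below computes the pixel bounds of each evaluation region for a given image size. The fractions are the ones found in widely circulated KITTI evaluation scripts; whether the evaluation in this repository uses exactly these values is an assumption, and the function name crop_bounds is hypothetical.

```python
def crop_bounds(height, width, kind="garg"):
    """Return (top, bottom, left, right) pixel bounds of the evaluation crop.

    The fractions are assumed from commonly used KITTI evaluation code,
    not taken from this repository.
    """
    fractions = {
        "garg": (0.40810811, 0.99189189, 0.03594771, 0.96405229),
        "eigen": (0.3324324, 0.91351351, 0.0359477, 0.96405229),
    }
    if kind not in fractions:
        raise ValueError("kind must be 'garg' or 'eigen'")
    top, bottom, left, right = fractions[kind]
    # Truncate to integer pixel indices.
    return (int(top * height), int(bottom * height),
            int(left * width), int(right * width))


if __name__ == "__main__":
    # 375 x 1242 is a typical KITTI image size.
    print(crop_bounds(375, 1242, "garg"))
    print(crop_bounds(375, 1242, "eigen"))
```

Only ground-truth pixels inside the chosen region would be compared against the prediction, so the two crops generally give slightly different scores for the same model.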
The produced visualization results can be downloaded from here.
To retrain the model, please first download the ResNet50 model pretrained on ImageNet, put it under the folder ./StructuredAttentionDepthEstimation/models/pretrained_model, and rename it to ResNet-50-pratrained-model.caffemodel. It will be used to initialize our backbone network. To train the whole model, run:
cd ./StructuredAttentionDepthEstimation/prototxt
python gen_train_prototxt.py   ### generate a network definition for the training network
python train.py
Training supports multi-GPU speedup. To change the overall batch size, modify iter_size in ./prototxt/solver.prototxt, batch_size in gen_train_prototxt.py, and the number of GPUs in train.py.
Overall batch size = number of GPUs * batch_size * iter_size.
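As a quick check of the formula above, the following sketch computes the overall batch size from the three settings. The function name and the example values (4 GPUs, batch_size 2, iter_size 4) are hypothetical and are not the repository's defaults.

```python
def overall_batch_size(num_gpus, batch_size, iter_size):
    """Overall batch size = number of GPUs * batch_size * iter_size."""
    for name, value in (("num_gpus", num_gpus),
                        ("batch_size", batch_size),
                        ("iter_size", iter_size)):
        if value < 1:
            raise ValueError("{} must be a positive integer".format(name))
    return num_gpus * batch_size * iter_size


if __name__ == "__main__":
    # Hypothetical settings: 4 GPUs, batch_size 2 per GPU, iter_size 4.
    print(overall_batch_size(4, 2, 4))  # prints 32
```

Because the three factors multiply, halving the number of GPUs while doubling iter_size keeps the overall batch size unchanged.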
A PyTorch implementation of our model can be found here:
https://github.com/dontLoveBugs/StructuredAttentionDepthEstimation_pytorch
Please consider citing the following paper if the code is helpful in your research work:
@inproceedings{xu2018structured,
  title={Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation},
  author={Xu, Dan and Wang, Wei and Tang, Hao and Liu, Hong and Sebe, Nicu and Ricci, Elisa},
  booktitle={CVPR},
  year={2018}
}