Monocular Relative Depth Perception with Web Stereo Data Supervision

Ke Xian¹, Chunhua Shen², Zhiguo Cao¹*, Hao Lu¹, Yang Xiao¹, Ruibo Li¹, Zhenbo Luo³

¹School of Automation, Huazhong University of Science and Technology, China

²The University of Adelaide, Australia

³Samsung Research Beijing, China

e-mail: kexian@hust.edu.cn

Abstract

In this paper, we study the problem of monocular relative depth perception in the wild. We introduce a simple yet effective method for automatically generating dense relative depth annotations from web stereo images, and propose a new dataset consisting of diverse images together with their corresponding dense relative depth maps. Furthermore, an improved ranking loss is introduced to handle imbalanced ordinal relations by encouraging the network to focus on a set of hard pairs. Experimental results demonstrate that our approach not only achieves state-of-the-art accuracy for relative depth perception in the wild, but also benefits other dense per-pixel prediction tasks, e.g., metric depth estimation and semantic segmentation.
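For concreteness, below is a minimal PyTorch sketch of a pairwise ranking loss with hard-pair mining in the spirit of the abstract. The function name ranking_loss, the pair count, the "roughly equal" threshold, and the fraction of hard pairs retained are illustrative assumptions, not the paper's exact settings.

import torch
import torch.nn.functional as F

def ranking_loss(pred, gt, num_pairs=3000, equal_thresh=0.02, hard_frac=0.25):
    """pred, gt: (H, W) tensors; gt stores relative (ordinal) depth."""
    n = gt.numel()
    # Randomly sample point pairs across the image.
    idx_a = torch.randint(0, n, (num_pairs,))
    idx_b = torch.randint(0, n, (num_pairs,))
    za, zb = pred.reshape(-1)[idx_a], pred.reshape(-1)[idx_b]
    ga, gb = gt.reshape(-1)[idx_a], gt.reshape(-1)[idx_b]

    # Ordinal label: +1 / -1 for ordered pairs, 0 for roughly equal depth.
    ell = torch.sign(ga - gb)
    ell[(ga - gb).abs() < equal_thresh] = 0

    # Ranking term for ordered pairs; squared difference for equal pairs.
    ordered = ell != 0
    per_pair = torch.where(
        ordered,
        F.softplus(-ell * (za - zb)),   # log(1 + exp(-l * (za - zb)))
        (za - zb) ** 2,
    )

    # Hard-pair mining: back-propagate only through the largest losses,
    # which counters the imbalance between easy and hard ordinal relations.
    k = max(1, int(hard_frac * num_pairs))
    hard, _ = per_pair.topk(k)
    return hard.mean()

Keeping only the top-k losses is one straightforward way to bias training toward hard ordinal relations; the paper's exact sampling and selection strategy may differ.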

ReDWeb V1 dataset

The ReDWeb V1 dataset consists of 3.6K RGB-RD (RGB plus relative depth) image pairs, covering both indoor and outdoor scenes. Note that this dataset may be used for research purposes only.
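As a usage hint, the following sketch loads one RGB image and its relative depth (RD) map. The directory names Imgs and RDs, the file extensions, and the per-image normalisation are assumptions about the released layout; check the actual download before relying on them.

from pathlib import Path
import numpy as np
from PIL import Image

def load_pair(root, name):
    # RGB image and its single-channel relative-depth map
    # (folder names and extensions are assumed, not confirmed).
    rgb = np.asarray(Image.open(Path(root) / "Imgs" / f"{name}.jpg"))
    rd = np.asarray(Image.open(Path(root) / "RDs" / f"{name}.png"),
                    dtype=np.float32)
    # RD values are ordinal, not metric, so a per-image rescaling
    # to [0, 1] loses nothing and simplifies downstream losses.
    rd = (rd - rd.min()) / max(float(rd.max() - rd.min()), 1e-6)
    return rgb, rd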

Download (Please feel free to reach out to me if you cannot download the ReDWeb dataset.)

BibTeX

@inproceedings{Xian_2018_CVPR,
  title     = {Monocular Relative Depth Perception with Web Stereo Data Supervision},
  author    = {Xian, Ke and Shen, Chunhua and Cao, Zhiguo and Lu, Hao and Xiao, Yang and Li, Ruibo and Luo, Zhenbo},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2018}
}

Acknowledgements

This work was supported in part by the National High-tech R&D Program of China (863 Program) under Grant No. 2015AA015904 and in part by the National Natural Science Foundation of China under Grant No. 61502187.