Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges

Tutorial syllabus

Obtaining dense and accurate depth measurements is of paramount importance for many 3D computer vision applications. Stereo matching has undergone a paradigm shift in the last few years due to the introduction of learning-based methods that replaced heuristics and hand-crafted rules. While the KITTI dataset highlighted in early 2012 that stereo matching was still an open problem, the recent success of Convolutional Neural Networks has led to tremendous progress and has established these methods as the undisputed state of the art. Similar observations can be made on all recent benchmarks, such as KITTI 2012 and 2015, Middlebury 2014 and ETH3D, whose leaderboards are dominated by learning-based methods.

The first part of the tutorial will cover conventional machine learning and deep learning methods that have replaced components of the conventional stereo matching pipeline, as well as end-to-end stereo systems and confidence estimation. The second part will focus on related problems, specifically single-view depth estimation and multi-view stereo, which have also benefited from the availability of ground-truth datasets and learning algorithms. The tutorial will conclude with open problems, including generalization as well as unsupervised and weakly supervised training.

Morning session:

  • Confidence measures and early machine learning for stereo
  • Learning-based matching functions
  • Learning for optimization and post-processing
  • End-to-end stereo and synthetic datasets

Afternoon session:

  • Learning for multi-view stereo
  • End-to-end unsupervised monocular depth estimation
  • Domain shift and unseen environments: adaptation techniques