Web Stereo Video Supervision for Depth Prediction from Dynamic Scenes
Chaoyang Wang Simon Lucey Federico Perazzi Oliver Wang
Carnegie Mellon University, Adobe Inc.
Samples from WSVD dataset
We present a fully data-driven method to compute depth from diverse monocular video sequences that contain large amounts of non-rigid objects, e.g., people. To learn reconstruction cues for non-rigid scenes, we introduce a new dataset (WSVD) consisting of stereo videos scraped from YouTube. This dataset has a wide variety of scene types and features many non-rigid objects.
'''
@misc{wang2019web,
  title={Web Stereo Video Supervision for Depth Prediction from Dynamic Scenes},
  author={Chaoyang Wang and Simon Lucey and Federico Perazzi and Oliver Wang},
  year={2019},
  eprint={1904.11112},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
'''
We infer depth from pairs of frames, which allows our network to take advantage of multiview information. As multiview information is ambiguous with respect to moving objects, we learn a prior on scenes with non-rigid objects from a new large-scale stereo dataset. We also introduce a novel loss function for training that yields high-quality results on web-sourced videos with unknown intrinsics. Please see the paper for full details.
The Web Stereo Video Dataset (WSVD) consists of 553 stereoscopic videos from YouTube.
To download the videos, first download wsvd_list.txt, then run the following command (this assumes 'youtube-dl' is already installed):
youtube-dl --download-archive downloaded.txt -f 'bestvideo[ext=mp4]' -a wsvd_list.txt -o 'wsvd/%(id)s.%(ext)s'
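For scripted pipelines, the same download can be driven from Python. The following is a minimal sketch, not part of the release: the function names `build_download_command` and `download_videos` are hypothetical, and the default file and directory names are copied from the command above. It passes the same flags to `youtube-dl` through `subprocess`.

```python
import subprocess
from typing import List


def build_download_command(
    list_file: str = "wsvd_list.txt",
    out_dir: str = "wsvd",
    archive: str = "downloaded.txt",
) -> List[str]:
    """Return the youtube-dl argument list matching the shell command above."""
    return [
        "youtube-dl",
        "--download-archive", archive,      # skip videos already recorded here
        "-f", "bestvideo[ext=mp4]",         # best video-only MP4 stream
        "-a", list_file,                    # batch file with one URL per line
        "-o", f"{out_dir}/%(id)s.%(ext)s",  # output template: <video id>.<ext>
    ]


def download_videos(list_file: str = "wsvd_list.txt", out_dir: str = "wsvd") -> int:
    """Run youtube-dl and return its exit code (non-zero if any download failed)."""
    # Passing a list (no shell=True) avoids shell quoting issues with the
    # brackets and parentheses in the format selector and output template.
    result = subprocess.run(build_download_command(list_file, out_dir), check=False)
    return result.returncode
```

Calling `download_videos()` from the directory that contains wsvd_list.txt should behave like the shell command. Because of the download archive, re-running it resumes from where a previous run stopped.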
We provide the list of video frames that result from the following procedure:
We notice that some of the stereoscopic videos place their left and right views on opposite sides, resulting in inverted disparity. We manually labeled those cases. The labels are provided per clip:
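As an illustration of how such a per-clip label might be applied, the sketch below splits a frame into its two views and exchanges them when the clip is marked as swapped. It rests on assumptions that this page does not confirm: the frames are taken to be in side-by-side layout, the label is taken to be a boolean, and the function name `split_side_by_side` is hypothetical. Frames are plain nested lists here to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def split_side_by_side(
    frame: Sequence[Sequence[int]], swapped: bool = False
) -> Tuple[Image, Image]:
    """Split a side-by-side stereo frame into (left, right) views.

    `frame` is a row-major image (a sequence of pixel rows). If `swapped`
    is True, the clip stores the right view in the left half of the frame,
    so the two halves are exchanged before being returned.
    """
    if not frame:
        raise ValueError("frame has no rows")
    half = len(frame[0]) // 2  # width of one view; an odd last column is dropped
    first = [list(row[:half]) for row in frame]
    second = [list(row[half:2 * half]) for row in frame]
    return (second, first) if swapped else (first, second)
```

With an array library, the same operation is a column slice of each half. Swapping the views up front means that downstream disparity has the same sign convention for every clip.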
The frame ID lists (index starts from 1) are provided as pickle files:
Coming soon...
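Because the frame IDs are stated to start from 1, they need an offset before they can index a 0-based frame array, as used by most Python video readers. The sketch below shows one way to load a list and convert it. The pickle layout is an assumption, since the files are not described here: it is taken to be a dict mapping a clip identifier to a list of integer frame IDs. The function names are hypothetical as well.

```python
import pickle
from typing import Dict, List


def load_frame_ids(path: str) -> Dict[str, List[int]]:
    """Load a frame ID list (assumed layout: {clip_id: [frame_id, ...]})."""
    # Only unpickle files from a source you trust: pickle can run arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


def to_zero_based(frame_ids: List[int]) -> List[int]:
    """Convert 1-based frame IDs to 0-based indices."""
    if any(i < 1 for i in frame_ids):
        raise ValueError("frame IDs are expected to start from 1")
    return [i - 1 for i in frame_ids]
```

For example, a 1-based list `[1, 5, 9]` maps to the 0-based indices `[0, 4, 8]`.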