Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth
Ziyue Feng, Liang Yang, Longlong Jing, Haiyan Wang, Yingli Tian, and Bing Li
Abstract. Conventional self-supervised monocular depth prediction methods rest on a static-environment assumption, which degrades accuracy in dynamic scenes because object motion introduces mismatch and occlusion. Existing dynamic-object-focused methods only partially address the mismatch problem, and only at the training-loss level. In this paper, we propose a novel multi-frame monocular depth prediction method that tackles these problems at both the prediction and supervision-loss levels. Our method, called DynamicDepth, is a new framework trained via a self-supervised cycle-consistent learning scheme. A Dynamic Object Motion Disentanglement (DOMD) module is proposed to disentangle object motions and thereby resolve the mismatch problem. Moreover, novel occlusion-aware Cost Volume and Re-projection Loss are designed to alleviate the occlusion effects of object motion. Extensive analyses and experiments on the Cityscapes and KITTI datasets show that our method significantly outperforms state-of-the-art monocular depth prediction methods, especially in regions containing dynamic objects.
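To make the mismatch concrete, the sketch below shows the standard static-scene re-projection that self-supervised methods build on: target pixels are back-projected with the predicted depth, moved by the camera ego-motion, and bilinearly sampled from a reference frame. This is a generic PyTorch illustration, not the authors' code; the function name, tensor names, and shapes are our assumptions. Pixels on moving objects land at the wrong location because the transform models ego-motion only, which is exactly the mismatch that DOMD disentangles.

import torch
import torch.nn.functional as F

def static_scene_warp(img_ref, depth_t, K, K_inv, T_t_to_ref):
    # img_ref: (B, 3, H, W) reference frame; depth_t: (B, 1, H, W) target depth;
    # K / K_inv: (B, 3, 3) camera intrinsics; T_t_to_ref: (B, 4, 4) ego-motion
    # from the target camera to the reference camera.
    B, _, H, W = img_ref.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32, device=depth_t.device),
        torch.arange(W, dtype=torch.float32, device=depth_t.device),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(1, 3, -1).expand(B, -1, -1)
    # Back-project target pixels to 3D, then move them with the ego-motion only.
    cam = K_inv.bmm(pix) * depth_t.reshape(B, 1, -1)
    cam = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=1)
    proj = K.bmm(T_t_to_ref.bmm(cam)[:, :3])
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Normalize to [-1, 1] and sample the reference frame at the projected locations.
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(img_ref, grid, padding_mode="border", align_corners=True)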
Contributions:
We propose a novel Dynamic Object Motion Disentanglement (DOMD) module, which leverages a prior depth prediction to resolve the object-motion mismatch problem in the final depth prediction.
We devise a Dynamic Object Cycle-Consistent training scheme in which the prior and final depth predictions mutually reinforce each other.
We design an Occlusion-aware Cost Volume that enables geometric reasoning across temporal frames even in areas occluded by object motion, and a novel Occlusion-aware Re-projection Loss that alleviates the motion-occlusion problem in the training supervision (see the sketch after this list).
Our method significantly outperforms existing state-of-the-art methods on the Cityscapes and KITTI datasets.
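As a hedged illustration of the occlusion-aware supervision, the sketch below masks motion-occluded pixels out of a standard SSIM + L1 photometric error. It assumes a visibility mask occ_mask (1 = visible, 0 = occluded by object motion) produced upstream, e.g. by the motion disentanglement step; the paper's actual loss may differ in its details, so treat this as a minimal reference implementation of the idea, not the authors' code.

import torch.nn.functional as F

def ssim_dissimilarity(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    # Standard 3x3 SSIM-based dissimilarity map in [0, 1] (Monodepth2-style).
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sig_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + C1) * (2 * sig_xy + C2)) / \
           ((mu_x ** 2 + mu_y ** 2 + C1) * (sig_x + sig_y + C2))
    return ((1 - ssim) / 2).clamp(0, 1).mean(1, keepdim=True)

def occlusion_aware_reprojection_loss(img_t, img_ref_warped, occ_mask, alpha=0.85):
    # Photometric error between the target frame and the warped reference frame,
    # averaged only over pixels that object motion leaves visible.
    l1 = (img_t - img_ref_warped).abs().mean(1, keepdim=True)
    photo = alpha * ssim_dissimilarity(img_t, img_ref_warped) + (1 - alpha) * l1
    return (photo * occ_mask).sum() / occ_mask.sum().clamp(min=1.0)

With occ_mask set to all ones, this reduces to the conventional SSIM + L1 photometric loss, so the occlusion masking is the only change over the standard re-projection supervision.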
Citation
@article{feng2022disentangling,
  title={Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth},
  author={Feng, Ziyue and Yang, Liang and Jing, Longlong and Wang, Haiyan and Tian, YingLi and Li, Bing},
  journal={arXiv preprint arXiv:2203.15174},
  year={2022}
}