Depth-Aware Video Frame Interpolation

Wenbo Bao*, Wei-Sheng Lai#, Chao Ma*, Xiaoyun Zhang*, Zhiyong Gao*, Ming-Hsuan Yang#&

*Shanghai Jiao Tong University, #University of California, Merced, &Google


Video frame interpolation aims to synthesize non-existent frames between the original frames. Although deep convolutional neural networks have brought significant advances, interpolation quality often degrades under large object motion or occlusion. In this work, we propose to explicitly detect occlusion by exploiting the depth cue in frame interpolation. Specifically, we develop a depth-aware flow projection layer that synthesizes intermediate flows which preferentially sample closer objects over farther ones. In addition, we learn hierarchical features as contextual information. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels to synthesize the output frame. Our model is compact, efficient, and fully differentiable, so all components can be optimized jointly. We conduct extensive experiments to analyze the effect of the depth-aware flow projection layer and the hierarchical contextual features. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets.
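To make the idea of a depth-aware flow projection concrete, the following is a minimal sketch and not the released implementation. It assumes that each flow vector from frame 0 to frame 1 is projected to the pixel it passes through at time t, and that vectors landing on the same pixel are averaged with inverse-depth weights so that closer objects dominate. The function name, the inverse-depth weighting, and the zero fill for pixels that receive no flow are all assumptions made for illustration.

```python
def depth_aware_flow_projection(flow_0_to_1, depth_0, t):
    """Sketch: project the flow 0->1 to time t, favoring closer pixels.

    flow_0_to_1: H x W nested list of (dx, dy) flow vectors for frame 0.
    depth_0:     H x W nested list of positive depth values for frame 0.
    t:           intermediate time in (0, 1).
    Returns an H x W nested list of [dx, dy] approximating the flow t->0.
    Pixels that no flow vector passes through are left at zero (assumption).
    """
    h, w = len(depth_0), len(depth_0[0])
    num = [[[0.0, 0.0] for _ in range(w)] for _ in range(h)]
    den = [[0.0] * w for _ in range(h)]

    for y in range(h):
        for x in range(w):
            dx, dy = flow_0_to_1[y][x]
            # Pixel that this flow vector passes through at time t.
            tx, ty = round(x + t * dx), round(y + t * dy)
            if 0 <= tx < w and 0 <= ty < h:
                weight = 1.0 / depth_0[y][x]  # assumed: inverse depth
                num[ty][tx][0] += weight * dx
                num[ty][tx][1] += weight * dy
                den[ty][tx] += weight

    out = [[[0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if den[y][x] > 0.0:
                # Reverse and scale the weighted mean to point back to frame 0.
                out[y][x][0] = -t * num[y][x][0] / den[y][x]
                out[y][x][1] = -t * num[y][x][1] / den[y][x]
    return out


# Two pixels move toward the center of a 1 x 3 image; the left one is closer.
flow = [[(2.0, 0.0), (10.0, 0.0), (-2.0, 0.0)]]
depth = [[1.0, 1.0, 3.0]]
projected = depth_aware_flow_projection(flow, depth, 0.5)
```

In this toy case a plain average of the two colliding vectors would cancel to zero, whereas the inverse-depth weighting lets the closer pixel determine the projected flow at the center.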


    author    = {Bao, Wenbo and Lai, Wei-Sheng and Ma, Chao and Zhang, Xiaoyun and Gao, Zhiyong and Yang, Ming-Hsuan}, 
    title     = {Depth-Aware Video Frame Interpolation}, 
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
    year      = {2019}


CVPR19 (to appear)

Video Demos

Network Architecture

Code and Results

