MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Frame Interpolation and Enhancement

Wenbo Bao*, Wei-Sheng Lai#, Xiaoyun Zhang*, Zhiyong Gao*, Ming-Hsuan Yang#

*Shanghai Jiao Tong University, #University of California, Merced

Abstract

Motion estimation (ME) and motion compensation (MC) have dominated classical video frame interpolation systems for decades. Recently, convolutional neural networks have established a new data-driven paradigm for frame interpolation. However, existing learning-based methods typically estimate only one of the ME and MC building blocks, which limits both computational efficiency and interpolation accuracy. In this work, we propose a motion estimation and motion compensation driven neural network for video frame interpolation. A novel adaptive warping layer integrates both optical flow and interpolation kernels to synthesize target frame pixels. This layer is fully differentiable, so the flow and kernel estimation networks can be optimized jointly. Our method benefits from the model-driven ME-MC architecture while avoiding conventional hand-crafted design by training on a large amount of video data. Compared to existing methods, our approach is computationally efficient and generates more visually appealing results. Moreover, the MEMC architecture is a general framework that can be seamlessly adapted to several video enhancement tasks, e.g., super-resolution, denoising, and deblocking. Extensive quantitative and qualitative evaluations demonstrate that the proposed method performs favorably against state-of-the-art video frame interpolation and enhancement algorithms on a wide range of datasets.
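The adaptive warping layer is the core of this design: instead of bilinearly sampling each pixel at its flow-displaced position, it blends a small neighborhood around that position using per-pixel learned kernel weights, and remains differentiable with respect to both the flow and the kernels. Below is a minimal PyTorch sketch of this idea, not the paper's optimized implementation; the function name adaptive_warp, the k x k tap layout, and the assumption that the kernel weights are already normalized are ours for illustration.

import torch
import torch.nn.functional as F

def adaptive_warp(image, flow, kernel, k=4):
    # image:  (B, C, H, W) source frame
    # flow:   (B, 2, H, W) optical flow in pixels (x, y displacements)
    # kernel: (B, k*k, H, W) per-pixel interpolation weights (assumed normalized)
    b, c, h, w = image.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=image.dtype, device=image.device),
        torch.arange(w, dtype=image.dtype, device=image.device),
        indexing="ij",
    )
    out = torch.zeros_like(image)
    r = k // 2
    idx = 0
    for dy in range(-r + 1, r + 1):
        for dx in range(-r + 1, r + 1):
            # Flow-displaced position plus the integer kernel-tap offset.
            px = xs + flow[:, 0] + dx
            py = ys + flow[:, 1] + dy
            # Normalize coordinates to [-1, 1] for grid_sample.
            grid = torch.stack(
                (2 * px / (w - 1) - 1, 2 * py / (h - 1) - 1), dim=-1
            )
            tap = F.grid_sample(image, grid, mode="bilinear",
                                padding_mode="border", align_corners=True)
            # Accumulate this tap weighted by its learned kernel coefficient.
            out = out + kernel[:, idx:idx + 1] * tap
            idx += 1
    return out

# Example: warp a frame with hypothetical flow and kernel predictions.
frame0 = torch.randn(1, 3, 128, 128)
flow = torch.randn(1, 2, 128, 128)
kernel = torch.softmax(torch.randn(1, 16, 128, 128), dim=1)  # weights sum to 1
warped = adaptive_warp(frame0, flow, kernel)  # (1, 3, 128, 128)

Because every operation above is differentiable, gradients flow back to both the flow and kernel predictions, which is what allows the two estimation networks to be trained jointly end to end.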

Videos

Citation

@article{MEMC-Net,
         title={MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement},
         author={Bao, Wenbo and Lai, Wei-Sheng and Zhang, Xiaoyun and Gao, Zhiyong and Yang, Ming-Hsuan},
         journal={arXiv preprint arXiv:1810.08768},
         year={2018}
}

Paper, Code, and Results

HD dataset

Selected Results

I. Video Frame Interpolation

A. Middlebury

[Image comparisons: Overlay, EpicFlow[1], SpyNet[2], SepConv-lf[3], SepConv-l1[3], MIND[4], ToFlow[5], Ours, and Ground Truth.]

B. Sintel Dataset

[Image comparisons: ToFlow[5], SepConv-lf[3], SepConv-l1[3], and Ours.]

II. Video Frame Enhancement

A. Super-Resolution

[Image comparisons: EDSR[6], ToFlow[5], BayesSR[7], Ours, and Ground Truth.]

B. Denoising

[Image comparisons: EDSR_DN[6], ToFlow[5], VBM4D[8], Ours, and Ground Truth.]

C. Deblocking

[Image comparisons: EDSR_DB[6], ToFlow[5], VBM4D[8], Ours, and Ground Truth.]

References

1. Revaud, J., Weinzaepfel, P., Harchaoui, Z., Schmid, C.: EpicFlow: Edge-preserving interpolation of correspondences for optical flow. In: CVPR. (2015)

2. Ranjan, A., Black, M.J.: Optical flow estimation using a spatial pyramid network. In: CVPR. (2017)

3. Niklaus, S., Mai, L., Liu, F.: Video frame interpolation via adaptive separable convolution. In: ICCV. (2017)

4. Long, G., Kneip, L., Alvarez, J.M., Li, H., Zhang, X., Yu, Q.: Learning image matching by simply watching video. In: ECCV. (2016)

5. Xue, T., Chen, B., Wu, J., Wei, D., Freeman, W.T.: Video enhancement with task-oriented flow. arXiv preprint arXiv:1711.09078 (2017)

6. Lim, B., Son, S., Kim, H., Nah, S., Lee, K.M.: Enhanced deep residual networks for single image super-resolution. In: CVPR Workshops. (2017)

7. Liu, C., Sun, D.: A Bayesian approach to adaptive video super resolution. In: CVPR. (2011)

8. Maggioni, M., Boracchi, G., Foi, A., Egiazarian, K.: Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms. TIP (2012)