ACM International Conference on Multimedia (ACM MM) 2016
Fig 1: Depth stream results. (Top) High-speed color images. (Middle) Interpolated depth maps based on optical/scene flow; the first and the last depth maps are the input. (Bottom) Corresponding depth generated with our algorithm.
Fig 2: System Pipeline
Fig 3: Results on sampled images from the hand-waving stream. (a) Input color image and depth map. (b-e) Albedo and shading images estimated by three recent approaches for intrinsic decomposition [1, 2, 3] and by our approach.
Fig 4: Results on the hand-waving sequence. To better show the shape of the hands, we adjust the perspective angle of the generated mesh. The two key-frame color images and the initial meshes given as input are shown in (a) and (c); (d) and (h) are the key-frame meshes after refinement. The second row shows the interpolation result for one sampled frame, shown in (b). (e)-(g) are the meshes recovered using BL, our proposed method, and SBL, respectively.
Fig 5: Results on the towel-shaking sequence. To better demonstrate the shape of the towel, we adjust the perspective angle of the generated mesh, which therefore differs from the color image. The first row shows the input color sequence; the leftmost and rightmost frames have corresponding input depth frames, while the middle ones are sampled frames between these two. The second row gives the input depth on the left and right, and the depth interpolated with the BL method for the middle frames. The third row displays the shading-refinement result using the SBL method. The last row shows the depth recovered by our method.
Fig 6: Quantitative results for real datasets. All results are reported as mean squared error. We discard extreme outliers around surface boundaries during the evaluation.
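The caption in Fig 6 does not specify how boundary outliers are detected. Below is a minimal sketch of this kind of masked evaluation, assuming a depth-gradient threshold marks boundary pixels; the function name `masked_mse` and the threshold `tau` are illustrative choices, not details from the paper.

```python
import numpy as np

def masked_mse(z_est, z_gt, tau=0.05):
    """MSE between estimated and ground-truth depth, ignoring pixels near
    depth discontinuities, where depth sensors tend to produce "flying
    pixels". The gradient threshold `tau` is an assumed parameter."""
    gy, gx = np.gradient(z_gt)                 # depth gradients along rows/cols
    valid = np.hypot(gx, gy) <= tau            # keep pixels away from boundaries
    return float(np.mean((z_est[valid] - z_gt[valid]) ** 2))
```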
Abstract:
High-speed video capture has become common in consumer-grade cameras; augmenting these videos with a corresponding depth stream would enable new multimedia applications, such as 3D slow-motion video. In this paper, we present a hybrid camera system that combines a high-speed color camera with a depth sensor (e.g., a Kinect) to produce a combined high-speed, high-resolution RGB+depth stream. Simply interpolating the low-speed depth frames is not satisfactory, as interpolation artifacts and a loss of surface detail are often visible. We have developed a novel framework that utilizes both shading constraints within each frame and optical flow constraints between neighboring frames. More specifically, we present (a) an effective method to find the intrinsic images that allows more accurate normal estimation; and (b) an optimization-based framework to estimate the high-resolution/high-speed depth stream, taking into consideration temporal smoothness and shading/depth consistency.
We evaluated our holistic framework on both synthetic and real sequences, where it showed superior performance compared to the previous state of the art.
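To make the abstract's framework concrete, the following is a minimal sketch, not the authors' implementation, of the kind of per-frame energy it describes: a depth map is refined so that (1) shading predicted from its normals matches an observed shading image, (2) it stays consistent with the previous frame's depth warped by optical flow, and (3) it remains smooth. The Lambertian shading model, the light direction `light`, and the weights `w_shade`, `w_flow`, `w_smooth` are all illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.optimize import minimize

H, W = 16, 16  # tiny grid so derivative-free L-BFGS stays fast

def normals_from_depth(z):
    # Backward differences give a per-pixel surface normal (-zx, -zy, 1).
    zx = np.zeros_like(z); zx[:, 1:] = z[:, 1:] - z[:, :-1]
    zy = np.zeros_like(z); zy[1:, :] = z[1:, :] - z[:-1, :]
    n = np.stack([-zx, -zy, np.ones_like(z)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def energy(z_flat, shading, z_warped, light,
           w_shade=1.0, w_flow=0.5, w_smooth=0.1):  # assumed weights
    z = z_flat.reshape(H, W)
    shade_pred = np.clip(normals_from_depth(z) @ light, 0.0, None)  # Lambertian
    e_shade = np.sum((shade_pred - shading) ** 2)   # shading/depth consistency
    e_flow = np.sum((z - z_warped) ** 2)            # temporal term vs. flow-warped depth
    e_smooth = np.sum(np.diff(z, axis=0) ** 2) + np.sum(np.diff(z, axis=1) ** 2)
    return w_shade * e_shade + w_flow * e_flow + w_smooth * e_smooth

# Toy inputs: a synthetic shading image, plus a noisy depth that stands in
# for the previous frame's depth warped by optical flow (and the initializer).
rng = np.random.default_rng(0)
light = np.array([0.3, 0.3, 0.9]); light /= np.linalg.norm(light)
z_true = np.fromfunction(lambda i, j: 0.05 * (i + j), (H, W))
shading = np.clip(normals_from_depth(z_true) @ light, 0.0, None)
z_warped = z_true + 0.01 * rng.standard_normal((H, W))

res = minimize(energy, z_warped.ravel(), args=(shading, z_warped, light),
               method="L-BFGS-B")
print("energy before/after:",
      energy(z_warped.ravel(), shading, z_warped, light), "->", res.fun)
```

In this toy setup the shading term pulls back the fine surface detail that plain interpolation would blur, while the flow term keeps the solution anchored to the temporally propagated depth.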
Citation:
@inproceedings{zuo2016High, title={High-speed Depth Stream Generation from a Hybrid Camera}, author={Xinxin Zuo and Sen Wang and Jiangbin Zheng and Ruigang Yang}, booktitle={Proceedings of the 24th ACM International Conference on Multimedia}, pages={878--887}, year={2016}, organization={ACM} }
References:
[1] Q. Chen and V. Koltun. A simple model for intrinsic image decomposition with depth cues. In ICCV, pages 241-248, 2013.
[2] J. Jeon, S. Cho, X. Tong, and S. Lee. Intrinsic image decomposition using structure-texture separation and surface normals. In ECCV, pages 218-233, 2014.
[3] J. T. Barron and J. Malik. Intrinsic scene properties from a single RGB-D image. IEEE TPAMI, 38(4):690-703, 2016.