Sensor Fusion of Lidar and Camera in Depth

High-definition Lidar is key to realizing online 3-D scene reconstruction.

However, it is expensive, and the point clouds it produces are still not dense enough for downstream applications such as object segmentation, detection, tracking, and classification.

A system and platform are designed to use low-cost sensors, such as RGB cameras and a low-definition Lidar, to increase the density and resolution of the generated 3-D point cloud, so that the combination effectively plays the role of a high-definition Lidar.

1. System framework

Side view

Top view

2. Work pipeline

Camera -> (Panorama + Lidar) -> 3-D Point Cloud

(Lidar + Camera) -> Depth panorama -> Point Cloud
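Both pipelines rely on mapping Lidar returns into panorama pixels and mapping a depth panorama back to 3-D points. The sketch below is a minimal illustration under assumptions not stated in the text: the points are already transformed into the panorama frame (x forward, y left, z up), the panorama is a full-sphere equirectangular image, and the stored depth is the Euclidean range. The function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers. Assumption: points are already expressed in the
# panorama frame (x forward, y left, z up) after Lidar-camera calibration.

def lidar_to_depth_panorama(points, width, height):
    """Project Nx3 points into a full-sphere equirectangular range image.

    Pixels without a Lidar return are set to 0.
    """
    rng = np.linalg.norm(points, axis=1)
    points, rng = points[rng > 0], rng[rng > 0]
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    azimuth = np.arctan2(y, x)                    # [-pi, pi], 0 = straight ahead
    elevation = np.arctan2(z, np.hypot(x, y))     # [-pi/2, pi/2], 0 = horizon
    u = np.floor((0.5 - azimuth / (2 * np.pi)) * width).astype(int) % width
    v = np.floor((0.5 - elevation / np.pi) * height).astype(int)
    v = np.clip(v, 0, height - 1)
    depth = np.full((height, width), np.inf)
    # Z-buffer: if several points land in one pixel, keep the nearest range.
    np.minimum.at(depth, (v, u), rng)
    depth[np.isinf(depth)] = 0.0
    return depth


def depth_panorama_to_points(depth):
    """Back-project every non-zero pixel of a range panorama to a 3-D point."""
    height, width = depth.shape
    v, u = np.nonzero(depth > 0)
    rng = depth[v, u]
    # Invert the mapping above, using pixel centers (index + 0.5).
    azimuth = (0.5 - (u + 0.5) / width) * 2 * np.pi
    elevation = (0.5 - (v + 0.5) / height) * np.pi
    cos_el = np.cos(elevation)
    return np.stack([rng * cos_el * np.cos(azimuth),
                     rng * cos_el * np.sin(azimuth),
                     rng * np.sin(elevation)], axis=1)


if __name__ == "__main__":
    pts = np.array([[5.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 8.0, 1.0]])
    pano = lidar_to_depth_panorama(pts, width=360, height=180)
    cloud = depth_panorama_to_points(pano)
    print(np.count_nonzero(pano), cloud.shape)  # 2 (2, 3)
```

The first two points fall into the same pixel, so only the nearer one survives; a dense depth panorama produced by the fusion stage would be back-projected the same way, one point per valid pixel.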

3. Fusion approaches

3.1 Lidar's projection is fused in a middle layer of the CNN model

Stereo + Lidar

Mono + Lidar
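Mid-level fusion requires the sparse Lidar projection to match the resolution of the intermediate feature map it joins. The NumPy sketch below shows only that tensor plumbing, under assumptions not given in the text: channel-first tensors, channel concatenation as the fusion operator, and a validity-aware average pool for downsampling the sparse depth. The function names are hypothetical, and the actual model may use a different operator.

```python
import numpy as np

def sparse_avg_pool(depth, factor):
    """Downsample a sparse depth map by an integer factor.

    Only valid pixels (depth > 0) contribute to each block's mean; blocks
    with no Lidar return stay 0 and are flagged 0 in the returned mask.
    """
    h, w = depth.shape
    if h % factor or w % factor:
        raise ValueError("depth size must be divisible by factor")
    blocks = depth.reshape(h // factor, factor, w // factor, factor)
    valid = blocks > 0
    count = valid.sum(axis=(1, 3))
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    pooled = np.divide(total, count, out=np.zeros(total.shape), where=count > 0)
    return pooled, (count > 0).astype(float)


def fuse_mid_level(features, sparse_depth):
    """Concatenate pooled Lidar depth and its validity mask onto a
    (C, h, w) intermediate feature map, giving a (C + 2, h, w) tensor."""
    _, h, w = features.shape
    factor = sparse_depth.shape[0] // h
    pooled, mask = sparse_avg_pool(sparse_depth, factor)
    if pooled.shape != (h, w):
        raise ValueError("depth map does not match feature resolution")
    return np.concatenate([features, pooled[None], mask[None]], axis=0)
```

In a real network, the fused tensor would then pass through further convolutions. The validity mask lets those layers distinguish "no return" from "depth 0".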

3.2 Lidar's projection is fused directly with the camera image

Stereo + Lidar

Mono + Lidar
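For direct (early) fusion, the Lidar points are first projected into the camera image and then stacked with it as extra input channels. The sketch below is a minimal illustration under these assumptions: the points are already in the camera frame, the camera follows a pinhole model with intrinsic matrix K, and the network input is laid out as RGB + depth + validity mask. `max_depth=80.0` is an arbitrary normalization constant chosen for illustration, not a value from the text, and the helper names are hypothetical.

```python
import numpy as np

def project_to_image(points_cam, K, height, width):
    """Project Nx3 points in the camera frame (x right, y down, z forward)
    through the pinhole intrinsics K into a sparse depth map (0 = no return)."""
    p = points_cam[points_cam[:, 2] > 0]          # drop points behind the camera
    uvw = p @ K.T                                 # homogeneous pixel coordinates
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # Keep the nearest point when several fall into the same pixel.
    np.minimum.at(depth, (v[inside], u[inside]), p[inside, 2])
    depth[np.isinf(depth)] = 0.0
    return depth


def early_fusion_input(rgb, sparse_depth, max_depth=80.0):
    """Stack an HxWx3 uint8 image with the sparse depth and its validity mask
    into a (5, H, W) float32 network input."""
    if rgb.shape[:2] != sparse_depth.shape:
        raise ValueError("image and depth map must have the same size")
    image = rgb.astype(np.float32).transpose(2, 0, 1) / 255.0
    depth = np.clip(sparse_depth, 0.0, max_depth).astype(np.float32) / max_depth
    mask = (sparse_depth > 0).astype(np.float32)
    return np.concatenate([image, depth[None], mask[None]], axis=0)
```

For the stereo variant, the same stacking could be applied to each image of the rectified pair before it enters the network.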

3.3 Lidar's projection is fused with the output of the CNN model

Stereo + Lidar
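In the stereo case, the network typically outputs disparity, while Lidar gives depth. For a rectified pair, the two are linked by Z = f * B / d, where f is the focal length in pixels and B is the baseline. The sketch below shows one simple late-fusion rule, assuming the Lidar projection is aligned with the reference image: convert Lidar depth to disparity and let it override the network's prediction wherever a return exists. The override rule and the focal length and baseline values are illustrative assumptions; confidence-weighted blending would be an alternative.

```python
import numpy as np

def invert_fb(values, focal_px, baseline_m):
    """Apply Z = f * B / d (equivalently d = f * B / Z); zeros stay zero (no data)."""
    out = np.zeros(values.shape)
    valid = values > 0
    out[valid] = focal_px * baseline_m / values[valid]
    return out


def late_fuse_stereo(pred_disparity, lidar_depth, focal_px, baseline_m):
    """Overwrite the stereo network's disparity with Lidar-derived disparity
    wherever a Lidar return exists; return fused disparity and depth."""
    lidar_disp = invert_fb(lidar_depth, focal_px, baseline_m)
    fused_disp = np.where(lidar_depth > 0, lidar_disp, pred_disparity)
    return fused_disp, invert_fb(fused_disp, focal_px, baseline_m)


if __name__ == "__main__":
    pred = np.full((2, 3), 7.0)                  # toy network disparity (pixels)
    lidar = np.zeros((2, 3)); lidar[0, 1] = 35.0  # one Lidar return (meters)
    disp, depth = late_fuse_stereo(pred, lidar, focal_px=700.0, baseline_m=0.5)
    print(disp[0, 1], depth[1, 2])  # 10.0 50.0
```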

Mono + Lidar
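A monocular network's depth can be off by an unknown global scale, for example when it is trained without metric supervision. The sketch below assumes a single global scale factor. It fits s by least squares over pixels with a Lidar return, s = sum(p * l) / sum(p^2), which minimizes sum((s * p - l)^2). It then rescales the prediction and keeps the raw Lidar values where they exist. This alignment step is one plausible way to combine the CNN output with the Lidar projection; it is not necessarily the method used here, and the function names are hypothetical.

```python
import numpy as np

def align_scale(pred_depth, lidar_depth):
    """Least-squares scale s minimizing sum((s * pred - lidar)^2) over Lidar pixels."""
    valid = lidar_depth > 0
    if not valid.any():
        raise ValueError("no Lidar returns to align against")
    p, l = pred_depth[valid], lidar_depth[valid]
    return float(np.dot(p, l) / np.dot(p, p))


def late_fuse_mono(pred_depth, lidar_depth):
    """Rescale the monocular prediction to metric units, then keep the raw
    Lidar depth wherever it is available."""
    s = align_scale(pred_depth, lidar_depth)
    fused = np.where(lidar_depth > 0, lidar_depth, s * pred_depth)
    return fused, s
```

A single global scale is the simplest choice. If the prediction error varies across the image, a per-region or shift-plus-scale fit would be the natural next step.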

4. CNN Model

The applied encoder-decoder network
