Hao Wang*
Jun Wang*
Liang Wang
Baidu Research - Institute of Deep Learning
(*joint first authors)
We present a system capable of performing robust online volumetric reconstruction of indoor scenes from the input of a handheld RGB-D camera. Our system is built on a two-pass reconstruction scheme. The first pass tracks camera poses at video rate and simultaneously constructs a pose graph on-the-fly. The tracker operates in real time, which allows the reconstruction results to be visualized during the scanning process. Live visual feedback makes the scanning operation fast and intuitive. Upon termination of scanning, the second pass handles loop closures and reconstructs the final dense model using globally refined camera trajectories. The system is online with low delay and returns a dense model of sufficient accuracy. The strength of this system lies in its accuracy, simplicity, and ease of implementation compared to existing methods. We demonstrate the performance of our system on several real-world scenes and quantitatively evaluate the modeling accuracy with respect to ground-truth models obtained from a LIDAR scanner.
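To make the two-pass idea concrete, below is a minimal sketch (Python, not the paper's actual implementation) of the kind of pose-graph refinement the second pass performs: relative-pose constraints from tracking, plus a loop-closure edge, are jointly optimized into a globally consistent trajectory. For brevity the example uses planar poses (x, y, θ) and a generic least-squares solver; the real system refines full 6-DoF camera poses.

```python
# Minimal SE(2) pose-graph refinement sketch (illustrative only, not the paper's code).
# Each pose is (x, y, theta); edges store measured relative transforms, including
# one loop-closure edge that pulls the drifted trajectory back into consistency.
import numpy as np
from scipy.optimize import least_squares

def relative_pose(a, b):
    """Relative transform from pose a to pose b, expressed in a's frame."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    c, s = np.cos(a[2]), np.sin(a[2])
    return np.array([c * dx + s * dy,
                     -s * dx + c * dy,
                     np.arctan2(np.sin(b[2] - a[2]), np.cos(b[2] - a[2]))])

def residuals(flat_poses, edges):
    poses = flat_poses.reshape(-1, 3)
    res = [relative_pose(poses[i], poses[j]) - meas for i, j, meas in edges]
    res.append(poses[0])          # gauge constraint: pin the first pose at the origin
    return np.concatenate(res)

# Sequential odometry edges around a square, plus one loop closure (4 -> 0).
edges = [(0, 1, np.array([1.0, 0.0, np.pi / 2])),
         (1, 2, np.array([1.0, 0.0, np.pi / 2])),
         (2, 3, np.array([1.0, 0.0, np.pi / 2])),
         (3, 4, np.array([1.0, 0.0, np.pi / 2])),
         (4, 0, np.array([0.0, 0.0, 0.0]))]       # loop-closure edge

# Initial guess: integrate the odometry edges with a little simulated drift.
init = np.zeros((5, 3))
for i, j, meas in edges[:4]:
    c, s = np.cos(init[i, 2]), np.sin(init[i, 2])
    init[j, 0] = init[i, 0] + c * meas[0] - s * meas[1] + 0.05
    init[j, 1] = init[i, 1] + s * meas[0] + c * meas[1] + 0.05
    init[j, 2] = init[i, 2] + meas[2] + 0.02

opt = least_squares(residuals, init.ravel(), args=(edges,))
print(opt.x.reshape(-1, 3))       # globally refined trajectory
```

In the full system the refined trajectory is then used to fuse all depth frames into the final dense volumetric model.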
Automatic online reconstruction from a sequence of handheld RGB-D images. The left shows the 3D model reconstructed by our two-pass system, the middle shows our optimized camera trajectory, and the rest show details of the model. The area of this scene is about 40 m² and the camera trajectory is about 79 meters long. Scanning the room takes about 360 seconds, and the online reconstruction finishes within 170 seconds after the user terminates the scanning process.
Reconstruction of the Reading Room (top row) and the UE Lab (bottom row) datasets. From left to right are results produced by DVO-SLAM, Choi-Nonrigid, Choi-Rigid, ElasticFusion, and our method without/with pose graph optimization.
Heat maps showing errors from the ground-truth surface to the reconstructed surface on the Reading Room and UE Lab datasets with different methods. From blue to red, the error increases from zero to 0.2 m.
Cumulative histograms of errors from the ground-truth surface to the reconstructed surface on the Reading Room and UE Lab datasets.
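For reference, this error metric can be approximated by nearest-neighbor distances from ground-truth surface points to the reconstructed surface, followed by a cumulative histogram over thresholds. The sketch below (Python, with synthetic stand-in point clouds) illustrates the idea; the exact sampling and evaluation protocol in the paper may differ.

```python
# Hedged sketch of the ground-truth-to-reconstruction error metric.
import numpy as np
from scipy.spatial import cKDTree

def surface_errors(gt_points, recon_points):
    """Distance from each ground-truth point to its nearest reconstructed point."""
    tree = cKDTree(recon_points)
    dists, _ = tree.query(gt_points)
    return dists

def cumulative_histogram(errors, max_err=0.2, bins=100):
    """Fraction of ground-truth points with error below each threshold (in metres)."""
    thresholds = np.linspace(0.0, max_err, bins)
    fractions = np.array([(errors <= t).mean() for t in thresholds])
    return thresholds, fractions

# Example with random stand-in clouds; real data would be the LIDAR ground-truth
# .ply points and the vertices of the reconstructed model.
gt = np.random.rand(10000, 3)
recon = gt + np.random.normal(scale=0.01, size=gt.shape)
err = surface_errors(gt, recon)
thr, frac = cumulative_histogram(err)
print(f"mean error: {err.mean():.4f} m, 95th percentile: {np.percentile(err, 95):.4f} m")
```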
Here are the two datasets (readingroom and uelab) used in the paper. Each dataset includes an RGB-D video captured by an ASUS Xtion PRO LIVE and a ground-truth point cloud from a high-precision LIDAR system (Riegl VZ-400). The datasets can be downloaded from Dropbox (readingroom.oni, readingroom.ply, uelab.oni and uelab.ply) or from Baidu YunPan. The RGB-D video has been calibrated and the calibration parameters can be downloaded here (calibration file, calibration file format).
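As an illustration of how such calibration parameters are typically used, the sketch below back-projects a single depth frame into a 3D point cloud with a pinhole camera model. The intrinsics and depth scale shown are placeholder values, not the calibration shipped with these datasets.

```python
# Illustrative back-projection of one depth frame into a point cloud.
import numpy as np

fx, fy, cx, cy = 570.3, 570.3, 320.0, 240.0   # assumed Xtion-like intrinsics (placeholders)
depth_scale = 0.001                            # assumed: depth stored in millimetres

def depth_to_points(depth_mm):
    """Convert an HxW depth image (uint16, millimetres) to an Nx3 point cloud in metres."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float64) * depth_scale
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]            # drop invalid (zero-depth) pixels

# Example with a synthetic frame; real frames would be decoded from the .oni videos.
demo = np.full((480, 640), 1500, dtype=np.uint16)
print(depth_to_points(demo).shape)
```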
@inproceedings{rgbdrecon2016,
title = {Online Reconstruction of Indoor Scenes from RGB-D Streams},
author = {Hao Wang and Jun Wang and Liang Wang},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2016}
}