HyunJun Jung, Weihang Li, Shun-Cheng Wu, William Bittner, Nikolas Brasch, Jifei Song,
Eduardo Perez-Pellitero, Zhensong Zhang, Arthur Moreau, Nassir Navab, Benjamin Busam
NeurIPS 2024, Datasets and Benchmarks Track
Intro :
This is the project website for SCRREAM, a highly accurate indoor benchmark dataset and annotation framework focusing on dense 3D vision tasks.
Hardware :
(a): We use a custom multi-modal camera rig to capture the real image sequences. It comprises RGB + P (polarization) + 2xD (ToF, active stereo) cameras that are synchronized via a hardware trigger signal generated by a Raspberry Pi. (b): A Shining 3D EinScan scanner is used for scanning small household objects. (c): For larger objects such as furniture, and for the empty room, an Artec Leo hand-held scanner is used.
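The trigger firmware itself is not part of the release; the sketch below illustrates what such a Raspberry Pi hardware trigger could look like, assuming all cameras expose on the rising edge of a shared trigger line (the GPIO pin, frame rate, and pulse width are illustrative, not from the paper):

```python
# Minimal sketch of a Raspberry Pi hardware trigger (illustrative only; the
# actual SCRREAM trigger firmware is not published). Assumes all cameras
# expose on the rising edge of a shared trigger line wired to one GPIO pin.
import time
import RPi.GPIO as GPIO

TRIGGER_PIN = 18      # hypothetical BCM pin wired to all camera trigger inputs
FPS = 10              # hypothetical capture rate
PULSE_WIDTH = 0.001   # 1 ms high pulse

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIGGER_PIN, GPIO.OUT, initial=GPIO.LOW)

try:
    period = 1.0 / FPS
    while True:
        t0 = time.monotonic()
        GPIO.output(TRIGGER_PIN, GPIO.HIGH)  # rising edge: all cameras fire together
        time.sleep(PULSE_WIDTH)
        GPIO.output(TRIGGER_PIN, GPIO.LOW)
        time.sleep(max(0.0, period - (time.monotonic() - t0)))
except KeyboardInterrupt:
    GPIO.cleanup()
```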
Data Collection Procedure :
(a): Pre-scanning of 3D models (empty room, household objects, furniture). (b): Partial scanning of the scene to obtain sparse GT for object registration. (c): Registration of the scanned 3D models to the partial scan of the scene (see the ICP sketch below). (d)-(g): Two-stage mapping to register the real image sequence to the registered mesh. (d): Render synthetic images with realistic lighting. (e): Map the synthetic images' features to 3D using the GT camera poses. (f): Record the real image sequence with the multi-modal camera rig. (g): Match the real image features against the synthetic feature map to obtain the camera poses. (h): Fetch 3D information using the mapped camera poses and the registered full mesh of the scene.
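Step (c) amounts to rigidly aligning each pre-scanned model to the partial scan of the scene. Below is a minimal sketch using Open3D point-to-plane ICP; the file names, sample counts, and distance threshold are placeholders, and the paper does not tie the registration to a particular library:

```python
# Sketch of registering a pre-scanned object mesh to the partial scene scan
# with Open3D ICP. File names, sample counts, and thresholds are placeholders.
import numpy as np
import open3d as o3d

# Sample both meshes to point clouds and estimate normals for point-to-plane ICP.
scene = o3d.io.read_triangle_mesh("partial_scene_scan.ply").sample_points_uniformly(100000)
obj = o3d.io.read_triangle_mesh("object_prescan.ply").sample_points_uniformly(50000)
for pcd in (scene, obj):
    pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.02, max_nn=30))

init = np.eye(4)  # coarse initial pose, e.g. from manual placement
result = o3d.pipelines.registration.registration_icp(
    obj, scene,
    max_correspondence_distance=0.02,  # 2 cm threshold (placeholder)
    init=init,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane(),
)
print("fitness:", result.fitness)
print("T_obj_to_scene:\n", result.transformation)
```

In practice such an alignment needs a reasonable coarse initialization (manual placement or a global registration step) before ICP refines it.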
Two Stage Mapping :
We annotate the real camera poses via two-stage mapping. (a): First, using the registered 3D mesh of the scene, render a realistic image sequence that comes with GT camera poses, then run SfM with these poses as a prior to obtain a 3D feature map of the scene. (b): Capture the real image sequence and run SfM again, but match against the 3D feature map obtained from the synthetic images. With the prior 3D feature map, accurate camera poses can be obtained without scale ambiguity. (c): Dense 3D vision GT can then be rendered from the registered mesh and the camera poses. A sketch of this pipeline follows below.
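The paper describes this pipeline generically as SfM with a pose prior; the sketch below shows how such a two-stage mapping could be reproduced with the stock COLMAP CLI. COLMAP itself, the folder layout, and all paths are assumptions for illustration, and the authors' exact toolchain may differ:

```python
# Sketch of the two-stage mapping using the stock COLMAP CLI. Paths and
# folder layout are placeholders; COLMAP is an assumed stand-in for the
# authors' SfM toolchain.
import subprocess

def colmap(*args):
    """Run a COLMAP subcommand and fail loudly on error."""
    subprocess.run(["colmap", *args], check=True)

# images/ is assumed to contain synthetic/ (renders of the registered mesh)
# and real/ (the captured sequence), so one database holds both sequences
# under distinct names.
colmap("feature_extractor", "--database_path", "db.db", "--image_path", "images/")
colmap("exhaustive_matcher", "--database_path", "db.db")

# Stage (a): triangulate the 3D feature map from the synthetic renders.
# synthetic_prior/ must hold a COLMAP model fixing their GT camera poses;
# point_triangulator keeps those poses and only triangulates 3D points, so
# the feature map inherits the metric scale of the registered mesh.
colmap("point_triangulator",
       "--database_path", "db.db",
       "--image_path", "images/",
       "--input_path", "synthetic_prior/",
       "--output_path", "feature_map/")

# Stage (b): register the real images against the fixed feature map,
# yielding real camera poses in the mesh's metric frame.
colmap("image_registrator",
       "--database_path", "db.db",
       "--input_path", "feature_map/",
       "--output_path", "real_registered/")
```

Because the synthetic poses are held fixed in stage (a), the real poses come out in the metric frame of the registered mesh, which is what allows the dense GT of step (c) to be rendered directly.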
If you have used the dataset in your work or feel that this work has helped your research, please kindly consider citing it:
@misc{jung2024scrreamscanregister,
      title={SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark},
      author={HyunJun Jung and Weihang Li and Shun-Cheng Wu and William Bittner and Nikolas Brasch and Jifei Song and Eduardo Pérez-Pellitero and Zhensong Zhang and Arthur Moreau and Nassir Navab and Benjamin Busam},
      year={2024},
      eprint={2410.22715},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.22715},
}