CommonScenes
Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion
Guangyao Zhai∗, Evin Pınar Örnek∗, Shun-Cheng Wu, Yan Di,
Federico Tombari, Nassir Navab, Benjamin Busam
NeurIPS 2023
Please visit our newest scene graph diffusion model for scene generation: EchoScene.
TL;DR
We present CommonScenes, a fully generative model, powered by diffusion models, that creates 3D indoor scenes from given scene graphs.
[Figure: qualitative comparison between Graph-to-3D and Ours on Dining Room, Bedroom, and Living Room scenes]
About our method
I. Scene Graph Evolution
We upgrade the original scene graph, together with its graph features, to a contextual graph by adding CLIP features to each node and edge. During training, we further augment the contextual graph into a box-enhanced contextual graph (BCG) by adding embeddings of the ground-truth bounding boxes. The BCG is the input to a VAE (see II. Pipeline), which learns to predict bounding boxes as scene layouts at inference time.
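To make the graph evolution concrete, here is a minimal sketch using only the Python standard library. It assumes node features are category-name text features, edge features are text features of a "subject predicate object" phrase, and the BCG simply concatenates a raw 7-D box (size, center, yaw) to each node. `text_feature` is a hypothetical hash-based stand-in for a frozen CLIP text encoder, and all function names and the 4-D feature width are illustrative only, not the authors' code.

```python
import hashlib

EMB_DIM = 4  # hypothetical; real CLIP text embeddings are much wider

def text_feature(text: str, dim: int = EMB_DIM) -> list[float]:
    """Hypothetical stand-in for a frozen CLIP text encoder: deterministic hash features."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]

def contextual_graph(categories, edges):
    """Attach a text feature to every node (category name) and to every
    (subject, predicate, object) edge, phrased as a short sentence."""
    node_feats = [text_feature(name) for name in categories]
    edge_feats = [text_feature(f"{categories[s]} {pred} {categories[o]}")
                  for s, pred, o in edges]
    return node_feats, edge_feats

def box_enhanced(node_feats, boxes):
    """Training only: append a ground-truth box vector to each node feature.
    Boxes are assumed to be 7-D (size w, l, h; center x, y, z; yaw angle)."""
    if len(node_feats) != len(boxes):
        raise ValueError("need exactly one box per node")
    return [feat + [float(v) for v in box] for feat, box in zip(node_feats, boxes)]

categories = ["bed", "nightstand", "lamp"]
edges = [(1, "left of", 0), (2, "standing on", 1)]
boxes = [(2.0, 1.6, 0.5, 0.0, 0.0, 0.25, 0.0),
         (0.5, 0.4, 0.6, -1.3, 0.0, 0.3, 0.0),
         (0.2, 0.2, 0.4, -1.3, 0.0, 0.8, 0.0)]
node_feats, edge_feats = contextual_graph(categories, edges)
bcg_nodes = box_enhanced(node_feats, boxes)
print(len(bcg_nodes[0]))  # 11 = 4 text dims + 7 box dims
```

Keeping the box vector separate from the text features mirrors the idea that the box part exists only during training and is swapped out later in the pipeline.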
II. Pipeline
Training:
Starting from the BCG described above, a triplet-GCN-based contextual encoder E_c encodes the graph features into a joint layout-shape distribution Z. We then turn the BCG into an Updated Contextual Graph (UCG) by replacing the ground-truth bounding-box embeddings with latent codes sampled from Z. The UCG is fed into two branches, one generating shapes and one generating layouts. In the shape branch, the UCG is encoded again to produce a per-object relation embedding, which conditions the diffusion process that recovers each object's shape. In the layout branch, the layout decoder D_l decodes the UCG into layouts, supervised by the ground-truth bounding boxes.
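The encoder step can be sketched as follows, under heavy simplifying assumptions: one triplet-GCN layer with a single tanh-activated linear map per triplet, averaged node messages, per-node Gaussian heads, and the reparameterization trick to sample latent codes. Weights are random and untrained, and the KL term, the layout decoder, and the shape branch are omitted. Every name here (`triplet_gcn_layer`, `encode_to_ucg`, `DIM`, and so on) is hypothetical, and the box embeddings are assumed to be already embedded into `DIM`-wide vectors.

```python
import math
import random

DIM = 4  # hypothetical width shared by context features, box embeddings, latents

def linear(weights, x):
    """Bias-free dense layer; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def rand_weights(rows, cols, rng):
    """Untrained random weights, only so the sketch runs end to end."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def triplet_gcn_layer(nodes, edge_feats, edges, w_msg):
    """Simplified triplet-GCN step: map each concatenated (subject, predicate,
    object) triplet to new vectors, then average the messages per node."""
    sums = [[0.0] * DIM for _ in nodes]
    counts = [0] * len(nodes)
    new_edges = []
    for (s, _, o), e in zip(edges, edge_feats):
        out = [math.tanh(v) for v in linear(w_msg, nodes[s] + e + nodes[o])]
        new_edges.append(out[DIM:2 * DIM])
        for idx, msg in ((s, out[:DIM]), (o, out[2 * DIM:])):
            sums[idx] = [a + b for a, b in zip(sums[idx], msg)]
            counts[idx] += 1
    # Nodes that appear in no triplet keep their previous features.
    new_nodes = [[v / counts[i] for v in sums[i]] if counts[i] else list(nodes[i])
                 for i in range(len(nodes))]
    return new_nodes, new_edges

def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + exp(0.5 * logvar) * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def encode_to_ucg(context_nodes, box_embs, edge_feats, edges, rng):
    """BCG -> E_c (one GCN layer here) -> per-node Gaussian -> UCG."""
    w_in = rand_weights(DIM, 2 * DIM, rng)       # fuse context feature + box embedding
    w_msg = rand_weights(3 * DIM, 3 * DIM, rng)  # triplet message transform
    w_mu, w_lv = rand_weights(DIM, DIM, rng), rand_weights(DIM, DIM, rng)
    bcg = [linear(w_in, c + b) for c, b in zip(context_nodes, box_embs)]
    hidden, _ = triplet_gcn_layer(bcg, edge_feats, edges, w_msg)
    ucg = []
    for ctx, h in zip(context_nodes, hidden):
        z = sample_latent(linear(w_mu, h), linear(w_lv, h), rng)
        ucg.append(ctx + z)  # sampled code replaces the ground-truth box embedding
    return ucg

rng = random.Random(0)
context = [[0.1] * DIM, [0.2] * DIM, [0.3] * DIM]
boxes = [[1.0] * DIM, [0.5] * DIM, [0.2] * DIM]
edges = [(1, "left of", 0), (2, "standing on", 1)]
edge_feats = [[0.3] * DIM, [0.4] * DIM]
ucg = encode_to_ucg(context, boxes, edge_feats, edges, rng)
```

The reparameterization keeps sampling differentiable with respect to the predicted mean and log-variance, which is what lets a VAE-style encoder be trained end to end.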
Inference:
At inference, the pipeline starts from the UCG, which is obtained by augmenting a scene graph with CLIP features and latent codes sampled from the learned distribution Z. The scene layout, i.e., the object bounding boxes, is generated first, followed by diffusion-based shape generation.
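The inference order can be sketched as: complete the UCG with sampled codes, decode one box per object, then run a conditional reverse diffusion per object. This sketch assumes a standard DDPM ancestral sampler (Ho et al., 2020) with a linear beta schedule over a small vector standing in for a shape latent. How codes are drawn from Z is injected as a hypothetical `sample_z` callable, and `decode_layout`, `relation_encoder`, and `eps_model` are toy stand-ins for trained networks, not the released model.

```python
import math
import random

def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (requires steps >= 2)."""
    return [beta_start + (beta_end - beta_start) * i / (steps - 1) for i in range(steps)]

def ddpm_sample(eps_model, cond, dim, steps, rng):
    """Ancestral DDPM sampling of one object's shape latent, conditioned on `cond`."""
    betas = linear_beta_schedule(steps)
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(steps)):
        eps = eps_model(x, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t]) for xi, ei in zip(x, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x

def generate_scene(context_nodes, sample_z, decode_layout, relation_encoder,
                   eps_model, shape_dim, rng, steps=50):
    """Inference order from the text: UCG -> layouts -> per-object shapes."""
    ucg = [ctx + sample_z(rng) for ctx in context_nodes]  # 1) complete the UCG
    boxes = [decode_layout(node) for node in ucg]         # 2) layout branch (D_l)
    conds = relation_encoder(ucg)                         # 3) per-object relation embeddings
    shapes = [ddpm_sample(eps_model, c, shape_dim, steps, rng) for c in conds]
    return boxes, shapes

rng = random.Random(0)
boxes, shapes = generate_scene(
    context_nodes=[[0.1] * 4, [0.2] * 4],
    sample_z=lambda r: [r.gauss(0.0, 1.0) for _ in range(4)],  # assumed N(0, I) sampler
    decode_layout=lambda node: [sum(node) / len(node)] * 7,     # toy D_l: 7-D box
    relation_encoder=lambda nodes: nodes,                       # toy: identity
    eps_model=lambda x, t, c: [0.1 * ci for ci in c[:len(x)]],  # toy noise predictor
    shape_dim=4,
    rng=rng,
)
```

Passing the networks in as callables keeps the control flow (layouts first, then shapes) separate from any particular architecture.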
We further illustrate the whole pipeline in the video below:
If you find this work useful for your research, please consider citing it:
@article{zhai2023commonscenes,
title={CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion},
author={Zhai, Guangyao and {\"O}rnek, Evin Pinar and Wu, Shun-Cheng and Di, Yan and Tombari, Federico and Navab, Nassir and Busam, Benjamin},
journal={arXiv preprint arXiv:2305.16283},
year={2023}
}