CommonScenes

Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion

Guangyao Zhai∗, Evin Pınar Örnek∗, Shun-Cheng Wu, Yan Di

Federico Tombari, Nassir Navab, Benjamin Busam


NeurIPS 2023


Please visit our newest scene graph diffusion model for scene generation: EchoScene.

Arxiv📝 | SG-FRONT💾 | Code⌨️ 

TL;DR
We present CommonScenes, a fully generative model that creates indoor scenes from given scene graphs, powered by diffusion models.

Qualitative comparisons (Graph-to-3D vs. Ours) on three scene types: Dining Room, Bedroom, Living Room.

About our method

I. Scene Graph Evolution

We upgrade the original scene graph and its graph features to a contextual graph by attaching CLIP features to each node and edge. During training, we further augment the contextual graph into a box-enhanced contextual graph (BCG) by embedding the ground-truth bounding boxes, which serve as input to a VAE (see II. Pipeline); at inference time, this VAE predicts the bounding boxes that form the scene layout.
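A minimal sketch of building such a box-enhanced contextual graph. All names here (`fake_clip_embed`, `embed_box`, `build_bcg`) are hypothetical stand-ins: the fake CLIP encoder returns a deterministic pseudo-random vector instead of real CLIP text features, and the box embedding is a trivial pad instead of a learned MLP.

```python
import random

def fake_clip_embed(text, dim=8):
    # Hypothetical stand-in for a real CLIP text encoder:
    # deterministic pseudo-random vector seeded by the label.
    rng = random.Random(sum(map(ord, text)))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def embed_box(box, dim=8):
    # Toy box embedding: pad the 6-DoF box parameters to `dim`
    # (a real model would use a learned embedding layer).
    return (list(box) + [0.0] * dim)[:dim]

def build_bcg(nodes, edges, gt_boxes):
    """Augment a scene graph into a box-enhanced contextual graph (BCG):
    each node carries a CLIP feature plus a ground-truth box embedding,
    each edge carries a CLIP feature of its relation label."""
    bcg_nodes = {
        name: {"clip": fake_clip_embed(name),
               "box": embed_box(gt_boxes[name])}
        for name in nodes
    }
    bcg_edges = [(s, o, {"clip": fake_clip_embed(rel)})
                 for (s, rel, o) in edges]
    return bcg_nodes, bcg_edges

nodes = ["table", "chair"]
edges = [("chair", "left of", "table")]
gt_boxes = {"table": (0, 0, 0, 1.2, 0.8, 0.7),
            "chair": (1, 0, 0, 0.5, 0.5, 0.9)}
bcg_nodes, bcg_edges = build_bcg(nodes, edges, gt_boxes)
```

At inference the `box` entries are unavailable; the pipeline below replaces them with latent codes sampled from the learned distribution.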

II. Pipeline

Training:

Given the BCG described above, a triplet-GCN-based contextual encoder Ec encodes the graph features into a joint layout-shape distribution Z. We then update the BCG to the updated contextual graph (UCG) by replacing the ground-truth bounding box embeddings with latent codes sampled from Z. The UCG is fed into two branches that generate shapes and layouts. In the shape branch, the UCG is encoded again to produce per-object relation embeddings, which condition the diffusion process that recovers each object's shape. In the layout branch, the layout decoder Dl decodes the UCG into layouts supervised by the ground-truth bounding boxes.
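The BCG-to-UCG update can be sketched as a standard VAE reparameterization step. Everything below is a toy stand-in under stated assumptions: `encode_to_latent` replaces the triplet-GCN encoder Ec with a trivial mean pool, and the latent dimension is arbitrary.

```python
import math
import random

random.seed(0)

def encode_to_latent(node_feat, zdim=4):
    # Hypothetical stand-in for the triplet-GCN encoder E_c: maps a
    # node's concatenated features to mean and log-variance of Z.
    mu = [sum(node_feat) / len(node_feat)] * zdim
    logvar = [0.0] * zdim
    return mu, logvar

def reparameterize(mu, logvar):
    # z = mu + sigma * eps, the standard VAE reparameterization trick.
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def update_graph(bcg):
    # Replace each node's GT-box embedding with a latent sample from Z,
    # yielding the updated contextual graph (UCG).
    ucg = {}
    for name, feats in bcg.items():
        mu, logvar = encode_to_latent(feats["clip"] + feats["box"])
        ucg[name] = {"clip": feats["clip"],
                     "z": reparameterize(mu, logvar)}
    return ucg

bcg = {
    "sofa": {"clip": [0.1] * 8, "box": [0.2] * 8},
    "lamp": {"clip": [0.3] * 8, "box": [0.4] * 8},
}
ucg = update_graph(bcg)
```

During training, a KL term would regularize (mu, logvar) toward a standard normal so that Z can be sampled from the prior at inference time.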

Inference:

At inference, the pipeline starts from the UCG, obtained by augmenting a scene graph with CLIP features and sampling latent codes from the learned distribution Z. The scene layout with object bounding boxes is generated first, followed by diffusion-based shape generation.
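The inference flow can be sketched per object: sample z from the prior, decode a box, then run the conditional shape generator. Both `decode_layout` and `generate_shape` are hypothetical toy stand-ins (a trivial mapping and a few averaging steps) for the learned layout decoder Dl and the conditional diffusion branch.

```python
import random

random.seed(0)

def sample_prior(zdim=4):
    # Sample a latent code z ~ N(0, I) from the learned prior.
    return [random.gauss(0.0, 1.0) for _ in range(zdim)]

def decode_layout(z):
    # Hypothetical stand-in for the layout decoder D_l: maps z to a
    # 6-DoF box (center + positive size); a real model is a learned GCN.
    return z[:3] + [abs(v) + 0.1 for v in z[:3]]

def generate_shape(z, steps=5):
    # Toy stand-in for the conditional diffusion branch: start from
    # noise and iteratively pull the shape code toward the condition z.
    x = [random.gauss(0.0, 1.0) for _ in z]
    for _ in range(steps):
        x = [0.8 * xi + 0.2 * zi for xi, zi in zip(x, z)]
    return x

scene = {}
for obj in ["bed", "nightstand"]:
    z = sample_prior()
    scene[obj] = {"box": decode_layout(z), "shape": generate_shape(z)}
```

Because every object's layout and shape are driven by latents sampled from the same joint distribution Z, resampling z yields diverse but relation-consistent scenes.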


We further illustrate the whole pipeline in the video below:

If this work is helpful to your research, please consider citing it:

@article{zhai2023commonscenes,
  title={CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion},
  author={Zhai, Guangyao and {\"O}rnek, Evin Pinar and Wu, Shun-Cheng and Di, Yan and Tombari, Federico and Navab, Nassir and Busam, Benjamin},
  journal={arXiv preprint arXiv:2305.16283},
  year={2023}
}