ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation
We propose Asynchronous Score Distillation (ASD), a novel score distillation method for training text-to-3D generators in an unsupervised manner. ASD is stable to train and scales up to 100k prompts. We conduct extensive experiments across different 2D diffusion models, including Stable Diffusion and MVDream, and text-to-3D generators, including Hyper-iNGP, 3DConv-Net and Triplane-Transformer. The results demonstrate ASD's effectiveness in stable 3D generator training, high-quality 3D content synthesis, and superior prompt consistency, especially with a large prompt corpus.
Figure 1. Overview of Asynchronous Score Distillation (ASD). As illustrated in the left sub-figure, ASD can be employed for prompt-specific generation by optimizing 3D representations for each prompt, as well as for prompt-amortized generation by training a text-to-3D generator. The right sub-figure depicts how ASD uses the difference in noise predictions at asynchronous timesteps to update the 3D network parameters.
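The update described in the right sub-figure can be illustrated with a minimal sketch. It only shows the core idea stated above: the same noisy render is passed to the diffusion model at two different timesteps, and the difference between the two noise predictions gives the update direction. The names `toy_noise_predictor`, `asd_update_direction`, `delta_t`, `weight` and `T_MAX` are hypothetical, the toy predictor stands in for a frozen, text-conditioned 2D diffusion model, and the sign convention, weighting and timestep clamping are assumptions rather than details taken from this page.

```python
import numpy as np

T_MAX = 999  # assumed index of the last timestep in the diffusion schedule


def toy_noise_predictor(x_t, t):
    """Hypothetical stand-in for a frozen, text-conditioned 2D diffusion model.

    A real model would return a learned noise estimate; this toy version only
    makes the output depend on both the input and the timestep.
    """
    return x_t * (t / 1000.0)


def asd_update_direction(noise_predictor, x_t, t, delta_t, weight=1.0):
    """Weighted difference of two noise predictions on the same noisy render.

    The predictor is queried at timestep t and at the shifted timestep
    t + delta_t (clamped to T_MAX). Sign, weighting and clamping are
    assumptions made for this sketch.
    """
    t_shifted = min(t + delta_t, T_MAX)
    eps_at_t = noise_predictor(x_t, t)
    eps_at_shifted = noise_predictor(x_t, t_shifted)
    return weight * (eps_at_t - eps_at_shifted)


# Usage: a 2x2 "noisy render" evaluated at t=500 with a shift of 100 steps.
x_t = np.ones((2, 2))
direction = asd_update_direction(toy_noise_predictor, x_t, t=500, delta_t=100)
```

In an actual training loop, a direction like this would be backpropagated through the rendered image into the 3D network parameters; the sketch stops at the image-space direction and does not model rendering or the generator.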
Teaser
Demo 1. Top rows: Asynchronous Score Distillation (ASD) for prompt-specific text-to-3D generation. Bottom row: ASD for prompt-amortized generation, which learns a text-to-3D generator on multiple prompts without 3D ground truth. ASD can scale the training corpus up to as many as 100k text prompts.
Results with iNGP / Hyper-iNGP
Demo 2. Qualitative comparison of prompt-specific (with iNGP as the 3D representation) and prompt-amortized (with Hyper-iNGP as the 3D generator) text-to-3D results from SDS, CSD, VSD and our ASD.
Results with 3DConv-Net
Demo 3. Qualitative comparison among CSD, VSD and our ASD (with 3DConv-Net as the generator) on the AT2520 and DF415 corpora. SDS is not compared because it encounters numerical instability in this experiment.
Ablation Study
Demo 4. Qualitative results of the ablation study on the timestep interval.
Scalability
Demo 5. Scalability comparison with CSD and VSD on the CP100k corpus.
More Results with MVDream
Demo 6. Qualitative comparison between SDS* and ASD on prompt-specific text-to-3D generation, with iNGP as the 3D representation and MVDream as the 2D diffusion prior.