DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance 

Longwen Zhang1,2  Qiwei Qiu1,2  Hongyang Lin1,2  Qixuan Zhang1,2   Cheng Shi1  Wei Yang3  Ye Shi1  Sibei Yang1  Lan Xu1  Jingyi Yu1 

1ShanghaiTech University        2Deemos Technology        3Huazhong University of Science and Technology

Experience DreamFace online now!

Due to data policy restrictions, the web demo is trained only on our collected data.

Abstract

Emerging Metaverse applications demand accessible, accurate, and easy-to-use tools for 3D digital human creation in order to depict different cultures and societies as if in the physical world. Recent large-scale vision-language advances pave the way for novices to conveniently customize 3D content. However, the generated CG-friendly assets still cannot represent the desired facial traits of human characters. In this paper, we present DreamFace, a progressive scheme to generate personalized 3D faces under text guidance. It enables layman users to naturally customize 3D facial assets that are compatible with CG pipelines, with desired shapes, textures, and fine-grained animation capabilities. From a text input describing the facial traits, we first introduce a coarse-to-fine scheme to generate the neutral facial geometry with a unified topology. We employ a selection strategy in the CLIP embedding space to generate coarse geometry, and subsequently optimize both the detailed displacements and normals using Score Distillation Sampling (SDS) from a generic Latent Diffusion Model (LDM). Then, for neutral appearance generation, we introduce a dual-path mechanism that combines the generic LDM with a novel texture LDM to ensure both diversity and textural specification in the UV space. We also employ a two-stage optimization that performs SDS in both the latent and image spaces, which provides compact priors for fine-grained synthesis. Our generated neutral assets naturally support blendshapes-based facial animation. We further improve the animation ability with personalized deformation characteristics by learning a universal expression prior with a cross-identity hypernetwork and a neural facial tracker for video input. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of DreamFace. Notably, DreamFace can generate realistic 3D facial assets with physically-based rendering quality and rich animation ability from video footage, even for fashion icons or exotic characters in cartoons and fiction movies.
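
As a rough illustration of the detail-optimization step mentioned above, the Python sketch below shows one Score Distillation Sampling update on a learnable detail map (e.g., a UV displacement map). The callables `render`, `encode_to_latent`, and `predict_noise` are placeholder names we assume for exposition; this is a minimal sketch of the generic SDS recipe, not the actual DreamFace implementation.

```python
# Minimal SDS sketch: the renderer and diffusion-model interfaces are assumed
# placeholders, not the authors' released code.
import torch

def sds_update(detail_map, render, encode_to_latent, predict_noise,
               text_emb, uncond_emb, alphas_cumprod, guidance_scale=7.5):
    """Accumulate one SDS gradient into `detail_map.grad`.

    detail_map       : learnable tensor (requires_grad=True), e.g. a UV displacement map
    render           : differentiable renderer, detail_map -> RGB image tensor
    encode_to_latent : LDM VAE encoder, image -> latent tensor
    predict_noise    : LDM UNet, (noisy_latent, t, cond) -> predicted noise
    alphas_cumprod   : 1-D tensor of cumulative noise-schedule alphas
    """
    image = render(detail_map)                 # differentiable rendering
    latents = encode_to_latent(image)          # move to the LDM latent space

    # Sample a diffusion timestep and perturb the latents accordingly.
    t = torch.randint(20, len(alphas_cumprod), (1,), device=latents.device)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(latents)
    noisy = a_t.sqrt() * latents + (1.0 - a_t).sqrt() * noise

    # Classifier-free guidance; the diffusion model itself is not differentiated.
    with torch.no_grad():
        eps_cond = predict_noise(noisy, t, text_emb)
        eps_uncond = predict_noise(noisy, t, uncond_emb)
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS gradient: w(t) * (eps - noise), back-propagated only through the
    # renderer and VAE encoder (the UNet Jacobian is skipped by construction).
    grad = (1.0 - a_t) * (eps - noise)
    latents.backward(gradient=grad)
```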

Overview

Our pipeline consists of three modules: geometry generation, physically-based texture diffusion, and animatability empowerment. Given textual guidance, DreamFace generates facial assets that closely resemble the described characteristics in terms of shape and appearance. Our approach is consistent with industry standards in computer graphics production and achieves photo-realistic results when driven and rendered.
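
To make the module decomposition concrete, here is a purely illustrative Python sketch of the data flow. The `FacialAsset` fields and the three stage callables are hypothetical names chosen for exposition, not a released API.

```python
# Illustrative data-flow sketch of the three-stage pipeline; all names are
# hypothetical placeholders.
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class FacialAsset:
    mesh: Any                      # neutral geometry with a unified topology
    detail_maps: Dict[str, Any]    # displacement / normal detail maps
    pbr_textures: Dict[str, Any]   # diffuse, specular, roughness maps in UV space
    blendshapes: Any               # expression basis used for animation

def dreamface_pipeline(prompt: str,
                       generate_geometry: Callable,
                       generate_pbr_textures: Callable,
                       empower_animatability: Callable) -> FacialAsset:
    mesh, detail_maps = generate_geometry(prompt)        # 1) coarse-to-fine geometry
    pbr_textures = generate_pbr_textures(prompt, mesh)   # 2) dual-path texture diffusion
    blendshapes = empower_animatability(mesh)            # 3) personalized expression prior
    return FacialAsset(mesh, detail_maps, pbr_textures, blendshapes)
```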

Application

Our approach generates facial assets of celebrities that capture their personalized characteristics and achieve a high degree of resemblance. By generating physically-based textures, our facial assets achieve photo-realistic results in modern CG rendering pipelines.

Our approach generates facial assets that faithfully match the characteristics described in the prompts. Through our animatability empowerment, the generated facial assets can be animated using a single RGB image and rendered photo-realistically in modern CG pipelines.

The upper row shows the rendering results from the differentiable renderer, and the lower row shows the corresponding diffuse maps. Our framework faithfully reveals the facial characteristics of characters, even if they are not present in our texture dataset, for example, the pink nose of Na’vi and the metallic patterns of Black Panther’s mask. In addition, our texture LDM serves as a robust prior, ensuring that the generated facial components share a consistent UV space. 

More results

By directly using our trained texture LDM with a prompt, one can achieve global editing effects such as aging and makeup. By further combining masks or sketches, one can create various effects such as tattoos, beards, and birthmarks.
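
As a hedged example of how such mask-based edits could be scripted, the snippet below assumes the texture LDM were exposed through a diffusers-style inpainting interface; the checkpoint path, file names, and prompt are hypothetical, and the released model may differ.

```python
# Illustrative only: hypothetical checkpoint path and file names; the texture
# LDM is not officially packaged this way.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "path/to/texture-ldm-inpaint",            # hypothetical texture-LDM checkpoint
    torch_dtype=torch.float16,
).to("cuda")

uv_diffuse = Image.open("diffuse_uv.png").convert("RGB")   # existing UV diffuse map
mask = Image.open("edit_region_mask.png").convert("L")     # white = region to edit

edited = pipe(
    prompt="a rose tattoo on the left cheek, photorealistic skin texture",
    image=uv_diffuse,
    mask_image=mask,
).images[0]
edited.save("diffuse_uv_edited.png")
```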

For each case, we show the input RGB images (left) and the personalized driven results (right). Our framework provides each generated facial asset with personalized expressions from a single image.

Video

Bilibili video

Paper link

Dataset (coming soon...)

We are planning to provide more than 10,000 3D facial assets with PBR textures and BlendShapes. 

Citation

@misc{zhang2023dreamface,
      title={DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance},
      author={Longwen Zhang and Qiwei Qiu and Hongyang Lin and Qixuan Zhang and Cheng Shi and Wei Yang and Ye Shi and Sibei Yang and Lan Xu and Jingyi Yu},
      year={2023},
      eprint={2304.03117},
      archivePrefix={arXiv},
      primaryClass={cs.GR}
}

Notice: You may not copy, reproduce, distribute, publish, display, perform, modify, create derivative works, transmit, or in any way exploit any such content, nor may you distribute any part of this content over any network, including a local area network, sell or offer it for sale, or use such content to construct any kind of database.