Video-driven Neural Physically-based Facial Asset for Production

Longwen Zhang1,2 Chuxiao Zeng1,2 Qixuan Zhang1,2 Hongyang Lin1,2 Ruixiang Cao1,2 Wei Yang3 Lan Xu1 Jingyi Yu1

1ShanghaiTech University 2Deemos Technology 3Huazhong University of Science and Technology

Abstract

Production-level workflows for creating convincing 3D dynamic human faces have long relied on an assortment of labor-intensive tools for geometry and texture generation, motion capture and rigging, and expression synthesis. Recent neural approaches automate individual components, but their latent representations cannot provide artists with the explicit controls available in conventional tools. In this paper, we present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets. Its two key components are well-structured latent spaces, obtained from dense temporal sampling of videos, and explicit facial expression controls that regulate those latent spaces. For data collection, we construct a hybrid multiview-photometric capture stage, coupled with ultra-fast video cameras, to obtain raw 3D facial assets. We then model facial expression, geometry, and physically-based textures with separate VAEs, and impose a global multi-layer perceptron (MLP) based expression mapping across their latent spaces to preserve the characteristics of each attribute while maintaining explicit control over facial geometry and texture generation. We also model the delta information as wrinkle maps for the physically-based textures in our texture VAE, achieving high-quality 4K rendering of dynamic textures. We demonstrate our approach in high-fidelity performer-specific facial capture and in cross-identity facial motion transfer and retargeting. In addition, our multi-VAE-based neural asset, along with fast adaptation schemes, can be deployed to handle in-the-wild videos. Furthermore, we demonstrate the utility of our explicit facial disentangling strategy through a variety of promising physically-based editing results, such as geometry and material editing and wrinkle transfer, with high realism. Comprehensive experiments show that our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
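To make the multi-VAE layout and the global MLP expression mapping described above more concrete, the sketch below illustrates one plausible arrangement. This is our illustration, not the authors' code: the module names, latent dimensions, and layer sizes are assumptions.

import torch
import torch.nn as nn

class ExpressionToAssetMapper(nn.Module):
    """Global MLP that maps an expression latent code to the latent spaces
    of the geometry and texture VAEs (all dimensions are assumptions)."""
    def __init__(self, expr_dim=64, geo_dim=128, tex_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(expr_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.to_geo = nn.Linear(256, geo_dim)   # latent for the geometry VAE decoder
        self.to_tex = nn.Linear(256, tex_dim)   # latent for the texture VAE decoder

    def forward(self, z_expr):
        h = self.backbone(z_expr)
        return self.to_geo(h), self.to_tex(h)

# Hypothetical usage, assuming decoders from separately trained VAEs and an
# expression latent z_expr regressed from a video frame:
#   mapper = ExpressionToAssetMapper()
#   z_geo, z_tex = mapper(z_expr)
#   verts = geometry_decoder(z_geo)           # dynamic facial mesh vertices
#   wrinkle_deltas = texture_decoder(z_tex)   # per-frame delta (wrinkle) maps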

Overview

Our fast capture stage (FaStage) extends the classical photometric LightStage. FaStage combines multi-view reconstruction and photometric reconstruction to recover the dynamic geometry and physically-based textures of a performer frame by frame. We then present a new neural representation, with tailored network designs and training strategies, to model dynamic meshes and textures under different expressions. The trained neural assets can be used for a range of applications, including appearance stylization and relighting under novel expressions and viewpoints.
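As a rough illustration of the frame-by-frame, video-driven evaluation described above, the loop below sketches how a trained neural asset might be driven by a video. The encoder, mapper, and decoder names are hypothetical stand-ins for the networks in the paper.

import torch

@torch.no_grad()
def drive_facial_asset(frames, expr_encoder, mapper, geo_decoder, tex_decoder):
    """Evaluate a trained neural facial asset on a sequence of video frames.
    All module names are assumptions; each frame yields mesh vertices plus
    physically-based texture maps ready for rendering and relighting."""
    outputs = []
    for frame in frames:                      # frame: (3, H, W) image tensor
        z_expr = expr_encoder(frame[None])    # per-frame expression latent
        z_geo, z_tex = mapper(z_expr)         # map into geometry/texture latent spaces
        verts = geo_decoder(z_geo)            # dynamic facial mesh vertices
        textures = tex_decoder(z_tex)         # per-frame PBR texture (wrinkle) maps
        outputs.append((verts, textures))
    return outputs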

Application

Illustration of our applications: (a) performer-specific neural facial asset, (b) cross-identity neural retargeting, (c) geometry and texture editing.

More results

Our dynamic textures effectively enhance the appearance of the performer: (a,d) driven facial assets with only static neutral physically-based textures, (b,e) driven facial assets with dynamic physically-based textures, (c,f) zoom-in views. Our method successfully models the dynamic textures and preserves facial details at high resolution.
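To make the role of the dynamic textures concrete, the following sketch (our illustration, not the paper's code) shows one plausible way per-frame wrinkle delta maps could be combined with the static neutral physically-based textures. The texture names and the additive composition are assumptions.

import torch

def compose_dynamic_textures(neutral, deltas):
    """Combine static neutral physically-based textures with predicted
    per-frame wrinkle delta maps (additive composition is an assumption).

    neutral / deltas: dicts of tensors shaped (C, H, W), e.g. 4K maps
    keyed by 'diffuse', 'specular', 'normal', ...
    """
    dynamic = {}
    for name, base in neutral.items():
        delta = deltas.get(name, torch.zeros_like(base))
        # Clamp to the [0, 1] texture range (assumes maps are stored normalized).
        dynamic[name] = (base + delta).clamp(0.0, 1.0)
    return dynamic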

We extend the expression network to new subjects with different expressions and varying head poses: (a,e) the upper image shows the new performer's expression and the lower image shows the neural retargeting result from our expression decoder, (b,f) driven facial asset from the front view, (c,g) driven facial asset from the left view, (d,h) zoom-in views. Our method achieves detailed video-driven results from different identities with dynamic textures, leading to photo-realistic rendering.

We edit the video-driven facial assets (a,e) in various ways: (b) we paint the diffuse albedo in the style of Guan Yu, the most famous red-faced character in traditional Chinese Peking Opera; (c) we stylize the geometry of the performer toward the blue alien from the feature film AVATAR, with a novel facial structure but consistent identity features; (d) we add a metal Deemos logo to the cheek of the performer by modifying the textures; (f) we paint the diffuse albedo in the style of Yu Ji, the beloved concubine of Xiang Yu, the hegemon of Western Chu, from the famous Peking Opera Farewell My Concubine; (g) we alter the performer's facial structure and add realistic face painting by jointly modifying the diffuse albedo and the normal map; (h) we adjust the facial roughness to give the skin a shinier look.

Video

Bilibili video

Paper link

NPFA_SIGASIA_Supplementary-6.pdf

Dataset

Apply for Dynamic Facial PBR dataset (coming soon)

  1. Download and read the LICENSE AGREEMENT carefully and confirm that you agree to all of its terms. Sign the agreement and scan it to PDF. An electronic signature is allowed, i.e., placing an image of your handwritten signature on the agreement.

  2. Send an e-mail to shanghaitechmars@foxmail.com. We recommend applying with a *.edu e-mail address, which is more likely to be approved.

You can refer to the following template for the e-mail content:


Subject: Application of Dynamic Facial PBR Dataset

Content Template:

Applicant: __________

Occupation: __________

Institution: __________

Purpose of Usage: __________

Your Scholar website: __________

Your Gmail address: __________

Attachment -- scanned copy of the signed license agreement


Optional: we recommend attaching a brief resume or a personal page URL to help us identify you. If we cannot identify you as a professor or researcher, the application may be ignored.

Note: we find that many of our replies are mislabeled as spam, so if you do not receive a response within 24 hours of sending the request on a working day, please check your spam folder just in case. Please make sure your Gmail address is correct, since we will share the Google Drive link with that account if the application is approved.

  3. We are obtaining agreements from the volunteers involved in the Dynamic Facial PBR dataset, and the dataset will be updated from time to time. For each subject, we will release about 10 seconds of pre-processed 4D scanned meshes with dynamic physically-based textures. Only data from volunteers who have given their agreement will be provided.

Citation

@misc{zhang2022videodriven,
  title={Video-driven Neural Physically-based Facial Asset for Production},
  author={Longwen Zhang and Chuxiao Zeng and Qixuan Zhang and Hongyang Lin and Ruixiang Cao and Wei Yang and Lan Xu and Jingyi Yu},
  year={2022},
  eprint={2202.05592},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Notice: You may not copy, reproduce, distribute, publish, display, perform, modify, create derivative works, transmit, or in any way exploit any such content, nor may you distribute any part of this content over any network, including a local area network, sell or offer it for sale, or use such content to construct any kind of database.