Geometry-aware Two-scale PIFu Representation for Human Reconstruction

NeurIPS 2022 (Spotlight)

Part 1: Abstract

From left to right: (a) Raw RGBD. (b) PIFu (RGBD). (c) PIFuHD. (d) IPNet. (e) Ours. (f) Our additional results.

Although PIFu-based 3D human reconstruction methods are popular, the quality of recovered details is still unsatisfactory. In a sparse (e.g., 3 RGBD sensors) capture setting, the depth noise is typically amplified in the PIFu representation, resulting in flat facial surfaces and geometry-fallible bodies. In this paper, we propose a novel geometry-aware two-scale PIFu for 3D human reconstruction from sparse, noisy inputs. Our key idea is to exploit the complementary properties of depth denoising and 3D reconstruction to learn a two-scale PIFu representation that reconstructs high-frequency facial details and consistent bodies separately. To this end, we first formulate depth denoising and 3D reconstruction as a multi-task learning problem. The depth denoising process enriches the local geometry information of the reconstruction features, while the reconstruction process enhances depth denoising with global topology information. We then learn the two-scale PIFu representation using two MLPs that operate on the denoised depth and the geometry-aware features. Extensive experiments demonstrate the effectiveness of our approach in reconstructing facial details and bodies in a variety of poses, as well as its superiority over state-of-the-art methods.
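
To make the two-scale representation concrete, here is a minimal PyTorch sketch of how a pixel-aligned implicit function queries occupancy at 3D points, with one MLP for the body (coarse, geometry-aware features) and one for the face (high-resolution facial features). All module names, feature dimensions, and resolutions are illustrative assumptions, not the authors' released code.

# Minimal PyTorch sketch of a two-scale PIFu-style occupancy query.
# Everything here (names, dimensions) is illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitMLP(nn.Module):
    # Pixel-aligned implicit function: maps a sampled image feature plus the
    # query point's depth to an occupancy value in [0, 1].
    def __init__(self, feat_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 1, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, point_feats, point_z):
        # point_feats: (B, N, C) features at the 2D projections of N points
        # point_z:     (B, N, 1) camera-space depth of each query point
        return self.net(torch.cat([point_feats, point_z], dim=-1))

def sample_pixel_aligned(feat_map, uv):
    # Bilinearly sample a (B, C, H, W) feature map at (B, N, 2) uv in [-1, 1].
    sampled = F.grid_sample(feat_map, uv.unsqueeze(1), align_corners=True)
    return sampled.squeeze(2).transpose(1, 2)  # (B, C, 1, N) -> (B, N, C)

# Two scales: a body MLP on coarse geometry-aware features (shared with the
# depth-denoising branch) and a face MLP on high-resolution facial features.
body_mlp = ImplicitMLP(feat_dim=256)
face_mlp = ImplicitMLP(feat_dim=64)

B, N = 1, 4096
body_feats = torch.randn(B, 256, 128, 128)  # stand-ins for learned features
face_feats = torch.randn(B, 64, 512, 512)
uv = torch.rand(B, N, 2) * 2 - 1            # projected query points
z = torch.rand(B, N, 1)

occ_body = body_mlp(sample_pixel_aligned(body_feats, uv), z)  # (B, N, 1)
occ_face = face_mlp(sample_pixel_aligned(face_feats, uv), z)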

Part 2: Pipeline

Overview of the proposed method. Given sparse, noisy RGBD inputs, the geometry-aware PIFu-Body performs depth denoising and predicts the body occupancy field, while the high-resolution PIFu-Face predicts the face occupancy field with fine-grained details. The body and face occupancy fields are then fused into the final result via the Face-to-Body Fusion scheme.
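
As a rough picture of the face-to-body fusion step, the sketch below blends a fine face occupancy grid into the body occupancy grid with a soft, distance-based weight near the face region, so the fused field stays continuous at the boundary. This is a hypothetical illustration on a shared voxel grid; the paper's actual fusion scheme may differ.

# Hypothetical face-to-body fusion on a shared voxel lattice; the blend
# weight and region definition are assumptions for illustration only.
import numpy as np
from scipy.ndimage import distance_transform_edt

def fuse_occupancy(occ_body, occ_face, face_mask, falloff=4.0):
    # occ_body, occ_face: (D, H, W) occupancy grids on the same lattice
    #                     (occ_face is only trusted where face_mask is True).
    # face_mask:          (D, H, W) boolean mask of the face region.
    # Soft blend weight: 1 deep inside the face region, 0 outside it,
    # decaying over `falloff` voxels near the mask boundary.
    w = np.clip(distance_transform_edt(face_mask) / falloff, 0.0, 1.0)
    return w * occ_face + (1.0 - w) * occ_body

# The fused grid can then be meshed as usual, e.g. with marching cubes:
# verts, faces, _, _ = skimage.measure.marching_cubes(fused, level=0.5)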

Part 3: Demo Video

Part 4: More Results

Figure A: 3D reconstructed and textured results. RGBD inputs of the front view (a). Our 3D reconstructed results (b). Our textured results (c).

Figure B: Examples with eyeglasses (first row) and loose clothing (second row). RGBD inputs of the front view (a). Our refined depths, shown as fused point clouds (b). Our 3D reconstructed results (c). Our textured results (d).

Figure C: Visualization of the difficulty of hand reconstruction. Our reconstructed results (a, c). Ground-truth models (b, d). Zoom in to see the details.

Table A: Quantitative comparisons on our test set between the sequential model (depth denoising followed by 3D reconstruction) and our proposed model (multi-task manner). The best results are marked in bold.

Figure D: Qualitative comparisons on our test set between the sequential model (depth denoising followed by 3D reconstruction) and our proposed model (multi-task manner). RGBD inputs of the front view (a). Refined depths and normals of the sequential model (b) and our model (c). Normal error maps against the ground truth for the sequential model (d) and our model (e). Ground-truth normals (f). Reconstructed 3D results of the sequential model (g) and our model (h). Ground-truth meshes (i). Zoom in to see the details.
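
For reference, the "multi-task manner" contrasted above amounts to optimizing the depth-denoising and occupancy losses jointly over a shared backbone, instead of training the two stages one after the other. A minimal sketch, with illustrative loss terms and weights:

# Illustrative joint objective for the multi-task setting; the loss terms
# and weights are assumptions, not the paper's exact formulation.
import torch.nn.functional as F

def multi_task_loss(pred_depth, gt_depth, pred_occ, gt_occ,
                    w_depth=1.0, w_occ=1.0):
    loss_depth = F.l1_loss(pred_depth, gt_depth)         # depth denoising
    loss_occ = F.binary_cross_entropy(pred_occ, gt_occ)  # reconstruction
    # Joint optimization lets reconstruction gradients inform denoising
    # (global topology) and denoising gradients sharpen the reconstruction
    # features (local geometry), unlike the sequential baseline.
    return w_depth * loss_depth + w_occ * loss_occ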

Figure E: Qualitative comparisons between three face reconstruction methods and our PIFu-Face model. Input facial RGB image (a). Predicted facial normal maps of Abrevaya et al. [1] (b). 3D reconstructed facial models of DF2Net [2] (c) and FaceVerse [3] (d). Our reconstructed facial results (e). Zoom in to see the details.

[1] Abrevaya V. F., Boukhayma A., Torr P. H. S., et al. Cross-Modal Deep Face Normals with Deactivable Skip Connections. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 4979-4989.

[2] Zeng X., Peng X., Qiao Y. DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 2315-2324.

[3] Wang L., Chen Z., Yu T., et al. FaceVerse: A Fine-Grained and Detail-Controllable 3D Face Morphable Model from a Hybrid Dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 20333-20342.

Part 5: Citation

@article{dong2022geometry,
  title={Geometry-aware Two-scale PIFu Representation for Human Reconstruction},
  author={Dong, Zheng and Xu, Ke and Duan, Ziheng and Bao, Hujun and Xu, Weiwei and Lau, Rynson},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={31130--31144},
  year={2022}
}