Abstract—In this work we propose a novel pipeline for 3D reconstruction of the full human body from egocentric viewpoints. 3D reconstruction of the human body from egocentric viewpoints is challenging because the view is heavily skewed and body parts farther from the cameras are occluded. One such example is the view from cameras mounted below a VR headset. To address this, we first use conditional GANs to translate the egocentric views into third-person full-body views. This increases the comprehensibility of the image and compensates for occlusions. The generated third-person view is then passed through a 3D reconstruction module that produces a 3D mesh of the body. We also train a network that takes the third-person full-body view of the subject and generates texture maps to apply to the mesh. The reconstructed mesh has fairly realistic body proportions and is fully rigged, allowing further applications such as real-time animation and pose transfer in games. This approach can be a key enabler for the emerging domain of mobile human telepresence.
(a) Placement of the cameras on the VR headset: one camera on the front pointing downward captures the front of the body, and another on the back captures the back. (b) The views from the front and back VR cameras; note that they are severely distorted by perspective. (c) The corresponding third-person views of the front and back of the body.
Pipeline for reconstructing a 3D model of the human body from egocentric images. The frames from the egocentric cameras are first passed through an image translation network that converts the egocentric views into third-person full-body views. The translated view is then fed to the 3D reconstruction module, which outputs the shape and pose parameters of the SMPL model, while the texture maps are obtained from the texture generation model. The reconstructed 3D mesh can be animated and viewed from novel viewpoints.
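To make the data flow concrete, the following PyTorch sketch mirrors the pipeline's three stages. ViewTranslator, SMPLRegressor, and all layer choices and tensor sizes are illustrative stand-ins for the paper's modules, not the actual architectures.

# Minimal sketch of the pipeline's data flow, assuming PyTorch.
# ViewTranslator and SMPLRegressor are hypothetical placeholders for the
# paper's translation and reconstruction modules.
import torch
import torch.nn as nn

class ViewTranslator(nn.Module):
    # Stand-in for the conditional-GAN generator that maps an
    # egocentric view to a third-person full-body view.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, ego_view):
        return self.net(ego_view)

class SMPLRegressor(nn.Module):
    # Stand-in for the reconstruction module that regresses
    # SMPL pose (72-D) and shape (10-D) parameters from the image.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 72 + 10)
    def forward(self, img):
        params = self.head(self.backbone(img))
        return params[:, :72], params[:, 72:]   # pose, shape

ego_front = torch.randn(1, 3, 256, 256)          # egocentric headset frame
third_person = ViewTranslator()(ego_front)       # step 1: view translation
pose, shape = SMPLRegressor()(third_person)      # step 2: SMPL parameters
# step 3 (not shown): a texture network maps third_person to a UV texture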
Results
View Translation
Texture Generation
Third Person View to Reconstructed Mesh
Novel Poses and Viewpoints
The generated texture is applied to the reconstructed mesh. The mesh is then transformed into new poses and shown from novel viewpoints.
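As a rough illustration of the reposing step, the following sketch uses the open-source smplx package to regenerate the mesh under new pose parameters; the model path and the modified joint index are illustrative assumptions.

# Minimal reposing sketch using the smplx package (pip install smplx);
# assumes SMPL model files are available locally.
import torch
import smplx

model = smplx.create('models/', model_type='smpl')  # path is illustrative

betas = torch.zeros(1, 10)       # shape parameters from the regressor
new_pose = torch.zeros(1, 69)    # 23 body joints x 3 axis-angle values
new_pose[0, 51] = -1.0           # bend one joint (illustrative index)

output = model(betas=betas, body_pose=new_pose,
               global_orient=torch.zeros(1, 3))
vertices = output.vertices       # 1 x 6890 x 3 reposed mesh vertices
# Rendering from a novel viewpoint then amounts to placing a camera
# around these vertices with the generated UV texture applied.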
Advantages of Using the SMPL + Texture Module Instead of PIFu by Saito et al.
The mesh generated from the SMPL model is compared with the mesh generated by PIFu. PIFu produces a distorted body mesh because of occlusions by the hand, whereas mesh recovery with the SMPL model gives fairly accurate results. Since the SMPL approach uses a rigged 3D model, the mesh can be animated later, for example when imported into a virtual reality game; likewise, if the pose can be extracted from the VR cameras, the 3D model can simply be transformed with the new pose. This is not possible with the static OBJ file that PIFu outputs.
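The distinction can be sketched as follows, again assuming the smplx package and using illustrative file paths: the SMPL mesh is regenerated from any new pose vector, while a PIFu OBJ admits only rigid transforms of its fixed vertices.

# Contrast of pose transfer on the two representations; paths and
# pose values are illustrative, not from the paper's experiments.
import torch
import smplx
import trimesh

smpl = smplx.create('models/', model_type='smpl')

# SMPL: feeding new pose parameters regenerates the full mesh, so a pose
# extracted elsewhere (e.g., from the VR cameras) transfers directly.
pose_a = torch.zeros(1, 69)
pose_b = torch.rand(1, 69) * 0.3
verts_a = smpl(body_pose=pose_a).vertices
verts_b = smpl(body_pose=pose_b).vertices    # same identity, new pose

# PIFu: the output is a static triangle mesh with no skeleton to drive,
# so the vertices can only be transformed rigidly as a whole.
pifu_mesh = trimesh.load('pifu_output.obj')   # illustrative path
pifu_mesh.apply_translation([0.0, 0.0, 1.0])  # rigid moves only, no reposing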