Hi, I'm Soubhik, a Ph.D. student at the Max Planck Institute for Intelligent Systems, advised by Michael J. Black. Throughout my research journey, I have collaborated closely with Timo Bolkart and Justus Thies, and during a research internship at Amazon Research I was fortunate to be guided by Javier Romero.
My work centers on the generation and synthesis of digital humans: generative modeling of 3D humans and their clothing, synthesizing human images, and reconstructing 3D shape and pose from monocular images. I am particularly interested in generative models such as GANs and diffusion models, and more broadly in bridging the gap between 2D and 3D human representations to unlock new possibilities in the digital world.
Updates:
One paper accepted at CVPR 2024
I started a Student Researcher position at Google with Thabo Beeler on December 1, 2023.
I have been selected as a Top Reviewer for NeurIPS 2023.
I expect to complete my Ph.D. by the end of 2023 and am currently seeking a research scientist position starting in early 2024.
Selected Projects
Generative 3D Neural Avatars
SCULPT is a novel 3D generative model for clothed and textured human meshes that uses deep neural networks to represent the distribution of geometry and appearance. Since textured 3D mesh datasets are scarce, SCULPT combines medium-sized 3D scan datasets with large-scale 2D image datasets in an unpaired learning procedure, and uses large-scale language models for better disentanglement. The method is validated on the SCULPT dataset and compared to other state-of-the-art 3D generative models for clothed human bodies. CVPR 2024
Code and paper link: SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart
Unconditional Video Generation
We present an efficient video generative model that captures long-term dependencies using a hybrid tri-plane representation and a single latent code, reducing computational complexity by 50%. This approach, enhanced with an optical flow-based GAN module, generates high-fidelity videos at 256 × 256 resolution and 30 fps. Our model's efficacy is validated across multiple datasets. arXiv 2023
Paper link: RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks
Partha Ghosh, Soubhik Sanyal, Cordelia Schmid, Bernhard Schölkopf
Synthesizing Human Images
SPICE is a self-supervised framework that synthesizes images of a person in novel poses from a single image, addressing the challenges and costs of obtaining paired training data. It leverages 3D information about the human body to maintain realism and consistency in generated images. SPICE outperforms previous unsupervised methods, achieving state-of-the-art performance on the DeepFashion dataset, and can generate temporally coherent videos from static images and pose sequences. ICCV 2021 (Oral)
Soubhik Sanyal, Alex Vorobiov, Timo Bolkart, Matt Loper, Betty Mohler, Larry Davis, Javier Romero, Michael J. Black
Reconstructing 3D Pose & Shape
RingNet is a neural network that estimates 3D face shape, pose, and expressions from a single image without 2D-to-3D supervision. It leverages multiple images of an individual and automatically detected 2D face features, using a novel loss function that encourages consistent face shape across images with the same identity. The model achieves expression invariance using the FLAME face representation. RingNet outperforms methods with 3D supervision and is evaluated using a new "not quite in-the-wild" (NoW) database with 3D head scans and high-resolution images. CVPR 2019
Soubhik Sanyal, Timo Bolkart, Haiwen Feng, Michael J. Black
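The shape-consistency idea behind RingNet can be sketched as a triplet-style loss: shape codes predicted from images of the same person should be closer to each other than to a code from a different person. A minimal NumPy sketch (the function name, margin value, and toy shape codes are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def ring_shape_loss(shape_same_a, shape_same_b, shape_diff, margin=0.5):
    """Triplet-style shape consistency: same-identity shape codes
    should be closer to each other than to a different identity.
    (Illustrative sketch; not the paper's exact formulation.)"""
    d_same = np.sum((shape_same_a - shape_same_b) ** 2)
    d_diff = np.sum((shape_same_a - shape_diff) ** 2)
    return max(0.0, d_same - d_diff + margin)

# Toy shape codes (e.g. FLAME shape parameters) predicted from three images.
a = np.array([0.10, 0.20, 0.30])
b = np.array([0.12, 0.19, 0.31])   # same person, different image
c = np.array([0.90, -0.40, 0.50])  # different person
loss = ring_shape_loss(a, b, c)    # zero: the ring is already consistent
```

The loss is zero once same-identity codes are closer than the margin allows, so it only pushes on rings that are inconsistent.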
Deep 3D Geometry Learning
The CoMA model is a versatile 3D face representation method that uses spectral convolutions on mesh surfaces for computer vision and graphics applications. It overcomes the limitations of traditional linear models by employing mesh sampling operations for a hierarchical representation. Trained on 20,466 meshes from 12 subjects, CoMA outperforms state-of-the-art models with 50% lower reconstruction error and 75% fewer parameters, proving its effectiveness in capturing non-linear facial variations. ECCV 2018
Anurag Ranjan, Timo Bolkart, Soubhik Sanyal, Michael J. Black
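The spectral convolution at the heart of CoMA can be sketched with the standard Chebyshev-polynomial graph convolution on a mesh adjacency matrix. This is a minimal NumPy sketch of that general technique, not CoMA's implementation; the toy graph, feature sizes, and the lmax ≈ 2 approximation are assumptions:

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(A.shape[0]) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def cheb_conv(X, L, weights):
    """Chebyshev spectral convolution: sum_k T_k(L~) X W_k, where L~ is the
    Laplacian rescaled to [-1, 1] (approximating lmax = 2)."""
    n = L.shape[0]
    L_tilde = L - np.eye(n)              # 2L/lmax - I with lmax ~= 2
    Tk_prev, Tk = X, L_tilde @ X         # T_0 X and T_1 X
    out = Tk_prev @ weights[0]
    if len(weights) > 1:
        out = out + Tk @ weights[1]
    for W in weights[2:]:                # recurrence T_k = 2 L~ T_{k-1} - T_{k-2}
        Tk_prev, Tk = Tk, 2 * L_tilde @ Tk - Tk_prev
        out = out + Tk @ W
    return out

# Toy 4-vertex "mesh" (a path graph), 3 input and 8 output channels, K = 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = normalized_laplacian(A)
X = np.ones((4, 3))
weights = [0.1 * np.ones((3, 8)) for _ in range(3)]
out = cheb_conv(X, L, weights)           # per-vertex output features, shape (4, 8)
```

A polynomial order K means each output vertex aggregates features from its K-hop neighborhood, which is what makes such filters localized on the mesh surface.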
Face & Object Recognition
The Discriminative Pose-Free Descriptor (DPFD) tackles pose-invariant matching in applications such as face recognition and object matching. Using training examples at representative poses, virtual intermediate pose subspaces are generated; images are projected onto these subspaces, and a discriminative transform combines the projections into a single feature vector (the DPFD) for classification. The approach is validated through extensive experiments on the Multi-PIE and Surveillance Cameras Face datasets, and its generalizability beyond faces is shown by matching objects across viewpoints. ICCV 2015
Soubhik Sanyal, Sivaram Prasad Mudunuri, Soma Biswas
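The project-then-concatenate step of a subspace descriptor can be sketched in a few lines: project a feature vector onto each pose subspace via least squares and stack the projections. This is a toy sketch of the general idea only; the function names, toy bases, and the omission of DPFD's discriminative transform are my assumptions:

```python
import numpy as np

def project_onto_subspace(x, basis):
    """Least-squares projection of x onto the column space of `basis`."""
    coeffs, *_ = np.linalg.lstsq(basis, x, rcond=None)
    return basis @ coeffs

def pose_free_descriptor(x, subspaces):
    """Concatenate projections of x onto each pose subspace into one vector.
    (The discriminative transform applied in DPFD is omitted here.)"""
    return np.concatenate([project_onto_subspace(x, B) for B in subspaces])

# Toy example: a 6-D feature and two 2-D pose subspaces (axis-aligned bases).
x = np.arange(6.0)
B1 = np.eye(6)[:, :2]    # spans coordinates 0-1
B2 = np.eye(6)[:, 2:4]   # spans coordinates 2-3
d = pose_free_descriptor(x, [B1, B2])   # length 12 descriptor
```

Because every image is re-expressed in the same fixed set of subspaces, descriptors from different poses become directly comparable.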
Please check my Google Scholar for a full and updated list of my publications.