Hi, I'm Soubhik, a Research Scientist at GenAI, Meta AI.
I recently joined Meta AI as a Research Scientist in GenAI. Before that, I completed my PhD under the supervision of Michael J. Black, focusing on building generative models for digital humans in 2D and 3D. Throughout my research journey, I had the opportunity to collaborate closely with Timo Bolkart and Justus Thies. I also did two research internships during my PhD (roughly two years in total): one with Google AR hosted by Thabo Beeler, and the other with Amazon Research mentored by Javier Romero.
Updates:
I joined the GenAI org at Meta AI as a Research Scientist in November 2024
I successfully defended my PhD thesis in September 2024
One paper accepted at CVPR 2024
I started a Student Researcher position at Google with Thabo Beeler on December 1, 2023
I have been selected as a Top Reviewer for NeurIPS 2023.
Selected Projects
Generative 3D Neural Avatars
SCULPT is a novel 3D generative model for creating clothed and textured human meshes, using deep neural networks to represent the distribution of geometry and appearance. Because textured 3D mesh datasets are scarce, SCULPT leverages medium-sized 3D scan datasets together with large-scale 2D image datasets in an unpaired learning procedure, and additionally uses large-scale language models for better disentanglement. The method is validated on the SCULPT dataset and compared to other state-of-the-art 3D generative models for clothed human bodies. CVPR 2024
Code and paper link: SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart
Unconditional Video Generation
We present an efficient video generative model that captures long-term dependencies using a hybrid tri-plane representation and a single latent code, reducing computational complexity by 50%. This approach, enhanced with an optical flow-based GAN module, generates high-fidelity videos at 256 × 256 resolution and 30 fps. Our model's efficacy is validated across multiple datasets. arXiv 2023
Paper link: RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks
Partha Ghosh*, Soubhik Sanyal*, Cordelia Schmid, Bernhard Schölkopf (* joint first authors)
Synthesizing Human Images
SPICE is a self-supervised framework that synthesizes images of a person in novel poses from a single image, addressing the challenges and costs of obtaining paired training data. It leverages 3D information about the human body to maintain realism and consistency in generated images. SPICE outperforms previous unsupervised methods, achieving state-of-the-art performance on the DeepFashion dataset, and can generate temporally coherent videos from static images and pose sequences. ICCV 2021 (Oral)
Soubhik Sanyal, Alex Vorobiov, Timo Bolkart, Matt Loper, Betty Mohler, Larry Davis, Javier Romero, Michael J. Black
Reconstructing 3D Pose & Shape
RingNet is a neural network that estimates 3D face shape, pose, and expressions from a single image without 2D-to-3D supervision. It leverages multiple images of an individual and automatically detected 2D face features, using a novel loss function that encourages consistent face shape across images of the same identity. The model achieves expression invariance using the FLAME face representation. RingNet outperforms methods with 3D supervision and is evaluated using a new "not quite in-the-wild" (NoW) database with 3D head scans and high-resolution images. CVPR 2019
Soubhik Sanyal, Timo Bolkart, Haiwen Feng, Michael J. Black
Deep 3D Geometry Learning
The CoMA model is a versatile 3D face representation method that uses spectral convolutions on mesh surfaces for computer vision and graphics applications. It overcomes the limitations of traditional linear models by employing mesh sampling operations for a hierarchical representation. Trained on 20,466 meshes from 12 subjects, CoMA outperforms state-of-the-art models with 50% lower reconstruction error and 75% fewer parameters, proving its effectiveness in capturing non-linear facial variations. ECCV 2018
Anurag Ranjan, Timo Bolkart, Soubhik Sanyal, Michael J. Black
Face & Object Recognition
Discriminative Pose-Free Descriptor (DPFD) tackles pose invariant matching in applications like face recognition and object matching. By using training examples at representative poses, virtual intermediate pose subspaces are generated. Images are then projected onto these subspaces, and a discriminative transform is applied to create a single feature vector (DPFD) for classification. The effectiveness of this approach is demonstrated through extensive experiments on the Multi-PIE and Surveillance Cameras Face datasets and its generalizability beyond faces is shown through experiments on matching objects across viewpoints. ICCV 2015
Soubhik Sanyal, Sivaram Prasad Mudunuri, Soma Biswas
Please check my Google Scholar for a full and updated list of my publications.