Shubham Gupta

About Me

I am a PhD student at MILA, registered with University Laval, working with Prof. Pascal Germain and Prof. Cem Subakan. In previous avatars, I have been a Research Scientist at Descript, a Machine Learning Engineer at Stripe, and a Software Engineer at Snapchat and Google. I completed my Master's in Mathematics at the University of Waterloo in the Department of Combinatorics and Optimization, and my Bachelor of Technology at the Indian Institute of Technology, Delhi.

My research focuses on interpretable machine learning and generative AI. Currently, I am working on interpretable ML methods for distinguishing fake, computer-generated audio from real-world audio. I am also working on using advances in large language modeling to build more interpretable and controllable music recommendation systems. In the past, I have worked on approximation algorithms for network design problems.

In my spare time, I like to sketch, paint, go on bike rides, and play the tabla.

The best way to reach me is to send an email to shgup1 (at) ulaval (dot) ca.

Publications

S. Gupta, Z. Li, T. Chen, C. Subakan, S. Reddy, P. Taslakian, V. Zantedeschi, ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval (Submitted to ICML, 2025) https://www.arxiv.org/abs/2502.07971
S. Gupta, T. Durand, G. Taylor, L. W. Białokozowicz, LAST SToP For Modeling Asynchronous Time Series (Submitted to ICML, 2025) https://arxiv.org/abs/2502.01922
S. Gupta, M. Ravanelli, P. Germain, C. Subakan, Phoneme Discretized Saliency Maps for Explainable Detection of AI Generated Voice, Interspeech, 2024, https://arxiv.org/abs/2406.10422
S. Gupta, I. Gomez-Sarmiento, F. Mezdari, M. Ravanelli, and C. Subakan, Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming, ANNPR, 2024, https://arxiv.org/pdf/2410.05455
A. Aggarwal, A. Louis, M. Bansal, N. Garg, N. Gupta, S. Gupta, S. Jain,* A 3-approximation algorithm for the facility location problem with uniform capacities. Math. Program. 141, 527–547 (2013). https://doi.org/10.1007/s10107-012-0565-4 (*equal contribution)
A. Aggarwal, A. Louis, M. Bansal, N. Garg, N. Gupta, S. Gupta, S. Jain,* A 3-approximation algorithm for the facility location problem with uniform capacities. IPCO'10: Proceedings of the 14th International Conference on Integer Programming and Combinatorial Optimization, June 2010, pages 149–162. https://doi.org/10.1007/978-3-642-13036-6_12 (*equal contribution)

Education

MILA Affiliate, University Laval, PhD in Computer Science, (Sept 2023 - Present)
Research Interests: Generative AI, Interpretability & Explainability, Unsupervised & Semisupervised Learning
University of Waterloo, MMath in Combinatorics and Optimization, (2009-2011)
Thesis: Building Networks in the Face of Uncertainty
Indian Institute of Technology, Delhi, Bachelor of Technology, Computer Science and Engineering, (2005-2009)
Thesis: Approximation Algorithms for Facility Location Problems.

Work Experience

DESCRIPT, Senior Applied Research Scientist, 2021 – 2023

Performed research in deep learning, self-supervised representation learning, and audio synthesis to develop a highly accurate voice verification system (VVS), a zero-shot text-to-speech (TTS) aligner, and a zero-shot text-to-speech synthesis system.

Voice Verification System - Independently researched and developed a state-of-the-art Voice Verification System (VVS) with 99% speaker verification accuracy, a 48% improvement over the accuracy of the system previously used by the team.

Zero Shot Text-to-Speech Synthesis - Collaborated with a team to develop a high-quality speech synthesis system with long-term consistency

Produced a TTS system that maintains the original speaker's voice, prosody, and recording conditions when conditioned on a prefix of only a few seconds of speech from a speaker unseen during training
Identified transformer bottlenecks in the pipeline and replaced them with self-attention layers with lightweight convolutions

Zero Shot Text-to-Speech Aligner - Researched, implemented, and optimized a zero-shot IPA-to-phoneme aligner (no speaker tuning, trainable for multiple languages)

Conducted an extensive literature review on TTS alignment and achieved a mean word boundary alignment error of 23 ms on an English golden set, a 77% improvement over the 100 ms error of rev alignments, leading to significant improvements in TTS audio quality metrics.
Proposed innovative performance metrics to enable rapid model research, meaningful architecture extensions, and more effective training.

STRIPE, Senior Machine Learning Engineer, 2020 – 2021

Merchant Fraud Team - Designed new models and improved existing ones to catch fraud among new merchant sign-ups

Improved existing Anomaly Detection framework for catching fraud attacks, reducing MTTD (mean time to detect) by 72%, while increasing the number of attacks caught by 45%.
Led a working group that combined text embeddings from models such as fastText, BERT, and its variants with traditional NLP techniques such as count and tf-idf vectorizers to add text-based features to our merchant fraud classification models, resulting in a 5% improvement in classification ROC AUC.

RETENTION SCIENCE, Senior Research Scientist, 2018 – 2020

Prototyped and productionized generalized ML models that enabled 145 e-commerce companies to run effective email marketing campaigns

Led the Data Science team responsible for more than 30 predictive ML models, including recommendation systems, lead scoring, churn prediction, LTV/CLV modeling, and real-time multi-armed bandit testing
These models powered 4B+ predictions daily for more than 385M users across 145 e-commerce clients covering many industries including fashion, food, books, and others

SNAPCHAT, Senior Software Engineer, 2014 – 2018

Smart Content Precaching - Led a team that leveraged ML techniques to create and optimize personalized content preloading strategies for users.

This resulted in a 100% improvement in utilization of precached content.
It also resulted in a 50% reduction in loading screens, 7% growth in content monthly active users (MAU), and a 5% increase in content engagement.

GOOGLE, Software Engineer, 2011 – 2014

Knowledge Graph (KG) - Made significant improvements to the extraction of facts for KG and reduced reconciliation losses

Designed and implemented extraction of entries not explicitly linked on Wikipedia, resulting in 18% more extractions
Implemented heuristics that increased the recall of the above pipeline tenfold at similar precision

Professional Services

Co-organizer of the Workshop on Explainable Machine Learning for Speech and Audio, ICASSP 2023

Reviewer for Integer Programming and Combinatorial Optimization (IPCO), 2011

Selected Talks

Attribution methods for audio classification tasks and challenges (MILA, 2023)
Universal text-to-speech aligner (Descript, 2022)
Building recommendation systems using word2vec (Retention Science, 2019)
A 3-approximation for facility location with uniform capacities (University of Waterloo, 2010)
Approximation algorithms for facility location problems (IIT Delhi, 2009)
