I am a PhD student at MILA, registered at University Laval, working with Prof. Pascal Germain and Prof. Cem Subakan. In previous avatars, I have been a Research Scientist at Descript, a Machine Learning Engineer at Stripe, and a Software Engineer at Snapchat and Google. I completed my Master's in Mathematics at the University of Waterloo, in the Department of Combinatorics and Optimization, and my Bachelor's in Technology at the Indian Institute of Technology, Delhi.
My research focuses on Interpretable Machine Learning and Generative AI. Currently, I am working on Interpretable ML methods for distinguishing fake, computer-generated audio from real-world audio. I am also working on using advances in Large Language Modeling to produce more interpretable and controllable music recommendation systems. In the past, I have worked on approximation algorithms for network design problems.
In my spare time, I like to sketch, paint, go on bike rides, and play the tabla.
The best way to reach me is to send an email to shgup1 (at) ulaval (dot) ca.
S. Gupta, Z. Li, T. Chen, C. Subakan, S. Reddy, P. Taslakian, V. Zantedeschi, ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval (Submitted to ICML, 2025) https://www.arxiv.org/abs/2502.07971
S. Gupta, T. Durand, G. Taylor, L. W. Białokozowicz, LAST SToP For Modeling Asynchronous Time Series (Submitted to ICML, 2025) https://arxiv.org/abs/2502.01922
S. Gupta, M. Ravanelli, P. Germain, C. Subakan, Phoneme Discretized Saliency Maps for Explainable Detection of AI Generated Voice, Interspeech, 2024, https://arxiv.org/abs/2406.10422
S. Gupta, I. Gomez-Sarmiento, F. Mezdari, M. Ravanelli, C. Subakan, Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming, ANNPR, 2024, https://arxiv.org/pdf/2410.05455
A. Aggarwal, A. Louis, M. Bansal, N. Garg, N. Gupta, S. Gupta, S. Jain,* A 3-approximation algorithm for the facility location problem with uniform capacities. Math. Program. 141, 527–547 (2013). https://doi.org/10.1007/s10107-012-0565-4 (*equal contribution)
A. Aggarwal, A. Louis, M. Bansal, N. Garg, N. Gupta, S. Gupta, S. Jain,* A 3-approximation algorithm for the facility location problem with uniform capacities. IPCO'10: Proceedings of the 14th International Conference on Integer Programming and Combinatorial Optimization, June 2010, pages 149–162. https://doi.org/10.1007/978-3-642-13036-6_12 (*equal contribution)
MILA Affiliate, University Laval, PhD in Computer Science, (Sept 2023 - Present)
Research Interests: Generative AI, Interpretability & Explainability, Unsupervised & Semisupervised Learning
University of Waterloo, MMath in Combinatorics and Optimization, (2009-2011)
Thesis: Building Networks in the Face of Uncertainty
Indian Institute of Technology, Delhi, Bachelor's in Technology, Computer Science and Engineering, (2005-2009)
Thesis: Approximation Algorithms for Facility Location Problems.
DESCRIPT, Senior Applied Research Scientist, 2021 – 2023
Performed research in deep learning, self-supervised representation learning, and audio synthesis to develop a highly accurate voice verification system (VVS), a zero-shot text-to-speech (TTS) aligner, and zero-shot text-to-speech synthesis.
Voice Verification System - Independently researched and developed a state-of-the-art Voice Verification System (VVS) with 99% speaker verification accuracy, a 48% improvement over the accuracy of the team's previous system.
Zero Shot Text-to-Speech Synthesis - Collaborated with a team to develop a high-quality speech synthesis system with long-term consistency.
Produced a TTS system that maintains the original speaker's voice, prosody, and recording conditions when conditioned on a prefix of only a few seconds of speech from a speaker unseen during training.
Identified transformer bottlenecks in the pipeline and replaced them by self-attention layers with lightweight convolutions
Zero Shot Text-to-Speech Aligner - Researched, implemented, and optimized a zero-shot IPA-to-phoneme aligner (no speaker tuning, trainable for multiple languages).
Conducted an extensive literature review on TTS alignment and achieved a mean word-boundary alignment error of 23 ms on an English golden set, a 77% improvement over the 100 ms for rev alignments, leading to significant improvements in TTS audio quality metrics.
Proposed innovative performance metrics to enable rapid model research, meaningful architecture extensions, and more effective training.
STRIPE, Senior Machine Learning Engineer, 2020 – 2021
Merchant Fraud Team - Designed new models and improved existing ones to catch fraud among new merchant sign-ups.
Improved the existing anomaly detection framework for catching fraud attacks, reducing MTTD (mean time to detect) by 72% while increasing the number of attacks caught by 45%.
Led a working group that combined text embeddings from models such as fastText, BERT, and its variants with traditional NLP techniques such as count and TF-IDF vectorizers to add text-based features to our merchant fraud classification models, resulting in a 5% improvement in classification ROC AUC.
RETENTION SCIENCE, Senior Research Scientist, 2018 – 2020
Prototyped and productionized generalized ML models that enabled 145 e-commerce companies to run effective email marketing campaigns.
Led the Data Science team responsible for more than 30 predictive ML models, including recommendation systems, lead scoring, churn prediction, LTV/CLV modeling, and real-time multi-armed bandit testing.
These models powered 4B+ predictions daily for more than 385M users across 145 e-commerce clients covering many industries including fashion, food, books, and others
SNAPCHAT, Senior Software Engineer, 2014 – 2018
Smart Content Precaching - Led a team that leveraged ML techniques to create and optimize personalized content preloading strategies for users.
This resulted in a 100% improvement in the utilization of precached content.
It also resulted in a 50% reduction in loading screens, 7% growth in content monthly active users (MAU), and a 5% increase in content engagement.
GOOGLE, Software Engineer, 2011 – 2014
Knowledge Graph (KG) - Made significant improvements to the extraction of facts for KG and reduced reconciliation losses
Designed and implemented extraction of entries not explicitly linked on Wikipedia, resulting in 18% more extractions
Implemented heuristics that increased the recall of the above pipeline tenfold at similar precision
Co-organizer for Workshop on explainable machine learning for speech and audio, ICASSP 2023
Reviewer for Integer Programming and Combinatorial Optimization (IPCO), 2011
Attribution methods for audio classification tasks and challenges (MILA, 2023)
Universal text-to-speech aligner (Descript, 2022)
Building recommendation systems using word2vec (Retention Science, 2019)
A 3-approximation for facility location with uniform capacities (University of Waterloo, 2010)
Approximation algorithms for facility location problems (IIT Delhi, 2009)