I am a researcher at Google DeepMind India, interested in architectural and algorithmic advances for making foundation models efficient (training, inference, model size, etc.) and effective (quality, elastic compute, reasoning, etc.).
In recent research, I addressed various practical challenges in the design of machine learning systems -- robustness, concept drift, cost-efficiency, human-AI interaction, etc. I also worked on cognition-inspired learning systems, including meta-learning, continual learning, and robust vision. In a previous role, I led applied scientist teams at Microsoft Bing Ads, building and supporting large-scale production models of user behavior, including click & conversion prediction, user preference models, and personalization.
I received a Ph.D. from the University of Washington, and have worked in various research capacities at UW, UC San Diego, Microsoft Research, the Fraunhofer Institute, and Lucent Bell Labs.
For updated details, please see my Google Scholar and LinkedIn pages.
Recent papers & news
ICML 2025. Masked Generative Nested Transformers with Decode Time Scaling. S. Goyal, et al.
ICLR Workshops 2025. Universal Model Routing for Efficient LLM Inference. W. Jitkrittum, et al.
Google Research blog posts about our work on spurious features & simplicity bias, and on reweighting for nonstationary learning.
CVPR 2024. Improving Generalization via Meta-Learning on Hard Samples. N. Jain, A.S. Suggala, P. Shenoy.
ICLR 2024. Learning model uncertainty as variance-minimizing instance weights. N. Jain, K. Shanmugham, P. Shenoy.