Speaker: Noah Golowich (Microsoft Research)
Title: Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference
Paper: https://arxiv.org/abs/2603.07887
Slides: here
The recording will be uploaded here after the event.
Authors: Noah Golowich, Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, Akshay Krishnamurthy
Abstract: Efficiently sampling from a complex probability distribution is a fundamental problem across machine learning and theoretical computer science. It has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from large language models (LLMs) have been proposed to solve challenging reasoning problems spanning domains such as mathematics and coding. For the most part, however, we lack a principled understanding of the accuracy-cost tradeoffs for such procedures. In this talk, we propose a formalization for such tasks as the problem of producing a sample from a target probability measure, given an oracle which yields approximate density estimates for the target measure. Depending on the context, this oracle may be interpreted as an approximate verifier or a *process reward model* for a particular language modeling task. This setup is closely related to the problem of reducing sampling to approximate counting studied in seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989).
Generalizing results from existing literature, we establish provable guarantees for the Sequential Monte Carlo algorithm and related particle filtering approaches, which have recently found success empirically in the context of both language modeling and diffusion. In particular, our theory identifies a few properties of the oracle which suffice for efficient sampling. We conduct experiments to show that these properties indeed correlate with sampling performance for certain language modeling tasks.
The efficacy of such sampling algorithms, however, is limited by the relationship between the underlying LLM and the particular sampling task at hand, which has motivated the framework of Test-Time Training (TTT). In particular, TTT updates a model's weights in response to partial generations and reward feedback received at inference time. In the latter half of the talk, we will discuss some provable benefits of TTT in the context of our sampling framework. Based on https://arxiv.org/pdf/2603.07887 (joint work with Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, and Akshay Krishnamurthy); and https://arxiv.org/pdf/2606.11437 (joint work with Ankur Moitra and Dhruv Rohatgi).
Speaker Bio: Noah Golowich is a postdoctoral researcher at Microsoft Research, NYC. In 2026, he will join the computer science department at UT Austin as an Assistant Professor. He completed his PhD at MIT, where he was advised by Constantinos Daskalakis and Ankur Moitra. He was a recipient of the 2025 AAAI/ACM SIGAI Doctoral Dissertation Award, the 2025 SIGecom Doctoral Dissertation Award, and the 2026 EATCS Doctoral Dissertation Award.
His research focuses broadly on the theoretical foundations of modern AI. He is particularly interested in the role that computational constraints play in shaping our current and future toolkit of algorithms for machine learning and AI.