Date: May 11, 2026
Speaker: Hamsa Sridhar Bastani
University of Pennsylvania
Data-driven decision-making relies on credible policy evaluation: we need to know whether a learned policy truly improves outcomes. This talk examines a key failure mode—the winner’s curse—where policy optimization exploits prediction error and selection, producing optimistic, often spurious performance gains.
First, we show that model-based policy optimization and evaluation can report large, stable improvements even when common “reassurances” from the literature hold: training data come from randomized trials, estimated gains are large, and predictive models are accurate, well-calibrated, and stable. We give theoretical constructions where true improvements are zero yet predicted gains are substantial. We illustrate these pitfalls in a simulation study inspired by refugee matching, where widely used model-based evaluation projects employment gains of over 60% even when the ground-truth effect is zero.
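For intuition, here is a minimal, hypothetical sketch of the mechanism (a stylized toy, not the constructions or the refugee-matching study from the talk): every true effect is zero, the predictive model is unbiased but noisy, and evaluating the optimized policy with that same model reports a large spurious gain.

```python
# Minimal winner's-curse sketch; all names and numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_individuals, n_arms = 10_000, 5

# Ground truth: every arm has zero effect for every individual.
true_effects = np.zeros((n_individuals, n_arms))

# An unbiased but noisy predictive model (e.g., finite-sample estimation error).
predicted_effects = true_effects + rng.normal(0.0, 1.0, size=(n_individuals, n_arms))

# Policy optimization: assign each individual the arm with the best prediction.
policy = predicted_effects.argmax(axis=1)
rows = np.arange(n_individuals)

# Model-based evaluation reuses the same predictions, so selection noise
# masquerades as improvement.
print("model-based gain:", predicted_effects[rows, policy].mean())  # ~1.16
print("true gain:       ", true_effects[rows, policy].mean())       # exactly 0.0
```

Here the selected predictions average roughly E[max of 5 N(0,1)] ≈ 1.16 even though every true effect is zero: the optimization step alone manufactures the apparent gain.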
Second, we argue that avoiding this optimism pushes us toward model-free off-policy evaluation, but its variance can be prohibitive, making naïve “optimize then evaluate” pipelines unreliable. To address this, we introduce inference-aware policy optimization, which anticipates downstream model-free evaluation by optimizing both estimated performance and the probability that the estimated improvement will pass a significance test on held-out data. We characterize the Pareto frontier of this tradeoff and provide an algorithm to estimate it, enabling policies that are not only promising but also testable.
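The following is a hedged sketch of that tradeoff; the scalarized objective, function names, and candidate numbers are illustrative assumptions, not the algorithm from the talk. Given each candidate policy's estimated improvement and the standard error of its held-out model-free estimate, it scores policies by a weighted combination of estimated gain and the predicted power of a one-sided z-test, sweeping the weight to trace points of the frontier.

```python
# Illustrative inference-aware scoring; the scalarization is an assumption.
from scipy.stats import norm

Z_ALPHA = norm.ppf(0.95)  # critical value for a one-sided 5% test on held-out data

def predicted_power(est_gain, est_se):
    """Approximate probability that the held-out estimate clears the test."""
    return norm.cdf(est_gain / est_se - Z_ALPHA)

def inference_aware_score(est_gain, est_se, lam):
    """Weighted tradeoff between estimated gain and testability (0 <= lam <= 1)."""
    return (1.0 - lam) * est_gain + lam * predicted_power(est_gain, est_se)

# Hypothetical candidates: (estimated improvement, std. error of its off-policy estimate).
candidates = [(0.30, 0.25), (0.15, 0.05), (0.05, 0.01)]
for lam in (0.0, 0.5, 1.0):
    gain, se = max(candidates, key=lambda c: inference_aware_score(c[0], c[1], lam))
    print(f"lam={lam:.1f}: pick gain={gain:.2f}, se={se:.2f}, "
          f"power={predicted_power(gain, se):.2f}")
```

Sweeping lam moves the selection from the largest estimated gain, which a noisy evaluation would fail to certify about two-thirds of the time in this toy, toward smaller gains that are far more likely to pass the test.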
Hamsa Sridhar Bastani is an Associate Professor in the Operations, Information and Decisions Department and the Statistics and Data Science Department at the Wharton School of the University of Pennsylvania, where she co-directs the Wharton Healthcare Analytics Lab. Her research develops machine learning methods for learning and optimization, particularly reinforcement learning and human–AI collaboration, with applications in healthcare, public policy, and education. Her work has appeared in outlets including Nature, PNAS, Management Science, and Operations Research, and has been recognized with honors including the Wagner Prize and the INFORMS Pierskalla Award.
Date: May 18, 2026
Speaker:
Carnegie Mellon University
TBD
TBD
Date: June 1, 2026
Speaker: Benjamin Van Roy
Stanford University
Because human preferences are too complex to codify, AIs operate with misspecified objectives. Optimizing such objectives often produces undesirable outcomes, a phenomenon known as reward hacking. Such outcomes are not necessarily catastrophic: most examples of reward hacking in the prior literature are benign, and objectives can typically be modified to resolve the issue.
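As a toy illustration of this spectrum (a Goodhart-style simulation assumed here for exposition, not the talk's formalization): select candidates by a misspecified proxy and increase the optimization pressure; the proxy score explodes while the true value it was meant to track does not improve.

```python
# Toy Goodhart/reward-hacking sketch; the setup is illustrative, not from the talk.
import numpy as np

rng = np.random.default_rng(0)

def best_of_n(n, trials=2000):
    """Select the best of n candidates by proxy; report proxy and true value of the pick."""
    true_vals = rng.normal(0.0, 1.0, size=(trials, n))         # what we actually care about
    proxy_vals = true_vals + rng.standard_cauchy((trials, n))  # heavy-tailed misspecification
    picked = proxy_vals.argmax(axis=1)
    rows = np.arange(trials)
    # Median for the proxy: the Cauchy max is too heavy-tailed for a stable mean.
    return np.median(proxy_vals[rows, picked]), true_vals[rows, picked].mean()

for n in (2, 10, 100, 1000):
    proxy_score, true_score = best_of_n(n)
    print(f"n={n:4d}: proxy of pick ~ {proxy_score:8.1f}, true value of pick ~ {true_score:5.2f}")
```

As n grows, the proxy score of the pick climbs roughly linearly while its true value stays near chance: a benign analogue of the capability-dependence that the talk pushes to its catastrophic extreme.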
We study the prospect of catastrophic outcomes induced by AIs operating in complex environments. We argue that, when capabilities are sufficiently advanced, pursuing a fixed consequentialist objective tends to result in catastrophic outcomes. We formalize this by establishing conditions that provably lead to such outcomes. Under these conditions, simple or random behavior is safe. Catastrophic risk arises due to extraordinary competence rather than incompetence.
With a fixed consequentialist objective, avoiding catastrophe requires constraining AI capabilities. In fact, constraining capabilities the right amount not only averts catastrophe but yields valuable outcomes. Our results apply to any objective produced by modern industrial AI development pipelines.
Benjamin Van Roy is a Professor at Stanford University, where he has served on the faculty since 1998. His research focuses on reinforcement learning and alignment. Beyond academia, he founded the Efficient Agent Team at DeepMind (acquired by Google) and Enuvis (acquired by SiRF/Qualcomm). He has also led research programs at Morgan Stanley and Unica (acquired by IBM). He received the SB in Computer Science and Engineering and the SM and PhD in Electrical Engineering and Computer Science, all from MIT, where his doctoral research was advised by John N. Tsitsiklis.
He is a Fellow of INFORMS and IEEE and has served on the editorial boards of Machine Learning, Mathematics of Operations Research (as Learning Theory Area Editor), Operations Research (as Financial Engineering Area Editor), the INFORMS Journal on Optimization, and Foundations and Trends in Machine Learning. His honors include the MIT George C. Newton Undergraduate Laboratory Project Award, the MIT Morris J. Levin Memorial Master's Thesis Award, the MIT George M. Sprowls Doctoral Dissertation Award, the National Science Foundation CAREER Award, the Stanford Tau Beta Pi Award for Excellence in Undergraduate Teaching, the Management Science and Engineering Department's Graduate Teaching Award, the INFORMS Frederick W. Lanchester Prize, and the INFORMS Philip McCord Morse Lectureship Award.
He has graduated dozens of doctoral students, who have gone on to careers in academia (Carnegie Mellon, Columbia, Cornell, MIT, Northwestern, Rice, Stanford, USC), technology (Adobe, Amazon, DeepMind, Meta, Microsoft, Netflix, OpenAI, Spotify, Tesla, xAI), and finance (Citadel, DE Shaw, Goldman Sachs, Jane Street, Morgan Stanley, Two Sigma).