Speaker: Jongha Ryu (MIT)
Title: Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing
Paper: https://arxiv.org/abs/2502.10826
Slides: TBD
The recording will be uploaded here after the event.
Authors: Jongha Ryu, Jeongyeol Kwon, Benjamin Koppe, Kwang-Sung Jun
Abstract: I will present two improved algorithms for off-policy selection and learning in contextual bandits, where the goal is to identify a reward-maximizing policy using offline data collected by a fixed behavior policy.
For off-policy selection, I will introduce a pessimism-based algorithm that uses a new lower confidence bound derived from a betting framework with Cover’s universal portfolio. The resulting guarantee is variance-adaptive and strictly sharper than existing bounds.
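To make the betting idea concrete, below is a minimal numerical sketch (not the paper's algorithm): it computes a test-martingale lower confidence bound on a policy's value from nonnegative importance-weighted rewards, approximating the universal-portfolio mixture by averaging the wealth process over a finite grid of betting fractions. The function name betting_lcb, the grid sizes, and the toy logged-data example are all illustrative choices, not taken from the paper.

    import numpy as np

    def betting_lcb(x, alpha=0.05, n_lambda=100, n_grid=200):
        """Conservative (1 - alpha) lower confidence bound on the mean of
        nonnegative observations x (e.g., importance-weighted rewards w_i * r_i).
        For each candidate mean m, a bettor wagers against the null
        "true mean <= m" and the null is rejected once the bettor's wealth
        exceeds 1/alpha (Ville's inequality).  The universal-portfolio idea is
        approximated by a uniform mixture over a grid of betting fractions.
        Illustrative sketch only, not the paper's exact algorithm."""
        x = np.asarray(x, dtype=float)
        lcb = 0.0
        for m in np.linspace(1e-3, 1.0, n_grid):        # candidate means
            # betting fractions in [0, 1/m) keep every wealth factor positive
            lams = np.linspace(0.0, 0.99 / m, n_lambda)
            # log-wealth of each constant-fraction bettor: sum_i log(1 + lam*(x_i - m))
            log_wealth = np.log1p(np.outer(x - m, lams)).sum(axis=0)
            # log of the portfolio (uniform-mixture) wealth
            log_mix = np.logaddexp.reduce(log_wealth) - np.log(n_lambda)
            if log_mix >= np.log(1.0 / alpha):
                lcb = m          # m is rejected, so the mean is plausibly above m
            else:
                break            # mixture wealth decreases in m; stop at first accepted m
        return lcb

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, n_actions = 2000, 5
        # toy logged data collected by a uniform behavior policy
        actions = rng.integers(n_actions, size=n)
        rewards = rng.binomial(1, 0.2 + 0.1 * actions)        # Bernoulli rewards
        mu_probs = np.full(n, 1.0 / n_actions)                # behavior probabilities
        # pessimistic off-policy selection: pick the candidate policy with the
        # largest lower confidence bound on its estimated value
        for k in range(n_actions):
            pi_probs = (actions == k).astype(float)           # "always play action k"
            lcb = betting_lcb(pi_probs / mu_probs * rewards)
            print(f"policy {k}: LCB on value = {lcb:.3f}")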
For off-policy learning, I will describe a general condition on optimization objectives that reveals a different bias-variance tradeoff. A special instance, freezing, sharply reduces variance in small-data regimes while matching the best existing guarantees.
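The abstract does not spell out the freezing objective, so the sketch below only illustrates the generic bias-variance tradeoff in importance-weighted off-policy learning, using weight clipping as a crude, clearly labeled stand-in for a variance-reducing modification of the objective; the paper's freezing condition is a different, specific instance.

    import numpy as np

    def ips_value(pi_probs, mu_probs, rewards):
        """Plain importance-weighted (IPS) value estimate: unbiased when the
        behavior policy has full support, but high-variance when mu_probs are small."""
        return np.mean(pi_probs / mu_probs * rewards)

    def clipped_value(pi_probs, mu_probs, rewards, tau=10.0):
        """Variance-reduced surrogate obtained by capping importance weights at tau.
        Capping is only an illustrative stand-in here: for nonnegative rewards it
        biases the estimate downward but shrinks its variance, which is the kind of
        tradeoff the talk's general condition on objectives makes precise.  The
        paper's freezing modification is not the same as weight clipping."""
        w = np.minimum(pi_probs / mu_probs, tau)
        return np.mean(w * rewards)

Off-policy learning then maximizes such a surrogate over a policy class; in small-data regimes a lower-variance, slightly biased surrogate can dominate the unbiased one, which is the regime where freezing is reported to help.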
I will present experiments showing that the new selection method consistently outperforms previous approaches and that freezing yields substantial gains when data are scarce.
Speaker Bio: Jongha (Jon) Ryu is a postdoctoral associate at MIT EECS. He received his Ph.D. in Electrical and Computer Engineering from UC San Diego. His research focuses on statistical and mathematical foundations of scientific machine learning, with recent work on neural spectral methods, score-based generative modeling, and off-policy and sequential decision-making problems.