Speaker: Daniel Russo (Columbia University)
Title: Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
Speaker: Zakaria Mhammedi (Google Research)
Title: Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration
Speaker: Noah Golowich (Microsoft Research)
Title: Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference
Speaker: Andrew Wagenmaker (Microsoft Research)
Title: Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning