Mihaela van der Schaar, Hao Sun
University of Cambridge
Tutorial Overview
Large Language Model (LLM) alignment remains one of the most critical challenges in reinforcement learning. As the success of models like DeepSeek-R1 demonstrates, improving alignment requires not only better architectures but also a deeper understanding of reinforcement learning (RL) and reward modeling. This tutorial explores the connection between Inverse Reinforcement Learning (IRL) and LLM alignment, offering a structured roadmap for researchers and practitioners.
We frame LLM alignment as an inverse RL problem, contrasting traditional (forward) RL, which optimizes behavior against a given reward, with inverse methods that infer the reward from human data. A key focus is on reward models: we examine how they are constructed from various data sources, including mathematical reasoning, binary feedback, preference data, and demonstrations.
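As one concrete instance of the preference-data setting mentioned above, the sketch below shows the standard Bradley–Terry pairwise loss commonly used to fit reward models in RLHF. It is a minimal illustration assuming a PyTorch setup; the `PairwiseRewardModel` class and the random "embeddings" are hypothetical stand-ins for an encoded prompt–response pair, not code from the tutorial.

```python
import torch
import torch.nn as nn

class PairwiseRewardModel(nn.Module):
    """Toy scalar reward model r_phi(x) over pooled response embeddings."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)  # one scalar reward per example

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Maximize the log-probability that the chosen response outranks the rejected one:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random embeddings stand in for encoded (prompt, response) pairs.
model = PairwiseRewardModel()
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = bradley_terry_loss(model(chosen), model(rejected))
loss.backward()
```

The same scalar-reward interface also accommodates the other data sources listed above (e.g., binary feedback via a logistic loss, or verifiable math answers via exact-match labels), which is part of what Part 3 unpacks.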
Beyond theory, we delve into infrastructure and practical implementation, showcasing how to efficiently evaluate IRL-based LLM alignment ideas in minutes. We conclude with insights from sparse-reward RL, covering reward shaping, credit assignment, and lessons from self-play.
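To make the sparse-reward vocabulary concrete, the snippet below sketches potential-based reward shaping (Ng et al., 1999), one classical way to densify a sparse reward without changing the optimal policy. The potential function `phi` and the toy "correct reasoning steps" state are assumed examples for illustration only.

```python
from typing import Callable

def shaped_reward(
    reward: float,                 # sparse environment reward r(s, a, s')
    potential: Callable[[dict], float],  # heuristic potential Phi(s); an assumed example
    state: dict,
    next_state: dict,
    gamma: float = 0.99,
) -> float:
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    The shaping term adds dense guidance while leaving the optimal policy
    unchanged, one of the ideas revisited for credit assignment in LLM training.
    """
    return reward + gamma * potential(next_state) - potential(state)

# Toy usage: a potential that rewards partial progress through a reasoning chain.
phi = lambda s: 0.1 * s["steps_correct"]
r_dense = shaped_reward(0.0, phi, {"steps_correct": 2}, {"steps_correct": 3})
```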
By the end of this tutorial, attendees will gain a practical and theoretical understanding of LLM alignment through inverse RL, equipping them with the tools to build better-aligned models efficiently.
Contents
Part 1. Motivations
Breakthroughs in RL x LLMs
Part 2. RL Meets LLMs: Forward and Inverse
RL, MDP; Inverse RL, MDP\R
LLM alignment as Inverse RL
Why do we (always) need RMs?
Part 3. Inverse: Learning Reward Models from Data
Building Reward Models for Chat (RLHF)
Building Reward Models for Math (Reasoning)
Part 4. Forward: LLM Optimization with Reward Models
Optimization Algorithms
Challenges and Opportunities
Part 5. Insights from Sparse-Reward RL Literature
Reward Shaping and Credit Assignment
Wisdom of Hindsight
Dense or Sparse? That is the question
Lessons from self-play
Brief Introduction of Speakers
Professor Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence, and Medicine at the University of Cambridge. In addition to leading the van der Schaar Lab, Mihaela is the founder and director of the Cambridge Centre for AI in Medicine (CCAIM).
Hao Sun is a final-year Ph.D. student at the University of Cambridge, working at the intersection of reinforcement learning and large language models.