Mihaela van der Schaar, Hao Sun
University of Cambridge
Tutorial Overview
Large Language Model (LLM) alignment remains one of the most critical challenges in reinforcement learning. As the success of models like DeepSeek-R1 demonstrates, improving alignment requires not only better architectures but also a deeper understanding of reinforcement learning (RL) and reward modeling. This tutorial explores the connection between Inverse Reinforcement Learning (IRL) and LLM alignment, offering a structured roadmap for researchers and practitioners.
We frame LLM alignment as an inverse RL problem, contrasting traditional (forward) RL, which optimizes behavior against a given reward, with inverse methods that infer the reward from human data. A key focus is on reward models: we examine how they are constructed from various data sources, including mathematical reasoning, binary feedback, preference data, and demonstrations.
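As one concrete instance of the preference-data setting mentioned above, the sketch below shows the standard Bradley–Terry pairwise loss commonly used to fit reward models in RLHF. It is a minimal illustration assuming a PyTorch setup; the `PairwiseRewardModel` class and the random "embeddings" are hypothetical stand-ins for an encoded prompt–response pair, not code from the tutorial.

```python
import torch
import torch.nn as nn

class PairwiseRewardModel(nn.Module):
    """Toy scalar reward model r_phi(x) over pooled response embeddings."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)  # one scalar reward per example

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Maximize the log-probability that the chosen response outranks the rejected one:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: random embeddings stand in for encoded (prompt, response) pairs.
model = PairwiseRewardModel()
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = bradley_terry_loss(model(chosen), model(rejected))
loss.backward()
```

The same scalar-reward interface also accommodates the other data sources listed above (e.g., binary feedback via a logistic loss, or verifiable math answers via exact-match labels), which is part of what Part 3 unpacks.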
Beyond theory, we delve into infrastructure and practical implementation, showcasing how to efficiently evaluate IRL-based LLM alignment ideas in minutes. We conclude with insights from sparse-reward RL, covering reward shaping, credit assignment, and lessons from self-play.
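To make the sparse-reward vocabulary concrete, the snippet below sketches potential-based reward shaping (Ng et al., 1999), one classical way to densify a sparse reward without changing the optimal policy. The potential function `phi` and the toy "correct reasoning steps" state are assumed examples for illustration only.

```python
from typing import Callable

def shaped_reward(
    reward: float,                 # sparse environment reward r(s, a, s')
    potential: Callable[[dict], float],  # heuristic potential Phi(s); an assumed example
    state: dict,
    next_state: dict,
    gamma: float = 0.99,
) -> float:
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).

    The shaping term adds dense guidance while leaving the optimal policy
    unchanged, one of the ideas revisited for credit assignment in LLM training.
    """
    return reward + gamma * potential(next_state) - potential(state)

# Toy usage: a potential that rewards partial progress through a reasoning chain.
phi = lambda s: 0.1 * s["steps_correct"]
r_dense = shaped_reward(0.0, phi, {"steps_correct": 2}, {"steps_correct": 3})
```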
By the end of this tutorial, attendees will gain a practical and theoretical understanding of LLM alignment through inverse RL, equipping them with the tools to build better-aligned models efficiently.
Contents
Part 1. Motivations
Breakthroughs in RL x LLMs
Part 2. RL Meets LLMs: Forward and Inverse
RL, MDP; Inverse RL, MDP\R
LLM alignment as Inverse RL
Why do we (always) need RMs?
Part 3. Inverse: Learning Reward Models from Data
Building Reward Models for Chat (RLHF)
Building Reward Models for Math (Reasoning)
Part 4. Forward: LLM Optimization with Reward Models
Optimization Algorithms
Challenges and Opportunities
Part 5. Insights from Sparse-Reward RL Literature
Reward Shaping and Credit Assignment
Wisdom of Hindsight
Dense or Sparse? That is the question
Lessons from self-play
Brief Introduction of Speakers
Professor Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence, and Medicine at the University of Cambridge. In addition to leading the van der Schaar Lab, Mihaela is the founder and director of the Cambridge Centre for AI in Medicine (CCAIM).
Hao Sun is a final-year Ph.D. student at the University of Cambridge, working at the intersection of reinforcement learning and large language models.