Fall 2025, Tue. Thu. 9 - 10:20 am, Fine Hall 110.
Instructor: Chi Jin Office hour: Fri. 9 - 10 am, Equad C332
TA: Wenzhe Li Office hour: Wed. 4 - 5 pm, Equad D321
Contents: Mathematical foundations of RL, focusing on theorems and proofs.
Grades: 5 problem sets (60%), 1 final exam (40%).
No late homework.
See past versions of the course here: 2024 Version, 2022 Version, 2020 Version
Video recordings of lectures from the 2024 version: YouTube.
Basics (tabular MDP):
Intro, MDP basics and planning.
Concentration inequalities.
Generative models, value iteration.
Online RL, exploration, optimism. [Homework 1 due]
Offline RL, pessimism.
Minimax lower bound. [Homework 2 due]
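As a taste of the planning topic above, here is a minimal sketch of value iteration on a toy tabular MDP (the transition and reward numbers are made up for illustration; assumes NumPy):

```python
import numpy as np

# Hypothetical toy MDP with 2 states and 2 actions, discount factor gamma.
# P[s, a, s'] is the transition probability, R[s, a] the immediate reward.
gamma = 0.9
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])

def value_iteration(P, R, gamma, tol=1e-8):
    """Apply the Bellman optimality operator until the value function converges."""
    V = np.zeros(P.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[s, a, s'] * V[s']
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V_star, pi_star = value_iteration(P, R, gamma)
print("V* =", V_star, " greedy policy =", pi_star)
```

Since the Bellman operator is a gamma-contraction in the sup norm, the loop converges geometrically; the returned policy is greedy with respect to the (near-)optimal value function.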
Advanced Topics:
Policy optimization.
Large state space, linear function approximation. [Homework 3 due]
General function approximation.
Game theory and multiagent RL. [Homework 4 due]
Learning Markov games.
Partially observable MDPs. [Homework 5 due]
Reinforcement Learning: Theory and Algorithms (draft), by Alekh Agarwal, Nan Jiang, Sham M. Kakade, Wen Sun
Reinforcement learning: an introduction, by Richard S. Sutton, Andrew G. Barto
Algorithms for Reinforcement Learning, by Csaba Szepesvári
Bandit Algorithms, by Tor Lattimore, Csaba Szepesvári
Mathematical Tools
High dimensional probability. An introduction with applications in Data Science, by Roman Vershynin
Concentration inequalities and martingale inequalities — a survey, by Fan Chung, Linyuan Lu
Nan Jiang, Statistical Reinforcement Learning
Wen Sun and Sham Kakade, Foundations of Reinforcement Learning
Dylan J. Foster and Alexander Rakhlin, Statistical Reinforcement Learning and Decision Making
Alekh Agarwal and Alex Slivkins, Bandits and Reinforcement Learning
More practical/empirical courses (not covered in this course):
Sergey Levine, Deep Reinforcement Learning