ECE524: Foundations of Reinforcement Learning
Spring 2022, Tue. and Thu. 09:30 am - 10:50 am, Equad B205
Instructor: Chi Jin. Office hour: Tuesday 4:00-5:00 pm [Zoom]
TA: Qinghua Liu. Office hour: Wednesday 4:00-5:00 pm [Zoom]
Contents: Mathematical foundations of RL, mostly about theorems and proofs.
Grades: 4 problem sets (60%), 1 final exam (40%).
No late homework.
Lecture Notes
*Please see the scribed notes in the 2020 version of the course.
1/25, 1/27. Intro [slide], MAB and MDP basics, policy evaluation and planning.
2/1, 2/3. Concentration inequalities for independent variables, martingales.
2/8, 2/10. Generative models, value iteration.
2/15, 2/17. Exploration, multi-armed bandit, epsilon-greedy, UCB.
2/22. MDP exploration, UCB-VI algorithm.
3/1, 3/3. Lower bounds for bandit and reinforcement learning.
3/15, 3/17. Linear MDP, least-squares value iteration.
3/22, 3/24. Global optimism, ELEANOR, fitted Q-iteration. [ELEANOR paper]
3/29, 3/31. Bellman rank, GOLF algorithm. [GOLF paper]
4/5, 4/7. Markov game, Nash equilibrium, direct combination approach. [slides]
4/12, 4/14. Nash value iteration, no-regret learning.
4/19, 4/21. V-learning, POMDPs, weakly revealing conditions, OMLE algorithm. [OMLE paper]
Schedule (weekly basis)
Basics (tabular MDP):
Intro, MDP basics and planning algorithms.
Concentration inequalities.
Generative models, TD algorithms.
Exploration in MAB: epsilon-greedy and UCB. [Homework 1 due]
Exploration in RL.
Minimax lower bound. [Homework 2 due]
Advanced Topics:
Reward-free/multi-objective RL.
Linear function approximation. [Homework 3 due]
General function approximation, OLIVE/GOLF.
Two-player zero-sum Markov games. [Homework 4 due]
Multiplayer general-sum Markov games.
Partially observable MDP. [Homework 5 due]
Reference Readings
Reinforcement Learning: Theory and Algorithms (draft), by Alekh Agarwal, Nan Jiang, Sham M. Kakade
Reinforcement Learning: An Introduction, by Richard S. Sutton, Andrew G. Barto
Algorithms for Reinforcement Learning, by Csaba Szepesvári
Bandit Algorithms, by Tor Lattimore, Csaba Szepesvári
Mathematical Tools
High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin
Concentration Inequalities and Martingale Inequalities: A Survey, by Fan Chung, Linyuan Lu
Related Courses
Alekh Agarwal and Sham Kakade, Reinforcement Learning and Bandits
Nan Jiang, Statistical Reinforcement Learning
Alekh Agarwal and Alex Slivkins, Bandits and Reinforcement Learning
More practical/empirical courses (material not covered in this course)
Sergey Levine, Deep Reinforcement Learning
Shipra Agrawal, Reinforcement Learning