ELE524: Foundations of Reinforcement Learning
Spring 2020, Tue./Thu. 9:30-10:50 am, Friend Center 006.
Instructor: Chi Jin. Office hours: Tue. 4:00-5:00 pm, C-332 Equad.
TA: Zhiyuan Li. Office hours: Wed. 3:30-4:30 pm, 315 CS Building.
Contents: Mathematical foundations of reinforcement learning (RL), with an emphasis on theorems and proofs.
Grades: five problem sets (60%), one scribe note (10%), and one final exam (30%). For students who do not complete a scribe note, the final counts for 40%.
No late homework will be accepted. If two students scribe the same lecture, they should submit a merged version within one week of the lecture.
Scribe note [sign up sheet] and [template].
Lecture Notes
2/6. Concentration inequalities [draft][note]. (see also Chapter 2 of [Ver 2020])
2/13. MDP planning [draft][note]. (see also Chapter 1 of [AJK 2019])
2/18. Generative model, value iteration (coarse analysis) [draft][note]. (see also Chapter 2 of [AJK 2019])
2/20. Value iteration (refined analysis), Q-learning [draft][note].
2/25. Generative model summary, multi-armed bandits [draft][note]. (see also Part II of [LS 2018])
3/5. Q-learning with UCB, MDP summary. [draft][note]. (see also [JABJ 2018])
3/31. PPO, TRPO, and natural policy gradient algorithms. [video][draft][note].
4/7. Function approximation overview, linear MDP. [video][draft][note]
4/9. Least-Squares Value Iteration (LSVI). [video][draft][note]
4/14. LSVI with UCB, Fitted Q-Iteration (FQI). [video][draft][note]
4/16. Analysis for FQI and Bellman rank intro. [video][draft][note]
4/21. (Guest lecture by Akshay Krishnamurthy) Bellman rank and OLIVE algorithm. [video]
4/30. Partially Observable MDP, Predictive State Representation (PSR). [video1][video2][draft][note]
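The 2/6 lecture covers concentration inequalities. As a quick numerical illustration (not course material), a minimal sanity check of Hoeffding's inequality on Bernoulli samples; the sample size, deviation, and mean below are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hoeffding: for n i.i.d. samples in [0, 1],
#   P(|sample mean - E[X]| >= eps) <= 2 exp(-2 n eps^2).
n, eps, trials = 200, 0.1, 10_000
p = 0.5  # Bernoulli(1/2) arms; an illustrative choice, not from the notes

samples = rng.random((trials, n)) < p            # trials x n Bernoulli draws
deviations = np.abs(samples.mean(axis=1) - p)    # |sample mean - true mean|
empirical = np.mean(deviations >= eps)           # empirical failure rate
bound = 2 * np.exp(-2 * n * eps**2)              # Hoeffding upper bound
```

The empirical failure rate should fall (typically well) below the Hoeffding bound, since the bound is not tight for Bernoulli tails.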
Schedule (weekly)
Basics (tabular MDP):
Intro, MAB and MDP basics, concentration inequalities.
MDP Planning.
Generative models, TD algorithms.
Exploration in MAB: epsilon-greedy and UCB. [Homework 1 due]
Exploration in RL.
Minimax lower bound. [Homework 2 due]
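The "Exploration in MAB" topic above centers on the UCB index. A toy sketch of the UCB1 rule on a Bernoulli bandit follows; this is an illustrative implementation, not course code, and the arm means and horizon are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Bernoulli bandit instance (means made up for illustration).
means = np.array([0.3, 0.5, 0.7])
T = 5000

n_arms = len(means)
counts = np.zeros(n_arms)   # number of pulls per arm
sums = np.zeros(n_arms)     # cumulative reward per arm

for t in range(1, T + 1):
    if t <= n_arms:
        arm = t - 1         # pull each arm once to initialize
    else:
        # UCB1 index: empirical mean + bonus sqrt(2 log t / n_i)
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.random() < means[arm]   # Bernoulli reward
    counts[arm] += 1
    sums[arm] += reward

# Regret relative to always pulling the best arm.
regret = T * means.max() - sums.sum()
```

The confidence bonus shrinks as an arm is pulled more, so exploration concentrates on the empirically best arm while still revisiting undersampled ones.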
Advanced Topics:
Policy optimization.
Linear quadratic regulator. [Homework 3 due]
Linear function approximation.
General function approximation. [Homework 4 due]
Off-policy evaluation / optimization.
Markov games, partially observable MDPs. [Homework 5 due]
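Several schedule topics (MDP planning, TD algorithms) build on tabular value iteration. A minimal sketch on a made-up two-state, two-action discounted MDP; this is an illustrative toy, not course-provided code:

```python
import numpy as np

# Hypothetical MDP (transition kernels and rewards invented for the example).
# P[a][s, s'] = transition probability under action a; R[s, a] = expected reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.7, 0.3]])]   # action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])
gamma = 0.9

def value_iteration(P, R, gamma, tol=1e-8):
    """Apply the Bellman optimality operator until the value function converges."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a][s, s'] V[s']
        Q = np.stack([R[:, a] + gamma * P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal values, greedy policy
        V = V_new

V_star, pi_star = value_iteration(P, R, gamma)
```

Since the Bellman operator is a gamma-contraction in the sup norm, the loop converges geometrically, which is the "coarse analysis" viewpoint from the 2/18 lecture.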
Reference Readings
Reinforcement Learning: Theory and Algorithms (draft), by Alekh Agarwal, Nan Jiang, Sham M. Kakade
Reinforcement learning: an introduction, by Richard S. Sutton, Andrew G. Barto
Algorithms for Reinforcement Learning, by Csaba Szepesvári
Bandit Algorithms, by Tor Lattimore, Csaba Szepesvári
Mathematical Tools
High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin
Concentration inequalities and martingale inequalities: a survey, by Fan Chung, Linyuan Lu
Related Courses
Alekh Agarwal and Sham Kakade, Reinforcement Learning and Bandits
Nan Jiang, Statistical Reinforcement Learning
Alekh Agarwal and Alex Slivkins, Bandits and Reinforcement Learning
More practical/empirical courses (not covered in this course):
Sergey Levine, Deep Reinforcement Learning
Shipra Agrawal, Reinforcement Learning