ECE586RS: MDPs and Reinforcement Learning
Term: Fall 2022
Prerequisites: ECE 534 (Random Processes)
Instructor: Prof. R. Srikant, rsrikant@illinois.edu
TAs: Yashaswini Murthy (ymurthy2@illinois.edu) and Anna Winnicki (annaw5@illinois.edu)
Prof. Srikant’s Office Hours: 2:20-3:00 MW, 107 CSL
TAs' Office Hours: Tue 3-5 pm, 3032 ECEB (3-4 Anna, 4-5 Yashaswini)
Lectures: 1-2:20 MW in Room 2015 ECEB
Fall Break: Nov. 19-Nov. 27
Last Day of Instruction: Dec. 7
Outline (Time Permitting):
MDPs: Finite-horizon problems, infinite-horizon discounted-cost problems, Bellman equation, contraction and monotonicity properties, value and policy iteration
Optimization Background: gradient descent, mirror descent and stochastic gradient descent
Approximate Dynamic Programming: Approximate value iteration; policy evaluation using least-squares and gradient descent
TD Learning: Algorithms for tabular and function approximation settings, finite-time performance bounds and convergence
RL Methods Motivated by Value Iteration: Q-learning based on a single trajectory, with and without function approximation, offline and online versions, finite-time bounds and convergence
RL Methods Motivated by Policy Iteration: Policy gradient, natural policy gradient, finite-time bounds and convergence
Episodic RL: Q-learning over a finite-time horizon, connection to multi-armed bandits, regret bounds
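To give a flavor of the first topic above, here is a minimal sketch of value iteration for a toy two-state, two-action discounted-cost MDP. The transition probabilities and costs are made-up illustrative data, not from the course; the loop repeatedly applies the Bellman operator, whose contraction property guarantees convergence to the optimal value function.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[a][s, s'] = probability of moving from s to s' under action a.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),
     np.array([[0.5, 0.5],
               [0.7, 0.3]])]
# c[s, a] = one-step cost of taking action a in state s.
c = np.array([[1.0, 2.0],
              [0.5, 1.5]])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman operator: (TV)(s) = min_a [ c(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') ]
    Q = np.stack([c[:, a] + gamma * P[a] @ V for a in range(2)], axis=1)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once the update is a near fixed point
        V = V_new
        break
    V = V_new

policy = Q.argmin(axis=1)  # greedy (optimal) policy w.r.t. the converged V
```

Policy iteration alternates policy evaluation and greedy improvement instead of iterating the Bellman operator directly; both are covered in the first unit.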
Grading:
Homework: 80% (Homework will be posted on Canvas)
Final Exam: 20% (7-10 pm, Dec. 14)
References:
MDPs:
D. P. Bertsekas. Dynamic Programming and Optimal Control, Vols. I and II, Athena Scientific, 1995. (Later editions: Vol. I, 2017; Vol. II, 2012.)
M. L. Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
S. M. Ross. Applied probability models with optimization applications. Courier Corporation, 2013.
A concise introduction to MDPs can be found in Chapter 17 of M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning, MIT Press, 2018.
Optimization:
A. Beck. Introduction to nonlinear optimization: Theory, algorithms, and applications with MATLAB. SIAM, 2014.
A. Beck. First-order methods in optimization. SIAM, 2017.
RL:
D. P. Bertsekas and J. N. Tsitsiklis. Neuro-dynamic programming. Athena Scientific, 1996.
D. P. Bertsekas. Reinforcement learning and optimal control. Athena Scientific, 2019.
S. T. Maguluri's lecture notes based on his ISyE course at GaTech.
S. P. Meyn. Control Systems and Reinforcement Learning, Cambridge University Press, 2022.
And several papers.