ECE586RS: MDPs and Reinforcement Learning
Term: Fall 2022
Prerequisites: ECE 534 (Random Processes)
Instructor: Prof. R. Srikant, rsrikant@illinois.edu
TAs: Yashaswini Murthy (ymurthy2@illinois.edu) and Anna Winnicki (annaw5@illinois.edu)
Prof. Srikant’s Office Hours: 2:20-3:00 MW, 107 CSL
TAs' Office Hours: Tue 3-5 pm, 3032 ECEB (3-4 Anna, 4-5 Yashaswini)
Lectures: 1-2:20 MW in Room 2015 ECEB
Fall Break: Nov. 19-Nov. 27
Last Day of Instruction: Dec. 7
Outline (Time Permitting):
MDPs: Finite-horizon problems, infinite-horizon discounted-cost problems, Bellman equation, contraction and monotonicity properties, value and policy iteration
Optimization Background: gradient descent, mirror descent and stochastic gradient descent
Approximate Dynamic Programming: Approximate value iteration; policy evaluation using least-squares and gradient descent
TD Learning: Algorithms for tabular and function approximation settings, finite-time performance bounds and convergence
RL Methods Motivated by Value Iteration: Q-learning based on a single trajectory, with and without function approximation, offline and online versions, finite-time bounds and convergence
RL Methods Motivated by Policy Iteration: Policy gradient, natural policy gradient, finite-time bounds and convergence
Episodic RL: Q-learning over a finite-time horizon, connection to multi-armed bandits, regret bounds
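To give a flavor of the first topic above, here is a minimal sketch of value iteration for a toy two-state, two-action discounted-cost MDP. The transition probabilities and costs are made-up illustrative data, not from the course; the loop repeatedly applies the Bellman operator, whose contraction property guarantees convergence to the optimal value function.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[a][s, s'] = probability of moving from s to s' under action a.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),
     np.array([[0.5, 0.5],
               [0.7, 0.3]])]
# c[s, a] = one-step cost of taking action a in state s.
c = np.array([[1.0, 2.0],
              [0.5, 1.5]])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman operator: (TV)(s) = min_a [ c(s,a) + gamma * sum_{s'} P(s'|s,a) V(s') ]
    Q = np.stack([c[:, a] + gamma * P[a] @ V for a in range(2)], axis=1)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once the update is a near fixed point
        V = V_new
        break
    V = V_new

policy = Q.argmin(axis=1)  # greedy (optimal) policy w.r.t. the converged V
```

Policy iteration alternates policy evaluation and greedy improvement instead of iterating the Bellman operator directly; both are covered in the first unit.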
Grading:
Homework: 80% (Homework will be posted on Canvas)
Final Exam: 20% (7-10 pm, Dec. 14)
References:
MDPs:
D. P. Bertsekas. Dynamic Programming and Optimal Control, Vols. I and II, Athena Scientific, 1995. (Later editions: Vol. I, 2017; Vol. II, 2012.)
M. L. Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
S. M. Ross. Applied probability models with optimization applications. Courier Corporation, 2013.
A concise introduction to MDPs can be found in Chapter 17 of M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning, MIT Press, 2018.
Optimization:
A. Beck. Introduction to nonlinear optimization: Theory, algorithms, and applications with MATLAB. SIAM, 2014.
A. Beck. First-order methods in optimization. SIAM, 2017.
RL:
D. P. Bertsekas and J. N. Tsitsiklis. Neuro-dynamic programming. Athena Scientific, 1996.
D. P. Bertsekas. Reinforcement learning and optimal control. Athena Scientific, 2019.
S. T. Maguluri's lecture notes based on his ISyE course at GaTech.
S. P. Meyn. Control Systems and Reinforcement Learning, Cambridge University Press, 2022.
And several papers.