MDPs and RL 2025

ECE586RS: MDPs and Reinforcement Learning

Term: Fall 2025

Prerequisites: ECE 534 (Random Processes)

TAs: Seo Taek Kong (skong10@illinois.edu) and Saptarshi Mandal (smandal4@illinois.edu)

Office Hours: 3001 ECEB, 4- 6pm Wednesdays

Lectures: 12:30-1:50 TuTh in Room 106B8 Engineering Hall

Fall Break: Nov. 22-Nov. 30

Last Day of instruction for this class: Dec. 9

Outline:

This is a theoretical course on Reinforcement Learning, which will cover the following topics (time permitting):

MDPs: Finite-horizon problems, infinite-horizon discount cost problems, Bellman equation, contraction and monotonicity properties, value and policy iteration
Optimization Background: gradient descent, mirror descent and stochastic gradient descent
TD Learning: Algorithms for tabular and function approximation settings, finite-time performance bounds and convergence
RL Methods Motivated by Value Iteration: Q-learning based on a single trajectory, with and without function approximation, offline and online versions, finite-time bounds and convergence
RL Methods Motivated by Policy Iteration: Policy gradient, mirror descent-based methods, finite-time bounds and convergence
Episodic RL: Q-learning over a finite-time horizon, connection to multi-armed bandits, regret bounds

Grading :

Problem Sets (will be posted on canvas): 80%
- Unless otherwise stated, each problem will count for ten points. You total homework points at the end of the semester will count for 80% of your grade.
- If you miss the deadline for a homework, you will still be allowed to submit it up to 48 hours after the deadline for 75% of the credit. Submissions beyond 48 hours will not be permitted and will receive zero points.
- Homework will be assigned approximately every two weeks, and after a homework is assigned, you will have at least one week to complete it.
Final Exam: 20% (9-11 am, Dec. 16)

References:

MDPs:

D. P. Bertsekas. Dynamic Programming and Optimal Control, vol. I and II, Athena Scientific, 1995. (Later editions, vol. I, 2017 and vol. 2, 2012)
M. L. Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
S. M. Ross. Applied probability models with optimization applications. Courier Corporation, 2013.
A concise introduction to MDPs can be found in Chapter 17 of M. Mohri, A. Rostamizadeh, and A. Talwalkar. Foundations of Machine Learning, MIT Press, 2018.

RL :

A. Agarwal, N. Jiang, S. Kakade, W. Sun. Reinforcement Learning Theory and Applications, Working Book.
D. P. Bertsekas, and J. N. Tsitsiklis. Neuro-dynamic programming. Athena Scientific, 1996.
D. P. Bertsekas. Reinforcement learning and optimal control. Athena Scientific, 2019.
S. T. Maguluri's lecture notes based on his ISyE course at GaTech.
S. P. Meyn. Control Systems and Reinforcement Learning, Cambridge University Press, 2022.
Csaba Szepesvari's RL Course notes.
And several papers.

Optimization:

A. Beck. Introduction to nonlinear optimization: Theory, algorithms, and applications with MATLAB. SIAM, 2014.

Markov Chains

Report abuse