IE 3094 (Pitt). Markov Decision Processes (Spring 2023, Fall 2025, Graduate)


Probability Basics: Review of random variables, conditional probability and expectation, and Markov processes.
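A minimal simulation sketch of the Markov property, assuming a made-up two-state transition matrix P (all numbers below are illustrative, not course material):

```python
import numpy as np

# Hypothetical two-state Markov chain; P[i, j] = P(next state j | current state i).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
rng = np.random.default_rng(0)

def simulate_chain(P, x0, horizon):
    """Sample one trajectory: the next state depends only on the current state."""
    path = [x0]
    for _ in range(horizon):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

print(simulate_chain(P, x0=0, horizon=10))
```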

Finite Horizon MDP: Dynamic programming via backward induction and its optimality, with applications to inventory control, linear-quadratic control, optimal stopping, and shortest-path problems.
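As a concrete illustration of the backward-induction recursion, a minimal sketch on made-up data (the state and action counts, costs, transition kernel, and horizon below are all hypothetical):

```python
import numpy as np

# Finite-horizon MDP: S states, A actions, stage cost c[s, a],
# transition kernel P[a, s, s'], terminal cost g, horizon T.
S, A, T = 4, 2, 5
rng = np.random.default_rng(1)
c = rng.random((S, A))                      # stage costs (illustrative values)
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)           # normalize rows into a stochastic kernel
g = np.zeros(S)                             # terminal cost

V = np.zeros((T + 1, S))
policy = np.zeros((T, S), dtype=int)
V[T] = g
for t in range(T - 1, -1, -1):              # DP backward in time
    Q = c + np.einsum('ast,t->sa', P, V[t + 1])   # c(s,a) + E[V_{t+1}(s') | s, a]
    policy[t] = Q.argmin(axis=1)            # minimizing action at stage t
    V[t] = Q.min(axis=1)

print(V[0])                                 # optimal cost-to-go from each initial state
```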

Finite Horizon Partially Observable MDP: Converting a POMDP to an MDP over the information state, sufficient statistics, evolution of the conditional distribution (belief state), the DP equation as a function of the sufficient statistic, and application to sequential hypothesis testing.
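The information-state construction rests on the recursive Bayes update of the conditional distribution (the belief). A minimal sketch of that update, assuming hypothetical transition and observation kernels P and O:

```python
import numpy as np

# Belief update for a POMDP with transition kernel P[a, s, s'] and
# observation likelihoods O[a, s', y] = P(observe y | next state s', action a).
def belief_update(b, a, y, P, O):
    """Bayes update of the belief b after taking action a and observing y."""
    predicted = b @ P[a]                     # prior over next state: sum_s b(s) P(s'|s,a)
    unnormalized = predicted * O[a][:, y]    # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Tiny two-state, two-observation example with made-up numbers and one action.
P = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
O = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, y=1, P=P, O=O))
```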

Infinite Horizon Discounted Cost MDP: Bellman equation, existence and uniqueness of its fixed point, value iteration, and necessary and sufficient conditions for optimality. Applications to multi-armed bandits and the Gittins index.
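Value iteration is the constructive counterpart of the fixed-point result: under discounting, repeated application of the Bellman operator converges to its unique fixed point. A minimal sketch on made-up data (states, actions, costs, kernel, and discount factor are all illustrative):

```python
import numpy as np

# Value iteration for an infinite-horizon discounted-cost MDP.
S, A, gamma = 4, 2, 0.95
rng = np.random.default_rng(2)
c = rng.random((S, A))
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)

V = np.zeros(S)
for _ in range(10_000):
    Q = c + gamma * np.einsum('ast,t->sa', P, V)   # Bellman operator applied to V
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:          # sup-norm stopping test
        V = V_new
        break
    V = V_new

print(V, Q.argmin(axis=1))                         # value function and greedy policy
```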

Infinite Horizon Unbounded and Undiscounted Cost MDP: Bellman equation and value iteration, with applications to inventory control, sequential analysis (change detection and sequential hypothesis testing), and linear-quadratic control.
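For the change-detection application, one classical recursion is CUSUM, whose threshold-crossing stopping rule is representative of the policies studied in sequential analysis. A minimal sketch, assuming a hypothetical Gaussian mean shift and an arbitrary threshold (not the course's specific formulation, just an illustration of a sequential detection rule):

```python
import numpy as np

# CUSUM for quickest detection of a mean shift from mu0 to mu1 in Gaussian data.
mu0, mu1, sigma, h = 0.0, 1.0, 1.0, 5.0            # hypothetical parameters and threshold
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(mu0, sigma, 50),    # pre-change samples
                    rng.normal(mu1, sigma, 50)])   # post-change samples

W, alarm = 0.0, None
for n, xn in enumerate(x, start=1):
    llr = (mu1 - mu0) * (xn - (mu0 + mu1) / 2) / sigma**2   # per-sample log-likelihood ratio
    W = max(0.0, W + llr)                                   # CUSUM recursion
    if W >= h:                                              # stop at first threshold crossing
        alarm = n
        break

print("alarm raised at sample:", alarm)
```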

Introduction to Reinforcement Learning: Stochastic approximation and Q-learning. 
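A minimal tabular Q-learning sketch, written as a stochastic-approximation update toward the Bellman target; the environment, step-size schedule, and exploration rate below are all made up for illustration:

```python
import numpy as np

# Tabular Q-learning on a hypothetical discounted MDP simulated from its kernel.
S, A, gamma = 4, 2, 0.95
rng = np.random.default_rng(4)
c = rng.random((S, A))
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)

Q = np.zeros((S, A))
s = 0
for k in range(1, 100_000):
    a = rng.integers(A) if rng.random() < 0.1 else Q[s].argmin()   # epsilon-greedy action
    s_next = rng.choice(S, p=P[a, s])                              # sample next state
    alpha = 1.0 / (1 + k / 1000)                                   # decaying step size
    # Stochastic-approximation step toward the Bellman target c(s,a) + gamma * min_a' Q(s',a').
    Q[s, a] += alpha * (c[s, a] + gamma * Q[s_next].min() - Q[s, a])
    s = s_next

print(Q.argmin(axis=1))   # learned greedy policy
```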

Texts: Lecture notes; Dynamic Programming and Optimal Control, Vols. I and II (Bertsekas); Partially Observed Markov Decision Processes (Vikram Krishnamurthy); Neuro-Dynamic Programming (Bertsekas and Tsitsiklis).