IE 3094 (Pitt). Markov Decision Processes (Spring 2023, Fall 2025, Graduate)
Probability Basics: Review of random variables, conditional probability and expectation, and Markov processes.
Finite Horizon MDP: The dynamic programming (backward induction) algorithm and its optimality, with applications to inventory control, linear quadratic control, optimal stopping, and shortest-path problems.
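As a concrete illustration of the backward-induction recursion V_T(x) = g(x), V_t(x) = min_u { c_t(x,u) + E[V_{t+1}(x') | x, u] }, here is a minimal Python sketch for a finite state-action model; the arrays c, g, P and their shapes are illustrative assumptions, not course material.

    import numpy as np

    def backward_induction(c, g, P):
        """Finite-horizon DP: c is (T, X, U) stage costs, g is (X,) terminal cost,
        P is (T, U, X, X) with P[t, u, x, y] = Prob(x_{t+1} = y | x_t = x, u_t = u)."""
        T, X, U = c.shape
        V = np.zeros((T + 1, X))
        policy = np.zeros((T, X), dtype=int)
        V[T] = g                                   # terminal condition V_T = g
        for t in reversed(range(T)):               # backward pass t = T-1, ..., 0
            # Q[x, u] = c_t(x, u) + E[ V_{t+1}(x_{t+1}) | x_t = x, u_t = u ]
            Q = c[t] + np.einsum('uxy,y->xu', P[t], V[t + 1])
            policy[t] = Q.argmin(axis=1)           # cost-minimizing action at (t, x)
            V[t] = Q.min(axis=1)
        return V, policy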
Finite Horizon Partially Observable MDP: Converting a POMDP to an MDP over the information state, sufficient statistics, evolution of the conditional (belief) distribution, the DP equation as a function of the sufficient statistic, and application to sequential hypothesis testing.
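For reference, the belief (information-state) update underlying the POMDP-to-MDP conversion can be written as below; the notation (transition kernel P(x'|x,u), observation likelihood q(y|x'), belief pi_t) is an assumed convention, not necessarily the one used in the lecture notes.

    \pi_{t+1}(x') \;=\; \frac{q(y_{t+1}\mid x')\,\sum_{x} P(x'\mid x, u_t)\,\pi_t(x)}
                             {\sum_{x''} q(y_{t+1}\mid x'')\,\sum_{x} P(x''\mid x, u_t)\,\pi_t(x)}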
Infinite Horizon Discounted Cost MDP: The Bellman equation, its fixed-point characterization and the uniqueness of its solution, value iteration, and necessary and sufficient conditions for optimality. Applications to multi-armed bandits and the Gittins index.
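A minimal value-iteration sketch for the discounted-cost case, assuming a finite model with stage costs c(x,u), kernel P(x'|x,u), and discount factor beta in (0,1); iterating the Bellman operator converges to its unique fixed point because the operator is a beta-contraction in the sup norm. The array shapes and tolerance below are illustrative assumptions.

    import numpy as np

    def value_iteration(c, P, beta=0.95, tol=1e-8):
        """Discounted-cost MDP: c is (X, U) stage costs, P is (U, X, X) kernel."""
        X, U = c.shape
        V = np.zeros(X)
        while True:
            # Bellman operator: (TV)(x) = min_u [ c(x, u) + beta * E[ V(x') | x, u ] ]
            Q = c + beta * np.einsum('uxy,y->xu', P, V)
            V_new = Q.min(axis=1)
            if np.max(np.abs(V_new - V)) < tol:    # sup-norm stopping rule
                return V_new, Q.argmin(axis=1)     # approximate value and a greedy policy
            V = V_new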
Infinite Horizon Unbounded and Undiscounted Cost MDP: Bellman equation and value iteration with applications to inventory control, sequential analysis (change detection and sequential hypothesis testing), and linear quadratic control.
Introduction to Reinforcement Learning: Stochastic approximation and Q-learning.
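A minimal tabular Q-learning sketch, viewed as a stochastic-approximation iteration on the Bellman equation; the environment interface (env.reset() returning a state, env.step(u) returning next state, cost, and a done flag), the epsilon-greedy rule, and the step-size schedule are all illustrative assumptions.

    import numpy as np

    def q_learning(env, X, U, beta=0.95, episodes=500, eps=0.1):
        """Tabular Q-learning for a discounted-cost MDP with X states and U actions."""
        Q = np.zeros((X, U))
        visits = np.zeros((X, U))
        for _ in range(episodes):
            x = env.reset()
            done = False
            while not done:
                # epsilon-greedy exploration around the current cost-minimizing action
                u = np.random.randint(U) if np.random.rand() < eps else int(Q[x].argmin())
                x_next, cost, done = env.step(u)
                visits[x, u] += 1
                alpha = 1.0 / visits[x, u]              # Robbins-Monro step size
                target = cost + beta * (0.0 if done else Q[x_next].min())
                Q[x, u] += alpha * (target - Q[x, u])   # stochastic-approximation update
                x = x_next
        return Q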
Texts: Lecture notes; Dynamic Programming and Optimal Control, Vols. I and II (Bertsekas); Partially Observed Markov Decision Processes (Vikram Krishnamurthy); Neuro-Dynamic Programming (Bertsekas and Tsitsiklis).