Markov Decision Processes and Reinforcement Learning

By Martin L. Puterman and Timothy C. Y. Chan

This book will be an accessible and up-to-date introduction to Markov decision processes (MDPs). Our target audiences include undergraduate students, graduate students and self-directed learners looking for a structured introduction that covers foundations, algorithms, and applications.

Penultimate versions of chapters are posted here. We welcome all feedback and suggestions. Chapters 2-5 provide the necessary material for a course on MDPs.

This material will be published by Cambridge University Press in June 2026. This pre-publication version of the following chapters is free to view and download for personal use only. Not for redistribution, resale, or use in derivative works. © Martin L. Puterman and Timothy C. Y. Chan, 2026.

Chapter 1: Introduction
Chapter 2: Model Foundations
Chapter 3: Examples and Applications
Chapter 4: Finite Horizon Models
Chapter 5: Infinite Horizon Discounted MDPs
Chapter 6: Total Reward Models
Chapter 7: Average Reward Models
Chapter 8: POMDPs
Chapter 9: Value Function Approximation
Chapter 10: Simulation in Tabular Models
Chapter 11: Simulation and Function Approximation
Appendices: Markov Chains and Linear Programming