Markov Decision Processes

2022-2023

Time: Monday 13.15 - 15.00 (September 12 - November 14)

Location: Room 0.16, Minnaert building, Leuvenlaan 4, 3584 CE Utrecht

Lecturers: Dr. O. Kanavetas and Dr. F.M. Spieksma (UL)

Course description:

The theory of Markov decision processes (MDPs) - also known as sequential decision theory, stochastic control or stochastic dynamic programming - studies the sequential optimization of stochastic systems by controlling their transition mechanism over time. Each control policy defines a stochastic process and the values of the objective functions associated with this process. The goal is to select a control policy that optimizes a function of these values.

In real life, decisions usually have two types of impact. Firstly, they cost or save resources, such as money or time. Secondly, by influencing the dynamics of the system they also have an impact on the future. Therefore, in many situations the decision with the largest immediate profit may not be good in view of future rewards. MDPs model this paradigm and can be used to model many important applications in practice. In this course, we provide results on the structure and existence of good policies and on methods for the computation of optimal policies, and we illustrate them with applications.
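To make this trade-off concrete, the following is a minimal sketch of the optimality (Bellman) equation for expected total discounted rewards, written in standard MDP notation (finite state space S, actions A(s), rewards r(s,a), transition probabilities p(s'|s,a), discount factor alpha); this notation is an assumption and may differ from that of the course notes:

\[
  v^{*}(s) \;=\; \max_{a \in A(s)} \Bigl\{ r(s,a) \;+\; \alpha \sum_{s' \in S} p(s' \mid s, a)\, v^{*}(s') \Bigr\},
  \qquad s \in S, \quad 0 \le \alpha < 1 .
\]

The maximizing action balances the immediate reward r(s,a) against the discounted expected value of the states reachable under p(. | s,a).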

Online learning is the process of making online control decisions in sequential decision-making problems, given knowledge about the optimal controls for past decision epochs. In this course we apply online learning techniques to finite-action Markov decision processes through the bandit problem.
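As a concrete illustration (a sketch only, not the algorithms treated in the lectures), the Python fragment below implements an epsilon-greedy learner for a finite-action bandit problem; the reward samplers, the exploration parameter epsilon and the function name are illustrative assumptions.

import random

def epsilon_greedy_bandit(reward_samplers, horizon=1000, epsilon=0.1, seed=0):
    # Minimal epsilon-greedy sketch for a finite-action bandit problem.
    # reward_samplers: list of zero-argument functions, one per action,
    # each returning a random reward (a hypothetical set-up).
    rng = random.Random(seed)
    n_actions = len(reward_samplers)
    counts = [0] * n_actions      # number of times each action was played
    means = [0.0] * n_actions     # running average reward per action
    total_reward = 0.0
    for _ in range(horizon):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: means[i])
        r = reward_samplers[a]()
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]   # incremental mean update
        total_reward += r
    return means, total_reward

# Example use with two hypothetical Bernoulli arms:
# means, total = epsilon_greedy_bandit(
#     [lambda: float(random.random() < 0.4), lambda: float(random.random() < 0.6)])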

Contents of the lectures:

1. Model formulation, policies, optimality criteria, the finite horizon.

2. Discounted rewards: optimality equation and solution method.

3. Structural properties.

4. Applications of MDPs.

5. Further topics in MDPs.

6. Online learning using MDPs.

Literature:

See the Files folder.

The course is based on lnmb.pdf and lecture_notes.pdf.

Extra literature: Lecture-Notes-MDP.pdf (lecture notes by Lodewijk Kallenberg).

Files can be found below; the subpage 'files' contains old files from previous courses.

Prerequisites:

- Elementary knowledge of linear programming (e.g. K.G. Murty, Linear programming, Wiley, 1983).

- Elementary knowledge of probability theory (e.g. S.M. Ross, A first course in probability, Macmillan, New York, 1976).

- Elementary knowledge of (numerical) analysis (e.g. Banach space; contracting mappings; Newton’s method; Laurent series).

Examination:

Take-home problems. You work on the problems in groups. This task may include a presentation by your group.

Assignment: Exercise 1.6 (airline overbooking), Exercise 2.8 (roulette problem), and one of Exercise 2.18 (dynamic location problem) and Exercise 2.12 (bandit problem). For the presentation you can choose any of the three exercises you chose to solve. You hand in both the solutions and the presentation by Sunday, November 6, at 23.59 (via email). Assessment: 2.5 points per exercise and 2.5 points for the presentation, 10 points in total.

Address of the lecturers:

Dr. O. Kanavetas

Mathematisch Instituut, Universiteit Leiden

Postbus 9512, 2300 RA Leiden

E-mail: o.kanavetas@math.leidenuniv.nl

Dr. F.M. Spieksma

Mathematisch Instituut, Universiteit Leiden

Postbus 9512, 2300 RA Leiden

E-mail: spieksma@math.leidenuniv.nl

URL: pub.math.leidenuniv.nl/~spieksmafm

Preliminary program