MDP 2018-2019

Time: Monday 13.15 - 15.00 (February 25 - May 6).

Location: Mathematical Building, Room 611AB, Budapestlaan, Utrecht (De Uithof).

Lecturer: Dr. F.M. Spieksma (UL)

Attendance list:

HW remarks:

3.1 The idea was that you invent something to model this yourself, but quite likely I was overly optimistic.

So, one can for instance take the discrete time points of observation of the process to be 0, h, 2h, ..., with h small compared to d_1 and d_2. One can then take a two-dimensional state space, with one obvious component and the other reflecting the amount of service that the customer currently being served still has to receive.

If alpha is the discount factor per unit time, you have to adapt alpha appropriately to get the right discount factor per interval of length h.

A good choice for the bounding function M is a function of the type e^{beta x} in the case of a one-dimensional state space.
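For concreteness, a minimal worked version of the last two hints (a sketch under assumptions, not part of the assignment text): assuming alpha in (0,1) is the discount factor per unit time and beta > 0 is a free parameter, the adapted discount factor and the suggested bounding function can be written as

\[
  \alpha_h \;=\; \alpha^{h}, \qquad\text{so that}\qquad \alpha_h^{\,t/h} \;=\; \alpha^{t} \quad\text{for } t = 0, h, 2h, \ldots,
\]
\[
  M(x) \;=\; e^{\beta x}, \qquad \beta > 0 .
\]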

3.2 Recall that for convexity of the value function in the admission control single server queue, apart from convexity, we needed non-decreasingness of the n-horizon value functions in the successive approximation algorithm. For the propagation of some desired properties one may thus need to impose more structure! In this case, one may need to impose that the differences are bounded by some number, and show that this propagates.
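As an illustration of how such properties can be monitored along the successive approximation iterations, here is a minimal value iteration sketch for an admission control single server queue. All model details (uniformized dynamics with arrival probability p and service completion probability q, holding cost x per period, rejection penalty R, discount factor beta, truncation level N) are illustrative assumptions, not the exercise's data. The printed checks verify numerically that non-decreasingness, convexity and bounded first differences of the n-horizon value functions propagate.

import numpy as np

p, q, beta, R = 0.3, 0.5, 0.95, 10.0      # illustrative parameters
N, n_iter = 250, 100                      # truncation level and number of iterations
states = np.arange(N + 1)

V = np.zeros(N + 1)                       # V_0 = 0 is convex and non-decreasing
for n in range(1, n_iter + 1):
    admit = np.append(V[1:], np.inf)      # admitted arrival: x -> x+1 (blocked at N)
    reject = R + V                        # rejected arrival: pay penalty R, stay in x
    V = (states                           # holding cost x per period
         + beta * (p * np.minimum(admit, reject)      # optimal admission decision
                   + q * V[np.maximum(states - 1, 0)]  # service completion
                   + (1.0 - p - q) * V))               # dummy (uniformization) transition
    d = np.diff(V[:N - n_iter])           # first differences, away from the truncation boundary
    print(f"n={n:3d}  non-decreasing: {(d >= -1e-8).all()}  "
          f"convex: {(np.diff(d) >= -1e-8).all()}  max first difference: {d.max():.3f}")

Only states that the artificial truncation cannot influence within n_iter nearest-neighbour transition steps are checked, so the monitored values coincide with those of the untruncated model.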

Course description:

The theory of Markov decision processes (MDPs) - also known under the names sequential decision theory, stochastic control or stochastic dynamic programming - studies sequential optimization of stochastic systems by controlling their transition mechanism over time. Each control policy defines a stochastic process and values of objective functions associated with this process. The goal is to select a control policy that optimizes a function of the values generated by these objective functions.

In real life, decisions usually have two types of impact. Firstly, they cost or save resources, such as money or time. Secondly, by influencing the dynamics of the system, they have an impact on the future as well. Therefore, in many situations the decision with the largest immediate profit may not be good in view of future rewards. MDPs capture this paradigm and can be used to model many important applications in practice. In this course, we provide results on the structure and existence of good policies and on methods for the computation of optimal policies, and we illustrate them with applications.

Contents of the lectures:

1. Model formulation, policies, optimality criteria, the finite horizon.

2. Discounted rewards: optimality equation and solution method.

3. Average rewards: optimality equation and solution methods.

4. Structural properties.

5. Applications of MDPs.

6. Further topics in MDPs.

Literature:

see Files folder.

The course is based on lnmb.pdf and lecture_notes.pdf.

Extra literature: Lecture-Notes-MDP.pdf (lecture notes by Lodewijk Kallenberg).

Files can be found below; the subpage 'files' contains old files from previous courses.

Prerequisites:

- Elementary knowledge of linear programming (e.g. K.G. Murty, Linear programming, Wiley, 1983).

- Elementary knowledge of probability theory (e.g. S.M. Ross, A first course in probability, Macmillan, New York, 1976).

- Elementary knowledge of (numerical) analysis (e.g. Banach space; contracting mappings; Newton’s method; Laurent series).

Examination:

Take-home problems (the plan is 5 take-home assignments). You may hand in the problems in pairs or triples! Preferably hand in electronically.

Address of the lecturer:

Dr. F.M. Spieksma

Mathematisch Instituut, Universiteit Leiden

Postbus 9512, 2300 RA Leiden

Phone: 071 - 5277128, E-mail: spieksma@math.leidenuniv.nl

URL: pub.math.leidenuniv.nl/~spieksmafm

Preliminary program: