Lecturers: Dr. Ir. A. Braaksma (UT) and Prof. Dr. F.M. Spieksma (UL)
Time: Monday 15.15 - 17.00 (September 9 - November 11)
Place: Room HFG 611, Hans Freudenthal building, Budapestlaan 6, 3584 CD Utrecht
Course description:
The theory of Markov decision processes (MDPs) - also known under the names sequential decision theory, stochastic control or stochastic dynamic programming - studies sequential optimisation of stochastic systems by controlling their transition mechanism over time. Each control policy defines a stochastic process and values of objective functions associated with this process. The goal is to select a control policy that optimizes a function of the values generated by the utility functions.
In real life, decisions that are made usually have two types of impact. Firstly, they cost or save resources, such as money or time. Secondly, by influencing the dynamics of the system they have an impact on the future as well. Therefore, the decision with the largest immediate profit may not be good in view of future rewards in many situations. MDPs model this paradigm and can be used to model many important applications in practice. In this course, we provide results on the structure and existence of good policies, on methods for the computation of optimal policies, and illustrate them by applications.
Online learning is the process of providing online control decisions in sequential decision-making problems given knowledge about the optimal controls for the past decision epochs. In this course we apply the online learning techniques on finite-action Markov Decision Processes through Bandit problem.
Contents of the lectures:
1. Model formulation, policies, optimality criteria, the finite horizon.
2. Discounted rewards: optimality equation and solution method.
3. Structural properties.
4. Applications of MDPs.
5. Further topics in MDPs.
6. Online learning using MDPs.
Literature:
see Files folder.
The course is based on: lnmb-LN-NF-OK-updated.pdf.
The files can be found below as well as further background literature.
Prerequisites:
- Elementary knowledge of linear programming (e.g. K.G. Murty, Linear programming, Wiley, 1983).
- Elementary knowledge of probability theory ( e.g. S.M. Ross, A first course in probability, Macmillan, New York, 1976).
- Elementary knowledge of (numerical) analysis (e.g. Banach space; contracting mappings; Newton’s method; Laurent series).
Examination:
Take home problems plus a presentation. You work on the problems in groups of size 2 or 3, each group will present their solution of one of the homework problems on either November 4 or November 11. You can download the assignment here.
Please submit:
A pdf document containing your solutions. Make sure that the last names of all group members are in the file name and that your names are also in the document itself.
Your code, having a file name similar to the previous document, again including the last names of all group members in the file name.
Your presentation slides (incl the last names of all group members in the file name). The deadline for submitting your slides is the date on which your presentation is scheduled (i.e., November 4 or 11).
The deadline for submission is November 3, 23:59. Please submit by uploading the above-mentioned items (report, code, and slides) here.
Presentations:
The presentation schedule can be found here. You have 10 minutes for your presentation, followed by 2-3 minutes time for questions from the audience. To ensure equity in grading, the 10-minute time limit is strict. You will be stopped if you are still presenting after 10 minutes.
Mail address of the lecturers:
Dr. Ir. A. Braaksma: a.braaksma@utwente.nl
Prof. Dr. F.M. Spieksma: spieksma@math.leidenuniv.nl
Lecture slides
Slides lecture 1 (September 9)
Slides lecture 2 (September 16)
Slides lecture 3 (September 23) and additional study material on which these slides are based.
Slides lecture 4 (September 30)
Preliminary program