Advanced Topics in Reinforcement Learning and Decision Making
This course gives a thorough overview of the fundamental theory of reinforcement learning, as well as a look at advanced algorithms and modern proof techniques for reinforcement learning problems. Each module lasts about a week and includes a short lecture on basic concepts, an overview of proofs, and some hands-on work. Assessment is through assignments (including reviewing and implementing research papers) and a mini-project.
This is an advanced version of FDAT070, Decision Making Under Uncertainty.
This is only a suggested schedule; topics in italics are optional.
- Beliefs and decisions [Christos]
- Beliefs as probabilities.
- Hierarchies of decision making problems.
- The reinforcement learning problem
- Risk, uncertainty and Bayesian decision rules (see the sketch below).
- PAC bounds.
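To give a flavour of this module, here is a minimal sketch of a Bayesian decision rule; the coin-betting setup and the `bayes_decision` helper are illustrative inventions, not course material. The belief is a Beta posterior over a Bernoulli parameter, and the rule picks the action with the smallest posterior expected loss.

```python
import numpy as np

def bayes_decision(successes, failures, losses):
    """Pick the action minimising posterior expected loss.

    The belief is a Beta(1 + successes, 1 + failures) posterior over the
    Bernoulli parameter theta; losses[a][y] is the loss of action a when
    the outcome is y.
    """
    p = (1 + successes) / (2 + successes + failures)  # posterior mean of theta
    expected = [(1 - p) * l[0] + p * l[1] for l in losses]
    return int(np.argmin(expected))

# Example: bet (action 1) only if heads is likely enough.
losses = [[0, 0],    # action 0: don't bet, no loss either way
          [1, -2]]   # action 1: lose 1 on tails, gain 2 on heads
print(bayes_decision(successes=7, failures=3, losses=losses))  # -> 1
```

PAC bounds then quantify how much data such a rule needs before its loss is close to the optimum with high probability.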
- Reinforcement learning: fundamentals [Christos]
- (Stochastic) Bandit problems.
- Thompson Sampling.
- Markov decision processes (MDP).
- Backwards Induction.
- Implementation of UCB / Thompson / BI (see the sketch below).
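To give an idea of the hands-on part, a minimal UCB1 sketch for Bernoulli bandits; the arm means, horizon and environment loop are illustrative assumptions, not the course's reference implementation.

```python
import numpy as np

def ucb1(means, horizon, rng=np.random.default_rng(0)):
    """UCB1 on Bernoulli arms: always play the arm with the highest
    empirical mean plus confidence bonus."""
    K = len(means)
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(1, horizon + 1):
        if t <= K:                       # play every arm once to initialise
            a = t - 1
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)
            a = int(np.argmax(sums / counts + bonus))
        r = rng.binomial(1, means[a])    # Bernoulli reward
        counts[a] += 1
        sums[a] += r
    return sums.sum()

print(ucb1([0.3, 0.5, 0.7], horizon=10_000))
```

Thompson sampling would instead draw a mean for each arm from its Beta posterior and play the argmax; backwards induction solves the finite-horizon case exactly.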
- Infinite horizon MDPs: Exact algorithms [Divya]
- Value Iteration.
- Policy Iteration.
- Temporal differences.
- Linear programming.
- Implementation of MPI (modified policy iteration; see the sketch below).
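For reference, a minimal value-iteration sketch on a tabular MDP; the two-state chain is a made-up example. Policy iteration and MPI interleave this kind of evaluation sweep with explicit policy improvement steps.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[a, s, s'] are transition probabilities, R[a, s] expected rewards."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * P @ V        # Q[a, s] = R[a, s] + gamma * E[V(s')]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Two-state chain: action 1 moves to state 1 at a small cost; state 1 pays off.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay put
              [[0.0, 1.0], [0.0, 1.0]]])  # action 1: go to state 1
R = np.array([[0.0, 1.0],                 # action 0 rewards in states 0, 1
              [-0.1, 1.0]])               # action 1 rewards in states 0, 1
V, policy = value_iteration(P, R)
print(V, policy)   # V approx [8.9, 10.]; the tie in state 1 resolves to action 0
```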
- Finite, discounted MDPs: Stochastic approximation [Aristide]
- Sarsa & Q-Learning (see the sketch below).
- Stochastic approximation theory.
- Finite Sample Bounds.
- Actor-Critic architectures
- Policy gradient.
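A minimal tabular Q-learning sketch; the chain environment, optimistic initialisation and 1/N step sizes are illustrative choices (the 1/N schedule is one that satisfies the Robbins-Monro conditions appearing in the stochastic-approximation proofs).

```python
import numpy as np

def q_learning(step, n_states, n_actions, episodes=200,
               gamma=0.95, eps=0.1, rng=np.random.default_rng(0)):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` must return (reward, next_state, done).
    """
    Q = np.ones((n_states, n_actions))    # optimistic initialisation
    N = np.zeros((n_states, n_actions))   # visit counts for step sizes
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            r, s2, done = step(s, a)
            N[s, a] += 1
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += (target - Q[s, a]) / N[s, a]   # 1/N step size
            s = s2
    return Q

# Toy chain: moving right (a=1) eventually reaches the rewarding goal state.
def chain_step(s, a, n=5):
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == n - 1 else 0.0), s2, s2 == n - 1

Q = q_learning(chain_step, n_states=5, n_actions=2)
print(Q.argmax(axis=1))   # greedy action per state (state 4 is terminal)
```

Sarsa differs only in the target: it uses the action actually taken next rather than the max.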
- Continuous, discounted MDPs: Function approximation [Hannes]
- Approximate policy iteration (LSTD, LSPI)
- Fitted value iteration (aka Deep Q-Learning)
- Policy gradient
- Implementation of one ADP algorithm and comparison on OpenAI Gym (see the fitted Q-iteration sketch below).
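A minimal fitted Q-iteration sketch with a linear least-squares regressor; the feature map and batch layout are illustrative assumptions. Deep Q-learning follows the same pattern, with a neural network and stochastic gradient steps in place of the closed-form fit.

```python
import numpy as np

def fitted_q_iteration(data, phi, n_actions, gamma=0.95, iters=50):
    """Fitted Q-iteration on a fixed batch of transitions.

    data: list of (s, a, r, s2, done) tuples; phi(s, a) -> feature vector.
    Each iteration regresses the Q-function onto one-step Bellman targets.
    """
    X = np.array([phi(s, a) for s, a, _, _, _ in data])
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        y = np.array([r + (0.0 if done else
                           gamma * max(phi(s2, b) @ w for b in range(n_actions)))
                      for _, _, r, s2, done in data])
        w, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares fit
    return w
```

With one-hot features over state-action pairs this reduces to value iteration on the empirical model; LSTD-style methods instead fix the policy and solve the projected Bellman equation directly.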
- Bayesian reinforcement learning [Divya]
- Bayes-Adaptive MDPs (see the posterior-sampling sketch below)
- Partially Observable MDPs
- Online planning algorithms (UCT, Branch and Bound, etc.)
- Expectation Maximisation
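As a small sketch of the Bayesian view: transition counts define a Dirichlet posterior for each state-action pair, and posterior sampling (PSRL-style) draws one plausible MDP from it, solves it, and follows the resulting policy for an episode. The counts layout and function name below are illustrative.

```python
import numpy as np

def sample_mdp(counts, prior=1.0, rng=np.random.default_rng(0)):
    """Draw one plausible transition model from the posterior.

    counts[s, a, s2] = number of observed s --a--> s2 transitions;
    each row P(. | s, a) has a Dirichlet(prior + counts[s, a]) posterior.
    """
    n_s, n_a, _ = counts.shape
    P = np.zeros((n_a, n_s, n_s))
    for s in range(n_s):
        for a in range(n_a):
            P[a, s] = rng.dirichlet(prior + counts[s, a])
    return P   # solve this MDP (e.g. by value iteration) and act greedily

counts = np.zeros((2, 2, 2))
counts[0, 1, 1] = 3.0        # we saw s=0 --a=1--> s=1 three times
print(sample_mdp(counts))
```

The Bayes-adaptive MDP instead makes the belief itself part of the state; since that state space blows up, one resorts to the online planning algorithms above.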
- Bounds for reinforcement learning in MDPs [Aristide]
- Finite MDPs: UCRL2 (see the regret bound below)
- Finite MDPs: Thompson sampling
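To fix the flavour of the results in this module: for UCRL2, Jaksch et al. (2010) show that with probability at least 1 − δ, the regret after T steps in any communicating MDP satisfies

```latex
\Delta(T) \;=\; T\rho^* - \sum_{t=1}^{T} r_t
\;\le\; 34 \, D S \sqrt{A T \log(T/\delta)},
```

where S and A are the numbers of states and actions, D the diameter of the MDP, and ρ* the optimal average reward. The Thompson sampling analysis yields bounds of a similar order, in expectation over the prior.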
- Hierarchical Reinforcement Learning [Hannes]
- Planning hierarchies: The Options Framework (see the sketch below)
- Observation hierarchies: CTW and extensions (e.g. AIXI)
- Factored MDPs
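A minimal sketch of the option abstraction of Sutton, Precup & Singh: an option is a triple (initiation set, internal policy, termination condition), and at the planning level it behaves like a single temporally extended action. The class layout and `run_option` helper are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option (I, pi, beta) in the sense of Sutton, Precup & Singh."""
    initiation: Set[int]                  # states where the option may start
    policy: Callable[[int], int]          # internal policy pi(s) -> action
    termination: Callable[[int], float]   # beta(s): probability of stopping

def run_option(opt, s, step, gamma=0.95, rng=random.Random(0)):
    """Execute an option until it terminates, where step(s, a) -> (r, s').

    Returns the discounted return accumulated inside the option and the
    state in which it stopped -- exactly the quantities SMDP planning needs.
    """
    total, discount = 0.0, 1.0
    while True:
        r, s = step(s, opt.policy(s))
        total += discount * r
        discount *= gamma
        if rng.random() < opt.termination(s):
            return total, s
```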
- Interaction with humans [Christos]
- Inverse Reinforcement Learning
- Collaborative Reinforcement Learning
- Selected topics on societal issues [Christos]
- Mechanism design
Course organisation and material
The course takes place in LP3/4 and is worth 7.5 credits.
- Essential: [Math] probability, calculus; [Programming] good Python skills; [Machine learning] basic reinforcement learning.
Basic knowledge of reinforcement learning can be obtained from, e.g.:
- David Silver's RL course.
- The basic version of FDAT070, Decision Making Under Uncertainty.
- Stanford's machine learning course.
While the course will go over the basic ideas, the focus will be on fundamentals, proofs and the state of the art.
Start date: 13 February. E-mail me at chrdimi at chalmers.se if you want to join.
Tuesdays and Thursdays, 13:30-15:00.
Location: EDIT 8103, unless otherwise noted below.
- 13.2: EDIT 8103 Beliefs and decisions [Ch 1,2]
- 15.2: EDIT 8103 RL fundamentals [Ch 3,4]
- 20.2: EDIT 5128 Exact algorithms [Ch 6]
- 22.2: EDIT 8103 [Reading 1, Reading 2, Reading 3]
- 27.2: EDIT 8103 Stochastic approximation [Ch 7]
- 01.3: EDIT 8103 [Reading 1, Reading 2]
- 06.3: EDIT 8103 Function approximation [Ch 8]
- 08.3: EDIT 8103 [Reading]
- 13.3: EDIT 8103 Bayesian RL [Ch 9]
- 15.3: EDIT 8103 [Reading 1, Reading 2, Reading 3]
- 20.3: EDIT 8103 Regret bounds [Ch 10]
- 22.3: EDIT 8103 [Reading 1, Reading 2]
- 27.3: EDIT 8103 Options
- 29.3: EDIT 8103 [Reading]
- 03.4: [Easter]
- 05.4: [Easter]
- 17.4: EDIT 8103 Multi-agent RL
- 19.4: EDIT 8103 Societal issues
- 01.5: Project reports
- 03.5: Project presentations
The course mainly follows the structure of our draft book, "Decision Making Under Uncertainty and Reinforcement Learning". Other material will be referred to in the reading assignments.
Grading: 10% participation, 50% assignments, 40% project.
Assignments. Completion of 4 assignments, focusing on the fundamentals, is required for the course. Those acting as discussion leaders on Thursdays are also responsible for presenting Tuesday's introductory material; this counts as 2 assignments.
Project. A larger mini-project on optimal exploration, combining elements of the assignments, takes place at the end of the course. Projects are done in groups of 2-3 students.
Preparatory meetings. These informal meetings cover some basic ground and help the teaching staff select material. Students do not need to attend.
- 16.1: EDIT 8103 Prep: Fundamentals
- 18.1: EDIT 8103 Prep: Exact algorithms
- 23.1: EDIT 8103 Prep: Stochastics
- 25.1: EDIT 6128 Prep: Function approximation
- 30.1: EDIT 8103 Prep: Bayesian RL
- 01.2: EDIT 8103 Prep: Regret bounds
- 06.2: EDIT 8103 Prep: Options
- 08.2: EDIT 8103 Prep: Miscellany