Advanced Topics in Reinforcement Learning and Decision Making

Overview

This course gives a thorough overview of the fundamental theory of reinforcement learning, together with a look at advanced algorithms and modern proof techniques for reinforcement learning problems. Each module lasts about a week and includes a short lecture on basic concepts, an overview of proofs, and some hands-on work. Assessment is through assignments (including reviewing and implementing research papers) and a mini-project.

This is an advanced version of FDAT070, Decision Making Under Uncertainty.

Modules

This is only a suggested schedule. Points in italics are optionally covered.
  1. Beliefs and decisions [Christos] 
    1. Beliefs as probabilities.
    2. Hierarchies of decision making problems.
    3. The reinforcement learning problem.
    4. Risk, uncertainty and Bayesian decision rules.
    5. PAC bounds.
    6. Regret.
  2. Reinforcement learning: fundamentals [Christos]
    1. (Stochastic) Bandit problems.
    2. UCB.
    3. Thompson Sampling.
    4. Markov decision processes (MDP). 
    5. Backwards Induction.
    6. Implementation of UCB / Thompson / BI (a UCB sketch appears after this list).
  3. Infinite horizon MDPs: Exact algorithms [Divya]
    1. Value Iteration (a minimal sketch appears after this list).
    2. Policy Iteration.
    3. Temporal differences.
    4. Linear programming.
    5. Implementation of modified policy iteration (MPI).
  4. Finite, discounted MDPs: Stochastic approximation [Aristide]
    1. Sarsa & Q-Learning.
    2. Stochastic approximation theory.
    3. Finite Sample Bounds.
    4. Actor-Critic architectures.
    5. Policy gradient.
  5. Continuous, discounted MDPs: Function approximation [Hannes]
    1. Approximate policy iteration (LSTD, CLSP).
    2. Fitted value iteration (Deep Q-Learning is a prominent instance).
    3. Policy gradient.
    4. Implementation of one approximate dynamic programming (ADP) algorithm and comparison on OpenAI Gym.
  6. Bayesian reinforcement learning [Divya]
    1. Bayes-Adaptive MDPs.
    2. Partially Observable MDPs.
    3. Online planning algorithms (UCT, Branch and Bound, etc.).
    4. Expectation Maximisation.
  7. Bounds for reinforcement learning in MDPs [Aristide]
    1. Finite MDPs: UCRL2.
    2. Finite MDPs: Thompson sampling.
  8. Hierarchical Reinforcement Learning [Hannes]
    1. Planning hierarchies: The Options Framework.
    2. Observation hierarchies: CTW and extensions (e.g. AIXI).
    3. Factored MDPs.
  9. Interaction with humans [Christos]
    1. Inverse Reinforcement Learning.
    2. Collaborative Reinforcement Learning.
  10. Selected topics on societal issues [Christos]
    1. Safety.
    2. Fairness.
    3. Privacy.
    4. Mechanism design.
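
As a taste of the hands-on work in module 2, here is a minimal sketch of the UCB1 algorithm on a simulated Bernoulli bandit. The Bernoulli arms, the particular exploration bonus, and all names in the code are illustrative assumptions, not the course's reference implementation.

    import numpy as np

    def ucb1(arm_means, horizon, rng=None):
        """Minimal UCB1 sketch on a simulated Bernoulli bandit.

        arm_means: true success probabilities, used only to simulate rewards.
        Returns the total reward collected over `horizon` steps.
        """
        if rng is None:
            rng = np.random.default_rng()
        n_arms = len(arm_means)
        counts = np.zeros(n_arms)   # pulls per arm
        totals = np.zeros(n_arms)   # cumulative reward per arm

        # Pull each arm once so every empirical mean is defined.
        for a in range(n_arms):
            totals[a] += rng.binomial(1, arm_means[a])
            counts[a] += 1

        for t in range(n_arms, horizon):
            # Empirical mean plus an exploration bonus that shrinks
            # as an arm is pulled more often.
            bonus = np.sqrt(2 * np.log(t + 1) / counts)
            a = int(np.argmax(totals / counts + bonus))
            totals[a] += rng.binomial(1, arm_means[a])
            counts[a] += 1

        return totals.sum()

    # Example: UCB1 should concentrate its pulls on the 0.7 arm.
    print(ucb1([0.3, 0.5, 0.7], horizon=10_000))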
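
Similarly, the exact algorithms of module 3 can be prototyped in a few lines. Below is a minimal sketch of value iteration for a finite discounted MDP; the random transition kernel P and reward matrix R are hypothetical stand-ins for the course's actual examples.

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-6):
        """Value iteration sketch for a finite discounted MDP.

        P: transitions, shape (n_actions, n_states, n_states).
        R: rewards, shape (n_states, n_actions).
        Returns the optimal value function and a greedy policy.
        """
        V = np.zeros(R.shape[0])
        while True:
            # Bellman backup: Q(s,a) = R(s,a) + gamma * sum_s' P(a,s,s') V(s')
            Q = R + gamma * np.einsum('ast,t->sa', P, V)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)
            V = V_new

    # Example on a random 5-state, 2-action MDP (hypothetical data).
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(5), size=(2, 5))  # each row sums to one
    R = rng.random((5, 2))
    V, pi = value_iteration(P, R)
    print(V, pi)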


Course organisation and material

The course takes place in LP3/4 and is worth 7.5 credits.
Prerequisites
  • Essential: [Math] Probability, calculus. [Programming] Good Python skills. [Machine learning] Basic reinforcement learning.
Basic knowledge of reinforcement learning can be obtained from, e.g.:
  1. David Silver's RL course.
  2. The basic version of FDAT070, Decision Making Under Uncertainty.
  3. Stanford's machine learning course.
While the course will go over the basic ideas, the focus will be on fundamentals, proofs and the state of the art.
Schedule
Start date: 13 February. E-mail me at chrdimi at chalmers.se if you want to join.
Tuesdays and Thursdays, 13:30-15:00. [Calendar]
Location: EDIT 8103, unless otherwise noted below.
  • 13.2: EDIT 8103 Beliefs and decisions [Ch 1,2]
  • 15.2: EDIT 8103 RL fundamentals [Ch 3,4]
  • 20.2: EDIT 5128 Exact algorithms [Ch 6]
  • 22.2: EDIT 8103 [Reading 1, Reading 2, Reading 3]
  • 27.2: EDIT 8103 Stochastic approximation [Ch 7]
  • 01.3: EDIT 8103 [Reading 1, Reading 2]
  • 06.3: EDIT 8103 Function approximation [Ch 8]
  • 08.3: EDIT 8103 [Reading]
  • 13.3: EDIT 8103 Bayesian RL [Ch 9]
  • 15.3: EDIT 8103 [Reading 1, Reading 2, Reading 3]
  • 20.3: EDIT 8103 Regret bounds [Ch 10]
  • 22.3: EDIT 8103 [Reading 1, Reading 2]
  • 27.3: EDIT 8103 Options
  • 29.3: EDIT 8103 [Reading]
  • 03.4: [Easter]
  • 05.4: [Easter]
  • 17.4: EDIT 8103 Multi-agent RL
  • 19.4: EDIT 8103 Societal issues
  • 01.5: Project reports due
  • 03.5: Project presentations

Readings

The course mainly follows the structure of our draft book, "Decision Making Under Uncertainty and Reinforcement Learning". Other material will be referred to in the reading assignments.

Examination

10% participation, 50% assignments, 40% project

Assignments. Completion of four assignments, focusing on the fundamentals, is required for the course. Students acting as discussion leaders on a Thursday are also responsible for presenting that week's Tuesday introductory material; this counts as two assignments.

Project. A mini-project combining elements of the assignments, with a focus on optimal exploration, takes place at the end of the course. Projects are done in groups of 2-3 students.

Preparatory meetings

These informal meetings cover some basic ground and help the teaching staff select material. Students do not need to attend.
  • 16.1: EDIT 8103 Prep: Fundamentals
  • 18.1: EDIT 8103 Prep: Exact algorithms
  • 23.1: EDIT 8103 Prep: Stochastics
  • 25.1: EDIT 6128 Prep: Function approximation
  • 30.1: EDIT 8103 Prep: Bayesian RL
  • 01.2: EDIT 8103 Prep: Regret bounds
  • 06.2: EDIT 8103 Prep: Options
  • 08.2: EDIT 8103 Prep: Miscellany