Advanced Topics in Reinforcement Learning and Decision Making

Overview

This course gives a thorough overview of the fundamental theory of reinforcement learning, as well as a look into advanced algorithms and modern proof techniques for reinforcement learning problems. Each module lasts about a week and includes a short lecture on basic concepts, an overview of proofs, and some hands-on work. Assessment is through assignments (including reviewing and implementing research papers) and a mini-project.

Modules

This is only a suggested schedule. Points in italics are optionally covered.
  1. Beliefs and decisions [Christos] 
    1. Beliefs as probabilities.
    2. Hierarchies of decision making problems.
    3. The reinforcement learning problem.
    4. Risk, uncertainty and Bayesian decision rules.
    5. PAC bounds.
    6. Regret.
  2. Reinforcement learning: fundamentals [Christos]
    1. (Stochastic) Bandit problems.
    2. UCB.
    3. Thompson Sampling.
    4. Markov decision processes (MDP). 
    5. Backwards Induction.
    6. Monte-Carlo Tree Search.
  3. Infinite horizon MDPs: Exact algorithms [Divya]
    1. Value Iteration.
    2. Policy Iteration.
    3. Temporal differences.
    4. Linear programming.
    5. Policy gradient.
  4. Finite, discounted MDPs: Stochastic approximation [Aristide]
    1. Sarsa & Q-Learning
    2. Stochastic approximation theory.
    3. Finite Sample Bounds.
    4. Actor-Critic architectures
    5. Policy gradient.
  5. Continuous, discounted MDPs: Function approximation [Hannes]
    1. Approximate policy iteration (LSTD, CLSP)
    2. Fitted value iteration (aka Deep Q-Learning)
    3. Policy gradient
  6. Bayesian reinforcement learning [Divya]
    1. Bayes-Adaptive MDPs
    2. Partially Observable MDPs
    3. Online planning algorithms (UCT, Branch and Bound, etc)
    4. Expectation Maximisation
  7. Bounds for reinforcement learning in MDPs [Aristide]
    1. Finite MDPs: UCRL2
    2. Finite MDPs: Thompson sampling
  8. Hierarchical Reinforcement Learning [Hannes]
    1. Planning hierarchies: The Options Framework
    2. Observation hierarchies: CTW and extensions (e.g. AI-xi)
    3. Factored MDPs
  9. Interaction with humans [Christos]
    1. Inverse Reinforcement Learning
    2. Collaborative Reinforcement Learning
  10. Selected topics on societal issues [Christos]
    1. Safety
    2. Fairness
    3. Privacy
    4. Mechanism design

Course organisation and material

The course takes place in LP3/4 and is worth 7.5 credits.

Schedule

Start date: 13 February. Go here, or e-mail me at chrdimi at chalmers.se if you want to join.
Tuesdays and Thursdays, 13:30-15:00. [Calendar]
Locations: EDIT 8103.
  • 13.2: EDIT 8103 Beliefs and decisions
  • 15.2: EDIT 8103 RL fundamentals [Reading]
  • 20.2: EDIT 8103 Exact algorithms
  • 22.2: EDIT 8103 [Reading 1, Reading 2, Reading 3]
  • 27.2: EDIT 8103 Stochastic approximation
  • 01.3: EDIT 8103 [Reading]
  • 06.3: EDIT 8103 Function approximation
  • 08.3: EDIT 8103 [Reading]
  • 13.3: EDIT 8103 Bayesian RL
  • 15.3: EDIT 8103 [Reading]
  • 20.3: EDIT 8103 Regret bounds
  • 22.3:
  • 27.3: EDIT 8103 Options
  • 29.3: EDIT 8103 [Reading]
  • 03.4: [Easter]
  • 05.4: [Easter]
  • 10.4: EDIT 8103 Multi-agent RL
  • 12.4: EDIT 8103 Societal issues
  • 17.4: Project presentations (13:30-17:00)
  • 26.4: Project reports

Readings

The course mainly follows the structure of our draft book, "Decision Making Under Uncertainty and Reinforcement Learning". Other material will be referred to in the reading assignments.

Assignments and mini-projects

There will be 3 assignments focusing on the fundamentals. Each Thursday reading assignment has a discussion leader, who is also responsible for presenting Tuesday's introductory material. A larger mini-project, combining elements of the assignments for optimal exploration, will take place at the end of the course.

Preparatory meetings

These informal meetings are there to cover some basic ground, and help the teaching staff select material. Students do not need to attend.
  • 16.1: EDIT 8103 Prep: Fundamentals
  • 18.1: EDIT 8103 Prep: Exact algorithms
  • 23.1: EDIT 8103 Prep: Stochastics
  • 25.1: EDIT 6128 Prep: Function approximation
  • 30.1: EDIT 8103 Prep: Bayesian RL
  • 01.2: EDIT 8103 Prep: Regret bounds
  • 06.2: EDIT 8103 Prep: Options
  • 08.2: EDIT 8103 Prep: Miscellany