# Advanced Topics in Reinforcement Learning and Decision Making

## Overview

This course gives a thorough overview of the fundamental theory of reinforcement learning, as well as a look into advanced algorithms and modern proof techniques for reinforcement learning problems. Each module lasts about a week and includes a short lecture on basic concepts, an overview of proofs and some hands-on work. Assessment is through assignments (including reviewing and implementing research papers) and a mini-project.

This is an advanced version of *FDAT070, Decision Making Under Uncertainty.*

## Modules

This is only a suggested schedule. Points in italics are *optionally covered.*

- Beliefs and decisions [Christos]
  - Beliefs as probabilities.
  - Hierarchies of decision making problems.
  - The reinforcement learning problem.
  - Risk, uncertainty and Bayesian decision rules.
  - PAC bounds.
  - Regret.

- Reinforcement learning: fundamentals [Christos]
  - (Stochastic) bandit problems.
  - UCB.
  - Thompson sampling.
  - Markov decision processes (MDPs).
  - Backwards induction.
  - Implementation of UCB / Thompson sampling / backwards induction.

- Infinite horizon MDPs: Exact algorithms [Divya]
  - Value iteration.
  - Policy iteration.
  - Temporal differences.
  - Linear programming.
  - Implementation of modified policy iteration (MPI).

- Finite, discounted MDPs: Stochastic approximation [Aristide]
  - Sarsa & Q-learning.
  - Stochastic approximation theory.
  - Finite sample bounds.
  - *Actor-critic architectures.*
  - *Policy gradient.*

- Continuous, discounted MDPs: Function approximation [Hannes]
  - Approximate policy iteration (LSTD, LSPI).
  - Fitted value iteration (e.g. Deep Q-learning).
  - Policy gradient.
  - Implementation of one ADP algorithm and comparison on OpenAI Gym.

- Bayesian reinforcement learning [Divya]
  - Bayes-adaptive MDPs.
  - Partially observable MDPs.
  - Online planning algorithms (UCT, branch and bound, etc.).
  - *Expectation maximisation.*

- Bounds for reinforcement learning in MDPs [Aristide]
  - Finite MDPs: UCRL2.
  - Finite MDPs: Thompson sampling.

- Hierarchical Reinforcement Learning [Hannes]
  - Planning hierarchies: the options framework.
  - Observation hierarchies: CTW and extensions (e.g. AIXI).
  - *Factored MDPs.*

- Interaction with humans [Christos]
  - Inverse reinforcement learning.
  - Collaborative reinforcement learning.

- Selected topics on societal issues [Christos]
  - Safety.
  - Fairness.
  - Privacy.
  - Mechanism design.

## Course organisation and material

The course takes place in LP3/4 and is worth 7.5 credits.

### Prerequisites

Essential:

- Mathematics: probability, calculus.
- Programming: good Python skills.
- Machine learning: basic reinforcement learning.

Basic knowledge of reinforcement learning can be found in, e.g.:

- *David Silver's RL course*.
- *FDAT070, Decision Making Under Uncertainty* (the basic version of this course).
- Stanford's machine learning course.

While the course will go over the basic ideas, the focus will be on fundamentals, proofs and the state of the art.

### Schedule

Start date: **13 February.** Go here, or e-mail me at chrdimi at chalmers.se if you want to join.

Tuesdays and Thursdays, 13:30-15:00. [Calendar]

Location: EDIT 8103, unless otherwise noted.

- 13.2: EDIT 8103 Beliefs and decisions [Ch 1,2]
- 15.2: EDIT 8103 RL fundamentals [Ch 3,4]
- 20.2: EDIT 5128 Exact algorithms [Ch 6]
- 22.2: EDIT 8103 [Reading 1, Reading 2, Reading 3]
- 27.2: EDIT 8103 Stochastic approximation [Ch 7]
- 01.3: EDIT 8103 [Reading 1, Reading 2]
- 06.3: EDIT 8103 Function approximation [Ch 8]
- 08.3: EDIT 8103 [Reading]
- 13.3: EDIT 8103 Bayesian RL [Ch 9]
- 15.3: EDIT 8103 [Reading 1, Reading 2, Reading 3]
- 20.3: EDIT 8103 Regret bounds [Ch 10]
- 22.3: EDIT 8103 [Reading 1, Reading 2]
- 27.3: EDIT 8103 Options
- 29.3: EDIT 8103 [Reading]
- 03.4: [Easter]
- 05.4: [Easter]
- 17.4: EDIT 8103 Multi-agent RL
- 19.4: EDIT 8103 Societal issues
- **01.5: Project reports**
- **03.5: Project presentations**

## Readings

The course mainly follows the structure of our draft book, "Decision Making Under Uncertainty and Reinforcement Learning". Other material will be referred to in the reading assignments.

## Examination

10% participation, 50% assignments, 40% project

**Assignments.** Completing 4 assignments, which focus on the fundamentals, is required for the course. Students acting as discussion leaders on Thursdays are also responsible for presenting Tuesday's introductory material, which counts as 2 assignments.

**Project.** A larger mini-project, combining elements of the assignments on optimal exploration, takes place at the end of the course. Projects are done in groups of 2-3 students.

## Preparatory meetings

These *informal* meetings cover some basic ground and help the teaching staff select material. Attendance is optional for students.

- *16.1: EDIT 8103 Prep: Fundamentals*
- *18.1: EDIT 8103 Prep: Exact algorithms*
- *23.1: EDIT 8103 Prep: Stochastics*
- *25.1: EDIT 6128 Prep: Function approximation*
- *30.1: EDIT 8103 Prep: Bayesian RL*
- *01.2: EDIT 8103 Prep: Regret bounds*
- *06.2: EDIT 8103 Prep: Options*
- *08.2: EDIT 8103 Prep: Miscellany*