# Decision Making Under Uncertainty and Reinforcement Learning

## Overview

This course gives an introduction to decision theory and reinforcement learning, focusing on theory and algorithms for reinforcement learning problems. Each module lasts about a week and includes a short lecture on basic concepts, an overview of proofs, and some hands-on work. Assessment is through assignments and a mini-project.

*An advanced version of this course is FDAT115, Advanced Topics in Reinforcement Learning and Decision Making.*

## Modules

This is only a suggested schedule. Points in italics are *optionally covered.*

- Beliefs and decisions [Christos]
  - Beliefs as probabilities.
  - Utility theory.
  - The reinforcement learning problem.
  - Risk, uncertainty and Bayesian decision rules.
  - Regret.

- Section on estimation
  - Sufficient statistics [simple example of finite hypothesis space]
  - Exponential families and Beta-Bernoulli
  - Credible intervals
  - Concentration inequalities (Markov, Chebyshev, Hoeffding)
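
As a rough illustration of two items in the estimation module, the following sketch updates a Beta prior from Bernoulli observations and computes the half-width of a two-sided Hoeffding confidence interval. The function names, the uniform Beta(1, 1) prior and the data are assumptions made for this example; they are not taken from the course material.

```python
import math


def beta_bernoulli_update(alpha, beta, observations):
    """Return the posterior Beta parameters after observing 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def hoeffding_radius(n, delta):
    """Half-width of a two-sided Hoeffding interval for the mean of n i.i.d.
    samples in [0, 1], valid with probability at least 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Hypothetical data: eight coin flips, six of them heads.
observations = [1, 0, 1, 1, 0, 1, 1, 1]

# Start from a uniform Beta(1, 1) prior (an assumption for this example).
alpha_post, beta_post = beta_bernoulli_update(1.0, 1.0, observations)
posterior_mean = alpha_post / (alpha_post + beta_post)

# Frequentist counterpart: the empirical mean plus/minus the Hoeffding radius.
empirical_mean = sum(observations) / len(observations)
radius = hoeffding_radius(len(observations), delta=0.05)

print(f"posterior: Beta({alpha_post}, {beta_post}), mean {posterior_mean:.3f}")
print(f"empirical mean {empirical_mean:.3f} +/- {radius:.3f}")
```

The posterior mean and the Hoeffding interval answer different questions: the first is a Bayesian point estimate under the chosen prior, the second a distribution-free frequentist bound.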

- Reinforcement learning: fundamentals [Christos]
  - (Stochastic) Bandit problems.
  - Example: UCB.
  - Example: Thompson Sampling.
  - Markov decision processes (MDP).
  - Backwards Induction: Main Theorem
  - Implementation of UCB / Thompson / BI
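
To give a flavour of the UCB implementation exercise, here is a minimal sketch of the standard UCB1 index policy on a Bernoulli bandit. The arm means, horizon and seed are hypothetical, and the exploration bonus `sqrt(2 ln t / n)` is the textbook UCB1 choice, which may differ from the variant used in the lectures.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the pull count of each arm."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total reward collected per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Simulated Bernoulli reward; arm_means is unknown to the learner.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
print("pulls per arm:", counts)
```

With these well-separated means, most pulls should go to the last arm, while the other arms are still sampled occasionally because their confidence bonus grows with `log(t)`.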

- Infinite horizon MDPs: Exact algorithms
  - MDP Theory
  - Value Iteration and convergence proof
  - Policy Iteration.
  - *Temporal differences.*
  - *Linear programming.*
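
Value iteration, listed in the module above, can be sketched in a few lines for a finite discounted MDP. The data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as expected reward) and the two-state MDP are assumptions chosen for this example only.

```python
def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until the sup-norm change
    between successive value functions drops below tol."""
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        V_new = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def greedy_policy(P, R, gamma, V):
    """Return the action maximising the one-step lookahead in every state."""
    return [
        max(
            range(len(P[s])),
            key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]),
        )
        for s in range(len(P))
    ]


# Hypothetical two-state MDP: staying in state 1 pays reward 1 per step.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to 0
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]

V = value_iteration(P, R, gamma=0.9)
policy = greedy_policy(P, R, 0.9, V)
print("V =", [round(v, 4) for v in V], "policy =", policy)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and state 0 is worth 0.9 times that.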

- Finite, discounted MDPs: Stochastic approximation
  - Sarsa & Q-Learning
  - Stochastic approximation theory.
  - Actor-Critic architectures
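
As a hedged illustration of the Q-Learning item above, the following sketch runs tabular Q-learning on a small deterministic MDP. The environment function `toy_step`, the constant step size and the uniformly random behaviour policy are all assumptions made to keep the example short; in practice an epsilon-greedy policy and a decaying step size are more common.

```python
import random


def q_learning(step, n_states, n_actions, gamma,
               alpha=0.1, n_steps=20000, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        # Off-policy exploration: act uniformly at random.
        action = rng.randrange(n_actions)
        reward, next_state = step(state, action)
        # Move Q(s, a) towards the one-step bootstrapped target.
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state
    return Q


def toy_step(state, action):
    """Hypothetical two-state environment returning (reward, next_state)."""
    if state == 0:
        return (0.0, 1) if action == 1 else (0.0, 0)
    return (1.0, 1) if action == 0 else (0.0, 0)


Q = q_learning(toy_step, n_states=2, n_actions=2, gamma=0.9)
learned_policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
print("Q =", [[round(q, 3) for q in row] for row in Q])
print("greedy policy =", learned_policy)
```

Because the update uses `max(Q[next_state])` rather than the value of the action actually taken next, this is Q-learning and not Sarsa; replacing the maximum with the next sampled action's value would give the on-policy variant.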

- Continuous, discounted MDPs: Function approximation [Hannes]
  - Approximate policy iteration (LSTD, CLSP)
  - Fitted value iteration (aka Deep Q-Learning)
  - Policy gradient
  - Implementation of one ADP algorithm and comparison / OpenAI Gym.

- Bayesian reinforcement learning [Divya]
  - Bayes-Adaptive MDPs
  - Partially Observable MDPs
  - Online planning algorithms (UCT, Branch and Bound, etc.)

- Large-scale RL
  - Double/Deep/Bootstrapped Q-Learning, etc.
  - AlphaZero

## Course organisation and material

- Google classroom (for assignments and readings)
- QA Platform (for lingering questions)
- Draft textbook (main source material)

The course takes place in LP3/4 and is worth 7.5 credits.

Lecturer: Debabrota Basu (Some lectures by Christos Dimitrakakis)

**Prerequisites**

- Essential: [Math] Probability, Calculus; [Programming] Good Python skills

While the course will go over the basic ideas, the focus will be on fundamentals, proofs and algorithms. We will also run a section on statistical inference for those who have not taken any advanced statistics courses before.

Start date: **12 February.** Go here to join the QA.

Wednesdays, 10:00-12:00, or Thursdays, 14:00-16:00.

**Detailed Schedule**

- 12.2: EDIT Analysen. Beliefs and decisions [Ch. 1, 2, 3]
- 13.2: EL42. Section on Bayesian analysis and estimation [Ch. 4]
- 19.2: EDIT 5128. *Optional Tutorial*
- 20.2: EL42. RL fundamentals [Ch. 6]
- 27.2: EDIT Analysen. MDP Theory / Value Iteration [Ch. 6]
- 4.3: EDIT Analysen. Policy Iteration / Temporal Differences / Linear Programming [Ch. 6]
- 11.3: EDIT Analysen. Sarsa / Q-Learning [Ch. 7]
- 12.3: EDIT Analysen. Stochastic Approximation / Actor-Critic [Ch. 7]
- 18.3: EDIT Analysen. Function approximation [Ch. 8]
- 19.3: EL42. Gradient methods [Ch. 8]
- 25.3: EDIT Analysen. Bayesian RL [Ch. 9]
- 26.3: EDIT Analysen. Large-scale RL
- **29.4: Project presentations**
- **08.5: Project reports**

## Readings

The course mainly follows the structure of our draft book, "Decision Making Under Uncertainty and Reinforcement Learning".

Other useful books:

- DeGroot, Optimal Statistical Decisions
- Puterman, Markov Decision Processes
- Bertsekas and Tsitsiklis, Neuro-Dynamic Programming

Other material will be referred to in the course assignments.

## Examination

10% participation, 50% assignments, 40% project

**Assignments.** Completion of 4 assignments is required for the course, focusing on the fundamentals.

**Project.** A larger mini-project, combining elements of the assignments for optimal exploration, will take place at the end of the course. The projects are done in groups of 2-3 students.