# Decision Making Under Uncertainty and Reinforcement Learning

## Overview

This course gives an introduction to decision theory and reinforcement learning, focusing on theory and algorithms for reinforcement learning problems. Each module lasts about a week and includes a short lecture on basic concepts, an overview of proofs, and some hands-on work. Assessment is through assignments and a mini-project.

An advanced version of this course is FDAT115, Advanced Topics in Reinforcement Learning and Decision Making.

## Modules

This is only a suggested schedule. Points in italics are optionally covered.

1. Beliefs and decisions [Christos]
   1. Beliefs as probabilities.
   2. Utility theory.
   3. The reinforcement learning problem.
   4. Risk, uncertainty and Bayesian decision rules.
   5. Regret.
2. Section on estimation
   1. Sufficient statistics [simple example of a finite hypothesis space].
   2. Exponential families and Beta-Bernoulli.
   3. Credible intervals.
   4. Concentration inequalities (Markov, Chebyshev, Hoeffding).
3. Reinforcement learning: fundamentals [Christos]
   1. (Stochastic) bandit problems.
   2. Example: UCB.
   3. Example: Thompson sampling.
   4. Markov decision processes (MDPs).
   5. Backwards induction: main theorem.
   6. Implementation of UCB / Thompson / BI.
4. Infinite-horizon MDPs: Exact algorithms
   1. MDP theory.
   2. Value iteration and convergence proof.
   3. Policy iteration.
   4. Temporal differences.
   5. Linear programming.
5. Finite, discounted MDPs: Stochastic approximation
   1. Sarsa & Q-learning.
   2. Stochastic approximation theory.
   3. Actor-critic architectures.
6. Continuous, discounted MDPs: Function approximation [Hannes]
   1. Approximate policy iteration (LSTD, CLSP).
   2. Fitted value iteration (aka deep Q-learning).
   3. Implementation of one ADP algorithm and comparison / OpenAI Gym.
7. Bayesian reinforcement learning [Divya]
   1. Partially observable MDPs.
   2. Online planning algorithms (UCT, branch and bound, etc.).
8. Large-scale RL
   1. Double/Deep/Bootstrapped Q-learning, etc.
   2. AlphaZero.
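As a taste of the estimation module, the Beta-Bernoulli conjugate pair fits in a few lines of Python. This is a toy sketch for illustration, not course-provided code: a Beta(a, b) prior on a coin's success probability, updated on 0/1 observations, yields a Beta posterior whose parameters simply count successes and failures.

```python
def beta_bernoulli_update(a, b, observations):
    """Posterior Beta(a', b') after observing a 0/1 sequence
    under a Beta(a, b) prior on the success probability."""
    successes = sum(observations)
    failures = len(observations) - successes
    return a + successes, b + failures

# Uniform Beta(1, 1) prior, four coin flips with three successes:
a, b = beta_bernoulli_update(1, 1, [1, 0, 1, 1])
print((a, b))        # (4, 2)
print(a / (a + b))   # posterior mean ~ 0.667
```

The posterior mean `(a + successes) / (a + b + n)` shrinks the empirical frequency towards the prior mean, which is exactly the smoothing behaviour discussed under credible intervals.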
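The UCB algorithm from the fundamentals module is similarly compact. The sketch below runs UCB1 on simulated Bernoulli arms; the arm means and horizon are made-up illustration values, not part of the course material.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """UCB1 on simulated Bernoulli arms: after one pull of each arm,
    pull the arm maximising empirical mean + sqrt(2 ln t / n_i)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                       # initial round-robin
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
# the best arm (index 2) should receive the large majority of the pulls
```

The confidence bonus shrinks as an arm is pulled more often, so suboptimal arms are sampled only logarithmically often; this is the mechanism behind the regret bounds covered in the module.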
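Value iteration for the tabular case (module 4) can also be sketched briefly. The two-state MDP below is an invented example: state 1 is absorbing with reward 1 per step, so its optimal discounted value is 1/(1 - gamma) = 10 for gamma = 0.9.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality operator to (near) convergence.
    P[s][a] is a list of (probability, next_state); R[s][a] is the reward."""
    V = [0.0] * len(P)
    while True:
        V_new = [max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(len(P))]
        if max(abs(v_new - v) for v_new, v in zip(V_new, V)) < tol:
            return V_new
        V = V_new

# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 0).
# State 1: a single action that stays, earning reward 1 per step.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)]]]
R = [[0.0, 0.0],
     [1.0]]
V = value_iteration(P, R)
# V is close to [9.0, 10.0]: V(1) = 1/(1 - 0.9) and V(0) = 0.9 * V(1)
```

Each sweep contracts the sup-norm error by a factor of gamma, which is the content of the convergence proof in the module.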
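Finally, tabular Q-learning (module 5) can be tried out on a toy environment. The deterministic chain, reward structure and hyperparameters below are invented for illustration; they are not from the course.

```python
import random

def q_learning_chain(n=5, episodes=500, alpha=0.1, gamma=0.9,
                     eps=0.1, seed=0):
    """Q-learning on a deterministic chain of states 0..n-1. Actions:
    0 = left, 1 = right; entering state n-1 gives reward 1 and ends
    the episode. Exploration is epsilon-greedy with random tie-breaking."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in (0, 1) if Q[s][i] == best])
            s2 = s + 1 if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n - 1 else 0.0
            # off-policy update: the target uses the greedy value of s2
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning_chain()
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(4)]
# the learned greedy policy should move right in every non-terminal state
```

Replacing `max(Q[s2])` with the value of the action actually taken in `s2` would turn this into Sarsa, the on-policy counterpart covered in the same module.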

## Course organisation and material

The course takes place in LP3/4 and is worth 7.5 credits.

Lecturer: Debabrota Basu (some lectures by Christos Dimitrakakis)

Prerequisites

• Essential: [Math] probability, calculus; [Programming] good Python skills

While the course will go over the basic ideas, the focus will be on fundamentals, proofs and algorithms. We will also run a section on statistical inference for those who have not taken any advanced statistics courses before.

Start date: 12 February. Go here to join the Q&A.

Lectures: Wednesdays 10:00-12:00 or Thursdays 14:00-16:00.

Detailed Schedule

• 12.2: EDIT Analysen. Beliefs and decisions [Ch. 1,2,3]
• 13.2: EL42. Section on Bayesian analysis and estimation [Ch. 4]
• 19.2: EDIT 5128. Optional tutorial
• 20.2: EL42. RL fundamentals [Ch. 6]
• 27.2: EDIT Analysen. MDP theory / Value iteration [Ch. 6]
• 4.3: EDIT Analysen. Policy iteration / Temporal differences / Linear programming [Ch. 6]
• 11.3: EDIT Analysen. Sarsa / Q-learning [Ch. 7]
• 12.3: EDIT Analysen. Stochastic approximation / Actor-critic [Ch. 7]
• 18.3: EDIT Analysen. Function approximation [Ch. 8]
• 19.3: EL42. Gradient methods [Ch. 8]
• 25.3: EDIT Analysen. Bayesian RL [Ch. 9]
• 26.3: EDIT Analysen. Large-scale RL
• 29.4: Project presentations
• 08.5: Project reports

The course mainly follows the structure of our draft book, "Decision Making Under Uncertainty and Reinforcement Learning".

Other useful books:

• DeGroot, Optimal Statistical Decisions
• Puterman, Markov Decision Processes
• Bertsekas and Tsitsiklis, Neuro-Dynamic Programming

Other material will be referred to in the course assignments.

## Examination

10% participation, 50% assignments, 40% project

Assignments. Completing four assignments, which focus on the fundamentals, is required for the course.

Project. A larger mini-project, combining elements of the assignments for optimal exploration, takes place at the end of the course. Projects are done in groups of 2-3 students.