Decision Making Under Uncertainty and Reinforcement Learning

Overview

This course gives an introduction to decision theory and reinforcement learning, focusing on theory and algorithms for reinforcement learning problems. Each module lasts about a week and includes a short lecture on basic concepts, an overview of proofs and some hands-on work. Assessment is through assignments and a mini-project.

An advanced version of this course is FDAT115, Advanced Topics in Reinforcement Learning and Decision Making.

Modules

This is only a suggested schedule. Points in italics are optionally covered.

  1. Beliefs and decisions [Christos]
    1. Beliefs as probabilities.
    2. Utility theory.
    3. The reinforcement learning problem
    4. Risk, uncertainty and Bayesian decision rules.
    5. Regret.
  2. Section on estimation
    1. Sufficient statistics [simple example of finite hypothesis space]
    2. Exponential families and Beta-Bernoulli
    3. Credible intervals
    4. Concentration inequalities (Markov, Chebyshev, Hoeffding)
  3. Reinforcement learning: fundamentals [Christos]
    1. (Stochastic) Bandit problems.
    2. Example: UCB.
    3. Example: Thompson Sampling.
    4. Markov decision processes (MDP).
    5. Backwards Induction: Main Theorem
    6. Implementation of UCB / Thompson / BI
  4. Infinite horizon MDPs: Exact algorithms
    1. MDP Theory
    2. Value Iteration and convergence proof
    3. Policy Iteration.
    4. Temporal differences.
    5. Linear programming.
  5. Finite, discounted MDPs: Stochastic approximation
    1. Sarsa & Q-Learning
    2. Stochastic approximation theory.
    3. Actor-Critic architectures
  6. Continuous, discounted MDPs: Function approximation [Hannes]
    1. Approximate policy iteration (LSTD, CLSP)
    2. Fitted value iteration (aka Deep Q-Learning)
    3. Policy gradient
    4. Implementation of one ADP algorithm and comparison using OpenAI Gym.
  7. Bayesian reinforcement learning [Divya]
    1. Bayes-Adaptive MDPs
    2. Partially Observable MDPs
    3. Online planning algorithms (UCT, Branch and Bound, etc)
  8. Large-scale RL
    1. Double/Deep/Bootstrapped Q-Learning etc
    2. AlphaZero
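
As a flavour of the hands-on work in module 3, here is a minimal sketch of UCB on a Bernoulli bandit. The function name ucb1, the arm means, the horizon and the seed are illustrative assumptions, not taken from the course material. The exploration bonus sqrt(2 ln t / n) is the common UCB1 choice and may differ from the variant presented in the lectures.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return pull counts and total reward."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of times each arm has been pulled
    sums = [0.0] * n_arms   # cumulative reward collected from each arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so that all empirical means are defined.
            arm = t - 1
        else:
            # Choose the arm maximising empirical mean + exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Bernoulli reward with the (hypothetical) mean of the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return counts, total_reward


if __name__ == "__main__":
    counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("average reward:", total / 5000)
```

With well-separated arm means such as these, most pulls should concentrate on the best arm, while the number of pulls of the other arms grows only logarithmically with the horizon.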


Course organisation and material

The course takes place in LP3/4 and is worth 7.5 credits.

Lecturer: Debabrota Basu (Some lectures by Christos Dimitrakakis)

Prerequisites

  • Essential: [Math] Probability, Calculus [Programming] Good Python skills

While the course will go over the basic ideas, the focus will be on fundamentals, proofs and algorithms. We will also run a section on statistical inference for those who have not taken any advanced statistics courses before.

Start date: 12 February. Go here to join the QA.

Wednesdays 10:00-12:00 or Thursdays 14:00-16:00.

Detailed Schedule

  • 12.2: EDIT Analysen. Beliefs and decisions [Ch. 1, 2, 3]
  • 13.2: EL42. Section on Bayesian analysis and estimation [Ch. 4]
  • 19.2: EDIT 5128. Optional tutorial
  • 20.2: EL42. RL fundamentals [Ch. 6]
  • 27.2: EDIT Analysen. MDP Theory / Value Iteration [Ch. 6]
  • 4.3: EDIT Analysen. Policy Iteration / Temporal Differences / Linear Programming [Ch. 6]
  • 11.3: EDIT Analysen. Sarsa / Q-Learning [Ch. 7]
  • 12.3: EDIT Analysen. Stochastic Approximation / Actor-Critic [Ch. 7]
  • 18.3: EDIT Analysen. Function approximation [Ch. 8]
  • 19.3: EL42. Gradient methods [Ch. 8]
  • 25.3: EDIT Analysen. Bayesian RL [Ch. 9]
  • 26.3: EDIT Analysen. Large-scale RL
  • 29.4: Project presentations
  • 8.5: Project reports

Readings

The course mainly follows the structure of our draft book, "Decision Making Under Uncertainty and Reinforcement Learning".

Other useful books:

  • DeGroot, Optimal Statistical Decisions
  • Puterman, Markov Decision Processes
  • Bertsekas and Tsitsiklis, Neuro-Dynamic Programming

Other material will be referred to in the course assignments.

Examination

10% participation, 50% assignments, 40% project

Assignments. Completion of 4 assignments, focusing on the fundamentals, is required for the course.

Project. A larger mini-project, combining elements of the assignments for optimal exploration, will take place at the end of the course. Projects are done in groups of 2-3 students.