Overview
This course gives a thorough overview of the fundamental theory of reinforcement learning, as well as a look into advanced algorithms and modern proof techniques for reinforcement learning problems. Each module lasts about a week and includes a short lecture on basic concepts, an overview of proofs, and some hands-on work. Assessment is through assignments (including reviewing and implementing research papers) and a mini project.
This is an advanced version of FDAT070, Decision Making Under Uncertainty.
Modules
This is only a suggested schedule. Points in italics are optionally covered.
 Beliefs and decisions [Christos]
 Beliefs as probabilities.
 Hierarchies of decision making problems.
 The reinforcement learning problem
 Risk, uncertainty and Bayesian decision rules.
 PAC bounds.
 Regret.
 Reinforcement learning: fundamentals [Christos]
 (Stochastic) Bandit problems.
 UCB.
 Thompson Sampling.
 Markov decision processes (MDP).
 Backwards Induction.
 Implementation of UCB / Thompson / BI
 Infinite horizon MDPs: Exact algorithms [Divya]
 Value Iteration.
 Policy Iteration.
 Temporal differences.
 Linear programming.
 Implementation of MPI.
 Finite, discounted MDPs: Stochastic approximation [Aristide]
 Sarsa & Q-Learning
 Stochastic approximation theory.
 Finite Sample Bounds.
 Actor-Critic architectures
 Policy gradient.
 Continuous, discounted MDPs: Function approximation [Hannes]
 Approximate policy iteration (LSTD, CLSP)
 Fitted value iteration (aka Deep Q-Learning)
 Policy gradient
 Implementation of one ADP algorithm and comparison / OpenAI Gym.
 Bayesian reinforcement learning [Divya]
 Bayes-Adaptive MDPs
 Partially Observable MDPs
 Online planning algorithms (UCT, Branch and Bound, etc)
 Expectation Maximisation
 Bounds for reinforcement learning in MDPs [Aristide]
 Finite MDPs: UCRL2
 Finite MDPs: Thompson sampling
 Hierarchical Reinforcement Learning [Hannes]
 Planning hierarchies: The Options Framework
 Observation hierarchies: CTW and extensions (e.g. AIXI)
 Factored MDPs
 Interaction with humans [Christos]
 Inverse Reinforcement Learning
 Collaborative Reinforcement Learning
 Selected topics on societal issues [Christos]
 Safety
 Fairness
 Privacy
 Mechanism design

Course organisation and material
The course takes place in LP3/4 and is worth 7.5 credits.
Prerequisites
Schedule
Start date: 13 February. Go here, or email me at chrdimi at chalmers.se if you want to join. Locations: EDIT 8103.
 13.2: EDIT 8103 Beliefs and decisions [Ch 1,2]
 15.2: EDIT 8103 RL fundamentals [Ch 3,4]
 20.2: EDIT 5128 Exact algorithms [Ch 6]
 22.2: EDIT 8103 [Reading 1, Reading 2, Reading 3]
 27.2: EDIT 8103 Stochastic approximation [Ch 7]
 01.3: EDIT 8103 [Reading 1, Reading 2]
 06.3: EDIT 8103 Function approximation [Ch 8]
 08.3: EDIT 8103 [Reading]
 13.3: EDIT 8103 Bayesian RL [Ch 9]
 15.3: EDIT 8103 [Reading 1, Reading 2, Reading 3]
 20.3: EDIT 8103 Regret bounds [Ch 10]
 22.3: EDIT 8103 [Reading 1, Reading 2]
 27.3: EDIT 8103 Options
 29.3: EDIT 8103 [Reading]
 03.4: [Easter]
 05.4: [Easter]
 17.4: EDIT 8103 Multiagent RL
 19.4: EDIT 8103 Societal issues
 03.5: Project presentations
 01.5: Project reports
10% participation, 50% assignments, 40% project
Assignments. Completion of 4 assignments, focusing on the fundamentals, is required for the course. Those acting as discussion leaders on Thursdays are also responsible for presenting Tuesday's introductory material, which counts as 2 assignments.
Project. A larger mini project, combining elements of the assignments for optimal exploration, will take place at the end of the course. The projects are done in groups of 2-3 students.
Preparatory meetings
These informal meetings are there to cover some basic ground and to help the teaching staff select material. Students do not need to attend.
 16.1: EDIT 8103 Prep: Fundamentals
 18.1: EDIT 8103 Prep: Exact algorithms
 23.1: EDIT 8103 Prep: Stochastics
 25.1: EDIT 6128 Prep: Function approximation
 30.1: EDIT 8103 Prep: Bayesian RL
 01.2: EDIT 8103 Prep: Regret bounds
 06.2: EDIT 8103 Prep: Options
 08.2: EDIT 8103 Prep: Miscellany
