Optional Reading

Python (scientific computing): Focus on array operations, broadcasting, plotting, and basic linear algebra using NumPy.
- Python basics: https://www.w3schools.com/python/
- NumPy, SciPy, and Matplotlib: https://python-course.eu/numerical-programming/introduction-to-numpy.php
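As a quick self-check on the NumPy topics above, the sketch below centres the columns of a small matrix using broadcasting and then solves a 2x2 linear system. The arrays are arbitrary illustrative values and are not taken from the linked courses.

```python
import numpy as np

# A 3x2 data matrix: 3 samples (rows), 2 features (columns); illustrative values.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Broadcasting: the (2,) vector of column means is stretched across all 3 rows,
# so every column is centred without an explicit Python loop.
col_means = X.mean(axis=0)          # shape (2,)
X_centred = X - col_means           # (3, 2) - (2,) -> (3, 2)

# Basic linear algebra: solve A x = b for x.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

print(col_means)   # [3. 4.]
print(X_centred)
print(x)
```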

PyTorch: Focus on tensors, automatic differentiation, neural network modules, and training loops.
- Deep Learning with PyTorch: A 60 Minute Blitz
- Example PyTorch code: https://github.com/yunjey/pytorch-tutorial
- PyTorch tutorials: https://docs.pytorch.org/tutorials/index.html

Linear Algebra: Focus on vectors, matrices, eigenvalues, norms, and gradients.
- MIT OCW 18.06 Linear Algebra: https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
- Khan Academy Linear Algebra: https://www.khanacademy.org/math/linear-algebra
- Concise lecture notes: https://drive.google.com/file/d/1avcmfGNo_WsuG_e0UhkByzGmGjyZ5EqR/view
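The following sketch touches each of the listed linear-algebra topics in NumPy: vector norms, the eigendecomposition of a symmetric matrix, and a finite-difference check of the gradient of the quadratic form f(x) = xᵀAx, whose gradient is (A + Aᵀ)x. The matrix and vector are arbitrary example values.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # symmetric example matrix
v = np.array([3.0, 4.0])

# Norms: Euclidean (L2) and L1 norm of a vector.
l2 = np.linalg.norm(v)         # sqrt(9 + 16) = 5.0
l1 = np.linalg.norm(v, ord=1)  # |3| + |4| = 7.0

# Eigendecomposition of a symmetric matrix: A q_i = lambda_i q_i.
# eigh returns eigenvalues in ascending order and eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(A)

# Gradient of f(x) = x^T A x is (A + A^T) x; verify it numerically
# with central finite differences along each coordinate direction.
def f(x):
    return x @ A @ x

analytic_grad = (A + A.T) @ v
eps = 1e-6
numeric_grad = np.array([
    (f(v + eps * e) - f(v - eps * e)) / (2 * eps)
    for e in np.eye(len(v))
])
```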

Probability: Focus on random variables, expectation, variance, conditional probability, and common distributions.
- Probability notes: https://drive.google.com/file/d/1WbwDVSIaWLm84D8t6WyQpI_ApIofbfng/view
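To make the definitions concrete, here is a small sketch using exact fractions: the expectation and variance of a fair six-sided die computed directly from E[X] = Σ x·p(x) and Var[X] = E[X²] − E[X]², and a conditional probability obtained by enumerating two dice. The dice setup is an illustrative choice, not from the linked notes.

```python
import itertools
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

# Expectation and variance straight from the definitions.
mean = sum(x * p for x in faces)                 # E[X]
var = sum(x * x * p for x in faces) - mean ** 2  # E[X^2] - E[X]^2

# Conditional probability by enumeration over two independent dice:
# P(sum == 8 | first die even) = |sum == 8 and first even| / |first even|
outcomes = list(itertools.product(faces, faces))
first_even = [o for o in outcomes if o[0] % 2 == 0]
both = [o for o in first_even if sum(o) == 8]
p_cond = Fraction(len(both), len(first_even))

print(mean, var, p_cond)
```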

General reinforcement learning:
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
- OpenAI Spinning Up in Deep Reinforcement Learning: https://spinningup.openai.com/en/latest/

Reinforcement learning taxonomy: Reinforcement learning improves behaviour from evaluative feedback

Multi-armed bandits:
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, Chapter 2
- Bandit Algorithms by Tor Lattimore and Csaba Szepesvári, Chapters 1, 4-7, 11, 19 (introduction to bandits; regret; concentration inequalities; Explore-then-Commit; upper confidence bounds; EXP3; linUCB)
- Thompson Sampling
- linUCB
  - A Contextual-Bandit Approach to Personalized News Article Recommendation
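As a concrete instance of the upper-confidence-bound idea covered in the readings above, the sketch below runs the classic UCB1 index (empirical mean plus sqrt(2 ln t / n_i)) on Bernoulli arms. The arm probabilities, horizon, and the function name `ucb1` are hypothetical choices for illustration; the textbook variants differ in the exact confidence term.

```python
import math
import random


def ucb1(arm_probs, horizon, seed=0):
    """UCB1 on Bernoulli arms (hypothetical helper).

    arm_probs: true success probability of each arm, unknown to the agent.
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k      # n_i: number of times arm i has been pulled
    means = [0.0] * k     # empirical mean reward of arm i
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull every arm once so each n_i > 0
        else:
            # Optimism under uncertainty: empirical mean + exploration bonus.
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total += reward
    return counts, total


counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

Suboptimal arms are still pulled occasionally because their bonus grows with ln t, but the best arm should receive the large majority of pulls.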

Model-free reinforcement learning: Q-function-based methods
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, Ch. 3-7, 12
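A minimal sketch of tabular Q-learning, one of the Q-function-based methods in these chapters, assuming a hypothetical 1-D chain environment: states 0 to 4, actions left/right, reward +1 on reaching the rightmost state. The update is Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)] with an ε-greedy behaviour policy; the environment and function name are illustrative only.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    States 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the rightmost state ends the episode with reward +1;
    every other transition gives reward 0.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy behaviour policy; ties are broken at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Off-policy TD target: bootstrap from the greedy value of s_next
            # (no bootstrap from the terminal goal state).
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


Q = q_learning()
greedy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]  # greedy action per non-terminal state
print(greedy)
```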

Model-free reinforcement learning: Policy gradients
- Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, Ch. 13
- Introduction
- Kinds of RL algorithms
- Policy gradients
- PPO
- Generalized Advantage Estimation (GAE)
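Generalized Advantage Estimation combines one-step TD errors δ_t = r_t + γV(s_{t+1}) − V(s_t) into A_t = Σ_l (γλ)^l δ_{t+l}. The sketch below computes this with the usual backward recursion A_t = δ_t + γλA_{t+1}, cutting the recursion at episode boundaries. The function name `gae` and its argument layout are assumptions for illustration, not any particular library's API.

```python
def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one rollout (hypothetical helper).

    rewards[t]: reward at step t; values[t]: V(s_t);
    dones[t]: True if the episode terminated at step t;
    last_value: V(s_T), the bootstrap value after the final step.
    Returns (advantages, returns) where returns[t] = advantages[t] + values[t].
    """
    T = len(rewards)
    advantages = [0.0] * T
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Backward recursion for A_t = sum_l (gamma * lam)^l * delta_{t+l}.
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With λ = 1 and zero value estimates, the advantages reduce to discounted returns; with λ = 0 they reduce to the one-step TD errors. The λ parameter trades off between these two extremes (lower variance versus lower bias).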

Model-based reinforcement learning

Learning from demonstration
- DAgger
- Inverse reinforcement learning (IRL):
  - Linear programming IRL: LP-IRL
  - Maximum entropy IRL: MaxEntIRL

Offline RL:
- Advantage Weighted Actor Critic: AWAC
- Offline RL with Implicit Q-learning (IQL)
- Conservative Q-learning (CQL)
- Latent Action Space for Offline RL: PLAS
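IQL, listed above, fits its state-value function with expectile regression, using the asymmetric squared loss L(u) = |τ − 1(u < 0)|·u². The sketch below shows that loss on a toy set of scalar samples and finds the τ-expectile by iteratively reweighted averaging. The sample values and the helper names `expectile_loss` and `expectile` are illustrative assumptions; in IQL the residual u would be Q(s, a) − V(s) over dataset actions.

```python
def expectile_loss(u, tau):
    """Asymmetric squared loss |tau - 1(u < 0)| * u**2 (u = target - prediction)."""
    weight = (1 - tau) if u < 0 else tau
    return weight * u * u


def expectile(samples, tau, iters=100):
    """tau-expectile of scalar samples (hypothetical helper).

    Fixed-point iteration: m = sum(w_i * x_i) / sum(w_i), where w_i = tau if
    x_i lies above the current estimate and 1 - tau otherwise.
    """
    m = sum(samples) / len(samples)  # start from the mean (the 0.5-expectile)
    for _ in range(iters):
        w = [tau if x > m else 1 - tau for x in samples]
        m = sum(wi * x for wi, x in zip(w, samples)) / sum(w)
    return m


samples = [0.0, 1.0, 2.0, 3.0, 4.0]
print(expectile(samples, 0.5))  # the mean of the samples
print(expectile(samples, 0.9))  # pulled towards the largest samples
```

With τ = 0.5 the expectile is the ordinary mean. As τ approaches 1 it moves towards the maximum, which is how IQL approximates a max over in-distribution actions without querying actions outside the dataset.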
