PyTorch:
Reinforcement learning taxonomy: Reinforcement learning improves behaviour from evaluative feedback
Multi-armed bandits:
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, Chapter 2
Bandit Algorithms by Tor Lattimore and Csaba Szepesvári, Chapters 1, 4-7, 11, 19 (Introduction to bandits; regret; concentration inequalities; Explore-then-Commit; upper confidence bounds; EXP3; LinUCB)
Thompson Sampling
LinUCB
Model-free Reinforcement learning: Q-function based methods
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto, Chapters 3-7, 12
Model-free Reinforcement learning: Policy Gradients
Model-based Reinforcement Learning
Learning from demonstration
Offline RL:
Advantage Weighted Actor Critic (AWAC)
Conservative Q-learning (CQL)