(more updates coming soon)
Courses & Textbooks
Online Learning
- (Course Notes on Online Learning) Online Learning, by Gabor Bartok, David Pal, Csaba Szepesvari, and Istvan Szita.
- (Perceptron mistake bound) Perceptron Mistake Bounds, by Mehryar Mohri and Afshin Rostamizadeh. CoRR abs/1305.0208, 2013.
- (Survey Paper on Online Learning) Online Learning and Online Convex Optimization, by Shai Shalev-Shwartz. Foundations and Trends in Machine Learning, 4(2), 107-194, 2011.
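The perceptron mistake bound analyzed by Mohri and Rostamizadeh is easy to check empirically: on linearly separable data, the number of mistakes is at most (R/γ)², where R bounds the example norms and γ is the margin of some separator. A minimal sketch (the toy data and reference separator `w_star` are illustrative assumptions, not taken from the papers above):

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Classic perceptron: update w on each mistake; return (w, mistake count)."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:  # mistake (or zero margin)
                w += y_i * x_i
                mistakes += 1
    return w, mistakes

# Toy linearly separable data in 2-D.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, mistakes = perceptron(X, y)

# Novikoff-style bound: mistakes <= (R / gamma)^2, with R = max ||x|| and
# gamma = margin of a unit-norm separator (here w_star = [1, 1]/sqrt(2)).
w_star = np.array([1.0, 1.0]) / np.sqrt(2)
R = max(np.linalg.norm(x) for x in X)
gamma = min(y_i * np.dot(w_star, x_i) for x_i, y_i in zip(X, y))
assert mistakes <= (R / gamma) ** 2
```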
Bandits
Coactive Learning
- Coactive Learning, by Pannaga Shivaswamy and Thorsten Joachims. Journal of Artificial Intelligence Research, 53, 1-40, 2015.
- Stable Coactive Learning via Perturbation, by Karthik Raman, Thorsten Joachims, Pannaga Shivaswamy, and Tobias Schnabel. International Conference on Machine Learning, 2013.
- Learning to Diversify from Implicit Feedback, by Karthik Raman, Pannaga Shivaswamy, and Thorsten Joachims. ACM Conference on Web Search and Data Mining, 2012.
- Learning Trajectory Preferences for Manipulators via Iterative Improvement, by Ashesh Jain, Brian Wojcik, Thorsten Joachims, and Ashutosh Saxena. Neural Information Processing Systems, 2013.
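At the core of the coactive learning papers above is the preference perceptron: present the highest-scoring object under the current weights, observe the user's (possibly only slightly) improved object, and update the weights toward it. A minimal sketch of one round, with hypothetical joint feature vectors standing in for φ(x, y):

```python
import numpy as np

def preference_perceptron_round(w, phi_presented, phi_feedback):
    """One round of the preference perceptron (as in Shivaswamy & Joachims):
    move w toward the features of the user's improved object."""
    return w + (phi_feedback - phi_presented)

# Hypothetical 3-dim joint feature vectors for the shown and improved objects.
w = np.zeros(3)
phi_presented = np.array([1.0, 0.0, 0.5])  # features of the object we showed
phi_feedback = np.array([1.0, 1.0, 0.0])   # features of the user's improvement
w = preference_perceptron_round(w, phi_presented, phi_feedback)

# After the update, the improved object scores at least as high as the shown one.
assert np.dot(w, phi_feedback) >= np.dot(w, phi_presented)
```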
Behavioral Cloning
More Imitation Learning
- Forward Training
- DAgger & Follow-up Work
- A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, by Stephane Ross, Geoff Gordon, and Drew Bagnell. International Conference on Artificial Intelligence and Statistics, 2011.
- Learning Policies for Contextual Submodular Prediction, by Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, J. Andrew Bagnell. International Conference on Machine Learning, 2013.
- Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, by Wen Sun, Arun Venkatraman, Geoff Gordon, Byron Boots, J. Andrew Bagnell. International Conference on Machine Learning, 2017.
- SEARN & Follow-up Work
- Generative Adversarial Imitation Learning
- Reduction of Behavioral Cloning to PAC Learning
- Learning to Search
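The DAgger loop from the Ross et al. (2011) paper above is short enough to sketch: roll out the current policy, have the expert relabel the visited states, and retrain on the aggregated dataset. The environment, expert, and learner interfaces below are hypothetical stand-ins, and the β-mixing of expert and learner is simplified away:

```python
import random

def dagger(env_rollout, expert_policy, train_classifier, n_iters=5):
    """Minimal DAgger loop: aggregate (state, expert action) pairs from
    rollouts of the current policy, retrain via supervised learning."""
    dataset = []                  # aggregated (state, expert action) pairs
    policy = expert_policy        # simplification: start from the expert
    for _ in range(n_iters):
        states = env_rollout(policy)                        # visit states
        dataset += [(s, expert_policy(s)) for s in states]  # expert relabels
        policy = train_classifier(dataset)                  # retrain on all data
    return policy

# Toy instantiation: 1-D states; the expert labels nonnegative states +1.
expert = lambda s: 1 if s >= 0 else -1
rollout = lambda pi: [random.uniform(-1, 1) for _ in range(20)]

def train(dataset):
    # Hypothetical learner: threshold classifier fit to the expert's labels.
    thresh = min((s for s, a in dataset if a == 1), default=0.0)
    return lambda s: 1 if s >= thresh else -1

random.seed(0)
policy = dagger(rollout, expert, train)
assert policy(0.5) == 1 and policy(-0.5) == -1
```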
Inverse Reinforcement Learning
Basic Reinforcement Learning
- (survey) Bayesian Reinforcement Learning: A Survey, by Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, and Aviv Tamar. Foundations and Trends in Machine Learning, 8(5-6), 359-483, 2015.
- A3C: Deep learning + Actor-critic (Mnih et al., ICML 2016)
- Policy gradient theorem (Sutton et al., NIPS 1999)
- A Natural Policy Gradient, by Sham Kakade. Neural Information Processing Systems, 2002.
- Deep deterministic policy gradient (Lillicrap et al., ICLR 2016)
- Playing Atari with Deep Reinforcement Learning, by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. NIPS Deep Learning Workshop, 2013.
- Guided Policy Search, by Sergey Levine and Vladlen Koltun. International Conference on Machine Learning, 2013.
- An Application of Reinforcement Learning to Aerobatic Helicopter Flight, by Pieter Abbeel, Adam Coates, Morgan Quigley, Andrew Ng. Neural Information Processing Systems, 2007.
- Self-Optimizing Memory Controllers: A Reinforcement Learning Approach, by Engin Ipek, Onur Mutlu, Jose Martinez, and Rich Caruana. International Symposium on Computer Architecture, 2008.
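The policy gradient theorem (Sutton et al.) underlies several of the entries above; its simplest instance is REINFORCE on a multi-armed bandit, where ∇J(θ) = E[∇ log π(a; θ) · r]. A sketch with a softmax policy over two arms (the bandit setup and hyperparameters are illustrative assumptions):

```python
import numpy as np

def reinforce_bandit(reward_means, n_steps=2000, lr=0.1, seed=0):
    """REINFORCE on a multi-armed bandit with a softmax policy: sample an arm,
    observe a noisy reward, take a stochastic gradient step on the logits."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(reward_means))          # one logit per arm
    for _ in range(n_steps):
        probs = np.exp(theta) / np.exp(theta).sum()
        a = rng.choice(len(theta), p=probs)
        r = rng.normal(reward_means[a], 0.1)     # noisy reward for the arm
        grad_log = -probs                        # grad log pi(a | theta) ...
        grad_log[a] += 1.0                       # ... for a softmax policy
        theta += lr * grad_log * r               # policy gradient step
    return np.exp(theta) / np.exp(theta).sum()

probs = reinforce_bandit([1.0, 0.0])  # arm 0 pays more on average
assert probs[0] > 0.9                 # policy concentrates on the better arm
```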
Sparse Feedback in RL
- Residual Loss Prediction: Reinforcement Learning With No Incremental Feedback, by Hal Daumé III, John Langford, Amr Sharaf. International Conference on Learning Representations, 2018.
- Hierarchical Imitation and Reinforcement Learning, by Hoang M. Le, Nan Jiang, Alekh Agarwal, Miro Dudík, Yisong Yue, Hal Daumé III. International Conference on Machine Learning, 2018.
Learning + Control
Safe Reinforcement Learning
- Safe Exploration in Markov Decision Processes, by Teodor Mihai Moldovan and Pieter Abbeel. International Conference on Machine Learning, 2012.
- Safe Exploration in Finite Markov Decision Processes with Gaussian Processes, by Matteo Turchetta, Felix Berkenkamp, Andreas Krause. Neural Information Processing Systems, 2016.
- Safe Model-based Reinforcement Learning with Stability Guarantees, by Felix Berkenkamp, Matteo Turchetta, Angela Schoellig, Andreas Krause. Neural Information Processing Systems, 2017.
- Safe Exploration and Optimization of Constrained MDPs using Gaussian Processes, by Akifumi Wachi, Yanan Sui, Yisong Yue, Masahiro Ono. AAAI Conference on Artificial Intelligence, 2018.
- High Confidence Policy Improvement, by Philip Thomas, Georgios Theocharous, Mohammad Ghavamzadeh. International Conference on Machine Learning, 2015.
Constrained Policy Search in Reinforcement Learning
- Conservative policy iteration (Kakade & Langford, ICML 2002)
- Safe Policy Iteration, by Matteo Pirotta, Marcello Restelli, Alessio Pecorino, Daniele Calandriello. International Conference on Machine Learning, 2013.
- Trust Region Policy Optimization, by John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. International Conference on Machine Learning, 2015.
- Constrained Policy Optimization, by Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel. International Conference on Machine Learning, 2017.
Multi-Task & Transfer in RL & IL
Off-Policy Learning
- Exploration Scavenging, by John Langford, Alexander Strehl, and Jenn Wortman Vaughan. International Conference on Machine Learning, 2008.
- Doubly Robust Policy Evaluation and Learning, by Miro Dudik, John Langford, and Lihong Li. International Conference on Machine Learning, 2011.
- Counterfactual Risk Minimization: Learning from Logged Bandit Feedback, by Adith Swaminathan and Thorsten Joachims. International Conference on Machine Learning, 2015.
- Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, by Nan Jiang and Lihong Li. International Conference on Machine Learning, 2016.
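The two main estimators in the off-policy papers above, inverse propensity scoring (IPS) and the doubly robust (DR) estimator of Dudik et al., can be sketched on synthetic logged bandit data. The logging setup, reward structure, and the imperfect reward model below are illustrative assumptions:

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring: reweight logged rewards by pi_target/pi_log."""
    return np.mean(target_probs / logging_probs * rewards)

def dr_estimate(rewards, logging_probs, target_probs, model_logged, model_target):
    """Doubly robust: model-based estimate of the target policy's value plus
    an IPS correction of the model's residual on the logged actions."""
    w = target_probs / logging_probs
    return np.mean(model_target + w * (rewards - model_logged))

# Hypothetical logged data: uniform logging over two actions, action 0 pays
# reward 1 and action 1 pays 0; the target policy always plays action 0,
# so its true value is 1.0.
rng = np.random.default_rng(0)
actions = rng.integers(0, 2, size=10_000)
rewards = (actions == 0).astype(float)
logging_probs = np.full(actions.shape, 0.5)
target_probs = (actions == 0).astype(float)   # pi_target(a|x) at the logged a

v_ips = ips_estimate(rewards, logging_probs, target_probs)

# A deliberately imperfect reward model (predicts 0.9 / 0.1); DR still lands
# near 1.0 because the IPS term corrects the model's bias.
model_logged = np.where(actions == 0, 0.9, 0.1)
v_dr = dr_estimate(rewards, logging_probs, target_probs, model_logged, 0.9)
```

Both estimates come out near the true value of 1.0; DR typically has lower variance than IPS when the reward model is reasonable.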
Monte Carlo Tree Search
- A Survey of Monte Carlo Tree Search Methods by Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 2012.
- (Applying Monte Carlo Tree Search to Go) Mastering the game of Go with deep neural networks and tree search, by David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Nature, 529, 484–489, doi:10.1038/nature16961, 2016.
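The heart of the MCTS methods surveyed above is UCT's child-selection rule: pick the child maximizing mean value plus a UCB1 exploration bonus. A sketch on a depth-one "tree" (a two-armed Bernoulli bandit; the payout probabilities and dict-based node representation are illustrative assumptions):

```python
import math
import random

def uct_select(children):
    """UCT selection: argmax of mean value + sqrt(2 ln N / n) exploration bonus."""
    total = sum(c["visits"] for c in children)
    def ucb(c):
        if c["visits"] == 0:
            return float("inf")  # expand unvisited children first
        return (c["value"] / c["visits"]
                + math.sqrt(2 * math.log(total) / c["visits"]))
    return max(children, key=ucb)

random.seed(0)
children = [{"visits": 0, "value": 0.0, "p": 0.8},   # better arm
            {"visits": 0, "value": 0.0, "p": 0.2}]   # worse arm
for _ in range(500):
    c = uct_select(children)
    c["visits"] += 1
    c["value"] += 1.0 if random.random() < c["p"] else 0.0  # Bernoulli rollout

best = max(children, key=lambda c: c["visits"])
assert best["p"] == 0.8  # UCT concentrates visits on the better arm
```

In a full MCTS, this rule is applied recursively down the tree, with rollouts (or a value network, as in the AlphaGo paper) replacing the Bernoulli draw.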
Other Forward Search in RL
Theory
Partially Observable RL
- Planning and acting in partially observable stochastic domains, by Leslie Pack Kaelbling, Michael L. Littman, Anthony R. Cassandra. Artificial Intelligence, 101(1-2), 99-134, 1998.
Adversarial & Multi-Agent
- Counterfactual regret minimization
- Multi-Agent Imitation Learning