Advanced Topics in Machine Learning, Caltech: http://www.yisongyue.com/courses/cs159/, taught by Yisong Yue
Mining from Large Data Sets, ETHZ: https://las.inf.ethz.ch/teaching/dm-f17, taught by Andreas Krause
(Course Notes on Online Learning) Online Learning, by Gabor Bartok, David Pal, Csaba Szepesvari, and Istvan Szita.
Active Learning, (accessible via UChicago IPs), by Burr Settles, 2012
Reinforcement Learning: An Introduction (Sutton & Barto): http://incompleteideas.net/book/bookdraft2018jan1.pdf
Algorithms for Reinforcement Learning, by Csaba Szepesvári, 2010
CMSC 25300/35300: Mathematical Foundations of Machine Learning, taught by Rebecca Willett
More lists of resources (RL):
(Perceptron mistake bound) Perceptron Mistake Bounds, by Mehryar Mohri and Afshin Rostamizadeh. CoRR abs/1305.0208, 2013.
(Survey Paper on Online Learning) Online Learning and Online Convex Optimization, by Shai Shalev-Shwartz. Foundations and Trends in Machine Learning, 4(2), 107-194, 2011.
(survey paper) Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, by Sébastien Bubeck, Nicolò Cesa-Bianchi
UCB
Finite-time Analysis of the Multiarmed Bandit Problem, by Peter Auer, Nicolo Cesa-Bianchi, Paul Fischer. Machine Learning, 47, 235-256, 2002.
Linear-UCB
Improved Algorithms for Linear Stochastic Bandits, by Yasin Abbasi-Yadkori, David Pal, and Csaba Szepesvari. Neural Information Processing Systems, 2011.
Contextual Bandits
A Contextual-Bandit Approach to Personalized News Article Recommendation, by Lihong Li, Wei Chu, John Langford, and Robert Schapire. International World Wide Web Conference, 2010.
(survey paper) Taking the Human Out of the Loop: A Review of Bayesian Optimization, by Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan Adams, and Nando de Freitas. Proceedings of the IEEE, 104(1), 2016.
Practical Bayesian Optimization of Machine Learning Algorithms, by Jasper Snoek, Hugo Larochelle, and Ryan Adams. Neural Information Processing Systems, 2012.
Scalable Bayesian Optimization Using Deep Neural Networks, by Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostafa Ali Patwary, Prabhat, Ryan Adams. International Conference on Machine Learning, 2015.
Bayesian Multi-Scale Optimistic Optimization, by Ziyu Wang, Babak Shakibi, Lin Jin, Nando de Freitas. International Conference on Artificial Intelligence and Statistics, 2014.
(survey paper) Advancements in Dueling Bandits, by Yanan Sui, Masrour Zoghi, Katja Hofmann, Yisong Yue. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Survey track. Pages 5502-5510.
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, by Yisong Yue and Thorsten Joachims. International Conference on Machine Learning, 2009.
The K-armed Dueling Bandits Problem, by Yisong Yue, Josef Broder, Robert Kleinberg, and Thorsten Joachims. Journal of Computer and System Sciences, DOI:10.1016/j.jcss.2011.12.028, 2012.
Contextual Dueling Bandits, by Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi. Proceedings of The 28th Conference on Learning Theory, PMLR 40:563-587, 2015.
Coactive Learning, by Pannaga Shivaswamy and Thorsten Joachims. Journal of Artificial Intelligence Research, 53, 1-40, 2015.
Stable Coactive Learning via Perturbation, by Karthik Raman, Thorsten Joachims, Pannaga Shivaswamy, and Tobias Schnabel. International Conference on Machine Learning, 2013.
Learning to Diversify from Implicit Feedback, by Karthik Raman, Pannaga Shivaswamy, and Thorsten Joachims. ACM Conference on Web Search and Data Mining, 2012.
Learning Trajectory Preferences for Manipulators via Iterative Improvement, by Ashesh Jain, Brian Wojcik, Thorsten Joachims, and Ashutosh Saxena. Neural Information Processing Systems, 2013.
(survey) Active Learning Literature Survey, by Burr Settles.
Analysis of perceptron-based active learning, by Sanjoy Dasgupta, Adam Kalai, and Claire Monteleoni. Learning Theory, 249-263, 2005.
Importance Weighted Active Learning, by Alina Beygelzimer, Sanjoy Dasgupta, John Langford, and Daniel Hsu. International Conference on Machine Learning, 2009.
Agnostic Active Learning Without Constraints, by Alina Beygelzimer, Daniel Hsu, John Langford, and Tong Zhang. Neural Information Processing Systems, 2010.
Efficient and Parsimonious Agnostic Active Learning, by Tzu-Kuo Huang, Alekh Agarwal, Daniel Hsu, John Langford, Robert Schapire. Neural Information Processing Systems, 2015.
Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization, by Daniel Golovin and Andreas Krause. Journal of Artificial Intelligence Research, 42, 427-486, 2011.
Near Optimal Bayesian Active Learning for Decision Making, by Shervin Javdani, Yuxin Chen, Amin Karbasi, Andreas Krause, Drew Bagnell, Siddhartha Srinivasa. International Conference on Artificial Intelligence and Statistics, 2014.
Submodular Surrogates for Value of Information, by Yuxin Chen, Shervin Javdani, Amin Karbasi, Drew Bagnell, Siddhartha Srinivasa, Andreas Krause. In the 29th AAAI Conference on Artificial Intelligence (AAAI).
Selective Supervision: Guiding Supervised Learning with Decision-Theoretic Active Learning, by Ashish Kapoor, Eric Horvitz, Sumit Basu. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
Streaming Submodular Maximization: Massive Data Summarization on the Fly, by Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, Andreas Krause. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2014.
(Streaming algorithms) Do Less, Get More: Streaming Submodular Maximization with Subsampling, by Moran Feldman, Amin Karbasi, Ehsan Kazemi. Advances in Neural Information Processing Systems 31 (NIPS 2018)
Active Learning with Feature Feedback
Active Learning with Feedback on Features and Instances, by H. Raghavan, O. Madani, R. Jones. Journal of Machine Learning Research, 2006.
Learning with Feature Feedback: from Theory to Practice, by Stefanos Poulis, Sanjoy Dasgupta. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1104-1113, 2017.
Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback, by Arijit Biswas, Devi Parikh. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 644-651.
Active Structured Learning
Latent Structured Active Learning, by Wenjie Luo, Alex Schwing, Raquel Urtasun. Advances in Neural Information Processing Systems, 2013.
Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks, by Julius von Kügelgen, Paul K Rubenstein, Bernhard Schölkopf, Adrian Weller. NeurIPS 2019 Workshop “Do the right thing”: machine learning and causal inference for improved decision making, December 2019
Active Imitation Learning
Active Imitation Learning: Formal and Practical Reductions to I.I.D. Learning, by Kshitij Judah, Alan Fern, Tom Dietterich, Prasad Tadepalli. Journal of Machine Learning Research, 15, 4105-4143, 2015.
(survey) Bayesian Reinforcement Learning: A Survey, by Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, and Aviv Tamar. Foundations and Trends in Machine Learning, 8(5-6), 359-483, 2015.
A3C: Deep learning + Actor-critic (Mnih et al., ICML 2016)
Policy gradient theorem (Sutton et al., ICML 1999)
A Natural Policy Gradient, by Sham Kakade. Neural Information Processing Systems, 2002.
Deep deterministic policy gradient (Lillicrap et al., ICLR 2015)
Playing Atari with Deep Reinforcement Learning, by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. NIPS Deep Learning Workshop, 2013.
Guided Policy Search, by Sergey Levine and Vladlen Koltun. International Conference on Machine Learning, 2013.
An Application of Reinforcement Learning to Aerobatic Helicopter Flight, by Pieter Abbeel, Adam Coates, Morgan Quigley, Andrew Ng. Neural Information Processing Systems, 2007.
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach, by Engin Ipek, Onur Mutlu, Jose Martinez, and Rich Caruana. International Symposium on Computer Architecture, 2008.
Thompson Sampling (Bandits)
Analysis of Thompson Sampling for the Multi-armed Bandit Problem, by Shipra Agrawal and Navin Goyal. Conference on Learning Theory, 2012.
An Empirical Evaluation of Thompson Sampling, by Olivier Chapelle and Lihong Li. Neural Information Processing Systems, 2012.
Thompson Sampling for Contextual Bandits with Linear Payoffs, by Shipra Agrawal and Navin Goyal. International Conference on Machine Learning, 2013.
Posterior Sampling (RL)
(More) Efficient Reinforcement Learning via Posterior Sampling, by Ian Osband, Daniel Russo, Benjamin Van Roy. Advances in Neural Information Processing Systems, 2013.
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds, by Shipra Agrawal, Randy Jia. Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017, Pages 1184–1194.
A Survey of Monte Carlo Tree Search Methods, by Cameron Browne, Edward Powley, Daniel Whitehouse, Simon Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 2012.
Applying Monte Carlo Tree Search to Go
Mastering the game of Go with deep neural networks and tree search, by David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Nature, 529, 484–489, doi:10.1038/nature16961, 2016.
An Invitation to Imitation, by Drew Bagnell
A Game-Theoretic Approach to Apprenticeship Learning, by Umar Syed and Robert Schapire. NIPS 2008.
Forward Training
Efficient Reductions for Imitation Learning, by Stephane Ross, Drew Bagnell. AISTATS 2010.
DAgger & Follow-up Work
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, by Stephane Ross, Geoff Gordon, and Drew Bagnell. International Conference on Artificial Intelligence and Statistics, 2011.
Learning Policies for Contextual Submodular Prediction, by Stephane Ross, Jiaji Zhou, Yisong Yue, Debadeepta Dey, J. Andrew Bagnell. International Conference on Machine Learning, 2013.
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, by Wen Sun, Arun Venkatraman, Geoff Gordon, Byron Boots, J. Andrew Bagnell. International Conference on Machine Learning, 2017.
SEARN & Follow-up Work
Search Based Structured Prediction, by Hal Daumé III, John Langford, Daniel Marcu. Machine Learning Journal, 2009.
Learning to Search Better than Your Teacher, by Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford. ICML 2015
(survey paper) An Overview of Machine Teaching, by Xiaojin Zhu, Adish Singla, Sandra Zilles, Anna N. Rafferty. ArXiv 1801.05927, 2018.
Teaching Complexity
On the Complexity of Teaching, by S. Goldman, M. Kearns. In Proc. of the 4th Conference on Computational Learning Theory (COLT'91), 1991
Modeling Human Learning
Deep Knowledge Tracing, by C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. Guibas, J. Sohl-Dickstein. In Proc. of the 29th Conference on Neural Information Processing Systems (NeurIPS'15), 2015
Learning to Represent Student Knowledge on Programming Exercises Using Deep Learning, by L. Wang, A. Sy, L. Liu, C. Piech. In Proc. of the 10th International Conference on Educational Data Mining (EDM'17), 2017
Teaching a Human Learner
Near-Optimally Teaching the Crowd to Classify, by A. Singla, I. Bogunovic, G. Bartok, A. Karbasi, A. Krause. In Proc. of the 31st International Conference on Machine Learning (ICML), 2014
Near-Optimal Machine Teaching via Explanatory Teaching Sets, by Yuxin Chen, Oisin Mac Aodha, Shihan Su, Pietro Perona, Yisong Yue. In the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), Playa Blanca, Lanzarote, Canary Islands, April 2018.
Teaching a Forgetful Learner
Teaching Multiple Concepts to a Forgetful Learner, by A. Hunziker, Y. Chen, O. Mac Aodha, M. Gomez-Rodriguez, A. Krause, P. Perona, Y. Yue, A. Singla. In Proc. of the 33rd Conference on Neural Information Processing Systems (NeurIPS'19), 2019
Teaching a Machine Learner
Policy Poisoning in Batch Reinforcement Learning and Control, by Y. Ma, X. Zhang, W. Sun, X. Zhu. In Proc. of the 33rd Conference on Neural Information Processing Systems (NeurIPS'19), 2019
Machine Teaching and Optimal Control
An Optimal Control View of Adversarial Machine Learning, by Xiaojin Zhu. arXiv:1811.04422, 2018.
An Optimal Control Approach to Sequential Machine Teaching, by Laurent Lessard, Xuezhou Zhang, and Xiaojin Zhu. In The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.