Topics in ML (Monsoon 2017)
Course Code: CSE975
Course Description: This is an advanced course in machine learning. Its objective is to familiarize students with online learning and reinforcement learning. The topics planned for this course are:
Online Learning: Online classification/regression, Online learning from experts, Online-to-batch conversion (a Weighted Majority sketch follows this list).
Reinforcement Learning: Multi-armed Bandits, The exploration-exploitation dilemma, Markov Decision Processes, Dynamic Programming, Monte Carlo Methods, Temporal-Difference Learning, Sarsa: On-Policy TD Control, Q-learning, Value-function Approximation (an epsilon-greedy bandit sketch follows this list).
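To make the experts setting above concrete, here is a minimal sketch of the Weighted Majority algorithm (Littlestone and Warmuth, 1994; see Important Papers below). The function name, the 0/1 prediction encoding, and the penalty parameter eta are illustrative assumptions, not prescribed course code.

    import numpy as np

    def weighted_majority(expert_predictions, outcomes, eta=0.5):
        # expert_predictions: (T, N) array of 0/1 forecasts, one column per expert.
        # outcomes: length-T array of true 0/1 labels, revealed after each round.
        T, N = expert_predictions.shape
        w = np.ones(N)                    # one weight per expert, initially equal
        mistakes = 0
        for t in range(T):
            preds = expert_predictions[t]
            # Predict with the weighted majority vote of the experts.
            guess = 1 if w[preds == 1].sum() >= w[preds == 0].sum() else 0
            mistakes += int(guess != outcomes[t])
            # Multiplicatively shrink the weight of every expert that erred.
            w[preds != outcomes[t]] *= (1.0 - eta)
        return mistakes

With eta = 0.5 this is the classic halving variant: every mistaken expert loses half its weight each round.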
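Likewise, the exploration-exploitation dilemma can be illustrated with a minimal epsilon-greedy strategy on a stochastic multi-armed bandit; the Bernoulli reward model and all parameter values here are assumptions chosen for illustration only.

    import numpy as np

    def epsilon_greedy(true_means, T=10000, eps=0.1, seed=0):
        # true_means: Bernoulli success probability of each arm (unknown to the learner).
        rng = np.random.default_rng(seed)
        K = len(true_means)
        counts = np.zeros(K)              # number of pulls per arm
        values = np.zeros(K)              # empirical mean reward per arm
        total = 0.0
        for _ in range(T):
            if rng.random() < eps:
                a = int(rng.integers(K))          # explore: pick a uniform random arm
            else:
                a = int(np.argmax(values))        # exploit: pick the best empirical arm
            r = float(rng.random() < true_means[a])   # draw a Bernoulli reward
            counts[a] += 1
            values[a] += (r - values[a]) / counts[a]  # incremental mean update
            total += r
        return total

For example, epsilon_greedy([0.2, 0.5, 0.7]) should concentrate most pulls on the third arm while still sampling the others occasionally.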
References:
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition, The MIT Press, 2018.
Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, Foundations of Machine Learning. The MIT Press, 2012.
Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, The MIT Press, 2016.
Csaba Szepesvári, Algorithms for Reinforcement Learning, Morgan and Claypool, 2010.
Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st Edition, John Wiley & Sons, Inc. New York, NY, USA, 1994.
Supporting Material:
Lecture Notes/Slides:
Important Papers:
Yoav Freund and Robert E. Schapire, "Large margin classification using the perceptron algorithm," Machine Learning, 37(3):277-296, 1999.
N. Littlestone and M. K. Warmuth, "The weighted majority algorithm," Information and Computation, 108(2):212-261, 1994.
Nick Littlestone, "Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm" (the Winnow algorithm), Machine Learning, 2(4):285-318, 1988.
Nicolò Cesa-Bianchi et al., "How to use expert advice" (includes the doubling trick), Journal of the ACM, 44(3):427-485, 1997.
P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multi-armed bandit problem," Machine Learning, 47(2):235-256, 2002.
Sébastien Bubeck and Nicolò Cesa-Bianchi, "Regret analysis of stochastic and nonstochastic multi-armed bandit problems," Foundations and Trends in Machine Learning, 5(1):1-122, 2012.