Advanced Machine Learning
(CS461/CS672) Spring 2022
Anup Bhattacharya
Syllabus: In this course we will cover some of the basic theory underlying machine learning. We will study concepts such as PAC learning and generalization, along with some aspects of unsupervised learning. If time permits, we may also cover special topics such as privacy and bandit algorithms.
References: For the first part of this course, we will follow 'Understanding Machine Learning: From Theory to Algorithms' (PDF) by S. Shalev-Shwartz and S. Ben-David (SSBD). Another book whose material overlaps substantially with this course is 'Foundations of Data Science' (PDF) by A. Blum, J. Hopcroft and R. Kannan (BHK). Links to other materials will be provided as we go.
Prerequisites: This course will primarily be proof-oriented. We expect you to be comfortable with an algorithmic way of thinking; knowledge of linear algebra and probability will also be helpful.
Grading: MSc students: Homeworks 40%, MidSem 20%, EndSem 35%, Class participation 5%. PhD students: Homeworks 30%, MidSem 20%, EndSem 35%, Project 15%.
You may use at most 7 late days in total for homework submissions; homework that exceeds this budget of 7 late days will not be graded.
Announcements: The first class will be held online on Monday (10/01/2022) at 8:30 AM. Please join the Google Classroom using the link.
Statistical Learning (PAC Learning)
Lecture 1: Introduction (Sections 2.1, 2.2 of SSBD): Class notes, Recording
Lecture 2: ERM with Inductive Bias (Section 2.3 of SSBD): Class notes, Recording
Lecture 3: ERM with Inductive Bias, Proof of Theorem (Section 2.3 of SSBD): Class notes, Recording
Lecture 4: Definition of PAC Learning, Agnostic PAC Learning (Sections 3.1, 3.2 of SSBD): Class notes, Recording
Lecture 5: \epsilon-representative samples, Agnostic PAC (Sections 3.2.2, 4.1 of SSBD): Class notes, Recording
Lecture 6: Agnostic PAC Learnability (Sections 4.1, 4.2 of SSBD): Class notes, Recording
Lecture 7: No-Free-Lunch Theorem (Section 5.1 of SSBD): Class notes, Recording
Lecture 8: Complete No-Free-Lunch Theorem, Bias-complexity trade-off (Sections 5.1, 5.2 in SSBD): Class notes, Recording
Lecture 9: VC-Dimension (Sections 6.1, 6.2 in SSBD): Class notes, Recording
Lecture 10: VC-dimension continued (Sections 6.2, 6.3 in SSBD): Class notes, Recording
Lecture 11: VC-dimension, Fundamental Theorem of Statistical Learning (Sections 6.3, 6.4 in SSBD): Class notes, Recording
Lecture 12: Sauer's Lemma (Section 6.5.1 in SSBD): Class notes, Recording
Lecture 13: Finish proof of Sauer's Lemma (Section 6.5.1 in SSBD): Recording
Lecture 14: Proof of Fundamental Theorem of Statistical Learning (Section 6.5.2 in SSBD, Nick Harvey's notes, Class notes, Recording)
Lecture 15: Complete proof of Fundamental Theorem of Statistical Learning (Nick Harvey's notes, Class notes, Recording)
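For quick reference, the quantitative version of the Fundamental Theorem proved in Lectures 14 and 15 (Theorem 6.8 in SSBD) says the following, where d = VCdim(H) and C_1 \le C_2 are absolute constants. H is agnostic PAC learnable with sample complexity

\[ C_1 \, \frac{d + \log(1/\delta)}{\epsilon^2} \;\le\; m_{\mathcal{H}}(\epsilon, \delta) \;\le\; C_2 \, \frac{d + \log(1/\delta)}{\epsilon^2}, \]

and PAC learnable in the realizable case with sample complexity

\[ C_1 \, \frac{d + \log(1/\delta)}{\epsilon} \;\le\; m_{\mathcal{H}}(\epsilon, \delta) \;\le\; C_2 \, \frac{d \log(1/\epsilon) + \log(1/\delta)}{\epsilon}. \]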
Online Learning:
Lecture 16: Online learning (Section 21.1 in SSBD, Class notes, Recording)
Lecture 17: Littlestone dimension (Section 21.1 in SSBD, Class notes, Recording)
Lecture 18: Weighted Majority algorithm (Sections 1.1, 1.2 in the notes, Class notes, Recording)
Lecture 19: Randomized Weighted Majority (Class notes, Recording); a short code sketch follows this list.
Lecture 20: Discussions on Randomized Weighted Majority and Follow the Leader (FTL) (Avrim Blum's notes, Class notes, Recording)
Lecture 21: FTL and Follow the Regularized Leader (FTRL) (Avrim Blum's notes, Class notes, Recording)
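To make Lectures 19 and 20 concrete, here is a minimal Python sketch of Randomized Weighted Majority for binary prediction with expert advice; the function name and the default penalty parameter eta are illustrative choices, not taken from the lecture notes.

    import random

    def randomized_weighted_majority(expert_preds, outcomes, eta=0.5):
        # expert_preds: one list of 0/1 predictions per round (an entry per expert);
        # outcomes: the true 0/1 label revealed at the end of each round.
        n = len(expert_preds[0])
        weights = [1.0] * n
        my_preds = []
        for preds, y in zip(expert_preds, outcomes):
            # Predict by sampling an expert with probability proportional to its weight.
            chosen = random.choices(range(n), weights=weights)[0]
            my_preds.append(preds[chosen])
            # Multiplicatively penalize every expert that erred this round.
            weights = [w * (1 - eta) if p != y else w for w, p in zip(weights, preds)]
        return my_preds

As discussed in class (and in Avrim Blum's notes), the expected number of mistakes of this scheme is at most roughly (1 + eta) times the best expert's mistakes plus (ln n)/eta.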
Other Topics:
Lecture 22: Boosting (Sections 10.1, 10.2 in SSBD, Class notes)
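To accompany the boosting lecture, here is a minimal AdaBoost sketch in Python over a user-supplied pool of weak hypotheses; the names adaboost, stumps, and predict are illustrative, and the update rule follows Sections 10.1 and 10.2 of SSBD.

    import math

    def adaboost(X, y, stumps, T=50):
        # X: examples; y: labels in {-1, +1}; stumps: weak hypotheses h(x) -> {-1, +1}.
        # Returns (alpha, h) pairs defining the rule sign(sum_t alpha_t * h_t(x)).
        m = len(X)
        D = [1.0 / m] * m  # distribution over the training examples
        ensemble = []
        for _ in range(T):
            # Pick the weak hypothesis with the smallest weighted error under D.
            errs = [sum(d for d, x, yi in zip(D, X, y) if h(x) != yi) for h in stumps]
            t = min(range(len(stumps)), key=lambda i: errs[i])
            eps = max(errs[t], 1e-12)
            if eps >= 0.5:  # no weak hypothesis beats random guessing; stop
                break
            alpha = 0.5 * math.log((1 - eps) / eps)
            h = stumps[t]
            ensemble.append((alpha, h))
            # Reweight: up-weight mistakes, down-weight correctly labeled examples.
            D = [d * math.exp(-alpha * yi * h(x)) for d, x, yi in zip(D, X, y)]
            Z = sum(D)
            D = [d / Z for d in D]
        return ensemble

    def predict(ensemble, x):
        return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1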
In-person classes resume.
Convex Optimization:
Lectures 23, 24: Linear Predictors (Chapter 9 in SSBD)
Lecture 25: Convexity and Lipschitzness (Sections 12.1.1 and 12.1.2 in SSBD)
Lectures 26, 27 and 28: Gradient Descent, Subgradients, Stochastic Gradient Descent (Sections 14.1, 14.2, and 14.3 in SSBD)
Lectures 29, 30 and 31: Stochastic Gradient Descent and its variants (Sections 14.3, 14.4 and 14.5 in SSBD)
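Since Sections 14.1-14.5 of SSBD analyze gradient descent and SGD with an averaged output, here is a minimal Python sketch for one-dimensional least squares; the function name, step size eta, and iteration count T are illustrative.

    import random

    def sgd_linear_regression(data, eta=0.01, T=1000):
        # data: list of (x, y) pairs; model: y is approximated by w * x.
        w, w_sum = 0.0, 0.0
        for _ in range(T):
            x, y = random.choice(data)   # draw a stochastic gradient sample
            grad = 2 * (w * x - y) * x   # gradient of (w*x - y)^2 in w
            w -= eta * grad              # descent step
            w_sum += w
        return w_sum / T                 # return the averaged iterate

Returning the average of the iterates rather than the last one is the output rule analyzed in Chapter 14, where it yields the O(1/\sqrt{T}) convergence guarantee for convex Lipschitz problems.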
Unsupervised Learning:
Lectures 31, 32: Clustering (Sections 22.1, 22.2 in SSBD, Sections 7.1, 7.2 in BHK); a short k-means sketch appears after this list.
Lecture 33: SVD (Sections 3.1, 3.2, 3.3 in BHK)
Lectures 34, 35, 36: SVD continued (Sections 3.5, 3.6, 3.7, 3.8 in BHK). Follow these links (1, 2, 3) for a more conceptual understanding.
Lectures 37, 38, 39: Learning mixtures of Gaussians, the EM algorithm (Sections 24.1, 24.4 in SSBD, Sections 2.1, 2.2 here)
Lectures 40, 41: SVD-based algorithm for learning mixtures of spherical Gaussians (Section 2.2.1 here)
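As promised next to Lectures 31 and 32 above, here is a minimal Python sketch of Lloyd's k-means algorithm; the function name, the random seeding, and the fixed iteration cap are illustrative simplifications of the treatment in SSBD Section 22.2 and BHK Chapter 7.

    import random

    def kmeans(points, k, iters=100, seed=0):
        # points: list of equal-length lists of floats; returns k centroids.
        rng = random.Random(seed)
        centers = [list(c) for c in rng.sample(points, k)]
        for _ in range(iters):
            # Assignment step: attach each point to its nearest center.
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k),
                        key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
                clusters[j].append(p)
            # Update step: move every center to the mean of its cluster.
            new_centers = [[sum(c) / len(cl) for c in zip(*cl)] if cl else centers[j]
                           for j, cl in enumerate(clusters)]
            if new_centers == centers:  # assignments have stabilized
                break
            centers = new_centers
        return centers

Each iteration can only decrease the k-means objective, which is why the procedure terminates; since minimizing the objective exactly is NP-hard, the quality of the output depends on the initial seeding.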
Project Presentations by PhD Students:
Sasmita presented "Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity"
Subham talked about "Incentive Aware PAC Learning"