Advanced Machine Learning (CS461/CSE661) in Spring 2025
Instructor: Anup Bhattacharya
TA: Pinki Pradhan
Course description: Knowingly or unknowingly, each of us uses machine learning in our daily lives, and its impact is only increasing with time. One of the primary goals of this course is to understand how these machine learning algorithms work. We will study machine learning algorithms in different applied contexts, argue why they work, and understand their limitations.
Our approach will be theoretical, and whenever possible we will try to prove mathematical guarantees. For the first three-quarters of the course we will take this view, studying different models used in machine learning and the various algorithms at play. For the last part of the course, we will focus on more recent advances for which we do not yet have a good theoretical understanding.
Prerequisites: Algorithms (CS301). Knowledge of CS460 (Machine Learning) will be beneficial but is not a prerequisite for this course. Since the primary language of the course will be mathematical, students taking this course are expected to be comfortable with mathematical reasoning. In particular, basic knowledge of probability, linear algebra, and some calculus will be absolutely essential. If you are planning to register for this course, please keep in mind that it might be a little heavy, so register only if you are quite comfortable with the prerequisites and/or willing to devote enough time to the course.
Grading: 75% of the credit will come from exams (10% class tests, 25% midsem, 40% endsem); the remaining 25% will be based on a project, for which students will prepare a report and give a presentation at the end of the course. Project topics will be based on some of the major advances in the field.
Class timings:
Tuesday and Thursday, 5:30 PM to 7:00 PM, at M5. We will have a tutorial class on Wednesday at 5:30 PM, also at M5.
Textbooks and other references:
Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David (SSBD) (PDF)
YouTube videos of a course by Shai Ben-David (YouTube link)
Patterns, Predictions, and Actions: A story about machine learning (PDF)
An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani (KV) (Google Books)
Foundations of Data Science (BJK): PDF
Relevant courses elsewhere:
Practice problems: We will add practice problems as the semester progresses.
Lectures:
Lecture 1 (06/01/2025): Introduction, Model of Statistical Learning (Section 2.1 of SSBD)
Lecture 2 (07/01/2025): ERM, ERM with inductive bias, Learning finite hypothesis classes (Section 2.2, 2.3 of SSBD)
Lecture 3 (09/01/2025): Learning finite hypothesis classes, PAC learning, Agnostic PAC (Section 3.1, 3.2 of SSBD)
Lecture 4 (14/01/2025): Agnostic PAC, Learning finite hypothesis classes (Chapter 4 of SSBD). The resulting sample-complexity bounds are stated for reference after the lecture list.
For concentration inequalities, check out these notes: one and two (we will cover some problems on these in the tutorial).
Lecture 5 (16/01/2025): No-free-lunch theorem (we did not prove this theorem in class), Bias-variance tradeoff (Chapter 5 of SSBD)
Lecture 6 (21/01/2025): VC-dimension, Fundamental theorem of Statistical Learning, Sauer's lemma (Section 6.1-6.4 of SSBD)
Lecture 7 (23/01/2025): Proof of Sauer's lemma, Computational aspects of learning theory (Section 6.5.1 of SSBD, Section 8.1 of SSBD)
Lecture 8 (28/01/2025): Computational learning, hardness of learning 3-term DNF formulas (Section 1.4 of KV)
Tutorial (29/01/2025) by Pinki
Lecture 9 (30/01/2025): Hardness of 3-term DNF (Section 1.4, 1.5 of KV)
Lecture 10 (04/02/2025): Learning in the presence of noise (Section 5.1, 5.2 of KV)
Lecture 11 (11/02/2025): Algorithms for learning linear predictors: Perceptron (Chapter 9 of SSBD, up to Section 9.1.2; a short code sketch appears after the lecture list)
Lecture 12 (12/02/2025): Margin perceptron, kernel trick (Section 2 from this note)
Lecture 13 (13/02/2025): Convex optimization (Chapter 12 of SSBD, up to Section 12.1.1)
Lecture 14 (18/02/2025): Gradient descent and its analysis for convex Lipschitz functions (Section 14.1 of SSBD; see the sketch after the lecture list)
Lecture 15 (19/02/2025): Stochastic gradient descent (Section 14.2, 14.3 in SSBD)
Lecture 16 (20/02/2025): Discussed solutions of class test 1
Lecture 17 (11/03/2025): Online learning, mistake bounds, Littlestone dimension (Section 21.1 of SSBD)
Lecture 18 (12/03/2025): Weighted majority, randomized weighted majority (Section 21.2 of SSBD)
Lecture 19 (18/03/2025): Analysis of the randomized weighted majority algorithm (Chapters 14 and 15 of this note, Survey)
Lecture 20 (20/03/2025): Multiplicative weight updates framework and its applications (Chapters 14 and 15 of this note, Survey; a short code sketch appears after the lecture list)
Lecture 21 (08/04/2025): Boosting (Chapter 4 of KV, up to Section 4.3)
Lecture 22 (09/04/2025): Stochastic bandits (Chapter 1 of Slivkins, Sections 1.1 and 1.2)
Lecture 23 (10/04/2025): Adversarial bandits (Chapter 6 of Slivkins, Sections 6.1-6.3)
Lecture 24 (14/04/2025): SVD/PCA (Chapter 2 of BJK). See also these notes (one, two, three). A short code sketch appears below.
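As a quick reference for Lectures 2-4, here are the sample-complexity bounds for finite hypothesis classes (cf. Corollaries 2.3 and 4.6 of SSBD; please check the book for the exact statements):

    % Realizable PAC: with probability at least 1 - \delta, ERM over a finite
    % class H outputs a hypothesis with error at most \epsilon once the sample size is
    m_H(\epsilon, \delta) \le \left\lceil \frac{\log(|H|/\delta)}{\epsilon} \right\rceil
    % Agnostic PAC: via uniform convergence (Hoeffding's inequality plus a union bound),
    m_H(\epsilon, \delta) \le \left\lceil \frac{2 \log(2|H|/\delta)}{\epsilon^2} \right\rceil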
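A minimal Python sketch of the Perceptron from Lecture 11. This is illustrative code, not part of the course materials; the interface (perceptron, max_passes) is invented here. On linearly separable data with margin gamma and radius R, the algorithm makes at most (R/gamma)^2 updates.

    import numpy as np

    def perceptron(X, y, max_passes=100):
        """Perceptron for homogeneous halfspaces: X is (n, d), y has entries in {-1, +1}."""
        w = np.zeros(X.shape[1])
        for _ in range(max_passes):
            mistakes = 0
            for x_i, y_i in zip(X, y):
                if y_i * (x_i @ w) <= 0:   # current w misclassifies (x_i, y_i)
                    w = w + y_i * x_i      # update: rotate w towards the example
                    mistakes += 1
            if mistakes == 0:              # a separating hyperplane has been found
                return w
        return w                           # data may not be separable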
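In the same spirit, a sketch of (sub)gradient descent with iterate averaging, as analyzed in Lecture 14 (Section 14.1 of SSBD); the averaged iterate is the one the convex-Lipschitz O(B*rho/sqrt(T)) guarantee refers to. The l1 example below is an assumption made for illustration.

    import numpy as np

    def gradient_descent(subgrad, dim, T, eta):
        """Run w_{t+1} = w_t - eta * subgrad(w_t) for T steps; return the averaged iterate."""
        w = np.zeros(dim)
        w_bar = np.zeros(dim)
        for _ in range(T):
            w_bar += w / T                 # running average of the iterates
            w = w - eta * subgrad(w)
        return w_bar

    # Example: f(w) = |w[0] - 3| + |w[1] + 1| is convex and Lipschitz;
    # its minimizer is (3, -1), and np.sign(w - target) is a valid subgradient.
    target = np.array([3.0, -1.0])
    w_bar = gradient_descent(lambda w: np.sign(w - target), dim=2, T=10000, eta=0.05)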
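A sketch of the multiplicative weights (Hedge) framework from Lectures 18-20, again illustrative: the setup (a T x n matrix of expert losses in [0, 1]) and all names are assumptions. With eta = sqrt(ln(n)/T), the expected regret against the best fixed expert is O(sqrt(T ln n)).

    import numpy as np

    def multiplicative_weights(losses, eta):
        """losses[t, i] in [0, 1] is the loss of expert i in round t."""
        T, n = losses.shape
        w = np.ones(n)                         # start with uniform weights
        total = 0.0
        for t in range(T):
            p = w / w.sum()                    # play expert i with probability p[i]
            total += p @ losses[t]             # expected loss in round t
            w = w * np.exp(-eta * losses[t])   # shrink the weights of bad experts
        return total

    # Regret check against the best fixed expert on random losses:
    rng = np.random.default_rng(0)
    L = rng.random((1000, 10))
    alg_loss = multiplicative_weights(L, eta=np.sqrt(np.log(10) / 1000))
    best_loss = L.sum(axis=0).min()            # loss of the best expert in hindsight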
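Finally, PCA via the SVD from Lecture 24: center the data, take the top-k right singular vectors as the principal directions, and project onto them. A minimal NumPy sketch; the interface is invented here.

    import numpy as np

    def pca(X, k):
        """Project the rows of X onto the top-k principal directions."""
        Xc = X - X.mean(axis=0)                            # center each feature
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # thin SVD of the centered data
        components = Vt[:k]                                # top-k right singular vectors
        return Xc @ components.T, components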
Schedule of Project presentations:
15th April 2025
5:30 PM: Abhishek Singh and Pritipriya Dasbehera: Multiplicative Updates in Coordination Games and the Theory of Evolution (https://arxiv.org/pdf/1208.3160v1) (Report)
6:00 PM: Nehal Khosla and Upasana Das: Calibrated Language Models Must Hallucinate: https://arxiv.org/pdf/2311.14648 (Report)
6:30 PM Pankaj Kumar: Robustness - Understanding the Failure Modes of Out-of-Distribution Generalization (https://arxiv.org/abs/2010.15775) (Report)
16th April 2025:
5:30 PM: Khushi Sharma and Sajag Kumar: Descent-to-delete: Gradient-based methods for machine unlearning: https://arxiv.org/pdf/2007.02923 (Report)
6:00 PM: Rishi Raj Sahoo: Differential privacy (https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf) (Report)
17th April 2025:
5:30 PM: Kaling Vikram Singh and Sidhant Mahi: Brain computation by assemblies of neurons (https://www.pnas.org/doi/pdf/10.1073/pnas.2001893117) (Report)
6:00 PM: Harisankar Binod and Yash Chauhan: Transformers Learn Shortcuts to Automata (https://arxiv.org/abs/2210.10749) (Report)
6:30 PM: Haribandhu Jena: Language generation in the limit (Neurips 2024): https://arxiv.org/pdf/2404.06757 (Report)
22nd April 2025:
5:30 PM: Suryendu Mondal: The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning: https://arxiv.org/pdf/1712.06559 (Report)
6:00 PM: Rajat Shuvra Saha and Dipendu Bar: Online Learning in Online Auctions: https://www.cs.cmu.edu/~avrim/Papers/onlineauction.pdf (Report)
6:30 PM: Shiva Thanay P. and Sattvik Yadav: Score-Based Generative Modeling through Stochastic Differential Equations: https://arxiv.org/abs/2011.13456 (Report)
23rd April 2025:
5:30 PM: Pradeep Kumar Baisakh and Rishav Das: Active Learning: A general agnostic active learning algorithm: https://www.cs.columbia.edu/~djhsu/papers/cal.pdf (Report)
6:00 PM: Vishal Meena and Prayas Patra: Calibration: Omnipredictors: https://arxiv.org/pdf/2109.05389 (Report)
24th April 2025:
5:30 PM: Nissy William and Ipsita Rout: Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings (https://arxiv.org/pdf/1607.06520.pdf) (Report)
6:00 PM: Vamsi Krishna Taviti: Calibration: Omnipredictors (ITCS 2022) (https://arxiv.org/pdf/2109.05389) (Report)
6:30 PM: Summit Bikram Nayak: Biological random projection: A neural algorithm for a fundamental computing problem (https://www.science.org/doi/pdf/10.1126/science.aam9868?ijkey=aX3uts9Y4xqPE&keytype=ref&siteid=sci) (Report)