[Slides] [Annotated Slides] Reading: MIT short videos and exercises on probability (Unit 4). Khan academy probability lessons (a bit more basic, Units 7 and 9). Chapters 1-3 of Probability and Computing on basic probability: events, random variables, expectation, and variance.
Lecture 2 (09/04 Thu): Linearity of Expectation and Variance. Counting Duplicates. Markov’s Inequality.
[Slides] [Annotated Slides] Reading: Chapters 2.1-2.2 of Probability and Computing on linearity of expectation and Bernoulli (indicator) random variables, and Chapter 3.1 on Markov's inequality.
Lecture 3 (09/09 Tue): Fully Random, Pairwise Independent, and Two-Universal Hash Functions. Collision-Free and Two-Level Hashing.
[Slides] [Annotated Slides] Reading: Chapters 13.1 and 13.3 of Probability and Computing on hash functions. Section 1.3 of notes by Sanjeev Arora and Pravesh Kothari (Princeton) on constructing 2-universal hash functions. Pages 11-23 of slides by Kevin Wayne (Princeton) on asymptotics and big O notation.
Lecture 4 (09/11 Thu): Hashing for Load Balancing. Chebyshev's Inequality. The Union Bound. Chernoff Bounds.
[Slides] [Annotated Slides] Reading: Chapters 3.2 and 3.3 of Probability and Computing on moments and Chebyshev's inequality, and Chapter 4 on Chernoff bounds.
Lecture 5 (09/16 Tue): Fourth Moment Bound. Exponential Concentration Inequalities. Bernstein Inequalities. Chernoff Bounds.
[Slides] [Annotated Slides] Reading: Chapter 4 of Probability and Computing on moment generating functions and Chernoff bounds.
Lecture 6 (09/18 Thu): Applications of Chernoff Bounds. Bloom Filters.
[Slides] [Annotated Slides] Reading: Chapter 4 of Probability and Computing on Chernoff bounds. Chapter 4.3 of Mining of Massive Datasets on Bloom filters. Paper that gave a corrected analysis of Bloom filters.
Lecture 7 (09/23 Tue): Bloom Filters. Count-Min Sketch for Frequent Elements.
[Slides] [Annotated Slides] Reading: Chapters 1 and 5 of notes by Amit Chakrabarti (Dartmouth) on frequent elements.
Lecture 8 (09/25 Thu): Count-Min Sketch for Frequent Elements. Min-Hashing for Counting Distinct Elements.
[Slides] [Annotated Slides] Reading: Website with resources, implementations, and example applications of count-min sketch. Chapter 4.4 of Mining of Massive Datasets on distinct elements counting.
Lecture 9 (09/30 Tue): Analysis of Min-Hashing. The Median Trick. Distinct Elements in Pratice: Flajolet-Martin and HyperLogLog.
[Slides] [Annotated Slides] Reading: Exercise 4.9 of Probability and Computing on median-of-means. The 2007 paper that introduced the HyperLogLog algorithm.
Lecture 10 (10/02 Thu): Jaccard Similarity. Fast Similarity Search. Locality Sensitive Hashing.
[Slides] [Annotated Slides] Reading: Chapter 3 of Mining of Massive Datasets.
Lecture 11 (10/07 Tue): SimHash for Cosine Similarity. The Johnson-Lindenstrauss Lemma.
[Slides] [Annotated Slides] Reading: Chapter 2.7 of Foundations of Data Science and notes by Anupam Gupta (NYU) on the JL lemma.
Lecture 12 (10/09 Thu): Proof of the Johnson-Lindenstrauss Lemma. Application to Clustering.
[Slides] [Annotated Slides] Reading: Paper by Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós on a "sparse" JL transform.
Lecture 13 (10/14 Tue): Midterm Review.
[Slides] Reading: Study guide and practice questions for the Fall 2024 offering of the course.
Lecture 14 (10/16 Thu): Principal Component Analysis. Orthogonal Bases and Projection Matrices. Dual View of Low-Rank Approximation.
[Slides] [Annotated Slides] Reading: Chapter 3 of Foundations of Data Science and Chapter 11 of Mining of Massive Datasets.
Lecture 15 (10/21 Tue): Best Fit Subspace. Optimal Low-Rank Approximation via Eigendecomposition.
[Slides] [Annotated Slides] Reading: Chapter 3 of Foundations of Data Science and Chapter 11 of Mining of Massive Datasets.
Lecture 16 (10/23 Thu): Finish Up PCA. Eigenvalues as a Measure of Low-Rank Approximation Error. Linear Algebra Review.
[Slides] [Annotated Slides] Reading: Chapter 3 of Foundations of Data Science and Chapter 11 of Mining of Massive Datasets.