[Slides] [Annotated Slides] Reading: MIT short videos and exercises on probability (Unit 4). Khan academy probability lessons (a bit more basic, Units 7 and 9). Chapters 1-3 of Probability and Computing for basic probability: events, random variables, expectation, and variance.
Lecture 2 (09/04 Thu): Linearity of Expectation and Variance. Counting Duplicates. Markov’s Inequality.
[Slides] [Annotated Slides] Reading: Chapters 2.1-2.2 of Probability and Computing for linearity of expectation and Bernoulli (indicator) random variables, and Chapter 3.1 for Markov's inequality.
Lecture 3 (09/09 Tue): Fully Random, Pairwise Independent, and Two-Universal Hash Functions. Collision-Free and Two-Level Hashing.
[Slides] [Annotated Slides] Reading: Chapters 13.1 and 13.3 of Probability and Computing for hash functions. Section 1.3 of notes by Sanjeev Arora and Pravesh Kothari at Princeton for constructing 2-universal hash functions. Pages 11-23 of slides by Kevin Wayne at Princeton for asymptotics and big O notation.
Lecture 4 (09/11 Thu): Hashing for Load Balancing. Chebyshev's Inequality. The Union Bound. Chernoff Bounds.
[Slides] [Annotated Slides] Reading: Chapters 3.2 and 3.3 of Probability and Computing for moments and Chebyshev's inequality, and Chapter 4 for Chernoff bounds.
Lecture 5 (09/16 Tue): Fourth Moment Bound. Exponential Concentration Inequalities. Bernstein Inequalities. Chernoff Bounds.
[Slides] [Annotated Slides] Reading: Chapter 4 of Probability and Computing for moment generating functions and Chernoff bounds.
Lecture 6 (09/18 Thu): Applications of Chernoff Bounds. Bloom Filters.
[Slides] [Annotated Slides] Reading: Chapter 4 of Probability and Computing for Chernoff bounds. Chapter 4.3 of Mining of Massive Datasets for Bloom filters. Paper that gave a corrected analysis of Bloom filters.
Lecture 7 (09/23 Tue): Bloom Filters. Count-Min Sketch for Frequent Elements.
[Slides] [Annotated Slides] Reading: Chapters 1 and 5 of notes by Amit Chakrabarti at Dartmouth for frequent elements.
Lecture 8 (09/25 Thu): Count-Min Sketch for Frequent Elements. Min-Hashing for Counting Distinct Elements.
[Slides] [Annotated Slides] Reading: Website for resources, implementations, and example applications of count-min sketch. Chapter 4.4 of Mining of Massive Datasets for distinct elements counting.
Lecture 9 (09/30 Tue): Analysis of Min-Hashing. The Median Trick. Distinct Elements in Pratice: Flajolet-Martin and HyperLogLog.
[Slides] [Annotated Slides] Reading: Exercise 4.9 of Probability and Computing for median-of-means. The 2007 paper that introduced the HyperLogLog algorithm.
Lecture 10 (10/02 Thu): Jaccard Similarity. Fast Similarity Search. Locality Sensitive Hashing.
Reading: Chapter 3 of Mining of Massive Datasets.