Thu, 11/09: Course Introduction: Big Data, Topics, Tools
Tue, 30/09: Probability: Probability Spaces, Union Bound, Monty Hall problem, Independent Events, Verification of Polynomials
Thu, 02/10: Probability: Conditional Probability, Law of Total Probability, Bayes' Law, Karger's Min-Cut Algorithm, Karger-Stein's Algorithm
Tue, 07/10: Probability: Random Variables, Expectation (Linearity of Expectation, Conditional Expectation); Bernoulli, Binomial, Geometric Random Variables; Coupon Collector's problem
Thu, 09/10: Probability: Markov Inequality; Variance, Covariance, Chebyshev Inequality; Chernoff Bounds (for Binomial Random Variables)
Tue, 14/10: Streaming: Model, Reservoir Sampling, Bloom Filters
[MMD, Chapter 4.1, 4.2, 4.3], [AMD, Chapter 3.1], [Reservoir sampling: lecture notes]
Thu, 16/10: Streaming: Bloom Filters (cont.), Morris Counter, Median of Means
[MMD, Chapter 4.3], [AMD, Chapters 3.1, 5.5, 2.2.4], [DSA, Chapter 4]
Tue, 21/10: Streaming: Frequency Estimation (Majority, Misra-Gries, Count-Min Sketch)
[AMD, Chapters 5.3], [DSA, Chapters 1, 5.4, 5.5]
Thu, 23/10: Streaming: Count-Min Sketch (cont.); Cardinality Estimation: Flajolet-Martin's LogLog Algorithm
[MMD, Chapter 4.4], [AMD, Chapters 5.4], [DSA, Chapter 2]
Tue, 28/10: Streaming: Count 1's: DGIM and Sum Integers; Homework 1 Solutions
[MMD, Chapter 4.6], [AMD, Chapters 5.2]
DGIM simulator [website]
Homework 1: Cover's paradox [paper]
Thu, 30/10: Market-Basket Model: Association Rules, Frequent Itemsets Algorithms (Apriori, SON)
Tue, 04/11: Market-Basket Model: Frequent Itemsets Algorithms (SON, Random Sampling, Toivonen)
[MMD, Chapter 6.4]
ARMED Testing [AST 22]
Thu, 06/11: Similarity Search: Hash Functions (Uniform, k-wise Independent, Universal), Metric Spaces (Distances: Minkowski, Jaccard, Hamming, Cosine, Edit)