databases

    • Spectral Bloom filters. Saar Cohen, Yossi Matias. SIGMOD 03.

      • This paper independently introduces a Count-Min-like sketch for multisets (under the name "spectral Bloom filters"), and presents a number of applications in databases. Also, some heuristics for improving the accuracy of the sketch are discussed.

    • Statistical Analysis of Sketch Estimators. Florin Rusu, Alin Dobra. SIGMOD 07.

Performs a thorough analysis of sketch methods, primarily for the task of join size estimation (inner product), and concludes that Count-Min sketch is competitive across a variety of input settings.

"In this paper, we set out to perform a detailed study of the statistical and empirical behavior of the four basic sketching techniques that have been proposed in the research literature for computing size of join and related problems: AGMS, Fast-AGMS, Count-Min, and Fast-Count sketches."

Considers how to find significant counts in a communication efficient way over distributed data, and compares to Count-Min sketch as a compact solution to this problem.

Discusses techniques for frequent itemset mining on streaming data.

“Our modified BIRCH algorithm uses count-min sketches to approximate the linear sum component of a cluster’s statistics and to essentially estimate the distance metric for the insertion of new points.”

Observes that Count-Min sketch can be used to approximate quantiles of data within a distributed sensor (tree) network.

Uses a Count-Min sketch to build an approximation of the distribution of the frequencies of the itemsets in a dataset. The innovative contribution is a method for building this approximation without querying the sketch for each itemset.