databases
Spectral Bloom filters. Saar Cohen, Yossi Matias. SIGMOD 03.
This paper independently introduces a Count-Min-like sketch for multisets (under the name "spectral Bloom filters"), and presents a number of applications in databases. Also, some heuristics for improving the accuracy of the sketch are discussed.
Statistical Analysis of Sketch Estimators. Florin Rusu, Alin Dobra. SIGMOD 07.
Performs a thorough analysis of sketch methods, primarily for the task of join size estimation (inner product), and concludes that Count-Min sketch is competitive across a variety of input settings.
"In this paper, we set out to perform a detailed study of the statistical and empirical behavior of the four basic sketching techniques that have been proposed in the research literature for computing size of join and related problems: AGMS, Fast-AGMS, Count-Min, and Fast-Count sketches."
Graph-based synopses for relational selectivity estimation. Joshua Spiegel, Neoklis Polyzotis. SIGMOD 06.
Approximate Counts and Quantiles over Sliding Windows. Arvind Arasu, Gurmeet Singh Manku. PODS 04.
Finding global icebergs over distributed data sets. Qi (George) Zhao, Mitsunori Ogihara, Haixun Wang, Jun (Jim) Xu. PODS 06.
Considers how to find significant counts in a communication efficient way over distributed data, and compares to Count-Min sketch as a compact solution to this problem.
False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams. Wei Wang, Haifeng Jiang, Hongjun Lu, Jeffrey Xu Yu, VLDB 2004.
Discusses techniques for frequent itemset mining on streaming data.
Graph-based synopses for relational selectivity estimation, Joshua Spiegel, Neoklis Polyzotis, SIGMOD 2006.
“Our modified BIRCH algorithm uses count-min sketches to approximate the linear sum component of a cluster’s statistics and to essentially estimate the distance metric for the insertion of new points.”
Power-Conserving Computation of Order-Statistics over Sensor Networks, Michael Greenwald, Sanjeev Khanna PODS 2004.
Observes that Count-Min sketch can be used to approximate quantiles of data within a distributed sensor (tree) network.
Mining Top-K Frequent Itemsets Through Progressive Sampling, Andrea Pietracaprina, Matteo Riondato, Eli Upfal, Fabio Vandin. Data Mining and Knowledge Discovery, Volume 21, Number 2, 2010.
Uses a Count-Min sketch to build an approximation of the distribution of the frequencies of the itemsets in a dataset. The innovative contribution is a method for building this approximation without querying the sketch for each itemset.