databases

- Spectral Bloom filters. Saar Cohen, Yossi Matias. SIGMOD 03.
  - This paper independently introduces a Count-Min-like sketch for multisets (under the name "spectral Bloom filters"), and presents a number of applications in databases. Also, some heuristics for improving the accuracy of the sketch are discussed.
- Statistical Analysis of Sketch Estimators. Florin Rusu, Alin Dobra. SIGMOD 07.

Performs a thorough analysis of sketch methods, primarily for the task of join size estimation (inner product), and concludes that Count-Min sketch is competitive across a variety of input settings.

"In this paper, we set out to perform a detailed study of the statistical and empirical behavior of the four basic sketching techniques that have been proposed in the research literature for computing size of join and related problems: AGMS, Fast-AGMS, Count-Min, and Fast-Count sketches."

- Graph-based synopses for relational selectivity estimation. Joshua Spiegel, Neoklis Polyzotis. SIGMOD 06.
- Approximate Counts and Quantiles over Sliding Windows. Arvind Arasu, Gurmeet Singh Manku. PODS 04.
- Finding global icebergs over distributed data sets. Qi (George) Zhao, Mitsunori Ogihara, Haixun Wang, Jun (Jim) Xu. PODS 06.

Considers how to find significant counts in a communication efficient way over distributed data, and compares to Count-Min sketch as a compact solution to this problem.

False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams. Wei Wang, Haifeng Jiang, Hongjun Lu, Jeffrey Xu Yu, VLDB 2004.

Discusses techniques for frequent itemset mining on streaming data.

- Graph-based synopses for relational selectivity estimation, Joshua Spiegel, Neoklis Polyzotis, SIGMOD 2006.

“Our modified BIRCH algorithm uses count-min sketches to approximate the linear sum component of a cluster’s statistics and to essentially estimate the distance metric for the insertion of new points.”

Power-Conserving Computation of Order-Statistics over Sensor Networks, Michael Greenwald, Sanjeev Khanna PODS 2004.

Observes that Count-Min sketch can be used to approximate quantiles of data within a distributed sensor (tree) network.

Mining Top-K Frequent Itemsets Through Progressive Sampling, Andrea Pietracaprina, Matteo Riondato, Eli Upfal, Fabio Vandin. Data Mining and Knowledge Discovery, Volume 21, Number 2, 2010.

Uses a Count-Min sketch to build an approximation of the distribution of the frequencies of the itemsets in a dataset. The innovative contribution is a method for building this approximation without querying the sketch for each itemset.

Page updated

Google Sites

Report abuse