databases

  • Spectral Bloom filters. Saar Cohen, Yossi Matias. SIGMOD 03. 
    This paper independently introduces a Count-Min-like sketch for multisets (under the name "spectral Bloom filters"), and presents a number of applications in databases. Also, some heuristics for improving the accuracy of the sketch are discussed.
Performs a thorough analysis of sketch methods, primarily for the task of join size estimation (inner product), and concludes that Count-Min sketch is competitive across a variety of input settings.
"In this paper, we set out to perform a detailed study of the statistical and empirical behavior of the four basic sketching techniques that have been proposed in the research literature for computing size of join and related problems: AGMS, Fast-AGMS, Count-Min, and Fast-Count sketches."
Considers how to find significant counts in a communication efficient way over distributed data, and compares to Count-Min sketch as a compact solution to this problem.
Discusses techniques for frequent itemset mining on streaming data.
“Our modified BIRCH algorithm uses count-min sketches to approximate the linear sum component of a cluster’s statistics and to essentially estimate the distance metric for the insertion of new points.”
Observes that Count-Min sketch can be used to approximate quantiles of data within a distributed sensor (tree) network.
Uses a Count-Min sketch to build an approximation of the distribution of the frequencies of the itemsets in a dataset. The innovative contribution is a method for building this approximation without querying the sketch for each itemset.

Comments