Recent site activity

  • Home
    edited by Paulo Sérgio Almeida
    attachment from Paulo Sérgio Almeida
    attachment removed by Paulo Sérgio Almeida
  • View All

Home

Scalable Bloom Filters

Bloom Filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability. Scalable Bloom Filters are a variant of Bloom Filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability. Scalable Bloom Filters are described in the paper:

Scalable Bloom Filters
Paulo Sérgio Almeida, Carlos Baquero, Nuno Preguiça, David Hutchison
Information Processing Letters
Volume 101, Issue 6, 31 March 2007, Pages 255-261

Implementation in Erlang

An implementation en Erlang (module bloom.erl), which provides both Bloom Filters (partitioned variant) and Scalable Bloom Filters. This module assumes the existence of a module called bitarray so that  different alternatives may be provided. Two examples of module bitarray are shown, one with update in-place using hipe_bifs (very efficient), and a purely functional in the spirit of Erlang using the array module. The implementation uses bit arrays dimensioned as a power of 2 to enable reusing hash values across filters through bit operations. Double hashing is used (no need for enhanced double hashing for partitioned bloom filters). Overall it should be a quite efficient implementation, specially when using hipe_bifs bitarrays.

Č
ċ
ď
bitarray.erl-array
(0k)
Paulo Sérgio Almeida,
May 13, 2009, 3:40 PM
ċ
ď
bitarray.erl-hipe_bifs
(0k)
Paulo Sérgio Almeida,
May 13, 2009, 3:40 PM
ċ
ď
bloom.erl
(6k)
Paulo Sérgio Almeida,
May 13, 2009, 3:37 PM
Comments