The Bloom Paradox in the Counting Bloom Filter


The Bloom Paradox

Ori Rottenstreich

Joint work with Isaac Keslassy

Technion, Israel

Problem Definition

[Figure: the user reaches the local cache $S$, which holds x and z, at cost = 1; reaching the central memory $M$, which holds all elements (x, y, z), costs 10.]

• Requirement: A data structure at the user with a fast answer to the membership query "Is $x \in S$?"

• Solutions:

o O(n) – Searching in a list

o O(log(n)) – Searching in a sorted list

o O(1) – But with false positives / negatives

Two Possible Errors

• False Positive: $y \notin S$, but the data structure answers $y \in S$.

• Results in a redundant access to the local cache.

→ Additional cost of 1.

• False Negative: $x \in S$, but the data structure answers $x \notin S$.

• Results in an expensive access to the central memory instead of the local cache.

→ Additional cost of 10 - 1 = 9.

Bloom Filters (Bloom, 1970)

• Initialization: Array of $m$ zero bits.

• Insertion: Each of the $n$ elements is hashed $k$ times, and the corresponding bits are set.

• Query: Hashing the element and checking that all $k$ bits are set.

[Figure: bit arrays showing the insertion of x and y, and the queries for z and w.]

• False positive rate (probability) of approximately $\left(1 - e^{-kn/m}\right)^{k}$.

• No false negatives.
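The three operations above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the class name `BloomFilter`, the salted SHA-256 hashing scheme, and the helper `false_positive_rate` are all assumptions made for the example, and the rate uses the standard approximation $(1 - e^{-kn/m})^k$.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of m zero bits

    def _positions(self, item: str) -> list[int]:
        # Illustrative hashing only: salt the item with the hash index i.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        # Insertion: set the k bits the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: positive only if all k bits are set (false positives possible).
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation (1 - e^{-kn/m})^k for n elements in m bits."""
    return (1.0 - math.exp(-k * n / m)) ** k


bf = BloomFilter(m=1000, k=3)
bf.insert("x")
bf.insert("y")
print(bf.query("x"), bf.query("y"))
```

Inserted elements always query positive, which is the "no false negatives" property; an element that was never inserted can still query positive if its bits were set by others.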


Bloom Filters are Widely Used

• Cache/Memory Framework

• Packet Classification

• Intrusion Detection

• Routing

• Accounting

• Beyond networking: Spell Checking, DNA Classification

• Can be found in

o Google's web browser Chrome

o Google's database system BigTable

o Facebook's distributed storage system Cassandra

o Mellanox's IB Switch System

The Bloom Paradox

Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

Outline

• Introduction to Bloom Filters

• The Bloom Paradox

o The Bloom Paradox in Bloom Filters

o Analysis of the Bloom Paradox

o The Bloom Paradox in the Counting Bloom Filter

• Summary

Bloom Paradox Example


• Parameters:

• Extreme case without locality: All elements with equal probability of belonging to the cache.

o Toy example


• Let $B$ be the set of elements that the Bloom filter indicates are in $S$.

o In particular, $S \subseteq B$, since the Bloom filter has no false negatives.

• Intuition:

[Figure: the user reaches the local cache $S$ (holding x, z) at cost = 1 and the central memory $M$, which holds all elements (x, y, z, u, v), at cost = 10.]

• Surprise:

[Figure: the set $B$ reported by the Bloom filter, shown together with the local cache $S$ and the central memory $M$.]

The Bloom filter indicates the membership of $|B|$ elements. Only $|S|$ of them are indeed in $S$.

Bloom Paradox Example

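• When the Bloom filter states that $x \in S$, it is wrong with probability $1 - |S|/|B|$, the fraction of the reported set $B$ that lies outside $S$.

• Average cost if we listen to the Bloom filter: $1 + 10\,(1 - |S|/|B|)$

• Average cost if we don't: $10$, which is smaller whenever $|S|/|B| < 1/10$

The Bloom filter is useless!

Don't listen to the Bloom filter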
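The comparison above can be replayed numerically. The sketch below uses the costs from the cache example (1 for the local cache, 10 for the central memory) and assumes all elements are equally likely to be queried. The function name `bloom_paradox_toy` and the numbers in the call (10^7 elements in memory, 10^3 in the cache, a false positive rate of 10^-3) are hypothetical, chosen only for illustration; they are not the parameters of the toy example on the slide.

```python
def bloom_paradox_toy(total: int, cached: int, fpr: float,
                      cache_cost: float = 1.0, memory_cost: float = 10.0) -> dict:
    """Average cost of a positively answered query, with and without the filter."""
    false_positives = fpr * (total - cached)  # expected non-members reported as members
    reported = cached + false_positives       # expected size of B (no false negatives)
    p_wrong = false_positives / reported      # probability that a positive answer is wrong
    # Listening: always pay for the cache, and also for the memory when the filter was wrong.
    cost_listen = cache_cost + p_wrong * memory_cost
    # Not listening: go straight to the central memory.
    cost_ignore = memory_cost
    return {"reported": reported, "p_wrong": p_wrong,
            "cost_listen": cost_listen, "cost_ignore": cost_ignore}


# Hypothetical numbers, for illustration only.
result = bloom_paradox_toy(total=10**7, cached=10**3, fpr=1e-3)
print(result)
```

With these assumed numbers, roughly nine out of ten positive answers are wrong, so listening to the filter costs slightly more on average than going straight to the central memory.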


Costs of the Two Possible Errors

• The cost of a false positive: 1

• The cost of a false negative: $\alpha$

• In the cache example: $\alpha = 10 - 1 = 9$

Conditions for the Bloom Paradox

• Let $\Pr(x \in S)$ be the a priori membership probability of $x$

o i.e. before getting the answer of the Bloom filter

• Intuition: The Bloom paradox occurs more often when:

o the false-negative cost $\alpha$ is small

o the false positive rate $\mathrm{FPR}$ is large (i.e. the number of bits per element $m/n$ is small)

o $\Pr(x \in S)$ is small (because the Bloom filter implicitly assumes otherwise)

[Figure: local cache, Bloom filter, central memory.]

• Theorem 1: The Bloom paradox occurs if and only if

$$\Pr(x \in S) < \frac{\mathrm{FPR}}{\alpha + \mathrm{FPR}}$$
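The condition can be checked with a short calculation. The sketch below is a reconstruction from the cost model of the earlier slides (false positive cost 1, false negative cost `alpha`), not code from the talk; the function names are hypothetical. It computes the a posteriori probability after a positive answer with Bayes' rule and compares the two expected costs directly, so the threshold falls out of the comparison rather than being assumed.

```python
def posterior_given_positive(prior: float, fpr: float) -> float:
    """Pr(x in S | filter says yes), by Bayes' rule with no false negatives."""
    return prior / (prior + (1.0 - prior) * fpr)


def paradox_threshold(fpr: float, alpha: float) -> float:
    """A priori probability below which a positive answer is not worth acting on."""
    return fpr / (alpha + fpr)


def bloom_paradox_occurs(prior: float, fpr: float, alpha: float) -> bool:
    # Acting on a positive answer wastes cost 1 when the element is absent;
    # ignoring it wastes cost alpha when the element is present.
    posterior = posterior_given_positive(prior, fpr)
    return posterior * alpha < (1.0 - posterior)


# Cache example: alpha = 9; the false positive rate 1e-3 is an assumed value.
print(paradox_threshold(fpr=1e-3, alpha=9.0))
print(bloom_paradox_occurs(prior=1e-4, fpr=1e-3, alpha=9.0))
```

The direct cost comparison and the closed-form threshold agree, which is a quick sanity check on the algebra.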

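Bloom Filter Improvements

• Theorem 1: The Bloom paradox occurs if and only if

$$\Pr(x \in S) < \frac{\mathrm{FPR}}{\alpha + \mathrm{FPR}}$$

where $\mathrm{FPR}$ is the false positive rate and $\alpha$ is the cost of a false negative.

• Use the formula to improve the Bloom filter

o Only insert / query Bloom filter if the formula expects it to be useful

[Figure: local cache, Bloom filter, central memory.]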
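Selective insertion and selective query could be wrapped around any filter roughly as follows. This is a sketch under assumptions: the names `SelectiveFilter`, `ExactSetFilter` and `prior_of` are hypothetical, the per-element a priori probability is assumed to be available through a callback, and the threshold `fpr / (alpha + fpr)` is the one derived from the slides' cost model. `ExactSetFilter` is only a stand-in so the example runs on its own; a real Bloom filter would take its place.

```python
from typing import Callable, Protocol


class MembershipFilter(Protocol):
    def insert(self, item: str) -> None: ...
    def query(self, item: str) -> bool: ...


class ExactSetFilter:
    """Stand-in filter for the demo; a real Bloom filter would go here."""

    def __init__(self) -> None:
        self.items: set[str] = set()

    def insert(self, item: str) -> None:
        self.items.add(item)

    def query(self, item: str) -> bool:
        return item in self.items


class SelectiveFilter:
    """Only inserts / queries elements for which the filter is expected to help."""

    def __init__(self, inner: MembershipFilter, prior_of: Callable[[str], float],
                 fpr: float, alpha: float) -> None:
        self.inner = inner
        self.prior_of = prior_of
        self.threshold = fpr / (alpha + fpr)  # assumed threshold, see lead-in

    def _useful(self, item: str) -> bool:
        return self.prior_of(item) >= self.threshold

    def insert(self, item: str) -> None:
        if self._useful(item):  # selective insertion
            self.inner.insert(item)

    def query(self, item: str) -> bool:
        if not self._useful(item):  # selective query: skip the filter entirely
            return False
        return self.inner.query(item)


# Hypothetical priors: one frequently cached element, one rarely cached element.
priors = {"hot": 0.5, "cold": 1e-6}
selective = SelectiveFilter(ExactSetFilter(), priors.__getitem__, fpr=1e-3, alpha=9.0)
selective.insert("hot")
selective.insert("cold")
print(selective.query("hot"), selective.query("cold"))
```

Skipping low-probability elements at insertion also lowers the load on the filter, at the price of deliberate false negatives for those elements.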



Counting Bloom Filters (CBFs)

• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.

[Figure: elements x and y hashed into a 12-bit array, illustrating how resetting bits on deletion can cause a false negative.]

• The solution: Counting Bloom filters, which store an array of counters instead of bits.

o Insertion: Incrementing counters by one.

o Deletion: Decrementing counters by one.

o Query: Checking that counters are positive.

[Figure: inserting x and y increments their counters by one, giving the counter array 0 1 0 1 0 0 2 0 1 0 1 0.]

• The same false positive probability.

• Require too much memory, e.g. 57 bits per element for a false positive rate of $10^{-3}$.
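A counting Bloom filter differs from the plain filter only in what each cell stores. The sketch below is illustrative rather than the talk's implementation: the class name, the salted SHA-256 hashing and the helper `cbf_bits_per_element` are assumptions. The memory estimate further assumes 4-bit counters and an optimally chosen number of hash functions; under those assumptions a false positive rate of 10^-3 gives about 57.5 bits per element, in line with the figure of 57 quoted above.

```python
import hashlib
import math


class CountingBloomFilter:
    """Counting Bloom filter sketch: m counters, k hash positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Illustrative hashing only: salt the item with the hash index i.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: str) -> None:
        # Only valid for items that were inserted earlier.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def counters_of(self, item: str) -> list[int]:
        """Values of the k counters the item hashes to."""
        return [self.counters[pos] for pos in self._positions(item)]


def cbf_bits_per_element(fpr: float, counter_bits: int = 4) -> float:
    """Memory per element, assuming an optimal k and counter_bits bits per counter."""
    return counter_bits * math.log2(1.0 / fpr) / math.log(2)


cbf = CountingBloomFilter(m=1000, k=3)
cbf.insert("x")
cbf.insert("y")
cbf.delete("y")
print(cbf.query("x"), cbf_bits_per_element(1e-3))
```

Deleting y leaves every counter of x positive, so x is still reported as a member, which is exactly what resetting bits in a plain Bloom filter cannot guarantee.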

Counting Bloom Filter Query

• Query

o Checking that counters are positive.

[Figure: counter array 0 1 0 2 5 0 1 8 3 0 2 1, with the counters that y and z hash to marked.]

o Question: Which is more likely to be correct? y or z?

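The Bloom Paradox in the Counting Bloom Filter

• Theorem 2:

Let $C = (c_1, \dots, c_k)$ denote the values of the counters pointed to by the set of $k$ hash functions. Then the a posteriori membership probability $\Pr(x \in S \mid C)$ depends on the counters only through their product $\prod_{i=1}^{k} c_i$.

Only the counters' product matters!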

CBF Based Membership Probability

- Before checking the CBF, a priori membership probability ≈ 0.03

- CBF indicates counters product = 8

→ a posteriori membership probability ≈ 0.69

• Parameters: n = 3328, m = 28485, k = 6
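One way to see why only the counters' product matters is a Poisson approximation. This is an assumption of the sketch, not the exact expression of Theorem 2. If each counter is roughly Poisson with mean $nk/m$, a member raises each of its $k$ counters by one, so observing a value $c$ is about $c / (nk/m)$ times more likely for a member than for a non-member, and the $k$ ratios multiply. The function name `cbf_posterior` is hypothetical, and the prior is assumed to lie strictly between 0 and 1.

```python
import math


def cbf_posterior(prior: float, counters: list[int], n: int, m: int, k: int) -> float:
    """Approximate Pr(x in S | counters) for a counting Bloom filter."""
    product = math.prod(counters)
    if product == 0:
        return 0.0  # a zero counter rules membership out (no false negatives)
    load = n * k / m  # expected value of each counter
    # Poisson approximation: the likelihood ratio of the observed counters
    # depends on them only through their product.
    likelihood_ratio = product / load ** k
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


# Slide parameters; the prior is the rounded value 0.03 given above.
p = cbf_posterior(prior=0.03, counters=[1, 1, 2, 1, 2, 2], n=3328, m=28485, k=6)
print(round(p, 2))
```

With the rounded prior of 0.03 and a counters product of 8, the approximation gives roughly 0.68, close to the 0.69 quoted above. The small gap is plausibly due to the rounding of the prior or to the approximation itself.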

Experimental Results

• Internet trace (equinix-chicago) with real hash functions.

Counting Bloom filter parameters: $n = 2^{10}$, $m/n = 30$, $k = 5$, $2^{20}$ queries

Concluding Remarks

• Discovery of the Bloom paradox

• Importance of the a priori membership probability

• Using the counters product to estimate the correctness of a positive indication of the CBF


Thank You
