The Bloom Paradox
Ori Rottenstreich
Joint work with Yossi Kanizo and Isaac Keslassy
Technion, Israel

Problem Definition
[Figure: a user queries elements x and y. Accessing the local cache S costs 1; accessing the central memory M, which holds all elements (x, y, z, u, v), costs 10.]
• Requirement: A data structure at the user that quickly answers whether a queried element is in the local cache S.
• Solutions:
  o O(n) – Searching in a list
  o O(log(n)) – Searching in a sorted list
  o O(1) – But with false positives / negatives

Two Possible Errors
• False Positive: the element is not in S, but the data structure answers that it is.
  o Results in a redundant access to the local cache. Additional cost of 1.
• False Negative: the element is in S, but the data structure answers that it is not.
  o Results in an expensive access to the central memory instead of the local cache. Additional cost of 10-1=9.

Bloom Filters (Bloom, 1970)
• Initialization: An array of zero bits.
• Insertion: Each element is hashed multiple times, and the corresponding bits are set.
• Query: Hash the element and check that all the corresponding bits are set.
[Figure: a bit array with elements x, y, z and w hashed into it]
• False positive rate (probability) of [...]
• No false negatives

Bloom Filters are Widely Used
• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in
  o Google's web browser Chrome
  o Google's database system BigTable
  o Facebook's distributed storage system Cassandra
  o Mellanox's IB Switch System

Outline
• Introduction to Bloom Filters
• The Bloom Paradox
• The Variable-Increment Counting Bloom Filter

The Bloom Paradox
Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

Example
• Parameters: [...]
• Extreme case without locality: All elements have an equal probability of belonging to the cache.
  o Toy example

The Bloom Paradox
• Parameters: [...]
• Let B be the set of elements that the Bloom filter indicates are in S
  o In particular, no false negatives → S ⊆ B
• Intuition: [...]
• Surprise: [...]
[Figure: the user consults the Bloom filter. The local cache S costs 1, the central memory M with all elements costs 10, and B is the set of elements flagged by the Bloom filter.]
• The Bloom filter indicates the membership of [...] elements. Only [...] of them are indeed in S.

The Bloom Paradox
• When the Bloom filter states that an element is in S, it is wrong with probability [...]
• Average cost if we listen to the Bloom filter: [...]
• Average cost if we don't: [...]
The Bloom filter is useless! Don't listen to the Bloom filter.

Outline
• Introduction to Bloom Filters
• The Bloom Paradox
• The Variable-Increment Counting Bloom Filter

Counting Bloom Filters (CBFs)
• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
[Figure: a bit array with elements x and y hashed into it]
• The solution: Counting Bloom filters, which store an array of counters instead of bits.
  o Insertion: Incrementing counters by one.
  o Deletion: Decrementing counters by one.
  o Query: Checking that counters are positive.
[Figure: a counter array; inserting x and y adds +1 to each of their hashed counters]
• The same false positive probability.
• Require too much memory, e.g. 57 bits per element for [...].

Intuition for Variable Increments
• Upon query, we should consider the exact values of the counters and not just their positiveness
[Figure: a counter array with values 0 1 0 2 5 0 1 8 3 0 2 1, queried for elements y and z]
• Can we design a deterministic scheme that exploits the exact values of the counters?
• Idea: Use variable increments to encode the element identity

Architecture
• Each hash entry contains a pair of counters:
  o c1, fixed increments → number of elements in entry (as in CBF)
  o c2, variable increments → weighted sum of elements
  o weights from a pre-determined set [...]
• We use two sets of hash functions:
  o The first set uses hash functions with range [...], i.e. it points to the set of entries.
  o The second set uses hash functions with range [...], i.e. it points to the set [...].

entry   1   2   3   4   5   6   7   8   9
c1      0   5   3   2   2   3   3   3   4
c2      0  34  25  26  17  21   9   6  26

Insertion
• Insertion: At each entry [...], the two counters are updated as follows.
  o c1 is incremented by one
  o c2 is incremented by a variable increment from the set [...]
• Example 1:
[Figure: the counter table (entries 1 to 9, counters c1 and c2) before and after inserting elements x and z, with variable increments +8, +4, +13 and +4]

Query
• Query for y, with variable increments 4 and 8

entry   1   2   3   4   5   6   7   8   9
c1      0   5   3   2   3   3   4   3   4
c2      0  34  25  17  30  21  30  13  26

• We ask whether
  o 17 can be a sum of 2 elements from the set [...] including 4
  o 30 can be a sum of 3 elements from the set [...] including 8
• No: [...]
• How should we pick the set of variable increments? We should use Bh Sequences!

Bh Sequences
• Definition 1: Let [...] be a sequence of positive integers. Then [...] is a Bh sequence iff all the sums [...] with [...] are distinct.
• Example 2: [...] All the [...] sums of [...] elements of [...] are distinct: [...] Therefore, [...] is a [...] sequence.
• Bh sequences are widely used in error-correcting codes.

The Bh-CBF Scheme: Query
• Example 3: [...] is a [...] sequence

entry   1   2   3   4   5   6   7   8   9
c1      0   5   3   2   3   3   4   3   4
c2      0  34  25  17  30  21  30  13  26

[Figure: the table is probed for three queries, with variable increments (1, 4), then (4, 8) for y, then (4, 13) for z]
  o Since [...], the Bh-CBF can determine that [...]
  o Here, [...] and then necessarily [...], so the Bh-CBF can determine that [...]
  o Since [...]
, the Bh-CBF cannot exclude that [...]

Experimental Results
• Internet trace (equinix-chicago) with real hash functions. For the Bh-CBF, [...] (with [...]).

Concluding Remarks
• The Bloom Paradox
  o Discovery of the Bloom paradox
  o Importance of the a priori membership probability
• The Variable-Increment Counting Bloom Filter
  o Can extend many variants of the counting Bloom filter
  o First time Bh sequences are presented in networking applications

Thank You
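
Backup: Sketch of the Variable-Increment Query
The per-entry test from the Query slide ("can 17 be a sum of 2 elements from the set including 4?") can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the increment set D = (1, 2, 4, 8, 13) and the helper names is_bh_sequence, entry_may_contain and query are assumptions made for this example, and the brute-force search stands in for whatever lookup a real Bh-CBF would use.

```python
from itertools import combinations_with_replacement

# Hypothetical set of variable increments, chosen for illustration only.
D = (1, 2, 4, 8, 13)


def is_bh_sequence(d, h):
    """Return True iff all sums of h elements of d (repeats allowed) are distinct."""
    sums = [sum(combo) for combo in combinations_with_replacement(d, h)]
    return len(sums) == len(set(sums))


def entry_may_contain(c1, c2, increment, d=D):
    """Check one entry: can c2 be a sum of c1 elements of d that includes `increment`?"""
    if c1 == 0:
        return False  # empty entry: nothing was ever inserted here
    rest = c2 - increment
    # Brute force: the other c1 - 1 elements must add up to `rest`.
    return any(sum(combo) == rest
               for combo in combinations_with_replacement(d, c1 - 1))


def query(probes, d=D):
    """Membership is excluded as soon as one probed entry fails the check."""
    return all(entry_may_contain(c1, c2, inc, d) for c1, c2, inc in probes)


if __name__ == "__main__":
    print(is_bh_sequence(D, 2))
    # Probes (c1, c2, increment) taken from the Query slide for element y.
    print(query([(2, 17, 4), (3, 30, 8)]))
```

Under the assumed set D, the first probe passes (17 = 4 + 13) but the second fails (no two elements of D sum to 30 - 8 = 22), so the query reports that y is not in the set, which matches the "No" on the Query slide. With a Bh sequence and at most h elements per entry, the decomposition of c2 is unique, so the brute-force search could in principle be replaced by a table lookup.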