CS 221 Guest Lecture: Cuckoo Hashing
Shannon Larson
March 11, 2011

Learning Goals
• Describe the cuckoo hashing principle
• Analyze the space and time complexity of cuckoo hashing
• Apply the insert and lookup algorithms in a cuckoo hash table
• Construct the graph for a cuckoo table

Remember Graphs?
• A set of nodes V = {v_1, …, v_n}
• A set of edges E = {(v_{i_1}, v_{i_2}), …, (v_{i_{k-1}}, v_{i_k})}
• Here:
  – V = {v_1, v_2, v_3, v_4, v_5}
  – E = {(v_1, v_2), …, (v_3, v_5)}
  – E = {e_1, e_2, e_3, e_4, e_5, e_6}

Graph Cycles
• A graph cycle is a path of edges such that the first and last vertices are the same
  – Example: v_1, v_2, v_5, v_3, v_4, v_1

Recall Hashing
• A hash function h(x)
  – Takes the target x
  – Hashes x to a bucket in 0 … N − 1
• Perfect hashing is ideal:
  – O(1) lookup
  – O(1) insert
• Perfect hashing is not realistic!

Cuckoo Hashing: The Idea
• Remember the cuckoo bird?
  – Shares a nest with other species…
  – …then kicks the other species out!
• Same idea with cuckoo hashing
  – When we insert x, we “kick out” whatever occupies the nest, y
  – Then y finds a new, alternate home

Why is this cool?
• Perfect hashing guarantees
  – O(1) lookup, O(1) insert
• Cuckoo hashing guarantees
  – O(1) lookup
  – O(1) insert**
• Other hashing strategies can’t guarantee this!
• Also, it’s an option for your final project
** There’s a caveat here, but we’ll see it later

Cuckoo Hashing: Two Nests
• Suppose we have TWO hash tables T1, T2
  – Each has its own hash function: h1(x) for T1, h2(x) for T2
  – We prefer T1, but if we have to move we’ll go to T2
  – If we’re in T2 and have to move, we’ll go back to T1
• This is our collision strategy for cuckoo hashing
  – Different from linear probing
  – Different from trees

Cuckoo Hashing: Example
• We want to insert x
• There are no conflicts anywhere
[Figure: two tables with slots h1(x) and h2(x) marked; x is stored]

Cuckoo Hashing: Example
• Now we want to insert y
• There are no conflicts anywhere
[Figure: x and y both stored]

Cuckoo Hashing: Example
• To insert z, where h1(z) = h1(x):
• Move x to h2(x)
[Figure: y in place; “oh no!” marks the collision between x and z]

Cuckoo Hashing: Example
• Now we insert z into h1(z)
[Figure: x, y, and z all placed; “NOW we’re fine!”]

Cuckoo Hashing: Example
• The final table after inserting x, y, z in order
[Figure: final tables holding y, x, and z]

Why two tables?
• Two tables, one for each hash function
• Simple to visualize, simple to implement
• But, why two?
• One table works just as well!
• Just as simple to implement (all one table)

One Table Example
• Let’s insert x, y, z again, with h1(x), h2(x)
• Again, h1(x) is preferred
[Figure: one table with slots h1(x) and h2(x) marked; x is stored]

One Table Example
• Now insert y
• No conflicts, no problem
[Figure: x and y stored; slots h1(y) and h2(y) marked]

One Table Example
• Now insert z
• But, another conflict with x: h1(x) = h1(z)
[Figure: x and y stored, z waiting; slots h1(z) and h2(z) marked; “oh no!”]

One Table Example
• First, move x to h2(x)
[Figure: x and y stored, z waiting; slots h1(z) and h2(x) marked]

One Table Example
• Now we place z in h1(z)
[Figure: table holding x, z, y]

One Table Example
• Final table after inserting x, y, z in order
[Figure: table holding x, z, y]

Graph Representation
• How can we represent our table?
• Why not a graph?
  – Nodes are every possible table entry (one node per slot)
  – Edges are inserted entries (one edge per stored item)
• This is a directed graph
• Direction is from the item’s current location TO its alternate location

Graph Example
• Remember our one-table example?
[Figure: table with slot 1: x, slot 2: z, slot 3: y, slot 4: empty, next to a graph on nodes 1, 2, 3, 4]

Infinite Insert
• Suppose we insert something, and we end up in an infinite loop
  – Or, “too many” displacements
  – “Too many” is some pre-defined maximum based on table size

Example: Loops
• Remember our one-table example?
[Figure: table with slot 1: x, slot 2: z, slot 3: y, slot 4: empty, next to a graph on nodes 1, 2, 3, 4]

Example: Loops
• Let’s insert w: no conflicts still
[Figure: slot 1: x, slot 2: z, slot 3: y, slot 4: w; graph on nodes 1, 2, 3, 4]

Example: Loops
• Now let’s insert a: displace x
[Figure: slot 1: x, slot 2: z, slot 3: y, slot 4: w, with a incoming; graph on nodes 1, 2, 3, 4]

Example: Loops
• Now x is placed, and z is displaced (put in 4)
[Figure: slot 1: a, slot 2: x, slot 3: y, slot 4: w, with z displaced; graph on nodes 1, 2, 3, 4]

Example: Loops
• Now z is placed, and w is displaced (put in 3)
[Figure: slot 1: a, slot 2: x, slot 3: y, slot 4: z, with w displaced; graph on nodes 1, 2, 3, 4]

Example: Loops
• Notice what happens to the graph
• We keep going and going and going…
[Figure: graph on nodes 1, 2, 3, 4]

Analysis: Loops
• Remember infinite loops in a new insert?
• In the graph, this is a closed loop
  – We might forever re-do the same displacements
• The probability of getting a loop increases dramatically once we’ve inserted N/2 elements
  – N is the number of buckets (size of table)
  – This is from the research on cuckoo hashing

Analysis: Loops
• What can we do once we get a loop?
  – Rebuild, same size (ok solution)
  – Double table size (better solution)
• We’ll need new hash functions for both

Analysis
• Lookup has O(1) time
  – At MOST two places to look, ever
  – One location per hash function
• Insert has amortized O(1) time
  – Think of this as “in the long run”
  – In practice we see O(1) time insert
  – You’ll see amortized analysis in CPSC 320
• Remember the “grass and trees” analysis?

Lookup: The Code
Return true if x is in the table (at either h1(x) or h2(x)).
Otherwise, return false.

    lookup(x)
      return T[h1(x)] = x or T[h2(x)] = x

Insert: The Code
Given a table (array) T and item x to insert:

    insert(x)
      if lookup(x) return;            // if it’s already here, done
      pos <- h1(x);                   // store h1(x)
      for i <- 1 to M                 // loop at most M times
        if T[pos] empty               // if T[pos] empty, done
          T[pos] <- x                 // put x in T[pos]
          return;
        swap x and T[pos];            // now we’re displacing
        if pos = h1(x) pos <- h2(x)   // send the displaced item to its other slot
        else pos <- h1(x)
      rehash();                       // if we couldn’t stop, rehash
      insert(x);                      // then insert the currently displaced item
    end

Analysis: Load Factor
• What is load?
  – How full the table is on average (the fill factor, as a % of buckets)
• What about cuckoo hash tables?
  – For two hash functions, load factor ≈ 50%
• Remember loops?
  – For three hash functions, we get ≈ 91%
• That’s pretty great, actually!

More hash functions
• What would this look like?
• We would have three tables (simple case)
  – One hash function per table
• Or, we would have two alternates (one table)

More hash functions
• What would this look like?
• Each entry has TWO alternates, not one
• h1(x), h2(x), h3(x)
[Figure: one table holding x, z, y]

More hash functions
• When something comes in new (insert)
  – Put it in h1(x)
• If it’s displaced, check h2(x)
  – If that’s full, go to h3(x)
• To look up, we just look in h1(x), h2(x), or h3(x)
  – Still constant time!

Even better load?
• Currently we’ve only put one item per bucket
• What if we had two cells per bucket?
[Figure: table with two cells per bucket, holding “x, w”, “z”, and “y, a”]
• What about collision strategies (who gets kicked out of a full bucket)?
  – Round-robin (cells take turns swapping out)
  – FIFO (oldest resident gets kicked out)

Links & Resources
• http://en.wikipedia.org/wiki/Cuckoo_hashing
• http://www.ru.is/faculty/ulfar/CuckooHash.pdf
• http://www.it-c.dk/people/pagh/papers/cuckooundergrad.pdf
• No neat animations on the internet…yet!
  – Possible personal project?
  – Brownie points?
  – Pre-coop project?
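
Sketch: Lookup and Insert in Python
The lookup and insert pseudocode from the “The Code” slides can be tried out with a minimal Python sketch like the one below. It is only an illustration under stated assumptions, not the lecture’s reference implementation: the class name CuckooHashTable, the helper names, the choice M = table size, and the trick of salting Python’s built-in hash() with random seeds to get “new hash functions” are all hypothetical choices made here. It uses the one-table variant and doubles the table when an insert fails to settle.

```python
import random


class CuckooHashTable:
    """One-table cuckoo hash set with two hash functions (illustrative sketch).

    Assumption: None marks an empty slot, so None itself cannot be stored.
    """

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self._new_hash_functions()

    def _new_hash_functions(self):
        # Assumption: two random salts stand in for "two hash functions".
        self.seeds = (random.getrandbits(32), random.getrandbits(32))

    def _h(self, which, x):
        # which = 0 plays the role of h1, which = 1 plays the role of h2.
        return hash((self.seeds[which], x)) % self.size

    def lookup(self, x):
        # At most two slots to check, one per hash function.
        return self.slots[self._h(0, x)] == x or self.slots[self._h(1, x)] == x

    def insert(self, x):
        if self.lookup(x):
            return                              # already here, done
        pos = self._h(0, x)                     # prefer h1(x)
        for _ in range(self.size):              # assumption: M = table size
            if self.slots[pos] is None:
                self.slots[pos] = x
                return
            # Kick out the resident; x now names the displaced item.
            x, self.slots[pos] = self.slots[pos], x
            # The displaced item goes to whichever of its two slots it was not in.
            pos = self._h(1, x) if pos == self._h(0, x) else self._h(0, x)
        self._rehash()                          # too many displacements
        self.insert(x)                          # re-insert the displaced item

    def _rehash(self):
        # Double the table and pick new hash functions, then re-insert everything.
        old_items = [item for item in self.slots if item is not None]
        self.size *= 2
        self.slots = [None] * self.size
        self._new_hash_functions()
        for item in old_items:
            self.insert(item)
```

For example, after `t = CuckooHashTable(size=4)` and inserting the integers 0 to 49, every `t.lookup(k)` for k in that range should return True, while a value that was never inserted should return False. Doubling on failure follows the “better solution” from the Analysis: Loops slide; rebuilding at the same size would also work but tends to loop again sooner.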