CS 221 Guest lecture: Cuckoo Hashing

advertisement
CS 221
Guest lecture: Cuckoo Hashing
Shannon Larson
March 11, 2011
Learning Goals
• Describe the cuckoo hashing principle
• Analyze the space and time complexity of
cuckoo hashing
• Apply the insert and lookup algorithms in a
cuckoo hash table
• Construct the graph for a cuckoo table
Remember Graphs?
• A set of nodes V =
𝑣1 , … , 𝑣𝑛
• A set of edges 𝐸 =
𝑣𝑖1 , 𝑣𝑖2 , … , 𝑣𝑖𝑘−1 , 𝑣𝑖𝑘
• Here:
– 𝑉 = 𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 , 𝑣5
– 𝐸 = 𝑣1 , 𝑣2 , … 𝑣3 , 𝑣5
– 𝐸 = {𝑒1 , 𝑒2 , 𝑒3 , 𝑒4 , 𝑒5 , 𝑒6 }
Graph Cycles
• A graph cycle is a path of edges such that the
first and last vertices are the same
𝑣1 , 𝑣2 , 𝑣5 , 𝑣3 , 𝑣4 , 𝑣1
Recall Hashing
• A hash function ℎ(𝑥)
– Takes the target 𝑥
– Hashes x to a bucket 0 … 𝑁 − 1
• Perfect hashing is ideal:
– O(1) lookup
– O(1) insert
• Perfect hashing is not realistic!
Cuckoo Hashing: the idea
• Remember the cuckoo bird?
– Shares a nest with other species…
– …then kicks the other species out!
• Same idea with cuckoo hashing
– When we insert 𝑥, we “kick out” what occupies
the nest, 𝑦
– Then 𝑦 finds a new, alternate home
Why is this cool?
• Perfect hashing guarantees
– O(1) lookup, O(1) insert
• Cuckoo hashing guarantees
– O(1) lookup
– O(1) insert**
• Other hashing strategies can’t guarantee this!
• Also, it’s an option for your final project
** There’s a caveat here, but we’ll see it later
Cuckoo Hashing: Two Nests
• Suppose we have TWO hash tables 𝑇1 , 𝑇2
– they each have a hash function ℎ1 𝑥 , ℎ2 (𝑥)
– we prefer 𝑇1 , but if we have to move we’ll go to 𝑇2
– if we’re in 𝑇2 and have to move, we’ll go back to 𝑇1
• This is our collision strategy for cuckoo hashing
– Different from linear probing/open addressing
– Different from trees
Cuckoo Hashing: Example
• We want to insert 𝑥
• There are no conflicts anywhere
ℎ2 (𝑥)
ℎ1 (𝑥)
x
Cuckoo Hashing : Example
• Now we want to insert 𝑦
• There are no conflicts anywhere
x
y
Cuckoo Hashing : Example
• To insert 𝑧, ℎ1 𝑧 = ℎ1 (𝑥)
• Move 𝑥 to ℎ2 (𝑥)
y
oh no!
x
z
Cuckoo Hashing : Example
• Now we insert 𝑧 into ℎ1 (𝑧)
y
x
NOW we’re fine!
z
Cuckoo Hashing : Example
• The final table after inserting 𝑥, 𝑦, 𝑧 in order
y
x
z
Why two tables?
• Two tables, one for each hash function
• Simple to visualize, simple to implement
• But, why two?
• One table works just as well!
• Just as simple to implement (all one table)
One Table Example
• Let’s insert 𝑥, 𝑦, 𝑧 again, with ℎ1 𝑥 , ℎ2 (𝑥)
• Again, ℎ1 𝑥 preferred
ℎ2 (𝑥)
ℎ1 (𝑥)
x
One Table Example
• Now insert 𝑦
• No conflicts, no problem
ℎ2 (𝑦)
x
ℎ1 (𝑦)
y
One Table Example
• Now insert 𝑧
• But, another conflict with 𝑥: ℎ1 𝑥 = ℎ1 (𝑧)
oh no!
ℎ1 (𝑧)
x
y
ℎ2 (𝑧)
z
One Table Example
• First, move 𝑥 to ℎ2 (𝑥)
ℎ1 (𝑧)
x
y
ℎ2 (𝑥)
z
One Table Example
• Now we move 𝑧 to ℎ1 (𝑧)
x
z
y
One Table Example
• Final table after inserting 𝑥, 𝑦, 𝑧 in order
x
z
y
Graph Representation
• How can we represent our table?
• Why not a graph?
– Nodes are every possible table entry
– Edges are inserted entries
• This is a directed graph
• Direction from current location TO alternate location
Graph Example
• Remember our one-table example?
x
1
z
2
y
3
4
1
2
3
4
Infinite Insert
• Suppose we insert something, and we end up
in an infinite loop
– Or, “too many” displacements
– Some pre-defined maximum based on table size
Example: Loops
• Remember our one-table example?
x
1
z
2
y
3
4
1
2
3
4
Example: Loops
• Let’s insert 𝑤: no conflicts still
x
1
z
2
y
3
4
w
1
2
3
4
Example: Loops
• Now let’s insert 𝑎: displace 𝑥
x
1
z
2
y
3
w
4
a
1
2
3
4
Example: Loops
• Now 𝑥 is placed, and 𝑧 is displaced (put in 4)
a
1
x
2
y
3
w
4
z
1
2
3
4
Example: Loops
• Now 𝑧 is placed, and 𝑤 is displaced (put in 3)
a
1
x
2
y
3
z
4
w
1
2
3
4
Example: Loops
• Notice what happens to the graph
• We keep going and going and going….
1
2
3
4
Analysis: Loops
• Remember infinite loops in a new insert?
• In the graph, this is a closed loop
– We might forever re-do the same displacements
• The probability of getting a loop increases
dramatically once we’ve inserted 𝑁 2 elements
– N is the number of buckets (size of table)
– This is from the research on cuckoo hashing
Analysis: Loops
• What can we do once we get a loop?
– Rebuild, same size (ok solution)
– Double table size (better solution)
• We’ll need new hash functions for both
Analysis
• Lookup has O(1) time
– At MOST two places to look, ever
– One location per hash function
• Insert has amortized O(1) time
– Think of this as “in the long run”
– In practice we see O(1) time insert
– You’ll see amortized analysis in CPSC 320
• Remember the “grass and trees” analysis?
Lookup: The Code
Return the position of 𝑥 (either ℎ1 𝑥 or ℎ2 (𝑥))
Otherwise, return false
lookup(x)
return T[h1(x)] = x
T[h2(x)] = x
or
Insert: The Code
Given a table (array) T and item 𝑥 to insert:
insert(x)
if lookup(x)
return;
pos <- h1(x);
for i <- 1 to M
if T[pos] empty
T[pos] <- x
return;
swap x and T[pos];
if pos = h1(x)
pos <- h2(x)
else
pos <- h1(x)
rehash();
insert(x);
end
// if it’s already here, done
// store h1(x)
// loop at most M times
// if T[pos] empty, done
// put x in T[pos]
// now we’re displacing
// if we couldn’t stop, rehash
// then insert currently displaced
Analysis: Load Factor
• What is load?
– The average fill factor (% full) the table is
• What about cuckoo hash tables?
– For two hash functions, load factor ≈ 50%
• Remember loops?
– For three hash functions, we get ≈ 91%
• That’s pretty great, actually!
More hash functions
• What would this look like?
• We would have three tables (simple case)
– One hash function per table
• Or, we would have two alternates (one table)
More hash functions
• What would this look like?
• Each entry has TWO alternates, not one
• ℎ1 𝑥 , ℎ2 𝑥 , ℎ3 (𝑥)
x
z
y
More hash functions
• When something comes in new (insert)
– Put it in ℎ1 (𝑥)
• If it’s displaced, check ℎ2 (𝑥)
– If that’s full, go to ℎ3 (𝑥)
• To lookup, we just look in ℎ1 𝑥 , ℎ2 , (𝑥) or ℎ3 (𝑥)
– Still constant time!
Even better load?
• Currently we’ve only put one item per bucket
• What if we had two cells per bucket?
x,w
z
y,a
Even better load?
• Currently we’ve only put one item per bucket
• What if we had two cells per bucket?
• What about collision strategies?
– Round-robin (cells take turns swapping out)
– FIFO (oldest resident gets kicked out)
Even better load?
Links & Resources
• http://en.wikipedia.org/wiki/Cuckoo_hashing
• http://www.ru.is/faculty/ulfar/CuckooHash.pdf
• http://www.it-c.dk/people/pagh/papers/cuckooundergrad.pdf
• No neat animations on the internet…yet!
– Possible personal project?
– Brownie points?
– Pre-coop project?
Download