Cuckoo Hashing: Hardware Implementations
Adam Kirsch, Michael Mitzenmacher

Motivation
• Hash tables are ubiquitous.
• Highly useful in router hardware.
  – Measurement and monitoring tasks.
• Desiderata:
  – Few (parallel) memory accesses.
  – High space utilization.
  – Low failure probability.
  – Hardware-level simplicity.
• What are good hash table designs for hardware?

State of the Art: Multiple Choice Hashing
• Each element placed in least loaded of d locations. (If 1 element/cell, look for 1 empty cell out of d.)

Cuckoo Hashing and Moves
• Cuckoo hashing paradigm: give each element d choices, and move elements among choices as needed.

Original Cuckoo Hashing
• 2 subtables, left and right. Each element gets one location per subtable.
• Place new element in left subtable.
  – If element already there, kick it out, move to right subtable.
  – If element already there, kick it out, move to left subtable…
  – Until everything placed.
• Works with high probability as long as load is less than ½.

Better Cuckoo Hashing
• More choices.
• More elements per bucket.
• Generally kick out a random item.
• Such schemes are not fully analyzed.

What’s Wrong with Cuckoo Hashing?
• Lots of moves per insert in worst case.
  – Average is constant.
  – But maximum is Omega(log n) with non-trivial (inverse-poly) probability.
• Router hardware settings: may need bounded number of memory accesses per insert.

Moves Needed per Insertion

The Power of One Move
• Previous work (submitted): How much gain from allowing just one move?
• Framework: allow small content-addressable memory (CAM) to handle unsolvable collisions [max 0.2%].
• Multiple schemes analyzed.
• With 4 choices, insertions only (no deletions), factor of 2 or larger improvement in space.

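As a concrete reference point for the moves discussed above, the original two-subtable insertion procedure can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name CuckooTable, the max_kicks cutoff, and the salted use of Python's built-in hash as a stand-in for two independent hash functions are all assumptions made for the example.

```python
import random


class CuckooTable:
    """Two-subtable cuckoo hash table (illustrative sketch; keys must not be None)."""

    def __init__(self, size_per_subtable, max_kicks=64, seed=0):
        self.size = size_per_subtable
        self.max_kicks = max_kicks  # hypothetical cutoff on moves per insert
        rng = random.Random(seed)
        # Two random salts stand in for two independent hash functions (assumption).
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))
        self.tables = [[None] * self.size, [None] * self.size]  # left, right

    def _slot(self, key, side):
        return hash((self.salts[side], key)) % self.size

    def lookup(self, key):
        # One location per subtable, so a lookup probes exactly two cells.
        return any(self.tables[s][self._slot(key, s)] == key for s in (0, 1))

    def insert(self, key):
        """Return (moves, unplaced); unplaced is None when everything was placed."""
        if self.lookup(key):
            return 0, None
        side = 0  # a new element always starts in the left subtable
        for moves in range(self.max_kicks + 1):
            i = self._slot(key, side)
            # Place the element in hand; pick up whatever occupied the cell.
            key, self.tables[side][i] = self.tables[side][i], key
            if key is None:
                return moves, None
            side = 1 - side  # the evicted element goes to its other subtable
        # Gave up: `key` is now a displaced element that has no home.
        return self.max_kicks, key
```

The number of moves returned by insert is the quantity that is constant on average but occasionally large, which is the problem the rest of the talk addresses.
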
Pros/Cons of One Move Systems
• Pros
  – Simple to implement.
  – Efficient.
  – High space utilization for insertion-only workloads.
  – Analyzable and optimizable.
• Cons
  – Performance suffers in settings with churn.
  – Better space utilization possible with more moves.

The New Idea
• Use the CAM as a queue for move operations.
• Lookup: check the hash table and the CAM queue.
• Try move operations from queue as available.
  – Move attempt = 1 parallel memory lookup.
• De-amortization
  – Use queue to make worst-case performance same as average-case performance.

Queue Policy
• Key point: better to give priority to “new” insertions over moves.
  – New items have d choices; moved items effectively have d - 1.
• Intuition suggests older items may be less likely to be successfully placed.
  – True in practice.
• Full priority queue may be too complex.
• Simple strategy: new items placed at front, failed moves placed at back.

Probability of Success vs. Age

Experimental Evaluation
• Table of size 32768, 4 subtables.
• Target utilization u.
• Insert 32768u elements, then alternate insertions/deletions to get to steady state.
• Allow a fixed number, ops, of queue operations (parallel memory operations) per insertion.

Analysis
• Currently we do not know how to analyze such systems.
  – For d > 2 choices, lots of open questions in cuckoo hashing analysis.
  – Analyzing d = 2 may be possible, but it gives very low space utilization.
    • See [Kutzelnigg], asymptotic analysis of cuckoo hashing.
• Need to understand distribution of move operations/element to analyze queue.

Conclusions and Open Questions
• Moving elements leads to much better space utilization in hash tables, at a price.
• Cuckoo hashing appears implementable, with per-insert move guarantees based on de-amortization via a CAM queue.
• Analysis in an idealized model?
  – Even analysis for basic cuckoo hashing open.
• Performance on real traffic?
  – Bursty insertions/deletions?
  – Distribution of element lifetimes?
• Proper sizing of CAM queue?
  – How does overflow probability scale?
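
The queue-based scheme from the “New Idea” and “Queue Policy” slides can be sketched in software along the following lines. This is a hedged illustration under stated assumptions, not the evaluated hardware design: the class name QueuedCuckoo, the parameters cam_size and ops, the salted built-in hash, random victim selection, and rejecting an insert when the CAM is full are all choices made for the example. A deque stands in for the CAM, and scanning it models the content-addressable lookup.

```python
import random
from collections import deque


class QueuedCuckoo:
    """d-subtable cuckoo table with a bounded CAM queue (sketch; keys must not be None)."""

    def __init__(self, d=4, subtable_size=8192, cam_size=64, ops=2, seed=0):
        self.d, self.size = d, subtable_size
        self.cam_size = cam_size  # hypothetical CAM capacity
        self.ops = ops            # queue operations allowed per insertion
        self.rng = random.Random(seed)
        # One salt per subtable stands in for d independent hash functions (assumption).
        self.salts = [self.rng.getrandbits(64) for _ in range(d)]
        self.tables = [[None] * subtable_size for _ in range(d)]
        # Entries are (key, subtable the key was evicted from, or None if new).
        self.queue = deque()

    def _slot(self, key, t):
        return hash((self.salts[t], key)) % self.size

    def lookup(self, key):
        # Check the d table cells, then the CAM queue.
        in_table = any(self.tables[t][self._slot(key, t)] == key for t in range(self.d))
        return in_table or any(k == key for k, _ in self.queue)

    def delete(self, key):
        for t in range(self.d):
            i = self._slot(key, t)
            if self.tables[t][i] == key:
                self.tables[t][i] = None
                return True
        entry = next((e for e in self.queue if e[0] == key), None)
        if entry is not None:
            self.queue.remove(entry)
            return True
        return False

    def _queue_op(self):
        """One move attempt: a single parallel probe of the item's d cells."""
        if not self.queue:
            return
        key, came_from = self.queue.popleft()
        slots = [(t, self._slot(key, t)) for t in range(self.d)]
        for t, i in slots:
            if self.tables[t][i] is None:
                self.tables[t][i] = key
                return
        # No empty cell: evict a random occupant, avoiding the subtable this
        # item was just evicted from (so moved items have d - 1 real choices).
        t, i = self.rng.choice([s for s in slots if s[0] != came_from])
        victim, self.tables[t][i] = self.tables[t][i], key
        self.queue.append((victim, t))  # failed moves go to the back

    def insert(self, key):
        """Return False if the CAM queue was full and the key was rejected."""
        if self.lookup(key):
            return True
        accepted = len(self.queue) < self.cam_size
        if accepted:
            self.queue.appendleft((key, None))  # new items go to the front
        for _ in range(self.ops):  # bounded work per insertion
            self._queue_op()
        return accepted
```

Each call to insert performs at most ops move attempts regardless of how long any individual eviction chain would be, which is the de-amortization idea; the maximum queue length observed under a workload is what the CAM sizing question is about.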