Engineering a Sorted List Data Structure for 32 Bit Keys

Roman Dementiev, Lutz Kettner, Jens Mehnert, Peter Sanders
MPI für Informatik, Saarbrücken

R. Dementiev et al.: A Sorted List Data Structure for 32 Bit Keys (ALENEX'04)

Introduction

The power of integer keys helps in
– Sorting (radix MSB, LSB)
– Priority queues (radix heaps)
– Static search trees
– Dictionaries (hash tables)

Faster both in theory and practice.
What about dynamic search data structures?

Motivation

van Emde Boas (vEB) search trees [van Emde Boas77, MehlhornNaeher90]:

operation                 comparison-based    van Emde Boas
insert, delete, search    O(log n)            O(log K)
range query               O(c + log n)        O(c + log K)

n – number of elements
K – bit width of keys
c – size of the output

Small K, large n → vEB trees are faster?
NO, their direct implementations are 2-8 times slower than comparison-based trees [Wenzel92, here].
Here: a tuned vEB data structure that outperforms comparison-based implementations.

Direct vEB Implementation

A vEB tree maintains a set $M \subseteq \{0, \dots, 2^K - 1\}$.
Recursive definition:
– $|M| = 1$ or $K = 1$: store directly
– otherwise, let $K' = K/2$:
  store $\min M$, $\max M$
  top: store $\{x \,\mathrm{div}\, 2^{K'} : x \in M\}$ (top recursion)
  bot_i: store $\{x \bmod 2^{K'} : x \in M,\ x \,\mathrm{div}\, 2^{K'} = i\}$ (bottom recursion), use hash table

[Figure: K' bit vEB top; hash table; K' bit vEB bot_i]

Improvement 1

Replace the top data structure with a bit pattern hierarchy.

[Figure: bit pattern hierarchy with positions 0..63, 0..4095, and 0..65535, shown with the labels "K' bit vEB top", "hash table", and "K' bit vEB bot_i"]

Improvement 2

Break the recursion when K = 8: 3 levels max.

[Figure: Level 1 – root, bits 31-16, bit pattern hierarchy (0..63, 0..4095, 0..65535) and hash table; Level 2, bits 15-8, hash tables and single elements; Level 3, bits 7-0, hash tables]
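The bit pattern hierarchy of Improvement 1 can be illustrated with a minimal Python sketch. It assumes three levels with 64, 4096, and 65536 positions (the sizes that appear in the figure labels), where a bit on one level is set exactly when the corresponding word on the level below is nonzero. The class name BitHierarchy, its methods, and the choice of 64-, 64-, and 16-bit words are hypothetical; the slides do not specify the word layout.

```python
class BitHierarchy:
    """Three-level bitmap over positions 0..65535 (hypothetical sketch)."""

    # Assumed bits per word on level 0 (top), 1, 2 (bottom).
    # This yields 64, 4096, and 65536 positions per level.
    GROUP = (64, 64, 16)

    def __init__(self):
        # Word counts per level: 1, 64, 4096.
        self.levels = [[0] * 1, [0] * 64, [0] * 4096]

    def insert(self, pos):
        # Set the bottom bit and mark every ancestor word as nonempty.
        for level in (2, 1, 0):
            word, bit = divmod(pos, self.GROUP[level])
            self.levels[level][word] |= 1 << bit
            pos = word  # the word index is the position one level up

    def delete(self, pos):
        # Clear the bit; continue upward only while words become empty.
        for level in (2, 1, 0):
            word, bit = divmod(pos, self.GROUP[level])
            self.levels[level][word] &= ~(1 << bit)
            if self.levels[level][word]:
                break
            pos = word

    @staticmethod
    def _lowest_bit(word):
        # Index of the least significant set bit of a nonzero word.
        return (word & -word).bit_length() - 1

    def locate(self, pos):
        """Return the smallest set position >= pos (pos >= 0), or None."""
        level = 2
        # Climb until some word has a set bit at or after the wanted position.
        while True:
            word, bit = divmod(pos, self.GROUP[level])
            if word >= len(self.levels[level]):
                return None
            rest = self.levels[level][word] >> bit
            if rest:
                pos += self._lowest_bit(rest)
                break
            if level == 0:
                return None
            pos, level = word + 1, level - 1
        # Descend, always taking the lowest set bit of the child word.
        while level < 2:
            level += 1
            pos = pos * self.GROUP[level] + self._lowest_bit(self.levels[level][pos])
        return pos


h = BitHierarchy()
for key in (5, 4097, 65535):
    h.insert(key)
first_after_six = h.locate(6)
```

In this sketch a locate call inspects at most three words on the way up and two on the way down, regardless of how many positions are set.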

Improvement 3

Replace the root hash table with an array.

[Figure: Level 1 – root, bits 31-16, bit pattern hierarchy (0..63, 0..4095, 0..65535) over an array indexed 0..65535 in place of the hash table; Level 2, bits 15-8, hash tables and single elements; Level 3, bits 7-0, hash tables]

Range Query Support

Link elements.

[Figure: Level 1 – root, bits 31-16, array indexed 0..65535; Level 2, bits 15-8, hash tables; Level 3, bits 7-0, hash tables; elements linked]

Example: Locate Operation

// return handle of min x ∈ M : y ≤ x
Function locate(y : ℕ) : ElementHandle
  if y > max M then return ∞                  // no larger element
  i := y[16..31]                               // index into root table top
  if top[i] = null or y > max M_i then
    return min M_{top.locate(i)}               // look in the next L2 table
  if M_i = {x} then return x                   // single element case
  j := y[8..15]                                // key for L2 table at M_i
  if r_i[j] = null or y > max M_ij then
    return min M_{i, top_i.locate(j)}          // look in the next L3 table
  if M_ij = {x} then return x                  // single element case
  return r_ij[top_ij.locate(y[0..7])]          // L3 table access

At most 9 comparisons for any input size.

Locate Performance

[Figure: performance plot]

Construction

[Figure: performance plot]

Deletion

[Figure: performance plot]

Hard Inputs

$M = \{2^8 \Delta i,\ 2^8 \Delta i + 255 : i = 0, \dots, |M|/2\}$ with $\Delta = 2^{25}/|M|$,
queries for $256 \Delta j + 128$ for random $j \in 0, \dots, |M|/2$
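The bit fields that locate extracts (y[16..31], y[8..15], y[0..7]) can be sketched in a few lines of Python. The helper names split_key and join_key are hypothetical and only illustrate the index arithmetic. The example also shows why the hard-input pattern is demanding: two keys of the form 256m and 256m + 255 share their upper 24 bits and occupy the first and last slot of the same Level 3 table, and a query for 256m + 128 falls between them.

```python
def split_key(y):
    """Split a 32-bit key into (i, j, k) = (y[16..31], y[8..15], y[0..7])."""
    if not 0 <= y < 2 ** 32:
        raise ValueError("key must fit in 32 bits")
    i = y >> 16          # index into the root array (Level 1)
    j = (y >> 8) & 0xFF  # key for the Level 2 table
    k = y & 0xFF         # key for the Level 3 table
    return i, j, k


def join_key(i, j, k):
    """Inverse of split_key: rebuild the 32-bit key from its three fields."""
    return (i << 16) | (j << 8) | k


# One pair from the hard-input pattern, with an arbitrary multiplier m.
m = 1000
low, high, query = 256 * m, 256 * m + 255, 256 * m + 128
fields = [split_key(v) for v in (low, query, high)]
```

Here all three tuples in fields agree in their first two components, so the query has to walk through the root array, the Level 2 table, and the Level 3 table before it reaches the answer.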

Conclusions and Future Work

Integer search trees can outperform comparison-based search data structures.

Future work:
– Support multi-set functionality
– Other key lengths (up to 38 bits)
– Reduce space consumption
– Find real inputs
– Port it to the LEDA library