Engineering a Sorted List Data Structure for 32 Bit Keys
Roman Dementiev
Lutz Kettner
Jens Mehnert
Peter Sanders
MPI für Informatik,
Saarbrücken
Introduction

The power of integer keys helps in
– Sorting (radix MSB, LSB)
– Priority queues (radix heaps)
– Static search trees
– Dictionaries (hash tables)
Faster both in theory and practice
What about dynamic search data structures?
R. Dementiev et al.: A Sorted List Data Structure for 32 Bit Keys
(ALENEX'04)
Motivation

van Emde Boas (vEB) search trees [van Emde Boas77,MehlhornNaeher90]:
operation                 comparison based    van Emde Boas
insert, delete, search    O(log n)            O(log K)
range query               O(c + log n)        O(c + log K)

n – number of elements
K – bit width of keys
c – size of the output

Small K, large n → are vEB trees faster?

NO, their direct implementations are 2-8 times slower than comparison-based trees [Wenzel92,here]

Here: a tuned vEB data structure that outperforms comparison-based implementations
Direct vEB Implementation


vEB tree maintains set $M \subseteq \{0, \dots, 2^K - 1\}$
Recursive definition:
– $|M| = 1$ or $K = 1$: store directly,
– otherwise let $K' = K/2$: store $\min M$, $\max M$,
  • top: store $\{x \,\mathrm{div}\, 2^{K'} : x \in M\}$ (top recursion)
  • $\mathrm{bot}_i$: store $\{x \bmod 2^{K'} : x \in M,\ x \,\mathrm{div}\, 2^{K'} = i\}$ (bottom recursion), use hash table
[Figure: a K' bit vEB structure top, and a hash table leading to the K' bit vEB structures bot_i]
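The recursive definition above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the class name `VEB`, the helper `_push`, and the use of a Python `dict` as the hash table are assumptions made for the example, and for odd widths the top part simply takes the extra bit.

```python
class VEB:
    """Direct van Emde Boas tree over keys 0 .. 2**k - 1 (illustrative sketch)."""

    def __init__(self, k):
        self.k = k            # bit width K of the keys in this subtree
        self.min = None       # min M (None while the set is empty)
        self.max = None       # max M
        self.top = None       # VEB over the high halves, created lazily
        self.bot = {}         # hash table: high half i -> VEB over low halves

    def _push(self, x):
        """Store x in the top/bottom recursion."""
        half = self.k // 2                         # K' = K/2
        hi, lo = x >> half, x & ((1 << half) - 1)  # x div 2^K', x mod 2^K'
        if hi not in self.bot:
            self.bot[hi] = VEB(half)
            if self.top is None:
                self.top = VEB(self.k - half)
            self.top.insert(hi)
        self.bot[hi].insert(lo)

    def insert(self, x):
        if self.min is None:                 # |M| = 1 after this: store directly
            self.min = self.max = x
            return
        if x == self.min or x == self.max:   # already present
            return
        if self.k > 1:
            if self.min == self.max:         # second element: push the first down
                self._push(self.min)
            self._push(x)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def locate(self, y):
        """Smallest stored key >= y, or None if there is none."""
        if self.min is None or y > self.max:
            return None
        if y <= self.min:
            return self.min
        if self.k == 1:
            return self.max
        half = self.k // 2
        hi, lo = y >> half, y & ((1 << half) - 1)
        b = self.bot.get(hi)
        if b is not None and lo <= b.max:    # answer lies in the same bottom tree
            return (hi << half) | b.locate(lo)
        nxt = self.top.locate(hi + 1)        # next non-empty bottom tree
        return (nxt << half) | self.bot[nxt].min
```

Because a one-element set is stored directly, an insert that creates a new bottom tree does constant work there and recurses only into `top`, which is what keeps the recursion depth at O(log K).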
Improvement 1

Replace top data structure with a bit pattern hierarchy
[Figure: a hierarchy of bit patterns with positions 0 … 63, 0 … 4095 and 0 … 65535 takes the place of the K' bit vEB top; a hash table leads to the K' bit vEB structures bot_i]
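A bit pattern hierarchy of this kind can be sketched as below. The three level sizes (65536, 4096 and 64 bits) are taken from the slide; the grouping into 16-bit words at the bottom and 64-bit words above, the class name `BitHierarchy`, and the helper `lowest` are assumptions made for this illustration.

```python
WIDTHS = (16, 64, 64)    # bits per word, bottom to top (assumed grouping)
COUNTS = (4096, 64, 1)   # words per level: 65536, 4096 and 64 bits


def lowest(word):
    """Index of the lowest set bit of a non-zero word."""
    return (word & -word).bit_length() - 1


class BitHierarchy:
    def __init__(self):
        self.words = [[0] * n for n in COUNTS]

    def set(self, i):
        """Mark position i and the summary bits above it."""
        for lvl, w in enumerate(WIDTHS):
            self.words[lvl][i // w] |= 1 << (i % w)
            i //= w      # word index here = bit index one level up

    def locate(self, i, lvl=0):
        """Smallest set position >= i on level lvl, or None."""
        w = WIDTHS[lvl]
        if i >= w * COUNTS[lvl]:
            return None
        word = self.words[lvl][i // w] >> (i % w)   # bits >= i in i's own word
        if word:
            return i + lowest(word)
        if lvl + 1 == len(WIDTHS):
            return None
        nxt = self.locate(i // w + 1, lvl + 1)      # next non-empty word
        if nxt is None:
            return None
        return nxt * w + lowest(self.words[lvl][nxt])
```

A lookup inspects at most one word on the way up and one on the way down per level, so it touches a small constant number of words however many bits are set.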
Improvement 2
Break recursion when K = 8
3 levels max.
[Figure: bit pattern hierarchy (0 … 63, 0 … 4095, 0 … 65535) above Level 1 – root (bits 31-16, hash table), Level 2 (bits 15-8, hash tables, single elements stored directly) and Level 3 (bits 7-0, hash tables)]
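With the recursion cut off at 8 bits, a 32-bit key is consumed in three pieces, one per level. A small sketch of that split follows; the function name `split_key` is hypothetical, while the bit ranges are the ones labelled on the slide.

```python
def split_key(key):
    """Split a 32-bit key into the indices used at levels 1, 2 and 3."""
    assert 0 <= key < 2**32
    i = key >> 16           # bits 31-16: index into the level 1 root table
    j = (key >> 8) & 0xFF   # bits 15-8: index into a level 2 table
    k = key & 0xFF          # bits 7-0: index into a level 3 table
    return i, j, k
```

For example, the key `0x12345678` splits into `0x1234`, `0x56` and `0x78`.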
Improvement 3
Replace root hash table with an array
[Figure: the same structure, with the Level 1 root (bits 31-16) now an array indexed 0 … 65535 instead of a hash table; Level 2 (bits 15-8) and Level 3 (bits 7-0) keep their hash tables, and single elements are stored directly]
Range Query Support
Link elements
[Figure: the same three-level structure (Level 1 root array, bits 31-16; Level 2 hash tables, bits 15-8; Level 3 hash tables, bits 7-0) with the elements linked to each other]
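Once the elements are linked in sorted order, a range query reduces to one locate followed by a walk along the links, which matches the O(c + log K) bound quoted earlier. The sketch below only illustrates that walk: the names `Element`, `build_linked` and `range_query` are hypothetical, and `bisect` stands in for the tree's locate operation.

```python
import bisect


class Element:
    def __init__(self, key):
        self.key = key
        self.next = None    # successor in sorted order


def build_linked(keys):
    """Return (elements, sorted keys) with each element linked to its successor."""
    keys = sorted(set(keys))
    elems = [Element(k) for k in keys]
    for a, b in zip(elems, elems[1:]):
        a.next = b
    return elems, keys


def range_query(elems, keys, lo, hi):
    """Report all keys in [lo, hi] by following the links."""
    i = bisect.bisect_left(keys, lo)     # stand-in for locate(lo)
    e = elems[i] if i < len(elems) else None
    out = []
    while e is not None and e.key <= hi:
        out.append(e.key)
        e = e.next
    return out
```

The loop runs once per reported element plus one final check, so the cost beyond the initial locate is proportional to the output size c.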
Example: Locate Operation


// return handle of min x ∈ M : y ≤ x
Function locate(y : ℕ) : ElementHandle
  if y > max M then return ∞                                               // no larger element
  i := y[16..31]                                                           // index into root table top
  if top[i] = null or y > max M_i then return min M_{top.locate(i)}        // look in the next L2 table
  if M_i = {x} then return x                                               // single element case
  j := y[8..15]                                                            // key for L2 table at M_i
  if r_i[j] = null or y > max M_ij then return min M_{i, top_i.locate(j)}  // look in the next L3 table
  if M_ij = {x} then return x                                              // single element case
  return r_ij[top_ij.locate(y[0..7])]                                      // L3 table access

At most 9 comparisons for any input size
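The control flow of locate can be tried out with the simplified Python sketch below. It is not the authors' code and makes several assumptions for brevity: every level uses a `dict` (the slides use an array at the root), a single Python integer serves as the bit pattern in place of the bit pattern hierarchy, the single-element shortcut is left out, and keys are returned instead of element handles. The names `Node`, `LEVELS` and `next_digit` are hypothetical.

```python
LEVELS = ((16, 0xFFFF), (8, 0xFF), (0, 0xFF))   # (shift, mask) for bits 31-16, 15-8, 7-0


class Node:
    def __init__(self, level=0):
        self.level = level
        self.min = None
        self.max = None
        self.bits = 0       # bit d is set iff child d is non-empty
        self.child = {}     # digit -> Node (levels 0 and 1) or key (level 2)

    def digit(self, key):
        shift, mask = LEVELS[self.level]
        return (key >> shift) & mask

    def insert(self, key):
        if self.min is None or key < self.min:
            self.min = key
        if self.max is None or key > self.max:
            self.max = key
        d = self.digit(key)
        self.bits |= 1 << d
        if self.level == 2:
            self.child[d] = key
        else:
            if d not in self.child:
                self.child[d] = Node(self.level + 1)
            self.child[d].insert(key)

    def next_digit(self, d):
        """Smallest non-empty child index >= d (caller guarantees one exists)."""
        rest = self.bits >> d
        return d + (rest & -rest).bit_length() - 1

    def locate(self, y):
        """Smallest stored key >= y, or None if there is no such key."""
        if self.max is None or y > self.max:
            return None                               # no larger element
        d = self.digit(y)
        if self.level == 2:
            return self.child[self.next_digit(d)]     # last-level table access
        sub = self.child.get(d)
        if sub is None or y > sub.max:                # look in the next table
            return self.child[self.next_digit(d + 1)].min
        return sub.locate(y)
```

As in the pseudocode, each level performs a constant number of comparisons and the recursion never goes deeper than three levels, whatever the number of stored keys.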
Locate Performance
Construction
Deletion
Hard Inputs
$M = \{2^8 \Delta i,\ 2^8 \Delta i + 255 : i = 0, \dots, |M|/2\}$, $\Delta = \lfloor 2^{25} / |M| \rfloor$,
queries for $2^8 \Delta j + 128$ for random $j \in 0, \dots, |M|/2$
Conclusions and Future Work


Integer search trees can outperform comparison-based search data structures
Future work:
– Support multi-set functionality
– Other key lengths (up to 38 bits)
– Reduce space consumption
– Find real inputs
– Port it to the LEDA library