A Self-adjusting Data Structure for Multi-dimensional

advertisement
A Self-adjusting Data Structure
for Multi-dimensional Point Sets
Eunhui Park & David M. Mount
University of Maryland
Sep. 2012
Motivation
• Sleator & Tarjan introduced the splay tree almost 30 years
ago.
• Self adjusts to access distribution
• Supports insertion and deletion in O(log n) amortized time
• Efficient access:
• Balance property – m accesses in O((m+n) log n) time
• Scanning property [Elmasry 2004] – access all items in O(n) time
• Working set property – … on temporal locality
• Static optimality property – Efficient access based on frequency
• Static & dynamic finger [Cole, 2000] properties – … on spatial locality
Is there a multi-dimensional generalization?
Background
• Compressed Quadtree
• Hierarchical partition of space
• O(n) space
• O(log n) access time if augmented:
• Topology tree [Frederickson1985, Har-Peled 2005 ]
• Skip quadtree [Eppstein, Goodrich, Sun 2005]
• Quadtreap [Mount, Park 2010] based on treap [Seidel, Aragon 1996]
• Efficient approximate proximity queries
• Approximate nearest neighbor search
• Approximate range search
Objective
Quadtree + Splay tree
Splay Quadtree
• Like quadtrees:
• A versatile geometric partition tree
• Supports efficient approximate proximity queries
• Like splay trees:
• Adjusts to access distribution
• Supports insertion/deletion in O(log n) amortized time
• Supports splay tree access properties: balance, static optimality,
working set, static finger
Overview
• BD-tree
• BD-tree
• Rotation
• Splaying operation
• Basic splaying
• Splaying
• Efficiency
• Insertion/deletion
• Search and access efficiency
BD-tree
Box Decomposition tree (BD-tree) :
A geometric data structure based on a hierarchical decomposition
of space into d-dimensional axis-aligned rectangles
• Each node is associated with a region of space
•
•
•
•
called a cell.
Each cell is defined by an outer box and an
optional inner box.
Partition operations: split and shrink.
Internal nodes: split nodes and shrink nodes.
Each leaf has a single point or a single inner box.
box
cell
leaves
BD-tree: Partitioning Operations
• Split
Partitions a cell by an
axis-orthogonal
hyperplane that bisects
the cell’s longest side.
C
D
C
E
left
D
E
split
• Shrink
Partitions a cell by a
shrinking box, which
lies within the cell.
right
C
C
C
F
shrink
inner
F
outer
C\F
523686
BD-tree: Promotion
• By construction, nodes are generated in shrink-split pairs. We
merge each into a single ternary node, called a pseudo-node.
shrink node
outer
inner
split node
right
left
pseudo-node
left right outer
• Tree can be restructured through a local operation, called
promotion.
x
y
E
𝑝𝑟𝑜𝑚𝑜𝑡𝑒(𝑥)
y
x
D
A
B
C
E
𝑝𝑟𝑜𝑚𝑜𝑡𝑒(𝑦)
A
C
B
C
D
D
E
A
B
Splay Quadtree
• Given an internal node, x, splay(x) uses promotions to
transform x to the root of the tree
• This makes future accesses to x more efficient
g
x
splay(x)
f
b
g
e
d
c
b
x
c
f
d
e
Basic Splaying
• As in Sleator & Tarjan, splaying is based on primitive operations:
• Zig-zag
z
z
y
x
F
A
x
G
y
x
B
C
F
D
D
A
E
B
G
z
y
D
E
A
C
B
E
C
F
G
• Zig-zig
x
z
y
y
y
F
x
D
A
B
C
G
A
z
x
B
z
D
E
C
A
B
C
E
F
G
D
E
F
G
The Problem of Right Promotion
• Inner-left convention:
• If an internal node’s cell has an inner box, it resides in its left child
• If necessary, left and right children are relabeled to satisfy this
• This guarantees that each cell has constant complexity
• Right promotion may violate this convention
y
x
E
𝑝𝑟𝑜𝑚𝑜𝑡𝑒(𝑥)
A
x
E
u
B
If this cell has
an inner box, u
y
B
C
A
D
v
C
D
D
v
v
Now, y’s cell has
two inner boxes,
u and v !
A
u
E
u
B
C
Splaying in 3-Phases
• Promotions must be carefully structured to avoid this problem
• 3-phased approach (3 passes from bottom to top)
g
R
g
f O
e O
L d
c
R
L
a
g
b
R
R
f
L
O
a
b
b
g
c
c
c
d
R
e
L
b
a
d
d
f
f
e
a
• As in Sleator & Tarjan, amortized efficiency is established by a
potential-based analysis.
e
Insertion and deletion
• Insert(q): locate leaf x containing q
add q as new leaf
splay(x)
x
𝑖𝑛𝑠𝑒𝑟𝑡(𝑞, 𝑥)
𝑠𝑝𝑙𝑎𝑦(𝑥)
x
q
x
q
• Insertion can be performed in O(log n) amortized time.
• Deletion can be performed in O(log n) amortized time.
Analogous to Splay Trees
• Balance Theorem:
Total access for q1, q2, …, qm takes O((m+n) log n) time.
• Working Set Theorem:
For each access qj, let tj be the number of different queries since the
last access of qj, or since the beginning if this is the qj’s first access.
Total m access queries take O( 𝑚
𝑗=1 log 𝑡𝑗 + 1 + 𝑚 + 𝑛 log 𝑛).
• Static Optimality Theorem:
Given a quadtree subdivision Z, where each cell z ∈ Z has an access
probability pz, the entropy of Z is defined as
𝑒𝑛𝑡𝑟𝑜𝑝𝑦 𝑍 =
𝑝𝑧 𝑙𝑜𝑔
𝑧∈𝑍
1
𝑝𝑧
Total m access queries take O(𝑚 ∙ 𝑒𝑛𝑡𝑟𝑜𝑝𝑦(𝑍)).
Static Finger Theorem
• 1-dim (Sleator & Tarjan 83)
Total access for i1, i2, …, im takes
O(m + 𝑚
𝑖=0 log( 𝑖𝑗 − 𝑓 + 1)).
𝑖𝑚
𝑓
• d-dim
• For a single point 𝑞,
- Let 𝑟 = 𝑑𝑖𝑠𝑡(𝑓, 𝑞)
𝑓×
𝑟
°
𝑞
Static Finger Theorem
• 1-dim (Sleator & Tarjan 83)
Total access for i1, i2, …, im takes
O(m + 𝑚
𝑖=0 log( 𝑖𝑗 − 𝑓 + 1)).
𝑖𝑚
𝑓
• d-dim
• But most geometric queries involve
𝑄
regions, not points
- Let 𝑟 = max 𝑑𝑖𝑠𝑡(𝑓, 𝑥)
𝑥∈𝑄
𝑟
𝑓×
Static Finger Theorem
• 1-dim (Sleator & Tarjan 83)
Total access for i1, i2, …, im takes
O(m + 𝑚
𝑖=0 log( 𝑖𝑗 − 𝑓 + 1)).
𝑖𝑚
𝑓
• d-dim
• 𝑚 queries 𝑄1 , 𝑄2 , ⋯ , 𝑄𝑚
𝑄𝑖
- Let 𝑟 = max max 𝑑𝑖𝑠𝑡(𝑓, 𝑥)
1≤𝑖≤𝑚 𝑥∈𝑄𝑖
𝑓×
𝑟
Static Finger Theorem
• 1-dim (Sleator & Tarjan 83)
Total access for i1, i2, …, im takes
O(m + 𝑚
𝑖=0 log( 𝑖𝑗 − 𝑓 + 1)).
𝑖𝑚
𝑓
• d-dim
• For the technical reasons, need to expand
𝑄𝑖
- Let 𝑟 = (1 + 𝑐) max max 𝑑𝑖𝑠𝑡(𝑓, 𝑥)
1≤𝑖≤𝑚 𝑥∈𝑄𝑖
𝑓×
𝑟
Static Finger Theorem
• 1-dim (Sleator & Tarjan 83)
Total access for i1, i2, …, im takes
O(m + 𝑚
𝑖=0 log( 𝑖𝑗 − 𝑓 + 1)).
𝑖𝑚
𝑓
• d-dim
• Consider an expanded ball
𝑄𝑖
- Let 𝑟 = (1 + 𝑐) max max 𝑑𝑖𝑠𝑡(𝑓, 𝑥)
1≤𝑖≤𝑚 𝑥∈𝑄𝑖
• Define the working set to be the set 𝑊
of points within distance 𝑟 from 𝑓
• Total access for approx. range queries
𝑄1 , 𝑄2 , ⋯ , 𝑄𝑚 :
O(𝑚 (1/ε) d-1 log |𝑊| + 𝑛 log 𝑛)
• ANN queries
• Box queries
𝑓×
𝑟
𝑊: set of points
in expanded ball
Conclusions
• Splay Quadtree:
• Self-adjusting geometric data structure
• Supports insertion/deletion in O(log n) amortized time
• Supports efficient approximate proximity queries
• Open problems:
• Other properties of standard splay trees?
• Dynamic finger theorem
• Scanning theorem
• Better notions of distance (or generally locality) in a geometric
setting?
References
Thank you!
Download