I/O-Efficient Structures for Orthogonal
Range Max and Stabbing Max Queries
Second Year Project Presentation
Ke Yi
Advisor: Lars Arge
Committee: Pankaj K. Agarwal and Jun Yang
Problem Definition: Range Max Queries
• Range-aggregate queries:
range-count, range-sum,
range-max
• N points in R^d
• Each point p is associated
with a weight w(p)
• Query rectangle Q
• Compute max{w(p) | p ∈ Q}
• Static and dynamic
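As a baseline for the definition above, a linear scan answers a range max query by testing every point against Q. The sketch below is only a reference implementation, not one of the structures in this talk; the name `range_max` and the (coordinates, weight) tuple layout are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# (coordinates, weight); this layout is an assumption for the example.
Point = Tuple[Sequence[float], float]


def range_max(points: Sequence[Point],
              lo: Sequence[float],
              hi: Sequence[float]) -> Optional[float]:
    """Return max{w(p) | p in Q} for Q = [lo_1, hi_1] x ... x [lo_d, hi_d].

    Returns None when no point lies in Q. This scans all N points.
    """
    best = None
    for coords, weight in points:
        inside = all(l <= c <= h for c, l, h in zip(coords, lo, hi))
        if inside and (best is None or weight > best):
            best = weight
    return best


# Hypothetical sample data.
pts = [((1, 1), 5.0), ((2, 3), 9.0), ((4, 2), 7.0)]
```

Every query here touches all N points, which is the cost the I/O-efficient structures are meant to avoid.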
Problem Definition: Stabbing Max Queries
• N hyper-rectangles in R^d
• Each rectangle γ is
associated with a weight
w(γ)
• Query point q
• Compute max{w(γ) | q ∈ γ}
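The stabbing max definition can likewise be checked against a brute-force scan over all rectangles. This is a reference sketch only; the name `stabbing_max` and the (lower corner, upper corner, weight) layout are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple

# (lower corner, upper corner, weight); this layout is an assumption.
Rect = Tuple[Sequence[float], Sequence[float], float]


def stabbing_max(rects: Sequence[Rect], q: Sequence[float]) -> Optional[float]:
    """Return max{w(r) | q in r}, or None if no rectangle contains q."""
    best = None
    for lo, hi, weight in rects:
        contains_q = all(l <= x <= h for l, x, h in zip(lo, q, hi))
        if contains_q and (best is None or weight > best):
            best = weight
    return best


# Hypothetical sample data.
rects = [((0, 0), (4, 4), 3.0), ((1, 1), (2, 5), 8.0), ((3, 3), (6, 6), 5.0)]
```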
Model
[Figure: disk D and main memory M, connected by block I/O]
• I/O Model
– N : Elements in structure
– B : Elements per block
– M : Elements in main memory
– n = N/B
• Assumptions
– M > B^2
– Each word holds log_2 N bits
– Any coordinate or weight can be stored in
one word
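To get a feel for the quantities in the model, the following sketch derives n = N/B and log_B n from N and B. The concrete values of N and B are hypothetical and chosen only to make the arithmetic easy to follow; the function name `io_parameters` is likewise an assumption.

```python
import math


def io_parameters(N: int, B: int) -> dict:
    """Derive n = N/B and log_B n for N elements stored in blocks of B elements."""
    n = math.ceil(N / B)                  # number of blocks holding N elements
    log_b_n = math.log(n) / math.log(B)   # change of base: log_B n
    return {"n": n, "log_B n": log_b_n, "log_B^2 n": log_b_n ** 2}


# Hypothetical values, not taken from the presentation.
params = io_parameters(N=10**9, B=10**3)
```

With these sample values n is one million blocks and log_B n is about 2, so a bound such as O(log_B^2 n) corresponds to a handful of I/Os up to constant factors.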
Related Work & Our Results: Range Queries
• 1D range queries are easy: B-tree
* O(n) space, O(log_B n) query & update
• 2D range queries:
– Poly-logarithmic query: CRB-tree [AAG03]
* O(n log_B n) space, O(log_B^2 n) query
– Linear space: kdB-tree, cross-tree, O-tree
* O(√n) query, O(log_B n) update
• Our results:
Related Work & Our Results: Stabbing Queries
• 1D stabbing queries
– SB-tree [YW01]
* O(n) space, O(log_B n) query & insert
* Does not allow deletions!
• 2D stabbing queries
– No structures with worst-case guarantee
• Our results:
2D Range Max Queries
• The external version of Chazelle’s structure [C88]
– Linear space,
– Static: O(log^{1+ε} N) query
– Dynamic: O(log^3 N log log N) query & update
• Overall structure
– A normal B-tree Φ on y-coordinates of all the points
– A fan-out Θ(√B) base B-tree T on x-coordinates
* P_v: all points stored in the subtree of v
* Each internal node v stores two secondary structures C_v, M_v
storing information about P_v in a compressed manner
* C_v and M_v of size O(|P_v| / log_B n) → linear size in total
* Weights of points stored at leaves explicitly
2D Range Max Queries
• C_v borrowed from CRB-tree
– Compute the ranks of the points one level down in O(1) I/Os
– Identify the weight of a point explicitly in O(log_B n) I/Os
• M_v computes the maximum weight in a multislab in O(log_B n) I/Os
• Answering a query:
– Use Φ to compute the ranks in the root of T
– Use M_v to compute the maximum at each level
– For a total of O(log_B^2 n) I/Os
2D Range Max Queries: M_v
• Divide P_v into chunks of size B log_B N
• Divide each chunk into minichunks of size B
• Three-level structures
– M_v = (Ψ1, Ψ2, Ψ3)
– each of size O(|P_v| / log_B n)
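The chunk and minichunk partition can be illustrated by mapping a point's position in P_v to its (chunk, minichunk, offset) triple. This is a sketch under assumptions: log_B N is rounded up to an integer, positions are 0-based, and the names `ceil_log` and `locate` are hypothetical.

```python
def ceil_log(base: int, x: int) -> int:
    """Smallest integer k >= 1 with base**k >= x (integer arithmetic only)."""
    k, power = 1, base
    while power < x:
        power *= base
        k += 1
    return k


def locate(pos: int, B: int, N: int) -> tuple:
    """Map a 0-based position in P_v to (chunk, minichunk, offset).

    Assumes each chunk holds B * ceil(log_B N) points, split into
    minichunks of B points each.
    """
    minichunks_per_chunk = ceil_log(B, N)
    chunk_size = B * minichunks_per_chunk
    chunk, within_chunk = divmod(pos, chunk_size)
    minichunk, offset = divmod(within_chunk, B)
    return chunk, minichunk, offset
```

For example, with the hypothetical values B = 4 and N = 1000, each chunk holds 5 minichunks (20 points), so position 47 falls in chunk 2, minichunk 1, offset 3.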
2D Range Max Queries: M_v
• Basic idea: encode the range max information in a compressed
manner, identify the maximum point using C_v once its rank is found
• Ψ3[l]: for each minichunk, stores a (slab index, weight rank) pair for
each point inside the minichunk
– Find the rank of the maximum-weight point in O(1) I/Os;
– Identify it in O(log_B N) I/Os.
• Ψ2[k]: for each chunk, encode a Cartesian tree on the O(log_B N)
minichunks for each of the O(B) multislabs
– Find the minichunk containing the maximum-weight point in
O(1) I/Os;
– Use Ψ3 to find the exact point in O(log_B N) I/Os;
• Ψ1: A fan-out Θ(√B) B-tree on the O(|P_v| / (B log_B n)) chunks
– Find the maximum-weight point in O(log_B N) I/Os.
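The Cartesian tree used in Ψ2 can be illustrated in memory: build a max-ordered Cartesian tree over a sequence (think of one value per minichunk), then find the position of the range maximum by descending from the root. This sketch shows only the tree and the lookup; it does not reproduce the compact encoding that makes the lookup cost O(1) I/Os, and the function names are assumptions.

```python
def build_cartesian_tree(values):
    """Build a max-ordered Cartesian tree with a stack in O(len(values)) time.

    Returns (root, left, right), where left[i] and right[i] are child
    indices or -1. The in-order traversal of the tree is the array order.
    """
    n = len(values)
    left, right = [-1] * n, [-1] * n
    stack = []  # indices on the rightmost path, values non-increasing
    for i in range(n):
        last = -1
        while stack and values[stack[-1]] < values[i]:
            last = stack.pop()
        left[i] = last            # popped subtree becomes i's left child
        if stack:
            right[stack[-1]] = i  # i becomes right child of the new top
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def range_max_index(root, left, right, i, j):
    """Index of a maximum of values[i..j] (inclusive), assuming 0 <= i <= j < n."""
    node = root
    while not (i <= node <= j):
        node = left[node] if node > j else right[node]
    return node


# Hypothetical sample data.
values = [3, 1, 4, 1, 5, 9, 2, 6]
root, left, right = build_cartesian_tree(values)
```

The descent works because the first node on the root path whose index lies in [i, j] is the lowest common ancestor of i and j, and heap order makes it a maximum of that range.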
2D Range Max Queries
• Static structures
– O(n) size, O(log_B^2 N) query, O(n log_B N) construction
– O(n) size, O(log_B^{1+ε} N) query, O(N log_B N) construction
• Dynamization:
– Throw away Ψ2 and expand Ψ3
– O(n log_B log_B N) size
– O(log_B^3 N) query, worst case
– O(log_B^2 N · log_{M/B} log_B N) insert, amortized
– O(log_B^2 N) delete, amortized
• Extending to d-dimension
– Standard technique
– Pay an extra O(log_B^{d-2} N) factor to all these bounds
1D Stabbing Max Queries
• Modify the external interval tree [AV96] to support max
• Fan-out ( B ) base B-tree on x-coordinates
– Interval stored in highest node v where it contains slab boundary
– In one left (right) slab structure and the multislab structure
• Answering a query
– Search down tree and visit O(log_B N) nodes
– Compute the maximum weight in left (right) slab structure and the multislab structure
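How an interval is split at the node that stores it can be sketched as follows: the part left of the first contained boundary goes to a left slab structure, the part right of the last contained boundary to a right slab structure, and the span between them (if any) to the multislab structure. This is an in-memory illustration under assumptions; the names `decompose` and `node_stabbing_max` are hypothetical, and the per-node scan stands in for the slab and multislab structures.

```python
from bisect import bisect_left, bisect_right


def decompose(boundaries, l, r):
    """Split [l, r] at a node with sorted slab boundaries.

    Returns None if [l, r] contains no boundary (the interval belongs
    further down the tree). Otherwise returns
    (left_part, multislab_part, right_part); multislab_part is None
    when the interval contains exactly one boundary.
    """
    first = bisect_left(boundaries, l)       # first boundary >= l
    last = bisect_right(boundaries, r) - 1   # last boundary <= r
    if first > last:
        return None
    left_part = (l, boundaries[first])
    right_part = (boundaries[last], r)
    multislab = (boundaries[first], boundaries[last]) if first < last else None
    return left_part, multislab, right_part


def node_stabbing_max(boundaries, intervals, q):
    """Max weight among intervals stored at this node that contain q."""
    best = None
    for l, r, w in intervals:
        parts = decompose(boundaries, l, r)
        if parts is None:
            continue  # not stored at this node
        if any(p is not None and p[0] <= q <= p[1] for p in parts):
            if best is None or w > best:
                best = w
    return best
```

A full query would repeat this at each of the O(log_B N) nodes on the search path and take the maximum of the per-node answers.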
1D Stabbing Max Queries
• Slab structures are implemented using B-trees
– Query and update: O(log_B N) I/Os
• Multislab structure: fan-out Θ(√B) B-tree
– At each internal node, we store the maximum weight for each of the Θ(√B) slabs and for each of the Θ(√B) children
– Query: O(1) I/Os (only look at the root)
– Update: O(log_B N) I/Os
• Rebalancing the base tree: O(log_B N) I/Os
– Weight-balanced B-trees
• Overall cost: size O(n), query O(log_B^2 N), update O(log_B N).
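The "only look at the root" query can be illustrated by the per-slab maxima alone. The sketch below is a simplification under assumptions: it is in-memory and insert-only, it keeps just the root's per-slab summary, and it omits the B-tree beneath it, which would presumably be needed to recompute the maxima after a deletion. The class name `MultislabMax` is hypothetical.

```python
class MultislabMax:
    """Per-slab maximum weight over inserted multislab intervals (insert-only)."""

    def __init__(self, num_slabs: int):
        # slab_max[s] is the max weight of any multislab covering slab s,
        # or None if no multislab covers it yet.
        self.slab_max = [None] * num_slabs

    def insert(self, first: int, last: int, weight: float) -> None:
        """Record an interval spanning slabs first..last (inclusive)."""
        for s in range(first, last + 1):
            if self.slab_max[s] is None or weight > self.slab_max[s]:
                self.slab_max[s] = weight

    def query(self, slab: int):
        """Max weight over multislabs covering this slab: a single lookup."""
        return self.slab_max[slab]
```

The query reads one array entry, which mirrors the O(1) I/O bound when the summary fits in a constant number of blocks.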
1D Stabbing Max Queries
• Space-time tradeoff:
– O(n log_B^ε N) size
– O(log_B^{2-ε} N) query
• Can handle the general semigroup queries
– A semigroup (S, +)
– Each weight w(γ) ∈ S
– Want to compute ∑_{γ : q ∈ γ} w(γ)
• Ideas can also be used to improve the internal memory algorithm
– Linear size, O(log^2 N / log log N) query and update
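The step from max to a general semigroup can be made concrete with a brute-force stabbing query that takes the semigroup operation as a parameter. This is a reference sketch only; the name `stabbing_semigroup` is hypothetical, and since a semigroup need not have an identity element, the function returns None when no interval contains q.

```python
import operator
from functools import reduce


def stabbing_semigroup(intervals, q, op):
    """Combine w over all intervals (l, r, w) with l <= q <= r using op.

    op must be associative; the result is None if no interval contains q.
    """
    hits = [w for l, r, w in intervals if l <= q <= r]
    return reduce(op, hits) if hits else None


# Hypothetical sample data.
intervals = [(0, 10, 3), (2, 5, 8), (4, 12, 5)]
```

Passing `max` recovers stabbing max, while `operator.add` gives a stabbing sum over the same data.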
2D Stabbing Max Queries
• Extend our 1D stabbing query structure
• Use our 2D range query structure as a building block
• Extending to d-dimension
– Standard technique
– Pay an extra O(log_B^{d-2} N) factor to all these bounds
Conclusions and Open Problems
• In this project, we developed I/O-efficient
– linear space structures with poly-logarithmic query cost for the
static 2D range max queries
– near linear space structures with poly-logarithmic query &
update cost for the dynamic 2D range max queries
– linear space structures with poly-logarithmic query cost for the
dynamic 1D stabbing max queries
– near linear space structures with poly-logarithmic query &
update cost for the dynamic 2D stabbing max queries
• Open problems
– Linear size dynamic structures for the 2D range & stabbing max
queries?
– General semigroup queries?
THE END
Thank you!