I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries
Second Year Project Presentation
Ke Yi
Advisor: Lars Arge
Committee: Pankaj K. Agarwal and Jun Yang

Problem Definition: Range Max Queries
• Range-aggregate queries: range-count, range-sum, range-max
• N points in R^d
• Each point p is associated with a weight w(p)
• Query rectangle Q
• Compute max{w(p) | p ∈ Q}
• Static and dynamic

Problem Definition: Stabbing Max Queries
• N hyper-rectangles in R^d
• Each rectangle γ is associated with a weight w(γ)
• Query point q
• Compute max{w(γ) | q ∈ γ}
[Figure: rectangles in the x-y plane stabbed by a query point q]

Model
[Figure: disk D, block I/O, main memory M, processor P]
• I/O Model
  – N : Elements in structure
  – B : Elements per block
  – M : Elements in main memory
  – n = N/B
• Assumptions
  – M > B^2
  – Each word holds log_2 N bits
  – Any coordinate or weight can be stored in one word

Related Work & Our Results: Range Queries
• 1D range queries are easy: B-tree
  * O(n) space, O(log_B n) query & update
• 2D range queries:
  – Poly-logarithmic query: CRB-tree [AAG03]
    * O(n log_B n) space, O(log_B^2 n) query
  – Linear space: kdB-tree, cross-tree, O-tree
    * O(√n) query, O(log_B n) update
• Our results:

Related Work & Our Results: Stabbing Queries
• 1D stabbing queries
  – SB-tree [YW01]
    * O(n) space, O(log_B n) query & insert
    * Does not allow deletions!
• 2D stabbing queries
  – No structures with worst-case guarantee
• Our results:
[Figure: rectangles in the x-y plane stabbed by a query point q]

2D Range Max Queries
• The external version of Chazelle's structure [C88]
  – Linear space
  – Static: O(log^{1+ε} N) query
  – Dynamic: O(log^3 N log log N) query & update
• Overall structure
  – A normal B-tree Φ on y-coordinates of all the points
  – A fan-out Θ(√B) base B-tree T on x-coordinates
    * P_v: all points stored in the subtree of v
    * Each internal node v stores two secondary structures C_v, M_v storing information about P_v in a compressed manner
    * C_v and M_v of size O(|P_v| / log_B n) → linear size in total
    * Weights of points stored at leaves explicitly

2D Range Max Queries
• C_v borrowed from CRB-tree
  – Compute the ranks of the points one level down in O(1) I/Os
  – Identify the weight of a point explicitly in O(log_B n) I/Os
• M_v computes the maximum weight in a multislab in O(log_B n) I/Os
[Figure: node v with Θ(√B) children v1, ..., v6]
• Answering a query:
  – Use Φ to compute the ranks in the root of T
  – Use M_v to compute the maximum at each level
  – For a total of O(log_B^2 n) I/Os

2D Range Max Queries: M_v
• Divide P_v into chunks of size B log_B N
• Divide each chunk into minichunks of size B
• Three-level structures
  – M_v = (Ψ_1, Ψ_2, Ψ_3)
  – each of size O(|P_v| / log_B n)
[Figure: node v with Θ(√B) children]

2D Range Max Queries: M_v
• Basic idea: encode the range max information in a compressed manner, identify the maximum point using C_v once its rank is found
• Ψ_3[l]: for each minichunk, stores a (slab index, weight rank) pair for each point inside the minichunk
  – Find the rank of the maximum-weight point in O(1) I/Os
  – Identify it in O(log_B N) I/Os
• Ψ_2[k]: for each chunk, encode a Cartesian tree on the O(log_B N) minichunks for each of the O(B) multislabs
  – Find the minichunk containing the maximum-weight point in O(1) I/Os
  – Use Ψ_3 to find the exact point in O(log_B N) I/Os
• Ψ_1: A fan-out Θ(√B) B-tree on the O(|P_v| / (B log_B n)) chunks
  – Find the maximum-weight point in O(log_B N) I/Os
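
The Cartesian tree used in Ψ_2 can be illustrated with a small in-memory sketch. This is not the compressed, blocked encoding from the slides; it only shows why a Cartesian tree suffices for range max: the first node met on a root-to-leaf walk whose position falls inside the query range is the maximum of that range. The function names `build_cartesian_tree` and `range_max_index` and the sample weights are assumptions made for illustration.

```python
def build_cartesian_tree(weights):
    """Build a max-Cartesian tree over positions 0..n-1 in O(n) time.

    Returns (root, left, right), where left[i] / right[i] are the child
    positions of position i, or -1 if there is no child.
    """
    n = len(weights)
    left = [-1] * n
    right = [-1] * n
    stack = []  # positions on the rightmost path, weights non-increasing
    for i, w in enumerate(weights):
        last = -1
        # Pop every position with a smaller weight; the last one popped
        # becomes the left child of i.
        while stack and weights[stack[-1]] < w:
            last = stack.pop()
        left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def range_max_index(root, left, right, i, j):
    """Return the position of a maximum weight among positions i..j."""
    node = root
    while node != -1:
        if node < i:
            node = right[node]   # whole range lies to the right
        elif node > j:
            node = left[node]    # whole range lies to the left
        else:
            return node          # first node inside [i, j] is the maximum
    raise ValueError("empty range or empty tree")


# Hypothetical per-minichunk maxima for one multislab.
weights = [3, 1, 4, 1, 5, 9, 2, 6]
root, left, right = build_cartesian_tree(weights)
best = range_max_index(root, left, right, 0, 3)
```

In the structure described above, the positions would correspond to minichunks of a chunk, and the returned position would then be resolved to an actual point through Ψ_3 and C_v.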

2D Range Max Queries
• Static structures
  – O(n) size, O(log_B^2 N) query, O(n log_B N) construction
  – O(n) size, O(log_B^{1+ε} N) query, O(N log_B N) construction
• Dynamization:
  – Throw away Ψ_2 and expand Ψ_3
  – O(n log_B log_B N) size
  – O(log_B^3 N) query, worst case
  – O(log_B^2 N · log_{M/B} log_B N) insert, amortized
  – O(log_B^2 N) delete, amortized
• Extending to d dimensions
  – Standard technique
  – Pay an extra O(log_B^{d-2} N) factor on all these bounds

1D Stabbing Max Queries
• Modify the external interval tree [AV96] to support max
• Fan-out Θ(√B) base B-tree on x-coordinates
  – Each interval is stored in the highest node v where it contains a slab boundary
  – It is kept in one left slab structure, one right slab structure, and the multislab structure
• Answering a query
  – Search down the tree and visit O(log_B N) nodes
  – At each node, compute the maximum weight in the left (right) slab structure and the multislab structure
[Figure: query point q stabbing intervals stored at a node v with Θ(√B) children]

1D Stabbing Max Queries
• Slab structures are implemented using B-trees
  – Query and update: O(log_B N) I/Os
• Multislab structure: fan-out Θ(√B) B-tree
  – At each internal node, we store the maximum weight for each of the Θ(√B) slabs and for each of the Θ(√B) children
  – Query: O(1) I/Os (only look at the root)
  – Update: O(log_B N) I/Os
• Rebalancing the base tree: O(log_B N) I/Os
  – Weight-balanced B-trees
• Overall cost: size O(n), query O(log_B^2 N), update O(log_B N)
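
The role of the multislab structure at a single node can be sketched in memory as follows. This is a simplification under stated assumptions, not the B-tree from the slides: each slab keeps a sorted list of the weights of the intervals that span it completely, so a query only reads the largest entry of one slab. The class name `MultislabMax`, its methods, and the sample boundaries are hypothetical; the end pieces of an interval, which the slides place in the left and right slab structures, are deliberately left out.

```python
import bisect


class MultislabMax:
    """Per-slab maxima for one node (in-memory sketch, hypothetical API)."""

    def __init__(self, boundaries):
        # Sorted boundaries b_0 < b_1 < ... < b_k delimit the k slabs
        # [b_s, b_{s+1}) for s = 0..k-1.
        self.boundaries = list(boundaries)
        self.slab_weights = [[] for _ in range(len(self.boundaries) - 1)]

    def _spanned(self, lo, hi):
        # Slabs [b_s, b_{s+1}) with lo <= b_s and b_{s+1} <= hi.
        first = bisect.bisect_left(self.boundaries, lo)
        last = bisect.bisect_right(self.boundaries, hi) - 1
        return range(first, last)

    def insert(self, lo, hi, weight):
        for s in self._spanned(lo, hi):
            bisect.insort(self.slab_weights[s], weight)

    def delete(self, lo, hi, weight):
        # Assumes the interval (lo, hi, weight) was inserted earlier.
        for s in self._spanned(lo, hi):
            ws = self.slab_weights[s]
            del ws[bisect.bisect_left(ws, weight)]

    def query(self, q):
        """Max weight among stored intervals spanning q's slab, or None."""
        s = bisect.bisect_right(self.boundaries, q) - 1
        if 0 <= s < len(self.slab_weights) and self.slab_weights[s]:
            return self.slab_weights[s][-1]
        return None


node = MultislabMax([0, 10, 20, 30, 40])
node.insert(5, 35, 7)    # spans slabs [10, 20) and [20, 30)
node.insert(10, 20, 9)   # spans slab [10, 20)
```

Keeping all weights per slab, rather than a single maximum, is what lets this sketch handle deletions; the external structure achieves the same effect by storing the maxima at the internal nodes of a B-tree.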

1D Stabbing Max Queries
• Space-time tradeoff:
  – O(n log_B^ε N) size
  – O(log_B^{2-ε} N) query
• Can handle general semigroup queries
  – A semigroup (S, +)
  – Each weight w(γ) ∈ S
  – Want to compute ∑_{q ∈ γ} w(γ)
• Ideas can also be used to improve the internal-memory algorithm
  – Linear size, O(log^2 N / log log N) query and update

2D Stabbing Max Queries
• Extend our 1D stabbing query structure
• Use our 2D range query structure as a building block
• Extending to d dimensions
  – Standard technique
  – Pay an extra O(log_B^{d-2} N) factor on all these bounds

Conclusions and Open Problems
• In this project, we developed I/O-efficient
  – linear-space structures with poly-logarithmic query cost for static 2D range max queries
  – near-linear-space structures with poly-logarithmic query & update cost for dynamic 2D range max queries
  – linear-space structures with poly-logarithmic query cost for dynamic 1D stabbing max queries
  – near-linear-space structures with poly-logarithmic query & update cost for dynamic 2D stabbing max queries
• Open problems
  – Linear-size dynamic structures for 2D range & stabbing max queries?
  – General semigroup queries?

THE END
Thank you!