COSC 6114
Prof. Andy Mirzaian

References:
• [M. de Berg et al] chapter 5

Applications:
• Data Base
• GIS, Graphics: crop-&-zoom, windowing

Orthogonal Range Search: Data Base Query

[Figure: records plotted by date of birth (horizontal axis) and salary (vertical axis). Example record: Mr. G. O. Meter, born Nov. 6, 1988, salary $13,600.]
2D Query Rectangle: [1980,00,00 : 1989,99,99] × [13,000 : 14,000]

[Figure: the same plot with a third attribute axis, queried over the range 2 to 4.]
3D Query Orthogonal Range: [1980,00,00 : 1989,99,99] × [13,000 : 14,000] × [2 : 4]

1D-Tree: 1-Dimensional Range Searching

[Figure: points on the x axis with a query range [x : x’].]

Static: Binary search in a sorted array.
Dynamic: Store the data points in some balanced Binary Search Tree T.

Let the data points be P = { p1, p2, …, pn } ⊆ ℝ.
T is a balanced BST where the data appear at its leaves, sorted left to right.
The internal nodes are used to split the left & right subtrees.
Assume x(v) = max x(L), where L is any leaf in the left subtree of internal node v.

[Figure: example tree T with root key 23 and leaves 2, 5, 6, 9, 13, 17, 23, 31, 37, 41, 49, 62, 73, 85, 91. Query Range: [7 : 49].]

Query Range [x : x’]: Call 1DRangeQuery(root[T], x, x’)

ALGORITHM 1DRangeQuery (v, x, x’)
    if v is a leaf
       then if x ≤ x(v) ≤ x’ then report data stored at v
       else do
            if x ≤ x(v) then 1DRangeQuery(leftchild(v), x, x’)
            if x(v) < x’ then 1DRangeQuery(rightchild(v), x, x’)
       od
end

Complexities:
    Query Time          O(K + log n)
    Construction Time   O(n log n)
    Space               O(n)
[These are optimal]
[Figure: tree T with split node vsplit; K leaves reported.]

2D-Tree

Consider dimension d=2:
point p = ( x(p), y(p) ), range R = [x1 : x2] × [y1 : y2]
p ∈ R ⟺ x(p) ∈ [x1 : x2] and y(p) ∈ [y1 : y2].

[Figure: query rectangle R containing a point p; a splitting line L partitions the points into Pleft and Pright, either vertically or horizontally.]

L = vertical/horizontal median split.
Alternate between vertical & horizontal splitting at even and odd depths.
(Assume: no 2 points have equal x or y coordinates.)

Constructing 2D-Tree

Input: P = { p1, p2, …, pn } ⊆ ℝ², off-line.
Output: 2D-tree storing P.

Step 1: Pre-sort P on x & on y, i.e., 2 sorted lists Û = (Xsorted(P), Ysorted(P)).
Step 2: root[T] ← Build2DTree(Û, 0)
end

Procedure Build2DTree (Û, depth)
    if Û contains one point
       then return a leaf storing this point
       else do
            if depth is even
               then x-median split Û, i.e., split data points in half by a vertical line L through the x-median of Û, and reconfigure Ûleft and Ûright.
               else y-median split Û, … by a horizontal line L, and reconfigure Ûleft and Ûright.
            v ← a newly created node storing line L
            leftchild(v) ← Build2DTree(Ûleft, 1+depth)
            rightchild(v) ← Build2DTree(Ûright, 1+depth)
            return v
end

T(n) = 2 T(n/2) + O(n) = O(n log n) time.

2D-Tree Example

[Figure: ten points p1 … p10 and nine splitting lines L1 … L9, shown as a subdivision of the plane and as the corresponding tree with root L1.]

Query Point Search in 2D-Tree

[Figure: the same example with a query point q.]

2D-Tree node regions

region(v) = rectangular region (possibly unbounded) covered by the subtree rooted at v.
region(root[T]) = (−∞ : +∞) × (−∞ : +∞)

Suppose region(v) = ⟨x1 : x2⟩ × ⟨y1 : y2⟩. What are region(leftchild(v)) and region(rightchild(v))?
(Here ⟨ and ⟩ stand for an endpoint that may be open or closed.)

With x-split:
    region(lc(v)) = ⟨x1 : x(L)] × ⟨y1 : y2⟩
    region(rc(v)) = (x(L) : x2⟩ × ⟨y1 : y2⟩
With y-split:
    region(lc(v)) = ⟨x1 : x2⟩ × ⟨y1 : y(L)]
    region(rc(v)) = ⟨x1 : x2⟩ × (y(L) : y2⟩

2D-Tree Range Search

For range R = [x1 : x2] × [y1 : y2] call Search2DTree(root[T], R)

ALGORITHM Search2DTree (v, R)
1.  if v is a leaf then if p(v) ∈ R then report p(v)
2.  else if region(lc(v)) ⊆ R
3.            then ReportSubtree(lc(v))
4.            else if region(lc(v)) ∩ R ≠ ∅
5.                 then Search2DTree(lc(v), R)
6.       if region(rc(v)) ⊆ R
7.            then ReportSubtree(rc(v))
8.            else if region(rc(v)) ∩ R ≠ ∅
9.                 then Search2DTree(rc(v), R)
end

region(v) can either be passed as an input parameter, or explicitly stored at node v, ∀ v ∈ T.
ReportSubtree(v) is a simple linear-time in-order traversal that reports every leaf descendant of node v.
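The construction and range search above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the slides' implementation: it re-sorts the points at every level instead of maintaining the pre-sorted lists Û (so construction costs O(n log² n) rather than O(n log n)), it passes region(v) as a parameter, and it treats every region as half-open, (lo : hi] on each axis. The names Leaf, Node, build_2d_tree, search_2d_tree and the helper functions are hypothetical, and the input is assumed to be non-empty with distinct x- and distinct y-coordinates.

```python
import math

class Leaf:
    def __init__(self, point):
        self.point = point

class Node:
    def __init__(self, axis, split, left, right):
        self.axis = axis    # 0: vertical line L (x-split), 1: horizontal line L (y-split)
        self.split = split  # coordinate of the splitting line L
        self.left = left    # subtree of points with coordinate <= split
        self.right = right  # subtree of points with coordinate > split

def build_2d_tree(points, depth=0):
    """Median split, alternating x and y by depth (assumes distinct coordinates)."""
    if len(points) == 1:
        return Leaf(points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])  # simplification: re-sort per level
    mid = (len(pts) - 1) // 2                    # the median point goes to the left half
    return Node(axis, pts[mid][axis],
                build_2d_tree(pts[:mid + 1], depth + 1),
                build_2d_tree(pts[mid + 1:], depth + 1))

def contained(region, R):
    """region (lo : hi] x (lo : hi] lies inside the closed range R."""
    return all(a <= lo and hi <= b for (lo, hi), (a, b) in zip(region, R))

def intersects(region, R):
    """region (lo : hi] x (lo : hi] shares at least one location with R."""
    return all(lo < b and a <= hi for (lo, hi), (a, b) in zip(region, R))

def child_regions(v, region):
    """Regions of lc(v) and rc(v), obtained by cutting region(v) at line L."""
    lo, hi = region[v.axis]
    left, right = list(region), list(region)
    left[v.axis] = (lo, v.split)
    right[v.axis] = (v.split, hi)
    return tuple(left), tuple(right)

def report_subtree(v, out):
    """Append every leaf point below v to out."""
    if isinstance(v, Leaf):
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)

def search_2d_tree(v, R, region=None, out=None):
    """R = ((x1, x2), (y1, y2)); returns the list of points of the tree inside R."""
    if region is None:
        region = ((-math.inf, math.inf), (-math.inf, math.inf))
    if out is None:
        out = []
    if isinstance(v, Leaf):
        x, y = v.point
        if R[0][0] <= x <= R[0][1] and R[1][0] <= y <= R[1][1]:
            out.append(v.point)
        return out
    for child, reg in zip((v.left, v.right), child_regions(v, region)):
        if contained(reg, R):
            report_subtree(child, out)           # lines 3 and 7 of Search2DTree
        elif intersects(reg, R):
            search_2d_tree(child, R, reg, out)   # lines 5 and 9 of Search2DTree
    return out

points = [(2, 9), (5, 3), (6, 11), (9, 1), (13, 7), (17, 4), (23, 12), (31, 6)]
tree = build_2d_tree(points)
print(sorted(search_2d_tree(tree, ((5, 20), (2, 8)))))  # [(5, 3), (13, 7), (17, 4)]
```

Passing the region down the recursion keeps the nodes small; storing it in each node, the other option named above, would trade O(n) extra space for a simpler call signature.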
Running Time of Search2DTree

K = # of points reported.
Lines 3 & 7 take O(K) time over all recursive calls.
Total # nodes visited (reported or not) is proportional to # times conditions of lines 4 & 8 are true.

region(v) ∩ R ≠ ∅ & region(v) ⊄ R ⟹ a bounding edge e of R intersects region(v).

R has 4 bounding edges. Let e (assume vertical) be one of them.
Define H(n) (resp. V(n)) = worst-case number of nodes v that intersect e for a 2D-tree of n leaves, assuming the root corresponds to an x-split (resp. y-split).

[Figure: edge e crossing the regions of a tree whose root is an x-split, H(n), and of a tree whose root is a y-split with line L, V(n).]

$$H(n) = V(n/2) + 1, \qquad V(n) = 2H(n/2) + 1, \qquad (H(1) = V(1) = 1)$$
$$H(n) = 2H(n/4) + 2 \;\Longrightarrow\; H(n) = 3\sqrt{n} - 2$$
$$V(n) = 2V(n/4) + 3 \;\Longrightarrow\; V(n) = 4\sqrt{n} - 3$$

Running Time: $O(K + \sqrt{n})$.

dD-Tree Complexities

2D-Tree
    Query Time:          $O(K + \sqrt{n})$ worst-case, O(K + log n) average
    Construction Time:   O(n log n)
    Storage Space:       O(n)

dD-Tree (d dimensions)
Use round-robin splitting at successive levels on the d dimensions x1, x2, …, xd.
    Query Time:          $O(dK + d\,n^{1-1/d})$
    Construction Time:   O(d n log n)
    Space:               O(dn)

How can we improve the query time?

Range Trees

2D Range Tree
    Query Time:          $O(K + \log^2 n)$, improved to O(K + log n) by Fractional Cascading
    Construction Time:   O(n log n)
    Space:               O(n log n)

Range R = [x : x’] × [y : y’]
1D Range Tree on x-coordinates:
[Figure: the x-range [x : x’] covered by O(log n) canonical sub-trees.]
Each x-range [x : x’] can be expressed as the disjoint union of O(log n) canonical x-ranges.

Range Trees

2-level data structure:
    Primary Level: BST T on x-coordinates.
    Secondary level: Tassoc(v), a BST on the y-coordinates of P(v).
[Figure: node v of T with its point set P(v), spanning x-coordinates min(v) to max(v), and its associated tree Tassoc(v).]

Range Tree Construction

ALGORITHM Build 2D Range Tree (P)
Input: P = { p1, p2, …, pn } ⊆ ℝ², P = (Px, Py) represented by pre-sorted lists on x (named Px) and on y (named Py).
Output: pointer to the root of the 2D range tree for P.
    Construct Tassoc, bottom up, based on Py, but store in each leaf the points, not just their y-coordinates.
    v ← a newly created node
    if |P| > 1 then do
        Pleft ← { p ∈ P | px ≤ xmed of P }      (* both lists Px and Py should split *)
        Pright ← { p ∈ P | px > xmed of P }
        lc(v) ← Build 2D Range Tree (Pleft)
        rc(v) ← Build 2D Range Tree (Pright)
    od
    min(v) ← min(Px); max(v) ← max(Px)
    Tassoc(v) ← Tassoc
    return v
end

T(n) = 2 T(n/2) + O(n) = O(n log n) time. This includes time for pre-sorting.

2D Range Query

ALGORITHM 2DRangeQuery (v, [x : x’] × [y : y’])
1.  if x ≤ min(v) & max(v) ≤ x’
2.     then 1DRangeQuery(Tassoc(v), [y : y’])
3.     else if v is not a leaf do
4.          if x ≤ max(lc(v))
5.             then 2DRangeQuery(lc(v), [x : x’] × [y : y’])
6.          if min(rc(v)) ≤ x’
7.             then 2DRangeQuery(rc(v), [x : x’] × [y : y’])
8.     od
end

[Figure: tree T with the search paths to x and x’; canonical sub-trees drawn in red, shoulder paths in blue.]

• Line 2 is called at the roots of the red canonical sub-trees, a total of O(log n) times. Each call takes $O(K_v + \log |T_{assoc}(v)|) = O(K_v + \log n)$ time.
• Lines 5 & 7 are called along the blue shoulder paths. Total cost O(log n).
• Total Query Time $= O(\log n + \sum_v (K_v + \log n)) = O(\sum_v K_v + \log^2 n) = O(K + \log^2 n)$.

    Query Time:          $O(K + \log^2 n)$, will be improved to O(K + log n) by Fractional Cascading
    Construction Time:   O(n log n)
    Space:               O(n log n)

Higher Dimensional Range Trees

P = { p1, p2, …, pn } ⊆ ℝ^d, pi = (xi1, xi2, …, xid), i = 1..n.

d-level data structure:
    Primary Level: BST on the 1st coordinate.
    Tassoc(v): (d-1)-dimensional Range Tree on coordinates 2..d.
[Figure: node v of the primary tree T with its point set P(v) and associated structure Tassoc(v).]

    Query Time:          $Q_d(n) = O(K + \log^d n)$, improved to $O(K + \log^{d-1} n)$ by Fractional Cascading
    Construction Time:   $T_d(n) = O(n \log^{d-1} n)$
    Space:               $S_d(n) = O(n \log^{d-1} n)$

$$T_d(n) = 2\,T_d(n/2) + T_{d-1}(n) + O(n), \quad T_2(n) = O(n \log n) \;\Longrightarrow\; T_d(n) = O(n \log^{d-1} n)$$
$$S_d(n) = 2\,S_d(n/2) + S_{d-1}(n) + O(1), \quad S_2(n) = O(n \log n) \;\Longrightarrow\; S_d(n) = O(n \log^{d-1} n)$$
$$Q_d(n) = O(K) + \hat{Q}_d(n), \quad \hat{Q}_d(n) = O(\log n) + O(\log n) \cdot \hat{Q}_{d-1}(n), \quad \hat{Q}_2(n) = O(\log^2 n)$$
$$\Longrightarrow\; \hat{Q}_d(n) = O(\log^d n), \quad Q_d(n) = O(K + \log^d n)$$

General Sets of Points

What if 2 points have the same coordinate value at some coordinate axis?
Composite Number Space: (lexicographic order)

(a, b) → (a | b)
(a | b) < (a’ | b’) ⟺ a < a’ or (a = a’ & b < b’)

p = (px, py) → p’ = ( (px | py), (py | px) )
R = [x : x’] × [y : y’] → R’ = [ (x | −∞) : (x’ | +∞) ] × [ (y | −∞) : (y’ | +∞) ]

p ∈ R ⟺ p’ ∈ R’:
    x ≤ px ≤ x’ ⟺ (x | −∞) ≤ (px | py) ≤ (x’ | +∞)
  & y ≤ py ≤ y’ ⟺ (y | −∞) ≤ (py | px) ≤ (y’ | +∞)

Note: no two points in the composite space have the same value at any coordinate (unless they are identical points).

Fractional Cascading

IDEA: Save the repeated cost of binary search in many sorted lists for the same range [y : y’] if the list contents for one are a subset of the other.

A2 ⊆ A1
Binary search for y in A1 to get to A1[i].
Follow pointer to A2 to get to A2[j].
Now walk to the right in each list.
[Figure: sorted arrays A1 and A2 with pointers from the entries of A1 into A2; nil where there is no target.]

A2 ⊆ A1, A3 ⊆ A1.
No binary search in A2 and A3 is needed.
Do binary search in A1.
Follow blue and red pointers from there to A2 and A3.
Now we have the starting point in each sorted list.
Walk to the right & report.
[Figure: sorted array A1 with blue and red pointers into the sorted arrays A2 and A3; nil where there is no target.]

Layered 2D Range Tree

P(lc(v)) ⊆ P(v), P(rc(v)) ⊆ P(v)
[Figure: node v of T with Tassoc(v); its children lc(v) and rc(v) with Tassoc(lc(v)) and Tassoc(rc(v)).]
[Figure: T with the associated structures at the secondary level linked by Fractional Cascading.]

Layered 2D Range Tree (by Fractional Cascading)

Query Time:
$Q_2(n) = O(\log n + \sum_v (K_v + \log n)) = O(\sum_v K_v + \log^2 n) = O(K + \log^2 n)$
improves to:
$Q_2(n) = O(\log n + \sum_v (K_v + 1)) = O(\sum_v K_v + \log n) = O(K + \log n)$.

For a d-dimensional range tree the query time improves to:
$$Q_d(n) = O(K) + \hat{Q}_d(n), \quad \hat{Q}_d(n) = O(\log n) + O(\log n) \cdot \hat{Q}_{d-1}(n), \quad \hat{Q}_2(n) = O(\log n)$$
$$\Longrightarrow\; Q_d(n) = O(K + \log^{d-1} n)$$

Exercises

1. Show the following implication on the worst-case query time on 2D-Tree:
   $$H(n) = 2H(n/4) + 2, \quad V(n) = 2V(n/4) + 3 \;\Longrightarrow\; \hat{Q}(n) = O(\sqrt{n}).$$

2. Describe algorithms to insert and delete points from a 2D-Tree. You don’t need to take care of rebalancing the structure.

3. dD-Trees can also be used for partial match queries. A 2D partial match query specifies one of the coordinates and asks for all points that have the specified coordinate value. In higher dimensions we specify values for a subset of the coordinates. Here we allow multiple points to have equal values for coordinates.
   (a) Show that 2D-Trees can answer partial match queries in $O(K + \sqrt{n})$ time, where K is the number of reported answers.
   (b) Describe a data structure that uses O(n) storage and answers 2D partial match queries in O(K + log n) time.
   (c) Show that a dD-Tree can solve a partial match query in $O(K + n^{1-s/d})$ time, where s is the number of specified coordinates.
   (d) Show that, when we allow for $O(d\,2^d\,n)$ storage, dD partial match queries can be answered in O(K + d log n) time.

4. Describe algorithms to insert and delete points from a Range Tree. You don’t need to take care of rebalancing the structure.

5. One can use 2D-Trees and Range Trees to search for a particular point (a,b) by performing a range query with the range [a : a] × [b : b].
   (a) Prove this takes O(log n) time on 2D-Trees.
   (b) Derive the time bound on Range Trees.

6. In many applications one wants to do range searching among objects other than points.
   (a) Let P be a set of n axis-parallel rectangles in the plane. We want to be able to report all rectangles in P that are completely contained in a query rectangle [x : x’] × [y : y’]. Describe a data structure for this problem that uses $O(n \log^3 n)$ storage and has $O(K + \log^4 n)$ query time, where K is the number of reported answers. [Hint: Transform the problem to an orthogonal range searching problem in some higher dimensional space.]
   (b) Let P now consist of a set of n simple polygons in the plane. Describe a data structure that uses $O(n \log^3 n)$ space (excluding space needed to externally store the polygons) and has $O(K + \log^4 n)$ query time, where K is the number of polygons completely contained in the query rectangle that are reported.
   (c) Improve the query time to $O(K + \log^3 n)$.

7. For this problem, assume for simplicity that n is a power of 2. Consider a 3D-Tree for a set of n points in 3D. Consider a line that is parallel to the x-axis. What is the maximum number of leaf cells that can be intersected by such a line?

8. We showed that a 2D-tree could answer orthogonal range queries for a set of n points in the plane in $O(n^{1/2} + K)$ time, where K was the number of points reported. Generalize this to show that in dimension 3, the query time is $O(n^{2/3} + K)$.
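Returning to the 2D range tree of the Range Trees slides, the sketch below mirrors Build 2D Range Tree and 2DRangeQuery in Python. It is an illustration under stated assumptions rather than a definitive implementation: a y-sorted Python list searched with bisect stands in for the balanced BST Tassoc(v), the input is assumed to be non-empty with distinct x-coordinates, and the names RTNode, build_range_tree and range_query are hypothetical. No fractional cascading is used, so every canonical node pays for its own binary search, which gives the O(K + log² n) bound.

```python
from bisect import bisect_left, bisect_right

class RTNode:
    """Primary-tree node; by_y plays the role of Tassoc(v)."""
    def __init__(self, by_x, by_y):
        self.min_x = by_x[0][0]          # min(v)
        self.max_x = by_x[-1][0]         # max(v)
        self.by_y = by_y                 # points of P(v), sorted by y
        self.ys = [p[1] for p in by_y]   # y-keys for binary search
        self.left = self.right = None
        if len(by_x) > 1:
            mid = (len(by_x) - 1) // 2
            x_med = by_x[mid][0]
            # Both sorted lists are split at the x-median, in linear time.
            self.left = RTNode(by_x[:mid + 1], [p for p in by_y if p[0] <= x_med])
            self.right = RTNode(by_x[mid + 1:], [p for p in by_y if p[0] > x_med])

def build_range_tree(points):
    """Pre-sort once on x and once on y, then build recursively."""
    return RTNode(sorted(points), sorted(points, key=lambda p: p[1]))

def range_query(v, x_range, y_range, out=None):
    """Report all points with x in [x : x2] and y in [y : y2]."""
    if out is None:
        out = []
    (x, x2), (y, y2) = x_range, y_range
    if x <= v.min_x and v.max_x <= x2:
        # Canonical node: a 1D range query on the y-sorted list.
        out.extend(v.by_y[bisect_left(v.ys, y):bisect_right(v.ys, y2)])
    elif v.left is not None:
        if x <= v.left.max_x:
            range_query(v.left, x_range, y_range, out)
        if v.right.min_x <= x2:
            range_query(v.right, x_range, y_range, out)
    return out

points = [(2, 9), (5, 3), (6, 11), (9, 1), (13, 7), (17, 4), (23, 12), (31, 6)]
tree = build_range_tree(points)
print(sorted(range_query(tree, (5, 20), (2, 8))))  # [(5, 3), (13, 7), (17, 4)]
```

Every level of the primary tree stores each point once in some by_y list, which is where the O(n log n) space bound comes from.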