INTERVAL TREE & SEGMENTATION TREE Juho Lee, UiTae Kim UiTae Kim INTERVAL TREE Windowing Query Query for finding objects in rectangular region Let’s consider roads as line segments Let’s make the example even easier… Orthogonal Only two possible orientations Axis-parallel Data structure S: set of n axis-parallel line segments Query: find all line segments in S, intersecting W. W:=[x:x’] X [y:y’] We need a data structure for solving this window query efficiently Check segment as marked Query time: 𝑂(log 𝑛 + 𝑘) Storage: 𝑂(𝑛 log 𝑛) 1. 2. 3. 4. Entirely inside Intersects once Intersects twice (partially) overlaps boundary Lemma 10.1 Let S be a set of n axis-parallel line segments in the plane. The segments that have at least one endpoint inside an axisparallel query window W can be reported in O(logn+k) time with a data structure that uses O(nlogn) storage and preprocessing time, where k is the number of reported segments. 𝑙 ≔ (𝑥 = 𝑞𝑥 ) Now problem is reduced to… Checking whether the given line segments intersects ℓ: 𝑠 ≔ 𝑥, 𝑦 (𝑥 ′ , 𝑦) intersects ℓ iff 𝑥 ≤ 𝑞𝑥 ≤ 𝑥′ Now the problem becomes 1-D intersection test Finding intersection in 1-D 𝐼 ≔ { 𝑥1 : 𝑥1′ , 𝑥2 : 𝑥2′ , … , 𝑥𝑛 : 𝑥𝑛′ } closed set of intervals 𝑥𝑚𝑖𝑑 median of 2n endpoints If 𝑞𝑥 < 𝑥𝑚𝑖𝑑 , we don’t need to consider about the segments placed to the right of 𝑥𝑚𝑖𝑑 . Finding intersection in 1-D Construct binary tree Root(or interim node): 𝑥𝑚𝑖𝑑 Left child: Tree made of line segments placed to the left of 𝑥𝑚𝑖𝑑 Right child: Tree made of line segments placed to the right of 𝑥𝑚𝑖𝑑 Finding intersection in 1-D Construct binary tree Make tree recursively Finding intersection in 1-D Construct binary tree 𝑞𝑥 is in interval 𝑥𝑗 : 𝑥𝑗′ ∈ 𝐼𝑚𝑖𝑑 iff 𝑥𝑗 ≤ 𝑞𝑥 can be done easily if we have sorted list Interval tree If 𝐼 = ∅, then the interval tree is leaf Otherwise, let root 𝑣 be 𝑥𝑚𝑖𝑑 𝐼𝑙𝑒𝑓𝑡 ≔ 𝑥𝑗 : 𝑥𝑗′ ∈ 𝐼: 𝑥𝑗′ < 𝑥𝑚𝑖𝑑 forms left subtree 𝐼𝑚𝑖𝑑 ≔ 𝑥𝑗 : 𝑥𝑗′ ∈ 𝐼: 𝑥𝑗 ≤ 𝑥𝑚𝑖𝑑 ≤ 𝑥𝑗′ : stored twice ℒ𝑙𝑒𝑓𝑡 (𝑣): sorted from left to right ℒ𝑟𝑖𝑔ℎ𝑡 (𝑣): sorted from right to left 𝐼𝑟𝑖𝑔ℎ𝑡 ≔ { 𝑥𝑗 : 𝑥𝑗′ ∈ 𝐼: 𝑥𝑚𝑖𝑑 < 𝑥𝑗 } forms right subtree Interval tree Lemma 10.2 An interval tree on a set of n intervals uses O(n) storage and has depth O(logn). Lemma 10.2 pf) Depth part is trivial, since we are storing 2n endpoints in a binary tree. O(log n) Lemma 10.2 pf) For storage bound, note that 𝐼𝑚𝑖𝑑 , 𝐼𝑙𝑒𝑓𝑡 , 𝐼𝑟𝑖𝑔ℎ𝑡 are all disjoint. Thus, each segments are stored only twice, one for ℒ𝑙𝑒𝑓𝑡 (𝑣), and the other for ℒ𝑟𝑖𝑔ℎ𝑡 (𝑣) Hence, the storage bound is O(n). The tree uses O(n) also. Algorithm: ConstructIntervalTree ConstructIntervalTree(I) Input: A set I of intervals on the real line Output: The root of an interval tree for I Algorithm: ConstructIntervalTree 1. if 𝐼 = ∅ 2. then return an empty leaf 3. else Create a node ν. Compute 𝑥𝑚𝑖𝑑 , the median of the set of interval endpoints, and store 𝑥𝑚𝑖𝑑 with ν. 4. Compute and construct two sorted lists for 𝐼𝑚𝑖𝑑 : a list ℒ𝑙𝑒𝑓𝑡 (𝑣) sorted on left endpoint and a list ℒ 𝑟𝑖𝑔ℎ𝑡 (𝑣) sorted on right endpoint. Store these two lists at ν. 5. lc(ν)← ConstructIntervalTree(𝐼𝑙𝑒𝑓𝑡 ) 6. rc(ν)← ConstructIntervalTree(𝐼𝑟𝑖𝑔ℎ𝑡 ) 7. return v Algorithm: ConstructIntervalTree Finding median: O(n), but better to keep sorted list. O(nlogn) Let 𝑛𝑚𝑖𝑑 ≔ 𝑐𝑎𝑟𝑑(𝐼𝑚𝑖𝑑 ), then creating ℒ𝑙𝑒𝑓𝑡 𝑣 , ℒ𝑟𝑖𝑔ℎ𝑡 𝑣 takes 𝑂(𝑛𝑚𝑖𝑑 log 𝑛𝑚𝑖𝑑 ) Each step takes 𝑂(𝑛 + 𝑛𝑚𝑖𝑑 log 𝑛𝑚𝑖𝑑 ) Using similar arguments in Lemma 10.2, 𝑂(𝑛 log 𝑛). Lemma 10.3 An interval tree on a set of n intervals can be built in O(nlogn) time. Algorithm: QueryIntervalTree QueryIntervalTree(v, 𝑞𝑥 ) Input: The root v of a interval tree and a query point 𝑞𝑥 Output: All intervals containing 𝑞𝑥 Algorithm: QueryIntervalTree 1. if ν is not a leaf 2. then if 𝑞𝑥 < 𝑥𝑚𝑖𝑑 (𝜈) 3. then Walk along the list 𝐿𝑙𝑒𝑓𝑡 (𝜈), starting at the interval with the leftmost endpoint, reporting all the intervals that contain 𝑞𝑥 . Stop as soon as an interval does not contain 𝑞𝑥 . 4. QueryIntervalTree(lc(ν),𝑞𝑥 ) 5. else Walk along the list 𝐿𝑟𝑖𝑔ℎ𝑡 (𝜈), starting at the interval with the rightmost endpoint, reporting all the intervals that contain 𝑞𝑥 . Stop as soon as an interval does not contain 𝑞𝑥 . 6. QueryIntervalTree(rc(ν), 𝑞𝑥 ) Algorithm: QueryIntervalTree For each step, 𝑂(1 + 𝑘𝑣 ), where 𝑘𝑣 is the # of pts reporting We go through 𝑂(log 𝑛) steps, and Query time is 𝑂(log 𝑛 + 𝑘𝑣 ) 𝑘𝑣 = 𝑘. Theorem 10.4 An interval tree for a set I of n intervals uses O(n) storage and can be built in O(nlogn) time. Using the interval tree we can report all intervals that contain a query point in O(logn+k) time, where k is the number of reported intervals. Query time: 𝑂(log 𝑛 + 𝑘) Storage: 𝑂(𝑛) Query time: 𝑂(log 𝑛 + 𝑘) Storage: 𝑂(𝑛 log 𝑛) 1. 2. 3. 4. Entirely inside Intersects once Intersects twice (partially) overlaps boundary Extend query to segment Let 𝑆𝐻 ⊆ 𝑆 be the set of horizontal segments in 𝑆. Query becomes 𝑞 ≔ 𝑞𝑥 × [𝑞𝑦 : 𝑞𝑦′ ]. 𝑠 ∈ 𝑆𝐻 looks like 𝑠 ≔ [𝑠𝑥 : 𝑠𝑥′ ] × 𝑠𝑦 . Extend query to segment Recursively traveling through the left(right) subtree is still correct Extend query to segment However, treat for the 𝐼𝑚𝑖𝑑 is no longer correct [𝑞𝑥 : +∞)× [𝑞𝑦 : 𝑞𝑦′ ] if 𝑥𝑚𝑖𝑑 < 𝑞𝑥 Extend query to segment However, treat for the 𝐼𝑚𝑖𝑑 is no longer correct Perform range search for the left(right) endpoints Query is (−∞: 𝑞𝑥 ]× [𝑞𝑦 : 𝑞𝑦′ ] (or [𝑞𝑥 : +∞)× [𝑞𝑦 : 𝑞𝑦′ ]) Uses 𝑂(𝑛𝑚𝑖𝑑 log 𝑛𝑚𝑖𝑑 ) storage, 𝑂(log 𝑛𝑚𝑖𝑑 + 𝑘) query time Data structure Main structure: interval tree Τ on x-intervals of segments Instead of ℒ𝑙𝑒𝑓𝑡 (𝑣), ℒ𝑟𝑖𝑔ℎ𝑡 (𝑣), we have range search tree Τ𝑙𝑒𝑓𝑡 (𝑣), Τ𝑟𝑖𝑔ℎ𝑡 (𝑣). Data structure Replaced to range tree Data structure Range tree takes 𝑂(log 𝑛𝑚𝑖𝑑 + 𝑘), instead of 𝑂(𝑘𝑣 ). Total time for each step becomes 𝑂(log 2 𝑛 + 𝑘) Storage complexity becomes 𝑂(𝑛 log 𝑛), since range tree dominates the main interval tree Preprocessing time remains 𝑂(𝑛 log 𝑛) Theorem 10.5 Let S be a set of n horizontal segments in the plane. The segments intersecting a vertical query segment can be reported in 𝑂(log 2 𝑛 + 𝑘) time with a data structure that uses 𝑂(𝑛 log 𝑛) storage, where k is the number of reported segments. The structure can be built in 𝑂(𝑛 log 𝑛) time. Lemma 10.1 Let S be a set of n axis-parallel line segments in the plane. The segments that have at least one endpoint inside an axisparallel query window W can be reported in O(logn+k) time with a data structure that uses O(nlogn) storage and preprocessing time, where k is the number of reported segments. Corollary 10.6 Let S be a set of n axis-parallel segments in the plane. The segments intersecting an axis-parallel rectangular query window can be reported in 𝑂(log 2 𝑛 + 𝑘) time with a data structure that uses 𝑂(𝑛 log 𝑛) storage, where k is the number of reported segments. The structure can be built in 𝑂(𝑛 log 𝑛) time. Summary Query Time: 𝑂(log 2 𝑛 + 𝑘) Storage: 𝑂(𝑛 log 𝑛) Preprocessing:𝑂(𝑛 log 𝑛) Jooho Lee SEGMENT TREE Problem Given n segments, find intervals which contain qx. Let solve it with BST. qx Elementary Intervals (): Open Interval []: Closed Interval At most 4n+1 elementary Intervals. 2n(end points)+2n(segments)+1(right side) { (-∞:p1) (p1:p2) [p1:p1] (p2:p3) [p2:p2] (p3:p4) [p3:p3] (p4:p5) (p5:p6) [p4:p4] [p5:p5] (p6:+∞) [p6:p6] =Elementary Intervals I } BST of Elementary Intervals Height of BST: O(log(4n+1))=O(log n) (-∞:p1) (p1:p2) [p1:p1] (p2:p3) [p2:p2] (p3:p4) [p3:p3] (p4:∞) [p4:p4] u (-∞:p1) [p1:p1] (p1:p2) [p2:p2] (p2:p3) [p3:p3] (p3:p4) Let Int(u) corresponding Interval with u. Int(u) is (p1:p2) interval. [p4:p4] (p4:∞) BST of Elementary Intervals At leaf node u, Store interval I containing Int(u). Query time: log(n)+k 1.find the leaf interval containing qx 2.report all intervals stored at the corresponding leaf node. q x Int (v) I (v) u (-∞:p1) [p1:p1] (p1:p2) [p2:p2] (p2:p3) [p3:p3](p3:p4) But Storage can be quadratic..O(n2) [p4:p4] (p4:∞) Qaudratic Storage Example storages for open leaf intervals :0 1 2 . . . . . . . . .N-1 N N-1 . . . . . . . . 2 1 0 Sum of storages for open leaf intervals is more than N(N+1)/2! Idea Internal node w = union of elemental intervals of leaves in the sub-tree(w)! v Segment Tree u Int(u) Int(v) (-∞:p1) [p1:p1] (p1:p2) [p2:p2] (p2:p3) [p3:p3](p3:p4) (-∞:p1) (p1:p2) [p1:p1] (p2:p3) [p2:p2] (p3:p4) [p3:p3] (p4:∞) [p4:p4] [p4:p4] (p4:∞) Which interval is contained at node v? Each node v store the Interval [ x : x`] I (v ) Int (v) [ x : x`] Int ( parent (v)) [ x : x`] If Int ( parent (v)) [ x : x`] , [x:x`] is stored at parent(v) or parentn(v) I such that Storage in Segment Tree Construction of Segment Tree Construction There is a node to split a path. Once the paths diverge, as we follow the left path, whenever the path goes to the left child, the right child must store [x:x`]. Similarly, as we follow the right path, whenever the path goes to the right child, the left child must store [x:x`]. So, For one interval insertion, we just travel only at most two pat hs. Since Height is O(log N), time complexity to build segment tree will be O(N log N) Reporting Query with Segment Tree qx v q x Int (v) I (v) lc(v) rc(v) Int(lc(v)) O(log n+k) for a query! Int(lc(v)) Int(v) Storage Complexity A segment tree on a set of n intervals uses O(n log n) storages. Height of Tree H: O(log n) For any [ x : x`] I , contained by at most two nodes at the same depth. Therefore, (number of segments)*(2*H) = O(n log n)! If v1,v3 contain [x:x`], pa rent(v2) contain [x:x`]. Contradiction! N log N storage Example N/2 segments for filling points. N/2 segments for storing log n times. Then N/2* log N=O(NlogN) for storing! Summary Query Time: O(log n + k) Storage: O(n log n) Preprocessing: O(n log n) Quiz For given n integers, give a algorithm to handle queries whic h is to calculate sum of interval (a, b). Hint Use segment tree. Solution Get rid of closed interval For each node v, calculate total of Int(v). Similar way with previous method.