COSC 6114
Prof. Andy Mirzaian
References:
• [M. de Berg et al.] chapter 5
Applications:
• Databases
• GIS, Graphics: crop-&-zoom, windowing
Orthogonal Range Search: Database Query
[Figure: records plotted by date of birth (horizontal axis) and salary (vertical axis). Example record: Mr. G. O. Meter, born Nov. 6, 1988, salary $13,600.]
2D Query Rectangle: [1980,00,00 : 1989,99,99] × [13,000 : 14,000]
[Figure: the same query extended by a third axis with range [2 : 4].]
3D Query Orthogonal Range: [1980,00,00 : 1989,99,99] × [13,000 : 14,000] × [2 : 4]
1D-Tree: 1-Dimensional Range Searching
[Figure: query range [x : x’] marked on the x axis.]
Static: Binary Search in a sorted array.
Dynamic: Store data points in some balanced Binary Search Tree T.
Let the data points be P = { p1, p2 , …, pn } ⊂ ℝ.
T is a balanced BST where the data appear at its leaves sorted left to right.
The internal nodes are used to split left & right subtrees.
Assume x(v) = max x(L), where L is any leaf in the left subtree of internal node v.
[Figure: balanced BST T with root key 23 and leaves 2, 5, 6, 9, 13, 17, 23, 31, 37, 41, 49, 62, 73, 85, 91. Query Range: [7 : 49].]
Query Range [x : x’]: Call 1DRangeQuery(root[T ],x,x’)
ALGORITHM 1DRangeQuery (v, x, x’)
  if v is a leaf then if x ≤ x(v) ≤ x’ then report data stored at v
  else do
    if x ≤ x(v) then 1DRangeQuery ( leftchild(v) , x, x’ )
    if x(v) < x’ then 1DRangeQuery ( rightchild(v), x, x’ )
  od
end
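A minimal Python sketch of 1DRangeQuery, assuming a leaf-oriented balanced BST as described above. The names Node, build and range_query are illustrative and do not come from the slides. The keys are the leaf values of the example tree, although the tree shape built here need not match the figure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float                       # leaf: the data value; internal: max key in left subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(sorted_keys: List[float]) -> Node:
    """Build a balanced leaf-oriented BST from a sorted list of distinct keys."""
    if len(sorted_keys) == 1:
        return Node(sorted_keys[0])
    mid = len(sorted_keys) // 2
    # x(v) = largest key in the left subtree
    return Node(sorted_keys[mid - 1], build(sorted_keys[:mid]), build(sorted_keys[mid:]))


def range_query(v: Node, x: float, x2: float, out: List[float]) -> None:
    """Append every leaf key in [x, x2] to out, following 1DRangeQuery."""
    if v.left is None:               # v is a leaf
        if x <= v.key <= x2:
            out.append(v.key)
        return
    if x <= v.key:                   # the left subtree may hold keys >= x
        range_query(v.left, x, x2, out)
    if v.key < x2:                   # the right subtree may hold keys <= x2
        range_query(v.right, x, x2, out)


keys = [2, 5, 6, 9, 13, 17, 23, 31, 37, 41, 49, 62, 73, 85, 91]
root = build(keys)
result: List[float] = []
range_query(root, 7, 49, result)
print(result)   # [9, 13, 17, 23, 31, 37, 41, 49]
```

The recursion only descends into subtrees that can contain keys of the range, so the leaves are reported in sorted order.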
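Complexities:
Query Time: O( K + log n )
Construction Time: O(n log n)
Space: O(n)
[These are optimal]
[Figure: P is preprocessed into T and T is stored; a query takes T and [x : x’] and produces the output. In T, the two search paths from root[T ] separate at node vsplit, and K leaves are reported.]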
2D-Tree
Consider dimension d=2:
point p = ( x(p) , y(p) ) , range R = [x1 : x2] × [y1 : y2]
p ∈ R ⇔ x(p) ∈ [x1 : x2] and y(p) ∈ [y1 : y2] .
[Figure: point p inside the query rectangle R = [x1 : x2] × [y1 : y2].]
2D-tree: L = vertical/horizontal median split of the points into Pleft and Pright.
Alternate between vertical & horizontal splitting at even and odd depths.
[Figure: a vertical line L separating Pleft from Pright, OR a horizontal line L separating Pleft from Pright.]
(Assume: no 2 points have equal x or y coordinates.)
Constructing 2D-Tree
Input: P = { p1, p2 , …, pn } ⊂ ℝ², off-line.
Output: 2D-tree storing P.
Step 1: Pre-sort P on x & on y, i.e., 2 sorted lists Û = (Xsorted(P), Ysorted(P)).
Step 2: root[T ] ← Build2DTree ( Û , 0)
end
Procedure Build2DTree ( Û , depth )
  if Û contains one point then return a leaf storing this point
  else do
    if depth is even
      then x-median split Û, i.e., split data points in half by a vertical line L
           through x-median of Û and reconfigure Ûleft and Ûright .
      else y-median split Û, … by a horizontal line L,
           and reconfigure Ûleft and Ûright .
    v ← a newly created node storing line L
    leftchild(v) ← Build2DTree ( Ûleft , 1+depth)
    rightchild(v) ← Build2DTree ( Ûright , 1+depth)
    return v
end
T(n) = 2 T(n/2) + O(n) = O(n log n) time.
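A hedged Python sketch of Build2DTree, assuming no two points share an x or y coordinate. KDNode, build_2d_tree and leaves are illustrative names. Both pre-sorted lists are filtered in one linear pass per call, which is what gives the O(n) term in the recurrence.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class KDNode:
    point: Optional[Point] = None    # set only at leaves
    axis: int = 0                    # 0: vertical line (x-split), 1: horizontal line (y-split)
    split: float = 0.0               # coordinate of the splitting line L
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_2d_tree(by_x: List[Point], by_y: List[Point], depth: int = 0) -> KDNode:
    """by_x and by_y hold the same points, sorted on x and on y respectively."""
    if len(by_x) == 1:
        return KDNode(point=by_x[0])
    axis = depth % 2
    primary = by_x if axis == 0 else by_y
    median = primary[(len(primary) - 1) // 2][axis]
    # Filtering preserves order, so each half stays sorted on both coordinates.
    # Distinct coordinates (assumed) guarantee that both halves are non-empty.
    lx = [p for p in by_x if p[axis] <= median]
    rx = [p for p in by_x if p[axis] > median]
    ly = [p for p in by_y if p[axis] <= median]
    ry = [p for p in by_y if p[axis] > median]
    return KDNode(axis=axis, split=median,
                  left=build_2d_tree(lx, ly, depth + 1),
                  right=build_2d_tree(rx, ry, depth + 1))


def leaves(v: KDNode) -> List[Point]:
    """Collect the points stored at the leaves below v, left to right."""
    if v.point is not None:
        return [v.point]
    return leaves(v.left) + leaves(v.right)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_2d_tree(sorted(pts), sorted(pts, key=lambda p: p[1]))
print(root.axis, root.split)   # 0 5
```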
2D-Tree Example
[Figure: ten points p1, …, p10 in the plane partitioned by the splitting lines L1, …, L9, next to the corresponding 2D-tree with root L1, internal nodes L1, …, L9 and leaves p1, …, p10.]
Query Point Search in 2D-Tree
[Figure: the same point set and 2D-tree as in the previous example, with a query point q; the search for q descends from the root L1 to a single leaf.]
2D-Tree node regions
region(v) = rectangular region (possibly unbounded) covered by the subtree rooted at v.
region (root[T ]) = (−∞ : +∞) × (−∞ : +∞)
Suppose region(v) = ⟨x1 : x2⟩ × ⟨y1 : y2⟩ (each ⟨ or ⟩ stands for an open or closed end).
What are region(leftchild(v)) and region(rightchild(v))?
With x-split (vertical line L):
region(lc(v)) = ⟨x1 : x(L) ] × ⟨y1 : y2⟩
region(rc(v)) = ( x(L) : x2⟩ × ⟨y1 : y2⟩
With y-split (horizontal line L):
region(lc(v)) = ⟨x1 : x2⟩ × ⟨y1 : y(L) ]
region(rc(v)) = ⟨x1 : x2⟩ × ( y(L) : y2⟩
[Figure: region(v) cut by L into the regions of lc(v) and rc(v), once for an x-split and once for a y-split.]
2D-Tree Range Search
For range R = [x1 : x2] × [y1 : y2]
call Search2DTree (root[T ] , R )
ALGORITHM Search2DTree ( v , R )
1. if v is a leaf then if p(v) ∈ R then report p(v)
2. else if region(lc(v)) ⊆ R
3.        then ReportSubtree (lc(v))
4.        else if region(lc(v)) ∩ R ≠ ∅
5.               then Search2DTree ( lc(v) , R )
6.      if region(rc(v)) ⊆ R
7.        then ReportSubtree (rc(v))
8.        else if region(rc(v)) ∩ R ≠ ∅
9.               then Search2DTree ( rc(v) , R )
end
• region(v) can either be passed as an input parameter, or explicitly stored at node v, ∀v ∈ T.
• ReportSubtree(v) is a simple linear-time in-order traversal that reports every leaf descendant of node v.
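A Python sketch of Search2DTree with region(v) passed as a parameter. All names (Node, build, contained, intersects, report_subtree, search) are illustrative. Two simplifications are assumptions of this sketch: the builder re-sorts at every level for brevity, and child regions are treated as closed on both sides, which can only cause an extra harmless visit and never a wrong report.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]     # (x1, x2, y1, y2)


class Node:
    def __init__(self, point=None, axis=0, split=0.0, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right


def build(pts: List[Point], depth: int = 0) -> Node:
    """Median-split builder; re-sorting per level costs O(n log^2 n) in total."""
    if len(pts) == 1:
        return Node(point=pts[0])
    axis = depth % 2
    pts = sorted(pts, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return Node(axis=axis, split=pts[mid - 1][axis],
                left=build(pts[:mid], depth + 1), right=build(pts[mid:], depth + 1))


def contained(region: Rect, r: Rect) -> bool:
    """True if region lies completely inside r."""
    return r[0] <= region[0] and region[1] <= r[1] and r[2] <= region[2] and region[3] <= r[3]


def intersects(region: Rect, r: Rect) -> bool:
    """True if region and r overlap."""
    return region[0] <= r[1] and r[0] <= region[1] and region[2] <= r[3] and r[2] <= region[3]


def report_subtree(v: Node, out: List[Point]) -> None:
    """Report every leaf below v."""
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def search(v: Node, region: Rect, r: Rect, out: List[Point]) -> None:
    if v.point is not None:                              # line 1
        x, y = v.point
        if r[0] <= x <= r[1] and r[2] <= y <= r[3]:
            out.append(v.point)
        return
    x1, x2, y1, y2 = region
    if v.axis == 0:                                      # x-split
        child_regions = ((x1, v.split, y1, y2), (v.split, x2, y1, y2))
    else:                                                # y-split
        child_regions = ((x1, x2, y1, v.split), (x1, x2, v.split, y2))
    for child, reg in zip((v.left, v.right), child_regions):
        if contained(reg, r):                            # lines 2-3 and 6-7
            report_subtree(child, out)
        elif intersects(reg, r):                         # lines 4-5 and 8-9
            search(child, reg, r, out)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build(pts)
found: List[Point] = []
search(root, (-math.inf, math.inf, -math.inf, math.inf), (3, 8, 1, 5), found)
print(sorted(found))   # [(5, 4), (7, 2), (8, 1)]
```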
Running Time of Search2DTree
• K = # of points reported.
• Lines 3 & 7 take O(K) time over all recursive calls.
• Total # nodes visited (reported or not) is proportional to # times conditions of lines 4 & 8 are true.
• region(v) ⊄ R & region(v) ∩ R ≠ ∅ ⇒ a bounding edge e of R intersects region(v).
• R has ≤ 4 bounding edges. Let e (assume vertical) be one of them.
• Define H(n) (resp. V(n)) = worst-case number of nodes v whose region intersects e for a 2D-tree of n leaves, assuming root corresponds to an x-split (resp. y-split).
[Figure: for H(n), e lies on one side of the vertical splitting line L; for V(n), e crosses the horizontal splitting line L.]
H(n) = V(n/2) + 1 and V(n) = 2 H(n/2) + 1, with H(1) = V(1) = 1
⇒ H(n) = 2 H(n/4) + 2 and V(n) = 2 V(n/4) + 3
⇒ H(n) = 3√n − 2 and V(n) = 4√n − 3
⇒ Running Time = O( K + √n ).
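As a quick numeric check of the closed forms, the following Python sketch evaluates the mutual recurrences for n a power of 4. It assumes the recurrences hold with equality and that H(1) = V(1) = 1; the function names simply mirror H and V.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def H(n: int) -> int:
    """Root is an x-split and e is vertical: e meets the root and one child."""
    return 1 if n == 1 else V(n // 2) + 1


@lru_cache(maxsize=None)
def V(n: int) -> int:
    """Root is a y-split and e is vertical: e meets the root and both children."""
    return 1 if n == 1 else 2 * H(n // 2) + 1


for k in range(8):
    n, sqrt_n = 4 ** k, 2 ** k
    # Closed forms: H(n) = 3*sqrt(n) - 2 and V(n) = 4*sqrt(n) - 3
    print(n, H(n), 3 * sqrt_n - 2, V(n), 4 * sqrt_n - 3)
```

For every printed row the second and third columns agree, as do the fourth and fifth.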
dD-Tree Complexities
2D-Tree
• Query Time: O( K + √n ) worst-case, O( K + log n ) average
• Construction Time: O(n log n)
• Storage Space: O(n)
dD-Tree (d dimensions)
Use round-robin splitting at successive levels on the d dimensions x1 , x2 , … , xd .
• Query Time: O( dK + d n^{1-1/d} )
• Construction Time: O(d n log n)
• Space: O(dn)
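Round-robin splitting can be sketched in Python as follows. This is a simplified illustration under stated assumptions: Node, build and query are hypothetical names, the builder re-sorts at each level instead of using d pre-sorted lists, and the query prunes by the splitting coordinate only (no ReportSubtree shortcut).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point=None, axis=0, split=0.0, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right


def build(pts: List[Point], depth: int = 0) -> Node:
    if len(pts) == 1:
        return Node(point=pts[0])
    axis = depth % len(pts[0])               # round-robin over the d coordinates
    pts = sorted(pts, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return Node(axis=axis, split=pts[mid - 1][axis],
                left=build(pts[:mid], depth + 1), right=build(pts[mid:], depth + 1))


def query(v: Node, lo: Sequence[float], hi: Sequence[float], out: List[Point]) -> None:
    """Report points p with lo[i] <= p[i] <= hi[i] for every coordinate i."""
    if v.point is not None:
        if all(l <= c <= h for l, c, h in zip(lo, v.point, hi)):
            out.append(v.point)
        return
    if lo[v.axis] <= v.split:                # the range reaches into the left half
        query(v.left, lo, hi, out)
    if v.split <= hi[v.axis]:                # the range reaches into the right half
        query(v.right, lo, hi, out)


pts3 = [(1, 5, 9), (2, 7, 3), (6, 1, 4), (8, 3, 2), (4, 9, 7), (9, 2, 6), (3, 8, 1), (7, 4, 8)]
root3 = build(pts3)
hits: List[Point] = []
query(root3, (2, 1, 1), (8, 8, 5), hits)
print(sorted(hits))
```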
How can we improve the query time?
Range Trees
2D Range Tree
• Query Time: O( K + log² n ); O( K + log n ) by Fractional Cascading
• Construction Time: O(n log n)
• Space: O(n log n)
Range R = [x : x’] × [y : y’]
1D Range Tree on x-coordinates:
[Figure: the search paths for x and x’, each of length O(log n), with O(log n) canonical sub-trees hanging between them.]
Each x-range [x : x’] can be expressed as the disjoint union of O(log n) canonical x-ranges.
Range Trees
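2-level data structure:
• Primary Level: BST T on x-coordinates, rooted at root[T ]. Each node v stores min(v), max(v) and a pointer to Tassoc(v).
• Secondary Level: Tassoc(v) is a BST on y-coordinates of P(v), the points stored in the subtree rooted at v.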
Range Tree Construction
ALGORITHM Build 2D Range Tree (P)
Input: P = { p1, p2 , …, pn } ⊂ ℝ², P = (Px , Py)
       represented by pre-sorted list on x (named Px) and on y (named Py).
Output: pointer to the root of 2D range tree for P.
  Construct Tassoc , bottom up, based on Py ,
  but store in each leaf the points, not just their y-coordinates.
  v ← a newly created node (a leaf if |P| = 1)
  if |P| > 1
    then do
      Pleft ← { p ∈ P | px ≤ xmed of P }    (* both lists Px and Py should split *)
      Pright ← { p ∈ P | px > xmed of P }
      lc(v) ← Build 2D Range Tree (Pleft )
      rc(v) ← Build 2D Range Tree (Pright )
    od
  min(v) ← min (Px ); max(v) ← max(Px )
  Tassoc(v) ← Tassoc
  return v
end
T(n) = 2 T(n/2) + O(n) = O(n log n) time.
The O(n log n) bound includes the time for pre-sorting.
2D Range Query
ALGORITHM 2DRangeQuery ( v, [x : x’] × [y : y’] )
1. if x ≤ min(v) & max(v) ≤ x’
2.   then 1DRangeQuery (Tassoc(v) , [y : y’] )
3.   else if v is not a leaf do
4.          if x ≤ max(lc(v))
5.            then 2DRangeQuery ( lc(v), [x : x’] × [y : y’] )
6.          if min(rc(v)) ≤ x’
7.            then 2DRangeQuery ( rc(v), [x : x’] × [y : y’] )
8.        od
end
[Figure: tree T with the search paths for x and x’ (blue shoulder paths) and the canonical sub-trees between them (red).]
• Line 2 called at roots of red canonical sub-trees, a total of O(log n) times.
  Each call takes O(Kv + log | Tassoc(v) | ) = O(Kv + log n) time.
• Lines 5 & 7 called at blue shoulder paths. Total cost O(log n).
• Total Query Time = O(log n + Σv (Kv + log n)) = O(Σv Kv + log² n) = O(K + log² n).
Query Time: O( K + log² n ), will be improved to O( K + log n ) by Fractional Cascading
Construction Time: O(n log n)
Space: O(n log n)
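A compact Python sketch of the 2D range tree and 2DRangeQuery, assuming distinct x-coordinates. The class and function names are illustrative. As a stated simplification, each Tassoc(v) is a y-sorted array searched with bisect instead of a BST, and it is re-sorted at every node, so construction here takes O(n log² n) rather than the O(n log n) obtained by splitting a pre-sorted Py.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class RangeNode:
    def __init__(self, pts_by_x: List[Point]):
        self.min_x = pts_by_x[0][0]                  # min(v)
        self.max_x = pts_by_x[-1][0]                 # max(v)
        # Associated structure on P(v): the points sorted on y.
        self.assoc = sorted(pts_by_x, key=lambda p: p[1])
        self.ys = [p[1] for p in self.assoc]
        self.left: Optional["RangeNode"] = None
        self.right: Optional["RangeNode"] = None
        if len(pts_by_x) > 1:
            mid = (len(pts_by_x) + 1) // 2           # split at the x-median
            self.left = RangeNode(pts_by_x[:mid])
            self.right = RangeNode(pts_by_x[mid:])


def range_query_2d(v: RangeNode, x: float, x2: float, y: float, y2: float,
                   out: List[Point]) -> None:
    if x <= v.min_x and v.max_x <= x2:               # line 1: canonical sub-tree
        lo = bisect_left(v.ys, y)                    # line 2: 1D query on y
        hi = bisect_right(v.ys, y2)
        out.extend(v.assoc[lo:hi])
    elif v.left is not None:                         # line 3: v is not a leaf
        if x <= v.left.max_x:                        # lines 4-5
            range_query_2d(v.left, x, x2, y, y2, out)
        if v.right.min_x <= x2:                      # lines 6-7
            range_query_2d(v.right, x, x2, y, y2, out)


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (1, 8), (6, 5)]
tree = RangeNode(sorted(points))
answer: List[Point] = []
range_query_2d(tree, 3, 8, 1, 5, answer)
print(sorted(answer))   # [(5, 4), (6, 5), (7, 2), (8, 1)]
```

Each point appears in the associated array of every ancestor of its leaf, which is where the O(n log n) space bound comes from.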
Higher Dimensional Range Trees
P = { p1, p2 , …, pn } ⊂ ℝ^d,
pi = (xi1 , xi2 , … , xid ) , i=1..n.
d-level data structure:
• Primary Level: BST T on the 1st coordinate, rooted at root[T ].
• Tassoc(v): (d-1)-dimensional Range Tree on coordinates 2..d of P(v).
Query Time: Q_d(n) = O( K + log^d n ), improved to O( K + log^{d-1} n ) by Fractional Cascading
Construction Time: T_d(n) = O( n log^{d-1} n )
Space: S_d(n) = O( n log^{d-1} n )
T_d(n) = 2 T_d(n/2) + T_{d-1}(n) + O(n) and T_2(n) = O(n log n)
  ⇒ T_d(n) = O( n log^{d-1} n )
S_d(n) = 2 S_d(n/2) + S_{d-1}(n) + O(1) and S_2(n) = O(n log n)
  ⇒ S_d(n) = O( n log^{d-1} n )
Q_d(n) = O(K) + Q̂_d(n), where Q̂_d(n) = O(log n) + O(log n) · Q̂_{d-1}(n) and Q̂_2(n) = O(log² n)
  ⇒ Q̂_d(n) = O( log^d n ) ⇒ Q_d(n) = O( K + log^d n )
General Sets of Points
What if 2 points have the same coordinate value at some coordinate axis?
• Composite Number Space: (lexicographic order)
  (a,b) → (a | b)
  (a | b) < (a’ | b’) ⇔ a < a’ or (a = a’ & b < b’)
• p = (px , py ) → p’ = ( (px | py ) , (py | px ) )
  R = [x : x’] × [y : y’] → R’ = [ (x | −∞) : (x’ | +∞) ] × [ (y | −∞) : (y’ | +∞) ]
• p ∈ R ⇔ p’ ∈ R’, that is:
  x ≤ px ≤ x’ & y ≤ py ≤ y’
  ⇔ (x | −∞) ≤ (px | py ) ≤ (x’ | +∞) & (y | −∞) ≤ (py | px ) ≤ (y’ | +∞)
• Note: no two points in the composite space have the same value at any coordinate (unless they are identical points).
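In Python, tuples already compare lexicographically, so composite numbers can be sketched directly as pairs. The helper names below are illustrative; the check confirms on a small point set with repeated coordinates that p ∈ R exactly when p’ ∈ R’.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Composite = Tuple[float, float]              # (a | b), compared lexicographically


def to_composite(p: Point) -> Tuple[Composite, Composite]:
    """p = (px, py) -> p' = ((px | py), (py | px))."""
    px, py = p
    return (px, py), (py, px)


def composite_range(x: float, x2: float, y: float, y2: float):
    """R = [x : x2] x [y : y2] -> R' with -inf / +inf as second components."""
    return ((x, -math.inf), (x2, math.inf)), ((y, -math.inf), (y2, math.inf))


def in_range(p: Point, x: float, x2: float, y: float, y2: float) -> bool:
    return x <= p[0] <= x2 and y <= p[1] <= y2


def in_composite_range(cp, cr) -> bool:
    (xlo, xhi), (ylo, yhi) = cr
    return xlo <= cp[0] <= xhi and ylo <= cp[1] <= yhi


pts = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)]   # repeated x and y values
firsts = [to_composite(p)[0] for p in pts]
seconds = [to_composite(p)[1] for p in pts]
print(len(set(firsts)) == len(pts), len(set(seconds)) == len(pts))   # True True
```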
Fractional Cascading
IDEA: Save repeated cost of binary search in many sorted lists for the same range [y : y’] if the list contents for one are a subset of the other.
• A2 ⊆ A1
• Binary search for y in A1 to get to A1[i].
• Follow pointer to A2 to get to A2[j].
• Now walk to the right in each list.
[Figure: sorted array A1 above sorted array A2; each entry of A1 carries a pointer into A2, and some pointers are nil.]
Fractional Cascading
[Figure: sorted array A1 with blue pointers into A2 and red pointers into A3; some pointers are nil.]
• A2 ⊆ A1 , A3 ⊆ A1 .
• No binary search in A2 and A3 is needed.
• Do binary search in A1.
• Follow blue and red pointers from there to A2 and A3.
• Now we have the starting point in each sorted list. Walk to the right & report.
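A small Python sketch of the pointer idea for one pair of lists with A2 ⊆ A1. The function names and the sample arrays are illustrative, and the sketch assumes each entry A1[i] points to the smallest entry of A2 that is ≥ A1[i], with None playing the role of nil.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def make_pointers(a1: List[int], a2: List[int]) -> List[Optional[int]]:
    """ptr[i] = index of the smallest entry of a2 that is >= a1[i], or None (nil)."""
    ptr: List[Optional[int]] = []
    j = 0
    for value in a1:                         # both lists are sorted, so j only moves right
        while j < len(a2) and a2[j] < value:
            j += 1
        ptr.append(j if j < len(a2) else None)
    return ptr


def walk_right(a: List[int], start: Optional[int], y2: int) -> List[int]:
    """Report entries from index start onward while they are <= y2."""
    out = []
    i = start
    while i is not None and i < len(a) and a[i] <= y2:
        out.append(a[i])
        i += 1
    return out


def cascade_query(a1: List[int], a2: List[int], ptr: List[Optional[int]],
                  y: int, y2: int) -> Tuple[List[int], List[int]]:
    i = bisect_left(a1, y)                   # the only binary search
    if i == len(a1):
        return [], []
    return walk_right(a1, i, y2), walk_right(a2, ptr[i], y2)


a1 = [3, 10, 19, 23, 30, 37, 59, 62, 70, 80, 100, 105]
a2 = [10, 19, 30, 62, 70, 80, 100]           # a subset of a1
ptr = make_pointers(a1, a2)
print(cascade_query(a1, a2, ptr, 20, 65))    # ([23, 30, 37, 59, 62], [30, 62])
```

Because A2 ⊆ A1, the first entry of A2 that is ≥ y is also the first entry of A2 that is ≥ A1[i], so following the pointer replaces the second binary search.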
Layered 2D Range Tree
P(lc(v)) ⊆ P(v)
P(rc(v)) ⊆ P(v)
[Figure: node v of T with children lc(v) and rc(v); Tassoc(v) on P(v) is linked to Tassoc(lc(v)) on P(lc(v)) and to Tassoc(rc(v)) on P(rc(v)).]
Associated structures at the secondary level are linked by Fractional Cascading.
Layered 2D Range Tree (by Fractional Cascading)
Query Time:
Q_2(n) = O(log n + Σv (Kv + log n)) = O(Σv Kv + log² n) = O(K + log² n)
improves to:
Q_2(n) = O(log n + Σv (Kv + 1)) = O(Σv Kv + log n) = O(K + log n).
For d-dimensional range tree query time improves to:
Q_d(n) = O(K) + Q̂_d(n), where Q̂_d(n) = O(log n) + O(log n) · Q̂_{d-1}(n) and Q̂_2(n) = O(log n)
  ⇒ Q_d(n) = O( K + log^{d-1} n )
Exercises
1. Show the following implication on the worst-case query time on 2D-Tree:
   H(n) = 2 H(n/4) + 2 and V(n) = 2 V(n/4) + 3 ⇒ Q̂(n) = O(√n).
2. Describe algorithms to insert and delete points from a 2D-Tree. You don’t need to take care of rebalancing the structure.
3. dD-Trees can also be used for partial match queries. A 2D partial match query specifies one of the coordinates and asks for all points that have the specified coordinate value. In higher dimensions we specify values for a subset of the coordinates. Here we allow multiple points to have equal values for coordinates.
   (a) Show that 2D-Trees can answer partial match queries in O(K + √n) time, where K is the number of reported answers.
   (b) Describe a data structure that uses O(n) storage and answers 2D partial match queries in O(K + log n) time.
   (c) Show that a dD-Tree can solve a partial match query in O(K + n^{1-s/d}) time, where s is the number of specified coordinates.
   (d) Show that, when we allow for O(d 2^d n) storage, dD partial match queries can be answered in O(K + d log n) time.
4. Describe algorithms to insert and delete points from a Range Tree. You don’t need to take care of rebalancing the structure.
5. One can use 2D-Trees and Range Trees to search for a particular point (a,b) by performing a range query with the range [a : a] × [b : b].
   (a) Prove this takes O(log n) time on 2D-Trees.
   (b) Derive the time bound on Range Trees.
6. In many applications one wants to do range searching among objects other than points.
   (a) Let P be a set of n axis-parallel rectangles in the plane. We want to be able to report all rectangles in P that are completely contained in a query rectangle [x : x’] × [y : y’]. Describe a data structure for this problem that uses O(n log³ n) storage and has O(K + log⁴ n) query time, where K is the number of reported answers. [Hint: Transform the problem to an orthogonal range searching problem in some higher dimensional space.]
   (b) Let P now consist of a set of n simple polygons in the plane. Describe a data structure that uses O(n log³ n) space (excluding space needed to externally store the polygons) and has O(K + log⁴ n) query time, where K is the number of polygons completely contained in the query rectangle that are reported.
   (c) Improve the query time to O(K + log³ n).
7. For this problem, assume for simplicity that n is a power of 2. Consider a 3D-Tree for a set of n points in 3D. Consider a line that is parallel to the x-axis. What is the maximum number of leaf cells that can be intersected by such a line?
8. We showed that a 2D-tree could answer orthogonal range queries for a set of n points in the plane in O( n^{1/2} + K ) time, where K was the number of points reported. Generalize this to show that in dimension 3, the query time is O( n^{2/3} + K ).
END