Uploaded by TUSHAR MALI

SGD PPT

Query Processing and Optimization
Learning Objectives
• After this segment, students will be able to
• Describe common strategies for building blocks, e.g.,
– Point Query
– Range Query
– Nearest Neighbor
– Spatial Join
Scope
• Choice of strategies
– Varies across software vendors and products
– Representative strategies are listed here
– Some strategies need special file-structures or indices
• Description of strategies
– Main message: there are multiple strategies for each building block!
– Focus on concepts rather than implementation details
Learning Objectives
• After this segment, students will be able to
• Describe common strategies for Point Queries
• Linear Search
• Binary Search with Z-order
• Indexed Search with R-tree
Strategies for Point Queries
• Point Query
– Given a location
– Return a property (e.g., place name) of the location
• List of strategies
– Linear Search
– Binary Search, if records are ordered by a space filling curve
– Index Search, if a spatial index is available
An Example Dataset
• Data: 14 points, each with a type, triangle or star
• Query: Return the type at location (x, y) = (10, 11) in binary, i.e., (2, 3)
• Candidate Storage Methods:
– 7 data blocks, each with 2 points
[Figure: the 14 data points with the query point marked]
Candidate Storage & Indexing Methods
A. Unordered
B. Z-order (Y-major), records sorted by sorting number (Z-order index)
C. R-tree (primary index)
[Figure: the three layouts; R-tree nodes are labeled root and a to i]
Linear Search for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• (Marked) Query: Return the type of crime at the location (x, y) = (2, 3)
• Storage Method: Unordered
• Cost for linear search on this dataset: 7
– Linear search on data blocks 0 .. 6
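The linear scan can be sketched as follows. This is a minimal illustration, not the slides' implementation: the point coordinates and types are hypothetical (the figure's data is not recoverable), arranged so the query point sits in the last of the 7 blocks, and `linear_point_query` is a made-up name.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int, str]   # (x, y, type)
Block = List[Point]            # one data block holds 2 points

def linear_point_query(blocks: List[Block], x: int, y: int) -> Tuple[Optional[str], int]:
    """Scan blocks in storage order; return (type, number of blocks read)."""
    blocks_read = 0
    for block in blocks:
        blocks_read += 1                      # one I/O per block fetched
        for px, py, kind in block:
            if (px, py) == (x, y):
                return kind, blocks_read
    return None, blocks_read

# Hypothetical unordered layout: 14 points in 7 blocks.
blocks = [
    [(0, 0, "star"), (3, 1, "triangle")],
    [(1, 2, "triangle"), (0, 3, "star")],
    [(2, 0, "star"), (1, 1, "triangle")],
    [(3, 3, "star"), (0, 1, "triangle")],
    [(1, 0, "triangle"), (2, 2, "star")],
    [(3, 0, "star"), (1, 3, "triangle")],
    [(0, 2, "triangle"), (2, 3, "star")],
]

print(linear_point_query(blocks, 2, 3))
```

With unordered storage nothing tells the scan where to stop early, so a miss, or a hit in the last block as here, reads all 7 blocks.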
Binary Search for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• (Marked) Query: Return the type of crime at the location (x, y) = (2, 3)
• Storage Method: Z-order (Y-major)
• Cost for binary search on this dataset: 3
– Binary search on data blocks 0 .. 6
– Probe 1: (0+6) / 2 = 3
– Probe 2: Ceil((3+6) / 2) = 5
– Probe 3: Ceil((5+6) / 2) = 6
– 3 blocks (i.e., green, cyan, yellow) accessed
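A small sketch of binary search over Z-ordered blocks follows. The assumptions are mine: "Y-major" is taken to mean the y bit is the more significant bit of each interleaved pair, the 14 stored Z-values and their types are hypothetical, and the midpoint rule is the usual floor of (lo + hi) / 2 after narrowing, which happens to probe the same blocks as the slide.

```python
from typing import List, Optional, Tuple

def z_value(x: int, y: int, bits: int = 2) -> int:
    """Interleave the bits of y and x, with the y bit first in each pair (assumed Y-major)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z

Record = Tuple[int, str]   # (z-value, type)

def binary_point_query(blocks: List[List[Record]], x: int, y: int) -> Tuple[Optional[str], List[int]]:
    """Binary search on blocks sorted by z-value; return (type, list of probed block numbers)."""
    target = z_value(x, y)
    lo, hi = 0, len(blocks) - 1
    probed: List[int] = []
    while lo <= hi:
        mid = (lo + hi) // 2
        probed.append(mid)                    # one block read per probe
        block = blocks[mid]
        if target < block[0][0]:
            hi = mid - 1
        elif target > block[-1][0]:
            lo = mid + 1
        else:                                 # target falls inside this block's z-range
            for z, kind in block:
                if z == target:
                    return kind, probed
            return None, probed
    return None, probed

# Hypothetical data: 14 of the 16 cells of a 4 x 4 grid are occupied.
zs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14]
records = [(z, "star" if z % 2 == 0 else "triangle") for z in zs]
blocks = [records[i:i + 2] for i in range(0, len(records), 2)]

print(z_value(2, 3), binary_point_query(blocks, 2, 3))
```

Under these assumptions the query point (2, 3) maps to Z = 14, and the search probes blocks 3, 5 and 6.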
Search for Point Queries Using R-Tree
• Data: 14 data points stored in data blocks with 2 points in each block
• (Marked) Query: Return the type of crime at the location (x, y) = (2, 3)
• Storage Method: R-tree (Primary Index), root cached in main memory
• I/O cost in this example
– Index blocks: 1
– Data blocks: 1
[Figure: R-tree with nodes root and a to i; search path Root -> b -> g]
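The descent through the R-tree can be sketched as below. The tree is a hypothetical stand-in for the figure: two index nodes under a cached root, seven data blocks of two points each, and made-up coordinates. The `Node`, `leaf` and `index` helpers are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)              # set for index blocks
    points: List[Tuple[int, int, str]] = field(default_factory=list)  # set for data blocks

def mbr_of(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def leaf(*pts: Tuple[int, int, str]) -> Node:
    return Node(mbr_of([(x, y, x, y) for x, y, _ in pts]), points=list(pts))

def index(*kids: Node) -> Node:
    return Node(mbr_of([k.mbr for k in kids]), children=list(kids))

def contains(r: Rect, x: int, y: int) -> bool:
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]

def rtree_point_query(root: Node, x: int, y: int) -> Tuple[Optional[str], int, int]:
    """Return (type, index blocks read, data blocks read); the root is assumed cached."""
    result, index_reads, data_reads = None, 0, 0
    stack = [c for c in root.children if contains(c.mbr, x, y)]
    while stack:
        node = stack.pop()
        if node.children:                     # index block: descend into matching children
            index_reads += 1
            stack.extend(c for c in node.children if contains(c.mbr, x, y))
        else:                                 # data block: test its points
            data_reads += 1
            for px, py, kind in node.points:
                if (px, py) == (x, y):
                    result = kind
    return result, index_reads, data_reads

# Hypothetical tree: index nodes a and b under the root, 7 data blocks, 14 points.
a = index(leaf((0, 0, "star"), (1, 0, "triangle")),
          leaf((2, 0, "star"), (3, 0, "triangle")),
          leaf((0, 1, "star"), (1, 1, "triangle")))
b = index(leaf((0, 2, "triangle"), (1, 2, "star")),
          leaf((2, 2, "triangle"), (2, 3, "star")),
          leaf((0, 3, "star"), (1, 3, "triangle")),
          leaf((3, 2, "star"), (3, 3, "triangle")))
root = index(a, b)

print(rtree_point_query(root, 2, 3))
```

In this layout only one index node and one data block have an MBR containing (2, 3), which matches the 1 + 1 cost on the slide. With overlapping MBRs an R-tree may have to follow several paths.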
Comparing 3 Strategies for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• (Marked) Query: Return the type of crime at the location (x, y) = (2, 3)
Storage Method: cost in this example
Linear Search: 7 data blocks
Binary Search: 3 data blocks
Index Search: 1 index block, 1 data block
Learning Objectives
• After this segment, students will be able to
• Describe common strategies for Range Queries
• Linear Search
• Multiple Binary Searches (+ Scan) with Z-order
• Indexed Search with R-tree
Range Queries
• Range Query Example
– List all countries crossed by the river Amazon.
– Returns several objects within a spatial region from a table
Strategies for Range Queries
1. Linear Search:
– Scan all disk blocks of the data file
2. Binary Search
– If records are ordered using space filling curve (say Z-order)
– Decompose into disjoint Z-order intervals
– For each Z-interval,
• a binary search to get lowest Z-order within the Z-interval
• then scan forward till end of the Z-interval
3. Index Search
– If an index is available on spatial location of data objects,
– then use range-query operation on the index
Range Query – Running Example
• Data: 14 points stored in data blocks with 2 points each
• (Brown Box) Query: ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
• Storage Methods:
– 7 data blocks with 2 points each
– Unordered, Z-ordered, R-tree
Range Query – Candidate Storage Methods
• Unordered
• Z-order (Y-major), sorting order (Z-order index)
• R-tree (primary index)
[Figure: the three layouts; R-tree nodes are labeled root and a to i]
Linear Search for Range Queries
• Data: 14 points stored in data blocks with 2 points each
• (Brown Box) Query: ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
• Storage Method: Unordered
• Cost for range query on unordered data: 7
– Linear search on data blocks 0 .. 6
Binary Search for Range Queries
• Data: 14 points stored in data blocks with 2 points each
• (Brown Box) Query: ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
• Storage Method: Z-order
• One Z-interval 12 .. 15 => search for 12 then scan forward
– Binary search on data blocks 0 .. 6
– (0+6) / 2 = 3
– ceil((3+6) / 2) = 5
– Found Z = 12, scan forward till Z = 15
– 3 blocks (i.e., green, cyan, yellow) accessed
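The decompose, search and scan steps can be sketched as below. Several things here are assumptions: the Y-major bit interleaving, the hypothetical list of 14 stored Z-values, and the brute-force decomposition that enumerates every cell of the query box (fine for a 4 x 4 grid, not how a real system would do it). The sketch works on an in-memory sorted list and does not count block reads.

```python
from bisect import bisect_left
from typing import List, Tuple

def z_value(x: int, y: int, bits: int = 2) -> int:
    """Interleave the bits of y and x, with the y bit first in each pair (assumed Y-major)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z

def z_intervals(xmin: int, xmax: int, ymin: int, ymax: int) -> List[Tuple[int, int]]:
    """Decompose a query box into disjoint runs of consecutive z-values."""
    zs = sorted(z_value(x, y) for x in range(xmin, xmax + 1) for y in range(ymin, ymax + 1))
    runs: List[List[int]] = []
    for z in zs:
        if runs and z == runs[-1][1] + 1:
            runs[-1][1] = z                   # extend the current run
        else:
            runs.append([z, z])               # start a new run
    return [(lo, hi) for lo, hi in runs]

def z_range_query(sorted_zs: List[int], intervals: List[Tuple[int, int]]) -> List[int]:
    """For each interval: binary search for its lowest stored z, then scan forward."""
    hits: List[int] = []
    for lo, hi in intervals:
        i = bisect_left(sorted_zs, lo)        # binary search step
        while i < len(sorted_zs) and sorted_zs[i] <= hi:
            hits.append(sorted_zs[i])         # forward scan step
            i += 1
    return hits

stored = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14]   # hypothetical occupied cells
box = z_intervals(2, 3, 2, 3)
print(box, z_range_query(stored, box))
```

For the query ( 2 <= x <= 3 ) and ( 2 <= y <= 3 ) this convention yields the single interval 12 .. 15, so one binary search followed by a short scan is enough.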
Range Query with Two Z-intervals
• Data: 14 points stored in data blocks with 2 points each
• (Brown Box) Query: ( 2 <= x <= 3 ) and ( 1 <= y <= 2 )
• Two Z-intervals: [5 .. 6] and [12 .. 13]
• One binary search (followed by scan) for each Z-interval!
– Binary search to find [(6), (7)]: (0+6) / 2 = 3, then block 2; found Z = 5, scan till Z = 6
– Binary search to find [(12), (13)]: (0+6) / 2 = 3, ceil((3+6) / 2) = 5; found Z = 10, scan till Z = 11
– 3 blocks (i.e., green, purple, cyan) accessed
Search for Range Queries Using R-tree Index
• Data: 14 points stored in data blocks with 2 points each
• (Brown Box) Query: ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
• Candidate Storage Method: R-tree (Primary Index)
• Cost in this example
– Index blocks: 1
– Data blocks: 2
[Figure: R-tree with nodes root and a to i; search path Root -> b -> g, i]
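A range query on an R-tree follows every child whose MBR intersects the query box. The sketch below uses a hypothetical tree (tuple-based nodes, made-up coordinates and hand-written MBRs), with the root assumed to be cached as in the point-query slide.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax)

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# A node is ("index", mbr, [child nodes]) or ("data", mbr, [(x, y, type), ...]).
def rtree_range_query(root, query: Rect):
    """Return (sorted hits, index blocks read, data blocks read); root assumed cached."""
    hits, index_reads, data_reads = [], 0, 0
    stack = [c for c in root[2] if intersects(c[1], query)]
    while stack:
        kind, _mbr, items = stack.pop()
        if kind == "index":
            index_reads += 1
            stack.extend(c for c in items if intersects(c[1], query))
        else:
            data_reads += 1
            hits.extend(p for p in items if intersects((p[0], p[1], p[0], p[1]), query))
    return sorted(hits), index_reads, data_reads

# Hypothetical tree: two index nodes, 7 data blocks, 14 points.
a = ("index", (0, 0, 3, 1), [
    ("data", (0, 0, 1, 0), [(0, 0, "star"), (1, 0, "triangle")]),
    ("data", (2, 0, 3, 0), [(2, 0, "star"), (3, 0, "triangle")]),
    ("data", (0, 1, 1, 1), [(0, 1, "star"), (1, 1, "triangle")]),
])
b = ("index", (0, 2, 3, 3), [
    ("data", (0, 2, 1, 2), [(0, 2, "triangle"), (1, 2, "star")]),
    ("data", (2, 2, 2, 3), [(2, 2, "triangle"), (2, 3, "star")]),
    ("data", (0, 3, 1, 3), [(0, 3, "star"), (1, 3, "triangle")]),
    ("data", (3, 2, 3, 3), [(3, 2, "star"), (3, 3, "triangle")]),
])
root = ("index", (0, 0, 3, 3), [a, b])

print(rtree_range_query(root, (2, 2, 3, 3)))
```

In this layout the query box intersects one index node and two of its data blocks, giving the same 1 index block and 2 data blocks as the slide.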
Comparing Algorithms for Range Queries
• Data: 14 points stored in 7 data blocks with 2 points each
• (Brown Box) Query: ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
Storage Method: cost in this example
Linear Search: 7 data blocks
Binary Searches: 3 data blocks
Index Search: 1 index block, 2 data blocks
Learning Objectives
• After this segment, students will be able to
• Describe common strategies for Spatial Join Queries
– Nested Loop
– Nested Loop with one Spatial Index
– Space Partitioning
– Tree Matching
Spatial Join Example
• Pair rivers with the countries they flow through.
• Return pairs across “Rivers” and “Countries” tables satisfying the “overlap” predicate
Spatial Join – Example Data
Query: For each fire station, find all the houses within a distance <= 1
[Figure: fire station map (stations A to D), house map (houses a to l), and their overlay]
Result (Fire station: House)
A: a
B: f
D: h
D: j
Storage Structure
2 blocks for fire stations
6 blocks for houses
[Figure: data blocks 0 to 7; blocks 0 and 1 hold fire stations A to D, blocks 2 to 7 hold houses a to l]
Strategy 1: Nested Loop
• List of strategies
1. Nested loop:
• Test all possible pairs for spatial predicate
• Outer loop: bring data blocks of first table in memory
• Inner loop: scan the second table
2. Nested Loop with a spatial index
3. Space Partitioning:
4. Tree Matching
5. Other, e.g. spatial-join-index based, external plane-sweep, …
Nested Loop Example
Algorithm: For each block Bfs of fire stations
    For each block Bh of houses
        Scan all pairs of fire stations in Bfs and houses in Bh
Cost: For the Blue block, inner loop fetches all 6 (circle) blocks
For the Red block, inner loop fetches all 6 (circle) blocks
# blocks for fire stations * # blocks for houses = 2*6 = 12
Assume: 3 memory buffers (i.e., 1 for fire-stations, 1 for houses, 1 for results)
[Figure: data blocks 0 to 7 holding fire stations and houses]
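A block nested loop join for the fire-station query might look like the sketch below. All coordinates and the assignment of objects to blocks are hypothetical; they were chosen so that the output matches the result pairs listed on the example-data slide. The counter tracks inner-loop block fetches only, as the slide's cost formula does.

```python
from math import dist
from typing import List, Tuple

Obj = Tuple[str, Tuple[float, float]]   # (name, (x, y))

def nested_loop_join(fs_blocks: List[List[Obj]], house_blocks: List[List[Obj]],
                     max_dist: float = 1.0):
    """Test every fire-station/house pair; return (pairs, inner block fetches)."""
    pairs, inner_reads = [], 0
    for fs_block in fs_blocks:               # outer loop: one fire-station block in memory
        for h_block in house_blocks:         # inner loop: rescan the whole houses table
            inner_reads += 1
            for fs_name, fs_xy in fs_block:
                for h_name, h_xy in h_block:
                    if dist(fs_xy, h_xy) <= max_dist:
                        pairs.append((fs_name, h_name))
    return pairs, inner_reads

# Hypothetical layout: 2 fire-station blocks, 6 house blocks.
fs_blocks = [[("A", (1, 5)), ("B", (5, 5))],
             [("C", (4.5, 3)), ("D", (1.8, 1))]]
house_blocks = [[("a", (1, 4.5)), ("b", (2.5, 5.5))],
                [("c", (0, 3.5)), ("d", (4, 4.5))],
                [("e", (2.5, 3.5)), ("f", (5, 4.2))],
                [("g", (0.5, 2.5)), ("h", (2, 1.8))],
                [("i", (5.5, 2)), ("j", (2.7, 1))],
                [("k", (4, 1.5)), ("l", (5.5, 0.5))]]

print(nested_loop_join(fs_blocks, house_blocks))
```

The cost is independent of the data: every outer block triggers a full scan of the inner table, hence 2 * 6 = 12 fetches.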
Refining Nested Loop with a spatial index
[Figure: R-tree on houses with root Rooth and index nodes X and Y; fire stations A to D and houses a to l]
Strategy 2. Nested loop with spatial index
• Outer loop: For each data block D of the first table
• Inner loop: Range query on the second table for overlapping blocks
– E.g., Houses within a distance <= 1
[Figure: R-tree on houses with root Rooth and index nodes X and Y]
Ex.: Nested Loop with a spatial index
• Outer loop: For each data block D of the first table
• Inner loop: Range query on the second table for overlapping blocks
– E.g., Houses within a distance <= 1
Outer block (fire stations): Inner blocks (houses)
0: 2, 3, 5, 6
1: 4, 6, 7
[Figure: fire-station data blocks 0 and 1, house data blocks 2 to 7 under index nodes X and Y]
Cost of Nested loop with Index
Block 0: Root -> X -> 2, 3
         Root -> Y -> 5, 6
Block 1: Root -> X -> 4
         Root -> Y -> 6, 7
Index blocks: 2+2=4
Data blocks: FS 2, House 4+3=7, Total 9
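The cost figures above can be recomputed from the outer-to-inner block table. This is plain arithmetic; the function name is made up, and the "two index nodes per range query" parameter reflects the slide's paths (X and Y are read for each outer block, the root being cached).

```python
from typing import Dict, List

def index_nested_loop_cost(inner_blocks: Dict[int, List[int]],
                           index_nodes_per_query: int = 2) -> Dict[str, int]:
    """inner_blocks maps each outer (fire-station) block to the house blocks its range query returns."""
    fs_reads = len(inner_blocks)                              # each outer block is read once
    house_reads = sum(len(b) for b in inner_blocks.values())  # blocks fetched by the range queries
    index_reads = index_nodes_per_query * len(inner_blocks)   # X and Y per range query
    return {"index": index_reads, "fs": fs_reads,
            "house": house_reads, "data_total": fs_reads + house_reads}

cost = index_nested_loop_cost({0: [2, 3, 5, 6], 1: [4, 6, 7]})
print(cost)
```

Block 6 is counted twice because both range queries fetch it; with more buffers it could be kept in memory, which the slide's cost model does not assume.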
Strategy 3. Space Partitioning Join
• Example Query: Pair rivers with countries they pass through.
– Do we need to test the Nile river with countries outside Africa?
• Space Partitioning Idea
– Rivers in Africa are tested with countries in Africa only
– Test pairs of objects within common spatial regions
Common Space Partitioning
Query: For each fire station, find houses within distance <= 1
Four partitions: P0, P1, P2, P3
For each fire station, create MOBR with length of 1
Q? Why is C in two partitions?
Partition: Fire stations; Houses
P0: A; a, b, c, e
P1: B, C; d, f
P2: D; g, h, j
P3: C; i, k, l
[Figure: the map split into partitions P0 to P3 with the fire-station MOBRs; data blocks 0 to 7]
Space Partition Join Algorithm
Ex. For each fire station, find houses within distance <= 1
Filter: For each partition Pi
    Bring the partition into main memory
    Test all pairs of MOBR Mfs of fire stations in Pi and all houses in Pi
Refinement: Test remaining pairs with exact geometry, e.g., distance <= 1
Partitions (Partition: Fire stations; Houses)
P0: A; a, b, c, e
P1: B, C; d, f
P2: D; g, h, j
P3: C; i, k, l
Result after Filter Phase (Partition: MOBR; Houses)
P0: A; a, b, c, e
P1: B; f
P1: C; d, f
P2: D; h, j
P3: C; i, k
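The filter and refinement steps can be sketched as follows. Everything concrete here is an assumption: the coordinates, the 6 x 6 map split into four quadrants, and the way MOBRs are built (a square extending `max_dist` in each direction). The data is chosen so that station C's MOBR straddles two partitions, as in the slide, but the filter output is not meant to reproduce the slide's table.

```python
from math import dist

FIRE_STATIONS = {"A": (1, 5), "B": (5, 5), "C": (4.5, 3), "D": (1.8, 1)}
HOUSES = {"a": (1, 4.5), "b": (2.5, 5.5), "c": (0, 3.5), "d": (4, 4.5),
          "e": (2.5, 3.5), "f": (5, 4.2), "g": (0.5, 2.5), "h": (2, 1.8),
          "i": (5.5, 2), "j": (2.7, 1), "k": (4, 1.5), "l": (5.5, 0.5)}
# Rectangles are (xmin, ymin, xmax, ymax); four hypothetical partitions.
PARTITIONS = {"P0": (0, 3, 3, 6), "P1": (3, 3, 6, 6), "P2": (0, 0, 3, 3), "P3": (3, 0, 6, 3)}

def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def mobr(xy, r):
    """Square buffer of half-width r around a point, mimicking 'distance <= r'."""
    return (xy[0] - r, xy[1] - r, xy[0] + r, xy[1] + r)

def stations_in(part, max_dist=1.0):
    """A station belongs to every partition its MOBR overlaps (replication)."""
    return [n for n, xy in FIRE_STATIONS.items() if overlaps(mobr(xy, max_dist), part)]

def partition_join(max_dist=1.0):
    result = set()                            # a set removes duplicates caused by replication
    for part in PARTITIONS.values():
        houses = [n for n, (x, y) in HOUSES.items() if overlaps((x, y, x, y), part)]
        for fs in stations_in(part, max_dist):
            box = mobr(FIRE_STATIONS[fs], max_dist)
            for h in houses:
                hx, hy = HOUSES[h]
                if not overlaps(box, (hx, hy, hx, hy)):
                    continue                  # filter: house lies outside the MOBR
                if dist(FIRE_STATIONS[fs], HOUSES[h]) <= max_dist:
                    result.add((fs, h))       # refinement: exact distance test
    return sorted(result)

print(partition_join())
```

Replicating a station into every partition its MOBR touches is what keeps the join correct: a house near a partition border is still compared with stations on the other side.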
Cost of Space Partitioning Join
Total cost = 8+8+(3+2+3+3) = 27
Read all data blocks: 8
Write partitioning back: 8
Compute for each partition: P0 3, P1 2, P2 3, P3 3
About 3 “scans” of each table, if replication of objects across partitions is rare.
[Figure: partitions P0 to P3 with fire stations A to D, houses a to l, and data blocks 0 to 7]
Strategy 4: Tree Matching – Basic Idea
• Nested Loop with an Index
– Inner loop range queries
– Eliminated pairs of data-blocks if disjoint MOBRs
• Space-partitioning join
– Eliminated partition-pairs ((P0, P1), …) since disjoint MOBRs
• Tree Matching, if both tables are indexed:
– Eliminate pairs of index/data-blocks if disjoint MOBRs
– Start at Root level, eliminate child-pair if irrelevant
– Recursion on remaining pairs
Inputs for Tree Matching: Both tables have spatial indexes
[Figure: fire-station index Rootfs over data blocks 0 and 1; house index Rooth with nodes X and Y over data blocks 2 to 7]
Example Spatial Join Query
• Query: For each fire station, find houses within distance <= 1
• MOBR buffer of size 1 to mimic spatial join predicate, i.e. distance <= 1
• Root level: no child-pair is eliminated
• Recursion on remaining pairs, i.e., (X, 0), (Y, 0), (X, 1), (Y, 1)
[Figure: Rootfs over data blocks 0 and 1, Rooth with nodes X and Y over data blocks 2 to 7]
Tree Matching Algorithm – Next Iteration
• Recursion on (X, 0) => remaining pairs: (2, 0), (3, 0)
• Recursion on (Y, 0) => remaining pairs: (5, 0), (6, 0)
• Recursion on (X, 1) => remaining pairs: (4, 1)
• Recursion on (Y, 1) => remaining pairs: (6, 1), (7, 1)
[Figure: MOBRs of fire-station data blocks 0 and 1 overlaid on house index nodes X and Y and data blocks 2 to 7]
Cost of Tree Matching Algorithm
• Pairs examined:
– (X, 0), (Y, 0), (X, 1), (Y, 1)
– (2, 0), (3, 0), (5, 0), (6, 0), (4, 1), (6, 1), (7, 1)
• Blocks accessed
– Index blocks besides roots: X, Y
– Data blocks: all, with 6 accessed twice
Index blocks: 2+2=4
Data blocks: FS 2, House 4+3=7, Total 9
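The recursion on pairs of nodes can be sketched as below. The MBRs are hypothetical, picked so that the surviving data-block pairs match the ones listed on the slides; the fire-station rectangles stand for MOBRs already enlarged by the distance buffer. `Node`, `overlaps` and `tree_match` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)

@dataclass
class Node:
    name: str
    mbr: Rect
    children: Tuple["Node", ...] = ()       # empty tuple means a data block

def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def tree_match(h: Node, fs: Node, out: List[Tuple[str, str]]) -> None:
    """Collect (house block, fire-station block) pairs whose MBRs overlap."""
    if not overlaps(h.mbr, fs.mbr):
        return                               # eliminate this pair and everything below it
    if not h.children and not fs.children:
        out.append((h.name, fs.name))        # candidate pair of data blocks
        return
    for hc in h.children or (h,):            # a data block is paired with the other side's children
        for fc in fs.children or (fs,):
            tree_match(hc, fc, out)

# Hypothetical indexes: Rootfs over blocks 0 and 1; Rooth over X (2, 3, 4) and Y (5, 6, 7).
rootfs = Node("Rootfs", (0, 0, 10, 10), (Node("0", (0, 0, 5, 10)), Node("1", (6, 0, 10, 10))))
x = Node("X", (0, 5, 9, 10), (Node("2", (0, 8, 2, 10)), Node("3", (3, 5, 4, 7)), Node("4", (7, 6, 9, 9))))
y = Node("Y", (1, 0, 10, 4), (Node("5", (1, 0, 2, 3)), Node("6", (4, 1, 7, 2)), Node("7", (8, 0, 10, 4))))
rooth = Node("Rooth", (0, 0, 10, 10), (x, y))

pairs: List[Tuple[str, str]] = []
tree_match(rooth, rootfs, pairs)
print(pairs)
```

Each surviving pair still needs the exact distance test on the objects inside the two data blocks; that refinement step is omitted here.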
Comparing Algorithms for Spatial Join
1. Default choice is Nested loop
2. Neither table has spatial index
– Space partitioning if spatial-join predicate is selective
3. One table has a spatial index
– nested loop with index
4. Both tables have spatial tree indexes & selective spatial join predicate
– Tree matching
Learning Objectives
• After this segment, students will be able to
• Describe common strategies for Nearest Neighbor Queries
• Two Phase (Upper bounding)
• Single Phase (Pruning)
Nearest Neighbor Queries
• Example
– Find the city closest to Chicago.
– Return one spatial object from city data file C
Nearest Neighbor – Running Examples
Each point represents the location of a restaurant.
Query: Given the location of a user p, find the nearest restaurant.
(If there is more than one nearest neighbor, return all results)
Result: Nearest neighbor of p is j
[Figure: restaurants a to l and the query point p (user)]
Strategies for Nearest Neighbor Queries
• Two phase approach
– Fetch C’s disk sector(s) containing the query point
– M = minimum distance(query point, objects in fetched leaf)
– Test all cities within distance M of query point (Range Query)
• Single phase approach
Two Phase Strategy (with an R-tree)
• Find the index leaf containing the query point p: block red
• In the red leaf, points g and h are the closest points to p, dB = 2
• Create a circle Circlep whose center is p, and radius = dB
• Create the MOBR of Circlep: Mp
• Range query: Mp, and test all points in Mp
– Root -> Y -> Block brown
• Since dist(p, j) = 1.41 < dB, point j is the nearest neighbor of p
Cost:
Phase 1: 1 index block, 1 data block
Phase 2: 1 index block, 1 data block
[Figure: R-tree with root and index nodes X and Y over restaurants a to l; user at p]
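The two phases can be sketched as below. For brevity the sketch checks the MBR of every data block directly instead of walking an R-tree, and all coordinates are hypothetical, picked so that the home block gives the bound dB = 2 and j ends up at distance 1.41 as on the slide. It assumes p lies inside some block's MBR; the other case is the exercise on the next slide.

```python
from math import dist, isclose

def mbr(block):
    xs = [xy[0] for _, xy in block]
    ys = [xy[1] for _, xy in block]
    return (min(xs), min(ys), max(xs), max(ys))

def contains(r, p):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def two_phase_nn(blocks, p):
    """Return (names of nearest neighbors, upper bound found in phase 1)."""
    # Phase 1: fetch the block whose MBR contains p; its closest point gives an upper bound.
    home = next(b for b in blocks if contains(mbr(b), p))
    bound = min(dist(p, xy) for _, xy in home)
    # Phase 2: range query with the MOBR of the circle (center p, radius bound).
    box = (p[0] - bound, p[1] - bound, p[0] + bound, p[1] + bound)
    candidates = [(name, xy) for b in blocks if intersects(mbr(b), box)
                  for name, xy in b if contains(box, xy)]
    best = min(dist(p, xy) for _, xy in candidates)
    nearest = sorted(n for n, xy in candidates if isclose(dist(p, xy), best))
    return nearest, bound

# Hypothetical blocks: "red" holds g and h, "brown" holds j and l.
red = [("g", (5, 5)), ("h", (7, 3))]
brown = [("j", (4, 2)), ("l", (3, 0))]
other = [("a", (0, 6)), ("b", (1, 8))]

print(two_phase_nn([red, brown, other], (5, 3)))
```

The bound from phase 1 is only an upper limit on the answer; the true nearest neighbor may sit in a neighboring block, which is why the range query in phase 2 is needed.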
Two Phase Strategy – Exercise
Ex.: Generalize the algorithm to the case when the query point is outside the bounding box of the root of the R-tree.
[Figure: R-tree with root and index nodes X and Y over restaurants a to l; user p outside the root's bounding box]
Strategies for Nearest Neighbor Queries
• Two phase approach
• Single phase approach
– Recursive algorithm for R-tree
– Eliminate children dominated by some other children
– Check the remaining data blocks for nearest neighbor
One Phase Strategy with an R-tree
Node: MinDist, MaxDist
First level:
X: 3, 7.47
Y: 0, 4.47
Nothing eliminated
Second level:
0: 3.16, 4.12
1: 3.16, 5.10
2: 4.47 (MinDist)
3: 0, 2.83
4: 1.41, 2.83
5: 3.16, 4
Node 2 eliminated
Node 5 eliminated
Finally, check blocks 0, 1, 3, 4 for nearest neighbors
[Figure: R-tree with root and index nodes X and Y over data blocks 0 to 5; restaurants a to l and query point p]
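The elimination rule can be sketched with the MinDist/MaxDist values from the table. Two assumptions are made: pruning is applied among the children of one parent at a time (this reproduces the slide's outcome), and node 2's MaxDist, which does not appear in the extracted table, is replaced by an infinite placeholder. `prune` is a made-up helper name.

```python
from typing import Dict, List, Tuple

def prune(siblings: Dict[str, Tuple[float, float]]) -> List[str]:
    """Keep nodes that are not dominated.

    A node is dominated when its MinDist exceeds the MaxDist of some sibling,
    because that sibling is then guaranteed to hold a closer point.
    """
    best_max = min(max_d for _, max_d in siblings.values())
    return [name for name, (min_d, _) in siblings.items() if min_d <= best_max]

# (MinDist, MaxDist) per node, taken from the slide's table.
first_level = {"X": (3, 7.47), "Y": (0, 4.47)}
children_of_x = {"0": (3.16, 4.12), "1": (3.16, 5.10),
                 "2": (4.47, float("inf"))}   # MaxDist of node 2 unknown: placeholder
children_of_y = {"3": (0, 2.83), "4": (1.41, 2.83), "5": (3.16, 4)}

kept_top = prune(first_level)
to_check = prune(children_of_x) + prune(children_of_y)
print(kept_top, to_check)
```

Pruning across all six data blocks at once would be tighter (the smallest MaxDist overall is 2.83), but it would also drop blocks 0 and 1, which the slide still checks.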
Comparing Algorithms for Nearest Neighbor Queries
Data: Each point in this dataset represents the location of a restaurant.
Query: Given the location of a user p, find the nearest restaurant.
(If there is more than one nearest neighbor, return all results)
Result: Nearest neighbor of p is j
Storage Method: cost in this example
Two phase approach: 2 index blocks, 2 data blocks
One phase approach: 2 index blocks, 4 data blocks
[Figure: restaurants a to l and the query point p (user)]