Query Processing and Optimization in Spatial Databases

Query Processing and Optimization: Learning Objectives
• After this segment, students will be able to
  – Describe common strategies for the building blocks, e.g.,
    Point Query, Range Query, Nearest Neighbor, Spatial Join

Scope
• Choice of strategies
  – Varies across software vendors and products
  – Representative strategies are listed here
  – Some strategies need special file structures or indices
• Description of strategies
  – Main message: there are multiple strategies for each building block!
  – Focus on concepts rather than implementation details

Learning Objectives: Point Queries
• After this segment, students will be able to
  – Describe common strategies for Point Queries
    • Linear Search
    • Binary Search with Z-order
    • Indexed Search with R-tree

Strategies for Point Queries
• Point Query
  – Given a location
  – Return a property (e.g., place name) of the location
• List of strategies
  – Linear Search
  – Binary Search, if records are ordered by a space-filling curve
  – Index Search, if a spatial index is available

An Example Dataset
• Data: 14 points, each with a type (triangle or star)
• Query: Return the type at location (x, y) = (10, 11) in binary, i.e., (2, 3) in decimal
• Candidate storage methods: 7 data blocks, each with 2 points
[Figure: the 14 data points and the query point]

Candidate Storage & Indexing Methods
• A. Unordered
• B. Z-order (Y-major), records sorted by Z-order number
• C. R-tree (primary index)
[Figure: the three layouts; the R-tree has a root and nodes labeled a through i]

Linear Search for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• Query (marked in the figure): Return the type of crime at location (x, y) = (2, 3)
• Storage method: Unordered
• Cost for linear search on this dataset: 7 (linear search on data blocks 0 .. 6)

Binary Search for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• Query (marked in the figure): Return the type of crime at location (x, y) = (2, 3)
• Storage method: Z-order (Y-major)
• Cost for binary search on this dataset: 3
• Binary search on data blocks 0 .. 6
  – (0+6) / 2 = 3, probe block 3
  – ceil((3+6) / 2) = 5, probe block 5
  – ceil((5+6) / 2) = 6, probe block 6
• 3 blocks (green, cyan, yellow in the figure) accessed

Search for Point Queries Using R-tree
• Data: 14 points stored in data blocks with 2 points in each block
• Query (marked in the figure): Return the type of crime at location (x, y) = (2, 3)
• Storage method: R-tree (primary index), root cached in main memory
• I/O cost in this example: 1 index block, 1 data block
[Figure: R-tree with the search path highlighted (root, node b, node g)]

Comparing 3 Strategies for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• Query (marked in the figure): Return the type of crime at location (x, y) = (2, 3)

  Strategy         Cost in this example
  Linear Search    7 data blocks
  Binary Search    3 data blocks
  Index Search     1 index block + 1 data block

Learning Objectives: Range Queries
• After this segment, students will be able to
  – Describe common strategies for Range Queries
    • Linear Search
    • Multiple Binary Searches (+ Scan) with Z-order
    • Indexed Search with R-tree

Range Queries
• Range Query example: List all countries crossed by the river Amazon.
• Returns several objects within a spatial region from a table

Strategies for Range Queries
1. Linear Search
   – Scan all disk blocks of the data file
2. Binary Search
   – If records are ordered using a space-filling curve (say Z-order)
   – Decompose the query region into disjoint Z-order intervals
   – For each Z-interval,
     • a binary search to get the lowest Z-order within the Z-interval
     • then scan forward till the end of the Z-interval
3. Index Search
   – If an index is available on the spatial location of data objects,
   – then use the range-query operation on the index

Range Query – Running Example
• Data: 14 points stored in 7 data blocks with 2 points each
• Query (brown box in the figure): (2 <= x <= 3) and (2 <= y <= 3)
• Storage methods: Unordered, Z-ordered, R-tree

Range Query – Candidate Storage Methods
• Unordered
• Z-order (Y-major), records sorted by Z-order number
• R-tree (primary index)
[Figure: the three layouts; the R-tree has a root and nodes labeled a through i]

Linear Search for Range Queries
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): (2 <= x <= 3) and (2 <= y <= 3)
• Storage method: Unordered
• Cost for range query on unordered data: 7 (linear search on data blocks 0 .. 6)

Binary Search for Range Queries
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): (2 <= x <= 3) and (2 <= y <= 3)
• Storage method: Z-order
• One Z-interval 12 .. 15 => search for 12, then scan forward
• Binary search on data blocks 0 .. 6
  – (0+6) / 2 = 3, probe block 3
  – ceil((3+6) / 2) = 5, probe block 5: found Z = 12
  – scan forward till Z = 15 (block 6)
• 3 blocks (green, cyan, yellow in the figure) accessed

Range Query with Two Z-intervals
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): (2 <= x <= 3) and (1 <= y <= 2)
• Two Z-intervals: [6 .. 7] and [12 .. 13]
• One binary search (followed by a scan) for each Z-interval!
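The Z-order numbers and Z-intervals used in the range-query examples can be reproduced with a small sketch. It assumes 2-bit coordinates and a Y-major layout in which each y bit is placed just before the matching x bit; the function names `z_value` and `z_intervals` are hypothetical, and the interval decomposition is done by brute-force enumeration of the query cells rather than by an optimized algorithm.

```python
def z_value(x: int, y: int, bits: int = 2) -> int:
    """Interleave the bits of y and x, y first (assumed Y-major layout)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((y >> i) & 1)  # y bit goes to the more significant slot
        z = (z << 1) | ((x >> i) & 1)  # x bit follows
    return z


def z_intervals(x_lo, x_hi, y_lo, y_hi, bits=2):
    """Merge the Z-values of all cells in a query rectangle into runs."""
    zs = sorted(
        z_value(x, y, bits)
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    )
    runs = []
    for z in zs:
        if runs and z == runs[-1][1] + 1:
            runs[-1][1] = z          # extend the current run
        else:
            runs.append([z, z])      # start a new run
    return [tuple(run) for run in runs]


print(z_value(2, 3))            # Z-value of the point-query location (2, 3)
print(z_intervals(2, 3, 2, 3))  # query (2 <= x <= 3) and (2 <= y <= 3)
print(z_intervals(2, 3, 1, 2))  # query (2 <= x <= 3) and (1 <= y <= 2)
```

Each run returned by `z_intervals` corresponds to one binary search for the run's lowest Z-value followed by a forward scan to its highest Z-value.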
• Cost of the range query with two Z-intervals: 3 blocks (green, purple, cyan in the figure) accessed
  – Binary search for [6 .. 7]: (0+6) / 2 = 3, probe block 3; found Z = 6, scan till Z = 7 (the figure marks blocks 3 and 2)
  – Binary search for [12 .. 13]: (0+6) / 2 = 3, probe block 3; ceil((3+6) / 2) = 5, probe block 5; found Z = 12, scan till Z = 13

Search for Range Queries Using R-tree Index
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): (2 <= x <= 3) and (2 <= y <= 3)
• Storage method: R-tree (primary index)
• Cost in this example: 1 index block, 2 data blocks
[Figure: R-tree with the search path highlighted (root, node b, nodes g and i)]

Comparing Algorithms for Range Queries
• Data: 14 points stored in 7 data blocks with 2 points each
• Query (brown box): (2 <= x <= 3) and (2 <= y <= 3)

  Strategy          Cost in this example
  Linear Search     7 data blocks
  Binary Searches   3 data blocks
  Index Search      1 index block + 2 data blocks

Learning Objectives: Spatial Join Queries
• After this segment, students will be able to
  – Describe common strategies for Spatial Join Queries
    • Nested Loop
    • Nested Loop with one Spatial Index
    • Space Partitioning
    • Tree Matching

Spatial Join Example
• Pair rivers with the countries they flow through.
• Return pairs across the "Rivers" and "Countries" tables satisfying the "overlap" predicate

Spatial Join – Example Data
• Query: For each fire station, find all the houses within a distance <= 1
• Fire stations: A, B, C, D
• Houses: a through l
[Figure: fire station map, house map, and their overlay]

  Fire station   House
  A              a
  B              f
  D              h
  D              j

Storage Structure
• 2 blocks for fire stations (data blocks 0 and 1)
• 6 blocks for houses (data blocks 2 through 7)

Strategy 1: Nested Loop
• List of strategies
  1. Nested loop
     – Test all possible pairs for the spatial predicate
     – Outer loop: bring data blocks of the first table into memory
     – Inner loop: scan the second table
  2. Nested loop with a spatial index
  3. Space partitioning
  4. Tree matching
  5. Other, e.g., spatial-join-index based, external plane-sweep, …

Nested Loop Example
• Algorithm:
    For each block B_fs of fire stations
      For each block B_h of houses
        Scan all pairs of fire stations in B_fs and houses in B_h
• Cost:
  – For the blue fire-station block, the inner loop fetches all 6 house (circle) blocks
  – For the red fire-station block, the inner loop fetches all 6 house (circle) blocks
  – # blocks for fire stations * # blocks for houses = 2 * 6 = 12
• Assume: 3 memory buffers (i.e., 1 for fire stations, 1 for houses, 1 for results)

Strategy 2: Nested Loop with a Spatial Index
• Outer loop: For each data block D of the first table
• Inner loop: Range query on the second table for overlapping blocks
  – E.g., houses within a distance <= 1
[Figure: house index Root_h with children X and Y over the house data blocks]

Ex.: Nested Loop with a Spatial Index

  Outer block (fire stations)   Inner blocks (houses)
  0                             2, 3, 5, 6
  1                             4, 6, 7

Cost of Nested Loop with Index
• Block 0: Root -> X -> 2, 3 and Root -> Y -> 5, 6
• Block 1: Root -> X -> 4 and Root -> Y -> 6, 7
• Index blocks: 2 + 2 = 4
• Data blocks: fire stations 2, houses 4 + 3 = 7, total 9

Strategy 3: Space Partitioning Join
• Example query: Pair rivers with the countries they pass through.
• Do we need to test the Nile river with countries outside Africa?
Space Partitioning Idea
• Rivers in Africa are tested with countries in Africa only
• Test pairs of objects within common spatial regions

Common Space Partitioning
• Query: For each fire station, find houses within distance <= 1
• Four partitions: P0, P1, P2, P3
• For each fire station, create an MOBR (minimum orthogonal bounding rectangle) with a buffer of length 1
• Q: Why is C in two partitions?

  Partition   Fire stations   Houses
  P0          A               a, b, c, e
  P1          B, C            d, f
  P2          D               g, h, j
  P3          C               i, k, l

Space Partition Join Algorithm
• Ex.: For each fire station, find houses within distance <= 1
• Filter: For each partition Pi
  – Bring the partition into main memory
  – Test all pairs of the MOBR M_fs of a fire station in Pi and all houses in Pi
• Refinement: Test the remaining pairs with exact geometry, e.g., distance <= 1
• Result after the filter phase:

  Partition   MOBR   Houses
  P0          A      a, b, c, e
  P1          B      f
  P1          C      d, f
  P2          D      h, j
  P3          C      i, k

Cost of Space Partitioning Join
• Total cost = 8 + 8 + (3 + 2 + 3 + 3) = 27
  – Read all data blocks: 8
  – Write the partitions back: 8
  – Compute for each partition: P0 3, P1 2, P2 3, P3 3
• About 3 "scans" of each table, if replication of objects across partitions is rare
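The filter and refinement phases of the space partitioning join might be sketched as follows. This is an in-memory illustration only: the uniform grid, the coordinates, and the names (`partition_join`, `cells_overlapping`, stations `S1`/`S2`, houses `h1` to `h5`) are all hypothetical and do not reproduce the geometry of the slides. A station is replicated into every grid cell that its buffered rectangle overlaps, while each house goes to exactly one cell.

```python
from collections import defaultdict
from math import hypot


def cells_overlapping(xmin, ymin, xmax, ymax, cell, n):
    """Grid cells (col, row) overlapped by a rectangle, clamped to an n x n grid."""
    c0, c1 = max(0, int(xmin // cell)), min(n - 1, int(xmax // cell))
    r0, r1 = max(0, int(ymin // cell)), min(n - 1, int(ymax // cell))
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]


def partition_join(stations, houses, d, cell, n):
    """Pairs (station, house) with Euclidean distance <= d."""
    parts = defaultdict(lambda: ([], []))  # cell -> (station names, house names)
    for name, (x, y) in stations.items():
        # Buffered rectangle of the station; may fall into several partitions.
        for key in cells_overlapping(x - d, y - d, x + d, y + d, cell, n):
            parts[key][0].append(name)
    for name, (x, y) in houses.items():
        parts[cells_overlapping(x, y, x, y, cell, n)[0]][1].append(name)

    result = set()
    for station_names, house_names in parts.values():  # one partition at a time
        for s in station_names:
            sx, sy = stations[s]
            for h in house_names:
                hx, hy = houses[h]
                # Filter: is the house inside the buffered rectangle?
                if abs(hx - sx) <= d and abs(hy - sy) <= d:
                    # Refinement: exact distance test.
                    if hypot(hx - sx, hy - sy) <= d:
                        result.add((s, h))
    return sorted(result)


# Hypothetical data on a 2 x 2 grid of 4 x 4 cells.
stations = {"S1": (1.0, 1.0), "S2": (3.5, 5.0)}
houses = {"h1": (1.5, 1.0), "h2": (6.0, 1.0), "h3": (4.2, 5.0),
          "h4": (3.0, 5.5), "h5": (4.4, 5.9)}
pairs = partition_join(stations, houses, d=1.0, cell=4.0, n=2)
print(pairs)
```

In this data, `S2` sits near a partition boundary, so it is replicated into two partitions, and `h5` passes the rectangle filter but is rejected by the exact distance test.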
Strategy 4: Tree Matching – Basic Idea
• Nested loop with an index
  – Inner loop uses range queries
  – Eliminated pairs of data blocks if their MOBRs are disjoint
• Space-partitioning join
  – Eliminated partition pairs ((P0, P1), …) since their MOBRs are disjoint
• Tree matching, if both tables are indexed:
  – Eliminate pairs of index/data blocks if their MOBRs are disjoint
  – Start at the root level and eliminate a child pair if it is irrelevant
  – Recursion on the remaining pairs

Inputs for Tree Matching: Both Tables Have Spatial Indexes
[Figure: index Root_fs over fire-station data blocks 0 and 1; index Root_h with children X and Y over house data blocks 2 through 7]

Example Spatial Join Query
• Query: For each fire station, find houses within distance <= 1
• MOBR buffer of size 1 to mimic the spatial join predicate, i.e., distance <= 1
• Root level: no child pair is eliminated
• Recursion on the remaining pairs, i.e., (X, 0), (Y, 0), (X, 1), (Y, 1)

Tree Matching Algorithm – Next Iteration
• Recursion on (X, 0) => remaining pairs: (2, 0), (3, 0)
• Recursion on (Y, 0) => remaining pairs: (5, 0), (6, 0)
• Recursion on (X, 1) => remaining pairs: (4, 1)
• Recursion on (Y, 1) => remaining pairs: (6, 1), (7, 1)

Cost of Tree Matching Algorithm
• Pairs examined:
  – (X, 0), (Y, 0), (X, 1), (Y, 1)
  – (2, 0), (3, 0), (5, 0), (6, 0), (4, 1), (6, 1), (7, 1)
• Blocks accessed:
  – Index blocks besides the roots: X, Y
  – Data blocks: all, with block 6 accessed twice
• Index blocks: 2 + 2 = 4
• Data blocks: fire stations 2, houses 4 + 3 = 7, total 9

Comparing Algorithms for Spatial Join
1. Default choice is nested loop
2. Neither table has a spatial index
   – Space partitioning, if the spatial-join predicate is selective
3. One table has a spatial index
   – Nested loop with index
4. Both tables have spatial tree indexes and the spatial-join predicate is selective
   – Tree matching

Learning Objectives: Nearest Neighbor Queries
• After this segment, students will be able to
  – Describe common strategies for Nearest Neighbor Queries
    • Two Phase (Upper bounding)
    • Single Phase (Pruning)

Nearest Neighbor Queries
• Example: Find the city closest to Chicago.
• Return one spatial object from the city data file C

Nearest Neighbor – Running Example
• Data: Each point represents the location of a restaurant.
• Query: Given the location of a user p, find the nearest restaurant. (If there is more than one nearest neighbor, return all results.)
• Result: The nearest neighbor of p is j
[Figure: restaurants a through l and the query point p]

Strategies for Nearest Neighbor Queries
• Two-phase approach
  – Fetch C's disk sector(s) containing the query point
  – M = minimum distance(query point, objects in the fetched leaf)
  – Test all cities within distance M of the query point (range query)
• Single-phase approach

Two Phase Strategy (with an R-tree)
• Phase 1
  – Find the index leaf containing the query point p: the red block
  – In the red leaf, points g and h are the closest points to p, d_B = 2
• Phase 2
  – Create a circle Circle_p whose center is p and whose radius is d_B
  – Create the MOBR of Circle_p: M_p
  – Range query with M_p, and test all points in M_p: Root -> Y -> brown block
  – Since dist(p, j) = 1.41 < d_B, point j is the nearest neighbor of p
• Cost:

            Index blocks   Data blocks
  Phase 1   1              1
  Phase 2   1              1

Two Phase Strategy – Exercise
• Ex.: Generalize the algorithm to the case when the query point is outside the bounding box of the root of the R-tree.
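As a rough illustration of the two-phase strategy (not a solution to the exercise), the sketch below treats the R-tree leaves as a flat list of `(mbr, points)` pairs. It assumes the query point lies inside some leaf MBR; the coordinates and the function name `two_phase_nn` are hypothetical and were chosen only so that the numbers resemble the slide (an upper bound of 2 from the first leaf, and a nearer point at distance 1.41 in another leaf).

```python
from math import hypot, inf


def dist(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])


def contains(mbr, p):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def two_phase_nn(leaves, p):
    """Return (names of nearest points, distance). Ties are all returned."""
    # Phase 1: fetch the leaf containing p and derive an upper bound d_b.
    start = next((pts for mbr, pts in leaves if contains(mbr, p)), None)
    if start is None:
        raise ValueError("sketch assumes p lies inside some leaf MBR")
    d_b = min(dist(p, q) for q in start.values())

    # Phase 2: range query with the bounding square of the circle (p, d_b).
    window = (p[0] - d_b, p[1] - d_b, p[0] + d_b, p[1] + d_b)
    best, best_d = [], inf
    for mbr, pts in leaves:
        if not intersects(mbr, window):
            continue  # this leaf cannot hold a point closer than d_b
        for name, q in pts.items():
            dq = dist(p, q)
            if dq < best_d:
                best, best_d = [name], dq
            elif dq == best_d:
                best.append(name)
    return best, best_d


# Hypothetical leaves: (xmin, ymin, xmax, ymax) and named points.
leaves = [
    ((4, 0, 6, 2), {"g": (4, 0), "h": (6, 2)}),
    ((5, 3, 8, 6), {"j": (5, 3), "k": (8, 6)}),
    ((0, 6, 2, 8), {"a": (0, 6), "b": (2, 8)}),
]
p = (4, 2)
nearest, distance = two_phase_nn(leaves, p)
print(nearest, round(distance, 2))
```

A real R-tree would reach the qualifying leaves through index nodes instead of scanning a list; the flat list keeps the sketch short.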
[Figure: R-tree (Root, X, Y) over the restaurants, with the user location p]

Strategies for Nearest Neighbor Queries
• Two-phase approach
• Single-phase approach
  – Recursive algorithm for the R-tree
  – Eliminate children dominated by some other children
  – Check the remaining data blocks for the nearest neighbor

One Phase Strategy with an R-tree
[Figure: R-tree with Root, index nodes X and Y, and data blocks 0 through 5]

  Level    Node   MinDist   MaxDist   Outcome
  First    X      3         7.47      Nothing eliminated
  First    Y      0         4.47
  Second   0      3.16      4.12
  Second   1      3.16      5.10
  Second   2      4.47                Node 2 eliminated
  Second   3      0         2.83
  Second   4      1.41      2.83
  Second   5      3.16      4         Node 5 eliminated

• Finally, check blocks 0, 1, 3, 4 for nearest neighbors

Comparing Algorithms for Nearest Neighbor Queries
• Data: Each point in this dataset represents the location of a restaurant.
• Query: Given the location of a user p, find the nearest restaurant. (If there is more than one nearest neighbor, return all results.)
• Result: The nearest neighbor of p is j

  Strategy             Cost in this example
  Two-phase approach   2 index blocks + 2 data blocks
  One-phase approach   2 index blocks + 4 data blocks
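The sibling-pruning step of the single-phase strategy could be sketched like this. The sketch assumes a tree of nested dictionaries (a hypothetical representation), uses the distance to the farthest corner of a rectangle as MaxDist, and eliminates a child when its MinDist exceeds the smallest MaxDist among its siblings. The tree, its coordinates, and the names `mindist`, `maxdist`, and `leaves_to_check` are made up for illustration and are not the tree from the slides.

```python
from math import hypot


def mindist(p, mbr):
    """Smallest possible distance from p to any point in the rectangle."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - p[0], 0, p[0] - xmax)
    dy = max(ymin - p[1], 0, p[1] - ymax)
    return hypot(dx, dy)


def maxdist(p, mbr):
    """Distance from p to the farthest corner of the rectangle."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(abs(p[0] - xmin), abs(p[0] - xmax))
    dy = max(abs(p[1] - ymin), abs(p[1] - ymax))
    return hypot(dx, dy)


def leaves_to_check(node, p):
    """Recursively drop children dominated by a sibling; return surviving leaves."""
    if "points" in node:  # data block
        return [node]
    bound = min(maxdist(p, c["mbr"]) for c in node["children"])
    kept = []
    for child in node["children"]:
        if mindist(p, child["mbr"]) <= bound:  # not dominated by a sibling
            kept.extend(leaves_to_check(child, p))
    return kept


# Hypothetical two-level tree.
L1 = {"name": "L1", "mbr": (1, 1, 2, 2), "points": {"a": (1, 1), "b": (2, 2)}}
L2 = {"name": "L2", "mbr": (3, 0, 4, 1), "points": {"e": (3, 0), "f": (4, 1)}}
L3 = {"name": "L3", "mbr": (0, 2, 1, 3), "points": {"c": (0, 2), "d": (1, 3)}}
L4 = {"name": "L4", "mbr": (5, 5, 6, 6), "points": {"g": (5, 5), "h": (6, 6)}}
X = {"name": "X", "mbr": (1, 0, 4, 2), "children": [L1, L2]}
Y = {"name": "Y", "mbr": (0, 2, 6, 6), "children": [L3, L4]}
root = {"name": "root", "mbr": (0, 0, 6, 6), "children": [X, Y]}

p = (0, 0)
kept = leaves_to_check(root, p)
kept_names = [leaf["name"] for leaf in kept]

# Final step: scan the surviving data blocks for the nearest point.
candidates = {n: q for leaf in kept for n, q in leaf["points"].items()}
nearest = min(candidates, key=lambda n: hypot(candidates[n][0] - p[0],
                                              candidates[n][1] - p[1]))
print(kept_names, nearest)
```

The bound is valid because every non-empty rectangle holds at least one point, and that point can be no farther from p than the rectangle's farthest corner.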
