Query Processing and Optimization

Learning Objectives
• After this segment, students will be able to
  – Describe common strategies for building blocks, e.g.,
    • Point Query
    • Range Query
    • Nearest Neighbor
    • Spatial Join

Scope
• Choice of strategies
  – Varies across software vendors and products
  – Representative strategies are listed here
  – Some strategies need special file structures or indices
• Description of strategies
  – Main message: there are multiple strategies for each building block!
  – Focus on concepts rather than implementation details

Learning Objectives
• After this segment, students will be able to
  – Describe common strategies for Point Queries
    • Linear Search
    • Binary Search with Z-order
    • Indexed Search with R-tree

Strategies for Point Queries
• Point Query
  – Given a location
  – Return a property (e.g., place name) of the location
• List of strategies
  – Linear Search
  – Binary Search
    • If records are ordered by a space-filling curve
  – Index Search
    • If a spatial index is available

An Example Dataset
• Data: 14 points, each with a type (triangle or star)
• Query: Return the type at location (x, y) = (10, 11)
• Candidate Storage Methods:
  – 7 data blocks, each with 2 points
[Figure: the 14 points with the query point marked]

Candidate Storage & Indexing Methods
• A. Unordered
• B. Z-order (Y-major), with sorting number (Z-order index)
• C. R-tree (primary index)
[Figure: the three layouts side by side; the R-tree has a root and nodes a to i]

Linear Search for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• Query (marked): Return the type of crime at the location (x, y) = (2, 3)
• Storage Method: Unordered
• Cost for linear search on this dataset: 7
  – Linear search on data blocks 0 .. 6

Binary Search for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• Query (marked): Return the type of crime at the location (x, y) = (2, 3)
• Storage Method: Z-order (Y-major)
• Cost for binary search on this dataset: 3
  – Binary search on data blocks 0 .. 6
    • (0+6) / 2 = 3: access block 3
    • ceil((3+6) / 2) = 5: access block 5
    • ceil((5+6) / 2) = 6: access block 6
  – 3 blocks (i.e., green, cyan, yellow) accessed

Search for Point Queries Using R-tree
• Data: 14 points stored in data blocks with 2 points in each block
• Query (marked): Return the type of crime at the location (x, y) = (2, 3)
• Storage Method: R-tree (primary index), root cached in main memory
• I/O cost in this example
  – Index blocks: 1
  – Data blocks: 1
[Figure: R-tree with root and nodes a to i; index block b and data block g are highlighted]

Comparing 3 Strategies for Point Queries
• Data: 14 points stored in data blocks with 2 points in each block
• Query (marked): Return the type of crime at the location (x, y) = (2, 3)

  Storage Method    Cost in this example
  Linear Search     7
  Binary Search     3
  Index Search      Index blocks 1, data blocks 1

Learning Objectives
• After this segment, students will be able to
  – Describe common strategies for Range Queries
    • Linear Search
    • Multiple Binary Searches (+ Scan) with Z-order
    • Indexed Search with R-tree

Range Queries
• Range Query
  – Returns several objects within a spatial region from a table
• Example
  – List all countries crossed by the river Amazon.

Strategies for Range Queries
1. Linear Search
   – Scan all disk blocks of the data file
2. Binary Search
   – If records are ordered using a space-filling curve (say, Z-order)
   – Decompose the query region into disjoint Z-order intervals
   – For each Z-interval,
     • do a binary search to get the lowest Z-order value within the Z-interval
     • then scan forward till the end of the Z-interval
3. Index Search
   – If an index is available on the spatial location of data objects,
   – then use the range-query operation on the index

Range Query – Running Example
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
• Storage Methods:
  – 7 data blocks with 2 points each
  – Unordered, Z-ordered, R-tree

Range Query – Candidate Storage Methods
• Unordered
• Z-order (Y-major), with sorting order (Z-order index)
• R-tree (primary index)
[Figure: the three layouts side by side; the R-tree has a root and nodes a to i]

Linear Search for Range Queries
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
• Storage Method: Unordered
• Cost for range query on unordered data: 7
  – Linear search on data blocks 0 .. 6

Binary Search for Range Queries
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
• Storage Method: Z-order
• One Z-interval 12 .. 15 => search for 12, then scan forward
• Binary search on data blocks 0 .. 6
  – (0+6) / 2 = 3: access block 3
  – ceil((3+6) / 2) = 5: access block 5
  – Found Z = 12, scan forward till Z = 15
• 3 blocks (i.e., green, cyan, yellow; blocks 3, 5, 6) accessed

Range Query with Two Z-intervals
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): ( 2 <= x <= 3 ) and ( 1 <= y <= 2 )
• Two Z-intervals: [5 .. 6] and [12 .. 13]
• One binary search (followed by a scan) for each Z-interval!
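The binary-search strategy for range queries (decompose into Z-intervals, binary search for the start of each interval, then scan forward) can be sketched in Python as follows. This is a minimal in-memory sketch, not the slides' implementation: the function names, the 4 x 4 grid, and the sample records are hypothetical, and `z_value` assumes that "Y-major" means standard bit interleaving with the y bit first, so the numbers it produces need not match the sorting numbers drawn in the figures.

```python
from bisect import bisect_left

def z_value(x, y, bits=2):
    """Interleave the bits of y and x, y bit first (assumed meaning of Y-major)."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z

def z_intervals(x_lo, x_hi, y_lo, y_hi, bits=2):
    """Decompose a query rectangle into disjoint runs of consecutive z-values."""
    zs = sorted(z_value(x, y, bits)
                for x in range(x_lo, x_hi + 1)
                for y in range(y_lo, y_hi + 1))
    runs = []
    for z in zs:
        if runs and z == runs[-1][1] + 1:
            runs[-1][1] = z          # extend the current run
        else:
            runs.append([z, z])      # start a new run
    return [tuple(r) for r in runs]

def range_query(records, x_lo, x_hi, y_lo, y_hi, bits=2):
    """records: list of (z, payload) sorted by z. Returns payloads inside the query."""
    keys = [z for z, _ in records]
    hits = []
    for lo, hi in z_intervals(x_lo, x_hi, y_lo, y_hi, bits):
        i = bisect_left(keys, lo)                 # binary search for the interval start
        while i < len(keys) and keys[i] <= hi:    # scan forward to the interval end
            hits.append(records[i][1])
            i += 1
    return hits

# Hypothetical data: one record per grid cell, except two empty cells (14 records).
records = sorted((z_value(x, y), (x, y))
                 for x in range(4) for y in range(4)
                 if (x, y) not in {(0, 0), (3, 3)})

print(z_intervals(2, 3, 2, 3))            # [(12, 15)]
print(range_query(records, 2, 3, 2, 3))   # [(2, 2), (3, 2), (2, 3)]
```

A real system would run the binary search over disk blocks rather than a Python list, which is where the block-access counts on the slides come from.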
• 3 blocks (i.e., green, purple, cyan) accessed
• Binary searches to find [(6), (7)] and [(12), (13)]
  – First search (blocks 3, 2)
    • (0+6) / 2 = 3
    • Found Z = 5, scan till Z = 6
  – Second search (blocks 3, 5)
    • (0+6) / 2 = 3
    • ceil((3+6) / 2) = 5
    • Found Z = 10, scan till Z = 11

Search for Range Queries Using R-tree Index
• Data: 14 points stored in data blocks with 2 points each
• Query (brown box): ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )
• Candidate Storage Method: R-tree (primary index)
• Cost in this example
  – Index blocks: 1
  – Data blocks: 2
[Figure: R-tree with root and nodes a to i; index block b and data blocks g, i are highlighted]

Comparing Algorithms for Range Queries
• Data: 14 points stored in 7 data blocks with 2 points each
• Query (brown box): ( 2 <= x <= 3 ) and ( 2 <= y <= 3 )

  Storage Method     Cost in this example
  Linear Search      7
  Binary Searches    3
  Index Search       Index blocks 1, data blocks 2

Learning Objectives
• After this segment, students will be able to
  – Describe common strategies for Spatial Join Queries
    • Nested Loop
    • Nested Loop with one Spatial Index
    • Space Partitioning
    • Tree Matching

Spatial Join Example
• Pair rivers with the countries they flow through.
• Return pairs across the “Rivers” and “Countries” tables satisfying the “overlap” predicate

Spatial Join – Example Data
• Query: For each fire station, find all the houses within a distance <= 1
[Figure: fire station map (A to D) overlaid with the house map (a to l)]

  Fire station    House
  A               a
  B               f
  D               h
  D               j

Storage Structure
• 2 blocks for fire stations
• 6 blocks for houses
[Figure: data blocks 0 and 1 hold the fire stations A to D; data blocks 2 to 7 hold the houses a to l]

Strategy 1: Nested Loop
• List of strategies
  1. Nested loop
     • Test all possible pairs for the spatial predicate
     • Outer loop: bring data blocks of the first table into memory
     • Inner loop: scan the second table
  2. Nested loop with a spatial index
  3. Space partitioning
  4. Tree matching
  5. Other, e.g., spatial-join-index based, external plane-sweep, …

Nested Loop Example
• Algorithm:
    For each block Bfs of fire stations
      For each block Bh of houses
        Scan all pairs of fire stations in Bfs and houses in Bh
• Cost:
  – For the blue block, the inner loop fetches all 6 (circle) blocks
  – For the red block, the inner loop fetches all 6 (circle) blocks
  – # blocks for fire stations * # blocks for houses = 2 * 6 = 12
• Assume: 3 memory buffers (i.e., 1 for fire stations, 1 for houses, 1 for results)

Refining Nested Loop with a Spatial Index
[Figure: R-tree on houses with root Rooth and index nodes X and Y over house data blocks 2 to 7]

Strategy 2. Nested Loop with Spatial Index
• Outer loop: For each data block D of the first table
• Inner loop: Range query on the second table for overlapping blocks
  – E.g., houses within a distance <= 1

Ex.: Nested Loop with a Spatial Index

  Outer block (fire stations)    Inner blocks (houses)
  0                              2, 3, 5, 6
  1                              4, 6, 7

Cost of Nested Loop with Index
• Block 0: Root -> X -> 2, 3 -> Y -> 5, 6
• Block 1: Root -> X -> 4 -> Y -> 6, 7

  Blocks accessed    Cost
  Index blocks       2 + 2 = 4
  Data blocks        Fire stations 2, houses 4 + 3 = 7, total 9

Strategy 3. Space Partitioning Join
• Example Query: Pair rivers with the countries they pass through.
  – Do we need to test the Nile river with countries outside Africa?
• Space Partitioning Idea
  – Rivers in Africa are tested with countries in Africa only
  – Test pairs of objects within common spatial regions

Common Space Partitioning
• Query: For each fire station, find houses within distance <= 1
• Four partitions: P0, P1, P2, P3
• For each fire station, create an MOBR with length of 1
• Q? Why is C in two partitions?

  Partition    Fire stations    Houses
  P0           A                a, b, c, e
  P1           B, C             d, f
  P2           D                g, h, j
  P3           C                i, k, l

Space Partition Join Algorithm
• Ex.: For each fire station, find houses within distance <= 1
• Filter: For each partition Pi
  – Bring the partition into main memory
  – Test all pairs of the MOBR Mfs of each fire station in Pi and all houses in Pi
• Refinement: Test the remaining pairs with exact geometry, e.g., distance <= 1

Result after Filter Phase

  Partition    MOBR    Houses
  P0           A       a, b, c, e
  P1           B       f
  P1           C       d, f
  P2           D       h, j
  P3           C       i, k

Cost of Space Partitioning Join
• Total cost = 8 + 8 + (3 + 2 + 3 + 3) = 27

  Step                          Cost
  Read all data blocks          8
  Write partitioning back       8
  Compute for each partition    P0: 3, P1: 2, P2: 3, P3: 3

• About 3 “scans” of each table, if replication of objects across partitions is rare.
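The filter and refinement steps of the space partitioning join can be sketched in Python as follows. This is a simplified in-memory sketch under stated assumptions, not the slides' implementation: the function names, the fixed 2 x 2 split, and the coordinates are hypothetical, and the MOBR of a fire station is taken to be the square of half-width d around it. It also shows why an object such as C can land in two partitions: its MOBR straddles a split line.

```python
from math import hypot

def quadrant(x, y, split_x, split_y):
    """Label one of four partitions by the side of each split line (x, y) lies on."""
    return (x >= split_x, y >= split_y)

def partition_join(stations, houses, d, split_x, split_y):
    """stations, houses: dict name -> (x, y). Returns sorted pairs within distance d."""
    parts = {}  # quadrant -> (station names, house names)
    for name, (x, y) in houses.items():
        parts.setdefault(quadrant(x, y, split_x, split_y), ([], []))[1].append(name)
    for name, (x, y) in stations.items():
        # MOBR = square [x-d, x+d] x [y-d, y+d]. With a 2 x 2 split, the quadrants
        # of its four corners are exactly the quadrants it overlaps, so a station
        # near a split line is replicated into several partitions.
        overlapped = {quadrant(cx, cy, split_x, split_y)
                      for cx in (x - d, x + d) for cy in (y - d, y + d)}
        for q in overlapped:
            parts.setdefault(q, ([], []))[0].append(name)

    result = []  # each house lives in one partition, so no pair is found twice
    for station_names, house_names in parts.values():
        for s in station_names:
            sx, sy = stations[s]
            for h in house_names:
                hx, hy = houses[h]
                in_mobr = abs(hx - sx) <= d and abs(hy - sy) <= d   # filter
                if in_mobr and hypot(hx - sx, hy - sy) <= d:        # refinement
                    result.append((s, h))
    return sorted(result)

# Hypothetical coordinates (not taken from the slide figure).
stations = {"A": (1, 1), "B": (5, 5), "C": (3, 1)}
houses = {"a": (1, 2), "b": (3, 3), "c": (5, 4), "d": (2, 1), "e": (4, 1)}
print(partition_join(stations, houses, d=1, split_x=3, split_y=3))
```

In this sample, station C is replicated into two quadrants and finds one house in each of them, while pairs in different partitions are never tested.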

Strategy 4: Tree Matching – Basic Idea
• Nested Loop with an Index
  – Inner loop range queries
  – Eliminated pairs of data blocks if disjoint MOBRs
• Space-partitioning join
  – Eliminated partition pairs ((P0, P1), …) since disjoint MOBRs
• Tree Matching, if both tables are indexed:
  – Eliminate pairs of index/data blocks if disjoint MOBRs
  – Start at the root level: eliminate a child pair if irrelevant
  – Recursion on remaining pairs

Inputs for Tree Matching: Both Tables Have Spatial Indexes
[Figure: R-tree Rootfs over fire-station data blocks 0 and 1; R-tree Rooth with index nodes X and Y over house data blocks 2 to 7]

Example Spatial Join Query
• Query: For each fire station, find houses within distance <= 1
• MOBR buffer of size 1 to mimic the spatial join predicate, i.e., distance <= 1
• Root level: no child pair is eliminated
• Recursion on remaining pairs, i.e., (X, 0), (Y, 0), (X, 1), (Y, 1)

Tree Matching Algorithm – Next Iteration
• Recursion on (X, 0) => remaining pairs: (2, 0), (3, 0)
• Recursion on (Y, 0) => remaining pairs: (5, 0), (6, 0)
• Recursion on (X, 1) => remaining pairs: (4, 1)
• Recursion on (Y, 1) => remaining pairs: (6, 1), (7, 1)
[Figure: MOBRs of fire-station blocks 0 and 1 overlaid on house data blocks 2 to 7]

Cost of Tree Matching Algorithm
• Pairs examined:
  – (X, 0), (Y, 0), (X, 1), (Y, 1)
  – (2, 0), (3, 0), (5, 0), (6, 0), (4, 1), (6, 1), (7, 1)
• Blocks accessed
  – Index blocks besides roots: X, Y
  – Data blocks: all, with block 6 accessed twice

  Blocks accessed    Cost
  Index blocks       2 + 2 = 4
  Data blocks        Fire stations 2, houses 4 + 3 = 7, total 9

Comparing Algorithms for Spatial Join
1. Default choice is nested loop
2. Neither table has a spatial index
   – Space partitioning, if the spatial-join predicate is selective
3. One table has a spatial index
   – Nested loop with index
4. Both tables have spatial tree indexes and the spatial-join predicate is selective
   – Tree matching

Learning Objectives
• After this segment, students will be able to
  – Describe common strategies for Nearest Neighbor Queries
    • Two Phase (Upper bounding)
    • Single Phase (Pruning)

Nearest Neighbor Queries
• Example
  – Find the city closest to Chicago.
  – Return one spatial object from city data file C

Nearest Neighbor – Running Examples
• Each point represents the location of a restaurant.
• Query: Given the location of a user p, find the nearest restaurant. (If there is more than one nearest neighbor, return all results.)
• Result: Nearest neighbor of p is j
[Figure: restaurants a to l and the query point p (user)]

Strategies for Nearest Neighbor Queries
• Two phase approach
  – Fetch C’s disk sector(s) containing the query point
  – M = minimum distance(query point, objects in fetched leaf)
  – Test all cities within distance M of the query point (Range Query)
• Single phase approach

Two Phase Strategy (with an R-tree)
• Find the index leaf containing the query point p: block red
• In the red leaf, points g and h are the closest points to p, dB = 2
• Create a circle Circlep whose center is p and whose radius = dB
• Create the MOBR of Circlep: Mp
• Range query: Mp, and test all points in Mp
  – Root -> Y -> Block brown
• Since dist(p, j) = 1.41 < dB, point j is the nearest neighbor of p
[Figure: R-tree with root and index nodes X and Y over restaurants a to l, with the user p]

  Cost       Index blocks    Data blocks
  Phase 1    1               1
  Phase 2    1               1

Two Phase Strategy – Exercise
• Ex.: Generalize the algorithm to the case when the query point is outside the bounding box of the root of the R-tree.
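The two phase strategy described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the leaves are a flat list of (rectangle, points) pairs instead of a real R-tree, the names are hypothetical, and the query point is assumed to fall inside some leaf rectangle (the case raised in the exercise is deliberately not handled).

```python
from math import hypot, isclose

def two_phase_nn(leaves, p):
    """leaves: list of ((x1, y1, x2, y2), points). Returns all nearest points to p."""
    px, py = p

    def dist(pt):
        return hypot(pt[0] - px, pt[1] - py)

    # Phase 1: fetch the leaf whose rectangle contains p; its closest point gives
    # an upper bound d_b on the nearest-neighbor distance.
    # (Raises StopIteration if no leaf contains p; that case is not handled here.)
    first = next(pts for (x1, y1, x2, y2), pts in leaves
                 if x1 <= px <= x2 and y1 <= py <= y2)
    d_b = min(dist(pt) for pt in first)

    # Phase 2: range query with the bounding square of the circle (p, d_b),
    # then test every point in the leaves that the square overlaps.
    candidates = []
    for (x1, y1, x2, y2), pts in leaves:
        if x1 <= px + d_b and x2 >= px - d_b and y1 <= py + d_b and y2 >= py - d_b:
            candidates.extend(pts)
    best = min(dist(pt) for pt in candidates)
    return [pt for pt in candidates if isclose(dist(pt), best)]

# Hypothetical leaves: p lies in the first leaf, but its nearest neighbor does not.
leaves = [((0, 0, 2, 2), [(0, 0), (2, 2)]),
          ((3, 0, 5, 2), [(3, 1), (5, 0)])]
print(two_phase_nn(leaves, (2, 0.5)))
```

As in the slide example, the leaf that contains the query point only supplies the bound; the actual nearest neighbor may sit in a neighboring leaf that the phase 2 range query reaches.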
[Figure: R-tree with root and index nodes X and Y over restaurants a to l, with the user p]

Strategies for Nearest Neighbor Queries
• Two phase approach
• Single phase approach
  – Recursive algorithm for R-tree
  – Eliminate children dominated by some other children
  – Check the remaining data blocks for the nearest neighbor

One Phase Strategy with an R-tree
[Figure: R-tree with root and index nodes X and Y over data blocks 0 to 5, with the query point p]

  Level     Node    MinDist    MaxDist    Outcome
  First     X       3          7.47
  First     Y       0          4.47       Nothing eliminated
  Second    0       3.16       4.12
  Second    1       3.16       5.10
  Second    2       4.47                  Node 2 eliminated
  Second    3       0          2.83
  Second    4       1.41       2.83
  Second    5       3.16                  Node 5 eliminated

• Finally, check blocks 0, 1, 3, 4 for nearest neighbors
• Cost: index blocks 2, data blocks 4

Comparing Algorithms for Nearest Neighbor Queries
• Data: Each point in this dataset represents the location of a restaurant.
• Query: Given the location of a user p, find the nearest restaurant. (If there is more than one nearest neighbor, return all results.)
• Result: Nearest neighbor of p is j

  Storage Method        Cost in this example
  Two phase approach    Index blocks 2, data blocks 2
  One phase approach    Index blocks 2, data blocks 4
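The single phase pruning idea (eliminate children dominated by a sibling, then check the remaining data blocks) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the tree encoding, the function names, and the sample rectangles are hypothetical, and MaxDist is taken to be the distance to the farthest corner of a rectangle, which may differ from the definition behind the numbers on the slide.

```python
from math import hypot, isclose

def min_dist(p, mbr):
    """Smallest possible distance from p to any point inside the rectangle."""
    x1, y1, x2, y2 = mbr
    dx = max(x1 - p[0], 0, p[0] - x2)
    dy = max(y1 - p[1], 0, p[1] - y2)
    return hypot(dx, dy)

def max_dist(p, mbr):
    """Distance from p to the farthest corner of the rectangle (assumed definition)."""
    x1, y1, x2, y2 = mbr
    dx = max(abs(p[0] - x1), abs(p[0] - x2))
    dy = max(abs(p[1] - y1), abs(p[1] - y2))
    return hypot(dx, dy)

def prune(p, children):
    """Drop children whose MinDist exceeds the smallest MaxDist among the siblings."""
    bound = min(max_dist(p, mbr) for mbr, _ in children)
    return [(mbr, child) for mbr, child in children if min_dist(p, mbr) <= bound]

def collect(p, node):
    """node is ("leaf", points) or ("inner", [(mbr, child), ...])."""
    kind, payload = node
    if kind == "leaf":
        return list(payload)
    points = []
    for _, child in prune(p, payload):
        points.extend(collect(p, child))
    return points

def nearest(p, root):
    points = collect(p, root)
    best = min(hypot(x - p[0], y - p[1]) for x, y in points)
    return [(x, y) for x, y in points if isclose(hypot(x - p[0], y - p[1]), best)]

# Hypothetical two-level tree with three non-empty leaves.
leaf0 = ("leaf", [(0, 0), (0.5, 0.5)])
leaf1 = ("leaf", [(8, 8), (9, 9)])
leaf2 = ("leaf", [(4, 4), (6, 7)])
x_children = [((0, 0, 0.5, 0.5), leaf0), ((8, 8, 9, 9), leaf1)]
y_children = [((4, 4, 6, 7), leaf2)]
root = ("inner", [((0, 0, 9, 9), ("inner", x_children)),
                  ((4, 4, 6, 7), ("inner", y_children))])
print(nearest((5, 5), root))   # [(4, 4)]
```

Because this sketch compares only siblings, it still visits leaf1 even though leaf2 already guarantees a closer point; carrying the best bound across subtrees would prune more, at the cost of a slightly more involved traversal.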