Chapter 5: Query Processing and Optimization
Group 4: Nipun Garg, Surabhi Mithal
http://www-users.cs.umn.edu/~smithal/

Chapter Organization

Old Organization
5.1 Evaluation of Spatial Operations
5.2 Query Optimization
5.3 Analysis of Spatial Index Structures
5.4 Distributed Spatial Database Systems
5.5 Parallel Spatial Database Systems
5.6 Summary

New Organization
5.1 Evaluation of Spatial Operations
  - Parallel spatial joins
  - Top k spatial joins
5.2 Query Optimization
5.3 Analysis of Spatial Index Structures
5.4 Distributed Spatial Database Systems
5.5 Parallel Spatial Database Systems
5.6 Introduction to query models
5.7 Spatial Query types
  • Reverse nearest neighbour queries (RNN)
  • Skyline queries
5.8 Trends: Spatial Query Evaluation on Hadoop
5.9 Summary

New Learning Objectives

Learning Objectives (LO)
LO2: Learn about alternative algorithms to process spatial queries
LO6: Introduction to query models
LO7: Understanding new spatial query types
  • LO7.1: Understanding concept of RNN queries
  • LO7.2: Understanding concept of skyline queries
LO8: Trends: Spatial queries on Hadoop MapReduce

Mapping Sections to learning objectives
LO2 - 5.1.6
LO6 - 5.7
LO7 - 5.8
LO8 - 5.9

Parallel spatial joins

Concept
In a parallel architecture, work is distributed amongst several processors. For a spatial join, the work can be distributed in both the filtering and refinement stages.

Top k spatial joins

Concept
A spatial join finds all pairs of objects satisfying a given relation between the objects. Given two data sets A and B, the top-k spatial join retrieves the k objects in data set A or B that intersect the maximum number of objects from the other data set.

Example: Parallel spatial join

Steps
• Task creation: creating a set of tasks to be executed in parallel.
• Task assignment
• Task execution
Src: Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger, "Parallel Processing of Spatial Joins Using R-trees"

LO6: Introduction to query models

Concept
Overview of query models for Oracle Spatial & ArcSDE

Oracle Spatial provides a SQL schema and functions that facilitate the storage, retrieval, update, and query of collections of spatial features in an Oracle database.

Oracle Spatial uses a two-tier query model to resolve spatial queries and spatial joins. It implements the filter-refine paradigm; the two operations are referred to as the primary and secondary filter operations.
• The primary filter permits fast selection of candidate records to pass along to the secondary filter.
• The secondary filter is expensive, but it yields an accurate answer to a spatial query.

Example
• The primary filter checks whether the MBRs (minimum bounding rectangles) of the candidate objects interact, not whether the objects themselves interact.
• The secondary filter ensures that only candidate objects that actually interact are selected.
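The two-tier model above can be illustrated with a minimal Python sketch. This is not Oracle Spatial code: the `Circle` class, the function names, and the sample coordinates are hypothetical, chosen only because a disc has an exact intersection test that is easy to compare against its MBR test.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Circle:
    """Hypothetical spatial object: a disc with centre (x, y) and radius r."""
    x: float
    y: float
    r: float

    def mbr(self):
        # Minimum bounding rectangle as (xmin, ymin, xmax, ymax).
        return (self.x - self.r, self.y - self.r, self.x + self.r, self.y + self.r)


def mbrs_interact(a, b):
    """Primary filter: cheap test on the bounding rectangles only."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def circles_interact(a, b):
    """Secondary filter: exact test on the objects themselves."""
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= (a.r + b.r) ** 2


def filter_refine_join(left, right):
    """Return (candidate pairs after the primary filter, exact result pairs)."""
    candidates = [(a, b) for a, b in product(left, right)
                  if mbrs_interact(a.mbr(), b.mbr())]
    results = [(a, b) for a, b in candidates if circles_interact(a, b)]
    return candidates, results


# Made-up sample data: one object on the left, three on the right.
left = [Circle(0, 0, 1)]
right = [Circle(1.8, 1.8, 1), Circle(1.5, 0, 1), Circle(5, 5, 1)]
candidates, results = filter_refine_join(left, right)
print(len(candidates), len(results))
```

In this sample the disc at (1.8, 1.8) passes the primary filter because its MBR overlaps, but the secondary filter rejects it because the discs themselves do not touch.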
LO7.1: Understanding concept of RNN queries

Reverse Nearest Neighbor Queries
Concept: focuses on inverse relations among points. The reverse nearest neighbors of a point q are the points whose nearest neighbor is q.
Example: 5 data points. What are the RNNs of 1?
[Figure: five data points labelled 1 to 5]

Example: Business Impact Analysis

Algorithm
Step 1: For each point p ∈ S, determine the distance to the nearest neighbor of p in S, denoted N(p):
  N(p) = min_{q ∈ S − {p}} d(p, q)
For each p ∈ S, generate a circle (p, N(p)) with center p and radius N(p).
Step 2: For any query q (for example, a Target store), determine all the circles (p, N(p)) that contain q and return their centers p.

LO7.2: Understanding concept of skyline queries

Example
You have to attend a conference and are trying to find a good hotel for your stay. The goal is to optimize the hotel search so that both the distance from the conference centre and the price of the booking are low.

Concept
Domination: a point A dominates another point B if and only if the coordinate of A on every axis is not larger than the corresponding coordinate of B (and, in the usual definition, strictly smaller on at least one axis).
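The domination test can be sketched in a few lines of Python. The sketch assumes the usual convention that the dominating point must also be strictly smaller on at least one axis; the hotel tuples are made-up (distance, price) values, and the quadratic `skyline` helper simply keeps the points that no other point dominates.

```python
def dominates(a, b):
    """True if a is no larger than b on every axis and strictly smaller on one."""
    no_larger = all(x <= y for x, y in zip(a, b))
    strictly_smaller = any(x < y for x, y in zip(a, b))
    return no_larger and strictly_smaller


def skyline(points):
    """Naive O(n^2) filter: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (distance from conference center, price).
hotels = [(1, 9), (3, 4), (8, 1), (4, 6), (9, 5)]
print(skyline(hotels))  # [(1, 9), (3, 4), (8, 1)]
```

Here (4, 6) and (9, 5) are dropped because (3, 4) is both closer and cheaper than each of them.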
Example
Given a set of points, the skyline query returns the set of points (referred to as the skyline points) such that no skyline point is dominated by any other point in the dataset.

Example contd.
[Figure: hotels h1 to h13 plotted by price (y-axis, 0 to 12) against distance from conference center (x-axis, 0 to 12), with regions S1 to S4]

Result
[Figure: the same axes showing only h1, h2 and h4]

Spatial Query Evaluation on Hadoop

Hadoop
• HDFS: Hadoop Distributed File System
• MapReduce: programming paradigm

Parallel Databases vs. MapReduce
Parallel DBMS or MapReduce (Hadoop)?

Parallel DBMS
• Structured data
• Expensive to set up
• Complex analytics not easy

Hadoop
• Semi-structured data
• Can be done with low budget
• Complex analytics easier

Conclusion: Hadoop/MapReduce cannot replace DBMS
Combination of MapReduce and SQL: Aster Data

A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden & M. Stonebraker, "A comparison of approaches to large-scale data analysis," SIGMOD '09

Spatial Query Evaluation

Map Stage
1) Homogenize data
2) Map to tiles
3) Merge tiles into buckets

Reduce Stage
1) Filter to find overlapping MBRs
2) Refine results
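The map and reduce stages listed above can be simulated in plain Python, without Hadoop, to show how the data flows. Everything in this sketch is an assumption made for illustration: the `TILE` size, the `(id, mbr)` record layout standing in for homogenized data, one bucket per tile, and a `refine` callback that stands in for the exact geometry test.

```python
from collections import defaultdict
from itertools import product

TILE = 10.0  # hypothetical tile side length


def tiles_for(mbr):
    """Yield every (tx, ty) grid tile that the MBR (x1, y1, x2, y2) touches."""
    x1, y1, x2, y2 = mbr
    for tx in range(int(x1 // TILE), int(x2 // TILE) + 1):
        for ty in range(int(y1 // TILE), int(y2 // TILE) + 1):
            yield (tx, ty)


def map_stage(dataset, records):
    """Map: emit (tile, (dataset, id, mbr)) for each tile a record touches."""
    for rid, mbr in records:
        for tile in tiles_for(mbr):
            yield tile, (dataset, rid, mbr)


def shuffle(pairs):
    """Group mapped values by tile; here each tile is its own bucket."""
    buckets = defaultdict(list)
    for tile, value in pairs:
        buckets[tile].append(value)
    return buckets


def mbrs_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def reduce_stage(bucket, refine):
    """Reduce: filter on overlapping MBRs, then refine with the exact test."""
    left = [v for v in bucket if v[0] == "A"]
    right = [v for v in bucket if v[0] == "B"]
    for (_, ida, mbr_a), (_, idb, mbr_b) in product(left, right):
        if mbrs_overlap(mbr_a, mbr_b) and refine(ida, idb):
            yield (ida, idb)


def spatial_join(a_records, b_records, refine=lambda ida, idb: True):
    pairs = list(map_stage("A", a_records)) + list(map_stage("B", b_records))
    result = set()  # a pair sharing several tiles is reported only once
    for bucket in shuffle(pairs).values():
        result.update(reduce_stage(bucket, refine))
    return result


# Made-up records: (id, (xmin, ymin, xmax, ymax)).
A = [("a1", (0, 0, 12, 5)), ("a2", (30, 30, 35, 35))]
B = [("b1", (8, 2, 15, 8)), ("b2", (50, 50, 55, 55))]
print(spatial_join(A, B))  # {('a1', 'b1')}
```

Because an object that spans several tiles is emitted once per tile, the same pair can surface in more than one bucket; collecting results in a set is one simple way to remove those duplicates.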