Introduction to Spatial Databases

Chapter 5 : Query Processing and Optimization
Group 4:
Nipun Garg, Surabhi Mithal
http://www-users.cs.umn.edu/~smithal/
Chapter Organization
Old Organization
5.1 Evaluation of Spatial Operations
5.2 Query Optimization
5.3 Analysis of Spatial Index Structures
5.4 Distributed Spatial Database Systems
5.5 Parallel Spatial Database Systems
5.6 Summary
New Organization
5.1 Evaluation of Spatial Operations
- Parallel spatial joins
- Top k spatial joins
5.2 Query Optimization
5.3 Analysis of Spatial Index Structures
5.4 Distributed Spatial Database Systems
5.5 Parallel Spatial Database Systems
5.6 Introduction to query models
5.7 Spatial Query types
• Reverse nearest neighbour queries (RNN)
• Skyline queries
5.8 Trends: Spatial Query Evaluation on Hadoop
5.9 Summary
New Learning Objectives
Learning Objectives (LO)
LO2 : Learn about alternative algorithms to process spatial queries
LO6: Introduction to query models
LO7: Understanding new spatial query types
• LO7.1 : Understanding concept of RNN queries
• LO7.2 : Understanding concept of skyline queries
LO8 : Trends : Spatial queries on Hadoop Map Reduce
Mapping Sections to learning objectives
LO2 - 5.1.6
LO6 - 5.7
LO7 - 5.8
LO8 - 5.9
Parallel spatial joins
Concept
In a parallel architecture, work is distributed amongst several
processors.
For a spatial join, the work can be distributed in both the filtering
and refinement stages.
Top k spatial joins
Concept
A spatial join finds all pairs of objects that satisfy a given spatial relation between the objects.
Given two data sets A and B, the top-k spatial join retrieves the k objects in data set A or B that intersect the maximum number of objects from the other data set.
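The top-k spatial join described above can be illustrated with a brute-force sketch. It assumes that objects are axis-aligned rectangles, that "intersect" means rectangle overlap, and that names are unique across both data sets; the `Rect` type, the function names, and the sample data are hypothetical and not taken from the slides.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical object type: a named axis-aligned rectangle."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    # Two rectangles overlap when their extents overlap on both axes.
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def top_k_spatial_join(A, B, k):
    """Return the k objects (from A or B) that intersect the most
    objects of the other data set, as (name, count) pairs."""
    counts = {obj.name: 0 for obj in list(A) + list(B)}
    for a in A:
        for b in B:
            if intersects(a, b):
                counts[a.name] += 1
                counts[b.name] += 1
    # Sort by descending count; break ties by name for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


A = [Rect("a1", 0, 0, 10, 10), Rect("a2", 20, 20, 22, 22)]
B = [Rect("b1", 1, 1, 2, 2), Rect("b2", 3, 3, 4, 4), Rect("b3", 9, 9, 21, 21)]
print(top_k_spatial_join(A, B, 2))  # [('a1', 3), ('b3', 2)]
```

This nested loop compares every pair, so it only illustrates the definition; a practical evaluation would use a spatial index to avoid most of those comparisons.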
Example – Parallel spatial join
Steps
• Task creation - Creating a set of tasks to be executed in parallel.
• Task assignment
• Task execution
Src: T. Brinkhoff, H.-P. Kriegel, B. Seeger, "Parallel Processing of Spatial Joins Using R-trees"
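The three steps can be sketched as follows. This is a simplified illustration and not the R-tree based method of the cited paper: tasks are created by cutting one input into fixed-size chunks, a thread pool stands in for the processors, and only the MBR filter step is performed. All function names, parameters, and data are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def mbr_intersects(a, b):
    # Each MBR is a tuple (xmin, ymin, xmax, ymax).
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def create_tasks(R, S, chunk_size):
    """Task creation: each task joins one chunk of R with all of S."""
    return [(R[i:i + chunk_size], S) for i in range(0, len(R), chunk_size)]


def execute_task(task):
    """Task execution: filter step on one chunk, returning candidate id pairs."""
    r_chunk, S = task
    return [(r_id, s_id)
            for r_id, r in r_chunk
            for s_id, s in S
            if mbr_intersects(r, s)]


def parallel_spatial_join(R, S, workers=2, chunk_size=2):
    tasks = create_tasks(R, S, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Task assignment: the pool hands each task to an idle worker.
        parts = list(pool.map(execute_task, tasks))
    return sorted(pair for part in parts for pair in part)


R = [("r1", (0, 0, 2, 2)), ("r2", (5, 5, 6, 6)), ("r3", (8, 8, 9, 9))]
S = [("s1", (1, 1, 3, 3)), ("s2", (6, 6, 8, 8))]
print(parallel_spatial_join(R, S))
# [('r1', 's1'), ('r2', 's2'), ('r3', 's2')]
```

Threads keep the sketch short and portable; in CPython they do not run this CPU-bound work truly in parallel, so a process pool or separate machines would be needed to see a real speedup.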
LO6: Introduction to query models
Concept
Overview of query models for Oracle Spatial & ArcSDE
Oracle Spatial provides a SQL schema and functions that facilitate the storage, retrieval, update, and query of collections of spatial features in an Oracle database.
Oracle Spatial uses a two-tier query model to resolve spatial queries and spatial joins. It implements the filter-refine paradigm; the two operations are referred to as the primary and secondary filter operations.
The primary filter permits fast selection of candidate records to pass along to the secondary filter.
The secondary filter is computationally expensive, but it yields an accurate answer to a spatial query.
Example
• The primary filter checks whether the MBRs (minimum bounding rectangles) of the candidate objects interact, not whether the objects themselves interact.
• The secondary filter ensures that only candidate objects that actually interact are selected.
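The two filters can be sketched in a few lines. This is an illustration of the filter-refine idea only, not Oracle Spatial's API: objects are assumed to be circles so that the exact test stays short, and the `Circle` type and all function names are hypothetical.

```python
import math
from typing import NamedTuple


class Circle(NamedTuple):
    """Hypothetical exact geometry: centre (x, y) and radius r."""
    x: float
    y: float
    r: float


def mbr(c: Circle):
    # Minimum bounding rectangle of a circle: (xmin, ymin, xmax, ymax).
    return (c.x - c.r, c.y - c.r, c.x + c.r, c.y + c.r)


def primary_filter(a: Circle, b: Circle) -> bool:
    """Cheap test: do the MBRs interact?"""
    ax0, ay0, ax1, ay1 = mbr(a)
    bx0, by0, bx1, by1 = mbr(b)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def secondary_filter(a: Circle, b: Circle) -> bool:
    """Exact test: do the circles themselves interact?"""
    return math.hypot(a.x - b.x, a.y - b.y) <= a.r + b.r


def spatial_query(query: Circle, objects):
    candidates = [o for o in objects if primary_filter(query, o)]
    results = [o for o in candidates if secondary_filter(query, o)]
    return candidates, results


query = Circle(0, 0, 1)
objects = [Circle(1.5, 1.5, 1), Circle(1.5, 0, 1), Circle(5, 5, 1)]
candidates, results = spatial_query(query, objects)
print(len(candidates), results)
```

In this data the first object passes the primary filter because its MBR touches the query's MBR at a corner region, but the secondary filter rejects it; the third object never reaches the expensive test.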
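LO7.1: Understanding concept of RNN queries
Reverse Nearest Neighbor Queries
Concept: Focuses on inverse relations among points. The reverse nearest neighbors of a point q are the points whose nearest neighbor is q.
Example - 5 data points
What are the RNNs of 1?
[Figure: five data points labeled 1 to 5]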
Example: Business Impact Analysis
Algorithm
Step 1: For each point p ∈ S, determine the distance to the nearest neighbor of p in S, denoted N(p):
N(p) = min_{q ∈ S − {p}} d(p, q)
For each p ∈ S, generate a circle (p, N(p)), where p is its center and N(p) its radius.
Step 2: For any query q (for example, a Target store), determine all the circles (p, N(p)) that contain q and return their centers p.
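The two steps above can be sketched directly in Python. The sketch assumes 2D points stored as tuples, Euclidean distance, distinct points in S, and a linear scan over the circles in Step 2 (an index over the circles would replace the scan in practice). The function names and sample data are made up for this example.

```python
import math


def nn_distance(p, S):
    """N(p): distance from p to its nearest neighbor in S - {p}."""
    return min(math.dist(p, q) for q in S if q != p)


def build_circles(S):
    """Step 1: one circle (p, N(p)) per data point."""
    return [(p, nn_distance(p, S)) for p in S]


def rnn(q, circles):
    """Step 2: centers of all circles that contain the query point q."""
    return [p for p, radius in circles if math.dist(p, q) <= radius]


S = [(0, 0), (1, 0), (5, 0), (5, 1)]
circles = build_circles(S)
print(rnn((1.5, 0), circles))  # [(1, 0)]
```

Here only (1, 0) is returned: the query lies 0.5 away from it, inside its circle of radius 1, while every other point already has a neighbor closer than the query.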
LO7.2: Understanding concept of skyline queries
Example
You have to attend a conference and are trying to find a good hotel for your stay. You want to optimize the hotel search so that both the distance from the conference centre and the price of the booking are low.
Concept
Domination: a point A dominates another point B if and only if the coordinate of A on every axis is not larger than the corresponding coordinate of B, and A is strictly smaller than B on at least one axis.
Example
Given a set of points, the skyline query returns the subset of points (referred to as the skyline points) such that no point in the skyline is dominated by any other point in the dataset.
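A brute-force sketch of a skyline query follows. It takes "dominates" to mean no larger on every axis and strictly smaller on at least one, and compares every pair of points. The hotel names and their (distance, price) values are invented for illustration and are not read from the slides' figures.

```python
def dominates(a, b):
    """True if point a dominates point b (smaller is better on every axis)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Return the names of all points not dominated by any other point."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p)
                   for other, q in points.items() if other != name)
    )


# Hypothetical hotels: name -> (distance from conference center, price)
hotels = {
    "A": (1, 9),
    "B": (4, 3),
    "C": (5, 4),   # dominated by B: farther and more expensive
    "D": (9, 1),
    "E": (3, 10),  # dominated by A: farther and more expensive
}
print(skyline(hotels))  # ['A', 'B', 'D']
```

The pairwise comparison is quadratic in the number of points, which is fine for showing the definition but is not how large data sets would be processed.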
Example contd.
[Figure: hotels h1 to h13 and labels S1 to S4, plotted by Price (y-axis) against Distance from conference center (x-axis)]
Example contd.
[Figure: hotels h1 to h13 and labels S1 to S4, plotted by Price (y-axis, 0 to 12) against Distance from conference center (x-axis, 0 to 12)]
Result
[Figure: the points h1, h2 and h4, plotted by Price (y-axis, 0 to 12) against Distance from conference center (x-axis, 0 to 12)]
Spatial Query Evaluation on Hadoop
Hadoop
HDFS – Hadoop Distributed File System
Map Reduce : Programming paradigm
Parallel Databases v/s Map Reduce
Parallel DBMS or Map Reduce (Hadoop)?
Parallel DBMS
• Structured data
• Expensive to set up
• Complex analytics not easy
Hadoop
• Semi-structured data
• Can be done with low budget
• Complex analytics easier
Conclusion: Hadoop/Map Reduce cannot replace DBMS
Combination of Map Reduce and SQL - Aster Data
A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden & M. Stonebraker, "A comparison of approaches to large-scale data analysis," SIGMOD '09
Spatial Query Evaluation
Map Stage
1) Homogenize data
2) Map to tiles
3) Merge tiles into buckets
Reduce Stage
1) Filter to find overlapping MBRs
2) Refine results
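The map and reduce stages above can be simulated in plain Python without a Hadoop cluster. This sketch makes several assumptions that are not in the slides: "homogenize" is read as bringing both inputs into one record format (dataset, id, MBR), tiles are cells of a fixed grid of size `TILE`, each tile forms its own bucket, and refinement is a caller-supplied exact test that defaults to accepting every candidate. All names are hypothetical.

```python
from collections import defaultdict

TILE = 10.0  # hypothetical grid cell size


def tiles_for(mbr):
    """All grid tiles that an MBR (xmin, ymin, xmax, ymax) overlaps."""
    xmin, ymin, xmax, ymax = mbr
    for tx in range(int(xmin // TILE), int(xmax // TILE) + 1):
        for ty in range(int(ymin // TILE), int(ymax // TILE) + 1):
            yield (tx, ty)


def map_stage(records):
    """Emit (tile, record) for every tile the record's MBR overlaps."""
    for dataset, oid, mbr in records:
        for tile in tiles_for(mbr):
            yield tile, (dataset, oid, mbr)


def shuffle(pairs):
    """Group mapped records by key; here each tile is one bucket."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets


def mbr_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def reduce_stage(bucket, refine):
    """Filter on overlapping MBRs, then refine with the exact test."""
    left = [v for v in bucket if v[0] == "A"]
    right = [v for v in bucket if v[0] == "B"]
    for _, a_id, a in left:
        for _, b_id, b in right:
            if mbr_overlap(a, b) and refine(a, b):
                yield (a_id, b_id)


def spatial_join(records, refine=lambda a, b: True):
    result = set()  # a pair found in several tiles is reported once
    for bucket in shuffle(map_stage(records)).values():
        result.update(reduce_stage(bucket, refine))
    return sorted(result)


records = [
    ("A", "a1", (1, 1, 4, 4)), ("A", "a2", (8, 8, 12, 12)),
    ("B", "b1", (3, 3, 9, 9)), ("B", "b2", (25, 25, 26, 26)),
]
print(spatial_join(records))  # [('a1', 'b1'), ('a2', 'b1')]
```

An object that spans several tiles is replicated into each of them, which is why the final result is collected in a set to drop duplicate pairs.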