Introduction to Spatial Databases

Csci6330 Spatial Database Management
Yan Huang
huangyan@unt.edu
http://www.cs.unt.edu/~huangyan/6330
1
Today’s Outline
Introduction
Syllabus
Course description
Next class’s topic
2
Csci6330 Spatial Database Management
Instructor: Yan Huang
Office:
NTRP F251
Office Tel: 940-369-8353
Email:
huangyan@unt.edu
Office Hour: T, Th 8:55-9:55AM, or by appointment.
Class Time and Location: TR, 10:00am-11:20am, NTRP B190
Class Website: http://www.cs.unt.edu/~huangyan/6330
3
Self Introduction
Name
Major
Previous study and working experiences
Computer background: hardware, software
Why you are taking this course
Expectations for this course
4
Course Syllabus
5
Textbook
Textbook (required):
Spatial Databases: A Tour
• By Shashi Shekhar and Sanjay Chawla
• Available in the Book Store
Reference Book
Spatial Databases: With Application to GIS
• By Philippe Rigaux, Michel Scholl, and Agnes Voisard
GIS: A Computing Perspective, Taylor and Francis
• By M.R. Worboys
6
Background
Prerequisite: csci4350 or csci5350
Entity-Relationship Model
File Organizations
Index Structures for Files
Basic Queries in SQL
Query Processing
Query Optimization
7
Course Description
This course treats a specific advanced topic of
current research interest in the area of spatial data
and information.
The main objective of this class is to study research
methods and literature in spatial database systems.
Core research skills of literature analysis,
innovation, evaluation of new ideas, and
communication are emphasized via homework
assignments and projects.
8
Topics
Introduction to Spatial Databases
Spatial Concepts and Data Models
Spatial Query Languages: SQL3, OGIS
Spatial Storage and Indexing: R-tree, Grid files
Query Processing and Optimization
Strategies for range query, nearest neighbor query
Spatial joins (e.g. tree matching), cost models for new strategies,
impact on rule based optimization.
Spatial Networks:
Query language for graphs, graph algorithms, access methods.
Introduction to Spatial Data Mining
Spatial auto-correlation, co-location patterns, spatial outliers,
classification methods
Trends in Spatial Databases: mobile wireless applications, spatiotemporal data.
9
Homework
Assignments
Include research paper critique and presentation
Assignments will be due at the beginning of the class on the
due date
All assignments must have your name, student ID and course
name/number
You are encouraged to type your answers
Late homework
Submit to my office on paper
30% penalty within the first 24 hours
70% penalty after 24 hours
10
Examinations/Grading
There will be no exams for this course
Grading
Class participation and 10 quizzes 20%
Paper analysis and presentation 30%
Final Project 50%
• Presentation 20%
• Final Report 30%
11
Paper Review and Presentation
Each student is expected to write a critique of TWO
papers, P and Q
Present their overview in the class
This homework will help us learn literature analysis
skills towards starting a thesis or to simply understand
a research paper.
12
Steps for Paper Review and Presentation
A1. Identify the paper P and Q you would present. Make a home page for
the course assignment. Identify paper R and S you would like to lead
discussion
A2. Submit the narratives and slides of the paper P and Q. Electronic
copy will be linked to the web-page for the course to make those
available to the audience. The written critique should be about 2000
words long and the oral presentation should consist of about 30
transparencies addressing the following questions:
Problem statement
List the major contributions of the paper
What are the key concepts behind the approach in this paper
What is the validation methodology
List the assumptions made by the authors
If you were to rewrite this paper today, what would you preserve and
what would you revise? Briefly justify.
A3. Make a 45-minute oral presentation in the class on paper P and Q.
A4. Lead the group discussion when paper R and S are presented.
13
Term Paper/Project
P0. Identify a team of two for your project and create a webpage for the
project
P1. Define possible questions
P2. Identify key sources, types of evidence
P3. Summary of key readings
P4. Possible outline or overall structure for paper / project.
P5. Formal proposal: Includes questions or thesis, key resources, and
proposed structure revised with feedback from peers and the instructor.
P6. Rough draft. Follow the outlines from the research papers covered in
the course. For example, the draft may have 5 sections:
Problem Statement, Significance of the problem
Related Work and Our Contributions
Proposed Approach
Validation of listed contributions (experimental, analytical)
Conclusions and Future Work
P7. Oral presentation in the class using 20 transparencies
14
Honor System
All work is to be done under the provisions of the
University of North Texas Honor System.
Students are encouraged to discuss the interpretation
of an assignment
However, the actual solution to problems must be one's
own
In research report, related work should be properly
cited
15
Helpful Comments
Read the textbook and papers before class
Participate in class discussion
Review textbook and papers regularly after class
Start working on the assignments soon after they are
handed out.
Come to my office hours for questions and project
discussion
16
Chapter 1: Introduction to Spatial Databases
1.1 Overview
1.2 Application domains
1.3 Compare a SDBMS with a GIS
1.4 Categories of Users
1.5 An example of an SDBMS application
1.6 A Stroll through a spatial database
1.6.1 Data Models, 1.6.2 Query Language, 1.6.3 Query Processing,
1.6.4 File Organization and Indices, 1.6.5 Query Optimization,
1.6.6 Data Mining
17
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
• Application domains
• Users
• How is it different from a DBMS?
LO2: Understand the concept of spatial databases
LO3: Learn about the Components of SDBMS
Mapping Sections to learning objectives
LO1
1.1, 1.2, 1.4
LO2
1.3, 1.5
LO3
1.6
18
Value of SDBMS
Traditional (non-spatial) database management systems provide:
Persistence across failures
Allows concurrent access to data
Scalability to search queries on very large datasets which do not fit
inside main memories of computers
Efficient for non-spatial queries, but not for spatial queries
Non-spatial queries:
List the names of all bookstores with more than ten thousand titles.
List the names of the top ten customers, in terms of sales, in the year 2001
Use an index to narrow down the search
Spatial Queries:
List the names of all bookstores within ten miles of Minneapolis
List all customers who live in Tennessee and its adjoining states
List all the customers who reside within fifty miles of the company
headquarters
19
Value of SDBMS – Spatial Data Examples
Examples of non-spatial data
Names, phone numbers, email addresses of people
Examples of Spatial data
Census Data
NASA satellites imagery - terabytes of data per day
Weather and Climate Data
Rivers, Farms, ecological impact
Medical Imaging
Exercise: Identify spatial and non-spatial data items in
A phone book
A Product catalog
20
Value of SDBMS – Users, Application Domains
Many important application domains have spatial data and
queries. Some Examples follow:
Army Field Commander: Has there been any significant
enemy troop movement since last night?
Insurance Risk Manager: Which homes are most likely to
be affected in the next great flood on the Mississippi?
Medical Doctor: Based on this patient's MRI, have we
treated somebody with a similar condition?
Molecular Biologist: Is the topology of the amino acid
biosynthesis gene in the genome found in any other sequence
feature map in the database?
Astronomer: Find all blue galaxies within 2 arcmin of quasars.
Exercise: List two ways you have used spatial data. Which
software did you use to manipulate spatial data?
21
Due and Next Class
Paper review selection and webpage
Team formation and project webpage
Prepare for an open-book quiz with 2 questions based
on today’s topic
Read chapter 1.1,1.2,1.4 if you have not
Next class: chapter 1.3,1.5,1.6 of the book
22
Agenda Today
Quiz with two questions
Paper selection and team formation
Two papers to present and two papers to lead discussion
Team of two or one
Topics
Concepts of spatial database
Components of a spatial database
Summary of chapter 1
Next class’s topic
23
Quiz
Give a scenario and a query in your life which involves spatial
data.
Give a possible reason why spatial analysis using spatial queries
on customer data may be important to vendors such as Best Buy
and Walmart.
24
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
LO2: Understand the concept of spatial databases
• What is a SDBMS?
• How is it different from a GIS?
LO3: Learn about the Components of SDBMS
Sections for LO2
Section 1.5 provides an example SDBMS
Sections 1.1 and 1.3 compare SDBMS with DBMS and GIS
25
What is a SDBMS?
A SDBMS is a software module that
can work with an underlying DBMS
supports spatial data models, spatial abstract data types (ADTs)
and a query language from which these ADTs are callable
supports spatial indexing, efficient algorithms for processing
spatial operations, and domain specific rules for query
optimization
Example: Oracle Spatial data cartridge, ESRI SDE
can work with Oracle 8i DBMS
Has spatial data types (e.g. polygon), operations (e.g. overlap)
callable from SQL3 query language
Has spatial indices, e.g. R-trees
IBM: Spatial Option
Informix: Spatial Datablade
26
SDBMS Example
Consider a spatial dataset with:
County boundary (dashed white line)
Census block - name, area, population,
boundary (dark line)
Water bodies (dark polygons)
Satellite Imagery (gray scale pixels)
Storage in a SDBMS table:
create table census_blocks (
    name        string,
    area        float,
    population  number,
    boundary    polygon );
Fig 1.2
27
Modeling Spatial Data in Traditional DBMS
•A row in the table census_blocks (Figure 1.3)
• Question: Is Polyline datatype supported in DBMS?
Figure 1.3
28
Spatial Data Types and Traditional Databases
Traditional relational DBMS
Support simple data types, e.g. number, strings, date
Modeling Spatial data types is tedious
Example: Figure 1.4 shows modeling of polygon using numbers
Three new tables: polygon, edge, points
• Note: Polygon is a polyline where last point and first point are same
A simple unit square represented as 16 rows across 3 tables
Simple spatial operators, e.g. area(), require joining tables
Tedious and computationally inefficient
Question. Name post-relational database management systems
which facilitate modeling of spatial data types, e.g. polygon.
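A minimal sketch of the point above, assuming a simplified three-table schema (the table layout, the `role` column, and the boundary id 1050 are illustrative choices, not necessarily those of Figure 1.4). It stores a unit square using only numbers and strings, confirms the 16-row count, and shows that even area() needs a four-way join:

```python
import sqlite3

# Hypothetical schema: a polygon lists its edges, an edge lists its two
# endpoints (with a role so direction is known), a point holds coordinates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE polygon (boundary_id INTEGER, edge_name TEXT);
CREATE TABLE edge    (edge_name TEXT, role TEXT, endpoint INTEGER);
CREATE TABLE point   (endpoint INTEGER, x REAL, y REAL);
""")

conn.executemany("INSERT INTO polygon VALUES (?, ?)",
                 [(1050, "A"), (1050, "B"), (1050, "C"), (1050, "D")])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", [
    ("A", "start", 1), ("A", "end", 2),
    ("B", "start", 2), ("B", "end", 3),
    ("C", "start", 3), ("C", "end", 4),
    ("D", "start", 4), ("D", "end", 1),
])
conn.executemany("INSERT INTO point VALUES (?, ?, ?)",
                 [(1, 0.0, 0.0), (2, 1.0, 0.0), (3, 1.0, 1.0), (4, 0.0, 1.0)])

# 4 polygon rows + 8 edge rows + 4 point rows
total_rows = sum(conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
                 for t in ("polygon", "edge", "point"))

# Shoelace formula over the directed edges: area = |sum(x1*y2 - x2*y1)| / 2
area = conn.execute("""
    SELECT ABS(SUM(p1.x * p2.y - p2.x * p1.y)) / 2.0
    FROM polygon pg
    JOIN edge e1 ON e1.edge_name = pg.edge_name AND e1.role = 'start'
    JOIN edge e2 ON e2.edge_name = pg.edge_name AND e2.role = 'end'
    JOIN point p1 ON p1.endpoint = e1.endpoint
    JOIN point p2 ON p2.endpoint = e2.endpoint
    WHERE pg.boundary_id = ?
""", (1050,)).fetchone()[0]

print(total_rows, area)
```

With a polygon data type, the same computation would be a single call on one column value.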
29
Mapping “census_table” into a Relational Database
Fig 1.4
30
Evolution of DBMS technology
Fig 1.5
31
Spatial Data Types and Post-relational Databases
Post-relational DBMS
Support user defined abstract data types
Spatial data types (e.g. polygon) can be added
Choice of post-relational DBMS
Object oriented (OO) DBMS
Object relational (OR) DBMS
A spatial database is a collection of spatial data types, operators,
indices, processing strategies, etc. and can work with many
post-relational DBMS as well as programming languages like Java, Visual
Basic etc.
32
An Example : OO-Model
Class Highway
tuple (highway_name: string, highway_type: string,
sections: list(Section))
Class Section
tuple (section_name: string,
number_lanes: integer,
city_start: City,
city_end: City,
geometry: Line)
Class City
tuple (city_name: string,
population: integer,
geometry: Region)
33
An Example : OO-Model
Method
Method length in class Line: real (Computes the length of a line.)
Query: Length of Interstate 99
Sum (Select s.geometry->length()
from h in All_highways, s in h.sections
where h.highway_name = 'I99')
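The OO model and query above can be mimicked in a general-purpose language. The following is a rough sketch only: the class and attribute names follow the slides, while the coordinates, city data, and the choice to store a Line as a list of vertices are assumptions made for illustration (the Region geometry of City is omitted).

```python
from dataclasses import dataclass, field
from math import hypot

@dataclass
class Line:
    points: list  # assumed representation: ordered (x, y) vertices

    def length(self) -> float:
        # Sum of the Euclidean lengths of consecutive segments
        return sum(hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(self.points, self.points[1:]))

@dataclass
class City:
    city_name: str
    population: int  # Region geometry left out of this sketch

@dataclass
class Section:
    section_name: str
    number_lanes: int
    city_start: City
    city_end: City
    geometry: Line

@dataclass
class Highway:
    highway_name: str
    highway_type: str
    sections: list = field(default_factory=list)

# Made-up data for illustration
a, b, c = City("A", 1000), City("B", 2000), City("C", 3000)
i99 = Highway("I99", "interstate", [
    Section("s1", 4, a, b, Line([(0, 0), (3, 4)])),
    Section("s2", 4, b, c, Line([(3, 4), (3, 10)])),
])
other = Highway("US1", "us", [Section("t1", 2, a, c, Line([(0, 0), (1, 0)]))])
all_highways = [i99, other]

# Same shape as the query: iterate highways, then their sections, then sum
total = sum(s.geometry.length()
            for h in all_highways for s in h.sections
            if h.highway_name == "I99")
print(total)
```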
34
How is a SDBMS different from a GIS?
A GIS is software to visualize and analyze spatial data
using spatial analysis functions such as
Search: Thematic search, search by region, (re-)classification
Location analysis: Buffer, corridor, overlay
Terrain analysis: Slope/aspect, catchment, drainage network
Flow analysis: Connectivity, shortest path
Distribution: Change detection, proximity, nearest neighbor
Spatial analysis/Statistics: Pattern, centrality, autocorrelation,
indices of similarity, topology: hole description
Measurements: Distance, perimeter, shape, adjacency, direction
GIS uses SDBMS
to store, search, query, share large spatial data sets
35
How is a SDBMS different from a GIS?
SDBMS focuses on
Efficient storage, querying, sharing of large spatial datasets
Provides simpler set based query operations
Example operations: search by region, overlay, nearest
neighbor, distance, adjacency, perimeter etc.
Uses spatial indices and query optimization to speed up queries
over large spatial datasets.
SDBMS may be used by applications other than GIS
Astronomy, Genomics, Multimedia information systems, ...
36
Evolution of acronym “GIS”
Geographic Information Systems (1980s)
Geographic Information Science (1990s)
Geographic Information Services (2000s)
Fig 1.1
37
Three meanings of the acronym GIS
Geographic Information Services
Web-sites and service centers for casual users, e.g. travelers
Example: Service (e.g. AAA, mapquest) for route planning
Geographic Information Systems
Software for professional users, e.g. cartographers
Example: ESRI Arc/View software
Geographic Information Science
Concepts, frameworks, theories to formalize use and
development of geographic information systems and services
Example: design spatial data types and operations for querying
38
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
LO2: Understand the concept of spatial databases
LO3: Learn about the Components of SDBMS
• Architecture choices
• SDBMS components:
– data model, query languages
– query processing and optimization
– file organization and indices
– data mining
Chapter Sections
1.5 second half
1.6 – entire section
39
Components of a SDBMS
Recall: a SDBMS is a software module that
can work with an underlying DBMS
supports spatial data models, spatial ADTs and a query
language from which these ADTs are callable
supports spatial indexing, algorithms for processing spatial
operations, and domain specific rules for query optimization
Components include
spatial data model, query language, query processing, file
organization and indices, query optimization, etc.
Figure 1.6 shows these components
We discuss each component briefly in chapter 1.6 and in more
detail in later chapters.
40
Three Layer Architecture
Fig 1.6
41
1.6.1 Spatial Taxonomy, Data Models
Spatial Taxonomy:
multitude of descriptions available to organize space.
Topology models homeomorphic relationships, e.g. overlap,
adjacent
Euclidean space models distance and direction in a plane
Graphs model connectivity, shortest path
Spatial data models
rules to identify identifiable objects and properties of space
Object models help manage identifiable things, e.g. mountains,
cities, land-parcels etc.
Field models help manage continuous and amorphous
phenomena, e.g. temperature, satellite imagery, snowfall etc.
More details in chapter 2.
42
1.6.2 Spatial Query Language
• Spatial query language
• Spatial data types, e.g. point, linestring, polygon, …
• Spatial operations, e.g. overlap, distance, nearest neighbor, …
• Callable from a query language (e.g. SQL3) of underlying DBMS
SELECT S.name
FROM Senator S
WHERE S.district.Area() > 300
• Standards
• SQL3 (a.k.a. SQL 1999) is a standard for query languages
• OGIS is a standard for spatial data types and operators
• Both standards enjoy wide support in industry
• More details in chapters 2 and 3
43
Multi-scan Query Example
• Find all senators who serve a district of area greater than 300
square miles and who own a business within the district
• Spatial join example
SELECT S.name
FROM Senator S, Business B
WHERE S.district.Area() > 300 AND Within(B.location, S.district)
Fig 1.7
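To make the two spatial predicates in the WHERE clause concrete, here is a small sketch that evaluates the same conditions over in-memory data. The `area` and `within` helpers, the polygon-as-vertex-list representation, and all names and coordinates are assumptions for illustration; a real SDBMS would supply Area() and Within() as built-in ADT operations.

```python
def area(poly):
    # Shoelace formula for a simple polygon given as a list of (x, y) vertices
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def within(pt, poly):
    # Ray casting: count boundary crossings of a ray going right from pt
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def square(x0, y0, side):
    return [(x0, y0), (x0 + side, y0), (x0 + side, y0 + side), (x0, y0 + side)]

# Made-up rows standing in for the Senator and Business tables
senators = [
    {"name": "Adams", "district": square(0, 0, 20)},     # area 400, has a business
    {"name": "Baker", "district": square(50, 50, 10)},   # area 100, has a business
    {"name": "Clark", "district": square(100, 100, 30)}, # area 900, no business
]
businesses = [{"location": (5, 5)}, {"location": (55, 55)}]

# Nested-loop evaluation of:
#   WHERE S.district.Area() > 300 AND Within(B.location, S.district)
names = sorted({s["name"]
                for s in senators for b in businesses
                if area(s["district"]) > 300
                and within(b["location"], s["district"])})
print(names)
```

The nested loop touches every (senator, business) pair, which is why this kind of query is called a spatial join.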
44
Multi-scan Query Example
•Find the name of all female senators who own a business
•Non-Spatial Join example
SELECT S.name
FROM Senator S, Business B
WHERE S.soc-sec = B.soc-sec AND S.gender = 'Female'
Fig 1.7
45
1.6.3 Query Processing
• Efficient algorithms to answer spatial queries
• Common Strategy - filter and refine
• Filter Step: Query Region overlaps with MBRs of B, C and D
• Refine Step: Query Region overlaps with B and C
Fig 1.8
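The filter and refine strategy can be sketched as follows. To keep the exact geometry test short, this example assumes the objects are discs (center and radius), which is not what Fig 1.8 shows; the coordinates are invented so that the outcome mirrors the slide (B, C and D pass the filter, only B and C survive refinement).

```python
from math import hypot

def mbr(disc):
    # Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a disc
    cx, cy, r = disc
    return (cx - r, cy - r, cx + r, cy + r)

def rects_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def disc_overlaps_rect(disc, rect):
    # Exact test: distance from the center to the nearest point of the rectangle
    cx, cy, r = disc
    nx = min(max(cx, rect[0]), rect[2])
    ny = min(max(cy, rect[1]), rect[3])
    return hypot(cx - nx, cy - ny) <= r

query = (0, 0, 10, 10)
objects = {
    "A": (20, 20, 2),     # far away: rejected by the filter
    "B": (5, 5, 1),       # inside the query region
    "C": (11, 5, 2),      # straddles the right edge
    "D": (12, 12, 2.5),   # MBR touches the corner, the disc itself does not
}

# Filter step: cheap rectangle test on MBRs only
candidates = [name for name, d in objects.items() if rects_overlap(mbr(d), query)]
# Refine step: exact geometry test on the surviving candidates
results = [name for name in candidates if disc_overlaps_rect(objects[name], query)]
print(candidates, results)
```

The filter never discards a true answer because an object lies inside its MBR; it only admits some false candidates, such as D here, that the refine step removes.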
46
Query Processing of Join Queries
•Example - Determining pairs of intersecting rectangles
• (a): Two sets R and S of rectangles, (b): A rectangle with 2 opposite corners
marked, (c): Rectangles sorted by smallest X coordinate value
• Plane sweep filter identifies 5 pairs out of 12 for refinement step
•Details of plane sweep algorithm on page 15
Fig 1.9
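A compact sketch of a plane sweep over two sets of rectangles, written from the general idea on this slide rather than from the textbook's page 15 listing. Rectangles are assumed to be `(xmin, ymin, xmax, ymax)` tuples, and the sample data is invented; it does not reproduce the 12 rectangles of Fig 1.9.

```python
def y_overlap(a, b):
    return a[1] <= b[3] and b[1] <= a[3]

def sweep_one(name, rect, others, start, pairs, first_is_r):
    # Scan the other sorted list while its rectangles start before rect ends in x
    k = start
    while k < len(others) and others[k][1][0] <= rect[2]:
        if y_overlap(rect, others[k][1]):
            other_name = others[k][0]
            pairs.append((name, other_name) if first_is_r else (other_name, name))
        k += 1

def plane_sweep_pairs(R, S):
    # Sort both sets by smallest x coordinate, then advance a sweep line
    R = sorted(R, key=lambda item: item[1][0])
    S = sorted(S, key=lambda item: item[1][0])
    i = j = 0
    pairs = []
    while i < len(R) and j < len(S):
        if R[i][1][0] <= S[j][1][0]:
            sweep_one(R[i][0], R[i][1], S, j, pairs, True)
            i += 1
        else:
            sweep_one(S[j][0], S[j][1], R, i, pairs, False)
            j += 1
    return pairs

R = [("R1", (0, 0, 2, 2)), ("R2", (3, 1, 5, 4)), ("R3", (6, 0, 8, 2))]
S = [("S1", (1, 1, 4, 3)), ("S2", (7, 3, 9, 5)), ("S3", (4.5, 3.5, 6.5, 6))]

pairs = plane_sweep_pairs(R, S)

# Brute-force check over all |R| * |S| combinations, for comparison only
brute = [(rn, sn) for rn, r in R for sn, s in S
         if r[0] <= s[2] and s[0] <= r[2] and y_overlap(r, s)]
print(sorted(pairs) == sorted(brute))
```

When the rectangles are MBRs of more complex shapes, the pairs returned here are only candidates and still go through a refinement step.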
47
1.6.4 File Organization and Indices
• A difference between GIS and SDBMS assumptions
•GIS algorithms: dataset is loaded in main memory (Fig. 1.10(a))
•SDBMS: dataset is on secondary storage, e.g. disk (Fig. 1.10(b))
•SDBMS uses space filling curves and spatial indices
•to efficiently search disk resident large spatial datasets
•Memory access: about 10 ns; disk access: about 10,000,000 ns (figures from 2000)
Fig 1.10
48
Organizing spatial data with space filling curves
•Issue:
•Sorting is not naturally defined on spatial data
•Many efficient search methods are based on sorting datasets
•Space filling curves
•Impose an ordering on the locations in a multi-dimensional space
•Examples: row-order (Fig. 1.11(a)), z-order (Fig. 1.11(b))
• Allow use of traditional efficient search methods on spatial data
Fig 1.11
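As an illustration of how a space filling curve turns a 2-D location into a sortable key, the sketch below computes row-order and z-order values for a 4 x 4 grid. The bit-interleaving convention (x in the even bits, y in the odd bits) is an assumption and may differ from the orientation drawn in Fig 1.11.

```python
from bisect import bisect_left

def row_order(x, y, width=4):
    # Row-major position: scan each row left to right
    return y * width + x

def z_order(x, y, bits=2):
    # Interleave the bits of x and y (x takes the even bit positions)
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

cells = [(x, y) for y in range(4) for x in range(4)]

# Once every cell has a 1-D key, ordinary sorting and binary search apply
by_z = sorted(cells, key=lambda c: z_order(*c))
z_keys = [z_order(*c) for c in by_z]

def lookup(x, y):
    # Binary search on the z-ordered keys, as a B-tree would do on disk
    pos = bisect_left(z_keys, z_order(x, y))
    return by_z[pos]

print(by_z[:4], lookup(2, 3))
```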
49
Spatial Indexing: Search Data-Structures
•Choice for spatial indexing:
•B-tree is a hierarchical collection of ranges of linear keys, e.g. numbers
•B-tree index is used for efficient search of traditional data
•B-tree can be used with space filling curve on spatial data
•R-tree provides better search performance yet!
•R-tree is a hierarchical collection of rectangles
•More details in chapter 4
Fig 1.12: B-tree
Fig. 1.13: R-tree
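The phrase "hierarchical collection of rectangles" can be pictured with a tiny hand-built tree. This is only a search sketch under simplifying assumptions: the tree is constructed manually with two leaves, and no insertion, node splitting, or balancing (the substance of a real R-tree, covered in chapter 4) is attempted.

```python
def overlaps(a, b):
    # Rectangles are (xmin, ymin, xmax, ymax)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

class Node:
    def __init__(self, children, leaf):
        # Leaf nodes hold (name, rect) entries; inner nodes hold child Nodes
        self.children = children
        self.leaf = leaf
        rects = [c[1] for c in children] if leaf else [c.mbr for c in children]
        self.mbr = union(rects)

visited = []

def search(node, query, out):
    # Skip a whole subtree when its bounding rectangle misses the query
    if not overlaps(node.mbr, query):
        return out
    visited.append(node)
    if node.leaf:
        out.extend(name for name, rect in node.children if overlaps(rect, query))
    else:
        for child in node.children:
            search(child, query, out)
    return out

leaf1 = Node([("a", (0, 0, 1, 1)), ("b", (2, 2, 3, 3))], leaf=True)
leaf2 = Node([("c", (10, 10, 11, 11)), ("d", (12, 12, 13, 13))], leaf=True)
root = Node([leaf1, leaf2], leaf=False)

hits = search(root, (2.5, 2.5, 4, 4), [])
print(hits, len(visited))
```

Here the second leaf is never opened, which is the pruning effect a spatial index relies on.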
50
1.6.5 Query Optimization
•Query Optimization
• A spatial operation can be processed using different strategies
• Computation cost of each strategy depends on many parameters
•Query optimization is the process of
•ordering operations in a query and
•selecting efficient strategy for each operation
•based on the details of a given dataset
•Example Query:
SELECT S.name
FROM Senator S, Business B
WHERE S.soc-sec = B.soc-sec AND S.gender = 'Female'
•Optimization decision examples
•Process (S.gender = 'Female') before (S.soc-sec = B.soc-sec)
•Do not use index for processing (S.gender = ‘Female’)
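The first optimization decision can be illustrated by counting work under two orderings of the same query. This is a toy sketch assuming a naive nested-loop join and invented rows; real optimizers use cost models over many more strategies, and the comparison counter is only a stand-in for cost.

```python
# Made-up rows; soc_sec stands in for the soc-sec column in the query
senators = [
    {"name": "Adams", "soc_sec": 1, "gender": "Female"},
    {"name": "Baker", "soc_sec": 2, "gender": "Male"},
    {"name": "Clark", "soc_sec": 3, "gender": "Female"},
    {"name": "Davis", "soc_sec": 4, "gender": "Male"},
]
businesses = [{"soc_sec": 1}, {"soc_sec": 2}, {"soc_sec": 4}]

def join_then_select(S, B):
    # Plan 1: join every senator with every business, filter on gender last
    comparisons = 0
    joined = []
    for s in S:
        for b in B:
            comparisons += 1
            if s["soc_sec"] == b["soc_sec"]:
                joined.append(s)
    names = [s["name"] for s in joined if s["gender"] == "Female"]
    return names, comparisons

def select_then_join(S, B):
    # Plan 2: apply the gender selection first, then join the smaller input
    females = [s for s in S if s["gender"] == "Female"]
    comparisons = 0
    names = []
    for s in females:
        for b in B:
            comparisons += 1
            if s["soc_sec"] == b["soc_sec"]:
                names.append(s["name"])
    return names, comparisons

names1, cost1 = join_then_select(senators, businesses)
names2, cost2 = select_then_join(senators, businesses)
print(names1 == names2, cost1, cost2)
```

Both plans return the same answer; only the amount of join work differs, which is what the optimizer tries to minimize.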
51
1.6.6 Data Mining
• Analysis of spatial data is of many types
• Deductive Querying, e.g. searching, sorting, overlays
• Inductive Mining, e.g. statistics, correlation, clustering, classification, …
• Data mining is a systematic and semi-automated search for
interesting non-trivial patterns in large spatial databases
•Example applications include
•Infer land-use classification from satellite imagery
•Identify cancer clusters and geographic factors with high correlation
•Identify crime hotspots to assign police patrols and social workers
52
1.7 Summary
SDBMS is valuable to many important applications
SDBMS is a software module
works with an underlying DBMS
provides spatial ADTs callable from a query language
provides methods for efficient processing of spatial queries
Components of SDBMS include
spatial data model, spatial data types and operators,
spatial query language, processing and optimization
spatial data mining
SDBMS is used to store, query and share spatial data
for GIS as well as other applications
53
Stuff Due and Next Class
Paper review selection and webpage
Team formation and project webpage
Prepare for an open-book quiz with 2 questions based
on today’s topic
Next week: spatial concepts and data model, chapter 2
of the book
Note: Sep 21, Two paragraphs describing the possible questions. Submit
an electronic copy and link it to your project website
54