Chapter 8 - Pick Managing Spatial Data

Managing Spatial Data
Chapter 8 Slides from
James Pick, Geo-Business: GIS in the Digital
Organization, John Wiley and Sons, 2008.
Copyright © 2008 John Wiley and Sons.
DO NOT CIRCULATE WITHOUT
PERMISSION OF JAMES PICK
Copyright (c) 2008 by John Wiley
and Sons
Topics covered
• The lecture concerns the design and uses of spatial
databases and data warehouses.
• The relational, object-oriented, and object-relational
database models are explained and compared on their
pluses and minuses. The first two are covered in
Connections-A. The relational model is the one most
frequently used for GIS.
• Data warehouses have different uses than databases.
They are used to archive huge amounts of data over a
long period of time. Their data design is simpler than
that of databases.
• Spatial data warehouse examples are given for an auto
insurance firm and the City of Portland.
• The final topic is data quality. GIS professionals and
users need to scrutinize data continually for errors
and correct them.
Review - Design Elements of a GIS
Relational databases involve the manipulation of the attribute tables
and the intermediate tables that are created.
(Source: Pick, 2007)
Relationship of Spatial Data and Attribute Data (Fig. 8.2)
Databases: Relational Model
• The relational model is based on organizing data
into a series of tables. For each table, the rows
represent records, while the columns indicate
attributes.
– An example is the Attribute Tables in ArcGIS
– The tables are related to each other through relational
operators. For instance, the Join operator joins two
tables together. Many operators can work in
sequence to support complex spatial data
manipulation (think of some of the manipulations in
the ArcGIS lab).
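The Join operator described above can be sketched with Python's built-in sqlite3 module. The table names, columns, and rows are hypothetical and are not taken from ArcGIS; the sketch only shows two tables that share a key being combined into one.

```python
import sqlite3

# Hypothetical tables: a spatial table of store locations and an
# attribute table of sales figures, linked by store_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_location (store_id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE store_sales (store_id INTEGER, annual_sales REAL);
    INSERT INTO store_location VALUES (1, 10.0, 20.0), (2, 15.5, 22.5), (3, 30.0, 5.0);
    INSERT INTO store_sales VALUES (1, 250000.0), (2, 410000.0);
""")

# Join: match rows on the shared key so each sales record carries its coordinates.
joined = conn.execute("""
    SELECT l.store_id, l.x, l.y, s.annual_sales
    FROM store_location AS l
    JOIN store_sales AS s ON s.store_id = l.store_id
    ORDER BY l.store_id
""").fetchall()
conn.close()

for row in joined:
    print(row)
```

Store 3 has no sales row, so this inner join leaves it out of the result; a left join would keep it with an empty sales value.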
Example of Relational Tables and Sequence of Operations by
Three Relational Operators on Supplier Data (Fig. 8.3)
Relational Model
• In the relational model, queries slice, dice, and combine
tables to yield new tables.
• The starting and ending attributes are spatially
referenced (X,Y coordinates).
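A sequence of relational operators like the one in the figure can be sketched in plain Python. The three operators chosen here (select, join, project), the supplier and shipment tables, and every value in them are assumptions for illustration; the figure itself may use a different sequence.

```python
# Hypothetical supplier tables; each supplier row is spatially referenced by (x, y).
suppliers = [
    {"supplier_id": "S1", "name": "Acme", "x": 3.0, "y": 4.0},
    {"supplier_id": "S2", "name": "Baker", "x": 8.0, "y": 1.0},
    {"supplier_id": "S3", "name": "Corex", "x": 5.0, "y": 9.0},
]
shipments = [
    {"supplier_id": "S1", "part": "bolt", "qty": 300},
    {"supplier_id": "S2", "part": "nut", "qty": 50},
    {"supplier_id": "S3", "part": "bolt", "qty": 400},
]

def select(rows, predicate):
    """Select: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def join(left, right, key):
    """Join: combine rows from both tables that share the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def project(rows, columns):
    """Project: keep only the named columns."""
    return [{c: r[c] for c in columns} for r in rows]

# Each operator takes tables in and returns a new table, so they chain.
large = select(shipments, lambda r: r["qty"] >= 100)
located = join(suppliers, large, "supplier_id")
result = project(located, ["name", "x", "y", "qty"])
print(result)
```

The X,Y columns survive every step, so the final table is still spatially referenced and could be mapped.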
Databases: Object-Oriented
Data Model
• The key unit is the object, which represents a real-world
“thing” having attributes and behaviors. It is able to relate
to and communicate with other objects by sending
messages to them that activate their behaviors.
• The objects can be organized into hierarchical classes, so
that characteristics can be inherited from higher objects to
lower ones.
• The object-oriented model is more suitable for models
applied to rapidly changing environments having complex
behaviors.
• Object-oriented programming languages, such as Visual
Basic, Java, and C++, allow the building and manipulating
of objects, including spatial objects. GIS software is
programmed today in object-oriented languages, and
developers can customize GIS applications by using these
languages.
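The ideas of objects, behaviors, messages, and inheritance can be sketched in a few lines of Python. The class names (SpatialObject, Store, Warehouse) and their attributes are hypothetical and do not come from any GIS product.

```python
import math

class SpatialObject:
    """Base class: every spatial object has a name and a location."""
    def __init__(self, name, x, y):
        self.name = name
        self.x = x
        self.y = y

    def distance_to(self, other):
        """Behavior shared by all subclasses: straight-line distance."""
        return math.hypot(self.x - other.x, self.y - other.y)

    def describe(self):
        return f"{self.name} at ({self.x}, {self.y})"

class Store(SpatialObject):
    """Subclass: inherits location and distance_to, adds its own attribute."""
    def __init__(self, name, x, y, annual_sales):
        super().__init__(name, x, y)
        self.annual_sales = annual_sales

    def describe(self):
        # Extends the inherited behavior rather than replacing it.
        return f"{super().describe()}, sales {self.annual_sales}"

class Warehouse(SpatialObject):
    """Subclass that keeps a list of the stores it serves."""
    def __init__(self, name, x, y):
        super().__init__(name, x, y)
        self.served = []

    def assign(self, store):
        """A message to the warehouse: record the store and return its distance."""
        self.served.append(store)
        return self.distance_to(store)

depot = Warehouse("Depot", 0.0, 0.0)
shop = Store("Shop A", 3.0, 4.0, 120000)
print(depot.assign(shop))
print(shop.describe())
```

Calling a method on another object plays the role of sending it a message, and Store and Warehouse get distance_to from the higher class without redefining it.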
Spatial Object and Example (Fig 8.4)
Spatial Objects Showing Multiplicity
and Class Hierarchy
Databases: Object-Relational
Data Model
• In this model, object-oriented capabilities
complement a relational database.
• Relational tables remain as the place for data
storage, but the relational model can interact
with some object-oriented functionality on top.
• This model is appearing in the commercial
marketplace in some major contemporary
products including Oracle Spatial 10g and
ESRI’s Geodatabase model of ArcGIS, which is
mixed object-relational.
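The split between relational storage and an object layer on top can be sketched as follows. This is only an illustration using Python's sqlite3 and dataclasses modules; it is not how Oracle Spatial or the Geodatabase is implemented, and the Site class and site table are hypothetical.

```python
import math
import sqlite3
from dataclasses import dataclass

@dataclass
class Site:
    """Object layer: wraps one stored row and adds behavior."""
    site_id: int
    name: str
    x: float
    y: float

    def distance_to(self, other: "Site") -> float:
        return math.hypot(self.x - other.x, self.y - other.y)

# Relational layer: the data itself lives in an ordinary table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (site_id INTEGER PRIMARY KEY, name TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO site VALUES (?, ?, ?, ?)",
    [(1, "Office", 0.0, 0.0), (2, "Plant", 6.0, 8.0)],
)

def load_sites(connection):
    """Turn each stored row into a Site object."""
    rows = connection.execute("SELECT site_id, name, x, y FROM site ORDER BY site_id")
    return [Site(*row) for row in rows]

office, plant = load_sites(conn)
conn.close()
print(office.distance_to(plant))
```

The table remains the place of storage, while behavior such as distance_to is attached only after rows are read into objects.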
(Source: Modified from Tomlinson, 2003)
TABLE 8.5 Appropriate Data Model for Certain Data Modeling Situations in Business (cont.)
Oracle Spatial 11g
• Oracle Spatial 11g is a Spatial Relational Database that
is a version of the Standard Oracle 11 Database product,
which is among the leading databases for medium and
large businesses. Note: 11g recently superseded 10g.
• Within Oracle Spatial 11g, there are a number of Spatial
Functions available.
• Among them is the geodetic function supporting the use
of the latitude/longitude coordinate system while other
functions handle indexing, partitioning, and aggregation
for spatial data. Relational spatial operators can change
and transform spatial data.
• Oracle Spatial 11g can be a great choice for a large
business that has invested heavily in its Oracle
mainframe databases but doesn’t need a lot of spatial
functionality.
– That said, Oracle Spatial 11g’s functionality is improving year by
year, and today could be classified as moderate.
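To give a sense of what a geodetic function working in latitude/longitude has to compute, here is the textbook haversine formula for great-circle distance. It assumes a spherical earth with a mean radius of 6371 km and is not Oracle Spatial's function or API; the function name haversine_km is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude measured along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
```

Treating latitude and longitude as flat X,Y values would give distorted distances away from the equator, which is why geodetic support is a separate capability.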
New York City’s Integrated Data
Architecture Using Oracle Spatial 11g
• New York City standardized on Oracle Spatial
11g. The justification was that Oracle had for
some time supported the city’s non-spatial,
heavy-duty database processing.
• The decision to centralize its spatial applications
on Oracle Spatial 11g as the main repository
leveraged the city’s existing Oracle knowledge
and skill base and offered the capacity to support
a very large spatial processing demand.
Oracle Spatial 11g Integrated Data
Architecture for New York City (Fig 8.9)
(Source: GITA)
Pluses and Minuses of Oracle
Spatial 10g (or 11g)
• The pluses are the potential for high-volume
spatial applications in the enterprise
environment, and potential in large IT shops to
leverage the Oracle knowledge already present.
• A minus for the GIS or IT department of a
smaller enterprise is that it may not have the
knowledge or skills to support Oracle Spatial.
• Another minus is that the spatial features are
only moderate and the GIS interface may be
less friendly than for some other packages.
• Perhaps the greatest deterrent is the high cost of
Oracle databases.
Enmax Case Study
in Geo-Business
• Enmax is a private corporation
wholly owned by the City of
Calgary in Canada.
• It serves a territory around Calgary of
422 square miles and has over
360,000 customers.
• It distributes natural gas and electricity
and has started an initiative in wind
energy.
• Its process of adopting an enterprise
approach with Oracle Spatial 10g is the
focus of this case.
Enmax Database Configuration (Fig 8.12)
(Source: Lawrence, 2005)
Spatial Data Warehouses
• A data warehouse takes a subject-oriented
view of data rather than a query-oriented
one. It receives data from one or more
relational databases, stores large or
massive amounts of data, and emphasizes
permanent storage of data received over
periods of time.
Data Warehouse Star Schema
including location
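A star schema with a location dimension can be sketched with Python's sqlite3 module. The table and column names (fact_sales, dim_time, dim_product, dim_location) and all rows are hypothetical, and the coordinates are approximate values used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the when, what, and where of each fact.
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_location (
        location_id INTEGER PRIMARY KEY, city TEXT, latitude REAL, longitude REAL
    );
    -- The fact table sits at the center of the star and references every dimension.
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        amount REAL
    );
    INSERT INTO dim_time VALUES (1, 2007, 1), (2, 2007, 2);
    INSERT INTO dim_product VALUES (1, 'Policy A');
    INSERT INTO dim_location VALUES
        (1, 'Portland', 45.52, -122.68), (2, 'Calgary', 51.05, -114.07);
    INSERT INTO fact_sales VALUES (1, 1, 1, 100.0), (2, 1, 1, 150.0), (2, 1, 2, 80.0);
""")

# Total the facts per location; the coordinates come along for mapping.
totals = conn.execute("""
    SELECT d.city, d.latitude, d.longitude, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_location AS d ON d.location_id = f.location_id
    GROUP BY d.location_id
    ORDER BY d.city
""").fetchall()
conn.close()
print(totals)
```

Because location is just another dimension, any summary of the facts can be grouped by it and placed on a map.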
Spatially-enabling a data warehouse
• Data warehouses can be spatially-enabled in several ways.
– The data in the warehouse can have spatial attributes,
supporting mapping. Mapping functions are built into
some data warehouse packages.
– “Slicing and dicing” and what-if spreadsheet-like
functions are performed on the data in the warehouse,
and may include spatial characteristics.
• Technically, this follows the OLAP data management
model, which was proposed originally in the 1990s
by Codd.
– Furthermore, the data warehouse can be linked to GIS,
data mining, and other software packages for more
spatial and numerical analysis.
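The "slicing and dicing" mentioned above can be sketched on a tiny set of facts in plain Python. The dimensions (year, region, product) and the amounts are made up for illustration, and real OLAP tools offer far more than these three helper functions.

```python
from collections import defaultdict

# Hypothetical warehouse facts: (year, region, product, amount).
facts = [
    (2006, "North", "auto", 120.0),
    (2006, "South", "auto", 90.0),
    (2007, "North", "auto", 130.0),
    (2007, "North", "home", 60.0),
    (2007, "South", "home", 40.0),
]

def slice_facts(rows, year):
    """Slice: fix one dimension (year) at a single value."""
    return [r for r in rows if r[0] == year]

def dice_facts(rows, years, regions):
    """Dice: keep a sub-cube defined by value sets on several dimensions."""
    return [r for r in rows if r[0] in years and r[1] in regions]

def roll_up(rows, dim_index):
    """Roll up: total the amount along one dimension."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[dim_index]] += r[3]
    return dict(totals)

by_region_2007 = roll_up(slice_facts(facts, 2007), 1)
north_only = dice_facts(facts, {2006, 2007}, {"North"})
print(by_region_2007)
print(len(north_only))
```

Here region stands in for the spatial characteristic; in a spatially enabled warehouse it could equally be a census block or a set of coordinates.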
The Data Warehouse and Its Data Flows,
Spatial Functions and Components
Spatial data warehouse: Example
in Auto Insurance
• Spatial data warehouses can be built for large-scale
analysis of auto insurance.
• In this real-world example, the data warehouse resides
in Oracle Spatial 11g.
• The business items in the data warehouse have location
attributes that include census blocks, locations of
policies, business sites, landmarks, elevation, and traffic
characteristics.
• For data warehouses in auto risk insurance, maps can
be produced that take spatial views from the usual ZIP-code
geography down to hundreds of block groups,
small areas within the ZIPs (Reid, 2006).
• This allows underwriters to set more refined policy
pricing. The geoprocessing needs to be fast, with many tens
of millions of location records processed per day (Reid,
2006).
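Why a finer geography supports more refined pricing can be seen in a small sketch. The claim records, the area labels, and the costs below are entirely made up; they are not data from the insurance example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical claim records: (zip_area, block_group, claim_cost).
claims = [
    ("ZIP-A", "BG-1", 500.0),
    ("ZIP-A", "BG-1", 700.0),
    ("ZIP-A", "BG-2", 2400.0),
    ("ZIP-A", "BG-2", 2600.0),
]

def average_by(records, key_index):
    """Average claim cost grouped by the chosen geography column."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    return {k: mean(v) for k, v in groups.items()}

by_zip = average_by(claims, 0)
by_block_group = average_by(claims, 1)
print(by_zip)
print(by_block_group)
```

In this toy data the ZIP-level average hides a large difference between the two block groups, which is the kind of detail an underwriter would want when pricing.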
Example of City of Portland
• The data consist of city and regional traffic
accidents from the Oregon Department of
Transportation.
• The solution combined an SQL Server data
warehouse with a customized program written in
ArcObjects API (application programming
interface) from ESRI Inc.
• There is a pre-defined schema of non-spatial
and spatial attributes for transport of data
between the data warehouse and the ArcObjects
program.
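A pre-defined schema that mixes non-spatial and spatial attributes might look like the sketch below. The field names and values are hypothetical, nothing here uses the ArcObjects API, and the actual Portland schema is not given in the slides.

```python
from dataclasses import dataclass, asdict, fields

@dataclass(frozen=True)
class AccidentRecord:
    # Non-spatial attributes (hypothetical names).
    accident_id: int
    year: int
    cause: str
    weather: str
    # Spatial attributes (hypothetical names).
    x: float
    y: float

def from_row(row: dict) -> AccidentRecord:
    """Build a record from a warehouse row, rejecting rows that break the schema."""
    expected = {f.name for f in fields(AccidentRecord)}
    if set(row) != expected:
        raise ValueError(f"row keys {sorted(row)} do not match schema {sorted(expected)}")
    return AccidentRecord(**row)

row = {"accident_id": 1, "year": 2001, "cause": "speeding",
       "weather": "rain", "x": 7643210.0, "y": 684530.0}
record = from_row(row)
print(asdict(record) == row)
```

Agreeing on the field list in advance lets the warehouse side and the GIS side exchange records without either one guessing at the other's layout.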
Example of City of Portland (cont.)
• The city’s spatial data warehouse for city and regional
traffic accidents has over fifteen years of data and
fourteen dimensions, including time, streets, age and
gender of participants, cause, surface, and weather.
• The volume of data is huge, so attention was given to
mitigating performance bottlenecks (SQL Server
Magazine, 2002).
• A customized program allows the GIS software to
utilize part or all of the data warehouse.
• The benefits of this data-warehouse/GIS approach
included halving of replication time for a time slice of
data, fast spatial queries, and response times
shortened by twenty-fold or more.
Spatial Data Quality
• No matter how sophisticated the storage and access of data, for its
ultimate use, the data are only as good as their quality.
• An example from the field of medicine is preventable deaths from
medical errors, which were estimated at 44,000 to 98,000 Americans
yearly (Institute of Medicine, cited in Pierce, 2003).
• Likewise with GIS, the impacts of poor data quality can be profound.
– What if a governmental spatial system tracking shipments of
nuclear materials has errors so that it recommends the wrong
nuclear shipment routes, compromising security?
– In business, what if an insurance underwriter receives erroneous
data from a spatial database about a large customer commercial
property and prices the property policy too low?
– What if a private health-care firm’s ambulance routing software is
inaccurate for a section of a city, adding crucial minutes to the
transport of critically ill patients?
• Data quality is a crucial topic for the success of GIS.
• Management has the responsibility to exercise control and maintain
data quality.
Spatial Data Has Distinctive
Considerations for Achieving Data Quality
(1) Spatial completeness. Are there sufficient types and
numbers of spatial features for the problem at hand?
(2) Coverage. Does the geographic extent of the data
correspond to the extent of the problem at hand? Are the
geographic features consistent in the procedures used to
locate them across the whole coverage?
(3) Transforming spatial data. When data are aggregated,
joined, split apart, and queried in the data transformation
inside databases and data warehouses, errors can occur
leading to erroneously transformed results.
(4) Accuracy. This can be divided (Tomlinson, 2003) into
referential (error in referring to a spatial feature), topological
(error in representing the topology, such as a broken
line segment), relative (two features are not located
correctly with respect to each other), and absolute (error in
the map position relative to the true earth position).
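Some of the considerations above lend themselves to simple automated checks. The sketch below covers a coverage check, an absolute-accuracy measure, and a topological check for a broken line; the function names, the tolerance parameter, and the sample coordinates are assumptions for illustration, and production data-quality tools are far more thorough.

```python
import math

def outside_extent(points, extent):
    """Coverage check: return points that fall outside (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    return [(x, y) for x, y in points
            if not (xmin <= x <= xmax and ymin <= y <= ymax)]

def rmse(measured, reference):
    """Absolute accuracy: root-mean-square positional error against reference positions."""
    squared = [(mx - rx) ** 2 + (my - ry) ** 2
               for (mx, my), (rx, ry) in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))

def broken_joins(segments, tolerance=0.0):
    """Topological check: indices where a segment's end does not meet the next start."""
    gaps = []
    for i in range(len(segments) - 1):
        end, start = segments[i][1], segments[i + 1][0]
        if math.dist(end, start) > tolerance:
            gaps.append(i)
    return gaps

points = [(1.0, 1.0), (12.0, 3.0)]
print(outside_extent(points, (0.0, 0.0, 10.0, 10.0)))
print(rmse([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))
line = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (2.0, 0.0)), ((2.5, 0.0), (3.0, 0.0))]
print(broken_joins(line))
```

Checks like these only flag suspect records; deciding whether a flagged feature is actually wrong still takes a person who knows the data.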
Summary
• Data management is essential to GIS success.
• Each of the relational, object-oriented, and
object-relational data models has pluses and
minuses and is appropriate for certain problems.
• Data warehouses contrast with databases in
being non-volatile and storing data historically.
• Data quality issues permeate data
management, since the use of data is
compromised if quality is low.
• The data management issues of GIS are similar
to those of IS in most ways, but the additional
need to handle spatial data makes GIS different
and unique.