Managing Spatial Data
Chapter 8
Slides from James Pick, Geo-Business: GIS in the Digital Organization, John Wiley and Sons, 2008.
Copyright © 2008 John Wiley and Sons. DO NOT CIRCULATE WITHOUT PERMISSION OF JAMES PICK

Topics covered
• The lecture concerns the design and uses of spatial databases and data warehouses.
• The relational, object-oriented, and object-relational database models are explained and compared on their pluses and minuses. The first two are covered in Connections-A. The relational model is the one most frequently used for GIS.
• Data warehouses have different uses than databases. They are used to archive huge amounts of data over a long period of time. Their data design is simpler than that of databases.
• Spatial data warehouse examples are given for an auto insurance firm and the City of Portland.
• The final topic is data quality. GIS professionals and users need to scrutinize data continually for errors and correct them.

Review - Design Elements of a GIS
Relational databases involve the manipulation of the attribute tables and intermediate tables that are created (Source: Pick, 2007)

Relationship of Spatial Data and Attribute Data (Fig. 8.2)

Databases: Relational Model
• The relational model is based on organizing data into a series of tables. For each table, the rows represent records, while the columns indicate attributes.
– An example is the attribute tables in ArcGIS.
– The tables are related to each other through relational operators. For instance, the Join operator joins two tables together. Many operators can work in sequence to support complex spatial data manipulation (think of some of the manipulations in the lab from ArcGIS).
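
The idea of relational operators working in sequence can be sketched in a few lines of Python. This is a minimal illustration only, not how ArcGIS or any database implements these operators. The supplier and site tables, their column names, and the coordinate values are hypothetical, as are the helper functions join, select, and project.

```python
# Hypothetical tables: every name and value below is illustrative only.
suppliers = [
    {"supplier_id": 1, "name": "Acme Parts", "site_id": 10},
    {"supplier_id": 2, "name": "Baker Tools", "site_id": 20},
    {"supplier_id": 3, "name": "Coastal Steel", "site_id": 10},
]
sites = [
    {"site_id": 10, "city": "Portland", "x": -122.68, "y": 45.52},
    {"site_id": 20, "city": "Calgary", "x": -114.07, "y": 51.05},
]


def join(left, right, key):
    """Join: combine rows of two tables that share the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def select(table, predicate):
    """Select: keep only the rows (records) that satisfy a condition."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Project: keep only the named columns (attributes)."""
    return [{c: row[c] for c in columns} for row in table]


# Three operators applied in sequence, each producing a new intermediate table.
joined = join(suppliers, sites, "site_id")
portland = select(joined, lambda row: row["city"] == "Portland")
result = project(portland, ["name", "x", "y"])

for row in result:
    print(row)
```

Each step yields a new table, and the final table still carries the X,Y coordinates, so its rows remain spatially referenced and could be mapped.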

Example of Relational Tables and Sequence of Operations by Three Relational Operators on Supplier Data (Fig. 8.3)
Relational Model: Based on queries, tables are sliced, diced, and combined to yield new tables. The starting and ending attributes are spatially referenced (X,Y coordinates).

Databases: Object-oriented Data Model
• The key unit is the object, which represents a real-world “thing” having attributes and behaviors. It is able to relate to and communicate with other objects by sending messages to them that activate their behaviors.
• The objects can be organized into hierarchical classes, so that characteristics can be inherited by lower objects from higher ones.
• The object-oriented model is more suitable for models applied to rapidly changing environments having complex behaviors.
• Object-oriented programming languages, such as Visual Basic, Java, and C++, allow the building and manipulating of objects, including spatial objects. GIS software is programmed today in object-oriented languages, and developers can customize GIS applications by using these languages.

Spatial Object and Example (Fig. 8.4)

Spatial Objects Showing Multiplicity and Class Hierarchy

Databases: Object-Relational Data Model
• In this model, object-oriented capabilities complement a relational database.
• Relational tables remain the place for data storage, but the relational model can interact with some object-oriented functionality on top.
• This model appears in major commercial products, including Oracle Spatial 10g and ESRI’s Geodatabase model of ArcGIS, which is mixed object-relational.

(Source: Modified from Tomlinson, 2003)

TABLE 8.5 Appropriate Data Model for Certain Data Modeling Situations in Business (cont.)

Oracle Spatial 11g
• Oracle Spatial 11g is a spatial relational database that is a version of the standard Oracle 11 Database product, which is among the leading databases for medium and large businesses. Note: 11g recently superseded 10g.
• Within Oracle Spatial 11g, a number of spatial functions are available.
• Among them is the geodetic function, which supports the use of the latitude/longitude coordinate system, while other functions handle indexing, partitioning, and aggregation for spatial data. Relational spatial operators can change and transform spatial data.
• Oracle Spatial 11g can be a great choice for a large business that has invested heavily in its Oracle mainframe databases but doesn’t need a lot of spatial functionality.
– That said, Oracle Spatial 11g’s functionality is improving year by year, and today could be classified as moderate.

New York City’s Integrated Data Architecture Using Oracle Spatial 11g
• New York City standardized on Oracle Spatial 11g. The justification was that Oracle had for some time supported the non-spatial, heavy-duty database processing for the city.
• The decision to centralize its spatial applications on Oracle Spatial 11g as the main repository leveraged the city’s existing Oracle knowledge and skill base, and offered the capacity to support a very large spatial processing demand.

Oracle Spatial 11g Integrated Data Architecture for New York City (Fig. 8.9) (Source: GITA)

Pluses and Minuses of Oracle Spatial 10g (or 11g)
• The pluses are the potential for high-volume spatial applications in the enterprise environment, and the potential in large IT shops to leverage the Oracle knowledge already present.
• A minus for the GIS or IT department of a smaller enterprise is that it may not have the knowledge or skills to support Oracle Spatial.
• Another minus is that the spatial features are only moderate and the GIS interface may be less friendly than for some other packages.
• Perhaps the greatest deterrent is the high cost of Oracle databases.

Enmax Case Study in Geo-Business
• Enmax is a private corporation wholly owned by the City of Calgary in Canada.
• It serves a territory around Calgary of 422 square miles and has over 360,000 customers.
• It distributes natural gas and electricity and has started an initiative in wind energy.
• Its process of adopting an enterprise approach with Oracle Spatial 10g is the focus of this case.

Enmax Database Configuration (Fig. 8.12) (Source: Lawrence, 2005)

Spatial Data Warehouses
• A data warehouse is oriented toward a subject-oriented view of data rather than a query-oriented one. It receives data from one or multiple relational databases, stores large or massive amounts of data, and emphasizes permanent storage of data received over periods of time.

Data Warehouse Star Schema Including Location

Spatially-enabling a data warehouse
• Data warehouses can be spatially enabled in several ways.
– The data in the warehouse can have spatial attributes, supporting mapping.
Mapping functions are built into some data warehouse packages.
– “Slicing and dicing” and what-if, spreadsheet-like functions are performed on the data in the warehouse, and may include spatial characteristics.
• Technically, this follows the OLAP data management model, which was originally proposed in the 1990s by Codd.
– Furthermore, the data warehouse can be linked to GIS, data mining, and other software packages for more spatial and numerical analysis.

The Data Warehouse and Its Data Flows, Spatial Functions and Components

Spatial data warehouse: Example in Auto Insurance
• Spatial data warehouses can be built for large-scale analysis of auto insurance.
• In this real-world example, the data warehouse resides in Oracle Spatial 11g.
• The business items in the data warehouse have location attributes that include census blocks, locations of policies, business sites, landmarks, elevation, and traffic characteristics.
• For data warehouses in auto risk insurance, maps can be produced that take spatial views from the usual ZIP code geography down to hundreds of block groups, which are small areas within the ZIP codes (Reid, 2006).
• This allows underwriters to set more refined policy pricing. The geoprocessing needs to be fast, with many tens of millions of location data items processed per day (Reid, 2006).

Example of City of Portland
• The data consist of city and regional traffic accidents from the Oregon Department of Transportation.
• The solution combined an SQL Server data warehouse with a customized program written with the ArcObjects API (application programming interface) from ESRI Inc.
• There is a pre-defined schema of non-spatial and spatial attributes for transport of data between the data warehouse and the ArcObjects program.

Example of City of Portland (cont.)
• The city’s spatial data warehouse for city and regional traffic accidents has over fifteen years of data and fourteen dimensions, including time, streets, age and gender of participants, cause, surface, and weather.
• The volume of data is huge, so attention was given to mitigating performance bottlenecks (SQL Server Magazine, 2002).
• A customized program allows the GIS software to utilize part or all of the data warehouse.
• The benefits of this data-warehouse/GIS approach included halving of replication time for a time slice of data, fast spatial queries, and response times shortened twenty-fold or more.

Spatial Data Quality
• No matter how sophisticated the storage and access of data, for their ultimate use, the data are only as good as their quality.
• An example from the field of medicine is preventable deaths from medical errors, which were estimated at 44,000 to 98,000 Americans yearly (Institute of Medicine, cited in Pierce, 2003).
• Likewise with GIS, the impacts of poor data quality can be profound.
– What if a governmental spatial system tracking shipments of nuclear materials has errors, so that it recommends the wrong nuclear shipment routes, compromising security?
– In business, what if an insurance underwriter receives erroneous data from a spatial database about a large customer’s commercial property and prices the property policy too low?
– What if a private health-care firm’s ambulance routing software is inaccurate for a section of a city, costing crucial minutes in the transport of critically ill patients?
• Data quality is a crucial topic for the success of GIS.
• Management has the responsibility to exercise control and maintain data quality.

Spatial Data Has Distinctive Considerations to Achieve Data Quality
(1) Spatial completeness. Are there sufficient types and numbers of spatial features for the problem at hand?
(2) Coverage.
Does the geographic extent of the data correspond to the extent of the problem at hand? Are consistent procedures used to locate the geographic features across the whole coverage?
(3) Transforming spatial data. When data are aggregated, joined, split apart, and queried during data transformation inside databases and data warehouses, errors can occur, leading to erroneously transformed results.
(4) Accuracy. This can be divided (Tomlinson, 2003) into referential (error in referring to a spatial feature), topological (error in presenting the topology, such as a broken line segment), relative (two features are not located correctly relative to each other), and absolute (error in the map position relative to the true earth position).

Summary
• Data management is essential to GIS success.
• Each of the relational, object-oriented, and object-relational data models has pluses and minuses and is appropriate for certain problems.
• Data warehouses contrast with databases in being non-volatile and storing data historically.
• Data quality issues permeate data management, since the use of data is compromised if quality is low.
• The data management issues of GIS are similar to those of IS in most ways, but the additional need to handle spatial data makes GIS different and unique.