lecture 11 ppt

Introduction to Geographic Information Systems
Fall 2013 (INF 385T-28620)
Data Modeling,
Database Design
You are here
Food is there
Well, now it’s there
INF385T(28620) – Fall 2013 – Lecture 11
Outline
• The design process
• Describing a system of viewpoints
• Developing use cases
• GIS design stages
• Conceptual design
• Normalization example
• Database diagrams
• Building the database
• Prototype to production

Best practices: iterative, incremental development
[Figure: development cycle with the following stages]
• Ideas, Use Cases, Requirements
• Planning, Cost-Benefit, Risk Management
• Analysis, Design and Evaluation
• Development, Quality Control and Evaluation
• Deployment
• Maintenance, Evaluation
Note: This is only an illustration; your experience may differ!

Best practices: iterative approach
• Focus on current issues
  - Short iterations better reflect near-term internal and external environment
• Resolve misunderstandings early
• Resolve analysis, design and implementation disparities early
• More accurate overall project status
• Workload better distributed across life cycle
  - Medium- to long-term changes in environment factored in more easily through spiral approach

Starting out
• Define the project
• Get to know the client
  - Stakeholders and decision makers
• Build the design team
  - Analyst
  - Subject matter experts
  - Users
• Inventory tasks, products, data
  - Identify model
  - Develop Use Cases
  - Inventory data
GIS Design Seminar
• Teach client GIS concepts and design methods
• Introduce yourself and the design team
• Set realistic expectations for the GIS system
• Alleviate fears and concerns

Building a System… of Viewpoints
Viewpoints in “Reference Model - Open Distributed Processing (RM-ODP)” ISO/IEC 10746
Community Objectives
• Enterprise Viewpoint: business aspects (purpose, scope and policies). What for? Why? Who? When?
Abstract/Best Practices
• Information Viewpoint: information sources and models. What is it about?
• Computational Viewpoint: types of services and protocols. How does each bit work?
Implementation/Development
• Engineering Viewpoint: solution types, distribution infrastructure. How do the components work together?
• Technology Viewpoint: implementation system (hardware, software, distribution). With what?

Use cases
• A description of a task you want the system to perform
  - Add new water service, record parcel sale
• Basis of all analysis and design
  - Start simple; expand with detail later
• Analysis of use cases yields data, interfaces, applications
• Use cases can:
  - Capture existing work flows
  - Define new applications
  - Help understand alternative and pathological work flows
[Figure: a use case, “Notify Owners”, drawing data from the GIS Database]

Use case diagrams
• Show the actor/use case relationships
  - System architecture
  - Data flows, coordination
• Develop with users
  - During meetings, interviews
  - Clean up and refine later
• Graphical notation is helpful, but the use case document is the most important artifact
[Figure: use case diagram. Use cases inside the system boundary: Document locations of flooding; Geocode & map call list; Produce flood maps; Manage flood control structures. Actors: Emergency Call Center; GIS Analyst; Operations & Maintenance Staff. Callouts: “Use case and name”, “Actor and name”, “System boundary”.]

Example use case
• Some context: For an emergency response application, assume a set of use cases focused on exchanging information leading to creating, updating, and posting flood maps to interested agencies.
• Each use case is documented according to a template, such as:
  Use Case Name
  Description:
  Actors:
  Pre-conditions:
  Post-conditions:
  Flow of events:
  - business rules
  - user actions, responses
  Exceptions:
  Alternates:

Example use case
• Use case: Flooding Information & Response
• Description: The call center receives and documents citizen calls related to flooding during storm events. The information is geocoded, mapped, and provided to the Water District’s Operations & Maintenance staff, which makes decisions to manage control structures to mitigate flooding.
• Actors: Emergency Response Call Center, GIS Analyst, Operations & Maintenance Staff.
• Pre-conditions: The database of critical water facilities has been created. The Emergency Call Center has been activated for a storm event. Water district staff are operating under emergency operating procedures.
• Post-conditions: Status change notices are sent to the relevant agencies registered to receive updates.

Use case primary scenario
1. Citizen places a call to the Emergency Call Center hotline.
2. Call center staff document location and description of flooding problem.
3. GIS analyst receives and geocodes locations from the call center, producing a map of call locations.
4. Call reports are symbolized based on flooding issue. Maps are produced and handed off to the operations staff.
5. Operations & Maintenance staff review maps of flooding incidents and make decisions for operating control structures, gates, and pumps.

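The use case template and the flooding scenario above can be captured as a small data structure, which makes it easier to compare use cases later. This is only a sketch: the class name `UseCase`, its field names, and the helper `actors_missing_from_flow` are assumptions made for illustration, and the step text is abbreviated from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """One use case document, with fields following the slide template."""
    name: str
    description: str
    actors: list[str]
    pre_conditions: list[str]
    post_conditions: list[str]
    flow_of_events: list[str]
    exceptions: list[str] = field(default_factory=list)
    alternates: list[str] = field(default_factory=list)

    def actors_missing_from_flow(self) -> list[str]:
        """Actors that no step mentions by name (hypothetical review aid,
        using a plain case-insensitive substring match)."""
        text = " ".join(self.flow_of_events).lower()
        return [a for a in self.actors if a.lower() not in text]


flooding = UseCase(
    name="Flooding Information & Response",
    description="Call center documents flooding calls; locations are "
                "geocoded, mapped, and passed to O&M staff.",
    actors=["Emergency Response Call Center", "GIS Analyst",
            "Operations & Maintenance Staff"],
    pre_conditions=["Database of critical water facilities exists",
                    "Emergency Call Center is activated for a storm event"],
    post_conditions=["Status change notices are sent to registered agencies"],
    flow_of_events=[
        "Citizen places a call to the Emergency Call Center hotline.",
        "Call center staff document location and description of flooding.",
        "GIS analyst geocodes locations and produces a map of calls.",
        "Call reports are symbolized; maps go to the operations staff.",
        "Operations & Maintenance staff review maps and operate structures.",
    ],
)

# Flags the actor whose name in the Actors list differs from the scenario text.
print(flooding.actors_missing_from_flow())
```

A check like this is a cheap way to surface naming mismatches between the actor list and the scenario before they turn into misunderstandings.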
Are we finished yet?
• What layers will you need based on the use case?
• Where will you get these layers?
• Are there any changes that can be made to the business process?

Using the use cases
• From the set of use cases developed, the functional requirements and interfaces can be fleshed out.
• From an understanding of the collaborators and stakeholders involved in the use cases, appropriate data sources and maintenance authorities can be determined.
• From a comparison of all the use cases, redundant information and tasks can be discovered and minimized.
• From an examination of potential alternate scenarios, pathological situations can be anticipated and mitigated.
• BUT… beware the use case time sink
  - You cannot completely and correctly document all the use cases for a reasonably complex system in your lifetime
  - Keep it simple, and start prototyping as soon as you can; this will further inform the use cases and keep your project moving

GIS DESIGN
Designing the database
• Conceptual model: collect information, identify desired themes and sources
• Logical model: map themes to GIS database elements; define database entities and organization
• Physical model: complete data organization, build full schema, test and refine
[Figure: business practices feed the conceptual model; the logical model shows a project with feature collections (Roads, Rail, Boundaries) and network, topology and relationship elements; the physical model is a set of tables]

GIS design practice
• Think about the GIS features represented by thematic layers, and about the integrity and behavior of those features:
  - Parcels are represented as polygons.
  - Parcels share geometry with boundaries.
  - Parcels do not overlap.
  - … etc.

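An integrity rule such as "parcels do not overlap" can be expressed as an automated check. The sketch below is a deliberate simplification: it stands in axis-aligned rectangles for real parcel polygons, and the names `Parcel`, `interiors_overlap`, and `overlapping_pairs` as well as the coordinates are made up for illustration. A production GIS would enforce this with topology rules on true polygon geometry.

```python
from itertools import combinations
from typing import NamedTuple


class Parcel(NamedTuple):
    """Simplified parcel: an axis-aligned rectangle (assumption for brevity)."""
    parcel_id: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def interiors_overlap(a: Parcel, b: Parcel) -> bool:
    """True if the rectangles share interior area.
    Touching along an edge (a shared boundary) does not count as overlap."""
    return (a.xmin < b.xmax and b.xmin < a.xmax
            and a.ymin < b.ymax and b.ymin < a.ymax)


def overlapping_pairs(parcels: list[Parcel]) -> list[tuple[str, str]]:
    """Return the IDs of every pair of parcels that violates the rule."""
    return [(a.parcel_id, b.parcel_id)
            for a, b in combinations(parcels, 2)
            if interiors_overlap(a, b)]


parcels = [
    Parcel("P1", 0, 0, 10, 10),
    Parcel("P2", 10, 0, 20, 10),   # shares an edge with P1: allowed
    Parcel("P3", 15, 5, 25, 15),   # cuts into P2: violates the rule
]
print(overlapping_pairs(parcels))  # [('P2', 'P3')]
```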
Conceptual design
• Entities, general relationships, important attributes
• Sketches
• ER/UML conceptual diagrams
• Spreadsheets
• Often reconstruct from existing systems/datasets
• Very important for complex projects
• Very useful to communicate with domain experts/business people

Conceptual design
• Purpose and usage of GIS
• Data sources
  - coverages, shapefiles, CAD, etc.
  - compilation scale and accuracy
• Spatial representation
  - raster, vector, surface, address
• Attributes
  - required fields, types of measurement
• Relationships
  - network, topological, general

Conceptual design
It is important to understand what you want to achieve from the outset
• Symbology and labels
  - what symbols at which scales
  - text presentation on the map
• Spatial reference
  - projection and datum
  - the largest area mapped
  - required detail and resolution
• Special design cases, for example:
  - condominium parcels
  - parcel annotation

Diagramming themes
• Classic layer diagrams
  - Organize data into logical units
  - Focus on common data elements to help determine:
    - Attributes
    - Associations
    - Spatial relationships
[Figure: layer stack for a Water Use Application: Hydrology, Utilities, Boundaries, Roads]

Documenting themes
Layer: Parcels
  Map use: Parcels define land ownership and are used for taxation
  Data source: Compiled from land ownership transactions and cadastral records
  Representation: Polygons
  Spatial relationships: Parcel polygons do not overlap
  Map scale, accuracy, currency: 1:2400, +/- 5 ft, quarterly update
  Symbology and annotation: Labeled or annotated with house number and street name
Layer: Streets
  Map use: Define the street centerline network
  Data source: Public or commercial data products or various government agencies
  Representation: Polylines
  Spatial relationships: Streets intersect only at endpoints and generally do not overlap
  Map scale, accuracy, currency: 1:12000, +/- 10 ft, semiannual update
  Symbology and annotation: Symbolized according to road classification, labeled with street name

Inventory existing data
• Model database schema from existing data
• Bridge existing data with current technology, for example:
  - Boundaries hold survey attributes
  - Coverage parcel polygons only exist for regions
[Figure: legacy data mapped to target data layers in the GIS Database: Annotation, Boundaries, Lots, Parcels, PLSS Monument, PLSS Quarter, PLSS Section, PLSS Township]

Lecture 11
DATABASE DESIGN
Files, databases, and GIS
• Data files contain text or other data in arbitrary formats
• Data tables contain records with fields (attributes, data items) identified by a primary key
• Relational Database Management System (RDBMS or just DBMS):
  - creates and maintains relationships between data tables
  - allows one or more users to create or edit data in the tables
  - allows users to sort, select, and retrieve information using QUERIES and REPORTS
• GIS adds a spatial dimension to databases, by integrating location and geometric shape information with the tables

Relational database
• A formal information model called “relational”
• Tables can have formal & ad hoc relationships, based on:
  - Rows and columns
  - Known column types
  - Relationships
  - SQL language and operators
• Relational is based on a simple, generic model with many implementations (MS Access, IBM DB2, Oracle, MS SQL Server, and many others)
[Figure: sample tables of short text and numeric values linked by shared values]

Data tables
Organized into columns, rows, and cells (like a spreadsheet)
• Columns = attributes = fields = data items
• Rows = records
• Cells = values
[Figure: table with callouts “Attribute or Column”, “Record or Row”, “Cell or Value”]

Defining columns
To define a column or attribute, you must specify the column name and type
All DBMSs support basic types:
• Number (integer, float, decimal)
• String (text)
• Boolean (Yes/No)
• Date
Many DBMSs (SQL-99) and GIS systems support additional types (BLOB, XML, time series, …)

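As a minimal sketch of defining columns by name and type, the following uses Python's built-in `sqlite3` module. The table `permit`, its column names, and the sample row are hypothetical. SQLite is used only because it needs no setup; it records the declared type names but stores Boolean values as 0/1 and, in this sketch, dates as text, so other DBMSs will behave somewhat differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE permit (
        permit_no  INTEGER,  -- Number (integer)
        acres      REAL,     -- Number (float)
        applicant  TEXT,     -- String
        is_active  BOOLEAN,  -- Boolean (SQLite stores 0 or 1)
        issued_on  DATE,     -- Date (kept as text in this sketch)
        scan       BLOB      -- additional type: binary large object
    )
""")
conn.execute(
    "INSERT INTO permit VALUES (?, ?, ?, ?, ?, ?)",
    (1, 12.5, "A. Farmer", True, "2013-11-05", b"\x00\x01"),  # made-up row
)

# PRAGMA table_info lists each column with its declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(permit)")]
for name, declared_type in columns:
    print(f"{name:10} {declared_type}")
```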
Primary key
The field or combination of fields that identifies each and every record uniquely within a table
Note: A primary key is most often arbitrary and meaningless to users; its main purpose is to be unique

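The uniqueness guarantee can be seen in a short sketch using Python's built-in `sqlite3` module. The `owner` table and the names in it are made up. The key here is an arbitrary integer assigned by the database, so two owners with the same name remain distinct records, while an attempt to reuse a key value is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT)")

# The database assigns the key; it carries no meaning for users.
conn.execute("INSERT INTO owner (name) VALUES ('Pat Smith')")
conn.execute("INSERT INTO owner (name) VALUES ('Pat Smith')")  # same name, new key

try:
    # Reusing an existing key value violates uniqueness.
    conn.execute("INSERT INTO owner (owner_id, name) VALUES (1, 'Someone Else')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

keys = [row[0] for row in conn.execute("SELECT owner_id FROM owner ORDER BY owner_id")]
print(keys, duplicate_rejected)
```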
Water use permit example
• Paper-based application form required for permit to withdraw surface or ground water

Sample data for water use permits (slide 33)

Use codes

Relational organization
• Tables should be organized according to basic rules of relational design for most efficient use.
• Normalization is a series of steps followed to obtain a database design that allows for efficient access and storage of data in a relational database. These steps reduce data redundancy and the chances of data becoming inconsistent.
• 3NF or BCNF are the usual standards for relational database design; however, performance and convenience may drive toward de-normalization.

Database normalization steps
• First Normal Form (1NF) eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship.
• Second Normal Form (2NF) eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key.
• Third Normal Form (3NF) eliminates functional dependencies on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent on the key, the whole key and nothing but the key.
• Boyce-Codd Normal Form (BCNF) is sometimes applied as a stronger form of 3NF in which every determinant of a functional dependency within a relation must be a candidate key for the schema.
• Fourth Normal Form (4NF) separates independent multi-valued facts stored in one table into separate tables.
• Fifth Normal Form (5NF) breaks out data redundancy that is not covered by any of the previous normal forms.
Source: http://www.hyperdictionary.com/dictionary/database+normalisation

First normal form - NOT
• Do you see any groups of repeating columns?
  - What’s wrong with that?
  - Can you think of a case where this is okay?
• How would you reorganize to fix this?

First normal form
• The Use Code columns can be removed from the main table (A), and made into rows of a separate table (B), keyed by ActID. (compare with previous slide)
• (C) is a lookup table for use code descriptions.
[Figure: tables A, B and C with their primary keys and foreign keys labeled]

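The move to first normal form described above can be sketched with Python's built-in `sqlite3` module. The slide's actual table layout is not reproduced in the text, so the flat table `permits_flat`, its columns `UseCode1` to `UseCode3`, the codes `IRR` and `LIV`, and the table names for (B) and (C) are all assumptions for illustration. This sketch also keys table (B) on the pair (ActID, UseCode), which is one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 1NF: a repeating group of use code columns (hypothetical layout)
    CREATE TABLE permits_flat (
        ActID INTEGER PRIMARY KEY, UseCode1 TEXT, UseCode2 TEXT, UseCode3 TEXT);
    INSERT INTO permits_flat VALUES
        (101, 'IRR', 'LIV', NULL),
        (102, 'IRR', NULL, NULL);

    -- Table B: one row per (ActID, UseCode) pair
    CREATE TABLE act_use_codes (
        ActID INTEGER, UseCode TEXT, PRIMARY KEY (ActID, UseCode));

    -- Table C: lookup table for use code descriptions
    CREATE TABLE use_code_descriptions (
        UseCode TEXT PRIMARY KEY, Description TEXT);
    INSERT INTO use_code_descriptions VALUES
        ('IRR', 'Irrigation'), ('LIV', 'Livestock');
""")

# Turn each repeating column into rows of table B, skipping empty cells.
for column in ("UseCode1", "UseCode2", "UseCode3"):
    conn.execute(
        f"INSERT INTO act_use_codes "
        f"SELECT ActID, {column} FROM permits_flat WHERE {column} IS NOT NULL"
    )

rows = conn.execute("""
    SELECT b.ActID, c.Description
    FROM act_use_codes AS b
    JOIN use_code_descriptions AS c ON c.UseCode = b.UseCode
    ORDER BY b.ActID, c.Description
""").fetchall()
print(rows)
```

After the conversion, an application can carry any number of use codes without adding columns, and empty cells no longer need to be stored.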
Relationship cardinality
• With this design, one ActID can have any number of use codes, and any one use code can be associated with many ActIDs
  - This is called a Many-to-Many (M:M) relationship
  - This is much more space-efficient for data storage
• One record in the Use Code Descriptions table (C) can be associated with many records in the ActID-Use Codes table (B)
  - This is called a One-to-Many (1:M) relationship
• You may also have relationships with fixed cardinalities, such as 1:1, 1:2, 1:0..5, etc.
• Cardinality of 1:0 generally means “nulls are allowed”

Second normal form - NOT
• Do you see any dependencies between non-key columns and a partial key?
  - If the primary key were compound and included an OwnerID, there could be such a dependency between Owner and OwnerID
• What’s wrong with that?
• What would you do to fix this?
[Figure: table with the compound key labeled]

Second normal form
• Remove the non-key data to a separate table and link to it
• … and clean up the data while you’re at it!
  - Spelling, abbreviations, punctuation
  - Firstname Lastname vs. Lastname, Firstname

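A rough sketch of this step in plain Python: owner names are cleaned into one format, moved into their own lookup, and each application keeps only an OwnerID link. The sample rows and the helper `clean_name` are made up for illustration, and the cleanup rule is intentionally naive (for example, `str.title()` would mangle a name like "McDonald").

```python
def clean_name(raw: str) -> str:
    """Normalize 'Lastname, Firstname' and stray spacing to 'Firstname Lastname'."""
    raw = " ".join(raw.split())                      # collapse extra whitespace
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        raw = f"{first} {last}"
    return raw.title()


applications = [  # made-up rows: the owner name is repeated on each application
    {"ActID": 101, "Owner": "Smith, Pat"},
    {"ActID": 102, "Owner": "pat  smith"},
    {"ActID": 103, "Owner": "Lee Wong"},
]

owners: dict[str, int] = {}   # cleaned owner name -> OwnerID
linked = []                   # applications holding a foreign key, not a name
for row in applications:
    # setdefault returns the existing ID, or assigns the next one for a new owner
    owner_id = owners.setdefault(clean_name(row["Owner"]), len(owners) + 1)
    linked.append({"ActID": row["ActID"], "OwnerID": owner_id})

print(owners)
print(linked)
```

Both spellings of the first owner collapse to a single record, which is exactly the duplication that cleaning up during normalization is meant to remove.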
Third normal form - NOT
• Do you see any functional dependencies among non-key fields in the table below?
  - Need we ask again: what’s wrong with this?
• How would you reorganize to fix this?

Third normal form
• Remove the source description to a separate table, and join using the source code field
• This will reduce duplication of data (and errors)

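The third normal form step can be sketched with Python's built-in `sqlite3` module. The table names, the column names `SourceCode` and `SourceDescription`, and the codes `G` and `S` are assumptions for illustration, since the slide's table is not reproduced in the text. The point is that each description is stored once, so a correction is a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source descriptions live in their own table (hypothetical codes)
    CREATE TABLE sources (
        SourceCode TEXT PRIMARY KEY, SourceDescription TEXT);
    CREATE TABLE applications (
        ActID INTEGER PRIMARY KEY,
        SourceCode TEXT REFERENCES sources(SourceCode));
    INSERT INTO sources VALUES ('G', 'Ground water'), ('S', 'Surface water');
    INSERT INTO applications VALUES (101, 'G'), (102, 'S'), (103, 'G');
""")

# Fixing a description touches one row, not every application that uses it.
conn.execute(
    "UPDATE sources SET SourceDescription = 'Groundwater' WHERE SourceCode = 'G'")

rows = conn.execute("""
    SELECT a.ActID, s.SourceDescription
    FROM applications AS a
    JOIN sources AS s USING (SourceCode)
    ORDER BY a.ActID
""").fetchall()
print(rows)
```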
Is that all there is to it?
Name    City       ST  PostalCode
John    Denver     CO  80031
Patty   Seattle    WA  98107
Smith   Vancouver  BC  V6C 1T2
• This table is NOT in Third Normal Form:
  - The City and ST fields are dependent on the PostalCode field, which is not a key
• To place this table in 3NF, a separate table would be created for the City and ST fields, and joined using the PostalCode field
  - But this is generally not done with address & postal codes… WHY?

Normalization tradeoffs
• When would you expect to normalize tables?
  - For primary data entry and updates; easier to set up and manage data integrity validation
    - Such as name and address subfields
  - To support more kinds of ad hoc queries
• When would you expect to denormalize?
  - For presentation of data to users
  - To reduce the number of table-joins for faster performance
    - Queries are known and fixed
    - Better performance for web publishing
    - Database views are often used to flatten relationship structure for read-only access

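A database view is one way to get a flattened, read-only presentation without giving up the normalized tables. The sketch below uses Python's built-in `sqlite3` module; the view name `application_report` and the sample rows are made up, while the table and column names follow the Owners and Applications entities used elsewhere in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (OwnerID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE applications (
        ApplicationID INTEGER PRIMARY KEY, OwnerID INTEGER, ApplicationType TEXT);
    INSERT INTO owners VALUES (1, 'Smith'), (2, 'Wong');
    INSERT INTO applications VALUES (101, 1, 'New'), (102, 2, 'Renewal');

    -- The view hides the join, so report users see one flat table.
    CREATE VIEW application_report AS
        SELECT a.ApplicationID, a.ApplicationType, o.LastName
        FROM applications AS a
        JOIN owners AS o ON o.OwnerID = a.OwnerID;
""")

report = conn.execute(
    "SELECT * FROM application_report ORDER BY ApplicationID").fetchall()
print(report)
```

Editing still happens in the normalized tables; the view only changes how the data is presented to readers.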
Lecture 11
DATABASE DIAGRAMMING
Database diagramming: conceptual / logical overview
• Database relationships and cardinality can be diagrammed for prototyping and documentation
• The “*” on an association link means “many”
[Figure: Owners (1) to (*) Applications; Applications (*) to (*) Use Codes; Use Codes (*) to (1) Use Code Descrips]

Database diagramming: Entity-Relationship (E-R) or Unified Modeling Language (UML)
[Figure: the same diagram with fields listed for each entity]
• Owners (1 to * Applications)
  *OwnerID, FirstName, LastName, Phone, StreetAddr, City, State, PostalCode
• Applications (* to * Use Codes)
  *ApplicationID, OwnerID, ApplicationType, ProjectLocation, BusinessName, BusinessType, …
• Use Codes (* to 1 Use Code Descrips)
  *RowID, UseCode, ApplicationID
• Use Code Descrips
  *UseCode, Description

Normalization tradeoff: referential integrity
• Suppose you removed a record from the Owner table
  - What should be done with the related records from the Applications table?
  - Would this be easier or harder to manage than with the de-normalized design on slide 33, “Sample data for water use permits”?
• The more tables are interconnected by relationships, the greater the need to support referential integrity within your applications
  - A DBMS’s default support for referential integrity may be very basic, such as placing Nulls in associated foreign key fields, and only when a relationship is declared

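The effect of declaring (or not declaring) a relationship can be sketched with Python's built-in `sqlite3` module. The function name `owner_ids_after_delete` and the sample rows are made up for illustration. SQLite is only one example of DBMS behavior: it enforces foreign keys only after `PRAGMA foreign_keys = ON`, and `ON DELETE SET NULL` is the "place Nulls in the foreign key" policy mentioned above.

```python
import sqlite3


def owner_ids_after_delete(declare_relationship: bool) -> list:
    """Delete an owner, then return the OwnerID values left in applications."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    fk = ("REFERENCES owners(OwnerID) ON DELETE SET NULL"
          if declare_relationship else "")
    conn.executescript(f"""
        CREATE TABLE owners (OwnerID INTEGER PRIMARY KEY, LastName TEXT);
        CREATE TABLE applications (
            ApplicationID INTEGER PRIMARY KEY, OwnerID INTEGER {fk});
        INSERT INTO owners VALUES (1, 'Smith');
        INSERT INTO applications VALUES (101, 1);
        DELETE FROM owners WHERE OwnerID = 1;
    """)
    return [row[0] for row in conn.execute("SELECT OwnerID FROM applications")]


# Declared relationship: the DBMS nulls the foreign key for us.
print(owner_ids_after_delete(True))
# Undeclared: the application is left pointing at an owner that no longer exists.
print(owner_ids_after_delete(False))
```

In the undeclared case nothing fails at delete time; the dangling reference only shows up later, which is why the application itself then has to guard integrity.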
Lecture 11
BUILDING THE DATABASE
Prototype prototype prototype…
• Critical for validating your data model and applications
• An easy way to discover project requirements
• Don’t plan a lot of time for this, just do it!
• Prototype in the simplest environment to learn the most in the least time
  - Validate that thematic choices, schema & integrity rules support your requirements
  - Reduce data management overhead with personal, single-user system

Database environments
• Production/Publishing
  - Read-only copies of databases
  - Used by majority of users
  - Contains custom views of databases
• Development/Maintenance
  - Where compilation and editing occur
  - Normalized for greater integrity enforcement
  - May have multiple environments by data model (cadastral/land use, transportation, utilities, hydro…)
• Design/Test
  - Prototype validation, load testing
  - Test changes to the development environment in isolation, so as not to corrupt the development system

Large projects can seem like this…
[Figure: chain-reaction machine: 1. Burning toast… smoke alarm… etc, etc… …fill glass!]

But they can be simplified with common data models
• Should have:
  - Simple structure with most common elements across a set of applications in a user community
  - Minimal rules, custom behavior, or cross-dependencies
  - May include collections or sets or systems of feature classes, e.g., networks, topologies, terrains
• Should lend themselves to:
  - Web distribution
  - Incremental, multi-user data updates
  - User-side fusion, densification, value-adding
• Early versions of data models (Arc 8 & 9): http://support.esri.com/en/knowledgebase/techarticles/detail/40585
• Current approach: http://resources.arcgis.com/en/communities/

Summary
• The design process
• Describing a system of viewpoints
• Developing use cases
• GIS design stages
• Conceptual design
• Normalization example
• Database diagrams
• Building the database
• Prototype to production