L8- The GIS Database

L10 - The GIS Database –
Part 1
Chapter 8
Entity
• Bangor
– Penobscot County, Maine, United States
– Centroid 44.801N, 68.778W
– Area 34.4 square miles
– Elevation 158 feet
– Population 31,473
Standard GIS Data Model
Linked spatial and attribute data.
What is a database
A database is any organized collection of
data. Some examples of databases you
may encounter in your daily life are:
– a telephone book
– T.V. Guide
– airline reservation system
– motor vehicle registration records
– papers in your filing cabinet
– files on your computer hard drive.
Data vs. information:
What is the difference?
• What is data?
– Data can be defined in
many ways. Information
science defines data as
unprocessed information.
• What is information?
– Information is data that
have been organized and
communicated in a
coherent and meaningful
manner.
– Data is converted into
information, and
information is converted
into knowledge.
– Knowledge is information
evaluated and organized
so that it can be used
purposefully.
Database Definitions
What is a database?
It’s an organized collection of data.
What is a database management system?
A database management system (DBMS) provides
you with the software tools you need to organize that data in
a flexible manner. It includes tools to add, modify, or delete
data from the database, ask questions (or queries) about the
data stored in the database, and produce reports
summarizing selected contents.
What is the ultimate purpose of
a database management
system?
To transform:
Data → Information → Knowledge → Action
Features of a DBMS
• Database management systems provide
features to maintain a database:
– Data independence
– Integrity and security
– Transaction management
– Concurrency control
– Backup and recovery
– A language for the creation and querying of
the database
– A language for writing application programs
Selecting a Database
Management System
Database management systems (or DBMSs) can be
divided into two categories -- desktop databases and
server databases.
• Generally speaking, desktop databases are oriented
toward single-user applications and reside on standard
personal computers (hence the term desktop).
• Server databases contain mechanisms to ensure the
reliability and consistency of data and are geared toward
multi-user applications.
Database Terminology
Relational Databases
• The relational database model is the
dominant model in both the corporate and GIS
worlds, due to its flexibility, organization, and
functioning.
• It was defined by Edgar F. Codd (1970).
• It can accommodate a wide range of data types.
• It is not necessary to know beforehand the types
of processing that will be performed on the
database.
Relational Database Terminology
• Each table contains the data for a single entity.
• Each instance of an entity is a row/record/tuple
in a table.
• Columns contain attributes/fields that describe
the entity.
– All values of an attribute must come from the same
domain (text, integer, date).
– Column order has no significance.
• Tables are related through keys.
Attributes
• An entity is represented by a set of attributes,
that is descriptive properties possessed by all
members of an entity set.
• Domain – the set of permitted values for each
attribute
• Attribute types:
– Simple and composite attributes.
– Single-valued and multi-valued attributes
• E.g. multivalued attribute: phone-numbers
– Derived attributes
• Can be computed from other attributes
• E.g. age, given date of birth
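A derived attribute is usually computed when needed rather than stored. The following is a minimal sketch, assuming a hypothetical person record that stores only a date of birth; the function name age_on and the sample dates are illustrative and do not come from the slides.

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in whole years from a stored date of birth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

# Hypothetical record: only date_of_birth is stored; age is derived on demand.
person = {"name": "Tom", "date_of_birth": date(1990, 6, 15)}
age = age_on(person["date_of_birth"], date(2024, 6, 14))
```

Because the age is recomputed from the stored attribute, it cannot drift out of date the way a stored age column could.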
Keys
• A super key of an entity set is a set of one or
more attributes whose values uniquely
determine each entity.
• A candidate key of an entity set is a minimal
super key
– Customer-id is candidate key of customer
– account-number is candidate key of account
• Although several candidate keys may exist, one
of the candidate keys is selected to be the
primary key.
Keys
• A composite key is a key with more than
one attribute.
• A foreign key is an attribute that is a key of
one or more relations other than the one in
which it appears.
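The key types above can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not a reference schema: the table layout follows the customer/account naming used in these slides, underscores replace the hyphens in attribute names, and the sample values ('Johnson', 'A-101', 500.0) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customer (
    customer_id   TEXT PRIMARY KEY,      -- primary key (the chosen candidate key)
    customer_name TEXT NOT NULL
);
CREATE TABLE account (
    account_number TEXT PRIMARY KEY,
    balance        REAL NOT NULL
);
CREATE TABLE depositor (
    customer_id    TEXT REFERENCES customer(customer_id),    -- foreign key
    account_number TEXT REFERENCES account(account_number),  -- foreign key
    PRIMARY KEY (customer_id, account_number)                -- composite key
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson')")
conn.execute("INSERT INTO account VALUES ('A-101', 500.0)")
conn.execute("INSERT INTO depositor VALUES ('192-83-7465', 'A-101')")

def try_insert(sql):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A second customer with the same primary key value is rejected.
duplicate_ok = try_insert("INSERT INTO customer VALUES ('192-83-7465', 'Smith')")
# A depositor row pointing at a customer that does not exist is rejected.
orphan_ok = try_insert("INSERT INTO depositor VALUES ('000-00-0000', 'A-101')")
```

Both rejected inserts show the DBMS, not the application, enforcing uniqueness and referential integrity.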
Keys
[Figure: example tables with their Primary Key and Foreign Key columns labeled]
Relational Algebra
• Codd’s specification of a relational
database relied on relational algebra.
• Relational algebra takes tables/relations
as inputs and returns tables as outputs.
• The algebra combines or splits tables by
rows or by columns to generate either a
subset of a table or an expanded table.
Fundamental building blocks
Tables comprise the fundamental building blocks of any
database.
The table above contains the employee information for an organization:
characteristics like name, date of birth, and title. Examine the construction of the
table and you'll find that each column of the table corresponds to a specific
employee characteristic (or attribute in database terms). Each row corresponds
to one particular employee and contains his or her information.
Relational Algebra
• Five basic operators
– select: σ
– project: π
– union: ∪
– difference: –
– Cartesian product: ×
• The operators take one or two
relations as inputs and produce a new
relation as a result.
Relational Algebra
• Derived Relational operators
– Intersection: ∩
– Divide (not used very often)
– Join
• These can be expressed using
different combinations of the
fundamental operators.
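The operators can be sketched in a few lines of Python. This is a teaching sketch under assumptions, not how a DBMS implements them: a relation is modeled here as a pair (attribute names, set of row tuples), and the sample relations r and s are hypothetical.

```python
from itertools import product as all_pairs

# A relation is modeled as (tuple of attribute names, set of row tuples).
def select(rel, predicate):
    attrs, rows = rel
    return attrs, {row for row in rows if predicate(dict(zip(attrs, row)))}

def project(rel, keep):
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    # Building a set removes duplicate rows, as in the algebra.
    return tuple(keep), {tuple(row[i] for i in idx) for row in rows}

def union(r, s):
    assert r[0] == s[0], "union needs identical attribute lists"
    return r[0], r[1] | s[1]

def difference(r, s):
    assert r[0] == s[0], "difference needs identical attribute lists"
    return r[0], r[1] - s[1]

def cartesian_product(r, s):
    return r[0] + s[0], {a + b for a, b in all_pairs(r[1], s[1])}

def intersection(r, s):
    # Derived operator built from a basic one: r ∩ s = r − (r − s)
    return difference(r, difference(r, s))

# Hypothetical sample relations.
r = (("id", "name"), {(1, "Tom"), (2, "John")})
s = (("id", "name"), {(2, "John"), (3, "Ann")})

names = project(r, ["name"])
common = intersection(r, s)
pairs = cartesian_product(project(r, ["id"]), project(s, ["name"]))
```

Every function takes relations in and returns a relation, which is what lets the operators be chained, as in the last line.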
Select Operation – Example
• Relation r

A   B   C   D
α   α   1   7
α   β   5   7
β   β   12  3
β   β   23  10

• σ A=B ∧ D>5 (r)
Select from relation r
where A=B AND D>5

A   B   C   D
α   α   1   7
β   β   23  10
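The same selection can be written in SQL, where the WHERE clause plays the role of the select operator. A minimal sketch using Python's built-in sqlite3 module; the symbol values α and β in columns A and B are an assumption about the slide's example relation, while the C and D values are taken from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (A TEXT, B TEXT, C INTEGER, D INTEGER)")
conn.executemany(
    "INSERT INTO r VALUES (?, ?, ?, ?)",
    # A and B symbol values are assumed; C and D come from the slide.
    [("α", "α", 1, 7), ("α", "β", 5, 7), ("β", "β", 12, 3), ("β", "β", 23, 10)],
)

# WHERE keeps only the rows that satisfy both conditions.
selected = conn.execute(
    "SELECT A, B, C, D FROM r WHERE A = B AND D > 5 ORDER BY C"
).fetchall()
```

The row with C = 5 fails A = B and the row with C = 12 fails D > 5, so two rows remain.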
Database Queries
• Queries may be made of one table or
several tables at the same time.
• In many systems querying is facilitated by
icons, menus, or query by example
(QBE, a graphical query language).
SQL
• DDL – Data Definition Language; used to
create and manage the database.
• DML – Data Manipulation Language; used
to query and modify the data in the database.
SQL
• SQL: widely used non-procedural language
– E.g. find the name of the customer with customer-id 192-83-7465
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
– E.g. find the balances of all accounts held by the customer
with customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465' and
depositor.account-number = account.account-number
• Application programs generally access databases
through one of
– Language extensions to allow embedded SQL
– Application program interface (e.g. ODBC/JDBC) which
allow SQL queries to be sent to a database
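As a rough illustration of the application program interface route, the two queries above can be sent to a database from Python through its built-in sqlite3 module. The sketch rests on several assumptions: underscores replace the hyphens in the attribute names (an unquoted hyphen would be read as subtraction in most SQL dialects), and the customer name, account numbers, and balances are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance REAL);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
-- Hypothetical sample data
INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
INSERT INTO account   VALUES ('A-101', 500.0), ('A-201', 900.0), ('A-305', 350.0);
INSERT INTO depositor VALUES ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201');
""")

cust_id = "192-83-7465"

# Query 1: the name of the customer with the given customer id.
name = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    (cust_id,),
).fetchone()[0]

# Query 2: the balances of all accounts held by that customer.
balances = [row[0] for row in conn.execute(
    """SELECT account.balance
       FROM depositor, account
       WHERE depositor.customer_id = ?
         AND depositor.account_number = account.account_number
       ORDER BY account.balance""",
    (cust_id,),
)]
```

Passing the customer id as a parameter (the ? placeholder) rather than pasting it into the SQL string is the usual way to avoid quoting problems.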
Attribute Queries
The ArcGIS Attribute Query Interface
• Table is open (the State's table in this example): use the table's Options menu
– Related tables
– Select by attributes
– Switch selection
– Clear selection
– Zoom to selected
– Delete selected
• No table is open: choose Selection > Select by Attributes from the menu bar
A Spatial Query in SQL
SELECT city.name, city.geometry
FROM city, county
WHERE county.name = 'Penobscot' AND
city.geometry INSIDE county.geometry AND
city.population > 30000;
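What a spatial predicate such as INSIDE has to compute can be sketched with a point-in-polygon test. The example below uses the common ray-casting method in plain Python and is an illustration only, not how any particular GIS evaluates the query. The county outline is a hypothetical rectangle rather than the real Penobscot County boundary, and the two extra cities are invented; only Bangor's centroid and population come from the first slide.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular outline (longitude, latitude), NOT the real boundary.
county = {
    "name": "Penobscot",
    "geometry": [(-69.5, 44.5), (-68.0, 44.5), (-68.0, 46.0), (-69.5, 46.0)],
}
cities = [
    {"name": "Bangor", "geometry": (-68.778, 44.801), "population": 31473},
    {"name": "Smalltown", "geometry": (-68.5, 45.0), "population": 2000},   # invented
    {"name": "Farcity", "geometry": (-70.5, 43.7), "population": 60000},    # invented
]

# Equivalent of the WHERE clause: inside the county AND population > 30000.
result = [
    c["name"]
    for c in cities
    if point_in_polygon(*c["geometry"], county["geometry"]) and c["population"] > 30000
]
```

Smalltown passes the spatial test but fails the attribute test, and Farcity does the reverse, so only Bangor satisfies both conditions.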
Spatial Selection
Table Joins
• Table joins depend on the data values, not the
attribute names.
• There are many different types of table
joins.
• Tables can be joined regardless of the
relationship EXCEPT:
– When joining to the feature attribute table, the
relationship must be 1:1 or M:1.
– Other relationships must use a relate.
One-to-One Join
Employee-id  Job
1            Digislave
2            Useless Supervisor

Employee-id  Name
1            Tom
2            John

Join Employee-id to Employee-id

After join
Employee-id  Job                 Name
1            Digislave           Tom
2            Useless Supervisor  John

A join does not permanently alter the table structure.
Many-to-One Join
Polygon ID  Symbol
1           Qa
2           Qa
3           Pa
4           Qe

Symbol  Description
Qa      Quaternary Alluvium
Qe      Quaternary Eolian
Pa      Permian Abo

After Join on Symbol
Polygon ID  Symbol  Description
1           Qa      Quaternary Alluvium
2           Qa      Quaternary Alluvium
3           Pa      Permian Abo
4           Qe      Quaternary Eolian
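The many-to-one join can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the table names polygons and units and the lowercase column names are hypothetical, while the data values are those of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE polygons (polygon_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE units    (symbol TEXT PRIMARY KEY, description TEXT);
INSERT INTO polygons VALUES (1, 'Qa'), (2, 'Qa'), (3, 'Pa'), (4, 'Qe');
INSERT INTO units VALUES ('Qa', 'Quaternary Alluvium'),
                         ('Qe', 'Quaternary Eolian'),
                         ('Pa', 'Permian Abo');
""")

# Many polygons may share one symbol; each symbol has one description (M:1).
joined = conn.execute("""
    SELECT p.polygon_id, p.symbol, u.description
    FROM polygons AS p
    JOIN units AS u ON p.symbol = u.symbol
    ORDER BY p.polygon_id
""").fetchall()

# The join is computed at query time; the stored table keeps its two columns.
polygon_columns = [row[1] for row in conn.execute("PRAGMA table_info(polygons)")]
```

The description for Qa appears twice in the result but is stored only once, in the lookup table.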
Relate in a GIS
http://courses.washington.edu/gis250/lessons/introduction_gis/relati
Relational Database Modeling
• Entity Relationship Model – the result is a
diagram of all of the entities, their attributes, and
the relationships between entities
– Each entity becomes a table.
– Each relationship becomes a table
• Database Normalization – The result is a
number of entity tables. We still need to create
the relationship tables.
Entity Relationship Diagram
• Rectangles represent entity sets.
• Diamonds represent relationship sets.
• Lines link attributes to entity sets and
entity sets to relationship sets.
• Ellipses represent attributes.
– Double ellipses represent multivalued attributes.
– Dashed ellipses denote derived attributes.
• Underline indicates primary key
attributes (will study later).
[Figure: legend shapes labeled ENTITY, RELATIONSHIP, and ATTRIBUTE]
Entity Relationship Diagram
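[Figure: ER diagram in which the entity sets Student and Courses are linked by the relationship Enrolls In. Attribute labels shown: Name, Address, ID, Name, Number, Phone, Major, Advisor, Credits, GPA, Credits, Time, Instructor, Room]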
Types of Relationships between
Entities
• 1:1 – one faculty member is assigned to
one office.
• 1:M (M:1) – one faculty member teaches
many courses.
• M:N – many students take many courses.
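One common way to express the three relationship types in tables is sketched below with Python's built-in sqlite3 module. The schema is an assumption made for illustration (table and column names are hypothetical, as are the sample rows): a UNIQUE foreign key gives 1:1, a plain foreign key gives 1:M, and M:N needs a separate relationship table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE faculty (faculty_id INTEGER PRIMARY KEY, name TEXT);
-- 1:1: UNIQUE on the foreign key allows at most one office per faculty member
CREATE TABLE office  (office_id INTEGER PRIMARY KEY,
                      faculty_id INTEGER UNIQUE REFERENCES faculty(faculty_id));
-- 1:M: many course rows may carry the same faculty_id
CREATE TABLE course  (course_id INTEGER PRIMARY KEY, title TEXT,
                      faculty_id INTEGER REFERENCES faculty(faculty_id));
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
-- M:N: the relationship becomes its own table with a composite key
CREATE TABLE enrolls_in (student_id INTEGER REFERENCES student(student_id),
                         course_id  INTEGER REFERENCES course(course_id),
                         PRIMARY KEY (student_id, course_id));
-- Hypothetical sample rows
INSERT INTO faculty VALUES (1, 'Holden');
INSERT INTO course  VALUES (10, 'Aaaaaa', 1), (11, 'Bbbbbb', 1);
INSERT INTO student VALUES (3421, 'Jones'), (8725, 'Dow');
INSERT INTO enrolls_in VALUES (3421, 10), (3421, 11), (8725, 10);
""")

# One student takes many courses, and one course has many students.
courses_per_student = dict(conn.execute(
    "SELECT student_id, COUNT(*) FROM enrolls_in GROUP BY student_id"))
students_per_course = dict(conn.execute(
    "SELECT course_id, COUNT(*) FROM enrolls_in GROUP BY course_id"))
```

The enrolls_in table has no attributes of its own here, but it is also the natural home for attributes of the relationship, such as a grade.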
Orono Charter Airlines
Orono Charter Airlines is an aircraft charter company that
supplies on-demand charter flight services to corporate
customers, using a fleet of four different aircraft. For each
plane, they need to keep track of the number of passengers
it can hold, the charge per seat/mile, the manufacturer,
model name, total miles flown, and date of last annual
maintenance. Aircraft are always identifiable by a unique
registration number. For each flight, the company must
keep track of the date, pilot and copilot’s names, the plane
used, the destination, distance, air time, ground time and
fuel and oil used. In addition to pilots and copilots, the
company has customer service personnel, mechanics and
maintenance personnel. In addition to the normal
employee information, they must keep additional
information on the pilots, such as their license number, their
aircraft rating and when they had their last medical.
Entity Sets
• Aircraft
• Customer
• Charter
• Pilot
• Employee
• Model
Relationship Sets
[Figure: Customer (1) is linked to Charter (M) through the relationship Requests]
Adding the Attributes
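[Figure: Customer (1) is linked to Charter (M) through the relationship Requests, with attributes attached]
• Customer attributes: LicenseNum, Name, StreetAddr, CityAddr, State, Zip, Phone, CreditCardType, CreditCardNum
• Charter attributes: Date, PlaneRegNum, PilotName, CoPilotName, Destination, Distance, Airtime, GroundTime, FuelUsed, OilUsed
[Figure: combined attribute list: Date, PlaneRegistrationNo, PilotName, CopilotName, Destination, Distance, AirTime, GroundTime, FuelUsed, OilUsed, LicenseNo, CustomerName, CustomerAddress, CreditCard, CreditCardNumber]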
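Orono Airlines
[Figure: ER diagram for Orono Charter Airlines; cardinality labels (1, M) appear on the links]
• Entity sets: Customer, Charter, Aircraft, Pilot, Model, Employee
• Relationship sets: Requests (Customer 1 : M Charter), Flies (Charter M : 1 Aircraft), Pilots, CoPilots, References, Is A
• Attributes listed with Aircraft: RegistrationNo, MilesFlown, DateLastMaintenance, NoPassengers, Charge_Seat_Mile
• Attributes listed with Pilot: SSNo, LicenseNo, Rating, MedicalDate
• Attributes listed with Model: ModelName, Manufacturer, Address, Phone, ContactPerson
• Attributes listed with Employee: SSNo, Name, Address, Phone, JobTitle, Salary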
Database Normalization
• Start with a single relational schema
containing all attributes and decompose it.
• The goal is to decrease the chance of
inconsistent information in the database.
• There is a hierarchy of rules for
decomposing the table.
• In most cases, only the first three are
used.
Normalization Rules
• Begin with an unnormalized user view.
• To put all tables in first normal form (1NF),
remove repeating groups.
• To put all tables in second normal form
(2NF), remove partial dependencies.
• To put all tables in third normal form (3NF),
remove all transitive dependencies.
Unnormalized Table
Student_ID  Name   Major  CRN  Title  Instructor  Instructor_Office  Grade
3421        Jones  SIS    333  Aaaaa  Holden      134 BD             A
                          222  Bbbbb  Taylor      222 M              C
8725        Dow    BIO    555  Bbbbb  Taylor      222 M              B
                          666  Ccccc  Allen       351 B              B-
                          777  Dddd   Kemp        317 Nv             C+
…
Remove repeating groups – Creates two tables
Student Table
StudentID  Name   Major
3421       Jones  SIS
8725       Dow    BIO
…

Student_course Table
Student_ID  CRN  Title   Instructor  Instructor_Office  Grade
3421        333  Aaaaaa  Holden      125 Bd             A
3421        222  Bbbbbb  Taylor      222 M              C
8725        555  Bbbbbb  Taylor      222 M              B
8725        666  Cccccc  Allen       351 B              B-
8725        777  Dddddd  Kemp        317 Nv             C+
Remove partial dependencies.
This applies only to tables that have a composite key.
Identify the attributes that depend on only part of the
composite key, and move them to their own table.

Registration Table
Student_ID  CRN  Grade
3421        333  A
3421        222  C
8725        555  B
8725        666  B-
8725        777  C+

Course-Instructor Table
CRN  Title   Instructor  Instructor_Office
333  Aaaaaa  Holden      125 Bd
222  Bbbbbb  Taylor      222 M
555  Bbbbbb  Taylor      222 M
666  Cccccc  Allen       351 B
777  Dddddd  Kemp        317 Nv
To put a table in third normal form, remove transitive dependencies.
A transitive dependency exists when a non-key attribute depends on
another non-key attribute rather than directly on the key. Here
Instructor_Office depends on Instructor, not on CRN.

Course Table
CRN  Title   Instructor
333  Aaaaaa  Holden
222  Bbbbbb  Taylor
555  Bbbbbb  Taylor
666  Cccccc  Allen
777  Dddddd  Kemp

Instructor Table
Instructor  Instructor_Office
Holden      125 Bd
Taylor      222 M
Allen       351 B
Kemp        317 Nv
Unnormalized table: Student_ID, Name, Major, CRN, Title, Instructor, Instructor_Office, Grade
Student Table: StudentID, Name, Major
Registration Table: Student_ID, CRN, Grade
Course Table: CRN, Title, Instructor
Instructor Table: Instructor, Instructor_Office
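The four normalized tables can be built and joined back into the original user view with Python's built-in sqlite3 module. This is a sketch, assuming lowercase column names and the data values of the normalized example tables (Holden's office is taken as 125 Bd, as in those tables); the choice of keys is an assumption consistent with the normalization steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student      (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT);
CREATE TABLE instructor   (instructor TEXT PRIMARY KEY, instructor_office TEXT);
CREATE TABLE course       (crn INTEGER PRIMARY KEY, title TEXT,
                           instructor TEXT REFERENCES instructor(instructor));
CREATE TABLE registration (student_id INTEGER REFERENCES student(student_id),
                           crn INTEGER REFERENCES course(crn),
                           grade TEXT,
                           PRIMARY KEY (student_id, crn));
INSERT INTO student VALUES (3421, 'Jones', 'SIS'), (8725, 'Dow', 'BIO');
INSERT INTO instructor VALUES ('Holden', '125 Bd'), ('Taylor', '222 M'),
                              ('Allen', '351 B'), ('Kemp', '317 Nv');
INSERT INTO course VALUES (333, 'Aaaaaa', 'Holden'), (222, 'Bbbbbb', 'Taylor'),
                          (555, 'Bbbbbb', 'Taylor'), (666, 'Cccccc', 'Allen'),
                          (777, 'Dddddd', 'Kemp');
INSERT INTO registration VALUES (3421, 333, 'A'), (3421, 222, 'C'), (8725, 555, 'B'),
                                (8725, 666, 'B-'), (8725, 777, 'C+');
""")

# An office change is made in exactly one row, however many courses are affected.
conn.execute("UPDATE instructor SET instructor_office = '230 M' WHERE instructor = 'Taylor'")

# Joining the four tables rebuilds the original eight-column user view.
user_view = conn.execute("""
    SELECT s.student_id, s.name, s.major, c.crn, c.title,
           c.instructor, i.instructor_office, r.grade
    FROM registration AS r
    JOIN student    AS s ON s.student_id = r.student_id
    JOIN course     AS c ON c.crn = r.crn
    JOIN instructor AS i ON i.instructor = c.instructor
    ORDER BY s.student_id, c.crn
""").fetchall()

taylor_offices = {row[6] for row in user_view if row[5] == "Taylor"}
```

The single UPDATE changes the office shown for both of Taylor's courses, which is the inconsistency protection normalization is meant to provide. The new office value '230 M' is invented for the demonstration.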
When NOT to Normalize
• Relates in large tables require large
amounts of computer time to process.
– If you have an application in which speed is more
important than database structure.
• If you are creating a very simple database
that will be used for a short time and then
discarded.