Relational database design handout

RELATIONAL DATABASE DESIGN
Basic Concepts
• a database is a collection of logically related records or
files
• a relational database stores its data in 2-dimensional
tables
• a table is a two-dimensional structure made up
of rows (tuples, records) and columns (attributes, fields)
• example: a table of students engaged in sports activities,
where a student is allowed to participate in at most one
activity
StudentID  Activity  Fee
100        Skiing    200
150        Swimming  50
175        Squash    50
200        Swimming  50
Table Characteristics
• each row is unique and stores data about one entity
• row order is unimportant
• each column has a unique attribute name
• each column (attribute) description (metadata) is stored in
the database
• Access metadata is stored and manipulated in the
rows of the Design View tables
• column order is unimportant
• entries in a column have the same data type
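The table characteristics above can be tried out in a small sketch. The handout refers to Access; SQLite (through Python's standard sqlite3 module) is used here only as a stand-in, and the table name activity is an assumption. The sketch creates the example table and reads back the column metadata that the database stores about it.

```python
import sqlite3

# In-memory database; SQLite stands in for Access (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity ("
    " StudentID INTEGER PRIMARY KEY,"
    " Activity TEXT NOT NULL,"
    " Fee INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?)",
    [(100, "Skiing", 200), (150, "Swimming", 50),
     (175, "Squash", 50), (200, "Swimming", 50)],
)

# PRAGMA table_info returns one metadata row per column:
# (cid, name, type, notnull, dflt_value, pk)
metadata = conn.execute("PRAGMA table_info(activity)").fetchall()
column_names = [row[1] for row in metadata]
column_types = [row[2] for row in metadata]
print(column_names)
print(column_types)
```

Note that SQLite does not strictly enforce declared column types, so this sketch shows only where the metadata lives, not type checking.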
MIS-DB-Design
1
PA-5-Appendix
Primary Keys
• a primary key is an attribute, or a collection of attributes
whose value(s) uniquely identify each row in a relation
• a primary key must be minimal (that is, it must not contain
unnecessary attributes)
StudentID  Activity  Fee
100        Skiing    200
150        Swimming  50
175        Squash    50
200        Swimming  50
• we assume that a student is allowed to participate in at
most one activity
• the primary key in the above table is StudentID
• what if we allow the students to participate in more than
one activity?
StudentID  Activity  Fee
100        Skiing    200
100        Golf      65
175        Squash    50
175        Swimming  50
200        Swimming  50
200        Golf      65
• in this table, the two attributes, {StudentID, Activity},
constitute the primary key
• a multi-attribute primary key is called a concatenated key
(composite key), and its member attributes are called prime
attributes (key attributes)
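As a rough illustration of uniqueness and minimality, the sketch below checks candidate keys against the sample rows of the multi-activity table. The helper names is_unique and is_minimal_key are made up for this example. A check against sample data can only show that a key is consistent with those rows; choosing a key remains a design decision.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_minimal_key(rows, attrs):
    """True if attrs is unique and no proper subset of attrs is unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

# Sample rows from the table where students may take several activities.
rows = [
    {"StudentID": 100, "Activity": "Skiing", "Fee": 200},
    {"StudentID": 100, "Activity": "Golf", "Fee": 65},
    {"StudentID": 175, "Activity": "Squash", "Fee": 50},
    {"StudentID": 175, "Activity": "Swimming", "Fee": 50},
    {"StudentID": 200, "Activity": "Swimming", "Fee": 50},
    {"StudentID": 200, "Activity": "Golf", "Fee": 65},
]

print(is_unique(rows, ["StudentID"]))
print(is_minimal_key(rows, ["StudentID", "Activity"]))
print(is_minimal_key(rows, ["StudentID", "Activity", "Fee"]))
```

Adding Fee to {StudentID, Activity} still identifies every row, but the result is no longer minimal, so it does not qualify as a primary key.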
Foreign Keys
• a foreign key is an attribute or a collection of attributes in
a relation, whose values match the values of a primary
key in some relation
• example: the STATE and CITY relations below
STATE relation:
StateAbbrev  StateName     UnionOrder  StateBird       StatePopulation
CT           Connecticut   5           American robin  3,287,116
MI           Michigan      26          robin           9,295,297
SD           South Dakota  40          pheasant        696,004
TN           Tennessee     16          mocking bird    4,877,185
TX           Texas         28          mocking bird    16,986,510
CITY relation:
StateAbbrev  CityName   CityPopulation
CT           Hartford   139,739
CT           Madison    14,031
CT           Portland   8,418
MI           Lansing    127,321
SD           Madison    6,257
SD           Pierre     12,906
TN           Nashville  488,374
TX           Austin     465,622
TX           Portland   12,224
• primary key in STATE relation: StateAbbrev
• primary key in CITY relation: {StateAbbrev, CityName}
• foreign key in CITY relation: StateAbbrev
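A minimal sketch of how a DBMS can enforce this foreign key, using SQLite through Python's sqlite3 module as an assumed stand-in for the handout's database. Only some STATE columns are included, and the rejected row ('ZZ', 'Nowhere') is a made-up value with no matching state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE state ("
    " StateAbbrev TEXT PRIMARY KEY,"
    " StateName TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE city ("
    " StateAbbrev TEXT NOT NULL REFERENCES state(StateAbbrev),"
    " CityName TEXT NOT NULL,"
    " CityPopulation INTEGER,"
    " PRIMARY KEY (StateAbbrev, CityName))"
)
conn.execute("INSERT INTO state VALUES ('CT', 'Connecticut')")
conn.execute("INSERT INTO city VALUES ('CT', 'Hartford', 139739)")

# Hypothetical row whose StateAbbrev matches no primary key in STATE.
try:
    conn.execute("INSERT INTO city VALUES ('ZZ', 'Nowhere', 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The composite primary key of CITY and its foreign key overlap in StateAbbrev, just as in the relations listed above.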
Alternate Database Representations
• an alternative representation of the previous database is
• STATE = {StateAbbrev, StateName, UnionOrder,
StateBird, StatePopulation}
• CITY = {StateAbbrev, CityName, CityPopulation}
Functional Dependency
• a functional dependency is a relationship among
attributes
• attribute B is functionally dependent on attribute A if
given a value of attribute A we can uniquely look up the
corresponding value of attribute B
• attribute A is the determinant of attribute B if attribute B is
functionally dependent on attribute A
• in the STATE relation above, StateAbbrev is a
determinant of all other attributes, since specifying
its value would allow us to determine the values of all
other attributes uniquely by table lookup
• in the STATE relation, the attribute StateName is also
a determinant of all other attributes
• in the CITY relation above, the attributes StateAbbrev
and CityName together are a determinant of the
attribute CityPopulation
• in the CITY relation, the attribute CityName is not a
determinant of the attribute CityPopulation because
multiple cities in the table may have the same name
Functional Dependency (Cont.)
Formally, given two attributes A and B, we say that B is
functionally dependent on attribute A if
$t_i(A) = t_j(A) \Rightarrow t_i(B) = t_j(B)$ for all $i \neq j$
where $t_i(A)$ means A's value in the i-th record.
Notice that the reverse is not necessarily true. Example:
Customer Name is dependent on Customer ID but the
reverse is not true assuming that two customers can have
the same name.
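The formal definition translates almost directly into a check over the rows of a table. The sketch below uses a made-up helper name, holds, and tests it on the CITY relation. As with any check on sample data, it can refute a dependency but cannot prove that one holds for all future rows.

```python
def holds(rows, determinant, dependent):
    """Check the functional dependency determinant -> dependent on rows.

    Returns False as soon as two rows agree on the determinant
    attributes but differ on the dependent attributes.
    """
    seen = {}
    for row in rows:
        lhs = tuple(row[a] for a in determinant)
        rhs = tuple(row[a] for a in dependent)
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True

columns = ["StateAbbrev", "CityName", "CityPopulation"]
city = [dict(zip(columns, values)) for values in [
    ("CT", "Hartford", 139739), ("CT", "Madison", 14031),
    ("CT", "Portland", 8418), ("MI", "Lansing", 127321),
    ("SD", "Madison", 6257), ("SD", "Pierre", 12906),
    ("TN", "Nashville", 488374), ("TX", "Austin", 465622),
    ("TX", "Portland", 12224),
]]

print(holds(city, ["StateAbbrev", "CityName"], ["CityPopulation"]))
print(holds(city, ["CityName"], ["CityPopulation"]))
```

The second check fails because Madison and Portland each appear in two states with different populations.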
Dependency Diagrams
• a dependency diagram is a pictorial representation of all
functional dependencies in a database
• an attribute is represented by a rectangle
• an arrow is drawn from the rectangle for attribute A
to the rectangle for attribute B whenever attribute A
is the determinant of attribute B
• example: students sports activity - I consists of the
relation ACTIVITY = {StudentID, Activity, Fee}
[dependency diagram: boxes StudentID, Activity, Fee]
StudentID  Activity  Fee
100        Skiing    200
150        Swimming  50
175        Squash    50
200        Swimming  50
• example: students sports activity - II consists of the
relation ACTIVITY = {StudentID, Activity, Fee}
[dependency diagram: boxes StudentID, Activity, Fee]
StudentID  Activity  Fee
100        Skiing    200
100        Golf      65
175        Squash    50
175        Swimming  50
200        Swimming  50
200        Golf      65
Partial Dependencies
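• a partial dependency is a functional dependency in which
the determinant is only a part of a composite primary key
• example: ACTIVITY = {StudentID, Activity, Fee}, where a
student may take several activities and the primary key is
{StudentID, Activity}
• the dependency between the attributes Activity and
Fee is a partial dependency
[dependency diagram: boxes StudentID, Activity, Fee]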
Transitive Dependencies
• a transitive dependency is a functional dependency in
which neither the determinant nor the dependent attribute
is part of the primary key
• example: ACTIVITY = {StudentID, Activity, Fee}, where a
student takes at most one activity and the primary key is
StudentID
• the dependency between the attributes Activity and
Fee is a transitive dependency
[dependency diagram: boxes StudentID, Activity, Fee]
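The two definitions differ only in how the determinant relates to the primary key, which the following sketch makes explicit. The function name classify and the labels it returns are assumptions made for this example; it follows the handout's definitions rather than a textbook-complete treatment.

```python
def classify(determinant, dependent, primary_key):
    """Label the dependency determinant -> dependent relative to a primary key.

    determinant and primary_key are sets of attribute names;
    dependent is a single attribute name.
    """
    key = set(primary_key)
    det = set(determinant)
    if det == key:
        return "full"
    if det < key and dependent not in key:
        return "partial"      # determinant is a proper part of the key
    if not (det & key) and dependent not in key:
        return "transitive"   # neither side involves the key
    return "other"

# Activity -> Fee when students may take several activities.
print(classify({"Activity"}, "Fee", {"StudentID", "Activity"}))
# Activity -> Fee when each student takes at most one activity.
print(classify({"Activity"}, "Fee", {"StudentID"}))
```

The same dependency, Activity to Fee, is partial under one primary key and transitive under the other, which is why the key must be fixed before the dependencies are classified.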
Database Anomalies
• anomalies are problems caused by bad database design
• "problems" here means undesirable irregularities in
tables
• example: ACTIVITY = {StudentID, Activity, Fee}
StudentID  Activity  Fee
100        Skiing    200
100        Golf      65
175        Squash    50
175        Swimming  50
200        Swimming  50
200        Golf      65
• an insertion anomaly occurs when a row cannot be added
to a relation, because not all data is available
• example: we want to store the fact that scuba diving
costs $175, but cannot enter this fact into the table
until a student takes up scuba diving
• a deletion anomaly occurs when data is deleted from a
relation and other critical data is unintentionally lost
• example: by deleting the record for StudentID = 100 and
Skiing, the fact that skiing costs $200 is lost
• an update anomaly occurs when one attribute is changed,
but the DBMS must make more than one change to reflect
that single change
• example: if the cost of swimming changes, then all
entries with swimming Activity must be changed too
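The three anomalies can be reproduced on the sample rows with plain Python tuples. This is only a sketch: the helper known_fees and the new swimming fee of 55 are made up for the illustration.

```python
activity = [
    (100, "Skiing", 200), (100, "Golf", 65),
    (175, "Squash", 50), (175, "Swimming", 50),
    (200, "Swimming", 50), (200, "Golf", 65),
]

def known_fees(rows):
    """Map each activity that appears in rows to the set of fees recorded for it."""
    fees = {}
    for _student, act, fee in rows:
        fees.setdefault(act, set()).add(fee)
    return fees

# Insertion anomaly: a fee without a student has no StudentID,
# so the row lacks part of its primary key {StudentID, Activity}.
candidate = (None, "Scuba diving", 175)
missing_key_part = candidate[0] is None

# Deletion anomaly: removing student 100's skiing row also
# removes the only record of the skiing fee.
after_delete = [r for r in activity if r[:2] != (100, "Skiing")]
skiing_fee_lost = "Skiing" not in known_fees(after_delete)

# Update anomaly: the swimming fee is stored in several rows; changing
# only one of them (55 is a made-up fee) leaves conflicting values.
rows_to_touch = sum(1 for r in activity if r[1] == "Swimming")
partial_update = list(activity)
partial_update[3] = (175, "Swimming", 55)
swimming_fees = known_fees(partial_update)["Swimming"]

print(missing_key_part, skiing_fee_lost, rows_to_touch, swimming_fees)
```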
Cause of Anomalies
• anomalies are mostly caused by the following:
• data redundancy (replication of the same field in
multiple tables; repeating sections)
• partial dependency
• transitive dependency
• example: ACTIVITY = {StudentID, Activity, Fee}
StudentID  Activity  Fee
100        Skiing    200
100        Golf      65
175        Squash    50
175        Swimming  50
200        Swimming  50
200        Golf      65
[dependency diagram: boxes StudentID, Activity, Fee]
• in this example, there is a partial dependency, which is the
cause of all the anomalies
Cause of Anomalies (Cont.)
• a two-table solution:
• STUDENTS = {StudentID, Activity}
• ACTIVITIES = {Activity, Fee}
STUDENTS:
StudentID  Activity
100        Skiing
100        Golf
150        Swimming
175        Squash
175        Swimming
200        Swimming
200        Golf
ACTIVITIES:
Activity     Fee
Skiing       200
Golf         65
Swimming     50
Squash       50
ScubaDiving  200
[dependency diagrams: STUDENTS with boxes StudentID, Activity; ACTIVITIES with boxes Activity, Fee]
• the above relations do not have any of the anomalies
• we can add the cost of diving in ACTIVITIES
even though no one has taken it in STUDENTS
• if StudentID 100 drops Skiing, no skiing-related data
will be lost
• if the cost of swimming changes, that cost needs to be
changed in one place only (the ACTIVITIES table)
• the Activity field is replicated in the two tables, but
without this replication we cannot join the two tables
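A short sketch of the two-table design, with ACTIVITIES held as a dictionary keyed by Activity and the join written by hand. The function name join and the new swimming fee of 55 are assumptions made for illustration.

```python
students = [
    (100, "Skiing"), (100, "Golf"), (150, "Swimming"),
    (175, "Squash"), (175, "Swimming"),
    (200, "Swimming"), (200, "Golf"),
]
activities = {"Skiing": 200, "Golf": 65, "Swimming": 50,
              "Squash": 50, "ScubaDiving": 200}

def join(students, activities):
    """Join STUDENTS and ACTIVITIES on the shared Activity field."""
    return [(sid, act, activities[act]) for sid, act in students]

# No insertion anomaly: ScubaDiving has a fee although no student takes it.
diving_has_fee = "ScubaDiving" in activities

# No deletion anomaly: dropping student 100's skiing row keeps the skiing fee.
students_after_drop = [s for s in students if s != (100, "Skiing")]
skiing_fee = activities["Skiing"]

# No update anomaly: one change (55 is a made-up fee) reaches every joined row.
activities["Swimming"] = 55
joined = join(students, activities)
swimming_fees = {fee for _sid, act, fee in joined if act == "Swimming"}

print(diving_has_fee, skiing_fee, swimming_fees)
```

The join works only because Activity appears in both tables, which is the replication the last bullet above describes.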
Good Database Design Principles
1. no redundancy
• a field is stored in only one table, unless it happens to
be a foreign key
• replication of foreign keys is permissible, because
they allow two tables to be joined together
2. no partial dependencies
• the dependency diagram of any relation in the
database should contain no partial dependencies
3. no transitive dependencies
• the dependency diagram of any relation in the
database should contain no transitive dependencies
• normalization is the process of eliminating partial and
transitive dependencies
• as we normalize the relations, larger tables are split
into smaller tables with one common foreign key field
• there are five numbered normal forms (NF), plus the
Boyce-Codd normal form, as given below
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
First Normal Form
• a relation is said to be in the first normal form (1NF)
if it does not contain any nested relation. In other words, all
attributes are atomic.
• IMPORTANT NOTE: many authors call nested relations
repeating sections. This name is misleading. A nested
relation holds many different instances of the same
attribute for one record; nothing actually repeats in this
anomaly.
• example: CLIENT table has nested relations.
ClientID  ClientName         VetID  VetName  PetID  PetName  PetType
2173      Barbara Hennessey  27     PetVet   1      Sam      Bird
                                             2      Hoober   Dog
                                             3      Tom      Hamster
4519      Vernon Noordsy     31     PetCare  2      Charlie  Cat
8005      Sandra Amidon      27     PetVet   1      Beefer   Dog
                                             2      Kirby    Cat
8112      Helen Wandzell     24     PetsRUs  3      Kirby    Dog
CLIENT = {ClientID, ClientName, VetID, VetName, {PetID,
PetName, PetType} }
• in order to eliminate the nested relation, pull out the
nested relation and form a new table
• be sure to include the old key in the new table so that you
can connect the tables back together
• when a table contains no nested relations, we say that it is
in first normal form
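[dependency diagrams after conversion to 1NF: one table with boxes ClientID, ClientName, VetID, VetName, with one dependency labeled transitive; one table with boxes ClientID, PetID, PetName, PetType]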
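Pulling out the nested relation can be sketched as follows. The nested pets are modeled as a Python list under a made-up field name, Pets, and the function name to_first_normal_form is likewise an assumption. The old key, ClientID, is copied into every pet row so that the tables can be connected back together.

```python
clients_nested = [
    {"ClientID": 2173, "ClientName": "Barbara Hennessey", "VetID": 27,
     "VetName": "PetVet",
     "Pets": [(1, "Sam", "Bird"), (2, "Hoober", "Dog"), (3, "Tom", "Hamster")]},
    {"ClientID": 4519, "ClientName": "Vernon Noordsy", "VetID": 31,
     "VetName": "PetCare", "Pets": [(2, "Charlie", "Cat")]},
    {"ClientID": 8005, "ClientName": "Sandra Amidon", "VetID": 27,
     "VetName": "PetVet", "Pets": [(1, "Beefer", "Dog"), (2, "Kirby", "Cat")]},
    {"ClientID": 8112, "ClientName": "Helen Wandzell", "VetID": 24,
     "VetName": "PetsRUs", "Pets": [(3, "Kirby", "Dog")]},
]

def to_first_normal_form(nested):
    """Split nested client records into two flat tables (lists of tuples)."""
    client_rows, pet_rows = [], []
    for c in nested:
        client_rows.append(
            (c["ClientID"], c["ClientName"], c["VetID"], c["VetName"]))
        for pet_id, pet_name, pet_type in c["Pets"]:
            # Carry the old key (ClientID) into the new table.
            pet_rows.append((c["ClientID"], pet_id, pet_name, pet_type))
    return client_rows, pet_rows

client_rows, pet_rows = to_first_normal_form(clients_nested)
print(len(client_rows), len(pet_rows))
```

Every value in the two resulting tables is atomic, and {ClientID, PetID} identifies each pet row.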
Second Normal Form
• a table is said to be in the second normal form (2NF)
if it is in 1NF and does not contain any partial dependencies,
that is, each nonkey column in the table depends on the entire key
• in order to eliminate a partial dependency, split the table
again
• example: the two relations obtained above
• there are no partial dependencies, hence we need
not do anything
• the relations still have some anomalies
Third Normal Form
• a table in 2NF is said to be in the third normal form (3NF) if it
does not contain any transitive dependencies, that is, each
nonkey column depends on the whole key and nothing but the
key
• in order to eliminate a transitive dependency, we split the table
again
• in the 3NF relations of this example, each determinant is a primary key
• example: conversion of CLIENT relation to the 3NF:
• CLIENTS = {ClientID, ClientName, VetID}
• PETS = {ClientID, PetID, PetName, PetType}
• VETS = {VetID, VetName}
[dependency diagrams in 3NF: CLIENTS with boxes ClientID, ClientName, VetID; VETS with boxes VetID, VetName; PETS with boxes ClientID, PetID, PetName, PetType]
• note that the tables can be joined to yield a table in the
first normal form
• the ClientID and VetID fields are replicated, but they are
both foreign keys
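The claim that the 3NF tables can be joined back can be checked on the sample data. The sketch below splits the flat client rows (ClientID, ClientName, VetID, VetName) into CLIENTS and VETS and then rejoins them on VetID; the variable names are chosen for this example only.

```python
# Client rows after the nested pets have been pulled out.
client = [
    (2173, "Barbara Hennessey", 27, "PetVet"),
    (4519, "Vernon Noordsy", 31, "PetCare"),
    (8005, "Sandra Amidon", 27, "PetVet"),
    (8112, "Helen Wandzell", 24, "PetsRUs"),
]

# Projection onto each new relation; sets drop duplicate rows.
clients = sorted({(cid, name, vet) for cid, name, vet, _ in client})
vets = sorted({(vet, vet_name) for _, _, vet, vet_name in client})

# Join CLIENTS with VETS on the foreign key VetID.
vet_names = dict(vets)
rejoined = sorted((cid, name, vet, vet_names[vet]) for cid, name, vet in clients)

print(len(vets))
print(rejoined == sorted(client))
```

The vet name PetVet is now stored once in VETS instead of once per client, and the join reproduces the original rows.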
Third Normal Form (Cont.)
• example: CLIENT database in the third normal form
CLIENTS:
ClientID  ClientName         VetID
2173      Barbara Hennessey  27
4519      Vernon Noordsy     31
8005      Sandra Amidon      27
8112      Helen Wandzell     24
VETS:
VetID  VetName
27     PetVet
31     PetCare
24     PetsRUs
PETS:
ClientID  PetID  PetName  PetType
2173      1      Sam      Bird
2173      2      Hoober   Dog
2173      3      Tom      Hamster
4519      2      Charlie  Cat
8005      1      Beefer   Dog
8005      2      Kirby    Cat
8112      3      Kirby    Dog
[figure: table relationships for the CLIENT database]
• the database consists of three types of entities, stored
as distinct relations in separate tables:
• clients (CLIENTS)
• pets ( PETS)
• vets (VETS)
• there is no redundancy (only foreign keys are replicated)
• there are no partial or transitive dependencies
Example: Hardware Store Database
• the ORDERS table :
OrderNumb  CustCode  OrderDate  CustName  ProdDescr      ProdPrice  Quantity
10001      5217      11/22/94   Williams  Hammer         $8.99      2
10001      5217      11/22/94   Williams  Screwdriver    $4.45      1
10002      5021      11/22/94   Johnson   Clipper        $18.22     1
10002      5021      11/22/94   Johnson   Screwdriver    $4.45      3
10002      5021      11/22/94   Johnson   Crowbar        $11.07     1
10002      5021      11/22/94   Johnson   Saw            $14.99     1
10003      4118      11/22/94   Lorenzo   Hammer         $8.99      1
10004      6002      11/22/94   Kopiusko  Saw            $14.99     1
10004      6002      11/22/94   Kopiusko  Screwdriver    $4.45      2
10005      5021      11/23/94   Johnson   Cordlessdrill  $34.95     1
Example: Hardware Store Database (Cont.)
• ORDERS = {OrderNumb, ProdDescr,
CustCode, OrderDate, CustName,
ProdPrice, Quantity}
• dependency diagram of the ORDERS table:
[dependency diagram: boxes OrderNumb, ProdDescr, CustCode, CustName, OrderDate, ProdPrice, Quantity; two dependencies are labeled partial and one is labeled transitive]
• conversion of the hardware store database to 2NF
• QUANTITY = {OrderNumb, ProdDescr, Quantity}
• PRODUCTS = {ProdDescr, ProdPrice}
• ORDERS = {OrderNumb, CustCode, OrderDate,
CustName}
• dependency diagram of the hardware store database in 2NF
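[dependency diagrams in 2NF: one table with boxes OrderNumb, ProdDescr, Quantity; one table with boxes ProdDescr, ProdPrice; one table with boxes OrderNumb, CustCode, OrderDate, CustName, with one dependency labeled transitive]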
Example: Hardware Store Database (Cont.)
• conversion of the ORDERS relation to 3NF
• QUANTITY = {OrderNumb, ProdDescr, OrderQuant}
• PRICE = {ProdDescr, ProdPrice}
• ORDERS = {OrderNumb, CustCode, OrderDate}
• CUSTOMERS = {CustCode, CustName}
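One possible way to declare these four relations, sketched with SQLite through Python's sqlite3 module. The column types, the lowercase table names, and the choice to load only order 10001 are assumptions made for this example. The query joins the tables back together to compute that order's total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustCode INTEGER PRIMARY KEY, CustName TEXT);
CREATE TABLE orders (
    OrderNumb INTEGER PRIMARY KEY,
    CustCode  INTEGER REFERENCES customers(CustCode),
    OrderDate TEXT);
CREATE TABLE price (ProdDescr TEXT PRIMARY KEY, ProdPrice REAL);
CREATE TABLE quantity (
    OrderNumb  INTEGER REFERENCES orders(OrderNumb),
    ProdDescr  TEXT REFERENCES price(ProdDescr),
    OrderQuant INTEGER,
    PRIMARY KEY (OrderNumb, ProdDescr));
-- Sample rows for order 10001 only.
INSERT INTO customers VALUES (5217, 'Williams');
INSERT INTO orders VALUES (10001, 5217, '11/22/94');
INSERT INTO price VALUES ('Hammer', 8.99), ('Screwdriver', 4.45);
INSERT INTO quantity VALUES (10001, 'Hammer', 2), (10001, 'Screwdriver', 1);
""")

# Join all four tables to recover the customer name and the order total.
cust_name, order_total = conn.execute("""
    SELECT c.CustName, SUM(p.ProdPrice * q.OrderQuant)
    FROM quantity AS q
    JOIN price AS p ON p.ProdDescr = q.ProdDescr
    JOIN orders AS o ON o.OrderNumb = q.OrderNumb
    JOIN customers AS c ON c.CustCode = o.CustCode
    WHERE q.OrderNumb = 10001
    GROUP BY c.CustName
""").fetchone()
print(cust_name, round(order_total, 2))
```

Each price and each customer name is stored once; the foreign keys OrderNumb, ProdDescr, and CustCode are the only replicated fields.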
• dependency diagram of the hardware store database in 3NF
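[dependency diagrams in 3NF: one table with boxes OrderNumb, ProdDescr, OrderQuant; one table with boxes OrderNumb, OrderDate, CustCode; one table with boxes CustCode, CustName; one table with boxes ProdDescr, ProdPrice]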
• table relationships for the hardware store database
Example: Video Store Database
• the CUSTOMER relation:
CustomerID  Phone         LastName    FirstName  Address          City           State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main St.     Alvaton        KY     42122
2           502-888-6464  Smith       Jack       873 Elm St.      Bowling Green  KY     42101
3           502-777-7575  Washington  Elroy      95 Easy St.      Smith’s Grove  KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Dr.    Alvaton        KY     42122
5           502-474-4746  Steinmetz   Susan      15 Speedway Dr.  Portland       TN     37148
…
• the RENTAL-FORM relation:
TransID  RentDate  CustomerID  VideoID  Copy#  Title                  Rent
1        4/18/95   3           1        2      2001: A Space Odyssey  $1.50
1        4/18/95   3           6        3      Clockwork Orange       $1.50
2        4/18/95   7           8        1      Hopscotch              $1.50
2        4/18/95   7           2        1      Apocalypse Now         $2.00
2        4/18/95   7           6        1      Clockwork Orange       $1.50
3        4/18/95   8           9        1      Luggage of the Gods    $2.50
…
• a customer can rent multiple videos as part of the
same transaction
• however, the VideoID fields will be different for
each video
• multiple copies of the same video are allowed
• the copy# field stores the number of the copy
• the rental fee depends on the title and not on the day
• the database still contains some anomalies
Example: Video Store Database (Cont.)
• relations for the video store database
• CUSTOMER = {CustomerID, Phone, Name, Address,
City, State, ZipCode}
• RENTAL-FORM = {TransID, RentDate, CustomerID,
VideoID, Copy#, Title, Rent}
• dependency diagram for the video store database
[dependency diagrams: CUSTOMER with boxes CustomerID, Phone, Name, Address, City, State, ZipCode; RENTAL-FORM with boxes TransID, RentDate, CustomerID, VideoID, Copy#, Title, Rent]
• video store database after eliminating partial dependencies
• CUSTOMER = {CustomerID, Phone, Name, Address,
City, State, ZipCode}
• RENTALS = {TransID, RentDate, CustomerID}
• VIDEOS = {VideoID, Title, Rent}
• VideosRented = {TransID, VideoID, Copy#}
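The split of RENTAL-FORM amounts to three projections, which the sketch below performs on the six sample rows. The helper name project is made up for this example, and the title spellings are written consistently for each VideoID, which is an assumption about the intended data.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate rows (order preserved)."""
    seen, result = set(), []
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result

columns = ["TransID", "RentDate", "CustomerID", "VideoID", "Copy#", "Title", "Rent"]
rental_form = [dict(zip(columns, values)) for values in [
    (1, "4/18/95", 3, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, "4/18/95", 3, 6, 3, "Clockwork Orange", 1.50),
    (2, "4/18/95", 7, 8, 1, "Hopscotch", 1.50),
    (2, "4/18/95", 7, 2, 1, "Apocalypse Now", 2.00),
    (2, "4/18/95", 7, 6, 1, "Clockwork Orange", 1.50),
    (3, "4/18/95", 8, 9, 1, "Luggage of the Gods", 2.50),
]]

rentals = project(rental_form, ["TransID", "RentDate", "CustomerID"])
videos = project(rental_form, ["VideoID", "Title", "Rent"])
videos_rented = project(rental_form, ["TransID", "VideoID", "Copy#"])

print(len(rentals), len(videos), len(videos_rented))
```

The rental date and customer are now stored once per transaction, and each title and fee once per video, instead of once per rented copy.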
Example: Video Store Database (Cont.)
• dependency diagram for the video store database in 3NF
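[dependency diagrams in 3NF: CUSTOMER with boxes CustomerID, Phone, Name, Address, City, State, ZipCode; RENTALS with boxes TransID, RentDate, CustomerID; VIDEOS with boxes VideoID, Title, Rent; VideosRented with boxes TransID, VideoID, Copy#]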
• table relationships for the video store database
Summary of Guidelines for Database Design
• identify the entities involved in the database
• identify the fields relevant for each entity and define the
corresponding relations
• determine the primary key of each relation
• avoid data redundancy, but have some common fields so
that tables can be joined together
• ensure that all the required database processing can be
done using the defined relations
• normalize the relations by splitting them into smaller ones