Chapter 5: The Relational Model and Normalization

The Relational Model and
Normalization
R. Nakatsu
The Relational Model
• Data is represented in two-dimensional
tables
– Each of the tables is a matrix consisting of a
series of row/column intersections
– Tables are also called relations
– Columns of the tables are attributes
• Information in more than one table can be
easily extracted and combined
• E.F. Codd defined well-structured “normal
forms” of relations
Functional Dependency
Notation: X → Y
Each value of X determines one and
only one value of Y
Examples:
SID → Major, LastName, FirstName
ComputerSerialNumber → MemorySize
(SID, CourseNumber) → Grade
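The definition can be tested mechanically against sample rows. The sketch below is only an illustration: the function name `holds` and the grade rows are made up for this example, not taken from the slides. Note that sample data can refute a functional dependency but can never prove that it holds in general.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for each X; any later mismatch breaks X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows (not from the slides).
grades = [
    {"SID": 100, "CourseNumber": "CS101", "Grade": "A"},
    {"SID": 100, "CourseNumber": "MA201", "Grade": "B"},
    {"SID": 150, "CourseNumber": "CS101", "Grade": "B"},
]

print(holds(grades, ["SID", "CourseNumber"], ["Grade"]))  # True
print(holds(grades, ["SID"], ["Grade"]))                  # False
```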
Functional Dependency
What are the functional dependencies
in the relation below?
A → B relationships
A → B and B → A: one-to-one
A → B but B not → A: many-to-one
A not → B and B not → A: many-to-many
Another way to write “not →” is “→→”. For
example:
A →→ B and B →→ A (A multi-determines B
and B multi-determines A)
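The three cases can be told apart by testing the dependency in both directions. The following is a minimal sketch, assuming the SID/Activity/Fee rows used elsewhere in this chapter; the function names are illustrative, and the extra "one-to-many" result is simply the mirror image of the many-to-one case.

```python
def determines(rows, a, b):
    """True if each value of column a appears with only one value of column b."""
    seen = {}
    return all(seen.setdefault(r[a], r[b]) == r[b] for r in rows)


def relationship(rows, a, b):
    a_to_b = determines(rows, a, b)
    b_to_a = determines(rows, b, a)
    if a_to_b and b_to_a:
        return "one-to-one"
    if a_to_b:
        return "many-to-one"
    if b_to_a:
        return "one-to-many"  # mirror image of many-to-one
    return "many-to-many"


rows = [
    {"SID": 100, "Activity": "Skiing", "Fee": 200},
    {"SID": 100, "Activity": "Golf", "Fee": 65},
    {"SID": 150, "Activity": "Swimming", "Fee": 50},
    {"SID": 175, "Activity": "Squash", "Fee": 50},
    {"SID": 175, "Activity": "Swimming", "Fee": 50},
    {"SID": 200, "Activity": "Swimming", "Fee": 50},
    {"SID": 200, "Activity": "Golf", "Fee": 65},
]

print(relationship(rows, "Activity", "Fee"))  # many-to-one
print(relationship(rows, "SID", "Activity"))  # many-to-many
```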
Key
A group of one or more attributes that
uniquely identifies a row.
A relation may have several such keys, called candidate
keys; one of them is chosen as the primary key.
Composite Key
is a key that contains two or more attributes
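As a rough illustration, the attribute sets that uniquely identify every row of a sample table can be found by brute force. This is a sketch under assumptions: the helper names are made up, and the rows are the SID/Activity/Fee data used elsewhere in this chapter. A key is a statement about all possible rows, so sample data can only suggest candidates.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    values = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(values)) == len(values)


def minimal_keys(rows):
    """Return the smallest attribute sets that identify each row of rows."""
    attrs = list(rows[0])
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found


rows = [
    {"SID": 100, "Activity": "Skiing", "Fee": 200},
    {"SID": 100, "Activity": "Golf", "Fee": 65},
    {"SID": 150, "Activity": "Swimming", "Fee": 50},
    {"SID": 175, "Activity": "Squash", "Fee": 50},
    {"SID": 175, "Activity": "Swimming", "Fee": 50},
    {"SID": 200, "Activity": "Swimming", "Fee": 50},
    {"SID": 200, "Activity": "Golf", "Fee": 65},
]

print(minimal_keys(rows))  # [('SID', 'Activity')]
```

Here the only key is the composite key (SID, Activity).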
Normalization
Normalization is a process that assigns
attributes (fields) to tables such that data
redundancies are eliminated or reduced,
thereby reducing the likelihood of data
anomalies.
Stages of Normalization (Normal Forms):
1NF, 2NF, 3NF, BCNF, 4NF
(Ensure that tables are at least 3NF; higher
forms are far less likely to be
encountered).
Normalization Process
Objective: Ensure that each table
conforms to the concept of well-formed
relations.
– Each table represents a single subject
– No data item will be unnecessarily
stored in more than one table
– All nonkey attributes in a table are
dependent on the primary key
– Each table is void of insertion, update,
and deletion anomalies
Anomaly
An undesirable consequence of data
modification in which two or more
different themes are entered
(insertion anomaly) in a single row or
two or more themes are lost if the
row is deleted (deletion anomaly).
Example
State the deletion and insertion anomalies.
SID   Activity   Fee
100   Skiing     200
100   Golf       65
150   Swimming   50
175   Squash     50
175   Swimming   50
200   Swimming   50
200   Golf       65
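One way to see both anomalies is to act them out on the rows above. The sketch below is illustrative only: the helper names and the new Tennis activity with its fee of 80 are hypothetical, and (SID, Activity) is assumed to be the key.

```python
rows = [
    {"SID": 100, "Activity": "Skiing", "Fee": 200},
    {"SID": 100, "Activity": "Golf", "Fee": 65},
    {"SID": 150, "Activity": "Swimming", "Fee": 50},
    {"SID": 175, "Activity": "Squash", "Fee": 50},
    {"SID": 175, "Activity": "Swimming", "Fee": 50},
    {"SID": 200, "Activity": "Swimming", "Fee": 50},
    {"SID": 200, "Activity": "Golf", "Fee": 65},
]


def known_fee(table, activity):
    """Return the fee recorded for an activity, or None if no row mentions it."""
    for row in table:
        if row["Activity"] == activity:
            return row["Fee"]
    return None


# Deletion anomaly: student 100 drops Skiing, and the Skiing fee vanishes too.
after_delete = [
    r for r in rows if not (r["SID"] == 100 and r["Activity"] == "Skiing")
]
print(known_fee(rows, "Skiing"))          # 200
print(known_fee(after_delete, "Skiing"))  # None


def can_insert(row):
    """Both parts of the assumed key (SID, Activity) must be present."""
    return row["SID"] is not None and row["Activity"] is not None


# Insertion anomaly: a new activity cannot be stored until a student enrolls.
new_activity = {"SID": None, "Activity": "Tennis", "Fee": 80}  # hypothetical
print(can_insert(new_activity))  # False
```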
First Normal Form (1NF)
Any table of data that meets the definition of a
relation:
• No multi-valued attributes allowed.
• No repeating groups.
• No two rows can be identical (need a primary key).
• Order of the rows is insignificant.
• All entries in a column are of the same kind.
• Each column must have a unique name.
Table Not in 1NF
Order Table Attributes
1. Order ID
2. Order Date
3. Shipping Date
4. Customer ID
5. Customer Name
6. Shipping Address
7. Book 1 Title
8. Book 1 Price
9. Book 1 Qty
10. Book 2 Title
11. Book 2 Price
12. Book 2 Qty
13. Book 3 Title
14. Book 3 Price
15. Book 3 Qty
16. Book 4 Title
17. Book 4 Price
18. Book 4 Qty
19. Book 5 Title
20. Book 5 Price
21. Book 5 Qty
(Title, Price, Qty) is a repeating group
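A repeating group is usually removed by moving it into its own table, one row per occurrence. The sketch below assumes a wide order record keyed by the attribute names listed above; the sample values, the `Line` column, and the function name `split_order` are hypothetical.

```python
def split_order(order, max_books=5):
    """Split one wide order record into a header row and one row per book."""
    header = {k: v for k, v in order.items() if not k.startswith("Book ")}
    lines = []
    for i in range(1, max_books + 1):
        title = order.get(f"Book {i} Title")
        if title is None:  # unused slot in the repeating group
            continue
        lines.append({
            "Order ID": order["Order ID"],
            "Line": i,
            "Title": title,
            "Price": order[f"Book {i} Price"],
            "Qty": order[f"Book {i} Qty"],
        })
    return header, lines


# Hypothetical order with two of the five book slots filled.
order = {
    "Order ID": 1, "Order Date": "2000-01-05", "Shipping Date": "2000-01-07",
    "Customer ID": 42, "Customer Name": "A. Reader",
    "Shipping Address": "1 Main St",
    "Book 1 Title": "Databases", "Book 1 Price": 50.0, "Book 1 Qty": 1,
    "Book 2 Title": "Networks", "Book 2 Price": 40.0, "Book 2 Qty": 2,
}

header, lines = split_order(order)
print(len(lines))       # 2
print(sorted(header))   # the six non-book attributes
```

With the group in its own table, an order can hold any number of books and no columns sit empty.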
Second Normal Form (2NF)
If it is in 1NF and all its nonkey attributes are
dependent on all of the key.
No partial dependencies are allowed.
Partial dependency: Functional dependence in
which the determinant is only part of the
primary key.
Not in 2NF. Why?
SID   Activity   Fee
100   Skiing     200
100   Golf       65
150   Swimming   50
175   Squash     50
175   Swimming   50
200   Swimming   50
200   Golf       65
Tables in 2NF
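A possible decomposition of the SID/Activity/Fee relation is sketched below. Fee depends on Activity alone, which is only part of the key (SID, Activity), so it moves to its own table. The names `student_activity` and `activity_fee` are assumptions made for this illustration; the slide's own figure is not reproduced here.

```python
rows = [
    (100, "Skiing", 200), (100, "Golf", 65), (150, "Swimming", 50),
    (175, "Squash", 50), (175, "Swimming", 50), (200, "Swimming", 50),
    (200, "Golf", 65),
]

# Table 1: which student takes which activity (the key is the whole row).
student_activity = sorted({(sid, activity) for sid, activity, _ in rows})

# Table 2: one fee per activity (the key is Activity).
activity_fee = {}
for _, activity, fee in rows:
    if activity_fee.setdefault(activity, fee) != fee:
        raise ValueError(f"Activity -> Fee does not hold for {activity}")

# Joining the two tables on Activity gives back the original rows.
rejoined = sorted((sid, act, activity_fee[act]) for sid, act in student_activity)

print(len(activity_fee))         # 4
print(rejoined == sorted(rows))  # True
```

Each fee is now stored once, so deleting a student's enrollment no longer deletes the fee.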
Third Normal Form (3NF)
If it is in 2NF and has no transitive
dependencies.
Transitive Dependency:
One nonkey attribute
functionally depends on
another nonkey attribute.
What is the transitive
dependency in this example?
Tables in 3NF
Boyce-Codd Normal Form (BCNF)
If it is in 3NF and every determinant is a
candidate key.
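The rule can be checked from a list of functional dependencies by computing what each determinant determines. This sketch makes two assumptions. First, the dependencies are those of the SID/Activity/Fee relation used earlier in this chapter. Second, it tests whether each determinant determines every attribute (that is, whether it contains a key), which is slightly looser than the slide's "candidate key" wording. The function names are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List the dependencies whose determinant does not determine all attributes."""
    return [(lhs, rhs) for lhs, rhs in fds
            if closure(lhs, fds) != set(all_attrs)]


attrs = ("SID", "Activity", "Fee")
fds = [
    (("SID", "Activity"), ("Fee",)),
    (("Activity",), ("Fee",)),
]

print(bcnf_violations(attrs, fds))  # [(('Activity',), ('Fee',))]
```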
© 2000 Prentice Hall
Database Systems, 9th Edition
Fourth Normal Form (4NF)
If it is in BCNF and has no multi-valued
dependencies.
A multi-valued dependency occurs when one
key determines multiple values of two other
attributes, and those attributes are
independent of one another.
Given two independent attributes A and B:
Key →→ A
Key →→ B
Not in 4NF. Why?
SID   Major        Activity
100   Music        Swimming
100   Accounting   Swimming
100   Music        Tennis
100   Accounting   Tennis
150   Math         Jogging
Tables in 4NF
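Because Major and Activity are independent of each other, the relation can be split into one table per multi-valued fact. The sketch below uses the rows shown above; the table names `student_major` and `student_activity` are assumptions for this illustration. It also checks that joining the two tables on SID reproduces the original rows.

```python
rows = [
    (100, "Music", "Swimming"),
    (100, "Accounting", "Swimming"),
    (100, "Music", "Tennis"),
    (100, "Accounting", "Tennis"),
    (150, "Math", "Jogging"),
]

# One table per independent multi-valued fact about a student.
student_major = sorted({(sid, major) for sid, major, _ in rows})
student_activity = sorted({(sid, activity) for sid, _, activity in rows})

# Joining on SID pairs every major with every activity of the same student.
rejoined = sorted(
    (sid, major, activity)
    for sid, major in student_major
    for sid2, activity in student_activity
    if sid == sid2
)

print(student_major)             # [(100, 'Accounting'), (100, 'Music'), (150, 'Math')]
print(student_activity)          # [(100, 'Swimming'), (100, 'Tennis'), (150, 'Jogging')]
print(rejoined == sorted(rows))  # True
```

In the split tables, adding a third activity for student 100 takes one new row instead of one row per major.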
Summary of Normal Forms
• 1NF: Must meet the definition of a relation
• 2NF: No partial dependencies
• 3NF: No transitive dependencies
• BCNF: Every determinant is a candidate key
• 4NF: No multi-valued dependencies
• 5NF and DKNF: Not covered (of theoretical
interest only)
These normal forms are nested.
Dependency Diagram
A dependency diagram depicts all
dependencies found within a given table
structure
– Helps to get an overview of all
relationships among a table’s attributes
– Makes it less likely that an important
dependency will be overlooked
– The arrows on the top indicate that the
relation is in 1NF; that is, the primary
key determines all other attributes.
Solution
Example:
Using ER Diagramming and
Normalization Together
Employee (Employee Number, Last Name,
First Name, Job Class, Hourly Rate)
Employee Number   Last Name   First Name   Job Class    Hourly Rate
11                Smith       John         Mechanic     20
12                Jones       Susan        Technician   18
13                McKay       Bob          Mechanic     20
14                Owens       Paula        Clerk        15
15                Chang       Steve        Mechanic     20
16                Sarandon    Sarah        Mechanic     20
In this example, HourlyRate is dependent on JobClass.
What is the problem with this table?
Solution: Create Two Tables
Employee (Employee Number, Last Name,
First Name, Job Class ID)
Employee Number   Last Name   First Name   Job Class ID
11                Smith       John         2
12                Jones       Susan        3
13                McKay       Bob          2
14                Owens       Paula        1
15                Chang       Steve        2
16                Sarandon    Sarah        2
Job Class ID is the link to the Job Class
table.
Job Class (Job Class ID, Job Class,
Hourly Rate)
Job Class ID   Job Class    Hourly Rate
1              Clerk        15
2              Mechanic     20
3              Technician   18
There are no more transitive dependencies!
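The split can be derived from the original Employee rows. In this sketch the variable names are illustrative, and numbering the job classes in alphabetical order is an assumption that happens to reproduce the IDs shown in the slides.

```python
employees = [
    (11, "Smith", "John", "Mechanic", 20),
    (12, "Jones", "Susan", "Technician", 18),
    (13, "McKay", "Bob", "Mechanic", 20),
    (14, "Owens", "Paula", "Clerk", 15),
    (15, "Chang", "Steve", "Mechanic", 20),
    (16, "Sarandon", "Sarah", "Mechanic", 20),
]

# Collect one hourly rate per job class, checking that Job Class -> Hourly Rate.
rates = {}
for *_, job, rate in employees:
    if rates.setdefault(job, rate) != rate:
        raise ValueError(f"conflicting hourly rates for {job}")

# Job Class (Job Class ID, Job Class, Hourly Rate), numbered alphabetically.
job_class_table = [
    (i, job, rate) for i, (job, rate) in enumerate(sorted(rates.items()), start=1)
]

# Employee (Employee Number, Last Name, First Name, Job Class ID).
job_id = {job: i for i, job, _ in job_class_table}
employee_table = [
    (num, last, first, job_id[job]) for num, last, first, job, _ in employees
]

print(job_class_table)    # [(1, 'Clerk', 15), (2, 'Mechanic', 20), (3, 'Technician', 18)]
print(employee_table[0])  # (11, 'Smith', 'John', 2)
```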
De-Normalization
• Sometimes normalization is not worth
it. When a table is split into two or
more tables, the cost of the extra
processing (i.e., joins) may outweigh
the benefit.
• Controlled Redundancy: For
performance reasons, it is
sometimes appropriate to duplicate
data intentionally.
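The trade-off can be made concrete with the Employee and Job Class tables from the example above. This is a sketch with illustrative variable names: the normalized design needs a join to show an employee's hourly rate, while the denormalized copy answers directly but repeats the rate in every matching row.

```python
employee = [
    (11, "Smith", "John", 2), (12, "Jones", "Susan", 3), (13, "McKay", "Bob", 2),
    (14, "Owens", "Paula", 1), (15, "Chang", "Steve", 2), (16, "Sarandon", "Sarah", 2),
]
job_class = [(1, "Clerk", 15), (2, "Mechanic", 20), (3, "Technician", 18)]

# Normalized: reading a rate requires a join on Job Class ID.
by_id = {jid: (name, rate) for jid, name, rate in job_class}
denormalized = [
    (num, last, first, *by_id[jid]) for num, last, first, jid in employee
]
print(denormalized[0])  # (11, 'Smith', 'John', 'Mechanic', 20)

# Controlled redundancy has a price: changing the Mechanic rate touches
# one row in job_class but several rows in the denormalized copy.
rows_to_update_normalized = sum(1 for _, name, _ in job_class if name == "Mechanic")
rows_to_update_denormalized = sum(1 for row in denormalized if row[3] == "Mechanic")
print(rows_to_update_normalized, rows_to_update_denormalized)  # 1 4
```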