Database

The Relational Model and
Normalization
The Relational Model



Page 113
Broad, flexible model
Basis for almost all DBMS products
E.F. Codd defined well-structured
“normal forms” of relations; the process of
reaching them is “normalization”
Relational Data Model


A relational data model organizes data as a
set of relations, or two-dimensional tables.
A relation is viewed as a two-dimensional
table with the following properties:




Each column contains values about the same
attribute, and each table cell must be simple
Each column has a distinct name (attribute name),
and the order of columns is immaterial
Each row is distinct; duplicate rows are not
allowed
The sequence of the rows is immaterial
An Example Relation
(Key)            (Candidate Key)  (Foreign Key)      (Non-key Attribute)  (Non-key Attribute)
Employee Number  Employee Name    Department Number  Salary               Date Started
28719            Smith Tom        172                18,000               12/03/84
53730            Jones Bill       044                20,000               01/05/83
79313            Ropley Ed        044                11,000               18/09/81
51616            Fair Carolyn     090                50,000               05/12/79
61930            Hall Albert      090                25,000               21/06/82
Terminology in a Relation



Tuple - a row or record
Column - values of an attribute
Domain - a set of possible values for an
attribute
Terminology in a Relation

Key



Primary key (unique ID)
Concatenated key - use two or more attributes to
identify a record (e.g., Student ID & Course ID to
identify a Grade record)
Foreign key (cross reference key)

A foreign key is a non-key attribute in one relation
that also appears as a primary key in another
relation
An E-R Model for
Student Registration System
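[Figure: E-R diagram. Entities and attributes: Course (Course Number, Instructor ID, Description, Room), Instructor (Name, Rank), Student (Student Number, Student Name, Major), Course Enrollment (Course Number, Student Number, Grade). Relationships: Teaches and Advises, each 1:M from Instructor; Course Enrollment links Course and Student.]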
Convert E-R Model to Relational
Tables
Create one table for each entity with
key and attributes
Introduce a foreign key into the “many”
side to represent a 1:m relationship

A Relational Model For Student
Registration System
Course Table (Course ID, Description, Credit, Instructor ID)
Instructor Table (Instructor ID, Instructor Name, Rank)
Student Table (Student ID, Student Name, Major, Advisor ID)
Enrollment Table (Course ID, Student ID, Grade)
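
A minimal sketch of these four tables, using Python's built-in sqlite3 module. The snake_case table and column names, the column types, and the placement of the advisor foreign key in the student table (following the 1:M Advises relationship) are assumptions for illustration, not part of the slides.

```python
import sqlite3

# Hypothetical names and types; advisor_id is assumed to live in the student
# table because Advises is a 1:M relationship from Instructor to Student.
SCHEMA = """
CREATE TABLE instructor (
    instructor_id   INTEGER PRIMARY KEY,
    instructor_name TEXT NOT NULL,
    rank            TEXT
);
CREATE TABLE course (
    course_id     TEXT PRIMARY KEY,
    description   TEXT,
    credit        INTEGER,
    instructor_id INTEGER REFERENCES instructor(instructor_id)
);
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,
    student_name TEXT NOT NULL,
    major        TEXT,
    advisor_id   INTEGER REFERENCES instructor(instructor_id)
);
CREATE TABLE enrollment (
    course_id  TEXT    REFERENCES course(course_id),
    student_id INTEGER REFERENCES student(student_id),
    grade      TEXT,
    PRIMARY KEY (course_id, student_id)  -- concatenated key
);
"""

def build_registration_db():
    """Create an in-memory database with the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not check FKs by default
    conn.executescript(SCHEMA)
    return conn

def try_insert(conn, sql, params):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With foreign keys enabled, inserting a course whose instructor_id does not exist in the instructor table is rejected, which is the "many" side foreign key doing its job.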
Relational Database

Advantages





Easy to understand and use
Powerful data manipulation capability
Implicit associations meet different needs; flexible, best
for decision support systems (DSS)
Normalization theory for database design
Disadvantages



Keys are stored redundantly as logical pointers to implement
relationships
Inefficiency for high-volume transaction processing
Lack of semantic quality control
Equivalent Relational Terms
Page 114
Figure 5-1
© 2000 Prentice Hall
Normalization


Reduce complex user views to a set of
small, stable data structures
Eliminate errors and inconsistencies
related to the adding, deleting or
updating of record occurrences
Modification Anomalies



Insertion anomalies - cannot add a record
because of a missing value for one or more
fields
Deletion anomalies - the deletion of a record
causes an unintended deletion of information
Update anomalies - updating is made
needlessly complicated due to redundancy
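
A small illustration of two of these anomalies on an un-normalized table held as Python tuples. The sample rows and helper names are hypothetical.

```python
# Hypothetical un-normalized rows: (student_id, course_id, course_title, grade).
ROWS = [
    (1, "C1", "Databases", "A"),
    (2, "C1", "Databases", "B"),
    (2, "C2", "Networks", "A"),
]

def rename_course(rows, course_id, new_title):
    """Update anomaly: the title is repeated per enrollment, so every copy changes."""
    updated, changed = [], 0
    for student_id, cid, title, grade in rows:
        if cid == course_id:
            updated.append((student_id, cid, new_title, grade))
            changed += 1
        else:
            updated.append((student_id, cid, title, grade))
    return updated, changed

def drop_student(rows, student_id):
    """Delete every row of one student."""
    return [row for row in rows if row[0] != student_id]

def known_courses(rows):
    """Course IDs the table still knows about."""
    return {cid for _, cid, _, _ in rows}
```

Renaming course C1 touches two rows instead of one, and dropping student 2 also erases the only record that course C2 exists (a deletion anomaly).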
Functional Dependence

Given a relation R, attribute Y of R is
functionally dependent on attribute X of
R if and only if, whenever two tuples of
R agree on their X- value, they must
necessarily agree on their Y-value.
We write R.X --> R.Y
Example:
(Student ID, Student Name, Course ID, Course Title, Grade)
Student ID --> Student Name,
Course ID --> Course Title
Student ID -?-> Course ID
Course Title -?-> Student Name
Student ID -?-> Grade
Course ID -?-> Grade
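
The definition can be tested against sample data with a short Python function. This is a sketch: the sample rows are invented, and a sample can only refute a dependency, never prove that it holds for all possible data.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, tuples that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample of the (Student ID, Student Name, Course ID, Course Title, Grade) relation.
GRADES = [
    {"student_id": 1, "student_name": "Ann", "course_id": "C1",
     "course_title": "Databases", "grade": "A"},
    {"student_id": 1, "student_name": "Ann", "course_id": "C2",
     "course_title": "Networks", "grade": "B"},
    {"student_id": 2, "student_name": "Bob", "course_id": "C1",
     "course_title": "Databases", "grade": "B"},
]
```

On this sample, Student ID --> Student Name and Course ID --> Course Title hold, while the four dependencies marked -?-> are all refuted; only the combination Student ID + Course ID determines Grade.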
Normal Forms

A relation is said to be in a particular
normal form if it satisfies a certain
specified set of constraints
Normal Forms
1 NF (no repeating groups)
2 NF (no partial dependencies)
3 NF (no transitive dependencies)
Boyce-Codd NF
4 NF (no multi-value dependencies)
5 NF
Domain-Key NF
First Normal Form

A relation is in first normal form if it contains no
repeating groups
First Normal Form

An un-normalized relation contains repeating
groups
First Normal Form


Grade Report with repeating group of courses for each
student
(Student ID, Student Name, Campus Address, Major,
Course ID, Course Title, Instructor Name, Instructor
Location, Grade)
Remove repeating group
(Student ID, Student Name, Campus Address, Major)
(3NF)
(Student ID, Course ID, Course Title, Instructor Name,
Instructor Location, Grade) (1NF)
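
Removing the repeating group can be sketched in Python as splitting each nested report record into one student row plus one row per course. The record layout and sample values are assumptions for illustration.

```python
def flatten_grade_report(report):
    """Split nested grade-report records into two flat relations."""
    students, enrollments = [], []
    for rec in report:
        students.append(
            (rec["student_id"], rec["student_name"], rec["campus_address"], rec["major"])
        )
        # Each entry of the repeating group becomes its own row, keyed by
        # Student ID + Course ID.
        for c in rec["courses"]:
            enrollments.append(
                (rec["student_id"], c["course_id"], c["course_title"],
                 c["instructor_name"], c["instructor_location"], c["grade"])
            )
    return students, enrollments

# Hypothetical report with one student and a repeating group of two courses.
REPORT = [
    {"student_id": 1, "student_name": "Ann", "campus_address": "Hall 1", "major": "MIS",
     "courses": [
         {"course_id": "C1", "course_title": "Databases",
          "instructor_name": "Lee", "instructor_location": "B104", "grade": "A"},
         {"course_id": "C2", "course_title": "Networks",
          "instructor_name": "Kim", "instructor_location": "C220", "grade": "B"},
     ]},
]
```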
Second Normal Form

A relation is in second normal form
if it is already in first normal form
and any partial functional
dependencies on the primary key
have been removed
Second Normal Form
[Figure: a relation with attributes A, B, C, D that has partial functional dependencies on the primary key, and its decomposition into two relations.]
Second Normal Form


(Student ID, Course ID, Course Title, Instructor Name,
Instructor Location, Grade) (1NF)
Primary key is Student ID + Course ID
Student ID + Course ID --> Grade
Course ID --> Course Title (partial dependency)
Removing partial dependencies
(Student ID, Course ID, Grade) (3NF)
(Course ID, Course Title, Instructor Name, Instructor
Location ) (2NF)
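
Spotting partial dependencies is mechanical once the key and the functional dependencies are written down. The sketch below only inspects the dependencies as listed (it does not derive implied ones), and the function name is hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return listed FDs whose determinant is a proper subset of the primary key
    and that determine at least one non-key attribute."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        if lhs < key and rhs - key:  # "<" is the proper-subset test
            found.append((sorted(lhs), sorted(rhs - key)))
    return found

# Dependencies of the 1NF relation, as stated on the slides.
KEY = {"Student ID", "Course ID"}
FDS = [
    ({"Student ID", "Course ID"}, {"Grade"}),
    ({"Course ID"}, {"Course Title", "Instructor Name"}),
    ({"Instructor Name"}, {"Instructor Location"}),
]
```

Only the dependency on Course ID alone is reported, which is why the course attributes move to their own relation while Grade stays with the full key.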
Third Normal Form


A relation is in third normal form if it
is already in second normal form and
contains no transitive dependencies
transitive dependency - One nonkey
attribute is dependent on one or
more nonkey attributes
Third Normal Form
[Figure: a relation with attributes A, B, C, D that has transitive dependencies, and its decomposition into two relations.]
Third Normal Form


(Course ID, Course Title, Instructor Name, Instructor
Location ) (2NF)
Course ID --> Instructor Name --> Instructor Location
Instructor Name is nonkey
Instructor Location is dependent on Instructor Name
Remove transitive dependency
(Course ID, Course Title, Instructor Name) (3NF)
(Instructor Name, Instructor Location ) (3NF)
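
A sketch of this decomposition in Python: project the 2NF relation onto the two 3NF schemas, then check that a natural join gives back the original rows. The sample data and helper names are hypothetical.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

def natural_join(left, right):
    """Join two lists of dict rows on all attributes they share."""
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result

# Hypothetical rows of (Course ID, Course Title, Instructor Name, Instructor Location).
COURSES = [
    {"course_id": "C1", "course_title": "Databases",
     "instructor_name": "Lee", "instructor_location": "B104"},
    {"course_id": "C2", "course_title": "Networks",
     "instructor_name": "Lee", "instructor_location": "B104"},
    {"course_id": "C3", "course_title": "Statistics",
     "instructor_name": "Kim", "instructor_location": "C220"},
]

course = project(COURSES, ["course_id", "course_title", "instructor_name"])
instructor = project(COURSES, ["instructor_name", "instructor_location"])
rejoined = natural_join(course, instructor)
```

The instructor relation stores each location once instead of once per course, and the join recovers the original rows without loss.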
Third Normal Form
“if it is in second normal form and has no
transitive dependencies”
Figure 5-7
Practice: Mountain View
Community Hospital
Mountain View Community Hospital
Physician Report
Physician: A Campbell        Specialty: Internal Medicine

Date      Patient-Code  Patient-Name     Procedure     Charge
----------------------------------------------------------------
10/17/96  32968         Baker, Marry S.  Examination   35.00
                                         X-ray         75.00
10/17/96  39271         Emery, Nancy     Examination   35.00
                                         Chemotherapy  50.00
10/18/96  32968         Baker, Marry S.  Examination   35.00
----------------------------------------------------------------
Normalize a table
Report (Doctor Name, Specialty, Date, Patient Code,
Patient Name, Procedure Name, Charge)
Analyzing functional dependency:
Assume no duplicate Doctor Name. Otherwise
introduce a Doctor ID
Assume no duplicate Procedure Name. Otherwise
introduce a Procedure code
Assume charge is determined by procedure.
Assume a patient may visit a doctor more than once
during the same day.
Answer





Doctors (Doctor ID, Doctor Name, Specialty)
Patients (Patient Code, Patient Name)
Visit (Visit ID, Doctor ID, Patient Code, Date)
Treatment (Visit ID, Procedure ID)
Procedure (Procedure ID, Procedure Name, Charge)
Here the Visit ID is automatically generated by the
system
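
As a check on this answer, the sketch below loads the physician report into the five relations with Python's sqlite3 module and rebuilds the report lines with a join. The table and column names and the Doctor ID and Procedure ID values are made up for illustration; the report data comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctors    (doctor_id INTEGER PRIMARY KEY, doctor_name TEXT, specialty TEXT);
CREATE TABLE patients   (patient_code INTEGER PRIMARY KEY, patient_name TEXT);
CREATE TABLE procedures (procedure_id INTEGER PRIMARY KEY, procedure_name TEXT, charge REAL);
CREATE TABLE visit      (visit_id INTEGER PRIMARY KEY,  -- generated by the system
                         doctor_id INTEGER REFERENCES doctors(doctor_id),
                         patient_code INTEGER REFERENCES patients(patient_code),
                         visit_date TEXT);
CREATE TABLE treatment  (visit_id INTEGER REFERENCES visit(visit_id),
                         procedure_id INTEGER REFERENCES procedures(procedure_id),
                         PRIMARY KEY (visit_id, procedure_id));

INSERT INTO doctors VALUES (1, 'A Campbell', 'Internal Medicine');
INSERT INTO patients VALUES (32968, 'Baker, Marry S.'), (39271, 'Emery, Nancy');
INSERT INTO procedures VALUES (1, 'Examination', 35.00), (2, 'X-ray', 75.00),
                              (3, 'Chemotherapy', 50.00);
INSERT INTO visit VALUES (1, 1, 32968, '10/17/96'), (2, 1, 39271, '10/17/96'),
                         (3, 1, 32968, '10/18/96');
INSERT INTO treatment VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 1);
""")

# Rebuild the report lines by joining the normalized tables.
report = conn.execute("""
    SELECT v.visit_date, p.patient_code, p.patient_name, pr.procedure_name, pr.charge
    FROM treatment t
    JOIN visit v       ON v.visit_id = t.visit_id
    JOIN patients p    ON p.patient_code = v.patient_code
    JOIN procedures pr ON pr.procedure_id = t.procedure_id
    ORDER BY v.visit_id, pr.procedure_id
""").fetchall()
```

Each charge is stored once in the procedures table, yet the join still produces one report line per treatment.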
An E-R Model for
Hospital Treatment Charge
[Figure: E-R diagram. Entities and attributes: Procedure (Procedure ID, Description, Rate), Doctors (Doctor ID, Name, Specialty), Patients (Patient Code, Patient Name), Visit (Visit ID, Doctor ID, Patient Code, Date/Time), Treatment (Visit ID, Procedure ID). The entities are connected by 1:M relationships.]
E-R model improvement criteria
vs. Normalization Theory



Each entity must have a key (simple or
composite) (basic requirement of a relational
table)
Introduce a composite entity to convert an m:n
relationship into two 1:m relationships. Introduce a
composite key (the way of representing m:n
relationships in a relational database)
Convert a multivalued attribute into an
attribute entity or weak entity (1 NF)
E-R model improvement criteria
vs. Normalization theory
Make each entity represent a simple object
or concept (2 NF and 3 NF)
Divide complex entity into several related
simple entities (2 NF and 3 NF)
Make each attribute associate with only
one entity unless it is a foreign key (3 NF)
A good E-R model usually satisfies 3 NF.

Boyce-Codd Normal Form
“if every determinant is a candidate key”
Figure 5-8
Boyce-Codd Normal Form
(Student, Major, Advisor) (3NF)
or (Student, Advisor, Major) (1NF)
Student may have more than one major with
one advisor in each major
Student + Major --> Advisor
Student + Advisor --> Major
Advisor --> Major (Advisor determines Major, but Advisor
is not a candidate key)

(Student, Advisor) (BCNF)
(Advisor, Major) (BCNF)
Boyce-Codd Normal Form



A relation is in BCNF if and only if it is in 3NF and
every determinant is a candidate key
A determinant is any attribute (simple or
composite) on which some other attribute is fully
functionally dependent
Situations in which a 3NF relation may fail BCNF:
1. There are multiple candidate keys
2. Those candidate keys are composite
3. The candidate keys overlap
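
The "every determinant is a candidate key" test can be sketched with attribute closure, applied here to the Student, Major, Advisor example. As a simplifying assumption, the check only considers dependencies whose attributes all lie inside the relation being tested; the function names are hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs inside relation whose determinant is not a superkey."""
    rel = set(relation)
    # Simplification: keep only FDs that mention attributes of this relation alone.
    local = [(l, r) for l, r in fds if set(l) | set(r) <= rel]
    return [(l, r) for l, r in local
            if not set(r) <= set(l) and not rel <= closure(l, local)]

FDS = [
    (("Student", "Major"), ("Advisor",)),
    (("Advisor",), ("Major",)),
]
```

For (Student, Major, Advisor) the check reports Advisor --> Major, because Advisor alone does not determine Student; the two decomposed relations report no violations.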
Fourth Normal Form


A relation is in fourth normal form if it is in
BCNF and contains no multivalued
dependencies
Multivalued Dependency



There are three attributes (e.g. A, B, C) in a
relation.
For each value of A there is a well-defined set of
values of B and a well-defined set of values of C.
The set of values of B is independent of the set of
values of C, and vice versa.
Fourth Normal Form


(Course, Instructor, Textbook) (BCNF)
One course is taught by several
instructors
Each instructor of a course uses the same
set of textbooks
(Course, Textbook) (4NF)
(Course, Instructor) (4NF)
Fourth Normal Form
Course  Instructor  Textbook
1ka3    David       Intro. Web design
1ka3    Smith       Intro. Web design
1ka3    David       Intro. Access
1ka3    Smith       Intro. Access

Course  Instructor        Course  Textbook
1ka3    David             1ka3    Intro. Web design
1ka3    Smith             1ka3    Intro. Access
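
The independence of instructors and textbooks can be checked directly: for each course, the stored (instructor, textbook) pairs must equal the full cross product of that course's instructors and textbooks. A minimal Python sketch using the rows from the slide; the function name is hypothetical.

```python
from itertools import product

def mvd_holds(rows):
    """True if Course ->> Instructor (and so Course ->> Textbook) holds in rows,
    where each row is a (course, instructor, textbook) tuple."""
    by_course = {}
    for course, instructor, textbook in rows:
        insts, books, pairs = by_course.setdefault(course, (set(), set(), set()))
        insts.add(instructor)
        books.add(textbook)
        pairs.add((instructor, textbook))
    return all(pairs == set(product(insts, books))
               for insts, books, pairs in by_course.values())

ROWS = {
    ("1ka3", "David", "Intro. Web design"),
    ("1ka3", "Smith", "Intro. Web design"),
    ("1ka3", "David", "Intro. Access"),
    ("1ka3", "Smith", "Intro. Access"),
}

# The two 4NF projections and their join on Course.
course_instructor = {(c, i) for c, i, _ in ROWS}
course_textbook = {(c, t) for c, _, t in ROWS}
rejoined = {(c, i, t) for c, i in course_instructor
            for c2, t in course_textbook if c == c2}
```

Because the multivalued dependency holds, the two-row projections join back to exactly the four original rows.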
Fifth Normal Form
?
Page 125
Fifth Normal Form



Every join dependency is a
consequence of its relation keys
A non 5NF: Person-using-skills-on-jobs
(Person, Skill, Job)
5 NF: Has-skill (Person, Skill)
Need-skill (Skill, Job)
Assigned-to-job (Person, Job)
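
Splitting (Person, Skill, Job) into these three relations is only safe when joining them back gives exactly the original tuples. The sketch below tests that on sample data; the people, skills and jobs are invented for illustration.

```python
def satisfies_join_dependency(rows):
    """True if joining the three binary projections reproduces rows exactly."""
    has_skill = {(p, s) for p, s, j in rows}
    need_skill = {(s, j) for p, s, j in rows}
    assigned_to_job = {(p, j) for p, s, j in rows}
    rejoined = {
        (p, s, j)
        for p, s in has_skill
        for s2, j in need_skill
        if s == s2 and (p, j) in assigned_to_job
    }
    return rejoined == set(rows)

# Hypothetical data: the join would add the spurious tuple (Smith, Welding, J2).
LOSSY = {
    ("Smith", "Welding", "J1"),
    ("Smith", "Wiring", "J2"),
    ("Jones", "Welding", "J2"),
}
# With that tuple present, the join dependency holds and the split is lossless.
LOSSLESS = LOSSY | {("Smith", "Welding", "J2")}
```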
Domain Key Normal Form
“if every constraint on the relation is a
logical consequence of the definition of
keys and domains”
Constraint “a rule governing static
values of attributes”

Key “unique identifier of a tuple”
Domain “description of an attribute’s
allowed values”
Page 125

Example of non DK/NF





Enrollment (Student ID, Course ID, Grade)
Key constraint: Student ID + Course ID --> Grade
Domain constraint:
Student ID: 7 digits, Course ID: 3 digits, Grade:
A,B,C,D,F,P
General constraint
If Course ID < 900 then Grade in {A,B,C,D,F}
else Grade in {P,F}
Since the general constraint cannot be inferred
from the key constraint or the domain constraints, the
relation is not in DK/NF.
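
The gap between the domain constraints and the general constraint can be made concrete in Python. Representing the IDs as digit strings is an assumption, as are the function names.

```python
import re

def domain_ok(student_id, course_id, grade):
    """Domain constraints only: 7-digit student ID, 3-digit course ID, valid grade."""
    return (re.fullmatch(r"[0-9]{7}", student_id) is not None
            and re.fullmatch(r"[0-9]{3}", course_id) is not None
            and grade in {"A", "B", "C", "D", "F", "P"})

def general_ok(course_id, grade):
    """General constraint from the slide: letter grades below 900, pass/fail otherwise."""
    if int(course_id) < 900:
        return grade in {"A", "B", "C", "D", "F"}
    return grade in {"P", "F"}
```

A row such as ("1234567", "101", "P") satisfies every domain constraint and can satisfy the key constraint, yet it breaks the general rule, so that rule needs separate enforcement.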
Remarks on Normalization


The notions of dependency and
normalization are semantic in nature
The normalization guidelines should be
regarded primarily as a discipline to
guide database design
Limitations of normalization




May not be natural, e.g., zip code, area code for
phone #
May ignore operational considerations: some values need not
change, others may change over time, e.g. (Order#,
Prod#, Description, Unit-price, Quantity)
Difficult to enforce integrity control
(Order#, Prod#, Quantity)
(Prod#, Description, Unit-price)
Prod# may not be valid.
Integrity control is now provided by the relational
DBMS
Denormalization



Normalization is only one of many database
design goals.
Normalized (decomposed) tables require
additional processing, reducing system speed.
Normalization purity is often difficult to
sustain in the modern database environment.
The conflicts among design efficiency,
information requirements, and processing
speed are often resolved through
compromises that include denormalization.