L11_normalization

CMPT 275
Phase: Design
Map of design phase
[Figure: map of the design phase. Labels shown: DESIGN; HIGH LEVEL DESIGN; LOW LEVEL DESIGN; architecture; Subsystem; Modularization; Module Interfaces; Data Persistence; User Interface; User Manual; Classes; Class Interfaces; Interaction Diagrams; Implementation]
Janice Regan, 2008
Implementation issues related to Data Persistence
NORMALIZATION
Relational DB Design
• We will structure our relational database table(s) using Normalization
  - a process of assigning attributes to tables
  - carried out as a series of stages called Normal Forms
    - 1st normal form: fixed length records
    - 2nd normal form: remove partial dependencies
    - 3rd normal form: remove transitive dependencies
• Advantages:
  - assures equal length records
  - reduces data redundancies, and hence helps eliminate the problems that result from redundancies
• Disadvantage:
  - performance can decrease as we normalize to higher forms, because higher forms require more tables (and therefore more joins to reassemble the data)
An example…
• To illustrate how to normalize, we shall use the example of a Student Registration System. Here are some requirements:
  1. For each student, we need to remember: student-id, name, address, phone, courses
  2. For each course taken, we must remember: credit, semester, grade, room, instructor, instructor’s office
  3. Students can repeat the same course in a later semester
  4. There is only one offering of a given course in a semester
  5. For each course attempted, we must remember: semester, grade, room, instructor, instructor’s office
OO Classes: Class diagram
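[Class diagram]
Student: Student ID, name, address, phone
Course offering: Course name, semester, room, Instructor, Instructor’s office
Course: Name, credit
List attributes held by Student and Course offering: List of courses, List of students, List of grades
Associations:
  Student 0..* takes 0..* Course offering (the student receives a grade for each)
  Course offering 0..* to 1 Course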
Our First Table
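• From our requirements, we could create the following database table (horizontal lines separate records, each representing a single student object):

Std-id | Std-name | Std-address     | Std-phone | Course-name | Sem  | Grade | Credit | Room | Instructor | Instructor’s office
-------+----------+-----------------+-----------+-------------+------+-------+--------+------+------------+--------------------
15438  | Paul K.  | Brook St., Bby  | 294.2563  | Cmpt 101    | 03-2 | C     | 4      | AQ2  | Dr. Klaus  | ASB985
       |          |                 |           | Cmpt 150    | 03-1 | B     | 3      | AQ1  | M. Nole    | ASB352
       |          |                 |           | Bus 152     | 03-2 | A     | 3      | ASB  | V. Karu    | WM543
-------+----------+-----------------+-----------+-------------+------+-------+--------+------+------------+--------------------
25636  | Will B.  | Elf Ave., Van.  | 256.2453  | Engl 102    | 03-1 | A+    | 2      | WM   | W. Loti    | AQ834
-------+----------+-----------------+-----------+-------------+------+-------+--------+------+------------+--------------------
47352  | Kim L.   | Merry Cr., Poco | 939.2766  | Biol 234    | 03-2 | B+    | 3      | EDC  | Dr. Quel   | EDC243
-------+----------+-----------------+-----------+-------------+------+-------+--------+------+------------+--------------------
21544  | Xiao T.  | Alpha St., Bby  | 295.9976  | Cmpt 354    | 03-1 | D     | 3      | AQ1  | Dr. Klaus  | ASB985
       |          |                 |           | Cmpt 354    | 03-2 | A-    | 3      | AQ2  | Dr. Yu     | ASB111

Each record is uniquely identified by the student number (Std-id). The primary key for the table is therefore Std-id. This table is unnormalized (it contains records of varying length).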
Our First Table: Is there a problem?
Attributes of one record: student-id, name, address, phone, course, credit, semester, grade, room, instructor, instructor’s office
  - one group of attributes is repeated for each course taken by 1 student
  - one group of attributes is repeated each time a particular course is attempted by 1 student
NOT ALL RECORDS ARE OF THE SAME LENGTH !!
• So far, attributes are in an unnormalized form.
  - Objects, transformed into DB records, will not all be of the same length. Each record contains all information about one student.
What is the problem?
• Problem: attributes that are lists (multiple courses per student, multiple attempts per course) do not produce fixed length records
• Solution: remove the lists by adding additional rows, one row to hold each entry in the list
• Consider the example: for a student, have 1 complete row per course attempted/taken
• This results in a table in First Normal Form
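The "one row per list entry" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the field names (`std_id`, `name`, `attempts`) and the function name `to_1nf` are hypothetical, and only a few of the example's attributes are carried along to keep it short.

```python
# Hypothetical in-memory form of two unnormalized student records:
# each record carries a variable-length list of course attempts.
students = [
    {"std_id": 15438, "name": "Paul K.",
     "attempts": [("Cmpt 101", "03-2", "C"),
                  ("Cmpt 150", "03-1", "B"),
                  ("Bus 152", "03-2", "A")]},
    {"std_id": 25636, "name": "Will B.",
     "attempts": [("Engl 102", "03-1", "A+")]},
]


def to_1nf(records):
    """Emit one fixed-length row per (student, course attempt)."""
    rows = []
    for rec in records:
        for course, semester, grade in rec["attempts"]:
            # Student attributes are repeated on every row of that student.
            rows.append((rec["std_id"], rec["name"], course, semester, grade))
    return rows


rows_1nf = to_1nf(students)
for row in rows_1nf:
    print(row)
```

Every emitted row has the same five fields, at the price of repeating the student's attributes once per attempt.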
First Normal Form
• Definition of First Normal Form (1NF):
  - Tables do not have repeating groups, i.e., each row/column intersection can contain one and only one value, not a set of values.
  - All the key attributes are defined; no blank (null) values of keys are permitted.
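The two conditions of the definition can be written as a small check. This is a hedged sketch, not a general 1NF test: it assumes rows are held as Python dictionaries, treats any list, tuple, set or dict in a cell as a repeating group, and the name `is_1nf` and the field names are made up for the illustration.

```python
def is_1nf(rows, key):
    """Return True if every cell holds a single value and no key attribute is null."""
    for row in rows:
        if any(isinstance(v, (list, tuple, set, dict)) for v in row.values()):
            return False  # repeating group: a cell holds several values
        if any(row.get(k) is None for k in key):
            return False  # blank (null) key attribute
    return True


key = ("std_id", "course", "semester")

# One record with list-valued cells (unnormalized).
unnormalized = [
    {"std_id": 15438, "course": ["Cmpt 101", "Cmpt 150"], "semester": ["03-2", "03-1"]},
]
# The same information with one row per list entry.
flat = [
    {"std_id": 15438, "course": "Cmpt 101", "semester": "03-2"},
    {"std_id": 15438, "course": "Cmpt 150", "semester": "03-1"},
]

print(is_1nf(unnormalized, key))  # False
print(is_1nf(flat, key))          # True
```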
Our example
Defining primary key attributes
• Which attributes are needed to ensure each record is uniquely identified?
• Std-id is not enough
  - a student can take multiple courses
• Std-id and course name are not enough
  - a student can take the same course more than once if they wish
• Std-id, course name and semester are enough
  - each time the student takes a course, it is uniquely identified as a single record in the table
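The reasoning above can be checked against the example data. The sketch below is illustrative only: the helper name `is_unique_key` is hypothetical, and uniqueness in a sample can only rule a candidate key out; whether a key is really sufficient follows from the requirements (here, one offering of a course per semester), not from the data.

```python
# (Std-id, Course-name, Semester) columns of the seven course attempts in the example.
rows = [
    (15438, "Cmpt 101", "03-2"),
    (15438, "Cmpt 150", "03-1"),
    (15438, "Bus 152", "03-2"),
    (25636, "Engl 102", "03-1"),
    (47352, "Biol 234", "03-2"),
    (21544, "Cmpt 354", "03-1"),
    (21544, "Cmpt 354", "03-2"),
]


def is_unique_key(rows, positions):
    """True if the chosen columns take a different combination of values in every row."""
    seen = {tuple(row[i] for i in positions) for row in rows}
    return len(seen) == len(rows)


print(is_unique_key(rows, [0]))        # False: a student takes several courses
print(is_unique_key(rows, [0, 1]))     # False: 21544 takes Cmpt 354 twice
print(is_unique_key(rows, [0, 1, 2]))  # True
```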
First Normal Form of Table (1NF)
Std-id | Std-name | Std-address     | Std-phone | Course-name | Semester | Grade | Credit | Room | Instructor | Instructor’s office
-------+----------+-----------------+-----------+-------------+----------+-------+--------+------+------------+--------------------
15438  | Paul K.  | Brook St., Bby  | 294.2563  | Cmpt 101    | 03-2     | C     | 4      | AQ2  | Dr. Klaus  | ASB985
15438  | Paul K.  | Brook St., Bby  | 294.2563  | Cmpt 150    | 03-1     | B     | 3      | AQ1  | M. Nole    | ASB352
15438  | Paul K.  | Brook St., Bby  | 294.2563  | Bus 152     | 03-2     | A     | 3      | ASB  | V. Karu    | WM543
25636  | Will B.  | Elf Ave., Van.  | 256.2453  | Engl 102    | 03-1     | A+    | 2      | WM   | W. Loti    | AQ834
47352  | Kim L.   | Merry Cr., Poco | 939.2766  | Biol 234    | 03-2     | B+    | 3      | EDC  | Dr. Quel   | EDC243
21544  | Xiao T.  | Alpha St., Bby  | 295.9976  | Cmpt 354    | 03-1     | D     | 3      | AQ1  | Dr. Klaus  | ASB985
21544  | Xiao T.  | Alpha St., Bby  | 295.9976  | Cmpt 354    | 03-2     | A     | 3      | AQ2  | Dr. Yu     | ASB111

Result: a single table with a compound (multi-attribute) primary key.
Primary Key: each row is uniquely identified by one single attribute.
Compound Primary Key: each row is uniquely identified by a group of attributes.
The compound primary key for the above table is: Std-id, Course-name, Semester
Is there still a problem?
Yes!
• Our table in First Normal Form could still contain data redundancies due to partial dependencies.
• Partial dependencies are based on only a part of the compound primary key.
• Consider an attribute A that is dependent on the compound primary key K
  - If A is dependent on all components of the compound primary key, then A is fully dependent on K
  - If A is dependent on some but not all of the components of the primary key, then A is partially dependent on K
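To make "A is dependent on some components of K" concrete, here is a small sketch that tests whether a set of attributes determines another attribute in the example rows. The names `COLUMNS`, `ROWS` and `determines` are hypothetical, and only some columns of the 1NF table are included. A `False` result refutes a dependency; a `True` result only means the sample does not contradict it, since real dependencies come from the requirements.

```python
COLUMNS = ("std_id", "course", "semester", "std_name", "credit", "grade")
ROWS = [
    (15438, "Cmpt 101", "03-2", "Paul K.", 4, "C"),
    (15438, "Cmpt 150", "03-1", "Paul K.", 3, "B"),
    (15438, "Bus 152", "03-2", "Paul K.", 3, "A"),
    (25636, "Engl 102", "03-1", "Will B.", 2, "A+"),
    (47352, "Biol 234", "03-2", "Kim L.", 3, "B+"),
    (21544, "Cmpt 354", "03-1", "Xiao T.", 3, "D"),
    (21544, "Cmpt 354", "03-2", "Xiao T.", 3, "A"),
]


def determines(lhs, attr):
    """True if, in ROWS, equal values of `lhs` always come with an equal value of `attr`."""
    seen = {}
    for row in ROWS:
        rec = dict(zip(COLUMNS, row))
        k = tuple(rec[c] for c in lhs)
        # setdefault stores the first value seen for k and returns it afterwards.
        if seen.setdefault(k, rec[attr]) != rec[attr]:
            return False
    return True


print(determines(("std_id",), "std_name"))        # True: depends on part of the key
print(determines(("course",), "credit"))          # True: depends on part of the key
print(determines(("std_id",), "grade"))           # False
print(determines(("std_id", "course"), "grade"))  # False: the semester is needed too
```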
Redundancy: Examples + problems
• Examples of redundancy and partial dependence:
  - For each course a student takes, the student’s name, address and phone number are repeated. A student’s name and address are dependent on the student’s id but not on the course name or semester.
  - For each course a student repeats, the course credit is repeated. The course credit is dependent on the course name but not on the student’s id or the semester.
Problems related to Redundancy
• Redundancy
  - Insert anomalies: e.g., each time a student takes a course, the student information must be entered again, which adds to the potential for error
  - Delete anomalies: if we delete the row where the info about Std-id 47352 is stored, we also delete info that cannot be found anywhere else in the DB table, namely that Dr. Quel’s office is EDC243
  - Update problems: because of redundant data, if a student moves, we need to change the student’s address in all rows corresponding to every instance of every course the student has ever taken. Problems occur if one occurrence is missed or an error is made in one occurrence
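The delete anomaly can be reproduced directly on the example rows. This is a minimal sketch with hypothetical names (`rows`, `known_offices`), keeping only the columns needed for the point.

```python
# (Std-id, Course-name, Semester, Instructor, Instructor's office) from the 1NF table.
rows = [
    (15438, "Cmpt 101", "03-2", "Dr. Klaus", "ASB985"),
    (15438, "Cmpt 150", "03-1", "M. Nole", "ASB352"),
    (15438, "Bus 152", "03-2", "V. Karu", "WM543"),
    (25636, "Engl 102", "03-1", "W. Loti", "AQ834"),
    (47352, "Biol 234", "03-2", "Dr. Quel", "EDC243"),
    (21544, "Cmpt 354", "03-1", "Dr. Klaus", "ASB985"),
    (21544, "Cmpt 354", "03-2", "Dr. Yu", "ASB111"),
]


def known_offices(rows):
    """Collect every instructor -> office fact still present in the table."""
    return {instructor: office for _, _, _, instructor, office in rows}


before = known_offices(rows)
# Deleting student 47352 removes the only row that mentions Dr. Quel.
after = known_offices([r for r in rows if r[0] != 47352])

print(before.get("Dr. Quel"))  # EDC243
print(after.get("Dr. Quel"))   # None
```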
Partial Dependencies
• Definition: non-key attribute(s) dependent on only part of the compound primary key
• Examples:
  - Phone #, Std-name, and address depend only on Std-id (not on course name or semester)
  - Credit depends only on course name (not on Std-id or semester)
  - Instructor, instructor’s office, and room depend on course name and semester (not on Std-id)
  - Grade, by contrast, depends on the whole key (Std-id, course name and semester), so it is not a partial dependency
• When an attribute is only partially dependent on the primary key of the table, there may be redundant occurrences of that attribute in the table
• Therefore, to remove redundancies we should remove partial dependencies
Problem with 1NF
• Non-key attribute(s) may depend on some but not all of the primary key attributes
• e.g.: the primary key attributes are Std-id, Course-name and Semester
  - address depends only on Std-id
From 1NF to 2NF
• Solution: remove partial dependencies
  - Determine if there are any partial dependencies.
  - If so, divide the 1NF table into several tables such that, in each table, each non-primary key attribute is dependent on the whole primary key (or compound primary key) of that table.
• Note that if the primary key of the 1NF table is not a compound primary key, there cannot be partial dependencies, and hence the 1NF table is already in 2NF.
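Dividing a table "such that each attribute sits with the key it depends on" amounts to projecting the 1NF rows onto column subsets and dropping duplicate rows. The sketch below assumes a subset of the example's columns; the names `project`, `COLUMNS` and `ROWS_1NF` are hypothetical, and the choice of which columns go together is supplied by hand from the dependencies rather than discovered by the code.

```python
COLUMNS = ("std_id", "course", "semester", "std_name", "credit", "room", "grade")
ROWS_1NF = [
    (15438, "Cmpt 101", "03-2", "Paul K.", 4, "AQ2", "C"),
    (15438, "Cmpt 150", "03-1", "Paul K.", 3, "AQ1", "B"),
    (15438, "Bus 152", "03-2", "Paul K.", 3, "ASB", "A"),
    (25636, "Engl 102", "03-1", "Will B.", 2, "WM", "A+"),
    (47352, "Biol 234", "03-2", "Kim L.", 3, "EDC", "B+"),
    (21544, "Cmpt 354", "03-1", "Xiao T.", 3, "AQ1", "D"),
    (21544, "Cmpt 354", "03-2", "Xiao T.", 3, "AQ2", "A"),
]


def project(rows, columns, keep):
    """Project rows onto `keep` and drop duplicate rows, preserving first-seen order."""
    idx = [columns.index(c) for c in keep]
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


# Each table pairs a key (or part of the original key) with what depends on it.
student = project(ROWS_1NF, COLUMNS, ("std_id", "std_name"))
course = project(ROWS_1NF, COLUMNS, ("course", "credit"))
offering = project(ROWS_1NF, COLUMNS, ("course", "semester", "room"))
registration = project(ROWS_1NF, COLUMNS, ("std_id", "course", "semester", "grade"))

print(len(student), len(course), len(offering), len(registration))  # 4 6 7 7
```

The student and course projections shrink because their repeated rows collapse, which is exactly the redundancy being removed.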
Second Normal Form - Example
Transform our 1NF table into 2NF tables
• STEP 1: We determine dependencies on each single primary key attribute:
  - Std-id: Phone #, Std-name, Std-address
  - Course-name: credit
  - Semester: none (no attribute is dependent only on semester)
2NF Example - Step 1
• DB tables look like:

Student Table
Std-id | Std-name | Std-address     | Std-phone
-------+----------+-----------------+----------
15438  | Paul K.  | Brook St., Bby  | 294.2563
25636  | Will B.  | Elf Ave., Van.  | 256.2453
47352  | Kim L.   | Merry Cr., Poco | 939.2766
21544  | Xiao T.  | Alpha St., Bby  | 295.9976

Course Table
Course-name | Credit
------------+-------
Cmpt 101    | 4
Cmpt 150    | 3
Bus 152     | 3
Engl 102    | 2
Biol 234    | 3
Cmpt 354    | 3
Second Normal Form - Example
• STEP 2: We determine dependencies on pairs of primary key attributes:
  - Course-name + Semester: room, instructor, instructor’s office
  - Course-name + Std-id: none
  - Semester + Std-id: none
2NF Example - Step 2
Course Offering Table
Course-name | Semester | Room | Instructor | Instructor’s office
------------+----------+------+------------+--------------------
Cmpt 101    | 03-2     | AQ2  | Dr. Klaus  | ASB985
Cmpt 150    | 03-1     | AQ1  | M. Nole    | ASB352
Bus 152     | 03-2     | ASB  | V. Karu    | WM543
Engl 102    | 03-1     | WM   | W. Loti    | AQ834
Biol 234    | 03-2     | EDC  | Dr. Quel   | EDC243
Cmpt 354    | 03-1     | AQ1  | Dr. Klaus  | ASB985
Cmpt 354    | 03-2     | AQ2  | Dr. Yu     | ASB111
Second Normal Form - Example
• STEP 3: We determine dependencies on the whole compound primary key:
  - Course-name + Semester + Std-id: grade
2NF Example - Step 3
Student Registration Table
Std-id | Course-name | Semester | Grade
-------+-------------+----------+------
15438  | Cmpt 101    | 03-2     | C
15438  | Cmpt 150    | 03-1     | B
15438  | Bus 152     | 03-2     | A
25636  | Engl 102    | 03-1     | A+
47352  | Biol 234    | 03-2     | B+
21544  | Cmpt 354    | 03-1     | D
21544  | Cmpt 354    | 03-2     | A
2NF Example, alternate Step 3-1
Introducing Association

• Looking at the data model (class diagram), we can recognize the “many-to-many” multiplicity relationship between Student, grade and Course Offering

[Class diagram]
Student: Student ID, name, address, phone
Course offering: Course name, semester, room, Instructor, Instructor’s office
Course: Name, credit
List attributes held by Student and Course offering: List of courses, List of students, List of grades
Associations:
  Student 0..* takes 0..* Course offering (the student receives a grade for each)
  Course offering 0..* to 1 Course

• These 3 list attributes are used to implement “many-to-many” multiplicity relationships
Association Class
• However, this data model does not lead to DB tables with records of fixed length, because these 3 attributes are of varying size for each object of the Student and Course Offering class types, so…
• … we introduce yet another “class” that associates 1 student with many attempts at (registrations for) courses, and 1 course offering with many attempts (registrations) by students. For each of these attempts there is one grade.
Association Class
• An association class takes a many-to-many relationship and breaks it into two 1-to-many relationships
  - Student and Student Registration have a “1-to-many” multiplicity relationship
  - Course Offering and Student Registration have a “1-to-many” multiplicity relationship
• The association class will contain the attributes that are lists (that cause the many-to-many relationship)
• The association class will contain the attributes that depend upon all the variables (lists) in the association class.
2NF Example, alternate Step 3 - 2
• This relationship can be broken down into 2 “1-to-many” multiplicity relationships by creating an association class Student-Registration

[Class diagram]
Student: Student ID, name, address, phone
Course offering: Course name, semester, room, Instructor, Instructor’s office
Course: Name, credit
Student Registration (association class): Student ID, Course name, semester, grade
Associations:
  Student 1 to 0..* Student Registration
  Course offering 1 to 0..* Student Registration
  Course 1 to 0..* Course offering
2NF Example, alternate Step 3 - 3
Course Offering Table
Course-name | Semester | Room | Instructor | Instructor’s office
------------+----------+------+------------+--------------------
Cmpt 101    | 03-2     | AQ2  | Dr. Klaus  | ASB985
Cmpt 150    | 03-1     | AQ1  | M. Nole    | ASB352
Bus 152     | 03-2     | ASB  | V. Karu    | WM543
Engl 102    | 03-1     | WM   | W. Loti    | AQ834
Biol 234    | 03-2     | EDC  | Dr. Quel   | EDC243
Cmpt 354    | 03-1     | AQ1  | Dr. Klaus  | ASB985
Cmpt 354    | 03-2     | AQ2  | Dr. Yu     | ASB111

Student Table
Std-id | Std-name | Std-address     | Std-phone
-------+----------+-----------------+----------
15438  | Paul K.  | Brook St., Bby  | 294.2563
25636  | Will B.  | Elf Ave., Van.  | 256.2453
47352  | Kim L.   | Merry Cr., Poco | 939.2766
21544  | Xiao T.  | Alpha St., Bby  | 295.9976

Course Table
Course-name | Credit
------------+-------
Cmpt 101    | 4
Cmpt 150    | 3
Bus 152     | 3
Engl 102    | 2
Biol 234    | 3
Cmpt 354    | 3

We can therefore store the attributes that depend on this association in a Student Registration table. The compound primary key of this table is the union of the primary keys of the Student and the Course Offering tables:

Student Registration Table
Std-id | Course-name | Semester | Grade
-------+-------------+----------+------
15438  | Cmpt 101    | 03-2     | C
15438  | Cmpt 150    | 03-1     | B
15438  | Bus 152     | 03-2     | A
25636  | Engl 102    | 03-1     | A+
47352  | Biol 234    | 03-2     | B+
21544  | Cmpt 354    | 03-1     | D
21544  | Cmpt 354    | 03-2     | A
Second Normal Form
• To get the Student Registration System into 2NF we need 4 tables (files)
  - Multiplicities come from our Requirement Analysis phase
  - With this 2NF DB, students do not have to register for a course to be admitted to an institution
  - Room and instructor for a course offering can be entered even if there are no students registered yet
  - Less redundancy: most update problems have been eliminated, but we can still have multiple occurrences of instructor and instructor’s office
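One possible way to express the four 2NF tables is as an SQLite schema, driven here from Python's standard `sqlite3` module. This is a sketch under assumptions, not the course's prescribed schema: the snake_case table and column names are illustrative renderings of the slide's names, the column types are guesses, and student id 99999 is a made-up value used only to show a rejected registration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE student (
    std_id INTEGER PRIMARY KEY, std_name TEXT, std_address TEXT, std_phone TEXT);
CREATE TABLE course (
    course_name TEXT PRIMARY KEY, credit INTEGER);
CREATE TABLE course_offering (
    course_name TEXT REFERENCES course(course_name),
    semester TEXT, room TEXT, instructor TEXT, instructor_office TEXT,
    PRIMARY KEY (course_name, semester));
CREATE TABLE student_registration (
    std_id INTEGER REFERENCES student(std_id),
    course_name TEXT, semester TEXT, grade TEXT,
    PRIMARY KEY (std_id, course_name, semester),
    FOREIGN KEY (course_name, semester)
        REFERENCES course_offering(course_name, semester));
""")

# A student can exist without any registration, and an offering without any students.
conn.execute("INSERT INTO student VALUES (25636, 'Will B.', 'Elf Ave., Van.', '256.2453')")
conn.execute("INSERT INTO course VALUES ('Engl 102', 2)")
conn.execute("INSERT INTO course_offering VALUES ('Engl 102', '03-1', 'WM', 'W. Loti', 'AQ834')")

# A registration for an unknown student (hypothetical id 99999) is refused.
try:
    conn.execute("INSERT INTO student_registration VALUES (99999, 'Engl 102', '03-1', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

n_registrations = conn.execute("SELECT COUNT(*) FROM student_registration").fetchone()[0]
print(rejected, n_registrations)  # True 0
```

The compound primary key of `student_registration` is the union of the keys of `student` and `course_offering`, mirroring the association class.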
Second Normal Form
• Definition of 2NF:
  - The table is in 1NF
  - The table includes no partial dependencies
Is there still a problem?
Yes!
• Our tables in 2NF could still contain data redundancies due to transitive dependencies.
• When one non-primary key attribute is dependent on another non-primary key attribute, the first attribute depends on the primary key only indirectly, through the second. This is a transitive dependency.
Transitive Dependencies: example
• Instructor’s office (a non-primary key attribute) depends directly on instructor (another non-primary key attribute) rather than on the primary key attributes of that table (course name and semester)
• Solution: conversion from 2NF to 3NF
  - Determine the transitive dependencies.
  - Split the 2NF table containing the transitive dependency so that the dependency is represented by its own table.
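The split can be sketched on the Course Offering rows of the example: the instructor-to-office dependency moves into its own lookup, and joining the two pieces back together reproduces the original rows, so nothing is lost. The variable names are hypothetical, and the sketch assumes (as the example data does) that each instructor has exactly one office.

```python
# 2NF Course Offering rows: (course, semester, room, instructor, instructor's office)
offering_2nf = [
    ("Cmpt 101", "03-2", "AQ2", "Dr. Klaus", "ASB985"),
    ("Cmpt 150", "03-1", "AQ1", "M. Nole", "ASB352"),
    ("Bus 152", "03-2", "ASB", "V. Karu", "WM543"),
    ("Engl 102", "03-1", "WM", "W. Loti", "AQ834"),
    ("Biol 234", "03-2", "EDC", "Dr. Quel", "EDC243"),
    ("Cmpt 354", "03-1", "AQ1", "Dr. Klaus", "ASB985"),
    ("Cmpt 354", "03-2", "AQ2", "Dr. Yu", "ASB111"),
]

# 3NF: the office leaves the offering table ...
offering_3nf = [row[:4] for row in offering_2nf]
# ... and the transitive dependency instructor -> office gets its own table.
instructor_table = {row[3]: row[4] for row in offering_2nf}

# Joining on instructor rebuilds the 2NF rows, so the split loses no information.
rejoined = [row + (instructor_table[row[3]],) for row in offering_3nf]

print(len(instructor_table))     # 6: Dr. Klaus's office is now stored once
print(rejoined == offering_2nf)  # True
```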
3NF Example
Course Offering Table
Course-name | Semester | Room | Instructor
------------+----------+------+-----------
Cmpt 101    | 03-2     | AQ2  | Dr. Klaus
Cmpt 150    | 03-1     | AQ1  | M. Nole
Bus 152     | 03-2     | ASB  | V. Karu
Engl 102    | 03-1     | WM   | W. Loti
Biol 234    | 03-2     | EDC  | Dr. Quel
Cmpt 354    | 03-1     | AQ1  | Dr. Klaus
Cmpt 354    | 03-2     | AQ2  | Dr. Yu

Course Table
Course-name | Credit
------------+-------
Cmpt 101    | 4
Cmpt 150    | 3
Bus 152     | 3
Engl 102    | 2
Biol 234    | 3
Cmpt 354    | 3

Student Table
Std-id | Std-name | Std-address     | Std-phone
-------+----------+-----------------+----------
15438  | Paul K.  | Brook St., Bby  | 294.2563
25636  | Will B.  | Elf Ave., Van.  | 256.2453
47352  | Kim L.   | Merry Cr., Poco | 939.2766
21544  | Xiao T.  | Alpha St., Bby  | 295.9976

Instructor Table
Instructor | Instructor’s office
-----------+--------------------
Dr. Klaus  | ASB985
M. Nole    | ASB352
V. Karu    | WM543
W. Loti    | AQ834
Dr. Quel   | EDC243
Dr. Yu     | ASB111

Student Registration Table
Std-id | Course-name | Semester | Grade
-------+-------------+----------+------
15438  | Cmpt 101    | 03-2     | C
15438  | Cmpt 150    | 03-1     | B
15438  | Bus 152     | 03-2     | A
25636  | Engl 102    | 03-1     | A+
47352  | Biol 234    | 03-2     | B+
21544  | Cmpt 354    | 03-1     | D
21544  | Cmpt 354    | 03-2     | A
Third Normal Form
• Definition:
  - Every table is in 2NF.
  - There are no transitive dependencies.
Normalization Summary
• When normalizing, we seek to make sure that attributes depend
  - on the key (1NF)
  - on the whole key (2NF)
  - on nothing but the key (3NF)
• When normalized:
  - records have fixed length
  - no insert/delete/update anomalies
  - minimize redundancy
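The performance cost of higher normal forms noted earlier shows up when a full row of the original wide table has to be reassembled: the data now lives in five tables, so the query needs four joins. The sketch below uses Python's standard `sqlite3` module with illustrative table and column names (assumptions, not the course's schema) and loads only Xiao T.'s two attempts at Cmpt 354.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (std_id INTEGER PRIMARY KEY, std_name TEXT);
CREATE TABLE course (course_name TEXT PRIMARY KEY, credit INTEGER);
CREATE TABLE instructor (instructor TEXT PRIMARY KEY, office TEXT);
CREATE TABLE course_offering (
    course_name TEXT, semester TEXT, room TEXT, instructor TEXT,
    PRIMARY KEY (course_name, semester));
CREATE TABLE student_registration (
    std_id INTEGER, course_name TEXT, semester TEXT, grade TEXT,
    PRIMARY KEY (std_id, course_name, semester));

INSERT INTO student VALUES (21544, 'Xiao T.');
INSERT INTO course VALUES ('Cmpt 354', 3);
INSERT INTO instructor VALUES ('Dr. Klaus', 'ASB985'), ('Dr. Yu', 'ASB111');
INSERT INTO course_offering VALUES
    ('Cmpt 354', '03-1', 'AQ1', 'Dr. Klaus'),
    ('Cmpt 354', '03-2', 'AQ2', 'Dr. Yu');
INSERT INTO student_registration VALUES
    (21544, 'Cmpt 354', '03-1', 'D'),
    (21544, 'Cmpt 354', '03-2', 'A');
""")

# Four joins are needed to rebuild the rows of the single 1NF table.
query = """
SELECT r.std_id, s.std_name, r.course_name, r.semester, r.grade,
       c.credit, o.room, o.instructor, i.office
FROM student_registration AS r
JOIN student AS s ON s.std_id = r.std_id
JOIN course AS c ON c.course_name = r.course_name
JOIN course_offering AS o
     ON o.course_name = r.course_name AND o.semester = r.semester
JOIN instructor AS i ON i.instructor = o.instructor
ORDER BY r.semester
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```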