CmpE226-DB-L05n1-Nor.. - Charles W. Davidson College of

CSIS 254
Oracle Normalization
Relational Databases
(Review)
• In relational databases, all data is stored in tables,
which correspond roughly to entities
• Each table is two-dimensional, consisting of rows
and columns
• Each row in a table, called a tuple, corresponds to
an occurrence of the entity
• Columns in each table contain similar data across
all rows in the table
Relational Database Example
The following table is an example of a relational table
describing classes that students have taken at a mythical
college used in the rest of this lesson
Student Id  Student Name  Course Id  Course Name   Grade  Term    Teacher
0194327     Joe Adams     CSIS-840   VB Concepts   C      Spr-02  Wilkins
0194327     Joe Adams     CSIS-824   Intro to C++  B      Fal-02  Smythe
1850243     Joe Adams     CSIS-740   Oracle Admin  A      Spr-03  Wallace
1850243     Jane Smith    CSIS-941   Systems Des.  B      Fal-02  Evans
1850243     Jane Smith    CSIS-840   VB Concepts   B      Spr-02  Wolkins
8502432     Ida Know      CSIS-184   Networks      A      Sum-03  Farmer
7402943     Eunice Eye    CSIS-824   PowerPoint    W      Spr-02  Simpson
Relational Database Example
• Each row (or tuple) in the table describes a Class
taken by a Student in a term at our college
• The data in each column is consistent throughout the
table
• However, there are three inconsistencies in the table
itself. Can you find them?
Primary Keys
(Review)
• Each row in a table has a primary key, which is the
column or set of columns identified to our DBMS
that uniquely identifies it from every other row in
the table
• No attribute value in a primary key can be NULL
• A table can have only one primary key
• If a primary key is not specified, Oracle does not create one; rows can
still be addressed by the ROWID pseudocolumn, but nothing enforces uniqueness
• What would be the primary key for our sample
database?
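The primary key rules above can be tried out in a few lines. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table name class_taken, its column names, and the choice of composite key are assumptions made for illustration, not the answer to the question above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; names are assumptions loosely based on the sample table.
# NOT NULL is spelled out because SQLite, unlike Oracle, does not imply it
# for every PRIMARY KEY column.
conn.execute("""
    CREATE TABLE class_taken (
        student_id TEXT NOT NULL,
        course_id  TEXT NOT NULL,
        term       TEXT NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id, term)
    )
""")
conn.execute("INSERT INTO class_taken VALUES ('0194327', 'CSIS-840', 'Spr-02', 'C')")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO class_taken VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert(("0194327", "CSIS-840", "Spr-02", "B"))  # same key again
null_key_ok = try_insert(("0194327", None, "Fal-02", "B"))         # NULL inside the key
new_key_ok = try_insert(("0194327", "CSIS-824", "Fal-02", "B"))    # distinct key
print(duplicate_ok, null_key_ok, new_key_ok)
```

Only the third insert is accepted: the first repeats an existing key and the second puts a NULL in a key column.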
Foreign Keys
(Review)
• An attribute (or group of attributes) in a table can
also be a foreign key, meaning that it references
the primary key (or at least a unique attribute) of
another table
• An example would be a Customer Id attribute on
an invoice header, which would reference the
customer account information for that invoice
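The invoice example can be sketched as follows, again using Python's sqlite3 module in place of Oracle. The table and column names (customer, invoice_header, customer_id) and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE invoice_header (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Supply')")
conn.execute("INSERT INTO invoice_header VALUES (100, 1)")  # customer 1 exists

try:
    conn.execute("INSERT INTO invoice_header VALUES (101, 99)")  # there is no customer 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The second invoice is rejected because its Customer Id does not match any row in the customer table.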
Normalization
Let’s begin our discussion of normalization by using
an example -- we want to expand the sample relational
table for our mythical college by tracking data for:
– students
– courses
– departments
– teachers
– classes (courses offered during a term)
– teachers assigned to each class
– students enrolled in each class
Database Normalization Example
We might start off with an entity for each Class that looks
something like this:
CLASS
Course Id
Term Offered
Department Name
Course Description
Classroom (or “Internet”)
Credits / Hours
Teacher Id
Teacher Name
Student #1 Data
Student #2 Data
….
Student #30 Data
Database Normalization Example
CLASS (exploded)
The information stored for each student would be:
…
Student #1 Data
– Id
– Full Name
– e-mail Addresses
– Grade for Class
– GPA
Student #2 Data
– Id
– Full Name
– e-mail Addresses
– Grade for Class
– GPA
What problems can you see with this scheme?
Problems with Our Example
• We can’t have more than 30 students in a class
• There’s lots of duplicate information in our tables
– This design would require many updates whenever
a change was made to data about a department, a
teacher, a student, etc.
• Does it make sense for us to have to know, for
example, a course number in order to look up a
teacher’s name?
Problems with Our Example
(continued)
• Removing a class entity occurrence might remove
valuable information from our database
• We don’t have any data verification checks
– We might wind up with inconsistent data across
two or more records (is this necessarily bad if
we are trying to take snapshots?)
Normalization Goal #1
Remove redundant data
• Duplicated data wastes disk space
• Duplicated data may not necessarily be consistent,
that is, stored in exactly the same way
• Redundant data creates problems for our coders
– Ideally, each piece of data should be stored (and changed) in
exactly the same way in every location where it appears.
Achieving this is time-consuming for the system’s
programmers, and it also takes computer resources
once the system is implemented
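The consistency risk can be checked mechanically. The sketch below copies the rows of the sample table from earlier in the lesson and looks for keys that map to more than one value. The three rules it tests (a Student Id has one name, a Course Id has one name, a course in a given term has one teacher) are assumptions about the college, and the conflicts it finds may or may not be the ones the earlier exercise intends.

```python
from collections import defaultdict

# Rows copied from the sample table:
# (student_id, student_name, course_id, course_name, grade, term, teacher)
ROWS = [
    ("0194327", "Joe Adams", "CSIS-840", "VB Concepts", "C", "Spr-02", "Wilkins"),
    ("0194327", "Joe Adams", "CSIS-824", "Intro to C++", "B", "Fal-02", "Smythe"),
    ("1850243", "Joe Adams", "CSIS-740", "Oracle Admin", "A", "Spr-03", "Wallace"),
    ("1850243", "Jane Smith", "CSIS-941", "Systems Des.", "B", "Fal-02", "Evans"),
    ("1850243", "Jane Smith", "CSIS-840", "VB Concepts", "B", "Spr-02", "Wolkins"),
    ("8502432", "Ida Know", "CSIS-184", "Networks", "A", "Sum-03", "Farmer"),
    ("7402943", "Eunice Eye", "CSIS-824", "PowerPoint", "W", "Spr-02", "Simpson"),
]

def conflicts(rows, key_cols, value_col):
    """Map each key to its set of values, keeping only keys with more than one value."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        seen[key].add(row[value_col])
    return {key: values for key, values in seen.items() if len(values) > 1}

name_conflicts = conflicts(ROWS, [0], 1)        # Student Id -> Student Name
course_conflicts = conflicts(ROWS, [2], 3)      # Course Id -> Course Name
teacher_conflicts = conflicts(ROWS, [2, 5], 6)  # (Course Id, Term) -> Teacher

for label, found in [("student", name_conflicts),
                     ("course", course_conflicts),
                     ("teacher", teacher_conflicts)]:
    print(label, found)
```

Each conflict exists only because the same fact is stored in more than one row; once a fact is stored in a single place, this kind of disagreement cannot occur.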
Normalization Goal #2
Remove dependency issues
• It is not intuitive for a user of our new system to
look in the CLASS entity to find, for example, a
student’s email address.
• It would probably make more sense to move this
information into a separate entity (i.e., a database
table that defines students).
Normalization
The Bottom Line
“In summary, normal forms insure that
we do not compromise the integrity of
our data by either creating false data
or destroying true data.”
Ensor & Stevenson
Forms of Normalization
• To accomplish these goals, we have created a set
of rules which define normal forms or levels.
• There are five normal forms, each progressively
more restrictive, which are called first normal
form (1NF), second normal form (2NF), …
• Most database designers only consider the first
three forms in their work, as we will
• As we shall see, there might be good reasons to
deviate from these normal forms
First Normal Form (1NF)
• A database is in first normal form (1NF) if each
attribute of the database is simple, single-valued
(atomic), and does not repeat
– Let’s assume column definitions are consistent across rows
• Method:
– Reduce all attributes into atomic components
– Eliminate duplicative columns (repeating groups) and multivalued attributes from the same table
– Create a separate table for each group of related data
– Identify each row with a unique column or set of columns (a
primary key)
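As a rough sketch of the method above, the following Python flattens one un-normalized record into atomic rows. The record layout, the field names, and all values are hypothetical, and the name split is deliberately naive (first word versus the rest).

```python
# A hypothetical un-normalized CLASS record with a repeating student group
# and a multivalued e-mail attribute (all values are made up).
class_record = {
    "course_id": "CSIS-254",
    "term": "Spr-03",
    "students": [
        {"id": "0194327", "name": "Joe Adams",
         "emails": ["joe@example.edu", "ja@example.org"]},
        {"id": "8502432", "name": "Ida Know",
         "emails": ["ida@example.edu"]},
    ],
}

def to_1nf(record):
    """Split one nested record into flat rows with one atomic value per column."""
    # Reduce the composite Course Id into its atomic components.
    dept_id, course_number = record["course_id"].split("-")
    class_key = (dept_id, course_number, record["term"])
    class_student_rows = []
    email_rows = []
    for student in record["students"]:
        # Naive split: first word is the first name, the rest is the last name.
        first_name, last_name = student["name"].split(" ", 1)
        # One row per student replaces the Student #1 .. #30 repeating group.
        class_student_rows.append(class_key + (student["id"], first_name, last_name))
        # One row per e-mail address replaces the multivalued attribute.
        for number, address in enumerate(student["emails"], start=1):
            email_rows.append(class_key + (student["id"], number, address))
    return class_student_rows, email_rows

class_student_rows, email_rows = to_1nf(class_record)
for row in class_student_rows + email_rows:
    print(row)
```

Nothing in the flattened rows limits a class to 30 students or a student to a fixed number of addresses.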
Our Sample Database
Here’s what our database entity for classes at our college
currently looks like:
CLASS
Course Id
Term Offered
Department Name
Course Description
Classroom (or “Internet”)
Credits / Hours
Teacher Id
Teacher Name
Student #1 Data
Student #2 Data
….
Student #30 Data
Our Sample Database in 1NF
We should divide the Course Id into a Department Id and
Course Number (e.g., Course Id “CSIS-254” would be divided into
Department Id “CSIS”, Course Number “254”)
(Won’t this make the Department Name redundant?)
CLASS
Department Id (added)
Course Number (added)
Term Offered
Department Name
Course Description
Classroom (or “Internet”)
Credits / Hours
Teacher Id
Teacher Last Name
Student #1 Data
Student #2 Data
….
Student #30 Data
Our Sample Database in 1NF
Next, break out Student Ids, Names, e-mail Addresses, and
Grades into a separate entity, eliminating the repeating
Student groups.
CLASS / STUDENT
Department Id
Course Number
Term Offered
Student Id
Student Full Name
Student e-mail Addresses
Student Grade for Class
Student GPA
Our Sample Database in 1NF
We need to break down the Students’ Names into their
simpler components
CLASS / STUDENT
Department Id
Course Number
Term Offered
Student Id
Student Full Name
First Name
Middle Name
Last Name
Student e-mail Addresses
Student Grade for Class
Student GPA
Our Sample Database in 1NF
Finally, we need to break out Student e-mail Addresses into
another entity, where each occurrence represents a single
e-mail address
CLASS / STUDENT E-MAIL ADDRESS
Department Id
Course Number
Term Offered
Student Id
Address Number or Id
Student e-mail Address
Our Sample Database in 1NF
CLASS
Department Id
Course Number
Term Offered
Department Name
Course Description
Classroom (or “Internet”)
Credits / Hours
Teacher Id
Teacher Last Name
CLASS / STUDENT
Department Id
Course Number
Term Offered
Student Id
Student Full Name
First Name
Middle Name
Last Name
Student Grade for Class
Student GPA
Our Sample Database in 1NF
CLASS / STUDENT
Department Id
Course Number
Term Offered
Student Id
Student Full Name
First Name
Middle Name
Last Name
Student Grade for Class
Student GPA
CLASS / STUDENT E-MAIL ADDRESS
Department Id
Course Number
Term Offered
Student Id
Address Number or Id
Student e-mail Address
1NF Advantages
• Removes limits artificially introduced into a
database design by using repeating groups
• Ensures that attributes are broken into their most
basic units and are not multi-valued
Exercise
Put the following table in 1NF, then draw an ERD for
your new system
FAVORITE TV SHOWS
TV Show Name
Category
Main Star Name #1
Main Star Name #2
Main Star Name #3
Day and Time Shown
Network
Channel
My Rating (1-10)
One Possible Answer
SHOW / STARS
TV Show Name
Star Number
Star Name
FAVORITE TV SHOWS
TV Show Name
Category
My Rating (1-10)
SHOW TIMES
TV Show Name
Slot Number
Date and Time
Network
Channel
Second Normal Form (2NF)
• 2NF implies 1NF by definition
• All non-key attributes must be fully dependent on
the entire primary key
– In other words, a non-key attribute cannot depend on only
part of the primary key
– This restriction applies only to tables with composite keys
• 2NF reduces redundant data in a table by extracting
it, placing it in new table(s), then creating
relationships between those tables.
Second Normal Form (2NF)
• Method:
– Remove subsets of data that appear in multiple
rows of a table, and place into separate tables
– Create relationships between these new tables
and their predecessors through the use of
foreign keys.
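The extraction step can be sketched in plain Python as a projection onto the columns each new table needs, keeping distinct rows only. The rows below are hypothetical, including the department name, and the columns are assumed to be (Department Id, Course Number, Term, Department Name, Course Description).

```python
# Hypothetical 1NF CLASS rows keyed by (department_id, course_number, term).
# department_name depends only on department_id, and course_description only on
# (department_id, course_number): both are partial dependencies.
class_rows = [
    ("CSIS", "254", "Spr-03", "Computer Science", "Oracle Normalization"),
    ("CSIS", "254", "Fal-03", "Computer Science", "Oracle Normalization"),
    ("CSIS", "840", "Spr-03", "Computer Science", "VB Concepts"),
]

def project(rows, columns):
    """Return the distinct rows formed from the given column positions, in first-seen order."""
    return list(dict.fromkeys(tuple(row[i] for i in columns) for row in rows))

department = project(class_rows, [0, 3])   # Department Id -> Department Name
course = project(class_rows, [0, 1, 4])    # (Department Id, Course Number) -> Description
class_ = project(class_rows, [0, 1, 2])    # what is left: the full composite key

print(department)
print(course)
print(class_)
```

The department name, stored three times in the input, now appears once; the key columns kept in class_ act as the foreign keys back to the new tables.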
Our Sample Database in 2NF
We can break out the Department Name from the CLASS entity,
as it will be the same for each Class having the same
Department
DEPARTMENT
Department Id
Department Name
Our Sample Database in 2NF
We also can break out the Course Description from this
entity, as it also will be the same for each Class
referencing the same Course
COURSE
Department Id
Course Number
Course Description
Credits / Hours
Note that we’ve kept the Department Id in this entity. Why?
Our Sample Database in 2NF
We can also break out the information about each Teacher,
since it also will be the same for each Class that a Teacher
conducts, irrespective of the Class
TEACHER
Teacher Id
Teacher Last Name
Our Sample Database in 2NF
Our new CLASS / STUDENT entity can also have its
student-related attributes (names and GPA) broken out,
that is, attributes that do not change with the class number
STUDENT
Student Id
Student Full Name
First Name
Middle Name
Last Name
Student GPA
Our Sample Database in 2NF
Student e-mail Addresses are not dependent upon
Department Id, Course Number, or Term, so remove those
attributes from the e-mail entity
STUDENT E-MAIL ADDRESS
Department Id (deleted)
Course Number (deleted)
Term Offered (deleted)
Student Id
Address Number or Id
Student e-mail Address
Our Sample Database in 2NF
Our final CLASS / STUDENT entity, minus all of the attributes
that have been moved to other entities, looks like:
CLASS / STUDENT
Department Id
Course Id
Student Id
Term
Student Grade for Class
2NF and Foreign Keys
• To ensure data integrity, we would implement four
foreign keys in our CLASS and CLASS / STUDENT entities
– Department Id must reference an occurrence in
DEPARTMENT entity
– Course Id must reference a row in COURSE
– Student Id must reference a row in STUDENT
– Teacher Id must reference a row in TEACHER
• Would we implement a similar restriction on our
student e-mail address entity?
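One way the foreign keys above might be declared is sketched below with Python's sqlite3 module standing in for Oracle. All table and column names and the sample rows are assumptions. As a design choice for this sketch, CLASS references COURSE through the composite (Department Id, Course Number) key, and CLASS / STUDENT references CLASS through its full key, rather than declaring each column as its own foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (department_id TEXT PRIMARY KEY NOT NULL, department_name TEXT);
    CREATE TABLE course (
        department_id TEXT NOT NULL REFERENCES department (department_id),
        course_number TEXT NOT NULL,
        course_description TEXT,
        PRIMARY KEY (department_id, course_number)
    );
    CREATE TABLE teacher (teacher_id TEXT PRIMARY KEY NOT NULL, last_name TEXT);
    CREATE TABLE student (student_id TEXT PRIMARY KEY NOT NULL, last_name TEXT);
    CREATE TABLE class (
        department_id TEXT NOT NULL,
        course_number TEXT NOT NULL,
        term TEXT NOT NULL,
        teacher_id TEXT REFERENCES teacher (teacher_id),
        PRIMARY KEY (department_id, course_number, term),
        FOREIGN KEY (department_id, course_number)
            REFERENCES course (department_id, course_number)
    );
    CREATE TABLE class_student (
        department_id TEXT NOT NULL,
        course_number TEXT NOT NULL,
        term TEXT NOT NULL,
        student_id TEXT NOT NULL REFERENCES student (student_id),
        grade TEXT,
        PRIMARY KEY (department_id, course_number, term, student_id),
        FOREIGN KEY (department_id, course_number, term)
            REFERENCES class (department_id, course_number, term)
    );
""")
# Made-up sample rows, inserted parent-first so every reference resolves.
conn.execute("INSERT INTO department VALUES ('CSIS', 'Computer Science')")
conn.execute("INSERT INTO course VALUES ('CSIS', '254', 'Oracle Normalization')")
conn.execute("INSERT INTO teacher VALUES ('T1', 'Wilkins')")
conn.execute("INSERT INTO student VALUES ('0194327', 'Adams')")
conn.execute("INSERT INTO class VALUES ('CSIS', '254', 'Spr-03', 'T1')")
conn.execute("INSERT INTO class_student VALUES ('CSIS', '254', 'Spr-03', '0194327', 'A')")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

unknown_student = rejected(
    "INSERT INTO class_student VALUES ('CSIS', '254', 'Spr-03', '9999999', 'B')")
unknown_class = rejected(
    "INSERT INTO class_student VALUES ('CSIS', '999', 'Spr-03', '0194327', 'B')")
print(unknown_student, unknown_class)
```

Both inserts are rejected: the first names a student that does not exist, the second a class that does not exist.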
2NF Advantages
• All advantages of 1NF
• Common data is forced to be consistent, since it is
stored in only one place in the database
• We can store data about separate entities without
implying the existence of others
– In our original database design, we can’t store
information about Students, Teachers, or Departments
if we don’t have any classes in which they are
involved.
Exercise
Convert the following table into 2NF, and draw a new ERD
SALES ORDER
Order Number
Customer Account Number
Customer Account Name
Customer Address
Date of Entry of Order
Date of Requested Shipment
Item Numbers
Item Descriptions
Quantities Ordered
Unit Prices
Extended Prices
Total Order Price
Third Normal Form (3NF)
• 3NF implies 2NF (which implies 1NF)
• A database is in third normal form (3NF) if the
data in every column of each row (occurrence) in
a table (entity) is dependent ONLY upon each
column in the key
– In general, any time the contents of a group of fields
may apply to more than a single record in the table,
consider placing those fields in a separate table.
– This means that derived attributes are not allowed in
3NF
Third Normal Form (3NF)
• All attributes depend upon the key, the whole key,
and nothing but the key
• Method:
– Remove all derived columns
– Move all remaining columns not dependent on
the key into a new table
Our Sample Database in 3NF
Our STUDENT entity cannot contain a GPA, since that is a
derived attribute (the average of all of the Grades received)
STUDENT
Student Id
Student Names
First Name
Middle Name
Last Name
Student GPA (deleted)
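With the GPA column gone, the value can be computed from the stored grades whenever it is needed. The sketch below uses Python's sqlite3 module; the grade_points table, the 4-point scale, the unweighted average (credits are ignored), and the exclusion of withdrawals (W) are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE class_student (student_id TEXT, course_id TEXT, term TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO class_student VALUES (?, ?, ?, ?)",
    [
        ("0194327", "CSIS-840", "Spr-02", "C"),
        ("0194327", "CSIS-824", "Fal-02", "B"),
        ("7402943", "CSIS-824", "Spr-02", "W"),
    ],
)
# Assumed grade scale; W has no entry, so the join below leaves it out of the average.
conn.execute("CREATE TABLE grade_points (grade TEXT PRIMARY KEY, points REAL)")
conn.executemany(
    "INSERT INTO grade_points VALUES (?, ?)",
    [("A", 4.0), ("B", 3.0), ("C", 2.0), ("D", 1.0), ("F", 0.0)],
)

# GPA is derived at query time instead of being stored in STUDENT.
gpa_by_student = dict(conn.execute("""
    SELECT cs.student_id, AVG(gp.points)
    FROM class_student AS cs
    JOIN grade_points AS gp ON gp.grade = cs.grade
    GROUP BY cs.student_id
"""))
print(gpa_by_student)
```

Because the average is recomputed from the grades each time, it cannot drift out of step with them; the cost is the extra join and aggregation on every lookup.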
Advantages of 3NF
• All advantages of 1NF and 2NF
• Information is stored in one and only one place in
the database
• All entities are now 2-dimensional, non-redundant, and can be implemented in relational
tables
Disadvantages of Normalization
• Proliferation of tables, resulting in increased
system complexity
– Can be overcome with views for end-users
• Performance hits through added tables and lack of
derived attributes
– May be partially offset by reduced computing
needs of maintaining data only once
• We will discuss these in detail next week...
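The first point above, that views can hide the extra tables from end-users, might look like the following sketch. It uses Python's sqlite3 module in place of Oracle; the view name transcript, the simplified tables, and the sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id TEXT PRIMARY KEY, last_name TEXT);
    CREATE TABLE course (course_id TEXT PRIMARY KEY, course_name TEXT);
    CREATE TABLE class_student (student_id TEXT, course_id TEXT, term TEXT, grade TEXT);
    INSERT INTO student VALUES ('8502432', 'Know');
    INSERT INTO course VALUES ('CSIS-184', 'Networks');
    INSERT INTO class_student VALUES ('8502432', 'CSIS-184', 'Sum-03', 'A');

    -- The view rebuilds the wide, report-style row from the normalized tables.
    CREATE VIEW transcript AS
    SELECT s.student_id, s.last_name, c.course_id, c.course_name, cs.term, cs.grade
    FROM class_student AS cs
    JOIN student AS s ON s.student_id = cs.student_id
    JOIN course AS c ON c.course_id = cs.course_id;
""")
# End-users query the view as if it were a single table.
rows = conn.execute("SELECT * FROM transcript").fetchall()
print(rows)
```

The joins still run on every query, so a view addresses the complexity concern but not the performance one.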
Last Slide
Next Week’s Assignment
• Draw a complete ERD for our normalized 3NF
mythical college database. Does it make sense to you?
• Normalize the two organizations / systems that you
used in last week’s homework by updating their ERDs
(Engineering Method only).
• Introduce at least two derived attributes that you might
include in your design, and explain why.
• Prepare for a quiz next week on what we have covered
so far in class:
Stages of SDLC, Entities, Attributes, Relationships,
Diagramming, and Normalization