Normal Form

Class Presentation: Normal Form
By Wen Ying Gao
CS157A
Section 2
October 20, 2005
Database Normalization
Database normalization concerns the level of
redundancy in a relational database's structure. The
key idea is to reduce the chance of holding multiple,
conflicting versions of the same data, such as an address,
by storing each potentially duplicated item in its own
table and linking to it instead of copying it.
First Normal Form
A relation schema R is in First Normal Form (1NF) if the
domains of all its attributes are atomic. A domain is
atomic if its elements are considered to be indivisible units.
It involves the removal of redundant data from
horizontal rows. We need to ensure that there is no
duplication of data within a given row, and that every
column stores the least amount of information possible.
Example:
A table for the entity of Book
Title                     Author        ISBN           Subject   Publisher    Pages
Database System Concepts  Sudarshan     0-07-295886-3  Database  McGraw-Hill  1142
Database System Concepts  Silberschatz  0-07-295886-3  Database  McGraw-Hill  1142
The Ultimate Guide        Das           0-07-240500-7  Unix      McGraw-Hill  445
The Ultimate Guide        Korth         0-07-240500-7  Unix      McGraw-Hill  445
To apply the First Normal Form, we move the redundant data into
separate tables and add extra tables that define the
relationships between them.
Author_ID  Last Name     First Name
1          Sudarshan     Mark
2          Silberschatz  Abraham
3          Das           Sumitabha
4          Korth         Henry
* Here we have the table for author.
Subject_ID  Subject
1           Database
2           Unix
* Here we have the table for subject.
ISBN           Title                     Pages  Publisher
0-07-295886-3  Database System Concepts  1142   McGraw-Hill
0-07-240500-7  The Ultimate Guide        445    McGraw-Hill
* Here we have the table for book.
Since the tables have been separated to avoid redundancy,
we also need to create new tables that connect them, so that
the relationships between the tables remain unchanged.
ISBN           Author_ID
0-07-295886-3  1
0-07-240500-7  3
0-07-295886-3  2
0-07-240500-7  4
* Here we have the relationship between the book and the author.
ISBN           Subject_ID
0-07-295886-3  1
0-07-240500-7  2
* Here we have the relationship between the book and the subject.
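The decomposition above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the lower-case table and column names are chosen here rather than taken from the slides, and the subject tables are left out to keep the sketch short. It loads the author, book, and book-author tables and joins them back into one row per book and author.

```python
import sqlite3

# In-memory database; lower-case names are illustrative, not from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    last_name TEXT,
    first_name TEXT
);
CREATE TABLE book (
    isbn TEXT PRIMARY KEY,
    title TEXT,
    pages INTEGER,
    publisher TEXT
);
CREATE TABLE book_author (
    isbn TEXT REFERENCES book(isbn),
    author_id INTEGER REFERENCES author(author_id),
    PRIMARY KEY (isbn, author_id)
);
""")
conn.executemany("INSERT INTO author VALUES (?, ?, ?)", [
    (1, "Sudarshan", "Mark"), (2, "Silberschatz", "Abraham"),
    (3, "Das", "Sumitabha"), (4, "Korth", "Henry"),
])
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)", [
    ("0-07-295886-3", "Database System Concepts", 1142, "McGraw-Hill"),
    ("0-07-240500-7", "The Ultimate Guide", 445, "McGraw-Hill"),
])
conn.executemany("INSERT INTO book_author VALUES (?, ?)", [
    ("0-07-295886-3", 1), ("0-07-240500-7", 3),
    ("0-07-295886-3", 2), ("0-07-240500-7", 4),
])

# Join the three tables to recover one row per (book, author) pair.
rows = conn.execute("""
    SELECT b.title, a.last_name, b.isbn, b.publisher, b.pages
    FROM book AS b
    JOIN book_author AS ba ON ba.isbn = b.isbn
    JOIN author AS a ON a.author_id = ba.author_id
    ORDER BY ba.author_id
""").fetchall()
for row in rows:
    print(row)
```

Each title, publisher, and page count is now stored once in the book table, yet the join still yields the four rows of the original table.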
Second Normal Form
A relation schema R is in Second Normal Form if it meets both
of the following criteria:
It is in First Normal Form.
Every non-key attribute is fully dependent on each
candidate key of the relation; that is, no non-key attribute
is partially dependent on a candidate key.
Second Normal Form (or 2NF) deals with redundancy of data
in vertical columns.
Example of Second Normal Form:
Here is a list of attributes in a table that is in First Normal
Form:
Department
Project_Name
Employee_Name
Emp_Hire_Date
Project_Manager
Project_Name and Employee_Name together form the candidate key
for this table. Emp_Hire_Date depends only on Employee_Name, and
Project_Manager depends only on Project_Name, so each is
partially dependent on the candidate key. Therefore, this table
does not satisfy the Second Normal Form.
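The check for partial dependencies can be sketched in Python. The two functional dependencies below are an assumption read off this example (Employee_Name determines Emp_Hire_Date, and Project_Name determines Project_Manager). Department is left out because the slides do not say what determines it, and the helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole left side is already known.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """List (key_subset, attribute) pairs where a non-key attribute
    depends on a proper subset of the candidate key."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found

# Assumed functional dependencies for the example table.
fds = [
    (frozenset({"Employee_Name"}), frozenset({"Emp_Hire_Date"})),
    (frozenset({"Project_Name"}), frozenset({"Project_Manager"})),
]
key = {"Project_Name", "Employee_Name"}
attributes = key | {"Emp_Hire_Date", "Project_Manager"}

violations = partial_dependencies(key, attributes, fds)
print(violations)
```

A non-empty result means the table is not in Second Normal Form; here both non-key attributes are reported, each against the single key attribute that determines it.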
To satisfy the Second Normal Form, we need to move
Emp_Hire_Date and Project_Manager to other tables. We can
put Emp_Hire_Date in the Employee table and
Project_Manager in the Project table.
So now we have three tables:
Department: Project_Name, Employee_Name
Project: Project_ID, Project_Name, Project_Manager
Employee: Employee_ID, Employee_Name, Employee_Hire_Date
Now, the Department table will only have the candidate key left.
Third Normal Form
A relation R is in Third Normal Form (3NF) if and
only if:
It is in Second Normal Form.
Every non-key attribute is non-transitively
dependent on the primary key.
An attribute C is transitively dependent on attribute A
if there exists an attribute B such that A -> B and
B -> C, so that A -> C holds through B.
Example of Third Normal Form:
Here is the invoice table in Second Normal Form:
It violates the Third Normal Form because the customer
information is repeated whenever one customer has more than one
invoice. In this example, Jones has both invoice 1001 and 1003.
To solve the problem, we need a separate table for the
customers.
With a Customer table, there is no transitive dependency
between the invoice number and the customer name and address.
Also, the customer information is no longer stored redundantly.
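A minimal sqlite3 sketch of this split follows. Only the customer Jones and invoice numbers 1001 and 1003 come from the example; the customer_id column, the table layout, and both addresses are hypothetical placeholders added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- hypothetical surrogate key
    name TEXT,
    address TEXT
);
CREATE TABLE invoice (
    invoice_no INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id)
);
""")
# The address values are made up for this sketch.
conn.execute("INSERT INTO customer VALUES (1, 'Jones', '1 Example Street')")
conn.executemany("INSERT INTO invoice VALUES (?, ?)", [(1001, 1), (1003, 1)])

# The address is stored in one row, so changing it is a single-row update.
cursor = conn.execute(
    "UPDATE customer SET address = '2 Sample Road' WHERE customer_id = 1"
)
updated_rows = cursor.rowcount

# Both invoices see the new address through the join.
addresses = conn.execute("""
    SELECT i.invoice_no, c.name, c.address
    FROM invoice AS i
    JOIN customer AS c ON c.customer_id = i.customer_id
    ORDER BY i.invoice_no
""").fetchall()
print(updated_rows, addresses)
```

With the customer data kept in the invoice table instead, the same change would have to be repeated on every invoice row for Jones.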
More examples of the First, Second, and Third Normal Forms
follow.
The following example takes one table through each of the
normal forms.
First Normal Form:
s# -- supplier identification number (part of the primary key (s#, p#))
status -- status code assigned to
city -- name of the city where the supplier is located
p# -- part number of the part supplied (part of the primary key (s#, p#))
qty -- quantity of parts supplied to date
Second Normal Form:
Functional dependencies in the First Normal Form table:
s# -> city, status (this violates the Second Normal Form, because s# is only part of the key (s#, p#))
city -> status
(s#, p#) -> qty
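The move to Second Normal Form can be sketched as two projections of the First Normal Form table. The sample rows below are hypothetical, and so is the name supplier_part for the second table; the sketch only shows that the split stores each supplier's city and status once and that joining on s# recovers the original rows.

```python
# Hypothetical sample rows in the order (s#, status, city, p#, qty).
first = [
    ("s1", 20, "London", "p1", 300),
    ("s1", 20, "London", "p2", 200),
    ("s2", 10, "Paris", "p1", 300),
]

def project(rows, indexes):
    """Project rows onto the given column positions, dropping duplicates."""
    return sorted({tuple(row[i] for i in indexes) for row in rows})

# s# -> city, status moves into its own table.
supplier = project(first, (0, 1, 2))        # (s#, status, city)
# (s#, p#) -> qty stays with the full key.
supplier_part = project(first, (0, 3, 4))   # (s#, p#, qty)

# Natural join on s# to check that no information was lost.
rejoined = sorted(
    (s, status, city, p, qty)
    for (s, status, city) in supplier
    for (s2, p, qty) in supplier_part
    if s == s2
)
print(supplier)
print(supplier_part)
print(rejoined == sorted(first))
```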
Third Normal Form:
Functional dependencies in the Second Normal Form table SUPPLIER:
SUPPLIER.s# -> SUPPLIER.status (transitive dependency, through city)
SUPPLIER.s# -> SUPPLIER.city
SUPPLIER.city -> SUPPLIER.status
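The transitive dependency can be found mechanically from the dependencies listed above. The sketch below assumes every determinant is a single attribute, which holds for SUPPLIER but not in general, and the function name is hypothetical.

```python
def transitive_dependencies(key, fds):
    """Return (key, middle, target) chains where key -> middle and
    middle -> target, so target depends on key only through middle."""
    chains = []
    for middle in sorted(fds.get(key, set())):
        for target in sorted(fds.get(middle, set())):
            if target != key and target != middle:
                chains.append((key, middle, target))
    return chains

# Dependencies of SUPPLIER as a map from determinant to dependent attributes.
supplier_fds = {
    "s#": {"city", "status"},
    "city": {"status"},
}

chains = transitive_dependencies("s#", supplier_fds)
print(chains)
```

The single chain reported, s# to city to status, suggests moving the city and status pair into a table of its own to reach Third Normal Form.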