Class Presentation: Normal Form
By Wen Ying Gao
CS157A Section 2
October 20, 2005

Database Normalization

Database normalization relates to the level of redundancy in a relational database's structure. The key idea is to reduce the chance of having multiple different versions of the same data, such as an address, by storing each piece of potentially duplicated data once in its own table and linking to it instead of keeping copies.

First Normal Form

A relation schema R is in First Normal Form (1NF) if the domains of all attributes of R are atomic. A domain is atomic if its elements are considered to be indivisible units. First Normal Form involves the removal of redundant data from horizontal rows: there should be no duplication of data within a given row, and every column should store the least amount of information possible.

Example: a table for the Book entity

Title                     Author        ISBN           Subject   Publisher    Pages
Database System Concepts  Sudarshan     0-07-295886-3  Database  McGraw-Hill  1142
Database System Concepts  Silberschatz  0-07-295886-3  Database  McGraw-Hill  1142
The Ultimate Guide        Das           0-07-240500-7  Unix      McGraw-Hill  445
The Ultimate Guide        Korth         0-07-240500-7  Unix      McGraw-Hill  445

Applying First Normal Form here, we construct separate tables for the redundant data, with extra tables to define the relationships between them.

Author_ID  Last Name     First Name
1          Sudarshan     Mark
2          Silberschatz  Abraham
3          Das           Sumitabha
4          Korth         Henry
* Here we have the table for author.

Subject_ID  Subject
1           Database
2           Unix
* Here we have the table for subject.

ISBN           Title                     Pages  Publisher
0-07-295886-3  Database System Concepts  1142   McGraw-Hill
0-07-240500-7  The Ultimate Guide        445    McGraw-Hill
* Here we have the table for book.

Since the tables have been separated to avoid redundancy, we also need to create new tables that connect them, so that the relationships between the tables remain unchanged.
ISBN           Author_ID
0-07-295886-3  1
0-07-240500-7  3
0-07-295886-3  2
0-07-240500-7  4
* Here we have the relationship between the book and the author.

ISBN           Subject_ID
0-07-295886-3  1
0-07-240500-7  2
* Here we have the relationship between the book and the subject.

Second Normal Form

A relation schema R is in Second Normal Form (2NF) if it meets both of the following criteria:
  - It is in First Normal Form.
  - No non-key attribute is partially dependent on a candidate key; that is, every non-key attribute is fully dependent on each candidate key of the relation.

Second Normal Form deals with redundancy of data in vertical columns.

Example of Second Normal Form:

Here is the list of attributes of a table, Department, that is in First Normal Form:

Department
  Project_Name
  Employee_Name
  Emp_Hire_Date
  Project_Manager

Project_Name and Employee_Name together form the candidate key for this table. Emp_Hire_Date depends only on Employee_Name, and Project_Manager depends only on Project_Name, so each of them depends on just part of the candidate key. Therefore, this table does not satisfy the Second Normal Form.

To satisfy the Second Normal Form, we need to move Emp_Hire_Date and Project_Manager to other tables. We can put Emp_Hire_Date in the Employee table and Project_Manager in the Project table. So now we have three tables:

Department:  Project_Name, Employee_Name
Project:     Project_ID, Project_Name, Project_Manager
Employee:    Employee_ID, Employee_Name, Employee_Hire_Date

Now the Department table has only the candidate key left.

Third Normal Form

A relation R is in Third Normal Form (3NF) if and only if:
  - It is in Second Normal Form.
  - Every non-key attribute is non-transitively dependent on the primary key.

An attribute C is transitively dependent on attribute A if there exists an attribute B such that A -> B and B -> C, which together imply A -> C.

Example of Third Normal Form:

Consider an invoice table in Second Normal Form in which each invoice row also carries the customer's name and address. It violates the Third Normal Form because the customer information is repeated whenever the same customer has more than one invoice number.
In this example, Jones has both invoice 1001 and invoice 1003. To solve the problem, we need a separate table for the customers. With a Customer table, there is no transitive dependency between the invoice number and the customer name and address, and the customer information is no longer stored redundantly.

The following is a further example that takes one table through the First, Second, and Third Normal Forms.

First Normal Form:

  s#      -- supplier identification number (s# and p# together form the primary key)
  status  -- status code assigned to city
  city    -- name of the city where the supplier is located
  p#      -- part number of the part supplied
  qty     -- quantity of parts supplied to date

Second Normal Form:

Functional dependencies in the First Normal Form table:

  s# -> city, status    (this violates the Second Normal Form)
  city -> status
  (s#, p#) -> qty

Because city and status depend only on s#, which is just part of the key (s#, p#), they are moved into a separate SUPPLIER table.

Third Normal Form:

Functional dependencies in the Second Normal Form SUPPLIER table:

  SUPPLIER.s# -> SUPPLIER.status    (transitive dependency)
  SUPPLIER.s# -> SUPPLIER.city
  SUPPLIER.city -> SUPPLIER.status

Since status depends on s# only through city, removing the transitive dependency means keeping city -> status in a table of its own.
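As a rough illustration of the supplier example, the Python sketch below checks the listed functional dependencies on a few rows and then splits the table by projection. The sample rows are made up for this sketch, and the names FIRST, PARTS, SUPPLIER_CITY, and CITY_STATUS, as well as the helper functions, are hypothetical; only SUPPLIER and the attribute names come from the presentation.

```python
# Illustrative rows only: the values below are made up for this sketch.
FIRST = [
    {"s#": "s1", "status": 20, "city": "London", "p#": "p1", "qty": 300},
    {"s#": "s1", "status": 20, "city": "London", "p#": "p2", "qty": 200},
    {"s#": "s2", "status": 10, "city": "Paris", "p#": "p1", "qty": 100},
    {"s#": "s3", "status": 10, "city": "Paris", "p#": "p2", "qty": 200},
]


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left, right):
    """Join two non-empty tables on the attributes they share."""
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


def as_set(rows):
    """Order-independent view of a table, used to compare two tables."""
    return {frozenset(row.items()) for row in rows}


# 2NF step: attributes that depend on s# alone leave the (s#, p#) table.
SUPPLIER = project(FIRST, ["s#", "status", "city"])
PARTS = project(FIRST, ["s#", "p#", "qty"])  # hypothetical table name

# 3NF step: city -> status gets its own table (hypothetical names).
SUPPLIER_CITY = project(SUPPLIER, ["s#", "city"])
CITY_STATUS = project(SUPPLIER, ["city", "status"])

# Joining the pieces back together should reproduce the original rows.
rejoined = natural_join(natural_join(SUPPLIER_CITY, CITY_STATUS), PARTS)
lossless = as_set(rejoined) == as_set(FIRST)
```

With these sample rows, holds(FIRST, ["s#"], ["city", "status"]) and holds(FIRST, ["city"], ["status"]) are both true, and lossless is true, so the split loses no information while each city's status is stored only once.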