Normalization A technique for identifying table structures that have potential maintenance problems Normalization Normalization is a set of formal conditions that assure that a database is maintainable. The results of a well executed normalization process are the same as those of a well planned E-R model PROCESS OF DATA NORMALIZATION ELIMINATE REPEATING GROUPS Make a separate table for each set of related attributes and give each table a primary key. ELIMINATE REDUNDANT DATA If an attribute depends on only part of a multivalued key, remove it to a separate table. ELIMINATE COLUMNS NOT DEPENDENT ON KEY If attributes do not contribute to a description of the key, remove them to a separate table. Database Programming and Design PROCESS OF DATA NORMALIZATION ISOLATE INDEPENDENT MULTIPLE RELATIONSHIPS No table may contain two or more 1:n or n:m relationships that are not directly related. ISOLATE SEMANTICALLY RELATED MULTIPLE RELATIONSHIPS There may be practical constraints on information that justify separating logically related many-to-many relationships. Database Programming and Design Anomalies A table anomaly is a structure for which a normal database operation cannot be executed without information loss or full search of the data table Insertion Anomaly Deletion Anomaly Update or Modification Anomaly Normal Forms Relational theory defines a number of structure conditions called Normal Forms that assure that certain data anomalies do not occur in a database. Normal Forms 1NF 2NF 3NF BCNF 4NF Keys; No repeating groups No partial dependencies No transitive dependencies Determinants are candidate keys No multivalued dependencies Form (Company Order History) Order # 12003 Date Oct 1, 1997 Oklahoma Retail Company 1111 Asp Norman 1. 2. 3. Description Code Qty Price Amount Footballs Sweat Shirts Shorts 21 44 37 25.00 150 15.00 300 12.00 120 Total 6 20 10 570 0nf: Remove titles and derived quantities Order # 12003 Date Oct 1, 1997 Oklahoma Retail Company 1111 Asp Norman 1. 2. 3. Description Code Qty Price Amount Footballs Sweat Shirts Shorts 21 44 37 25.00 150 15.00 300 12.00 120 Total 6 20 10 570 0 Normal Form Remove titles and derived quantities Schema notation HISTORY(CustName, CustAddr, CustCity {OrderNum, OrderDate {ProdDescr, ProdCode, QtyOrdered, OrderPrice}} 1st Normal Form Add keys Remove repeating groups 1st Normal Form Add Keys for embedded entities Remove Repeating Groups HISTORY(CustID, CustName, CustAddr, CustCity {OrderNum, OrderDate {ProdDescr, ProdCode, QtyOrdered, OrderPrice}} 1st Normal Form Add Keys for embedded entities Remove Repeating Groups Create a table for each embedded entity, from the outside for nested groups Insert foreign keys and junction tables CUSTOMER(CustID, CustName, CustAddr, CustCity) ORDER(OrderNum, CustID, OrderDate {ProdDescr, ProdCode, QtyOrdered, OrderPrice}) 1st Normal Form CUSTOMER(CustID, CustName, CustAddr, CustCity) ORDER(OrderNum, CustID, OrderDate) PRODUCT(ProdDescr, ProdCode,) ORDER-PRODUCT(OrderNum, ProdCode, QtyOrdered, OrderPrice) 1st Normal Form CUSTOMER ORDER PRODUCT 1NF (Keys, No Repeating Groups) Table contains multi-valued attributes. TABLE { ATTRIBUTES} TABLE ?? ATTR-TABLE 2nd Normal Form No partial dependencies (an attribute has a partial dependency if it depends on part of a concatenated key) 2nd Normal Form ROSTER(StuID, ZAPNum, StudentName, CourseTitle, CourseGrade) Remove partial dependencies STUDENT(StuID, StudentName) SECTION(ZAPNum, CourseTitle) STUDENT-SECTION(StuID, ZAPNum, CourseGrade) 2nd Normal Form ROSTER STUDENT SECTION STUDENT-SECTION 2NF No partial dependencies Table has data from several connected tables. TABLE ?? TABLE ?? 3rd Normal Form No transitive dependencies (a transitive dependency is an attribute that depends on other non-key attributes) 3rd Normal Form Note: a transitive dependency arises when attributes from a second entity appear in a given table. SECTION(ZAPNum, RoomNum, Day, Time, CourseTitle, HoursCredit) 3rd Normal Form SECTION(ZAPNum, RoomNum, Day, Time, CourseID ,CourseTitle, HoursCredit) SECTION(ZAPNum, RoomNum, Day, Time, CourseID) COURSE(CourseID, CourseTitle, HoursCredit) 3NF No transitive dependencies Table contains data from an embedded entity with non-key attributes. TABLE TABLE SUB-TABLE ?? SUB-TABLE BCNF is the same, but the embedded table may involve key attributes. Boyce Codd Normal Form Every determinant is a candidate key BCNF BCNF dependenceies are like 3nf dependencies but they involve some key attributes Note: BCNF often arises when a 1:m relationship is modeled as a m:n relationship BCNF SALESMAN-CUST(SalesID, CustID, Commission) SALESMAN(SalesID, Commission) CUSTOMER(CustID, SalesID) 4th Normal Form No multi-valued dependencies 4th Normal Form Note: 4th Normal Form violations occur when a triple (or higher) concatenated key represents a pair of double keys 4th Normal Form 4th Normal Form Multuvalued dependencies Instructor Book Class Price Inro Comp MIS 2003 Parker Intro Comp MIS 2003 Kemp Data in Action MIS 4533 Kemp ORACLE Tricks MIS 4533 Warner Data in Action Warner ORACLE Tricks MIS 4533 MIS 4533 4th Normal Form INSTR-BOOK-COURSE(InstrID, Book, CourseID) COURSE-BOOK(CourseID, Book) COURSE-INSTR(CourseID, InstrID) 4NF (No multivalued dependencies) Independent repeating groups have been treated as a complex relationship. TABLE TABLE TABLE TABLE TABLE TABLE