Database Processing Normalization Chapter 3 David M. Kroenke DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-1 Chapter Premise • We have one or more tables of data • The data is to be stored in a new database • QUESTION: Should the data be stored as received, or should it be transformed for storage? DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-2 Important Relational Model Terms • • • • • • • • • • • Entity Relation Functional Dependency Determinant Candidate Key Composite Key Primary Key Surrogate Key Foreign Key Referential integrity constraint Normal Form DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-3 Entity • An entity is some identifiable thing that users want to track: – Customers – Computers – Sales • Rows contain data about an entity • Columns contain data about attributes of the entity DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-4 Relation • Relational DBMS products store data about entities in relations, which are a special type of table • A relation is a two-dimensional table that has the following characteristics: – – – – – All entries in a column are of the same kind Each column has a unique name Cells of the table hold a single value The order of the columns and rows is unimportant No two rows may be identical DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-5 A Relation DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-6 Tables That Are Not Relations: Multiple Entries per Cell DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-7 Functional Dependency • A functional dependency occurs when the value of one (a set of) attribute(s) determines the value of a second (set of) attribute(s): StudentID StudentName StudentID (DormName, DormRoom, Fee) • The attribute on the left side of the functional dependency is called the determinant • Functional dependencies may be based on equations: ExtendedPrice = Quantity X UnitPrice (Quantity, UnitPrice) ExtendedPrice • Function dependencies are not equations! DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-8 Functional Dependencies Are Not Equations ObjectColor Weight ObjectColor Shape ObjectColor (Weight, Shape) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-9 Composite Determinants • Composite determinant: A determinant of a functional dependency that consists of more than one attribute (StudentName, ClassName) (Grade) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-10 Functional Dependency Rules • If A (B, C), then A B and A C • If (A,B) C, then neither A nor B determines C by itself DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-11 Functional Dependencies in the SKU_DATA Table DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-12 Functional Dependencies in the SKU_DATA Table SKU (SKU_Description, Department, Buyer) SKU_Description (SKU, Department, Buyer) Buyer Department DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-13 What Makes Determinant Values Unique? • A determinant is unique in a relation if, and only if, it determines every other column in the relation • You cannot find the determinants of all functional dependencies simply by looking for unique values in one column: – Data set limitations DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-14 Keys • A key is a combination of one or more columns that is used to identify rows in a relation • A composite key is a key that consists of two or more columns DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-15 Candidate and Primary Keys • A candidate key is a key that determines all of the other columns in a relation • A primary key is a candidate key selected as the primary means of identifying rows in a relation: – There is one and only one primary key per relation – The primary key may be a composite key – The ideal primary key is short, numeric and never changes DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-16 Surrogate Keys • A surrogate key as an artificial column added to a relation to serve as a primary key: – DBMS supplied – Short, numeric and never changes – an ideal primary key! – Has artificial values that are meaningless to users – Normally hidden in forms and reports DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-17 Surrogate Keys Relation Descriptions: • RENTAL_PROPERTY without surrogate key: RENTAL_PROPERTY (Street, City, State/Province, Zip/PostalCode, Rental_Rate) • RENTAL_PROPERTY with surrogate key: RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Rental_Rate) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-18 Foreign Keys • A foreign key is the primary key of one relation that is placed in another relation to form a link between the relations: – A foreign key can be a single column or a composite key – The term refers to the fact that key values are foreign to the relation in which they appear as foreign key values DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-19 Foreign Keys Relation Descriptions: DEPARTMENT (DeptName, BudgetCode, ManagerName) EMPLOYEE (EmpNumber, EmpName, DeptName) Or DEPARTMENT (DeptName, BudgetCode, ManagerName) EMPLOYEE (EmpNumber, EmpName, DeptName) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-20 The Referential Integrity Constraint • A referential integrity constraint is a statement that limits the values of the foreign key to those already existing as primary key values in the corresponding relation DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-21 Foreign Key with a Referential Integrity Constraint SKU_DATA (SKU, SKU_Description, Department, Buyer) ORDER_ITEM (OrderNumber, SKU, Quantity, Price, ExtendedPrice) Referential Integrity Constraint: ORDER_ITEM.SKU must first exist in SKU_DATA.SKU DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-22 Modification Anomalies An anomaly is an undesirable consequence of data modification – Deletion anomaly – Insertion anomaly – Update anomaly DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-23 Normal Forms • Relations are categorized as a normal form based on which modification anomalies or other problems that they are subject to: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-24 Normal Forms • 1NF – A table that qualifies as a relation is in 1NF • 2NF – A relation is in 2NF if all of its nonkey attributes are dependent on all of the primary key • 3NF – A relation is in 3NF if it is in 2NF and has no transitive dependencies • Boyce-Codd Normal Form (BCNF) – A relation is in BCNF if every determinant is a candidate key “I swear to construct my tables so that all nonkey columns are dependent on the key, the whole key and nothing but the key, so help me Codd.” DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-25 First Normal Form (1NF) • To be in First Normal Form (1NF) a relation must have only single-valued attributes -- neither repeating groups nor arrays are permitted DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-26 Violation of 1NF PlayerID Name Team AtBats Hits 123-456789 Alomar, R Cleveland, New York 234-567890 Alomar, S Cleveland, Chicago 200, 120 60, 150 50, 58 17, 40 Key: PlayerID Determinants: PlayerID Name PlayerID, Team AtBats, Hits DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-27 Second Normal Form (2NF) • To be in Second Normal Form (2NF) the relation must be in 1NF and each nonkey attribute must be dependent on the whole key (not a subset of the key) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-28 Violation of 2NF ItemNo CustomerID Quantity CreditRtg 12 57 25 OK 34 679 3 Poor Key: ItemNo, CustomerID Determinants: ItemNo, CustomerID Quantity CustomerID CreditRtg DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-29 Third Normal Form (3NF) • To be in Third Normal Form (3NF) the relation must be in 2NF and no transitive dependencies may exist within the relation. • A transitive dependency is when an attribute is indirectly functionally dependent on the key (that is, the dependency is through another nonkey attribute) DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-30 Violation of 3NF DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-31 Boyce-Codd Normal Form (BCNF) To be in Boyce-Codd Normal Form (BCNF) the relation must be in 3NF and every determinant must be a candidate key. DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-32 Steps for BCNF DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-33 Violation of BCNF Client ProbType Consultant Alpha Marketing Gomez Alpha Production Raginski Beta Marketing Gomez Omega Marketing Taylor Key: Client, ProbType Candidate Key: Client, Consultant Determinants: Consultant ProbType * No two consultants have same name * Consultant specializes in just one problem type * Consultant can be assigned to multiple clients DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 3-34