Information Resources Management March 13, 2001 Agenda Administrivia Normalization Homework #7 Mid-Term #2 Administrivia Homework #4 Homework #5 Homework #6 Quiz 2 Mid-Term #1 Keys Mid-Term Grades Regrade Requests HW 5 & 6 Create Database Enter query(s) as submitted Submit to me Database (electronic) Graded homework (paper) Reserve the right to change test data and reexecute query Normalization Why & What 1st Normal Form 2nd Normal Form 3rd Normal Form Boyce-Codd Normal Form 4th Normal Form Normalization - Why Eliminate anomalies Avoid duplication Increase flexibility and stability Reduce maintenance Normalization - What?!? Analysis of functional dependencies between attributes Building several smaller tables from larger ones Decomposing relations with anomalies to produce smaller, well-structured relations Reducing complexity & increasing stability Normalization - What (2) Series of Steps Recipe for constructing a “good” physical model of a database from a logical model Applied to all existing tables, including ones produced by earlier normalization steps Example Sales (Order#, Date, CustID, Name, Address, City, State, Zip, {Product#, ProductDesc, Price, QuantityOrdered}, Subtotal, Tax, S&H, Total) What are the problems with using a single table for all order information? Problems Implementing Repeating Groups Duplication of Data (customer name & address) Unnecessary Data (subtotal, total, tax) Others Normalization is a process to eliminate these problems. 1st Normal Form Eliminate Repeating Groups 1st Normal Form has no repeating groups Create definition with all other attributes, remove the repeat {}, and change the primary key to include the “key” for the repeating group. Example Sales (Order#, Date, CustID, Name, Address, City, State, Zip, Product#, ProductDesc, Price, QuantityOrdered, Subtotal, Tax, S&H, Total) Why is this better? 1st NF Improvements Implementation is possible Querying is possible 2nd Normal Form Remove all partial functional dependencies 2nd Normal Form has no partial functional dependencies and is in 1st Normal Form Partial dependencies get their own tables -- original table gets a foreign key Partial Functional Dependencies An attribute is only dependent on part of the primary key must be composite key single attribute key is 2nd NF Functional dependencies can be specified explicitly but usually come from the E-R model, user specifications, and common sense key non-key attributes Example - Functional Dependencies Order# Date, CustID, Name, Address, City, State, Zip, Subtotal, Tax, S&H, Total Order#, Product# ProductDesc, Price, QuantityOrdered CustID Name, Address, City, State, Zip Product# ProductDesc, Price Which are partial functional dependencies? Example Sales (Order#, Date, CustID, Name, Address, City, State, Zip, Subtotal, Tax, S&H, Total) OrderLine (Order#, Product#, ProductDesc, Price, QuantityOrdered) Is this 2nd NF? Example Sales (Order#, Date, CustID, Name, Address, City, State, Zip, Subtotal, Tax, S&H, Total) OrderLine (Order#, Product#, QuantityOrdered) Product (Product#, ProductDesc, Price) Is this 2nd NF? Why is this better than 1st NF? 2nd NF Improvements Elimination of Duplicate Data No Loss 3rd Normal Form Eliminate transitive functional dependencies 3rd Normal Form has no transitive depencencies and is in 2nd Normal Form Transitive dependencies get their own tables -- original table gets a foreign key Transitive Functional Dependencies Attribute is dependent on another, nonkey attribute or attributes Attribute is the result of a calculation CustID Name, Address, City, State, Zip Example Sales (Order#, Date, CustID, Subtotal, Tax, S&H, Total) OrderLine (Order#, Product#, QuantityOrdered) Product (Product#, ProductDesc, Price) Customer (CustID, Name, Address, City, State, Zip) Is this 3rd NF? Why is this better than 2nd NF? Example Sales (Order#, Date, CustID) OrderLine (Order#, Product#, QuantityOrdered) Product (Product#, ProductDesc, Price) Customer (CustID, Name, Address, City, State, Zip) Is this 3rd NF? Why is this better than 2nd NF? 3rd NF Improvements Elimination of Duplicate Data No Loss Data is Well-grouped Beyond 3rd Normal Form Assume we also want to track information about products, builders, and finishes The following are the functional dependencies: Product, Finish Builder Builder Finish Beyond 3rd Normal Form ProdFinish (Product#, {Finish, Builder}) becomes ProdFinish (Product#, Finish, Builder) Is this 3rd NF? What’s wrong with 3rd NF? Product Finish Builder 1 Cherry Stan 1 Oak Sue 2 Oak Vera 3 Pine Marv 3 Cherry Stan 5 Oak Vera Product, Finish Builder Builder Finish What’s Wrong with 3rd NF? Product 1 1 2 3 3 5 Finish Cherry Oak Oak Pine Cherry Oak What happens when: 1. Vera is replaced by Vern? Builder Stan Sue Vera Marv Stan Vera 2. Vera is rehired to work with Oak? 3. Product #3 in pine is discontinued? What’s Wrong with 3rd NF? Problems 1. Multiple changes need to be made 2. Can’t assign a builder without a product 3. Lose information that Marv works in Pine Problem & Solution Problem: Builder Finish Builder is not a key Solution: Boyce-Codd Normal Form Boyce-Codd Normal Form (BCNF) Every determinant in a relation (LHS of the FD’s) is a candidate key and 3rd NF Make determinant part of the key and that which is dependent on it an attribute and renormalize Example ProductFinish (Product#, Builder, Finish) Is this BCNF? Hint: Is it 3rd NF? Example ProductFinish (Product#, Builder) Builder (Builder, Finish) Is there anything wrong with this? Example ProductBuilder (Product#, Builder) Builder (Builder, Finish) Normalization often results in the need to rename tables so the table name matches the actual contents. Beyond BCNF Normalization with separate repeating groups can result in other anomalies CustService (State, {SalesPerson}, {Delivery}) Beyond BCNF CustService (State, SalesPerson, Delivery) State Sales Delivery State Sales Delivery PA George UPS NJ Mike UPS PA George RPS NJ Mike Truck PA Sue UPS NJ Valerie UPS PA Sue RPS NJ Valerie Truck Is this BCNF? Beyond BCNF Everything is in the key -- must be BCNF Still problems with duplication Multivalued Dependencies Multivalued Dependency At least three attributes (A, B, C) A B and A C B and C are independent of each other (they really shouldn’t be in the same table) 4th Normal Form No multivalued dependencies and BCNF Create separate tables for each separate functional dependency Example SalesForce (State, SalesPerson) Delivery (State, Delivery) State Sales State Delivery PA George PA UPS PA Sue PA RPS NJ Mike NJ UPS NJ Valerie NJ Truck Beyond 4th Normal Form 5th Normal Form Project-Join Normal Form Domain Key Normal Form (DKNF) User View Remove partial functional dependencies 1st NF Remove repeating groups 2nd NF Remove remaining functional dependency anomalies 3rd NF Remove transitive functional dependencies BCNF Remove multivalued dependencies 4th NF In-Class Exercises Identify the current normal form If not 4th NF, transform to 4th NF Homework #7 Normalization Database schema from HW #3 Earlier due date - post key? Mid-Term #2 Next week, 3/20 Topics Converting an E-R Diagram to a physical database schema Normalizing that schema (3NF) SQL Identification of BCNF, 4NF problems