Normalization
• Also called “lossless decomposition”
• Process of optimizing table structures to eliminate redundancy and avoid anomalies and problems with extensibility.
• Supports the golden rule: each fact should be stored in the database only once.
• Does not solve every design problem, but provides a solid foundation.

Normal Forms
• 1st Normal Form
• 2nd Normal Form
• 3rd Normal Form
• BCNF
• 4th Normal Form
• 5th Normal Form
• Domain-Key Normal Form

1st Normal Form
First Normal Form is violated if:
• The relation has no identifiable primary key.
• Any attempt has been made to store a multivalued fact in a tuple.

1st NF - Example
Evaluate the design solutions on the next four slides for:
• Query-ability
• Join-ability
• Constrain-ability
• Extensibility (of Language Domain)
• Extensibility (of Schema)

1NF Example - Schema 1 (correct)

Employees Table
EMPID  LNAME     FNAME   SEX  DEPT  PHONE     SALARY
23     Jones     Mark    M    ITR   555-1087  45000
25     Smith     Sara    F    FINC  555-2222  55000
26     Billings  David   M    ACTG  555-4356  42000
31     Dance     Ivanna  F    ACTG  444-4887  60000
32     Jones     Mary    F    ITR   555-8745  70000
35     Barker    Bob     M    ACTG  555-6565  44000
36     Woods     Robin   M    ITR   555-9812  90000
37     Jones     Mary    F    FINC  555-1234  56000

Programs Table
EMPID  LANGUAGE
23     COBOL
23     JAVA
23     SQL
31     SQL
32     JAVA
32     SQL
32     VB
32     COBOL
36     VB
36     SQL
36     JAVA
37     COBOL
37     SQL

Languages Table
NAME   FULLNAME
COBOL  COmmon Business Oriented Language
JAVA   JAVA
SQL    Structured Query Language
VB     Visual Basic

1NF Example - Schema 2 (incorrect)

Employees Table
EMPID  LNAME     FNAME   SEX  DEPT  PHONE     SALARY  LANGUAGES
23     Jones     Mark    M    ITR   555-1087  45000   COBOL, JAVA, SQL
25     Smith     Sara    F    FINC  555-2222  55000
26     Billings  David   M    ACTG  555-4356  42000
31     Dance     Ivanna  F    ACTG  444-4887  60000   SQL
32     Jones     Mary    F    ITR   555-8745  70000   JAVA, SQL, VB, COBOL
35     Barker    Bob     M    ACTG  555-6565  44000
36     Woods     Robin   M    ITR   555-9812  90000   VB, SQL, JAVA
37     Jones     Mary    F    FINC  555-1234  56000   COBOL, SQL

Languages Table: same as in Schema 1.

1NF Example - Schema 3 (incorrect)

Employees Table
EMPID  LNAME     FNAME   SEX  DEPT  PHONE     SALARY  LANG1  LANG2  LANG3  LANG4
23     Jones     Mark    M    ITR   555-1087  45000   COBOL  JAVA   SQL
25     Smith     Sara    F    FINC  555-2222  55000
26     Billings  David   M    ACTG  555-4356  42000
31     Dance     Ivanna  F    ACTG  444-4887  60000   SQL
32     Jones     Mary    F    ITR   555-8745  70000   JAVA   SQL    VB     COBOL
35     Barker    Bob     M    ACTG  555-6565  44000
36     Woods     Robin   M    ITR   555-9812  90000   VB     SQL    JAVA
37     Jones     Mary    F    FINC  555-1234  56000   COBOL  SQL

Languages Table: same as in Schema 1.

1NF Example - Schema 4 (incorrect)

Employees Table
EMPID  LNAME     FNAME   SEX  DEPT  PHONE     SALARY  COBOL  JAVA  SQL  VB
23     Jones     Mark    M    ITR   555-1087  45000   T      T     T    F
25     Smith     Sara    F    FINC  555-2222  55000   F      F     F    F
26     Billings  David   M    ACTG  555-4356  42000   F      F     F    F
31     Dance     Ivanna  F    ACTG  444-4887  60000   F      F     T    F
32     Jones     Mary    F    ITR   555-8745  70000   T      T     T    T
35     Barker    Bob     M    ACTG  555-6565  44000   F      F     F    F
36     Woods     Robin   M    ITR   555-9812  90000   F      T     T    T
37     Jones     Mary    F    FINC  555-1234  56000   T      F     T    F

Languages Table: same as in Schema 1.

2nd Normal Form
Second Normal Form is violated if:
• First Normal Form is violated
• A non-key field is functionally dependent on a partial key (only part of a composite key).
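The 2NF rule, and every normal form after it, rests on the idea of functional dependence: X functionally determines Y when no two rows agree on X but differ on Y. A minimal sketch of that test follows, using a few columns of the Employees table from the 1NF example. The helper name fd_holds is an assumption made for illustration. A check like this can only refute a dependency; a dependency that happens to hold in sample data is not thereby a rule of the business.

```python
# A few columns of the Employees table from the 1NF example.
EMPLOYEES = [
    {"EMPID": 23, "LNAME": "Jones", "FNAME": "Mark", "DEPT": "ITR"},
    {"EMPID": 25, "LNAME": "Smith", "FNAME": "Sara", "DEPT": "FINC"},
    {"EMPID": 26, "LNAME": "Billings", "FNAME": "David", "DEPT": "ACTG"},
    {"EMPID": 31, "LNAME": "Dance", "FNAME": "Ivanna", "DEPT": "ACTG"},
    {"EMPID": 32, "LNAME": "Jones", "FNAME": "Mary", "DEPT": "ITR"},
    {"EMPID": 35, "LNAME": "Barker", "FNAME": "Bob", "DEPT": "ACTG"},
    {"EMPID": 36, "LNAME": "Woods", "FNAME": "Robin", "DEPT": "ITR"},
    {"EMPID": 37, "LNAME": "Jones", "FNAME": "Mary", "DEPT": "FINC"},
]


def fd_holds(rows, lhs, rhs):
    """Hypothetical helper: True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[col] for col in lhs)
        right = tuple(row[col] for col in rhs)
        # setdefault records the first right-hand value seen for this left-hand value
        if seen.setdefault(left, right) != right:
            return False
    return True


# The key determines every other field.
empid_to_dept = fd_holds(EMPLOYEES, ["EMPID"], ["DEPT"])
# Refuted: the three Joneses work in ITR and FINC.
lname_to_dept = fd_holds(EMPLOYEES, ["LNAME"], ["DEPT"])
# Holds in these eight rows only by coincidence (both Marys are Joneses).
fname_to_lname = fd_holds(EMPLOYEES, ["FNAME"], ["LNAME"])

print(empid_to_dept, lname_to_dept, fname_to_lname)
```

The third check is the cautionary case: the sample does not contradict FNAME -> LNAME, yet nobody would accept it as a design rule, which is why dependencies are read from the semantics of the data and only cross-checked against rows.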
partial key -> non-key

2NF Example - Raw Data

                                      DEBIT    CREDIT
JE #1  02-JAN-2003
       100 Cash                      20,000
       310 Smith-Capital                       20,000
       (owner investment)
JE #2  03-JAN-2003
       100 Cash                      30,000
       220 Notes Payable                       30,000
       (borrowed money)
JE #3  03-JAN-2003
       120 Supplies                   5,000
       100 Cash                                 1,000
       220 Notes Payable                        4,000
       (purchased supplies)

2NF Example - Violation

Transactions Table
JENO  LINENO  DATE         DESCRIPTION         ACCTNO  ACCTNAME       AMOUNT
1     1       02-JAN-2003  Owner investment    100     Cash           20,000
1     2       02-JAN-2003  Owner investment    310     Smith-Capital  (20,000)
2     1       03-JAN-2003  Borrowed money      100     Cash           30,000
2     2       03-JAN-2003  Borrowed money      220     Notes Payable  (30,000)
3     1       03-JAN-2003  Purchased Supplies  120     Supplies       5,000
3     2       03-JAN-2003  Purchased Supplies  100     Cash           (1,000)
3     3       03-JAN-2003  Purchased Supplies  220     Notes Payable  (4,000)

Is there a non-key field which is functionally dependent on a partial key?

2NF Example - Violation
FDs that indicate violation of 2NF (the key of the Transactions Table is JENO, LINENO):
JENO -> DATE
JENO -> DESCRIPTION

2NF Example - Corrected

Journal_Entry Table
JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Transactions Table
JENO  LINENO  ACCTNO  ACCTNAME       AMOUNT
1     1       100     Cash           20,000
1     2       310     Smith-Capital  (20,000)
2     1       100     Cash           30,000
2     2       220     Notes Payable  (30,000)
3     1       120     Supplies       5,000
3     2       100     Cash           (1,000)
3     3       220     Notes Payable  (4,000)

3rd Normal Form
Third Normal Form is violated if:
• Second Normal Form is violated
• A non-key field is functionally dependent on another non-key field.

non-key -> non-key

Note: A candidate key is not a non-key field.
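The corrected 2NF design can be checked in two ways that also lead into the 3NF question. The sketch below, using Python's built-in sqlite3 module, loads the Journal_Entry and Transactions tables, joins them back into the original seven-column rows (the "lossless" part of the decomposition), and then looks for account names that are still stored more than once. Two assumptions are made for illustration: amounts shown in parentheses are stored as negative integers, and the column types are guesses, since the slides give none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Journal_Entry (
    JENO        INTEGER PRIMARY KEY,
    DATE        TEXT NOT NULL,
    DESCRIPTION TEXT NOT NULL
);
CREATE TABLE Transactions (
    JENO     INTEGER NOT NULL REFERENCES Journal_Entry(JENO),
    LINENO   INTEGER NOT NULL,
    ACCTNO   INTEGER NOT NULL,
    ACCTNAME TEXT NOT NULL,
    AMOUNT   INTEGER NOT NULL,  -- assumption: credits stored as negatives
    PRIMARY KEY (JENO, LINENO)
);
""")
conn.executemany("INSERT INTO Journal_Entry VALUES (?, ?, ?)", [
    (1, "02-JAN-2003", "Owner investment"),
    (2, "03-JAN-2003", "Borrowed money"),
    (3, "03-JAN-2003", "Purchased Supplies"),
])
conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 100, "Cash", 20000),
    (1, 2, 310, "Smith-Capital", -20000),
    (2, 1, 100, "Cash", 30000),
    (2, 2, 220, "Notes Payable", -30000),
    (3, 1, 120, "Supplies", 5000),
    (3, 2, 100, "Cash", -1000),
    (3, 3, 220, "Notes Payable", -4000),
])

# Joining the two tables rebuilds the rows of the original single table.
rejoined = conn.execute("""
    SELECT t.JENO, t.LINENO, j.DATE, j.DESCRIPTION,
           t.ACCTNO, t.ACCTNAME, t.AMOUNT
    FROM Transactions AS t
    JOIN Journal_Entry AS j ON j.JENO = t.JENO
    ORDER BY t.JENO, t.LINENO
""").fetchall()

# Account names that the Transactions table still stores more than once.
repeated = conn.execute("""
    SELECT ACCTNO, ACCTNAME, COUNT(*)
    FROM Transactions
    GROUP BY ACCTNO, ACCTNAME
    HAVING COUNT(*) > 1
    ORDER BY ACCTNO
""").fetchall()

print(len(rejoined))  # 7
print(repeated)       # [(100, 'Cash', 3), (220, 'Notes Payable', 2)]
```

The join recovers all seven original rows, so nothing was lost by splitting the table. The second query shows that the name of account 100 is still recorded three times and that of account 220 twice, which is the kind of repeated fact the 3NF rule targets.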
3NF Example - Violation

Journal_Entry Table
JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Transactions Table
JENO  LINENO  ACCTNO  ACCTNAME       AMOUNT
1     1       100     Cash           20,000
1     2       310     Smith-Capital  (20,000)
2     1       100     Cash           30,000
2     2       220     Notes Payable  (30,000)
3     1       120     Supplies       5,000
3     2       100     Cash           (1,000)
3     3       220     Notes Payable  (4,000)

Are there any non-key fields which functionally determine another non-key field?
Are there any redundant facts?

3NF Example - Violation
FD that indicates violation of 3NF (Transactions Table):
ACCTNO -> ACCTNAME

Anomalies if not corrected:
• update (if the name of account 100 changes, it must be changed in multiple places, risking inconsistency)
• deletion (can't delete JE #3 and its transactions without losing information about account 120)
• insertion (can't set up a new account, Jones-Capital, for a new partner unless we first have a transaction involving that account)

3NF Example - Corrected

Accounts Table
ACCTNO  ACCTNAME
100     Cash
120     Supplies
220     Notes Payable
310     Smith-Capital

Journal_Entry Table
JENO  DATE         DESCRIPTION
1     02-JAN-2003  Owner investment
2     03-JAN-2003  Borrowed money
3     03-JAN-2003  Purchased Supplies

Transactions Table
JENO  LINENO  ACCTNO  AMOUNT
1     1       100     20,000
1     2       310     (20,000)
2     1       100     30,000
2     2       220     (30,000)
3     1       120     5,000
3     2       100     (1,000)
3     3       220     (4,000)

3NF Example - Corrected
Final Dependencies
Accounts Table:       ACCTNO -> ACCTNAME
Journal_Entry Table:  JENO -> DATE, DESCRIPTION
Transactions Table:   JENO, LINENO -> ACCTNO, AMOUNT

All non-key fields are functionally dependent on the primary key and only the primary key.

Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form is violated if:
• Third Normal Form is violated
• A partial key is functionally dependent on a non-key field.

non-key -> partial-key

BCNF Example
Semantics
• A student can have more than one major.
• A student has a different advisor for each major.
• Each advisor advises for only one major.

BCNF Example - Violation

Student_Majors Table
SID  MAJOR             ADVISOR
1    PHYSICS           EINSTEIN
1    BIOLOGY           LIVINGSTON
2    PHYSICS           BOHR
2    COMPUTER SCIENCE  CODD
3    PHYSICS           EINSTEIN
4    BIOLOGY           LIVINGSTON
4    ACCOUNTING        PACIOLI
5    PHYSICS           EINSTEIN
6    PHYSICS           BOHR
6    BIOLOGY           DARWIN
7    COMPUTER SCIENCE  CODD
7    BIOLOGY           DARWIN

Does this relation violate third normal form?
Are there any redundant facts?

BCNF Example - Violation
FD that violates BCNF (Student_Majors Table, key SID, MAJOR):
ADVISOR -> MAJOR

It is important that you convince yourself that MAJOR does not functionally determine ADVISOR.

BCNF Example - Corrected

Advisors Table
ADVISOR     MAJOR
BOHR        PHYSICS
CODD        COMPUTER SCIENCE
DARWIN      BIOLOGY
EINSTEIN    PHYSICS
LIVINGSTON  BIOLOGY
PACIOLI     ACCOUNTING

Student_Advisors Table
SID  ADVISOR
1    EINSTEIN
1    LIVINGSTON
2    BOHR
2    CODD
3    EINSTEIN
4    LIVINGSTON
4    PACIOLI
5    EINSTEIN
6    BOHR
6    DARWIN
7    CODD
7    DARWIN

Note that if the key of the original table (schema 1) had, counter-intuitively, been defined as SID & ADVISOR, this would have been a 2NF violation.

4th Normal Form
4th Normal Form is violated if:
• Boyce-Codd Normal Form is violated
• A partial key has multiple independent multivalued dependencies to other partial keys.
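Before the 4NF example, one more look at the BCNF correction in terms of constrain-ability. The rule "each advisor advises for only one major" cannot be enforced by the key of Student_Majors, but it falls out of the primary key of the Advisors table. The sketch below uses Python's built-in sqlite3 module; the column types, the choice of (SID, ADVISOR) as the key of Student_Advisors, and student 8 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Original design: the key is (SID, MAJOR), so it says nothing about advisors.
CREATE TABLE Student_Majors (
    SID     INTEGER NOT NULL,
    MAJOR   TEXT NOT NULL,
    ADVISOR TEXT NOT NULL,
    PRIMARY KEY (SID, MAJOR)
);
-- Corrected design: ADVISOR is the key of Advisors.
CREATE TABLE Advisors (
    ADVISOR TEXT PRIMARY KEY,
    MAJOR   TEXT NOT NULL
);
CREATE TABLE Student_Advisors (
    SID     INTEGER NOT NULL,
    ADVISOR TEXT NOT NULL REFERENCES Advisors(ADVISOR),
    PRIMARY KEY (SID, ADVISOR)
);
""")

# Original design: hypothetical student 8 lists EINSTEIN under BIOLOGY.
# Nothing in the table definition objects.
conn.execute("INSERT INTO Student_Majors VALUES (1, 'PHYSICS', 'EINSTEIN')")
conn.execute("INSERT INTO Student_Majors VALUES (8, 'BIOLOGY', 'EINSTEIN')")
majors_for_einstein = conn.execute(
    "SELECT COUNT(DISTINCT MAJOR) FROM Student_Majors "
    "WHERE ADVISOR = 'EINSTEIN'"
).fetchone()[0]

# Corrected design: a second major for EINSTEIN violates the primary key.
conn.execute("INSERT INTO Advisors VALUES ('EINSTEIN', 'PHYSICS')")
try:
    conn.execute("INSERT INTO Advisors VALUES ('EINSTEIN', 'BIOLOGY')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(majors_for_einstein, rejected)  # 2 True
```

In the original table the contradictory row is accepted and EINSTEIN ends up with two majors. In the corrected schema the same fact is refused by the database itself, with no application code involved.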
partial-key1 ->> partial-key2
partial-key1 ->> partial-key3

4NF Example - Violation

Instruments_Languages
Name  Instrument  Language
Fred  Piano       French
Fred  Flute       Italian
Fred  Flute       Spanish
Jane  Piano       French
Jane  Oboe        French
Sam   Piano       French
Sam   Oboe        Spanish
Sam   Flute       Spanish

Does this relation violate 1st, 2nd, 3rd, or BCNF?
Are there any redundant facts?

4NF Example - Correction

LanguagesSpoken
Name  Language
Fred  French
Fred  Italian
Fred  Spanish
Jane  French
Sam   French
Sam   Spanish

InstrumentsPlayed
Name  Instrument
Fred  Piano
Fred  Flute
Jane  Piano
Jane  Oboe
Sam   Piano
Sam   Oboe
Sam   Flute
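The question "Are there any redundant facts?" can be answered mechanically. The sketch below projects the Instruments_Languages rows onto (Name, Instrument) and (Name, Language), which yields the two corrected tables, and then counts how often each of those facts appears in the single-table design. The variable names are illustrative, not part of the original slides.

```python
from collections import Counter

# Rows of the Instruments_Languages table from the violation slide.
ROWS = [
    ("Fred", "Piano", "French"),
    ("Fred", "Flute", "Italian"),
    ("Fred", "Flute", "Spanish"),
    ("Jane", "Piano", "French"),
    ("Jane", "Oboe", "French"),
    ("Sam", "Piano", "French"),
    ("Sam", "Oboe", "Spanish"),
    ("Sam", "Flute", "Spanish"),
]

# Project onto each pair of columns; the set drops duplicate facts.
instruments_played = sorted({(name, inst) for name, inst, _ in ROWS})
languages_spoken = sorted({(name, lang) for name, _, lang in ROWS})

# Count how often the single table repeats each individual fact.
instrument_counts = Counter((name, inst) for name, inst, _ in ROWS)
language_counts = Counter((name, lang) for name, _, lang in ROWS)
repeated = sorted(
    fact
    for counts in (instrument_counts, language_counts)
    for fact, times in counts.items()
    if times > 1
)

print(len(instruments_played), len(languages_spoken))  # 7 6
print(repeated)
# [('Fred', 'Flute'), ('Jane', 'French'), ('Sam', 'Spanish')]
```

The projections contain 7 and 6 rows, matching InstrumentsPlayed and LanguagesSpoken. Three facts are stored twice in the combined table: Fred plays the flute, Jane speaks French, and Sam speaks Spanish. Each is stored exactly once after the correction.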