IT 20303 • The Relational DBMS • Section 05

Relational Database Theory • Normalization for Logical Database Design

• Normalization
  – Process of analyzing a grouping of data items
      • Based on the inherent characteristics of the data
      • Often applied to existing files or databases
  – Principles
      • Data items belong together in a logical group
      • Each group of items can be identified by its own unique identifier
      • Data in the group describes one, and only one, thing
  – A bottom-up approach

• Why Normalize
  – Avoid update anomalies
      • Nasty side effects of inserting, changing, or deleting redundant data
  – Minimize storage of redundant data
  – Support simpler logic for manipulating data

• Why Not Normalize
  – Data is never (or very rarely) updated
  – A data warehouse system, for example, is seldom normalized

• Sample Data Not Normalized

      WARD NAME  WARD TYPE   NO. OF BEDS  SENIOR NURSE  PATIENT NO  PATIENT NAME  DATE OF BIRTH
      Liston     Orthopedic  6            J Bryan       45812       D Carter      21/02/65
                                                        71384       R Willis      08/10/46
                                                        69355       G Barnes      17/06/41
      Godlee     General     10           V Fox         52217       M Brown       21/02/35
                                                        10823       R Willis      12/03/54

• How to Normalize Data using Functional Dependencies
  – Definition of Functional Dependency
      • Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X value in R has associated with it precisely one Y value in R (at any one time)

• Y of R is Dependent on X of R
  – X functionally determines Y, written X → Y

      WARD NAME  WARD TYPE   NO. OF BEDS  SENIOR NURSE  PATIENT NO  PATIENT NAME  DATE OF BIRTH
      Liston     Orthopedic  6            J Bryan       45812       D Carter      21/02/65
      Liston     Orthopedic  6            J Bryan       71384       R Willis      08/10/46
      Liston     Orthopedic  6            J Bryan       69355       G Barnes      17/06/41
      Godlee     General     10           V Fox         52217       M Brown       21/02/35
      Godlee     General     10           V Fox         10823       R Willis      12/03/54

• Functional Dependency Diagram of Hospital Ward Example
  – Patient No → Patient Name, Date of Birth, Ward Name
  – Ward Name → Ward Type, No of Beds, Senior Nurse

• Table structure based on FD Diagram

      WARD
      WARD NAME  WARD TYPE   NO OF BEDS  SENIOR NURSE
      Liston     Orthopedic  6           J Bryan
      Godlee     General     10          V Fox

      PATIENT
      PATIENT NO  PATIENT NAME  DATE OF BIRTH  WARD NAME
      45812       D Carter      21/02/65       Liston
      71384       R Willis      08/10/46       Liston
      69355       G Barnes      17/06/41       Liston
      52217       M Brown       21/02/35       Godlee
      10823       R Willis      12/03/54       Godlee

• Normalization using Codd’s Rules
  – Origin
      • Early enthusiasts wanted to use relational theory
      • They sought rules for structuring data in the relational model
  – Codd and contemporaries developed rules for “Normal Forms”
      • 1NF, 2NF, 3NF: the levels normally reached in database design
      • Boyce/Codd NF (sometimes called 3.5NF)
      • 4NF
      • 5NF

• Customer-Order-Line Item Example
  – Assume an existing order-entry program and data file, where the product fields repeat once per line item:

      ORD-NO  CUST-NO  CUSTNAME  ADDR
      PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC
      …
      PROD-NO  PROD-NAME  UNIT-PRC  QTY  TOT-ITEM-PRC

• 1NF: Break out repeating groups
  – Before:
      ORDER (ORD-NO, CUST-NO, CUSTNAME, ADDR, PROD-NO, PROD-NAME, UNIT-PRC, QTY, TOT-ITEM-PRC, …, PROD-NO, PROD-NAME, UNIT-PRC, QTY, TOT-ITEM-PRC)
  – After:
      ORDER (ORD-NO, CUST-NO, CUSTNAME, ADDR)
      LINEITEM (ORD-NO, PROD-NO, PROD-NAME, UNIT-PRC, QTY, TOT-ITEM-PRC)

• 2NF: Break out attributes dependent on part of the primary key
  – Before:
      LINEITEM (ORD-NO, PROD-NO, PROD-NAME, UNIT-PRC, QTY, TOT-ITEM-PRC)
  – After:
      ORDER (ORD-NO, CUST-NO, CUSTNAME, ADDR)
      LINEITEM (ORD-NO, PROD-NO, QTY, TOT-ITEM-PRC)
      PRODUCT (PROD-NO, PROD-NAME, UNIT-PRC)

• 3NF: Break out attributes wholly dependent on another key
  – Before:
      ORDER (ORD-NO, CUST-NO, CUSTNAME, ADDR)
  – After:
      ORDER (ORD-NO, CUST-NO)
      CUSTOMER (CUST-NO, CUSTNAME, ADDR)
      LINEITEM (ORD-NO, PROD-NO, QTY, TOT-ITEM-PRC)
      PRODUCT (PROD-NO, PROD-NAME, UNIT-PRC)

• Rules for 1NF, 2NF, & 3NF
  – 1NF
      • Break out repeating groups into a separate entity
  – 2NF
      • Break out attributes that are dependent on only part of the primary key into a separate entity
      • Called Partial Dependency
  – 3NF
      • Break out attributes that are wholly dependent on another key (not the PK) into a separate entity
      • Called Transitive Dependency

• Normalization
  – A relation R is in 3rd Normal Form (3NF) if and only if the non-key attributes of R (if any) are:
      • Mutually independent, and
      • Fully dependent on the primary key of R
  – Put another way, a relation is in 3NF if all the non-key attributes are functionally dependent
      • On the Key
      • On the Whole Key, and
      • On Nothing but the Key
      • (So Help Me Codd)

• Reconcile differences between the Data Model and Normalized Data Structures
  – Data model and normalized data structures must be reconciled
  – Discard data items from old files that are no longer needed
      • Calculation fields
      • Redundant fields
  – Resolve discrepancies in data item names
  – Ensure that new fields are really necessary
      • Use standard naming conventions

• Example 01:

      PART-NO  SUPP-1  SUPP-2  SUPP-3  SUPP-4
      WDGT01   XYZZY   FOOBAR  NULL    NULL

  – What happens when a part has more than four suppliers?
  – What happens when a supplier is dropped?
  – How do you query the parts with two or more suppliers?
  – Normalized table:

      PART-NO  SUPP
      WDGT01   XYZZY
      WDGT01   FOOBAR

• Example 02: Normalize this table

      PART-NO  SUPP    PART_DESC    SUPP-ADDRESS
      WDGT01   XYZZY   Blue Widget  123 Bluejay Way
      WDGT01   FOOBAR  Blue Widget  544 Old Orchard

  – Normalized tables:

      PART-NO  SUPP
      WDGT01   XYZZY
      WDGT01   FOOBAR

      SUPP    SUPP-ADDRESS
      XYZZY   123 Bluejay Way
      FOOBAR  544 Old Orchard

      PART-NO  PART_DESC
      WDGT01   Blue Widget

End Section

• Multiple ways to Normalize Data
  – Normalization can be accomplished in different ways
      • A well-formed E-R model is normalized
      • Functional dependencies
      • Codd’s Rules for 1NF, 2NF, & 3NF
  – Discrepancies between the results indicate that something is missing or has changed
  – One approach validates or checks another approach

• Impact of Normalization
  – Improve the integrity of data
      • Purpose is to eliminate update anomalies
  – Minimize storage of redundant data
  – Reduce the complexity of programming logic
      • Emphasis now is on maintainability and simplicity of programs
      • Normalized data can minimize the complexity of the code that manipulates the data
  – Enhance the stability, or “goodness”, of the database design
      • Normalized data tends to be easier to understand
      • Normalized data can be used more easily by many different applications

• Impact of Normalization on Performance
  – Concern that a large number of tables, and table joins, will result in poor performance
      • A join can be a very expensive operation
      • Test to determine the frequency of joins and the number of tables joined, after the database is created and available
  – Requirements for application performance and response time dictate corrective actions
  – Performance is addressed in the section on physical database design
      • There are alternatives to de-normalizing data to improve performance

• Recommendations for Data that is Updated
  – First, normalize
  – Don’t be dismayed by too many tables
      • Normalization increases the number of tables but improves logic
  – Normalization is a helpful logical database design technique…for any DBMS

• Objective of the design process is a “Good” design
  – The logical database design process
      • Is well understood
      • Uses complementary techniques
      • Can be automated with CASE tools
  – A “Good” database design
      • Contains all the important entities and data items
      • Has stable primary keys
      • Identifies clearly all relationships
      • Has table structures in 3NF
      • Is understood by designers and users
      • Accurately models the real world, as described in the requirements

• Questions?
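
As a supplement to the hospital ward example, the functional dependency definition can be checked mechanically against sample rows. The following is a minimal Python sketch and not part of the original slides: the helper names `project` and `determines` and the lowercase column names are assumptions chosen for illustration. It tests whether each X value is paired with exactly one Y value, then builds the WARD and PATIENT tables by projecting the flat rows. A check like this can only refute a dependency on the data at hand; whether a dependency really holds depends on the rules of the real-world system being modeled.

```python
# Flat (unnormalized) rows from the hospital ward sample data.
# The column names below are illustrative, not taken from the slides.
COLUMNS = ("ward_name", "ward_type", "no_of_beds", "senior_nurse",
           "patient_no", "patient_name", "date_of_birth")

FLAT_ROWS = [
    ("Liston", "Orthopedic", 6, "J Bryan", 45812, "D Carter", "21/02/65"),
    ("Liston", "Orthopedic", 6, "J Bryan", 71384, "R Willis", "08/10/46"),
    ("Liston", "Orthopedic", 6, "J Bryan", 69355, "G Barnes", "17/06/41"),
    ("Godlee", "General", 10, "V Fox", 52217, "M Brown", "21/02/35"),
    ("Godlee", "General", 10, "V Fox", 10823, "R Willis", "12/03/54"),
]


def project(rows, columns, wanted):
    """Return the distinct tuples of the `wanted` columns, in first-seen order."""
    idx = [columns.index(name) for name in wanted]
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row[i] for i in idx), None)
    return list(seen)


def determines(rows, columns, x, y):
    """True if x -> y holds in `rows`: each x value has exactly one y value."""
    mapping = {}
    for row in project(rows, columns, x + y):
        key, value = row[:len(x)], row[len(x):]
        # setdefault stores the first y value seen for this x value and
        # returns the stored one afterwards; a mismatch violates x -> y.
        if mapping.setdefault(key, value) != value:
            return False
    return True


# Ward Name -> Ward Type, No of Beds, Senior Nurse holds in the sample.
ward_fd = determines(FLAT_ROWS, COLUMNS, ["ward_name"],
                     ["ward_type", "no_of_beds", "senior_nurse"])

# Patient Name -> Patient No does not hold: two patients are named R Willis.
name_fd = determines(FLAT_ROWS, COLUMNS, ["patient_name"], ["patient_no"])

# Decompose along the dependencies: one table per determinant.
ward = project(FLAT_ROWS, COLUMNS,
               ["ward_name", "ward_type", "no_of_beds", "senior_nurse"])
patient = project(FLAT_ROWS, COLUMNS,
                  ["patient_no", "patient_name", "date_of_birth", "ward_name"])

print(ward_fd)       # True
print(name_fd)       # False
print(len(ward))     # 2
print(len(patient))  # 5
```

The ward details, stored once per patient in the flat data, appear once per ward after projection, which is the redundancy that normalization removes.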