Normalization
A337

Structure
What is a database?
◦ Tables of information
  Rows are referred to as records
  Columns are referred to as fields or attributes
  The record identifier is referred to as the record key
Types
◦ Relational (most common), Object-Oriented
◦ Hierarchical, Network (much older types)

A337 - Reed Smith

Database structure
Two approaches to the structure issue:
◦ Conceptual: you start with the question “what information should I have?” and build an ERD from scratch
◦ Empirical: you already know what data there will be and just want to organize it into tables (NORMALIZATION)

Database Tables and Normalization
Normalization: a process for evaluating and correcting table structures to minimize data redundancies
◦ Works through a series of stages called normal forms:
  First normal form (1NF)
  Second normal form (2NF)
  Third normal form (3NF)
◦ Higher normal forms exist but are rarely necessary

Normalization
Why?
◦ Data structures need to:
  Minimize redundancy
  Avoid insertion, update, and deletion anomalies
How?
◦ Restructure information such that:
  Only flat (rectangular) files exist (1st normal form): No Nulls
  All items in each record depend upon the entire primary record key (2nd normal form): No Partial Dependencies
  If a field depends upon another, then the “other” must be a primary key (3rd normal form): No Transitive Dependencies

Normalize the following table:

SALES_ORDERS
SO_Number  Item_Number  Item_Name          Qty_Ordered  Cust_Code  Cust_Name
1010       2010-0050    Formed Handlebar   2            WHEEL      Wheelaway Cycle Center
           1000-1       20 in. Bicycle     5            WHEEL      Wheelaway Cycle Center
1011       1002-1       24 in. Bicycle     5            ETC        Bikes Et Cetera
           1001-1       26 in. Bicycle     10           ETC        Bikes Et Cetera
1012       1003-1       20 in. Bicycle     5            WHEEL      Wheelaway Cycle Center
           1001-1       26 in. Bicycle     10           WHEEL      Wheelaway Cycle Center
1013       1001-1       26 in. Bicycle     50           IBS        Inter. Bicycle Sales
1014       1003-1       20 in. Bicycle     25           ETC        Bikes Et Cetera
1015       1003-1       20 in. Bicycle     25           WHEEL      Wheelaway Cycle Center
1016       3961-1041    Tire Tube, 26 in.  5            ETC        Bikes Et Cetera
           3965-1050    Spoke Reflector    50           ETC        Bikes Et Cetera
           1003-1       20 in. Bicycle     5            ETC        Bikes Et Cetera
           1000-1       20 in. Bicycle     4            ETC        Bikes Et Cetera

What is wrong with this solution?

First Normal Form
Eliminate Nulls/Repeating Groups
◦ Eliminate repeating groups by eliminating nulls: replace each blank cell with the value it implies
Select a primary key
◦ May be a composite key

1NF: SALES_ORDERS
SO_Number  Item_Number  Item_Name          Qty_Ordered  Cust_Code  Cust_Name
1010       2010-0050    Formed Handlebar   2            WHEEL      Wheelaway Cycle Center
1010       1000-1       20 in. Bicycle     5            WHEEL      Wheelaway Cycle Center
1011       1002-1       24 in. Bicycle     5            ETC        Bikes Et Cetera
1011       1001-1       26 in. Bicycle     10           ETC        Bikes Et Cetera
1012       1003-1       20 in. Bicycle     5            WHEEL      Wheelaway Cycle Center
1012       1001-1       26 in. Bicycle     10           WHEEL      Wheelaway Cycle Center
1013       1001-1       26 in. Bicycle     50           IBS        Inter. Bicycle Sales
1014       1003-1       20 in. Bicycle     25           ETC        Bikes Et Cetera
1015       1003-1       20 in. Bicycle     25           WHEEL      Wheelaway Cycle Center
1016       3961-1041    Tire Tube, 26 in.  5            ETC        Bikes Et Cetera
1016       3965-1050    Spoke Reflector    50           ETC        Bikes Et Cetera
1016       1003-1       20 in. Bicycle     5            ETC        Bikes Et Cetera
1016       1000-1       20 in. Bicycle     4            ETC        Bikes Et Cetera

What is wrong with this solution?
Partial Dependencies
◦ For example, the sales order number is not relevant in determining the item name; the item name depends only upon the item number
◦ Similarly, the customer code and customer name do not depend upon the item number; they depend only upon the sales order number
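The first-normal-form step and the partial-dependency check can be sketched in Python on a subset of the SALES_ORDERS rows. The helper names `fill_down` and `determines` are illustrative assumptions, not part of the course material. Note that a check over sample rows only shows the data is consistent with a dependency; the real dependencies come from the business rules.

```python
# Subset of the un-normalized SALES_ORDERS rows; None marks a blank SO_Number cell.
# Columns: SO_Number, Item_Number, Item_Name, Qty_Ordered, Cust_Code, Cust_Name
RAW = [
    (1010, "2010-0050", "Formed Handlebar", 2, "WHEEL", "Wheelaway Cycle Center"),
    (None, "1000-1", "20 in. Bicycle", 5, "WHEEL", "Wheelaway Cycle Center"),
    (1011, "1002-1", "24 in. Bicycle", 5, "ETC", "Bikes Et Cetera"),
    (None, "1001-1", "26 in. Bicycle", 10, "ETC", "Bikes Et Cetera"),
    (1013, "1001-1", "26 in. Bicycle", 50, "IBS", "Inter. Bicycle Sales"),
]


def fill_down(rows):
    """1NF step: replace each blank SO_Number with the last non-blank value above it."""
    filled, current = [], None
    for so_number, *rest in rows:
        if so_number is not None:
            current = so_number
        filled.append((current, *rest))
    return filled


def determines(rows, lhs, rhs):
    """True if, within these rows, the columns at indexes lhs fix the columns at rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows_1nf = fill_down(RAW)

# Partial dependencies: one component of the key (SO_Number, Item_Number) is enough.
item_name_partial = determines(rows_1nf, lhs=[1], rhs=[2])     # Item_Number -> Item_Name
customer_partial = determines(rows_1nf, lhs=[0], rhs=[4, 5])   # SO_Number -> customer fields

# Qty_Ordered is fixed by neither component alone, only by the full composite key.
qty_needs_full_key = (
    not determines(rows_1nf, lhs=[0], rhs=[3])
    and not determines(rows_1nf, lhs=[1], rhs=[3])
    and determines(rows_1nf, lhs=[0, 1], rhs=[3])
)

print(item_name_partial, customer_partial, qty_needs_full_key)
```

The two partial dependencies found here are what second normal form removes: each key component that determines attributes on its own becomes the key of its own table.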
Second Normal Form
Eliminate Partial Dependencies
◦ Write each key component on a separate line, and then write the original (composite) key on the last line
◦ Each component will become the key in a new table
Identify the Dependent Attributes
◦ Determine which attributes are dependent on which other attributes

Creating 3 Tables
SO_Number → Cust_Code, Cust_Name
Item_Number → Item_Name
SO_Number, Item_Number → Qty_Ordered

2NF:

SALES_ORDER line item
SO_Number  Item_Number  Qty_Ordered
1010       2010-0050    2
1010       1000-1       5
1011       1002-1       5
1011       1001-1       10
1012       1003-1       5
1012       1001-1       10
1013       1001-1       50
1014       1003-1       25
1015       1003-1       25
1016       3961-1041    5
1016       3965-1050    50
1016       1003-1       5
1016       1000-1       4

INVENTORY_ITEMS
Item_Number  Item_Name
1000-1       20 in. Bicycle
1001-1       26 in. Bicycle
1002-1       24 in. Bicycle
1003-1       20 in. Bicycle
2010-0050    Formed Handlebar
3961-1041    Tire Tube, 26 in.
3965-1050    Spoke Reflector

SALES_ORDERS
SO_Number  Cust_Code  Cust_Name
1010       WHEEL      Wheelaway Cycle Center
1011       ETC        Bikes Et Cetera
1012       WHEEL      Wheelaway Cycle Center
1013       IBS        Inter. Bicycle Sales
1014       ETC        Bikes Et Cetera
1015       WHEEL      Wheelaway Cycle Center
1016       ETC        Bikes Et Cetera

What is wrong with this solution?
Transitive Dependencies
◦ Notice that the third column of the SALES_ORDERS table holds the customer name, which depends upon the customer code
◦ But the customer code is not the primary key

Third Normal Form
1. For every transitive dependency, write its determinant as a PK for a new table
2. Identify the attributes dependent on each determinant identified in Step 1 and identify the dependency
3. Remove the dependent attributes in transitive relationship(s) from each table that has such a transitive relationship

Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel

3NF:

SALES_ORDER line item
SO_Number  Item_Number  Qty_Ordered
1010       2010-0050    2
1010       1000-1       5
1011       1002-1       5
1011       1001-1       10
1012       1003-1       5
1012       1001-1       10
1013       1001-1       50
1014       1003-1       25
1015       1003-1       25
1016       3961-1041    5
1016       3965-1050    50
1016       1003-1       5
1016       1000-1       4

INVENTORY_ITEMS
Item_Number  Item_Name
1000-1       20 in. Bicycle
1001-1       26 in. Bicycle
1002-1       24 in. Bicycle
1003-1       20 in. Bicycle
2010-0050    Formed Handlebar
3961-1041    Tire Tube, 26 in.
3965-1050    Spoke Reflector

SALES_ORDERS
SO_Number  Cust_Code
1010       WHEEL
1011       ETC
1012       WHEEL
1013       IBS
1014       ETC
1015       WHEEL
1016       ETC

CUSTOMERS
Cust_Code  Cust_Name
ETC        Bikes Et Cetera
IBS        Inter. Bicycle Sales
WHEEL      Wheelaway Cycle Center

Denormalization
Creation of normalized relations is an important database design goal
Processing requirements should also be a goal
If tables are decomposed to conform to normalization requirements
◦ The number of database tables expands

Denormalization (continued)
Joining a larger number of tables takes additional disk input/output (I/O) operations and processing logic
◦ Reduces system speed
Conflicts among design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization
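The trade-off between the 3NF design and denormalization can be sketched with Python's built-in sqlite3 module. The lower-case table and column names are assumed spellings of the names in the slides, and the in-memory database is for illustration only. Rebuilding the original flat SALES_ORDERS view takes three joins, which is the extra processing that denormalization tries to avoid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE customers (cust_code TEXT PRIMARY KEY, cust_name TEXT NOT NULL);
CREATE TABLE inventory_items (item_number TEXT PRIMARY KEY, item_name TEXT NOT NULL);
CREATE TABLE sales_orders (
    so_number INTEGER PRIMARY KEY,
    cust_code TEXT NOT NULL REFERENCES customers(cust_code)
);
CREATE TABLE sales_order_line_items (
    so_number   INTEGER NOT NULL REFERENCES sales_orders(so_number),
    item_number TEXT    NOT NULL REFERENCES inventory_items(item_number),
    qty_ordered INTEGER NOT NULL,
    PRIMARY KEY (so_number, item_number)  -- composite key from the 1NF table
);
""")

# Each customer name and item name is stored exactly once.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [
    ("ETC", "Bikes Et Cetera"),
    ("IBS", "Inter. Bicycle Sales"),
    ("WHEEL", "Wheelaway Cycle Center"),
])
conn.executemany("INSERT INTO inventory_items VALUES (?, ?)", [
    ("1000-1", "20 in. Bicycle"), ("1001-1", "26 in. Bicycle"),
    ("1002-1", "24 in. Bicycle"), ("1003-1", "20 in. Bicycle"),
    ("2010-0050", "Formed Handlebar"), ("3961-1041", "Tire Tube, 26 in."),
    ("3965-1050", "Spoke Reflector"),
])
conn.executemany("INSERT INTO sales_orders VALUES (?, ?)", [
    (1010, "WHEEL"), (1011, "ETC"), (1012, "WHEEL"), (1013, "IBS"),
    (1014, "ETC"), (1015, "WHEEL"), (1016, "ETC"),
])
conn.executemany("INSERT INTO sales_order_line_items VALUES (?, ?, ?)", [
    (1010, "2010-0050", 2), (1010, "1000-1", 5), (1011, "1002-1", 5),
    (1011, "1001-1", 10), (1012, "1003-1", 5), (1012, "1001-1", 10),
    (1013, "1001-1", 50), (1014, "1003-1", 25), (1015, "1003-1", 25),
    (1016, "3961-1041", 5), (1016, "3965-1050", 50), (1016, "1003-1", 5),
    (1016, "1000-1", 4),
])

# Three joins are needed to reproduce the flat 1NF view of the data.
flat = conn.execute("""
    SELECT li.so_number, li.item_number, i.item_name,
           li.qty_ordered, o.cust_code, c.cust_name
    FROM sales_order_line_items AS li
    JOIN sales_orders    AS o ON o.so_number   = li.so_number
    JOIN inventory_items AS i ON i.item_number = li.item_number
    JOIN customers       AS c ON c.cust_code   = o.cust_code
    ORDER BY li.so_number, li.item_number
""").fetchall()

for row in flat:
    print(row)
```

In this layout a customer rename is a single-row update in `customers`, which is the update anomaly that normalization removes. A denormalized design would store `cust_name` on every order row to save the joins, at the cost of reintroducing that redundancy.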