Agenda

i.   Data Anomalies (problems with un-normalized data)
ii.  Writing a relation from a User View
iii. Writing a relation from a verbal or written description
iv.  The first step of Normalization: Eliminating Repeating Groups

Definition: Normalization is the process of assigning attributes to relations in such a way that data redundancies are reduced or eliminated.

User Views can be individual descriptions, reports, forms, or lists of data that are required to support the operations of a particular database user.

How do we Normalize?

We will Normalize our data records in three steps, producing flexible and powerful data structures that are free of redundancies.

a) 1st Normal Form: Eliminate Repeating Groups
b) 2nd Normal Form: Eliminate Partial Dependencies
c) 3rd Normal Form: Eliminate Transitive Dependencies

1. Data Anomalies (problems with un-normalized data)

Problem #1 - Data Redundancy: Multiple Updates
Problem #2 - Update Anomaly: Possible Inconsistent Data
Problem #3 - Insertion Anomaly: No Place to Hold New Information
Problem #4 - Deletion Anomaly: Loss of Information that we wanted to keep

Problem #1 - Multiple Updates: The need to perform the same update in several locations of the database because the same data is repeated.

(Ex) Student(Student-Num, Course, Student-Name, Teacher, Student-Age)

     Student-Num  Course   Student-Name  Teacher    Student-Age
     1243658712   History  Tom Blu       Ms.Green   12
     2343216578   Java     Jill Fall     Mr.Brown   13
     3214325436   History  Jack Pail     Ms.Green   12

If Ms.Green is replaced by Ms.White, we will have to make more than one change to the database.

Problem #2 - Inconsistent Data: When the same data is repeated in several records, the copies can be inconsistent. In the example below, which spelling is correct: Ms. Green or Ms. Greene?

(Ex) Student(Student-Num, Course, Student-Name, Teacher, Student-Age)

     Student-Num  Course   Student-Name  Teacher    Student-Age
     1243658712   History  Tom Blu       Ms.Greene  12
     2343216578   Java     Jill Fall     Mr.Brown   13
     3214325436   History  Jack Pail     Ms.Green   12

Problem #3 - No Place to Hold New Information: Let us say we have just hired a new teacher: Mr.Vert. We have no way to put him into the database, as he has no students yet.

(Ex) Student(Student-Num, Student-Name, Teacher, Student-Age)

     Student-Num  Student-Name  Teacher    Student-Age
     1243658712   Tom Blu       Ms.Greene  14
     2343216578   Jill Fall     Mr.Brown   14
     3214325436   Jack Pail     Ms.Green   14

     (No place for Mr.Vert)

Problem #4 - Loss of Information: If these students go to high school and we remove the student records, then we will lose the information about the teachers as well.

(Ex) Student(Student-Num, Student-Name, Teacher, Student-Age)

     Student-Num  Student-Name  Teacher    Student-Age
     1243658712   Tom Blu       Ms.Greene  14
     2343216578   Jill Fall     Mr.Brown   14
     3214325436   Jack Pail     Ms.Green   14

2. Writing a relation from a User View

CLASS LISTS FOR 2004-1

     Course/Sec  TeachID  Teacher      StudentID  StudentName
     DBS201I     1199     Don Frey     061234978  Ju-jin Lee
                                       045342973  Pui-Ling Chan
                                       044511982  Cheryl Anderson
                                       075435973  Buu Tu
                                       …          …
     OOP244Q     1204     Mort Moreau  067452397  Julie Rivieres
                                       …          …

a) List attributes
b) Show repeating groups
c) Select primary key (unique identifier for a row)
d) Give the table a name

a) List attributes
   Course, Section, TeachID, Teacher, StudentID, StudentName

b) Show repeating groups
   Course, Section, TeachID, Teacher, (StudentID, StudentName)

c) Select primary key (unique identifier for a row)
   Course, Section, TeachID, Teacher, (StudentID, StudentName)
   Primary key: Course, Section

d) Give the table a name
   CLASSLIST(Course, Section, TeachID, Teacher, (StudentID, StudentName))

3. Writing a relation from a verbal or written description

Write the DBDL for the following description:

Each dentist's office has a unique identifier for insurance companies. There is a mailing address for the office as well as the name of the head dentist. There are many patients and each patient has a unique identifier number.

a) List attributes
   OfficeNo, MailAddress, HeadDentist, PatientNo, PatientName

b) Show repeating groups
   OfficeNo, MailAddress, HeadDentist, (PatientNo, PatientName)

c) Select primary key (unique identifier for a row)
   OfficeNo, MailAddress, HeadDentist, (PatientNo, PatientName)
   Primary key: OfficeNo

d) Give the table a name
   DENTISTOFFICE(OfficeNo, MailAddress, HeadDentist, (PatientNo, PatientName))

We call this 0NF or UNF (Unnormalized Form) because there are repeating groups.

4. The first step of Normalization: Eliminating Repeating Groups

1st Normal Form: How to eliminate repeating groups.

Normalize the 0NF relations to 1NF by:

1) Selecting the primary key for the repeating group.
2) Removing the repeating group from the relation.
3) Making the primary key of the repeating group the PK of the outside table plus the key of the inside table.
4) Keeping the original relation (without the repeating group).
5) Writing the two relations.

(Ex)
     DBS201J  1199  Don Frey  061234978  Ju-jin Lee
                              045342973  Pui-Ling Chan
                              044511982  Cheryl Anderson
                              075435973  Buu Tu
                              etc...

Our class would have as a record layout:

Class(Course Code, Section, TeacherID, TName, (Student ID, SName))

Step 1:    (Student ID, SName)
Step 2,3:  (Course Code, Section, Student ID, SName)
Step 4:    (Course Code, Section, TeacherID, TName)
Step 5:    CLASSLIST(Course Code, Section, Student ID, SName)
           COURSE(Course Code, Section, TeacherID, TName)

So we get two tables after Normalizing to 1NF (First Normal Form).

Selecting the Best Primary Key: Which is the best primary key from the fields of the repeating group? _____StudentID________

CLASSLIST Table

     DBS201  J  061234978  Ju-jin Lee
     DBS201  J  045342973  Pui-Ling Chan
     DBS201  J  044511982  Cheryl Anderson
     DBS201  J  075435973  Buu Tu

COURSE Table

     DBS201  J  1199  Don Frey
     DBS201  K  1201  Patricia Belvedere

What is the key of record 3 in the CLASSLIST table? ______________________

What is the key of record 2 in the COURSE table? _________________________

(Exercise) Convert the following un-normalized records to 1st Normal Form.

Purchases at Shoppers Drug Mart, 1111 Young Street, Toronto are identified by a unique purchase # on the bill. There can be several items, and the purchase must record the item #, the quantity, the unit price, a tax code for each item, and the total price.
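
As a companion to the worked class-list example in section 4 (not a solution to the exercise), the five-step 1NF procedure can be sketched in Python. This is only an illustration under stated assumptions: the dictionary layout, the lowercase field names, and the function name to_first_normal_form are made up for this sketch, and the data is the DBS201 section J record shown earlier.

```python
# Unnormalized (0NF) record: one class with a repeating group of students.
# The dict/list layout and field names are assumptions made for this sketch;
# they mirror Class(Course Code, Section, TeacherID, TName, (Student ID, SName)).
unnormalized = [
    {
        "course_code": "DBS201",
        "section": "J",
        "teacher_id": "1199",
        "tname": "Don Frey",
        "students": [  # the repeating group (Student ID, SName)
            ("061234978", "Ju-jin Lee"),
            ("045342973", "Pui-Ling Chan"),
            ("044511982", "Cheryl Anderson"),
            ("075435973", "Buu Tu"),
        ],
    },
]


def to_first_normal_form(records):
    """Split 0NF class records into COURSE rows and CLASSLIST rows."""
    course_rows = []
    classlist_rows = []
    for rec in records:
        # Primary key of the outside relation.
        outer_key = (rec["course_code"], rec["section"])

        # Step 4: the original relation remains, without the repeating group.
        course_rows.append(outer_key + (rec["teacher_id"], rec["tname"]))

        # Steps 1-3: remove the repeating group and give each of its rows
        # the outside key plus its own key (the student ID).
        for student_id, sname in rec["students"]:
            classlist_rows.append(outer_key + (student_id, sname))

    # Step 5: the two resulting relations.
    return course_rows, classlist_rows


course, classlist = to_first_normal_form(unnormalized)

# In 1NF the composite key (course code, section, student ID) should
# identify each CLASSLIST row uniquely.
classlist_keys = {row[:3] for row in classlist}
keys_are_unique = len(classlist_keys) == len(classlist)
```

Each CLASSLIST row now repeats the course code and section, which is what lets the student rows stand in their own table while still pointing back to the matching COURSE row.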