Class Agenda (11/06, 11/08)
• Review HW #8 answers.
• Present normalization process.
• Enhance conceptual knowledge of database design.
• Improve practical skills of database design.
Approach this week
• Identify and define vocabulary for normalization.
• Discuss the characteristics of the three normal forms and the characteristics of a data model in third normal form.
• Use the normalization process to do the same database design as done intuitively last week.
• Compare results.

What is normalization?
• Normalization is a formal, process-oriented approach to data modeling.
• Normalization is the process of:
  • examining groups of data attributes;
  • splitting them into appropriate entities;
  • identifying the relationships between the entities; and
  • identifying appropriate primary and foreign keys.

Same old/same old
• Normalization should sound like what you have already done during database design.
• The ultimate goals of design have not changed; we are just going to go about it in a slightly different way.

Database Normalization
What will you know about database normalization?
• Define normalization.
• Know the vocabulary of normalization.
• Understand the process of normalization.
• Better understand the characteristics of an effective database design.
What will you be able to do?
• Be able to identify the characteristics of each normal form.
• Be able to tell whether or not a data model is in third normal form.
• Potentially be able to use normalization to assist you in the design of a database.

Normalization process
• Some refer to this as the “bottom-up” form of database design.
• Contrast with the more intuitive “top-down” approach we have been using.
• The results from the normalization process are stable, flexible entities.
• The results from the intuitive approach should be the same.

Two methods of applying normalization
1. Use it to help in designing a database.
  • Normalization starts with a single entity.
  • Normalization breaks that entity into a series of additional entities.
  • More entities are discovered and named during the process.
  • Entities are linked during the process.
2. Use it to validate the design of a database.
  • Identify entities from the meaning of the data.
  • Create conceptual and logical data models.
  • Apply the rules of normalization to ensure a stable, non-redundant design.

Vocabulary for normalization
• A “functional dependency” is a relationship between attributes in which one attribute or group of attributes determines the value of another.
• A “determinant” is an attribute or group of attributes that, once known, can determine the value of another attribute.

Examples of functional dependencies and determinants
• A social security number determines your name and address: SSN → name, address.
• A vehicle ID number determines the make and model of a car: VIN → make, model.
• Name and address are “functionally dependent” on SSN.
• SSN “determines” name and address.
• Functional dependency diagram format:
  CourseID → CourseName, CourseDescription, CourseCredits
  ZipCode → City, State
  PatientID, TreatmentDateTime → TestResults

Normalization process
• Normalization is accomplished in stages.
• A “normal form” is a state (level of completeness) of a data model.
• Unnormalized data: a data model that has not been normalized. It contains repeating groups and is not a stable model.
• Unnormalized data is essentially one entity. The system under analysis is categorized as a single entity.

Steps/forms/phases in normalization
• First normal form: Remove repeating groups.
• Second normal form: Remove partial functional dependencies.
• Third normal form: Remove transitive dependencies.

Unnormalized data for grade report exercise
What attributes might be needed that aren’t visible on the grade report?
• Semester
• Year
• Student Name
• Student Address
• Student City
• Student State
• Student Zip Code
• Student ID
• Student College
• Student Major
• Student Minor
• Student Year
• Course ID
• Course Title
• Course Instructor
• Course Credits
• Grade
Group all attributes in one “big” entity.
Identify a primary key for the entity, perhaps StudentID for this one.

First normal form
• First normal form: Remove repeating groups.
• A repeating group is an attribute or group of attributes that can have more than one value for an instance of an entity. If it is a single attribute, we have been calling it a “multi-valued” attribute.
• To get a data model into first normal form:
  • Identify repeating groups and place them as separate entities in the model.
  • Identify a primary key for the repeating group. The key may be concatenated.
  • Create the relationships between entities.
  • Divide m:n relationships with appropriate intersection entities.

Second normal form
• Second normal form: Remove partial functional dependencies.
• A partial functional dependency is a situation in which one or more non-key attributes are functionally dependent on part, but not all, of the primary key.
• Partial functional dependencies occur only with concatenated keys.
• Examples of partial functional dependencies:
  PatientID, TreatmentDateTime → PatName, TstResults, TrtID, LocID
  CourseID, StudentID → CourseTitle, Grade
• Which entities developed during the transition to first normal form for the grade report have concatenated keys?

Third normal form
• Third normal form: Remove transitive dependencies.
• A transitive dependency occurs when a non-key attribute is functionally dependent on one or more non-key attributes.
• Third normal form examines entities with single primary keys and removes the “floating” or transitive dependencies.
• Some attributes may be determined by other non-key attributes rather than by the primary key. They must be moved into separate entities with appropriate primary keys.
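
The definitions of partial and transitive dependencies can be tried out against sample rows. The following is a minimal Python sketch, not part of the course material: the helper name `determines` is hypothetical, the attribute names reuse the abbreviations from these slides, and the row values are invented purely for illustration.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these sample rows, equal values for the lhs
    attributes always come with equal values for the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later row with a different value violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented sample rows for a patient treatment entity (illustration only).
rows = [
    {"PatID": 1, "TrtDateTime": "day 1 09:00", "TstResults": "normal",
     "TrtID": "T1", "TrtType": "X-ray", "LocID": "L1", "LocName": "Radiology"},
    {"PatID": 1, "TrtDateTime": "day 2 10:00", "TstResults": "high",
     "TrtID": "T2", "TrtType": "Blood test", "LocID": "L2", "LocName": "Laboratory"},
    {"PatID": 2, "TrtDateTime": "day 1 09:00", "TstResults": "normal",
     "TrtID": "T1", "TrtType": "X-ray", "LocID": "L1", "LocName": "Radiology"},
]

# The whole concatenated key determines the test results.
full_key_ok = determines(rows, ["PatID", "TrtDateTime"], ["TstResults"])

# Part of the key alone does not, so TstResults is not partially dependent.
partial = determines(rows, ["PatID"], ["TstResults"])

# Non-key attributes determining other non-key attributes: transitive dependencies.
trt_transitive = determines(rows, ["TrtID"], ["TrtType"])
loc_transitive = determines(rows, ["LocID"], ["LocName"])
```

A check like this can only refute a dependency. Sample rows that happen to agree do not prove the dependency holds in general; that still has to come from the meaning of the data.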
• Example of transitive dependencies:
  PatID, TrtDateTime → TstResults, TrtType, TrtDescription, LocName, TrtID, LocID
  Here TrtType and TrtDescription depend on TrtID, and LocName depends on LocID, rather than on the primary key.

Summary of normalization process
• Examine and evaluate the logical data model for effectiveness.
• Find the repeating groups and put the model into first normal form.
  • Identify primary key fields for any new entities.
  • Relate entities with foreign keys.
• Find the functional dependencies.
• Identify the partial functional dependencies and put the model into second normal form.
  • Identify primary key fields for any new entities.
  • Relate entities with foreign keys.
• Find the transitive dependencies and put the model into third normal form.
  • Identify primary key fields for any new entities.
  • Relate entities with foreign keys.

Goal of normalization
• A set of entities where each attribute in each entity is dependent on the primary key, the whole primary key, and nothing but the primary key!
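
As a closing illustration, the grade report attributes could end up in tables like the ones below. This is a sketch of one possible third normal form result, built with Python's standard `sqlite3` module. It is not the course's official solution: the table names, column types, and the choice of Enrollment key are assumptions, several attributes (college, major, minor, instructor) are left out for brevity, and the inserted values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- ZipCode -> City, State is split out to remove a transitive dependency.
CREATE TABLE ZipCode (
    ZipCode TEXT PRIMARY KEY,
    City    TEXT NOT NULL,
    State   TEXT NOT NULL
);
CREATE TABLE Student (
    StudentID      INTEGER PRIMARY KEY,
    StudentName    TEXT NOT NULL,
    StudentAddress TEXT,
    ZipCode        TEXT REFERENCES ZipCode(ZipCode)
);
CREATE TABLE Course (
    CourseID      TEXT PRIMARY KEY,
    CourseTitle   TEXT NOT NULL,
    CourseCredits INTEGER NOT NULL
);
-- Intersection entity for the m:n relationship between Student and Course.
-- Grade depends on the whole concatenated key.
CREATE TABLE Enrollment (
    StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
    CourseID  TEXT    NOT NULL REFERENCES Course(CourseID),
    Semester  TEXT    NOT NULL,
    Year      INTEGER NOT NULL,
    Grade     TEXT,
    PRIMARY KEY (StudentID, CourseID, Semester, Year)
);
""")

# Placeholder rows (invented values, for illustration only).
conn.execute("INSERT INTO ZipCode VALUES ('00000', 'Sample City', 'ST')")
conn.execute("INSERT INTO Student VALUES (1, 'Sample Student', '1 Example St', '00000')")
conn.execute("INSERT INTO Course VALUES ('C101', 'Sample Course', 3)")
conn.execute("INSERT INTO Enrollment VALUES (1, 'C101', 'Fall', 2000, 'A')")

# Joining the entities back together reproduces one line of the grade report.
report = conn.execute("""
    SELECT s.StudentName, z.City, c.CourseTitle, c.CourseCredits, e.Grade
    FROM Enrollment e
    JOIN Student s ON s.StudentID = e.StudentID
    JOIN ZipCode z ON z.ZipCode = s.ZipCode
    JOIN Course  c ON c.CourseID = e.CourseID
""").fetchall()
```

Each fact is stored once: a course title lives only in Course and a city only in ZipCode, while the original report is still recoverable through the foreign keys.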