Database Design – Lecture 8 Normalization Lecture Objectives Review basic rules for converting Entities to Tables Review how to address Relationships between entities Overview of Normalization 2 Converting ERD into Relational Database Model Entities to Tables 1. 2. 3. 4. 5. Create a separate table for each entity and consider all the attributes you defined in ERD as the table columns Define a unique identifier as the table PK. Could be an existing attribute that can be used for PK otherwise create a generic PK. Do not include derived attributes in the tables. Ensure all attributes are atomic. Create a new table to support multi-valued attributes. 3 Converting ERD into Relational Database Model Relationships 1. 2. For 1:M connectivity, include the PK of the ’1’ as an FK attribute in the ‘M’ table. Add additional attributes to the ‘M’ table, as required. For M:N connectivity, create a bridge table. Include the PKs from both entities as a composite PK in the bridge table. These attributes will also be FKs in the bridge table. Add additional attributes to the bridge table, as required. 4 Converting ERD into Relational Database Model Relationships 3. 4. For Strong-Weak relationship, include PK of strong entity as part of the PK of the weak entity, thus giving it a composite primary key. It will also be an FK in the weak entity. For Supertype/Subtype relationships, create a new entity. Attributes unique to the subtype are moved to the new table and are removed from the supertype entity. Both entities will have the same primary keys. 5 Converting ERD into Relational Database Model Relationships 5. For recursive relationships (1:1, 1:M), add additional attributes as required. For M:N recursive relationship, create a new entity with a composite primary key. Create a primary key for the new entity and make a composite primary key using it and the primary key from the ‘1’ entity. The primary key from the ‘1’ will also be a FK in the new entity. 6 Normalization Normalization can be used in two ways: As a process to validate the table structures created through the conversion of entity relationship diagrams to relational tables As a process to create entities from table structures created by user views 7 Normalization Normalization is a process for assigning attributes to entities Reduces data redundancies Helps eliminate data anomalies Produces controlled redundancies to link tables Works through a series of stages called normal forms Works with views of data 8 Database Tables and Normalization Normalization stages 1NF - First normal form 2NF - Second normal form Eliminate partial dependencies 3NF - Third normal form Eliminate repeating groups Identify PK result is functional dependency Eliminate transitive dependencies BCNF – Boyce-Codd normal form Eliminate dependencies whereby a non-key attribute can identify a key attribute. 9 Need for Normalization possible primary keys? • Table entries have data inconsistencies (note blank entries) • Table displays data anomalies: • Update - Modifying JOB_CLASS • Insertion - New employee must be assigned project • Deletion – If an employee is deleted, other vital data lost Figure 5.2 10 Conversion to 1NF Repeating groups must be eliminated Proper primary key developed Uniquely identifies attribute values - rows (functional dependency) 11 Data Organization: 1NF Figure 5.2 Eliminate repeating groups – populate every cell of the ‘table’. 12 Conversion to 1NF Identify dependencies Desirable dependencies based on primary key Less desirable dependencies Partial (later) based on part of composite primary key Transitive (later) one nonprime attribute depends on another nonprime attribute 13 Dependency Diagram (1NF) First Normal Form: identify dependencies Partial Dependency: attributes dependent on part of the primary key A B C d e f Transitive Dependency (non-key attribute depends on another non-key attribute) Note: Capital letters refer to primary key, lower case letters refer to attributes. 14 Dependency Diagram (1NF) First Normal Form: identify all dependencies Partial Dependency: attributes dependent on part of the primary key A B C Functional Dependency: A, B, C, d, e, f Partial Dependency: A, d, e Transitive Dependency: d, f d e f Transitive Dependency (non-key attribute depends on another non-key attribute) 15 Data Organization: 1NF Figure 5.2 Eliminate repeating groups – populate every cell of the ‘table’. 16 Dependency Diagram (1NF) PROJ_NUM EMP_NUM PROJ_NAME EMP_NAME JOB_CLASS CHG_HOUR Functional Dependency: each attribute is uniquely identified by, or is dependent on the primary key. Primary key is a composite primary key and is made up of PROJ_NUM and EMP_NUM HOURS EMPLOYEE_PROJECT (PROJ_NUM (pk), EMP_NUM (pk), PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) 17 1NF Summarized All key attributes defined No repeating groups in table All attributes dependent on primary key (functional dependency) EMPLOYEE_PROJECT (PROJ_NUM (pk), EMP_NUM (pk), PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) 18 Dependency Diagram Continued 1NF 2NF PROJ_NUM EMP_NUM PROJ_NAME EMP_NAME JOB_CLASS CHG_HOUR HOURS 19 Conversion to 2NF Start with 1NF format: Write each key component on separate line Write original key on last line Each component is new table Write dependent attributes after each key PROJECT (PROJ_NUM (pk), PROJ_NAME) EMPLOYEE (EMP_NUM (pk), EMP_NAME, JOB_CLASS, CHG_HOUR) EMPLOYEE_PROJECT (PROJ_NUM (pk, fk), EMP_NUM (pk, fk), HOURS) Attribute of hours is dependent on composite primary key 20 2NF Summarized In 1NF Includes no partial dependencies Attributes dependent on a portion of primary key Still possible to exhibit transitive dependency Attributes may be functionally dependent on nonkey attributes 21 Dependency Diagram Continued 1NF 2NF 3NF PROJ_NUM EMP_NUM PROJ_NAME EMP_NAME JOB_CLASS CHG_HOUR HOURS 22 Conversion to 3NF Create separate tables to eliminate transitive functional dependencies Identify any additional attributes needed in new table JOB (JOB_CLASS (pk), JOB_DESCRIPTION, CHG_HOUR) PROJECT (PROJ_NUM (pk), PROJ_NAME) New attribute EMPLOYEE (EMP_NUM (pk), EMP_NAME, JOB_CLASS (fk)) EMPLOYEE_PROJECT (PROJ_NUM (pk, fk), EMP_NUM (pk, fk), HOURS) 23 3NF Summarized In 2NF Contains no transitive dependencies 24 Normalization and Database Design Normalization should be part of the design process E-R Diagram provides macro view Normalization provides micro view of entities Focuses on characteristics of specific entities May yield additional entities Difficult to separate normalization from E-R diagramming Business rules must be determined 25 Normalization Steps 1NF All key attributes are defined No repeating groups All attributes are functionally dependent on the primary key 2NF Table is in 1NF No partial dependencies 26 Normalization Steps 3NF Table is in 2NF No transitive dependencies New PKs identified where appropriate 27