MODULE 5: Normalization of Database Tables

Intended Learning Outcomes

At the end of this module, you are expected to be able to:
1. Apply knowledge of normalization in the database design process.
2. Normalize database tables through 1NF, 2NF, and 3NF.
3. Convert 3NF relations into an ER diagram.

Introduction

When we design a database for an enterprise, the main objective is to create an accurate representation of the data, the relationships between the data, and the constraints on the data that are pertinent to the enterprise. To help achieve this objective, we can use one or more database design techniques. In the previous module we described a technique called ER modeling. In this module and the next we describe another database design technique called normalization.

Normalization is a database design technique that begins by examining the relationships (called functional dependencies) between attributes. Attributes describe some property of the data, or of the relationships between the data, that is important to the enterprise. Normalization uses a series of tests (described as normal forms) to help identify the optimal grouping for these attributes and, ultimately, a set of suitable relations that supports the data requirements of the enterprise.

Discussions

◗ Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise. Normalization is a formal method that can be used to identify relations based on their keys and the functional dependencies among their attributes.
◗ Relations with data redundancy suffer from update anomalies, which can be classified as insertion, deletion, and modification anomalies.
◗ One of the main concepts associated with normalization is functional dependency, which describes the relationship between attributes in a relation.
For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
◗ The determinant of a functional dependency is the attribute, or group of attributes, on the left-hand side of the arrow.
◗ The functional dependencies that we use for normalization have three main characteristics: they have a one-to-one relationship between the attribute(s) on the left-hand and right-hand sides of the dependency, they hold for all time, and they are fully functionally dependent.
◗ Unnormalized Form (UNF) is a table that contains one or more repeating groups.
◗ First Normal Form (1NF) is a relation in which the intersection of each row and column contains one and only one value.
◗ Second Normal Form (2NF) is a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key. Full functional dependency means that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.
◗ Third Normal Form (3NF) is a relation that is in first and second normal form and in which no non-primary-key attribute is transitively dependent on the primary key. Transitive dependency is a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
◗ The general definition of Second Normal Form (2NF) is a relation that is in first normal form and in which every non-candidate-key attribute is fully functionally dependent on any candidate key. In this definition, a candidate-key attribute is part of any candidate key.
◗ The general definition of Third Normal Form (3NF) is a relation that is in first and second normal form and in which no non-candidate-key attribute is transitively dependent on any candidate key.
In this definition, a candidate-key attribute is part of any candidate key.

Review Questions 5.1

1. What is normalization?
Normalization is the process of organizing the data in the database and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies.

2. When is a table in 1NF?
◗ All of the key attributes are defined.
◗ There are no repeating groups in the table.
◗ All attributes are dependent on the primary key.

3. When is a table in 2NF?
When it is in 1NF and contains no partial dependencies; that is, every non-key attribute is fully functionally dependent on the entire primary key rather than on only a portion of it.

4. When is a table in 3NF?
When it is in 2NF and contains no transitive dependencies.

5. When is a table in BCNF?
When every determinant in the table is a candidate key.

6. Given the dependency diagram shown in Figure 5.1, answer Items 6a to 6c.

[Figure 5.1: dependency diagram over the attributes C1, C2, C3, C4, and C5]

a. Identify and discuss each of the indicated dependencies.
C1 → C2 represents a partial dependency, because C2 depends only on C1 rather than on the entire primary key, which is composed of C1 and C3.
C4 → C5 represents a transitive dependency, because C5 depends on C4, an attribute that is not part of the primary key.
C1, C3 → C2, C4, C5 represents a functional dependency, because C2, C4, and C5 depend on the primary key composed of C1 and C3.

b. Create a database whose tables are at least in 2NF, showing the dependency diagrams for each table.
Table 1
Dependencies: C1 → C2
Primary key: C1
Foreign key: None
Normal form: 3NF
Table 2
Dependencies: C1, C3 → C4, C5 and C4 → C5
Primary key: C1 + C3
Foreign key: C1 to Table 1
Normal form: 2NF, because the table still contains the transitive dependency C4 → C5

c. Create a database whose tables are at least in 3NF, showing the dependency diagrams for each table.
Table 1
Dependencies: C1 → C2
Primary key: C1
Foreign key: None
Normal form: 3NF
Table 2
Dependencies: C1, C3 → C4
Primary key: C1 + C3
Foreign key: C1 to Table 1 and C4 to Table 3
Normal form: 3NF
Table 3
Dependencies: C4 → C5
Primary key: C4
Foreign key: None
Normal form: 3NF

7. What is a partial dependency? With what normal form is it associated?
A partial dependency exists when there is a functional dependence in which the determinant is only part of the primary key. Partial dependencies are associated with 2NF: a table in 1NF reaches 2NF once its partial dependencies are removed.

8. What three data anomalies are likely to be the result of data redundancy? How can such anomalies be eliminated?
Tables with redundant data can suffer from insertion, update, and deletion anomalies. Normalizing the table structure reduces data redundancy: splitting a table so that each resulting table holds one related group of information lowers the probability of encountering redundant data and the anomalies it causes.

9. Define and discuss the concept of transitive dependency.
A transitive dependency can occur only when a functional dependence exists among nonprime attributes. Suppose that A → B and B → C, where A is the primary key. Then A → C is a transitive dependency, because A determines the value of C via B.

10. Why is a table whose primary key consists of a single attribute automatically in 2NF when it is in 1NF?
A partial dependency exists when there is a functional dependence in which the determinant is only part of the primary key. Therefore, if the primary key is a single attribute, there can be no partial dependencies.

11. A table is in third normal form (3NF) when it is in second normal form (2NF) and there are no transitive dependencies.

Practice Exercise 5.1

5. Using the STUDENT table structure shown in Table 5.2, write the relational schema and draw its dependency diagram. Identify all dependencies, including all transitive dependencies.
Table 5.2

ATTRIBUTE_NAME  SAMPLE_VALUE    SAMPLE_VALUE    SAMPLE_VALUE    SAMPLE_VALUE    SAMPLE_VALUE
StudID          211343          200128          199876          199876          223456
StudLName       Stephanos       Smith           Jones           Ortiz           McKulski
StudMajor       Accounting      Accounting      Marketing       Marketing       Statistics
DeptCode        ACCT            ACCT            MKTG            MKTG            MATH
DeptName        Accounting      Accounting      Marketing       Marketing       Mathematics
DeptPhone       4356            4356            4378            4378            3420
CollegeName     Business Admin  Business Admin  Business Admin  Business Admin  Arts and Sciences
AdvisorLName    Grastand        Grastand        Gentry          Tillery         Chen
AdvisorOffice   T201            T201            T228            T356            J331
AdvisorBldg     Torre Building  Torre Building  Torre Building  Torre Building  Jones Building
AdvisorPhone    2115            2115            2123            2159            3209
StudGPA         3.87            2.78            2.31            3.45            3.58
StudHours       75              45              117             113             87
StudClass       Junior          Sophomore       Senior          Senior          Junior

6. Using the answer to Problem 5, write the relational schema and draw the dependency diagram to meet the 3NF requirements to the greatest practical extent possible. If you believe that practical considerations dictate using a 2NF structure, explain why your decision to retain 2NF is appropriate. If necessary, add or modify attributes to create appropriate determinants and to adhere to the naming conventions.

7. Using the results of Problem 6, draw the Crow's Foot ERD.
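
When identifying dependencies in Table 5.2, it can help to test a candidate dependency against the sample rows using the definition given in the Discussions: B is functionally dependent on A if each value of A is associated with exactly one value of B. The following is a minimal Python sketch of that test, not part of the module itself. The helper name fd_holds and the names COLUMNS, DATA, and ROWS are assumptions introduced here for illustration, and only six of the table's attributes are copied in to keep the example short.

```python
# Six attributes copied from the Table 5.2 sample rows (a subset, for brevity).
COLUMNS = ["StudLName", "DeptCode", "DeptName", "DeptPhone",
           "AdvisorLName", "AdvisorOffice"]
DATA = [
    ("Stephanos", "ACCT", "Accounting", "4356", "Grastand", "T201"),
    ("Smith", "ACCT", "Accounting", "4356", "Grastand", "T201"),
    ("Jones", "MKTG", "Marketing", "4378", "Gentry", "T228"),
    ("Ortiz", "MKTG", "Marketing", "4378", "Tillery", "T356"),
    ("McKulski", "MATH", "Mathematics", "3420", "Chen", "J331"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]


def fd_holds(rows, determinant, dependent):
    """Return False if two rows agree on `determinant` but differ on `dependent`.

    Hypothetical helper: `determinant` and `dependent` are lists of
    attribute names, so composite determinants are supported.
    """
    seen = {}
    for row in rows:
        left = tuple(row[name] for name in determinant)
        right = tuple(row[name] for name in dependent)
        # Remember the first right-hand value seen for each left-hand value;
        # a later, different value means the dependency is violated.
        if seen.setdefault(left, right) != right:
            return False
    return True


print(fd_holds(ROWS, ["DeptCode"], ["DeptName", "DeptPhone"]))  # True
print(fd_holds(ROWS, ["AdvisorLName"], ["AdvisorOffice"]))      # True
print(fd_holds(ROWS, ["DeptCode"], ["AdvisorLName"]))           # False
```

A True result means only that the sample rows do not contradict the dependency. Whether the dependency holds for all time is a judgment about the enterprise's rules, which sample data cannot settle; a False result, however, is enough to rule a candidate dependency out.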