Objectives
In this lesson, you will learn to:
- Describe data redundancy
- Describe the first, second, and third normal forms
- Describe the Boyce-Codd Normal Form (BCNF)
- Appreciate the need for denormalization

RDBMS Concepts/ Session 3 / 1 of 22

Normalization
- The logical design of the database, including the tables and the relationships between them, is the core of an optimized relational database.
- A good logical database design can lay the foundation for optimal database and application performance.
- A poor logical database design can impair the performance of the entire system.
- Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables.
- A greater number of narrow tables (with fewer columns) is characteristic of a normalized database.
- A few wide tables (with more columns) are characteristic of a non-normalized database.

Understanding Data Redundancy
- Redundancy means repetition of data.
- Redundancy increases the time involved in updating, adding, and deleting data.
- It also increases the utilization of disk space, and hence disk I/O.

Understanding Data Redundancy (Contd.)
Redundancy can lead to the following problems:
- Update anomalies: inserting, modifying, and deleting data may cause inconsistencies
- Inconsistencies: errors are more likely to occur when facts are repeated
- Unnecessary utilization of extra disk space

Definition of Normalization
- Normalization is a scientific method of breaking down complex table structures into simple table structures by using certain rules.
- It allows you to reduce redundancy in a table and eliminate the problems of inconsistency and excess disk space usage.
- Normalization results in the formation of tables that satisfy certain specified rules and represent certain normal forms.

Normal Forms
The most important and widely used normal forms are:
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)

First Normal Form (1NF)
A table is said to be in 1NF when each cell of the table contains precisely one value.

Functional Dependency
- Normalization theory is based on the fundamental notion of functional dependency.
- Given a relation R, attribute A is functionally dependent on attribute B if each value of B in R is associated with precisely one value of A.

Unnormalized Data
- Employee No
- Employee Name
- Branch Code
- Branch Name
- Branch Location
- Certification ID 1...n
- Certification Name 1...n
- Certification done at
- Marks obtained

Rule 1
Eliminate repeating groups: make a separate table for each set of repeated attributes and give each table a primary key.
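
Rule 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample employees, the field names, and the helpers `to_first_normal_form` and `functionally_determines` are all hypothetical and do not come from the lesson. The first helper moves the repeating certification group into its own set of rows; the second checks a functional dependency on sample rows, that is, whether each value of the determining attribute is associated with exactly one value of the dependent attribute.

```python
# Hypothetical unnormalized records: each employee carries a repeating
# group of certifications (the "1...n" attributes). All values are made up.
unnormalized = [
    {
        "employee_no": 1, "employee_name": "Asha",
        "branch_code": "B1", "branch_name": "Central", "branch_location": "Pune",
        "certifications": [
            {"certification_id": "C1", "certification_name": "SQL Basics",
             "done_at": "Institute A", "marks": 82},
            {"certification_id": "C2", "certification_name": "Data Modeling",
             "done_at": "Institute B", "marks": 75},
        ],
    },
    {
        "employee_no": 2, "employee_name": "Ravi",
        "branch_code": "B2", "branch_name": "North", "branch_location": "Delhi",
        "certifications": [
            {"certification_id": "C1", "certification_name": "SQL Basics",
             "done_at": "Institute A", "marks": 90},
        ],
    },
]


def to_first_normal_form(records):
    """Split the repeating group into its own rows (Rule 1)."""
    employees = []
    employee_certifications = []
    for record in records:
        # Everything except the repeating group stays with the employee.
        employees.append(
            {k: v for k, v in record.items() if k != "certifications"}
        )
        # One row per certification, keyed by (employee_no, certification_id).
        for cert in record["certifications"]:
            employee_certifications.append(
                {"employee_no": record["employee_no"], **cert}
            )
    return employees, employee_certifications


def functionally_determines(rows, determinant, dependent):
    """True if each value of `determinant` maps to exactly one `dependent` value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[determinant], row[dependent]) != row[dependent]:
            return False
    return True


employees, employee_certifications = to_first_normal_form(unnormalized)

# The certification name depends on the certification ID alone ...
print(functionally_determines(
    employee_certifications, "certification_id", "certification_name"))  # True
# ... but not on the employee number.
print(functionally_determines(
    employee_certifications, "employee_no", "certification_name"))  # False
```

A check like this can only disprove a dependency on the rows it sees; whether a dependency really holds is a design decision about the data, not something sample rows can prove.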

1NF
Table 1: Employee No, Employee Name, Branch Code, Branch Name, Branch Location
Table 2: Employee No, Certification ID, Certification Name, Certification done at, Marks obtained

Second Normal Form (2NF)
A table is said to be in 2NF when it is in 1NF and every non-key attribute in the row is functionally dependent upon the whole key, and not just part of the key.
To ensure that a table is in 2NF, you should:
- Find and remove attributes that are functionally dependent on only a part of the key and not on the whole key, and place them in a different table
- Group the remaining attributes

Rule 2
Eliminate redundant data: if an attribute depends on only part of a composite key, move it to a separate table.
The Certification Name appears redundantly. It depends only on Certification ID, which is just one part of the composite key (Employee No, Certification ID).

2NF
Employee: Employee No, Employee Name, Branch Code, Branch Name, Branch Location
Emp Certifications: Certification ID, Employee No, Certification done at, Marks obtained
Certification: Certification ID, Name

Third Normal Form (3NF)
A relation is said to be in 3NF when it is in 2NF and every non-key attribute is functionally dependent only on the primary key.
To ensure that a table is in 3NF, you should:
- Find and remove non-key attributes that are functionally dependent on attributes that are not the primary key, and place them in a different table
- Group the remaining attributes

Rule 3
Eliminate columns not dependent on the key.
The Employee table satisfies the first and second normal forms. But its key is Employee No, and Branch Name and Branch Location describe only a branch, not an employee.

3NF
Employee: Employee No, Name, Branch Code
Branch: Branch Code, Branch Name, Location
Certification: Cert. ID, Cert. Name
Emp Certification: Emp No, Cert. ID, Cert. Done at, Marks obtained

Boyce-Codd Normal Form
- The original definition of 3NF was inadequate in some situations.
- It was not satisfactory for tables:
  - that had multiple candidate keys
  - where the multiple candidate keys were composite
  - where the multiple candidate keys overlapped
- Therefore, a new normal form, the Boyce-Codd Normal Form (BCNF), was introduced.
- A relation is in BCNF if and only if every determinant is a candidate key.

Characteristics of a normalized database
- Each table must have a key field.
- Every field must hold the smallest meaningful unit of data.
- There must be no repeating fields.
- Each table must contain information about a single entity.
- Each field in a table must depend on the key field.
- All non-key fields must be mutually independent.

Understanding Denormalization
- The end product of normalization is a set of related tables that comprise the database.
- However, to speed up critical queries that need information from more than one table, it is sometimes wiser to introduce a degree of redundancy in tables.
- The intentional introduction of redundancy in a table to improve performance is called denormalization.

Summary
In this lesson, you learned that:
- Normalization is used to simplify table structures.
- Normalization results in the formation of tables that satisfy certain specified constraints and represent certain normal forms.
- The normal forms are used to ensure that various types of anomalies and inconsistencies are not introduced in the database.
- A table structure is always in a certain normal form.
- Several normal forms have been identified.

Summary (Contd.)
- The most important and widely used of these are:
  - First Normal Form (1NF)
  - Second Normal Form (2NF)
  - Third Normal Form (3NF)
  - Boyce-Codd Normal Form (BCNF)
- The intentional introduction of redundancy in a table in order to improve performance is called denormalization.
- The decision to denormalize results in a trade-off between performance and data integrity.
- Denormalization increases disk space utilization.
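
The trade-off in the last points can be made concrete with a minimal sketch that uses Python's standard `sqlite3` module and an in-memory database. The four tables follow the 3NF layout from this lesson (Employee, Branch, Certification, Emp Certification), but the column names, column types, and sample rows are assumptions made for illustration. The sketch shows both sides: a report that needs data from all four tables requires three joins, which is the cost denormalization tries to avoid, while renaming a branch touches exactly one row, which is the integrity benefit normalization provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

# Assumed 3NF schema; names and types are illustrative only.
conn.executescript("""
CREATE TABLE branch (
    branch_code TEXT PRIMARY KEY,
    branch_name TEXT NOT NULL,
    location    TEXT NOT NULL
);
CREATE TABLE employee (
    employee_no INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    branch_code TEXT NOT NULL REFERENCES branch(branch_code)
);
CREATE TABLE certification (
    cert_id   TEXT PRIMARY KEY,
    cert_name TEXT NOT NULL
);
CREATE TABLE emp_certification (
    employee_no INTEGER NOT NULL REFERENCES employee(employee_no),
    cert_id     TEXT    NOT NULL REFERENCES certification(cert_id),
    done_at     TEXT,
    marks       INTEGER,
    PRIMARY KEY (employee_no, cert_id)
);
""")

# Made-up sample rows; each fact is stored exactly once.
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)",
                 [("B1", "Central", "Pune"), ("B2", "North", "Delhi")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", "B1"), (2, "Ravi", "B2")])
conn.executemany("INSERT INTO certification VALUES (?, ?)",
                 [("C1", "SQL Basics"), ("C2", "Data Modeling")])
conn.executemany("INSERT INTO emp_certification VALUES (?, ?, ?, ?)",
                 [(1, "C1", "Institute A", 82),
                  (1, "C2", "Institute B", 75),
                  (2, "C1", "Institute A", 90)])

# Cost of normalization: a report across all entities needs three joins.
report = conn.execute("""
    SELECT e.name, b.branch_name, c.cert_name, ec.marks
    FROM emp_certification AS ec
    JOIN employee      AS e ON e.employee_no = ec.employee_no
    JOIN branch        AS b ON b.branch_code = e.branch_code
    JOIN certification AS c ON c.cert_id     = ec.cert_id
    ORDER BY e.employee_no, c.cert_id
""").fetchall()
for row in report:
    print(row)

# Benefit of normalization: renaming a branch updates a single row.
cursor = conn.execute(
    "UPDATE branch SET branch_name = ? WHERE branch_code = ?",
    ("Central HQ", "B1"),
)
rows_changed = cursor.rowcount
print(rows_changed)  # 1
```

A denormalized design would store the branch name alongside each employee or certification row to avoid the joins, at the price of having to update every copy consistently.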