Normalisation: Introduction
▪ A well-structured relation contains minimal redundancy and allows users to insert, modify and delete rows without causing errors or inconsistencies.
▪ Redundancy in a table not only wastes space but can also lead to loss of data integrity in the database.

Figure: Table with Repeating Groups

Anomalies in Databases
▪ An anomaly is an error or inconsistency that may result when a user attempts to update a relation that contains redundant data.
▪ There are three types of anomaly:
  ▪ Insertion anomaly
  ▪ Update anomaly
  ▪ Deletion anomaly

Insertion Anomaly
▪ Inability to add data to the database because related data is absent.
▪ Eg: A student's registration details cannot be entered because the course does not yet exist.
▪ So, with a composite key of (StudNo, CourseID), the row cannot be added if the CourseID does not exist, because no part of a primary key may be left empty.

Update Anomaly
▪ Happens when a change to redundant data is applied to some copies but not others, leaving the data inconsistent.
▪ Eg: If a student has multiple registrations and each registration also stores the address, every copy of the address must be updated when it changes.

Deletion Anomaly
▪ Unintended loss of data caused by deleting related data.
▪ Eg: If a student is registered for a course, and the course name and course ID are stored in that same relation, deleting the only student registered for the course also loses the course details.

Normalisation Process
▪ Normalisation is a formal process for deciding which attributes should be grouped together in a relation.
▪ Normalisation is the process of decomposing relations with anomalies to produce smaller, well-structured relations.
▪ Normal forms are the rules used for structuring relations.

Table with repeating groups
  ▪ Remove repeating groups → 1st Normal Form (1NF)
  ▪ Remove partial dependencies → 2nd Normal Form (2NF)
  ▪ Remove transitive dependencies → 3rd Normal Form (3NF)
  ▪ Make every determinant a candidate key → Boyce-Codd Normal Form (BCNF)
  ▪ Remove multivalued dependencies → 4th Normal Form (4NF)
  ▪ Remove join dependencies → 5th Normal Form (5NF)

First Normal Form (1NF)
▪ A relation is in 1NF if it does not contain repeating groups or multivalued attributes.
▪ In plain English: each record must be unique, and each table cell should contain a single value.
▪ This can be achieved by separating the table into two tables:
  1. A table containing the single-valued attributes, with a key
     ▪ Project_1 (ProjNo, ProjName)
  2. A table containing the multivalued attributes, with a composite key
     ▪ Works_1 (ProjNo, EmpNo, EmpName, JobTitle, HourlyRate, HrsWorked)

Figure: Before Normalisation (table with repeating groups)

Figure: After Applying 1st Normal Form (composite key)

Functional Dependency
▪ The value of one attribute (or set of attributes) in a tuple determines the values of other attributes in the same tuple.
▪ Eg:
  ▪ A, B, C, D are attributes in a relation called R.
  ▪ R (A, B, C, D)
  ▪ B and D are functionally dependent on A.
  ▪ A → B, D
▪ An example follows the definitions of determinant and dependent.

A → B, D
▪ Determinant
  ▪ The attribute or attributes on the left-hand side of the functional dependency, which determine the values of other attributes in the same tuple.
▪ Dependent
  ▪ The attribute or attributes on the right-hand side of the functional dependency, which depend on the determinant.

Example for Functional Dependency
▪ In this table, if we know the EmpNo, then we can find the EmpName, JobTitle and HourlyRate.
▪ Therefore, those 3 attributes are functionally dependent on EmpNo.

Second Normal Form (2NF)
▪ The relation must be in 1NF
▪ AND
▪ No partial dependencies exist (every non-key attribute is fully functionally dependent on the whole key).
▪ Partial Dependency: a non-key attribute is functionally dependent on just a part of a composite key.
▪ To achieve 2NF, identify the partial dependencies in the 1NF table, then split the table into a set of relations, each with its own unique identifier.

▪ Functional Dependencies (in Works_1)
  ▪ EmpNo → EmpName, JobTitle, HourlyRate
  ▪ ProjNo, EmpNo → HrsWorked
▪ Relations in 2NF
  ▪ Employee_2 (EmpNo, EmpName, JobTitle, HourlyRate)
  ▪ Works_2 (EmpNo, ProjNo, HrsWorked)
  ▪ Project_2 (ProjNo, ProjName)

Figure: Before Applying 2nd Normal Form (partial dependency)

Figure: After Applying 2nd Normal Form (full dependency on the key)

Third Normal Form (3NF)
▪ A relation is in 3NF if it is in 2NF and no transitive dependencies exist.
▪ Transitive Dependency: a non-key attribute is functionally dependent on another non-key attribute.
▪ In plain English:
  ▪ With a transitive dependency, changing one non-key column might force another non-key column to change. 3NF removes this.
▪ Eg:
  ▪ Changing a name may affect a title or designation.
▪ To achieve 3NF, identify the transitive dependencies in the 2NF table. Based on them, split the table into a set of relations, each with its own unique identifier.

▪ Dependencies (in Employee_2)
  ▪ JobTitle → HourlyRate
  ▪ EmpNo → EmpName, JobTitle
▪ Relations in 3NF
  ▪ Job_3 (JobTitle, HourlyRate)
  ▪ Employee_3 (EmpNo, EmpName, JobTitle)
  ▪ Works_3 (EmpNo, ProjNo, HrsWorked)
  ▪ Project_3 (ProjNo, ProjName)

Figure: Before Applying 3rd Normal Form (transitive dependency)

Figure: After Applying 3rd Normal Form
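
Worked sketch in code
▪ The decomposition of Works_1 into the 3NF relations can be tried out on a few rows. The Python sketch below is illustrative only: the sample rows and the helper names (holds, project, rejoin) are invented for this example and do not come from the slides. It checks the functional dependencies listed for Works_1 and Employee_2, splits the table by projection, and then rejoins the pieces to confirm no information was lost. Project_3 is left out because Works_1 does not carry ProjName.

```python
# Hypothetical sample rows for Works_1; all values are invented for illustration.
WORKS_1 = [
    {"ProjNo": 1, "EmpNo": 101, "EmpName": "Ann", "JobTitle": "Analyst",
     "HourlyRate": 50, "HrsWorked": 10},
    {"ProjNo": 1, "EmpNo": 102, "EmpName": "Bob", "JobTitle": "Developer",
     "HourlyRate": 40, "HrsWorked": 20},
    {"ProjNo": 2, "EmpNo": 101, "EmpName": "Ann", "JobTitle": "Analyst",
     "HourlyRate": 50, "HrsWorked": 5},
    {"ProjNo": 2, "EmpNo": 103, "EmpName": "Cy", "JobTitle": "Developer",
     "HourlyRate": 40, "HrsWorked": 8},
]


def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent holds for every row."""
    seen = {}
    for row in rows:
        lhs = tuple(row[a] for a in determinant)
        rhs = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


def project(rows, attrs):
    """Keep only attrs and drop duplicate tuples, preserving row order."""
    seen = set()
    out = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


# 2NF step: EmpNo alone determines the employee attributes (partial dependency).
employee_2 = project(WORKS_1, ["EmpNo", "EmpName", "JobTitle", "HourlyRate"])
works_3 = project(WORKS_1, ["EmpNo", "ProjNo", "HrsWorked"])

# 3NF step: JobTitle determines HourlyRate (transitive dependency).
job_3 = project(employee_2, ["JobTitle", "HourlyRate"])
employee_3 = project(employee_2, ["EmpNo", "EmpName", "JobTitle"])


def rejoin():
    """Rebuild the Works_1 rows by joining Works_3, Employee_3 and Job_3."""
    rate = {j["JobTitle"]: j["HourlyRate"] for j in job_3}
    emp = {e["EmpNo"]: e for e in employee_3}
    rows = []
    for w in works_3:
        e = emp[w["EmpNo"]]
        rows.append({
            "ProjNo": w["ProjNo"], "EmpNo": w["EmpNo"],
            "EmpName": e["EmpName"], "JobTitle": e["JobTitle"],
            "HourlyRate": rate[e["JobTitle"]], "HrsWorked": w["HrsWorked"],
        })
    return rows


if __name__ == "__main__":
    print(holds(WORKS_1, ["EmpNo"], ["EmpName", "JobTitle", "HourlyRate"]))
    print(holds(WORKS_1, ["EmpNo"], ["HrsWorked"]))
    print(rejoin() == WORKS_1)
```

▪ In this sample data each hourly rate is stored once per job title in job_3 rather than once per assignment, which is the redundancy the 2NF and 3NF steps are meant to remove. A dependency check on sample rows can only show that a dependency is not violated by those rows; whether it truly holds is a property of the business rules.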