Functional Dependencies and Normalisation Why Normalisation Normalisation with the use of Functional Dependency is the key way of eliminating data redundancy during the design of your database. Functional Dependencies A functional dependency is a relationship of one attribute or field in a record to another. In a database, we often have the case where one field defines the other. For example, we can say that Social Security Number (SSN) defines a name. Functional Dependencies This means that if I have a database with SSNs and names, and if I know someone's SSN, then I can find their name. By using the word “defines” we are saying that for every SSN we will have one and only one name. Functional Dependencies And hence we can come to a conclusion that we have defined name as being functionally dependent on SSN. The idea of a functional dependency is to define one field as an anchor (link) from which one can always find a single value for another field. Functional Dependencies Another example, suppose that a company assigned each employee a unique employee number. Each employee has a number and a name. Names might be the same for two different employees, but their employee numbers would always be different and unique because the company defined them that way. Functional Dependencies We write a functional dependency (FD) connection with an arrow: SSN → Name Or EmpNo → Name The expression SSN → Name is read "SSN defines Name" or "SSN implies Name" Functional Dependencies Consider the table below: Functional Dependencies (FD) You have two people named Fred! It is usual that Name will not be unique and it is commonplace for two people to have the same name. However, no two people have the same EmpNo and for each EmpNo, there is a Name. Functional Dependencies (FD) The FD X → Y means that for every occurrence of X you will get the same value of Y. consider another example Functional Dependencies (FD) Here, we will define two FDs: SSN → Name and School → Location. Further, we will define this FD: SSN → School. First, have we violated any FDs with our data? Because all SSNs are unique, there cannot be a FD violation of SSN → Name. Why? Because a FD X → Y says that given some value for X, you always get the same Y. Because the X's are unique, you will always get the same value. The same comment is true for SSN → School. Functional Dependencies (FD) Note: If we define a functional dependency X → Y and we define a functional dependency Y → Z, then we know by inference that X → Z. We can define SSN → School. We also can define School → Location, so we can infer that SSN → Location. Functional Dependencies (FD) The inference we have illustrated is called the transitivity rule of FD inference. Transitivity rule:Given X → Y Given Y → Z Then X → Z Functional Dependencies (FD) A primary key constraint is a special case of an FD. The attributes in the key play the role of X, and the set of all attributes in the relation plays the role of Y. Normal Forms and Normalisation Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form Normal Forms and Normalisation In a given relation schema, we need to decide whether it is a good design or whether it requires a decomposition into smaller relations. Such a decision must be guided by an understanding of what problems from current schema. Normal forms are used for such guidance. Normal Forms Normal forms based on FDs are: First normal form (1NF) Second normal form (2NF) Third normal form (3NF) Boyce-Codd normal form (BCNF) The process of using normal forms to improve database design by breaking up relations into smaller relation is called Normalization Normal Forms These forms have increasingly resrictive requirements: Every relation in BCNF is also in 3NF Every relation in 3NF is also in 2NF Every relation in 2NF is also in 1NF Unnormalised Data-Set to First normal form (1NF) A relation is in first normal forms (1NF ) if every field contains only atomic values, that is, not lists or sets. Consider the data set below Unnormalised Data-Set to First normal form (1NF) In the dataset above suppose we choose the data item moduleName as the key of this data-set. There are some cells in the table such as studentNo, studentName, assGrade and assType contains multiple values. All repeat with respect to moduleName First Normal Form The attribute studentNo, studentName, assGrade and assType are not functionally dependent on our chosen primary key moduleName. The attributes staffNo and staffName are dependent to key. Form two tables; one for the functionally dependent attributes and other for nonfunctionally dependent. 1NF Tables 1NF to 2NF Remove part key dependencies Involves examining those tables that have a compound key and for each non-key data item in the table asking the question: can the data-item be uniquely identified by part of the compound key? 1NF to 2NF A relation is in second normal form if and only if it is in first normal form and every non-key attribute is fully functionally dependent on the primary key. E.g table Assessments have three-part compound key moduleName, studentNo and assType 1NF to 2NF We observed that ModuleName has no influence on the studentName. StudentNo alone determine studentName. Hence we have to break out the determinant and dependent data-items into their own table. This lead to decomposition of the tables as follows: 2NF Tables 2NF to 3NF To move from second normal form to third normal form we remove inter-data dependencies. To do this examine every table and ask of each pair of non-key data-items; is the value of data item A dependent on the value of data-item B, or vice versa? If the answer is yes, split off the relevant dataitems into a separate table 2NF to 3NF A relation is in third normal form if and only if it is in second normal form and every-key attribute is non-transitively dependent on the primary key. The only place where this is relevant is in the table called Modules. Here staffNo determines staffName. staffName hence is transitively dependent on moduleName 2NF to 3NF StaffNo is therefore asking to be a primary key. Hence, we create a separate table to be called Lecturers with staffNo as the primary key as illustrated in the below tables: 3NF Tables Another Example Consider the table below: Title Author1 Author 2 ISBN Subject Pages Publisher Database System Concepts Abraham Silberschatz Henry F. Korth 0072958863 MySQL, Computers 1168 McGraw-Hill Operating System Concepts Abraham Silberschatz Henry F. Korth 0471694665 Computers 944 McGraw-Hill