Database Management System Module 4: Schema Refinement Pitfalls in relational database design Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to Repetition of Information. Inability to represent certain information. Repetition of Information (Redundancy) Storing the same information redundantly, that is in more than one place within a database, can lead to several problems: Redundant storage: Some information stored repeatedly. Update anomalies: If one copy of such repeated data is updated inconsistency is created unless all copies are similarly updated. Redundancy Problems Insertion anomalies: It may not be possible to store some informstion unless some other infromation is stored as well. Deletion anomalies: it may not be possible to delete some information without losing some other information as well. Example of Redundancy Problem Consider a relation instance of Hourly_Emps below The key for Hourly_Emps is ssn. In addition, suppose the hourly_wages attribute is determined by the rating attribute. Example of Redundancy Problem That is for given rating value, there is oly one permissible hourly_wages value. This IC is an example of a functional dependency. This lead to redundancy in the relation Hourly_Emps as illustrated above. Consequence Some information is stored multiple times. E.g. the rating value 8 with hourly_wage 10 repeated three times. Wasting space Inconsistency; a hourly_wages in the first tuple could be updated without making a similar change in the second tuple (Update Anomaly) Example 2 Consider the relation schema: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) . Example 2 Redundancy: Data for branch-name, branch-city, assets are repeated for each loan that a branch makes Wastes space Complicates updating, introducing possibility of inconsistency of assets value Null values Cannot store information about a branch if no loans exist Can use null values, but they are difficult to handle Problem Solution Design Goals: Avoid redundant data Can be easily identified with Functional Dependency Ensure that relationships among attributes are represented Facilitate the checking of updates for violation of database integrity constraints. Problem Solution There are various ways of solving redundancy problem in your design: Use Decomposition Normalisation with the aid of Functional Dependencies Use Decomposition Problems arising from redundancy can be addressed by replacing a relation with collection of smaller relations. Each of the smaller relations contains a subset of the attributes of the original relation. This process is called Decomposition of the larger relation to smaller relation. Decomposition Example we can deal with the redundancy in Hourly_Emps by decomposing it into two relations. Hourly_Emps2 (ssn, name, lot, rating, hours,_worked) Wages (rating, hourly_wages) Decomposition The Instances Hourly_Emps2 and Wages are shown below: Decomposition Decompose the relation schema Lendingschema into: Branch-schema = (branch-name, branchcity,assets) Loan-info-schema = (customer-name, loannumber, branch-name, amount) All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R 1 R2 Lossless-Join Decomposition Let R be a relation schema and let F be a set of FDs over R. A decomposition of R into two schemas with attribute set X and Y is said to be lossless-join decomposition with respect to F if for every instance r of R that satisfies the dependencies in F, x(r ) Y (r ) r Example Below are instances illustrating Lossy Decomposition All decomposition used to eliminate redundancy must be lossless which it is not in our case above. Functional Dependence: Revisited We use functional dependencies to: test relations to see if they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F. specify constraints on the set of legal relations We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F. Functional Dependence: Revisited Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Loanschema may, by chance, satisfy loan-number customer-name. Closure of a Set of Functional Dependencies Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. E.g. If A B and B C, then we can infer that A C The set of all functional dependencies logically implied by F is the closure of F. We denote the closure of F by F+. Closure of a Set of Functional Dependencies We can find all of F+ by applying Armstrong’s Axioms: if , then (reflexivity) if , then (augmentation) if , and , then (transitivity) These rules are sound (generate only functional dependencies that actually hold) and complete (generate all functional dependencies that hold). Goal — Devise a Theory for the Following Decide whether a particular relation R is in “good” form. In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that each relation is in good form the decomposition is a lossless-join decomposition Our theory is based on: functional dependencies Normal Forms and Normalisation In a given relation schema, we need to decide whether it is a good design or whether it requires a decomposition into smaller relations. Such a decision must be guided by an understanding of what problems from current schema. Normal forms are used for such guidance. Normal Forms Normal forms based on FDs are: First normal form (1NF) Second normal form (2NF) Third normal form (3NF) Boyce-Codd normal form (BCNF) The process of using normal forms to improve database design by breaking up relations into smaller relation is called Normalization Normal Forms These forms have increasingly resrictive requirements: Every relation in BCNF is also in 3NF Every relation in 3NF is also in 2NF Every relation in 2NF is also in 1NF Unnormalised Data-Set to First normal form (1NF) A relation is in first normal forms (1NF ) if every field contains only atomic values, that is, not lists or sets. Consider the data set below Unnormalised Data-Set to First normal form (1NF) In the dataset above suppose we choose the data item moduleName as the key of this data-set. There are some cells in the table such as studentNo, studentName, assGrade and assType contains multiple values. All repeat with respect to moduleName First Normal Form The attribute studentNo, studentName, assGrade and assType are not functionally dependent on our chosen primary key moduleName. The attributes staffNo and staffName are dependent to key. Form two tables; one for the functionally dependent attributes and other for non-functionally dependent. 1NF Tables 1NF to 2NF Remove part key dependencies Involves examining those tables that have a compound key and for each non-key data item in the table asking the question: can the data-item be uniquely identified by part of the compound key? 1NF to 2NF A relation is in second normal form if and only if it is in first normal form and every non-key attribute is fully functionally dependent on the primary key. E.g table Assessments have three-part compound key moduleName, studentNo and assType 1NF to 2NF We observed that ModuleName has no influence on the studentName. StudentNo alone determine studentName. Hence we have to break out the determinant and dependent data-items into their own table. This lead to decomposition of the tables as follows: 2NF Tables 2NF to 3NF To move from second normal form to third normal form we remove inter-data dependencies. To do this examine every table and ask of each pair of non-key data-items; is the value of data item A dependent on the value of data-item B, or vice versa? If the answer is yes, split off the relevant data-items into a separate table 2NF to 3NF A relation is in third normal form if and only if it is in second normal form and every-key attribute is non-transitively dependent on the primary key. The only place where this is relevant is in the table called Modules. Here staffNo determines staffName. staffName hence is transitively dependent on moduleName 2NF to 3NF StaffNo is therefore asking to be a primary key. Hence, we create a separate table to be called Lecturers with staffNo as the primary key as illustrated in the below tables: 3NF Tables