Karlstad University
Department of Information Systems

Adapted for a textbook by C. J. Date, An Introduction to Database Systems, Pearson Addison Wesley, 2004

Database Design

Remigijus GUSTAS
Phone: +46-54 700 17 65
E-mail: Remigijus.Gustas@kau.se
http://www.cs.kau.se/~gustas/

Functional Dependence
Functional dependence is a many-to-one relationship from one set of attributes to another within a given relvar
Let r be a relation, and let X and Y be subsets of the attributes of r
X → Y says “Y is functionally dependent on X”, or “X functionally determines Y”
X is the determinant; Y is the dependent
An FD is trivial if and only if the right side is a subset of the left side (not interesting in practice), e.g. { S#, P# } → S#
All other dependencies are called nontrivial

Functional Dependency
A particular dependency between two attributes. For a given relation, attribute B is functionally dependent on attribute A if, for every valid value of A, that value of A uniquely determines the value of B
EMP# → ENAME
EMP# → {ENAME, DEPT#, SALARY}

EMP#  ENAME  DEPT#  SALARY
E1    Lopez  D1     40K
E2    Cheng  D1     42K
E3    Finzi  D2     30K
E4    Saito  D2     35K

Armstrong’s Axioms
The set of all FDs that are implied by a given set S of FDs is called the closure of S
1. Reflexivity: If B is a subset of A, then A → B
2. Augmentation: If A → B, then AC → BC
3. Transitivity: If A → B and B → C, then A → C
4. Self-determination: A → A
5. Decomposition: If A → BC, then A → B and A → C
6. Union: If A → B and A → C, then A → BC
7. Composition: If A → B and C → D, then AC → BD
A, B, C, and D are subsets of the attributes of relvar R, and AB signifies the union of A and B
The first three rules are complete (all implied FDs can be derived) and sound (no additional FDs are derived)

Irreducible Sets of Dependencies
Given some particular set S of FDs that need to be enforced, it is sufficient for the database system to enforce the FDs in an irreducible equivalent set
Let S1 and S2 be two sets of FDs.
If every FD implied by S1 is implied by S2, then S2 is a cover of S1
A set S of FDs is irreducible iff:
  The right side of every FD in S involves one attribute (a singleton set)
  The left side of every FD in S is irreducible
  No FD in S can be discarded without changing the closure of S

Normalized Relations
Redundancies removed by breaking into two separate relations

Nonloss (Information-Preserving) Decomposition
Normalization uses a process of projection to decompose relvars
Recomposition is a process of joins
The decomposition of relvar R into projections R1…Rn is nonloss if R = the join of R1…Rn
The normalization procedure can be seen as a method for eliminating functional dependencies that do not emanate from a candidate key

Normalization
The process of converting complex data structures into simpler, more stable data structures
First Normal Form (1NF)
  Unique rows
  All attributes are atomic

Normalization (cont.)
Second Normal Form (2NF)
  Each non-primary-key attribute is identified by the whole key (called full functional dependency).
Third Normal Form (3NF)
  Non-primary-key attributes do not depend on each other (i.e. no transitive dependencies).
The result of normalization is that every non-primary-key attribute depends upon the whole primary key.

1NF but not 2NF
EMPLOYEE2(Emp_ID, Name, Dept, Salary, Course, Date_Completed)
Functional dependencies:
1. Emp_ID → Name, Dept, Salary (partial key dependency)
2. Emp_ID, Course → Date_Completed

2NF (actually, also 3NF)
EMPLOYEE1(Emp_ID, Name, Dept, Salary)
Functional dependencies:
  Emp_ID → Name, Dept, Salary
EMPCOURSE(Emp_ID, Course, Date_Completed)
Functional dependency:
  Emp_ID, Course → Date_Completed

First Normal Form
A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute
By this definition, relvars are always in 1NF
A relvar in 1NF may display functional dependencies other than those emanating from the primary key
Such non-primary-key dependencies promote update anomalies

Update Anomalies
“Update anomalies” arise with three operations:
An INSERT anomaly occurs when the user wishes to record a subordinate fact that is not dependent on the primary key, e.g., recording a supplier location before the supplier supplies a part
A DELETE anomaly, conversely, may delete the location inadvertently
An UPDATE anomaly occurs when many updates are required to record a simple fact

Second Normal Form
A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key
This definition assumes only one candidate key
A relvar in 2NF is less susceptible to update anomalies, but may still exhibit transitive dependencies
Both attributes in a transitive dependency are irreducibly implied by the primary key, and one of them implies the other

2NF but not 3NF
SALES(Customer_ID, Customer_Name, SalesPerson, Region)
Functional dependencies:
1. Customer_ID → Customer_Name, SalesPerson, Region
2. SalesPerson → Region
Region depends transitively on Customer_ID via SalesPerson

Third Normal Form
A relvar is in 3NF if and only if the nonkey attributes are both mutually independent and irreducibly dependent on the primary key
Informal definition: a relvar is in 3NF if and only if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way

Converted to 3NF
SALES1(Customer_ID, Customer_Name, SalesPerson)
Functional dependencies:
  Customer_ID → Customer_Name, SalesPerson
SPERSON(SalesPerson, Region)
Functional dependency:
  SalesPerson → Region

Dependency Preservation
Dependency preservation refers to a specific case of nonloss decomposition, such that the normalized relvars are independent of each other
Some nonloss decompositions do not exhibit dependency preservation
  Customer_ID → {Customer_Name, SalesPerson}
  SalesPerson → Region
The constraint Customer_ID → Region will be enforced automatically if the other two constraints are enforced

Dependency Preservation: Example
Dependencies are preserved in this projection:
  Customer {Customer_ID, Customer_Name, SalesPerson}
  CS {SalesPerson, Region}
Dependencies are not preserved in this one:
  SC {Customer_ID, Customer_Name, SalesPerson}
  CS {Customer_ID, Region}
Although the second decomposition is nonloss, its projections still cannot be updated independently

Boyce/Codd Normal Form
BCNF refers to decompositions involving relvars with more than one candidate key, where the candidate keys are composite and overlapping
A relvar is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant
That is, a relvar is in BCNF if and only if every determinant is a candidate key

Boyce/Codd Normal Form: Example
{S,J} → T
T → J

S      J        T
Smith  Math     Prof White
Smith  Physics  Prof Green
Jones  Math     Prof White
Jones  Physics  Prof Green

ST (S,T), TJ (T,J)
The two projections are in BCNF, but the trouble is that they are not independent.
The decomposition does not in fact avoid anomalies.
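
The SJT example can be checked mechanically. The following is a minimal Python sketch and not part of the original slides; the helper names closure, candidate_keys, project_fds and bcnf_violations are illustrative assumptions. It assumes the FDs of SJT are {S,J} → T and T → J, computes attribute closures by repeatedly applying the FDs, and uses them to find the candidate keys, the BCNF violation, and the FD that is lost when SJT is split into ST and TJ.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds` (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its determinant is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(heading, fds):
    """All minimal attribute sets whose closure is the whole heading."""
    keys = []
    for size in range(1, len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            attrs = frozenset(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == heading:
                keys.append(attrs)
    return keys


def project_fds(sub_heading, fds):
    """FDs that hold in the projection on `sub_heading` (brute force)."""
    projected = []
    for size in range(1, len(sub_heading) + 1):
        for combo in combinations(sorted(sub_heading), size):
            lhs = frozenset(combo)
            rhs = (closure(lhs, fds) & sub_heading) - lhs
            if rhs:
                projected.append((lhs, rhs))
    return projected


def bcnf_violations(heading, fds):
    """Nontrivial FDs in `fds` whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != heading]


# Assumed FDs for the SJT relvar: {S,J} -> T and T -> J.
SJT = frozenset("SJT")
FDS = [(frozenset("SJ"), frozenset("T")), (frozenset("T"), frozenset("J"))]

keys = candidate_keys(SJT, FDS)         # {S,J} and {S,T}
violations = bcnf_violations(SJT, FDS)  # T -> J: T is not a candidate key

# Decompose into ST and TJ and collect the FDs enforceable locally.
ST, TJ = frozenset("ST"), frozenset("TJ")
local_fds = project_fds(ST, FDS) + project_fds(TJ, FDS)

# {S,J} -> T is preserved only if T is in the closure of {S,J} under local FDs.
preserved = frozenset("T") <= closure(frozenset("SJ"), local_fds)

print("candidate keys:", [sorted(k) for k in keys])
print("BCNF violations in SJT:", [(sorted(l), sorted(r)) for l, r in violations])
print("{S,J} -> T preserved by ST, TJ:", preserved)
```

Under these assumptions, both projections pass the BCNF check, yet {S,J} → T cannot be derived from the FDs that hold in ST and TJ separately, which is why the two projections cannot be updated independently. The brute-force enumeration of attribute subsets is exponential and is meant only for small classroom examples.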