Chapter 8: Relational Database Design Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Relational Database Design n Find a “good” collection of relation schemas for our database n Two major pitfalls to avoid in designing a database schema l Redundancy l repeating information è data inconsistency Incompleteness 4 difficult or impossible to model certain aspects of the enterprise 4 Database System Concepts - 6th Edition 8.2 ©Silberschatz, Korth and Sudarshan Design Alternatives: Larger Schemas n Suppose we combine instructor and department into inst_dept n Redundancy l Wastes space l Complicates updating è possibility of inconsistency n Null values may be introduced è difficulty in handling Database System Concepts - 6th Edition 8.3 ©Silberschatz, Korth and Sudarshan Design Alternatives: Smaller Schemas n Suppose we decompose employee(ID, name, street, city, salary) into employee1 (ID, name) and employee2 (name, street, city, salary) n Lossy decomposition l Database System Concepts - 6th Edition 8.4 Introduces a loss of information ©Silberschatz, Korth and Sudarshan First Normal Form n Domain is atomic if its elements are considered to be indivisible units l Examples of non-atomic domains: 4 Set of names, composite attributes 4 Identification numbers like CS101 that can be broken up into parts n Atomicity is actually a property of how the elements of the domain are used n Non-atomic values complicate storage and encourage redundant (repeated) storage of data n A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic n We assume all relations are in first normal form Database System Concepts - 6th Edition 8.5 ©Silberschatz, Korth and Sudarshan Relational Theory Goal: Devise a theory for the following n Decide whether a particular relation R is in “good” form. n In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that l each relation is in good form l the decomposition is a lossless-join decomposition n Our theory is based on: l functional dependencies l multivalued dependencies Database System Concepts - 6th Edition 8.6 ©Silberschatz, Korth and Sudarshan Functional Dependencies n Constraints on the set of legal relations n Require that the value for a certain set of attributes determines uniquely the value for another set of attributes n A functional dependency is a generalization of the notion of a key n Provide the theoretical basis for good decomposition Database System Concepts - 6th Edition 8.7 ©Silberschatz, Korth and Sudarshan Functional Dependencies (Cont.) n Let R be a relation schema a Í R and b Í R n The functional dependency a®b holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes a, they also agree on the attributes b. That is, t1[a] = t2 [a] Þ t1[b ] = t2 [b ] n Example: Consider r(A,B ) with the following instance of r. 1 1 3 4 5 7 n On this instance, A ® B does NOT hold, but B ® A does hold. Database System Concepts - 6th Edition 8.8 ©Silberschatz, Korth and Sudarshan Keys and Functional Dependencies n K is a superkey for relation schema R if and only if K ® R n K is a candidate key for R if and only if l K ® R, and l for no a Ì K, a ® R n Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: inst_dept (ID, name, salary, dept_name, building, budget) We expect these functional dependencies to hold: dept_name® building and ID à building but would not expect the following to hold: dept_name ® salary Database System Concepts - 6th Edition 8.9 ©Silberschatz, Korth and Sudarshan Use of Functional Dependencies n Test relations to see if they are legal under a given set of FDs l If a relation r is legal under a set F of FDs, we say that r satisfies F. n Specify constraints on the set of legal relations l If all legal relations on R satisfy a set F of FDs, we say that F holds on R. n Note: A specific instance of a relation schema may satisfy a FD even if the FD does not hold on all legal instances. l For example, a specific instance of instructor may, by chance, satisfy name ® ID. Database System Concepts - 6th Edition 8.10 ©Silberschatz, Korth and Sudarshan Trivial Functional Dependency n A functional dependency is trivial if it is satisfied by all instances of a relation l Example: 4 ID, name ® ID 4 name ® name n In general, a ® b is trivial if b Í a Database System Concepts - 6th Edition 8.11 ©Silberschatz, Korth and Sudarshan Closure of a Set of Functional Dependencies n Given a set F of FDs, there are certain other FDs that are logically implied by F. l For example: If A ® B and B ® C, then we can infer that A ® C n The set of all functional dependencies logically implied by F is the closure of F (denoted by F+). n We can find F+ by repeatedly applying Armstrong’s Axioms: l if b Í a, then a ® b (reflexivity) l if a ® b, then g a ® g b (augmentation) l if a ® b, and b ® g, then a ® g (transitivity) n These rules are l sound (generate only functional dependencies that actually hold) and l complete (generate all functional dependencies that hold) Database System Concepts - 6th Edition 8.12 ©Silberschatz, Korth and Sudarshan Example n R = (A, B, C, G, H, I) F={ A®B A®C CG ® H CG ® I B ® H} n some members of F+ l A®H 4 l AG ® I 4 l by transitivity from A ® B and B ® H by augmenting A ® C with G, to get AG ® CG and then transitivity with CG ® I CG ® HI 4 by augmenting CG ® I to infer CG ® CGI, and augmenting of CG ® H to infer CGI ® HI, and then transitivity Database System Concepts - 6th Edition 8.13 ©Silberschatz, Korth and Sudarshan Additional Rules for Closure of FDs n Additional rules: l If a ® b holds and a ® g holds, then a ® b g holds (union) l If a ® b g holds, then a ® b holds and a ® g holds (decomposition) l If a ® b holds and g b ® d holds, then a g ® d holds (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms. Database System Concepts - 6th Edition 8.14 ©Silberschatz, Korth and Sudarshan Closure of Attribute Sets n Given a set of attributes a, define the closure of a under F (denoted by a+) as the set of attributes that are functionally determined by a under F n Algorithm to compute a+, the closure of a under F result := a; while (changes to result) do for each b ® g in F do begin if b Í result then result := result È g end Database System Concepts - 6th Edition 8.15 ©Silberschatz, Korth and Sudarshan Example of Attribute Set Closure n R = (A, B, C, G, H, I) n F = {A ® B A®C CG ® H CG ® I B ® H} n (AG)+ 1. result = AG 2. result = ABCG (A ® C and A ® B) 3. result = ABCGH (CG ® H and CG Í AGBC) 4. result = ABCGHI (CG ® I and CG Í AGBCH) n Is AG a candidate key? 1. Is AG a super key? 2. Does AG ® R? == Is (AG)+ Ê R Is any subset of AG a superkey? 1. 1. 2. Does A ® R? == Is (A)+ Ê R Does G ® R? == Is (G)+ Ê R Database System Concepts - 6th Edition 8.16 ©Silberschatz, Korth and Sudarshan Uses of Attribute Closure There are several uses of the attribute closure algorithm: n Testing for superkey: “is a a superkey?” l Compute a+, and check if a+ contains all attributes of R n Testing functional dependencies: “does a ® b hold? (Is a ® b in F+?)” l Just check if b Í a+. n Computing the closure of F l For each g Í R, we find the closure g+, and for each S Í g+, we output a functional dependency g ® S. Database System Concepts - 6th Edition 8.17 ©Silberschatz, Korth and Sudarshan Canonical Cover n Sets of functional dependencies may have redundant dependencies that can be inferred from the others l Example: A ® C is redundant in: {A ® B, B ® C, A ® C} l Parts of a functional dependency may be redundant 4 E.g.: on RHS: {A ® B, B ® C, A ® CD} can be simplified to {A ® B, B ® C, A ® D} 4 E.g.: on LHS: {A ® B, B ® C, AC ® D} can be simplified to {A ® B, B ® C, A ® D} n A canonical cover for F is a set of dependencies Fc such that l F+ = Fc+, and l No functional dependency in Fc contains an extraneous attribute, and l Each left side of functional dependency in Fc is unique n Intuitively, Fc is a “minimal” set of FDs equivalent to F, having no redundant dependencies or redundant parts of dependencies Database System Concepts - 6th Edition 8.18 ©Silberschatz, Korth and Sudarshan Extraneous Attributes n Let a ® b in F. l Attribute A Î a is extraneous if F Þ (F – {a ® b}) È {(a – A) ® b}. l Attribute A Î b is extraneous if (F – {a ® b}) È {a ®(b – A)} Þ F. n Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one n Example: Given F = {A ® C, AB ® C } l B is extraneous in AB ® C because {A ® C, AB ® C} Þ A ® C (i.e. the result of dropping B from AB ® C). n Example: Given F = {A ® C, AB ® CD} l C is extraneous in AB ® CD because AB ® C can be inferred even after deleting C Database System Concepts - 6th Edition 8.19 ©Silberschatz, Korth and Sudarshan Testing if an Attribute is Extraneous Let a ® b in F. n n To test if attribute A Î a is extraneous 1. Compute ({a} – A)+ using the dependencies in F 2. Check that ({a} – A)+ contains b; if it does, A is extraneous in a To test if attribute A Î b is extraneous 1. Compute a+ using only the dependencies in F’ = (F – {a ® b}) È {a ®(b – A)}, 2. Check that a+ contains A; if it does, A is extraneous in b Database System Concepts - 6th Edition 8.20 ©Silberschatz, Korth and Sudarshan Computing a Canonical Cover n Algorithm Fc = F repeat Replace any a1 ® b1 and a1 ® b2 in Fc with a1 ® b1 b2 (union rule) Find a ® b in Fc with an extraneous attribute either in a or in b If an extraneous attribute is found, delete it from a ® b until F does not change Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied n Example R = (A, B, C) F = {A ® BC B®C A®B AB ® C} n Combine A ® BC and A ® B into A ® BC l n n A is extraneous in AB ® C l B ® C logically implies AB ® C l Now, Fc = {A ® BC, B ® C} C is extraneous in A ® BC l n 8.21 A ® C is logically implied by A ® B and B ® C. The canonical cover l Database System Concepts - 6th Edition Now, Fc = {A ® BC, B ® C, AB ® C} Fc = {A ® B, B ® C} ©Silberschatz, Korth and Sudarshan Lossless-join Decomposition n {R1, …, Rn} is a decomposition of R if R1 È … È Rn n {R1, R2} is a loseless-join decomposition of R if r = ÕR1 (r) ÕR2 (r) n Decomposition {R1, R2} of R is lossless join if at least one of the following dependencies is in F+: l R1 Ç R2 ® R1 l R1 Ç R2 ® R2 (i.e., if one of the two sub-schemas hold the key of the other sub-schema) Database System Concepts - 6th Edition 8.22 ©Silberschatz, Korth and Sudarshan Dependency Preservation n Let F be set of FD on R, and {R1, …, Rn} is a decomposition of R. n The restriction of F to Ri, denoted by Fi, is the set of FDs in F+ that include only attributes in Ri. n A decomposition is dependency preserving, if (F1 È F2 È … È Fn)+ = F+ n If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive. Database System Concepts - 6th Edition 8.23 ©Silberschatz, Korth and Sudarshan Example n R = (A, B, C) F = {A ® B, B ® C) l Can be decomposed in two different ways n R1 = (A, B), R2 = (B, C) l Lossless-join decomposition: R1 Ç R2 = {B} and B ® BC l Dependency preserving n R1 = (A, B), R2 = (A, C) l Lossless-join decomposition: R1 Ç R2 = {A} and A ® AB l Not dependency preserving (cannot check B ® C without computing R1 Database System Concepts - 6th Edition 8.24 R2) ©Silberschatz, Korth and Sudarshan Goals of Normalization n Let R be a relation scheme with a set F of functional dependencies n Decide whether a relation scheme R is in “good” form n In the case that a relation scheme R is not in “good” form, decompose it into a set of relation scheme {R1, R2, ..., Rn} such that l Each relation scheme is in good form l The decomposition is a lossless-join decomposition l Preferably, the decomposition should be dependency preserving Database System Concepts - 6th Edition 8.25 ©Silberschatz, Korth and Sudarshan Boyce-Codd Normal Form A relation schema R is in BCNF with respect to a set F of FDs if for all functional dependencies in F+ of the form a®b where a Í R and b Í R, at least one of the following holds: n a ® b is trivial (i.e., b Í a) n a is a superkey for R Example schema not in BCNF: inst_dept (ID, name, salary, dept_name, building, budget ) because dept_name® building, budget holds on inst_dept, but dept_name is not a superkey Database System Concepts - 6th Edition 8.26 ©Silberschatz, Korth and Sudarshan BCNF Example n Suppose we have a schema R and a non-trivial dependency a ® b causes a violation of BCNF. We decompose R into: • (aUb) • (R-(b-a)) n In our example, l l a = dept_name b = building, budget and inst_dept is replaced by l ( a U b ) = ( dept_name, building, budget ) l ( R - ( b - a ) ) = ( ID, name, salary, dept_name ) n Of course, we are only interested in lossless join decomposition! Database System Concepts - 6th Edition 8.27 ©Silberschatz, Korth and Sudarshan Testing for BCNF n To check if a non-trivial dependency a ® b causes a violation of BCNF 1. Compute a+ (the attribute closure of a), and 2. Verify that it includes all attributes of R, that is, it is a superkey of R n To check if a relation schema R is in BCNF l l Simplified test: check only the FDs in F (not in F+) for violation of BCNF If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either. n However, simplified test using only F is incorrect when testing a relation Ri in a decomposition of R l Example: consider R = (A, B, C, D, E), with F = { A ® B, BC ® D } 4 Decompose R into R1 = (A, B) and R2 = (A, C, D, E) 4 Neither of the dependencies in F contain only attributes from (A,C,D,E) so we might be mislead into thinking R2 satisfies BCNF. 4 In fact, dependency AC ® D in F+ shows R2 is not in BCNF. Database System Concepts - 6th Edition 8.28 ©Silberschatz, Korth and Sudarshan BCNF Decomposition Algorithm result := {R}; done := false; compute F +; while (not done) do if (there is a schema Ri in result that is not in BCNF) then begin let a ® b be a nontrivial functional dependency that holds on Ri such that a ® Ri is not in F +, and a Ç b = Æ; result := (result – Ri ) È (Ri – b) È (a, b ); end else done := true; Note: each Ri is in BCNF, and decomposition is lossless-join. Database System Concepts - 6th Edition 8.29 ©Silberschatz, Korth and Sudarshan Example of BCNF Decomposition n class (course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id) n Functional dependencies: l l course_id → title, dept_name, credits building, room_number → capacity course_id, sec_id, semester, year → building, room_number, time_slot_id n A candidate key {course_id, sec_id, semester, year}. l n BCNF Decomposition: l l course_id→ title, dept_name, credits holds 4 but course_id is not a superkey. We replace class by: course(course_id, title, dept_name, credits) 4 class-1 (course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id) 4 Database System Concepts - 6th Edition 8.30 ©Silberschatz, Korth and Sudarshan BCNF Decomposition (Cont.) n course is in BCNF l How do we know this? n building, room_number→capacity holds on class-1 l l but {building, room_number} is not a superkey for class-1. We replace class-1 by: 4 classroom (building, room_number, capacity) 4 section (course_id, sec_id, semester, year, building, room_number, time_slot_id) n classroom and section are in BCNF Database System Concepts - 6th Edition 8.31 ©Silberschatz, Korth and Sudarshan BCNF and Dependency Preservation n Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation l It is sufficient to test only those dependencies on each individual relation of a decomposition l If so, the decomposition is dependency preserving. n It is not always possible to get a BCNF decomposition that is dependency preserving l Example: R = (J, K, L) F = {JK ® L L®K} Two candidate keys = JK and JL 4 R is not in BCNF 4 Any decomposition of R will fail to preserve JK ® L This implies that testing for JK ® L requires a join Database System Concepts - 6th Edition 8.32 ©Silberschatz, Korth and Sudarshan Third Normal Form: Motivation n There are some situations where l BCNF is not dependency preserving, and l Efficient checking for FD violation on updates is important n Solution: define a weaker normal form, called Third Normal Form (3NF) l Allows some redundancy (with resultant problems) l But FDs can be checked on individual relations without computing a join. l There is always a lossless-join, dependency-preserving decomposition into 3NF. Database System Concepts - 6th Edition 8.33 ©Silberschatz, Korth and Sudarshan Third Normal Form n A relation schema R is in third normal form (3NF) if for all: a ® b in F+ at least one of the following holds: l a ® b is trivial (i.e., b Î a) l a is a superkey for R l Each attribute A in b – a is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) n If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold). Database System Concepts - 6th Edition 8.34 ©Silberschatz, Korth and Sudarshan 3NF Example n Example: R = (J, K, L) F = {JK ® L L®K} Two candidate keys = JK and JL l l R is in 3NF 4 JK ® L : JK is a superkey 4 L ® K : K is contained in a candidate key J L K j1 l1 k1 j2 l1 k1 j3 l1 k1 null l2 k2 But R is not in BCNF (L ® K : nontrivial, L is not a superkey) n There is some redundancy in this schema l Repetition of information 4 l e.g., the relationship l1, k1 Need to use null values 4 e.g., to represent the relationship l2, k2 where there is no corresponding value for J). Database System Concepts - 6th Edition 8.35 ©Silberschatz, Korth and Sudarshan Testing for 3NF n Need to check only FDs in F (not all FDs in F+) n For each dependency a ® b, l Check if a is a superkey (using attribute closure) n If a is not a superkey, l We have to verify if each attribute in b is contained in a candidate key of R l This test is rather more expensive, since it involves finding candidate keys l Testing for 3NF has been shown to be NP-hard n Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time Database System Concepts - 6th Edition 8.36 ©Silberschatz, Korth and Sudarshan 3NF Decomposition Algorithm Let Fc be a canonical cover for F; i := 0; for each functional dependency a ® b in Fc do i := i + 1; Ri := a b if none of the schemas Rj, 1 £ j £ i contains a candidate key for R then begin i := i + 1; Ri := any candidate key for R; end /* Optionally, remove redundant relations */ repeat if any schema Rj is contained in another schema Rk then /* delete Rj */ Rj = Ri;; i = i - 1; return (R1, R2, ..., Ri) Database System Concepts - 6th Edition 8.37 ©Silberschatz, Korth and Sudarshan 3NF Decomposition: An Example n Relation schema: cust_banker_branch = (customer_id, employee_id, branch_name, type ) n The functional dependencies for this relation schema are: 1. customer_id, employee_id ® branch_name, type 2. employee_id ® branch_name 3. customer_id, branch_name ® employee_id n We first compute a canonical cover l branch_name is extraneous in the r.h.s. of the 1st dependency l No other attribute is extraneous, so we get FC = customer_id, employee_id ® type employee_id ® branch_name customer_id, branch_name ® employee_id Database System Concepts - 6th Edition 8.38 ©Silberschatz, Korth and Sudarshan 3NF Decompsition Example (Cont.) n The for loop generates following 3NF schema: (customer_id, employee_id, type) (employee_id, branch_name) (customer_id, branch_name, employee_id) l Observe that (customer_id, employee_id, type) contains a candidate key of the original schema, so no further relation schema needs be added n At end of for loop, detect and delete schemas, such as (employee_id, branch_name), which are subsets of other schemas n The resultant simplified 3NF schema is: (customer_id, employee_id, type) (customer_id, branch_name, employee_id) Database System Concepts - 6th Edition 8.39 ©Silberschatz, Korth and Sudarshan Comparison of BCNF and 3NF n It is always possible to decompose a relation into a set of relations that are in 3NF such that: l The decomposition is lossless l The dependencies are preserved n It is always possible to decompose a relation into a set of relations that are in BCNF such that: l The decomposition is lossless l It may not be possible to preserve dependencies Database System Concepts - 6th Edition 8.40 ©Silberschatz, Korth and Sudarshan Overall Database Design Process n We have assumed schema R is given l R could have been generated when converting E-R diagram to a set of tables. l R could have been a single relation containing all attributes that are of interest (called universal relation). 4 l Normalization breaks R into smaller relations R could have been the result of some ad hoc design of relations, which we then test/convert to normal form. Database System Concepts - 6th Edition 8.41 ©Silberschatz, Korth and Sudarshan Denormalization for Performance n May want to use non-normalized schema for performance l For example, displaying prereqs along with course_id, and title requires join of course with prereq n Alternative 1: Use denormalized relation containing attributes of course as well as prereq with all above attributes l faster lookup l extra space and extra execution time for updates l extra coding work for programmer and possibility of error in extra code n Alternative 2: use a materialized view defined as course l prereq Benefits and drawbacks same as above, except no extra coding work for programmer and avoids possible errors Database System Concepts - 6th Edition 8.42 ©Silberschatz, Korth and Sudarshan Other Design Issues n Some aspects of database design are not caught by normalization n Examples of bad database design, to be avoided: Instead of earnings (company_id, year, amount), use l earnings_2004, earnings_2005, earnings_2006, etc., all on the schema (company_id, earnings). 4 l Above are in BCNF, but make querying across years difficult and needs new table each year company_year (company_id, earnings_2004, earnings_2005, earnings_2006) 4 Also in BCNF, but also makes querying across years difficult and requires new attribute each year 4 Is an example of a crosstab, where values for one attribute become column names – used in spreadsheets, and in data analysis tools Database System Concepts - 6th Edition 8.43 ©Silberschatz, Korth and Sudarshan End of Chapter 8 Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use