Lecture 09: Functional Dependencies, Database Design
Monday, October 18, 2004

Outline
• Functional dependencies (3.4)
• Rules about FDs (3.5)
• Design of a Relational schema (3.6)

Functional Dependencies
Definition: A1, ..., Am → B1, ..., Bn holds in R if:
∀t, t’ ∈ R, (t.A1 = t’.A1 ∧ ... ∧ t.Am = t’.Am ⇒ t.B1 = t’.B1 ∧ ... ∧ t.Bn = t’.Bn)
[Figure: relation R with columns A1 ... Am and B1 ... Bn and two tuples t, t’: if t, t’ agree on A1 ... Am, then t, t’ agree on B1 ... Bn]

Review: Closure
Example: R(A,B,C,D,E,F)
A, B → C
A, D → E
B → D
A, F → B
Compute {A,B}+: X = {A, B,      }
Compute {A,F}+: X = {A, F,      }

Review: Inference
Example:
A, B → C
A, D → B
B → D
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute – why?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X → Y, s.t. Y ⊆ X+ and X ∩ Y = ∅:
AB → CD, AD → BC, ABC → D, ABD → C, ACD → B

Problem: find FDs from the data
• Given a database instance
• Find all FD’s satisfied by that instance
• Useful if we don’t get enough information from our users: need to reverse engineer a data instance

In Class: Find All FDs
Student  Dept  Course    Room
Alice    CSE   C++       020
Bob      CSE   C++       020
Alice    EE    HW        040
Carol    CSE   DB        045
Dan      CSE   Java      050
Elsa     CSE   DB        045
Frank    EE    Circuits  020
Do all FDs make sense in practice?

Answer
Course → Dept, Room
Dept, Room → Course
Student, Dept → Course, Room
Student, Course → Dept, Room
Student, Room → Dept, Course
Do all FDs make sense in practice?

Keys
• A superkey is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An → B
• A key is a minimal superkey
  – I.e. a set of attributes which is a superkey and for which no proper subset is a superkey

Computing (Super)Keys
• Compute X+ for all sets X
• If X+ = all attributes, then X is a superkey
• List only the minimal X’s: these are the keys
Note: there can be many keys!
• Example: R(A,B,C), AB → C, BC → A
  Keys: AB and BC

Examples of Keys
• Product(name, price, category, color)
  name, category → price
  category → color
  Key is: {name, category}; superkeys are all supersets
• Enrollment(student, address, course, room, time)
  student → address
  room, time → course
  student, course → room, time
  Keys are: [in class]

FD’s for E/R Diagrams
Given a relation constructed from an E/R diagram, what is its key?
Rule 1: If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.
[E/R diagram: entity set Person with attributes address, name, ssn]
Person(address, name, ssn)

FD’s for E/R Diagrams
Rule 2: If the relation comes from a many-many relationship, the key of the relation is the set of all key attributes of the relations corresponding to the entity sets.
[E/R diagram: Product (name, price) and Person (name, ssn) connected by the relationship buys (date)]
buys(name, ssn, date)

FD’s for E/R Diagrams
Except: if there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key.
[E/R diagram: relationship Purchase connecting Product (name), Store (sname), CreditCard (card-no), Person (ssn)]
Purchase(name, sname, ssn, card-no)

FD’s for E/R Diagrams
More rules:
• Many-one, one-many, one-one relationships
• Multi-way relationships
• Weak entity sets
(Try to find them yourself, or check book)

FD’s for E/R Diagrams
Say: “the CreditCard determines the Person”
[E/R diagram: relationship Purchase connecting Product (name), Store (sname), CreditCard (card-no), Person (ssn)]
Incomplete (what does it say?)
Purchase(name, sname, ssn, card-no)
card-no → ssn

Relational Schema Design (or Logical Design)
Main idea:
• Start with some relational schema
• Find out its FD’s
• Use them to design a better relational schema

Data Anomalies
When a database is poorly designed we get anomalies:
• Redundancy: data is repeated
• Update anomalies: need to change data in several places
• Delete anomalies: may lose data when we don’t want to

Relational Schema Design
Recall set attributes (persons with several phones):
Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield
SSN → Name, City but not SSN → PhoneNumber
Anomalies:
• Redundancy = repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number: what is his city?

Relation Decomposition
Break the relation into two:
Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield

Name  SSN          City
Fred  123-45-6789  Seattle
Joe   987-65-4321  Westfield

SSN          PhoneNumber
123-45-6789  206-555-1234
123-45-6789  206-555-6543
987-65-4321  908-555-2121

Anomalies have gone:
• No more repeated data
• Easy to move Fred to “Bellevue” (how?)
• Easy to delete all of Joe’s phone numbers (how?)
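
One way to see the two "(how?)" questions is to try them on the decomposed tables. The following is only a minimal sketch, assuming the relations are held in memory as Python sets of tuples; the helper names `move` and `drop_phones` are made up for illustration and are not part of the lecture.

```python
# The slide's table, held in memory (an assumption made for illustration).
rows = [
    ("Fred", "123-45-6789", "206-555-1234", "Seattle"),
    ("Fred", "123-45-6789", "206-555-6543", "Seattle"),
    ("Joe",  "987-65-4321", "908-555-2121", "Westfield"),
]

# Project onto (Name, SSN, City) and (SSN, PhoneNumber); sets drop duplicates.
person = {(name, ssn, city) for name, ssn, phone, city in rows}
phones = {(ssn, phone) for name, ssn, phone, city in rows}

def move(person, ssn, new_city):
    """Change one person's city; only that person's single tuple changes."""
    return {(n, s, new_city if s == ssn else c) for n, s, c in person}

def drop_phones(phones, ssn):
    """Delete all phone numbers of one person; the person tuple is untouched."""
    return {(s, p) for s, p in phones if s != ssn}

person_after_move = move(person, "123-45-6789", "Bellevue")
phones_after_delete = drop_phones(phones, "987-65-4321")
```

After `move`, Fred's city is stored in exactly one tuple, so there is nothing else to keep in sync. After `drop_phones`, Joe has no phone rows left, but his (Name, SSN, City) tuple still records Westfield.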
Relational Schema Design
Conceptual Model: [E/R diagram: Product (name, price) and Person (name, ssn) connected by the relationship buys]
Relational Model: plus FD’s
Normalization: Eliminates anomalies

Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)    R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp

Decomposition
• Sometimes it is correct:
Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

Name      Price
Gizmo     19.99
OneClick  24.99
Gizmo     19.99

Name      Category
Gizmo     Gadget
OneClick  Camera
Gizmo     Camera

Lossless decomposition

Incorrect Decomposition
• Sometimes it is not:
Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

Name      Category
Gizmo     Gadget
OneClick  Camera
Gizmo     Camera

Price  Category
19.99  Gadget
24.99  Camera
19.99  Camera

What’s incorrect??
Lossy decomposition

Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)    R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An → B1, ..., Bm
Then the decomposition is lossless
Note: don’t need A1, ..., An → C1, ..., Cp
Example: name → price, hence the first decomposition is lossless

Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = will discuss
Boyce Codd Normal Form (BCNF) = will discuss
Others...

Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
If A1, ..., An → B is a non-trivial dependency in R, then {A1, ..., An} is a superkey for R
In English (though a bit vague):
Whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
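
The lossless and lossy decompositions above can be checked mechanically on the Gizmo instance: project, take the natural join, and compare with the original. This is a sketch only; the helpers `project`, `natural_join` and `is_lossless_on` are illustrative names, and prices are stored as strings for simplicity. It tests a single instance, whereas the FD condition on the slide guarantees losslessness for every instance.

```python
def project(rows, attrs, cols):
    """Project tuples with schema attrs onto cols, as a set of tuples."""
    idx = [attrs.index(c) for c in cols]
    return {tuple(r[i] for i in idx) for r in rows}

def natural_join(r1, cols1, r2, cols2):
    """Natural join of two sets of tuples; returns (rows, schema)."""
    shared = [c for c in cols1 if c in cols2]
    extra = [c for c in cols2 if c not in cols1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[cols1.index(c)] == t2[cols2.index(c)] for c in shared):
                out.add(t1 + tuple(t2[cols2.index(c)] for c in extra))
    return out, cols1 + extra

def is_lossless_on(rows, attrs, cols1, cols2):
    """True if joining the two projections gives back exactly this instance."""
    joined, schema = natural_join(project(rows, attrs, cols1), cols1,
                                  project(rows, attrs, cols2), cols2)
    # Reorder the joined columns to the original schema before comparing.
    reordered = {tuple(t[schema.index(a)] for a in attrs) for t in joined}
    return reordered == set(rows)

attrs = ["Name", "Price", "Category"]
product = [
    ("Gizmo", "19.99", "Gadget"),
    ("OneClick", "24.99", "Camera"),
    ("Gizmo", "19.99", "Camera"),
]

first = is_lossless_on(product, attrs, ["Name", "Price"], ["Name", "Category"])
second = is_lossless_on(product, attrs, ["Name", "Category"], ["Price", "Category"])
print(first, second)  # prints: True False
```

In the second decomposition the join on Category pairs every Camera product with every Camera price, which produces tuples such as (OneClick, 19.99, Camera) that were never in the original relation.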
BCNF Decomposition Algorithm
Repeat
  choose A1, …, Am → B1, …, Bn that violates the BCNF condition
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
Until no more violations
[Figure: the attributes of R drawn as three blocks, B’s, A’s, Others; R1 covers the B’s and A’s, R2 covers the A’s and Others]
Is there a 2-attribute relation that is not in BCNF?

Example
Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield
What are the dependencies? SSN → Name, City
What are the keys? {SSN, PhoneNumber}
Is it in BCNF?

Decompose it into BCNF
Name  SSN          City
Fred  123-45-6789  Seattle
Joe   987-65-4321  Westfield

SSN          PhoneNumber
123-45-6789  206-555-1234
123-45-6789  206-555-6543
987-65-4321  908-555-2121
987-65-4321  908-555-1234

SSN → Name, City
Let’s check anomalies:
• Redundancy?
• Update?
• Delete?

Summary of BCNF Decomposition
Find a dependency that violates the BCNF condition:
A1, A2, …, An → B1, B2, …, Bm
Heuristics: choose B1, B2, …, Bm “as large as possible”
Decompose:
[Figure: the attributes of R drawn as three blocks, Others, A’s, B’s; R1 covers the A’s and B’s, R2 covers the Others and A’s]
Is there a 2-attribute relation that is not in BCNF?
Continue until there are no BCNF violations left.

BCNF Decomposition Hints
How you should proceed (HW, midterm, …)
• Iterate over every set X, from small to large
• Compute X+
• If X ≠ X+ ≠ [all attributes]:
  – Then it is a violation of BCNF (why?)
  – Decompose into X+ and X ∪ ([all attributes] - X+)
Iterating over the X’s in different orders may lead to different BCNF results, but it doesn’t matter as long as the result is in BCNF

Example
• R(A,B,C,D)
  A → B, B → C
• In R: A+ = ABC ≠ ABCD
  – Split into R1(A,B,C), R2(A,D)
• In R1: B+ = BC ≠ ABC
  – Split into R11(B,C), R12(A,B)
• Final decomposition: R11(B,C), R12(A,B), R2(A,D)
• What happens if in R we first pick B+?

Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Decompose in BCNF (in class):
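
The hints above (iterate over X from small to large, compute X+, split on a violation) can be sketched in a few lines. This is an illustrative sketch, not the course's reference solution: FDs are represented as pairs of Python sets, the function names are made up, and the second piece of each split keeps X together with the attributes outside X+, as in the R(A,B,C,D) example. FDs of a sub-relation are obtained by computing the closure under the original FDs and keeping only that sub-relation's attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure X+ of a set of attributes under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """First X (small to large) with X != X+ != rel, or None if rel is in BCNF."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel  # keep only this relation's attributes
            if x != x_plus and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(rel, fds):
    """Recursively split rel until no piece has a BCNF violation."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus                # X together with everything it determines
    r2 = x | (rel - x_plus)    # X together with the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

# The example R(A,B,C,D) with A -> B, B -> C from the slides.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
result = bcnf_decompose({"A", "B", "C", "D"}, fds)
print(["".join(sorted(r)) for r in result])  # prints: ['BC', 'AB', 'AD']
```

The output matches the final decomposition R11(B,C), R12(A,B), R2(A,D). Changing the order in which `find_violation` visits the sets X is one way to explore the question of what happens when B+ is picked first.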