Lecture 06: The Relational Data Model

Outline
• Relational Data Model
• Functional Dependencies
• Logical Schema Design
Reading: Chapter 8

The Relational Data Model
• Data Modeling: E/R diagrams (have seen this too)
• Relational Schema: tables, with column names = attributes and rows = tuples (have seen this in SQL)
• Physical storage: complex file organization and index structures (discuss next)

Terminology
Table name or relation name: Products
Attribute names: Name, Price, Category, Manufacturer
Tuples or rows or records:

Name         Price    Category     Manufacturer
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $149.99  photography  Canon
MultiTouch   $203.99  household    Hitachi

Schemas
Relational Schema:
– Relation name plus attribute names
– E.g. Product(Name, Price, Category, Manufacturer)
– In practice we add the domain for each attribute
Database Schema:
– Set of relational schemas
– E.g. Product(Name, Price, Category, Manufacturer), Company(Name, Address, Phone), ...

Instances
• Relational schema = R(A1, …, Ak)
  Instance = relation with k attributes (of “type” R)
  – values of corresponding domains
• Database schema = R1(…), R2(…), …, Rn(…)
  Instance = n relations, of types R1, R2, ..., Rn
This is all mathematics, not to be confused with SQL tables! (What's the difference?)
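One difference hinted at by the question above is that a mathematical relation is a set of tuples, so it has no duplicates and no row order, whereas a SQL table without a key can hold the same row twice. The sketch below illustrates this with the Product instance. The variable names are illustrative only, and modelling a SQL table as a Python list is a simplifying assumption.

```python
# Schema: relation name plus attribute names (domains omitted in this sketch).
schema = ("Name", "Price", "Category", "Manufacturer")

# Instance: a SET of tuples, one value per attribute.
product = {
    ("gizmo", 19.99, "gadgets", "GizmoWorks"),
    ("Power gizmo", 29.99, "gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "photography", "Canon"),
    ("MultiTouch", 203.99, "household", "Hitachi"),
}

# Adding a tuple that is already present changes nothing in a set.
product_again = product | {("gizmo", 19.99, "gadgets", "GizmoWorks")}

# A list of rows (a bag) keeps the duplicate, as a keyless SQL table would.
sql_like_rows = list(product) + [("gizmo", 19.99, "gadgets", "GizmoWorks")]

print(len(product_again), len(sql_like_rows))
```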
Example
Relational schema: Product(Name, Price, Category, Manufacturer)
Instance:

Name         Price    Category     Manufacturer
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $149.99  photography  Canon
MultiTouch   $203.99  household    Hitachi

First Normal Form (1NF)
• A database schema is in First Normal Form if all tables are flat

Not flat: Student(Name, GPA, Courses), where Courses is a nested table of course names for each student.

Flat:

Student            Takes              Course
Name   GPA         Student  Course    Course
Alice  3.8         Alice    Math      Math
Bob    3.7         Carol    Math      DB
Carol  3.9         Alice    DB        OS
                   Bob      DB
                   Alice    OS
                   Carol    OS

May need to add keys

Functional Dependencies
• A form of constraint
  – hence, part of the schema
• Finding them is part of the database design
• Also used in normalizing the relations
• Warning: this is the most abstract, and “hardest” part of the course.

Functional Dependencies
Definition: If two tuples agree on the attributes A1, A2, …, An, then they must also agree on the attributes B1, B2, …, Bm
Formally: A1, A2, …, An -> B1, B2, …, Bm

Examples

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   Lawyer

• EmpID -> Name, Phone, Position
• Position -> Phone
• but not Phone -> Position

In General
• To check A -> B, erase all other columns

  …  A   …  B
     X1     Y1
     X2     Y2
     …      …

• Check if the remaining relation is many-one (called functional in mathematics)

Example
Erasing all columns except Position and Phone in the table above leaves:

Position  Phone
Clerk     1234
Salesrep  9876
Salesrep  9876
Lawyer    1234

Position -> Phone

Typical Examples of FDs
Product: name -> price, manufacturer
Person: ssn -> name, age
Company: name -> stockprice, president

Example
Product(name, category, color, department, price)
Consider these FDs:
  name -> color
  category -> department
  color, category -> price
What do they say?
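The "erase all other columns and check that what remains is many-one" test can be sketched in a few lines. This is a minimal illustration that assumes an instance is stored as a list of dictionaries; the function name `satisfies_fd` is hypothetical. It tells us only whether one given instance satisfies an FD, not whether the FD is a constraint of the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the instance `rows` satisfies the FD lhs -> rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # Two tuples agree on lhs but disagree on rhs: the FD is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True

# The employee instance from the Examples slide.
employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": 1234, "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": 1234, "Position": "Lawyer"},
]

print(satisfies_fd(employees, ["EmpID"], ["Name", "Phone", "Position"]))
print(satisfies_fd(employees, ["Position"], ["Phone"]))
print(satisfies_fd(employees, ["Phone"], ["Position"]))
```

Phone -> Position fails because phone 1234 appears with both Clerk and Lawyer.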
Example
FD’s are constraints on relations:
• On some instances they hold
• On others they don’t

  name -> color
  category -> department
  color, category -> price

name     category  color  department  price
Gizmo    Gadget    Green  Toys        49
Tweaker  Gadget    Green  Toys        99

Does this instance satisfy all the FDs?

Example
  name -> color
  category -> department
  color, category -> price

name     category    color  department    price
Gizmo    Gadget      Green  Toys          49
Tweaker  Gadget      Black  Toys          99
Gizmo    Stationery  Green  Office-supp.  59

What about this one?

Example
If some FDs are satisfied, then others are satisfied too.
If all these FDs are true:
  name -> color
  category -> department
  color, category -> price
Then this FD also holds:
  name, category -> price
Why?

Inference Rules for FD’s
Splitting rule and Combining rule:
  A1, A2, …, An -> B1, B2, …, Bm
is equivalent to
  A1, A2, …, An -> B1
  A1, A2, …, An -> B2
  .....
  A1, A2, …, An -> Bm

Inference Rules for FD’s (continued)
Trivial Rule:
  A1, A2, …, An -> Ai    where i = 1, 2, ..., n
Why?

Inference Rules for FD’s (continued)
Transitive Closure Rule:
If   A1, A2, …, An -> B1, B2, …, Bm
and  B1, B2, …, Bm -> C1, C2, …, Cp
then A1, A2, …, An -> C1, C2, …, Cp
Why?
[Figure: a table with column groups A1 … Am, B1 … Bm, C1 … Cp]

Example (continued)
Start from the following FDs:
1. name -> color
2. category -> department
3. color, category -> price
Infer the following FDs. Which rule did we apply for each?
4. name, category -> name
5. name, category -> color
6. name, category -> category
7. name, category -> color, category
8. name, category -> price

Example (continued)
Answers:

Inferred FD                            Which rule did we apply?
4. name, category -> name              Trivial rule
5. name, category -> color             Transitivity on 4, 1
6. name, category -> category          Trivial rule
7. name, category -> color, category   Split/combine on 5, 6
8. name, category -> price             Transitivity on 3, 7

Another Example
• Enrollment(student, major, course, room, time)
  student -> major
  major, course -> room
  course -> time
What else can we infer?
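One possible answer to the Enrollment question, worked out with only the trivial, split/combine, and transitivity rules introduced so far. The step labels (a) to (g) are ours, not from the slides, and other derivations are possible.

```latex
\begin{align*}
\text{(a)}\quad & \mathit{student}, \mathit{course} \to \mathit{student}
  && \text{trivial rule} \\
\text{(b)}\quad & \mathit{student}, \mathit{course} \to \mathit{major}
  && \text{transitivity on (a) and } \mathit{student} \to \mathit{major} \\
\text{(c)}\quad & \mathit{student}, \mathit{course} \to \mathit{course}
  && \text{trivial rule} \\
\text{(d)}\quad & \mathit{student}, \mathit{course} \to \mathit{major}, \mathit{course}
  && \text{combine (b), (c)} \\
\text{(e)}\quad & \mathit{student}, \mathit{course} \to \mathit{room}
  && \text{transitivity on (d) and } \mathit{major}, \mathit{course} \to \mathit{room} \\
\text{(f)}\quad & \mathit{student}, \mathit{course} \to \mathit{time}
  && \text{transitivity on (c) and } \mathit{course} \to \mathit{time} \\
\text{(g)}\quad & \mathit{student}, \mathit{course} \to \mathit{major}, \mathit{room}, \mathit{time}
  && \text{combine (b), (e), (f)}
\end{align*}
```

So student and course together determine every other attribute of Enrollment.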
Another Rule
Augmentation:
If   A1, A2, …, An -> B
then A1, A2, …, An, C1, C2, …, Cp -> B
Augmentation follows from the trivial rule and transitivity. How?

Problem: infer ALL FDs
Given a set of FDs, infer all possible FDs.
How to proceed?
• Try all possible FDs, apply all 3 rules
  – E.g. R(A, B, C, D): how many FDs are possible?
• Drop trivial FDs, drop augmented FDs
  – Still way too many
• Better: use the Closure Algorithm (next)

Closure of a set of Attributes
Given a set of attributes A1, …, An
The closure, {A1, …, An}+, is the set of attributes B s.t. A1, …, An -> B
Example:
  name -> color
  category -> department
  color, category -> price
Closures:
  name+ = {name, color}
  {name, category}+ = {name, category, color, department, price}
  color+ = {color}

Closure Algorithm
Start with X = {A1, …, An}.
Repeat until X doesn’t change:
  if B1, …, Bn -> C is an FD and B1, …, Bn are all in X
  then add C to X.
Example:
  name -> color
  category -> department
  color, category -> price
  {name, category}+ = {name, category, color, department, price}

Example
R(A, B, C, D, E, F)
  A, B -> C
  A, D -> E
  B -> D
  A, F -> B
Compute {A, B}+:  X = {A, B,        }
Compute {A, F}+:  X = {A, F,        }

Using Closure to Infer ALL FDs
Example:
  A, B -> C
  A, D -> B
  B -> D
Step 1: Compute X+, for every X:
  A+ = A, B+ = BD, C+ = C, D+ = D
  AB+ = ABCD, AC+ = AC, AD+ = ABCD
  ABC+ = ABD+ = ACD+ = ABCD (no need to compute: why?)
  BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X -> Y, s.t. Y ⊆ X+ and X ∩ Y = ∅:
  AB -> CD, AD -> BC, ABC -> D, ABD -> C, ACD -> B

Problem: Finding FDs
• Approach 1: During Database Design
  – Designer derives them from real-world knowledge of users
  – Problem: knowledge might not be available
• Approach 2: From a Database Instance
  – Analyze given database instance and find all FD’s satisfied by that instance
  – Useful if designers don’t get enough information from users
  – Problem: FDs might be artificial for the given instance

Find All FDs

Student  Dept  Course    Room
Alice    CSE   C++       020
Bob      CSE   C++       020
Alice    EE    HW        040
Carol    CSE   DB        045
Dan      CSE   Java      050
Elsa     CSE   DB        045
Frank    EE    Circuits  020

Do all FDs make sense in practice?

Answer
  Course -> Dept, Room
  Dept, Room -> Course
  Student, Dept -> Course, Room
  Student, Course -> Dept, Room
  Student, Room -> Dept, Course
Do all FDs make sense in practice?

Keys
• A key is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An -> B
• A minimal key is a set of attributes which is a key and for which no subset is a key
• Note: book calls them superkey and key

Computing Keys
• Compute X+ for all sets X
• If X+ = all attributes, then X is a key
• List only the minimal keys
Note: there can be many minimal keys!
• Example: R(A, B, C), AB -> C, BC -> A
  Minimal keys: AB and BC

Examples of Keys
• Product(name, price, category, color)
    name, category -> price
    category -> color
  Keys are: {name, category} and all supersets
• Enrollment(student, address, course, room, time)
    student -> address
    room, time -> course
    student, course -> room, time
  Keys are:

Relational Schema Design (or Logical Schema Design)
Main idea:
• Start with some relational schema
• Find out its FD’s
• Use them to design a better relational schema

Data Anomalies
When a database is poorly designed we get anomalies:
• Redundancy: data is repeated
• Update anomalies: need to change in several places
• Delete anomalies: may lose data when we don’t want to

Relational Schema Design
Example: Persons with several phones

Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield

SSN -> Name, City    but not    SSN -> PhoneNumber

Anomalies:
• Redundancy = repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number: what is his city?

Relation Decomposition
Break the relation into two:

Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield

Name  SSN          City            SSN          PhoneNumber
Fred  123-45-6789  Seattle         123-45-6789  206-555-1234
Joe   987-65-4321  Westfield       123-45-6789  206-555-6543
                                   987-65-4321  908-555-2121

Anomalies have gone:
• No more repeated data
• Easy to move Fred to “Bellevue” (how?)
• Easy to delete all of Joe’s phone numbers (how?)
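The closure algorithm and the "compute X+ for all sets X" recipe for keys from the earlier slides can be sketched as follows. This is a brute-force illustration, not an efficient implementation: it is exponential in the number of attributes. The function names and the representation of an FD as a `(lhs, rhs)` pair of sets are assumptions made for this example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # If every attribute of lhs is in X, add the attributes of rhs to X.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_keys(all_attrs, fds):
    """Try attribute sets by increasing size; skip supersets of keys already found."""
    ordered = sorted(all_attrs)
    keys = []
    for size in range(1, len(ordered) + 1):
        for candidate in combinations(ordered, size):
            cand = set(candidate)
            if any(key <= cand for key in keys):
                continue  # a subset is already a key, so cand is not minimal
            if closure(cand, fds) == set(ordered):
                keys.append(cand)
    return keys

# FDs from the closure example slide.
product_fds = [({"name"}, {"color"}),
               ({"category"}, {"department"}),
               ({"color", "category"}, {"price"})]
print(closure({"name", "category"}, product_fds))

# R(A, B, C) with AB -> C and BC -> A.
r_fds = [({"A", "B"}, {"C"}), ({"B", "C"}, {"A"})]
print(minimal_keys({"A", "B", "C"}, r_fds))
```

Because candidates are tried smallest first, every key that is reported has no proper subset that is also a key.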
Relational Schema Design
Conceptual Model: [E/R diagram: Product (name, price), buys, Person (name, ssn)]
Relational Model: plus FD’s
Normalization: eliminates anomalies

Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
  R1(A1, ..., An, B1, ..., Bm)
  R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp

Decomposition
• Sometimes it is correct:

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

Name      Price        Name      Category
Gizmo     19.99        Gizmo     Gadget
OneClick  24.99        OneClick  Camera
Gizmo     19.99        Gizmo     Camera

Lossless decomposition

Incorrect Decomposition
• Sometimes it is not:

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

Name      Category     Price  Category
Gizmo     Gadget       19.99  Gadget
OneClick  Camera       24.99  Camera
Gizmo     Camera       19.99  Camera

What’s incorrect?
Lossy decomposition

Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
  R1(A1, ..., An, B1, ..., Bm)
  R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An -> B1, ..., Bm, then the decomposition is lossless.
Note: we do not necessarily need A1, ..., An -> C1, ..., Cp
Example: name -> price, hence the first decomposition is lossless

Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = this lecture
Boyce Codd Normal Form (BCNF) = this lecture
Others...

Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
  if A1, ..., An -> B is a non-trivial dependency in R, then {A1, ..., An} is a key for R
In English (though a bit vague):
  Whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
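The lossless and lossy decompositions shown earlier can be checked mechanically: project the instance onto the two attribute lists, take the natural join of the projections, and compare the result with the original. The sketch below does this for the Name/Price/Category instance. The helper names are hypothetical, relations are treated as sets of tuples, and the check covers only this one instance, whereas losslessness in general is a property guaranteed by the FDs.

```python
def project(rows, attrs):
    """Projection of a list of dict rows onto `attrs`, as a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two sets of tuples on their common attributes."""
    common = [a for a in attrs1 if a in attrs2]
    out_attrs = list(attrs1) + [a for a in attrs2 if a not in attrs1]
    joined = set()
    for t1 in r1:
        d1 = dict(zip(attrs1, t1))
        for t2 in r2:
            d2 = dict(zip(attrs2, t2))
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                joined.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, joined

def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly this instance."""
    out_attrs, joined = natural_join(project(rows, attrs1), attrs1,
                                     project(rows, attrs2), attrs2)
    return joined == project(rows, out_attrs)

products = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "Gizmo", "Price": 19.99, "Category": "Camera"},
]

print(is_lossless_on(products, ["Name", "Price"], ["Name", "Category"]))
print(is_lossless_on(products, ["Name", "Category"], ["Price", "Category"]))
```

In the second decomposition the join on Category pairs every Camera product with every Camera price, so it produces tuples such as (Gizmo, Camera, 24.99) that were never in the original relation.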
BCNF Decomposition Algorithm
Repeat
  choose A1, …, Am -> B1, …, Bn that violates the BCNF condition
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
Until no more violations
[Figure: R1 covers the A’s and the B’s; R2 covers the A’s and the others]
Is there a 2-attribute relation that is not in BCNF?

Example

Name  SSN          PhoneNumber   City
Fred  123-45-6789  206-555-1234  Seattle
Fred  123-45-6789  206-555-6543  Seattle
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield

What are the dependencies? SSN -> Name, City
What are the keys? {SSN, PhoneNumber}
Is it in BCNF?

Decompose it into BCNF

Name  SSN          City            SSN          PhoneNumber
Fred  123-45-6789  Seattle         123-45-6789  206-555-1234
Joe   987-65-4321  Westfield       123-45-6789  206-555-6543
                                   987-65-4321  908-555-2121
                                   987-65-4321  908-555-1234

SSN -> Name, City
Let’s check anomalies:
• Redundancy?
• Update?
• Delete?

Summary of BCNF Decomposition
Find a dependency that violates the BCNF condition:
  A1, A2, …, An -> B1, B2, …, Bm
Heuristics: choose B1, B2, …, Bm “as large as possible”
Decompose: R1 gets the A’s and the B’s; R2 gets the A’s and the others.
Continue until there are no BCNF violations left.
2-attribute relations are BCNF

Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
  SSN -> name, age
  age -> hairColor
Decompose in BCNF (in class):
Step 1: find all keys (How? Compute S+, for various sets S)
Step 2: now decompose

Other Example
• R(A, B, C, D)    A -> B, B -> C
• Key: AD
• Violations of BCNF: A -> B, A -> C, A -> BC
• Pick A -> BC: split into R1(A, B, C), R2(A, D)
• What happens if we pick A -> B first?

Lossless Decompositions
A decomposition is lossless if we can recover:
  R(A, B, C)
  Decompose: R1(A, B), R2(A, C)
  Recover: R’(A, B, C) should be the same as R(A, B, C)
R’ is in general larger than R. Must ensure R’ = R

Lossless Decompositions
• Given R(A, B, C) s.t. A -> B, the decomposition into R1(A, B), R2(A, C) is lossless

3NF: A Problem with BCNF
R(Unit, Company, Product)
FD’s: Unit -> Company; Company, Product -> Unit
So, there is a BCNF violation, and we decompose.
R1(Unit, Company)    FD: Unit -> Company
R2(Unit, Product)    No FDs
Notice: we lose the FD: Company, Product -> Unit

So What’s the Problem?

Unit      Company        Unit      Product
Galaga99  UW             Galaga99  databases
Bingo     UW             Bingo     databases

No problem so far. All local FD’s are satisfied.
Let’s put all the data back into a single table again (anomalies?):

Unit      Company  Product
Galaga99  UW       databases
Bingo     UW       databases

Violates the dependency: Company, Product -> Unit!

Solution: 3rd Normal Form (3NF)
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if:
  whenever there is a nontrivial dependency A1, A2, ..., An -> B for R, then {A1, A2, ..., An} is a key for R, or B is part of a key.
Tradeoff:
  BCNF = no anomalies, but may lose some FDs
  3NF = keeps all FDs, but may have some anomalies
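The BCNF decomposition algorithm from the earlier slides can be sketched as a short recursive procedure. This is one possible reading under stated assumptions, not the course's reference implementation: the function names are hypothetical, violations are found by brute force over attribute subsets, and the "B’s as large as possible" heuristic is implemented by taking the whole closure X+ (restricted to the relation) as the first fragment.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, X+ within rel) where X determines something new but is not a key of rel."""
    for size in range(1, len(rel)):
        for candidate in combinations(sorted(rel), size):
            x = set(candidate)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(rel, fds):
    """Split `rel` until no fragment has a BCNF violation; returns a list of attribute sets."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus              # the A's and the B's
    r2 = x | (rel - x_plus)  # the A's and the others
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

# R(A, B, C, D) with A -> B, B -> C.
abcd = bcnf_decompose({"A", "B", "C", "D"}, [({"A"}, {"B"}), ({"B"}, {"C"})])
print(abcd)

# R(Unit, Company, Product) with Unit -> Company; Company, Product -> Unit.
ucp = bcnf_decompose(
    {"Unit", "Company", "Product"},
    [({"Unit"}, {"Company"}), ({"Company", "Product"}, {"Unit"})],
)
print(ucp)
```

On R(A, B, C, D) the sketch first splits off (A, B, C) and (A, D), then splits (A, B, C) again because B -> C still violates BCNF there. On the Unit/Company/Product relation it returns the two fragments (Unit, Company) and (Unit, Product), and neither fragment contains all three attributes of Company, Product -> Unit, which is the lost dependency that motivates 3NF.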