11/17/2010, ITCS 3160, Jing Yang, Fall 2010
Class 16: Basics of Functional Dependencies and Normalization for Relational Databases (ch. 15)

Student Relation
• Is this a good db design?
• Can you suggest a better design?

Informal Design Guidelines for Relation Schemas
• Measures of quality
  – Making sure attribute semantics are clear
  – Reducing redundant information in tuples
  – Reducing NULL values in tuples
  – Disallowing possibility of generating spurious tuples

Imparting Clear Semantics to Attributes in Relations
• Semantics of a relation
  – Meaning resulting from interpretation of attribute values in a tuple
• Easier to explain semantics of relation
  – Indicates better schema design

Guideline 1
• Design relation schema so that it is easy to explain its meaning
• Do not combine attributes from multiple entity types and relationship types into a single relation

Redundant Information in Tuples and Update Anomalies
• Redundancy is at the root of several problems associated with relational schemas:
  – redundant storage, update anomalies
• Storing natural joins of base relations leads to update anomalies
• Types of update anomalies:
  – Insertion
  – Deletion
  – Modification

• What if you want to add a tuple with advisor id 440?
• What if you delete the first and the last tuples?
• What if you want to update advisor_office for one advisor?
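
The three questions above can be made concrete with a small Python sketch. The slide's table is not reproduced here, so every row below is hypothetical, and the column names `sid`, `name`, `advisor_id`, and `advisor_office` are assumptions based on the wording of the questions. The sketch shows one way each anomaly can appear when student and advisor facts share a single relation, and how a decomposed design avoids them.

```python
# Hypothetical single-table design: student and advisor facts stored together.
# All rows and column names are illustrative assumptions, not the slide's data.
students = [
    {"sid": 1, "name": "Ann", "advisor_id": 110, "advisor_office": "W301"},
    {"sid": 2, "name": "Bob", "advisor_id": 110, "advisor_office": "W301"},
    {"sid": 3, "name": "Cid", "advisor_id": 220, "advisor_office": "W415"},
]


def offices_by_advisor(rows):
    """Map each advisor_id to the set of offices recorded for that advisor."""
    offices = {}
    for row in rows:
        offices.setdefault(row["advisor_id"], set()).add(row["advisor_office"])
    return offices


# Modification anomaly: updating the office in only one of advisor 110's rows
# leaves two conflicting offices recorded for the same advisor.
students[0]["advisor_office"] = "W999"
inconsistent = {a for a, o in offices_by_advisor(students).items() if len(o) > 1}

# Deletion anomaly: removing advisor 220's only student also removes
# every trace of advisor 220 and that advisor's office.
remaining = [r for r in students if r["sid"] != 3]
lost_advisors = set(offices_by_advisor(students)) - set(offices_by_advisor(remaining))

# Insertion anomaly: advisor 440 has no students yet, so a row for that
# advisor needs NULLs (None) in the student columns, including the key.
placeholder = {"sid": None, "name": None, "advisor_id": 440, "advisor_office": "W120"}

# Decomposed design: one entry per advisor, so each office is stored once.
advisors = {110: "W301", 220: "W415", 440: "W120"}
advising = [(1, 110), (2, 110), (3, 220)]  # (sid, advisor_id) pairs
advisors[110] = "W999"  # a single update, with no second copy to forget

print(inconsistent)
print(lost_advisors)
```

In the decomposed version, advisor 440 can be recorded without any student, deleting a student leaves the advisor entry intact, and an office change touches exactly one value.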

Guideline 2
• Design base relation schemas so that no update anomalies are present in the relations
• If any anomalies are present:
  – Note them clearly
  – Make sure that the programs that update the database will operate correctly

NULL Values in Tuples
• May group many attributes together into a “fat” relation
  – Can end up with many NULLs
• Problems with NULLs
  – Wasted storage space
  – Problems understanding meaning

Guideline 3
• Avoid placing attributes in a base relation whose values may frequently be NULL
• If NULLs are unavoidable:
  – Make sure that they apply in exceptional cases only, not to a majority of tuples

Generation of Spurious Tuples
• Spurious tuples: meaningless tuples produced by a natural join

Guideline 4
• Design relation schemas to be joined with equality conditions on attributes that are appropriately related
  – Guarantees that no spurious tuples are generated
• Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations

Discussion
• Anomalies cause redundant work to be done
• Waste of storage space due to NULLs
• Generation of invalid and spurious data during joins

Normalization
• ideas → E/R → relations → better (normalized) relations
• Normalization: process of making relations better by decomposing them into smaller relations to
  – reduce redundancy
  – eliminate update anomalies
• Final goal: all relations in Boyce-Codd Normal Form (BCNF)
• Analytic tool: Functional Dependency

Functional Dependency (cont’d)
• R is a relation, A and B are its attributes. R.A → R.B if and only if for each value of A no more than one value of B is associated.
  Namely, if t1 and t2 are two tuples in the relation R and t1(A) = t2(A), then we must have t1(B) = t2(B).

Student Relation

Functional Dependency (cont’d)
• R is a relation, R.A → R.B
• A and B can be sets of attributes

Functional Dependency (cont’d.)
• Given a populated relation
  – Cannot determine which FDs hold and which do not
  – Unless the meaning of and relationships among the attributes are known
  – Can state that an FD does not hold if there are tuples that show a violation of that FD

Normalization of Relations
• Assumption: a set of FDs is given for each relation; each relation has a designated primary key
• Takes a relation schema through a series of tests
  – Certify whether it satisfies a certain normal form
  – Proceeds in a top-down fashion
• Normal form tests

Normalization of Relations (cont’d.)
• Properties that the relational schemas should have:
  – Nonadditive join property
    • Must be achieved at any cost
  – Dependency preservation property
    • Desirable

Practical Use of Normal Forms
• Normalization carried out in practice
  – Resulting designs are of high quality and meet the desirable properties stated previously
  – Pays particular attention to normalization only up to 3NF, BCNF, or at most 4NF
• Do not need to normalize to the highest possible normal form

Definitions of Keys and Attributes Participating in Keys
• Definition of superkey and key (a key has to be minimal)
• Candidate key
  – If there is more than one key in a relation schema
    • One is the primary key
    • Others are secondary keys

First Normal Form
• Part of the formal definition of a relation in the basic (flat) relational model
• Only attribute values permitted are single atomic (or indivisible) values

Practice

First Normal Form (cont’d.)
• Techniques to achieve first normal form
  – Remove attribute and place in separate relation
  – Expand the key
  – Use several atomic attributes
• Question: Which approach is the best?

First Normal Form (cont’d.)
• Does not allow nested relations
• To change to 1NF:
  – Remove nested relation attributes into a new relation
  – Propagate the primary key into it
  – Unnest relation into a set of 1NF relations

• employee (man#, name, birthdate)
• jobhistory (man#, jobdate, title)
• salaryhistory (man#, jobdate, salarydate, salary)
• children (man#, childname, birthyear)

Second Normal Form
• Based on concept of full functional dependency
  – Versus partial dependency
• Consider a new column called “Employ_Name” added to this table

Second Normal Form
• Practice: second normalize the previous table with the “Employ_Name” attribute
• Second normalize into a number of 2NF relations
  – Nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent

Practice

Third Normal Form
• Based on concept of transitive dependency
  – X → Y, Y → Z, and Y is neither a candidate key nor a subset of any key
• Example:

• Problematic FD
  – Left-hand side is part of primary key
  – Left-hand side is a nonkey attribute

Practice

General Definition of Third Normal Form
• A functional dependency X → Y is trivial if Y is a subset of X

Boyce-Codd Normal Form
• Every relation in BCNF is also in 3NF
  – Relation in 3NF is not necessarily in BCNF
• Difference:
  – Condition which allows A to be prime is absent from BCNF
• Most relation schemas that are in 3NF are also in BCNF

Example

Interesting Example
• Is this relation in BCNF?
• Is there any redundancy?
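
A BCNF test of the kind asked for above can be mechanized with attribute closure: a relation is in BCNF when every nontrivial FD has a superkey on its left-hand side. The sketch below is a minimal illustration, not the textbook's algorithm as stated. The function names `closure` and `bcnf_violations` and the relation R(A, B, C) with FDs {A, B} → C and C → B are hypothetical choices made for this example. It checks only the FDs that are given, which is enough for testing the original relation but not, in general, for relations produced by a decomposition.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the given nontrivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_superkey = closure(lhs, fds) >= schema
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation R(A, B, C) with FDs {A, B} -> C and C -> B.
schema = {"A", "B", "C"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]

print(closure({"A", "B"}, fds))      # all of R, so {A, B} is a superkey
print(bcnf_violations(schema, fds))  # C -> B is reported: C is not a superkey
```

In this hypothetical R, the attribute B is prime (it belongs to the key {A, B}), so C → B is allowed by 3NF but rejected by BCNF, which is exactly the difference noted on the Boyce-Codd Normal Form slide.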

Multivalued Dependency
• Ename →→ Pname
• Ename →→ Dname

Fourth Normal Form
• Fourth normal form (4NF)
  – Violated when a relation has undesirable multivalued dependencies

Fifth Normal Form
• A table is in the fifth normal form (5NF) if it cannot have a lossless decomposition into any number of smaller tables.

Practice
• CAR_SALE (CarID, Option_type, Option_Listprice, Sale_date, Discounted_price)
• CarID → Sale_date
• Option_type → Option_Listprice
• CarID, Option_type → Discounted_price

Practice
• Student (Sid, Name, Age, Did, Dname)

Practice
• Car_Sale (Car#, Date_sold, Salesperson#, Commission%, Discount_amt)
• Date_sold → Discount_amt
• Salesperson# → Commission%

Homework 3
• Textbook: 15.30, 15.31

Normalize the following tables
• L_C (Loan_no, Amount, Type, Ssn, Phone, Name, Addr)

• Author (AID, Aname)
• Book (Bid, Btitle, Pid, Year)
• Publisher (Pid, Pname, Paddress)
• Author_Book (AID, Bid)

Queries:
1. Find the title and year of all books published by “Addison Wesley” (a publisher)
2. Find all authors who have published any books with “Addison Wesley”
3. How many books were published by “Addison Wesley” in 2009?
4. Print the name of each author and the number of books that author has published.
5. Which publisher published the largest number of books?
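
One possible way to try the five queries is sketched below using Python's standard `sqlite3` module and an in-memory database. The table and column names follow the schema on the slide. The primary and foreign key declarations, the column types, and every sample row are assumptions added for illustration. Query 5 returns a single publisher and does not handle ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publisher (Pid INTEGER PRIMARY KEY, Pname TEXT, Paddress TEXT);
CREATE TABLE Author (AID INTEGER PRIMARY KEY, Aname TEXT);
CREATE TABLE Book (Bid INTEGER PRIMARY KEY, Btitle TEXT,
                   Pid INTEGER REFERENCES Publisher(Pid), Year INTEGER);
CREATE TABLE Author_Book (AID INTEGER REFERENCES Author(AID),
                          Bid INTEGER REFERENCES Book(Bid),
                          PRIMARY KEY (AID, Bid));
-- Hypothetical sample rows, invented for illustration only.
INSERT INTO Publisher VALUES (1, 'Addison Wesley', 'Boston');
INSERT INTO Publisher VALUES (2, 'Other Press', 'New York');
INSERT INTO Author VALUES (10, 'Author X');
INSERT INTO Author VALUES (11, 'Author Y');
INSERT INTO Book VALUES (100, 'Book One', 1, 2009);
INSERT INTO Book VALUES (101, 'Book Two', 1, 2010);
INSERT INTO Book VALUES (102, 'Book Three', 2, 2009);
INSERT INTO Author_Book VALUES (10, 100);
INSERT INTO Author_Book VALUES (10, 101);
INSERT INTO Author_Book VALUES (11, 101);
INSERT INTO Author_Book VALUES (11, 102);
""")

# 1. Title and year of all books published by Addison Wesley.
q1 = conn.execute("""
    SELECT B.Btitle, B.Year
    FROM Book B JOIN Publisher P ON B.Pid = P.Pid
    WHERE P.Pname = 'Addison Wesley'
    ORDER BY B.Bid""").fetchall()

# 2. Authors with at least one book published by Addison Wesley.
q2 = conn.execute("""
    SELECT DISTINCT A.Aname
    FROM Author A
    JOIN Author_Book AB ON A.AID = AB.AID
    JOIN Book B ON AB.Bid = B.Bid
    JOIN Publisher P ON B.Pid = P.Pid
    WHERE P.Pname = 'Addison Wesley'
    ORDER BY A.Aname""").fetchall()

# 3. Number of books published by Addison Wesley in 2009.
q3 = conn.execute("""
    SELECT COUNT(*)
    FROM Book B JOIN Publisher P ON B.Pid = P.Pid
    WHERE P.Pname = 'Addison Wesley' AND B.Year = 2009""").fetchone()[0]

# 4. Each author's name with that author's book count
#    (LEFT JOIN keeps authors who have no books, with a count of 0).
q4 = conn.execute("""
    SELECT A.Aname, COUNT(AB.Bid)
    FROM Author A LEFT JOIN Author_Book AB ON A.AID = AB.AID
    GROUP BY A.AID, A.Aname
    ORDER BY A.AID""").fetchall()

# 5. The publisher with the most books (ties are not handled here).
q5 = conn.execute("""
    SELECT P.Pname, COUNT(*) AS n
    FROM Publisher P JOIN Book B ON B.Pid = P.Pid
    GROUP BY P.Pid, P.Pname
    ORDER BY n DESC
    LIMIT 1""").fetchone()

for label, result in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4), ("Q5", q5)]:
    print(label, result)
```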