R - SEAS - University of Pennsylvania

Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems March 16, 2016 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan Announcements  Homework 3 will be due Monday10/22  Fall break will be 10/19, midterm on 10/26 2 Armstrong’s Axioms: Inferring FDs Some FDs exist due to others; can compute using Armstrong’s axioms:  Reflexivity: If Y  X then X  Y name, sid  name (trivial dependencies)  Augmentation: If X  Y then XW  YW serno  subj so serno, exp-grade  subj, exp-grade  Transitivity: If X  Y and Y  Z then X  Z serno  cid and cid  subj so serno  subj 3 Armstrong’s Axioms Lead to… If X  Y and X  Z then X  YZ  Pseudotransitivity: If X  Y and WY  Z then XW  Z  Decomposition: If X  Y and Z  Y then X  Z  Union: Let’s prove a few of these from Armstrong’s Axioms 4 Closure of a Set of FD’s Defn. Let F be a set of FD’s. Its closure, F+, is the set of all FD’s: {X  Y | X  Y is derivable from F by Armstrong’s Axioms} Which of the following are in the closure of our Student-Course FD’s? StudentData(sid, name, serno, cid, subj, grade) name  name cid  subj serno  subj cid, sid  subj cid  sid 5 Attribute Closures: Is Something Dependent on X? Defn. The closure of an attribute set X, X+, is: X+ =  {Y | X  Y  F +}  This answers the question “is Y determined (transitively) by X?”; compute X+ by: closure := X; repeat until no change { if there is an FD U  V in F such that U is in closure then add V to closure}  Does sid, serno  subj, exp-grade? 6 Equivalence of FD sets Defn. Two sets of FD’s, F and G, are equivalent if their closures are equivalent, F + = G + e.g., these two sets are equivalent: {XY  Z, X  Y} and {X  Z, X  Y}  F + contains a huge number of FD’s (exponential in the size of the schema)  Would like to have smallest “representative” FD set 7 Minimal Cover we express Defn. A FD set F is minimal if: each FD in 1. Every FD in F is of the form X  A, simplest form where A is a single attribute 2. For no X  A in F is: in a sense, F – {X  A } equivalent to F each FD is “essential” 3. For no X  A in F and Z  X is: to the cover F – {X  A }  {Z  A } equivalent to F Defn. F is a minimum cover for G if F is minimal and is equivalent to G. e.g., {X  Z, X  Y} is a minimal cover for {XY  Z, X  Z, X  Y} 8 More on Closures If F is a set of FD’s and X  Y  F + then for some attribute A  Y, X  A  F + Proof by counterexample. Assume otherwise and let Y = {A1,..., An} Since we assume X  A1, ..., X  An are in F + then X  A1 ... An is in F + by union rule, hence, X  Y is in F + which is a contradiction 9 Why Armstrong’s Axioms? Why are Armstrong’s axioms (or an equivalent rule set) appropriate for FD’s? They are:  Consistent: any relation satisfying FD’s in F will satisfy those in F +  Complete: if an FD X  Y cannot be derived by Armstrong’s axioms from F, then there exists some relational instance satisfying F but not XY  In other words, Armstrong’s axioms derive all the FD’s that should hold  What is the goal of using these axioms? 10 Decomposition Consider our original “bad” attribute set Stuff(sid, name, serno, subj, cid, exp-grade) We could decompose it into: Student(sid, name) Course(serno, cid) Subject(cid, subj) But this decomposition loses information about the relationship between students and courses. Why? 11 Lossless Join Decomposition R1, … Rk is a lossless join decomposition of R w.r.t. an FD set F if for every instance r of R that satisfies F, R1(r) ⋈ ... ⋈ Rk(r) = r Consider: sid name serno subj cid exp-grade 1 23 Sam Nitin 570103 550103 AI DB 570 550 B A What if we decompose on (sid, name) and (serno, subj, cid, exp-grade)? 12 Testing for Lossless Join R1, R2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+ (R1  R2)  R1 – R2 (R1  R2)  R2 – R1 So for the FD set: sid  name serno  cid, exp-grade cid  subj Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition? 13 Dependency Preservation Ensures we can “easily” check whether a FD X  Y is violated during an update to a database:  The projection of an FD set F onto a set of attributes Z, FZ is {X  Y | X  Y  F +, X  Y  Z} i.e., it is those FDs local to Z’s attributes  A decomposition R1, …, Rk is dependency preserving if F + = (FR1 ... FRk)+ The decomposition hasn’t “lost” any essential FD’s, so we can check without doing a join 14 Example of Lossless and Dependency-Preserving Decompositions Given relation scheme R(name, street, city, st, zip, item, price) And FD set name  street, city street, city  st street, city  zip name, item  price Consider the decomposition R1(name, street, city, st, zip) and R2(name, item, price)  Is it lossless?  Is it dependency preserving? What if we replaced the first FD by name, street  city? 15 Another Example Given scheme: R(sid, fid, subj) and FD set: fid  subj sid, subj  fid Consider the decomposition R1(sid, fid) and R2(fid, subj)  Is it lossless?  Is it dependency preserving? 16 FD’s and Keys  Ideally, we want a design s.t. for each nontrivial dependency X  Y, X is a superkey for some relation schema in R  We just saw that this isn’t always possible  Hence we have two kinds of normal forms 17 Two Important Normal Forms Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X  A that holds over R, either A  X (it is trivial) ,or or X is a superkey for R Third Normal Form (3NF). For every relation scheme R and for every X  A that holds over R, either A  X (it is trivial), or X is a superkey for R, or A is a member of some key for R 18 Normal Forms Compared BCNF is preferable, but sometimes in conflict with the goal of dependency preservation  It’s strictly stronger than 3NF Let’s see algorithms to obtain:  A BCNF lossless join decomposition (nondeterministic)  A 3NF lossless join, dependency preserving decomposition 19 BCNF Decomposition Algorithm (from Korth et al.; our book gives a recursive version) result := {R} compute F+ while there is a relation schema Ri in result that isn’t in BCNF { i.e., A doesn’t form a key let A  B be a nontrivial FD on Ri s.t. A  Ri is not in F+ and A and B are disjoint } result:= (result – Ri)  {(Ri - B), (A,B)} 20 An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid  name fid  prof serno  classroom, cid, fid  Find the Boyce-Codd Normal Form for this schema  What if instead: sid  name fid  prof classroom, cid  serno serno  cid 21 3NF Decomposition Algorithm Let F be a minimal cover i:=0 for each FD A  B in F { if none of the schemas Rj, 1 j  i, contains AB { increment i Ri := (A, B) } } if no schema Rj, 1  j  i contains a candidate key for R { increment i Ri := any candidate key for R } return (R1, …, Ri) Build dep.preserving decomp. Ensure lossless decomp. 22 An Example Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof) And FDs: sid  name fid  prof serno  classroom, cid, fid  Find the Third Normal Form for this schema  What if instead: sid  name fid  prof classroom, cid  serno serno  cid 23 Summary of Normalization  We can always decompose into 3NF and get:  Lossless join  Dependency preservation  But with BCNF:  We are only guaranteed lossless joins  The algorithm is nondeterministic, so there is not a unique decomposition for a given schema R  BCNF is stronger than 3NF: every BCNF schema is also in 3NF 24 Normalization Is Good… Or Is It?  In some cases, we might not mind redundancy, if the data isn’t directly updated:  Reports (people like to see breakdowns by semester, department, course, etc.)  Warehouses (archived copies of data for doing complex analysis)  Data sharing (sometimes we may export data into objectoriented or hierarchical formats) 25

R - SEAS - University of Pennsylvania

Related documents

Products

Support

R - SEAS - University of Pennsylvania

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib