Lecture 6: Schema refinement: Functional dependencies
www.cl.cam.ac.uk/Teaching/current/Databases/

Recall: Database design lifecycle
• Requirements analysis
  – User needs; what must the database do?
• Conceptual design
  – High-level description; often using E/R model
• Logical design
  – Translate E/R model into relational schema
• Schema refinement (next two lectures)
  – Check schema for redundancies and anomalies
• Physical design/tuning
  – Consider typical workloads, and further optimise

Today’s lecture
• Why are some designs bad?
• What’s a functional dependency?
• What’s the theory of functional dependencies?
• (Next lecture: How can we use this theory to classify redundancy in relation design?)

Not all designs are equally good
• Why is this design bad?
  Data(sid,sname,address,cid,cname,grade)
• Why is this one preferable?
  Student(sid,sname,address)
  Course(cid,cname)
  Enrolled(sid,cid,grade)

An instance of our bad design
  sid  sname     address  cid  cname      grade
  124  Britney   USA      206  Database   A++
  204  Victoria  Essex    202  Semantics  C
  124  Britney   USA      201  S/Eng I    A+
  206  Emma      London   206  Database   B
  124  Britney   USA      202  Semantics  B+

Evils of redundancy
• Redundancy is the root of many problems associated with relational schemas
  – Redundant storage
  – Update anomalies
  – Insertion anomalies
  – Deletion anomalies
  – LOW TRANSACTION THROUGHPUT
• In general, the higher the redundancy, the more objects a correct (anomaly-free) transaction has to lock, which causes greater contention and lower throughput

Decomposition
• We remove anomalies by replacing the schema
  Data(sid,sname,address,cid,cname,grade)
  with
  Student(sid,sname,address)
  Course(cid,cname)
  Enrolled(sid,cid,grade)
• Note the implicit extra cost here
• Two immediate questions:
  1. Do we need to decompose a relation?
  2. What problems might result from a decomposition?
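The decomposition above can be tried out on a small instance. The following is a minimal Python sketch, not part of the original slides: the rows follow the bad-design instance as best it can be read (the exact grades are an assumption), and `project` is a hypothetical helper standing in for relational projection. It counts how often one student's address is stored before and after decomposing, and then joins the three relations back together. That join is one plausible reading of the "implicit extra cost".

```python
# Rows of Data(sid, sname, address, cid, cname, grade).
# Assumption: values are read off the example instance; grades may differ.
data = [
    (124, "Britney", "USA", 206, "Database", "A++"),
    (204, "Victoria", "Essex", 202, "Semantics", "C"),
    (124, "Britney", "USA", 201, "S/Eng I", "A+"),
    (206, "Emma", "London", 206, "Database", "B"),
    (124, "Britney", "USA", 202, "Semantics", "B+"),
]


def project(rows, *cols):
    """Relational projection: keep the given column positions, drop duplicates."""
    return {tuple(row[c] for c in cols) for row in rows}


# The three relations of the preferred design.
student = project(data, 0, 1, 2)   # Student(sid, sname, address)
course = project(data, 3, 4)       # Course(cid, cname)
enrolled = project(data, 0, 3, 5)  # Enrolled(sid, cid, grade)

# Redundant storage: how many times is the address of student 124 stored?
copies_before = sum(1 for row in data if row[0] == 124)
copies_after = sum(1 for row in student if row[0] == 124)

# Rebuilding the original rows now needs a join over all three relations.
rejoined = {
    (sid, sname, address, cid, cname, grade)
    for (sid, sname, address) in student
    for (cid, cname) in course
    for (esid, ecid, grade) in enrolled
    if esid == sid and ecid == cid
}

print(copies_before, copies_after)
print(rejoined == set(data))
```

On this instance the join reproduces exactly the original rows, so nothing is lost by decomposing; whether that holds in general is the second of the two questions above.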
Functional dependencies
• Recall:
  – A key is a set of fields where if a pair of tuples agree on a key, they agree everywhere
• In our bad design, if two tuples agree on sid, then they also agree on address, even though the rest of the tuples may not agree

Functional dependencies cont.
• We can say that sid determines address
  – We’ll write this sid → address
• This is called a functional dependency (FD)
• (Note: An FD is just another integrity constraint)

Functional dependencies cont.
• We’d expect the following functional dependencies to hold in our Student database
  – sid → sname,address
  – cid → cname
  – sid,cid → grade
• A functional dependency X → Y is simply a pair of sets (of field names)
  – Note the sloppy notation: we write A,B → C,D rather than {A,B} → {C,D}

Formalities
• Given a relation R = R(A1:τ1, …, An:τn), and X, Y ⊆ {A1, …, An}, an instance r of R satisfies X → Y if
  – For any two tuples t1, t2 in r, if t1.X = t2.X then t1.Y = t2.Y
• Note: This is a semantic assertion. We cannot look at an instance to determine which FDs hold (although we can tell if the instance does not satisfy an FD!)

Properties of FDs
• Assume that X → Y and Y → Z are known to hold in R. It’s clear that X → Z holds too.
• We shall say that an FD set F logically implies X → Y, and write F ⊨ X → Y
  – e.g. {X → Y, Y → Z} ⊨ X → Z
• The closure of F is the set of all FDs logically implied by F, i.e. F⁺ ≜ {X → Y | F ⊨ X → Y}
• The set F⁺ can be big, even if F is small

Closure of a set of FDs
• Which of the following are in the closure of our Student FDs?
  – address → address
  – cid → cname
  – cid → cname,sname
  – cid,sid → cname,sname

Candidate keys and FDs
• If R = R(A1:τ1, …, An:τn) with FDs F and X ⊆ {A1, …, An}, then X is a candidate key for R if
  – X → A1, …, An ∈ F⁺
  – For no proper subset Y ⊂ X is Y → A1, …, An ∈ F⁺

Armstrong’s axioms
• Reflexivity: If Y ⊆ X then F ⊢ X → Y
  – (This is called a trivial dependency)
  – Example: sname,address → address
• Augmentation: If F ⊢ X → Y then F ⊢ X,W → Y,W
  – Example: As cid → cname then cid,sid → cname,sid
• Transitivity: If F ⊢ X → Y and F ⊢ Y → Z then F ⊢ X → Z
  – Example: As sid,cid → cid and cid → cname, then sid,cid → cname

Consequences of Armstrong’s axioms
• Union: If F ⊢ X → Y and F ⊢ X → Z then F ⊢ X → Y,Z
• Pseudo-transitivity: If F ⊢ X → Y and F ⊢ W,Y → Z then F ⊢ X,W → Z
• Decomposition: If F ⊢ X → Y and Z ⊆ Y then F ⊢ X → Z
Exercise: Prove that these are consequences of Armstrong’s axioms

Proof of Union Rule
Suppose that F ⊢ X → Y and F ⊢ X → Z.
By augmentation we have F ⊢ X → X,Y, since X ∪ X = X.
Also by augmentation, F ⊢ X,Y → Z,Y.
Therefore, by transitivity we have F ⊢ X → Z,Y. QED

Functional dependencies can be useful in algebraic reasoning
Suppose R(A,B,C) is a relation schema with dependency A → B, then
  $R = \pi_{A,B}(R) \bowtie_A \pi_{A,C}(R)$
(This is called Heath’s rule.)

Proof of Heath’s Rule
First show that $R \subseteq \pi_{A,B}(R) \bowtie_A \pi_{A,C}(R)$.
Suppose $(a,b,c) \in R$, then $(a,b) \in \pi_{A,B}(R)$ and $(a,c) \in \pi_{A,C}(R)$.
Since these two records agree on A, we have $(a,b,c) \in \pi_{A,B}(R) \bowtie_A \pi_{A,C}(R)$.

Proof of Heath’s Rule (cont.)
In the other direction, we must show that $\pi_{A,B}(R) \bowtie_A \pi_{A,C}(R) \subseteq R$.
Suppose $(a,b,c) \in \pi_{A,B}(R) \bowtie_A \pi_{A,C}(R)$.
Then there must exist records $(a,b) \in \pi_{A,B}(R)$ and $(a,c) \in \pi_{A,C}(R)$.
There must also exist $(a,b,c') \in R$ and $(a,b',c) \in R$, so that these two projections arise from records of R.
But the functional dependency A → B tells us that $b = b'$.
Therefore, we have $(a,b,c) \in R$. QED

Equivalence
• Two sets of FDs, F and G, are said to be equivalent if F⁺ = G⁺
• For example: {(A,B → C), (A → B)} and {(A → C), (A → B)} are equivalent
• F⁺ can be huge – we’d prefer to look for small equivalent FD sets

Minimal cover
• An FD set, F, is said to be minimal if
  1. Every FD in F is of the form X → A, where A is a single attribute
  2. For no X → A in F is F − {X → A} equivalent to F
  3.
For no X → A in F and Z ⊂ X is (F − {X → A}) ∪ {Z → A} equivalent to F
• For example, {(A → C), (A → B)} is a minimal cover for {(A,B → C), (A → B)}

More on closures
• FACT: If F is an FD set, and X → Y ∉ F⁺ then there exists an attribute A ∈ Y such that X → A ∉ F⁺

Why Armstrong’s axioms?
• Soundness
  – If F ⊢ X → Y is deduced using the rules, then X → Y is true in any relation in which the dependencies of F are true
• Completeness
  – If X → Y is true in any relation in which the dependencies of F are true, then F ⊢ X → Y can be deduced using the rules

Soundness
• Consider the Augmentation rule:
  – We have X → Y, i.e. if t1.X = t2.X then t1.Y = t2.Y
  – If in addition t1.W = t2.W then it is clear that t1.(Y,W) = t2.(Y,W)

Soundness cont.
• Consider the Transitivity rule:
  – We have X → Y, i.e. if t1.X = t2.X then t1.Y = t2.Y (*)
  – We have Y → Z, i.e. if t1.Y = t2.Y then t1.Z = t2.Z (**)
  – Take two tuples s1 and s2 such that s1.X = s2.X; then from (*) s1.Y = s2.Y, and then from (**) s1.Z = s2.Z

Completeness
• Exercise
  – (You may need the fact from the “More on closures” slide)

Attribute closure
• If we want to check whether X → Y is in the closure of the set F, we could compute F⁺ and check – but this is expensive
• Cheaper: We can instead compute the attribute closure, X⁺, using the following algorithm:
    closure := X;
    repeat until no change {
      if U → V ∈ F, where U ⊆ closure
      then closure := closure ∪ V
    };
• Then F ⊢ X → Y iff Y is a subset of X⁺
• Try this with sid,sname → cname,grade

Preview of next lecture: Goals of normalisation
• Decide whether a relation is in “good form”
• If it is not, then we will “decompose” it into a set of relations such that
  – Each relation is in “good form”
  – The decomposition has not lost any information that was present in the original relation
• The theory of this process and the notion of “good form” are based on FDs

Summary
You should now understand:
• Redundancy and various forms of anomalies
• Functional dependencies
• Armstrong’s axioms
Next lecture: Schema refinement: Normalisation
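The attribute closure algorithm from this lecture, and the equivalence test for FD sets that it makes cheap, can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names (`attribute_closure`, `implies`, `equivalent`) and the encoding of an FD as a pair of frozensets are assumptions made for this example.

```python
def attribute_closure(attrs, fds):
    """Compute X+ for the attribute set `attrs` under the FD set `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets (hypothetical encoding).
    """
    closure = set(attrs)
    changed = True
    while changed:  # "repeat until no change"
        changed = False
        for lhs, rhs in fds:
            # If U -> V is in F and U is inside the closure, add V.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """F derives lhs -> rhs iff rhs is a subset of the closure of lhs."""
    return set(rhs) <= attribute_closure(lhs, fds)


def equivalent(f, g):
    """F and G are equivalent iff each implies every FD of the other."""
    return (all(implies(g, lhs, rhs) for lhs, rhs in f)
            and all(implies(f, lhs, rhs) for lhs, rhs in g))


# The Student FDs from the lecture.
student_fds = [
    (frozenset({"sid"}), frozenset({"sname", "address"})),
    (frozenset({"cid"}), frozenset({"cname"})),
    (frozenset({"sid", "cid"}), frozenset({"grade"})),
]

# The two FD sets from the Equivalence slide.
f_set = [(frozenset({"A", "B"}), frozenset({"C"})),
         (frozenset({"A"}), frozenset({"B"}))]
g_set = [(frozenset({"A"}), frozenset({"C"})),
         (frozenset({"A"}), frozenset({"B"}))]

print(sorted(attribute_closure({"sid", "cid"}, student_fds)))
print(equivalent(f_set, g_set))
```

Calling `implies(student_fds, {"sid", "sname"}, {"cname", "grade"})` gives a way to check an answer to the "Try this" exercise, and the four candidate FDs from the closure question can be tested the same way. Checking equivalence through attribute closures avoids ever materialising F⁺.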