Cleveland State University
CIS 611 – Relational Databases
Prepared by Victor Matos

Functional Dependencies

Source: The Theory of Relational Databases, D. Maier, Ed. Computer Science Press
Available at: http://www.dbis.informatik.hu-berlin.de/~freytag/Maier/

Functional Dependencies
• Two primary purposes of databases are to
  – attenuate data redundancy and
  – enhance data reliability.
• Any a priori knowledge of restrictions or constraints on permissible sets of data has considerable usefulness in reaching these goals.
• Data dependencies are one way to formulate such advance knowledge.

Example1
• Consider the relation assign (Pilot, Flight, Date, Departs)

  PILOT     FLIGHT   DATE     DEPARTS
  Cushing   83       9 Aug    10:15a
  Cushing   116      10 Aug   1:25p
  Clark     281      8 Aug    5:50a
  Clark     301      12 Aug   6:35p
  Clark     83       11 Aug   10:15a
  Chin      83       13 Aug   10:15a
  Chin      116      12 Aug   1:25p
  Copely    281      9 Aug    5:50a
  Copely    281      13 Aug   5:50a
  Copely    412      15 Aug   1:25p

Example1 - Observations
• The relation assign tells which pilot flies a given flight on a given day, and what time the flight leaves.
• Not every combination of pilots, flights, dates, and times is allowable in assign.
• The following restrictions apply, among others:
  1. For each flight there is exactly one time.
  2. For any given pilot, date, and time, there is only one flight.
  3. For a given flight and date, there is only one pilot.
• These restrictions are examples of functional dependencies.
• Informally, a functional dependency occurs when the values of a tuple on one set of attributes uniquely determine the values on another set of attributes.
• Our restrictions can be phrased as follows, where TIME is the DEPARTS column:
  1. TIME functionally depends on FLIGHT,
  2. FLIGHT functionally depends on {PILOT, DATE, TIME}, and
  3. PILOT functionally depends on {FLIGHT, DATE}.

FD Definition
Def. Let r be a relation on scheme R, with X and Y subsets of R. Relation r satisfies the functional dependency (FD) X → Y if for every X-value x, π_Y(σ_{X=x}(r)) has at most one tuple.
One way to interpret this expression is to look at pairs of tuples, t1 and t2, in r: if t1(X) = t2(X), then t1(Y) = t2(Y). In the FD X → Y, X is called the left side and Y is called the right side.

FD Satisfies
Algorithm 4.1 SATISFIES
Input: A relation r and an FD X → Y.
Output: true if r satisfies X → Y, false otherwise.
SATISFIES(r, X → Y);
  1. Sort the relation r on its X columns to bring tuples with equal X-values together.
  2. If each set of tuples with equal X-values has equal Y-values, return true. Otherwise, return false.
SATISFIES tests if a relation r satisfies an FD X → Y.

Algorithm: Satisfies
Using algorithm SATISFIES to test whether FLIGHT → DEPARTS holds, with assign sorted on FLIGHT:

  PILOT     FLIGHT   DATE     DEPARTS
  Cushing   83       9 Aug    10:15a
  Clark     83       11 Aug   10:15a
  Chin      83       13 Aug   10:15a
  Cushing   116      10 Aug   1:25p
  Chin      116      12 Aug   1:25p
  Clark     281      8 Aug    5:50a
  Copely    281      9 Aug    5:50a
  Copely    281      13 Aug   5:50a
  Clark     301      12 Aug   6:35p
  Copely    412      15 Aug   1:25p

Question: DEPARTS → FLIGHT ???

Inference Axioms
• The number of FDs that can apply to a relation r(R) is finite, since there is only a finite number of subsets of R.
• Thus it is always possible to find all the FDs that r satisfies, by trying all possibilities using the algorithm SATISFIES.
• This brute-force approach is time-consuming.

Inference Axioms
• Finding F requires semantic knowledge of the relation r.
• After knowing some members of F, it is often possible to infer other members of F.
• A set F of FDs implies the FD X → Y, written F ⊨ X → Y, if every relation that satisfies all the FDs in F also satisfies X → Y.
• An inference axiom is a rule that states that if a relation satisfies certain FDs, it must satisfy certain other FDs.

Inference Axioms
[Figure: the set F = { ... } of functional dependencies and the FD X → Y, drawn against the set of all relations r(R) satisfying the FDs in F.]
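The SATISFIES test described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `satisfies`, the list-of-dicts representation of a relation, and the constant `ASSIGN` are assumptions made here, not part of the original algorithm statement. The data is the assign relation from Example1.

```python
from itertools import groupby

# The assign relation from Example1, one dict per tuple (representation assumed).
ASSIGN = [
    {"PILOT": "Cushing", "FLIGHT": 83,  "DATE": "9 Aug",  "DEPARTS": "10:15a"},
    {"PILOT": "Cushing", "FLIGHT": 116, "DATE": "10 Aug", "DEPARTS": "1:25p"},
    {"PILOT": "Clark",   "FLIGHT": 281, "DATE": "8 Aug",  "DEPARTS": "5:50a"},
    {"PILOT": "Clark",   "FLIGHT": 301, "DATE": "12 Aug", "DEPARTS": "6:35p"},
    {"PILOT": "Clark",   "FLIGHT": 83,  "DATE": "11 Aug", "DEPARTS": "10:15a"},
    {"PILOT": "Chin",    "FLIGHT": 83,  "DATE": "13 Aug", "DEPARTS": "10:15a"},
    {"PILOT": "Chin",    "FLIGHT": 116, "DATE": "12 Aug", "DEPARTS": "1:25p"},
    {"PILOT": "Copely",  "FLIGHT": 281, "DATE": "9 Aug",  "DEPARTS": "5:50a"},
    {"PILOT": "Copely",  "FLIGHT": 281, "DATE": "13 Aug", "DEPARTS": "5:50a"},
    {"PILOT": "Copely",  "FLIGHT": 412, "DATE": "15 Aug", "DEPARTS": "1:25p"},
]


def satisfies(rows, x, y):
    """Return True if the relation `rows` satisfies the FD X -> Y.

    `x` and `y` are lists of attribute names (hypothetical interface).
    """
    def x_value(t):
        return tuple(t[a] for a in x)

    def y_value(t):
        return tuple(t[a] for a in y)

    # Step 1: sort on the X columns so tuples with equal X-values are adjacent.
    ordered = sorted(rows, key=x_value)
    # Step 2: each group of equal X-values must carry exactly one Y-value.
    for _, group in groupby(ordered, key=x_value):
        if len({y_value(t) for t in group}) > 1:
            return False
    return True


print(satisfies(ASSIGN, ["FLIGHT"], ["DEPARTS"]))   # True
print(satisfies(ASSIGN, ["DEPARTS"], ["FLIGHT"]))   # False: 1:25p is used by flights 116 and 412
```

The second call answers the question posed on the slide: DEPARTS → FLIGHT fails on this relation because two different flights leave at 1:25p.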
Example - Inference Axioms
[Figure: the set F = { A → B, B → C } of functional dependencies and the FD A → C, drawn against the set of all relations r(R) satisfying the FDs in F.]
Here F ⊨ A → C: every relation that satisfies A → B and B → C also satisfies A → C.

Inference Axioms
The Armstrong-Set of Inference Axioms
• Axioms will implement the "intelligence" needed to prove (or disprove) a sequence of derivations.
• Inference Machines are used to determine whether or not the application of the axioms on some 'basic knowledge' produces a 'new' valid piece of knowledge not there in the basic set.
• The first set we will consider is called the A-set, proposed by W. Armstrong [1].
[1] William Armstrong: Dependency Structures of Data Base Relationships, pages 580-583. IFIP Congress, 1974.

A-Axioms
A1. Reflexivity          X → X
A2. Augmentation         If (Z ⊆ W; X → Y) then XW → YZ
A3. Additivity           If (X → Y) and (X → Z) then X → YZ
A4. Projectivity         If (X → YZ) then X → Y
A5. Transitivity         If (X → Y) and (Y → Z) then X → Z
A6. Pseudotransitivity   If (X → Y) and (YZ → W) then XZ → W

Inference Machine
INPUT: Relation schema R and a set F of FDs on R.
INPUT: A "new" rule of the form X → Y, with X and Y in schema(R).
INFERENCE MACHINE: applies the A-Axioms A1, A2, ..., A6.
OUTPUT: Is the "new" rule X → Y derived from what is known (R, F) by using the intelligence provided by the A-Axioms? YES or NO.
If NO we must conclude that (F ⊨ X → Y) is not true.

Example1 - Using the A-Axioms
Consider R = (Street, Zip, City) and the dependencies F = { City Street → Zip, Zip → City }
We want to show: Street Zip → Street Zip City
Proof:
  1. Zip → City                        – Given
  2. Street Zip → Street City          – Augmentation of (1) by Street
  3. City Street → Zip                 – Given
  4. City Street → City Street Zip     – Augmentation of (3) by City Street
  5. Street Zip → City Street Zip      – Transitivity of (2) and (4)

Example2 – Using A-Axioms
Consider the relation schema <R,F> where R = (ABCDEGHIJ) and dependencies F = { AB → E, AG → J, BE → I, E → G, GI → H }
Show that AB → GH is derived from F. If YES give a proof. If NO provide a counter-example.

Example2A – Using A-Axioms
  Step  Statement   Explanation
  1     AB → E      Given
  2     E → G       Given
  3     AB → G      Transitivity on (1) and (2)
  4     AB → BE     Augmentation of (1) by B
  5     BE → I      Given
  6     AB → I      Transitivity on (4) and (5)
  7     AB → GI     Additivity on (6) and (3)
  8     GI → H      Given
  9     AB → H      Transitivity on (7) and (8)
  10    AB → GH     Additivity on (3) and (9)
Q.E.D. (quod erat demonstrandum)

Example2B – Using A-Axioms
Show that AB → GH is derived from F, again!
  Step  Statement   Explanation
  1     AB → E      Given
  2     AB → AB     Reflexivity
  3     AB → B      Projectivity on (2)
  4     AB → BE     Additivity on (1) and (3)
  5     BE → I      Given
  6     AB → I      Transitivity on (4) and (5)
  7     E → G       Given
  8     AB → G      Transitivity on (1) and (7)
  9     AB → GI     Additivity on (6) and (8)
  10    GI → H      Given
  11    AB → H      Transitivity on (9) and (10)
  12    AB → GH     Additivity on (8) and (11)
Q.E.D.

Example3 – Using A-Axioms
Consider the relation schema <R,F> where R = (ABCDEGHIJ) and dependencies F = { AB → E, AG → J, BE → I, E → G, GI → H }
Show that AEI → H is derived from F. Your turn!

Reducing the A-Axioms
The set of A-Axioms is not minimal, therefore some of its rules could be eliminated.
Observations
• Rule A5 (transitivity) is a special case of rule A6 (pseudo-transitivity).
• Rules A3 (additivity) and A4 (projectivity) can be derived from A1 (reflexivity), A2 (augmentation), and A6 (pseudo-transitivity).
Proof
(a) The first observation is trivial (just make Z = Ø).
(b) Axiom A3 (Additivity) states that two rules, say X → Y and X → Z, can be combined into one, X → YZ. Let us use A2 on X → Y to produce XZ → YZ. Apply A2 again, this time on X → Z, to produce X → XZ. Now apply A5 on X → XZ and XZ → YZ; we get X → YZ. Therefore we conclude X → YZ without using rule A3 itself.

Reducing the A-Axioms
Statement: Axiom A3 is redundant.
Proof: We can derive X → YZ from X → Y and X → Z without using A3.
  1. X → Y       Given
  2. XZ → YZ     (A2) Augmenting (1) by Z
  3. X → Z       Given
  4. X → XZ      (A2) Augmenting (3) by X
  5. X → YZ      (A5) Transitivity on (4) and (2)

Characterizing the A-Axioms
• The set of A-Axioms is complete. Every FD that is implied by a set F of FDs can be derived from the FDs in F by one or more applications of the A-Axioms ( F ⊨A X → Y ).
• The A-Axioms are correct. Applying the axioms to FDs in a set F can only produce FDs that are implied by F.
• The set of A-Axioms is not minimal. Some rules are added for convenience, but they can be removed without diminishing the expressive power of the A-Axioms.

Correctness of the A-Axioms
The axioms cannot be used to prove a false derivation. To establish that a statement is false, showing a counter-example is sufficient.
Example: Assume schema R(XYZW). Does ( XY → ZW ) ⊨A X → Z ?
The correct answer is NO. To support our argument we produce a counter-example. For instance:

  X  Y  Z  W
  1  2  3  4
  1  5  6  7

In the example table there are no violations of the fact that XY determines a unique ZW (12 → 34 and 15 → 67). However, X = 1 determines two different Z values, 3 and 6. Therefore X → Z is not a valid dependency, as shown by the counter-example.

Closure F+
• Let F be a set of FDs for a relation r(R).
The closure of F, denoted F+, is the smallest set containing F such that the A-axioms cannot be applied to the set to produce a new rule not already included in the set.
• Since F+ must be finite, we can compute it by starting with F, applying A1, A2, and A6, and adding the derived FDs to F until no new FDs can be derived.
• The closure of F depends on the scheme R. If R = (A B) then F+ will always contain B → B, but if R = (A C), F+ never contains B → B.
[Figure: F drawn as a subset of F+.]

Closure F+
• The set F derives an FD X → Y if X → Y is in F+.
• Since our inference axioms are correct, if F derives X → Y, then F implies X → Y ( F ⊨A X → Y ).
• Note that F+ = (F+)+.
• It is desirable to determine whether F ⊨A X → Y without computing F+.
• Computing the entire set F+ is time-consuming and tedious.

Closure F+
Example: Consider the relation schema <R,F> where R = (A B C) and F = { AB → C, C → B }.
By brute force we produce all rules derivable from F; F+ is the resulting set.
[Figure: listing of the FDs in F+, with F = { AB → C, C → B } drawn as a subset of F+.]
Question: Does F ⊨ B → C ?
Answer: B → C is not in F+ (it is NOT reachable from F); therefore the rule B → C is not implied by F.

Closure F+
Aside: How many FDs are there in <R,F>?
An upper bound is
$\left(\sum_{r=1}^{n}\binom{n}{r}\right)\left(\sum_{r=1}^{n}\binom{n}{r}\right) = (2^n - 1)^2$
Each sum counts the possible combinations of r attributes, out of the n attributes in total, that can form one side of a rule X → Y.
For n = 3 there are (2^3 − 1)^2 = 49 possibilities; for R holding 10 attributes there are over a million possible FDs.

Closure F+
Definition. An FD X → Y is trivial if X ⊇ Y.
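The (2^n − 1)^2 bound and the definition of a trivial FD can be checked by brute-force enumeration. The sketch below is only illustrative: the names `nonempty_subsets` and `count_fds` are invented here, and attributes are assumed to be single characters.

```python
from itertools import combinations


def nonempty_subsets(attrs):
    """Yield every non-empty subset of `attrs` as a frozenset."""
    for r in range(1, len(attrs) + 1):
        for combo in combinations(attrs, r):
            yield frozenset(combo)


def count_fds(attrs):
    """Count all candidate FDs X -> Y over `attrs`, and how many are trivial."""
    subsets = list(nonempty_subsets(attrs))
    total = 0
    trivial = 0
    for x in subsets:
        for y in subsets:
            total += 1
            if y <= x:          # Y is a subset of X, so X -> Y is trivial
                trivial += 1
    return total, trivial


print(count_fds("ABC"))        # (49, 19)
print((2 ** 10 - 1) ** 2)      # 1046529
```

For three attributes the enumeration agrees with the formula (49 candidates), and 19 of those are trivial, so they hold in every relation regardless of F.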
• If F is a set of FDs over R and X is a subset of R, then there is an FD X → Y in F+ such that Y is maximal: for any other FD X → Z in F+, Y ⊇ Z.
  – This result follows from additivity.
  – The right side Y is called the closure of X and is denoted by X+.
• The closure of X always contains X, by reflexivity.

Derivations and DDAGs
• If F ⊨ X → Y, then either X → Y is in F, or a series of applications of the inference A-axioms to F will yield X → Y.
• This sequence of axiom applications and resulting FDs is called a derivation of X → Y from F.
• More formally, let F be a set of FDs over scheme R. A sequence P of FDs over R is a derivation sequence on F if every FD in P either
  – is a member of F, or
  – follows from previous FDs in P by an application of one of the inference axioms A1 to A6.
• P is a derivation sequence for X → Y if X → Y is one of the FDs in P.
• Definition: Let P be a derivation sequence on F. The use set of P is the collection of all FDs (originally) in F that appear in P.

Derivations and DDAGs
EXAMPLE Consider schema r(ABCDEG) and functional dependencies F = { A → BC, BD → G, C → ED }
A derivation sequence for A → E is
  Step  Statement   Explanation
  1     A → BC      (given)
  2     A → C       (Projectivity [A4] on 1)
  3     C → ED      (given)
  4     C → E       (Projectivity [A4] on 3)
  5     A → E       (Transitivity [A5] on 2 and 4)
The sequence P for A → E (the five rules written above) is a derivation sequence on F.
The Use_Set_Of_P is { A → BC, C → ED }

Derivations and DDAGs
Example. Consider schema <R, F> where R = { A B C D E G H I J } and F = { AB → E, AG → J, BE → I, E → G, GI → H }
The following sequence is a derivation sequence for AB → GH.
  Step  Statement   Explanation
  1.    AB → E      (given)
  2.    AB → AB     (reflexivity)
  3.    AB → B      (projectivity from 2)
  4.    AB → BE     (additivity from 1 and 3)
  5.    BE → I      (given)
  6.    AB → I      (transitivity from 4 and 5)
  7.    E → G       (given)
  8.    AB → G      (transitivity from 1 and 7)
  9.    AB → GI     (additivity from 6 and 8)
  10.   GI → H      (given)
  11.   AB → H      (transitivity from 9 and 10)
  12.   GI → GI     (reflexivity)
  13.   GI → I      (projectivity from 12)
  14.   AB → GH     (additivity from 8 and 11)
This sequence P contains unneeded FDs, such as 12 and 13, and is also a derivation sequence for other FDs, such as AB → GI.
The Use_Set_Of_P is { AB → E, BE → I, E → G, GI → H }

B-Axioms
Definition: The B-Axioms set is a small and complete collection of inference rules. It is not a subset of A1 to A6; however, it is equally expressive.
For a relation r(R), with W, X, Y, and Z subsets of R, and C an attribute in R:
B1. Reflexivity     X → X
B2. Accumulation    If (X → YZ) and (Z → CW) then X → YZC
B3. Projectivity    If (X → YZ) then X → Y
Motivation: This is another approach to the problem of finding a sequence of derivations, using a smaller set of axioms.
Significance: Since the B-Axioms are complete, we can always find a derivation sequence using only the three B-axioms to assert whether or not F ⊨B X → Y.

B-Axioms
Example: Let R = (ABCDEGHIJ) and F = { AB → E, AG → J, BE → I, E → G, GI → H }
Problem: Find a derivation sequence P showing F ⊨B AB → GH using only B-axioms.
Answer: See the P sequence below.
Comment
(a) Use_Set_Of_P contains the rules on lines 2, 5, 9, 11.
(b) Too many steps!
  Step  Statement       Explanation
  1     EI → EI         Reflexivity (B1)
  2     E → G           Given
  3     EI → EIG        Accumulation (B2) from (1) and (2)
  4     EI → GI         Projectivity (B3) from (3)
  5     GI → H          Given
  6     EI → GHI        Accumulation from (4) and (5)
  7     EI → GH         Projectivity from (6)
  8     AB → AB         Reflexivity
  9     AB → E          Given
  10    AB → ABE        Accumulation from (8) and (9)
  11    BE → I          Given
  12    AB → ABEI       Accumulation from (10) and (11)
  13    AB → ABEIG      Accumulation from (4) and (12)
  14    AB → ABEGHI     Accumulation from (7) and (13)
  15    AB → GH         Projectivity from (14)
Verdict: correct, but too long to be useful.

RAP-Derivation Sequence
RAP stands for: Reflexivity, Accumulation, Projectivity (the three B-axioms).
Definition: Consider derivation sequences for X → Y on a set F of FDs using the B-axioms that satisfy the following constraints:
  1. The first FD is X → X.
  2. The last FD is X → Y.
  3. Every FD other than the first and last is either an FD in F (given) or an FD of the form X → Z that was derived using axiom B2 (Accumulation).
Such a derivation is called a RAP-derivation sequence.

RAP-Derivation Sequence
Example: Let R = (ABCDEGHIJ) and F = { AB → E, AG → J, BE → I, E → G, GI → H }
Find a RAP-sequence for AB → GH
  Step  Statement       Explanation
  1     AB → AB         B1
  2     AB → E          Given
  3     AB → ABE        B2
  4     BE → I          Given
  5     AB → ABEI       B2
  6     E → G           Given
  7     AB → ABEIG      B2
  8     GI → H          Given
  9     AB → ABEIGH     B2
  10    AB → GH         B3
Comments
  1. The table contains a RAP sequence for AB → GH.
  2. Each rule in P is either given in F or the result of applying B2 on previous rules in P.
  3. The first and last lines agree with the definition of a RAP sequence.
  4. Use_Set_Of_P contains the rules in lines 2, 4, 6, 8.

RAP-Derivation Sequence
Example: Let R = (ABCDEGHIJ) and F = { AB → E, AG → J, BE → I, E → G, GI → H }
Find a RAP-sequence for BHE → GI. Your turn…

RAP-Derivation Sequence
Example: Let R = (ABCDEG) and F = { A → BC, BD → G, C → ED }
Find a RAP-sequence for AD → GE. Your turn…

RAP-Derivation Sequence
Example: Let R = (ABCDEI) and F = { A → D, AB → E, BI → E, CD → I, E → C }
Find a RAP-sequence for AE → DCI. Your turn…

Derivation DAGs
A directed acyclic graph (DAG) is a directed graph with no directed paths from any node to itself.
A labeled DAG is a DAG with an element from some labeling set L associated with each node.
[Figure: two example graphs, one a valid DAG (disconnected but OK) and one not a valid DAG (the path A-D-A makes a cycle).]

Derivation DAGs
• DAGs are a convenient way of graphically representing a derivation sequence of the form F ⊨B X → Y.
• Whenever there is a RAP derivation sequence there is an equivalent derivation DAG (DDAG), and conversely.

Derivation DAGs
EXAMPLE Consider schema r(ABCDEG) and functional dependencies F = { A → BC, BD → G, C → ED }.
Show a DDAG for AD → GE
[Figure: DDAG over the nodes D, G, B, A, C, E.]
NOTE: The Use_Set of the derivation sequence is { A → BC, BD → G, C → ED }

Derivation DAGs
Rules for Constructing a DDAG
Rule 1. Any set of unconnected nodes with labels from r(R) is an F-based DDAG.
Rule 2. Let H be a DDAG including nodes labeled A1 … Ak. Let the rule A1…Ak → B be part of F. Form graph H' by adding a new node labeled "B" and new edges <A1,B>, …, <Ak,B>.
Rule 3. Nothing else is an F-based DDAG.

Derivation DAGs
Example Consider the relation schema r(ABCDEGHIJ) subject to the dependencies in F = { AB → E, AG → J, BE → I, E → G, GI → H }.
Draw a DDAG for the rule AB → GH
[Figure: DDAG over the nodes A, B, E, G, I, H.]
Note: The Use_Set of the derivation sequence is { AB → E, BE → I, E → G, GI → H }

Derivation DAGs
Example Consider the relation schema r(ABCDEGHIJ) subject to the dependencies in F = { AB → E, AG → J, BE → I, E → G, GI → H }.
Draw a DDAG for the new rule BIG → JA
[Figure: DDAG attempt over the nodes B, J, I, H, A, G.]
NOTE: No path from source to destination is possible, therefore the new rule BIG → AJ is not derivable from F.

Derivation DAGs
Example Consider the relation schema R(ABCDEGHIJ) subject to the dependencies in F = { AB → E, AG → J, BE → I, E → G, GI → H }.
Draw a DDAG for the new rule AB → HC
[Figure: DDAG attempt over the nodes G, A, B, E, C, I, H.]
NOTE: Node C is not reachable from the source. Therefore the rule AB → CH cannot be deduced from F.

X+ Closure of a Set of Attributes
• To simplify the test of whether or not a rule X → Y follows from a set F of FDs, we will compute X+, the closure of a set of attributes X.
• The set X+ is the maximal set of attributes which can be derived from X using a RAP derivation sequence starting on X.
• We will say that X → Y is in F+ whenever Y is contained in X+.

Computing X+
The following algorithm to compute X+ has poor performance but is easy to understand.
Algorithm: X-Closure
Input: A set of attributes X and a set of FDs F
Output: The closure of X under F, denoted X+

  function X-CLOSURE (X, F)
  begin
    OldDep = Ø;
    NewDep = X;
    while ( NewDep ≠ OldDep ) do
    begin
      OldDep = NewDep;
      for every FD A → B in F do
        if ( NewDep ⊇ A ) then NewDep = NewDep ∪ B;
    end while;
    return ( NewDep )
  end function;

Computing X+
EXAMPLES Consider the relation schema r(ABCDEGHIJ) subject to the dependencies in F = { AB → E, AG → J, BE → I, E → G, GI → H }.
Compute the closure of AB
  (AB)+ = AB          reflexivity
          ABE         using AB → E
          ABEI        using BE → I
          ABEIG       using E → G
          ABEIGHJ     using GI → H and AG → J
  nothing else could be added to (AB)+
Note: Observe that AB → ABEIGHJ. This rule is a compact notation for the 2^7 FDs having AB as LHS.

Computing X+
EXAMPLES Consider the relation schema r(ABCDEGHIJ) subject to the dependencies in F = { AB → E, AG → J, BE → I, E → G, GI → H }.
Compute the closure of DEC
  (DEC)+ = DEC        reflexivity
           DECG       using E → G
  nothing else could be added

Member Algorithm
Checking Membership
To verify whether or not a functional dependency X → Y can be derived from a set F of FDs, the following simple test can be applied:
  F ⊨B X → Y if Y is part of X+

Member Algorithm
Input: Rule X → Y and functional dependencies F
Output: TRUE whenever the rule is derived from F
Method:
  begin
    if ( Xclosure (X, F) ⊇ Y ) then return( True )
    else return( False );
  end;
Example
Question: Does the rule AB → EH follow from F = { AB → E, AG → J, BE → I, E → G, GI → H } ?
Answer: YES. Observe that (AB)+ = ABEIGHJ ⊇ EH.

Linear Closure – XF+
Input: A set of attributes X and a set of functional dependencies F
Output: The closure of X under F, denoted XF+

  Procedure LINCLOSURE ( SetOfAttributes X, SetOfFDs F )
  BEGIN
    /* Initialization */
    for each FD W → Z in F do
    begin
      COUNT[ W → Z ] = lengthOf(W);
      for each attribute A in W do
        add rule W → Z into LIST[ A ];
    end;
    NEWDEP = X;
    UPDATE = X;
    /* Computation */
    while ( UPDATE ≠ Ø ) do
    begin
      Choose an attribute A in UPDATE;
      UPDATE = UPDATE − A;
      for each FD W → Z in LIST[A] do
      begin
        COUNT[W → Z] = COUNT[W → Z] − 1;
        if ( COUNT[W → Z] = 0 ) then
        begin
          ADD = Z − NEWDEP;
          NEWDEP = NEWDEP ∪ ADD;
          UPDATE = UPDATE ∪ ADD;
        end if;
      end for;
    end while;
    return ( NEWDEP )
  END

Linear Closure – XF+
Example Consider the schema r(ABCDEI) subject to the dependencies
F = { (1) A → D, (2) AB → E, (3) BI → E, (4) CD → I, (5) E → C }
Find the closure of AE using the Linear Closure algorithm.

Linear Closure – XF+
Example (continuation…) Tracing the execution of the linear time closure algorithm
  UPDATE = AE    NEWDEP = AE
    List[A]:  Rule 1  A → D     Count[1] = 0, therefore add D to both strings
              Rule 2  AB → E    Count[2] = 1
  UPDATE = ED    NEWDEP = AED
    List[E]:  Rule 5  E → C     Count[5] = 0, therefore add C to both strings
  UPDATE = DC    NEWDEP = AEDC
    List[D]:  Rule 4  CD → I    Count[4] = 1
  UPDATE = C     NEWDEP = AEDC
    List[C]:  Rule 4  CD → I    Count[4] = 0, therefore add I to both strings
  UPDATE = I     NEWDEP = AEDCI
    List[I]:  Rule 3  BI → E    Count[3] = 1
  UPDATE = Ø     NEWDEP = AEDCI
  therefore (AE)+ = AEDCI

Linear Closure – XF+
Homework
You will create a CASE tool for designing 'good' databases. The first step involves the implementation of the XLinearClosure() algorithm.
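As a starting point, the LINCLOSURE procedure and the membership test can be sketched in Python as shown below. This is one possible sketch, not a reference solution: the names `lin_closure` and `member`, the representation of an FD as a pair of strings, and the restriction to single-character attribute names are all assumptions made here for brevity.

```python
def lin_closure(x, fds):
    """Closure of attribute set `x` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string whose
    characters are attribute names (an assumption of this sketch).
    """
    fds = [(frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds]

    # COUNT[i]: left-side attributes of FD i not yet found in the closure.
    count = [len(lhs) for lhs, _ in fds]

    # LIST[a]: indices of the FDs whose left side contains attribute a.
    fds_with = {}
    for i, (lhs, _) in enumerate(fds):
        for a in lhs:
            fds_with.setdefault(a, []).append(i)

    newdep = set(x)          # attributes known to be in the closure
    update = list(newdep)    # attributes still waiting to be processed

    while update:
        a = update.pop()
        for i in fds_with.get(a, []):
            count[i] -= 1
            if count[i] == 0:            # whole left side is now in the closure
                add = fds[i][1] - newdep
                newdep |= add
                update.extend(add)       # each attribute is queued at most once
    return newdep


def member(x, y, fds):
    """True if X -> Y follows from `fds`, i.e. Y is contained in X+."""
    return set(y) <= lin_closure(x, fds)


F1 = [("A", "D"), ("AB", "E"), ("BI", "E"), ("CD", "I"), ("E", "C")]
print("".join(sorted(lin_closure("AE", F1))))    # ACDEI

F2 = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
print(member("AB", "EH", F2))                    # True
print(member("AB", "HC", F2))                    # False: C is never reached
```

Each FD is touched once per attribute on its left side, which is where the linear running time comes from. The order in which attributes leave `update` may differ from the trace above, but the resulting closure is the same.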