Logical Database Design (2 of 3)
John Ortiz
Lecture 7: Logical Database Design (2)

Finding All Candidate Keys (cont.)
Method 2 (manual approach):
Step 1: Draw the dependency graph of F. Each vertex corresponds to an attribute. Edges are defined as follows:
  A → B becomes the edge A → B
  A → BC becomes the edges A → B and A → C
  AB → C becomes the edges A → C and B → C
Step 2: Identify the set Vni of vertices that have no incoming edges.
Step 3: Identify the set Voi of vertices that have only incoming edges.
Step 4: A candidate key is a set of attributes that
  - contains all attributes in Vni,
  - contains no attribute in Voi,
  - determines every attribute of R (its closure is R), and
  - has no proper subset that is already a candidate key.

An Example Using Method 2
Consider R(A, B, C, G, H, I) and F = {A → BC, CG → HI, B → H}.
Dependency graph edges: A → B, A → C, C → H, C → I, G → H, G → I, B → H.
Vni = {A, G}, Voi = {H, I}.
Since (AG)+ = ABCGHI, AG is the only candidate key of R.

Another Example Using Method 2
Consider R(A, B, C, D, E, H) and F = {A → B, AB → E, BH → C, C → D, D → A}.
Dependency graph edges: A → B, A → E, B → E, B → C, H → C, C → D, D → A.
Vni = {H}, Voi = {E}.
H alone determines nothing else, but H together with any one of A, B, C, D reaches the other three through A → B, BH → C, C → D, D → A, and then E through AB → E.
Candidate keys: AH, BH, CH, DH.

Normal Forms
If a relation is in a certain normal form (BCNF, 3NF, …), certain types of redundancy are known to be avoided or eliminated.
A relation schema R is in First Normal Form (1NF) if every attribute of R takes only single, atomic values.
  - Every relation is in 1NF.
  - 1NF allows all kinds of redundancy.
  - Higher normal forms are defined in terms of FDs.

Second Normal Form (2NF)
Let F be a set of FDs satisfied by R.
An attribute of R is prime if it appears in some candidate key (according to F) of R.
Y is fully functionally dependent on X if F implies X → Y, but does not imply W → Y for any proper subset W ⊂ X.
R is in Second Normal Form (2NF) if every non-prime attribute of R is fully functionally dependent on every candidate key.
If a proper part of a candidate key determines a non-prime attribute, R is not in 2NF.
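Method 2 and the 2NF test are mechanical enough to sketch in code. The Python below is an illustrative sketch, not part of the lecture: it assumes single-character attribute names and FDs written as strings such as "A->BC", and the helper names (parse, closure, candidate_keys, partial_dependencies, names) are hypothetical. It builds Vni and Voi as in Steps 2 and 3, then tries Vni plus increasingly large sets of the remaining vertices, keeping those whose closure is R and that contain no smaller key (Step 4). The 2NF check reports every proper part of a key that determines a non-prime attribute.

```python
from itertools import combinations

def parse(text):
    """Parse FDs such as 'A->BC, CG->HI' into (lhs, rhs) frozenset pairs.
    Assumption: every attribute name is a single character."""
    fds = []
    for part in text.split(","):
        lhs, rhs = part.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds

def closure(attrs, fds):
    """Attribute closure X+: apply FDs whose left side is covered until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Method 2: keys contain all of Vni, none of Voi, plus some 'both' vertices."""
    schema = frozenset(schema)
    incoming, outgoing = set(), set()
    for lhs, rhs in fds:
        targets = rhs - lhs  # Step 1: edges from each LHS attribute to each new RHS attribute
        if targets:
            outgoing |= lhs
            incoming |= targets
    v_ni = schema - incoming                 # Step 2: no incoming edges
    v_oi = (incoming - outgoing) & schema    # Step 3: only incoming edges
    optional = sorted(schema - v_ni - v_oi)  # vertices with both kinds of edges
    keys = []
    for size in range(len(optional) + 1):    # try smaller candidates first
        for extra in combinations(optional, size):
            cand = v_ni | frozenset(extra)
            if any(key <= cand for key in keys):
                continue                     # Step 4: contains a key already found
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys

def partial_dependencies(schema, fds):
    """2NF test: non-prime attributes determined by a proper part of some key."""
    keys = candidate_keys(schema, fds)
    prime = frozenset().union(*keys)
    found = set()
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                part = frozenset(part)
                nonprime = closure(part, fds) - part - prime
                if nonprime:
                    found.add(("".join(sorted(part)), "".join(sorted(nonprime))))
    return sorted(found)

def names(keys):
    """Render each key as a sorted string for readable output."""
    return ["".join(sorted(k)) for k in keys]

print(names(candidate_keys("ABCGHI", parse("A->BC, CG->HI, B->H"))))            # ['AG']
print(names(candidate_keys("ABCDEH", parse("A->B, AB->E, BH->C, C->D, D->A"))))  # ['AH', 'BH', 'CH', 'DH']
# Bank-Loans with hypothetical letters B, A, H, L, C, T (T stands in for Amount)
print(partial_dependencies("BAHLCT", parse("B->AH, L->CT")))  # [('B', 'AH'), ('L', 'CT')]
```

Enumerating candidates by increasing size means any set containing an earlier key can be skipped, so only minimal keys are kept; brute force is fine for classroom-sized schemas but grows exponentially with the number of "both" vertices.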
2NF: Examples
(1) Consider F = {B → AH, L → CAt} over relation Bank-Loans(Bank, Assets, Headquarter, Loan#, Customer, Amount).
    B → A is in F+, where A is non-prime and B is a proper subset of the only candidate key, BL.
    Bank-Loans is not in 2NF.
(2) Consider F = {S → NMG, M → AO} over Students(SID, Name, Major, GPA, Advisor, Office).
    S is the only candidate key, and it has a single attribute. Students is in 2NF.
2NF relations still allow unwanted redundancy (e.g., through M → AO in Students).

Another Definition of 2NF
R is in 2NF if for every FD X → Y in F+,
  - Y ⊆ X (trivial); or
  - every attribute in Y is prime; or
  - X is not a proper subset of any candidate key.
In particular, R is in 2NF if every candidate key is a single attribute.

Third Normal Form (3NF)
Let F be a set of FDs satisfied by R.
R is in Third Normal Form (3NF) if for every FD X → A in F+,
  (a) A ⊆ X (trivial); or
  (b) every attribute in A is prime; or
  (c) X is a superkey.
Let X be a candidate key. If Y → B ∈ F+, B ∉ Y, B is non-prime, and Y is not a superkey, then B is non-trivially transitively dependent on X (via X → Y → B). 3NF removes this dependency.

3NF: Examples
(1) Consider F = {S → NASaDn, Dn → Ds} over Employees(SSN, Name, Age, Salary, Dept_name, Dept_manager_SSN).
    Employees is not in 3NF due to Dn → Ds.
(2) Consider F = {CS → Z, Z → C} over R(City, Street, Zipcode).
    R is in 3NF because every attribute is prime. (How many candidate keys are there?)
    3NF may still have redundancy (introduced by Z → C).

Boyce-Codd Normal Form (BCNF)
Let F be a set of FDs over R.
R is in Boyce-Codd Normal Form (BCNF) if for every FD X → A in F+,
  (a) A ⊆ X (trivial); or
  (b) X is a superkey.
Example: Consider R(City, Street, Zipcode) and F = {CS → Z, Z → C}. R is in 3NF but not in BCNF because, in Z → C, Z is not a superkey.

Normal Forms: Summary
BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF (every BCNF relation is in 3NF, every 3NF relation is in 2NF, and so on)
  - 2NF removes some insertion and deletion anomalies.
    It also removes redundancies caused by partial dependencies on a key.
  - 3NF removes further insertion and deletion anomalies (though not all of them; see the City/Street/Zipcode example).
    It also removes redundancies caused by transitive dependencies.
  - BCNF achieves everything 3NF achieves, and removes all redundancies caused by FDs.

Unnormalized
SSN → Name, Age, Address, PetID, PetName, PetAge, Type, License#, Vehicle, Color, VehPrice, Year
SSN → Name, Age, Address
PetID → PetName, PetAge, Type
License# → Vehicle, Color, VehPrice, Year
Vehicle → VehPrice
SSN : PetID :: 1 : M
SSN : License# :: M : M

EMPLOYEES
SSN  Name   Age  Address  PetID  PetName  PetAge  Type    License#  Vehicle  Color  VehPrice (K)  Year
111  joe    43   72 R     D2     buddy    1       dog     LN 03     van      grn    25            1991
                          L1     snipper  2       lizard
123  joe    22   57 R     bp1                             bl1
222  steve  32   12 C     C1     fluffy   1       cat     LN 01     viper    red    70            1999
                          P1     pete     2       parrot  LN 09     celica   yel    29            1987
                          L4     lenny    1       lizard
234  jim    35   18 C     C2     sassy    1       cat
333  fred   21   12 Q     F1     herman   1       frog    LN 04     jeep     blu    28            1995
                          L2     vinny    2       lizard  LN 06     wagon    red    10            1975
343  bob    17   15 H     F2     freddy   3       frog    LN 14     truck    blu    28            1982
                          S1     sneaky   2       snake
                          S2     sulky    2       snake
444  ann    21   32 F     D1     fido     3       dog
                          D4     arfy     3       dog
555  ann    21   32 F     C3     cotton   4       cat     LN 05     SUV      yel    35            1997
                                                          LN 15     SUV      red    35            1996
777  sally  25   54 Z     D3     mutz     5       dog     LN 07     jeep     blu    28            1995
788  sally  24   54 Z     D5     mutz2    4       dog     LN 18     camry    wht    23            1998
789  tasha  27   54 Z                                     LN 08     mustang  red    28            1991
987  elena  51   12 Q     L3     lizzy    3       lizard  LN 06     wagon    red    5             1975

1NF
SSN, PetID, License# → Name, Age, Address, PetName, PetAge, Type, Vehicle, Color, VehPrice, Year
SSN → Name, Age, Address
PetID → PetName, PetAge, Type
License# → Vehicle, Color, VehPrice, Year
Vehicle → VehPrice
SSN : PetID :: 1 : M
SSN : License# :: M : M

EMPLOYEES
SSN  Name   Age  Address  PetID  PetName  PetAge  Type    License#  Vehicle  Color  VehPrice (K)  Year
111  joe    43   72 R     D2     buddy    1       dog     LN 03     van      grn    25            1991
111  joe    43   72 R     L1     snipper  2       lizard  LN 03     van      grn    25            1991
123  joe    22   57 R     bp1                             bl1
222  steve  32   12 C     C1     fluffy   1       cat     LN 01     viper    red    70            1999
222  steve  32   12 C     P1     pete     2       parrot  LN 09     celica   yel    29            1987
222  steve  32   12 C     L4     lenny    1       lizard  LN 09     celica   yel    29            1987
234  jim    35   18 C     C2     sassy    1       cat     bl2
333  fred   21   12 Q     F1     herman   1       frog    LN 04     jeep     blu    28            1995
333  fred   53   12 Q     L2     vinny    2       lizard  LN 06     wagon    red    10            1975
343  bob    17   15 H     F2     freddy   3       frog    LN 14     truck    blu    28            1982
343  bob    17   15 H     S1     sneaky   2       snake   LN 14     truck    blu    28            1982
343  bob    17   15 H     S2     sulky    2       snake   LN 14     truck    blu    28            1982
444  ann    21   32 F     D1     fido     3       dog     bl3
444  ann    21   32 F     D4     arfy     3       dog     bl4
555  ann    21   32 F     C3     cotton   4       cat     LN 05     SUV      yel    35            1997
555  ann    21   32 F     C3     cotton   5       cat     LN 15     SUV      red    35            1996
777  sally  25   54 Z     D3     mutz     5       dog     LN 07     jeep     blu    28            1995
788  sally  24   54 Z     D5     mutz2    4       dog     LN 18     camry    wht    23            1998
789  tasha  27   54 Z     bp2                             LN 08     mustang  red    28            1991
987  elena  51   12 Q     L3     lizzy    3       lizard  LN 06     wagon    red    5             1975

Redundancy Unleashed
SSN, PetID, License# → Name, Age, Address, PetName, PetAge, Type, Vehicle, Color, VehPrice, Year
SSN → Name, Age, Address
PetID → PetName, PetAge, Type
License# → Vehicle, Color, VehPrice, Year
Vehicle → VehPrice
SSN : PetID :: 1 : M
SSN : License# :: M : M
LEGEND: redundant; inconsistent; redundant for 2 reasons

EMPLOYEES
SSN  Name   Age  Address  PetID  PetName  PetAge  Type    License#  Vehicle  Color  VehPrice (K)  Year
111  joe    43   72 R     D2     buddy    1       dog     LN 03     van      grn    25            1991
111  joe    43   72 R     L1     snipper  2       lizard  LN 03     van      grn    25            1991
123  joe    22   57 R     bp1                             bl1
222  steve  32   12 C     C1     fluffy   1       cat     LN 01     viper    red    70            1999
222  steve  32   12 C     P1     pete     2       parrot  LN 09     celica   yel    29            1987
222  steve  32   12 C     L4     lenny    1       lizard  LN 09     celica   yel    29            1987
234  jim    35   18 C     C2     sassy    1       cat     bl2
333  fred   21   12 Q     F1     herman   1       frog    LN 04     jeep     blu    28            1995
333  fred   53   12 Q     L2     vinny    2       lizard  LN 06     wagon    red    10            1975
343  bob    17   15 H     F2     freddy   3       frog    LN 14     truck    blu    28            1982
343  bob    17   15 H     S1     sneaky   2       snake   LN 14     truck    blu    28            1982
343  bob    17   15 H     S2     sulky    2       snake   LN 14     truck    blu    28            1982
444  ann    21   32 F     D1     fido     3       dog     bl3
444  ann    21   32 F     D4     arfy     3       dog     bl4
555  ann    21   32 F     C3     cotton   4       cat     LN 05     SUV      red    35            1997
555  ann    21   32 F     C3     cotton   5       cat     LN 15     SUV      red    30            1996
777  sally  25   54 Z     D3     mutz     5       dog     LN 07     jeep     grn    28            1995
788  sally  24   54 Z     D5     mutz2    4       dog     LN 18     camry    wht    23            1998
789  tasha  27   54 Z     bp2                             LN 08     mustang  red    28            1991
987  elena  51   12 Q     L3     lizzy    3       lizard  LN 06     wagon    blu    10            1975

2NF Raw – Part 1
SSN → Name, Age, Address
PetID → PetName, PetAge, Type, SSN
License# → Vehicle, Color, VehPrice, Year
Vehicle → VehPrice

PEOPLE
SSN  Name   Age  Address
111  joe    43   72 R
123  joe    22   57 R
222  steve  32   12 C
234  jim    35   18 C
333  fred   21   12 Q
343  bob    17   15 H
444  ann    21   32 F
555  ann    21   32 F
777  sally  25   54 Z
788  sally  24   54 Z
789  tasha  27   54 Z
987  elena  51   12 Q

PETS
PetID  PetName  PetAge  Type    SSN
D2     buddy    1       dog     111
L1     snipper  2       lizard  111
C1     fluffy   1       cat     222
P1     pete     2       parrot  222
L4     lenny    1       lizard  222
C2     sassy    1       cat     234
F1     herman   1       frog    333
L2     vinny    2       lizard  333
F2     freddy   3       frog    343
S1     sneaky   2       snake   343
S2     sulky    2       snake   343
D1     fido     3       dog     444
D4     arfy     3       dog     444
C3     cotton   4       cat     555
D3     mutz     5       dog     777
D5     mutz2    4       dog     788
bp2
L3     lizzy    3       lizard  987

2NF Raw – Part 2
JT
SSN  License#
111  LN 03
222  LN 01
222  LN 09
333  LN 04
333  LN 06
343  LN 14
555  LN 05
555  LN 15
777  LN 07
788  LN 18
789  LN 08
987  LN 06

VEHICLES
License#  Vehicle  Color  VehPrice (K)  Year
LN 03     van      grn    25            1991
LN 01     viper    red    70            1999
LN 09     celica   yel    29            1987
LN 04     jeep     blu    28            1995
LN 06     wagon    red    10            1975
LN 14     truck    blu    28            1982
LN 05     SUV      yel    35            1997
LN 15     SUV      red    35            1996
LN 07     jeep     blu    28            1995
LN 18     camry    wht    23            1998
LN 08     mustang  red    28            1991

2NF Clean – Part 1
SSN → Name, Age, Address
PetID → PetName, PetAge, Type, SSN
License# → Vehicle, Color, VehPrice, Year
Vehicle → VehPrice

PEOPLE: unchanged from 2NF Raw – Part 1.

PETS
PetID  PetName  PetAge  Type    SSN
C1     fluffy   1       cat     222
C2     sassy    1       cat     234
C3     cotton   4       cat     555
D1     fido     3       dog     444
D2     buddy    1       dog     111
D3     mutz     5       dog     777
D4     arfy     3       dog     444
D5     mutz2    4       dog     788
F1     herman   1       frog    333
F2     freddy   3       frog    343
L1     snipper  2       lizard  111
L2     vinny    2       lizard  333
L3     lizzy    3       lizard  987
L4     lenny    1       lizard  222
P1     pete     2       parrot  222
S1     sneaky   2       snake   343
S2     sulky    2       snake   343

2NF Clean – Part 2
JT: unchanged from 2NF Raw – Part 2.

VEHICLES
License#  Vehicle  Color  VehPrice (K)  Year
LN 01     viper    red    70            1999
LN 03     van      grn    25            1991
LN 04     jeep     blu    28            1995
LN 05     SUV      yel    35            1997
LN 06     wagon    red    10            1975
LN 07     jeep     blu    28            1995
LN 08     mustang  red    28            1991
LN 09     celica   yel    29            1987
LN 14     truck    blu    28            1982
LN 15     SUV      red    35            1996
LN 18     camry    wht    23            1998

3NF Clean – Part 1
SSN → Name, Age, Address
PetID → PetName, PetAge, Type, SSN
License# → Vehicle, Color, Year
Vehicle → VehPrice

PEOPLE and PETS: unchanged from 2NF Clean – Part 1.

3NF Clean – Part 2
JT: unchanged from 2NF Raw – Part 2.

VEHICLES
License#  Vehicle  Color  Year
LN 01     viper    red    1999
LN 03     van      grn    1991
LN 04     jeep     blu    1995
LN 05     SUV      yel    1997
LN 06     wagon    red    1975
LN 07     jeep     blu    1995
LN 08     mustang  red    1991
LN 09     celica   yel    1987
LN 14     truck    blu    1982
LN 15     SUV      red    1996
LN 18     camry    wht    1998

VEH
Vehicle  VehPrice (K)
camry    23
celica   29
jeep     28
mustang  28
SUV      35
truck    28
van      25
viper    70
wagon    10

Normalize the Following Relation
Universal Relation R(A, B, {C, D, K}, E, F(G, H, I), J)
Given: A → B, C → DK, E → F, F → GHI, K → EJ
A:C is M:N; C:K is 1:M (C is the many); K:E is 1:M (E is the many)
What do the parentheses indicate? What do the braces indicate?

E-R Diagram – Unnormalized
[Figure: E-R diagram of R with attributes A, B, C, D, E, F (composite of G, H, I), J, K]

Normalize the Following Relation
Universal Relation R(A, B, {C, D, K}, E, F(G, H, I), J)
Given: A → B, C → DK, E → F, F → GHI, K → EJ
Step 1: Remove any composite attributes.
  Either determine that the level of detail provided by G, H, I is unnecessary, OR remove F.
  For our purposes we will remove F.

Normalize the Following Relation
New Universal Relation R(A, B, {C, D, K}, E, G, H, I, J)
Given: A → B, C → DK, E → GHI, K → EJ
Step 2: Remove any multi-valued attributes.
  If there is a determinant within the MV attributes, make it part of the key: AC → BDK

Proof
Given: A → B
  (IR2) AC → BC (augmentation)
  (IR4) AC → B (decomposition)
Given: C → DK
  (IR2) AC → ADK (augmentation)
  (IR4) AC → DK (decomposition)
  (IR5) AC → BDK (union)

1NF
1NF Universal Relation R(A, B, C, D, E, G, H, I, J, K)
Given: AC → BDK, A → B, C → DK, E → GHI, K → EJ
Find all candidate keys:
  Vni = {A, C}; Voi = {B, D, G, H, I, J}; E and K have both incoming and outgoing edges.
  AC determines BDK, in which K determines EJ, in which E determines GHI (and C determines DK).
  The only candidate key is AC.

E-R Diagram – 1NF
[Figure: E-R diagram of the 1NF relation R]

Update Anomalies in 1NF
R(A, B, C, D, E, G, H, I, J, K)
AC → BDK, A → B, C → DK, E → GHI, K → EJ
Identify partial dependencies: A → B, C → DK
  - Can’t insert an ‘A’ without a ‘C’ (and vice versa).
  - If you delete an ‘A’, you may lose info about a ‘C’. What info would you lose?
  - If you change a ‘B’, you may have to change it in multiple places.

Going to 2NF
REMOVE PARTIAL DEPENDENCIES
R(A, B, C, D, E, G, H, I, J, K)
AC → BDK, A → B, C → DK, E → GHI, K → EJ
R1(A, B)
R2(C, D, K, E, G, H, I, J)
Given: A:C is M:N; therefore we need what?
R3(A, C)
  What is the PK for R3? Identify the FK(s).
Check: Are we in 2NF? Partial dependencies in R1? In R2?

E-R Diagram – 2NF
[Figure: E-R diagram with R1(A, B) and R2(C, D, K, E, G, H, I, J) related M:N through R3(A, C)]

Update Anomalies in 2NF
R1(A, B), R2(C, D, K, E, G, H, I, J), R3(A, C)
Identify transitive dependencies:
  Given: A → B, C → DK, E → GHI, K → EJ
  C → K, K → E, E → GHI
  - Can’t insert a ‘K’ without a ‘C’ (NOT vice versa).
  - If you delete a ‘C’, you may lose info about a ‘K’. What info would you lose?
  - If you change an ‘E’, you may have to change it in multiple places.

Going to 3NF
REMOVE TRANSITIVE DEPENDENCIES
R1(A, B): in 3NF; with only one non-key attribute, a transitive dependency is impossible!
R2(C, D, K, E, G, H, I, J)
R3(A, C)
A → B, C → DK, E → GHI, K → EJ
C:K is 1:M (C is the many); K:E is 1:M (E is the many)
R2 is replaced by: R4(C, D, K), R5(K, J), R6(E, G, H, I, K)

E-R Diagram – 3NF
[Figure: E-R diagram – 3NF]

Another Example
R(A, B, C, D, E, F, G, H, I, J)
AB → F G H I J    B:AB is 1:M
B → C D E         AB:H is M:N
H → I J
  - What is the candidate key?
  - What normal form is this relation in?
  - Are there any multi-valued attributes?
  - Are there any partial dependencies?
  - Are there any transitive dependencies?
  - Are there any FDs determining part of the CK?

1NF Anomalies
R(A, B, C, D, E, F, G, H, I, J)
AB → F G H I J    B:AB is 1:M
B → C D E         AB:H is M:N
H → I J
  - Insertion anomaly based on a partial dependency?
  - Deletion anomaly based on a partial dependency?
  - Modification anomaly based on a partial dependency?
To go to 2NF, decompose partial dependencies.

2NF
R1(A, B, F, G, H, I, J)    AB:H is M:N
R2(B, C, D, E)             B:AB is 1:M
AB → F G H I J, B → C D E, H → I J
  - What are the CKs now?
  - Are there any foreign keys?

2NF Anomalies
R1(A, B, F, G, H, I, J)    AB:H is M:N
R2(B, C, D, E)             B:AB is 1:M
AB → F G H I J, B → C D E, H → I J
  - Insertion anomaly based on a transitive dependency?
  - Deletion anomaly based on a transitive dependency?
  - Modification anomaly based on a transitive dependency?
To go to 3NF, decompose transitive dependencies.

3NF
R1(A, B, F, G, H, I, J)    AB:H is M:N
R2(B, C, D, E)             B:AB is 1:M
AB → F G H I J, B → C D E, H → I J
Decompose transitive dependencies; R2 is OK.
R3(A, B, F, G)    (R1 is gone!)
R4(H, I, J)
R5(A, B, H)
  - What are the candidate keys now?
  - What type of relation is R5?

BCNF
If G → B held, we would decompose further to achieve BCNF (in R3, G is not a superkey).

Could that last example be real?
R(A, B, C, D, E, F, G, H, I, J)
  A = dependent name
  B = employee SSN
  C, D, E = employee name, office, phone
  F, G = dependent room#, phone
  H = dependent car
  I, J = car make, model
Each employee can have many dependents, but each dependent has only one employee, hence the 1:M between B and AB.
Perhaps siblings share ownership of a car, hence the M:N between AB and H.
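The normal-form questions in this last example can be checked by brute force, since every nontrivial FD X → A in F+ has A in X+ minus X. The Python below is a hypothetical sketch, not the lecture's procedure: it assumes single-character attribute names and FDs written as strings such as "AB->FGHIJ", and the helper names (parse, closure, normal_form) are illustrative. To test a decomposed relation it computes closures under the full F and intersects them with the relation's attributes, which amounts to checking the projection of F+. The cardinality notes (B:AB is 1:M, AB:H is M:N) are not FDs and are ignored.

```python
from itertools import combinations

def parse(text):
    """Parse FDs like 'AB->FGHIJ, B->CDE' (assumes single-character attribute names)."""
    fds = []
    for part in text.split(","):
        lhs, rhs = part.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds

def closure(attrs, fds):
    """Attribute closure X+ under the full set of FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def normal_form(schema, fds):
    """Return (candidate keys, highest of 1NF/2NF/3NF/BCNF) for one relation."""
    schema = frozenset(schema)
    attrs = sorted(schema)
    proj = {}  # X -> (X+ restricted to this relation), for every subset X
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            proj[x] = closure(x, fds) & schema
    superkeys = [x for x, cx in proj.items() if cx == schema]
    keys = [k for k in superkeys if not any(s < k for s in superkeys)]
    prime = frozenset().union(*keys)
    is_2nf = is_3nf = is_bcnf = True
    for x, cx in proj.items():
        implied = cx - x                  # every A with a nontrivial X -> A in F+
        if not implied or cx == schema:
            continue                      # nothing implied, or X is a superkey
        is_bcnf = False                   # BCNF: X would have to be a superkey
        if implied - prime:               # some implied attribute is non-prime
            is_3nf = False
            if any(x < k for k in keys):  # X is a proper part of a candidate key
                is_2nf = False
    key_names = sorted("".join(sorted(k)) for k in keys)
    if is_bcnf:
        return key_names, "BCNF"
    if is_3nf:
        return key_names, "3NF"
    if is_2nf:
        return key_names, "2NF"
    return key_names, "1NF"

F = parse("AB->FGHIJ, B->CDE, H->IJ")
print(normal_form("ABCDEFGHIJ", F))  # (['AB'], '1NF')  the original R
print(normal_form("ABFGHIJ", F))     # (['AB'], '2NF')  R1 after the 2NF step
for name, rel in [("R2", "BCDE"), ("R3", "ABFG"), ("R4", "HIJ"), ("R5", "ABH")]:
    print(name, normal_form(rel, F))  # each relation reports 'BCNF'
# Hypothetical extra FD G -> B from the BCNF slide: R3 drops to 3NF
print(normal_form("ABFG", parse("AB->FGHIJ, B->CDE, H->IJ, G->B")))  # (['AB', 'AG'], '3NF')
```

Enumerating all subsets is exponential in the number of attributes, which is acceptable for the ten-attribute examples here but not for large schemas; the same function also reproduces the earlier City/Street/Zipcode result (keys CS and SZ, 3NF).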