Normal Forms

Logical Database Design (2 of 3)
John Ortiz
Finding All Candidate Keys (cont.)
Method 2 (manual approach):
Step 1: Draw the dependency graph of F. Each
vertex corresponds to an attribute. Edges are
defined as follows:
A -> B becomes an edge from A to B.
A -> BC becomes edges from A to B and from A to C.
AB -> C becomes edges from A to C and from B to C.
Lecture 7
Logical Database Design (2)
3
Finding All Candidate Keys (cont.)
Step 2: Identify the set of vertices Vni that
have no incoming edges.
Step 3: Identify the set of vertices Voi that
have only incoming edges.
Step 4: A candidate key is a set of attributes
that
contains all attributes in Vni
contains no attribute in Voi
has no subset that is already a candidate
key
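Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names `dependency_graph` and `vni_voi` and the encoding of an FD as a pair of attribute strings are assumptions made for the example.

```python
def dependency_graph(fds):
    """Step 1: every FD X -> Y gives an edge x -> y for each x in X, y in Y."""
    edges = set()
    for lhs, rhs in fds:
        for x in lhs:
            for y in rhs:
                if x != y:
                    edges.add((x, y))
    return edges


def vni_voi(attributes, fds):
    """Steps 2 and 3: vertices with no incoming edges, and with only incoming edges."""
    edges = dependency_graph(fds)
    has_in = {y for _, y in edges}
    has_out = {x for x, _ in edges}
    vni = {a for a in attributes if a not in has_in}
    voi = {a for a in attributes if a in has_in and a not in has_out}
    return vni, voi


# Hypothetical encoding of F = {A -> BC, CG -> HI, B -> H} over R(A, B, C, G, H, I).
fds = [("A", "BC"), ("CG", "HI"), ("B", "H")]
vni, voi = vni_voi("ABCGHI", fds)
print(sorted(vni), sorted(voi))  # ['A', 'G'] ['H', 'I']
```

An attribute that appears in no FD at all has no incoming edge, so it lands in Vni and must be part of every candidate key.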
An Example Using Method 2
Consider R(A, B, C, G, H, I), and
F = {A -> BC, CG -> HI, B -> H}
[Dependency graph: edges A -> B, A -> C, C -> H, C -> I, G -> H, G -> I, B -> H]
Vni = {A, G}, Voi = {H, I}.
Since (AG)+ = ABCGHI, AG is the only candidate
key of R.
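The closure (AG)+ used here can be computed with the usual fixed-point loop. The sketch below is illustrative only; the helper name `closure` and the string encoding of attribute sets are assumptions, not notation from the slides.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("A", "BC"), ("CG", "HI"), ("B", "H")]
print("".join(sorted(closure("AG", fds))))  # ABCGHI
print("".join(sorted(closure("A", fds))))   # ABCH
print("".join(sorted(closure("G", fds))))   # G
```

Neither A nor G alone reaches every attribute, which is why AG is minimal.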
Another Example Using Method 2
Consider R(A, B, C, D, E, H), and
F = {A -> B, AB -> E, BH -> C, C -> D, D -> A}
[Dependency graph: edges A -> B, A -> E, B -> E, B -> C, H -> C, C -> D, D -> A]
Vni = { H }, Voi = { E }.
Candidate keys: AH, BH, CH, DH.
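Step 4 can be sketched as a search over the attributes that are in neither Vni nor Voi, smallest sets first. This is one possible implementation under stated assumptions (the names `closure` and `candidate_keys` and the string encoding are hypothetical); the slides describe the method as a manual one.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    attributes = set(attributes)
    edges = {(x, y) for lhs, rhs in fds for x in lhs for y in rhs if x != y}
    has_in = {y for _, y in edges}
    has_out = {x for x, _ in edges}
    vni = attributes - has_in                  # must be in every key
    voi = (has_in - has_out) & attributes      # can be in no key
    middle = sorted(attributes - vni - voi)    # may or may not be needed
    keys = []
    for size in range(len(middle) + 1):
        for extra in combinations(middle, size):
            cand = vni | set(extra)
            if any(k <= cand for k in keys):
                continue  # a subset is already a candidate key
            if closure(cand, fds) == attributes:
                keys.append(cand)
    return keys


fds = [("A", "B"), ("AB", "E"), ("BH", "C"), ("C", "D"), ("D", "A")]
print(sorted("".join(sorted(k)) for k in candidate_keys("ABCDEH", fds)))
# ['AH', 'BH', 'CH', 'DH']
```

Because candidates are tried in order of increasing size, any set that passes the subset check and has full closure is minimal.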
Normal Forms
 If a relation is in a certain normal form
(BCNF, 3NF, …), certain types of redundancy
are known to be avoided or eliminated.
 A relation schema R is in First Normal Form
(1NF) if every attribute of R takes only single
and atomic values.
Every relation is in 1NF
1NF allows all kinds of redundancy
Higher normal forms are defined in terms of
FDs.
Second Normal Form (2NF)
Let F be a set of FDs satisfied by R.
 An attribute of R is prime if it appears in a
candidate key (according to F) of R.
 Y is fully functionally dependent on X if F
implies X -> Y, but not W -> Y for any W ⊂ X.
 R is in Second Normal Form (2NF) if every
non-prime attribute of R is fully functionally
dependent on every candidate key.
If a part of a candidate key can determine a
non-prime attribute, R is not in 2NF.
2NF: Examples
(1) Consider F = {B -> AH, L -> CAt} over relation
Bank-Loans (Bank, Assets, Headquarter,
Loan#, Customer, Amount)
 B -> A is in F+, where A is non-prime and B is a
proper subset of the candidate key BL.
Bank-Loans is not in 2NF.
(2) Consider F = {S -> NMG, M -> AO} over
Students(SID,Name,Major,GPA,Advisor,Office)
 S is the only candidate key, and has a single
attribute. Students is in 2NF.
2NF relations still allow unwanted redundancy
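The 2NF test on these two examples can be sketched as a search for partial dependencies. The function name `partial_dependencies` and the tuple encoding of FDs are assumptions for illustration, and the candidate keys are passed in by hand: {Bank, Loan#} for Bank-Loans and {SID} for Students.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(fds, keys):
    """Pairs (part, attr): a proper subset of a key determines a non-prime attr."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                for attr in sorted(closure(part, fds) - prime):
                    found.append((part, attr))
    return found


bank_fds = [(("Bank",), ("Assets", "Headquarter")),
            (("Loan#",), ("Customer", "Amount"))]
print(partial_dependencies(bank_fds, [{"Bank", "Loan#"}]))

student_fds = [(("SID",), ("Name", "Major", "GPA")),
               (("Major",), ("Advisor", "Office"))]
print(partial_dependencies(student_fds, [{"SID"}]))  # []
```

An empty result means the relation is in 2NF. A single-attribute key has no non-empty proper subset, so Students passes trivially.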
Another Definition of 2NF
 R is in 2NF if for every FD X -> Y in F+,
Y ⊆ X (trivial); or
every attribute in Y is prime; or
X is not a proper subset of any candidate
key.
R is in 2NF if every candidate key is a single
attribute
Third Normal Form (3NF)
Let F be a set of FDs satisfied by R.
 R is in Third Normal Form (3NF) if for every
FD X -> A in F+,
(a) A ⊆ X (trivial); or
(b) every attribute in A is prime; or
(c) X is a superkey.
 Let X be a candidate key. If Y -> B ∈ F+, B ∉ Y,
B is non-prime, and Y is not a superkey,
then B is non-trivially transitively dependent
on X. 3NF removes this dependency.
3NF: Examples
(1) Consider F = {S -> NASaDn, Dn -> Ds} over
Employees (SSN, Name, Age, Salary,
Dept_name, Dept_manager_SSN)
 Employees is not in 3NF due to Dn -> Ds.
(2) Consider F = { CS -> Z, Z -> C } over
R(City, Street, Zipcode)
 R is in 3NF as each attribute is prime (How
many candidate keys are there?).
3NF may still have redundancy (introduced by
Z -> C)
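Both examples can be checked with a short sketch of the 3NF test. It examines only the FDs listed in F, split into single right-hand attributes, rather than all of F+, which is the usual shortcut. The name `violates_3nf` and the encoding are assumptions, and the candidate keys are supplied by hand.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(attributes, fds, keys):
    """FDs X -> A that are non-trivial, with A non-prime and X not a superkey."""
    attributes = set(attributes)
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) == attributes
        for a in rhs:
            if not (a in lhs or a in prime or superkey):
                bad.append((lhs, a))
    return bad


emp_attrs = ["SSN", "Name", "Age", "Salary", "Dept_name", "Dept_manager_SSN"]
emp_fds = [(("SSN",), ("Name", "Age", "Salary", "Dept_name")),
           (("Dept_name",), ("Dept_manager_SSN",))]
print(violates_3nf(emp_attrs, emp_fds, [{"SSN"}]))
# [(('Dept_name',), 'Dept_manager_SSN')]

addr_fds = [(("City", "Street"), ("Zipcode",)), (("Zipcode",), ("City",))]
addr_keys = [{"City", "Street"}, {"Street", "Zipcode"}]
print(violates_3nf(["City", "Street", "Zipcode"], addr_fds, addr_keys))  # []
```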
Boyce-Codd Normal Form (BCNF)
Let F be a set of FDs over R.
 R is in Boyce-Codd Normal Form (BCNF) if for
every FD X -> A in F+,
(a) A ⊆ X (trivial); or
(b) X is a superkey.
Example: Consider R(City, Street, Zipcode) and F
= { CS -> Z, Z -> C }. R is in 3NF but not in
BCNF because in Z -> C, Z is not a superkey.
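The BCNF condition drops the "prime attribute" escape clause, so the check reduces to a superkey test on each left-hand side. A minimal sketch, assuming a hypothetical helper `bcnf_violations` and single-letter attribute names C, S, Z for City, Street, Zipcode:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs in F whose left side is not a superkey."""
    attributes = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != attributes]


fds = [("CS", "Z"), ("Z", "C")]
print(bcnf_violations("CSZ", fds))  # [('Z', 'C')]
```

Only Z -> C is reported: (CS)+ covers all three attributes, while Z+ = {Z, C} does not.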
Normal Forms: Summary
 BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF
 2NF removes some insertion anomalies and
deletion anomalies. Also removes redundancies
caused by partial dependencies on a key.
 3NF removes further insertion anomalies and
deletion anomalies. Also removes redundancies
caused by transitive dependencies.
 BCNF achieves all that is achieved by 3NF,
and removes all redundancies caused by FDs.
Unnormalized
SSN --> Name, Age, Address, PetID, PetName, PetAge, Type, License#, Vehicle, Color, VehPrice, Year
SSN --> Name, Age, Address
PetID --> PetName, PetAge, Type
License# --> Vehicle, Color, VehPrice, Year
Vehicle --> VehPrice
EMPLOYEES
(Rows with a blank SSN continue the multi-valued cells of the row above.)
SSN  Name   Age  Address  PetID  PetName  PetAge  Type    License#  Vehicle  Color  VehPrice (K)  Year
111  joe    43   72 R     D2     buddy    1       dog     LN 03     van      grn    25            1991
                          L1     snipper  2       lizard
123  joe    22   57 R     bp1                             bl1
222  steve  32   12 C     C1     fluffy   1       cat     LN 01     viper    red    70            1999
                          P1     pete     2       parot   LN 09     celica   yel    29            1987
                          L4     lenny    1       lizard
234  jim    35   18 C     C2     sassy    1       cat
333  fred   21   12 Q     F1     herman   1       frog    LN 04     jeep     blu    28            1995
                          L2     vinny    2       lizard  LN 06     wagon    red    10            1975
343  bob    17   15 H     F2     feddy    3       frog    LN 14     truck    blu    28            1982
                          S1     sneaky   2       snake
                          S2     sulky    2       snake
444  ann    21   32 F     D1     fido     3       dog
                          D4     arfy     3       dog
555  ann    21   32 F     C3     cotton   4       cat     LN 05     SUV      yel    35            1997
                                                          LN 15     SUV      red    35            1996
777  sally  25   54 Z     D3     mutz     5       dog     LN 07     jeep     blu    28            1995
788  sally  24   54 Z     D5     mutz2    4       dog     LN 18     camry    wht    23            1998
789  tasha  27   54 Z                                     LN 08     mustang  red    28            1991
987  elena  51   12 Q     L3     lizzy    3       lizard  LN 06     wagon    red    5             1975
SSN : PetID :: 1 : M
SSN : License# :: M : M
1NF
SSN, PetID, License# --> Name, Age, Address, PetName, PetAge, Type, Vehicle, Color, VehPrice, Year
SSN --> Name, Age, Address
PetID --> PetName, PetAge, Type
License# --> Vehicle, Color, VehPrice, Year
Vehicle --> VehPrice
SSN : PetID :: 1 : M
SSN : License# :: M : M
EMPLOYEES
SSN  Name   Age  Address  PetID  PetName  PetAge  Type    License#  Vehicle  Color  VehPrice (K)  Year
111  joe    43   72 R     D2     buddy    1       dog     LN 03     van      grn    25            1991
111  joe    43   72 R     L1     snipper  2       lizard  LN 03     van      grn    25            1991
123  joe    22   57 R     bp1                             bl1
222  steve  32   12 C     C1     fluffy   1       cat     LN 01     viper    red    70            1999
222  steve  32   12 C     P1     pete     2       parot   LN 09     celica   yel    29            1987
222  steve  32   12 C     L4     lenny    1       lizard  LN 09     celica   yel    29            1987
234  jim    35   18 C     C2     sassy    1       cat     bl2
333  fred   21   12 Q     F1     herman   1       frog    LN 04     jeep     blu    28            1995
333  fred   53   12 Q     L2     vinny    2       lizard  LN 06     wagon    red    10            1975
343  bob    17   15 H     F2     freddy   3       frog    LN 14     truck    blu    28            1982
343  bob    17   15 H     S1     sneaky   2       snake   LN 14     truck    blu    28            1982
343  bob    17   15 H     S2     sulky    2       snake   LN 14     truck    blu    28            1982
444  ann    21   32 F     D1     fido     3       dog     bl3
444  ann    21   32 F     D4     arfy     3       dog     bl4
555  ann    21   32 F     C3     cotton   4       cat     LN 05     SUV      yel    35            1997
555  ann    21   32 F     C3     cotton   5       cat     LN 15     SUV      red    35            1996
777  sally  25   54 Z     D3     mutz     5       dog     LN 07     jeep     blu    28            1995
788  sally  24   54 Z     D5     mutz2    4       dog     LN 18     camry    wht    23            1998
789  tasha  27   54 Z     bp2                             LN 08     mustang  red    28            1991
987  elena  51   12 Q     L3     lizzy    3       lizard  LN 06     wagon    red    5             1975
Redundancy Unleashed
SSN, PetID, License# --> Name, Age, Address, PetName, PetAge, Type, Vehicle, Color, VehPrice, Year
SSN --> Name, Age, Address
PetID --> PetName, PetAge, Type
License# --> Vehicle, Color, VehPrice, Year
Vehicle --> VehPrice
EMPLOYEES
SSN  Name   Age  Address  PetID  PetName  PetAge  Type    License#  Vehicle  Color  VehPrice (K)  Year
111  joe    43   72 R     D2     buddy    1       dog     LN 03     van      grn    25            1991
111  joe    43   72 R     L1     snipper  2       lizard  LN 03     van      grn    25            1991
123  joe    22   57 R     bp1                             bl1
222  steve  32   12 C     C1     fluffy   1       cat     LN 01     viper    red    70            1999
222  steve  32   12 C     P1     pete     2       parot   LN 09     celica   yel    29            1987
222  steve  32   12 C     L4     lenny    1       lizard  LN 09     celica   yel    29            1987
234  jim    35   18 C     C2     sassy    1       cat     bl2
333  fred   21   12 Q     F1     herman   1       frog    LN 04     jeep     blu    28            1995
333  fred   53   12 Q     L2     vinny    2       lizard  LN 06     wagon    red    10            1975
343  bob    17   15 H     F2     freddy   3       frog    LN 14     truck    blu    28            1982
343  bob    17   15 H     S1     sneaky   2       snake   LN 14     truck    blu    28            1982
343  bob    17   15 H     S2     sulky    2       snake   LN 14     truck    blu    28            1982
444  ann    21   32 F     D1     fido     3       dog     bl3
444  ann    21   32 F     D4     arfy     3       dog     bl4
555  ann    21   32 F     C3     cotton   4       cat     LN 05     SUV      red    35            1997
555  ann    21   32 F     C3     cotton   5       cat     LN 15     SUV      red    30            1996
777  sally  25   54 Z     D3     mutz     5       dog     LN 07     jeep     grn    28            1995
788  sally  24   54 Z     D5     mutz2    4       dog     LN 18     camry    wht    23            1998
789  tasha  27   54 Z     bp2                             LN 08     mustang  red    28            1991
987  elena  51   12 Q     L3     lizzy    3       lizard  LN 06     wagon    blu    10            1975
SSN : PetID :: 1 : M
SSN : License# :: M : M
LEGEND: redundant; inconsistent; redundant for 2 reasons
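Inconsistencies of this kind can be detected mechanically by checking whether a declared FD actually holds in the stored rows. The sketch below is illustrative: the function name `fd_violations` and the dictionary row format are assumptions, and the four sample rows are copied from the EMPLOYEES table (SSN 333 and 343).

```python
def fd_violations(rows, lhs, rhs):
    """Return lhs values whose rows disagree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        seen.setdefault(key, set()).add(val)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


rows = [
    {"SSN": 333, "Name": "fred", "Age": 21, "PetID": "F1"},
    {"SSN": 333, "Name": "fred", "Age": 53, "PetID": "L2"},
    {"SSN": 343, "Name": "bob", "Age": 17, "PetID": "F2"},
    {"SSN": 343, "Name": "bob", "Age": 17, "PetID": "S1"},
]
print(fd_violations(rows, ["SSN"], ["Name", "Age"]))
# {(333,): [('fred', 21), ('fred', 53)]}
```

The rows for SSN 343 are redundant but consistent, so only SSN 333 is reported as breaking SSN --> Name, Age.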
2NF Raw – Part1
SSN --> Name, Age, Address
PetID --> PetName, PetAge, Type, SSN
License# --> Vehicle, Color, VehPrice, Year
Vehicle --> VehPrice
PEOPLE
SSN  Name   Age  Address
111  joe    43   72 R
123  joe    22   57 R
222  steve  32   12 C
234  jim    35   18 C
333  fred   21   12 Q
343  bob    17   15 H
444  ann    21   32 F
555  ann    21   32 F
777  sally  25   54 Z
788  sally  24   54 Z
789  tasha  27   54 Z
987  elena  51   12 Q
PETS
PetID  PetName  PetAge  Type    SSN
D2     buddy    1       dog     111
L1     snipper  2       lizard  111
C1     fluffy   1       cat     222
P1     pete     2       parot   222
L4     lenny    1       lizard  222
C2     sassy    1       cat     234
F1     herman   1       frog    333
L2     vinny    2       lizard  333
F2     freddy   3       frog    343
S1     sneaky   2       snake   343
S2     sulky    2       snake   343
D1     fido     3       dog     444
D4     arfy     3       dog     444
C3     cotton   4       cat     555
D3     mutz     5       dog     777
D5     mutz2    4       dog     788
bp2
L3     lizzy    3       lizard  987
2NF Raw – Part2
JT
SSN  License#
111  LN 03
222  LN 01
222  LN 09
333  LN 04
333  LN 06
343  LN 14
555  LN 05
555  LN 15
777  LN 07
788  LN 18
789  LN 08
987  LN 06
VEHICLES
License#  Vehicle  Color  VehPrice (K)  Year
LN 03     van      grn    25            1991
LN 01     viper    red    70            1999
LN 09     celica   yel    29            1987
LN 04     jeep     blu    28            1995
LN 06     wagon    red    10            1975
LN 14     truck    blu    28            1982
LN 05     SUV      yel    35            1997
LN 15     SUV      red    35            1996
LN 07     jeep     blu    28            1995
LN 18     camry    wht    23            1998
LN 08     mustang  red    28            1991
2NF Clean – Part1
SSN --> Name, Age, Address
PetID --> PetName, PetAge, Type, SSN
License# --> Vehicle, Color, VehPrice, Year
Vehicle --> VehPrice
PEOPLE
SSN  Name   Age  Address
111  joe    43   72 R
123  joe    22   57 R
222  steve  32   12 C
234  jim    35   18 C
333  fred   21   12 Q
343  bob    17   15 H
444  ann    21   32 F
555  ann    21   32 F
777  sally  25   54 Z
788  sally  24   54 Z
789  tasha  27   54 Z
987  elena  51   12 Q
PETS
PetID  PetName  PetAge  Type    SSN
C1     fluffy   1       cat     222
C2     sassy    1       cat     234
C3     cotton   4       cat     555
D1     fido     3       dog     444
D2     buddy    1       dog     111
D3     mutz     5       dog     777
D4     arfy     3       dog     444
D5     mutz2    4       dog     788
F1     herman   1       frog    333
F2     freddy   3       frog    343
L1     snipper  2       lizard  111
L2     vinny    2       lizard  333
L3     lizzy    3       lizard  987
L4     lenny    1       lizard  222
P1     pete     2       parot   222
S1     sneaky   2       snake   343
S2     sulky    2       snake   343
2NF Clean – Part2
JT
SSN  License#
111  LN 03
222  LN 01
222  LN 09
333  LN 04
333  LN 06
343  LN 14
555  LN 05
555  LN 15
777  LN 07
788  LN 18
789  LN 08
987  LN 06
VEHICLES
License#  Vehicle  Color  VehPrice (K)  Year
LN 01     viper    red    70            1999
LN 03     van      grn    25            1991
LN 04     jeep     blu    28            1995
LN 05     SUV      yel    35            1997
LN 06     wagon    red    10            1975
LN 07     jeep     blu    28            1995
LN 08     mustang  red    28            1991
LN 09     celica   yel    29            1987
LN 14     truck    blu    28            1982
LN 15     SUV      red    35            1996
LN 18     camry    wht    23            1998
3NF Clean – Part1
SSN --> Name, Age, Address
PetID --> PetName, PetAge, Type, SSN
License# --> Vehicle, Color, Year
Vehicle --> VehPrice
PEOPLE
SSN  Name   Age  Address
111  joe    43   72 R
123  joe    22   57 R
222  steve  32   12 C
234  jim    35   18 C
333  fred   21   12 Q
343  bob    17   15 H
444  ann    21   32 F
555  ann    21   32 F
777  sally  25   54 Z
788  sally  24   54 Z
789  tasha  27   54 Z
987  elena  51   12 Q
PETS
PetID  PetName  PetAge  Type    SSN
C1     fluffy   1       cat     222
C2     sassy    1       cat     234
C3     cotton   4       cat     555
D1     fido     3       dog     444
D2     buddy    1       dog     111
D3     mutz     5       dog     777
D4     arfy     3       dog     444
D5     mutz2    4       dog     788
F1     herman   1       frog    333
F2     freddy   3       frog    343
L1     snipper  2       lizard  111
L2     vinny    2       lizard  333
L3     lizzy    3       lizard  987
L4     lenny    1       lizard  222
P1     pete     2       parot   222
S1     sneaky   2       snake   343
S2     sulky    2       snake   343
3NF Clean – Part2
JT
SSN  License#
111  LN 03
222  LN 01
222  LN 09
333  LN 04
333  LN 06
343  LN 14
555  LN 05
555  LN 15
777  LN 07
788  LN 18
789  LN 08
987  LN 06
VEHICLES
License#  Vehicle  Color  Year
LN 01     viper    red    1999
LN 03     van      grn    1991
LN 04     jeep     blu    1995
LN 05     SUV      yel    1997
LN 06     wagon    red    1975
LN 07     jeep     blu    1995
LN 08     mustang  red    1991
LN 09     celica   yel    1987
LN 14     truck    blu    1982
LN 15     SUV      red    1996
LN 18     camry    wht    1998
VEH
Vehicle  VehPrice (K)
camry    23
celica   29
jeep     28
mustang  28
SUV      35
truck    28
van      25
viper    70
wagon    10
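The five 3NF tables can be declared and joined back together with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types are guesses, the column is named License rather than License# because # is not valid in an unquoted SQL identifier, and only the rows for SSN 111 are loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PEOPLE (SSN INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Address TEXT);
CREATE TABLE PETS (PetID TEXT PRIMARY KEY, PetName TEXT, PetAge INTEGER, Type TEXT,
                   SSN INTEGER REFERENCES PEOPLE(SSN));
CREATE TABLE VEH (Vehicle TEXT PRIMARY KEY, VehPrice INTEGER);
CREATE TABLE VEHICLES (License TEXT PRIMARY KEY,
                       Vehicle TEXT REFERENCES VEH(Vehicle),
                       Color TEXT, Year INTEGER);
CREATE TABLE JT (SSN INTEGER REFERENCES PEOPLE(SSN),
                 License TEXT REFERENCES VEHICLES(License),
                 PRIMARY KEY (SSN, License));
""")
con.execute("INSERT INTO PEOPLE VALUES (111, 'joe', 43, '72 R')")
con.executemany("INSERT INTO PETS VALUES (?, ?, ?, ?, ?)",
                [("D2", "buddy", 1, "dog", 111),
                 ("L1", "snipper", 2, "lizard", 111)])
con.execute("INSERT INTO VEH VALUES ('van', 25)")
con.execute("INSERT INTO VEHICLES VALUES ('LN 03', 'van', 'grn', 1991)")
con.execute("INSERT INTO JT VALUES (111, 'LN 03')")

# Joining on the foreign keys rebuilds the wide 1NF rows for SSN 111.
rows = con.execute("""
SELECT p.SSN, p.Name, t.PetID, j.License, v.Vehicle, h.VehPrice
FROM PEOPLE p
JOIN PETS t ON t.SSN = p.SSN
JOIN JT j ON j.SSN = p.SSN
JOIN VEHICLES v ON v.License = j.License
JOIN VEH h ON h.Vehicle = v.Vehicle
ORDER BY t.PetID
""").fetchall()
print(rows)
# [(111, 'joe', 'D2', 'LN 03', 'van', 25), (111, 'joe', 'L1', 'LN 03', 'van', 25)]
```

Each fact (joe's age, the van's price) is stored once, yet the join returns the same two rows that the 1NF table held for SSN 111.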
Normalize the Following Relation
 Universal Relation R
 (A, B, {C, D, K}, E, F(G, H, I), J)
 Given:
 A -> B, C -> DK, E -> F, F -> GHI, K -> EJ
 A:C is M:N, C:K is 1:M (C is the many), K:E is
1:M (E is the many)
 What do the parentheses indicate?
 What do the braces indicate?
E-R Diagram - Unnormalized
[E-R diagram: entity R with attributes A, B, C, D, K, E, F (composite of G, H, I), J]
Normalize the Following Relation
 Universal Relation R
 (A, B, {C, D, K}, E, F(G, H, I), J)
 Given:
 A -> B, C -> DK, E -> F, F -> GHI, K -> EJ
 Step 1: Remove any composite attributes
Either determine that the level of detail
provided by G, H, I is unnecessary (keep only F)
OR remove F (keep G, H, I)
 For our purposes we will remove F
Normalize the Following Relation
 New Universal Relation R
 (A, B, {C, D, K}, E, G, H, I, J)
 Given:
 A -> B, C -> DK, E -> GHI, K -> EJ
 Step 2: Remove any multi-valued attributes
If there is a determinant within the MV
attributes, make it part of the key
AC -> BDK
Proof
 Given: A -> B
 (IR2) AC -> BC (augmentation)
 (IR4) AC -> B (decomposition)
 Given: C -> DK
 (IR2) AC -> ADK (augmentation)
 (IR4) AC -> DK (decomposition)
 (IR5) AC -> BDK (union)
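The derived FD can be cross-checked with attribute closure instead of the inference rules: X -> Y follows from F exactly when Y is contained in X+. A small sketch, with `closure` and `implies` as hypothetical helper names:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True when lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [("A", "B"), ("C", "DK")]
print(implies(fds, "AC", "BDK"))  # True
print(implies(fds, "A", "D"))     # False
```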
1NF
 1NF Universal Relation R
 R(A, B, C, D, E, G, H, I, J, K)
 Given:
 AC -> BDK, A -> B, C -> DK, E -> GHI,
 K -> EJ
 Find all Candidate Keys:
 Vni = {A, C}, Voi = {B, D, G, H, I, J}; E and K
have both incoming and outgoing edges
 AC determines BDK, in which K determines EJ, in
which E determines GHI, so (AC)+ contains every
attribute of R
 Only Candidate Key is AC
E-R Diagram - 1NF
[E-R diagram: entity R with attributes A, B, C, D, E, G, H, I, J, K]
Update Anomalies in 1NF
 R(A, B, C, D, E, G, H, I, J, K)
 AC -> BDK, A -> B, C -> DK, E -> GHI,
 K -> EJ
 Identify Partial Dependencies:
 A -> B, C -> DK
 Can’t insert an ‘A’ without a ‘C’ (and vice versa)
 If you delete an ‘A’, you may lose info about ‘C’
What info would you lose?
 If you change a ‘B’, you may have to change it in
multiple places
Going to 2NF
 REMOVE PARTIAL DEPENDENCIES
 R(A, B, C, D, E, G, H, I, J, K)
 AC -> BDK, A -> B, C -> DK, E -> GHI, K -> EJ
 R1(A, B)
 R2(C, D, K, E, G, H, I, J)
 Given: A:C is M:N, therefore we need what?
 R3(A, C) What is the PK for R3?
 Identify the FK(s).
 Check: Are we in 2NF?
 Part. Deps. in R1?, R2?
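The split into R1, R2 and R3 can be reproduced with a small sketch that moves every partially dependent attribute into a relation keyed by the part of the key that determines it. The function name `decompose_2nf` is hypothetical, and the sketch assumes a single candidate key; with larger keys, nested subsets could yield overlapping relations.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def decompose_2nf(attributes, fds, key):
    attributes, key = set(attributes), set(key)
    relations, moved = [], set()
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) - key
            if dependents:
                # One relation per key part, holding what that part determines.
                relations.append(set(part) | dependents)
                moved |= dependents
    # The full key stays with whatever still depends on all of it.
    relations.append(key | (attributes - key - moved))
    return relations


fds = [("AC", "BDK"), ("A", "B"), ("C", "DK"), ("E", "GHI"), ("K", "EJ")]
print(["".join(sorted(r)) for r in decompose_2nf("ABCDEGHIJK", fds, "AC")])
# ['AB', 'CDEGHIJK', 'AC']
```

The three sets correspond to R1(A, B), R2(C, D, K, E, G, H, I, J) and R3(A, C).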
E-R Diagram – 2NF
[E-R diagram: entity R1 (attributes A, B) and entity R2 (attributes C, D, K, E, G, H, I, J) joined by the M:N relationship R3]
Update Anomalies in 2NF
 R1(A, B), R2(C, D, K, E, G, H, I, J), R3(A, C)
 Identify Transitive Dependencies:
 Given: A -> B, C -> DK, E -> GHI, K -> EJ
 C -> K, K -> E, E -> GHI
 Can’t insert a ‘K’ without a ‘C’ (NOT
vice versa)
 If you delete a ‘C’, you may lose info about ‘K’
What info would you lose?
 If you change an ‘E’, you may have to change it in
multiple places
Going to 3NF
 REMOVE TRANSITIVE DEPENDENCIES
 R1(A, B): in 3NF, only one non-key attribute,
so a transitive dependency is impossible!
 R2(C, D, K, E, G, H, I, J)
 R3(A, C)
 A -> B, C -> DK, E -> GHI, K -> EJ
 C:K is 1:M (C is the many), K:E is 1:M (E is
the many)
 R2 is replaced by:
 R4(C, D, K), R5(K, J), R6(E, G, H, I, K)
E-R Diagram – 3NF
[E-R diagram: entities R1 (A, B), R4, R5 and R6 with attributes C, D, E, G, H, I, J, K, linked by relationships R3, R7 and R8; cardinality labels 1, M and N]
Another Example
 R(A, B, C, D, E, F, G, H, I, J)
 AB -> F G H I J
 B -> C D E
 H -> I J
 B:AB is 1:M
 AB:H is M:N
 What is the candidate key?
 What normal form is this relation in?
Are there any multi-valued attributes?
Are there any partial dependencies?
Are there any transitive dependencies?
Are there any FDs determining part of the CK?
1NF Anomalies
 R(A, B, C, D, E, F, G, H, I, J)
 AB -> F G H I J
 B -> C D E
 H -> I J
 B:AB is 1:M
 AB:H is M:N
 Insertion Anomaly based on Part. Dep.?
 Deletion Anomaly based on Part. Dep.?
 Modification Anomaly based on Part. Dep.?
 To go to 2NF, Decompose Partial
Dependencies
2NF
 R1(A, B, F, G, H, I, J), where AB:H is M:N
 R2(B, C, D, E), where B:AB is 1:M
 AB -> F G H I J, B -> C D E, H -> I J
 What are the CKs now?
 Are there any foreign keys?
2NF Anomalies
 R1(A, B, F, G, H, I, J), where AB:H is M:N
 R2(B, C, D, E), where B:AB is 1:M
 AB -> F G H I J, B -> C D E, H -> I J
 Insertion Anomaly based on Trans. Dep.?
 Deletion Anomaly based on Trans. Dep.?
 Modification Anomaly based on Trans. Dep.?
 To go to 3NF, Decompose Transitive
Dependencies
3NF
 R1(A, B, F, G, H, I, J), where AB:H is M:N
 R2(B, C, D, E), where B:AB is 1:M
 AB -> F G H I J, B -> C D E, H -> I J
 Decompose transitive dependencies, R2 is ok
 R3(A, B, F, G), R1 is gone!
 R4(H, I, J)
 R5(A, B, H)
 What are the candidate keys now?
 What type of relation is R5?
BCNF
 If G -> B then we would decompose further to
achieve BCNF
Could that last example be real?
 R(A, B, C, D, E, F, G, H, I, J)
 A = Depen. Name
 B = Emp. SSN, C D E = Emp. Name, Off, Ph
 F G = Depen. Rm#, Ph
 H = Depen. Car, I J = car make, model
 Each employee can have many dependents, but
each dependent has only 1 employee, hence
the 1:M between B and AB.
 Perhaps siblings share ownership of the car,
hence the M:N between AB and H