ppt

advertisement
Examples
An FD holds, or does not hold on an instance:
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
Mike
9876
Salesrep
E1111
Smith
9876
Salesrep
22
Example
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
Mike
9876
Salesrep
E1111
Position  Phone
Smith
9876
Salesrep
33
Constraints Prevent (some)
Anomalies in the Data
A poorly designed database causes anomalies:
Student
Mary
Course
CS351
Room
V12
If every course is in only
one room, contains
redundant information!
If we update the room
number for just one tuple
(Update anomaly)
Joe
CS351
V12
Suppose
everyone
drops the
course suddenly… we lose
information about where the course is! (Delete Anomaly)
Not be able to reserve rooms for courses without
Sam
CS351
V12
4
students.(Insert anomaly)
Constraints Prevent (some)
Anomalies in the Data
Is this better?
Student
Mary
Course
CS351
Course
•
•
Joe
CS351
•
Room
CS351
V12
Redundancy?
Update anomaly?
Delete Anomaly?
5
Continue
6
Terminology

Schema

Schema Instances

Relations under the schema
7
Examples
schema
instances:
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
EmpID
Mike
Name
9876
Phone
Salesrep
Position
E9999
E0045
Mary
Smith
1234
1234
Lawyer
Clerk
99
Example: R(A,B)
1
2
3
4
5
1
2
3
4
1
3
1
10
Functional Dependency
A form of constraint on instances of a schema

A relation is an instance of its schema

Holds on some instances not others.

Part of the schema, helps define a valid instance.
11
Functional Dependencies
X ->Y is an assertion about a relation R that
whenever two tuples of R agree on all the
attributes of X, then they must also agree on
all attributes in set Y.
“Whenever two tuples agree on X then they agree on Y.”
12
Functional Dependencies
X ->Y is an assertion about a relation R that
whenever two tuples of R agree on all the
attributes of X, then they must also agree on
all attributes in set Y.
 Say “X ->Y holds in R.”
 Convention: …, X, Y, Z represent sets of
attributes; A, B, C,… represent single attributes.
 Convention: no set formers in sets of attributes,
just ABC, rather than {A,B,C }.
13
A Picture Of FDs
A1
…
Am
B1
…
Bn
t1
t2
If t1,t2 agree here
Also agree here!
FD X Y on R where X={A1,…,Am} and Y = {B1,…Bm} holds
If for all t1,t2 in R
t1[A1] = t2[A1] AND t1[A2]=t2[A2] AND … AND t1[Am] = t2[Am]
then
t1[B1] = t2[B1] AND t1[B2]=t2[B2] AND … AND t1[Bm] = t2[Bm]
14
Examples
An FD holds, or does not hold on an instance:
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
Mike
9876
Salesrep
E1111
Smith
9876
Salesrep
15
15
Example
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
Mike
9876
Salesrep
E1111
Position  Phone
Smith
9876
Salesrep
16
16
Example
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
Mike
9876
Salesrep
but not Phone  Position
E1111
Smith
9876
Salesrep
17
17
FD Meaning
FD is a constraint of a schema, that it says that it
allows schema instances for which where the FD
holds.
a property of the schema, not of a particular schema
instance.
18
FD Meaning
FD is a constraint of a schema, that it says that it
allows schema instances for which where the FD
holds.
You can check if an FD is violated by an instance, but
cannot prove that an FD is part of the schema using an
instance.
19
Examples
An FD holds, or does not hold on an instance:
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
Mike
9876
Salesrep
Position->Phone, EmpID->Name,Phone,Position
Name->Phone, Phone->Position, Phone->Name
21
21
Examples
An FD holds, or does not hold on an instance:
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
Mike
9876
Salesrep
E9999
Mary
1234
Lawyer
22
Position->Phone, EmpID->Name,Phone,Position
Name->Phone, Phone->Position, Phone->Name
Examples
An FD holds, or does not hold on an instance:
EmpID
Name
Phone
Position
E0045
Smith
1234
Clerk
E3542
Mike
9876
Salesrep
E1111
Smith
9876
Salesrep
23
Position->Phone, EmpID->Name,Phone,Position
Name->Phone, Phone->Position, Phone->Name
Examples
An FD holds, or does not hold on an instance:
A
B
C
a
b
c
?
?
?
A->B, A->C
B->C
24
Where are FD’s from?
From physics
–
–
no two courses can meet in the same
room at the same time
hour room -> course.
From semantics
–
–
no product has two listing prices
productID -> price.
From inferring
25
Inferring FD’s
Prod(Name, Color, Category, Department, Price)
Name
Gizmo
Color
Green
Category
Dep
Gadget
Toys
1. Name
 ColorGadget
Widget
Black
Toys
Gizmo
Garden
2. Category  Department
3. Color, Category  Price
Green
Whatsit
Price
49
59
Provided
FDs
99
Observation: Name, Category  Price also holds on
any instance
26
Inferring FD’s
Given a set of FDs {f1,…fn}, does an FD g hold?
Inference problem: How do we decide?
27
Inferring FD’s
We are given FD’s X1 -> A1, X2 -> A2,…,
Xn -> An , and we want to know whether
an FD Y -> B must hold in any relation
that satisfies the given FD’s.
 Example: If A -> B and B -> C hold, surely
A -> C holds, even if we don’t say so.
28
Inferring FD’s
Given a set of FDs {f1,…fn}, does an FD g hold?
Inference problem: How do we decide?
1.
2.
3.
Answer: Three simple rules called
Armstrong’s Rules.
Split
Reduction
Transitivity
29
1. Split
A1
…
Am
B1
…
Bn
A1, …, Am  B1,…,Bn
… is equivalent to the following n rules…
A1,…,Am  Bi for i=1,…,n
30
Splitting Right Sides of FD’s
X->B1B2…Bn holds for R exactly when
each of X->B1, X->B2,…, X->Bn hold for
R.
Example: A->BC is equivalent to A->B
and A->C.
 no splitting rule for left sides.
We generally express FD’s with
singleton right sides.
31
Splitting Right Sides of FD’s
title, year->length genre studioName
title, year->length
title, year->genre
title, year->studioName
32
Example: FD’s
name
Janeway
Janeway
Spock
addr
Voyager
Voyager
Enterprise
beersLiked manf
Bud
A.B.
WickedAle Pete’s
Bud
A.B.
favBeer
WickedAle
WickedAle
Bud
Drinkers(name, addr, beersLiked, manf, favBeer)
33
Example: FD’s
Drinkers(name, addr, beersLiked, manf,
favBeer)
 Reasonable FD’s to assert:
1. name -> addr favBeer
•
•
name -> addr
name -> favBeer
2. beersLiked -> manf
34
Example: FD’s
name
Janeway
Janeway
Spock
addr
Voyager
Voyager
Enterprise
name -> addr
beersLiked manf
Bud
A.B.
WickedAle Pete’s
Bud
A.B.
favBeer
WickedAle
WickedAle
Bud
name -> favBeer
beersLiked -> manf
35
2. Reduction/Trivial
A1
…
Am
A1,…,Am  Aj for any j=1,…,m
36
3. Transitivity
A1
…
Am
B1
…
Bn
C1
…
A1, …, Am  B1,…,Bn and
B1,…,Bn  C1,…,Ck
implies
A1,…,Am  C1,…,Ck
37
Ck
Transitivity
Prod(Name, Color, Category, Department, Price)
Name
Color
Category
Dep
Price
Gizmo
Green
Gadget
Toys
49
Widget
Black
Gadget
Toys
59
1. Name  Color
2. Category  Department
Gizmo3. Color,
GreenCategory
Whatsit
Garden
 Price
Provided FDs
99
Name, Category  ?
38
Inference Test
To test if Y -> B, start by assuming two
tuples agree in all attributes of Y.
Y
0000000. . . 0
00000?? . . . ?
39
Inference Test
Use the given FD’s to infer that these
tuples must also agree in certain other
attributes.
 If B is one of these attributes, then Y -> B
is true.
 Otherwise, Y -> B does not follow from
the given FD’s.
40
Closure Test
An easier way to test is to compute the
closure of Y, denoted Y +.
41
Closure of a set of Attributes
Given a set of attributes A1, …, An
The closure, {A1, …, An}+ , is the set of attributes Y
s.t. A1, …, An  Y
Example:
name  color
category  department
color, category  price
Closures:
name+ = ?
{name, category}+ = ?
color+ =
42
Closure of a set of Attributes
Given a set of attributes A1, …, An
The closure, {A1, …, An}+ , is the set of attributes Y
s.t. A1, …, An  Y
Example:
name  color
category  department
color, category  price
Closures:
name+ = {name, color}
{name, category}+ =
{name, category, color, department, price}
43
color+ = {color}
Compute Closure
Basis: Y + = Y.
Induction: Look for an FD’s left side X
that is a subset of the current Y +. If
the FD is X -> A, add A to Y +.
44
X
Y+
A
new Y+
45
Closure Algorithm
Start with X={A1, …, An}.
Repeat until X doesn’t change do:
B1, …, Bn  C is a FD and
B1, …, Bn are all in X
then add C to X.
if
Example:
name  color
category  department
color, category  price
{name, category}+ ={name, category, color, department, price }
Hence:
name, category  color, department, price
46
Example
A, B  C
A, D  E
B
 D
A, F  B
R(A,B,C,D,E,F)
Compute {A,B}+
X = {A, B,
}
Compute {A, F}+ X = {A, F,
}
47
Project milestones



Proposal, Thursday March 5
Demo, Tuesday April 28
Report, Thursday May 7
48
Why Do We Need the Closure?
•
With closure we can find all FD’s easily
•
To check if X -> A
–
Compute X+
–
Check if A is in X+
49
Example: Closure Test
Compute: AB +.
FDs:
–
–
–
–
AB -> C
BC -> AD
D -> E
CF ->B
50
Example: Closure Test
Compute: AB +.
FDs:
–
–
–
–
AB
+
AB -> C
BC -> AD
D -> E
CF ->B
= {A, B, C, D, E}
51
Example: Closure Test
AB ->D?
FDs:
–
–
–
–
AB
+
AB -> C
BC -> AD
D -> E
CF ->B
= {A, B, C, D, E}
52
Example: Closure Test
D->A?
FDs:
–
–
–
–
AB -> C
BC -> AD
D -> E
CF ->B
53
Using Closure to Infer ALL FDs
Example:
A, B  C
A, D  B
B
 D
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute–?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all non-trivial FD’s:
55
AB  CD, ADBC, ABC  D, ABD  C, ACD 
55 B
Another Example
•
Enrollment(student, major, course, room, time)
student  major
major, course  room
course  time
student,course?
56
Keys of Relations
 K is a superkey for relation R if
K functionally determines all of R.
 K is a key for R if K is a superkey,
but no proper subset of K is a
superkey.
57
Example: Superkey
Drinkers(name, addr, beersLiked, manf,
favBeer)
 {name, beersLiked} is a superkey
because together these attributes
determine all the other attributes.
 name -> addr favBeer
 beersLiked -> manf
58
Example: Key
{name, beersLiked} is a key because
neither {name} nor {beersLiked} is a
superkey.
 name doesn’t -> manf; beersLiked doesn’t
-> addr.
There are no other keys, but lots of
superkeys.
 Any superset of {name, beersLiked}.
59
Where Do Keys Come From?
1. Just assert a key K.
 The only FD’s are K -> A for all
attributes A.
2. Assert FD’s and deduce the keys by
systematic exploration.
60
Finding keys and superkeys
•
For each set of attributes X
–
Compute X+
–
If X+ = {set of all attributes} then X is a superkey
–
If X is minimal, then it is a key
Do we need to check all sets of
attributes? Which sets?
61
Example of Keys
Product(name, price,category, color)
name, category  price
category  color
What is a key?
62
Example of Keys
Product(name, price,category, color)
name, category  price
category  color
(name,category)+ =
(name,price,category,color)
63
The key or a key?
Consider R(A,B,C) can you define FDs
such that there are two keys?
64
The Key or a key?
Consider R(A,B,C)
{ AB  C, BC  A }
{ A  BC, B  AC }
What are the keys?
Can you come up with three keys?
65
Decomposition
Student
Course
Room
Mary
CS351
V12
Joe
CS351
V12
Sam
CS351
V12
..
..
..
66
Decomposition
Student
Course
Mary
CS351
Joe
CS351
Course
CS351
Room
V12
67
Decomposition
Student
Course
Room
Mary
CS351
V12
Joe
CS351
V12
Sam
CS351
V12
..
..
..
Course->Room
68
Decomposition
Student
Course
Mary
CS351
Joe
CS351
Course
CS351
Room
V12
Course->Room
69
Finding FD’s in decomposition
Motivation: “normalization,” the process
where we break a relation schema into
two or more schemas.
Example: ABCD with FD’s AB ->C,
C ->D, and D ->A.
 Decompose into ABC, AD. What FD’s hold in
ABC ?
 Not only AB ->C, but also C ->A !
70
Why?
ABCD
a1b1cd1
a2b2cd2
comes
from
ABC
a1b1c
a2b2c
d1=d2 because
C -> D
a1=a2 because
D -> A
Thus, tuples in the projection
with equal C’s have equal A’s;
C -> A.
71
Basic Idea
1. Start with given FD’s and find all
nontrivial FD’s that follow from the
given FD’s.
 Nontrivial = right side not contained in
the left.
2. Restrict to those FD’s that involve only
attributes of the projected schema.
72
Simple, Exponential Algorithm
1. For each set of attributes X, compute X +.
2. Add X ->A for all A in X + - X.
3. However, drop XY ->A whenever we
discover X ->A.
 Because XY ->A follows from X ->A in any
projection.
4. Finally, use only FD’s involving projected
attributes.
73
A Few Tricks
No need to compute the closure of the
empty set or of the set of all attributes.
If we find X + = all attributes, so is the
closure of any superset of X.
74
Example: Projecting FD’s
ABC with FD’s A ->B and B ->C.
Project onto AC.
 A +=ABC ; yields A ->B, A ->C.
• We do not need to compute AB
+
or AC +.
 B +=BC ; yields B ->C.
 C +=C ; yields nothing.
 BC +=BC ; yields nothing.
75
Example -- Continued
Resulting FD’s: A ->B, A ->C, and B >C.
Projection onto AC : A ->C.
 Only FD that involves a subset of {A,C }.
76
Basis
S: a set of FD’s
basis: any equivalent set of S
77
Basis
S
–
–
–
–
–
–
A->B
A->C
B->A
B->C
C->A
C->B
78
Basis
Basis of S
–
–
–
–
–
–
A->B
A->C
B->A
B->C
C->A
C->B
79
Basis
Basis of S
–
–
–
–
–
–
A->B
A->C
B->A
B->C
C->A
C->B
80
Basis
Basis of S
–
–
–
–
–
–
A->B
A->C
B->A
B->C
C->A
C->B
81
Minimal Basis
A minimal basis for S
–
–
–
Singleton right sides
Removing any FD, the result is not a
basis
Removing any left-side attribute, the
result is not a basis
82
Minimal Basis
Minimal basis of S
–
–
–
–
–
–
A->B
A->C
B->A
B->C
C->A
C->B
83
Example 2: Projecting FD’s
FD’s: A ->B, B ->C, C ->D




A +=ABCD
B +=BCD
C +=CD
D +=D
Project to ACD
Minimal basis for ACD
84
A Geometric View of FD’s
Imagine the set of all instances of a
particular relation.
That is, all finite sets of tuples that
have the proper number of
components.
Each instance is a point in this space.
85
Example: R(A,B)
1
2
3
4
5
1
2
3
4
1
3
1
86
Example: A -> B for R(A,B)
{(1,2), (3,4)}
{}
1
2
3
4
A -> B
{(5,1)}
5
1
1
2 (1,3)}
{(1,2), (3,4),
3
1
4
3
87
An FD is a Subset of Instances
 For each FD X -> A there is a subset
of all instances that satisfy the FD.
 We can represent an FD by a region in
the space.
 Trivial FD = an FD that is represented
by the entire space.
 Example: A -> A.
88
Representing Sets of FD’s
If each FD is a set of relation instances,
then a collection of FD’s corresponds to
the intersection of those sets.
 Intersection = all instances that satisfy all
of the FD’s.
89
Example
Instances satisfying
A->B, B->C, and
CD->A
A->B
B->C
CD->A
90
Example
Is this A->C?
A->B
B->C
92
Example
A->B A->C B->C
93
Example
A->B
B->C
A->C
YES
YES
YES
{{a, b, c}}
NO
YES
YES
{{a, b1, c}, {a, b2, c}}
YES
NO
YES
{{a1, b, c1}, {a2, b, c2}}
NO
NO
YES
{{a, b1, c}, {a, b2, c}, {a1, b1, c1}}
94
Relational Schema Design
Goal of relational schema design is to
avoid anomalies and redundancy.
 Update anomaly : one occurrence of a fact
is changed, but not all occurrences.
 Deletion anomaly : valid fact is lost when a
tuple is deleted.
95
Example of Bad Design
name
Janeway
Janeway
Spock
addr
Voyager
Voyager
Enterprise
beersLiked manf
Bud
A.B.
WickedAle Pete’s
Bud
A.B.
favBeer
WickedAle
WickedAle
Bud
Drinkers(name, addr, beersLiked, manf, favBeer)
Data is redundant, and can be figured
out by using the FD’s name -> addr favBeer and
beersLiked -> manf.
96
Example of Bad Design
name
Janeway
Janeway
Spock
addr
Voyager
Voyager
Enterprise
beersLiked manf
Bud
A.B.
WickedAle Pete’s
Bud
A.B.
favBeer
WickedAle
WickedAle
Bud
Drinkers(name, addr, beersLiked, manf, favBeer)
• Update anomaly: if Janeway is transferred to Intrepid,
will we remember to change each of her tuples?
• Deletion anomaly: If nobody likes Bud, we lose track
of the fact that Anheuser-Busch manufactures Bud.
97
Boyce-Codd Normal Form
We say a relation R is in BCNF if
whenever X ->Y is a nontrivial FD that
holds in R, X is a superkey.
 Remember: nontrivial means Y is not
contained in X.
 Remember, a superkey is any superset of a
key (not necessarily a proper superset).
98
Example
Drinkers(name, addr, beersLiked, manf, favBeer)
FD’s: name->addr favBeer, beersLiked->manf
Only key is {name, beersLiked}.
In each FD, the left side is not a
superkey.
Any one of these FD’s shows Drinkers
is not in BCNF
99
Another Example
Beers(name, manf, manfAddr)
FD’s: name->manf, manf->manfAddr
Only key is {name} .
name->manf does not violate BCNF, but
manf->manfAddr does.
100
Decomposition into BCNF
Given: relation R with FD’s F.
Look among the given FD’s for a BCNF
violation X ->Y.
 If any FD following from F violates BCNF,
then there will surely be an FD in F itself
that violates BCNF.
Compute X +.
 Not all attributes, or else X is a superkey.
101
Decompose R Using X -> Y
 Replace R by relations with schemas:
1. R1 = X +.
2. R2 = R – (X
+
– X ).
 Project given FD’s F onto the two
new relations.
102
Decomposition Picture
R1
R-X +
X
R2
X +- X
R
103
Example
Name
SSN
PhoneNumber City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
SSN  Name, City
Joe
987-65-4321
What is the key?
{SSN, PhoneNumber}
908-555-2121
Westfield
908-555-1234
Westfield
104
Joe
987-65-4321
Example
Name
SSN
PhoneNumber City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
Joe
987-65-4321
908-555-2121
Westfield
Joe
987-65-4321
908-555-1234
Westfield
SSN  Name, City
What is the key? use SSN  Name, City to split
{SSN, PhoneNumber}
105
Example
Name SSN
City
Fred
123-45-6789
Seattle
Joe
987-65-4321
Madison
SSN
PhoneNumber
123-45-6789
206-555-1234
123-45-6789
206-555-6543
987-65-4321
908-555-2121
987-65-4321
908-555-1234
SSN  Name, City
Let’s check anomalies:

Redundancy ?

Update ?

Delete ?
106
Example: BCNF Decomposition
Drinkers(name, addr, beersLiked, manf, favBeer)
F = name->addr, name -> favBeer,beersLiked>manf
 Pick BCNF violation name->addr.
 Close the left side: {name}+ = {name, addr,
favBeer}.
 Decomposed relations:
1. Drinkers1(name, addr, favBeer)
2. Drinkers2(name, beersLiked, manf)
107
Example -- Continued
We are not done; we need to check
Drinkers1 and Drinkers2 for BCNF.
Projecting FD’s is easy here.
For Drinkers1(name, addr, favBeer),
relevant FD’s are name->addr and
name->favBeer.
 Thus, {name} is the only key and Drinkers1
is in BCNF.
108
Example -- Continued
 For Drinkers2(name, beersLiked, manf),
the only FD is beersLiked->manf, and
the only key is {name, beersLiked}.
 Violation of BCNF.
 beersLiked+ = {beersLiked, manf}, so
we decompose Drinkers2 into:
1. Drinkers3(beersLiked, manf)
2. Drinkers4(name, beersLiked)
109
Example -- Concluded
 The resulting decomposition of Drinkers :
1. Drinkers1(name, addr, favBeer)
2. Drinkers3(beersLiked, manf)
3. Drinkers4(name, beersLiked)
 Notice: Drinkers1 tells us about drinkers,
Drinkers3 tells us about beers, and
Drinkers4 tells us the relationship between
drinkers and the beers they like.
110
R(A,B,C,D)
Example
A  B
BC
R(A,B,C,D)
A+ = ABC ≠ ABCD
R2(A,D)
R1(A,B,C)
B+ = BC ≠ ABC
R11(B,C)
R12(A,B)
What are
the keys ?
111
Decompositions
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
112
Theory of Decomposition
Name
Price Category
Gizmo
19.99 Gadget
OneClick
24.99 Camera
Gizmo
19.99 Camera
Category
Name
Price Name
Gizmo
19.99
Gizmo
OneClick
24.99
113
OneClick Camera
Sometimes it is
correct
Lossless
decomposition
Gadget
113
Lossy Decomposition
Name
Price Category
Gizmo
19.99 Gadget
OneClick
24.99 Camera
What’s wrong?
Gizmo
19.99 Camera
Price Category
Name
Category
Gadget
19.99 Gadget
OneClick Camera
24.99 Camera
Gizmo
114
Lossless Decompositions
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
What (set) relationship holds between
R1 Join R2 and R?
Hint: Which tuples of R will be present?
It’s lossless if
we have
equality!
115
Lossless Decompositions
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An  B1, ..., Bm
Then the decomposition is lossless
Note: don’t need A1, ..., An  C1, ..., Cp
BCNF decomposition is always lossless. WHY ?
116
116
The Problem with BCNF
There is one structure of FD’s that
causes trouble when we decompose.
AB ->C and C ->B.
 Example: A = street address, B = city,
C = zip code.
There are two keys, {A,B } and {A,C }.
C ->B is a BCNF violation, so we must
decompose into AC, BC.
117
We Cannot Enforce FD’s
The problem is that if we use AC and
BC as our database schema, we cannot
enforce the FD AB ->C by checking
FD’s in these decomposed relations.
Example with A = street, B = city, and
C = zip on the next slide.
118
An Unenforceable FD
street
city
zip
545 Tech Sq.
Cambidge
02138
545 Tech Sq.
London
02139
street city -> zip
zip->city
street
zip
545 Tech Sq. 02138
545 Tech Sq. 02139
city
Cambridge
London
119
zip
02138
02139
An Unenforceable FD
street
zip
545 Tech Sq. 02138
545 Tech Sq. 02139
city
Cambridge
London
zip
02138
02139
120
An Unenforceable FD
street
zip
545 Tech Sq. 02138
545 Tech Sq. 02139
city
Cambridge
Cambridge
zip
02138
02139
121
An Unenforceable FD
street
zip
545 Tech Sq. 02138
545 Tech Sq. 02139
city
Cambridge
Cambridge
zip
02138
02139
Join tuples with equal zip codes.
street
city
545 Tech Sq. Cambridge
545 Tech Sq. Cambridge
zip
02138
02139
Although no FD’s were violated in the decomposed relations,
FD street city -> zip is violated by the database as a whole.
122
3NF Let’s Us Avoid This Problem
3rd Normal Form (3NF) modifies the
BCNF condition so we do not have to
decompose in this problem situation.
An attribute is prime if it is a member of
any key.
X ->A violates 3NF if and only if X is not
a superkey, and also A is not prime.
123
Example: 3NF
In our problem situation with FD’s
AB ->C and C ->B, we have keys AB
and AC.
Thus A, B, and C are each prime.
Although C ->B violates BCNF, it does
not violate 3NF.
124
What 3NF and BCNF Give You
 There are two important properties of a
decomposition:
1. Lossless Join : it should be possible to project
the original relations onto the decomposed
schema, and then reconstruct the original.
2. Dependency Preservation : it should be
possible to check in the projected relations
whether all the given FD’s are satisfied.
125
3NF and BCNF -- Continued
We can get (1) with a BCNF decomposition.
We can get both (1) and (2) with a 3NF
decomposition.
But we can’t always get (1) and (2) with a
BCNF decomposition.
 street-city-zip is an example.
126
Testing for a Lossless Join
If we project R onto R1, R2,…, Rk , can
we recover R by rejoining?
127
Testing for a Lossless Join
If we project R onto R1, R2,…, Rk , can
we recover R by rejoining?
Any tuple in R can be recovered from
its projected fragments.
So the only question is: when we
rejoin, do we ever get back something
we didn’t have originally?
128
Example: Testing for Lossless Join
A
B
C
a
b
c
..
..
ABC
..
A
B
B
C
a
b
b
c
..
..
..
AB
..
A
B
C
a
b
c
..
..
..
..
..
..
BC
⋈BC
AB
129
Testing for a Lossless Join
If we project R onto R1, R2,…, Rk , can
we recover R by rejoining?
Any tuple in R can be recovered from
its projected fragments.
So the only question is: when we
rejoin, do we ever get back something
we didn’t have originally?
130
Example: Testing for Lossless Join
A
B
C
1
2
3
3
2
Decomposition: AB, BC
ABC
4
131
Example: Testing for Lossless Join
A
B
C
1
2
3
3
2
Decomposition: AB, BC
ABC
4
A
B
B
C
1
2
2
3
3
2
2
4
BC
AB
132
Example: Testing for Lossless Join
A
B
C
1
2
3
3
2
ABC
4
A
B
B
C
1
2
2
3
2
4
3
AB
2
A
B
C
1
2
3
3
2
4
1
2
4
3
2
3
BC
⋈BC
AB
133
Example: Testing for Lossless Join
A
B
C
1
2
3
3
2
4
ABC
!B→C
A
B
B
C
1
2
2
3
2
4
3
AB
2
A
B
C
1
2
3
3
2
4
1
2
4
3
2
3
BC
⋈BC
AB
134
Example: Testing for Lossless Join
A
B
C
1
2
3
3
2
4
ABC
!B→C
135
Example: Testing for Lossless Join
A
B
C
1
2
3
3
2
3
ABC
B→C
136
Example: Testing for Lossless Join
A
B
C
1
2
3
3
2
3
ABC
B→C
A
B
B
C
1
2
2
3
3
AB
2
A
B
C
1
2
3
3
2
3
BC
⋈BC
AB
137
The Chase Test
If we project R onto R1, R2,…, Rk , can
we recover R by rejoining?
Any tuple in R can be recovered from
its projected fragments.
So the only question is: when we
rejoin, do we ever get back something
we didn’t have originally?
138
The Chase Test
A
B
C
a?
b?
c?
ABC
A
B
B
C
..
..
..
..
..
..
..
..
AB
A
B
C
a
b
c
..
..
..
BC
AB
⋈BC
139
The Chase Test
A
B
C
a?
b?
c?
ABC
A
B
B
C
a
b
b
c
..
..
..
..
AB
A
B
C
a
b
c
..
..
..
BC
AB
⋈BC
140
The Chase Test
A
B
C
a
b
c1
a2
b
c
ABC
A
B
B
C
a
b
b
c
..
..
..
..
AB
A
B
C
a
b
c
..
..
..
BC
AB
⋈BC
141
The Chase Test
A
B
C
a
b
c1
a2
b
c
ABC
A
B
B
C
a
b
b
c
..
..
..
..
AB
A
B
C
a
b
c
..
..
..
BC
AB
⋈BC
142
143
The Chase Test
Suppose tuple t comes back in the
join.
Then t is the join of projections of
some tuples of R, one for each Ri of the
decomposition.
Can we use the given FD’s to show that
one of these tuples must be t ?
144
The Chase Test
Start by assuming t = abc… .
For each i, there is a tuple si of R that
has a, b, c,… in the attributes of Ri.
si can have any values in other
attributes.
We’ll use the same letter as in t, but
with a subscript, for these components.
145
Example: The Chase
Let R = ABCD, and the decomposition
be AB, BC, and CD.
Let the given FD’s be C->D and B ->A.
Suppose the tuple t = abcd is the join
of tuples projected onto AB, BC, CD.
146
The tuples
of R projected onto
AB, BC, CD.
A
a
a2 a
a3
Use B ->A
The Tableau
B
b
b
b3
C
c1
c
c
D
d1
d2 d
d
Use C->D
We’ve proved the
second tuple must be t.
147
Summary of the Chase
1. If two rows agree in the left side of a FD, make
their right sides agree too.
2. Always replace a subscripted symbol by the
corresponding unsubscripted one, if possible.
3. If we ever get an unsubscripted row, we know
any tuple in the project-join is in the original (the
join is lossless).
4. Otherwise, the final tableau is a counterexample.
148
Example: Lossy Join
Same relation R = ABCD and same
decomposition.
But with only the FD C->D.
149
These projections
rejoin to form
abcd.
A
a
a2
a3
The Tableau
B
b
b
b3
C
c1
c
c
These three tuples are an example
R that shows the join lossy. abcd
is not in R, but we can project and
rejoin to get abcd.
D
d1
d2
d
150
Lossless Decompositions
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An  B1, ..., Bm
Then the decomposition is lossless
Note: don’t need A1, ..., An  C1, ..., Cp
BCNF decomposition is always lossless. WHY ?
151
151
3NF Synthesis Algorithm
 Construct a decomposition into 3NF
relations with a lossless join and
dependency preservation.
 Algorithm:
1. Find a minimal base
2. For each FD X->A, create relation XA
3. If no decomposed relation contains a key,
create a relation with a key as schema
152
3NF Synthesis Algorithm
 Need minimal basis for the FD’s:
1. Right sides are single attributes.
2. No FD can be removed.
3. No attribute can be removed from a left side.
153
Constructing a Minimal Basis
1. Split right sides.
2. Repeatedly try to remove an FD and
see if the remaining FD’s are
equivalent to the original.
3. Repeatedly try to remove an attribute
from a left side and see if the resulting
FD’s are equivalent to the original.
154
Why It Works
Preserves dependencies: each FD from
a minimal basis is contained in a
relation, thus preserved.
Lossless Join: use the chase to show
that the row for the relation that
contains a key can be made allunsubscripted variables.
3NF: hard part – a property of minimal
bases.
155
3NF Synthesis Algorithm
Example
–
–
–
FD’s AB ->C , C ->B, and A ->D
Minibase
Decomposition
•
•
•
•
ABC
BC(dropped)
AD
ABE
156
Download