Functional Dependencies

advertisement
Lecture 09:
Functional Dependencies,
Database Design
Monday, January 24, 2005
1
Outline
• Functional dependencies (3.4)
• Rules about FDs (3.5)
• Design of a Relational schema (3.6)
2
Functional Dependencies
Definition: A1, ..., Am  B1, ..., Bn holds in R if:
t, t’  R, (t.A1=t’.A1  ...  t.Am=t’.Am  t.B1=t’.B1  ...  t.Bn=t’.Bn )
R
A1
...
Am
B1
...
Bm
t
t’
if t, t’ agree here
then t, t’ agree here
3
In Class: Find All FDs
Student
Alice
Bob
Alice
Carol
Dan
Elsa
Frank
Dept
CSE
CSE
EE
CSE
CSE
CSE
EE
Course
C++
C++
HW
DB
Java
DB
Circuits
Room
020
020
040
045
050
045
020
4
Answer
Course  Dept, Room
Dept, Room  Course
Student, Dept  Course, Room
Student, Course  Dept, Room
Student, Room  Dept, Course
Do all FDs
make sense
in practice ?
5
Review: Closure
Example
R(A,B,C,D,E,F)
A, B 
A, D 
B

A, F 
C
E
D
B
Compute {A,B}+
X = {A, B,
}
Compute {A, F}+
X = {A, F,
}
6
Review: Inference
Example:
A, B  C
A, D  B
B
 D
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X  Y, s.t. Y  X+ and XY = :
AB  CD, ADBC, ABC  D, ABD  C, ACD  B
7
Keys
• A superkey is a set of attributes A1, ..., An s.t. for any
other attribute B, we have A1, ..., An  B
• In other words: a set of attributes X is a superkey if X+ =
all attributes
• A key is a minimal superkey
– I.e. set of attributes which is a superkey and for which no subset
is a superkey
8
Computing (Super)Keys
• Compute X+ for all sets X
• If X+ = all attributes, then X is a key
• List only the minimal X’s
Note: there can be many keys !
• Example: R(A,B,C), ABC, BCA
Keys: AB and BC
9
Examples of Keys
• Product(name, price, category, color)
name, category  price
category  color
Key is: {name, category} ; superkeys are all supersets
• Enrollment(student, address, course, room, time)
student  address
room, time  course
student, course  room, time
Keys are: [in class]
10
FD’s for E/R Diagrams
Given a relation constructed from an E/R diagram, what is its key?
Rule 1: If the relation comes from an entity set,
the key of the relation is the set of attributes which is the
key of the entity set.
Person(address, name, ssn)
Person
address
name
ssn
11
FD’s for E/R Diagrams
Rule 2: If the relation comes from a many-many relationship,
the key of the relation is the set of all attribute keys in the
relations corresponding to the entity sets
name
Product
Person
buys
price
name
ssn
date
buys(name, ssn, date)
12
FD’s for E/R Diagrams
Except: if there is an arrow from the relationship to E, then
we don’t need the key of E as part of the relation key.
sname
Product
name
card-no
Purchase
CreditCard
Person
Store
ssn
Purchase(name , sname, ssn, card-no)
13
FD’s for E/R Diagrams
More rules:
• Many-one, one-many, one-one relationships
• Multi-way relationships
• Weak entity sets
(Try to find them yourself, or check book)
14
FD’s for E/R Diagrams
Say: “the CreditCard determines the Person”
sname
Product
name
card-no
Purchase
CreditCard
Person
Store
ssn
Incomplete
(what does
it say ?)
Purchase(name , sname, ssn, card-no)
card-no  ssn
15
Relational Schema Design
(or Logical Design)
Main idea:
• Start with some relational schema
• Find out its FD’s
• Use them to design a better relational
schema
16
Data Anomalies
When a database is poorly designed we get anomalies:
Redundancy: data is repeated
Updated anomalies: need to change in several places
Delete anomalies: may lose data when we don’t want
17
Relational Schema Design
Recall set attributes (persons with several phones):
Name
SSN
PhoneNumber
City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
Joe
987-65-4321
908-555-2121
Westfield
SSN  Name, City
but not SSN  PhoneNumber
Anomalies:
• Redundancy
= repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number:
what is his city ?
18
Relation Decomposition
Break the relation into two:
Name
SSN
PhoneNumber
City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
Joe
987-65-4321
908-555-2121
Westfield
Name
SSN
City
SSN
PhoneNumber
Fred
123-45-6789
Seattle
123-45-6789
206-555-1234
Joe
987-65-4321
Westfield
123-45-6789
206-555-6543
987-65-4321
908-555-2121
Anomalies have gone:
• No more repeated data
• Easy to move Fred to “Bellevue” (how ?)
• Easy to delete all Joe’s phone number (how ?)
19
Relational Schema Design
name
Conceptual Model:
Product
price
Person
buys
name
ssn
Relational Model:
plus FD’s
Normalization:
Eliminates anomalies
20
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
21
Decomposition
• Sometimes it is correct:
Name
Price
Category
Gizmo
19.99
Gadget
OneClick
24.99
Camera
Gizmo
19.99
Camera
Name
Price
Name
Category
Gizmo
19.99
Gizmo
Gadget
OneClick
24.99
OneClick
Camera
Gizmo
19.99
Gizmo
Camera
Lossless decomposition
22
Incorrect Decomposition
• Sometimes it is not:
Name
Price
Category
Gizmo
19.99
Gadget
OneClick
24.99
Camera
Gizmo
19.99
Camera
What’s
incorrect ??
Name
Category
Price
Category
Gizmo
Gadget
19.99
Gadget
OneClick
Camera
24.99
Camera
Gizmo
Camera
19.99
Camera
Lossy decomposition
23
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An  B1, ..., Bm
Then the decomposition is lossless
Note: don’t need A1, ..., An  C1, ..., Cp
Example: name  price, hence the first decomposition is lossless
24
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = will discuss
Boyce Codd Normal Form (BCNF) = will discuss
Others...
25
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
If A1, ..., An  B is a non-trivial dependency
in R , then {A1, ..., An} is a superkey for R
In English (though a bit vague):
Whenever a set of attributes of R is determining another attribute,
should determine all the attributes of R.
26
BCNF Decomposition Algorithm
Repeat
choose A1, …, Am  B1, …, Bn that violates the BNCF condition
split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
continue with both R1 and R2
Until no more violations
B’s
R1
A’s
Others
R2
Is there a
2-attribute
relation that is
not in BCNF ?
In practice, we have
27
a better algorithm (next):
Example
Name
Fred
Fred
Joe
Joe
SSN
123-45-6789
123-45-6789
987-65-4321
987-65-4321
PhoneNumber
206-555-1234
206-555-6543
908-555-2121
908-555-1234
City
Seattle
Seattle
Westfield
Westfield
What are the dependencies?
SSN  Name, City
What are the keys?
{SSN, PhoneNumber}
Is it in BCNF?
28
Decompose it into BCNF
Name
SSN
City
Fred
123-45-6789 Seattle
Joe
987-65-4321 Westfield
SSN
PhoneNumber
123-45-6789
206-555-1234
123-45-6789
206-555-6543
987-65-4321
908-555-2121
987-65-4321
908-555-1234
SSN  Name, City
Let’s check anomalies:
• Redundancy ?
• Update ?
• Delete ?
29
Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN  name, age
age  hairColor
Decompose in BCNF (in class):
30
BCNF Decomposition Algorithm
BCNF_Decompose(R)
find X s.t.: X ≠X+ ≠ [all attributes]
if (not found) then “R is in BCNF”
let Y = X+ - X
let Z = [all attributes] - X+
decompose R into R1(X  Y) and R2(X  Z)
continue to decompose R1 and R2
31
Find X s.t.: X ≠X+ ≠ [all attributes]
Example BCNF Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN  name, age
age  hairColor
Iteration 1: Person
SSN+ = SSN, name, age, hairColor
Decompose into: P(SSN, name, age, hairColor)
Phone(SSN, phoneNumber)
Iteration 2: P
age+ = age, hairColor
Decompose: People(SSN, name, age)
Hair(age, hairColor)
Phone(SSN, phoneNumber)
What is
the key ?
32
Example
• R(A,B,C,D) A  B,
BC
• In R: A+ = ABC ≠ ABCD
– Split into R1(A,B,C), R2(A,D)
• In R1:
B+
= BC ≠ ABC
– Split into R11(B,C), R12(A,B)
What are
the keys ?
• Final decomposition: R11(B,C), R12(A,B), R2(A,D)
• What happens if in R we first pick B+ ? Or AB+ ?
33
Download