lecture09

advertisement
Lecture 09:
Functional Dependencies,
Database Design
Monday, October 18, 2004
1
Outline
• Functional dependencies (3.4)
• Rules about FDs (3.5)
• Design of a Relational schema (3.6)
2
Functional Dependencies
Definition: A1, ..., Am  B1, ..., Bn holds in R if:
t, t’  R, (t.A1=t’.A1  ...  t.Am=t’.Am  t.B1=t’.B1  ...  t.Bn=t’.Bn )
R
A1
...
Am
B1
...
Bm
t
t’
if t, t’ agree here
then t, t’ agree here
3
Review: Closure
Example
R(A,B,C,D,E,F)
A, B 
A, D 
B

A, F 
C
E
D
B
Compute {A,B}+
X = {A, B,
}
Compute {A, F}+
X = {A, F,
}
4
Review: Inference
Example:
A, B  C
A, D  B
B
 D
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X  Y, s.t. Y  X+ and XY = :
AB  CD, ADBC, ABC  D, ABD  C, ACD  B
5
Problem: find FD from the data
• Given a database instance
• Find all FD’s satisfied by that instance
• Useful if we don’t get enough information
from our users: need to reverse engineer a
data instance
6
In Class: Find All FDs
Student
Alice
Bob
Alice
Carol
Dan
Elsa
Frank
Dept
CSE
CSE
EE
CSE
CSE
CSE
EE
Course
C++
C++
HW
DB
Java
DB
Circuits
Room
020
020
040
045
050
045
020
Do all FDs
make sense
in practice ?
7
Answer
Course  Dept, Room
Dept, Room  Course
Student, Dept  Course, Room
Student, Course  Dept, Room
Student, Room  Dept, Course
Do all FDs
make sense
in practice ?
8
Keys
• A superkey is a set of attributes A1, ..., An s.t. for
any other attribute B, we have A1, ..., An  B
• A key is a minimal superkey
– I.e. set of attributes which is a superkey and for which
no subset is a superkey
9
Computing (Super)Keys
• Compute X+ for all sets X
• If X+ = all attributes, then X is a key
• List only the minimal X’s
Note: there can be many keys !
• Example: R(A,B,C), ABC, BCA
Keys: AB and BC
10
Examples of Keys
• Product(name, price, category, color)
name, category  price
category  color
Key is: {name, category} ; superkeys are all supersets
• Enrollment(student, address, course, room, time)
student  address
room, time  course
student, course  room, time
Keys are: [in class]
11
FD’s for E/R Diagrams
Given a relation constructed from an E/R diagram, what is its key?
Rule 1: If the relation comes from an entity set,
the key of the relation is the set of attributes which is the
key of the entity set.
Person(address, name, ssn)
Person
address
name
ssn
12
FD’s for E/R Diagrams
Rule 2: If the relation comes from a many-many relationship,
the key of the relation is the set of all attribute keys in the
relations corresponding to the entity sets
name
Product
Person
buys
price
name
ssn
date
buys(name, ssn, date)
13
FD’s for E/R Diagrams
Except: if there is an arrow from the relationship to E, then
we don’t need the key of E as part of the relation key.
sname
Product
name
card-no
Purchase
CreditCard
Person
Store
ssn
Purchase(name , sname, ssn, card-no)
14
FD’s for E/R Diagrams
More rules:
• Many-one, one-many, one-one relationships
• Multi-way relationships
• Weak entity sets
(Try to find them yourself, or check book)
15
FD’s for E/R Diagrams
Say: “the CreditCard determines the Person”
sname
Product
name
card-no
Purchase
CreditCard
Person
Store
ssn
Incomplete
(what does
it say ?)
Purchase(name , sname, ssn, card-no)
card-no  ssn
16
Relational Schema Design
(or Logical Design)
Main idea:
• Start with some relational schema
• Find out its FD’s
• Use them to design a better relational
schema
17
Data Anomalies
When a database is poorly designed we get anomalies:
Redundancy: data is repeated
Updated anomalies: need to change in several places
Delete anomalies: may lose data when we don’t want
18
Relational Schema Design
Recall set attributes (persons with several phones):
Name
SSN
PhoneNumber
City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
Joe
987-65-4321
908-555-2121
Westfield
SSN  Name, City
but not SSN  PhoneNumber
Anomalies:
• Redundancy
= repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number:
what is his city ?
19
Relation Decomposition
Break the relation into two:
Name
SSN
PhoneNumber
City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
Joe
987-65-4321
908-555-2121
Westfield
Name
SSN
City
SSN
PhoneNumber
Fred
123-45-6789
Seattle
123-45-6789
206-555-1234
Joe
987-65-4321
Westfield
123-45-6789
206-555-6543
987-65-4321
908-555-2121
Anomalies have gone:
• No more repeated data
• Easy to move Fred to “Bellevue” (how ?)
• Easy to delete all Joe’s phone number (how ?)
20
Relational Schema Design
name
Conceptual Model:
Product
price
Person
buys
name
ssn
Relational Model:
plus FD’s
Normalization:
Eliminates anomalies
21
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
22
Decomposition
• Sometimes it is correct:
Name
Price
Category
Gizmo
19.99
Gadget
OneClick
24.99
Camera
Gizmo
19.99
Camera
Name
Price
Name
Category
Gizmo
19.99
Gizmo
Gadget
OneClick
24.99
OneClick
Camera
Gizmo
19.99
Gizmo
Camera
Lossless decomposition
23
Incorrect Decomposition
• Sometimes it is not:
Name
Price
Category
Gizmo
19.99
Gadget
OneClick
24.99
Camera
Gizmo
19.99
Camera
What’s
incorrect ??
Name
Category
Price
Category
Gizmo
Gadget
19.99
Gadget
OneClick
Camera
24.99
Camera
Gizmo
Camera
19.99
Camera
Lossy decomposition
24
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An  B1, ..., Bm
Then the decomposition is lossless
Note: don’t need A1, ..., An  C1, ..., Cp
Example: name  price, hence the first decomposition is lossless
25
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = will discuss
Boyce Codd Normal Form (BCNF) = will discuss
Others...
26
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
If A1, ..., An  B is a non-trivial dependency
in R , then {A1, ..., An} is a superkey for R
In English (though a bit vague):
Whenever a set of attributes of R is determining another attribute,
should determine all the attributes of R.
27
BCNF Decomposition Algorithm
Repeat
choose A1, …, Am  B1, …, Bn that violates the BNCF condition
split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
continue with both R1 and R2
Until no more violations
B’s
R1
A’s
Others
R2
Is there a
2-attribute
relation that is
not in BCNF ?
28
Example
Name
Fred
Fred
Joe
Joe
SSN
123-45-6789
123-45-6789
987-65-4321
987-65-4321
PhoneNumber
206-555-1234
206-555-6543
908-555-2121
908-555-1234
City
Seattle
Seattle
Westfield
Westfield
What are the dependencies?
SSN  Name, City
What are the keys?
{SSN, PhoneNumber}
Is it in BCNF?
29
Decompose it into BCNF
Name
SSN
City
Fred
123-45-6789 Seattle
Joe
987-65-4321 Westfield
SSN
PhoneNumber
123-45-6789
206-555-1234
123-45-6789
206-555-6543
987-65-4321
908-555-2121
987-65-4321
908-555-1234
SSN  Name, City
Let’s check anomalies:
• Redundancy ?
• Update ?
• Delete ?
30
Summary of BCNF
Decomposition
Find a dependency that violates the BCNF condition:
A1, A2, …, An  B1, B2, …, Bm
Heuristics: choose B1 , B2, … Bm“as large as possible”
Decompose:
Is there a
2-attribute
relation that is
not in BCNF ?
Others
R1
A’s
B’s
R2
Continue until
there are no
BCNF violations
left.
31
BCNF Decomposition Hints
How you should proceed (HW, midterm, …)
• Iterate over every set X, from small to large
• Compute X+
• If X ≠ X+ ≠ [all attributes]:
– Then it is a violation of BCNF (why ?)
– Decompose into X+ and ([all attributes] - X+)
Iterating over the X’s in different orders may lead to different
BCNF results, but it doesn’t matter as long as the result is in BCNF
32
Example
• R(A,B,C,D) A  B,
BC
• In R: A+ = ABC ≠ ABCD
– Split into R1(A,B,C), R2(A,D)
• In R1: B+ = BC ≠ ABC
– Split into R11(B,C), R12(A,B)
• Final decomposition: R11(B,C), R12(A,B), R2(A,D)
• What happens if in R we first pick B+ ?
33
Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN  name, age
age  hairColor
Decompose in BCNF (in class):
34
Download