ppt

advertisement
Lecture 06
The Relational Data Model
1
Outline
• Relational Data Model
• Functional Dependencies
• Logical Schema Design
Reading
Chapter 8
2
The Relational Data Model
Data
Modeling
Relational
Schema
Physical
storage
Have seen
this in SQL
E/R diagrams
Have seen
this too
Tables:
column names: attributes
rows: tuples
Complex
file organization
and index
structures.
Discuss next
3
Terminology
Table name or relation name
Products:
Name
Category
Manufacturer
$19.99
gadgets
GizmoWorks
Power gizmo $29.99
gadgets
GizmoWorks
SingleTouch $149.99
photography
Canon
MultiTouch
household
Hitachi
gizmo
Price
Attribute names
$203.99
Tuples or rows or records
4
Schemas
Relational Schema:
– Relation name plus attribute names
– E.g. Product(Name, Price, Category, Manufacturer)
– In practice we add the domain for each attribute
Database Schema
– Set of relational schemas
– E.g. Product(Name, Price, Category, Manufacturer),
Company(Name, Address, Phone),
.......
5
Instances
• Relational schema = R(A1,…,Ak)
Instance = relation with k attributes (of “type” R)
– values of corresponding domains
• Database schema = R1(…), R2(…), …, Rn(…)
Instance = n relations, of types R1, R2, ..., Rn
This is all mathematics, not to be confused with SQL tables !
(What's a difference?)
6
Example
Relational schema:Product(Name, Price, Category, Manufacturer)
Instance:
Name
Price
Category
Manufacturer
gizmo
$19.99
gadgets
GizmoWorks
Power gizmo $29.99
gadgets
GizmoWorks
SingleTouch $149.99
photography
Canon
MultiTouch
household
Hitachi
$203.99
7
First Normal Form (1NF)
• A database schema is in First Normal Form
Student
if all tables are flat
Student
Name
GPA
Courses
Math
Alice
Bob
Carol
3.8
3.7
3.9
Name
GPA
Alice
3.8
Bob
3.7
Carol
3.9
DB
Takes
OS
Student
Course
Alice
Math
Course
Carol
Math
Math
Alice
DB
DB
Bob
DB
OS
Alice
OS
Carol
OS
DB
Course
OS
Math
OS
May need
to add keys
8
Functional Dependencies
• A form of constraint
– hence, part of the schema
• Finding them is part of the database design
• Also used in normalizing the relations
• Warning: this is the most abstract, and
“hardest” part of the course.
9
Functional Dependencies
Definition:
If two tuples agree on the attributes
A1, A2, …, An
then they must also agree on the attributes
B1, B2, …, Bm
Formally:
A1, A2, …, An  B1, B2, …, Bm
10
Examples
EmpID
E0045
E1847
E1111
E9999
Name
Smith
John
Smith
Mary
Phone
1234
9876
9876
1234
Position
Clerk
Salesrep
Salesrep
Lawyer
• EmpID  Name, Phone, Position
• Position  Phone
• but Phone  Position
11
In General
• To check A  B, erase all other columns
… A … B
X1
Y1
X2
Y2
…
…
• check if the remaining relation is many-one
(called functional in mathematics)
12
Example
EmpID
E0045
E1847
E1111
E9999
Name
Smith
John
Smith
Mary
Phone
1234
9876
9876
1234
Position
Clerk
Salesrep
Salesrep
Lawyer
Position  Phone
13
Typical Examples of FDs
Product:
name  price, manufacturer
Person:
ssn
 name, age
Company: name  stockprice, president
14
Example
Product(name, category, color, department, price)
Consider these FDs:
name  color
category  department
color, category  price
What do they say ?
15
Example
FD’s are constraints on relations:
• On some instances they hold
• On others they don’t
name  color
category  department
color, category  price
name
category
color
department
price
Gizmo
Gadget
Green
Toys
49
Tweaker
Gadget
Green
Toys
99
Does this instance satisfy all the FDs ?
16
Example
name  color
category  department
color, category  price
name
category
color
department
price
Gizmo
Gadget
Green
Toys
49
Tweaker
Gadget
Black
Toys
99
Gizmo
Stationary
Green
Office-supp.
59
What about this one ?
17
Example
If some FDs are satisfied, then
others are satisfied too
If all these FDs are true:
name  color
category  department
color, category  price
Then this FD also holds:
name, category  price
Why ??
18
Inference Rules for FD’s
A1, A2, …, An  B1, B2, …, Bm
Splitting rule
and
Combining rule
Is equivalent to
A1, A2, …, An  B1
A1, A2, …, An  B2
.....
A1, A2, …, An  Bm
A1
...
Am
B1
...
Bm
19
Inference Rules for FD’s
(continued)
Trivial Rule
A1, A2, …, An  Ai
where i = 1, 2, ..., n
A1
…
Am
Why ?
20
Inference Rules for FD’s
(continued)
Transitive Closure Rule
If
A1, A2, …, An  B1, B2, …, Bm
and
B1, B2, …, Bm  C1, C2, …, Cp
then
A1, A2, …, An  C1, C2, …, Cp
Why ?
21
A1
…
Am
B1
…
Bm
C1
...
Cp
22
Example (continued)
Start from the following FDs:
1. name  color
2. category  department
3. color, category  price
Infer the following FDs:
Inferred FD
Which Rule
did we apply ?
4. name, category  name
5. name, category  color
6. name, category  category
7. name, category  color, category
8. name, category  price
23
Example (continued)
Answers:
1. name  color
2. category  department
3. color, category  price
Inferred FD
4. name, category  name
5. name, category  color
6. name, category  category
7. name, category  color, category
8. name, category  price
Which Rule
did we apply ?
Trivial rule
Transitivity on 4, 1
Trivial rule
Split/combine on 5, 6
Transitivity on 3, 7
24
Another Example
• Enrollment(student, major, course, room, time)
student  major
major, course  room
course  time
What else can we infer ?
25
Another Rule
Augmentation
If
A1, A2, …, An  B
then
A1, A2, …, An , C1, C2, …, Cp  B
Augmentation follows from trivial rules and transitivity
How ?
26
Problem: infer ALL FDs
Given a set of FDs, infer all possible FDs
How to proceed ?
• Try all possible FDs, apply all 3 rules
– E.g. R(A, B, C, D): how many FDs are possible ?
• Drop trivial FDs, drop augmented FDs
– Still way too many
• Better: use the Closure Algorithm (next)
27
Closure of a set of Attributes
Given a set of attributes A1, …, An
The closure, {A1, …, An}+ , is the set of attributes B
s.t. A1, …, An  B
Example:
name  color
category  department
color, category  price
Closures:
name+ = {name, color}
{name, category}+ = {name, category, color, department, price}
color+ = {color}
28
Closure Algorithm
Start with X={A1, …, An}.
Example:
Repeat until X doesn’t change do:
name  color
category  department
color, category  price
B1, …, Bn  C is a FD and
B1, …, Bn are all in X
then add C to X.
if
{name, category}+ =
{name, category, color,
department, price}
29
Example
R(A,B,C,D,E,F)
Compute {A,B}+
A, B 
A, D 
B

A, F 
C
E
D
B
X = {A, B,
}
Compute {A, F}+ X = {A, F,
}
30
Using Closure to Infer ALL FDs
Example:
A, B  C
A, D  B
B
 D
Step 1: Compute X+, for every X:
A+ = A, B+ = BD, C+ = C, D+ = D
AB+ = ABCD, AC+ = AC, AD+ = ABCD
ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)
BCD+ = BCD, ABCD+ = ABCD
Step 2: Enumerate all FD’s X  Y, s.t. Y  X+ and XY = :
AB  CD, ADBC, ABC  D, ABD  C, ACD  B
31
Problem: Finding FDs
• Approach 1: During Database Design
– Designer derives them from real-world knowledge of
users
– Problem: knowledge might not be available
• Approach 2: From a Database Instance
– Analyze given database instance and find all FD’s
satisfied by that instance
– Useful if designers don’t get enough information from
users
– Problem: FDs might be artifical for the given instance
32
Find All FDs
Student
Alice
Bob
Alice
Carol
Dan
Elsa
Frank
Dept
CSE
CSE
EE
CSE
CSE
CSE
EE
Course
C++
C++
HW
DB
Java
DB
Circuits
Room
020
020
040
045
050
045
020
Do all FDs
make sense
in practice ?
33
Answer
Course  Dept, Room
Dept, Room  Course
Student, Dept  Course, Room
Student, Course  Dept, Room
Student, Room  Dept, Course
Do all FDs
make sense
in practice ?
34
Keys
• A key is a set of attributes A1, ..., An s.t. for
any other attribute B, we have A1, ..., An  B
• A minimal key is a set of attributes which is a
key and for which no subset is a key
• Note: book calls them superkey and key
35
Computing Keys
• Compute X+ for all sets X
• If X+ = all attributes, then X is a key
• List only the minimal keys
Note: there can be many minimal keys !
• Example: R(A,B,C), ABC, BCA
Minimal keys: AB and BC
36
Examples of Keys
• Product(name, price, category, color)
name, category  price
category  color
Keys are: {name, category} and all supersets
• Enrollment(student, address, course, room, time)
student  address
room, time  course
student, course  room, time
Keys are:
37
Relational Schema Design
(or Logical Schema Design)
Main idea:
• Start with some relational schema
• Find out its FD’s
• Use them to design a better relational
schema
38
Data Anomalies
When a database is poorly designed we get anomalies:
Redundancy: data is repeated
Update anomalies: need to change in several places
Delete anomalies: may lose data when we don’t want
39
Relational Schema Design
Example: Persons with several phones
Name
SSN
PhoneNumber
City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
Joe
987-65-4321
908-555-2121
Westfield
SSN  Name, City
but not SSN  PhoneNumber
Anomalies:
• Redundancy
= repeat data
• Update anomalies = Fred moves to “Bellevue”
• Deletion anomalies = Joe deletes his phone number:
what is his city ?
40
Relation Decomposition
Break the relation into two:
Name
SSN
PhoneNumber
City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
Joe
987-65-4321
908-555-2121
Westfield
Name
SSN
City
SSN
PhoneNumber
Fred
123-45-6789
Seattle
123-45-6789
206-555-1234
Joe
987-65-4321
Westfield
123-45-6789
206-555-6543
987-65-4321
908-555-2121
Anomalies have gone:
• No more repeated data
• Easy to move Fred to “Bellevue” (how ?)
• Easy to delete all Joe’s phone number (how ?)
41
Relational Schema Design
name
Conceptual Model:
Product
price
Person
buys
name
ssn
Relational Model:
plus FD’s
Normalization:
Eliminates anomalies
42
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp
43
Decomposition
• Sometimes it is correct:
Name
Price
Category
Gizmo
19.99
Gadget
OneClick
24.99
Camera
Gizmo
19.99
Camera
Name
Price
Name
Category
Gizmo
19.99
Gizmo
Gadget
OneClick
24.99
OneClick
Camera
Gizmo
19.99
Gizmo
Camera
Lossless decomposition
44
Incorrect Decomposition
• Sometimes it is not:
Name
Price
Category
Gizmo
19.99
Gadget
OneClick
24.99
Camera
Gizmo
19.99
Camera
What’s
incorrect ??
Name
Category
Price
Category
Gizmo
Gadget
19.99
Gadget
OneClick
Camera
24.99
Camera
Gizmo
Camera
19.99
Camera
Lossy decomposition
45
Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An  B1, ..., Bm
Then the decomposition is lossless
Note: don’t need necessarily A1, ..., An  C1, ..., Cp
Example: name  price, hence the first decomposition is lossless
46
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = this lecture
Boyce Codd Normal Form (BCNF) = this lecture
Others...
47
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
If A1, ..., An  B is a non-trivial dependency
in R , then {A1, ..., An} is a key for R
In English (though a bit vague):
Whenever a set of attributes of R is determining another attribute,
it should determine all the attributes of R.
48
BCNF Decomposition Algorithm
Repeat
choose A1, …, Am  B1, …, Bn that violates the BNCF condition
split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
continue with both R1 and R2
Until no more violations
B’s
R1
A’s
Others
R2
Is there a
2-attribute
relation that is
not in BCNF ?
49
Example
Name
SSN
PhoneNumber
City
Fred
123-45-6789
206-555-1234
Seattle
Fred
123-45-6789
206-555-6543
Seattle
Joe
987-65-4321
908-555-2121
Westfield
Joe
987-65-4321
908-555-1234
Westfield
What are the dependencies?
SSN  Name, City
What are the keys?
{SSN, PhoneNumber}
Is it in BCNF?
50
Decompose it into BCNF
Name
SSN
City
Fred
123-45-6789
Seattle
Joe
987-65-4321
Westfield
SSN
PhoneNumber
123-45-6789
206-555-1234
123-45-6789
206-555-6543
987-65-4321
908-555-2121
987-65-4321
908-555-1234
SSN  Name, City
Let’s check anomalies:
• Redundancy ?
• Update ?
• Delete ?
51
Summary of BCNF
Decomposition
Find a dependency that violates the BCNF condition:
A1, A2, …, An  B1, B2, …, Bm
Heuristics: choose B1 , B2, … Bm“as large as possible”
Decompose:
Others
A’s
B’s
Continue until
there are no
BCNF violations
left.
2-attribute
relations are BCNF
R1
R2
52
Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN  name, age
age  hairColor
Decompose in BCNF (in class):
Step 1: find all keys (How ? Compute S+, for various sets S)
Step 2: now decompose
53
Other Example
• R(A,B,C,D) A  B, B  C
•
•
•
•
Key: AD
Violations of BCNF: A  B, A C, ABC
Pick A BC: split into R1(A,BC) R2(A,D)
What happens if we pick A  B first ?
54
Lossless Decompositions
A decomposition is lossless if we can recover:
R(A,B,C)
Decompose
R1(A,B)
R2(A,C)
Recover
R’(A,B,C) should be the same as
R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
55
Lossless Decompositions
• Given R(A,B,C) s.t. AB, the
decomposition into R1(A,B), R2(A,C) is
lossless
56
3NF: A Problem with BCNF
Unit
Company
Product
FD’s: Unit  Company;
Company, Product  Unit
So, there is a BCNF violation, and we decompose.
Unit
Company
Unit
Product
Unit  Company
No FDs
Notice: we loose the FD: Company, Product  Unit
57
So What’s the Problem?
Unit
Company
Unit
Galaga99
Bingo
UW
UW
Galaga99
Bingo
Product
databases
databases
No problem so far. All local FD’s are satisfied.
Let’s put all the data back into a single table again (anomalies?):
Unit
Galaga99
Bingo
Company
UW
UW
Product
databases
databases
Violates the dependency: company, product -> unit!
58
Solution: 3rd Normal Form
(3NF)
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if :
Whenever there is a nontrivial dependency A1, A2, ..., An  B
for R , then {A1, A2, ..., An } is a key for R,
or B is part of a key.
Tradeoff:
BCNF = no anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies
59
Download