R - SEAS - University of Pennsylvania

Normal Forms
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
March 16, 2016
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
Announcements
• Homework 3 will be due Monday 10/22
• Fall break will be 10/19, midterm on 10/26
Armstrong’s Axioms: Inferring FDs
Some FDs exist due to others; can compute using
Armstrong’s axioms:
• Reflexivity: If Y ⊆ X then X → Y
  name, sid → name
  (trivial dependencies)
• Augmentation: If X → Y then XW → YW
  serno → subj so serno, exp-grade → subj, exp-grade
• Transitivity: If X → Y and Y → Z then X → Z
  serno → cid and cid → subj
  so serno → subj
Armstrong’s Axioms Lead to…
• Union: If X → Y and X → Z
  then X → YZ
• Pseudotransitivity: If X → Y and WY → Z
  then XW → Z
• Decomposition: If X → Y and Z ⊆ Y
  then X → Z
Let’s prove a few of these from Armstrong’s
Axioms
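As one possible worked derivation (a sketch, not necessarily the proof given in lecture), the Union and Decomposition rules follow from the three axioms like this, writing XY for the union of attribute sets X and Y:

```latex
\textbf{Union.} Given $X \to Y$ and $X \to Z$:
\begin{align*}
1.\;& X \to Y    && \text{given} \\
2.\;& X \to XY   && \text{augment (1) with } X \text{, using } XX = X \\
3.\;& X \to Z    && \text{given} \\
4.\;& XY \to YZ  && \text{augment (3) with } Y \\
5.\;& X \to YZ   && \text{transitivity on (2) and (4)}
\end{align*}

\textbf{Decomposition.} Given $X \to Y$ and $Z \subseteq Y$:
\begin{align*}
1.\;& Y \to Z && \text{reflexivity, since } Z \subseteq Y \\
2.\;& X \to Z && \text{transitivity on } X \to Y \text{ and (1)}
\end{align*}
```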
Closure of a Set of FD’s
Defn. Let F be a set of FD’s.
Its closure, F+, is the set of all FD’s:
{X → Y | X → Y is derivable from F by Armstrong’s
Axioms}
Which of the following are in the closure of our Student-Course
FD’s?
StudentData(sid, name, serno, cid, subj, grade)
  name → name
  cid → subj
  serno → subj
  cid, sid → subj
  cid → sid
Attribute Closures: Is Something
Dependent on X?
Defn. The closure of an attribute set X, X+, is:
X+ = ∪ {Y | X → Y ∈ F+}
• This answers the question “is Y determined
  (transitively) by X?”; compute X+ by:
closure := X;
repeat until no change {
  if there is an FD U → V in F
    such that U ⊆ closure
  then add V to closure
}
• Does sid, serno → subj, exp-grade?
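The loop above can be sketched in Python as follows. This is a minimal illustration, assuming each FD is stored as a (left side, right side) pair of attribute sets. The name `attribute_closure` is hypothetical, and the sample FD set is borrowed from the "Testing for Lossless Join" slide later in this lecture.

```python
def attribute_closure(attrs, fds):
    """Return X+ for the attribute set `attrs` under the FD list `fds`.

    Each FD is a (lhs, rhs) pair of attribute-name sets (an assumed encoding).
    """
    closure = set(attrs)
    changed = True
    while changed:                      # repeat until no change
        changed = False
        for lhs, rhs in fds:
            # Apply U -> V only when all of U is already in the closure.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Assumed sample FDs (taken from a later slide in this lecture).
FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid", "exp-grade"}),
    ({"cid"}, {"subj"}),
]

x_plus = attribute_closure({"sid", "serno"}, FDS)
print(sorted(x_plus))
# X -> Y holds exactly when Y is a subset of X+.
print({"subj", "exp-grade"} <= x_plus)
```

Under this particular FD set, checking whether X → Y holds reduces to one subset test against X+, without ever materializing F+.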
Equivalence of FD sets
Defn. Two sets of FD’s, F and G, are equivalent if
their closures are equal, F+ = G+
e.g., these two sets are equivalent:
{XY → Z, X → Y} and
{X → Z, X → Y}
• F+ contains a huge number of FD’s
  (exponential in the size of the schema)
• Would like to have smallest “representative” FD
  set
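Because F+ is exponentially large, equivalence is usually tested without building it: F and G are equivalent when every FD of F follows from G and vice versa, and each such check is one attribute closure. A minimal sketch, with hypothetical helper names `implies` and `equivalent` and FDs encoded as (lhs, rhs) pairs of sets:

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of `fds`."""
    return set(rhs) <= attribute_closure(lhs, fds)


def equivalent(f, g):
    """True if F+ = G+, checked FD by FD in both directions."""
    return (all(implies(g, lhs, rhs) for lhs, rhs in f)
            and all(implies(f, lhs, rhs) for lhs, rhs in g))


# The two sets from the slide.
F = [({"X", "Y"}, {"Z"}), ({"X"}, {"Y"})]
G = [({"X"}, {"Z"}), ({"X"}, {"Y"})]
print(equivalent(F, G))
```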
Minimal Cover
Defn. A FD set F is minimal if:
1. Every FD in F is of the form X → A,
   where A is a single attribute
   (we express each FD in simplest form)
2. For no X → A in F is:
   F – {X → A} equivalent to F
3. For no X → A in F and Z ⊂ X is:
   F – {X → A} ∪ {Z → A} equivalent to F
   (in a sense, each FD is “essential” to the cover)
Defn. F is a minimal cover for G if F is minimal and is
equivalent to G.
e.g.,
{X → Z, X → Y} is a minimal cover for
{XY → Z, X → Z, X → Y}
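One common way to compute a minimal cover mirrors the three conditions: split right-hand sides, shrink left-hand sides, then drop redundant FDs. The sketch below is an illustration under that assumption rather than the procedure from the course text. `minimal_cover` is a hypothetical name, and the result can depend on the order in which FDs and attributes are visited.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def minimal_cover(fds):
    """Return a minimal cover as a list of (frozenset lhs, single attribute)."""
    # Condition 1: one attribute on each right-hand side.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    def pairs(c):
        return [(lhs, {a}) for lhs, a in c]

    # Condition 3: remove left-hand attributes that are not needed.
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if a in attribute_closure(smaller, pairs(cover)):
                lhs = smaller
                cover[i] = (lhs, a)

    # Condition 2: remove FDs implied by the remaining ones.
    cover = list(dict.fromkeys(cover))      # drop exact duplicates
    i = 0
    while i < len(cover):
        lhs, a = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if a in attribute_closure(lhs, pairs(rest)):
            cover = rest
        else:
            i += 1
    return cover


# The slide's example: {XY -> Z, X -> Z, X -> Y}.
G = [({"X", "Y"}, {"Z"}), ({"X"}, {"Z"}), ({"X"}, {"Y"})]
for lhs, a in minimal_cover(G):
    print(sorted(lhs), "->", a)
```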
More on Closures
If F is a set of FD’s and X → Y ∉ F+
then for some attribute A ∈ Y, X → A ∉ F+
Proof by contradiction.
Assume otherwise and let Y = {A1, ..., An}
Since we assume X → A1, ..., X → An are in F+
then X → A1 ... An is in F+ by the union rule,
hence X → Y is in F+, which is a contradiction
Why Armstrong’s Axioms?
Why are Armstrong’s axioms (or an equivalent rule
set) appropriate for FD’s? They are:
• Consistent (i.e., sound): any relation satisfying FD’s in F will satisfy
  those in F+
• Complete: if an FD X → Y cannot be derived by
  Armstrong’s axioms from F, then there exists some
  relational instance satisfying F but not X → Y
• In other words, Armstrong’s axioms derive all the
  FD’s that should hold
• What is the goal of using these axioms?
Decomposition
Consider our original “bad” attribute set
Stuff(sid, name, serno, subj, cid, exp-grade)
We could decompose it into:
Student(sid, name)
Course(serno, cid)
Subject(cid, subj)
But this decomposition loses information about the
relationship between students and courses. Why?
Lossless Join Decomposition
R1, …, Rk is a lossless join decomposition of R w.r.t. an FD set F if
for every instance r of R that satisfies F,
π_R1(r) ⋈ ... ⋈ π_Rk(r) = r
Consider:
sid  name   serno   subj  cid  exp-grade
1    Sam    570103  AI    570  B
23   Nitin  550103  DB    550  A
What if we decompose on
(sid, name) and (serno, subj, cid, exp-grade)?
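To see what happens on this instance, the decomposition can be tried directly. The sketch below assumes rows are stored as Python dicts. `project` and `natural_join` are hypothetical helpers written for this illustration, not library calls.

```python
COLUMNS = ("sid", "name", "serno", "subj", "cid", "exp-grade")

# The two-row instance from the slide.
r = [
    dict(zip(COLUMNS, (1, "Sam", 570103, "AI", 570, "B"))),
    dict(zip(COLUMNS, (23, "Nitin", 550103, "DB", 550, "A"))),
]


def project(rows, attrs):
    """Relational projection: keep `attrs` and drop duplicate tuples."""
    seen = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in seen]


def natural_join(left, right):
    """Natural join: pair up rows that agree on all shared attributes."""
    out = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                out.append({**l_row, **r_row})
    return out


r1 = project(r, ("sid", "name"))
r2 = project(r, ("serno", "subj", "cid", "exp-grade"))
joined = natural_join(r1, r2)

# The pieces share no attribute, so the join is a cross product.
print(len(r), len(joined))
print(all(row in joined for row in r))
```

The join contains every original row plus spurious ones that pair each student with each course, which is the sense in which the decomposition loses information.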
Testing for Lossless Join
R1, R2 is a lossless join decomposition of R with respect to F
iff at least one of the following dependencies is in F+
  (R1 ∩ R2) → R1 – R2
  (R1 ∩ R2) → R2 – R1
So for the FD set:
  sid → name
  serno → cid, exp-grade
  cid → subj
Is (sid, name) and (serno, subj, cid, exp-grade) a lossless
decomposition?
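The two-way test can be run with a single attribute closure of R1 ∩ R2. A minimal sketch follows. `is_lossless_binary` is a hypothetical name, and the second decomposition tried at the end, which keeps sid in both pieces, is an assumed example that does not appear on the slide.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 - R2 or (R1 ∩ R2) -> R2 - R1 is in F+."""
    r1, r2 = set(r1), set(r2)
    closure = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= closure or (r2 - r1) <= closure


FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid", "exp-grade"}),
    ({"cid"}, {"subj"}),
]

# The decomposition asked about on the slide.
print(is_lossless_binary({"sid", "name"},
                         {"serno", "subj", "cid", "exp-grade"}, FDS))

# Assumed variant: keep sid in the second schema as well.
print(is_lossless_binary({"sid", "name"},
                         {"sid", "serno", "subj", "cid", "exp-grade"}, FDS))
```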
Dependency Preservation
Ensures we can “easily” check whether a FD X → Y
is violated during an update to a database:
• The projection of an FD set F onto a set of attributes Z,
  F_Z, is
  {X → Y | X → Y ∈ F+, X ∪ Y ⊆ Z}
  i.e., it is those FDs local to Z’s attributes
• A decomposition R1, …, Rk is dependency preserving if
  F+ = (F_R1 ∪ ... ∪ F_Rk)+
The decomposition hasn’t “lost” any essential FD’s, so
we can check without doing a join
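Computing each projection F_Ri explicitly can be expensive, so a widely used alternative checks each FD of F by growing its left side using only closures restricted to one schema at a time. The sketch below follows that approach. It is an assumed implementation, not one given in the slides, and `preserves_dependencies` is a hypothetical name. The sample schema R(sid, fid, subj) is borrowed from a later slide.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def preserves_dependencies(schemas, fds):
    """True if every FD in `fds` follows from the FDs local to the schemas."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                ri = set(ri)
                # Attributes of Ri derivable from the part of `result` in Ri.
                gained = attribute_closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False        # this FD cannot be checked locally
    return True


# Assumed sample: R(sid, fid, subj) split into R1(sid, fid), R2(fid, subj).
FDS = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
print(preserves_dependencies([{"sid", "fid"}, {"fid", "subj"}], FDS))
```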
Example of Lossless and
Dependency-Preserving Decompositions
Given relation scheme
R(name, street, city, st, zip, item, price)
And FD set:
  name → street, city
  street, city → st
  street, city → zip
  name, item → price
Consider the decomposition
R1(name, street, city, st, zip) and R2(name, item, price)
• Is it lossless?
• Is it dependency preserving?
What if we replaced the first FD by name, street → city?
Another Example
Given scheme: R(sid, fid, subj)
and FD set:
  fid → subj
  sid, subj → fid
Consider the decomposition
R1(sid, fid) and R2(fid, subj)
• Is it lossless?
• Is it dependency preserving?
FD’s and Keys
• Ideally, we want a design s.t. for each nontrivial
  dependency X → Y, X is a superkey for some
  relation schema in R
• We just saw that this isn’t always possible
• Hence we have two kinds of normal forms
Two Important Normal Forms
Boyce-Codd Normal Form (BCNF). For every relation
scheme R and for every X → A that holds over R,
  either A ∈ X (it is trivial), or
  X is a superkey for R
Third Normal Form (3NF). For every relation scheme
R and for every X → A that holds over R,
  either A ∈ X (it is trivial), or
  X is a superkey for R, or
  A is a member of some key for R
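Both definitions can be checked mechanically for a single relation by testing the FDs in F, with right-hand sides split into single attributes. The sketch below is a brute-force illustration: `candidate_keys` and `normal_form_violations` are hypothetical names, key search is exponential in the number of attributes, and the sample schema R(sid, fid, subj) comes from the earlier example.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue            # contains a smaller key: not minimal
            if attribute_closure(s, fds) == attrs:
                keys.append(s)
    return keys


def normal_form_violations(attrs, fds, form="BCNF"):
    """List the (X, A) pairs from `fds` that break BCNF or 3NF."""
    attrs = set(attrs)
    # Attributes that belong to some key only matter for 3NF.
    prime = set().union(*candidate_keys(attrs, fds)) if form == "3NF" else set()
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):         # skip trivial A in X
            if attribute_closure(lhs, fds) >= attrs:  # X is a superkey
                continue
            if a in prime:                            # A is in some key
                continue
            bad.append((set(lhs), a))
    return bad


R = {"sid", "fid", "subj"}
FDS = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
print(normal_form_violations(R, FDS, "BCNF"))
print(normal_form_violations(R, FDS, "3NF"))
```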
Normal Forms Compared
BCNF is preferable, but sometimes in conflict with the
goal of dependency preservation
  - It’s strictly stronger than 3NF
Let’s see algorithms to obtain:
  - A BCNF lossless join decomposition (nondeterministic)
  - A 3NF lossless join, dependency preserving
    decomposition
BCNF Decomposition Algorithm
(from Korth et al.; our book gives a recursive version)
result := {R}
compute F+
while there is a relation schema Ri in result that isn’t in BCNF
{
  let A → B be a nontrivial FD on Ri
    s.t. A → Ri is not in F+    (i.e., A doesn’t form a key)
    and A and B are disjoint
  result := (result – Ri) ∪ {(Ri – B), (A, B)}
}
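A runnable sketch of this loop is shown below. Instead of materializing F+, it looks for a violating A inside each Ri by computing attribute closures of subsets of Ri, which is exponential but simple. This is an assumed implementation choice, and `find_violation` and `bcnf_decompose` are hypothetical names. Visiting subsets in sorted order makes this version deterministic, whereas the algorithm itself may pick any violating FD.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def find_violation(ri, fds):
    """Return (A, B) with A -> B nontrivial on Ri and A not a key of Ri."""
    ri = frozenset(ri)
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            a = frozenset(combo)
            closure = attribute_closure(a, fds)
            b = (closure & ri) - a          # disjoint from A by construction
            if b and not ri <= closure:     # A determines B but not all of Ri
                return a, b
    return None


def bcnf_decompose(r, fds):
    """Split R until every piece is in BCNF; returns a list of frozensets."""
    pending = [frozenset(r)]
    done = []
    while pending:
        ri = pending.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            done.append(ri)
        else:
            a, b = violation
            pending += [ri - b, a | b]      # (Ri - B) and (A, B)
    return done


STUFF = {"sid", "name", "serno", "classroom", "cid", "fid", "prof"}
FDS = [
    ({"sid"}, {"name"}),
    ({"fid"}, {"prof"}),
    ({"serno"}, {"classroom", "cid", "fid"}),
]
for schema in bcnf_decompose(STUFF, FDS):
    print(sorted(schema))
```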
An Example
Given the schema:
Stuff(sid, name, serno, classroom, cid, fid, prof)
And FDs:
  sid → name
  fid → prof
  serno → classroom, cid, fid
• Find the Boyce-Codd Normal Form for this schema
• What if instead:
  sid → name
  fid → prof
  classroom, cid → serno
  serno → cid
3NF Decomposition Algorithm
Let F be a minimal cover
i := 0
for each FD A → B in F {                  (build dep.-preserving decomp.)
  if none of the schemas Rj, 1 ≤ j ≤ i, contains AB
  {
    increment i
    Ri := (A, B)
  }
}
if no schema Rj, 1 ≤ j ≤ i, contains a candidate key for R {      (ensure lossless decomp.)
  increment i
  Ri := any candidate key for R
}
return (R1, …, Ri)
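The synthesis steps above can be sketched as follows, assuming the caller already supplies a minimal cover. `synthesize_3nf` and `find_candidate_key` are hypothetical names. A schema "contains a candidate key" exactly when it is a superkey of R, which is how the test is written here. Many presentations first merge FDs that share a left-hand side, and this sketch does not.

```python
def attribute_closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def find_candidate_key(attrs, fds):
    """Shrink the full attribute set until no attribute can be dropped."""
    attrs = set(attrs)
    key = set(attrs)
    for a in sorted(attrs):
        if attribute_closure(key - {a}, fds) >= attrs:
            key.discard(a)
    return key


def synthesize_3nf(attrs, min_cover):
    """3NF synthesis; `min_cover` is assumed to be a minimal cover already."""
    attrs = set(attrs)
    schemas = []
    # Build a dependency-preserving decomposition: one schema per FD.
    for lhs, rhs in min_cover:
        ab = set(lhs) | set(rhs)
        if not any(ab <= rj for rj in schemas):
            schemas.append(ab)
    # Ensure a lossless decomposition: some schema must be a superkey of R.
    if not any(attribute_closure(rj, min_cover) >= attrs for rj in schemas):
        schemas.append(find_candidate_key(attrs, min_cover))
    return schemas


STUFF = {"sid", "name", "serno", "classroom", "cid", "fid", "prof"}
# The example FDs with right-hand sides split into single attributes.
COVER = [
    ({"sid"}, {"name"}),
    ({"fid"}, {"prof"}),
    ({"serno"}, {"classroom"}),
    ({"serno"}, {"cid"}),
    ({"serno"}, {"fid"}),
]
for schema in synthesize_3nf(STUFF, COVER):
    print(sorted(schema))
```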
An Example
Given the schema:
Stuff(sid, name, serno, classroom, cid, fid, prof)
And FDs:
  sid → name
  fid → prof
  serno → classroom, cid, fid
• Find the Third Normal Form for this schema
• What if instead:
  sid → name
  fid → prof
  classroom, cid → serno
  serno → cid
Summary of Normalization
• We can always decompose into 3NF and get:
  - Lossless join
  - Dependency preservation
• But with BCNF:
  - We are only guaranteed lossless joins
  - The algorithm is nondeterministic, so there is not a
    unique decomposition for a given schema R
• BCNF is stronger than 3NF: every BCNF schema is
  also in 3NF
Normalization Is Good… Or Is It?
• In some cases, we might not mind redundancy, if the
  data isn’t directly updated:
  - Reports (people like to see breakdowns by semester,
    department, course, etc.)
  - Warehouses (archived copies of data for doing complex
    analysis)
  - Data sharing (sometimes we may export data into object-oriented or hierarchical formats)