Midterm 3 Revision - Department of Computer Science

advertisement
Midterm 3 Revision 2
Prof. Sin-Min Lee
Department of Computer Science
Normal Forms
1NF 2NF 3NF BCNF 4NF 5NF
Functional
dependencies
Multivalued
dependencies
Join
dependencies
F+: dependencies induced by
Armstrong’s Axioms
• Axioms for reasoning about FD’s
+
F is the set of dependencies which can be deduced from F
by applying three inference rules:
(i) reflexivity
if Y  X then X  Y
(ii) augmentation
if X  Y then XZ  YZ
(iii) transitivity
if X  Y and Y  Z then X  Z
• (i), (ii) and (iii) are Armstrong’s axioms
Completeness of Armstrong Axioms
• Idea
If X  Y cannot be deduced using Armstrong’s axioms then there is a
relational instance for R in which all the dependencies in F are true,
but X  Y does not hold
R=LMNO
X=L
F={L  M , M  N, O  N}
then X+ = LMN
L  O cannot
be deduced in F+
Counterexample:
L
M
O
N
L  O cannot
be deduced in F*
Closure of a Set of Attributes
• Let U be a set of attributes and F be a set of
functional dependencies on U.
• Suppose that X  U is a set of attributes.
• Definition: X+ = { A | F |=X  A}
• We would like to compute X+
Note that A, B, C, are attributes
R = ( A, B, C )
We refer to the set {A,B} simply as AB
F = { A  B, B  C }
F+ = { A  A, B  B, C  C,
Using reflexivity, we
AB  AB, BC  BC, AC  AC, ABC  ABC,
can generate all
AB  A, AB  B,
trivial dependencies
BC  B, BC  C,
AC  A, AC  C,
ABC  AB, ABC  BC, ABC  AC,
ABC  A, ABC  B, ABC  C,
A  B, … (1) ( given )
B  C, … (2) ( given )
A  C, … (3) ( transitivity on (1) and (2) )
AC  BC, … (4) ( augmentation on (1) )
AC  B, … (5) ( decomposition on (4) )
A  AB, … (6) ( augmentation on (1) )
AB  AC, AB  C, B  BC,
A  AC, AB  BC, AB  ABC, AC  ABC, A  BC, A  ABC }
Algorithm to compute closure of attributes X+ under F
closure := X ;
Repeat
for each U  V in F do
begin
if U  closure
then closure := closure  V ;
end
Until (there is no change in closure)
R( A, B, C, G, H, I )
F = { A  B, A  C, CG  H, CG  I, B  H }
To compute AG+
Is AG a candidate key?
closure = AG
AG  R
+
A
 R?
closure = ABG ( A  B )
+
G  R?
closure = ABCG ( A  C )
closure = ABCGH ( CG  H )
closure = ABCGHI ( CG  I )
Example
• R(ABCDE)
• F={ABC, CEB, DA, BCE}
– {A}+ =
– {A,B}+ =
– {B,D}+=
Theorem:
R - a relation schema
F - set of functional dependencies on R
The decomposition of R into relations with attribute sets
R1, R2 is a lossless-join decomposition iff
( R1  R2 )  R1  F +
OR
( R1  R2 )  R2  F +
i.e., R1  R2 is a superkey for R1 or R2.
(the attributes common to R1 and R2 must contain a key for
either R1 or R2 ).
Example
• R(A,B,C,D,E)
•  = {R1(A,D), R2(A,B),R3(B,E), R4(C,D,E),
R5(A,E)}
• FD1. AC
FD2. BC
FD3. CD
• FD4. DEC
FD5. CE A.
• Decide whether the decomposition is lossless.
BCNF Decomposition
Suppose R is not in BCNF, A is an attribute, and X  A is a FD
where X  A =  that violates the condition.
1. Remove A from R
2. Create a new relational schema XA
3. Repeat this process until all the relations are in BCNF
• It is a lossless join decomposition.
• But not necessary dependency preserving
Key is C
SDP
CSJDPQV
JS
SDP
CSJDQV
SDP
JS
JS
CJDQV
Key is C
SDP
JS
JP C
CSJDPQV
SDP
CSJDQV
SDP
JS
JS
CJDQV
The result is in BCNF
Does not preserve JPC, we can add a schema:
CJP
Each of SDP, JS, CJDQV, CJP is in BCNF, but there is
redundancy in CJP.
Possible refinement
CSJDPQV
Key is C
SDP
SDQ
SDP
CSJDQV
SDP
SDQ
SDQ
SD is a key in SDP and SDQ,
There is no dependency between P and Q
we can combine SDP and SDQ into one schema
Resulting in SDPQ, CSJDV
CSJDV
Overview
•
•
•
•
It is possible to decompose any
relational schema into a set of
relational schemas with the following
properties:
1) Attribute Preserving
2) FDs preserving
3) Lossless Join
The Decomposition Algorithm
•
•
•
Step 1. Find a minimal cover G for F
Step 2. For each left-hand side X that
appears in G, create a relation schema with
attributes
{X  A1  A2, ... ,  Am}
where X  A1, X  A2, ... , X  Am are all
dependencies in G with X as left-hand side.
Step 3. If none of the relation schemas
contains a key of R, create one more
relation schema that contains attributes that
form a key for R.
Example
•
•
•
•
•
Consider R(A, B, C, D, E) and
F={AB  C, A  BE, C E}
Step 1. minimal cover
Fmin={AC, AB, CE}
Step 2. R1(A,B,C), R2(C,E)
Step 3. Key: {A,D}
we have R3(A,D)
Final Result:
R1(A, B,C), R2(C,E), and R3(A,D)
BCNF Decomposition
•
•
Property LJ1:
A decomposition D={R1, R2} of R has
the lossless join property with respect
to a set of functional dependencies F
on R if and only if either
– the FD ((R1  R2)  (R1 - R2)) is in F+,
or
– the FD ((R1  R2)  (R2 - R1)) is in F+
BCNF Decomposition
•
•
Property LJ2:
If a decomposition D={R1, R2, ..., Rm} of R
has the lossless join property with respect
to a set of functional dependencies F on R,
and if a decomposition D1={Q1, Q2, ... ,Qk}
of Ri has the lossless join property with
respect to the projection of F on Ri, then the
decomposition
D2={R1, R2, ... Ri-1, Q1, Q2, ..., Qk, Ri+1,
..., Rm} of R has the lossless join property
with respect to F
BCNF Decomposition Algorithm
1. Set D  {R}
2. While there is a relation schema Q in D that
is not in BCNF do
begin
choose a relation schema Q in D that is
not in BCNF;
find a functional dependency X  Y in Q
that violates BCNF;
replace Q in D by two schemas
(Q-Y) and (X  Y)
end;
BCNF Decomposition Example
Consider R(A,B,C,D) and
F={A  B, B  C, D  B}
Decompose R into BCNF relations.
Step 1.D={R(A,B,C,D)}
Step 2. Loop 1.R is not in BCNF because A 
B and A is not a superkey
decompose R into R1(A, C, D), and R2(A,
B)
Loop 2. R1 is not in BCNF because A 
C and A is not a superkey
decompose R1 into R11(A, D) and R12(A,
Overview
•
•
Given a relation schema R(A1, A2, ... , An).
If R is not in the third normal form (3NF), we
wan to decompose it into a set of relation
schema
D= { R1, R2, ... ,Rm }, where
each Ri is in 3NF, such that the following
conditions are held:
1.
Attribute preservation condition.
2.
Dependency preservation condition
3.
Lossless join condition
Attribute Preservation Condition
•
•
Attribute preservation condition states
that the union of attributes of Ri equal
to the set of attributes of R.
For example: Given R(A, B, C, D)
and the decomposition
D1={ R1(A,B), R2(B,C) and
R3(A,C,D)}. D1 satisfies the attribute
preservation condition.
Attribute Preservation Condition
•
•
Given R(A, B, C, D) and the
decomposition
D2={R1(A, B), R2(B,C), R3(A, C)},
The attribute preservation condition is
violated because D is missing (not
preserved in the decomposition).
Dependency Preservation Condition
•
•
We say that a decomposition D={R1, R2, ...
, Rm} of R is dependency preserving with
respect to F if the union of the projections of
F on each Ri in D is equivalent to F. That is:
((F(R1)  ...  F(Rm))+ = F+
Given a set of dependencies F on R, the
projection of F on Ri, denoted by F(Ri)
where Ri is a subset of R, is the set of
dependencies X  Y in F+ such that the
attributes in X  Y are all contained in Ri.
Dependency Preservation Condition
•
•
Given R(A, B, C, D) and F = { A  B,
B  C, C  D}
Let D1={R1(A,B), R2(B,C), R3(C,D)}
F(R1)={A  B}
F(R2)={B  C}
F(R3)={C  D}
FDs are preserved.
Dependency Preservation Condition
•
•
Given R(A, B, C, D) and F = { A  B,
B  C, C  D}
Let D2={R1(A,B}, R2(B,C), R3(A, D)},
then FDs are not preserved.
Dependency Preservation Condition
•
•
Given R(A, B, C, D) and F = { A  B,
B  C, C  D}
Let D2={R1(A,B}, R2(B,C), R3(A, D)},
then FDs are not preserved.
Dependency Preservation Condition
•
We want to preserve the
dependencies because each
dependency in F represents a
constraint on a database.
Dependency Preservation Condition
•
If one of the dependencies is not
represented by the dependencies on some
individual relation Ri of the decomposition,
we will not be able to enforce this constraint
by looking only at an individual relation,
instead, to enforce the constraint, we will
have to join two or more of the relations in
the decomposition and then check that
functional dependency hold in the result of
the join operation. This is very inefficient.
Multivalued Dependencies and
Fourth Normal Form
• Formal Definition of Multivalued
Dependency
Multi-Valued Dependency
• Problem: multi-valued (or binary join) dependency
– Definition: If every instance of schema R can be (losslessly)
decomposed using attribute sets (X, Y) such that:
r =  X (r)
 Y (r)
then a multi-valued dependency exists
Ex: Person= SSN,PhoneN(Person)
 SSN,ChildSSN(Person)
Fourth Normal Form
• A schema is in fourth normal form if for
every non-trivial multi-valued dependency:
R=X
Y
either:
- X  Y or Y  X (trivial case) or
- X  Y is a superkey of R
(i.e., X  Y R )
4th Normal Form
• No multivalued dependencies and BCNF
• Create separate tables for each separate
functional dependency
Example
SalesForce (State, SalesPerson)
Delivery (State, Delivery)
State
Sales
State
Delivery
PA
George
PA
UPS
PA
Sue
PA
RPS
NJ
Mike
NJ
UPS
NJ
Valerie
NJ
Truck
Beyond 4th Normal Form
• 5th Normal Form
– Project-Join Normal Form
• Domain Key Normal Form (DKNF)
Assume the relation R contains the following two tuples
R(A B C D)
(12 34)…
(15 67)…
What other tuples must R contain so that A ->-> B and A ->-> C
hold for R ?
Answer: The tuples that must be included due to the two
multi-valued dependency are:
(1 2 6 7)
(1 5 3 4)
(1 2 6 4)
(1 5 3 7)
(1 2 3 7) second round
(1 5 6 4) second round
Example
• Consider the following relation and determinants.
R(a,b,c,d)
a,c -> b,d
a,d -> b
• To be in BCNF, all valid determinants must be a
candidate key. In the relation R, a,c->b,d is the
determinate used, so the first determinate is fine.
• a,d->b suggests that a,d can be the primary key,
which would determine b. However this would not
determine c. This is not a candidate key, and thus
R is not in BCNF.
Tuple Relational Calculus
• based on specifying a number of tuple
variables
• a tuple variable refers to any tuple
Simple example 1
• To find all employees whose salary is
greater than $50,000
– {t| EMPLOYEE(t) and t.Salary>5000}
• where
• EMPLOYEE(t) specifies the range of tuple variable
t
– The above operation selects all the attributes
Simple example 2
• To find only the names of employees
whose salary is greater than $50,000
– {t.FNAME, t.NAME| EMPLOYEE(t) and
t.Salary>5000}
•
•
•
•
The above is equivalent to
SELECT T.FNAME, T.LNAME
FROM EMPLOYEE T
WHERE T.SALARY > 5000
Elements of a tuple calculus
• In general, we need to specify the following
in a tuple calculus expression:
– Range Relation (I.e, R(t)) = FROM
– Selected combination= WHERE
– Requested attributes= SELECT
Elements of formula
• A formula is made of Predicate Calculus atoms:
–
–
–
–
–
–
an atom of the from R(ti)
ti.A op tj.B op{=, <,>,..}
F1 And F2 where F1 and F2 are formulas
F1 OR F2
Not (F1)
F’=(t) (F) or F’= (t) (F)
•  Y friends (Y, John)
• X likes(X, ICE_CREAM)
•
More Example
• For every project located in ‘Stafford’,
retrieve the project number, the controlling
department number, and the last name,
birthrate, and address of the manger of that
department.
Domain Relational Calculus
(DRC)
• Another type of formal predicate calculusbased language
• QBE is based on DRC
• The language shares a lot of similarities
with the tuple calculus
DRC
• The only difference is the type of variables:
– variables range over singles values from
domains of attributes
• An expression of DRC is:
– {x1, x2,…,xn|COND(x1,x2,…,xn, xn+2,…,xn+m)}
• where x1,x2,…,xn+m are domain var range over
attributers
• COND is a condition (or formula)
Examples
• Retrieve the birthdates and address of the
employee whose name is ‘John B. Smith’
• {uv| (q)(r)(s) (EMPLOYEE(qrstuvwxyz)
and q=‘John’ and r=‘B’ and s=‘Smith’
Alternative notation
• Ssign the constants ‘John’, ‘B’, and ‘Smith’
directly
• {uv|EMPLOYEE
(‘John’, ’B’, ’Smith’ ,t ,u ,v ,x ,y ,z)}
More example
• Retrieve the name and address of all
employees who work for the ‘Reseach’
department
• {qsv | ( z) EMPLOYEE(qrstuvwxyz) and
( l) ( m) (DEPARTMENT (lmno) and
l=‘Research’ and m=z))}
More example
• List the names of managers who have at
least on e dependent
• {sq| ( t) EMPLOYEE(qrstuvwxyz) and ((
j)( DEPARTMENT (hijk) and (( l) |
(DEPENTENT (lmnop) and t=j and t=l))))}
Characteristics of a Decomposition
• Two important characteristics of a
decomposition:
– lossless join: necessary, otherwise original
relation cannot be recreated, even if tables are
not modified
– dependency preserving: allows us to check
that inserts/updates are correct without joining
the sub-relations
Lossless Join
S
Cohen
Levy
Levy
S
Cohen
Levy
Levy
C
DB
OS
DB
C
DB
OS
DB
T
Smith
Jones
Smith
S
Cohen
T
Smith
Levy
Levy
Jones
Smith
Checking
• Check for a lossless join using the algorithm
from class (with the a-s and b-s)
• Check for dependency preserving using an
algorithm shown today
Dependency Preservation
•
•
•
•
•
R=ABC
Decomposition {AB, AC}
Dependencies {AB, BC}.
Is it lossless?
Does this decomposition preserve BC?
Dependency Preservation
(cont’d)
A
B
C
1
2
3
A
1
2
3
4
B
10
10
30
30
10
10
20
100
100
300
A
1
2
3
4
C
100
100
300
400
Definitions
• We define S (F) to be the set of dependencies XY in F+
such that X and Y are in S.
• We say that a decomposition R1...Rn of R is dependency
preserving if for all instances r of R that satisfy the FDs of
R:
(R1 (F) U ... U Rn (F))+ = F+
• Note that one inclusion clearly holds always.
• This definition implies an exponential algorithm to check
if a decomposition is dependency preserving
• We give a polynomial algorithm
Algorithm
• Let R be a relation, decomposed into R1, R2,…,Rn
• Let F be a set of functional dependencies
• To check whether R1,…,Rn preserves all the
functional dependencies in F, run the algorithm on
the next slide for each X -> Y in F
• If the answer is “Yes” for all FDs, then the
decomposition preserves F
• If the answer is “No” for at least one FD, then the
decomposition does not preserve F
Testing Dependency Preservation
• To check if the decomposition preserves XY:
Z:=X
while changes to Z occur do
for i=1 to n do
Z:= Z  ((Z  Ri)+  Ri)
if YZ
return “yes”
else
return “no”
Example (1)
•
•
•
•
R=ABCD
F = {A -> B, B -> C, C -> D, D -> A}
R1=AB, R2=BC, R3=CD
Is this decomposition dependency
preserving?
Example (2)
• R = ABCDE
• F = {A -> ABCDE, BC -> A, DE -> C}
• Suppose we decompose R into ABDE and
DEC.
• Is the decomposition dependency
preserving?
Normal Forms
• The basic idea: if a relation is in one of
these forms, then it avoids certain problems
(e.g., redundancy)
• Normal Forms:
– BCNF: Every dependency X->A in F+ must be
(1) trivial or (2) X is a super-key
– 3NF: Every dependency X->A in F+ must be
(1) trivial, (2) X is a super-key or (3) A is an
attribute of a key
Example
• Reminder F+ = {X -> X+ | exist Y->Z in F st Y in
X and Z not in X}
• Suppose that R = ABC. For each of the following
values of F, decide whether R is in BCNF/3NF:
–
–
–
–
–
F = {}
F = {A -> B}
F = {A -> B, A -> C}
F = {A -> B, B -> C}
F = {A -> B, BC -> A}
Decomposition into 3NF
• Given a relation R with functional dependencies F
• Step 1: Find a non-redundant cover G of F
• Step 2: For each FD XA in G, create a schema XA
• Step 3: If no schema created so far contains a key, add a key as a
schema
• Step 4: Remove schemas that are contained in other schemas
• The result is a decomposition into 3NF that is dependency
preserving and has a lossless join
Example
• Find a decomposition into 3NF for the
relation R = ABCDEFGH, with the
functional dependencies F = {AB,
ABCDE, EFGH, ACDFEG}
Example
• Non-redundant cover G = {AB, ACDE,
EFG, EFH}
• Key ACDF
• Schema: AB, ACDE, EFG, EFH, ACDF
Download