Normalization DB Tuning CS186 Final Review Session

advertisement
Normalization
DB Tuning
CS186 Final Review Session
Plan
•
•
•
•
Functional Dependencies, Rules of Inference
Candidate Keys
Normal forms (BCNF/3NF)
Decomposition
–
–
–
–
BCNF
Lossless
Dependency preserving
3NF + Minimal cover
• DB Tuning
Functional Dependencies
• A functional dependency X  Y holds over relation
schema R if, for every allowable instance r of R:
t1  r, t2  r, pX (t1) = pX (t2)
implies pY (t1) = pY (t2)
(where t1 and t2 are tuples;X and Y are sets of attributes)
• In other words: X  Y means
Given any two tuples in r, if the X values are the same, then the Y
values must also be the same. (but not vice versa!!)
• Can read “” as “determines”
Rules of Inference
• Armstrong’s Axioms (X, Y, Z are sets of attributes):
– Reflexivity: If X  Y, then X  Y
– Augmentation: If X  Y, then XZ  YZ for any Z
– Transitivity: If X  Y and Y  Z, then X  Z
• Some additional rules (that follow from AA):
– Union: If X  Y and X  Z, then X  YZ
– Decomposition: If X  YZ, then X  Y and X  Z
Candidate Keys
• R = {A, B, C, D, E}
• F = { B CD, D  E, B  A, E  C, AD B }
• Is B  E in F+ ?
• Is AD a key for R?
AD+ = AD
AD+ = ABD and B is a key, so Yes!
B+ = B
• Is AD a candidate key for R?
B+ = BCD
+ = A, D+ = DEC
A
B+ = BCDA
… A,D not keys, so Yes!
+
B = BCDAE … Yes!
• Is ADE a candidate key for R?
and B is a key for R too!
… No! AD is a key, so ADE is a
• Is D a key for R?
D+ = D
D+ = DE
D+ = DEC
… Nope!
superkey, but not a candidate key
Boyce-Codd Normal Form (BCNF)
• Reln R with FDs F is in BCNF if, for all X  A
in F+
– A  X (called a trivial FD), or
– X is a superkey for R.
• In other words: “R is in BCNF if the only nontrivial FDs over R are key constraints.”
Third Normal Form (3NF)
• Reln R with FDs F is in 3NF if, for all X  A in F+
A  X (called a trivial FD), or
X is a superkey of R, or
A is part of some candidate key (not superkey!) for R.
(sometimes stated as “A is prime”)
• If R is in BCNF, obviously in 3NF.
BCNF Decomposition
• For each FD in F+ that violates BCNF, X
A
• Decompose R into R-A and XA
• If either R-A or XA is not in BCNF,
decompose recursively
• Guaranteed to be lossless but not
dependency preserving
Lossless Decomposition
• The decomposition of R into X and Y is lossless
with respect to F if and only if the closure of F
contains:
X  Y  X, or
XYY
• Useful result: If W  Z holds over R and W 
Z is empty, then decomposition of R into R-Z
and WZ is loss-less.
Dependency Preserving Decompositions
• Decomposition of R into X and Y is dependency preserving
if (FX  FY ) + = F +
– i.e., if we consider only dependencies in the closure F + that can be
checked in X without considering Y, and in Y without considering
X, these imply all dependencies in F +.
• Important to consider F + in this definition:
– ABC, A  B, B  C, C  A, decomposed into AB and BC.
• F+ also contains B  A, A  C, C  B …
• FAB contains A B and B  A; FBC contains B  C and
CB
• So, (FAB  FBC)+ contains C  A
Example BCNF Decomposition
•
•
•
•
•
•
•
CSJDPQV, candidate key C
JP  C, SD  P, J  S
Using SD  P, we get SDP, CSJDQV
Using J  S, we get JS and CJDQV
Result: SDP, JS and CJDQV. All are in BCNF
Lossless decomposition.
Not dependency preserving. We did not preserve
JP  C
Minimal Cover for a Set of FDs
• Minimal cover G for a set of FDs F:
– Closure of F = closure of G.
– Right hand side of each FD in G is a single attribute.
– If we modify G by deleting an FD or by deleting attributes from
an FD in G, the closure changes.
• Intuitively, every FD in G is needed, and ``as small as
possible’’ in order to get the same closure as F.
• e.g., A  B, ABCD  E, EF  GH, ACDF  EG has
the following minimal cover:
– A  B, ACD  E, EF  G and EF  H
– Do we need ACDF  EG? It can be derived from ACD  E and
EF  G. Same for ACDF  E, ACDF  G
3NF Decomposition
• Decompose to BCNF
• For each FD X  A in minimal cover that is
not preserved
– Add relation XA
• Guaranteed to be lossless AND dependency
preserving
Tuning the Schema
Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)
Depts (Did, Budget, Report)
Suppliers (Sid, Address)
Parts (Pid, Cost)
Projects (Jid, Mgr)
• We will concentrate on Contracts, denoted as CSJDPQV.
The following ICs are given to hold:
JP  C, SD  P, C is the primary key.
–
–
C and JP are candidate keys
3NF normal form
BCNF Decomposition
• Use SD  P, we get SDP and CSJDQV
• Lossless but not dependency-preserving (JP  C)
• Three options
– Leave it in 3NF without decomposition
– Create an assertion to enforce JP  C
• Acceptable when updates are infrequent
– Add JPC as an extra table (redundancy across relations)
Check Assertion (JP  C)
PartInfo: SDP
ContractInfo: CSJDQV
CREATE ASSERTION checkDep
Lossless join on SD
CHECK (NOT EXISTS
(SELECT *
FROM PartInfo PI, ContractInfo CI
Group By JP
WHERE PI.supplierid=CI.supplierid
AND PI.deptid = CI.deptid
GROUP BY CI.projectid, PI.partid
HAVING COUNT (cid) > 1
)
)
Count C
Dealing with CC Hotspots
• Consider relation R: ABC
• Frequent Queries:
– Update B
– Read C
• Tuple-granularity locking
• Options
– Lossless decomposition to AB and AC (Not good if
there are queries that reads BC)
– Others?
Other stuff
• Partitioning (Vertical and Horizontal)
• Physical DB Design
– Choice of index
• Whether to index? (Factor in costs of index
maintenance)
• Choice of search key(s)
• Clustered / Unclustered?
• Index-only scans
Download