Chapter 8: Relational Database Design

advertisement
Chapter 8: Relational Database Design
Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Relational Database Design
n Find a “good” collection of relation schemas for our database
n Two major pitfalls to avoid in designing a database schema
l
Redundancy
l
repeating information è data inconsistency
Incompleteness
4 difficult or impossible to model certain aspects of the enterprise
4
Database System Concepts - 6th Edition
8.2
©Silberschatz, Korth and Sudarshan
Design Alternatives: Larger Schemas
n Suppose we combine instructor and department into inst_dept
n Redundancy
l
Wastes space
l
Complicates updating è possibility of inconsistency
n Null values may be introduced è difficulty in handling
Database System Concepts - 6th Edition
8.3
©Silberschatz, Korth and Sudarshan
Design Alternatives: Smaller Schemas
n Suppose we decompose employee(ID, name, street, city, salary) into
employee1 (ID, name) and employee2 (name, street, city, salary)
n Lossy decomposition
l
Database System Concepts - 6th Edition
8.4
Introduces a loss of information
©Silberschatz, Korth and Sudarshan
First Normal Form
n Domain is atomic if its elements are considered to be indivisible units
l
Examples of non-atomic domains:
4
Set of names, composite attributes
4
Identification numbers like CS101 that can be broken up into parts
n Atomicity is actually a property of how the elements of the domain are used
n Non-atomic values complicate storage and encourage redundant (repeated)
storage of data
n A relational schema R is in first normal form (1NF) if the domains of all
attributes of R are atomic
n We assume all relations are in first normal form
Database System Concepts - 6th Edition
8.5
©Silberschatz, Korth and Sudarshan
Relational Theory
Goal: Devise a theory for the following
n Decide whether a particular relation R is in “good” form.
n In the case that a relation R is not in “good” form, decompose it into a
set of relations {R1, R2, ..., Rn} such that
l
each relation is in good form
l
the decomposition is a lossless-join decomposition
n Our theory is based on:
l
functional dependencies
l
multivalued dependencies
Database System Concepts - 6th Edition
8.6
©Silberschatz, Korth and Sudarshan
Functional Dependencies
n Constraints on the set of legal relations
n Require that the value for a certain set of attributes determines
uniquely the value for another set of attributes
n A functional dependency is a generalization of the notion of a key
n Provide the theoretical basis for good decomposition
Database System Concepts - 6th Edition
8.7
©Silberschatz, Korth and Sudarshan
Functional Dependencies (Cont.)
n Let R be a relation schema
a Í R and b Í R
n The functional dependency
a®b
holds on R if and only if for any legal relations r(R), whenever any
two tuples t1 and t2 of r agree on the attributes a, they also agree
on the attributes b. That is,
t1[a] = t2 [a] Þ t1[b ] = t2 [b ]
n Example: Consider r(A,B ) with the following instance of r.
1
1
3
4
5
7
n On this instance, A ® B does NOT hold, but B ® A does hold.
Database System Concepts - 6th Edition
8.8
©Silberschatz, Korth and Sudarshan
Keys and Functional Dependencies
n K is a superkey for relation schema R if and only if K ® R
n K is a candidate key for R if and only if
l
K ® R, and
l
for no a Ì K, a ® R
n Functional dependencies allow us to express constraints that cannot be
expressed using superkeys. Consider the schema:
inst_dept (ID, name, salary, dept_name, building, budget)
We expect these functional dependencies to hold:
dept_name® building
and
ID à building
but would not expect the following to hold:
dept_name ® salary
Database System Concepts - 6th Edition
8.9
©Silberschatz, Korth and Sudarshan
Use of Functional Dependencies
n Test relations to see if they are legal under a given set of FDs
l
If a relation r is legal under a set F of FDs, we say that r satisfies F.
n Specify constraints on the set of legal relations
l
If all legal relations on R satisfy a set F of FDs, we say that F holds on R.
n Note: A specific instance of a relation schema may satisfy a FD
even if the FD does not hold on all legal instances.
l
For example, a specific instance of instructor may, by chance, satisfy
name ® ID.
Database System Concepts - 6th Edition
8.10
©Silberschatz, Korth and Sudarshan
Trivial Functional Dependency
n A functional dependency is trivial if it is satisfied by all instances of a
relation
l
Example:
4
ID, name ® ID
4
name ® name
n In general, a ® b is trivial if b Í a
Database System Concepts - 6th Edition
8.11
©Silberschatz, Korth and Sudarshan
Closure of a Set of Functional Dependencies
n Given a set F of FDs, there are certain other FDs that are logically implied by F.
l
For example: If A ® B and B ® C, then we can infer that A ® C
n The set of all functional dependencies logically implied by F is the closure of F
(denoted by F+).
n We can find F+ by repeatedly applying Armstrong’s Axioms:
l
if b Í a, then a ® b
(reflexivity)
l
if a ® b, then g a ® g b
(augmentation)
l
if a ® b, and b ® g, then a ® g (transitivity)
n These rules are
l
sound (generate only functional dependencies that actually hold)
and
l
complete (generate all functional dependencies that hold)
Database System Concepts - 6th Edition
8.12
©Silberschatz, Korth and Sudarshan
Example
n R = (A, B, C, G, H, I)
F={ A®B
A®C
CG ® H
CG ® I
B ® H}
n some members of F+
l
A®H
4
l
AG ® I
4
l
by transitivity from A ® B and B ® H
by augmenting A ® C with G, to get AG ® CG
and then transitivity with CG ® I
CG ® HI
4
by augmenting CG ® I to infer CG ® CGI,
and augmenting of CG ® H to infer CGI ® HI,
and then transitivity
Database System Concepts - 6th Edition
8.13
©Silberschatz, Korth and Sudarshan
Additional Rules for Closure of FDs
n Additional rules:
l
If a ® b holds and a ® g holds, then a ® b g holds (union)
l
If a ® b g holds, then a ® b holds and a ® g holds (decomposition)
l
If a ® b holds and g b ® d holds, then a g ® d holds (pseudotransitivity)
The above rules can be inferred from Armstrong’s axioms.
Database System Concepts - 6th Edition
8.14
©Silberschatz, Korth and Sudarshan
Closure of Attribute Sets
n Given a set of attributes a, define the closure of a under F (denoted
by a+) as the set of attributes that are functionally determined by a
under F
n
Algorithm to compute a+, the closure of a under F
result := a;
while (changes to result) do
for each b ® g in F do
begin
if b Í result then result := result È g
end
Database System Concepts - 6th Edition
8.15
©Silberschatz, Korth and Sudarshan
Example of Attribute Set Closure
n R = (A, B, C, G, H, I)
n F = {A ® B
A®C
CG ® H
CG ® I
B ® H}
n (AG)+
1. result = AG
2. result = ABCG
(A ® C and A ® B)
3. result = ABCGH (CG ® H and CG Í AGBC)
4. result = ABCGHI (CG ® I and CG Í AGBCH)
n Is AG a candidate key?
1.
Is AG a super key?
2.
Does AG ® R? == Is (AG)+ Ê R
Is any subset of AG a superkey?
1.
1.
2.
Does A ® R? == Is (A)+ Ê R
Does G ® R? == Is (G)+ Ê R
Database System Concepts - 6th Edition
8.16
©Silberschatz, Korth and Sudarshan
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
n Testing for superkey: “is a a superkey?”
l
Compute a+, and check if a+ contains all attributes of R
n Testing functional dependencies: “does a ® b hold? (Is a ® b in F+?)”
l
Just check if b Í a+.
n Computing the closure of F
l
For each g Í R, we find the closure g+, and
for each S Í g+, we output a functional dependency g ® S.
Database System Concepts - 6th Edition
8.17
©Silberschatz, Korth and Sudarshan
Canonical Cover
n Sets of functional dependencies may have redundant dependencies that
can be inferred from the others
l
Example: A ® C is redundant in: {A ® B, B ® C, A ® C}
l
Parts of a functional dependency may be redundant
4
E.g.: on RHS: {A ® B, B ® C, A ® CD} can be simplified to
{A ® B, B ® C, A ® D}
4
E.g.: on LHS:
{A ® B, B ® C, AC ® D} can be simplified to
{A ® B, B ® C, A ® D}
n A canonical cover for F is a set of dependencies Fc such that
l
F+ = Fc+, and
l
No functional dependency in Fc contains an extraneous attribute, and
l
Each left side of functional dependency in Fc is unique
n Intuitively, Fc is a “minimal” set of FDs equivalent to F,
having no redundant dependencies or redundant parts of dependencies
Database System Concepts - 6th Edition
8.18
©Silberschatz, Korth and Sudarshan
Extraneous Attributes
n Let a ® b in F.
l
Attribute A Î a is extraneous if F Þ (F – {a ® b}) È {(a – A) ® b}.
l
Attribute A Î b is extraneous if (F – {a ® b}) È {a ®(b – A)} Þ F.
n Note: implication in the opposite direction is trivial in each of the cases above,
since a “stronger” functional dependency always implies a weaker one
n Example: Given F = {A ® C, AB ® C }
l
B is extraneous in AB ® C because
{A ® C, AB ® C} Þ A ® C (i.e. the result of dropping B from AB ® C).
n Example: Given F = {A ® C, AB ® CD}
l
C is extraneous in AB ® CD because
AB ® C can be inferred even after deleting C
Database System Concepts - 6th Edition
8.19
©Silberschatz, Korth and Sudarshan
Testing if an Attribute is Extraneous
Let a ® b in F.
n
n
To test if attribute A Î a is extraneous
1.
Compute ({a} – A)+ using the dependencies in F
2.
Check that ({a} – A)+ contains b; if it does, A is extraneous in a
To test if attribute A Î b is extraneous
1.
Compute a+ using only the dependencies in
F’ = (F – {a ® b}) È {a ®(b – A)},
2.
Check that a+ contains A; if it does, A is extraneous in b
Database System Concepts - 6th Edition
8.20
©Silberschatz, Korth and Sudarshan
Computing a Canonical Cover
n Algorithm
Fc = F
repeat
Replace any a1 ® b1 and a1 ® b2 in Fc
with a1 ® b1 b2 (union rule)
Find a ® b in Fc with an extraneous
attribute either in a or in b
If an extraneous attribute is found,
delete it from a ® b
until F does not change
Note: Union rule may become applicable
after some extraneous attributes have
been deleted, so it has to be re-applied
n Example
R = (A, B, C)
F = {A ® BC
B®C
A®B
AB ® C}
n
Combine A ® BC and A ® B into A ® BC
l
n
n
A is extraneous in AB ® C
l
B ® C logically implies AB ® C
l
Now, Fc = {A ® BC, B ® C}
C is extraneous in A ® BC
l
n
8.21
A ® C is logically implied by
A ® B and B ® C.
The canonical cover
l
Database System Concepts - 6th Edition
Now, Fc = {A ® BC, B ® C, AB ® C}
Fc = {A ® B, B ® C}
©Silberschatz, Korth and Sudarshan
Lossless-join Decomposition
n {R1, …, Rn} is a decomposition of R if R1 È … È Rn
n {R1, R2} is a loseless-join decomposition of R if
r = ÕR1 (r)
ÕR2 (r)
n Decomposition {R1, R2} of R is lossless join if
at least one of the following dependencies is in F+:
l
R1 Ç R2 ® R1
l
R1 Ç R2 ® R2
(i.e., if one of the two sub-schemas hold the key of the other sub-schema)
Database System Concepts - 6th Edition
8.22
©Silberschatz, Korth and Sudarshan
Dependency Preservation
n Let F be set of FD on R, and {R1, …, Rn} is a decomposition of R.
n The restriction of F to Ri, denoted by Fi, is the set of FDs in F+ that
include only attributes in Ri.
n A decomposition is dependency preserving, if
(F1 È F2 È … È Fn)+ = F+
n If it is not, then checking updates for violation of functional
dependencies may require computing joins, which is expensive.
Database System Concepts - 6th Edition
8.23
©Silberschatz, Korth and Sudarshan
Example
n R = (A, B, C)
F = {A ® B, B ® C)
l
Can be decomposed in two different ways
n R1 = (A, B), R2 = (B, C)
l
Lossless-join decomposition:
R1 Ç R2 = {B} and B ® BC
l
Dependency preserving
n R1 = (A, B), R2 = (A, C)
l
Lossless-join decomposition:
R1 Ç R2 = {A} and A ® AB
l
Not dependency preserving
(cannot check B ® C without computing R1
Database System Concepts - 6th Edition
8.24
R2)
©Silberschatz, Korth and Sudarshan
Goals of Normalization
n Let R be a relation scheme with a set F of functional dependencies
n Decide whether a relation scheme R is in “good” form
n In the case that a relation scheme R is not in “good” form,
decompose it into a set of relation scheme {R1, R2, ..., Rn} such that
l
Each relation scheme is in good form
l
The decomposition is a lossless-join decomposition
l
Preferably, the decomposition should be dependency preserving
Database System Concepts - 6th Edition
8.25
©Silberschatz, Korth and Sudarshan
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of FDs
if for all functional dependencies in F+ of the form
a®b
where a Í R and b Í R, at least one of the following holds:
n a ® b is trivial (i.e., b Í a)
n a is a superkey for R
Example schema not in BCNF:
inst_dept (ID, name, salary, dept_name, building, budget )
because dept_name® building, budget holds on inst_dept,
but dept_name is not a superkey
Database System Concepts - 6th Edition
8.26
©Silberschatz, Korth and Sudarshan
BCNF Example
n Suppose we have a schema R and a non-trivial dependency a ® b
causes a violation of BCNF.
We decompose R into:
• (aUb)
• (R-(b-a))
n In our example,
l
l
a = dept_name
b = building, budget
and inst_dept is replaced by
l ( a U b ) = ( dept_name, building, budget )
l
( R - ( b - a ) ) = ( ID, name, salary, dept_name )
n Of course, we are only interested in lossless join decomposition!
Database System Concepts - 6th Edition
8.27
©Silberschatz, Korth and Sudarshan
Testing for BCNF
n To check if a non-trivial dependency a ® b causes a violation of BCNF
1. Compute a+ (the attribute closure of a), and
2. Verify that it includes all attributes of R, that is, it is a superkey of R
n To check if a relation schema R is in BCNF
l
l
Simplified test: check only the FDs in F (not in F+) for violation of BCNF
If none of the dependencies in F causes a violation of BCNF, then none
of the dependencies in F+ will cause a violation of BCNF either.
n However, simplified test using only F is incorrect when testing a relation Ri
in a decomposition of R
l
Example: consider R = (A, B, C, D, E), with F = { A ® B, BC ® D }
4 Decompose R into R1 = (A, B) and R2 = (A, C, D, E)
4
Neither of the dependencies in F contain only attributes from
(A,C,D,E) so we might be mislead into thinking R2 satisfies BCNF.
4
In fact, dependency AC ® D in F+ shows R2 is not in BCNF.
Database System Concepts - 6th Edition
8.28
©Silberschatz, Korth and Sudarshan
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F +;
while (not done) do
if (there is a schema Ri in result that is not in BCNF)
then begin
let a ® b be a nontrivial functional dependency that
holds on Ri such that a ® Ri is not in F +,
and a Ç b = Æ;
result := (result – Ri ) È (Ri – b) È (a, b );
end
else done := true;
Note: each Ri is in BCNF, and decomposition is lossless-join.
Database System Concepts - 6th Edition
8.29
©Silberschatz, Korth and Sudarshan
Example of BCNF Decomposition
n class (course_id, title, dept_name, credits, sec_id, semester, year, building,
room_number, capacity, time_slot_id)
n Functional dependencies:
l
l
course_id → title, dept_name, credits
building, room_number → capacity
course_id, sec_id, semester, year → building, room_number, time_slot_id
n A candidate key {course_id, sec_id, semester, year}.
l
n BCNF Decomposition:
l
l
course_id→ title, dept_name, credits holds
4 but course_id is not a superkey.
We replace class by:
course(course_id, title, dept_name, credits)
4 class-1 (course_id, sec_id, semester, year, building,
room_number, capacity, time_slot_id)
4
Database System Concepts - 6th Edition
8.30
©Silberschatz, Korth and Sudarshan
BCNF Decomposition (Cont.)
n course is in BCNF
l
How do we know this?
n building, room_number→capacity holds on class-1
l
l
but {building, room_number} is not a superkey for class-1.
We replace class-1 by:
4
classroom (building, room_number, capacity)
4
section (course_id, sec_id, semester, year, building,
room_number, time_slot_id)
n classroom and section are in BCNF
Database System Concepts - 6th Edition
8.31
©Silberschatz, Korth and Sudarshan
BCNF and Dependency Preservation
n Constraints, including functional dependencies, are costly to check in practice
unless they pertain to only one relation
l
It is sufficient to test only those dependencies on each individual relation
of a decomposition
l
If so, the decomposition is dependency preserving.
n It is not always possible to get a BCNF decomposition that is dependency
preserving
l
Example: R = (J, K, L)
F = {JK ® L
L®K}
Two candidate keys = JK and JL
4
R is not in BCNF
4
Any decomposition of R will fail to preserve
JK ® L
This implies that testing for JK ® L requires a join
Database System Concepts - 6th Edition
8.32
©Silberschatz, Korth and Sudarshan
Third Normal Form: Motivation
n There are some situations where
l
BCNF is not dependency preserving, and
l
Efficient checking for FD violation on updates is important
n Solution: define a weaker normal form, called Third Normal Form (3NF)
l
Allows some redundancy (with resultant problems)
l
But FDs can be checked on individual relations without computing a join.
l
There is always a lossless-join, dependency-preserving decomposition
into 3NF.
Database System Concepts - 6th Edition
8.33
©Silberschatz, Korth and Sudarshan
Third Normal Form
n A relation schema R is in third normal form (3NF) if for all:
a ® b in F+
at least one of the following holds:
l
a ® b is trivial (i.e., b Î a)
l
a is a superkey for R
l
Each attribute A in b – a is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
n If a relation is in BCNF it is in 3NF (since in BCNF one of the first two
conditions above must hold).
Database System Concepts - 6th Edition
8.34
©Silberschatz, Korth and Sudarshan
3NF Example
n Example: R = (J, K, L)
F = {JK ® L
L®K}
Two candidate keys = JK and JL
l
l
R is in 3NF
4
JK ® L : JK is a superkey
4
L ® K : K is contained in a candidate key
J
L
K
j1
l1
k1
j2
l1
k1
j3
l1
k1
null
l2
k2
But R is not in BCNF (L ® K : nontrivial, L is not a superkey)
n There is some redundancy in this schema
l
Repetition of information
4
l
e.g., the relationship l1, k1
Need to use null values
4
e.g., to represent the relationship l2, k2 where there is no
corresponding value for J).
Database System Concepts - 6th Edition
8.35
©Silberschatz, Korth and Sudarshan
Testing for 3NF
n Need to check only FDs in F (not all FDs in F+)
n For each dependency a ® b,
l
Check if a is a superkey (using attribute closure)
n If a is not a superkey,
l
We have to verify if each attribute in b is contained in a candidate key of R
l
This test is rather more expensive, since it involves finding candidate keys
l
Testing for 3NF has been shown to be NP-hard
n Interestingly, decomposition into third normal form (described shortly) can be
done in polynomial time
Database System Concepts - 6th Edition
8.36
©Silberschatz, Korth and Sudarshan
3NF Decomposition Algorithm
Let Fc be a canonical cover for F;
i := 0;
for each functional dependency a ® b in Fc do
i := i + 1;
Ri := a b
if none of the schemas Rj, 1 £ j £ i contains a candidate key for R
then begin
i := i + 1;
Ri := any candidate key for R;
end
/* Optionally, remove redundant relations */
repeat
if any schema Rj is contained in another schema Rk
then /* delete Rj */
Rj = Ri;;
i = i - 1;
return (R1, R2, ..., Ri)
Database System Concepts - 6th Edition
8.37
©Silberschatz, Korth and Sudarshan
3NF Decomposition: An Example
n Relation schema:
cust_banker_branch = (customer_id, employee_id, branch_name, type )
n The functional dependencies for this relation schema are:
1.
customer_id, employee_id ® branch_name, type
2.
employee_id ® branch_name
3.
customer_id, branch_name ® employee_id
n We first compute a canonical cover
l
branch_name is extraneous in the r.h.s. of the 1st dependency
l
No other attribute is extraneous, so we get FC =
customer_id, employee_id ® type
employee_id ® branch_name
customer_id, branch_name ® employee_id
Database System Concepts - 6th Edition
8.38
©Silberschatz, Korth and Sudarshan
3NF Decompsition Example (Cont.)
n The for loop generates following 3NF schema:
(customer_id, employee_id, type)
(employee_id, branch_name)
(customer_id, branch_name, employee_id)
l
Observe that (customer_id, employee_id, type) contains a candidate key
of the original schema, so no further relation schema needs be added
n At end of for loop, detect and delete schemas, such as (employee_id,
branch_name), which are subsets of other schemas
n The resultant simplified 3NF schema is:
(customer_id, employee_id, type)
(customer_id, branch_name, employee_id)
Database System Concepts - 6th Edition
8.39
©Silberschatz, Korth and Sudarshan
Comparison of BCNF and 3NF
n It is always possible to decompose a relation into a set of relations
that are in 3NF such that:
l
The decomposition is lossless
l
The dependencies are preserved
n It is always possible to decompose a relation into a set of relations
that are in BCNF such that:
l
The decomposition is lossless
l
It may not be possible to preserve dependencies
Database System Concepts - 6th Edition
8.40
©Silberschatz, Korth and Sudarshan
Overall Database Design Process
n We have assumed schema R is given
l
R could have been generated when converting E-R diagram to a set
of tables.
l
R could have been a single relation containing all attributes that are
of interest (called universal relation).
4
l
Normalization breaks R into smaller relations
R could have been the result of some ad hoc design of relations,
which we then test/convert to normal form.
Database System Concepts - 6th Edition
8.41
©Silberschatz, Korth and Sudarshan
Denormalization for Performance
n May want to use non-normalized schema for performance
l
For example, displaying prereqs along with course_id, and title
requires join of course with prereq
n Alternative 1: Use denormalized relation containing attributes of course
as well as prereq with all above attributes
l
faster lookup
l
extra space and extra execution time for updates
l
extra coding work for programmer and possibility of error in extra code
n Alternative 2: use a materialized view defined as
course
l
prereq
Benefits and drawbacks same as above, except no extra coding work
for programmer and avoids possible errors
Database System Concepts - 6th Edition
8.42
©Silberschatz, Korth and Sudarshan
Other Design Issues
n Some aspects of database design are not caught by normalization
n Examples of bad database design, to be avoided:
Instead of earnings (company_id, year, amount), use
l
earnings_2004, earnings_2005, earnings_2006, etc., all on the schema
(company_id, earnings).
4
l
Above are in BCNF, but make querying across years difficult and
needs new table each year
company_year (company_id, earnings_2004, earnings_2005, earnings_2006)
4
Also in BCNF, but also makes querying across years difficult and
requires new attribute each year
4
Is an example of a crosstab, where values for one attribute become
column names – used in spreadsheets, and in data analysis tools
Database System Concepts - 6th Edition
8.43
©Silberschatz, Korth and Sudarshan
End of Chapter 8
Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Download