Fundamentals/ICY: Databases
2012/13
WEEK 11: 4th Normal Form
(optional material)
John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK
Fourth Normal Form (4NF)
4NF addresses a different sort of issue from 2NF/3NF/BCNF.
Those NFs are concerned with the redundancy from
functional dependencies (FDs).
4NF is concerned with redundancy resulting from
multivalued dependencies (MVDs).
Fourth Normal Form (4NF), contd.
A multivalued dependency of some attribute X on an
attribute-set D is like a functional dependency except that
X sometimes has several values for a given value of D.
The crucial point is that once the D value is specified, the
X values are independent of other attributes.
So, we can think of X as a multivalued attribute
implemented by putting different values in different
rows, where the set of X values is fully determined by
just the value of D.
E.g.: imagine multivalued car-colour being determined by just
the make and year of the car.
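The independence condition above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the function name `holds_mvd`, the sample rows and the extra `size` attribute are assumptions made for illustration. It tests that whenever two rows agree on D, the row combining the X values of one with the remaining attributes of the other is also present.

```python
from itertools import product


def holds_mvd(rows, d_attrs, x_attrs):
    """Return True if the MVD d_attrs ->> x_attrs holds in rows (a list of dicts)."""
    if not rows:
        return True
    all_attrs = list(rows[0].keys())
    present = {tuple(r[a] for a in all_attrs) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in d_attrs):
            # Take the X values from t1 and every other attribute from t2.
            mixed = dict(t2)
            for a in x_attrs:
                mixed[a] = t1[a]
            if tuple(mixed[a] for a in all_attrs) not in present:
                return False
    return True


# Hypothetical data: colour depends only on (make, year), independently of size.
cars = [
    {"make": "Ford", "year": 2012, "colour": "red", "size": "small"},
    {"make": "Ford", "year": 2012, "colour": "blue", "size": "small"},
    {"make": "Ford", "year": 2012, "colour": "red", "size": "large"},
    {"make": "Ford", "year": 2012, "colour": "blue", "size": "large"},
]

print(holds_mvd(cars, ["make", "year"], ["colour"]))      # True
print(holds_mvd(cars[:3], ["make", "year"], ["colour"]))  # False: (blue, large) is missing
```

Dropping a single row breaks the dependency, which is why a table carrying such an MVD has to store every combination.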
Notes re Multivalued Dependencies
• Caution: some books take functional dependencies to be just a
special case of multivalued dependencies. So all dependencies are
technically “multiple”, but some actually involve multiplicity and
some don’t.
• The determinant D in a (truly) multivalued dependency cannot be a
superkey, because if it were then there could only be one X value
per D value.
• The D/X association doesn’t violate 2NF, 3NF or BCNF because
it’s not a functional dependency.
• “Trivial” multivalued dependencies include those where D together
with X forms a superkey (including the case where there are no
other attributes). Trivial MVDs avoid the problem on the next slide.
Fourth Normal Form
[R,C&C and R&C:] A table is in 4NF if
• it is in 3NF, and
• it does not have multiple multivalued dependencies.
[Garcia-Molina et al.:] A table is in 4NF if
• it is in BCNF, and
• it does not have any non-trivial multivalued dependencies.
Example of Multiple MVDs
Example: an employee may be assigned to several work
assignments and may, independently of that, help several
different charitable organizations.
If we try to use one table, we have
• a multivalued dependency of assignment on (say) employee-id
• a multivalued dependency of charitable-org on employee-id
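To see why holding both dependencies in one table is awkward, here is a small Python sketch of one possible single-table encoding (every assignment paired with every charity). It is an illustration only and is not claimed to be one of the encodings in the textbook figure; the employee ids, assignment codes and charity names are made up.

```python
from itertools import product

# Hypothetical data: assignments and charities are independent of each other.
assignments = {"E1": ["A1", "A2", "A3"], "E2": ["A4"]}
charities = {"E1": ["RedCross", "UnitedWay"]}  # E2 helps no charity


def one_table(assignments, charities):
    """Encode both MVDs in one table by pairing every assignment with every charity."""
    rows = []
    for emp in assignments:
        for a, c in product(assignments[emp], charities.get(emp, [])):
            rows.append((emp, a, c))
    return rows


rows = one_table(assignments, charities)
print(len(rows))  # 6 rows for E1: 3 assignments x 2 charities, for only 5 underlying facts
print(any(emp == "E2" for emp, _, _ in rows))  # False: E2 vanishes unless padded with NULLs
```

The row count grows as a product rather than a sum, and an employee with values on only one side cannot be recorded without some placeholder such as NULL.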
Three Ways of Trying to Encode the Two Multivalued Dependencies
(Figure no. shown is from R&C 6th ed. It is 5.10 in 7th ed, and Fig. 7.10 in R,C&C.)
Problems with those Multiple MVDs
Those methods cause wasted space, redundancy,
and/or additional manipulation complexity (with
distinct possibility of getting the manipulation
wrong).
Because of the NULL values, it may be difficult to define
a good PK, or any PK at all. We may need to replace the NULLs by
some other special value.
A Set of Tables in 4NF
(Figure no. shown is from 6th ed. of textbook. It is 5.11 in 7th ed., and 7.11 in R,C&C)
Notes on the Resulting Tables
1) Tables ASSIGNMENT and SERVICE_V1 are bridging tables.
2) The PK of SERVICE_V1 consists of both its attributes.
3) The PK of ASSIGNMENT is meant to be ASSIGN_NUM. But note
that the other 2 columns also form a candidate key.
4) Each of the tables in the diagram is in 4NF, under both definitions of
4NF.
A. Each table is in BCNF (and hence 3NF), and
B. The only tables containing MVDs are ASSIGNMENT and SERVICE_V1, and
C. In each of these tables, there is only one MVD, with determinant =
EMP_NUM, and
D. Each of these MVDs is trivial: the attributes involved in it (the “D” together
with the “X”) form a superkey.
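The effect of splitting the two MVDs into separate bridging tables can be sketched in Python as follows. This is an illustrative sketch only: `EMP_NUM` is taken from the slides, but the column names `PROJ` and `ORG`, the sample rows and the helper `project` are assumptions, and the surrogate key `ASSIGN_NUM` is left out for brevity.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, dropping duplicate tuples."""
    return sorted({tuple(r[a] for a in attrs) for r in rows})


# Hypothetical single-table data holding both MVDs on EMP_NUM.
wide = [
    {"EMP_NUM": e, "PROJ": p, "ORG": o}
    for e, p, o in [
        (1, "P1", "O1"), (1, "P1", "O2"),
        (1, "P2", "O1"), (1, "P2", "O2"),
        (2, "P3", "O1"),
    ]
]

# One table per MVD; in each, the determinant plus the dependent attribute is the key.
assignment = project(wide, ["EMP_NUM", "PROJ"])
service = project(wide, ["EMP_NUM", "ORG"])

# Joining the two tables on EMP_NUM reproduces the original rows exactly.
rejoined = sorted(
    (e, p, o) for (e, p) in assignment for (e2, o) in service if e == e2
)

print(len(wide), len(assignment), len(service))  # 5 3 3
print(rejoined == project(wide, ["EMP_NUM", "PROJ", "ORG"]))  # True
```

Each association is now stored once, and the join shows that nothing is lost by the split when the two MVDs really are independent.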
Problems even with a Single MVD
1) Suppose there is an attribute Z (different from X) that is not
determined by D together with X, such as SIZE. (Hence, also, Z is
not determined by D by itself.) Then there are different
represented objects (e.g. cars) with different values of Z but the
same value of D and X, and each such object needs to have rows
in the table to cover all the different values of X (e.g., red, blue
and green) associated with that value of D.
So we get redundancy of representation of the D/X association
(same problem as with e.g. partial and transitive dependencies, but
now worse because of the multi-valuedness of X).
Notice that the above situation can only happen if the MVD is
non-trivial. If the MVD were trivial you wouldn’t be able to have
a Z as above.
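Problem 1 can be made concrete with a short Python sketch. The rows below are hypothetical: make and year play the role of D, colour is X, and size is Z. The sketch counts how many times each D/X pair ends up stored.

```python
from collections import Counter
from itertools import product

# Hypothetical cars: two sizes (Z) for the same make and year (D),
# each needing rows for all three colours (X).
colours = ["red", "blue", "green"]
sizes = ["small", "large"]
cars = [("Ford", 2012, colour, size) for size, colour in product(sizes, colours)]

# Count how often each D/X association (make, year, colour) is represented.
dx_counts = Counter((make, year, colour) for make, year, colour, _size in cars)

print(len(cars))                # 6
print(set(dx_counts.values()))  # {2}: every D/X pair is stored once per size
```

With more Z values per D value, every D/X pair is repeated correspondingly more often.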
Problems with a Single MVD, contd.
2) This is the problem covered earlier in the module concerning
car-colour: if there is another attribute Y in the table and
Y is determined by D, then:
• either Y has a value repeated in all the different rows
holding the different X values for a single D value, so
we get redundancy of the representation of the D/Y
association,
• or if, say, NULLs are used instead of repeating the Y
value, we get extra manipulation complexity in
handling/maintaining Y.
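The two alternatives in problem 2 can be compared in a small Python sketch. The data is hypothetical: make and year form D, colour is X, and an invented `engine` attribute stands in for Y. The helper `y_for` is likewise an assumption for illustration.

```python
# Alternative 1: the Y value is repeated in every row for the D value.
repeated = [
    ("Ford", 2012, "red", "1.6L"),
    ("Ford", 2012, "blue", "1.6L"),
    ("Ford", 2012, "green", "1.6L"),
]

# Alternative 2: Y is stored once and NULL (None) elsewhere.
with_nulls = [
    ("Ford", 2012, "red", "1.6L"),
    ("Ford", 2012, "blue", None),
    ("Ford", 2012, "green", None),
]


def y_for(rows, make, year):
    """Find the Y value for a D value, skipping rows where Y was left NULL."""
    for m, y, _colour, engine in rows:
        if (m, y) == (make, year) and engine is not None:
            return engine
    return None


print(y_for(repeated, "Ford", 2012))        # 1.6L
print(y_for(with_nulls, "Ford", 2012))      # 1.6L
print(y_for(with_nulls[1:], "Ford", 2012))  # None: deleting the red row lost Y
```

The NULL version saves repetition, but every lookup has to skip NULLs and a routine deletion can silently discard the Y value.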
Problems with a Single MVD, contd.
But note that problem 2 is prevented from arising if
the table is in BCNF, because D would have to be a non-superkey
determinant (of Y), and this is disallowed by BCNF.
Similarly, we get some such protection from problem 2 if
the table is in 3NF or just 2NF.
But BCNF etc. don’t prevent either problem 1 or
special problems arising from the interaction of
different multivalued dependencies.