Fundamentals/ICY: Databases 2012/13
WEEK 11 – 4th Normal Form (optional material)
John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK

Fourth Normal Form (4NF)

4NF is about a different sort of issue from 2NF/3NF/BCNF.
Those NFs are concerned with the redundancy arising from functional dependencies (FDs).
4NF is concerned with redundancy resulting from multivalued dependencies (MVDs).

Fourth Normal Form (4NF), contd.

A multivalued dependency of some attribute X on an attribute-set D is like a functional dependency except that X sometimes has several values for a given value of D.
The crucial point is that once the D value is specified, the X values are independent of the other attributes.
So, we can think of X as a multivalued attribute implemented by putting different values in different rows, where the set of X values is fully determined by just the value of D.
E.g.: imagine a multivalued car-colour attribute being determined by just the make and year of the car.

Notes re Multivalued Dependencies

Caution: some books take functional dependencies to be just a special case of multivalued dependencies. Under that convention all dependencies are technically “multivalued”, but only some actually involve multiplicity.
The determinant D in a (truly) multivalued dependency cannot be a superkey, because if it were then there could only be one X value per D value.
The D/X association doesn’t violate 2NF, 3NF or BCNF because it’s not a functional dependency.
“Trivial” multivalued dependencies include those where D together with X forms a superkey (including the case where there are no other attributes).
Trivial MVDs avoid the redundancy problem described later under “Problems even with a Single MVD”.
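The independence condition in the MVD definition above can be checked mechanically on a small table. The following is only a minimal sketch in Python, not part of the module materials: the function name mvd_holds, the column names and the sample rows are all made up for illustration. It treats a table as a list of dicts and tests whether, for each D value, every X value occurs together with every combination of the remaining attributes.

```python
from itertools import product


def mvd_holds(rows, d_attrs, x_attrs):
    """Check whether the MVD d_attrs ->> x_attrs holds in rows (a list of dicts).

    Hypothetical helper for illustration only.
    """
    if not rows:
        return True
    # Every attribute that belongs to neither D nor X.
    others = [a for a in rows[0] if a not in d_attrs and a not in x_attrs]
    groups = {}
    for row in rows:
        d = tuple(row[a] for a in d_attrs)
        x = tuple(row[a] for a in x_attrs)
        rest = tuple(row[a] for a in others)
        xs, rests, pairs = groups.setdefault(d, (set(), set(), set()))
        xs.add(x)
        rests.add(rest)
        pairs.add((x, rest))
    # Independence: within each D value, every X value must be paired with
    # every combination of the other attributes.
    return all(pairs == set(product(xs, rests))
               for xs, rests, pairs in groups.values())


# Made-up sample data: colour depends only on make and year, not on size.
cars = [
    {"MAKE": "Rover", "YEAR": 2010, "COLOUR": "red", "SIZE": "small"},
    {"MAKE": "Rover", "YEAR": 2010, "COLOUR": "red", "SIZE": "large"},
    {"MAKE": "Rover", "YEAR": 2010, "COLOUR": "blue", "SIZE": "small"},
    {"MAKE": "Rover", "YEAR": 2010, "COLOUR": "blue", "SIZE": "large"},
]

full_table_ok = mvd_holds(cars, ["MAKE", "YEAR"], ["COLOUR"])
missing_row_ok = mvd_holds(cars[:-1], ["MAKE", "YEAR"], ["COLOUR"])
print(full_table_ok, missing_row_ok)
```

With all four rows present the check succeeds. Dropping the last row leaves blue without a large-size row, so the colours are no longer independent of size and the check fails.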
Fourth Normal Form

[R,C&C and R&C:] A table is in 4NF if
- it is in 3NF, and
- it does not have multiple multivalued dependencies.

[Garcia-Molina et al.:] A table is in 4NF if
- it is in BCNF, and
- it does not have any non-trivial multivalued dependencies.

Example of Multiple MVDs

Example: an employee may be assigned to several work assignments and may, independently of that, help several different charitable organizations.
If we try to use one table, we have
- a multivalued dependency of assignment on (say) employee-id, and
- a multivalued dependency of charitable-org on employee-id.

Three Ways of Trying to Encode the Two Multivalued Dependencies

(Figure no. shown is from R&C 6th ed. It is Fig. 5.10 in 7th ed., and Fig. 7.10 in R,C&C.)

Problems with those Multiple MVDs

Those methods cause wasted space, redundancy, and/or additional manipulation complexity (with a distinct possibility of getting the manipulation wrong).
Because of NULL values, it may be difficult to define a good PK, or any PK at all. We may need to replace NULLs by some other special value.

A Set of Tables in 4NF

(Figure no. shown is from 6th ed. of textbook. It is Fig. 5.11 in 7th ed., and Fig. 7.11 in R,C&C.)

Notes on the Resulting Tables

1) Tables ASSIGNMENT and SERVICE_V1 are bridging tables.
2) The PK of SERVICE_V1 consists of both its attributes.
3) The PK of ASSIGNMENT is meant to be ASSIGN_NUM. But note that the other 2 columns also form a candidate key.
4) Each of the tables in the diagram is in 4NF, under both definitions of 4NF:
   A. Each table is in BCNF (and hence 3NF), and
   B. The only tables containing MVDs are ASSIGNMENT and SERVICE_V1, and
   C. In each of these tables, there is only one MVD, with determinant = EMP_NUM, and
   D. Each of these MVDs is trivial: the attributes involved in it (the “D” together with the “X”) form a superkey.

Problems even with a Single MVD

1) Suppose there is an attribute Z (different from X), such as SIZE, that is not determined by D together with X. (Hence, also, Z is not determined by D by itself.)
Then there are different represented objects (e.g. cars) with different values of Z but the same values of D and X, and each such object needs to have rows in the table to cover all the different values of X (e.g., red, blue and green) associated with that value of D.
So we get redundancy in the representation of the D/X association (the same problem as with, e.g., partial and transitive dependencies, but now worse because of the multi-valuedness of X).
Notice that the above situation can only happen if the MVD is non-trivial. If the MVD were trivial, you wouldn’t be able to have a Z as above.

Problems with a Single MVD, contd.

2) This is just the problem covered earlier in the module concerning car-colour: if there is another attribute Y in the table and Y is determined by D, then:
- either Y has a value repeated in all the different rows holding the different X values for a single D value, so we get redundancy in the representation of the D/Y association,
- or if, say, NULLs are used instead of repeating the Y value, we get extra manipulation complexity in handling/maintaining Y.

Problems with a Single MVD, contd.

But note that problem 2 is prevented from arising if the table is in BCNF: for the problem to arise, D would have to be a non-superkey determinant (of Y), and this is disallowed by BCNF.
Similarly, we get some such protection from problem 2 if the table is in 3NF or just 2NF.
But BCNF etc. don’t prevent either problem 1 or the special problems arising from the interaction of different multivalued dependencies.
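Problem 1 can be made concrete with a small made-up table. The sketch below is an illustration only, not taken from the textbooks cited above: the helper names project and natural_join, the column names and the data are all hypothetical. It takes D = (MAKE, YEAR), X = COLOUR and Z = SIZE, counts how often each D/X pair is stored, and then shows the usual remedy of splitting the table on the MVD so that each pair is stored once, with the original rows recoverable by a natural join.

```python
def project(rows, attrs):
    """Project rows (a list of dicts) onto attrs, dropping duplicate rows."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen


def natural_join(left, right):
    """Natural join of two lists of dicts on their shared attribute names."""
    if not left or not right:
        return []
    shared = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]


# Made-up data: one make/year, three colours (X), two sizes (Z).
# The MVD (MAKE, YEAR) ->> COLOUR forces every size to list every colour.
car = [{"MAKE": "Rover", "YEAR": 2010, "COLOUR": c, "SIZE": s}
       for s in ("small", "large")
       for c in ("red", "blue", "green")]

# Split on the MVD: D with X in one table, D with the rest in another.
colours = project(car, ["MAKE", "YEAR", "COLOUR"])
sizes = project(car, ["MAKE", "YEAR", "SIZE"])
rejoined = natural_join(colours, sizes)

# Compare as sets of rows, ignoring row order.
lossless = ({tuple(sorted(r.items())) for r in rejoined}
            == {tuple(sorted(r.items())) for r in car})
print(len(car), len(colours), len(sizes), lossless)
```

In the single table each make/year/colour association is stored once per size (6 rows in total), whereas after the split it is stored once (3 rows for colours plus 2 rows for sizes), and joining the two tables gives back the original rows.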