
Fundamentals/ICY: Databases
2013/14
Week 10 - Monday - Normalization, contd.
John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK
Reminder
Second Normal Form
An entity type is in second normal form (2NF) if:

It is in 1NF and

It includes no partial dependencies
Conversion to 2NF
For each determinant D involved in a partial
dependency in the original entity type T,
create a new entity type NT(D) with D as its PK,
and move the attributes X determined by D out of T
into NT(D).
D itself stays in T as well as being copied into NT(D).
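The conversion rule can be sketched on attribute sets as follows. This is only an illustration: the function name `to_2nf` and the example entity type (with attributes such as `proj_num` and `emp_num`) are assumptions, not taken from the slides.

```python
def to_2nf(attrs, pk, partial_deps):
    """Split an entity type T into 2NF pieces (illustrative sketch).

    attrs        -- all attributes of T
    pk           -- the (composite) primary key of T
    partial_deps -- list of (D, X) pairs: D is a proper subset of the PK,
                    X is the set of attributes that D determines
    Returns (remaining attributes of T, {D: attributes of NT(D)}).
    """
    remaining = set(attrs)
    pk = frozenset(pk)
    new_types = {}
    for determinant, determined in partial_deps:
        determinant = frozenset(determinant)
        # A partial dependency has its determinant strictly inside the PK.
        assert determinant < pk, "determinant must be a proper subset of the PK"
        # NT(D) gets a copy of D (as its PK) plus the attributes D determines.
        new_types[determinant] = set(determinant) | set(determined)
        # The determined attributes leave T; D itself stays behind.
        remaining -= set(determined)
    return remaining, new_types


# Hypothetical entity type: hours worked by employees on projects.
T = {"proj_num", "emp_num", "proj_name", "emp_name", "hours"}
PK = {"proj_num", "emp_num"}
partial = [
    ({"proj_num"}, {"proj_name"}),  # proj_num -> proj_name
    ({"emp_num"}, {"emp_name"}),    # emp_num  -> emp_name
]

remaining, new_types = to_2nf(T, PK, partial)
print(sorted(remaining))
for d, nt_attrs in new_types.items():
    print(sorted(d), "is the PK of", sorted(nt_attrs))
```

In this hypothetical case T keeps its full PK plus `hours`, while each partial determinant becomes the PK of its own new entity type.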
Reminder:
Partial and Transitive Dependencies
Second Normal Form (2NF) Conversion
results on example on previous slide
New
Third Normal Form
An entity type is in third normal form (3NF) if:

It is in 2NF and

It contains no transitive dependencies
Entity Type in 2NF but not in 3NF because
of a “transitive” dependency
Transitive Dependencies
• A prime attribute is one that is within some candidate key
(not necessarily the primary key).
So a non-prime attribute is, in particular, not within the PK.
• A transitive dependency is where the determinant D is at least
partially outside the PK and is not a superkey,
and the determined attribute X is non-prime (the reason for this
restriction is on a later slide).

E.g.: previous Figure for simple case of a simple (= one-attribute)
determinant.

Above definition is partly based on Garcia-Molina, Ullman & Widom 2009.
More general than the account in our textbook.
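The definition above can be turned into a small check. This is a minimal sketch under stated assumptions: functional dependencies are given as (determinant, determined) pairs of attribute sets, the candidate keys are supplied by hand, and the helper names `closure` and `is_transitive` and the example attributes are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_transitive(determinant, x, pk, candidate_keys, all_attrs, fds):
    """Test D -> x against the slide's definition of a transitive dependency."""
    determinant = set(determinant)
    prime = set().union(*candidate_keys)          # attributes in some candidate key
    partly_outside_pk = not determinant <= set(pk)
    is_superkey = closure(determinant, fds) >= set(all_attrs)
    return partly_outside_pk and not is_superkey and x not in prime


# Hypothetical entity type with a simple PK and one candidate key.
ALL = {"emp_num", "job_class", "chg_hour"}
PK = {"emp_num"}
CANDIDATE_KEYS = [PK]
FDS = [
    ({"emp_num"}, {"job_class"}),
    ({"job_class"}, {"chg_hour"}),
]

t1 = is_transitive({"job_class"}, "chg_hour", PK, CANDIDATE_KEYS, ALL, FDS)
t2 = is_transitive({"emp_num"}, "job_class", PK, CANDIDATE_KEYS, ALL, FDS)
print(t1, t2)
```

Here `job_class -> chg_hour` qualifies because `job_class` lies outside the PK, is not a superkey, and `chg_hour` is non-prime; `emp_num -> job_class` does not, because its determinant is the PK itself.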
Conversion to 3NF
For each determinant D involved in a transitive
dependency in the original entity type T,
create a new entity type NT(D) with D as its PK,
and move the attributes X transitively determined
by D out of T into NT(D).
NB: the determinants themselves stay in T as well.
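At the level of rows, the same conversion amounts to projecting the original table onto two attribute lists. The sketch below uses made-up data and a hypothetical helper `project`; it also checks that joining the two new tables on D gives back the original rows, so nothing is lost by the split.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen


# Hypothetical rows of an entity type with PK emp_num and the
# transitive dependency job_class -> chg_hour.
rows = [
    {"emp_num": 101, "job_class": "Engineer", "chg_hour": 85},
    {"emp_num": 102, "job_class": "Analyst", "chg_hour": 60},
    {"emp_num": 103, "job_class": "Engineer", "chg_hour": 85},
]

# T keeps the determinant job_class; chg_hour moves out.
T = project(rows, ["emp_num", "job_class"])
# NT(job_class) has job_class as its PK.
NT = project(rows, ["job_class", "chg_hour"])

# Joining T and NT on job_class reconstructs the original rows.
rejoined = [
    {**t, **n} for t in T for n in NT if t["job_class"] == n["job_class"]
]
print(len(T), len(NT), rejoined == rows)
```

The charge rate for each job class is now stored once in NT rather than once per employee.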
Third Normal Form (3NF)
Conversion Results on previous example
The Boyce-Codd Normal Form (BCNF)
• Determinants of partial and transitive functional dependencies
are not superkeys.
So the corresponding normalization gets rid of some non-superkey determinants used in functional dependencies.
• Normalization into BCNF gets rid of all such determinants.
An entity type is in BCNF if it’s in 1NF and every determinant
in a non-trivial functional dependency is a superkey

i.e., every attribute-set that determines any other attribute determines all
the attributes, so there’s no redundancy problem
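The superkey condition can be tested mechanically with an attribute-closure computation. The following is a sketch, not a definitive implementation: the helper names `closure` and `bcnf_violations` are hypothetical, and the entity type with attributes A, B, C, D and its dependencies is an assumed example.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List the non-trivial dependencies whose determinant is not a superkey."""
    all_attrs = set(all_attrs)
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, ignored by the definition
        if closure(lhs, fds) != all_attrs:
            bad.append((lhs, rhs))
    return bad


# Assumed example: PK is {A, B}; C also determines B.
ALL = {"A", "B", "C", "D"}
FDS = [
    ({"A", "B"}, {"C", "D"}),
    ({"C"}, {"B"}),
]

violations = bcnf_violations(ALL, FDS)
print(violations)
```

In this assumed example only C -> B is reported: the closure of {C} is {B, C}, which falls short of all the attributes, so C is a determinant without being a superkey.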
An Entity Type in 3NF but not in BCNF
The dependency is NOT TRANSITIVE since B is prime
Decomposition to BCNF
The middle diagram shows that changing the PK so as to
include C instead of B changes the dependency into a partial
one, which can then be removed in the usual way.
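The key-swapping step can be checked with a short sketch. It assumes an entity type with attributes A, B, C, D, PK {A, B}, and dependencies {A, B} -> {C, D} and C -> B; the slide's own diagram is not reproduced here, so these dependencies and the helper names are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_candidate_key(key, all_attrs, fds):
    """A candidate key is a superkey none of whose proper subsets is a superkey."""
    key = frozenset(key)
    all_attrs = set(all_attrs)
    if closure(key, fds) != all_attrs:
        return False
    return all(
        closure(sub, fds) != all_attrs
        for r in range(len(key))          # sizes 0 .. len(key) - 1
        for sub in combinations(key, r)
    )


ALL = {"A", "B", "C", "D"}
FDS = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]

# {A, C} can replace {A, B} as PK because it is also a candidate key.
ac_is_key = is_candidate_key({"A", "C"}, ALL, FDS)

# With PK {A, C}, C -> B is a partial dependency, removed as for 2NF.
T = ALL - {"B"}     # remaining entity type, PK {A, C}
NT = {"C", "B"}     # new entity type NT(C), PK {C}
print(ac_is_key, sorted(T), sorted(NT))
```

Under these assumptions the result is an entity type over A, C, D keyed by {A, C} and a second one over C, B keyed by C.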
((ASIDE: A Simple Form of BCNF))
• Any simple (= one-attribute) superkey is a candidate key.
So BCNF requires, in particular, all simple determinants to be candidate keys.
• Some books (incl. our textbook) define BCNF merely to mean in effect that all
simple determinants are candidate keys.
This is a simpler, less general form of BCNF.
A table could be in simple-BCNF but not be in full BCNF.
• My definition of (full) BCNF is from Garcia-Molina, Ullman & Widom,
Database Systems: The Complete Book, 2nd ed., Pearson, 2009.
This book also gives a process for conversion to full BCNF.
BCNF versus 3NF
• BCNF implies that there are no partial or transitive
dependencies, so a table that is in BCNF is also in
3NF.
• ((If a table is in 3NF but not BCNF then each of the
non-superkey determinants D is partly outside the PK
and determines only prime attributes.

If also the PK is the only candidate key, then:
the attributes determined by each D must all be in the PK;
but they cannot cover all of the PK (otherwise D would be a
superkey). So the PK must be composite.))
((A Reason for Prime-X Exclusion in
Transitive Dependencies))
• Earlier we said that in a transitive dependency the determined attribute X is
non-prime (i.e. not within a candidate key). The reason is:
• In removing a transitive dependency, we delete the dependent attribute X from
the original entity type. If X were within the primary key (special case of
candidate key), that key would therefore be disrupted, and this would affect
other entity types referencing the table.
• But non-primary candidate keys are also sometimes used for such referencing,
and are then called secondary keys. So if X were in such a key, the conversion to
3NF would disrupt the referencing.
• So, to keep things simple for the purposes of 3NF, all prime Xs are banned from
being transitively dependent.
((Inter-Table Reference Disruption contd.))
• NB: Conversion to 2NF can, and from 3NF to BCNF does,
remove dependent prime attributes, so is potentially disruptive of
reference between entity types.
But I assume that in practice it’s rarely a problem in conversion to
2NF, because, in partial dependencies, the dependent attributes are
rarely prime. In particular, they cannot be in the PK.
• By contrast, if a 3NF table is not in BCNF then the troublesome
dependencies necessarily involve prime Xs (see a previous slide).
((3NF and Reference Disruption contd.))
• Some textbooks (e.g., Connolly and Begg, Database Systems,
Pearson, 2010) only require transitive dependencies to avoid
non-primary-key attributes, rather than to avoid all prime attributes. In
that case, conversion to 3NF can disrupt references using a
secondary key. But at least the cases of 2NF and 3NF are now
more similar to each other.
• I haven’t seen a version of 2NF that is only concerned with
non-prime Xs. But don’t be too surprised if you come across that!
Material on 4NF:
in Week 11 if there’s time (or in
Revision Week)
Normal Forms Overall
Let “<” mean “provides less protection than”. Then:
1NF < 2NF < 3NF < BCNF ((and 3NF < 4NF))
((Also BCNF < 4NF under the second definition of 4NF.))
BCNF and 4NF guard against relatively unusual situations.
BCNF is more disruptive to achieve than 2NF or 3NF.
Merely requiring 2NF is now unusual.
So 3NF is a reasonable target.
Non-Normalization/Denormalization
Normalization leads to more tables.
Joining a larger number of tables requires additional
disk input/output (I/O) operations and more complex
manipulation, and can incur substantial
communication delays.
Conflicts among design principles, information
requirements, and processing speed are often
resolved through compromises that may include
ending up with some non-normalized tables.
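The trade-off can be seen in a small experiment with Python's built-in `sqlite3` module. This is a sketch with made-up tables (`job`, `employee`, `employee_flat`) and data, not a benchmark: the normalized design needs a join to answer a query that the non-normalized table answers directly, but a rate change touches one row instead of one row per employee.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Normalized (3NF) design: two tables.
    CREATE TABLE job (job_class TEXT PRIMARY KEY, chg_hour INTEGER);
    CREATE TABLE employee (
        emp_num INTEGER PRIMARY KEY,
        job_class TEXT REFERENCES job(job_class)
    );
    -- Non-normalized design: one wider table.
    CREATE TABLE employee_flat (
        emp_num INTEGER PRIMARY KEY, job_class TEXT, chg_hour INTEGER
    );
""")
con.executemany("INSERT INTO job VALUES (?, ?)",
                [("Engineer", 85), ("Analyst", 60)])
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [(101, "Engineer"), (102, "Analyst"), (103, "Engineer")])
con.executemany("INSERT INTO employee_flat VALUES (?, ?, ?)",
                [(101, "Engineer", 85), (102, "Analyst", 60),
                 (103, "Engineer", 85)])

# Same answer, but the normalized design needs a join.
joined = con.execute("""
    SELECT e.emp_num, e.job_class, j.chg_hour
    FROM employee e JOIN job j ON j.job_class = e.job_class
    ORDER BY e.emp_num
""").fetchall()
flat = con.execute("""
    SELECT emp_num, job_class, chg_hour
    FROM employee_flat ORDER BY emp_num
""").fetchall()

# A rate change: count the rows each design has to update.
n_norm = con.execute(
    "UPDATE job SET chg_hour = 90 WHERE job_class = 'Engineer'").rowcount
n_flat = con.execute(
    "UPDATE employee_flat SET chg_hour = 90 WHERE job_class = 'Engineer'"
).rowcount
print(joined == flat, n_norm, n_flat)
```

Which cost matters more depends on the workload, which is why the choice is described as a compromise.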
Non-/Denormalization (continued)
Unnormalized tables in a production database tend
to have these defects:

Data updates are less efficient to the extent that
programs that read and update tables must deal with
larger tables

((Indexing is much more cumbersome))

((Unnormalized tables yield no simple strategies for
creating virtual tables known as views))
Summary:
Normalization and Database Design
• Normalization helps eliminate data redundancies and some other
aspects of poor structure.
• Normalization focusses on problems in individual entity types.
• It is difficult to separate normalization from the overall ER modelling
process.
• Normalization cannot, by itself, guarantee good designs.
• 3NF is often enough, but BCNF, 4NF etc. may also need to be
considered.
• Non-normalized entity types may be desirable in some cases, to
increase processing speed and/or reduce conceptual complexity of
operations.