Module 5 – Logical Design: Table Mapping and Normalization
Goals: To learn the process of table mapping normalization and understand why it is part of the database
development process.
Objectives: To be able to apply these concepts to a variety of problems.
Introduction: Now that the systems analysis and design and conceptual design have been completed, it is possible
to progress to the next step, logical design.
We begin by using table mapping techniques to convert the ERD to a diagram that exposes which entities and
attributes are related, or functionally dependent, on others in the ERD. It is with this knowledge that can proceed to
the technique of normalization, which validates the ERD. Let’s look at the steps.
Transforming ER Diagrams into Relations
Step 1: Map Regular Entities
Each regular entity type is transformed into a relation.
Composite attributes should be broken down into each individual attribute.
Multivalued Attributes:
Two new relations are created when a multivalued attribute exists for an entity. The first relation
contains all of the attributes of the entity type except the multivalued attribute. The second relation
contains the two attributes that form the primary key of the second relation. The first attribute is the
primary key of the first relation, which becomes a foreign key to the first relation. The second attribute
is the multivalued attribute. The name of the second relation should capture the meaning of the
multivalued attribute.
Step 2: Map Weak Entities
For each weak entity type, create a new relation and include all of the simple attributes as attributes of this
relation. Include the primary key of the owner relation as a foreign key attribute in the new relation. The
primary key of the new relation is a combination of the primary key of the owner and the partial identifier
of the weak entity type.
Step 3: Map Binary Relationships
A. Map Binary One-to-Many Relationships
First, create a relation for each of the two entity types participating in the relationship. Next, include
the primary key attribute of the entity on the one-side of the relationship as a foreign key on the manyside of the relationship.
B. Map Binary Many-to-Many Relationships
Suppose that there is a binary many-to-many relationship between two entities, A and B. For such a
relationship, we create a new relation C. Include as a foreign key attribute in C the primary key for
each of the two participating entity types. These attributes become the primary key of C.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-1-
C. Map Binary One-to-One Relationships
First, two relations are created. Second, the primary key of one of the relations is included as a primary
key in the other relation.
Step 4: Map Associative Entities
First, create three relations, one for each of the participating entity types and a third for the associative
entity. The second step depends on whether an identifier was assigned to the associative entity.
Identifier Not Assigned
Default primary key for associative relation is the two primary key attributes of the other two relations.
These are then foreign keys into the other two relations.
Identifier Assigned
Identifier is the primary key. The primary keys of the two participating entity types are foreign keys in
the associative relation.
Step 5: Map Unary Relationships
A. Unary One-To-Many Relationships
The entity type is mapped to a relation as in step 1. A foreign key attribute is added which references
the primary key values.
B. Unary Many-To-Many Relationships
Two relations are created. One to represent the entity type in the relationship and the other an
associative relation to represent the M:N relationship itself. The primary key of the associative relation
consists of two attributes. These attributes both take their name from the primary key of the other
relation. They are both foreign keys into the other relation.
Step 6: Map Ternary (and n-ary) Relationships
Convert to an associative entity. Create the three relations. Create a fourth relation for the associative
entity. The primary key consists of the primary keys of the other three relations. These attributes are then
foreign keys into the three relations.
Example
Now that we have gone over all of the rules, let’s look at an example. Figure 1 shows an ER Diagram for
Wilber’s Bait World. We wish to convert this from the ER Model into a logical model. Let’s take each
part piece by piece:
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-2-
Figure 1
You will notice a few things about this ER diagram. First, there are two associative entities: Cust_Order
and the Has between Baitshop and Item.
You will also notice that there is a weak entity, Dependents. Also, notice that we have several composite
attributes, all for addresses. Finally, we have a multi-valued attribute on employee, called skill.
Let’s take this step by step.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-3-
1. Map Regular Entities
1.1. Baitshop
Baitshop has a composite attribute, address. We will need to break this down into separate fields. Also, we
will underline Shop_No since it is the primary key. Figure 2 shows the relation, or table, that will result
from the Baitshop entity:
SHOP_NO SHOP_NAM STREET
CITY STAT ZIPCO TELEPHO
E
EFigure 2 DE
NE
1.2. Employee
First, let’s create a new relation for the employee entity:
EMP_ID
F
K
NAME
STREET
MANAGERI
D
TAXRA SHOPRE
TE
NT
CITY
STAT ZIPCOD TELEPHON
E
E
Figure 3E
Employee also has a multi-valued attribute, so we need to create a second table. We will call this
employee_skill. As we have been instructed in Step 1, we should include the primary key of the Employee
table as well as the multivalued attribute itself, as shown in Figure 4:
EMP_ID Skill
Figure 4
Emp_ID is a foreign key to the employee table
1.3. Customer
Like the Employee entity, the Customer entity has a composite attribute, address. The table that results
from this entity is shown in Figure 5:
CUSTOMER_ NAM STRE CIT STAT ZIPCOD WTELEPHO HTELEPHO
ID
E
ET
Y
EFigure 5E
NE
NE
CREDIT_LI
MIT
1.4. Item
Figure 6 shows the table that results from the Item entity:
ITEM_NO
2.
F
K
DESCRIPTI TYPE PRIC WHOLESALER_I
ON Figure 6
E
D
Map Weak Entities
We have one weak entity, Dependents. Figure 7 shows the table that results from the Dependents entity:
EMP_ID
NAM GENDE
R
Figure 7E
Note that the primary key is composed of EMP_ID and NAME. Also, EMP_ID is a foreign key to the
employee table.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-4-
3. Map Binary Relationships
3.1. Map Binary One-to-Many Relationships
We have one binary one-to-many relationship between Baitshop and Employee. Remember our rule, the
primary key from the one side migrates to the many side. So, we change the employee relation to include
the primary key from the Baitshop relation. The new tables are represented in Figure 8:
SHOP_NO SHOP_NAME STREET
CITY
STATE ZIPCODE TELEPHONE
MANAGERID
TAXRATE SHOPRENT
FK
EMP_ID
NAME
STREET
CITY
STATE
ZIPCODE TELEPHONE
SHOP_NO
Figure 8
3.2. Map Binary Many-to-Many Relationships
The only binary Many-to-Many relationship that we have is between Item and the associative entity,
Cust_Order. We will map this later, once we resolve the associative entity.
3.3. Map Binary One-to-One Relationships
We have a one-to-one relationship between employee and baitshop. We can choose to place the primary
key of either one in the other. However, since all baitshops must have a manager, it would be best to place
the employee id in the baitshop table. We will call this ManagerID. We represent this in Figure 9
SHOP_NO SHOP_NAME STREE
T
EMP_ID
NAME
CITY STATE ZIPCODE
STREET
TELEPHONE
MANAGERID TAXR SHOP MANAGERID
ATE RENT
FK
CITY
STATE
ZIPCODE TELEPHONE
SHOP_NO
Figure 9
4. Map Associative Entities
4.1. Identifier Not Assigned
To Baitshop
In this example, we have both cases. Let’s look at the first case, identifier not
assigned. The Has associative entity between BaitShop and Item does not have
an identifier. We will convert this to a relation, called Baitshop_Invent. According
to our rules, we need to create a primary key that consists of the two primary
keys from BaitShop and Item. These become foreign keys into Baitshop and
Item. There are also 2 attributes, Quantity and Last_Order. These need to be
included in the relation also.Figure 10 shows this new relation.
SHOP_NO ITEM_NO
To Item
QUANTITY
LAST_ORDER
Figure 10
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-5-
4.2. Identifier Assigned
In this case, we use the identifier as the primary key. In our case study, we have
an associative entity between Baitshop and Customer. We will also include the
primary keys from both of these, but not as a composite primary key but rather as
foreign keys. We will also include the Purchase_Date as a field. The relation is
represented in Figure 11 and we will call it Cust_Order.
Figure 11
ORDER_I
D
SHOP_NO
CUSTOMER_I
D
PURCHASE_DATE
To
To Customer
Baitsho
p Relationships
5. Map Unary
We have no Unary relationships in our example.
6. Map Ternary Relationships
We have no Ternary relationships.
7. Miscellaneous
One final relationship that we have to clean up, the Has between Item and Cust_Order. This is a many-to-many
relationship. Therefore, we need to create a new relation, which we will call Order_Line. Order_Line is
represented in Figure 12 and has a composite primary key composed of the primary key from Order_Line and
the primary key from Item. We also include the quantity attribute.
ORDER_I
D
ITEM_NO
QUANTITY
Figure 12
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-6-
The Complete Logical Model
We will now put the entire logical model together. It is represented below:
Baitshop
SHOP_NO SHOP_NAME STREE
T
Employee
CITY STATE ZIPCODE
TELEPHONE
MANAGERID TAXR SHOP MANAGERID
ATE RENT
FK
FK
NAME
EMP_ID
STREET
CITY
STATE
ZIPCODE TELEPHONE
SHOP_NO
FK
Emp_Skill
EMP_ID
Skill
Dependent
EMP_ID
NAM GENDER
E
Customer
CUSTOMER_ NAME STREET CITY STATE ZIPCODE WTELEPHONE HTELEPHONE CREDIT_LIMIT
ID
Item
ITEM_NO
DESCRIPTION
TYPE
PRICE
WHOLESALER_ID
FK
Baitshop_Invent
SHOP_NO ITEM_NO
QUANTITY
LAST_ORDER
FK
FK
Cust_Order
FK
ORDER_I
D
SHOP_NO
CUSTOMER_I
D
ITEM_NO
QUANTITY
PURCHASE_DATE
Order_Line
ORDER_I
D
FK
Normalization
Normalization is an important part of the design of relational databases. Normalization is the process of applying a
set of rules against a proposed set of tables to identify and eliminate problems in the proposed data structure. This is
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-7-
part of the process of converting our conceptual design, as represented by our ER diagram, into a logical design
before physical implementation.
Normalization is used to validate, not replace, the ER diagram. If the process of normalization reveals errors in the
ERD, the ERD must then be updated. The resulting relations are then turned into a table map. The table map is
another way to graphically represent the database and how the tables relate to each other. This table map, which will
be covered in this text, is generally considered part of appropriate system documentation. It is also a good tool for
the database developer to keep on hand to remind him or her of the structure of the database. It is especially handy
for large projects, unless one has a photographic memory with abundant storage.
Many beginners and lazy database designers proceed to the logical step by simply composing a table map of the
entities and skipping the normalization process. This is a sloppy and dangerous practice, since normalization often
reveals subtle but important nuances of how data in relational tables behave. These oversights will never remain
buried once the implementation is in use. It is also a way for the database developer to guarantee s/he has a firm
grasp of the inherent meaning of the data.
Review of Keys
As previously discussed, the relational model represents data as a series of tables. Data is represented as a set of
rows comprised of columns. The order of the columns is insignificant. All rows must have a column that acts as a
primary key (with its properties of never being blank and always unique). Each column contains an attribute, or a
descriptor of a piece of information relevant to the database. Each table, made up of these rows, is called a relation.
Any attribute, or combination of attributes, that could possibly serve as a key is called a candidate key. It is from the
pool of potential keys, the candidates, that primary, secondary and foreign keys are chosen. A candidate key has the
properties that it will uniquely identify the row, whether it is a simple attribute or combination of attributes. It also
must be nonredundant; that is, no attribute of the key can be deleted without losing its unique identification status.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-8-
Primary keys are assigned for several reasons: to serve as a value to locate the exact relation desired; to guarantee
uniqueness of each row; and to be used in referential integrity, to be discussed later in the text. We denote primary
keys by underlining them in their rows, like this:
CUSTOMER (Customer_ID, Customer_last, Customer_first, Street, City, Zip, Telephone)
There are cases in which the primary keys may be more than one attribute. This is the case with weak entities, which
inherit the primary key from the strong entity in addition to its own identifier. This is known as a composite key, and
is denoted in this manner:
DEPENDENT (Employee_ID, Dependent_ID, Dependent_name, date_of_birth)
Remember from our discussion of keys that there can be foreign keys as well. A foreign key is a primary key from
another relation that has emigrated to another relation. This can serve two purposes: one, the insertion of the primary
key prevents the need to duplicate attributes from another table. As an example, we need to know which customer
has placed an order, but we don’t want to duplicate this customer information in the order table. By simply storing
the Customer_ID, we can join the two tables when necessary to recreate a customer’s order.
The second use of a foreign key is referential integrity. Referential integrity can also be programmed into the back
end of the database at physical implementation. What referential integrity means is that every child must have a
parent. Using the same example of the customers and the orders, we would never want to insert an order record
containing a Customer_ID that did correspond back to a valid relation in the table.
Foreign keys are denoted with a broken line:
ORDERS (Order_ID, Customer_ID, Date)
Why Normalize?
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
-9-
Every time a change is made to a database, consider it a chance for the database to be damaged. There are three
points of entry for errors that will lead to inconsistency and duplication in the database: when inserting a new record,
changing the value of an existing record, and deleting records from the file.
Proper design in this stage will help prevent these problems, called anomalies. There are ways during the physical
implementation that database developers can further protect the database from damage. For now, we rely on
normalization.
Consider the following table:
StudentID
S_Name
TeacherID
T_Name
InstruID
Instrument
01474
01423
01233
12100
Luigi
Ming
Veronique
Victor
143002
143002
122179
144300
Wolf
Wolf
Greylock
Blair
31
31
22
93
Oboe
Oboe
Violin
Cello
This design was quite common at one time. This file structure, called a flat file, evolved from when data was kept on
punched cards. Punched cards had eighty columns. All data pertaining to one record was on one card. The stack of
punched cards was the database! This is also the format of data in spreadsheets. It is also an easy way to design a file
structure: simply place all necessary fields in one big record.
It is not out of the realm of possibility that a database developer today will encounter such a file structure. Projects
often spring from spreadsheets that can no longer hold the size of the database necessary, or when the ability to
extract data from the spreadsheet reaches a point of diminishing returns. The data may be exported from an older
technology, such as a mainframe. Or, the newly assigned database developer may simply be the victim of an
untrained or lazy database developer.
This is a table in use by the ersatz Passageways Music School, a nonprofit organization that provides instruments
and musical instruction to low income students. Each student chooses one instrument type to learn, but can choose
another. Each instructor may teach one, or more, instruments. Each week, their assigned teacher meets with them
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 10 -
after school for lessons. The instruments have an ID on them to discourage theft. Each student comes in, signs out
the assigned instrument, has a lesson, then signs the instrument back in. The school assigns only one instrument of a
type to a student, since different instruments sometimes feel differently to the students.
What’s wrong with this picture? Look at the structure and ask yourself why this is a bad design. Remember that one
always keeps the bottom line in mind when designing data. The bottom line is generally how the data can be more
productively used or more information extracted. Ask yourself the following questions:

What will happen when a student takes more than one instrument?

What will happen when an instructor is able to teach more than one instrument or student?

What will happen when an instrument is assigned to different students for their individual sessions?
In each case, the answer is the same: when there is another record inserted into this table, the data pertaining to the
other columns gets repeated. Consider the first question; Ming is doing so well with the oboe that she decides to try
the violin as well. The record that reflects this decision is inserted.
StudentID
S_Name
TeacherID
T_Name
InstruID
Instrument
01474
01423
01233
12100
01423
Luigi
Ming
Veronique
Victor
Ming
143002
143002
122179
144300
122179
Wolf
Wolf
Greylock
Blair
Greylock
31
31
22
93
24
Oboe
Oboe
Violin
Cello
Violin
When a student takes more than one instrument, his or her information will be repeated. So is the information for the
instructor. This is unnecessary duplication.
Why do we care if information is duplicated? Consider the issue of space. Repeating information unnecessarily takes
up space on our physical server. While this may seem an insignificant matter, when databases increase in size it
becomes quite significant. Larger files cost in terms of disk space, and amount of time to join, back up and restore. It
will also take more time to search when searching on a single attribute. To retrieve a listing of students being taught
by Mr. Greylock, for example, a search of the entire database must take place.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 11 -
And what happens if this search for Mr. Greylock’s students hits a record where his name is misspelled as Mr.
Graylock? That record would be skipped, since Greylock and Graylock do not match. The more times an attribute is
entered, the more chances for user error.
Some intrepid flat-filers at this point might try simply repeating columns instead of rows. If a student takes a second
instrument, add a column called Instru2. This requires repeating the columns TeacherID, T_name, and InstruID as
well. These are called repeating groups. It also inserts a blank value for every student not taking a second
instrument. This will work for spreadsheet applications, but not for modern back ends that will reject the blank keys.
This is known as an insert anomaly. This awkward structure will also cause very awkward queries to be written to
extract needed information.
Repeating groups correspond to multivalued attributes in our ERD. It is possible, and in fact likely, that a student
could be learning more than one instrument, or that a teacher could teach more than one instrument, just as it is
likely that an employee have more than one skill or one dependent. Repeating groups are always eliminated in
relational databases.
And what happens when Passageways accepts a new student who has not been assigned a teacher as yet? We know
that keys are necessary to make a row unique. In this example, a composite key is needed to guarantee uniqueness:
StudentID + TeacherID + InstruID. Since we know that the properties of a key require that they can never be null,
we can’t insert this new student until a teacher and an instrument are assigned. This is again an insert anomaly.
Now assume that Miss Wolf has gotten married and is changing her name to Fox. How many times does the change
need to be executed? Once for every student, or twice in this example. This consumes systems resources as the file
is scanned for all values of “Wolf.” Physical disk retrieval is the slowest part of any system. We want to go to the
disk as little as possible. We are also increasing the chance for errors with each update.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 12 -
Finally, if one record is skipped for one reason or the other—say a new student for Miss Wolf had not yet been
posted to the database while the name change is running—we now have a database with both values, Wolf and Fox.
The database is said to be inconsistent—the values are not the same for all matching rows. These types of errors are
called update anomalies.
Look at the record that details Victor taking cello lessons from Mrs. Blair. What if Victor decides to drop out, and
we need to delete his record? Now we have lost all data pertaining to Mrs. Blair; that she exists and that she teaches
the cello. This is a delete anomaly. This delete anomaly also costs us data if Mrs. Blair decides to leave, or decides
to stop teaching the cello.
It is the problem of repetition, wasted space, and insert, update and delete anomalies that require normalization to
take place. The relational model is built upon a strong foundation of relational algebra and calculus, but it is not
necessary to know the math to perform a theory-based normalization process. Let’s begin.
The Start of Normalization: 0NF (Zero Normal Form)
The start of normalization is called 0NF (Zero Normal Form). This step is often skipped. 0NF is simply all columns
written out, bad design, repeating groups and all. 0NF is the structure we would see if we allowed the database
designer to insert the attempted Instru2, Instru3 columns. These should have been eliminated at the ERD level. In
cases of designs by people unfamiliar with ERD techniques, this occurs often. When converting legacy file
structures of this type, the first step is to eliminate these repeating groups.
1NF (First Normal Form)
All repeating groups have been eliminated. In addition, all composite attributes are broken down into simple
attributes (Address may be broken down into Number, Street, City, and Zip). At this point, 1NF (First Normal Form)
has been reached. 1NF is the table design that we see at the beginning of this chapter.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 13 -
In order to progress to second normal form, we need to understand functional dependencies. In 2NF, there are no
partial functional dependencies. In robust relational tables, we want full functional dependencies.
What is meant by full functional dependencies? An attribute is functionally dependent upon another attribute if X
determines Attribute Y. Y is said to be functionally dependent upon the value of X. Knowing the value of X would
be sufficient to find out the value of Y. Also, for any value of Y, there is only one value of X. X, the determing
value, is the determinant. The notation is XY.
We can say that Social Security Number is a determinant of a person’s Last name and First Name. For any SSN that
we see, we can determine the first and last name of the person. Also, for any given person (as represented by first
and last name, which admittedly is not unique) there is only one SSN.
SSN  Person’s Name
We draw this out using dependency maps like this:
SSN
Name
Partial functional dependency is where a nonkey attribute depends on part of the key but not all of it. Leaving
partial functional dependencies in will create problems in the relational model. It is not possible to have partial
functional dependencies in relations with a simple key (since there is no other part of the key for a nonkey attribute
to depend upon). There is a composite key for the above table. The dependency map looks like this:
StudentID
S_name
TeacherID
T_Name
InstruID
Instrument
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 14 -
The student name is determined by the student ID, the teacher name is determined by the teacher ID, and the
instrument is determined by the instrument ID. For each value of student name, teacher name, and instrument, there
is only one ID. The problem is that all of these values are in one table. The purpose of this database is keep track of
which student is taking which instrument with which teacher, and that purpose it meets. However, it is full of partial
key dependencies.
The key is StudentID + TeacherID + InstruID. While the student’s name is dependent upon the student ID, it is not
dependent on the teacher ID. The student’s name depends upon only one of the three attributes in the key and thus
violates the rule against having partial key dependencies. The same goes for the teacher’s name and instrument.
How do we solve this? By breaking the relation into three relations, each with its own key and only those attributes
that meet the full functional dependency requirement. We now have three relations:
STUDENT (StudentID, S_name)
TEACHER (TeacherID, T_name)
INSTRUMENT (InstruID, Instrument)
Each nonkey attribute in the relation is fully functionally dependent upon its key, or ID. We can now progress to
3NF.
3NF (Third Normal Form)
Third normal form (3NF) is 2NF with the addition of a new rule: no transitive dependencies. Remember how full
functional dependency works: The value of X will always determine the value of Y, and for every Y there is only
one X. If only part of the dependency is met, the dependency is said to be transitive.
Let’s say that now that we have a better file structure, Passageways has decided to add more attributes. The new
structure is:
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 15 -
STUDENT (StudentID, S_last, S_first, S_Street, City, Zip)
TEACHER (TeacherID, T_name, T_first, T_Street, City, Zip)
INSTRUMENT (InstruID, Instrument)
The zip code is the transitive dependency. If we know the StudentID, we can find out the zip code, but for any
given zip code, we cannot identify the student. There may be more than one student in a given zip code. So, even
though zip code describes the employee, it is a transitive dependency, and in theory, should be removed. In this
case, splitting the tables would cause another join when the student or teacher address is extracted.
STUDENT (StudentID, S_last, S_first, S_Street, Zip)
TEACHER (TeacherID, T_name, T_first, T_Street, Zip)
ZIPCODE (Zip, City, State)
INSTRUMENT (InstruID, Instrument)
The zip field is underlined to denote that it is now a foreign key.
It may sometimes be worth leaving in a transitive dependency to save on performance and simplicity. Putting the
information about the city and zip code back into the original relations is called denormalization. We will learn more
about how to make these decisions in later chapters.
When we move from 2NF to 3NF, it is necessary for us to know something about the meaning of the data and how it
will be used. It is also important not to lose any information when we decompose the tables. The goal is “lossless
decomposition,” or, checking to see if decomposing the original relations into several smaller relations has lost any
meaning from the data.
In this case, we have. Remember that the goal of this database is to keep track of which students are taking which
lessons from which instructors. Do you see any way of extracting this information from this design? We need to
have another table. Let’s call it “Lessons,” since it reflects a student taking lessons from one instructor with one
instrument. What needs to go into this relation?
LESSONS (StudentID, TeacherID, InstruID)
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 16 -
Why are all the attributes keys? Since we must have uniqueness, and we don’t want duplicate records inserted. The
back end will reject student assignments for different teachers or instruments.
This database is now in 3NF. Go back and verify full functional dependency.
Advanced Normal Forms
It was thought that 3NF was a far as one needed to go to ensure a robust design. Experience showed that still
anomalies existed beyond 3NF. These are Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF) and Fifth
Normal Form (5NF).
4NF (Fourth Normal Form)
Boyce-Codd Normal Form is a stronger form of 3NF. Since this example does not lend itself to a straightforward
demonstration of BCNF, we will return to BCNF after demonstrating 4NF and 5NF.
Our complete database now looks like this:
STUDENT (StudentID, S_last, S_first, S_Street, Zip)
TEACHER (TeacherID, T_name, T_first, T_Street, Zip)
ZIPCODE (Zip, City, State)
INSTRUMENT (InstruID, Instrument)
LESSONS (StudentID, TeacherID, InstruID)
Refer back to the original narrative; the database also exists to keep track of which student has which instrument
during a lesson. Look at the Lessons relation. We can see that it is possible to extract that information from the
relation. But look at the relation again; we know that a student signs out an instrument, meets with an instructor, and
then signs the instrument back in. Doesn’t this relation imply a fixed relationship between the student, the instructor,
and the instrument? (It actually also implies that an instructor only teaches one instrument, which is not likely to be
true all the time.)
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 17 -
Since moving from lower to higher normal forms always requires knowing the meaning of the data stored, and the
purpose of the data stored, we must know if there is a fixed relationship. For the previous normal forms, that
assumption was correct. Unfortunately now, the director of Passageways tells us that this is no longer true.
There are not enough teachers to guarantee that each student will always get the same one each time. A teacher may
be desired during the same hour for more than one student, so one will get the teacher s/he prefers and one will not.
This will require you to move to 4NF. We need to progress to 4NF when there are multivalued attributes A, B and
C, where the sets of values for B and C are independent of one another. In this case, the instrument being taught
(attribute C) is independent of the teacher (attribute B). To move to 4NF, we again decompose the relations by
splitting them. The resulting relations are:
STUDENT_INSTRUMENT (StudentID, InstruID)
STUDENT_TEACHER (StudentID, TeacherID)
With this structure, we can keep track of which student has used an instrument, however many s/he plays, without
redundancy or null values. We can also keep track of which student saw what teacher, regardless of how many
teachers have instructed that student.
The underlying issue is that the previous relation of Assignments was representing two 1:M relationships. 1 Student
can play Many instruments, and 1 Student can have Many teachers. Two relations with one 1:M relationship each
replaced this one relation with two underlying 1:M relationships. As data entry took place, think of what would
happen considering the number of instruments being learned, and the number of teachers giving lessons in that
instrument. The number of records in the database would have expanded exponentially. The resulting two relations
will save considerable space while adding flexibility.
This is the definition of 4NF: it is 3NF plus the removal of more than one 1:M relationships with a relation.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 18 -
5NF (Fifth Normal Form)
5NF occurs when constraints are placed against the data. Looking at the resulting two relations of
STUDENT_INSTRUMENT and STUDENT_TEACHER, couldn’t an inference be made that if a student takes a
given instrument, and sees a given teacher, that that teacher seen plays any of the instruments the student is taking?
Even if that inference could not be made, it’s possible that a new volunteer attempting to assign different students to
different instructors could use some help.
We actually lost information when we decomposed the LESSONS (StudentID, TeacherID, InstruID) relation to
meet the requirements of 4NF. We could certainly have discovered which teacher could teach which instrument
from this relation; if a teacher was in a lesson with a student, we could assume that teacher could play it. When we
split the table into the STUDENT_INSTRUMENT and STUDENT_TEACHER relations, we lost this ability. We
needed more meaning, that is, to know the constraint that a teacher could not teach every instrument offered.
5NF occurs when we have constraints placed upon independent multivalued attributes. The independent
multivalued attributes in this case are the instruments and the teachers, and they are independent of one another. We
can easily model this constraint by adding another relation:
TEACHER_INSTRUMENT (TeacherID, InstruID)
So, our final design looks like this:
STUDENT (StudentID, S_last, S_first, S_Street, Zip)
TEACHER (TeacherID, T_name, T_first, T_Street, Zip)
ZIPCODE (Zip, City, State)
INSTRUMENT (InstruID, Instrument)
STUDENT_INSTRUMENT (StudentID, InstruID)
STUDENT_TEACHER (StudentID, TeacherID)
TEACHER_INSTRUMENT (TeacherID, InstruID)
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 19 -
We wound up with seven tables from one original table.
BCNF (Boyce-Codd Normal Form)
Boyce-Codd Normal Form (BCNF) is a stronger form of 3NF. BCNF is based on the principle that every
determinant (value that determines another) in the relation must be a candidate key (able to uniquely identify the
row). Relations that have only one candidate key are automatically in 3NF, just like relations that have no nonkey
attributes are automatically in 3NF. Every relation in BCNF will be in 3NF, but not every 3NF is in BCNF.
Relations in 3NF that violate BCNF are quite rare. Violations of BCNF occur when there two or more candidate
keys that overlap each other, that is to say the candidate keys share some attribute in common.
BCNF states that for a dependency XY to remain in a relation, X must be a candidate key. To test for BCNF,
identify all determinants and make sure they are all candidate keys.
Passageways now wants us to model the student room assignments. A student may only receive one lesson per day,
and there is only one student in a room at a time. Teachers are assigned one room each day, but there can be more
than one teacher using a room on that day. Consider the following relation, which they believe to be in 3NF:
STUDENT_APPOINTMENT (StudentID, Appt_date, Appt_time, TeacherID, Room)
Assume that a student can have only one appointment per day. The dependency map looks like this:
Student ID
Appt_date
Appt_time
TeacherID
Room
This candidate key has been chosen
as the primary key. This is appropriate, since we are modeling students and their appointments. We don’t need to
include appointment time in the key since students are limited to one appointment per day.
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 20 -
Room
Appt_date
Appt_time
Student ID
This is a second functional dependency found in this relation. This one may remain, since this combination of
attributes is also a candidate key.
Room
Appt_date
Appt_time
TeacherID
Student ID
This third functional dependency could also be a candidate key.
Appt_date
TeacherID
Room
This functional dependency poses a problem. The room is determined by the teacher and the date, but this is not a
candidate key because it does not uniquely identify the row—there can be more than one teacher in a room per day,
so knowing the room number may get us more than one combination of teacher and date.
Why would this cause problems? Let’s populate the table and see what may happen.
StudentID
Appt_date
Appt_time
TeacherID
Room
01474
01423
01233
12100
10-02-00
10-02-00
10-02-00
10-03-00
10:00 AM
11:00 AM
01:00 PM
12:00 PM
Wolf
Wolf
Greylock
Blair
A
A
A
B
Ask yourself the following questions:

What will happen when a teacher calls in to cancel all the day’s appointments?

What will happen if the teacher is reassigned to a different room?

How many times are you storing teacher information for one day?
This relation causes duplication and insert anomalies. Our solution is to decompose the original relation into two
relations as follows:
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 21 -
STUDENT_APPOINTMENTS (StudentID, Appt_date, Appt_time, TeacherID)
ROOM_ASSIGNMENTS (TeacherID, Appt_date, Room)
Now, the information concerning a teacher’s daily room assignments are not repeated.
Is it always a good practice to decompose into BCNF? We should only split tables if all functional dependencies are
preserved. If a determinant is separated from the attributes that it determines, we might not decompose. In this case,
we have lost a functional dependency. We have lost the functional dependency that tells us that:
Room + Appt_date + Appt_time  TeacherID, StudentID, since the information about what student is in the
appointment. In this case, losing this functional dependency saves us some update problems, so we will leave the
relations as they stand.
We should do out Wilbur’s Bait world here
Copyright Lisa MacLean, Wentworth Institute of Technology, 2001. All rights reserved. Unauthorized use or copying strictly prohibited without
consent.
- 22 -