Information Systems Database Systems (AH) 6839

Spring 2000
HIGHER STILL
Information Systems
Database Systems
Advanced Higher
Support Materials


CONTENTS
Section 1  Teacher/Lecturer Notes
Section 2  Student Notes
Section 3  Study Materials
Information Systems: Database Systems (AH)
Section 1
Teacher/Lecturer Notes
Aim
This unit is designed to further develop students’ knowledge and understanding of
data analysis and structuring.
Status of this learning and teaching pack
These materials are for guidance only. The mandatory content of this unit is detailed
in the unit specification of the Arrangements document.
Target audience
While entry is at the discretion of the centre, students would normally be expected to
have attained one of the following (or possess equivalent experience):
• Database Systems (Higher) unit
• Information Systems course at Higher level
Progression
This unit builds on skills acquired particularly at Higher level in database work, but
emphasises the design of database structures. It links closely with Systems Analysis
& Design (AH) and is used in the Advanced Higher Information Systems Project.
Learning and teaching approaches
You should note that this unit continues the new approach to the teaching of database
systems begun in the Database Systems unit at Higher level. The emphasis is on the
analysis of an existing system (non-computerized) and the design of a corresponding
database. Outcome 1 continues the theme of normalization that was already present at
Higher level. Outcome 2 introduces the construction of a data model by adding a data
dictionary, event lists and entity life histories.
The pack is divided into two sections, one for each outcome. The performance
criteria for each outcome are covered in the order stated in the arrangements, except
that PC (e) of Outcome 1 is relevant to each of PCs (a) to (d).
A student-centred, resource-based learning approach is recommended. To enliven
learning, the use of video, audio and multimedia learning aids is recommended.
While the distribution of time between the outcomes will vary, students might be
expected to complete each outcome within the following time scale:
Outcome 1    6 hours
Outcome 2    14 hours
Hardware and software resources
This unit has no requirement for the student to have regular access to a computer.
However, students may use software to prepare responses for the assessments.
Suitable software would include word-processing and graphics packages. It may also
be possible to use software provided for the Systems Analysis & Design unit (such as
CASE tools).
References
Books
The following books contain material relevant to the topic:
Any book on SSADM, for example,
Eva, M. 1994, SSADM Version 4: A User’s Guide, 2nd edition, McGraw-Hill
(especially chapters 11, 12)
A short book on databases that can be used by candidates.
Rolland, F.D. 1998, The Essence of Databases, Prentice Hall
A standard text on databases written for university level courses, suitable for teachers
and lecturers.
Elmasri, R. and Navathe, S.B. 2000, Fundamentals of Database Systems, 3rd edition,
Addison-Wesley
The British Computer Society’s Glossary of Computing Terms, published by
Longman.
Internet
A wide range of articles on the Internet is available by entering terms such as
‘Relational Database Systems’, ‘Normalization’ and ‘Entity Life History’ in a search
engine such as http://www.hotbot.com or http://www.excite.com/.
Section 2
Student Notes
This half unit, Database Systems, is a core unit of Information Systems (Advanced
Higher). You will find it easier if you have already studied Database Systems either at
Higher or elsewhere as you will be continuing to look at normalization in the first
outcome. The second outcome contains new material on constructing a fuller data
model. The outcomes of the unit are:
1. Normalize a data source to third normal form.
2. Produce a data model from a normalized data source.
You will find that no computer work is required for this unit: Outcomes 1 and 2 are
about analysis and design. Although you are not required to implement a system as
part of this unit, you will use the methods of data analysis and design you learn in this
unit in the Information Systems Project.
How to tackle this unit
The learning pattern suggested for your use in this unit is as follows:
1. Study the Introduction for each outcome first. You may also find it helpful to read
through the Summary of the outcome before getting down to detailed work.
2. All the material is explained in terms of a running example. Work through the
material in the order it is written. Every few steps you are advised to try your own
example. You will find two example applications given in the section Exercises but
your teacher/lecturer may provide other exercises to try. Keep the results of your
example for outcome 1 as it will be of use in outcome 2 as well.
3. Review each outcome by using the summaries provided in the study material.
4. The section called Questions contains questions on all the material. Each question
states the performance criteria that are most relevant. These questions may be
attempted as soon as you have covered the appropriate performance criteria but they
are probably best used at the end of each outcome when you have worked through
your own example in detail.
Assessment
Assessment will normally follow each outcome, but your teacher or lecturer may
choose to assess you at different times. Possible assessment items are as follows:
Outcome 1
Tasks involving an analysis leading to the production of a third normal form data
model.
Outcome 2
Tasks involving the construction of a fuller data model with a data dictionary, event
list and entity/event matrix, and entity life histories.
References
At the head of each section you will find a list of key terms introduced in that section.
You can look these terms up in the reference material given below. Do not worry if
you cannot get access to the books listed: there are many alternatives and you should
find most of the terms in some book.
Books
The following books contain material relevant to the topic:
Any book on SSADM will follow a similar approach to this unit, for example,
Eva, M. 1994, SSADM Version 4: A User’s Guide, 2nd edition, McGraw-Hill
(especially chapters 11, 12)
There are many books on databases. A short book that can be used for reference is
Rolland, F.D. 1998, The Essence of Databases, Prentice Hall
The standard glossary is
The British Computer Society’s Glossary of Computing Terms, published by
Longman.
Internet
A wide range of articles on the Internet is available by entering terms such as
‘Relational Database Systems’, ‘Normalization’ and ‘Entity Life History’ in a search
engine such as http://www.hotbot.com or http://www.excite.com/.
Section 3
Study Materials
UNIT INTRODUCTION
This unit is concerned with the analysis of an existing data source such as a set of
paper forms or a collection of computer files that are not integrated into a single
database. In order to explain the ideas we shall use a running example of a form taken
from a hospital application.
We will analyze only one such form. In practice there would be many such forms to
analyze, so at a later stage of the analysis the entities identified from the different
forms would need to be merged. However, this process lies outside the scope of this unit.
OUTCOME 1 – NORMALISATION
Key Terms
normalization
foreign key
normal forms
third normal form (3NF)
functional dependency
repeating item
first normal form (1NF)
partial dependency
key
repeating group
second normal form (2NF)
indirect dependency.
Introduction
Normalization is defined only in the context of the relational model of data. The steps
we follow here are those given in the performance criteria of the arrangements, but
they are not the only route to producing the final entities. However, they are
especially suited to analyzing an existing data source and are often used as part of
SSADM (Structured Systems Analysis and Design Method).
If you look in database textbooks such as Rolland you will find alternative methods
based on constructing entity-relationship models and then mapping these models onto
the relational model of data. Even so, you will find it helpful to use entity-relationship
models alongside normalization, as they can often help your understanding of the data
being modeled.
Note that it is possible to construct an entity-relationship model without any reference
to normalization. However, the entities produced in this way should then be
represented as relational tables and checked to make sure they are all in third normal
form.
Normalisation is important
What happens if we have a data source that is not fully normalized? In a word, we are
then in danger of storing a single fact in more than one place in the database, or even
of not storing it at all. If a fact (for example, the address of a supplier) is there more
than once we can have problems updating the data: perhaps we will change one
occurrence of the fact but not the other(s). This leads to loss of data integrity in the
form of inconsistency. If a fact is not even stored we can never retrieve it or make
any use of it. We will look at specific examples a little later on (see steps 6-8, below).
Functional dependence and keys
Normalization is built on the idea of functional dependence. Suppose an entity (a
group of data items) contains the items A and B. The data item A is functionally
dependent on the data item B if, for any given value of B, there can be only one value
of A in any of the entity instances (records). We also say that B functionally
determines A (shown in many books with an arrow, as in B → A).
Another way of explaining this is to suppose that there are two instances (say, two
suppliers). Suppose item B is the supplier’s name and item A is the supplier’s
address. Then if the two instances have the same supplier name it follows that the two
addresses must be the same if the address is functionally dependent on the name.
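The definition can be tried out on sample records. The Python sketch below is an illustration only: the function `fd_holds` and the supplier rows are hypothetical. It checks whether any two rows agree on B but differ on A. Such a check can show that a set of data breaks a dependency, but a passing check does not prove that the dependency will always hold.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, str]],
             determinant: Sequence[str], dependent: str) -> bool:
    """Return True if no two rows agree on `determinant` but differ on `dependent`.

    A True result only means that this data does not contradict B -> A;
    it cannot prove that the dependency always holds.
    """
    seen: dict = {}
    for row in rows:
        b_value = tuple(row[item] for item in determinant)
        # setdefault stores the first A value seen for this B value
        if seen.setdefault(b_value, row[dependent]) != row[dependent]:
            return False  # same value of B, different values of A
    return True


# Hypothetical supplier records, used only for illustration.
suppliers = [
    {"Supplier Name": "Acme", "Supplier Address": "Shieldham"},
    {"Supplier Name": "Brant", "Supplier Address": "Dunbally"},
    {"Supplier Name": "Acme", "Supplier Address": "Shieldham"},
]
print(fd_holds(suppliers, ["Supplier Name"], "Supplier Address"))  # True

# A second (warehouse) address for the same supplier breaks the dependency.
suppliers.append({"Supplier Name": "Acme", "Supplier Address": "Millington"})
print(fd_holds(suppliers, ["Supplier Name"], "Supplier Address"))  # False
```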
The idea of functional dependence can be extended by allowing B to be a set of data
items, just as a key may be made up of more than one data item. Indeed we can
define what is meant by a key in terms of functional dependencies.
A set of data items in an entity is a key for that entity if it functionally determines all
the other data items in the entity and if no subset of the key items will do this.
The usual definition of a key follows directly from this. The key of an entity (group
of data items) is one or more of its data items having the property that the values of
the key items are different in every possible instance of the entity. It follows that the
key value identifies each instance uniquely.
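Since a key is defined in terms of functional dependencies, it can be tested from a list of the dependencies required by the application rather than from data. The Python sketch below assumes each dependency is written as a (determinant, dependents) pair of item sets; `closure` and `is_key` are hypothetical names. `closure` uses the standard attribute-closure technique: it keeps applying dependencies whose determinant is already known until nothing new is added. The example uses the items of the Consultant Session entity met later in this section, with the dependencies established there.

```python
from itertools import combinations


def closure(items: set, fds: list) -> set:
    """Return every item functionally determined by `items`."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs  # the determinant is known, so its dependents are too
                changed = True
    return result


def is_key(candidate: set, all_items: set, fds: list) -> bool:
    """True if `candidate` determines all items and no proper subset does."""
    if closure(candidate, fds) != all_items:
        return False
    return not any(
        closure(set(subset), fds) == all_items
        for size in range(len(candidate))
        for subset in combinations(sorted(candidate), size)
    )


items = {"Clinic Date", "Consultant Name", "Consultant Phone Number",
         "Hospital Name", "Hospital Phone Number"}
fds = [
    (frozenset({"Clinic Date", "Consultant Name"}), frozenset({"Hospital Name"})),
    (frozenset({"Consultant Name"}), frozenset({"Consultant Phone Number"})),
    (frozenset({"Hospital Name"}), frozenset({"Hospital Phone Number"})),
]
print(is_key({"Clinic Date", "Consultant Name"}, items, fds))                   # True
print(is_key({"Clinic Date"}, items, fds))                                     # False
print(is_key({"Clinic Date", "Consultant Name", "Hospital Name"}, items, fds)) # False (not minimal)
```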
Unfortunately it is not sufficient to look at actual data and check that the condition for
a functional dependency or a key is satisfied. It may well be satisfied for the current
data, but data can change. What we need to know are the conditions that the
application puts on the data and these conditions are fixed by the ‘customer’, the
specifier of the requirements.
In our example of suppliers’ names and addresses it may or may not be true that one
supplier can have more than one address. For example, one address might be the head
office and another that of a warehouse. In such a case we would say that the
supplier’s name does not functionally determine the supplier’s address. It is thus as
useful to know which functional dependencies do not exist as to know which do exist.
See the further discussion in the section ‘Classification of Constraints’.
One final point. There may be more than one key for the same entity (group of data
items). In this case we say that they are candidate keys, one of which is chosen to be
the primary key. An example of this situation is given in the section ‘An Alternative
Data Source’.
The running example: Hospital Appointments
We will be applying the methods of this unit to one example. It has been designed to
include most of the problems that normalization has to deal with. The same example
will be used in the second part of the analysis where events and life histories are
examined.
A Hospital Trust consists of several hospitals and the appointments for each clinic are
kept in a folder. Each sheet in the folder is a form that looks like the following before
any data is entered on it.
HOSPITAL CLINIC APPOINTMENTS
DATE OF CLINIC
CONSULTANT  Name                Phone number
HOSPITAL    Name                Phone number
APPOINTMENTS
Time  Patient NHS no.  Patient Name   Patient GP no.  Patient GP name  Patient GP address
A typical form that has been filled in with some data follows. Note that the variable
data (such as the hospital’s name) are shown in a sans serif font.
HOSPITAL CLINIC APPOINTMENTS
DATE OF CLINIC  24/6/1999
CONSULTANT  Name  Mr Rees       Phone number  395024
HOSPITAL    Name  St George’s   Phone number  439862
APPOINTMENTS
Time  Patient NHS no.  Patient Name   Patient GP no.  Patient GP name  Patient GP address
1400  82-4561F         Mary Fish      3163            Dr Spencer       Shieldham, SB3 6JK
1415  55-8277H         Steven Howe    4200            Dr Rose          Millington, SB23 5JC
1430  42-8433W         Ming-Toh Wan   3131            Dr Henderson     Merkeley, MR4 8TD
1445  77-5098I         Ali Ibrahim    2090            Dr Rivington     Shieldham, SB3 9RU
1500  62-8231H         Roberta Henry  5298            Dr Rose          Dunbally, MR11 7FA
1515  34-9126P         Louis Panumam  3742            Dr Patel         Shieldham, SB3 6JK
1545  24-2187L         Fabian Lee     3598            Dr Anderson      Bruntsford, MR15 3GA
The following constraints are placed upon the data.
Consultants work for the Trust at each of its hospitals
Patients attend clinics taken by the consultants
One consultant can have only one clinic on a given day
Each patient can have only one clinic appointment on a given day
The Steps of Normalisation
The data contained on all these forms is to be analyzed. The process is summarised in
the following table, and each step is explained in detail below. The column headed
‘State reached’ shows the state of normalization reached (UNF = Unnormalized
Form, 1NF = First Normal Form, etc.), and the final column, ‘PC overtaken’, shows
the point by which each performance criterion (PC) has been covered. Notice that
steps 3, 5, 7 and 9 all contribute to PC 1e.
Step  Task                                                      State reached  PC overtaken
1     Identify all the data items
2     Identify any repeating groups
3     Identify the key(s)                                       UNF            1a, 1e
4     Remove any repeating groups into separate entities
5     Identify keys, add foreign key(s) to represent the        1NF            1b, 1e
      relationships
6     Identify any partial dependencies on keys
7     Remove partially dependent items to separate entities,    2NF            1c, 1e
      identify keys, add foreign keys
8     Identify any indirect dependencies on keys
9     Remove indirectly dependent items to separate entities,   3NF            1d, 1e
      identify keys, add foreign keys
If you have done the Database Systems unit at Higher, you should be familiar with
steps 1 to 5, though you may have carried out the process in a different way.
Step 1
The first step is to extract the various data items that appear on the form. It is often
helpful to construct some additional data to increase understanding of the application.
This data should satisfy the conditions or constraints expressed in the specification.
For example, we should not have the same consultant taking two different clinics on
the same day. (You should make up some new data now.)
We name the group as a whole and each of the data items. These names may be a
little different from the names used on the printed form. For example, the form has
two items called ‘Name’, one for the consultant and one for the hospital. In order to
prevent confusion we shall expand these to ‘Consultant Name’ and ‘Hospital Name’,
respectively.
At this early stage in the analysis it is better to use long names and not use
abbreviations. Later, when the data dictionary is being constructed, shortened names
can be used.
Entity               Data Items
Clinic Appointment   Clinic Date
                     Consultant Name
                     Consultant Phone Number
                     Hospital Name
                     Hospital Phone Number
                     Appointment Time
                     Patient NHS Number
                     Patient Name
                     Patient GP Number
                     Patient GP Name
                     Patient GP Address
Step 2
In step 1 we put all the data items into one entity group but we can see that all the
items from ‘Appointment Time’ onwards have more than one value in the example
data. This means that we have a repeating group of items, so we restate the entity
group as
Entity               Data Items
Clinic Appointment   Clinic Date
                     Consultant Name
                     Consultant Phone Number
                     Hospital Name
                     Hospital Phone Number
                     Repeating group:
                       Appointment Time
                       Patient NHS Number
                       Patient Name
                       Patient GP Number
                       Patient GP Name
                       Patient GP Address
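One way to picture an unnormalized record is as a nested structure in which the repeating group is a list. In the Python sketch below, the item name "Appointments" for the repeating group and the helper `repeating_groups` are hypothetical, and only the first two of Mr Rees’s seven appointments are shown.

```python
# One unnormalized Clinic Appointment record (UNF): the repeating group is a list.
record = {
    "Clinic Date": "24/6/1999",
    "Consultant Name": "Mr Rees",
    "Consultant Phone Number": "395024",
    "Hospital Name": "St George's",
    "Hospital Phone Number": "439862",
    "Appointments": [  # repeating group (first two of seven repeats shown)
        {"Appointment Time": "1400", "Patient NHS Number": "82-4561F",
         "Patient Name": "Mary Fish", "Patient GP Number": "3163",
         "Patient GP Name": "Dr Spencer", "Patient GP Address": "Shieldham, SB3 6JK"},
        {"Appointment Time": "1415", "Patient NHS Number": "55-8277H",
         "Patient Name": "Steven Howe", "Patient GP Number": "4200",
         "Patient GP Name": "Dr Rose", "Patient GP Address": "Millington, SB23 5JC"},
    ],
}


def repeating_groups(rec: dict) -> list:
    """Names of the items that hold several values, i.e. repeating groups."""
    return [name for name, value in rec.items() if isinstance(value, list)]


print(repeating_groups(record))  # ['Appointments']
```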
The following table shows the data displayed with one column for each data item.
Some additional data has been added for another clinic (from another copy of the
form). Note that there are only two records in this table: one record has a repeating
group of seven repeats and the other a repeating group of two repeats.
Clinic Date  Consultant Name  Consultant Phone  Hospital Name  Hospital Phone  Time  Patient NHS no.  Patient Name   Patient GP no.  Patient GP name  Patient GP address
24/06/99     Mr Rees          395024            St George’s    439862          1400  82-4561F         Mary Fish      3163            Dr Spencer       Shieldham, SB3 6JK
                                                                               1415  55-8277H         Steven Howe    4200            Dr Rose          Millington, SB23 5JC
                                                                               1430  42-8433W         Ming-Toh Wan   3131            Dr Henderson     Merkeley, MR4 8TD
                                                                               1445  77-5098I         Ali Ibrahim    2090            Dr Rivington     Shieldham, SB3 9RU
                                                                               1500  62-8231H         Roberta Henry  5298            Dr Rose          Dunbally, MR11 7FA
                                                                               1515  34-9126P         Louis Panumam  3742            Dr Patel         Shieldham, SB3 6JK
                                                                               1545  24-2187L         Fabian Lee     3598            Dr Anderson      Bruntsford, MR15 3GA
24/06/99     Mr Sale          396118            St George’s    439862          1400  72-3361K         Mandy King     3131            Dr Henderson     Merkeley, MR4 8TD
                                                                               1415  65-4922D         Richard Dell   4326            Dr Mitchell      Abbeyton, MR14 4GF
Step 3
The key for this group of data items is not easy to establish. Obviously Clinic Date on
its own is not enough (there can be many clinics on one day). Similarly the
Consultant Name alone is insufficient. However the combination of Clinic Date and
Consultant Name should be unique. We can show this key as (Clinic Date,
Consultant Name).
If the analyst were unsure of this fact from the stated requirements, the question to ask
in clarification would be: ‘Can the same consultant have more than one clinic on the
same day?’ An answer of Yes would require some change to the choice of key. One
possibility is to add an artificial key, say a number that is guaranteed to be different for
each clinic. For the purposes of our running example we shall assume an answer of
No. The key data items are marked (key):
Entity               Data Items
Clinic Appointment   Clinic Date (key)
                     Consultant Name (key)
                     Consultant Phone Number
                     Hospital Name
                     Hospital Phone Number
                     Repeating group:
                       Appointment Time
                       Patient NHS Number
                       Patient Name
                       Patient GP Number
                       Patient GP Name
                       Patient GP Address
We have now reached the unnormalized form (UNF) of the data source.
(This is a good time to try steps 1-3 for one of the exercises.)
Step 4
Normalization is a process whereby the data items are grouped into entities that
satisfy certain conditions (or constraints). The first of these is a fundamental
requirement of the relational model of data. First Normal Form (1NF) requires that
each data item be a single value. For example, a data item value must not be a
repeating set of values or an address of other data. In particular, this means that
repeating groups are not allowed within an entity.
We can remove repeating groups in two ways. We can take out the repeating items
into a separate entity, or we can duplicate the values of the non-repeating data items in
each record. The first method has the advantage of advancing the normalization
process. With the second method, the items outside the repeating group would depend
on only part of the key of the combined entity.
In the example, the data items of the repeating group are identified within a record by
Appointment Time, but the whole entity (using duplication of values) has a key of
(Clinic Date, Consultant Name, Appointment Time), while items such as Consultant
Phone Number depend only on (Clinic Date, Consultant Name). We shall see later that
this partial dependence on the key means that an entity formed by duplicating a
repeating group cannot be in Second Normal Form (2NF). PC 1b requires that we use
the first method and remove the repeating group to another entity.
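For comparison, the second method (duplication), which PC 1b does not allow, can be sketched in Python as follows. The function `flatten_by_duplication` and the group name "Appointments" are hypothetical, and the record is cut down to a few items. The output shows the cost of this method: the consultant’s phone number is stored once per appointment, and each row needs the three-item key (Clinic Date, Consultant Name, Appointment Time).

```python
def flatten_by_duplication(record: dict, group: str) -> list:
    """Remove a repeating group by copying the non-repeating items into every row."""
    fixed = {name: value for name, value in record.items() if name != group}
    return [{**fixed, **repeat} for repeat in record[group]]


record = {
    "Clinic Date": "24/6/1999",
    "Consultant Name": "Mr Rees",
    "Consultant Phone Number": "395024",
    "Appointments": [
        {"Appointment Time": "1400", "Patient NHS Number": "82-4561F"},
        {"Appointment Time": "1415", "Patient NHS Number": "55-8277H"},
        {"Appointment Time": "1430", "Patient NHS Number": "42-8433W"},
    ],
}

rows = flatten_by_duplication(record, "Appointments")
for row in rows:
    # The consultant's phone number is repeated in every row.
    print(row["Clinic Date"], row["Consultant Name"],
          row["Appointment Time"], row["Consultant Phone Number"])
```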
Entity               Data Items
Consultant Session   Clinic Date (key)
                     Consultant Name (key)
                     Consultant Phone Number
                     Hospital Name
                     Hospital Phone Number
Appointment          Appointment Time (key)
(taken from the      Patient NHS Number
repeating group)     Patient Name
                     Patient GP Number
                     Patient GP Name
                     Patient GP Address
See Fig. er1 for the ER diagram for this stage in the process.
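Figure er1. [ER diagram: entity Consultant Session (Consultant Name, Clinic Date, Consultant Phone No, Hospital, Hospital Phone No) and entity Appointment (Time, Patient NHS No, Patient Name, Patient GP No, Patient GP Name, Patient GP Address), joined by the relationship Is For, between Appointment (N) and Consultant Session (1).]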
Step 5
These two groups have a many-to-one relationship between them: one Consultant
Session can have many Appointments, but each Appointment is for a single
Consultant Session. In the relational model a many-to-one relationship is represented
by putting the key of the ‘one’ end into the ‘many’ end as a foreign key.
In the example the key of Consultant Session is (Clinic Date, Consultant Name) so
both these items need to be added to the Appointment entity to put it into First Normal
Form (1NF). Of course they remain in the Consultant Session entity as well.
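Steps 4 and 5 together can be sketched in Python: the repeating group becomes a separate list of Appointment rows, and the key of Consultant Session is copied into each of them as a foreign key. The function `split_repeating_group` and the cut-down record are hypothetical. Matching the foreign key against the session key links each appointment back to its clinic.

```python
def split_repeating_group(record: dict, group: str, parent_key: list) -> tuple:
    """Move a repeating group into its own entity, adding the parent key as a foreign key."""
    parent = {name: value for name, value in record.items() if name != group}
    foreign_key = {name: record[name] for name in parent_key}
    children = [{**foreign_key, **repeat} for repeat in record[group]]
    return parent, children


record = {
    "Clinic Date": "24/6/1999",
    "Consultant Name": "Mr Rees",
    "Consultant Phone Number": "395024",
    "Appointments": [
        {"Appointment Time": "1400", "Patient NHS Number": "82-4561F"},
        {"Appointment Time": "1415", "Patient NHS Number": "55-8277H"},
    ],
}

session, appointments = split_repeating_group(
    record, "Appointments", ["Clinic Date", "Consultant Name"])
print(session)  # the phone number is now stored once
for appointment in appointments:
    print(appointment)  # each row carries (Clinic Date, Consultant Name)

# Following the foreign key back to the session finds all its appointments.
key = ("Clinic Date", "Consultant Name")
matching = [a for a in appointments if all(a[k] == session[k] for k in key)]
print(len(matching))  # 2
```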
Entity               Data Items
Consultant Session   Clinic Date (key)
                     Consultant Name (key)
                     Consultant Phone Number
                     Hospital Name
                     Hospital Phone Number
Appointment          Clinic Date (key)
(with foreign keys   Consultant Name (key)
added)               Appointment Time (key)
                     Patient NHS Number
                     Patient Name
                     Patient GP Number
                     Patient GP Name
                     Patient GP Address
The key of Appointment includes the foreign key items but this is not true for all
foreign keys. See, for example, the case of Hospital in step 9, below.
There is no change to the ER diagram at this point because ER models do not show
foreign keys. This is because ER models are not specific to relational databases: they
can also be used for other database models, such as the network model or the
object-oriented model. Foreign keys are the means by which the relational model
represents relationships.
Our running example does not have a many-to-many relationship at this stage but it is
important to deal with these correctly. See the Additional Note on Many-to-Many
Relationships that follows step 9.
(You should now try steps 4, 5 of your exercise.)
Step 6
Second Normal Form (2NF) requires that every data item not in the key is fully
dependent on the key. To put it another way: an entity in 2NF cannot contain any
partial dependencies.
A data item is partially dependent on the key if it is functionally determined by part of
the key. Obviously we cannot have part of a single-item key, so there can be no
partial dependencies if the key is a single item.
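If the functional dependencies required by the application are listed, partial dependencies can be found by working out what each proper part of the key determines. The Python sketch below assumes the dependencies are given as (determinant, dependents) pairs; `closure` (the standard attribute-closure calculation) and `partial_dependencies` are hypothetical names. It is applied to the Consultant Session entity examined below.

```python
from itertools import combinations


def closure(items: set, fds: list) -> set:
    """Return every item functionally determined by `items`."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key: set, all_items: set, fds: list) -> dict:
    """Map each non-key item to the first proper part of the key that determines it."""
    found = {}
    non_key = all_items - key
    for size in range(1, len(key)):  # proper, non-empty parts of the key only
        for part in combinations(sorted(key), size):
            for item in sorted(closure(set(part), fds) & non_key):
                found.setdefault(item, part)
    return found


items = {"Clinic Date", "Consultant Name", "Consultant Phone Number",
         "Hospital Name", "Hospital Phone Number"}
fds = [
    (frozenset({"Clinic Date", "Consultant Name"}), frozenset({"Hospital Name"})),
    (frozenset({"Consultant Name"}), frozenset({"Consultant Phone Number"})),
    (frozenset({"Hospital Name"}), frozenset({"Hospital Phone Number"})),
]
print(partial_dependencies({"Clinic Date", "Consultant Name"}, items, fds))
# {'Consultant Phone Number': ('Consultant Name',)}
```

With a single-item key there is no proper, non-empty part to try, so the function returns an empty result, in line with the remark above.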
We can now examine each entity in turn to see if it is in 2NF. We will look first at
some example data for Consultant Session. Note that this is the only entity containing
the items Consultant Phone Number and Hospital Phone Number. We will see that
there are some problems.
Clinic Date  Consultant Name  Consultant Phone Number  Hospital Name  Hospital Phone Number
24/6/1999    Mr Rees          395024                   St George’s    439862
24/6/1999    Mr Sale          396118                   St George’s    439862
1/7/1999     Mr Robins        385741                   Central        290435
1/7/1999     Mr Sale          396118                   St George’s    439862
(1) There is a problem with updating. For example, if Mr Sale changes his phone
number to 398765 we must update every place this occurs.
(2) There is a problem with losing information through deletion. Suppose Mr Rees’s
clinic is cancelled. We want to delete that row from the table, but that would remove
the only place where Mr Rees’s phone number is stored.
(3) There is a problem with adding information. If we wish to add the fact that the
consultant Mr Hope has phone number 443388 we need to add another row, but we
have no value for one of the key items (Clinic Date) for this new row. We cannot
leave it blank since every row must have a unique key to identify that row.
These three problems (or anomalies) are all aspects of the same difficulty: not having
a fact stored exactly once. They are caused by the partial dependency of Consultant
Phone Number on the key. Consultant Session has key (Clinic Date, Consultant
Name). The item Consultant Phone Number is dependent on Consultant Name, but
not on Clinic Date, so it is partially dependent on the key.
Every entity not in 2NF will have the same three problems.
(There are also problems to do with Hospital, but they have another cause; see step 8,
below.)
Step 7
We must separate out a new entity containing Consultant Name and Consultant Phone
Number, which we can call Consultant. There is a many-to-one relationship between
the entities Consultant Session and Consultant. This relationship is represented in the
relational model by a foreign key. In the example we leave the key of Consultant (i.e.
Consultant Name) in the entity Consultant Session as a foreign key.
Entity               Data Items
Consultant Session   Clinic Date (key)
                     Consultant Name (key)
                     Hospital Name
                     Hospital Phone Number
Consultant           Consultant Name (key)
                     Consultant Phone Number
The remaining data items in Consultant Session (Hospital Name and Hospital Phone
Number) depend on the full key, so Consultant Session is in 2NF.
Since the key of Consultant is a single item, Consultant Name, there cannot be any
partial dependencies, so Consultant also is in 2NF.
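The split made in step 7 is a projection of the original table onto two sets of items, with duplicate rows dropped. The Python sketch below applies it to the four sample Consultant Session rows from step 6; the helper names `project` and `count_rows_for` are hypothetical. It counts the rows that hold a copy of Mr Sale’s phone number before and after the split, and checks that joining the two tables on Consultant Name gives back the original rows.

```python
def project(rows: list, items: list) -> list:
    """Keep only `items` from each row, dropping duplicate rows (a relational projection)."""
    result = []
    for row in rows:
        reduced = {name: row[name] for name in items}
        if reduced not in result:
            result.append(reduced)
    return result


def count_rows_for(rows: list, name: str) -> int:
    """Number of rows for a consultant; each one stores a copy of the phone number."""
    return sum(1 for row in rows if row["Consultant Name"] == name)


COLUMNS = ["Clinic Date", "Consultant Name", "Consultant Phone Number",
           "Hospital Name", "Hospital Phone Number"]
session_rows = [dict(zip(COLUMNS, values)) for values in [
    ("24/6/1999", "Mr Rees", "395024", "St George's", "439862"),
    ("24/6/1999", "Mr Sale", "396118", "St George's", "439862"),
    ("1/7/1999", "Mr Robins", "385741", "Central", "290435"),
    ("1/7/1999", "Mr Sale", "396118", "St George's", "439862"),
]]

consultant_session = project(
    session_rows, ["Clinic Date", "Consultant Name", "Hospital Name", "Hospital Phone Number"])
consultant = project(session_rows, ["Consultant Name", "Consultant Phone Number"])

print(count_rows_for(session_rows, "Mr Sale"))  # 2 (before the split)
print(count_rows_for(consultant, "Mr Sale"))    # 1 (after the split)

# Joining on Consultant Name rebuilds the original rows: nothing is lost.
rejoined = [{**s, **c} for s in consultant_session for c in consultant
            if s["Consultant Name"] == c["Consultant Name"]]
print(rejoined == session_rows)  # True
```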
We can now examine the problems to see that they have been dealt with. If we put
the example data into the two tables, we get:
Consultant Session
Clinic Date  Consultant Name  Hospital Name  Hospital Phone
24/6/1999    Mr Rees          St George’s    439862
24/6/1999    Mr Sale          St George’s    439862
1/7/1999     Mr Robins        Central        290435
1/7/1999     Mr Sale          St George’s    439862
Consultant
Consultant Name  Consultant Phone
Mr Rees          395024
Mr Sale          396118
Mr Robins        385741
Note that Mr Sale appears only once in Consultant: his two clinics recorded the same
fact (his phone number), which is now stored once.
Problem (1). Updating Mr Sale’s phone number now involves changing just one data
value.
Problem (2). Deleting Mr Rees’s clinic on 24/6/1999 involves removing just one row
from Consultant Session and we still have Mr Rees’s phone number in the Consultant
table.
Problem (3). Adding a new consultant means adding a row to the Consultant table.
There is now a unique key value (the name of the consultant).
After these three changes the data would be:
Consultant Session
Clinic Date  Consultant Name  Hospital Name  Hospital Phone
24/6/1999    Mr Sale          St George’s    439862
1/7/1999     Mr Robins        Central        290435
1/7/1999     Mr Sale          St George’s    439862
Consultant
Consultant Name  Consultant Phone
Mr Rees          395024
Mr Sale          398765
Mr Robins        385741
Mr Hope          443388
We now repeat steps 6 and 7 for the other entity, Appointment, which has key (Clinic
Date, Consultant Name, Appointment Time). In this case all of the non-key items
depend on the whole key. For example, we cannot know the Patient GP Number
unless we know the Patient, and we cannot know the NHS Number until all of the key
is known. So Appointment is in 2NF. We can show the current position:
Entity               Data Items
Consultant Session   Clinic Date (key)
                     Consultant Name (key)
                     Hospital Name
                     Hospital Phone Number
Consultant           Consultant Name (key)
                     Consultant Phone Number
Appointment          Clinic Date (key)
                     Consultant Name (key)
                     Appointment Time (key)
                     Patient NHS Number
                     Patient Name
                     Patient GP Number
                     Patient GP Name
                     Patient GP Address
See Fig. er2 for the ER diagram at this point. Note again that no foreign keys are
needed on the ER diagram.
Figure er2. [ER diagram: entity Consultant (Consultant Name, Consultant Phone No), weak entity Consultant Session (Clinic Date, Hospital, Hospital Phone No) and entity Appointment (Time, Patient NHS No, Patient Name, Patient GP No, Patient GP Name, Patient GP Address). Relationships: Takes, between Consultant (1) and Consultant Session (N); Is For, between Appointment (N) and Consultant Session (1).]
The Consultant Session entity is properly termed a ‘weak entity’ and the Clinic Date
item is only a partial key. The full key is the combination of Clinic Date and
Consultant Name (the key of Consultant). The weak entity is shown with a
double-lined box.
(Try steps 6, 7 for your own exercise.)
Step 8
Third Normal Form (3NF) requires that every non-key item depends directly on the
whole key. We look first at Consultant Session and again examine some example
data.
Clinic Date  Consultant Name  Hospital Name  Hospital Phone
24/6/1999    Mr Rees          St George’s    439862
24/6/1999    Mr Sale          St George’s    439862
1/7/1999     Mr Robins        Central        290435
1/7/1999     Mr Sale          St George’s    439862
Unfortunately the same kinds of problem are still present.
(1) There is a problem with updating. For example, if St George’s Hospital changes
its phone number we must update every place this occurs.
(2) There is a problem with losing information through deletion. Suppose Mr
Robins’s clinic is cancelled. We want to delete that row from the table, but that
would remove the only place where Central Hospital’s phone number is stored.
(3) There is a problem with adding information. If we wish to add the fact that Heath
Hospital has phone number 357380 we need to add another row, but we have no value
for either of the key items (Clinic Date, Consultant Name) for this new row. We
cannot leave them blank since every row must have a unique key to identify that row.
These problems are caused not by any partial dependence of items on the key but by
indirect dependence on the key.
Hospital Phone Number depends on Hospital Name, which itself depends on the key,
so Hospital Phone Number depends on the key only indirectly and this entity is not in
3NF.
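Indirect dependencies can be found in a similar way to partial ones: for each non-key item, work out which other non-key items it determines. The Python sketch below assumes the entity has only one candidate key and that the dependencies are listed as (determinant, dependents) pairs; the function names are hypothetical. It is applied to Consultant Session as it stands after step 7, and to the Patient entity that is separated out in step 9.

```python
def closure(items: set, fds: list) -> set:
    """Return every item functionally determined by `items`."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def indirect_dependencies(key: set, all_items: set, fds: list) -> dict:
    """Map each non-key item to a non-key item through which it depends on the key."""
    found = {}
    non_key = all_items - key
    for via in sorted(non_key):
        for item in sorted((closure({via}, fds) - {via}) & non_key):
            found.setdefault(item, via)
    return found


session_items = {"Clinic Date", "Consultant Name", "Hospital Name", "Hospital Phone Number"}
session_fds = [
    (frozenset({"Clinic Date", "Consultant Name"}), frozenset({"Hospital Name"})),
    (frozenset({"Hospital Name"}), frozenset({"Hospital Phone Number"})),
]
print(indirect_dependencies({"Clinic Date", "Consultant Name"}, session_items, session_fds))
# {'Hospital Phone Number': 'Hospital Name'}

patient_items = {"Patient NHS Number", "Patient Name", "Patient GP Number",
                 "Patient GP Name", "Patient GP Address"}
patient_fds = [
    (frozenset({"Patient NHS Number"}), frozenset({"Patient Name", "Patient GP Number"})),
    (frozenset({"Patient GP Number"}), frozenset({"Patient GP Name", "Patient GP Address"})),
]
print(indirect_dependencies({"Patient NHS Number"}, patient_items, patient_fds))
# {'Patient GP Address': 'Patient GP Number', 'Patient GP Name': 'Patient GP Number'}
```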
Step 9
We must separate out an entity containing Hospital Name and Hospital Phone Number,
calling it, say, Hospital. There is a many-to-one relationship between Consultant
Session and Hospital, and this relationship is represented by keeping the item Hospital
Name (the key of the new entity Hospital) in Consultant Session as a foreign key. The
two entities are now:
Entity               Data Items
Consultant Session   Clinic Date (key)
                     Consultant Name (key)
                     Hospital Name
Hospital             Hospital Name (key)
                     Hospital Phone Number
We now repeat steps 8 and 9 for the other entities. First we examine the entity
Consultant. This has only one non-key item and so there is no possibility of
dependence on a non-key item. Consultant is in 3NF.
Finally we examine Appointment. All four items Patient Name, Patient GP Number,
Patient GP Name, Patient GP Address are dependent on the non-key item Patient
NHS Number so the entity is not in 3NF.
We can move these items, together with Patient NHS Number, into another entity
called Patient. It will have the key Patient NHS Number, and this item is also left in
Appointment as a foreign key to represent the many-to-one relationship between
Appointment and Patient. The two entities are shown in the following table.
Entity               Data Items
Appointment          Clinic Date (key)
                     Consultant Name (key)
                     Appointment Time (key)
                     Patient NHS Number
Patient              Patient NHS Number (key)
                     Patient Name
                     Patient GP Number
                     Patient GP Name
                     Patient GP Address
If we examine the new entity Patient, we see that there are still items that do not
depend directly on the key. Patient GP Name and Patient GP Address depend on
Patient GP Number and not directly on Patient NHS Number. This means that Patient
is not yet in 3NF. We move the items Patient GP Number, Patient GP Name and
Patient GP Address to a new entity, calling it simply GP. It will have as its key Patient
GP Number.
The many-to-one relationship between Patient and GP is represented by the foreign
key Patient GP Number. It is convenient to rename the items in GP so that they no
longer carry the prefix Patient, but we keep the old name in the foreign key to explain
its role. The entities are now:
Entity    Data Items
Patient   Patient NHS Number, Patient Name, Patient GP Number
GP        GP Number, GP Name, GP Address
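The decomposition of Patient can be tried out on a small scale. The Python sketch below is illustrative only: the helper names project and rejoin are hypothetical, and the second patient (NHS number 11-1111X, 'A N Other') is invented to show a GP's details being repeated. It splits the rows into Patient and GP and then rebuilds the original rows by following the foreign key, which shows that no information is lost.

```python
# Illustrative rows for Patient before the GP details are removed.
# The first row comes from the patient form; the second is invented.
patients_before = [
    {"Patient NHS Number": "82-4561F", "Patient Name": "Mary Fish",
     "Patient GP Number": "3163", "Patient GP Name": "Dr Spencer",
     "Patient GP Address": "Shieldham, SB3 6JK"},
    {"Patient NHS Number": "11-1111X", "Patient Name": "A N Other",
     "Patient GP Number": "3163", "Patient GP Name": "Dr Spencer",
     "Patient GP Address": "Shieldham, SB3 6JK"},
]


def project(rows, columns):
    """Keep only the named columns and drop duplicate rows (first-seen order)."""
    result = []
    for row in rows:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result


# Patient keeps Patient GP Number as a foreign key to the new GP entity.
patient = project(patients_before,
                  ["Patient NHS Number", "Patient Name", "Patient GP Number"])

# The GP items lose the 'Patient' prefix, as in the text.
gp = [{"GP Number": r["Patient GP Number"], "GP Name": r["Patient GP Name"],
       "GP Address": r["Patient GP Address"]}
      for r in project(patients_before,
                       ["Patient GP Number", "Patient GP Name", "Patient GP Address"])]


def rejoin(patient_rows, gp_rows):
    """Follow each foreign key back to its GP row to rebuild the original rows."""
    gp_by_number = {g["GP Number"]: g for g in gp_rows}
    rebuilt = []
    for p in patient_rows:
        g = gp_by_number[p["Patient GP Number"]]
        rebuilt.append({**p, "Patient GP Name": g["GP Name"],
                        "Patient GP Address": g["GP Address"]})
    return rebuilt


print(len(patient), len(gp))                   # 2 1
print(rejoin(patient, gp) == patients_before)  # True
```

Dr Spencer's name and address are now stored once, however many patients are registered with him.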
All the entities are now in 3NF and we can display the full set of entities:
Entity               Data Items
Consultant Session   Clinic Date, Consultant Name, Hospital Name
Hospital             Hospital Name, Hospital Phone Number
Consultant           Consultant Name, Consultant Phone Number
Appointment          Clinic Date, Consultant Name, Appointment Time, Patient NHS Number
Patient              Patient NHS Number, Patient Name, Patient GP Number
GP                   GP Number, GP Name, GP Address
See Fig. er3 for the final ER diagram with the six entities.
Figure er3. Final ER diagram with the six entities Hospital, Consultant, Consultant
Session, Appointment, Patient and GP, connected by the relationships Holds, Takes,
Is For, Attends and has.
Appointment is another weak entity. It gets its key from its own partial key and
from the key of Consultant Session. Since Consultant Session is itself weak, the key
of Appointment has three items in all. Note once again that no foreign keys are
shown on an ER diagram.
(Now try steps 8 and 9 for your own exercise.)
Additional Note on Many-to-Many Relationships
When we form a new entity by taking out a group of items there will be relationships
between the resulting entities. Usually these will be many-to-one relationships (or
occasionally one-to-one). These relationships are represented in the relational model
by foreign keys.
However, sometimes a relationship will be many-to-many. In order to represent a
many-to-many relationship in the relational model we must introduce an additional
linking entity. This new entity will contain a foreign key for each of the originally
related entities.
Every many-to-many relationship can be decomposed into two many-to-one
relationships in this way. This is shown in a general way using ER diagrams in Fig.
er3A. Note that any attributes of the many-to-many relationship become attributes of
the new entity. This will happen in the simple example we now look at.
Figure ER3A. Decomposition of a many-to-many relationship into two many-to-one
relationships.
(a) The original many-to-many relationship: Entity A (Key of A, Attribute of A) is
related M:N to Entity B (Key of B, Attribute of B) by the relationship R, which has its
own attribute (Attribute of R).
(b) The same model with the decomposition into two many-to-one relationships:
Entity A is related 1:N to a Link Entity by RA, and Entity B is related 1:N to the Link
Entity by RB. The attribute of R becomes an attribute of the Link Entity.
Suppose that students may enrol on courses. If there is one form for each student we
might show the data items as in the following table.
Entity    Data Items
Student   Student Number, Student Name, Student Date of Birth
          Repeating group: Course Number, Course Title, Course Credits,
          Date Enrolled, Grade Awarded
(Try making up some data for this example.)
We separate the repeating group into a separate entity, Course.
Entity    Data Items
Student   Student Number, Student Name, Student Date of Birth
Course    Course Number, Course Title, Course Credits, Date Enrolled, Grade Awarded
(What happens to your example data at this point?)
These two entities have a many-to-many relationship (a student can do many courses,
a course can have many students taking it). We introduce a new entity, Enrolment,
that is in a many-to-one relationship with each of Student and Course. To represent
these relationships, we need to add the keys of these two entities as foreign keys in
Enrolment.
It is also clear that the data items Date Enrolled and Grade Awarded belong to this
new entity. (That is why it was difficult to assign the example data to the two entities
Student and Course.)
We now need to determine the key of Enrolment. If we assume that a student cannot
take a course more than once, Date Enrolled and Grade Awarded are functionally
determined by the combination (Student Number, Course Number). The key of
Enrolment is the combination (Student Number, Course Number).
Entity      Data Items
Student     Student Number, Student Name, Student Date of Birth
Course      Course Number, Course Title, Course Credits
Enrolment   Student Number, Course Number, Date Enrolled, Grade Awarded
            (key: Student Number, Course Number)
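One way to see the link entity at work is to declare the three entities in a real database. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the column names (written without spaces) and the sample students, courses and dates are hypothetical. The composite primary key encodes the assumption that a student cannot take a course more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Student (
    StudentNumber      TEXT PRIMARY KEY,
    StudentName        TEXT NOT NULL,
    StudentDateOfBirth TEXT
);
CREATE TABLE Course (
    CourseNumber  TEXT PRIMARY KEY,
    CourseTitle   TEXT NOT NULL,
    CourseCredits INTEGER
);
-- Link entity: a foreign key for each related entity plus the
-- attributes of the relationship itself (Date Enrolled, Grade Awarded).
CREATE TABLE Enrolment (
    StudentNumber TEXT NOT NULL REFERENCES Student(StudentNumber),
    CourseNumber  TEXT NOT NULL REFERENCES Course(CourseNumber),
    DateEnrolled  TEXT NOT NULL,
    GradeAwarded  TEXT,
    PRIMARY KEY (StudentNumber, CourseNumber)
);
""")

conn.executemany("INSERT INTO Student VALUES (?, ?, ?)",
                 [("S1", "Ann", "1982-03-01"), ("S2", "Ben", "1981-11-20")])
conn.executemany("INSERT INTO Course VALUES (?, ?, ?)",
                 [("C1", "Databases", 10), ("C2", "Networks", 10)])
conn.executemany("INSERT INTO Enrolment VALUES (?, ?, ?, ?)",
                 [("S1", "C1", "1999-09-01", "A"),
                  ("S1", "C2", "1999-09-01", None),
                  ("S2", "C1", "1999-09-03", "B")])


def insert_ok(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Many-to-many in both directions: S1 takes two courses, C1 has two students.
courses_of_s1 = conn.execute(
    "SELECT COUNT(*) FROM Enrolment WHERE StudentNumber = 'S1'").fetchone()[0]
students_on_c1 = conn.execute(
    "SELECT COUNT(*) FROM Enrolment WHERE CourseNumber = 'C1'").fetchone()[0]

unknown_student = insert_ok("INSERT INTO Enrolment VALUES (?, ?, ?, ?)",
                            ("S9", "C1", "1999-09-05", None))  # foreign key fails
second_attempt = insert_ok("INSERT INTO Enrolment VALUES (?, ?, ?, ?)",
                           ("S1", "C1", "2000-09-01", None))   # key already used
print(courses_of_s1, students_on_c1, unknown_student, second_attempt)  # 2 2 False False
```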
If a student can take a course more than once, Date Enrolled is not functionally
determined by (Student Number, Course Number). We need to add Date Enrolled to
the key.
Entity      Data Items
Student     Student Number, Student Name, Student Date of Birth
Course      Course Number, Course Title, Course Credits
Enrolment   Student Number, Course Number, Date Enrolled, Grade Awarded
            (key: Student Number, Course Number, Date Enrolled)
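The effect of widening the key can be checked directly. This Python/sqlite3 sketch (the table and column names and the sample values are hypothetical) builds the Enrolment table twice, once with each choice of key, and tries to record a student taking the same course a second time.

```python
import sqlite3


def can_record_retake(key_columns):
    """Build Enrolment with the given primary key and try to store a retake."""
    conn = sqlite3.connect(":memory:")
    key = ", ".join(key_columns)
    conn.execute(f"""CREATE TABLE Enrolment (
        StudentNumber TEXT NOT NULL,
        CourseNumber  TEXT NOT NULL,
        DateEnrolled  TEXT NOT NULL,
        GradeAwarded  TEXT,
        PRIMARY KEY ({key}))""")
    conn.execute("INSERT INTO Enrolment VALUES ('S1', 'C1', '1999-09-01', 'D')")
    try:
        # Same student, same course, later date: a retake.
        conn.execute("INSERT INTO Enrolment VALUES ('S1', 'C1', '2000-09-01', 'B')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(can_record_retake(["StudentNumber", "CourseNumber"]))                  # False
print(can_record_retake(["StudentNumber", "CourseNumber", "DateEnrolled"]))  # True
```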
The ER diagrams are shown in Fig. er3B. The new entity is a weak entity and its
partial key is shown as Date Enrolled (assuming here that a course can be taken more
than once).
Figure er3B. The student course example.
(a) ER diagram with a many-to-many relationship: Student (Student Number, Student
Name, Student Date of Birth) is related M:N to Course (Course Number, Course Title,
Course Credits) by Enrol, which has the attributes Date Enrolled and Grade Awarded.
(b) ER diagram showing decomposition into two many-to-one relationships: Student
(1) makes (N) Enrolment, and Course (1) has (N) Enrolment. The link entity Enrolment
is a weak entity with Date Enrolled as a partial key and Grade Awarded as an attribute.
Types of User
Nearly every information system will have more than one kind of user. Different
users will need to see different selections of the information, or at least want to have it
arranged in different ways. Some users will be restricted to read-only access, others
will be allowed to modify certain items, still others will be allowed to add or delete
instances, and perhaps others again will have unrestricted access.
In the hospital appointment example we can think of at least three kinds of user:
• the hospital administrators,
• the consultants,
• the patients.
Each type of user will look at different parts of the data and will want it arranged in a
way that suits them. The forms we have been using were obviously designed for use
by administrators staffing the clinics. In the next section we look at how the data
might be structured to suit the patients.
(You should list the types of users for the exercise that you are following through.)
For the second outcome of this unit you will be listing the events that cause
processing of the data. It is easy to overlook some of these events and so produce an
incomplete model. One way to help find all the events is to take the types of user in
turn and list events from each one’s point of view. Duplicates will of course need to be
removed, but you are much more likely to have found all the events.
An Alternative Data Source
Although we have completed the normalization of our hospital clinic data source, it is
instructive to look at it again.
The same data that is written on the Hospital Clinic Appointments forms can be
structured quite differently if we take the point of view of the patient. In this case
there will be a form filled in for each patient containing the details of each
appointment.
There is another reason for looking at the same problem a second time. We will find
that reaching 3NF can be quite difficult when there is a choice of keys for an entity.
In our previous study of the problem we were able to identify keys quite easily but we
did in fact overlook one possibility.
PATIENT HOSPITAL CLINIC APPOINTMENTS
PATIENT
NHS no.  82-4561F        Name  Mary Fish
PATIENT’S GP
GP no.  3163        GP name  Dr Spencer        GP address  Shieldham, SB3 6JK
APPOINTMENTS
Clinic Date  Appointment  Hospital     Hospital    Consultant  Consultant
             Time         Name         Phone no.   Name        Phone no.
24/6/1999    1400         St George’s  439862      Mr Rees     395024
15/8/1999    1530         St George’s  439862      Mr Salmon   309651
There is no new data on these forms. What is different is that the data is structured
differently and so different values will need to be repeated on the forms. For
example, the Hospital phone number is duplicated in the two appointments shown.
If we apply steps 1 to 3 to this new form of the data, we get the UNF shown below.
Entity                Data Items
Patient Appointment   Patient NHS Number, Patient Name, Patient GP Number,
                      Patient GP Name, Patient GP Address
                      Repeating group: Clinic Date, Appointment Time, Hospital Name,
                      Hospital Phone Number, Consultant Name, Consultant Phone Number
We can now proceed to steps 4 and 5 and remove the repeating group into an entity
‘Appointment’, adding Patient NHS Number as a foreign key in Appointment, and
renaming the group with key Patient NHS Number as simply Patient.
Entity        Data Items
Patient       Patient NHS Number, Patient Name, Patient GP Number, Patient GP Name,
              Patient GP Address
Appointment   Clinic Date, Patient NHS Number, Appointment Time, Hospital Name,
              Hospital Phone Number, Consultant Name, Consultant Phone Number
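Steps 4 and 5 can be expressed as a small function. In this Python sketch (the function name remove_repeating_group is hypothetical) the patient form, with the values shown on the form above, becomes one Patient row plus one Appointment row per entry in the repeating group, and each Appointment row carries Patient NHS Number as a foreign key.

```python
# The patient form, with its repeating group of appointments.
patient_form = {
    "Patient NHS Number": "82-4561F",
    "Patient Name": "Mary Fish",
    "Patient GP Number": "3163",
    "Patient GP Name": "Dr Spencer",
    "Patient GP Address": "Shieldham, SB3 6JK",
    "Appointments": [  # the repeating group
        {"Clinic Date": "24/6/1999", "Appointment Time": "1400",
         "Hospital Name": "St George's", "Hospital Phone Number": "439862",
         "Consultant Name": "Mr Rees", "Consultant Phone Number": "395024"},
        {"Clinic Date": "15/8/1999", "Appointment Time": "1530",
         "Hospital Name": "St George's", "Hospital Phone Number": "439862",
         "Consultant Name": "Mr Salmon", "Consultant Phone Number": "309651"},
    ],
}


def remove_repeating_group(form, group_name, key_item):
    """Split a form into one parent row and one child row per group entry.
    Each child row gets a copy of the parent's key as a foreign key."""
    parent = {k: v for k, v in form.items() if k != group_name}
    children = [{key_item: form[key_item], **entry} for entry in form[group_name]]
    return parent, children


patient, appointments = remove_repeating_group(
    patient_form, "Appointments", "Patient NHS Number")
print(patient["Patient Name"], len(appointments))  # Mary Fish 2
print(appointments[1]["Consultant Name"], appointments[1]["Patient NHS Number"])
# Mr Salmon 82-4561F
```

Note that the hospital phone number is still repeated in both Appointment rows; that is the indirect dependency dealt with at steps 8 and 9.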
For steps 6 and 7 we find that the Patient entity is already in 2NF since the key
consists of a single item. The Appointment entity is also in 2NF since neither Clinic
Date nor Patient NHS Number determines any of the other items uniquely. For
example, on a given date more than one Hospital may have clinics; a given patient
may see more than one Consultant (on different dates).
Moving on to 3NF with steps 8 and 9 we look for any dependencies that are not direct
on the keys.
In the entity Patient we have Patient GP Name and Patient GP Address dependent on
Patient GP Number and so not directly dependent on the key. We separate out a new
entity, GP, (as we did previously).
Entity    Data Items
Patient   Patient NHS Number, Patient Name, Patient GP Number
GP        GP Number, GP Name, GP Address
In the entity Appointment we have Hospital Phone Number dependent on Hospital
Name and so only indirectly dependent on the key. Similarly, Consultant Phone
Number is dependent on Consultant Name and only indirectly on the key. We must
therefore create two new entities, Hospital and Consultant, as shown below. Hospital
Name and Consultant Name are retained in Appointment as foreign keys to represent
the many-to-one relationships.
Entity        Data Items
Hospital      Hospital Name, Hospital Phone Number
Consultant    Consultant Name, Consultant Phone Number
Appointment   Clinic Date, Patient NHS Number, Appointment Time, Hospital Name,
              Consultant Name
It would appear that all these entities are in 3NF but there is a difficulty with
Appointment. There is in fact another key, namely (Clinic Date, Consultant Name,
Appointment Time). We could easily miss this other candidate key. When we looked
at the problem the first time the ‘obvious’ key was just this one. Now the ‘obvious’
key is (Clinic Date, Patient NHS Number).
3NF in its strict form requires that there is no indirect dependence of any item on any
key (primary or candidate). But Hospital Name is uniquely determined by the
candidate key (in fact by only a part of it, namely Clinic Date and Consultant Name), so
we must add another entity containing this dependency and remove Hospital Name
from Appointment. We can show typical data before the separation.
APPOINTMENT
Clinic Date  Patient NHS Number  Appointment Time  Hospital Name  Consultant Name
24/6/1999    82-4561F            1400              St George’s    Mr Rees
15/8/1999    82-4561F            1530              St George’s    Mr Salmon
15/8/1999    73-4485G            1545              St George’s    Mr Salmon
17/8/1999    55-7108D            1400              Central        Mr Salmon
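The sample rows above can be checked mechanically. The Python sketch below (helper names hypothetical) confirms that both candidate keys are unique in these rows, that (Clinic Date, Consultant Name) determines Hospital Name, and that one such fact is stored more than once. Sample data of this kind can only be consistent with a key or a dependency; it cannot prove one.

```python
from collections import Counter

COLUMNS = ("Clinic Date", "Patient NHS Number", "Appointment Time",
           "Hospital Name", "Consultant Name")
ROWS = [
    ("24/6/1999", "82-4561F", "1400", "St George's", "Mr Rees"),
    ("15/8/1999", "82-4561F", "1530", "St George's", "Mr Salmon"),
    ("15/8/1999", "73-4485G", "1545", "St George's", "Mr Salmon"),
    ("17/8/1999", "55-7108D", "1400", "Central", "Mr Salmon"),
]


def values(row, cols):
    """The values of the named columns in one row, as a tuple."""
    return tuple(row[COLUMNS.index(c)] for c in cols)


def unique_on(cols):
    """True if no two sample rows share the same values for cols."""
    seen = [values(r, cols) for r in ROWS]
    return len(set(seen)) == len(seen)


def fd_consistent(lhs, rhs):
    """True if no two sample rows agree on lhs but differ on rhs."""
    mapping = {}
    for r in ROWS:
        if mapping.setdefault(values(r, lhs), values(r, rhs)) != values(r, rhs):
            return False
    return True


def repeated_facts(lhs, rhs):
    """Facts lhs -> rhs stored in more than one row (a risk when updating)."""
    counts = Counter((values(r, lhs), values(r, rhs)) for r in ROWS)
    return {fact: n for fact, n in counts.items() if n > 1}


session = ("Clinic Date", "Consultant Name")
print(unique_on(("Clinic Date", "Patient NHS Number")))                   # True
print(unique_on(("Clinic Date", "Consultant Name", "Appointment Time")))  # True
print(fd_consistent(session, ("Hospital Name",)))                         # True
print(repeated_facts(session, ("Hospital Name",)))  # the 15/8/1999 Mr Salmon fact, twice
```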
The same kinds of difficulty as before are still present.
(1) There is a problem with updating. For example, if Mr Salmon’s clinic on
15/8/1999 changes its venue to Central Hospital, we must update every place where this
occurs.
(2) There is a problem with losing information through deletion. Suppose Mr
Salmon’s 1400 hrs appointment on 17/8/1999 is cancelled. We want to delete that row
from the table, but that would remove the only place where the fact that Mr Salmon has
a clinic at Central Hospital on that day is stored.
(3) There is a problem with adding information. If we wish to add the fact that Mr
Rees has a clinic at Heath Hospital on 22/8/1999 we need to add another row, but we
have no value for the item Patient NHS Number for this new row, and Patient NHS
Number is part of the key so cannot be left blank.
After separating out the dependency of Hospital Name on (Clinic Date, Consultant Name) into a new entity, Consultant Session, we have the two entities:
Entity               Data Items
Consultant Session   Clinic Date, Consultant Name, Hospital Name
Appointment          Clinic Date, Consultant Name, Appointment Time, Patient NHS Number
You should check that all three difficulties are indeed taken care of correctly in the
new entity Consultant Session.
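The check suggested above can also be done in code. This Python sketch (all names hypothetical) splits the sample APPOINTMENT rows into Consultant Session and Appointment and then replays the three difficulties: the venue change becomes a single update, cancelling the 17/8/1999 appointment leaves the clinic fact in place, and Mr Rees’s clinic at Heath Hospital on 22/8/1999 can be recorded with no patient.

```python
appointments_before = [
    {"Clinic Date": "24/6/1999", "Patient NHS Number": "82-4561F",
     "Appointment Time": "1400", "Hospital Name": "St George's", "Consultant Name": "Mr Rees"},
    {"Clinic Date": "15/8/1999", "Patient NHS Number": "82-4561F",
     "Appointment Time": "1530", "Hospital Name": "St George's", "Consultant Name": "Mr Salmon"},
    {"Clinic Date": "15/8/1999", "Patient NHS Number": "73-4485G",
     "Appointment Time": "1545", "Hospital Name": "St George's", "Consultant Name": "Mr Salmon"},
    {"Clinic Date": "17/8/1999", "Patient NHS Number": "55-7108D",
     "Appointment Time": "1400", "Hospital Name": "Central", "Consultant Name": "Mr Salmon"},
]

# Consultant Session: key (Clinic Date, Consultant Name) -> Hospital Name.
consultant_session = {}
for row in appointments_before:
    key = (row["Clinic Date"], row["Consultant Name"])
    if consultant_session.setdefault(key, row["Hospital Name"]) != row["Hospital Name"]:
        raise ValueError(f"sample data contradicts the dependency for {key}")

# Appointment keeps (Clinic Date, Consultant Name) as a foreign key, not the hospital.
appointment = [{k: v for k, v in row.items() if k != "Hospital Name"}
               for row in appointments_before]

# (1) Update: the venue change touches exactly one row.
consultant_session[("15/8/1999", "Mr Salmon")] = "Central"

# (2) Delete: cancelling the 17/8/1999 appointment keeps the clinic fact.
appointment = [a for a in appointment
               if (a["Clinic Date"], a["Consultant Name"], a["Appointment Time"])
               != ("17/8/1999", "Mr Salmon", "1400")]

# (3) Insert: a clinic can be recorded before any patient is booked.
consultant_session[("22/8/1999", "Mr Rees")] = "Heath"

print(len(consultant_session), len(appointment))       # 4 3
print(consultant_session[("17/8/1999", "Mr Salmon")])  # Central
```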
We have now reached exactly the same set of entities in 3NF as we had before.
Clearly the identification of all the candidate keys is extremely important and any
omission can lead to incomplete normalization.
We can add a final comment on this example. If a manual system used both forms,
it would have difficulty maintaining the integrity of the data, as there would be
considerable duplication between forms of the two kinds as well as duplication within
forms of the same kind.
Normalization of data sources is designed to avoid all such problems. It was rarely
used on manual systems because of the excessive amount of cross-referencing that
would be needed. This cross-referencing is not a problem in computer-based systems
(though it can lead to inefficiencies of processing).
Other normal forms
Are there any other normal forms after third normal form? Yes: the best known are
Boyce-Codd normal form and fourth and fifth normal forms. However, these normal
forms are rarely needed and are not part of this unit. If you wish to explore further,
they are discussed in most standard textbooks on databases.
Summary of outcome 1 (normalization)
An entity is a named group of data items that are properties of that entity. An entity
instance is one particular set of values of those items. When represented in a
computer the entity instance is often called a record and the items fields, and the
whole collection of instances is called a file. In the relational model of data the
preferred terms are table (for file), row (for record) and column (for field). In the
theory of relational databases the terms are relation (for table), tuple (for row) and
attribute (for column).
Every entity has at least one key. A key is one or more data items whose combined
values are different for every possible instance of the entity.
In UNF all the data items in the data source are listed, repeating items or groups of
items are shown and the key or keys are identified.
In 1NF there must be no repeating items or groups of items. An entity with repeating
items or groups must be split into two or more entities. Each repeating group will
need its own entity.
When one entity is split into two or more entities there will be relationships between
the entities. Usually the relationships will be many-to-one, but they may be one-to-one or many-to-many.
In the cases of many-to-one and one-to-one the relationships are represented by
putting foreign keys into one of the entities. For a many-to-one relationship the
foreign key is added to the ‘many’ end. For a one-to-one relationship it may be added
to either entity but if one of the entities is such that not every instance has to
participate in the relationship it is better to put the foreign key in that entity. This will
reduce the use of null (empty) values in the foreign key item.
In the case of a many-to-many relationship it is necessary to add a new link entity and
decompose the many-to-many relationship into two many-to-one relationships. The
new entity will have a foreign key for each of the other two entities. In addition it will
have data items that are attributes of the relationship rather than of either of the two
entities.
An item is partially dependent on a key if it is functionally dependent on an item or
items that are in the key but do not make up the whole key. Partial dependencies
cannot exist for single-item keys.
In 2NF there must be no items in an entity that are partially dependent on the key(s).
To reach 2NF any partially dependent items are removed to a new entity, along with
(copies of) the parts of the key they are dependent on. As before, relationships will
need to be established between the entities formed.
An item is indirectly dependent on a key if it is functionally dependent on an item or
items that are themselves not in the key but are functionally dependent on the key.
In 3NF there must be no items in an entity that are indirectly dependent on the key(s).
To reach 3NF any indirectly dependent items are removed to a new entity along with
(copies of) the items they are directly dependent on. As before, relationships will
need to be established between the entities formed.
It is important to identify all the keys of an entity. These are called candidate keys.
The one chosen is sometimes called the primary key.
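The definitions of partial and indirect dependency can be turned into a simple test. The Python sketch below is a simplification under stated assumptions: it takes one chosen key at a time and a hand-written list of functional dependencies (it does not discover them), and the function name classify_dependencies is hypothetical. Because every candidate key matters for 3NF, the test would be repeated for each candidate key.

```python
def classify_dependencies(key, fds):
    """Classify non-key items by how they depend on a chosen key.

    key -- the items making up the key
    fds -- list of (determinant items, dependent items) pairs
    Returns {item: 'direct' | 'partial' | 'indirect'}.
    """
    key = frozenset(key)
    kinds = {}
    for determinant, dependents in fds:
        determinant = frozenset(determinant)
        for item in dependents:
            if item in key:
                continue  # only non-key items are checked here
            if determinant >= key:
                kinds[item] = "direct"
            elif determinant <= key:
                kinds[item] = "partial"    # part of the key only: breaks 2NF
            else:
                kinds[item] = "indirect"   # via a non-key item: breaks 3NF
    return kinds


# Appointment as first found in the patient-based analysis.
appointment_key = {"Clinic Date", "Patient NHS Number"}
appointment_fds = [
    ({"Clinic Date", "Patient NHS Number"},
     {"Appointment Time", "Hospital Name", "Consultant Name"}),
    ({"Hospital Name"}, {"Hospital Phone Number"}),
    ({"Consultant Name"}, {"Consultant Phone Number"}),
]
print(classify_dependencies(appointment_key, appointment_fds))

# A hypothetical un-normalized enrolment with key (Student Number, Course Number).
enrolment_fds = [
    ({"Student Number"}, {"Student Name"}),
    ({"Course Number"}, {"Course Title"}),
    ({"Student Number", "Course Number"}, {"Grade Awarded"}),
]
print(classify_dependencies({"Student Number", "Course Number"}, enrolment_fds))
```

The first call reports the two phone numbers as indirect; the second reports Student Name and Course Title as partial. If an item is listed under more than one dependency, the last one listed decides its label in this sketch.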
OUTCOME 2 – COMPLETING THE DATA MODEL
Key Terms
data dictionary
entity life history
event
entity/event matrix
Introduction
This outcome covers several ways of extending the data model produced by
normalization. So far the data items have only been named and grouped together into
normalized entities. More information about the items is put into the data dictionary.
Another significant aspect of the data that has not been looked at is what processing is
performed on the data. The events that cause processing of the data will involve one
or more of the entities, creating, modifying or deleting instances. This
correspondence between events and entities is put into an entity/event matrix.
Finally the sequence in which events can happen is incorporated into entity life
histories.
Classification of constraints
Any data model needs to be able to represent (or capture) the constraints on the data
it models. We can group together constraints of similar kinds into a classification.
Constraint class        Description and example
Domain                  Restrictions on the values allowed in a data item.
                        Example: data item ‘Tax date’ holds only valid dates.
Key                     One or more data items that together identify each record uniquely.
                        Example: data items ‘Clinic Date’ and ‘Consultant Name’ together form
                        a key of the Consultant Session group of data items.
Foreign Key             A data item that has values that exist as the key values in another group.
                        Example: ‘Consultant Name’ in group Consultant Session must be a value
                        of ‘Consultant Name’ in the Consultant group.
Functional Dependency   There cannot be more than one value of a functionally determined data
                        item for each value of the determining item(s).
                        Example: ‘Consultant Name’ functionally determines ‘Consultant Phone’.
Other                   Includes multi-valued dependencies and arbitrary constraints.
                        Example: the maximum number of appointments at any single session is 20.
This unit requires you to specify the constraints in the first three rows of this table.
These constraints will appear in the data dictionary. The fourth kind, functional
dependency, is involved in determining the keys and in the process of normalization,
but you are not required to explicitly list these constraints.
The last kind is ‘all the rest’ and it is not always possible to fit these constraints into
the relational data model. That means that they would have to be built into the
processing of the operations by some kind of programming.
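To see where each class of constraint might end up in an implementation, the Python/sqlite3 sketch below declares part of the hospital model. The table and column names are hypothetical, dates are stored as ISO text (YYYY-MM-DD) so that SQLite's date() function can check them, and the ‘other’ constraint (at most 20 appointments per session) is enforced by a trigger, which is one example of the programming mentioned above. This is one possible mapping, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Consultant (
    ConsName  TEXT PRIMARY KEY,                                      -- key
    ConsPhone TEXT NOT NULL
);
CREATE TABLE ConsultantSession (
    ClinicDate TEXT NOT NULL CHECK (date(ClinicDate) IS NOT NULL),   -- domain: valid dates
    ConsName   TEXT NOT NULL REFERENCES Consultant(ConsName),        -- foreign key
    PRIMARY KEY (ClinicDate, ConsName)                               -- key
);
CREATE TABLE Appointment (
    ClinicDate TEXT NOT NULL,
    ConsName   TEXT NOT NULL,
    ApptTime   TEXT NOT NULL,
    PRIMARY KEY (ClinicDate, ConsName, ApptTime),
    FOREIGN KEY (ClinicDate, ConsName)
        REFERENCES ConsultantSession(ClinicDate, ConsName)
);
-- 'Other' constraint: at most 20 appointments in any single session.
CREATE TRIGGER max_20_appointments
BEFORE INSERT ON Appointment
WHEN (SELECT COUNT(*) FROM Appointment
      WHERE ClinicDate = NEW.ClinicDate AND ConsName = NEW.ConsName) >= 20
BEGIN
    SELECT RAISE(ABORT, 'session is full');
END;
""")


def attempt(sql, params):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False


ok_consultant = attempt("INSERT INTO Consultant VALUES (?, ?)", ("Mr Salmon", "309651"))
ok_session = attempt("INSERT INTO ConsultantSession VALUES (?, ?)", ("1999-08-15", "Mr Salmon"))
bad_date = attempt("INSERT INTO ConsultantSession VALUES (?, ?)", ("15th August", "Mr Salmon"))
bad_fk = attempt("INSERT INTO ConsultantSession VALUES (?, ?)", ("1999-08-17", "Mr Nobody"))

# 21 distinct quarter-hour slots from 0900; only the first 20 should be accepted.
times = [f"{9 + i // 4:02d}{(i % 4) * 15:02d}" for i in range(21)]
booked = [attempt("INSERT INTO Appointment VALUES (?, ?, ?)",
                  ("1999-08-15", "Mr Salmon", t)) for t in times]

print(ok_consultant, ok_session, bad_date, bad_fk)  # True True False False
print(booked.count(True), booked[-1])               # 20 False
```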
It is important to realize that the constraints are part of the application. The data
analyst must find out from the customer specifying the system requirements what
these constraints are. Two similar applications may have different constraints.
For example, one organization may have the rule that any given manager may be in
charge of only one department, but another organization may allow one manager to be
in charge of several departments. To make it harder to discover, this second
organization may not have any such cases at the moment, so even the example data
will not show up this difference in the rule.
Although inspecting example data can suggest constraints, it can never be used to
show the existence of a constraint. The most we can say is that the example data are
consistent with a certain constraint or that they do not comply with a constraint (and
so act as a counter-example).
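The point about example data can be made concrete. The Python sketch below (the function name and the manager/department rows are hypothetical) searches for counter-examples to a functional dependency. An empty result only means the data is consistent with the rule; a single clash is enough to disprove it.

```python
def counter_examples(rows, determinant, dependent):
    """Return pairs of rows that agree on `determinant` but differ on `dependent`."""
    first_seen = {}  # determinant values -> (first row seen, its dependent values)
    clashes = []
    for row in rows:
        lhs = tuple(row[c] for c in determinant)
        rhs = tuple(row[c] for c in dependent)
        if lhs in first_seen and first_seen[lhs][1] != rhs:
            clashes.append((first_seen[lhs][0], row))
        else:
            first_seen.setdefault(lhs, (row, rhs))
    return clashes


# Rule under test: each manager is in charge of only one department.
today = [
    {"Manager": "Ms Khan", "Department": "Sales"},
    {"Manager": "Mr Lee", "Department": "Accounts"},
]
print(counter_examples(today, ["Manager"], ["Department"]))  # [] (consistent, not proved)

later = today + [{"Manager": "Ms Khan", "Department": "Marketing"}]
print(len(counter_examples(later, ["Manager"], ["Department"])))  # 1 (rule disproved)
```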
Making sure that the actual data satisfies the constraints at all times is sometimes
termed maintaining the integrity of the data. So, for example, foreign key constraints
are called referential integrity constraints.
The Data Dictionary (PC 2a, 2b)
Many of the constraints are incorporated into the data dictionary. In addition, the data
dictionary is the place where standardization of names takes place and any synonyms
are identified. Since many names would be rather long it is usual to use abbreviations
but there is a danger of having several different versions of an abbreviation (e.g.
‘number’ may be abbreviated no., num, nmr, nr, nbr, or even #). So the data
dictionary is the place where the chosen form of abbreviation is defined.
One common convention is to start data item names with an abbreviated form of the
group or entity of which they are part. Another useful convention is to separate the
parts of a name with the underscore character. Whatever abbreviations are used as
part of the item names it is important that there is an unambiguous description of each
item in the dictionary.
For each item we also indicate the type of value that it is drawn from (often called its
domain). The table shows some common types. There is no single standard for how
the types are designated.
Type                         Designations      Comments
Text                         A, Alpha, Text    Usually a maximum length is stated, as in A(20).
                                               Validations may be given in the form of pictures
                                               like A9999 (meaning a letter followed by 4 digits).
Whole number                 Number, Integer   A common validation is a range, often shown as in
                                               0..100.
Number with fractional part  Number, Float     Not always distinguished from whole numbers. An
                                               important special case is currency.
Date                         Date              May include a time component.
Time                         Time              Time only, without date.
Boolean                      Boolean, Logical  Two values: true/false, yes/no.
At this stage in the design the format or layout of the values is of no concern, so, for
example, a date does not need to be specified as dd/mmm/yyyy or the like.
There is sometimes a choice between text and number. A field may be required to be
all digits so that it could be represented by either text or number. If no arithmetic is to
be carried out on the numbers then it is best to make the field text with a validation of
all digits. In particular this gets round the problem of leading zeros (e.g. 002954),
which are difficult to retain when type number is used.
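The Python sketch below illustrates both points. The helper picture_to_pattern is hypothetical and assumes, as in the A9999 example, that A stands for one letter, 9 for one digit, and any other character (such as the hyphen in the picture 99-9999A) must appear literally.

```python
import re


def picture_to_pattern(picture):
    """Translate a picture such as 'A9999' or '99-9999A' into a regular expression.
    Assumes: 'A' = one letter, '9' = one digit, anything else is literal."""
    parts = []
    for ch in picture:
        if ch == "A":
            parts.append("[A-Za-z]")
        elif ch == "9":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def matches_picture(value, picture):
    """True if the whole value fits the picture."""
    return re.fullmatch(picture_to_pattern(picture), value) is not None


print(matches_picture("82-4561F", "99-9999A"))  # True  (an NHS number from the forms)
print(matches_picture("824561F", "99-9999A"))   # False (hyphen missing)
print(matches_picture("002954", "999999"))      # True  (kept as text, zeros intact)

# Stored as a number instead, the leading zeros disappear:
print(str(int("002954")))                       # 2954
```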
All foreign keys should be noted under validation. The entry should refer to the key
item from which the values must be taken. Note that the type/size for a foreign key
should always be identical to that of the key to which it refers.
Some items may not have a value at all stages in the lifetime of the entity and these
are marked as not required. Finally, the key items will be shown, every item in the
key being shown with a ‘Y’ entry.
The data dictionary for our Hospital Appointment example is given below.
The items are arranged by entity but another useful arrangement is by alphabetical
order of item names. (In practice the data dictionary would be stored in a database so
that various orderings and searches would be available.) The entity called ‘Consultant
Session’ has been renamed as ‘Clinic’.
Item              Entity       Description                            Type/Size  Range/Validation      Req’d  Key
Clinic_Date       Clinic       Date clinic held                       Date                             Y      Y
Clinic_Cons_Name  Clinic       Name of consultant taking clinic       A(32)                            Y      Y
Clinic_Hosp_Name  Clinic       Name of hospital where clinic is held  A(32)      Existing Hosp_Name    Y      N
Hosp_Name         Hospital     Name of hospital                       A(32)                            Y      Y
Hosp_Phone_Nbr    Hospital     Phone number of hospital               A(12)                            Y      N
Cons_Name         Consultant   Name of consultant                     A(32)                            Y      Y
Cons_Phone_Nbr    Consultant   Phone number of consultant             A(12)                            Y      N
Appt_Date         Appointment  Date of appointment                    Date                             Y      Y
Appt_Cons_Name    Appointment  Name of consultant for appointment     A(32)      Existing Cons_Name    Y      Y
Appt_Time         Appointment  Time of appointment                    Time                             Y      Y
Appt_Pat_NHS_Nbr  Appointment  NHS number of patient attending        A(8)       Existing Pat_NHS_Nbr  N      N
Pat_NHS_Nbr       Patient      NHS number of patient                  A(8)       99-9999A              Y      Y
Pat_Name          Patient      Name of patient                        A(32)                            Y      N
Pat_GP_Nbr        Patient      Number of GP of patient                A(4)       Existing GP_Nbr       Y      N
GP_Nbr            GP           Number of GP                           A(4)       9999                  Y      Y
GP_Name           GP           Name of GP                             A(32)                            Y      N
GP_Addr           GP           Address of GP                          A(100)                           Y      N
(You should now try to produce the data dictionary for your exercise.)
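Because the data dictionary is itself data, it can be held in a program or database and used to generate table definitions. The Python sketch below is one hypothetical way to do this: the DictEntry record mirrors the dictionary's columns, the entries are the three GP items from the dictionary, and the mapping from A(n) to VARCHAR(n) is an assumption rather than part of the method. The validation column is carried along but not translated.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictEntry:
    item: str
    entity: str
    description: str
    type_size: str             # e.g. "A(32)", "Date", "Time"
    validation: Optional[str]  # picture, range or "Existing ..." reference
    required: bool
    key: bool


GP_ENTRIES = [
    DictEntry("GP_Nbr", "GP", "Number of GP", "A(4)", "9999", True, True),
    DictEntry("GP_Name", "GP", "Name of GP", "A(32)", None, True, False),
    DictEntry("GP_Addr", "GP", "Address of GP", "A(100)", None, True, False),
]


def sql_type(type_size):
    """Map the dictionary's type notation to an SQL type (assumed mapping)."""
    text = re.fullmatch(r"A\((\d+)\)", type_size)
    if text:
        return f"VARCHAR({text.group(1)})"
    return {"Date": "DATE", "Time": "TIME", "Integer": "INTEGER",
            "Float": "REAL", "Boolean": "BOOLEAN"}[type_size]


def create_table(entity, entries):
    """Build a CREATE TABLE statement from the entries for one entity."""
    own = [e for e in entries if e.entity == entity]
    lines = []
    for e in own:
        line = f"    {e.item} {sql_type(e.type_size)}"
        if e.required:
            line += " NOT NULL"
        lines.append(line)
    keys = ", ".join(e.item for e in own if e.key)
    lines.append(f"    PRIMARY KEY ({keys})")
    return f"CREATE TABLE {entity} (\n" + ",\n".join(lines) + "\n);"


print(create_table("GP", GP_ENTRIES))
```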
Identification of Events and Construction of Entity/Event Matrix (PC 2c, 2d)
The next task is to identify the events associated with each of the entities. These
events are then placed into an entity/event matrix. In practice it is easier to work
directly with the matrix since the list of events is simply the first column of the
matrix.
Four kinds of event occur.
Event type   Matrix entry   Comments
Create       C              An event that creates a new instance of an entity
Delete       D              An event that causes the removal of an instance of an entity
Modify       M              An event that changes the value of one or more items in an entity
Read         R              An event that uses the values of items in an entity but does not
                            change any of them
Each entity should have at least one event that creates an instance and one that deletes
an instance. Occasionally there will be no deletion event but in such cases there is
usually an archiving event that removes the entity instance to some form of archival
storage.
Most entities will also have events that can modify item values. Note that an event
should be shown as of type Modify when there is at least one item that can be
amended; other items may not be allowed to be amended by this or any other event.
The same event may affect more than one entity. For example it may create one
entity, modify another, and read from a third. Also, more than one event may affect
the same entity in the same way. For example, two different events may cause an
instance of an entity to be created.
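The rule that every entity needs a creating event and a deleting event is easy to check once the matrix is held as data. The Python sketch below uses a small, hypothetical fragment of a matrix rather than the full one; an archiving event would be entered as D here, which is an assumption made for the sketch.

```python
# Hypothetical fragment of an entity/event matrix: event -> {entity: entry}.
matrix = {
    "Add GP": {"GP": "C"},
    "Modify GP details": {"GP": "M"},
    "Delete GP": {"GP": "D"},
    "Add patient": {"Pat": "C", "GP": "R"},
    "Modify patient details": {"Pat": "M"},
}


def missing_lifecycle_events(matrix):
    """For each entity, list which of C (create) and D (delete) never appear."""
    for row in matrix.values():
        for entry in row.values():
            if entry not in {"C", "D", "M", "R"}:
                raise ValueError(f"unknown matrix entry {entry!r}")
    entities = sorted({entity for row in matrix.values() for entity in row})
    problems = {}
    for entity in entities:
        entries = {row[entity] for row in matrix.values() if entity in row}
        missing = [kind for kind in ("C", "D") if kind not in entries]
        if missing:
            problems[entity] = missing
    return problems


print(missing_lifecycle_events(matrix))  # {'Pat': ['D']}
matrix["Delete patient"] = {"Pat": "D"}
print(missing_lifecycle_events(matrix))  # {}
```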
The Entity/Event matrix for the hospital appointments example follows.
Event
Add hospital
Add consultant
Add clinic
Add GP
Add patient
Make appointment
Change time of appointment
Modify hospital details
Modify consultant details
Modify patient details
Modify GP details
Archive clinic details
Cancel clinic
Cancel appointment
Delete hospital
Delete consultant
Delete patient
Delete GP
Report of appointments by clinic
Report on appointments by patient
Hosp
C
R
R
R
Cons
Clinic
C
R
C
R
R
R
R
Appt
Pat
C
M
C
R
R
GP
C
R
R
M
M
M
M
D
R
R
R
R
D
D
D
R
R
R
D
D
D
R
R
R
R
R
R
R
R
R
R
R
D
R
R
R
D
R
R
(You should now try to derive the entity/event matrix for your exercise.)
Entity Life Histories (PC 2e)
The various events that affect any given entity are collected together and an entity life
history is constructed. This is usually shown in the form of a diagram which is a
hierarchy showing the order in which events can occur.
At the top of the diagram is a rectangular box representing the entity. Below it are
drawn rectangles representing the events that create, modify or delete instances of the
entity (read-only events are not of interest here). The boxes are connected by lines to
show the hierarchy and the sequence in time is from left to right in any group
connected to the same box.
If an event can be repeated (zero or more times), it is shown with an asterisk (*) in the
upper right corner. If an event is an alternative, it is shown with a circle (o) in the
corner. Note that in the case of alternatives exactly one of the alternatives must be
selected; it is not allowed for none to be selected, or for more than one.
Figures aa1 to aa4 show the different possibilities. Fig. aa1 shows two different styles
of connecting the boxes. In Fig. aa3 note the use of the null box as an alternative to
give the effect of an optional selection.
There are two guidelines that make the diagrams easier to understand. First, all the
boxes directly below a given box should be of the same kind (so, for example, we do
not put an iteration as part of a sequence). Second, complex sets of events are grouped
together under named boxes (internal nodes) that are not themselves events, so events
only appear at the extremities of the diagram, with no other boxes below them. Figs
bb1 and bb2 show the example from the Arrangements in its original form and then
following these guidelines.
As an example of constructing an entity life history consider the GP entity. From the
entity/event matrix we find that there are just three events: New GP, Modify GP
Details, Delete GP.
The creation event is put first (on the left).
The modifications can occur zero or more times over the lifetime of the GP instance,
so a repetition is needed. This breaks the first guideline in that the events under GP
would be a mixture, so we introduce an internal node. As this node covers the whole
life of the instance between creation and deletion a good name for it is ‘GP Life’. The
repeated event is placed below this node.
The diagram is shown in fig xx1 along with a textual representation of the hierarchy
using indentation. The comments indicate the kind of box (entity, internal node,
event).
This life history does not use a selection of alternative events, but suppose that we
wish to separate the single event ‘Modify GP Details’ into two events, ‘Change
GP Name’ and ‘Change GP Address’. These two events could then be alternatives, so
we add two boxes below ‘Modify GP Details’, each with a circle to show that they are
alternatives. This is shown in fig xx2.
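A life history diagram describes a set of allowed event sequences, much as a regular expression does: a sequence of boxes becomes concatenation, an iteration (*) becomes repetition, and a selection (o) becomes a choice. The Python sketch below applies this correspondence to the GP life history with the two alternative modification events; the one-letter event codes are hypothetical.

```python
import re

# Hypothetical one-letter codes for the GP events.
CODES = {"New GP": "N", "Change GP Name": "M", "Change GP Address": "A", "Delete GP": "D"}

# GP                = New GP, GP Life, Delete GP           (sequence  -> concatenation)
# GP Life           = Modify GP Details*                   (iteration -> *)
# Modify GP Details = Change GP Name o | Change GP Address o  (selection -> |)
GP_LIFE_HISTORY = re.compile(r"N(?:M|A)*D")


def is_allowed(events):
    """True if the list of event names is a sequence the life history permits."""
    if any(e not in CODES for e in events):
        return False  # an event that is not part of this life history
    coded = "".join(CODES[e] for e in events)
    return GP_LIFE_HISTORY.fullmatch(coded) is not None


print(is_allowed(["New GP", "Delete GP"]))                                         # True
print(is_allowed(["New GP", "Change GP Address", "Change GP Name", "Delete GP"]))  # True
print(is_allowed(["New GP", "New GP", "Delete GP"]))                               # False
print(is_allowed(["Change GP Address", "Delete GP"]))                              # False
```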
Some possible sequences of events that are allowed by this life history are shown
below. Note that internal nodes do not appear since they are not events.
Some possible sequences of events for the entity GP:
(1) New GP, Delete GP
(2) New GP, Change GP Address, Change GP Address, Change GP Address,
    Change GP Name, Delete GP
(3) New GP, Change GP Address, Delete GP
Many sequences of events are not allowed. None of the following sequences can be
formed from the given life history.
(You should be able to say why each sequence is illegal.)
New GP
New GP
Delete GP
Some sequences of events that are not allowed for the entity GP
Change GP Address
Delete GP
Change GP Address
New GP
Delete GP
Change GP Address
Delete GP
Figures xx3 to xx5 show further life histories of entities in the running example.
Figures xx6a and xx6b show two versions of the life history for Appointment. In the
second version the events that terminate the life of the entity instance are classified in
more detail. Both versions are correct and professionals might disagree over which
they prefer. This is rather like programming when there may be several correct
solutions to the same problem.
(You should construct entity life histories for your exercise now.)
Summary of outcome 2 (the fuller data model)
Detail about each data item is put into a data dictionary. The detail consists of the
following items.
Property of the data item   Explanation
Item name                   Different for every item; consistent abbreviations can be used
Entity                      Name of the entity the item is part of
Description                 This should expand any abbreviations used in the name
Type/Size                   The type of the underlying domain, with size shown for text
Range/Validation            Any further constraints on the domain, especially for foreign keys
Required (Req’d)            Y if the data item must always have a value
Key                         Y if this item is (part of) the key
Comment
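A data dictionary entry can also drive simple validation of values. This is a hedged sketch that assumes the type codes used in the data dictionaries later in this pack: A(n) for text of at most n characters, and Number. The hypothetical record DataItem holds the properties listed above. The hypothetical function check_value tests a proposed value against Type/Size and the Req’d and Key flags only. Range/Validation is kept as free text and is not checked.

```python
import re
from dataclasses import dataclass


@dataclass
class DataItem:
    """One row of a data dictionary; the fields follow the properties listed above."""
    item_name: str
    entity: str
    description: str
    type_size: str              # "A(n)" = text of at most n characters, or "Number"
    required: bool
    key: bool
    range_validation: str = ""  # kept as text; not checked in this sketch
    comment: str = ""


def check_value(item, value):
    """Return a list of problems found when `value` is stored in `item` (empty if none)."""
    if value is None or value == "":
        # Key items must always have a value, as must any item marked Required.
        if item.required or item.key:
            return [f"{item.item_name} must have a value"]
        return []
    text = re.fullmatch(r"A\((\d+)\)", item.type_size)
    if text and len(str(value)) > int(text.group(1)):
        return [f"{item.item_name} is longer than {text.group(1)} characters"]
    if item.type_size == "Number" and not isinstance(value, (int, float)):
        return [f"{item.item_name} must be a number"]
    return []


# A hypothetical entry for illustration.
name = DataItem("Emp_Name", "Employee", "Full name of employee", "A(40)", True, False)
print(check_value(name, ""))  # ['Emp_Name must have a value']
```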
The various events associated with the entities in the data source are identified, listed and placed in an entity/event matrix. Every entity that is involved in a particular event has an entry in the matrix. The entry shows whether the event's effect is to create (C), modify (M), delete (D) or merely read (R) the entity instance. Every entity must have at least one creation event and one deletion event.
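Once the matrix is written down, it is easy to check that every entity has a creation event and a deletion event. This is a minimal sketch. It assumes the matrix is held as a Python dictionary that maps each event name to the effect (C, M, D or R) it has on each entity involved. The small matrix here is made up from the running example. The Hospital deletion event is deliberately left out so that the check has something to report.

```python
# Entity/event matrix: event -> {entity: effect}, effect being C, M, D or R.
# A small made-up extract from the running example; Hospital deliberately has
# no deletion event so that the check below has something to report.
MATRIX = {
    "New GP":                  {"GP": "C"},
    "Change GP Address":       {"GP": "M"},
    "Delete GP":               {"GP": "D"},
    "Add Hospital":            {"Hospital": "C"},
    "Modify Hospital Details": {"Hospital": "M"},
}


def missing_create_or_delete(matrix):
    """Map each entity lacking a creation (C) or deletion (D) event to what it lacks."""
    effects_seen = {}
    for effects in matrix.values():
        for entity, effect in effects.items():
            effects_seen.setdefault(entity, set()).add(effect)
    return {
        entity: sorted({"C", "D"} - seen)
        for entity, seen in effects_seen.items()
        if not {"C", "D"} <= seen
    }


print(missing_create_or_delete(MATRIX))  # {'Hospital': ['D']}
```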
The final step in the analysis is the construction of entity life histories to show the
order in time of the different events in the lifetime of an entity instance. The life
history is shown in a diagram that is a hierarchy of boxes. Each entity is placed at the
top of its hierarchy by putting the name of the entity in the topmost box. Below it
come first the creation events and last the deletion events. In between come any
modification events.
Time runs from left to right on the diagram. Events that can be repeated are marked with an asterisk. Events that are alternatives, of which one is selected, are marked with a circle. The null event has no name and may be used as an alternative in a selection.
Complex events can be decomposed into several events at a lower level in the
hierarchy. Internal nodes (boxes that have at least one box below them in the
hierarchy) will not appear in the event list.
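The box hierarchy maps naturally onto a tree. In this sketch, the class Box and both functions are hypothetical names. Each box records three things: its name; its mark (none for a box in a sequence, an asterisk for repetition, a circle written as o for an alternative); and the boxes below it. The function events lists the leaf boxes, which are the events, so internal nodes are left out as described above. The function mixed_nodes checks a convention used in fig bb2 and in the answer to question 11: the boxes directly below any one box should all be of the same kind.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """One box of an entity life history diagram."""
    name: str                      # "" for the null box
    mark: str = ""                 # "" plain (sequence), "*" repeated, "o" alternative
    children: list = field(default_factory=list)


def events(box):
    """Leaf boxes in left-to-right order: the events. Internal nodes are skipped."""
    if not box.children:
        return [box.name] if box.name else []   # the null event has no name
    found = []
    for child in box.children:
        found.extend(events(child))
    return found


def mixed_nodes(box):
    """Names of boxes whose children are not all of the same kind."""
    bad = [box.name] if len({c.mark for c in box.children}) > 1 else []
    for child in box.children:
        bad.extend(mixed_nodes(child))
    return bad


# Fig xx2: the GP life history.
gp = Box("GP", children=[
    Box("Add GP"),
    Box("GP Life", children=[
        Box("Modify GP Details", "*", children=[
            Box("Modify GP Name", "o"),
            Box("Modify GP Address", "o"),
        ]),
    ]),
    Box("Delete GP"),
])

print(events(gp))       # ['Add GP', 'Modify GP Name', 'Modify GP Address', 'Delete GP']
print(mixed_nodes(gp))  # []
```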
EXERCISES
These exercises give examples of data sources that can be analyzed using the methods
contained in this unit.
1.
A literary agency keeps records of books that have been published in which it
has an interest. The data is kept on forms like the one shown below. There is one
form for each different ISBN (since paperback and hardback editions of the same
book have different ISBNs). Some of the authors belong to a writers’ club run by the
agency.
BOOK ENTRY
Catalogue Details
    ISBN: 1-3452-7294X
    Title: Fishing for All
    Edition: 2
    Description: Pbk 280pp illus
Author Details
    Name            Phone Number    Club Membership No. (if any)    Year Joined Club
    Martin Banks    03838 612345    517                             1993
    Rodney Line     02028 621378
Publisher Details
    Name: Henry Hutters
    Address: Lamb Place, Dundee
    Postcode: DD1 8RX
Figure er7.
[Diagram: ER diagram with entities Book (ISBN, Title, Edition, Description), Publisher (Publisher Name, Publisher Address, Publisher Postcode), Authorship, Author (Author Name, Author Phone No) and Club (Author Club Memb No, Author Year Joined). Book is linked to Publisher and to Authorship, Authorship is linked to Author by Perform, and Author is linked to Club by Member.]
2.
A collection of music CDs has a filing system that consists of a form for each
CD, an example being shown below. Each CD has a unique number allocated to it.
The details of the tracks are shown on the form. The CDs can be borrowed and the
details of the person (if any) who currently has the CD are also shown. (Former
borrowers are shown crossed through.)
CD
    CD number: 401
    Category: Classical
    Title: Chamber Music Collection vol. 4
Track Details
    Track number   Title                                               Performers / Artists               Length
    1-5            Schubert Piano Quintet in A, D667 (The Trout)       Jeorg Demus, Schubert Quartet      30:05
    6-9            Mozart String Quartet in B flat, K458 (The Hunt)    Amadeus Quartet                    20:45
    10-12          Beethoven Piano Trio in D, Opus 70/1 (The Ghost)    Kempff – Szeryng – Fournier Trio   25:20
Borrower Details
    Name       Address                    Phone Number    Date Borrowed
    Jo Lee     3 Railway Lane, Miltown    0326 721456     16/11/98
    Anil Rae   24 Main Street, Miltown    0326 742509     4/3/99
Final ER Diagram for Exercise 1.
[Diagram: entity CD (CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone No, Date Borrowed) and entity Track (Track Number, Track Title, Track Length, Track Performers).]
Fig xx1
GP {entity}
    Add GP {event}
    GP Life {internal node summarising events below it}
        * Modify GP Details {repeated event}
    Delete GP {event}

Fig xx2
GP {entity}
    Add GP {event}
    GP Life {internal node summarising events below it}
        * Modify GP Details {repeated internal node}
            ° Modify GP Name {alternative event}
            ° Modify GP Address {alternative event}
    Delete GP {event}
Fig xx3
Hospital
    Add Hospital
    Hospital Life
        * Modify Hospital Details
    Delete Hospital

Fig xx4
Consultant
    Add Consultant
    Consultant Life
        * Modify Consultant Details
    Delete Consultant
Fig xx5
Clinic
    Add Clinic
    Remove Clinic
        ° Archive Clinic Details
        ° Cancel Clinic
Fig xx6a
Appointment
    Make Appointment
    Appointment Life
        * Change Time of Appointment
    Remove Appointment
        ° Archive Clinic Details
        ° Cancel Clinic
        ° Cancel Appointment
Fig xx6b
Appointment
    Make Appointment
    Appointment Life
        * Change Time of Appointment
    Remove Appointment
        ° Archive Clinic Details
        ° Cancellation
            ° Cancel Clinic
            ° Cancel Appointment
Fig aa1
Sequence (showing two different ways of drawing the connections in the hierarchy).
Event X consists of Event A followed by Event B followed by Event C.
Event X
    Event A
    Event B
    Event C

Fig aa2
Selection of alternatives.
Event X consists of either Event A, or Event B, or Event C. (One must be selected.)
Event X
    ° Event A
    ° Event B
    ° Event C

Fig aa3
Selection of optional alternatives.
The null box (containing a dash in place of a name) indicates no action.
Event X consists of either Event A or Event B or nothing.
Event X
    ° Event A
    ° Event B
    ° ______

Fig aa4
Repetition (iteration)
Event X consists of zero or more repetitions of Event A.
Event X
    * Event A
Fig bb1
Example of ELH from the arrangements – original form.
Invoice
    New Invoice Created
    * Change of Invoice Detail
        ° Change to Existing Detail Line
        ° Detail Line Deleted
        ° New Detail Line Added
    Invoice Archived
        ° Invoice Paid Detail
        ° Invoice Cancelled
Fig bb2
Example from arrangements redrawn for greater clarity.
A new internal node ‘Invoice Life’ is added so that all the nodes below Invoice are of the same kind.
Invoice
    New Invoice Created
    Invoice Life
        * Change of Invoice Detail
            ° Change to Existing Detail Line
            ° Detail Line Deleted
            ° New Detail Line Added
    Invoice Archived
        ° Invoice Paid Detail
        ° Invoice Cancelled
QUESTIONS
These questions can be used to check your understanding of the material. The most
relevant performance criteria are shown after the question number.
1. PC 1e
An entity includes the data items x and y. If the data item x is a key, can (x, y) be a
key of the same entity?
2. PC 1a-d
The following group of items has a single repeating item, phone number. Assume
that clients may share the same phone number. Show how to move the repeating item
to another entity to produce 1NF. What is the key of the new entity? Is the new
entity in 1NF? 2NF? 3NF?
Entity: Client
Data Items: Client Contact Number, Client Name, Client Address, Client Post Code
Repeating item: Phone Number
3. PC 1c
Explain why a single-item key implies that the entity is at least in 2NF.
4. PC 1d
Explain why an all-key entity must be in 3NF. (An all-key entity is one where the key
consists of all the data items in the entity. They can occur when the underlying data
model has a many-to-many relationship.)
5. PC 1d, e
An entity E has four data items a, b, c, d and has two candidate keys (a, b) and c.
Also c functionally determines d. Is it in 3NF? If not, derive entities that are in 3NF.
What are the keys of these entities?
6. PC 1e
The following table shows some data for an entity concerning the poetical works of a
certain author. Why can the entity not have as its key the data item ‘Title’? What
could be a key in this example?
Why can we not be sure that this is a key?
Title                Year First Published   Volume in Collected Works   Number of Lines
Threnody             1856                   2                           116
Image of Man         1856                   2                           240
Roses                1849                   1                           35
To Winter            1860                   3                           86
Pleasure’s Repose    1888                   4                           463
My Heart’s Hope      1860                   4                           44
Roses                1888                   4                           52
Conquest of Care     1860                   4                           362
7. PC 2a, b
What is wrong with the following excerpt from a DD?
Item
Entity
Description
Emp_Nbr
Employee
Name
Employee
Address
Emp_Dept_No
Employee
Employee
Dept_Nbr
Department
Name
Dept_Mgr
Department
Department
Unique number for
employee
Full name of
employee
Address of employee
Number of
department in which
employee works
Unique number for
department
Name of department
Emp_Nbr of manager
of department
Type/
Size
A(6)
Range/
Validation
999999
A(40)
A(100)
A(4)
Number
A(32)
A(6)
Existing
Dept_Nbr
In 0..999
Existing
Emp_Nbr
Req’d
Key
Y
Y
Y
N
N
Y
N
N
N
Y
Y
Y
N
N
8. PC 2c
What is probably missing from this list of events for the entity Employee?
Events for Employee
Modify Employee Name
Modify Employee Address
Change Department of Employee
Dismiss Employee
Employee Resigns
9. PC 2e
Which of the following sequences are allowed and which are not allowed for the
Client ELH?
Sequence 1
Add Client
Delete Client
Sequence 2
Add Client
Change Client Name
Add Client
Change Client Address
Delete Client
Sequence 3
Add Client
Change Client Address
Change Client Address
Delete Client
Sequence 4
Delete Client
Add Client
Change Client Name
Delete Client
Client
    Add Client
    Client Life
        * Modify Client Details
            ° Change Client Name
            ° Change Client Address
    Delete Client
10. PC 2e
What is wrong with this ELH of Employee?
Employee
Employee Life
Delete Employee
*
Modify Employee
Details
Add Employee
Print List of
Employees
11. PC 2e
Improve the following ELH of Customer.
Customer
    Add Customer
    * Modify Customer Details
        ° Change Customer Phone Number
        ° Change Customer Address
    Remove Customer
ANSWERS TO QUESTIONS
These questions can be used to check your understanding of the material. The most
relevant performance criteria are shown after the question number.
1. PC 1e
An entity includes the data items x and y. If the data item x is a key, can (x, y) be a
key of the same entity?
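Answer. No. A key cannot include any unnecessary items, and since x alone is already a key, y is unnecessary. A combination such as (x, y) that identifies each instance but includes unnecessary items is called a superkey.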
2. PC 1a-d
The following group of items has a single repeating item, phone number. Assume
that clients may share the same phone number. Show how to move the repeating item
to another entity to produce 1NF. What is the key of the new entity? Is the new
entity in 1NF? 2NF? 3NF?
Entity: Client
Data Items: Client Contact Number, Client Name, Client Address, Client Post Code
Repeating item: Phone Number
Answer. Move the item Phone Number into a new entity, Phone, along with the key
of Client as a foreign key. The foreign key implements the one-to-many relationship
between the Client and Phone entities. The key of Phone is the combination of both
items since there may be several clients sharing one number and several phone
numbers for one client. Phone is in 1NF since there are no repeating items or groups.
It is in 2NF since there are no other items than the key to be partially dependent. It is
in 3NF for a similar reason: there are no items that can be indirectly dependent.
Entity: Client
Data Items: Client Contact Number, Client Name, Client Address, Client Post Code
Entity: Phone
Data Items: Phone Number, Client Contact Number
(a) The first diagram shows a repeated item, Phone Number.
[Diagram: entity Client with attributes Client Contact No, Client Name, Client Address, Client Post Code and the repeated attribute Phone Number.]
(b) The second diagram shows the Phone Number treated as a separate entity.
[Diagram: Client (Client Contact No, Client Name, Client Address, Client Post Code) 1 : N Phone (Phone Number), linked by the relationship Has.]
3. PC 1c
Explain why a single-item key implies that the entity is at least in 2NF.
Answer. Partial dependence requires there to be more than one item in the key, so
there can be no partial dependencies in an entity with a single-item key.
4. PC 1d
Explain why an all-key entity must be in 3NF. (An all-key entity is one where the key
consists of all the data items in the entity. They can occur when the underlying data
model has a many-to-many relationship.)
Answer. An entity is in 3NF if there are no indirect dependencies. An indirect
dependency requires two items other than the key to have a functional dependency. In
an all-key entity there are no such items.
5. PC 1d, e
An entity E has four data items a, b, c, d and has two candidate keys (a, b) and c.
Also c functionally determines d. Is it in 3NF? If not, derive entities that are in 3NF.
What are the keys of these entities?
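Answer. Taking (a, b) as the key, E is not in 3NF, since d is indirectly dependent on (a, b) through c. (Note that the key (a, b) functionally determines c.) We can move d to a new entity, F, which must also contain the item c as a foreign key. The key of F will be c. The new entity E will still have two candidate keys, (a, b) and c. (Strictly, because c is itself a candidate key, the formal definition of 3NF already allows d to depend on c. The split is needed only if c is treated as an ordinary non-key item.)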
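The dependencies in this question can be explored with the standard attribute-closure algorithm. Start from a set of items, then keep adding the right-hand side of any dependency whose left-hand side is already covered. A set of items is a superkey when its closure is every item in the entity. It is a key when, in addition, no smaller subset has that property (compare question 1). This is a sketch only: encoding the question's statements as the three dependencies below is our own reading of them, and the function names are hypothetical.

```python
def closure(items, fds):
    """All items functionally determined by `items`, given dependencies (lhs, rhs)."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, so is the right.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


E_ITEMS = {"a", "b", "c", "d"}
# Question 5 as dependencies: (a, b) determines c, c is a candidate key, c determines d.
FDS = [
    ({"a", "b"}, {"c"}),
    ({"c"}, {"a", "b"}),
    ({"c"}, {"d"}),
]


def is_superkey(items, all_items=E_ITEMS, fds=FDS):
    """True if `items` determines every item of the entity."""
    return closure(items, fds) == all_items


print(sorted(closure({"a", "b"}, FDS)))  # ['a', 'b', 'c', 'd']
print(sorted(closure({"c"}, FDS)))       # ['a', 'b', 'c', 'd']
print(sorted(closure({"a"}, FDS)))       # ['a']
```

The closure of (a, b) reaches d only after c has been added, which is the indirect dependency described in the answer. The closure of c covers all four items, which confirms that c is a candidate key.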
6. PC 1e
The following table shows some data for an entity concerning the poetical works of a
certain author. Why can the entity not have as its key the data item ‘Title’? What
could be a key in this example?
Why can we not be sure that this is a key?
Title                Year First Published   Volume in Collected Works   Number of Lines
Threnody             1856                   2                           116
Image of Man         1856                   2                           240
Roses                1849                   1                           35
To Winter            1860                   3                           86
Pleasure’s Repose    1888                   4                           463
My Heart’s Hope      1860                   4                           44
Roses                1888                   4                           52
Conquest of Care     1860                   4                           362
Answer. The item Title cannot be a key, since the value ‘Roses’ appears in two instances (records). A possible key is the combination of Title with Year First Published: with the data shown, this combination is unique. However, two works with the same title could have been published in the same year (even if not in the same publication), so we cannot be sure. There are similar problems with combining Title with any of the other items.
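Checking sample data for uniqueness can be automated, with the caveat given in the answer: a combination that is unique in the sample might still be duplicated by future data. This sketch assumes the rows of the table above. The column labels and function names are hypothetical.

```python
from itertools import combinations

COLUMNS = ("Title", "Year First Published", "Volume", "Lines")
# The rows of the table in question 6.
POEMS = [
    ("Threnody", 1856, 2, 116),
    ("Image of Man", 1856, 2, 240),
    ("Roses", 1849, 1, 35),
    ("To Winter", 1860, 3, 86),
    ("Pleasure's Repose", 1888, 4, 463),
    ("My Heart's Hope", 1860, 4, 44),
    ("Roses", 1888, 4, 52),
    ("Conquest of Care", 1860, 4, 362),
]


def unique_in_sample(cols, rows=POEMS):
    """True if the chosen columns have different values in every row of `rows`."""
    positions = [COLUMNS.index(c) for c in cols]
    values = [tuple(row[p] for p in positions) for row in rows]
    return len(values) == len(set(values))


def minimal_unique_combinations(rows=POEMS):
    """Smallest column combinations that are unique in `rows` (candidates, not proven keys)."""
    found = []
    for size in range(1, len(COLUMNS) + 1):
        for cols in combinations(COLUMNS, size):
            already_covered = any(set(f) <= set(cols) for f in found)
            if not already_covered and unique_in_sample(cols, rows):
                found.append(cols)
    return found


print(unique_in_sample(["Title"]))                          # False
print(unique_in_sample(["Title", "Year First Published"]))  # True
print(minimal_unique_combinations())
```

The last call also reports Lines on its own and Title with Volume. Number of Lines happens to be unique in this sample, which illustrates the answer's point: sample data can rule a key out, but it cannot prove one.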
7. PC 2a, b
What things are wrong with the following excerpt from a DD?
Item
Entity
Description
Emp_Nbr
Employee
Name
Employee
Address
Emp_Dept_No
Employee
Employee
Dept_Nbr
Department
Name
Dept_Mgr
Department
Department
Unique number for
employee
Full name of
employee
Address of employee
Number of
department in which
employee works
Unique number for
department
Name of department
Emp_Nbr of manager
of department
Type/
Size
A(6)
Range/
Validation
999999
A(40)
A(100)
A(4)
Number
A(32)
A(6)
Existing
Dept_Nbr
In 0..999
Existing
Emp_Nbr
Req’d
Key
Y
Y
Y
N
N
Y
N
N
N
Y
Y
Y
N
N
Answer. (1) The data item ‘Name’ is duplicated: it appears in each of the entities Employee and Department. (2) The abbreviation for ‘number’ is not consistent (both Nbr and No are used). (3) The type of the foreign key Emp_Dept_No is different from the type of the key to which it refers (A(4) and Number, respectively). (4) The key of Department is shown as not required, but all key items must have values.
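Several of these faults can be detected mechanically. The sketch below holds the excerpt as Python tuples and applies four checks that correspond to the four points of the answer. It assumes that a foreign key is recognised by a Range/Validation entry of the form ‘Existing <item>’, as in the excerpt. Range/Validation is recorded only for the two foreign keys. Req’d is recorded only for the two key items, with values taken from point (4) of the answer rather than read from the table. The function name dd_problems is hypothetical.

```python
import re
from collections import Counter

# The excerpt from question 7: (item, entity, type/size, range/validation, key, req'd).
# Range/validation is recorded only for the foreign keys; req'd only for the key
# items (from point (4) of the answer). None means "not needed by these checks".
ROWS = [
    ("Emp_Nbr",     "Employee",   "A(6)",   "",                  True,  True),
    ("Name",        "Employee",   "A(40)",  "",                  False, None),
    ("Address",     "Employee",   "A(100)", "",                  False, None),
    ("Emp_Dept_No", "Employee",   "A(4)",   "Existing Dept_Nbr", False, None),
    ("Dept_Nbr",    "Department", "Number", "",                  True,  False),
    ("Name",        "Department", "A(32)",  "",                  False, None),
    ("Dept_Mgr",    "Department", "A(6)",   "Existing Emp_Nbr",  False, None),
]


def dd_problems(rows):
    """Return a list of faults found in a data dictionary excerpt."""
    found = []
    # (1) Every item name must be different.
    for name, count in Counter(row[0] for row in rows).items():
        if count > 1:
            found.append(f"duplicate item name {name}")
    # (2) 'number' should always be abbreviated the same way.
    suffixes = set()
    for row in rows:
        match = re.search(r"_(Nbr|No)\b", row[0])
        if match:
            suffixes.add(match.group(1))
    if len(suffixes) > 1:
        found.append("inconsistent abbreviations: " + ", ".join(sorted(suffixes)))
    # (3) A foreign key must have the same type as the key it refers to.
    types = {row[0]: row[2] for row in rows}
    for name, _, type_size, rule, _, _ in rows:
        if rule.startswith("Existing "):
            target = rule[len("Existing "):]
            if target in types and types[target] != type_size:
                found.append(f"{name} is {type_size} but {target} is {types[target]}")
    # (4) Key items must always have a value.
    for name, _, _, _, key, required in rows:
        if key and required is False:
            found.append(f"key item {name} is not required")
    return found


for problem in dd_problems(ROWS):
    print(problem)
```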
8. PC 2c
What is probably missing from this list of events for the entity Employee?
Events for Employee
Modify Employee Name
Modify Employee Address
Change Department of Employee
Dismiss Employee
Employee Resigns
Answer. There is no event that creates an instance of the entity. It is possible that one
of the events is badly named. The entity/event matrix would show whether this was
the case. There should be at least one event with a ‘C’ entry in the matrix for the
Employee entity.
9. PC 2e
Which of the following sequences are allowed and which are not allowed for the
Client ELH?
Sequence 1
Add Client
Delete Client
Sequence 2
Add Client
Change Client Name
Add Client
Change Client Address
Delete Client
Sequence 3
Add Client
Change Client Address
Change Client Address
Delete Client
Sequence 4
Delete Client
Add Client
Change Client Name
Delete Client
Client
    Add Client
    Client Life
        * Modify Client Details
            ° Change Client Name
            ° Change Client Address
    Delete Client
Answer. Sequence 1 is OK since the repetition allows zero repeats. Sequence 2 is
illegal since the second Add Client cannot happen at all: it is not a repeated event, nor
can it happen after Change Client Name which is part of the Client Life internal node
that follows Add Client. Sequence 3 is OK since repetition of Modify Client Details
is allowed. Sequence 4 is illegal since Delete Client cannot be first in the sequence.
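These answers can also be checked by treating the Client ELH as a small finite-state machine. An instance is first absent. It becomes live after Add Client, stays live through any number of Change Client Name and Change Client Address events, and ends after Delete Client. The transition table and the function name below are a hypothetical sketch of this reading.

```python
# Transition table read off the Client ELH: (state, event) -> next state.
# "absent" = no instance yet, "live" = created, "ended" = deleted.
TRANSITIONS = {
    ("absent", "Add Client"): "live",
    ("live", "Change Client Name"): "live",
    ("live", "Change Client Address"): "live",
    ("live", "Delete Client"): "ended",
}


def check_client_sequence(sequence):
    """Describe whether `sequence` is a complete life allowed by the Client ELH."""
    state = "absent"
    for position, event in enumerate(sequence, start=1):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return f"event {position} ({event}) not allowed"
        state = next_state
    return "allowed" if state == "ended" else "life not finished"


SEQUENCES = {
    1: ["Add Client", "Delete Client"],
    2: ["Add Client", "Change Client Name", "Add Client", "Change Client Address", "Delete Client"],
    3: ["Add Client", "Change Client Address", "Change Client Address", "Delete Client"],
    4: ["Delete Client", "Add Client", "Change Client Name", "Delete Client"],
}
for number, sequence in SEQUENCES.items():
    print(f"Sequence {number}: {check_client_sequence(sequence)}")
```

A sequence that stops before Delete Client is reported as ‘life not finished’, since an entity life history describes a complete life.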
10. PC 2e
What is wrong with this ELH of Employee?
Employee
Employee Life
Delete Employee
*
Modify Employee
Details
Add Employee
Print List of
Employees
Answer. (1) The Delete Employee event should not come first, nor can the Add
Employee event that creates instances come last. They should be interchanged. (2)
The event ‘Print List of Employees’ is likely to be read-only and so should not be
shown in the ELH at all. If it does update the Employee entity it should be renamed
to show this. The entity/event matrix should be consulted and revised if necessary.
11. PC 2e
Improve the following ELH of Customer.
Customer
    Add Customer
    * Modify Customer Details
        ° Change Customer Phone Number
        ° Change Customer Address
    Remove Customer
Answer. Since the three boxes at the top level are not a pure sequence (the second one
is a repetition) it is better to introduce a new internal node to cover the life of the
Customer between creation and deletion. The repetition is now placed under this new
node.
Customer
    Add Customer
    Customer Life
        * Modify Customer Details
            ° Change Customer Phone Number
            ° Change Customer Address
    Remove Customer
ANSWERS TO EXERCISES
These exercises give examples of data sources that can be analyzed using the methods
contained in this unit.
1.
A literary agency keeps records of books that have been published in which it has
an interest. The data is kept on forms like the one shown below. There is one
form for each different ISBN (since paperback and hardback editions of the same
book have different ISBNs). Some of the authors belong to a writers’ club run by
the agency.
BOOK ENTRY
Catalogue Details
    ISBN: 1-3452-7294X
    Title: Fishing for All
    Edition: 2
    Description: Pbk 280pp illus
Author Details
    Name            Phone Number    Club Membership No. (if any)    Year Joined Club
    Martin Banks    03838 612345    517                             1993
    Rodney Line     02028 621378
Publisher Details
    Name: Henry Hutters
    Address: Lamb Place, Dundee
    Postcode: DD1 8RX
Answer.
Normalization.
Step 1. Some item names are extended to show their meaning more clearly.
Entity: Book Entry
Data Items: ISBN, Title, Edition, Description, Publisher Name, Publisher Address, Publisher Postcode, Author Name, Author Phone Number, Author Club Membership Number, Author Year Joined Club
Step 2. The last four data items form a repeating group (since there can be several
authors for a given book).
Step 3. The key for Book Entry is ISBN. We have now reached UNF.
Entity: Book Entry
Data Items: ISBN, Title, Edition, Description, Publisher Name, Publisher Address, Publisher Postcode
Repeating group: Author Name, Author Phone Number, Author Club Membership Number, Author Year Joined Club
Step 4. We remove the repeating group items to a new entity called Author. The key
of Author is Author Name (assuming that no two authors have the same name).
Entity: Book Entry
Data Items: ISBN, Title, Edition, Description, Publisher Name, Publisher Address, Publisher Postcode
Entity: Author
Data Items: Author Name, Author Phone Number, Author Club Membership Number, Author Year Joined Club
Step 5. The relationship between Book Entry and Author is many-to-many. We add a
new entity, Authorship, that is in a many-to-one relationship with each of Book Entry
and Author. We add the key of Book Entry and the key of Author to Authorship as
foreign keys to implement this relationship. The key of Authorship consists of both
these foreign keys. We have now reached 1NF.
Entity: Book Entry
Data Items: ISBN, Title, Edition, Description, Publisher Name, Publisher Address, Publisher Postcode
Entity: Authorship
Data Items: ISBN, Author Name
Entity: Author
Data Items: Author Name, Author Phone Number, Author Club Membership Number, Author Year Joined Club
Step 6. Since Book Entry has a single-item key it is in 2NF. Since Author has a
single-item key it is in 2NF. Since Authorship is all key (i.e. all data items are part of
the key) it must be in 2NF.
Step 7. Since all three entities are in 2NF there is no work to be done at this step.
Steps 8 and 9.
(1) Book Entry is not in 3NF, since Publisher Address and Publisher Postcode depend on Publisher Name rather than directly on the key, ISBN. Move Publisher Name, Address and Postcode to a new entity, Publisher. Leave Publisher Name in Book Entry as a foreign key to implement the many-to-one relationship between Book Entry and Publisher. The key of Publisher will be Publisher Name (assumed unique).
(2) Authorship has no non-key items, so it is already in 3NF.
(3) It is likely that Author Year Joined Club is functionally dependent on Author Club Membership Number, so there is an indirect dependency on the key. Move Author Year Joined Club to a new entity, Club, together with a copy of Author Club Membership Number, which becomes the key of Club. Author Club Membership Number stays in Author as a foreign key to implement the relationship (which is one-to-one).
The final set of 3NF entities is given below.
Entity: Book Entry
Data Items: ISBN, Title, Edition, Description, Publisher Name
Entity: Publisher
Data Items: Publisher Name, Publisher Address, Publisher Postcode
Entity: Authorship
Data Items: ISBN, Author Name
Entity: Author
Data Items: Author Name, Author Phone Number, Author Club Membership Number
Entity: Club
Data Items: Author Club Membership Number, Author Year Joined Club
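As an illustration of where these entities lead, the final 3NF set can be written as SQL tables. This is a sketch that uses Python's built-in sqlite3 module. The table and column names follow the data dictionary later in this answer, with Book Entry renamed Book. The SQL types are assumptions, and the CHECK on the year follows the Range/Validation entry ‘In 1900..2100’. Req’d (NOT NULL) constraints are left out. SQLite enforces REFERENCES clauses only after PRAGMA foreign_keys = ON has been executed.

```python
import sqlite3

BOOK_SCHEMA = """
CREATE TABLE Publisher (
    Pub_Name     TEXT PRIMARY KEY,
    Pub_Address  TEXT,
    Pub_Postcode TEXT
);
CREATE TABLE Club (
    Club_Memb_Nbr  INTEGER PRIMARY KEY,
    Club_Year_Join INTEGER CHECK (Club_Year_Join BETWEEN 1900 AND 2100)
);
CREATE TABLE Book (
    Book_ISBN        TEXT PRIMARY KEY,
    Book_Title       TEXT,
    Book_Edition     INTEGER,
    Book_Description TEXT,
    Book_Publisher   TEXT REFERENCES Publisher(Pub_Name)
);
CREATE TABLE Author (
    Auth_Name     TEXT PRIMARY KEY,
    Auth_Phone    TEXT,
    Auth_Club_Nbr INTEGER REFERENCES Club(Club_Memb_Nbr)  -- NULL if not a member
);
-- All-key entity: both foreign keys together form the primary key.
CREATE TABLE Authorship (
    Aship_ISBN   TEXT REFERENCES Book(Book_ISBN),
    Aship_Author TEXT REFERENCES Author(Auth_Name),
    PRIMARY KEY (Aship_ISBN, Aship_Author)
);
"""


def build_book_schema(conn):
    """Create the Exercise 1 tables in an open sqlite3 connection."""
    conn.execute("PRAGMA foreign_keys = ON")  # REFERENCES is not enforced otherwise
    conn.executescript(BOOK_SCHEMA)


conn = sqlite3.connect(":memory:")
build_book_schema(conn)
conn.execute("INSERT INTO Publisher VALUES ('Henry Hutters', 'Lamb Place, Dundee', 'DD1 8RX')")
conn.execute("INSERT INTO Book VALUES ('134527294X', 'Fishing for All', 2, 'Pbk 280pp illus', 'Henry Hutters')")
conn.execute("INSERT INTO Author VALUES ('Rodney Line', '02028 621378', NULL)")
conn.execute("INSERT INTO Authorship VALUES ('134527294X', 'Rodney Line')")
```

Adding an Authorship row for an ISBN that has no Book row raises sqlite3.IntegrityError, which is the database's way of enforcing the ‘Existing Book_ISBN’ rule.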
The completed data model.
Figure er6a.
ER diagrams for Exercise 1.
If we wish to show the many-to-many relationship between Book Entry and Author, the ER diagram is as shown below (not all attributes are shown).
[Diagram: Book Entry (ISBN, Title, Edition, Description, Publisher attribs …) M : N Author (Author Name, Author attribs …), linked by the relationship Written By.]
Figure er6b.
After the Authorship entity is introduced, the many-to-many relationship is decomposed into two many-to-one relationships. Note that Authorship has no attributes: its sole purpose is to relate Book Entry and Author.
[Diagram: Book Entry (ISBN, Title, Edition, Description, Publisher attribs …) 1 : N Authorship, linked by Written By; Authorship N : 1 Author (Author Name, Author attribs …), linked by Perform.]
Data dictionary.
There will be some renaming. Each item will be prefixed with the (possibly
abbreviated) entity name to make it unique. Consistent abbreviations are used. The
entity ‘Book Entry’ is now better named just ‘Book’.
Item
Entity
Description
Type/
Size
A(10)
Book_ISBN
Book
Unique ISBN for
each book
Book_Title
Book_Edition
Book_
Description
Book_Publisher
Book
Book
Book
A(60)
Number
A(30)
Book
Title of book
Edition number
Binding, pagination,
etc.
Publisher of book
Pub_Name
Pub_Address
Pub_Postcode
Aship_ISBN
Publisher
Publisher
Publisher
Authorship
Name of publisher
Address of publisher
Postcode of publisher
ISBN of book written
A(30)
A(100)
A(10)
A(10)
Aship_Author
Authorship
Author writing
A(30)
Auth_Name
Auth_Phone
Author
Author
A(30)
A(12)
Auth_Club_Nbr
Author
Name of author
Phone number of
author
Membership number
in club
Club_Memb_Nbr
Club
Number
Club_Year_Join
Club
Unique membership
number
Year joined club
A(30)
Number
Number
Range/
Validation
Last digit is
a modulus 11 check
Existing
Pub_Name
Existing
Book_ISBN
Existing
Auth_Name
Existing
Club_
Memb_Nbr
In
1900..2100
Req’d
Key
Y
Y
Y
Y
N
N
N
N
Y
N
Y
Y
Y
Y
Y
N
N
Y
Y
Y
Y
N
Y
N
N
N
Y
Y
Y
N
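The ‘modulus 11 check’ on Book_ISBN refers to the ISBN-10 check digit. Each of the ten digits is multiplied by a weight, from 10 for the first digit down to 1 for the last; a final X stands for 10; and the weighted total must be divisible by 11. A minimal sketch of the check (the function name is hypothetical; hyphens and spaces are ignored):

```python
def isbn10_ok(isbn):
    """Return True if `isbn` passes the ISBN-10 modulus 11 check."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    if not all(c in "0123456789" for c in chars[:9]):
        return False
    if not (chars[9] in "0123456789" or chars[9] == "X"):
        return False
    # The check digit X stands for 10.
    digits = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
    # Weights run from 10 for the first digit down to 1 for the check digit.
    total = sum(weight * digit for weight, digit in zip(range(10, 0, -1), digits))
    return total % 11 == 0


print(isbn10_ok("0-306-40615-2"))  # True
```

The ISBN on the sample form, 1-3452-7294X, fails this check: its weighted total is 204, which leaves a remainder of 6. A system enforcing the rule would therefore reject it as entered.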
Entity/event matrix. No reports are shown in this list of events but many could be
included.
Event
Add book
Add publisher
Add authorship of book
Add author
Author join club
Modify book details
Modify publisher details
Modify author details
Modify club member details
Delete book
Delete publisher
Author leave club
Delete author
Delete authorship
Book
C
Pub
R
C
R
Aship
Auth
Club
C
R
C
R
C
M
R
M
M
M
D
R
R
D
R
R
D
D
D
D
R
Entity Life Histories.
Book
    Add Book
    Book Life
        * Modify Book Details
    Delete Book

Publisher
    Add Publisher
    Publisher Life
        * Modify Publisher Details
    Delete Publisher

Authorship
    Add Authorship of Book
    Delete Authorship

Author
    Add Author
    Author Life
        * Modify Author Details
    Delete Author

Club
    Author Join Club
    Club Life
        * Modify Club Member Details
    Remove Club Member
        ° Author Leave Club
        ° Delete Author
2. A collection of music CDs has a filing system that consists of a form for each
CD, an example being shown below. Each CD has a unique number allocated to it.
The details of the tracks are shown on the form. The CDs can be borrowed and the
details of the person (if any) who currently has the CD are also shown. (Former
borrowers are shown crossed through.)
CD
    CD number: 401
    Category: Classical
    Title: Chamber Music Collection vol. 4
Track Details
    Track number   Title                                               Performers / Artists               Length
    1-5            Schubert Piano Quintet in A, D667 (The Trout)       Jeorg Demus, Schubert Quartet      30:05
    6-9            Mozart String Quartet in B flat, K458 (The Hunt)    Amadeus Quartet                    20:45
    10-12          Beethoven Piano Trio in D, Opus 70/1 (The Ghost)    Kempff – Szeryng – Fournier Trio   25:20
Borrower Details
    Name       Address                    Phone Number    Date Borrowed
    Jo Lee     3 Railway Lane, Miltown    0326 721456     16/11/98
    Anil Rae   24 Main Street, Miltown    0326 742509     4/3/99
Answer.
Normalization.
Step 1. Some item names are extended to show their meaning more clearly.
Entity: CD
Data Items: CD Number, CD Category, CD Title, Track Number, Track Title, Track Performers, Track Length, Borrower Name, Borrower Address, Borrower Phone Number, Date Borrowed
Step 2. The four data items containing track data form a repeating group (since there can be several tracks for a given CD). Those for Borrower do not form a repeating group, as only the current borrower (if any) is retained. (If a history of borrowing were to be kept, the last four items would indeed form another repeating group and would need to be moved to another entity for borrower; see steps 8 and 9, however. What would be different in this case is the kind of relationship between CD and borrower. For a history of borrowing, this relationship would be many-to-many, and a further entity would be needed to hold a borrowing, with foreign keys for both CD and borrower.)
Step 3. The key for CD is CD Number. We have now reached UNF.
Entity: CD
Data Items: CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone Number, Date Borrowed
Repeating group: Track Number, Track Title, Track Performers, Track Length
Step 4. We remove the repeating group items to a new entity called Track.
Entity: CD
Data Items: CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone Number, Date Borrowed
Entity: Track
Data Items: Track Number, Track Title, Track Performers, Track Length
Figure er9.
After step 4 we have two entities in a one-to-many relationship.
[Diagram: CD (CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone No, Date Borrowed) 1 : N Track (Track Number, Track Title, Track Performers, Track Length), linked by the relationship Part.]
Step 5. The relationship between CD and Track is one-to-many. We add the key of CD to Track as a foreign key to implement this relationship. The key of Track now includes the foreign key as well, since the same track number value may appear on more than one CD. We have now reached 1NF.
Entity: CD
Data Items: CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone Number, Date Borrowed
Entity: Track
Data Items: CD Number, Track Number, Track Title, Track Performers, Track Length
Steps 6, 7. Since CD has a single-item key it is in 2NF. Track has no partial
dependencies of the last three items on the key since neither CD Number alone nor
Track Number alone is sufficient to determine these three items. So Track is in 2NF.
Steps 8 and 9. (1) CD is not in 3NF since it has items that are indirectly dependent on
the key, namely Borrower Address and Borrower Phone Number. (This assumes that
Borrower Name is unique.) Remove these to a new entity, Borrower. The key of
Borrower will be Borrower Name (which we have assumed unique). Leave the key of
Borrower in CD as a foreign key to implement the many-to-one relationship between
CD and Borrower. CD and Borrower are now in 3NF.
(2) Track has no indirectly dependent items so it is in 3NF.
The final set of 3NF entities is given below.
Entity: CD
Data Items: CD Number, CD Category, CD Title, Borrower Name, Date Borrowed
Entity: Borrower
Data Items: Borrower Name, Borrower Address, Borrower Phone Number
Entity: Track
Data Items: CD Number, Track Number, Track Title, Track Performers, Track Length
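These final entities can be sketched as SQLite tables through Python's sqlite3 module. Two points from the normalization show up directly. First, Track has the composite key (CD number, track number) from step 5. Second, the foreign key from CD to Borrower may be empty, since a CD need not be on loan. Column names follow the data dictionary below. The SQL types are assumptions; in particular, dates and times are stored as text.

```python
import sqlite3

CD_SCHEMA = """
CREATE TABLE Borrower (
    Borr_Name      TEXT PRIMARY KEY,
    Borr_Address   TEXT,
    Borr_Phone_Nbr TEXT
);
CREATE TABLE CD (
    CD_Nbr           TEXT PRIMARY KEY,
    CD_Category      TEXT,
    CD_Title         TEXT,
    CD_Date_Borrowed TEXT,                                -- NULL when not on loan
    CD_Borrow_Name   TEXT REFERENCES Borrower(Borr_Name)  -- NULL when not on loan
);
CREATE TABLE Track (
    Track_CD_Nbr     TEXT REFERENCES CD(CD_Nbr),
    Track_Nbr        TEXT,
    Track_Title      TEXT,
    Track_Performers TEXT,
    Track_Length     TEXT,
    PRIMARY KEY (Track_CD_Nbr, Track_Nbr)  -- composite key from step 5
);
"""


def build_cd_schema(conn):
    """Create the Exercise 2 tables in an open sqlite3 connection."""
    conn.execute("PRAGMA foreign_keys = ON")  # REFERENCES is not enforced otherwise
    conn.executescript(CD_SCHEMA)


conn = sqlite3.connect(":memory:")
build_cd_schema(conn)
conn.execute("INSERT INTO CD VALUES ('401', 'Classical', 'Chamber Music Collection vol. 4', NULL, NULL)")
conn.execute("INSERT INTO Track VALUES ('401', '1-5', "
             "'Schubert Piano Quintet in A, D667 (The Trout)', "
             "'Jeorg Demus, Schubert Quartet', '30:05')")
```

Two operations raise sqlite3.IntegrityError: inserting the same track number twice for one CD, and recording a loan to a borrower who is not in the Borrower table. The same track number on two different CDs is accepted.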
The completed data model.
Figure er10.
[Diagram: Borrower (Borrower Name, Borrower Address, Borrower Phone No) 1 : N CD (CD Number, CD Category, CD Title, Date Borrowed), linked by Has; CD 1 : N Track (Track Number, Track Title, Track Performers, Track Length), linked by Part.]
Data dictionary.
There will be some renaming. Each item will be prefixed with the (possibly
abbreviated) entity name to make it unique. Consistent abbreviations are used.
Item
Entity
Description
CD_Nbr
CD
CD_Category
CD_Title
CD_Date_
Borrowed
CD_Borrow_
Name
CD
CD
CD
Borr_Name
Borr_Address
Borr_Phone_Nbr
Borrower
Borrower
Borrower
Track_CD_Nbr
Track
Unique number for
each CD
Category of music
Title of CD
Date CD borrowed (if
any)
Name of current
borrower of CD (if
any)
Name of borrower
Address of borrower
Phone number of
borrower
CD number of this
track
CD
Type/
Size
A(10)
Range/
Validation
A(24)
A(80)
Date
A(30)
Existing
Borr_Name
A(30)
A(100)
A(12)
A(10)
Existing
CD_Nbr
Req’d
Key
Y
Y
N
Y
N
N
N
N
N
N
Y
Y
N
Y
N
N
Y
Y
Item
Entity
Description
Track_Nbr
Track_Title
Track_Performers
Track_Length
Track
Track
Track
Track number(s)
Title of track
Performers of track(s)
Type/
Size
A(6)
A(80)
A(120)
Track
Length of track(s)
Time
Range/
Validation
Req’d
Key
Y
N
N
Y
N
N
N
N
Entity/event matrix. No reports are shown in this list of events but many could be
included.
Event
Add CD
Add borrower
Modify CD details
Modify borrower details
Modify track details
Delete CD
Delete borrower
Borrow CD
Return CD
CD
C
Borr
Track
C
C
M
M
R
D
R
M
M
M
D
R
D
R
R
Entity Life Histories.
Borrower
    Add Borrower
    Borrower Life
        * Modify Borrower Details
    Delete Borrower

Track
    Add CD
    Track Life
        * Modify Track Details
    Delete CD

CD
    Add CD
    CD Life
        * Modify CD
            ° Modify CD Details
            ° Loan
                Borrow CD
                Return CD
    Delete CD