Chapter 3
The Entity-Relationship Approach
Author: Graeme C. Simsion and Graham C. Witt
Copyright: ©2005 by Elsevier Inc. All rights reserved.
Normalisation
• Each column can hold only a single fact. Do
this first.
• Put very simply, normalization is essentially a
two-step process:
1. Put the data into tabular form (by removing
repeating groups to new tables).
2. Remove duplicated data to separate tables.
Critically: Every time we create a table (in either
step), we need to identify its primary key.
• We did all this in the example in the last
lecture (the Drug Expenditure example).
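Step 1 can be sketched in a few lines of Python. This is only an illustration under assumed data: the column names (`hospital_no`, `operation_no`, `drugs`) and the sample values are made up, not taken from the lecture example.

```python
# Hypothetical un-normalised records: each operation carries a repeating
# group of drug administrations (names and values are illustrative only).
operations = [
    {"hospital_no": "H17", "operation_no": 1,
     "drugs": [{"drug": "MAX", "doses": 2}, {"drug": "MIN", "doses": 1}]},
    {"hospital_no": "H17", "operation_no": 2,
     "drugs": [{"drug": "MAX", "doses": 3}]},
]


def remove_repeating_group(rows, key_cols, group_col):
    """Split rows into a parent table and a child table.

    Each child row carries the parent's primary key so that it can be
    traced back to the row it came from.
    """
    parent, child = [], []
    for row in rows:
        key = {c: row[c] for c in key_cols}
        parent.append({c: v for c, v in row.items() if c != group_col})
        for item in row[group_col]:
            child.append({**key, **item})
    return parent, child


operation_table, drug_admin_table = remove_repeating_group(
    operations, ["hospital_no", "operation_no"], "drugs")
```

The parent table keeps one row per operation; the child table gets one row per drug administration, keyed by the parent key plus the drug.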
More formally
• Apart from repeating groups, we are
looking at certain relationships between
data in the tables.
– Which column(s) determine other
column(s)
– Create tables around the determining
column(s) (we call these determining
columns determinants)
Determinants
• We divided the various tables (in step 2)
according to determinants.
• Hospital Number → Hospital Name, Contact
Person, Hospital Type, Teaching Status
• where we read “→” as “determines” or “is a
determinant of”.
• Determinants can be a combination of two or
more columns. E.g.: Hospital Number +
Operation Number → Surgeon Number.
Step 2 of Normalisation
• Identify any determinants, other than the
primary key, and the columns they determine
• Establish a separate table for each
determinant and the columns it determines.
The determinant becomes the key of the new
table.
• Name the new tables.
• Remove the determined columns from the
original table. Leave the determinants to
provide links between tables.
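The four bullets above can be sketched as one small function. This is a minimal illustration, assuming rows are held as Python dictionaries; the function name `split_on_determinant`, the column names and the sample values are all hypothetical.

```python
def split_on_determinant(rows, determinant, determined):
    """Move the `determined` columns into a new table keyed by `determinant`.

    Returns (reduced original rows, rows of the new table). The determinant
    columns stay in the original rows to provide the link between tables.
    """
    new_table = {}
    reduced = []
    for row in rows:
        key = tuple(row[c] for c in determinant)
        facts = {c: row[c] for c in determined}
        # The same key with different facts means the claimed determinant
        # does not actually determine these columns.
        if new_table.setdefault(key, facts) != facts:
            raise ValueError(f"{determinant} does not determine {determined}")
        reduced.append({c: v for c, v in row.items() if c not in determined})
    new_rows = [dict(zip(determinant, key), **facts)
                for key, facts in new_table.items()]
    return reduced, new_rows


# Illustrative data only.
operation = [
    {"hospital_no": "H17", "operation_no": 1, "hospital_name": "Central"},
    {"hospital_no": "H17", "operation_no": 2, "hospital_name": "Central"},
    {"hospital_no": "H23", "operation_no": 1, "hospital_name": "Royal"},
]
operation, hospital = split_on_determinant(
    operation, ["hospital_no"], ["hospital_name"])
```

After the split, each hospital name is stored once in `hospital`, and `operation` keeps only `hospital_no` as the link.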
What are determinants?
• Look for columns that appear by their
names to be identifiers. These may be
determinants or components of
determinants.
• Look for columns that appear to
describe something other than what the
table is about. Then look for other
columns that identify this “something”
Which Determinants were in the
Drug Expenditure Example?
• Hospital Number → Hospital Name, Contact
Person, Hospital Type, Teaching Status.
• Others in Operation table:
– Hospital Number + Surgeon Number → Surgeon
Specialty
– Operation Code → Operation Name, Procedure
Group
• Drug Administration table:
– Drug Short Name → Drug Name, Manufacturer
– Drug Short Name + Method of Administration +
Size of Dosage + Unit of Measure → Dose Cost
The Final Design
• The final design we have is in Third
Normal Form (3NF).
• By splitting tables along determinants
(or functional dependencies) we can get
the design into 3NF easily.
• What about performance? Surely all
those tables will slow things down?
Take a moment…
• Go back and examine the last lecture
and see that this is what we did in
normalization!
Performance of Normalised
Databases
• There are many tables for what seems to be
relatively little data.
• Thanks to advances in the capabilities of
DBMSs, and the increased power of
computer hardware, the number of tables is
less likely to be an important determinant of
performance than it might have been in the
past.
• But, performance is not an issue at this stage
(that comes later). We are designing here!
Definitions and a Few
Refinements (1)
• Determinants and Functional Dependency
– For each value of the determinant, there can only
be one value of some other nominated column(s)
in the table at any point in time
– The other nominated columns are functionally
dependent on the determinant.
– The determinant concept is what 3NF is all about;
we are simply grouping data items around their
determinants.
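The definition above can be tested against sample rows. The sketch below is an illustration only: the function name `determines` and the data are assumptions. Note that sample data can only disprove a dependency; whether it really holds is a business rule, not something the rows can prove.

```python
def determines(rows, determinant, dependent):
    """True if each determinant value maps to one dependent value in `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # A second, different value for the same key breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative surgeon data: surgeon numbers are only unique within a hospital.
surgeons = [
    {"hospital_no": "H17", "surgeon_no": 1, "specialty": "Cardiac"},
    {"hospital_no": "H17", "surgeon_no": 2, "specialty": "Neuro"},
    {"hospital_no": "H23", "surgeon_no": 1, "specialty": "Ortho"},
]

by_both = determines(surgeons, ["hospital_no", "surgeon_no"], ["specialty"])
by_surgeon_only = determines(surgeons, ["surgeon_no"], ["specialty"])
```

Here `by_both` is true and `by_surgeon_only` is false, because surgeon number 1 has a different specialty at each hospital.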
Definitions and a Few
Refinements (2)
• Primary Keys
– A primary key is a nominated column or
combination of columns that has a different value
for every row in the table. Each table has one (and
only one) primary key.
• Candidate Keys
– Sometimes more than one column or combination
of columns could serve as a primary key. We refer
to such possible primary keys, whether chosen or
not, as candidate keys.
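As a rough illustration of candidate keys, the sketch below lists the minimal column combinations whose values are unique in some sample rows. The function name, column names and data are assumptions. Uniqueness in a sample only suggests a candidate key; the real decision rests on the business rules.

```python
from itertools import combinations


def candidate_keys(rows, columns):
    """Return minimal column combinations whose values are unique in `rows`."""
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # A superset of a key already found is not minimal, so skip it.
            if any(set(k) <= set(combo) for k in found):
                continue
            values = [tuple(r[c] for c in combo) for r in rows]
            if len(set(values)) == len(values):
                found.append(combo)
    return found


# Illustrative drug data: both the short name and the full name are unique.
drugs = [
    {"short_name": "MAX", "drug_name": "Maxicillin", "manufacturer": "ABC"},
    {"short_name": "MIN", "drug_name": "Minicillin", "manufacturer": "ABC"},
    {"short_name": "LOW", "drug_name": "Lowicillin", "manufacturer": "XYZ"},
]

keys = candidate_keys(drugs, ["short_name", "drug_name", "manufacturer"])
```

Either `short_name` or `drug_name` could serve as the primary key here; we nominate one, and both remain candidate keys.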
Definitions and a Few
Refinements (3)
• A More Formal Definition of Third Normal
Form
• If we define the term “non-key column” to
mean “a column that is not part of the primary
key,” then we can say:
– A table is in 3NF if the only determinants of non-key columns are candidate keys.
– If we want to be even more formal, we should
explicitly exclude trivial determinants: each column
is of course a determinant of itself.
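The 3NF rule above can be applied mechanically once the determinants are written down. The sketch below is a simplified illustration: the function name and the listed dependencies are assumptions, and it checks determinants exactly as listed, without deriving further dependencies.

```python
def violations_of_3nf(primary_key, candidate_keys, dependencies):
    """List the dependencies that break the 3NF rule stated above.

    dependencies: list of (determinant_columns, determined_columns) pairs.
    """
    keys = [frozenset(k) for k in candidate_keys]
    bad = []
    for determinant, determined in dependencies:
        det = frozenset(determinant)
        # Keep only non-key columns, and drop trivial cases where a
        # column is determined by itself.
        non_key = [c for c in determined
                   if c not in primary_key and c not in det]
        if non_key and det not in keys:
            bad.append((tuple(determinant), tuple(non_key)))
    return bad


# Illustrative Operation table before step 2.
primary_key = ["hospital_no", "operation_no"]
dependencies = [
    (["hospital_no", "operation_no"], ["surgeon_no", "operation_code"]),
    (["hospital_no"], ["hospital_name"]),
]

problems = violations_of_3nf(primary_key, [primary_key], dependencies)
```

The only violation reported is `hospital_no` determining `hospital_name`, which is the dependency we would move to a separate Hospital table.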
Definitions and a Few
Refinements (3)
• Foreign Keys
– When removing repeating groups to a new table,
we carried the primary key of the original table
with us, to cross-reference to the source.
– These cross-referencing columns are called
foreign keys, and they are our principal means of
linking data from different tables.
– Note that a foreign key may also refer to a
primary key elsewhere in the same table. For
example, an Employee table with a primary key of
Employee Number might also hold the Employee
Number of each employee’s manager.
– A common convention for highlighting the foreign
keys in a model is an asterisk, as shown.
Definitions and a Few
Refinements (4)
• Referential Integrity
– Imagine the Operation table that uses
hospital number to point to the relevant
Hospital records. We expect every hospital
number in the Operation table to have a
matching hospital number in the Hospital
table. This is referential integrity.
• Modern DBMSs provide referential
integrity features.
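As one example of such a feature, the sketch below uses SQLite through Python's standard `sqlite3` module. The table and column names are simplified assumptions, not the full lecture design. SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE hospital (hospital_no TEXT PRIMARY KEY, hospital_name TEXT)")
conn.execute("""
    CREATE TABLE operation (
        hospital_no  TEXT NOT NULL REFERENCES hospital (hospital_no),
        operation_no INTEGER NOT NULL,
        PRIMARY KEY (hospital_no, operation_no)
    )
""")

conn.execute("INSERT INTO hospital VALUES ('H17', 'Central')")
conn.execute("INSERT INTO operation VALUES ('H17', 1)")  # matching hospital exists

try:
    # No hospital 'H99' exists, so the DBMS refuses the row.
    conn.execute("INSERT INTO operation VALUES ('H99', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

operation_count = conn.execute("SELECT COUNT(*) FROM operation").fetchone()[0]
```

The second insert is rejected, so the Operation table never points at a hospital that does not exist.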
Anomalies that Normalisation
is Really About
• Update Anomalies:
– Insertion anomalies
– Change anomalies
– Deletion anomalies
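The three anomalies are easiest to see on a table that is not normalised. The sketch below uses made-up data in which the hospital name is repeated on every operation row; all names and values are illustrative assumptions.

```python
# Un-normalised table: the hospital name is repeated on every operation row.
operation = [
    {"hospital_no": "H17", "operation_no": 1, "hospital_name": "Central"},
    {"hospital_no": "H17", "operation_no": 2, "hospital_name": "Central"},
    {"hospital_no": "H23", "operation_no": 1, "hospital_name": "Royal"},
]


def names_for(rows, hospital_no):
    """Collect every name recorded for one hospital number."""
    return {r["hospital_name"] for r in rows if r["hospital_no"] == hospital_no}


# Change anomaly: renaming the hospital on one row only leaves two
# conflicting names for the same hospital.
changed = [dict(r) for r in operation]
changed[0]["hospital_name"] = "Central General"
conflicting = names_for(changed, "H17")

# Deletion anomaly: deleting H23's only operation also loses its name.
after_delete = [r for r in operation
                if not (r["hospital_no"] == "H23" and r["operation_no"] == 1)]
lost = names_for(after_delete, "H23")

# Insertion anomaly: a hospital with no operations yet has no row to hold
# its name, because operation_no is part of the key and cannot be empty.
```

With a separate Hospital table, each name is stored once, so none of these three problems can arise.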
Denormalization and
Unnormalization
• It is sometimes necessary to compromise one data
modeling objective to achieve another.
• Occasionally, we implement database designs that
are not fully normalized to achieve some other
objective (most often performance).
• We normalize to achieve: completeness, nonredundancy, flexibility of extending repeating groups,
ease of data reuse, and programming simplicity. We
sacrifice this when we de-normalize.
• In many cases, these sacrifices will be prohibitively
costly.
You don’t always need to normalize
like this
• The past two lectures have shown you what
makes a well structured database design
shown as tables.
• Don’t do it like this every time!
• There is the equivalent of a blue-print for data
modelling other than the table-like description
we’ve seen.
• Let’s return to the Drug Expenditure design.
Drug Expenditure Database
Model as Relations (Tables)
• OPERATION (Hospital Number*, Operation Number, Operation
Code*, Surgeon Number*)
• SURGEON (Hospital Number*, Surgeon Number, Surgeon
Specialty)
• OPERATION TYPE (Operation Code, Operation Name,
Procedure Group)
• STANDARD DRUG DOSAGE (Drug Short Name*, Method of
Administration, Size of Dose, Unit of Measure,
Standard Dose Cost)
• DRUG (Drug Short Name, Drug Name, Manufacturer)
• HOSPITAL (Hospital Number, Hospital Name, Hospital
Category, Contact Person)
• DRUG ADMINISTRATION (Hospital Number*, Operation
Number*, Drug Short Name*, Method of Administration*, Size of
Dose*, Unit of Measure*, Number of Doses)
Drug Expenditure Database Model
as Entity-Relationship Model
[Figure: Entity-Relationship diagram. Boxes: Hospital, Operation Type, Surgeon, Operation, Drug, Standard Drug Dosage, Drug Admin. The boxes are joined by lines labelled with relationship names such as “be performed at”, “operate at”, “be managed by”, “manage”, “be classified by”, “classify”, “perform”, “prescribe”, “be of”, “be available in”, “use”, “be used in”, “follow” and “be followed by”.]
What did we do?
• Each table is a box
• Each link via a foreign key is shown using a
line with some other markings (we’ll get to
these)
• Each box has a name that describes what
each row in the underlying table is about
• What does each of these mean?
• This leads us to the higher-level model called
the Entity-Relationship Model… it is the
architect’s view of the database.
• We begin this next lecture…