Chapter 3 The Entity-Relationship Approach
Author: Graeme C. Simsion and Graham C. Witt
Copyright: ©2005 by Elsevier Inc. All rights reserved.

Normalisation
• Each column can hold only a single fact. Do this first.
• Very simply, normalization is essentially a two-step process:
1. Put the data into tabular form (by removing repeating groups to new tables).
2. Remove duplicated data to separate tables.
• Critically: every time we create a table (in either step), we need to identify its primary key.
• We did all this in the example in the last lecture (the Drug Expenditure example).

More formally
• Apart from repeating groups, we are looking at certain relationships between data in the tables.
– Which column(s) determine other column(s)?
– Create tables around the determining column(s) (we call these determining columns determinants).

Determinants
• We divided the various tables (in step 2) according to determinants.
• Hospital Number → Hospital Name, Contact Person, Hospital Type, Teaching Status
• where we read “→” as “determines” or “is a determinant of”.
• Determinants can be a combination of two or more columns, e.g. Hospital Number + Operation Number → Surgeon Number.

Step 2 of Normalisation
• Identify any determinants, other than the primary key, and the columns they determine.
• Establish a separate table for each determinant and the columns it determines. The determinant becomes the key of the new table.
• Name the new tables.
• Remove the determined columns from the original table. Leave the determinants to provide links between tables.

What are determinants?
• Look for columns that appear by their names to be identifiers. These may be determinants or components of determinants.
• Look for columns that appear to describe something other than what the table is about. Then look for other columns that identify this “something”.

Which Determinants were in the Drug Expenditure Example?
• Hospital Number → Hospital Name, Contact Person, Hospital Type, Teaching Status
• Others in Operation table:
– Hospital Number + Surgeon Number → Surgeon Specialty
– Operation Code → Operation Name, Procedure Group
• Drug Administration table:
– Drug Short Name → Drug Name, Manufacturer
– Drug Short Name + Method of Administration + Size of Dosage + Unit of Measure → Dose Cost

The Final Design
• The final design we have is in Third Normal Form (3NF).
• By splitting tables along determinants (or functional dependencies) we can get the design into 3NF easily.
• What about performance? Surely all those tables will slow things down?

Take a moment…
• Go back and examine the last lecture and see that this is what we did in normalization!

Performance of Normalised Databases
• There are many tables for what seems to be relatively little data.
• Thanks to advances in the capabilities of DBMSs, and the increased power of computer hardware, the number of tables is less likely to be an important determinant of performance than it might have been in the past.
• But performance is not an issue at this stage (that comes later). We are designing here!

Definitions and a Few Refinements (1)
• Determinants and Functional Dependency
– For each value of the determinant, there can be only one value of some other nominated column(s) in the table at any point in time.
– The other nominated columns are functionally dependent on the determinant.
– The determinant concept is what 3NF is all about; we are simply grouping data items around their determinants.

Definitions and a Few Refinements (2)
• Primary Keys
– A primary key is a nominated column or combination of columns that has a different value for every row in the table. Each table has one (and only one) primary key.
• Candidate Keys
– Sometimes more than one column or combination of columns could serve as a primary key. We refer to such possible primary keys, whether chosen or not, as candidate keys.

Definitions and a Few Refinements (3)
• A More Formal Definition of Third Normal Form
• If we define the term “non-key column” to mean “a column that is not part of the primary key,” then we can say:
– A table is in 3NF if the only determinants of non-key columns are candidate keys.
– If we want to be even more formal, we should explicitly exclude trivial determinants: each column is of course a determinant of itself.

Definitions and a Few Refinements (4)
• Foreign Keys
– When removing repeating groups to a new table, we carried the primary key of the original table with us, to cross-reference to the source.
– These cross-referencing columns are called foreign keys, and they are our principal means of linking data from different tables.
– Note that a foreign key may refer to a primary key elsewhere in the same table, not only in another table. For example, an Employee table might have a primary key of Employee Number; a column holding the employee number of each employee’s manager would then be a foreign key referring back to the same table.
– A common convention for highlighting the foreign keys in a model is to mark them with an asterisk.

Definitions and a Few Refinements (5)
• Referential Integrity
– Imagine the Operation table that uses hospital number to point to the relevant Hospital records.
We expect every hospital number in the Operation table to have a matching hospital number in the Hospital table. This is referential integrity.
• Modern DBMSs provide referential integrity features.

Anomalies that Normalisation is Really About
• Update Anomalies:
– Insertion anomalies
– Change anomalies
– Deletion anomalies

Denormalization and Unnormalization
• It is sometimes necessary to compromise one data modeling objective to achieve another.
• Occasionally, we implement database designs that are not fully normalized to achieve some other objective (most often performance).
• We normalize to achieve completeness, nonredundancy, flexibility of extending repeating groups, ease of data reuse, and programming simplicity. We sacrifice these when we denormalize.
• In many cases, these sacrifices will be prohibitively costly.

You don’t always need to normalize like this
• The past two lectures have shown you what makes a well-structured database design, shown as tables.
• Don’t do it like this every time!
• There is an equivalent of a blueprint for data modelling, other than the table-like description we’ve seen.
• Let’s return to the Drug Expenditure design.
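
Before returning to that design, the referential integrity point above can be tried out directly. What follows is a minimal sketch only, using Python’s built-in sqlite3 module as a convenient stand-in for “a modern DBMS”. The table and column names are lower-case, cut-down versions of the lecture’s Hospital and Operation tables, and the hospital name and all numbers are made-up illustrative values, not data from the Drug Expenditure example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with cut-down Hospital and Operation tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys once this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE hospital ("
        " hospital_number INTEGER PRIMARY KEY,"
        " hospital_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE operation ("
        " hospital_number INTEGER NOT NULL"
        "   REFERENCES hospital (hospital_number),"  # the foreign key
        " operation_number INTEGER NOT NULL,"
        " PRIMARY KEY (hospital_number, operation_number))"
    )
    return conn


def try_insert_operation(conn, hospital_number, operation_number):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        conn.execute(
            "INSERT INTO operation VALUES (?, ?)",
            (hospital_number, operation_number),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when the hospital number has no match in the hospital table.
        return False


conn = build_demo_db()
conn.execute("INSERT INTO hospital VALUES (1, 'Example General')")  # hypothetical row
accepted = try_insert_operation(conn, 1, 100)   # hospital 1 exists
rejected = try_insert_operation(conn, 99, 100)  # hospital 99 does not exist
```

Here `accepted` is True and `rejected` is False: the DBMS itself refuses an Operation row whose hospital number has no matching Hospital row, so the application does not have to police the link.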
Drug Expenditure Database Model as Relations (Tables)
• OPERATION (Hospital Number*, Operation Number, Operation Code*, Surgeon Number*)
• SURGEON (Hospital Number*, Surgeon Number, Surgeon Specialty)
• OPERATION TYPE (Operation Code, Operation Name, Procedure Group)
• STANDARD DRUG DOSAGE (Drug Short Name*, Method of Administration, Size of Dose, Unit of Measure, Standard Cost of Dose)
• DRUG (Drug Short Name, Drug Name, Manufacturer)
• HOSPITAL (Hospital Number, Hospital Name, Hospital Category, Contact Person)
• DRUG ADMINISTRATION (Hospital Number*, Operation Number*, Drug Short Name*, Method of Administration*, Size of Dose*, Unit of Measure*, Number of Doses)

Drug Expenditure Database Model as Entity-Relationship Model
[Figure: entity-relationship diagram with boxes Hospital, Operation Type, Surgeon, Operation, Drug, Standard Drug Dosage, and Drug Admin, joined by lines labelled with relationship names such as “be performed at”, “be classified by”, “perform”, “be available in”, “be used in”, and “prescribe”.]

What did we do?
• Each table is a box.
• Each link via a foreign key is shown using a line with some other markings (we’ll get to these).
• Each box has a name that describes what each row in the underlying table is about.
• What does each of these mean?
• This leads us to the higher-level model called the Entity-Relationship Model… it is the architect’s view of the database.
• We begin this next lecture…
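
As a rough illustration of the “each table is a box, each foreign key is a line” idea, the sketch below lists the foreign keys from the relations slide and turns each one into a line between two boxes. It is an assumption-laden sketch, not the diagram itself: the slides mark foreign key columns with an asterisk but do not say which table each one refers to, so the referenced tables here are inferred from matching column names. The names `FOREIGN_KEYS` and `relationship_lines` are hypothetical, and the real diagram may show relationships that this simple derivation does not produce.

```python
# Table name -> list of (foreign key columns, referenced table).
# ASSUMPTION: referenced tables are inferred from matching column names.
FOREIGN_KEYS = {
    "HOSPITAL": [],
    "OPERATION TYPE": [],
    "DRUG": [],
    "SURGEON": [
        (("Hospital Number",), "HOSPITAL"),
    ],
    "OPERATION": [
        (("Hospital Number",), "HOSPITAL"),
        (("Operation Code",), "OPERATION TYPE"),
        (("Hospital Number", "Surgeon Number"), "SURGEON"),
    ],
    "STANDARD DRUG DOSAGE": [
        (("Drug Short Name",), "DRUG"),
    ],
    "DRUG ADMINISTRATION": [
        (("Hospital Number", "Operation Number"), "OPERATION"),
        (
            ("Drug Short Name", "Method of Administration",
             "Size of Dose", "Unit of Measure"),
            "STANDARD DRUG DOSAGE",
        ),
    ],
}


def relationship_lines(foreign_keys):
    """Return one (from_table, to_table, columns) triple per foreign key."""
    lines = []
    for table, keys in foreign_keys.items():
        for columns, referenced in keys:
            lines.append((table, referenced, columns))
    return lines


boxes = sorted(FOREIGN_KEYS)              # one box per table
lines = relationship_lines(FOREIGN_KEYS)  # one line per foreign key

for from_table, to_table, columns in lines:
    print(f"{from_table} --> {to_table}  (via {', '.join(columns)})")
```

Each printed line corresponds to one candidate line in the diagram; the markings on those lines are the subject of the next lecture.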