Information Systems
Database Systems (AH)
6839
Spring 2000

HIGHER STILL

Information Systems
Database Systems
Advanced Higher
Support Materials

CONTENTS
Section 1 Teacher/Lecturer Notes
Section 2 Student Notes
Section 3 Study Materials

Section 1 Teacher/Lecturer Notes

Aim
This unit is designed to further develop students’ knowledge and understanding of data analysis and structuring.

Status of this learning and teaching pack
These materials are for guidance only. The mandatory content of this unit is detailed in the unit specification of the Arrangements document.

Target audience
While entry is at the discretion of the centre, students would normally be expected to have attained one of the following (or possess equivalent experience):
Database Systems (Higher) unit
Information Systems course at Higher level

Progression
This unit builds on skills acquired particularly at Higher level in database work, but emphasises the design of database structures. It links closely with Systems Analysis & Design (AH) and is used in the Advanced Higher Information Systems Project.

Learning and teaching approaches
You should note that this unit continues the new approach to the teaching of database systems begun in the Database Systems unit at Higher level. The emphasis is on the analysis of an existing (non-computerized) system and the design of a corresponding database. Outcome 1 continues the theme of normalization that was already present at Higher level. Outcome 2 introduces the construction of a data model by adding a data dictionary, event lists and entity life histories.

The pack is divided into two sections, one for each outcome. The performance criteria for each outcome are covered in the order stated in the arrangements, except that PC (e) of outcome 1 is relevant to each of PCs (a) to (d).
A student-centred, resource-based learning approach is recommended. To enliven learning, the use of video, audio and multimedia learning aids is recommended.

While the distribution of time between the outcomes will vary, students might be expected to complete each outcome within the following time scale:
Outcome 1   6 hours
Outcome 2   14 hours

Hardware and software resources
This unit has no requirement for the student to have regular access to a computer. However, students may use software to prepare responses for the assessments. Suitable software would include word-processing and graphics packages. It may also be possible to use software provided for the Systems Analysis & Design unit (such as CASE tools).

References
Books
The following books contain material relevant to the topic:
Any book on SSADM, for example, Eva, M. 1994, SSADM Version 4: A User’s Guide, 2nd edition, McGraw-Hill (especially chapters 11, 12)
A short book on databases that can be used by candidates: Rolland, F.D. 1998, The Essence of Databases, Prentice Hall
A standard text on databases written for university level courses, suitable for teachers and lecturers: Elmasri, R. and Navathe, S.B. 2000, Fundamentals of Database Systems, 3rd edition, Addison-Wesley
The British Computer Society’s Glossary of Computing Terms, published by Longman.

Internet
A wide range of articles on the Internet is available by entering terms such as ‘Relational Database Systems’, ‘Normalization’ and ‘Entity Life History’ in a search engine such as http://www.hotbot.com or http://www.excite.com/.

Section 2 Student Notes

This half unit, Database Systems, is a core unit of Information Systems (Advanced Higher).
You will find it easier if you have already studied Database Systems, either at Higher or elsewhere, as you will be continuing to look at normalization in the first outcome. The second outcome contains new material on constructing a fuller data model.

The outcomes of the unit are:
1. Normalize a data source to third normal form.
2. Produce a data model from a normalized data source.

You will find that no computer work is required for this unit. Outcomes 1 and 2 are about analysis and design. Although you are not required to implement a system as part of this unit, you will use the methods of data analysis and design you learn here in the Information Systems Project.

How to tackle this unit
The learning pattern suggested for your use in this unit is as follows:
1. Study the Introduction for each outcome first. You may also find it helpful to read through the Summary of the outcome before getting down to detailed work.
2. All the material is explained in terms of a running example. Work through the material in the order it is written. Every few steps you are advised to try your own example. You will find two example applications given in the section Exercises but your teacher/lecturer may provide other exercises to try. Keep the results of your example for outcome 1 as it will be of use in outcome 2 as well.
3. Review each outcome by using the summaries provided in the study material.
4. The section called Questions contains questions on all the material. Each question states the performance criteria that are most relevant. These questions may be attempted as soon as you have covered the appropriate performance criteria but they are probably best used at the end of each outcome when you have worked through your own example in detail.

Assessment
Assessment will normally follow each outcome, but your teacher or lecturer may choose to assess you at different times.
Possible assessment items are as follows:
Outcome 1   Tasks involving an analysis leading to the production of a third normal form data model.
Outcome 2   Tasks involving the construction of a fuller data model with a data dictionary, event list and entity/event matrix, and entity life histories.

References
At the head of each section you will find a list of key terms introduced in that section. You can look these terms up in the reference material given below. Do not worry if you cannot get access to the books listed: there are many alternatives and you should find most of the terms in some book.

Books
The following books contain material relevant to the topic:
Any book on SSADM will follow a similar approach to this unit, for example, Eva, M. 1994, SSADM Version 4: A User’s Guide, 2nd edition, McGraw-Hill (especially chapters 11, 12)
There are many books on databases. A short book that can be used for reference is Rolland, F.D. 1998, The Essence of Databases, Prentice Hall
The standard glossary is The British Computer Society’s Glossary of Computing Terms, published by Longman.

Internet
A wide range of articles on the Internet is available by entering terms such as ‘Relational Database Systems’, ‘Normalization’ and ‘Entity Life History’ in a search engine such as http://www.hotbot.com or http://www.excite.com/.

Section 3 Study Materials

UNIT INTRODUCTION
This unit is concerned with the analysis of an existing data source such as a set of paper forms or a collection of computer files that are not integrated into a single database. In order to explain the ideas we shall use a running example of a form taken from a hospital application. We will only analyze one such form.
In practice there would be many such forms to analyze, so there are later stages in the analysis where the entities identified will need to be merged. However this process lies outside the scope of this unit.

OUTCOME 1 – NORMALISATION

Key Terms
normalization, normal forms, functional dependency, key, foreign key, repeating item, repeating group, first normal form (1NF), second normal form (2NF), third normal form (3NF), partial dependency, indirect dependency.

Introduction
Normalization is only defined in the context of the relational model of data. The steps we follow here are those given in the performance criteria of the arrangements, but they are not the only route to producing the final entities. However they are especially suited to analyzing an existing data source and are often used as part of SSADM (Structured Systems Analysis and Design Method).

If you look in database textbooks such as Rolland you will find alternative methods based on constructing entity-relationship models and then mapping these models onto the relational model of data. You will find it helpful in any case to use entity-relationship models, as these can often help our understanding of the data being modeled. Note that it is possible to construct an entity-relationship model without any reference to normalization. However, the entities produced in this way should then be represented as relational tables and checked to make sure they are all in third normal form.

Normalisation is important
What happens if we have a data source that is not fully normalized? In short, we are in danger of storing a single fact in more than one place in the database, or even of not storing it at all. If a fact (for example, the address of a supplier) is there more than once we can have problems updating the data: perhaps we will change one occurrence of the fact but not the other(s). This leads to loss of data integrity in the form of inconsistency.
If a fact is not even stored we can never retrieve it or make any use of it. We will look at specific examples a little later on (see steps 6-8, below).

Functional dependence and keys
Normalization is built on the idea of functional dependence. Suppose an entity (a group of data items) contains the items A and B. The data item A is functionally dependent on the other, B, if for any given value of B there can only be one value of A in any of the entity instances (records). We also say that B functionally determines A (shown in many books with an arrow, as in B → A).

Another way of explaining this is to suppose that there are two instances (say, two suppliers). Suppose item B is the supplier’s name and item A is the supplier’s address. Then if the two instances have the same supplier name it follows that the two addresses must be the same if the address is functionally dependent on the name.

The idea of functional dependence can be extended by allowing B to be a set of data items, just as a key may be made up of more than one data item. Indeed we can define what is meant by a key in terms of functional dependencies. A set of data items in an entity is a key for that entity if it functionally determines all the other data items in the entity and if no subset of the key items will do this.

The usual definition of a key follows directly from this. The key of an entity (group of data items) is one or more of its data items having the property that the values of the key items are different in every possible instance of the entity. It follows that the key value identifies each instance uniquely.

Unfortunately it is not sufficient to look at actual data and check that the condition for a functional dependency or a key is satisfied. It may well be satisfied for the current data, but data can change.
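The check on actual data described above can be sketched in Python. This is a minimal illustration only: the function name `functionally_determines` and the supplier names and addresses are hypothetical and do not come from the running example. It shows that the same test can pass for today's data and fail once a new record arrives.

```python
def functionally_determines(rows, determinant, dependent):
    """Return True if, in these rows, every value of `determinant`
    is paired with only one value of `dependent`."""
    seen = {}
    for row in rows:
        lhs = tuple(row[item] for item in determinant)
        rhs = tuple(row[item] for item in dependent)
        # setdefault stores rhs the first time lhs is met, then returns
        # the stored value so that later rows can be compared with it.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


# Hypothetical supplier records (invented for illustration).
suppliers = [
    {"name": "Acme", "address": "1 High Street"},
    {"name": "Brown & Co", "address": "7 Mill Lane"},
    {"name": "Acme", "address": "1 High Street"},
]

before = functionally_determines(suppliers, ["name"], ["address"])
print(before)  # True: the current data happens to satisfy name -> address

# A second address for the same supplier (say, a warehouse) breaks it.
suppliers.append({"name": "Acme", "address": "Unit 4, Dock Road"})
after = functionally_determines(suppliers, ["name"], ["address"])
print(after)   # False
```

A result of True is only evidence about the rows examined; whether the dependency really holds is decided by the rules of the application, not by the data.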
What we need to know are the conditions that the application puts on the data, and these conditions are fixed by the ‘customer’, the specifier of the requirements. In our example of suppliers’ names and addresses it may or may not be true that one supplier can have more than one address. For example, one address might be the head office and another that of a warehouse. In such a case we would say that the supplier’s name does not functionally determine the supplier’s address. It is thus as useful to know which functional dependencies do not exist as to know which do exist. See the further discussion in the section ‘Classification of Constraints’.

One final point. There may be more than one key for the same entity (group of data items). In this case we say that they are candidate keys, one of which is chosen to be the primary key. An example of this situation is given in the section ‘An Alternative Data Source’.

The running example: Hospital Appointments
We will be applying the methods of this unit to one example. It has been designed to include most of the problems that normalization has to deal with. The same example will be used in the second part of the analysis where events and life histories are examined.

A Hospital Trust consists of several hospitals and the appointments for each clinic are kept in a folder. Each sheet in the folder is a form that looks like the following before any data is entered on it.

HOSPITAL CLINIC APPOINTMENTS

DATE OF CLINIC

CONSULTANT  Name                       HOSPITAL  Name
            Phone number                         Phone number

APPOINTMENTS
Time  Patient NHS no.  Patient Name  Patient GP no.  Patient GP name  Patient GP address

A typical form that has been filled in with some data follows. Note that the variable data (such as the hospital’s name) are the values that follow the printed headings.

HOSPITAL CLINIC APPOINTMENTS

DATE OF CLINIC  24/6/1999

CONSULTANT  Name          Mr Rees      HOSPITAL  Name          St George’s
            Phone number  395024                 Phone number  439862

APPOINTMENTS
Time  Patient NHS no.  Patient Name   Patient GP no.  Patient GP name  Patient GP address
1400  82-4561F         Mary Fish      3163            Dr Spencer       Shieldham, SB3 6JK
1415  55-8277H         Steven Howe    4200            Dr Rose          Millington, SB23 5JC
1430  42-8433W         Ming-Toh Wan   3131            Dr Henderson     Merkeley, MR4 8TD
1445  77-5098I         Ali Ibrahim    2090            Dr Rivington     Shieldham, SB3 9RU
1500  62-8231H         Roberta Henry  5298            Dr Rose          Dunbally, MR11 7FA
1515  34-9126P         Louis Panumam  3742            Dr Patel         Shieldham, SB3 6JK
1545  24-2187L         Fabian Lee     3598            Dr Anderson      Bruntsford, MR15 3GA

The following constraints are placed upon the data.
Consultants work for the Trust at each of its hospitals
Patients attend clinics taken by the consultants
One consultant can have only one clinic on a given day
Each patient can have only one clinic appointment on a given day

The Steps of Normalisation
The data contained on all these forms is to be analyzed. The process is summarised in the following table and each step is explained in detail below. The column headed ‘State reached’ shows the state of normalization reached (UNF = Unnormalized Form, 1NF = 1st Normal Form, etc.); the final column shows where the performance criteria are overtaken. Notice that steps 3, 5, 7 and 8 all contribute to PC 1e.
Step  Task                                                   State reached  PC overtaken
1     Identify all the data items
2     Identify any repeating groups
3     Identify the key(s)                                    UNF            1a, 1e
4     Remove any repeating groups into separate entities
5     Identify keys, add foreign key(s) to represent
      the relationships                                      1NF            1b, 1e
6     Identify any partial dependencies on keys
7     Remove partially dependent items to separate
      entities, identify keys, add foreign keys              2NF            1c, 1e
8     Identify any indirect dependencies on keys
9     Remove indirectly dependent items to separate
      entities, identify keys, add foreign keys              3NF            1d, 1e

If you have done the Database Systems unit at Higher, you should be familiar with steps 1 to 5, though you may have carried out the process in a different way.

Step 1
The first step is to extract the various data items that appear on the form. It is often helpful to construct some additional data to increase understanding of the application. This data should satisfy the conditions or constraints expressed in the specification. For example, we should not have the same consultant taking two different clinics on the same day. (You should make up some new data now.)

We name the group as a whole and each of the data items. These names may be a little different from the names used on the printed form. For example, the form has two items called ‘Name’, one for the consultant and one for the hospital. In order to prevent confusion we shall expand these to ‘Consultant Name’ and ‘Hospital Name’, respectively. At this early stage in the analysis it is better to use long names and not use abbreviations. Later, when the data dictionary is being constructed, shortened names can be used.
Entity              Data Items
Clinic Appointment  Clinic Date
                    Consultant Name
                    Consultant Phone Number
                    Hospital Name
                    Hospital Phone Number
                    Appointment Time
                    Patient NHS Number
                    Patient Name
                    Patient GP Number
                    Patient GP Name
                    Patient GP Address

Step 2
In step 1 we put all the data items into one entity group, but we can see that all the items from ‘Appointment Time’ onwards have more than one value in the example data. This means that we have a repeating group of items, so we restate the entity group as

Entity              Data Items
Clinic Appointment  Clinic Date
                    Consultant Name
                    Consultant Phone Number
                    Hospital Name
                    Hospital Phone Number
                    Repeating group
                      Appointment Time
                      Patient NHS Number
                      Patient Name
                      Patient GP Number
                      Patient GP Name
                      Patient GP Address

The following table shows the data with one column for each data item; the values of the repeating group are listed beneath each record. Some additional data has been added for another clinic (from another copy of the form). Note that there are only two records shown in this table. One record has a repeating group of seven repeats and the other one of two repeats.

Clinic Date  Consultant Name  Consultant Phone  Hospital Name  Hospital Phone
24/06/99     Mr Rees          395024            St George’s    439862

    Time  Patient NHS no.  Patient Name   Patient GP no.  Patient GP name  Patient GP address
    1400  82-4561F         Mary Fish      3163            Dr Spencer       Shieldham, SB3 6JK
    1415  55-8277H         Steven Howe    4200            Dr Rose          Millington, SB23 5JC
    1430  42-8433W         Ming-Toh Wan   3131            Dr Henderson     Merkeley, MR4 8TD
    1445  77-5098I         Ali Ibrahim    2090            Dr Rivington     Shieldham, SB3 9RU
    1500  62-8231H         Roberta Henry  5298            Dr Rose          Dunbally, MR11 7FA
    1515  34-9126P         Louis Panumam  3742            Dr Patel         Shieldham, SB3 6JK
    1545  24-2187L         Fabian Lee     3598            Dr Anderson      Bruntsford, MR15 3GA

24/06/99     Mr Sale          396118            St George’s    439862

    Time  Patient NHS no.  Patient Name   Patient GP no.  Patient GP name  Patient GP address
    1400  72-3361K         Mandy King     3131            Dr Henderson     Merkeley, MR4 8TD
    1415  65-4922D         Richard Dell   4326            Dr Mitchell      Abbeyton, MR14 4GF

Step 3
The key for this group of data items is not easy to establish. Obviously Clinic Date on its own is not enough (there can be many clinics on one day). Similarly the Consultant Name alone is insufficient. However the combination of Clinic Date and Consultant Name should be unique. We can show this key as (Clinic Date, Consultant Name).

If the analyst were unsure of this fact from the stated requirements, the question to ask in clarification would be: can the same consultant have more than one clinic on the same day? An answer of Yes would require some change to the choice of key. One possibility is to add an artificial key, say a number that is guaranteed different for each clinic. For the purposes of our running example we shall assume an answer of No.

The key data items are marked with an asterisk (*):

Entity              Data Items
Clinic Appointment  Clinic Date *
                    Consultant Name *
                    Consultant Phone Number
                    Hospital Name
                    Hospital Phone Number
                    Repeating group
                      Appointment Time
                      Patient NHS Number
                      Patient Name
                      Patient GP Number
                      Patient GP Name
                      Patient GP Address

We have now reached the unnormalized form (UNF) of the data source. (This is a good time to try steps 1-3 for one of the exercises.)

Step 4
Normalization is a process whereby the data items are grouped into entities that satisfy certain conditions (or constraints). The first of these is a fundamental requirement of the relational model of data. First Normal Form (1NF) requires that each data item be a single value. For example, a data item value must not be a repeating set of values or an address of other data. In particular, this means that repeating groups are not allowed within an entity.
We can remove repeating groups in two ways. We can take out the repeating items into a separate entity, or we can duplicate the values of the non-repeating data items in each record. The first method has the advantage of advancing the normalization process. This is because the data items of the repeating group depend only on their own key and not on the whole key. In the example, the data items of the repeating group have a key Appointment Time but the whole entity (using duplication of values) has a key of (Clinic Date, Consultant Name, Appointment Time). We shall see later that the partial dependence on the full key means that any entity containing a repeating group cannot be in Second Normal Form (2NF). PC 1b requires that we use the first method and remove the repeating group to another entity.

Entity                      Data Items (* = key item)
Consultant Session          Clinic Date *
                            Consultant Name *
                            Consultant Phone Number
                            Hospital
                            Hospital Phone Number
Appointment                 Appointment Time *
(taken from the             Patient NHS Number
repeating group)            Patient Name
                            Patient GP Number
                            Patient GP Name
                            Patient GP Address

See Fig. er1 for the ER diagram for this stage in the process.

Figure er1. [ER diagram: Appointment (N) Is For (1) Consultant Session. Attributes of Appointment: Time, Patient NHS No, Patient Name, Patient GP No, Patient GP Name, Patient GP Address. Attributes of Consultant Session: Clinic Date, Consultant Name, Consultant Phone No, Hospital, Hospital Phone No.]

Step 5
These two groups have a many-to-one relationship between them: one Consultant Session can have many Appointments, but each Appointment is for a single Consultant Session. In the relational model a many-to-one relationship is represented by putting the key of the ‘one’ end into the ‘many’ end as a foreign key. In the example the key of Consultant Session is (Clinic Date, Consultant Name) so both these items need to be added to the Appointment entity to put it into First Normal Form (1NF). Of course they remain in the Consultant Session entity as well.
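Steps 4 and 5 can be sketched in Python as follows. This is only an illustration of the idea, not part of the method itself: the function name `split_repeating_group` and the dictionary layout (with the repeating group held under an assumed item called "Appointments") are hypothetical, and only two of the seven appointments are included to keep the example short.

```python
def split_repeating_group(record, key_items, group_name):
    """Split one unnormalized record into a parent row and child rows.

    Each child row receives a copy of the parent's key items, which
    act as the foreign key back to the parent (step 5).
    """
    parent = {k: v for k, v in record.items() if k != group_name}
    children = []
    for repeat in record[group_name]:
        child = {k: parent[k] for k in key_items}  # foreign key items
        child.update(repeat)                        # the repeating items
        children.append(child)
    return parent, children


# One filled-in form, with the repeating group held as a nested list.
clinic = {
    "Clinic Date": "24/6/1999",
    "Consultant Name": "Mr Rees",
    "Consultant Phone Number": "395024",
    "Hospital Name": "St George's",
    "Hospital Phone Number": "439862",
    "Appointments": [
        {"Appointment Time": "1400", "Patient NHS Number": "82-4561F"},
        {"Appointment Time": "1415", "Patient NHS Number": "55-8277H"},
    ],
}

session, appointments = split_repeating_group(
    clinic, ["Clinic Date", "Consultant Name"], "Appointments"
)
for row in appointments:
    print(row)
```

Every row of `appointments` now holds single values only, and carries (Clinic Date, Consultant Name) so that it can be matched to its Consultant Session.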
Entity                      Data Items (* = key item)
Consultant Session          Clinic Date *
                            Consultant Name *
                            Consultant Phone Number
                            Hospital Name
                            Hospital Phone Number
Appointment                 Clinic Date *
(with foreign keys added)   Consultant Name *
                            Appointment Time *
                            Patient NHS Number
                            Patient Name
                            Patient GP Number
                            Patient GP Name
                            Patient GP Address

The key of Appointment includes the foreign key items but this is not true for all foreign keys. See, for example, the case of Hospital in step 9, below.

There is no change of ER diagram at this point because ER models do not show foreign keys. This is because ER models are not specific to relational databases but can be used for other database models such as the network model or the object-oriented model. Foreign keys are what is used in the relational model to represent relationships.

Our running example does not have a many-to-many relationship at this stage but it is important to deal with these correctly. See the Additional Note on Many-to-Many Relationships that follows step 9. (You should now try steps 4, 5 of your exercise.)

Step 6
Second Normal Form (2NF) requires that every data item not in the key is fully dependent on the key. To put it another way: an entity in 2NF cannot contain any partial dependencies. A data item is partially dependent on the key if it is functionally determined by part of the key. Obviously we cannot have part of a single-item key, so there can be no partial dependencies if the key is a single item.

We can now examine each entity in turn to see if it is in 2NF. We will look first at some example data for Consultant Session. Note that this is the only entity containing the items Consultant Phone Number and Hospital Phone Number. We will see that there are some problems.
Clinic Date  Consultant Name  Consultant Phone Number  Hospital Name  Hospital Phone Number
24/6/1999    Mr Rees          395024                   St George’s    439862
24/6/1999    Mr Sale          396118                   St George’s    439862
1/7/1999     Mr Robins        385741                   Central        290435
1/7/1999     Mr Sale          396118                   St George’s    439862

(1) There is a problem with updating. For example, if Mr Sale changes his phone number to 398765 we must update every place this occurs.
(2) There is a problem with losing information through deletion. Suppose Mr Rees’s clinic is cancelled. We want to delete that row from the table, but that would remove the only place where Mr Rees’s phone number is stored.
(3) There is a problem with adding information. If we wish to add the fact that the consultant Mr Hope has phone number 443388 we need to add another row, but we have no value for one of the key items (Clinic Date) for this new row. We cannot leave it blank since every row must have a unique key to identify that row.

These three problems (or anomalies) are all aspects of the same difficulty: not having a fact stored exactly once. They are caused by the partial dependency of Consultant Phone Number on the key. Consultant Session has key (Clinic Date, Consultant Name). The item Consultant Phone Number is dependent on Consultant Name, but not on Clinic Date, so it is partially dependent on the key. Every entity not in 2NF will have the same three problems. (There are also problems to do with Hospital but they have another cause, see later under step 8.)

Step 7
We must separate out a new entity containing Consultant Name and Consultant Phone Number, which we can call Consultant. There is a many-to-one relationship between the entities Consultant Session and Consultant. This relationship is represented in the relational model by a foreign key. In the example we leave the key of Consultant (i.e. Consultant Name) in the entity Consultant Session as a foreign key.
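The separation in step 7 can be sketched in Python as two projections of the same rows, using the example data for Consultant Session. The helper name `project` is an assumption made for this illustration; it keeps the chosen items and drops duplicate rows, so that each consultant's phone number ends up stored once.

```python
def project(rows, items):
    """Keep only `items` from each row and drop duplicate rows,
    preserving the order in which rows first appear."""
    result = []
    for row in rows:
        projected = {item: row[item] for item in items}
        if projected not in result:
            result.append(projected)
    return result


# Consultant Session in 1NF (example data from step 6).
sessions_1nf = [
    {"Clinic Date": "24/6/1999", "Consultant Name": "Mr Rees",
     "Consultant Phone Number": "395024",
     "Hospital Name": "St George's", "Hospital Phone Number": "439862"},
    {"Clinic Date": "24/6/1999", "Consultant Name": "Mr Sale",
     "Consultant Phone Number": "396118",
     "Hospital Name": "St George's", "Hospital Phone Number": "439862"},
    {"Clinic Date": "1/7/1999", "Consultant Name": "Mr Robins",
     "Consultant Phone Number": "385741",
     "Hospital Name": "Central", "Hospital Phone Number": "290435"},
    {"Clinic Date": "1/7/1999", "Consultant Name": "Mr Sale",
     "Consultant Phone Number": "396118",
     "Hospital Name": "St George's", "Hospital Phone Number": "439862"},
]

# New entity Consultant: one row per consultant.
consultant = project(sessions_1nf,
                     ["Consultant Name", "Consultant Phone Number"])

# Consultant Session keeps Consultant Name as a foreign key.
consultant_session = project(
    sessions_1nf,
    ["Clinic Date", "Consultant Name", "Hospital Name", "Hospital Phone Number"],
)

# Updating Mr Sale's phone number now touches exactly one row.
changed = 0
for row in consultant:
    if row["Consultant Name"] == "Mr Sale":
        row["Consultant Phone Number"] = "398765"
        changed += 1
print(len(consultant), len(consultant_session), changed)  # 3 4 1
```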
Entity              Data Items (* = key item)
Consultant Session  Clinic Date *
                    Consultant Name *
                    Hospital Name
                    Hospital Phone Number
Consultant          Consultant Name *
                    Consultant Phone Number

The remaining data items in Consultant Session (Hospital Name and Hospital Phone Number) depend on the full key, so Consultant Session is in 2NF. Since the key of Consultant is a single item, Consultant Name, there cannot be any partial dependencies, so Consultant also is in 2NF.

We can now examine the problems to see that they have been dealt with. If we put the example data into the two tables we get

Clinic Date  Consultant Name  Hospital Name  Hospital Phone
24/6/1999    Mr Rees          St George’s    439862
24/6/1999    Mr Sale          St George’s    439862
1/7/1999     Mr Robins        Central        290435
1/7/1999     Mr Sale          St George’s    439862

Consultant Name  Consultant Phone
Mr Rees          395024
Mr Sale          396118
Mr Robins        385741

Problem (1). Updating Mr Sale’s phone number now involves changing just one data value.
Problem (2). Deleting Mr Rees’s clinic on 24/6/1999 involves removing just one row from Consultant Session and we still have Mr Rees’s phone number in the Consultant table.
Problem (3). Adding a new consultant means adding a row to the Consultant table. There is now a unique key value (the name of the consultant).

After these three changes the data would be

Clinic Date  Consultant Name  Hospital Name  Hospital Phone
24/6/1999    Mr Sale          St George’s    439862
1/7/1999     Mr Robins        Central        290435
1/7/1999     Mr Sale          St George’s    439862

Consultant Name  Consultant Phone
Mr Rees          395024
Mr Sale          398765
Mr Robins        385741
Mr Hope          443388

We now repeat steps 6 and 7 for the other entity, Appointment, which has key (Clinic Date, Consultant Name, Appointment Time). In this case all of the non-key items depend on the whole key. For example, we cannot know the Patient GP Number unless we know the Patient, and we cannot know the NHS Number until all of the key is known. So Appointment is in 2NF.
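The 2NF test of steps 6 and 7 can be sketched as a check on dependencies stated by the analyst. In this minimal Python illustration the function name `partial_dependencies` and the dictionary layout are assumptions; each non-key item is mapped to the set of items taken to determine it, following the discussion of the 1NF entities Consultant Session and Appointment above.

```python
def partial_dependencies(key, determinants):
    """Return the non-key items that are determined by only part of the key.

    `determinants` maps each non-key item to the set of items that
    functionally determines it, as stated by the analyst.
    """
    key = set(key)
    return sorted(
        item
        for item, determinant in determinants.items()
        if set(determinant) < key  # a proper subset of the key
    )


# Consultant Session as it stood in 1NF (before step 7).
session_key = {"Clinic Date", "Consultant Name"}
session_determinants = {
    "Consultant Phone Number": {"Consultant Name"},
    "Hospital Name": {"Clinic Date", "Consultant Name"},
    # Determined by a non-key item: not a partial dependency.
    "Hospital Phone Number": {"Hospital Name"},
}

appointment_key = {"Clinic Date", "Consultant Name", "Appointment Time"}
appointment_determinants = {
    "Patient NHS Number": {"Clinic Date", "Consultant Name", "Appointment Time"},
    "Patient Name": {"Patient NHS Number"},
    "Patient GP Number": {"Patient NHS Number"},
}

session_partial = partial_dependencies(session_key, session_determinants)
appointment_partial = partial_dependencies(appointment_key, appointment_determinants)
print(session_partial)      # ['Consultant Phone Number']
print(appointment_partial)  # []
```

The check is only as good as the dependencies supplied to it; deciding what determines what remains a matter of understanding the application.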
We can show the current position:

Entity              Data Items (* = key item)
Consultant Session  Clinic Date *
                    Consultant Name *
                    Hospital
                    Hospital Phone Number
Consultant          Consultant Name *
                    Consultant Phone Number
Appointment         Clinic Date *
                    Consultant Name *
                    Appointment Time *
                    Patient NHS Number
                    Patient Name
                    Patient GP Number
                    Patient GP Name
                    Patient GP Address

See Fig. er2 for the ER diagram at this point. Note again that no foreign keys are needed on the ER diagram.

Figure er2. The Consultant Session entity is properly termed a ‘weak entity’ and the Clinic Date item is only a partial key. The full key is the combination of Clinic Date and Consultant Name (the key of Consultant). The weak entity is shown with a double-lined box.
[ER diagram: Appointment (N) Is For (1) Consultant Session; Consultant (1) Takes (N) Consultant Session. Attributes of Appointment: Time, Patient NHS No, Patient Name, Patient GP No, Patient GP Name, Patient GP Address. Attributes of Consultant Session: Clinic Date, Hospital, Hospital Phone No. Attributes of Consultant: Consultant Name, Consultant Phone No.]

(Try steps 6, 7 for your own exercise.)

Step 8
Third Normal Form (3NF) requires that every non-key item depends directly on the whole key. We look first at Consultant Session and again examine some example data.

Clinic Date  Consultant Name  Hospital Name  Hospital Phone
24/6/1999    Mr Rees          St George’s    439862
24/6/1999    Mr Sale          St George’s    439862
1/7/1999     Mr Robins        Central        290435
1/7/1999     Mr Sale          St George’s    439862

Unfortunately the same kinds of problem are still present.

(1) There is a problem with updating. For example, if St George’s Hospital changes its phone number we must update every place this occurs.
(2) There is a problem with losing information through deletion. Suppose Mr Robins’s clinic is cancelled. We want to delete that row from the table, but that would remove the only place where Central Hospital’s phone number is stored.
(3) There is a problem with adding information.
If we wish to add the fact that Heath Hospital has phone number 357380 we need to add another row, but we have no value for either of the key items (Clinic Date, Consultant Name) for this new row. We cannot leave them blank since every row must have a unique key to identify that row.

These problems are not caused by any partial dependence of items on the key but by indirect dependence on the key. Hospital Phone Number depends on Hospital, which itself depends on the key, so this entity is not in 3NF.

Step 9
We must separate out an entity containing Hospital and Hospital Phone Number, calling it, say, Hospital. There is a many-to-one relationship between Consultant Session and Hospital and this relationship is represented by keeping the item Hospital (the key of the new entity Hospital) in the Consultant Session as a foreign key. The two entities are now

Entity              Data Items (* = key item)
Consultant Session  Clinic Date *
                    Consultant Name *
                    Hospital
Hospital            Hospital *
                    Hospital Phone Number

We now repeat steps 8 and 9 for the other entities. First we examine the entity Consultant. This has only one non-key item and so there is no possibility of dependence on a non-key item. Consultant is in 3NF.

Finally we examine Appointment. All four items Patient Name, Patient GP Number, Patient GP Name, Patient GP Address are dependent on the non-key item Patient NHS Number so the entity is not in 3NF. We can remove these items to another entity, calling it Patient. It will have the key Patient NHS Number and this item is also left in Appointment as a foreign key to represent the many-to-one relationship between Appointment and Patient. The two entities are shown in the following table.
Entity       Data Items (* = key item)
Appointment  Clinic Date *
             Consultant Name *
             Appointment Time *
             Patient NHS Number
Patient      Patient NHS Number *
             Patient Name
             Patient GP Number
             Patient GP Name
             Patient GP Address

If we examine the new entity Patient we see that there are still items that do not depend directly on the key. Patient GP Name and Patient GP Address depend on Patient GP Number and not directly on Patient NHS Number. This means that Patient is not yet in 3NF. We remove to a new entity the items Patient GP Number, Patient GP Name, Patient GP Address, calling it simply GP. It will have as its key Patient GP Number. The many-to-one relationship between Patient and GP is represented by the foreign key Patient GP Number. It is convenient to rename the items in GP so that they no longer carry the prefix Patient but we keep the old name in the foreign key to explain its role. The entities are now

Entity   Data Items (* = key item)
Patient  Patient NHS Number *
         Patient Name
         Patient GP Number
GP       GP Number *
         GP Name
         GP Address

All the entities are now in 3NF and we can display the full set of entities

Entity              Data Items (* = key item)
Consultant Session  Clinic Date *
                    Consultant Name *
                    Hospital Name
Hospital            Hospital Name *
                    Hospital Phone Number
Consultant          Consultant Name *
                    Consultant Phone Number
Appointment         Clinic Date *
                    Consultant Name *
                    Appointment Time *
                    Patient NHS Number
Patient             Patient NHS Number *
                    Patient Name
                    Patient GP Number
GP                  GP Number *
                    GP Name
                    GP Address

See Fig. er3 for the final ER diagram with the six entities.

Figure er3. [ER diagram: Hospital (1) Holds (N) Consultant Session; Consultant (1) Takes (N) Consultant Session; Appointment (N) Is For (1) Consultant Session; Patient (1) Attends (N) Appointment; GP (1) has (N) Patient. Attributes shown: Hospital Name, Hospital Phone No; Consultant Name, Consultant Phone No; Clinic Date; Time; Patient NHS No, Patient Name; GP No, GP Name, GP Address.]

Another entity is weak, Appointment. It gets its key from its own partial key and from the key of Consultant Session.
Since Consultant Session is itself weak, the key of Appointment has three items in all. Note once again that no foreign keys are shown on an ER diagram.

(Now try steps 8, 9 for your own exercise.)

Additional Note on Many-to-Many Relationships

When we form a new entity by taking out a group of items there will be relationships between the resulting entities. Usually these will be many-to-one relationships (or occasionally one-to-one). These relationships are represented in the relational model by foreign keys. However, sometimes a relationship will be many-to-many. In order to represent a many-to-many relationship in the relational model we must introduce an additional linking entity. This new entity will contain a foreign key for each of the originally related entities. Every many-to-many relationship can be decomposed into two many-to-one relationships in this way. This is shown in a general way using ER diagrams in Fig. er3A. Note that any attributes of the many-to-many relationship become attributes of the new entity. This will happen in the simple example we now look at.

Figure ER3A. Decomposition of a many-to-many relationship into two many-to-one relationships.

(a) The original many-to-many relationship. [Diagram: Entity A (Key of A, Attribute of A) and Entity B (Key of B, Attribute of B) joined by the M:N relationship R, which has Attribute of R.]

(b) The same model with the decomposition into two many-to-one relationships. [Diagram: Entity A 1:N Link Entity through relationship RA, and Entity B 1:N Link Entity through relationship RB; Attribute of R now belongs to Link Entity.]

Suppose that students may enrol on courses. If there is one form for each student we might show the data items as in the following table.

Entity: Student
Data Items: Student Number, Student Name, Student Date of Birth
Repeating group: Course Number, Course Title, Course Credits, Date Enrolled, Grade Awarded

(Try making up some data for this example.)
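One possible set of made-up data, and the general decomposition of Fig. ER3A applied to it, can be sketched in Python. This is only an illustration: the student and course values are invented, and the function name `decompose` and the dictionary field names are assumptions of the sketch rather than anything defined in this unit.

```python
# Made-up forms: one per student, each holding a repeating group of courses.
forms = [
    {"student_number": "S1", "student_name": "Ann", "courses": [
        {"course_number": "C1", "course_title": "Databases",
         "date_enrolled": "1/9/1999", "grade_awarded": "A"},
        {"course_number": "C2", "course_title": "Networks",
         "date_enrolled": "1/9/1999", "grade_awarded": "B"},
    ]},
    {"student_number": "S2", "student_name": "Bob", "courses": [
        {"course_number": "C1", "course_title": "Databases",
         "date_enrolled": "2/9/1999", "grade_awarded": "C"},
    ]},
]


def decompose(forms):
    """Split the forms into two entities and a link entity.

    The link entity holds one foreign key for each related entity plus
    the attributes that belong to the relationship itself.
    """
    students, courses, links = {}, {}, []
    for form in forms:
        students[form["student_number"]] = {"student_name": form["student_name"]}
        for item in form["courses"]:
            courses[item["course_number"]] = {"course_title": item["course_title"]}
            links.append({
                "student_number": form["student_number"],  # foreign key to students
                "course_number": item["course_number"],    # foreign key to courses
                "date_enrolled": item["date_enrolled"],    # attribute of the relationship
                "grade_awarded": item["grade_awarded"],    # attribute of the relationship
            })
    return students, courses, links


students, courses, links = decompose(forms)
```

Each course title is now stored once however many students take the course, while the date and grade sit in the link rows because they describe one student on one course.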
We separate the repeating group into a separate entity, Course.

Entity | Data Items
Student | Student Number, Student Name, Student Date of Birth
Course | Course Number, Course Title, Course Credits, Date Enrolled, Grade Awarded

(What happens to your example data at this point?)

These two entities have a many-to-many relationship (a student can do many courses, a course can have many students taking it). We introduce a new entity, Enrolment, that is in a many-to-one relationship with each of Student and Course. To represent these relationships, we need to add the keys of these two entities as foreign keys in Enrolment. It is also clear that the data items Date Enrolled and Grade Awarded belong to this new entity. (That is why it was difficult to assign the example data to the two entities Student and Course.)

We now need to determine the key of Enrolment. If we assume that a student cannot take a course more than once, Date Enrolled and Grade Awarded are functionally determined by the combination (Student Number, Course Number). The key of Enrolment is the combination (Student Number, Course Number).

Entity | Data Items
Student | Student Number, Student Name, Student Date of Birth
Course | Course Number, Course Title, Course Credits
Enrolment | Student Number (key), Course Number (key), Date Enrolled, Grade Awarded

If a student can take a course more than once, Date Enrolled is not functionally determined by (Student Number, Course Number). We need to add Date Enrolled to the key.

Entity | Data Items
Student | Student Number, Student Name, Student Date of Birth
Course | Course Number, Course Title, Course Credits
Enrolment | Student Number (key), Course Number (key), Date Enrolled (key), Grade Awarded

The ER diagrams are shown in Fig. er3B. The new entity is a weak entity and its partial key is shown as Date Enrolled (assuming here that a course can be taken more than once).

Figure er3B The student course example.
(a) ER diagram with a many-to-many relationship. [Diagram: entities Student (Student Number, Student Name, Student Date of Birth) and Course (Course Number, Course Title, Course Credits) joined by the M:N relationship Enrol, which has the attributes Date Enrolled and Grade Awarded.]

(b) ER diagram showing decomposition into two many-to-one relationships. The link entity is a weak entity and has Date Enrolled as a partial key. [Diagram: Student 1:N Enrolment through the relationship 'makes', and Course 1:N Enrolment through the relationship 'has'; Enrolment has the attributes Date Enrolled and Grade Awarded.]

Types of User

Nearly every information system will have more than one kind of user. Different users will need to see different selections of the information, or at least want to have it arranged in different ways. Some users will be restricted to read-only access, others will be allowed to modify certain items, still others will be allowed to add or delete instances, and perhaps others again will have unrestricted access.

In the hospital appointment example we can think of at least three kinds of user: the hospital administrators, the consultants and the patients. Each type of user will look at different parts of the data and will want it arranged in a way that suits them. The forms we have been using were obviously designed for use by administrators staffing the clinics. In the next section we look at how the data might be structured to suit the patients.

(You should list the types of users for the exercise that you are following through.)

For the second outcome of this unit you will be listing the events that cause processing of the data. It is easy to overlook some of these events and so produce an incomplete model. One way to help find all the events is to take the types of user in turn and list events from their point of view. Duplicates will need to be removed, of course, but you are much more likely to have found all the events.
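The idea of listing events user by user and then removing duplicates can be sketched in a few lines of Python. The allocation of events to the three kinds of user below is purely an assumption made for illustration, and `merge_events` is a hypothetical helper name; a real analysis would take its lists from interviews with each kind of user.

```python
# Illustrative (assumed) event lists, one per type of user.
events_by_user = {
    "administrator": ["Add clinic", "Make appointment",
                      "Cancel appointment", "Cancel clinic"],
    "consultant": ["Cancel clinic", "Report of appointments by clinic"],
    "patient": ["Make appointment", "Change time of appointment",
                "Cancel appointment"],
}


def merge_events(events_by_user):
    """Return each distinct event once, with the users who mentioned it.

    Events keep the order in which they were first found.
    """
    merged = {}
    for user, events in events_by_user.items():
        for event in events:
            merged.setdefault(event, []).append(user)
    return merged


merged = merge_events(events_by_user)
for event, users in merged.items():
    print(event, "-", ", ".join(users))
```

Keeping the list of users beside each event also records who should be consulted when the details of that event are worked out.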
An Alternative Data Source

Although we have completed the normalization of our hospital clinic data source it is instructive to look at it again. The same data that is written on the Hospital Clinic Appointments forms can be structured quite differently if we take the point of view of the patient. In this case there will be a form filled in for each patient containing the details of each appointment.

There is another reason for looking at the same problem a second time. We will find that reaching 3NF can be quite difficult when there is a choice of keys for an entity. In our previous study of the problem we were able to identify keys quite easily but we did in fact overlook one possibility.

PATIENT HOSPITAL CLINIC APPOINTMENTS

PATIENT
NHS no.: 82-4561F
Name: Mary Fish

PATIENT'S GP
GP no.: 3163
GP name: Dr Spencer
GP address: Shieldham, SB3 6JK

APPOINTMENTS
Clinic Date | Appointment Time | Hospital | Hospital Phone no. | Consultant Name | Consultant Phone no.
24/6/1999 | 1400 | St George's | 439862 | Mr Rees | 395024
15/8/1999 | 1530 | St George's | 439862 | Mr Salmon | 309651

There is no new data on these forms. What is different is that the data is structured differently and so different values will need to be repeated on the forms. For example, the Hospital phone number is duplicated in the two appointments shown.

If we apply steps 1 to 3 to this new form of the data we arrive at the UNF shown below.

Entity: Patient Appointment
Data Items: Patient NHS Number, Patient Name, Patient GP Number, Patient GP Name, Patient GP Address
Repeating group: Clinic Date, Appointment Time, Hospital Name, Hospital Phone Number, Consultant Name, Consultant Phone Number

We can now proceed to steps 4 and 5 and remove the repeating group into an entity 'Appointment', adding Patient NHS Number as a foreign key in Appointment, and renaming the group with key Patient NHS Number as simply Patient.
Entity | Data Items
Patient | Patient NHS Number, Patient Name, Patient GP Number, Patient GP Name, Patient GP Address
Appointment | Clinic Date, Patient NHS Number, Appointment Time, Hospital Name, Hospital Phone Number, Consultant Name, Consultant Phone Number

For steps 6 and 7 we find that the Patient entity is already in 2NF since the key consists of a single item. The Appointment entity is also in 2NF since neither Clinic Date nor Patient NHS Number determines any of the other items uniquely. For example, on a given date more than one Hospital may have clinics; a given patient may see more than one Consultant (on different dates).

Moving on to 3NF with steps 8 and 9 we look for any dependencies on the keys that are not direct. In the entity Patient we have Patient GP Name and Patient GP Address dependent on Patient GP Number and so not directly dependent on the key. We separate out a new entity, GP (as we did previously).

Entity | Data Items
Patient | Patient NHS Number, Patient Name, Patient GP Number
GP | GP Number, GP Name, GP Address

In the entity Appointment we have Hospital Phone Number dependent on Hospital Name and so only indirectly dependent on the key. Similarly, Consultant Phone Number is dependent on Consultant Name and only indirectly on the key. We must therefore create two new entities, Hospital and Consultant, as shown below. Hospital Name and Consultant Name are retained in Appointment as foreign keys to represent the many-to-one relationships.

Entity | Data Items
Hospital | Hospital Name, Hospital Phone Number
Consultant | Consultant Name, Consultant Phone Number
Appointment | Clinic Date, Patient NHS Number, Appointment Time, Hospital Name, Consultant Name

It would appear that all these entities are in 3NF but there is a difficulty with Appointment. There is in fact another key, namely (Clinic Date, Consultant Name, Appointment Time). We could easily miss this other candidate key.
When we looked at the problem the first time the 'obvious' key was just this one. Now the 'obvious' key is (Clinic Date, Patient NHS Number). 3NF in its strict form requires that there is no indirect dependence of any item on any key (primary or candidate). But Hospital Name is uniquely determined by the candidate key (actually by a part of it only, i.e. by Clinic Date, Consultant Name), so we must add another entity containing this dependence and remove Hospital Name from Appointment. We can show typical data before the separation.

APPOINTMENT
Clinic Date | Patient NHS Number | Appointment Time | Hospital Name | Consultant Name
24/6/1999 | 82-4561F | 1400 | St George's | Mr Rees
15/8/1999 | 82-4561F | 1530 | St George's | Mr Salmon
15/8/1999 | 73-4485G | 1545 | St George's | Mr Salmon
17/8/1999 | 55-7108D | 1400 | Central | Mr Salmon

The same kinds of difficulty as before are still present.

(1) There is a problem with updating. For example, if Mr Salmon's clinic on 15/8/1999 changes its venue to Central Hospital we must update every place this occurs.

(2) There is a problem with losing information through deletion. Suppose Mr Salmon's 1400 hrs appointment on 17/8/1999 is cancelled. We want to delete that row from the table, but that would remove the only place where the fact that Mr Salmon has a clinic at Central Hospital on that day is stored.

(3) There is a problem with adding information. If we wish to add the fact that Mr Rees has a clinic at Heath Hospital on 22/8/1999 we need to add another row, but we have no value for the item Patient NHS Number for this new row, and Patient NHS Number is part of the key so cannot be left blank.
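The dependency and the two candidate keys discussed above can be checked against the sample rows with a short Python sketch. The helper names `is_consistent_with_fd` and `is_unique` and the lower-case field names are assumptions made for this illustration. Note that a check like this can only show that the sample data are consistent with a dependency, or provide a counter-example; it cannot prove that the dependency holds in the real system.

```python
# Sample rows copied from the APPOINTMENT table above.
APPOINTMENTS = [
    {"clinic_date": "24/6/1999", "patient_nhs_number": "82-4561F",
     "appointment_time": "1400", "hospital_name": "St George's",
     "consultant_name": "Mr Rees"},
    {"clinic_date": "15/8/1999", "patient_nhs_number": "82-4561F",
     "appointment_time": "1530", "hospital_name": "St George's",
     "consultant_name": "Mr Salmon"},
    {"clinic_date": "15/8/1999", "patient_nhs_number": "73-4485G",
     "appointment_time": "1545", "hospital_name": "St George's",
     "consultant_name": "Mr Salmon"},
    {"clinic_date": "17/8/1999", "patient_nhs_number": "55-7108D",
     "appointment_time": "1400", "hospital_name": "Central",
     "consultant_name": "Mr Salmon"},
]


def is_consistent_with_fd(rows, determining, determined):
    """True if no two rows agree on `determining` but differ on `determined`."""
    seen = {}
    for row in rows:
        left = tuple(row[name] for name in determining)
        right = tuple(row[name] for name in determined)
        if seen.setdefault(left, right) != right:
            return False  # counter-example found
    return True


def is_unique(rows, items):
    """True if the combined values of `items` differ for every row."""
    values = [tuple(row[name] for name in items) for row in rows]
    return len(set(values)) == len(values)


# (Clinic Date, Consultant Name) -> Hospital Name is consistent with the sample.
print(is_consistent_with_fd(APPOINTMENTS,
                            ["clinic_date", "consultant_name"],
                            ["hospital_name"]))
# Both candidate keys identify every sample row uniquely.
print(is_unique(APPOINTMENTS, ["clinic_date", "patient_nhs_number"]))
print(is_unique(APPOINTMENTS,
                ["clinic_date", "consultant_name", "appointment_time"]))
```

Because (Clinic Date, Consultant Name) repeats in the sample while still determining Hospital Name, the hospital is stored more than once for the same clinic, which is the source of the three difficulties listed above.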
After separating out Hospital Name into the new entity Consultant Session we have the two entities

Entity | Data Items
Consultant Session | Clinic Date, Consultant Name, Hospital Name
Appointment | Clinic Date, Consultant Name, Appointment Time, Patient NHS Number

You should check that all three difficulties are indeed taken care of correctly in the new entity Consultant Session.

We have now reached exactly the same set of entities in 3NF as we had before. Clearly the identification of all the candidate keys is extremely important and any omission can lead to incomplete normalization.

We can add a final comment on this example. If a manual system used both the forms it would have difficulty maintaining the integrity of the data, as there would be considerable duplication between forms of the two kinds as well as duplication within forms of the same kind. Normalization of data sources is designed to avoid all such problems. It was rarely used on manual systems because of the excessive amount of cross-referencing that would be needed. This cross-referencing is not a problem in computer-based systems (though it can lead to inefficiencies of processing).

Other normal forms

Are there any other normal forms after third normal form? Yes. Beyond third normal form there are, for example, Boyce-Codd normal form and fourth and fifth normal forms. However, these normal forms are rarely needed and are not part of this unit. If you wish to explore further they are discussed in most standard textbooks on databases.

Summary of outcome 1 (normalisation)

An entity is a named group of data items that are properties of that entity. An entity instance is one particular set of values of those items.

When represented in a computer the entity instance is often called a record and the items fields, and the whole collection of instances is called a file. In the relational model of data the preferred terms are table (for file), row (for record) and column (for field).
In the theory of relational databases the terms are relation (for table), tuple (for row) and attribute (for column).

Every entity has at least one key. A key is one or more data items whose combined values are different for every possible instance of the entity.

In UNF all the data items in the data source are listed, repeating items or groups of items are shown and the key or keys are identified.

In 1NF there must be no repeating items or groups of items. An entity with repeating items or groups must be split into two or more entities. Each repeating group will need its own entity.

When one entity is split into two or more entities there will be relationships between the entities. Usually the relationships will be many-to-one but they may be one-to-one or many-to-many.

In the cases of many-to-one and one-to-one the relationships are represented by putting foreign keys into one of the entities. For a many-to-one relationship the foreign key is added to the 'many' end. For a one-to-one relationship it may be added to either entity, but if one of the entities is such that not every instance has to participate in the relationship it is better to put the foreign key in that entity. This will reduce the use of null (empty) values in the foreign key item.

In the case of a many-to-many relationship it is necessary to add a new link entity and decompose the many-to-many relationship into two many-to-one relationships. The new entity will have a foreign key for each of the other two entities. In addition it will have data items that are attributes of the relationship rather than of either of the two entities.

An item is partially dependent on a key if it is functionally dependent on an item or items that are in the key but do not make up the whole key. Partial dependencies cannot exist for single-item keys.

In 2NF there must be no items in an entity that are partially dependent on the key(s).
To reach 2NF any partially dependent items are removed to a new entity, along with (copies of) the parts of the key they are dependent on. As before, relationships will need to be established between the entities formed.

An item is indirectly dependent on a key if it is functionally dependent on an item or items that are themselves not in the key but are functionally dependent on the key.

In 3NF there must be no items in an entity that are indirectly dependent on the key(s). To reach 3NF any indirectly dependent items are removed to a new entity along with (copies of) the items they are directly dependent on. As before, relationships will need to be established between the entities formed.

It is important to identify all the keys of an entity. These are called candidate keys. The one chosen is sometimes called the primary key.

OUTCOME 2 – COMPLETING THE DATA MODEL

Key Terms

data dictionary
entity life history
event
entity/event matrix

Introduction

This outcome covers several ways of extending the data model produced by normalization. So far the data items have only been named and grouped together into normalized entities. More information about the items is put into the data dictionary. Another significant aspect of the data that has not been looked at is what processing is performed on the data. The events that cause processing of the data will involve one or more of the entities, creating, modifying or deleting instances. This correspondence between events and entities is put into an entity/event matrix. Finally the sequence in which events can happen is incorporated into entity life histories.

Classification of constraints

Any data model needs to be able to represent (or capture) the constraints on the data it models. We can group together constraints of similar kinds into a classification.
Constraint class | Description | Example
Domain | Restrictions on the values allowed in a data item | Data item 'Tax date' holds only valid dates
Key | One or more data items that together identify each record uniquely | Data items 'Clinic Date' and 'Consultant Name' together form a key of the Consultant Session group of data items
Foreign Key | A data item that has values that exist as the key values in another group | 'Consultant Name' in group Consultant Session must be a value of 'Consultant Name' in the Consultant group
Functional Dependency | There cannot be more than one value of a functionally determined data item for each value of the determining item(s) | 'Consultant Name' functionally determines 'Consultant Phone'
Other | Includes multi-valued dependencies and arbitrary constraints | The maximum number of appointments at any single session is 20

This unit requires you to specify the constraints in the first three rows of this table. These constraints will appear in the data dictionary. The fourth kind, functional dependency, is involved in determining the keys and in the process of normalization, but you are not required to explicitly list these constraints. The last kind is 'all the rest' and it is not always possible to fit these constraints into the relational data model. That means that they would have to be built into the processing of the operations by some kind of programming.

It is important to realize that the constraints are part of the application. The data analyst must find out from the customer specifying the system requirements what these constraints are. Two similar applications may have different constraints. For example, one organization may have the rule that any given manager may be in charge of only one department, but another organization may allow one manager to be in charge of several departments.
To make it harder to discover, this second organization may not have any such cases at the moment, so even the example data will not show up this difference in the rule. Although inspecting example data can suggest constraints it can never be used to show the existence of a constraint. The most we can say is that the example data are consistent with a certain constraint or that they do not comply with a constraint (and so act as a counter-example).

Making sure that the actual data satisfies the constraints at all times is sometimes termed maintaining the integrity of the data. So, for example, foreign key constraints are called referential integrity constraints.

The Data Dictionary (PC 2a, 2b)

Many of the constraints are incorporated into the data dictionary. In addition, the data dictionary is the place where standardization of names takes place and any synonyms are identified. Since many names would be rather long it is usual to use abbreviations, but there is a danger of having several different versions of an abbreviation (e.g. 'number' may be abbreviated no., num, nmr, nr, nbr, or even #). So the data dictionary is the place where the chosen form of abbreviation is defined. One common convention is to start data item names with an abbreviated form of the group or entity of which they are part. Another useful convention is to separate the parts of a name with the underscore character. Whatever abbreviations are used as part of the item names it is important that there is an unambiguous description of each item in the dictionary.

For each item we also indicate the type of value that it is drawn from (often called its domain). The table shows some common types. There is no single standard for how the types are designated.

Type | Designations | Comments
Text | A, Alpha, Text | Usually a maximum length is stated as in A(20). Validations may be given in the form of pictures like A9999 (meaning a letter followed by 4 digits).
Whole number | Number, Integer | A common validation is a range, often shown as in 0..100.
Number with fractional part | Number, Float | Not always distinguished from whole numbers. An important special case is currency.
Date | Date | May include a time component
Time | Time | Time only without date
Boolean | Boolean, Logical | Two values true/false, yes/no

At this stage in the design the format or layout of the values is of no concern, so, for example, a date does not need to be specified as dd/mmm/yyyy or the like.

There is sometimes a choice between text and number. A field may be required to be all digits so that it could be represented by either text or number. If no arithmetic is to be carried out on the numbers then it is best to make the field text with a validation of all digits. In particular this gets round the problem of leading zeros (e.g. 002954), which are difficult to retain when type number is used.

All foreign keys should be noted under validation. The entry should refer to the key item from which the values must be taken. Note that the type/size for a foreign key should always be identical to that of the key to which it refers.

Some items may not have a value at all stages in the lifetime of the entity and these are marked as not required.

Finally, the key items will be shown, every item in the key being shown with a 'Y' entry.

The data dictionary for our Hospital Appointment example is given on the next page. The items are arranged by entity but another useful arrangement is by alphabetical order of item names. (In practice the data dictionary would be stored in a database so that various orderings and searches would be available.) The entity called 'Consultant Session' has been renamed as 'Clinic'.
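To make the idea of a picture validation concrete, here is a minimal Python sketch. It assumes a simple convention in which 'A' in a picture stands for one letter, '9' for one digit, and any other character for itself; real tools may define pictures differently, and the function names `picture_to_regex` and `matches_picture` are hypothetical. The last lines also illustrate the leading-zeros problem mentioned above.

```python
import re


def picture_to_regex(picture):
    """Translate a picture such as 'A9999' into a compiled regular expression.

    Assumed convention: 'A' = any one letter, '9' = any one digit,
    every other character must appear literally.
    """
    parts = []
    for ch in picture:
        if ch == "A":
            parts.append("[A-Za-z]")
        elif ch == "9":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches_picture(value, picture):
    """True if the whole of `value` fits the picture."""
    return picture_to_regex(picture).fullmatch(value) is not None


print(matches_picture("B1234", "A9999"))        # a letter followed by 4 digits
print(matches_picture("82-4561F", "99-9999A"))  # shape of the NHS numbers on the forms
print(matches_picture("316", "9999"))           # too short for a 4-digit picture

# Leading zeros survive as text but are lost when the value is held as a number.
as_text = "002954"
as_number = int(as_text)
print(as_text, as_number)
```

Holding an all-digit item as text with a picture such as 9999 keeps both the digits-only rule and any leading zeros.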
Item | Entity | Description | Type/Size | Range/Validation | Req'd | Key
Clinic_Date | Clinic | Date clinic held | Date | | Y | Y
Clinic_Cons_Name | Clinic | Name of consultant taking clinic | A(32) | | Y | Y
Clinic_Hosp_Name | Clinic | Name of hospital where clinic is held | A(32) | Existing Hosp_Name | Y | N
Hosp_Name | Hospital | Name of hospital | A(32) | | Y | Y
Hosp_Phone_Nbr | Hospital | Phone number of hospital | A(12) | | Y | N
Cons_Name | Consultant | Name of consultant | A(32) | | Y | Y
Cons_Phone_Nbr | Consultant | Phone number of consultant | A(12) | | Y | N
Appt_Date | Appointment | Date of appointment | Date | | Y | Y
Appt_Cons_Name | Appointment | Name of consultant for appointment | A(32) | Existing Cons_Name | Y | Y
Appt_Time | Appointment | Time of appointment | Time | | Y | Y
Appt_Pat_NHS_Nbr | Appointment | NHS number of patient attending | A(8) | Existing Pat_NHS_Nbr | Y | N
Pat_NHS_Nbr | Patient | NHS number of patient | A(8) | 99-9999A | Y | Y
Pat_Name | Patient | Name of patient | A(32) | | Y | N
Pat_GP_Nbr | Patient | Number of GP of patient | A(4) | Existing GP_Nbr | Y | N
GP_Nbr | GP | Number of GP | A(4) | 9999 | Y | Y
GP_Name | GP | Name of GP | A(32) | | Y | N
GP_Addr | GP | Address of GP | A(100) | | Y | N

(You should now try to produce the data dictionary for your exercise.)

Identification of Events and Construction of Entity/Event Matrix (PC 2c, 2d)

The next task is to identify the events associated with each of the entities. These events are then placed into an entity/event matrix. In practice it is easier to work directly with the matrix since the list of events is simply the first column of the matrix. Four kinds of event occur.

Event type | Matrix entry | Comments
Create | C | An event that creates a new instance of an entity
Delete | D | An event that causes the removal of an instance of an entity
Modify | M | An event that changes the value of one or more items in an entity
Read | R | An event that uses the values of items in an entity but does not change any of them

Each entity should have at least one event that creates an instance and one that deletes an instance.
Occasionally there will be no deletion event but in such cases there is usually an archiving event that removes the entity instance to some form of archival storage. Most entities will also have events that can modify item values. Note that an event should be shown as of type Modify when there is at least one item that can be amended; other items may not be allowed to be amended by this or any other event.

The same event may affect more than one entity. For example it may create one entity, modify another, and read from a third. Also, more than one event may affect the same entity in the same way. For example, two different events may cause an instance of an entity to be created.

The Entity/Event matrix for the hospital appointments example follows.

[Entity/event matrix: one column for each of the entities Hosp, Cons, Clinic, Appt, Pat and GP, with entries C, M, D or R, and one row for each of the following events: Add hospital; Add consultant; Add clinic; Add GP; Add patient; Make appointment; Change time of appointment; Modify hospital details; Modify consultant details; Modify patient details; Modify GP details; Archive clinic details; Cancel clinic; Cancel appointment; Delete hospital; Delete consultant; Delete patient; Delete GP; Report of appointments by clinic; Report on appointments by patient.]

(You should now try to derive the entity/event matrix for your exercise.)

Entity Life Histories (PC 2e)

The various events that affect any given entity are collected together and an entity life history is constructed. This is usually shown in the form of a diagram which is a hierarchy showing the order in which events can occur. At the top of the diagram is a rectangular box representing the entity. Below it are drawn rectangles representing the events that create, modify or delete instances of the entity (read-only events are not of interest here).
The boxes are connected by lines to show the hierarchy, and the sequence in time is from left to right in any group connected to the same box. If an event can be repeated (zero or more times) it is shown with an asterisk (*) in the upper right corner. If an event is an alternative it is shown with a circle (o) in the corner. Note that in the case of alternatives exactly one of the alternatives must be selected; it is not allowed for none to be selected, or indeed for more than one. Figures aa1 to aa4 show the different possibilities. Fig. aa1 shows two different styles of connecting the boxes. In Fig. aa3 note the use of the null box as an alternative to give the effect of an optional selection.

In order to make the diagrams easier to understand there are two guidelines. All the boxes connected to a given event should be of the same kind (so, for example, we do not put an iteration as part of a sequence). Complex sets of events are grouped together under named boxes (internal nodes) that are not themselves events. Events only appear at the extremities of the diagram with no other boxes below them. Figs bb1, bb2 show the example from the arrangements in its original form and then following these guidelines.

As an example of constructing an entity life history consider the GP entity. From the entity/event matrix we find that there are just three events: New GP, Modify GP Details, Delete GP. The creation event is put first (on the left). The modifications can occur zero or more times over the lifetime of the GP instance, so a repetition is needed. This breaks the first guideline in that the events under GP would be a mixture, so we introduce an internal node. As this node covers the whole life of the instance between creation and deletion a good name for it is 'GP Life'. The repeated event is placed below this node. The diagram is shown in fig xx1 along with a textual representation of the hierarchy using indentation.
The comments indicate the kind of box (entity, internal node, event).

No use of selection of alternative events appears in this life history, but suppose that it is wished to separate the single event 'Modify GP Details' into two events, 'Change GP Name' and 'Change GP Address'. These two events could then be alternatives, so we add below 'Modify GP Details' two boxes, each with a circle to show that they are alternatives. This is shown in fig xx2.

Some possible sequences of events that are allowed by this life history are shown below. Note that internal nodes do not appear since they are not events.

Some possible sequences of events for the entity GP
Sequence 1: New GP; Delete GP
Sequence 2: New GP; Change GP Address; Change GP Address; Delete GP
Sequence 3: New GP; Change GP Address; Change GP Name; Change GP Address; Delete GP

Many sequences of events are not allowed. None of the following sequences can be formed from the given life history. (You should be able to say why each sequence is illegal.)

Some sequences of events that are not allowed for the entity GP
Sequence 1: New GP; New GP; Delete GP
Sequence 2: Change GP Address; Change GP Address; Delete GP
Sequence 3: Delete GP; New GP; Change GP Address; Delete GP

Figures xx3 to xx5 show further life histories of entities in the running example. Figures xx6a and xx6b show two versions of the life history for Appointment. In the second version the events that terminate the life of the entity instance are classified in more detail. Both versions are correct and professionals might disagree over which they prefer. This is rather like programming, when there may be several correct solutions to the same problem.

(You should construct entity life histories for your exercise now.)

Summary of outcome 2 (the fuller data model)

Detail about each data item is put into a data dictionary. The detail consists of the following items.
Property of the data item | Comment
Item name | Different for every item, can use consistent abbreviations
Entity | Name of the entity the item is part of
Description | This should expand any abbreviations used in the name
Type/Size | The type of the underlying domain with size shown for text
Range/Validation | Any further constraints on the domain, especially for foreign keys
Required (Req'd) | Y if the data item must always have a value
Key | Y if this item is (part of) the key

The various events associated with the entities in the data source are identified, listed and placed in an entity/event matrix. Every entity that is involved in a particular event has an entry in the matrix showing whether its effect is to create, modify, delete or merely read the entity instance. Every entity should have at least one creation event and one deletion (or archiving) event.

The final step in the analysis is the construction of entity life histories to show the order in time of the different events in the lifetime of an entity instance. The life history is shown in a diagram that is a hierarchy of boxes. Each entity is placed at the top of its hierarchy by putting the name of the entity in the topmost box. Below it come first the creation events and last the deletion events. In between come any modification events. The order of time is from left to right on the diagram.

Events that can be repeated are marked with an asterisk. Events that are alternatives to be selected are marked with a circle. The null event has no name and may be used as an alternative in a selection.

Complex events can be decomposed into several events at a lower level in the hierarchy. Internal nodes (boxes that have at least one box below them in the hierarchy) will not appear in the event list.

EXERCISES

These exercises give examples of data sources that can be analyzed using the methods contained in this unit.

1.
A literary agency keeps records of books that have been published in which it has an interest. The data is kept on forms like the one shown below. There is one form for each different ISBN (since paperback and hardback editions of the same book have different ISBNs). Some of the authors belong to a writers’ club run by the agency.

BOOK ENTRY
Catalogue Details
    ISBN: 1-3452-7294X
    Title: Fishing for All
    Edition: 2
    Description: Pbk 280pp illus
Publisher Details
    Name: Henry Hutters
    Address: Lamb Place, Dundee
    Postcode: DD1 8RX
Author Details
    Name: Martin Banks; Rodney Line
    Phone Number: 03838 612345; 02028 621378
    Club Membership No. (if any): 517
    Year Joined Club: 1993

Figure er7. [ER diagram: entities Book, Publisher, Authorship, Author and Club with their attributes; relationships labelled Written By, Perform and Member.]

2. A collection of music CDs has a filing system that consists of a form for each CD, an example being shown below. Each CD has a unique number allocated to it. The details of the tracks are shown on the form. The CDs can be borrowed and the details of the person (if any) who currently has the CD are also shown. (Former borrowers are shown crossed through.)

CD
    CD number: 401
    Category: Classical
    Title: Chamber Music Collection vol. 4
Track Details
    Track number  Title                                             Performers / Artists               Length
    1-5           Schubert Piano Quintet in A, D667 (The Trout)     Jeorg Demus, Schubert Quartet      30:05
    6-9           Mozart String Quartet in B flat, K458 (The Hunt)  Amadeus Quartet                    20:45
    10-12         Beethoven Piano Trio in D, Opus 70/1 (The Ghost)  Kempff – Szeryng – Fournier Trio   25:20
Borrower Details
    Name      Address                   Phone Number  Date Borrowed
    Jo Lee    3 Railway Lane, Miltown   0326 721456   16/11/98
    Anil Rae  24 Main Street, Miltown   0326 742509   4/3/99

Final ER Diagram for Exercise 1.
[ER diagram showing CD and Track with the attributes CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone No, Date Borrowed, Track Number, Track Title, Track Performers and Track Length.]

Fig xx1
GP {entity}
    Add GP {event}
    GP Life {internal node summarising events below it}
        * Modify GP Details {repeated event}
    Delete GP {event}

Fig xx2
GP {entity}
    Add GP {event}
    GP Life {internal node summarising events below it}
        * Modify GP Details {repeated internal node}
            o Modify GP Name {alternative event}
            o Modify GP Address {alternative event}
    Delete GP {event}

Fig xx3
Hospital
    Add Hospital
    Hospital Life
        * Modify Hospital Details
    Delete Hospital

Fig xx4
Consultant
    Add Consultant
    Consultant Life
        * Modify Consultant Details
    Delete Consultant

Fig xx5
Clinic
    Add Clinic
    Remove Clinic
        o Archive Clinic Details
        o Cancel Clinic

Fig xx6a
Appointment
    Make Appointment
    Appointment Life
        * Change Time of Appointment
    Remove Appointment
        o Archive Clinic Details
        o Cancel Clinic
        o Cancel Appointment

Fig xx6b
Appointment
    Make Appointment
    Appointment Life
        * Change Time of Appointment
    Remove Appointment
        o Archive Clinic Details
        o Cancellation
            o Cancel Clinic
            o Cancel Appointment

Fig aa1 Sequence (showing two different ways of drawing the connections in the hierarchy). Event X consists of Event A followed by Event B followed by Event C.
Event X
    Event A
    Event B
    Event C

Fig aa2 Selection of alternatives. Event X consists of either Event A, or Event B, or Event C. (One must be selected.)
Event X
    o Event A
    o Event B
    o Event C

Fig aa3 Selection of optional alternatives. The null box (containing a dash in place of a name) indicates no action. Event X consists of either Event A or Event B or nothing.
Event X
    o Event A
    o Event B
    o ______

Fig aa4 Repetition (iteration). Event X consists of zero or more repetitions of Event A.
Event X
    * Event A

Fig bb1 Example of ELH from the arrangements – original form.
Invoice
    New Invoice Created
    * Change of Invoice Detail
        o Change to Existing Detail Line
        o Detail Line Deleted
        o New Detail Line Added
    Invoice Archived
        o Invoice Paid Detail
        o Invoice Cancelled

Fig bb2 Example from arrangements redrawn for greater clarity. A new internal node ‘Invoice Life’ is added so that all the nodes below Invoice are of the same kind.
Invoice
    New Invoice Created
    Invoice Life
        * Change of Invoice Detail
            o Change to Existing Detail Line
            o Detail Line Deleted
            o New Detail Line Added
    Invoice Archived
        o Invoice Paid Detail
        o Invoice Cancelled

QUESTIONS

These questions can be used to check your understanding of the material. The most relevant performance criteria are shown after the question number.

1. PC 1e An entity includes the data items x and y. If the data item x is a key, can (x, y) be a key of the same entity?

2. PC 1a-d The following group of items has a single repeating item, phone number.
Assume that clients may share the same phone number. Show how to move the repeating item to another entity to produce 1NF. What is the key of the new entity? Is the new entity in 1NF? 2NF? 3NF?

Entity   Data Items
Client   Client Contact Number, Client Name, Client Address, Client Post Code
         Repeating item: Phone Number

3. PC 1c Explain why a single-item key implies that the entity is at least in 2NF.

4. PC 1d Explain why an all-key entity must be in 3NF. (An all-key entity is one where the key consists of all the data items in the entity. They can occur when the underlying data model has a many-to-many relationship.)

5. PC 1d, e An entity E has four data items a, b, c, d and has two candidate keys (a, b) and c. Also c functionally determines d. Is it in 3NF? If not, derive entities that are in 3NF. What are the keys of these entities?

6. PC 1e The following table shows some data for an entity concerning the poetical works of a certain author. Why can the entity not have as its key the data item ‘Title’? What could be a key in this example? Why can we not be sure that this is a key?

Title              Year First Published  Volume in Collected Works  Number of Lines
Threnody           1856                  2                          116
Image of Man       1856                  2                          240
Roses              1849                  1                          35
To Winter          1860                  3                          86
Pleasure’s Repose  1888                  4                          463
My Heart’s Hope    1860                  4                          44
Roses              1888                  4                          52
Conquest of Care   1860                  4                          362

7. PC 2a, b What is wrong with the following excerpt from a DD?

Item         Entity      Description                                   Type/Size  Range/Validation   Req’d  Key
Emp_Nbr      Employee    Unique number for employee                    A(6)       999999             Y      Y
Name         Employee    Full name of employee                         A(40)                         Y      N
Address      Employee    Address of employee                           A(100)                        N      N
Emp_Dept_No  Employee    Number of department in which employee works  A(4)       Existing Dept_Nbr  Y      N
Dept_Nbr     Department  Unique number for department                  Number     In 0..999          N      Y
Name         Department  Name of department                            A(32)                         Y      N
Dept_Mgr     Department  Emp_Nbr of manager of department              A(6)       Existing Emp_Nbr   Y      N

8.
PC 2c What is probably missing from this list of events for the entity Employee?

Events for Employee
    Modify Employee Name
    Modify Employee Address
    Change Department of Employee
    Dismiss Employee
    Employee Resigns

9. PC 2e Which of the following sequences are allowed and which are not allowed for the Client ELH?

Sequence 1: Add Client, Delete Client
Sequence 2: Add Client, Change Client Name, Add Client, Change Client Address, Delete Client
Sequence 3: Add Client, Change Client Address, Change Client Address, Delete Client
Sequence 4: Delete Client, Add Client, Change Client Name, Delete Client

Client
    Add Client
    Client Life
        * Modify Client Details
            o Change Client Name
            o Change Client Address
    Delete Client

10. PC 2e What is wrong with this ELH of Employee?

[ELH diagram for Employee, with boxes Delete Employee, Employee Life (above * Modify Employee Details), Print List of Employees and Add Employee.]

11. PC 2e Improve the following ELH of Customer.

Customer
    Add Customer
    * Modify Customer Details
        o Change Customer Phone Number
        o Change Customer Address
    Remove Customer

ANSWERS TO QUESTIONS

These questions can be used to check your understanding of the material. The most relevant performance criteria are shown after the question number.

1. PC 1e An entity includes the data items x and y. If the data item x is a key, can (x, y) be a key of the same entity?

Answer. No. A key cannot include any unnecessary items. The combination (x, y) is called a superkey.

2. PC 1a-d The following group of items has a single repeating item, phone number. Assume that clients may share the same phone number. Show how to move the repeating item to another entity to produce 1NF. What is the key of the new entity? Is the new entity in 1NF? 2NF? 3NF?

Entity   Data Items
Client   Client Contact Number, Client Name, Client Address, Client Post Code
         Repeating item: Phone Number

Answer.
Move the item Phone Number into a new entity, Phone, along with the key of Client as a foreign key. The foreign key implements the one-to-many relationship between the Client and Phone entities. The key of Phone is the combination of both items since there may be several clients sharing one number and several phone numbers for one client. Phone is in 1NF since there are no repeating items or groups. It is in 2NF since there are no other items than the key to be partially dependent. It is in 3NF for a similar reason: there are no items that can be indirectly dependent.

Entity   Data Items
Client   Client Contact Number, Client Name, Client Address, Client Post Code
Phone    Phone Number, Client Contact Number

(a) The first diagram shows a repeated item, Phone Number.
[ER diagram: entity Client with attributes Client Contract No, Client Name, Client Address, Client Post Code and the repeated item Phone Number.]

(b) The second diagram shows the Phone Number treated as a separate entity.
[ER diagram: Client (Client Contract No, Client Name, Client Address, Client Post Code) on the 1 side of the relationship Has, with Phone (Phone Number) on the N side.]

3. PC 1c Explain why a single-item key implies that the entity is at least in 2NF.

Answer. Partial dependence requires there to be more than one item in the key, so there can be no partial dependencies in an entity with a single-item key.

4. PC 1d Explain why an all-key entity must be in 3NF. (An all-key entity is one where the key consists of all the data items in the entity. They can occur when the underlying data model has a many-to-many relationship.)

Answer. An entity is in 3NF if there are no indirect dependencies. An indirect dependency requires two items other than the key to have a functional dependency. In an all-key entity there are no such items.

5. PC 1d, e An entity E has four data items a, b, c, d and has two candidate keys (a, b) and c. Also c functionally determines d. Is it in 3NF? If not, derive entities that are in 3NF. What are the keys of these entities?
Answer. It is not in 3NF since d is indirectly dependent on the key (a, b). (Note that the key (a, b) functionally determines c.) We can remove d to a new entity, F, which must also contain as a foreign key the item c. The key of F will be c. The new entity E will still have two candidate keys (a, b) and c.

6. PC 1e The following table shows some data for an entity concerning the poetical works of a certain author. Why can the entity not have as its key the data item ‘Title’? What could be a key in this example? Why can we not be sure that this is a key?

Title              Year First Published  Volume in Collected Works  Number of Lines
Threnody           1856                  2                          116
Image of Man       1856                  2                          240
Roses              1849                  1                          35
To Winter          1860                  3                          86
Pleasure’s Repose  1888                  4                          463
My Heart’s Hope    1860                  4                          44
Roses              1888                  4                          52
Conquest of Care   1860                  4                          362

Answer. The item Title cannot be a key since the value ‘Roses’ appears in two instances (records). A possible key is to combine Title with Year First Published. With the data shown this combination is unique. However, it is possible that two works with the same title could have been published in the same year (even if not in the same publication). Combining Title with any of the other items raises similar problems.

7. PC 2a, b What things are wrong with the following excerpt from a DD?

Item         Entity      Description                                   Type/Size  Range/Validation   Req’d  Key
Emp_Nbr      Employee    Unique number for employee                    A(6)       999999             Y      Y
Name         Employee    Full name of employee                         A(40)                         Y      N
Address      Employee    Address of employee                           A(100)                        N      N
Emp_Dept_No  Employee    Number of department in which employee works  A(4)       Existing Dept_Nbr  Y      N
Dept_Nbr     Department  Unique number for department                  Number     In 0..999          N      Y
Name         Department  Name of department                            A(32)                         Y      N
Dept_Mgr     Department  Emp_Nbr of manager of department              A(6)       Existing Emp_Nbr   Y      N

Answer. (1) The data item ‘Name’ is duplicated: it appears in each of the entities Employee and Department.
(2) The abbreviation for ‘number’ is not consistent (both Nbr and No are used). (3) The type of the foreign key Emp_Dept_No is different from the type of the key to which it refers (A(4) and Number, respectively). (4) The key of Department is shown as not required but all key items have required values.

8. PC 2c What is probably missing from this list of events for the entity Employee?

Events for Employee
    Modify Employee Name
    Modify Employee Address
    Change Department of Employee
    Dismiss Employee
    Employee Resigns

Answer. There is no event that creates an instance of the entity. It is possible that one of the events is badly named. The entity/event matrix would show whether this was the case. There should be at least one event with a ‘C’ entry in the matrix for the Employee entity.

9. PC 2e Which of the following sequences are allowed and which are not allowed for the Client ELH?

Sequence 1: Add Client, Delete Client
Sequence 2: Add Client, Change Client Name, Add Client, Change Client Address, Delete Client
Sequence 3: Add Client, Change Client Address, Change Client Address, Delete Client
Sequence 4: Delete Client, Add Client, Change Client Name, Delete Client

Client
    Add Client
    Client Life
        * Modify Client Details
            o Change Client Name
            o Change Client Address
    Delete Client

Answer. Sequence 1 is OK since the repetition allows zero repeats. Sequence 2 is illegal since the second Add Client cannot happen at all: it is not a repeated event, nor can it happen after Change Client Name, which is part of the Client Life internal node that follows Add Client. Sequence 3 is OK since repetition of Modify Client Details is allowed. Sequence 4 is illegal since Delete Client cannot be first in the sequence.

10. PC 2e What is wrong with this ELH of Employee?

[ELH diagram for Employee, with boxes Delete Employee, Employee Life (above * Modify Employee Details), Print List of Employees and Add Employee.]

Answer.
(1) The Delete Employee event should not come first, nor can the Add Employee event that creates instances come last. They should be interchanged. (2) The event ‘Print List of Employees’ is likely to be read-only and so should not be shown in the ELH at all. If it does update the Employee entity it should be renamed to show this. The entity/event matrix should be consulted and revised if necessary.

11. PC 2e Improve the following ELH of Customer.

Customer
    Add Customer
    * Modify Customer Details
        o Change Customer Phone Number
        o Change Customer Address
    Remove Customer

Answer. Since the three boxes at the top level are not a pure sequence (the second one is a repetition) it is better to introduce a new internal node to cover the life of the Customer between creation and deletion. The repetition is now placed under this new node.

Customer
    Add Customer
    Customer Life
        * Modify Customer Details
            o Change Customer Phone Number
            o Change Customer Address
    Remove Customer

ANSWERS TO EXERCISES

These exercises give examples of data sources that can be analyzed using the methods contained in this unit.

1. A literary agency keeps records of books that have been published in which it has an interest. The data is kept on forms like the one shown below. There is one form for each different ISBN (since paperback and hardback editions of the same book have different ISBNs). Some of the authors belong to a writers’ club run by the agency.

BOOK ENTRY
Catalogue Details
    ISBN: 1-3452-7294X
    Title: Fishing for All
    Edition: 2
    Description: Pbk 280pp illus
Publisher Details
    Name: Henry Hutters
    Address: Lamb Place, Dundee
    Postcode: DD1 8RX
Author Details
    Name: Martin Banks; Rodney Line
    Phone Number: 03838 612345; 02028 621378
    Club Membership No. (if any): 517
    Year Joined Club: 1993

Answer.

Normalization.

Step 1. Some item names are extended to show their meaning more clearly.
Entity       Data Items
Book Entry   ISBN, Title, Edition, Description, Publisher Name, Publisher Address, Publisher Postcode, Author Name, Author Phone Number, Author Club Membership Number, Author Year Joined Club

Step 2. The last four data items form a repeating group (since there can be several authors for a given book).

Step 3. The key for Book Entry is ISBN. We have now reached UNF.

Entity       Data Items
Book Entry   ISBN, Title, Edition, Description, Publisher Name, Publisher Address, Publisher Postcode
             Repeating group: Author Name, Author Phone Number, Author Club Membership Number, Author Year Joined Club

Step 4. We remove the repeating group items to a new entity called Author. The key of Author is Author Name (assuming that no two authors have the same name).

Entity       Data Items
Book Entry   ISBN, Title, Edition, Description, Publisher Name, Publisher Address, Publisher Postcode
Author       Author Name, Author Phone Number, Author Club Membership Number, Author Year Joined Club

Step 5. The relationship between Book Entry and Author is many-to-many. We add a new entity, Authorship, that is in a many-to-one relationship with each of Book Entry and Author. We add the key of Book Entry and the key of Author to Authorship as foreign keys to implement this relationship. The key of Authorship consists of both these foreign keys. We have now reached 1NF.

Entity       Data Items
Book Entry   ISBN, Title, Edition, Description, Publisher Name, Publisher Address, Publisher Postcode
Authorship   ISBN, Author Name
Author       Author Name, Author Phone Number, Author Club Membership Number, Author Year Joined Club

Step 6. Since Book Entry has a single-item key it is in 2NF. Since Author has a single-item key it is in 2NF. Since Authorship is all key (i.e. all data items are part of the key) it must be in 2NF.

Step 7. Since all three entities are in 2NF there is no work to be done at this step.

Steps 8 and 9.
(1) Book Entry is not in 3NF since it has items that are indirectly dependent on the key, namely Publisher Address and Publisher Postcode, which depend on Publisher Name. Remove these to a new entity, Publisher. The key of Publisher will be Publisher Name (assumed unique). Leave Publisher Name in Book Entry as a foreign key to implement the many-to-one relationship between Book Entry and Publisher. (2) Authorship has no non-key items, so it is in 3NF. (3) It is likely that Author Year Joined Club is functionally dependent on Author Club Membership Number so there is an indirect dependency on the key. Remove Author Year Joined Club to a new entity, Club. The key of Club is Author Club Membership Number. Leave Author Club Membership Number in Author as a foreign key to implement the relationship (which is one-to-one).

The final set of 3NF entities is given below.

Entity       Data Items
Book Entry   ISBN, Title, Edition, Description, Publisher Name
Publisher    Publisher Name, Publisher Address, Publisher Postcode
Authorship   ISBN, Author Name
Author       Author Name, Author Phone Number, Author Club Membership Number
Club         Author Club Membership Number, Author Year Joined Club

The completed data model.

ER diagrams for Exercise 1.

Figure er6a. If we wish to show the many-to-many relationship between Book Entry and Author the ER diagram is as shown below with not all attributes shown.
[ER diagram: Book Entry (ISBN, Title, Edition, Description, Publisher attribs …) on the M side of the relationship Written By, with Author (Author Name, Author attribs …) on the N side.]

Figure er6b. After the Authorship entity is introduced we have the many-to-many relationship decomposed into two many-to-one relationships. Note that Authorship has no attributes: its sole purpose is to relate Book Entry and Author.
[ER diagram: Book Entry (ISBN, Title, Edition, Description, Publisher attribs …) on the 1 side of Written By, with Authorship on the N side; Authorship on the N side of Perform, with Author (Author Name, Author attribs …) on the 1 side.]

Data dictionary. There will be some renaming.
Each item will be prefixed with the (possibly abbreviated) entity name to make it unique. Consistent abbreviations are used. The entity ‘Book Entry’ is now better named just ‘Book’.

Item              Entity      Description                Type/Size  Range/Validation                  Req’d  Key
Book_ISBN         Book        Unique ISBN for each book  A(10)      Last digit is a modulus-11 check  Y      Y
Book_Title        Book        Title of book              A(60)                                        Y      N
Book_Edition      Book        Edition number             Number                                       Y      N
Book_Description  Book        Binding, pagination, etc.  A(30)                                        N      N
Book_Publisher    Book        Publisher of book          A(30)      Existing Pub_Name                 Y      N
Pub_Name          Publisher   Name of publisher          A(30)                                        Y      Y
Pub_Address       Publisher   Address of publisher       A(100)                                       Y      N
Pub_Postcode      Publisher   Postcode of publisher      A(10)                                        Y      N
Aship_ISBN        Authorship  ISBN of book written       A(10)      Existing Book_ISBN                Y      Y
Aship_Author      Authorship  Author writing             A(30)      Existing Auth_Name                Y      Y
Auth_Name         Author      Name of author             A(30)                                        Y      Y
Auth_Phone        Author      Phone number of author     A(12)                                        N      N
Auth_Club_Nbr     Author      Membership number in club  Number     Existing Club_Memb_Nbr            N      N
Club_Memb_Nbr     Club        Unique membership number   Number                                       Y      Y
Club_Year_Join    Club        Year joined club           Number     In 1900..2100                     Y      N

Entity/event matrix. No reports are shown in this list of events but many could be included.

Events: Add book; Add publisher; Add authorship of book; Add author; Author join club; Modify book details; Modify publisher details; Modify author details; Modify club member details; Delete book; Delete publisher; Author leave club; Delete author; Delete authorship
Entity columns: Book, Pub, Aship, Auth, Club
Matrix entries: C R C R C R C R C M R M M M D R R D R R D D D D R

Entity Life Histories.
Book
    Add Book
    Book Life
        * Modify Book Details
    Delete Book

Publisher
    Add Publisher
    Publisher Life
        * Modify Publisher Details
    Delete Publisher

Authorship
    Add Authorship of Book
    Delete Authorship

Author
    Add Author
    Author Life
        * Modify Author Details
    Delete Author

Club
    Author Join Club
    Club Life
        * Modify Club Member Details
    Remove Club Member
        o Author Leave Club
        o Delete Author

2. A collection of music CDs has a filing system that consists of a form for each CD, an example being shown below. Each CD has a unique number allocated to it. The details of the tracks are shown on the form. The CDs can be borrowed and the details of the person (if any) who currently has the CD are also shown. (Former borrowers are shown crossed through.)

CD
    CD number: 401
    Category: Classical
    Title: Chamber Music Collection vol. 4
Track Details
    Track number  Title                                             Performers / Artists               Length
    1-5           Schubert Piano Quintet in A, D667 (The Trout)     Jeorg Demus, Schubert Quartet      30:05
    6-9           Mozart String Quartet in B flat, K458 (The Hunt)  Amadeus Quartet                    20:45
    10-12         Beethoven Piano Trio in D, Opus 70/1 (The Ghost)  Kempff – Szeryng – Fournier Trio   25:20
Borrower Details
    Name      Address                   Phone Number  Date Borrowed
    Jo Lee    3 Railway Lane, Miltown   0326 721456   16/11/98
    Anil Rae  24 Main Street, Miltown   0326 742509   4/3/99

Answer.

Normalization.

Step 1. Some item names are extended to show their meaning more clearly.

Entity   Data Items
CD       CD Number, CD Category, CD Title, Track Number, Track Title, Track Performers, Track Length, Borrower Name, Borrower Address, Borrower Phone Number, Date Borrowed

Step 2. The four data items containing track data form a repeating group (since there can be several tracks for a given CD). Those for Borrower do not form a repeating group as only the current borrower (if any) is retained.
(If a history of borrowing were to be kept then the last four items would indeed be another repeating group and would need to be moved to another entity for borrower. See steps 8 and 9, however. What would be different in this case is the kind of relationship between CD and borrower. For a history of borrowing this relationship would be many-to-many and a further entity would be needed to hold a borrowing with foreign keys for both CD and borrower.)

Step 3. The key for CD is CD Number. We have now reached UNF.

Entity   Data Items
CD       CD Number, CD Category, CD Title
         Repeating group: Track Number, Track Title, Track Performers, Track Length
         Borrower Name, Borrower Address, Borrower Phone Number, Date Borrowed

Step 4. We remove the repeating group items to a new entity called Track.

Entity   Data Items
CD       CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone Number, Date Borrowed
Track    Track Number, Track Title, Track Performers, Track Length

Figure er9. After step 4 we have two entities in a one-to-many relationship.
[ER diagram: CD (CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone No, Date Borrowed) on the 1 side of the relationship Part, with Track (Track Number, Track Title, Track Performers, Track Length) on the N side.]

Step 5. The relationship between CD and Track is one-to-many. We add the key of CD to Track as a foreign key to implement this relationship. The key of Track now has the foreign key as well since one Track number value may be on more than one CD. We have now reached 1NF.

Entity   Data Items
CD       CD Number, CD Category, CD Title, Borrower Name, Borrower Address, Borrower Phone Number, Date Borrowed
Track    CD Number, Track Number, Track Title, Track Performers, Track Length

Steps 6, 7. Since CD has a single-item key it is in 2NF. Track has no partial dependencies of the last three items on the key since neither CD Number alone nor Track Number alone is sufficient to determine these three items.
So Track is in 2NF.

Steps 8 and 9. (1) CD is not in 3NF since it has items that are indirectly dependent on the key, namely Borrower Address and Borrower Phone. (This assumes that Borrower Name is unique.) Remove these to a new entity, Borrower. The key of Borrower will be Borrower Name (which we have assumed unique). Leave the key of Borrower in CD as a foreign key to implement the many-to-one relationship between CD and Borrower. CD and Borrower are now in 3NF. (2) Track has no indirectly dependent items so it is in 3NF.

The final set of 3NF entities is given below.

Entity     Data Items
CD         CD Number, CD Category, CD Title, Borrower Name, Date Borrowed
Borrower   Borrower Name, Borrower Address, Borrower Phone Number
Track      CD Number, Track Number, Track Title, Track Performers, Track Length

The completed data model.

Figure er10.
[ER diagram: CD (CD Number, CD Category, CD Title, Date Borrowed) on the N side of the relationship Has, with Borrower (Borrower Name, Borrower Address, Borrower Phone No) on the 1 side; CD on the 1 side of the relationship Part, with Track (Track Number, Track Title, Track Performers, Track Length) on the N side.]

Data dictionary. There will be some renaming. Each item will be prefixed with the (possibly abbreviated) entity name to make it unique. Consistent abbreviations are used.
Item              Entity    Description                              Type/Size  Range/Validation    Req’d  Key
CD_Nbr            CD        Unique number for each CD                A(10)                          Y      Y
CD_Category       CD        Category of music                        A(24)                          N      N
CD_Title          CD        Title of CD                              A(80)                          Y      N
CD_Date_Borrowed  CD        Date CD borrowed (if any)                Date                           N      N
CD_Borrow_Name    CD        Name of current borrower of CD (if any)  A(30)      Existing Borr_Name  N      N
Borr_Name         Borrower  Name of borrower                         A(30)                          Y      Y
Borr_Address      Borrower  Address of borrower                      A(100)                         Y      N
Borr_Phone_Nbr    Borrower  Phone number of borrower                 A(12)                          N      N
Track_CD_Nbr      Track     CD number of this track                  A(10)      Existing CD_Nbr     Y      Y
Track_Nbr         Track     Track number(s)                          A(6)                           Y      Y
Track_Title       Track     Title of track                           A(80)                          N      N
Track_Performers  Track     Performers of track(s)                   A(120)                         N      N
Track_Length      Track     Length of track(s)                       Time                           N      N

Entity/event matrix. No reports are shown in this list of events but many could be included.

Events: Add CD; Add borrower; Modify CD details; Modify borrower details; Modify track details; Delete CD; Delete borrower; Borrow CD; Return CD
Entity columns: CD, Borr, Track
Matrix entries: C C C M M R D R M M M D R D R R

Entity Life Histories.

Borrower
    Add Borrower
    Borrower Life
        * Modify Borrower Details
    Delete Borrower

Track
    Add CD
    Track Life
        * Modify Track Details
    Delete CD

CD
    Add CD
    CD Life
        * Modify CD
            o Modify CD Details
            o Loan
                Borrow CD
                Return CD
    Delete CD
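The life history for CD combines all three constructs (sequence, selection and repetition), so it can be checked mechanically. The following is only a minimal sketch, not part of the unit materials: it assumes that each event is given a one-letter code and that the life history is written as a regular expression. The codes, the name `is_allowed` and the use of Python are illustrative assumptions.

```python
import re

# Hypothetical one-letter codes for the events in the CD life history.
EVENT_CODES = {
    "Add CD": "A",
    "Modify CD Details": "M",
    "Borrow CD": "B",
    "Return CD": "R",
    "Delete CD": "D",
}

# Sequence: Add CD, then CD Life, then Delete CD.
# CD Life is a repetition (*) of Modify CD.
# Modify CD is a selection (o) between Modify CD Details and Loan.
# Loan is the sequence Borrow CD followed by Return CD.
CD_LIFE_HISTORY = re.compile(r"A(?:M|BR)*D")


def is_allowed(events):
    """Return True if the list of event names fits the CD life history."""
    try:
        coded = "".join(EVENT_CODES[name] for name in events)
    except KeyError:
        return False  # an event that is not part of this life history
    return CD_LIFE_HISTORY.fullmatch(coded) is not None


if __name__ == "__main__":
    print(is_allowed(["Add CD", "Delete CD"]))
    print(is_allowed(["Add CD", "Borrow CD", "Delete CD"]))
```

Under these assumptions a CD that is added and then deleted is accepted (the repetition allows zero repeats), while a CD that is deleted before being returned is rejected, because Loan is a sequence that must be completed.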