Chapter 12 Designing Databases Chapter Objectives Chapter 12 introduces students to database design, discussing both logical and physical database design. During logical design, logical data models are created for each known user interface, the logical models for each interface are integrated into a consolidated logical database model, the application’s conceptual E-R data model is translated into normalized data requirements, and the logical database model is then integrated with the translated E-R model. During physical design, decisions about data types, data structures, file organizations, and media are made. This chapter introduces the relational data model, the most common notation used for representing detailed data requirements necessary for database design. Concepts of the relational data model, normalization principles for creating relational models with desirable properties, a process for combining different relational data models into a consolidated relational data model, and how to translate an entity-relationship data model into a relational data model are presented. Chapter 12 reviews several choices systems builders have for the design of physical data storage structures. Chapter 12 emphasizes those decisions for which a systems analyst is most likely to have input, as opposed to the very technical data structure decisions made by database administrators and analysts. You should emphasize to your students throughout this section that physical design issues are addressed as part of a team of system analysts and technology specialists. Systems analysts bring an in-depth understanding of the application to the table, whereas technology specialists intimately understand the relative efficiency, security, integrity, and reliability of different technologies in different settings. To communicate with these specialists, systems analysts must have a sound knowledge of physical design issues, which is accomplished from studying the chapters in this section. Some students are attracted to the more technical topics (possibly because of prior exposure to computer science topics), and others are interested in the more methodological and managerial issues. Thus, another challenge of teaching Chapter 12, as well as Chapters 15 and 16, is keeping a balance of business and technical issues so that you hold the attention of all students. Chapter 12 provides a transition from typical systems analysis topics to data analysis topics; these topics are often discussed in database management courses. You should coordinate the teaching of this chapter with database management course faculty so that purposeful redundancy occurs and important topics are not missed across the courses in your curriculum. Instructional Objectives Specific student learning objectives are included at the beginning of the chapter. From an instructor’s point of view, the objectives of this chapter are to: 1. Show the relationship between systems analysis and design and database design. The philosophy of this textbook is that database design is a topic of joint interest between systems analysts and database specialists. In most cases, considerable interaction between conceptual and logical database design exists, combining the top-down approach of conceptual data 149 Modern Systems Analysis and Design, 3rd edition Instructor’s Manual modeling with a bottom-up approach using logical data modeling tools. Application area knowledge and enterprise database knowledge (often found in a data administration group) are necessary to converge to a viable logical design for an application’s database. Remember, an application’s logical database design does not imply a separate physical database for that application, but rather only a separate view of data, which may be part of a more comprehensive physical database. 2. Show the relationship between logical database design and physical database design. An important point under this objective is that it is the job of the analyst to capture during prior systems development phases--especially analysis and logical design--all of the parameters needed to make physical system design decisions. Although some information is not necessary for the techniques of prior stages, this information (such as field length, data integrity requirements, and an estimated number of entity instances) is essential for physical system design. 3. Present the relational data model as a logical data model that captures the structure of data in a very fundamental, stable form and suggests ways to organize data during physical database design, resulting in desirable data maintenance properties (which avoid certain data anomalies). 4. Show students how a conceptual data model can obscure some details about data requirements that must be better understood in order to perform physical database design. 5. Show students, using an example from Hoosier Burger, how to translate a conceptual data model into a logical data model and how to incorporate the data requirements of specific system outputs into the process of forming a logical data model. 6. Improve the ability of students who will take systems analyst jobs to communicate with technical specialists on systems development teams. 7. Emphasize the distinction between logical and physical system design by coverage of such topics as denormalization; such topics clarify the different objectives of logical and physical design (which is basically semantic richness of describing information requirements versus efficient and secure data processing). 8. Discuss database design for Internet-based electronic commerce applications. Classroom Ideas 1. This chapter, like Chapter 10, covers a topic addressed in most database management courses. Depending on your curriculum, this chapter may review previously covered material or introduce material covered (in more depth) in a subsequent course. However, logical database design is not strictly a database topic, but is essential for thorough systems analysis, thus it is an activity that should not be assigned to only specialists (database analysts). Although you are strongly encouraged to cover this chapter in your systems analysis and design course, you should coordinate how you address this topic with those who teach database courses. Chapter 12 is carefully written for the systems analysis and design student. This chapter is an excellent refresher for students who have studied the relational data model and normalization and provides a solid introduction to these topics for those students who will address this topic later in a database management course. 150 Chapter 12 Designing Databases: Logical Data Modeling 2. Emphasize to your students that logical data modeling is still technology independent. A logical data model is not necessarily mapped on a one-to-one basis into a physical database design. The purpose of logical database modeling is to prepare the description of stored data requirements into a format that makes it easier for physical database design decisions to be made. Students may want to go directly from conceptual data modeling to physical database design, so spend some time discussing the purpose of logical database modeling. 3. Review the key steps of logical database modeling. It is important that novice data modelers understand that all four steps are necessary to produce a thorough logical data model as input to physical database design. 4. Understanding the relational data model is critical for doing logical database modeling. The relational model is fairly simple, and since most students have experience with a PC database management system, this model is intuitive for them. Emphasize the five properties of relations and the concept of anomalies. 5. You can introduce normalization to your students from two perspectives. One approach is to introduce first, second, and third normal forms and teach your students to transform unnormalized data into third normal form by stepping through each normal form in sequence. Another approach is to emphasize functional dependencies and determinants (see Problem and Exercise 6). Use whichever approach is most comfortable for you; this chapter supports either approach. 6. Consider spending a significant portion of your class periods (allocated to Chapter 12) working problems that show how to translate between E-R and relational data models. Table 12-1 is a compact summary of how to map E-R constructs into relational constructs. Your students should become competent with translating in either direction. Problems and Exercises 2, 3, 4, and 6 are suitable for in-class exercises, but you should create other examples. Work a few examples for your students, and then have your students work (either individually or in small teams) on several problems in class and then present their answers. Practice is the best teacher of both normalization and translating between relational and E-R models. 7. Emphasize to your students that most E-R models developed during analysis are incomplete since system inputs and outputs are not designed in detail until logical design. Use this explanation to motivate the need for view integration. Be sure to discuss the potential pitfalls (view integration problems) that make view integration more than a mechanical process. Again, use many examples; Problem and Exercise 4 is a fairly simple one that students can work on inside or outside of class. Ask your students why view integration problems arise; for example, sample problems include independent analysts or project teams of different subsystems with slightly different data semantics, the integration of multiple independently-developed applications coming together to create an enterprise data model, and imprecision or lack of naming standards by analysts. 8. You should review in class the Hoosier Burger example found in the “Logical Database Design For Hoosier Burger” section. This example illustrates how to deal with both translating an E-R model into 3NF relations as well as integrating specific system requirements into the data model during logical database design. 151 Modern Systems Analysis and Design, 3rd edition Instructor’s Manual 9. You should emphasize to your students that physical database design is technology dependent, as well as logical requirements dependent. To make physical data storage decisions, one must know what constraints and opportunities available technologies have. What file organizations do the available database management systems have? What data types do the operating system and other system software support? What are the physical characteristics of secondary memory devices and what overhead space does data management software require? These are questions that technical specialists can answer; systems analysts, with application knowledge, along with technical specialists armed with answers to these questions can together perform physical file and database design. 10. Discuss with your students the role of the CASE repository as a central source of information necessary to make physical design decisions and as a place where these decisions are stored. You can also point out the interactive nature of some logical and physical design issues, such as choosing field data types, and that a CASE repository helps to synchronize logical and physical design activities. For example, when prototyping computer displays or reports with many design tools, analysts may have to select the data type and length of fields before these fields are placed on the prototype. If this display or report generator is a module of the CASE tool, then it draws on the CASE repository for necessary information in building the user interface. 11. Emphasize the important role analysts play in designing data integrity controls. Analysts are essential because designers need an in-depth knowledge of the business area and application to choose default values, picture formats, and range constraints; to determine whether null values should be allowed; and if a null value is present, how to process the data. 12. Review the types of file organizations. Discuss the seven factors that analysts consider when selecting a file organization. Have students evaluate each file organization with respect to each of the seven factors. Finally, review Table 12-3 as a way summarize the discussion. 13. Often when students prepare system development projects for the systems analysis and design course, they do not include adequate file and database controls into the system. Call your students’ attention to the section “Designing Controls for Files” and emphasize that controls are essential elements of a system and must be designed into the system, not added as an afterthought. 152 Chapter 12 Designing Databases: Logical Data Modeling Answers to Key Terms Suggested answers are provided below. These answers are presented top-down, left to right. 21. 28. 13. 8. 23. 27. 7. 20. 19. 26. 10. 5. 2. 1. Relation Well-structured relation Normalization Functional dependency Second normal form (2NF) Third normal form (3NF) Foreign key Referential integrity Recursive foreign key Synonyms Homonym Field Data type Calculated (or computed or derived) field 3. 14. 16. 4. 15. 6. 17. 25. 12. 11. 24. 9. 18. 22. Default value Null value Physical table Denormalization Physical file File organization Pointer Sequential file organization Indexed file organization Index Secondary key Sequential file organization Primary key Relational database model Answers to Review Questions 1. The purpose of normalization is to rid relations of anomalies. The goal is to form wellstructured relations that are simple and stable when data values change or data are added or deleted. 2. The five properties of relations are entries in columns are simple, entries in columns are from the same set of values, each row is unique, the sequence of columns is insignificant, and the sequence of rows is insignificant. 3. Synonyms, homonyms, transitive dependencies, and class/subclass relationships can arise during view integration. Synonyms occur when two or more different names are used for the same attribute from different user views. Homonyms occur when two or more attributes from different user views have the same name. Transitive dependencies are functional dependencies between non-key attributes that arise when functionally dependent non-keys come from different user views. Class/subclass relationships refer to relations representing the same entity from different user views actually represent different subsets of the same entity type. 4. Relationships between entities are represented in several ways in the relational data model. A binary 1:M relationship is represented by placing a foreign key (the primary key of the entity on the one side of the relationship) in the relation for the entity on the many side of the relationship. In a binary 1:1 relationship, a foreign key is placed in the relation on either side of the relationship or on both sides. For a binary and higher degree M:N relationship, a relation is created with a primary key, which is the concatenation of the primary keys from the related entities. In a unary relationship, a recursive foreign key is added to the relation. 153 Modern Systems Analysis and Design, 3rd edition Instructor’s Manual 5. The fundamental rule of normalization is that each non-key attribute must be fully functionally dependent on the whole primary key attribute (a non-key is dependent on the whole key and nothing but the key). Thus, there can be no functional dependencies between non-keys. 6. A foreign key is identified by using a dashed underline. 7. Instances in a relation cannot prove that a functional dependency exists; however, you can use sample data to demonstrate that a functional dependency does not exist. The sample data does not show you every possible instance, only a sampling. Knowledge of the problem domain is a reliable method for identifying functional dependencies. 8. The choice of data type often limits the possible values that may be stored for a field. For example, a numeric data type forbids alphabetic characters. Some data types have an assumed length (e.g., SMALLINT) that places an implicit range control on values. Data type may also limit the kinds of data manipulations possible, thus further controlling the integrity of the data or results from manipulating the data. For example, a DATE data type causes addition and subtraction to be limited by rules about dates. 9. A referential integrity control requires the data management software to access other data records to determine if the value is permitted, whereas a range control is checked by looking up values outside the files and database, in a repository or other source of metadata. 10. The purpose of denormalization is to physically locate data close to one another if they are often needed together for processing, thus minimizing secondary memory I/O operations. Because normalization forces all attributes dependent on the same primary key to be logically placed in one relation, a relation can become quite diverse with many attributes. For example, a PART relation might have attributes related to engineering, production, accounting, and marketing. It is likely that in only rare instances attributes from two or more of these areas are needed in the same data processing steps. If the attributes were stored in the same physical record, the record would be long and it would take more time to access this file than if the file were divided into segments and each segment had the same primary key and only fields for those attributes used together. 11. The factors that influence the decision to create an index are the data retrieval, insertion, deletion, and updating costs with and without the index. Indexes allow for rapid random retrieval and sorting of data, but indexes create additional storage and maintenance costs. 12. Data compression techniques are pattern matching and other methods that replace repeating strings of characters with codes of shorter lengths, thus reducing data storage requirements. 13. The two goals of designing physical tables are the efficient use of secondary storage and data processing speed. 14. The seven factors to considered when selecting a file organization are: (1) fast data retrieval, (2) high throughput for processing transactions, (3) efficient use of storage space, (4) protection from failures or data loss, (5) minimizing need for reorganization, (6) accommodating growth, and (7) security from unauthorized use. 154 Chapter 12 Designing Databases: Logical Data Modeling Answers to Problems and Exercises 1. The wording in the problem clarifies some of the relationships and associated cardinalities. Since VENDOR is functionally dependent on COMPNAME, there can be at most one vendor for each component (and we assume that there may be no vendor for a given component). Also, since COMPNAME is functionally dependent on PRODNAME, there can be at most one component per product (a rather odd situation, but that is what the wording says; we assume that some products have no components). Although not clarified in the problem, we assume that a product is assigned to exactly one salesperson, while a salesperson can be assigned one-to-many products. Given these clarifications, the 3NF relations are (foreign keys are in italics): PRODUCT (PRODNAME, SALESPERSON, COMPNAME) SALESPERSON (SALESPERSON) COMPONENT (COMPNAME, VENDOR) VENDOR (VENDOR) 2. Listed below are sample 3NF relations for the conceptual data model diagram in Figure 10-3. We have created a few, representative attributes to make this normalization meaningful (foreign keys are in italics). The SHIPMENT and PRODPLAN relations contain nonkey attributes, whereas the PRODITEM and SUPPITEM relations, the result of the many-to-many relationships, have no nonkey attributes, since these are not shown as associative entities in Figure 10-3. We have not, at this point, created proper primary keys. SUPPLIER (SUPPNAME, SUPPADDRESS) SHIPMENT (SHIPID, SUPPNAME, ITEMNAME, SHIPDATE) ITEM (ITEMNAME, ITEMDESC) SUPPITEM (SUPPNAME, ITEMNAME) PRODUCT (PRODNAME, PRODDESC) PRODITEM (PRODNAME, ITEMNAME) PRODPLAN (PRODPLANNO, ITEMNAME, MASTSCHEDNO, QUANTITY) MASTSCHED (MASTSCHEDNO, PRODNAME, MASTSCHEDDATE) 3. Listed below are 3NF relations for the E-R diagram on Figure 12-21. Foreign keys are in italics. QUOTE QUANTITY is not sufficient as the primary key of the PRICEQUOTE relation, but this attribute along with the primary keys of the two associated entities is a sufficient concatenated (or composite) primary key for this associative entity. The PART RECEIPT entity is called a weak or attributive entity, since its existence depends on a PRICE QUOTE entity instance. In this case, however, the PART RECEIPT entity has its own primary key, ORDER NO. The 3NF relations are: VENDOR (VENDOR NO, ADDRESS) PRICEQUOTE (VENDOR NO, ITEM NO, QUOTE QUANTITY, PRICE) PART (ITEM NO, DESC) PART RECEIPT (ORDER NO, VENDOR NO, ITEM NO, QUOTE QUANTITY, DATE, ORDER QUANTITY) 4. Listed below are merged 3NF relations for this hospital example. This is an interesting exercise because it points out how semantically lacking the relational data model is, since questions arise about functional dependencies across separately developed relations. One observation is clear: The second 3NF PATIENT relation in this exercise has only one value of TREATMENT 155 Instructor’s Manual Modern Systems Analysis and Design, 3rd edition DESCRIPTION, so each patient must be associated with only one treatment, otherwise this relation would not be in 3NF. But, we must also assume that each department has only one supervisor and each supervisor can supervise only one department. This last assumption means that we could create only a supervisor or a department relation, but not both. This is sufficient because, if the original set of six 3NF relations is comprehensive, there are no nonkey attributes dependent on either DEPARTMENT or SUPERVISOR ID. We, however, create both supervisor and department relations, with a one-to-one relationship between them, to allow for some evolution of the data model. One additional assumption about supervisors: SUPERVISOR is a separate entity from PHYSICIAN. We also assume that the attribute ADDRESS means the same address in both PATIENT relations, and further there are no other synonyms or homonyms across the relations. Interestingly, there is no relationship between patient and physician implied in the original 3NF relations, and we assume none exists. With these assumptions, the merged relations are (foreign keys are in italics): PATIENT (PATIENT NO, ADDRESS, ROOM NO, ADMIT DATE, TREATMENT ID) ROOM (ROOM NO, PHONE, DAILY RATE) PHYSICIAN (PHYSICIAN ID, NAME, DEPARTMENT ID) TREATMENT (TREATMENT ID, DESCRIPTION, COST) SUPERVISOR (SUPERVISOR ID, DEPARTMENT ID) DEPARTMENT (DEPARTMENT ID, SUPERVISOR ID) To create the E-R diagram from these 3NF relations, we have to make additional assumptions about minimum cardinalities. We assume that every patient is assigned a room, but a room may be empty; not all treatments have to be associated with a patient, but a patient has to have a treatment; and that each department has one supervisor and each supervisor has one department. We show relationships from both a department and a supervisor to a physician, but only one is necessary; we also assume that a physician must be associated with both a department and a supervisor. This is an interesting E-R diagram since it contains two, disconnected parts. This is possible, although rare in actual organizations. A suggested E-R diagram, including attributes, for this situation is presented below. Application E-R Diagram DESCRIPTION DEPARTMENT ID DEPARTMENT TREATMENT ID SUPERVISOR ID SUPERVISOR Supervises Given COST TREATMENT PATIENT NO ROOM NO Works in Works for PHYSICIAN PATIENT ADDRESS PHYSICIAN ID PHONE DAILY RATE NAME ADMIT DATE Located in 156 ROOM Chapter 12 Designing Databases: Logical Data Modeling 5. There are several foreign keys in these relations. OFFICER ID is a foreign key in OFFICE referencing MEMBER ID from the MEMBER relation. OFFICE NAME is a foreign key in EXPENSE referencing OFFICE NAME in the OFFICE relation. OFFICER IN CHARGE is a foreign key in COMMITTEE referring to OFFICER ID or OFFICE NAME (which is not clear from simply the relations) in the OFFICE relation. EXPENSE LEDGER NUMBER is a foreign key in PAYMENT referencing LEDGER NUMBER in the EXPENSE relation. MEMBER ID in both RECEIPT and WORKERS cross-references MEMBER ID in the MEMBER relation. COMMITTEE ID in WORKERS cross-references COMMITTEE ID in COMMITTEE. See the accompanying E-R diagram. For simplicity, we do not show attributes on this E-R diagram. The WORKERS relation exists because of a many-to-many relationship, Works On, between MEMBER and COMMITTEE. It is inferred that a member sometimes has many receipts, but a receipt must have a member. An expense sometimes has multiple payments, but each payment must have an expense. Each office sometimes has multiple expenses, but each expense must have an office. Each office may have a member as an officer-in-charge, and each member sometimes holds many offices. An office sometimes is responsible for many committees, and each committee must have an office in charge (although that office may not have a member assigned as officer). Committees sometimes have many workers, and each worker sometimes works on many committees. The ER diagram is more expressive in that it displays explicitly the minimum cardinalities of relationships and shows exactly which entities are related. Fraternity E-R Diagram MEMBER Submits RECEIPT Incurs EXPENSE Holds Works on OFFICE In Charge of Reimburse COMMITTEE PAYMENT 6. Since there are four determinants among the functional dependencies, there will be four relations. The last functional dependency, the one with only a three-key composite determinant, 157 Instructor’s Manual Modern Systems Analysis and Design, 3rd edition signifies all the dates on which a particular applicant interviewed for a particular position. This functional dependency does not signify a many-to-many relationship, like many composite keys do, since date interviewed is itself not a determinant. It signifies an entity with a threecomponent composite key. The four 3NF relations are: APPLICANT (APPLICANT ID, APPLICANT NAME, APPLICANT ADDRESS) POSITION (POSITION ID, POSITION TITLE, DATE POSITION OPENS, DEPARTMENT) APPLICATION (APPLICANT ID, POSITION ID, DATE APPLIED) INTERVIEW(APPLICANT ID, POSITION ID, DATE INTERVIEWED) A suggested E-R diagram is provided below. For clarity, we show composite keys as a single composite attribute. APPLICANT ID + POSITION APPLICANT ID APPLICANT POSITION ID DATE APPLIED Applies APPLICATION Responds to POSITION POSITION TITLE APPLICANT ADDRESS DEPARTMENT APPLICANT NAME Interviews for DATE POSITION OPENS APPLICANT ID + POSITION ID + DATE INTERVIEWED INTERVIEW 7. The objectives of a good coding scheme are to minimize storage space and to increase data integrity. Student major is a classical example of a sparse field that can benefit from being codified for storage and data entry. For on-line data entry, you could provide a list of possible majors from which the data entry person must choose; this legal list of codes will need to be updated, but this can probably be done separately from data entry of student data. A fundamental coding choice is whether to use codes that are as dense as possible or to try to use reasonably dense codes that have some meaning to most users (e.g., MIS versus 31 for a Management Information Systems major). Short character string codes (e.g., three alphabetic characters) may take as little storage as a two digit numeric code, so a short character string may achieve both objectives. An interesting way to approach this question is to have your students identify how their university and at least two other universities represent a student’s major. Have your students compare and contrast these coding schemes. It is likely that they will locate universities where the code is numeric, alphabetic, or alphanumeric. 158 Chapter 12 Designing Databases: Logical Data Modeling 8. Suggestions for the primary keys are underlined in the following relations. Attributes appearing in italics serve as foreign keys. VENDOR (VENDOR NO, ADDRESS) PRICEQUOTE (VENDOR NO, ITEM NO, QUOTE QUANTITY, PRICE) PART (ITEM NO, DESC) PART RECEIPT (ORDER NO, VENDOR NO, ITEM NO, QUOTE QUANTITY, DATE, ORDER QUANTITY) If VENDOR NO, ITEM NO, and ORDER NO are numeric values assigned without any relationship to the associated entities, then these numbers would not change as the real world changes, and they would be acceptable as primary keys or components of composite primary keys. QUOTE QUANTITY, on the other hand, is likely volatile, and is not suitable as part of the primary key for the PRICEQUOTE table. We need to create a nonintelligent primary key for the PRICEQUOTE table. Also, we still may need individual secondary index keys on VENDOR NO and ITEM NO attributes from this relation to facilitate joining the PRICEQUOTE table with the VENDOR and PART tables, respectively. 9. The guidelines for identifying keys for indexing suggest that attributes used for selection, sorting, grouping, and joining are potential candidates. We are asked to consider the three queries as the only accesses to the database, so we do not implicitly need primary key indices. We do not know the frequency of the three queries compared to update operations, so it is impossible to make precise, optimal decisions on the most economical indices. Space does not seem to be an issue since, as we will see, none of the qualifying fields are very long. So, we indicate all possible indices that might speed query processing. The first query, Query A, appears to access the PART and PRICEQUOTE tables, but actually the PRICEQUOTE table is sufficient. The E-R diagram indicates that every part has at least one vendor, so every part has at least one price quote. Thus, all parts appear at least once in the PRICEQUOTE table. The data reported in Query A (ITEM NO, VENDOR NO, QUOTE QUANTITY, and QUOTE PRICE) are found in the PRICEQUOTE table. The query asks for the results to be sorted by ITEM NO and by VENDOR NO (but not by quote quantity). A composite index on these two attributes is the most efficient way to directly produce this sorted output, thus avoiding the need to do a sort of the data once it is retrieved. This assumes that the DBMS or file system utilizes a composite index when it sees the compound sorting condition. Thus, for Query A, a secondary composite index on first ITEM NO and then VENDOR NO is ideal. The second query, Query B, asks to display all the attributes from the PART RECEIPT and PART tables; since ITEM NO is an attribute in the PART RECEIPT relation, we do not need the PRICEQUOTE relation to link the PART RECEIPT and PART tables. No sorting is done, but the query wants data for only a specified part receipt date. Assuming that many days of part receipt data are kept, then we would want to create a secondary key index on DATE and a secondary key index on ITEM NO in the PART RECEIPT table and a primary key index on ITEM NO in the PART table to efficiently support the selection and joining needed for Query B. The third query, Query C, involves selection of a particular vendor and display of attributes from all associated PRICEQUOTE rows for that vendor; thus, all the data needed for this query can be found in the PRICEQUOTE table. No sorting is mentioned. Thus, for Query C, only a secondary key index on VENDOR NO in the PRICEQUOTE table would support efficient processing. 159 Modern Systems Analysis and Design, 3rd edition Instructor’s Manual 10. This question highlights the difficulty of setting useful default values. Our analysis also suggests the potential difficulty of choosing a range check on the age attribute. The bulk of students at most universities typically range in age from 18 to 23. Some students attend university before age 18. For example, some students graduate from high school at age 17 or opt to take a high school equivalency examination and finish high school early (e.g., while younger than 18). In addition, sometimes high school students attend college classes while still enrolled in high school. Finally, there is the occasional child prodigy who attends college at an earlier than usual age. For the upper limit on the age range, there will obviously be some students older than 23, particularly if there are undergraduate students who are working adults or who return to school after other careers (such as military service or child rearing). Similarly, if there are graduate students at this university then there will most certainly be students older than 23. In theory, a student’s age could reach three digits. If we could have only one default value, we could study several years of student data and pick the modal age value. If we could have default values conditional on student type (e.g., undergraduate, graduate, returning adult), then we could use the modal values for each student type. Thus, we would want values for student type to be entered before student age. Alternatively, an impertinent student might respond to this question by saying why even store age. Store the birthday and let the system calculate age when needed. 11. One suggested answer is a situation where an employee needs to know a contact name for a vendor, a quoted price for a particular item, and a description for the item. If the tables are denormalized, this query requires the joining of only two tables; however, if we are using normalized relations, this query requires the joining of all three relations. Guidelines for Using the Field Exercises 1. Students can search through academic journals, textbooks, or the Web. They are likely to find information about other normal forms such as Boyce-Codd normal form (remove remaining anomalies resulting from functional dependencies), fourth normal form (removing multivalued dependencies), and fifth normal form (removing remaining anomalies). Students are likely to encounter these other normal forms in a database class (often from a Computer Science department) or on the job, particularly in a job where they are developing a relatively large, complex database. As with all normal forms, each rids the data model of potential redundancies and inconsistencies. 2. Normalized relations, a set of specifications specifying the format and structure of the data in secondary storage, and an updated CASE repository are several of the deliverables from file and database design. 3. Students should identify several types of information that are most useful for file and database design. Information as it relates to primary keys, secondary keys, data types, business rules, relationships, data integrity control methods, volume, present and future required data storage space, and file organization should be collected. 4. There are probably a number of DBMSs available at the student’s university. For example, the student’s PC laboratory is likely to have at least one DBMS available; perhaps Microsoft Access for Windows is available. A client-server or object-oriented DBMS might be available as well. In addition, the university’s administrative database applications are probably run on a mainframe or minicomputer. A DBMS such as IBM’s DB2 or the Oracle DBMS are probably being used. The data types supported by these DBMSs provide important criteria for 160 Chapter 12 Designing Databases: Logical Data Modeling determining what types of applications each of these DBMSs is best suited. PC DBMSs tend to have fewer physical file and database design options than do mini- or mainframe computerbased DBMSs. In addition, students should consider the number of records and physical file organizations that the DBMS allows, the available storage space on physical storage mediums, the record locking and other security features that the DBMS provides, the availability of a CASE tool, and other factors. 5. Your students should find that many of the physical file and database design issues presented in the chapter are dealt with at their university. For instance, storage space, file organizations, and data types are issues that must be addressed. Guidelines for Using the Broadway Entertainment Company Cases Guidelines and answers for using the Broadway Entertainment Company cases are available on this textbook’s companion Web site. Please visit www.prenhall.com/hoffer to access this information. 161