Chapter 15 Developing Databases: Logical Data

Chapter 12
Designing Databases
Chapter Objectives
Chapter 12 introduces students to database design, discussing both logical and physical database
design. During logical design, logical data models are created for each known user interface, the
logical models for each interface are integrated into a consolidated logical database model, the
application’s conceptual E-R data model is translated into normalized data requirements, and the
logical database model is then integrated with the translated E-R model. During physical design,
decisions about data types, data structures, file organizations, and media are made.
This chapter introduces the relational data model, the most common notation used for representing
detailed data requirements necessary for database design. Concepts of the relational data model,
normalization principles for creating relational models with desirable properties, a process for
combining different relational data models into a consolidated relational data model, and how to
translate an entity-relationship data model into a relational data model are presented.
Chapter 12 reviews several choices systems builders have for the design of physical data storage
structures. Chapter 12 emphasizes those decisions for which a systems analyst is most likely to have
input, as opposed to the very technical data structure decisions made by database administrators and
analysts. You should emphasize to your students throughout this section that physical design issues
are addressed by a team of systems analysts and technology specialists. Systems analysts bring
an in-depth understanding of the application to the table, whereas technology specialists intimately
understand the relative efficiency, security, integrity, and reliability of different technologies in
different settings. To communicate with these specialists, systems analysts must have a sound
knowledge of physical design issues, which they gain by studying the chapters in this
section. Some students are attracted to the more technical topics (possibly because of prior exposure
to computer science topics), and others are interested in the more methodological and managerial
issues. Thus, another challenge of teaching Chapter 12, as well as Chapters 15 and 16, is keeping a
balance of business and technical issues so that you hold the attention of all students.
Chapter 12 provides a transition from typical systems analysis topics to data analysis topics; these
topics are often discussed in database management courses. You should coordinate the teaching of
this chapter with database management course faculty so that purposeful redundancy occurs and
important topics are not missed across the courses in your curriculum.
Instructional Objectives
Specific student learning objectives are included at the beginning of the chapter. From an
instructor’s point of view, the objectives of this chapter are to:
1. Show the relationship between systems analysis and design and database design. The
philosophy of this textbook is that database design is a topic of joint interest between systems
analysts and database specialists. In most cases, considerable interaction between conceptual
and logical database design exists, combining the top-down approach of conceptual data
Modern Systems Analysis and Design, 3rd edition
Instructor’s Manual
modeling with a bottom-up approach using logical data modeling tools. Application area
knowledge and enterprise database knowledge (often found in a data administration group) are
necessary to converge to a viable logical design for an application’s database. Remember, an
application’s logical database design does not imply a separate physical database for that
application, but rather only a separate view of data, which may be part of a more comprehensive
physical database.
2. Show the relationship between logical database design and physical database design. An
important point under this objective is that it is the job of the analyst to capture during prior
systems development phases--especially analysis and logical design--all of the parameters
needed to make physical system design decisions. Although some information is not necessary
for the techniques of prior stages, this information (such as field length, data integrity
requirements, and an estimated number of entity instances) is essential for physical system
design.
3. Present the relational data model as a logical data model that captures the structure of data in a
very fundamental, stable form and suggests ways to organize data during physical database
design, resulting in desirable data maintenance properties (which avoid certain data anomalies).
4. Show students how a conceptual data model can obscure some details about data requirements
that must be better understood in order to perform physical database design.
5. Show students, using an example from Hoosier Burger, how to translate a conceptual data model
into a logical data model and how to incorporate the data requirements of specific system outputs
into the process of forming a logical data model.
6. Improve the ability of students who will take systems analyst jobs to communicate with technical
specialists on systems development teams.
7. Emphasize the distinction between logical and physical system design by coverage of such
topics as denormalization; such topics clarify the different objectives of logical and physical
design (which is basically semantic richness of describing information requirements versus
efficient and secure data processing).
8. Discuss database design for Internet-based electronic commerce applications.
Classroom Ideas
1. This chapter, like Chapter 10, covers a topic addressed in most database management courses.
Depending on your curriculum, this chapter may review previously covered material or
introduce material covered (in more depth) in a subsequent course. However, logical database
design is not strictly a database topic; it is essential for thorough systems analysis and is therefore an
activity that should not be assigned only to specialists (database analysts). Although you are
strongly encouraged to cover this chapter in your systems analysis and design course, you should
coordinate how you address this topic with those who teach database courses. Chapter 12 is
carefully written for the systems analysis and design student. This chapter is an excellent
refresher for students who have studied the relational data model and normalization and provides
a solid introduction to these topics for those students who will address this topic later in a
database management course.
2. Emphasize to your students that logical data modeling is still technology independent. A logical
data model is not necessarily mapped on a one-to-one basis into a physical database design. The
purpose of logical database modeling is to put the description of stored data requirements
into a format that makes physical database design decisions easier to make. Students
may want to go directly from conceptual data modeling to physical database design, so spend
some time discussing the purpose of logical database modeling.
3. Review the key steps of logical database modeling. It is important that novice data modelers
understand that all four steps are necessary to produce a thorough logical data model as input to
physical database design.
4. Understanding the relational data model is critical for doing logical database modeling. The
relational model is fairly simple, and since most students have experience with a PC database
management system, this model is intuitive for them. Emphasize the five properties of relations
and the concept of anomalies.
5. You can introduce normalization to your students from two perspectives. One approach is to
introduce first, second, and third normal forms and teach your students to transform
unnormalized data into third normal form by stepping through each normal form in sequence.
Another approach is to emphasize functional dependencies and determinants (see Problem and
Exercise 6). Use whichever approach is most comfortable for you; this chapter supports either
approach.
6. Consider spending a significant portion of your class periods (allocated to Chapter 12) working
problems that show how to translate between E-R and relational data models. Table 12-1 is a
compact summary of how to map E-R constructs into relational constructs. Your students
should become competent with translating in either direction. Problems and Exercises 2, 3, 4,
and 6 are suitable for in-class exercises, but you should create other examples. Work a few
examples for your students, and then have your students work (either individually or in small
teams) on several problems in class and then present their answers. Practice is the best teacher
of both normalization and translating between relational and E-R models.
7. Emphasize to your students that most E-R models developed during analysis are incomplete
since system inputs and outputs are not designed in detail until logical design. Use this
explanation to motivate the need for view integration. Be sure to discuss the potential pitfalls
(view integration problems) that make view integration more than a mechanical process. Again,
use many examples; Problem and Exercise 4 is a fairly simple one that students can work on
inside or outside of class. Ask your students why view integration problems arise; for example,
sample problems include independent analysts or project teams of different subsystems with
slightly different data semantics, the integration of multiple independently-developed
applications coming together to create an enterprise data model, and imprecision or lack of
naming standards by analysts.
8. You should review in class the Hoosier Burger example found in the “Logical Database Design
For Hoosier Burger” section. This example illustrates how to deal with both translating an E-R
model into 3NF relations as well as integrating specific system requirements into the data model
during logical database design.
9. You should emphasize to your students that physical database design is technology dependent,
as well as logical requirements dependent. To make physical data storage decisions, one must
know what constraints and opportunities available technologies have. What file organizations
do the available database management systems have? What data types do the operating system
and other system software support? What are the physical characteristics of secondary memory
devices and what overhead space does data management software require? These are questions
that technical specialists can answer; systems analysts, with application knowledge, along with
technical specialists armed with answers to these questions can together perform physical file
and database design.
10. Discuss with your students the role of the CASE repository as a central source of information
necessary to make physical design decisions and as a place where these decisions are stored.
You can also point out the interactive nature of some logical and physical design issues, such as
choosing field data types, and that a CASE repository helps to synchronize logical and physical
design activities. For example, when prototyping computer displays or reports with many design
tools, analysts may have to select the data type and length of fields before these fields are placed
on the prototype. If this display or report generator is a module of the CASE tool, then it draws
on the CASE repository for necessary information in building the user interface.
11. Emphasize the important role analysts play in designing data integrity controls. Analysts are
essential because designers need an in-depth knowledge of the business area and application to
choose default values, picture formats, and range constraints; to determine whether null values
should be allowed; and if a null value is present, how to process the data.
12. Review the types of file organizations. Discuss the seven factors that analysts consider when
selecting a file organization. Have students evaluate each file organization with respect to each
of the seven factors. Finally, review Table 12-3 as a way to summarize the discussion.
13. Often when students prepare system development projects for the systems analysis and design
course, they do not include adequate file and database controls into the system. Call your
students’ attention to the section “Designing Controls for Files” and emphasize that controls are
essential elements of a system and must be designed into the system, not added as an
afterthought.
Answers to Key Terms
Suggested answers are provided below. These answers are presented top-down, left to right.
21. Relation
28. Well-structured relation
13. Normalization
8. Functional dependency
23. Second normal form (2NF)
27. Third normal form (3NF)
7. Foreign key
20. Referential integrity
19. Recursive foreign key
26. Synonyms
10. Homonym
5. Field
2. Data type
1. Calculated (or computed or derived) field
3. Default value
14. Null value
16. Physical table
4. Denormalization
15. Physical file
6. File organization
17. Pointer
25. Sequential file organization
12. Indexed file organization
11. Index
24. Secondary key
9. Hashed file organization
18. Primary key
22. Relational database model
Answers to Review Questions
1. The purpose of normalization is to rid relations of anomalies. The goal is to form well-structured
relations that are simple and stable when data values change or data are added or deleted.
2. The five properties of relations are: entries in columns are simple, entries in columns are from the
same set of values, each row is unique, the sequence of columns is insignificant, and the
sequence of rows is insignificant.
3. Synonyms, homonyms, transitive dependencies, and class/subclass relationships can arise during
view integration. Synonyms occur when two or more different names are used for the same
attribute from different user views. Homonyms occur when two or more attributes from
different user views have the same name. Transitive dependencies are functional dependencies
between non-key attributes that arise when functionally dependent non-keys come from different
user views. Class/subclass relationships arise when relations that appear to represent the same entity in
different user views actually represent different subsets of the same entity type.
4. Relationships between entities are represented in several ways in the relational data model. A
binary 1:M relationship is represented by placing a foreign key (the primary key of the entity on
the one side of the relationship) in the relation for the entity on the many side of the relationship.
In a binary 1:1 relationship, a foreign key is placed in the relation on either side of the
relationship or on both sides. For binary and higher-degree M:N relationships, a relation is
created whose primary key is the concatenation of the primary keys from the related
entities. In a unary relationship, a recursive foreign key is added to the relation.
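The mapping rules in this answer can be sketched for classroom use with Python's built-in sqlite3 module. The table and column names (department, employee, project, assignment) are hypothetical and do not come from the textbook; the sketch only shows where each kind of key is placed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
-- Entity on the "one" side of a binary 1:M relationship
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);

-- Entity on the "many" side carries the foreign key (dept_id).
-- manager_id is a recursive foreign key for a unary relationship.
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    emp_name   TEXT NOT NULL,
    dept_id    INTEGER NOT NULL REFERENCES department(dept_id),
    manager_id INTEGER REFERENCES employee(emp_id)
);

CREATE TABLE project (
    proj_id   INTEGER PRIMARY KEY,
    proj_name TEXT NOT NULL
);

-- M:N relationship: a separate relation whose primary key is the
-- concatenation of the primary keys of the related entities.
CREATE TABLE assignment (
    emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
    proj_id INTEGER NOT NULL REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
""")

conn.execute("INSERT INTO department VALUES (1, 'Design')")
conn.execute("INSERT INTO employee VALUES (10, 'Ann', 1, NULL)")
conn.execute("INSERT INTO employee VALUES (11, 'Bob', 1, 10)")
conn.execute("INSERT INTO project VALUES (100, 'Catalog')")
conn.execute("INSERT INTO assignment VALUES (11, 100)")


def insert_is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

Students can call insert_is_rejected with an employee row that names a nonexistent department, or with a duplicate assignment row, to see the foreign key and the composite primary key each reject the insert.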
5. The fundamental rule of normalization is that each non-key attribute must be fully functionally
dependent on the whole primary key attribute (a non-key is dependent on the whole key and
nothing but the key). Thus, there can be no functional dependencies between non-keys.
6. A foreign key is identified by using a dashed underline.
7. Instances in a relation cannot prove that a functional dependency exists; however, you can use
sample data to demonstrate that a functional dependency does not exist. The sample data does
not show you every possible instance, only a sampling. Knowledge of the problem domain is a
reliable method for identifying functional dependencies.
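The asymmetry described in this answer can be demonstrated with a short Python sketch. The function name violates_fd and the sample rows are hypothetical. A result of True disproves the dependency; a result of False means only that this sample does not contradict it.

```python
def violates_fd(rows, determinant, dependent):
    """Return True if the sample rows contradict determinant -> dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = tuple(row[attr] for attr in dependent)
        # The first value seen for a determinant is remembered; any later
        # row with the same determinant but a different value is a violation.
        if seen.setdefault(key, value) != value:
            return True
    return False


# Hypothetical sample data
sample = [
    {"emp_id": 1, "name": "Ann", "dept": "Design"},
    {"emp_id": 2, "name": "Bob", "dept": "Design"},
    {"emp_id": 3, "name": "Ann", "dept": "Sales"},
]

name_determines_dept = not violates_fd(sample, ["name"], ["dept"])
emp_id_not_contradicted = not violates_fd(sample, ["emp_id"], ["name"])
```

Here the two rows for "Ann" show that name does not determine dept. The sample is consistent with emp_id determining name, but only knowledge of the business rules can confirm that dependency.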
8. The choice of data type often limits the possible values that may be stored for a field. For
example, a numeric data type forbids alphabetic characters. Some data types have an assumed
length (e.g., SMALLINT) that places an implicit range control on values. Data type may also
limit the kinds of data manipulations possible, thus further controlling the integrity of the data or
results from manipulating the data. For example, a DATE data type causes addition and
subtraction to be limited by rules about dates.
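As a rough illustration of how a date type restricts the operations allowed, the sketch below uses Python's datetime.date rather than a DBMS DATE column. The variable names are hypothetical, and the analogy to a database data type is an assumption made for teaching purposes.

```python
from datetime import date, timedelta

admit = date(2024, 2, 27)
discharge = admit + timedelta(days=3)  # calendar-aware: 2024 is a leap year
stay = discharge - admit               # subtracting two dates yields a duration


def can_add_dates(a, b):
    """Adding two dates has no meaning, so the type refuses the operation."""
    try:
        a + b
        return True
    except TypeError:
        return False
```

Adding a duration to a date follows calendar rules, while adding one date to another is rejected outright by the type.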
9. A referential integrity control requires the data management software to access other data records
to determine if the value is permitted, whereas a range control is checked by looking up values
outside the files and database, in a repository or other source of metadata.
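The contrast between the two controls can be sketched with sqlite3. The ROOM and PATIENT names echo the hospital exercise, but the rate limits of 50 and 5000 are invented for illustration. The CHECK clause lives in the schema (metadata), whereas the REFERENCES clause forces a lookup of another stored record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE room (
    room_no    INTEGER PRIMARY KEY,
    -- Range control: limits come from the table definition, not from data
    daily_rate REAL NOT NULL CHECK (daily_rate BETWEEN 50 AND 5000)
);
CREATE TABLE patient (
    patient_no INTEGER PRIMARY KEY,
    -- Referential integrity: the value must match an existing room record
    room_no    INTEGER NOT NULL REFERENCES room(room_no)
);
""")
conn.execute("INSERT INTO room VALUES (101, 400.0)")
conn.execute("INSERT INTO patient VALUES (1, 101)")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```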
10. The purpose of denormalization is to physically locate data close to one another if they are often
needed together for processing, thus minimizing secondary memory I/O operations. Because
normalization forces all attributes dependent on the same primary key to be logically placed in
one relation, a relation can become quite diverse with many attributes. For example, a PART
relation might have attributes related to engineering, production, accounting, and marketing. It
is likely that in only rare instances attributes from two or more of these areas are needed in the
same data processing steps. If the attributes were stored in the same physical record, the record
would be long and it would take more time to access this file than if the file were divided into
segments and each segment had the same primary key and only fields for those attributes used
together.
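One way to picture the PART example is to split the logical relation into physical segments that share the same primary key. In the sqlite3 sketch below, the segment names and non-key columns (drawing_no, material, standard_cost, gl_account) are hypothetical, and the view is just one possible way to present the logical relation again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each segment holds only the attributes one business area uses together.
CREATE TABLE part_engineering (
    item_no    INTEGER PRIMARY KEY,
    drawing_no TEXT,
    material   TEXT
);
CREATE TABLE part_accounting (
    item_no       INTEGER PRIMARY KEY,
    standard_cost REAL,
    gl_account    TEXT
);
-- The logical PART relation can still be reassembled when it is needed.
CREATE VIEW part AS
    SELECT e.item_no, e.drawing_no, e.material, a.standard_cost, a.gl_account
    FROM part_engineering AS e
    JOIN part_accounting AS a ON a.item_no = e.item_no;
""")
conn.execute("INSERT INTO part_engineering VALUES (1, 'D-100', 'steel')")
conn.execute("INSERT INTO part_accounting VALUES (1, 12.5, '5010')")

logical_row = conn.execute("SELECT * FROM part WHERE item_no = 1").fetchone()
```

An engineering program reads only part_engineering and so handles shorter records; the cost is a join whenever attributes from both areas are needed at once.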
11. The factors that influence the decision to create an index are the data retrieval, insertion,
deletion, and updating costs with and without the index. Indexes allow for rapid random
retrieval and sorting of data, but indexes create additional storage and maintenance costs.
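The retrieval side of this trade-off can be shown with SQLite's EXPLAIN QUERY PLAN. The table layout, index name, and sample rows below are hypothetical; the exact wording of the plan text differs between SQLite versions, so the sketch only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part_receipt ("
    "order_no INTEGER PRIMARY KEY, item_no INTEGER, receipt_date TEXT)"
)
conn.executemany(
    "INSERT INTO part_receipt VALUES (?, ?, ?)",
    [(1, 1001, "2024-01-04"), (2, 1002, "2024-01-05"), (3, 1001, "2024-01-05")],
)


def query_plan(sql):
    """Return the human-readable detail column of the query plan."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM part_receipt WHERE receipt_date = '2024-01-05'"

plan_before = query_plan(query)  # no usable index: the whole table is scanned
conn.execute("CREATE INDEX idx_receipt_date ON part_receipt (receipt_date)")
plan_after = query_plan(query)   # the new secondary index is used for the search
```

The sketch does not measure the other side of the decision: every insert, delete, or update of receipt_date must now also maintain the index, and the index occupies additional storage.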
12. Data compression techniques are pattern matching and other methods that replace repeating
strings of characters with codes of shorter lengths, thus reducing data storage requirements.
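A quick demonstration uses Python's zlib module, whose DEFLATE method is one example of replacing repeated strings with shorter codes. The sample record content is made up; actual savings depend heavily on how repetitive the data are.

```python
import zlib

# A highly repetitive field value, repeated as it might be across many records
record = b"MANAGEMENT INFORMATION SYSTEMS;" * 40

packed = zlib.compress(record)      # repeated strings become short back-references
restored = zlib.decompress(packed)  # compression is lossless
```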
13. The two goals of designing physical tables are the efficient use of secondary storage and data
processing speed.
14. The seven factors to consider when selecting a file organization are: (1) fast data retrieval, (2)
high throughput for processing transactions, (3) efficient use of storage space, (4) protection
from failures or data loss, (5) minimizing need for reorganization, (6) accommodating growth,
and (7) security from unauthorized use.
Answers to Problems and Exercises
1. The wording in the problem clarifies some of the relationships and associated cardinalities.
Since VENDOR is functionally dependent on COMPNAME, there can be at most one vendor
for each component (and we assume that there may be no vendor for a given component). Also,
since COMPNAME is functionally dependent on PRODNAME, there can be at most one
component per product (a rather odd situation, but that is what the wording says; we assume that
some products have no components). Although not clarified in the problem, we assume that a
product is assigned to exactly one salesperson, while a salesperson can be assigned one-to-many
products. Given these clarifications, the 3NF relations are (foreign keys are in italics):
PRODUCT (PRODNAME, SALESPERSON, COMPNAME)
SALESPERSON (SALESPERSON)
COMPONENT (COMPNAME, VENDOR)
VENDOR (VENDOR)
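Because italics and underlining do not survive in plain text, the keys in these four relations can be made explicit in a sqlite3 sketch. The NOT NULL choices encode the assumptions stated in this answer (exactly one salesperson per product, optional component, optional vendor); the data types and sample value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE salesperson (salesperson TEXT PRIMARY KEY);
CREATE TABLE vendor (vendor TEXT PRIMARY KEY);
CREATE TABLE component (
    compname TEXT PRIMARY KEY,
    vendor   TEXT REFERENCES vendor(vendor)          -- optional foreign key
);
CREATE TABLE product (
    prodname    TEXT PRIMARY KEY,
    salesperson TEXT NOT NULL REFERENCES salesperson(salesperson),
    compname    TEXT REFERENCES component(compname)  -- optional foreign key
);
""")
conn.execute("INSERT INTO salesperson VALUES ('Lee')")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

A product with no component is accepted, while a product with no salesperson or with an unknown salesperson is refused.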
2. Listed below are sample 3NF relations for the conceptual data model diagram in Figure 10-3.
We have created a few, representative attributes to make this normalization meaningful (foreign
keys are in italics). The SHIPMENT and PRODPLAN relations contain nonkey attributes,
whereas the PRODITEM and SUPPITEM relations, the result of the many-to-many
relationships, have no nonkey attributes, since these are not shown as associative entities in
Figure 10-3. We have not, at this point, created proper primary keys.
SUPPLIER (SUPPNAME, SUPPADDRESS)
SHIPMENT (SHIPID, SUPPNAME, ITEMNAME, SHIPDATE)
ITEM (ITEMNAME, ITEMDESC)
SUPPITEM (SUPPNAME, ITEMNAME)
PRODUCT (PRODNAME, PRODDESC)
PRODITEM (PRODNAME, ITEMNAME)
PRODPLAN (PRODPLANNO, ITEMNAME, MASTSCHEDNO, QUANTITY)
MASTSCHED (MASTSCHEDNO, PRODNAME, MASTSCHEDDATE)
3. Listed below are 3NF relations for the E-R diagram in Figure 12-21. Foreign keys are in italics.
QUOTE QUANTITY is not sufficient as the primary key of the PRICEQUOTE relation, but this
attribute along with the primary keys of the two associated entities is a sufficient concatenated
(or composite) primary key for this associative entity. The PART RECEIPT entity is called a
weak or attributive entity, since its existence depends on a PRICE QUOTE entity instance. In
this case, however, the PART RECEIPT entity has its own primary key, ORDER NO. The 3NF
relations are:
VENDOR (VENDOR NO, ADDRESS)
PRICEQUOTE (VENDOR NO, ITEM NO, QUOTE QUANTITY, PRICE)
PART (ITEM NO, DESC)
PART RECEIPT (ORDER NO, VENDOR NO, ITEM NO, QUOTE QUANTITY, DATE,
ORDER QUANTITY)
4. Listed below are merged 3NF relations for this hospital example. This is an interesting exercise
because it points out how semantically lacking the relational data model is, since questions arise
about functional dependencies across separately developed relations. One observation is clear:
The second 3NF PATIENT relation in this exercise has only one value of TREATMENT
DESCRIPTION, so each patient must be associated with only one treatment, otherwise this
relation would not be in 3NF. But, we must also assume that each department has only one
supervisor and each supervisor can supervise only one department. This last assumption means
that we could create only a supervisor or a department relation, but not both. This is sufficient
because, if the original set of six 3NF relations is comprehensive, there are no nonkey attributes
dependent on either DEPARTMENT or SUPERVISOR ID. We, however, create both
supervisor and department relations, with a one-to-one relationship between them, to allow for
some evolution of the data model. One additional assumption about supervisors:
SUPERVISOR is a separate entity from PHYSICIAN. We also assume that the attribute
ADDRESS means the same address in both PATIENT relations, and further there are no other
synonyms or homonyms across the relations. Interestingly, there is no relationship between
patient and physician implied in the original 3NF relations, and we assume none exists. With
these assumptions, the merged relations are (foreign keys are in italics):
PATIENT (PATIENT NO, ADDRESS, ROOM NO, ADMIT DATE, TREATMENT ID)
ROOM (ROOM NO, PHONE, DAILY RATE)
PHYSICIAN (PHYSICIAN ID, NAME, DEPARTMENT ID)
TREATMENT (TREATMENT ID, DESCRIPTION, COST)
SUPERVISOR (SUPERVISOR ID, DEPARTMENT ID)
DEPARTMENT (DEPARTMENT ID, SUPERVISOR ID)
To create the E-R diagram from these 3NF relations, we have to make additional assumptions
about minimum cardinalities. We assume that every patient is assigned a room, but a room may
be empty; not all treatments have to be associated with a patient, but a patient has to have a
treatment; and that each department has one supervisor and each supervisor has one department.
We show relationships from both a department and a supervisor to a physician, but only one is
necessary; we also assume that a physician must be associated with both a department and a
supervisor. This is an interesting E-R diagram since it contains two, disconnected parts. This is
possible, although rare in actual organizations. A suggested E-R diagram, including attributes,
for this situation is presented below.
[Figure: Application E-R Diagram. Entities: DEPARTMENT, SUPERVISOR, PHYSICIAN, PATIENT, ROOM, TREATMENT. Relationships: Supervises, Works in, Works for, Given, Located in. Attributes shown: DEPARTMENT ID, SUPERVISOR ID, PHYSICIAN ID, NAME, PATIENT NO, ADDRESS, ADMIT DATE, ROOM NO, PHONE, DAILY RATE, TREATMENT ID, DESCRIPTION, COST.]
5. There are several foreign keys in these relations. OFFICER ID is a foreign key in OFFICE
referencing MEMBER ID from the MEMBER relation. OFFICE NAME is a foreign key in
EXPENSE referencing OFFICE NAME in the OFFICE relation. OFFICER IN CHARGE is a
foreign key in COMMITTEE referring to OFFICER ID or OFFICE NAME (which is not clear
from simply the relations) in the OFFICE relation. EXPENSE LEDGER NUMBER is a foreign
key in PAYMENT referencing LEDGER NUMBER in the EXPENSE relation. MEMBER ID
in both RECEIPT and WORKERS cross-references MEMBER ID in the MEMBER relation.
COMMITTEE ID in WORKERS cross-references COMMITTEE ID in COMMITTEE.
See the accompanying E-R diagram. For simplicity, we do not show attributes on this E-R
diagram. The WORKERS relation exists because of a many-to-many relationship, Works On,
between MEMBER and COMMITTEE.
It is inferred that a member sometimes has many receipts, but a receipt must have a member. An
expense sometimes has multiple payments, but each payment must have an expense. Each office
sometimes has multiple expenses, but each expense must have an office. Each office may have a
member as an officer-in-charge, and each member sometimes holds many offices. An office
sometimes is responsible for many committees, and each committee must have an office in
charge (although that office may not have a member assigned as officer). Committees
sometimes have many workers, and each worker sometimes works on many committees. The E-R diagram is more expressive in that it displays explicitly the minimum cardinalities of
relationships and shows exactly which entities are related.
[Figure: Fraternity E-R Diagram. Entities: MEMBER, RECEIPT, EXPENSE, OFFICE, COMMITTEE, PAYMENT. Relationships: Submits, Incurs, Holds, Works on, In Charge of, Reimburse.]
6. Since there are four determinants among the functional dependencies, there will be four
relations. The last functional dependency, the one with only a three-key composite determinant,
signifies all the dates on which a particular applicant interviewed for a particular position. This
functional dependency does not signify a many-to-many relationship, like many composite keys
do, since date interviewed is itself not a determinant. It signifies an entity with a three-component composite key. The four 3NF relations are:
APPLICANT (APPLICANT ID, APPLICANT NAME, APPLICANT ADDRESS)
POSITION (POSITION ID, POSITION TITLE, DATE POSITION OPENS,
DEPARTMENT)
APPLICATION (APPLICANT ID, POSITION ID, DATE APPLIED)
INTERVIEW(APPLICANT ID, POSITION ID, DATE INTERVIEWED)
A suggested E-R diagram is provided below. For clarity, we show composite keys as a single
composite attribute.
[Figure: E-R diagram for Problem 6. Entities: APPLICANT (APPLICANT ID, APPLICANT NAME, APPLICANT ADDRESS), POSITION (POSITION ID, POSITION TITLE, DATE POSITION OPENS, DEPARTMENT), APPLICATION (APPLICANT ID + POSITION ID, DATE APPLIED), INTERVIEW (APPLICANT ID + POSITION ID + DATE INTERVIEWED). Relationships: Applies, Responds to, Interviews for.]
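The rule "one relation per determinant" used in this answer can be sketched in Python. The functional dependencies listed below are reconstructed from the four relations given in the answer, since the problem statement is not reproduced here, so treat them as an assumption. The grouping yields 3NF relations only when the dependencies are already free of partial and transitive dependencies, as they are in this exercise.

```python
# Each entry is (determinant attributes, one dependent attribute).
# None marks a determinant that has no non-key attributes.
fds = [
    (("APPLICANT ID",), "APPLICANT NAME"),
    (("APPLICANT ID",), "APPLICANT ADDRESS"),
    (("POSITION ID",), "POSITION TITLE"),
    (("POSITION ID",), "DATE POSITION OPENS"),
    (("POSITION ID",), "DEPARTMENT"),
    (("APPLICANT ID", "POSITION ID"), "DATE APPLIED"),
    (("APPLICANT ID", "POSITION ID", "DATE INTERVIEWED"), None),
]


def relations_from_fds(fds):
    """Group dependent attributes under their determinant (the primary key)."""
    relations = {}
    for determinant, dependent in fds:
        attributes = relations.setdefault(determinant, [])
        if dependent is not None and dependent not in attributes:
            attributes.append(dependent)
    return relations


relations = relations_from_fds(fds)
for key, non_keys in relations.items():
    print("KEY:", " + ".join(key), "| NON-KEY:", ", ".join(non_keys) or "(none)")
```

The relation keyed on all three attributes has no non-key attributes, which matches the INTERVIEW relation in the answer.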
7. The objectives of a good coding scheme are to minimize storage space and to increase data
integrity. Student major is a classic example of a sparse field that can benefit from being
codified for storage and data entry. For on-line data entry, you could provide a list of possible
majors from which the data entry person must choose; this legal list of codes will need to be
updated, but this can probably be done separately from data entry of student data. A
fundamental coding choice is whether to use codes that are as dense as possible or to try to use
reasonably dense codes that have some meaning to most users (e.g., MIS versus 31 for a
Management Information Systems major). Short character string codes (e.g., three alphabetic
characters) may take as little storage as a two digit numeric code, so a short character string may
achieve both objectives. An interesting way to approach this question is to have your students
identify how their university and at least two other universities represent a student’s major.
Have your students compare and contrast these coding schemes. It is likely that they will locate
universities where the code is numeric, alphabetic, or alphanumeric.
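Both objectives can be illustrated with a small Python sketch of a legal list of codes. The MIS code comes from the answer above; the ACC and FIN entries and the function name encode_major are hypothetical.

```python
# Legal list of codes, maintained separately from student data entry
MAJOR_CODES = {
    "MIS": "Management Information Systems",
    "ACC": "Accounting",  # hypothetical entry
    "FIN": "Finance",     # hypothetical entry
}


def encode_major(entered):
    """Accept only codes on the legal list (data integrity)."""
    code = entered.strip().upper()
    if code not in MAJOR_CODES:
        raise ValueError(f"unknown major code: {entered!r}")
    return code


stored = encode_major(" mis ")
bytes_saved = len(MAJOR_CODES[stored]) - len(stored)  # storage saved per record
```

The three-character code is stored in place of the full name, and an entry that is not on the list is refused at data entry time.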
8. Suggestions for the primary keys are underlined in the following relations. Attributes appearing
in italics serve as foreign keys.
VENDOR (VENDOR NO, ADDRESS)
PRICEQUOTE (VENDOR NO, ITEM NO, QUOTE QUANTITY, PRICE)
PART (ITEM NO, DESC)
PART RECEIPT (ORDER NO, VENDOR NO, ITEM NO, QUOTE QUANTITY, DATE,
ORDER QUANTITY)
If VENDOR NO, ITEM NO, and ORDER NO are numeric values assigned without any
relationship to the associated entities, then these numbers would not change as the real world
changes, and they would be acceptable as primary keys or components of composite primary
keys. QUOTE QUANTITY, on the other hand, is likely volatile, and is not suitable as part of
the primary key for the PRICEQUOTE table. We need to create a nonintelligent primary key
for the PRICEQUOTE table. Also, we still may need individual secondary index keys on
VENDOR NO and ITEM NO attributes from this relation to facilitate joining the
PRICEQUOTE table with the VENDOR and PART tables, respectively.
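A minimal sqlite3 sketch of this suggestion follows. The column quote_id, the index names, and the sample values are hypothetical; the answer does not prescribe a name for the nonintelligent key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pricequote (
    quote_id       INTEGER PRIMARY KEY,  -- nonintelligent key assigned by SQLite
    vendor_no      INTEGER NOT NULL,
    item_no        INTEGER NOT NULL,
    quote_quantity INTEGER NOT NULL,     -- volatile, so kept out of the key
    price          REAL NOT NULL
);
-- Secondary key indexes to support joins with VENDOR and PART
CREATE INDEX idx_pricequote_vendor ON pricequote (vendor_no);
CREATE INDEX idx_pricequote_item   ON pricequote (item_no);
""")

cur = conn.execute(
    "INSERT INTO pricequote (vendor_no, item_no, quote_quantity, price) "
    "VALUES (7, 1001, 50, 2.25)"
)
quote_id = cur.lastrowid

# The quoted quantity changes, but the row keeps the same primary key value.
conn.execute(
    "UPDATE pricequote SET quote_quantity = 100 WHERE quote_id = ?", (quote_id,)
)
row = conn.execute("SELECT quote_id, quote_quantity FROM pricequote").fetchone()
```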
9. The guidelines for identifying keys for indexing suggest that attributes used for selection,
sorting, grouping, and joining are potential candidates. We are asked to consider the three
queries as the only accesses to the database, so we do not implicitly need primary key indices.
We do not know the frequency of the three queries compared to update operations, so it is
impossible to make precise, optimal decisions on the most economical indices. Space does not
seem to be an issue since, as we will see, none of the qualifying fields are very long. So, we
indicate all possible indices that might speed query processing. The first query, Query A,
appears to access the PART and PRICEQUOTE tables, but actually the PRICEQUOTE table is
sufficient. The E-R diagram indicates that every part has at least one vendor, so every part has at
least one price quote. Thus, all parts appear at least once in the PRICEQUOTE table. The data
reported in Query A (ITEM NO, VENDOR NO, QUOTE QUANTITY, and QUOTE PRICE)
are found in the PRICEQUOTE table. The query asks for the results to be sorted by ITEM NO
and by VENDOR NO (but not by quote quantity). A composite index on these two attributes is
the most efficient way to directly produce this sorted output, thus avoiding the need to do a sort
of the data once it is retrieved. This assumes that the DBMS or file system utilizes a composite
index when it sees the compound sorting condition. Thus, for Query A, a secondary composite
index on first ITEM NO and then VENDOR NO is ideal.
The second query, Query B, asks to display all the attributes from the PART RECEIPT and
PART tables; since ITEM NO is an attribute in the PART RECEIPT relation, we do not need
the PRICEQUOTE relation to link the PART RECEIPT and PART tables. No sorting is done,
but the query wants data for only a specified part receipt date. Assuming that many days of part
receipt data are kept, then we would want to create a secondary key index on DATE and a
secondary key index on ITEM NO in the PART RECEIPT table and a primary key index on
ITEM NO in the PART table to efficiently support the selection and joining needed for Query B.
The third query, Query C, involves selection of a particular vendor and display of attributes from
all associated PRICEQUOTE rows for that vendor; thus, all the data needed for this query can
be found in the PRICEQUOTE table. No sorting is mentioned. Thus, for Query C, only a
secondary key index on VENDOR NO in the PRICEQUOTE table would support efficient
processing.
10. This question highlights the difficulty of setting useful default values. Our analysis also suggests
the potential difficulty of choosing a range check on the age attribute. The bulk of students at
most universities typically range in age from 18 to 23. Some students attend university before
age 18. For example, some students graduate from high school at age 17 or opt to take a high
school equivalency examination and finish high school early (e.g., while younger than 18). In
addition, sometimes high school students attend college classes while still enrolled in high
school. Finally, there is the occasional child prodigy who attends college at an earlier than usual
age. For the upper limit on the age range, there will obviously be some students older than 23,
particularly if there are undergraduate students who are working adults or who return to school
after other careers (such as military service or child rearing). Similarly, if there are graduate
students at this university then there will most certainly be students older than 23. In theory, a
student’s age could reach three digits. If we could have only one default value, we could study
several years of student data and pick the modal age value. If we could have default values
conditional on student type (e.g., undergraduate, graduate, returning adult), then we could use
the modal values for each student type. Thus, we would want values for student type to be
entered before student age. Alternatively, an impertinent student might respond to this question
by asking why age should be stored at all: store the birth date and let the system calculate age when needed.
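Both ideas in this answer, deriving age from a stored birth date and choosing a default as the modal age for each student type, can be illustrated with a short Python sketch. The function names, the student type labels, and the sample ages are assumptions for classroom use, not data from any university.

```python
from datetime import date
from statistics import mode


def age_on(birth_date, today):
    """Compute age in whole years from a stored birth date."""
    # The birthday has occurred this year if (month, day) is not earlier than it.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def modal_age_by_type(students):
    """Return the most common age for each student type.

    `students` is an iterable of (student_type, age) pairs.
    """
    ages = {}
    for student_type, age in students:
        ages.setdefault(student_type, []).append(age)
    return {student_type: mode(values) for student_type, values in ages.items()}


if __name__ == "__main__":
    # Made-up historical data used only to show the calculation.
    history = [
        ("undergraduate", 19), ("undergraduate", 19), ("undergraduate", 20),
        ("graduate", 25), ("graduate", 25), ("graduate", 31),
    ]
    print(modal_age_by_type(history))
    print(age_on(date(2000, 6, 15), date(2024, 6, 14)))
```

Because the default depends on student type, a data entry form built on this idea would need to capture student type before it offers a default age, as the answer notes.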
11. One suggested answer is a situation where an employee needs to know a contact name for a
vendor, a quoted price for a particular item, and a description for the item. If the tables are
denormalized, this query requires the joining of only two tables; however, if we are using
normalized relations, this query requires the joining of all three relations.
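The difference between the two designs can be shown with a small sqlite3 sketch. The answer above does not say which attributes are folded together in the denormalized design, so the sketch assumes that the item description is copied into the price quote table; the table names, column names, and sample rows are likewise hypothetical.

```python
import sqlite3

# Normalized design: three relations, so the query joins all three.
NORMALIZED_QUERY = (
    "SELECT v.contact_name, q.quote_price, p.description "
    "FROM pricequote AS q "
    "JOIN vendor AS v ON v.vendor_no = q.vendor_no "
    "JOIN part AS p ON p.item_no = q.item_no "
    "WHERE q.item_no = ? ORDER BY v.vendor_no"
)

# Denormalized design (assumed): description is repeated in each quote row,
# so only the vendor table must be joined.
DENORMALIZED_QUERY = (
    "SELECT v.contact_name, q.quote_price, q.description "
    "FROM pricequote_denorm AS q "
    "JOIN vendor AS v ON v.vendor_no = q.vendor_no "
    "WHERE q.item_no = ? ORDER BY v.vendor_no"
)


def build_both_designs(conn):
    """Create the normalized tables plus a denormalized copy of the quotes."""
    conn.executescript(
        """
        CREATE TABLE vendor (vendor_no INTEGER PRIMARY KEY, contact_name TEXT NOT NULL);
        CREATE TABLE part (item_no INTEGER PRIMARY KEY, description TEXT NOT NULL);
        CREATE TABLE pricequote (
            item_no     INTEGER NOT NULL REFERENCES part (item_no),
            vendor_no   INTEGER NOT NULL REFERENCES vendor (vendor_no),
            quote_price REAL    NOT NULL
        );
        INSERT INTO vendor VALUES (3, 'Ana Ruiz'), (9, 'Lee Chen');
        INSERT INTO part VALUES (10, 'Hex bolt'), (20, 'Washer');
        INSERT INTO pricequote VALUES (10, 9, 2.25), (10, 3, 2.40), (20, 3, 0.15);

        -- Denormalized copy: each quote row carries the item description.
        CREATE TABLE pricequote_denorm AS
            SELECT q.item_no, p.description, q.vendor_no, q.quote_price
            FROM pricequote AS q JOIN part AS p ON p.item_no = q.item_no;
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_both_designs(conn)
    print(conn.execute(NORMALIZED_QUERY, (10,)).fetchall())
    print(conn.execute(DENORMALIZED_QUERY, (10,)).fetchall())
```

Both queries return the same rows, but the denormalized table stores the description once per quote, which is the redundancy that normalization is meant to remove. This gives students a concrete view of the trade-off between fewer joins and duplicated data.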
Guidelines for Using the Field Exercises
1. Students can search through academic journals, textbooks, or the Web. They are likely to find
information about other normal forms such as Boyce-Codd normal form (removing remaining
anomalies resulting from functional dependencies), fourth normal form (removing multivalued
dependencies), and fifth normal form (removing remaining anomalies). Students are likely to
encounter these other normal forms in a database class (often from a Computer Science
department) or on the job, particularly in a job where they are developing a relatively large,
complex database. As with all normal forms, each rids the data model of potential redundancies
and inconsistencies.
2. Normalized relations, a set of specifications describing the format and structure of the data in
secondary storage, and an updated CASE repository are several of the deliverables from file and
database design.
3. Students should identify several types of information that are most useful for file and database
design. Information about primary keys, secondary keys, data types, business rules,
relationships, data integrity control methods, volume, present and future required data storage
space, and file organization should be collected.
4. There are probably a number of DBMSs available at the student’s university. For example, the
student’s PC laboratory is likely to have at least one DBMS available; perhaps Microsoft Access
for Windows is available. A client-server or object-oriented DBMS might be available as well.
In addition, the university’s administrative database applications are probably run on a
mainframe or minicomputer. A DBMS such as IBM’s DB2 or the Oracle DBMS is probably
being used. The data types supported by these DBMSs provide important criteria for
determining the types of applications for which each of these DBMSs is best suited. PC DBMSs tend to
have fewer physical file and database design options than do mini- or mainframe computer-based DBMSs. In addition, students should consider the number of records and physical file
organizations that the DBMS allows, the available storage space on physical storage media,
the record locking and other security features that the DBMS provides, the availability of a
CASE tool, and other factors.
5. Your students should find that many of the physical file and database design issues presented in
the chapter are dealt with at their university. For instance, storage space, file organizations, and
data types are issues that must be addressed.
Guidelines for Using the Broadway Entertainment Company Cases
Guidelines and answers for using the Broadway Entertainment Company cases are available on this
textbook’s companion Web site. Please visit www.prenhall.com/hoffer to access this information.