Module 3: Relational Database Design

advertisement
Module 3: Relational Database Design
Overview
In this module, we present the fundamentals of good relational database design, with an emphasis on logical
database design. The steps in the database-design process are discussed within the context of the overall
development of computer-based database applications. We describe the most popular relational database
design methodology as well as entity/relationship diagramming We present the enhanced entity/relationship
model and conclude with a few case studies to reinforce the relational database design procedure.
Module 3: Relational Database Design
Objectives
After completing this module, you should be able to:
explain the database design steps within the systems development life cycle
differentiate among conceptual, logical, and physical database design
define the terms entity, attribute, and relationship
understand the process of analyzing the important aspects of an end-user's application in order to
describe them using entities, attributes, and relationships
construct an entity/relationship diagram for a simple relational database application
utilize the enhanced entity/relationship model for more complex applications
describe alternative database design methodologies and notations
Module 3: Relational Database Design
Commentary
I.
II.
III.
IV.
V.
VI.
VII.
VIII.
IX.
Logical RDB Design
Data Modeling Process
Entity Relationship Diagramming
Enhanced Entity/Relationship Modeling
Object Modeling
RDB Specifications from ERDs
Case Study #1 (Legal Cases)
Case Study #2 (10K Race)
Case Study #3 (Car Rental Application)
I. Logical RDB Design
Most modern computer-based systems for enterprises now include a database component in addition to
application software. Thus these applications are sometimes referred to as database applications. In this
module we will be discussing the issues relevant to designing large, multi-user database applications, with an
emphasis on high performance transaction processing systems.
SDLC
When building a database application, the techniques of systems analysis describe the following steps as the
systems development life cycle (SDLC) process:
Module 3: Relational Database Design
1.
2.
3.
4.
5.
6.
7.
8.
Preliminary investigation (a.k.a. feasibility analysis)
Requirements analysis
Design
Development
Implementation
Acceptance testing
Delivery (a.k.a. deployment)
Maintenance
Database applications actually have their own set of activities that are more database-specific than the general
SDLC. This set of database development steps is sometimes referred to by names such as the database
application system life cycle.
Development Methodologies
Because database systems include both an application program (i.e., functional) component and a database
component, the database application system life cycle process for constructing these two parts of the system
progresses along parallel paths. The functional portion of the database application is handled by application
programs. Techniques such as data flow diagrams (DFDs) are used to design application programs. Various
computer aided software engineering (CASE) tools are available to assist in the design of the application
software for the user interface (forms, reports, etc.). The data portion of the system, consisting of data
structure and content issues is predominantly handled by the database itself. CASE tools can also be used to
assist in the logical and physical design of the data portion of the system (e.g., Oracle Designer).
Traditionally in the SDLC process above, steps are conducted in a serial fashion, where one step begins when
the previous step is completed. This is referred to as the "waterfall" technique. Unfortunately this can result in
a long time to deliver a system and frequently results in an inability to meet the evolving end-user's
requirements. One technique to minimize this problem is the use of feedback loops between the different
steps, such that one step will provide enhanced detail to an earlier step, and then the process moves forward
again. Therefore the development process frequently progresses using a spiral approach between steps,
versus a truly sequential process. A system built in this manner may be delivered in an incremental fashion.
Other techniques that help alleviate the long time to delivery, by speeding up the design and development
process, include rapid prototyping using visual programming tools, and object-oriented methodologies
(e.g., Oracle Developer and Sybase PowerBuilder). The object-oriented methodology is covered in module 6 of
this course and examples of development tools are presented in the advanced relational databases course.
Conceptual Design
Following the requirements analysis phase of database application development, the conceptual design of a
database is established. The conceptual design is an attempt to demonstrate that all of the end-user's data
requirements have been captured and are understood by the developer. The conceptual design is independent
of both the data model and any specific DBMS. The conceptual model represents a global view of data. It is an
enterprise-wide representation of data as viewed by senior management.
Since this course is concerned primarily with relational databases, the data model we will use is the relational
model (versus the hierarchical, network, object-oriented, or other models). Following the choice of data model,
the logical design of the relational database is conducted. A logical relational database design requires building
an abstract (i.e., high level) model of the data elements to be included in the database and the relationships
between the data elements.
ERDs and Physical Design
With relational databases, the most popular conceptual and logical data modeling technique is an
entity/relationship diagram (ERD). An ERD is a special type of semantic model—a model that gives
meaning to the data included. An ERD is commonly used to translate different views of data among
Module 3: Relational Database Design
management, database users, and developers to fit into a common framework, which will assist in many
aspects of database development. ERDs are discussed thoroughly later in this module.
Later in the design phase of database development, the actual relational database is built as part of the
physical design activity. This activity may consist of an enterprise data administrator turning over an ERD,
representing a logical model of the database, to a DBA to implement to build the database, using a specific
RDBMS and host platform. The physical database design phase also includes the specification of physical data
layouts, indexing and issues related to database tuning for better performance. The details of relational
database performance and tuning are covered in the advanced relational databases course.
Development
The database development approach with entity/relationship diagramming represents a top-down development
approach, where the logical design phase is at a high level of abstraction, although the determination of entities
and their entities (discussed in detail in the next section) from all of the required data elements is a bottom-up
approach.
The DDL SQL used to construct the physical relational database and the DML SQL to initially populate the
database will be discussed in module 4.
Implementation
Once the database is built, the implementation phase takes place. During this phase, data is added (called
"populating the database"), user forms and reports are built, and end-user queries are developed. Legacy
system data may also need to be added to the database, possibly using conversion routines to reformat the
data for the new system.
Acceptance Testing, Delivery, and Maintenance
When the database application is complete, the end user conducts system acceptance testing and, if the
system is able to meet their requirements, accepts delivery from the developer. System delivery usually
includes training for the end users, the delivery of system documentation and a period (usually years) of
maintenance. When the new system is no longer able to meet the end-user’s needs at some later time the
SDLC process begins anew on a new system.
II. Data Modeling Process
End-User Interviews
During the requirements analysis phase there are several techniques to gather the end user's requirements,
including end-user interviews, questionnaires, observing the existing operation, and conducting a functional
analysis of existing system forms, reports, and documentation. Note that in this course we are stressing the
gathering of requirements to develop the database, not the development of the application software.
The purpose of end-user interviews is to meet with various levels of the enterprise's end users, and in doing so,
capture the data elements required, the way the data elements are related, and the ways these data elements
will be entered, updated, and accessed by these users.
In a large system, the requirements analysis process consists of a series of data-gathering interview sessions in
which the developers listen carefully and, through experience, ask the end users data-related questions so as to
clarify the data that comprise the customer's business.
Entities, Attributes, and Relationships
Module 3: Relational Database Design
The database developer strives to identify primarily three aspects of the data:
1. entities
2. attributes
3. relationships
Simply stated, the entities are the very important "things" that exist within the enterprise. By discussing the
system requirements with the end user, the developer should be able to identify these "things" as the "nouns."
During requirements gathering, the developer should be able to identify the characteristics of the entities as
attributes, or "adjectives."
Finally, the developer should identify the relationships between the entities. These relationships are like the
"verbs" and "verb clauses" of the requirements. This noun-adjective-verb analogy is actually advocated by some
texts as a procedure to analyze written requirements for a database application.
Business Rules
The database developer also seeks out the business rules and other requirements of the final system.
Business rules describe the day-to-day processing rules for the enterprise. These rules, along with other
requirements will determine the database constraints that must be included in the system.
The seasoned database developer will strive to capture business rule statements that are precise, atomic, and
convey a business aspect of the end user's application. An understanding of the business rules of a system is
critical to developing a system that meets the requirements of the users.
An example of a business rule might be that "all customers have unique account numbers". Another rule might
be that "customers that place three or more orders per month are permitted a 10 percent discount on all
subsequent orders during the remainder of the quarter." As we saw in module 2, the former rule is rather easy
to include via declarative integrity (i.e., DDL SQL), while the latter might require procedural integrity
techniques (e.g., triggers).
Although the business rules may not be evident from an ERD, it is important that they be captured and
documented. As we shall soon see, some of the basic business rules can be captured as cardinalities on the
ERD (i.e., which sides of the relationships are for the "one" side and which are for the "many" side).
III. Entity Relationship Diagramming
Once the entities, attributes, and relationships of the enterprise data are understood, the database developer
constructs a data model for a relational database, usually in the form of an entity/relationship diagram. The
ERD is actually a picture of the data, or a conceptual model, for an enterprise. The ERD does not show the
functions the business performs, which is part of the application software design, a separate topic.
The ERD and the accompanying diagramming technique were developed by Peter Chen in 1976. There are now
a number of different notations used for ERDs. Although there are no standards, there are a number of
conventions.
Entities are usually indicated as rectangular boxes, and are the important "things" that have been noted by the
database developer about the end-user's business. The entity names inside the boxes are in all capital letters.
An example of an entity, the CUSTOMER entity, is shown in figure 3.1. Attributes are shown as words inside
ovals with lines that connect to the entities of which they are characteristics. The many attributes of the
CUSTOMER attributes are shown in figure 3.1, below.
Figure 3.1
Module 3: Relational Database Design
Entity and Attribute Types
The CUSTOMER entity of figure 3.1 is an example of a strong entity. Strong entities have a unique key value
of their own within the end user's application called a key attribute. The key attribute of an entity is the
unique identifier of each instance of an entity. The Customer ID attribute is the key attribute of the CUSTOMER
entity in figure 3.1. Key attributes are underlined.
A rectangular box with two borders is a special type of entity, called a weak entity. A weak entity is existencedependent on a strong entity. Weak entities do not have unique key values of their own in the end user's
application. Rather, their unique identification is partially or totally derived from the strong entity or entities
upon which they are dependent.
A composite attribute is actually composed of the more atomic, simple attributes. Customer ID and Initial
Contact Date are simple attributes of the CUSTOMER entity in figure 3.1, whereas Name and Address are
composite attributes.
A multi-valued attribute exists for an entity when there is some indeterminate number of possible simple
entities that might exist. The Phone Number attribute in figure 3.1 is an example of a multi-valued attribute,
since the CUSTOMER entity may have various types of phone numbers: home phone, business phone, fax
phone, cell phone number 1, cell phone number 2, etc. Multi-valued attributes are shown with a double oval.
A derived attribute can be calculated from another attribute of an entity. For the CUSTOMER entity we might
have added a derived attribute of Customer Longevity, which could be calculated from the Initial Contact Date
attribute.
Relationship Types
There are three types of relationships between entities in an ERD: one-to-one, one-to-many, and many-tomany. Note that the many-to-one relationship is the same as the one-to-many relationship, just viewed from
the opposite direction.
As stated earlier, there are no standard notations for ERDs. The greatest variety in the ERD notations is the way
in which relationships are depicted. Frequently relationships are shown as diamonds with names that are
between two or more entities. As shown in figure 3.2A, which uses the Chen notation, the "1" and "M"
indicators represent the types of relationships that exist between two adjacent entities. Figure 3.2B shows an
alternate notation using "crow's feet" on the "many side" of the relationship. Note that there are a number of
other popular ERD notations besides the two shown below.
Figure 3.2A
Module 3: Relational Database Design
Figure 3.2B
Relationship Sentences and Participation
One popular technique to determine the proper relationships between entities is to construct a set of sentence
pairs between all related entities. All sentences must begin with the word "Each." An entity name is then listed
in singular case after the word "Each." One sentence describes the relationship in one direction and the other
sentence describes the relationship in the other direction. Each pair of sentences describes the overall
relationship between the two entities. This technique is based on the CASE*Method approach and is used by
the Oracle Designer CASE tool.
For example, if we have two entities, CUSTOMER and ORDER, as shown in figure 3.2, we might discover from
the end user that the following sentence pair describes the relationship that exists between these two entities:
Each CUSTOMER may place one or more ORDERs.
Each ORDER is placed by a CUSTOMER.
Notice in figure 3.2A that the relationship verb "place" is inside a diamond. In figure 3.2B there are two
relationship verbs placed at either end of the single relationship line. This latter notation more easily facilitates
the sentence pair strategy since two verbs (or verb clauses) are available (versus a single verb).
The first of these sentences describes a one-to-many relationship with a partial (or optional) participation
(i.e., CUSTOMERS might not place ORDERs). The second sentence describes a one-to-one relationship with a
total (or mandatory) participation (i.e., ORDERs are always placed by a CUSTOMER). The "may place"
optional participation is indicated by a single line by the CUSTOMER entity in figure 3.2A and the oval by the
ORDER entity in figure 3.2B. The "is placed by" mandatory participation is indicated by the double line from the
ORDER entity in figure 3.2A and the hash mark by the CUSTOMER entity in figure 3.2B. Be sure you can
correlate all parts of the sentence pair above to both of the ERD portions in figure 3.2.
It is the total of both of these individual relationship sentences that determines the overall relationship. In this
case the individual one-to-many and one-to-one individual relationships form an overall one-to-many
relationship.
Bridge Entities
In other cases, we might have an overall many-to-many relationship from two individual one-to-many
relationship sentences. In this situation we must form a bridge entity between the two original entities, such
that each of the original entities will be related to the bridge in a one-to-many relationship. For example, the
sentence pair below (from a hypothetical hospital application) results in an overall many-to-many relationship
(refer to the ERD diagram of figure 3.3A):
Each DOCTOR treats one or more PATIENTs.
Each PATIENT is treated by one or more DOCTORs.
Figure 3.3A
Module 3: Relational Database Design
Once the bridge entity is formed, the original sentence pair is no longer valid, since there is no longer a direct
relationship between those entities. Two new sentence pairs must be formed between each of the original
entities and the bridge entity.
For the above sentences we might have a bridge entity of TREATMENT. After forming this new entity, the above
two sentences would be replaced by the two new sentence pairs below, in figure 3.3B:
Figure 3.3B
Each DOCTOR is part of one or more TREATMENTs.
Each TREATMENT involves a DOCTOR.
Each TREATMENT involves a PATIENT.
Each PATIENT is part of one or more TREATMENTs.
Note that each of these new sentence pairs describes an overall one-to-many relationship with the bridge entity
on the many side of the relationship and the original two entities on the one side of the relationship. Bridge
entities are also referred to as association entities or intersection entities in different textbooks. A bridge
entity results from the many-to-many relationship itself.
Bridge entities are weak entities, whose key values are frequently a composite of the two strong entities' key
values to which they are related. The key value for TREATMENT above might be the composite of Patient ID and
Doctor ID. However, a surrogate key Treatment ID might be more appropriate in this situation, as the stated
composite may not be unique (the same doctor can likely provide multiple treatments to the same patient).
Relationship Degrees
The number of entities that participate a relationship is called the degree of the relationship. The relationships
discussed thus far have included two entities and are referred to as binary relationships. Another relationship
worth noting is the classic PART entity, with its "part structure" relationship, as shown in figure 3.4, below.
Figure 3.4
In this relationship, a PART is a component of a larger PART, which may be a component of yet a larger part,
and so on, in a recursive relationship. A recursive relationship, where a single entity has a relationship
with itself, is also referred to as a unary relationship. The relationship sentences describing the relationship in
figure 3.4 are:
Each PART includes one or more PARTs.
Each PART is included in a PART.
In some applications we might have three entities that relate to a fourth bridge entity. This is a ternary
relationship. Each of the three entities is related to the bridge entity by a pair of relationship sentences such
that there are three sentence pairs in all. An example of a ternary relationship would be where sponsors are
used to raise funds to support various charities, as shown in figure 3.5, below.
Module 3: Relational Database Design
Figure 3.5
Thus we might have the three entities SPONSOR, AGENCY, and CHARITY, with a bridge entity of FUND relating
them. Our relationship sentence pairs would then be:
Each SPONSOR provides money for one or more FUNDs.
Each FUND is financed by a SPONSOR.
Each AGENCY manages one or more FUNDs.
Each FUND is managed by an AGENCY.
Each CHARITY receives money from one or more FUNDs.
Each FUND may provide money to a CHARITY.
Note that in this example we have chosen to utilize a ternary relationship because all three of the major entities
SPONSOR, AGENCY, and CHARITY, are assumed to be interrelated. If only two of these three entities could be
related at any time we would do better to model this as three binary relationships.
It is also possible that two different entities have multiple relationships between them, and these relationships
might even be of different degrees.
HAS-A versus IS-A Relationships
The relationships discussed thus far are all known as "HAS-A" relationships. There is another class of
relationships known as "IS-A" relationships. The IS-A relationships exist in a type hierarchy of entities. In this
type of relationship we might have a parent entity of EMPLOYEE, and then child entities of PROGRAMMER, DBA,
and CLERK. In this particular relationship, sentences such as the following exist:
Each
Each
Each
Each
EMPLOYEE may be a PROGRAMMER, DBA, or CLERK.
PROGRAMMER IS-AN EMPLOYEE.
DBA IS-AN EMPLOYEE.
CLERK IS-AN EMPLOYEE.
The IS-A relation types are involved in generalizations and specializations, which will be discussed and in more
detail, along with figures, in section IV of this module.
When developing an ERD, and preparing to move to the physical design phase of the database design process,
you cannot have any many-to-many relationships, because it is then impossible to determine the primary key
and foreign keys for the relationship. As we will see later in this module, the unique identifiers (i.e., key
attributes) of the entities on the "one side" of a one-to-many relationship form primary keys and the
corresponding attributes on the "many side" of the one-to-many relationship form the foreign keys.
Module 3: Relational Database Design
Also, one-to-one HAS-A relationships in an ERD should generally be combined, as the difficulty of working with
two tables (versus one) will result in performance degradation (the transformation of entities to tables and
performance issues will also be covered in module 4). But, if one of the two one-to-one related entities has the
potential to have a large number of NULLs, or is part of a type hierarchy, then leaving the one-to-one
relationship might be justified.
ERD Technique Summary
In summary, when using the sentence-pair strategy discussed in conjunction with developing an ERD the
sequence of events is as follows:
1.
2.
3.
4.
5.
Determine the entities from the requirements analysis.
Determine which entities are related to one another.
Develop sentence pairs between related entities.
Draw the ERD based on sentence pair relationships.
Repeat steps 3 and 4 iteratively, as necessary, until all overall relationships between entities are
reduced to one-to-many (except for type hierarchies, if any).
6. Use the ERD to develop tables, columns, primary keys, and foreign keys (this will be discussed in
module 4).
IV. Enhanced Entity/Relationship Modeling
EER Model
The classic ERD is well suited to many traditional database applications, but many modern databases require a
more robust type of modeling. To model more complex applications accurately, developers must use
sophisticated semantic data modeling techniques. In this section, we will explore how some of these
techniques have led to the enhanced entity/relationship (EER) model.
The EER contains all of the ERD concepts plus many useful additions. Unfortunately, as with ERDs, there is no
standard notation, just different preferred notations.
Superclasses and Subclasses
In a type hierarchy, a parent entity type is called a superclass. The entities in an entity set containing the child
entities are called a subclass of the parent entity type. Between the parent entity types and child entity types a
superclass/subclass relationship exists. Subclass entities are specific roles of the superclass entity. Every
member of a subclass is also a member of their superclass, but not every superclass member belongs to a
subclass.
Similar to what we will see with object classes later in module 6, in an entity hierarchy the subclass inherits
the attributes of the superclass. In addition, the subclass entities may have specific entities of their own. As you
can deduce from the example in section III, PROGRAMMERs, DBAs, and CLERKs would certainly have different
attributes, such as the skills unique to their specific jobs.
An entity type actually defines a set of entities in a relational database which all have the same structure (i.e.,
the same set of attributes). The collection of all entities of the same type is called an entity set. The entity
type describes the intension of the entity set, whereas the members of the entity set are called the extension
of the entity type.
Specialization and Generalization
A specialization exists where there are multiple subclasses for a superclass, where the subclasses have
attributes that differentiate them from the superclass. For example, SAILBOAT and POWER_BOAT are
Module 3: Relational Database Design
subclasses of the superclass RECREATIONAL_BOAT (and RECREATIONAL_BOAT is a subclass of the more
general BOAT superclass) as shown in figure 3.6.
Figure 3.6
Both the members of the SAILBOAT and POWER_BOAT subclasses inherit the attributes of their superclass, but
they also have specific attributes of their own, which differ between "sibling" subclasses. Also, different
entities at the same subclass level can be in different relationships with other entities.
If we start with multiple, similar subclasses, it might be possible create a superclass that has the common
attributes of all subclasses by using a process called generalization. The specialization process is a top-down
conceptual refinement analysis, where new subclasses are derived; since generalization is the reverse of
specialization, it is an example of a bottom-up conceptual synthesis.
A defining predicate of a subclass determines which members of the superclass belong to the subclass in a
specialization. For example, if the value of Boat_Type='Sail' for a RECREATIONAL_BOAT member, then that
member would also be a member of the SAILBOAT subclass. If an attribute's value can be used to determine
subclass membership, we have an attribute-defined specialization. If this is not the case, we have userdefined subclasses.
Subclasses can be disjoint or overlapping. If superclass members can belong to at most one subclass then we
have a constraint where the subclasses are disjoint. If superclass members can belong to multiple subclasses,
then the subclasses are overlapping. To distinguish the two types, we place either a "d" or an "o" on the circle
between the superclass and the subclasses, for disjoint or overlapping, respectively.
If all members of the superclass must be members of a subclass then we have total specialization. If not,
then we have partial specialization. As can be seen in figure 3.6, we have a disjoint, total specialization
because of the double line and "d" in the circle. Superclasses defined through generalization must result in a
total specialization hierarchy.
Hierarchies and Lattices
If each subclass can be related to only one superclass entity, then we have a specialization hierarchy. If this
is not the case we have the less restrictive specialization lattice organization.
A subclass that has no subclasses of its own in a specialization hierarchy or lattice is called a leaf node. A
subclass with multiple superclasses is called a shared subclass and through multiple inheritance, inherits
the attributes and relationships of all its superclasses. Specialization lattices have at least one shared subclass,
whereas specialization hierarchies have none.
Module 3: Relational Database Design
Categories
When we have a subclass that has multiple, distinct (i.e., not from the same type hierarchy or lattice)
superclasses, we have a union type or category subclass. In the EER model this is represented by a "U" in the
circle between the superclasses and the single subclass. Each member of a category subclass may only belong
to one of its superclasses, and inherits the attributes of only that single superclass.
V. Object Modeling
In addition to ERD and EER modeling, in the object oriented modeling realm there are two additional and very
widespread data modeling techniques—Universal Modeling Language (UML) and Object Modeling
Technique (OMT).
In the object-oriented paradigm, objects, which represent abstractions of real world items, are paramount.
Objects are created based on instantiation of object classes that specify their structure and behavior.
Objects have attributes (i.e., characteristics) and methods, which are like software functions used to define
their behavior.
In UML modeling, object classes are represented by a box partitioned vertically into three sections, with the
class name in the top section, the attributes in the middle section, and the methods in the bottom section.
Relationships between object classes are called associations, and each instance of a relationship is called a
link. UML also includes special notation for representing specializations and generalizations.
VI. RDB Specifications from ERDs
In this section we will discuss the guidelines for converting an ERD into the specifications for a relational
database. Specifically, we will describe how to convert entities, attributes, and relationships into relational
database tables, columns, and integrity constraints.
Tables
Each of the entities in the ERD (or EER diagram) will become a separate table in our relational database. This
includes strong entities, weak entities, and bridge entities. The entities in a specialization hierarchy or
specialization lattice may be either converted into individual tables or combined into a single table with
differentiating columns for the discriminating attribute (if any) and all other different attributes. This decision
will be based on the application and the number of differences between subclass attributes.
A good practice is to make relational database table names plural. Thus the CUSTOMER entity of figure 3.1 will
become the CUSTOMERS table.
Attributes
Each of the simple attributes, and the outermost attributes of any composite attributes will become columns in
the table that is formed by the entity to which they are attached. Derived attributes may be stored as columns
for the table if calculating their values would have performance impacts. The data types that will be chosen for
columns in the physical design phase (covered in module 4) will be determined by the domain of the attribute.
Multi-valued attributes should be used to form a new table and possibly a bridge table if needed. For example, if
a CUSTOMER entity (refer to figure 3.1) has Phone Number as a multi-valued attribute, this might be better
handled in the database by forming a PHONE_TYPES table and then, since this results in a many-to-many
relationship with CUSTOMER, a bridge table of CUSTOMER_PHONES between CUSTOMERS and PHONE_TYPES.
This will allow each customer an unlimited number of different phone numbers for different types of phones.
Keys
Module 3: Relational Database Design
Although it is not required for relational database tables, good relational database design dictates that all tables
should have a primary key. The unique identifier (i.e., key attribute) of each entity will become the primary key
of the table. If there are multiple attributes that form the unique identifier then we will have a composite
primary key.
Since the unique identifiers of categories are not based on a common superclass identifier, we must generate a
surrogate key for the primary key. A surrogate key is a machine generated, usually sequential, unique
numerical value that has no significance other than to provide uniqueness. Also, in cases where the number of
columns and different data types for a primary key of any table becomes unwieldy, it might be advisable to
generate a surrogate key for the primary key.
The many side of all one-to-many relationships will have a foreign key that references the primary key side on
the one side of the same relationship. If the keys are composite, ensure that the same column order and data
types exist for both the primary and referencing foreign keys. Usually the composite of the foreign keys of a
bridge entity is used to form its primary key.
The specification of foreign keys not allowed to be NULL, or having cascade deletion and updating, is based on
the requirements of the application and the capabilities of the RDBMS. Note, however, that total participation of
an entity in a relationship implies that foreign keys cannot be NULL.
VII. Case Study #1 (Legal Cases)
In this section and in the next two sections, some simple relational database data modeling examples will be
presented to reinforce the concepts presented earlier in the commentary. Learning how to design a good
relational database is the most important practical goal of this course.
In this and subsequent examples, the ERD notation that will be used is that shown in figure 3.2B.
You should not be shocked to learn that numerous different ERD notations exist in texts and within vendor
products at this time. Once you grasp the fundamentals, adapting to a slightly different notation for a specific
design activity should not be difficult.
Case study #1 involves a lawyer who has been in practice for several years and has maintained her record of
cases on 3 � 5 cards. Maintaining this "database" has been cumbersome, and you have been hired to automate
the process. You find that each 3 � 5 card represents a different case, and includes a client's name, address,
phone number(s), case number, case description, and other notes. The end user (the lawyer) wants you to
include both home and work phones in the database, along with personal information and the name of the
person who referred the client. Even though each card may have multiple names on it, each case will have only
one client associated with it (this is a business rule!).
This brief information in the above paragraph represents the end user interview portion of the requirements
analysis phase of the project. In this relatively simple application the end-user interview may have taken only
an hour or two. In other cases, the requirements gathering and analysis process can take weeks or months.
After asking the end user questions for clarification, you, the developer, are able to identify the entities,
attributes, and relationships.
On subsequent end-user interviews to your end user, you could present an ERD to them, since an ERD conveys
even to non-technical people the data involved in their business. Notice there is no discussion of what RDBMS
will be used, how many tables will be developed, the method of data entry, specific forms and reports, and so
on. These all come later. Based on the previous end-user interview, the ERD in figure 3.7 may be developed.
Figure 3.7
Module 3: Relational Database Design
In this diagram there are two entities, CLIENT and CASE, which are related as follows:
Each CLIENT may generate one or more CASEs.
Each CASE involves a CLIENT.
Initially we were told that "Each CLIENT generates one or more CASEs". But in a subsequent interview, we
found that there are situations in which clients do not generate a case as we have defined it. Thus the above
sentences, where the first sentence involves a partial inclusion, most accurately reflect the business, and
matches the ERD drawn in figure 3.7.
As you can see, our two entities have an overall one-to-many relationship. Although a CASE could have had
multiple CLIENTs, in this application the end user wanted only the primary client noted for each case.
Note the "crow's feet" at the entity on the many side of the relationship. The relationship verbs (i.e.,
"generates" and "involves") are shown one above the relationship line and one below.
The attributes are connected to the entity for which they are a characteristic. There are two composite
attributes for the CLIENT entity, Name and Address. Entity names are indicated in the singular (as opposed to
CLIENTs or CASEs) on the ERD. The lawyer, even though she is obviously essential to the enterprise, is not
shown in the ERD. The end user is usually not part of the data diagram, since to be an entity there must be
multiple instances of the entity. If there had been multiple lawyers, or multiple offices involved in this
application, then the situation and the data model might be different.
The attributes that uniquely identify each entity member are underlined. These unique identifiers will become
primarily keys of the tables formed by their entities. The Case Number was provided by the end user from their
3 � 5 card file. The Client ID was a surrogate key generated as a simple way to identify different clients. As you
can imagine, using names would lead to eventual problems that could be avoided by using a single column
surrogate key.
The case outcomes, judge assigned, trial date(s), income from the case, billing information, and so on, although
interesting, were not requested by the end user, and therefore are not shown on the ERD. They could be easily
added later, as relational databases are a very flexible database design.
In module 4 we will show how this ERD, and the others in the following case studies, can be converted into
tables for a relational database.
VIII. Case Study #2 (10K Race)
For this database application, we will consider a community's annual fund-raising event—a 10 K (10 kilometers,
or about 6.2 miles) foot race, in which 200 to 300 runners regularly participate.
Module 3: Relational Database Design
The race director (our end user) informs the database developer that he is tired of manually entering the
entrants onto a spreadsheet and is having an especially hard time determining the overall winners and agegroup winners from the finish times of those entrants who actually compete and finish the event. Plus there is a
requirement to keep runners' names and addresses to solicit their entry in each year's race via a bulk mailing.
As entrants mail in their entry fee, they are checked against a list of past entrants to see if addresses have
changed.
Phone numbers and e-mail addresses are not kept in the database. Entrants can be identified only by name
because of privacy concerns about social security numbers, so we will probably want to generate a surrogate
key to identify them. Each entrant is assigned a BIB number (the unique number they will wear in that race) for
the event, but no one is guaranteed a particular number, as the same set of numbers, typically from 1 to 300,
are re-used every year.
On race day, approximately 15 percent of the runners who entered do not show up, and another 50 percent
beyond those pre-registered sign up on race day, just before the event starts. As runners finish, their times are
recorded. But not all starters may finish the event for a variety of reasons.
Awards are given to the fastest male and female runners overall as well as to the first one to three finishers in
each age group, based on the number of participants. Age groups consist of runners of the same sex who share
the same "age decade" (e.g., all 20- to 29-year-old males compete as an age group, and females who are aged
60 and above compete as another age group).
Based on this end-user interview, the ERD shown in figure 3.8, below, was developed.
Figure 3-8
Notice that there are three entities in a type hierarchy: RUNNER, ENTRANT, and FINISHER. Through an analysis
of the application we deduced that each RUNNER may either be an ENTRANT or a NON_ENTRANT each year.
Since we are not interested in RUNNERs that don't enter, we have not included the NON_ENTRANT entity in the
ERD. In a similar manner, there are two subclasses of ENTRANT, FINISHER and NON_FINISHER, and we are not
interested in the non-finishers (called DNFs for "did not finish").
Module 3: Relational Database Design
The weak entity ENTRANT has all of the attributes of its strong entity RUNNER, plus some of its own attributes.
Similarly, the FINISHER subclass inherits all of the ENTRANT entity attributes.
RUNNER contains attributes that should not change, as well as other information about the runner, such as the
Address and Last Year Raced. Note that the derived attribute Last Year Raced could be determined by searching
the ENTRANT entity's instances, but this turns out to be cumbersome and leads to poor query performance
when preparing the bulk mailing.
The ENTRANT entity contains attributes that will change from year to year, such as the Year of the event,
runner's Age, runner's BIB number and runner's Age Group (another derived attribute). Weak entity FINISHER
is dependent on another weak entity, ENTRANT, plus it has some additional attributes of its own, such as Finish
Time and Award (yet another derived attribute).
Prior to developing the ERD we noted the following relationship sentence pairs:
Each RUNNER registers as one or more ENTRANTs.
Each ENTRANT IS-A RUNNER.
Each ENTRANT may become a FINISHER.
Each FINISHER IS-AN ENTRANT.
In this ERD, the ENTRANT/FINISHER relationship is one-to-one, which is common in a type hierarchy of entities.
Notice that there is full participation of the RUNNER entity in ENTRANT (you don't get into the database unless
you've entered at least one race), but a partial participation of the ENTRANT entity in FINISHER (you might not
finish just because you enter).
The composite attributes Name and Address are similar to those in the Legal Cases database. Last Year Raced is
an attribute of RUNNER that indicates the last time the runner registered for a race and is used to help in
preparing a bulk mailing to those runners most likely to participate in a new year's event. The derived attribute
Award has values such as "First Place Male Overall" or "2nd Place Men 50-59". Note that specific attribute values,
such as "Male" or "Female" for Gender, or the values for the derived attribute Age Group are not shown in an
ERD. We only show the generic attribute names.
As an interesting note, observe that the composite unique identifier of both the ENTRANT and FINISHER entities
is Year and BIB. This composite is also the foreign key of the FINISHER entity.
Many solutions to this and most applications are possible. But as long as each ERD meets the requirements for
the data and queries of the application, and is properly normalized (a topic discussed in detail in module 5), the
design is correct. Phrased another way, there is no single database design for every problem, but each solution
must meet the requirements and result in a manageable database.
IX. Case Study #3 (Car Rental Application)
The final case study presents a company that is involved in renting a fleet of cars to customers. Obviously, to
be profitable, the company rents cars as often as possible and tries to achieve repeat business with its
customers.
Each car is identified by a vehicle identification number (VIN). Charges for rental cars are based on class of car
rented (make, model, and size), days rented, length of rental, whether the rental takes place on weekends or
holidays, mileage driven, discounts, and insurance. Customer information must be maintained along with the
date and time of rental and rental pick-up and drop-off locations.
Initially the developer tried to relate entities CUSTOMER and CAR as follows:
Each CUSTOMER rents one or more CARs.
Module 3: Relational Database Design
Each CAR may be rented by one or more CUSTOMERs.
This ERD portion is shown in figure 3.9A, below. These two separate relationship sentences combine to form a
many-to-many relationship. As stated in the commentary earlier, for many-to-many relationships we must form
a bridge entity and then split the many-to-many relationship into two new one-to-many relationships, as shown
in figure 3.9B. below.
Figure 3.9A
Figure 3.9B
The initial sentence pair is now abandoned since CUSTOMER and CAR are no longer directly related. The two
new relationship sentence pairs we will use are:
Each CUSTOMER is involved in one or more RENTALs.
Each RENTAL includes a CUSTOMER.
and
Each RENTAL includes a CAR.
Each CAR may be involved in one or more RENTALs.
Note that RENTAL is really a weak entity, whose existence depends on the two strong entities, CUSTOMER and
CAR. Adding the other entities and their attributes results in the final ERD for this application as shown in figure
3-10. below.
Figure 3-10
Module 3: Relational Database Design
In the final ERD, we decided to add two entities: RATE_TABLE and RATE_SCHEDULE. The RATE_TABLE entity
consists of separate RATE_SCHEDULE information to note the particular schedule's standard Daily Rate. This
then becomes the derived Base Rate attribute of the RENTAL entity when multiplied by the number of days of
the rental. Each of the RENTAL charge components is a simple attribute.
The Rental ID is a unique invoice number for each rental transaction. We chose to use this surrogate key,
versus the composite of Account Number, VIN and Date/Time Out, to identify each RENTAL.
Developing the two entities RATE_SCHEDULE and RATE TABLE requires a bit of a leap in thinking. A
RATE_TABLE comprises all the possible RATE_SCHEDULEs that may be in effect at any specific time. For
example, at the present time, assume there are 55 different RATE_SCHEDULEs in the single, current RATE
TABLE.
Each RATE SCHEDULE is like a line in a RATE_TABLE schedule. For example, one RATE_SCHEDULE would be for
compact cars rented on a weekday for only one day, with a mileage charge. A second may be for the same type
of car rented at the same time, also for one day, but with unlimited mileage. A third schedule may be for
compact cars rented on a weekday for three or more days, with a mileage charge. Continuing this thinking,
there could be a large number of RATE_SCHEDULEs, each with their own standard Daily Rate (e.g., $19/day)
based on size/class of car (compact; mid-size; full size; luxury), day rented (weekday; weekend; holiday), days
rented (1; 3 to 5; 6 or more) and mileage option (charge per mile; unlimited; combination of unlimited to some
point, then a charge per mile).
In the next module, we will see how the logical design using an ERD is analyzed to specify the tables for a
relational database, how these tables are constructed using SQL during the physical design for a relational
database, and, finally, how the tables are populated with SQL.
Return to top of page
Download
Study collections