Chapter 10 – database concepts - Computer Science and Engineering

CHAPTER 10 – DATABASE CONCEPTS
This chapter discusses the basic concepts needed to understand and use simple databases. While
the spreadsheet’s power lies in its ability to analyze data and relate values by creating formulas
that reference other cells, a database management system (DBMS) is designed to relate groups
of information and to store, retrieve, and manipulate that information in an efficient manner.
DATA, DATABASE, AND DBMS
DEFINING A DATABASE:
Data is a numeric or alphanumeric group of symbols, such as 223197001. When we give
meaning to data it becomes information. For example, 223197001 has no meaning
unless we are told it is a social security number. A piece of information becomes even more
meaningful when it is related to another piece of information: 223197001 is John Smith’s social
security number.
A database is a collection of related data stored in a well-defined structure. Databases exist in
both computerized and non-computerized formats.
Examples of databases include a
categorized file cabinet, the telephone book, a list of alphabetized songs on your iPod, or a listing
of all students and classes at a university. Databases are managed by software tools known as
Database Management Systems (DBMS). Examples of DBMS’s are Microsoft Access,
FoxPro, Oracle, and Sybase, among many others. Just as a word processor (e.g., Microsoft
Word) is used to create and edit documents, a DBMS is used to create and manage databases.
A RELATIONAL DBMS
Each DBMS is based on a database model that defines the way the information should be
organized and accessed. The three most commonly used models are the hierarchical, network,
and relational. Of these the most flexible is the relational model, which is what we will be
discussing in this chapter.
The relational model represents data and relationships using a collection of tables, as seen
in Figure 1. Each table is organized into categories of data known as fields. The table on the
left side of Figure 1 stores information regarding patients, including patient identification
number, name, address, and the doctor number of the physician treating this patient.
Patient table:
Patient#  Name           Street        City          State  ZipCode  DoctorID
AC34      Marsh, Allen   134 Central   Berridge      FL     60330    21
BH72      Verns, Julie   415 Main      Berls         FL     60349    24
BL12      Lee, Thang     12 Mountain   Denton        FL     60412    24
EA45      Orwich, Robin  867 Ridge     Fort Stewart  FL     60336    27
FD89      Ferb, Michael  34 Crestview  Berridge      FL     60330    21

Doctor table:
DoctorID  Last Name  First Name
21        Kerry      Alyssa
24        Reeves     Camden
27        Fernandez  Jaime
34        Lee        Jan

Related field on tables: DoctorID
Figure 1
A relational database may contain more than one table, and these tables may themselves be
related to each other. The example in Figure 1 contains a second table with information about
each doctor. Notice that each doctor is identified by a unique DoctorID which can be related to
the DoctorID on the patient table in each patient record.
In addition to tables, most modern day DBMS’s include other objects which allow the user to
store, retrieve, and manipulate data. In MS Access these objects include the following:
Queries – “questions” that retrieve information from a database. Queries are structures to
sort, filter, and select specific information.
Forms – structures for displaying data that allow a user to view information from and input
information in one or more objects (tables, queries, etc.).
Reports – structures for written output of data which again allow one to combine information
from one or more objects and view both details and summaries.
Macros & Program Modules – program code to perform specific actions.
A RELATIONAL DATABASE EXAMPLE – SUPPLY CHAIN MANAGEMENT
The diagram in Figure 2 represents part of an Order Entry and Inventory control system. The
system includes forms, queries, and reports for data entry and retrieval. The tables store
information regarding product inventory, vendors, customers and orders.
Diagram labels:
Forms: Order Transactions; New Vendors & New Customers; New Products; Shipments
Program Modules
Tables: Current Inventory; Vendor List; Orders; Order Details
Program Modules
Reports & Queries: Daily Ship List; Customer Invoices; Accounts Payable; Customer Accounts
Output File: Inventory Low Message
Figure 2
The flow of information in the database for a typical order that might be phoned in by a
customer may be as follows:
• The order entry clerk would enter the order into an Order Transaction form.
• Once the order is input, a predefined program module would take this data and enter it
  into the appropriate tables: e.g., Order Details table, Customer Accounts table, etc.
• A Daily Pick List report will be printed for the fork lift operator in the closest warehouse
  and a customer invoice report will be printed to be included with the shipment.
• The inventory table will be updated at this warehouse with this reduction in quantity; if
  insufficient inventory remains, an order report would automatically be emailed to the
  supplier to order more inventory.
As you can see, a company’s supply chain system using DBMS software is an extremely valuable
tool in modern day business. This course will discuss some simple database concepts as well as
how to design and query a simple database. The mechanics of setting up tables, reports, and
forms are covered in the course textbook. Setting up program modules/macros is beyond the
scope of this course.
ACCESS TABLES: RECORDS AND FIELDS
DEFINING FIELDS AND RECORDS OF A TABLE
The basic component of an Access database is the table. All other objects are based on the
structure and data within the tables. Each table is organized into a specified set of ordered
categories, or fields. Figure 3 is part of a table named Customers. The fields in the Customers
table include SSN, First Name, Last Name, Address, City, State, and Postal Code. Related
information is input into the table as records. Each record contains related values for each
table field. For example, the first record in this table contains Jane Doe’s SSN, her first name,
her last name, her address, her city, her state, and her postal code in that order. Jane Doe’s
record does not contain John Black’s SSN or Mary Parks’s postal code. In addition, the third
piece of data in any record in this table will always be the Last Name, as the order of values in
each record is the same as the order of the fields in the table.
Table: Customers
SSN          First Name  Last Name  Address         City      State  Postal Code
070-13-2976  Jane        Doe        123 W. Lane Av  Columbus  OH     43210
121-78-8233  John        Black      34 Grand Av     Seattle   WA     90012
273-49-2211  Richard     Taylor     99 King Dr      Chicago   IL     60638
873-38-3923  Mary        Parks      54 Elm St       Houston   TX     34167
(Callouts: SSN is the primary key; each row is a record; each column is a field.)
Figure 3
A table is frequently pictured as a collection of records in which fields are columns and records
are rows. A database management system is not limited to this physical view of a table.
However, for purposes of abstracting the actual processes a DBMS performs, this table
view is helpful. In Access this view can be invoked from the Fields ribbon.
In Access 2010, the Fields ribbon allows the user not only to view and input data, but also to add
new fields and to specify field properties. Figure 4 illustrates the Access window with the
Accounts table open and the Fields ribbon visible.
Callouts: Fields ribbon; Views button (switch from datasheet to design); Navigation Pane
(lists database objects); Record Selection Buttons; View buttons
Figure 4
FIELD PROPERTIES
Fields are defined by field properties. The diagram in Figure 5 shows the design view of an
Access table where field properties may be specified. The most common field properties are:
• Data type: The type of information stored, e.g., Number, Text, Currency, Yes/No (Boolean),
  Memo, etc.
• Field size: The number of characters for text or the precision of numbers, e.g., numbers can
  be integer, long integer, single precision, double precision, decimal, byte, etc.
• Format: For numbers, the format specifies display properties such as currency style,
  scientific notation, etc.
• Input Mask: Predefined formats for displaying the field, such as social security numbers
  with dashes displayed but not stored, etc.
• Caption: The title that is displayed instead of the field name.
• Default Value: A value that will be used if this field is left blank when a record is entered.
• Validation Rule: A list of possible values or range of acceptable values for this field.
• Required: If selected, this field must be entered when entering or modifying a record, or the
  computer displays an error message.
Figure 5
It is advantageous to specify field properties both to make the table easier to use and more
efficient. For example, each time a new record is created, Access allocates memory (bytes) based
on the field size specified. That is, an amount of memory the size of the field is set aside, whether
it is needed or not. If a specific text field only requires three characters and the size is specified as
50, each record will waste the space of 47 characters. If each character takes two bytes of
memory, 94 bytes of storage space would be wasted per record. A large database with 100,000
records would be wasting 9.4 million bytes!
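The arithmetic above can be checked with a short calculation. The following Python sketch uses the figures from the text (a field size of 50, three characters needed, two bytes per character, 100,000 records). The function name wasted_bytes is only illustrative, and the fixed-allocation model simply follows the description above rather than any documented Access behavior.

```python
def wasted_bytes(field_size, chars_needed, bytes_per_char, record_count):
    """Storage set aside but never used, assuming every record reserves the full field size."""
    unused_chars = field_size - chars_needed
    return unused_chars * bytes_per_char * record_count


per_record = wasted_bytes(50, 3, 2, 1)
total = wasted_bytes(50, 3, 2, 100_000)

print(per_record)  # 94
print(total)       # 9400000
```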
WHAT FIELD TYPE SHOULD YOU USE?
A social security number consists of 9 digits. What field type would be best suited to store this
data? Using a number gives the user the ability to perform arithmetic calculations, while text
does not. Will it ever be necessary to perform arithmetic calculations on these values? Probably
not. So a number type is not needed, but can it be used?
Consider the social security number 003278343. If this value is typed into a Number field, what
value will be stored? Try it and you’ll find that the value 3278343 is displayed – the leading
zeros are discarded. Does that matter? In the case of social security numbers, this matters
greatly. The user will not want to have to “add” the zeros to print out a person’s data in a report.
If the value was stored as text, the zeros would remain part of the data stored.
Thus the best field choice for a social security number is a text field. This same logic applies to
zip codes and even phone numbers.
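The loss of leading zeros is not specific to Access; it follows from storing the value as a number. A minimal Python sketch of the same effect, using the social security number from the example above (the variable names are illustrative):

```python
ssn_typed = "003278343"

as_number = int(ssn_typed)  # numeric storage keeps only the value
as_text = ssn_typed         # text storage keeps every character

print(as_number)  # 3278343
print(as_text)    # 003278343

# Restoring the zeros from the number requires knowing the intended width.
print(str(as_number).zfill(9))  # 003278343
```

Because the number alone does not record how many digits were typed, the width (9) has to be supplied separately when the zeros are put back, which is exactly the extra work the text warns about.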
UNIQUELY IDENTIFYING TABLE RECORDS
Imagine a large bank with over 100,000 accounts; can a person’s last name alone be used to
identify the contents of their bank account? Is it possible that two customers have the same last
name, or even that one customer has multiple accounts? If such a customer made a deposit to
their account how do we know which account to use?
Obviously this is a very realistic situation and must be taken into consideration when designing
a database. To solve this problem, database designers include a field in tables that uniquely
identifies each record. The Customers table presented in Figure 3 contains a unique social
security number, SSN. Since no two people have the same social security number, this can be
used to uniquely identify a person/record in the table. A field that uniquely identifies a record is
referred to as a primary key field. The primary key cannot be blank, nor can it contain any
duplicate values (no two records may have the same value for the primary key field).
Will SSN always be a good primary key field to use? Not necessarily, it will depend on the
situation. Would an SSN uniquely identify a bank account? If a customer can have multiple
accounts (e.g., one for savings and one for checking) then the SSN is not a unique
identifier. To solve this problem the bank may use a unique account number, as seen in Figure 4.
A combination of fields can also be used to uniquely identify a record. Consider a table of
transactions: a combination of account number and transaction time might be used to uniquely
identify a record, though normally these types of tables would be set up with a separate
transaction number. You’ve probably used these types of numbers in other applications, such as
when you look up an airline reservation or track a FedEx package.
A primary key field is not always necessary; not every table will have a single-field primary key
or any primary key at all. However, every relationship between tables must have some field or
combination of fields that uniquely identify records in one of the tables. Otherwise, for example,
it could not be established exactly which transaction will go to which account. When using a
combination of fields as a key, additional fields are known as secondary, tertiary, etc. fields. A
possible combination of fields to uniquely identify a bank customer could be the first name,
birth date, and phone number to include with last name. In this course we will use single
primary key fields to uniquely identify records.
In documentation a table is normally listed by its name followed by an ordered list of fields in
parentheses. The primary key is underlined (here, acct#). The Accounts table would be written
in this notation as follows:
Accounts(acct#, SSN, LastName, FirstName, address, City, State, Postal Code)
This description of the Accounts table is also known as a relational schema.
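To make the primary key rule concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Access. The column name acct_no stands in for acct#, only a few of the schema's fields are included, and the account numbers are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Accounts (
        acct_no    INTEGER PRIMARY KEY,  -- stands in for acct#
        SSN        TEXT NOT NULL,
        LastName   TEXT,
        FirstName  TEXT
    )
    """
)
conn.execute("INSERT INTO Accounts VALUES (1001, '070-13-2976', 'Doe', 'Jane')")
# The same customer may own a second account: SSN repeats, acct_no does not.
conn.execute("INSERT INTO Accounts VALUES (1002, '070-13-2976', 'Doe', 'Jane')")

try:
    # Reusing an existing primary key value is rejected by the database.
    conn.execute("INSERT INTO Accounts VALUES (1001, '121-78-8233', 'Black', 'John')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Accounts").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

The repeated SSN is accepted because it is an ordinary field in this table, while the repeated account number is refused because it is the key.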
RELATING TABLES IN A RELATIONAL DBMS
Tables are structured into records of related data organized into ordered fields. Records are
uniquely identified by a primary key field. By uniquely defining a field such as an account
number we can find data corresponding to that account number that may reside in other tables,
such as transactions made on that account. In this section we will explore relating data between
tables.
DEFINITION: FOREIGN KEYS
Databases can contain multiple tables where a relationship between tables is established by
correspondence between fields. The field used to relate two tables is referred to as the foreign
key. Consider the two tables in Figure 6. The first table lists an account number, the name of
the person listed on the account, and their address. The second table lists bank deposits
identified by the depositor’s name. Can each deposit be related to an account using the Name as
the foreign key between the two tables? Is Smith’s $100 deposit for the Smith on Main Street or
the Smith on Cherry Lane? Clearly using a field that has duplicate values on both tables does
not work very well.
Acct#  Name   Address
1      Smith  123 Main St.
2      Jones  45 Elm St
3      Smith  27 Cherry Lane

?

Name   $Deposit
Smith  25
Smith  100
Jones  25
Figure 6
Figure 7 contains a modified version of the second table that includes the account number
instead of name. Can each deposit be uniquely matched to a single account using the account
number as the foreign key? If the account number is the primary key of the first table, the
transactions on the second table can be related to a specific account. Thus, a foreign key must
be a primary key on at least one of the tables for a relationship to be valid.
Acct#  Name   Address
1      Smith  123 Main St.
2      Jones  45 Elm St
3      Smith  27 Cherry Lane

Related field: Acct#

Acct#  $Deposit
1      25
3      100
2      25
Figure 7
This type of relationship where many values from one table (many deposits) can match to a
single value on the related table (one acct#) is referred to as a Many to One relationship (or
a One to Many relationship). An equally valid relationship would be a One to One relationship
where each record of one table corresponds to at most one record in the second table. A One to
One relationship occurs when the foreign key is the primary key on both tables.
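The Many to One relationship in Figure 7 can be sketched with Python's built-in sqlite3 module (not Access). The table and column names below are illustrative; the data is taken from Figure 7. The query asks which account the $100 deposit belongs to, the question that could not be answered when the tables were related by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (acct_no INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute(
    "CREATE TABLE Deposits ("
    " acct_no INTEGER REFERENCES Accounts(acct_no),"  # declares the foreign key
    " amount INTEGER)"
)
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [(1, "Smith", "123 Main St."), (2, "Jones", "45 Elm St"), (3, "Smith", "27 Cherry Lane")],
)
conn.executemany("INSERT INTO Deposits VALUES (?, ?)", [(1, 25), (3, 100), (2, 25)])

# Each deposit matches exactly one account through the account number.
matched = conn.execute(
    "SELECT d.amount, a.name, a.address"
    " FROM Deposits AS d JOIN Accounts AS a ON a.acct_no = d.acct_no"
    " WHERE d.amount = 100"
).fetchall()
print(matched)  # [(100, 'Smith', '27 Cherry Lane')]
```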
REQUIREMENTS OF A VALID FOREIGN KEY
It is not required that the field names on each table match. Conversely, two fields that have the
same name do not imply there is a foreign key relationship.
The following rules define what is required for a relationship between two tables to be valid:
1. The foreign key must be a primary key on at least one of the tables.
2. The field types for the foreign key field must be the same on both tables.
3. The information being related must be the same.
The first requirement has already been discussed. What about the second requirement, what
does it mean the field types must be the same? Consider the foreign key in the previous
example. What field type is the acct# field? The designer had a choice of using Number or Text.
Either would have worked, though one may have been more efficient than the other. What
matters is consistency between the two tables. If the acct# field is an Integer number on the first
table and Text on the second table, Access will not be able to match the foreign key records.
Why should it make a difference which data type is specified? After all, you can’t see the type of
the field when looking at the information in the datasheet view. However, remember the data is
being stored in the memory of a computer as a series of high/low level electrical charges that we
express as zeros and ones. The text representation for the digit 1 may be a series of 32 zeros and
ones while the Integer representation for the number 1 may be a series of 16 zeros and ones.
These two values are NOT EQUAL and thus the computer will not recognize the two values as
matching.
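The difference between the text "1" and the number 1 can be seen directly in Python. The encodings chosen below (two-byte UTF-16 text and a two-byte integer) are assumptions made only for illustration; the sizes and layouts Access uses internally may differ.

```python
text_one = "1"
int_one = 1

print(text_one == int_one)  # False: a text value never equals a number

# Illustrative encodings only: 16 bits each, written most significant byte first.
text_bits = "".join(f"{byte:08b}" for byte in text_one.encode("utf-16-be"))
int_bits = "".join(f"{byte:08b}" for byte in int_one.to_bytes(2, "big"))

print(text_bits)  # 0000000000110001
print(int_bits)   # 0000000000000001
```

Even when both values occupy the same number of bits, the patterns differ, so a comparison of the stored values fails unless one side is converted first.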
What about the third requirement for a valid foreign key, what does “information being related
must be the same” mean? Consider two tables, where each has account numbers. Table 1
contains the account numbers at First City Bank and Table 2 contains the account numbers at
Union Trust bank. While the fields may have the same name and type, there is no relationship
between account 1234 at First City Bank and account 1234 at Union Trust bank. Relating the
records between these two tables based on account number would be meaningless.
MANY TO ONE TO MANY RELATIONSHIPS
In the following example another table has been added to the database to keep track of
withdrawals. The schema of the table is Withdrawals(Acct#, Amount). Can the Withdrawals
table be related to the Deposits table from the previous example?
Deposits
Acct#   Amount
256887  $50
256887  $75
654887  $32

?

Withdrawals
Acct#   Amount
256777  $25
654887  $100
256887  $25
Figure 8
The fields Acct# (table Deposits) and Acct# (table Withdrawals) both represent a customer’s
account and we can assume they both are specified as the same data type. Yet in neither case
are these fields primary keys. There may be many instances of an account number on the deposits
table (e.g., 256887) and many instances of that same account number on the withdrawals table.
However, a deposit does not correspond to a withdrawal (and vice versa). Thus, the account
number would not be a valid foreign key. This type of relationship is referred to as a Many to
Many relationship.
Can we relate the tables using the Amount fields? While the Amounts fields have the same
name, they represent different information. On the deposits table Amount is the money into an
account. On the withdrawal table Amount is the money out of an account; it does not make
sense to relate these fields. In addition, it is possible that two transactions contain the same value
in the amount field, so neither of these fields is a primary key.
But certainly it makes sense that somehow the deposits into an account are related to the
withdrawals from that account. To solve this dilemma databases are designed with intermediate
tables, in this case the Accounts table. In the Accounts table the account number is the primary
key and can be related to both the Deposits table and the Withdrawals table, as seen in Figure 9.
This changes the relationships so that there are now two Many to One relationships. The
relationship between three such tables is referred to as a Many to One to Many relationship.
Another possible database design would be to combine the Deposits and Withdrawal tables into
one Transactions table, since the fields are essentially the same, account number and a
monetary value. In the latter type of design, deposits would need to be entered as positive
values and withdrawals as negative values.
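A Many to One to Many arrangement can be sketched with Python's built-in sqlite3 module (not Access), using the Figure 8 data. The Accounts table here is an assumption: it simply lists every account number that appears in either table. Deposits and withdrawals are never matched to each other; each side is totaled separately and attached to the single Accounts record it belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (acct_no INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Deposits (acct_no INTEGER REFERENCES Accounts(acct_no), amount INTEGER)")
conn.execute("CREATE TABLE Withdrawals (acct_no INTEGER REFERENCES Accounts(acct_no), amount INTEGER)")

conn.executemany("INSERT INTO Accounts VALUES (?)", [(256777,), (256887,), (654887,)])
conn.executemany("INSERT INTO Deposits VALUES (?, ?)", [(256887, 50), (256887, 75), (654887, 32)])
conn.executemany("INSERT INTO Withdrawals VALUES (?, ?)", [(256777, 25), (654887, 100), (256887, 25)])

# Each side is summarized per account, then attached to the one Accounts row.
summary = conn.execute(
    """
    SELECT a.acct_no,
           (SELECT COALESCE(SUM(d.amount), 0) FROM Deposits d WHERE d.acct_no = a.acct_no),
           (SELECT COALESCE(SUM(w.amount), 0) FROM Withdrawals w WHERE w.acct_no = a.acct_no)
    FROM Accounts a
    ORDER BY a.acct_no
    """
).fetchall()
print(summary)  # [(256777, 0, 25), (256887, 125, 25), (654887, 32, 100)]
```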
Figure 9
DEFINING RELATIONSHIPS IN AN ACCESS DATABASE
Just as our DBMS software will allow us to define a table and enter records, it also allows us to
define relationships between tables. The Relationships tool can be launched from the
Relationship button on the Database Tools ribbon in the Show/Hide group, as shown in Figure
10. Figure 11 shows the relationships view that displays the relationship between the Accounts
and Transactions tables.
Figure 10
Figure 11
When the relationships view is first launched in a new database it will be blank. Each table
must be individually added from the Show Table box, as seen in Figure 12. To open the Show
Table dialog box, right-click anywhere in the relationships window. Once the box is open, click
on the Tables tab (or Query tab for a query), select the name of the table to be added, and then
click the Add button (or double-click on the table name). Repeat the process to add additional
tables. Figure 13 illustrates the Relationships window with the Accounts and Transactions
tables added but not yet related. To relate tables follow these three steps:
• Move the mouse to the foreign key field listed on one of the tables.
• While holding down the left mouse button, drag the cursor to the corresponding field on the
  second table. The cursor will change into a circle with a line through it during this process.
• When you reach the corresponding field, release the mouse button.
A line should appear that connects the two fields, similar to the picture originally shown in
Figure 11. Once all relationships have been defined, close the window and select the Yes button
to save. Repeat this process to make any additions/changes to the relationships window.
Relationships can be deleted by clicking on the relationship line and then pressing the Delete key.
Figure 12
Figure 13
USING TABLES TO MANIPULATE DATA
One of the main reasons for using DBMS software is the ability to quickly and easily locate
specific data or sets of data. Within an Access table it is possible to quickly and easily locate data
using the Filter and Sort tools. To understand how a database finds records, we will also briefly
explore the concept of search routines and Indexing.
APPLYING DATA FILTERS
The filter tool can be used in the datasheet view to display selected records of a table. A filter
allows us to specify criteria in a field or fields and show only those records that meet the criteria.
Using the filter tool we can list only those people who live in Columbus, those people whose last
name is Jones, or even only people whose last name is Jones and live in Columbus. The
mechanics of setting up filters from the datasheet view of a table can be found in any of the
step-by-step instructions in the course text. The Filter tools can be found in the Sort & Filter
group of the Home tab (Figure 14). They include:
• The Filter button allows the user to sort or choose one listed item as a filter criterion. This
  list will vary depending on which field of the database is currently selected. To select a field,
  simply place the cursor on any record in the field to be filtered.
• The Selection button applies the filter using the current selection as the criterion. If the
  highlighted “cell” is on record 3 in the account field and that value is 123, then 123 will be the
  selection criterion. After clicking on the Selection button, options to select records based
  on this filter will appear, including equals, does not equal, greater than, etc.
• The Advanced button gives the user several options for filtering data, including Filter
  by Form, which enables criteria to be defined across multiple fields.
• To remove the filter, click the Toggle Filter button.
Figure 14
This ability to find records that meet a specific set of criteria will be greatly extended in the next
chapter using a database query.
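The idea behind a filter, showing only the records that meet criteria in one or more fields, can be sketched in a few lines of Python. The records and the helper name apply_filter are hypothetical and are not part of Access; the sketch only illustrates the logic of combining criteria.

```python
customers = [
    {"last_name": "Jones", "first_name": "Ann", "city": "Columbus"},
    {"last_name": "Jones", "first_name": "Bill", "city": "Seattle"},
    {"last_name": "Doe", "first_name": "Jane", "city": "Columbus"},
]


def apply_filter(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [r for r in records if all(r[field] == value for field, value in criteria.items())]


in_columbus = apply_filter(customers, city="Columbus")
jones_in_columbus = apply_filter(customers, last_name="Jones", city="Columbus")

print([r["first_name"] for r in in_columbus])        # ['Ann', 'Jane']
print([r["first_name"] for r in jones_in_columbus])  # ['Ann']
print(len(customers))                                # 3
```

As with a filter in the datasheet view, the table itself is unchanged; only the set of records displayed is narrowed.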
SORTING TABLES
From the Datasheet Table View in Access, tables can also be sorted. Using the sort tool, select
a field and a sort type (ascending or descending). The records will be temporarily rearranged
based on this order. Sorts can be performed by clicking on the field to be sorted and then
selecting either the ascending or descending sort buttons in the Sort & Filter group of the
Home tab.
If the table is not saved using the Save button, the table will revert back to the original
record order when reopened. Sorting is an efficient tool for helping to retrieve specific records.
More advanced sorts using multiple sort keys can be done using a query.
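Ascending, descending, and multi-key sorts can be sketched in Python using the names and cities from Figure 3. This is an illustration of the concept only, not of how Access sorts internally; the tuple layout (last name, first name, city) is an assumption made for the example.

```python
from operator import itemgetter

customers = [
    ("Doe", "Jane", "Columbus"),
    ("Black", "John", "Seattle"),
    ("Taylor", "Richard", "Chicago"),
    ("Parks", "Mary", "Houston"),
]

by_last_name = sorted(customers, key=itemgetter(0))                     # ascending
by_last_name_desc = sorted(customers, key=itemgetter(0), reverse=True)  # descending
by_city_then_name = sorted(customers, key=itemgetter(2, 0))             # two sort keys

print([c[0] for c in by_last_name])       # ['Black', 'Doe', 'Parks', 'Taylor']
print([c[0] for c in by_last_name_desc])  # ['Taylor', 'Parks', 'Doe', 'Black']
print([c[2] for c in by_city_then_name])  # ['Chicago', 'Columbus', 'Houston', 'Seattle']
print(customers[0][0])                    # Doe
```

The last line shows that the original record order is untouched, which mirrors the temporary nature of a sort in the datasheet view.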
INDEXING TABLES AND SEARCH SCHEMES
INDEXING TABLES
There are also methods by which DBMS systems can index your files to create a cross reference
to the table records based on a specific sorting method. Since data is usually stored on magnetic
disks in a linear fashion, file indexing combined with search schemes make it more efficient for
the computer to retrieve records, especially in databases with millions of records. Several
indices can be set up for the same table, allowing for efficient searching on a variety of fields. For
example, the bank can search by account number or by last name, depending on the information
the customer has provided. These searches can always be done whether or not a table is
indexed, but searches over a large number of records are more efficient when using indices.
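One way to picture an index is as a sorted list of field values, each paired with the position of its record in the unsorted table. The Python sketch below builds two such indices over the same small table. The function names build_index and find and the sample records are hypothetical, and real DBMS indices use more elaborate structures than a sorted list.

```python
from bisect import bisect_left, bisect_right

# The table stays in the order the records were entered.
records = [
    {"acct_no": 3, "last_name": "Smith"},
    {"acct_no": 1, "last_name": "Smith"},
    {"acct_no": 2, "last_name": "Jones"},
]


def build_index(table, field):
    """Sorted field values plus the table position each value came from."""
    pairs = sorted((rec[field], pos) for pos, rec in enumerate(table))
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def find(table, index, value):
    """Look the value up in the sorted keys, then follow the positions into the table."""
    keys, positions = index
    low = bisect_left(keys, value)
    high = bisect_right(keys, value)
    return [table[pos] for pos in positions[low:high]]


by_acct = build_index(records, "acct_no")
by_name = build_index(records, "last_name")

print(find(records, by_acct, 2))                              # [{'acct_no': 2, 'last_name': 'Jones'}]
print([r["acct_no"] for r in find(records, by_name, "Smith")])  # [3, 1]
```

Two indices over one table allow efficient lookups on either field without re-sorting the table itself.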
A LINEAR SEARCH ROUTINE
As previously mentioned, one of the reasons to sort tables is to allow for more efficient data
retrieval. For example, imagine a dictionary that had words listed randomly. In order to find a
specific word one needs to systematically go through each word, one by one, until the desired
word is found. This is known as a linear search. On average, a linear search will have to look at
the number of items in a table divided by two in order to find a specific piece of data: it might be
the first word in this randomly organized dictionary, but then again it might be the last.
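A linear search is short enough to write out in full. The Python sketch below also counts how many items were examined; the word list is made up for illustration. For a list of n items the average over all successful searches works out to (n + 1)/2, which is close to the n/2 quoted above when the table is large.

```python
def linear_search(items, target):
    """Return (position, comparisons); position is None when target is absent."""
    for comparisons, item in enumerate(items, start=1):
        if item == target:
            return comparisons - 1, comparisons
    return None, len(items)


words = ["pear", "apple", "fig", "kiwi", "plum", "lime"]

print(linear_search(words, "pear"))  # (0, 1)
print(linear_search(words, "lime"))  # (5, 6)
print(linear_search(words, "date"))  # (None, 6)

average = sum(linear_search(words, w)[1] for w in words) / len(words)
print(average)  # 3.5
```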
Why is efficient data handling so important? First let us understand how data is stored
and retrieved by a DBMS like MS Access. Recall that when working with an Excel spreadsheet,
the entire file is loaded from the disk drive onto the computer’s RAM (random access memory).
So working with spreadsheet data is usually extremely fast, but the size of files is limited by the
RAM of the computer. In fact, one notices a significant slow down of operation speed as the
workbook file increases in size.
In contrast, most relational databases do not load all of the tables, queries, reports, etc. directly
into RAM. They load only the table of contents of the objects. To process information from one
or more objects, just those objects are loaded into RAM. Thus, a DBMS can handle much larger
quantities of data. In fact, many large database systems have millions of records.
When running a DBMS, the computer is not just processing information but continually
retrieving and writing data to and from secondary memory (usually a magnetic hard drive).
While a computer’s RAM can process information at very fast speeds, searching for specific
information on a disk drive and retrieving and/or writing to the drive is a much more time
consuming process. If a file is stored in random order, as with the un-alphabetized dictionary, it
will require the computer to look at the disk many more times to find the information that we
want than if the file was sorted. Consequently, computer scientists are interested in how to
search for information more efficiently.
THE BINARY SEARCH
There are many different search schemes that can be used with indexed files to speed up
retrieval of information. Most of us are familiar with the alphabetical sort routine that
divides textual information into 26 groups based on the first letter of each item, and then further
subdivides each group by the second letter, etc. The search routine to retrieve information from
an alphabetical list, such as a dictionary, is to identify the first letter of the text and match it to
the correct group and then continue doing this with the second letter and so on until a match
has been identified.
A similarly efficient scheme which can be used with numerical data is known as a binary
search routine. A binary search routine is much more efficient than a linear search in finding
information. In Figure 15 records have been sorted by the indexed field, ID#, in ascending
order. To find the record for ID# 606147775 using a binary search routine the computer would
do the following:
1. First go to the middle record (not the average value!) of the list and check to see if the
   value (ID#) equals the value of the middle record. If this is true, the record has been found.
   If not, continue to step 2.
2. If the value is greater than the value of this middle record, ignore all records from the
   beginning of the list until this midpoint. The remaining list will contain only records from
   the midpoint until the end of the list. Set a new midpoint for this new list and begin again
   at step 1.
3. If the value is less than the middle record, ignore all records from the midpoint to the end
   of the list. The new list will contain only records from the beginning of the list until the
   midpoint. Set a new midpoint for this new list and begin again at step 1.
4. This process will continue until a match is found or all records have been searched (in which
   case the value does not appear in the list).

ID#        lname      fname
123456789  Sommer     Martín
139555002  Suyama     Michael
157745969  Lebihan    Laurence
178301771  Berglund   Christina
201529842  Trujillo   Ana
227776183  Moos       Hanna
257436001  Citeaux    Frédérique
290951508  Callahan   Laura
328824082  Moreno     Antonio
371616449  Fuller     Andrew
419975418  Dodsworth  Anne
474621100  Leverling  Janet
536370683  Hardy      Thomas
606147775  Anders     Maria
684995818  Peacock    Margaret
774094122  King       Robert
874775203  Davolio    Nancy
988544806  Buchanan   Steven
Figure 15
Applying this algorithm to this specific example:
• The midpoint of the full list is 328824082. Since 606147775 is greater than 328824082,
  consider only those records starting with 328824082 until the end of the table.
• The midpoint of this new list is 536370683. Is 606147775 greater than 536370683? Yes. So
  now consider only those records from 536370683 until the end of the table.
• The midpoint of this new list from 536370683 to 988544806 is 684995818. Is 606147775
  greater than 684995818? No. So our new list will be from 536370683 to 684995818.
The midpoint of this list is 606147775. Since this midpoint now matches our search value the
desired record has been found.
This search looked at only four different records in the table (328824082, 536370683,
684995818, and 606147775). A linear search over the 18 records in Figure 15 would on average
have looked at (18 + 1)/2, or 9.5, records. This is a significant improvement.
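The search can be sketched in Python over the ID# values from Figure 15. Note one assumption: this is the standard textbook formulation, which drops the midpoint itself from the remaining list after each comparison, so it can reach the answer in fewer probes than the walk-through above, which keeps the midpoint in the list.

```python
ids = [
    123456789, 139555002, 157745969, 178301771, 201529842, 227776183,
    257436001, 290951508, 328824082, 371616449, 419975418, 474621100,
    536370683, 606147775, 684995818, 774094122, 874775203, 988544806,
]


def binary_search(sorted_ids, target):
    """Return (position, probed values); position is None when target is absent."""
    low, high = 0, len(sorted_ids) - 1
    probed = []
    while low <= high:
        mid = (low + high) // 2  # middle record, not the average value
        probed.append(sorted_ids[mid])
        if sorted_ids[mid] == target:
            return mid, probed
        if target > sorted_ids[mid]:
            low = mid + 1   # discard the first half and the midpoint
        else:
            high = mid - 1  # discard the second half and the midpoint
    return None, probed


position, probed = binary_search(ids, 606147775)
print(position)  # 13
print(probed)    # [328824082, 606147775]
```

Searching for a value that is not in the list, such as 500000000, ends with position None after the list has been narrowed to nothing.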
To illustrate the significant difference between linear and binary search routines, consider a
situation where instead of the short list in Figure 15, the list had a million records. If the list
isn’t sorted by the value we’re searching for, in the worst case we would have to look at all one
million records. If the list is sorted and we use a binary search, we would only have to look at
about twenty, because each step halves the list and 2 raised to the 20th power is just over one
million. If the list had 10 million records and we could use a binary search, the worst case is
still only about twenty-four records.
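For the standard formulation of binary search, in which the midpoint is dropped from the list after each comparison, the largest number of records that can be examined in a sorted list of n records is floor(log2(n)) + 1. The Python sketch below computes that bound; the function name is illustrative, and the bound is a property of the standard algorithm rather than a figure taken from this chapter.

```python
def worst_case_probes(record_count):
    """Most records a standard binary search examines: floor(log2(n)) + 1 for n >= 1."""
    return record_count.bit_length()


for n in (18, 1_000_000, 10_000_000):
    print(n, worst_case_probes(n))
# 18 5
# 1000000 20
# 10000000 24
```

Multiplying the table size by ten adds only about three or four probes, while the worst case for a linear search grows by the same factor of ten.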
A binary search is only one of many different methods computer scientists use to improve the
efficiency of retrieving data. There are computer science courses devoted solely to this topic.
This discussion is only meant to provide you an appreciation of the processes involved and an
insight into the importance and complexity of the topic.
DESIGNING YOUR OWN DATABASE
The design of a complex database management system can take weeks, months or even years to
complete, involving thousands of man-hours of effort by a team of computer scientists and
management. You may someday be part of one of these teams, or you may just be trying to
create a small database to keep track of a guest list for a large party. Regardless of the size and
complexity of your database, there are several things one must consider before creating a
database. As with a spreadsheet, the critical step in designing an effective database is to plan it.
Think about the following:
• What data objects are present? Customers and account transactions are each table objects in
our sample database.
• How is the data related? In our sample database, we have related these objects by a foreign
key field (SSN).
• What information will be generated from the data? Will we need to design queries and/or
reports to list all accounts for owners who live in Columbus, or summarize transactions by
account?
When setting up even the simplest of tables there are several factors to consider:

Tables should be divided into inseparable fields. For example, if an address field
contains the entire address (street, city, state and zip code) it may not be possible to sort our
list by state, or to display only those records within a specific zip code. In this case we may
want to define each of the address elements as separate fields.

Appropriate field types should be selected with respect to the type of data being stored.

Appropriate field sizes should be used to minimize data storage.

Field properties should be defined to aid in data input (validity, defaults, etc.).

Each fact should change in only one place (except the foreign key and primary
key fields). If a fact appears in more than one record of a table, it should probably be
defined in another table.

Calculations shouldn’t be part of the table. In a subsequent chapter we will discuss
how to perform calculations using the Access Query tool.

Appropriate primary keys should be selected that enable relationship structures between
tables of information.
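Several of these factors can be seen together in a small table definition. The sketch below uses Python's built-in sqlite3 module rather than Access, and the Contacts table, its field names, and its sample rows are all hypothetical. It shows an address split into separate fields, a declared type for each field, a validity rule, a default value, and a primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary database held in memory
conn.execute("""
    CREATE TABLE Contacts (
        ContactID INTEGER PRIMARY KEY,              -- uniquely identifies each record
        LastName  TEXT NOT NULL,
        Street    TEXT,                             -- address stored as separate fields
        City      TEXT,
        State     TEXT CHECK (length(State) = 2),   -- validity rule on input
        Zip       TEXT,
        Active    INTEGER NOT NULL DEFAULT 1        -- default value when none is given
    )
""")
conn.executemany(
    "INSERT INTO Contacts (LastName, Street, City, State, Zip) VALUES (?, ?, ?, ?, ?)",
    [("Smith", "1 Main St", "Columbus", "OH", "43210"),
     ("Jones", "9 Elm Ave", "Austin", "TX", "73301"),
     ("Lee", "4 Oak Rd", "Dayton", "OH", "45402")],
)

# Because State and Zip are separate fields, filtering and sorting on them is simple.
ohio = conn.execute(
    "SELECT LastName FROM Contacts WHERE State = ? ORDER BY Zip", ("OH",)
).fetchall()
print(ohio)
```

Had the whole address been stored in one field, the query on State would have required picking the address text apart first.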
Obviously this list can be greatly expanded. In fact there are both undergraduate and graduate
courses devoted to learning how to design, build and maintain databases. But the list should
give you some idea as to the types of things that need to be considered.
A SAMPLE DATABASE
We will use the sample database in Figure 16 to illustrate the concepts introduced in this
chapter. This database contains five tables. The table designs and sample data are discussed
below. After the database structures are examined, database relationships will be defined. This
will require identifying what fields can be used as primary keys and what foreign key
relationships exist in our database.
Figure 16 -Club Database – Database View of Tables
CLUB DATABASE TABLES:
The schema for the club database is as follows:
• Members(ID#, FirstName, LastName, PhoneNumber, City, State, Active, JoinDate)
This table is a membership list of all active and retired members. The ID# is a text data type
and has been identified as the primary key for this table. The Active field is a “Yes/No”
Boolean data type field (a yes indicates this is an active member). The JoinDate field is a
date data type field. All other fields in the Members table are Text.
Note that when entering a Boolean field in an Access table, the column is usually displayed
as a check box. In this case, a check indicates that this is an active member; no check indicates
a past or retired member. The actual values stored are the Boolean values. If an Access
table with a Boolean field is copied into an Excel spreadsheet, the Boolean values are
translated to True/False cell entries.
• Officers(ID#, PositionYear, Position)
A list of all past and present officers of the organization and the year and office held. ID# and
Position are type text; PositionYear (the year the office was held) is type Number.
• Dues(ID#, DuesPaid, DuesDate)
This table contains records of member transactions. The amount of dues paid (full or partial
payment) is entered in the DuesPaid field. Dues for the year are $100 for active members;
retired members are not charged dues. The ID# field is type text and the DuesPaid field is type
currency. A member may make any number of transactions.
• Donations(ID#, DonationType, DonationDate, EventType)
Additional contributions are recorded in this table. The ID# field is type text and
the Donation field is type currency. The EventType field records the avenue through which the
donation was collected: silent auction, raffle, etc. A member may make any number of
donations.
• Events(EventType, EventDescription)
The EventType and EventDescription fields are used to track the event at which the donation was
made.
Notice that the data types of the fields have been defined here. Later when setting up your
own databases, you will need to pay attention to the field types. As already discussed, field
names need not match to set up a Relationship, but field types must match.
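To make the point about field types concrete, here is a rough sketch of three of the club tables written with Python's sqlite3 module instead of Access. Several details are assumptions: the field is named ID because the # character would need special quoting, Yes/No values are stored as 1/0, dates are stored as text, and the type chosen for DuesDate is a guess since the chapter does not state it. The helper column_types is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (
        ID          TEXT PRIMARY KEY,   -- ID# in the chapter; a text field
        FirstName   TEXT,
        LastName    TEXT,
        PhoneNumber TEXT,
        City        TEXT,
        State       TEXT,
        Active      INTEGER,            -- Yes/No stored as 1/0 (assumption)
        JoinDate    TEXT                -- date stored as text (assumption)
    );
    CREATE TABLE Officers (
        ID           TEXT,              -- same type as Members.ID
        PositionYear INTEGER,
        Position     TEXT
    );
    CREATE TABLE Dues (
        ID       TEXT,                  -- same type as Members.ID
        DuesPaid NUMERIC,               -- currency amount
        DuesDate TEXT                   -- type not given in the chapter (assumption)
    );
""")


def column_types(table):
    """Return {field name: declared type} for one table."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2] for row in rows}


# The ID fields can be related because their declared types agree.
for table in ("Members", "Officers", "Dues"):
    print(table, column_types(table)["ID"])
```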
CLUB DATABASE RELATIONSHIPS:
So how can one determine from these tables whether Mr. Aston has paid his dues? Notice that
the Dues table identifies payments by ID# and not by name. The actual names of members
can only be found on the Members table, and from there it can be determined that Mr. Aston’s
ID# is 345. Can ID# on the Members table be related to records with the same ID# on the
Dues table? Before answering this question we need to understand the relational database
structure.
As we learned in the previous sections, in order to relate two tables a common/valid foreign key
must exist. Such a relationship requires the following:

The same information is being related. The id# 2557892 is not related to the
telephone# 2557892 just because they look the same (unless telephone number is actually
being used as the member ID#).

Fields must be of the same field type. The membership ID can be stored as either a Text field
or a Number field. When it is a Text field, the characters ‘645’ are stored as the 8-bit ASCII
codes for 6, then 4, then 5. When it is a Number field, the quantity 645 is saved as its
numerical binary equivalent. The computer does not recognize these two representations as
the same and cannot match them.

The field must be primary on at least one of the tables.
Also recall that the same field name in two tables does not imply a relationship exists between
them. Conversely, foreign keys can be set up between two fields that do not have the same
name as long as they meet the requirements of being related fields.
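The field type requirement is easier to see by looking at the stored bytes. The short Python sketch below is only an illustration: it assumes the number is held as a 2-byte integer, whereas the actual size of a Number field depends on the field size chosen in the DBMS.

```python
text_id = "645".encode("ascii")         # three bytes: the ASCII codes for '6', '4', '5'
number_id = (645).to_bytes(2, "big")    # the quantity 645 as a 2-byte binary integer (assumed size)

print(list(text_id))           # [54, 52, 53]
print(list(number_id))         # [2, 133]  since 2 * 256 + 133 = 645
print(text_id == number_id)    # False: the two stored forms do not match
print("645" == 645)            # False: Python also treats text and number as different
print(int("645") == 645)       # True only after the text is explicitly converted
```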
Now, keeping these rules in mind, consider the sample database:

The Members Table has an easily identifiable primary key which is the member ID#.
Looking at the Dues, Donations, and Officers tables it appears that this same identification is
being used on each of them (ID#). Since it is primary on
the Members table it can be used as a foreign key to set up the relationship from the
Members table to each of these tables. Can
the Dues table be directly related to the Officers table by this ID? No, since neither ID#
field is primary on its respective table.

The Dues Table does not contain a clear primary key field. If it is assumed that a member
may make more than one payment of the same amount, there is no field or field combination
that will uniquely identify a record on the Dues Table.

The Donations Table does not contain a clear primary key field. If it is assumed that a
member may make more than one donation of the same amount, there is no field or field
combination that will uniquely identify a record on the Donations Table.

The Officers Table also does not contain a primary key field.

The Events Table has a primary key which is the EventType. Looking at the Events Table you
can see that each donation is connected to a specific event, thus the EventType field on the
Events table is a primary key for that relationship, and the EventType field on the Donations
table is the foreign key in the relationship between the two tables.
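With the Members ID# identified as a primary key and the Dues ID# as its foreign key, the earlier question about Mr. Aston's dues can be answered by joining the two tables. The following is a sketch using Python's sqlite3 module rather than Access. Only Mr. Aston's ID# of 345 and the $100 annual dues come from the chapter; the second member, the payment amounts, the dates, and the function total_dues_paid are made up for illustration, and the field is named ID instead of ID#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (ID TEXT PRIMARY KEY, LastName TEXT, Active INTEGER);
    CREATE TABLE Dues (ID TEXT, DuesPaid NUMERIC, DuesDate TEXT);
    INSERT INTO Members VALUES ('345', 'Aston', 1), ('400', 'Baker', 1);
    INSERT INTO Dues VALUES ('345', 60, '2007-01-15'),
                            ('345', 40, '2007-03-02'),
                            ('400', 50, '2007-02-10');
""")


def total_dues_paid(last_name):
    """Add up all payments for the member with this last name (0 if none)."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(d.DuesPaid), 0)
        FROM Members AS m
        LEFT JOIN Dues AS d ON d.ID = m.ID   -- match payments to the member by ID
        WHERE m.LastName = ?
        """,
        (last_name,),
    ).fetchone()
    return row[0]


print(total_dues_paid("Aston") >= 100)   # has Mr. Aston paid the full $100?
print(total_dues_paid("Baker") >= 100)
```

The name lives only on the Members table and the payments only on the Dues table; the shared ID value is what lets the two be combined.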
The resulting Relationships are diagrammed in Figure 17.
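Figure 17 (relationship diagram; each line below reads from the "one" side to the "many" side)
Members (Primary Key: ID#) 1 to ∞ Dues (Primary Key: none). Foreign Key: ID# on Dues Table
Members (Primary Key: ID#) 1 to ∞ Officers (Primary Key: none). Foreign Key: ID# on Officers Table
Members (Primary Key: ID#) 1 to ∞ Donations (Primary Key: none). Foreign Key: ID# on Donations Table
Events (Primary Key: EventType) 1 to ∞ Donations (Primary Key: none). Foreign Key: EventType on Donations Table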
The resulting Relationships as shown in MS Access (Figure 18)
Figure 18
ENFORCING REFERENTIAL DATA INTEGRITY
A new transaction needs to be recorded on the Officers table as follows:
ID#     Year    Position
999     2007    President
This new transaction contains the ID# 999. Notice that there is no member ID# 999 listed on
the Members table. Is this a good idea? Certainly it makes sense that not all members will serve
as officers, but can the organization have a president who is not listed as a member? If the
membership list contains all members, where ID# is the primary key, it makes sense that related
tables, such as the Officers table, only contain member ID# numbers for persons listed on the
Members table. This concept of limiting entries made in a foreign key field to those items listed
in the primary key field of the related table is known as Referential Data Integrity.
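The same idea can be tried outside of Access. The sketch below uses Python's sqlite3 module, where the rule is declared with a REFERENCES clause and is only checked once foreign key support is switched on for the connection. The field name ID (for ID#), the member and officer rows other than the 999 President record, and the helper add_officer are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when this is on
conn.executescript("""
    CREATE TABLE Members (ID TEXT PRIMARY KEY, LastName TEXT);
    CREATE TABLE Officers (
        ID           TEXT REFERENCES Members(ID),   -- must match an existing member
        PositionYear INTEGER,
        Position     TEXT
    );
    INSERT INTO Members VALUES ('345', 'Aston');
""")


def add_officer(member_id, year, position):
    """Try to add an officer record; report whether the database accepted it."""
    try:
        with conn:   # commit on success, roll back on error
            conn.execute(
                "INSERT INTO Officers VALUES (?, ?, ?)", (member_id, year, position)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_officer("345", 2006, "Treasurer"))   # True: member 345 exists
print(add_officer("999", 2007, "President"))   # False: there is no member 999
```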
In Access you can choose to physically enforce Referential Data Integrity in order to avoid any
unwanted problems. This feature will prompt the user with an error message if they attempt to
add a record that violates this property. Not enforcing this property does not automatically
mean that the database violates this rule; just that there is no mechanism in place to ensure it is
not violated. Referential Data Integrity can be enforced from the Relationships window, either
when initially setting up a foreign key relationship or by modifying the relationship line.
The steps are as follows:
• Open the Relationships window using the Relationships button on the Database Tools tab.
• If necessary, add the tables and create the relationships. If adding new relationships the
Edit Relationships window will automatically open. If editing existing relationships either
double click on the relationship line (this can be tricky) or right-click and select Edit
Relationships as seen in Figure 19.
• Click on the check box for Enforce Referential Data Integrity.
• Click Create to save the changes.
Figure 19
Enforcing Referential Data Integrity is specific to a relationship. Enforcing this property for the
relationship between Officers and Members does not automatically enforce it for the
relationship between Dues and Members. Each one must be set independently.
Once Referential Data Integrity is enforced there will be two other options that can be selected:
Cascade Update Related Fields and Cascade Delete Related Records. Cascade
Update Related Fields allows for automatic updating of related database table records if the
primary key field value were to change for a specific record. In other words if member #002 is
assigned a new number, say #0200, and this property is enforced for the Members–Officers
relationship, all Officer records for member #002 will be updated to #0200. Similarly, Cascade
Delete Related Records would delete related records if the corresponding record were deleted on
the primary table. So if member #020 is deleted from the Members table and this property is
enforced for the Members-Officers relationship, all Officer records for member #020 will be
deleted.
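Both cascade options have close counterparts in other relational systems. As a rough sketch in Python's sqlite3 module (not Access), the clauses ON UPDATE CASCADE and ON DELETE CASCADE play the roles of Cascade Update Related Fields and Cascade Delete Related Records. The member numbers follow the examples above; the names, years, positions, and the helper officer_ids are invented, and the field is named ID instead of ID#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # required for SQLite to act on foreign keys
conn.executescript("""
    CREATE TABLE Members (ID TEXT PRIMARY KEY, LastName TEXT);
    CREATE TABLE Officers (
        ID           TEXT REFERENCES Members(ID)
                          ON UPDATE CASCADE    -- follow changes to the member's ID
                          ON DELETE CASCADE,   -- remove officer records with the member
        PositionYear INTEGER,
        Position     TEXT
    );
    INSERT INTO Members VALUES ('002', 'Member A'), ('020', 'Member B');
    INSERT INTO Officers VALUES ('002', 2005, 'Secretary'), ('020', 2006, 'Treasurer');
""")


def officer_ids():
    """Return the member IDs currently present on the Officers table."""
    return [row[0] for row in conn.execute("SELECT ID FROM Officers ORDER BY ID")]


conn.execute("UPDATE Members SET ID = '0200' WHERE ID = '002'")
after_update = officer_ids()
print(after_update)    # ['020', '0200']: the officer record followed the new number

conn.execute("DELETE FROM Members WHERE ID = '020'")
after_delete = officer_ids()
print(after_delete)    # ['0200']: the officer record for member 020 is gone
```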
Based on the data shown in the Officers table in Figure 16, do all of the ID# numbers appearing
on the Officers table also appear on the Members table? No, the Officers table contains an entry
for Member ID# 009 while no member 009 appears on the Members table. Hence, Referential
Data Integrity has definitely not been enforced for this relationship. Similarly, consider the
relationship between the Members table and the Dues table. Here all member ID# numbers
contained on the Dues table also appear on the related table; Referential Data Integrity has
not been violated. To know if Referential Data Integrity is enforced, one would need to check
the relationship properties in the Relationships window.
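Checking whether existing data violates the rule, as opposed to whether the rule is enforced, can be done with a query that looks for foreign key values with no matching primary key. The sketch below again uses Python's sqlite3 module in place of Access. Apart from the unmatched ID 009, the rows are hypothetical, the field is named ID instead of ID#, and unmatched_ids is an illustrative helper. No foreign key is declared, which mirrors a relationship that is not enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (ID TEXT PRIMARY KEY, LastName TEXT);
    CREATE TABLE Officers (ID TEXT, PositionYear INTEGER, Position TEXT);
    CREATE TABLE Dues (ID TEXT, DuesPaid NUMERIC, DuesDate TEXT);
    INSERT INTO Members VALUES ('002', 'Member A'), ('020', 'Member B');
    INSERT INTO Officers VALUES ('002', 2005, 'Secretary'), ('009', 2006, 'President');
    INSERT INTO Dues VALUES ('002', 100, '2007-01-15'), ('020', 50, '2007-02-10');
""")


def unmatched_ids(child_table):
    """List ID values in child_table that have no matching Members record."""
    query = f"""
        SELECT DISTINCT c.ID
        FROM {child_table} AS c
        LEFT JOIN Members AS m ON m.ID = c.ID
        WHERE m.ID IS NULL          -- no member was found for this ID
        ORDER BY c.ID
    """
    return [row[0] for row in conn.execute(query)]


print(unmatched_ids("Officers"))   # ['009']: integrity is violated
print(unmatched_ids("Dues"))       # []: no violations, though nothing enforces the rule
```

An empty result only says the current data happens to be consistent; it does not show that the relationship is enforced.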