Chap 4

CHAPTER 4
DATABASES AND DATA
WAREHOUSES
A Gold Mine of Information
4-2
Introduction
Today, Organizations Need...
• Information to compete effectively
• Information just to stay alive in the information age
• Information organized in such a way that you can easily and quickly get to it
• Information-processing tools that help you work with information

4-3
Introduction
YOUR FOCUS IN THIS CHAPTER
• The Difference Between Logical and Physical Views of Information
• Databases and Database Management Systems
• How You Can Develop Database Applications
• Data Warehouses and Data Mining Tools

4-4
Information Revisited
THREE THINGS ORGANIZATIONS
DO WITH INFORMATION
1. Process information in the form of transactions
2. Use information to make a decision
3. Manage information while it’s used
4-5
Information
Revisited
PROCESSING INFORMATION IN
THE FORM OF TRANSACTIONS
• Such as payroll processing, order processing, and handling your registration requests for classes.
• This is called ONLINE TRANSACTION PROCESSING (OLTP) - the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information.
• Operational databases support OLTP.
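The three parts of OLTP (gather input, process it, update existing information) can be sketched with a class registration request. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and the register function are assumptions made for the example, not part of any real registration system.

```python
import sqlite3

# Hypothetical operational database; all names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (class_id TEXT PRIMARY KEY, seats_left INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE registration (student_id INTEGER, class_id TEXT, "
    "PRIMARY KEY (student_id, class_id))"
)
conn.execute("INSERT INTO class VALUES ('STAT101', 1)")
conn.commit()


def register(student_id, class_id):
    """Gather the request, process it, and update stored information as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            row = conn.execute(
                "SELECT seats_left FROM class WHERE class_id = ?", (class_id,)
            ).fetchone()
            if row is None or row[0] <= 0:
                raise ValueError("class is full or does not exist")
            conn.execute("INSERT INTO registration VALUES (?, ?)", (student_id, class_id))
            conn.execute(
                "UPDATE class SET seats_left = seats_left - 1 WHERE class_id = ?",
                (class_id,),
            )
        return True
    except (ValueError, sqlite3.IntegrityError):
        return False


first = register(1001, "STAT101")   # takes the only seat
second = register(1002, "STAT101")  # refused: no seats left
```

Wrapping the insert and the update in one transaction keeps the seat count and the registration list consistent with each other.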

4-6
Information
Revisited
USING INFORMATION TO MAKE
A DECISION
• For answering such questions as, “How many senior-level marketing majors have not taken statistics?”
• This is called ONLINE ANALYTICAL PROCESSING (OLAP) - the manipulation of information to support decision making.
• Data warehouses support OLAP.
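The sample question above can be written as a query. The sketch below is only an illustration: a small in-memory SQLite database stands in for a data warehouse, and the student and course_taken tables, their columns, and the sample rows are all assumptions.

```python
import sqlite3

# A tiny stand-in for analytical data; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, major TEXT, level TEXT);
CREATE TABLE course_taken (student_id INTEGER, course TEXT);
INSERT INTO student VALUES
    (1, 'Marketing', 'Senior'),
    (2, 'Marketing', 'Senior'),
    (3, 'Finance',   'Senior'),
    (4, 'Marketing', 'Junior');
INSERT INTO course_taken VALUES (1, 'Statistics'), (3, 'Statistics');
""")

# "How many senior-level marketing majors have not taken statistics?"
query = """
SELECT COUNT(*)
FROM student AS s
WHERE s.major = 'Marketing'
  AND s.level = 'Senior'
  AND NOT EXISTS (
      SELECT 1 FROM course_taken AS c
      WHERE c.student_id = s.student_id AND c.course = 'Statistics'
  )
"""
count = conn.execute(query).fetchone()[0]
print(count)  # 1 (only student 2 qualifies)
```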

4-7
Information Revisited
MANAGING INFORMATION
WHILE IT’S USED
• Determining who can view or use information
• Specifying how to back up information
• Identifying what storage technologies to use

Most importantly, managing information includes
organizing it so that people can logically use it
without having to know anything about its
physical structure. The difference between
logical and physical is key.
4-8
Information
Revisited
In managing information:
• PHYSICAL deals with the structure of information as it resides on various storage media.
• LOGICAL deals with how knowledge workers view their information needs, and includes such terms as:
– CHARACTER - our smallest unit of information.
– FIELD - group of related characters.
– RECORD - group of related fields.
– FILE - group of related records.
– DATABASE - group of logically associated files.
– DATA WAREHOUSE - information from many databases.
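The logical hierarchy can be pictured with ordinary Python values, each level built from the one before it. The part and facility values below are hypothetical and chosen only to illustrate the nesting.

```python
# Hypothetical sample values; only the nesting matters here.
character = "W"                      # smallest unit of information
field = "Widget"                     # group of related characters
record = {                           # group of related fields
    "part_number": 1003,
    "part_name": field,
    "cost": 12.50,
}
part_file = [                        # group of related records
    record,
    {"part_number": 1005, "part_name": "Gear", "cost": 8.75},
]
facility_file = [{"facility_number": 291, "manager": "Manager A"}]
database = {                         # group of logically associated files
    "Part": part_file,
    "Facility": facility_file,
}
```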
4-9
Databases
DATABASE
a collection of information that you organize and
access according to the logical structure of that
information.
A database is actually composed of two parts:
1. the information itself
– the files that are logically associated
2. the logical structure of the information
– called the data dictionary
4-10
Databases
A Database Is a Collection of
Information
• Most databases contain two or more files with related information.
• The Inventory database (Figure 4.4, page 125) contains two files - Part and Facility.
• These two files are logically related because parts are stored in facilities and because you would use both of these files to manage your inventory.

4-11
Databases
A Database Contains a Logical
Structure
• You organize and access a database by its logical structure, not its physical position.
• DATA DICTIONARY - contains the logical structure of information in a database.
• The data dictionary contains the logical properties that describe information in a database.
• See Figure 4.5 (page 126) for the data dictionary of the Percentage Markup field in the Inventory database.
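Most DBMSs let you read the data dictionary back. As a rough sketch, SQLite exposes the logical properties of each field through PRAGMA table_info. The part table below and the properties chosen for percentage_markup are assumptions for illustration and are not taken from Figure 4.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical definition of the Part file; the field properties are assumptions.
conn.execute("""
CREATE TABLE part (
    part_number       INTEGER PRIMARY KEY,
    part_name         TEXT NOT NULL,
    percentage_markup REAL NOT NULL DEFAULT 0 CHECK (percentage_markup >= 0)
)
""")

# PRAGMA table_info returns one row per field:
# (cid, name, type, notnull, dflt_value, pk)
dictionary = {
    row[1]: {
        "type": row[2],
        "required": bool(row[3]),
        "default": row[4],
        "primary_key": bool(row[5]),
    }
    for row in conn.execute("PRAGMA table_info(part)")
}

for name, properties in dictionary.items():
    print(name, properties)
```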

4-12
Databases
A Database Has Logical Ties
Among the Information
• A PRIMARY KEY is a field in a database file that uniquely describes each record.
• A FOREIGN KEY is a primary key of one file that also appears in another file. So, foreign keys specify how files are logically related.
• For example, the Part and Facility files are logically related. So, in Figure 4.4 you can see that Facility Number (the primary key for the Facility file) exists in the Part file (where it’s a foreign key).
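The Part and Facility relationship might be declared and used as follows. This is a sketch with SQLite; the column names, manager names, and sample rows are hypothetical rather than copied from Figure 4.4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facility (
    facility_number INTEGER PRIMARY KEY,      -- primary key of Facility
    manager_name    TEXT
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,      -- primary key of Part
    part_name       TEXT,
    facility_number INTEGER REFERENCES facility (facility_number)  -- foreign key
);
-- Hypothetical sample rows
INSERT INTO facility VALUES (291, 'Manager A'), (378, 'Manager B');
INSERT INTO part VALUES (1003, 'Widget', 291), (1005, 'Gear', 378), (1083, 'Bolt', 291);
""")

# The foreign key is what lets you combine the two files in one answer.
rows = conn.execute("""
SELECT p.part_name, f.manager_name
FROM part AS p
JOIN facility AS f ON f.facility_number = p.facility_number
ORDER BY p.part_number
""").fetchall()
print(rows)  # [('Widget', 'Manager A'), ('Gear', 'Manager B'), ('Bolt', 'Manager A')]
```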

4-13
Databases
A Database Contains Built-in
Integrity Constraints
• An INTEGRITY CONSTRAINT is a rule that helps assure the quality of the information in a database.
• A registration database at your school includes integrity constraints concerning prerequisites for certain classes.
• Our Inventory database includes an integrity constraint that says a part in the Part file cannot be assigned to a facility that does not exist in the Facility file.
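The Inventory constraint can be seen in action with a short sketch. It assumes SQLite, which enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection; the table layout and numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE facility (facility_number INTEGER PRIMARY KEY)")
conn.execute("""
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,
    facility_number INTEGER NOT NULL REFERENCES facility (facility_number)
)
""")
conn.execute("INSERT INTO facility VALUES (291)")
conn.execute("INSERT INTO part VALUES (1003, 291)")  # facility 291 exists: accepted

try:
    conn.execute("INSERT INTO part VALUES (1005, 999)")  # facility 999 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

The DBMS, not the person entering the data, is what refuses the bad record.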

4-14
Database Management Systems
DATABASE MANAGEMENT
SYSTEM (DBMS)
the software you use to specify the logical
organization for a database and access it.
A DBMS contains 5 software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem
4-15
DBMSs
DBMS ENGINE
accepts logical requests from the various other
DBMS subsystems, converts them to their
physical equivalent, and actually accesses the
database and data dictionary as they exist on a
storage device.
Recall that:
• PHYSICAL VIEW deals with how information is physically arranged, stored, and accessed on some type of secondary storage device.
• LOGICAL VIEW focuses on how you need to arrange and access information to meet your particular business needs.
4-16
DBMSs
DATA DEFINITION SUBSYSTEM
helps you create and maintain the data dictionary and define the structure of the files in a database.
• You use this subsystem to define the logical structure of the information when you first create a database.
• Once you’ve created a database, you use this subsystem to define new fields, delete fields, or change field properties.
• Figure 4.5 (page 126) contains this subsystem screen for the Part file.
4-17
DBMSs
DATA MANIPULATION SUBSYSTEM
helps you add, change, and delete information in a database and mine it for valuable information.
• This subsystem is most often the primary interface between you as a user and the information contained in a database.
• Tools in this subsystem include views, report generators, query-by-example tools, and structured query language.

4-18
DBMSs
DATA MANIPULATION TOOLS
• VIEW - allows you to see the content of a database file, make whatever changes you want, perform simple sorting, and query to find the location of specific information. See Figure 4.7 (page 129).
• REPORT GENERATOR - helps you quickly define formats of reports and what information you want to see in a report. See Figures 4.8 and 4.9 (page 130).

4-19
DBMSs
DATA MANIPULATION TOOLS
• QUERY-BY-EXAMPLE (QBE) TOOL - helps you graphically design the answer to a question. Figure 4.10 (page 130) shows the QBE for displaying the names and phone numbers of facility managers in charge of parts that cost more than $10.
• STRUCTURED QUERY LANGUAGE (SQL) - a standardized fourth-generation language found in most database environments. SQL performs the same function as QBE, except that you perform a query by writing a statement instead of pointing, clicking, and dragging.
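The QBE question above (names and phone numbers of facility managers in charge of parts that cost more than $10) could be written in SQL roughly as follows. This is a sketch run through Python's sqlite3 module; the column names and sample rows are assumptions, since the actual layout is in Figure 4.4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical layout and rows for the Inventory database
CREATE TABLE facility (
    facility_number INTEGER PRIMARY KEY,
    manager_name    TEXT,
    phone           TEXT
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,
    cost            REAL,
    facility_number INTEGER REFERENCES facility (facility_number)
);
INSERT INTO facility VALUES (291, 'Manager A', '555-0101'), (378, 'Manager B', '555-0102');
INSERT INTO part VALUES (1003, 12.50, 291), (1005, 8.75, 378), (1083, 20.00, 291);
""")

sql = """
SELECT DISTINCT f.manager_name, f.phone
FROM facility AS f
JOIN part AS p ON p.facility_number = f.facility_number
WHERE p.cost > 10
"""
managers = conn.execute(sql).fetchall()
print(managers)  # [('Manager A', '555-0101')]
```

DISTINCT keeps a manager from being listed once for every qualifying part.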

4-20
DBMSs
APPLICATION GENERATION
SUBSYSTEM
contains facilities to help you develop transaction-intensive applications. This subsystem includes:
• Tools for creating data entry screens (see Figure 4.12, page 131, for an example)
• Programming languages specific to a particular DBMS
• Interfaces to commonly used programming languages that are independent of any DBMS.
4-21
DBMSs
DATA ADMINISTRATION
SUBSYSTEM
helps you manage the overall database environment by providing facilities for:
• Backup and recovery
• Security management
• Query optimization
• Reorganization
• Concurrency control
• Change management
4-22
Database Models
THE RELATIONAL DATABASE
MODEL
a database model that uses a series of two-dimensional tables or files to store information.
• This is the most popular model.
• Each table is called a RELATION.
• A relation contains information about a particular ENTITY CLASS (a concept - people, places, or things - about which you wish to store information and that you can identify with a unique key).

4-23
Database Models
• Figure 4.14 (page 136) shows a relational database for a video rental store.
• The entity classes are Customer, Video, Video Rental, and Distributor.
• Notice how these tables are related to each other through the use of foreign keys.
• In the Video Rental relation, you’ll find a primary key that uses more than one field to create a unique description. This is called a COMPOSITE PRIMARY KEY.
• A primary key that uses only one field is called an ATOMIC PRIMARY KEY.
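A composite primary key can be declared by listing several fields in one PRIMARY KEY clause. The sketch below assumes a Video Rental relation keyed on customer, video, and rental date; the actual fields in Figure 4.14 may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Video Rental relation; the three key fields are an assumption.
conn.execute("""
CREATE TABLE video_rental (
    customer_id INTEGER NOT NULL,
    video_id    INTEGER NOT NULL,
    date_rented TEXT    NOT NULL,
    PRIMARY KEY (customer_id, video_id, date_rented)  -- composite primary key
)
""")
conn.execute("INSERT INTO video_rental VALUES (1, 500, '2024-01-05')")
# Same customer and date, different video: the combination is still unique.
conn.execute("INSERT INTO video_rental VALUES (1, 501, '2024-01-05')")

try:
    # Exactly the same combination again: rejected as a duplicate key.
    conn.execute("INSERT INTO video_rental VALUES (1, 500, '2024-01-05')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

No single field has to be unique on its own; only the combination does.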

4-24
Database Models
THE OBJECT-ORIENTED (O-O)
DATABASE MODEL
a database model that brings together, stores,
and allows you to work with both information and
procedures that act on the information.

An OBJECT-ORIENTED DATABASE
MANAGEMENT SYSTEM (O-O DBMS) is the
DBMS software that allows you to develop and
work with an O-O database.
4-25
Database Models
• This model takes advantage of the concept of an OBJECT - a software module containing information that describes an entity class along with a list of procedures that can act on the information describing the entity class.
• Figure 4.15 (page 138) shows the same video rental store using the O-O database model.
• Notice that the objects (entity classes) - which include Customer, Video Rental, Video, and Distributor - contain both information and procedures for working with that information.
• See Appendix C for more on objects.
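The idea of an object can be sketched as a Python class that bundles information with the procedures that act on it. This is not an O-O DBMS, only an illustration of the concept; the Customer fields and methods are hypothetical and not taken from Figure 4.15.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    """An object: information about the entity class plus procedures that act on it."""

    # Information describing the entity class (hypothetical fields)
    customer_id: int
    name: str
    rentals: list = field(default_factory=list)

    # Procedures that act on that information (hypothetical methods)
    def rent(self, video_title: str) -> None:
        self.rentals.append(video_title)

    def rental_count(self) -> int:
        return len(self.rentals)


customer = Customer(customer_id=1, name="Pat")
customer.rent("Some Title")
print(customer.rental_count())  # 1
```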

4-26
Developing Databases
DEVELOPING YOUR OWN
DATABASE
• Being able to develop your own database is a part of knowledge worker computing.
• Building a database for your personal needs includes the following 4 steps:

1. Defining entity classes and primary keys
2. Defining relationships among entity classes
3. Defining information (fields) for each relation
4. Using a data definition language to create the database

Follow along as we build the database to
support the report in Figure 4.16 on page 140.
4-27
Developing Databases
#1 - DEFINING ENTITY CLASSES
AND PRIMARY KEYS
• From the report in Figure 4.16, you can identify the entity classes as Employee, Department, and Job.
• Now, for each entity class, you must define a primary key that provides a unique description. These include:

– Employee entity class - Emp ID (e.g., 2345 for Smith)
– Department entity class - Dept (e.g., 15)
– Job entity class - Job (e.g., 14 for Acct)
4-28
Developing Databases
#2 - DEFINING RELATIONSHIPS
AMONG ENTITY CLASSES

For this step, use an ENTITY-RELATIONSHIP
(E-R) DIAGRAM, a graphical method of
representing entity classes and their
relationships.

See Figure 4.17 (page 140) for the initial E-R
diagram of our database and a listing of E-R
diagram symbols.
4-29
Developing Databases
E-R diagram: EMPLOYEE M:1 DEPARTMENT
• An Employee must be assigned to a Department.
• An Employee cannot be assigned to more than one Department.
• A Department may have many Employees assigned to it.
• A Department is not required to have any Employees assigned to it.

4-30
Developing Databases
• After building the initial E-R diagram, you must follow the process of normalization.
• NORMALIZATION is a process of assuring that a relational database structure can be implemented as a series of two-dimensional tables.
• Normalization includes the following 3 steps:

1. Eliminate repeating groups or M:M relationships
2. Assure that each field in a relation depends only on the primary key of that relation
3. Remove all derived fields from the relations.
4-31
Developing Databases
• The first rule of normalization states that no M:M relationships can exist.
• There is an M:M between Employee and Job.
• You eliminate this by creating an INTERSECTION RELATION - a relation you create to eliminate a repeating group.
• An intersection relation will have a composite primary key that consists of the primary key fields from the two intersecting relations.
• In Figure 4.18 (page 142), we created an intersection relation called Employee-Job to eliminate the M:M relationship.
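The Employee-Job intersection relation might look like this in SQLite. Emp ID 2345 (Smith) and Job 14 (Acct) come from the example above; the second employee, the second job, and the column names are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE job      (job    INTEGER PRIMARY KEY, title TEXT);

-- Intersection relation: its composite primary key is made of the
-- primary keys of the two relations it connects.
CREATE TABLE employee_job (
    emp_id INTEGER REFERENCES employee (emp_id),
    job    INTEGER REFERENCES job (job),
    PRIMARY KEY (emp_id, job)
);

INSERT INTO employee VALUES (2345, 'Smith'), (6548, 'Jones');   -- Jones is hypothetical
INSERT INTO job      VALUES (14, 'Acct'), (16, 'Sales');        -- Sales is hypothetical
INSERT INTO employee_job VALUES (2345, 14), (2345, 16), (6548, 14);
""")

# One employee can hold many jobs ...
jobs_for_smith = [row[0] for row in conn.execute("""
    SELECT j.title FROM employee_job AS ej
    JOIN job AS j ON j.job = ej.job
    WHERE ej.emp_id = 2345 ORDER BY j.job
""")]

# ... and one job can be held by many employees.
employees_in_acct = [row[0] for row in conn.execute("""
    SELECT e.name FROM employee_job AS ej
    JOIN employee AS e ON e.emp_id = ej.emp_id
    WHERE ej.job = 14 ORDER BY e.emp_id
""")]

print(jobs_for_smith)     # ['Acct', 'Sales']
print(employees_in_acct)  # ['Smith', 'Jones']
```

Each M:M pairing becomes one row in the intersection relation, so every table stays two-dimensional.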

4-32
Developing Databases
#3 - DEFINING INFORMATION
(FIELDS) FOR EACH RELATION
• In this step, you follow rules #2 and #3 of normalization.
• Your goal here is two-fold:

1. Make sure that the information in each relation is indeed in the correct relation
2. Make sure that the information cannot be derived from other information.
4-33
Developing Databases
• To determine if information is in the correct relation, ask: “Does this piece of information depend only on the primary key for this relation?”
• If the answer is yes, the information is in the correct relation.
• In the Employee relation (Figure 4.20, page 144), we currently store Dept Sup. Does Dept Sup depend on Emp ID?
• The answer is no - Dept Sup depends on Dept, so it should be in the Department relation.

4-34
Developing Databases
• Derived information - information that can be mathematically determined from other information - should not be stored in your database.
• For example, # Emp is a field in the Department relation.
• However, we can simply count the number of occurrences of each Dept in the Employee relation and determine the number of employees.
• So, we remove # Emp from the database.
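Counting the occurrences of each Dept is a one-line query, which is why # Emp does not need to be stored. A minimal sketch with SQLite follows; apart from Emp ID 2345 (Smith) and Dept 15, the rows and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept INTEGER);
-- Hypothetical rows, apart from Emp ID 2345 (Smith)
INSERT INTO employee VALUES
    (2345, 'Smith',      15),
    (5468, 'Employee B', 15),
    (6548, 'Employee C', 10);
""")

# Derive the number of employees per department on demand instead of storing it.
employees_per_dept = dict(
    conn.execute("SELECT dept, COUNT(*) FROM employee GROUP BY dept")
)
print(employees_per_dept)  # {10: 1, 15: 2}
```

Because the count is computed when asked for, it cannot drift out of date when employees are added or removed.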

4-35
Developing Databases
#4 - USING A DATA DEFINITION
LANGUAGE TO CREATE THE
DATABASE
• The final step is to actually create the relations you identified in steps 1-3.
• You do this with a data definition language.
• This step includes:

– Developing a data dictionary
– Defining the various relations
– Defining primary keys and relationships
4-36
Data Warehouses
DATA WAREHOUSE
a logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks. Data warehouses...
• are a logical extension of databases
• support OLAP
• are among the newest and hottest buzzwords and concepts in the IT field.

4-37
Data Warehouses
DATA WAREHOUSE FEATURES

Data warehouses combine information from
different databases
– Making them a true repository of all an organization’s
information

Data warehouses are multi-dimensional
– As opposed to 2 dimensions in the relational model
– Often called hypercubes (See Figure 4.23 page 148)

Data warehouses support decision making
– While databases support OLTP, data warehouses support
OLAP
4-38
Data Warehouses
DATA MINING TOOLS
the software tools you use to query information in a data warehouse.
• QUERY-AND-REPORTING TOOLS - QBE tools, SQL, and report generators.
• INTELLIGENT AGENTS - various artificial intelligence tools that form the basis for “information discovery” in OLAP.
• MULTIDIMENSIONAL ANALYSIS (MDA) TOOLS - slice-and-dice techniques that allow you to view multidimensional information from different perspectives.
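Slice-and-dice can be sketched in a few lines of plain Python. The hypercube below is a dictionary keyed by (product, region, quarter); the dimensions, the sales figures, and the slice_cube and roll_up helper names are all assumptions made for this illustration, not features of any particular MDA tool.

```python
from collections import defaultdict

# Each cell of the hypothetical hypercube is keyed by (product, region, quarter).
DIMENSIONS = ("product", "region", "quarter")
cube = {
    ("Widget", "North", "Q1"): 120,
    ("Widget", "South", "Q1"): 80,
    ("Gear",   "North", "Q1"): 50,
    ("Widget", "North", "Q2"): 150,
    ("Gear",   "South", "Q2"): 70,
}


def slice_cube(cube, **fixed):
    """Keep only the cells whose coordinates match every fixed dimension value."""
    return {
        key: value
        for key, value in cube.items()
        if all(key[DIMENSIONS.index(dim)] == wanted for dim, wanted in fixed.items())
    }


def roll_up(cube, dimension):
    """Total the measure along one dimension, collapsing the others."""
    position = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[position]] += value
    return dict(totals)


north_sales = slice_cube(cube, region="North")        # one slice of the cube
sales_by_quarter = roll_up(cube, "quarter")           # {'Q1': 250, 'Q2': 220}
north_by_product = roll_up(north_sales, "product")    # {'Widget': 270, 'Gear': 50}
```

The same cells answer different questions depending on which dimension you fix and which you total.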
4-39
Data Warehouses
IMPORTANT CONSIDERATIONS
IN USING A DATA WAREHOUSE
• Do you need a data warehouse?
• Do you already have a data warehouse?
• Who will the users be?
• How up-to-date must the information be?
• What data mining tools do you need?

4-40
Managing Information
MANAGING THE INFORMATION
RESOURCE
• How will changes in technology affect organizing and managing information?
• What types of database models and databases are most appropriate?
• Who should oversee the organization’s information?

4-41
Managing Information
OVERSEEING YOUR
ORGANIZATION’S INFORMATION
• CHIEF INFORMATION OFFICER (CIO) is the IT manager who directs all IT systems and personnel while communicating directly with the highest levels of the organization.
• DATA ADMINISTRATION plans for, oversees the development of, and monitors the information resource.
• DATABASE ADMINISTRATION is responsible for the more technical and operational aspects of managing information in databases.

4-42
Managing Information
MANAGING THE INFORMATION
RESOURCE
• Is information ownership a consideration?
• What are the ethics involved in organizing and managing information?
• How should databases and database applications be developed and maintained?

4-43
TO SUMMARIZE

• How we view information:
– The physical view of information deals with how information is physically arranged, stored, and accessed on some type of secondary storage device.
– The logical view of information focuses on how you need to arrange and access information to meet your particular business needs.
• A database is a collection of information that you organize and access according to the logical structure of that information.
• The data dictionary contains the logical structure of information in a database.

4-44
TO SUMMARIZE
• A database management system is the software you use to specify the logical organization for a database and access it.
• Popular database models include the relational model and the object-oriented model.
• The four steps of developing a personal database application include:

1. Define entity classes and primary keys
2. Define relationships among entity classes
3. Define information (fields) for each relation
4. Use a data definition language to create the database
4-45
TO SUMMARIZE

• A data warehouse is a logical collection of information - gathered from many different operational databases - that supports business analysis activities and decision-making tasks.
• Data mining tools - the software tools you use to query information in a data warehouse - include query-and-reporting tools, intelligent agents, and multidimensional analysis (MDA) tools.