chapter09 lecture notes - Carteret Community College

advertisement
Concepts of Database Management, Fifth Edition
9-1
Chapter 9
Database Management Approaches
At a Glance
Table of Contents

Overview

Objectives

Instructor Notes

Quick Quizzes

Key Terms
Lecture Notes
Overview
In this chapter, students examine several database management topics, most of which are applicable to relational
systems. Students learn about the issues involved in distributed processing and distributed databases. They study
client/server systems and data warehouses. Students examine object-oriented systems, which treat data as objects
and include the actions that operate on the objects. They also study the impact of the Web on database access.
Finally, students investigate the history of DBMSs and the network and hierarchical data models.
Chapter Objectives

Describe distributed database management systems (DBMSs).

Discuss client/server systems.

Define data warehouses and explain their structure and access.

Discuss the general concepts of object-oriented DBMSs.

Summarize the impact of Web access to databases.

Provide a brief history of database management.

Describe the network and hierarchical data models.
Instructor Notes
Distributed Databases
A distributed database is a single logical database that is physically divided among computers at several sites on a
network. A distributed database management system (DDBMS) is a DBMS capable of supporting and manipulating
distributed databases. Use Figure 9.1 to explain distributed database and distributed database management system
(DDBMS).
Computers in a network communicate through messages. Accessing data using messages over a network is
substantially slower than accessing data on a disk. In general, to access data rapidly in a distributed database, you
Concepts of Database Management, Fifth Edition
9-2
must attempt to minimize the numbers of messages. It is usually preferable to send a small number of lengthy
messages rather than a large number of short messages. Use the formula for message transmission time and the
illustration on page 288 to explain the messaging concept.
Characteristics of Distributed DBMSs
A Distributed Database Management System (DDBMS) can be homogeneous (same local DBMS at each site) or
heterogeneous (different local DBMSs). Heterogeneous DBMSs are more complex and more difficult to manage.
All DDBMSs share the characteristics of location transparency, replication transparency, and fragmentation
transparency.
Location Transparency
Location transparency is the characteristic that states that users do not need to be aware of the location of data in a
distributed database.
Replication Transparency
Replication lets users at different sites use and update copies of a database and then share their updates with other
users. Replication transparency refers to the characteristic that a DDBMS should update various copies of data
behind the scenes; users should be unaware of the steps.
Fragmentation Transparency
A DDBMS supports data fragmentation is the DDBMS can divide and manage a logical object, such as records in a
table, among the various locations under its control. If users are unaware of fragmentation, the DDBMS has
fragmentation transparency. Use Figures 9.2 and 9.3 to illustrate fragmentation transparency.
Advantages of Distributed Databases
When compared with a single centralized database, distributed databases offer the following advantages:

Local control of data
When each location retains it own data, it can exercise greater control.

Increasing database capacity
If the size of the disk at a single site becomes inadequate for its
database, only the disk capacity at that site needs to be increased. To
increase the capacity of the entire database, add a new site.

System availability
If one of the local databases in a distributed database becomes
unavailable, only users who need data in that particular database are
affected. Also, it the data has been replicated, potentially all users can
continue processing.

Added efficiency
When data is available locally, you eliminate network communication
delays and can retrieve data faster than with a remote centralized
database.
Disadvantages of Distributed Databases
Distributed databases have the following disadvantages:

Update of replicated data
Extra time is needed to update all the copies at various sites.

More complex query
processing
Complexity occurs due to the difference between the time it takes to send
messages between sites and the time it takes to access a disk.

More complex treatment of
concurrent update
Concurrent update in a distributed database is treated in basically the
same way as it is treated in nondistributed databases. There is, however,
an additional level of complexity created in a distributed environment.

More complex recovery
The basic recovery process is the same as that for nondistributed
databases. However, to make sure that the database remains consistent,
Concepts of Database Management, Fifth Edition
9-3
measures
each database update should be made permanent or aborted and undone.

More difficult management of
data dictionary
The location of the data dictionary entries is a concern. There are three
possibilities: choose one site and store the complete data dictionary at this
site; store a complete copy of the data dictionary at each site; and
distribute, possibly with replication, the data dictionary entries among the
various sites.

More complex database design
Distributing data does not affect the information level design. During the
physical
Rules for Distributed Databases
C. J. Date formulated 12 rules that distributed databases should follow. These rules are:
1.
Local autonomy
No site should depend on another site to perform its functions.
2.
No reliance on a central site
A DDBMS should not need to rely on one site more than any other site.
3.
Continuous operation
Performing any function should not shut down the entire distributed
database.
4.
Location transparency
Users should feel as if the entire database is stored at their location.
5.
Fragmentation transparency
Users should feel as if they are using a single central database.
6.
Replication transparency
Users should not be aware of any data replication.
7.
Distributed query
processing
A DDBMS must process queries as rapidly as possible even though the
data is distributed.
8.
Distributed transaction
management
A DDBMS must effectively manage transaction updates at multiple
sites.
9.
Hardware independence
A DDBMS must be able to run on different types of hardware.
10.
Operating system
independence
A DDBMS must be able to run on different operating systems.
11.
Network independence
A DDBMS must be able to run on different types of networks.
12.
DBMS independence
A DDBMS must be heterogeneous.
Client/Server Systems
In a network environment, a file server stores files required by users on the network. When users need data from a
file, the entire file is sent. Use Figure 9.4 to describe file server architecture.
In client/server architecture, the server is a computer providing data to the clients, which are the computers that are
connected to a network and that people use to access data stored on the server. The DBMS runs on the server and a
client sends a request for specific data to the server. Only the necessary data and not the entire file or files are sent.
A client/server architecture may be either two-tier or three-tier. In a two-tier architecture, the server performs
database functions and the clients perform the presentation (user interface) functions. Either the server or the clients
may perform business functions. The term fat client refers to an arrangement where the clients perform the business
functions. If the business functions reside on the server, each client is called a thin client. Use Figure 9.5 to illustrate
two-tier client/server architecture.
In a three-tier architecture, the clients perform the presentation functions, a database server performs the database
functions, and separate computers, called application servers, perform the business functions and act as interface
between clients and database server. Use Figure 9.6 to illustrate three-tier client/server architecture.
Concepts of Database Management, Fifth Edition
9-4
Advantages of Client/Server Systems
The advantages to using a client/server system instead of a file server are:
Transmits only the necessary data rather than entire files, across the network.
 Lower network traffic

Improved processing
distribution
Can distribute processing functions among multiple computers.

Thinner clients
Because application and database servers handle most of the processing, clients do
not need to be as powerful or as expensive as in a file-server environment.

Greater processing
transparency
Users do not need to learn any special commands or techniques.

Increased network,
hardware, and software
transparency
Because SQL is the common language, it is easier for users to access data from a
variety of sources. A single operation could access data from different networks,
different computers, and different operating systems.

Improved security
Can place additional security features on the application servers and on the network.

Decreased costs
Can replace, at a considerable cost savings, mainframe applications and mainframe
databases with PC applications and databases.

Increased scalability
Can upgrade the appropriate server or add additional processors to share the
processing load.
Triggers and Stored Procedures
Triggers, which are actions that occur automatically in response to associated database operations, provide
additional integrity support. Review the SQL example to create a trigger. If users execute a trigger or other
collection of SQL statements repeatedly, you can improve the performance of a client/server system by placing the
statements in a special file, called a stored procedure.
Data Warehouses
For routine update and retrieval operations, users typically interact with an RDBMS using online transaction
processing (OLTP) systems. These are ideal tools for operational needs but suffer from severe performance
problems when used for data analysis. Consequently, organizations have turned to data warehouses for the analysis
of data. A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of
management’s decision-making process. Subject-oriented means that data is organized by entity rather than by
application. Integrated means that data is stored in one place in the data warehouse. Time-variant means that data in
a data warehouse represents snapshots of data at various points in time in the past. Nonvolatile means that data is
read-only. Use Figure 9.7 to illustrate data warehouse architecture.
Spend some time on the subject of a data warehouse. Show the relationship between the type of information being
extracted and marketing decisions. Use the examples in the text to ask students for marketing strategies Premiere
Products might develop. Organizations use the results of these analyses to specifically target consumers.
Data Warehouse Structure and Access
The typical data warehouse data structure is a star schema, consisting of a central fact table, surrounded by
dimension tables. A fact table consists of rows that contain consolidated and summarized data. The fact table
contains a multipart primary key, each part of which is a foreign key to the surrounding dimension tables. Each
dimension table contains a single-part primary key that serves as an index for the fact table and that also contains
other fields associated with the primary key value. Use Figure 9.8 to illustrate a star schema.
Access to a data warehouse is accomplished through the use of online analytical processing (OLAP) software. When
users access a data warehouse, their queries usually involve aggregate data, such as total sales by month and average
sales by customer. Users often need to perform further analysis on the aggregate results. The most common types of
analyses are: slice and dice, drill down, and roll up. Use Figure 9.9 through 9.15 to illustrate these analyses.
Concepts of Database Management, Fifth Edition
9-5
Data mining consists of uncovering new knowledge, patterns, trends, and rules from the data stored in a data
warehouse.
Rules for OLAP Systems
E. F. Codd formulated 12 rules that OLAP systems should follow. These 12 rules are:
1.
Multidimensional conceptual view
Users should be able to view data in a multidimensional way.
2.
Transparency
Users should not have to know they are using a multidimensional
database.
3.
Accessibility
Users should perceive data as a single user view.
4.
Consistent reporting performance
The size and complexity of the warehouse should not affect
reporting performance.
5.
Client/server architecture
The server portion of the OLAP software should allow the use of
different types of clients.
6.
Generic dimensionality
Each data dimension should have the same structural and
operational capabilities.
7.
Dynamic sparse matrix handling
Missing data should be handled correctly and efficiently.
8.
Multiuser support
OLAP should provide secure, concurrent access.
9.
Unrestricted, cross-dimensional
operations
Users should be able to perform the same operations across any
number of dimensions.
10.
Intuitive data manipulation
Users should not need to use special interfaces to make their
requests.
11.
Flexible reporting
Users should be able to report data results any way they want.
12.
Unlimited dimensions and
aggregation levels
OLAP software should allow at least 15 data dimensions and an
unlimited number of summary levels
Quick Quiz
1.
A(n) _____ is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of
management’s decision-making process.
Answer: data warehouse
2.
A data warehouse structure that contains a fact table and surrounding dimension tables is called a(n) _____
schema.
Answer: star
3.
_____ consists of uncovering new knowledge, patterns, trends, and rules from the data stored in a data
warehouse.
Answer: Data mining
Object-Oriented DBMSs
Relational databases store and access data consisting of text and numbers. Graphics, drawings, photographs, video,
sound, voice mail, spreadsheets, and other complex objects can be stored in relational databases using special data
types known as binary large objects (BLOBs). When the primary focus is the storage and management of complex
objects, many companies use object-oriented DBMSs. Focus on a specific database and ask students questions about
a database with which they are familiar or have discussed in class to make sure they understand the underlying
concepts. Use the embedded questions to test students’ understanding of object-oriented concepts.
Concepts of Database Management, Fifth Edition
9-6
What is an Object-Oriented DBMS?
An object-oriented database management system (OODBMS) is one in which data and the actions that operate on
the data are encapsulated into objects. An object is a set of related attributes along with the actions that are
associated with the set of attributes. The concepts of object class, method, message, and inheritance are fundamental
to all object-oriented systems.
Objects and Classes
Use Figure 9.16 to explain the distinction between objects and classes. The differences between a collection of
objects and the relational model representation are:
 Entities are represented as objects rather than as relations.

Attributes are listed vertically below the object names. Each attribute is followed by the name of the domain
associated with the attribute. A domain is the set of values that are permitted for an attribute.

Objects can contain other objects.

An object can contain a portion of another object.
A class is a generalized category that describes a group of objects that can exist within it. For any class, you can
define a subclass.
Methods and Messages
Methods are the actions defined for a class. Use Figure 9.17 to explain methods. You define methods during the data
definition process. To cause a particular method to be executed, you send a message to the object. A message is a
request to execute a method.
Inheritance
Inheritance is a key feature of object-oriented systems. For any class, you can define a subclass. Every occurrence of
the subclass is also considered to be an occurrence of the class. The subclass inherits the structure of the class as
well as its methods.
Unified Modeling Language
Unified Modeling Language (UML) is an approach to model all the various aspects of software development for
object-oriented systems. UML includes several types of diagrams, each with its own special purpose that can be
used to represent database designs. The type of diagram most relevant to database design is the class diagram. Use
Figures 9.18 through 9.21 to illustrate UML. Stress that UML is rapidly becoming the industry standard in objectoriented software development. To learn more about UML, access the IBM Rational Rose web site, http://www306.ibm.com/software/rational/uml/.
Rules for OODBMSs
There are 14 rules to use as benchmarks against which to measure object-oriented systems. These rules are:
1. Complex objects
Must support the creation of complex objects from simple objects.
2.
Object identity
Must provide a way to identify objects, that is, must be able to distinguish one object
from another.
3.
Encapsulation
Must encapsulate data and associated methods together in the database.
4.
Information hiding
Must hide the details concerning the way data is stored and actual implementation of
methods.
5.
Types or classes
Must support either abstract types or classes.
6.
Inheritance
Must support inheritance.
7.
Late binding
Must be able to use the same name for different operations (late binding allows this).
In object-oriented systems, this is called polymorphism.
Concepts of Database Management, Fifth Edition
9-7
8.
Computational
completeness
Can use functions in the language of the OODBMS to perform computations.
9.
Extensibility
Must be able to define new data types.
10.
Persistence
Must have the ability to have a program remember its data from one execution to the
next.
11.
Performance
Should be able to manage very large databases.
12.
Concurrent update
support
Must support concurrent update.
13.
Recovery support
Must provide recovery services.
14.
Query facility
Must provide query facilities.
Web Access to Databases
Many organizations now use the Internet and a Web browser to conduct commercial activities. Collectively, these
activities are called electronic commerce (e-commerce). Users access databases via Web browsers. A three-tier
client/server architecture makes this access possible. If a company uses this architecture, a Web server often replaces
the application server to process client requests. A more flexible architecture positions Web servers as an additional
tier. Use Figure 9.22 to explain access to a database via a Web browser. Many different software languages,
products, and standards support e-commerce. One of these languages, XML (Extensible Markup Language) is
particularly suited to the exchange of data between different programs. Mention that Access provides the ability to
import and export data in XML. Access also supports data access page. A data access page is an HTML document
that can be bound directly to data in the database.
History of Database Management
The beginnings of database management coincided with the APOLLO project of the 1960s. IBM developed
Generalized Update Access Method (GUAM) to handle the coordination of vast amounts of data required for the
space project. Other DBMSs, such as Integrated Data Store (IDS) also were developed during the 1960s. The
relational model had its beginnings in 1970 with the publication of a paper by E.F. Codd. Commercial relational
DBMSs did not appear until the 1980s. Microsoft Access is currently the dominant PC-based relational DBMS.
Other large relational DBMSs include Oracle, Sybase, MySQL, and SQL Server.
Hierarchical and Network Databases
You can categorize a DBMS by the data model the DBMS follows. There are four data models, each of which has
two components: structure and operations. The relational model and the object-oriented model have been discussed
previously in this text. The network and hierarchical models are in declining use. Point out that although the use of
network and hierarchical model DBMSs is declining, some of the concepts show up in current database models.
Network Model
Users perceive a network model database as a collection of record types and relationships between these record
types. Such a structure is called a network. Use Figure 9.23 to explain the network model.
Hierarchical Model
Users perceive a hierarchical model database as a collection of hierarchies, or trees. Use Figure 9.24 to explain the
hierarchical model.
Key Terms
All key terms are defined in the Glossary section of the textbook.
Access
back-end machine
access delay
back-end processor
American National Standards Institute (ANSI)
binary large object (BLOB)
application server
binding
association
business to business (B2B)
Concepts of Database Management, Fifth Edition
business to consumer (B2C)
class
class diagram
client
client/server system
communications network
COnference on DAta SYstems Languages
(CODASYL)
coordinator
data cube
data fragmentation
Data Language/I (DL/I)
data mining
data model
data warehouse
database navigation
DataBase Task Group (DBTG)
DB2
dBASE
dimension table
distributed database
distributed database management system (DDBMS)
domain
drill down
electronic commerce (e-commerce)
encapsulated
extensible
Extensible Markup Language (XML)
fact table
fat client
file server
fragmentation transparency
front-end machine
front-end processor
Gemstone
generalization
Generalized Update Access Method (GUAM)
global deadlock
heterogeneous DDBMS
hierarchical model
hierarchy
homogeneous DDBMS
Information Management System (IMS)
inheritance
Integrated Data Management System (IDMS)
Integrated Data Store (IDS)
local deadlock
local site
9-8
location transparency
message
method
multidimensional database
multiplicity
MySQL
network
network model
object
Objectivity/DB
object-oriented database management system
(OODBMS)
object-relational DBMS (ORDBMS)
on-line analytical processing (OLAP)
on-line transaction processing (OLTP)
operation
Oracle
Paradox
persistence
polymorphism
primary copy
private visibility
protected visibility
public visibility
relational DBMS (RDBMS)
remote site
replication transparency
roll up
scalability
server
slice and dice
SQL server
star schema
stored procedure
structure
subclass
superclass
Sybase
System R
thin client
three-tier architecture
trigger
two-phase commit
two-tier architecture
Unified Modeling Language (UML)
Versant
visibility symbol
Download