Chp1-2-5th

advertisement
NOTES - PART I
Databases and Database Management Systems
Based on Chapters 1-2 in
Fundamentals of Databse Systems by Elmasri and Navathe, Ed. 5)
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 1
Outline








Types of Databases and Database Applications
Basic Definitions
Typical DBMS Functionality
Example of a Database (UNIVERSITY)
Main Characteristics of the Database Approach
Database Users
Advantages of Using the Database Approach
When Not to Use Databases
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 1- 3
Simplified database system environment
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 1- 6
1. Basic Definitions
Database: A collection of related data that has the following
implicit properties:
1. Represents some aspect of the real world
2. Logically cohernet collection of data with
some inherent meaning
3. Designed, built, and populated with data for a
specific purpose
4. Can be of any size and complexity
Data: Known facts that can be recorded and have an implicit
meaning.
Database Management System (DBMS): A software
package/ system to facilitate the creation and maintenance of
a computerized database.
The DBMS is a general-purpose software system that
facilitates the process of defining, constructing,
manipulating, and sharing data among users and
applications.
Defining a database involves specifying the data types,
structures, and constraints for data. The descriptive
information about what is stored in the database forms what
is called meta-data: data that describes data.
Constructing is the storing of data.
Manipulating is the retrieval and updating of data.
Sharing is between multiple users and/or programs
concurrently.
Other functions include 1) system protection against
hardware and software failures, 2) security protection
against unauthorized access, and 3) evolution of data
requirements.
Database System: The DBMS software together with the
data itself. Sometimes, the applications are also included.
1.2. Example of a Database
INSERT MS Access Example Here
1.3 Characteristics of Database Approach
1 Self-describing nature of a database system: A DBMS
catalog stores the description of the database. The
description is called meta-data). This allows the DBMS
software to work with different databases.
2 Insulation between programs and data: Called programdata independence. Allows changing data storage
structures and operations without having to change the
DBMS access programs.
In object-oriented model, users can define operations
on data. These operations (methods) have an interface
which includes the operation name and parameters.
The implementation (of the method) can be changed
without affecting the interface. This is programoperation independence.
Data Abstraction: A data model is used to hide storage
details and present the users with a conceptual view of the
database. In other words, this allows the program data
independence and program operation independence.
A data model (definition found in chapter 2) is a
collection of concepts and basic operations that can be
used to describe the structure of a database, and certain
constraints that the database should obey.
3 Support of multiple views of the data: Each user may see a
different view of the database, which describes only the data
of interest to that user.
4 Concurrency Control: A multiuser DBMS must allow
multiple users to access the database at the same time
and ensure that users trying to update the same data do
so in a controlled manner so that the results is correct.
The concept of a transaction is central to the accessing
of data in a database. A transaction is an atomic unit of
work performed on the database. This work may be
reading, writing, updating, inserting, deleting, etc. The
transaction is to be performed as though it was
executed in isolation.
1.4. Actors (users) on the Scene
1. DBA-- database administrator is responsible for
authorizing access to the database, for coordinating and
monitoring its use, and for acquiring software and
hardware resources as needed.
2. Database Designers are responsible for identifying the
data to be stored in the database and for choosing
appropriate structures to represent and store this data.
3. End Users
a. casual
b. naïve
c. sophisticated
d. stand-alone
4. Software Engineers
a. system analysts determine the requirement of end
users and develop specifications for transactions to meet
their needs.
b. application programmers implement these
specifications as programs.
5. workers behind the scene
a. DBMS system designers and implementers
b. tool developers
c. operators and maintenance personnel
1.6 Additional Advantages of Using DBMS
1 Controlling redundancy in data storage and in development
and maintenence efforts.
2 Restricting unauthorized access to data.
3 Providing Persistent storage for program objects and
data structures
4 Efficient Query Processing
5 Providing backup and recovery systems
6 Providing multiple interfaces to different classes of users.
7 Representing complex relationships among data.
8 Enforcing integrity constraints on the database.
9 Permitting Inferencing and actions using rules. Typical
actions are performed by triggers and stored procedures.
10 Potential for enforcing standards.
11 Reduced application development time.
12 Flexibility to change data structures.
13 Availability of up-to-date information.
14 Economies of scale and sharing of data among multiple
users.
1.7 Brief History of database
HISTORY OF DATA MODELS
History of Data Models

Network Model:



The first network DBMS was implemented by
Honeywell in 1964-65 (IDS System).
Adopted heavily due to the support by CODASYL
(Conference on Data Systems Languages)
(CODASYL - DBTG report of 1971).
Later implemented in a large variety of systems IDMS (Cullinet - now Computer Associates), DMS
1100 (Unisys), IMAGE (H.P. (Hewlett-Packard)),
VAX -DBMS (Digital Equipment Corp., next
COMPAQ, now H.P.).
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 44
Example of Network Model Schema
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 45
Network Model

Advantages:



Network Model is able to model complex
relationships and represents semantics of
add/delete on the relationships.
Can handle most situations for modeling using
record types and relationship types.
Language is navigational; uses constructs like
FIND, FIND member, FIND owner, FIND NEXT
within set, GET, etc.

Programmers can do optimal navigation through the
database.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
•
Slide 2- 46
Network Model

Disadvantages:


Navigational and procedural nature of processing
Database contains a complex array of pointers
that thread through a set of records.

Little scope for automated “query optimization”
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 47
Network Model

Disadvantages:


Navigational and procedural nature of processing
Database contains a complex array of pointers
that thread through a set of records.

Little scope for automated “query optimization”
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 47
History of Data Models

Hierarchical Data Model:




Initially implemented in a joint effort by IBM and
North American Rockwell around 1965. Resulted
in the IMS family of systems.
IBM’s IMS product had (and still has) a very large
customer base worldwide
Hierarchical model was formalized based on the
IMS system
Other systems based on this model: System 2k
(SAS inc.)
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 48
Hierarchical Model

Advantages:



Simple to construct and operate
Corresponds to a number of natural hierarchically organized
domains, e.g., organization (“org”) chart
Language is simple:


Uses constructs like GET, GET UNIQUE, GET NEXT, GET
NEXT WITHIN PARENT, etc.
Disadvantages:



Navigational and procedural nature of processing
Database is visualized as a linear arrangement of records
Little scope for "query optimization"
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 49
History of Data Models

Relational Model:






Proposed in 1970 by E.F. Codd (IBM), first commercial
system in 1981-82.
Now in several commercial products (e.g. DB2, ORACLE,
MS SQL Server, SYBASE, INFORMIX).
Several free open source implementations, e.g. MySQL,
PostgreSQL
Currently most dominant for developing database
applications.
SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2),
SQL-99, SQL3, …
Chapters 5 through 11 describe this model in detail
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 50
History of Data Models

Object-oriented Data Models:





Several models have been proposed for implementing in a
database system.
One set comprises models of persistent O-O Programming
Languages such as C++ (e.g., in OBJECTSTORE or
VERSANT), and Smalltalk (e.g., in GEMSTONE).
Additionally, systems like O2, ORION (at MCC - then
ITASCA), IRIS (at H.P.- used in Open OODB).
Object Database Standard: ODMG-93, ODMG-version 2.0,
ODMG-version 3.0.
Chapters 20 and 21 describe this model.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 51
History of Data Models

Object-Relational Models:





Most Recent Trend. Started with Informix
Universal Server.
Relational systems incorporate concepts from
object databases leading to object-relational.
Exemplified in the latest versions of Oracle-10i,
DB2, and SQL Server and other DBMSs.
Standards included in SQL-99 and expected to be
enhanced in future SQL standards.
Chapter 22 describes this model.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 52
• E-commerce: Extraction and insertion of data from/into
a database dynamically via web access. eXtended Markup
Language (XML) is the primary standard for interchanging
data among various types of databases and web pages.
• XML databases: An evolution from the usage of web
pages and the transmission of data between different
entities has led to the storage and retrievial of XML
documents in databases. In some cases this is native and in
others an adaptive hybrid.
1.8 When not to use a DBMS
Main inhibitors (costs) of using a DBMS:
- High initial investment and possible need for additional
hardware and/or personnel
- Overhead for providing generality, security, recovery,
integrity, and concurrency control.
When a DBMS may be unnecessary:
- If the database and applications are simple, well defined, and
not expected to change.
- If there are stringent real-time requirements that may not be
met because of DBMS overhead.
- If access to data by multiple users is not required.
When no DBMS may suffice:
- If the database system is not able to handle the complexity of
data because of modeling limitations
- If the database users need special operations not supported by
the DBMS.
Chapter 2
Outline









Data Models and Their Categories
History of Data Models
Schemas, Instances, and States
Three-Schema Architecture
Data Independence
DBMS Languages and Interfaces
Database System Utilities and Tools
Centralized and Client-Server Architectures
Classification of DBMSs
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 3
2.1 Data Models
A data model is a collection of concepts and basic
operations that can be used to describe the structure of
a database, and certain constraints that the database
should obey.
Data Model Operations: Operations for specifying
database retrievals and updates by referring to the concepts
of the data model.
2.1.1 Categories of data models:
- Conceptual (high-level, semantic) data models: Provide
concepts that are close to the way many users perceive data.
(Also called entity-based or object-based data models.)
- Physical (low-level, internal) data models: Provide
concepts that describe details of how data is stored in the
computer.
- representational data models: Provide concepts that fall
between the above two, balancing user views with some
computer storage details.
Entity-- represnets a real-world object or concept
Attribute -- represnets some property of interest that
further describes an entity
Relationship (between entities) -- represents an interaction
among entities
Traditional representational data models used: relational,
network, and hierarchical … these are record-based
Newer model closer to conceptual model … object
2.1.2 Schemas and Instances
Database Schema: The description of a database. Includes
descriptions of the database structure and the constraints that
should hold on the database.
Schema Diagram: A diagrammatic display of (some aspects
of) a database schema.
Database Instance: The actual data stored in a database at a
particular moment in time . Also called database state (or
occurrence).
The database schema changes very infrequently . The
database state changes every time the database is updated .
Schema is also called intension, whereas state is called
extension.
Example of a Database Schema
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 11
2.2.1 Three-Schema Architecture
Proposed to support DBMS characteristics of:
- Program-data independence.
- Support of multiple views of the data.
Defines DBMS schemas at three levels :
- Internal schema at the internal level to describe data
storage structures and access paths. Typically uses a
physical data model.
- Conceptual schema at the conceptual level to describe the
structure and constraints for the whole database. Uses a
conceptual or an implementation data model.
- External schemas at the external level to describe the
various user views. Usually uses the same data model as the
conceptual level.
Mappings among schema levels are also needed. Programs
refer to an external schema, and are mapped by the DBMS to
the internal schema for execution.
The three-schema architecture
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 2- 15
2.2.2 Data Independence
Logical Data Independence: The capacity to change the
conceptual schema without having to change the external
schemas and their application programs.
Physical Data Independence: The capacity to change the
internal schema without having to change the conceptual
schema.
When a schema at a lower level is changed, only the mappings
between this schema and higher-level schemas need to be
changed in a DBMS that fully supports data independence. The
higher-level schemas themselves are unchanged. Hence, the
application programs need not be changed since the refer to the
external schemas.
2.3.1 DBMS Languages
Data Definition Language (DDL): Used by the DBA and
database designers to specify the conceptual schema of a
database. In many DBMSs, the DDL is also used to define
internal and external schemas (views). In some DBMSs,
separate storage definition language (SDL) and view
definition language (VDL) are used to define internal and
external schemas.
Data Manipulation Language (DML): Used to specify
database retrievals and updates. Two types: high level
nonprocedural and low level procedural
Low Level Procedural
- DML commands (data sublanguage) can be embedded in a
general-purpose programming language (host language),
such as C++, Java, COBOL, …
- DML commands (data sublanguage) embedded in a
proprietary vendor specific language
High Level Non-Procedural
- Alternatively, stand-alone DML commands can be applied
directly (query language) ex. SQL
2.3.2 DBMS Interfaces
- User-friendly interfaces:
- Menu-Based Interfaces for Web Clients or Browsing
- Forms-based
- Graphics-based (Point and Click, Drag and Drop etc.)
- Natural language
- Parametric interfaces using function keys.
- Interfaces for the DBA:
- Creating accounts, granting authorizations
- Setting system parameters
- Changing schemas or access path
- Speech as Input and Output
- Stand-alone query language interfaces.
- Programmer interfaces for embedding DML in programming
languages:
- Pre-compiler Approach
- Procedure (Subroutine) Call Approach
- Report generation languages.
insert Figure 2.3………
2.5 Architectures of DBMS
2.5.1
Centralized
2.5.2
Basic Client/Server
2.5.3
Two-Tier Client/Server
ODBC ( Open Database Connectivity)
provides an API.
2.5.4
Three-Tier Client/Server
Client, Application/Web Server, Database Server
2.6. Classification of DBMSs
First classification: Based on the data model used:
- Traditional: Relational, Network, Hierarchical.
- Emerging: Object-oriented, Object-relational.
Second classification:
- Single-user (typically used with micro- computers) vs.
multi-user (most DBMSs).
Third classification:
- Centralized (uses a single computer with one database)
- Distributed (uses multiple computers, multiple databases)
- homogenous (loosely coupled systems using the same
DBMS software
- heterogeneous (loosely coupled systems using different preexisting DBMS). Also called federated DBMS
..note…..Distributed Database Systems have now come to be
known as client server based database systems because they
do not support a totally distributed environment, but rather a
set of database servers supporting a set of clients.
Fourth classification: cost
Fifth: type of access path (indexing and storage of data)
Sixth: general or special purpose
Download