DBMS Overview

Overview of a Database Management System
Single boxes: represent system components
Double boxes: represent in-memory data structures
Solid lines: indicate control and data flow
Dashed lines: indicate data flow only
Sources of commands to the DBMS:
1. End users and application programs that ask for data or modify data.
2. A DBA (Database Administrator): a person or persons responsible for the structure, or schema, of the database.
(Schema – the overall description of the database's logical structure, as defined by the DDL.)
Three main components of a DBMS
1. Storage Manager – responsible for storing data, metadata, indexes and logs.
2. Query Manager – parses queries, optimizes them by selecting a query plan, and executes the plan on the stored data.
3. Transaction Manager – logs database changes to support recovery after a system crash. It also supports concurrent execution of transactions while ensuring:
• Atomicity – a transaction is performed either completely or not at all.
• Isolation – a transaction is executed as if there were no other concurrently executing transactions.
Data-Definition Language Commands
• DDL (Data Definition Language) – defines the format of each data element in a database. Database tables, for example, are created and dropped using the DDL.
• DDL commands are parsed by a DDL processor and passed to the execution engine.
• The execution engine then goes through the index / file / record manager to alter the metadata, that is, the schema information for the database.
(Metadata – information describing the nature and structure of an organization's data: data about data.)
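As a rough illustration of DDL commands altering metadata, the sketch below uses Python's built-in sqlite3 module. SQLite and the employees table are assumptions chosen for the example, not part of the lecture. It shows only that creating and dropping a table changes the schema catalog, which SQLite exposes as the sqlite_master table.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")

def table_names(c):
    """List the tables recorded in SQLite's schema catalog (its metadata)."""
    return [row[0] for row in c.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]

# DDL: create a table. This changes the schema, not the stored data.
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT NOT NULL)")
tables_after_create = table_names(conn)

# DDL: drop the table, which removes its entry from the metadata.
conn.execute("DROP TABLE employees")
tables_after_drop = table_names(conn)

print(tables_after_create)  # ['employees']
print(tables_after_drop)    # []
```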
Overview of Query Processing
• A user or an application program initiates some action that does not affect the schema of the database, but may affect the content of the database (if the action is a modification command) or will extract data from the database (if the action is a query).
• These commands are expressed in a DML (Data Manipulation Language).
• For example, the SELECT, INSERT, UPDATE and DELETE statements of SQL form a DML.
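A minimal sketch of the distinction above, assuming SQLite through Python's sqlite3 module and a made-up accounts table: the modification commands change the content, the query extracts data, and the schema stays the same throughout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One-time DDL so that there is something to manipulate (illustrative table).
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

def schema(c):
    """Return the stored CREATE statements, i.e. the current schema."""
    return [row[0] for row in c.execute("SELECT sql FROM sqlite_master")]

schema_before = schema(conn)

# Modification commands: change the content of the database.
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 1")

# Query: extracts data without changing anything.
balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchone()[0]

schema_after = schema(conn)
print(balance)                        # 150
print(schema_before == schema_after)  # True
```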
• DML statements are handled by two separate subsystems:
- Answering the query
- Transaction processing
Answering the Query
1. The query is parsed and optimized by a query compiler.
2. The resulting query plan, that is, the sequence of actions the DBMS will perform to answer the query, is passed to the execution engine.
3. The execution engine issues a sequence of requests for small pieces of data, typically records or tuples of a relation, to a resource manager that knows about data files (holding relations), the format and size of records in those files, and index files, which help find elements of data files quickly.
4. The requests for data are translated into pages, and these requests are passed to the buffer manager.
(The task of the buffer manager is to bring appropriate portions of the data from secondary storage (disk), where it is kept permanently, to main-memory buffers. Normally the page or disk block is the unit of transfer between buffers and disk.)
- The buffer manager communicates with a storage manager to get data from disk.
- The storage manager might involve operating system commands, but more typically the DBMS issues commands directly to the disk controller.
Transaction Processing
• Queries and other DML actions are grouped into transactions, which are units that must be executed atomically and in isolation from one another.
• Often, each query or modification action is a transaction by itself.
• The execution of a transaction must also be durable.
• Two major parts of a transaction processor:
1. Concurrency control manager, or scheduler – responsible for assuring atomicity and isolation of transactions.
2. Logging and recovery manager – responsible for the durability of transactions.
• The DBMS offers the guarantee of durability.
• The transaction manager therefore accepts transaction commands from an application, which tell the transaction manager when transactions begin and end, as well as information about the expectations of the application.
• Tasks that the transaction processor performs:
1. Logging: to assure durability, every change in the database is logged separately on disk. The log manager assures that, whenever a system failure occurs, a recovery manager will be able to examine the log of changes and restore the database to a consistent state.
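The logging idea can be sketched with a toy redo log. This is a simplification under stated assumptions, not how any particular DBMS implements recovery: a Python list stands in for the log file on disk, and the record layout and the names write, commit and recover are invented for the example.

```python
# Toy redo log: a list stands in for the log file on disk (an assumption).
log = []

def write(txn, key, value):
    """Record the change in the log before it is applied anywhere else."""
    log.append(("WRITE", txn, key, value))

def commit(txn):
    """Record that the transaction completed."""
    log.append(("COMMIT", txn))

def recover(log_records):
    """Rebuild the database by redoing only writes of committed transactions."""
    committed = {rec[1] for rec in log_records if rec[0] == "COMMIT"}
    database = {}
    for rec in log_records:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, key, value = rec
            database[key] = value
    return database

write("T1", "A", 90)
write("T1", "B", 110)
commit("T1")
write("T2", "A", 0)  # T2 never commits: imagine the crash happens here

recovered = recover(log)
print(recovered)  # {'A': 90, 'B': 110}
```

The uncommitted write of T2 is ignored during recovery, so the restored state reflects T1 completely and T2 not at all.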
2. Concurrency control: assures that the individual actions of multiple transactions are executed in such an order that the net effect is the same as if the transactions had in fact executed in their entirety, one at a time.
A typical scheduler does its work by maintaining locks on certain pieces of the database. These locks prevent two transactions from accessing the same piece of data in ways that interact badly. Locks are generally stored in a main-memory lock table.
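A main-memory lock table might look roughly like the sketch below. The shared ("S") and exclusive ("X") lock modes and the LockTable class are assumptions added for illustration; the lecture only says that locks are kept in a table. A real scheduler would also queue waiting transactions rather than just refusing the request.

```python
class LockTable:
    """Toy main-memory lock table with shared (S) and exclusive (X) locks."""

    def __init__(self):
        self.locks = {}  # item -> (mode, set of transactions holding the lock)

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        if item not in self.locks:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        if holders == {txn}:
            # Sole holder: keep the lock, upgrading to X if either mode is X.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # several readers may share the item
            return True
        return False  # conflicting access: the request is not granted

    def release_all(self, txn):
        """Drop every lock held by txn, e.g. when it commits or aborts."""
        for item in list(self.locks):
            _, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

table = LockTable()
r1 = table.acquire("T1", "A", "S")  # granted
r2 = table.acquire("T2", "A", "S")  # granted: shared with T1
r3 = table.acquire("T2", "A", "X")  # refused: T1 still reads A
table.release_all("T1")
r4 = table.acquire("T2", "A", "X")  # granted: T2 is now the only holder
r5 = table.acquire("T1", "A", "S")  # refused: T2 holds A exclusively
print(r1, r2, r3, r4, r5)  # True True False True False
```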
3. Deadlock resolution: a deadlock is a situation where none of a group of transactions can proceed because each needs something another transaction has. The transaction manager has the responsibility to intervene and cancel (roll back or abort) one or more transactions to let the others proceed.
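One common way to detect such a situation is to look for a cycle in a waits-for graph, sketched below. The graph representation, the find_cycle helper and the policy of aborting the last transaction in the cycle are all assumptions made for the example; the lecture does not prescribe a detection method or a victim-selection rule.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    waits_for maps each transaction to the set of transactions it waits for.
    """
    visiting, done = [], set()

    def visit(txn):
        if txn in visiting:
            return visiting[visiting.index(txn):]  # found a cycle
        if txn in done:
            return None
        visiting.append(txn)
        for other in sorted(waits_for.get(txn, ())):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in sorted(waits_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None

# T1 waits for T2 and T2 waits for T1: neither can proceed. T3 waits for T1.
waits_for = {"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}
cycle = find_cycle(waits_for)

# Hypothetical policy: abort the last transaction in the detected cycle.
victim = cycle[-1]
remaining = {t: {o for o in others if o != victim}
             for t, others in waits_for.items() if t != victim}

print(cycle)                  # ['T1', 'T2']
print(victim)                 # T2
print(find_cycle(remaining))  # None
```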
Storage and Buffer Management
• The database normally resides in secondary memory.
• Data must be in main memory for any useful operation to be performed on it.
• It is the storage manager’s job to control the placement of data on disk and its movement between disk and main memory.
• For a simple database, the storage manager might be nothing more than the file system of the underlying OS.
• For efficiency, a DBMS normally controls storage on disk directly.
• The storage manager keeps track of the location of files on the disk and obtains the block or blocks containing a file on request from the buffer manager.
• The buffer manager is responsible for
partitioning the available main memory
into buffers, which are page-sized regions
into which disk blocks can be transferred.
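The buffer manager's role can be sketched as a small page cache. The sketch assumes a dictionary standing in for the disk and a least-recently-used (LRU) replacement policy; the lecture does not name a policy, and the BufferManager class is invented for the example. Writing modified pages back to disk is left out.

```python
from collections import OrderedDict

class BufferManager:
    """Toy read-only buffer pool with LRU replacement (an assumed policy)."""

    def __init__(self, disk, num_buffers):
        self.disk = disk              # dict: block id -> block contents
        self.num_buffers = num_buffers
        self.buffers = OrderedDict()  # block id -> contents, LRU first
        self.disk_reads = 0

    def get_block(self, block_id):
        """Return the block, reading it from disk only if it is not buffered."""
        if block_id in self.buffers:
            self.buffers.move_to_end(block_id)  # mark as most recently used
            return self.buffers[block_id]
        if len(self.buffers) >= self.num_buffers:
            self.buffers.popitem(last=False)    # evict least recently used
        self.disk_reads += 1
        self.buffers[block_id] = self.disk[block_id]
        return self.buffers[block_id]

disk = {0: "block0", 1: "block1", 2: "block2"}
bm = BufferManager(disk, num_buffers=2)
bm.get_block(0)  # disk read
bm.get_block(1)  # disk read
bm.get_block(0)  # served from the buffer, no disk read
bm.get_block(2)  # disk read; block 1 is evicted as least recently used

print(bm.disk_reads)     # 3
print(list(bm.buffers))  # [0, 2]
```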
• All DBMS components that need
information from disk will interact with the
buffers and the buffer manager, either
directly or through the execution engine.
• The kinds of information that various components may need include:
Data: the contents of the database.
Metadata: the database schema that describes the structure of, and constraints on, the database.
Statistics: information gathered and stored by the DBMS about data properties.
The Query Processor
• The portion of the DBMS that most affects the performance that the user sees is the query processor.
• The query processor is represented by two components:
- Query Compiler
- Execution Engine
• Query Compiler
- translates the query into an internal form called a query plan
- the query plan is a sequence of operations to be performed on the data
• The query compiler consists of three major units:
(i) Query parser: builds a tree structure from the textual form of the query.
(ii) Query preprocessor: performs semantic checks on the query and performs some tree transformations to turn the parse tree into a tree of algebraic operators representing the initial query plan.
(iii) Query optimizer: transforms the initial query plan into the best available sequence of operations on the actual data.
• The query compiler uses metadata and statistics about the data to decide which sequence of operations is likely to be the fastest.
– For example, the existence of an index, which is a specialized data structure that facilitates access to data given values for one or more components of that data, can make one plan much faster than another.
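The effect of an index on the chosen plan can be observed in SQLite, used here purely as a convenient example system. The sketch runs SQLite's EXPLAIN QUERY PLAN on the same query before and after creating an index. The table, the index name and the plan helper are assumptions for the example, and the exact wording of the plan text varies between SQLite versions, so no literal output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER)")

def plan(c, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in c.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM employees WHERE last_name = 'Smith'"

# Without an index the only available plan is a scan of the whole table.
plan_without_index = plan(conn, query)

# Once an index on last_name exists, the optimizer can search it instead.
conn.execute("CREATE INDEX idx_employees_last_name ON employees (last_name)")
plan_with_index = plan(conn, query)

print(plan_without_index)
print(plan_with_index)
```

The first plan should mention a SCAN of employees, while the second should mention a SEARCH using idx_employees_last_name.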
• Execution Engine: has the responsibility for executing each of the steps in the chosen query plan.
The execution engine interacts with most of the other components of the DBMS, either directly or through the buffers. Data must get into the buffers in order to be manipulated. The execution engine needs to interact with the scheduler to avoid accessing data that is locked, and with the log manager to make sure that all database changes are properly logged.
The ACID Properties of Transactions
• Properly implemented transactions are commonly said to meet the “ACID test”, where:
“A” = atomicity, the all-or-nothing execution of transactions.
“I” = isolation, the fact that each transaction must appear to be executed as if no other transaction is executing at the same time.
“D” = durability, the condition that the effect on the database of a transaction must never be lost once the transaction has completed.
The remaining “C” stands for consistency. That is, all databases have consistency constraints, or expectations about relationships among data elements (e.g., account balances may not be negative). Transactions are expected to preserve the consistency of the database.
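Atomicity and consistency can be seen together in a small sketch, again assuming SQLite through Python's sqlite3 module. The accounts table, the CHECK constraint encoding "balances may not be negative", and the transfer function are illustrative choices, not part of the lecture. A transfer that would overdraw an account violates the constraint, and the whole transaction, including the credit that had already been applied, is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency constraint: account balances may not be negative.
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(c, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with c:  # commits on success, rolls back if an exception escapes
            c.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                      (amount, dst))
            c.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                      (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(c):
    return c.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()

failed = transfer(conn, 1, 2, 150)  # would make account 1 negative
after_failed = balances(conn)
succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)

print(failed, after_failed)      # False [(1, 100), (2, 50)]
print(succeeded, after_success)  # True [(1, 70), (2, 80)]
```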
Summary
• End of lecture
Outline of Database System Studies
• Ideas related to database systems can be divided into four broad categories.
1. Design of Databases
How does one develop a useful database?
What kinds of information go into the database?
How is the information structured?
What assumptions are made about types or values of data items?
How do data items connect?
2. Database Programming.
How does one express queries and other
operations on the database?
How does one use other capabilities of a
DBMS such as transactions or constraints in
an application?
How is database programming combined
with conventional programming?
3. Database System Implementation.
How does one build a DBMS, including such matters as query processing and organizing storage for efficient access?
Storage management: how secondary storage is used effectively to hold data and allow it to be accessed quickly.
Query processing: how queries expressed in a very high-level language such as SQL can be executed efficiently.
Transaction management: how to support transactions with ACID properties.
4. Information Integration
Much of the recent evolution of database
systems has been toward capabilities that
allow different data sources, which may be
databases and/or information resources
that are not managed by a DBMS, to work
together in a larger whole.
Index
• How Indexes are Implemented
You can use the Indexed property to set a single-field
index. An index speeds up queries on the indexed
fields as well as sorting and grouping operations. For
example, if you search for specific employee names in
a LastName field, you can create an index for this
field to speed up the search for a specific name.
Setting
The Indexed property uses the following settings.
Setting                Description
No                     (Default) No index.
Yes (Duplicates OK)    The index allows duplicates.
Yes (No Duplicates)    The index doesn't allow duplicates.
• Remarks
Use the Indexed property to find and sort records by using a
single field in a table. The field can hold either unique or nonunique values. For example, you can create an index on an
EmployeeID field in an Employees table in which each
employee ID is unique or you can create an index on a Name
field in which some names may be duplicates.
• Note You can't index Memo, Hyperlink, or OLE Object
data type fields.
You can create as many indexes as you need. The indexes are
created when you save the table and are automatically updated
when you change or add records. You can add or delete indexes
at any time in table Design view.
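The Indexed property described above is set in table Design view rather than in code. As a loose analogy only, the sketch below shows the same two kinds of index in SQL, using SQLite through Python's sqlite3 module: a unique index corresponds roughly to "Yes (No Duplicates)" and an ordinary index to "Yes (Duplicates OK)". The table, column and index names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, name TEXT)")

# Unique index: no two rows may share an employee_id.
conn.execute("CREATE UNIQUE INDEX idx_employee_id ON employees (employee_id)")
# Ordinary index: duplicate names are allowed.
conn.execute("CREATE INDEX idx_name ON employees (name)")

conn.execute("INSERT INTO employees VALUES (1, 'Lee')")
conn.execute("INSERT INTO employees VALUES (2, 'Lee')")  # duplicate name is fine

try:
    conn.execute("INSERT INTO employees VALUES (1, 'Kim')")  # duplicate ID
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(duplicate_id_rejected)  # True
print(row_count)              # 2
```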
Summary
• Overview of a Database Management System
Data-Definition Language (DDL) Commands
Query Processing
Storage and Buffer Management
Transaction Processing
The Query Processor
• Outline of Database-System Studies
Design of a Database
Database Programming
Database System Implementation
Information Integration