Introduction

PMIT-6102
Advanced Database Systems
By Jesmin Akhter
Assistant Professor, IIT, Jahangirnagar University
Continues from 24.01.2014 to 13.06.2014
Every week, on Friday
From 10:30 AM to 1:00 PM
NB: Schedule may change
Slide 2
Attendance =10%
Exercise test (instant test, assignment, presentation) =5%
Class Test (average of three) =15%
Mid-Term Examination =30%
Final Examination =40%
================================
Total =100%

Slide 3










Introduction (Lecture 01)
Overview of Relational DBMS (Lecture 02, 03)
Distributed Database Design (Lecture 04)
Overview of Query Processing (Lecture 05)
Distributed Query Processing (Lecture 06)
Tutorial-1
Mid-Term
Tutorial-2
Distributed Transaction Management (Lecture 07)
Distributed Concurrency Control (Lecture 08, 09)
Reliability (Lecture 10, 11)
Parallel Database Systems (Lecture 12, 13)
Distributed Object DBMS (Lecture 14)
Tutorial-3
Tutorial-4
Total: 14 lectures + 4 tutorials + Midterm + Final Exam = 20 weeks.
Slide 4
Slide 5
Tutorial: Date and Time
Tutorial-01: 14th February 2014
Tutorial-02: 14th March 2014
Mid-term Examination: 28th March 2014
Tutorial-03: 2nd May 2014
Final Examination: 13th June 2014
NB: Schedule may change
Slide 6
Lecture 01
Introduction to DDBMS
 Introduction
Distributed Database System
Applications
Distributed DBMS Promises
Problem Areas
Architectural Models for Distributed DBMSs
Slide 8
[Figure: Application programs 1, 2, and 3 access a single database through the DBMS, which handles data description and data manipulation control.]
Slide 9
[Figure: Database technology (integration) combined with computer networks (distribution) yields distributed database systems (integration).]
Slide 10

A distributed computing system is a number of autonomous processing elements that are interconnected by a computer network and that cooperate in performing their assigned tasks.

The "processing element" refers to a computing device that can execute a program on its own.
Slide 11




What is being distributed?
Processing logic: the processing logic or processing elements are distributed.
Functions: various functions of a computer system could be delegated to various pieces of hardware or software.
Data: data used by a number of applications may be distributed to a number of processing sites.
Control: the control of the execution of various tasks might be distributed instead of being performed by one computer system.
Slide 12
"Distributed database system" (DDBS) is used to refer jointly to the distributed database and the distributed DBMS.
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (D-DBMS) is the software that
  manages the DDB,
  provides an access mechanism, and
  makes this distribution transparent to the users.
Slide 13



Physical distribution does not necessarily imply that the computer systems are geographically far apart; they may be in the same room.
The communication between them is done over a network, with the network as the only shared resource, instead of through shared memory or shared disk (as in multiprocessor systems).
Slide 14


What is not a DDBS:
A timesharing computer system.
A loosely or tightly coupled multiprocessor system. It is not a DDBS because, in a DDBS, communication between computer systems is done over a network, as the only shared resource, instead of through shared memory or shared disk.

A database system that resides at one of the nodes of a network of computers. This is a centralized database on a network node.
Slide 15


The CPU time is shared by different processes
Time slice is defined by the OS, for sharing CPU
time between processes.
Slide 16
[Figure: Processors P1 ... Pn sharing memory M and disk D. Not a DDBS.]
Slide 17

Each processor node has its own primary and secondary memory and may also have its own peripherals.
Such systems are quite similar to the distributed environment, but there are differences.
The fundamental difference is the mode of operation.
Database systems that run over multiprocessor systems are called parallel database systems.
[Figure: Processors P1 ... Pn, each with its own memory (M1 ... Mn) and disk (D1 ... Dn). Not a DDBS.]
Slide 18
[Figure: Sites 1 to 5 connected by a communication network, with a centralized database on one network node. Not a DDBS.]
Slide 19
[Figure: Sites 1 to 5 connected by a communication network.]
Slide 20
[Figure: Distributed database system in reality. User queries and user applications reach the DBMS software at each site, and the sites are linked by a communication subsystem.]
Slide 21


Data are stored at a number of sites; each site logically consists of a single processor.
Processors at different sites are interconnected by a computer network; they are not multiprocessors, which are the domain of parallel database systems.

A distributed database is a database, not a collection of files; the data are logically related, as exhibited in the users' access patterns (relational data model).

A D-DBMS is a full-fledged DBMS, not a remote file system.
Slide 22







Manufacturing, especially multi-plant manufacturing
Military command and control
Electronic fund transfers and electronic trading
Corporate MIS
Airline reservations
Hotel chains
Any organization which has a decentralized organization structure
Slide 23




Transparent management of distributed, fragmented, and replicated data
Improved reliability/availability through distributed transactions
Improved performance
Easier and more economical system expansion
Slide 24

Example: Four relations:
EMP(ENO, ENAME, TITLE)
PROJ(PNO,PNAME, BUDGET)
SAL(TITLE, AMT)
ASG(ENO, PNO, RESP, DUR).

For a centralized DBMS, find the names and salaries of employees who worked on a project for more than 12 months:
SELECT ENAME, AMT
FROM EMP, ASG, SAL
WHERE ASG.DUR > 12
AND EMP.ENO = ASG.ENO
AND SAL.TITLE = EMP.TITLE
Slide 25
EMP
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E7   R. Davis   Mech. Eng.
E8   J. Jones   Syst. Anal.

ASG
ENO  PNO  RESP        DUR
E1   P1   Manager     12
E2   P1   Analyst     24
E2   P2   Analyst     6
E3   P3   Consultant  10
E3   P4   Engineer    48
E4   P2   Programmer  18
E5   P2   Manager     24
E6   P4   Manager     48
E7   P3   Engineer    36
E7   P5   Engineer    23
E8   P3   Manager     40

PROJ
PNO  PNAME              BUDGET
P1   Instrumentation    150000
P2   Database Develop.  135000
P3   CAD/CAM            250000
P4   Maintenance        310000

SAL
TITLE        AMT
Elect. Eng.  40000
Syst. Anal.  34000
Mech. Eng.   27000
Programmer   24000
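The centralized query can be tried out with a small sketch. It loads the EMP, ASG, and SAL instances from this slide into an in-memory SQLite database and runs the SELECT statement unchanged. The use of Python and SQLite, the column types, and the function name run_query are assumptions made for illustration; the slides do not prescribe a particular DBMS.

```python
import sqlite3


def run_query():
    """Load EMP, ASG, and SAL into an in-memory SQLite database and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    # Column types are an assumption; the slides only give attribute names.
    conn.executescript("""
        CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
        CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);
        CREATE TABLE SAL (TITLE TEXT PRIMARY KEY, AMT INTEGER);
    """)
    conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
        ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
        ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
        ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
        ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
    ])
    conn.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", [
        ("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
        ("E2", "P2", "Analyst", 6), ("E3", "P3", "Consultant", 10),
        ("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
        ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48),
        ("E7", "P3", "Engineer", 36), ("E7", "P5", "Engineer", 23),
        ("E8", "P3", "Manager", 40),
    ])
    conn.executemany("INSERT INTO SAL VALUES (?, ?)", [
        ("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
        ("Mech. Eng.", 27000), ("Programmer", 24000),
    ])
    # The query text is the one given on the slide.
    rows = conn.execute("""
        SELECT ENAME, AMT
        FROM EMP, ASG, SAL
        WHERE ASG.DUR > 12
          AND EMP.ENO = ASG.ENO
          AND SAL.TITLE = EMP.TITLE
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for ename, amt in run_query():
        print(ename, amt)
```

An employee with several assignments longer than 12 months (R. Davis here) appears once per qualifying assignment, because the query has no DISTINCT.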
Slide 26

Data are localized so that data about the employees in the Waterloo office are stored in Waterloo, those in the Boston office are stored in Boston, and so forth.
The same applies to the project and salary information.
That is, the data are distributed.

We partition each of the relations and store each partition at a different site. This is known as fragmentation.
Data that are commonly accessed by one user can be placed on that user's local machine as well as on the machine of another user with the same access requirements.
That is, the data are replicated.
Slide 27
Fully transparent access means that users can still pose the query without paying any attention to the fragmentation, location, or replication of data, and let the system worry about resolving these issues.
SELECT ENAME, AMT
FROM EMP, ASG, SAL
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND SAL.TITLE = EMP.TITLE
[Figure: Sites in Tokyo, Paris, Boston, Montreal, and New York connected by a communication network. The sites hold fragments and replicas of the employee, project, and assignment data (for example, Paris employees, Boston projects, and New York projects with budget > 200000).]
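A rough sketch of what "letting the system worry" could look like for one relation: EMP tuples are spread over sites, one tuple is replicated, and the user-facing function never mentions a site. The site assignment of each employee and the names SITES, global_emp, and names_with_title are hypothetical, chosen only for illustration.

```python
# Hypothetical placement: site name -> EMP tuples stored at that site.
# Which employee lives at which site is invented for this sketch.
SITES = {
    "Paris": [("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal.")],
    "Boston": [("E3", "A. Lee", "Mech. Eng."), ("E1", "J. Doe", "Elect. Eng.")],  # E1 is a replica
    "Montreal": [("E4", "J. Miller", "Programmer")],
}


def global_emp(sites):
    """Reassemble the global EMP relation from all fragments, collapsing replicas."""
    by_eno = {}
    for fragment in sites.values():
        for row in fragment:
            by_eno[row[0]] = row  # ENO is the key, so replicated tuples collapse
    return sorted(by_eno.values())


def names_with_title(title, sites=SITES):
    """User-level query: no site, fragment, or replica appears in the call."""
    return [ename for _, ename, t in global_emp(sites) if t == title]


if __name__ == "__main__":
    print(names_with_title("Elect. Eng."))
```

A real distributed DBMS would avoid shipping every fragment to one place; the sketch only shows that the caller's view stays the same however the data are placed.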
Slide 28


A transparent system "hides" the implementation details from users.
The fundamental issue is to provide data independence in the distributed environment:
Network (distribution) transparency
Replication transparency
Fragmentation transparency
  horizontal fragmentation: selection
  vertical fragmentation: projection
  hybrid
Slide 29


Data independence refers to the immunity of user applications to changes in the definition and organization of data.
Logical data independence
  Refers to the immunity of user applications to changes in the logical structure (i.e., schema) of the database.

Physical data independence
  Deals with hiding the details of the storage structure from user applications.
Slide 30


In centralized database systems, the only available resource that needs to be shielded from the user is the data.
In a distributed database environment, there is a second resource that needs to be managed in much the same manner: the network.

The user should be protected from the operational details of the network, possibly even hiding the existence of the network.
Then there would be no difference between database applications that run on a centralized database and those that run on a distributed database.
This type of transparency is referred to as network transparency or distribution transparency.
Slide 31


From a DBMS perspective, distribution transparency
requires that users do not have to specify where data
are located.
Sometimes two types of distribution transparency are identified:
  location transparency
  naming transparency
Slide 32

Location transparency refers to the fact that the command used to perform a task is independent of both the location of the data and the system on which an operation is carried out.

Naming transparency means that a unique name is provided for each object in the database.
In the absence of naming transparency, users are required to embed the location name as part of the object name.
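The difference between the two cases can be sketched with a small name resolver. The "OBJECT@site" syntax, the CATALOG contents, and the function name resolve are assumptions for illustration only; actual systems use their own naming schemes and directories.

```python
# Hypothetical global catalog mapping each object name to the site that stores it.
CATALOG = {"EMP": "boston", "PROJ": "paris"}


def resolve(name, catalog=CATALOG):
    """Return (site, object) for a name used in a query.

    'EMP@boston' is what a user must write when there is no naming
    transparency: the location is embedded in the object name.
    Plain 'EMP' is resolved through the catalog, so the user never
    mentions a site.
    """
    if "@" in name:
        obj, site = name.split("@", 1)
        return site, obj
    return catalog[name], name


if __name__ == "__main__":
    print(resolve("EMP@boston"))  # location supplied by the user
    print(resolve("EMP"))         # location supplied by the system
```

If EMP later moves to another site, only the catalog entry changes; queries that use the plain name keep working, while queries with an embedded location have to be rewritten.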
Slide 33
Data are distributed in a replicated fashion across the machines on a network.
  If one of the machines fails, a copy of the data is still available on another machine on the network.

Replication increases the reliability and availability of data.
It also increases the locality of reference.
Slide 34

When data are replicated, the transparency issue is:
Users should not be aware of the existence of copies, and the system should handle the management of copies.
Users should not be involved with handling copies or have to specify that a certain action can and/or should be taken on multiple copies.
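One way a system could manage copies on the user's behalf is sketched below. It assumes a read-one/write-all policy, which is a well-known replica control policy chosen here purely for illustration; the slides do not name a protocol. The class ReplicatedItem and its methods are hypothetical.

```python
class ReplicatedItem:
    """A data item with one copy per site, managed with read-one/write-all (assumed policy)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.up = set(sites)  # sites currently reachable

    def fail(self, site):
        """Mark a site as failed or unreachable."""
        self.up.discard(site)

    def read(self):
        """Read from any reachable copy; the caller never picks a site."""
        for site, value in self.copies.items():
            if site in self.up:
                return value
        raise RuntimeError("no replica reachable")

    def write(self, value):
        """Update every copy; refuse if a copy cannot be reached."""
        if self.up != set(self.copies):
            raise RuntimeError("write-all requires every replica to be reachable")
        for site in self.copies:
            self.copies[site] = value


if __name__ == "__main__":
    budget = ReplicatedItem(["Boston", "Paris"], 150000)
    budget.write(175000)   # one logical write, applied to both copies
    budget.fail("Boston")
    print(budget.read())   # still answered from the Paris copy
```

The user issues one read or one write; the copies stay hidden. The sketch also shows the cost of this policy: once a site is down, writes are refused until it recovers.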
Slide 35


Relations are fragmented to increase performance, availability, and reliability.
Fragmentation can also reduce the negative effects of replication.
Each replica is not the full relation but only a subset of it; thus less space is required and fewer data items need to be managed.
Slide 36


Horizontal fragmentation: a relation is partitioned into a set of sub-relations, each of which has a subset of the tuples (rows) of the original relation.
Vertical fragmentation: each sub-relation is defined on a subset of the attributes (columns) of the original relation.
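Both kinds of fragmentation can be sketched on the EMP relation from the earlier example. Horizontal fragments come from a selection and are recombined by union; vertical fragments come from a projection that keeps the key ENO and are recombined by a join on that key. The predicate (titles ending in "Eng.") and the function names are assumptions picked for illustration.

```python
EMP = [
    ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
]


def horizontal(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def vertical(relation, columns):
    """Projection: keep only the listed column positions."""
    return [tuple(row[i] for i in columns) for row in relation]


# Horizontal fragmentation with a hypothetical predicate on TITLE.
engineers = horizontal(EMP, lambda r: r[2].endswith("Eng."))
others = horizontal(EMP, lambda r: not r[2].endswith("Eng."))

# Vertical fragmentation: both fragments keep the key ENO (column 0).
names = vertical(EMP, (0, 1))   # (ENO, ENAME)
titles = vertical(EMP, (0, 2))  # (ENO, TITLE)

# Reconstruction: union for horizontal fragments, join on ENO for vertical ones.
rebuilt_horizontal = sorted(engineers + others)
title_by_eno = dict(titles)
rebuilt_vertical = [(eno, ename, title_by_eno[eno]) for eno, ename in names]

if __name__ == "__main__":
    print(len(engineers), len(others))
    print(rebuilt_horizontal == sorted(EMP), rebuilt_vertical == EMP)
```

Keeping the key in every vertical fragment is what makes the join-based reconstruction possible.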
Slide 37

Distributed DBMSs are intended to improve reliability since they have replicated components and thereby eliminate single points of failure.
The failure of a single site, or the failure of a communication link which makes one or more sites unreachable, is not sufficient to bring down the entire system.
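A back-of-the-envelope sketch of why replicated components help: if each site holding a copy is reachable with probability p, and failures are assumed to be independent (a simplifying assumption that the slides do not make and that correlated outages can violate), the data are unavailable only when every copy is unreachable at once. The function name availability is hypothetical.

```python
def availability(p_up, n_copies):
    """Probability that at least one of n_copies replicas is reachable.

    Assumes each replica is reachable with probability p_up and that
    failures are independent of one another.
    """
    return 1 - (1 - p_up) ** n_copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, availability(0.9, n))
```

Under these assumptions, each additional copy multiplies the chance of total unavailability by (1 - p).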
Slide 38



Data are stored in close proximity to their points of use (also called data localization).
This requires some support for fragmentation and replication.
Data localization has two potential advantages:
  Since each site handles only a portion of the database, contention for CPU and I/O services is not as severe as for centralized databases.
  Localization reduces remote access delays that are usually involved in wide area networks.
Slide 39


The issue is database scaling.
One aspect of easier system expansion is economics.
It normally costs much less to put together a system of "smaller" computers with the equivalent power of a single big machine.
Slide 40

First, data may be replicated in a distributed environment.
  A distributed database can be designed so that the entire database, or portions of it, reside at different sites of a computer network.

Second, if some sites fail (e.g., by either hardware or software malfunction), or if some communication links fail (making some of the sites unreachable) while an update is being executed, the effects will not be reflected on the data residing at the failing or unreachable sites.

Third, since each site cannot have instantaneous information on the actions currently being carried out at the other sites, the synchronization of transactions on multiple sites is considerably harder than for a centralized system.
Slide 41
The possible ways in which a distributed DBMS may be architected are classified with respect to:
(1) the autonomy of local systems,
(2) their distribution, and
(3) their heterogeneity.
Slide 42


Autonomy
Autonomy refers to the distribution (or decentralization) of control, not of data.
It indicates the degree to which individual DBMSs can operate independently.

Autonomy is a function of a number of factors, such as:
  whether the component systems (i.e., individual DBMSs) exchange information,
  whether they can independently execute transactions, and
  whether one is allowed to modify them.
Slide 43

Dimensions of Autonomy
Design autonomy
  Individual DBMSs are free to use the data models and transaction management techniques that they prefer.
Communication autonomy
  Each of the individual DBMSs is free to make its own decision as to what type of information it wants to provide to the other DBMSs or to the software that controls their global execution.
Execution autonomy
  Each DBMS can execute the transactions that are submitted to it in any way that it wants to.
Slide 44



Distribution
The distribution dimension of the taxonomy deals with data.
It considers the physical distribution of data over multiple sites; the user sees the data as one logical pool.

There are a number of ways DBMSs have been distributed. They fall into two classes:
  client/server distribution
  peer-to-peer distribution (or full distribution)
Slide 45


Client/server distribution
The client/server distribution concentrates data management duties at servers, while the clients focus on providing the application environment, including the user interface.
The communication duties are shared between the client machines and servers.
Slide 46



Peer-to-peer distribution (or full distribution)
In peer-to-peer systems, there is no distinction between client machines and servers.
Each machine has full DBMS functionality and can communicate with other machines to execute queries and transactions.
Slide 47




Heterogeneity
Heterogeneity ranges from hardware heterogeneity and differences in networking protocols to variations in data managers.
Heterogeneity in query languages not only involves the use of completely different data access paradigms in different data models, but also covers differences in languages even when the individual systems use the same data model.
Slide 48
Slide 49
What is the basic difference between database systems and distributed database systems?
What is being distributed?
Define a loosely or tightly coupled multiprocessor system.
Draw the figure "Distributed Database System: Reality".
What do you mean by replicated data?
What are the promises of a distributed DBMS?

Slide 50