Distributed Database Systems
Architectures and Key Problems

Dr. Denise Ecklund
2 October 2002
©2002 Denise Ecklund, Vera Goebel
See the required reading (Pensum) and recommended reading (Anbefalt) on the class webpage.

Contents
• Review: Layered DBMS Architecture
• Distributed DBMS Architectures
  – DDBMS Taxonomy
  – Client/Server Models
• Key Problems of Distributed DBMS
  – Distributed data modeling
  – Distributed query processing & optimization
  – Distributed transaction management
  – Failure recovery
Functional Layers of a DBMS
(Layered architecture, top to bottom; a Transaction Management component is shown alongside the layers.)
• Applications
• Interface layer: User Interfaces / View Management
• Control layer: Semantic Integrity Control / Authorization
• Compilation layer: Query Processing and Optimization
• Execution layer: Storage Structures
• Data Access layer: Buffer Management
• Consistency layer: Concurrency Control / Logging
• Database

Dependencies among DBMS components
(Diagram; an arrow indicates a dependency. Components shown: Transaction Mgmt, Sorting component, Lock component, Log component (with savepoint mgmt), Access Path Mgmt, System Buffer Mgmt, and a group labeled Central Components, all above the Database.)
Centralized DBS vs. Distributed DBS

Centralized DBS:
• logically integrated
• physically centralized
Traditionally: one large mainframe DBMS + n "stupid" terminals
(Diagram: terminals T1–T5 access one DBS over a network.)

Distributed DBS:
• Data logically integrated (i.e., access based on one schema)
• Data physically distributed among multiple database nodes
• Processing is distributed among multiple database nodes
Traditionally: m mainframes for the DBMSs + n terminals
(Diagram: terminals T1–T3 access DBS1, DBS2, and DBS3 over a network.)

Why a Distributed DBS?
• Performance via parallel execution
  – more users
  – quick response
• More data volume
  – max disks on multiple nodes

DBMS Implementation Alternatives
(Taxonomy along three axes: Distribution (Centralized, Client/Server, Distributed), Autonomy (No Autonomy, Federated, Multi), and Heterogeneity (Homogeneous, Heterogeneous).)
• Centralized Homogeneous DBMS / Distributed Homogeneous DBMS
• Centralized Heterogeneous DBMS / Distributed Heterogeneous DBMS
• Centralized Homog. Federated DBMS / Distributed Homog. Federated DBMS
• Centralized Heterog. Federated DBMS / Distributed Heterog. Federated DBMS
• Centralized Homog. Multi-DBMS / Distributed Multi-DBMS
• Centralized Heterog. Multi-DBMS / Distributed Heterog. Multi-DBMS

Common DBMS Architectural Configurations

No Autonomy (fully integrated nodes N1–N4):
• Complete cooperation on:
  – Data Schema
  – Transaction Mgmt
• Fully aware of all other nodes in the DDB

Federated (independent DBMSs D1–D4):
• Implement some "cooperation functions":
  – Transaction Mgmt
  – Schema mapping
• Aware of other DBs in the federation

Multi-DBMS (fully independent DBMSs D1–D4 plus a GTM & SM):
• Independent, global manager tries to coordinate the DBMSs using only existing DBMS services
• Unaware of the GTM and of other DBs in the MultiDB
Parallel Database Platforms
(A form of non-autonomous distributed DBMS)

Shared Everything:
(Diagram: processors Proc1 ... ProcN and memories Mem1 ... MemM share Disk 1 ... Disk p over a fast interconnection network.)
Typically a special-purpose hardware configuration, but this can be emulated in software.

Shared Nothing:
(Diagram: processors Proc1 ... ProcN, each with its own memory Mem1 ... MemN and Disk 1 ... Disk N, connected by a fast interconnection network.)
Can be a special-purpose hardware configuration, or a network of workstations.

Distributed DBMS
• Advantages:
  – Improved performance
  – Efficiency
  – Extensibility (addition of new nodes)
  – Transparency of distribution
    • Storage of data
    • Query execution
  – Autonomy of individual nodes
• Problems:
  – Complexity of design and implementation
  – Data consistency
  – Safety
  – Failure recovery

Client/Server Database Systems
The "Simple" Case of Distributed Database Systems?

Client/Server Environments
data (object) server + n smart clients (workstations)
• objects stored and administered on the server
• objects processed (accessed and modified) on workstations [sometimes on the server too]
• CPU-time intensive applications on workstations
  – GUIs, design tools
• use client system local storage capabilities
• combination with distributed DBS services => distributed server architecture
Clients with Centralized Server Architecture
(Diagram: workstations on a local communication network access one Data Server, which provides an interface and database functions over Disk 1 ... Disk n holding the database.)

Clients with Distributed Server Architecture
(Diagram: workstations on a local communication network access Data Server 1 ... Data Server m; each server runs an interface, the distributed DBMS, and local data mgmt. functions over its own disks holding its part of the database.)

Clients with Data Server Approach
"3-tier client/server"
(Diagram: the client runs the user interface; an Application Server performs query parsing and provides the data server interface; a communication channel connects it to the Data Server, which provides the application server interface and database functions over Disk 1 ... Disk n holding the database.)

In the Client/Server DBMS Architecture, how are the database services organized?
There are several architectural options!
Client/Server Architectures

Relational:
(Diagram: the client process runs the Application and ships SQL to the server process; the server-side DBMS provides cursor management.)

Object-Oriented:
(Diagram: the client process runs the Application and an Object Manager with an Object Cache; the server process performs object/page cache management; objects, pages, or files are shipped between client and server.)

Object Server Architecture
(Diagram: client process = Application, Object Manager, Object Cache; server process = Object Manager, Object Cache, File/Index Manager, Page Cache Manager, Page Cache, Storage Allocation and I/O, Log/Lock Manager; exchanged between client and server: objects, object references, queries, method calls, locks, and log records; the Database sits below the server.)

Object Server Architecture - summary
• Unit of transfer: object(s)
• Server understands the object concept and can execute methods
• Most DBMS functionality is replicated on client(s) and server(s)
• Advantages:
  - server and client can run methods, workload can be balanced
  - simplifies concurrency control design (centralized in the server)
  - implementation of object-level locking
  - low cost for enforcing "constraints" on objects
• Disadvantages/problems:
  - remote procedure calls (RPC) for object references
  - complex server design
  - client cache consistency problems
  - page-level locking, objects get copied multiple times, large objects

Page Server Architecture
(Diagram: client process = Application, Object Manager, Object Cache, File/Index Manager, Page Cache Manager, Page Cache; server process = Log/Lock Manager, Page Cache Manager, Page Cache, Storage Allocation and I/O; exchanged between client and server: pages, page references, locks, and log records; the database sits below the server.)
Page Server Architecture - summary
• Unit of transfer: page(s)
• Server deals only with pages (understands no object semantics)
• Server functionality: storage/retrieval of page(s), concurrency control, recovery
• Advantages:
  - most DBMS functionality at the client
  - page transfer -> server overhead minimized
  - more clients can be supported
  - object clustering improves performance considerably
• Disadvantages/problems:
  - method execution only on clients
  - object-level locking
  - object clustering
  - large client buffer pool required

File Server Architecture
(Diagram: client process = Application, Object Manager, Object Cache, File/Index Manager, Page Cache Manager, Page Cache; the client reads and writes pages and page references through NFS; locks and log records go to the server process, which runs the Log/Lock Manager and Space Allocation over the database.)

File Server Architecture - summary
• Unit of transfer: page(s)
• Simplification of the page server: clients use a remote file system (e.g., NFS) to read and write DB page(s) directly
• Server functionality: I/O handling, concurrency control, recovery
• Advantages:
  - same as for the page server
  - NFS: user-level context switches can be avoided
  - NFS is widely used -> stable SW, will be improved
• Disadvantages/problems:
  - same as for the page server
  - NFS writes are slow
  - read operations bypass the server -> no request combination
  - object clustering
  - coordination of disk space allocation

Cache Consistency in Client/Server Architectures

Avoidance-based algorithms:
• Synchronous: Client sends one message per lock to the server; client waits; server replies with ACK or NACK.
• Asynchronous: Client sends one message per lock to the server; client continues; server invalidates cached copies at other clients.
• Deferred: Client sends all write lock requests to the server at commit time; client waits; server replies when all cached copies are freed.

Detection-based algorithms:
• Synchronous: Client sends an object status query to the server for each access; client waits; server replies.
• Asynchronous: Client sends one message per lock to the server; client continues; after commit, the server sends updates to all cached copies.
• Deferred: Client sends all write lock requests to the server at commit time; client waits; server replies based on W-W conflicts only.

(The original slide highlights the best performing algorithms among these.)
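As a concrete illustration, here is a minimal sketch (Python; the LockServer and Client classes are made up for this example, not any real DBMS API) of the avoidance-based synchronous scheme above: one lock message per access, with the client blocking until the server answers ACK or NACK.

```python
# Minimal sketch (hypothetical classes) of the avoidance-based *synchronous*
# scheme: the client sends one lock message per access and blocks until the
# server answers ACK or NACK.

class LockServer:
    """Tracks which client holds a write lock on each page."""
    def __init__(self):
        self.write_locks = {}          # page_id -> client_id

    def request_lock(self, client_id, page_id):
        holder = self.write_locks.get(page_id)
        if holder is None or holder == client_id:
            self.write_locks[page_id] = client_id
            return "ACK"               # lock granted, cached copy stays valid
        return "NACK"                  # conflict: client must not use its copy

class Client:
    def __init__(self, client_id, server):
        self.client_id, self.server = client_id, server
        self.page_cache = {}           # page_id -> cached page contents

    def write_page(self, page_id, data):
        # Synchronous: one message per lock, client waits for the reply.
        if self.server.request_lock(self.client_id, page_id) == "ACK":
            self.page_cache[page_id] = data
            return True
        return False                   # caller must retry or abort

server = LockServer()
c1, c2 = Client("c1", server), Client("c2", server)
print(c1.write_page("p7", "v1"))       # True  -> lock granted
print(c2.write_page("p7", "v2"))       # False -> NACK, c1 holds the lock
```

The asynchronous and deferred variants in the table differ only in when the client blocks and in when the server invalidates or frees the cached copies held by other clients.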
Comparison of the 3 Client/Server Architectures

Page & File Server:
• Simple server design
• Complex client design
• Fine-grained concurrency control difficult
• Very sensitive to client buffer pool size and clustering

Object Server:
• Complex server design
• "Relatively" simple client design
• Fine-grained concurrency control
• Reduces data movement, relatively insensitive to clustering
• Sensitive to client buffer pool size

Conclusions:
• No clear winner
• Depends on object size and the application's object access pattern
• File server ruled out by poor NFS performance

Problems in Distributed DBMS Services
• Distributed Database Design
• Distributed Directory/Catalogue Mgmt
• Distributed Query Processing and Optimization
• Distributed Transaction Mgmt
  – Distributed Concurrency Control
  – Distributed Deadlock Mgmt
  – Distributed Recovery Mgmt
(Diagram: distributed DB design, directory management, query processing, concurrency control (lock), deadlock management, reliability (log), and transaction management all influence one another.)

Distributed Storage in Relational DBMSs
• horizontal fragmentation: distribution of "rows", selection
• vertical fragmentation: distribution of "columns", projection
• hybrid fragmentation: "projected columns" from "selected rows"
• allocation: which fragment is assigned to which node?
• replication: multiple copies at different nodes, how many copies?
• Design factors:
  – Most frequent query access patterns
  – Available distributed query processing algorithms
• Evaluation criteria:
  – Cost metrics for: network traffic, query processing, transaction mgmt
  – A system-wide goal: maximize throughput or minimize latency

Distributed Storage in OODBMSs
• Must fragment, allocate, and replicate object data among nodes
• Complicating factors:
  – Encapsulation hides object structure
  – Object methods (bound to the class, not the instance)
  – Users dynamically create new classes
  – Complex objects
  – Effects on garbage collection algorithm and performance
• Approaches:
  – Store objects in relations and use RDBMS strategies to distribute the data
    • All objects are stored in binary relations [OID, attr-value]
    • One relation per class attribute
    • Use nested relations to store complex (multi-class) objects
  – Use object semantics
• Evaluation criteria:
  – Affinity metric (maximize)
  – Cost metric (minimize)
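To make the relational fragmentation types above concrete, here is a minimal sketch (Python; the employees relation, its attributes, and the helper names are invented for illustration) of horizontal, vertical, and hybrid fragmentation. The resulting fragments are what would then be allocated to nodes and possibly replicated.

```python
# Minimal sketch of horizontal, vertical, and hybrid fragmentation of a relation,
# using plain Python dictionaries (relation and attribute names are made up).

employees = [
    {"eid": 1, "name": "Ana",  "dept": "sales", "salary": 50_000},
    {"eid": 2, "name": "Bo",   "dept": "dev",   "salary": 80_000},
    {"eid": 3, "name": "Cleo", "dept": "dev",   "salary": 65_000},
]

def horizontal(rows, predicate):
    """Selection: each fragment holds the rows satisfying a predicate."""
    return [r for r in rows if predicate(r)]

def vertical(rows, attrs, key="eid"):
    """Projection: each fragment holds a subset of columns plus the key,
    so the original rows can be rebuilt by joining on the key."""
    return [{k: r[k] for k in (key, *attrs)} for r in rows]

def hybrid(rows, predicate, attrs, key="eid"):
    """'Projected columns' from 'selected rows'."""
    return vertical(horizontal(rows, predicate), attrs, key)

# Fragments that could then be *allocated* to different nodes (and replicated):
frag_node1 = horizontal(employees, lambda r: r["dept"] == "dev")
frag_node2 = horizontal(employees, lambda r: r["dept"] != "dev")
frag_pay   = vertical(employees, ["salary"])
frag_hyb   = hybrid(employees, lambda r: r["salary"] >= 60_000, ["name"])
print(frag_node1, frag_node2, frag_pay, frag_hyb, sep="\n")
```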
Horizontal Partitioning using Object Semantics
• Divide the instances of a class into multiple groups (illustrated in the code sketch at the end of this fragmentation discussion)
  – Based on selected attribute values, e.g. instances of Class X:
    Node1: Attr A < 100; Node2: Attr A >= 100
  – Based on subclass designation, e.g. instances of Class C:
    Node1: Subcl X; Node2: Subcl Y

Vertical Partitioning using Object Semantics
• Fragment object attributes based on the class hierarchy
  (e.g., instances of Class X -> Fragment #1, instances of Subclass subX -> Fragment #2, instances of Subclass subsubX -> Fragment #3)
• Fragment an object's complex attributes from its simple attributes
• Fragment an object's attributes based on the typical method invocation sequence
  – Keep sequentially referenced attributes together
• Breaks object encapsulation?

Path Partitioning using Object Semantics
• Use the object composition graph for complex objects
• A path partition is the set of objects corresponding to instance variables in the subtree rooted at the composite object
  – Typically represented in a "structure index":
    Composite Object OID -> {SubC#1-OID, SubC#2-OID, ... SubC#n-OID}

More Issues in OODBMS Fragmentation
• Replication options:
  – Objects
  – Classes (collections of objects)
  – Methods
  – Class/Type specifications
• Where to execute a method, depending on where the object data and the method reside:
  – Local object data, local methods: good performance; issue: replicate methods so they are local to all instances?
  – Remote object data, local methods: fetch the remote data and execute the methods; issue: time/cost for the data transfer?
  – Local object data, remote methods: send the data to the remote site, execute, and return the result, OR fetch the method and execute it locally; issues: time/cost for two transfers? ability to execute locally?
  – Remote object data, remote methods: send additional input values via RPC, execute, and return the result; issues: RPC time? execution load on the remote node?
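A minimal sketch (Python; the class names, the attribute, and the threshold 100 follow the illustrative values above and are not from any real schema) of the two horizontal partitioning rules, by attribute value and by subclass designation:

```python
# Minimal sketch of horizontal partitioning of class instances
# by attribute value and by subclass designation.

class X:                    # instances split by an attribute value
    def __init__(self, a):
        self.attr_a = a

class C: pass               # instances split by subclass designation
class SubX(C): pass
class SubY(C): pass

def partition_by_value(instances):
    node1 = [o for o in instances if o.attr_a < 100]    # Node1: Attr A < 100
    node2 = [o for o in instances if o.attr_a >= 100]   # Node2: Attr A >= 100
    return node1, node2

def partition_by_subclass(instances):
    node1 = [o for o in instances if isinstance(o, SubX)]   # Node1: Subcl X
    node2 = [o for o in instances if isinstance(o, SubY)]   # Node2: Subcl Y
    return node1, node2

xs = [X(42), X(100), X(7)]
cs = [SubX(), SubY(), SubX()]
print([len(p) for p in partition_by_value(xs)])      # [2, 1]
print([len(p) for p in partition_by_subclass(cs)])   # [2, 1]
```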
Distributed Directory and Catalogue Management
• Directory information:
  – Description and location of records/objects
    • Size, special data properties (e.g., executable, DB type, user-defined type, etc.)
    • Fragmentation scheme
  – Definitions for views, integrity constraints
• Options for organizing the directory:
  – Centralized (issues: bottleneck, unreliable)
  – Fully replicated (issues: consistency, storage overhead)
  – Partitioned (issues: complicated access protocol, consistency)
  – Combination of partitioned and replicated (e.g., zoned, replicated zoned)

Distributed Query Processing and Optimization
• Construction and execution of query plans, query optimization
• Goals: maximize parallelism (response time optimization);
  minimize network data transfer (throughput optimization)
• Basic approaches to distributed query processing (the parallelism approach is sketched in code at the end of this query-processing discussion):
  – Pipelining: functional decomposition
    (Diagram: Node 1 holds Rel A and applies Select A.x > 100, feeding its results to Node 2, which holds Rel B and joins A and B on y; the processing timelines of Node 1 and Node 2 overlap.)
  – Parallelism: data decomposition
    (Diagram: Node 1 applies Select A.x > 100 to fragment A.1 and Node 2 applies it to fragment A.2 in parallel; Node 3 forms the Union; processing timelines for Nodes 1–3.)

Creating the Distributed Query Processing Plan
• Factors to be considered:
  - distribution of data
  - communication costs
  - lack of sufficient locally available information
• 4 processing steps:
  (1) query decomposition
  (2) data localization
  (3) global optimization
  (4) local optimization

Generic Layered Scheme for Planning a Dist. Query
(Diagram, top to bottom:)
  calculus query on distributed objects
  -> query decomposition (control site, using the Global schema)
  -> algebraic query on distributed objects
  -> data localization (control site, using the Fragment schema)
  -> fragment query
  -> global optimization (control site, using global information: statistics on fragments)
  -> optimized fragment query with communication operations
  -> local optimization (local sites, using local information: the Local schema)
  -> optimized local queries
This approach was initially designed for distributed relational DBMSs. It also applies to distributed OODBMSs, Multi-DBMSs, and distributed Multi-DBMSs.
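A minimal sketch (Python; the fragment contents are invented and a thread pool merely stands in for two database nodes) of the data-decomposition approach above: the same selection runs on two fragments of relation A, and a final step unions the partial results.

```python
# Minimal sketch of parallelism by data decomposition: Select A.x > 100 runs
# on fragments A.1 and A.2 "at two nodes", and the partial results are unioned.

from concurrent.futures import ThreadPoolExecutor

frag_a1 = [{"x": 50, "y": 1}, {"x": 150, "y": 2}]   # fragment A.1 on Node 1
frag_a2 = [{"x": 120, "y": 3}, {"x": 90,  "y": 4}]  # fragment A.2 on Node 2

def select_x_gt_100(fragment):
    """Executed locally at the node holding the fragment."""
    return [row for row in fragment if row["x"] > 100]

with ThreadPoolExecutor() as pool:                   # stand-in for two nodes
    partials = list(pool.map(select_x_gt_100, [frag_a1, frag_a2]))

union = [row for part in partials for row in part]   # Node 3: Union
print(union)   # [{'x': 150, 'y': 2}, {'x': 120, 'y': 3}]
```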
Distributed Query Processing Plans in OODBMSs
• Two forms of queries:
  – Explicit queries written in OQL
  – Object navigation/traversal
  Both reduce to logical algebraic expressions.
• Planning and optimizing:
  – Additional rewriting rules for sets of objects: Union, Intersection, and Selection
  – Cost function
    • Should consider object size, structure, location, indexes, etc.
    • Breaks object encapsulation to obtain this information?
    • Objects can "reflect" their access cost estimate
  – Volatile object databases can invalidate optimizations
    • Dynamic plan selection (compile N plans; select 1 at runtime)
    • Periodic replan during long-running queries

Distributed Query Optimization
• Optimization goals: response time of a single transaction, or system throughput
• Information needed for optimization (fragment statistics):
  - size of objects, image sizes of attributes
  - transfer costs
  - workload among nodes
  - physical data layout
  - access paths, indexes, clustering information
  - properties of the result (objects): formulas for estimating the cardinalities of operation results
• Execution cost is expressed as a weighted combination of I/O, CPU, and communication costs (communication costs mostly dominant):
  total-cost = C_CPU * #insts + C_I/O * #I/Os + C_MSG * #msgs + C_TR * #bytes
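A minimal sketch (Python; the coefficient values and the two candidate plans are arbitrary placeholders, not measurements from any real system) of the weighted cost formula above, used to compare two hypothetical fragment plans:

```python
# Minimal sketch of the weighted cost formula; coefficients are placeholders.

def total_cost(n_insts, n_ios, n_msgs, n_bytes,
               c_cpu=1e-8, c_io=1e-3, c_msg=1e-1, c_tr=1e-6):
    """total-cost = C_CPU*#insts + C_I/O*#I/Os + C_MSG*#msgs + C_TR*#bytes"""
    return c_cpu * n_insts + c_io * n_ios + c_msg * n_msgs + c_tr * n_bytes

# Comparing two candidate fragment plans: shipping fewer, larger messages
# trades message cost against bytes transferred.
plan_a = total_cost(n_insts=5_000_000, n_ios=200, n_msgs=40, n_bytes=2_000_000)
plan_b = total_cost(n_insts=5_000_000, n_ios=200, n_msgs=4,  n_bytes=6_000_000)
print(f"plan A: {plan_a:.3f}  plan B: {plan_b:.3f}")   # pick the cheaper plan
```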
Distributed Transaction Management
• Transaction Management (TM) in a centralized DBS
  – Achieves the transaction ACID properties by using:
    • concurrency control (CC)
    • recovery (logging)
• TM in a DDBS
  – Achieves the transaction ACID properties by using:
    (Diagram: the Transaction Manager relies on
     – Concurrency Control (Isolation)
     – Recovery Protocol (Durability), supported by the Log Manager and Buffer Manager
     – Commit/Abort Protocol (Atomicity)
     – Replica Control Protocol (Mutual Consistency)
     with strong algorithmic dependencies among these components.)

Classification of Concurrency Control Approaches
CC approaches:
• Pessimistic
  – Locking: Centralized, Primary Copy, Distributed
  – Timestamp Ordering: Basic, MultiVersion, Conservative
  – Hybrid
• Optimistic
  – Locking
  – Timestamp Ordering
Phases of pessimistic transaction execution: validate -> read -> compute -> write
Phases of optimistic transaction execution: read -> compute -> validate -> write
Two-Phase-Locking Approach (2PL) (Isolation)
(Diagram, 2PL lock graph: the number of locks held grows during the obtain-lock phase up to the LOCK POINT and then shrinks during the release-lock phase, between BEGIN and END of the transaction; the period of data item use lies in between.
Strict 2PL lock graph: locks are obtained during the transaction and all released together at END.)

Communication Structure of Centralized 2PL (Isolation)
(Diagram: message exchange among the Coordinating TM, the Central Site LM, and the Data Processors at the participating sites:
  1 Lock Request, 2 Lock Granted, 3 Operation, 4 End of Operation, 5 Release Locks.)
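A minimal sketch (Python; the class is illustrative, not a real lock manager) contrasting the two lock graphs above: in basic 2PL no lock may be acquired after the first release (the lock point), while in strict 2PL all locks are held until END.

```python
# Minimal sketch of the 2PL rule (no lock after the first unlock) and of
# strict 2PL (all locks released together at commit).

class TwoPhaseLockingTxn:
    def __init__(self, strict=True):
        self.locks, self.growing, self.strict = set(), True, strict

    def lock(self, item):
        if not self.growing:
            raise RuntimeError("2PL violated: lock requested after a release")
        self.locks.add(item)

    def unlock(self, item):
        if self.strict:
            raise RuntimeError("strict 2PL: locks are released only at commit")
        self.growing = False           # lock point passed, shrinking phase
        self.locks.discard(item)

    def commit(self):
        released, self.locks = self.locks, set()
        return released                # remaining locks released together at END

t = TwoPhaseLockingTxn(strict=True)
t.lock("a"); t.lock("b")
print(t.commit())                      # both locks released together at END
```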
Communication Structure of Distributed 2PL (Isolation)
(Diagram: message exchange among the Coordinating TM, the Participating LMs, and the Data Processors at the participating sites:
  1 Lock Request, 2 Operation, 3 End of Operation, 4 Release Locks.)

Distributed Deadlock Management (Isolation)
Example:
• Site X: T1_x holds lock Lc; T2_x holds lock Lb and waits for T1_x to release Lc; T1_x needs item a and waits for T1 on site Y to complete.
• Site Y: T1_y holds lock La; T2_y holds lock Ld; T1_y waits for T2_y to release Ld; T2_y needs item b and waits for T2 on site X to complete.
The distributed waits-for graph therefore contains a cycle that no single site can see locally (see the sketch below).
Notes on maintaining the distributed waits-for graph:
- requires many messages to update lock status
- combine the "wait-for" tracing message with lock status update messages
- still an expensive algorithm
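A minimal sketch (Python; the transaction names follow the example above) of detecting that deadlock by merging the per-site waits-for edges into a global graph and searching it for a cycle:

```python
# Minimal sketch of deadlock detection on a global waits-for graph, built by
# combining the per-site edges from the example above.

site_x_edges = {"T2x": {"T1x"},   # T2x waits for T1x to release Lc
                "T1x": {"T1y"}}   # T1x waits for T1 on site Y to complete
site_y_edges = {"T1y": {"T2y"},   # T1y waits for T2y to release Ld
                "T2y": {"T2x"}}   # T2y waits for T2 on site X to complete

def find_cycle(edges):
    """Depth-first search for a cycle in a waits-for graph."""
    def dfs(node, path):
        if node in path:
            return path[path.index(node):] + [node]
        for nxt in edges.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None
    for start in edges:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

global_waits_for = {**site_x_edges, **site_y_edges}    # merge per-site graphs
print(find_cycle(site_x_edges))      # None: no site sees a deadlock locally
print(find_cycle(global_waits_for))  # ['T2x', 'T1x', 'T1y', 'T2y', 'T2x']
```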
Communication Structure of Centralized 2P Commit Protocol (Atomicity)
(Message exchange between the Coordinating TM and the Participating Sites:)
  1 Prepare (coordinator -> participants)
  2 Vote-Commit or Vote-Abort (participants -> coordinator);
    each participant makes its local commit/abort decision and writes a log entry
  3 Global-Abort or Global-Commit (coordinator -> participants);
    the coordinator counts the votes: if any votes are missing or any vote is Vote-Abort, then Global-Abort, else Global-Commit; it writes a log entry
  4 ACK (participants -> coordinator)
Who participates? Depends on the CC algorithm.
Other communication structures are possible: Linear, Distributed.
Advantages:
  - Preserves atomicity
  - All processes are "synchronous within one state transition"
Disadvantages:
  - Many, many messages
  - If a failure occurs, the 2PC protocol blocks!

State Transitions for 2P Commit Protocol (Atomicity)
Coordinating TM:
  Initial --(send Prepare)--> Wait
  Wait --(Vote-Commit from all: send Global-Commit)--> Commit
  Wait --(any Vote-Abort or missing vote: send Global-Abort)--> Abort
Participating sites:
  Initial --(Prepare received: send Vote-Commit)--> Ready
  Initial --(Prepare received: send Vote-Abort)--> Abort
  Ready --(Global-Commit)--> Commit
  Ready --(Global-Abort)--> Abort

Failures in a Distributed System
Types of failure:
  – Transaction failure
  – Node failure
  – Media failure
  – Network failure
    • Partitions, each containing 1 or more sites
Issues to be addressed:
  – How to continue service
  – How to maintain ACID properties while providing continued service
  – How to ensure ACID properties after recovery from the failure(s)
Who addresses the problem?
  – Modified Concurrency Control & Commit/Abort Protocols
  – Recovery Protocols, Termination Protocols, & Replica Control Protocols

Termination Protocols (Atomicity, Consistency)
Use timeouts to detect potential failures that could block protocol progress.
Timeout states: Coordinator: wait, commit, abort; Participant: initial, ready.
Coordinator Termination Protocol:
  – Wait: send Global-Abort
  – Commit or Abort: BLOCKED!
Participant Termination Protocol:
  – Ready: query the coordinator;
    if timeout, then query the other participants;
    if Global-Abort or Global-Commit is learned, then proceed and terminate;
    else BLOCKED!
Attempted solutions for the blocking problem:
  1) Termination Protocol
  2) 3-Phase Commit
  3) Quorum 3P Commit
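A minimal sketch (Python; the site names are illustrative and this is only the vote-counting rule, not a full protocol stack with logging and timeouts) of the centralized 2PC decision described above: the coordinator globally commits only if every participant voted Vote-Commit; a missing vote or any Vote-Abort leads to Global-Abort.

```python
# Minimal sketch of the 2PC coordinator's vote counting.

def global_decision(votes, expected_participants):
    """votes: dict participant -> 'Vote-Commit' | 'Vote-Abort' (absent = no reply)."""
    missing = expected_participants - votes.keys()
    if missing or "Vote-Abort" in votes.values():
        return "Global-Abort"
    return "Global-Commit"

participants = {"site1", "site2", "site3"}
print(global_decision({"site1": "Vote-Commit", "site2": "Vote-Commit",
                       "site3": "Vote-Commit"}, participants))  # Global-Commit
print(global_decision({"site1": "Vote-Commit", "site2": "Vote-Abort",
                       "site3": "Vote-Commit"}, participants))  # Global-Abort
print(global_decision({"site1": "Vote-Commit"}, participants))  # Global-Abort (missing votes)
```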
Replica Control Protocols (Consistency)
• Update propagation of committed write operations
(Diagram: number of locks over the transaction from BEGIN to END: obtain locks, period of data use, 2-phase commit, release locks; a STRICT UPDATE is propagated before the COMMIT POINT, a LAZY UPDATE after it.)

Strict Replica Control Protocol (Consistency)
• Read-One-Write-All (ROWA)
• Part of the Concurrency Control Protocol and the 2-Phase Commit Protocol
  – CC locks all copies
  – 2PC propagates the updated values with the 2PC messages (or an update propagation phase is inserted between the wait and commit states for those nodes holding an updateable value).

Lazy Replica Control Protocol (Consistency)
• Propagates updates from a primary node.
• The Concurrency Control algorithm locks the primary copy node (the same node as the primary lock node).
• To preserve single-copy semantics, it must be ensured that a transaction reads a current copy.
  – Changes the CC algorithm for read-locks
  – Adds an extra communication cost for reading data
• Extended transaction models may not require single-copy semantics.

Recovery in Distributed Systems (Atomicity, Durability)
Select COMMIT or ABORT (or blocked) for each interrupted subtransaction.
Commit approaches:
  – Redo: use the undo/redo log to perform all the write operations again
  – Retry: use the transaction log to redo the entire subtransaction (R + W)
Abort approaches:
  – Undo: use the undo/redo log to back out all the writes that were actually performed
  – Compensation: use the transaction log to select and execute "reverse" subtransactions that semantically undo the write operations
Implementation requires knowledge of:
  – Buffer manager algorithms for writing updated data from volatile storage buffers to persistent storage
  – Concurrency Control Algorithm
  – Commit/Abort Protocols
  – Replica Control Protocol
Network Partitions in Distributed Systems
(Diagram: nodes N1–N8 on a network, split by a failure into Partition #1, Partition #2, and Partition #3.)
Issues:
  ✓ Termination of interrupted transactions
  ✓ Partition integration upon recovery from a network failure
  – Data availability while the failure is ongoing

Data Availability in Partitioned Networks
• The Concurrency Control model impacts data availability.
• ROWA: data replicated in multiple partitions is not available for reading or writing.
• Primary Copy Node CC: can execute transactions if the primary copy nodes for all of the read-set and all of the write-set are in the client's partition.
Availability is still very limited . . . we need a new idea!

Quorums (ACID)
• Quorum: a special type of majority
• Use quorums in the Concurrency Control, Commit/Abort, Termination, and Recovery Protocols
  – CC uses a read-quorum & a write-quorum
  – C/A, Termination, & Recovery use a commit-quorum & an abort-quorum
• Advantages:
  – More transactions can be executed during site failures and network failures (and still retain the ACID properties)
• Disadvantages:
  – Many messages are required to establish a quorum
  – The necessity for a read-quorum slows down read operations
  – Not quite sufficient (failures are not "clean")

Read-Quorums and Write-Quorums (Isolation)
• The Concurrency Control algorithm serializes valid transactions in a partition. It must obtain:
  – A read-quorum for each read operation
  – A write-quorum for each write operation
• Let N = total number of nodes in the system
• Define the size Nr of the read-quorum and the size Nw of the write-quorum as follows (see the sketch below):
  – Nr + Nw > N
  – Nw > N/2
Simple example: N = 8, Nr = 4, Nw = 5
• When failures occur, it is possible to have a valid read quorum and no valid write quorum
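A minimal sketch (Python; the helper names are made up) of the read/write quorum rules above, together with a check of which operations a set of reachable nodes can still perform, reproducing the N = 8, Nr = 4, Nw = 5 example:

```python
# Minimal sketch of the quorum size rules and of availability under partitions.

def valid_quorum_sizes(n, n_r, n_w):
    """Nr + Nw > N guarantees read/write intersection; Nw > N/2 guarantees
    that two write quorums always overlap."""
    return (n_r + n_w > n) and (n_w > n / 2)

def available_ops(reachable_nodes, n_r, n_w):
    ops = []
    if reachable_nodes >= n_r:
        ops.append("read")
    if reachable_nodes >= n_w:
        ops.append("write")
    return ops

N, Nr, Nw = 8, 4, 5                    # the simple example from the slide
print(valid_quorum_sizes(N, Nr, Nw))   # True
print(available_ops(5, Nr, Nw))        # ['read', 'write']
print(available_ops(4, Nr, Nw))        # ['read']  -> valid read quorum, no write quorum
```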
Commit-Quorums and Abort-Quorums (ACID)
• The Commit/Abort Protocol requires votes from all participants to commit or abort a transaction.
  – Commit a transaction if the accessible nodes can form a commit-quorum
  – Abort a transaction if the accessible nodes can form an abort-quorum
  – Elect a new coordinator node (if necessary)
  – Try to form a commit-quorum before attempting to form an abort-quorum
• Let N = total number of nodes in the system
• Define the size Nc of the commit-quorum and the size Na of the abort-quorum as follows:
  – Na + Nc > N; 0 ≤ Na, Nc ≤ N
Simple examples: N = 7, Nc = 4, Na = 4; or N = 7, Nc = 5, Na = 3

Conclusions
• Nearly all commercial relational database systems offer some form of distribution
  – Client/server at a minimum
• Only a few commercial object-oriented database systems support distribution beyond N clients and 1 server
• Future research directions:
  – Distributed data architecture and placement schemes (app-influenced)
  – Distributed databases as part of Internet applications
  – Continued service during disconnection or failure
    • Mobile systems accessing databases
    • Relaxed transaction models for semantically-correct continued operation