Reliability Protocols in Distributed Database Systems
Chang Won Choi
CS632 – Advanced Database Systems
Cornell University
February 5, 2016
1 Introduction
In the past, implementation of distributed database systems was deemed impractical because network technology was too unreliable or too immature and because computers were too expensive to deploy in large numbers. However, as networks have become more reliable and computers much cheaper, interest in distributed database systems has grown. There are five big reasons for using a distributed database system [2]:
1. Many organizations are distributed in nature.
2. Multiple databases can be accessed transparently.
3. The database can be expanded incrementally: as needs arise, additional computers can be connected to the distributed database system.
4. Reliability and availability are increased: a distributed database can replicate data among several sites, so even if one site fails, redundancy in the data leads to increased availability and reliability of the data as a whole.
5. Performance is increased: query processing can be performed at multiple sites, so a distributed database system can mimic a parallel database system on a high-performance network.
Even with these benefits, distributed database systems have not been widely used because of the many problems in designing a distributed database management system (DDBMS).
1.1 Descriptions of DDBMS
1.1.1 Homogeneous Distributed DBMS vs. Heterogeneous Distributed DBMS
A distributed DBMS may be homogeneous or heterogeneous [2,6]. In a homogeneous system, every site involved in the distributed DBMS runs the same DBMS software, whereas in a heterogeneous system the sites can run different DBMS software. While homogeneous systems may be easier to implement, heterogeneous systems are preferable because organizations may have different DBMSs installed at different sites and may want to access them transparently.
1.1.2 Replication
A distributed DBMS can keep multiple copies of a relation at different sites or keep only one copy [6]. The benefit of data replication is increased reliability: if one site fails, other sites can still answer queries over the relation. Performance also improves, since a transaction can query a local copy without worrying about network problems. The drawback of data replication is decreased performance when there are a large number of updates, because the distributed DBMS has to keep every replica consistent with each transaction. This adds communication costs to ensure that all copies of the data are updated together.
1.1.3 Fragmentation
Implementations of a distributed DBMS can also differ in fragmentation [6,8]. Fragmentation breaks a relation up into smaller relations that are stored at different sites. Relations can be fragmented either horizontally or vertically: in horizontal fragmentation, subsets of the tuples are stored at different sites, and in vertical fragmentation, subsets of the columns are stored at different sites. Fragmentation is preferable when locality can be exploited. For example, it is faster for users in Ithaca to query the Ithaca site and for Chicago users to query the Chicago site; only when the entire relation is needed are both sites accessed.
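As a concrete illustration, the following minimal Python sketch shows horizontal and vertical fragmentation of a small relation; the Employees schema and the city-based split are purely hypothetical examples, not taken from any of the cited systems.

employees = [
    # (emp_id, name, city, salary) -- an illustrative schema only
    (1, "Ann", "Ithaca", 70000),
    (2, "Bob", "Chicago", 65000),
    (3, "Cho", "Ithaca", 80000),
]

# Horizontal fragmentation: subsets of the tuples go to different sites.
ithaca_fragment = [row for row in employees if row[2] == "Ithaca"]
chicago_fragment = [row for row in employees if row[2] == "Chicago"]

# Vertical fragmentation: subsets of the columns go to different sites,
# with the key kept in each fragment so the relation can be rebuilt by a join.
ids_and_names = [(emp_id, name) for emp_id, name, _, _ in employees]
ids_and_salaries = [(emp_id, salary) for emp_id, _, _, salary in employees]

# Rebuilding the whole relation touches both fragments: a union of the
# horizontal fragments, or a join of the vertical ones on emp_id.
assert sorted(ithaca_fragment + chicago_fragment) == sorted(employees)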
1.1.4 Local Control
The last design option for a distributed DBMS is how much local control to give to each site [6]. In some systems, the distributed DBMS can force a site to perform operations and the site's administrator has no control over it. In other systems, such as Mariposa, each site has total control and may refuse to perform operations [7]. Greater local control is preferable for implementing load balancing: if a site is overloaded, it can refuse further operations and, if the data is replicated, refer those operations to other sites.
1.2 Problems in Designing Distributed DBMS
The same problems arise when designing a distributed DBMS as when designing a traditional DBMS: a distributed DBMS also has to handle query optimization and concurrency control, but its distributed nature requires different solutions.
1.2.1 Query Optimization
Query optimization poses greater difficulty for a distributed DBMS than for a centralized database system because of the additional variables of fragmentation, replication, and network transmission cost [6,8]. With these additional variables, the search space of query plans grows and it becomes harder to choose an optimal plan. For example, it is not readily apparent to the optimizer whether to ship a relation to another site and perform a join there, or to wait until other operations have completed, by which point the join inputs may have become smaller.
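To make that trade-off concrete, here is a minimal Python sketch of the kind of cost comparison such an optimizer faces, under a purely hypothetical cost model in which cost is simply the number of bytes shipped over the network; the relation size and selectivity are illustrative only.

def cost_ship_whole_relation(size_r_bytes):
    # Plan A: ship all of R to the remote site and join there.
    return size_r_bytes

def cost_filter_then_ship(size_r_bytes, selectivity):
    # Plan B: apply local selections first, then ship the smaller result.
    return size_r_bytes * selectivity

size_r = 10_000_000   # a 10 MB relation (illustrative)
selectivity = 0.01    # a local predicate that keeps 1% of the tuples (illustrative)

print(cost_ship_whole_relation(size_r))            # 10,000,000 bytes shipped
print(cost_filter_then_ship(size_r, selectivity))  # 100,000 bytes shipped

With a selective local predicate the second plan is far cheaper, but with an unselective one the extra local work buys little; the distributed optimizer must weigh such alternatives for every relation placement and replica.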
1.2.2 Concurrency Control & Recovery Method
As with centralized database systems, a DDBMS has to guarantee the ACID properties: atomicity, consistency, isolation, and durability [6,8]. In a distributed DBMS, problems arise because transactions span multiple sites. For example, when a transaction commits, the distributed DBMS has to ensure that all of its sub-transactions commit. And if some sites fail, how will the recovery process work? The purpose of this survey paper is to look at various protocols that enforce atomicity and durability and at what future work could be done in this area.
2 Reliability Protocol Overview
Among the critical properties that a database system has to ensure are correctness and availability. During the course of database operations, the database system may stop running, or some transactions may have to be aborted before they commit. In those situations, atomicity and durability would be compromised if a committed transaction is not written to disk or if an aborted transaction is written to disk. It is the role of the transaction manager to guarantee the correctness of the database system: either all actions in a transaction happen or none happen, and if a transaction commits then its effects persist. Furthermore, the recovery manager has to be efficient enough that the downtime of the database is minimized and its availability is maximized [2,6].
2.1 No-Steal/Force Method
To ensure correctness, a trivial solution would be a no-steal/force method [6]. In a no-steal/force method, the effects of a transaction are held in the buffer pool until the transaction commits, and after it commits the effects are forced (written) to disk. There are two problems with this method. For a long transaction, no-steal prevents the transaction's pages in the buffer pool from being replaced until it commits. This leads to poor throughput, as fewer items can be placed into the buffer pool and the disk has to be accessed continuously. Force requires writes to disk when the transaction commits, but this leads to poor response time. For example, if a page is updated frequently, it has to be forced to disk by every committing transaction that updated it; it would be more efficient to keep the page in the buffer pool and write it to disk after most of the updates have finished. However, force seems necessary to guarantee durability, since otherwise the database could crash before writing the data to disk. The desired method is therefore a steal/no-force method that still guarantees atomicity and durability.
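As a minimal Python sketch (using assumed, simplified in-memory structures rather than any particular DBMS), the no-steal/force policy amounts to pinning dirty pages until commit and writing all of a transaction's pages at commit time; the class and method names are illustrative.

class NoStealForceBuffer:
    def __init__(self):
        self.dirty = {}   # tid -> {page_id: value}, pinned until commit
        self.disk = {}    # page_id -> value, the stable copy

    def update(self, tid, page_id, value):
        # No-steal: the dirty page cannot be written out (or evicted) before commit.
        self.dirty.setdefault(tid, {})[page_id] = value

    def commit(self, tid):
        # Force: every page the transaction touched is written at commit.
        for page_id, value in self.dirty.pop(tid, {}).items():
            self.disk[page_id] = value

    def abort(self, tid):
        # Nothing ever reached disk, so abort simply discards the updates.
        self.dirty.pop(tid, None)

Abort and crash recovery are trivially cheap under this policy, which is why it guarantees atomicity and durability; the cost shows up in throughput and response time, as described above.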
2.2 Write-Ahead Logging
The solution is to log every action and the outcome of every transaction [6]. For example, if a transaction aborts, the database system can look at the log and undo all of its actions. If the database crashes before committed writes reach the disk, then once the database comes back up, the recovery manager reads the log and redoes the actions of committed transactions. Write-ahead logging (WAL) is used to ensure atomicity and durability: log records are forced to stable storage before the corresponding updates reach the disk and before a transaction commits. A more detailed explanation can be found in the ARIES paper [5].
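The following is a minimal Python sketch of the WAL rule, using a simplified in-memory "stable" log and buffer pool; the class and method names are illustrative assumptions, not taken from ARIES or any cited system.

class Log:
    def __init__(self):
        self.stable = []   # records already forced to stable storage
        self.tail = []     # records still in volatile memory

    def append(self, record):
        self.tail.append(record)

    def flush(self):
        # Force the volatile log tail to stable storage.
        self.stable.extend(self.tail)
        self.tail.clear()

class BufferPool:
    def __init__(self, log):
        self.log = log
        self.pages = {}    # page_id -> value, cached (volatile)
        self.disk = {}     # page_id -> value, stable

    def update(self, tid, page_id, new_value):
        old_value = self.pages.get(page_id, self.disk.get(page_id))
        # Log the change (before/after images for undo/redo) before the
        # dirty page is ever allowed to reach disk.
        self.log.append(("UPDATE", tid, page_id, old_value, new_value))
        self.pages[page_id] = new_value

    def write_page_to_disk(self, page_id):
        # WAL rule 1: flush the log describing this page before writing it.
        self.log.flush()
        self.disk[page_id] = self.pages[page_id]

    def commit(self, tid):
        # WAL rule 2: force the commit record before acknowledging the commit.
        self.log.append(("COMMIT", tid))
        self.log.flush()

Recovery then works from the forced records: redo the updates of transactions with a COMMIT record and undo those without one, which is exactly the redo/undo behavior described above.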
3 Issues in Recovery Protocols in Distributed Databases
With distributed databases, guaranteeing atomicity and durability becomes more complicated [2]. Transactions usually span more than one site, so if a transaction commits, all the sites involved in the transaction have to commit; likewise, if the transaction aborts, all sub-transactions have to abort. The problem is how to do this efficiently. Two-phase commit has been the most popular technique among implemented distributed DBMSs, and variations have been made to the protocol to make it more efficient in some cases [1,2,4,6]. A more radical variation is the three-phase commit protocol (3PC), in which an additional round of messages is exchanged before the final decision and which, in some cases, leads to better performance [2].
3.1 Two-Phase Commit
Two-phase commit (2PC) has been the most popular commit protocol in distributed databases; R*, Distributed-INGRES, and DDM all use 2PC or variations of it [1,2,4,6]. In the basic form of 2PC there is a coordinator and there are subordinates, where the coordinator is the site that initiated the transaction. In the first phase, the coordinator tries to obtain a unanimous decision, commit or abort, from the subordinates; in the second phase, the coordinator relays the decision back to them. The protocol goes as follows:
1. The coordinator writes a "prepare" record to stable storage and sends "prepare" messages to the subordinates.
2. After a subordinate receives the "prepare" message, it decides whether it can commit and sends its response back to the coordinator. If the subordinate decides to commit, it writes a "ready" record to stable storage on its machine.
3. If the coordinator receives an abort reply from any subordinate, it sends an abort message to all subordinates. If all subordinates send "ready" messages, the coordinator writes a "global commit" record to stable storage and sends all subordinates a message saying that everyone has voted to commit.
4. If a subordinate receives the message that everyone has decided to commit, it commits the data, writes a commit record to its local stable storage, and sends an acknowledgement back to the coordinator. If a subordinate receives an "abort" message, it aborts.
5. When the coordinator receives acknowledgement messages from all subordinates, it records "complete" in the log and the transaction is finished.
If any subordinate fails to respond within a set time before the "commit" message has been sent, the coordinator sends abort to all of the subordinates.
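The coordinator side of these steps can be sketched as follows in Python. This is a minimal illustration, assuming a hypothetical sub.send(msg) call that blocks for the subordinate's reply and raises TimeoutError when no reply arrives in time, and a log object whose write() forces a record to stable storage; none of these names come from the cited systems.

def two_phase_commit(coordinator_log, subordinates):
    # Phase 1 (voting): force the "prepare" record, then collect votes.
    coordinator_log.write("prepare")
    votes = []
    for sub in subordinates:
        try:
            # A subordinate force-writes its "ready" record before voting yes.
            votes.append(sub.send("prepare"))
        except TimeoutError:
            # No reply within the time limit is treated as an abort vote.
            votes.append("abort")

    # Phase 2 (decision): a commit decision becomes durable once it is logged.
    if all(vote == "ready" for vote in votes):
        coordinator_log.write("global commit")
        decision = "commit"
    else:
        decision = "abort"

    # Subordinates force their own commit/abort record, then acknowledge.
    acks = [sub.send(decision) for sub in subordinates]
    if all(ack == "ack" for ack in acks):
        coordinator_log.write("complete")
    return decision

The subordinate side mirrors steps 2 and 4: force the "ready" record before voting to commit, then force the commit or abort record before acknowledging.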
3.2 Resilience of 2PC
The two-phase commit protocol can handle the following failures while keeping the database consistent. The possible failures and how 2PC handles them are listed below. An important assumption is that each site has its own local recovery and concurrency control mechanisms for recovering from local crashes and failures.
Step 1 – The coordinator sends a "prepare" message.
If the coordinator fails: nothing happens, because the coordinator has not sent any messages yet.
If a subordinate fails: the subordinate never receives the "prepare" message, so after a predetermined time the coordinator aborts the transaction and sends abort to all subordinates.

Step 2 – The subordinates send either a "ready" or an "abort" message.
If the coordinator fails: all the subordinates that replied "ready" wait until the coordinator recovers and sends a reply back; until then, they keep sending messages to the coordinator requesting the reply.
If a subordinate fails: same as above.

Step 3 – The coordinator sends the "commit" or "abort" message.
If the coordinator fails: same as above.
If a subordinate fails: the coordinator waits until the subordinate recovers and then sends the commit message to it.

Step 4 – The subordinates send an acknowledgement.
If the coordinator fails: same as above.
If a subordinate fails: same as above.

3.2.1 2PC Variations – Presumed Abort
The researchers on the R* system improved the performance of the 2PC protocol by presuming abort [2,4,9]. In presumed abort, when the coordinator receives an "abort" vote from any subordinate, it sends "abort" messages to the other subordinates and then forgets about the entire transaction. If a subordinate fails during the process, then after it recovers it asks the coordinator what happened; since the coordinator keeps no information about a forgotten transaction, the absence of a log record is interpreted as an abort, and the subordinate aborts. The advantage of this variation is a reduction in the number of control messages exchanged between the coordinator and the subordinates. However, presumed abort becomes less efficient when the network is unstable, in which case it would be better for the subordinates and the coordinator simply to wait for the network to become reliable again.
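A minimal Python sketch of the presumed-abort rule on the coordinator side, assuming a hypothetical dictionary of still-remembered transaction outcomes standing in for the coordinator's log; the function and parameter names are illustrative.

def answer_inquiry(remembered_outcomes, tid):
    # remembered_outcomes maps tid -> "global commit" for transactions the
    # coordinator still knows about; aborted transactions were forgotten.
    if remembered_outcomes.get(tid) == "global commit":
        return "commit"
    # No information is interpreted exactly like an explicit abort record.
    return "abort"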
3.3 2PC Issues & Non-blocking Commitment Protocol: 3PC
The main issue with traditional 2PC is blocking [2]. Blocking occurs when a failure forces other sites to wait until the failure is resolved. For example, if there is a network or site failure after the first phase, the subordinates and/or the coordinator have to wait until communication is restored or the failed site recovers. Since recovery time can vary from a few minutes to a few hours or even days, it is very inefficient for sites to wait and hold their locks until a response is received. The problem is more acute when the coordinator fails, since all the subordinates have to wait for the coordinator to recover; when one of the subordinates fails, only the coordinator has to wait for it to recover. This problem led to a non-blocking solution. The goal of a non-blocking solution is to have the transaction keep going at all operational sites, or abort it, even if the coordinator fails.
The 3PC protocol works as follows:
1. The coordinator writes a "prepare" record to stable storage and sends "prepare" messages to the subordinates.
2. After a subordinate receives the "prepare" message, it decides whether it can commit and sends its response back to the coordinator. If the subordinate decides to commit, it writes a "ready" record to stable storage on its machine.
3. If the coordinator receives an abort reply from any subordinate, it sends a "prepare abort" message to all subordinates. If all subordinates send a ready-to-commit message, the coordinator writes a "global commit" record to stable storage and sends a "prepare commit" message to the subordinates.
4. If a subordinate receives the "prepare commit" message, it enters the "ready to commit" state, writes the appropriate record to its local stable storage, and sends an "okay" message back to the coordinator. If a subordinate receives a "prepare abort" message, it does the same for abort.
5. When the coordinator receives "okay" messages back from all subordinates, it sends a "final commit" or "final abort" message to the subordinates and writes the appropriate record to stable storage.
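A minimal coordinator-side Python sketch of these steps, under the same assumed sub.send() and log.write() abstractions as the 2PC sketch above; it illustrates the message flow only and omits the coordinator election and timeout handling discussed next.

def three_phase_commit(log, subordinates):
    # Phase 1 (voting): same as in 2PC.
    log.write("prepare")
    votes = []
    for sub in subordinates:
        try:
            votes.append(sub.send("prepare"))
        except TimeoutError:
            votes.append("abort")

    # Phase 2 (pre-commit / pre-abort): announce the tentative decision.
    if all(vote == "ready" for vote in votes):
        log.write("global commit")
        pre_msg, final_msg = "prepare commit", "final commit"
    else:
        log.write("global abort")
        pre_msg, final_msg = "prepare abort", "final abort"

    # Subordinates enter the buffered "ready to commit/abort" state and reply.
    okays = [sub.send(pre_msg) for sub in subordinates]

    # Phase 3: the final decision is sent only after every "okay" has arrived.
    if all(okay == "okay" for okay in okays):
        for sub in subordinates:
            sub.send(final_msg)
        log.write("complete")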
The difference between 3PC and 2PC can be seen when the coordinator fails before sending the "prepare commit" message. In 2PC, the subordinates wait indefinitely until the coordinator comes back, but in 3PC a new coordinator is chosen, and the new coordinator behaves as follows:
Case 1 – The new coordinator is in "prepare commit" mode.
Action: the new coordinator sends "prepare commit" to the subordinates and waits for "okay" messages. Once all "okay" messages are received, a "commit" message is sent to all subordinates.
Why this is correct: nothing has been changed at the other sites, so the new coordinator determines the states of the other sites and acts accordingly.

Case 2 – The new coordinator has committed.
Action: "commit" messages are sent to all subordinates.
Why this is correct: if the coordinator has committed, then the other sites must at least have received "prepare commit" and sent an "okay" message back to the old coordinator, so they have no problem committing.

Case 3 – The new coordinator has aborted.
Action: "abort" messages are sent to all subordinates.
Why this is correct: if the coordinator has aborted, then in the worst case the other sites are in "prepare commit," so there is no inconsistency in aborting at the other sites.

Case 4 – The new coordinator is in "prepare" mode.
Action: "abort" messages are sent to all subordinates.
Why this is correct: same as above.

3.3.1 3PC Issues – Quorum-Based Protocols
Surprisingly, 3PC in its basic form does not guarantee consistency [2]. Consider the following case: let x and y be two groups of operating sites, and suppose that because of a network failure, communication between the two groups is completely lost. Following 3PC, the two groups each choose their own coordinator and continue through the protocol, so there is a possibility that x and y will act differently. For example, x might choose a coordinator that was in "prepare commit" mode and later decide to commit, but if y chooses a new coordinator that was in "prepare" mode, then y will uniformly abort.
The solution is to use a quorum of the subordinates and the coordinator [2]. If the coordinator fails and a new coordinator is chosen, the new coordinator starts collecting information from the subordinates. An addition to 3PC is the "prepare to abort" state. Unlike in the original 3PC, sites that respond to "prepare to abort" can still commit later on: "prepare to abort" means that the site wants to abort but does not have to, and can commit if the situation requires it. This is possible because the quorum is built after all sites have reached the "prepare" state, in which the sites do not care whether they commit or abort. The quorum-based protocol also introduces the constants V_A and V_C, which are defined for the transaction. If V is the total number of sites in a transaction, then V_A + V_C = V + 1, which ensures that the commit quorum and the abort quorum can never both be reached at the same time. The following is the behavior of the new coordinator under the quorum-based protocol:
Situation 1 – At least one of the sites has committed.
Action: send "commit" messages to the subordinates.
Reason: since at least one site has committed, the other sites must be in "prepare commit" or "commit," so eventually all sites will commit.

Situation 2 – At least one of the sites has aborted.
Action: send "abort" messages to the subordinates.
Reason: since at least one site has aborted, the other sites are at most in "prepare," so there is no possibility that any other site has committed.

Situation 3 – The number of sites in the prepared-to-commit state is at least V_C.
Action: send "commit" messages to the subordinates.
Reason: it is more likely that the other sites want to commit, so commit is sent.

Situation 4 – The number of sites in the prepared-to-abort state is at least V_A.
Action: send "abort" messages to the subordinates.
Reason: it is more likely that the other sites want to abort, so abort is sent.

Situation 5 – The number of sites in the prepared-to-commit state plus the number of sites in unknown states is at least V_C.
Action: send "prepare commit" to the sites in unknown states and wait.
Reason: same as above.

Situation 6 – The number of sites in the prepared-to-abort state plus the number of sites in unknown states is at least V_A.
Action: send "prepare abort" to the sites in unknown states and wait.
Reason: same as above.

Situation 7 – All other cases.
Action: the coordinator waits.
Reason: no decision can be made.
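Read as a decision procedure, the rule above can be sketched in Python as follows; the counts are assumed to have been gathered by the new coordinator from the sites it can reach, the thresholds are interpreted as reaching the quorum (>=), and all names are illustrative.

def quorum_decision(committed, aborted, prepared_commit, prepared_abort,
                    unknown, v_commit, v_abort):
    # committed/aborted/prepared_commit/prepared_abort/unknown are counts of
    # reachable sites in each state; v_commit + v_abort = total_sites + 1.
    if committed > 0:
        return "send commit to all subordinates"
    if aborted > 0:
        return "send abort to all subordinates"
    if prepared_commit >= v_commit:
        return "send commit to all subordinates"
    if prepared_abort >= v_abort:
        return "send abort to all subordinates"
    if prepared_commit + unknown >= v_commit:
        return "send prepare-commit to unknown sites and wait"
    if prepared_abort + unknown >= v_abort:
        return "send prepare-abort to unknown sites and wait"
    return "wait: no decision can be made"

Because the two quorums sum to more than the total number of sites, no two partitioned groups can both reach a deciding branch, which is exactly what repairs the inconsistency described at the start of this subsection.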
3.4 Replication Issues & Miscellaneous Issues
Recovery protocols have to account for replication if the distributed database system supports replication [2]. For the copies of the data to be consistent, the copies have to stay the same. This is done by updating every copy, and the protocol is the same as 2PC or 3PC: a single copy acts as the coordinator, sends the update commands to every copy, and tries to commit all of the copies.
Another issue with 2PC, 3PC, or any other protocol is how to resolve commission errors [2]. Commission errors occur when messages are corrupted during transmission, when the coordinator sends wrong commands to the subordinates, or when the subordinates do not perform correctly. As possible solutions to commission errors lie in network communication, they are not discussed in this survey paper.
4 Future Research
There are many possibilities for research on recovery protocols in distributed database systems. While the performance of the recovery protocol may be a minor concern in a centralized database system, communication costs give the recovery protocol a larger role in the overall performance of a distributed database. It would be valuable to study the performance of recovery protocols in different situations. Furthermore, research is needed on how the recovery protocol affects the performance of other components of a distributed database. For example, how does 3PC affect the locking mechanisms for replicated data? Does presumed abort work well when commits are frequent, and how quickly can locks be acquired and freed? These are some of the questions that need to be answered before an efficient distributed database system can be implemented.
5 Bibliography
1. J. Bacon. Concurrent Systems: An Integrated Approach to Operating Systems, Database, and Distributed Systems. Addison-Wesley Publishing Company, 1993.
2. W. Cellary, E. Gelenbe, and T. Morzy. Concurrency Control in Distributed Database Systems. North-Holland, 1988.
3. S. Ceri and G. Pelagatti. Distributed Databases: Principles and Systems. McGraw-Hill Book Company, 1984.
4. C. Mohan, B. Lindsay, and R. Obermarck. Transaction Management in the R* Distributed Database Management System. ACM Transactions on Database Systems, 11(4): 378-396, 1986.
5. C. Mohan et al. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems, 17(1): 94-162, 1992.
6. R. Ramakrishnan. Database Management Systems. McGraw-Hill Book Company, 1998.
7. M. Stonebraker et al. Mariposa: A Wide-Area Distributed Database System.
8. P. Valduriez and M. Ozsu. Distributed and Parallel Database Systems. ACM Computing Surveys, 28(1), March 1996.
9. R. Williams et al. R*: An Overview of the Architecture. Technical Report RJ3325, IBM Research Lab, San Jose, CA, December 1981.
10. O. Wolfson. The Overhead of Locking (and Commit) Protocols in Distributed Database. ACM Transactions on Database Systems, 12(3): 453-471, September 1987.