Constructing Atomic Commit Protocols Using Knowledge Abstract

advertisement
Constructing Atomic Commit Protocols
Using Knowledge
Weihai Yu
Department of Computer Science
University of Tromsø, Norway
weihai@cs.uit.no
Abstract
The main purpose of this paper is to use logic of process knowledge as a tool to (1)
obtain better understanding of the atomic commit problem, to (2) analyze at a rather
intuitive level the rationale behind correct atomic commit protocols that are in use
today, and to (3) conduct the design and construction of atomic commit protocols. This
is not a theoretic paper, although some formal definitions are introduced and used in the
discussions. This paper is intended for people not familiar with logic of process
knowledge, or even atomic commit protocols.
1
Introduction
Atomic commit protocols are a key element in supporting global atomicity of distributed
transactions. Two-phase commit protocol (2PC) is the de facto standard atomic commit protocol
[1][5]. As web services are gaining importance in open distributed processing, 2PC becomes a
hot research topic again. It is widely agreed that 2PC is important to guarantee correctness
properties in the complex distributed world whilst at the same time it reduces parallelism due to
high disk and message overhead and locking during windows of vulnerability. There are a
number of optimizations to the basic 2PC (e.g., [2][4][9][11][15]). Some of them are so widely
used that they are built into commercial systems and become part of the standards for
distributed transaction processing [14][16].
To understand and manipulate the vast amount of atomic commit protocols (including various
2PC variations) is not an easy task. This paper is an attempt to bring a useful picture of this
important research area with the help of the logic of process knowledge.
2
A General Model of Distributed Systems
This model is typically used in the study of knowledge and coordination [3][6][10][13].
A distributed system is a finite set P of n processes connected by a communication network. We
assume the existence of a global time of natural numbers, which is only for the purpose of the
description of system executions and thus is not observable by the processes. At each moment in
time, a number of processes may each execute an event, which can be a local event, sending or
receiving a message, and so on. Local events may include crash failure and restart thereafter,
timeout, and local actions specific to the system, as we shall see later for atomic commit
protocols.
The local state of a process consists of its initial state at time 0 and the sequence of events it has
executed. The global state of a system is an n+1 tuple (se, s1, …, sn) where se is the operating
environment of the system and s1, …, sn are local states of processes. The operating environment
is used to encapsulate the characteristics of the system that are not captured in the local states,
such as the failure model of the communication network. Like the global time, the operating
environment is only for the purpose of description of system behavior and is not observable by
the processes. For example, a network failure can be modeled as a partition event of the
operating environment.
A protocol is a function from local states to actions (i.e., local actions and message sendings).
Note that processes do not run their protocols in isolation. It is the combination of the protocols
run by all processes that causes the system to behave in a particular way. A joint protocol PT is
a tuple (PT1, …, PTn) consisting of protocols PTi for each of the processes i = 1, …, n.
Possible joint behavior of the system over time is modeled by runs. A run is a function from
time to global states. Thus in a possible execution r, r(0) is the initial global state of the system,
r(t) is the global state at time t, and so on. The behavior of the system running joint protocol PT
is characterized by the set of all possible runs, denoted by RPT, or R if PT is obvious from
context. If run r ∈ R and t is a natural number, r(t) is a point in R.
If r(t) = (se, s1, …, sn), we denote re(t) = se and ri(t) = si for i = 1, …, n. Two points r(t) and r´(t´)
are indistinguishable to process i, denoted r(t) ~i r´(t´), if ri(t) = r´i(t´), i.e., if the process i has
the same local state at both points. We write a -| r if event a occurs in run r. Similarly, a-| r(t) or
a -| ri(t) if a occurs in a point of a run or in a local state of process i.
Next, we describe some events of the distributed system that will be used in the discussions
later. Let P be the set of processes and p, q ∈ P, we have the following events:
•
sendp(q, m), p sends a message m to q,
•
receivep(q, m), p receives a meaage m from q,
•
failp, p fails,
•
restartp, p restarts after a failure,
•
timeoutp(a), p timeouts after an event a,
•
partitione (U), a network (re-)partition occurs. U is a partition of the processes. |U| is the
number of islands of the partition. |U| = 1 if there is no network partition. A message
from p to q can only be delivered when p and q are in the same island.
We can define well-formedness of runs, for example the following (we will not give a complete
definition here):
3
•
The only possible event after failp is restartp,
•
For t’ > t, sendp(q, m) -| r(t) ∧ receiveq(p, m) -| r(t’) ⇒ p and q are in the same partition
island between t and t’.
Process Knowledge
Process knowledge was first defined by Halpern and Moses [7] and detailed in [3].
To describe the semantics of a protocol, we assume a set Φ of primitive propositions on points
of runs, describing basic facts about the system. A fact ϕ is either true or false at a given point
r(t) in R, denoted (R, r, t) |= ϕ and (R, r, t) |≠ ϕ, respectively. The set of well-formed formulae
WFF is such that Φ ⊆ WFF, and for any ϕ and ϕ´ ∈ WFF, p ∈ P set of processes and a group G
⊆ P, ¬ϕ, ϕ∧ϕ´, ϕ∨ϕ´, ◊ϕ (eventually ϕ), Kpϕ (p knows ϕ), EGϕ (everybody in group G knows),
SGϕ (somebody in G knows), DGϕ (it is distributed knowledge in G), CGϕ (it is common
knowledge in G) are also in WFF. We often omit the subscript G if G = P.
Given the events defined earlier, we can define the propositions corresponding to the occurrence
of these events. For example:
•
FAILp, the last event of p is failp,
•
FAIL, either FAILp for some p or partition |U| > 1.
•
NO_FAIL = ¬FAIL.
•
NO_FAIL (t, t’), no fail between time t and t’.
Below are the formal definitions of some knowledge operators:
•
(R, r, t) |= ◊ϕ iff (R, r, t´) |= ϕ for some t´ ≥ t.
•
(R, r, t) |= Kpϕ iff (R, r´, t´) |= ϕ for all r´ (t´) such that r(t) ~p r´(t´).
•
(R, r, t) |= EGϕ iff (R, r, t) |= Kqϕ for all q ∈ G.
•
(R, r, t) |= SGϕ iff (R, r, t) |= Kqϕ for some q ∈ G.
•
(R, r, t) |= DGϕ iff (R, r´, t´) |= ϕ for all r´ (t´) such that r(t) ~G r´(t´).
•
(R, r, t) |= CGϕ iff (R, r, t) |= EGϕ, and (R, r, t) |= EGEGϕ, and …
Another way to define common knowledge is that it is the greatest fixed point of X = EG(ϕ∧X).
In many situations, common knowledge is too strong a notion to be achievable. In some cases,
the weaker notion of eventual common knowledge is used instead. Eventual common
knowledge, denoted C◊ϕ, is the greatest fixed point of X = ◊EG(ϕ∧X).
Furthermore, a proposition is said to be stable if once true, it remains true forever. Formally, if
(R, r, t) |= ϕ, then for all t’≥ t, (R, r, t’) |= ϕ
4
The Atomic Commitment Problem
Informally, an atomic commitment problem requires processes to agree on a common outcome
which can be either Commit or Abort. More specifically, an atomic commit protocol must
guarantee the atomic commitment properties [1]:
•
AC1: All processes that reach an outcome reach the same one.
•
AC2: A process cannot reverse its outcome after it has reached one.
•
AC3: The Commit outcome can only be reached if all participants voted Yes.
•
AC4: If there are no failures and all participants voted Yes, then the outcome will be
Commit.
•
AC5: Consider any execution containing only failures that the protocol is designed to
tolerate. At any point in this execution, if all existing failures are repaired and no new
failures occur for sufficiently long, then all processes will eventually reach an outcome.
Next, we specify more formally the atomic commit problem using the logic of process
knowledge. We first define some local actions specific to atomic commit protocols, for p ∈ T
participant processes in the transaction:
•
yesp, p votes Yes,
•
nop, p votes No,
•
commitp, p commits,
•
abortp, p aborts.
The corresponding primitive propositions after the executions of the actions are: YESp, NOp,
COMMITp, ABORTp.
Formally, the atomic commitment properties are:
5
•
AC1: ∀p, q ∈T, (R, r, t) |= ¬(COMMITp ∧ ABORTq)
•
AC2: implied by AC1 and that COMMITp and ABORTp are stable.
•
AC3: (R, r, t) |= COMMITp ⇒ (R, r, t) |= ∧q∈TYESq
•
AC4: NO_FAIL ∧ (R, r, t) |= ∧p∈TYESp ⇒ (R, r, t) |= ◊(∧p∈TCOMMITp)
•
AC5: For a d sufficiently large, if NO_FAIL(t, t+d), then there is t’ between t and t+d,
(R, r, t’) |= ∧p∈TCOMMITp ∨ ∧p∈TABORTp
Constructing Atomic Commit Protocols
Using Knowledge
We now discuss how process logic could be used to the specification and construction of atomic
commit protocols. We first look at the atomic agreement requirement, which is the main focus
of many papers in the literature [6][8][12], and then the protocol termination requirement, which
is also very important [13] but not much research result is available. Finally we discuss the
construction of the protocols.
5.1
Agreement
Neiger [13] showed a necessary condition for all participants to achieve the Commit agreement:
(R, r, t) |= COMMITp ⇒ (R, r, t) |= Kp(∧q∈TYESq)
That is, a process commits only if it knows that all participants have voted Yes. This is obvious
and several papers used this to show the flexibility of using knowledge to achieve agreement
[6][8][12].
However, this necessary condition is not sufficient to achieve the Commit agreement. For
example, in a naïve decentralized two-phase commit protocol, every participant broadcasts a
Yes vote to all other participants. If the network is not reliable, some of the participants may
receive all Yes votes but some may not. Although it may not necessarily lead to inconsistent
outcome if those participants that do receive all Yes votes decide to commit, obviously
something in addition must be done for those participants that do not receive all Yes votes.
AC4 presented a sufficient condition for all participants to achieve the Commit agreement. For
every individual participant, it suffices to decide the Commit outcome when it knows that all
participants voted Yes and no failure has occurred:
NO_FAIL ∧ (R, r, t) |= Kp(∧q∈TYESq) ⇒ (R, r, t) |= COMMITp
This sufficient condition could be useful in special circumstances. For example, given atomic
broadcast, p knowing YESq (where p ≠ q) implies NO_FAIL:
(R, r, t) |= KpYESq ⇒ NO_FAIL
Thus it is both necessary and sufficient to decide Commit when a process receives Yes votes
from all participants:
(R, r, t) |= COMMITp ⇔ (R, r, t) |= Kp(∧q∈TYESq)
In general, however, there is a gap between the necessary and the sufficient conditions for the
Commit agreement. If there is a condition that is both necessary and sufficient for the individual
processes to decide the Commit outcome, it should look like this:
(R, r, t) |= Kp(∧q∈TYESq ∧ LESS_FAIL) ⇔ (R, r, t) |= COMMITp
And for both Commit and Abort outcomes, a more general specification should look like:
•
(R, r, t) |= COMMITp ⇔ (R, r, t) |= Kp(∧q∈TYESq ∧ LESS_FAIL)
•
(R, r, t) |= ABORTp ⇔ (R, r, t) |= Kp(∨q∈T NOq ∨ MORE_FAIL)
There are some difficulties, though, to apply this specification in practice. For example, if
LESS_FAIL and MORE_FAIL overlap, the specification is non-deterministic and the protocol
would not guarantee atomicity. More importantly, LESS_FAIL is hard to define in theory and
detect at the processes in practice.
In Neiger [13], conditions like (∧q∈TYesq ∧ LESS_FAIL) and (∨q∈T Noq ∨ MORE_FAIL) are
called enabling conditions for Commit and Abort decisions. However, how the enabling
conditions should be assigned to practical atomic commit protocols is not discussed.
In what follows, we name the enabling
ABORT_ENABLED respectively. That is,
conditions
•
COMMIT_ENABLED ≡ ∧q∈TYESq ∧ LESS_FAIL
•
ABORT_ENABLED ≡ ∨q∈T NOq ∨ MORE_FAIL
COMMIT_ENABLED
and
Practically in most popular practical atomic commit protocols, COMMIT_ENABLED is
implicitly defined as “somebody knows the fact” (ST COMMIT_ENABLED) and typically (Kc
COMMIT_ENABLED) where this “somebody” is a particular c, known as the coordinator of
the protocol. That is, the decision made by the coordinator is tantamount to coordinator’s
knowledge of the enabling conditions:
•
(R, r, t) |= COMMITc ⇔ (R, r, t) |= Kc COMMIT_ENABLED
•
(R, r, t) |= ABORTc ⇔ (R, r, t) |= Kc ABORT_ENABLED
In other words, LESS_FAIL is defined by the fact that the coordinator has successfully received
all Yes votes within a specific deadline. Failures after that are considered to be LESS_FAIL. All
other failures are considered to be MORE_FAIL, including any participant failing to give a vote
or any vote failing to arrive at the coordinator in time.
More formally,
•
(R, r, t) |= Kc COMMIT_ENABLED ≡ (R, r, t) |= Kc(∧q∈TYESq ∧ ¬timeoutc(b))
•
(R, r, t) |= Kc ABORT_ENABLED ≡
(R, r, t) |= Kc(∨q∈T NOq ∨ (¬∧q∈TYESq ∧ timeoutc(b) )
where b is an earlier event local at c, such the beginning of the protocol.
Given the role of the coordinator as described above, the next task of the protocol is to ensure
that all participants p ∈ T eventually knows the enabling condition and reach the corresponding
outcome:
(R, r, t) |= COMMITc ⇒ (R, r, t) |= ◊Kp COMMITc
⇔ (R, r, t) |= ◊Kp COMMIT_ENABLED ⇔ (R, r, t) |= ◊COMMITp
5.2
Termination
An atomic commit protocol terminates when all participants reach the same outcome:
(R, r, t) |= COMMIT_END ∨ ABORT_END
where COMMIT_END ≡ ∧p∈TCOMMITp and ABORT_END ≡ ∧p∈TABORTp
Note that this is distributed knowledge among the participants, or formally
(R, r, t) |= DT (COMMIT_END ∨ ABORT_END).
For every individual participant to determine the coordinated outcome, Neiger [13] has shown
the necessary condition for termination:
•
(R, r, t) |= COMMITp ⇒ (R, r, t) |= KpC◊ COMMIT_ENABLED
•
(R, r, t) |= ABORTp ⇒ (R, r, t) |= Kp C◊ ABORT_ENABLED
That is, every individual participant must know that the enabling condition is eventual common
knowledge among the participants.
Similar to the necessary condition for agreement, this could be useful in special circumstances
when NO_FAIL is used instead of LESS_FAIL in COMMIT_ENABLED. For example, given
atomic broadcast,
(R, r, t) |= KpYESq ⇒ NO_FAIL
Therefore
(R, r, t) |= C◊(∧q∈TYesq) ⇒ (R, r, t) |= C◊ COMMIT_ENABLED
To obtain eventual common knowledge is a difficult task in general (though easier than
obtaining common knowedge). Here again, the use of a coordinator could be a rescue. Assume
that it is a priori common knowledge among participants of a transaction that a coordinator can
guarantee the following for a proposition ϕ:
(R, r, t) |= Kcϕ ⇒ (R, r, t) |= ◊C◊ϕ
That is, the coordinator’s knowledge about ϕ will eventually lead to eventual common
knowledge among the participants. If we replace ϕ with the enabling condition of an outcome,
then, in order to achieve KpC◊ϕ, it suffices to obtain KpKcϕ. This coincides with the sufficient
condition for achieving agreement among the participants. So the coordinator plays double
roles: (1) to detect the enabling condition for Commit, and (2) to ensure the termination of the
protocol.
Of course, it remains as the next task for the coordinator and the participants to guarantee that
this assumption actually holds.
5.3
Designing atomic commit protocols
We start with a design strategy which leads to protocol structures. Every protocol structure then
allows for multiple implementations.
According to the discussions in the previous sections, a useful strategy to design an atomic
commit protocol is to use a coordinator and divide the protocol into multiple steps, such as the
following:
1. (R, r, t1) |= Kcϕ
2. (R, r, t2) |= KpKcϕ
3. (R, r, t3) |= KcC◊ϕ
where c is the coordinator, ϕ is the enabling condition of either Commit or Abort, and t1 ≤ t2 ≤ t3.
The first two steps are actually the two phases in the well-known Two-Phase Commit protocols
(2PC) that are used to achieve the agreement: the first vote-collection phase and the second
outcome-notification phase. In the last step, the coordinator ensures that the protocol will
eventually terminate. Practically, this means that the coordinator must keep the necessary
information stably until this goal is achieved.
One practical design for the last termination step is that the coordinator keeps the necessary
information until it knows that the protocol actually has terminated. That is:
(R, r, t3) |= Kc(COMMIT_END ∨ ABORT_END)
This is known as the baseline 2PC, or presumed nothing 2PC.
Due to the fact that there can only be one of the two outcomes Commit and Abort, an alternative
design is that the coordinator keeps the necessary information until the following:
•
(R, r, t3) |= Kc(COMMIT_END ∨ ◊ABORT_END)
•
(R, r, t3) |= Kc(◊COMMIT_END ∨ ABORT_END)
That is, the coordinator keeps the necessary information about the transaction until it knows that
either the protocol has actually committed (aborted) or eventually all participants will only abort
(commit). These are known as the presumed abort (presumed commit) 2PC. The presumption is
the knowledge a priori known by the system. The advantage of the presumed outcome 2PC is
that the coordinator does not have to collect all the information about the actual termination of
the protocol in both Commit and Abort cases. For the presumed outcome, it is safe for the
coordinator to be information-free about the transaction and still be able to guarantee the correct
termination of the protocol. Of course, care must be taken to ensure that when the coordinator is
information-free about a transaction, it does know that, either has the transaction terminated
with the non-presumed outcome, or the only possible outcome will be the presumed one.
In general, the design of a distributed protocol can be done by first specifying the states of
knowledge and then providing the implementation of the state transitions. This is similar to the
use of formal specifications like VDM and Z in the design of sequential software. Janssen [8]
gave a classification of some common abstract transitions among states of knowledge and
discussed how they can be realized in different contexts.
For the baseline presumed nothing 2PC sketched above, we can present the corresponding states
of knowledge as bellow:
VOTEp = YESp ∨ NOp
DECISIONc = COMMITc ∨ ABORTc
COMMITc = Kc(∧p∈T YESp)
ABORTc = Kc(∨p∈T NOp ∨ (TIMEOUTc(some local event at c))
VOTEp
YESp
∧p∈T
DECISIONc
NOp
TIMEOUTc
COMMITc
∧p∈T
OUTCOMEp
END2PCc
∨p∈T
or
ABORTc
or
COMMITp
ABORTp
∧p∈T
∧p∈T
COMMIT_ENDc
ABORT_ENDc
Figure 1. Knowledge dependency graph of presumed nothing 2PC
OUTCOMTp = COMMITp ∨ ABORTp
COMMITp = KpCOMMITc
ABORTp = Kp(ABORTc∨ NOp)
END2PCc = COMMIT_ENDc ∨ ABORT_ENDc
COMMIT_ENDc = Kc (∧p∈T COMMITp)
ABORT_ENDc = Kc (∧p∈T ABORTp)
Basically, the states of knowledge and transitions among them form a knowledge-dependency
graph. For example, the state transitions for the baseline presumed nothing 2PC corresponds to
the graph in Figure 1. This knowledge dependency graph provides a basic structure of a correct
(i.e., meets the agreement and termination requirements) atomic commit protocol.
5.4
Implementing atomic commit protocols
Decoupling specification from implementation enables flexibility. Here we discuss some of the
implementation options.
5.4.1 Stable knowledge and recovery
One important goal in protocol design is to optimize performance and minimize runtime
overhead. In atomic commit protocols, network messages and disk accesses are the most
significant runtime overhead.
In our discussions so far, we require that the propositions (or knowledge states) are stable, i.e.,
once a fact becomes true, it will remain true forever. However in our system model, processes
are subject to failure at any possible moment and can restart thereafter. This implies that every
fact must be stably logged, which is a tremendous runtime overhead.
In practice however, facts need not be stable forever. Data about a fact that is not needed any
more, can be garbage collected. In fact, one important requirement is that all data about a
transaction must be eventually garbage collected.
Basically, the lifetime of data about a fact is bounded to its usage in knowledge ascription. For
example, YESp will contribute to COMMITc and ¬NOp, so its lifetime is bounded to the
lifetime of NOp and the time at which COMMITc is known. Note that the lifetime of mutually
exclusive facts are always bounded together, such as YESp and NOp. So it suffices to only
consider the lifetime of VOTEp.
Some facts need not be logged at all. For example, the fact FAILp will persist as long as the
process remains failed. As soon as the process restarts and the necessary information is reestablished, the fact that it has failed may not be of interest any more.
Some facts are subsumed by other facts. For example, it is always safe to assume that the time
between the events failc and restartc is greater than the value set in event timeoutc, so
TIMEOUTc is implied by FAILc and need not be logged.
Some local facts hang closely with one another and can be stably recorded in a single log. For
example, NOp directly leads to ABORTp, so the two facts can be recorded in a single log.
5.4.2 Knowledge and messages
In the theory of process knowledge [10], it is known that if at time t1 a fact ϕ local to process q
is unknown to process p and at time t2 > t1 p knows ϕ, then there is a message chain from q to p
between t1 and t2. This means that in the knowledge dependency graph, if there is an edge
between knowledge of different processes, then in protocol executions corresponding to the
graph, there is a message chain between the two processes. How this message chain occurs is an
implementation issue. Here we have at least the following choices:
•
Pull and push of knowledge.
This applies to virtually every piece of knowledge in the graph. For votes, typically a
pull model is applied, where the coordinator pulls this knowledge from the participants
with a Prepare message. In certain optimizations, a push model is used instead, such as
the 2PC with Early Prepare and Unsolicited Yes-Votes. Decisions, on the other hand,
are typically pushed from the coordinator to the participants. However, if a participant
does not hear from the coordinator for too long, it may pull this knowledge from the
coordinator by sending an inquiry message.
•
Knowledge collection and notification through a particular communication topology.
o
Stars. All messages go through the coordinator. This is quite natural in our
example, because all communication is in fact between the coordinator and the
participants. This version of 2PC with direct communication between the
coordinator and the participants is known as the flat 2PC.
o
Hierarchy. Processes are organized in a tree, typically rooted by the
coordinator. This is also known as tree 2PC or 2PC with interposition, which is
most widely used. One important reason for its wide use is that this tree
structure usually overlaps with the remote invocation structure of the
application.
o
Linear. This can be considered a special form of hierarchy.
o
Ring. This is not very much explored in practice, basically because establishing
such a ring structure per transaction is expensive. However, if there are a lot of
transactions with the same set of participants, this ring structure can be
established once and used many times. In such circumstances (which is often
the case in many applications), advantages of ring-structured communication
can be obtained. Advantages include: smaller number of messages, predictable
upper bound of transfer time of messages through all participants, piggyback of
several messages into one token through the same structure, etc.
The message chain theory has some other important implications to the implementation of an
atomic commit protocol. Here we discuss two of them.
In the knowledge dependency graph, a knowledge state may depend on some other knowledge
states. In other words, there can be multiple paths toward the same knowledge state. The
number of paths, however, may change during the execution of the protocol. For example, there
are two possible paths leading to END2PCc at the beginning of the protocol: COMMITp or
ABORTp, both from all participants. When the fact COMMITc is reached, the number of paths
reduces to only COMMITp from all participants. For OUTCOMEp, at the beginning, it depends
on COMMITc, ABORTc or NOp. After voting Yes, the number of paths reduces to COMMITc
and ABORTc. We can describe such dependency with a dependency set of a knowledge state,
which is a set of multiple subsets of processes. For END2PCc, it is {T}, both at the beginning
and after COMMITc is reached. For OUTCOMEp, it is {{c}, {p}} at the beginning and {{c}}
after a Yes vote. We say that END2PCc is blocked upon T, because the knowledge END2PCc
will not be obtained until all processes in T are available (not necessary at the same time). Note
that OUTCOMEp’s dependency set is {{c}, {p}} at the beginning. This means that OUTCOMEp
can be obtained based on some local knowledge (here NOp) only. So OUTCOMEp is not
blocking at the beginning. It is, however, blocked upon {c} after having voted Yes.
Blocking is an undesirable property of the protocol. It is more serious for some knowledge state
than other and deserves more concern. In 2PC, the blocking effect on OUTCOMEp (after Yes
vote) is more serious than the blocking effect on END2PCc. The main reason is that a
participant cannot release locks on data resources until it knows the outcome of a transaction. In
fact, 2PC is known as a blocking protocol due to this particular blocking effect. In general, the
blocking effect can be ameliorated in two ways:
•
Enhanced implementation, by improving the availability of the processes (and the
communication link with them) in the dependency set (e.g. Paxos 2PC [4]).
•
Different design, with a knowledge dependency graph in which the critical knowledge
state is never blocked by a remote process (e.g. 3PC).
Another use of the message-chain theory is, if it is known that there will be a massage chain
between two processes, then other information can be piggybacked along the same message
chain. The ring-structured communication above is one example. As another example, from the
knowledge dependency graph in Figure 1, we know that if a participant votes Yes, then there
will be a message chain from the coordinator back to the participant for the outcome to be
known. This participant-coordinator-participant chain can then be used for other purposes. For
example, in systems where disk writes at participants are more expensive than message passing,
the redo and undo logs, instead of being written to disk, can be piggybacked along the message
chain and still be guaranteed to be available when the outcome action commitp or abortp is
performed. Optimizations like Coordinator Log and Implicit Yes-Votes apply this principle.
5.4.3 Other implementation issues
In the discussions earlier, we mentioned the two different roles a coordinator plays in an atomic
commit protocol: achieving agreement and ensuring termination. It is therefore quite natural to
separate these roles in two coordinators in special circumstance. Transfer of coordinator in
linear 2PC is one such example.
Another interesting issue is how flexible a protocol execution can be. We consider the
flexibilities in two abstraction levels.
•
Within the same knowledge dependency graph. In the previous section, we discussed
different implementations that can be made for the same knowledge dependency graph.
Choice of different implementations can be made either statically or dynamically.
•
Switching between knowledge dependency graphs. Different design strategies may lead
to different knowledge dependency graphs. For example, the knowledge dependency
graphs corresponding to presumed nothing 2PC and presumed abort 2PC are different.
However, as long as required knowledge is available, an execution can switch from one
graph to another.
Our discussion on implementation issues so far is mainly based on the simple knowledge
dependency graph in Figure 1. The graph will look more complicated when, for example,
additional vote possibilities are possible (Yes, No, NoMatter/ReadOnly, Yes_heuristic,
Yes_compensate). The basic principle would still be the same.
6
Related Work
The logic of process knowledge has been used in design and analysis of atomic commit
protocols. In protocol design, the main focus has been on achieving agreement, mostly without
considering the effect of failure on decision making [6][8][12]. In protocol analysis, the focus
has been on proving impossibilities and lower bounds on number of messages [6][7][10][13].
There has not been much work on relating different coordination protocols in one single
framework, either for better understanding of the area or flexible runtime incorporation of them.
Chrysanthis et al. [2] gave a detailed overview of 2PC variations. We feel our way (design with
knowledge dependency graph and implementation of it) of looking at them is more logical and
offers us the possibility to achieve other goals such as flexibility (and switching between
different protocols).
7
Summery and Future Work
In this paper, we briefly overviewed a general distributed system model and the logic of process
knowledge. We then apply this logic as a tool to first formally specify the atomic commitment
problem and later on discuss the design and construction of atomic commit protocols. A design
starts with a design strategy which leads to a knowledge dependency graph. Different
implementations can be applied to the same graph. During the discussions, we mentioned
several protocol variations found in the literature. In fact, we believe that all 2PC variations in
the literature can be captured within this framework. Finally we mentioned some other
interesting issues such as flexibility.
This work is part of the Arctic Beans project, whose primary goal is to support flexibility and
adaptability in an open component-based enterprise architecture. We hope the logic of process
knowledge could be a useful tool to reason about and design flexible and adaptable coordination
protocols within this architecture.
8
References
[1] Bernstein, P. A., V. Hadzilacos and N. Goodman, Concurrency Control and Recovery in
Database Systems, Addison-Wesley, 1987.
[2] Chrysanthis, P. K., G. Samaras and Y. J. Al-Houmaily, “Recovery and Performance of
Atomic Commit Processing in Distributed Database Systems”, In V. Kumar and M. Hsu
(eds.), Recovery Mechanisms in Database Systems, pp 370-416, Prentice Hall PTR, 1998.
[3] Fagin, R., J. Y. Halpern, Y. Moses and M. Y. Vardi, Reasoning About Knowledge,
paperback edition, MIT Press, 2003.
[4] Gray, J. and L. Lamport, “Consensus on Transaction Commit”, Technical report MSR-TR2003-96, 2003.
[5] Gray, J. and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan
Kaufmann Publishers, 1993.
[6] Hadzilacos, V., “A Knowledge-Theoretic Analysis of Atomic Commit Protocols”, In
Proceedings of 6th ACM Symposium on Principles of Database Systems, 1987.
[7] Halpern, J. and Y Moses, “Knowledge and Common Knowledge in a Distributed
Environment”, In Proceedings of 3rd ACM Symposium on Principles of Distributed
Computing, 1984.
[8] Janssen, W., “Layers as Knowledge Transitions in the Design of Distributed systems”, In
Proceedings of 1st International Workshop on Tools and Algorithms for Construction and
Analysis of System, Lecture Notes in Computer Science, Vol. 1019, pp. 238-263, SpringerVerlag, 1995.
[9] Lampson, B. W. and David B. Lomet, “A New Presumed Commit Optimization for Two
Phase Commit”, In Proceedings of 19th International Conference on Very Large Data
Bases, 1993.
[10] Mazer, M. S. and F. H. Lochovsky, “Analyzing Distributed Commitment by Reasoning
about Knowledge”, Technical Report CRL 90/10, DEC-CRL, 1990.
[11] Mohan, C., Bruce G. Lindsay and Ron Obermarck, “Transaction Management in the R*
Distributed Database Management System”. ACM Transactions on Database Systems,
11(4), pp 378-396, 1986.
[12] Moses, Y. and O. Kislev, “Knowledge-Oriented Programming”, In Proceedings of 12th
ACM Symposium on Principles of Distributed Computing, 1993.
[13] Neiger, G. and R. A. Bazzi, “Using Knowledge to Optimally Achieve Coordination in
Distributed Systems”, Theoretical Computer Science, 220 (1), pp 31-65, 1999.
[14] Object Management Group, CORBA Services Specification, Chapter 10, Transaction
Service Specification, December, 1998.
[15] Samaras, G., K. Britton, A. Citron and C. Mohan, “Two-Phase Commit Optimizations in a
Commercial Distributed Environment”, Distributed and Parallel Databases, 3(4), pp 325360, 1995.
[16] X/Open Company Ltd., Distributed Transaction Processing: The XA Specification.
Document number XO/CAE/91/300, 1991.
Download