Stabilizers: A Safe Lightweight Checkpointing Abstraction for
Concurrent Programs
Lukasz Ziarek
Philip Schatz
Suresh Jagannathan
Department of Computer Science
Purdue University
{lziarek,schatzp,suresh}@cs.purdue.edu
Abstract

A checkpoint is a mechanism that allows program execution to be restarted from a previously saved state. Checkpoints can be used in conjunction with exception handling abstractions to recover from exceptional or erroneous events, to support debugging or replay mechanisms, or to facilitate algorithms that rely on speculative evaluation. While relatively straightforward to describe in a sequential setting, for example through the capture and application of continuations, it is less clear how to ascribe a meaningful semantics for safe checkpoints in the presence of concurrency. For a thread to correctly resume execution from a saved checkpoint, it must ensure that all other threads which have witnessed its unwanted effects after the establishment of the checkpoint are also reverted to a meaningful earlier state. If this is not done, data inconsistencies and other undesirable behavior may result. However, automatically determining what constitutes a consistent global state is not straightforward, since thread interactions are a dynamic property of the program; requiring applications to specify such states explicitly is not pragmatic.

In this paper, we present a safe and efficient on-the-fly checkpointing mechanism for concurrent programs. We introduce a new linguistic abstraction called stabilizers that permits the specification of per-thread checkpoints and the restoration of globally consistent checkpoints. Global checkpoints are computed through lightweight monitoring of communication events among threads (e.g., message-passing operations or updates to shared variables). Our implementation results show that the memory and computation overheads of using stabilizers average roughly 4 to 6% on our benchmark suite, leading us to conclude that stabilizers are a viable mechanism for defining restorable state in concurrent programs.

Keywords: Concurrent programming, checkpointing, consistency, rollback, continuations, exception handling, message-passing, shared memory.

1. Introduction
Checkpointing mechanisms allow applications to preserve and restore state. Checkpoints have obvious utility for error recovery [29],
program replay and debugging [30, 39]; they can be used to support applications that engage in transactional behavior [18, 20, 41],
speculative execution [26] or persistence [12, 34]; and, they can be
used to build exception handlers that restore memory to a previous
state [36]. In functional languages, continuations provide a simple
checkpointing facility: defining a checkpoint corresponds to capturing a continuation [38], and restoring a checkpoint corresponds
to invoking this continuation. In the presence of references, a sensible checkpoint state would also need to store the current values of
these references along with the continuation.
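In the sequential setting, this recipe can be made concrete even without first-class continuations by snapshotting the store at the checkpoint. The following Python sketch (our illustration, not code from the paper) captures the essence: a checkpoint saves a deep copy of the current reference values, and restoring reinstates that copy:

```python
import copy

def establish(store):
    """Define a checkpoint: save the current values of all references."""
    return copy.deepcopy(store)

def restore(store, checkpoint):
    """Revert the references to the values they held at the checkpoint."""
    store.clear()
    store.update(copy.deepcopy(checkpoint))

store = {"x": 0, "log": []}
cp = establish(store)
store["x"] = 42            # effects performed after the checkpoint...
store["log"].append("hi")
restore(store, cp)         # ...are undone on restoration
print(store)               # {'x': 0, 'log': []}
```

In a continuation-based implementation the resumption point comes for free with the captured continuation; here, re-execution would simply re-enter the code following establish.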
Unfortunately, defining and manipulating checkpoints becomes
significantly more complex in the presence of concurrency. A
thread that wishes to establish a checkpoint can simply save its
local state, but of course there is no guarantee that the global state
of the program will be consistent if control ever reverts back to this
point. For example, suppose a communication event via messagepassing occurs between two threads and the sender subsequently
rolls back control to a local checkpointed state established prior to
the communication. A spurious unhandled execution of the (re)sent
message may result because the receiver has no knowledge that a
rollback of the sender has occurred, and thus has no need to expect retransmission of a previously executed message. In general,
the problem of computing a sensible checkpoint requires computing the transitive closure of dependencies manifest among threads
from the time the checkpoint is established to the time it is invoked.
A simple remedy would require the state of all active threads to be
simultaneously recorded whenever any thread establishes a checkpoint. While this solution is sound, it can lead to substantial inefficiencies and complexity. For instance, if a thread wishes to revert
its state to a previously established checkpoint, threads which have
witnessed its effects after the checkpoint must be unrolled. However, there may be other threads unaffected by the checkpointed
thread’s actions. A scheme that fails to recognize this distinction
would be overly conservative in its treatment of rollback, and would
be inefficient in practice, especially if checkpoints are restored often.
Existing checkpoint approaches can be classified into four broad
categories: (a) schemes that require applications to provide their
own specialized checkpoint and recovery mechanisms [5, 6]; (b)
schemes in which the compiler determines where checkpoints can
be safely inserted [4]; (c) checkpoint strategies that require operating system or hardware monitoring of thread state [9, 22, 25];
and (d) library implementations that capture and restore state [13].
Checkpointing functionality provided by an application or a library relies on the programmer to define meaningful checkpoints.
2006/1/17
For many multi-threaded applications, determining these points is
non-trivial because it requires reasoning about global, rather than
thread-local, invariants. Compiler and operating-system injected
checkpoints are transparent to the programmer. However, transparency comes at a notable cost: checkpoints may not be semantically meaningful or efficient to construct. If all application threads
run within the same process, saving and restoring checkpoints may
be expensive since only a small number of threads may be affected
as a result of a rollback. If application threads run in separate processes, each process may get checkpointed at different intervals,
violating the need to preserve global state. Furthermore, concurrent programs exacerbate the question of where and how to inject sensible checkpoints because thread interaction is often nondeterministic. In a multi-threaded program, an injected checkpoint
may capture different global state each time the same piece of code
is executed.
1.1 Stabilizers
To alleviate the burden of defining and restoring safe checkpoints
in multi-threaded programs, we propose a new language abstraction
for dynamic, composable, on-the-fly checkpointing called stabilizers. Stabilizers encapsulate two operations. The first initiates monitoring of code for communication and thread creation events, and
establishes thread-local checkpoints when monitored code is evaluated. The other reverts control and state to a safe global checkpoint. The checkpoints defined by stabilizers are composable: a
monitored procedure can freely create and return other monitored
procedures. Stabilizers can be arbitrarily nested, and work in the
presence of a dynamically-varying number of threads.
Our checkpointing mechanism is a middle ground between the
transparency afforded by operating systems or compilers, and the
precision afforded by user-injected checkpoints. In our approach,
applications are required to identify meaningful per-thread program
points where a checkpoint may be saved; when a rollback operation occurs, control reverts to one of these saved checkpoints for
each thread. The exact set of checkpoints chosen is determined by
safety conditions that ensure that a globally consistent state is preserved. Our approach guarantees that when a thread is rolled-back
to a checkpointed state C, other threads with which it has communicated prior to its last rollback are in states consistent with C.
No action is taken for threads that have not been influenced by its
effects.
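The guarantee above reduces to a reachability computation over recorded inter-thread dependencies: an edge t → u is recorded when u witnesses an effect of t, and only threads reachable from the rolled-back thread must revert. A minimal Python sketch of that computation (identifiers are ours, not the implementation's):

```python
from collections import defaultdict, deque

def affected_threads(edges, root):
    """Threads that must revert: the transitive closure of
    'witnessed an effect of' edges, starting at the rolled-back thread."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    seen, work = {root}, deque([root])
    while work:
        t = work.popleft()
        for u in graph[t]:
            if u not in seen:
                seen.add(u)
                work.append(u)
    return seen

# t1 communicated with t2, and t2 with t3; t4 and t5 only spoke to each other.
edges = [("t1", "t2"), ("t2", "t3"), ("t4", "t5")]
print(sorted(affected_threads(edges, "t1")))   # ['t1', 't2', 't3']
```

Threads t4 and t5 are left untouched, mirroring the claim that no action is taken for threads uninfluenced by the rolled-back thread's effects.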
To calculate how to revert threads to safe checkpoints, the runtime system keeps track of thread states and traces communication
events among threads. When a spawn, communication, or shared
data access operation occurs, information is recorded in a runtime
data structure about the operation as well as the thread’s continuation prior to the event.
To process a rollback, the runtime-maintained data structure is
consulted to determine, for every thread, a checkpoint that maintains global consistency. A rollback is sensible only if re-execution results in a different execution path than the one that
caused the rollback to occur initially. Thus, our solution critically
relies on non-deterministic behavior: to ensure that rollbacks do not
simply lead to infinite looping, subsequent re-execution of threads
should lead to different thread interactions and behavior. For most
multi-threaded programs, this requirement is not particularly onerous. However, to allow applications further control over the state in
which a checkpoint resumes, stabilizers also come equipped with
a simple compensation mechanism [7] that may be executed before
control is reverted to the checkpointed state. Compensations also
allow stabilizers to work in the presence of non-restorable actions
such as I/O.
1.2 Contributions
This paper makes three contributions:
1. The design and semantics of stabilizers, a new language abstraction for defining and restoring meaningful checkpoints in
concurrent programs in which threads communicate through
both message-passing and shared memory. To the best of our
knowledge, stabilizers are the first language-centric design of
a checkpointing facility for concurrent programs with dynamic
thread creation, and selective communication [32] that provides
global consistency and safety guarantees when checkpointed
state is restored.
2. A lightweight dynamic monitoring algorithm faithful to the semantics that constructs efficient checkpoints based on the context in which a restore action is performed. Efficiency is defined
with respect to the amount of rollback required to ensure that
all threads resume execution after a checkpoint is restored in a
consistent global state.
3. A detailed evaluation study in SML that quantifies the cost of
using stabilizers on various open-source server-style applications. Our results reveal that the cost of defining and monitoring thread state is small, typically adding roughly no more than
6% overhead to overall execution time. Memory overheads are
equally modest.
The remainder of the paper is structured as follows. In Section 2,
we provide a motivating example that highlights the issues associated with safely checkpointing computation in concurrent programs. Section 3 describes the stabilizer abstraction. An operational semantics is given in Section 4. A strategy for incremental
construction of checkpoint information is given in Section 5. Implementation details are provided in Section 6. A detailed evaluation
on the costs and overheads of using stabilizers is given in Section 7,
related work is presented in Section 8, and conclusions are given in
Section 9.
2. Motivating Example
To motivate the use of stabilizers, consider the program fragment
shown below. The program spawns a new asynchronous thread of
control to compute the application of f to argument arg . Function
f in turn spawns a thread to compute the application of g to argument arg’ , and sends data on a channel c that may potentially
be read by g . In addition, g also reads data from channel c’ that
is not accessed by f . Assume channels are synchronous: sends and receives block if there is no matching recipient or sender, respectively. In the example, both f and g can raise a Timeout exception. The desired behavior when a timeout occurs is to re-execute
the procedure that raises the exception, ensuring that none of the
procedure’s earlier effects remain visible when it is reapplied. Ordinarily, an exception handler will not be able to restore the global
program state such that the procedure can be re-executed safely.
let val c  = channel()
    val c' = channel()
    fun g y = ... recv(c) ... recv(c')
              ...
              raise Timeout
              ...
              handle Timeout => ...
    fun f x = let val _ = spawn(g(...))
                  val _ = send(c,x)
                  ...
              in if ...
                 then raise Timeout
                 else ...
              end handle Timeout => ...
in spawn(f(arg))
end
For example, notice that f not only spawns a new thread, but
also communicates data along channel c . Simply re-executing f
without reverting c ’s receivers would be obviously incorrect. Furthermore, any thread such as g that reads a value communicated
by f may store that value, propagate it to other threads, or perform
arbitrary computation based on that value. If f ’s re-execution propagates a new value for x , its previous effects are no longer valid.
Thus, to ensure that the execution of the handler results in a benign
global state, all threads potentially affected by f ’s communication
on c must be identified. However, the handler’s obligations do not
end here. For example, consider thread g that also receives data
from channel c’ . If g is reverted because it read data produced by
f or because it raised a Timeout exception after receiving data
from c’ , then the communication it established on channel c’
is also suspect. Reverting g without clearing that communication
could lead to inconsistencies; the sender on c’ assumes that the
value it produced has been consumed, but g ’s re-execution would
expect a communication of that value. Observe that both f and g
provide their own local Timeout handlers; propagating the effect
of a timeout exception raised locally involves restructuring the program to communicate such events among concurrently executing
threads. Because the various scenarios that may arise depend upon
runtime scheduling decisions, any scheme that purports to allow
safe reversion of a previously executed computation must dynamically discover safe states for all affected threads.
3. Programming Model
To dynamically calculate globally consistent states, we introduce
a new abstraction called stabilizers. Stabilizers are expressed using two primitives, stable and stabilize , with the following
signatures:
stable    : (’a -> ’b) -> ’a -> ’b
stabilize : unit -> ’a
A stable section is a monitored section of code whose effects are
guaranteed to be reverted as a single unit. The primitive stable is
used to define stable sections. Given function f the evaluation of
stable f yields a new function f’ identical to f except that interesting communication, shared memory access, locks, and spawn
events are monitored and grouped.
The second primitive, stabilize reverts execution to a dynamically calculated global state; this state will always correspond
to a program state that existed immediately prior to execution of
a stable section, communication event, or thread spawn point for
each thread. We qualify this claim by observing that external non-revocable actions that occur within a stable section that must be
reverted (e.g., I/O, foreign function calls, etc.) must be handled
explicitly by the application. Stabilizers are equipped with a simple compensation mechanism for this purpose; we elide the details
here. Note that similar to operations like raise or exit that also
do not return, the result type of stabilize is synthesized from
the context in which it occurs.
Unlike classical checkpointing schemes [37] or exception handling mechanisms, the result of invoking stabilize does not
guarantee that control reverts to the state corresponding to the
dynamically-closest stable section. The choice of where control reverts depends upon the actions undertaken by the thread within the
stable section in which the stabilize call was triggered, or the
event prior to the stabilize call if it occurs outside a stable section.
Composability is an important design feature of stabilizers:
there is no a priori classification of the procedures that need to
be monitored, nor is there any restriction against nesting stable sections. Moreover, stabilizers separate the construction of monitored
code regions from the capture of state. It is only when a monitored
procedure is applied that a potential thread-local restoration point is
established. The application of such a procedure may in turn result
Figure 1. Interaction between stable sections.
in the establishment of other independently constructed monitored
procedures. In addition, these procedures may themselves be applied and have program state saved appropriately; thus, state saving and restoration decisions are determined without prejudice to
the behavior of other monitored procedures.
3.1 Interaction of Stable Sections
When a stabilize action occurs, matching inter-thread events are
unrolled as pairs. If a send is unrolled, the matching receive must
also be unrolled. If a thread spawns another thread within a stable
section that is being reverted, this new thread (and all its actions)
must also be discarded. All threads which read from a shared
variable must be unrolled if the thread that wrote the value is
reverted to a state prior to the write. A program state is stable with
respect to a statement if there is no thread executing in this state
affected by the statement (i.e., all threads are in a point within their
execution prior to the execution of the statement and its transitive
effects).
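These pairing rules can be viewed as a fixpoint over the event log: discarding one endpoint of a matched event forces the other endpoint back at least as far, until no rule applies. A small Python sketch of that fixpoint (illustrative only; the names and encoding are ours):

```python
# Each event: (kind, thread_a, index_a, thread_b, index_b), where an index is
# the event's position in that thread's local history. COMM pairs a send with
# its matching receive; the same scheme covers spawns and shared-data access.
def revert_points(events, initial):
    """initial maps each rolled-back thread to the local index it reverts to;
    returns the least fixpoint satisfying the pairwise unrolling rules.
    points[t] <= i means the event at index i in thread t is discarded."""
    points = dict(initial)          # thread -> earliest discarded index
    changed = True
    while changed:
        changed = False
        for kind, ta, ia, tb, ib in events:
            # if one side of a matched event is discarded, discard the other
            if points.get(ta, float("inf")) <= ia and points.get(tb, float("inf")) > ib:
                points[tb] = ib
                changed = True
            if points.get(tb, float("inf")) <= ib and points.get(ta, float("inf")) > ia:
                points[ta] = ia
                changed = True
    return points

# t1 sends to t2 at (t1:3, t2:1); t2 later sends to t3 at (t2:4, t3:2).
events = [("COMM", "t1", 3, "t2", 1), ("COMM", "t2", 4, "t3", 2)]
print(revert_points(events, {"t1": 2}))
# {'t1': 2, 't2': 1, 't3': 2}: the rollback cascades through t2's later send
```

Note how reverting t1 forces t2 back before its receive, which in turn discards t2's later send and so unrolls t3 as well; this is exactly the transitive behavior illustrated in Fig. 1.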
For example, consider thread t1 that enters a stable section S1
and initiates a communication event with thread t2 (see Fig. 1(a)).
Suppose t1 subsequently enters another stable section S2 , and again
establishes a communication with thread t2 . Suppose further that t2
receives these events within its own stable section S3 . The program
states immediately prior to S1 and S2 represent feasible checkpoints as determined by the programmer, depicted as white circles
in the example. If a rollback is initiated within S2 , then a consistent
global state would require that t2 revert back to the state associated
with the start of S3 since it has received a communication from t1
initiated within S2 . However, discarding the actions within S3 now
obligates t1 to resume execution at the start of S1 since it initiated
a communication event within S1 to t2 (executing within S3 ). Such
situations can also arise without the presence of nested stable sections. Consider the example in Fig. 1(b). Once again, the program
is obligated to revert t1 , since the stable section S3 spans communication events from both S1 and S2 .
3.2 Example
One significant use of stabilizers is their ability to allow thread-local reasoning about the effects of exceptional conditions, thereby
alleviating the need to have handlers that manage global state.
Swerve is an open-source webserver written in Concurrent
ML [32] whose structure could benefit from using stabilizers. Fig. 2
shows the definition of fileProducer, a Swerve function that
sends a requested file to a client by chunking the file contents
into a series of smaller packets. The body of the local procedure
sendFile() opens a file, establishes a time limit using Abort,
and then communicates the file to the receiver. The original code
must check in every iteration of the file processing loop whether
a timeout has occurred by polling the Abort.aborted function.
Although not shown in the figure, Swerve creates a new timeout event for every request, which is triggered whenever processing exceeds a time limit. If a timeout has occurred, the procedure is obligated to notify the consumer through an explicit send on the channel consumer.

fun fileProducer name abort consumer =
    let fun sendFile() =
        let fun loop strm =
            if Abort.aborted abort
            then CML.send(consumer, XferAbort)
            else let val chunk =
                     BinIO.inputN(strm, fileChunk)
                 in ... read a chunk of the file
                    ... and send to consumer
                    loop strm
                 end
        in (case BinIOReader.openIt abort name
             of NONE   => ()
              | SOME h => (loop (BinIOReader.get h);
                           BinIOReader.closeIt h))
        end
    ...

fun fileProducer name abort consumer =
    let fun sendFile() =
        let fun loop strm =
            let val chunk =
                    BinIO.inputN(strm, fileChunk)
            in ... read a chunk of the file
               ... and send to consumer
               loop strm
            end
        in stable fn() =>
            (case BinIOReader.openIt abort name
              of NONE   => ()
               | SOME h => (loop (BinIOReader.get h);
                            BinIOReader.closeIt h)) ()
        end
    ...

Figure 2. File processing code from the Swerve web server. The first version is the original; the second is modified to use stabilizers.
Stabilizers allow us to abstract this explicit notification process.
By wrapping the file processing logic of sendFile() in a stable
section, we guarantee its re-execution whenever a stabilize()
operation is performed. Suppose a call to stabilize replaced the
call to CML.send(consumer, XferAbort). This action would result in unrolling both the actions of sendFile() as well as the consumer, since the consumer is in the midst of receiving file chunks.
Note that if the timeout occurs before the consumer receives any
data, the stabilizer semantics ensures that the consumer remains
unaffected.
However, a cleaner solution presents itself. Suppose that we
modify the definition of Abort to invoke stabilize, and wrap
its operations within a stable section. Now, there is no need
for any thread to poll for the timeout event. Since the function
BinIOReader.openIt abort name establishes a time limit via
the abort event by communicating with Abort, a call to
stabilize() will unroll all actions related to processing this file.
This transformation therefore allows us to specify a timeout mechanism without having to embed non-local timeout handling logic
within each thread that potentially can be affected by it.
4. Semantics
Our semantics is defined in terms of a core call-by-value functional
language with threading primitives (see Fig. 3). For perspicuity, we
first present an interpretation of stabilizers in which evaluation of
stable sections immediately results in the capture of a consistent
global checkpoint. This semantics reflects a simpler characterization of checkpointing than the informal description presented in
Section 3. In Section 5, we refine this approach to construct checkpoints incrementally.
In the following, we use metavariables v to range over values, and δ to range over stable section or checkpoint identifiers. We also use P for thread terms, and e for expressions. We use an over-bar to represent a finite ordered sequence; for instance, f̄ represents f1 f2 . . . fn. The term α.ᾱ denotes the prefix extension of the sequence ᾱ with the single element α, ᾱ.α the suffix extension, ᾱᾱ′ denotes sequence concatenation, φ denotes an empty sequence, and ᾱ ≤ ᾱ′ holds if ᾱ is a prefix of ᾱ′. We write |ᾱ| to denote the length of sequence ᾱ.
Our communication model is a message-passing system with
synchronous send and receive operations. We do not impose a
strict ordering of communication actions on channels; communication actions on the same channel are paired non-deterministically.
To model asynchronous sends, we simply spawn a thread to perform the send.[1] To this core language we add two new primitives: stable and stabilize. The expression stable(λ x.e) creates a stable function, λs x.e, whose effects are monitored. When a stable function is applied, a global checkpoint is established, and its body, denoted Stable(e), is evaluated in the context of this checkpoint. The second primitive, stabilize, is used to initiate a rollback.
The syntax and semantics of the language are given in Fig. 3. Expressions are variables, locations (representing channels), λ-abstractions, function applications, thread creations, communication actions that send and receive messages on channels, or operations that define stable sections and stabilize global state to a consistent checkpoint. We do not consider references in this core language, as they can be modeled in terms of operations on channels.
We describe how to explicitly handle references in Section 5.2.
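The encoding alluded to here is the classic one: a reference becomes a server thread that owns the value and serializes read and write requests sent over channels. A Python approximation using threads and queues (our sketch; Python queues are buffered, unlike the synchronous channels of the core language):

```python
import threading, queue

def make_ref(init):
    """Model a mutable ref as a thread holding the value; reads and writes
    become communications, so the core semantics needs no ref cells."""
    requests = queue.Queue()
    def server():
        value = init
        while True:
            op, arg, reply = requests.get()
            if op == "read":
                reply.put(value)
            elif op == "write":
                value = arg
                reply.put(None)
            else:                       # any other op shuts the server down
                return
    threading.Thread(target=server, daemon=True).start()
    def read():
        reply = queue.Queue()
        requests.put(("read", None, reply))
        return reply.get()
    def write(v):
        reply = queue.Queue()
        requests.put(("write", v, reply))
        reply.get()                     # wait for acknowledgement
    return read, write

read, write = make_ref(0)
write(5)
print(read())   # 5
```

Because every access is a communication, the monitoring described for channels automatically captures shared-state dependencies under this encoding.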
A program is defined as a collection of threads. Each thread is uniquely identified, and is also associated with a stable section identifier (denoted by δ) that indicates the stable section the thread is currently executing within. Stable section identifiers are ordered under a relation that allows us to compare them. Thus, we write t[e]_δ̄ if a thread with identifier t is executing expression e in the context of stable section δ; since stable sections can be nested, the notation generalizes to sequences δ̄ of stable section identifiers, with sequence order reflecting nesting relationships. Our semantics is defined up to congruence of threads (P ∥ P′ ≡ P′ ∥ P). We write P ⊖ {t[e]} to denote the set of threads that do not include a thread with identifier t, and P ⊕ {t[e]} to denote the set of threads that contain a thread executing expression e with identifier t.
We use evaluation contexts to specify the order of evaluation within a thread, and to prevent premature evaluation of the expression encapsulated within a spawn expression. We define a thread context E_δ̄^{t,P}[e] to denote an expression e available for execution by thread t ∈ P; the sequence δ̄ indicates the ordered sequence of nested stable sections within which the expression evaluates.
Local reductions within a thread are specified by an auxiliary
relation, e → e′ that evaluates expression e within some thread
to a new expression e′ . The local evaluation rules are standard:
holes in evaluation contexts can be replaced by the value of the
expression substituted for the hole, function application substitutes
the value of the actual parameter for the formal in the function body,
channel creation results in the creation of a new location that acts as
[1] Asynchronous receives are not feasible without a mailbox abstraction [32].
a receptacle for message transmission and receipt, and a function
supplied as an argument to a stable expression yields a stable
function.
Program evaluation is specified by a global reduction relation, P, ∆ ⟹^{α} P′, ∆′, that maps a program state to a new program state. A program state consists of a collection of evaluating threads (P) and a stable map (∆) that defines a finite function associating stable section identifiers to states. We tag each evaluation step with an action α that defines the effects induced by evaluating the expression. We write ⟹^{α} * to denote the reflexive, transitive closure of this relation. The actions of interest are those that involve communication events or manipulate stable sections.
There are five global evaluation rules. The first describes
changes to the global state when a thread to evaluate expression
e is created; the new thread evaluates e in a context without any
stable identifier. A communication event synchronously pairs a
sender attempting to transmit a value along a specific channel in
one thread with a receiver waiting on the same channel in another
thread.
The most interesting global evaluation rules are ones involving
stable sections. When a stable section is newly entered, a new stable
section identifier is generated; these identifiers are related under a
total order that allows the semantics to express properties about
lifetimes and scopes of such sections. The newly created identifier
is mapped to the current global state and this mapping is recorded
in the stable map. This state represents a possible checkpoint. The
actual checkpoint for this identifier is computed as the state in
the stable map that is mapped by the least stable identifier. This
identifier represents the oldest active checkpointed state. This state
is either the state just checkpointed, in the case when the stable
map is empty, or represents some earlier checkpoint state known
to not have any dependencies with actions in other stable sections.
In other words, if we consider stable sections as forming a tree
with branching occurring at thread creation points, the checkpoint
associated with any stable section represents the root of the tree at
the point where control enters that section.
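The "least identifier" rule can be mimicked with an ordered map: entering a stable section binds a fresh, monotonically increasing identifier, but the checkpoint recorded for it is the state bound to the smallest live identifier. A small Python sketch of ∆ (the class and method names are ours):

```python
import itertools

class StableMap:
    """Tracks per-section checkpoints; restoration uses the oldest live one."""
    def __init__(self):
        self._next = itertools.count()
        self._map = {}                    # stable id -> captured state

    def enter(self, current_state):
        delta = next(self._next)          # fresh id, greater than all others
        if self._map:                     # oldest active checkpoint wins
            oldest = self._map[min(self._map)]
        else:                             # empty map: the current state itself
            oldest = current_state
        self._map[delta] = oldest
        return delta

    def exit(self, delta):
        del self._map[delta]              # no longer an interesting checkpoint

    def stabilize(self, delta):
        return self._map[delta]           # state to which execution reverts

sm = StableMap()
d1 = sm.enter("state@S1")
d2 = sm.enter("state@S2")                 # bound to the older checkpoint
print(sm.stabilize(d2))                   # state@S1
```

The nested section d2 restores to the state captured at the outer section's entry, matching the rule that the checkpoint for a section is the root of its tree of dependent sections.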
When a stable section exits, the thread context is appropriately
updated to reflect that the state captured when this section was
entered no longer represents an interesting checkpoint; the stable
section identifier is removed from the resulting stable map. A
stabilize action simply reverts to the state captured by the outermost
stable section of this thread. Note that, while easily defined, the semantics is conservative: there may be checkpoints, involving less unrolling, that the semantics does not identify.
The soundness of the semantics is defined by an erasure property on stabilize actions. Consider the sequence of actions α that
comprise a possible execution of a program. Suppose that there is a
stabilize operation that occurs in α. The effect of this operation
is to revert the current global program state to an earlier checkpoint.
However, given that program execution successfully continued after the stabilize call, it follows that there exists a sequence of
actions from the checkpoint state that yields the same state as the
original, but which does not involve execution of the stabilize
operation. In other words, stabilize actions can never manufacture new states, and thus have no effect on the final state of program
evaluation. We formalize this property in the following safety theorem.
Theorem [Safety]:
Let

  E_δ̄^{t,P}[e], ∆ ⟹^{ᾱ.ST} * P′, ∆′ ⟹^{β̄} * P″ ∥ t[v], ∆f.

Then, there exists an equivalent evaluation

  E_δ̄^{t,P}[e], ∆ ⟹^{ᾱ′.β̄} * P″ ∥ t[v], ∆f

where ᾱ′ ≤ ᾱ.

The proof of Theorem [Safety] for one stabilize call is given as a weak bisimulation between evaluations; the general proof follows by induction. We first build up the needed machinery before constructing the weak bisimulation.
Lemma [1]:
Let

  P0, ∆0 ⟹^{α0} P1, ∆1 ⟹^{α1} . . . ⟹^{αn−1} Pn, ∆n ⟹^{ST} Pn+1, ∆n+1.

Then, there exists an i, 0 ≤ i ≤ n, such that

  Pn+1, ∆n+1 = Pi, ∆i.

We prove Lemma [1] by examining the global evaluation rules. Clearly, P, ∆ ⟹^{ST} P′, ∆′ returns a process and stable map contained in ∆ (i.e., there exists some δ for which ∆(δ) = (P′, ∆′)). Therefore, it is enough to show that there does not exist a state mapped in ∆ which was not evaluated under ⟹^{α}. The only rule which modifies the stable map is ⟹^{SS}, which adds the current state to the stable map. Notice that ⟹^{ST} is undefined for any context which is not within a stable section. It is therefore immediately clear that Lemma [1] holds.
Consider the finite state automaton Q, presented in Fig. 4, which represents the evaluation E_δ̄^{t,P}[e], ∆ ⟹^{ᾱ′.β̄} * P″ ∥ t[v], ∆f. We represent the initial state as q1, and the state reachable after ᾱ′ transitions as q2. The final state is represented as q3. Without loss of generality, we collapse all intermediate states during the evaluation.
Figure 4. Finite state automaton for regular evaluation (Q).

Figure 5. Finite state automaton for stabilized evaluation (P).

The finite state automaton P, presented in Fig. 5, represents the evaluation E_δ̄^{t,P}[e], ∆ ⟹^{ᾱ.ST} * P′, ∆′ ⟹^{β̄} * P″ ∥ t[v], ∆f. States q1, q2, and q4 represent the initial state, the state reachable after ᾱ′ transitions, and the final state, respectively. The additional
Syntax:

  P ::= P ∥ P | t[e]_δ̄
  e ::= x | l | λ x.e | λs x.e
      | mkCh() | send(e, e) | recv(e) | spawn(e)
      | stable(e) | Stable(e) | stabilize

Program States:

  P ∈ Process      t ∈ Tid      x ∈ Var      l ∈ Channel      δ ∈ StableId
  v ∈ Val          = unit | λ x.e | λs x.e | l
  α, β ∈ Op        = {SP, COMM, SS, ST, ES}
  Λ ∈ StableState  = Process × StableMap
  ∆ ∈ StableMap    = StableId →fin StableState

Evaluation Contexts:

  E ::= • | E(e) | v(E)
      | send(E, e) | send(l, E)
      | recv(E) | stable(E) | Stable(E)
  E_δ̄^{t,P}[e] ::= P ∥ t[E[e]]_δ̄

Local Evaluation Rules:

  λ x.e (v)     → e[v/x]
  mkCh()        → l,  l fresh
  stable(λ x.e) → λs x.e

  e → e′  implies  E_δ̄^{t,P}[e], ∆ ⟹ E_δ̄^{t,P}[e′], ∆

Global Evaluation Rules:

  (SP)    t′ fresh
          E_δ̄^{t,P}[spawn(e)], ∆ ⟹^{SP} P ∥ t[E[unit]]_δ̄ ∥ t′[e]_φ, ∆

  (SS)    δ′ fresh,  δ′ ≥ δ for all δ ∈ Dom(∆)
          ∆′ = ∆[δ′ ↦ (E_δ̄^{t,P}[λs x.e(v)], ∆)]
          Λ = ∆′(δmin),  δmin ≤ δ for all δ ∈ Dom(∆′)
          E_δ̄^{t,P}[λs x.e(v)], ∆ ⟹^{SS} E_{δ′.δ̄}^{t,P}[Stable(e[v/x])], ∆[δ′ ↦ Λ]

  (COMM)  P = P′ ∥ t[E[send(l, v)]]_δ̄ ∥ t′[E′[recv(l)]]_δ̄′
          P, ∆ ⟹^{COMM} P′ ∥ t[E[unit]]_δ̄ ∥ t′[E′[v]]_δ̄′, ∆

  (ES)    E_{δ.δ̄}^{t,P}[Stable(v)], ∆ ⟹^{ES} E_δ̄^{t,P}[v], ∆ − {δ}

  (ST)    ∆(δ) = (P′, ∆′)
          E_{δ.δ̄}^{t,P}[stabilize], ∆ ⟹^{ST} P′, ∆′

Figure 3. A core call-by-value language for stabilizers. Stable(e) denotes the runtime form marking an evaluating stable section.
The additional state q3, however, represents the state given by the α′′ transitions from q2. We define α as the concatenation α′.α′′, namely all transitions before a stabilize operation. By the result of Lemma [1], a stabilize operation will result in a previously seen state; without loss of generality, we define this state as q2. Therefore, the evaluation of a stabilize operation results in the same state as the evaluation E_δ^{t,P}[e], ∆ =⇒^{α′}* P′, ∆′, the state saved by the rule =⇒^{SS}. The state which is restored by the rule =⇒^{ST} is precisely the state stored in ∆ by the rule =⇒^{SS}. Clearly, all states reachable by the evaluation of α′′ are no longer reachable after the evaluation of =⇒^{ST}. As such, we can regard the pair of transitions between p2 and p3 as τ transitions.
Lemma [2]
The finite state automaton P strongly simulates Q.

We can construct a strong simulation between P and Q; namely, the automaton P strongly simulates Q, but the converse does not hold. The simulation is given by the equivalences {(q1, p1), (q2, p2), (q3, p4)}. The converse does not hold due to the auxiliary state p3, for which there is no equivalent state or set of states in Q.
Since there does not exist a strong bisimulation between P and Q (see Lemma [2]), we show a weak bisimulation between P and Q. Our initial step is to define τ transitions in P. Since a stabilize operation precisely undoes effects within a program, we would like to consider the sequence of transitions from p2 to p3 under α′′ and the transition from p3 to p2 under ST as a sequence of τ transitions (i.e. transitions with no observable effects on the system).
Lemma [3]
Transitions between states p2 and p3, and between p3 and p2, have no observable effects on the system when considered as a pair.

By Lemma [1] we know that a stabilize will result in a previously seen state. This state is captured and stored in ∆; we therefore define the transition p2 =⇒^{β} p4 as the sequence of transitions p2 →^{τ} p3 →^{τ} p2 →^{β} p4.
Theorem [Weak Bisimulation]
There exists a weak bisimulation between the finite state automata P and Q.

We show a bisimulation between the following transitions in P: p1 →^{α′} p2 and p2 =⇒^{β} p4, and the matching transitions in Q: q1 →^{α′} q2 and q2 →^{β} q3. The weak bisimulation is given by the equivalences {(q1, p1), (q2, p2), (q3, p4)} based on the previously defined transitions. Therefore α′ < α. Consider the case where α′ = α. In this situation the state p3 does not exist and the stabilize operation transitions from state p2 back to p2. The bisimulation for the resulting FSA and Q is similar to the weak bisimulation between P and Q, with p2 =⇒^{β} p4 defined as p2 →^{τ} p2 →^{β} p4.
The general case holds by simple induction on the number of stabilize calls within a sequence of reductions. The base case is a sequence with no call to stabilize, and holds trivially. Assume Theorem [Safety] holds for sequences containing n stabilize calls; inserting an extra call to stabilize within such a sequence yields a new valid sequence. Since we can construct a weak bisimulation between a sequence containing n calls to stabilize and a sequence with no calls to stabilize, by Lemma [1], Lemma [3], and Theorem [Weak Bisimulation] we obtain Theorem [Safety] for the sequence with n + 1 calls to stabilize.
5. Incremental Construction
Incremental checkpoint construction results in less rollback than a global point-in-time checkpoint strategy:

Theorem [Correspondence]
Let

    E_δ^{t,P}[e], ∆0 =⇒^{α.ST.β}* P′′ ‖ t[v], ∆f

Then whenever

    E^{t,P}[e], G0 ;^{α.ST.β′}* P′′ ‖ t[v], Gf,

we have |β′| ≤ |β|.
Although correct, our semantics is overly conservative because a
global checkpoint state is computed upon entry to every stable section. Thus, all threads, even those unaffected by effects that occur in
the interval between when the checkpoint is established and when
it is restored, are unrolled. A better alternative would be to restore thread
state based on the actions witnessed by threads within checkpoint
intervals. If a thread T observes action α performed by thread T ′
and T is restored to a state that precedes the execution of α, T ′
can be restored to its latest local checkpoint state that precedes its
observance of α. If T witnesses no actions of other threads, it is
unaffected by any stabilize calls those threads might make. This
strategy leads to an improved checkpoint algorithm by reducing the
severity of restoring a checkpoint, limiting the impact to only those
threads that witness global effects, and establishing their rollback
point to be as temporally close as possible to their current state.
Fig. 6 presents a refinement to the semantics that incrementally
constructs a dependency graph as part of program execution. This
new definition does not require stable section identifiers or stable
maps to define checkpoints. Instead, it captures the communication
actions performed by threads within a data structure. This structure
consists of a set of nodes representing interesting program points,
and edges that connect nodes that have shared dependencies. Nodes
are indexed by ordered node identifiers, and hold thread state. We also define maps that associate each thread with its current node and with its set of active stable sections.
Informally, the actions of each thread in the graph are represented by a chain of nodes that defines a temporal ordering on thread-local actions. Backedges are established to nodes representing stable sections; these nodes define possible per-thread checkpoints.
Sources of backedges are communication actions that occur within
a stable section, or the exit of a nested stable section. Edges also
connect nodes belonging to different threads to capture inter-thread
communication events.
Graph reachability is used to ascertain a global checkpoint
when a stabilize action is performed: when thread T performs a
stabilize call, all nodes reachable from T ’s current node in the
graph are examined, and the context associated with the least such
reachable node for each thread is used as the thread-local checkpoint for that thread. If a thread is not affected (transitively) by the
actions of the thread performing the rollback, it is not reverted to
any earlier state. The collective set of such checkpoints constitutes
a global state.
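The reachability computation described above can be sketched concretely. The following Python is an illustrative model only (node, thread, and function names are our own, and "state" is an opaque value standing in for a captured continuation): ordered node identifiers let us pick, for each affected thread, the temporally earliest node reachable from the stabilizing thread's current node; threads with no reachable node are left untouched.

```python
def reach(edges, start):
    """All nodes reachable from start via directed edges (iterative DFS)."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(edges.get(n, ()))
    return seen

def stabilize(nodes, edges, cur, t):
    """nodes: id -> (thread, saved state); cur: thread -> current node id.
    Node ids are ordered by creation, so the least reachable node per
    thread is the temporally earliest one; threads with no reachable
    node do not appear in the result and are not rolled back."""
    least = {}
    for n in reach(edges, cur[t]):
        thread, _ = nodes[n]
        if thread not in least or n < least[thread]:
            least[thread] = n
    return {thread: nodes[n][1] for thread, n in least.items()}

# Nodes n1..n6 in creation order; thread t1 stabilizes from its node n4.
nodes = {1: ("t1", "s1"), 2: ("t2", "s2"), 3: ("t2", "s3"),
         4: ("t1", "s4"), 5: ("t2", "s5"), 6: ("t3", "s6")}
edges = {4: [5, 1], 5: [4, 3], 3: [2]}
cur = {"t1": 4, "t2": 5, "t3": 6}
checkpoints = stabilize(nodes, edges, cur, "t1")
# t1 and t2 revert to their earliest reachable states; t3 witnessed no
# action reachable from t1's node and is not reverted at all.
```

The collective result is exactly the "global state" described above: a per-thread checkpoint for every affected thread, and no entry for unaffected ones.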
The evaluation relation P, G ;^{α} P′, G′ evaluates a process P executing action α with respect to a communication graph G to yield a new process P′ and new graph G′. As usual, ;^{α}* denotes the reflexive, transitive closure of this relation. The auxiliary relation t[e], G ⇓ G′ models intra-thread actions within the graph. It creates a new node to capture thread-local state, and sets the current node marker for the thread to this node. In addition, if the action occurs within a stable section, a back-edge is established from that node to this section. This back-edge is used to identify a potential rollback point. The auxiliary operation ADDNODE creates new nodes, ensuring that the newly created node's identifier is greater than that of any existing node in the graph.
We define a correspondence between the two semantics that formalizes the intuition that incremental checkpoint construction results in less rollback than a global point-in-time checkpoint strategy (Theorem [Correspondence] at the start of this section).
We begin the proof by defining a correspondence between a stable map ∆ and a graph G. Unlike the graph yielded by our incremental checkpoint construction, the graph corresponding to a stable map is a set of disjoint subgraphs: we can view each checkpoint as its own subgraph containing a node for each thread. Given ∆ we construct a graph G by the following procedure: for each δ ∈ dom(∆) such that ∆(δ) = P, ∆′, create a new subgraph Gi = (N, E, η, σ) with a node ADDNODE(t[E[e]], N) for every t[E[e]] ∈ P, and an edge {n, n′} added to E for all n, n′ ∈ N with n ≠ n′. Lastly, we take the union of all the subgraphs to form the complete graph G.
Given E^{t,P0}[e], G0 ;^{α}* Pn, Gn, we define t[E[e]] ≤ t[E′[e′]] if there exists an evaluation E^{t,Pi}[e′], Gi ;^{β}* Pj ‖ t[E[e]], Gj within E^{t,P0}[e], G0 ;^{α}* Pn, Gn. We define the relation a = REACH(n) ≤ b = REACH(n′) if for all t[E[e]] ∈ a and t′[E′[e′]] ∈ b, whenever t = t′ we have t[E[e]] ≤ t′[E′[e′]]. We say one reach is less than or equal to another if all continuations stored in nodes yielded by REACH(n) are wholly contained in the continuations stored in nodes yielded by REACH(n′). Any reach which is a proper subset of another reach is also considered strictly less than the other reach.
We define the relation G ≤ G′ for any two graphs if ∀n ∈ G, ∀n′ ∈ G′, REACH(n) ≤ REACH(n′), where n′ corresponds to a δ and n corresponds to a node created at the start of a stable section. Since each δ corresponds to a global checkpoint of the oldest active stable section and communication is synchronous, this checkpoint represents the case in which all active stable sections were connected through communication. There are two cases to examine: communication between stable sections, and communication outside of stable sections. Since the global checkpoint reflects the state prior to the oldest simultaneously active stable section, we know that such a checkpoint will unroll the program to a point where no stable section is active. Notice that communication outside of a stable section is not a factor: if such a communication is reachable from a stable section, a communication must have occurred with the stable section prior to it. Since each node within the graphs stores a continuation, we know REACH(n) ≤ REACH(n′) for each n corresponding to a stable section. Therefore, for any graph G yielded by our incremental checkpointing algorithm, G ≤ G′ for the graph G′ derived from our global checkpoints.
5.1 Example
To illustrate the semantics, consider the sequence of actions shown
in Fig. 7. When thread t1 spawns a new thread t2 , a new node n2 is
created to represent t2 ’s actions, and an edge between the current
node referenced by t1 in the graph (n1) to n2 is established (see
(a)). When t2 enters a stable section, a new node n3 is created, and
an edge between n2 and n3 is recorded (see (b)). When threads t1
and t2 are available to engage in a communication event, new nodes
for both threads are created, and a bi-directional edge between them
is established (see (c)). Notice, that for thread t2 a backward edge
to the node representing its stable section is also added; thus, if
t1 unrolls to an earlier checkpoint that precedes the send, t2 will
roll back to its local checkpoint represented by n2. Notice that t1
is not executing within any stable section currently, and thus no
SYNTAX AND EVALUATION CONTEXTS:

    P ::= P ‖ P | t[e]
    E^{t,P}[e] ::= P ‖ t[E[e]]

PROGRAM STATES:

    n ∈ Node = NodeId × Process
    e ∈ Edge = Node × Node
    η ∈ CurNode = Thread →fin Node
    σ ∈ StableSections = Thread →fin Node*
    G ∈ Graph = P(Node) × P(Edge) × CurNode × StableSections

LOCAL EVALUATION:

              e → e′
    ──────────────────────────────────
    E^{t,P}[e], G ; E^{t,P}[e′], G

GLOBAL EVALUATION RULES:

    n′ = ADDNODE(t[e], N)
    G′ = ⟨ N ∪ {n′}, E ∪ {η(t) ↦ n′, n′ ↦ n}, η[t ↦ n′], σ ⟩
    ────────────────────────────────────────────────────────
    t[e], ⟨ N, E, η, σ[t ↦ n.ns] ⟩ ⇓ G′

    n = ADDNODE(t[e], N)
    G′ = ⟨ N ∪ {n}, E ∪ {η(t) ↦ n}, η[t ↦ n], σ ⟩
    ────────────────────────────────────────────────────────
    t[e], ⟨ N, E, η, σ[t ↦ φ] ⟩ ⇓ G′

    t′ fresh   G = ⟨ N, E, η, σ ⟩
    n = ADDNODE(t′[e], N)
    G′ = ⟨ N ∪ {n}, E ∪ {η(t) ↦ n}, η[t′ ↦ n], σ[t′ ↦ φ] ⟩
    ──────────────────────────────────────────────────────── (SP)
    E^{t,P}[spawn(e)], G ; P ‖ t[E[unit]] ‖ t′[e], G′

    n = ADDNODE(t[E[λs x.e(v)]], N)   G = ⟨ N, E, η, σ ⟩
    G′ = ⟨ N ∪ {n}, E ∪ {η(t) ↦ n}, η[t ↦ n], σ[t ↦ n.σ(t)] ⟩
    ──────────────────────────────────────────────────────── (SS)
    E^{t,P}[λs x.e(v)], G ; E^{t,P}[stable(e[v/x])], G′

    P = P′ ‖ t[E[send(l, v)]] ‖ t′[E′[recv(l)]]
    t[E[send(l, v)]], G ⇓ G′   t′[E′[recv(l)]], G′ ⇓ G′′
    G′′ = ⟨ N, E, η, σ ⟩
    G′′′ = ⟨ N, E ∪ {η(t) ↦ η(t′), η(t′) ↦ η(t)}, η, σ ⟩
    ──────────────────────────────────────────────────────── (COMM)
    P, G ; P′ ‖ t[E[unit]] ‖ t′[E′[v]], G′′′

    G = ⟨ N, E, η, σ ⟩   σ(t) = n′.ns
    n = ADDNODE(t[E[stable(v)]], N)
    G′ = ⟨ N ∪ {n}, E ∪ {n ↦ hd(ns)}, η[t ↦ n], σ[t ↦ ns] ⟩
    ──────────────────────────────────────────────────────── (ES)
    E^{t,P}[stable(v)], G ; E^{t,P}[v], G′

    τ = REACH(η(t))
    P′ = {t[e] | ⟨ i, t[e] ⟩ ∈ τ s.t. i ≤ j ∀⟨ j, t[e′] ⟩ ∈ τ}
    P′′ = P′ ⊕ (P ⊖ P′)
    ──────────────────────────────────────────────────────── (ST)
    E^{t,P}[stabilize], G ; P′′, G′

Figure 6. Incremental Checkpoint Construction.
backward edge from n5 can be constructed. When t1 enters a stable section, the graph is augmented as before (see (d)). Finally, when the threads communicate again, a similar extension of the graph is performed (see (e)). Since both threads are now in stable sections, backward edges are established from the current node to the closest enclosing stable section.

Figure 7. Incremental checkpoint construction (actions: SP.SS.COMM.SS.COMM).

5.2 Handling References
We have thus far elided the presentation of how to efficiently
track shared memory access to properly support state restoration
actions. Naively tracking each read and write separately would
be inefficient. There are two problems that must be addressed:
(1) unnecessary writes should not be logged; and (2) spurious
dependencies induced by reads should be avoided. We present a
technique to solve the first problem, and then show a modification
to the semantics to handle the second.
Notice that for a given stable section, it is enough to monitor
the first write to a given memory location since each stable section
is unrolled as a single unit. For every write to location l, we only need to monitor the first read of l performed by another thread; if this write is unrolled, the reading thread must be unrolled to a point at least before this read.
before this read. To monitor writes, we create a version list in which
we store reference/value pairs. For each reference in the list, its
matching value corresponds to the value held in the reference prior
to the execution of the stable section. When the program enters
a stable section, we create an empty version list for this section.
When a write is encountered within a stable section, a write barrier
is executed that checks if the reference being written is in the
version list maintained by the section. If there is no entry for the
reference, one is created, and the current value of the reference is
recorded. Otherwise, no action is required.
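The write barrier described above can be sketched as follows. This Python is an illustration of the version-list idea, not MLton's actual barrier; names like `StableSection` and `_ABSENT` are our own, and the shared store is modeled as a plain dictionary.

```python
_ABSENT = object()   # sentinel: location did not exist at section entry

class StableSection:
    """Per-section version list: log a location's prior value on the
    first write only, so the section can be unrolled as a single unit."""
    def __init__(self, store):
        self.store = store        # shared mutable store: ref -> value
        self.version_list = {}    # ref -> value held at section entry

    def write(self, ref, value):
        # Write barrier: only the first write to ref records the old value.
        if ref not in self.version_list:
            self.version_list[ref] = self.store.get(ref, _ABSENT)
        self.store[ref] = value

    def unroll(self):
        # Restore every logged location to its value at section entry.
        for ref, old in self.version_list.items():
            if old is _ABSENT:
                self.store.pop(ref, None)
            else:
                self.store[ref] = old

store = {"a": 1}
section = StableSection(store)
section.write("a", 2)
section.write("a", 3)   # second write to "a": no further logging needed
section.write("b", 7)
section.unroll()        # store reverts to its state at section entry
```

Note that only the first write per location pays the logging cost; subsequent writes hit the barrier's fast path (a single membership check).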
Until a nested stable section exits, it is possible for a call to stabilize to unroll to the start of this section. Therefore, nested sections
require maintaining their own version lists. Version list information
in these sections must be propagated to the outer section upon exit.
However, the propagation of information from nested sections to
outer ones is not trivial: if the outer section has monitored a particular memory location that has also been updated by the inner
one, we only need to store the outer section’s version, and the value
preserved by the inner one can be discarded.
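The propagation rule for nested sections is a one-sided merge: the outer section's entry for a location is older, so it wins. A minimal sketch (the function name `propagate` is our own):

```python
def propagate(outer, inner):
    """On nested-section exit, merge the inner version list into the
    enclosing section's. If the outer section already logged a location,
    its (older) entry wins and the inner entry is discarded; locations
    only the inner section logged are adopted by the outer one."""
    for ref, old in inner.items():
        outer.setdefault(ref, old)
    return outer

outer = {"x": 1}           # outer section logged x before the inner ran
inner = {"x": 5, "y": 2}   # inner section logged x (a later value) and y
propagate(outer, inner)    # outer keeps x -> 1 and gains y -> 2
```

Using `setdefault` makes the precedence explicit: an existing outer entry is never overwritten, which is exactly the discard condition described above.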
Efficiently monitoring read dependencies requires us to adopt
a different methodology. We assume read operations occur much
more frequently than writes, and thus it would be impractical to
have barriers on all read operations to record dependency information in the communication graph. However, we observe that for a program to be correctly synchronized, all reads and writes to a location l must be protected by a lock ℓ.
Therefore, it is sufficient to monitor lock acquires/releases to infer shared memory dependencies. By incorporating happens-before
dependency edges on lock operations [28], rollback actions initiated by a writer to a shared location can be effectively propagated
to subsequent readers that mediate access to that location via a common lock.
To model this behavior we augment our semantics with a domain L, a mapping from a lock ℓ to a node n. We introduce a new lock acquire operation and add the following rule to the semantics. When a thread t acquires a lock ℓ, a backwards edge is added from L(ℓ), the node recorded for the last holder of ℓ, to the acquiring thread's current node; L is then updated to map ℓ to the acquiring thread's current node.
    t[E[acq(ℓ)]], G ⇓ ⟨ L, N, E, η, σ ⟩
    E′ = E ∪ {L(ℓ) ↦ η(t)}   L′ = L[ℓ ↦ η(t)]
    ──────────────────────────────────────────────────────── (ACQ)
    E^{t,P}[acq(ℓ)], G ; E^{t,P}[unit], ⟨ L′, N, E′, η, σ ⟩
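The lock bookkeeping can be sketched operationally. This Python is illustrative only (the map L of the rule is the dictionary `locks` here, and the function name `acquire` is our own): each acquisition links the previous holder's node to the acquirer's current node, so a rollback initiated by a writer reaches every subsequent acquirer of the same lock via graph reachability.

```python
def acquire(lock, t, graph):
    """graph holds directed edges, each thread's current node, and, per
    lock, the node of its last holder. Acquiring adds an edge from the
    previous holder's node to the acquirer's current node, then records
    the acquirer's node as the last holder of the lock."""
    edges, cur, locks = graph["edges"], graph["cur"], graph["locks"]
    prev = locks.get(lock)
    if prev is not None:
        edges.setdefault(prev, []).append(cur[t])
    locks[lock] = cur[t]

g = {"edges": {}, "cur": {"writer": 1, "reader": 2}, "locks": {}}
acquire("l", "writer", g)   # first acquisition: nothing to link yet
acquire("l", "reader", g)   # links the writer's node 1 to the reader's node 2
```

Because edges chain through successive holders, a stabilize by the writer transitively reaches every later reader of the lock, which is the happens-before propagation described above.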
6. Implementation
Our implementation is incorporated within MLton [21], a whole-program optimizing compiler for Standard ML. The main changes
to the underlying infrastructure were the insertion of lightweight
write barriers to track shared memory updates, and hooks to the
Concurrent ML [32] (CML) library to update the communication
graph.
Because our implementation is an extension of the core CML
library, it supports first-class events [32] as well as channel-based
communication. The handling of events is no different than our
treatment of messages. If a thread is blocked on an event with an associated channel, we insert an edge from that thread’s current node
to the channel. Our implementation supports the basic send and
recv events, from which more complex first-class events can be
generated via combinators. We are able to support CML’s selective
communication with no change to the basic algorithm. Since CML
imposes a strict ordering of communication events, each channel
must be purged of spurious or dead data after a stabilize action.
Each thread can be blocked on at most one communication event
and one channel. Therefore, there can be at most one value to clear
from a channel per thread when stabilizing a program. Since CML
stores both the blocking thread and the value on the channel, it is
straightforward to determine the values that must be cleared.
The size of the communication graph grows with the number of
communication events, thread creation actions, and stable sections
entered. However, we do not need to store the entire graph for the
duration of program execution. As the program executes, parts of
the graph will become unreachable. These sections of the graph
can be safely pruned, although we find in our experiments that
graph sizes grow relatively slowly, and aggressive pruning is rarely
warranted.
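The pruning described above is itself a reachability computation: a node that no thread's current node can reach can never again be chosen as a rollback target. A hedged sketch, with invented names, over the same dictionary-based graph representation used earlier:

```python
def prune(nodes, edges, cur):
    """Drop graph nodes not reachable from any thread's current node;
    such nodes can never again serve as a rollback target, so they
    (and edges touching them) are safe to discard."""
    live, stack = set(), list(cur.values())
    while stack:
        n = stack.pop()
        if n not in live:
            live.add(n)
            stack.extend(edges.get(n, ()))
    return ({n: s for n, s in nodes.items() if n in live},
            {n: [m for m in dsts if m in live]
             for n, dsts in edges.items() if n in live})

nodes = {0: "dead", 1: "s1", 2: "s2", 3: "s3", 4: "s4"}
edges = {3: [1], 4: [2]}
cur = {"t1": 3, "t2": 4}
nodes, edges = prune(nodes, edges, cur)   # node 0 is unreachable, dropped
```

Since each pruned node holds a captured continuation, pruning also releases the memory those continuations retain, which is where the savings reported above would come from.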
Figure 8. Synthetic benchmark overheads: (a) runtime overhead, (b) total allocation overhead.
7. Performance Results
To measure the cost of stabilizers with respect to various concurrent programming paradigms, we present a synthetic benchmark to quantify pure memory and time overheads, and examine several open-source CML benchmarks to illustrate average overheads in real programs. The benchmarks were run on an Intel P4 2.4 GHz machine with one GByte of memory running Gentoo Linux, compiled and executed using MLton release 20041109.
To measure the costs of our abstraction, our benchmarks are executed in two different ways: once with no actions monitored and no checkpoints constructed, and once with the entire program monitored (effectively wrapped within a stable call) but with no checkpoints actually restored and graph pruning disabled.
7.1 Synthetic Benchmarks
Our synthetic benchmark spawns two threads, a source and a sink,
that communicate asynchronously. We measure the cost of our abstraction with regard to an ever increasing load of asynchronous
events. This benchmark measures the overhead of logging program state and communication dependencies with no opportunity
for amortizing these costs among other non-stabilizer related operations.
The runtime overhead is presented in Fig. 8(a), and the total allocation overhead is presented in Fig. 8(b). As expected, the cost of maintaining the graph grows linearly with the number of asynchronous communications performed, while runtime overheads remain constant.
remain constant. There is a significant initial memory and runtime
cost due to pre-allocating hash tables used to store the current node
of each thread and to maintain an association between channels and
their nodes.
7.2 Open-Source Benchmarks
Our other benchmarks include several eXene [17] benchmarks: Triangles and Nbody, mostly display programs that create threads to draw objects, and Pretty, a pretty-printing library written on top of eXene. The eXene toolkit is a library for X Windows, implementing the functionality of xlib; it is written in CML and comprises roughly 16K lines of Standard ML. Events from the X server and control messages between widgets are distributed in streams (coded as CML event values) through the window hierarchy. eXene manages the X calls through a series of servers, dynamically spawned for each connection and screen. The last benchmark we consider is Swerve, a webserver written in CML whose major modules communicate with one another using message-passing channel communication; it makes no use of eXene. All the benchmarks create various CML threads to handle events; communication occurs mainly through message-passing on channels, with occasional updates to shared data.
Benchmark    LOC incl. eXene    Threads    Channels    Comm. Events    Shared Writes    Shared Reads    Graph Size (MB)    Runtime (%)    Memory (%)
Triangles    16501              205        79          187             88               88              0.19               0.59           8.62
N-Body       16326              240        99          224             224              273             0.29               0.81           12.19
Pretty       18400              801        340         950             602              840             0.74               6.23           20.00
Swerve       9915               10532      231         902             9339             80293           5.43               6.60           4.08

Table 1. Benchmark characteristics, dynamic counts, and normalized overheads.
For these benchmarks, stabilizers exhibit a runtime slow down
of approximately 6% over a CML program in which monitoring is
not performed (see Table 1). The cost of using stabilizers is only
dependent on the number of inter-thread actions and shared data
dependencies that are logged. These overheads are well amortized
over program execution. Memory overheads to maintain the communication graph are larger, although in absolute terms, they are
quite small.
Because we capture continuations prior to executing communication events and entering stable sections, part of the memory
cost is influenced by representation choices made by the underlying
compiler. Our mechanism would benefit further from a lightweight
low-overhead representation of continuations [35, 3]. Because we
were interested in understanding the worst-case bounds of our approach, graph pruning was not employed to reduce graph size. Despite this constraint, benchmarks such as Swerve that create over 10K threads and employ non-trivial communication patterns require only 5MB to store the communication graph, a roughly 4% overhead over the memory consumption of the original program.
8. Previous Work
Being able to checkpoint and rollback parts or the entirety of an execution has been the focus of notable research in the database [11]
as well as parallel and distributed computing communities [14, 23,
25]. Reverting to previous state provides a measure of fault tolerance for long-running applications [37]. Classically, checkpoints
have been used to provide fault tolerance for long-running, critical
executions, for example in scientific computing [2] but have been
typically regarded as heavyweight entities to construct and maintain.
Recent work in the programming languages community has explored abstractions and mechanisms closely related to stabilizers
and their implementation for maintaining consistent state in distributed environments [15], detecting deadlocks [10], and gracefully dealing with unexpected termination of communicating tasks
in a concurrent environment [16]. For example, kill-safe thread abstractions [16] provide a mechanism to allow cooperating threads
to operate even in the presence of abnormal termination. Stabilizers can be used for a similar goal, although the means by which
this goal is achieved is quite different. Stabilizers rely on unrolling
thread dependencies of affected threads to ensure consistency instead of employing specific runtime mechanisms to reclaim resources. In general, the work presented here is distinguished from
these other efforts in its focus on defining a coordinated safe checkpointing scheme for concurrent programs, but we believe stabilizers
can also be used as a substrate upon which the techniques suggested
by these other efforts could be expressed.
In addition to stabilizers, functional language implementations
have utilized continuations for similar tasks. For example, Tolmach
and Appel [38] describe a debugging mechanism for SML/NJ that
utilized captured continuations to checkpoint the target program at
given time intervals. This work was later extended [39] to support
multithreading, and was used to log non-deterministic thread events
to provide replay abilities.
Another possibility for fault recovery is micro-reboot [8], a fine-grained technique for surgically recovering faulty application components which relies critically on the separation of data recovery
and application recovery. Micro-reboot allows for a system to be
restarted without ever being shut down by rebooting separate components. Such a recovery mechanism can greatly reduce downtime
for applications whose availability is critical. Unlike checkpointing schemes, which attempt to restore a program to a consistent
state within the running application, micro-reboot quickly restarts
an application component. However, micro-reboot suffers the same
problems that plague most transparent fault recovery mechanisms;
namely, such constructs are ignorant of program semantics, resulting in their use only when an error becomes a system fault.
The ability to revert to a prior point within a concurrent execution is essential to transaction systems based on optimistic
(or speculative) concurrency [1, 17, 24]; outside of their role for
database concurrency control, such approaches can improve parallel program performance by profitably exploiting speculative execution [33, 40]. Optimistic concurrency allows multiple threads
to access a guarded object concurrently. As long as these data accesses are disjoint, no error occurs. If a thread commits its changes to a shared object that was accessed by another thread, the second thread must be reverted to a state prior to its data access to ensure a
serializable schedule. Harris proposes a transactional memory system [19] for Haskell that introduces a retry primitive to allow a
transactional execution to safely abort and be re-executed if desired
resources are unavailable. When the retry primitive is invoked, the
transaction is unrolled and re-executed. However, this work does
not propose to track or revert effectful thread interactions within
a transaction. In fact, such interactions are explicitly rejected by the
Haskell type-system.
9. Conclusions and Future Work
Stabilizers are a novel on-the-fly checkpointing abstraction for
concurrent languages. Unlike other transparent checkpointing
schemes, stabilizers are not only able to identify the smallest subset
of threads which must be unrolled, but also provide useful safety
guarantees. As a language abstraction, stabilizers can be used to
simplify program structure especially with respect to error handling, debugging, and consistency management. Our results indicate that stabilizers can be implemented with small overhead and
thus serve as an effective and promising checkpointing abstraction
for high-level concurrent programs.
There are several important directions we expect to pursue in the
future. Currently, we provide the programmer the ability to write
first-class stable functions, but a stabilize call is static. We could
envision a system where there exists an explicit pairing between
stable and stabilize, where a call to stabilize stabilizes a
specific stable section. The stabilize function itself would be
first-class, allowing threads to rollback stable sections within other
threads.
Our semantics and implementation rely critically on the ability
to capture and restore thread state. Language implementations for
SML provide this ability through the reification and application of
continuations. Whether one can devise an efficient implementation
for stabilizers in languages like Java or C# whose virtual machines
do not support such a facility remains an open question. We suspect
that recent results [31] that show how to use stack inspection to
reconstitute a thread’s runtime stack in these languages may be
potentially used to support stabilizers as well.
References
[1] A. Adya, R. Gruber, B. Liskov, and U. Maheshwari. Efficient
Optimistic Concurrency Control Using Loosely Synchronized
Clocks. SIGMOD Record (ACM Special Interest Group on
Management of Data), 24(2):23–34, June 1995.
[2] Saurabh Agarwal, Rahul Garg, Meeta S. Gupta, and Jose E. Moreira.
Adaptive Incremental Checkpointing for Massively Parallel Systems.
In ICS ’04: Proceedings of the 18th annual international conference
on Supercomputing, pages 277–286, New York, NY, USA, 2004.
ACM Press.
[3] Andrew W. Appel. Compiling with continuations. Cambridge
University Press, New York, NY, USA, 1992.
[4] Micah Beck, James S. Plank, and Gerry Kingsley. Compiler-Assisted
Checkpointing. Technical report, University of Tennessee, Knoxville,
TN, USA, 1994.
[5] Greg Bronevetsky, Daniel Marques, Keshav Pingali, and Paul
Stodghill. Automated Application-Level Checkpointing of MPI
Programs. In PPoPP ’03: Proceedings of the ninth ACM SIGPLAN
symposium on Principles and practice of parallel programming,
pages 84–94, New York, NY, USA, 2003. ACM Press.
[6] Greg Bronevetsky, Daniel Marques, Keshav Pingali, Peter Szwed, and
Martin Schulz. Application-Level Checkpointing for Shared Memory
Programs. In ASPLOS-XI: Proceedings of the 11th international
conference on Architectural support for programming languages and
operating systems, pages 235–247, New York, NY, USA, 2004. ACM
Press.
[7] R. Bruni, H. Melgratti, and U. Montanari. Theoretical Foundations
for Compensations in Flow Composition Languages. In POPL ’05:
Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on
Principles of programming languages, pages 209–220, New York,
NY, USA, 2005. ACM Press.
[8] G. Candea, S. Kawamoto, Y. Fujiki, G. Friedman, and A. Fox.
Microreboot - A Technique for Cheap Recovery. In 6th Symposium
on Operating Systems Design and Implementation, San Francisco,
California, 2004.
[9] Yuqun Chen, James S. Plank, and Kai Li. CLIP: A Checkpointing
Tool for Message-Passing Parallel Programs. In Supercomputing ’97:
Proceedings of the 1997 ACM/IEEE conference on Supercomputing,
pages 1–11, New York, NY, USA, 1997. ACM Press.
[10] Jan Christiansen and Frank Huch. Searching for Deadlocks while
Debugging Concurrent Haskell Programs. In ICFP ’04: Proceedings
of the ninth ACM SIGPLAN international conference on Functional
programming, pages 28–39, New York, NY, USA, 2004. ACM Press.
[11] Panos K. Chrysanthis and Krithi Ramamritham. ACTA: the
SAGA continues. In Database Transaction Models for Advanced
Applications, pages 349–397. Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 1992.
[12] Alan Dearle, Rex di Bona, James Farrow, Frans Henskens, Anders
Lindstrom, Stephen Norris, John Rosenberg, and Francis Vaughan.
Protection in Grasshopper: A Persistent Operating System. In
Proceedings of the Sixth International Workshop on Persistent Object
Systems, pages 60–78, London, UK, 1995. Springer-Verlag.
[13] William R. Dieter and James E. Lumpp Jr. A User-level Checkpointing Library for POSIX Threads Programs. In FTCS ’99: Proceedings of the Twenty-Ninth Annual International Symposium on
Fault-Tolerant Computing, page 224, Washington, DC, USA, 1999.
IEEE Computer Society.
[14] E. N. (Mootaz) Elnozahy, Lorenzo Alvisi, Yi-Min Wang, and
David B. Johnson. A Survey of Rollback-Recovery Protocols in
Message-Passing Systems. ACM Comput. Surv., 34(3):375–408,
2002.
[15] John Field and Carlos A. Varela. Transactors: a Programming
Model for Maintaining Globally Consistent Distributed State in
Unreliable Environments. In POPL ’05: Proceedings of the 32nd
ACM SIGPLAN-SIGACT symposium on Principles of programming
languages, pages 195–208, New York, NY, USA, 2005. ACM Press.
[16] Matthew Flatt and Robert Bruce Findler. Kill-safe Synchronization
Abstractions. In PLDI ’04: Proceedings of the ACM SIGPLAN 2004
conference on Programming language design and implementation,
pages 47–58, New York, NY, USA, 2004. ACM Press.
[17] Jim Gray and Andreas Reuter. Transaction Processing. Morgan Kaufmann, 1993.
[18] Tim Harris and Keir Fraser. Language Support for Lightweight Transactions. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages
388–402. ACM Press, 2003.
[19] Tim Harris, Simon Marlow, Simon Peyton Jones, and Maurice
Herlihy. Composable Memory Transactions. In ACM Conference
on Principles and Practice of Parallel Programming, 2005.
[20] Maurice Herlihy, Victor Luchangco, Mark Moir, and William N.
Scherer, III. Software transactional memory for dynamic-sized
data structures. In ACM Conference on Principles of Distributed
Computing, pages 92–101, 2003.
[21] MLton. http://www.mlton.org.
[22] D. Hulse. On Page-Based Optimistic Process Checkpointing. In
IWOOOS ’95: Proceedings of the 4th International Workshop on
Object-Orientation in Operating Systems, page 24, Washington, DC,
USA, 1995. IEEE Computer Society.
[23] Mangesh Kasbekar and Chita Das. Selective Checkpointing and
Rollback in Multithreaded Distributed Systems. In 21st International
Conference on Distributed Computing Systems, 2001.
[24] H. T. Kung and John T. Robinson. On Optimistic Methods for
Concurrency Control. TODS, 6(2):213–226, 1981.
[25] Kai Li, Jeffrey Naughton, and James Plank. Real-time Concurrent
Checkpoint for Parallel Programs. In ACM SIGPLAN Symposium
on Principles and Practice of Parallel Programming, pages 79–88,
1990.
[26] Jin Lin, Tong Chen, Wei-Chung Hsu, Pen-Chung Yew, Roy Dz-Ching Ju, Tin-Fook Ngai, and Sun Chan. A compiler framework for
speculative analysis and optimizations. In PLDI ’03: Proceedings
of the ACM SIGPLAN 2003 conference on Programming language
design and implementation, pages 289–299, New York, NY, USA,
2003. ACM Press.
[27] Lukasz Ziarek, Philip Schatz, and Suresh Jagannathan. Stabilizers: Safe
Lightweight Checkpointing for Concurrent Programs. Technical Report
CSD TR #05-023, Purdue University, 2005.
[28] Jeremy Manson, William Pugh, and Sarita V. Adve. The Java Memory
Model. In POPL ’05: Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages
378–391, New York, NY, USA, 2005. ACM Press.
[29] Robert H. B. Netzer and Mark H. Weaver. Optimal tracing and
incremental reexecution for debugging long-running programs. In
PLDI ’94: Proceedings of the ACM SIGPLAN 1994 conference on
Programming language design and implementation, pages 313–325,
New York, NY, USA, 1994. ACM Press.
[30] Robert H. B. Netzer and Jian Xu. Adaptive message logging
for incremental program replay. IEEE Parallel and Distributed
Technology, 1(4):32–39, 1993.
[31] Greg Pettyjohn, John Clements, Joe Marshall, Shriram Krishnamurthi, and Matthias Felleisen. Continuations from Generalized
Stack Inspection. ACM SIGPLAN International Conference on Functional Programming, 40(9):216–227, 2005.
[32] John Reppy. Concurrent Programming in ML. Cambridge University
Press, 1999.
[33] Martin Rinard. Effective Fine-Grained Synchronization for Automatically Parallelized Programs Using Optimistic Synchronization
Primitives. ACM Transactions on Computer Systems, 17(4):337–371,
November 1999.
[34] John Rosenberg, Alan Dearle, David Hulse, Anders Lindstrom,
and Stephen Norris. Operating system support for persistent and
recoverable computations. Commun. ACM, 39(9):62–69, 1996.
[35] Zhong Shao and Andrew W. Appel. Space-Efficient Closure
Representations. In LFP ’94: Proceedings of the 1994 ACM
conference on LISP and functional programming, pages 150–161,
New York, NY, USA, 1994. ACM Press.
[36] Avraham Shinnar, David Tarditi, Mark Plesko, and Bjarne Steensgaard.
Integrating Support for Undo with Exception Handling. Technical
Report MSR-TR-2004-140, Microsoft Research, 2004.
[37] Asser N. Tantawi and Manfred Ruschitzka. Performance Analysis of
Checkpointing Strategies. ACM Trans. Comput. Syst., 2(2):123–144,
1984.
[38] Andrew P. Tolmach and Andrew W. Appel. Debugging Standard ML
Without Reverse Engineering. In LFP ’90: Proceedings of the 1990
ACM conference on LISP and functional programming, pages 1–12,
New York, NY, USA, 1990. ACM Press.
[39] Andrew P. Tolmach and Andrew W. Appel. Debuggable Concurrency
Extensions for Standard ML. In PADD ’91: Proceedings of the 1991
ACM/ONR workshop on Parallel and distributed debugging, pages
120–131, New York, NY, USA, 1991. ACM Press.
[40] Adam Welc, Suresh Jagannathan, and Antony Hosking. Safe Futures
for Java. In Proceedings of the ACM SIGPLAN Conference on
Object-Oriented Programming Systems, Languages, and Applications,
pages 439–453. ACM Press, 2005.
[41] Adam Welc, Suresh Jagannathan, and Antony L. Hosking. Transactional
Monitors for Concurrent Objects. In European Conference on
Object-Oriented Programming, pages 519–542, 2004.
2006/1/17