Title goes here - Kent State University

advertisement
Global State Recording



definitions
global state recording
 FIFO
 Chandy-Lamport’s algorithm
 collecting global state
 incremental snapshot
 non-FIFO
 Lai-Yang two color algorithm
 Mittern’s vector clocks algorithm
consistent global snapshots
 causality and zigzag paths
 rollback dependency graph
Local State




local state LSi of a site (process) Si is an assignment of values to variables
of Si
sending send(mij) and receiving rec(mij) of message mij from Si to Sj may
influence LSi
we denote
 time(send(mij) or rec(mij)) the sequence number of the state in the
compuation after which send/receive occurs (note the difference with
Singhal)
 time(LSi) the state at which Si was recorded
to aid the reasoning we consider the messages sent/received by a process
as belonging to local state
we define
that is
• the message is in transit if it was sent but not received
• the message is inconsistent if it was received but never sent
Global State

global state is a collection of
local states of all processes
and set of messages in the
channels
 notice Singhal does not
use messages in his def. –
ours is more precise
global state is consistent if it does not have any inconsistent messages,
that is:
global state is transitless if there are no messages in transition,
that is:
note that a consistent state is not necessarily transitless and v.v.
what are the global states on the picture above?
Chandy-Lamport’s
Global State Recording Algorithm



works on arbitrary topology system with FIFO channels and arbitrary algorithm
whose snapshot is taken (basic algorithm)
does not interfere with the operation of basic algorithm (does not delay, reorder
or drop basic messages)
one process
initiates
recording by
sending
control
messages
(markers)
multiple processes can
also initiate
• can C-L record an
inconsistent state?
Global State Recorded by
Chandy-Lamport’s Algorithm

•
•
does C-L record a (global) state that occurs in the computations?
 not necessarily.
 however, C-L records a state in a computation that is equivalent to the
original computation.
 moreover this equivalent computation shares with the original
computation
 a prefix up to start of snapshot
 a suffix after the snapshot
 the recorded state is between these two states
can C-L record a state where some P have messages in every channel? if
yes which one?
can several independent snapshots run in parallel?
Collecting Global State





based on spanning tree constructed on the fly
sender of the first marker to arrive at a process is its parent
each marker carries the sender’s parent
 by receiving marker process learns if it has children
if process is a leaf, after finishing state recording, it sends its
state to its parent
each process waits for its children’s states, appends its own
and forwards all info to its parent
Non-FIFO: Lai-Yang Algorithm



non-FIFO channel: is a set (rather than a queue) of messages, any
message in the set can be received
 messages can overtake one another
 fair message receipt is assumed – eventually a sent message is
received
two colors for processes and basic messages – white and red, no explicit
markers
 all processes start as white, when process sends a basic message it
attaches its color
 when process receives a differently colored message, it itself changes
color
 while white (red), process records all messages sent/received, after
changing colors, the process sends message history to the initiator;
based on sent/received histories, initiator calculates messages in transit
if only the number of messages in transit needed – may maintain counters
in stead of histories
State Recording
Using Causal Message Delivery



initiator broadcast token to all processes
each process records state and sends it to the initiator
 processes do not send markers or
coordinate local state recording
due to causal ordering of messages


if for Pi: rec(tokeni)  send(mij)
then send(tokenj)  send(mij)
therefore rec(tokenj)  rec(mij)
hence state recording at Pj happens before mij receipt
channel state recording (Archaya-Badrinath)



append sequence numbers to all messages,
at each process record highest sent/received SN
together with local state send sent/received records for initiator to
determine messages in transition
Consistent Global Snapshot





processes periodically asynchronously record local states (local checkpoints)
 a global snapshot is a collection of local checkpoints
needed in distributed failure recovery, distributed event monitoring, debugging,
etc.
global snapshot is consistent if no two checkpoints are causally related
even though checkpoints themselves are not causally related, they may not be
a part of a global snapshot
 ex: C11 and C32 are concurrent, yet there is no global snapshot that
contains both of them
the objective in global snapshot recording is to select (out of available) the set
of concurrent checkpoints.
 note that unlike global state recording, the alg. does not have control over
the
snapshot
taking
time
Zigzag Path




checkpoint interval – part of computation between two successive
checkpoints at the same process
zigzag path exists between checkpoints Cxi and Cyj if there exists a
sequence of messages m1, …mn such that
 m1 sent by Px after Cxi
 mk received by Pz, mk+1 is sent by Pz in the same checkpoint interval
 mn received by Py before Cyj
causal path - same as zigzag path but the messages are causally related
zigzag cycle is a zigzag path to the process itself
causal path
zigzag path
Zigzag Path




checkpoint interval – part of computation between two successive
checkpoints at the same process
zigzag path exists between checkpoints Cxi and Cyj if there exists a
sequence of messages m1, …mn such that
 m1 sent by Px after Cxi
 mk received by Pz then mk+1 is sent by Pz in the same checkpoint interval
 mn received by Py before Cyj
causal path - same as zigzag path but the messages are causally related
zigzag cycle is a zigzag path to the process itself
zigzag cycle
zigzag path
Sufficient Condition for
Consistent Snapshot


a consistent snapshot can be formed to include a set S of checkpoints if and
only if no zigzag path exists between any two checkpoints in S [Netzer and
Xu]
 snapshot line – a line drawn through a set of checkpoints
 due to the existence of a zigzag path, a snapshot line always crosses a
message making two checkpoints causally related and resultant
snapshot inconsistent
constructing consistent snapshot requires choosing checkpoints without
zigzag path
zigzag path
Runtime Consistent Snapshot
Construction Using R-Graph



definition of
rollbackdependency graph
(R-graph) [Wang]
basic message
carries its
checkpoint interval
number
example
computation
there is a zigzag
path between two
checkpoints if there
is a path in Rgraph
corresponding
R-graph
Download