Set 12: Causality

CSCE 668
DISTRIBUTED ALGORITHMS AND SYSTEMS
Fall 2011
Prof. Jennifer Welch
Logical Clocks Motivation
In an asynchronous system, we often cannot tell
which of two events occurred before the other:
[Figure: Examples A and B, each a diagram of processors p0 and p1 with messages m0 and m1]
In Example A, processors cannot tell which message was sent first. Probably not important.
In Example B, processors can tell which message was sent first. Might be important.
Let's try to determine relative ordering of some (not all) events.
Happens Before Partial Order
Given an execution, computation event a happens before computation event b, denoted a → b, if
1. a and b occur at the same processor and a precedes b, or
2. a results in sending m and b includes receipt of m, or
3. there exists a computation event c such that a → c and c → b (transitive closure).
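The three rules can be sketched in Python as a transitive closure over an explicit description of an execution. The input representation (one list of event names per processor, plus a list of send/receive pairs) and the function name `happens_before` are assumptions made for illustration, and the two-processor execution at the end is a hypothetical example rather than one taken from the slides.

```python
from itertools import product

def happens_before(local_order, messages):
    """Return the set of pairs (a, b) such that a happens before b.

    local_order: one list of event names per processor, in local order.
    messages: (send_event, receive_event) pairs.
    """
    events = [e for proc in local_order for e in proc]
    # Rule 2: a send happens before the matching receive.
    hb = set(messages)
    # Rule 1: earlier events at a processor happen before later ones.
    for proc in local_order:
        for x in range(len(proc)):
            for y in range(x + 1, len(proc)):
                hb.add((proc[x], proc[y]))
    # Rule 3: transitive closure (Warshall style, intermediate event c outermost).
    for c, a, b in product(events, repeat=3):
        if (a, c) in hb and (c, b) in hb:
            hb.add((a, b))
    return hb

# Hypothetical execution: p0 performs a then d, p1 performs b then c;
# a sends a message received in b, and c sends a message received in d.
hb = happens_before([["a", "d"], ["b", "c"]], [("a", "b"), ("c", "d")])
print(sorted(hb))
```

The cubic closure is only meant to mirror the definition; it is not an efficient way to track causality at run time.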
Happens Before Partial Order
Happens before means that information can flow
from a to b, i.e., that a might cause b.
[Figure: processors p0 and p1 with events a, b, c, d and messages m0 and m1]
a → b, b → c, c → d, a → c, a → d, b → d
Concurrent Events
If a does not happen before b, and b does not
happen before a, then a and b are concurrent,
denoted a || b.
Happens Before Example
Rule 1: a → b, c → d → e → f, g → h → i
Rule 2: a → d, g → e, f → i
Rule 3: a → e, c → i, …
Concurrent: h || e, …
Logical Clocks
Logical clocks are values assigned to events to
provide some information about the order in which
events happen.
Goal is to assign an integer L(e) to each
computation event e in an execution such that
if a → b, then L(a) < L(b).
Logical Timestamps Algorithm
Each pi keeps a counter (logical timestamp) Li,
initially 0
Every message that pi sends is timestamped with
current value of Li
At each step, pi increases Li so that it is greater than both its current value and the timestamps on all messages received at this step.
If a is an event at pi, then assign L(a) to be the value
of Li at the end of a.
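A minimal sketch of this algorithm in Python follows. The class name `LogicalClock` and the choice to advance by exactly 1 past the maximum are assumptions (the algorithm only requires the new value to be greater). The replay uses the message pattern listed in the Happens Before Example (a → d, g → e, f → i).

```python
class LogicalClock:
    """The counter Li kept by one processor."""

    def __init__(self):
        self.value = 0  # initially 0

    def step(self, received=()):
        """Take one computation step.

        received: timestamps carried by the messages received in this step.
        Returns the value at the end of the step, which is the timestamp
        of this event and of any message sent during it.
        """
        # Assumption: advance by exactly 1 past the largest value seen.
        self.value = max([self.value, *received]) + 1
        return self.value

p0, p1, p2 = LogicalClock(), LogicalClock(), LogicalClock()
L = {}
L["a"] = p0.step()             # a sends a message that d receives
L["b"] = p0.step()
L["c"] = p1.step()
L["g"] = p2.step()             # g sends a message that e receives
L["d"] = p1.step([L["a"]])
L["e"] = p1.step([L["g"]])
L["f"] = p1.step()             # f sends a message that i receives
L["h"] = p2.step()
L["i"] = p2.step([L["f"]])
print(L)
```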
Logical Timestamps Example
[Figure: the execution from the Happens Before Example, with each event labeled by its logical timestamp]
a → b : L(a) = 1 < 2 = L(b)
f → i : L(f) = 4 < 5 = L(i)
a → e : L(a) = 1 < 3 = L(e)
etc.
Getting a Total Order
If a total order is required, break ties using ids.
In the example, L(a) = (1,0), L(c) = (1,1), etc.
Timestamps are ordered lexicographically.
In the example, L(a) < L(c).
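As a small illustration, Python tuples already compare lexicographically, so a (timestamp, id) pair can serve directly as a total-order key. The dictionary below is a hypothetical encoding of a few events from the example; the processor ids for b and g (0 and 2) are inferred from the example execution rather than stated on this slide.

```python
# (logical timestamp, processor id) pairs for a few events of the example.
stamps = {
    "a": (1, 0),
    "b": (2, 0),
    "c": (1, 1),
    "g": (1, 2),
}

# Ties on the timestamp are broken by the processor id.
assert stamps["a"] < stamps["c"]

total_order = sorted(stamps, key=stamps.get)
print(total_order)  # ['a', 'c', 'g', 'b']
```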
Drawback of Logical Clocks
a → b implies L(a) < L(b), but L(a) < L(b) does not necessarily imply a → b.
In the previous example, L(g) = 1 and L(b) = 2, but g does not happen before b.
Reason is that "happens before" is a partial order,
but logical clock values are integers, which are
totally ordered.
Vector Clocks
Generalize logical clocks to provide non-causality information as well as causality information.
Implement with values drawn from a partially
ordered set instead of a totally ordered set.
Assign a value V(e) to each computation event e
in an execution such that
a → b if and only if V(a) < V(b).
Vector Timestamps Algorithm
Each pi keeps an n-vector Vi, initially all 0's.
Entry j in Vi is pi's estimate of how many steps pj has taken.
Every message pi sends is timestamped with the current value of Vi.
At every step, increment Vi[i] by 1.
When receiving a message with vector timestamp T, update Vi's components j ≠ i so that Vi[j] = max(T[j], Vi[j]).
If a is an event at pi, then assign V(a) to be the value of Vi at the end of a.
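The update rules can be sketched in Python as follows. The class name `VectorClock` and its interface are assumptions for illustration. The replay uses the message pattern from the Happens Before Example (a → d, g → e, f → i) with three processors.

```python
class VectorClock:
    """The n-vector Vi kept by processor pi."""

    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n  # initially all 0's

    def step(self, received=()):
        """Take one computation step.

        received: vector timestamps of the messages received in this step.
        Returns Vi at the end of the step as a tuple.
        """
        self.v[self.i] += 1  # one more step taken by pi itself
        for t in received:
            for j in range(len(self.v)):
                if j != self.i:
                    self.v[j] = max(t[j], self.v[j])
        return tuple(self.v)

p0, p1, p2 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
V = {}
V["a"] = p0.step()             # a sends a message that d receives
V["b"] = p0.step()
V["c"] = p1.step()
V["g"] = p2.step()             # g sends a message that e receives
V["d"] = p1.step([V["a"]])
V["e"] = p1.step([V["g"]])
V["f"] = p1.step()             # f sends a message that i receives
V["h"] = p2.step()
V["i"] = p2.step([V["f"]])
print(V)
```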
Manipulating Vector Timestamps
Let V and W be two n-vectors of integers.
Equality: V = W iff V[i] = W[i] for all i.
Example: (3,2,4) = (3,2,4)
Less than or equal: V ≤ W iff V[i] ≤ W[i] for all i.
Example: (2,2,3) ≤ (3,2,4) and (3,2,4) ≤ (3,2,4)
Less than: V < W iff V ≤ W but V ≠ W.
Example: (2,2,3) < (3,2,4)
Incomparable: V || W iff !(V ≤ W) and !(W ≤ V).
Example: (3,2,4) || (4,1,4)
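These comparisons translate directly into Python. The function names `leq`, `less`, and `incomparable` are chosen here for illustration, and the sketch assumes both vectors have the same length n.

```python
def leq(v, w):
    """V <= W iff V[i] <= W[i] for all i."""
    return all(x <= y for x, y in zip(v, w))

def less(v, w):
    """V < W iff V <= W but V != W."""
    return leq(v, w) and tuple(v) != tuple(w)

def incomparable(v, w):
    """V || W iff neither V <= W nor W <= V."""
    return not leq(v, w) and not leq(w, v)

print(leq((2, 2, 3), (3, 2, 4)))           # True
print(less((3, 2, 4), (3, 2, 4)))          # False
print(incomparable((3, 2, 4), (4, 1, 4)))  # True
```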
Manipulating Vector Timestamps
The partial order on n-vectors just defined is not the
same as lexicographic ordering.
Lexicographic ordering is a total order on vectors.
Consider (3,2,4) vs. (4,1,4) in the two approaches.
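The difference can be checked in a few lines of Python, whose built-in tuple comparison is lexicographic. The variable names are illustrative only.

```python
v, w = (3, 2, 4), (4, 1, 4)

# Lexicographic order (Python's built-in tuple comparison) decides on the
# first differing coordinate: 3 < 4, so v comes before w.
lexicographic_less = v < w

# The component-wise partial order needs every coordinate to agree.
v_leq_w = all(x <= y for x, y in zip(v, w))
w_leq_v = all(y <= x for x, y in zip(v, w))

print(lexicographic_less, v_leq_w, w_leq_v)  # True False False
```

So the two vectors are ordered lexicographically but incomparable as vector timestamps.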
Vector Timestamps Example
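[Figure: the execution from the Happens Before Example, with each event labeled by its vector timestamp]
p0: (1,0,0), (2,0,0)
p1: (0,1,0), (1,2,0), (1,3,1), (1,4,1)
p2: (0,0,1), (0,0,2), (1,4,3)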
V(g) = (0,0,1) and V(b) = (2,0,0), which are incomparable.
Compare with logical clocks L(g) = 1 and L(b) = 2.
Correctness of Vector Timestamps
Theorem (6.5 & 6.6): Vector timestamps implement
vector clocks.
Proof: First, show a → b implies V(a) < V(b).
Case 1: a and b both occur at pi, a first. Since Vi
increases at each step,
V(a) < V(b).
Correctness of Vector Timestamps
Case 2: a occurs at pi and causes m to be sent, while b occurs at pj and includes the receipt of m.
During b, pj updates its vector timestamp in such a way that V(a) ≤ V(b).
pi's estimate of the number of steps taken by pj is never an over-estimate. Since m is not received before it is sent, pi's estimate of the number of steps taken by pj when a occurs is less than the number of steps taken by pj when b occurs. So V(a)[j] < V(b)[j].
Thus V(a) < V(b).
Correctness of Vector Timestamps
Case 3: There exists c such that a → c and c → b. By induction (from Cases 1 and 2) and transitivity of <, V(a) < V(b).
Next show V(a) < V(b) implies a → b.
Equivalent to showing !(a → b) implies !(V(a) < V(b)).
Correctness of Vector Timestamps
Suppose a occurs at pi, b occurs at pj, and a does
not happen before b.
Let V(a)[i] = k.
Since a does not happen before b, there is no chain of messages from pi to pj originating at pi's k-th step or later and ending at pj before b.
Thus V(b)[i] < k.
Thus !(V(a) < V(b)).
Size of Vector Timestamps
Vector timestamps are big: each one has n components, and the values in the components grow without bound.
Is there a more efficient way to implement vector
clocks?
Answer is NO, at least under some conditions.
Vector Clock Size Lower Bound
Theorem (6.9): Any implementation of vector clocks
using vectors of real numbers requires vectors of
length n (number of processors).
Proof: For any value of n, consider this execution:
Example Bad Execution
For n = 4:
Vector Clock Size Lower Bound
Claim 1: a_{i+1} || b_i for all i (with wraparound).
Proof: Since each processor does all sends before any receives, there is no transitivity. Also p_{i+1} does not send to pi.
Claim 2: a_{i+1} → b_j for all j ≠ i.
Proof: If j = i+1, obvious. If j ≠ i+1, then p_{i+1} sends to pj.
Vector Clock Size Lower Bound
Suppose in contradiction, there is a way to implement
vector clocks with k-vectors of reals, where k < n.
By Claim 1, a_{i+1} || b_i
=> V(a_{i+1}) and V(b_i) are incomparable
=> V(a_{i+1}) is larger than V(b_i) in some coordinate h(i)
=> h : {0,…,n-1} → {0,…,k-1}
Vector Clock Size Lower Bound
Since k < n, the function h is not 1-1. So there exist
distinct i and j such that h(i) = h(j). Let r be this
common value of h.
[Figure: the vectors V(a_0), …, V(a_{n-1}) and V(b_0), …, V(b_{n-1}); two of the marked components are the same, say h(i) = h(j) = r]
Vector Clock Size Lower Bound
[Figure: the vectors V(b_i), V(a_{i+1}), V(b_j), and V(a_{j+1})]
Vector Clock Size Lower Bound
So V(a_{i+1}) is larger than V(b_i) in coordinate r and V(a_{j+1}) is larger than V(b_j) in coordinate r also.
V(a_{j+1})[r] > V(b_j)[r] by def. of r
≥ V(a_{i+1})[r] by Claim 2 (a_{i+1} → b_j) and correctness of V
≥ V(b_i)[r] by def. of r
Thus !(V(a_{j+1}) < V(b_i)), contradicting Claim 2 (a_{j+1} → b_i) and assumed correctness of V.
Application of Causality: Consistent Cuts

Consider an asynchronous message passing system with FIFO message delivery per channel and at most one message received per computation step.
Number the computation steps of each processor 1,2,3,…
A cut of an execution is K = (k_0,…,k_{n-1}), where k_i indicates the number of computation steps taken by pi.
Consistent Cuts
[Figure: an execution with some cuts marked]
In a consistent cut K = (k_0,…,k_{n-1}), if step s of pj happens before step k_i of pi, then s ≤ k_j.
(1,3) and (2,4) are consistent.
(3,6) is inconsistent: step 4 by p0 happens
before step 6 of p1, but 4 is greater than 3.
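A consistency check can be sketched in Python. Because a cut takes a prefix of each processor's steps, it appears sufficient to check direct messages only: every message received inside the cut must also have been sent inside the cut. The function name `is_consistent`, the message encoding, and the single message used below (sent in step 4 of p0 and received in step 6 of p1) are assumptions for illustration, since the messages of the original figure are not recoverable.

```python
def is_consistent(cut, messages):
    """Check whether a cut is consistent.

    cut[i]: number of computation steps of p_i included in the cut.
    messages: pairs ((j, s), (i, t)) meaning that step s of p_j sends a
        message that is received in step t of p_i.
    """
    # Every message received within the cut must be sent within the cut.
    return all(s <= cut[j] for (j, s), (i, t) in messages if t <= cut[i])

# Assumed execution: one message, sent in step 4 of p0, received in step 6 of p1.
messages = [((0, 4), (1, 6))]

print(is_consistent((1, 3), messages))  # True
print(is_consistent((2, 4), messages))  # True
print(is_consistent((3, 6), messages))  # False
```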
Finding a Recent Consistent Cut
Problem Version 1: Processors all given a cut K and
must find a maximal consistent cut that is ≤ K.
Application: Logging-based crash recovery.
Processors periodically write their state to stable storage.
When a processor recovers from a crash, it tries to recover to the latest logged state, but needs to coordinate with the other processors.
Vector Clocks Solution
Implement vector clocks using vector timestamps
appended to application msgs.
Store the vector clock of each computation step in a
local array store[1,…]
When pi is given input cut K:
    for x := K[i] downto 1 do
        if store[x] ≤ K then return x (entry for pi of global answer)
    return 0 (no step of pi is included in the answer)
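The search can be sketched in Python. In the algorithm each pi computes only its own entry; the helper `max_consistent_cut` below gathers all entries in one place purely for demonstration. The function names and the `stores` data (the vector timestamps of the three-processor example used earlier) are assumptions for illustration, and the sketch assumes K[i] never exceeds the number of steps pi has logged.

```python
def leq(v, w):
    """Component-wise V <= W."""
    return all(x <= y for x, y in zip(v, w))

def local_answer(store, i, K):
    """Entry for p_i: the largest x <= K[i] with store[x] <= K.

    store[x - 1] holds the vector timestamp of step x (steps start at 1).
    """
    for x in range(K[i], 0, -1):
        if leq(store[x - 1], K):
            return x
    return 0  # no step of p_i fits under K

def max_consistent_cut(stores, K):
    """Collect every processor's entry (centralized only for the demo)."""
    return tuple(local_answer(stores[i], i, K) for i in range(len(K)))

# Vector timestamps of each step in the three-processor example.
stores = [
    [(1, 0, 0), (2, 0, 0)],
    [(0, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 1)],
    [(0, 0, 1), (0, 0, 2), (1, 4, 3)],
]

print(max_consistent_cut(stores, (0, 4, 3)))  # (0, 1, 2)
print(max_consistent_cut(stores, (2, 2, 3)))  # (2, 2, 2)
```

With K = (0, 4, 3), no step of p0 is allowed, so p1 must roll back to before it received p0's message, and p2 to before it received p1's.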
What About Channel State?
Processor states are not sufficient to capture entire
system state.
Messages in transit must be calculated.
Solution here requires additional storage (number of messages) and additional computation at recovery time (involving replaying the original execution to capture messages sent but not received).
Another Take on Recent Consistent State
Problem Version 2: A subset of processors initiate (at arbitrary times) trying to find a consistent cut that includes the state of at least one of the initiators when it started.
This is called a distributed snapshot.
Snapshot info can be collected at one processor and then analyzed.
Application: termination detection
Marker Algorithm
Instead of adding extra information on each
application message, insert control messages
("markers") into the channels.
Code for pi:
    initially answer = -1 and num = 0
    when application msg arrives:
        num++; do application action
    when marker arrives or when initiating snapshot:
        if answer = -1 then
            answer := num // pi's part of final answer
            send marker to all neighbors
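The pseudocode can be exercised with a small single-threaded simulation in Python. The class name `Proc`, the shared dictionary of FIFO queues standing in for channels, the `"MARKER"` sentinel, and the two-processor scenario are all assumptions made for illustration.

```python
from collections import deque

MARKER = "MARKER"  # assumed sentinel for the control message

class Proc:
    def __init__(self, pid, neighbors, channels):
        self.pid = pid
        self.neighbors = neighbors
        self.channels = channels  # (src, dst) -> deque, one FIFO queue per channel
        self.answer = -1
        self.num = 0

    def send(self, dst, msg):
        self.channels[(self.pid, dst)].append(msg)

    def record(self):
        """Run when a marker arrives or when initiating a snapshot."""
        if self.answer == -1:
            self.answer = self.num  # this processor's part of the final answer
            for q in self.neighbors:
                self.send(q, MARKER)

    def receive(self, src):
        """Deliver the next message on the channel from src."""
        msg = self.channels[(src, self.pid)].popleft()
        if msg == MARKER:
            self.record()
        else:
            self.num += 1  # the application action would go here

channels = {(0, 1): deque(), (1, 0): deque()}
p0 = Proc(0, [1], channels)
p1 = Proc(1, [0], channels)

p0.send(1, "x")   # application message
p0.record()       # p0 initiates the snapshot
p1.send(0, "y")   # application message sent before p1 sees the marker
p1.receive(0)     # p1 gets "x"
p1.receive(0)     # p1 gets the marker and records its answer
p0.receive(1)     # p0 gets "y" after having recorded
p0.receive(1)     # p0 gets p1's marker; nothing changes
print(p0.answer, p1.answer)  # 0 1
```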
What About Channel States?
pi records sequence of msgs received from pj
between the time pi records its answer and the time
pi gets the marker from pj
These are the msgs in transit from pj to pi in the cut
returned by the algorithm.
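A self-contained sketch of the marker algorithm extended with this channel recording follows. The class name `Proc`, the `in_transit` and `marker_seen` fields, the `"MARKER"` sentinel, and the two-processor scenario are assumptions for illustration, not names from the slides.

```python
from collections import deque

MARKER = "MARKER"  # assumed sentinel for the control message

class Proc:
    def __init__(self, pid, neighbors, channels):
        self.pid = pid
        self.neighbors = neighbors
        self.channels = channels  # (src, dst) -> deque, one FIFO queue per channel
        self.answer = -1
        self.num = 0
        self.in_transit = {q: [] for q in neighbors}      # recorded channel states
        self.marker_seen = {q: False for q in neighbors}  # marker received from q?

    def send(self, dst, msg):
        self.channels[(self.pid, dst)].append(msg)

    def record(self):
        """Record the local answer once and send markers to all neighbors."""
        if self.answer == -1:
            self.answer = self.num
            for q in self.neighbors:
                self.send(q, MARKER)

    def receive(self, src):
        msg = self.channels[(src, self.pid)].popleft()
        if msg == MARKER:
            self.record()
            self.marker_seen[src] = True  # stop recording this channel
        else:
            self.num += 1
            # Received after recording but before src's marker: in transit in the cut.
            if self.answer != -1 and not self.marker_seen[src]:
                self.in_transit[src].append(msg)

channels = {(0, 1): deque(), (1, 0): deque()}
p0 = Proc(0, [1], channels)
p1 = Proc(1, [0], channels)

p0.send(1, "x")
p0.record()       # p0 initiates the snapshot
p1.send(0, "y")
p1.receive(0)     # "x" arrives before p1 has recorded
p1.receive(0)     # marker: p1 records and the channel from p0 is empty
p0.receive(1)     # "y" arrives after p0 recorded, before p1's marker
p0.receive(1)     # p1's marker closes the channel from p1
print(p0.in_transit, p1.in_transit)  # {1: ['y']} {0: []}
```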