Set 15: Broadcast

CSCE 668
DISTRIBUTED ALGORITHMS AND SYSTEMS
Fall 2011
Prof. Jennifer Welch

Broadcast Specifications

Recall the specification of a broadcast service given in the last set of slides:
- Inputs: bc-send_i(m)
  - an input to the broadcast service
  - p_i wants to use the broadcast service to send m to all the procs
- Outputs: bc-recv_i(m,j)
  - an output of the broadcast service
  - the broadcast service is delivering msg m, sent by p_j, to p_i

Broadcast Specifications

A sequence of inputs and outputs (bc-sends and bc-recvs) is allowable iff there exists a mapping from each bc-recv_i(m,j) event to an earlier bc-send_j(m) event s.t.
- the mapping is well-defined: every msg bc-recv'ed was previously bc-sent (Integrity)
- the mapping restricted to bc-recv_i events, for each i, is one-to-one: no msg is bc-recv'ed more than once at any single proc (No Duplicates)
- the mapping restricted to bc-recv_i events, for each i, is onto: every msg bc-sent is received at every proc (Liveness)
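On a finished, finite trace the three conditions can be checked mechanically. Below is a minimal Python sketch. It assumes a hypothetical event encoding, ("send", j, m) for bc-send_j(m) and ("recv", i, m, j) for bc-recv_i(m,j), and it assumes no proc bc-sends the same m twice, so the mapping is forced. The function name check_broadcast is illustrative only, and "eventually" is approximated by "by the end of the trace".

```python
from collections import Counter

def check_broadcast(trace, n):
    """Return the set of violated properties for a finished trace over procs 0..n-1.

    Hypothetical encoding: ("send", j, m) is bc-send_j(m) and
    ("recv", i, m, j) is bc-recv_i(m, j). Assumes no proc bc-sends the same
    m twice, so each bc-recv maps to the unique bc-send_j(m).
    """
    sent = set()            # (j, m) pairs bc-sent so far
    recv_count = Counter()  # (i, j, m) -> number of bc-recv_i(m, j) events
    violations = set()
    for event in trace:
        if event[0] == "send":
            _, j, m = event
            sent.add((j, m))
        else:
            _, i, m, j = event
            if (j, m) not in sent:      # no earlier bc-send_j(m): mapping undefined
                violations.add("Integrity")
            recv_count[(i, j, m)] += 1
    if any(c > 1 for c in recv_count.values()):   # not one-to-one at some proc
        violations.add("No Duplicates")
    # onto: every bc-sent msg is bc-recv'ed at every proc by the end of the trace
    if any(recv_count[(i, j, m)] == 0 for (j, m) in sent for i in range(n)):
        violations.add("Liveness")
    return violations
```

For example, check_broadcast([("send", 0, "a"), ("recv", 0, "a", 0)], 2) returns {"Liveness"}, because p_1 never bc-recv's a.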

Ordering Properties

Sometimes we might want a broadcast service that also guarantees something about the order in which messages are delivered. We can add constraints on the mapping:
- single-source FIFO, or
- totally ordered, or
- causally ordered

Single-Source FIFO Ordering

For all messages m1 and m2 and all p_i and p_j: if p_i sends m1 before it sends m2, and if p_j receives both m1 and m2, then p_j receives m1 before it receives m2.
- Phrased carefully to avoid requiring that both messages are received; that is the responsibility of a liveness property.
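
A minimal Python sketch of the property as a check on recorded delivery orders. The input format (per-proc lists of (m, j) pairs and per-sender send orders) and the function name are assumptions for illustration; msgs from one sender are assumed distinct. Msgs that are never received impose no constraint, matching the careful phrasing above.

```python
def is_single_source_fifo(sends, deliveries):
    """sends[j]: msgs bc-sent by p_j, in sending order.
    deliveries[i]: (m, j) pairs bc-recv'ed by p_i, in delivery order.
    Assumes each proc sends distinct msgs; msgs never received are allowed."""
    # rank[j][m]: position of m in p_j's sending order
    rank = {j: {m: k for k, m in enumerate(order)} for j, order in sends.items()}
    for seq in deliveries.values():
        for j, r in rank.items():
            # send positions of p_j's msgs, in the order this proc received them
            got = [r[m] for (m, src) in seq if src == j and m in r]
            if got != sorted(got):
                return False
    return True
```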
Totally Ordered

For all messages m1 and m2 and all p_i and p_j: if both p_i and p_j receive both messages, then they receive them in the same order.
- Phrased carefully to avoid requiring that both messages are received by both procs; that is the responsibility of a liveness property.
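
The property compares procs pairwise on the msgs they both received. A minimal Python sketch, assuming each proc's delivery order is given as a list of distinct msg ids (the function name is hypothetical):

```python
from itertools import combinations

def is_totally_ordered(deliveries):
    """deliveries[i]: msgs bc-recv'ed by p_i, in delivery order (msgs assumed distinct)."""
    for a, b in combinations(deliveries, 2):
        common = set(deliveries[a]) & set(deliveries[b])
        # restricted to the msgs both procs received, the two orders must coincide
        order_a = [m for m in deliveries[a] if m in common]
        order_b = [m for m in deliveries[b] if m in common]
        if order_a != order_b:
            return False
    return True
```

Two procs that share no msgs, e.g. {0: ["a"], 1: ["b"]}, trivially satisfy the property.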
Happens Before for Broadcast Messages

- Earlier we defined the "happens before" relation for events.
- Now extend this definition to broadcast messages, assuming all communication is through broadcast sends and receives.
- Msg m1 happens before msg m2 if
  - some bc-recv event for m1 happens before (in the old sense) the bc-send event for m2, or
  - m1 and m2 are bc-sent by the same proc and m1 is bc-sent before m2 is bc-sent.

Example of Happens Before for Broadcast Messages

[Figure: execution with broadcast messages m1, m2, m3, m4]
- m1 happens before m3 and m4
- m2 happens before m4
- m3 happens before m4

Causally Ordered

For all messages m1 and m2 and all p_i: if m1 happens before m2, and if p_i receives both m1 and m2, then p_i receives m1 before it receives m2.
- Phrased carefully to avoid requiring that both messages are received; that is the responsibility of a liveness property.
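
Given the happens-before relation on msgs as a set of pairs, the property can be checked per proc. A minimal Python sketch; computing the relation from an execution is not shown, and the input format and function name are assumptions. The test uses the relation from the m1 to m4 example above.

```python
def is_causally_ordered(happens_before, deliveries):
    """happens_before: set of (m1, m2) pairs meaning m1 happens before m2.
    deliveries[i]: msgs bc-recv'ed by p_i, in delivery order (msgs assumed distinct)."""
    for seq in deliveries.values():
        pos = {m: k for k, m in enumerate(seq)}
        for m1, m2 in happens_before:
            # only constrains procs that receive both msgs
            if m1 in pos and m2 in pos and pos[m1] > pos[m2]:
                return False
    return True
```

Since m1 and m2 are unrelated in that example, a proc may receive them in either order.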
Example

[Figure: execution with broadcast messages a and b]
- single-source FIFO?
- totally ordered?
- causally ordered?

Example

[Figure: execution with broadcast messages a and b]
- single-source FIFO?
- totally ordered?
- causally ordered?

Example

[Figure: execution with broadcast messages a and b]
- single-source FIFO?
- totally ordered?
- causally ordered?

Algorithm BB to Simulate Basic Broadcast on Top of Point-to-Point

- When bc-send_i(m) occurs: p_i sends a separate copy of m to every processor (including itself) using the underlying point-to-point message passing communication system.
- When can p_i perform bc-recv_i(m,j)? When it receives m from p_j via the underlying point-to-point message passing communication system.
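
A minimal Python simulation of Alg BB. The asynchronous point-to-point network is modeled, as an assumption of this sketch, as a pool of in-transit copies delivered one at a time in random order; the class and method names are hypothetical.

```python
import random

class BasicBroadcast:
    """Alg BB over a simulated asynchronous point-to-point network.
    The network is a pool of in-transit (dest, src, msg) copies delivered
    in arbitrary (here: random) order; this model is an assumption."""

    def __init__(self, n, seed=None):
        self.n = n
        self.in_transit = []
        self.rng = random.Random(seed)
        self.log = []   # (i, m, j) for each bc-recv_i(m, j)

    def bc_send(self, i, m):
        # p_i sends a separate copy of m to every proc, including itself
        for dest in range(self.n):
            self.in_transit.append((dest, i, m))

    def step(self):
        # the network delivers one copy; receiving it triggers bc-recv_dest(m, src)
        k = self.rng.randrange(len(self.in_transit))
        dest, src, m = self.in_transit.pop(k)
        self.log.append((dest, m, src))

    def run(self):
        while self.in_transit:
            self.step()
```

Whatever order the network picks, after run() every proc has bc-recv'ed each msg exactly once, which is the Integrity/No Duplicates/Liveness check on the next slides in miniature.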
Basic Broadcast Simulation

[Figure: Alg BB as a layer, one copy per proc (BB_0, …, BB_{n-1}); it offers bc-send_i/bc-recv_i (basic broadcast) above and uses send_i/recv_i (asynch pt-to-pt message passing) below]

Correctness of Basic Broadcast Algorithm

- Assume the underlying point-to-point message passing system is correct (i.e., conforms to the spec given in the previous set of slides).
- Check that the simulated broadcast service satisfies:
  - Integrity
  - No Duplicates
  - Liveness

Single-Source FIFO Algorithm

- Assume the underlying communication system is basic broadcast.
- When ssf-bc-send_i(m) occurs:
  - p_i uses the underlying basic broadcast service to bcast m together with a sequence number
  - p_i increments its sequence number by 1 each time it initiates a bcast
- When can p_i perform ssf-bc-recv_i(m,j)?
  - when p_i has bc-recv'ed m with sequence number T and has ssf-bc-recv'ed the messages from p_j (the ssf-bc-sender of m) with all smaller sequence numbers
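
A minimal Python sketch of the SSF layer at one proc. Assumptions: sequence numbers start at 0, bc_send is a hypothetical callback into the basic broadcast layer, and on_bc_recv is called with the original sender j. Out-of-order msgs wait in a buffer until all smaller sequence numbers from the same sender have been delivered.

```python
class SSFProcess:
    """Single-source FIFO layer at one proc, on top of basic broadcast.
    bc_send(payload) is a hypothetical callback into the basic bcast layer."""

    def __init__(self, my_id, bc_send):
        self.my_id = my_id
        self.bc_send = bc_send
        self.seq = 0           # sequence number for my next bcast (starts at 0 here)
        self.expected = {}     # sender j -> next sequence number to deliver from p_j
        self.buffer = {}       # (j, seq) -> m: bc-recv'ed but not yet deliverable
        self.delivered = []    # ssf-bc-recv outputs, as (m, j)

    def ssf_bc_send(self, m):
        self.bc_send((m, self.seq))   # tag m with my sequence number
        self.seq += 1

    def on_bc_recv(self, payload, j):
        m, t = payload
        self.buffer[(j, t)] = m
        # deliver p_j's msgs while the next expected sequence number is present
        nxt = self.expected.get(j, 0)
        while (j, nxt) in self.buffer:
            self.delivered.append((self.buffer.pop((j, nxt)), j))
            nxt += 1
        self.expected[j] = nxt
```

If p_0's msgs a, b, c arrive as c, a, b, this receiver outputs nothing for c, then a, then b and c together.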
Single-Source FIFO Algorithm

[Figure: layered view. The user of SSF bcast (ssf-bc-send/ssf-bc-recv) sits on the SSF alg (timestamps), which uses bc-send/bc-recv of the basic bcast alg (n copies), which uses send/recv of point-to-point message passing]

Asymmetric Algorithm for Totally Ordered Broadcast

- Assume the underlying communication service is basic broadcast.
- There is a distinguished proc p_c.
- When to-bc-send_i(m) occurs:
  - p_i sends m to p_c (either assume the basic broadcast service also has a point-to-point mechanism, or have recipients other than p_c ignore the msg)
- When p_c receives m from p_i from the basic broadcast service:
  - p_c appends a sequence number to m and bc-sends it

Asymmetric Algorithm for Totally Ordered Broadcast

- When can p_i perform to-bc-recv_i(m)?
  - when p_i has bc-recv'ed m with sequence number T and has to-bc-recv'ed the messages with all smaller sequence numbers
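
A minimal Python sketch of both roles. Assumptions: sequence numbers start at 0, bc_send is a hypothetical hook into basic broadcast, and p_c includes the original sender i in what it bc-sends so that receivers can report it; the class names are illustrative.

```python
class Sequencer:
    """Role of the distinguished proc p_c: stamps each msg with the next global sequence number."""

    def __init__(self, bc_send):
        self.bc_send = bc_send   # hypothetical hook into basic broadcast
        self.next = 0

    def on_msg_from(self, m, i):
        self.bc_send((m, i, self.next))
        self.next += 1


class TOReceiver:
    """Role of every proc: deliver in sequence-number order, buffering gaps."""

    def __init__(self):
        self.expected = 0        # smallest sequence number not yet delivered
        self.buffer = {}         # seq -> (m, i)
        self.delivered = []      # to-bc-recv outputs, as (m, i)

    def on_bc_recv(self, payload):
        m, i, t = payload
        self.buffer[t] = (m, i)
        while self.expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
```

Since every receiver delivers in the same sequence-number order, two receivers that get p_c's msgs in opposite orders still output the same sequence.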
Asymmetric Algorithm Discussion

- Simple
- Only requires basic broadcast
- But p_c is a bottleneck
- Alternative approach next…

Symmetric Algorithm for Totally Ordered Broadcast

- Assume the underlying communication service is single-source FIFO broadcast.
- Each proc tags each msg it sends with a timestamp (increasing).
  - Break ties using proc ids.
- Each proc keeps a vector of estimates of the other procs' timestamps:
  - If p_i's estimate for p_j is k, then p_i will not receive any later msg from p_j with timestamp ≤ k.
  - Estimates are updated based on msgs received and "timestamp update" msgs.

Symmetric Algorithm for Totally Ordered Broadcast

- Each proc keeps its timestamp ≥ all its estimates:
  - when p_i has to increase its timestamp because of the receipt of a message, it sends a timestamp update msg
- A proc can deliver a msg with timestamp T once every entry in the proc's vector of estimates is at least T.

Symmetric Algorithm

when to-bc-send_i(m) occurs:
    ts[i]++
    add (m,ts[i],i) to pending
    invoke ssf-bc-send_i((m,ts[i]))

when ssf-bc-recv_i((m,T)) from p_j, j ≠ i, occurs:
    ts[j] := T
    add (m,T,j) to pending
    if T > ts[i] then
        ts[i] := T
        invoke ssf-bc-send_i(("ts-up",T))

invoke to-bc-recv_i(m,j) when:
    (m,T,j) is the entry in pending with smallest (T,j)
    T ≤ ts[k] for all k
  result: remove (m,T,j) from pending

when ssf-bc-recv_i(("ts-up",T)) from p_j, j ≠ i, occurs:
    ts[j] := T
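
A Python transliteration of the pseudocode, as a sketch rather than a definitive implementation. Assumptions: ssf_send is a hypothetical hook into the SSF layer, data msgs are encoded as ("msg", m, T) and timestamp updates as ("ts-up", None, T), and a proc ignores copies of its own msgs coming back from the SSF layer because they were put in pending at send time.

```python
class SymmetricTO:
    """Symmetric totally ordered bcast at p_i, layered on SSF bcast.
    ssf_send(payload) is a hypothetical hook into the SSF layer."""

    def __init__(self, i, n, ssf_send):
        self.i = i
        self.ssf_send = ssf_send
        self.ts = [0] * n        # ts[i]: own timestamp; ts[k]: estimate for p_k
        self.pending = set()     # entries (T, j, m); tuple order sorts by (T, j)
        self.delivered = []      # to-bc-recv outputs, as (m, j)

    def to_bc_send(self, m):
        self.ts[self.i] += 1
        self.pending.add((self.ts[self.i], self.i, m))
        self.ssf_send(("msg", m, self.ts[self.i]))
        self._try_deliver()

    def on_ssf_recv(self, payload, j):
        if j == self.i:
            return               # own msgs were added to pending when sent
        kind, m, t = payload
        self.ts[j] = t           # both msgs and ts-up msgs update the estimate
        if kind == "msg":
            self.pending.add((t, j, m))
            if t > self.ts[self.i]:
                self.ts[self.i] = t
                self.ssf_send(("ts-up", None, t))
        self._try_deliver()

    def _try_deliver(self):
        # deliver the pending entry with smallest (T, j) while T <= ts[k] for all k
        while self.pending:
            t, j, m = min(self.pending)
            if any(est < t for est in self.ts):
                return
            self.pending.remove((t, j, m))
            self.delivered.append((m, j))
```

With two procs, if p_0 sends a and c, p_0 cannot deliver either until p_1's ts-up msgs raise its estimate for p_1, after which both procs have delivered a then c.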
[Figure: layered view. The user of TO bcast (to-bc-send/to-bc-recv) sits on the symmetric TO alg, which uses ssf-bc-send/ssf-bc-recv of the SSF alg (timestamps), which uses bc-send/bc-recv of the basic bcast alg (n copies), which uses send/recv of point-to-point message passing]

Correctness of Symmetric Algorithm

- Lemma (8.2): Timestamps assigned to msgs form a total order (break ties with the id of the sender).
- Theorem (8.3): The symmetric algorithm simulates the totally ordered broadcast service.
- Proof: Must show that the top-level outputs of the symmetric algorithm satisfy 4 properties (Integrity, No Duplicates, Liveness, Total Ordering) in every admissible execution (relies on the underlying ssf-bcast service being correct).

Correctness of Symmetric Alg.

- Integrity: follows from the same property for ssf-bcast.
- No Duplicates: follows from the same property for ssf-bcast.
- Liveness:
  - Suppose, for contradiction, that some p_i has some entry (m,T,j) stuck in its pending set forever, where (T,j) is the smallest timestamp of all stuck entries.
  - Eventually (m,T,j) has the smallest timestamp of all entries in p_i's pending set.
  - Why is (m,T,j) stuck at p_i? Because p_i's estimate of some p_k's timestamp is stuck at some value T' < T.
  - But that would mean either p_k never receives (m,T,j), or p_k's timestamp-update msg resulting from p_k receiving (m,T,j) is never received at p_i, contradicting the correctness of the SSF broadcast.

Correctness of Symmetric Alg.

Total Ordering: Suppose p_i invokes to-bc-recv for msg m with timestamp (T,j), and later it invokes to-bc-recv for msg m' with timestamp (T',j'). Show (T,j) < (T',j').
- By the code, if (m',T',j') is in p_i's pending set when p_i invokes the to-bc-recv for m, then (T,j) < (T',j').
- Suppose (m',T',j') is not yet in p_i's pending set at that time.
- When p_i invokes the to-bc-recv for m, the precondition ensures that T ≤ ts[j']. So p_i has already received a msg from p_j' with timestamp ≥ T.
- Since p_j' tags its msgs with increasing timestamps, the SSF property implies that every subsequent msg p_i receives from p_j' has timestamp > T, so T' > T.

Causal Ordering Algorithms

- The symmetric total ordering algorithm ensures causal ordering: timestamp order extends the happens-before order on messages.
- Causal ordering can also be attained without the overhead of total ordering, by using an algorithm based on vector clocks…

Causal Order Algorithm

Code for p_i:

when co-bc-send_i(m) occurs:
    vt[i]++
    invoke co-bc-recv_i(m,i)
    invoke bc-send_i((m,vt))

when bc-recv_i((m,w)) from p_j, j ≠ i, occurs:
    add (m,w,j) to pending

invoke co-bc-recv_i(m,j) when:
    (m,w,j) is in pending
    w[j] = vt[j] + 1
    w[k] ≤ vt[k] for all k ≠ j
  result:
    remove (m,w,j) from pending
    vt[j]++

Note: vt[j] records how many msgs from p_j have been co-bc-recv'ed by p_i.
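
A Python transliteration of the causal order pseudocode, offered as a sketch. Assumptions: bc_send is a hypothetical hook into basic broadcast, the vector is sent as a tuple copy, and a proc ignores the copy of its own msg returned by basic broadcast since it already delivered that msg when sending.

```python
class CausalOrder:
    """Causal-order bcast at p_i over basic bcast (vector-clock version).
    bc_send(payload) is a hypothetical hook into the basic broadcast layer."""

    def __init__(self, i, n, bc_send):
        self.i = i
        self.bc_send = bc_send
        self.vt = [0] * n        # vt[j]: number of msgs from p_j co-bc-recv'ed here
        self.pending = []        # entries (m, w, j)
        self.delivered = []      # co-bc-recv outputs, as (m, j)

    def co_bc_send(self, m):
        self.vt[self.i] += 1
        self.delivered.append((m, self.i))       # deliver own msg immediately
        self.bc_send((m, tuple(self.vt)))        # send a copy of the vector

    def on_bc_recv(self, payload, j):
        if j == self.i:
            return                               # own msg already delivered
        m, w = payload
        self.pending.append((m, w, j))
        self._try_deliver()

    def _deliverable(self, w, j):
        # next msg from p_j, and everything it depends on has been delivered here
        return w[j] == self.vt[j] + 1 and all(
            w[k] <= self.vt[k] for k in range(len(w)) if k != j)

    def _try_deliver(self):
        progress = True
        while progress:
            progress = False
            for entry in self.pending:
                m, w, j = entry
                if self._deliverable(w, j):
                    self.pending.remove(entry)
                    self.vt[j] += 1
                    self.delivered.append((m, j))
                    progress = True
                    break        # vt changed: rescan pending from the start
```

For example, if p_1 co-bc-sends y after receiving p_0's x, a third proc that gets y first holds it in pending until x arrives, then delivers x and y in that order.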
Causal Order Algorithm Discussion

- Vector clocks are implemented slightly differently than in the point-to-point case.
- In the point-to-point case, we exploited indirect (transitive) information about messages received by other procs.
- In the broadcast case, we don't need to do that, since every proc will eventually receive every message directly.

Causal Order Algorithm Example

- The algorithm delays the delivery of causally ordered msgs until doing so cannot violate the causal order property.

[Figure: example execution; msgs carry vector timestamps (0,1,0), (0,2,0), (0,3,0), and (1,3,0)]

Correctness of Causal Order Algorithm (Sketch)

- Lemma (8.6): The local array variables vt serve as vector clocks.
- Theorem (8.7): The algorithm simulates causally ordered broadcast, if the underlying communication system satisfies (basic) broadcast.
- Proof: Integrity and No Duplicates follow from the same properties of the basic broadcast. Liveness requires some arguing. Causal Ordering follows from the lemma.

Reliable Broadcast

- What do we require of a broadcast service when some of the procs can be faulty?
- Specifications differ from those of the corresponding non-fault-tolerant specs in two ways:
  1. proc indices are partitioned into "faulty" and "nonfaulty"
  2. the Liveness property is modified…

Reliable Broadcast Specification

- Nonfaulty Liveness: Every msg bc-sent by a nonfaulty proc is eventually bc-recv'ed by all nonfaulty procs.
- Faulty Liveness: Every msg bc-sent by a faulty proc is bc-recv'ed by either all the nonfaulty procs or none of them.
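
The two liveness conditions can be checked on the final outcome of a finished execution. A minimal Python sketch, assuming a hypothetical input format (which proc sent each msg, which msgs each proc ended up bc-recv'ing, and the set of nonfaulty procs); what faulty procs receive is deliberately ignored, as the spec gives no guarantee there.

```python
def check_reliable_liveness(senders, received, nonfaulty):
    """senders: msg -> proc that bc-sent it.
    received: proc -> set of msgs it bc-recv'ed by the end of the execution.
    nonfaulty: set of nonfaulty procs. Returns the violated properties."""
    violations = set()
    everyone = set(nonfaulty)
    for m, s in senders.items():
        got = {p for p in nonfaulty if m in received.get(p, set())}
        if s in nonfaulty and got != everyone:
            violations.add("Nonfaulty Liveness")     # must reach all nonfaulty procs
        if s not in nonfaulty and got not in (set(), everyone):
            violations.add("Faulty Liveness")        # all-or-none among nonfaulty procs
    return violations
```

A msg from a faulty proc that no nonfaulty proc receives is acceptable; one that only some nonfaulty procs receive is not.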
Discussion of Reliable Bcast Spec

- The specification is independent of any particular fault model.
- We will only consider implementations for crash faults.
- No guarantee is given concerning which messages are received by faulty procs.
- Can extend this spec to the various ordering variants:
  - msgs that are received by nonfaulty procs must conform to the relevant ordering property.

Spec of Failure-Prone Point-to-Point Message Passing System

- Before we can design an algorithm to implement reliable (i.e., fault-tolerant) broadcast, we need to know what we can rely on from the lower-layer communication system.
- Modify the previous point-to-point spec from the no-fault case in two ways:
  1. partition proc indices into "faulty" and "nonfaulty"
  2. the Liveness property is modified…

Spec of Failure-Prone Point-to-Point Message Passing System

- Nonfaulty Liveness: every msg sent by a nonfaulty proc to any nonfaulty proc is eventually received.
- Note that this places no constraints on the eventual delivery of messages to faulty procs.

Reliable Broadcast Algorithm

when rel-bc-send_i(m) occurs:
    invoke send_i((m,i)) to all procs

when recv_i((m,k)) from p_j occurs:
    if (m,k) has not already been recv'ed then
        invoke send_i((m,k)) to all procs
        invoke rel-bc-recv_i(m,k)
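
A minimal Python simulation of the relaying idea under crash faults. Assumptions of this sketch: msgs are tagged with their originator, the network delivers in-transit copies in random order, the sender's crash is modeled by letting its initial copies reach only the procs in the hypothetical parameter reached, and procs in crashed take no further steps. Each proc relays before delivering, as in the pseudocode.

```python
import random

def simulate_reliable_bcast(n, sender, m, reached, crashed, seed=0):
    """Relay-based reliable bcast over a simulated async network.
    The sender's initial copies reach only the procs in `reached`
    (hypothetical crash scenario); procs in `crashed` take no steps after that.
    Returns proc -> list of (m, origin) pairs rel-bc-recv'ed."""
    rng = random.Random(seed)
    in_transit = [(dest, (m, sender)) for dest in reached]   # sender's partial send
    seen = {p: set() for p in range(n)}
    delivered = {p: [] for p in range(n)}
    while in_transit:
        dest, msg = in_transit.pop(rng.randrange(len(in_transit)))
        if dest in crashed or msg in seen[dest]:
            continue                      # crashed procs and repeat copies are ignored
        seen[dest].add(msg)
        in_transit.extend((q, msg) for q in range(n))        # relay to all procs first
        delivered[dest].append(msg)                          # then rel-bc-recv
    return delivered
```

Even when a crashed sender's msg reaches a single nonfaulty proc, relaying spreads it to every nonfaulty proc (Faulty Liveness); if it reaches none, no nonfaulty proc delivers it.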
Correctness of Reliable Bcast Alg

- Integrity: follows from the Integrity property of the underlying point-to-point msg system.
- No Duplicates: follows from the No Duplicates property of the underlying point-to-point msg system and the check that this msg was not already received.
- Nonfaulty Liveness: follows from the Nonfaulty Liveness property of the underlying point-to-point msg system.
- Faulty Liveness: follows from relaying and the underlying Nonfaulty Liveness: a nonfaulty proc relays m to all procs before it rel-bc-recv's m, so if any nonfaulty proc rel-bc-recv's m, every nonfaulty proc eventually receives m.