Chapter 21 - Kent State University


Chapter 21

Asynchronous Network Computing with Process Failures

By Sindhu Karthikeyan.

21.1 The Network Model

Theorem 21.1 If A is an asynchronous broadcast system with a reliable broadcast channel, then there is an asynchronous send/receive system B with reliable FIFO send/receive channels that has the same user interface as A and that “simulates” A, as follows. For every execution α of B, there is an execution α′ of A such that the following conditions hold:

1. α and α′ are indistinguishable to U (the composition of the users U_i).

2. For each i, a stop_i occurs in α exactly if it does in α′.

Moreover, if α is fair, then α′ is also fair.

Proof Sketch.

System B has one processor Q_i for each processor P_i of A.

Each Q_i is responsible for simulating P_i and for participating in the simulation of the broadcast channel.

Q_i simulates a bcast(m)_i output of P_i by performing send(m,t)_{i,j} outputs for all j ≠ i, where t is a local integer-valued tag (which starts at 1 and is incremented with each successive broadcast), and also by performing an internal step simulating receive(m)_{i,i}.

If Q_i receives a message (m,t) sent by Q_j, it helps in the simulation of P_j’s broadcast by relaying the message: it sends (m,t,j) to all processors other than i and j.

Q_i collects the tagged messages originally broadcast by P_j, j ≠ i, which it receives either directly from Q_j or through relays.

Q_i is also allowed to perform an internal step simulating a receive(m)_{j,i}. It can do this only when Q_i has a message (m,t) originally broadcast by P_j, Q_i has already relayed (m,t,j) to all processors other than i and j, and Q_i has already simulated receive_{j,i} events for the messages from P_j with all tag values less than t.

Some key facts for the proof are as follows:


No Q_i simulates a receive(m)_{j,i} unless it has succeeded in sending the corresponding (m,t) to all the other processors, and has thus guaranteed that all processors will receive (m,t) from j.

Although Q_i can receive the messages broadcast by P_j out of order, the tags allow Q_i to sort these messages into the proper order.

If a message with tag t is sent by any processor Q_i, then all messages originating at P_i with smaller tag values must previously have been sent to all processors.
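
The bookkeeping that each Q_i performs in this proof sketch can be illustrated in Python. This is a minimal sketch under stated assumptions, not part of the original proof: the class name Q, its fields, and the outbox list that stands in for the FIFO send/receive channels are hypothetical, processors are numbered from 0, and both direct and relayed messages are represented as triples (m, t, origin) for simplicity.

```python
from collections import defaultdict


class Q:
    """Hypothetical sketch of processor Q_i simulating P_i and the broadcast channel."""

    def __init__(self, i, n):
        self.i, self.n = i, n
        self.next_tag = 1                        # tag for Q_i's own next broadcast
        self.pending = defaultdict(dict)         # origin j -> {tag: message} not yet delivered
        self.expected = defaultdict(lambda: 1)   # origin j -> next tag to deliver
        self.outbox = []                         # (destination, payload) pairs handed to channels
        self.delivered = []                      # simulated receive(m)_{j,i} events as (j, m)

    def bcast(self, m):
        """Simulate bcast(m)_i: send to every j != i, then simulate receive(m)_{i,i}."""
        t = self.next_tag
        self.next_tag += 1
        for j in range(self.n):
            if j != self.i:
                self.outbox.append((j, (m, t, self.i)))
        self.delivered.append((self.i, m))

    def on_receive(self, payload):
        """Handle a message that arrived directly from its origin or through a relay."""
        m, t, origin = payload
        seen = t in self.pending[origin] or t < self.expected[origin]
        if origin == self.i or seen:
            return                               # own message or duplicate: nothing to do
        self.pending[origin][t] = m
        # Relay to all processors other than i and the origin.
        for k in range(self.n):
            if k not in (self.i, origin):
                self.outbox.append((k, (m, t, origin)))
        # Simulate receive events strictly in tag order, only after relaying.
        while self.expected[origin] in self.pending[origin]:
            tag = self.expected[origin]
            self.delivered.append((origin, self.pending[origin].pop(tag)))
            self.expected[origin] += 1


q = Q(i=0, n=3)
q.on_receive(("b", 2, 1))   # tag 2 arrives first and is held back
q.on_receive(("a", 1, 1))   # tag 1 arrives, so both can be delivered in order
print(q.delivered)          # [(1, 'a'), (1, 'b')]
```

Appending to outbox before recording a delivery mirrors the condition that Q_i relays a message before it simulates the corresponding receive event; a real channel model would additionally have to confirm that the sends actually took place.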

21.2 Impossibility of Agreement in the Presence of Faults

Figure: System A, an asynchronous network system for the agreement problem, showing a user U_i with its init(v)_i and decide(v)_i actions and the broadcast channel.


The user interface of system A has input actions init(v)_i and output actions decide(v)_i, where 1 ≤ i ≤ n; A also has stop_i as an input action.

Each user U_i has input actions decide(v)_i and output actions init(v)_i, and U_i is assumed to perform at most one init_i action in any execution.

We consider the following conditions on the combined system consisting of A and the users U_i.

Well-formedness: In any execution, and for any i, the interactions between U_i and A are well-formed for i.

Agreement: In any execution, all decision values are identical.

Validity: In any execution, if all init actions that occur contain the same value v, then v is the only possible decision value.

Failure-free termination: In any fair failure-free execution in which init events occur on all ports, a decide event occurs on each port.

We say that an asynchronous network system solves the agreement problem if it guarantees well-formedness, agreement, validity, and failure-free termination.

f-failure termination, 0 ≤ f ≤ n: In any fair execution in which init events occur on all ports, if there are stop events on at most f ports, then a decide event occurs on every non-failing port.

Wait-free termination is defined to be the special case of f-failure termination where f = n.
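
The agreement, validity, and f-failure termination conditions can be made concrete as checks on a summary of one execution. The sketch below is only an illustration under assumptions: the function names and the dictionary encoding (port to init value, port to decision value, set of stopped ports) are hypothetical, and the summary is assumed to describe a complete fair execution.

```python
def check_agreement(decisions):
    """All decision values that occur are identical."""
    return len(set(decisions.values())) <= 1


def check_validity(inits, decisions):
    """If every init that occurs carries the same value v, only v may be decided."""
    init_values = set(inits.values())
    if len(init_values) != 1:
        return True                      # mixed (or no) inputs: any decision value is allowed
    (v,) = init_values
    return all(d == v for d in decisions.values())


def check_f_failure_termination(ports, inits, stopped, decisions, f):
    """With inits on all ports and at most f stops, every non-failing port decides."""
    if set(inits) != set(ports) or len(stopped) > f:
        return True                      # hypothesis not met, so the condition holds vacuously
    return all(p in decisions for p in ports if p not in stopped)


ports = [1, 2, 3]
inits = {1: 0, 2: 0, 3: 0}
decisions = {1: 0, 3: 0}                 # port 2 stopped before deciding
print(check_agreement(decisions))                                    # True
print(check_validity(inits, decisions))                              # True
print(check_f_failure_termination(ports, inits, {2}, decisions, 1))  # True
```

Setting f = len(ports) in the last check corresponds to wait-free termination.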

Theorem 21.2 There is no algorithm in the asynchronous broadcast model with a reliable broadcast channel that solves the agreement problem and guarantees 1-failure termination.

Proof: The construction begins with a fair failure-free input-first execution with a bivalent initialization.

Then we repeatedly extend the current execution, including at least one step of process 1 in the first extension, then at least one step of process 2 in the second extension, and so on, in round-robin order, all while maintaining bivalence and avoiding failures.

The resulting execution is fair, because each process takes infinitely many steps.

But no process ever reaches a decision, which contradicts the failure-free termination requirement.

21.3 A Randomized Algorithm

Theorem 21.2 says that the agreement problem cannot be solved in an asynchronous network system, even with only a single stopping failure.

The agreement problem can, however, be solved in a randomized asynchronous network. This model is stronger than the ordinary asynchronous network model because it allows the processors to make random choices during the computation.

The correctness conditions are slightly weaker than those for the ordinary asynchronous network model: all the other conditions are still guaranteed, but the termination condition is now probabilistic.

All nonfaulty processors will decide by time t after the arrival of all inputs with probability at least p(t), where p is a particular monotone nondecreasing function that approaches 1 as t grows. This implies eventual termination with probability 1.

BenOr algorithm: works for n > 3f and V = {0,1}

Each process P_i has local variables x and y, which are initially null.

An init(v)_i input causes process P_i to set x := v.

P_i executes a series of stages, each stage consisting of two rounds.

P_i begins stage 1 after it receives its initial value in an init_i input.

It continues to perform the algorithm even after it decides.

At each stage s ≥ 1, P_i does the following:

Round 1: P_i broadcasts (“first”,s,v), where v is the current value of x, and then waits to obtain n − f messages of the form (“first”,s,*).

If all of these have the same value v, then P_i sets y := v; otherwise it sets y := null.

Round 2: P_i broadcasts (“second”,s,v), where v is its current value of y, and then waits to obtain n − f messages of the form (“second”,s,*).

There are three cases:

1. If all of the n − f messages have the same value v ≠ null, then P_i sets x := v and performs a decide(v)_i if it has not already done so.

2. If at least n − 2f of the messages, but not all of them, have the same value v ≠ null, then P_i sets x := v but does not decide. (Since n > 3f is assumed, there cannot be two different such values v.)

3. Otherwise, P_i sets x to either 0 or 1, choosing randomly with equal probability.
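
The stage structure described above can be exercised with a small simulation. The following is a rough sketch rather than a faithful model of the asynchronous network: the function name ben_or and its parameters are hypothetical, no process actually fails, and asynchrony is approximated by letting each process see a random subset of n − f of the n messages broadcast in each round. The simulation also stops once every process has decided, whereas the algorithm itself keeps running.

```python
import random


def ben_or(inits, f, rng, max_stages=1000):
    """Simulate BenOr stages; return (decisions, number of stages executed)."""
    n = len(inits)
    assert n > 3 * f, "the algorithm is stated for n > 3f"
    x = list(inits)                 # current value x of each process
    decided = [None] * n            # decision of each process, if any
    for stage in range(1, max_stages + 1):
        # Round 1: everyone broadcasts x and reads n - f "first" messages.
        firsts = list(x)
        y = []
        for _ in range(n):
            got = rng.sample(firsts, n - f)
            y.append(got[0] if len(set(got)) == 1 else None)
        # Round 2: everyone broadcasts y and reads n - f "second" messages.
        seconds = list(y)
        for i in range(n):
            got = rng.sample(seconds, n - f)
            non_null = [v for v in got if v is not None]
            counts = {v: non_null.count(v) for v in set(non_null)}
            v, c = max(counts.items(), key=lambda kv: kv[1], default=(None, 0))
            if c == n - f:          # case 1: unanimous non-null value, decide
                x[i] = v
                if decided[i] is None:
                    decided[i] = v
            elif c >= n - 2 * f:    # case 2: enough support, adopt without deciding
                x[i] = v
            else:                   # case 3: fair coin flip
                x[i] = rng.choice([0, 1])
        if all(d is not None for d in decided):
            return decided, stage
    return decided, max_stages


decided, stages = ben_or([0, 1, 1, 0], f=1, rng=random.Random(7))
print(decided, stages)
```

Passing the random generator in explicitly keeps runs reproducible; with identical inputs such as [1, 1, 1, 1] every process decides in stage 1 regardless of the seed.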

Lemma 21.3 The BenOr algorithm guarantees well-formedness, agreement, and validity.

Proof: For validity, suppose that all init events that occur in an execution contain the same value v. Then every message in stage 1 carries v, so any process that completes stage 1 must decide on v in that stage, which satisfies the validity condition.

For agreement, suppose that P_i decides v at stage s and that no process decides at any smaller-numbered stage. Then P_i receives n − f (“second”,s,v) messages. This implies that any other process P_j that completes stage s receives at least n − 2f (“second”,s,v) messages, since it hears from all but at most f of the processors that P_i hears from.

So P_j cannot decide on a value different from v at stage s, and it sets x := v at stage s.

Since this is true for every P_j that completes stage s, it follows, as in the validity argument, that any process that completes stage s + 1 must decide v at stage s + 1.

Lemma 21.4 In every fair execution of the BenOr algorithm in which init events occur on all ports, each nonfaulty process completes infinitely many stages. Moreover, if l is an upper bound on the time for each process task, and d is an upper bound on the delivery time for the oldest message in transit from each P_i to each P_j, then each nonfaulty process completes each stage s by O(s(d + l)) time after the last init event.

Lemma 21.5 For any adversary and any s ≥ 0, with probability at least 1 − (1 − 1/2^n)^s, all nonfaulty processes decide within s + 1 stages.
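
To get a feel for the bound 1 − (1 − 1/2^n)^s in Lemma 21.5, it can be evaluated numerically. The helper below is a hypothetical illustration, with n the number of processes and s the stage parameter from the lemma.

```python
def decide_probability_bound(n, s):
    """Lower bound on the probability that all nonfaulty processes decide within s + 1 stages."""
    return 1 - (1 - 1 / 2**n) ** s


print(decide_probability_bound(1, 1))   # 0.5
print(decide_probability_bound(2, 2))   # 0.4375
for s in (0, 10, 100, 1000):
    print(s, decide_probability_bound(4, s))
```

For fixed n the bound rises toward 1 as s grows, but the number of stages needed to reach a given confidence level grows roughly like 2^n, so the guarantee is weak for large n.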

Lemma 21.6 For any adversary and any t ≥ 0, with probability at least p(t), all nonfaulty processors decide within time t after the last init event.

The main correctness result is:

Theorem 21.7 The BenOr algorithm guarantees well-formedness, agreement, and validity. It also guarantees that, with probability 1, all nonfaulty processors eventually decide.

21.4 Failure Detectors

Another way to solve the agreement problem in a fault-prone asynchronous network is to strengthen the model by adding a new type of system component known as a failure detector.

A failure detector is a module that provides information to the processes in an asynchronous network about previous process failures.

The simplest failure detector is a perfect failure detector, which is guaranteed to report only failures that have actually happened and to eventually report all such failures to all other non-failed processes.

Formally, we consider a system A that has the same structure as an asynchronous network system, except that it has additional input actions inform-stopped(j)_i for each pair i and j of ports, i ≠ j.

Figure: Architecture for an asynchronous broadcast system with a perfect failure detector.

PerfectFDAgreement algorithm (informal):

Each process P_i attempts to stabilize two pieces of data:

A vector val, indexed by {1,2,…,n}, with values in V ∪ {null}. If val(j) = v ∈ V, it means that P_i knows that P_j’s initial value is v.

A set stopped of process indices. If j ∈ stopped, it means that P_i knows that P_j has stopped.

Process P_i continually broadcasts its current val and stopped data, and updates them upon receipt of new data from processes not in stopped.

It ignores messages from the processors it has already placed in stopped.

P_i also keeps track of the processors that “ratify” its data, that is, those from which it receives the same (val, stopped) data that it already has.

When P_i reaches a point where its data has “stabilized”, that is, when it has received ratifications for its current data from all non-stopped processes, P_i decides on the non-null value corresponding to the smallest index in its val vector.
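
The per-process state and the decision rule of the informal description can be sketched as follows. This is an assumption-laden illustration, not the formal algorithm: the class PerfectFDProcess, its method names, the 0-based indexing, and the choice to clear the ratification set whenever the local data changes are all hypothetical, and the broadcast channel and failure detector are left to the caller.

```python
class PerfectFDProcess:
    """Hypothetical sketch of process P_i in PerfectFDAgreement."""

    def __init__(self, i, n, init_value):
        self.i, self.n = i, n
        self.val = [None] * n          # val[j] = known initial value of P_j, or None
        self.val[i] = init_value
        self.stopped = set()           # indices of processes known to have stopped
        self.ratified = {i}            # processes that sent data equal to the current data
        self.decision = None

    def snapshot(self):
        """The (val, stopped) data that P_i broadcasts."""
        return tuple(self.val), frozenset(self.stopped)

    def inform_stopped(self, j):
        """Input from the failure detector: P_j has stopped."""
        if j not in self.stopped:
            self.stopped.add(j)
            self.ratified = {self.i}   # data changed, so earlier ratifications are void
        self._try_decide()

    def on_receive(self, j, data):
        """Handle the (val, stopped) data broadcast by P_j."""
        if j in self.stopped:
            return                     # ignore processes already placed in stopped
        val_j, stopped_j = data
        before = self.snapshot()
        for k, v in enumerate(val_j):
            if self.val[k] is None and v is not None:
                self.val[k] = v
        self.stopped |= stopped_j
        if self.snapshot() != before:
            self.ratified = {self.i}
        if data == self.snapshot():
            self.ratified.add(j)       # P_j ratifies the current data
        self._try_decide()

    def _try_decide(self):
        alive = set(range(self.n)) - self.stopped
        if self.decision is None and alive <= self.ratified:
            # Non-null value with the smallest index in val.
            self.decision = next(v for v in self.val if v is not None)


p0, p1 = PerfectFDProcess(0, 2, "a"), PerfectFDProcess(1, 2, "b")
p0.on_receive(1, p1.snapshot())    # p0 learns "b"; the data differs, so no ratification yet
p1.on_receive(0, p0.snapshot())    # p1 learns "a", and p0's data matches its own
p0.on_receive(1, p1.snapshot())    # p1's data now matches p0's
print(p0.decision, p1.decision)    # a a
```

In a full system each process would rebroadcast its snapshot repeatedly, and the inform_stopped calls would be driven by the perfect failure detector.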

Theorem 21.8 PerfectFDAgreement, when used with any perfect failure detector, solves the agreement problem and guarantees wait-free termination.