Chapter 5.2: Monitoring and Resetting
Self-Stabilization
Shlomi Dolev
MIT Press, 2000
Draft of Jan 2004,
Shlomi Dolev, All Rights Reserved ©
chapter 5.2 - Monitoring and Resetting
5.1- 1
Main Ideas of a General Stabilizer
Two mechanisms:
• One monitors the consistency of the system.
• The other repairs the system configuration when an inconsistency is detected.
Combining the two is not trivial: the consistency-checking mechanism must let the correction process complete its task successfully, without repeatedly triggering new correction processes.
General Stabilizer Using Snapshots
Outline:
• The stabilizer repeatedly invokes a self-stabilizing distributed snapshot algorithm and examines the snapshots it collects.
• A predicate identifies whether a recorded configuration is safe.
• A global reset is invoked when a collected snapshot describes an unsafe configuration. (The global reset ensures that the system restarts in a predefined safe configuration.)
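The outline above can be read as one monitoring round. The following is a minimal Python sketch under stated assumptions, not the algorithm from the book: `collect_snapshot`, `is_safe`, and `global_reset` are hypothetical callables that stand in for the self-stabilizing snapshot, the safety predicate, and the global reset, and the toy system at the bottom is invented for illustration.

```python
from typing import Callable, Dict

Configuration = Dict[str, int]  # hypothetical: processor name -> state


def monitor_round(
    collect_snapshot: Callable[[], Configuration],
    is_safe: Callable[[Configuration], bool],
    global_reset: Callable[[], None],
) -> bool:
    """Run one monitoring round; return True if a reset was invoked."""
    snapshot = collect_snapshot()
    if is_safe(snapshot):
        return False
    global_reset()
    return True


# Toy system (assumption): a configuration is safe when all states are equal.
system: Configuration = {"P1": 0, "P2": 7, "P3": 0}


def toy_is_safe(config: Configuration) -> bool:
    return len(set(config.values())) == 1


def toy_reset() -> None:
    for name in system:
        system[name] = 0  # the predefined safe configuration


first = monitor_round(lambda: dict(system), toy_is_safe, toy_reset)
second = monitor_round(lambda: dict(system), toy_is_safe, toy_reset)
```

In this toy run the first round finds an unsafe snapshot and resets, and the second round finds a safe one and does nothing.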
Simple Snapshot Example
• The leader is the only processor that repeatedly invokes snapshots. (The leader can be elected by a leader-election algorithm.)
• The leader initializes the variables of the snapshot algorithm by invoking a distributed reset algorithm.
• Each processor records its state and the contents of its communication links.
• All the records are then collected at the leader and examined.
Snapshot Algorithm
• The leader records its state and repeatedly sends a marker to each of its neighbors.
• Each processor Pi that receives a marker for the first time, say from neighbor Pj, records its state and repeatedly sends a marker to each of its neighbors.
• After Pi has recorded its state, Pi starts recording the messages arriving from each neighbor Pk, k ≠ j. Pi stops recording the messages from Pk when a marker arrives from Pk.
• Pi sends its records to the leader after receiving markers from all of its neighbors.
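The recording rules at a single non-leader processor can be sketched as follows. This is an illustrative Python sketch, not the book's code: the class `SnapshotRecorder` and its method names are hypothetical, and sending markers and shipping the records to the leader are left out. The run at the bottom assumes a processor P4 with neighbors P1, P2, and P3 that receives its first marker from P1.

```python
from typing import Dict, List, Set


class SnapshotRecorder:
    """Recording logic of one non-leader processor Pi (illustrative only)."""

    def __init__(self, neighbors: List[str], state: str) -> None:
        self.neighbors = list(neighbors)
        self.state = state
        self.has_recorded = False
        self.recorded_state = ""
        # Channels on which Pi is still recording arriving messages.
        self.open_channels: Set[str] = set()
        self.channel_records: Dict[str, List[str]] = {p: [] for p in neighbors}

    def on_marker(self, sender: str) -> None:
        if not self.has_recorded:
            # First marker: record the local state. The channel from the
            # sender is recorded as empty; every other channel is recorded
            # until its own marker arrives.
            self.has_recorded = True
            self.recorded_state = self.state
            self.open_channels = set(self.neighbors) - {sender}
        else:
            self.open_channels.discard(sender)

    def on_message(self, sender: str, message: str) -> None:
        if sender in self.open_channels:
            self.channel_records[sender].append(message)

    def done(self) -> bool:
        """True once markers have arrived from all neighbors."""
        return self.has_recorded and not self.open_channels


p4 = SnapshotRecorder(["P1", "P2", "P3"], state="s4")
p4.on_marker("P1")          # first marker: P4 records its state
p4.on_message("P2", "m1")
p4.on_message("P2", "m2")
p4.on_marker("P2")          # stop recording channel P2 -> P4
p4.on_message("P3", "m3")
p4.on_marker("P3")          # stop recording channel P3 -> P4
```

After this run, `p4.channel_records` holds m1 and m2 for the channel from P2, m3 for the channel from P3, and nothing for the channel from P1.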
Snapshot Example
[Figure: processors P1, P2, P3, and P4]
1. P1 sends a marker to P4; P4 records its state.
2. P4 sends a marker to all of its neighbors.
3. P4 receives m1 and m2 and records them as the state of channel P2 -> P4, then receives m3 and records it as the state of channel P3 -> P4. Channel P1 -> P4 stays empty.
Transient Fault Detectors
Two extremes: global and local monitoring.
• The idea is to augment each processor with information about the system up to a certain distance.
• The processors check that their knowledge of the system is consistent and that the configuration is safe.
• Note: a single processor cannot know whether the configuration is safe.
Some Definitions
• Requirements for a transient fault detector:
1. No failure should be detected when both the system and the fault detector are in a safe configuration.
2. A failure must be detected when the algorithm is not in a safe configuration.
• Abstract task: a set of executions in which only the values of the output variables are shown in each configuration.
• Silent task: a task for which the output of the system that implements it is fixed.
(The next examples are silent tasks.)
Example 1 – Rooted Spanning Tree
The rooted tree abstract task:
• Each processor Pi maintains two boolean variables, Pi[j] and Ci[j], for each neighbor Pj. The value of Pi[j] is true if Pi considers Pj to be its parent, and the value of Ci[j] is true if Pi considers Pj to be one of its children.
• There is a single processor Pr that has a hardwired false value in every Pr[j].
Fault Detector – General Case
• Each processor Pi maintains a variable Vi(d) that holds the view of Pi on the topology of the system and on the output of every processor up to distance d from Pi. The radius of Vi(d) is d.
• For every silent task there is a failure detector that detects an inconsistency within the time it takes every two neighboring processors to communicate.
Fault Detector – General Case
The algorithm:
• The failure detector at processor Pi repeatedly communicates Vi(d) to each of its neighbors Pj. Whenever Pi receives Vj(d) from a neighbor Pj, Pi verifies that Vi(d) agrees with Vj(d). In addition, Pi checks that Vi(d) satisfies the requirements of the task.
• For the rooted tree task, the view of each processor includes the boolean variables of the processors in the view, and d is the radius of the tree. Each processor simply compares its view with the view of each neighbor; if no processor detects a fault, the system is consistent.
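For the rooted tree task, the two checks (agreement with the neighbors' views and the task requirement on the local view) might look like the sketch below. It makes several assumptions that are not in the slides: a view is simplified to a map from each processor to its parent (`None` for the root) instead of the boolean Pi[j] and Ci[j] variables, two views "agree" when they give the same value for every processor they share, and all function names are hypothetical.

```python
from typing import Dict, List, Optional

View = Dict[str, Optional[str]]  # assumption: processor -> parent (None for the root)


def views_agree(vi: View, vj: View) -> bool:
    """Two views agree if they match on every processor they both contain."""
    shared = vi.keys() & vj.keys()
    return all(vi[p] == vj[p] for p in shared)


def is_rooted_tree(view: View, root: str) -> bool:
    """Task requirement: every processor reaches the root via parent pointers."""
    if view.get(root, "missing") is not None:
        return False  # the root is absent or claims to have a parent
    for start in view:
        seen = set()
        node = start
        while node != root:
            if node in seen or node not in view or view[node] is None:
                return False  # cycle, dangling parent, or a second root
            seen.add(node)
            node = view[node]
    return True


def detect_failure(vi: View, neighbor_views: List[View], root: str) -> bool:
    if not is_rooted_tree(vi, root):
        return True
    return any(not views_agree(vi, vj) for vj in neighbor_views)


good: View = {"Pr": None, "P1": "Pr", "P2": "P1"}
cyclic: View = {"Pr": None, "P1": "P2", "P2": "P1"}

ok = detect_failure(good, [good], "Pr")
disagree = detect_failure(good, [cyclic], "Pr")
not_a_tree = detect_failure(cyclic, [], "Pr")
```

Here `ok` is false, while the other two calls report a failure: one because a neighbor's view differs, the other because the local view contains a cycle.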
Example 2 – Coloring
(Not Memory Consuming)
The coloring abstract task:
• Each processor Pi maintains an output variable Ci to which it assigns a color. For every two neighboring processors Pi and Pj, the values of Ci and Cj must be different.
• A failure detector for the coloring task uses views of radius one.
• Vi(d), with d = 1, consists of the color of Pi and the colors of its neighbors.
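A radius-one check for the coloring task fits in a few lines. The sketch below is illustrative: the function name `coloring_failure` and the representation of a view as a map from processor name to color are assumptions.

```python
from typing import Dict


def coloring_failure(own: str, view: Dict[str, int]) -> bool:
    """Return True if some neighbor in the radius-one view shares our color.

    `view` maps `own` and each of its neighbors to a color.
    """
    my_color = view[own]
    return any(color == my_color for p, color in view.items() if p != own)


legal = coloring_failure("P1", {"P1": 1, "P2": 2, "P3": 3})
clash = coloring_failure("P1", {"P1": 1, "P2": 1, "P3": 3})
```

The view holds only one color per neighbor, which is why the slide calls this detector not memory consuming.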
Example 3 – The Topology Update
The topology update abstract task:
• Each processor Pi maintains a variable Ti that contains a representation of the communication graph.
• The task is defined by a global relationship, yet a failure detector of radius one is sufficient.
• The view Vi(d) of processor Pi includes the variable Ti and the variable Tj of every neighbor Pj of Pi.
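One plausible reading of why radius one is enough: if every processor sees that its own neighborhood is described correctly in Ti and that every neighbor holds the same graph, then the same graph is held everywhere in a connected system. The sketch below follows that reading. The adjacency-map representation, the function name `topology_failure`, and the exact pair of checks are assumptions, not taken from the slides.

```python
from typing import Dict, FrozenSet

Graph = Dict[str, FrozenSet[str]]  # assumption: processor -> set of its neighbors


def topology_failure(
    own: str,
    actual_neighbors: FrozenSet[str],
    t_own: Graph,
    t_neighbors: Dict[str, Graph],
) -> bool:
    """Radius-one check at processor `own`."""
    # Ti must describe the real neighborhood of Pi.
    if t_own.get(own) != actual_neighbors:
        return True
    # Every neighbor must hold exactly the same graph.
    return any(t_j != t_own for t_j in t_neighbors.values())


# Line graph a - b - c.
t: Graph = {
    "a": frozenset({"b"}),
    "b": frozenset({"a", "c"}),
    "c": frozenset({"b"}),
}
stale: Graph = dict(t, c=frozenset())  # a corrupted copy held by a neighbor

consistent = topology_failure("b", frozenset({"a", "c"}), t, {"a": t, "c": t})
neighbor_differs = topology_failure("b", frozenset({"a", "c"}), t, {"a": t, "c": stale})
wrong_locally = topology_failure("a", frozenset({"b", "c"}), t, {"b": t})
```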
Fault Detector for Non-silent Algorithms
Points for the detector:
• The description is for a synchronous system; with a synchronizer it can be adapted to an asynchronous system.
• A special data structure, the pyramid, is used.
• Every processor Pi maintains a pyramid of views Δi = Vi(0), Vi(1), Vi(2), …, Vi(d), where Vi(l) is a view of all the processors at distance no more than l from Pi, as they were l time units ago.
Fault Detector for Non-silent Algorithms
The algorithm:
• Neighboring processors exchange their pyramids and check whether they agree on their shared portions.
• In addition, every processor Pi checks whether Vi(d) is a consistent configuration for the input algorithm. (Here, a consistent configuration is one that is reachable from the initial state.)
Consistency of Configuration
• Every processor Pi checks that its state in the view Vi(l), 0 < l < d - 1, is obtained by executing AL using the state of Pi and the states of Pi's neighbors in Vi(l+1).
• This test ensures that the configuration is consistent.
Pyramid updating:
• Pi receives the pyramid Δj of every neighbor Pj. Pi uses the received values of Vj(d-1) to construct Vi(d).
• Analogously, Pi uses the received values of Vj(k-1), together with Vi(k-1), to compute the new value of Vi(k).
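The pyramid update can be sketched as a shift by one level: the new Vi(k) is assembled from the old Vi(k-1) and the received Vj(k-1) of every neighbor, and Vi(0) is the current local state. The Python below is a minimal sketch under assumptions: a view is a map from processor name to state, states are integers, `update_pyramid` is a hypothetical name, and the agreement check between pyramids is not included (conflicting entries are simply overwritten).

```python
from typing import Dict, List

View = Dict[str, int]  # assumption: processor name -> state
Pyramid = List[View]   # [V(0), V(1), ..., V(d)]


def update_pyramid(
    own: str, own_state: int, old: Pyramid, received: Dict[str, Pyramid]
) -> Pyramid:
    """Build the new pyramid of `own` from its old one and the neighbors' ones."""
    d = len(old) - 1
    new: Pyramid = [{own: own_state}]  # V(0): the current local state
    for k in range(1, d + 1):
        # V(k) now: states from k steps ago within distance k, which is the
        # union of the previous V(k-1) of Pi and of each neighbor Pj.
        level = dict(old[k - 1])
        for pyramid_j in received.values():
            level.update(pyramid_j[k - 1])
        new.append(level)
    return new


# Line a - b - c with d = 2. For the example, the state of every processor
# at time t is simply t, and the pyramids below are the ones held at time 5.
pyr_a: Pyramid = [{"a": 5}, {"a": 4, "b": 4}, {"a": 3, "b": 3, "c": 3}]
pyr_b: Pyramid = [{"b": 5}, {"a": 4, "b": 4, "c": 4}, {"a": 3, "b": 3, "c": 3}]

new_a = update_pyramid("a", 6, pyr_a, {"b": pyr_b})
```

At time 6 the pyramid of a again describes states from 0, 1, and 2 time units ago at distances 0, 1, and 2.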
Self-Stabilizing Reset
• A common technique for converting a distributed algorithm AL into a self-stabilizing algorithm DRA for a certain abstract task Τ: compose a failure detector D, a self-stabilizing reset R, and the algorithm AL.
  D ◦ R ◦ AL => DRA(Τ)
• Every processor executes steps of each of the algorithms infinitely often.
• The task of the self-stabilizing reset is to initialize the system, upon request, to a predefined safe configuration.
• Every processor may invoke a reset at any of its execution steps.
Self-Stabilizing Reset Example
• The reset algorithm is implemented by a fair composition of a self-stabilizing leader-election and spanning-tree-construction algorithm with a version of the β-synchronizer.
Assumption:
• We assume that a rooted spanning tree already exists and design the reset algorithm for that system.
Self-Stabilizing Reset Example
(Root)
do forever
    forall Pj ∈ N(i) do lrji := read(rji)
    if (∀ Pj ∈ children(i): lrji.color = colori) then
        colori := (colori + 1) mod (5n - 3)
        if (∀ Pj ∈ children(i): lrji.ResetRequest = false) and invokei = false then
            reseti := false
        else
            reseti := true
            invokei := false
            InitializeState(DA)
    forall Pj ∈ children(i) do
        write rij.(color, reset) := (colori, reseti)
od
Self-Stabilizing Reset Example
(Other)
do forever
    forall Pj ∈ N(i) do lrji := read(rji)
    if (colori ≠ lrparent,i.color) then
        colori := lrparent,i.color
        if lrparent,i.reset = true then
            reseti := true
            invokei := false
            InitializeState(DA)
        else
            reseti := false
    else if (∀ Pj ∈ children(i): lrji.color = colori) then
        if reseti = false then invokei := FaultDetect()
        if (∀ Pj ∈ children(i): lrji.ResetRequest = false) and invokei = false then
            ResetRequesti := false
        else
            ResetRequesti := true
        write ri,parent.(color, ResetRequest) := (colori, ResetRequesti)
    forall Pj ∈ children(i) do
        write rij.(color, reset) := (colori, reseti)
od
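To see the color and reset waves move through a tree, the root and non-root programs can be simulated together. The Python below is a sketch under stated assumptions and is not the book's implementation: registers are modeled as plain tuples, each step is atomic, processors are scheduled round-robin, the `faulty` flag is a hypothetical stand-in for FaultDetect(), InitializeState only clears that flag and counts calls, and the tree shape (P2 as a child of the root) is assumed.

```python
from typing import List, Optional, Tuple


class Proc:
    """One processor; registers are modeled as plain tuples (an assumption)."""

    def __init__(self, name: str, parent: Optional["Proc"], n: int) -> None:
        self.name, self.parent, self.n = name, parent, n
        self.children: List["Proc"] = []
        if parent is not None:
            parent.children.append(self)
        self.color = 0
        self.reset = self.invoke = self.reset_request = False
        self.faulty = False  # hypothetical stand-in for FaultDetect()
        self.init_count = 0  # number of InitializeState calls
        self.to_parent: Tuple[int, bool] = (0, False)    # (color, ResetRequest)
        self.to_children: Tuple[int, bool] = (0, False)  # (color, reset)

    def initialize_state(self) -> None:
        self.faulty = False
        self.init_count += 1

    def do_reset(self) -> None:
        self.reset, self.invoke = True, False
        self.initialize_state()

    def step(self) -> None:
        kids_same = all(c.to_parent[0] == self.color for c in self.children)
        kids_quiet = all(not c.to_parent[1] for c in self.children)
        if self.parent is None:  # root program
            if kids_same:
                self.color = (self.color + 1) % (5 * self.n - 3)
                if kids_quiet and not self.invoke:
                    self.reset = False
                else:
                    self.do_reset()
        else:  # non-root program
            parent_color, parent_reset = self.parent.to_children
            if self.color != parent_color:
                self.color = parent_color
                if parent_reset:
                    self.do_reset()
                else:
                    self.reset = False
            elif kids_same:
                if not self.reset:
                    self.invoke = self.faulty
                self.reset_request = not (kids_quiet and not self.invoke)
                self.to_parent = (self.color, self.reset_request)
        self.to_children = (self.color, self.reset)


n = 4
root = Proc("Root", None, n)
p4 = Proc("P4", root, n)
p2 = Proc("P2", root, n)  # P2's parent is an assumption
p3 = Proc("P3", p4, n)
procs = [root, p4, p2, p3]

p3.faulty = True  # P3's fault detector will report a fault
for _ in range(30):
    for p in procs:
        p.step()
```

In this run the request travels from P3 through P4 to the root, the root starts a reset with its next color, and every processor initializes its state once before the reset flags return to false.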
Little Spanning Tree Example
[Figure: spanning tree over the processors Root, P4, P2, and P3; P4 is the parent of P3 and a child of Root]
1. P3 detects a fault and sends a reset request to its parent, P4.
2. P4 forwards the request to the root.
3. The root initializes its state and sends a reset to the whole tree.