Chapter 5.2 Monitoring and Resetting
Self-Stabilization, Shlomi Dolev, MIT Press, 2000
Draft of Jan 2004, Shlomi Dolev, All Rights Reserved ©
chapter 5.2 - Monitoring and Resetting 5.1- 1

Main Ideas of the General Stabilizer
Two mechanisms:
One monitors the consistency of the system.
The other repairs the system configuration when an inconsistency is detected.
Combining them is not trivial: the consistency-checking mechanism must let the correction process complete its task successfully, without repeatedly triggering new correction processes.

General Stabilizer Using Snapshots
Outline:
The stabilizer invokes a self-stabilizing distributed snapshot algorithm and examines the snapshots it collects.
A predicate identifies whether a configuration is safe.
A global reset is invoked upon collecting a snapshot that describes an unsafe configuration. (The global reset ensures that the system is started in a predefined safe configuration.)

Simple Snapshot Example
The leader is the only processor that repeatedly invokes snapshots (it can be elected by a leader-election algorithm).
The leader initializes the variables of the snapshot algorithm by invoking a distributed reset algorithm.
Each processor records its state and the contents of its incoming communication links.
All the records are then collected at the leader and examined.

Snapshot Algorithm
The leader records its state and repeatedly sends a marker to each of its neighbors.
When a processor Pi receives a marker for the first time (say, from neighbor Pj), it records its state and repeatedly sends a marker to each of its neighbors.
After Pi has recorded its state, it starts recording the messages arriving from each neighbor Pk, k ≠ j.
Pi stops recording the messages from Pk when a marker arrives from Pk.
Pi sends its records to the leader after it has received markers from all its neighbors.

Snapshot Example
P1 sends a marker to P4, and P4 records its state.
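The marker rules above can be sketched as a small event-driven simulation that also replays the example. This is a minimal sketch, not the book's algorithm verbatim: it assumes reliable FIFO channels, so each marker is sent once instead of repeatedly, and it assumes a hypothetical star topology centered at P4 with m1 and m2 already in transit on P2 -> P4 and m3 on P3 -> P4. The names `SnapshotNetwork`, `deliver`, and `collected` are illustrative, and storing a finished record in `collected` stands in for sending it to the leader.

```python
from collections import deque

MARKER = "MARKER"


class Processor:
    def __init__(self, pid, neighbors):
        self.pid = pid
        self.neighbors = neighbors
        self.state = 0               # illustrative application state: messages received so far
        self.recorded_state = None   # None until Pi records its state
        self.channel_records = {}    # Pk -> messages recorded on channel Pk -> Pi
        self.recording = set()       # neighbors whose incoming channel is being recorded
        self.markers_from = set()


class SnapshotNetwork:
    def __init__(self, adjacency, leader):
        self.procs = {p: Processor(p, nbrs) for p, nbrs in adjacency.items()}
        # One FIFO queue per directed channel (src, dst).
        self.channels = {(p, q): deque() for p, nbrs in adjacency.items() for q in nbrs}
        self.leader = leader
        self.collected = {}          # stands in for the records sent to the leader

    def send(self, src, dst, msg):
        self.channels[(src, dst)].append(msg)

    def _record_and_send_markers(self, p, first_marker_from=None):
        proc = self.procs[p]
        proc.recorded_state = proc.state
        for q in proc.neighbors:
            proc.channel_records[q] = []
            if q != first_marker_from:   # the channel of the first marker stays empty
                proc.recording.add(q)
            self.send(p, q, MARKER)

    def start(self):
        self._record_and_send_markers(self.leader)

    def deliver(self, src, dst):
        """Deliver the oldest message waiting on channel src -> dst."""
        msg = self.channels[(src, dst)].popleft()
        proc = self.procs[dst]
        if msg == MARKER:
            proc.markers_from.add(src)
            if proc.recorded_state is None:
                self._record_and_send_markers(dst, first_marker_from=src)
            proc.recording.discard(src)  # stop recording Pk -> Pi once Pk's marker arrives
            if proc.markers_from == set(proc.neighbors):
                self.collected[dst] = (proc.recorded_state, dict(proc.channel_records))
        else:
            proc.state += 1
            if src in proc.recording:
                proc.channel_records[src].append(msg)


# Replaying the example (hypothetical star topology centered at P4, leader P1).
adjacency = {"P1": ["P4"], "P2": ["P4"], "P3": ["P4"], "P4": ["P1", "P2", "P3"]}
net = SnapshotNetwork(adjacency, leader="P1")
for src, msg in [("P2", "m1"), ("P2", "m2"), ("P3", "m3")]:
    net.send(src, "P4", msg)         # in transit before the snapshot starts
net.start()                          # P1 records its state, sends a marker to P4
net.deliver("P1", "P4")              # P4 records its state, sends markers to P1, P2, P3
for q in ("P2", "P3", "P1"):
    net.deliver("P4", q)             # P2 and P3 record their states and answer with markers
for _ in range(3):
    net.deliver("P2", "P4")          # m1, m2, then P2's marker
for _ in range(2):
    net.deliver("P3", "P4")          # m3, then P3's marker
print(net.collected["P4"])           # (0, {'P1': [], 'P2': ['m1', 'm2'], 'P3': ['m3']})
```

P4's record reproduces the example: its state from before m1, m2, m3 arrived, the two channels carrying in-transit messages, and an empty channel from P1.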

Snapshot Example (cont.)
P4 sends a marker to all its neighbors.

Snapshot Example (cont.)
P4 receives m1 and m2 and records them as the state of channel P2 -> P4; it receives m3 and records it as the state of channel P3 -> P4. Channel P1 -> P4 stays empty.
[Figure: network of processors P1, P2, P3, P4]

Transient Fault Detectors
There are two extremes: global and local monitoring.
The idea is to augment each processor with information about the system up to a certain distance.
The processors check that their knowledge of the system is consistent and that the configuration is safe.
Note: a single processor cannot know whether the configuration is safe.

Some Definitions
Requirements of a transient fault detector:
1. No failure is detected when both the system and the fault detector are in a safe configuration.
2. A failure must be detected when the algorithm is not in a safe configuration.
Abstract task: a set of executions in which only the values of the output variables are shown in each configuration.
Silent task: a task is silent if the output of the system that implements it is fixed. (The following examples concern silent tasks.)

Example 1 – Rooted Spanning Tree
The rooted tree abstract task: each processor Pi maintains two boolean variables, Pi[j] and Ci[j], for each neighbor Pj. Pi[j] (respectively Ci[j]) is true if Pi considers Pj to be its parent (respectively one of its children).
There is a single processor Pr (the root) that has a hardwired false value in every Pr[j].

Fault Detector – General Case
Each processor Pi maintains a variable Vi(d) holding Pi's view of the topology of the system and of the output of every processor up to distance d from Pi. The radius of Vi(d) is d.
For every silent task there is a failure detector that detects an inconsistency within the time it takes for every two neighboring processors to communicate.

Fault Detector – General Case (cont.)
The algorithm: the failure detector at processor Pi repeatedly communicates Vi(d) to each of its neighbors Pj. Whenever Pi receives Vj(d) from a neighbor Pj, Pi verifies that Vi(d) and Vj(d) agree on the portion of the system they share. In addition, Pi checks that Vi(d) satisfies the requirements of the task.
For the rooted tree task, the view of each processor includes the boolean variables of the processors, and d is the radius of the tree. Each processor simply compares its view with the views of its neighbors; if no processor detects a fault, the system is consistent.

Example 2 – Coloring (Not Memory Consuming)
The coloring abstract task: each processor Pi maintains an output variable Ci to which it assigns a value. For every two neighboring processors Pi and Pj, the values of Ci and Cj must differ.
A failure detector for the coloring task employs views of radius one: Vi(1) consists of the color of Pi and the colors of its neighbors.

Example 3 – The Topology Update
The topology update abstract task: each processor Pi maintains a variable Ti containing a representation of the communication graph.
This task is defined by a global relation, yet a failure detector of radius one suffices: the view Vi(1) of processor Pi includes Ti and the variable Tj of every neighbor Pj of Pi.

Fault Detector for Non-silent Algorithms
The detector is described for synchronous systems; combined with a synchronizer, it can be used in asynchronous systems.
It uses a special data structure, the pyramid. Every processor Pi maintains a pyramid Δi = Vi(0), Vi(1), Vi(2), …, Vi(d) of views, where Vi(l) is a view of all processors within distance l of Pi, as they were l time units ago.
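A minimal sketch of how pyramids could be maintained and checked in a synchronous system. Everything concrete is an assumption for illustration: the input algorithm AL is replaced by a hypothetical step `al_step` (a processor's next value is its own value plus its neighbors' values, mod 7), the graph is a 4-processor ring, and a transient fault is modeled by changing one processor's state without running AL. The update follows the pyramid-updating rule on the following slides (the new Vi(k) merges the old Vi(k-1) with every neighbor's old Vj(k-1)); the checks are Pi's own-state test against the next, older level and the agreement of neighboring pyramids on shared portions. The test that Vi(d) is reachable from an initial state is omitted.

```python
from typing import Dict, List, Optional, Tuple

Graph = Dict[int, List[int]]
View = Dict[int, int]          # processor id -> state
Pyramid = List[View]           # [Vi(0), Vi(1), ..., Vi(d)]


def al_step(i: int, states: View, graph: Graph) -> int:
    """Hypothetical AL: Pi's next state from its own and its neighbors' states."""
    return (states[i] + sum(states[j] for j in graph[i])) % 7


def update_pyramids(graph: Graph, states: View,
                    old: Dict[int, Pyramid], d: int) -> Dict[int, Pyramid]:
    """New Vi(0) is Pi's current state; new Vi(k) merges old Vi(k-1) with each old Vj(k-1)."""
    new = {}
    for i in graph:
        pyramid = [{i: states[i]}]
        for k in range(1, d + 1):
            view = dict(old[i][k - 1])
            for j in graph[i]:
                view.update(old[j][k - 1])
            pyramid.append(view)
        new[i] = pyramid
    return new


def detects_fault(i: int, graph: Graph, pyramids: Dict[int, Pyramid], d: int) -> bool:
    pyramid = pyramids[i]
    # Pi's state in Vi(l) must follow from one AL step on the states in Vi(l+1).
    for l in range(d):
        if pyramid[l][i] != al_step(i, pyramid[l + 1], graph):
            return True
    # Neighboring pyramids must agree on their shared portions, level by level.
    for j in graph[i]:
        for mine, theirs in zip(pyramid, pyramids[j]):
            if any(mine[k] != theirs[k] for k in mine.keys() & theirs.keys()):
                return True
    return False


def run(graph: Graph, states: View, d: int, rounds: int,
        fault: Optional[Tuple[int, int]] = None) -> List[Tuple[int, List[int]]]:
    """fault=(round, pid) corrupts pid's state in that round; returns (round, alarmed pids)."""
    pyramids = {i: [{i: states[i]}] + [{} for _ in range(d)] for i in graph}
    alarms = []
    for t in range(1, rounds + 1):
        states = {i: al_step(i, states, graph) for i in graph}   # one synchronous AL step
        if fault is not None and fault[0] == t:
            states[fault[1]] = (states[fault[1]] + 1) % 7        # transient fault
        pyramids = update_pyramids(graph, states, pyramids, d)
        if t >= d:   # from round d on, every level of every pyramid is filled
            alarmed = [i for i in graph if detects_fault(i, graph, pyramids, d)]
            if alarmed:
                alarms.append((t, alarmed))
    return alarms


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
start = {0: 1, 1: 2, 2: 3, 3: 4}
print(run(ring, start, d=2, rounds=8))                 # []
print(run(ring, start, d=2, rounds=8, fault=(4, 2)))   # [(4, [2]), (5, [2])]
```

In this run the corrupted processor raises the alarm while the bad value sits at levels 0 through d - 1 of its pyramid; once the value reaches level d, no level-to-level test covers it any more, which is one reason the slides add the separate check that Vi(d) is reachable from an initial state.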

Fault Detector for Non-silent Algorithms (cont.)
The algorithm: neighboring processors exchange their pyramids and check whether they agree on their shared portions. In addition, every processor checks whether Vi(d) is a consistent configuration of the input algorithm (here, a configuration reachable from an initial state).

Consistency of the Configuration
Every processor Pi checks that its state in the view Vi(l), 0 ≤ l ≤ d − 1, is obtained by executing AL from the state of Pi and the states of Pi's neighbors in Vi(l+1). This test ensures that the configuration is consistent.
Updating the pyramids: Pi receives the pyramid Δj of every neighbor Pj. Pi uses the received values of Vj(d−1) to construct Vi(d). Analogously, Pi uses the received values of Vj(k−1) together with Vi(k−1) to compute the new value of Vi(k).

Self-Stabilizing Reset
A common technique for converting a distributed algorithm AL into a self-stabilizing algorithm DRA for a given abstract task Τ is to compose a failure detector D, a self-stabilizing reset R, and the algorithm AL:
D ◦ R ◦ AL ⇒ DRA(Τ)
Each processor executes steps of each of the composed algorithms infinitely often.
The task of the self-stabilizing reset is to initialize the system, upon request, to a predefined safe configuration. Every processor may invoke a reset in any of its execution steps.

Self-Stabilizing Reset Example
This reset algorithm is implemented by a fair composition of a self-stabilizing leader-election and a spanning-tree-construction algorithm with a version of the β-synchronizer.
Assumption: a rooted spanning tree already exists, and the reset algorithm is designed for that system.
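The composition D ◦ R ◦ AL can be sketched for the coloring task of Example 2. This is a minimal sketch under stated assumptions: D is the radius-one coloring detector, AL is silent and therefore changes nothing once the colors are assigned, and R is an idealized, centralized stand-in that overwrites every color with a predefined safe coloring (in the slides, the distributed reset algorithm that follows plays this role). Processors take turns in a fixed round-robin order, so in every round each processor executes one step of each component. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]
Coloring = Dict[int, int]


def fault_detect(i: int, colors: Coloring, graph: Graph) -> bool:
    """D: checks Pi's radius-one view (its own color and its neighbors' colors)."""
    return any(colors[i] == colors[j] for j in graph[i])


def reset(colors: Coloring, safe: Coloring) -> None:
    """R (idealized, centralized): put every processor into the predefined safe configuration."""
    colors.update(safe)


def al_step(i: int, colors: Coloring, graph: Graph) -> None:
    """AL for a silent task: once the output is fixed, a step changes nothing."""


def run_dra(graph: Graph, colors: Coloring, safe: Coloring, rounds: int) -> Tuple[Coloring, int]:
    colors = dict(colors)   # do not mutate the caller's configuration
    resets = 0
    for _ in range(rounds):
        for i in graph:                          # fair round-robin schedule
            if fault_detect(i, colors, graph):   # D: Pi invokes a reset
                reset(colors, safe)              # R
                resets += 1
            al_step(i, colors, graph)            # AL
    return colors, resets


path = {0: [1], 1: [0, 2], 2: [1]}
safe = {0: 0, 1: 1, 2: 0}
print(run_dra(path, {0: 1, 1: 1, 2: 0}, safe, rounds=3))   # ({0: 0, 1: 1, 2: 0}, 1)
print(run_dra(path, safe, safe, rounds=3))                 # ({0: 0, 1: 1, 2: 0}, 0)
```

The second call illustrates the first requirement of a transient fault detector: started in a safe configuration, the composed system never invokes a reset.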

Self-Stabilizing Reset Example (Root)
do forever
    forall Pj ∈ N(i) do lrji := read(rji)
    if (∀ Pj ∈ children(i): lrji.color = colori) then
        colori := (colori + 1) mod (5n − 3)
        if (∀ Pj ∈ children(i): lrji.ResetRequest = false) and invokei = false then
            reseti := false
        else
            reseti := true
            invokei := false
            InitializeState(DA)
    forall Pj ∈ children(i) do write rij.(color, reset) := (colori, reseti)
od

Self-Stabilizing Reset Example (Other Processors)
do forever
    forall Pj ∈ N(i) do lrji := read(rji)
    if (colori ≠ lrparent,i.color) then
        colori := lrparent,i.color
        if lrparent,i.reset = true then
            reseti := true
            invokei := false
            InitializeState(DA)
        else
            reseti := false
    else if (∀ Pj ∈ children(i): lrji.color = colori) then
        if reseti = false then invokei := FaultDetect()
        if (∀ Pj ∈ children(i): lrji.ResetRequest = false) and invokei = false then
            ResetRequesti := false
        else
            ResetRequesti := true
        write ri,parent.(color, ResetRequest) := (colori, ResetRequesti)
    forall Pj ∈ children(i) do write rij.(color, reset) := (colori, reseti)
od

Little Spanning Tree Example
[Figure: spanning tree with nodes Root, P4, P2, P3]
1. P3 detects a fault and sends a reset request to its parent.
2. P4 forwards the request to the root.
3. The root initializes its state and sends a reset to the whole tree.
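The root and non-root procedures can be exercised in a small synchronous simulation of the spanning tree example. This is a minimal sketch under assumptions that are not in the slides: the tree is Root → P4 → {P2, P3} (P2's position in the figure is a guess), each tree edge carries one register per direction, in every round all processors read the registers before any processor writes, the write to the parent register happens only in the branch where every child has echoed the current color, and DA is a hypothetical algorithm whose local state is an integer. Here FaultDetect() reports a fault when that integer is nonzero, and InitializeState(DA) sets it to zero. The root's own invoke flag is never raised.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    parent: Optional[str]
    children: List[str]
    color: int = 0
    reset: bool = False
    invoke: bool = False
    reset_request: bool = False
    da_state: int = 0   # hypothetical local state of DA; 0 is its initial state
    inits: int = 0      # how many times InitializeState(DA) ran here


def simulate(parent: Dict[str, Optional[str]], corrupted: Dict[str, int], rounds: int) -> Dict[str, Node]:
    names = list(parent)
    nodes = {p: Node(parent[p], [c for c in names if parent[c] == p]) for p in names}
    for p, value in corrupted.items():
        nodes[p].da_state = value            # transient fault in DA
    root = next(p for p in names if parent[p] is None)
    m = 5 * len(names) - 3                   # number of colors, 5n - 3
    down = {c: (0, False) for c in names if parent[c] is not None}  # r(parent, c): (color, reset)
    up = {c: (0, False) for c in names if parent[c] is not None}    # r(c, parent): (color, ResetRequest)

    def initialize_state(n: Node) -> None:   # reset := true; invoke := false; InitializeState(DA)
        n.reset, n.invoke = True, False
        n.da_state, n.inits = 0, n.inits + 1

    for _ in range(rounds):
        lr_down, lr_up = dict(down), dict(up)   # everyone reads before anyone writes
        for p in names:
            n = nodes[p]
            echoed = all(lr_up[c][0] == n.color for c in n.children)
            no_requests = all(not lr_up[c][1] for c in n.children)
            if p == root:
                if echoed:
                    n.color = (n.color + 1) % m
                    if no_requests and not n.invoke:
                        n.reset = False
                    else:
                        initialize_state(n)
            else:
                parent_color, parent_reset = lr_down[p]
                if n.color != parent_color:
                    n.color = parent_color
                    if parent_reset:
                        initialize_state(n)
                    else:
                        n.reset = False
                elif echoed:
                    if not n.reset:
                        n.invoke = n.da_state != 0   # FaultDetect()
                    n.reset_request = not (no_requests and not n.invoke)
                    up[p] = (n.color, n.reset_request)
            for c in n.children:
                down[c] = (n.color, n.reset)
    return nodes


tree = {"Root": None, "P4": "Root", "P2": "P4", "P3": "P4"}
after = simulate(tree, corrupted={"P3": 99}, rounds=30)
print({p: (n.inits, n.da_state) for p, n in after.items()})
# {'Root': (1, 0), 'P4': (1, 0), 'P2': (1, 0), 'P3': (1, 0)}
```

In this run P3's request climbs to the root through P4, the root starts a single reset wave, every processor reinitializes DA exactly once, and no further reset follows because FaultDetect() no longer fires.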