Theoretical Foundations of Distributed Operating Systems Neeraj Mittal CS/SE 6378 (Advanced Operating Systems) The University of Texas at Dallas June 9, 2020 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 1 / 62 Outline 1 Overview Distributed Systems 2 Ordering Events 3 Abstract Clocks 4 Ordering of Messages 5 State of a Distributed System Freezing based Approach Flushing based Approach 6 Monitoring a Distributed System Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 2 / 62 Overview Advanced Operating Systems • Operating systems for • Distributed systems • Multicore systems • Database systems • Real-time systems Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 4 / 62 Overview Distributed Systems Why Distributed Systems? • Economies of scale • create a system with desired computational power using a large number of cheap machines • Modular growth • can add more machines to the system as and when needed • Fault tolerance • system can continue to operate albeit with reduced performance Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 6 / 62 Overview Distributed Systems Two Important Characteristics • Absence of Global Clock • there is no common notion of time • Absence of Shared Memory • no process has up-to-date knowledge about the system Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 7 / 62 Overview Distributed Systems Absence of Global Clock • Different processes may have different notions of time I invented AIDS cure! I invented AIDS cure! Who did it first? Patent Officer Mars Earth Problem: How do we order events on different processes? Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 8 / 62 Overview Distributed Systems Absence of Shared Memory • A process does not know current state of other processes What is Harry doing at present? Harry Bob Problem: How do we obtain a coherent view of the system? Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 9 / 62 Overview Distributed Systems When is it possible to order two events? • Three cases: 1 Events executed on the same process: • if e and f are events on the same process and e occurred before f , then e happened-before f 2 Communication events of the same message: • if e is the send event of a message and f is the receive event of the same message, then e happened-before f 3 Events related by transitivity: • if event e happened-before event g and event g happened-before event f , then e happened-before f Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 10 / 62 Ordering Events Outline 1 Overview Distributed Systems 2 Ordering Events 3 Abstract Clocks 4 Ordering of Messages 5 State of a Distributed System Freezing based Approach Flushing based Approach 6 Monitoring a Distributed System Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 11 / 62 Ordering Events Happened-Before Relation • Happened-before relation is denoted by → • Illustration: a b c P1 d P2 e f P3 g h i • Events on the same process: examples: a → b, a → c, d → f • Events of the same message: examples: b → e, f → i • Transitivity: examples: a → e, a → i, e → i Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 12 / 62 Ordering Events Concurrent Events • Events not related by happened-before relation • Concurrency relation is denoted by k • Illustration: a b c P1 d P2 e f P3 g h i • Examples: a k d, d k h, c k e • Concurrency relation is not transitive: example: a k d and d k c but a ∦ c Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 13 / 62 Abstract Clocks Outline 1 Overview Distributed Systems 2 Ordering Events 3 Abstract Clocks 4 Ordering of Messages 5 State of a Distributed System Freezing based Approach Flushing based Approach 6 Monitoring a Distributed System Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 14 / 62 Abstract Clocks Different Kinds of Clocks • Logical Clocks • used to totally order all events • Vector Clocks • used to track happened-before relation • Matrix Clocks • used to track what other processes know about other processes • Direct Dependency Clocks • used to track direct causal dependencies Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 15 / 62 Abstract Clocks Logical Clock • Implements the notion of virtual time • Can be used to totally order all events • Assigns timestamp to each event in a way that is consistent with the happened-before relation: e → f =⇒ C (e) < C (f ) C (e): timestamp for event e C (f ): timestamp for event f Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 16 / 62 Abstract Clocks Implementing Logical Clock • Each process has a local scalar clock, initialized to zero • Ci denotes the local clock of process Pi • Action depends on the type of the event • Protocol for process Pi : • On executing an interval event: Ci := Ci + 1 • On sending a message m: Ci := Ci + 1 piggyback Ci on m • On receiving a message m: let tm be the timestamp piggybacked on m Ci := max{Ci , tm } + 1 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 17 / 62 Abstract Clocks Implementing Logical Clock: An Illustration a P1 1 P2 P3 1 a: internal event Neeraj Mittal (UT Dallas) Theoretical Foundations C (a) = 1 June 9, 2020 18 / 62 Abstract Clocks Implementing Logical Clock: An Illustration a P1 1 b P2 1 c P3 1 b and c: internal events Neeraj Mittal (UT Dallas) C (b) = 1 and C (c) = 1 Theoretical Foundations June 9, 2020 18 / 62 Abstract Clocks Implementing Logical Clock: An Illustration P1 a d 1 2 2 b P2 1 c P3 1 d: send event Neeraj Mittal (UT Dallas) Theoretical Foundations C (d) = 2 June 9, 2020 18 / 62 Abstract Clocks Implementing Logical Clock: An Illustration P1 a d 1 2 2 b e 1 3 P2 c P3 1 e: receive event Neeraj Mittal (UT Dallas) Theoretical Foundations C (e) = 3 June 9, 2020 18 / 62 Abstract Clocks Implementing Logical Clock: An Illustration P1 a d 1 2 3 2 b e 1 3 4 P2 4 4 c P3 1 Neeraj Mittal (UT Dallas) 2 Theoretical Foundations 5 June 9, 2020 18 / 62 Abstract Clocks Limitation of Logical Clock • Logical clock cannot be used to determine whether two events are concurrent e k f does not imply C (e) = C (f ) Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 19 / 62 Abstract Clocks Vector Clock • Captures the happened-before relation • Assigns timestamp to each event such that: e → f ⇐⇒ C (e) < C (f ) C (e): timestamp for event e C (f ): timestamp for event f Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 20 / 62 Abstract Clocks Comparing Two Vectors • Vectors are compared component-wise: • Equality: V = V′ h∀i : V [i] = V ′ [i]i iff • Less Than: V < V′ iff h∀i : V [i] ≤ V ′ [i]i ∧ h∃i : V [i] < V ′ [i]i and 2 • Example: 1 2 0 Neeraj Mittal (UT Dallas) < 2 3 1 1 1 < 2 Theoretical Foundations 3 4 but 1 2 1 6< 2 1 3 June 9, 2020 21 / 62 Abstract Clocks Implementing Vector Clock • Each process has a local vector clock • Ci denotes the local clock of process Pi • Action depends on the type of the event • Protocol for process Pi : • On executing an interval event: Ci [i] := Ci [i] + 1 • On sending a message m: Ci [i] := Ci [i] + 1 piggyback Ci on m • On receiving a message m: let tm be the timestamp piggybacked on m ∀k Ci [k] := max{Ci [k], tm [k]} Ci [i] := Ci [i] + 1 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 22 / 62 Abstract Clocks Implementing Vector Clock: An Illustration a P1 h1i 0 0 P2 P3 h0i 0 1 a: internal event Neeraj Mittal (UT Dallas) C (a) = Theoretical Foundations 1 0 0 June 9, 2020 23 / 62 Abstract Clocks Implementing Vector Clock: An Illustration a P1 h1i 0 0 b P2 h0i 1 0 c P3 h0i 0 1 b and c: internal events Neeraj Mittal (UT Dallas) C (b) = Theoretical Foundations 1 0 0 and C (c) = 0 0 1 June 9, 2020 23 / 62 Abstract Clocks Implementing Vector Clock: An Illustration a d h1i h2i P1 0 0 0 0 b h2i 0 0 P2 h0i 1 0 c P3 h0i 0 1 d: send event Neeraj Mittal (UT Dallas) C (d) = Theoretical Foundations 2 0 0 June 9, 2020 23 / 62 Abstract Clocks Implementing Vector Clock: An Illustration a d h1i h2i P1 0 0 0 0 b h2i 0 0 e P2 h2i h0i 2 0 1 0 c P3 h0i 0 1 e: receive event Neeraj Mittal (UT Dallas) C (e) = Theoretical Foundations 2 2 0 June 9, 2020 23 / 62 Abstract Clocks Implementing Vector Clock: An Illustration a d h1i h2i P1 0 0 0 0 b h2i h3i h4i 0 0 0 0 e 0 0 P2 h2i h0i 2 0 1 0 c h2i 3 0 h2i 3 0 P3 h0i 0 1 Neeraj Mittal (UT Dallas) h0i 0 2 Theoretical Foundations h2i 3 3 June 9, 2020 23 / 62 Abstract Clocks Properties of Vector Clock • How many comparisons are needed to determine whether an event e happened-before another event f ? • As many as N integers may need to be compared in the worst case, where N is the number of processes • Can we do better? • Suppose we know the processes on which events e and f occurred (say, Pi and Pj respectively) e→f if and only if W (i = j) ∧ (C (e)[i] < C (f )[i]) (i 6= j) ∧ (C (e)[i] ≤ C (f )[i]) Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 24 / 62 Ordering of Messages Outline 1 Overview Distributed Systems 2 Ordering Events 3 Abstract Clocks 4 Ordering of Messages 5 State of a Distributed System Freezing based Approach Flushing based Approach 6 Monitoring a Distributed System Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 25 / 62 Ordering of Messages Ordering of Messages • For many applications, messages should be delivered in certain order to be interpreted meaningfully • Example: m1: Have you seen the movie “Shrek”? Bob m2: Yes I have and I liked it Alice Tom • m2 cannot be interpreted until m1 has been received • Tom receives m2 before m1 : an undesriable behavior Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 26 / 62 Ordering of Messages Useful Notations • For a message m: • • • • src(m): source process of m dst(m): destination process of m snd(m): send event of m rcv (m): receive event of m snd (m) src (m) e Pi m Pj f dst (m) Neeraj Mittal (UT Dallas) rcv (m) Theoretical Foundations June 9, 2020 27 / 62 Ordering of Messages Causal Delivery of Messages • A message w causally precedes a message m if snd(w ) → snd(m) • An execution of a distributed system is said to be causally ordered if the following holds for every message m: every message that causally precedes m and is destined for the same process as m is delivered before m Mathematically, for every message w : (snd(w ) → snd(m)) ∧ (dst(w ) = dst(m)) =⇒ rcv (w ) → rcv (m) Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 28 / 62 Ordering of Messages A Causally Ordered Delivery Protocol • Proposed by Birman, Schiper and Stephenson (BSS) • Assumption: • communication is broadcast based: a process sends a message to every other process • Each process maintains a vector with one entry for each process: • let Vi denote the vector for process Pi • the j th entry of Vi refers to the number of messages that have been broadcast by process Pj that Pi knows of Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 29 / 62 Ordering of Messages The BSS Protocol • Protocol for process Pi : • On broadcasting a message m: piggyback Vi on m Vi [i] := Vi [i] + 1 • On arrival of a message m from process Pj : let Vm be the vector piggybacked on m deliver m once Vi ≥ Vm • On delivery of a message m sent by process Pj : Vi [j] := Vi [j] + 1 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 30 / 62 Ordering of Messages The BSS Protocol: An Illustration e a P1 h0i 0 0 P2 h0i 0 0 P3 h0i 0 0 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 31 / 62 Ordering of Messages The BSS Protocol: An Illustration e a P1 h0i 0 0 P2 h1i 0 0 h0i h0i 0 0 0 0 b h0i 0 0 P3 h0i 0 0 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 31 / 62 Ordering of Messages The BSS Protocol: An Illustration e a P1 h0i 0 0 P2 h0i 0 0 h1i 0 0 h0i h0i 0 0 0 0 b h1i 0 0 P3 h0i 0 0 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 31 / 62 Ordering of Messages The BSS Protocol: An Illustration e a P1 h0i 0 0 P2 h0i 0 0 h1i 0 0 h0i h0i 0 0 0 0 b 0 0 c h1i h1i 0 0 h1i 1 0 h1i 0 0 d P3 h0i 0 0 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 31 / 62 Ordering of Messages The BSS Protocol: An Illustration e a P1 h0i 0 0 P2 h0i 0 0 h1i 0 0 h0i h0i 0 0 0 0 b 0 0 c h1i h1i 0 0 h1i 1 0 h1i h1i 1 0 0 0 d P3 h0i 0 0 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 31 / 62 Ordering of Messages The BSS Protocol: An Illustration e a P1 h0i 0 0 P2 h0i 0 0 h1i 0 0 h0i h0i 0 0 0 0 b 0 0 c h1i h1i 0 0 h1i 1 0 h1i h1i 1 0 0 0 d f P3 h0i 0 0 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 31 / 62 Ordering of Messages The BSS Protocol: An Illustration e a P1 h0i 0 0 P2 h0i 0 0 h1i 0 0 h0i h0i 0 0 0 0 b 0 0 c h1i h1i 0 0 h1i 1 0 h1i h1i 1 0 0 0 d f P3 h0i h1i 0 0 Neeraj Mittal (UT Dallas) 0 0 Theoretical Foundations June 9, 2020 31 / 62 Ordering of Messages The BSS Protocol: An Illustration e a P1 h0i 0 0 P2 h0i 0 0 h1i 0 0 h0i h0i 0 0 0 0 b c h1i 0 0 h1i 1 0 h1i h1i h1i 0 0 0 0 1 0 f g P3 h0i h1i h1i 0 0 Neeraj Mittal (UT Dallas) 0 0 Theoretical Foundations 1 0 June 9, 2020 31 / 62 Ordering of Messages Another Causally Ordered Delivery Protocol • Proposed by Raynal, Schiper and Toueg (RST) • Assumption: • communication is unicast based: a process sends a message to only one other process • Each process Pi maintains a matrix Mi of size N × N • N is the number of processes in the system • The entry (j, k) in the matrix Mi captures the following knowledge: • Suppose Mi [j, k] = x • Interpretation: As far as process Pi knows, process Pj has sent x messages to process Pk • Initially, all entries in all matrices are set to 0 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 32 / 62 Ordering of Messages The RST Protocol • Protocol for process Pi : • On sending message m to process Pj : piggyback matrix Mi on message m increment Mi [i, j] by 1 • On receiving message m from process Pj carrying matrix Mm : buffer the message until Mi [∗, i] ≥ Mm [∗, i] • On delivery of message m sent by process Pj carrying matrix Mm : merge Mi and Mm ∀x, y : Mi [x, y ] := max(Mi [x, y ], Mm [x, y ]) increment Mi [j, i] by 1 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 33 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 i h0 i h0 i P1 0 0 0 0 0 0 0 0 P2 0 0 0 0 0 0 0 0 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i m1 h0 0 0 0 0 0 0 0 0 i h0 i P2 0 0 0 0 0 0 0 0 P3 P1 sends m1 to P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 i m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 i h0 i P2 0 0 0 0 0 0 0 0 P3 i P1 sends m1 to P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 i h0 i P2 0 0 0 0 0 0 0 0 P3 i P1 sends m2 to P2 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 i h0 i P2 0 0 0 0 0 0 0 0 P3 i h0 i 0 1 0 0 0 0 0 0 i P1 sends m2 to P2 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 i h0 i P2 0 0 0 0 0 0 0 0 P3 i h0 i 0 1 0 0 0 0 0 0 i m2 arrives at P2 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 i h0 i P2 0 0 0 0 0 0 0 0 P3 i h0 i 0 1 0 0 0 0 0 0 i h0 1 1 0 0 0 0 0 0 i P2 delivers m2 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 P2 i i h0 i 0 1 0 0 0 0 0 0 i h0 1 1 0 0 0 0 0 0 i m3 h0 0 0 0 0 0 0 0 0 P3 i P2 sends m3 to P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 P2 i i h0 i 0 1 0 0 0 0 0 0 i h0 1 1 0 0 0 0 0 0 h0 1 1 0 0 1 0 0 0 i m3 h0 0 0 0 0 0 0 0 0 P3 h0 i 1 1 0 0 0 0 0 0 i i P2 sends m3 to P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 P2 i i h0 i 0 1 0 0 0 0 0 0 i h0 1 1 0 0 0 0 0 0 h0 1 1 0 0 1 0 0 0 i m3 h0 0 0 0 0 0 0 0 0 P3 h0 i 1 1 0 0 0 0 0 0 i i m3 arrives at P3 and is buffered Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 P2 i i h0 i 0 1 0 0 0 0 0 0 i h0 1 1 0 0 0 0 0 0 h0 1 1 0 0 1 0 0 0 i m3 h0 0 0 0 0 0 0 0 0 P3 h0 i 1 1 0 0 0 0 0 0 i i m1 arrives at P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 P2 i i h0 i 0 1 0 0 0 0 0 0 i h0 1 1 0 0 0 0 0 0 h0 1 1 0 0 1 0 0 0 i m3 h0 0 0 0 0 0 0 0 0 P3 h0 i 1 1 0 0 0 0 0 0 i h0 0 1 0 0 0 0 0 0 i i P3 delivers m1 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 P2 i i h0 i 0 1 0 0 0 0 0 0 i h0 1 1 0 0 0 0 0 0 h0 1 1 0 0 1 0 0 0 i m3 h0 0 0 0 0 0 0 0 0 P3 h0 i 1 1 0 0 0 0 0 0 i h0 0 1 0 0 0 0 0 0 i i h0 1 1 0 0 1 0 0 0 i P3 delivers m2 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 34 / 62 Ordering of Messages The RST Protocol: An Illustration h0 0 0 0 0 0 0 0 0 P1 i h0 0 1 0 0 0 0 0 0 h0 1 1 0 0 0 0 0 0 i m2 m1 h0 0 0 0 0 0 0 0 0 h0 0 0 0 0 0 0 0 0 P2 i i h0 i 0 1 0 0 0 0 0 0 i h0 1 1 0 0 0 0 0 0 h0 1 1 0 0 1 0 0 0 i m3 h0 0 0 0 0 0 0 0 0 P3 h0 1 1 0 0 0 0 0 0 h0 0 1 0 0 0 0 0 0 i Neeraj Mittal (UT Dallas) i Theoretical Foundations i i h0 1 1 0 0 1 0 0 0 June 9, 2020 i 34 / 62 State of a Distributed System Outline 1 Overview Distributed Systems 2 Ordering Events 3 Abstract Clocks 4 Ordering of Messages 5 State of a Distributed System Freezing based Approach Flushing based Approach 6 Monitoring a Distributed System Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 35 / 62 State of a Distributed System State of a Distributed System • State of a distributed system is a collection of states of all its processes and channels: • a process state is given by the values of all variables on the process • a channel state is given by the set of messages in transit in the channel • can be determined by examining states of the two processes it connects • State of a process is called local state or local snapshot • State of the system is called global state or global snapshot Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 36 / 62 State of a Distributed System Events and Local States • A process changes its state by executing an event • Example: x =0 P1 x =2 x =3 a y =1 P2 b y =2 c y =3 d • Process P1 changes its state from x = 0 to x = 2 on executing event a • Process P2 changes its state from y = 2 to y = 3 on executing event d Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 37 / 62 State of a Distributed System When is a Global State Meaningful? • Example: x =0 P1 p y =1 P2 s x =2 a x =3 q b y =2 c t r y =3 d u • Is it possible for x to be 0 and y to be 3 at the same time? • Does {p, u} form a meaningful global state? Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 38 / 62 State of a Distributed System Revisiting Happened-Before Relation • Earlier, happened-before relation was defined on events • The relation can be extended to local states as follows: • let s be a local state of process Pi • let t be a local state of process Pj • s → t if there exist events e and f such that: 1 Pi executed e after s, 2 Pj executed f before t, and 3 e→f • If s → t, then s and t cannot co-exist at the same time Also, s must occur before t • If s k t, then s and t can co-exist simultaneously Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 39 / 62 State of a Distributed System A Consistent Global State • For a global state G , let G [i] refer to the local state of process Pi in G • A global state G is meaningful or consistent if ∀i, j : i 6= j : (G [i] 9 G [j]) ∧ (G [j] 9 G [i]) ≡ ∀i, j : i 6= j : G [i] k G [j] Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 40 / 62 State of a Distributed System Recording a Consistent Global Snapshot • Many algorithms have been proposed • Focus on two approaches • Freezing based approach • Flushing based approach Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 41 / 62 State of a Distributed System Freezing based Approach Outline 1 Overview Distributed Systems 2 Ordering Events 3 Abstract Clocks 4 Ordering of Messages 5 State of a Distributed System Freezing based Approach Flushing based Approach 6 Monitoring a Distributed System Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 42 / 62 State of a Distributed System Freezing based Approach Freezing based Approach • A distinguished process is responsible for collecting the system snapshot • Has two non-overlapping phases • Each phase involves the following • The snapshot process sends a request message to all processes and waits to receive a reply from each one of them • A process on receiving a message from the snapshot process sends a reply message back possibly containing its local state Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 43 / 62 State of a Distributed System Freezing based Approach Phase I: Freeze Phase • Process is frozen and cannot execute certain events • Request message is referred to as Freeze message • Reply message is referred to as Frozen message Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 44 / 62 State of a Distributed System Freezing based Approach Phase II: Unfreeze Phase • Process is unfrozen and resumes normal execution • Request message is referred to as Unfreeze message • Reply message is referred to as Unfrozen message Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 45 / 62 State of a Distributed System Freezing based Approach Two Phase Snapshot Algorithms • Multiple options possible 1 Which event to freeze? (send event or receive event) 2 When to collect process states? (freeze phase or unfreeze phase) Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 46 / 62 State of a Distributed System Freezing based Approach Two Phase Snapshot Algorithms • Four possible options • Two yield correct algorithms (why?) 1 Freeze send events and collect process states in phase I 2 Freeze receive events and collect process states in phase II • Two yield incorrect algorithms (why?) 3 Freeze send events and collect process states in phase II 4 Freeze receive events and collect process states in phase I Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 47 / 62 State of a Distributed System Freezing based Approach Algorithm 1: An Illustration Snapshot Process P1 P2 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 48 / 62 State of a Distributed System Freezing based Approach Algorithm 1: An Illustration Snapshot Process P1 P2 P3 The snapshot process sends Freeze message to all processes. Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 48 / 62 State of a Distributed System Freezing based Approach Algorithm 1: An Illustration Snapshot Process Phase I P1 P2 P3 A process, on receiving the Freeze message, freezes its send events, records its local state and sends Frozen message back to the snapshot process (along with its state). Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 48 / 62 State of a Distributed System Freezing based Approach Algorithm 1: An Illustration Snapshot Process Phase I P1 P2 P3 After receiving Frozen message from all processes, the snapshot process sends Unfreeze message to all processes. Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 48 / 62 State of a Distributed System Freezing based Approach Algorithm 1: An Illustration Snapshot Process Phase I Phase II P1 P2 P3 A process, on receiving the Unfreeze message, unfreezes its send events and sends Unfrozen message back to the snapshot process. Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 48 / 62 State of a Distributed System Freezing based Approach Algorithm 2: An Illustration Snapshot Process P1 P2 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 49 / 62 State of a Distributed System Freezing based Approach Algorithm 2: An Illustration Snapshot Process P1 P2 P3 The snapshot process sends Freeze message to all processes. Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 49 / 62 State of a Distributed System Freezing based Approach Algorithm 2: An Illustration Snapshot Process Phase I P1 P2 P3 A process, on receiving the Freeze message, freezes its receive events and sends Frozen message back to the snapshot process. Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 49 / 62 State of a Distributed System Freezing based Approach Algorithm 2: An Illustration Snapshot Process Phase I P1 P2 P3 After receiving Frozen message from all processes, the snapshot process sends Unfreeze message to all processes. Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 49 / 62 State of a Distributed System Freezing based Approach Algorithm 2: An Illustration Snapshot Process Phase I Phase II P1 P2 P3 A process, on receiving the Unfreeze message, records it local state, unfreezes its receive events, and sends Unfrozen message back to the snapshot process (along with its local state). Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 49 / 62 State of a Distributed System Flushing based Approach Outline 1 Overview Distributed Systems 2 Ordering Events 3 Abstract Clocks 4 Ordering of Messages 5 State of a Distributed System Freezing based Approach Flushing based Approach 6 Monitoring a Distributed System Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 50 / 62 State of a Distributed System Flushing based Approach Flushing based Approach • Proposed by Chandy and Lamport (CL) • Assumptions and Features: • channels satisfy first-in-first-out (FIFO) property • channels are not required to be bidirectional • communication topology may not be fully connected • Messages exchanged by the underlying computation (whose snapshot is being recorded) are called application messages • Messages exchanged by the snapshot algorithm are called control messages Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 51 / 62 State of a Distributed System Flushing based Approach The CL Protocol • Initially: all processes are colored blue • Eventually: all processes become red • On changing color from blue to red: record local snapshot send a marker message along all outgoing channels • On receiving marker message along incoming channel C : if color is blue then change color from blue to red endif record state of channel C as application messages received along C since turning red Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 52 / 62 State of a Distributed System Flushing based Approach The CL Protocol (Continued) • Any process can initiate the snapshot protocol by spontaneously changing its color from blue to red • there can be multiple initiators of the snapshot protocol • Global Snapshot: local snapshots of processes just after they turn red • In-Transit Messages: blue application messages received by processes after they have turned red • Why is the global snapshot consistent? • Assume an application message has the same color as its sender • Can a blue process receive a red application message? Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 53 / 62 State of a Distributed System Flushing based Approach The CL Protocol: An Illustration • Three processes: P1 , P2 and P3 • Five channels: C1,2 , C1,3 , C2,1 , C2,3 and C3,1 P1 C1,3 C1,2 C3,1 C2,1 P3 P2 C2,3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 54 / 62 State of a Distributed System Flushing based Approach The CL Protocol: An Illustration (Continued) P1 m5 m2 m4 m3 P2 m1 P3 1 All processes are blue Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 55 / 62 State of a Distributed System Flushing based Approach The CL Protocol: An Illustration (Continued) P1 m5 m2 m4 m3 P2 m1 P3 1 P1 turns red 2 P1 sends a marker message along C1,2 and C1,3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 55 / 62 State of a Distributed System Flushing based Approach The CL Protocol: An Illustration (Continued) P1 m5 m2 m4 m3 P2 m1 P3 1 P2 receives the marker message along C1,2 and turns red 2 P2 sends a marker message along C2,1 and C2,3 3 P2 records the state of C1,2 as ∅ Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 55 / 62 State of a Distributed System Flushing based Approach The CL Protocol: An Illustration (Continued) P1 m5 m2 m4 m3 P2 m1 P3 1 P3 receives the marker message along C1,3 and turns red 2 P3 sends a marker message along C3,1 3 P3 records the state of C1,3 as ∅ Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 55 / 62 State of a Distributed System Flushing based Approach The CL Protocol: An Illustration (Continued) P1 m5 m2 m4 m3 P2 m1 P3 1 P1 receives the marker message along C2,1 2 P1 records the state of C2,1 as {m4 , m5 } Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 55 / 62 State of a Distributed System Flushing based Approach The CL Protocol: An Illustration (Continued) P1 m5 m2 m4 m3 P2 m1 P3 1 P3 receives the marker message along C2,3 2 P3 records the state of C2,3 as {m1 } Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 55 / 62 State of a Distributed System Flushing based Approach The CL Protocol: An Illustration (Continued) P1 m5 m2 m4 m3 P2 m1 P3 1 P1 receives the marker message along C3,1 2 P1 records the state of C3,1 as {m2 , m3 } Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 55 / 62 Monitoring a Distributed System Outline 1 Overview Distributed Systems 2 Ordering Events 3 Abstract Clocks 4 Ordering of Messages 5 State of a Distributed System Freezing based Approach Flushing based Approach 6 Monitoring a Distributed System Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 56 / 62 Monitoring a Distributed System A Subclass of Distributed Computation • Many distributed computations obey the following paradigm: • A process is either in active state or passive state • A process can send an application message only when it is active • An active process can become passive at any time • A passive process, on receiving an application message, becomes active • Intuitively: • If a process is active, then it is doing some work • If process is passive, then it is idle • An active process uses an application message to send a part of its work to another process Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 57 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 P2 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 m1 P2 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 m1 P2 1 0 0 1 0 1 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 m1 P2 1 0 0 1 0 1 m2 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 1 0 0 1 m1 P2 1 0 0 1 0 1 m2 P3 Neeraj Mittal (UT Dallas) 11 00 00 11 Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 1 0 0 1 m1 P2 1 0 0 1 0 1 m2 m3 P3 Neeraj Mittal (UT Dallas) 11 00 00 11 Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 1 0 0 1 m1 m4 P2 1 0 0 1 0 1 m2 m3 P3 Neeraj Mittal (UT Dallas) 11 00 00 11 Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 1 0 0 1 11 00 00 11 m1 m4 P2 1 0 0 1 0 1 m2 m3 P3 Neeraj Mittal (UT Dallas) 11 00 00 11 Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 1 0 0 1 11 00 00 11 m1 m4 P2 1 0 0 1 0 1 1 0 0 1 m2 m3 P3 Neeraj Mittal (UT Dallas) 11 00 00 11 1 0 0 1 Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Distributed Computation: An Illustration P1 1 0 0 1 1 0 0 1 11 00 00 11 1 0 0 1 m1 m4 P2 1 0 0 1 0 1 1 0 0 1 m2 m3 P3 Neeraj Mittal (UT Dallas) 11 00 00 11 1 0 0 1 Theoretical Foundations June 9, 2020 58 / 62 Monitoring a Distributed System Termination Detection • To detect if the computation has finished doing all the work • all processes have become passive, and • all channels have become empty • Different types of computations: • diffusing: only one process is active in the beginning • non-diffusing: any subset of processes can be active in the beginning • no process knows which processes are active and which processes are passive Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 59 / 62 Monitoring a Distributed System Huang’s Protocol • Assumption: computation is diffusing • the initially active process is called the coordinator • coordinator is responsible for detecting termination • Initially: • coordinator has a weight of 1 • all other processes have a weight of 0 • Invariants: • total amount of weight in the system is 1 • a non-coordinator process has a non-zero weight if and only if it is active • a channel has a non-zero weight if and only if it is non-empty • weight of a channel is the sum of the weight of all messages in it Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 60 / 62 Monitoring a Distributed System Huang’s Protocol (Continued) • Actions: • On sending an application message: • send half of the weight along with the message • On receiving an application message: • add the weight of the message to the current weight • On becoming passive: • send the current weight to the coordinator • Coordinator announces termination once: • it has become passive and • it has collected all the weight Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 61 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 0 P2 0 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 m1( 21 ) 0 P2 0 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 m1( 21 ) 0 P2 1 002 11 11 00 00 11 0 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 m1( 21 ) 0 P2 1 002 11 11 00 00 11 1 4 m2( 14 ) 0 P3 Neeraj Mittal (UT Dallas) Theoretical Foundations June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 00 11 11 00 00 11 m1( 21 ) 0 P2 1 002 11 11 00 00 11 1 4 m2( 14 ) 0 P3 Neeraj Mittal (UT Dallas) 1 4 11 00 00 11 Theoretical Foundations June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 00 11 11 00 00 11 m1( 21 ) 0 P2 1 002 11 11 00 00 11 1 4 m2( 14 ) 0 P3 Neeraj Mittal (UT Dallas) 1 4 11 00 00 11 m3( 81 ) 1 8 Theoretical Foundations June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 00 11 11 00 00 11 m1( 21 ) 0 P2 1 002 11 11 00 00 11 1 m4( 16 ) 1 4 3 8 m2( 14 ) 0 P3 Neeraj Mittal (UT Dallas) 1 4 11 00 00 11 m3( 18 ) 1 8 1 16 Theoretical Foundations June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 9 0016 11 11 00 00 11 00 11 11 00 00 11 m1( 21 ) 0 P2 1 002 11 11 00 00 11 1 m4( 16 ) 1 4 3 8 m2( 14 ) 0 P3 Neeraj Mittal (UT Dallas) 1 4 11 00 00 11 m3( 18 ) 1 8 1 16 Theoretical Foundations June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 9 0016 11 11 00 00 11 00 11 11 00 00 11 m1( 21 ) 15 16 1 m4( 16 ) 3 8 0 P2 1 2 00 11 00 11 00 11 1 4 3 8 11 00 00 11 0 1 16 m2( 14 ) 0 P3 Neeraj Mittal (UT Dallas) 1 4 11 00 00 11 m3( 18 ) 1 8 1 16 00 11 00 11 Theoretical Foundations 0 June 9, 2020 62 / 62 Monitoring a Distributed System Huang’s Protocol: An Illustration 1 00 11 00 00 11 P11 1 1 2 9 0016 11 11 00 00 11 00 11 11 00 00 11 m1( 21 ) 15 16 0 1 0 1 0 1 1 1 m4( 16 ) 3 8 0 P2 1 2 00 11 00 11 00 11 1 4 3 8 11 00 00 11 0 1 16 m2( 14 ) 0 P3 Neeraj Mittal (UT Dallas) 1 4 11 00 00 11 m3( 18 ) 1 8 1 16 00 11 00 11 Theoretical Foundations 0 June 9, 2020 62 / 62