An example run of the Chandy-Lamport snapshot algorithm (https://decomposition.al/blog/2019/04/26/anexample-run-of-the-chandy-lamport-snapshotalgorithm/) April 26, 2019 In the undergrad distributed systems course (https://decomposition.al/CMPS128-2019-04/) I’m teaching this spring, I decided I wanted to discuss the Chandy-Lamport algorithm for snapshotting the global state of a distributed system (https://lamport.azurewebsites.net/pubs/chandy.pdf) in some detail. When I looked around to see how other people were teaching this algorithm, I kept noticing the exact same example being used in (https://courses.engr.illinois.edu/cs425/fa2017/L13.FA17.pdf) multiple (http://web.kaust.edu.sa/Faculty/MarcoCanini/classes/CS240/F17/slides/L6-global.pdf) people’s (http://www.cs.cornell.edu/courses/cs6410/2017fa/slides/14-lamport-clocks.pdf) lecture notes, for different courses taught at different institutions. (https://courses.engr.illinois.edu/cs425/fa2017/L13.FA17.pdf) Indranil Gupta's lecture notes for CS425 at Illinois. (http://web.kaust.edu.sa/Faculty/MarcoCanini/classes/CS240/F17/slides/L6-global.pdf) Marco Canini's lecture notes for CS240 at KAUST. (http://www.cs.cornell.edu/courses/cs6410/2017fa/slides/14-lamport-clocks.pdf) Hakim Weatherspoon's lecture notes for CS6410 at Cornell. From what I can tell, the original source of this example is Indranil Gupta’s lecture notes for his CS425 distributed systems course at Illinois (https://courses.engr.illinois.edu/cs425/fa2017/). (If I’m wrong about the source, someone please let me know.) Since then, it seems like other people have been borrowing it (with attribution), and for good reason – it’s a nice example! But I found it to be a bit hard to follow, even when I watched Gupta’s own video lecture (https://www.coursera.org/lecture/cloudcomputing/1-2-global-snapshot-algorithm-hndGi) from his Coursera “Cloud Computing” course that shows him walking through the example. Also, there’s a typo in the original example that seems to have propagated to all the other copies – they all have two events labeled “E”! So, this post is my own take on this ubiquitous example, cleaned up a bit and explained in detail. Credit goes to Indy Gupta for coming up with the example in the first place. Introduction A snapshot algorithm attempts to capture a coherent global state of a distributed system (for the purpose of debugging or checkpointing, for instance). This particular snapshot algorithm – the first one, as far as I know – was proposed by Mani Chandy and Leslie Lamport in their 1985 paper “Distributed Snapshots: Determining Global States of Distributed Systems” (https://lamport.azurewebsites.net/pubs/chandy.pdf). Lamport has a funny anecdote (http://lamport.azurewebsites.net/pubs/pubs.html#chandy) about the paper’s origin: The distributed snapshot algorithm described here came about when I visited Chandy, who was then at the University of Texas in Austin. He posed the problem to me over dinner, but we had both had too much wine to think about it right then. The next morning, in the shower, I came up with the solution. When I arrived at Chandy’s office, he was waiting for me with the same solution. Getting to relate this anecdote to my class was at least half my motivation for wanting to cover the Chandy-Lamport algorithm in the first place. Let’s jump in. We’re going to model the system we’re snapshotting as a collection of processes that can send and receive messages among themselves. Sending and receiving are events that take place on processes; in this example, there also happen to be internal events that are neither sends nor receives. Messages are sent and received along channels, which are FIFO queues going between each pair of processes. It turns out to be important that the channels have FIFO behavior for the ChandyLamport algorithm to work. We’ll say that processes in our system are named P , P , etc., and that the channel from process P to process P is named C_ij. For instance, the channel from P to P is C_12, while the channel 1 j 2 i 1 2 from P to P is C_21. A snapshot is a recording of the state of each process (i.e., what events have happened on it) and each channel (i.e., what messages have passed through it). The Chandy-Lamport 2 1 algorithm ensures that when all these pieces are stitched together, they “make sense”: in particular, it ensures that for any event that’s recorded somewhere in the snapshot, any events that happened before (https://www.microsoft.com/en-us/research/publication/time-clocks-ordering-events-distributed-system/) that event in the distributed execution are also recorded in the snapshot. The setup Here’s the execution we’re going to be snapshotting. There are three processes, each with several events, including messages sent and received. Dots on process lines with no incoming or outgoing arrows are internal events. The execution we'll be snapshotting. One of the especially cool things about the Chandy-Lamport algorithm is that it is decentralized – any process (or multiple processes at once!) can begin taking a snapshot without coordinating with other processes. It doesn’t cause problems to have multiple processes simultaneously begin taking a snapshot. For this example, though, we’ll say that a single process, P , initiates the snapshot. Let’s 1 suppose that P initiates the snapshot right after event B has happened. 1 Initiating a snapshot To get a snapshot started, an initiating process has to do three things: First, it has to record its own state. Because we’re initiating the snapshot right after event B, the recorded state of P contains the events A and B. We’ll circle those events to show that 1 they’re recorded. Next – immediately after recording its own state, and before it does anything else – it has to send a marker message out on each of its outgoing channels. A marker message is sent as part of the snapshot algorithm itself, as opposed to what I’ll call application messages, which are part of the system we’re taking a snapshot of. (Marker messages should not themselves be part of the state that the snapshot algorithm is trying to record; we’d like the snapshot we’re recording to not include any artifacts from the snapshot-taking process, if we can help it!) In this case, P sends marker messages on channels C_12 and C_13, respectively. 1 Finally, it needs to start keeping track of the messages it receives on all of its incoming channels. In this case, P has two incoming channels, C_21 and C_31. So, P starts recording 1 incoming messages on those channels. Now, things look like this: Starting our distributed snapshot. 1 We’ve recorded P ’s state, sent marker messages out along C_12 and C_13, and started recording on C_21 and C_31. We’ve drawn the marker messages as dotted lines to tell them apart from 1 application messages. The marker message headed to P is taking its sweet time to get there. Meanwhile, the marker 2 messge sent to P arrives pretty quickly, so let’s talk about what happens when it arrives! 3 Receiving a marker message: the “this is the first marker message I’ve ever seen” case When a process P receives a marker message on channel C_ki, there are two possibilities: either this is the first marker message that P has ever seen, or it isn’t. If it’s the first marker message that P has ever seen, then P needs to do the following: i i i i Record its own state. Mark the channel that the marker message came in on, C_ki, as empty. No more messages can come in on that channel. Or, well, they can, but we won’t be recording them, and so they won’t be part of the snapshot. Send marker messages out itself, on all its outgoing channels. Start recording incoming messages on all its incoming channels except C_ki, the one that it just marked as empty. Is the marker message from P the first marker message that P has ever seen? Yes! So P duly 1 3 3 follows the above steps. It records its own state, which only includes one event, I . It marks channel C_13 as empty, because that’s the channel the marker message came in on, and it sends its own marker messages on its outgoing channels to P and P . It also starts recording incoming messages 1 2 on all its incoming channnels except the one it just received the marker on. Because P only has two incoming channels, C_13 and C_23, and it received the marker on C_13, it only has to start recording on C_23. 3 Now things look like this: States of two processes recorded! Yay! We’ve already recorded the states of two out of three processes! We’re on our way to being done. Receiving a marker message: the “this ain’t my first rodeo” case has sent out its marker messages. Let’s consider the one that went to P first. Is this the marker message the first that P has seen? No, because P was the first process to send marker messages in P3 1 1 1 the first place! (Sending a marker message counts as “seeing” one.) Here’s what a process P should do when it receives a marker message on C_ki that is not the first i marker message it’s ever seen: it should stop recording on C_ki (note that it would have started recording on C_ki back when it saw its first marker message), and it should set C_ki’s final state as the sequence of all the messages that arrived on C_ki since recording began. That’s it! So, P can now stop recording on channel C_31. It turns out it didn’t receive any messages on that channel during the time it was recording, anyway. So C_31’s final recorded state is just the empty sequence. 1 A little further along. Finishing up Now we can look at the marker message sent from P to P . What does P do when it gets the 3 2 2 marker message? It’s the first marker message that P has seen, so P records its state, which includes events F , G, and H . It also marks the channel that the marker message came in on, C_32, as empty; sends out its own markers on its outgoing channels to P and P ; and starts recording on 2 2 1 3 every incoming channel except C_32, the one it got the marker message on – so, just C_12. Now every process has recorded its state! Our picture looks like this: Getting there... But we’re not quite done. Now that marker message that P sent to P approximately forever ago has finally arrived. Because P has seen a marker message before, it can now stop recording on 1 2 2 channel C_12 (which it only just started recording on). No messages were received during the recording period, so C_12’s final recorded state is empty, too. ’s role in taking the snapshot is now completely done: it’s recorded its own state and that of both its incoming channels. P2 P_2 is done! P1 still has to finish up, though. It’s waiting for a marker message to come in from P , so it can stop 2 recording on channel C_21. Hey, look – that marker message just came in! Did any messages get recorded on C_21? Yes! The message whose send event was H and whose receive event was D did. So that event goes into C_21’s final channel state. Almost done... And finally, P gets the last marker that it was waiting for, from P , and the final state of C_23 can be set to empty because no messages came in while we were recording on that channel. 3 2 That's it! Now, all the processes’ states and all the channels’ states have been recorded, so we can say that we’re done! Discussion Now that we’re done taking the snapshot, we can ignore the marker messages and just look at the state we recorded. We said earlier that a property we want to have be true of any snapshot we take is that, for any event recorded somewhere in the snapshot, any events that happened before that event in the distributed execution should also be recorded in the snapshot. We can see that that’s true of the snapshot we just took: for every circled event, if we look at the events in its causal history, those events are all in the snapshot as well. In other words, the snapshot produces a consistent cut in our diagram, where every message received on the “past” side of the cut was sent on the “past” side, and no messages go backwards in time. The snapshot corresponds to a consistent cut. A student asked me whether event D counts as part of the recorded state, because it’s the receive event for the message from H to D that we recorded on C_21. We should not consider D to be part of the recorded process state. (In fact, in our example we have event C hanging out on P in D’s causal history, so if D were in the snapshot, then C would have to be as well for it to be a 1 consistent snapshot.) So, what’s the point of recording channel states, then? Well, imagine that we want at most one process in this execution to have exclusive access to some sort of resource (say, a file). We can think of access to the resource as being a token that processes pass around. At any given time, either a process should have the token, or it should be in transit between processes. Suppose that P has the token and then decides to hand it off to P . The message sent at H and received at D might say, “I’m passing the token to you.” If this were the case, then the process state at P was recorded after P gave up the token, but the process state at P was recorded before P got the token. This isn’t great – it means that, for anyone looking at only the recorded process states, the token seems to have disappeared! Recording the channel states addresses this problem: since the message from H to D is part of the recorded channel state, someone looking at the snapshot 2 1 2 2 1 1 can inspect the contents of the message and see that when the snapshot was taken, the token was in transit from P to P , maintaining the invariant that there should be one token in the system. 2 Categories: 1 distributed computing teaching Updated: April 26, 2019 COMMENTS Sponsored 'India never had a golden age' The Hindu Learn More This deadly brain infection will kill my wife! Donate For Health Donate Now Term plan with 99.17% Claim Settlement Ratio for FY24! ICICI Pru Life Insurance Plan Pilani: The price (& size) of these hearing aids might surprise you Get Quote 1 Login 21 Comments G Join the discussion… LOG IN WITH OR SIGN UP WITH DISQUS ? Name 10 Share K kautuk khare Best Newest Oldest − ⚑ 5 years ago thanks for the article Lindsey. very helpful. 2 0 Reply ⥅ − ⚑ klion26 5 years ago thanks for the great article! 2 0 Reply ⥅ − ⚑ NNP 3 years ago Very much helful, thank you 1 Y 0 Reply ⥅ − ⚑ Yu Li 5 years ago Thanks for the artical Lindsey! Very helpful. I like the discussion in the last paragraph, especially the explanation that we could take the internal events as messages sent to the process itself. In this case, we will include C, D, J and H->D into the "past" side of the consistent cut in the example, which is more intuitive. 1 D 0 Reply ⥅ − ⚑ Dhairya Baweja 5 years ago Excellent explaination!! Thanks 1 P 0 Reply Paweł Laskowski 5 years ago ⥅ − ⚑ Thank you! That was really helpful! 1 K Reply 0 ⥅ − ⚑ kautuk khare 5 years ago Is it correct to infer that if any message is captured in a channel state, then its Send event would be definitely captured in Process state, and its Receive event will definitely not be captured in Process State 1 Reply 0 ⥅ Lindsey Kuper Mod > kautuk khare 5 years ago − ⚑ This is left as an exercise for the reader. :) 1 0 Reply ⥅ − ⚑ klion26 5 years ago I have a question: in the final global snapshot, we just have all the dot(A, B, F, G, H, I), I'm wandering why do we need to snapshot the channel? Seems the snapshot of all the channels is not contained in the global snapshot 1 Reply 0 ⥅ Lindsey Kuper Mod 5 years ago > klion26 edited − ⚑ The snapshots of the channels *are* considered part of the global snapshot. That's what the definition of a global snapshot is: it includes snapshots of the state of each process and of the state of each channel. I just didn't include the channel states in my illustration of a consistent cut because they don't seem relevant there. When *do* we care about channel states, then? Consider a system with processes P1 and P2, where a token is being passed between processes. Only one process can have the token at a time. Suppose P1 passes the token to P2 (via a message). Then P1 initiates a snapshot, starting by recording its own state. P1 no longer has the token, so its recorded state won't show it having the token. But suppose that P2 also initiated a snapshot before receiving the token. Now the token doesn't appear in either process's recorded state. This is bad -- the token seems to have disappeared. But we're saved by the fact that P2 started recording on its incoming channels after snapshotting its own state, so the message that transferred the token to P2 will get captured there, making it clear that the token didn't disappear, but was in fact in transit at the time that both P1 and P2 snapshotted themselves. There's an example similar to this in the original paper by Chandy and Lamport. The existence of the single token is an example of what the paper calls a stable property, and in general you can use Chandy-Lamport snapshots (including the channel states) to detect whether a stable property is true of a system. 0 0 Reply klion26 ⥅ > Lindsey Kuper − ⚑ klion26 > Lindsey Kuper 5 years ago thanks very much for the detailed explain. "Now the token doesn't appear in either process's recorded state. This is bad -- the token seems to have disappeared." this explains what I did not understand before. 1 S Reply 0 ⥅ − ⚑ Sai a year ago so the state is recorded only once forever? or every time a message is sent the state is recorded? 0 S Reply 0 ⥅ − ⚑ sai > Sai a year ago I guess the state is recorded when a process initiates the algorithm 0 Reply 0 ⥅ Lindsey Kuper Mod > sai a year ago − ⚑ Instead of saying "the state", we should distinguish between process state and channel state. A process records its process state (i.e., all the events that have happened on it so far, including message send events, message receive events, and internal events) when it initiates the algorithm or, if it's not an initiator, when it first gets a marker message from someone. After that, it's done recording process state. Then, on each of its incoming channels (except for the one it received the marker message on, if it wasn't an initiator), the process records incoming messages until it gets a marker message on that channel. Then, it's done recording channel state. 1 L Reply 0 ⥅ − ⚑ Lin Tian 2 years ago Thank you for this amazing article! 0 Reply 0 ⥅ Lindsey Kuper Mod > Lin Tian 2 years ago − ⚑ Thanks for reading. 0 0 Mohan Radhakrishnan 2 years ago Reply ⥅ − ⚑ Do you know if anyone( Jepsen etc. ) has code to execute a simple snapshot ? I know I could search Flink's codebase but that may not be easy because according to http://www.vldb.org/pvldb/v... state storage mechanisms are also key to this algorithm. 0 Reply 0 ⥅ Lindsey Kuper Mod 2 years ago > Mohan Radhakrishnan edited − ⚑ It looks like there are a number of GitHub repos tagged with topics like "chandy-lamport-snapshot-algorithm" or "chandy-lamport-protocol" or "global-snapshotalgorithm" (e.g., https://github.com/topics/c... ), but I haven't dug into them. 0 0 Reply ⥅ Mohan Radhakrishnan 2 years ago > Lindsey Kuper − ⚑ Sponsored 'India never had a golden age' The Hindu Learn More This deadly brain infection will kill my wife! Donate For Health Donate Now Term plan with 99.17% Claim Settlement Ratio for FY24! ICICI Pru Life Insurance Plan Get Quote
0
You can add this document to your study collection(s)
Sign in Available only to authorized usersYou can add this document to your saved list
Sign in Available only to authorized users(For complaints, use another form )