Clock Synchronization
Ken Birman

Why do clock synchronization?
- Time-based computations on multiple machines
  - Applications that measure elapsed time
  - Agreeing on deadlines
  - Real-time processes may need accurate timestamps
- Many applications require that clocks advance at similar rates
  - Real-time scheduling of events based on the processor clock
  - Setting timeouts and measuring latencies
  - Ability to infer potential causality from timestamps

Famous example
- Scud rockets launched by Iraq towards Israel
- Ground-based Patriot missiles fire back
- But the missiles always missed the warhead! Why?
- After 72 hours of waiting, the control system was out of sync relative to the Patriot guidance system
- "be at (x,y,z) at time t" was misinterpreted!

Goals for clock synchronization?
- We might be concerned with
  - Clock accuracy relative to real time
  - Clock precision, or the degree to which correct clocks agree with one another
  - Rate of possible clock drift
- Would we want the Patriot system to be optimally accurate, or optimally precise, if we can't have both?

The System Model
- Hardware clocks
  - Physical clock of process q is designated $R_q(t)$
  - Clocks have a drift rate $\rho$: $(1+\rho)^{-1}(t_2-t_1) \le R_p(t_2)-R_p(t_1) \le (1+\rho)(t_2-t_1)$
  - Implies that the rate of drift is bounded by $dr = \rho(2+\rho)/(1+\rho)$
- For the Byzantine model, assume nothing about the clock
  - May increase or decrease or return a random number
  - May get "stuck" (surprisingly common in real systems)
  - Cannot necessarily be modeled by functions
- There is a limit $t_{del}$ on message latency

Clock synchronization goals
- A clock synchronization protocol implements a virtual clock function mapping real time t to $C_p(t)$
- Agreement condition: $|C_p(t) - C_q(t)| \le D_{max}$ for all correct p, q
  - $D_{max}$ bounds the difference between two virtual clocks running on different processors
- Accuracy condition: $(1+\gamma)^{-1}t + a \le C_p(t) \le (1+\gamma)t + b$, for constants a, b, $\gamma$
  - Says that p's clock must be within a linear envelope of "real time"

[Figure: "Clocks and True Time". Clock time plotted against true time, with the envelope offsets a and b marked.]

Authenticated Algorithm
- Solution for a system of n processes, at most f of which are faulty
- Let P be the logical time between resynchronizations
- A process expects the k'th resynchronization at time kP
- When $C_p(t) = kP$, broadcast a signed message of the form "round k"
- When a process receives f+1 such messages, it sets its logical clock $C_q(t) = kP + \alpha$, for some constant $\alpha$ greater than the increase in $C_q$ since q sent its own round k message
- Also, q relays the round k messages it receives
- Srikanth and Toueg give proofs of correctness
- Insight: at least one of the f+1 round k messages is from a correct process

Overview of proof
- Lemma 1: The k'th resynchronization is bounded in size by some constant $d_{min}$, such that for $k \ge 1$, $end_k - begin_k \le d_{min}$
- Lemma 2: After the k'th resynchronization, correct clocks differ by at most $d_{min}(1+\rho)$
- Lemma 3: No correct process starts its k'th clock until at least some correct process is ready to do so: for $k \ge 1$, $begin_k \ge ready_k$
- Lemma 4: All correct processes start their k'th clock soon after one correct process is ready to do so: $end_k - ready_k \le (1+\rho)D_{max} + t_{del}$
- Lemmas 5, 6, 7: The periods between resynchronizations and the maximum deviations between clocks are bounded and do not overlap
- Theorem: the algorithm achieves agreement and accuracy

Optimality
- Bound on accuracy: Srikanth and Toueg show that for any synchronization algorithm, accuracy cannot exceed that of the underlying hardware clocks
- And they show that their simple algorithm achieves optimal accuracy
- Proof is remarkably tricky!
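
The acceptance rule of the authenticated algorithm (reset the logical clock once f+1 signed "round k" messages have arrived) can be sketched in Python as follows. This is a minimal illustration, not Srikanth and Toueg's implementation: the class name `Resynchronizer`, the field names, and the name `alpha` for the constant added to kP are assumptions made here, signature verification is assumed to have happened before `on_round_message` is called, and relaying is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Resynchronizer:
    """Acceptance rule for one process q (hypothetical names throughout)."""
    f: int            # maximum number of faulty processes
    period: float     # P, the logical time between resynchronizations
    alpha: float      # constant added to k*P when the clock is reset (assumed name)
    clock: float = 0.0
    accepted_round: int = 0
    signers: dict = field(default_factory=dict)  # round k -> set of signer ids

    def on_round_message(self, k, signer_id):
        """Record an already-verified 'round k' message.

        Returns True if this message triggered a resynchronization.
        """
        if k <= self.accepted_round:
            return False  # round already accepted; late messages are ignored
        self.signers.setdefault(k, set()).add(signer_id)
        if len(self.signers[k]) >= self.f + 1:
            # f+1 distinct signers, so at least one of them is correct.
            self.clock = k * self.period + self.alpha
            self.accepted_round = k
            return True
        return False
```

Counting distinct signers, rather than messages, is what makes the f+1 threshold meaningful: a single faulty process that repeats its own message cannot trigger a resynchronization by itself.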
Unauthenticated algorithm
- The algorithm relies on properties of the message system:
  - Correctness: If at least f+1 correct processes broadcast round k messages by time t, then every correct process accepts a message by time $t + t_{del}$
  - Unforgeability: If no correct process broadcasts a round k message by time t, then no correct process accepts the message by time t or earlier
  - Relay: If a correct process accepts the message round k at time t, then every correct process does so by time $t + t_{del}$

Simulating Authentication
- Here they reference a different paper: T.K. Srikanth and S. Toueg. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Distributed Computing 2(2): 80-94 (1987).
- Based on an echoing scheme where witnesses to a broadcast effectively "sign it"
- Cost is $O(n^3)$ messages per broadcast round, hence per clock synchronization round
  - Paper claims cost is $O(n^2)$, but this assumes a built-in way of sending one message to n processes in one step
- Realistic cost of resynchronization is something like $O(n^4)$, since each process needs to do one of these broadcasts

Other ways to think about resynchronization
- Cristian: probabilistic clock synchronization
- Starts with an observation about RPC
- If I "ping" you in a network
  - Most round-trip times will be small
  - But the distribution may have a heavy tail
- Expressed in terms of expectation: "with probability p a reply to a ping will be received within time $\delta$"

Cristian's scheme
- His idea:
  - System contains some number of time "authorities" that everyone trusts, i.e. they have a GPS receiver (cheap and common)
  - Periodically, client machine a pings authority b asking "what time is it?"
  - If the round-trip time is less than $\delta$, then a replaces $C_a(t)$ with $(C_a(t) + (C_b(t) - \delta/2))/2$
- With high probability this scheme gives very good clock synchronization
- Not tolerant of faults, but can be extended into a fault-tolerant solution

Verissimo and Rodrigues
- They notice that clock synchronization is really bounded not by actual latencies but by uncertainty in latency
- Instead of $\delta$, think of $min + \varepsilon$, for some $\varepsilon \ge 0$
- Leads to a solution where accuracy is limited by $\varepsilon$ rather than by $\delta$

Other practical considerations
- Real systems have
  - Hardware from multiple vendors
  - Operating systems from multiple sources
- Tends to limit our ability to synchronize clocks
- Several widely supported standards, but no single solution that everyone uses
- Hence when crossing machine boundaries, expect problems!

Real-world clocks
- Real systems
  - Sometimes stop the clock
  - Sometimes even run the clock backwards!
- Better approach?
  - Pick a constant $T$ and synchronize during periods of length $T$
  - If the clock needs to be adjusted by $\Delta$, adjust at rate $\Delta/T$; over the course of a period, the value catches up
  - Avoids sudden discontinuities or stopping the clock

Summary
- We often assume synchronized clocks
- In practice, quality of synchronization remains relatively poor
- At best, synchronization will be limited by the quality of physical clocks, rates of physical clock drift, and uncertainty in latencies
- Cristian's probabilistic scheme makes these uncertainties explicit and also works very well
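
The gradual-adjustment idea from the "Real-world clocks" slide (spread a correction evenly over a synchronization period instead of jumping the clock) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `slew_correction` and `adjusted_time` and the parameter names `delta`, `period`, and `adjust_start` are made up here and do not come from any particular operating system API.

```python
def slew_correction(delta, period, elapsed):
    """Portion of the total correction `delta` applied after `elapsed` time units.

    The correction is phased in at the constant rate delta / period and stops
    growing once a full period has passed.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    fraction = min(max(elapsed, 0.0), period) / period
    return delta * fraction


def adjusted_time(raw, adjust_start, delta, period):
    """Clock reading with the correction phased in from `adjust_start` onward."""
    return raw + slew_correction(delta, period, raw - adjust_start)


if __name__ == "__main__":
    # The clock is 2 units fast; remove the error over a 10-unit period.
    for raw in (100.0, 105.0, 110.0, 120.0):
        print(raw, adjusted_time(raw, 100.0, -2.0, 10.0))
```

As long as a negative correction is smaller in magnitude than the period, the adjusted clock keeps advancing, so it never stops or runs backwards.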