Clock Synchronization

Ken Birman
Why do clock synchronization?
 Time-based computations on multiple machines
Applications that measure elapsed time
Agreeing on deadlines
Real-time processes may need accurate timestamps
 Many applications require that clocks advance at
similar rates
Real-time scheduling of events based on the processor clock
Setting timeouts and measuring latencies
Ability to infer potential causality from timestamps
Famous example
 Scud rockets launched by Iraq towards Israel
 Ground-based Patriot missiles fire back
 But missiles always missed the warhead!
 Why?
After 72 hours of waiting, the control system was out
of sync relative to the Patriot guidance system
“Be at (x,y,z) at time t” was misinterpreted!
Goals for clock
 We might be concerned with
Clock accuracy relative to real-time
Clock precision, or the degree to which correct
clocks agree with one another
Rate of possible clock drift
 Would we want the Patriot system to be
optimally accurate, or optimally precise, if
we can’t have both?
The System Model
 Hardware clocks
Physical clock of process q designated Rq(t)
Clocks have a drift rate ρ:
$(1+\rho)^{-1}(t_2 - t_1) \le R_p(t_2) - R_p(t_1) \le (1+\rho)(t_2 - t_1)$
Implies that rate of drift is bounded by dr = ρ(2+ ρ)/(1+ ρ)
For Byzantine model assume nothing about the clock
 May increase or decrease or return a random number
 May get “stuck” (surprisingly common in real systems)
 Cannot necessarily be modeled by functions.
 There is a limit tdel on message latency
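One way to see where the bound dr comes from, assuming dr denotes the largest rate at which two correct hardware clocks can drift apart: the fastest correct clock advances at rate 1+ρ and the slowest at rate 1/(1+ρ), so their rates differ by at most

```latex
\[
dr = (1+\rho) - \frac{1}{1+\rho}
   = \frac{(1+\rho)^2 - 1}{1+\rho}
   = \frac{\rho(2+\rho)}{1+\rho}
\]
```

For small ρ this is roughly 2ρ. As an illustration with an assumed ρ of 10^-6, two correct clocks could drift apart by about 0.17 seconds per day.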
Clock synchronization goals
 A clock synchronization protocol implements a
virtual clock function mapping real time t to Cp(t)
 Agreement condition:
$|C_p(t) - C_q(t)| \le D_{max}$ for all correct p, q
Dmax bounds the difference between two virtual clocks
running on different processors
 Accuracy condition:
$(1+\gamma)^{-1} t + a \le C_p(t) \le (1+\gamma) t + b$, for constants a, b, γ
Says that p’s clock must be within a linear envelope of
“real time”
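The two conditions can be checked mechanically on sampled clock values. The sketch below is only an illustration: the function names, the symbol gamma for the envelope constant, and all numeric values are assumptions, not part of the original slides.

```python
from itertools import combinations


def agreement_holds(clock_values, d_max):
    """True if every pair of correct virtual clocks differs by at most d_max."""
    return all(abs(x - y) <= d_max for x, y in combinations(clock_values, 2))


def accuracy_holds(t, c, gamma, a, b):
    """True if virtual clock value c at real time t lies in the linear envelope."""
    lower = t / (1 + gamma) + a
    upper = (1 + gamma) * t + b
    return lower <= c <= upper


# Illustrative numbers only: three clocks read near real time t = 1000 s.
readings = [999.98, 1000.01, 1000.03]
print(agreement_holds(readings, d_max=0.1))
print(all(accuracy_holds(1000.0, c, gamma=1e-4, a=-0.05, b=0.05) for c in readings))
```

Agreement compares clocks only with each other, while accuracy compares each clock with real time, so a set of clocks can satisfy one condition and fail the other.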
Clocks and True Time
[Figure: clock time plotted against true time; a correct clock stays inside a linear envelope with offsets a and b]
Authenticated Algorithm
 Solution for system of n processes, at most f of which are faulty
 Let P be the logical time between resynchronizations
A process expects the k’th resynchronization at time kP
When Cp(t)=kP, broadcast a signed message of the form “round k”
When a process q receives f+1 such messages, it sets its logical clock
Cq(t)=kP+α for some constant α greater than the increase in Cq since
q sent its own round k message.
Also, q relays round k messages it receives
 Srikanth and Toueg give proofs of correctness. Insight: at
least one of the round k messages is from a correct process
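The steps above can be sketched for a single correct process as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: signatures are modeled simply by trusting the sender id, the network is replaced by an `outbox` list, and the class name, method names, and the symbol `alpha` for the reset constant are all hypothetical.

```python
class SyncProcess:
    """One correct process in the round-based resynchronization scheme.

    Signatures are modeled by trusting the sender id attached to each
    message; a real system would verify a cryptographic signature here.
    """

    def __init__(self, pid, f, period, alpha):
        self.pid = pid
        self.f = f              # maximum number of faulty processes
        self.period = period    # P: logical time between resynchronizations
        self.alpha = alpha      # constant added when the clock is reset
        self.clock = 0.0        # current logical clock value
        self.signers = {}       # round k -> ids whose "round k" we have seen
        self.done = set()       # rounds already resynchronized
        self.outbox = []        # (round, signer) pairs to send to everyone

    def tick(self, new_clock_value):
        """Advance the logical clock; broadcast "round k" on reaching k*P."""
        old = self.clock
        self.clock = new_clock_value
        k = int(new_clock_value // self.period)
        if k >= 1 and old < k * self.period and k not in self.done:
            self.receive(k, self.pid)  # our own signed message counts too

    def receive(self, k, signer):
        """Handle a signed "round k" message from `signer`."""
        if k in self.done:
            return
        seen = self.signers.setdefault(k, set())
        if signer in seen:
            return                      # duplicates do not count twice
        seen.add(signer)
        self.outbox.append((k, signer))  # broadcast our own or relay others'
        if len(seen) >= self.f + 1:
            self.clock = k * self.period + self.alpha
            self.done.add(k)


# Illustrative run with f = 1, P = 10, alpha = 0.5 (all values are assumptions).
q = SyncProcess(pid="q", f=1, period=10.0, alpha=0.5)
q.tick(9.7)          # not yet at 1*P, nothing is sent
q.receive(1, "p")    # one signed "round 1" message: below f+1
q.receive(1, "r")    # second distinct signer: f+1 reached, clock is reset
print(q.clock)
```

Requiring f+1 distinct signers is what guarantees that at least one of them is correct, so a coalition of f faulty processes cannot trigger a resynchronization on its own.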
Overview of proof
 Lemma 1: The k’th resynchronization is bounded in size by some
constant dmin, such that for $k \ge 1$, $end_k - begin_k \le d_{min}$
 Lemma 2: After k’th resynchronization, correct clocks differ by at most
 Lemma 3: No correct process starts its k’th clock until at least some
correct process is ready to do so: for $k \ge 1$, $begin_k \ge ready_k$
 Lemma 4: All correct processes start their k’th clock soon after one
correct process is ready to do so: $end_k - ready_k \le (1+\rho) D_{max} + t_{del}$
 Lemma 5, 6, 7: The periods between resynchronizations and maximum
deviations between clocks are bounded and do not overlap
 Theorem: the algorithm achieves agreement & accuracy
 Bound on accuracy: Srikanth and Toueg
show that for any synchronization, accuracy
cannot exceed that of the underlying
hardware clocks
 And they show that their simple algorithm
achieves optimal accuracy
 Proof is remarkably tricky!
Unauthenticated algorithm
 The algorithm relies on properties of the message
Correctness: If at least f+1 correct processes broadcast
round k messages by time t, then every correct process
accepts a message by time t+tdel
Unforgeability: If no correct process broadcasts a round
k message by time t, then no correct process accepts the
message by time t or earlier
Relay: If a correct process accepts the message round k
at time t, then every correct process does so by time t+tdel
Simulating Authentication
 Here they reference a different paper:
T.K. Srikanth and S. Toueg. Simulating authenticated broadcasts to
derive simple fault-tolerant algorithms. Distributed Computing 2(2):
80-94 (1987).
Based on an echoing scheme where witnesses to a broadcast
effectively “sign it”
Cost is $O(n^3)$ messages per broadcast round, hence per clock
synchronization round
Paper claims cost is $O(n^2)$ but this assumes a built-in way of sending one
message to n processes in one step
 Realistic cost of resynchronization is something like $O(n^4)$
since each process needs to do one of these broadcasts
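To get a feel for how quickly these costs grow, the following sketch plugs a few system sizes into the orders of growth quoted above. It ignores constant factors entirely, so the numbers are rough indications rather than exact message counts; the function names are hypothetical.

```python
def messages_per_broadcast(n):
    """Order-of-growth estimate: n**3 point-to-point messages per broadcast."""
    return n ** 3


def messages_per_resync(n):
    """Each of the n processes performs one simulated broadcast per round."""
    return n * messages_per_broadcast(n)


for n in (4, 10, 50):
    print(n, messages_per_broadcast(n), messages_per_resync(n))
```

Even at n = 10 a single resynchronization is on the order of ten thousand messages, which is why the unauthenticated variant is mainly of interest for small groups.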
Other ways to think about
 Cristian: probabilistic clock synchronization
Starts with observation about RPC
If I “ping” you in a network
Most round-trip times will be small
 But distribution may have a heavy tail
Expressed in terms of expectation: “with
probability p a reply to a ping will be received
within time δ”
Cristian’s scheme
 His idea: System contains some number of time
“authorities” that everyone trusts
i.e. they have a GPS receiver – cheap and common…
 Periodically, client machine a pings authority b
asking “what time is it?”
 If round-trip time is less than δ, then a replaces
$C_a(t)$ with $(C_a(t) + (C_b(t) - \delta/2))/2$
 With high probability this scheme gives very good
clock synchronization. Not tolerant of faults but
can be extended into a fault-tolerant solution
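A minimal sketch of the client-side step, taking the update rule exactly as written above and writing `delta` for the round-trip threshold. The function and parameter names are assumptions made for illustration.

```python
def cristian_update(c_a, c_b, rtt, delta):
    """Apply the update rule at client a.

    c_a   -- a's current clock reading
    c_b   -- the time reported by authority b
    rtt   -- measured round-trip time of the ping
    delta -- round-trip threshold below which a sample is trusted
    Returns the new clock value, or c_a unchanged if the sample is rejected.
    """
    if rtt >= delta:
        return c_a  # slow round trip: too uncertain, keep the old value
    return (c_a + (c_b - delta / 2)) / 2


# Illustrative values: a reads 100.0, b reports 104.0, threshold is 2.0.
print(cristian_update(100.0, 104.0, rtt=0.5, delta=2.0))
print(cristian_update(100.0, 104.0, rtt=3.0, delta=2.0))
```

Rejecting slow round trips is what makes the scheme probabilistic: a client may need several pings before one is fast enough to use. Textbook presentations of Cristian's algorithm usually add half the measured round-trip time to the authority's reading; the sketch keeps the form given here.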
Verissimo and Rodriguez
 They notice that clock synchronization is
really bounded not by actual latencies but by
uncertainty in latency
 Instead of δ, think of $\delta_{min} + \varepsilon$, for some ε ≥ 0
 Leads to a solution where accuracy is limited
by ε rather than by δ
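A small illustration of why only the uncertainty matters. Assume a message carries the sender's timestamp and its latency is known to lie between a minimum `d_min` and `d_min + eps`; the known part can simply be added back, and only `eps` contributes error. The function name and the numbers are assumptions, and clock drift during transit is ignored.

```python
def estimate_remote_clock(reported, d_min, eps):
    """Estimate a remote clock from a timestamped message.

    reported -- timestamp the sender put in the message
    d_min    -- minimum message latency
    eps      -- latency uncertainty: actual latency lies in [d_min, d_min + eps]
    Returns (estimate, max_error), ignoring drift during transit.
    """
    estimate = reported + d_min + eps / 2  # midpoint of the possible range
    return estimate, eps / 2


# Illustrative values: 10 ms minimum latency, 2 ms of uncertainty.
print(estimate_remote_clock(500.0, d_min=0.010, eps=0.002))
```

With these numbers the estimate is off by at most 1 ms even though the message took at least 10 ms to arrive.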
Other practical considerations
 Real systems have
Hardware from multiple vendors
Operating systems from multiple sources
 Tends to limit our ability to synchronize clocks
Several widely supported standards but no single solution
that everyone uses
Hence when crossing machine boundaries, expect
Real-world clocks
 Real systems
Sometimes stop the clock
Sometimes even run the clock backwards!
 Better approach?
Pick a constant Δ and synchronize during periods of time
Δ long
If the clock needs to be adjusted by θ, adjust at rate θ/Δ so that, over
the course of a period, the value catches up
Avoids sudden discontinuities or stopping the clock
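The gradual adjustment can be sketched as follows: instead of jumping, the clock applies a linearly growing share of the correction over one period. The function names and sample values are assumptions for illustration only.

```python
def slewed_correction(elapsed, period, adjustment):
    """Portion of `adjustment` applied `elapsed` time units into the period.

    The correction grows linearly at rate adjustment/period and is capped
    at the full adjustment once the period is over.
    """
    fraction = min(max(elapsed / period, 0.0), 1.0)
    return adjustment * fraction


def slewed_clock(raw, period_start, period, adjustment):
    """Raw clock reading plus the part of the correction applied so far."""
    return raw + slewed_correction(raw - period_start, period, adjustment)


# Illustrative values: the clock is 2 s ahead, corrected over a 100 s period.
for raw in (0.0, 25.0, 50.0, 100.0, 120.0):
    print(raw, slewed_clock(raw, period_start=0.0, period=100.0, adjustment=-2.0))
```

As long as a negative adjustment is smaller in magnitude than the period, the adjusted clock keeps moving forward, just slightly slower than the raw clock, so timestamps never repeat or go backwards.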
 We often assume synchronized clocks
 In practice, quality of synchronization remains
relatively poor
 At best synchronization will be limited by quality
of physical clocks, rates of physical clock drift, and
uncertainty in latencies
 Cristian’s probabilistic scheme makes these
uncertainties explicit and also works very well