Clock Synchronization

Ken Birman
Why do clock synchronization?
- Time-based computations on multiple machines
  - Applications that measure elapsed time
  - Agreeing on deadlines
  - Real-time processes may need accurate timestamps
- Many applications require that clocks advance at similar rates
  - Real-time scheduling of events based on the processor clock
  - Setting timeouts and measuring latencies
  - Ability to infer potential causality from timestamps
Famous example
- Scud rockets launched by Iraq towards Israel
- Ground-based Patriot missiles fire back
- But missiles always missed the warhead!
- Why?
  - After 72 hours of waiting, the control system was out of sync relative to the Patriot guidance system
  - “Be at (x,y,z) at time t” was misinterpreted!
Goals for clock synchronization?
- We might be concerned with
  - Clock accuracy relative to real time
  - Clock precision, or the degree to which correct clocks agree with one another
  - Rate of possible clock drift
- Would we want the Patriot system to be optimally accurate, or optimally precise, if we can’t have both?
The System Model
- Hardware clocks
  - Physical clock of process q is designated $R_q(t)$
  - Clocks have a drift rate $\rho$:
    $(1+\rho)^{-1}(t_2 - t_1) \le R_p(t_2) - R_p(t_1) \le (1+\rho)(t_2 - t_1)$
  - Implies that the rate at which two correct clocks drift apart is bounded by the gap between the fastest and slowest rates: $dr = (1+\rho) - (1+\rho)^{-1} = \rho(2+\rho)/(1+\rho)$
- For the Byzantine model, assume nothing about the clock
  - May increase or decrease or return a random number
  - May get “stuck” (surprisingly common in real systems)
  - Cannot necessarily be modeled by functions
- There is a limit $t_{del}$ on message latency
Clock synchronization goals
- A clock synchronization protocol implements a virtual clock function mapping real time t to $C_p(t)$
- Agreement condition:
  - $|C_p(t) - C_q(t)| \le D_{max}$ for all correct p, q
  - $D_{max}$ bounds the difference between two virtual clocks running on different processors
- Accuracy condition:
  - $(1+\gamma)^{-1} t + a \le C_p(t) \le (1+\gamma) t + b$, for constants a, b, $\gamma$
  - Says that p’s clock must be within a linear envelope of “real time”
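The two conditions can be checked mechanically for a given set of clock readings. The following is a minimal sketch, not part of the original slides; the function names and parameter names (`d_max`, `gamma`, `a`, `b`) are illustrative assumptions that mirror the symbols above.

```python
def agreement_holds(clock_values, d_max):
    """True if every pair of correct virtual clocks differs by at most d_max."""
    # The largest pairwise difference is max - min, so one comparison suffices.
    return max(clock_values) - min(clock_values) <= d_max


def accuracy_holds(t, c_p, gamma, a, b):
    """True if the reading c_p = C_p(t) lies in the linear envelope around real time t."""
    lower = t / (1.0 + gamma) + a      # slowest permitted clock
    upper = (1.0 + gamma) * t + b      # fastest permitted clock
    return lower <= c_p <= upper
```

For example, `agreement_holds([100.0, 100.25, 99.875], 0.5)` holds because the readings span 0.375, while a reading of 1003 at real time 1000 falls outside the envelope for `gamma=0.001, a=-1, b=1`.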
Clocks and True Time
[Figure: clock time plotted against true time; axis labels “Clock Time” and “True Time”, with offsets a and b marked.]
Authenticated Algorithm
- Solution for a system of n processes, at most f of which are faulty
- Let P be the logical time between resynchronizations
  - A process expects the k’th resynchronization at time kP
  - When $C_p(t) = kP$, p broadcasts a signed message of the form “round k”
  - When a process q receives f+1 such messages, it sets its logical clock $C_q(t) = kP + \alpha$, for some constant $\alpha$ greater than the increase in $C_q$ since q sent its own round k message
  - Also, q relays the round k messages it receives
- Srikanth and Toueg give proofs of correctness. Insight: among any f+1 round k messages, at least one is from a correct process
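The receive side of the rule above can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, signature verification is assumed to have happened before `on_round_message` is called, and the network (broadcast and relay) is left to the caller.

```python
class Process:
    """Resynchronization state of one correct process (signatures not modelled)."""

    def __init__(self, f, period, alpha):
        self.f = f                # maximum number of faulty processes
        self.period = period      # P, logical time between resynchronizations
        self.alpha = alpha        # constant added when the clock is reset
        self.clock = 0.0          # current logical clock value
        self.round = 1            # next resynchronization round expected
        self.signers = {}         # round number -> set of distinct signer ids

    def on_round_message(self, k, signer):
        """Record a verified "round k" message; return True if it triggers a resync."""
        if k < self.round:
            return False          # stale: this round was already completed
        self.signers.setdefault(k, set()).add(signer)
        if len(self.signers[k]) >= self.f + 1:
            # f+1 distinct signers guarantee at least one correct sender.
            self.clock = k * self.period + self.alpha
            self.round = k + 1
            return True           # caller should now relay the round k messages
        return False
```

Counting distinct signers rather than messages matters: a faulty process that repeats its own message must not be able to reach the f+1 threshold alone.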
Overview of proof
- Lemma 1: The k’th resynchronization is bounded in size by some constant $d_{min}$, such that for $k \ge 1$, $end_k - begin_k \le d_{min}$
- Lemma 2: After the k’th resynchronization, correct clocks differ by at most $d_{min}(1+\rho)$
- Lemma 3: No correct process starts its k’th clock until at least some correct process is ready to do so: for $k \ge 1$, $begin_k \ge ready_k$
- Lemma 4: All correct processes start their k’th clock soon after one correct process is ready to do so: $end_k - ready_k \le (1+\rho)D_{max} + t_{del}$
- Lemmas 5, 6, 7: The periods between resynchronizations and the maximum deviations between clocks are bounded and do not overlap
- Theorem: the algorithm achieves agreement and accuracy
Optimality
- Bound on accuracy: Srikanth and Toueg show that for any synchronization, accuracy cannot exceed that of the underlying hardware clocks
- And they show that their simple algorithm achieves optimal accuracy
- Proof is remarkably tricky!
Unauthenticated algorithm
- The algorithm relies on properties of the message system:
  - Correctness: If at least f+1 correct processes broadcast round k messages by time t, then every correct process accepts the message by time $t + t_{del}$
  - Unforgeability: If no correct process broadcasts a round k message by time t, then no correct process accepts the message by time t or earlier
  - Relay: If a correct process accepts the message round k at time t, then every correct process does so by time $t + t_{del}$
Simulating Authentication
- Here they reference a different paper:
  - T.K. Srikanth and S. Toueg. Simulating authenticated broadcasts to derive simple fault-tolerant algorithms. Distributed Computing 2(2): 80-94 (1987).
  - Based on an echoing scheme where witnesses to a broadcast effectively “sign it”
  - Cost is $O(n^3)$ messages per broadcast round, hence per clock synchronization round
  - Paper claims cost is $O(n^2)$, but this assumes a built-in way of sending one message to n processes in one step
- Realistic cost of resynchronization is something like $O(n^4)$, since each process needs to do one of these broadcasts
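An echoing scheme of this kind can be sketched per process as below. The thresholds used here (echo after f+1 matching messages, accept after 2f+1 echoes, with n > 3f) are an assumption based on the usual presentation of echo broadcast and are not stated on the slides; the class and method names are hypothetical, and message transport is left to the caller.

```python
class EchoBroadcast:
    """Per-process state for accepting one "round k" message without signatures."""

    def __init__(self, f):
        self.f = f
        self.inits = set()       # distinct senders of an initial "round k" message
        self.echoes = set()      # distinct senders of an echo for "round k"
        self.sent_echo = False
        self.accepted = False

    def _maybe_echo(self):
        """Return True exactly once, when this process should send its own echo."""
        enough = len(self.inits) >= self.f + 1 or len(self.echoes) >= self.f + 1
        if enough and not self.sent_echo:
            self.sent_echo = True
            return True
        return False

    def on_init(self, sender):
        self.inits.add(sender)
        return self._maybe_echo()

    def on_echo(self, sender):
        self.echoes.add(sender)
        should_echo = self._maybe_echo()
        if len(self.echoes) >= 2 * self.f + 1:
            self.accepted = True   # enough witnesses: treat the message as "signed"
        return should_echo
```

The f+1 threshold ensures at least one correct process stands behind a message before it is echoed, which is what gives unforgeability; the 2f+1 threshold ensures that f+1 of the echoes came from correct processes, so every other correct process will eventually echo and accept as well.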
Other ways to think about resynchronization
- Cristian: probabilistic clock synchronization
  - Starts with an observation about RPC
  - If I “ping” you in a network
    - Most round-trip times will be small
    - But the distribution may have a heavy tail
  - Expressed in terms of expectation: “with probability p a reply to a ping will be received within time $\delta$”
Cristian’s scheme
- His idea: the system contains some number of time “authorities” that everyone trusts
  - e.g., they have a GPS receiver (cheap and common)
- Periodically, client machine a pings authority b, asking “what time is it?”
- If the round-trip time is less than $\delta$, then a replaces $C_a(t)$ with $(C_a(t) + (C_b(t) - \delta/2))/2$
- With high probability this scheme gives very good clock synchronization. Not tolerant of faults, but can be extended into a fault-tolerant solution
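A single client-side update can be sketched as follows. This sketch uses the common textbook form of Cristian's estimate (the authority's reading plus half the measured round-trip time) and then averages it with the local clock as on the slide; the function name and parameters, including the threshold `max_rtt`, are illustrative assumptions.

```python
def cristian_update(local_send, local_recv, server_time, max_rtt):
    """Return the new local clock reading, or None if the reply was too slow.

    local_send / local_recv: client clock when the request left / the reply arrived.
    server_time: the authority's clock reading carried in the reply.
    max_rtt: round-trip threshold; slower samples are discarded.
    """
    rtt = local_recv - local_send
    if rtt >= max_rtt:
        return None                        # heavy-tail sample: ignore and retry later
    estimate = server_time + rtt / 2.0     # assume the reply spent half the RTT in flight
    return (local_recv + estimate) / 2.0   # move halfway toward the authority's time
```

For instance, a request sent at local time 100.0 with a reply at 100.5 carrying server time 102.0 yields an estimate of 102.25 and a new local reading of 101.375. Discarding slow replies is what makes the scheme probabilistic: a client may need several attempts before it gets a usable sample.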
Verissimo and Rodrigues
- They notice that clock synchronization is really bounded not by actual latencies but by uncertainty in latency
- Instead of $\delta$, think of $\delta_{min} + \epsilon$, for some $\epsilon \ge 0$
- Leads to a solution where accuracy is limited by $\epsilon$ rather than by $\delta$
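A small worked example shows why the uncertainty, not the latency itself, limits accuracy. If the one-way delay is known to be at least some minimum, then the reply's delay lies between that minimum and the round-trip time minus that minimum, and guessing the midpoint is wrong by at most half the width of that interval. The function below is a sketch under that assumption; its name and parameters are hypothetical.

```python
def offset_error_bound(rtt, min_one_way):
    """Worst-case error of the 'half the round trip' estimate of the reply's delay."""
    if rtt < 2 * min_one_way:
        raise ValueError("round trip cannot be shorter than twice the minimum latency")
    # The true delay lies in [min_one_way, rtt - min_one_way]; the midpoint is rtt / 2.
    return rtt / 2.0 - min_one_way
```

With a 10 ms round trip and no knowledge of the minimum, the bound is 5 ms; knowing that each direction takes at least 4 ms tightens it to 1 ms, even though the latency is unchanged.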
Other practical considerations
- Real systems have
  - Hardware from multiple vendors
  - Operating systems from multiple sources
- This tends to limit our ability to synchronize clocks
  - Several widely supported standards, but no single solution that everyone uses
  - Hence when crossing machine boundaries, expect problems!
Real-world clocks
- Real systems
  - Sometimes stop the clock
  - Sometimes even run the clock backwards!
- Better approach?
  - Pick a constant $\Delta$ and synchronize during periods of time $\Delta$ long
  - If the clock needs to be adjusted by $\theta$, adjust at rate $\theta/\Delta$; over the course of a period, the value catches up
  - Avoids sudden discontinuities or stopping the clock
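The gradual adjustment described above can be sketched as a pure function of the raw clock reading. The function and parameter names are illustrative assumptions; `correction` is the total adjustment needed and `period` is the interval over which it is phased in.

```python
def slewed_clock(raw_time, sync_time, correction, period):
    """Clock reading that phases in `correction` linearly over `period`.

    raw_time: current uncorrected clock reading.
    sync_time: raw reading at the moment the correction was computed.
    """
    elapsed = raw_time - sync_time
    # Fraction of the correction applied so far, clamped to [0, 1].
    fraction = min(max(elapsed / period, 0.0), 1.0)
    return raw_time + correction * fraction
```

With a correction of +2 over a period of 10, the reading is unchanged at the sync point, ahead by 1 halfway through, and ahead by the full 2 from the end of the period onward. A negative correction slows the clock instead of stepping it back, so the reading still never decreases as long as the correction is smaller in magnitude than the period.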
Summary
- We often assume synchronized clocks
- In practice, quality of synchronization remains relatively poor
- At best, synchronization will be limited by the quality of physical clocks, rates of physical clock drift, and uncertainty in latencies
- Cristian’s probabilistic scheme makes these uncertainties explicit and also works very well