# File

```
Cloud Computing Fundamentals: Distributed Time and Clocks

15058033236
2014.2.21









……


Solar time

  1 sec = 1 day / 86400
  Problem: days differ in length (due to tidal friction, etc.)
  Mean solar second: the second averaged over many days

Greenwich Mean Time (GMT)

  The mean solar time at the Royal Observatory in Greenwich, London
  Greenwich lies at longitude 0, the line that divides east from west


Time (UTC)





1 second = 9,192,631,770 state transitions of a Cesium-133 atom
TAI is the count of such seconds since midnight on Jan 1, 1958
(Cesium-133 transitions divided by 9,192,631,770).
Accuracy: better than 1 second in six million years
Problem: atomic clocks do not keep in step with solar time

UTC: based on atomic time (TAI), introduced on 1 Jan 1972
A leap second is occasionally inserted or deleted to keep UTC in step
with solar time when the accumulated difference between the two
exceeds 800 ms










e.g., 60 or 100 interrupts make up 1 sec

[Figure labels: (PIC), CPU, counter, register]


clock skew



Clocks tick at different rates

  Ordinary quartz clocks drift by ~1 sec in 11-12 days (10^-6 secs/sec).
  High-precision quartz clocks have a drift rate of ~10^-7 or 10^-8 secs/sec.
  Drift creates an ever-widening gap in perceived time.

Skew (offset)

  Difference between two clocks at one point in time

Perfect clock
Drift with a slow computer clock
Drift with a fast computer clock

Dealing with drift

Setting a clock backward is a bad idea

  The illusion of time moving backward can confuse message ordering
  and software development environments

Correct the clock gradually instead:

  If fast: make the clock run slower until it synchronizes
  If slow: make the clock run faster until it synchronizes


function




e.g., if the system generates an interrupt every 17 ms but the
clock is too slow, generate an interrupt every (e.g.) 15 ms instead

Effect on system time: a linear compensating function








int adjtime(const struct timeval *delta, struct timeval *olddelta);

adjtime() makes small adjustments to the system time, advancing
or retarding it by the amount of time specified in the struct
timeval pointed to by delta. olddelta is an output parameter that
returns the time left uncorrected from the previous call to adjtime().

Getting UTC from Top Sources









Environmental Satellites,
http://www.goes.noaa.gov/)


Not practical for every machine
– Cost, size, convenience, environment
Getting UTC for Client Computers

Client computers synchronize their time with a time server




source）
Also called external clock synchronization
Synchronizing Clocks by using RPC

Simplest synchronization technique

  Make an RPC to obtain the time from the server
  Set the local clock to the server time

[Figure: the client asks the server "What’s the time?"; the server replies "10:25:18"]

Cristian’s algorithm
Compensate for network delays (assumed symmetric)
  - client sends a request at T0
  - server replies with its current clock value Tserver
  - client receives the response at T1
  - client sets its clock to: T = Tserver + (T1 - T0) / 2

[Figure: timeline between client and server showing T0, Tserver, and T1]

Cristian’s algorithm: example

  Send request at 5:08:15.100 (T0)
  Receive response at 5:08:15.900 (T1)
  Response contains 5:09:25.300 (Tserver)

  Round-trip time is T1 - T0
    5:08:15.900 - 5:08:15.100 = 800 ms

  Best guess: the timestamp was generated 400 ms ago
  Set the local time to Tserver + round-trip-time/2
    5:09:25.300 + 400 ms = 5:09:25.700

  Accuracy: ± round-trip-time/2

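Cristian’s algorithm: error bound
Tmin: minimum message travel time
The server’s timestamp was generated no earlier than T0 + Tmin
and no later than T1 - Tmin
Accuracy: ± ((T1 - T0) / 2 - Tmin)
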
Problems with Cristian’s algorithm


Server might fail
Subject to malicious interference
Berkeley Algorithm

  Proposed by Gusella & Zatti, 1989, and implemented in the BSD
  version of UNIX
  Aim: keep the clocks of a group of machines as close to each other
  as possible (also called internal synchronization)
  Assumes no machine has an accurate time source (i.e., no
  distinction between client and server)
  Obtains an average from the participating computers
  Synchronizes all clocks to that average

Berkeley Algorithm

1. One machine is elected (or designated) as the master;
   the others are slaves:
     Master polls all slaves periodically, asking for their time
     Cristian’s algorithm can be used to obtain more accurate clock
     values from other machines by accounting for network latency

2. When results are collected, compute the average
     Including the master’s time

3. Send each slave the offset by which its clock needs to be adjusted
     Sending an offset rather than an absolute time avoids the error
     that network delay would add

Berkeley Algorithm

  The algorithm has provisions for ignoring readings from clocks
  whose skew is too large
    Compute a fault-tolerant average
  Any slave can take over as master if the master fails

Berkeley Algorithm: example
[Figure: machines reading 3:00, 3:25, 2:50, and 9:10. Step 3: send offset to each client; the offsets shown include +0:05 and +0:15]


Network Time Protocol (NTP)

NTP is a very widely used Internet time protocol, and it is also highly
accurate (RFC 1305, http://tf.nist.gov/service/its.htm).

NTP timestamps count seconds since 1 Jan 1900 with a resolution of
about 200 picoseconds.
Much NTP client software for PCs gets its time from a single server
(no averaging). Such a client uses SNTP (Simple Network Time
Protocol, RFC 2030), a simplified version of NTP.

NTP synchronization subnet

。。。
NTP goals


(despite message delays)







Survive lengthy losses of connectivity
Redundant paths
Redundant servers

synchronize frequently


Use statistical techniques to filter data and improve quality of
results
Adjustment of clocks by using offset (for symmetric mode)

interference

Authenticate source of data
NTP Synchronization Modes

Multicast (for high-speed LANs, low accuracy)
  server periodically multicasts its time to its clients in the subnet

Remote Procedure Call (medium accuracy)
  server responds to client requests with its actual timestamp
  like Cristian’s algorithm

Symmetric mode (high accuracy)
  used to synchronize between the time servers (peer-to-peer)

All messages are delivered unreliably with UDP

Symmetric mode

The delay between the arrival of a request (at server B) and
the dispatch of the reply is NOT negligible:

[Figure: Server A sends m at T_{i-3}; Server B receives it at T_{i-2} and sends m’ at T_{i-1}; Server A receives m’ at T_i]

Delay = total transmission time of the two messages
  d_i = (T_i - T_{i-3}) - (T_{i-1} - T_{i-2})

Offset of clock B relative to clock A (the correction to add to A’s clock):
  o_i = ((T_{i-2} - T_{i-3}) + (T_{i-1} - T_i)) / 2

Set clock A: T_i + o_i (= T_{i-1} + d_i / 2)

Accuracy bound: d_i / 2

Symmetric mode (another expression)

[Figure: Server A sends m at T_{i-3}; Server B receives it at T_{i-2} and sends m’ at T_{i-1}; m’ takes d_i / 2 to reach Server A at T_i]

Delay = total transmission time of the two messages
  d_i = (T_i - T_{i-3}) - (T_{i-1} - T_{i-2})

Clock A should set its time to the best estimate of B’s time at T_i:
  T_{i-1} + d_i / 2, which is the same as T_i + o_i

Symmetric NTP example

[Figure: Server A sends m at T_{i-3} = 1100; Server B receives it at T_{i-2} = 800 and sends m’ at T_{i-1} = 850; Server A receives m’ at T_i = 1200]

Offset o_i = ((800 - 1100) + (850 - 1200)) / 2 = -325
Set clock A to: T_i + o_i = 1200 - 325 = 875
Note: Server A should not jump back from its current clock value
(1200 ms). It gradually slows its pace until the -325 ms offset is
compensated.

Improving accuracy

Data filtering from a single source
  Retain several of the most recent pairs <o_i, d_i>
  Filter dispersion: choose the o_j corresponding to the smallest d_j

Peer selection: synchronize with lower-stratum servers
  lower stratum numbers, lower synchronization dispersion

The stratum of a server changes dynamically, depending on which
server it synchronizes with

Simple Network Time Protocol (SNTP), RFC 2030

Targeted at machines that have no need for a full NTP
implementation, particularly machines at the end of the
synchronization subnet (client nodes)

SNTP operates in one of the following modes:
  Unicast: the client sends a request to a designated server
  Multicast: the server periodically broadcasts/multicasts its time to
  the subnet and does not serve any requests from clients
  Anycast: the client sends a request to the local subnet and takes the
  first response for time synchronization



Physical clocks cannot be synchronized perfectly
in distributed systems. [Lamport 1978]

Main function of computer clocks: ordering events

If two processes don’t interact, there is no need
to sync their clocks.



Order events with the happened-before (→) relation

a → b
  a could have affected the outcome of b

a || b
  a and b take place in different processes that don’t exchange data
  Their relative ordering does not matter (they are concurrent)

Definition of happened-before
Definition of the “→” relation:
1. If a and b take place in the same process and a comes before b,
   then a → b
2. If a and b take place in different processes and a is a “send”
   and b is the corresponding “receive”, then a → b
3. Transitive: if a → b and b → c, then a → c
Partial ordering: unordered events are concurrent

Logical Clocks

A logical clock is a monotonically increasing
software counter. It need not relate to a
physical clock.


not subtracting
Rule for assigning “time” values to events

  if a → b then clock(a) < clock(b)

Event counting example



Three processes: P0, P1, P2, with events a, b, c, …
Each process keeps a local event counter.
Processes occasionally communicate with each other, which is
where inconsistencies occur, …
Bad ordering: e → h, f → k

Lamport’s algorithm, 1978
Each process Pi has a logical clock Li. Clock synchronization
algorithm:
1. Li is initialized to 0;
2. Update Li:
     LC1: Li is incremented by 1 for each new event that happens in Pi
     LC2: when Pi sends message m, it attaches t = Li to m
     LC3: when Pj receives (m, t), it sets Lj := max{Lj, t}, and then
          applies LC1 to increment Lj for the event receive(m)

Problem: Identical timestamps
Concurrent events (e.g., a, g) may have the same timestamp
Make timestamps unique
Append the process ID (or system ID) to the
clock value after the decimal point:

e.g. if P1, P2 both have L1 = L2 = 40, make L1 =
40.1, L2 = 40.2
Problem: Detecting causal relations



If a → b, then L(a) < L(b); however:
If L(a) < L(b), we cannot conclude that a → b
So Lamport timestamps alone cannot detect causality, which limits
their usefulness in distributed systems.
Example: L(g) < L(c), but g || c

Solution: use vector clocks

Vector of Timestamps
Suppose there is a group of people, and each needs to
keep track of events that happened to the others.
Requirement: given two events, you need to tell whether
they are sequential or concurrent.
Solution: keep a vector of timestamps, one
element for each member.
[Figure: example vectors (3,0,0) and (?,?,?)]

Vector clocks
Each process Pi keeps a clock Vi, which is a vector of N
integers
  - Initialization: for 1 ≤ i ≤ N and 1 ≤ k ≤ N, Vi[k] := 0
  - Update Vi:
      VC1: when there is a new event in Pi, it sets Vi[i] := Vi[i] + 1
      VC2: when Pi sends a message m, it attaches t = Vi to m
      VC3: when Pj receives (m, t), for 1 ≤ k ≤ N it sets
           Vj[k] := max{Vj[k], t[k]}, then applies VC1 to increment
           Vj[j] for the event receive(m, t)
Note: Vi[j] is a timestamp indicating that Pi knows of all events
that happened in Pj up to this time.

Vector timestamps: example
Detecting “→” or “||” events by time vectors

Define
  V = V’ iff V[i] = V’[i] for i = 1, …, N
  V ≤ V’ iff V[i] ≤ V’[i] for i = 1, …, N
  V < V’ iff V ≤ V’ and V ≠ V’

V(e): timestamp vector of an event e
For any two events a and b,
  a → b iff V(a) < V(b), a ≠ b
  a || b iff neither V(a) ≤ V(b) nor V(b) ≤ V(a)

Detecting “→” or “||” events: an example

Summary on vector timestamps




An Application of Timestamp Vectors: causally-ordered multicast
Multicast: a sender sends a message to a group of receivers.
Every message must be received by all group members.
Causally ordered multicast: if m1 → m2, m1 must be received
before m2 by all recipients.
[Figure: example multicast exchange annotated with vector timestamps]

Algorithm of Causally-Ordered Multicast
Each group member keeps a timestamp vector of n components (n group
members), all initialized to 0.
1. When Pi multicasts a message m, it increments the i-th component of
   its time vector Vi and attaches Vi to m.
2. When Pj with Vj receives (m, Vi) from Pi:
     if (Vj[k] ≥ Vi[k] for all k, k ≠ i), then
         Vj[i] := Vi[i];   // Vi[i] is always greater than Vj[i]
         Vj[j] := Vj[j] + 1;
         Deliver m;
     otherwise
         Delay m until the “if” condition is met.
[Figure: example multicast exchange annotated with vector timestamps]

Causal-Order Preserved

If m1 → m2, m1 is received by (delivered to) all recipients before
m2.
If m1 || m2, m1 and m2 can be received in arbitrary order by
recipients.
Totally ordered multicast: in the case of m1 || m2, m1 and m2 must
be received in the same order by all recipients (i.e., either all
m1 before m2, or all m2 before m1).
[Figure: example multicast exchange annotated with vector timestamps]


```
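
The Cristian’s algorithm slides above can be reproduced numerically. The sketch below is only an illustration: the function name `cristian_estimate` and the `t_min` parameter are made up for this example, and the calendar date is a placeholder, because the slide gives times of day only.

```python
from datetime import datetime, timedelta


def cristian_estimate(t0, t1, t_server, t_min=timedelta(0)):
    """Return (new_time, error_bound) for one request/response exchange.

    t0, t1   : client clock readings when the request left and the reply arrived
    t_server : timestamp carried in the server's reply
    t_min    : assumed minimum one-way travel time (hypothetical input)
    """
    round_trip = t1 - t0
    # Assume symmetric delays: the reply was generated half a round trip ago.
    new_time = t_server + round_trip / 2
    # The true server time lies within +/- error_bound of new_time.
    error_bound = round_trip / 2 - t_min
    return new_time, error_bound


# Values from the slide example (the date itself is a placeholder).
t0 = datetime(2014, 2, 21, 5, 8, 15, 100_000)
t1 = datetime(2014, 2, 21, 5, 8, 15, 900_000)
t_server = datetime(2014, 2, 21, 5, 9, 25, 300_000)

new_time, bound = cristian_estimate(t0, t1, t_server)
print(new_time.time(), bound)
```

With the slide’s numbers the round trip is 800 ms, so the client sets its clock to 5:09:25.700 with an accuracy of ±400 ms. Passing a non-zero `t_min` tightens the bound accordingly.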
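
The three Berkeley steps (poll, average, send offsets) can be sketched as below. Several things here are assumptions rather than part of the slides: the function name `berkeley_offsets`, the choice of the 3:00 machine as master, the 60-minute `max_skew` threshold, and the use of a simple "distance from the master" filter as a stand-in for the fault-tolerant average. Network latency compensation is left out.

```python
def berkeley_offsets(master_time, slave_times, max_skew):
    """Return the offset each machine should apply, master first.

    Times are in minutes. Readings further than max_skew from the
    master's clock are excluded from the average, but every machine
    still receives an offset that brings it to the average.
    """
    readings = [master_time] + list(slave_times)
    trusted = [t for t in readings if abs(t - master_time) <= max_skew]
    average = sum(trusted) / len(trusted)
    return [average - t for t in readings]


# Clock readings from the example slide, converted to minutes:
# 3:00 (assumed master), 3:25, 2:50, 9:10.
clocks = [3 * 60 + 0, 3 * 60 + 25, 2 * 60 + 50, 9 * 60 + 10]
offsets = berkeley_offsets(clocks[0], clocks[1:], max_skew=60)
print(offsets)
```

Under these assumptions the 9:10 reading is ignored, the average is 3:05, and the offsets for the 3:00 and 2:50 machines come out as +5 and +15 minutes, which matches the +0:05 and +0:15 labels that survive from the figure.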
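
The symmetric-mode formulas and the "choose the offset with the smallest delay" filter can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the extra `<o, d>` pairs passed to the filter are invented sample data, not values from the slides.

```python
def ntp_offset_delay(t_i3, t_i2, t_i1, t_i):
    """Return (offset, delay) for one message pair m, m'.

    t_i3: A sends m      t_i2: B receives m
    t_i1: B sends m'     t_i : A receives m'
    """
    delay = (t_i - t_i3) - (t_i1 - t_i2)
    offset = ((t_i2 - t_i3) + (t_i1 - t_i)) / 2
    return offset, delay


def best_sample(pairs):
    """Pick the (offset, delay) pair with the smallest delay."""
    return min(pairs, key=lambda pair: pair[1])


# Values from the "Symmetric NTP example" slide.
offset, delay = ntp_offset_delay(1100, 800, 850, 1200)
print(offset, delay, 1200 + offset)

# Hypothetical history of <o, d> pairs; only the first comes from the slide.
history = [(offset, delay), (-310.0, 120), (-330.0, 90)]
print(best_sample(history))
```

Both expressions for the new clock value agree here: `T_i + o_i` and `T_{i-1} + d_i / 2` each give 875.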
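
Rules LC1 to LC3 of Lamport’s algorithm map directly onto a small class. The sketch below is illustrative only. The class and method names are made up, and it treats a send as an event that ticks the clock before the timestamp is attached, which is one common reading of LC1 plus LC2. For unique timestamps it uses a `(clock, process_id)` tuple instead of the decimal form on the slide (40.1, 40.2); that is a substitution, chosen because tuples stay unambiguous when process IDs have more than one digit.

```python
class LamportClock:
    def __init__(self):
        self.time = 0  # Li is initialized to 0

    def tick(self):
        """LC1: advance the clock for a new local event."""
        self.time += 1
        return self.time

    def send(self):
        """LC2: sending is an event; return the timestamp to attach."""
        return self.tick()

    def receive(self, t):
        """LC3: take the max with the sender's timestamp, then apply LC1."""
        self.time = max(self.time, t)
        return self.tick()


p1, p2 = LamportClock(), LamportClock()
p1.tick()            # local event in P1
t = p1.send()        # P1 sends m with timestamp t
p2.tick()            # unrelated local event in P2
r = p2.receive(t)    # P2 receives m
print(t, r)

# Tie-breaking with process IDs: equal clock values still order uniquely.
print((40, 1) < (40, 2))
```

Because `receive` always moves the clock past the attached timestamp, the send event is guaranteed a smaller value than the matching receive event.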
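
Rules VC1 to VC3 and the comparison definitions for vector timestamps can be sketched as follows. The names `VectorClock`, `leq`, `happened_before`, and `concurrent` are hypothetical, and processes are indexed from 0 here, whereas the slides count from 1. As in the Lamport sketch, a send is treated as an event that increments the sender’s own component.

```python
class VectorClock:
    def __init__(self, n, i):
        self.v = [0] * n   # Vi[k] := 0 for every k
        self.i = i         # index of the owning process

    def event(self):
        """VC1: count a new local event."""
        self.v[self.i] += 1
        return tuple(self.v)

    def send(self):
        """VC2: sending is an event; return the vector to attach."""
        return self.event()

    def receive(self, t):
        """VC3: component-wise max with t, then apply VC1."""
        self.v = [max(a, b) for a, b in zip(self.v, t)]
        return self.event()


def leq(v, w):
    return all(a <= b for a, b in zip(v, w))


def happened_before(v, w):
    return leq(v, w) and v != w


def concurrent(v, w):
    return not leq(v, w) and not leq(w, v)


p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
a = p0.event()      # local event in P0
b = p0.send()       # P0 sends a message
c = p1.receive(b)   # P1 receives it
d = p2.event()      # independent event in P2
print(a, b, c, d)
print(happened_before(a, c), concurrent(c, d))
```

Unlike Lamport timestamps, the comparison works in both directions: `a` precedes `c` because every component of `a` is no larger than the matching component of `c`, while `c` and `d` are concurrent because neither vector dominates the other.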