TDDD07 Real-time Systems
Lecture 5: Distributed Systems &
Real-time Communication
Simin Nadjm-Tehrani
Real-time Systems Laboratory
Department of Computer and Information Science
Linköping University
50 pages
Autumn 2015
Undergraduate course on Real-time Systems
Linköping University

Overview: Next three lectures
From one CPU to networked CPUs:
• First, from one CPU to multiple CPUs
– Allocating VMs on multiple CPUs: Cloud
• Next, fully distributed systems
– fundamental issues with timing and order of
events
• Next, hard real-time communication
– Guaranteed message delivery within a
deadline, bandwidth as a resource
• Finally: QoS guarantees instead of
timing guarantees, focus on soft RT
Questions
• Can we temporally order all events in a
distributed system?
– Only if we can timestamp them with a
value from a global (universal) clock
• Can we draw any conclusions if we do
not have a global clock?
– What about a set of local clocks?
– What if no clocks at all?
Time in Distributed Systems
• Physical time vs. Logical time
• Example clock synchronisation algorithm
• Logical clocks
• Vector clocks
Local vs. global clock
• Most physical (local) clocks are not
always accurate
• What is meant by accurate?
– Agreement with UTC
– Coordinated Universal Time (UTC) is in turn
an adjusted time to account for the
discrepancy between time measured based
on the rotation of the earth and International
Atomic Time (TAI)
• An atomic global clock accurately
measures TAI
• If we rely on the values of local clocks, they
need to be synchronised regularly
Clock synchronisation
Two types of algorithms:
• External synchronisation
– Tries to keep the values of a set of
clocks in agreement with an accurate clock,
within a skew of δ
• Internal synchronisation
– Tries to keep a set of clock values
close to each other with a maximum
skew of δ
Lamport/Melliar-Smith Algorithm
• Internal synchronisation of n clocks
• Each clock reads the value of all other
clocks at regular intervals
– If the value of some clock differs from the
reader's own clock by more than δ, that clock
value is replaced by the own clock value
– The average of all clocks is computed
– Own clock value is updated to the average
value
Does it work?
• After each synchronisation interval the
clocks get closer to each other
• If the drifts are within δ, and the clocks
are initially synchronised, then they are
kept within δ from each other
• But what if clocks are faulty? What is
considered a fault?
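The averaging step of the Lamport/Melliar-Smith algorithm can be sketched as follows. This is a minimal illustration, not the lecture's own code: the names sync_round and resync_all, the value δ = 1.0 and the sample clock values are assumptions, and all clocks are read at the same instant (no reading delay).

```python
def sync_round(own, readings, delta):
    """New value of one clock after a resynchronisation round.

    own      -- the value of this clock
    readings -- the values read from all other clocks
    delta    -- readings further than delta from `own` are replaced by `own`
    """
    accepted = [r if abs(r - own) <= delta else own for r in readings]
    # Average over all clocks, the own clock included.
    return (own + sum(accepted)) / (len(accepted) + 1)


def resync_all(clocks, delta):
    """Apply one round to every clock, each using the same snapshot of values."""
    return [sync_round(c, clocks[:i] + clocks[i + 1:], delta)
            for i, c in enumerate(clocks)]


# Three clocks within delta of each other and one clock that is far off.
clocks = [10.0, 10.5, 9.5, 50.0]
after = resync_all(clocks, delta=1.0)
print(after)  # [10.0, 10.125, 9.875, 50.0]
```

In this run the three well-behaved clocks move from a spread of 1.0 to a spread of 0.25, while the reading 50.0 is discarded by each of them and therefore does not pull them away.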
Faulty clocks
• If a clock skew exceeds δ then its value
is eliminated – does not “harm” other
clocks
• What if the skew is exactly δ?
– check it as an exercise!
• What is the worst case?
A two-faced faulty clock k
[Figure: clocks i and j reading a two-faced faulty clock k, which shows a different value to each of them; the value labels (c, c+, c-, c-2) lost their symbols in extraction]
Will be considered as correct by i and j…
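A small numeric experiment can illustrate why a two-faced clock is harmful. This is a sketch under stated assumptions: clock i shows c, clock j shows c - δ, and the faulty clock k shows c + δ to i but c - 2δ to j, so both accept it. These concrete values, the helper names and the choice c = 10, δ = 1 are hypothetical and are not read off the slide figure.

```python
def sync_round(own, readings, delta):
    """Replace readings further than delta from `own`, then average with `own`."""
    accepted = [r if abs(r - own) <= delta else own for r in readings]
    return (own + sum(accepted)) / (len(accepted) + 1)


C, DELTA = 10.0, 1.0


def skew_after_round(extra_good_clocks):
    """Skew between i and j after one round with one two-faced clock k.

    extra_good_clocks -- number of additional non-faulty clocks showing C
    """
    i, j = C, C - DELTA
    others = [C] * extra_good_clocks
    new_i = sync_round(i, [j, *others, C + DELTA], DELTA)      # k's face towards i
    new_j = sync_round(j, [i, *others, C - 2 * DELTA], DELTA)  # k's face towards j
    return abs(new_i - new_j)


print(skew_after_round(0))  # n = 3 clocks, one faulty: 1.0 (no convergence)
print(skew_after_round(1))  # n = 4 clocks, one faulty: 0.75 (skew shrinks)
```

With three clocks the two non-faulty clocks stay a full δ apart, so any further drift pushes them beyond δ. With a fourth, non-faulty clock the skew shrinks again.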
Bound on the faulty clocks
• To guarantee that the set will keep all
non-faulty clocks within δ we need an
assumption on the number of faulty
clocks
• For t faulty clocks the algorithm works if
the number of clocks n > 3t
Time in Distributed Systems
• Physical time vs. Logical time
• Example clock synchronisation algorithm
• Logical clocks
• Vector clocks
Next we look at events…

Logical time
• In the absence of clock synchronisation
we may use order that is intrinsic in an
application
[Figure: message diagram with Client A, Client B and Server exchanging ReqA, RepA and ReqB]
A partial order…
Happened before (→)
• Assume each process has a
monotonically increasing local clock
• Rule 1: if, at the same process, the time for
event x is before the time for event y then x → y
• Rule 2: if x denotes sending a message
and y denotes receiving the same
message then x → y
• Rule 3: → is transitive
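The three rules for "happened before" can be turned into a small computation. This is a sketch only: the event names and the exact interleaving at the server are assumptions loosely modelled on the client/server figure, and happened_before is a hypothetical helper.

```python
from itertools import product


def happened_before(process_orders, messages):
    """Return the set of pairs (x, y) such that x happened before y.

    process_orders -- one list of events per process, in local clock order
    messages       -- (send_event, receive_event) pairs
    """
    hb = set()
    for events in process_orders:
        hb.update(zip(events, events[1:]))  # Rule 1: local order
    hb.update(messages)                     # Rule 2: send before receive
    nodes = {e for pair in hb for e in pair}
    # Rule 3: transitive closure (k is the outermost loop, as in Warshall's algorithm).
    for k, i, j in product(nodes, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb


hb = happened_before(
    process_orders=[
        ["sendReqA", "recvRepA"],              # Client A
        ["sendReqB"],                          # Client B
        ["recvReqA", "sendRepA", "recvReqB"],  # Server (assumed order)
    ],
    messages=[("sendReqA", "recvReqA"),
              ("sendRepA", "recvRepA"),
              ("sendReqB", "recvReqB")],
)
print(("sendReqA", "recvReqB") in hb)  # True, by transitivity
print(("sendReqB", "recvRepA") in hb)  # False
print(("recvRepA", "sendReqB") in hb)  # False: the two events are unordered
```

The last two lines show why the relation is only a partial order: some pairs of events are not related in either direction.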
Lamport’s Logical clocks
Seminal paper from 1978…
• Logical clock: An event counter that
respects the “happened before” ordering
• Partial order: Hence, any events that
are not in the “happened before”
relation are treated as concurrent
Example (1)
What do we know here?
[Figure: space-time diagram of processes P, Q and R with events a, b, c, d, e, f, g and h]
Logical clocks
• Based on event counts at each node
• May reflect causality
• Sending a message always precedes
receiving it
• Messages sent in a sequence by one
node are (potentially) causally related to
each other
– I do not pay for an item if I do not first
check the item’s availability
Implementing logical clocks
LC “time-stamps” each event
• Rule 1: Each time a local event takes
place, increment LC by 1
• Rule 2: Each time a message m is sent
the LC value at the sender is appended
to the message (m_LC)
• Rule 3: Each time a message m is
received set LC to max(LC, m_LC)+1
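The three rules can be sketched as a small class. This is a minimal illustration, assuming that sending a message counts as a local event (so LC is incremented before it is attached as m_LC); the class name, method names and the two-process scenario are hypothetical.

```python
class LamportClock:
    """Logical clock (LC) of a single process."""

    def __init__(self):
        self.lc = 0

    def local_event(self):
        self.lc += 1  # Rule 1: every local event increments LC
        return self.lc

    def send(self):
        self.lc += 1    # assumption: the send itself is a local event
        return self.lc  # Rule 2: this value travels with the message as m_LC

    def receive(self, m_lc):
        self.lc = max(self.lc, m_lc) + 1  # Rule 3
        return self.lc


a, b = LamportClock(), LamportClock()
x1 = a.local_event()  # LC = 1 at process a
x2 = a.local_event()  # LC = 2 at process a
y1 = b.local_event()  # LC = 1 at process b
m = a.send()          # LC = 3 at process a, attached to the message
y2 = b.receive(m)     # LC = max(1, 3) + 1 = 4 at process b
print(x1, x2, y1, m, y2)  # 1 2 1 3 4
```

Note that y1 gets a smaller value than x2 although no message connects the two events, so a smaller LC value alone does not show that one event happened before the other.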
Exercise
• Calculate LC for all events in example
(1)!
What does LC tell us?
• x → y ⇒ LC(x) < LC(y)
• Note that: LC(x) < LC(y) does not imply
x → y
Example (1)
What did we capture by LC?
[Figure: the space-time diagram of Example (1) with processes P, Q and R and events a to h]
Is concurrency transitive?
• e is concurrent with g
• g is concurrent with f
• but e is not concurrent with f!
• Comparing the LC values does not
tell us if two events are “concurrent”
in the sense of →
• Vector clocks do more...
Vector clocks (VC)
• Every node maintains a vector of
counted events (one entry for each
node, including itself)
• VC for event e, VC(e) = [1,…,n], shows
the perceived count of events at nodes
1,…,n
• VC(e)[k] denotes the entry for node k
Implementation of VC
• Rule 1: For each local event increment own
entry
• Rule 2: When sending message m, append to
m the VC(send(m)) as a timestamp T
• Rule 3: When event x is “receiving a message”
at node i,
– increment own entry: VC(x)[i]:= VC(x)[i]+1
– For every entry j in the VC: Set the entry to
max (T[j], VC(x)[j])
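The rules for vector clocks can be sketched in the same style. The class and method names are hypothetical. The message pattern replayed below (a→b, e→f, c→d, g→h, with a and g at P, e, b, c and h at Q, f and d at R) is an assumption inferred from the vector clock values listed for Example (1), since the figure itself is not available here.

```python
class VectorClock:
    """Vector clock of node `me` in a system of n nodes."""

    def __init__(self, n, me):
        self.v = [0] * n
        self.me = me

    def local_event(self):
        self.v[self.me] += 1  # Rule 1
        return list(self.v)

    def send(self):
        self.v[self.me] += 1  # Rule 1: the send is a local event
        return list(self.v)   # Rule 2: copy travels with the message as T

    def receive(self, t):
        self.v[self.me] += 1                             # Rule 3, first step
        self.v = [max(a, b) for a, b in zip(self.v, t)]  # Rule 3, second step
        return list(self.v)


p, q, r = (VectorClock(3, k) for k in range(3))  # nodes P, Q, R
vc = {}
vc["a"] = p.send()
vc["e"] = q.send()
vc["b"] = q.receive(vc["a"])
vc["f"] = r.receive(vc["e"])
vc["c"] = q.send()
vc["g"] = p.send()
vc["d"] = r.receive(vc["c"])
vc["h"] = q.receive(vc["g"])
for name in "abcdefgh":
    print(name, vc[name])
```

Under the assumed message pattern, the printed values agree with the vector clocks given for Example (1), for instance VC(b) = [1, 2, 0] and VC(d) = [1, 3, 2].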
Example (1) revisited
With Vector clocks VC:
[Figure: the space-time diagram of Example (1) with processes P, Q and R, annotated with the vector clocks below]
VC(a) = [1, 0, 0]
VC(g) = [2, 0, 0]
VC(e) = [0, 1, 0]
VC(b) = [1, 2, 0]
VC(c) = [1, 3, 0]
VC(h) = [2, 4, 0]
VC(f) = [0, 1, 1]
VC(d) = [1, 3, 2]
Precedence in VC
• Relation < on vector clocks defined by:
VC(x) < VC(y) iff
– For all i: VC(x)[i] ≤ VC(y)[i]
– For some i: VC(x)[i] < VC(y)[i]
• It follows that event x → event y if
VC(x) < VC(y)
Concurrency and VC
• VC(x) < VC(y) ⇒ x → y
• Recall: LC(x) < LC(y) does not imply x → y
Hence:
• VC(x) < VC(y) iff x → y
• If neither VC(x) < VC(y) nor
VC(y) < VC(x) then x and y are
concurrent
Exercise: Example (2)
[Figure: space-time diagram of processes P, Q and R, each starting with vector clock [0,0,0], with events x, y, p and h]
Pros and cons
• Vector clocks are a simple means of
capturing “known” precedence
• For large systems we have resource
issues (bandwidth wasted), and
maintainability issues
Distributed snapshot
• Vector clocks help to synchronise at
event level
– Consistent snapshots
• But reasoning about response times and
fault management needs quantitative
bounds
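The comparison of vector clocks from the "Precedence in VC" and "Concurrency and VC" slides can be sketched as two small functions. The function names vc_less and concurrent are hypothetical. The vectors are the ones given for Example (1).

```python
def vc_less(x, y):
    """VC(x) < VC(y): no entry larger, and at least one entry strictly smaller."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def concurrent(x, y):
    """Neither vector is smaller than the other."""
    return not vc_less(x, y) and not vc_less(y, x)


VC = {
    "a": [1, 0, 0], "g": [2, 0, 0],
    "e": [0, 1, 0], "b": [1, 2, 0], "c": [1, 3, 0], "h": [2, 4, 0],
    "f": [0, 1, 1], "d": [1, 3, 2],
}

print(concurrent(VC["e"], VC["g"]))  # True
print(concurrent(VC["g"], VC["f"]))  # True
print(concurrent(VC["e"], VC["f"]))  # False
print(vc_less(VC["e"], VC["f"]))     # True: e happened before f
```

The output reproduces the observation that concurrency is not transitive: e is concurrent with g, g is concurrent with f, but e precedes f.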
We will come back
• ... to faults and the impact of time in
distributed systems in next lectures!
Overview: Next three lectures
From one CPU to networked CPUs:
• First, from one CPU to multiple CPUs
– Allocating VMs on multiple CPUs: Cloud
• Next, fully distributed systems
– fundamental issues with timing and order of
events
• Next, hard real-time communication
– Guaranteed message delivery within a
deadline, bandwidth as a resource
• Finally: QoS guarantees instead of
timing guarantees, focus on soft RT
So far…
• We distinguished between reasoning
about message end points (clock
values) and event sequences
From last lectures
• In the scheduling lectures we looked at
single processor hard real-time
scheduling
...
• Next: message latency
• RT communication is about scheduling
the communication medium
...
Fundamental reason
Two interaction models in distributed
systems
• Synchronous model
– Assumes that the rate of computation at
different nodes can be related, and there is
a bound on maximum message exchange
latency
– Can use timers and timeouts
• Asynchronous model
– Has no assumptions on rate of processing in
different nodes, or bounds on message
latency
– Only coordination possible at event level
Real-time message scheduling
• Needed for providing the bound on
maximum message delay
• Essential for reasoning about system
properties under the synchronous model
of distributed systems
– e.g. proof that a service will be
provided despite a single node crash
will need bounds on message delay
RT communication in applications
• Vehicle electronics
– Power train and chassis
– Infotainment/telematics
– Body electronics
• A modern car configuration has over 40
ECUs, distributed over several buses
• Avionics-specific standards
– ARINC 429 (1970s), AFDX (in Airbus A380)
Message constraints
• Message delivery time bound dictated
by application
– So called end-to-end deadlines
• Example: shortly after the driver
brakes, the brake light must be notified
so that it turns on!
New resource
                     Single Node            Distributed
Resource             CPU                    Bandwidth
Scheduled element    Task/process           Message
Demand on resource   WCET & interarrival    Message size & frequency
Performance metric   Deadlines met &        Message delay &
                     Utilisation            Throughput
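By analogy with CPU utilisation, the demand on the bus can be sketched as message size times frequency, summed over all messages and divided by the bandwidth. The function name and the message set below are hypothetical, and frame overhead (headers, stuffing bits, inter-frame gaps) is ignored.

```python
def bus_utilisation(messages, bandwidth_bps):
    """Fraction of the bandwidth demanded by a set of periodic messages.

    messages      -- list of (size_in_bits, frequency_in_hz)
    bandwidth_bps -- bus bandwidth in bits per second
    """
    demand_bps = sum(size * freq for size, freq in messages)
    return demand_bps / bandwidth_bps


# Hypothetical message set on a 1 Mbps bus.
messages = [(128, 100), (64, 500), (128, 1000)]
u = bus_utilisation(messages, bandwidth_bps=1_000_000)
print(f"{u:.4f}")  # 0.1728
```

As with CPU utilisation, a value below 1 is necessary but does not by itself show that every message meets its deadline.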
Two approaches
• We will look at two well-known methods
for bus scheduling
– Event triggered (CAN)
– Time triggered (TTP)
• Used extensively in automotive and
aerospace applications respectively
The CAN bus
• Controller area network that was
developed for use in all cars built in
Europe
• Compulsory for the on-board diagnostics
in USA car models from 2008
• Why? Amount of wires…
– Imagine: 2500 signals, 32 ECUs on
one bus
Predecessor to CAN (1976)
Ethernet:
• Current versions give high bandwidth
but time-wise nondeterministic
• CSMA/CD
– Sense before sending on the medium
(Carrier Sense: CS)
– All nodes broadcast to all (Multiple
Access: MA)
– If collision, back off and resend
(Collision Detection: CD)
Collisions
• Ethernet has high throughput but
temporally nondeterministic
[Figure: timeline in which Node 1 sends while Node 2 and Node 3 wait for sending; Node 2 & 3 then start to send at the same time, which gives a collision]
Backoff
• The period for waiting after a collision
• Each node waits up to two “slot times”
after a collision (random wait)
• If a new collision, the max. backoff
interval is doubled
• After 10 attempts the node stops
doubling
• After 16 attempts declares an error
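The backoff rule can be sketched as follows. This is a minimal illustration of truncated binary exponential backoff as described in the bullets above. The function names are hypothetical, and the window is interpreted as the number of slot times to choose from (2 after the first collision, doubled after each new one).

```python
import random

BACKOFF_LIMIT = 10  # the window stops doubling after 10 collisions
ATTEMPT_LIMIT = 16  # after 16 attempts the node gives up


def backoff_window(collisions):
    """Number of slot-time choices after the given number of collisions."""
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("too many collisions, transmission error declared")
    return 2 ** min(collisions, BACKOFF_LIMIT)


def backoff_slots(collisions, rng=random):
    """Random wait, in slot times, before the next attempt."""
    return rng.randrange(backoff_window(collisions))


for n in (1, 2, 3, 10, 15):
    print(n, backoff_window(n))  # windows: 2, 4, 8, 1024, 1024
```

Because the wait is random, two nodes may pick the same slot again, which is why the resulting delay can only be bounded probabilistically.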
Collisions & non-determinism
• Model the network throughput and
compute probabilistic guarantees that
collisions will not be too often (How often?)
– Theoretical study: With 100Mbps, sending
1000 messages of 128 bytes per second,
there is a 99% probability that there will not
be a delay longer than 1 ms due to collisions
over ~1140 years
[www.rti.com Ethernet study]
• If you cannot measure effects of
collisions, make collision resolution
deterministic!
CAN protocol
• Developed by Bosch and Intel (1986)
• ISO Standard 1993
• Highest bandwidth 1Mbps, ~40m
• CSMA/CR: broadcast to all nodes
• CR: Collision resolution by bit-wise
arbitration plus fixed priorities
(deterministic)
• Bus value is bitwise AND of the sent
messages
Message priority
• The ID of the frame is located at the
beginning
– initial bits that are inserted into the
bus are the ID-bits
– ID also determines the priority of a
frame
– priority of the frame increases as the
ID decreases
Bitwise arbitration
Node 1 sends: 010... ... sends rest of packet
Node 2 sends: 100... ... detects collision first
Node 3 sends: 011... ... detects collision next
• This is how ID for a message (frame)
works as its priority
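The arbitration above can be replayed in a few lines. This is a sketch with a hypothetical function name: the bus value is modelled as the bitwise AND of the bits sent by the nodes still contending, and a node that sends 1 but reads 0 stops sending.

```python
def arbitrate(ids):
    """Bitwise arbitration over equally long ID bit strings (first bit sent first).

    ids -- dict mapping node name to its frame ID as a string of '0' and '1'
    Returns (winners, lost_at), where lost_at maps each losing node to the
    bit position at which it detected the collision.
    """
    contenders = dict(ids)
    lost_at = {}
    width = len(next(iter(ids.values())))
    for bit in range(width):
        # AND of single bits: the bus reads 0 as soon as any node sends 0.
        bus = min(int(frame[bit]) for frame in contenders.values())
        for node in [n for n, f in contenders.items() if int(f[bit]) != bus]:
            lost_at[node] = bit
            del contenders[node]
    return list(contenders), lost_at


winners, lost_at = arbitrate({"Node 1": "010", "Node 2": "100", "Node 3": "011"})
print(winners)  # ['Node 1']
print(lost_at)  # {'Node 2': 0, 'Node 3': 2}
```

Node 2 backs off at the first bit and Node 3 at the third, so the frame with the lowest ID goes through without being destroyed.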
Note
• Two roles for message ID:
– Arbitration via priority
– Every node upon receiving a
message, uses the ID to work out
whether that message is any use to it
or not
Response time analysis
• Fixed priorities means RMS-like worst
case analysis
• Messages are sent non-preemptively!
• Blocking is only possible before the first
bit
• Scheduling analysis: Is every message
delivered before its deadline?
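One way to make this analysis concrete is the classic fixed-priority, non-preemptive response-time recurrence. The version below is a simplified sketch and not necessarily the formulation used in the course. Each message i waits w = B + Σ ⌈(w + τ_bit) / T_j⌉ · C_j over all higher-priority messages j, where B is the longest lower-priority transmission that may already occupy the bus, and its response time is R = w + C_i. Jitter terms are left out, deadlines are assumed equal to periods, and all names and numbers are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def response_times(messages, tau_bit=1):
    """Worst-case response times, all times in bit times.

    messages -- list of (name, C, T) sorted from highest to lowest priority,
                C = transmission time, T = period (also used as deadline)
    Returns a dict name -> response time, or None if the deadline is missed.
    """
    results = {}
    for i, (name, c_i, t_i) in enumerate(messages):
        higher, lower = messages[:i], messages[i + 1:]
        # Blocking: one lower-priority frame may already be on the bus.
        blocking = max((c for _, c, _ in lower), default=0)
        w = blocking
        while True:
            w_next = blocking + sum(ceil_div(w + tau_bit, t) * c
                                    for _, c, t in higher)
            if w_next + c_i > t_i:
                results[name] = None  # deadline (= period) exceeded
                break
            if w_next == w:
                results[name] = w + c_i  # queuing delay plus own transmission
                break
            w = w_next
    return results


rt = response_times([("A", 2, 10), ("B", 3, 15), ("C", 4, 20)])
print(rt)  # {'A': 6, 'B': 9, 'C': 9}
```

The highest-priority message A still sees a delay of 4 bit times before it starts, which is the blocking by a lower-priority frame that cannot be preempted once its first bit is on the bus.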