Overview: Next three lectures TDDD07 Real-time Systems Lecture 4: Distributed Systems

advertisement
Overview: Next three lectures
From one CPU to networked CPUs:
• First, from one CPU to multiple CPUs
TDDD07 Real-time Systems
Lecture 4: Distributed Systems
– Allocating VMs on multiple CPUs: Cloud
• Next, fully distributed systems
– fundamental issues with timing and order of
events
Simin Nadjm-Tehrani
• Next, hard real-time communication
Real-time Systems Laboratory
– Guaranteed message delivery within a
deadline, bandwidth as a resource
Department of Computer and Information Science
Linköping University
• Finally: QoS guarantees instead of
timing guarantees, focus on soft RT
Undergraduate course on Real-time Systems
Linköping University
40 pages
Autumn 2014
Undergraduate course on Real-time Systems
Linköping University
Reading material
Distributed applications
•
•
•
•
• Note specially that from now on we
have articles and e-books from LiU
e-brary
• These are posted for each lecture,
linked from the course web page!
2 of 40
Autumn 2014
Banking systems
On-line access & electronic services
Peer-to-Peer networks
Distributed control
– Cars, Airplanes
• Sensor and ad hoc networks
– Buildings, Env. monitoring
• Mobile/Cloud computing
Undergraduate course on Real-time Systems
Linköping University
3 of 40
Autumn 2014
Undergraduate course on Real-time Systems
Linköping University
Common in all these?
Distributed model of computing:
•
•
•
•
4 of 40
Autumn 2014
Reasons for distribution
• Locality
– Engine control, brake system, gearbox
control, airbag,…
– Clients and servers
Multiple processes
Disjoint address spaces
Inter-process communication
Collectivegoal
goal
Collective
• Organisation of functions/code
– An extension of modularisation, avoiding
single points of failure
• Load sharing
– Web services, search, parallelisation of
heavy computations
Though multicore scheduling
not part of this course!
Undergraduate course on Real-time Systems
Linköping University
5 of 40
Autumn 2014
Undergraduate course on Real-time Systems
Linköping University
6 of 40
Autumn 2014
1
This lecture
(1) Can we guarantee scheduling of tasks
arriving from distributed nodes over a set
of CPUs in the cloud?
(2) What are the fundamental timerelated issues in distributed systems?
– Time, clocks, and ordering of events
– And why faults cannot be ignored...
Datacentre Scheduling
• Tasks are encapsulated in virtual machines
Appi
(VM)
• Arrive from different nodes and their resource
need varies over time
• Goals of scheduling: Allocate VMs to physical
machines (PMs) such that
– No PM is overloaded so that performance of
tasks is not degraded
– No PM is severely underloaded so that
energy is not wasted
• As load changes, decide which VM to migrate!
[Xiao et al. 2013]
Undergraduate course on Real-time Systems
Linköping University
Overview
Adaptation architecture
Predictor
Usher controller
Prober
9 of 40
Autumn 2014
Two mechanisms
• The hotspot solver detects if any PM’s
sum of resource usage is above the hot
threshold
– It will migrate away some VM from that PM
…
PMk
Appn
LNM
App1
PM1
– Coldspot solver
Hypervisor
…
Prober
…
Hypervisor
Undergraduate course on Real-time Systems
Linköping University
10 of 40
Autumn 2014
Estimating resource usage
• Predictor: Based on observation O(t)
and previous estimate E(t-1) they
suggest the Fast Up Slow Down (FUSD)
estimation model:
E(t) = α . E(t-1) + (1 - α). O(t)
• The coldspot solver detects if the PMs
are on average running at a utilisation
below the green computing threshold
– It will migrate away the VMs from some cold
PM to prepare it for shut down
Undergraduate course on Real-time Systems
Linköping University
Coldspot solver
Migration list
– Hotspot solver
• Identify any PM that runs at lower
utilisation than an energy-efficient one
Undergraduate course on Real-time Systems
Linköping University
Hotspot solver
Appm
• Skewness: Notion that describes uneven
utilisation of resources - minimise
skewness over a set of PMs
• To deal with fluctuations it helps to
predict the forthcoming load on each PM
• Identify the candidates for overload
8 of 40
Autumn 2014
App1
7 of 40
Autumn 2014
LNM
Undergraduate course on Real-time Systems
Linköping University
11 of 40
Autumn 2014
where α is a parameter (experimentally) chosen
as -0.2 when O(t) is rising and 0.7 when O(t) is
falling
Undergraduate course on Real-time Systems
Linköping University
12 of 40
Autumn 2014
2
Window of observations
Adaptation architecture
Predictor
Hotspot solver
Coldspot solver
Migration list
Usher controller
Adding a window W of observations and taking the
90th percentile
Undergraduate course on Real-time Systems
Linköping University
13 of 40
Autumn 2014
Skewness
• The skewness for a server p (here a PM)
as a function of the individual resources
ri used within it (here sum of VM
utilisations for a resource ri) and the
average resource utilisation ‾
r
This image cannot currently be display ed.
• A hotspot is a server that has high
temperature
Undergraduate course on Real-time Systems
Linköping University
15 of 40
Autumn 2014
Adaptation architecture
Predictor
Hotspot solver
Coldspot solver
Migration list
Usher controller
Prober
Hypervisor
Undergraduate course on Real-time Systems
Linköping University
…
Prober
…
Appm
App1
LNM
…
PMk
Appn
App1
LNM
PM1
Hypervisor
17 of 40
Autumn 2014
Prober
Hypervisor
Undergraduate course on Real-time Systems
Linköping University
…
Prober
…
Appm
LNM
…
App1
PMk
Appn
App1
LNM
PM1
Hypervisor
14 of 40
Autumn 2014
Load adaptation algorithm
• Order the PMs, choose a PM with highest
temperature first
• Choose one VM on that PM that would reduce
PM’s temperature (if migrated away)
• If many such VMs, choose the one that
reduces skewness most
• Find a PM that can accept this VM and not
become a hot spot
• If many such PMs, choose the one that reduces
its skewness most after accepting the VM
• If no such PM can be found proceed to the next
VM on the hot PM (and eventually to next PM)
Undergraduate course on Real-time Systems
Linköping University
16 of 40
Autumn 2014
Green computing adaptation
• A PM is a coldspot if the sum of the PM’s
utilisations for different resources is below a
given cold threshold
• Choose a green computing (GC) threshold as
an indication that a PM is underloaded
• The algorithm is invoked when average PM
resource utilisation is below the GC threshold
• Start with the PM that has the lowest
utilisation for memory below the cold threshold
• Try to migrate all its VMs to another PM that
will not be hot/cold afterwards (will stay below
a warm threshold) to avoid future hotspots
Undergraduate course on Real-time Systems
Linköping University
18 of 40
Autumn 2014
3
Does adaptation work?
• A first attempt to study load adaptation
vs. energy optimisation
• Clearly shows how scheduling is related
to load control
• Note that there are no performance
guarantees (live migration is expected
to somewhat increase response time)
• Check the scalability arguments!
• Mass scale data as input: web access, online learning
and store data traces from China
Undergraduate course on Real-time Systems
Linköping University
19 of 40
Autumn 2014
This lecture
(1) Can we guarantee scheduling of tasks
arriving from distributed nodes over a set
of CPUs in the cloud?
(2) What are the fundamental timerelated issues in distributed systems?
– Time, clocks, and ordering of events
– And why faults cannot be ignored...
Undergraduate course on Real-time Systems
Linköping University
Summary: Sharing the load
21 of 40
Autumn 2014
Undergraduate course on Real-time Systems
Linköping University
Recall from lecture 1…
Event sequence analysis: From the report on the
2003 electricity blackout in the USA-Canada:
A valuable lesson from the ... blackout is the
importance of having time-synchronized
system data recorders. The Task Force’s
investigators labored over thousands of data
items to determine the sequence of events
much like putting together small pieces of a
very large puzzle. That process would have
been significantly faster and easier if there had
been wider use of synchronized data recording
services.
Undergraduate course on Real-time Systems
Linköping University
Other application examples
From Kopetz (1997):
• Consider a nuclear reactor equipped
with many sensors that monitor
different RT entities (e.g. Values of
pressures, flows in various pipes). If a
pipe ruptures, a number of RT entities
will deviate outside their normal
operating ranges. When an RT entity
enters its alarm region an alarm event is
signalled to the operator.
Undergraduate course on Real-time Systems
Linköping University
23 of 40
Autumn 2014
20 of 40
Autumn 2014
22 of 40
Autumn 2014
Different views
The (global) system view:
• First, the pressure in the ruptured pipe
changes abruptly
• Then the flow changes causing many other RT
entities to react
• These in turn generate own alarms
The operator view:
• Operator sees a set of (correlated) alarms,
called an “alarm shower”
• Operator wants to identify the primary event
Undergraduate course on Real-time Systems
Linköping University
24 of 40
Autumn 2014
4
Event detection
• The computer system must assist the
operator to detect the primary event
that triggers the alarm shower
Causality
• If event e occurs after event e’ then e
cannot be the cause of e’
• If event e occurs before event e’ then e
can be the cause of e’ (but need not be)
• Knowledge of exact temporal order of
the events is helpful in identifying the
primary event
• Temporal order is necessary but not
sufficient for causal order
Time vs. Events
Undergraduate course on Real-time Systems
Linköping University
25 of 40
Autumn 2014
Other contexts: Time matters
Banking and finance:
• The rate of interest is applied to funds
26 of 40
Autumn 2014
Safety-critical systems
• Inaccurate local clocks can be a problem
if the result of computations at different
nodes depend on time
– In a server at a given point in time
– to a balance that reflects transactions by
several clients prior to that point
• The gain/loss on sales of stocks is
dependent on dynamic values of stocks
at a given time (the time of sale and
purchase)
Undergraduate course on Real-time Systems
Linköping University
Undergraduate course on Real-time Systems
Linköping University
27 of 40
Autumn 2014
– Calculation of trajectories: if a missile was
at a given position at a time point before a
computation, where will it be after the
computation?
– If the break signal is issued separately in
different wheels, will the car stop and when?
Undergraduate course on Real-time Systems
Linköping University
Brake-by-wire
28 of 40
Autumn 2014
This lecture
(1) Can we guarantee scheduling of tasks
arriving from distributed nodes over a set
of CPUs in the cloud?
(2) What are the fundamental timerelated issues in distributed systems?
– Time, clocks, and ordering of events
– And why faults cannot be ignored...
Undergraduate course on Real-time Systems
Linköping University
29 of 40
Autumn 2014
Undergraduate course on Real-time Systems
Linköping University
30 of 40
Autumn 2014
5
Questions
• Can we temporally order all events in a
distributed system?
– Only if we can timestamp them with a
value from a global (universal) clock
• Can we draw any conclusions if we do
not have a global clock?
– What about a set of local clocks?
– What if no clocks at all?
Undergraduate course on Real-time Systems
Linköping University
31 of 40
Autumn 2014
6
Download