Outline TDDB47 Real-time Systems Lecture 4: Scheduling(II) cont’d & Dependability

TDDB47 Real-time Systems
Lecture 4: Scheduling(II) cont’d
& Dependability
Simin Nadjm-Tehrani
Real-time Systems Laboratory
Department of Computer and Information Science
Linköping University
Undergraduate course on Real-time Systems
Linköping University
36 pages
Autumn 2005
Part 1:
• Continuing with ICP and Deadlocks
• Towards more dynamic scheduling:
– Offloading non-critical tasks to other
processors (distributed scheduling)
Part 2:
• Dependable Systems
Reading material (part 1)
• Chapter 13 of Burns & Wellings, in
particular 13.11.
• Background reading on deadlocks
• Article by Ramamritham, Stankovic, and
Zhao, IEEE Transactions on Computers,
Volume 38(8), August 1989. For the
evaluation results, concentrate on
Section V.B.
Deadlock prevention
• Prevention technique
– allocate all necessary resources at
once, before execution
– Drawbacks?
... ...
– Recall the new problem created with
the dining philosopher problem…
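The prevention technique above can be sketched as an all-or-nothing acquisition. This is only an illustration under stated assumptions: the helper names `acquire_all` and `release_all` and the two "fork" locks are hypothetical, and the sketch backs off instead of waiting when a resource is busy. It also hints at the drawbacks: resources are held longer than needed, and a task that keeps backing off may never get all its resources.

```python
import threading

def acquire_all(locks):
    """Try to take every lock; on any failure release what was taken."""
    taken = []
    for lock in locks:
        if lock.acquire(blocking=False):
            taken.append(lock)
        else:
            # Back off completely so no task ever holds one resource while waiting.
            for held in reversed(taken):
                held.release()
            return False
    return True

def release_all(locks):
    """Release the locks in reverse order of acquisition."""
    for lock in reversed(locks):
        lock.release()

# Hypothetical example: one philosopher needs both forks before eating.
fork_a, fork_b = threading.Lock(), threading.Lock()
got_both = acquire_all([fork_a, fork_b])
if got_both:
    release_all([fork_a, fork_b])
```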
Starvation/lockout happens
if some process never gets hold of
the resources it needs despite the
fact that the resources are not
constantly engaged
What is meant by liveness?
• Liveness, i.e. absence of
deadlock, starvation and livelock, is
necessary in real-time systems
• If this can be guaranteed then the
system is live: intuitively, the
good things that should happen will
happen sooner or later
Now back to scheduling...
• Immediate ceiling protocol (ICP) is
deadlock preventing
• But not sufficient...
ICP & Deadlock
• The ICP prevents deadlocks (How?)
• Moreover, it prevents starvation (How?)
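One way to approach the "How?" is to look at the priority rule itself. The sketch below assumes a single processor, fixed priorities where a larger number means more urgent, and two hypothetical tasks sharing resources R1 and R2; the names `Task`, `ceilings` and `active_priority` are illustrative only. A task that locks a resource is raised at once to that resource's ceiling, so no other user of the resource can preempt it and end up holding one resource while waiting for another.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    base_priority: int          # assumption: larger number = more urgent
    uses: frozenset             # names of the resources the task may lock
    held: list = field(default_factory=list)

def ceilings(tasks):
    """Ceiling of a resource = highest base priority among the tasks using it."""
    result = {}
    for task in tasks:
        for res in task.uses:
            result[res] = max(result.get(res, 0), task.base_priority)
    return result

def active_priority(task, ceiling):
    """Under ICP a task runs at the max of its base priority and held ceilings."""
    return max([task.base_priority] + [ceiling[r] for r in task.held])

low = Task("low", 1, frozenset({"R1", "R2"}))
high = Task("high", 3, frozenset({"R1", "R2"}))
ceiling = ceilings([low, high])

low.held.append("R1")           # low locks R1 and is raised immediately
# high may only preempt if its priority is strictly above low's active priority.
can_preempt = high.base_priority > active_priority(low, ceiling)
```

Here `can_preempt` is false: while `low` holds R1, `high` cannot start and lock R2, so the circular wait needed for a deadlock cannot form.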
Distributed Scheduling
• Relax the restriction on a-priori fixed task sets
and arrival times
• Consider CPU and all other resources a task
might need to complete
• Can a task be offloaded from the processor it
arrives at?
• Task T arrives at node Ni
• If Ni cannot guarantee T meeting its
deadline, it will ask some nodes to bid
for running T
[Figure: Ni asks nodes with sufficient surplus (according to Ni) to bid; one node returns a successful bid]
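The bidding step can be sketched as follows. This is a minimal illustration, not the algorithm from the article: the surplus figures, the node names and the rule "largest surplus wins" are all assumptions, as are the helper names `request_bids` and `choose_bidder`.

```python
def request_bids(task_demand, surplus_by_node):
    """Return the bids of nodes whose reported surplus covers the task."""
    return {node: s for node, s in surplus_by_node.items() if s >= task_demand}

def choose_bidder(bids):
    """Pick the node with the largest surplus, or None if nobody bid."""
    if not bids:
        return None
    return max(bids, key=bids.get)

# Hypothetical surplus values, as fractions of spare processor capacity.
surplus = {"N1": 0.10, "N2": 0.45, "N3": 0.30}
bids = request_bids(0.25, surplus)
winner = choose_bidder(bids)
```

If no node bids, `choose_bidder` returns `None` and T cannot be guaranteed anywhere.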
Focused addressing
• Task T arrives at node Ni
• If Ni cannot guarantee T, it looks for a
node that has a surplus resource level
above a fixed limit (Focused Addressing
Surplus, FAS)
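Focused addressing can be sketched in the same style. The threshold value, the surplus estimates and the function name `focused_node` are hypothetical; the only rule taken from the slide is that the chosen node must have a surplus above FAS.

```python
FAS = 0.40   # hypothetical value of the Focused Addressing Surplus limit

def focused_node(estimated_surplus, fas=FAS):
    """Return a node whose estimated surplus exceeds FAS, else None."""
    candidates = {n: s for n, s in estimated_surplus.items() if s > fas}
    if not candidates:
        return None
    # Assumption: among several candidates, prefer the largest surplus.
    return max(candidates, key=candidates.get)

# Ni's own estimates of the other nodes' surplus (hypothetical numbers).
estimates = {"N1": 0.10, "N2": 0.45, "N3": 0.30}
target = focused_node(estimates)
```

Unlike bidding, Ni decides from its own estimates without waiting for replies, so the estimates may be out of date when T arrives at the target.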
[Figure: focused addressing, target node has Surplus > FAS]
A combined approach
• None of the earlier methods guarantees
successful scheduling
• Combine heuristics to increase chances
[Figure: combined approach, with labels "Successful bid" and "Surplus > FAS"]
Reading material (part 2)
• Chapter 5 of Burns & Wellings
• Dependable Systems: IFIP terminology
as described in [Avizienis et al. 2004]
• How can we produce systems that do
their job, and how to measure how well
they do their jobs?
• How do things go wrong and why?
• What can we do about it?
– This lecture: Basic overview of fault-tolerant systems
– Next lecture: Designing Dependable
Real-time systems
Early computer systems
• 1944: Real-time computer system in the
Whirlwind project at MIT, used in a
military air traffic control system in 1951
• Short life of vacuum tubes gave a mean
time to failure of 20 minutes
Early space and avionics
• During 1955, 18 air carrier accidents in
the USA (when only 20% of the public
was willing to fly!)
• 1970: Apollo 13 had less computing
power on board than a PC produced ten
years later
June 1985 to January 1987
• Six patients in the USA and Canada received
very high doses of radiation and
severe burns from the cancer-treatment system Therac-25.
• Doses as high as 15,000-20,000
radiation units, compared with
normal levels (~200 units), had
been given. Three died.
3rd February 1994
• TCAS is a system designed to
avoid mid-air collisions between
passenger planes.
• Two commercial aircraft came as
close as 1.6 km to each other
while flying over Oregon in the USA.
• ”Friendly Fire”: during the Gulf
War, 24% of the American soldiers killed (35
of 146) were killed by their own systems.
What is dependability?
Property of a computing system which
allows reliance to be justifiably placed on
the service it delivers.
[Avizienis et al.]
Attributes of dependability [Sv. Pålitlighet]
IFIP WG 10.4 definitions:
• Safety: non-occurrence of catastrophic
consequences on the environment
• Availability: the readiness for usage
• Integrity: non-occurrence of
unauthorized alteration of information
• Reliability: continuity of correct service
Reliability [Sv. Tillförlitlighet]
Means that the system (functionally)
behaves as specified, and does it
continually over measured intervals of time
Typical measure in aerospace: 10^-9,
i.e. one failure in 10^9 flight hours.
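To get a feeling for the aerospace figure, a short worked example may help. It assumes a constant failure rate (the exponential reliability model), which the slide does not state, and a hypothetical 10-hour flight; R(t) denotes the probability of no failure during t hours.

```latex
\begin{align*}
\lambda &= 10^{-9}\ \text{failures per flight hour} \\
R(t) &= e^{-\lambda t} \\
1 - R(10) &= 1 - e^{-10^{-8}} \approx 10^{-8} \\
\text{MTTF} &= \frac{1}{\lambda} = 10^{9}\ \text{hours}
\end{align*}
```

Under these assumptions the chance of a failure during one such flight is about one in a hundred million.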
Faults, Errors & Failures
• Fault: a defect within the system or a
situation that can lead to failure
• Error: manifestation (symptom) of the
fault - an unexpected behaviour
• Failure: system not performing its
intended function
Examples:
• Year 2000 bug
• Bit flips in hardware due to cosmic
radiation in space
• Loose wire
• Aircraft retracting its landing gear while
on the ground
Effects in time:
Permanent / transient / intermittent
Fault ⇒ Error ⇒ Failure
• Goal of system verification and
validation is to eliminate faults
• Goal of safety/risk analysis is to focus
on important faults
• Goal of fault tolerance is to reduce
effects of errors if they appear: eliminate or delay failures
More on dependability
Four approaches [IFIP 10.4]:
Some will
Fault tolerance
• Means that a system provides a
degraded (but acceptable) function
– Even in presence of faults
– During a period defined by certain
model assumptions
• Foreseen or unforeseen?
External factors
The film…
Types of failures
• Node failures
– Crash
– Omission
– Byzantine
• Channel failures
– Crash (and potential partitions)
– Message loss
– Erroneous/arbitrary messages
On-line fault-management
• Fault detection
– By program or its environment
• Fault tolerance (containment) using
– software
– hardware
– data
Static Redundancy
Used in all cases (whether an error has
appeared or not), just in case…
– SW: N-version programming
– HW: Voting systems
– Data: parity bits, checksums
From D. Lardner: Edinburgh Review, year 1834:
”The most certain and effectual check upon errors
which arise in the process of computation is to
cause the same computations to be made by
separate and independent computers*; and this
check is rendered still more decisive if their
computations are carried out by different methods.”
* people who compute
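Two of the static techniques are small enough to sketch. The functions below are illustrations only: `majority_vote` stands in for a voter over the outputs of N independently developed versions (or N hardware modules), and `parity_bit` computes an even-parity bit. The names and the example values are assumptions.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of versions, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def parity_bit(bits):
    """Even parity: the extra bit makes the total number of 1s even."""
    return sum(bits) % 2

# Three hypothetical versions, one of which delivers a wrong result.
voted = majority_vote([42, 42, 41])
# A data word protected by one even-parity bit.
check = parity_bit([1, 0, 1, 1])
```

The voter masks the single wrong version without ever finding out which one failed, which is what "used in all cases" means here.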
Dynamic Redundancy
Used when error appears and has to be
– SW: Recovery methods
– HW: Switching to back-up module
– Data: Self-correcting codes
– Time: Re-computing a result
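A rough sketch of dynamic redundancy, combining re-computation (time) with switching to a back-up module, could look as follows. The function `run_with_recovery`, the acceptance test and the two sensor routines are hypothetical; error detection is reduced to "an exception was raised or the acceptance test failed".

```python
def run_with_recovery(primary, backup, acceptance_test, retries=1):
    """Try primary (with re-computation), then switch to backup on detected error."""
    for module in [primary] * (1 + retries) + [backup]:
        try:
            result = module()
        except Exception:
            continue    # a raised exception counts as a detected error
        if acceptance_test(result):
            return result
    raise RuntimeError("no module produced an acceptable result")

calls = []

def faulty_sensor():
    """Hypothetical primary module that keeps delivering an invalid reading."""
    calls.append("primary")
    return -1.0

def backup_sensor():
    """Hypothetical back-up module."""
    calls.append("backup")
    return 4.0

# Acceptance test (assumption): a valid reading is never negative.
reading = run_with_recovery(faulty_sensor, backup_sensor, lambda v: v >= 0)
```

In contrast to static redundancy, the back-up is only executed after an error has been detected, so the extra cost is paid in time when things go wrong.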