TDDB47 Real-time Systems
Lecture 4: Scheduling(II) cont’d & Dependability
Simin Nadjm-Tehrani
Real-time Systems Laboratory
Department of Computer and Information Science
Linköping University
Undergraduate course on Real-time Systems
Linköping University
36 pages
Autumn 2005
Outline
Part 1:
• Continuing with ICP and Deadlocks
• Towards more dynamic scheduling:
– Offloading non-critical tasks to other
processors (distributed scheduling)
Part 2:
• Dependable Systems
Reading material (part 1)
• Chapter 13 of Burns & Wellings, in particular 13.11.
• Background reading on deadlocks
• Article by Ramamritham, Stankovic, and Zhao, IEEE Transactions on Computers, Volume 38(8), August 1989. For the evaluation results, concentrate on section V.B.
ICP & Deadlock
• The ICP prevents deadlocks (How?)
• Moreover, it prevents starvation (How?)
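One way to approach the "How?" for deadlocks on a single processor is the minimal sketch below. It assumes that a larger number means higher priority and that each resource's ceiling is the highest base priority of any task that uses it. The names Task, Resource, lock, unlock and can_preempt are illustrative only and are not taken from Burns & Wellings.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    ceiling: int  # assumed: highest base priority of any task using the resource


@dataclass
class Task:
    name: str
    base_priority: int  # assumed: larger number = higher priority
    held: list = field(default_factory=list)

    @property
    def active_priority(self) -> int:
        # ICP: a task runs at the maximum of its own priority and the
        # ceilings of all resources it currently holds.
        return max([self.base_priority] + [r.ceiling for r in self.held])


def lock(task: Task, resource: Resource) -> None:
    # The priority is raised immediately, at lock time.
    task.held.append(resource)


def unlock(task: Task, resource: Resource) -> None:
    task.held.remove(resource)


def can_preempt(running: Task, ready: Task) -> bool:
    # A ready task only starts if it beats the running task's active priority.
    return ready.base_priority > running.active_priority


low, high = Task("low", 1), Task("high", 2)
r1, r2 = Resource("R1", ceiling=2), Resource("R2", ceiling=2)  # both used by both tasks

before = can_preempt(low, high)   # True: low holds nothing yet
lock(low, r1)
during = can_preempt(low, high)   # False: low now runs at the ceiling of R1
unlock(low, r1)
after = can_preempt(low, high)    # True again once R1 is released
```

Because the low-priority task runs at the ceiling as soon as it holds R1, the high-priority task cannot start and take R2 in between, so the circular wait needed for a deadlock cannot form in this scenario.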
Deadlock prevention
• Prevention technique
– allocate all necessary resources at once, before execution
– Drawbacks?
– Recall the new problem created with the dining philosopher problem…
Starvation
Starvation/lockout happens if some process never gets hold of the resources it needs despite the fact that the resources are not constantly engaged
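The prevention technique from the deadlock prevention slide (allocate all necessary resources at once, before execution) can be sketched as follows. This is an illustrative sketch, not code from the course: ResourceManager, acquire_all and release_all are hypothetical names, and Python's threading.Condition is used only as a convenient way to make the all-or-nothing grant atomic.

```python
import threading


class ResourceManager:
    """Grants a task every resource it asks for in one step, or none of them."""

    def __init__(self, names):
        self._free = set(names)
        self._cond = threading.Condition()

    def acquire_all(self, needed, timeout=None):
        needed = set(needed)
        with self._cond:
            # Wait until the whole set is free; never hold a partial set.
            granted = self._cond.wait_for(lambda: needed <= self._free, timeout)
            if not granted:
                return False
            self._free -= needed
            return True

    def release_all(self, held):
        with self._cond:
            self._free |= set(held)
            self._cond.notify_all()


forks = ResourceManager(["fork1", "fork2", "fork3"])
got_first = forks.acquire_all({"fork1", "fork2"})
# fork2 is taken, so this request is refused as a whole and fork3 stays free.
got_second = forks.acquire_all({"fork2", "fork3"}, timeout=0.01)
forks.release_all({"fork1", "fork2"})
got_retry = forks.acquire_all({"fork2", "fork3"})
```

Since no task ever waits while holding something, hold-and-wait cannot occur. The price is the drawback hinted at above: resources are reserved longer than needed, and a task that needs several popular resources may starve while others keep taking subsets of them.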
What is meant by liveness?
• Liveness, i.e. absence of deadlock, starvation and livelock, is necessary in real-time systems
• If this can be guaranteed then the system is live – intuitively, the good things that should happen will happen sooner or later
Now back to scheduling...
• Immediate ceiling protocol (ICP) is deadlock preventing
...
• But not sufficient...
...
Distributed Scheduling
• Relax the restriction on a-priori fixed task sets
and arrival times
• Consider CPU and all other resources a task
might need to complete
• Can a task be offloaded from the processor it
arrives at?
Bidding
• Task T arrives at node Ni
• If Ni cannot guarantee T meeting its deadline, it will ask some nodes to bid for running T
[Figure labels: T, Nk, "Successful bid", "Nodes with sufficient surplus according to Ni knowledge"]
Focused addressing
• Task T arrives at node Ni
• If Ni cannot guarantee T, it looks for a node that has a surplus resource level above a fixed limit (Focused Addressing Surplus, FAS)
[Figure labels: Ni, Nj, T, "Surplus > FAS"]
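The two heuristics above, bidding and focused addressing, can be sketched in a few lines. This is a simplified illustration and not the algorithm as specified by Ramamritham, Stankovic and Zhao: it assumes each node's surplus is a single number known to Ni, that a task is described only by a scalar demand, and that the two heuristics are tried one after the other. Node, can_guarantee and place_task are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    surplus: float  # assumed: spare capacity as currently known to the sender


def can_guarantee(node: Node, demand: float) -> bool:
    # Stand-in for a real guarantee test (deadline and resource check).
    return node.surplus >= demand


def place_task(local: Node, others: list, demand: float, fas: float):
    """Return (chosen node name or None, method used)."""
    if can_guarantee(local, demand):
        return local.name, "local"
    # Focused addressing: pick a node whose surplus exceeds the fixed FAS limit.
    # In a real system that node must still run its own guarantee test.
    focused = [n for n in others if n.surplus > fas]
    if focused:
        return max(focused, key=lambda n: n.surplus).name, "focused"
    # Bidding: nodes believed to have sufficient surplus are asked to bid;
    # here the bid is simply the surplus and the highest bid wins.
    bidders = [n for n in others if can_guarantee(n, demand)]
    if bidders:
        return max(bidders, key=lambda n: n.surplus).name, "bidding"
    return None, "rejected"


ni = Node("Ni", surplus=1.0)
others = [Node("Nj", surplus=4.0), Node("Nk", surplus=3.0)]
by_focus = place_task(ni, others, demand=2.0, fas=3.5)   # Nj exceeds FAS
by_bid = place_task(ni, others, demand=2.0, fas=5.0)     # nobody exceeds FAS
```

The "rejected" outcome is deliberate: neither heuristic can promise that some node will accept the task.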
A combined approach
• None of the earlier methods guarantees successful scheduling
• Combine heuristics to increase chances
[Figure labels: T, "Successful bid", "Surplus > FAS"]
Reading material (part 2)
• Chapter 5 of Burns & Wellings
• Dependable Systems: IFIP terminology as described in [Avizienis et al. 2004]
Dependability
• How can we produce systems that do
their job, and how to measure how well
they do their jobs?
• How do things go wrong and why?
• What can we do about it?
– This lecture: Basic overview of fault-tolerant systems
– Next lecture: Designing Dependable Real-time systems
Early computer systems
• 1944: Real-time computer system in the
Whirlwind project at MIT, used in a
military air traffic control system 1951
• Short life of vacuum tubes gave mean time to failure of 20 minutes
Early space and avionics
• During 1955, 18 air carrier accidents in
the USA (when only 20% of the public
was willing to fly!)
• 1970: Apollo 13 had less computing
power on board than a PC produced ten
years later
Still, mishaps/accidents not unusual!
June 1985 – January 1987
• Six patients in USA and Canada got very high doses of radiation and severe burns from the cancer-treatment system Therac-25.
• Doses as high as 15,000-20,000 radiation units compared with the normal levels (~ 200 units) had been given. Three died.
3rd February 1994
• TCAS is a system designed to
avoid mid-air collisions between
passenger planes.
• Two commercial aircraft came as close as 1.6 km to each other while flying over Oregon in USA.
• "Friendly Fire": during the Gulf war, 24% of the American soldiers killed (35 of 146) were killed by their own systems.
What is dependability?
Property of a computing system which allows reliance to be justifiably placed on the service it delivers.
[Avizienis et al.]
Attributes of dependability
[Sv. Pålitlighet]
IFIP WG 10.4 definitions:
• Safety: non-occurrence of catastrophic consequences on the environment
• Availability: the readiness for usage
• Integrity: non-occurrence of unauthorized alteration of information
• Reliability: continuity of correct service
Reliability
[Sv. Tillförlitlighet]
Means that the system (functionally) behaves as specified, and does it continually over measured intervals of time.
Typical measure in aerospace: 10^-9, i.e. one failure in 10^9 flight hours.
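The aerospace figure can be turned into concrete numbers under one extra assumption that the slides do not state: a constant failure rate λ, so that reliability over t hours is R(t) = exp(-λt) and the mean time to failure is 1/λ. The function names below are illustrative, and the 10-hour flight is an arbitrary example duration.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    # Probability of no failure during `hours`, assuming a constant failure rate.
    return math.exp(-failure_rate_per_hour * hours)


def failure_probability(failure_rate_per_hour: float, hours: float) -> float:
    # 1 - R(t), computed with expm1 to stay accurate for very small rates.
    return -math.expm1(-failure_rate_per_hour * hours)


# Aerospace target from the slide: one failure in 10^9 flight hours.
aero_rate = 1e-9
p_fail_10h_flight = failure_probability(aero_rate, 10.0)  # roughly 1e-8

# Whirlwind figure from the earlier slide: MTTF of 20 minutes = 1/3 hour,
# which under the same assumption corresponds to 3 failures per hour.
whirlwind_rate = 3.0
p_survive_one_hour = reliability(whirlwind_rate, 1.0)     # roughly 0.05
```

Under this model the two systems differ by more than nine orders of magnitude in failure rate, which is the gap that the fault-tolerance techniques in the rest of the lecture are meant to help close.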
Faults, Errors & Failures
• Fault: a defect within the system or a
situation that can lead to failure
• Error: manifestation (symptom) of the
fault - an unexpected behaviour
• Failure: system not performing its
intended function
Examples
• Year 2000 bug
• Bit flips in hardware due to cosmic
radiation in space
• Loose wire
• Aircraft retracting its landing gear while on ground
Effects in time:
Permanent / transient / intermittent
Fault ⇒ Error ⇒ Failure
• Goal of system verification and validation is to eliminate faults
Some will remain…
• Goal of safety/risk analysis is to focus on important faults
• Goal of fault tolerance is to reduce effects of errors if they appear: eliminate or delay failures
More on dependability
Four approaches [IFIP 10.4]:
1. Fault avoidance
2. Fault removal
3. Fault tolerance
4. Fault forecasting
Fault tolerance
• Means that a system provides a degraded (but acceptable) function
– Even in presence of faults
– During a period defined by certain model assumptions
External factors
The film…
• Foreseen or unforeseen?
Types of failures
• Node failures
– Crash
– Omission
– Byzantine
• Channel failures
– Crash (and potential partitions)
– Message loss
– Erroneous/arbitrary messages
On-line fault-management
• Fault detection
– By program or its environment
• Fault tolerance (containment) using
redundancy
– software
– hardware
– data
Redundancy
From D. Lardner: Edinburgh Review, year 1834:
"The most certain and effectual check upon errors which arise in the process of computation is to cause the same computations to be made by separate and independent computers*; and this check is rendered still more decisive if their computations are carried out by different methods."
* people who compute
Static Redundancy
Used in all cases (whether an error has appeared or not), just in case…
– SW: N-version programming
– HW: Voting systems
– Data: parity bits, checksums
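Two of the static redundancy techniques listed above, voting and parity bits, are small enough to sketch directly. The sketch assumes replica outputs can be compared for equality and that a strict majority is required; majority_vote, even_parity_bit and parity_ok are illustrative names.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas, else raise."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replica outputs")
    return value


def even_parity_bit(bits):
    # Extra bit chosen so that the total number of 1s becomes even.
    return sum(bits) % 2


def parity_ok(bits, parity):
    return (sum(bits) + parity) % 2 == 0


# Three replicas (or three program versions), one of which is faulty.
voted = majority_vote([42, 42, 41])

word = [1, 0, 1, 1]
p = even_parity_bit(word)
intact = parity_ok(word, p)
corrupted = parity_ok([1, 0, 0, 1], p)  # a single flipped bit is detected
```

The voter masks the faulty replica without ever knowing which one failed, which is why this kind of redundancy is active in all cases. A single parity bit only detects an odd number of flipped bits and cannot say which bit is wrong.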
Dynamic Redundancy
Used when error appears and has to be
treated
– SW: Recovery methods
– HW: Switching to back-up module
– Data: Self-correcting codes
– Time: Re-computing a result
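A rough sketch of dynamic redundancy combines two of the items above: re-computing a result (time redundancy) and switching to a back-up module once an error has been detected. It assumes that errors show up either as exceptions or as results rejected by an acceptance test; run_with_recovery and its parameters are hypothetical names, and the retry count is an arbitrary choice.

```python
def run_with_recovery(primary, backup, acceptance_test, retries=1):
    """Try the primary, re-compute on error, then switch to the backup."""
    for _ in range(1 + retries):
        try:
            result = primary()
        except Exception:
            continue  # error detected: re-compute
        if acceptance_test(result):
            return result, "primary"
    # The primary keeps failing: switch to the back-up module.
    result = backup()
    if acceptance_test(result):
        return result, "backup"
    raise RuntimeError("primary and backup both failed the acceptance test")


calls = {"n": 0}


def flaky_sensor():
    # Simulated transient fault: fails on the first call only.
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("transient fault")
    return 4


def in_range(value):
    return 0 <= value <= 10


transient = run_with_recovery(flaky_sensor, lambda: 5, in_range)
permanent = run_with_recovery(lambda: -1, lambda: 5, in_range)
```

Unlike the static case, the extra work is only spent after an error has appeared, so the cost is paid in recovery time. In a real-time system that time has to fit within the deadline.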
Questions?