Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems

Viaceslav Izosimov, Paul Pop, Petru Eles, Zebo Peng
Embedded Systems Lab (ESLAB)
Linköping University, Sweden
Motivation
- Hard real-time applications
  - Timing constraints
  - Cost constraints
- Faults
  - Transient
  - Intermittent
- Hardware solutions vs. software solutions
  - Hardware solutions
    - MARS, TTA, X-by-Wire
    - Permanent faults
    - Costly for transient faults
  - Software solutions
    - Re-execution/rollback recovery
    - Checkpointing/rollback recovery
    - Replication, primary-backup…
- Online preemptive vs. off-line non-preemptive
  - Online preemptive: flexible
  - Off-line non-preemptive: predictable
Outline
- Motivation
- System architecture and fault-model
- Fault-tolerance techniques
- Problem formulation
- Motivational examples
- Tabu-search optimization strategy
- Experimental results
- Contributions and Message
Fault-Tolerant Time-Triggered Systems
- Fault model: transient faults
- Processes: static cyclic scheduling, with re-execution and replication
- Messages: static schedule table, with a fault-tolerant protocol
- Time Triggered Protocol (TTP)
  - Bus access scheme: time-division multiple-access (TDMA)
  - Schedule table located in each TTP controller: message descriptor list (MEDL)
[Figure: TDMA bus cycle of two rounds; each TDMA round consists of the slots S1, S2, S3 and S4]
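The TDMA access scheme can be illustrated with a minimal sketch. It assumes, purely for illustration, four slots of equal length in which slot S1 belongs to node N1, S2 to N2, and so on. The slot length and the names `SLOT_OWNERS`, `slot_at` and `next_send_time` are hypothetical and are not taken from the TTP specification or the MEDL format.

```python
# Illustrative TDMA model; slot length and slot ownership are assumptions.
SLOT_LENGTH = 10                          # assumed length of every slot
SLOT_OWNERS = ["N1", "N2", "N3", "N4"]    # assumed: S1 -> N1, S2 -> N2, ...
ROUND_LENGTH = SLOT_LENGTH * len(SLOT_OWNERS)


def slot_at(time):
    """Return (round_index, slot_index) of the slot that is active at `time`."""
    round_index, offset = divmod(time, ROUND_LENGTH)
    return round_index, offset // SLOT_LENGTH


def next_send_time(node, ready_time):
    """Earliest start of a slot owned by `node` at or after `ready_time`."""
    slot_index = SLOT_OWNERS.index(node)
    round_index = ready_time // ROUND_LENGTH
    start = round_index * ROUND_LENGTH + slot_index * SLOT_LENGTH
    if start < ready_time:        # this round's slot has already begun
        start += ROUND_LENGTH     # wait for the same slot in the next round
    return start


print(slot_at(25))                # (0, 2): third slot of the first round
print(next_send_time("N2", 5))    # 10: S2 of the first round has not started
print(next_send_time("N2", 15))   # 50: S2 already began, wait for round two
```

The point of the sketch is that a message that just misses its sender's slot waits almost a full round, which is why message placement matters for the schedule length.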
Fault-Tolerant Techniques
[Figure: process P1 protected against 2 transient faults in three ways]
- Re-execution: P1 on node N1
- Replication: P1 on nodes N1, N2 and N3
- Re-executed replicas: P1 on nodes N1 and N2
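The three techniques can be related by a minimal sketch. It assumes that a process tolerates k transient faults if it can be executed k + 1 times in total, split between replicas on distinct nodes and re-executions. The function name `policies` and its output format are hypothetical.

```python
def policies(k, max_nodes):
    """List the (name, replicas, re_executions) splits giving k + 1 executions.

    Assumption: tolerating k transient faults needs k + 1 executions in
    total; `replicas` run on distinct nodes and `re_executions` are the
    additional runs needed in the worst case.
    """
    result = []
    for replicas in range(1, min(k + 1, max_nodes) + 1):
        re_executions = (k + 1) - replicas
        if replicas == 1:
            name = "re-execution"
        elif re_executions == 0:
            name = "replication"
        else:
            name = "re-executed replicas"
        result.append((name, replicas, re_executions))
    return result


for name, replicas, re_executions in policies(k=2, max_nodes=3):
    print(f"{name}: {replicas} replica(s), {re_executions} re-execution(s)")
```

For k = 2 and three nodes this prints one line per technique: re-execution (1 replica, 2 re-executions), re-executed replicas (2 replicas, 1 re-execution) and replication (3 replicas, 0 re-executions). Re-execution spends time on one node, replication spends processor capacity on several nodes, and the combined policy sits in between.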
Problem Formulation
- Given
  - Fault model
    - Number of transient faults in the system period
  - System architecture
  - Application
    - WCETs, message sizes, periods, deadlines
- Determine
  - Schedulable and fault-tolerant design implementation
    - Fault-tolerance policy assignment
    - Mapping of processes and messages
    - Schedule tables for processes and messages
[Figure labels: Fault-model: transient faults. Application: set of process graphs. Architecture: time-triggered system.]
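The inputs listed under "Given" can be captured in a small data model. This is a sketch under assumptions: the class and field names (`Process`, `Message`, `Problem`, `candidate_nodes`) are hypothetical, and the numbers in the example instance are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    wcet: dict      # node name -> WCET; a missing node means "cannot run there"


@dataclass
class Message:
    name: str
    sender: str     # name of the sending process
    receiver: str   # name of the receiving process
    size: int       # message size


@dataclass
class Problem:
    k: int          # number of transient faults in the system period
    nodes: list
    processes: list
    messages: list = field(default_factory=list)
    period: int = 0
    deadline: int = 0

    def candidate_nodes(self, process_name):
        """Nodes on which the named process has a WCET, i.e. may be mapped."""
        process = next(p for p in self.processes if p.name == process_name)
        return [n for n in self.nodes if n in process.wcet]


problem = Problem(
    k=1,
    nodes=["N1", "N2"],
    processes=[Process("P1", {"N1": 40}), Process("P2", {"N1": 60, "N2": 70})],
    messages=[Message("m1", "P1", "P2", size=4)],
    period=200,
    deadline=200,
)
print(problem.candidate_nodes("P1"))  # ['N1']
print(problem.candidate_nodes("P2"))  # ['N1', 'N2']
```

The items under "Determine" (a policy per process, a mapping, and the schedule tables) would be the output computed from such an instance.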
Static Scheduling [Kandasamy et al. 03]
[Figure: fault-tolerant static schedules for processes P1 to P5 and messages m1 and m2 on nodes N1, N2 and N3, tolerating 2 transient faults. Labels: "Transparent re-execution", "Recovery slack", "Root schedules", "Contingency schedules". The root and contingency schedules are drawn as a set of numbered schedules (S1, S2, ...).]
Re-execution vs. Replication
[Figure: schedules for two applications, A1 and A2, each with processes P1, P2 and P3 (messages m1 and m2), on nodes N1 and N2 connected by a TTP bus with slots S1 and S2, tolerating 1 transient fault. One panel is labelled "Re-execution is better" and the other "Replication is better"; in each panel one alternative meets the deadline and the other misses it.]

WCET   N1   N2
P1     40   50
P2     40   50
P3     60   70
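The trade-off can be made concrete with a deliberately simplified model that uses the execution times listed on this slide. It does not reproduce the slide's schedules. The assumptions are: the three processes are independent (no messages and no bus delays), one fault is tolerated, processes re-executed on the same node share a recovery slack equal to the largest WCET on that node, and replication places every process on both nodes. The function names are hypothetical.

```python
from itertools import product

WCET = {"P1": {"N1": 40, "N2": 50},
        "P2": {"N1": 40, "N2": 50},
        "P3": {"N1": 60, "N2": 70}}
NODES = ["N1", "N2"]
K = 1  # transient faults to tolerate


def reexecution_length(mapping):
    """Worst-case length when every process is re-executed on its own node.

    Assumption: processes on one node share a recovery slack of
    K * (largest WCET on that node).
    """
    length = 0
    for node in NODES:
        times = [WCET[p][node] for p, n in mapping.items() if n == node]
        if times:
            length = max(length, sum(times) + K * max(times))
    return length


def replication_length():
    """Length when every process has a replica on each of the K + 1 = 2 nodes."""
    return max(sum(WCET[p][node] for p in WCET) for node in NODES)


# Try every mapping of the three processes onto the two nodes.
best_reexecution = min(
    reexecution_length(dict(zip(WCET, choice)))
    for choice in product(NODES, repeat=len(WCET))
)
print(best_reexecution, replication_length())  # 140 170
```

Under these assumptions re-execution gives the shorter schedule. Once processes exchange messages over the bus, the comparison can go the other way, which is the case the "Replication is better" panel illustrates.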
Fault-Tolerant Policy Assignment
[Figure: schedules for an application with processes P1 to P4 and messages m1, m2 and m3 on nodes N1 and N2 connected by a TTP bus with slots S1 and S2, tolerating 1 transient fault. Labels: "No fault-tolerance: application crashes" and "Optimization of fault-tolerance policy assignment". Of the fault-tolerant alternatives shown, two miss the deadline and one meets it.]

WCET   N1   N2
P1     40   50
P2     60   80
P3     60   80
P4     40   50
Mapping and Fault-Tolerance
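[Figure: schedules for an application with processes P1 to P4 and messages m1 to m4 on nodes N1 and N2 connected by a TTP bus with slots S1 and S2, tolerating 1 transient fault. Labels: "Best mapping without considering fault-tolerance" and "Simultaneous mapping and fault-tolerance". One alternative misses the deadline and the other meets it.]

WCET   N1   N2
P1     40   X
P2     60   70
P3     60   70
P4     40   X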
Optimization Strategy


- Design optimization (tabu search):
  - Fault-tolerance policy assignment
  - Mapping of processes and messages
- Root schedules (list scheduling)
- Three tabu-search optimization algorithms:
  1. Mapping and Fault-Tolerance Policy assignment (MRX): re-execution, replication or both
  2. Mapping and only Re-Execution (MX)
  3. Mapping and only Replication (MR)
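List scheduling, named above as the way root schedules are built, can be sketched as follows. This is plain list scheduling under simplifying assumptions: the mapping is fixed, bus communication and recovery slack are ignored, the process graph is acyclic, and the priority is the longest remaining path, which is a common choice and not necessarily the authors' priority function. The graph, the execution times and the mapping are hypothetical.

```python
def list_schedule(wcet, mapping, edges):
    """Return {process: (start, finish)} for a non-preemptive static schedule."""
    succs = {p: [] for p in wcet}
    preds = {p: [] for p in wcet}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    priority = {}

    def longest_path(p):
        # Priority: WCET of p plus the longest chain of successors after it.
        if p not in priority:
            tail = max((longest_path(s) for s in succs[p]), default=0)
            priority[p] = wcet[p] + tail
        return priority[p]

    for p in wcet:
        longest_path(p)

    schedule, node_free = {}, {}
    while len(schedule) < len(wcet):
        # Ready list: unscheduled processes whose predecessors are scheduled.
        ready = [p for p in wcet
                 if p not in schedule and all(q in schedule for q in preds[p])]
        p = max(ready, key=lambda r: priority[r])
        node = mapping[p]
        # Start when the node is free and all predecessors have finished.
        start = max([node_free.get(node, 0)] + [schedule[q][1] for q in preds[p]])
        schedule[p] = (start, start + wcet[p])
        node_free[node] = start + wcet[p]
    return schedule


# Hypothetical example: WCET of each process on the node it is mapped to.
wcet = {"P1": 40, "P2": 60, "P3": 60, "P4": 40}
mapping = {"P1": "N1", "P2": "N1", "P3": "N2", "P4": "N1"}
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4")]
print(list_schedule(wcet, mapping, edges))
```

In this example P2 and P3 run in parallel on different nodes after P1, and P4 starts once both have finished, giving a schedule length of 140.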
MRX Tabu-Search Example
[Figure: one iteration of the MRX tabu search on an application with processes P1 to P4 and messages m1, m2 and m3, on nodes N1 and N2 connected by a TTP bus with slots S1 and S2, tolerating 1 transient fault. Each process carries a Tabu counter and a Wait counter. Design transformations are applied to the current solution, and the resulting moves are labelled by whether they are tabu or non-tabu and whether they are better or worse than the best-so-far solution.]

WCET   N1   N2
P1     40   50
P2     60   75
P3     60   75
P4     40   50
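The mechanics of the example can be sketched as a generic tabu search. This is a minimal sketch, not the authors' MRX algorithm: it keeps a tabu counter per process and accepts a tabu move only if it beats the best-so-far solution, but it omits the Wait counters. The cost table `PENALTY` is hypothetical; in the actual approach the cost would come from scheduling the candidate design.

```python
def tabu_search(initial, neighbours, cost, iterations=50, tenure=3):
    """Generic tabu search.

    `neighbours(solution)` yields (changed_item, new_solution) pairs.
    A changed item stays tabu for `tenure` iterations; a tabu move is still
    accepted if it is better than the best-so-far solution (aspiration).
    """
    current = best = initial
    best_cost = cost(best)
    tabu = {}  # item -> remaining iterations during which it is tabu
    for _ in range(iterations):
        candidates = []
        for item, solution in neighbours(current):
            c = cost(solution)
            if tabu.get(item, 0) == 0 or c < best_cost:
                candidates.append((c, item, solution))
        tabu = {i: t - 1 for i, t in tabu.items() if t > 1}
        if not candidates:
            continue  # every move is tabu; let the counters run down
        c, item, current = min(candidates, key=lambda t: t[0])
        tabu[item] = tenure
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost


PROCESSES = ["P1", "P2", "P3", "P4"]
# Hypothetical cost contribution of each policy for each process.
PENALTY = {"re-execution": {"P1": 40, "P2": 60, "P3": 60, "P4": 40},
           "replication": {"P1": 50, "P2": 20, "P3": 75, "P4": 10}}


def cost(solution):
    return sum(PENALTY[policy][p] for p, policy in zip(PROCESSES, solution))


def neighbours(solution):
    # Design transformation: switch the policy of exactly one process.
    for i, p in enumerate(PROCESSES):
        other = "replication" if solution[i] == "re-execution" else "re-execution"
        yield p, solution[:i] + (other,) + solution[i + 1:]


start = ("re-execution",) * len(PROCESSES)
print(tabu_search(start, neighbours, cost))
```

Starting from re-execution everywhere (cost 200), the search switches P2 and then P4 to replication and reports a best cost of 130. Because the best non-tabu move is taken even when it is worse than the current solution, the search can leave local optima, and the tabu counters keep it from undoing a recent move immediately.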
Experimental Results
Schedulability improvement under resource constraints
[Plot: average % deviation from MRX (vertical axis, 0 to 100) against number of processes (horizontal axis, 20 to 100). Curves: mapping and replication (MR), mapping and re-execution (MX), mapping and policy assignment (MRX).]
- Case study
  - Vehicle cruise controller
  - MRX: schedulable fault-tolerant application with 65% overhead
Contributions and Message
- Contributions
  - Combined re-execution and replication
  - Optimization algorithms for fault-tolerance policy assignment
  - Efficient contingency schedule generation
- Message
  - Optimization of fault-tolerance policy assignment needed for cost-effective fault tolerance