Fault Tolerance with
Microarchitecture-Based Introspection
Moinuddin K. Qureshi
Onur Mutlu
Yale N. Patt
Department of Electrical and Computer Engineering
The University of Texas at Austin
The International Conference on Dependable Systems and Networks - 2005
The Problem of Transient Faults
A transient fault occurs when a cosmic particle strikes and inverts
the logical state of a device.
[Figure: a cosmic strike inverts one stored bit, changing 1 1 1 to 1 0 1.]
Can cause a fully verified, correct design to perform incorrectly.
The Need for Cost-Effective Transient-Fault Tolerance
The rate of transient faults is expected to increase significantly.
Processors will need some form of fault tolerance.
However, different applications have different reliability
requirements (e.g., server applications vs. games). Users who do not
require high reliability may not want to pay the overhead.
Fault-tolerant mechanisms with low hardware cost are
attractive because they allow the designs to be used for a wide
variety of applications.
Background on Fault Tolerance
Storage structures are easy to protect with parity or ECC.
Logic structures often need redundant execution:
Space redundancy
Time redundancy
Space redundancy has high hardware overhead.
Time redundancy has low hardware overhead but high
performance overhead.
The performance overhead of time redundancy can be reduced if
redundant execution is performed during idle periods.
Exploiting the Memory Wall for Idle Periods
A long-latency miss typically stalls instruction processing.
A significant proportion of execution time is spent waiting for
memory.
[Figure: execution-time breakdown for vpr. L2miss: 58%, remaining: 42%. The remaining 42% splits into L1miss+BP: 25% and Useful: 17%.]
Redundant execution can leverage load values and branch
prediction from primary execution.
There is enough idle time to re-execute the application!
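As a rough check of this claim against the vpr breakdown above, the sketch below compares the time a redundant pass might need with the time spent waiting on L2 misses. It assumes, purely for illustration, that a redundant pass which reuses load values and branch outcomes costs about as much as the Useful component alone; the function name is hypothetical.

```python
# Execution-time breakdown for vpr, in percent of total execution time
# (figures taken from the slide above).
breakdown = {"L2miss": 58, "L1miss+BP": 25, "Useful": 17}


def redundant_pass_fits(shares):
    """Return True if the redundant pass fits in the L2-miss idle time.

    Assumption (illustrative only): re-execution reuses load values and
    branch outcomes, so it costs roughly the 'Useful' share of time.
    """
    idle = shares["L2miss"]
    redundant_cost = shares["Useful"]
    return redundant_cost <= idle


# The three components account for all of the execution time.
assert sum(breakdown.values()) == 100
print(redundant_pass_fits(breakdown))
```

Under that assumption the redundant pass needs 17% of the time while 58% is idle, which is the sense in which the idle time is "enough" for vpr.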
Motivation
[Figure: cycles per instruction (CPI), on a 0.0 to 2.5 scale, for each benchmark, split into L2MISS, L1MISS+BP, and USEFUL components. Benchmarks, grouped into a HIGH-MEM category and a LOW-MEM category: mcf, art, lucas, ammp, twolf, wupwise, apsi, vpr, facerec, swim, galgel, gcc, bzip2, vortex, mgrid, parser, applu, equake, mesa, gap, sixtrack, crafty, perlbmk, gzip, eon, fma3d.]
Can we design a time redundancy technique that utilizes memory
waiting time for redundant execution?
Outline
Introduction
Concept of Introspection
Implementation of MBI
Evaluation of MBI
Summary
The Concept of Introspection
An example from the human domain:
[Figure: state diagram with two states. In PERFORM, the person performs actions and keeps a record of them (A, B, C). On the transition labeled IDLE || FORCED, the person moves to INTROSPECT and checks the recorded actions one at a time, oldest first (A, then B, then C). The transition labeled DONE || IDLE leads back to PERFORM.]
Extending it to the microarchitecture domain:
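[Figure: two-state diagram for a processor. In PERFORM, the processor executes instructions and records results in the backlog buffer. On L2-MISS || BUFFER-FULL it moves to INTROSPECT and verifies the recorded results. On MISS-SERVICED || BUFFER-EMPTY it returns to PERFORM.]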
Microarchitecture-Based Introspection (MBI)
Additional Structures
[Figure: processor pipeline (fetch through retire, vulnerable to transient faults) with I-cache, D-cache, L2 cache, bus, and memory. The additional structures are a MODE bit, two architectural register files (PERF-ARF and ISPEC-ARF), a backlog buffer with HEAD-PTR and TAIL-PTR, and a comparator (=) that raises ERROR on a mismatch.]
Mechanism
Two modes: Performance and Introspection
Four cases:
Operation in performance mode
Switching from performance mode to introspection mode
Operation in introspection mode
Switching from introspection mode to performance mode
Operation in Performance Mode
[Figure: the processor in performance mode. As instructions retire, entries A, B, and C are appended to the backlog buffer at TAIL-PTR; HEAD-PTR stays at the oldest entry.]
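The backlog buffer in performance mode behaves like a bounded FIFO: new results enter at the tail and the oldest unverified result sits at the head. The sketch below models that behavior in software. The class and method names are hypothetical, the entry contents are left abstract, and the default capacity matches the 2k-entry buffer used in the evaluation.

```python
from collections import deque


class BacklogBuffer:
    """Bounded FIFO of retired-instruction results (hypothetical model).

    The left end of the deque plays the role of HEAD-PTR and the right
    end plays the role of TAIL-PTR.
    """

    def __init__(self, capacity=2048):
        self.capacity = capacity
        self.entries = deque()

    def is_full(self):
        return len(self.entries) >= self.capacity

    def is_empty(self):
        return not self.entries

    def push_tail(self, result):
        """Record the result of an instruction retired in performance mode."""
        if self.is_full():
            raise OverflowError("backlog buffer full: introspection is forced")
        self.entries.append(result)

    def pop_head(self):
        """Remove and return the oldest unverified result."""
        return self.entries.popleft()


# Small example mirroring the figure: A, B, C are appended in order.
buf = BacklogBuffer(capacity=3)
for result in ("A", "B", "C"):
    buf.push_tail(result)
```

A hardware implementation would use a circular array with explicit head and tail pointers; the deque is only a convenient stand-in.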
Switching from Performance Mode to Introspection Mode
Enter Introspection on:
Long-latency L2 miss (wait for 30 cycles)
Backlog buffer full
System call or interaction with I/O
The process involves:
Flushing the pipeline
Switching the mode bit
Fetching instructions from the PC of ISPEC-ARF
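The entry conditions listed above can be sketched as a single predicate. This is an illustrative model, not the hardware logic: the function name, the argument names, and the priority order among triggers are assumptions, and "wait for 30 cycles" is read here as "the L2 miss has been outstanding for at least 30 cycles".

```python
L2_MISS_WAIT_CYCLES = 30  # wait threshold from the slide


def should_enter_introspection(cycles_since_l2_miss, buffer_full, at_syscall_or_io):
    """Return the trigger for entering introspection mode, or None.

    cycles_since_l2_miss is None when no L2 miss is outstanding.
    The order in which triggers are checked is an assumption.
    """
    if buffer_full:
        return "buffer-full"
    if at_syscall_or_io:
        return "syscall-or-io"
    if (cycles_since_l2_miss is not None
            and cycles_since_l2_miss >= L2_MISS_WAIT_CYCLES):
        return "l2-miss"
    return None
```

Returning the trigger rather than a plain boolean keeps track of why the episode started, which matters when deciding when the episode may end.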
Operation in Introspection Mode
[Figure: the processor in introspection mode. Re-executed results A', B', and C' are compared, one at a time, with backlog buffer entries A, B, and C at HEAD-PTR; HEAD-PTR advances after each comparison until it reaches TAIL-PTR.]
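The comparison step in introspection mode can be sketched as a loop that re-executes each recorded instruction and checks the new result against the recorded one. The tuple-based toy instructions and the `run` interpreter are hypothetical stand-ins for the pipeline, and the sketch drains the whole backlog rather than stopping when a miss is serviced.

```python
def introspect(backlog, reexecute):
    """Verify backlog entries oldest-first.

    backlog   : list of (instruction, recorded_result) pairs, head at index 0
    reexecute : callable that recomputes an instruction's result
                (stands in for the pipeline in introspection mode)
    Returns the index of the first mismatching entry, or None if all match.
    """
    for index, (instruction, recorded) in enumerate(backlog):
        if reexecute(instruction) != recorded:
            return index  # comparator mismatch: signal ERROR
    return None


def run(instr):
    """Tiny interpreter for toy (op, a, b) instructions."""
    op, a, b = instr
    return a + b if op == "add" else a * b


clean = [(("add", 1, 2), 3), (("mul", 2, 5), 10)]
faulty = [(("add", 1, 2), 3), (("mul", 2, 5), 11)]  # second result corrupted
```

A mismatch only says that the two executions disagree; it does not say which of the two was hit by the fault.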
Switching from Introspection Mode to Performance Mode
Exit Introspection mode when:
L2 miss is serviced
Backlog buffer is empty
The process involves:
Flushing the pipeline
Switching the mode bit
Fetching instructions from the PC of PERF-ARF
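The exit conditions and the three-step switch back to performance mode can be sketched as follows. The `Core` class and its fields are hypothetical stand-ins for the hardware state. One assumption goes beyond the slide: an episode that was not triggered by an L2 miss (for example, a buffer-full episode) is taken to run until the backlog buffer is empty.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Minimal stand-in for the state touched by a mode switch."""
    mode: str = "introspection"       # value of the mode bit
    perf_arf_pc: int = 0              # PC held in PERF-ARF
    ispec_arf_pc: int = 0             # PC held in ISPEC-ARF
    fetch_pc: int = 0                 # where fetch currently points
    pipeline: list = field(default_factory=list)


def should_exit_introspection(trigger, miss_serviced, buffer_empty):
    """Exit when the backlog is drained, or when the L2 miss that
    triggered this episode has been serviced (assumption: other
    triggers run until the buffer is empty)."""
    if buffer_empty:
        return True
    return trigger == "l2-miss" and miss_serviced


def switch_to_performance(core):
    core.pipeline.clear()              # flush the pipeline
    core.mode = "performance"          # switch the mode bit
    core.fetch_pc = core.perf_arf_pc   # fetch from the PC of PERF-ARF


core = Core(perf_arf_pc=0x400, ispec_arf_pc=0x100,
            fetch_pc=0x120, pipeline=["i1", "i2"])
if should_exit_introspection("l2-miss", miss_serviced=True, buffer_empty=False):
    switch_to_performance(core)
```

The switch in the other direction mirrors this routine, with fetch redirected to the PC of ISPEC-ARF.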
Methodology
Pipeline: 20 stages, 8-wide issue with out-of-order execution
L2 cache: Unified, 1MB, 15-cycle hit
Memory: 400 cycles (32 banks)
Prefetcher: Streaming with up to 32 streams
Backlog Buffer contains 2k entries
Benchmarks: SPEC CPU2000
Performance Overhead
Forced-Introspection penalty
Pipeline-Fill penalty
[Figure: percentage reduction in IPC, on a 0 to 50 scale, for each benchmark in the HIGH-MEM and LOW-MEM categories (mcf, art, lucas, ammp, twolf, wupwise, apsi, vpr, facerec, swim, galgel, gcc, bzip2, vortex, mgrid, parser, applu, equake, mesa, gap, sixtrack, crafty, perlbmk, gzip, eon, fma3d), plus hmean-all.]
Other Overheads: Storage and Error-Detection Latency
Storage overhead:
Depends on the number of entries in the backlog buffer.
Num. Entries    Storage Cost    IPC Overhead
1k              8.5 kB          16.1%
2k              16.5 kB         14.5%
4k              32.5 kB         13.9%
Error Detection Latency
Average: 682 cycles
Worst case: 36183 cycles
(impact depends on the error correction mechanism)
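The three storage costs reported above lie on a straight line, which the sketch below makes explicit. The decomposition into a per-entry cost and a fixed cost is inferred from the three data points and is not stated in the slides. It also assumes that 1k means 1024 entries and that 1 kB means 1024 bytes.

```python
# (entries, storage cost in kB) pairs from the storage-overhead figures.
table = [(1024, 8.5), (2048, 16.5), (4096, 32.5)]


def fit_line(points):
    """Slope and intercept of the line through the first and last point."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    slope = (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0


slope_kb, fixed_kb = fit_line(table)
bytes_per_entry = slope_kb * 1024  # taking 1 kB = 1024 bytes


def storage_kb(entries):
    """Storage cost predicted by the fitted line, in kB."""
    return fixed_kb + slope_kb * entries


# The middle data point is not used by the fit, so it serves as a check.
print(bytes_per_entry, fixed_kb, storage_kb(2048))
```

Under these assumptions the data are consistent with 8 bytes per entry plus 0.5 kB of fixed storage. What the fixed part corresponds to is not stated here.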
Summary
Transient fault rate is increasing. Processors will need some
form of fault tolerance.
Fault-tolerant mechanisms with low hardware cost are
attractive because they allow the designs to be used for a wide
variety of applications.
MBI utilizes the idle time during long-latency cache misses for
redundant execution. MBI has low hardware overhead and
provides coverage for the entire pipeline.
The time redundancy of MBI incurs an average IPC overhead
of 14.5%.
Questions
Extra Slides
Implementation of MBI: Recovery
MBI is primarily a fault detection mechanism.
Possible recovery mechanisms include fail-safe and checkpointing.
Can also use undo mechanism...
Recover register state by copying ISPEC-ARF to PERF-ARF.
Recover memory state by undoing younger store operations.
[Figure: backlog buffer holding entries A through M; entries D, G, I, and M also carry ADDR: VAL. ERROR is flagged at entry D, where HEAD-PTR points. TAIL-PTR then walks back from M toward HEAD-PTR, and the ADDR: VAL entries it passes (M, I, G, and finally D) are marked UNDO, until HEAD-PTR and TAIL-PTR meet at D.]
No effect of the error-causing instruction or any later instruction is
architecturally visible.
Needs special handling in case of multiprocessors.
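The undo-based recovery described above can be sketched as a backward walk over the backlog buffer. The data layout and names are hypothetical, and the sketch assumes that the value logged with each store (the VAL in ADDR: VAL) is the value the location held before the store executed.

```python
def undo_recover(memory, backlog, error_index, ispec_arf):
    """Roll back architectural state after an error at backlog[error_index].

    memory    : dict mapping address -> value
    backlog   : list of entries, head at index 0; a store entry is a dict
                {"addr": ..., "old_val": ...}, any other entry is None
    ispec_arf : register state verified up to the faulting instruction
    Returns the recovered PERF-ARF contents.
    """
    # Walk from the tail back to the faulting entry, undoing stores
    # youngest-first so the oldest logged value ends up in memory.
    for entry in reversed(backlog[error_index:]):
        if entry is not None:
            memory[entry["addr"]] = entry["old_val"]
    del backlog[error_index:]   # discard the unverified entries
    return dict(ispec_arf)      # PERF-ARF := copy of ISPEC-ARF


# Two stores hit address 0x10; the error is detected at index 3.
memory = {0x10: 7, 0x20: 9}
backlog = [None, None, None,
           {"addr": 0x10, "old_val": 1}, None,
           {"addr": 0x20, "old_val": 2},
           {"addr": 0x10, "old_val": 5}]
perf_arf = undo_recover(memory, backlog, 3, {"r1": 42, "pc": 0x100})
```

Undoing stores youngest-first matters when several stores touch the same address: the last value written back is the one logged by the oldest store, which is the state before the faulting instruction.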
Related work
Dual modular redundancy:
[e.g. IBM-z390, Compaq-Himalaya]
High hardware cost
DIVA [Austin MICRO’99]
Needs a checker processor – substantial investment
Does not provide full coverage for the pipeline
Software-based techniques:
[EDDI, SWIFT, etc.]
May need recompilation (not always possible)
Cannot take advantage of run-time information (e.g. cache miss)
SMT-based techniques:
[e.g. RMT - ISCA’00]
Need a fine-grained multi-threaded machine
Reduces throughput
Sensitivity to Entries in the Backlog Buffer
[Figure: % reduction in IPC, on a 0 to 25 scale, versus backlog buffer entries (1K, 2K, 4K), for art, ammp, gcc, and the average.]
Breakdown of Introspection Episodes
[Figure: % of all introspection episodes, on a 0 to 100 scale, split into forced-introspection and normal-introspection for the benchmarks mcf, art, twolf, apsi, vpr, galgel, gcc, bzip2, vortex, perlbmk, gzip, and eon.]
Coverage of Different Types of Introspection
[Figure: coverage for all retired instructions (%), on a 0 to 100 scale, split into forced-introspection and normal-introspection for the benchmarks mcf, art, twolf, apsi, vpr, galgel, gcc, bzip2, vortex, perlbmk, gzip, and eon.]