Fault Tolerance with Microarchitecture-Based Introspection Moinuddin K. Qureshi Onur Mutlu Yale N. Patt Department of Electrical and Computer Engineering The University of Texas at Austin The International Conference on Dependable Systems and Networks - 2005 1/22 The Problem of Transient Faults A transient fault occurs when a cosmic particle strikes and inverts the logical state of a device. The International Conference on Dependable Systems and Networks - 2005 2/22 The Problem of Transient Faults A transient fault occurs when a cosmic particle strikes and inverts the logical state of a device. 1 1 1 The International Conference on Dependable Systems and Networks - 2005 2/22 The Problem of Transient Faults A transient fault occurs when a cosmic particle strikes and inverts the logical state of a device. COSMIC STRIKE 1 0 1 The International Conference on Dependable Systems and Networks - 2005 2/22 The Problem of Transient Faults A transient fault occurs when a cosmic particle strikes and inverts the logical state of a device. 1 1 1 The International Conference on Dependable Systems and Networks - 2005 2/22 The Problem of Transient Faults A transient fault occurs when a cosmic particle strikes and inverts the logical state of a device. 1 1 1 Can cause a fully verified, correct design to perform incorrectly. The International Conference on Dependable Systems and Networks - 2005 2/22 The Need for Cost-Effective Transient-Fault Tolerance The rate of transient faults is expected to increase significantly. Processors will need some form of fault tolerance. However, different applications have different reliability requirements (e.g. server-apps v/s games). Users who do not require high reliability may not want to pay the overhead. Fault tolerant mechanisms with low hardware cost are attractive because they allow the designs to be used for a wide variety of applications. The International Conference on Dependable Systems and Networks - 2005 3/22 Background on Fault Tolerance Storage structures are easy to protect with parity or ECC. Logic structures often need redundant execution: Space redundancy Time redundancy Space redundancy has high hardware overhead. Time redundancy has low hardware overhead but high performance overhead. The performance overhead of time redundancy can be reduced if redundant execution is performed during idle periods. The International Conference on Dependable Systems and Networks - 2005 4/22 Exploiting the Memory Wall for Idle Periods A long-latency miss typically stalls instruction processing. A significant proportion of execution time is spent waiting for memory. The International Conference on Dependable Systems and Networks - 2005 5/22 Exploiting the Memory Wall for Idle Periods A long-latency miss typically stalls instruction processing. Execution Time A significant proportion of execution time is spent waiting for memory. 58% L2miss Remaining 42% Vpr The International Conference on Dependable Systems and Networks - 2005 5/22 Exploiting the Memory Wall for Idle Periods A long-latency miss typically stalls instruction processing. Execution Time A significant proportion of execution time is spent waiting for memory. 58% L2miss Remaining 42% Vpr Redundant execution can leverage load values and branch prediction from primary execution. The International Conference on Dependable Systems and Networks - 2005 5/22 Exploiting the Memory Wall for Idle Periods A long-latency miss typically stalls instruction processing. Execution Time A significant proportion of execution time is spent waiting for memory. 58% L2miss L1miss+BP 25% Useful 17% Vpr There is enough idle time to re-execute the application! The International Conference on Dependable Systems and Networks - 2005 5/22 Motivation 2.5 16 L2MISS L1MISS+BP USEFUL 1.5 1.0 0.5 0.0 mcf art lucas ammp twolf wupwise apsi vpr facerec swim galgel gcc bzip2 vortex mgrid parser applu equake mesa gap sixtrack crafty perlbmk gzip eon fma3d Cycles Per Instruction 2.0 HIGH-MEM CATEGORY LOW-MEM CATEGORY The International Conference on Dependable Systems and Networks - 2005 6/22 Motivation 2.5 16 L2MISS L1MISS+BP USEFUL Cycles Per Instruction 2.0 1.5 1.0 0.0 mcf art lucas ammp twolf wupwise apsi vpr facerec swim galgel gcc bzip2 vortex mgrid parser applu equake mesa gap sixtrack crafty perlbmk gzip eon fma3d 0.5 HIGH-MEM CATEGORY LOW-MEM CATEGORY Can we design a time redundancy technique that utilizes memory waiting time for redundant execution? The International Conference on Dependable Systems and Networks - 2005 6/22 Outline Introduction Concept of Introspection Implementation of MBI Evaluation of MBI Summary The International Conference on Dependable Systems and Networks - 2005 7/22 The Concept of Introspection An example from the human domain: PERFORM ACTIONS PERFORM The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection An example from the human domain: PERFORM ACTIONS KEEP RECORD PERFORM A The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection An example from the human domain: PERFORM ACTIONS KEEP RECORD PERFORM B A The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection An example from the human domain: PERFORM ACTIONS KEEP RECORD PERFORM C B A The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection An example from the human domain: IDLE || FORCED PERFORM ACTIONS CHECK ACTIONS KEEP RECORD PERFORM C B A INTROSPECT The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection An example from the human domain: IDLE || FORCED PERFORM ACTIONS CHECK ACTIONS KEEP RECORD PERFORM C B INTROSPECT The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection An example from the human domain: IDLE || FORCED PERFORM ACTIONS CHECK ACTIONS KEEP RECORD PERFORM C INTROSPECT The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection An example from the human domain: IDLE || FORCED PERFORM ACTIONS CHECK ACTIONS KEEP RECORD PERFORM INTROSPECT The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection An example from the human domain: IDLE || FORCED PERFORM ACTIONS CHECK ACTIONS KEEP RECORD PERFORM INTROSPECT DONE || IDLE The International Conference on Dependable Systems and Networks - 2005 8/22 The Concept of Introspection Extending it to the microarchitecture domain: L2−MISS || BUFFER−FULL EXECUTE INSTRUCTIONS VERIFY RESULTS BACKLOG BUFFER PERFORM INTROSPECT MISS−SERVICED || BUFFER−EMPTY Microarchitecture-Based Introspection (MBI) The International Conference on Dependable Systems and Networks - 2005 8/22 Outline Introduction Concept of Introspection Implementation of MBI Evaluation of MBI Summary The International Conference on Dependable Systems and Networks - 2005 9/22 Additional Structures BUS L2 CACHE MEMORY D CACHE I CACHE F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T I ARF R E The International Conference on Dependable Systems and Networks - 2005 10/22 Additional Structures BUS L2 CACHE MEMORY D CACHE MODE I CACHE COMPARATOR F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF The International Conference on Dependable Systems and Networks - 2005 HEAD−PTR = ERROR BACKLOG TAIL−PTR BUFFER 10/22 Mechanism Two modes: Performance and Introspection Four cases: Operation in performance mode Switching from performance mode to introspection mode Operation in introspection mode Switching from introspection mode to performance mode The International Conference on Dependable Systems and Networks - 2005 11/22 Operation in Performance Mode BUS L2 CACHE MEMORY D CACHE MODE I CACHE HEAD−PTR F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF The International Conference on Dependable Systems and Networks - 2005 = TAIL−PTR BACKLOG BUFFER 12/22 Operation in Performance Mode BUS L2 CACHE MEMORY D CACHE MODE I CACHE HEAD−PTR F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF The International Conference on Dependable Systems and Networks - 2005 = A TAIL−PTR BACKLOG BUFFER 12/22 Operation in Performance Mode BUS L2 CACHE MEMORY D CACHE MODE I CACHE HEAD−PTR F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF The International Conference on Dependable Systems and Networks - 2005 = A B TAIL−PTR BACKLOG BUFFER 12/22 Operation in Performance Mode BUS L2 CACHE MEMORY D CACHE MODE I CACHE HEAD−PTR F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF The International Conference on Dependable Systems and Networks - 2005 = A B C TAIL−PTR BACKLOG BUFFER 12/22 Switching from Performance Mode to Introspection Mode Enter Introspection on: Long latency L2 miss (wait for 30 cycles) Backlog buffer full System call or Interaction with I/O The process involves: Flushing the pipeline Switching the mode bit Fetching instructions from the PC of ISPEC-ARF The International Conference on Dependable Systems and Networks - 2005 13/22 Operation in Introspection Mode BUS L2 CACHE MEMORY D CACHE MODE I CACHE HEAD−PTR F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF The International Conference on Dependable Systems and Networks - 2005 = A B C TAIL−PTR BACKLOG BUFFER 14/22 Operation in Introspection Mode BUS L2 CACHE MEMORY D CACHE MODE I CACHE F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF A’ The International Conference on Dependable Systems and Networks - 2005 A = HEAD−PTR B C TAIL−PTR BACKLOG BUFFER 14/22 Operation in Introspection Mode BUS L2 CACHE MEMORY D CACHE MODE I CACHE F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF B’ The International Conference on Dependable Systems and Networks - 2005 B = HEAD−PTR C TAIL−PTR BACKLOG BUFFER 14/22 Operation in Introspection Mode BUS L2 CACHE MEMORY D CACHE MODE I CACHE F E T C H PROCESSOR PIPELINE (VULNERABLE TO TRANSIENT FAULTS) R E T PERF I ARF R ISPEC E ARF C’ The International Conference on Dependable Systems and Networks - 2005 C = HEAD−PTR TAIL−PTR BACKLOG BUFFER 14/22 Switching from Introspection Mode to Performance Mode Exit Introspection mode when: L2 miss is serviced Backlog buffer is empty The process involves: Flushing the pipeline Switching the mode bit Fetching instructions from the PC of PERF-ARF The International Conference on Dependable Systems and Networks - 2005 15/22 Outline Introduction Concept of Introspection Implementation of MBI Evaluation of MBI Summary The International Conference on Dependable Systems and Networks - 2005 16/22 Methodology Pipeline: 20 stages, 8-wide issue with out-of-order execution L2 cache: Unified, 1MB, 15-cycle hit Memory: 400 cycles (32 banks) Prefetcher: Streaming with up to 32 streams Backlog Buffer contains 2k entries Benchmarks: SPEC CPU2000 The International Conference on Dependable Systems and Networks - 2005 17/22 Performance Overhead Forced-Introspection penalty Pipeline-Fill penalty The International Conference on Dependable Systems and Networks - 2005 18/22 Performance Overhead Forced-Introspection penalty Pipeline-Fill penalty 40 30 20 0 HIGH-MEM CATEGORY LOW-MEM CATEGORY The International Conference on Dependable Systems and Networks - 2005 hmean-all 10 mcf art lucas ammp twolf wupwise apsi vpr facerec swim galgel gcc bzip2 vortex mgrid parser applu equake mesa gap sixtrack crafty perlbmk gzip eon fma3d Percentage Reduction in IPC 50 18/22 Other Overheads: Storage and Error-Detection Latency Storage overhead: Depends on the Number of Entries in the Backlog Buffer Num. Entries Storage Cost IPC overhead 1k 8.5kB 16.1% 2k 16.5kB 14.5% 4k 32.5kB 13.9% Error Detection Latency Average: 682 cycles Worst case: 36183 cycles (impact depends on the error correction mechanism ) The International Conference on Dependable Systems and Networks - 2005 19/22 Outline Introduction Concept of Introspection Implementation of MBI Evaluation of MBI Summary The International Conference on Dependable Systems and Networks - 2005 20/22 Summary Transient fault rate is increasing. Processors will need some form of fault tolerance. Fault tolerant mechanisms with low hardware cost are attractive because they allow the designs to be used for a wide variety of applications. MBI utilizes the idle time during long latency cache misses for redundant execution. MBI has low hardware overhead and provides coverage for the entire pipeline. The time redundancy of MBI incurs an average IPC overhead of 14.5%. The International Conference on Dependable Systems and Networks - 2005 21/22 Questions The International Conference on Dependable Systems and Networks - 2005 22/22 Extra Slides The International Conference on Dependable Systems and Networks - 2005 23/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... Recover register state by copying ISPEC-ARF to PERF-ARF Recover memory state by undoing younger store operations. A HEAD−PTR B C D ADDR: VAL E F G ADDR: VAL H I ADDR: VAL J K L M ADDR: VAL TAIL−PTR The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A HEAD−PTR B C D ADDR: VAL E F G ADDR: VAL H I ADDR: VAL J K L M ADDR: VAL TAIL−PTR The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL HEAD−PTR E F G ADDR: VAL H I ADDR: VAL J K L M ADDR: VAL TAIL−PTR The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL HEAD−PTR E F G ADDR: VAL H I ADDR: VAL J K L UNDO TAIL−PTR M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL HEAD−PTR E F G ADDR: VAL H I ADDR: VAL J TAIL−PTR K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL HEAD−PTR E F G ADDR: VAL H I ADDR: VAL TAIL−PTR J K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL HEAD−PTR E F G ADDR: VAL H I UNDO TAIL−PTR J K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL HEAD−PTR E F G ADDR: VAL TAIL−PTR H I UNDO J K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL HEAD−PTR E F G UNDO TAIL−PTR H I UNDO J K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL HEAD−PTR E TAIL−PTR F G UNDO H I UNDO J K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR D ADDR: VAL E HEAD−PTR TAIL−PTR F G UNDO H I UNDO J K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C ERROR HEAD−PTR D ADDR: VAL TAIL−PTR E F G UNDO H I UNDO J K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C D ERROR HEAD−PTR TAIL−PTR UNDO E F G UNDO H I UNDO J K L UNDO M The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C D ERROR HEAD−PTR TAIL−PTR UNDO E F G UNDO H I UNDO J K L UNDO M No effect of the error causing instruction or any later instruction is architecturally visible. The International Conference on Dependable Systems and Networks - 2005 24/22 Implementation of MBI: Recovery MBI is primarily a fault detection mechanism. Possible recovery mechanisms include fail-safe and checkpointing. Can also use undo mechanism... A B C D ERROR HEAD−PTR TAIL−PTR UNDO E F G UNDO H I UNDO J K L UNDO M Needs special handling in case of multiprocessors. Data about error detection latency is here The International Conference on Dependable Systems and Networks - 2005 24/22 Related work Dual modulo redundancy: [e.g. IBM-z390, Compaq-Himalaya] High hardware cost The International Conference on Dependable Systems and Networks - 2005 25/22 Related work Dual modulo redundancy: [e.g. IBM-z390, Compaq-Himalaya] High hardware cost DIVA [Austin MICRO’99] Needs a checker processor – substantial investment Does not provide full coverage for the pipeline The International Conference on Dependable Systems and Networks - 2005 25/22 Related work Dual modulo redundancy: [e.g. IBM-z390, Compaq-Himalaya] High hardware cost DIVA [Austin MICRO’99] Needs a checker processor – substantial investment Does not provide full coverage for the pipeline Software based techniques: [EDDI, SWIFT, etc.] May need recompilation (not always possible) Cannot take advantage of run-time information (e.g. cache miss) The International Conference on Dependable Systems and Networks - 2005 25/22 Related work Dual modulo redundancy: [e.g. IBM-z390, Compaq-Himalaya] High hardware cost DIVA [Austin MICRO’99] Needs a checker processor – substantial investment Does not provide full coverage for the pipeline Software based techniques: [EDDI, SWIFT, etc.] May need recompilation (not always possible) Cannot take advantage of run-time information (e.g. cache miss) SMT based techniques: [e.g. RMT - ISCA’00] Need a fine-grained multi-threaded machine Reduces throughput The International Conference on Dependable Systems and Networks - 2005 25/22 Sensitivity to Entries in the Backlog Buffer % Reduction in IPC 25 20 art 15 10 5 0 1K 2K The International Conference on Dependable Systems and Networks - 2005 4K 26/22 Sensitivity to Entries in the Backlog Buffer % Reduction in IPC 25 ammp art 20 15 10 5 0 1K 2K The International Conference on Dependable Systems and Networks - 2005 4K 26/22 Sensitivity to Entries in the Backlog Buffer % Reduction in IPC 25 gcc ammp art 20 15 10 5 0 1K 2K The International Conference on Dependable Systems and Networks - 2005 4K 26/22 Sensitivity to Entries in the Backlog Buffer % Reduction in IPC 25 gcc average ammp art 20 15 10 5 0 1K 2K The International Conference on Dependable Systems and Networks - 2005 4K 26/22 Breakdown of Introspection Episodes 100 80 70 60 50 40 30 20 n eo gz ip k pe rlb m x rte vo 2 ip bz c gc lg ga r vp si ap f ol tw t ar cf 0 el 10 m % of all introspection episodes 90 forced-introspection normal-introspection The International Conference on Dependable Systems and Networks - 2005 27/22 100 90 80 70 60 50 40 30 20 n eo gz ip k pe rlb m x rte vo 2 ip bz c gc lg ga r vp si ap f ol tw t ar cf 0 el 10 m Coverage for all retired instructions (%) Coverage of Different Types of Introspection forced-introspection normal-introspection The International Conference on Dependable Systems and Networks - 2005 28/22