Reconfiguration Based Fault-Tolerant Systems Design - Survey of Approaches
Jan Balach, Ondřej Novák
FIT, CTU in Prague
MEMICS 2010

Outline
► Introduction
► FPGAs and SEU
► Reconfiguration based Fault-Tolerant designs
    Improved testing
    FT structures based on partial reconfiguration
    High-performance FT design
    Transistor & gate level reconfiguration
► Flash-based FPGAs
► Reconfigurable Electronics for Space
► Conclusion

Introduction
► SRAM-based FPGA
► The FPGA is the most used platform for developing new designs and systems
► FPGA dependability and reliability are the most discussed issues

FPGAs and SEU
► FPGAs are sensitive to natural radiation effects; the most discussed are the so-called Single Event Upsets (SEUs)
► An SEU can impact an FPGA in different ways:
    Changing the configuration memory
    Generating a pulse on the interconnection
    Causing latch-up
    Affecting the non-programmed part of the FPGA
    Affecting the clock domain distribution
► Each situation requires a specific solution

Reconfiguration based Fault-Tolerant designs
► Fault-Tolerant design = redundancy
► Redundancy serves only for a given time
► We have to use reconfiguration to keep the FPGA's FT parameters
► There are different ways to use reconfiguration to achieve an FT design

Improved testing I.
► Testing is an important part of a dependable design flow
► Testing allows us to:
    Prove correct functionality of the design
    Localize faults
    Prevent latent faults

Improved testing II.
► BIST architecture based on reconfiguration
► Improved Test Access Mechanism
► Can incur high overhead caused by bus macros
Picture from: Rozkovec, M., Novak, O., "Structural test of programmed FPGA circuits"

FT structures based on partial reconfiguration I.
► Reconfiguration offers various options for implementing an FT design
► The basic idea is to divide the design into smaller parts which can be reconfigured/replaced
► The smaller the parts, the bigger the overhead; we need to find a trade-off

FT structures based on partial reconfiguration - App. A*
► Each application is divided into many small, so-called partial reconfigurable modules
► Reconfiguration is supervised by a partial reconfiguration controller
► Good fault localization; a fault impacts a smaller area of the design
► Can incur high HW overhead (bus macros); synchronization issues after reconfiguration
*) Straka M., Kastil J., Kotasek Z., "Fault Tolerant Structure for SRAM-based FPGA via Partial Dynamic Reconfiguration"

FT structures based on partial reconfiguration - App. B*
► A fault impacts a relatively big part of the design
► The resulting HW overhead is smaller
► Synchronization has to be solved after reconfiguration
*) Borecky J., Kohlik M., Kubatova H., Kubalik P., "Fault Coverage Improvement based on Fault Simulation and Partial Duplication"

FT structures based on partial reconfiguration - App. C*
► Self-repair dual-FPGA architecture is used
► The design is divided into columns; spare columns allow self-repair
► A soft microcontroller evaluates flags from the second FPGA; in case of an error, the faulty FPGA is reconfigured by the other one
► Achieves a good trade-off between overhead and fault localization
► Using the same bitstream in both FPGAs can be risky
*) S. Mitra, W.-J. Huang, N. R. Saxena, S.-Y. Yu, E.J. McCluskey, "Reconfigurable Architecture for Autonomous Self Repair"

High-performance FT system*
► The SEU rate varies with the position on the orbit, so we can use reconfiguration to switch modes
► When the SEU density is lower, we can switch to a high-performance or power-saving mode
► Using the high-performance mode speeds up computation by 2.3x compared to standard TMR
*) Jacobs, A., George, A.D., Cieslewski, G., "Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space"

Transistor and gate level reconfiguration*
► Reconfiguration is performed on the transistor/gate level
► Redundant N/P diffusions can tolerate faults in silicon
[Figure, panels a) and b): gate schematics with redundant transistors; signals in1, in2, out; rails VDD, GND]
*) H. T. Vierhaus, "Transistor and Gate Level Self Repair for Logic Circuits"

Transistor and gate level reconfiguration
► Replacing the whole faulty gate
► The resulting HW overhead is between 30% and 120%
► Requires supervision in layout
Taken from: H. T. Vierhaus, "Transistor and Gate Level Self Repair for Logic Circuits"

Flash-based FPGA
► Configuration is stored in Flash memory
► Alternative platform for developing FT designs
► Intrinsically SEU-hard configuration memory
► Slower than SRAM-based FPGAs
► Higher voltage required to perform programming

Reconfigurable Electronics for Space*
► NASA rovers on Mars
► On-board Xilinx FPGA
► Reconfiguration performed by ASIC Analog/Digital SRAAs
► FPGA implements the digital interface between the PC and the Proto Board
*) Didier Keymeulen, "Self-Repairing and Tuning Reconfigurable Electronics for Space"

Conclusion I.
► Reconfiguration allows us to create FT designs in FPGAs
► Reconfiguration based systems fight high area overhead
► Synchronization issues are mostly overlooked, but they have to be solved

Conclusion II.
► FPGA reconfiguration for space applications is unreliable due to the harsh environment
► Most approaches don't take industrial requirements into account
► Areas like aerospace or railway can benefit from reconfiguration

Thank you for your attention
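
Backup note for the "High-performance FT system" slide: a minimal software sketch of the idea, not the framework of Jacobs et al. It assumes a bitwise majority voter for the TMR mode and a single SEU-rate threshold for switching modes. The function names, the mode labels, and the threshold value are hypothetical and chosen only for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def select_mode(seu_rate: float, threshold: float) -> str:
    """Pick an operating mode from the currently expected SEU rate.

    Both the rate unit and the threshold are assumptions of this sketch.
    """
    return "TMR" if seu_rate >= threshold else "HIGH_PERFORMANCE"


def run(value: int, mode: str, upset_mask: int = 0) -> int:
    """Model one computation step; upset_mask flips bits in one replica."""
    if mode == "TMR":
        # Three replicas; only the first one is hit by the modeled upset.
        replicas = (value ^ upset_mask, value, value)
        return tmr_vote(*replicas)
    # Unprotected mode: a single copy, so an upset reaches the output.
    return value ^ upset_mask


if __name__ == "__main__":
    threshold = 1.0  # hypothetical switching threshold
    for rate in (0.1, 5.0):
        mode = select_mode(rate, threshold)
        result = run(0b1010, mode, upset_mask=0b1000)
        print(f"rate={rate} mode={mode} result={result:04b}")
```

In the TMR mode the single upset is masked by the voter; in the unprotected mode it reaches the output, which is why that mode only makes sense where the expected SEU rate is low.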