Reconfiguration Based Fault Tolerant Systems Design

Reconfiguration Based Fault-Tolerant Systems Design - Survey of Approaches
Jan Balach, Ondřej Novák
FIT, CTU in Prague
MEMICS 2010
Outline
► Introduction
► FPGAs and SEU
► Reconfiguration based Fault-Tolerant designs
 Improved testing
 FT structures based on partial reconfiguration
 High-performance FT design
 Transistor & gate level reconfiguration
► Flash-based FPGAs
► Reconfigurable Electronics for Space
► Conclusion
Introduction
► SRAM-based FPGAs
► The FPGA is the most widely used platform for developing new designs and systems
► FPGA dependability and reliability are the most discussed issues
FPGAs and SEU
► FPGAs are sensitive to natural radiation effects; the most discussed are the so-called Single Event Upsets (SEUs)
► An SEU can impact an FPGA in different ways:
 Changing the configuration memory
 Generating a pulse on the interconnect
 Causing latch-up
 Affecting the non-programmed part of the FPGA
 Affecting clock domain distribution
► Each situation requires a specific solution
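The first effect in the list above, a change of configuration memory, can be illustrated with a toy model. This is only a sketch: it assumes a 2-input LUT whose truth table is stored as 4 configuration bits, and the names `lut_eval`, `inject_seu` and `AND_CONFIG` are hypothetical, not taken from any vendor tool.

```python
# Toy model: a 2-input LUT is a 4-bit truth table held in configuration memory.
# All names are illustrative assumptions, not a real FPGA API.

def lut_eval(config_bits: int, a: int, b: int) -> int:
    """Return the LUT output for inputs a and b (each 0 or 1)."""
    index = (b << 1) | a
    return (config_bits >> index) & 1


def truth_table(config_bits: int) -> list:
    """List the outputs for (a, b) = (0,0), (1,0), (0,1), (1,1)."""
    return [lut_eval(config_bits, a, b) for b in (0, 1) for a in (0, 1)]


def inject_seu(config_bits: int, bit: int) -> int:
    """Flip a single configuration bit, as an SEU would."""
    return config_bits ^ (1 << bit)


AND_CONFIG = 0b1000  # output is 1 only when a = 1 and b = 1

golden = truth_table(AND_CONFIG)
upset = truth_table(inject_seu(AND_CONFIG, 1))
```

With these assumptions, one flipped bit turns the AND gate into a function that simply follows input `a`. The logic stays wrong until the configuration bit is rewritten, which is why this kind of fault calls for reconfiguration.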
Reconfiguration based Fault-Tolerant designs
► Fault-tolerant design = redundancy
► Redundancy alone protects the design only for a limited time, because faults accumulate
► We have to use reconfiguration to maintain the FPGA's FT parameters
► There are different ways to use reconfiguration to achieve an FT design
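Why redundancy only helps for a limited time can be sketched with triple modular redundancy (TMR). The example below is a minimal illustration, not a method from the surveyed papers: the `Replica` class, its 8-bit increment function and the worst-case fault model (a faulty replica inverts all output bits) are assumptions made for the demonstration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three replica outputs."""
    return (a & b) | (a & c) | (b & c)


class Replica:
    """One copy of the protected logic (hypothetical 8-bit increment)."""

    def __init__(self) -> None:
        self.faulty = False

    def compute(self, x: int) -> int:
        correct = (x + 1) & 0xFF
        # Assumption: a faulty replica inverts every output bit.
        return correct ^ 0xFF if self.faulty else correct

    def reconfigure(self) -> None:
        # Stand-in for rewriting the replica's configuration.
        self.faulty = False


def tmr_step(replicas, x, repair):
    """Vote once; optionally reconfigure replicas that disagree with the vote."""
    outputs = [r.compute(x) for r in replicas]
    voted = majority(*outputs)
    for replica, out in zip(replicas, outputs):
        if repair and out != voted:
            replica.reconfigure()
    return voted


def run(repair: bool):
    """Inject a fault into replica 0, then later into replica 1."""
    replicas = [Replica() for _ in range(3)]
    replicas[0].faulty = True
    first = tmr_step(replicas, 5, repair)
    replicas[1].faulty = True
    second = tmr_step(replicas, 5, repair)
    return first, second
```

Under these assumptions, `run(False)` masks the first fault but returns a wrong value once the second replica fails, while `run(True)` stays correct because the first faulty replica was reconfigured before the second fault arrived.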
Improved testing I.
► Testing is an important part of a dependable design flow
► Testing allows us to:
 Prove correct functionality of the design
 Localize faults
 Prevent latent faults
Improved testing II.
► BIST architecture based on reconfiguration
► Improved Test Access Mechanism
► Can incur high overhead caused by bus macros
Picture from: Rozkovec, M., Novak, O., “Structural test of programmed FPGA circuits"
FT structures based on partial reconfiguration I.
► Reconfiguration offers various options for implementing an FT design
► The basic idea is to divide the design into smaller parts which can be reconfigured or replaced
► The smaller the parts, the bigger the overhead, so we need to find a trade-off
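The trade-off in the last bullet can be made concrete with a deliberately simple model. It is an assumption for illustration only: the design is split evenly into `n_modules` parts, each part pays a fixed `interface_cost` (for example for bus macros), and a fault affects exactly one part. The numbers are hypothetical and not measured data.

```python
def partition_tradeoff(total_area: int, n_modules: int, interface_cost: int):
    """Return (overhead_fraction, fault_impact_fraction) for an even split.

    overhead_fraction: extra area spent on module interfaces, relative to the design.
    fault_impact_fraction: share of the design that one fault takes out of service.
    """
    overhead = n_modules * interface_cost / total_area
    fault_impact = 1 / n_modules
    return overhead, fault_impact


# Hypothetical design of 1000 area units with 10 units of interface cost per module.
table = {n: partition_tradeoff(1000, n, 10) for n in (1, 4, 16, 64)}
```

In this model the overhead grows linearly with the number of modules while the area hit by a single fault shrinks as `1 / n_modules`, so the two goals pull in opposite directions.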
FT structures based on partial reconfiguration - App. A*
► Each application is divided into many small, so-called partial reconfigurable modules
► Reconfiguration is supervised by a partial reconfiguration controller
► Good fault localization and a fault impacts a smaller area of the design, but the approach can incur high HW overhead (bus macros) and has synchronization issues after reconfiguration
*) Straka M., Kastil J., Kotasek Z., "Fault Tolerant Structure for SRAM-based FPGA via Partial Dynamic Reconfiguration"
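The supervising role of the controller can be sketched as a polling loop. This is not the controller from the cited paper: `Module`, `Controller`, the `error_flag` raised by an assumed on-line checker, and the explicit resynchronization step are all hypothetical names chosen to mirror the bullets above.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    error_flag: bool = False  # raised by a hypothetical on-line checker
    in_sync: bool = True      # False until state is restored after reconfiguration


@dataclass
class Controller:
    modules: list
    log: list = field(default_factory=list)

    def reconfigure(self, m: Module) -> None:
        # Stand-in for loading only this module's partial bitstream.
        m.error_flag = False
        m.in_sync = False
        self.log.append(f"reconfigured {m.name}")

    def resynchronize(self, m: Module) -> None:
        # Stand-in for bringing the fresh module back in step with the design.
        m.in_sync = True
        self.log.append(f"resynchronized {m.name}")

    def service(self) -> int:
        """Repair every flagged module and return how many were repaired."""
        repaired = 0
        for m in self.modules:
            if m.error_flag:
                self.reconfigure(m)
                self.resynchronize(m)
                repaired += 1
        return repaired
```

Only the flagged module is touched, which reflects the good fault localization; the separate `resynchronize` step marks where the synchronization issue has to be handled.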
FT structures based on partial reconfiguration - App. B*
*) Borecky J., Kohlik M., Kubatova H., Kubalik P., “Fault Coverage Improvement based on Fault Simulation and Partial Duplication"
FT structures based on partial reconfiguration - App. B
► A fault impacts a relatively big part of the design
► The resulting HW overhead is smaller
► Synchronization has to be solved after reconfiguration
FT structures based on partial reconfiguration - App. C*
► A self-repairing dual-FPGA architecture is used
► The design is divided into columns; spare columns provide the self-repair ability
► A soft microcontroller evaluates flags from the second FPGA; in case of an error, the faulty FPGA is reconfigured by the other one
► Achieves a good trade-off between overhead and fault localization
► Using the same bitstream in both FPGAs can be risky
*) S. Mitra, W.-J. Huang, N. R. Saxena, S.-Y. Yu, E.J. McCluskey, "Reconfigurable Architecture for Autonomous Self Repair"
High-performance FT system*
► SEU dosage varies with position on the orbit, so we can use reconfiguration to switch modes
► When the SEU density is lower, we can switch to a high-performance or power-saving mode
► Using the high-performance mode speeds up computation by 2.3x compared to standard TMR
*) Jacobs, A., George, A.D., Cieslewski, G., "Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space"
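The mode switching described above can be sketched as a simple threshold rule. This is an illustrative assumption, not the policy of the cited framework: the `Mode` names, the `threshold` parameter and the sample upset rates (arbitrary units) are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    TMR = "TMR"
    HIGH_PERFORMANCE = "high-performance"


def select_mode(upset_rate: float, threshold: float) -> Mode:
    """Use full TMR when the expected upset rate is high, otherwise trade protection for speed."""
    return Mode.TMR if upset_rate >= threshold else Mode.HIGH_PERFORMANCE


def plan_modes(upset_rates, threshold):
    """Return the mode for each orbit position and the number of reconfigurations needed."""
    modes = [select_mode(r, threshold) for r in upset_rates]
    switches = sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)
    return modes, switches


# Hypothetical upset rates sampled along one orbit.
modes, switches = plan_modes([0.1, 0.2, 5.0, 6.0, 0.3], threshold=1.0)
```

Each change of mode corresponds to one reconfiguration of the device, so a practical policy would also have to weigh the cost of switching against the time spent in each region of the orbit.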
Transistor and gate level reconfiguration*
► Reconfiguration is performed on the transistor/gate level
► Redundant N/P diffusions can tolerate faults in silicon
[Figure, panels a) and b): gate schematics between VDD and GND with inputs in1 and in2 and output out; one panel marks the redundant transistors]
*) H. T. Vierhaus, "Transistor and Gate Level Self Repair for Logic Circuits"
Transistor and gate level reconfiguration
► Replacing the whole faulty gate
► The resulting HW overhead is between 30% and 120%
► Requires supervision in the layout
Taken from: H. T. Vierhaus, "Transistor and Gate Level Self Repair for Logic Circuits"
Flash-based FPGA
► Configuration is stored in Flash memory
► An alternative platform for developing FT designs
► Intrinsically SEU-hard configuration memory
► Slower than SRAM-based FPGAs
► Higher voltage is required for programming
Reconfigurable Electronics for Space*
► NASA rovers on Mars
► On-board Xilinx FPGA
► Reconfiguration performed by ASIC Analog/Digital SRAAs
► The FPGA implements a digital interface between the PC and the Proto Board
*) Didier Keymeulen, "Self-Repairing and Tuning Reconfigurable Electronics for Space"
Conclusion I.
► Reconfiguration allows us to create FT designs in FPGAs
► Reconfiguration-based systems struggle with high area overhead
► Synchronization issues are mostly overlooked, but they have to be solved
Conclusion II.
► FPGA reconfiguration for space applications is unreliable due to the harsh environment
► Most approaches don't take industrial requirements into account
► Areas like aerospace or railway can benefit from reconfiguration
Thank you for your attention