IPR: In-Place Reconfiguration for FPGA Fault Tolerance
Zhe Feng¹, Yu Hu¹, Lei He¹ and Rupak Majumdar²
¹Electrical Engineering Department, ²Computer Science Department
University of California, Los Angeles
Presented by Zhe Feng
Address comments to lhe@ee.ucla.edu

Outline
  - Introduction and motivation
  - Algorithms
  - Experimental Results
  - Conclusions

Soft Error
  - Soft errors can be caused by cosmic rays or noise upsets
    - Future devices are more vulnerable due to scaling
    - Special session 1E "Resilient Computing"
  - Two types of soft errors in FPGA
    - Single Event Upset (SEU): modification of the content of memory bits
    - Single Event Transient (SET): glitches latched by registers

SEU for FPGA
  - SEU of block memory can be detected and corrected by row-based CRC and ECC
  - SEU of configuration memory can be fixed by
    - Periodic memory scrubbing
    - Scan-based CRC and ECC
    - Both may be too late, as the circuit function may have been changed

SER (Soft Error Rate)
  - SER is calculated by Monte Carlo simulation under the single-fault model
    - In each run, SER is the percentage of clock cycles with observable errors at the primary outputs for a given test bench
    - The overall SER is the average over all runs
  - SER ∝ 1/MTTF (mean time to failure)

Impact of SEU for FPGA
  - FPGA has 10x higher SER compared to ASIC
    - Due to large configuration memory
  - SEU is one of the biggest challenges for FPGA-based applications
    - Most FPGAs are used in systems, not prototypes
    - One of the biggest applications is internet routers
    - FPGA boards are returned after two crashes

FPGA Resynthesis
  - Flow: RTL Synthesis -> Logic Synthesis -> Technology Mapping -> Resynthesis -> Packing -> P&R (Source: Andrew Ling, University of Toronto, DAC'05)
  - Resynthesis
    - Rewrites the circuit in logic or physical netlist
    - Reconfigures the LUTs

ROSE: RObust REsynthesis [ICCAD'08]
  - ROSE performs iterative logic transformations with explicit stochastic yield rate evaluation
  - Logic transformation by fault-tolerant Boolean matching

Fault-Tolerant Boolean Matching
  - Inputs
    - Template H and Boolean function F for the logic block
    - Fault rates for the inputs and the SRAM bits of the template
  - Outputs
    - Either that F cannot be implemented by template H
    - Or the configuration of H that implements F and minimizes the observable faults at the output of the template

Need of In-place Logic Optimization
  - ROSE, same as most existing logic optimization techniques, does not preserve the layout (topology) of a circuit design
    - Interconnect dominates in FPGA
  - In-place resynthesis (IPR) leads to a faster design closure
    - Minimal or no impact on the physical design
  [Figure: design flow with stages High-level circuit description, Logic synthesis, Logic optimization, Physical resynthesis, Bitstream; annotations: ROSE, IPR, Fault info, Timing info]

Our Major Contributions
  - Propose an in-place resynthesis algorithm, IPR
    - Maximize the yield rate for FPGAs
    - Preserve the topology of the logic network
    - Reduce the runtime complexity compared to other SAT-based approaches
  - IPR reduces the fault rate by 48% and increases MTTF by 1.94X
    - Compared to the state-of-the-art academic technology mapper Berkeley ABC
    - With the same area and performance
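As a rough consistency check on the two headline numbers, the relation SER ∝ 1/MTTF can be applied to the reported fault-rate reduction. This is a back-of-the-envelope sketch only: it assumes the full-chip fault rate can stand in for SER, and the symbol r for the relative fault rate is introduced here for illustration.

```latex
\[
r = \frac{\mathrm{SER}_{\mathrm{IPR}}}{\mathrm{SER}_{\mathrm{ABC}}} = 1 - 0.48 = 0.52,
\qquad
\frac{\mathrm{MTTF}_{\mathrm{IPR}}}{\mathrm{MTTF}_{\mathrm{ABC}}}
  \approx \frac{1}{r} = \frac{1}{0.52} \approx 1.92 .
\]
```

The estimate of about 1.92X is close to the reported 1.94X. The slides do not say how the reported ratio was aggregated, so the small gap is left unexplained here.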
Outline
  - Background
  - Algorithms
  - Experimental Results
  - Conclusions

IPR: In-place Reconfiguration
  - Maximize identical configuration bits for complementary inputs of an LUT
  - Change the functions of multiple LUTs to guarantee the function of the circuit unchanged
  [Figure: two-LUT example with a highly defective (0 -> 1) input and simulation bit vectors on the nets; fault rates shown: 37.5% and 12.5%]
  LUT A configuration (address = input 1, input 0):
    00: C00=1
    01: C01=1
    10: C10=0
    11: C11=0
  LUT B configuration (address = input 1, input 0):
    00: C00=1
    01: C01=0
    10: C10=1
    11: C11=1

IPR algorithm
  - Circuit Analysis
    - Initial Full-chip Functional Simulation
    - Initial Full-chip ODC Mask Calculation
    - Node Criticality Analysis
  - Cone Construction
  - In-place LUT Reconfiguration and Boolean Matching
  - Localized Update
    - Localized Truth Table Update
    - Localized ODC Mask Update

ODC Mask based Node Criticality
  [Figure: logic network with simulation bit vectors at the primary outputs; example node with ODC mask 1010 (I. Markov, ICCAD'07)]
  - The ODC mask quantifies the impact of a node on the primary outputs
  - The criticality of a node is defined as the percentage of ones in the ODC mask, and decides the priority of reconfiguration in IPR
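The criticality definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny network (n = a AND b, out = n OR c) and the helper names simulate, odc_mask, and criticality are hypothetical and not from the slides, and the ODC mask is obtained here by flipping the node value for each input pattern and checking whether the primary output changes.

```python
from itertools import product

def simulate(a, b, c, flip_n=False):
    """Hypothetical example network: n = a AND b, out = n OR c."""
    n = a & b
    if flip_n:
        n ^= 1              # inject a fault at internal node n
    return n | c            # single primary output

def odc_mask(patterns):
    """Bit i is 1 when flipping n changes the output for pattern i."""
    return [int(simulate(*p) != simulate(*p, flip_n=True)) for p in patterns]

def criticality(mask):
    """Fraction of ones in the ODC mask."""
    return sum(mask) / len(mask)

patterns = list(product([0, 1], repeat=3))   # all (a, b, c) input vectors
mask = odc_mask(patterns)
print(mask, criticality(mask))
```

In this toy network the node n is masked whenever c = 1, so half of the mask bits are zero and the criticality is 0.5. A node with a higher fraction of ones would be given higher reconfiguration priority.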
Cone Construction
  - Select a subset SN of first-order fanout LUTs of n
  - Construct a cone for a selected root LUT
    - Root LUT is a fanout of SN
    - Include SN but not its first-order fanins
    - Cut size of the cone is limited
  [Figure: example cone with nodes a, b, c, d, e, n and Root]

In-place LUT Reconfiguration
  - The functions of LUTs in the cone are changed to increase the number of identical configuration pairs
  - But the functions of input/output nets and the topology of internal nets are kept unchanged
    - No change of circuit function and layout
  [Figure: example cone with nodes a, b, c, d, e, n and Root]

In-place Boolean Matching
  - The cone (PLB template) can be encoded in Conjunctive Normal Form (CNF) as follows: [equation in original slide]
  - The truth table (Boolean function) can be encoded as follows: [equation in original slide]
  - To make a pair of configuration bits (ci, cj) in LUT L symmetric, we have: [equation in original slide]
  - Combining all the three, we have the CNF formulation for in-place Boolean matching (IP-BM)
  - IP-BM preserves both the logic function and topology of the cone
  [Figure: PLB template G with LUT1 (SRAM bits c0 to c15) and LUT2 (SRAM bits c16 to c31), inputs x'1 to x'7, output z1; truth table of F over x1 to x7 with rows F0 to F127]

Outline
  - Background
  - Algorithms
  - Experimental Results
  - Conclusions

Experimental Settings and CAD Flows
  - Implemented in C++, using miniSAT2.0 as the SAT solver
  - Results collected on an Ubuntu workstation with a 2.6GHz Xeon CPU and 2GB memory
  - QUIP benchmarks are tested
    - Mapped with 4-LUTs by Berkeley ABC
  - Perform and compare the following synthesis flows: ABC, IPR, ROSE+IPR

Experimental Settings and CAD Flows (Cont'd)
  - Fault model
    - During IPR: uniform soft error rate for all configuration bits in LUTs, ignoring interconnect configuration bits
    - During validation: uniform soft error rate for all configuration bits in LUTs and interconnect
  - The fault rate of the chip is calculated by Monte Carlo simulation
    - Single fault injection for all configuration bits in LUTs and interconnect
    - 32k random inputs

Full-chip Fault Rate by Monte Carlo Simulation
  [Figure: full-chip fault rate (0% to 6%) per QUIP benchmark for ABC, IPR, and ROSE+IPR]
  - ABC vs. IPR vs. ROSE+IPR: 1 : 0.52 : 0.51
  - 59% fault rate reduction!

Area (LUT#)
  [Figure: LUT count (0 to 2000) per QUIP benchmark for ABC, IPR, and ROSE+IPR]
  - ABC vs. IPR vs. ROSE+IPR: 1 : 1 : 0.81

Estimation of Mean Time To Failure
  [Figure: bar chart of MTTF ratio (axis 0.00 to 3.00): ROSE+IPR 2.40, ROSE 2.39, IPR 1.94, ABC 1.00; annotation: "50x faster!"]
  - The best flow in terms of robustness and area is ROSE+IPR

Conclusions
  - We develop an in-place resynthesis algorithm, IPR, which
    - Increases MTTF by 2X over ABC;
    - Preserves the topology of the logic network for a faster design closure;
    - Is complementary to existing fault-tolerant resynthesis algorithms.
  - In the future, we will consider
    - Experiments that assume multiple uncorrelated faults, or given correlations between faults;
    - Extending IPR with criticality that considers interconnects explicitly.

Thank You!

Backup Slides

Criticality for Configuration Bit
  - Depends on two criteria:
    - One is a sequence of input vectors for the LUT
    - The other is the ODC mask of the LUT
  - The criticality of a configuration bit c: [equation in original slide]

In-place Boolean Matching
  - The cone (PLB template) can be encoded in Conjunctive Normal Form (CNF) as follows: [equation in original slide]
  - The truth table (Boolean function) can be encoded as follows: [equation in original slide]
  - To make a pair of configuration bits (ci, cj) in LUT L symmetric, we have: [equation in original slide]
  - Combining all the three, we have the CNF formulation for in-place Boolean matching (IP-BM)
  - IP-BM preserves both the logic function and topology of the cone
  [Figure: PLB template G with LUT1 (SRAM bits c0 to c15) and LUT2 (SRAM bits c16 to c31), inputs x'1 to x'7, output z1; truth table of F over x1 to x7 with rows F0 to F127]
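To make the notion of a symmetric (identical) configuration pair concrete, here is a minimal Python sketch. It counts, for one LUT input, the pairs of configuration bits whose addresses differ only in that input and that hold the same value. The function name identical_pairs and the list encoding are illustrative assumptions, not the authors' implementation; the two configurations are the LUT A and LUT B truth tables from the IPR example slide, indexed by (input 1, input 0).

```python
def identical_pairs(config, k):
    """Count pairs (c_i, c_j) whose addresses differ only in input k
    and whose configuration bits are equal.

    config[addr] is the LUT bit stored at address addr;
    len(config) is assumed to be a power of two.
    """
    count = 0
    for i in range(len(config)):
        j = i | (1 << k)                 # partner address with bit k set
        if i != j and config[i] == config[j]:   # only visit i with bit k clear
            count += 1
    return count

# 2-input LUTs from the slides: [C00, C01, C10, C11]
lut_a = [1, 1, 0, 0]
lut_b = [1, 0, 1, 1]

for name, cfg in (("LUT A", lut_a), ("LUT B", lut_b)):
    print(name, [identical_pairs(cfg, k) for k in (0, 1)])
```

A flip on input k makes the LUT read the partner bit of the pair, so an identical pair leaves the LUT output unchanged for those two addresses. This sketch only counts such pairs; it does not reproduce the fault rates shown in the figure, which depend on the surrounding circuit and the simulation vectors.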
Localized Update
  - Localized update of the ODC mask reduces runtime
    - Maximum Fanin Cone (CMFI): CMFI is affected, but the ODC mask is not updated to save time
    - Reconfigured Cone (CR): the ODC mask is updated for CR
    - Maximum Fanout Cone (CMFO): CMFO is not affected, so the ODC mask does not need to be updated

Key to stochastic synthesis: Logic Masking
  - Defects are created equally but not propagated equally
  - Logic don't-cares may mask the propagation of defects
  - We can maximize don't-cares while keeping the logic function
  [Figure: example with a defect that does not affect the output; observability don't-cares with a=1 & b=1]

IPR Enhancement
  - Iterative (i.e., random) algorithm without greedy procedure based on criticality
    - Provide different ordering for optimization of gates
    - Without periodic yield rate evaluation
    - With periodic yield rate evaluation
  - Large cut size
    - Increase the opportunity to find the feasible cone

IPR Enhancement (Cont'd)
  - Extend to MIMO
    - Increase the opportunity to try more LUTs
  [Figure: MISO vs. MIMO cones]