IPR: In-Place Reconfiguration
for FPGA Fault Tolerance
Zhe Feng1, Yu Hu1, Lei He1 and Rupak Majumdar2
1Electrical Engineering Department
2Computer Science Department
University of California, Los Angeles
Presented by Zhe Feng
Address comments to lhe@ee.ucla.edu
Outline

Introduction and motivation

Algorithms

Experimental Results

Conclusions
Soft Error

Soft errors can be caused by cosmic rays or noise
upsets

Future devices are more vulnerable due to scaling


Special session 1E “Resilient Computing”
Two types of soft errors in FPGA


Single Event Upset (SEU): Modification of the content of
memory bits
Single Event Transient (SET): Glitches latched by registers
SEU for FPGA

SEU of block memory can be detected and corrected by
row-based CRC and ECC

SEU of configuration memory can be fixed by
 Periodic memory scrubbing
 Scan-based CRC and ECC
Both may be too late, as the circuit function may have
been changed.
SER (Soft Error Rate)

SER is calculated by Monte Carlo simulation under a single-fault
model.

In each run, SER is the percentage of clock cycles with
observable errors at the primary outputs for a given test bench

The overall SER is the average of all runs.

SER ∝ 1/MTTF (mean time to failure)
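The SER estimate described on this slide can be sketched as follows. This is a minimal illustration, not the authors' tool: the two-LUT circuit, the names `lut`, `evaluate`, `simulate_run` and `estimate_ser`, and the run and cycle counts are all assumptions made for the example. Each run flips one randomly chosen configuration bit (single-fault model), drives random inputs, and records the fraction of cycles in which the primary output differs from the fault-free output; the overall SER is the average over runs.

```python
import random

# Hypothetical 2-LUT circuit (an assumption for illustration):
#   x   = LUT0(a, b)   configured as a AND b
#   out = LUT1(x, c)   configured as x OR c
# Each LUT is a list of 4 configuration bits indexed by (in1 << 1) | in0.
GOLDEN = [[0, 0, 0, 1],
          [0, 1, 1, 1]]

def lut(config, in1, in0):
    """Read the configuration bit selected by the two LUT inputs."""
    return config[(in1 << 1) | in0]

def evaluate(configs, a, b, c):
    """Evaluate the primary output of the toy circuit."""
    x = lut(configs[0], a, b)
    return lut(configs[1], x, c)

def simulate_run(rng, cycles=1000):
    """One Monte Carlo run: inject a single upset, return the error fraction."""
    faulty = [row[:] for row in GOLDEN]
    which_lut = rng.randrange(len(faulty))
    which_bit = rng.randrange(len(faulty[which_lut]))
    faulty[which_lut][which_bit] ^= 1  # single event upset on one bit
    errors = 0
    for _ in range(cycles):
        a, b, c = (rng.randint(0, 1) for _ in range(3))
        if evaluate(faulty, a, b, c) != evaluate(GOLDEN, a, b, c):
            errors += 1
    return errors / cycles

def estimate_ser(runs=200, seed=0):
    """Average the per-run error fraction over all runs."""
    rng = random.Random(seed)
    return sum(simulate_run(rng) for _ in range(runs)) / runs
```

Calling `estimate_ser()` returns a value between 0 and 1. A real flow would replace the toy circuit with the mapped netlist and the random inputs with the test bench.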
Impact of SEU for FPGA

FPGA has 10x bigger SER compared to ASIC
 Due to large configuration memory

SEU is one of the biggest challenges for FPGA-based
applications
 Most FPGAs are used in systems, not in prototypes
 One of the biggest applications is internet routers
 FPGA boards returned after two crashes
FPGA Resynthesis
RTL Synthesis → Logic Synthesis → Technology Mapping → Resynthesis → Packing → P&R
(Source: Andrew Ling, University of Toronto, DAC'05)

Resynthesis

Rewrites the circuit in logic or physical netlist

Reconfigures the LUTs
ROSE: RObust REsynthesis
[ICCAD’08]




ROSE performs iterative logic transformations with explicit
stochastic yield rate evaluation
Logic transformation by fault-tolerant Boolean matching
Fault-Tolerant Boolean Matching


Inputs
 Template H and Boolean function F for logic block
 Fault rates for the inputs and the SRAM bits of the template
Outputs
 Either a report that F cannot be implemented by template H
 Or a configuration of H that implements F and minimizes
the observable faults at the output of the template
Need of In-place Logic Optimization

ROSE, like most existing logic optimization techniques, does not
preserve the layout (topology) of a circuit design.

Interconnect dominates in FPGA

In-place resynthesis (IPR) leads to a faster design closure.

Minimal or no impact on the physical design
[Figure: design flow from high-level circuit description through logic synthesis, logic optimization (ROSE), and physical resynthesis (IPR) to bitstream, with fault info and timing info as inputs]
Our Major Contributions

Propose an in-place resynthesis algorithm, IPR

Maximize the yield rate for FPGAs

Preserve the topology of the logic network

Reduce the runtime complexity compared to other SAT-based
approaches

IPR reduces the fault rate by 48% and increases MTTF by 1.94X.

Compared to the state-of-the-art academic technology mapper
Berkeley ABC.

With the same area and performance.
Outline

Background

Algorithms

Experimental Results

Conclusions
IPR: In-place Reconfiguration
[Figure: a circuit in which input 1 of a LUT is highly defective (0 -> 1), with simulated bit patterns and configuration tables for LUT A and LUT B. Fault rates shown: 37.5% and 12.5%]
LUT A (index: input 1, input 0): C00=1, C01=1, C10=0, C11=0
LUT B (index: input 1, input 0): C00=1, C01=0, C10=1, C11=1
Maximize identical configuration bits for complementary inputs of an LUT.
Change the functions of multiple LUTs together so that the function of the
circuit remains unchanged.
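The objective of identical configuration bits for complementary inputs can be illustrated with a small sketch that uses the LUT A and LUT B configuration bits from this slide. The helper name `identical_pairs` and the dictionary layout are assumptions for the example. Two configuration bits form a pair when their indices differ only in the chosen input; if the bits in a pair are equal, a wrong value on that input does not change the LUT output for that pair.

```python
def identical_pairs(config, var, num_inputs=2):
    """Count index pairs that differ only in input `var` and hold equal bits.

    `config` maps a LUT index (input 1 is bit 1, input 0 is bit 0)
    to its configuration bit.
    """
    count = 0
    for idx in range(1 << num_inputs):
        if idx & (1 << var):
            continue  # visit each pair once, from the side where var = 0
        partner = idx | (1 << var)
        if config[idx] == config[partner]:
            count += 1
    return count

# Configuration bits as listed on the slide (C00, C01, C10, C11).
LUT_A = {0b00: 1, 0b01: 1, 0b10: 0, 0b11: 0}
LUT_B = {0b00: 1, 0b01: 0, 0b10: 1, 0b11: 1}

# Pairs that are insensitive to the highly defective input 1.
PAIRS_A = identical_pairs(LUT_A, var=1)
PAIRS_B = identical_pairs(LUT_B, var=1)
```

With these values, LUT A has no identical pair with respect to input 1, while LUT B has one, so LUT B masks a wrong value on input 1 in more cases.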
IPR algorithm
Circuit Analysis
 Initial Full-chip Functional Simulation
 Initial Full-chip ODC Mask Calculation
 Node Criticality Analysis
Cone Construction
In-place LUT Reconfiguration and Boolean Matching
Localized Update
 Localized Truth Table Update
 Localized ODC Mask Update
ODC Mask based Node Criticality
[Figure: logic network driving the primary outputs, with simulated bit patterns on the nodes; example ODC mask: 1010 (I. Markov, ICCAD’07)]
The ODC mask quantifies the impact of a node on the primary outputs.
The criticality of a node is defined as the percentage of ones in the
ODC mask, and it decides the priority of reconfiguration in IPR.
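The criticality definition above can be sketched in a few lines. The sketch assumes that the ODC mask holds one bit per simulation vector, set to 1 when flipping the node changes a primary output; this simulation-based construction, the toy circuit, and the names `criticality`, `odc_mask_by_simulation` and `toy_circuit` are assumptions for illustration, not the method of the cited work.

```python
def criticality(odc_mask):
    """Fraction of ones in the ODC mask."""
    return sum(odc_mask) / len(odc_mask)

def odc_mask_by_simulation(circuit, node, vectors):
    """Build a mask with a 1 for each vector where flipping `node` is observable."""
    mask = []
    for vec in vectors:
        golden = circuit(vec)
        flipped = circuit(vec, flip=node)
        mask.append(int(golden != flipped))
    return mask

def toy_circuit(vec, flip=None):
    """Hypothetical circuit: out = (a AND b) OR c, with internal node n = a AND b."""
    a, b, c = vec
    n = a & b
    if flip == "n":
        n ^= 1  # model a fault on node n
    return n | c

VECTORS = [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
MASK_N = odc_mask_by_simulation(toy_circuit, "n", VECTORS)
```

For the slide's example mask 1010, `criticality([1, 0, 1, 0])` gives 0.5. In the toy circuit, node n is hidden whenever c = 1, so only vectors with c = 0 contribute ones to its mask.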

Cone Construction


Select a subset SN of first-order fanout LUTs of n
Construct a cone for a selected root LUT
 Root LUT is a fanout of SN
 Include SN but not its first-order fanins
 Cut size of the cone is limited
[Figure: example network with LUTs a, b, c, d, e, node n, and the Root LUT]
In-place LUT Reconfiguration


The functions of LUTs in the cone are changed to increase the number of
identical configuration pairs
But the functions of the input/output nets and the topology of the internal nets are kept
unchanged
 No change of circuit function and layout
[Figure: example network with LUTs a, b, c, d, e, node n, and the Root LUT]
In-place Boolean Matching
The cone can be encoded as follows
[Figure: PLB template G with output z1, containing LUT1 (SRAM bits c0, c1, ..., c15) and LUT2 (SRAM bits c16, c17, ..., c31), with inputs x'1 x'2 x'3 x'4 and x'5 x'6 x'7; labels: PLB template, Boolean function, Conjunctive Normal Form (CNF)]
Truth table can be encoded as follows
[Table: truth table of Boolean function F over inputs x1 to x7, with rows F0, F1, F2, ..., F127]
To make a pair of configuration bits (ci, cj)
in LUT L symmetric, we have
Combining all the three, we have the CNF
formulation for in-place Boolean matching (IP-BM).
IP-BM preserves both the logic function and topology of the cone.
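As a rough illustration of the kind of clauses involved, the sketch below builds a standard CNF encoding of a single LUT (output equals the selected configuration bit) and of an equality constraint between two configuration bits. It is a textbook-style encoding and may differ from the formulation on the slide; the function names, the integer variable numbering, and the brute-force checker are assumptions for the example.

```python
from itertools import product

def lut_clauses(inputs, config, out):
    """CNF clauses stating out == config[index], where inputs[0] is index bit 0.

    Variables are positive integers; a negative literal denotes negation.
    """
    clauses = []
    for idx in range(1 << len(inputs)):
        # These literals are all false exactly when the inputs select entry idx.
        mismatch = [(-v if (idx >> k) & 1 else v) for k, v in enumerate(inputs)]
        c = config[idx]
        clauses.append(mismatch + [-c, out])  # entry selected and c = 1 -> out = 1
        clauses.append(mismatch + [c, -out])  # entry selected and c = 0 -> out = 0
    return clauses

def equal_clauses(ci, cj):
    """CNF clauses forcing ci == cj, i.e. a symmetric configuration pair."""
    return [[-ci, cj], [ci, -cj]]

def satisfied(clauses, assignment):
    """Check an assignment (dict: variable -> bool) against all clauses."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def all_assignments(num_vars):
    """Enumerate every assignment over variables 1..num_vars."""
    for bits in product([False, True], repeat=num_vars):
        yield dict(zip(range(1, num_vars + 1), bits))
```

In a full flow, clauses of this kind for every LUT in the cone, for the required truth table, and for the chosen symmetric pairs would be handed to a SAT solver; the brute-force helpers here are only meant for checking small cases.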
Outline

Background

Algorithms

Experimental Results

Conclusions
Experimental Settings and CAD Flows

Implemented in C++, using miniSAT 2.0
as the SAT solver


Results collected on an Ubuntu workstation
with a 2.6GHz Xeon CPU and 2GB memory
QUIP benchmarks are tested

Mapped with 4-LUTs by Berkeley ABC

Perform and compare the following
synthesis flows: ABC, IPR, ROSE+IPR
Experimental Settings and CAD Flows (Cont’d)

Fault model
 During IPR: uniform soft error rate for all configuration bits in LUTs; interconnect configuration bits are ignored.
 During validation: uniform soft error rate for all configuration bits in LUTs and interconnect.

The fault rate of the chip is calculated by Monte Carlo
simulation
 Single fault injection for all configuration bits in LUTs and
interconnect
 32k random inputs
Full-chip Fault Rate by Monte Carlo Simulation
[Bar chart: full-chip fault rate (axis 0.00% to 6.00%) per QUIP benchmark for ABC, IPR, and ROSE+IPR]
59% fault rate reduction!
ABC vs. IPR vs. ROSE+IPR: 1 : 0.52 : 0.51
Area (LUT#)
[Bar chart: LUT# (axis 0 to 2000) per QUIP benchmark for ABC, IPR, and ROSE+IPR]
ABC vs. IPR vs. ROSE+IPR: 1 : 1 : 0.81
Estimation of Mean Time To Failure
[Bar chart: MTTF ratio (axis 0.00 to 3.00): ABC 1.00, IPR 1.94, ROSE 2.39, ROSE+IPR 2.40]
50x faster!

The best flow in terms of the robustness
and area is ROSE+IPR
Conclusions

We develop an in-place resynthesis algorithm, IPR.

Increases MTTF by 2X over ABC;

Preserves the topology of the logic network for a faster design
closure;


Complementary to existing fault-tolerant resynthesis algorithms.
In the future, we will

Run experiments that assume multiple uncorrelated faults, and faults with given
correlations;

Extend IPR with criticality that considers interconnects explicitly.
Thank You!
Backup Slides
Criticality for Configuration Bit


Depends on two criteria:

One is a sequence of input vectors for the LUT.

The other is the ODC mask of the LUT.
The criticality of a configuration bit c :
Localized Update

Localized update of ODC mask reduces runtime
Maximum Fanin Cone CMFI: CMFI is affected, but the ODC mask is not updated to save time.
Reconfigured Cone CR: ODC mask updated for CR.
Maximum Fanout Cone CMFO: CMFO is not affected, so the ODC mask does not need to be updated.
Key to stochastic synthesis: Logic Masking


Defects are created equally but not propagated equally
Logic don’t-cares may mask the propagation of defects
[Figure: a gate with a defect on one signal; labels: "Not affected by defects!" and "Observability Don’t-cares with a=1&b=1"]

We can maximize don’t-cares while keeping the logic function.
IPR Enhancement

Iterative (i.e., random) algorithm without greedy
procedure based on criticality


Provide different ordering for optimization of gates

Without periodic yield rate evaluation

With periodic yield rate evaluation
Large cut size

Increase the opportunity to find the feasible cone.
IPR Enhancement (Cont’d)

Extend to MIMO

MISO
MIMO
Increase the opportunity to try more LUTs