03_7_FPLD_challenges

Recent Challenges
Soft Errors
• Scaling:
 ▪ SEU (single-event upset):
   − Ionizing radiation corrupts stored data
 ▪ Cause:
   − Radioactive impurities in device packages
   − Recently: cosmic radiation
 ▪ Scaling worsens SEU:
   1. Voltage scaling + reduced node capacitances
      → lower the charge threshold necessary to corrupt the data
   2. Greater level of integration
      → increases the likelihood that soft errors will affect the device
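A rough way to see why voltage and capacitance scaling hurt: to first order, the charge needed to flip a storage node is about node capacitance times supply voltage. That approximation and the numbers below are assumptions for illustration, not values from the slides.

```python
def critical_charge_fc(node_cap_ff: float, vdd: float) -> float:
    """First-order critical charge in femtocoulombs: Q_crit ~ C_node * V_dd.

    node_cap_ff is in femtofarads and vdd in volts, so the product is in fC.
    """
    return node_cap_ff * vdd


# Hypothetical values for an older and a newer technology generation.
older = critical_charge_fc(node_cap_ff=2.0, vdd=1.8)
newer = critical_charge_fc(node_cap_ff=1.0, vdd=1.0)

# The newer node needs less deposited charge to be upset.
print(older, newer)
```

Under these assumed numbers the scaled node needs well under a third of the charge of the older one to be upset.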
SEU
• Sources:
 ▪ Configuration memory
 ▪ Flip-flops
 ▪ Memory blocks
 ▪ Combinational circuits (transient error → permanent if latched)
SEU in Configuration Memory
• SEU in configuration bits (SRAM-based):
 ▪ In Virtex FPGAs, ~91% of the bits sensitive to soft errors are configuration bits
   − Flash- or antifuse-based FPGAs do not suffer from this
 ▪ Any change to the configuration memory may alter the functionality
 ▪ The error persists until the FPGA is reprogrammed
SEU Mitigation Techniques
• Mitigation techniques:
1. Circuit and technology level:
   − Adding metal capacitors to nodes in the memory increases the amount of charge necessary to cause an SEU
2. System level:
   − Ensures that the system can detect and recover.
   − Devices regularly verify their configuration memory by comparing the current values with the desired configuration state using cyclic redundancy checks (Altera Stratix III)
3. User level:
   a) TMR (triple modular redundancy):
   − Replicating a design three times and voting among outputs
   − Reduce the sensitivity to soft errors in the design by careful selection of the resources used
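The system-level check and the TMR voter can be sketched in a few lines. This is a software illustration only: the use of CRC-32 from Python's zlib, the two-byte "configuration frame", and the function names are assumptions, not the mechanism used in any particular device.

```python
import zlib


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs (TMR voter)."""
    return (a & b) | (a & c) | (b & c)


def config_is_intact(readback: bytes, golden_crc: int) -> bool:
    """Compare the CRC of read-back configuration data with the stored CRC."""
    return zlib.crc32(readback) == golden_crc


# Hypothetical configuration frame and its reference CRC.
golden = bytes([0b10110010, 0b01000001])
golden_crc = zlib.crc32(golden)

# The same frame with a single bit flipped, as an SEU would do.
upset = bytes([golden[0] ^ 0b00000100, golden[1]])

print(config_is_intact(golden, golden_crc))   # True
print(config_is_intact(upset, golden_crc))    # False
print(majority_vote(0b1010, 0b1010, 0b0010))  # 10: the upset replica is outvoted
```

The CRC only detects the upset; recovery still needs the configuration to be rewritten. The voter masks a single faulty replica without any detection step.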
Circuit Level
• [Ebrahimi]:
 ▪ Reduce the number of SRAM cells in a switch box (6 → 5) (6 → 4)
[Figure: switch box with tracks 0–3; sides N, W, E, S; switch labels a–f and w–z]
User Design Level
• Care bits [Golshan07]:
 ▪ Only a subset of the configuration bits affects the design when hit by an SEU.
• Resource A is used for net A
 ▪ A-B SRAM bit is not a care bit if B is not used by other nets.
 ▪ A-C SRAM bit is a care bit (a change to ‘1’ hurts net A).
 ▪ A-D SRAM bit is not a care bit (w.r.t. net A) if D is not used.
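The care-bit rule for the A-B, A-C and A-D switches can be written as a small predicate. The sketch assumes a simplified model in which each switch bit is currently '0' and joins two routing resources, and a hypothetical `owner` map records which net uses each resource. It is not the data structure of [Golshan07].

```python
def is_care_bit(res_a: str, res_b: str, owner: dict) -> bool:
    """An unprogrammed switch bit is a care bit if flipping it to '1'
    would connect two resources that are used by different nets."""
    net_a = owner.get(res_a)
    net_b = owner.get(res_b)
    return net_a is not None and net_b is not None and net_a != net_b


# Resource A carries net A, C carries another net, B and D are unused.
owner = {"A": "netA", "C": "netC"}
switches = [("A", "B"), ("A", "C"), ("A", "D")]

care = [s for s in switches if is_care_bit(s[0], s[1], owner)]
print(care)  # [('A', 'C')]
```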
User Design Level
• Soft Error Routing Problem [Golshan07]:
 ▪ Given a routing graph and a set of multi-terminal nets, route each net with the least care-cost, where care-cost is the number of routing care bits.
• Experiments:
 ▪ 14% reduction in the number of care bits
   − ~80% of soft errors in the FPGA: configuration memory [Kuon07]
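For a single two-terminal connection, routing with the least care-cost reduces to a shortest-path search in which each edge is weighted by the number of care bits it adds. The sketch below uses plain Dijkstra on a hypothetical four-node graph. The multi-terminal case and the actual algorithm of [Golshan07] are not covered here.

```python
import heapq


def least_care_cost_path(graph, source, sink):
    """Dijkstra over a routing graph; graph[u] = [(v, care_bits), ...].

    Returns (total care-cost, path) or (None, []) if sink is unreachable.
    """
    best = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == sink:
            break
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, care_bits in graph.get(u, []):
            new_cost = cost + care_bits
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if sink not in best:
        return None, []
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[sink], path[::-1]


# Hypothetical routing graph: edge weights are care bits added by each segment.
graph = {
    "src": [("w1", 3), ("w2", 1)],
    "w1": [("dst", 1)],
    "w2": [("w3", 1)],
    "w3": [("dst", 1)],
}
cost, path = least_care_cost_path(graph, "src", "dst")
print(cost, path)  # 3 ['src', 'w2', 'w3', 'dst']
```

The longer path wins here because it passes fewer care bits, which is the trade-off the care-cost objective is meant to expose.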
Recent Challenges
Process Variation
Process Variation Sources
[Figure: Leff versus wafer X and wafer Y position (axis scale ×10^-7); source: IBM, Intel and TSMC]
Variation Variations
• Variation of variation over years
ILD: inter-layer dielectric
• Variation from mean value
− Gate oxides are so thin that a change of one atom
can cause a 25 percent difference in substrate
current.
− EE Times (04/11/2006)
Statistical Description
 ▪ The combined set of underlying deterministic and random contributions is lumped into a combined “random” statistical description.
 ▪ The distribution (mean and variance) of L over the devices on one wafer can differ from the distribution over the devices within a single die.
Inter-die vs. Intra-die Variations
[Figure: Leff variation. Inter-die: global correlation. Intra-die: spatial correlation.]
• Figures are courtesy of IBM, Intel and TSMC
Impact of Variation
• Importance of variation:
 ▪ Timing violations
   → Yield loss
Impact of Variation
• Process variations can cause
 ▪ up to 2000% variation in leakage current and
 ▪ 30% variation in frequency in 180nm CMOS
   − Borkar, S., Karnik, T., Narendra, S., Tschanz, J., Keshavarzi, A., De, V. Parameter Variations and Impact on Circuits and Microarchitecture. In Proc. of DAC (2003), 338-342.
Impact of Variation
Die-to-die frequency variation
Variation in FPGA
• Binning:
 ▪ Historically: most of the variation was between dies
   → FPGA manufacturers test the speed of each FPGA after manufacturing and bin each device according to its speed.
   − Higher speeds: more expensive
   − Unacceptable leakage power: discard the device
 ▪ More recently: significant within-die variation
   − Cannot be leveraged in the same manner
   − Operating speeds must be reduced to maintain functionality
   − 90nm: speed reduction of 5.7%
   − 22nm: speed reduction of 22.4%
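Binning and the within-die speed reduction can be illustrated with a small sketch. The speed-grade names, frequency thresholds and leakage limit are hypothetical. Only the 5.7% and 22.4% reductions come from the slide.

```python
def bin_device(fmax_mhz, leakage_ma, speed_grades, leakage_limit_ma):
    """Return the fastest speed grade the device meets, or None if it is discarded."""
    if leakage_ma > leakage_limit_ma:
        return None  # unacceptable leakage power: discard the device
    eligible = [(f, name) for name, f in speed_grades.items() if fmax_mhz >= f]
    if not eligible:
        return None  # slower than the slowest grade (assumed to be discarded too)
    return max(eligible)[1]


def derate_for_within_die(fmax_mhz, reduction):
    """Lower the rated speed by the fraction lost to within-die variation."""
    return fmax_mhz * (1.0 - reduction)


grades = {"-1": 400.0, "-2": 450.0, "-3": 500.0}  # hypothetical speed grades in MHz

print(bin_device(470.0, 80.0, grades, 100.0))   # -2
print(bin_device(520.0, 120.0, grades, 100.0))  # None (leakage too high)
print(derate_for_within_die(500.0, 0.057))      # 90nm figure from the slide
print(derate_for_within_die(500.0, 0.224))      # 22nm figure from the slide
```

Binning sorts whole dies, so it recovers value from die-to-die spread. The derating applies to every device because each die contains its own slow regions.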
Solutions
• Architectural solution:
1. Select the logic block architecture parameters to minimize this variation
   − LUT size is particularly important [Wong05]
   − LUT size = 4: highest leakage yield
   − LUT size = 7: highest timing yield
   − LUT size = 5: maximum combined leakage and timing yield
2. Adaptively compensate for any variation through body biasing [Nabaa06]:
   − Slow blocks: apply a body bias → decrease Vt → increase the block’s speed
   − Fast blocks: increase threshold voltage → reduce leakage power
 ▪ Experiments:
   − Area penalty: 1%–2%
   − Delay variability reduction: 30%
   − Leakage variability reduction: 78%
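The body-biasing idea can be sketched as a per-block decision based on measured delay. The three bias settings, the 5% tolerance band and the block delays are assumptions for illustration. The scheme in [Nabaa06] is not reproduced here.

```python
def choose_body_bias(block_delay_ns, target_delay_ns, tolerance=0.05):
    """Pick a per-block body-bias setting from the block's measured delay."""
    if block_delay_ns > target_delay_ns * (1 + tolerance):
        return "forward"  # slow block: lower Vt to speed it up
    if block_delay_ns < target_delay_ns * (1 - tolerance):
        return "reverse"  # fast block: raise Vt to cut leakage
    return "zero"         # close enough to the target: leave unbiased


# Hypothetical measured delays for three logic blocks, target 1.0 ns.
measured = {"blk0": 1.20, "blk1": 1.00, "blk2": 0.85}
settings = {name: choose_body_bias(d, 1.0) for name, d in measured.items()}
print(settings)  # {'blk0': 'forward', 'blk1': 'zero', 'blk2': 'reverse'}
```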
Solutions
• CAD-Level:
1. Statistical static timing analysis (SSTA) in FPGA CAD tools
   → Improves delays by avoiding the margins that are necessary for traditional STA
2. Testing multiple logically equivalent configurations of the FPGA to find one that is functional at the desired speed [Sedcole07]
3. Generating critical paths that will be more robust in the face of variation [Matsumoto07]
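Why SSTA can avoid margin: if stage delays vary independently, their deviations partly cancel along a path, so the path's spread grows with the square root of the number of stages rather than linearly. The sketch below compares the two estimates for an assumed 16-stage path. The independence and Gaussian assumptions are simplifications that ignore spatial correlation.

```python
import math


def corner_delay(n_stages, mean, sigma, k=3.0):
    """Traditional STA margin: every stage sits at its k-sigma slow corner."""
    return n_stages * (mean + k * sigma)


def statistical_delay(n_stages, mean, sigma, k=3.0):
    """Statistical bound for independent Gaussian stage delays:
    path mean plus k path sigmas, with path sigma = sigma * sqrt(n_stages)."""
    return n_stages * mean + k * sigma * math.sqrt(n_stages)


# Hypothetical path: 16 stages, 100 ps mean and 5 ps sigma per stage.
print(corner_delay(16, 100.0, 5.0))       # 1840.0
print(statistical_delay(16, 100.0, 5.0))  # 1660.0
```

With these numbers the corner estimate carries 240 ps of margin and the statistical one 60 ps, for the same 3-sigma confidence on the path.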
Inter-die vs. Intra-die Variations
P0 = nominal design value
ΔPintradie = intra-die variation (within a given chip)
ΔPinterdie = inter-die variation (from one chip to another)
ΔPe = remaining “random” or unexplained variation
P: a structural or electrical parameter, e.g.
− W,
− tox,
− Vth,
− channel mobility,
− coupling capacitances,
− line resistances.
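The terms defined above are commonly combined additively, P = P0 + ΔPinterdie + ΔPintradie + ΔPe. Assuming that additive form and independent Gaussian components (both assumptions: real intra-die variation is spatially correlated), a sample generator looks like this:

```python
import random


def sample_wafer(p0, sigma_inter, sigma_intra, sigma_e, n_dies, n_devices, seed=0):
    """Draw parameter values for n_dies dies with n_devices devices each."""
    rng = random.Random(seed)
    wafer = []
    for _ in range(n_dies):
        d_inter = rng.gauss(0.0, sigma_inter)  # one offset shared by the whole die
        die = [
            p0 + d_inter + rng.gauss(0.0, sigma_intra) + rng.gauss(0.0, sigma_e)
            for _ in range(n_devices)
        ]
        wafer.append(die)
    return wafer


# Hypothetical threshold voltage: 0.30 V nominal, sigmas in volts.
wafer = sample_wafer(0.30, 0.02, 0.01, 0.005, n_dies=4, n_devices=5)
for die in wafer:
    print([round(v, 3) for v in die])
```

Devices on the same die share the inter-die offset, so they cluster around a die-specific mean, while the die means themselves scatter around P0.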
Corner Analysis
• PRCA (Process Corner Analysis):
 ▪ Takes
   1. nominal values of process parameters
   2. and a delta for each parameter by which it varies.
 ▪ Finds
   − performance as max and min values.
• Pros:
 ▪ Simple
• Cons:
 ▪ Conservative
 ▪ Inaccurate
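The PRCA procedure can be sketched as an enumeration of every nominal ± delta combination. The wire-resistance model and the parameter values are hypothetical and serve only to exercise the function.

```python
from itertools import product


def corner_analysis(perf, nominal, delta):
    """Evaluate perf at every combination of nominal +/- delta.

    Returns (min, max) of the performance over all 2**n corners.
    """
    names = list(nominal)
    values = []
    for signs in product((-1, 1), repeat=len(names)):
        point = {n: nominal[n] + s * delta[n] for n, s in zip(names, signs)}
        values.append(perf(point))
    return min(values), max(values)


def wire_resistance(p):
    """Hypothetical model: resistance is inversely proportional to W * T."""
    return 1.0 / (p["W"] * p["T"])


lo, hi = corner_analysis(wire_resistance, {"W": 1.0, "T": 2.0}, {"W": 0.1, "T": 0.2})
print(lo, hi)
```

The cost is 2^n evaluations for n parameters, and the result is only as good as the assumption that the extremes of performance sit at the extremes of the parameters.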
Corner Analysis
[Figure: interconnect cross-section on metal layers M1, M2, M3 showing parameters W (Wmin, Wmax), T (Tmin, Tmax), H (Hmin, Hmax) and capacitance Cg]
• PRCA shortcoming:
 ▪ Process corners are believed to coincide with performance corners.
   − Fact: the best-case corner may not depend on Pmin or Pmax for a particular interconnect parameter but on a value within that range.
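A small numeric example shows how corners can miss the true extreme. The delay model below is hypothetical: one term falls with wire width and one rises with it, so the best case lies inside the range rather than at either end.

```python
def delay(w):
    """Hypothetical non-monotonic delay: a resistance-like term (1/w)
    plus a capacitance-like term (w)."""
    return 1.0 / w + w


w_min, w_max = 0.5, 2.0

# Corner analysis only looks at the two ends of the range.
corner_best = min(delay(w_min), delay(w_max))

# A sweep over the range finds the actual best case.
sweep = [w_min + i * (w_max - w_min) / 150 for i in range(151)]
true_best = min(delay(w) for w in sweep)

print(corner_best, true_best)
```

Both corners give the same delay here, yet the sweep finds a lower value near the middle of the range, so a corner-only analysis reports a best case that is pessimistic.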
SSTA
References
• [Kuon07] Kuon, Tessier, “FPGA Architecture: Survey and
Challenges,” Foundations and Trends in Electronic Design
Automation, Vol. 2, No. 2 (2007) 135–253.
• [Lin07] Yan Lin and Lei He, Device and Architecture Concurrent
Optimization for FPGA Transient Soft Error Rate, ICCAD 2007
• [Golshan07] S. Golshan and E. Bozorgzadeh, “Single-event-upset (SEU) awareness in FPGA routing,” in DAC ’07.
• [Xilinx] www.xilinx.com
• [Altera] www.altera.com
• [Wong05] H.-Y.Wong, L. Cheng, Y. Lin, and L. He, “FPGA
device and architecture evaluation considering process
variations,” in ICCAD, 2005.
• [Nabaa06] G. Nabaa, N. Azizi, and F. N. Najm, “An adaptive
FPGA architecture with process variation compensation and
reduced leakage,” DAC, 2006.
• [Sedcole07] P. Sedcole and P. Y. K. Cheung, “Parametric
yield in FPGAs due to within-die delay variations: A
quantitative analysis,” in FPGA, 2007.