ABSTRACT— With technology scaling, the “Power Wall” and the “Reliability Wall” have emerged as the major challenges. In recent years, many approaches such as clock gating, dynamic voltage scaling, and drowsy caches have been adopted to mitigate the issues relating to the power wall. Techniques like dynamic voltage scaling and drowsy caches operate at lower supply voltages, and with technology scaling the node capacitances also keep reducing. This has enabled reduced energy consumption, but at the cost of increased soft error rates. In this project, we analyze the soft error rate impact, for cache memory based on the standard 6T SRAM cell, of technology scaling and of operating caches in low-power drowsy mode. First, we study the effect of technology scaling on the soft error rate in terms of the FIT rate, and then we address the impact of operating the caches in drowsy mode (at lower operating voltages) on the soft error rate. We then apply the PARMA model to classify the FIT rate into SDC (Silent Data Corruption) and DUE (Detected Unrecoverable Errors). We simulate SPEC2000 benchmarks on a drowsy-cache-enabled SimpleScalar tool to quantify the results. The FIT rates obtained for technologies scaling from 65nm to 18nm, using the ITRS roadmap, indicate a peak increase of around 12.5%, while the variation of the FIT rate within each technology node, for drowsy voltage levels scaling from the nominal supply voltage down to a lower operating voltage as low as 1.1 times the threshold voltage, indicates a maximum increase of 2.6%. More importantly, we demonstrate that the low-voltage operating mode of drowsy caches can be employed at different technology scales (65nm-22nm): it does not have much impact on the soft error rate, and there is no need to build any additional expensive corrective mechanisms to improve the soft error immunity of caches operating in drowsy mode.
Index Terms—Drowsy-Cache, Reliability, Soft-Error, Technology-Scaling.
INTRODUCTION
Traditionally, technology scaling aims to improve performance, increase transistor density, and reduce the energy/power consumption per transistor. In this respect, CMOS technology has been promising in effectively meeting low-power demands. However, CMOS technology scaling beyond 90nm has raised significant concerns about meeting the power demands of microprocessors [15, 17] and also about system reliability [18]. Meeting the power demand has become a major issue and has led to the term “Power Wall”, which signifies the much-aggravated problem of satisfying the power demands of power-hungry microprocessors.
Power consumption of transistors can be attributed to two chief components, namely dynamic power and static/leakage power. Dynamic power is due to the switching activity in CMOS circuits, while static power consumption is due to leakage current; unlike dynamic power, static leakage is not activity-based and contributes to power dissipation even when the transistors are not switching. This static leakage (Ileak) is primarily due to sub-threshold and gate-oxide leakage currents, as shown in Eq. (1).
πΌπ‘™π‘’π‘Žπ‘˜ = πΌπ‘ π‘’π‘π‘‘β„Žπ‘Ÿπ‘’π‘ β„Žπ‘œπ‘™π‘‘ + πΌπ‘”π‘Žπ‘‘π‘’π‘œπ‘₯𝑖𝑑𝑒
- (1)
Isubthreshold 𝛼 π‘Šπ‘’ −π‘£π‘‘β„Ž/𝑣
- (2)
πΌπ‘”π‘Žπ‘‘π‘’π‘œπ‘₯𝑖𝑑𝑒 ∝ (
V
Tox
2
) e−
Tox
V
- (3)
From Eqs. (2) and (3), sub-threshold leakage depends on factors such as the supply voltage (v), the threshold voltage (Vth), and the gate width (W), while gate-oxide leakage depends on the supply voltage and the gate-oxide thickness (Tox). These parameters vary with technology, and with technology scaling their effect is seen to increase the contribution of static power dissipation. According to the 2007 International Technology Roadmap for Semiconductors (ITRS) [1], power dissipation due to static leakage is predicted to constitute more than 50% of the total power dissipation of processors.
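As a rough illustration of how these expressions behave under scaling, the proportionalities in Eqs. (1)-(3) can be evaluated numerically. The sketch below is only a qualitative model: the fitting constants k_sub and k_ox, and all parameter values, are hypothetical stand-ins for process-specific data.

import math

def leakage_estimate(w, v, v_th, t_ox, k_sub=1.0, k_ox=1.0):
    # Illustrative evaluation of Eqs. (1)-(3). k_sub and k_ox are
    # hypothetical fitting constants; units are arbitrary.
    i_subthreshold = k_sub * w * math.exp(-v_th / v)              # Eq. (2)
    i_gateoxide = k_ox * (v / t_ox) ** 2 * math.exp(-t_ox / v)    # Eq. (3)
    return i_subthreshold + i_gateoxide                           # Eq. (1)

# Lowering the threshold voltage, as scaling does, inflates leakage:
print(leakage_estimate(w=1.0, v=1.1, v_th=0.40, t_ox=1.2))
print(leakage_estimate(w=1.0, v=1.1, v_th=0.25, t_ox=1.2))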
Several efforts have been made to address the power-wall issue, paying special attention to reducing power dissipation by incorporating changes in the process technology as well as by adding intelligence in the architecture and circuit design [18].
Also, over the past few years, the trend has been to incorporate large caches in microprocessors, as they provide a tremendous benefit in terms of improving processor performance. But large caches also account for a significant fraction of the total power consumption, especially due to static leakage. Further, as feature sizes shrink, the dominant component of this power loss will be static leakage. To mitigate the impact of static leakage in microprocessors, techniques such as gated-VDD, dynamic voltage scaling (DVS), increased threshold voltages, and drowsy caches have been proposed [18, 19].
State-preserving techniques like the drowsy cache reduce the operational supply voltage of cache lines to a level just sufficient to retain the data. This greatly mitigates the impact of static leakage, with little impact on processor performance. In this respect, however, it is important to note that state retention at lower voltages implies a significant reduction in the node charges, which means that the cache data is rendered more vulnerable to transient faults caused by alpha-particle or neutron strikes. This escalation of vulnerability adds up to another issue, referred to as the “soft error wall”.
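The underlying relationship is simply the charge stored on the cell node, Q_{node} = C_{node} \cdot V. As a back-of-the-envelope example with a purely illustrative node capacitance of 1 fF (a hypothetical figure, not a MASTAR-derived value): at a nominal 1.1 V the node holds Q = 1 fF x 1.1 V = 1.1 fC, while at an aggressive drowsy level of 1.1 V_{th} (about 0.33 V for V_{th} = 0.3 V) it holds only Q ≈ 0.33 fC, roughly a 3x reduction in the charge that a particle strike must disturb.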
Radiation-induced transient faults arise from
energetic particles, such as alpha particles from packaging
material and neutrons from the atmosphere, generating
electron–hole pairs (directly or indirectly) as they pass through
a semiconductor device. Transistor source and diffusion nodes
can collect these charges. A sufficient amount of accumulated
charge may invert the state of a logic device, such as a latch,
static random access memory (SRAM) cell, or gate, thereby
introducing a logical fault into the circuit’s operation. Because
this type of fault does not reflect a permanent malfunction of
the device, it is termed soft or transient [19].
Several architectural techniques
have been implemented to tackle the soft error problem: for example, error correction codes (ECC) are commonly employed in memory systems, while high-end systems employ redundant copies of hardware to detect faults and recover from errors. However, many of these solutions have been prohibitively expensive and difficult to justify in the mainstream commodity computing market [19].
In this paper, we focus on analyzing the soft error rate impact of technology scaling on low-power drowsy-mode caches. The key motivation for this project is to study the effectiveness of employing drowsy-mode caches in next-generation technologies and to quantify their impact on reliability, by means of which we can judge whether there is a need to employ any additional expensive mechanisms to address the reliability concerns that may arise from drowsy caches.
RELATED WORK
This paper models the effect of technology scaling on the soft error rate (SER) of the SRAM cell for current and future technology nodes in low-power cache designs, whereas most of the previous experimental work related to SER has estimated the soft error rate of the SRAM cell for current technologies at nominal VDD [11][12].
[13] studied the SER of a low-power 70nm SRAM cell. It presents different circuit design techniques used to reduce the power consumed by the SRAM cell, and it analyzes the impact of two commonly used architectural-level leakage reduction approaches, namely cache decay and the drowsy cache, on the reliability of the cache system. It concludes that implementing these techniques is a tradeoff between optimizing the SRAM cell for leakage power and improving its immunity to soft errors. In addition, they ran their experiments on a commercial chip with neutron-induced soft errors at the Breazeale Nuclear Reactor Facility. In contrast, in our paper we study the reliability of the drowsy cache for current and future technologies on a 6T SRAM cell and compare it with the reliability of a cache operating at nominal VDD. We use the empirical model given by Hazucha and Svensson [6] to calculate the SER of the 6T SRAM cell.
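For reference, the Hazucha-Svensson model has the exponential form SER = F \cdot A \cdot e^{-Q_{crit}/Q_{S}}, where F is the neutron flux, A the sensitive area of the cell, and Q_S the charge-collection slope of the technology. A minimal sketch of the calculation, with all numeric inputs as hypothetical placeholders rather than calibrated values:

import math

def ser_neutron(flux, area, q_crit, q_s):
    # Empirical neutron SER model of Hazucha and Svensson [6]:
    #   flux   - neutron flux in the environment of interest
    #   area   - sensitive (drain) area of the cell
    #   q_crit - critical charge of the storage node
    #   q_s    - charge-collection slope of the technology
    return flux * area * math.exp(-q_crit / q_s)

# Hypothetical inputs, only to show the exponential sensitivity to Qcrit:
nominal = ser_neutron(flux=1.0, area=1.0, q_crit=2.0, q_s=1.0)
drowsy = ser_neutron(flux=1.0, area=1.0, q_crit=1.2, q_s=1.0)
print(f"drowsy/nominal SER ratio: {drowsy / nominal:.2f}")  # about 2.23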
[14] measures the SER of SRAM and combinational logic for different device and pipeline scalings from 600nm to 50nm. Their study shows that the SER of SRAM at deeper technologies, for a constant SRAM chip area, will increase slowly. In our paper we measure the SER of the SRAM cell for technologies from 65nm to 18nm for a 1MB L2 cache. In addition, we calculate the SER in the presence of the 1-bit ECC that is supported by the PARMA model. On the other hand, our study does not handle pipeline scaling, and hence we fix the pipeline depth by using the SimpleScalar simulator running in sim-cache mode for all simulations.
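Since single-bit upsets are independent events, the per-bit FIT rate (failures per 10^9 device-hours) aggregates linearly over the array, so the cache-level figure is a straightforward multiplication. A sketch, where the per-bit FIT value is a placeholder and not a measured number:

# FIT is additive across bits for independent single-bit upsets.
CACHE_BITS = 1 * 1024 * 1024 * 8  # 1MB L2 data array, ignoring tag/ECC bits
fit_per_bit = 1e-4                # hypothetical per-bit FIT from the SER model
fit_cache = fit_per_bit * CACHE_BITS
print(f"cache FIT: {fit_cache:.1f}")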
[21] studied the multi-bit upset (MBU) probability for 65nm technology and showed that this effect will become more important with deeper technology scaling. In this paper we restrict our study to single-bit upsets across the different technology nodes, due to time constraints.
[22] ran a 3D simulation to obtain the value of the critical charge (Qcrit), then compared this value with the different mathematical models used to represent the current pulse generated by a particle strike, and showed a wide variation in Qcrit between the different current models. In our paper we adopt the approximation model used by [7] for the calculation of the current pulse generated by a particle strike.
[23] studied the effect of different process variation parameters (gate length, Vth, and Tox) on Qcrit and concluded that gate length is the main parameter affecting Qcrit and the SER. In our study we find that Vth also plays a major role in determining the SER in drowsy mode, by controlling the lowest workable drowsy level (approximately 1.5 Vth).
PROJECT EXPERIMENTAL METHODOLOGY
In this section, we describe the experimental work, the aspects of our simulation framework, and how they are used to analyze the soft error rate and classify the errors into TRUE errors, SDC, and DUE for the SPEC2000 benchmarks.
First, we feed the ITRS2007 roadmap High Performance profile into the MASTAR tool to derive the technology-related parameters for feature sizes ranging from 65nm to 18nm. We restrict our study to the 65nm-18nm CMOS bulk technology nodes, as the impact of the power and reliability walls is more pronounced at technology scales beyond 90nm, and the CMOS bulk model is the one most vulnerable to static leakage issues, where employing the drowsy cache is more significant than in Hi-K or SOI technologies. Also, beyond the 18nm node we could not obtain CMOS bulk technology scaling parameters from the ITRS2007 profiles in the MASTAR tool.
Also, for the estimation of the node capacitance, the PMOS leakage current, and the critical charge factors, we employ the High Performance profile in the MASTAR tool. We choose the HP profile because the variation in the threshold voltage is more significant there (the margin between the nominal VDD and the threshold voltage is large enough) than in the LSTP and LP profiles. The High Performance profile also provides relatively larger leakage currents.
Further, to evaluate the SER for drowsy cache modes, apart from the nominal VDD we employ three lower operational voltages chosen as multiples of the threshold voltage, namely two, one-and-a-half, and 1.1 times the threshold voltage, even though the literature indicates the best practical drowsy operational voltage to be 1.5 times the threshold voltage [10]. This way, we ensure that we consider the lowest possible operational drowsy voltage level, so as to account for drowsy voltage operation more aggressively.
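The swept drowsy levels can be expressed directly as multiples of the per-node threshold voltage. The small helper below illustrates the sweep; the Vth values are placeholders, the real ones being taken from MASTAR under the ITRS2007 High Performance profile.

# Drowsy supply candidates per technology node, as multiples of Vth.
vth_by_node = {65: 0.29, 45: 0.27, 32: 0.25, 22: 0.24, 18: 0.23}  # volts, hypothetical
drowsy_multiples = (2.0, 1.5, 1.1)

for node, vth in sorted(vth_by_node.items(), reverse=True):
    levels = [round(m * vth, 3) for m in drowsy_multiples]
    print(f"{node}nm: drowsy Vdd candidates {levels} V")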
For simulation in the SimpleScalar tool, we plan to use the sim-cache mode of SimpleScalar, which accounts for the in-order execution of the load/store instructions that access the cache without adding the complexity of executing all other instruction types in out-of-order mode. This lets us execute the benchmarks much faster without compromising the execution of the cache-access-related instructions. Further, we plan to set a drowsy-window size of 4000 cycles, as this window size approximately corresponds to the sweet spot of the energy-delay tradeoff for in-order processor cores [20].
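Conceptually, the window size parameterizes the “simple” drowsy policy of [20]: once per window every line drops to the drowsy voltage, and an access to a drowsy line first pays a wake-up penalty. A schematic sketch of that policy follows (the class name and the one-cycle penalty are our assumptions, not SimpleScalar code):

DROWSY_WINDOW = 4000  # cycles between global drowsy sweeps, per [20]
WAKEUP_PENALTY = 1    # assumed wake-up latency in cycles

class DrowsyCacheModel:
    def __init__(self, num_lines):
        self.drowsy = [False] * num_lines  # all lines start awake

    def tick(self, cycle):
        # Periodic global sweep: every line goes drowsy each window.
        if cycle % DROWSY_WINDOW == 0:
            self.drowsy = [True] * len(self.drowsy)

    def access(self, line):
        # A drowsy line must be woken (restored to full Vdd) first.
        penalty = WAKEUP_PENALTY if self.drowsy[line] else 0
        self.drowsy[line] = False
        return penalty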
Finally, we choose the workloads from the SPEC2000 benchmarks for the PISA architecture. We choose benchmarks from both the SPECINT and SPECFP suites, so that we can characterize both integer-based and floating-point-oriented behavior. Further, based on the results shown in the drowsy-cache implementation paper [20], we choose the ‘gzip’ benchmark from the integer workloads and the ‘ammp’ benchmark from the floating-point workloads, as these two have the highest run-time overheads, which suggests a larger number of accesses to the different drowsy-mode cache lines in every window cycle. This gives us a good mix of benchmarks in terms of cache accesses.
The PARMA model, which accurately classifies cache faults into TRUE vs. RAW errors and SDC vs. DUE errors, is employed to classify the soft errors into SDC and DUE, and is used without any modification to the core model. We only change the execution pattern from cycle-by-cycle mode to instruction-by-instruction mode; the notion of a cycle that the PARMA model employs for the estimation and classification of soft errors is not changed.
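For single-bit faults, the classification reduces to whether the faulty value is consumed by the program and whether the protection in place can detect or correct it. The decision sketch below is our paraphrase of that logic, not the PARMA implementation:

def classify_single_bit_fault(consumed, corrected_by_ecc, detected):
    # consumed         - the faulty word is read before being overwritten
    # corrected_by_ecc - protection repairs the flip (the 1-bit ECC case)
    # detected         - protection sees the flip but cannot repair it
    if not consumed:
        return "masked"     # overwritten before use: no visible error
    if corrected_by_ecc:
        return "corrected"  # fixed on read: no failure contribution
    if detected:
        return "DUE"        # detected unrecoverable error
    return "SDC"            # silent data corruption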
REFERENCES
[1] M. Powell et al. Gated-Vdd: A circuit technique to reduce leakage in deep-submicron cache memories. Proc. of Int. Symp. on Low Power Electronics and Design, 2000.
[2] K. Flautner et al. Drowsy caches: Simple techniques for reducing leakage power. Proc. of the 29th Annual Int. Symp. on Computer Architecture (ISCA 2002).
[3] S.S. Mukherjee, C. Weaver, J. Emer, S.K. Reinhardt, and T. Austin. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor. Proceedings of the 36th Annual International Symposium on Microarchitecture, pages 29-40, Dec. 2003.
[4] S. Mukherjee. Architecture Design for Soft Errors.
[5] H. Mostafa, M. Anis, and M. Elmasry. Comparative Analysis of Process Variation Impact on Flip-Flops Soft Error Rate.
[6] P. Hazucha and C. Svensson. Impact of CMOS Technology Scaling on the Atmospheric Neutron Soft Error Rate.
[7] Accurate Reliability Benchmarking of Caches with PARMA.
[8] A. J. Johnston. Scaling and Technology Issues for Soft Error Rates.
[9] V. Degalahal, L. Li, V. Narayanan, M. Kandemir, and M. J. Irwin. Soft Errors Issues in Low-Power Caches.
[10] T. Heijmen, D. Giot, and P. Roche. Factors that impact the critical charge of memory elements.
[11] T. Juhnke and H. Klar. Calculation of the soft error rate of submicron CMOS logic circuits. IEEE Journal of Solid-State Circuits, 30:830-834, July 1995.
[12] Y. Tosaka, S. Satoh, K. Suzuki, T. Sugii, H. Ehara, G. Woffinden, and S. Wender. Impact of cosmic ray neutron induced soft errors on advanced submicron CMOS circuits. Symposium on VLSI Technology Digest of Technical Papers, 1996.
[13] V. Degalahal, L. Li, V. Narayanan, M. Kandemir, and M.J. Irwin. Soft Errors Issues in Low-Power Caches. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 10, pages 1157-1166, Oct. 2005.
[14] D. Burger et al. Modeling the Impact of Device and Pipeline Scaling on the Soft Error Rate of Processor Elements. Technical Report, 2002.
[15] N.S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J.S. Hu, M.J. Irwin, M. Kandemir, and V. Narayanan. Leakage current: Moore's law meets static power. Computer, 36(12):68-75, Dec. 2003.
[16] S. Borkar. Design challenges of technology scaling. IEEE Micro, 19(4):23-29, Jul.-Aug. 1999.
[17] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De. Parameter variations and impact on circuits and microarchitecture. Proceedings of the 40th Annual Conference on Design Automation, pages 338-342, 2003.
[18] BOOK
[19] S.S. Mukherjee, C. Weaver, J. Emer, S.K. Reinhardt, and T. Austin. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor. Proceedings of the 36th Annual International Symposium on Microarchitecture, pages 29-40, Dec. 2003.
[20] K. Flautner, N. Kim, S. Martin, D. Blaauw, and T. Mudge. Drowsy caches: Simple techniques for reducing leakage power. Proceedings of the 29th Annual International Symposium on Computer Architecture, pages 148-157, July 2002.
[21] F. Ruckerbauer. Soft Error Rates in 65nm SRAMs – Analysis of New Phenomena. 13th IEEE International On-Line Testing Symposium (IOLTS 2007).
[22] R. Naseer et al. Critical Charge Characterization for Soft Error Rate Modeling in 90nm SRAM. IEEE International Symposium on Circuits and Systems (ISCAS 2007).
[23] Q. Ding et al. Impact of process variation on soft error vulnerability for nanometer VLSI circuits. 6th International Conference on ASIC (ASICON 2005).
[24] S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. CACTI 5.1.