2022 22nd European Conference on Radiation and Its Effects on Components and Systems (RADECS) | 979-8-3503-7123-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/RADECS55911.2022.10412606
DDR3 SDRAM Stuck/Weak Bit Studies and Mitigation
Pierre-Xiao Wang, Vincent Wartelle, Kay Chesnut, Maggie Byers, Kai Grürmann,
Timo Dirkes, Pierre Kohler, Timothee Dargnies

Abstract-- This paper presents the results of DDR3 SDRAM
stuck/weak bit studies from heavy ion (HI), TID, neutron, proton
and temperature tests. The objectives include identifying
reasonable engineering solutions to either screen or mitigate
stuck/weak bits for space applications.
Index Terms— Weak bit, Stuck bit, Hard error, DDR, SDRAM,
SEE, TID, Neutron, Proton.
I. INTRODUCTION
Dynamic random-access memory (DRAM) has been
widely used in space for more than 10 years. DRAM
memory cell hard errors (HE), defined as an
unalterable change of state associated with semi-permanent
radiation damage to a memory cell (bit), have been observed
during ground radiation tests since the early history of DRAM
components.
This kind of radiation-induced error was first reported
in 1983 [1], and the term “stuck bit” was later coined to
describe un-rewritable DRAM bits [2]. More recently, these
DRAM HEs were studied by A. Rodriguez [3] and M. Amrbar [4].
These studies showed that HEs can be related to different
radiation factors, including both Single Event Effects (SEE)
and accumulated dose effects such as Total Ionizing Dose
(TID), Displacement Damage (DD) and Micro DD. SEE and
annealing effects with different mechanisms are described in
the investigations of L. Scheick [5] and V. Goiffon [6]. On
the other hand, the DRAM development history shows that
other electrical and operating conditions [7] are also
connected to HEs, such as operating temperature, operating
modes, refresh period, initial low data retention bits,
and End of Life (EoL) bits.
Manuscript received April 9, 2022.
Pierre-Xiao Wang, Vincent Wartelle, Pierre Kohler, Timothee Dargnies
are with 3D PLUS, 408 rue Hélène Boucher, 78532 BUC CEDEX, France
(e-mail: {pwang,abosser,pkohler}@3d-plus.com).
Kay Chesnut, Maggie Byers are with Raytheon Technologies, 2000 E. El
Segundo Blvd, El Segundo, CA 90245, USA.
Kai Grürmann, Timo Dirkes are with DSI Aerospace Technologie GmbH,
registered office: Otto-Lilienthal-Str. 1, D-28199 Bremen, Germany.
In this paper, we divide HEs into two categories: Stuck Bit
(SB) and Weak Bit (WB). An SB is an un-rewritable memory bit
(cell) that cannot be written correctly in two consecutive
write/read operations at nominal operating frequency. A WB is
a memory bit with a reduced data retention time: it is
rewritable, but it cannot hold its data for a defined refresh
period (in most cases the standard refresh period,
tREF = 64 ms). Note that a write/read-back operation on a
single address can detect an SB. A full-memory write followed
by a full-memory read-back, however, detects both SBs and WBs,
since a WB manifests as an error under the preset refresh
period that runs in the background during the full-memory
operation.
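The difference between the two detection procedures can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the test bench software: the SimulatedCell class, its retention_ms and stuck_at fields, and the classify function are hypothetical names introduced here. A cell that has lost its charge is modeled simply as reading back the inverted value, and both data polarities are written so that a stuck-at cell cannot pass by coincidence.

```python
from dataclasses import dataclass
from typing import Optional

T_REF_MS = 64.0  # standard refresh period quoted in the text


@dataclass
class SimulatedCell:
    """Toy model of one DRAM bit (hypothetical, for illustration only)."""
    retention_ms: float = 1000.0     # time the cell holds data without refresh
    stuck_at: Optional[int] = None   # 0 or 1 if the cell cannot be rewritten
    value: int = 0

    def write(self, bit: int) -> None:
        # A stuck cell ignores the written data.
        self.value = bit if self.stuck_at is None else self.stuck_at

    def read(self, elapsed_ms: float) -> int:
        # elapsed_ms is the time since the last write or refresh.
        if self.stuck_at is not None:
            return self.stuck_at
        if elapsed_ms > self.retention_ms:
            return 1 - self.value    # assumption: lost charge reads inverted
        return self.value


def classify(cell: SimulatedCell, t_ref_ms: float = T_REF_MS) -> str:
    """Return 'SB', 'WB' or 'OK' for a single simulated cell."""
    # Single-address write/read-back, two consecutive operations:
    # the read follows the write immediately, so only SBs fail here.
    for bit in (0, 1):
        cell.write(bit)
        if cell.read(elapsed_ms=0.0) != bit:
            return "SB"
    # Full-memory write/read-back: each cell waits up to one refresh
    # period before being read, so cells with short retention fail.
    for bit in (0, 1):
        cell.write(bit)
        if cell.read(elapsed_ms=t_ref_ms) != bit:
            return "WB"
    return "OK"
```

Under these assumptions, a cell that retains data for 40 ms is classified as a WB at tREF = 64 ms but passes at tREF = 32 ms, which mirrors why a shorter refresh period masks weak bits.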
3D PLUS provides space memory modules and has evaluated many
DRAMs, covering different generations (SDRAM, DDR1, DDR2,
DDR3 and DDR4), feature sizes, cell designs, and
manufacturing foundries. We observed that the SB and WB
characteristics depend on these factors. For example, similar
feature sizes from different foundries may differ by several
orders of magnitude in the number of SBs/WBs. The objective
of this paper is to present SB/WB characterization results
for a specific DDR3 and to study a proposed SB/WB screening
and mitigation methodology.
II. EXPERIMENTAL SETUPS AND FLOW
The Device under Test (DUT) is a 4Gb DDR3 SDRAM with
a system frequency up to 1066 MHz, organized as 512Mx8b
in a 78-ball FBGA or 256Mx16b in a 96-ball FBGA.
The evaluation flow started with the Heavy Ion (HI) test
performed in 2017 and presented in the paper
“SEL/SEU/SEFI/TID Results of the Radiation Hardened DDR3
SDRAM Memory Solution” [8], which evaluated this DUT and
several other components at RADEF. The “best” or “most
insensitive” one (the DUT of this paper) was then selected to
go through the TID, neutron, proton, and other evaluation
tests, and was then embedded in 3D PLUS space grade DDR3
modules, e.g., 3D3D16G72WB2723 and 3D3D24G48YB2732. The TID
irradiation used the Co60 source at CEA, the neutron
irradiation used the source at the University of
Massachusetts Lowell (MA), and the proton irradiation used
the Paul Scherrer Institute (PSI) Proton Irradiation Facility
(PIF). The HI and proton tests used the same test bench,
which is based on a Xilinx Virtex6 FPGA and can operate the
DDR3 DUTs at a clock frequency of up to 400 MHz. The TID,
neutron and temperature tests used a commercial memory
tester. More irradiation and test bench details will be given
in the final paper.
Authorized licensed use limited to: ShanghaiTech University. Downloaded on April 24,2024 at 15:38:19 UTC from IEEE Xplore. Restrictions apply.
III. HEAVY IONS RESULTS
The HI test was performed in 2017 with the ions listed in
Table I; the test conditions are detailed in the paper
“SEL/SEU/SEFI/TID Results of the Radiation Hardened
DDR3 SDRAM Memory Solution” [8]. The SB/WBs were
verified before and after irradiation during storage, read,
and write/read mode tests with a random data pattern. Only
the standard 64 ms refresh period was used, and the test did
not discriminate between SBs and WBs.
TABLE I
RADEF BEAM CHARACTERISTICS
Figure 1 gives the heavy ion cross section (cm²/bit) curve of
total SBs/WBs. The HI test in 2017 focused on SEFI and
SEFI mitigation, and SB/WBs were not the primary criterion
used to judge the different candidates. Later studies showed
that this DUT had fewer SBs/WBs than the other candidates.
The SBs/WBs were also checked in the runs with a fluence of
1E6 p/cm².
Fig. 1 SB/WB cross-section (cm²/bit)
IV. TOTAL IONIZING DOSE RESULTS
After the SEE tests, this DDR3 went through several TID
characterization tests at 3D PLUS with a Co60 source at a
dose rate of around 300 rad/h. The DUTs (5 pcs biased,
5 pcs unbiased, and 1 control that was not dosed)
successfully passed at 75 krad(Si). No SB/WBs were observed
during the TID test with a refresh period tREF of 64 ms.
Note that the refresh time is one of the parameters evaluated
during TID tests, but in this particular test the refresh
time was a fixed value and was used as a Go/No-Go test.
Semiconductors exhibit an operational margin on the tREF
parameter: the mean refresh period that a DRAM can sustain
at room temperature is much longer than the standard 64 ms
specification. However, we observed that this margin shrinks,
especially at high temperature. Some DRAMs specify a 32 ms
refresh period at high temperature (e.g., +105°C). For those
cases, we used a burn-in procedure (240 h, +125°C) with
min/max temperature tests prior to the TID tests, with the
characterization testing performed at 3D PLUS. We observed
some tREF degradation, degradation in other parameters
(e.g., leakage current), and functional failures with tREF
values down to 64 ms. These tREF degradation characteristics
were used to generate ideas on how to screen the SB/WBs.
V. NEUTRON RESULTS
To separate DD from TID and SEE effects, 25 samples were
divided into five bins and exposed to different 1 MeV
equivalent fluences up to 1.18E+12 n/cm², as shown in Table
2, at the radiation laboratory of the University of
Massachusetts in Lowell, MA, at the beginning of 2020.
TABLE 2
NEUTRON EFFECTIVE FLUENCES
1 MeV Fluence Level (n/cm²)    Part Quantity (pieces)
5.40E+10                       4
1.17E+11                       5
1.68E+11                       5
6.70E+11                       6
1.18E+12                       5
After exposure and cool-down, the DUTs were tested
using a dedicated memory tester for functional and parametric
measurements on the full memory. All DUTs exposed to neutron
fluences up to 1.68E+11 n/cm² remained fully functional and
entirely within specifications, including meeting the 64 ms
tREF at +105°C.
The DUTs exposed to the two highest fluences of 6.7E+11
and 1.18E+12 n/cm² also remained fully functional
and within specifications, except that tREF degraded to 16 ms
between +85°C and +105°C. No SBs were observed on any of the
cells, and WBs were observed only at the two highest fluence
levels tested.
Figure 2 gives two samples’ WB results with different tREF
times at Vccmin/max after 6.7E+11 and 1.18E+12 n/cm²
when tested at +105°C (the counter stopped at 4096). To gain
a better understanding, room temperature tests were also
carried out. Figure 3 gives two samples’ WB results at
different tREF times at Vccmin/max after 6.7E+11 and
1.18E+12 n/cm² when tested at room temperature. No
WBs were observed at -55°C (tREF > 64 ms) at any fluence
level.
Fig. 2 Number of WBs vs tREF at +105°C after neutron exposure to 6.7E+11
n/cm² and 1.18E+12 n/cm²
Fig. 3 WB vs tREF at room temperature after 6.7E+11 and 1.18E+12 n/cm²
VI. ANALYSIS OF “WEAK BITS” OVER TREF AND
TEMPERATURE
The SB/WBs were quantified together as HEs during our
earliest HI test. In the TID evaluation, we linked the tREF
margin to the temperature tests. We first tested SB/WBs at
different temperatures to understand the neutron
interactions. After the neutron characterization was
completed, we developed screening strategies and designed a
specific program to screen SB/WBs through tREF margin tests.
The mechanisms by which radiation creates SB/WBs are
presented in the papers cited in the introduction. However,
the causes (TID, SEE, DD, Micro DD) and results of SB/WBs for
each DRAM differ from our former experience because of
process and design differences. On the other hand, we
observed how the degradation of tREF plays a role in the
SB/WBs. If an SB can be considered a WB whose retention time
is shorter than a write/read-back operation, can we screen
individual components or lots for SB/WBs using the tREF
margin over temperature? Even if the degradation ratios from
different manufacturing and radiation sources are different,
a large margin in tREF can at least delay individual SB/WB
failures.
Based on the TID results (lot based), 24 DUTs from two lots
(12 pcs/lot) were selected. The two lots differed by around
20% in TID tolerance: lot A (worse TID) and lot B (better
TID). The initial idea was to measure the lots’ tREF
variances and lot homogeneity using these two “good” and
“bad” TID lots as a test case. If the lot non-homogeneity
and the lots’ variances are observed a priori, the radiation
relationship can potentially be established at a later proton
test. It would then be possible to screen radiation SB/WBs
through simple electrical tests.
This specific program took the tREF test worst case
(Tmax=+125°C, VDDmin=1.283V) to check the DUTs’ WB
characteristics under 32/64/75/100/150/250/500/750/1000 ms
refresh periods with checkerboard and reverse-checkerboard
patterns. No WB was observed at 32 ms tREF. More than
4096 WBs were observed at 750 ms and 1000 ms, and
4096 is the counter limit, so the figures in this section do
not show the 32/750/1000 ms data.
Fig. 4 compares lots A and B in terms of how many DUTs
manifest WBs at different refresh times. The “good” lot
performed roughly twice as well as the “bad” lot in terms
of tREF. The number of WBs per piece may range from 1
to 4096. Fig. 5 gives the total number of WBs for these two
lots.
Fig. 4 Number of DUTs with WB under +125°C & VDDmin
Fig. 5 Number of total WBs under +125°C & VDDmin
Combining the two figures, tREF lot-to-lot and piece-to-piece
variances were observed. The TID “good” lot is also much
better than the TID “bad” lot in terms of tREF lot
homogeneity and variance. The tREF results stayed within the
datasheet limits, so tREF may be usable for parameter margin
screening.
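The post-processing behind such a tREF margin sweep can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names summarize_lot and first_failing_tref, the input layout (a per-DUT mapping from refresh period to WB count), and any numbers passed to them are hypothetical and are not the measured data. Only the list of refresh periods and the 4096 counter limit come from the text.

```python
COUNTER_LIMIT = 4096  # WB counter saturates at this value (from the text)
REFRESH_PERIODS_MS = (32, 64, 75, 100, 150, 250, 500, 750, 1000)


def summarize_lot(wb_counts_by_dut):
    """Summarize one lot's tREF sweep.

    wb_counts_by_dut maps a DUT id to {refresh_period_ms: wb_count}
    (hypothetical layout). Returns two dicts keyed by refresh period:
    the number of DUTs showing at least one WB, and the total WB count
    with each DUT's count clipped at the counter limit.
    """
    duts_with_wb = {}
    total_wb = {}
    for t_ref in REFRESH_PERIODS_MS:
        counts = [
            min(per_dut.get(t_ref, 0), COUNTER_LIMIT)
            for per_dut in wb_counts_by_dut.values()
        ]
        duts_with_wb[t_ref] = sum(1 for c in counts if c > 0)
        total_wb[t_ref] = sum(counts)
    return duts_with_wb, total_wb


def first_failing_tref(per_dut_counts):
    """Shortest tested refresh period at which a DUT shows any WB.

    Returns None if the DUT shows no WB at any tested period. A larger
    value means a larger tREF margin for that DUT.
    """
    for t_ref in REFRESH_PERIODS_MS:
        if per_dut_counts.get(t_ref, 0) > 0:
            return t_ref
    return None
```

The first summary corresponds to the kind of quantity plotted per lot against refresh period (DUTs with WBs), the second to the total WB count, and first_failing_tref gives a single per-DUT margin figure that could be used to rank pieces within a lot.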
VII. PROTON RESULTS
Proton tests are representative of low orbit
missions, since proton interactions include TID, SEE and DD
effects. To understand how SB/WBs manifest during a mission,
a proton test was organized. It was delayed several times
because of the Covid-19 situation and was performed at the
end of 2021. The objective of the proton test included
testing the possibility of screening the components’ SB/WBs
using the temperature results. As presented in the
temperature test results in section VI of this paper,
components had been prepared from two lots with different TID
results, which showed tREF variances lot to lot and sample to
sample within a lot. In each lot, the DUTs were distributed
into three groups, “worst”, “normal” and “best”, based on
the high temperature tREF margin results. The idea was to
measure the number of SBs/WBs as a function of the proton
fluence and to establish the lot characteristics using the
tREF margin results from the temperature tests. If a
lot-to-lot relationship is established, a tREF temperature
test can be integrated into the Lot Acceptance Test (LAT) to
perform lot selection as part of the TID LAT. If sample
characteristics are established, a radiation SB/WB screening
test can be realized with a 100% high temperature electrical
tREF margin test.
Three proton energies (50.8, 101.34 and 200 MeV) were
used to irradiate the DUTs up to a fluence of 3E11 p/cm²,
with attention focused on keeping the maximum TID at
around 30 krad(Si) (less than half of the component’s TID
limit). We stopped at different fluences to check the SB/WBs
in real time. Even at the highest fluence of 3E11 p/cm², only
tens of SB/WBs were observed. Figure 6 gives the cross section
(cm²/bit) at different proton energies. Red samples were from
lot A and the others were from lot B.
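For orientation, a per-bit cross section is conventionally obtained by dividing the number of observed errors by the product of particle fluence and the number of bits exposed. The short Python sketch below applies that conventional definition; the function name cross_section_per_bit is hypothetical, and the count of 20 errors is an illustrative value chosen from the "tens of SB/WBs" range rather than a measured data point.

```python
def cross_section_per_bit(n_errors, fluence_per_cm2, n_bits):
    """Per-bit cross section in cm²/bit: errors / (fluence * bits)."""
    if fluence_per_cm2 <= 0 or n_bits <= 0:
        raise ValueError("fluence and bit count must be positive")
    return n_errors / (fluence_per_cm2 * n_bits)


BITS_4GB = 4 * 2**30   # 4Gb device, assuming binary gigabits
FLUENCE = 3e11         # highest proton fluence quoted in the text (p/cm²)
ILLUSTRATIVE_ERRORS = 20  # assumption: illustrative, not measured

sigma = cross_section_per_bit(ILLUSTRATIVE_ERRORS, FLUENCE, BITS_4GB)
print(f"{sigma:.2e} cm²/bit")
```

With these illustrative inputs the result is on the order of 1.6E-20 cm²/bit, which shows how few errors per bit such counts represent on a 4Gb device.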
Fig. 6 Weak bit in W/R mode with 64ms refresh time
In most cases, the number of HEs stayed around 10 to 20
bits among 4G bits at 64 ms. When the refresh time was
shortened by a factor of 2, the number of HEs decreased by
roughly a factor of 3. When the refresh time was increased by
a factor of 2, the number of HEs increased by roughly a
factor of 3.
Annealing effects were also studied. Most HEs had annealed
when the DUTs were retested around 34 days later, after being
shipped back from PSI, the proton facility. The worst DUT
had only 7 HEs at 64 ms and 1 HE at 32 ms. These results
showed that it was difficult to establish the lot/sample and
fluence/temperature relationships because so few SBs/WBs
were observed: typically fewer than 10 SBs/WBs, and in
many cases only one or two SBs/WBs at the 32 ms refresh
period across 4G memory cells.
VIII. MITIGATION DISCUSSION AND CONCLUSION
A DDR3 SDRAM went through HI, Co60, neutron, high
temperature and proton tests, and the SB/WB cross sections
were analyzed to help quantify the SB/WB risks for space
applications. Because of test bench limitations, the SBs and
WBs were combined during the HI and proton tests, but device
characterization performed after those tests later led us to
conclude that only WBs were observed. The selected DDR3
showed only a few SB/WBs at a proton fluence of
3E11 p/cm².
This study shows an encouraging approach to screening
radiation-induced DRAM SB/WBs through a high temperature tREF
margin test, but it was inconclusive because of the lack of
statistically significant data to show how screening might
work on this DDR3. Note that this DUT was (successfully)
selected for space applications because it was the most
insensitive-to-radiation candidate based on tests on devices
from several DDR3 manufacturers [8].
To handle the SB/WBs we recommend:
1. A powerful ECC to correct SB/WBs together with
SEU/MBU.
2. Shorter DRAM refresh periods to mitigate WBs and for high
temperature applications.
3. Avoid rewrites of the SBs in loops. Some EDAC designs
require a process to rewrite and check the error address, and
this can overflow the error counter because of SBs.
3D PLUS is also introducing the Radiation Intelligent
Memory controller (RIMC) IP Core for DRAM modules [9],
and the recommendations above are integrated in the RIMC.
REFERENCES
[1] A. R. Knudson, et al., “Dose Dependence of Single Event Upset Rate in
MOS DRAMS,” IEEE Transactions on Nuclear Science, vol. 30, no. 6, 1983.
[2] S. Duzellier, et al., “Protons and heavy ions induced stuck bits on large
capacity RAMs,” RADECS 93, 1993, pp. 468-472.
[3] A. Rodriguez, et al., “Proton-Induced SDRAM Cell Degradation,”
RADECS 2015, 2015, pp. 1-4.
[4] M. Amrbar, et al., “Total Ionizing Dose Response of SDRAM, DDR2 and
DDR3 Memories,” 2016 REDW, 2016, pp. 1-6.
[5] L. Scheick, et al., “Investigation of the Mechanism of Stuck Bits in High
Capacity SDRAMs,” 978-1-4244-2545-7/08/ © 2008 IEEE, pp. 47-52.
[6] V. Goiffon, et al., “Radiation-Induced Variable Retention Time in
Dynamic Random Access Memories,” IEEE Transactions on Nuclear
Science, vol. 67, no. 1, Jan. 2020, pp. 234-244.
[7] P. J. Restle, et al., “DRAM Variable Retention Time,” 0-7803-0817-4/92/
© 1992 IEEE.
[8] P.-X. Wang, et al., “SEL/SEU/SEFI/TID Results of the Radiation
Hardened DDR3 SDRAM Memory Solution,” 2018 IEEE Radiation Effects
Data Workshop (REDW), DOI: 10.1109/NSREC.2018.8584290.
[9] https://www.3d-plus.com/product.php?fam=11&prod=32