2022 22nd European Conference on Radiation and Its Effects on Components and Systems (RADECS) | 979-8-3503-7123-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/RADECS55911.2022.10412606

DDR3 SDRAM Stuck/Weak Bit Studies and Mitigation

Pierre-Xiao Wang, Vincent Wartelle, Kay Chesnut, Maggie Byers, Kai Grürmann, Timo Dirkes, Pierre Kohler, Timothee Dargnies

Abstract: This paper presents the results of DDR3 SDRAM stuck/weak bit studies from heavy ion (HI), TID, neutron, proton and temperature tests. The objectives include identifying reasonable engineering solutions to either screen or mitigate stuck/weak bits for space applications.

Index Terms: Weak bit, Stuck bit, Hard error, DDR, SDRAM, SEE, TID, Neutron, Proton.

I. INTRODUCTION

Dynamic random-access memory (DRAM) has been widely used in space for more than 10 years. DRAM memory cell hard errors (HEs), which are defined as an unalterable change of state associated with semi-permanent radiation damage to a memory cell (bit), have been observed during ground radiation tests since the early history of DRAM components. This kind of radiation-induced error was first reported in 1983 [1], and the term “stuck bit” was subsequently coined to describe un-rewritable DRAM bits [2]. More recently, these DRAM HEs were studied by A. Rodriguez [3] and M. Amrbar [4]. These studies showed that HEs can be related to different radiation factors, including both Single Event Effects (SEE) and accumulated dose effects such as Total Ionizing Dose (TID), Displacement Damage (DD) and micro DD. SEE and annealing effects with different mechanisms are described in the investigations of L. Scheick [5] and V. Goiffon [6]. In addition, the DRAM development history shows that electrical and operating conditions [7] are also connected to HEs, such as operating temperature, operating mode, refresh period, bits with initially low data retention, and End of Life (EoL) bits.
In this paper, we divide HEs into two categories: Stuck Bit (SB) and Weak Bit (WB). An SB is an unrewritable memory bit (cell) that cannot be written correctly in two consecutive write/read operations at the nominal operating frequency. A WB is a memory bit with a lower data retention time: it is rewritable, but it cannot hold its data for a defined refresh period (in most cases the standard refresh period is tREF = 64 ms). Note that a write/read-back operation on a single address can detect an SB. However, a full-memory write followed by a full-memory read-back detects both SBs and WBs, since a WB manifests as an error under the preset refresh period that runs in the background during the full-memory operation.

3D PLUS provides space memory modules, and many DRAMs have been evaluated, covering different generations (SDRAM, DDR1, DDR2, DDR3 and DDR4), feature sizes, cell designs, and manufacturing foundries. We observed that the SB and WB characteristics depend on these factors. For example, similar feature sizes from different foundries may lead to differences of several orders of magnitude in the number of SBs/WBs. The objective of this paper is to present the SB/WB characterization results of one specific DDR3 and to study a proposed SB/WB screening and mitigation methodology.

Manuscript received April 9, 2022.
Pierre-Xiao Wang, Vincent Wartelle, Pierre Kohler, Timothee Dargnies are with 3D PLUS, 408 rue Hélène Boucher, 78532 BUC CEDEX, France (e-mail: {pwang,abosser,pkohler}@3d-plus.com).
Kay Chesnut, Maggie Byers are with Raytheon Technologies, 2000 E. El Segundo Blvd, El Segundo, CA 90245, USA.
Kai Grürmann, Timo Dirkes are with DSI Aerospace Technologie GmbH (registered office: Otto-Lilienthal-Str. 1, D-28199 Bremen, Germany).

II. EXPERIMENTAL SETUPS AND FLOW

The Device under Test (DUT) is a 4 Gb DDR3 SDRAM with a system frequency of up to 1066 MHz, organized as 512M x 8b in a 78-ball FBGA or as 256M x 16b in a 96-ball FBGA.
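As a reference for the test flow in this section, the SB and WB definitions from the Introduction can be expressed as a small simulation. This is only a sketch under stated assumptions: the `Cell` model, its `stuck_at`, `retention_ms` and `leak_to` fields, and the `classify` function are hypothetical names introduced for illustration and do not describe the test benches used in this work. A leaky cell is assumed to decay to a fixed value once the elapsed time exceeds its retention time.

```python
from dataclasses import dataclass
from typing import Optional

TREF_MS = 64.0  # standard refresh period quoted in the text


@dataclass
class Cell:
    """Hypothetical one-bit DRAM cell model (illustrative only)."""
    stuck_at: Optional[int] = None  # fixed value if the cell cannot be rewritten
    retention_ms: float = 1000.0    # assumed time the cell holds its data
    leak_to: int = 0                # assumed value the cell decays to
    value: int = 0

    def write(self, bit: int) -> None:
        # A stuck cell ignores the written value.
        self.value = bit if self.stuck_at is None else self.stuck_at

    def read(self, elapsed_ms: float = 0.0) -> int:
        # A rewritable cell loses its data once its retention time is exceeded.
        if self.stuck_at is None and elapsed_ms > self.retention_ms:
            return self.leak_to
        return self.value


def classify(cell: Cell, tref_ms: float = TREF_MS) -> str:
    """Return "SB", "WB" or "OK" following the definitions in the text."""
    # SB: two consecutive immediate write/read-back operations both fail.
    for bit in (0, 1):
        failures = 0
        for _ in range(2):
            cell.write(bit)
            if cell.read(elapsed_ms=0.0) != bit:
                failures += 1
        if failures == 2:
            return "SB"
    # WB: the cell is rewritable but loses data within one refresh period.
    for bit in (0, 1):
        cell.write(bit)
        if cell.read(elapsed_ms=tref_ms) != bit:
            return "WB"
    return "OK"
```

In this model, a cell with a 20 ms retention time is classified as a WB at the 64 ms refresh period but passes at 16 ms, which mirrors the idea that a shorter refresh period can hide WBs while it cannot recover SBs.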
The evaluation flow started with the Heavy Ion (HI) test performed in 2017 at RADEF and presented in the paper “SEL/SEU/SEFI/TID Results of the Radiation Hardened DDR3 SDRAM Memory Solution” [8], which evaluated this DUT and several other components. The “best”, i.e. most insensitive, candidate (the DUT of this paper) was then selected to go through the TID, neutron, proton and other evaluation tests, and was embedded in 3D PLUS space-grade DDR3 modules, e.g. 3D3D16G72WB2723 and 3D3D24G48YB2732.

The TID irradiation used the Co60 source at CEA, the neutron irradiation used the source at the University of Massachusetts Lowell (MA), and the proton irradiation used the Paul Scherrer Institute (PSI) Proton Irradiation Facility (PIF). The HI and proton tests used the same test bench, based on a Xilinx Virtex6 FPGA application, which can operate the DDR3 DUTs at a clock frequency of up to 400 MHz. The TID, neutron and temperature tests used a commercial memory tester. More irradiation and test bench details will be given in the final paper.

III. HEAVY IONS RESULTS

The HI test was performed in 2017 with the ions listed in Table I; the test conditions are detailed in [8]. The SB/WBs were verified before and after irradiation during storage, read, and write/read mode tests with a random data pattern. Only the standard 64 ms refresh period was used, and the test did not discriminate between SBs and WBs.

TABLE I RADEF BEAM CHARACTERISTICS

Figure 1 gives the heavy ion cross section (cm²/bit) curve of total SBs/WBs. The HI test in 2017 focused on SEFI and SEFI mitigation, and SB/WBs were not the first criterion used to judge the different candidates. Later studies showed that this DUT had fewer SBs/WBs. The SBs/WBs were also checked in the runs with 1E6 p/cm² fluence.

Fig. 1 SB/WB cross-section (cm²/bit)

IV. TOTAL IONIZING DOSE RESULTS

After the SEE tests, this DDR3 went through several TID characterization tests at 3D PLUS with a Co60 source at a dose rate of around 300 rad/hour. The DUTs (5 pcs biased, 5 pcs unbiased, and 1 control that was not dosed) successfully passed at 75 krad(Si). No SB/WBs were observed during the TID test with a refresh period of tREF = 64 ms. Note that the refresh time is one of the parameters evaluated during TID tests, but in this particular test the refresh time was a fixed value and was used as a Go/No-Go test.

Semiconductors exhibit an operational margin on the tREF parameter: at room temperature, the mean refresh period at which a DRAM still operates correctly is much longer than the standard 64 ms specification. However, we observed that this margin shrinks, especially at high temperature. Some DRAMs specify a 32 ms refresh period at high temperature (e.g., +105°C). For those cases, we used a burn-in procedure (240 h, +125°C) with min/max temperature tests prior to the TID tests, with the characterization testing performed at 3D PLUS. We observed some tREF degradation and degradation in other parameters (e.g., leakage current), plus functional failures using a tREF down to 64 ms. These tREF degradation characteristics were used to generate ideas on how to screen the SB/WBs.

V. NEUTRON RESULTS

To separate DD from TID and SEE effects, 25 samples were divided into five bins and exposed to different 1 MeV equivalent fluences up to 1.18E+12 n/cm², as shown in Table 2, at the radiation laboratory of the University of Massachusetts in Lowell, MA, at the beginning of 2020.

TABLE 2 NEUTRON EFFECTIVE FLUENCES

1 MeV Fluence Level (n/cm²)    Part Quantity (pieces)
5.40E+10                       4
1.17E+11                       5
1.68E+11                       5
6.70E+11                       6
1.18E+12                       5

After exposure and cool-down time, the DUTs were tested on the full memory using a dedicated memory tester for functional and parametric measurements. All DUTs up to the neutron fluence of 1.68E+11 n/cm² remained fully functional and entirely within specifications, including meeting tREF = 64 ms at +105°C.

The DUTs exposed to the two highest fluences of 6.7E+11 and 1.18E+12 n/cm² also remained fully functional and within specifications, except that tREF degraded to 16 ms between +85°C and +105°C. No SBs were observed on any of the cells, and WBs were observed only at the two highest fluence levels tested.

Fig. 2 Number of WBs vs tREF at +105°C after neutron exposure to 6.7E+11 n/cm² and 1.18E+12 n/cm²

Figure 2 gives the WB results of two samples at different tREF times at Vccmin/max after 6.7E+11 and 1.18E+12 n/cm², tested at +105°C (the counter stopped at 4096). For a better understanding, room temperature tests were also carried out. Figure 3 gives the WB results of two samples at different tREF times at Vccmin/max after 6.7E+11 and 1.18E+12 n/cm², tested at room temperature. No WBs were observed at -55°C (tREF > 64 ms) at any fluence level.

Fig. 3 WB vs tREF at room temperature after 6.7E+11 and 1.18E+12 n/cm²

VI. ANALYSIS OF “WEAK BITS” OVER TREF AND TEMPERATURE

The SB/WBs were quantified together as HEs during our earliest HI test. In the TID evaluation, we linked the tREF margin with the temperature test. We first tested SB/WBs at different temperatures to understand neutron interactions. After the neutron characterization was completed, we developed screening strategies and designed a specific program to screen SB/WBs through tREF margin tests.

The mechanisms by which radiation creates SB/WBs are presented in the papers cited in the Introduction. However, for each DRAM the causes and results (TID, SEE, DD, micro DD) of SB/WBs differ from our former experience because of process and design differences. On the other hand, we observed how the degradation of tREF plays a role in SB/WBs. If an SB can be considered as a WB whose retention time is shorter than a write/read-back operation, can we screen SB/WBs at the individual component or lot level using the tREF margin over temperature? Even if the degradation ratios differ between manufacturing sources and radiation sources, a large tREF margin can at least delay individual SB/WB failures.

Based on the (lot-based) TID results, 24 DUTs from two lots (12 pcs/lot) were selected. Between the two lots, we observed a difference of around 20% in TID tolerance: lot A (worse TID) and lot B (better TID). The initial idea was to measure the lots' tREF variances and lot homogeneity using these “good” and “bad” TID lots as a test case. If the lot non-homogeneity and the lots' variances are observed a priori, the radiation relationship can potentially be established in a later proton test. In that case, it would be possible to screen radiation SB/WBs through simple electrical tests.

This specific program took the worst case of the tREF test (Tmax = +125°C, VDDmin = 1.283 V) to check the DUTs' WB characteristics at refresh periods of 32/64/75/100/150/250/500/750/1000 ms with checkerboard and reverse-checkerboard patterns. No WB was observed at a tREF of 32 ms. More than 4096 WBs were observed at 750 ms and 1000 ms; since 4096 is the counter limit, the figures in this section do not show the 32/750/1000 ms data.

Fig. 4 compares lots A and B in terms of how many DUTs manifest WBs at different refresh times. The “good” lot was roughly twice as good as the “bad” lot in terms of tREF. The number of WBs for each piece may be between 1 and 4096. Fig. 5 gives the total number of WBs for these two lots.

Fig. 4 Number of DUTs with WB under +125°C & VDDmin

Fig. 5 Number of total WBs under +125°C & VDDmin

Combining the two figures, tREF lot-to-lot and piece-to-piece variances were observed. The TID “good” lot is also much better than the TID “bad” lot in terms of tREF lot homogeneity and variance. The tREF results were not outside the datasheet limits, so they may be used for parameter margin screening.

VII. PROTON RESULTS

The proton test is representative of low-orbit missions, since proton interactions include TID, SEE and DD effects. To get a good understanding of how SB/WBs manifest during a mission, a proton test was organized. It was delayed several times because of the Covid-19 situation and was performed at the end of 2021. The objectives of the proton test included testing the possibility of screening the components' SB/WBs with temperature test results.

As presented in the temperature test results in Section VI, components from two lots with different TID results had been prepared, with tREF variances lot to lot and sample to sample within a lot. In each lot, the DUTs were distributed into three groups, “worst”, “normal” and “best”, based on the high temperature tREF margin results. The idea was to measure the number of SBs/WBs as a function of the proton fluence and to establish the lot characteristics using the tREF margin results from the temperature tests. If a lot-to-lot relation is established, a tREF temperature test can be integrated into the Lot Acceptance Test (LAT) to perform lot selection as part of the TID LAT. If sample characteristics are established, a radiation SB/WB screening test can be realized by using a 100% high temperature electrical tREF margin test.
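The grouping step described above can be sketched as follows. This is a minimal illustration, not the program used in this work: the margin metric (the shortest tested refresh period at which a DUT shows at least one WB), the equal-thirds split, and the names `first_failing_tref` and `split_into_groups` are all assumptions introduced here. Only the list of tested refresh periods comes from Section VI.

```python
# Refresh periods (ms) listed in Section VI for the +125°C / VDDmin test.
TESTED_TREF_MS = (32, 64, 75, 100, 150, 250, 500, 750, 1000)


def first_failing_tref(wb_counts: dict) -> float:
    """Shortest tested refresh period (ms) with at least one WB.

    wb_counts maps a refresh period in ms to the WB count measured at it.
    Returns infinity if no WB was seen at any tested period.
    """
    for tref in sorted(wb_counts):
        if wb_counts[tref] > 0:
            return float(tref)
    return float("inf")


def split_into_groups(margins: dict) -> dict:
    """Rank DUTs by tREF margin and split them into three groups.

    margins maps a DUT identifier to its margin in ms (assumed metric).
    The equal-thirds split is an assumption; the text does not give group sizes.
    """
    ranked = sorted(margins, key=lambda dut: margins[dut])  # shortest margin first
    n = len(ranked)
    cut1, cut2 = n // 3, n - n // 3
    return {
        "worst": ranked[:cut1],
        "normal": ranked[cut1:cut2],
        "best": ranked[cut2:],
    }
```

With 12 DUTs per lot, this split would give four DUTs per group. Any real grouping would need a rule for ties, since several DUTs can share the same first failing refresh period.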
Three proton energies, 50.8, 101.34 and 200 MeV, were used to irradiate the DUTs up to a fluence of 3E11 p/cm², with attention focused on keeping the maximum TID at around 30 krad(Si) (less than half of the component's TID limit). We stopped at different fluences to check the SB/WBs in real time. Even at the highest fluence of 3E11 p/cm², only tens of SB/WBs were observed. Figure 6 gives the cross section (cm²/bit) at different proton energies. Red samples were from lot A and the others were from lot B.

Fig. 6 Weak bit in W/R mode with 64ms refresh time

In most cases, the number of HEs stayed around 10 to 20 bits among 4 Gbits with a 64 ms refresh period. When the refresh period was shortened by a factor of 2, the number of HEs decreased by roughly a factor of 3. When the refresh period was increased by a factor of 2, the number of HEs increased by roughly a factor of 3. Annealing effects were also studied. Most HEs had annealed when the DUTs were retested around 34 days later, after they were shipped back from PSI, the proton facility. The worst DUT had only 7 HEs at 64 ms and 1 HE at 32 ms.

These results showed that it was difficult to establish the lot/sample and fluence/temperature relationships because so few SBs/WBs were observed: typically fewer than 10 SBs/WBs, and in many cases only one or two SBs/WBs at the 32 ms refresh period across 4G memory cells.

VIII. MITIGATION DISCUSSION AND CONCLUSION

A DDR3 SDRAM went through HI, Co60, neutron, high temperature and proton tests, and the SB/WB cross sections were analyzed to help quantify the SB/WB risks for space applications. Because of test bench limitations, the SBs and WBs were combined during the HI and proton tests, but device characterization performed after these tests led us to conclude that only WBs were observed. The selected DDR3 had only a few SB/WBs at a proton fluence of 3E11 p/cm².

This study shows an encouraging approach to screening radiation-induced DRAM SB/WBs through a high temperature tREF margin test, but it was inconclusive because there was not enough statistically significant data to show how screening might work on this DDR3. Note that this DUT was (successfully) selected for space applications because it was the most radiation-insensitive candidate based on tests of devices from several DDR3 manufacturers [8].

To handle the SB/WBs we recommend:
1. A powerful ECC to correct SB/WBs together with SEU/MBU.
2. Shorter DRAM refresh periods to mitigate WBs, particularly in high temperature applications.
3. Avoiding rewrites of the SBs in loops. Some EDAC designs require a process to rewrite/check the error address, and this can overflow the error counter because of SBs.

3D PLUS is also introducing the Radiation Intelligent Memory Controller (RIMC) IP Core for DRAM modules [9], and the recommendations above are integrated in the RIMC.

REFERENCES

[1] A. R. Knudson, et al., “Dose Dependence of Single Event Upset Rate in MOS DRAMS,” IEEE Transactions on Nuclear Science, vol. 30, no. 6, 1983.
[2] S. Duzellier, et al., “Protons and heavy ions induced stuck bits on large capacity RAMs,” RADECS 93, 1993, pp. 468-472.
[3] A. Rodriguez, et al., “Proton-Induced SDRAM Cell Degradation,” RADECS 2015, 2015, pp. 1-4.
[4] M. Amrbar, et al., “Total Ionizing Dose Response of SDRAM, DDR2 and DDR3 Memories,” 2016 REDW, 2016, pp. 1-6.
[5] L. Scheick, et al., “Investigation of the Mechanism of Stuck Bits in High Capacity SDRAMs,” 978-1-4244-2545-7/08/ © 2008 IEEE, pp. 47-52.
[6] V. Goiffon, et al., “Radiation-Induced Variable Retention Time in Dynamic Random Access Memories,” IEEE Transactions on Nuclear Science, vol. 67, no. 1, Jan. 2020, pp. 234-244.
[7] P. J. Restle, et al., “DRAM Variable Retention Time,” 0-7803-0817-4/92/ © 1992 IEEE.
[8] P.-X. Wang, et al., “SEL/SEU/SEFI/TID Results of the Radiation Hardened DDR3 SDRAM Memory Solution,” 2018 IEEE Radiation Effects Data Workshop (REDW), DOI: 10.1109/NSREC.2018.8584290.
[9] https://www.3d-plus.com/product.php?fam=11&prod=32