Fault Models for Embedded-DRAM Macros

Mango C.-T. Chao, Hao-Yu Yang, Chin-Yu Chin
National Chiao-Tung University, Hsinchu, Taiwan
Rei-Fu Huang
MediaTek Inc., Hsinchu, Taiwan
Shin-Chin Lin
UMC Inc., Hsinchu, Taiwan

Outline
Introduction
Fault Models for eDRAM
Defect-level Estimation of Wear-out Defects under ECC
Conclusion

From Commodity DRAM to Embedded DRAM (eDRAM)
DRAM has been the mainstream of commodity memory since its invention
Researchers attempt to bring commodity DRAM's advantages into a SoC
– Reduce eDRAM's process adders to CMOS process
– Deep-trench capacitor with bottle etch, planar capacitor, shallow trench capacitor, metal-insulator-metal capacitor
Applications:
– Networking
– Gaming consoles
– Multimedia handhelds
– High definition TV
– MP3/PDA
– … etc
DRAM's advantages:
– High density
– Structure simplicity
– Low power
– Low cost
[Figure: embedded SoC with blocks Digital, Analog, Memory, RF, Others]

UMC eDRAM Architecture
Size: 16 Mb (64 x 64 x (16 x 2) x 128 banks)
Technology: 65 nm low-leakage process
Area: 4 mm^2
Supply voltage: 1.2 V
Operating frequency: 100 MHz
Retention time: 16 ms
Bandwidth: 3.125 Gb/s
Required cycles for one refresh: 64 x 128 cycles (0.08192 ms)

Difference between eDRAM and commodity DRAM

                   stand-alone DRAM       embedded DRAM
metal layers       2 ~ 3                  5 ~ 6
Cs                 30 fF ~ 45 fF          7 fF ~ 10 fF
Cbl                fixed ratio to Cs      fixed ratio to Cs
refresh period     > 64 ms                4 ms ~ 16 ms
data size          512 Mb ~ 2 Gb          2 Mb ~ 64 Mb
operating modes    multiple               single
ESD                Yes                    No
interface test     timing check + IO      setup/hold time
ECC                Mostly no              Yes

Fault Models for eDRAM
Since eDRAM uses the SRAM interface, we start from a standard SRAM test algorithm, March C-, and then discuss the faults that are not covered by the SRAM test but may occur in eDRAM:
1. Retention faults
2. Word-line-coupling faults
3. Bit-line-coupling faults
4. Stuck-open faults
5. Bank faults

Retention Faults
Definition
– When the charge in the storage capacitor leaks away through the leakage current, the storage capacitor loses its stored value before the next refresh.
Detection
– {Wa}, Delay, {Ra}
The mechanism differs from that of retention faults in SRAM
[Figure: cell schematic; labels: off, "1"]

Word-line-Coupling Faults
The word-line-coupling faults can be classified into two types:
1. Switching word-line-coupling fault
2. Hammering word-line-coupling fault

Switching Word-line-Coupling Fault
Definition
– When the coupling capacitance between two word-lines is too large, the two word-lines may turn on at the same time.
[Figure: cell array with bit-line pairs blb0/bl0 to blb7/bl7 and column selects cs<0> to cs<3>; checkerboard background: WL0 = 1 0 1 0 1 0 1 0, WL2 = 0 1 0 1 0 1 0 1]
Detection
– Y-direction with checkerboard background

Hammering Word-line-Coupling Fault
Definition
– When the current word-line (aggressor) turns on, it induces a noise signal on the adjacent word-line (victim); the victim word-line then turns on slightly and induces extra leakage from the storage capacitor.
Detection
– {Rwlb}n, {Rwla}
notation
– wlb: current word-line
– wla: adjacent word-line
– { }n: repeat n times

Bit-line-Toggling Faults
Definition
– When the coupling capacitance between two bit-lines is too large, it slows down the charge-sharing mechanism, and the sense amplifier then senses the wrong data.
[Figure: cell array with bit-line pairs blb0/bl0 to blb7/bl7; solid background: WL0 = 0 0 0 0 0 0 0 0, WL2 = 0 0 0 0 0 0 0 0; bit-line voltage labels 0 and Vdd/2]
Detection
– X-direction with solid background

Stuck-Open Faults
Stuck-open faults in DRAMs can be classified into two categories:
1. Transistor-open faults
2. Resistive-open faults
[Figure: DRAM cell on BL and WL with resistive-open locations marked R]
Detection
– for transistor-open fault: {Ra, Wb, Rb}
– for resistive-open fault: {R0, W1, R1}

Bank Faults
Definition
– Two banks turn on at the same time due to the large parasitic RC.
[Figure: bank0, bank1, bank2, … selected through master/slave latches clocked by Ck; signals A0, A1, B; parasitic RC1 on the path to the banks]
Detection
– consecutive read operations from the farthest bank to the nearest one

Fault Occurrences between eDRAM and commodity DRAM

                              eDRAM    commodity DRAM
retention faults              high     low
word-line-coupling faults     high     low
bit-line-toggling faults      low      high
transistor-open faults        low      high
resistive-open faults         high     low
bank faults                   -        -

Defect-level Estimation of Wear-out Defects under ECC
Reliability testing, such as the THB, HAST, and HTOL tests, is applied to measure the reliability or lifetime of manufactured chips
Due to its cost and application time, reliability testing, which accelerates the wear-out failures, can only be applied to a small portion of the products
The most straightforward method to estimate this defect level is to run the reliability testing with the ECC function enabled and count the failed parts
The number of sampled parts for the reliability testing is usually around a few hundred, while the generally acceptable defect level is under 100 DPPM, so this sample size is not enough to support such a fine resolution of the defect level

Defect-level Estimation of Wear-out Defects under ECC
Instead of counting the failed parts, we directly count the number of defective eDRAM cells in each part
Because the ECC circuitry may mask some defective cells, we need to turn off the ECC function
The probability distribution of defective cells can be modeled by the Poisson distribution
[Figure: test flow. Parts with D_BP defects enter the Production Test, which passes with probability P(Pass_PT); passing parts gain D_DR defects and enter the Reliability Test, which passes with probability P(Pass_RT | Pass_PT)]

Defect-level Estimation of Wear-out Defects under ECC
Wear-out only adds defects, so a part that passes the reliability testing also passes the production testing, and P(Pass_RT ∩ Pass_PT) = P(Pass_RT):

\[
DL = 1 - P(Pass\_RT \mid Pass\_PT)
   = 1 - \frac{P(Pass\_RT \cap Pass\_PT)}{P(Pass\_PT)}
   = 1 - \frac{P(Pass\_RT)}{P(Pass\_PT)} \tag{1}
\]

\[
P(Pass\_PT) = \sum_{x=0}^{w} \frac{C^{w}_{x}\, n^{x}}{C^{s}_{x}}\, P(D_{BP} = x)
            = \sum_{x=0}^{w} \frac{C^{w}_{x}\, n^{x}}{C^{s}_{x}} \cdot \frac{e^{-\lambda_1} \lambda_1^{x}}{x!} \tag{2}
\]

\[
P(Pass\_RT) = \sum_{x=0}^{w} \frac{C^{w}_{x}\, n^{x}}{C^{s}_{x}}\, P(D_{BP} + D_{DR} = x)
            = \sum_{x=0}^{w} \frac{C^{w}_{x}\, n^{x}}{C^{s}_{x}} \cdot \frac{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^{x}}{x!} \tag{3}
\]

notations:
– n: the number of bits per word.
– w: the number of words in one memory chip.
– s: the number of bits in one memory chip (s = w × n).
– D_BP: the random variable denoting the number of single defects existing before applying the production testing.
– D_DR: the random variable denoting the number of single defects added during the reliability testing.
– λ1: the mean of the random variable D_BP.
– λ2: the mean of the random variable D_DR.
– P(E): the probability that event E occurs.
– Pass_PT: the event that a part containing random single defects passes the production testing with the use of ECC.
– Pass_RT: the event that a part containing random single defects passes the reliability testing with the use of ECC.
– DL: the defect level caused by the wear-out defects.
With Equations 1, 2, and 3, the defect level DL with the use of ECC can be obtained.

Conclusion
We introduced an exemplary eDRAM design and discussed the key issues that should be emphasized in eDRAM testing by comparing it with commodity-DRAM testing
We started from a short SRAM algorithm and discussed the fault models that are not covered by the SRAM testing but should be considered in DRAM testing
We proposed a mathematical model to estimate the defect level caused by wear-out defects under ECC
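
As an illustration of how Equations 1, 2, and 3 from the defect-level estimation could be evaluated numerically, the following is a minimal Python sketch, not the authors' implementation. It assumes that D_BP and D_DR are independent Poisson variables with means λ1 and λ2, and that the ECC corrects one defective bit per word. The function names, the truncation tolerance, and the example values (w = 2^19 words, n = 32 bits per word, λ1 = 0.5, λ2 = 0.1) are hypothetical and chosen only for demonstration.

```python
import math


def pass_probability(w, n, lam, tol=1e-15):
    """Sum over x of [C(w,x) * n^x / C(s,x)] * Poisson(x; lam), with s = w * n.

    The bracketed ratio is the chance that x defective bits fall into x
    different words, so that a single-error-correcting ECC (an assumption
    of this sketch) masks all of them.
    """
    s = w * n
    ratio = 1.0              # C(w,x) * n^x / C(s,x) at x = 0
    pois = math.exp(-lam)    # Poisson probability at x = 0
    total = 0.0
    for x in range(w + 1):
        term = ratio * pois
        total += term
        # Past the Poisson mean the terms only shrink, so stop once negligible.
        if x == w or (x > lam and term < tol):
            break
        # Advance both factors from x to x + 1 without large binomials.
        ratio *= n * (w - x) / (s - x)
        pois *= lam / (x + 1)
    return total


def defect_level(w, n, lam1, lam2):
    """Equation 1: DL = 1 - P(Pass_RT) / P(Pass_PT)."""
    p_pt = pass_probability(w, n, lam1)          # Equation 2
    p_rt = pass_probability(w, n, lam1 + lam2)   # Equation 3
    return 1.0 - p_rt / p_pt


if __name__ == "__main__":
    # Hypothetical 16 Mb macro: 2**19 words of 32 bits each.
    dl = defect_level(w=2**19, n=32, lam1=0.5, lam2=0.1)
    print(f"estimated defect level: {dl * 1e6:.3f} DPPM")
```

The ratio and the Poisson term are updated incrementally, which avoids computing very large binomial coefficients directly. In practice λ1 and λ2 would be estimated from the defective-cell counts measured with the ECC function turned off, before and after the reliability testing.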