Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis
Yu Cai (1), Erich F. Haratsch (2), Onur Mutlu (1) and Ken Mai (1)
(1) DSSC, ECE Department, Carnegie Mellon University
(2) LSI Corporation
03/14/2012

Evolution of NAND Flash Memory
• CMOS scaling
• More bits per cell
(Seaung Suk Lee, "Emerging Challenges in NAND Flash Technology", Flash Summit 2011 (Hynix))
• Flash memory is widening its range of applications: portable consumer devices, laptop PCs, and enterprise servers

Reliability and Endurance Challenges for NAND Flash Memories
• Endurance continues to deteriorate: NAND flash memory offers only a few thousand reliable P/E cycles
• The error correction capability required of ECC keeps increasing
• There is a big gap between MLC flash endurance and storage reliability requirements: enterprise storage needs more than 50k P/E cycles

Future NAND Flash Storage Architecture
Noisy memory (raw bit error rate) → signal processing → error correction → BER < 10^-15
• Signal processing: read voltage adjusting, data scrambler, data recovery, soft-information estimation
• Error correction: Hamming codes, BCH codes, Reed-Solomon codes, LDPC codes, other flash-friendly codes
Need to understand NAND flash error patterns

Test System Infrastructure
Host computer software stack:
• Algorithms: wear leveling, address mapping, garbage collection, ECC (BCH, RS, LDPC), signal processing
• Control firmware: reset, erase block, program page, read page
• Software platform: USB driver
Hardware path: host computer (USB PHY) → USB daughter board (USB PHY chip) → mother board (FPGA with USB controller and NAND controller) → flash board (flash memories)

NAND Flash Testing Platform
[Photo: USB daughter board with USB jack; HAPS-52 mother board carrying a Virtex-V FPGA (NAND controller) and a Virtex-II Pro (USB controller); NAND daughter board with 3x-nm NAND flash]

NAND Flash Usage and Error Model
Lifetime: start → P/E cycle 0 … P/E cycle i … P/E cycle n → end of life. Within each P/E cycle:
• Erase block → erase errors
• Program pages (Page0 - Page128) → program errors
• Read page → read errors
• Retention 1 (t1 days) … retention j (tj days), each followed by a page read → retention errors

Testing Methodology
• Erase errors: count the cells that fail to be erased to the "11" state
• Program interference errors: compare the data read immediately after a page is programmed with the data read after the whole block has been programmed
• Read errors: read a given block continuously and compare the data between consecutive read sequences
• Retention errors: compare the data read before and after retention; short-term retention errors are characterized at room temperature, and long-term retention errors by baking in an oven at 125℃

Flash Error Rates Comparison
• Error rates increase with P/E cycles
• Retention errors are the most dominant errors
• Retention error rates increase as retention time increases

Retention Error Mechanism
[Figure: stress-induced leakage current (SILC) out of the floating gate; Vth distribution of the four LSB/MSB states 11 (erased), 10, 01, 00 (fully programmed), separated by read references REF1, REF2, REF3]
• Electron loss from the floating gate causes retention errors
• Cells with more programmed electrons suffer more from retention errors
• The threshold voltage is more likely to shift by one interval than by multiple intervals

Retention Error Value Dependency (3 months)
[Figure: highlighted transitions 00 → 01 and 01 → 10]
• Cells with more programmed electrons (i.e., 00 and 01) tend to suffer more from retention noise

2-bit MLC Background Overview
Internal architecture of 2-bit NAND flash memory: pages fall into four sets, LSB-even, LSB-odd, MSB-even, and MSB-odd.

Retention Error Location Dependency
[Figure: even-page and odd-page cells; Vth distribution of LSB/MSB states 11, 10, 01, 00 with REF1–REF3]
• LSB pages have lower BER than MSB pages
• Even pages have lower BER than odd pages

Program Interference
[Figure: additional electrons injected into the floating gate; Vth distribution of LSB/MSB states 11 (erased), 10, 01, 00 (fully programmed) with REF1–REF3]
• Program interference errors are caused by the injection of extra electrons when neighboring cells are programmed
• Cells with fewer programmed electrons suffer more from interference errors
• The threshold voltage is less likely to shift up by more than one level

Program Interference Error Value Dependency
[Figure: highlighted transitions 11 → 10 and 10 → 01]
• Cells with fewer programmed electrons (i.e., 11 and 10) tend to suffer more from neighboring-cell interference

Program Interference Error Location Dependency
• Program interference errors appear in even MSB pages
• The BER of bottom pages is orders of magnitude higher than that of other pages

Write Interference on the Bottom Wordline
[Figure: NAND string while WLn is programmed: SGS gate at 0 V, source at GND, WL0 and the other unselected wordlines up to WL31 at Vpass (10 V), selected WLn at Vpgm (20 V), SGD gate and bitline at Vdd; channel voltage 10 V in the boosted string, 0 V at the source]
• Channel boosting raises the potential at the drain edge of the SGS transistor
• Electrons are accelerated between SGS and WL0 and can readily be injected into the floating gates of WL0
• This hot-carrier injection (HCI) noise is generated by source/drain hot electrons in WL0
• The threshold voltage of cells on WL0 shifts right and can even shift across more than one level (e.g., 11 → 01 or 00)

Read Error Analysis
[Figure: floating-gate cell and Vth distribution of states 11 (erased), 10, 01, 00 (fully programmed) with read references REF1–REF3]

Erase Errors Analysis
[Figure: cell cross-section under erase bias (0 V and +18 V) with n+ source/drain regions]
• Continuous erases can significantly reduce errors by removing residual electrons

Conclusions & Future Work
• Flash errors can show up in any operation: erase errors, program errors, retention errors, and read errors
• Retention errors are the most dominant errors
• Flash errors show explainable patterns: cycle dependency, value dependency, and location dependency
• Understanding the error patterns of modern flash memory will enable the design of effective error tolerance mechanisms:
  • Value-asymmetry-aware coding techniques
  • Cell-location-aware wear leveling mechanisms
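The testing methodology compares the data programmed into a block (or read from it earlier) with the data read back later, and the analysis classifies each mismatch by value and by how many threshold-voltage levels the cell moved. The Python sketch below shows one possible way to do that bookkeeping. It assumes each cell is logged as a two-bit state string and uses the level order drawn on the slides (11 erased, then 10, 01, and 00 fully programmed); the helpers classify_shifts and raw_bit_error_rate are hypothetical and are not part of the authors' test platform.

```python
from collections import Counter

# Assumption: cells are logged as 2-bit state strings, and the Vth order
# follows the slides: "11" (erased) < "10" < "01" < "00" (fully programmed).
LEVELS = ["11", "10", "01", "00"]
LEVEL_INDEX = {state: i for i, state in enumerate(LEVELS)}


def classify_shifts(written, read):
    """Hypothetical helper: tally erroneous cells by transition and level shift.

    A negative shift means the cell read back at a lower Vth level than it was
    written (electron loss, the retention pattern); a positive shift means it
    read back higher (electron gain, the program-interference pattern).
    """
    if len(written) != len(read):
        raise ValueError("written and read must describe the same cells")
    transitions = Counter()  # (written state, read state) -> cell count
    shifts = Counter()       # signed level shift -> cell count
    for w, r in zip(written, read):
        if w != r:
            transitions[(w, r)] += 1
            shifts[LEVEL_INDEX[r] - LEVEL_INDEX[w]] += 1
    return transitions, shifts


def raw_bit_error_rate(written, read):
    """Hypothetical helper: fraction of stored bits that read back flipped."""
    if len(written) != len(read):
        raise ValueError("written and read must describe the same cells")
    flipped = sum(a != b for w, r in zip(written, read) for a, b in zip(w, r))
    return flipped / (2 * len(written))  # two bits per MLC cell


if __name__ == "__main__":
    # Toy data, not measurements: six cells before and after some stress.
    written = ["00", "01", "11", "10", "00", "11"]
    read = ["01", "10", "10", "10", "00", "01"]
    transitions, shifts = classify_shifts(written, read)
    print(dict(transitions))
    print(dict(sorted(shifts.items())))  # {-1: 2, 1: 1, 2: 1}
    print(raw_bit_error_rate(written, read))
```

On this toy input the two negative one-level shifts (00 → 01 and 01 → 10) follow the retention pattern, while 11 → 10 and the two-level 11 → 01 follow the program-interference pattern. Counting signed level shifts rather than bit flips keeps the classification independent of which bit belongs to the LSB or MSB page.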