Slides - Carnegie Mellon University

Error Patterns in MLC NAND Flash Memory:
Measurement, Characterization, and Analysis
Yu Cai (1), Erich F. Haratsch (2), Onur Mutlu (1), and Ken Mai (1)
(1) DSSC, ECE Department, Carnegie Mellon University
(2) LSI Corporation
03/14/2012
Evolution of NAND Flash Memory
[Figure: CMOS scaling; more bits per cell. Source: Seaung Suk Lee, “Emerging Challenges in NAND Flash Technology”, Flash Summit 2011 (Hynix)]
• Flash memory is widening its range of applications
  • Portable consumer devices, laptop PCs, and enterprise servers
Reliability and Endurance Challenges for NAND Flash Memories
• Endurance continues to deteriorate
  • NAND flash memory sustains only a few thousand reliable P/E cycles
• The error correction capability required of ECC keeps increasing
• Large gap between MLC flash endurance and storage reliability requirements
  • Enterprise storage needs >50k P/E cycles
Future NAND Flash Storage Architecture
[Figure: Noisy Memory → Signal Processing → Error Correction. Labels: Raw Bit Error Rate; BER < 10^-15]
Signal Processing
• Read voltage adjusting
• Data scrambler
• Data recovery
• Soft-information estimation
Error Correction
• Hamming codes
• BCH codes
• Reed-Solomon codes
• LDPC codes
• Other flash-friendly codes
Need to understand NAND flash error patterns
Test System Infrastructure
[Figure: block diagram of the test system. Labels: Host Computer, Software Platform, USB Driver, Host USB PHY; USB Daughter Board, USB PHY Chip; Mother Board, FPGA USB controller, FPGA NAND Controller; Flash Board, Flash Memories]
Algorithms
• Wear Leveling
• Address Mapping
• Garbage Collection
• ECC (BCH, RS, LDPC)
• Signal Processing
Control Firmware
1. Reset
2. Erase block
3. Program page
4. Read page
NAND Flash Testing Platform
[Figure: photo of the testing platform. Labels: USB Daughter Board, USB Jack; HAPS-52 Mother Board, Virtex-V FPGA (NAND Controller), Virtex-II Pro (USB controller); NAND Daughter Board, 3x-nm NAND Flash]
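NAND Flash Usage and Error Model
[Figure: flash lifetime from Start through P/E cycle 0 … P/E cycle i … P/E cycle n to End of life. Steps shown within a P/E cycle, with the error type observed at each:]
• Erase block: erase errors
• Program page (Page0 - Page128): program errors
• Read page: read errors
• Retention 1 (t1 days), then read page: retention errors, read errors
• …
• Retention j (tj days), then read page: retention errors, read errors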
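The flow in the figure can be sketched as a short Python generator. This is only an illustration under assumptions: the name `usage_model`, the step labels, and the retention intervals are hypothetical and are not taken from the authors' test firmware.

```python
from typing import Iterator, Sequence, Tuple

# (P/E cycle index, operation performed, error type observed afterwards)
Step = Tuple[int, str, str]


def usage_model(n_cycles: int, retention_days: Sequence[int]) -> Iterator[Step]:
    """Yield the operations of each P/E cycle in the order shown in the figure."""
    for cycle in range(n_cycles):
        yield (cycle, "erase block", "erase errors")
        yield (cycle, "program pages", "program errors")
        yield (cycle, "read pages", "read errors")
        # Each retention interval ends with another read of the pages.
        for days in retention_days:
            yield (cycle, f"retain {days} days, then read pages", "retention errors")


# Hypothetical run: 2 P/E cycles, retention checks after 1 and 7 days.
steps = list(usage_model(n_cycles=2, retention_days=[1, 7]))
```

Each tuple names one point at which a comparison of written and read data could be taken.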
Testing Methodology
• Erase errors
  • Count the number of cells that fail to be erased to the “11” state
• Program interference errors
  • Compare the data read immediately after a page is programmed with the data read after the whole block has been programmed
• Read errors
  • Read a given block continuously and compare the data between consecutive read sequences
• Retention errors
  • Compare the data read before and after retention
  • Characterize short-term retention errors at room temperature
  • Characterize long-term retention errors by baking in an oven at 125℃
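Each comparison in the methodology reduces to counting the bits that differ between two reads of the same data. A minimal Python sketch, assuming pages are available as byte strings; the function names and the 4-byte sample buffers are illustrative and are not part of the authors' test platform.

```python
def count_bit_errors(expected: bytes, observed: bytes) -> int:
    """Count the bit positions at which the two buffers differ."""
    if len(expected) != len(observed):
        raise ValueError("buffers must have the same length")
    # XOR leaves a 1 in every bit position that flipped.
    return sum(bin(a ^ b).count("1") for a, b in zip(expected, observed))


def bit_error_rate(expected: bytes, observed: bytes) -> float:
    """Bit errors divided by the total number of bits compared."""
    if not expected:
        raise ValueError("buffers must not be empty")
    return count_bit_errors(expected, observed) / (8 * len(expected))


# Hypothetical data: one read before retention, one after.
before = bytes([0b11110000, 0b10101010, 0x00, 0xFF])
after = bytes([0b11110001, 0b10101010, 0x00, 0x7F])

errors = count_bit_errors(before, after)
ber = bit_error_rate(before, after)
```

The same two functions would serve for the program interference and read error comparisons, since only the pair of buffers changes.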
Flash Error Rates Comparison
[Figure label: retention errors]
• Error rates increase with P/E cycles
• Retention errors are the dominant error type
• Retention error rates increase as retention time increases
Retention Error Mechanism
[Figure: floating gate with stress-induced leakage current (SILC), and the threshold voltage (Vth) distribution of the four states (LSB/MSB) from erased to fully programmed: 11 | REF1 | 10 | REF2 | 01 | REF3 | 00]
• Electron loss from the floating gate causes retention errors
• Cells with more programmed electrons suffer more from retention errors
• Threshold voltage is more likely to shift by one interval than by multiple intervals
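A small Python sketch of how a one-interval threshold voltage shift becomes a retention error. The reference voltages and cell voltages are made-up values chosen for illustration; only the state order (11, 10, 01, 00 from erased to fully programmed) comes from the slide.

```python
from bisect import bisect_right

# Hypothetical read reference voltages in volts: REF1 < REF2 < REF3.
REFS = [0.0, 1.5, 3.0]
# State order from erased (lowest Vth) to fully programmed (highest Vth).
STATES = ["11", "10", "01", "00"]


def read_state(vth: float) -> str:
    """Return the state whose threshold voltage window contains vth."""
    return STATES[bisect_right(REFS, vth)]


programmed_vth = 3.4               # assumed Vth right after programming state 00
leaked_vth = programmed_vth - 0.6  # assumed charge loss during retention

state_before = read_state(programmed_vth)
state_after = read_state(leaked_vth)
```

With these assumed numbers the cell drops below REF3 and is read as the neighboring lower state, which is the single-interval shift the slide describes.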
Retention Error Value Dependency (3 months)
[Figure: highlighted error transitions: 00 → 01, 01 → 10]
• Cells with more programmed electrons (i.e., 00 and 01) tend to suffer more from retention noise
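Value dependency can be measured by tallying, for each erroneous cell, the state it was written as and the state it was read as. A hedged Python sketch; `transition_counts` and the sample cell values are hypothetical and are not measured data.

```python
from collections import Counter
from typing import Sequence, Tuple


def transition_counts(written: Sequence[str],
                      read_back: Sequence[str]) -> Counter:
    """Count (written, read) state pairs for cells whose state changed."""
    if len(written) != len(read_back):
        raise ValueError("sequences must have the same length")
    return Counter((w, r) for w, r in zip(written, read_back) if w != r)


# Hypothetical 2-bit cell states before and after a retention period.
written = ["00", "00", "01", "11", "10", "00", "01"]
read_back = ["01", "00", "10", "11", "10", "01", "01"]

counts = transition_counts(written, read_back)
# most_common() lists the dominant transitions first.
dominant = counts.most_common(1)[0]
```

Sorting the tally this way makes an asymmetry between states visible directly, which is the kind of pattern the slide reports.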
2-bit MLC Background Overview

Internal Architecture of 2-bit NAND Flash Memory
LSB-Even Page Sets
LSB-Odd Page Sets
MSB-Even Page Sets
MSB-Odd Page Sets
Retention Error Location Dependency
[Figure: Vth distributions of the four states (LSB/MSB: 11, 10, 01, 00, separated by REF1, REF2, REF3) for odd-page cells and even-page cells]
• LSB pages have lower BER
• Even pages have lower BER
Program Interference
[Figure: floating gate with additional electrons injected, and the threshold voltage (VT) distribution of the four states (LSB/MSB) from erased to fully programmed: 11 | REF1 | 10 | REF2 | 01 | REF3 | 00]
• Program interference errors are caused by extra electrons injected when neighboring cells are programmed
• Cells with fewer programmed electrons suffer more from interference errors
• Threshold voltage is less likely to shift up by more than one level
Program Interference Error Value Dependency
[Figure: highlighted error transitions: 10 → 01, 11 → 10]
• Cells with fewer programmed electrons (i.e., 11 and 10) tend to suffer more from neighboring cell interference
Program Interference Error Location Dependency
• Program interference errors appear in even-MSB pages
• The BER of the bottom pages is orders of magnitude higher
Write Interference on Bottom Wordline
[Figure: NAND string bias during programming. Labels: bitline, SGD (Vdd), WL31 … WL n … WL0, Vpgm (20V), Vpass (10V), SGS (0V), GND; channel voltage 10 V and 0V]
• The potential at the drain edge of the SGS transistor is raised by channel boosting
• Electrons are accelerated between SGS and WL0 and are quite likely to be injected into the floating gate of WL0
• Hot-carrier injection (HCI) noise is generated by source/drain hot electrons in WL0
• The threshold voltage of cells on WL0 shifts right, and it can even shift across more than one level (e.g., 11->01 or 00)
Read Error Analysis
[Figure: floating gate and the threshold voltage (VT) distribution of the four states from erased to fully programmed: 11 | REF1 | 10 | REF2 | 01 | REF3 | 00]
Erase Errors Analysis
[Figure: cell bias during erase. Labels: 0V, n+, n+, +18 V]
• Continuous erases can significantly reduce errors
• They remove residual electrons
Conclusions & Future Work
• Flash errors can occur during any operation
  • Erase errors, program errors, retention errors, and read errors
• Retention errors are the most dominant errors
• Flash errors show explainable error patterns
  • Cycle dependency, value dependency, and location dependency
• Understanding modern flash memory error patterns will enable the design of effective error tolerance mechanisms
  • Value-asymmetry-aware coding techniques
  • Cell-location-aware wear leveling mechanisms