Threshold Voltage Distribution in MLC NAND Flash: Characterization, Analysis, and Modeling
Yu Cai1, Erich F. Haratsch2, Onur Mutlu1, and Ken Mai1
1. DSSC, ECE Department, Carnegie Mellon University
2. LSI Corporation
3/20/2013

Evolution of NAND Flash Memory
• Aggressive scaling
• MLC technology
• Increasing capacity
• Acceptable low cost
• High speed
• Low power consumption
• Compact physical size
E. Grochowski et al., “Future technology challenges for NAND flash and HDD products”, Flash Memory Summit 2012

Challenges: Reliability and Endurance
• P/E cycles (required): complete write of drive 10 times per day for 5 years (STEC), > 50k P/E cycles
• P/E cycles (provided): a few thousand
E. Grochowski et al., “Future technology challenges for NAND flash and HDD products”, Flash Memory Summit 2012

Solutions: Future NAND Flash-based Storage Architecture
Noisy Memory (raw bit error rate) -> Signal Processing -> Error Correction (BER < 10^-15)
Signal Processing
• Read voltage adjusting
• Data scrambler
• Data recovery
• Shadow program
Error Correction
• BCH codes
• Reed-Solomon codes
• LDPC codes
• Other Flash friendly codes
Need to understand NAND Flash error patterns and channel model
Need to design efficient DSP/ECC and smart error management

NAND Flash Channel Modeling
Write (Tx) -> Noisy NAND -> Read (Rx)
Simplified NAND Flash channel model based on dominant errors:
Write -> Additive White Gaussian Noise -> Cell-to-Cell Interference -> Time-variant Retention -> Read
• Additive White Gaussian Noise: erase operation, program page operation
• Cell-to-Cell Interference: neighbor page program
• Time-variant Retention: retention

Testing Platform
[Figure: test platform photo. Labels: USB Board, PCI-e Board, HAPS-52 Motherboard, Virtex-5 FPGA (NAND Controllers), Flash Board, Flash Chip]

Characterizing Cell Threshold w/ Read Retry
[Figure: #cells vs. Vth. Erased state (11) and programmed states P1 (10), P2 (00), P3 (01); read references REF1, REF2, REF3; 0V marked on the Vth axis; read-retry steps i-2, i-1, i, i+1, i+2]
• Read-retry feature of new NAND flash
• Tune the read reference voltage and check which Vth region each cell falls in
• Characterize the threshold voltage distribution of flash cells in programmed states through Monte-Carlo emulation

Programmed State Analysis
[Figure: measured distributions. Panels: P3 State, P2 State, P1 State]

Parametric Distribution Learning
• Parametric distribution
  • Closed-form formula; only a few parameters need to be stored
  • Exponential distribution family, described by a distribution parameter vector
• Maximum likelihood estimation (MLE) to learn parameters
  • Likelihood function of the observed testing data
  • Goal of MLE: find the distribution parameters that maximize the likelihood function

Selected Distributions

Distribution Exploration
[Figure: fitted distributions. Panels: P1 State, P2 State, P3 State]
Distribution    RMSE
Beta            19.5%
Gamma           20.3%
Gaussian        22.1%
Log-normal      24.8%
Weibull         28.6%
Distribution can be approximately modeled as a Gaussian distribution

Noise Analysis
• Signal and additive noise decoupling
• Power spectral density analysis of P/E noise
  • Flat in frequency domain
• Auto-correlation analysis of P/E noise
  • Spike at 0-lag point in time domain
• Can be approximately modeled as white noise

Independence Analysis over Space
• Correlations among cells in different locations are low (<5%)
• P/E operation can be modeled as a memory-less channel
  • Assuming ideal wear-leveling

Independence Analysis over P/E cycles
• High correlation between thresholds in the same location across P/E cycles
• Programming to the same location is modeled as a channel with memory

Cycling Noise Analysis
[Figure: distributions under cycling. Panels: P1 State, P2 State, P3 State]
As P/E cycles increase ...
• Distribution shifts to the right
• Distribution becomes wider

Cycling Noise Modeling
• Mean value (µ) increases with P/E cycles
  • Exponential model
• Standard deviation value (σ) increases with P/E cycles
  • Linear model

SNR Analysis
• SNR decreases linearly with P/E cycles
• Degrades at ~0.13 dB per 1000 P/E cycles

Conclusion & Future Work
• P/E operations modeled as a signal passing through an AWGN channel
  • Approximately Gaussian with 22% distortion
  • P/E noise is white noise
• P/E cycling noise affects threshold voltage distributions
  • Distribution shifts to the right and widens around the mean value
  • Statistics (mean/variance) can be modeled as exponential correlation with P/E cycles with 95% accuracy
• Future work
  • Characterization and models for retention noise
  • Characterization and models for program interference noise

Backup Slides

Hard Data Decoding
• Read reference voltage can affect the raw bit error rate
[Figure: two panels of f(x) and g(x) vs. Vth with state means v0 and v1, one with read reference vref and one with v'ref]
$BER_1 = \int_{v_{ref}}^{\infty} f(x)\,dx + \int_{-\infty}^{v_{ref}} g(x)\,dx$
$BER_2 = \int_{v'_{ref}}^{\infty} f(x)\,dx + \int_{-\infty}^{v'_{ref}} g(x)\,dx$
• There exists an optimal read reference voltage
• Optimal read reference voltage is predictable
  • Distribution sufficient statistics are predictable (e.g. mean, variance)

Soft Data Decoding
• Estimate soft information for soft decoding (e.g. LDPC codes)
[Figure: f(x) and g(x) vs. Vth with v0, vref, v1; the sensed threshold voltage range is split into high-confidence, low-confidence, and high-confidence regions]
• Log likelihood ratio (LLR):
$LLR(y) = \log\left(\frac{p(x=1 \mid y)}{p(x=0 \mid y)}\right)$
• Closed-form soft information for AWGN channel
  • Assume same variance to show a simple case

Non-Parametric Distribution Learning
• Non-parametric distribution
  • Kernel function
• Histogram estimation
  • Volume of a hypercube of side h in D dimensions
  • Count the number K of points falling within the h region
• Kernel density estimation
  • Smooth Gaussian kernel function
• Summary
  • Pros: accurate model with good predictive performance
  • Cons: too complex; too many parameters need to be stored

Probability Density Function (PDF)
[Figure: estimated PDFs. Panels: P1 State, P2 State, P3 State]
Probability density function (PDF) of NAND flash memory, estimated using the non-parametric kernel density methodology
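
The two non-parametric estimators named on the backup slides, the histogram count K over a region of side h and the smooth Gaussian kernel, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: it assumes one dimension (D = 1), the function names and the bandwidth h are hypothetical, and the Vth samples are synthetic stand-ins rather than measured flash data.

```python
import math
import random


def histogram_density(samples, x, h):
    """Histogram-style estimate K / (N * h) in one dimension (D = 1).

    K is the number of samples that fall inside the interval of side h
    centered on x; N is the total number of samples.
    """
    k = sum(1 for s in samples if abs(x - s) <= h / 2.0)
    return k / (len(samples) * h)


def gaussian_kde(samples, h):
    """Return a function x -> density estimate using a Gaussian kernel.

    Each sample contributes a Gaussian bump of standard deviation h
    (the bandwidth); the bumps are averaged over all samples.
    """
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return pdf


# Synthetic stand-in for the Vth of one programmed state.
# Mean 2.0 V and sigma 0.15 V are illustrative assumptions, not measurements.
rng = random.Random(0)
vth = [rng.gauss(2.0, 0.15) for _ in range(2000)]

pdf = gaussian_kde(vth, h=0.05)

# Evaluate on a grid from 1.0 V to 3.0 V in 10 mV steps.
step = 0.01
grid = [1.0 + step * i for i in range(201)]
area = sum(pdf(x) for x in grid) * step  # Riemann sum of the estimated PDF
peak = max(grid, key=pdf)                # grid point with the highest density

print(f"area under estimate: {area:.3f}")
print(f"peak location: {peak:.2f} V")
```

The trade-off noted in the summary is visible here: the kernel estimate has to keep every sample (2000 values in this sketch), whereas a Gaussian fit keeps only a mean and a variance.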