Design for Yield Using Statistical Design Fabian Klass Director of Technology and Manufacturing EE 380 Computer Systems Colloquium, Stanford University February 7, 2007 8/31/05 P.A. Semi, Inc. - Company Confidential 1 Outline About P.A.Semi Process Variability Statistical Models Circuit Examples Statistical Timing PVT Margin Test Structures CAD Challenges Summary 2 About P.A. Semi Santa Clara-based fabless processor company Power ArchitectureTM Licensee Design our own Power Architecture processors Only 3rd company after IBM and Freescale Noted industry veterans combine in 150-strong organization Venture backed by Bessemer, Venrock, and Highland Capital Currently engaged with over 100 customers across different market segments Strategically partnered with IBM Breakthrough processor solution focused on low power @ high performance Scalable 64-Bit Power multicore architecture Redefines high performance (2GHz) at ultra low power (4W) 39 patents filed and 11 more patents in progress towards filing 3 Target Markets Compute Server Blades Digital Entertainment Embedded Boards Routers Critical Requirements Power Efficiency High Performance Cost Efficiency Throughput Efficiency Open source OS/Tools etc Game Players Imaging Systems Storage Systems Switches Wireless Basestations 4 The Challenge Power Management Process Variability 5 Process Variability Three main reasons why process variability has become so important: Moore's law: Exponential growth in device integration Billions of devices per die in 65nm and beyond Shrinking devices: Gate oxides approaching a few Angstroms Fewer dopants under the gate (~102) Ultra low VDD: VDD scaling < 1V to manage power. Vt not scaling, limited by leakage. Less headroom, more sensitivity to ∆Vt. 6 Process Variation Global: die-to-die, wfr-to-wfr, and lot-to-lot variations caused by changes in: Tox Xtor W & L N/PWELL doping N/PMOS flatband voltage Stress-induced effects Global Local: within-the-die variations caused by: Xtor W & L mismatch Vt mismatch ACLV Local 7 Width and Length Mismatch Caused by variations in the lithographic process Width and Length variations are uncorrelated Small transistors more sensitive to W/L changes Intel 65nm 6T SRAM cell 65nm CMOS NAND cell 8 Vth Mismatch Random fluctuations due to relatively small number of dopants in the channel Vth=0.78V Vth variance is inversely proportional to transistor area 170 Dopants Pelgrom's Law: V th =K / W × L Vth=0.56V Source: Asenov A. “Random Dopant Induced Threshold Voltage Lowering and Fluctuations in Sub-0.1 um MOSFET's: A 3-D 'Atomistic' Simulation Study” IEEE Trans. On Electron Devices, Vol 45, No 12, Dec 1998 9 Statistical Models Provided by most foundries More realistic than corner models. NMOS SF Cover the full design space. TT Foundries typically offer a 3σ process. The number of local sigma is determined by the designer. FF 1 23 SS 45 σL (Local) 1 2 3 4 σG (Global) FS PMOS 10 Monte Carlo Monte Carlo involves simulating a circuit over a wide range of randomly chosen devices parameters The result is a distribution plot of design constraints, e.g., delay or noise margin Typically tens of thousands simulations needed, including Vdd and Temp sweeps Delay Distribution 0.400 0.300 0.200 0.100 0.000 Vth 3. 2. 2. 2. 2. 2. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. -0 -0 -0 -0 -1 -1 -1 -1 -1 -2 -2 -2 -2 -2 -3 0 8 6 4 2 0 8 6 4 2 0 8 6 4 2 0 .2 .4 .6 .8 .0 .2 .4 .6 .8 .0 .2 .4 .6 .8 .0 Delay 11 When to use statistical analysis Usage limited to process-sensitive circuits: Races Contention Mismatch Usage limited to high-usage circuits: SRAM cells Register file cells Flip-flops Sensamps Usage limited to highly-critical circuits: Max and min critical paths 12 How many Sigmas? Failure criteria: (µ - N σG – M σL) > Safe Margin where: µ is the mean N is determined by the foundry and is typically 3. M is determined by the number of instances of the circuit being analyzed: # of instances 100 1,000 10,000 100,000 1,000,000 10,000,000 M 2.33 3.09 3.72 4.26 4.75 5.20 Example: 1MB SRAM needs M=5 sigma for the bit design. 13 Example 1: 6T SRAM Cell Find stable VDD window for 6T SRAM cell (1MB) Flow: Run Monte Carlo SNM sims Find µ, σG, & σL across VDD Define safe margin Plot 3σG and 5σL curves 0.250 Mean Find Vdd window where SNM > Safe margin. SNM [MV] 0.200 3σG 0.150 0.100 Safe 0.050 Margin 5σL 0.000 0.5 0.55 0.6 0.65 0.7 0.75 0.8 Vdd [V] 0.85 0.9 0.95 1 14 Example 2: Sense Amplifier Find min VDIFF for sensamp Flow Run Monte Carlo Plot passing ratio vs. ∆Vin Find µ & σL for sensamp Find µ & σL for SRAM Iread 100% 90% Min VDIFF: 70% SA 1− M × 2 IREAD / IREAD PASS V DIFF = 80% 60% 50% 40% 30% σSA 20% 10% 0% -200 -150 -100 -50 0 50 100 Delta Vin [mV] 150 200 15 Other Circuits Other possible applications for statistical circuit design: Dynamic logic Latches Register files cells Pulsed flops Level shifters Analog circuits Advantages: All circuits designed to a target sigma Avoid weak links Avoid overdesign 16 Statistical Timing Each gate has a mean and sigma. Sigmas can be computed using Monte Carlo The sigma of a path is determined by adding (i.e., sumsquare) the sigmas of individual gates Critical Path σ2 σ1 σn Path = 21 22... 2n 17 Speed Distribution Each chip has a local distribution on top of the global distribution due to local variations Not all parts within a [+3σ, -3σ] window will yield above target due to local variations Speed Distribution Local σLocal +3σGlobal Global σGlobal −3σGlobal 18 OCV Ratio and Yield On-chip Variability (OCV) = σLocal OCV Ratio = σLocal / σGlobal Speed yield strongly dependent on OCV ratio Speed Distribution 0.400 NormDist 0.300 OCV Ratio 0.200 OCV Ratio=2.0 OCV Ratio=1.5 OCV Ratio=1.0 OCV Ratio=0.5 OCV Ratio=0.0 0.100 0.000 3. 2. 2. 2. 2. 2. 1. 1. 1. 1. 1.Speed 0. 0. 0. 0. 0. -0 -0 -0 -0 -1 -1 -1 -1 -1 -2 -2 -2 -2 -2 -3 0 8 6 4 2 0 8 6 4 2 0 8 6 4 2 0 .2 .4 .6 .8 .0 .2 .4 .6 .8 .0 .2 .4 .6 .8 .0 19 Yield Examples Speed yield is affected by the shape of the timing histogram: These three histograms have very different speed yields (OCV Ratio=1.5): These three histograms have the same speed yield (OCV Ratio=0.9): Slack 90ps 60ps 30ps 0 -30ps Top Path [ps] 2GHz Yield Slack 90ps 60ps 30ps 0 -30ps Top Path [ps] 2GHz Yield Hist#1 10000 1000 100 10 0 500 37.5% Hist#1 10000 1000 100 10 0 500 89.8% Hist#2 1000 100 10 0 0 470 74.0% Hist#2 10000 1000 350 0 0 470 89.8% Hist#3 1000 100 10 5 1 530 60.8% Hist#3 10000 1000 100 0 1 530 89.8% 20 Margining Races Races need to be margined for PVT variations. Fixed PVT margin (conventional): D 21 =D2− D1m× D 1 D 2 D21 Fixed PVT sigma: D 21 =D 2− D 1M D 1D 2D 21 Drawbacks of fixed margin: Pessimistic for long delays Optimistic for short delays D21 D1 D2 Advantages of fixed sigma: Accurate (pseudo-statistical) M can be tuned for a specific design 21 Margining Races (cont.) Fixed sigma: PVT margin varies with the logic depth PVT margin varies with Vdd 50.0% PVT Margin [%] 40.0% Lower Vdd 30.0% 20.0% 10.0% 0.0% 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Logic Depth 22 Measuring Process Variability Test structures were developed to measure process variability. A testchip was built in a 65nm, triple-Vt, dual-oxide CMOS process. Data was collected across dies, wafers, lots, and across voltage and temperature. Measured data was used to: Validate statistical SPICE models Monitor process development Determine design margins Predict circuit limited yield 23 A Racer Circuit A Racer circuit measures on-die process variations in Si. > 100 copies of the Racer module are placed across the die. The spread in the location of the leading “1” provides an indication of the process variability. 1/2 0 FF FF FF 1 1 FF FF 1 0 FF FF 0 1 clk_en CKG MUX XOR/ XNOR 0 FF FF 1 FF FF 0 0 CKG 1 0 select_0_L select_1_L MUX 2 4 6 clk 24 Racer Results Bar Chart Yield @ 4 Sigma 100% 800 600 400 200 0 1000 800 600 400 200 0 Lower Vdd Racer data shows large spreads at low Vdd. Data can be used to predict circuit yield across Vdd. Low Vdd is the yield limiter! 1000 1000 800 600 400 200 0 1000 800 600 400 200 0 Yield 75% 1000 800 600 50% Lower Vdd 400 200 0 1000 25% 800 600 400 200 0% 3% 6% 8% 10% 11% #Inverters 13% 14% 15% 17% 0 0 1 2 3 4 5 6 7 8 #Inverters 9 Binned Inverter Count 25 10 A Leaker Circuit A Leaker circuit measures leakage spread (Ioff/Ion) in Si. It measures Ioff/Ion by sensing a tied-off skewed inverter with a 2P:1N inverter and latching to a flop. Multiples copies of the leaker module are placed on the die. Separate modules are used for standard Vt, low Vt, and high Vt devices. 1024xWN LK FF 1 512xWN LK FF 1 LK FF 1 LK FF LK FF 2xW N WN 0 0 clk WP WN 26 Leaker Results IOFF/ION RATIO 50 45 DISTRIBUTION 40 35 30 HVT SVT 25 LVT 20 15 10 5 0 0 1 2 3 4 5 6 7 8 9 IOFF/ION RATIO (Log2) 10 11 12 IOFF/ION RATIO MEAN + 4 SIGMA 20.0 IOFF/ION RATIO [Log2] Leaker data was collected across voltage and temperature. Distributions were generated and µ/σ data was obtained. Ioff/Ion ratio worse at low Vdd for all Vt devices Ioff/Ion ratio worse at high temperature for all Vt devices 18.0 16.0 14.0 12.0 HVT SVT 10.0 LVT 8.0 Safe Margin 6.0 4.0 2.0 0.0 0.66V 0.76V VDD 0.91V 1.1V 27 CAD Challenges Applications for statistical design: Timing Power ERC Reliability Main Challenges: Run time: Running Monte Carlo on a library would take years! Tools need to be 'context aware': Ex:Timing optimization depends on the shape of the timing histogram Pseudo-statistical approach Using statistical methods without running Monte Carlo. 28 CAD Challenges (cont.) Cell based designs Library characterization should produce µ,σ. Timing analyzer output should be speed yield. Transistor level design In-situ characterization to generate µ,σ Timing analyzer to create µ,σ for macro ERC/Reliability Statistically derived design rules Waivers based on distributions and yield impact Yield, Yield, Yield Tools should predict yield as a metric for signoff. 29 CAD: What's missing Tool Integration Integration of DFM and DFY tools to predict: Manufacturing yield Functional Yield Speed yield Overall product yield Validation Validation of DFY tools in Silicon Justification of investment 30 Summary Ignoring process variability may lead to non-functional designs or suboptimal yields. DFY will become more relevant as Vdd continues to scale and device geometries keep shrinking. Circuit solutions alone will not be sufficient if Moore's law continues. Process variability need to be handled at higher levels of the design process Future designs will incorporate: Self-checking logic Self-correcting logic Redundant logic (besides SRAMs) Wearout compensation mechanisms. 31 Thank You The P.A. Semi name and the P.A. Semi logo and combinations thereof are trademarks of P.A. Semi, Inc. The Power name is a trademark of International Business Machines Corporation, used under license therefrom. SPECint and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation (SPEC). All other trademarks are the property of their respective owners. 8/31/05 P.A. Semi, Inc. - Company Confidential 32