Combination of Multiple Mechanism for Post-Silicon Reliability Prediction
Joseph B. Bernstein, Ofir Delly, Moti Gabbay
Ariel University
Yizhak Bot (BQR)
josephbe@ariel.ac.il
April 30, 2014

We always try learning from the past in order to improve the Future.
One Problem... Everyone sees the past differently!

“It is possible to fail in many ways...while to succeed is possible only in one way…” Aristotle
“If we don’t learn from the past, we are condemned to repeat it…” George Santayana, 1952

The Semiconductor Test Industry Today
We test the parts “blindly” and then “see how they run…”

Field Data Analysis Results
Cumulative data for over 10,000,000 Military Electronic Systems
[Figure: Weibull Beta Parameter Histogram. x-axis: Beta, 0 to 4; y-axis: Frequency. Annotated regions: "MTBF Region" and "Physics of Failure".]
• β = 1 ± 0.2 for all systems
• Field failures are generally constant-rate occurrences; Beta = 1 is Poisson.
• So, we should keep MTBF and FIT

Some Observations:
• Modern electronics have a nearly constant failure rate
• Few (very rare) exceptions
• Keep the idea of Constant Rate and work within the framework of Failure-In-Time (FIT)

So what’s the problem with FIT?
Handbooks are pretty outdated
o MIL 217 is OLD and USELESS.
o FIDES is updated but only applies a single mechanism approach.
o Physics of Failure (PoF) approach looks to TTF and not FIT.
o Probabilistic DfR requires unique distributions for each mechanism.
o HALT/HASS cannot predict λ.

JEDEC Publication JEP 122G, Rev. Oct. 2011
I bet you didn’t know JEDEC says this:
2 Terms and definitions (cont’d)
quoted failure rate: The predicted failure rate for typical operating conditions. (This is the FIT)
NOTE: The quoted failure rate is calculated from the observed failure rate under accelerated stress conditions multiplied by an accelerated factor; e.g. …
“When multiple failure mechanisms and thus multiple acceleration factors are involved, then a proper summation technique, e.g., sum-of-the-failure-rates method, is required.”

Semiconductor Industry ‘Joke’
The Magical Mysterious Decreasing FIT
Intel, Maxim, PLX
1 FIT = 1 failure per 10,000 parts in 12 years. If ONLY this were true!

Measured Component FIT (λ) vs. Year Produced
Field Return Data
[Figure: Failure Rate (FIT), ACTUAL failures per billion part-hours, log scale from 1 to 1000, vs. Year (sold), 1985 to 2020. A lower line marks the "Avionic and Military Expectation!"]
Data labels on the plot:
• 0.25 µm: ~20-50 FIT
• 130 nm: ~90-120 FIT
• 90 nm: ~150-300 FIT
• 65 nm: ~300-450 FIT
• 45-22 nm: ???
• Compared to previous avionic system data, the trend continues at a much greater than expected rate.
• Bernstein’s Law: ~10x increase in FIT every 10 years

Benefits to Accurate Prediction !!
1. More applications means more Sale$
- Performance is designed for a required reliability specification
- Suggestion: a small reduction in performance can bring a huge gain in reliability
[Figure (illustrative): Reliability vs. Performance, with points 1 and 2 marked X]
2. Two products; one design
- More customers for the same design

Performance vs. Reliability
[Figure: 21-stage inverter RO. x-axis: Core Voltage (V), 0 to 3.5; y-axis: Freq. (Hz), 0 to 1.2E+08. Annotations: "Nominal Voltage", "Why not operate here?", "I could double the speed for free"]
• If I KNOW the reliability, maybe I CAN improve performance!?

Qualification TODAY
Industry ‘Standard’ FIT (failures in time) model: the Acceleration Factor (AF) is the product of the Voltage and Temperature acceleration factors.
3 KILLER problems:
1. This does NOT fit with KNOWN failure models.
2. When ZERO failures are reported, there is NO statistical meaning to the acceleration factor.
3. Uncertainty is assumed for 0/1 fails, while AF has ZERO uncertainty; no accounting for error in AF !!

Multiple Mechanisms Are Here to Stay
• The traditional reliability approach fails to predict field failures.
• Modeling, simulation and acceleration alone will NOT yield true results without accurate failure analysis.
• HOWEVER: We CAN model and PREDICT failure rate under known conditions with a more complete picture of the mechanisms ???

Multiple Mechanisms Don’t Add Up !!!
- Single Mechanism Model: $AF_{system} = AF_{Thermal} \times AF_{Electrical}$
- So, $1/MTTF_{use} = 1/(MTTF_{test} \times AF_{MM})$
- Multiple Mechanism Model: $1/MTTF_{use} = P_1/(MTTF_{test} \times AF_{mech1}) + P_2/(MTTF_{test} \times AF_{mech2})$
- Therefore, the effective AF for multiple mechanisms is:
$$AF_{MM} = \frac{1}{\dfrac{P_1}{AF_{mech1}} + \dfrac{P_2}{AF_{mech2}}}$$
• The true acceleration factor is the SMALLER one, not the one which exposes a failure at accelerated test.

Traditional Methodology
• Single Mechanism Model (old JEDEC Standard):
- 77 devices tested for 1000 hours with 0 failures…
- For example: $AF_T = 100$ and $AF_V = 130$
• $AF_S = 100 \times 130 = 13000$ !!
Zero failures at High V and High T. Assume 1 failure after 1000 hours:
Thus FIT: $10^9 / (77 \times 1000 \times 13000) \approx 1$ FIT !! NICE!
• Now, we have done a great job and can go home and celebrate our success !!! NOT !!!

The Reality of Multiple Mechanisms
• BUT… Multiple Mechanisms Compete!
• Same example: $AF_V$ from HCI and $AF_T$ from EM
- EM has Ea = 1 eV and voltage γ ~ 1.
- HCI has Ea ~ 0 eV and voltage γ ~ 14.
• NOW, $AF_S = 2/(1/100 + 1/130) \approx 113$
• So our correct calculation for the same data:
FIT: $10^9 / (77 \times 1000 \times 113) \approx 115$ FIT !!
This is compared to 1 FIT based on HTOL.
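The single-mechanism and multiple-mechanism numbers above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the authors' tool: the function names `combined_af` and `fit_rate` are hypothetical, equal weights P1 = P2 = 0.5 are assumed (which reproduces the 2/(1/100 + 1/130) form), and one failure is assumed among 77 devices over 1000 hours, as in the example.

```python
def combined_af(afs, weights=None):
    """Effective acceleration factor for competing mechanisms: 1 / sum(P_i / AF_i).

    Equal weights are an assumption made here when none are supplied.
    """
    if weights is None:
        weights = [1.0 / len(afs)] * len(afs)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return 1.0 / sum(p / af for p, af in zip(weights, afs))


def fit_rate(failures, devices, hours, af):
    """Failures per 1e9 device-hours, referred to use conditions via af."""
    return failures * 1e9 / (devices * hours * af)


af_t, af_v = 100.0, 130.0                  # example values from the slides

af_single = af_t * af_v                    # product model: 13000
af_multi = combined_af([af_t, af_v])       # weighted model, P1 = P2 = 0.5

fit_single = fit_rate(1, 77, 1000, af_single)
fit_multi = fit_rate(1, 77, 1000, af_multi)

print(f"single-mechanism AF = {af_single:.0f}, FIT = {fit_single:.2f}")
print(f"multi-mechanism  AF = {af_multi:.1f}, FIT = {fit_multi:.1f}")
```

Evaluated as written, the weighted formula gives a combined factor of about 113 and roughly 115 FIT, against about 1 FIT from the product of the two factors. Unequal weights can be passed as the second argument when the relative contribution of each mechanism is known.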
• Traditional FIT is ALWAYS too low as compared to considering multiple mechanisms

Failure Rate Estimation at System Level
New System Reliability Model Replacement Program (collaboration)
[Diagram: the Nth component broken down into failure mechanisms FM1, FM2, FM3]
Each component is comprised of several sub-components in proportion to their function and relative reliability stress.
$\lambda_O = \lambda_O' \cdot P_O = (B_{1\text{-}O}\lambda_{HCI} + B_{2\text{-}O}\lambda_{TDDB} + B_{3\text{-}O}\lambda_{EM} + B_{4\text{-}O}\lambda_{NBTI}) \cdot P_O$
$\lambda_D = \lambda_D' \cdot P_D = (B_{1\text{-}D}\lambda_{HCI} + B_{2\text{-}D}\lambda_{TDDB} + B_{3\text{-}D}\lambda_{EM} + B_{4\text{-}D}\lambda_{NBTI}) \cdot P_D$
$\lambda_S = \lambda_S' \cdot P_S = (B_{1\text{-}S}\lambda_{HCI} + B_{2\text{-}S}\lambda_{TDDB} + B_{3\text{-}S}\lambda_{EM} + B_{4\text{-}S}\lambda_{NBTI}) \cdot P_S$
$\lambda_J = \lambda_J' \cdot P_J = (B_{1\text{-}J}\lambda_{HCI} + B_{2\text{-}J}\lambda_{TDDB} + B_{3\text{-}J}\lambda_{EM} + B_{4\text{-}J}\lambda_{NBTI}) \cdot P_J$
The base failure rate can be determined at various accelerated conditions in order to normalize the matrix and make a physics-based reliability assessment from test data combined with knowledge of the application.

FIXtress™: A MORE ACCURATE FIT
$\lambda \sim \sum_{i=1}^{n} 1/MTTF_i = 1/MTTF_1 + 1/MTTF_2 + \dots + 1/MTTF_n$
The manufacturers have the data, we can make the prediction (BQR Software Tool)!
[Figure: Calculated PDF (FIT) vs. Time to Fail (years), 2 to 12. Curves: λTDDB, λHCI, λNBTI, λEM, λPackage and the total λ]

Our Guiding Principle:
“It is better to be roughly right than precisely wrong.”
John Maynard Keynes

Post-Silicon Test Strategy
How can we match data from reliability models with experimentally obtained AF from HTOL?
PROPOSAL: Run multiple tests at different conditions while monitoring degradation.
• AF from burn-in at different T, V
• Physics of Failure models (JEDEC)
• Matrix solution can match

Our New Approach (ARIEL)
Input: JEDEC or TSMC physics models giving MTBF / FIT for 24 failure mechanisms over 4 categories
Input: DOE burn-in
Matrix of relative AF (relative MTBF/FIT) from the models: one row per test condition (T1,V1), (T2,V2), (T3,V3), (T4,V4) and one column per mechanism (λTDDB, λHCI, λBTI, λEM)
Matrix × X (TDDB, HCI, BTI, EM) = System (TEST) measurements
Output of the matrix solution: proportionality parameter X
Input: DPPM per Fmax limit (real FIT at V, T test)
Reliability solution: FIT, DPPM

Contributions from JEDEC Models (45nm)
Different dominant mechanism at each test condition

Condition  Temp (°C)  Volt (V)  TDDB        HCI       BTI       EM        FIT
           200        1.2       2.93E+03    8.35E+00  4.26E+04  2.40E+05  242750
           140        1.2       3.71E+02    1.59E-01  4.55E+02  9.71E+03  9710
           -35        2.4       3.19E+08    2.12E+13  9.08E+07  8.16E-05  9710000
           140        2.4       5.10E+13    5.13E+11  2.20E+13  9.71E+03  703975
Use        30         1.2       1.00        1.00      1.00      1.00      1
           85         1.2       30          0.67      34        399.00
HTOL       120        1.8       5305442428  739966    42398594  5362

HTOL is OVERWHELMINGLY measuring only TDDB
• This is very convenient when zero failures arise during the 1000 hour HTOL test.
• Foundries design the gate oxides very well, so there WILL be NO TDDB failures during HTOL testing.
• The 3 other mechanisms are just ignored during final test and qualification.

Separation of Mechanisms
• Failure mechanisms can be separated by properly selecting test conditions.
• High Temperature and Low Voltage tests for EM
• High Temperature and High Voltage tests for NBTI and for TDDB
• Low Temperature and High Voltage tests for HCI

Two Distinct Mechanisms!
• HCI: frequency dependence
• Seen at LOW T and High V
• NBTI: no freq. dependence
• Seen at High T and High V
[Figure: two panels plotted against F (MHz), one at -35°C, 2.4 V and one at 140°C, 2.4 V; vertical scale 0 to 0.006]
Note: -35°C has >2.5X the failure rate at 140°C for the same voltage !!

TDDB from NBTI
[Figure: 21-stage RO Frequency vs. Voltage. y-axis: Performance (freq.). Regions labeled Neg. Bias-Temperature Instability (NBTI) and Time-Dependent Dielectric Breakdown (TDDB)]
[Figure annotation: "Soft breakdown". x-axis: Voltage-core, 0 to 3.5]

Prediction for 28nm
[Figure, left panel: FIT for f = 1 GHz. FIT per billion gates (log scale, 1 to 1000) vs. Temperature (°C), 30 to 130, one curve per voltage: 0.8, 0.9, 1.0, 1.1, 1.2]
[Figure, right panel: FIT for V = 1.0 V. FIT per billion gates (log scale) vs. Temperature (°C), 30 to 130, one curve per clock frequency up to 2 GHz]
Dominant mechanisms are EM and BTI, so strong T and freq. dependence but weak V dependence.

Observation
• Increase voltage by 20%
• Increase performance by 20%
• Increases FIT by only a factor of 2
• Increased customer satisfaction
• Increased sales for FREE !!!

Main Observations
1. The dominant mechanism at HTOL test is never the dominant mechanism at USE conditions.
2. An acceleration factor based on a 1-mechanism model significantly overestimates reliability.
3. Foundry models today are quite sophisticated and consider N- and P-MOS based on their own data AND companies trust these models.
4. The chip companies WANT to consider the true contributions of EACH mechanism.

Conclusions
• We have developed a prediction model that is based on 4 failure mechanisms
• Our model is more accurate than the single failure model currently in use
• Collaboration with industry is necessary to verify our models and to keep pace with advancing technology

Thank You