2012 ARS, Europe: Warsaw, Poland
Track 1, Session 5
Begins at 9:10 AM, Thursday, March 29th

Why You Cannot Predict Electronic Product Reliability
Albertyn Barnard
Lambda Consulting

PRESENTATION SLIDES

The following presentation was delivered at the:
International Applied Reliability Symposium, Europe
March 28 - 30, 2012: Warsaw, Poland
http://www.ARSymposium.org/europe/2012/

The International Applied Reliability Symposium (ARS) is intended to be a forum for reliability and maintainability practitioners within industry and government to discuss their success stories and lessons learned regarding the application of reliability techniques to meet real-world challenges. Each year, the ARS issues an open "Call for Presentations" at http://www.ARSymposium.org/europe/presenters/index.htm and the presentations delivered at the Symposium are selected on the basis of the presentation proposals received.

Although the ARS may edit the presentation materials as needed to make them ready to print, the content of the presentation is solely the responsibility of the author. Publication of these presentation materials in the ARS Proceedings does not imply that the information and methods described in the presentation have been verified or endorsed by the ARS and/or its organizers.

The publication of these materials in the ARS presentation format is Copyright © 2012 by the ARS, All Rights Reserved.

Agenda
• Introduction
• What is reliability?
• Why you cannot predict reliability
• Published failure data
• When can reliability prediction be used?
• Practical prototype test
• Physics of failure analysis
• Summary
• Questions
Durations as listed on the slide: 5 min, 5 min, 25 min, 10 min, 5 min, 10 min

Introduction
Albertyn Barnard, South Africa
• Reliability engineering consultant since 1982
• Primary focus on electronic product development
• Systems engineering viewpoint
• Established first commercial HALT facility in South Africa
Why you cannot predict electronic product reliability
• What is reliability prediction?
• What is reliability engineering?
• What is reliability accounting?

An accurate prediction of the field reliability of an electronic product during the development stage is, for obvious reasons, highly desirable:
• Accurate forecasts of support requirements
  • Spares, facilities, personnel, etc.
• Accurate forecasts of financial risks
  • Annual return rate, warranty costs, etc.
• Marketability benefits
Many reliability prediction standards have been developed and applied for many years, and some “new” standards are constantly under development.
However, when these methods and standards are carefully analysed, all seem to be based on misleading or even incorrect assumptions.

This presentation argues that reliability prediction of an electronic product, as performed today in many industries, is an exercise in futility.
• All design engineers and technical managers should be aware of these serious shortcomings
• The presentation concludes with an example of when reliability prediction may provide useful engineering knowledge
Objective of reliability prediction:
• To estimate field reliability (during product development stages)
[Figure: timeline with Development & Production before t=0 and Operations (future) after t=0]

Basic reasoning when performing reliability prediction:
• Product consists of parts
• Parts have failure rates
• Determine part failure rates
• Add part failure rates to obtain product failure rate
Experience suggests that some products never fail (in useful life), while others fail frequently.
Why are some products more reliable than others, especially since basically the same parts are used?
Consider the following scenario:
• Product contains 2,000 electronic parts
• When a failure occurs and root cause analysis is performed, system failure can usually be attributed to the failure of a single part (i.e. 1,999 parts not failed)
• System “MTBF” is then calculated based on the reliability of this single part?

What is reliability?
All failures in electronic equipment can be attributed to a traceable and preventable cause, and may not be satisfactorily explained as the manifestation of some statistical inevitability.
Norman Pascoe, Reliability Technology: Principles and Practice of Failure Prevention in Electronic Systems, 2011

All non-conformances are caused. Anything that is caused can be prevented.
Philip Crosby, Quality Without Tears: The Art of Hassle-Free Management, 1995

These quotations emphasise two fundamental concepts in reliability engineering:
1) failures are caused, and
2) failures can be prevented
Reliability is the absence of failures.
Reliability engineering is the management function that prevents the creation of failures.
[Figure: timeline with Development & Production before t=0 and Operations (future) after t=0]

Product is reliable if it does not fail!
• This is what the customer expects!
• Failure-free state can only be achieved if failure is prevented from occurring
What is required to prevent failures?
• Engineering knowledge to understand failure mechanisms
• Management commitment to mitigate or eliminate them
Proactive prevention should be the focus of reliability engineering
• Not reactive failure correction or failure management
• Reliability engineering should not be “playing the numbers game”
Failures are created primarily due to errors made by design and production personnel
• Products seldom fail due to part failure
• Products often fail due to incorrect application and integration of those parts

[Figure: Bathtub curve. Failure rate versus time over three phases: infant mortality (failure of weak items), useful life (externally induced failures) and wear-out (wear-out failures).]

[Figure: Improved bathtub curve. Failure rate versus time over infant mortality, useful life and wear-out.]
• No or low infant mortality
• No or low failures during longer useful life
• Wear-out occurs later

Why you cannot predict reliability

Reliability prediction based on “published failure data”
• System or product decomposition
• Obtain failure rate for each part (assuming all parts have failure rates)
• Calculate part failure rate (based on the Arrhenius model for temperature, and a number of Pi factors, e.g. environment, quality, complexity, etc.)
• Use database (similar item, parts count, part stress)
• Add failure rates for system failure rate (assuming failure rates can be added)
• MTBF = 1 / Σ λi
MIL-HDBK-217 “Reliability Prediction of Electronic Equipment”
• Most widely used approach by both commercial and defence industries
• No longer being updated by US DoD

Other standards based on “published failure data”
• BELLCORE TR-332 (Telcordia SR-332): telecommunications industry
• RDF 2000: European method developed by CNET
• 217Plus: Reliability Information Analysis Center
• HRD5: British Telecom
• IEC 61709 & IEC TR 62380 (Reliability data handbook): Electric components – Reliability – Reference conditions for failure rates and stress models for conversion

MIL-HDBK-217 “Reliability Prediction of Electronic Equipment”
Comment published 44 years ago:
“Figures 4.5 to 4.14 are adapted from “Reliability Stress and Failure Rate Data,” Mil-Hdbk-217, Government Printing Office, Washington, D.C., 1962. The second edition bears the number Mil-Hdbk-217A, and was published in 1965. It is disquieting that in many cases 217A (based on different but supposedly equivalent data) tabulates failure rates a decade higher than 217. Not only is the magnitude of the difference significant, but the direction is counter to the trend which one would expect during a time of component-reliability improvement.”
Martin Shooman, Probabilistic Reliability: An Engineering Approach, McGraw-Hill, 1968

Reliability prediction is an exercise in futility!
Example from http://ultravolt.com
• Calculated MTBF = 2,204,750 hours (for GB, i.e. ground benign, 21°C)
• 2,204,750 hours ≈ 251 years!
• This is not (reliability) engineering!

[Figure: failure rate versus operating temperature, comparing the Mil-Hdbk-217F curve with reality; the maximum rated temperature is marked on the temperature axis.]

A rough rule of thumb is that the operating life of semiconductor devices decreases by half for every 10°C rise in temperature above 100°C.
Article in Nuts and Volts (July 2009), referencing Motorola Semiconductor Technical Data Sheet AN1083, 1990

Some well-known documents such as Mil-Hdbk-217 and derivatives of it treat all flaws as being precipitated by temperature alone, which is completely erroneous. As a matter of general interest, it is noted in passing that the Arrhenius equation has been incorrectly used to describe any number of failure modes which do not follow the equation at all. Mil-Hdbk-217 was a prime example of the rampant misuse of the Arrhenius equation.
Gregg Hobbs, Accelerated Reliability Engineering: HALT & HASS, 2000

In the author's opinion, Mil-Hdbk-217 should be immediately placed in the shredder and all concepts therefrom simultaneously placed in one's mental trash can. Mil-Hdbk-217 will go down in history as one of the biggest impediments to progress ever promulgated on the technical community.
Gregg Hobbs, Accelerated Reliability Engineering: HALT & HASS, 2000

A very serious reservation arises in connection with the relationship between temperature and failure rate expressed by the reliability predictions of Mil-Hdbk-217. The usual relationship is based on the Arrhenius formula for reaction kinetics in physics and chemistry. The relationships in electronic devices have been worked out by testing parts to failure at high temperatures and by calculating the activation energies for the processes which lead to failure. The flaw in this argument is that the great majority of electronic parts do not suffer from physical or chemical degradation.
PDT O’Connor, Solid State Technology, August 1990

Temperature is probably simply another design variable, and once accommodated by engineering techniques, would have no other influence, i.e. reduction in temperature would not reduce failures. It is probably a lot more cost-effective to design boxes for the environment than to modify the environment to suit perceived sensitivities, especially when those sensitivities are at best vaguely understood.
CT Leonard, IEEE Transactions on Reliability, December 1988

It is my own belief that under worst case design operating conditions for equipment, temperature induced failure mechanisms are not significant during the useful life of a system. For this to be true, a necessary condition is that the electrical functionality of system components is assured beyond the system temperature envelope. The significance of this is that system reliability will not be improved by lowering the equipment operating temperature.
EB Hakim, Solid State Technology, August 1990

If you can predict reliability, why don’t you prevent failures?
• An accurate prediction of reliability implies such knowledge of the causes of failure that they could be eliminated
• If you can predict reliability, it means that you know what will fail in future. Why not prevent it from occurring now?
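
The quotations above criticise the Arrhenius-based temperature model behind handbook predictions. A minimal sketch of that model may help show why the result depends so strongly on an assumed activation energy. The function names and the activation energies below are hypothetical and chosen only for illustration; they do not come from the presentation or from any handbook.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, t_low_c, t_high_c):
    """Ratio of Arrhenius reaction rates at t_high_c versus t_low_c (both in Celsius)."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k))


def activation_energy_for_factor(factor, t_low_c, t_high_c):
    """Activation energy (eV) that would produce the given rate ratio between two temperatures."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return BOLTZMANN_EV_PER_K * math.log(factor) / (1.0 / t_low_k - 1.0 / t_high_k)


if __name__ == "__main__":
    # Hypothetical activation energies; real values depend on the failure mechanism.
    for ea in (0.3, 0.7, 1.0):
        af = arrhenius_acceleration_factor(ea, t_low_c=100.0, t_high_c=110.0)
        print(f"Ea = {ea:.1f} eV: a 10 degC rise from 100 degC scales the rate by {af:.2f}x")

    # Activation energy implied by the "life halves per 10 degC" rule of thumb.
    implied = activation_energy_for_factor(2.0, t_low_c=100.0, t_high_c=110.0)
    print(f"Halving rule implies Ea of about {implied:.2f} eV at 100 degC")
```

With these assumed values the same 10°C step scales the rate by roughly 1.3x, 1.8x and 2.3x, and the halving rule corresponds to about 0.85 eV at this temperature. The model only applies where a thermally activated degradation mechanism actually exists, which is the point the quoted authors dispute.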

Reliability prediction is contrary to proven wisdom expressed by quality and reliability gurus
• Edwards Deming: “Avoid numerical goals. Alternatively, learn the capabilities of processes, and how to improve them.”
• Philip Crosby: “Zero Defects” is an asymptote (i.e. continuous improvement).
• Ralph Evans: “The ultimate goal of reliability engineering is surely not to generate an accurate reliability number for the item.”
If the reader is to play an effective role in contributing to failure-free targets, then it is vital that the myths embedded within much of the twentieth century reliability folklore are properly recognised and appropriately discarded. On the other hand, the legacies bequeathed by the quality pioneers and gurus of the twentieth century should, based upon their proven merit, be studied, understood and applied with earnest enthusiasm.
Norman Pascoe

Since failures are caused by people, why allocate failure rates to parts?
• Failures are primarily caused by errors made by design and production personnel
• Failures are due to human nature and the complexity of engineering
Success depends on an awareness of all possible failure modes, and whenever a designer is either ignorant of, or uninterested in, or disinclined to think in terms of failure, he can inadvertently invite it.
Ivars Peterson, Vintage Books, 1996

Many parts do not have a property such as “failure rate”
• Many electronic part failures are caused by mechanical failure mechanisms (environment)
  • Vibration: inferior mechanical design (e.g. natural frequency)
  • Temperature: inferior thermal design (e.g. exceeding thermal envelope)

Parts with “failure rates” may have insignificant failure rates during their useful life
• Many products are replaced due to technical obsolescence
• Datasheet failure rates (e.g. http://www.ti.com)
  • MTBF?
  • 10.16 FIT = 10.16 × 10^-9 failures per hour
  • MTTF = 9.84 × 10^7 hours ≈ 11,235 years

Failures may be caused by software
• How do you predict software reliability?
• Methods based on number of faults found during testing?
• Most prediction methods conveniently ignore software reliability
• Most modern products contain one (or many) microcontrollers
• Interaction between hardware and software may be highlighted during accelerated testing (e.g. HALT)

The failure rate of a system is not the sum of the failure rates of its parts
• Series configuration model is invalid (e.g. pull-up vs. filter resistor)
• Interaction of parts often fails without any individual part failure (e.g. timing, parameter drift)
• Integration of parts often fails without any individual part failure (e.g. quality of production / assembly)

Not all part failures have “constant failure rates”
• Exponential distribution may be invalid
• What is MTBF?
  • Expected life?
  • Mean value of a distribution?
  • Mean value of which distribution?
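
The datasheet arithmetic and the question "mean value of which distribution?" can be made concrete with a short sketch. It reproduces the FIT conversion quoted above and then compares the probability of surviving to the mean life under an exponential distribution and under Weibull distributions. The function names and the Weibull shape values are assumptions chosen for illustration only.

```python
import math

HOURS_PER_YEAR = 8760.0  # 365 days x 24 hours; leap years ignored


def fit_to_mttf_hours(fit):
    """Convert FIT (failures per 1e9 device-hours) to MTTF in hours.

    Only meaningful if the failure rate is constant over time.
    """
    return 1e9 / fit


def weibull_survival_at_mean(beta):
    """Probability of surviving to the mean life for a Weibull shape parameter beta.

    Mean life is eta * Gamma(1 + 1/beta), so the scale parameter eta cancels out.
    beta = 1 is the exponential (constant failure rate) case.
    """
    return math.exp(-(math.gamma(1.0 + 1.0 / beta) ** beta))


if __name__ == "__main__":
    mttf = fit_to_mttf_hours(10.16)
    print(f"10.16 FIT -> MTTF = {mttf:.3g} hours = {mttf / HOURS_PER_YEAR:,.0f} years")
    print(f"2,204,750 hours = {2_204_750 / HOURS_PER_YEAR:.1f} years")

    # Hypothetical shape values: 0.5 (decreasing rate), 1.0 (constant), 3.0 (wear-out).
    for beta in (0.5, 1.0, 3.0):
        share = weibull_survival_at_mean(beta)
        print(f"beta = {beta}: {share:.1%} of units survive to the mean life")
```

Under the constant failure rate assumption only about 37% of units survive to the MTBF, and the fraction changes with the shape of the distribution, so a single mean value says little about when failures occur.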
(Source cited on the slide: Reliability Edge, Volume 11, Issue 1, ReliaSoft Corporation)

Accelerated testing will accelerate different failure mechanisms differently
• How do you do an accelerated life test on the product level?
• Subject product to step-stress test (e.g. temperature)?
• What failure mechanisms do you accelerate?
• Probably only those failure mechanisms most sensitive to specific stress condition (i.e. activation energy)?
• Do you actually measure activation energy, or do you assume a value?
• Selected model (e.g. Arrhenius or “Failure rate – temperature relationship”) may be invalid for solid-state electronics
• Life of individual parts is accelerated at different rates, yet we present results as if every part has been aged during test
• Accelerated life testing is very useful for relative comparisons between technologies, parts, etc.

Reliability prediction results are frequently unrelated to real-life observations

ANSI/VITA 51.1, American National Standard for Reliability Prediction Mil-Hdbk-217 Subsidiary Specification, June 2008
“Manufacturers and electronic reliability engineers use different methods to adjust the models in MIL-HDBK-217F Notice 2 for newer technologies, use different defaults for unknown stress conditions, and make differing assumptions of quality and complexity factors for COTS items. These differing methods yield results that are not comparable. This specification is intended to provide a standard method for reliability engineers to perform failure rate predictions for COTS items used in military or high reliability applications.”
• Use Pi Q = 1 (and not 10) for commercial integrated circuits
• Use voltage ratio = 0.5 as standard default for semiconductors
  • “This is considered an average setting for the voltage ratio.”

ANSI/VITA 51.1, Reliability Prediction MIL-HDBK-217 Subsidiary Specification, June 2008
• This specification provides standard defaults and methods to adjust the models in MIL-HDBK-217F Notice 2.
• This is not a revision of MIL-HDBK-217F Notice 2 but a standardization of the inputs to the MIL-HDBK-217F Notice 2 calculations to give more consistent results.
ANSI/VITA 51.2, Physics of Failure Reliability Predictions, 2011
• It includes a discussion of the philosophy, context for use, definitions, models for key failure mechanisms, definition of the input data required, default values if technically feasible or the typical range of values as a guideline.
• It defines how modeling results are interpreted and used.
• It requires the documentation of modeling inputs, assumptions made during the analysis, modifications to the models and rationale for the analysis.
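
To illustrate how much a single default matters, the sketch below runs a simplified parts-count calculation (sum the part failure rates, then MTBF = 1 / Σ λi) twice, once with Pi Q = 1 and once with Pi Q = 10 for the integrated circuits. The bill of materials, the base failure rates and the reduction of the handbook model to a single multiplicative quality factor are all assumptions made for illustration; this is not the MIL-HDBK-217F or ANSI/VITA 51.1 procedure itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    base_fit: float  # hypothetical base failure rate in FIT (failures per 1e9 hours)
    quantity: int
    is_integrated_circuit: bool


def predicted_mtbf_hours(parts, pi_q_for_ics):
    """Parts-count style prediction: sum the part failure rates, then invert.

    The quality factor is applied to integrated circuits only (a simplification).
    """
    total_fit = 0.0
    for part in parts:
        factor = pi_q_for_ics if part.is_integrated_circuit else 1.0
        total_fit += part.base_fit * factor * part.quantity
    return 1e9 / total_fit


# Hypothetical bill of materials, not taken from any real product.
EXAMPLE_PARTS = [
    Part("integrated circuit", base_fit=20.0, quantity=10, is_integrated_circuit=True),
    Part("resistor", base_fit=1.0, quantity=100, is_integrated_circuit=False),
]

if __name__ == "__main__":
    for pi_q in (1.0, 10.0):
        mtbf = predicted_mtbf_hours(EXAMPLE_PARTS, pi_q)
        print(f"Pi Q = {pi_q:>4}: predicted MTBF = {mtbf:,.0f} hours")
```

For this made-up board the two predictions differ by a factor of seven although nothing about the hardware has changed, which is the kind of non-comparable result the quoted specification describes.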

Many other company proprietary databases
• Use field correction factors
  • Assume only 20% of Mil-Hdbk-217F for FETs
• Modify quality levels
  • Assume high mil-spec quality levels for lower quality parts
It does not make any difference how smart you are, who made the guess, or what his name is – if it disagrees with real-life results, it is wrong. That is all there is to it.
Dr. Richard Feynman, Nobel Prize-winning physicist

When can reliability prediction be used?

Practical prototype test
“Failure rate measurement and prediction”
• System or product step-stress accelerated test
• Determine time-to-failure distribution
  • Needs a sample of test units
• Determine acceleration factor
  • Typically needs three samples tested at different stress levels
http://quanterion.com

Accelerated Testing: The Only Game in Town
There is the old joke about the gambler who was told that the game he was in was crooked. His reply was, “I know it’s crooked, but it’s the only game in town.” Many of the justifications for certain kinds of accelerated testing remind me of that joke. There are several forms of accelerated testing, but they all try (by definition) to get results when results are not available with ordinary use conditions. Now, there is nothing wrong with accelerated testing per se. We all do it all the time, and it serves a useful qualitative purpose. But fools (among others) often try to extrapolate quantitatively the accelerated results to ordinary use conditions. Accelerated tests can help us find failure-modes or failure-resistances that ought to be explored to see if they might occur in ordinary use. But beware of those who justify their procedures by something equivalent to “It’s the only game in town.”
Ralph Evans, IEEE Transactions on Reliability, Vol. R-26, No. 4, October 1977

Physics of failure analysis
“Failure mechanism knowledge and prediction”
• Physics of failure approach developed from research to understand fundamental failure mechanisms (i.e. not failure modes)
• Detailed root cause analysis of field or test failure
• Knowledge gained from physics of failure approach is being used proactively to prevent similar failures in new products
• Technology is moving from “part level” to “product level”
• Technology is moving from “physics of failure” to “reliability physics”
• Typical analyses:
  • Vibration
  • Shock
  • Thermal cycling
  • Solder joint fatigue

Physics of failure analysis: when can prediction be used?
• Only when failure mechanisms are known and understood (e.g. physics of failure, reliability physics)
• Only when product may fail due to cumulative damage (e.g. fatigue, wear-out)
• Only when we predict part reliability (and not system reliability)
• Not for infant mortality and “random” failures?
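
As an illustration of prediction for a cumulative-damage mechanism such as fatigue, the sketch below uses the Palmgren-Miner linear damage rule, a common textbook model. The presentation does not name this rule; it is swapped in here only as an example, and the cycle counts are hypothetical.

```python
def miner_damage(load_blocks):
    """Palmgren-Miner cumulative damage: the sum of n_i / N_i over all stress levels.

    load_blocks is an iterable of (cycles_applied, cycles_to_failure) pairs.
    A total of 1.0 or more is taken as predicted failure.
    """
    damage = 0.0
    for cycles_applied, cycles_to_failure in load_blocks:
        if cycles_to_failure <= 0:
            raise ValueError("cycles_to_failure must be positive")
        damage += cycles_applied / cycles_to_failure
    return damage


if __name__ == "__main__":
    # Hypothetical thermal-cycling history for one solder joint:
    # (cycles applied at a stress level, cycles to failure at that level).
    history = [(1000, 10000), (500, 2000), (10, 100)]
    used = miner_damage(history)
    print(f"Accumulated damage fraction: {used:.2f}")
    print(f"Remaining life fraction: {max(0.0, 1.0 - used):.2f}")
```

The calculation is only as good as the cycles-to-failure values, which have to come from an understood failure mechanism and measured data for the specific part. That matches the conditions listed above: a known mechanism, cumulative damage, and part-level rather than system-level prediction.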

Summary

The Wonderful One-Hoss-Shay
Oliver Wendell Holmes
[Figure label: 100 years]

Have you heard of the wonderful one-hoss-shay,
That was built in such a logical way
It ran a hundred years to a day,
And then, of a sudden, it--ah, but stay
I'll tell you what happened without delay,
Scaring the parson into fits,
Frightening people out of their wits,--
Have you ever heard of that, I say?

You see, of course, if you're not a dunce,
How it went to pieces all at once,--
All at once, and nothing first,--
Just as bubbles do when they burst.

End of the wonderful one-hoss-shay.
Logic is logic. That's all I say.
Oliver Wendell Holmes, 1858

"The Wonderful One-Hoss Shay" is a perfectly intelligible conception, whatever material difficulties it presents. It is conceivable that a being of an order superior to humanity should so understand the conditions of matter that he could construct a machine which should go to pieces, if not into its constituent atoms, at a given moment of the future. The mind may take a certain pleasure in this picture of the impossible.
Oliver Wendell Holmes

Perform reliability prediction based on “published failure data”
• Worst method
• Prediction based on data unrelated to your product
• Exercise in futility
Perform reliability prediction based on “practical prototype test”
• Better method
• Prediction based on (limited) evidence of actual product reliability
• Careful of assumptions and conclusions
• “Only game in town”
Perform reliability prediction based on “physics of failure analysis”
• Best method
• Prediction based on engineering knowledge of failure mechanisms
• Technology maturing into practical methods
• “Reliability physics”

(Quantification) of reliability is in effect a distraction to the goals of reliability.
(e-mail from) Ted Kalal

Where to get more information
• Patrick O’Connor and Andre Kleyner, Practical Reliability Engineering, 5th edition, John Wiley, 2012
• Accelerated Testing: www.ReliaSoft.com, www.weibull.com
• Physics of Failure: Center for Advanced Life Cycle Engineering, University of Maryland, www.calce.umd.edu

Albertyn Barnard
• M.Eng. (Electronics), M.Eng. (Engineering Management)
• Lambda Consulting, PO Box 11826, Hatfield 0028, South Africa
• Consulting services in reliability engineering
• Commercial HALT facility in Pretoria, South Africa
• Part-time lecturer at Graduate School of Technology Management, University of Pretoria, South Africa
• INCOSE South Africa President 2008
• Chair of INCOSE Reliability Engineering Working Group
• Mobile: +27 82 344 0345
• ab@lambdaconsulting.co.za
• www.lambdaconsulting.co.za

Questions
Thank you for your attention.
Do you have any questions?