Reliability Overview
Brad Beaird
Last revised 30 June 2014

Agenda- Reliability Overview
After completing this module, you will be able to:
• Understand the role of reliability in design
• Set system reliability goals
• Allocate reliability in a design to subsystems
• Conduct Weibull life analysis
• Construct/evaluate test plans

Definition
RELIABILITY is . . .
• Probability that a component or system will not fail (probability of survival)
• Under specified operating conditions
• Through a given point in time, R(t)

Reliability: It’s About PERFORMANCE over Time
• Reliability: The probability “R” that the item will perform its intended function
• Confidence: The chance “C” that the reliability will be as good as specified
• Time: At what point in time “t” do we need to specify operation?
• Tools: What analyses and tests allow us to make the prediction?
Quality over Time: R(t) at C% confidence
How will my design operate over time?
Reliability not modeled, predicted or verified in the development process is left to the customer to determine!!

Reliability versus Durability
Key Takeaways:
• Reliability testing without testing to failure provides little benefit
• Durability measures how long a product will last until it cannot be repaired. Reliability measures intermittent interruptions during this usage period.
• We can estimate durability from a reliability test but not the other way around
• We should test under conditions similar to the customers’ environment
• The customers’ experience is based primarily on reliability
• Reliability tests are shorter and more efficient than durability tests

Reliability Concepts
Typical “Quality Over Time” follows a Bathtub Curve
[Figure: bathtub curve, Failures versus Time, with three regions: Infant mortality, Useful Life, Wearout]
• Infant mortality: Initially, failures are due to problems in workmanship or poor quality control. Reduced through burn-in testing, quality control, error-proofing.
• Useful life: Then, most systems reach a constant failure rate; failures are caused by environment and chance events. Reduced by design and redundancy.
• Wearout: Finally, systems wear out; failures are caused by fatigue, corrosion, aging. Reduced by derating, preventive maintenance (PM), parts replacement, design technology.

Quiz- Reliability Concepts
1. Define reliability
2. In the infant mortality phase of the bathtub curve, the failure rate is: Increasing / Decreasing / Constant
3. In the wear-out phase of the bathtub curve, the MTBF is: Increasing / Decreasing / Constant

Reliability Planning, Goals & Growth
Reliability Planning & Goals
• No goal: worst case
• “As good as current”: also pretty lousy
• MTBF > 1000 hours: a quantified goal for the overall system
• MTBF > 1000 hours, with 90% confidence: even better, speaks to sample size
• B10 life (time at which 10% of the population will have failed) > 1000 hours with 95% confidence for a 90th percentile user: Yes! Very specific, measurable, stated in terms of customer usage

Reliability Goals & Growth – MTBF example
Given the following field trial data, estimate the MTBF:
Unit   Days   Comment
1      61     failed
2      35     failed
3      59     failed
4      8      failed
5      90     suspended
Exercise: Calculate the mean time between failures (MTBF).
The previous series of field tests revealed an MTBF of 50. Is there growth in this reliability parameter?
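The MTBF exercise can be checked with a few lines of Python. This is a minimal sketch that assumes the usual point estimate for data with suspensions, total accumulated time on all units divided by the number of failures; the slides do not name the estimator, and the names `trial` and `mtbf_estimate` are illustrative only.

```python
# Field trial data from the exercise: (days on test, failed?)
# Unit 5 was suspended: its 90 days count as exposure, but not as a failure.
trial = [(61, True), (35, True), (59, True), (8, True), (90, False)]

def mtbf_estimate(records):
    """Total accumulated time divided by the number of observed failures (assumed estimator)."""
    total_time = sum(days for days, _ in records)
    failures = sum(1 for _, failed in records if failed)
    if failures == 0:
        raise ValueError("no failures observed; the point estimate is undefined")
    return total_time / failures

mtbf = mtbf_estimate(trial)
print(f"MTBF estimate: {mtbf:.2f} days")
```

Under that assumption the estimate is 253 / 4 = 63.25 days, which is above the previous figure of 50 if that figure was also in days.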
Reliability Goals & Growth
Given the following program test data, calculate and plot the cumulative mean time between failures:
Test Hours   Repairs/Failures   Cumulative MTBF (hours per failure)
0-100        12                 100/12 = 8.3
101-200      7                  200/19 = 10.5
201-300      4                  300/23 = 13.0
301-400      3                  400/26 = 15.4
401-500      3                  500/29 = 17.2
Program MTBF goal = 22
• Called a “Duane” model; it can be plotted on a log-log scale to straighten out the line. The slope of that line is the growth parameter.
• Alternatively, we could have plotted the reciprocal, the failure rate (e.g., failures per 100 hrs)
• Will we achieve the goal by the end of the program after 800 hours of testing?

Reliability Allocation & Modeling
Reliability Allocation-Example
• System Level: Car Engine, needs R = 0.90 at 1000 hrs (i.e., B10 > 1000 hrs). The system goal could also have been an MTBF figure.
• Subsystem Level: Engine Block subsystem R = 0.925; Fuel & Air subsystem R = 0.973
• Component Level: Connecting Rod component R = 0.999; Fuel Injector component R = 0.995
Reliability Allocation is about cascading down a System goal into subsystems & components.
Q: Why do numbers get bigger at lower levels of the model?

Reliability Block Diagrams, Series
[Block diagram: R1 = 0.95, R2 = 0.97, R3 = 0.99 in series]
What is the system reliability?
Reliability of System = 0.95 x 0.97 x 0.99 = 0.91
We use Reliability Block Diagrams to model our system from the bottom up using estimates on components and subsystems.

Reliability Block Diagrams, parallel
[Block diagram: R1 = 0.75 and R2 = 0.75 in parallel, followed in series by R3 = 0.99]
We can design using redundant, relatively low reliability components in parallel to achieve overall system reliability goals.
What is the system reliability?
Hint: Figure out the parallel subsystem reliability, then multiply in series with component 3.
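The series and parallel rules can be sketched as two small helpers. The function names `series` and `parallel` are illustrative, and the sketch assumes the blocks fail independently, which is the assumption behind multiplying reliabilities.

```python
from math import prod

def series(*reliabilities):
    """All blocks must survive, so the reliabilities multiply."""
    return prod(reliabilities)

def parallel(*reliabilities):
    """At least one block must survive: 1 minus the chance that every block fails."""
    return 1 - prod(1 - r for r in reliabilities)

# Series slide: three blocks in a row.
series_example = series(0.95, 0.97, 0.99)

# Parallel slide: two redundant blocks, then one block in series.
parallel_example = series(parallel(0.75, 0.75), 0.99)

print(f"series:   {series_example:.3f}")
print(f"parallel: {parallel_example:.3f}")
```

Because the helpers nest, any diagram built only from series and parallel groups can be written as one expression, working from the innermost group outward.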
Reliability Block Diagrams, Parallel (redundancy)
[Block diagram: R1 = 0.75 and R2 = 0.75 in parallel, followed in series by R3 = 0.99]
The trick for this subsystem reliability is:
Probability (subsystem survives) = Probability (1 or more survives) = 1 – Probability (R1 and R2 both fail) = $1 - 0.25^2 = 0.9375$
$R_S = [1 - (1 - R_1)(1 - R_2)] \times R_3 = [1 - (0.25)^2] \times 0.99 = 0.9375 \times 0.99 = 0.928$

Reliability Block Diagrams- Exercise
[Block diagram with blocks R = 0.90, R = 0.90, R = 0.90, R = 0.85, R = 0.95, R = 0.92]
Calculate the reliability of this system

Reliability Allocation Exercise
[Block diagram with blocks R = 0.90, R = 0.90, R = 0.90, R = ?, R = 0.95, R = ?]
Q: If the overall system reliability goal is 0.98, what should the reliability be for the two redundant components in the far right subsystem? Assume both components have the same reliability.

Weibull Analysis
History on Weibull
• Waloddi Weibull- Swedish Engineer
• Famous for pioneering work on reliability and life analysis
• The Weibull distribution is named after him, and is a popular tool for modeling lifetimes

Function                            Definition
Probability Density Function        $f(t)$
Cumulative Distribution Function    $F(t) = \int_0^t f(\tau)\,d\tau$
Reliability or Survival Function    $R(t) = 1 - F(t)$
Hazard Function                     $h(t) = f(t) / R(t)$

More on The Weibull Distribution
• Well suited for modeling lifetime data: components and systems
• Weibull Reliability equation: $R(t) = e^{-(t/\eta)^{\beta}}$
• Parameters:
  – Slope of the line (shape, β)
  – Characteristic life (measures dispersion of data, η), B63.2 life
  – Optional, guaranteed life (time before anything will fail, γ); when it is used, $t$ is replaced by $t - \gamma$
• Mimics many distribution shapes (skewed left, skewed right, symmetric)
• Characterizes the failure distribution so we can make predictions
• Tells us where we are in the bathtub curve so we can fix problems
• β<1 means infant mortality, β=1 means useful life, β>1 means wear-out

Weibull Reliability Practice Calcs
Weibull Reliability equation: $R(t) = e^{-(t/\eta)^{\beta}}$
Time, t   Slope, β   Characteristic life, η   Reliability, R
50        2.0        100                      ?
50        3.0        100                      ?
100       2.0        100                      ?
100       3.0        100                      ?
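The practice table can be filled in with a short function. This is a sketch assuming the standard Weibull reliability form $R(t) = e^{-((t-\gamma)/\eta)^{\beta}}$, with γ defaulting to zero; the symbols η and γ and the function name `weibull_reliability` are assumptions, since the slide text does not preserve them.

```python
import math

def weibull_reliability(t, beta, eta, gamma=0.0):
    """R(t) = exp(-((t - gamma) / eta) ** beta); nothing fails before the guaranteed life gamma."""
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / eta) ** beta))

# Rows of the practice table: (time t, slope beta, characteristic life eta)
practice_rows = [(50, 2.0, 100), (50, 3.0, 100), (100, 2.0, 100), (100, 3.0, 100)]
for t, beta, eta in practice_rows:
    r = weibull_reliability(t, beta, eta)
    print(f"t={t:>3}  beta={beta}  eta={eta}  R={r:.3f}")
```

The last two rows are worth a look: at t = η the exponent is -1 whatever the slope, so R = e^-1 ≈ 0.368 and 63.2% of the population has failed, which is why the characteristic life is also the B63.2 life.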
Example Weibull analysis
Failure Number   Time (Hours)   Median Rank Value
1                2.8            0.07
2                4.3            0.18
3                5.2            0.29
4                5.7            0.39
5                8.2            0.50
6                8.7            0.61
7                9.8            0.71
8                12.3           0.82
9                18.5           0.93
B10: point at which 10% of the population is predicted to have failed
[Figure: Weibull plot of the data above on a log time axis (1 to 50 hours); plot annotation: 3 hours]

Information We Can Get From Completed Plot
We can compare two products or processes
Example- compare components from 2 suppliers
Q: Which one is “better” (hint- it’s an open-ended question)
[Figure: Weibull plot comparing the two suppliers; log time axis, 1 to 50]

Information We Can Get From Completed Plot
We can show objective evidence of reliability GROWTH
On the Weibull plot, “growth” means pushing the plotted line to the right and flattening it
[Figure: Weibull plot with lines labeled DV1 and DV2; log time axis, 1 to 50]

Weibull Analysis Exercise
• Statement on GE light bulb package: Lifetime 1000 hrs (what does this mean?)
Let’s do Weibull Analysis by hand to understand the technique
• Data (time to fail)
  – 450 hrs
  – 2100 hrs
  – 1200 hrs
  – 805 hrs
1. Gather data & sort in ascending order
2. Get median rank values from table
3. Plot ordered pairs (time, median rank)
4. Fit a straight line
5. Estimate distribution parameters
  • What is the slope?
  • What is the characteristic life?
  • What is the B10 life? B50 life?
  • What part of the bathtub curve are we in?
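The five hand steps can be mirrored in code as a cross-check. This sketch makes two substitutions that are not in the slides: Benard's approximation, (i - 0.3) / (n + 0.4), stands in for the median rank table (it matches the nine-point table in the earlier example to two decimals), and an ordinary least-squares line through (ln t, ln(-ln(1 - F))) stands in for the line drawn by eye on Weibull paper. All function names are illustrative.

```python
import math

def median_ranks(n):
    """Benard's approximation to the median rank of the i-th of n ordered failures."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def weibull_fit(times):
    """Least-squares line through (ln t, ln(-ln(1 - F))); returns (beta, eta)."""
    ordered = sorted(times)                        # step 1: sort ascending
    ranks = median_ranks(len(ordered))             # step 2: median ranks
    xs = [math.log(t) for t in ordered]            # step 3: Weibull-paper coordinates
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                               # step 4: slope of the fitted line
    eta = math.exp(x_bar - y_bar / beta)           # where the line crosses F = 63.2%
    return beta, eta

def b_life(percent_failed, beta, eta):
    """Time by which the given percentage of the population is predicted to fail."""
    return eta * (-math.log(1.0 - percent_failed / 100.0)) ** (1.0 / beta)

bulb_hours = [450, 2100, 1200, 805]
beta, eta = weibull_fit(bulb_hours)                # step 5: distribution parameters
print(f"beta={beta:.2f}  eta={eta:.0f} hrs")
print(f"B10={b_life(10, beta, eta):.0f} hrs  B50={b_life(50, beta, eta):.0f} hrs")
```

With only four points the fit is rough, and a line drawn by hand will give somewhat different numbers. Under these assumptions the slope comes out above 1, which points to the wear-out region of the bathtub curve, and the B50 life is on the order of 1000 hours.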
Looks like the GE figure of 1000 hours is a median (B50) life

Median Rank Table
[Table: median rank values]

Examples
Previous MTBF example
Given the following field trial data, estimate the MTBF:
Unit   Days   Comment
1      61     failed
2      35     failed
3      59     failed
4      8      failed
5      90     suspended
• Weibull gives much more info than MTBF alone
• We can characterize the entire life distribution
• When we have suspended data (like unit #5 above), software is usually used to do the Weibull analysis

Accelerated Life Testing Example
We can mimic the life of our product in the field via testing
Problem: we only have a short time to test for a long life
Solution: accelerate the testing via higher stresses
• Stress in this example is temperature
• We ran tests to failure at 4 higher temperatures and did Weibull analysis
• We then predicted the life at a normal temperature of 80 degrees

System Maintenance Applications
Used to determine optimal parts replacement strategies
The approach
1. Collect time-to-fail data, construct a Weibull plot, and verify you are in the wearout phase of the bathtub curve.
2. The optimal time to replace a part is based on the Weibull parameters and the ratio of the cost of unplanned maintenance to the cost of planned maintenance.
3. Overall goal is to reduce total cost of downtime and improve system availability. Availability = MTBF / (MTBF + MTTR).
4. To improve system availability, you either increase the mean time between failures (MTBF), decrease the mean time to repair (MTTR), or both.

System Availability, Exercise
• If MTBF = 100 hours and MTTR = 24 hours, A = ?
• If MTBF = 200 hours and MTTR = 24 hours, A = ?
• If MTTR = 24 hours and Availability goal = 0.97 (97%), MTBF = ?

Reliability Demonstration Testing
Zero Failure Acceptance Testing
• Calculate $n = \frac{\ln(1 - \text{confidence})}{\ln(R)}$
Acceptance testing (versus testing to failure) is sometimes necessary, though we do not learn as much.
Exercise: The requirement for a component states that the supplier must demonstrate at least 95% reliability with 90% confidence (R95 C90).
How many units should the supplier test with no failures in order to pass?

Exercise- Demonstration Testing
Customer acceptance criteria require us to demonstrate a B10 life > 500 hours, with 95% confidence. Past testing shows the Weibull slope is 1.6. There are 1000 hours available in the test lab. How many units must be tested, with zero failures allowed?
• β = 1.6
• Confidence = 95% (0.95)
• R = 0.90 (B10 corresponds to 90% reliability)
• Target time = 500 hrs, Actual time = 1000 hrs; k is the actual test time as a multiple of the required time (here k = 1000/500 = 2)
$n = \frac{1}{k^{\beta}} \times \frac{\ln(1 - \text{conf})}{\ln(R)}$

Agenda- Reliability Overview
Our learning objectives were:
• Understand the role of reliability in design
• Set system reliability goals
• Allocate reliability in a design to subsystems
• Conduct Weibull life analysis
• Construct/evaluate test plans
Do you understand? Questions?
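For the two demonstration-testing exercises, the zero-failure sample size formulas can be sketched in one function. The name `zero_failure_sample_size` is illustrative, and rounding up to a whole number of units is an assumption added here; the slides give only the formulas. Setting k = 1 recovers the basic formula, since the $1/k^{\beta}$ factor becomes 1.

```python
import math

def zero_failure_sample_size(reliability, confidence, beta=1.0, k=1.0):
    """n = (1 / k**beta) * ln(1 - C) / ln(R), rounded up to whole units (rounding is an assumption)."""
    n = math.log(1.0 - confidence) / math.log(reliability) / (k ** beta)
    return math.ceil(n)

# R95 C90 exercise: test to the required time, so k = 1.
n_basic = zero_failure_sample_size(reliability=0.95, confidence=0.90)

# B10 > 500 hrs at 95% confidence, slope 1.6, each unit run for 1000 hrs (k = 2).
n_extended = zero_failure_sample_size(reliability=0.90, confidence=0.95, beta=1.6, k=1000 / 500)

print(n_basic, n_extended)
```

Running each unit for twice the required time cuts the sample size by a factor of 2^1.6, roughly 3, which is the trade between test duration and number of units that the second exercise is built around.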