Reliability
October 26, 2004

Today
• DFDC (Design for a Developing Country)
• HW November 2
  – Detailed design
  – Parts list
  – Trade-off
• Midterm November 4
• Factory visit November 16

Midterm
• Presentation purpose: a midcourse correction
  – Less than 15 minutes, with 5 minutes of discussion
  – Approx. 7 PowerPoint slides; all should participate in the presentation
  – Show what you have done
  – Show what you are going to do
  – Discuss issues, barriers, and plans for overcoming them (procedural, team, subject matter, etc.)
  – Scored on originality, candor, thoughtfulness, etc., not on total amount accomplished
  – Schedule today from 1:00 to 4:00 (speaker at 4:00 PM)

Reliability
The probability that no (system) failure will occur in a given time interval.
A reliable system is one that meets the specifications.
Do you accept this?

What Do Reliability Engineers Do?
• Implement reliability engineering programs across all functions
  – Engineering
  – Research
  – Manufacturing
  – Testing
  – Packaging
  – Field service

Reliability as a Process Module
INPUT → Reliability Assurance Module → Product Assurance
Input
• Reliability goals
• Schedule time
• Budget dollars
• Test units
• Design data
Internal methods
• Design rules
• Components testing
• Subsystem testing
• Architectural strategy
• Life testing
• Prototype testing
• Field testing
• Reliability predictions (models)

Early Product Failure
• Strongest effect on customer satisfaction
  – A field day for competitors
• The most expensive to repair
  – Why?
  – Rings through the entire production system
  – High volume
  – Long C/T (cycle time)
• Examples from GE (but the problem is not confined to GE!)
  – GE variable power module for house air conditioning
  – GE refrigerators
  – GE cellular

Early Product Failure
• Can be catastrophic for human life
  – Challenger, Columbia
  – Titanic
  – DC 10
  – Auto design
  – Aircraft engine
  – Military equipment

Reliability as a Function of System Complexity
Why computers made of tubes (or discrete transistors) cannot be made to work
(entries are system reliability in %)

# of components    Component reliability    Component reliability
in series          = 99.999%                = 99.99%
100                99.9                     99.01
250                99.75                    97.53
500                99.50                    95.12
1000               99.01                    90.48
10,000             90.48                    36.79
100,000            36.79                    0.01

Three Classifications of Reliability
Failure type                        Old remedy (repair mentality)
• Early (infant mortality)          • Burn-in
• Wearout (physical degradation)    • Maintenance
• Chance (overstress)               • In-service testing

Bathtub Curve
[Figure: bathtub curve. Y axis: failure rate (#/million hours); X axis: time. Regions: infant mortality; useful life (no memory, no improvement, no wear-out, random causes); wear out.]

Reliability
[Figure: probability of dying in the next year (deaths/1000) versus age. Bar labels as extracted, without their age bins: 86, 70, 50, 30, 19, 16, 12, 5, 2, 0, 0. From the Statistical Bulletin 79, no. 1, Jan-Mar 1998.]

Early Failure Causes or Infant Mortality
(Occur at the beginning of life and then disappear)
• Manufacturing escapes
  – Workmanship/handling
  – Process control
  – Materials
  – Contamination
• Improper installation

Chance Failures
(Occur throughout the life of a product at a constant rate)
• Insufficient safety factors in design
• Higher than expected random loads
• Human errors
• Misapplication
• Developing world concerns

Wear-out
(Occur late in life and increase with age)
• Aging
• Degradation in strength
• Materials fatigue
• Creep
• Corrosion
• Poor maintenance
• Developing world concerns

Failure Types
• Catastrophic
• Degradation
• Drift
• Intermittent

Failure Effects
(What the customer experiences)
• Noise
• Erratic operation
• Inoperability
• Instability
• Intermittent operation
• Impaired control
• Impaired operation
• Roughness
• Excessive effort requirements
• Unpleasant or unusual odor
• Poor appearance

Failure Modes
• Cracking
• Deformation
• Wear
• Corrosion
• Loosening
• Leaking
• Sticking
• Electrical shorts
• Electrical opens
• Oxidation
• Vibration
• Fracturing

Reliability Remedies
• Early: quality manufacture / robust design
• Wearout: physically based models, preventive maintenance, robust design (FMEA)
• Chance: tight customer linkages, testing, HAST

Reliability Semi-empirical Formulae
• Early failure: [pdf not recoverable from the extraction; the surviving fragments show constants $k$, $k_1$, $k_2$ and an exponential term in $T$]
• Chance failure: $f(T) = \lambda e^{-\lambda T} = \frac{1}{m} e^{-T/m}$ (pdf), where $\lambda$ = constant failure rate and $m$ = MTBF
• Wear out: $f(T) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(T-M)^2 / 2\sigma^2}$

Failures vs. Time as a Function of Stress
[Figure: failures versus time, with separate curves for high stress, medium stress, and low stress.]

Highly Accelerated Stress Testing
• Test to failure
• Fix failed component
• Continue to test
• Appropriate for developing world?

Duane Plot
[Figure: Duane plot (Reinertsen, p. 237). Y axis: log failures per 100 hours; X axis: log cumulative operating hours. Shows the actual reliability data points, the predicted trend, and the required reliability at introduction.]

Integration into the Product Development Process
FMEA: Failure Modes and Effects Analysis
[Figure: FMEA process flow. Box labels as extracted:]
• Customer requirements
• Baseline data from previous products
• Brainstorm potential failures
• Develop failure compensation provisions
• Probabilities developed through analysis
• Summarize results (FMEA)
• Feed results to risk assessment process
• Use at design reviews
• Test activity uncovers new failure modes
• Failure prob. through test/field data
• Update FMEA

Risk Assessment Process
Assess risk
• Program risk
• Market risk
• Technology risk
  – Reliability risk
• Systems integration risk
Devise mitigation strategy
Re-assess

Fault Tree Analysis
[Figure: fault tree with top event "Seal regulator valve fails". Node labels as extracted, with the diagram's reference numbers where they can be matched:]
• Valve fails open when commanded closed (1, continued on next page)
• Regulates high (2)
• Regulates low (3)
• Fails closed when commanded open (4)
• Excessive hysteresis (5)
• Excessive leakage
  – Excessive port leakage (6)
  – Excessive case leakage (7)
• Fails to meet response time (appears three times; two instances are numbered 8 and 9)

Fault Tree Analysis (cont.)
[Figure: continuation of fault tree branch 1, top event "Valve fails open when commanded closed". The hierarchy is not recoverable from the extraction. Node labels:]
• Electrical failure of solenoid
• Mechanical failure of solenoid
• Open circuit
• Solder joint failure
• Coil short
• Insulation
• Wire broken (appears twice)
• Corrosion
• Transient electromechanical force
• Armature
• Contamination
• Seals
• Wear
• Material selection (appears twice)
• Valve orientation
• Insufficient filtering

FMEA

FMEA Root Cause Analysis

Fault Tree Analysis Example
Example: a solar-cell-driven LED

Reliability Management
• Redundancy
  – Examples
    • Computers
    • Memory chips?
    • Aircraft
  – What are the problems with this approach?
    • 1. Design inelegance
      – Expensive
      – Heavy
      – Slow
      – Complex
    • 2. Suboptimization
      – Can take the eye off the ball of improving component and system reliability by reducing defects
  – Where should the redundancy be allocated?
    • System
    • Subsystem
    • Board
    • Chip
    • Device
    • Software module
    • Operation

Other “Best Practices”
• Fewer components
• Small batch size (why?)
• Better material selection
• Parallel testing
• Starting earlier
• Module-to-system test allocation
• Predictive (Duane) testing
• Look for past experience
  – Emphasize re-use
• Over-design
  – e.g., power modules
• Best: understand the physics of the failure and model it
  – e.g., crack propagation in airframes or nuclear reactors

Other suggestions?
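
Worked example (not part of the original slides)
The system-complexity table, the constant-failure-rate (chance failure) formula, and the redundancy discussion can be checked numerically. The sketch below assumes that component failures are statistically independent, that a series system works only if every component works, and that a redundant group works if at least one unit works. The function names are hypothetical and chosen only for illustration. Values printed for the series case may differ from the slide's table in the last digit because of rounding.

```python
import math


def series_reliability(component_reliability: float, n_components: int) -> float:
    """Reliability of n independent components in series (all must work)."""
    return component_reliability ** n_components


def parallel_reliability(unit_reliability: float, n_units: int) -> float:
    """Reliability of n independent redundant units (at least one must work)."""
    return 1.0 - (1.0 - unit_reliability) ** n_units


def exponential_reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving to time t under a constant failure rate 1/MTBF."""
    return math.exp(-t_hours / mtbf_hours)


# Series systems: reliability collapses as the component count grows.
for n in (100, 250, 500, 1000, 10_000, 100_000):
    high = 100 * series_reliability(0.99999, n)
    low = 100 * series_reliability(0.9999, n)
    print(f"{n:7d} components: {high:6.2f}%  {low:6.2f}%")

# Redundancy: duplicating a 90%-reliable unit raises group reliability.
print(f"two redundant units: {parallel_reliability(0.90, 2):.4f}")

# Chance failures: survival probability at t equal to the MTBF.
print(f"R(t = MTBF): {exponential_reliability(1000.0, 1000.0):.4f}")
```

The series function multiplies probabilities, which is why 10,000 parts at 99.99% each yield a system that works only about a third of the time. The parallel function shows the appeal of redundancy, but it ignores the cost, weight, and complexity penalties listed on the Reliability Management slide, and it assumes the redundant units never fail from a common cause.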