Systems Engineering Program
Systems Prognostic Health Management
EMIS 7305
March 28, 2006
Christopher Thompson
Senior Research Engineer
Lockheed Martin Missiles and Fire Control

Disclaimer: This briefing is unclassified and contains no proprietary information. Any views expressed by the author are his, and in no way represent those of Lockheed Martin Corporation.

Topic Outline
• Introduction
• Definitions
• The Goal of Prognostic Health Management
• PHM Stakeholders
• PHM Modeling
• Sensors
• Prognostics Analysis Tools
• Availability
• Examples

Introduction
Education
B.S. in Electrical Engineering, SMU (1997)
M.S. in Mechanical Engineering, SMU (2001)
- Focus: Fatigue and Fracture Mechanics
M.S. in Systems Engineering (one class remaining)
- Focus: Reliability, Statistical Analysis
Ph.D. in Applied Science (anticipated ~ 2008)
- Proposed Dissertation Title: Sensor Optimization for Systems Prognostic/Diagnostic Health Management in an Unmanned Ground Combat Vehicle

Introduction
Experience
Lockheed Martin Missiles and Fire Control, Dallas TX
Systems Engineer - Multifunction Utility/Logistics Equipment (MULE)
Reliability Engineer - Army Tactical Missile System (TACMS)
Lockheed Martin Aeronautics, Fort Worth TX
Vehicle Systems - Prognostic Health Management - F-35 Joint Strike Fighter
SMU School of Engineering - TA for Dr. Jerrell Stracener

Introduction
Future Combat Systems MULE Program

Introduction
Some keys to the successful fielding of the U.S. Army’s Future Combat Systems are:
• Reducing the Logistics footprint
• Increasing Availability
• Reducing total cost of ownership
• Implementing Performance Based Logistics
• Improvements in the ‘ilities’ (RAM-T)
– Reliability
– Availability
– Maintainability
– Testability
– Supportability

Some Definitions
Prognostics - Of or relating to prediction; a sign of a future happening; a portent.
Prognostics is the process of calculating and reporting an estimate of remaining useful life for a component, early enough to repair or replace it before failure occurs.

Some Definitions
Prognostic Health Management (PHM) – The implementation of an integrated software and hardware system which monitors the health, status and performance of a vehicle or system, tracks consumables (oil, batteries, ammunition, filters, fuel, coolant…) and configuration (software versions, part history…), and determines the remaining life of all safety-critical and performance-critical components, predicting failures before they occur and thereby enhancing logistics and maintenance activities. PHM consists of ‘on-board’ as well as ‘off-board’ components.

Some Definitions
Diagnostics - The identification of a fault or failure condition of an element, component, sub-system or system, combined with the deduction of the lowest measurable cause of that condition through confirmation, localization, and isolation.
• Confirmation is the process of validating that a failure/fault has occurred, filtering out false alarms, and assessing intermittent behavior.
• Localization is the process of restricting a failure to a subset of possible causes.
• Isolation is the process of identifying a specific cause of failure, down to the smallest possible ambiguity group.

Some Definitions
Fault – A condition that leaves an element unable to perform its required function at the desired level of performance, or able to perform it only in a degraded mode.
Failure – The inability of a component, system or sub-system to perform its intended function as designed. Failure may be the result of one or more faults.
Fault Tolerance – The design of a system so that it will continue to operate in a degraded or reduced level rather than failing completely, when some part of the system fails.
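As a rough illustration of the prognostics definition above (an estimate of remaining useful life, reported early enough to act on), the sketch below fits a straight line to a degradation measurement and extrapolates it to a failure threshold. The linear wear model, the function name `estimate_rul_hours`, the threshold and the sample readings are all assumptions made for illustration; none of them come from the briefing, and real components rarely degrade this cleanly.

```python
from typing import Optional, Sequence


def estimate_rul_hours(hours: Sequence[float],
                       wear: Sequence[float],
                       failure_threshold: float) -> Optional[float]:
    """Estimate remaining useful life by linear extrapolation.

    Fits wear = slope * hours + intercept by ordinary least squares and
    returns the operating hours left until the fitted line reaches
    failure_threshold. Returns None when no upward wear trend is seen.
    """
    n = len(hours)
    if n < 2 or n != len(wear):
        raise ValueError("need at least two paired samples")
    mean_h = sum(hours) / n
    mean_w = sum(wear) / n
    sxx = sum((h - mean_h) ** 2 for h in hours)
    if sxx == 0:
        raise ValueError("samples must span more than one time point")
    sxy = sum((h - mean_h) * (w - mean_w) for h, w in zip(hours, wear))
    slope = sxy / sxx
    intercept = mean_w - slope * mean_h
    if slope <= 0:
        return None  # no degradation trend to extrapolate
    hours_at_failure = (failure_threshold - intercept) / slope
    # Never report negative life if the threshold has already been crossed.
    return max(0.0, hours_at_failure - hours[-1])


# Hypothetical wear readings (mm) taken every 100 operating hours.
sample_hours = [0.0, 100.0, 200.0, 300.0]
sample_wear = [0.0, 1.0, 2.0, 3.0]
rul = estimate_rul_hours(sample_hours, sample_wear, failure_threshold=10.0)
print(f"Estimated remaining useful life: {rul:.0f} hours")
```

In practice the estimate would carry a confidence bound and would be refreshed as new readings arrive; a single point estimate like this one is only the simplest possible starting point.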
Some Definitions
Failure Cascade – The result when a failure occurs in a system of interconnected components in which the successful operation of each component depends on the successful operation of a preceding component. Consequently, one failure can trigger the failure of successive parts, and potentially amplify the result or impact. Redundancy and fault tolerant design can reduce the criticality or impact of the cascade, but not necessarily prevent a failure.

Some Definitions
Design Failures – These take place due to inherent errors or flaws in the system design.
Infant Mortality Failures - These cause newly manufactured systems to fail, and can generally be attributed to errors in the manufacturing process, or poor material quality control.
Random Failures - These can occur at any time during the entire life of a system. Electrical systems are more likely to fail in this manner.
Wear Out Failures - As a system ages, degradation will cause systems to fail. Mechanical systems are more likely to fail in this manner.

Some Definitions
One-To-One Redundancy - Each active component in a system has a redundant backup on standby. The active component is monitored at all times, and the standby component will activate if the primary component fails. Since the probability of both components failing at the same time is low, One-To-One Redundancy provides the highest level of availability, but at the considerable disadvantage of requiring double the size, weight, power and cost, while reducing reliability in the sense that there are more components which can fail.

Some Definitions
N + X Redundancy – N components are required to perform a function, but the system is configured with N + X components. When any of the N components fails, one of the X modules activates. The advantage lies in reduced size, weight, power and cost of the system, in the case where X is smaller than N. When multiple components fail, this scheme provides lower system availability than One-To-One Redundancy.
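The trade-off between the two redundancy schemes can be made concrete with a small calculation. The sketch below is a simplified model and not something taken from the briefing: it assumes identical components that fail independently, each with the same availability, perfect fault detection and switching, and spares that are as likely to be unavailable as active units. The function names and the numbers are hypothetical.

```python
from math import comb


def k_of_n_availability(k: int, n: int, a: float) -> float:
    """Probability that at least k of n independent components are up,
    each with availability a (binomial model)."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k, n + 1))


def one_to_one_availability(n_required: int, a: float) -> float:
    """Each required component has its own dedicated backup; a function is
    lost only if both members of a pair are down."""
    pair_up = 1 - (1 - a) ** 2
    return pair_up ** n_required


def n_plus_x_availability(n_required: int, x_spares: int, a: float) -> float:
    """N + X pooled components; any spare can replace any failed unit, so
    the system is up while at least n_required components are up."""
    return k_of_n_availability(n_required, n_required + x_spares, a)


N, A = 4, 0.95  # hypothetical: four required components, 95% available each
print(f"No redundancy ({N} parts):     {A ** N:.4f}")
print(f"N + 1         ({N + 1} parts):     {n_plus_x_availability(N, 1, A):.4f}")
print(f"One-to-one    ({2 * N} parts):     {one_to_one_availability(N, A):.4f}")
```

Under these assumptions the ordering matches the definitions above: one-to-one redundancy gives the highest availability at double the part count, while N + 1 recovers most of the benefit with a single extra component.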
Some Definitions
Load Sharing – Multiple components share a combined load. A higher level component manages load distribution, and monitors the health and status of the components. If one of the load sharing components fails, the load is redistributed among the others, allowing for graceful performance degradation. In this scheme, there is almost no extra cost. The main disadvantage is that, with multiple failures, system performance may degrade below an acceptable level.

The Ultimate Goal of Prognostics
The purpose of Prognostic Health Management is to repair systems before they fail, while maximizing useful life consumption, and to have the necessary parts, tools and maintainers waiting nearby to resolve the correct problem as quickly and efficiently as possible.

PHM Stakeholders
SYSTEMS ENGINEERING | SOFTWARE & SIMULATION | TEST ENGINEERING | MECHANICAL ENGINEERING | ELECTRICAL ENGINEERING | TRAINING & PROD. SUPP.
PHM Model Design | PHM Model Integration | Test Planning | Crack Growth Sensing | Sensor Implementation | Reliability/Failure Modes
Interface Management | Software Interfaces | Fault/Failure Criticality | Stress/Strain Sensing | Sensor Integration | Maintainability & Testability
Requirements Development | Fault/Failure Simulation | Fault/Failure Propagation | Corrosion Sensing | Data Management | Logistics & Sustainment
Sensor Optimization | Continuous BIT/PHM | Fault/Failure Simulation | Vibration Sensing | Data Architecture | Training
Platform Integration | Consumables Monitoring | CAIV/WAIV Analysis | Prognostic Trending | Acoustic Sensing | System Architecture | Thermal Sensing | Safety

Systems Engineering’s Role in PHM
• Requirements Development
• System Integration
• System Architecture
• Interface Management
• Risk Assessment
• Performance Measures: TPMs & KPPs
• System Modeling & Knowledge Integration
• Functional Decomposition

PHM Requirements
• The PHM system shall isolate X percent of all detected failures to a single component, with Y percent confidence.
• The PHM system shall predict X percent of expected failures for the next Y hours of operation.
• The PHM system shall predict all failures that can result in a Safety Critical Failure.
• The PHM system shall incorporate sensors to assess platform health, status and performance.
• The PHM system shall incorporate sensors to monitor platform consumables.
• The PHM system shall record and store all sensor data in onboard memory.

The ‘Ilities’ & Product Support
• Reliability
- FMECA: Failure Modes, Effects & Criticality Analysis
- FRACAS: Failure Reporting, Analysis & Corrective Action System
- Measures: MTBF, MTBSA, MTBEFF, MTBUMA
• Maintainability
- Maintenance Ratio
- Preventive Maintenance Checks
- Condition Based Maintenance
- Design for Maintainability
• Availability
- AO (operational), AI (inherent), AA (achieved)

The ‘Ilities’ & Product Support
• Testability
- Verification and Validation
- Fault Insertion
- Simulation
• Supportability
- Consumables Monitoring
- Supply Planning and Prediction
• System Safety
- Single & Multiple Fault Tolerant Design
- Safety Critical Failures
- Human/Machine Interaction

PHM Modeling
• eXpress Modeling Tool
• Model Based Reasoning
• Case Based Reasoning
• Knowledge Bases
• Prognostics Analysis Tools

eXpress Modeling Tool
[Diagram labels: DATA MINING; DIAGNOSTIC, PROGNOSTIC & PHM DESIGN; SENSOR FUSION; REQUIREMENTS ANALYSIS; CONOPS, SPECS & LOGISTICS; Run-Time Prognostic Health Management; Mission Assurance, Availability & Success; Performance Based Logistics; FRACAS & FMECA DEVELOPMENT; RISK ASSESSMENT; LIFE CYCLE TRADE SPACE; BUSINESS CASES]

Impact Technologies
Prognostics developed at Impact Technologies:
• Gas Turbine Engines and Auxiliary Systems
• Avionics PHM and Reasoning
• Aircraft Actuators (EMA, EHA)
• Switching Mode Power Supplies, GPS Receivers and Power Electronics
• Generators and Electric Drive Systems
• Bearings, Gears, Shafts, Drive Trains, and Clutches
• Hydraulic, Lube Oil and Fuel Systems
• Structures and Components
• Diesel Engines

Impact Technologies
Prognostics modules have
been developed and successfully tested on the following systems:
• Pratt & Whitney F-100 engine on F-15 and F-22
• Engine, generator, lubrication system and gearbox on Honeywell F124
• Oil wetted components on GE F110-129, GE F404, Rolls Royce F405
• CH-47 T-55 engine and drive-train
• CH-60 intermediate gearbox
• Blackhawk Carrier Plate Prognosis System
• JSF Clutch Wear and Lift-Fan Prognosis System
• Fuel system and Power generation system on DDG-class Navy Ships

Impact Technologies
A number of different techniques have been used in the development of these prognostics:
• Analytical and stochastic physics of failure models
• Advanced signal processing
• Feature extraction methods
• Health state estimation and prediction algorithms
• Statistical reliability
• Bayesian updating methods
• Component damage accumulation models
• Probabilistic remaining useful life estimation
• Data driven modeling techniques

Model Based Reasoning
Model Based Reasoning (MBR) is a qualitative scheme where a model of the system is combined with an inference engine that is able to accomplish fault detection and fault isolation. The qualitative model, or ‘Knowledge Base’, describes the elements and components, interconnections, and input/output behavior of the system being diagnosed, and establishes an envelope of ‘correct behavior’. To accomplish diagnosis, the actual behavior of the system is compared with the behavior predicted by the model to determine what differences exist. The inference engine, using this comparison information, accomplishes the fault isolation task.

Case Based Reasoning
Case Based Reasoning (CBR) is the process of solving problems based on past understanding of similar problems. The vast majority of this type of information resides with the maintainers and operators, in the experience and knowledge of the people using the system in question.
CBR retrieves a similar past case, forms an implicit generalization of that case, and then identifies commonalities between the retrieved case and the target problem.

Knowledge Bases
[Diagram: inputs feeding the KNOWLEDGE BASE. Labels: ‘inorganic’ sensor data; subsystem/LRU internal sensor data; BIT data; consumables monitors; sensor fusion and signal conditioning; ‘organic’ sensor data; off-board prognostic trend analysis; FMECA data; fault/failure propagation; system level interactions; functional interdependencies; physical interdependencies; design knowledge; prognostic trend analysis; CAD models; circuit layouts; Database Management: Data Mining & Feature Extraction; maintainer inputs]

Prognostic Analysis Tools
Learning Systems & Artificial Intelligence
• Genetic Algorithms
• Expert Systems
• Fuzzy Logic
• Neural Networks
Database Techniques
• Feature Extraction
• Data Mining
Mathematical Techniques
• Kalman Filtering
• Dempster-Shafer Method
• Wavelets
• Statistical Analysis
• Chaos Math?

Prognostic Analysis Tools
Traditional Academic Solutions to PHM:
• Run-to-Failure analysis of large, expensive systems, such as ship or rail engines
• Analysis involves impractical, complex math models that require years of training to understand and interpret
• Very expensive
• Time consuming process
• Rarely offer concrete design guidelines or solutions

Prognostic Analysis Tools
Why Engineers in Industry Need More:
• We have bottom lines and schedules to meet!
• We have customer requirements to satisfy!
• Systems Engineers work with designers who don’t like impractical, complex math models that require years of training to understand and interpret!
• We have program managers who don’t like very expensive, time consuming solutions!
• We like concrete design guidelines and solutions!
Sensor Technology
• BIT/BITE
• Sensor Fusion and Virtual Sensors
• Sensor Conditioning and Filtering
• Smart Sensors

Availability Analysis
• Availability, Achieved
\[ A_A = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \]
where
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair

Availability Analysis
• Availability, Operational
\[ A_O = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{\text{MTBUMA}}{\text{MTBUMA} + \text{ALDT} + \text{MTTR}} \]
where
MTBUMA = Mean Time Between Unscheduled Maintenance Actions
ALDT = Administrative Logistical Down Time
MTTR = Mean Time To Repair

Availability Analysis
• MTBUMA = Mean Time Between Unscheduled Maintenance Actions
\[ \text{MTBUMA} = \left( \frac{1}{\text{MTBF}} + \frac{1}{\text{MTBM}_{\text{induced}}} + \frac{1}{\text{MTBM}_{\text{no defect}}} \right)^{-1} \]
where
MTBF = Mean Time Between Failures
MTBM = Mean Time Between Maintenance

Availability Analysis
• How can we improve AO?
- By decreasing Administrative & Logistical Down Time (ALDT)
- By increasing Mean Time Between Failures (MTBF)
- By decreasing Mean Time To Repair (MTTR)
- By increasing Mean Time Between Unscheduled Maintenance Actions (MTBUMA), that is, by increasing MTBM (induced) and MTBM (no defect)

Availability Analysis
• How can we decrease ALDT?
- By improving Logistics
    Improve scheduling of inspections
    Improve commonality of parts
    Decrease time to get replacements
- By improving Prognostics
    Replace parts before they fail, not after
    Maximize use of component life
    Improve off-board prognostics trending
    More sensors!!

Availability Analysis
• How can we increase MTBF?
- By improving Reliability
    Select more rugged components
    Improve life screening and testing
    Improve thermal management
- By improving Quality
    Better parts screening
    Better manufacturing processes
- By adding Redundancy
    At the cost of Size, Weight and Power!

Availability Analysis
• How can we decrease MTTR?
- By improving Maintainability
    Improve quality and efficacy of training
    Simplify fault isolation
    Decrease number of tools and special equipment
    Decrease access time (panels, connectors…)
    Improve Preventive Maintenance
- By improving Diagnostics
    Improve BIT and BITE
    Decrease ambiguity group size
    Improve maintenance manuals and training

Availability Analysis
• How can we increase MTBM (induced/no defect)?
- By improving Safety
    Limit the potential for accidental damage
- By improving Prognostics
    Improve PHM models to monitor induced damage
- By improving Diagnostics
    Lower the false alarm rate
    Don’t repair/replace things which aren’t broken!

Sensor Example
Engine Health/Performance Monitoring:
Place an acoustic sensor on the engine housing.
Establish ‘nominal’ operating parameters.
Develop a library relating fault precursors to failures: odd sounds which warn of impending failure.
Monitor for an ‘out of nominal’ acoustic signature.

PHM Example
Consider a toaster: Not just any toaster, but the toaster on the first mission to Mars. NASA could only afford to send one, and it must work, every time, or else the astronauts won’t have toast. The toaster must also not endanger the mission by causing a safety hazard or wasting bread.
Mission Critical Function:
- make toast
Safety Critical Functions:
- don’t injure the astronauts
- don’t damage the spaceship
- don’t burn the toast!

PHM Example
• Identify the elements of a toaster.
• What are the failure modes?
• What should we monitor for safety hazards?
• What elements should we monitor for diagnostics?
• What data should we collect for prognostics?
• How would we optimize the sensor coverage and data collection?

Issues Related to PHM
• Continually monitoring sensors and storing all that data for analysis will quickly consume available bandwidth and storage space.
• Capturing ‘profound knowledge’ of a complex engineered system and its myriad failure modes is very difficult, and involves integrating knowledge which crosses discipline boundaries: SE, EE, ME, RAM-T, Safety, Software, Math, Statistics, Physics…
• Prognostic analysis of data is a very difficult problem, with no easy or universal solution.
• PHM is a relatively new field.

Final Remarks
• Do I have any practical PHM suggestions?
- Aim for the low hanging fruit
    Use the sensors you already have in creative ways.
    Only add sensors when you must.
    You can’t monitor everything, so don’t try.
- Don’t reinvent the wheel
    Build on others’ work and experience.
    Find good tools to design your system.

Additional Prognostic Analysis Tool
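Returning to the Availability Analysis slides, the relations between MTBF, MTBM, MTBUMA, ALDT, MTTR and availability can be checked numerically. The sketch below writes the formulas in their standard form (availability as up time over total time, and MTBUMA as the reciprocal of the summed rates of unscheduled maintenance actions), which is an assumption about what the original slides showed. The function names and all input values are hypothetical.

```python
def achieved_availability(mtbf: float, mttr: float) -> float:
    """A_A = MTBF / (MTBF + MTTR), following the briefing's definition."""
    return mtbf / (mtbf + mttr)


def mtbuma(mtbf: float, mtbm_induced: float, mtbm_no_defect: float) -> float:
    """Combine the three sources of unscheduled maintenance as rates:
    1/MTBUMA = 1/MTBF + 1/MTBM_induced + 1/MTBM_no_defect."""
    return 1.0 / (1.0 / mtbf + 1.0 / mtbm_induced + 1.0 / mtbm_no_defect)


def operational_availability(mtbuma_hours: float, aldt: float, mttr: float) -> float:
    """A_O = MTBUMA / (MTBUMA + ALDT + MTTR)."""
    return mtbuma_hours / (mtbuma_hours + aldt + mttr)


# Hypothetical baseline, all values in hours.
MTBF, MTBM_INDUCED, MTBM_NO_DEFECT = 500.0, 2000.0, 1000.0
ALDT, MTTR = 20.0, 4.0

base_mtbuma = mtbuma(MTBF, MTBM_INDUCED, MTBM_NO_DEFECT)
base_ao = operational_availability(base_mtbuma, ALDT, MTTR)

# Halve ALDT, for example through better logistics and prognostics.
ao_less_aldt = operational_availability(base_mtbuma, ALDT / 2, MTTR)

# Halve the false-alarm rate, i.e. double MTBM (no defect).
ao_fewer_false_alarms = operational_availability(
    mtbuma(MTBF, MTBM_INDUCED, 2 * MTBM_NO_DEFECT), ALDT, MTTR)

print(f"A_A                        = {achieved_availability(MTBF, MTTR):.4f}")
print(f"MTBUMA                     = {base_mtbuma:.1f} h")
print(f"A_O baseline               = {base_ao:.4f}")
print(f"A_O with half the ALDT     = {ao_less_aldt:.4f}")
print(f"A_O with fewer false alarms = {ao_fewer_false_alarms:.4f}")
```

Because MTBUMA is always smaller than its smallest contributor, false alarms and induced damage pull operational availability down even when the hardware itself is reliable, which is why the slides treat diagnostics and safety as availability levers alongside reliability.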