Previous PHM slides - Lyle School of Engineering

Systems Engineering Program
Systems Prognostic Health Management
EMIS 7305
March 28, 2006
Christopher Thompson
Senior Research Engineer
Lockheed Martin Missiles and Fire Control
Disclaimer: This briefing is unclassified and contains no proprietary information. Any views expressed by the author are his, and in no way represent those of Lockheed Martin Corporation.
Topic Outline
• Introduction
• Definitions
• The Goal of Prognostic Health Management
• PHM Stakeholders
• PHM Modeling
• Sensors
• Prognostics Analysis Tools
• Availability
• Examples
2
Introduction
Education
B.S. in Electrical Engineering, SMU (1997)
M.S. in Mechanical Engineering, SMU (2001)
- Focus: Fatigue and Fracture Mechanics
M.S. in Systems Engineering (one class remaining)
- Focus: Reliability, Statistical Analysis
Ph.D. in Applied Science (anticipated ~ 2008)
- Proposed Dissertation Title:
Sensor Optimization for Systems Prognostic-Diagnostic Health Management in an Unmanned
Ground Combat Vehicle
3
Introduction
Experience
Lockheed Martin Missiles and Fire Control, Dallas TX
Systems Engineer
- Multifunction Utility/Logistics Equipment (MULE)
Reliability Engineer
- Army Tactical Missile System (TACMS)
Lockheed Martin Aeronautics, Fort Worth TX
Vehicle Systems - Prognostic Health Management
- F-35 Joint Strike Fighter
SMU School of Engineering
- TA for Dr. Jerrell Stracener
4
Introduction
Future Combat Systems
MULE Program
5
Introduction
Some keys to the successful fielding of the U.S.
Army’s Future Combat Systems are:
• Reducing the Logistics footprint
• Increasing Availability
• Reducing total cost of ownership
• Implementing Performance Based Logistics
• Improvements in the ‘ilities’ (RAM-T)
– Reliability
– Availability
– Maintainability
– Testability
– Supportability
6
Some Definitions
Prognostics - Of or relating to prediction; a sign of a
future happening; a portent.
Prognostics is the process of calculating and
reporting an estimate of remaining useful life for a
component, within sufficient time to repair or
replace it before failure occurs.
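A minimal sketch of a remaining-useful-life estimate, assuming a degradation signal that rises roughly linearly toward a known failure threshold. The function name, the wear readings and the threshold are hypothetical; fielded prognostics generally need richer models than a straight-line fit.

```python
def estimate_rul(times, readings, failure_threshold):
    """Fit a least-squares line to a degradation signal and return the
    time remaining until the fitted line reaches the failure threshold.
    Returns None when the trend is flat or improving."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(readings) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, readings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    if slope <= 0:
        return None  # no upward trend, so no predicted crossing
    t_fail = (failure_threshold - intercept) / slope
    return max(0.0, t_fail - times[-1])


# Hypothetical data: wear measured every 100 operating hours.
hours = [0, 100, 200, 300]
wear = [1.0, 2.0, 3.0, 4.0]
rul = estimate_rul(hours, wear, failure_threshold=10.0)
print(f"Estimated remaining useful life: {rul:.0f} hours")
```

The estimate is only useful if it arrives with enough lead time to schedule the repair, which is the point of the definition above.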
7
Some Definitions
Prognostic Health Management (PHM) – The
implementation of an integrated software and
hardware system which monitors the health,
status and performance of a vehicle or system,
tracks consumables (oil, batteries, ammunition,
filters, fuel, coolant…) and configuration
(software versions, part history…), and
determines remaining life of all safety and
performance critical components, predicting
failures before they occur, thereby enhancing
logistics and maintenance activities. PHM consists
of ‘on-board’ as well as ‘off-board’ components.
8
Some Definitions
Diagnostics - The identification of a fault or failure
condition of an element, component, sub-system
or system, combined with the deduction of the
lowest measurable cause of that condition
through confirmation, localization, and isolation.
• Confirmation is the process of validation that a
failure/fault has occurred, the filtering of false
alarms, and assessment of intermittent behavior.
• Localization is the process of restricting a failure
to a subset of possible causes.
• Isolation is the process of identifying a specific
cause of failure, down to the smallest possible
ambiguity group.
9
Some Definitions
Fault – A condition that leaves an element unable
to perform its required function at the desired level
of performance, or able to perform it only in a
degraded mode.
Failure – The inability of a component, system or
sub-system to perform its intended function as
designed. Failure may be the result of one or
more faults.
Fault Tolerance – The design of a system so that it
will continue to operate in a degraded or reduced
level rather than failing completely, when some
part of the system fails.
10
Some Definitions
Failure Cascade – The result when a failure occurs
in a system of interconnected components in which
the successful operation of a component depends
on the successful operation of a preceding
component. A failure can therefore trigger the
failure of successive parts, and potentially
amplify the result or impact. Redundancy and
fault tolerant design can reduce the criticality or
impact of the cascade, but will not necessarily
prevent a failure.
11
Some Definitions
Design Failures – These take place due to inherent
errors or flaws in the system design.
Infant Mortality Failures - These cause newly
manufactured systems to fail, and can generally
be attributed to errors in the manufacturing
process, or poor material quality control.
Random Failures - These can occur at any time
during the entire life of a system. Electrical
systems are more likely to fail in this manner.
Wear Out Failures - As a system ages, degradation
will cause systems to fail. Mechanical systems are
more likely to fail in this manner.
12
Some Definitions
One-To-One Redundancy - Each active component
in a system has a redundant backup on standby.
The active component is monitored at all times,
and the standby component will activate if the
primary component fails. Since the probability of
both components failing at the same time is low,
One-To-One Redundancy provides the highest
level of availability, but at a considerable
disadvantage of requiring double the size,
weight, power and cost, while reducing reliability
(more components which can fail).
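The trade-off described above can be illustrated with a short calculation. This is a sketch under simplifying assumptions that are not in the slides: the two components fail independently, switchover is perfect, and a single number stands in for both the availability and the reliability of each component. The value 0.9 is hypothetical.

```python
def pair_availability(a):
    """Availability of a function backed by two components, where the
    function is lost only if both components are down at once."""
    return 1.0 - (1.0 - a) ** 2


def prob_no_component_failure(r, count):
    """Probability that none of `count` independent components fails,
    a rough proxy for how often maintenance is needed."""
    return r ** count


a = 0.9  # hypothetical per-component value
print(f"single component available: {a:.2f}")
print(f"redundant pair available:   {pair_availability(a):.2f}")
print(f"no failure, 1 component:    {prob_no_component_failure(a, 1):.2f}")
print(f"no failure, 2 components:   {prob_no_component_failure(a, 2):.2f}")
```

Availability of the function goes up, while the chance that something in the system needs repair also goes up, which is the sense in which reliability is reduced.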
13
Some Definitions
N + X Redundancy – N components are required to
perform a function, but the system is configured
with N + X components. When any of the N
components fail, one of the X modules activates.
The advantage lies in reduced size, weight, power
and cost of the system, in the case where X is
smaller than N. In case of multiple component
failures, this scheme provides lesser system
availability.
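A sketch of the N + X comparison, assuming independent, identical components and treating the function as available when at least N of the N + X components are up (a binomial model, which is an assumption added here). The per-component availability of 0.9 is hypothetical.

```python
from math import comb


def n_plus_x_availability(n, x, a):
    """Probability that at least n of n + x independent components,
    each available with probability a, are up."""
    total = n + x
    return sum(
        comb(total, k) * a**k * (1.0 - a) ** (total - k)
        for k in range(n, total + 1)
    )


a = 0.9  # hypothetical per-component availability

# Two required components with one shared spare (N = 2, X = 1).
shared_spare = n_plus_x_availability(2, 1, a)

# Same two components, each with its own dedicated backup.
dedicated_backups = n_plus_x_availability(1, 1, a) ** 2

print(f"2 + 1 shared spare:       {shared_spare:.4f}")
print(f"one-to-one (4 components): {dedicated_backups:.4f}")
```

With these numbers the shared spare uses three components instead of four but gives somewhat lower availability, since a second failure cannot be covered.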
14
Some Definitions
Load Sharing – Multiple components share a
combined load. A higher level component
manages load distribution, and monitors the
health and status of the components. If one of
the load sharing components fails, the load is redistributed among the others, allowing for
graceful performance degradation. In this
scheme, there is almost no extra cost. The main
disadvantage is that with multiple failures, system
performance may degrade below an acceptable
level.
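A sketch of graceful degradation under load sharing, assuming identical units with a fixed capacity and an even split of the load. The function name, load and capacity figures are hypothetical.

```python
def redistribute(total_load, healthy_units, unit_capacity):
    """Split the load evenly across the healthy units, capped at each
    unit's capacity. Returns (load per unit, fraction of load delivered)."""
    if healthy_units == 0:
        return 0.0, 0.0
    per_unit = min(total_load / healthy_units, unit_capacity)
    delivered_fraction = per_unit * healthy_units / total_load
    return per_unit, delivered_fraction


# Hypothetical system: four units rated at 30 each, sharing a load of 90.
for healthy in (4, 3, 2):
    per_unit, fraction = redistribute(90.0, healthy, 30.0)
    print(f"{healthy} healthy: {per_unit:.1f} per unit, "
          f"{fraction:.0%} of load delivered")
```

One failure is absorbed by the remaining units; after a second failure the units saturate and delivered performance drops, which matches the disadvantage noted above.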
15
The Ultimate Goal of Prognostics
The purpose of Prognostic Health Management is to
repair systems before they fail, while maximizing
useful life consumption, and to have the
necessary parts, tools and maintainers waiting
nearby to resolve the correct problem as quickly
and efficiently as possible.
16
PHM Stakeholders
SYSTEMS ENGINEERING
- PHM Model Design
- Interface Management
- Requirements Development
- Sensor Optimization
- Platform Integration
- CAIV/WAIV Analysis
SOFTWARE & SIMULATION
- PHM Model Integration
- Software Interfaces
- Fault/Failure Simulation
- Continuous BIT/PHM
- Consumables Monitoring
- Prognostic Trending
TEST ENGINEERING
- Test Planning
- Fault/Failure Criticality
- Fault/Failure Propagation
- Fault/Failure Simulation
MECHANICAL ENGINEERING
- Crack Growth Sensing
- Stress/Strain Sensing
- Corrosion Sensing
- Vibration Sensing
- Acoustic Sensing
- Thermal Sensing
ELECTRICAL ENGINEERING
- Sensor Implementation
- Sensor Integration
- Data Management
- Data Architecture
- System Architecture
TRAINING & PROD. SUPP.
- Reliability/Failure Modes
- Maintainability & Testability
- Logistics & Sustainment
- Training
- Safety
17
Systems Engineering’s Role in PHM
• Requirements Development
• System Integration
• System Architecture
• Interface Management
• Risk Assessment
• Performance Measures: TPM’s & KPP’s
• System Modeling & Knowledge Integration
• Functional Decomposition
18
PHM Requirements
• The PHM system shall isolate X percent of all
detected failures to a single component, with Y
percent confidence.
• The PHM system shall predict X percent of expected
failures for the next Y hours of operation.
• The PHM system shall predict all failures that can
result in a Safety Critical Failure.
• The PHM system shall incorporate sensors to assess
platform health, status and performance.
• The PHM system shall incorporate sensors to
monitor platform consumables.
• The PHM system shall record and store all sensor
data in onboard memory.
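A sketch of how the first requirement might be scored against test or field data, assuming each detected failure is logged with the size of the ambiguity group it was isolated to. The log values and the 90 percent target are hypothetical stand-ins for X; the confidence part of the requirement is not modeled here.

```python
def isolation_rate(ambiguity_group_sizes):
    """Percent of detected failures isolated to a single component,
    i.e. to an ambiguity group of size 1."""
    if not ambiguity_group_sizes:
        return 0.0
    single = sum(1 for size in ambiguity_group_sizes if size == 1)
    return 100.0 * single / len(ambiguity_group_sizes)


def meets_requirement(rate_percent, required_percent):
    """True when the observed isolation rate satisfies the requirement."""
    return rate_percent >= required_percent


# Hypothetical log: ambiguity group size for each detected failure.
group_sizes = [1, 1, 2, 1, 3]
rate = isolation_rate(group_sizes)
print(f"Isolated to a single component: {rate:.0f}%")
print("Requirement met:", meets_requirement(rate, required_percent=90.0))
```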
19
The ‘Ilities’ & Product Support
• Reliability
- FMECA: Failure Modes, Effects & Criticality Analysis
- FRACAS: Failure Reporting, Analysis & Corrective Action System
- Measures: MTBF, MTBSA, MTBEFF, MTBUMA
• Maintainability
- Maintenance Ratio
- Preventive Maintenance Checks
- Condition Based Maintenance
- Design for Maintainability
• Availability
- AO, AI, AA (Operational, Inherent and Achieved Availability)
20
The ‘Ilities’ & Product Support
• Testability
- Verification and Validation
- Fault Insertion
- Simulation
• Supportability
- Consumables Monitoring
- Supply Planning and Prediction
• System Safety
- Single & Multiple Fault Tolerant Design
- Safety Critical Failures
- Human/Machine Interaction
21
PHM Modeling
• eXpress Modeling Tool
• Model Based Reasoning
• Case Based Reasoning
• Knowledge Bases
• Prognostics Analysis Tools
22
eXpress Modeling Tool
[Diagram: eXpress modeling tool]
Central outcomes:
- Run-Time Prognostic Health Management
- Mission Assurance, Availability & Success
- Performance Based Logistics
Supporting activities:
- Data Mining
- Diagnostic, Prognostic & PHM Design
- Sensor Fusion
- Requirements Analysis
- CONOPS, Specs & Logistics
- FRACAS & FMECA Development
- Risk Assessment
- Life Cycle Trade Space
- Business Cases
23
Impact Technologies
Prognostics developed at Impact Technologies:
• Gas Turbine Engines and Auxiliary Systems
• Avionics PHM and Reasoning
• Aircraft Actuators (EMA, EHA)
• Switching Mode Power Supplies, GPS Receivers
and Power Electronics
• Generators and Electric Drive Systems
• Bearings, Gears, Shafts, Drive Trains, and
Clutches
• Hydraulic, Lube Oil and Fuel Systems
• Structures and Components
• Diesel Engines
24
Impact Technologies
Prognostics modules have been developed and
successfully tested on the following systems:
• Pratt & Whitney F-100 engine on F-15 and F-22
• Engine, generator, lubrication system and gearbox
on Honeywell F124
• Oil wetted components on GE F110-129, GE F404,
Rolls Royce F405
• CH-47 T-55 engine and drive-train
• CH-60 intermediate gearbox
• Blackhawk Carrier Plate Prognosis System
• JSF Clutch Wear and Lift-Fan Prognosis System
• Fuel system and Power generation system on DDG-class Navy Ships
25
Impact Technologies
A number of different techniques have been used in
the development of these prognostics:
• Analytical and stochastic physics of failure models
• Advanced signal processing
• Feature extraction methods
• Health state estimation and prediction algorithms
• Statistical reliability
• Bayesian updating methods
• Component damage accumulation models
• Probabilistic remaining useful life estimation
• Data driven modeling techniques
26
Model Based Reasoning
Model Based Reasoning (MBR) is a qualitative
scheme where a model of the system is combined
with an inference engine that is able to accomplish
fault detection and fault isolation. The qualitative
model, or ‘Knowledge Base’, describes the system
elements and components, their interconnections, and
the input/output behavior of the system being
diagnosed, and establishes an envelope of
‘correct behavior’. To accomplish diagnosis, the
model determines what differences exist between the
actual behavior of the system and the model of the
system. The inference engine, using this comparison
information, accomplishes the fault isolation task.
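A toy sketch of the MBR loop described above: an envelope of correct behavior, a comparison against measured behavior, and a simple inference step. The signal names, limits and suspect lists are hypothetical, and the inference step assumes a single underlying fault so that suspect sets can be intersected.

```python
# Hypothetical envelope of 'correct behavior' for each monitored signal.
ENVELOPE = {
    "pump_pressure_kpa": (180.0, 220.0),
    "bus_voltage_v": (26.0, 30.0),
    "coolant_temp_c": (60.0, 95.0),
}

# Hypothetical model knowledge: components able to disturb each signal.
SUSPECTS = {
    "pump_pressure_kpa": {"pump", "pump motor", "power bus"},
    "bus_voltage_v": {"power bus", "generator"},
    "coolant_temp_c": {"pump", "radiator"},
}


def detect_deviations(measured):
    """Fault detection: list the signals outside their envelope."""
    return [
        name for name, value in measured.items()
        if not ENVELOPE[name][0] <= value <= ENVELOPE[name][1]
    ]


def isolate_fault(deviations):
    """Fault isolation under a single-fault assumption: keep only the
    components that could explain every deviating signal."""
    groups = [SUSPECTS[name] for name in deviations]
    if not groups:
        return set()
    return set.intersection(*groups)


reading = {"pump_pressure_kpa": 150.0, "bus_voltage_v": 22.5,
           "coolant_temp_c": 80.0}
deviations = detect_deviations(reading)
ambiguity_group = isolate_fault(deviations)
print("Deviating signals:", deviations)
print("Ambiguity group:", ambiguity_group)
```

Two deviating signals shrink the ambiguity group to the one component they share, which is the localization and isolation idea from the diagnostics definition.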
27
Case Based Reasoning
Case Based Reasoning (CBR) is the process of
solving problems based on past understanding of
similar problems. The vast majority of this type of
information is contained within the maintainers and
operators – the experience and knowledge of the
person using the system in question. CBR retrieves
a similar past case, forms an implicit generalization
of it, and then identifies commonalities between the
retrieved case and the target problem.
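A minimal sketch of case retrieval, assuming cases are stored as sets of observed symptoms and compared by set overlap (Jaccard similarity, one choice among many). The case library and symptom names are hypothetical.

```python
# Hypothetical case library built from past maintenance actions.
CASE_LIBRARY = [
    {"symptoms": {"high vibration", "metal in oil"}, "fix": "replace bearing"},
    {"symptoms": {"low pressure", "fluid leak"}, "fix": "replace seal"},
    {"symptoms": {"high temperature", "low pressure"}, "fix": "clean filter"},
]


def similarity(a, b):
    """Jaccard similarity: shared symptoms over all symptoms seen."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(symptoms):
    """Return the stored case most similar to the target problem."""
    return max(CASE_LIBRARY, key=lambda c: similarity(symptoms, c["symptoms"]))


target = {"high vibration", "metal in oil", "high temperature"}
best = retrieve(target)
shared = target & best["symptoms"]
print("Closest past case suggests:", best["fix"])
print("Commonalities:", sorted(shared))
```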
28
Knowledge Bases
[Diagram: knowledge base data flow]
Inputs:
- ‘inorganic’ sensor data
- subsystem/LRU internal sensor data
- BIT data
- consumables monitors
- ‘organic’ sensor data
- maintainer inputs
- off-board prognostic trend analysis
Processing:
- sensor fusion and signal conditioning
- Database Management: Data Mining & Feature Extraction
KNOWLEDGE BASE
- FMECA data
- fault/failure propagation
- system level interactions
- functional interdependencies
- physical interdependencies
- design knowledge
- prognostic trend analysis
- CAD models
- circuit layouts
29
Prognostic Analysis Tools
Learning Systems & Artificial Intelligence
• Genetic Algorithms
• Expert Systems
• Fuzzy Logic
• Neural Networks
Database Techniques
• Feature Extraction
• Data Mining
Mathematical Techniques
• Kalman Filtering
• Dempster-Shafer Method
• Wavelets
• Statistical Analysis
• Chaos Math?
30
Prognostic Analysis Tools
Traditional Academic Solutions to PHM:
• Run-to-Failure analysis of large, expensive
systems, such as ship or rail engines
• Analysis involves impractical, complex math models
that require years of training to understand and
interpret
• Very expensive
• Time consuming process
• Rarely offer concrete design guidelines or solutions
31
Prognostic Analysis Tools
Why Engineers in Industry Need More:
• We have bottom lines and schedules to meet!
• We have customer requirements to satisfy!
• Systems Engineers work with designers who don’t
like impractical, complex math models that require
years of training to understand and interpret!
• We have program managers who don’t like very
expensive, time consuming solutions!
• We like concrete design guidelines and solutions!
32
Sensor Technology
• BIT/BITE
• Sensor Fusion and Virtual Sensors
• Sensor Conditioning and Filtering
• Smart Sensors
33
Availability Analysis
• Availability, Achieved
$$A_A = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$
where
MTBF = Mean Time Between Failure
MTTR = Mean Time To Repair
34
Availability Analysis
• Availability, Operational
$$A_O = \frac{\text{Up Time}}{\text{Up Time} + \text{Down Time}} = \frac{\mathrm{MTBUMA}}{\mathrm{MTBUMA} + \mathrm{ALDT} + \mathrm{MTTR}}$$
where
MTBUMA = Mean Time Between Unscheduled
Maintenance Actions
ALDT = Administrative & Logistical Down Time
MTTR = Mean Time To Repair
35
Availability Analysis
• MTBUMA = Mean Time Between Unscheduled
Maintenance Actions
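$$\mathrm{MTBUMA} = \left( \frac{1}{\mathrm{MTBF}} + \frac{1}{\mathrm{MTBM}_{\text{induced}}} + \frac{1}{\mathrm{MTBM}_{\text{no defect}}} \right)^{-1}$$
where
MTBF = Mean Time Between Failures
MTBM = Mean Time Between Maintenance (induced and no defect)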
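A worked sketch of the availability measures, assuming the MTBUMA relation is read as a sum of rates (failures, induced maintenance, no-defect maintenance) that is then inverted. All input values are hypothetical and in hours.

```python
def mtbuma(mtbf, mtbm_induced, mtbm_no_defect):
    """Combine the three unscheduled-maintenance rates and invert."""
    return 1.0 / (1.0 / mtbf + 1.0 / mtbm_induced + 1.0 / mtbm_no_defect)


def achieved_availability(mtbf, mttr):
    """A_A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def operational_availability(mtbuma_hours, aldt, mttr):
    """A_O = MTBUMA / (MTBUMA + ALDT + MTTR)."""
    return mtbuma_hours / (mtbuma_hours + aldt + mttr)


# Hypothetical inputs, all in hours.
m = mtbuma(mtbf=500.0, mtbm_induced=2000.0, mtbm_no_defect=1000.0)
a_a = achieved_availability(mtbf=500.0, mttr=2.0)
a_o = operational_availability(m, aldt=24.0, mttr=2.0)
a_o_better_logistics = operational_availability(m, aldt=4.0, mttr=2.0)

print(f"MTBUMA = {m:.1f} h")
print(f"A_A = {a_a:.4f}")
print(f"A_O = {a_o:.4f}")
print(f"A_O with ALDT cut to 4 h = {a_o_better_logistics:.4f}")
```

With these numbers the logistics delay dominates down time, so cutting ALDT moves operational availability far more than a similar change in MTTR would.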
36
Availability Analysis
• How can we improve AO?
- By decreasing Administrative & Logistical Down
Time (ALDT)
- By increasing Mean Time Between Failures
(MTBF)
- By decreasing Mean Time To Repair (MTTR)
- By increasing Mean Time Between Unscheduled
Maintenance Actions (MTBUMA) – [by increasing
MTBM induced and MTBM no defect]
37
Availability Analysis
• How can we decrease ALDT?
- By improving Logistics
    Improve scheduling of inspections
    Improve commonality of parts
    Decrease time to get replacements
- By improving Prognostics
    Replace parts before they fail, not after
    Maximize use of component life
    Improve off-board prognostics trending
    More sensors!!
38
Availability Analysis
• How can we increase MTBF?
- By improving Reliability
    Select more rugged components
    Improve life screening and testing
    Improve thermal management
- By improving Quality
    Better parts screening
    Better manufacturing processes
- By adding Redundancy
    At the cost of Size, Weight and Power!
39
Availability Analysis
• How can we decrease MTTR?
- By improving Maintainability
    Improve quality and efficacy of training
    Simplify fault isolation
    Decrease number of tools and special equipment
    Decrease access time (panels, connectors…)
    Improve Preventive Maintenance
- By improving Diagnostics
    Improve BIT and BITE
    Decrease ambiguity group size
    Improve maintenance manuals and training
40
Availability Analysis
• How can we increase MTBM (induced/no defect)?
- By improving Safety
    Limit the potential for accidental damage
- By improving Prognostics
    Improve PHM models to monitor induced damage
- By improving Diagnostics
    Lower the false alarm rate
    Don’t repair/replace things which aren’t broken!
41
Sensor Example
Engine Health/Performance Monitoring:
Place an acoustic sensor on the engine housing.
Establish ‘nominal’ operating parameters.
Develop a library relating fault precursors to failures
(odd sounds which warn of impending failure).
Monitor for ‘out of nominal’ acoustic signature.
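A minimal sketch of the monitoring step, assuming the acoustic signature has already been reduced to a single sound-level number per sample and that 'nominal' can be summarized by a mean and standard deviation. The readings and the 3-sigma limit are hypothetical choices; a real acoustic library would work on spectral features.

```python
from statistics import mean, stdev


def build_baseline(nominal_levels):
    """Establish 'nominal' operating parameters from healthy-engine data."""
    return mean(nominal_levels), stdev(nominal_levels)


def out_of_nominal(level, baseline, k=3.0):
    """Flag a reading more than k standard deviations from nominal."""
    mu, sigma = baseline
    return abs(level - mu) > k * sigma


# Hypothetical sound levels (dB) recorded on a healthy engine.
nominal = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 69.7, 70.0]
baseline = build_baseline(nominal)

# Two new readings: one ordinary, one suspicious.
alarms = [out_of_nominal(x, baseline) for x in (70.1, 73.5)]
print("Alarms:", alarms)
```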
42
PHM Example
Consider a toaster: Not just any toaster, but the
toaster on the first mission to Mars. NASA could only
afford to send one, and it must work, every time, or
else the astronauts won’t have toast. The toaster
must also not endanger the mission by causing a
safety hazard or wasting bread.
Mission Critical Function:
- make toast
Safety Critical Functions:
- don’t injure the astronauts
- don’t damage the spaceship
- don’t burn the toast!
43
PHM Example
• Identify the elements of a toaster.
• What are the failure modes?
• What should we monitor for safety hazards?
• What elements should we monitor for diagnostics?
• What data should we collect for prognostics?
• How would we optimize the sensor coverage and
data collection?
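One way to approach the last question is to treat sensor selection as a coverage problem and pick sensors greedily. This is a sketch only: the failure modes, candidate sensors and coverage map below are hypothetical placeholders for the answers to the earlier questions, and a greedy pass does not guarantee the smallest possible sensor set.

```python
def greedy_sensor_selection(coverage, failure_modes):
    """Repeatedly pick the sensor that observes the most failure modes
    still uncovered. Returns (chosen sensors, modes left uncovered)."""
    uncovered = set(failure_modes)
    chosen = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break  # remaining modes cannot be observed by any sensor
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical toaster failure modes and the sensors able to observe them.
failure_modes = {"element open", "element stuck on", "timer fails", "lever jams"}
coverage = {
    "current sensor": {"element open", "element stuck on"},
    "thermocouple": {"element stuck on", "timer fails"},
    "lever switch": {"lever jams"},
    "timer watchdog": {"timer fails"},
}

chosen, uncovered = greedy_sensor_selection(coverage, failure_modes)
print("Sensors to install:", chosen)
print("Failure modes not covered:", uncovered)
```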
44
Issues Related to PHM
• Continually monitoring sensors and storing all that
data for analysis will quickly consume available
bandwidth and storage space.
• Capturing ‘profound knowledge’ of a complex
engineered system and its myriad failure modes is
very difficult, and involves integrating knowledge
which crosses discipline boundaries: SE, EE, ME,
RAM-T, Safety, Software, Math, Statistics, Physics…
• Prognostic analysis of data is a very difficult
problem, with no easy or universal solution.
• PHM is a relatively new field.
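The first issue above can be made concrete with back-of-the-envelope arithmetic. The sensor count, sample rate, sample size and operating hours below are hypothetical, chosen only to show the scale of raw data versus a reduced summary.

```python
def daily_storage_bytes(sensor_count, sample_rate_hz, bytes_per_sample,
                        hours_per_day):
    """Raw bytes produced per day if every sample is stored."""
    seconds = hours_per_day * 3600
    return sensor_count * sample_rate_hz * bytes_per_sample * seconds


# Hypothetical platform: 100 sensors, 2-byte samples, 8 operating hours.
raw = daily_storage_bytes(100, 1000, 2, 8)   # every sample at 1 kHz
summary = daily_storage_bytes(100, 1, 2, 8)  # one summary value per second

print(f"Raw data:     {raw / 1e9:.2f} GB per day")
print(f"Summary data: {summary / 1e6:.2f} MB per day")
```

Storing extracted features or summaries instead of raw samples is one way to stay within bandwidth and storage limits, at the cost of losing detail for later analysis.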
45
Final Remarks
• Do I have any practical PHM suggestions?
- Aim for the low hanging fruit
    Use the sensors you already have in creative ways.
    Only add sensors when you must.
    You can’t monitor everything, so don’t try.
- Don’t reinvent the wheel
    Build on others’ work and experience.
    Find good tools to design your system.
46
Additional Prognostic Analysis Tool
47