Health Analysis

AN INTELLIGENT VALVE FRAMEWORK FOR INTEGRATED SYSTEMS
HEALTH MANAGEMENT ON ROCKET ENGINE TEST STANDS
by
Michael Russell
A Thesis
Submitted in partial fulfillment of the requirements of the
Master of Science Degree
of
The Graduate School
at
Rowan University
October 2010
Thesis Chair: Shreekanth Mandayam, Ph.D.
© 2010 Michael Russell
ABSTRACT
Michael J. Russell
AN INTELLIGENT VALVE FRAMEWORK FOR INTEGRATED SYSTEMS
HEALTH MANAGEMENT ON ROCKET ENGINE TEST STANDS
2009/2010
Shreekanth Mandayam, Ph.D.
Master of Science in Engineering
Intelligent sensors can play a critical role in the monitoring of complex test systems such
as those used to inspect rocket engine components. Such sensors have the capability not
only to provide raw data, but also to indicate the data’s reliability and its effect on system
health at various levels in the system hierarchy. A major concern at NASA-Stennis Space
Center (SSC) in Mississippi is the failure of critical components in the rocket engine test
stand during a test cycle. Test cycles can run for extended periods of time, and it is nearly
impossible to perform maintenance on mission critical components once testing has
commenced. Valves play a critical role in rocket engine test stands, because they are
essential for the cryogen transport mechanisms that are vital to test operations. Sensors
that are placed on valves monitor the pressure, temperature, flow-rate, valve position and
any other features that are required for diagnosing their functionality. Integrated systems
health management (ISHM) algorithms have been used to identify and evaluate
anomalous operating conditions of systems and sub-systems (e.g., valves and valve components) on complex structures such as rocket test stands. In order for such
algorithms to be useful, there is a need to develop realistic models for the most common
and problem-prone elements. Furthermore, the user needs to be provided with efficient
tools to explore the nature of the anomaly and its possible effects on the element as well
as its relationship to overall system state.
This thesis presents the development of an intelligent valve framework that is capable
of tracking and visualizing events of the large linear actuator valve (LLAV) in order to
detect anomalous conditions. Specifically, the research work presented in this thesis
describes a diagnostic process that receives and stores incoming sensor data; performs
calculation of operating statistics; compares them with existing analytical models; and
visualizes faults, failures, and operating conditions in a 3D GUI environment. A suite of
diagnostic algorithms has been developed that can detect anomalous behavior in the
valve and other system components of the rocket engine test stand. The framework
employs a combination of technologies including a DDE data transfer protocol, auto-associative neural networks, empirical and physical models, and virtual reality
environments. The diagnostic procedure that is developed has the ability to be integrated
into existing ISHM systems and reduce information overload in the typically crowded
environments of complex system control rooms. The augmentation to ISHM capabilities
that is presented in this thesis can provide significant benefits for ground-based spacecraft
monitoring and has the potential to be ultimately adapted for providing on-board support
for spacecraft.
ACKNOWLEDGEMENTS
The support of my MS program by the NASA Graduate Student Researchers Program
(GSRP) award No. NNX08AV98H in 2008 and 2009 is gratefully acknowledged. The
research work presented in this thesis was also supported by NASA Stennis Space Center
under Grant/Cooperative Agreement No. NNX08BA19A.
I also acknowledge Dr. Shreekanth Mandayam for being a great advisor and
providing the funding to get me through my master's program. To Dr. Schmalzel and Dr.
Merrill, I thank you for your guidance as part of my thesis committee. Thanks to Hak attack,
Fillman, Metin, Rane, Elwell and Freddie for helping me pass undergrad and making life
bearable during those all-nighters, and to Will and Steven for being my best non-nerd
friends through college.
I would also like to thank my family, who have supported me in my academic
journey: my mom and dad, for always encouraging me to push myself in life and faith;
my siblings and siblings-in-law, for always being there for me; and my grandparents, for
supporting me in my internship at NASA, which started this research.
In Memoriam: Dr. Robert (Bob) Field was one of the many engineers at Stennis
Space Center who contributed to the development of improved system models, one of
which is a core element in the intelligent valve. A Mechanical Engineer adept at thermal
system design and analysis, he brought a depth of experience and insight gained from his
many years at Pratt & Whitney designing turbomachinery blades and solving other equally
complex problems. At NASA, he applied his deep understanding of thermal systems
design and analysis to many facets of test stand design and optimization. In addition to
his thermal technical expertise, he was the leader of many a stimulating conversation into
the finer—and fringier—points of the enterprises of engineering, science, and the
unknown. He was always ready to talk to young engineers and students. Bob retired from
NASA in October 2009 and passed away in February 2010.
In memory of Gladys Russell and William Kolb, the best grandparents, parents
and spouses I have ever known.
TABLE OF CONTENTS
Acknowledgements
List of Figures
List of Tables
CHAPTER 1: INTRODUCTION
1.1 APPLICATIONS
1.2 MOTIVATION
1.3 OBJECTIVES
1.4 SCOPE
1.5 ORGANIZATION
1.6 EXPECTED CONTRIBUTIONS
CHAPTER 2: BACKGROUND
2.1 HEALTH ANALYSIS
2.2 FRAMEWORK FOR HEALTH ANALYSIS
2.3 DESIGN AND TRADE STUDIES
2.4 FAILURE MODE ANALYSIS
2.5 CBM TESTING, DATA COLLECTION, AND DATA ANALYSIS
2.6 ALGORITHM DEVELOPMENT - DIAGNOSTICS
2.6.1 Preprocessing and Feature Extraction
2.6.2 Techniques for Diagnostics
2.7 ALGORITHM DEVELOPMENT - PROGNOSTICS
2.8 RELIABILITY CENTERED MAINTENANCE
2.9 SYSTEM IDENTIFICATION TECHNIQUES
2.9.1 Autoregressive Models
2.9.2 Kalman Filters
CHAPTER 3: APPROACH
3.1 FAILURE MODES
3.2 INTELLIGENT VALVE FRAMEWORK
3.2.1 Data Acquisition
3.2.2 Preprocessing
3.2.3 Failure Mode Detection and Diagnosis
3.2.4 Valve Operational Statistics
3.2.5 Auto-associative Neural Networks for Sensor Validation
3.2.6 Thermal Modeling
3.2.7 Adaptive Thresholding
3.3 PROGNOSTIC SURVEY
3.4 DIAGNOSTIC PROCESS
CHAPTER 4: RESULTS
4.1 DIAGNOSTIC VALIDATION DATA
4.1.1 Thermal Model Data
4.1.2 Sensor Validation Data
4.1.3 Adaptive Threshold Data
4.2 THERMAL MODEL VALIDATION
4.2.1 Thermal Modeling
4.2.2 Simulation Metrics
4.3 SENSOR VALIDATION
4.4 ADAPTIVE THRESHOLD
4.5 VALVE STATISTICS
4.6 HEALTH VISUALIZATIONS
4.7 PROGNOSTICS
4.8 PROGNOSTICS DATA
4.8.1 Canonical Data
4.8.2 LLAV Data
4.9 PROGNOSTIC PERFORMANCE
4.10 DIAGNOSTIC PROCESS
CHAPTER 5: CONCLUSIONS
5.1 SUMMARY OF ACCOMPLISHMENTS
5.2 RECOMMENDATIONS FOR FUTURE WORK
References
LIST OF FIGURES
Figure 1 - Integrated approach for system health analysis.
Figure 2 - The four types of failure mode and effect analysis (FMEA).
Figure 3 - Reliability analysis procedure for bottom-up and top-down FMEA approaches.
Figure 4 - System decomposition for CBM testing, data collection, and data analysis.
Figure 5 - Diagnostic and Prognostic Flowchart.
Figure 6 - Model-based and Data-driven diagnostic techniques.
Figure 7 - Approaches for prognosis.
Figure 8 - The system identification loop.
Figure 9 - LLAV with regions of interest labeled.
Figure 10 - Prioritization of LLAV failure modes (see Equations 2.1 and 2.2 for y-axis calculation).
Figure 11 - System level flowchart of the Intelligent Valve framework.
Figure 12 - Health analysis framework for the Intelligent Valve.
Figure 13 - Valve statistics algorithm.
Figure 14 - Training method for auto-associative neural networks for sensor validation.
Figure 15 - Adaptive threshold algorithm for designing and choosing ARMA models.
Figure 16 - Adaptive threshold algorithm simulation on real-time data.
Figure 17 - Intelligent Valve database schema.
Figure 18 - Software framework for the Intelligent Valve framework.
Figure 19 - MTTP Trailer used for validating sensor faults.
Figure 20 - Simulation data using thermal modeling for base run.
Figure 21 - Data acquisition setup for thermal modeling fault detection.
Figure 22 - Simulation data using thermal modeling for faulty connections in Tustin amplifier input.
Figure 23 - Fault classification using thermal modeling for faulty connections in Tustin amplifier input.
Figure 24 - Simulation data using thermal modeling for amplifier power downs and Tustin input disconnections.
Figure 25 - Fault detection using thermal modeling for amplifier power down and Tustin input disconnection.
Figure 26 - Simulation data using thermal modeling for faulty input connections in the digitizer.
Figure 27 - Fault detection using thermal modeling for amplifier power down and Tustin input disconnection.
Figure 28 - Simulation data using thermal modeling for simulated frost insulation test 1.
Figure 29 - Fault detection using thermal modeling for frost insulation test 1.
Figure 30 - Simulation data using thermal modeling for simulated frost insulation test 2.
Figure 31 - Fault detection using thermal modeling for frost insulation test 2.
Figure 32 - Data acquisition modified setup for thermal modeling fault detection.
Figure 33 - Simulation data using thermal modeling for temperature junction reference errors.
Figure 34 - Fault detection using thermal modeling for temperature junction reference errors.
Figure 35 - Simulation data using thermal modeling for thermocouple and power disconnections.
Figure 36 - Fault detection using thermal modeling for thermocouple and power disconnections.
Figure 37 - Simulation data using thermal modeling for thermocouple disconnections and shorts.
Figure 38 - Fault detection using thermal modeling for thermocouple disconnections and shorts.
Figure 39 - Simulation data using thermal modeling for transmitter power failures.
Figure 40 - Fault detection using thermal modeling for transmitter power failures.
Figure 41 - Simulation data using thermal modeling for unaccounted thermocouple junctions.
Figure 42 - Fault detection using thermal modeling for unaccounted thermocouple junctions.
Figure 43 - Comparison of predicted and actual frost line.
Figure 44 - Example of a hard fault.
Figure 45 - Example of a soft fault.
Figure 46 - Example dataset from LLAV and downstream pressure sensor.
Figure 47 - Hard fault detection using AANN.
Figure 48 - Soft fault detection by AANN.
Figure 49 - Fault detection of a simulated hard fault in a pressure sensor.
Figure 50 - Fault detection of a soft fault in a pressure sensor.
Figure 51 - Detection of a simulated disconnect in a pressure transducer.
Figure 52 - Legend for AANN estimations: (a) Top estimation plots and (b) bottom error plots.
Figure 53 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors under normal operating conditions.
Figure 54 - AANN Estimation for PE-1143-GO and PC1 pressure sensors under normal operating conditions.
Figure 55 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors under normal operating conditions.
Figure 56 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with hard fault in PE-1143.
Figure 57 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with hard fault in PE-1143.
Figure 58 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with hard fault in PE-1143.
Figure 59 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with level shift in PE-1143-GO.
Figure 60 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with level shift in PE-1143-GO.
Figure 61 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with level shift in PE-1143-GO.
Figure 62 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with noise in PC1.
Figure 63 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with noise in PC1.
Figure 64 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with noise in PC1.
Figure 65 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with noise in VPV-1139-FB.
Figure 66 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with noise in VPV-1139-FB.
Figure 67 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with noise in VPV-1139-FB.
Figure 68 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with simultaneous faults in PE-1143-GO and PC1.
Figure 69 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with simultaneous faults in PE-1143-GO and PC1.
Figure 70 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with simultaneous faults in PE-1143-GO and PC1.
Figure 71 - Set point transitions for adaptive thresholding testing.
Figure 72 - Set point transition #1 with fault detection while operating in: (a) normal OS and (b) faulty OS.
Figure 73 - Set point transition #2 with fault detection while operating in: (a) normal OS and (b) faulty OS.
Figure 74 - Set point transition #3 with fault detection while operating in: (a) normal OS and (b) faulty OS.
Figure 75 - Set point transition #4 with fault detection while operating in: (a) normal OS and (b) faulty OS.
Figure 76 - Set point transition #5 with fault detection while operating in: (a) normal OS and (b) faulty OS.
Figure 77 - Set point transition #6 with fault detection while operating in: (a) normal OS and (b) faulty OS.
Figure 78 - Average fault values for different parameters of the ARMA model thresholding method over all tests.
Figure 79 - Training data with final threshold fit.
Figure 80 - Fault detection of simulated obstruction fault using adaptive thresholding.
Figure 81 - Frost line visualization of LLAV.
Figure 82 - Cross sectional and exploded view with flow and position visualizations.
Figure 83 - Frost line visualization of LLAV with thermocouple values.
Figure 84 - Linear equation with 0 mean and 1 variance.
Figure 85 - Linear time series with 0 mean and 10 variance.
Figure 86 - Original model time series.
Figure 87 - AR prediction of first time signal at 1 prediction step and SNR = 25dB.
Figure 88 - AR prediction of first time signal at 5 prediction steps and SNR = 25dB.
Figure 89 - AR prediction of first time signal at 5 prediction steps and SNR = -5dB.
Figure 90 - AR MSE performance on 0 mean, 1 variance signal.
Figure 91 - ARMA prediction of first time signal at 1 prediction step and SNR = 25dB.
Figure 92 - ARMA prediction of first time signal at 1 prediction step and SNR = -5dB.
Figure 93 - ARMA prediction of first time signal at 5 prediction steps and SNR = -5dB.
Figure 94 - ARMA MSE performance on 0 mean, 1 variance signal.
Figure 95 - Kalman filter prediction of first time signal at 1 prediction step and SNR = 25dB.
Figure 96 - Kalman filter prediction of first time signal at 5 prediction steps and SNR = 25dB.
Figure 97 - Kalman filter prediction of first time signal at 5 prediction steps and SNR = -5dB.
Figure 98 - Kalman filter MSE performance on 0 mean, 1 variance signal.
Figure 99 - Original time series model #2.
Figure 100 - AR MSE performance on 0 mean, 10 variance signal.
Figure 101 - ARMA MSE performance on 0 mean, 10 variance signal.
Figure 102 - Kalman filter performance on 0 mean, 10 variance signal.
Figure 103 - ARX prediction of the LLAV data to 30 time steps.
Figure 104 - Performance for ARX model based on LLAV data.
Figure 105 - ARMAX prediction of the LLAV data to 30 time steps.
Figure 106 - Performance for ARMAX model based on LLAV data.
Figure 107 - Kalman prediction of the LLAV data to 30 time steps.
Figure 108 - Performance for Kalman filter based on LLAV data.
Figure 109 - Intelligent Valve statistics tab.
Figure 110 - Intelligent Valve thermocouple tab.
Figure 111 - Intelligent Valve setup tab.
LIST OF TABLES
Table 1 - An example morphological matrix of a redesigned rail bogie.
Table 2 - Description of the four types of failure mode and effect analysis (FMEA).
Table 3 - Possible values of the parameters used in a FMEA.
Table 4 - Diagnostic algorithms from the literature.
Table 5 - Prognostic algorithms from the literature.
Table 6 - Failure modes and effects for LLAV.
Table 7 - Thermocouple types and ranges.
Table 8 - Data server class interface.
Table 9 - Adaptive threshold simulation parameters.
Table 10 - Physical parameter obtained from least square optimization curve fit of base run.
Table 11 - Performance metrics for faulty connection in amplifier input.
Table 12 - Performance metrics for amplifier power down and Tustin input disconnect.
Table 13 - Performance metrics for input disconnection on the digitizer.
Table 14 - Performance metrics for frost insulation test 1.
Table 15 - Performance metrics for frost insulation test 2.
Table 16 - Performance metrics for temperature junction reference error.
Table 17 - Performance metrics for thermocouple and power disconnection.
Table 18 - Performance metrics for thermocouple disconnections and shorts.
Table 19 - Performance metrics for transmitter power and failure.
Table 20 - Performance metrics for unaccounted thermocouple junction.
Table 21 - Average performance metrics for all thermocouple fault tests.
Table 22 - Performance metrics for fault detection using AANN under normal operating conditions.
Table 23 - Performance metrics for fault detection using AANN with injected hard fault in PE-1143-GO.
xii
Table 24 - Performance metrics for fault detection using AANN with injected
level shift fault in PE-1143-GO. .................................................................................................. 108
Table 25 - Performance metrics for fault detection using AANN with injected
noise in PC1. ................................................................................................................................ 111
Table 26 - Performance metrics for fault detection using AANN with noise
in VPV-1139-FB. ......................................................................................................................... 114
Table 27 - Performance metrics for fault detection using AANN with simultaneous
faults in PE-1143-GO and PC1. ................................................................................................... 116
Table 28 - Operating Statistics for LLAV ................................................................................... 132
xiii
GLOSSARY OF TERMS
1. Health Management - A comprehensive system that detects, isolates, and quantifies
faults as well as predicts future failures in an engineering system
2. Condition based maintenance - The use of machinery run-time data to determine
the machinery condition and hence its current fault/failure condition, which can be
used to schedule required repair and maintenance prior to breakdown
3. Prognostics and health management - The prediction of future failure conditions
and remaining useful life of a system, subsystem, or component
4. Reliability centered maintenance - The process that is used to determine the most
effective approach to maintenance
5. Failure Conditions - States of components and subsystems that are indicative of a
fault occurring in the overall system.
6. Dimensionality Reduction - The process of reducing the number of random
variables under consideration in order to create a more accurate set of feature vectors.
7. Fuzzy Logic - A form of multi-valued logic derived from fuzzy set theory to deal
with reasoning that is approximate rather than accurate
8. Intelligent Component - A component in a system that relays not only raw data, but
some sort of analysis on the data, e.g. FFT, DSP, moving average, fault and failure
conditions, etc.
9. Artificial Neural Network - A mathematical model or computational model that is
inspired by the structure and/or functional aspects of biological neural networks. It
consists of an interconnected group of artificial neurons and processes information
using a connectionist approach to computation.
10. Integrated Systems Health Management - A set of system capabilities that in
aggregate perform: determination of condition for each system element, detection of
anomalies, diagnosis of causes for anomalies, and prognostics for future anomalies
and system behavior
11. Fault Diagnosis - Detecting, isolating, and identifying an impending or incipient
failure condition -- the affected component (subsystem, system) is still operational
even though in a degraded mode.
12. Failure Diagnosis - Detecting, isolating and identifying a component (subsystem,
system) that has ceased to operate.
13. Fault Detection - Detection of the occurrence of faults in the functional units of the
process, which lead to undesired or intolerable behavior of the whole system.
14. Fault Isolation - Localization (classification) of different faults.
15. Fault Analysis or Identification - Determination of the type, magnitude and cause of
the fault
16. Failure modes effects and criticality analysis (FMECA) - A procedure in product
development and operations management for analysis of potential failure modes
within a system for classification by the severity and likelihood of the failures.
Chapter 1: INTRODUCTION
As system complexity increases, the amount of data required to monitor system failures
also increases. Originally, a human operator could view the raw time series data to find
sensor faults that could be traced back to a root cause in the system. In modern day
systems, however, the increase in sensor data can make it difficult if not impossible for a
human operator to find anomalies in these systems in a timely fashion [1]. Therefore,
reliability engineers are deploying automated algorithms that detect failure modes in
complex, dynamic systems.
The goal of these algorithms has been extended from
detecting threshold violations in the sensors to identifying and quantifying the
degradation of health in a system and even predicting faults before they occur.
Numerous techniques have been developed with help from extensive research and
funding being put into the field of health analysis. The United States military has taken
particular interest in health analysis in order to provide their troops with robust and
reliable systems. Military studies found that maintenance protocols were based on a
schedule rather than a degradation of performance. Scheduled maintenance led to
many components being replaced before their operational lifetime had ended. Health
analysis allows for preventive maintenance to be performed based on the current health
state of the component and has considerable cost benefits while also keeping operators of
the machines safe.
In any system, the proper health analysis technique must be determined and
depends on several design parameters such as application, severity, accuracy, historical
data, constraints, deadlines, and complexity of the physical dynamics of the system. For
instance, in a manufacturing plant certain critical components can cause a shutdown for
days. The shutdown can cause considerable delays in the shipping of the manufactured
products. Such systems would require a highly accurate algorithm such as a physics
model, but these algorithms also take the most time and money to develop.
In the realm of health analysis, four major technologies have arisen: Condition
Based Maintenance, Prognostics and Health Management, Reliability Centered
Maintenance, and Integrated Systems Health Management. Condition Based Maintenance
(CBM) is defined as “the use of machinery run-time data to determine the machinery
condition and hence its current fault/failure condition, which can be used to schedule
required repair and maintenance prior to breakdown” [2].
Prognostics and Health
Management (PHM) refers to the prediction of future failure conditions and remaining
useful life of a system, subsystem, or component.
Integrated Systems Health
Management (ISHM) “describes a set of system capabilities that in aggregate perform:
determination of condition for each system element, detection of anomalies, diagnosis of
causes for anomalies, and prognostics for future anomalies and system behavior” [3].
Reliability centered maintenance (RCM) "is the process that is used to determine the
most effective approach to maintenance" [4].
As systems grow in size and complexity, there will be a need to develop
algorithms that have higher accuracy, are more general, and have longer prediction
intervals than current system health analysis.
1.1 Applications
Diagnostics and prognostics make up the core components of the health analysis
framework. These two technologies are not limited to just engineering, but are also used
in medical and business applications. While the goals of the analysis may be different,
the techniques used in the diverse fields are often the same.
The medical field uses diagnostics to determine the health of a patient and
the disease that is affecting the patient. Once a diagnosis has been made, remedial
procedures can be created to help the patient as much as possible. Three of the
methods that are used by medical professionals are exhaustive, algorithmic, and
pattern-recognition. The exhaustive method uses every possible question and runs all possible
tests in order to create the most comprehensive diagnosis possible. The algorithmic
method follows steps from a proven strategy to diagnose the disease based on the
symptoms the patient is experiencing. The final method, pattern-recognition, uses past
experience to recognize a pattern of clinical characteristics in order to diagnose the
patient. While the procedures are different for each method, the goal of finding the
disease and coming up with a treatment based on the symptoms and test data available is
the same [5].
The global economy is constantly in a state of flux, with companies' stocks rising
and falling every day. The ability of an investor to predict these changes would result in
success for the investor's company. Therefore, algorithms are being designed that attempt to
analyze, model, and even forecast the stock market in an attempt to find trends that will
tell investors when it is the best time to buy or sell their stocks. Financial forecasting is
also used by top management for planning and implementing long-term strategic
objectives. The methods used by forecasters usually rely on probabilistic models such as
regression and Markov models. The main drawback of most of these models is the
difficulty in taking into account all the variables as well as the functions or relationships
of those variables that contribute to something as complex as the global market [6].
1.2 Motivation
The National Aeronautics and Space Administration (NASA) was formed in 1958 and
quickly established itself as a worldwide leader in air and space research. The
accomplishments of the public space agency include a number of firsts, among them an
interplanetary flyby, pictures from another planet, manned landings on the moon, and the
assembly and launch of a space station. Since its inception, NASA has placed an
emphasis on the safety of its astronauts during their voyages into space. However, even
with safety procedures and equipment, the manned space flight program has suffered
several catastrophic losses including the crews of Apollo 1, STS-51-L, and STS-107.
Since the two space shuttle disasters, NASA has focused research on the development of
an Integrated Systems Health Management (ISHM) platform to ensure the highest level
of safety possible for future endeavors in space [7, 8].
NASA defines Integrated System Health Management (ISHM) as “a capability
that focuses on determining the condition (health) of every element in a complex System
(detect anomalies, diagnose causes, prognosis of future anomalies), and provide data,
information, and knowledge (DIaK) - not just data - to control systems for safe and
effective operation” [9]. The vision of NASA is to incorporate ISHM from the
beginning of conceptual design through the end of the manufacturing cycle for future
missions.
By allowing safety to influence conceptual design, engineers can catch
potential failures and anomalies in systems before they are fully designed.
By catching these flaws early, the best opportunity for cost savings can
be exploited at the earliest stages in development. The development and implementation
of ISHM in the complex systems designed by NASA can also create additional costs if
not applied correctly. Therefore, risk analysis tools are being created that find a balance
between cost, performance, safety and reliability throughout the system lifecycle.
NASA Stennis Space Center (SSC) in Mississippi is the location of one group
researching ISHM technologies. NASA-SSC’s primary responsibility is the testing of all
the rocket engines before they are launched from NASA Kennedy Space Center. This
includes the Space Shuttle main engine (SSME) and the new J-2X, which are both critical
to the success of their respective missions. While the SSME is being phased out, the J-2X
is a brand new engine in its first stages of testing. The engines require highly
combustible fluids such as liquid hydrogen and liquid oxygen to create thrust of up to
294,000 lbs [10]. The engines are bolted down to massive superstructures and are fired
for the exact amount of time the engine stays lit during the live launch. To date, no
shuttle mission has ever been delayed or aborted due to an engine failure [11]. To
continue this perfect track record, an ISHM module is being created for the newly
renovated A-Complex test stands.
One of the important aspects of the ISHM module is the determination of failures
and anomalies in the valves of the test stand. These valves are responsible for
maintaining a precise flow of cryogenic fluids needed to fuel a test article. The cost of
these test articles is extremely high and even a small discrepancy in the flow rate of
cryogenic fluids can cause catastrophic events. The cost to run a test program can be in
the millions of dollars and extended delays can cause the cancellation of an entire
program. The restrictive constraints placed upon the operation of the test stands require
the test engineers to continually monitor the valve operations and, at the first sign of
degradation, repairs must be made quickly and efficiently. Currently human engineers
perform the analysis on the valve data, but the implementation of an intelligent
framework with algorithms to monitor the health of the valves in the system could help
give additional insight to the engineers at NASA-SSC.
The valuable statistical,
diagnostic, and prognostic information introduced with such a framework could generate
advisories that, when combined with the domain expert’s opinions, would produce the
greatest accuracy in maintenance decisions.
1.3 Objectives
The objectives of this thesis are:
1. To design a framework for the detection of faults and failure modes in the large linear
actuated valves that are used on the rocket engine test stands at NASA-SSC.
2. To develop a diagnostic process that:
a. Receives and stores incoming sensor data;
b. Performs calculation of operating statistics;
c. Compares with existing analytical models; and,
d. Visualizes faults, failures, and operating conditions in a 3D GUI environment.
3. To develop a suite of diagnostic algorithms that can detect anomalous behavior in the
valve and other system components of the rocket engine test stand.
4. To expand the capability of the diagnostic algorithms to perform prognosis in specific
contexts.
1.4 Scope
The survey of current diagnostic and prognostic techniques focused on how to apply
these algorithms to specific applications and is presented in the background section of this
thesis. The steps of a health analysis framework are also presented in the background
section. The development of these algorithms is defined in the approach section, with
specific applications to NASA-SSC's E-complex test stand. In particular, the valve in
question is the large linear actuator valve, which is a critical component of the test
stands at NASA-SSC. The algorithms are tested on both actual data from the rocket
engine test stands and simulated data from forward analytic models.
1.5 Organization
This thesis is organized as follows. Chapter 1 provides introductory information on
NASA’s history and the motivation of the agency to develop an Integrated System Health
Management framework for its rocket engine test stands.
Possible applications,
objectives, and expected contributions are also discussed. Chapter 2 provides a thorough
background on the development process of health analysis algorithms and frameworks.
An overview of the framework is given, and the subsequent sections provide detailed
information for each step. Chapter 3 outlines the approach taken to develop diagnostic
and prognostic algorithms for the detection of anomalies in sensor data from the valves at
NASA-SSC. Chapter 4 is an account of the results of creating the functional database
and intelligent valve framework, following the premises outlined in Chapter 3. Chapter 5
is a summary of accomplishments presented in this thesis, as well as future research
recommendations.
1.6 Expected Contributions
This thesis will provide a detailed summary of existing methods for health analysis with
applications to the ISHM components used at NASA-SSC.
It will also provide a
literature review of existing diagnostic and prognostic algorithms. A functional database
that utilizes neural networks will be integrated into the existing ISHM framework. This
database will detect alarms in the subsystems and components of the test stand. These
alarms can then be used for root cause analysis to pinpoint faults and failures in the
complex test stands.
This thesis will also show the approach and results of an intelligent valve
framework. The framework will be used in two modes: health analysis
algorithms and a diagnostic process. The diagnostic process, which is run in real-time
during tests, will be responsible for the capturing of operating statistics, thermal model
diagnostics, and a 3D model of the valves. The health analysis algorithms, which will be
run after a test series has completed, are responsible for the development and validation of
advanced diagnostic and prognostic algorithms for the determination of the remaining
useful life for the valves. These algorithms will eventually convey advisory information
to NASA engineers for maintenance options in the valves at the E-Complex test stand.
Chapter 2: BACKGROUND
This chapter contains a summary of previous work performed in the area of
fault diagnosis, fault detection, and prognostics. A detailed description of the entire health
analysis framework will be given. Finally, a discussion of various system identification
techniques will be presented.
2.1 Health Analysis
As engineering systems have become more complex, the cost to maintain these systems
has also increased dramatically. Therefore, research in the area of system health analysis
has emerged over recent years to help alleviate the cost of these expensive machines.
The research has been split into two major areas: Condition-based maintenance (CBM)
and prognostics and health management (PHM) [2].
CBM focuses on the detection of faults in a system and then labeling a specific
component that caused the fault.
This methodology replaces traditional scheduled
maintenance which commonly resulted in working parts being replaced before their
useful life had expired.
PHM algorithms attempt to determine the remaining useful life of a system after a
fault has occurred. Knowing the remaining life of a system can minimize the downtime
risk for critical systems in manufacturing plants [12].
As systems become more
advanced, physical modeling has become too expensive to develop in a timely fashion
and can become too specific to be useful in health management. Therefore, systems are
broken into smaller subsystems that can be modeled more easily. Ideally, these
subsystems can be modeled by first-order physics equations. If this degree of
complexity is not sufficient, system identification techniques are used to model a system
based on historical data. These techniques and their application in system health
management will be explored in the following sections of this thesis.
2.2 Framework for Health Analysis
Modeling the entire health of a system can be very complex and is impractical in most
cases. Therefore, the approach of health analysis is broken up into different sections that
include systems, subsystems and components into a pipeline that streamlines the entire
process. While the input and output formats of the sections are defined, they are each
treated as a black box where only pertinent information is passed on to the next level of
the analysis. The entire pipeline can be seen in Figure 1 [2] with a description of each
section following.
Figure 1 - Integrated approach for system health analysis.
2.3 Design and Trade Studies
The first step of the health analysis process is to examine the system from a top level and
determine the best approach for each failure mode identified. In 2002, a formal
methodology called integrated product and process design (IPPD) was accepted by the
U.S. Department of Defense [3]. IPPD defines the following tasks:
• Define the problem
• Establish value
• Generate feasible alternatives
• Establish alternatives
• Recommend a decision
The IPPD framework is applied to the system during the design phase. Its main
purpose is to provide guidance to the engineers designing the system. The IPPD uses a
morphological matrix that lists the functions of the system and proposes alternative
methods of accomplishing those functions. An example of a morphological matrix of the
redesign of a rail bogie can be seen in Table 1 [13].
Table 1 - An example morphological matrix of a redesigned rail bogie.
Function:
• To connect the wheel-set and the carriage
• To allow the primary suspensions simultaneously working
• To reduce oscillations between the bogie and the carriage frame
Actuator Solutions:
• Carriage spring
• Bogie
• Bogie with single-stage suspension
• Coaxial helical springs + shock absorber
• Helical springs working in parallel + shock absorber
• Helical springs working in parallel + shock absorber
• Helical springs working in parallel with shock absorber
• Coaxial pressure spring + rubber small block
When allowing CBM/PHM to contribute to the design at an early stage, more
reliable systems can be built based on past experience of what failures occur most often
in what equipment.
While the morphological table presents the best technology to
perform each function in a system, it is not always feasible in a budget to build a system
with the most state of the art components. Therefore, the morphological table must be
presented side-by-side with a quantitative analysis of the benefits of each component.
Decision analysis is a field that is well studied and provides several techniques to
quantify the options available to the design engineers. A mathematical model has been
developed for the selection of the best alternative attributes based on incomplete
preference information to assess attribute weights [14]. There are various
multiple attribute decision making (MADM) methods, which are ideal for quantifying the attributes
in the morphological matrix [15].
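As an illustration of how a MADM method can attach numbers to the alternatives in a morphological matrix, the following is a minimal sketch of simple additive weighting, which is only one of many MADM methods and is not necessarily the one used in [14] or [15]. The alternative names, the three attributes (reliability, cost, maintainability), the scores, and the weights are all hypothetical.

```python
# Simple additive weighting (SAW), one common MADM method.
# All alternatives, attribute values, and weights are hypothetical.

def saw_rank(alternatives, weights, benefit):
    """Rank alternatives by the weighted sum of min-max normalized attributes.

    alternatives: dict mapping name -> list of raw attribute values
    weights:      list of attribute weights (assumed to sum to 1)
    benefit:      list of bools, True where a larger value is better
    """
    n_attr = len(weights)
    # Collect each attribute column so it can be normalized across alternatives.
    columns = [[vals[j] for vals in alternatives.values()] for j in range(n_attr)]
    scores = {}
    for name, vals in alternatives.items():
        total = 0.0
        for j in range(n_attr):
            lo, hi = min(columns[j]), max(columns[j])
            if hi == lo:
                norm = 1.0  # attribute does not discriminate between alternatives
            else:
                norm = (vals[j] - lo) / (hi - lo)
                if not benefit[j]:
                    norm = 1.0 - norm  # cost-type attribute: smaller is better
            total += weights[j] * norm
        scores[name] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Attribute order: reliability (benefit), cost (cost-type), maintainability (benefit).
candidates = {
    "coaxial_springs": [0.90, 120.0, 7.0],
    "parallel_springs": [0.95, 150.0, 8.0],
    "rubber_block": [0.80, 60.0, 5.0],
}
ranking = saw_rank(candidates, weights=[0.5, 0.3, 0.2], benefit=[True, False, True])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The weights encode the design team's priorities, so a budget-driven project could raise the cost weight and obtain a different ranking from the same matrix.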
To completely satisfy the tasks of the design and trade studies phase, all design
aspects of a system must be chosen from the techniques described above. Final design
choices should be made only after expert opinions have been solicited or simulation
studies are performed [2]. All design alternatives should be accompanied by some
technique of numerical rankings in order to best select the attributes which solve the
functions required by the system as well as stay within the budget constraints placed on
the system.
After these choices have been made, the output of the design and trade
study section is a design of a system and subsystems which accomplish the task with the
greatest reliability possible. The next stage then analyzes these designs from a health
standpoint in order to understand the failure modes of the system.
2.4 Failure Mode Analysis
Understanding not only what component fails in a system, but why it fails is critical to
any health analysis platform. To perform complete health analysis, these failures must be
classified by their criticality in the system. This field of study has become known as
failure modes and effects analysis (FMEA), and many methods have been presented in
the literature. NASA Ames Research Center developed a failure mode mechanism
through clustering analysis. The analysis includes a statistical clustering procedure to
retrieve information on the set of predominant failures that a function experiences [16].
The Society of Automotive Engineers (SAE) has also developed a FMEA
procedure specifically for the automotive industry. They split their approach and have
separate procedures for the design phase, as well as the manufacturing and assembly
phase. It contains recommendations for appropriate terms, requirements, ranking charts,
and worksheets.
The SAE standard is not as general as the other mentioned
methodologies, which makes it only usable for the automotive industry. Therefore, the
work in the remainder of this thesis will focus on general standards that can be applied to
any health analysis framework [17].
The United States military developed a procedure for performing FMEA in
Military Procedure MIL-P-1629. The evaluation criteria of this standard determined the
effect of system and equipment failures. The criteria were extended in Mil-Std-1629A
in order to add criticality analysis to the failure modes. NASA formally developed and
applied the 1629A method in the 1960's to improve the reliability of its space program.
The 1629A standard has become the most widely accepted method used throughout the
military and commercial industry [18]. Even though 1629A is considered a standard, in
many applications it is applied more as a template that is altered and updated to meet the
needs of the project. For example, similar to the SAE standards, the design process is
separated into multiple phases such as System FMEA (SFMEA), Design FMEA
(DFMEA), Process FMEA (PFMEA), and Service FMEA (SFMEA). A diagram of each is shown
in Figure 2 with a description following in Table 2 [19].
Figure 2 - The four types of failure mode and effect analysis (FMEA).
Table 2 - Description of the four types of failure mode and effect analysis (FMEA).
Type | Focus | Objectives and Goals
System | Minimize failure effects on the system | Maximize system quality, reliability, cost, and maintainability
Design | Minimize failure effects on the design | Maximize design quality, reliability, cost, and maintainability
Process | Minimize process failures on the total process (system) | Maximize the total process (system) quality, reliability, cost, maintainability, and productivity
Service | Minimize service failures on the total organization | Maximize the customer satisfaction through quality, reliability, and service
To create a complete and thorough standard process that can be used in a wide
variety of applications, certain terminology has been defined in the Mil-Std-1629A
document in order to simplify the communication channels between the design and FMEA
teams. The overall objective of the FMEA process is to discover all of the ways a process
or product can fail. Failures occur not only because of design or manufacturing flaws,
but also through misuse of the product by the operator. That is why it is essential to
investigate all four types of FMEA, which leads to a study that follows a product from
concept and design to manufacturing and distribution. While these evaluations are
not guaranteed to be comprehensive, customer complaints can be addressed
because the failure modes and effects that have been analyzed provide an understanding
of the system [20].
A FMEA is a straightforward process that allows for a system to be broken down
into easily analyzed parts where failure modes are identifiable. A formal definition given
by NASA Lewis Research Center distinguishes three specific components as the
objective of a FMEA [21]:
1. Analyze and discover all potential failure modes of a system
2. Determine the effects these failures have on the system
3. Determine how to correct and/or mitigate the failures or effects on the system
The effects of these failure modes can be more difficult to determine from a
system level. A design FMEA can be conducted by a bottom-up approach, where the
lowest level component is analyzed, or a top-down approach where an upper level failure
is chosen, then the lower level effects are analyzed. Figure 3 shows these two approaches to
failure analysis [21].
Figure 3 - Reliability analysis procedure for bottom-up and top-down FMEA approaches.
Once the FMEA approach has been selected, failure modes are classified based on
a set of parameters including: severity, frequency of occurrence, and testability. It is
common for these criteria to be classified based on fuzzy values rather than numerical
values as seen in Table 3. The fuzzification of the values allows for the FMEA to be
performed on systems without large amounts of quantitative data of the faults of a
system. The study also identifies the symptoms that the system exhibits while under the
fault condition, as well as recommendations of the observers that can monitor and track
the fault as it occurs [2]. The selection of observers to identify a fault may not always be
a physical sensor, but rather the features that can be extracted from data in order to build
a diagnostic algorithm.
To identify these key components, domain experts must
contribute to the FMEA study, particularly those experts who have experience with the
exact components being used to perform a specific function in the system. After the
parameters are given values, the priority of each is listed in a scale or table in order to
identify the key components of a system where health diagnostic algorithms should be
developed. Once these algorithms are developed, the system is reevaluated by the same
parameters, but with improved testability and occurrence scores for those failure modes
which have been addressed [20].
Table 3 - Possible values of the parameters used in a FMEA.
Parameter | Possible Values
Severity | Catastrophic, Critical, Marginal, Minor
Frequency of occurrence | Likely, Probable, Occasional, Unlikely
Testability | Comments based on domain expert's knowledge
Two downsides arise from the use of fuzzy logic in the FMEA process. The first
is that there is no quantitative priority number that can be deduced from the fuzzy values.
A very straightforward solution, and the most commonly applied, is to defuzzify the
values into a scalar range from 1-10 for each of the parameters. The resulting values are
then multiplied together to form a priority number, commonly known as the risk priority
number (RPN) [22]. In the same manner as before, after a diagnostic model of the failure
mode with the highest RPN is designed and verified, the RPN is readjusted based on the
newly evaluated occurrence and testability parameters. Eqs. 2.1 and 2.2 show the
formulas for the RPN and for the reduction in RPN after readjustment [23].
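\[ \mathrm{RPN} = \mathrm{Occurrence} \times \mathrm{Severity} \times \mathrm{Testability} \tag{2.1} \]
\[ \%\ \text{Reduction in RPN} = \frac{\mathrm{RPN}_{\mathrm{initial}} - \mathrm{RPN}_{\mathrm{reduced}}}{\mathrm{RPN}_{\mathrm{initial}}} \tag{2.2} \]
Equation 1 - (2.1) Risk priority number and (2.2) readjusted risk priority number formulas.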
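A short worked example may help make Eqs. 2.1 and 2.2 concrete. The sketch below assumes a hypothetical defuzzification that maps the fuzzy labels of Table 3 onto the 1-10 scale; the specific numbers, the testability scores, and the function names are illustrative assumptions and are not taken from [22] or [23].

```python
# Worked example of Eqs. 2.1 and 2.2.
# The label-to-score mappings below are hypothetical defuzzification choices.
SEVERITY_SCALE = {"Minor": 2, "Marginal": 5, "Critical": 8, "Catastrophic": 10}
OCCURRENCE_SCALE = {"Unlikely": 2, "Occasional": 4, "Probable": 7, "Likely": 9}


def risk_priority_number(occurrence, severity, testability):
    """Eq. 2.1: product of the three defuzzified scores (each in 1-10)."""
    for score in (occurrence, severity, testability):
        if not 1 <= score <= 10:
            raise ValueError("each score must lie in the range 1-10")
    return occurrence * severity * testability


def rpn_reduction(rpn_initial, rpn_reduced):
    """Eq. 2.2: reduction in RPN as a fraction of the initial RPN."""
    return (rpn_initial - rpn_reduced) / rpn_initial


# Before a diagnostic algorithm exists: probable, critical, hard to test (score 6).
initial = risk_priority_number(
    OCCURRENCE_SCALE["Probable"], SEVERITY_SCALE["Critical"], 6
)
# After the algorithm is verified: occurrence and testability scores improve,
# while severity is unchanged.
reduced = risk_priority_number(
    OCCURRENCE_SCALE["Occasional"], SEVERITY_SCALE["Critical"], 3
)
reduction = rpn_reduction(initial, reduced)
print(initial, reduced, f"{reduction:.1%}")
```

Multiplying the returned fraction by 100 expresses the reduction as a percentage.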
The other downside of the fuzzy FMEA approach is the lack of a biasing tool for
the parameters. For example, if one component has a high likelihood of occurrence but low
severity and testability, and another component has high severity but low occurrence and
testability, their priority may fall at exactly the same location in the FMEA. While in
some applications this priority scoring may be desired, some failure modes must be
identified based strictly on their severity to the system. Therefore, a criticality analysis is
added to the analysis which weights the parameters based on the application's goals and
design team concerns. The addition of the criticality parameter results in a ranking
system based on the severity classification of a failure mode, as well as the probability of
occurrence based on historical data. If there is no historical data, then a qualitative
approach must again be used, but the preferred approach is again to use the
quantitative number scaling described above. Based on the failure modes and effects
criticality analysis (FMECA) standard being used for the system, the scaling will change
based on specifications put forth by the designer.
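The tie described above can be reproduced in a few lines. The following sketch shows one possible reading of a criticality ranking, in which failure modes are ordered by severity first and by occurrence second. This ordering rule, the class name, and the two failure modes are assumptions made for illustration; an actual FMECA standard would prescribe its own scaling.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """Hypothetical failure mode with defuzzified 1-10 scores."""
    name: str
    occurrence: int
    severity: int
    testability: int

    @property
    def rpn(self):
        # Unweighted risk priority number (product of the three scores).
        return self.occurrence * self.severity * self.testability


def criticality_rank(modes):
    """Order by severity first, then occurrence, then RPN as a tie-breaker."""
    return sorted(
        modes, key=lambda m: (m.severity, m.occurrence, m.rpn), reverse=True
    )


modes = [
    FailureMode("frequent_minor_leak", occurrence=9, severity=2, testability=2),
    FailureMode("rare_actuator_seizure", occurrence=2, severity=9, testability=2),
]
# Both modes have the same RPN, so the plain product cannot separate them.
print([m.rpn for m in modes])
ranked = criticality_rank(modes)
print([m.name for m in ranked])
```

With the severity-first rule the high-severity mode is ranked ahead of the frequent but minor one, even though their RPN values are identical.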
Failure mode and effects criticality analysis is a very important, but often
overlooked, section of the health analysis framework. This may be partially due to the
amount of time, effort, and research that must be put into the collection and analysis of
data. Collaboration in a FMECA is essential for it to be performed correctly,
particularly when there are varying systems that require advice from different domain
experts. Also, FMECAs should be done iteratively throughout the life of a component to
help ensure that all failure modes are identifiable and recommended actions can be
performed in the event of a real failure when the product is being used by the customer
[2].
2.5 CBM Testing, Data Collection, and Data Analysis
After the potential failure modes of a system have been identified, the next step in the
health analysis framework is the design of the required instrumentation and
data-acquisition system in order to gather baseline data under real operations. One
system-level approach to perform the design task is to decompose a system into six distinct parts.
This hierarchy, developed by Pennsylvania State University's Applied Research
Laboratory, allows for data acquisition to be performed on the lowest level before the
system is even constructed. The hierarchy is composed of areas of focus that can be
examined by multiple levels of engineers and scientists [24]:
Figure 4 - System decomposition for CBM testing, data collection, and data analysis.
By dividing a system into these six specific levels, the range of applicable health analysis
algorithms is broadened to support many different fields of engineering. For example, by
analyzing the material of a subsystem, non-destructive evaluation can be used in order to
determine degradation, whereas at a system level it would be more difficult to see the
applicability of such techniques. Also, studies have been performed on how materials
degrade under hostile conditions [25, 26]. These previous studies can be applied to
different CBM applications in order to minimize the amount of redundant research being
performed.
Another method, developed at the University of South Carolina (USC), relies
on historical data and a relational database to tag key anomalies while maintenance is
being performed.
The historical data is retrieved from Maintenance Management
Systems (MMS), which are more traditional maintenance records that hold information
on faults and the repair actions performed. These systems are used by companies and
manufacturers in order to optimize and control the maintenance of their facilities [27].
With an abundance of fault and failure data in an MMS, a link can be built between it
and a Health and Usage Monitoring System (HUMS) in order to monitor vehicle
component parameters. The system developed by USC attempts to create this data link
by extracting metadata from the MMS textual descriptions and combining it with the
statistical analysis performed by the HUMS. The integrated service benefits greatly from
large amounts of both qualitative and quantitative historical data; however, without a
common data format for both the MMS and HUMS, USC's MMS and HUMS link is
very application specific and difficult to apply to existing structures [25]. Example
implementations of USC's method can be found in [28].
2.6 Algorithm Development - Diagnostics
Once faults have been seeded, proper sensor instrumentation selected and data obtained,
algorithms must be developed in order to detect failure modes as early as possible.
Diagnosis is a subject studied not only for machine systems, but other disciplines such as
medicine, sciences, business, and finance [29-31]. While the application and objective of
each is different, the methodology of detecting anomalous conditions using appropriate
sensor data is the same. Due to the vast number of applications for diagnosis, this
research area has been well studied during recent years. The two areas of focus in fault
diagnostics are [2]:
• Fault Diagnosis: Detecting, isolating, and identifying an impending or incipient
failure condition; the affected component (subsystem, system) is still operational
even though at a degraded mode.
• Failure Diagnosis: Detecting, isolating, and identifying a component (subsystem,
system) that has ceased to operate.
The overall concept consists of three essential tasks [29]:
• Fault Detection: detection of the occurrence of faults in the functional units of
the process, which lead to undesired or intolerable behavior of the whole system
• Fault Isolation: localization (classification) of different faults
• Fault Analysis or Identification: determination of the type, magnitude and cause
of the fault
Applications require different approaches depending on the nature of the faults
and the system. The choice of method is also strongly influenced by the amount of
historical data available. If large amounts of fault data have been collected, automatic
clustering algorithms can be utilized with fuzzy logic or neural networks in order to
detect known faults in a system or component [30]. Conversely, if little fault data is
available, model based approaches must be used in an attempt to create an accurate
physical representation of the system. Figure 5 shows the diagnostic and prognostic
framework [2]. The following sections will present an in-depth, though not exhaustive,
summary of various diagnostic methods which have been applied in various health
frameworks.
Figure 5 - Diagnostic and Prognostic Flowchart.
2.6.1 Preprocessing and Feature Extraction
Diagnostic algorithms can be considered a subset of pattern recognition and machine
learning due to the objective of classifying the current state of a machine based on the
incoming sensor data. As such, the raw data provided will not always yield the greatest
classification percentage; instead data must be analyzed in different forms and
combinations in order to extract useful information for a given fault.
Inconsistencies in
data, such as process and measurement noise, must also be considered during processing
in order to provide accurate and reliable results.
When considering these anomalies,
careful attention must be given to ensure a proper balance between signal integrity and
information loss. Preprocessing techniques normally involve tradeoffs, and based on the
application the engineer must be able to determine the degree of noise reduction that
must occur.
In some instances a technique as simple as a low-pass filter will be
sufficient, but in other instances more advanced techniques such as Kalman filters,
wavelets, and artificial neural networks must be applied to the signal [31].
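The tradeoff between noise reduction and information loss can be seen even in the simplest case. The sketch below implements a first-order low-pass filter; the smoothing factor `alpha` and the test signal are illustrative assumptions, not values used in this research.

```python
def low_pass(samples, alpha):
    """First-order low-pass filter: y[k] = y[k-1] + alpha * (x[k] - y[k-1]).

    A small alpha suppresses more noise but responds more slowly to real
    changes in the signal; alpha = 1 returns the input unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not samples:
        return []
    filtered = []
    y = samples[0]  # initialize the filter state at the first sample
    for x in samples:
        y += alpha * (x - y)
        filtered.append(y)
    return filtered


# Hypothetical step change from 0 to 1 with alternating +/-0.1 noise.
raw = [(0.0 if k < 5 else 1.0) + (0.1 if k % 2 else -0.1) for k in range(20)]
for alpha in (0.2, 0.8):
    smoothed = low_pass(raw, alpha)
    print(alpha, [round(v, 2) for v in smoothed[4:10]])
```

With the smaller `alpha` the noise is attenuated more strongly, but the filtered signal takes longer to reach the new level after the step, which is the loss of signal integrity the engineer has to weigh.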
A classic problem with pattern recognition is a lack of information found in raw
data. Therefore, pertinent information from the sensor data must be found using feature
extraction. Many times, a very difficult problem can be reduced down to a few variables
by dimensionality reduction. These reductions in the size of the feature vector allow for
redundant data to be ignored while focusing the diagnostic algorithms on the information
that is relevant to the problem being solved. Several feature extraction techniques will be
discussed in the upcoming sections, but only as they pertain to the specific applications of
interest in this research. References [2, 32, 33] provide additional insight regarding
feature extraction.
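As a minimal sketch of the idea, a window of raw samples can be reduced to a handful of summary features. The particular features chosen here (mean, RMS, peak-to-peak, crest factor) are common examples and are an assumption for illustration; they are not necessarily the features used later in this work.

```python
import math


def extract_features(window):
    """Reduce a window of raw samples to a small feature vector."""
    if not window:
        raise ValueError("window must contain at least one sample")
    n = len(window)
    mean = sum(window) / n
    rms = math.sqrt(sum(x * x for x in window) / n)
    peak = max(abs(x) for x in window)
    return {
        "mean": mean,
        "rms": rms,
        "peak_to_peak": max(window) - min(window),
        # Crest factor is undefined for an all-zero window; report 0.0 there.
        "crest_factor": peak / rms if rms else 0.0,
    }


# Hypothetical window of sensor samples.
print(extract_features([1.0, -1.0, 1.0, -1.0]))
```

Here four numbers stand in for the whole window, so a downstream classifier works with a much smaller input than the raw signal.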
2.6.2 Techniques for Diagnostics
Fault diagnostics requires a careful choice of implementation based on the objectives of
the project as well as the data provided to the CBM designer. These implementation
choices have been the subject of numerous investigations in recent decades. Reference
[2] lays out several major objectives that a CBM system must meet:
• Ensure enhanced maintainability and safety while reducing operation and
support cost
• Be designed as an open systems architecture
• Closely control PHM weight
• Meet reliability, availability, maintainability, and durability (RAM-D)
requirements
• Meet monitoring, structural, cost, scalability, power, compatibility, and
environmental requirements
The technologies utilized to accomplish these objectives are split into two major
areas: model-based and data-driven. Model-based approaches involve the development
of an accurate physical model of the system under evaluation. Incoming sensor data is
then monitored and compared against the model in order to find residuals. The major
benefit of the model-based approach is its ability to detect unanticipated faults. For
mission-critical systems, the ability to detect such faults is an invaluable resource.
Conversely, the major drawback of model-based approaches is the complexity of modern
machine systems. If a system's dynamics are too complex, developing a model that is
accurate enough to find faults without false positives may prove too difficult or costly to
be a viable solution.
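The residual comparison at the heart of the model-based approach can be sketched as follows. The first-order model, its coefficients, the injected offset, and the fixed threshold are all hypothetical; a real system would use a validated physical model and a threshold tuned against false positives.

```python
def simulate_model(inputs, a=0.9, b=0.1, y0=0.0):
    """Output of a hypothetical first-order model y[k+1] = a*y[k] + b*u[k]."""
    y = y0
    predicted = []
    for u in inputs:
        predicted.append(y)
        y = a * y + b * u
    return predicted


def detect_faults(measured, predicted, threshold):
    """Flag each sample whose residual |measured - predicted| exceeds threshold."""
    return [abs(m - p) > threshold for m, p in zip(measured, predicted)]


inputs = [1.0] * 10
predicted = simulate_model(inputs)
# Simulated sensor data: identical to the model until an offset fault of 0.5
# appears at sample 6.
measured = [p + (0.5 if k >= 6 else 0.0) for k, p in enumerate(predicted)]
flags = detect_faults(measured, predicted, threshold=0.2)
print(flags)
```

Because the detector only looks at the deviation from the model, it does not need to know in advance what kind of fault produced the offset.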
In cases where developing a physical model is impractical due to the complexity of a
system, a data-driven approach is an alternative that
relies on parameter estimation from historical data to create a mathematical model of the
system.
Machine learning and system identification techniques are very common
methods utilized in such circumstances. Machine learning techniques such as artificial
neural networks, support vector machines, and fuzzy-logic allow engineers to classify
faults based on sensor data without any knowledge of the underlying system. System
identification techniques such as regression, black-box and state-space models create a
mathematical representation of the system by estimating known physical parameters. It
is essential for a design engineer to understand that while the mathematical model can
accurately depict the output of a system, it contains no information of the physical
dynamics of a system as discussed in the model-based approach. In fact, both machine
learning and system identification rely on a large amount of historical data to create a
robust algorithm that allows for accurate classification of faults. This requirement is one
of the major drawbacks of these technologies because many times a CBM system is
designed in parallel with the hardware of a system and little to no operational data exists
[34]. Figure 6 shows flow charts for both techniques [2].
Figure 6 - Model-based and Data-driven diagnostic techniques.
As discussed in the previous section, there is often a lack of information in raw
sensor data.
Therefore, a feature vector must be constructed that contains enough
information to determine the current operating mode. That vector information in a
model-based system will usually be the physical parameter that defines the system’s
dynamics. In a data-driven method, statistical regression and system identification will
most likely be used. Once the parameters that make up the feature vector have been
found, they can be compared with a library of fault vectors to determine the current fault
state of the machinery. Many times, a complex problem can be simplified to a few
parameters extracted from raw data. Once the fault has been found, an advisory
can be generated based on a database of corrective maintenance for each individual fault. The
next section discusses prognostic algorithms, which attempt to estimate the remaining
useful life (RUL) once a fault has occurred [2, 29].
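Comparing a feature vector with a library of fault vectors can be sketched as a nearest-neighbor lookup. The fault names, the two-element feature vectors, and the use of Euclidean distance are illustrative assumptions rather than the method adopted in this research.

```python
import math

# Hypothetical library: each fault state is represented by a reference
# feature vector (here two illustrative parameters per state).
FAULT_LIBRARY = {
    "nominal": (1.0, 0.0),
    "stiction": (1.0, 0.6),
    "leak": (0.5, 0.1),
}


def classify(feature_vector, library=FAULT_LIBRARY):
    """Return the fault state whose reference vector is closest (Euclidean)."""
    return min(
        library, key=lambda name: math.dist(feature_vector, library[name])
    )


print(classify((0.95, 0.05)))
print(classify((0.55, 0.15)))
```

The returned label could then be used as the key into a corrective-maintenance database to generate the advisory.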
The following table shows recent research in the literature and describes several
diagnostic approaches along with their applications.
Table 4 - Diagnostic algorithms from the literature.
Authors and Paper Title | Area of Research
V. Puig, J. Quevedo, T. Escobet, F. Nejjari, and S. de las Heras, “Passive Robust Fault Detection of Dynamic Processes Using Interval Models,” 2008 [35] | Model-based fault detection based on interval models that generate adaptive thresholds using three schemes (simulation, prediction, and observation)
H. Bassily, R. Lund, and J. Wagner, “Fault Detection in Multivariate Signals With Applications to Gas Turbines,” 2009 [36] | Compares multivariate autocovariance functions of two independently sampled signals in order to create a model-based algorithm to detect faults in a gas turbine
C. H. Lo, Eric H. K. Fung, and Y. K. Wong, “Intelligent Automatic Fault Detection for Actuator Failures in Aircraft,” 2009 [37] | Utilizes a fuzzy-genetic algorithm to detect different types of actuator failures in a nonlinear F-16 aircraft model
G. Spitzlsperger, C. Schmidt, G. Ernst, H. Strasser, and M. Speil, “Fault Detection for a Via Etch Process Using Adaptive Multivariate Methods,” 2005 [38] | Uses an adaptive method to overcome false alarms in slowly degrading manufacturing processes that use Hotelling $T^2$ and squared prediction errors
W. R. A. Ibrahim, M. M. Morcos, “An Adaptive Fuzzy Self-Learning Technique for Prediction of Abnormal Operation of Electrical Systems,” 2006 [39] | Details an intelligent adaptive fuzzy system with self-learning functions that monitors electrical equipment
S. Huang and K. K. Tan, “Fault Detection and Diagnosis Based on Modeling and Estimation Methods,” 2009 [40] | Uses multiple radial basis functions to estimate both the unknown nonlinear dynamics as well as the fault characteristics of a simulated system
J. Yun, K. Lee, K. Lee, S. B. Lee, J. Yoo, “Detection and Classification of Stator Turn Faults and High-Resistance Electrical Connections for Induction Machines,” 2009 [41] | Proposes a stator-winding turn-fault detection algorithm using sensorless zero-sequence voltage or negative-sequence current measurements
The authors of [35] demonstrated a technique using interval models to detect
faults. The paper compares and contrasts several different interval models. In particular,
they show the benefits and drawbacks of simulation, observation, and prediction interval
models. They applied their fault detection algorithm to the European Research Training
Network DAMADICS servo motor.
They used several different parameter
configurations for each and used an optimization criterion to create an adaptive threshold.
When the input signal crosses the threshold, a fault indicator is set to a high state. The
paper shows that the simulation method had the greatest accuracy because the model
does not depend on current inputs.
The adaptive threshold in the prediction and
observation methods follows the input sensor values too closely and produces either too many false
alarms or too many false negatives. Quantitative and qualitative analyses are given for
each method.
The autocovariance function for any zero mean stationary d-dimensional signal
can be used to determine if two independently sampled signals are statistically identical.
Reference [36] presents the theory behind such a claim, and goes on to provide insight on
how to use this property for diagnostics. The authors develop a statistical measure to
determine signal equality, and then apply the measure to multi-dimensional bivariate
white noise and compare it with the empirical probabilities of several simulated models
in order to confirm the feasibility of the statistical measure.
To show
applicability to machinery, the method is applied to a gas turbine at Clemson University.
Several artificially induced faults are tested including: added synthetic noise, partial
blockage, and compressor relief valve failure.
The final results were compared to
standard dynamic principal component analysis and it was found that the statistical
measure was able to detect the faults earlier and with more accuracy.
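For orientation, the quantity being compared can be sketched for the one-dimensional case. The code below is only the standard biased sample autocovariance estimator; it is not the statistical test developed in [36], and the test signal is hypothetical.

```python
def autocovariance(signal, lag):
    """Biased sample autocovariance of a 1-D signal at a non-negative lag."""
    n = len(signal)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(signal)")
    mean = sum(signal) / n
    return sum(
        (signal[t] - mean) * (signal[t + lag] - mean) for t in range(n - lag)
    ) / n


# Hypothetical alternating signal: variance 1 at lag 0, negative at lag 1.
x = [1.0, -1.0, 1.0, -1.0]
print(autocovariance(x, 0), autocovariance(x, 1))
```

Two signals drawn from the same stationary process should yield similar autocovariance estimates at each lag, which is the property a diagnostic test of this kind exploits.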
The authors of [37] developed an automatic fault detection system using genetic
algorithms and fuzzy logic. The fuzzy-genetic algorithm is proposed to eliminate the
need for hardware redundancy in aircraft; the authors suggest that analytic redundancy
is sufficient when a robust algorithm is applied to the dynamic behavior of such a
system.
The algorithm is claimed to distinguish four conditions:
no fault, elevator failure, aileron failure, and rudder failure. It detects these
failures by first fuzzifying the residuals and then evaluating them by an inference
mechanism using if-then rules. In order to optimize the rule table referenced by the
inference mechanism, a genetic search algorithm is used. The fuzzy rule table is coded
into a chromosome and the fault models are integer numbers. These chromosomes are
set to the size of the fuzzy rule table and decoded for the fuzzy evaluation system. The
fitness value of each chromosome is compared to the optimal objective function in order
to determine the optimal fuzzy rule table for each fault based on the residuals of the
sensor data. The algorithm is applied to a simulation study of the faults in a nonlinear F-16
aircraft model. The system is compared to a linear classifier and a neural network.
The results show that the fuzzy-genetic algorithm performs well on all faults, and is very
resistant to measurement noise in the residuals. The proposed algorithm outperforms
both the linear classifier and the neural network in all cases.
In the semiconductor industry, Hotelling $T^2$ and the squared prediction error are
gaining acceptance to monitor data provided by modern process tools. These methods
require models based on the covariance matrix of the training data set, and problems arise
from the slow drift of modern manufacturing processes. As a result, false alarms are created
during the estimation process, which effectively negates the benefits of the diagnostic
algorithms. To counteract these drawbacks, an adaptive method for multivariate models
is developed [38]. The authors take a current adaptive method of centering and scaling
and expand it to incorporate domain knowledge as well as remodeling using a moving
window approach. In each case the Hotelling $T^2$ chart is tuned based on the current drift
of the system. The results were not seen as promising, and it was found that engineering
knowledge was more important to the update of the individual univariate parameters than the
automatic updating of the proposed methods.
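The Hotelling $T^2$ statistic and the moving-window idea can be sketched for a two-variable process. The window contents are hypothetical, and the code does not reproduce the centering, scaling, or domain-knowledge rules of [38]; it only shows how re-estimating the mean and covariance from a recent window changes the reference against which new samples are scored.

```python
def window_stats(samples):
    """Sample mean and covariance of 2-variable samples in a window."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    m1 = sum(x for x, _ in samples) / n
    m2 = sum(y for _, y in samples) / n
    s11 = sum((x - m1) ** 2 for x, _ in samples) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in samples) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in samples) / (n - 1)
    return (m1, m2), ((s11, s12), (s12, s22))


def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)^T cov^{-1} (x - mean), using the closed-form 2x2 inverse."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det == 0:
        raise ValueError("covariance matrix is singular")
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det


# Hypothetical recent window of a drifting two-variable process.
window = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = window_stats(window)
print(hotelling_t2((3.0, 1.0), mean, cov))
```

Sliding the window forward as new data arrive lets the reference follow slow drift, at the cost of also adapting to slow faults, which is the tension the paper reports.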
The presence of fuzzy logic in diagnostic environments is gaining more
acceptance due to its flexibility and soft classification of faults. One of the drawbacks of
fuzzy logic is the required domain knowledge and historical data needed in order to
create an accurate and robust fuzzy rule table. Reference [39] attempts to overcome these
requirements by creating a self-learning process that can predict failure modes in a
monitored system. The algorithm first determines the number of data points required to
find the underlying trend successfully. The next step is to determine how long a
period is required to fully define a trend. Wavelet denoising is then applied to the signal
to create a clean signal for the fuzzy logic predictor.
Two fuzzy techniques are
considered by the authors. The first uses a single fuzzy system to not only learn a trend,
but indicate whether or not the trend is part of a trend previously learned by the system.
The second technique uses a fuzzy system that learns the specific data trend, and a second
general fuzzy system that compares the incoming data to the trend produced by the first
fuzzy system. Both techniques perform well on a long and short-term simulation. They
were both able to select an adequate number of data points and period for detecting fault
trends in simulated data. The authors compare the two techniques and describe the
applications for each.
Artificial neural networks (ANNs) have been accepted as a way to perform
function approximation without knowledge of the underlying system. The ability of an
ANN to balance its weights using optimization techniques and activation functions makes
it an ideal candidate for diagnostic algorithms. The authors of [40] propose an
algorithm that uses multiple radial basis function (RBF) neural networks to not only
detect faults, but also diagnose them. The first RBF is trained on nominal system data in
order to determine the mechanics of the system. If sufficient data is provided and the
optimal weight vector found, the RBF then becomes a state observer and residuals can be
calculated based on the incoming sensor data. These residuals are compared to an
existing threshold, and if found to be outside the bounds of error, then the observer
indicates that the system is in a failure mode state. After this step fault detection has been
performed and another RBF must be used to perform the fault classification and
diagnosis. The second RBF uses online tuning methods to diagnose the current failure
mode of the system. The RBF is initialized with its output weights set to zero in order to
force the initial state to be a “no failure” case. As the second RBF is trained using the
online data, its failure feature is compared with that of well-understood failures to
diagnose the system’s fault. The neural network was tested with simulation data of a
linear motor in order to demonstrate its feasibility. The authors leave validation of their
results as future work, when the algorithm is to be tested on a real robotic system.
Among the leading root causes of failure in an industrial plant are open- and
short-circuit faults in the electrical circuit of the motor and electrical-distribution system.
These failures must be continually monitored to guarantee reliability. Reference [41]
identifies a monitoring technique to find stator-winding turn faults using sensorless
methods. From simple current and voltage measurements, the faults can be detected by
identifying modes of zero-sequence voltage and negative-sequence current which can be
related to the turn-faults and high-resistance connections. The authors used the dynamic
model of the motor, which had been derived in references given by the paper.
Experiments were performed on a 4P 380-V 10-hp induction motor in order to
demonstrate feasibility of the algorithm.
The stator-winding turn faults and high
resistance electrical connection faults were able to be detected and diagnosed using the
proposed method. The results promise added benefits and flexibility to maintenance
schedules in industrial plants.
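The zero- and negative-sequence quantities mentioned above come from the standard symmetrical-components transform of three phase phasors. The sketch below shows only that transform; it is not the detection algorithm of [41], and the phasor values are hypothetical.

```python
import cmath
import math

# 120-degree rotation operator used in the symmetrical-components transform.
A = cmath.exp(2j * math.pi / 3)


def sequence_components(pa, pb, pc):
    """Return (zero, positive, negative) sequence phasors for phases a, b, c."""
    zero = (pa + pb + pc) / 3
    positive = (pa + A * pb + A * A * pc) / 3
    negative = (pa + A * A * pb + A * pc) / 3
    return zero, positive, negative


# Balanced a-b-c set: phase b lags a by 120 degrees, phase c leads by 120.
balanced = (1.0, A ** 2, A)
# Hypothetical unbalance: phase a magnitude raised by 20 percent.
unbalanced = (1.2, A ** 2, A)

for label, phasors in (("balanced", balanced), ("unbalanced", unbalanced)):
    zero, pos, neg = sequence_components(*phasors)
    print(label, round(abs(zero), 4), round(abs(pos), 4), round(abs(neg), 4))
```

A balanced set has no zero- or negative-sequence content, so a nonzero value in either component is a natural indicator of asymmetry in the machine or its connections.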
2.7 Algorithm Development - Prognostics
The ability to predict faults and failures in a machine can yield great benefits for both
manufacturers and users.
Prognostics is the field of study that attempts to find solutions
to the very difficult problem of predicting the future states of systems. There are many
more challenges in predicting failures than in simply identifying the current state of the
machinery.
In addition, once a failure has been detected or predicted, prognostic
algorithms must also find the propagation of the fault through the rest of the system.
Similar to diagnostics, the ability to predict the future state of a system is being widely
researched in fields other than engineering and health analysis. Financial researchers
have long attempted to forecast the stock market and provide investors with
insight into how the market fluctuates [42]. Meteorologists use artificial intelligence
along with advanced radars and sensors to predict storm paths and the formation of
natural disasters such as tornados and hurricanes [43]. Even with years of research,
advances in sensor technologies, and developments in the mathematical models of such
systems, prediction still is based on a probability where multiple scenarios must be taken
into account to ensure the highest reliability.
Prognostics can be broken into three categories: experience-based, evolutionary or
trending models, and physical model-based.
Experience-based is the most general of the three in that the algorithm will be
applicable to almost every machine system. This class of algorithms usually relies on
expert analysis of engineers who have worked extensively with the system. With expert
domain knowledge, a maintenance schedule can be developed and engineers can make
decisions with assistance from statistical measures and probability functions.
The evolutionary model requires enough historical data to develop an accurate
mathematical representation of the system. When the model is implemented into the
health analysis framework, the future values are predicted based on the operating
conditions and previous sensor data inputs. From these values, the model can then
predict future outputs and when faults will occur. Since these models are based strictly
on the input data, synthetic inputs can be built to simulate different operating conditions
that the system may encounter during its operational lifetime. The model is built from
historical data which makes it difficult for it to predict what will happen during abnormal
conditions. Therefore, any simulations run outside normal operating conditions should be
taken as more of an uncertain advisory than an actual prediction of what will happen.
The more historical data that are available to create the mathematical model, the greater
its accuracy and robustness will normally be when deployed.
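A trending model of this kind can be sketched with an ordinary least-squares line fitted to a health indicator and extrapolated to a failure threshold. The indicator values, the threshold, and the assumption of linear degradation are all hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct time points")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def time_to_threshold(times, values, threshold):
    """Time remaining after the last sample until the fitted trend reaches
    the threshold, or None if the trend is not increasing."""
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - times[-1])


# Hypothetical degradation indicator sampled at four times.
print(time_to_threshold([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0], threshold=10.0))
```

The extrapolation is only as good as the assumption that the historical trend continues, which is why predictions outside the conditions seen in the data should be treated as advisory.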
The final method, building physical models, is the most costly yet most accurate
approach for prognostics. Physical modeling requires a dynamic model that extracts
parameters from the system in order to predict the future state of the system. Once a
physical model has been made, different prediction technologies such as autoregressive
moving-average techniques or Kalman filters can be applied to predict future states of the
system. Physics-based models require knowledge of both past and current
conditions in order to create a dynamic model that can be applied at any point during the
lifetime of a component. One benefit of physical models is their ability to predict the
remaining useful life (RUL) of a component without any knowledge of faults that have
occurred in the system, though such information can increase the overall accuracy of the
prognostics system. Physical models require a thorough engineering knowledge of the
system to find quantitative measures of material properties and physical parameters
which represent the health of a system. These measures are then predicted based on the
current operating conditions and are accompanied by a probabilistic model which
provides an uncertainty factor [44]. Figure 7 shows the three prognostic algorithms along
with their scope of work, cost, and accuracy [2].
Figure 7 - Approaches for prognosis.
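To illustrate the prediction step of the physical-model approach, the sketch below applies a scalar Kalman filter to a health parameter. The degradation model x[k+1] = x[k] + rate, the noise variances `q` and `r`, and the measurements are assumptions made for illustration; a real prognostic system would derive them from the physics of the component.

```python
def kalman_track(measurements, rate, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for the assumed model x[k+1] = x[k] + rate.

    q is the process noise variance, r the measurement noise variance.
    Returns the filtered estimates and the final error variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: propagate the state and grow the uncertainty.
        x = x + rate
        p = p + q
        # Update: blend the prediction with the new measurement.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates, p


def predict_ahead(x, rate, steps):
    """Project the latest estimate forward under the same assumed model."""
    return x + rate * steps


# Hypothetical noisy measurements of a slowly growing damage parameter.
estimates, variance = kalman_track(
    [1.1, 1.9, 3.2, 3.9], rate=1.0, q=0.01, r=0.25
)
print([round(e, 3) for e in estimates], round(variance, 4))
print(round(predict_ahead(estimates[-1], rate=1.0, steps=5), 3))
```

The final error variance gives a simple measure of the uncertainty that accompanies the prediction, in the spirit of the probabilistic model mentioned in the text.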
The following table shows recent research in the literature and describes several
prognostic approaches along with their applications.
Table 5 - Prognostic algorithms from the literature.
Authors and Paper Title | Area of Research
F. Peysson, M. Ouladsine, R. Outbib, J. B. Leger, O. Myx, C. Allemand, “A Generic Prognostic Methodology Using Damage Trajectory Models,” 2009 [44] | Presents a prognostic framework, then decomposes a system into three levels: environment, mission, and process. Decision and data fusion between the three levels is used to create predictions.
Z. Sun, J. Wang, D. Howe, G. Jewell, “Analytical Prediction of the Short-Circuit Current in Fault-Tolerant Permanent-Magnet Machines,” 2008 [45] | Describes an analytical technique to predict short-circuit current in a fault-tolerant permanent-magnet machine under partial-turn short-circuit fault conditions.
Y. Zhang, G. W. Gantt, M. J. Rychlinski, R. M. Edwards, J. J. Correia, C. E. Wolf, “Connected Vehicle Diagnostics and Prognostics, Concept, and Initial Practice,” 2009 [46] | Presents a complete end-to-end framework of diagnostics and prognostics of General Motors vehicles. Presents initial results of the implemented framework.
M. Baybutt, C. Minnella, A. E. Ginart, P. W. Kalgren, M. J. Roemer, “Improving Digital System Diagnostics Through Prognostics and Health Management (PHM) Technology,” 2009 [47] | Integrates prognostics and diagnostics from engineering disciplines to provide minimally invasive onboard monitoring of digital systems.
P. Lall, M. N. Islam, M. K. Rahim, J. C. Suhling, “Prognostics and Health Management of Electronic Packaging,” 2006 [48] | Investigates methods to determine material state in complex systems and subsystems to determine RUL. Specifically, electronic packaging is targeted as a candidate for such methods.
S. K. Yang, “A Condition-Based Failure-Prediction and Processing-Scheme for Preventive Maintenance,” 2003 [49] | Uses an application-specific integrated circuit (ASIC) to perform preventive maintenance using Petri nets and Kalman filter prediction. The application of the ASIC is a thermal plant.
A. H. Al-Badi, S. M. Ghania, E. F. El-Saadany, “Prediction of Metallic Conductor Voltage Owing to Electromagnetic Coupling Using Neuro Fuzzy Modeling,” 2009 [50] | Presents a fuzzy algorithm that can predict the level of a metallic conductor voltage. Provides simulation results and validation for three scenarios.
The authors of [44] present an overview of the prognostic approaches described in
the previous section.
They utilize these different technologies in the design of a
prognostics system for a ship.
They extend the technologies by applying not only
operating conditions and sensor readings, but also the environmental conditions under
which the system is placed during its lifetime.
The paper creates a formal method for modeling
a complex system based on the mission (operating condition), environment, and process.
The process is decomposed into resources, where a resource is a piece of equipment or
a set of equipment. The mission is defined as the use of the system during a time period.
The framework considers the start and end dates of the mission as well as the set of places where the
task is performed. The environment is the area where the system operates. It can be
characterized by a set of environmental variables that include air temperature, air
humidity, and wind force. The environment variables are then fuzzified, which allows
the impact of an environment on the system to be defined. A rule base can then be
defined in order to perform fault diagnosis and prognosis.
The fusion of all three
elements, mission, environment, and process, provides a damage trajectory that predicts
the degradation of resources, subsystems, and overall system. A simulation was created
where a ship was traveling on a tour of Africa. Different missions and processes were
created to test the degradation of a ship during the travel. Initial results showed that the
framework was in fact a feasible method for the predictions of degradation of a complex
system.
Fault-tolerant permanent-magnet machines are showing promise in aerospace and
automotive sectors. Fault models have been developed for such machines, but in order to
predict failures only lengthy processes have been developed thus far. Reference [45]
presents an analytical approach that quantifies various parameters of the machines.
These parameters are then used to identify worst-case short-circuit scenarios in the design
stage and formulate remedial actions.
The derivation of the short-circuit current is
provided as well as a validation by finite element analysis. Experimental validation was
also conducted by seeding various failure modes into the machines and checking whether the analytic
model correctly identified the short-circuit current. The results showed promise for the
short-circuit current to be a viable method of feature-extraction for fault detection and
prediction. In particular, a Kalman filter was recommended to extract the fundamental
components of the feature and predict future faults.
The authors of [46] present a methodology for diagnostics and prognostics for
vehicles, specifically those manufactured by General Motors (GM).
Three key
challenges are faced by vehicle manufacturers: unexpected new faults, infrequent and
intermittent faults, and prediction of system RUL. Many vehicle manufacturers develop
maintenance schedules for consumers, but many times parts are replaced before their
operational life is actually completed. To compensate for scheduled maintenance, a
concept called Connected Vehicle Diagnostics and Prognostics (CVDP) has been
developed where fault data is stored in onboard electronics and downloaded by the
manufacturer during maintenance services.
The fault data is then analyzed to
determine root causes for the intermittent faults of the vehicles. If-then rules are applied
to the data of a battery management system in order to detect any failure modes. The
conditions of the rules are currently specified by domain experts, but future work will
allow for adaptive thresholds to be computed when sufficient data is acquired.
A
weighting of the parameters that caused the failure is then computed based on the number
of if-then rules violated. These weights allow engineers to determine the root cause of
intermittent failures which would previously not have been detected.
Preventive
maintenance can then be performed when the parameters in other cars of the same model
are seen to be degrading. The system has been deployed in a GM manufacturing plant.
Digital systems are now present in everyday life for most consumers. Since
manufacturing techniques are not fault-proof, many times systems fail before their
lifetime. In mission critical situations, especially in military or manufacturing sectors,
these faults can produce catastrophic events. Therefore, the authors of [47] present a
technique for the detection and prediction of faults in digital electronic systems. The
focus of the paper is on the degradation of MOSFET devices and four particular failure
modes: thermal cycling, hot carrier effects, time-dependent dielectric breakdown
(TDDB), and electromigration. The system used to test the PHM methods is a MPC7447
and faults are seeded to accelerate the degradation of the processor. Aggregate power of
the processor is tracked as the main feature of degradation in the processor. Multiple
histograms are calculated over time and compared to analyze the feature and find
different failure modes for the processor. Based on the statistical feature vector, a
percentage of life consumed is calculated based on the amount of time the processor is
operating at a specific temperature. From this percentage, RUL is calculated from a life
consumption model and fault-to-failure progression data.
The authors of [48] present a novel method of prognosis based on the damage
caused by prior stress histories of electronic packaging. The paper states that the U.S. Air
Force throws away 1000 components to remove a single unknown one that is predicted to
be in a failed state based on a theoretical model. If analysis of the post stress conditions
of such components could be performed, the cost impact of prognostic methodologies
could be immense as wasted life is recovered without increasing risk. Components were
tested as simulated thermal cycles were applied.
From this data, a mathematical
relationship was developed between phase growth and time to failure. Correlations were
found between the rate of change of the phase growth parameter and existing macro
indicators of damage. It is shown that RUL can be found based on phase growth rate and
interfacial shear stress of the chip.
State estimation is becoming a leading technology for the prognosis of complex
machine systems. Kalman filters have become a particularly appealing solution because they
contain an error parameter which provides a confidence interval of the prediction
through time.
Reference [49] incorporates such methods with Petri nets to find and
predict failures in a thermal plant. The Petri nets are a graphical representation of
relationships between conditions and events and allow for the root causes of failures to
be found and preventive maintenance to be performed only on those components which
are failing. Kalman filters are then applied to the current state of the system in order to
predict the following state. N-Step state predictions can be performed as well, but the
confidence of each step decreases as the error in the covariance matrix of the filter
increases. These methods were implemented on an application specific integrated circuit
and used in a thermal power plant to validate the framework. Initial results of the
proposed scheme were seen to be very promising.
The authors of [50] discuss how interference can be transferred from one circuit conductor to
another without any physically connected components. A fuzzy
model was conceived as a method to predict the interference caused by overhead
transmission lines. The feature vector was calculated using linear correlation analysis,
nonparametric correlation analysis, and partial correlation analysis. If-then rules were
applied using training data obtained during the project. Fault current, soil resistivity,
separation distance, and mitigation systems were the four fuzzified inputs and total
pipeline maximum voltage was the defuzzified output. The membership functions used in the
fuzzy model are found in the paper and the effect of interference based on nearby
metallic structures was analyzed. Excellent agreement between test and validation data
was obtained for three different scenarios.
2.8 Reliability Centered Maintenance
RCM is defined as an analytical process used to determine appropriate failure
management strategies to ensure safe and cost-effective operations of a physical asset in a
specific operating environment. It relies heavily on prior knowledge of the system and
subsystems under evaluation. It was developed after it was found that most systems were
being replaced before the end of their active useful life. It compares the requirements of the
component from a user perspective and the design reliability of the component. When
employed, it is used in conjunction with the FMECA as references to the CBM and PHM
portions of the health analysis framework to guarantee that the following seven questions
are answered during a failure [2, 51]:
1. What is the item supposed to do and what are its associated performance standards?
2. In what ways can it fail to provide the required functions?
3. What are the events that cause each failure?
4. What happens when each failure occurs?
5. In what way does each failure matter?
6. What systematic task can be performed proactively to prevent, or to diminish to a
satisfactory degree, the consequences of the failure?
7. What must be done if a suitable preventive task cannot be found?
2.9 System Identification Techniques
System identification is the method in which mathematical models of dynamical systems
are built based on observed data of the system. These methods can save the cost and time
of having an engineer develop physical models of a system. The methods usually require
a large, well-annotated database of historical system data in order to build a robust
mathematical model. There are three entities involved in creating these mathematical
models [34]:

• A data set
• A set of candidate models
• A rule by which candidate models can be assessed
Figure 8 shows the general system identification loop.
Figure 8 - The system identification loop.
2.9.1 Autoregressive Models
The most basic of system identification techniques is a linear difference equation between
the input and output of a system. While there are continuous time models in system
identification, discrete time models are used most often in practice. These difference
equations are known as autoregressive models and are written as:
𝑦(𝑡) + 𝑎1 𝑦(𝑡 − 1) + … + 𝑎𝑛 𝑦(𝑡 − 𝑛) = 𝑏1 𝑢(𝑡 − 1) + … + 𝑏𝑚 𝑢(𝑡 − 𝑚)
(2.3)
This notation may be altered to solve for the next output value given the previous
observations:
𝑦(𝑡) = −𝑎1 𝑦(𝑡 − 1) − ⋯ − 𝑎𝑛 𝑦(𝑡 − 𝑛) + 𝑏1 𝑢(𝑡 − 1) + ⋯ + 𝑏𝑚 𝑢(𝑡 − 𝑚)
(2.4)
To account for measurement process noise, a zero-mean white noise distribution can be
estimated using another coefficient, which estimates error based on a moving average:
$$y(t) = -a_1 y(t-1) - \dots - a_n y(t-n) + b_1 u(t-1) + \dots + b_m u(t-m) + e(t) + c_1 e(t-1) + \dots + c_{n_c} e(t-n_c) \tag{2.5}$$
To correctly model a system mathematically, the coefficients of Eq. 2.3 must be calculated.
There are various methods that can calculate the coefficients based on recorded inputs
and outputs over a time interval. Two of the most popular are the Levinson-Durbin
recursive algorithm and least squares method [34].
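The least squares route can be illustrated with a minimal sketch. It assumes noise-free data and a first-order model; the function name `fit_arx` and the synthetic signals are hypothetical and are not taken from [34]. Each row of the regression matrix holds the negated past outputs and the past inputs of Eq. 2.4, so the solution vector contains the a and b coefficients directly.

```python
import numpy as np

def fit_arx(y, u, n, m):
    """Least-squares estimate of a_1..a_n and b_1..b_m in Eq. 2.4 (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(n, m)                      # first sample with a full history
    rows = []
    for t in range(start, len(y)):
        past_y = [-y[t - i] for i in range(1, n + 1)]   # terms multiplying a_i
        past_u = [u[t - j] for j in range(1, m + 1)]    # terms multiplying b_j
        rows.append(past_y + past_u)
    phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(phi, y[start:], rcond=None)
    return theta[:n], theta[n:]

# Synthetic system (assumed for the example): y(t) = 0.5 y(t-1) + 1.0 u(t-1),
# i.e. a_1 = -0.5 and b_1 = 1.0 in the sign convention of Eq. 2.3.
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + 1.0 * u[t - 1]

a_hat, b_hat = fit_arx(y, u, n=1, m=1)
```

With measurement noise the estimates would only approximate the true coefficients, and the model orders n and m would normally be chosen by comparing candidate models.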
2.9.2 Kalman Filters
State-space models are developed to form a relationship between the input, noise, and
output signals using an auxiliary state vector. These models incorporate physical
mechanisms of the system. One type of state-space model is the Kalman filter, which
was developed in the 1960s [34]. The discrete Kalman filter is defined in two steps, a
time update and a measurement update. The time update equations are defined by:
𝑥𝑘+1 = 𝐴𝑘 𝑥𝑘 + 𝐵𝑘 𝑢𝑘 + 𝐺𝑘 𝑤𝑘
(2.6)
𝑃𝑘− = 𝐴𝑘 𝑃𝑘−1 𝐴𝑇 + 𝑄
(2.7)
where 𝐴𝑘 and 𝐵𝑘 are matrices of parameters that correspond to unknown values of
physical coefficients, material constants, etc., 𝐺𝑘 is a matrix of parameters describing the
process noise in the system, 𝑥𝑘+1 is the prediction of the state vector at the next time step, 𝑥𝑘 is the
internal state vector, 𝑤𝑘 is the process noise of the system, 𝑢𝑘 is the control input to the
system, 𝑃𝑘− is the a priori estimate error covariance, 𝑃𝑘 is the a posteriori estimate error
covariance, and 𝑄 is the process noise covariance. The measurement update equations are defined by:
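$$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1} \tag{2.8}$$
$$\hat{x}_k = \hat{x}_k^- + K_k (z_k - H \hat{x}_k^-) \tag{2.9}$$
$$P_k = (I - K_k H) P_k^- \tag{2.10}$$
where $H$ is the measurement matrix, $R$ is the measurement noise covariance, and $z_k$ is the measurement.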
The first step in the measurement update is to compute the Kalman gain, 𝐾𝑘 .
Then the process or sensor is actually measured and placed into 𝑧𝑘 . This is used to
generate an a posteriori state estimate by incorporating the new measurement data. The final
step is to obtain an a posteriori error covariance estimate as in Eq. 2.10. The goal of the
Kalman filter is to minimize the a posteriori error covariance. The equations are recursive,
which makes them appealing for practical applications. In the field of prognosis, the
Kalman filter can perform multiple time updates without a measurement update to predict
health variables in the future [2].
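The two update steps, and the use of repeated time updates for prediction, can be sketched as follows. This is a minimal illustration assuming time-invariant matrices and a hypothetical two-state (value and rate of change) model; the function names and numerical values are chosen for the example only and do not come from [2] or [34].

```python
import numpy as np

def time_update(x, P, A, B, u, Q):
    """Project the state and error covariance forward one step (Eqs. 2.6 and 2.7)."""
    x_prior = A @ x + B @ u
    P_prior = A @ P @ A.T + Q
    return x_prior, P_prior

def measurement_update(x_prior, P_prior, z, H, R):
    """Correct the projected state with a new measurement (Eqs. 2.8 to 2.10)."""
    K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)
    x_post = x_prior + K @ (z - H @ x_prior)
    P_post = (np.eye(len(x_prior)) - K @ H) @ P_prior
    return x_post, P_post

def predict_n_steps(x, P, A, B, u, Q, n):
    """Run n time updates with no measurement, recording trace(P) after each step."""
    traces = []
    for _ in range(n):
        x, P = time_update(x, P, A, B, u, Q)
        traces.append(float(np.trace(P)))
    return x, P, traces

# Hypothetical model: state = [value, rate], only the value is measured.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.zeros((2, 1))
u = np.zeros(1)
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])

x0, P0 = np.zeros(2), np.eye(2)
x_prior, P_prior = time_update(x0, P0, A, B, u, Q)
x_post, P_post = measurement_update(x_prior, P_prior, np.array([1.0]), H, R)
x_future, P_future, traces = predict_n_steps(x_post, P_post, A, B, u, Q, n=5)
```

In this example the trace of the covariance shrinks after the measurement update and grows with every unmeasured step, which is the loss of confidence over the prediction horizon described above.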
Chapter 3: APPROACH
This thesis attempts to build a framework for an intelligent valve module for ISHM. This
framework is based on the health analysis framework discussed in the previous section.
This section will focus on the specific approach taken to fulfill the objectives of each
segment in the framework. Most of the work presented is for the general support of
valves in a mission critical situation, but some is specific to the NASA-SSC
test stand environment. The particular valve that will be analyzed is the large linear
actuator valve (LLAV) which is responsible for the distribution of cryogenic fluids to the
test stand and test articles. Figure 9 shows the regions of interest of the LLAV.
Figure 9 - LLAV with regions of interest labeled.
3.1 Failure Modes
Valves are a critical component for the day to day operations at NASA-SSC. The valves
must be precisely machined to meet the strict specifications set forth by the test stand
operators.
These specifications raise the price of the valve, which can be tens of
thousands of dollars.
Though manufacturing of the valves is meticulous, physical
degradation still occurs because of the strenuous environment where the valves operate.
In particular, the LLAV must transport cryogenic and non-cryogenic fluids at high
pressures to test articles on the test stands at NASA-SSC. Therefore, a FMECA must be
performed in order to classify and rank the important failure modes for the LLAV. The
analysis was performed in the early stages of the project in order to guarantee that the
algorithms developed could detect the failure modes of the valves. Since the valves have
already been developed and the sensors have been chosen, the goal of this FMECA will
be to identify the critical faults and attempt to find solutions with the current capabilities at
NASA-SSC.
The LLAV FMECA was performed in conjunction with Scott Jensen, a NASA-SSC test
operations engineer and domain expert in the valves on the test stands. Scott
was able to provide valuable insight into the operational characteristics of the
valves in the E-Complex test stand. These characteristics include the role the LLAVs
fulfill, descriptions of the different components in a LLAV, the signs of degradation in
the LLAV, and the common failure modes that have been identified by the NASA-SSC
test operations engineers. The information was compiled and risk priority numbers were
calculated to prioritize the failure modes identified during the study. The algorithms
and framework were then designed around the specific task of collecting data
that could identify and eventually predict these failure modes. Table 6 and Figure 10
show the results of the FMECA:
Table 6 - Failure modes and effects for LLAV.
Function | Failure Mode | Effects
Controller for cryogenic fluid tank | Seat wear causes leaking fluid | Fluid can enter system during a test causing catastrophic failure
Monitor the feedback of the valve and downstream pressure | Faulty pressure sensor falsely indicates valve failure | Incorrect valve maintenance may be performed
Packing at the top of the valve prevents leaks and allows for balanced pressure | When frozen, the packing can crack and break apart, degrading the performance of the valve | Valve may not function properly or be able to maintain needed pressure for test
Actuator must transition from fully open to fully closed in a consistent amount of time. | If the valve does not open or close at consistent timings, valve maintenance must be performed | Emergency shutdown procedures may not be performed properly.
The controller of the valve sends a valve to full close. | If the PID controller is unstable or telling the valve to get to a value it cannot reach, the actuator may “bounce” on the seat causing degradation in the soft metal. | Seat wear (described above) can occur more quickly resulting in delays and increased maintenance costs.
The valve feedback must respond to the control signal in an appropriate time for effective test operations. | Excessive “deadtimes” create poor timing in test operations and can cause pressure or flow mixture errors. | If the mixture is not precise in certain test articles, undesired results can occur.
[Figure 10: plot of Risk Priority Number (y-axis, 0 to 2200) against Criticality (x-axis, 0 to 11). Labeled points: Seat Wear, Frost Point, Sensor Failures, Extended Deadtimes, Transition Times, Seat Bouncing.]
Figure 10 - Prioritization of LLAV failure modes (see Equations 2.1 and 2.2 for y-axis calculation).
3.2 Intelligent Valve Framework
Once the failure modes were found, the framework could be constructed based on the
requirements set forth by NASA. Figure 11 shows the system level flow chart of the
framework. Figure 12 shows the detailed health analysis framework for the intelligent
valve.
[Figure 11: flowchart. Text recovered from the diagram: NASA Data Acquisition System (DDE Server); DDE Client; Measurement Data; WonderWare; .NET Plugin; G2 Diagnostic Environment; Health Data; Virtual Reality Environment; Health Data.]
Figure 11 - System level flowchart of the Intelligent Valve framework.
Figure 12 - Health analysis framework for the Intelligent Valve.
3.2.1 Data Acquisition
The E-Complex operations center utilizes both the User Datagram Protocol
(UDP) and Dynamic Data Exchange (DDE) protocol to transmit data between the
networked computers in their test stands. Under the advice of NASA test operators, it
was decided that the best method of acquiring data into the plug-in was via the DDE
pipeline. The selection of DDE over UDP provided several benefits, as well as certain
drawbacks that must be accounted for in the development process. Some of the benefits
were:

• The data could be acquired by simple strings rather than parsing the UDP packet’s
binary file.
• The data would already be formatted into engineering units based on the
calibration sheets used in the UDP format.
• WonderWare and LabVIEW, both used for test operations, have built-in support for
Network DDE (NDDE).
• Since the developers will not be at Stennis for the tests, application setup is easier
with DDE because of the prior knowledge the test engineers possess.
• The framework can request just the specific data it requires for its algorithms,
reducing its network footprint.
The drawbacks that must be overcome are:

• The maximum DDE transfer rates are much lower than UDP.
• The DDE data packet does not include an accurate time stamp for data annotation.
• While WonderWare still includes DDE with its applications, the developers of the
protocol, Microsoft, have not updated or supported it for over a decade.
The most crucial drawback in the selection of the DDE protocol is the absent time
stamp in the data packet.
Fortunately, NASA has a link to the Inter-Range
Instrumentation Group (IRIG) system, which provides highly accurate timestamps to the
networked computers in the test stands. In the software, this IRIG timer is used to
timestamp all the data as soon as it is acquired from the system. While there are still
some delays from the data acquisition software, this presents an accuracy that is usable to
compare data for algorithm development purposes. If the framework ever became a
“mission-critical” component the accuracy issue would have to be addressed more
strictly.
3.2.2 Preprocessing
To validate incoming data, threshold checks are performed at the acquisition of each data
point. The thermocouple data are subjected to the following test:
𝑇𝑚𝑖𝑛 ≤ 𝑇 ≤ 𝑇𝑚𝑎𝑥
where Tmin and Tmax are the minimum and maximum temperatures of the thermocouple
type. The following table gives the type and temperature range for some commonly used
thermocouples:
Table 7 - Thermocouple types and ranges.
Thermocouple Type | Minimum Temperature (°C) | Maximum Temperature (°C)
J | 0 | 750
K | -200 | 1250
E | -200 | 900
T | -250 | 350
The valves are also subjected to the threshold test:
−5 ≤ 𝑉 ≤ 100
where V is the feedback or control signal of the valve. While it does not seem intuitive
that the valve state can be below zero, the operators at NASA use this method to
guarantee that a tight seal is being created between the actuator and the soft metal at the
bottom of the valve.
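The two threshold tests can be sketched in a few lines. The ranges are taken from Table 7 and from the valve inequality above; the function names are hypothetical, and the valve signal is assumed here to be expressed in percent open.

```python
# Thermocouple ranges in degrees Celsius, copied from Table 7.
THERMOCOUPLE_RANGES_C = {
    "J": (0.0, 750.0),
    "K": (-200.0, 1250.0),
    "E": (-200.0, 900.0),
    "T": (-250.0, 350.0),
}

# Valve feedback/control range from the threshold test (assumed to be percent open).
VALVE_RANGE = (-5.0, 100.0)

def thermocouple_in_range(temp_c, tc_type):
    """Return True when the reading satisfies Tmin <= T <= Tmax for its type."""
    t_min, t_max = THERMOCOUPLE_RANGES_C[tc_type]
    return t_min <= temp_c <= t_max

def valve_signal_in_range(v):
    """Return True when the valve signal satisfies -5 <= V <= 100."""
    return VALVE_RANGE[0] <= v <= VALVE_RANGE[1]
```

A reading of -196 °C, for example, passes for a type K thermocouple but fails for type J, whose range starts at 0 °C.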
The data acquisition systems used in the E-Complex test stand perform
preprocessing techniques themselves in an attempt to deliver noiseless signals to the test
stand computers. Therefore, there is no need for advanced preprocessing techniques in
the intelligent valve module. Moreover, this allows the module to classify any noise
detected in the signal as an anomaly instead of process and measurement noise.
3.2.3 Failure Mode Detection and Diagnosis
The failure modes were investigated based on the FMECA with priority given to those
with a high RPN.
3.2.4 Valve Operational Statistics
Seat wear is one of the most severe and costly failure modes that can occur in the LLAV.
Not only is it expensive to replace the valve seat and insert, but it also forces excessive
delays in projects. It is very difficult to obtain a direct quantitative measurement of the
seal without the use of additional sensors. There are studies into detecting seat wear and
recession, but all use external instrumentation such as x-ray machines that are not
available for this research. Even though a direct measurement is not possible, combining
the valve's operational statistics and test operator's expert knowledge can provide
information and advisories for maintenance teams.
After consulting with the test
operations team at NASA-SSC, seven statistics were selected for observation. They are
as follows:

• Transitions - The number of times the valve has traveled from completely open
to completely closed with non-cryogenic fluid flow.
• Cryogenic transitions - The number of times the valve has traveled from
completely open to completely closed with cryogenic fluid flow.
• Distance traveled - The linear distance the valve has traveled in inches.
• Last transition time - The time it took for a valve to go from completely open to
completely closed.
• Average transition time - The average of the last ten transitions from completely
open to completely closed.
• Direction changes - The number of times the valve has changed motion from
either opening to closing or closing to opening.
• Number of closings - The number of times the valve has come to a completely
closed state.
These statistics can be used to measure how the valve is performing under certain
operating conditions.
To detect the events, an algorithm, seen in Figure 13, was
developed based on the changing state of the valve and definitions presented previously.
Figure 13 - Valve statistics algorithm.
In the specific application of detecting seat recession, the statistics of relevance
are transitions, cryogenic transitions, and number of closings. When under cryogenic
conditions the metal packing hardens, reducing the amount of degradation on the seat.
Conversely, non-cryogenic closings create a deeper impact and reduce the operational
life of the seat. As stated in Table 6, seat bouncing can also adversely affect the seat if
not detected. The number of closings can be observed between tests and compared to the
number of closings the controller relayed to the valve during the test. If there is a large
disparity between the two, it is an indication of bouncing either due to a valve fault or
controller instability. In either case, seat wear can be accelerated when there is a constant
changing of force on the seat.
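A rough sketch of how several of these statistics could be accumulated from a stream of valve positions is given below. It is not the algorithm of Figure 13, which is not reproduced in the text; the class name, the percent-open position scale, the stroke length, and the rule that a reversal before closing cancels a pending transition are all assumptions made for illustration. Transition timing is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ValveStats:
    stroke_in: float                 # assumed full stroke length in inches
    open_pct: float = 100.0          # assumed "completely open" position
    closed_pct: float = 0.0          # assumed "completely closed" position
    transitions: int = 0
    cryo_transitions: int = 0
    closings: int = 0
    direction_changes: int = 0
    distance_in: float = 0.0
    _last_pos: Optional[float] = field(default=None, repr=False)
    _last_dir: int = field(default=0, repr=False)
    _from_open: bool = field(default=False, repr=False)

    def update(self, pos: float, cryogenic: bool = False) -> None:
        """Fold one new position sample (percent open) into the statistics."""
        if self._last_pos is not None:
            delta = pos - self._last_pos
            self.distance_in += abs(delta) / 100.0 * self.stroke_in
            direction = (delta > 0) - (delta < 0)     # +1 opening, -1 closing
            if direction != 0:
                if self._last_dir != 0 and direction != self._last_dir:
                    self.direction_changes += 1
                self._last_dir = direction
            if direction > 0:
                self._from_open = False               # reversal cancels a pending transition
            if pos <= self.closed_pct < self._last_pos:
                self.closings += 1
                if self._from_open:
                    if cryogenic:
                        self.cryo_transitions += 1
                    else:
                        self.transitions += 1
                self._from_open = False
        if pos >= self.open_pct:
            self._from_open = True
        self._last_pos = pos

# Two full open-to-closed strokes.
stats = ValveStats(stroke_in=4.0)
for p in [100, 50, 0, 50, 100, 50, 0]:
    stats.update(p)

# One commanded closing followed by two small bounces on the seat.
bouncing = ValveStats(stroke_in=4.0)
for p in [100, 0, 3, 0, 2, 0]:
    bouncing.update(p)
commanded_closings = 1               # hypothetical count relayed by the controller
bounce_suspected = bouncing.closings > commanded_closings
```

The second sequence shows the bounce check described above: the valve registers three closings against a single commanded closing.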
3.2.5 Auto-associative Neural Networks for Sensor Validation
In order to provide a test article with the correct mixture of propellants, pressure and fluid
flow must be kept at very specific rates. This requires accurate sensors that can relay the
current readings back to test operations. The readings during failure modes of the sensors
can be unpredictable and can cause misclassified faults in a valve. For example, if a
downstream pressure sensor has a near zero reading after a valve is opened, it can appear
as though the valve did not open properly. When this happens, weeks or months of
unnecessary valve repairs may be performed instead of the day it takes to replace a
sensor.
There are two main approaches to this type of fault, physical and analytic
redundancy. Physical redundancy requires the use of multiple, similar sensors in the
same spatial location.
Often three sensors are used, and a majority-rules
weighting system determines the actual reading. Analytic redundancy exploits
functional relationships between components in the systems. The functions are normally
isolated into closely related subsystems to reduce their complexity. While physical
redundancy is a more robust solution than analytic redundancy, it is not feasible in all
situations. At the E-Complex test stands there is a limited number of sensors that can be
attached to the data acquisition system for any given test. Also, running additional
connections through the complex test stand is very costly and safety protocols apply
stringent rules to where and how wires can be run.
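As a small illustration of physical redundancy, the sketch below takes the median of three co-located readings as the consensus value and flags any sensor that deviates from it by more than a tolerance. Median voting is one common way to realize a majority rule and is assumed here; the text does not specify the weighting scheme, and the function name, readings, and tolerance are hypothetical.

```python
import statistics

def vote(readings, tolerance):
    """Return the median reading and the indices of sensors that disagree with it."""
    consensus = statistics.median(readings)
    suspects = [i for i, r in enumerate(readings) if abs(r - consensus) > tolerance]
    return consensus, suspects

# Three hypothetical pressure readings; the third sensor has failed low.
consensus, suspects = vote([101.2, 100.8, 0.3], tolerance=5.0)
```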
Analytic redundancy can be applied to a system using either a complex model
comprised of physical properties and equations or a mathematical model that
approximates the functional relationship based on previous data. The physical model
results in a very detailed understanding of the system, but is applicable only to the
current system setup. Artificial neural networks (ANN) have been used extensively in
function approximation and pattern recognition. Specifically, auto-associative neural
networks (AANNs) have been used in sensor validation because of their ability to
perform nonlinear principal component analysis which allows for the extraction of key
features in a high dimensional, nonlinear dataset [52, 53].
Reference [53] presents a
training method for AANN-based sensor validation in which two training runs are performed.
The first training run presents accurate training data to both the input and output in order
to learn the functional relationships between the two. The second training run presents
faulty data to the input, but accurate data to the output. This method allows the AANN to
become "insensitive" to faulty data and extract only the proper features from the dataset.
Figure 14 shows the two training methods.
Linear principal component analysis (PCA) can be beneficial in reducing high
dimensional datasets into their principal components.
To accomplish this task, the
eigenvectors of the covariance matrix are used to project the dataset onto a
lower dimension while retaining as much variance as possible, i.e.,
𝑌𝑃 = 𝑇
(3.3)
where 𝑌 is the sample set, 𝑇 is the transformed data, and 𝑃 is the matrix of eigenvectors of the
covariance matrix. Nonlinear PCA extends the capabilities of linear PCA by using
nonlinear functions instead of eigenvectors. In some cases, this can increase the variance
retained from the dataset and result in less information loss than linear PCA during the
dimensionality reduction. The following equations describe nonlinear PCA:
𝑇𝑖 = 𝐺𝑖 (𝑌)
(3.4)
where 𝑇𝑖 is the transformed data and 𝐺𝑖 (𝑌) is a nonlinear vector function. In order to
restore data in nonlinear PCA, another nonlinear function is needed:
𝑌𝑗′ = 𝐻𝑗 (𝑇)
(3.5)
where 𝑌𝑗′ is the restored data and 𝐻𝑗 (𝑇) is a nonlinear vector function. A difficulty in
nonlinear PCA is the determination of the nonlinear functions 𝐺 and 𝐻. However, it has
been shown in previous work that functions of the following form are capable of fitting
any nonlinear function to arbitrary precision:
$$v_k = \sum_{j=1}^{N_2} w_{jk}\, \sigma\!\left( \sum_{i=1}^{N_1} w_{ij} u_i + \theta_j \right) \tag{3.6}$$
Where 𝑣 is the desired nonlinear function, 𝑤 are weights of the sigmoid function, and
𝜎(𝑥) is a function that approaches 1 as 𝑥 approaches ∞ and 0 as 𝑥 approaches −∞. A
sigmoid satisfies this criterion:
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{3.7}$$
Sigmoids are commonly used as transfer functions in artificial neural networks. In
order to perform the dimensionality reduction, a bottleneck layer is used in the hidden
layer nodes of a multilayer perceptron. This allows for the common backpropagation
training technique to be used for sensor validation in the auto-associative neural network.
Figure 14 - Training method for auto-associative neural networks for sensor validation.
Another benefit of AANNs is their ability to predict the values of faulty sensors in
the output. If utilized in a mission critical situation, this can provide the information
needed to continue a test even when a fault is detected. Also, this data can provide other
fault diagnosis algorithms with accurate data that can narrow down the exact cause of
faults in a system.
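The reconstruction idea behind this kind of sensor validation can be illustrated with linear PCA, which is the linear special case of the bottleneck mapping rather than an AANN. The sketch below projects a reading onto the principal directions (T = YP), maps it back, and inspects the residual per sensor. The three-sensor synthetic data, the function names, and the choice of a single retained component are assumptions for the example.

```python
import numpy as np

def fit_pca(Y, n_components):
    """Return the sample mean and the leading eigenvectors of the covariance matrix."""
    mean = Y.mean(axis=0)
    cov = np.cov(Y - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order]

def reconstruct(y, mean, P):
    """Project onto the principal directions (T = Y P) and map back to sensor space."""
    t = (y - mean) @ P
    return mean + t @ P.T

# Synthetic nominal data: three sensors driven by one latent variable plus small noise.
rng = np.random.default_rng(1)
s = rng.standard_normal((500, 1))
Y = s * np.array([1.0, 2.0, -1.0]) + 0.01 * rng.standard_normal((500, 3))
mean, P = fit_pca(Y, n_components=1)

good = np.array([2.0, 4.0, -2.0])     # consistent with the learned relationship
bad = np.array([2.0, 4.0, 0.5])       # third sensor reads incorrectly
res_good = np.abs(good - reconstruct(good, mean, P))
res_bad = np.abs(bad - reconstruct(bad, mean, P))
suspect = int(np.argmax(res_bad))
```

The consistent reading is reproduced almost exactly, while the inconsistent one leaves its largest residual on the third sensor. An AANN applies the same test with nonlinear mappings in place of the projection matrix.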
3.2.6 Thermal Modeling
While cryogenic fluid can cause less wear on the seal at the bottom of the valve, there is
another packing at the top that allows the valve to offset pressures in order to operate
properly. If this packing freezes there is the potential that it will crack and cause pressure
equalization problems with the valve. This cracking in the packing is one of the reasons
that the stem of the valve is so long. Since the machining of the valves is so precise,
added inches in the stem can increase costs by tens of thousands of dollars. NASA-SSC
performed a series of tests under simulated conditions in August 2006 in
an attempt to establish a formula for valve frost points. From the tests, they discovered
that complex thermodynamic equations were unnecessary to estimate the frost line, but
instead a simple fin model gave accuracy up to 95%, which is sufficient for this
application. The equation estimates the base temperature of the body by tracking the
amount of time cryogenic fluid has been flowing through an open valve. This value can
be projected up the valve based on a thermal fin equation provided by NASA-SSC
engineers [54].
$$T_{tc} = (T_{amb} - T_{fluid})\, e^{-t_{open}/m} + T_{fluid} \tag{3.1}$$
where 𝑇𝑎𝑚𝑏 is the ambient temperature, 𝑇𝑓𝑙𝑢𝑖𝑑 is the boiling temperature of the flowing
cryogen, 𝑡𝑜𝑝𝑒𝑛 is the amount of time the valve has been open, 𝑚 is the amount of time
it takes for the valve to reach its steady state, and 𝑇𝑡𝑐 is the estimated base temperature
of the body.
$$T_{est} = \frac{\cosh\!\big(m_t (L_{valve} - L_{TC})\big)}{\cosh(m_t L_{valve})}\, (T_{tc} - T_{amb}) + T_{amb} \tag{3.2}$$
where 𝐿𝑣𝑎𝑙𝑣𝑒 is the length of the stem of the valve, 𝐿𝑇𝐶 is the distance of the
thermocouple from the base, 𝑚𝑡 is a material constant found experimentally for the
valve, and 𝑇𝑒𝑠𝑡 is the estimated temperature of the thermocouple located at 𝐿𝑇𝐶 . This
formula can be manipulated in order to solve for the frost line of the valve by setting 𝑇𝑒𝑠𝑡
to 32°F and solving for 𝐿𝑇𝐶 , i.e.,
$$L_{TC} = L_{valve} - \frac{1}{m_t} \cosh^{-1}\!\left( \frac{32 - T_{amb}}{T_{tc} - T_{amb}} \cosh(m_t L_{valve}) \right) \tag{3.3}$$
This thermal model will be utilized in order to continually monitor the frost line
of the valve both during tests and when the test stand is idle. The monitoring of the frost
line provides two key benefits to NASA test operations. The first benefit is the ability to
monitor how many times and for how long the seal at the top of the valve has been
exposed to freezing temperatures. Knowledge of this statistic can assist the operator in
diagnosing any anomalies or faults found in the valve data. The second benefit is the
ability to monitor frost lines for future valve production. If a study can present
conclusive evidence that the valves being used in the test stand are much longer than
needed, tens of thousands of dollars can be saved when the existing valves need to be
replaced.
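The thermal model can be sketched numerically as follows. The frost line is obtained by inverting Eq. 3.2 directly for the distance at which the estimated temperature equals 32°F. All numerical values (ambient temperature, time constant, material constant, stem length) are hypothetical placeholders rather than NASA-SSC data, and the sketch assumes the ambient temperature is above freezing.

```python
import math

def base_temperature(t_amb, t_fluid, t_open, m):
    """Eq. 3.1: estimated base temperature after the valve has been open for t_open."""
    return (t_amb - t_fluid) * math.exp(-t_open / m) + t_fluid

def stem_temperature(t_tc, t_amb, m_t, l_valve, l_tc):
    """Eq. 3.2: estimated temperature at distance l_tc from the base."""
    ratio = math.cosh(m_t * (l_valve - l_tc)) / math.cosh(m_t * l_valve)
    return ratio * (t_tc - t_amb) + t_amb

def frost_line(t_tc, t_amb, m_t, l_valve, t_frost=32.0):
    """Distance from the base at which Eq. 3.2 equals t_frost (assumes t_amb > t_frost)."""
    if t_tc >= t_frost:
        return 0.0                    # base is above freezing, no frost on the stem
    arg = (t_frost - t_amb) / (t_tc - t_amb) * math.cosh(m_t * l_valve)
    if arg < 1.0:
        return l_valve                # even the top of the stem is below freezing
    return l_valve - math.acosh(arg) / m_t

# Hypothetical example values, temperatures in degrees Fahrenheit.
t_amb, t_fluid = 70.0, -320.0         # -320 F is roughly the boiling point of liquid nitrogen
m, t_open = 600.0, 1800.0             # seconds
m_t, l_valve = 0.2, 30.0              # per inch, inches

t_tc = base_temperature(t_amb, t_fluid, t_open, m)
l_frost = frost_line(t_tc, t_amb, m_t, l_valve)
t_check = stem_temperature(t_tc, t_amb, m_t, l_valve, l_frost)
```

Substituting the computed frost line back into Eq. 3.2 returns 32°F, which is a quick consistency check on the inversion.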
3.2.7 Adaptive Thresholding
When preparing for a test, control algorithms are set to autonomously operate the valves.
The timings are very specific and the valve's behavior must remain consistent in order to
guarantee proper test firings. There are various faults that can prevent the valve from
operating correctly, but one of the most important details is how the valve reacts to the
control input, independent of the operating conditions. Therefore, simulations of the
valve's output based on the input can be run to estimate valve stroke timings and
behavior. The model used is a bank of autoregressive moving average (ARMA) filters
with an optimization constraint to specify an adaptive threshold. This was first proposed
in [35]. Figure 15 shows the algorithm for the design and choice of ARMA models for
the adaptive thresholding with a description following.
Figure 15 - Adaptive threshold algorithm for designing and choosing ARMA models.
The adaptive threshold is chosen based on two optimization functions:
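$$\underline{\hat{y}}(k) = \min_{\theta \in [\underline{\theta}, \overline{\theta}]} \big( G_u(q, \theta)\, u(k) \big) \tag{3.4}$$
$$\overline{\hat{y}}(k) = \max_{\theta \in [\underline{\theta}, \overline{\theta}]} \big( G_u(q, \theta)\, u(k) \big) \tag{3.5}$$
where $\underline{\hat{y}}(k)$ and $\overline{\hat{y}}(k)$ are the minimum and maximum value of the simulated ARMA
models, respectively, $G_u(q, \theta)$ is the transfer function of the ARMA model with
coefficients $\theta$ and order $q$, and $u(k)$ is the control signal input at time $k$.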
$$Fit = 100 \left( 1 - \frac{\mathrm{norm}(y_h - y)}{\mathrm{norm}(y - \mathrm{mean}(y))} \right) \tag{3.6}$$
where 𝐹𝑖𝑡 is the percentage of the output variation that is explained by the model, 𝑦ℎ is
the estimated output, and 𝑦 is the measured output [17].
The historic data is assumed to be all nominal data in order to design a set of
models that can represent the entire set. During the training process, a fit equation is
calculated in order to guarantee that the models are not too accurate and not too lax.
Therefore, a threshold is set that the fit equation should be above 70%. This threshold
was found experimentally and may need to be refined based on the application. Once the
models have been selected, they are run through testing data that is from a similar dataset.
If any faults are found in this dataset, it can be concluded that there are not enough
models to completely describe the data properly.
Models are repeatedly created
with varying numbers of coefficients in order to create a complete representation of the
dataset. Once a sufficient number of models have been created, the control algorithm can
be run through the simulation and compared with the actual feedback from the valve
during the test. The adaptive threshold can mark faults during the test which can alert
test operations to anomalous behavior. The simulation of the control algorithm and the
feedback can be seen in Figure 16.
Figure 16 - Adaptive threshold algorithm simulation on real-time data.
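The envelope and fit calculations can be sketched as follows. The two first-order models, their coefficients, and the step command are hypothetical stand-ins for the bank of trained ARMA models; the sketch only shows how the minimum and maximum simulated outputs bound the feedback and how the fit percentage is evaluated.

```python
import numpy as np

def simulate(a, b, u):
    """Simulate y(k) = -sum(a_i y(k-i)) + sum(b_j u(k-j)) from zero initial conditions."""
    y = np.zeros(len(u))
    for k in range(len(u)):
        acc = 0.0
        for i, a_i in enumerate(a, start=1):
            if k - i >= 0:
                acc -= a_i * y[k - i]
        for j, b_j in enumerate(b, start=1):
            if k - j >= 0:
                acc += b_j * u[k - j]
        y[k] = acc
    return y

def envelope(models, u):
    """Lower and upper adaptive thresholds: min and max over the bank of models."""
    sims = np.array([simulate(a, b, u) for a, b in models])
    return sims.min(axis=0), sims.max(axis=0)

def flag_faults(feedback, lower, upper):
    """Boolean mask of samples where the feedback leaves the envelope."""
    return (feedback < lower) | (feedback > upper)

def fit_percent(y_hat, y):
    """Percentage of the output variation explained by the model."""
    y_hat, y = np.asarray(y_hat, float), np.asarray(y, float)
    return 100.0 * (1.0 - np.linalg.norm(y_hat - y) / np.linalg.norm(y - y.mean()))

# Hypothetical bank: a slow and a fast first-order model, both with unity steady-state gain.
models = [([-0.8], [0.2]), ([-0.7], [0.3])]
u = np.full(40, 100.0)                       # step command to fully open
lower, upper = envelope(models, u)

nominal = simulate([-0.75], [0.25], u)       # behaves between the two models
stuck = nominal.copy()
stuck[5:] = nominal[5]                       # valve stops moving after sample 5

faults_nominal = flag_faults(nominal, lower, upper)
faults_stuck = flag_faults(stuck, lower, upper)
fit_slow = fit_percent(simulate(*models[0], u), nominal)
```

In this example the nominal response stays inside the envelope, while the stuck valve falls below the lower threshold a few samples after it stops moving.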
3.3 Prognostic Survey
The ultimate goal of the intelligent valve framework is the ability to determine the
remaining useful life of the LLAV. At this time, however, a full prognostics implementation
is outside the scope of this research. Instead, several prognostic techniques
will be investigated based on simple linear predictors as well as a state-space model. The
linear predictors will consist of the autoregressive and autoregressive moving-average
filters. The state-space model implemented will be the Kalman filter. These techniques
will be used in conjunction with a neural network to determine their feasibility for future
development in the Intelligent Valve Framework.
3.4 Diagnostic Process
Creating a software framework that can be expanded in the future requires careful
planning and structuring. Therefore, object-oriented programming (OOP) techniques
were utilized to construct a backend acquisition and configuration protocol. An MS-SQL
database schema was designed to store configuration information throughout tests in order
to create a persistent environment. The schema for the MS-SQL database can be seen in
Figure 17.
Figure 17 - Intelligent Valve database schema.
This database was designed to meet the requirements of third
normal form (3NF). Database normalization enforces guidelines that organize
data efficiently within a database. The database defines the attributes required to
access data from the DDE servers at NASA-SSC.
Valves contain several sensor measurements that must be monitored for the diagnostic
process to work correctly. These values are stored in the ValveDetails table, where the DDE
tags and servers can be specified as well as the length of the valve for the thermal models
described previously. In order to store the operating history of a valve, the ValveStatistics
table contains a column for each of the statistics described earlier. Each valve can contain
several thermocouples that are attached to its stem in order to validate the thermal model. The
Thermocouples table holds a foreign key to the ValveDetails table to correlate
thermocouples with their valves. This table also holds the current position of the
thermocouple on the stem of the valve and the high and low thresholds used to set the
flagged state of the thermocouples. The FluidDetails table holds the information used in
the thermal model for several fluids, including their boiling points. The final table, DDE,
contains the connection strings for the DDE servers in order to access all the sensor data.
Each thermocouple, valve feedback, and valve control is required to have a foreign key to
one of the DDE servers.
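The relationships described above can be sketched with a small in-memory database. This is only an illustration: SQLite stands in for MS-SQL, only three of the five tables are shown, and every column name is hypothetical, since the text names the tables but not their columns.

```python
import sqlite3

# Table names follow the text; all column names are assumptions.
SCHEMA = """
CREATE TABLE DDE (
    DdeId            INTEGER PRIMARY KEY,
    ConnectionString TEXT NOT NULL
);
CREATE TABLE ValveDetails (
    ValveId          INTEGER PRIMARY KEY,
    FeedbackTag      TEXT NOT NULL,
    FeedbackDdeId    INTEGER NOT NULL REFERENCES DDE(DdeId),
    ControlTag       TEXT NOT NULL,
    ControlDdeId     INTEGER NOT NULL REFERENCES DDE(DdeId),
    StemLengthInches REAL NOT NULL
);
CREATE TABLE Thermocouples (
    ThermocoupleId     INTEGER PRIMARY KEY,
    ValveId            INTEGER NOT NULL REFERENCES ValveDetails(ValveId),
    DdeId              INTEGER NOT NULL REFERENCES DDE(DdeId),
    Tag                TEXT NOT NULL,
    StemPositionInches REAL NOT NULL,
    LowThreshold       REAL NOT NULL,
    HighThreshold      REAL NOT NULL
);
"""


def build_database():
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
conn.execute("INSERT INTO DDE VALUES (1, 'hypothetical-dde-server')")
conn.execute("INSERT INTO ValveDetails VALUES (1, 'FB', 1, 'CTL', 1, 15.0)")
conn.execute("INSERT INTO Thermocouples VALUES (1, 1, 1, 'TC_TOP', 3.0, -350.0, 100.0)")
```

With foreign keys enforced, a thermocouple row that refers to a valve or DDE server that does not exist is rejected, which mirrors the requirement that every sensor be tied to one of the DDE servers.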
The software framework was written in C# in order to simplify the development
process.
Also, since the more computationally intensive algorithms are performed
offline, the speed benefits of C++ would have been minimal for this application. A class
structure was defined that allows several user controls to share the same data in an
efficient manner.
Class interfaces and structures have been defined to make the Intelligent Valve
framework extensible. The first interface defines how a sensor receives values
from the data servers. It includes a single function with parameters for the name of the
item and the value captured by the data client. The item name is included because
certain sensors, such as a valve, must keep track of multiple values, such as its
control and process variables. Currently, only a thermocouple class and a valve class have been
developed that implement this interface. The purpose of this interface, however, is to
allow other sensors, such as pressure and strain sensors, to be included in a single collection in
the intelligent valve data handler.
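A minimal sketch of such a sensor interface is shown below in Python rather than the C# used by the framework. The names `Sensor`, `receive_value`, `Thermocouple`, and `Valve`, and the tag strings, are assumptions chosen for illustration and are not the framework's actual identifiers.

```python
from abc import ABC, abstractmethod


class Sensor(ABC):
    """Single-method interface: a sensor is handed an item name and a value."""

    @abstractmethod
    def receive_value(self, item_name: str, value: str) -> None:
        ...


class Thermocouple(Sensor):
    """Tracks one value, so it only reacts to its own tag."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.temperature = None

    def receive_value(self, item_name: str, value: str) -> None:
        if item_name == self.tag:
            self.temperature = float(value)


class Valve(Sensor):
    """Tracks two values, which is why the item name must be passed in."""

    def __init__(self, control_tag: str, feedback_tag: str) -> None:
        self.control_tag = control_tag
        self.feedback_tag = feedback_tag
        self.control = None
        self.feedback = None

    def receive_value(self, item_name: str, value: str) -> None:
        if item_name == self.control_tag:
            self.control = float(value)
        elif item_name == self.feedback_tag:
            self.feedback = float(value)


# Different sensor types share one collection and one dispatch loop.
tc = Thermocouple("TC1")
valve = Valve("CTL", "FB")
sensors = [tc, valve]
for name, value in [("TC1", "-300.5"), ("CTL", "50"), ("FB", "48.5")]:
    for sensor in sensors:
        sensor.receive_value(name, value)
```

Because the handler only depends on the interface, a pressure or strain sensor class could be added to the same collection without changing the dispatch loop.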
The next interface defines the functionality a data server is required to have to be
included in the IV framework. The interface defines a number of function templates that
allow the data handler either to sample incoming data at a set rate or to subscribe to data.
Since most data servers require drastically different implementations, this interface
allows various data servers to be integrated in a way that is
transparent to the IV data handler. The functions for the data server interface are listed in
Table 8.
Table 8 - Data server class interface.
Method Name | Parameters | Description
RequestDelegate | None | Allows the data handler to subscribe to any new data that is sampled.
StartRequest | String ItemName | Commands the data server to begin sampling the item when commanded by the data handler.
StopRequest | String ItemName | Commands the data server to stop sampling the item.
PerformRequests | Double elapsedTime | Commands the data server to sample all requested data. The elapsed time parameter is tracked by the IV data handler and represents the amount of time since the server was last sampled.
StartAdvise | String ItemName | Commands the data server to begin sampling the item whenever a new value is available.
StopAdvise | String ItemName | Commands the data server to stop sampling the item whenever a new value is available.
Disconnect | None | Disconnects the client from the server and stops all sampling and advise loops.
Stop | None | Stops all sampling and advise loops, but does not disconnect from the server.
Resume | None | Resumes all sampling and advise loops.
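The interface in Table 8 might be sketched as an abstract base class, again in Python rather than C#. The method names follow the table, but the Python signatures are assumptions; in particular, `request_delegate` is modeled here as a callback registration even though the table lists no parameters for it. `InMemoryServer` and its `publish` method are hypothetical stand-ins for a real DDE server.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Set

Callback = Callable[[str, str], None]


class DataServer(ABC):
    """Methods mirror Table 8; the signatures are assumptions."""

    @abstractmethod
    def request_delegate(self, callback: Callback) -> None: ...
    @abstractmethod
    def start_request(self, item_name: str) -> None: ...
    @abstractmethod
    def stop_request(self, item_name: str) -> None: ...
    @abstractmethod
    def perform_requests(self, elapsed_time: float) -> None: ...
    @abstractmethod
    def start_advise(self, item_name: str) -> None: ...
    @abstractmethod
    def stop_advise(self, item_name: str) -> None: ...
    @abstractmethod
    def disconnect(self) -> None: ...
    @abstractmethod
    def stop(self) -> None: ...
    @abstractmethod
    def resume(self) -> None: ...


class InMemoryServer(DataServer):
    """Hypothetical server that serves values from a dictionary."""

    def __init__(self, values: Dict[str, str]) -> None:
        self.values = dict(values)
        self.requested: Set[str] = set()
        self.advised: Set[str] = set()
        self.callback: Optional[Callback] = None
        self.running = True

    def request_delegate(self, callback: Callback) -> None:
        self.callback = callback

    def start_request(self, item_name: str) -> None:
        self.requested.add(item_name)

    def stop_request(self, item_name: str) -> None:
        self.requested.discard(item_name)

    def perform_requests(self, elapsed_time: float) -> None:
        # Polling path: deliver every requested item to the subscriber.
        if self.running and self.callback is not None:
            for name in sorted(self.requested):
                self.callback(name, self.values.get(name, ""))

    def start_advise(self, item_name: str) -> None:
        self.advised.add(item_name)

    def stop_advise(self, item_name: str) -> None:
        self.advised.discard(item_name)

    def disconnect(self) -> None:
        self.requested.clear()
        self.advised.clear()
        self.running = False

    def stop(self) -> None:
        self.running = False

    def resume(self) -> None:
        self.running = True

    def publish(self, item_name: str, value: str) -> None:
        # Advise path: push a new value as soon as it is available.
        self.values[item_name] = value
        if self.running and self.callback is not None and item_name in self.advised:
            self.callback(item_name, value)


received = []
server = InMemoryServer({"TC1": "-300.5"})
server.request_delegate(lambda name, value: received.append((name, value)))
server.start_request("TC1")
server.perform_requests(1.0)
```

The data handler only ever calls the nine interface methods, so a server with a completely different transport could be swapped in without changes to the handler.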
All data passed around the Intelligent Valve framework uses a simple structure with
three fields: String Item, String Value, and String TimeStamp. Each client of the
value is responsible for transforming the data into its own desired format. A static type
for the value parameter increases the predictability of the values the client will receive
and therefore reduces the amount of type and error checking that future developers
need to perform.
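The three-field structure and the client-side conversion it implies could look like the following sketch. The field names follow the text, while the class name `DataItem`, the helper functions, and the ISO 8601 timestamp format are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataItem:
    """All three fields are strings, as described in the text."""
    item: str
    value: str
    time_stamp: str


def as_float(data: DataItem) -> float:
    """A client that wants a number converts the string itself."""
    return float(data.value)


def as_datetime(data: DataItem) -> datetime:
    """Assumes an ISO 8601 timestamp; the real format is not stated."""
    return datetime.fromisoformat(data.time_stamp)


sample = DataItem(item="TC1", value="-300.5", time_stamp="2010-10-01T12:00:00")
temperature = as_float(sample)
sampled_at = as_datetime(sample)
```

Keeping the value as a string pushes parsing to each client, so the framework itself never needs to know whether a tag carries a temperature, a position, or a status flag.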
The data handler encapsulates the entire backend of the Intelligent Valve
framework. All controls in the framework receive a reference to this data handler and
can subscribe to updates of the different sensor temperatures as well as the update timer
when the data servers are commanded to sample. The data handler is also responsible for
logging the sensor data into a MS-SQL database for offline diagnostic tools. Figure 18
shows the entire class structure of the project.
Figure 18 - Software framework for the Intelligent Valve framework.
Chapter 4: RESULTS
Stennis Space Center test operators oversee the testing and validation of rockets for both
NASA and private companies. While few accidents have occurred at Stennis, it is still
important for test engineers to have a better understanding of the behavior of the valves
on the test stands. To further their comprehension, the diagnostic algorithms mentioned
above have been tested and validated against canonical data and simulated and injected
faults in test stand data.
4.1 Diagnostic Validation Data
Several datasets were used to validate the diagnostic algorithm and process discussed in
the previous section. The following sections will outline the procedures in which this
data was collected and how faults were injected into the data.
4.1.1 Thermal Model Data
In order to verify the thermocouple models, a test apparatus was constructed by the test
operations group. The setup was simple, but provided the ability to capture isolated
anomalies to see how the thermocouple reacts under different operating conditions. The
test was completed with the following protocol:
1. A simulated valve was programmed into the WonderWare simulation environment.
2. When the simulated valve opened, liquid nitrogen (LN) was poured into the box containing the valve.
3. During the next several hours, the liquid nitrogen was kept at a constant level in order to simulate the passing of fluid through an open valve.
4. The temperature and frost line were monitored after the body reached a steady-state temperature of -322 °F (boiling point of LN).
5. One thermocouple was placed at the base of the valve and another about 20 inches up the stem of the valve; both were monitored and stored in a data file.
During the test protocol, anomalies were inserted periodically in order to
simulate and capture failure modes commonly seen at the test stands. These anomalies
included disconnecting the top thermocouple, decreasing the power supply voltage and
current, connecting a resistor potentiometer to the amplifier input and output, thermocouple
debonding, and the effect of ice insulation.
As stated previously, the thermocouples used for the frost line calculations have a
measurement error that depends on their type and measurement range. This
measurement error, together with the error associated with the thermal model, provides a
threshold value that helps guarantee accurate data from the instrumentation. In order to
determine the experimentally calculated values, 𝑚𝑡, more accurately, an optimization
algorithm based on a curve fitting method with least squares constraints was utilized.
4.1.2 Sensor Validation Data
In March 2006, NASA initiated the Methane Thruster Testbed Project (MTTP) as a
platform for the research of plume diagnostics and ISHM. Historical data from live tests
was used to train and test the AANN for sensor validation. Hard and soft faults were
artificially injected into the test runs and simple thresholding was used to determine when
faults had occurred. These artificial faults were characterized during the thermal model
tests in order to create realistic faults in the data. The MTTP trailer can be seen in Figure
19.
Figure 19 - MTTP Trailer used for validating sensor faults.
4.1.3 Adaptive Threshold Data
In order to validate the adaptive threshold model, extensive failure data would be needed
that tracks a valve from nominal conditions to abnormal conditions and eventually to complete failure.
This data is difficult to acquire since valves are normally not left in service until they fail.
Therefore, a simulated control system was needed that could show
degradation in a valve’s response based on adaptable parameters. A common transfer
function used to simulate valves is shown below, followed by a description of each parameter.
\[
V_{process} = \frac{g\, e^{-T_s s}}{s^2 + 2 \zeta T_w s + T_w^2} \tag{3.2}
\]
where 𝑔 is the gain, 𝑇𝑠 is the unit delay, 𝑇𝑤 is the natural frequency, 𝜁 is the damping
ratio, and 𝑉𝑝𝑟𝑜𝑐𝑒𝑠𝑠 is the output of the transfer function modeling a valve's response to a
PID controller.
As the parameters are changed, the valve’s feedback should change accordingly,
and as the valve’s performance degrades, the algorithm’s adaptive threshold should detect these
changes and label faults in the system. In order to model NASA-SSC as closely as
possible, the control system uses a PID controller simulated in MATLAB’s Simulink.
Parameters for the PID controller were selected from common values used during live test firings at
NASA-SSC. The proportional constant was set to 1 and the integral constant to 0.1.
The valve parameters were varied over the following intervals:
Table 9 - Adaptive threshold simulation parameters.
Parameter | Nominal | Low Abnormal | High Abnormal
Gain | 0.98 ≤ 𝑔 ≤ 1.01 | 0.8 ≤ 𝑔 < 0.98 | 1.01 < 𝑔 ≤ 1.2
Natural Frequency | 0.9 ≤ 𝑇𝑤 ≤ 1.1 | 0.8 ≤ 𝑇𝑤 < 0.9 | 1.1 < 𝑇𝑤 ≤ 1.2
Damping Ratio | 0.9 ≤ 𝜁 ≤ 1.1 | 0.8 ≤ 𝜁 < 0.9 | 1.1 < 𝜁 ≤ 1.2
Delay | 2 ≤ 𝑇𝑠 ≤ 3 | 0 ≤ 𝑇𝑠 < 2 | N/A
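To make the simulated control loop concrete, the sketch below integrates the valve transfer function under a PI controller with the stated constants (proportional 1, integral 0.1). It is a plain Python stand-in for the Simulink model, not a reproduction of it: the Euler integration, the step size, the unit step setpoint, the zero derivative gain, and the function name `simulate_valve` are all assumptions.

```python
from collections import deque


def simulate_valve(g=1.0, t_w=1.0, zeta=1.0, t_s=2.5,
                   kp=1.0, ki=0.1, setpoint=1.0, dt=0.01, t_end=200.0):
    """Closed-loop step response of g*exp(-t_s*s) / (s^2 + 2*zeta*t_w*s + t_w^2).

    The plant is written as y'' + 2*zeta*t_w*y' + t_w^2*y = g*u(t - t_s) and
    integrated with a fixed-step Euler scheme (an assumption of this sketch).
    """
    delay_steps = int(round(t_s / dt))
    delayed_inputs = deque([0.0] * delay_steps)  # holds u over the last t_s seconds
    y = 0.0
    dy = 0.0
    integral = 0.0
    response = []
    for _ in range(int(round(t_end / dt))):
        error = setpoint - y
        integral += error * dt
        u = kp * error + ki * integral           # PI control signal
        delayed_inputs.append(u)
        u_delayed = delayed_inputs.popleft()     # control signal from t_s seconds ago
        ddy = g * u_delayed - 2.0 * zeta * t_w * dy - t_w ** 2 * y
        dy += ddy * dt
        y += dy * dt
        response.append(y)
    return response


nominal = simulate_valve()            # parameters inside the nominal ranges
degraded = simulate_valve(g=0.8)      # gain at the edge of the low abnormal range
largest_gap = max(abs(a - b) for a, b in zip(nominal, degraded))
```

Both runs settle at the setpoint because of the integral action, so the degradation shows up in the transient part of the response, which is where an envelope built from nominal models would be expected to flag it.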
4.2 Thermal Model Validation
In order to validate the thermal model, experiments were performed with 10 faults
injected into a thermocouple that was bonded three inches up the stem of a fifteen-inch
valve. The thermocouple data was compared to the thermal model, and a simple
threshold of 22 °F was used to determine when a fault had occurred. This threshold was
derived from the 95% accuracy of the thermal model. The overall range of temperatures
is from -322 °F to 80 °F, or approximately 400 °F, and 5% of that is 22 °F.
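The simple thresholding described here amounts to comparing each measured sample with the corresponding thermal model prediction. A minimal sketch follows; the function name and the sample values are hypothetical, and only the 22 °F threshold comes from the text.

```python
def detect_thermocouple_faults(measured_f, simulated_f, threshold_f=22.0):
    """Flag samples whose measured temperature differs from the thermal
    model prediction by more than the threshold (all values in deg F)."""
    return [abs(m - s) > threshold_f for m, s in zip(measured_f, simulated_f)]


# Hypothetical chill-down samples: the last two readings break away from the model.
measured = [-300.0, -298.0, -250.0, 80.0]
simulated = [-301.0, -300.0, -299.0, -298.0]
flags = detect_thermocouple_faults(measured, simulated)
```

Because the check is a single subtraction and comparison per sample, it can be applied to each thermocouple as the data arrives.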
4.2.1 Thermal Modeling
The first test performed at NASA-SSC was a base run to identify the valve’s physical
parameters. The least squares optimization curve fitting method described in the
approach section was used to determine the parameters for the remaining tests. Table 10
shows the values found by the optimization algorithm, and Figure 20 shows the
simulation results using these parameters.
Table 10 - Physical parameters obtained from least squares optimization curve fit of base run.
Parameter | Value
Mt – Chill Down | 659.80
Mt – Warm Up | 4672
mt – Chill Down | 0.36
mt – Warm Up | 0.32
[Plot "Base Run": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 20 - Simulation data using thermal modeling for base run.
Once the physical parameters were determined, they could be used to validate the
thermal model’s ability to detect anomalies by injecting faults during similar test runs.
Disconnections were made at various locations during the test runs to see how the system
responded. Figure 21 shows the test setup, and its numbered locations are
referenced in parentheses, e.g. (13), throughout the following results. The bottom
simulation provides inaccurate results throughout the tests because of its dependency on
the ambient temperature. These tests were run for several hours, and sometimes
overnight, with only a single ambient temperature being recorded. Therefore, the measured
bottom thermocouple was used as the input to the top simulation except in the presence of a fault,
in which case the simulation was used instead.
Figure 21 - Data acquisition setup for thermal modeling fault detection.
The first fault simulates a faulty connection before the amplifier (13) and after the
patch panel (12). The faulty connection was simulated by connecting a potentiometer to
the referenced locations and increasing its resistance quickly at 8230 and 8990 seconds. The fault
detection was able to detect both faults accurately using the thermal modeling in the top
thermocouple. However, there are some false positives reported in the chill down phase
of the test. While no fault was documented, abnormal behavior can be seen in the top
thermocouple as it rises slightly as the temperature reaches its minimum. In determining
the performance metrics, only documented faults were considered to be true positives
even if the measurements show unexpected results.
[Plot "Faulty Connection in Amplifier Input": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 22 – Simulation data using thermal modeling for faulty connections in Tustin amplifier input.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 23 - Fault classification using thermal modeling for faulty connections in Tustin amplifier input.
The next fault simulates a faulty amplifier (6, 13) as well as disconnects in the
Tustin patch panel (7, 14). The power downs of the amplifier were performed at 5563
and 5910 seconds with 6 input disconnections occurring at 7381, 7457, 7592, 7641, 9336,
9363 seconds. Again, simple thresholding combined with the thermal equations was able
to detect all faults accurately in both the top and bottom thermocouple. This test revealed
no false positives in the top thermocouple, which is the desired metric for these tests.
[Plot "Amplifier Power Down and Tustin Input Disconnect": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 24 - Simulation data using thermal modeling for amplifier power downs and Tustin input disconnections.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 25 - Fault detection using thermal modeling for amplifier power down and Tustin input disconnection.
In order to simulate a fault in the digitizer input, a potentiometer was connected in
between (13) and (14). Instead of a hard fault, the resistance was slowly increased at
6693 seconds to simulate a drifting connection. A hard fault was injected at 7750
seconds by quickly increasing the resistance. While both faults were detected, several
false negatives were reported because the simulation was predicting values lower than the
measured value. Therefore, since the fault was slowly injected, there was a delay before
it reached the threshold values indicating a fault.
[Plot "Faulty Input Connection in Digitizer": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 26 - Simulation data using thermal modeling for faulty input connections in the digitizer.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 27 - Fault detection using thermal modeling for faulty input connections in the digitizer.
During a humid day, the moisture in the air can change into ice as it comes into
contact with the surface of the valve. When surrounding a thermocouple, the frost may
act as an insulator and cause incorrect readings. In order to simulate this, water was
applied to the valve stem as the test was occurring. The water then froze when that part
of the valve reached freezing point. In this test, there were no identifiable effects from
the frost insulation. However, the top simulation estimates a steeper drop in temperature
during chill down, which is recorded as a false positive. If the frost insulation occurred
on the bonnet of the valve
[Plot "Frost Insulation #1": temperature versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 28 - Simulation data using thermal modeling for simulated frost insulation test 1.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 29 - Fault detection using thermal modeling for frost insulation test 1.
In the next test, frost insulation was again added, but this time the thermocouple
was not in direct contact with the valve. This induced fault checks how frost can affect a
loose thermocouple. Based on the top thermocouple's data in Figure 30, it can be seen
that the top thermocouple lowered in temperature, but was well above the actual
temperature of the valve based on the top simulation data. The simulation threshold
method was again able to detect this fault with 100% accuracy.
[Plot "Frost Insulation Test #2": temperature versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 30 - Simulation data using thermal modeling for simulated frost insulation test 2.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 31 - Fault detection using thermal modeling for frost insulation test 2.
Figure 32 - Data acquisition modified setup for thermal modeling fault detection.
A junction reference error can cause incorrect thermocouple readings. In this
particular test, the junction was placed into ice water to simulate a reference error.
At the beginning of the test, the top thermocouple did not reach the expected
temperature, but the more noticeable fault occurred when the junction was lifted out of the
water at around 11832 seconds. A sharp decrease in the temperature resulted from this
induced fault. The fault detection algorithm was able to detect both faults with
reasonable accuracy.
[Plot "Temperature Junction Reference Error": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 33 - Simulation data using thermal modeling for temperature junction reference errors.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 34 - Fault detection using thermal modeling for temperature junction reference errors.
The next test simulated a series of disconnects and shorts in both the top and
bottom thermocouples. During warm up, the top thermocouple was repeatedly connected
and disconnected to simulate a connection that was just starting to become faulty. The
faults were detected with high precision, but several false positives and false
negatives were found during the repetitive disconnects because the voltage did not have
enough time to reach its minimum value.
[Plot "Thermocouple and Power Disconnection": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 35 - Simulation data using thermal modeling for thermocouple and power disconnections.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 36 - Fault detection using thermal modeling for thermocouple and power disconnections.
This test was again simply a disconnect and a short of the thermocouple;
however, the ambient temperature was recorded during the test, which provided a more
accurate simulation model. As in previous tests, a disconnect produced a level
shift to the channel's minimum value and a short produced a level shift to the
channel's highest value. Even with the ambient temperature,
however, several false positives can be seen during cool down. This again is probably
due to the body freezing much faster than expected because of the testing procedures.
[Plot "Thermocouple Disconnection and Short": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 37 - Simulation data using thermal modeling for thermocouple disconnections and shorts.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 38 - Fault detection using thermal modeling for thermocouple disconnections and shorts.
This test demonstrated a drifting fault that was simulated by decreasing the
voltage on the transmitter's power supply over a two-minute span. Since the fault's effect
was slower and the threshold value is so high, a number of false negatives were
reported. This same test was performed several times over the course of an hour with
similar results. Near the end of the test, the power supply for both transmitters was
dropped, which caused a fault in both the bottom and top thermocouples. The fault
detection in the lower thermocouple allowed the top thermocouple to retain a value
closer to the actual temperature of the valve, which resulted in proper fault detection of
the transmitter's low power output.
[Plot "Transmitter Power Failures": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 39 - Simulation data using thermal modeling for transmitter power failures.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 40 - Fault detection using thermal modeling for transmitter power failures.
A new setup, shown in Figure 32, was used for this test, in which a thermocouple junction was
added (18, 19) and placed in an ice bath. At warm up, it was removed from the ice bath
and a heat gun was blown on it. When the junction reference was in the ice bath, the
thermocouple's temperature was higher than expected, and the heat gun caused the
data to be lower than expected. Both of these induced faults were detected accurately.
[Plot "Unaccounted Thermocouple Junction": temperature in degrees (F) versus time (s, ×10^4); series: Top Thermocouple, Top Simulation, Bottom Thermocouple, Bottom Simulation.]
Figure 41 - Simulation data using thermal modeling for unaccounted thermocouple junctions.
[Plots "Top Thermocouple Fault Detection" and "Bottom Thermocouple Fault Detection": fault classification (0 to 1) versus time (s, ×10^4).]
Figure 42 - Fault detection using thermal modeling for unaccounted thermocouple junctions.
Figure 43 shows a comparison of the thermal model's prediction of a frost line
against an actual thermocouple that was bonded to the stem of the valve three inches
from the body.
[Plot "Frost line model comparison with actual thermocouple data": frost line (inches) and temperature (F) versus elapsed time (s); series: Thermocouple reading at 3 inches, Actual time of frost point at 3 inches, Predicted time of frost point at 3 inches, Predicted frost line; data tips at (X: 359.8, Y: 31.97) and (X: 417.2, Y: 3.006).]
Figure 43 - Comparison of predicted and actual frost line.
It can be seen that the difference between the predicted and actual frost line at
three inches is only approximately one minute. Since the heat dissipation of the valves is
exponential, the steady state time of larger valves can be upwards of twenty-two hours.
Therefore, a minute is well within the accepted error for this application. These results
further validate the study performed in [14] and expand that work to detect faults in
thermocouples. Incorporating the model in the intelligent valve framework will
allow for the continuous monitoring of the frost line in the LLAV. If it is found that
the frost line of the valve never reaches the packing at the top of the valve, the stem
length can be reduced, saving thousands of dollars in the manufacturing of the valve.
4.2.2 Simulation Metrics
Table 11 - Performance metrics for faulty connection in amplifier input.
Actual \ Detected | Positive | Negative
Positive | 5529 | 1411
Negative | 1217 | 255460
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
79.67% | 99.53% | 81.96% | 99.45% | 89.89%
Table 12 - Performance metrics for amplifier power down and Tustin input disconnect.
Actual \ Detected | Positive | Negative
Positive | 854 | 7
Negative | 30 | 246106
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
99.19% | 99.99% | 96.61% | 100% | 98.27%
Table 13 - Performance metrics for input disconnection on the digitizer.
Actual \ Detected | Positive | Negative
Positive | 4487 | 190
Negative | 2048 | 240361
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
95.94% | 99.16% | 68.66% | 99.92% | 81.14%
Table 14 - Performance metrics for frost insulation test 1.
Actual \ Detected | Positive | Negative
Positive | 2592 | 7619
Negative | 0 | 204984
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
25.38% | 100% | 100% | 96.42% | 100%
Table 15 - Performance metrics for frost insulation test 2.
Actual \ Detected | Positive | Negative
Positive | 34702 | 8800
Negative | 0 | 371790
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
79.77% | 100% | 100% | 97.69% | 100%
Table 16 - Performance metrics for temperature junction reference error.
Actual \ Detected | Positive | Negative
Positive | 155980 | 770
Negative | 4778 | 51895
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
99.51% | 91.57% | 97.03% | 98.54% | 94.22%
Table 17 - Performance metrics for thermocouple and power disconnection.
Actual \ Detected | Positive | Negative
Positive | 1120 | 44
Negative | 9 | 244668
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
96.22% | 100% | 99.20% | 99.98% | 99.60%
Table 18 - Performance metrics for thermocouple disconnections and shorts.
Actual \ Detected | Positive | Negative
Positive | 841 | 22
Negative | 3880 | 173357
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
97.45% | 97.81% | 17.81% | 99.99% | 30.14%
Table 19 - Performance metrics for transmitter power and failure.
Actual \ Detected | Positive | Negative
Positive | 115910 | 38566
Negative | 10 | 92715
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
75.03% | 99.99% | 99.99% | 70.62% | 99.99%
Table 20 - Performance metrics for unaccounted thermocouple junction.
Actual \ Detected | Positive | Negative
Positive | 92865 | 55685
Negative | 0 | 58750
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
62.51% | 100% | 100% | 51.34% | 100%
Table 21 – Average performance metrics for all thermocouple fault tests.
Sensitivity | Specificity | Positive Predictive Value | Negative Predictive Value | F-Measure
81.07% | 98.80% | 86.13% | 91.39% | 89.32%
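The first four metrics in these tables follow from the confusion counts through the standard definitions, which can be checked with a short sketch. The function name is hypothetical, and reading each table as true positives, false negatives, false positives, and true negatives is an inference from the reported percentages. The F-Measure is left out because its definition is not given in this section.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix metrics, returned as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "positive_predictive_value": 100.0 * tp / (tp + fp),
        "negative_predictive_value": 100.0 * tn / (tn + fn),
    }


# Counts from Table 11 (faulty connection in amplifier input).
table_11 = confusion_metrics(tp=5529, fn=1411, fp=1217, tn=255460)
for name, value in table_11.items():
    print(f"{name}: {value:.2f}%")
```

Rounded to two decimals, these values agree with the percentages reported in Table 11.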
The metrics support the feasibility of using thermal models for the calculation
of the frost line and for thermocouple sensor validation. With a larger population size and a more
controlled test configuration, the results can be validated further, but the initial tests
show promising results for the use of this algorithm in the Intelligent Valve
framework. The faults that caused drastic changes in temperature during disconnects and
shorts in the thermocouple were always detected within a measurement sample of the
induced fault occurring. Other faults, such as a slowly degrading transmitter power
supply, caused a slowly growing discrepancy between the thermocouple’s measured data and its
simulated temperature. This type of fault was detected, but it took several
minutes into the fault for the measured value to cross the threshold value.
The computational efficiency of this approach is also very appealing for use in a
mission critical situation where processing power is limited and must be reserved for
operational algorithms. The two thermal equations can be calculated
on a sample-by-sample basis for numerous thermocouples during live test fires,
giving real-time results.
4.3 Sensor Validation
NASA-SSC provided valve data with a downstream pressure sensor for validation of the
diagnostic algorithms. This data provided canonical datasets for the development of the
AANN sensor validation. Five datasets were provided in total, with three sets used for
training and two for testing. All the data provided was nominal, so artificial soft and hard
faults were injected into the data. A hard fault is defined as a level shift in the data where
the measurement values change drastically to a certain value and remain at that value for
an extended period of time. This is typical behavior of a sensor that is completely
disconnected. A soft fault is defined as a slow deviation of the sensor's value from the
physical value. This is characteristic of a sensor whose performance slowly begins to
degrade because of either a slow bonding disconnect or an insulation disconnect. A hard fault and
a soft fault can be seen in Figure 44 and Figure 45, respectively.
Figure 44 - Example of a hard fault.
Figure 45 - Example of a soft fault.
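The two fault types can be illustrated by injecting them into a nominal trace and thresholding the residual against an estimate. In the sketch below the nominal trace itself stands in for the AANN estimate, which is an assumption, as are the function names, the constant 100 PSIG signal, and the threshold of 10.

```python
def inject_hard_fault(signal, start, level):
    """Level shift: from `start` onward the reading sticks at `level`."""
    return [level if i >= start else x for i, x in enumerate(signal)]


def inject_soft_fault(signal, start, drift_per_sample):
    """Slow drift: from `start` onward the reading deviates linearly."""
    return [x + drift_per_sample * (i - start) if i >= start else x
            for i, x in enumerate(signal)]


def first_detection(observed, estimate, threshold):
    """Index of the first sample whose residual exceeds the threshold."""
    for i, (o, e) in enumerate(zip(observed, estimate)):
        if abs(o - e) > threshold:
            return i
    return None


nominal = [100.0] * 20                      # hypothetical pressure trace (PSIG)
hard = inject_hard_fault(nominal, start=5, level=0.0)
soft = inject_soft_fault(nominal, start=5, drift_per_sample=2.0)

hard_index = first_detection(hard, nominal, threshold=10.0)
soft_index = first_detection(soft, nominal, threshold=10.0)
```

The hard fault is flagged at the sample where it is injected, while the soft fault is flagged several samples later, once the drift has grown past the threshold.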
An example dataset of the valve can be seen in Figure 46. This dataset has a very
simple correlation between the pressure sensor and the valve’s position: the relationship
between the valve position and the pressure reading is nearly a step function. This testing,
while simple, provides validation for the AANN method, which will be expanded to a more complex
system later. As previously mentioned, hard and soft faults were artificially injected
and an AANN was trained based on the method described in the background section.
Figure 47 and Figure 48 show the fault conditions and the AANN output.
Figure 46 - Example dataset from LLAV and downstream pressure sensor.
Figure 47 - Hard fault detection using AANN.
Figure 48 - Soft fault detection by AANN.
In order to further validate the AANN algorithm, the MTTP data discussed above
was also used to create a more extensive subsystem that could be tested. Again, artificial
faults were injected into different sensors at different times during the test, but were
characteristic of actual faults found during the thermal modeling tests. Figure 49 shows a
hard fault in a pressure sensor, Figure 50 demonstrates the AANN's ability to track a soft
fault in a separate pressure sensor, and Figure 51 shows the robustness of the AANN in
the case of large disturbances, which are a known symptom of a faulty connection.
Figure 49 - Fault detection of a simulated hard fault in a pressure sensor.
Figure 50 - Fault detection of a soft fault in a pressure sensor.
Figure 51 - Detection of a simulated disconnect in a pressure transducer.
[Each figure: top panel "Simulated data for sensor validation", Pressure (PSIG) vs. Elapsed Time (s, x 10^5), showing measured data, simulated fault data, and estimated AANN data; bottom panel "Fault region detection", Fault Detected vs. Elapsed Time (s).]
The AANN was able to detect the faults in the pressure sensor as well as predict
the values of the pressure sensor to a reasonable degree. For the hard and soft faults (Figure
49 and Figure 50), the AANN produced no false positives or false negatives. In the
simulated disconnect, the fault data occasionally approached the AANN's estimate, causing
false negatives. Depending on the application, this may be remedied by
raising an alarm only after a predefined number of fault classifications occur, and
conversely clearing the alarm only after a defined number of fault-free classifications occur.
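One way to realize such an alarm rule is a simple counter with hysteresis. The sketch below assumes that the classifications must be consecutive, which the text does not specify; the function name and the counts `n_set` and `n_clear` are illustrative choices.

```python
def debounce_alarm(fault_flags, n_set=3, n_clear=3):
    """Set the alarm after n_set consecutive fault flags and clear it
    after n_clear consecutive fault-free flags."""
    alarm = False
    run = 0  # consecutive samples that disagree with the current alarm state
    states = []
    for flag in fault_flags:
        if bool(flag) != alarm:
            run += 1
            needed = n_clear if alarm else n_set
            if run >= needed:
                alarm = not alarm
                run = 0
        else:
            run = 0
        states.append(alarm)
    return states

# Isolated flags are ignored; a sustained run latches the alarm.
flags = [0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
alarm_states = debounce_alarm(flags)
```

Larger counts suppress more spurious classifications at the cost of a longer detection delay, so the values would have to be chosen per application.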
To verify this algorithm further, the GOX subsystem of the MTTP was also
tested. Similar artificial faults were injected into the test data, including multiple sensor
faults at concurrent times. The same metrics that were used for the thermocouple
algorithm were also calculated for the sensor validation, with the addition of mean
squared error (MSE). MSE was not used for the thermocouples because no
"true" signal was available. The first test did not contain any faults, to ensure that the
AANN had correctly learned the correlations in the system.
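The MSE metric can be sketched as follows. The sketch assumes that the error is taken between the fault-free measurement and the AANN estimate, and that the reported average MSE is the unweighted mean of the per-sensor values; the latter is consistent with the six per-sensor values of Table 22, whose mean rounds to the reported 14.56. The array names and the small example are hypothetical.

```python
import numpy as np

def per_sensor_mse(measured, estimated):
    """Mean squared error per sensor; rows are samples, columns are sensors."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return np.mean((measured - estimated) ** 2, axis=0)

# Hypothetical two-sensor example with two samples each.
measured = np.array([[100.0, 10.0], [102.0, 12.0]])
estimated = np.array([[101.0, 10.0], [100.0, 11.0]])
mse = per_sensor_mse(measured, estimated)
average_mse = mse.mean()

# Per-sensor values from Table 22; their unweighted mean matches the reported average.
table22 = np.array([48.3133, 0.8716, 37.4732, 0.1882, 0.5223, 0.0176])
table22_average = table22.mean()
```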
Figure 52 - Legend for AANN estimations: (a) Top estimation plots and (b) bottom error plots.
Figure 53 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors under normal operating conditions.
Figure 54 - AANN Estimation for PE-1143-GO and PC1 pressure sensors under normal operating conditions.
Figure 55 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors under normal operating conditions.
[Figures 53 to 55: for each sensor, an "AANN estimation" plot above an "Error signal and threshold" plot; Pressure (PSIG) or Percent Open (%) vs. Elapsed Time (ms, x 10^4).]
Table 22 – Performance metrics for fault detection using AANN under normal operating conditions.
              Positive    Negative
Positive      0           0
Negative      0           55685

Sensitivity                  100%
Specificity                  NaN
Positive Predictive Value    NaN
Negative Predictive Value    100%
F-Measure                    NaN
Average MSE                  14.56

Sensor          MSE
PE-1134-GO      48.3133
PE-1140-GO      0.8716
PE-1143-GO      37.4732
PC1             0.1882
VPV-1139-FB     0.5223
VPV-1139-CMD    0.0176
As can be seen in Figure 53 through Figure 55, the AANN was able to find the correct
correlations based on the training data and then estimate the test dataset while operating
under normal conditions. The VPV-1139-FB channel had significant noise in all of the
datasets, which seemed to be caused by either a bad power supply or a bad connection. In
order to create relatively nominal data, a moving average window was applied to the
training dataset as well as all the test datasets.
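The smoothing step can be illustrated with a short trailing moving average. The window length used for the VPV-1139-FB channel is not stated, so the value below is a placeholder, and the causal (trailing) form is an assumption chosen because it could also run sample by sample during a test.

```python
import numpy as np

def moving_average(x, window):
    """Trailing moving average; the first samples use a shorter window."""
    x = np.asarray(x, dtype=float)
    csum = np.cumsum(np.insert(x, 0, 0.0))  # csum[i] is the sum of x[:i]
    out = np.empty_like(x)
    for i in range(len(x)):
        lo = max(0, i - window + 1)
        out[i] = (csum[i + 1] - csum[lo]) / (i + 1 - lo)
    return out

# Hypothetical noisy feedback alternating around 5 % open.
noisy = np.array([0.0, 10.0, 0.0, 10.0, 0.0, 10.0])
smoothed = moving_average(noisy, window=2)
```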
In the next test, a hard fault was injected into the PE-1134-GO pressure sensor
during the startup phase. The hard fault was a level shift to zero for the first 200 samples
in the sequence. Figure 56 - Figure 58 show the results of the six monitored sensors and
Table 23 shows the respective performance metrics.
Figure 56 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with hard fault in PE-1143.
Figure 57 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with hard fault in PE-1143.
Figure 58 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with hard fault in PE-1143.
Table 23 – Performance metrics for fault detection using AANN with injected hard fault in PE-1143-GO.
              Positive    Negative
Positive      201         0
Negative      0           55685

Sensitivity                  100%
Specificity                  100%
Positive Predictive Value    100%
Negative Predictive Value    100%
F-Measure                    100%
Average MSE                  18.1059

Sensor          MSE
PE-1134-GO      58.7237
PE-1140-GO      3.8857
PE-1143-GO      44.225
PC1             1.0074
VPV-1139-FB     0.7697
VPV-1139-CMD    0.0240
This test shows the robustness of the AANN with a hard fault in a pressure sensor.
The AANN was able to detect all of the faults in the pressure sensor as well as maintain
proper values for the rest of the sensors. Since the training data contains windows of
zeroed out sensors, it makes sense that this test would perform well. The MSE was
slightly higher in certain sensors, especially in the faulty sensor PE-1134-GO. However,
the values produced by the AANN were close enough to be used in lieu of the faulty data,
which is the goal of this algorithm.
It was seen in the thermocouple tests that a shorted sensor connection can result in
a level shift to the maximum value of the sensor. The next test simulates a similar short
in the PE-1143-GO sensor. The value was held for the entirety of the test to make sure
that the AANN could detect the fault through all transitions and not just the initial state.
Figure 59-Figure 61 show the result of the test and Table 24 shows the respective
performance metrics.
Figure 59 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with level shift in PE-1143-GO.
Figure 60 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with level shift in PE-1143-GO.
Figure 61 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with level shift in PE-1143-GO.
Table 24 – Performance metrics for fault detection using AANN with injected level shift fault in PE-1143-GO.
              Positive    Negative
Positive      975         0
Negative      28          4847

Sensitivity                  100%
Specificity                  99.42%
Positive Predictive Value    97.20%
Negative Predictive Value    100%
F-Measure                    98.30%
Average MSE                  415.51

Sensor          MSE
PE-1134-GO      1701.9
PE-1140-GO      9.30
PE-1143-GO      779.1
PC1             0.9
VPV-1139-FB     1.80
VPV-1139-CMD    0.137
The short-circuit simulation produced fault classification results similar to those of the hard
fault, but the prediction error of the sensor data was much higher. This is to be expected,
as a number of runs in the training dataset contained values of PE-1143-GO similar to those
the fault was reporting. Therefore, the correlations found by the neural network's bottleneck
layer would have been caught between two different states of the training data. Even
though the prediction accuracy decreased, the fault detection would still be sufficient to
pass the data on to a fault diagnosis algorithm, which could identify the faulty pressure
sensor.
Weather and wind conditions on the test stand can at times loosen a
sensor's insulation, causing a faulty connection in the pressure sensor.
This disconnect can cause considerable noise in the channel's measurements. These
faults can be particularly difficult to detect at an early stage because only small
variations in the measurement data can be seen. The first test, seen in Figure 62 through
Figure 64, is a simulation of a more drastic disconnect in which the values of the PC1 pressure
sensor deviate more severely from the actual measured value.
Figure 62 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with noise in PC1.
Figure 63 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with noise in PC1.
Figure 64 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with noise in PC1.
Table 25 – Performance metrics for fault detection using AANN with injected noise in PC1.
              Positive    Negative
Positive      260         41
Negative      9           5540

Sensitivity                  99.64%
Specificity                  86.37%
Positive Predictive Value    96.65%
Negative Predictive Value    99.26%
F-Measure                    98.22%
Average MSE                  89.43

Sensor          MSE
PE-1134-GO      52.02
PE-1140-GO      4.40
PE-1143-GO      58.11
PC1             0.87
VPV-1139-FB     1.62
VPV-1139-CMD    0.085
This test again verified the AANN's robustness even in the presence of noise. The
training method using random biases in the second training set optimized the weights to
enhance the network's understanding of the complex system.
The next test uses the only "real world" fault that was available in the data.
As stated previously, the VPV-1139-FB sensor had noise in its channel during every test
run. The preprocessing of the data used a moving average to create normal operating
data that was sufficient for training the AANN. This test uses the original dataset to
validate the AANN's performance with actual fault data.
Figure 65 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with noise in VPV-1139-FB.
Figure 66 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with noise in VPV-1139-FB.
Figure 67 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with noise in VPV-1139-FB.
Table 26 – Performance metrics for fault detection using AANN with noise in VPV-1139-FB.
              Positive    Negative
Positive      129         846
Negative      3           4872

Sensitivity                  99.93%
Specificity                  13.23%
Positive Predictive Value    97.72%
Negative Predictive Value    85.20%
F-Measure                    98.82%
Average MSE                  23.22

Sensor          MSE
PE-1134-GO      78.77
PE-1140-GO      11.24
PE-1143-GO      45.17
PC1             0.98
VPV-1139-FB     3.18
VPV-1139-CMD    0.03
While the AANN was still able to hold a low MSE in this case, the lack of a fault
region detection algorithm produced a very low specificity rating, which could result in
an undetected fault in the sensor. Detection of
spikes in data makes it difficult to determine and diagnose the source of a fault, and
therefore the sensitivity is usually defined based on the application. Fault region
detection algorithms can use fault windows with a majority rule decision to determine the
overall health of a sensor over a period of time and thereby assist the fault diagnosis
algorithm.
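A majority-rule fault window can be sketched as follows. No fault region detection algorithm was implemented for these tests, so this is only an illustration of the idea; the non-overlapping windows, the window length, and the strict-majority rule are assumptions.

```python
import numpy as np

def fault_regions(fault_flags, window):
    """Mark each non-overlapping window as faulty when more than half of its
    samples are flagged. A trailing partial window is dropped."""
    flags = np.asarray(fault_flags, dtype=bool)
    n_windows = len(flags) // window
    trimmed = flags[: n_windows * window].reshape(n_windows, window)
    return trimmed.sum(axis=1) > window / 2

# Hypothetical per-sample classifications: one isolated spike, then a sustained fault.
flags = [1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0]
regions = fault_regions(flags, window=5)
```

The isolated spike in the first window is voted down, while the second window is reported as a fault region even though one of its samples was classified as healthy.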
The last test determines whether the AANN can detect simultaneous faults in
sensors. A disconnect was injected into PE-1143-GO and a short was injected into PC1
for the entirety of the test.
Figure 68 - AANN Estimation for PE-1134-GO and PE-1140-GO pressure sensors with simultaneous faults in PE-1143-GO and PC1.
Figure 69 - AANN Estimation for PE-1143-GO and PC1 pressure sensors with simultaneous faults in PE-1143-GO and PC1.
Figure 70 - AANN Estimation for VPV-1139-FB and VPV-1139-CMD valve sensors with simultaneous faults in PE-1143-GO and PC1.
Table 27 – Performance metrics for fault detection using AANN with simultaneous faults in PE-1143-GO and PC1.
              Positive    Negative
Positive      386         1564
Negative      1530        2370

Sensitivity                  60.76%
Specificity                  19.80%
Positive Predictive Value    20.15%
Negative Predictive Value    60.24%
F-Measure                    30.26%
Average MSE                  14561

Sensor          MSE
PE-1134-GO      20464
PE-1140-GO      10226
PE-1143-GO      56154
PC1             361
VPV-1139-FB     50
VPV-1139-CMD    110
This verifies that the AANN is only useful in the presence of a single sensor fault.
The data from the output of the AANN would not be useful for the fault diagnosis
algorithm, as there are too many misclassifications of faults in sensors that were still
operating properly. A possible solution is to replace the motor valves with a feedback
sensor to get a more accurate understanding of the operating state of the system. The
binary input from these valves was tested in the AANN to see if it could improve
performance, but showed no real benefit in similar tests. Therefore, to reduce
computational complexity and training time, it was removed from the tests.
Overall the use of a bottlenecked neural network with mapping and demapping
layers proved to be an effective tool for the detection of single sensor faults in a complex
system. The AANN was able to find correlations in the data without any knowledge of
the physical dynamics of the MTTP dataset. This provides a generic algorithm that will
work for most complex systems if enough training data is provided. The estimated data
provided by the AANN can also assist in the decision made by NASA test operators to
continue an expensive rocket engine test in the event that a fault is detected.
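The structure of such a network can be sketched as a forward pass through mapping, bottleneck, and demapping layers. This is an illustration of the architecture only: the layer sizes, the tanh activations, the linear bottleneck and output layers, and the random untrained weights are all assumptions, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_aann(n_sensors, n_mapping, n_bottleneck):
    """Random weights for sensors -> mapping -> bottleneck -> demapping -> sensors."""
    sizes = [n_sensors, n_mapping, n_bottleneck, n_mapping, n_sensors]
    return [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def aann_forward(x, layers):
    """Return the sensor estimates and the bottleneck code for a batch of samples."""
    (w1, b1), (w2, b2), (w3, b3), (w4, b4) = layers
    mapped = np.tanh(x @ w1 + b1)        # nonlinear mapping layer
    code = mapped @ w2 + b2              # bottleneck: compressed representation
    demapped = np.tanh(code @ w3 + b3)   # nonlinear demapping layer
    estimate = demapped @ w4 + b4        # one estimate per input sensor
    return estimate, code

# Six monitored channels, as in the GOX subsystem tests; hidden sizes are hypothetical.
layers = init_aann(n_sensors=6, n_mapping=10, n_bottleneck=3)
samples = rng.normal(size=(4, 6))
estimate, code = aann_forward(samples, layers)
residual = np.abs(samples - estimate)    # error signal compared against a threshold
```

Because the bottleneck has fewer units than there are sensors, a trained network can only reproduce its inputs by exploiting the correlations between them, which is why a single sensor that departs from those correlations yields a large residual.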
There are also several drawbacks to this method. First, a large amount of data
that encompasses all of the system's operating conditions must exist in order to train the
AANN properly. If insufficient data is provided, an online training algorithm may have
to be implemented to continually update the weights of the neural network. Also,
multiple sensor faults could not be detected with the amount of data that
was provided to the system. Lastly, certain sensors may have no correlation to each
other, for example valve position and strain. Therefore, while pressure and valve sensors
work in this context, detection of a fault in a strain sensor would not and may even throw
off the detection of faults in the other sensors. It can be concluded, then, that
the AANN's sensor selection and training data must be determined
carefully in order to guarantee the successful detection of faults in the sensors that it is
monitoring.
4.4 Adaptive Threshold
To validate the adaptive threshold method, six different set point transitions were used to
determine the robustness of the algorithm with different transition ranges and speeds.
These transitions can be seen in Figure 71:
Figure 71 - Set point transitions for adaptive thresholding testing.
[Panels: "Setpoint transition 1", "Setpoint transition 2", "Setpoint transition 4", "Setpoint transition 5", "Setpoint transition 6", and "Setpoint transition 7"; Open Percentage (%) vs. Samples.]
The parameters of the transfer function were modified as mentioned above in
order to validate the adaptive thresholding algorithm. Several example plots of the
algorithm working are presented below, with a description of the results following.
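The effect of modifying the transfer function parameters can be illustrated with a simplified stand-in. The sketch below assumes a second-order response with gain, natural frequency, damping ratio, and input delay, and it replaces the ARMA-based adaptive threshold with a fixed band around the nominal response; the function names, parameter values, band width, and step command are all hypothetical.

```python
import numpy as np

def valve_response(setpoint, gain, wn, zeta, delay, dt=0.01):
    """Simulate x'' + 2*zeta*wn*x' + wn^2*x = gain*wn^2*u(t - delay) with Euler steps."""
    setpoint = np.asarray(setpoint, dtype=float)
    x = np.zeros(len(setpoint))
    v = 0.0                                    # valve velocity
    for k in range(1, len(setpoint)):
        u = setpoint[max(k - 1 - delay, 0)]    # delayed command (delay in samples)
        a = wn**2 * (gain * u - x[k - 1]) - 2.0 * zeta * wn * v
        v += a * dt
        x[k] = x[k - 1] + v * dt
    return x

def count_faults(measured, reference, margin):
    """Count samples above and below a fixed band around the reference response."""
    upper = int(np.sum(measured > reference + margin))
    lower = int(np.sum(measured < reference - margin))
    return upper, lower

# Hypothetical step command: closed for 1 s, then commanded fully open.
command = np.concatenate([np.zeros(100), np.full(2900, 100.0)])
nominal = valve_response(command, gain=0.98, wn=2.0, zeta=0.98, delay=2)
degraded = valve_response(command, gain=0.90, wn=1.7, zeta=0.85, delay=4)

nominal_faults = count_faults(nominal, nominal, margin=5.0)
degraded_faults = count_faults(degraded, nominal, margin=5.0)  # settles near 90 %, below the band
```

With the reduced gain the degraded valve settles near 90 % open, outside the band around the nominal 98 %, so samples are flagged against the lower threshold throughout the steady state.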
Figure 72 - Set point transition #1 with fault detection while operating in: (a) normal OS and (b) faulty OS.
[Panel (a): G: 0.98, Tw: 0.98, damping: 0.98, Ts: 2; fault identification: 0 faults. Panel (b): G: 0.9, Tw: 0.85, damping: 0.85, Ts: 4; fault identification: 276 faults. Each panel plots Percent Open (%) with measured values, upper threshold, and lower threshold, and Fault Detection, vs. Sample.]
In the first set point transition, the effects of a lower natural frequency and
damping ratio can be seen. The valve reacts normally as it ramps up to the setpoint, but
has difficulty reaching its steady state value. The algorithm was able to detect this using
the adaptive threshold until it reached a steady state point that was reasonably close to the
set point.
Figure 73 - Set point transition #2 with fault detection while operating in: (a) normal OS and (b) faulty OS.
[Panel (a): G: 0.98, Tw: 0.98, damping: 0.98, Ts: 2; fault identification: 0 faults. Panel (b): G: 0.9, Tw: 0.85, damping: 0.85, Ts: 4; fault identification: 254 faults.]
The second set point transition in Figure 73 again shows the effects of a degrading
valve that cannot reach its set point quickly enough and, when it does, overshoots the
value. This test shows fewer faults, but the faults are localized around the
transitional points of the test. This information could be vital to a test engineer by
providing knowledge of not only how, but also at what points in the test, the valve is failing.
Figure 74 - Set point transition #3 with fault detection while operating in: (a) normal OS and (b) faulty OS.
[Panel (a): G: 0.98, Tw: 0.98, damping: 1.08, Ts: 2; fault identification: 0 faults. Panel (b): G: 0.9, Tw: 0.9, damping: 0.95, Ts: 5; fault identification: 776 faults.]
Although every valve has an input delay between the time it receives a signal and the time
it actually moves, this delay can increase as the health of the valve decreases and cause
undesirable behavior. In Figure 74 (a), it can be seen that the valve and threshold react in
the same reasonable time frame; in Figure 74 (b), however, the valve reacts more slowly,
which causes a fault to be detected in the valve.
Figure 75 - Set point transition #4 with fault detection while operating in: (a) normal OS and (b) faulty OS.
[Panel (a): G: 0.98, Tw: 0.98, damping: 1.03, Ts: 3; fault identification: 1267 faults. Panel (b): G: 0.9, Tw: 1.15, damping: 0.85, Ts: 5; fault identification: 2864 faults.]
Figure 75 shows how the algorithm would need to be tuned based on a fit parameter
in order to get perfect accuracy. This set point time series was very fast, with little steady
state time between the transitional periods. This is not a common operating procedure on
the test stands at NASA-SSC, but it is still useful to see how a valve will operate during an
emergency shutdown. Also, as the gain parameter is lowered to 0.9, the valve is unable to
reach fully open. This can be caused by excessive wear, transition friction, or a power
failure in the control systems.
Figure 76 - Set point transition #5 with fault detection while operating in: (a) normal OS and (b) faulty OS.
[Panel (a): G: 0.98, Tw: 0.98, damping: 1.08, Ts: 2; fault identification: 76 faults. Panel (b): G: 1.1, Tw: 0.9, damping: 0.85, Ts: 4; fault identification: 754 faults.]
In Figure 76, an issue with the algorithm can be seen: initialization effects can
cause false positives in the valve. This would not normally be a problem, as the
algorithm would be running continually, but if the framework were to be turned on in the
middle of a test, this could cause some false positives in the valve's health analysis. Also,
in Figure 76 (b), the effects of a large gain parameter coupled with a low damping ratio
can be seen as large oscillations at the top of the transitional period. These effects
are seen repeatedly as the valve is suddenly closed and not given time to reach its steady
state value. This effect continues much longer than in the previous tests due to the
continually changing control variable.
Figure 77 - Set point transition #6 with fault detection while operating in: (a) normal OS and (b) faulty OS.
[Panel (a): G: 0.98, Tw: 0.98, damping: 0.98, Ts: 2; fault identification: 0 faults. Panel (b): G: 0.9, Tw: 0.9, damping: 0.9, Ts: 4; fault identification: 0 faults.]
Figure 77 shows a type of transition where the valve's degradation cannot be seen due
to the long and steady rise of the valve's control variable. Because most of the valve's
problems are exposed during fast transition times, the algorithm is unable to detect the
difference between a normally operating valve and a faulty valve. While the
algorithm is not detecting the failing health of the valve, it is also not producing false
positives that would cause a valve to be repaired unnecessarily.
Figure 78 – Average fault values for different parameters of the ARMA model thresholding method over all tests.
[Panels: "Gain", "Natural Frequency", "Damping Coefficient", and "Input Delay"; Avg. Number of Faults vs. Parameter value.]
It can be seen in Figure 78 that as the parameters of the transfer function are
modified, the average number of faults increases as the valve's physical parameters
degrade. While there are faults in the nominal region, there is an increasing trend in all
the variables as the boundaries are exceeded.
To further validate the fault detection algorithm in the scope of the Intelligent
Valve framework, data was taken from the E-Complex test stand's simulation lab using
PLCs from NASA-SSC. A similar transfer function was used to model the valve's
response; however, the PID controller was controlled by an Allen-Bradley PLC, which is
used in the E-Complex test stand. By reducing the gain parameter, a simulated
obstruction or power failure can be injected into the feedback signal of the valve. The
same control signal was used for the test and training data to show how the valve's
response changes based on its parameters. Figure 79 shows the adaptive threshold on the
training data and Figure 80 shows the results of the simulated obstruction fault.
Figure 79 - Training data with final threshold fit.
[Plot "Adaptive Threshold Training Data": Percentage Closed (%) vs. Time (s), showing upper values, lower values, and actual values.]
Figure 80 - Fault detection of simulated obstruction fault using adaptive thresholding.
[Top plot "Valve Feedback with Simulated Obstruction": Percentage Closed (%) vs. Elapsed Time (s), showing upper threshold, lower threshold, and measured valve feedback; bottom plots "Upper Faults" and "Lower Faults": Fault Detected vs. Elapsed Time (s).]
The adaptive threshold was able to detect when the valve was unable to match the
control signal. Since all of the faults were detected by the lower threshold, the diagnosis
can be narrowed to causes such as an obstruction or a power fault. If both lower and upper
faults were found, then the data would need to be analyzed further by domain experts to
determine the correct maintenance for the system. Also, there are several false positives
detected between t = 111 s and t = 115 s; however, each of these faults exists for only one
time step, which can be accounted for by the error in the ARMA models.
The adaptive threshold algorithm has been validated using a forward analytical
model to detect degradation in the LLAV as well as actual data from hardware and
simulations from NASA-SSC's E-complex test stand. Since no fault data from the LLAV
was available, the parameters of the transfer function were used to model how the valve
would react to its given input. The algorithm provides a fit parameter that can be used
to define a range of values that the test engineer considers nominal while maintaining
the quality of performance required in such a critical environment. The algorithm
showed that it could detect faults amongst various transitions that would be commonly
seen in NASA-SSC test stands. There is a large difference in the number of faults
detected between the nominal parameters and fault parameters. If the faults detected by
the algorithm are trended between tests, the trend lines will show when a valve begins to
become faulty.
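Trending the detected faults between tests can be illustrated with a simple least-squares slope. The sketch below is only an assumption about how such a trend might be computed; the function name `fault_trend`, the fault counts and the alert level are hypothetical.

```python
def fault_trend(fault_counts):
    """Least-squares slope of fault count versus test index (faults per test)."""
    n = len(fault_counts)
    if n < 2:
        raise ValueError("need at least two tests to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(fault_counts) / n
    num = sum((i - mean_x) * (c - mean_y) for i, c in enumerate(fault_counts))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical fault counts from five consecutive tests of one valve.
counts = [2, 3, 3, 6, 9]
slope = fault_trend(counts)
degrading = slope > 1.0  # illustrative alert level, not a value from this work
```

A positive and growing slope would indicate that the valve is producing more threshold violations from test to test.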
A drawback of this method is that it is data-driven and, therefore, requires
previous data from the valve to develop the ARMA models required to create the
adaptive threshold. The advantage of this method is that the data required to calculate the
coefficients is all normal operating data rather than fault data. Another drawback is
the lack of an optimization parameter for the fit equation. If one could be developed, the
number of ARMA models needed could be optimized to reduce the computational load of the
algorithm.
4.5 Valve Statistics
The valve operating statistics are used to advise the fault diagnosis after a failure mode
has been detected by the previous algorithms. The operating statistics captured from
historical data for several test runs of two LLAVs are shown in Table 28. These statistics
are presented to the operators so that they can investigate negative trends in the system's
behavior and make more accurate maintenance decisions with an understanding of the
valve's operating history.
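Counters of this kind can be accumulated directly from the valve position feedback. The following Python sketch is illustrative only: the function name `operating_stats`, the tolerance `tol` and the exact definitions of a transition, a direction change and a closing are assumptions made for the example and are not taken from the framework's implementation.

```python
def operating_stats(position, closed_at=0.0, tol=0.5):
    """Summarize a valve position trace (percent open) into simple counters."""
    distance = 0.0
    direction_changes = 0
    transitions = 0
    closings = 0
    last_dir = 0      # +1 opening, -1 closing, 0 not yet moved
    moving = False
    for prev, cur in zip(position, position[1:]):
        step = cur - prev
        if abs(step) <= tol:          # treat small changes as "at rest"
            moving = False
            continue
        distance += abs(step)
        direction = 1 if step > 0 else -1
        if not moving:
            transitions += 1          # a new movement started from rest
        if last_dir != 0 and direction != last_dir:
            direction_changes += 1
        if cur <= closed_at + tol and prev > closed_at + tol:
            closings += 1             # valve just reached the closed position
        last_dir = direction
        moving = True
    return {
        "transitions": transitions,
        "distance": distance,
        "direction_changes": direction_changes,
        "closings": closings,
    }


# Hypothetical trace: open fully, close fully, then open again.
trace = [0, 0, 50, 100, 100, 50, 0, 0, 100]
stats = operating_stats(trace)
```

Because the counters depend only on consecutive samples, they can be updated as each new sample arrives and persisted between tests.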
Table 28 - Operating Statistics for LLAV
Name
Transitions
Cryogenic
Transitions
Distance
Traveled
Transition
Time
Average
Transition
Time
Direction
Changes
Number
Closings
10A23
10A24
25
17
33
35
15
14
14.5
13.2
13.78
15
13
20
12
25
4.6 Health Visualizations
The data is visualized using a 3D model to show the different operating conditions of the
LLAV. Using drafted design documents, each of the valve components was modeled
in Autodesk 3D Studio Max. The valves were designed and animated to allow for an
exploded view or a cross-sectional view during operation. When operating, the
visualization would display the direction of flow with a series of arrows. Green indicated
that the valve was open, while red indicated that it was closed. The frost point was
visualized through the use of a shader program that would display the frost height
through an icy bitmap texture, which would slowly replace the normal metal appearance
as the frost continued to migrate up the valve. Each of these visualizations can be seen in
Figure 81, Figure 82, and Figure 83.
Figure 81 - Frost line visualization of LLAV.
Figure 82 - Cross sectional and exploded view with flow and position visualizations.
Figure 83 - Frost line visualization of LLAV with thermocouple values.
4.7 Prognostics
While no prognostics were performed to determine the remaining useful life of the
LLAV, several techniques were investigated for future consideration of this task.
4.8 Prognostics Data
In order to test the feasibility of using these techniques in prognostics, the following
data was used to determine their performance under different environments.
4.8.1 Canonical Data
To perform simple validation of the AR, ARMA and Kalman filter models, canonical time
series data was generated. A linear equation using mean and variance was used to produce
multiple test series. Additive white Gaussian noise was then added in order to see how well
each technique could perform under harsh environmental conditions. Examples of these
time series can be seen in Figure 84 and Figure 85.
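A canonical series of this kind can be generated with the Python standard library. The sketch below is an interpretation, not the generator used in this work: it assumes the series is a linear ramp plus a Gaussian term with the stated mean and variance, and that the AWGN level is set from the average signal power to reach a requested SNR. The function names and the fixed seeds are hypothetical.

```python
import math
import random


def linear_series(n, slope=1.0, mean=0.0, variance=1.0, seed=0):
    """Linear ramp plus a Gaussian term with the given mean and variance."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    return [slope * t + rng.gauss(mean, sd) for t in range(n)]


def add_awgn(signal, snr_db, seed=1):
    """Add white Gaussian noise scaled to the requested SNR (in dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sd = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sd) for x in signal]


clean = linear_series(100, variance=1.0)   # 0 mean, 1 variance case
noisy = add_awgn(clean, snr_db=25)         # mild noise; -5 dB would be harsh
```

Sweeping `snr_db` over a range and repeating with `variance=10.0` gives the family of test series needed for a performance surface.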
[Plot: Time series with 0 mean and 1 variance. Axes: Amplitude vs. Time (s).]
Figure 84 - Linear equation with 0 mean and 1 variance.
[Plot: Time series with 0 mean and 10 variance. Axes: Amplitude vs. Time (s).]
Figure 85 - Linear time series with 0 mean and 10 variance
4.8.2 LLAV Data
To test the prognostic methods on actual test data, the data from the LLAV was used.
This presented a simple case with an input and an output that could show how well
the techniques could predict into the future of a time series. This data was presented
earlier in the sensor validation section and in Figure 46.
4.9 Prognostic Performance
The first time series was based on a 0 mean and 1 variance time signal. The models were
tested with the number of prediction steps ranging from 1 to 25 and the SNR of the AWGN
ranging from -5 to 25 dB. The MSE was measured and plotted to gauge performance. The
results can be seen in the following figures:
[Plot: Time series with 0 mean and 1 variance. Axes: Amplitude vs. Time (s).]
Figure 86 - Original model time series.
[Plot: AR prediction at 1 prediction step and SNR = 25 dB. Axes: Amplitude vs. Time (s). Series: AR Prediction, Actual Signal.]
Figure 87 - AR prediction of first time signal at 1 prediction step and SNR = 25dB.
[Plot: AR prediction at 5 prediction steps and SNR = 25 dB. Axes: Amplitude vs. Time (s). Series: AR Prediction, Actual Signal.]
Figure 88 - AR prediction of first time signal at 5 prediction steps and SNR = 25dB.
[Plot: AR prediction at 5 prediction steps and SNR = -5 dB. Axes: Amplitude vs. Time (s). Series: AR Prediction, Actual Signal.]
Figure 89 - AR prediction of first time signal at 5 prediction steps and SNR = -5dB.
[Surface plot: Prediction performance of AR model for the 0 mean, 1 variance signal. Axes: MSE vs. SNR (dB) and Prediction Steps.]
Figure 90 - AR MSE performance on 0 mean, 1 variance signal.
[Plot: ARMA prediction at 1 prediction step and SNR = 25 dB. Axes: Amplitude vs. Time (s). Series: ARMA Prediction, Actual Signal.]
Figure 91 - ARMA prediction of first time signal at 1 prediction step and SNR = 25dB.
[Plot: ARMA prediction at 1 prediction step and SNR = -5 dB. Axes: Amplitude vs. Time (s). Series: ARMA Prediction, Actual Signal.]
Figure 92 - ARMA prediction of first time signal at 1 prediction step and SNR = -5dB.
[Plot: ARMA prediction at 5 prediction steps and SNR = -5 dB. Axes: Amplitude vs. Time (s). Series: ARMA Prediction, Actual Signal.]
Figure 93 - ARMA prediction of first time signal at 5 prediction steps and SNR = -5dB.
[Surface plot: Prediction performance of ARMA model for the 0 mean, 1 variance signal. Axes: MSE vs. SNR (dB) and Prediction Steps.]
Figure 94 - ARMA MSE performance on 0 mean, 1 variance signal.
[Plot: Kalman prediction at 1 prediction step and SNR = 25 dB. Axes: Amplitude vs. Time (s). Series: Kalman Prediction, Actual Signal.]
Figure 95 - Kalman filter prediction of first time signal at 1 prediction step and SNR = 25dB.
[Plot: Kalman prediction at 5 prediction steps and SNR = 25 dB. Axes: Amplitude vs. Time (s). Series: Kalman Prediction, Actual Signal.]
Figure 96 - Kalman filter prediction of first time signal at 5 prediction steps and SNR = 25dB.
[Plot: Kalman prediction at 5 prediction steps and SNR = -5 dB. Axes: Amplitude vs. Time (s). Series: Kalman Prediction, Actual Signal.]
Figure 97 - Kalman filter prediction of first time signal at 5 prediction steps and SNR = -5dB.
[Surface plot: Prediction performance of Kalman filter model for the 0 mean, 1 variance signal. Axes: MSE vs. SNR (dB) and Prediction Steps.]
Figure 98 – Kalman filter MSE performance on 0 mean, 1 variance signal.
As can be seen in the figures, as the number of prediction steps increases, the accuracy
of the model decreases. This is true in both models, which is to be expected as they both
use the same general approach to predicting future time series values. However, under
significant noise, the ARMA model performs much better. This is due to the
additional coefficients that calculate a moving average of the white noise. The signal was
changed by increasing the variance to 10, with the following results:
[Plot: Time series with 0 mean and 10 variance. Axes: Amplitude vs. Time (s).]
Figure 99 - Original time series model #2.
[Surface plot: Prediction performance of AR model for the 0 mean, 10 variance signal. Axes: MSE vs. SNR (dB) and Prediction Steps.]
Figure 100 – AR MSE performance on 0 mean, 10 variance signal.
[Surface plot: Prediction performance of ARMA model for the 0 mean, 10 variance signal. Axes: MSE vs. SNR (dB) and Prediction Steps.]
Figure 101 - ARMA MSE performance on 0 mean, 10 variance signal.
[Surface plot: Prediction performance of Kalman filter for the 0 mean, 10 variance signal. Axes: MSE vs. SNR (dB) and Prediction Steps.]
Figure 102 - Kalman filter performance on 0 mean, 10 variance signal.
The results of this test are similar to those of the previous time series, with some
changes in the performance of the ARMA model. The two linear regression models had
a baseline error much higher than in the first test due to the high variance in the
signal; however, the Kalman filter stayed relatively constant through both tests. The
ARMA model performs the worst, which is due to its cancellation of noise through the
extra coefficient. The ARMA model is actually smoothing the estimate too much because
the high variance was treated as noise. The Kalman filter performed more consistently in
this test than in the previous one, and was still the best of the three.
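The multi-step prediction and MSE measurement used in these tests can be sketched for the AR case as follows. This is a minimal illustration under stated assumptions rather than the code used to produce the figures: the model has no intercept term, the coefficients are found by ordinary least squares through the normal equations, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar(series, order):
    """Least-squares coefficients a_1..a_p for x[t] = sum_k a_k * x[t-k]."""
    rows = [[series[t - k] for k in range(1, order + 1)]
            for t in range(order, len(series))]
    targets = series[order:]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(order)]
           for i in range(order)]
    atb = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve(ata, atb)


def predict(history, coeffs, steps):
    """Iterate the AR recursion `steps` samples past the end of `history`."""
    buf = list(history)
    for _ in range(steps):
        buf.append(sum(a * buf[-k] for k, a in enumerate(coeffs, start=1)))
    return buf[len(history):]


def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)


# Noise-free ramp: an AR(2) model reproduces it with x[t] = 2*x[t-1] - x[t-2].
ramp = [float(t) for t in range(50)]
coeffs = fit_ar(ramp[:40], order=2)
forecast = predict(ramp[:40], coeffs, steps=5)
error = mse(forecast, ramp[40:45])
```

Repeating the last three lines over a grid of prediction steps and noise levels yields an MSE surface of the kind plotted in the performance figures.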
The real benefit of prognostics can be seen when a process or output variable of a
system can be predicted based on the measurable input variables of the system. One
instance of this is using a valve's control variable to predict its output variable. If the
process variable can be predicted several time steps into the future, a disaster can be
averted by performing health analysis algorithms on the future state of the valve. In order
to perform this type of calculation, the techniques mentioned above must be extended to
account for an input variable. In the AR and ARMA models, an external input is added
with another vector of coefficients that must be calculated. The Kalman filter adds
another state vector to the time update equation to account for this type of prediction.
These three methods were applied to LLAV data provided by NASA-SSC. As in the
previous tests, AWGN was added to the signal to test the robustness of the techniques in
harsh conditions. The results can be seen in Figure 103.
[Plot: 30 step prediction of process variable using ARX model. Axes: Percentage Open (%) vs. Time (s). Series: Previous Process, Predicted Process, Actual Process, Control Input.]
Figure 103 - ARX prediction of the LLAV data to 30 time steps.
[Surface plot: Prediction performance of ARX model for LLAV signal. Axes: RMSE vs. Prediction Steps and SNR.]
Figure 104 - Performance for ARX model based on LLAV data.
[Plot: 30 step prediction of process variable using ARMAX model. Axes: Percentage Open (%) vs. Time (s). Series: Previous Process, Predicted Process, Actual Process, Control Input.]
Figure 105 - ARMAX prediction of the LLAV data to 30 time steps.
[Surface plot: Prediction performance of ARMAX model for LLAV signal. Axes: RMSE vs. Prediction Steps and SNR.]
Figure 106 - Performance for ARMAX model based on LLAV data.
[Plot: 30 step prediction of process variable. Axes: Percentage Open (%) vs. Time (s). Series: Previous Process, Predicted Process, Actual Process, Control Input.]
Figure 107 – Kalman prediction of the LLAV data to 30 time steps.
[Surface plot: Prediction performance for LLAV signal. Axes: RMSE (x 10^4) vs. Prediction Steps and SNR.]
Figure 108 - Performance for Kalman filter based on LLAV data.
All three algorithms were able to predict the output of the process variable
reasonably well, even out to 30 time steps. This is due to the presence of an input
control variable that gives the predictors better context for how the valve will respond
in future states. The results are similar to the canonical results in that the error of
each technique depends directly on the SNR and on the number of prediction steps.
The Kalman filter was the least consistent as the process and measurement noise are both
modeled by constant vectors with previous knowledge of the noise covariance. This
prognostic process, used in conjunction with the adaptive threshold method developed
above, could provide valuable seconds to the test engineers at NASA-SSC to make
determinations of test operations in the E-complex test stand.
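Prediction with an external input can be illustrated with a first-order ARX model. The sketch below is hypothetical: it assumes the form y[t] = a*y[t-1] + b*u[t-1], fits a and b by least squares, and uses a simulated first-order valve response rather than the LLAV data. The names `fit_arx` and `predict_arx` are illustrative.

```python
def fit_arx(y, u):
    """Least-squares fit of y[t] = a*y[t-1] + b*u[t-1] (2x2 normal equations)."""
    idx = range(1, len(y))
    syy = sum(y[t - 1] * y[t - 1] for t in idx)
    suu = sum(u[t - 1] * u[t - 1] for t in idx)
    syu = sum(y[t - 1] * u[t - 1] for t in idx)
    ry = sum(y[t - 1] * y[t] for t in idx)
    ru = sum(u[t - 1] * y[t] for t in idx)
    det = syy * suu - syu * syu
    a = (ry * suu - ru * syu) / det
    b = (ru * syy - ry * syu) / det
    return a, b


def predict_arx(y_last, future_u, a, b):
    """Roll the model forward; future_u[0] is the command at the last known sample."""
    out, y = [], y_last
    for u in future_u:
        y = a * y + b * u
        out.append(y)
    return out


# Simulated valve: command steps from 0 % to 100 % open at sample 5.
u = [0.0] * 5 + [100.0] * 55
y = [0.0]
for t in range(1, len(u)):
    y.append(0.8 * y[t - 1] + 0.2 * u[t - 1])

a, b = fit_arx(y[:30], u[:30])                  # train on the first 30 samples
forecast = predict_arx(y[29], u[29:59], a, b)   # predict the next 30 samples
```

Because the planned control input is known ahead of time, the forecast can follow the commanded transition rather than extrapolating from past output alone.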
The ARX model is the simplest of all the techniques and performs well on
systems with relatively low noise. Its simplicity makes it the lowest in both
computational and memory costs, which can save valuable resources on mission critical
devices if a large number of valves are being monitored. The ARMAX model provides a
way to estimate the measurement and process noise through an additional coefficient that
calculates the error of the system as a moving average of white noise. The ARMAX and
ARX models are both data driven in that they require historical data to
calculate their coefficients. In the tests performed in this research, the determination of
the coefficients was done quickly and with small amounts of training data, and the ARX
and ARMAX models were still able to perform well in the prognosis tests. A significant
drawback of these models is their inability to incorporate physical mechanisms into their
equations. They are both mathematical models with no relation to the real world. The
Kalman filter, as well as other state-space models, provides the ability for real world
processes to be described by an internal state vector. This state vector is continually
updated throughout the prognosis process to minimize the state error covariance through
measurement and time update equations. The drawback of the Kalman filter is that the
parameters must be tuned, which requires knowledge of the physics of the system. Also,
initial values are needed at the start of the algorithm to ensure optimal results. The
Kalman filter is the most computationally intensive, but least memory intensive, as it only
relies on the current sensor data point, which can be discarded after the measurement
update has been performed.
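The time update and measurement update cycle can be sketched with a scalar Kalman filter that includes a control input. This is an illustrative example only: the class name `ScalarKalman`, the model x[t] = a*x[t-1] + b*u[t-1], and the values chosen for the process noise `q`, the measurement noise `r` and the initial state are assumptions, not the tuned parameters used in this work.

```python
class ScalarKalman:
    """Scalar Kalman filter for x[t] = a*x[t-1] + b*u[t-1] + w, z[t] = x[t] + v."""

    def __init__(self, a, b, q, r, x0=0.0, p0=1.0):
        self.a, self.b = a, b      # state transition and input gain
        self.q, self.r = q, r      # process and measurement noise variances
        self.x, self.p = x0, p0    # state estimate and its error covariance

    def time_update(self, u):
        self.x = self.a * self.x + self.b * u
        self.p = self.a * self.p * self.a + self.q

    def measurement_update(self, z):
        gain = self.p / (self.p + self.r)
        self.x += gain * (z - self.x)
        self.p *= (1.0 - gain)

    def predict_ahead(self, future_u):
        """Apply only the time update to look several steps ahead."""
        x, out = self.x, []
        for u in future_u:
            x = self.a * x + self.b * u
            out.append(x)
        return out


# Simulated noise-free valve response to a step command at sample 5.
u = [0.0] * 5 + [100.0] * 25
y = [0.0]
for t in range(1, len(u)):
    y.append(0.8 * y[t - 1] + 0.2 * u[t - 1])

kf = ScalarKalman(a=0.8, b=0.2, q=0.01, r=1.0)
for t in range(1, len(u)):
    kf.time_update(u[t - 1])
    kf.measurement_update(y[t])    # each measurement can be discarded after use
forecast = kf.predict_ahead([100.0] * 10)
```

Only the current estimate and its covariance are stored between samples, which reflects the low memory requirement described above.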
4.10 Diagnostic Process
In order to make the framework practical for use by NASA-SSC engineers, the health
data must be displayed efficiently on the control computers. The software used by the
control engineers, WonderWare InTouch, allows developers to expand its
functionality through the use of Microsoft’s ActiveX modules and .NET
controls. The control computers each have four monitors that give the control engineers
ample screen real estate to monitor the test stands during test article firings. Through the
software framework described in the approach section, a process was designed and
implemented, with the design constraints in mind, that provides the data necessary to
perform health analysis and visualize the health data of the LLAV.
The .NET module accomplished the tasks mentioned above by creating a tabbed
control that provides test operations with information required to make intelligent
maintenance decisions. A tabbed control was selected because it lowers the footprint the
module will have on control computer screens, while still allowing extensibility in the
future. The first tab contained the historical context of the valve by displaying crucial
operating statistics which are continuously monitored by the module. These values are
stored in a MS-SQLCE database in order to create a persistent record of the events of the
valve. The statistics tab can be seen in Figure 109.
Figure 109 - Intelligent Valve statistics tab.
The second tab, Figure 110, demonstrates the ability to track the frost line of the
valve. The method used to track the frost line of a valve will be discussed in the thermal
modeling portion of the Prototype Diagnostics section of this report. This tab allows a
test operator to quickly see all the thermocouples that are attached to a valve, as well as
their current health status. The flagged attribute of the thermocouple is determined by
either a percentage or absolute threshold designated by the test operator in the setup tab.
It also shows the current position, control, and open time of the valve. Each valve can be
selected from a drop down menu to see the current status of the valve. A 2D view is
provided so when the user clicks on a thermocouple in the list view, the position is shown
by a red box. This gives context to the position of the thermocouple in relation to the
total length of the valve.
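The flagging rule for a thermocouple can be illustrated as follows. The sketch assumes that the measured temperature is compared with an expected value (for example, one supplied by a model) and that the operator selects either an absolute or a percentage threshold. The function name `is_flagged`, the `mode` strings and the sample values are hypothetical.

```python
def is_flagged(measured, expected, threshold, mode="absolute"):
    """Return True when the deviation from the expected value exceeds the threshold."""
    deviation = abs(measured - expected)
    if mode == "absolute":
        return deviation > threshold
    if mode == "percent":
        if expected == 0:
            return deviation > 0
        return 100.0 * deviation / abs(expected) > threshold
    raise ValueError("mode must be 'absolute' or 'percent'")


# Hypothetical readings for one thermocouple.
ok_absolute = is_flagged(105.0, 100.0, threshold=10.0)               # 5 units off
bad_percent = is_flagged(105.0, 100.0, threshold=4.0, mode="percent")  # 5 % off
```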
Figure 110 - Intelligent Valve thermocouple tab.
The final functional tab allows the test operator to add, modify, reset, and delete
valves. It also provides functionality for adding, modifying, and deleting thermocouples
from the valve, and finally the ability to add and delete DDE data servers. This feature
enables test operators to change between setups, while still keeping persistent tracking of
the valve statistics. Also, the test operator can specify a data folder where the raw
measurement data is stored in another MS-SQLCE database. Future health analysis
algorithms can be developed, tested and validated on this data. Figure 111 shows the
setup tab.
Figure 111 - Intelligent Valve setup tab.
Chapter 5: CONCLUSIONS
ISHM capabilities can provide significant benefits for ground-based spacecraft
monitoring and control and ultimately can be adapted to provide on-board support for
spacecraft. Progressive development and demonstration of key ISHM architectural
elements requires that key propulsion components be adequately modeled and supported
with high-performance anomaly detection algorithms. It is also important that the
integration of the model within an ISHM framework be supported with useful user
interfaces that maximize the selectivity and utility of the ISHM output in order to obtain
the intended benefits.
5.1 Summary of Accomplishments
The objectives of this thesis are revisited below, and the solutions proposed to address
each of the problems identified in this research work are summarized.
1. To design a framework for the detection of faults and failure modes in the large
linear actuated valves that are used on the rocket engine test stands at NASA-SSC.
An Intelligent Valve framework was designed using domain expert knowledge to
identify the key faults and failure modes in the LLAV. A FMECA was performed, as
seen in Section 3.1, to focus efforts on the most critical problems with the valves. Once
this knowledge had been acquired, a diagnostic process and algorithms could be
developed to detect these faults and failure modes.
2. To develop a diagnostic process that –
a. Receives and stores incoming sensor data;
b. Performs calculation of operating statistics;
c. Compares with existing analytical models; and,
d. Visualizes faults, failures, and operating conditions in a 3D GUI environment.
The diagnostic process was developed with an interface that could be easily
expanded in the future. The DDE protocol and a SQL database (Section 3.4) were used to
receive and store incoming sensor data in an efficient manner that could be easily
annotated by the diagnostic algorithms. In order to give maintenance personnel historic
context of the valve's operation, an algorithm was developed (Section 3.2.4) to capture
key operating statistics throughout the valve's lifespan. A thermal analytic model
(Section 3.2.6) was developed by NASA engineers and implemented in the Intelligent
Valve framework. A 3D environment was developed using advanced visualization
techniques to show faults, failures, and operating statistics, which can be seen in
Section 4.6.
3. To develop a suite of diagnostic algorithms that can detect anomalous behavior in
the valve and other system components of the rocket engine test stand.
A suite of diagnostic algorithms was developed that detects various anomalous
behaviors in the LLAV and other system components. The suite comprises a sensor
validation algorithm using auto-associative neural networks (Section 4.3), an adaptive
thresholding method to detect degradation in valve parameters (Section 4.4), and a
thermocouple fault detection algorithm using the thermal analytical model developed by
NASA engineers (Section 4.2.1). These fault detection algorithms, coupled with the
contextual information from the operating statistics, can help advise maintenance
personnel in their decisions to repair the valves.
4. To expand the capability of the diagnostic algorithm to perform prognosis in
specific context.
The diagnostic algorithms have been expanded with prediction in specific contexts.
In particular, AR models, ARMA models and Kalman filters were used to gauge the ability
to predict the process variable of a valve. These predicted values can be used by the
adaptive thresholding method to determine faults in a valve seconds before they occur. If
accurate enough, these seconds could be the difference between an emergency shutdown
and a catastrophe.
In this thesis, we have shown that a judicious combination of technologies,
namely, the DDE data transfer protocol, auto-associative neural networks, empirical and
physical models and virtual reality environments can be used to develop a diagnostic
procedure for assessing the integrity of rocket engine test stand components. We have
specifically focused on valves, because they are critical to the cryogen transport
mechanisms that are vital to test operations. This project is in the area of an identified
core competency at John C. Stennis Space Center; specifically in the technology focus
area of ISHM user interfaces. The project addressed the development of an effective
interface between the ISHM and its users in order to reduce information overload in the
typically crowded environments of complex system control rooms. We have designed,
developed and validated a user-interface that presents information related to the system
health and supports the user’s navigation through diagnostic scenarios with the ability to
extract and visualize the required system details.
5.2 Recommendations for Future Work
The state of the ISHM functional art is hampered by a number of factors; a major
constraint is the unavailability of intelligent process models that can provide the reasoned
determination of element condition based on the available data sources that feed the
ISHM architecture. One of the significant challenges is to develop realistic models for the
most common and problem-prone elements. Surprisingly, there are major gaps in our
understanding of how even fundamental elements (such as valves in a rocket engine test
stand) degrade and—more importantly—how to determine the remaining operational life
available from a valve or any other similar component. And, if an anomaly is detected,
what are the best means for providing a user with efficient tools to explore the nature of
the anomaly and its possible effects on the element as well as its relationship to overall
system state?
This thesis has addressed a part of the problem, by providing a framework for
diagnosing the integrity of a specific test-stand component – the large linear actuator
valve. The next steps in expanding this research work will involve the design,
development and validation of prognosis algorithms that can predict potential anomalies
in a reasonable time frame before they actually occur. This recognizes the fact that in a
test-stand environment, by the time a fault is diagnosed, it is usually too late to remedy
the problem. The subsequent addition of a prognosis module to the intelligent valve
model will allow test operations personnel to initiate “what if?” queries and enhance
the ability to perform a comprehensive risk analysis of every test procedure. The
combination of the analysis and prognosis algorithms can be used to arrive at a model
that can predict the remaining useful life of a test-stand component such as a valve;
making such predictions provides a significant capability enhancement to ISHM platforms.
The research work presented in this thesis expands upon prior ISHM frameworks
that utilize smart sensors by developing diagnostic tools that can track changing health
conditions in dynamic systems. This work has the potential to advance sensor data fusion
and integration to the degree required to achieve the benefits that are necessary to support
next-generation space exploration missions.
References
[1] J. Schmalzel, F. Figueroa, J. Morris, R. Polikar, and S. Mandayam, "An architecture
for intelligent systems based on smart sensors," IEEE Transactions on
Instrumentation and Measurement, vol. 54, no. 4, pp. 1612-1616, August 2005.
[2] G. Vachtsevanos, F. L. Lewis, M. Roemer, A. Hess, and B. Wu, Intelligent Fault
Diagnosis and Prognosis for Engineering Systems, 1st ed. Hoboken, United States
of America: John Wiley & Sons, Inc., 2006.
[3] D. Schrage, D. DeLaurentis, and K. Taggart, "FCS Study: IPPD Concept
Development Process for Future Combat Systems," Georgia Institute of Technology,
Atlanta, Georgia, AIAA MDO Specialists Meeting September 2002.
[4] NASA, "NASA Reliability Centered Maintenance (Rcm) Guide for Facilities and
Collateral Equipment," NASA, Maintenance Guide 2008.
[5] M. B. Mengel, W. L. Holleman, and S. A. Fields, Eds., Fundamentals of clinical
practice, 2nd ed. New York, United States of America: Kluwer Academic/Plenum
Publishers, 2002.
[6] J. K. Shim and J. G. Siegel, Handbook of financial analysis, forecasting and
modeling, 2nd ed. Chicago, United States of America: CCH Incorporated, 2004.
[7] NASA History Division. (2010, January) NASA History. [Online].
http://history.nasa.gov/
[8] NASA Ames Research Center. (2005, March) NASA - Design Principles for Robust
ISHM. [Online]. http://www.nasa.gov/centers/ames/research/technologyonepagers/design_principles.html
[9] F. Figueroa, R. Holland, J. Schmalzel, and D. Duncavage, "Integrated System Health
Management (ISHM): Systematic Capability," IEEE Sensors Application
Symposium, Houston, 2006, pp. 202-206.
[10] Pratt and Whitney Rocketdyne. (2010, January) J-2X. [Online].
http://www.pw.utc.com/Products/Pratt+&+Whitney+Rocketdyne/J-2X
[11] NASA. (2010, January) Propulsion Testing at NASA's John C. Stennis Space
Center. [Online]. http://www.nasa.gov/centers/stennis/pdf/372105main_FS-2008-1000071-SSC.pdf
[12] M. Currie, "Where did all the People Go? The New Case for Condition Monitoring,"
Chicago, 2006.
[13] M. Fargnoli, E. Rovida, and R. Troisi, "An example of a morphological matrix can
be seen ," The 4th International Conference on Axiomatic Design, Florence, 2006.
[14] Z. Fan and J. Ma, "An Approach to Multiple Attribute Decision Making Based on
Incomplete Information on Alternatives," Thirty-second Annual Hawaii
International Conference on System Sciences-Volume 6, vol. 6, Maui, 1999, p. 6041.
[15] T. Marchant et al., Evaluation and Decision Models - A Critical Perspective
(International Series in Operations Research and Management Science Volume 32).
Norwell, United States of America: Kluwer Academic Publishers, 2000.
[16] S. G. Arunajadai, Scott J. Uder, Robert B. Stone, and Irem Y. Tumer, "Failure Mode
Identification Through Clustering Analysis," Quality and Reliability Engineering
International, vol. 20, no. 5, pp. 511-526, April 2004.
[17] Society of Automotive Engineers, "Potential Failure Mode and Effects Analysis in
Design (Design FMEA), Potential Failure Mode and Effects Analysis in
Manufacturing and Assembly Processes (Process FMEA)," Automotive Quality And
Process Improvement Committee, Standard SAE J1739, 2009.
[18] FMEA-FMECA.com. (2009, August) FMEA / FMECA Information. [Online].
www.fmea-fmeca.com
[19] D. H. Stamatis, Failure mode and effect analysis: FMEA from theory to execution,
2nd ed., Pual O'Mara, Ed. Milwaukee, United States of America: William A. Tony,
2003.
[20] R. E. McDermott, J. Raymond Mikulak, and Michael R. Beauregard, The Basics of
FMEA, 2nd ed. New York, United States of America: Productivity Press, 2008.
[21] NASA Lewis Research Center, "Tools of Reliability Analysis: Introduction and
FMEAs," Cleveland, Presentation 2009.
[22] P. D. T. O'Connor, Practical Reliability Engineering, 4th ed. Hoboken, United
States of America: John Wiley & Sons Inc., 2002.
[23] C. Bunis [et al.], Design for Reliability, 1st ed., Dana Crowe and Alec Feinberg, Eds.
Lowell, United States of America: CRC, 2001.
[24] E. Crow, K. Reichard, J. Banks, and L. Weiss. (2005, February) Penn State Applied
Research Laboratory. [Online]. http://csrp.psu.edu/files/ishm2005/ishm_reichard.pdf
[25] A. Bayoumi et al. (2008, February) Condition-Based Maintenance at University of
South Carolina. [Online].
http://cbm.me.sc.edu/pubs/AHS1.pdf;http://cbm.me.sc.edu/pubs/AHS3.pdf
[26] A. Bandes, "What You Need to Know About Ultrasound CBM," Pumps & Systems,
pp. 60-61, December 2006.
[27] T. Wireman, Computerized Maintenance Management Systems, 2nd ed. New York,
United States of America: Industrial Press, 1994.
[28] University of South Carolina. (2009, February) College of Engineering and
Computing Condition-Based Maintenance. [Online]. http://cbm.me.sc.edu/pubs.html
[29] S. X. Ding, Model-based fault diagnosis techniques: design schemes, algorithms,
and tools. Berlin, Germany: Springer-Verlag, 2008.
[30] H. Park, W. Pedrycz, and S. Oh, "Granular Neural Networks and Their Development
Through Context-Based Clustering and Adjustable Dimensionality of Receptive
Fields," IEEE TRANSACTIONS ON NEURAL NETWORKS, vol. 20, no. 10, pp.
1604-1616, October 2009.
[31] G. M. Davis, Ed., Noise Reduction in Speech Applications. Boca Raton, United
States of America: CRC, 2002.
[32] E. Micheli-Tzanakou, Ed., Supervised and Unsupervised Pattern Recognition:
Feature Extraction and Computational Intelligence. Boca Raton, United States of
America: CRC Press LLC, 2000.
[33] I. G. et al., Eds., Feature Extraction: Foundations and Applications. Berlin,
Germany: Springer-Verlag, 2006.
[34] L. Ljung, System Identification: Theory for the User, 2nd ed. Upper Saddle River,
United States of America: Prentice Hall PTR, 2007.
[35] V. Puig, J. Quevedo, T. Escobet, F. Nejjari, and S. de las Heras, "Passive Robust
Fault Detection of Dynamic Processes Using Interval Models," IEEE
TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, vol. 16, no. 5, pp.
1083-1089, September 2008.
[36] H. Bassily, R. Lund, and W. John, "Fault Detection in Multivariate Signals With
Applications to Gas Turbines," IEEE TRANSACTIONS ON SIGNAL PROCESSING,
vol. 57, no. 3, pp. 835-842, March 2009.
[37] C. H. Lo, Eric H. K. Fung, and Y. K. Wong, "Intelligent Automatic Fault Detection
for Actuator Failures in Aircraft," IEEE TRANSACTIONS ON INDUSTRIAL
INFORMATICS, vol. 5, no. 1, pp. 50-55, February 2009.
[38] G. Spitzlsperger, C. Schmidt, G. Ernst, H. Strasser, and M. Speil, "Fault Detection
for a Via Etch Process Using Adaptive Multivariate Methods," IEEE
TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, vol. 18, no. 4, pp.
528-533, November 2005.
[39] W. R. A. Ibrahim and M. M. Morcos, "An Adaptive Fuzzy Self-Learning Technique
for Prediction of Abnormal Operation of Electrical Systems," IEEE Transactions
on Power Delivery, vol. 21, no. 4, pp. 1770-1777, October 2006.
[40] S. Huang and K. K. Tan, "Fault Detection and Diagnosis Based on Modeling and
Estimation Methods," IEEE Transactions on Neural Networks, vol. 20, no. 5, pp.
872-881, May 2009.
[41] J. Yun, K. Lee, K. Lee, S. B. Lee, and J. Yoo, "Detection and Classification of Stator
Turn Faults and High-Resistance Electrical Connections for Induction Machines,"
IEEE Transactions on Industry Applications, vol. 45, no. 2, pp. 666-674,
March/April 2009.
[42] Financial Forecast Center, LLC. (2009, November) Financial Forecast Center Home
Page. [Online]. http://www.forecasts.org/
[43] A. Rodgers and A. Streluk, Forecasting the Weather, 2nd ed. Chicago, United States
of America: Reed Elsevier Inc., 2007.
[44] F. P. et al., "A Generic Prognostic Methodology Using Damage Trajectory Models,"
IEEE Transactions on Reliability, vol. 58, no. 2, pp. 277-285, June 2009.
[45] Z. Sun, J. Wang, D. Howe, and G. Jewell, "Analytical Prediction of the Short-Circuit
Current in Fault-Tolerant Permanent-Magnet Machines," IEEE Transactions on
Industrial Electronics, vol. 55, no. 12, pp. 4210-4217, December 2008.
[46] Y. Zhang et al., "Connected Vehicle Diagnostics and Prognostics, Concept, and
Initial Practice," IEEE Transactions on Reliability, vol. 58, no. 2, pp. 286-294, June
2009.
[47] M. Baybutt, C. Minnella, A. E. Ginart, P. W. Kalgren, and M. J. Roemer,
"Improving Digital System Diagnostics Through Prognostic and Health
Management (PHM) Technology," IEEE Transactions on Instrumentation and
Measurement, vol. 58, no. 2, pp. 255-262, February 2009.
[48] P. Lall, M. N. Islam, M. K. Rhim, and J. C. Suhling, "Prognostics and Health
Management of Electronic Packaging," IEEE Transactions on Components and
Packaging Technologies, vol. 29, no. 3, pp. 666-677, September 2006.
[49] S. K. Yang, "A Condition-Based Failure-Prediction and Processing-Scheme for
Preventive Maintenance," IEEE Transactions on Reliability, vol. 52, no. 3, pp.
373-383, September 2003.
[50] A. H. Al-Badi, S. M. Ghania, and E. F. EL-Saadany, "Prediction of Metallic
Conductor Voltage Owing to Electromagnetic Coupling Using Neuro Fuzzy
Modeling," IEEE Transactions on Power Delivery, vol. 24, no. 1, pp. 319-327,
January 2009.
[51] Society of Automotive Engineers, "Evaluation Criteria for Reliability-Centered
Maintenance (RCM) Processes," Standards Report SAE JA1011, 1998.
[52] M. Kramer, "Nonlinear Principal Component Analysis Using Autoassociative
Neural Networks," AIChE Journal, vol. 37, no. 2, pp. 233-243, February 1991.
[53] L. D. Mattern, C. L. Jaw, T. Guo, R. Graham, and W. McCoy, "Using Neural
Networks for Sensor Validation," 34th Joint Propulsion Conference, Cleveland,
1998.
[54] J. H. Lienhard IV and J. H. Lienhard V, A Heat Transfer Textbook, 3rd ed.
Cambridge, United States of America: Phlogiston Press, 2008.
[55] S. J. McPhee and M. Papadakis, Current Medical Diagnosis and Treatment 2009,
48th ed. New York, United States of America: McGraw-Hill Professional, 2009.
[56] D. Ruppert, Statistics and Finance: An Introduction, 1st ed., George Casella, Stephen
Fienberg, and Ingram Olkin, Eds. New York, United States of America:
Springer-Verlag, 2004.
[57] R. Mimick, M. Thompson, and S. W. William, Business Diagnostics 2005: Evaluate
And Grow Your Business. Victoria, Canada: Trafford, 2005.
[58] J. Schmalzel and F. Figueroa, "Rocket Testing and Integrated System Health
Management," Condition Monitoring and Control for Intelligent Manufacturing, D.
T. Pham, Ed. London, England: Springer London, 2006, ch. 15, pp. 373-391.