Vehicle Health Monitoring Using Stochastic Constraint Suspension

by

Christopher Rossi

B.S.E. Aerospace Engineering
University of Michigan, 2010

Submitted to the Department of Aeronautics and Astronautics
in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Aeronautics and Astronautics
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2012

© Christopher Rossi, 2012. All rights reserved.

The author hereby grants to MIT and Draper Laboratory permission to reproduce and to distribute
publicly paper and electronic copies of this thesis document in whole or in part.

Signature of Author:
Department of Aeronautics and Astronautics
May 24, 2012

Certified by:
Jeffrey A. Hoffman
Aeronautics and Astronautics
Thesis Supervisor

Certified by:
Russell Sargent
Member of the Technical Staff, Draper Laboratory
Thesis Supervisor

Accepted by:
Eytan H. Modiano
Professor of Aeronautics and Astronautics
Chair, Graduate Program Committee
Vehicle Health Monitoring Using Stochastic Constraint Suspension
by
Christopher Rossi
Submitted to the Department of Aeronautics and Astronautics
on May 24, 2012, in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Aeronautics and Astronautics
Abstract
Autonomous vehicle health monitoring (VHM) has been identified as a high priority technology
for future space exploration in NASA's 2012 technology roadmap. Traditional VHM
approaches are often designed for a specific application and are unable to detect and isolate a
wide variety of faults. Proposed methods are often too computationally complex for NASA's
manned flight software verification and validation (V&V) process. An innovative VHM
algorithm is presented that addresses these weaknesses by integrating the constraint suspension
technique with parity space and hypothesis testing. The approach relies on on-board sensor
measurements, knowledge of control commands, and a modular mathematical system model to
provide a VHM solution. Improvement over original constraint suspension is demonstrated
using conceptual and numerical examples. Feasibility of the VHM method on a spacecraft is
explored using a numerical simulation of a generic vehicle.
Thesis Supervisor: Jeffrey A. Hoffman
Title: Professor of the Practice of Aerospace Engineering
Thesis Supervisor: Russell Sargent
Title: Member of the Technical Staff, Draper Laboratory
Acknowledgements
This thesis would not have been possible without all of the individuals who provided
guidance and support throughout my two years at MIT. First and foremost, I would like to thank
my family and friends for their constant support and encouragement. Second, I am very grateful
to my MIT and Draper advisors, Professor Hoffman, Paul Huxel, Russell Sargent, and Louis
Breger, for donating an extensive amount of their time to guide my research. Many other Draper
employees also deserve recognition for their instrumental work on TALARIS and my thesis
research. Third, I'd like to acknowledge Bobby Cohanim and Phillip Cunio for their TALARIS
leadership and wide-ranging guidance outside of the project. Fourth, I feel very fortunate to
have been surrounded by such talented peers, especially everyone on the TALARIS project, who
I learned an incredible amount from. Finally, I'd like to thank Draper Laboratory and MIT for
generously funding me through graduate school.
Table of Contents

Chapter 1: Introduction ......................................................................................................... 15
1.1 Motivation ....................................................................................................................... 15
1.2 Literature Review ............................................................................................................ 16
1.3 Thesis Overview .............................................................................................................. 19
Chapter 2: Key Algorithms ................................................................................................... 21
2.1 Parity Space and Hypothesis Testing .............................................................................. 21
2.1.1 Parity Space .................................................................................................................. 22
2.1.2 Hypothesis Testing in FDI ............................................................................................ 23
2.2 Constraint Suspension ..................................................................................................... 25
2.2.1 Detection ...................................................................................................................... 25
2.2.2 Isolation ........................................................................................................................ 29
2.2.3 Merits and Limitations ................................................................................................. 31
Chapter 3: Stochastic Constraint Suspension ....................................................................... 33
3.1 Algorithm Description ..................................................................................................... 33
3.1.1 Uncertainty Propagation ............................................................................................... 35
3.1.2 Utilizing Hardware Redundancy .................................................................................. 35
3.2 Analytical Expected Performance ................................................................................... 36
3.3 Demonstration of Improvement ...................................................................................... 37
3.3.1 Conceptual .................................................................................................................... 38
3.3.2 Numerical ..................................................................................................................... 39
Chapter 4: Spacecraft Application ........................................................................................ 45
4.1 Application Overview ..................................................................................................... 45
4.2 Vehicle Simulation .......................................................................................................... 47
4.3 Constraint Models ........................................................................................................... 52
4.4 Performance Analysis ..................................................................................................... 60
4.4.1 System Level P(FA) ..................................................................................................... 61
4.4.2 Sensor Degradation ...................................................................................................... 62
4.4.3 Signal-to-Noise Sensitivity .......................................................................................... 63
4.4.4 System Level SCS and FT Comparison ....................................................................... 69
4.4.5 Embedded Processor Testing ....................................................................................... 73
Chapter 5: Summary and Future Work ................................................................................. 75
5.1 Summary of Results ........................................................................................................ 75
5.2 Challenges and Future Work ........................................................................................... 76
References ............................................................................................................................. 79
Appendix A: Chi-Squared Distribution Noncentrality Parameter Derivation ...................... 83
Appendix B: Software Architecture ...................................................................................... 87
List of Figures

Figure 1: The Hypothesis Testing Tradeoff between P(FA) and P(MD) ............................. 24
Figure 2: Example System Constraint Model ....................................................................... 26
Figure 3: Example Adder Component Constraint Model ..................................................... 27
Figure 4: Detection Example for a Nominal System ............................................................ 28
Figure 5: Detection Example for a Faulty System ................................................................ 28
Figure 6: Pseudocode for Constraint Suspension ................................................................. 30
Figure 7: Isolation Example for a Faulty System ................................................................. 31
Figure 8: Stochastic Constraint Suspension Pseudocode ..................................................... 34
Figure 9: Utilizing Redundant Sensor Hardware .................................................................. 36
Figure 10: Representative Measurement Comparison Conceptual Example ....................... 38
Figure 11: Numerical Simulation Results Comparing SCS and FT ..................................... 40
Figure 12: Zoomed in Numerical Simulation Results Comparing SCS and FT ................... 41
Figure 13: Numerical Simulation Results for Two Measurements Scenario ....................... 42
Figure 14: Generalized Thruster Configuration and Coordinate System ............................. 47
Figure 15: Spacecraft Simulation Architecture ..................................................................... 51
Figure 16: Constraint Model for Spacecraft Application ..................................................... 53
Figure 17: Constraint Model without Temperature Sensors ................................................ 55
Figure 18: Constraint Model without Split Dynamics ......................................................... 56
Figure 19: Component Inputs and Outputs with Connections ............................................. 57
Figure 20: System Level P(FA) as a Function of Input P(FA) ............................................. 62
Figure 21: Temperature Sensor Degradation Forms Superstructure .................................... 63
Figure 22: IMU Model Comparison for Accelerometer Fault ............................................. 66
Figure 23: Low P(FA) and P(MD) Region of Figure 22 ...................................................... 66
Figure 24: Dynamics Uncertainty Comparison for Accelerometer Fault ............................. 68
Figure 25: Low P(FA) and P(MD) Region of Figure 24 ...................................................... 68
Figure 26: Constraint Model Highlighting FT Settings ........................................................ 70
Figure 27: System Level Detection Performance Comparison for Accelerometer Bias ...... 72
Figure 28: System Level Isolation Performance Comparison for Accelerometer Bias ....... 72
Figure 29: SCS Interface ....................................................................................................... 88
Figure 30: Function Call Hierarchy List ............................................................................... 89
Figure 31: Function Call Hierarchy ...................................................................................... 90
List of Tables

Table 1: Generalized Hardware Locations and Thruster Directions .................................... 46
Table 2: Model Parameters and Uncertainties ...................................................................... 50
Table 3: IMU Hardware Specifications ................................................................................ 65
Nomenclature

A = linearized dynamics matrix
D = decision scalar used in consistency checking
DOF = degrees of freedom
FDI = fault detection and isolation
FOM = figure of merit
FT = Fixed Threshold (an implementation of analog constraint suspension)
f = measurement residual vector
GNC = guidance, navigation, and control
GPS = Global Positioning System
H = measurement geometry matrix
IMU = inertial measurement unit
λ = noncentrality parameter for chi-squared distribution
M = number of measurements
N = number of states
P(FA) = probability of false alarm
P(MD) = probability of missed detection
Q = uncertainty in constraint function propagation
Σ = covariance matrix
σ = standard deviation
SCS = Stochastic Constraint Suspension
SNR = signal-to-noise ratio
T = threshold used in consistency checking
V&V = verification and validation
VHM = vehicle health monitoring
W = weighting matrix of measurement uncertainties for least squares estimation
x = state vector
z = measurement vector
Chapter 1: Introduction
The latest NASA technology roadmap [1] identifies autonomous vehicle health
monitoring (VHM) as a "high priority" technology critical to achieving NASA's goals.
Spacecraft autonomous VHM monitors on-board sensor measurements and provides a system
wide fault identification and isolation solution without ground or human support. No standard
method exists for VHM and many traditional approaches are limited by their complexity or the
types of faults they are able to identify and isolate. The objective of this research is to develop
and demonstrate an improved VHM algorithm that is adequate for detecting and isolating a large
class of faults, yet simple enough to fly on current manned vehicles.
1.1 Motivation
Countless mission failures have occurred in spaceflight for a variety of reasons. Two
Space Shuttle missions are presented here to illustrate the importance of VHM in spacecraft
operations. More general reasons are then given to motivate autonomous VHM.
During the ascent of STS 51-F in 1985, one Space Shuttle Main Engine (SSME)
prematurely shut down due to the failure of two redundant temperature sensors [2]. Due to a
defect, the sensors experienced a common cause failure and incorrectly measured temperatures
outside the acceptable limits for the engine. The majority vote of two redundant sensors triggered
the shutdown. Soon after, a temperature sensor failed in a second SSME, while a second sensor
in the same SSME approached the redline value that would have triggered another shutdown.
Fortunately, a mission controller correctly determined in a matter of seconds that the error was a
sensor failure and the engines were performing nominally. STS 51-F performed an Abort to
Orbit (ATO) using the remaining two engines, possibly saving the vehicle and crew. Over 25
years later, most traditional VHM techniques are still unable to recognize this kind of common
cause sensor failure. An improved VHM system could potentially have isolated the original
failures to the sensors, preventing any engine shutdowns, preserving the nominal mission, and
eliminating the need for a risky mission control decision.
The 1986 Challenger disaster provides the second example. Shortly after liftoff, a failure
in a Space Shuttle Solid Rocket Booster (SRB) caused the disintegration of the vehicle and loss
of crew. The accident report [3] states that no anomalies were apparent to the crew or ground
control until live video feed showed the breakup. However, post-processing of downlinked
on-board data showed discrepancies in SRB pressures and attitude up to 12 seconds before the breakup.
Assuming an escape system had been available, a VHM system could potentially have isolated the
failure and triggered an abort in time to save the crew. Planned future systems will have escape
systems, so reliable VHM will be critical.
Response time requirements and increasing vehicle complexity motivate autonomous
VHM capability. Currently, common practice after a non-time-critical failure indication is to
place a spacecraft in 'safe' mode, downlink telemetry, and diagnose the fault using a team of
experts. However, this approach is not feasible for many mission scenarios because of the
response time needed, and lags due to communication availability, speed of light delay, and
diagnosis time on the ground. For instance, Geller [4] presents rendezvous cases where
autonomy is required for mission success. This reasoning can be extended to VHM during
rendezvous, entry, landing, abort, and any other dynamic scenarios. Additionally, as spacecraft
become increasingly complex, the number of failure modes increases exponentially [5], making it more
difficult for humans to diagnose faults efficiently. The National Research Council, recognizing
the importance of autonomous VHM, identified it in 2012 as a "high priority" technology for
NASA to address in order to meet its exploration goals [1]. The report also points out the
potential to apply the technology across missions and industries, including non-aerospace
applications, "deep space exploration, robotic science missions, planetary landers and rovers."
1.2 Literature Review
Recent surveys on VHM algorithms in use or under development divide the approaches
into three classes of redundancy [6][7]. First, there is hardware redundancy, where signals from
multiple hardware components are compared. Second, quantitative analytical redundancy uses
on-board mathematical system models to compare dissimilar measurements. Third, qualitative
analytical redundancy applies logic similar to the quantitative form, but uses a discrete system
model rather than an analog model. Analytical redundancy is also known as model-based
diagnosis.
Hardware redundancy is commonly employed in aerospace because of its simplicity and
reliability. Two sensors are sufficient for detecting a fault, but three are required for isolation of
a fault to a single sensor. The comparison between sensors in flight software is often done using
parity space [8] or limit value checking [9]. Parity space creates a residual vector with
magnitude and direction that enable fault detection and isolation respectively. Chapter 2
describes parity space techniques in more detail. Limit checking directly compares
measurements to predetermined thresholds. Though these techniques can detect and isolate
faults in a single system, hardware redundancy is unable to identify higher level faults, such as
common cause sensor failures. Also, this method requires redundant hardware which results in
increased mass, volume, and power requirements.
Quantitative analytical redundancy is applied in many forms in current vehicles.
Fundamentally, a real-time mathematical model of the system is used to form connections
between systems and compare dissimilar information. For example, a controller command to fire
a certain thruster should result in a known acceleration measurement within a tolerance. Forms
of quantitative analytical redundancy include full state observer, parity relations, Kalman
filtering, and parameter estimation [6]. Full state observer methods use full state estimation
with gain matrices tuned to be sensitive to specific fault modes. Parity relations generate
residuals that exhibit predictable behavior in response to modeled faults while filtering system
transients and noise [10]. Kalman filtering can also be used to identify faults by performing
statistical tests on the whiteness, mean, and covariance of the residuals [11]. Some forms of
Kalman filtering use a bank of filters, with each sensitive to a specific fault mode [12].
Parameter estimation, or system identification, compares estimated physical constants to
modeled parameters [13]. All of these approaches reduce the need for redundant hardware, but
can be more computationally expensive and are often sensitive to modeling errors. Most
importantly, many require some level of failure mode modeling a priori.
Qualitative analytical redundancy is typically used in the artificial intelligence (AI)
community. Neural networks, expert systems, and constraint suspension fall into this class.
Neural networks use pattern recognition to identify faults in nonlinear functions, but this method
cannot present reasoning to a human operator and directly incorporating expert knowledge is
difficult [14][15]. Expert systems use heuristic knowledge to emulate human reasoning and
perform VHM [16]. These systems are unable to detect gaps and inconsistencies in the
knowledge base and cannot learn from their errors. Constraint suspension uses a discrete system
model to check for inconsistencies in analytically redundant measurements without knowledge
of failure modes [17]. Chapter 2 describes constraint suspension in more detail. The most
prominent uses of AI in spacecraft VHM were the technology demonstration missions Deep
Space 1 (DS-1) and Earth Observing 1 (EO-1), which flew versions of the Livingstone AI VHM
system [18][19]. Livingstone 2 on EO-1 successfully isolated simulated faults while running for
as long as 55 days at a time and over 143 days total. However, traditional software verification
and validation (V&V) testing approaches were not feasible for these algorithms due to their
complexity, and the V&V approaches used may be insufficient for manned spacecraft V&V
requirements. Therefore, AI approaches under development offer the ability to detect and isolate
a variety of faults but are not ready for implementation in current vehicles.
Recent research has attempted to bridge the gap between quantitative and qualitative
analytical redundancy [16] [20]. Constraint suspension provides one starting point for integration
of the two fields. Davis introduced constraint suspension as a simple way to diagnose discrete
systems without modeling failure modes [17]. Constraint suspension was used effectively on
circuits, but could not be used on analog systems in its original form. Fesq built on Davis's work
with the Marple software package, extending constraint suspension for use on spacecraft by
adding analog capability [21]. Marple has been tested on spacecraft flight data and run on a real-time
processor with a high-fidelity subsystem model [22][23]. Despite its many advantages, the
algorithm is unable to robustly detect faults without significant tuning and has never been flight-tested.
Based on this survey of traditional VHM approaches, no existing VHM method is both
simple enough for deployment on current manned flight systems and adequate for
detecting and isolating a large class of faults. This thesis uses proven quantitative analytical
methods to address a key weakness in the fault detection piece of Marple. The research goal is
thus a VHM algorithm simple enough for implementation in current vehicles, but able to detect
and isolate a large class of faults.
1.3 Thesis Overview
This thesis comprises five chapters. Chapter 2 provides background on the key
algorithms used throughout this thesis. Constraint suspension, parity space, and hypothesis
testing are provided as tools to be combined in the final VHM algorithm. Chapter 3 details how
these tools are integrated into Stochastic Constraint Suspension (SCS) and derives the expected
performance of the VHM approach. Conceptual and numerical examples are shown to illustrate
the improvement over original constraint suspension. Chapter 4 applies SCS to a generic
numerical spacecraft subsystem application. Simulated sensor measurements from the model are
used to illustrate the algorithm's performance and investigate its sensitivity to inputs and
modeling inaccuracies. Chapter 5 summarizes the research and presents challenges and
recommendations for future research.
Chapter 2: Key Algorithms
This chapter presents the three key methods in VHM that will be integrated into
Stochastic Constraint Suspension (SCS) in Chapter 3. Parity space and hypothesis testing are
proven methods for detecting and isolating faults in redundant hardware. Constraint suspension
in its analog form is a powerful VHM approach that has the potential for detecting and isolating
a large class of faults. This chapter details the theory behind the approaches and Chapter 3 will
describe how they are used in SCS.
2.1 Parity Space and Hypothesis Testing
Parity space and hypothesis testing have been used extensively in aerospace applications
for fault detection and isolation (FDI) in redundant hardware. Together, they determine
consistency between noisy measurements with known uncertainties using preset false alarm
rates. FDI typically utilizes the residual generation portion of parity space and the threshold
generation and comparison aspects of hypothesis testing. Gleason and Gebre-Egziabher discuss
the common use of parity space and hypothesis testing for global navigation satellite system
(GNSS) integrity monitoring [24]. For instance, Global Positioning System (GPS) receivers
monitor the integrity of the signals they receive from satellites using the redundant information
available from extra satellites in view. Sturza presents the approach for use in monitoring
accelerometers and gyroscopes in inertial measurement systems [25]. Strapdown inertial
measurement systems are often placed in skewed configurations to maximize redundancy. Parity
space uses the redundancy with known geometry to monitor sensor failures. Recently, Draper
Laboratory implemented parity space and hypothesis testing techniques to monitor redundant
sets of GPS, inertial measurement unit (IMU), and Light Detection And Ranging (LIDAR)
navigation sensors on the Orbital Sciences Corporation Cygnus vehicle [26]. Cygnus is meant to
provide cargo resupply to the International Space Station (ISS), a manned spacecraft and
valuable national asset, meaning knowledge of failures is especially critical.
2.1.1 Parity Space
Historically, parity space techniques use information about the geometry and uncertainty
of measurements to compute a decision scalar that represents the inconsistencies between
sensors. The decision scalar is then compared against a threshold computed from hypothesis
testing methods. More specifically, parity space transformations project the measurement vector
into the space orthogonal to the measurement space. The result is a vector containing
information about the measurement error, independent of the sensed state. The decision scalar is
the weighted magnitude of this vector and is directly used to detect a fault. This thesis only
utilizes the fault detection component of the parity space algorithm.
The decision scalar is computed as follows. The linearized measurements can be
expressed as
z = Hx                                                                            (2.1)

where H is the measurement geometry matrix and x is the system state vector. The analysis
assumes the measurements contain Gaussian white noise. The weighted least squares estimate of
the state, x̂, is computed as

x̂ = H_inv z                                                                       (2.2)

where H_inv is the pseudoinverse of H and is defined as

H_inv = (H^T W H)^-1 H^T W                                                        (2.3)

where W is a weighting matrix containing the measurement uncertainties and correlations [24].
For independent measurements, W can be written as

W = diag(1/σ_1^2, 1/σ_2^2, ..., 1/σ_n^2)                                          (2.4)

where n is the number of measurements available at a given time step and σ_i represents the
standard deviation of the sensors' measurement error. The residual vector f is the difference

f = z - ẑ                                                                         (2.5)

where ẑ is computed using

ẑ = Hx̂ = H H_inv z                                                                (2.6)

with substitution from Eq. 2.2. Substituting Eq. 2.6 and simplifying,

f = z - ẑ = z - H H_inv z = (I - H H_inv) z = S z                                 (2.7)

where I is the square identity matrix of appropriate dimension and S is a transformation matrix
used in practice to compute f. The decision scalar D is computed as

D = f^T W f = (f_1/σ_1)^2 + (f_2/σ_2)^2 + ... + (f_n/σ_n)^2                       (2.8)

where f_i is the ith element of the residual vector. The decision scalar D represents a weighted
magnitude of the residual vector f. This decision scalar is the result of the parity space algorithm
to be used in hypothesis testing.
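To make the computation concrete, the short Python sketch below implements Eqs. 2.1 through 2.8 for a toy case of three scalar measurements of a single state. The function name and the numerical values are illustrative assumptions and are not taken from the thesis software.

# Illustrative sketch of the parity-space decision scalar (Eqs. 2.1-2.8),
# assuming independent measurements with known standard deviations.
import numpy as np

def decision_scalar(z, H, sigmas):
    """Return the weighted residual magnitude D for measurements z,
    geometry matrix H, and per-measurement standard deviations sigmas."""
    W = np.diag(1.0 / np.asarray(sigmas) ** 2)       # Eq. 2.4
    H_inv = np.linalg.inv(H.T @ W @ H) @ H.T @ W     # Eq. 2.3 (pseudoinverse)
    S = np.eye(len(z)) - H @ H_inv                   # Eq. 2.7 transformation matrix
    f = S @ z                                        # residual vector
    return float(f.T @ W @ f)                        # Eq. 2.8

# Example: three scalar measurements of one state (M = 3, N = 1)
H = np.ones((3, 1))
sigmas = [0.1, 0.1, 0.5]
z = np.array([1.02, 0.97, 2.5])   # third measurement is biased relative to the others
D = decision_scalar(z, H, sigmas)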
2.1.2 Hypothesis Testing in FDI
Hypothesis testing techniques provide a threshold T to directly compare against the
decision variable D. The acceptance or rejection of the null hypothesis proceeds according to
H_0: D < T                                                                        (2.9)
H_1: D > T                                                                        (2.10)

where the null hypothesis H_0 states that the system is healthy and the alternate hypothesis
H_1 states that there is a fault in the system [24]. As with all hypothesis testing techniques, two
errors are possible. A rejection of the null hypothesis for a healthy system is a Type I error,
called a false alarm (FA). The reverse scenario, the acceptance of the null hypothesis for a faulty
system, is a Type II error referred to as a missed detection (MD). The setting of the threshold T
dictates the tradeoff between the probability of false alarm P(FA) and probability of missed
detection P(MD).
In order to avoid excessive false alarms and missed detections in setting T,
characterization of the decision scalar D distribution is crucial. As shown previously, D is the
sum of squared random normal variables if the residuals are assumed to be normally distributed.
The sum of squared random normal variables forms a chi-squared distribution [27]. When there
is no fault, the residuals have zero mean and D follows a central chi-squared distribution. This
analysis assumes faults are in the form of a measurement bias as is typical in failure modeling.
The algorithm is able to detect other forms of faults, such as a component failing on or off, but a
fault bias is convenient for the hypothesis testing illustration. The measurement bias induces a
bias into the residuals, meaning D follows a noncentral chi-squared distribution for a faulty
system. Appendix A provides more insight into the form of chi-squared distributions. Figure 1
illustrates the central and noncentral chi-squared distributions along with an example threshold.
A decision scalar D that is greater than T for a nominal system (central chi-squared distribution)
results in a false alarm. Therefore, the area under the central distribution curve to the right of T
represents the P(FA). Conversely, the P(MD) is the area under the noncentral chi-squared
distribution less than T and is also labeled. Figure 1 shows that the selection of T is a tradeoff
between the P(MD) and P(FA) for a given system. Chi-squared distributions are defined by the
degrees of freedom (DOF) and in FDI the DOF can be represented as the difference M-N, where
M is the number of measurements and N is the number of state vector components. The
noncentral chi-squared distribution is also defined by the noncentrality parameter λ, which will
be discussed in more depth in Chapter 3. Additional redundant measurements increase the DOF
and lower both the P(FA) and P(MD). Improving the signal-to-noise ratio (SNR) provides a
similar effect.
Figure 1: The Hypothesis Testing Tradeoff between P(FA) and P(MD)
Explicit formulas for selecting T are presented in literature for use with the parity space
approach [24]. Probabilistic reasoning using Figure 1 leads to
P(FA) = χ^2(D > T | M-N) = 1 - χ^2(D < T | M-N)                                   (2.11)

where χ^2 is the central chi-squared probability density function. Eq. 2.11 can be rearranged and
expressed as

χ^2(D < T | M-N) = 1 - P(FA) = χ^2_cdf(T | M-N)                                   (2.12)

where χ^2_cdf is the central chi-squared cumulative distribution function. Solving Eq. 2.12 for T,

T = (χ^2_cdf)^-1(1 - P(FA) | M-N)                                                 (2.13)

where P(FA) is the desired probability of false alarm, M-N is the DOF, and (χ^2_cdf)^-1 is the inverse
cumulative chi-squared function. Due to the complexity of the chi-squared cumulative
distribution function, a numeric lookup table is used for setting T in practice. The ability to set T
according to a desired P(FA) is a significant advantage to hypothesis testing because VHM
performance is often challenging to predict.
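A minimal sketch of the test in Eqs. 2.9 through 2.13 is shown below, using SciPy's chi-squared inverse cumulative distribution function in place of the lookup table mentioned above. The function name and example numbers are illustrative assumptions, not the thesis implementation.

# Minimal sketch of the hypothesis test in Eqs. 2.9-2.13.
from scipy.stats import chi2

def detect_fault(D, p_fa, M, N):
    """Compare the decision scalar D against the threshold T implied by the
    desired false-alarm probability p_fa and M - N degrees of freedom."""
    T = chi2.ppf(1.0 - p_fa, df=M - N)   # Eq. 2.13
    return D > T                          # True -> reject H_0 (declare a fault)

# Example: M = 3 measurements, N = 1 state, 0.1% false-alarm budget
fault = detect_fault(D=15.2, p_fa=1e-3, M=3, N=1)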
2.2 Constraint Suspension
Constraint suspension forms the basis of the SCS algorithm presented in Chapter 3.
Constraint suspension does not have flight heritage like parity space and hypothesis testing, but it
is a powerful approach for detecting and isolating a wide variety of system level faults. The
version of constraint suspension described and used here is taken from the Marple VHM
software pioneered by Fesq [21]. Fesq extended constraint suspension to handle analog systems
with the goal of applying the approach to spacecraft. The detection and isolation procedures are
outlined here followed by a discussion of the algorithm's merits and limitations.
2.2.1 Detection
The system to be diagnosed is first modeled in a modular mathematical form. The
modules are called components and can represent specific hardware and subsystems, or abstract
models such as dynamics. The components are connected by nodes that transmit information
within the model. Forward and reverse constraint functions are defined for each component,
mapping inputs to outputs and outputs to inputs respectively. Constraint functions are not
restricted to be linear or continuous and may be based on analytical or empirical relationships.
Sensors are present at nodes where observability exists. At a given time step, sensor values are
propagated forward and backward through the model using the constraint functions. The
propagation stops when it reaches a node where a sensor is present. This procedure results in
multiple values at every node, originating from forward propagation, reverse propagation, or
sensors at that particular node. In a nominal system, the values will be consistent at each node.
Inconsistency at any node triggers the isolation algorithm. Figure 2 shows a generic system
constraint model with example data paths. Sensed values at the boundaries are propagated
through components using constraint functions, resulting in analytically redundant values at each
node for comparison. Figure 3 illustrates example constraint function connections within an
adder component and the resulting values to be checked for consistency. Each node is a function
of the values at the other two nodes. The output node has three values to compare against each
other, originating from forward propagation through the adder, a sensor at the node, and reverse
propagation from nodes upstream. The adder illustrates that propagation through reverse
constraint functions is not always possible in practice, because knowledge of the second input
value would be required and may not be available [21].
Figure 2: Example System Constraint Model

Figure 3: Example Adder Component Constraint Model
Figure 4 and Figure 5 provide basic examples to demonstrate constraint suspension fault
detection. The system consists of one adder and two gains and ignores sensor noise for
simplicity. Values at the central node are displayed after propagation from the sensors at the
boundary nodes. The measurements in Figure 4 agree because the system is performing nominally,
while the measurements in Figure 5 are inconsistent due to a fault in the adder component. The
location alone of the inconsistent node does not reveal the failed component and an additional
isolation procedure is needed.
In real systems, noise and model inaccuracies will be present, making perfect agreement
at any node unlikely. A more sophisticated method for checking consistency is therefore
required. This approach must distinguish inconsistency due to noise from inconsistency due to a
fault. Fesq provides two ideas as starting points [21]. First, tolerances can be set for the
difference between the minimum and maximum values at each node. This fixed threshold (FT)
algorithm compares the range to a predetermined value at each node. The second approach uses
percentage differences rather than absolute differences. These methods require significant tuning
and have minimal quantitative basis. Chapter 3 provides a conceptual example that illustrates
the weakness in the FT technique and offers an improved methodology.
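For reference, both consistency checks described above can be sketched in a few lines of Python. The function names and the exact form of the percentage test are illustrative assumptions rather than Marple's implementation.

# Hedged sketch of the two consistency checks Fesq describes: a fixed
# absolute tolerance on the spread of values at a node, and a percentage
# variant. Names and the percentage formulation are illustrative only.
def ft_check(values, tolerance):
    """Fixed-threshold check: flag a fault if the spread of analytically
    redundant values at a node exceeds a predetermined tolerance."""
    return (max(values) - min(values)) > tolerance

def percentage_check(values, fraction):
    """Percentage check (assumed form): compare the spread to a fraction
    of the largest value magnitude at the node."""
    spread = max(values) - min(values)
    return spread > fraction * max(abs(v) for v in values)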
Figure 4: Detection Example for a Nominal System

Figure 5: Detection Example for a Faulty System
2.2.2 Isolation
The propagation and consistency checking algorithms used in detection enable isolation of the
faulty components or sensors. Isolation is performed by systematically suspending each
component, and again propagating the sensor values through the system to compare analytically
redundant values. No information is propagated forward or backward through a suspended
component, effectively removing all assumptions about its operation. Sensors may also be
suspended by ignoring their measurements. If after suspending a component or sensor,
consistency is achieved at all nodes, then the suspended component is added to a list of
potentially faulty components. However, if inconsistencies remain, then the component is
exonerated and the fault has not yet been properly isolated. Simultaneous failures can also be
isolated if all faulty components are suspended. However, suspending all combinations of
possible faulty components can be very computationally expensive so it can usually be assumed
that two component failures do not occur within the same time step. The algorithm outputs a list
of potentially faulty components, with sensor placement and component configuration
determining the diagnostic resolution. Without observability into certain nodes, a fault may only
be isolated to a set of components rather than to a specific component. Diagnostic resolution in
constraint suspension is discussed by Fesq [21], where component sets with no internal
observability are called superstructures.
Figure 6 provides the basic pseudocode for constraint suspension. In order to simplify the
pseudocode, this version does not include the logic needed to handle simultaneous faults or
hierarchical systems.
The constraint suspension implementation tested in Chapters 3 and 4
contains these capabilities.
Inputs: Constraint model (includes forward and reverse constraint functions
        and the appropriate connections between components), sensor
        measurements, fixed tolerance levels for each node

0. Place sensor data into model
1. Propagate sensor values forward and backward
2. Check nodes for consistency
3. If (fault detected)
   a) Generate list of possible faulty components and sensors
   b) For (each component and sensor on list)
      i.   Suspend component/sensor
      ii.  Propagate sensor values forward and backward
      iii. Check nodes for consistency
      iv.  If nominal: add component/sensor to output list
      v.   Unsuspend component/sensor
4. Return output list

Output: List of potentially faulty components and sensors
Figure 6: Pseudocode for Constraint Suspension
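The sketch below renders the Figure 6 loop in Python for a deliberately tiny toy model: one gain component with a sensor at its input and at its output. The class interface, component names, and numbers are invented for illustration (they are not Marple or the thesis implementation), and consistency is checked with a simple exact tolerance rather than the methods developed later.

# A minimal, runnable rendering of the Figure 6 loop. The model is a toy:
# input node -> gain (forward y = 2x, reverse x = y/2) -> output node,
# with one sensor at each node. Names and values are illustrative only.
class ToyModel:
    def __init__(self):
        self.components = ["gain"]
        self.sensors = ["in_sensor", "out_sensor"]
        self.suspended = set()

    def place_sensor_data(self, meas):
        self.meas = meas                       # {"in_sensor": .., "out_sensor": ..}

    def node_values(self):
        """Collect sensed and propagated values at each node (Figure 2 idea)."""
        vals = {"input": [], "output": []}
        if "in_sensor" not in self.suspended:
            vals["input"].append(self.meas["in_sensor"])
        if "out_sensor" not in self.suspended:
            vals["output"].append(self.meas["out_sensor"])
        if "gain" not in self.suspended:
            if "in_sensor" not in self.suspended:          # forward constraint
                vals["output"].append(2.0 * self.meas["in_sensor"])
            if "out_sensor" not in self.suspended:         # reverse constraint
                vals["input"].append(self.meas["out_sensor"] / 2.0)
        return vals

    def all_nodes_consistent(self, tol=1e-6):
        return all(max(v) - min(v) <= tol for v in self.node_values().values() if v)

    def suspend(self, item): self.suspended.add(item)
    def unsuspend(self, item): self.suspended.discard(item)


def constraint_suspension(model, measurements):
    """Figure 6: detect via consistency checks, isolate by suspending items."""
    model.place_sensor_data(measurements)              # step 0 (propagation is
    if model.all_nodes_consistent():                   # folded into node_values)
        return []                                      # no fault detected
    suspects = []
    for item in model.components + model.sensors:      # step 3a/3b
        model.suspend(item)
        if model.all_nodes_consistent():               # consistency restored
            suspects.append(item)
        model.unsuspend(item)
    return suspects                                    # step 4

# Faulty case: output sensor reads 5 instead of the expected 2 * 1 = 2.
print(constraint_suspension(ToyModel(), {"in_sensor": 1.0, "out_sensor": 5.0}))
# -> ['gain', 'in_sensor', 'out_sensor']

With a biased output reading, every item ends up on the suspect list, which mirrors the diagnostic resolution limitation at model boundaries illustrated by Figure 7 below.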
Figure 7 continues the simple example presented in Figure 5. Suspending the faulty
adder component results in agreement at the central node because no information is propagated
through it. It can be shown that suspending either of the two gain components would not remove
the inconsistency. Removing the sensor at the output of the adder would also result in agreement
in the model. This illustrates the diagnostic resolution limitation at the boundary of a system
model.
Figure 7: Isolation Example for a Faulty System
2.2.3 Merits and Limitations
Analog constraint suspension has several inherent advantages compared to many of the
methods described in the literature review. First, it takes advantage of modeling and
characterization work done during the design and test process. The VHM designer can organize
these results into constraint models rather than build a new system model. VHM testing results
could even be valuable in influencing the system design by demonstrating observability of
critical faults. Second, the algorithm can be applied to a wide range of system models.
Hierarchical models can be used to scale the algorithm to more complex systems. For instance, a
top level model may have components representing subsystems so that a fault could first be
isolated to a subsystem. Lower level constraint models within this component could then be used
to further isolate the failure. The algorithm flexibility also refers to its ability to handle discrete,
continuous, or hybrid systems using the constraint functions. Third, the application system model
of defined connections and constraint functions is an input to the algorithm, facilitating algorithm
code reuse between applications. The reuse of VHM architectures between applications
addresses a key need in the fault management community identified in a 2008 workshop [5].
Fourth and most importantly, a large class of faults can be detected and isolated if they are
observable, because no failure mode information is needed in the flight software (FSW). The
FSW only contains nominal system models, and any behavior detected as off-nominal is flagged
as a fault. Single sensor, common cause sensor, actuator, multiple component, and subsystem
level failures are all within the scope of constraint suspension. Common cause sensor
failures refer to the failure of a line of redundant hardware components, as in the STS 51-F
example presented in Chapter 1.
Computation time may be a limitation of the constraint suspension approach. The
computation needed to detect and isolate a fault may be greater than that of traditional VHM
methods such as parity space and limit checking. Kolcio proposed and successfully
demonstrated the "chase data" implementation of constraint suspension to address this issue,
where each isolation step uses a new time step of sensor data rather than the original time step
where the fault was detected [22]. This assumes the fault is persistent across multiple time steps,
meaning transient faults would be difficult to isolate. Today, faster processors and multi-core
technology may enable new solutions to the computing challenge.
Constraint suspension is unable to determine the underlying cause of a given fault. The
lack of failure mode information is an advantage in robust isolation to a component or sensor,
but prevents the algorithm from specifying how the hardware failed. In a manned spacecraft
application, redundancy is typically present in all critical systems and the isolation of a fault to
a specific piece of hardware is enough to reconfigure the system by swapping a redundant system
for the faulty system. Failure mode information is not necessary in this scenario and may be
deduced at a later time by additional analysis offline. If failure mode information is necessary
during FDI, additional logic must be added to the constraint suspension algorithm. The Sherlock
VHM system presented by deKleer achieves this by combining constraint suspension with fault
modeling [28]. Diagnosis of a fault is beyond the scope of this thesis.
Chapter 3 will show that the FT consistency checking method outlined previously is a
significant weakness of analog constraint suspension. The thresholds directly impact VHM
performance in both detection and isolation, yet there is not a rigorous procedure for selecting
them appropriately. Dvorak [29] and Goldstone [30] proposed alternative methods for
consistency checking that use interval propagation followed by checking for overlap. Fesq chose tolerance
checks at each node to allow for more granularity in tuning VHM performance [21]. Chapter 3
incorporates a proven, quantitative approach for consistency checking in constraint suspension
using parity space and hypothesis testing. Conceptual and numerical examples explicitly show
improvement over the FT implementation.
Chapter 3: Stochastic Constraint Suspension
Stochastic Constraint Suspension (SCS) utilizes parity space, hypothesis testing, and
constraint suspension techniques in order to significantly improve fault detection and isolation
performance. SCS explicitly accounts for measurement uncertainties and provides a generalized
approach for consistency checking. This chapter details the SCS approach, including the
uncertainty propagation algorithm and the efficient use of hardware redundancy. Analytical
expected performance of SCS is derived and conceptual and numerical examples are provided
for a single node case. The numerical and conceptual results demonstrate improvement in
detection over FT constraint suspension and agree with analytical predictions.
3.1 Algorithm Description
Stochastic Constraint Suspension builds upon the architecture of constraint suspension as
detailed in Chapter 2. Figure 8 gives the simplified pseudocode for SCS. The changes from the
FT constraint suspension implementation in Figure 6 are in bold. Rather than fixed tolerance
levels, the P(FA) at each node is an input to the algorithm based on the acceptable risk
requirements. In addition to measurement values, associated uncertainties are propagated
through the system constraint models. The SCS consistency check then uses parity space and
hypothesis testing techniques to more robustly determine agreement at each node. Eqs. 2.8 and
2.13 are used to compute the decision scalar and the threshold respectively. Eqs. 2.9 and 2.10
are then used to determine the presence of a fault. The rejection of the null hypothesis at any
node indicates a fault and triggers the isolation algorithm.
Inputs: Constraint model (includes forward and reverse constraint functions
and the appropriate connections between components), sensor
measurements, P(FA) for each node
0. Place sensor data into model
1. Propagate sensor values and uncertainties forward and backward
2. Check nodes for consistency using parity space and hypothesis testing
3. If (fault detected)
a) Generate list of possible faulty components and sensors
b) For (each component and sensor on list)
i. Suspend component/sensor
ii. Propagate sensor values and uncertainties forward and
backward
iii. Check nodes for consistency using parity space and
hypothesis testing
iv. If nominal: add component/sensor to output list
v. Unsuspend component/sensor
4. Return output list
Output: List of potentially faulty components and sensors
Figure 8: Stochastic Constraint Suspension Pseudocode
The algorithm dynamically calculates the threshold value at each node based on both the
number of redundant analytical measurements available at the node and on the P(FA) input
parameter. As shown in Eq. 2.13, the number of analytical measurements determines the DOF
of the chi-squared distribution. As the number of DOF increases, the chi-squared distributions in
Figure 1 are shifted, reducing the missed detection and false alarm areas. Therefore, additional
measurements always improve the performance of the algorithm, as would be expected for an
estimator. As the input P(FA) increases, the threshold decreases in magnitude, reducing the
P(MD) at the expense of P(FA). The fault bias affects the noncentrality parameter λ. A larger
fault bias skews the mean of the noncentral distribution to the right, reducing both the P(FA) and
P(MD) for a given T. Appendix A explicitly derives the relationship between the fault bias and
λ.
3.1.1 Uncertainty Propagation
As discussed above, parity space and hypothesis testing require an estimate of
uncertainties associated with each analytical measurement. Though the uncertainties of the
sensor measurements are assumed to be well characterized from vendor specifications and/or
testing, the analytical measurement uncertainties are not necessarily known a priori. The
calculation of the decision scalar using Eq. 2.8 requires uncertainty estimates at each node.
Without accurate uncertainty estimates for the analytical measurements, parity space computations
are not feasible. Knowledge of measurement uncertainty is critical in improving VHM
performance using SCS. Measurement uncertainty includes sensor noise, misalignment,
disturbances, and any other variables other than a fault that may impact measurements of the
state. In order to solve this problem, the sensor uncertainties are propagated with the analog
measurement values through the constraint model components. Several algorithms exist for
propagating uncertainty through a system with known dynamics. The method chosen here uses
the linearized constraint functions. The uncertainty is propagated using
Σ_f = A_x̂ Σ_i A_x̂^T + Q                                                           (3.1)

where A_x̂ is the Jacobian of the constraint function evaluated at the current estimate x̂, Σ_i is the
initial covariance matrix, Σ_f is the final covariance matrix, and Q represents uncertainty in the
constraint function [31]. Using Equation 3.1, normal probability distributions for each
measurement, represented by a mean and variance, can be propagated through component
constraints in the model. Though better methods exist for propagating uncertainty in nonlinear
systems, this method was chosen for its simplicity and low computation cost. More sophisticated
methods can be used if higher fidelity knowledge of uncertainty is required.
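As one concrete, hedged illustration of Eq. 3.1, the sketch below propagates a covariance matrix through a single constraint function using a finite-difference Jacobian. The constraint function g, the Q matrix, and the numbers are placeholders chosen for the example, not values from the thesis model.

# Sketch of the Eq. 3.1 covariance propagation through one constraint function,
# using a numerical Jacobian. g and Q are supplied by the model designer.
import numpy as np

def propagate_uncertainty(g, x_hat, sigma_in, Q, eps=1e-6):
    """Propagate covariance sigma_in through constraint g, linearized at x_hat."""
    x_hat = np.asarray(x_hat, dtype=float)
    y0 = np.atleast_1d(g(x_hat))
    A = np.zeros((y0.size, x_hat.size))
    for j in range(x_hat.size):                     # finite-difference Jacobian, column j
        dx = np.zeros_like(x_hat)
        dx[j] = eps
        A[:, j] = (np.atleast_1d(g(x_hat + dx)) - y0) / eps
    return A @ sigma_in @ A.T + Q                   # Eq. 3.1

# Example: a product constraint y = x0 * x1 with independent input uncertainties
g = lambda x: np.array([x[0] * x[1]])
sigma_out = propagate_uncertainty(g, [2.0, 3.0], np.diag([0.1, 0.2]), Q=np.array([[0.05]]))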
3.1.2 Utilizing Hardware Redundancy
The constraint suspension framework is not intended to eliminate the use of traditional sensor
level FDI, such as with parity space and hypothesis testing, for consistency checks of redundant
hardware. Manned spacecraft will continue to have significant hardware redundancy and the
direct use of sensor level FDI on redundant hardware is a proven, computationally efficient
method. Therefore single sensor failures will be isolated using traditional methods where
possible. Higher level FDI logic such as that used in SCS requires significantly more
computation than sensor level FDI. An advantage of SCS is its ability to isolate a wide variety
of faults, such as actuator and common cause sensor failures.
Figure 9: Utilizing Redundant Sensor Hardware
Figure 9 illustrates how the low level parity space and hypothesis testing on redundant
hardware fits into the SCS framework. Low level sensor checks save computation by isolating
single sensor failures before the SCS algorithm is initiated. Sensors placed at a node represent all
redundant sensors with direct observability of that node. Parity space and hypothesis testing
check the consistency of the measurements before they are placed at a node for use in SCS. Any
faulty measurements will be removed from the system before the constraint suspension process
is initiated and SCS will see the redundant sensors as a single sensor. If a common cause sensor
failure occurs, the constraint suspension process can isolate the fault to the sensor group.
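The sketch below illustrates one way such a low-level check might collapse a redundant sensor set into a single value and uncertainty before SCS runs. The weighted-least-squares combination rule is an assumption made for the example; the thesis only requires that consistent redundant sensors appear to SCS as a single sensor.

# Sketch: low-level parity space / hypothesis test on a redundant sensor set
# (Figure 9), followed by an assumed weighted-least-squares fusion step.
import numpy as np
from scipy.stats import chi2

def fuse_redundant_sensors(readings, sigmas, p_fa=1e-3):
    """Return (value, sigma) for the node, or None if the set is inconsistent."""
    z = np.asarray(readings, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    value = float(np.sum(w * z) / np.sum(w))        # weighted least squares estimate
    D = float(np.sum(w * (z - value) ** 2))         # decision scalar for H = ones
    T = chi2.ppf(1.0 - p_fa, df=z.size - 1)         # Eq. 2.13 with M - N = n - 1
    if D > T:
        return None                                 # inconsistent set: leave to isolation logic
    sigma = float(np.sqrt(1.0 / np.sum(w)))         # combined uncertainty
    return value, sigma

# Example: three redundant sensors, the third clearly biased
print(fuse_redundant_sensors([1.01, 0.99, 1.45], sigmas=[0.05, 0.05, 0.05]))  # -> None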
3.2 Analytical Expected Performance
The ability to predict consistency checking performance in terms of P(FA) and P(MD) is
a significant advantage of parity space and hypothesis testing over the FT implementation. All
VHM methods must make tradeoffs between P(FA) and P(MD). In SCS, P(FA) is an input to
the algorithm, which then fixes P(MD) for a given system. The expected P(MD) as a function of
P(FA) and the fault bias at a given node is derived here. As in Chapter 2, the fault is modeled as
a bias in one of the measurements.
Using Figure 1, the P(MD) for fault mode i is

P(MD_i) = P(D < T | M-N, λ_i) = χ^2_N,cdf(T | M-N, λ_i)                           (3.2)

where χ^2_N,cdf is the noncentral cumulative chi-squared distribution and λ_i is the noncentrality
parameter for fault mode i. An expression for λ_i is derived in Appendix A and given in Eq.
A.20. λ_i is a function of the fault bias and measurement uncertainties. A larger fault bias results
in a larger λ_i, causing the noncentral distribution to be skewed more to the right. As expected,
the larger fault bias results in easier detection and thus better performance through lower P(FA)
and P(MD). P(MD_i) can be expressed as a function of the P(FA) input with

P(MD_i) = χ^2_N,cdf((χ^2_cdf)^-1(1 - P(FA) | M-N) | M-N, λ_i)                     (3.3)

where Eq. 2.13 was substituted for T. χ^2_N,cdf and χ^2_cdf represent the noncentral and central
chi-squared cumulative distribution functions respectively. Section 3.3 uses this result to compare
simulation results to predicted performance.

The overall P(MD) at the node is computed by summing the weighted probabilities of each
fault mode using

P(MD) = Σ_i P(Fault_i) * P(MD_i)                                                  (3.4)

where P(Fault_i) is the probability of occurrence for fault mode i. If all faults are assumed
equally likely, Eq. 3.4 can be simplified to

P(MD) = (1/m) Σ_i P(MD_i)                                                         (3.5)

where m is the number of fault modes. The equations derived here are valid for P(MD) at a
given node. The system level performance measured by P(FA) and P(MD) will depend on the
system model and sensor placement.
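The expected-performance calculation in Eqs. 3.2 through 3.5 maps directly onto SciPy's central and noncentral chi-squared distributions, as sketched below; the degrees of freedom and noncentrality values in the example are hypothetical.

# Sketch of Eqs. 3.3-3.5 using SciPy's central (chi2) and noncentral (ncx2)
# chi-squared distributions. The lambdas are assumed to come from the
# Appendix A expression for each modeled fault bias.
from scipy.stats import chi2, ncx2

def expected_p_md(p_fa, dof, lambdas):
    """Average missed-detection probability over equally likely fault modes."""
    T = chi2.ppf(1.0 - p_fa, df=dof)                           # Eq. 2.13
    p_md_i = [ncx2.cdf(T, df=dof, nc=lam) for lam in lambdas]  # Eqs. 3.2/3.3
    return sum(p_md_i) / len(p_md_i)                           # Eq. 3.5

# Example: two degrees of freedom and three hypothetical fault biases
print(expected_p_md(p_fa=1e-3, dof=2, lambdas=[5.0, 10.0, 20.0]))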
3.3 Demonstration of Improvement
This section demonstrates the value of the SCS implementation as compared to the FT
approach for consistency checking at nodes. A representative conceptual detection scenario is
presented and then implemented numerically. The improvement in detection is expected to
result in better isolation performance.
3.3.1 Conceptual
The following conceptual example illustrates the value of parity space and hypothesis
testing for node consistency checks. This representative scenario has three measurements to
compare at a node. Since the purpose of this example is to compare arbitrary analytical
measurements, the origin of the values is not important here. The measurements may result from
forward propagation, reverse propagation, or direct sensing at the node. Figure 10 shows
example origins and distributions for the three measurements. Two measurements have
relatively low uncertainty (x1 and x2), while the third has a relatively high uncertainty (x3). The
distributions of x1 and x2 are offset, indicating the possible presence of a fault in the system.
Figure 10: Representative Measurement Comparison Conceptual Example
In the FT implementation of constraint suspension, the difference between the maximum
and minimum values is computed and compared to a predetermined threshold T [21]. The
presence of x3 with relatively high uncertainty forces the choice of a large T to account for the
possible variation and to avoid excessive false alarms. However, a large T prevents the
algorithm from recognizing the significant difference in the higher quality measurements x1 and
x2, increasing the missed detection rate. Despite having two relatively high quality
measurements, the VHM algorithm is forced to make a suboptimal tradeoff between the P(MD)
and P(FA). The extra information provided by x3 is decreasing VHM performance, when
additional measurements should only improve an estimator's performance.
The weakness of the FT approach is its inability to use knowledge of measurement
uncertainty. The FT method is sufficient if measurements have equal uncertainties, but this is
rarely true in constraint suspension consistency checking. This is because the analytically
redundant measurements originate from different sensors and some are propagated through
constraint functions. A threshold for each combination of two measurements is a feasible
solution but would be difficult to implement in practice. Parity space and hypothesis testing
address these issues by explicitly accounting for measurement uncertainties. They also ease the
burden of the system designer by removing the need to tune T at every node. P(FA) is an input
to each node, making the VHM performance directly tunable.
3.3.2 Numerical
The simulation presented here was created to quantitatively demonstrate the conceptual
scenario in Section 3.3.1. The FT and SCS consistency checking algorithms were implemented
at a single node. Three measurements, x1, x2, and x3, were present at the node, with a
deterministic bias added to x2 to simulate a fault. The uncertainty of measurement x3 was varied
to test VHM performance as measurement uncertainties increasingly differed.
Performance was defined relative to the P(FA) and P(MD). A single standard
performance metric does not exist for VHM algorithms, so a new one is presented here [5]. A
figure of merit (FOM) was developed so that it yields values between 0 and 1, where higher
values indicate better performance. The FOM is expressed as
FOM = 1 - \frac{c_1}{c_1 + c_2} P(FA) - \frac{c_2}{c_1 + c_2} P(MD)    (3.6)
where c1 and c2 are constants allowing the designer to tune the relative importance of false
alarms and missed detections, as the requirements will vary between applications. For this
simulation, c1 and c2 are set to 1.
First, the inputs to FT and SCS algorithms, T and P(FA) respectively, must be set
appropriately. Selecting the thresholds for the FT method is not straightforward due to the
difficulty in mapping T to P(FA) and P(MD). In order to fairly assess the performance of the FT
implementation, the algorithm was numerically optimized at each data point. A range of
thresholds was input for each set of data and the threshold resulting in the highest FOM was
chosen. Therefore, the data presented for FT represents the best possible performance for the
given data and the chosen FOM.
The P(MD) is known for a given failure mode and input P(FA) in parity space and
hypothesis testing. Therefore, the optimal input P(FA) can be calculated for a given FOM,
measurement DOF, measurement uncertainties, and expected fault bias using Eq. 3.3. A closed
form analytical solution is cumbersome due to the form of the chi-squared cumulative
distribution functions, but a lookup table was computed a priori to provide the P(FA) input
expected to maximize FOM. This lookup table was used to select the input P(FA) for each SCS
data point. Simulations comparing performance with numerically optimized P(FA) input and
with P(FA) input based on the optimal lookup table showed nearly indistinguishable
performance, rendering numerical optimization unnecessary for SCS.
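The sketch below illustrates how such a lookup table can be built. It is a hedged Python illustration using the standard non-central chi-squared description of a mean-shift fault acting on a chi-squared test statistic; it is not necessarily the exact form of Eq. 3.3, and the non-centrality value in the example is hypothetical.

    # Illustrative sketch: a priori lookup of the FOM-maximizing input P(FA)
    # for a chi-squared consistency check under a known fault non-centrality.
    import numpy as np
    from scipy import stats

    def expected_pmd(p_fa, dof, lam):
        """P(MD) of a chi-squared test with 'dof' DOF when the fault induces
        non-centrality 'lam' in the test statistic."""
        tau = stats.chi2.ppf(1.0 - p_fa, df=dof)
        return stats.ncx2.cdf(tau, df=dof, nc=lam)

    def best_input_pfa(dof, lam, c1=1.0, c2=1.0, grid=np.logspace(-5, -1, 200)):
        """Sweep candidate input P(FA) values and keep the one maximizing
        FOM = 1 - c1/(c1+c2)*P(FA) - c2/(c1+c2)*P(MD)."""
        fom = 1.0 - (c1 * grid + c2 * expected_pmd(grid, dof, lam)) / (c1 + c2)
        return grid[np.argmax(fom)], fom.max()

    # Example (hypothetical non-centrality): three measurements -> 2 DOF.
    print(best_input_pfa(dof=2, lam=25.0))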
Figure 11: Numerical Simulation Results Comparing SCS and FT (FOM vs. σ3/σ0 for FT, SCS, and the SCS expected curve)
Figure 12: Zoomed-in Numerical Simulation Results Comparing SCS and FT
Figure 11 presents the simulation results and Figure 12 shows a zoomed in area from
Figure 11. 1000 sets of measurements for both the faulty and nominal scenarios were generated
for each data point. These measurements were fed to the FT and SCS implementations of
consistency checking to numerically determine P(FA) and P(MD) for each method. Between
data points, the ratio σ3/σ0 was varied, where σ3 is the uncertainty of x3 and σ0 is the uncertainty
of measurements x1 and x2. This effectively increased the difference between uncertainties
among the measurements considered. At low ratios, the measurements had very similar
uncertainties and the FT method could be optimized to work as well as the SCS implementation.
As the ratio grew, FT performance declined rapidly while SCS maintained a relatively high
FOM. SCS maintained 97% of its original performance after a one order of magnitude increase
in the ratio, while FT degraded to 87% in the same interval. This ratio increase is the equivalent
of a sensor performance degradation leading to one order of magnitude more noise in the signal.
Decrease in the SCS FOM is due to a lower overall signal-to-noise ratio (SNR) in the system at
higher ratios. Decrease in the FT FOM is primarily due to the logic presented in Section 3.3.1.
An expected performance curve derived from Eq. 3.3 and knowledge of the only fault mode
validates the simulation results.
Further validation of the logic presented in Section 3.3.1 is possible through a simulation
with only two measurements available at a node. Due to the presence of only one possible
combination of measurements for comparison, a single threshold is sufficient and the FT method
can be optimized to perform as well as SCS for any noise ratio. As with the three measurement
analysis presented earlier, 1000 data sets were generated for both the faulty and nominal
scenarios, with fault insertion done by adding a deterministic bias to x2. The ratio σ2/σ0 was
varied, where σ2 is the uncertainty of x2 and σ0 is the uncertainty of measurement x1. The
threshold for the FT method was again numerically optimized, while the P(FA) input for SCS
was found using a lookup table tuned for one less DOF.
Figure 13: Numerical Simulation Results for Two Measurements Scenario (FOM vs. σ2/σ0 for FT, SCS, and the SCS expected curve)
Figure 13 shows the simulation results for the two-measurement scenario. As predicted,
the performance of SCS and FT are indistinguishable, with performance degradation occurring
due to decreasing SNR. Again, the expected curve for SCS provides validation of the data.
Though performance in the two measurement case is identical, the numerical optimization of the
FT approach is less practical. The SCS implementation is therefore more desirable because of its
predictable performance.
Chapter 4: Spacecraft Application
As with any algorithm, Stochastic Constraint Suspension (SCS) must be validated before
use on a flight system [5]. Simulated sensor data from a representative system provides an
inexpensive way to validate and refine SCS on the ground. The selected spacecraft application is
the guidance, navigation, and control (GNC) subsystem of a generic manned spacecraft with
typical sensors and actuators present. This application leverages Draper Laboratory's extensive
experience with manned spacecraft GNC. For this purpose, we created a new simulation to
produce realistic sensor data for the VHM algorithm. We present the 3-DOF simulation
architecture and the models of vehicle actuators, dynamics, and sensors. We then converted the
vehicle simulation into a constraint model for embedding inside SCS. The constraint functions
and framework are detailed along with the alternative constraint representations considered.
Finally, SCS performance data is provided for a variety of test cases, including a direct
comparison to the Fixed Threshold (FT) implementation at the system level.
4.1 Application Overview
The application used for validation in this chapter is a generic GNC subsystem on a
manned spacecraft. The sensors, actuators and associated uncertainty levels present are
representative of a portion of a typical spacecraft GNC subsystem. Noise levels are based on
space qualified hardware specifications where open source data is available. This 3-DOF
simulation accounts for translational motion only, so attitude sensors (e.g., star trackers,
gyroscopes) are not included.
The simulated spacecraft performs maneuvers in orbit using a set of 11 primary thrusters.
The configuration was chosen for 6-DOF control on a lifting body design. Each thruster
produces 216 N of force with a standard deviation of 10 N when on. These values are based on
the European Space Agency (ESA) Automated Transfer Vehicle (ATV) reaction control system
thrusters [32]. Second and third string redundant thrusters are typically present in manned
spacecraft but are not included here, because they are not necessary for validating the SCS
algorithm. To improve the observability of thruster failures, one temperature sensor was added
to each thruster; these sensors determine the on or off state. Alternately, pressure sensors could
45
have been used to provide similar information. The navigation system uses an inertial
measurement unit (IMU) containing an accelerometer and a gyroscope for each axis. The
accelerometers and gyroscopes provide body frame accelerations and angular velocities,
respectively. A Global Positioning System (GPS) receiver is modeled as providing absolute
position and velocity data. A real navigation system would also carry redundant navigation sensors
for hardware-level fault detection. These are not included in the model because the low level
FDI as shown in Figure 9 is assumed to be in place. The hardware configuration in Table 1
presents the vehicle layout and thruster directions. Figure 14 illustrates the approximate thruster
locations and directions and defines the coordinate system using side, top, and rear views. The
rear view does not show thrusters parallel to the x axis. Thrusters 4 through 11 are canted
relative to the coordinate axes and are split into their component vectors for illustration purposes.
Hardware                     Location (m)         Unit Direction Vector
Center of Mass               [0, 0, 0]            N/A
GPS Receiver                 [0, 0, 0]            N/A
Inertial Measurement Unit    [0, 0, 0]            N/A
Thruster 1                   [-4, 0, 0]           [-1, 0, 0]
Thruster 2                   [3, -2, 0]           [1, 0, 0]
Thruster 3                   [3, 2, 0]            [1, 0, 0]
Thruster 4                   [-3, -1, 0]          [0, -0.707, -0.707]
Thruster 5                   [-3, 1, 0]           [0, 0.707, -0.707]
Thruster 6                   [-2.5, -0.5, 1]      [0, -0.707, 0.707]
Thruster 7                   [-2.5, 0.5, 1]       [0, 0.707, 0.707]
Thruster 8                   [1.5, -2, -1]        [0, -0.707, 0.707]
Thruster 9                   [2, -2, 0]           [0, -0.707, -0.707]
Thruster 10                  [1.5, 2, -1]         [0, 0.707, 0.707]
Thruster 11                  [2, 2, 0]            [0, 0.707, -0.707]

Table 1: Generalized Hardware Locations and Thruster Directions
Figure 14: Generalized Thruster Configuration and Coordinate System (side, top, and rear views showing approximate thruster locations and directions)
4.2 Vehicle Simulation
A 3-DOF simulation of the vehicle hardware and dynamics produced sensor data for
VHM validation. The simulation was open loop because we are only assessing the algorithm's
ability to detect and isolate faults. The VHM algorithm is independent of the GNC approach and
thus there is no need to close the GNC loop. Therefore, the sensed state has no effect on the
programmed thruster commands. The complete architecture is presented in Figure 15.
Open loop on or off commands for each thruster are inputs to the simulation. Spacecraft
maneuvers are performed using predetermined sequences of commands. All simulations in this
chapter command all of the thrusters to the 'on' position for each time step to ensure
observability into thrusters stuck off. There is still a net acceleration on the vehicle in the x
direction to facilitate VHM testing. Forces on the vehicle are computed from these commands
based on the thrust vector and placement of each actuator. The actuator force magnitude for
thruster i is computed as
F_i = cmd_i \cdot \mathcal{N}(F_{nom}, F_{std}^2)    (4.1)

where F_{nom} is the nominal thruster force, F_{std} is the expected normal standard deviation of the
thruster force, cmd_i is 0 for an 'off' command and 1 for an 'on' command to thruster i, and
\mathcal{N}(\mu, \sigma^2) indicates a random number generated from a normal distribution with mean \mu and
standard deviation \sigma. The force vector from thruster i is then

\vec{F}_i = -F_i \cdot \hat{dir}_i    (4.2)

where \hat{dir}_i is the unit vector in the ith thruster nozzle direction. The negative sign is present
because the thrust is in the opposite direction to the thruster nozzle.
External environmental disturbances also produce stochastic disturbance forces on the
vehicle. The environmental forces are produced with
F_{env} = \mathcal{N}(0, \sigma_{env}^2)    (4.3)

where F_{env} is the zero mean environmental disturbance force and \sigma_{env} is the normal standard
deviation of this force. The total force on the vehicle in the body frame, F, is

F = F_{env} + \sum_{i=1}^{11} \vec{F}_i    (4.4)

where the forces from the eleven thrusters and the environment are summed.
The total force from thrusters and disturbances are inputs to the dynamics model. The
dynamics model uses the vehicle mass properties to compute linear accelerations on each axis in
the inertial frame. The body and inertial frames are chosen to be equivalent in this simulation,
because no rotational motion is included. The accelerations are integrated over time to produce
the current vehicle state. The acceleration a, velocity v, and position r on each axis were
computed using
a = \dot{v} = \frac{F}{m}    (4.5)

v = v_0 + a \cdot dt    (4.6)

r = r_0 + v_0 \cdot dt + \frac{1}{2} a \cdot dt^2    (4.7)

where F is the force on the vehicle, m is the vehicle mass, initial values are indicated by the 0
subscript, and dt is the time step used in the simulation (set to 0.01 seconds).
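A minimal Python sketch of one propagation step of this open-loop simulation is shown below, using the thruster directions from Table 1 and the nominal values summarized later in Table 2. It is illustrative only and is not the MATLAB simulation used in this work; the per-axis treatment of the environmental disturbance is an assumption.

    # Minimal sketch of one open-loop simulation step (Eqs. 4.1-4.7).
    import numpy as np

    rng = np.random.default_rng(0)
    F_NOM, F_STD, SIGMA_ENV, MASS, DT = 216.0, 10.0, 1.0, 10000.0, 0.01

    # Unit nozzle directions for the 11 thrusters (Table 1).
    DIRS = np.array([[-1, 0, 0], [1, 0, 0], [1, 0, 0],
                     [0, -0.707, -0.707], [0, 0.707, -0.707],
                     [0, -0.707, 0.707], [0, 0.707, 0.707],
                     [0, -0.707, 0.707], [0, -0.707, -0.707],
                     [0, 0.707, 0.707], [0, 0.707, -0.707]])

    def step(cmd, r0, v0):
        """Propagate one time step given on/off commands cmd (0 or 1 per thruster)."""
        mags = cmd * rng.normal(F_NOM, F_STD, size=11)      # Eq. 4.1
        F = -(mags[:, None] * DIRS).sum(axis=0)             # Eqs. 4.2 and 4.4 (thrust)
        F = F + rng.normal(0.0, SIGMA_ENV, size=3)          # Eqs. 4.3-4.4 (disturbance, per axis)
        a = F / MASS                                        # Eq. 4.5
        v = v0 + a * DT                                     # Eq. 4.6
        r = r0 + v0 * DT + 0.5 * a * DT**2                  # Eq. 4.7
        return a, v, r

    a, v, r = step(cmd=np.ones(11), r0=np.zeros(3), v0=np.zeros(3))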
The navigation sensors measure the truth acceleration, velocity, and position components
of the vehicle with given uncertainties. The accelerometer is based on the Honeywell Miniature
Inertial Measurement Unit (MIMU) [33] and the GPS is based on the Integrated GPS and
Occultation Receiver (IGOR) [34]. For GPS and accelerometer measurements, the sensed values
were simulated as
\tilde{a} = \mathcal{N}(truth, \sigma_{accel}^2)    (4.8)

\tilde{r} = \mathcal{N}(truth, \sigma_{GPS}^2)    (4.9)

where \tilde{a} is the measured acceleration on a given axis, \tilde{r} is the GPS measured position on a given
axis, truth is the state component value computed by the simulated dynamics, \sigma_{accel} is the standard
deviation of the white noise on the accelerometer measurement, and \sigma_{GPS} is the standard deviation of
the white noise on the GPS measurement.
The temperature sensor parameters were chosen to have a conservative signal-to-noise
ratio (SNR). Here the signal can be defined as the difference between the on and off temperature
because we are interested in determining on or off state rather than the absolute temperature.
The temperature sensors are modeled as
T = 300 + \mathcal{N}(0, \sigma_T^2) \ \mathrm{K} \quad (\text{thruster off}), \qquad T = 400 + \mathcal{N}(0, \sigma_T^2) \ \mathrm{K} \quad (\text{thruster on})    (4.10)

where T is the measured temperature on a given thruster in K, 300 K is the nominal off
temperature, 400 K is the nominal on temperature, and \sigma_T is the standard deviation of the white
noise on the sensor measurements.
SNR because in reality, the temperature extremes could be much greater for a sensor measuring
thruster exhaust temperature. The space shuttle main engine (SSME) hot gas temperature sensor
was required to measure temperatures of approximately 1000 K during engine firings and 100 K
when the engine was off due to the cryogenic propellants. The sensor was required to have
<0.5% uncertainty in these conditions [35]. To avoid the need for such a complex and expensive
sensor, this simulation assumes coarse sensors slightly upstream of the thruster exhaust. This is
acceptable, because the purpose of the sensors here is to determine on or off state for each
thruster, whereas the Shuttle temperature sensors performed more involved VHM on the thruster
performance. Table 2 summarizes the nominal model parameters and associated uncertainties.
Parameter                          Mean                           Standard deviation
Thruster Force                     216 N                          10 N
Environmental Disturbance Force    0 N                            1 N
Accelerometer component            Truth                          0.00212 m/s^2
GPS position component             Truth                          1.2 m
Temperature Sensor                 T_on = 400 K, T_off = 300 K    5 K
Vehicle Mass                       10,000 kg                      0
Control and Sensor Rates           1 Hz                           0

Table 2: Model Parameters and Uncertainties
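The sensor models of Eqs. 4.8-4.10 can be sketched as follows. This is an illustrative Python rendering with the Table 2 noise levels; the function and variable names are hypothetical, and this is not the thesis's MATLAB code.

    # Minimal sketch of the sensor models (Eqs. 4.8-4.10).
    import numpy as np

    rng = np.random.default_rng(1)
    SIGMA_ACCEL, SIGMA_GPS, SIGMA_T = 0.00212, 1.2, 5.0   # m/s^2, m, K
    T_OFF, T_ON = 300.0, 400.0                            # K

    def sense(a_truth, r_truth, thruster_on):
        """Add white noise to truth states and thruster temperatures."""
        a_meas = rng.normal(a_truth, SIGMA_ACCEL)                     # Eq. 4.8
        r_meas = rng.normal(r_truth, SIGMA_GPS)                       # Eq. 4.9
        t_nom = np.where(thruster_on, T_ON, T_OFF)
        t_meas = t_nom + rng.normal(0.0, SIGMA_T, size=t_nom.shape)   # Eq. 4.10
        return a_meas, r_meas, t_meas

    a_m, r_m, t_m = sense(np.zeros(3), np.zeros(3), np.ones(11, dtype=bool))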
Faults can be inserted for thrusters or sensors. We model thruster faults as valves stuck
in the on or off position. For algorithm evaluation purposes, we designed simulated maneuvers
that yielded observable results. If a thruster is failed off, it must be commanded on in the
maneuver for detection to be possible. Simulations in this chapter consider only one time step of
data in order to eliminate any dependence on the time elapsed before a fault. Therefore, the
maneuvers in this chapter command all thrusters 'on' for one time step in order for any thruster
stuck off to be observable. Though this is not a realistic thruster firing sequence, the maneuver
ensures observability into thrusters stuck off for validation purposes and there is a net
acceleration on the vehicle because the thrusters are not balanced in the x direction. For sensors,
a deterministic fault bias can be added to any hardware in the system. Sensor failures are
modeled as a step bias shift for consistency with literature [25]. For example, a fault bias of 1
m/s^2 may be added to the accelerometer x axis measurement.
Several assumptions were made to simplify the simulation. Vehicle mass properties were
assumed to be constant and perfectly known. In practice, fuel consumption, flexible structures,
and vehicle reconfigurations would require mass properties to be periodically updated in the
constraint models. Thrusters are assumed to be fixed and either on or off, with no gimbaling or
throttling capability. All thrusters are assumed to have the same uncertainty in force. No
uncertainty in thruster direction or on time is incorporated.
Figure 15: Spacecraft Simulation Architecture
Sensor models incorporate the following assumptions. Sensors are assumed to be located
at the center of mass and the IMU frame is assumed to be aligned with the body frame to remove
the need for any frame transformations. Frame transformations are easily incorporated into
constraint functions if required. No uncertainty in sensor locations or alignment was included.
Thruster commands occur at 1 Hz. In reality, sensor rates will vary from the control rate. The
simulation assumes the SCS algorithm runs at the controller rate and samples each sensor at this
rate. Thus, sensor measurements are produced at 1 Hz in the simulation for use in the VHM
algorithm. Simulated sensor measurements are produced by adding white noise to the truth state.
In practice, sensor sampling rates will vary from the VHM rate, so logic must be in place to
supply the available sensor measurements to SCS at the appropriate rate. This measurement
selection logic is unnecessary for initial validation. The availability of sensor measurements at a
faster rate than VHM would enable the use of filtering and should only increase the signal-to-noise
ratio.
Vehicle orbital and rotational dynamics were not incorporated into the simulation in order
to mitigate the complexity of the constraint models for initial validation. Linear equations of
motion were used with discrete integration steps. Environmental disturbances were generated
using random normally distributed forces with zero mean and a given standard deviation.
Models of specific disturbances such as gravity gradient and drag were not included because the
overall level of dynamics uncertainty is more important for SCS validation than the specific
disturbance sources.
4.3 Constraint Models
SCS requires a constraint model of the system that is going to be monitored. As
described in Chapter 2, this model consists of a network of components connected by nodes,
constraint functions within components, and sensors at appropriate nodes. Section 4.1 presented
the GNC spacecraft subsystem application. GNC subsystems are closed loop, creating an extra
complication in VHM implementation. Fault information may propagate through the feedback
loop and make it challenging to detect the faulty component or sensor. Fesq determined that
closed loop systems could be addressed by breaking the loop and not modeling the controller
[21]. However, this loop-breaking approach requires that the VHM software be fed sensor data
at a rate at least as fast as the control loop rate. This prevents faults in one cycle from affecting
data in the next cycle. All components of the algorithm and simulation operate at 1 Hz in order
for the control loop to be broken for VHM.
A given system has more than one possible constraint model representation, and the
designer's selection of a constraint model can be significant in determining the VHM's
diagnostic resolution and computational load. In Section 2.3, we showed that superstructures
represent the limits of diagnostic resolution based on a given constraint model and sensor
placement. A fault in a component within a superstructure can only be isolated to the full group
of components forming the superstructure. For example, decoupling vehicle dynamics into
multiple components may eliminate superstructures and allow isolation to a single component
rather than a group of components. However, extra components add complexity to the constraint
model and require additional computation. Constraint model complexity also impacts the
designer's ability to comprehend and verify the integrated model and algorithm. Figure 16 shows
the selected constraint model. Blue blocks and yellow circles represent components and sensors
as defined by SCS, respectively. The arrows point in the forward propagation direction. The
model contains more components than may be intuitive, but this design choice increases
diagnostic resolution for the given simulated hardware configuration.
Figure 16: Constraint Model for Spacecraft Application (on/off commands into thruster components; forces/temperatures to temperature sensors and dynamics components; accelerations to accelerometers and integrator components; positions to the GPS)
We initially considered a variety of constraint models and ultimately chose the model in
Figure 16 based on preliminary testing with the alternative models considered. Two alternate
constraint model implementations were initially considered for the vehicle GNC subsystem. The
first model did not contain any temperature sensors and, as a result, had no observability into the
on/off state of each thruster. Figure 17 shows this constraint model representation. Testing
demonstrated that large superstructures formed without any observability into each thruster's
state. The low diagnostic resolution occurs because multiple thrusters affect each degree of
freedom. A fault detected in a given degree of freedom could only be isolated to all thrusters
that affect that degree of freedom. Therefore, without a measurement to trace a fault to a
particular thruster, all thrusters affecting a given degree of freedom become a single
superstructure. The second model, shown in Figure 18, incorporated coarse temperature sensors
on each thruster to determine actuator on/off state, allowing for isolation of thruster failures to a
single thruster. Still, testing showed that navigation sensor errors were isolated to large
superstructures due to the dynamics modeling approach. The dynamics were initially modeled to
add the forces input to each degree of freedom and then integrate the acceleration to produce
position measurements. The dynamics component output both acceleration and position
quantities. By combining the addition and integration dynamics into a single component, a
sensor error could not be distinguished from a dynamics error. Therefore, a configuration was
used that improved sensor diagnostic resolution without adding extra hardware. In this model,
the integration of acceleration into position was separated from the original dynamics
component.
The inputs to the selected constraint model are the commands from control, which are on
or off for all 11 thrusters. The commands are sensed in software in order for the VHM algorithm
to have knowledge of the expected thruster behavior. These commands propagate through
individual thruster components that output the forces along the vehicle axes and the thruster
temperature. Thruster force outputs are only connected to the nominal DOFs affected. In
reality, there may be residual thrust components along unintended axes, but this uncertainty in
thrust direction is accounted for in the process noise of the dynamics constraint functions. The
thruster temperatures are sensed and the forces are input into dynamics components. The
dynamics are separated into the three translational degrees of freedom to improve diagnostic
resolution. The dynamics components output accelerations, which are sensed by the
accelerometers. The accelerations are also connected to integrator components that output
vehicle absolute position. The absolute position is sensed by the GPS receiver. The constraint
function equations that capture this model are provided later in this section.
Figure 17: Constraint Model without Temperature Sensors (on/off commands into thruster components; dynamics components output accelerations/positions; blue blocks are components, yellow circles are sensors, arrows show forward information flow)
Superstructures as defined by Fesq [21] are still present but can be eliminated with minor
hardware and logic additions. For instance, at constraint model boundaries, it is not possible to
distinguish between the boundary sensor and the associated boundary component. For the
navigation sensors, the boundary components are dynamics models rather than hardware.
Typically, a failure isolated to this superstructure should be associated with a sensor failure
because the dynamics cannot fail if the model is correct. On the other hand, dynamics failures
could be associated with unexpected external disturbances, such as bumping into another
spacecraft during a docking maneuver. This simulation assumes bumping cannot occur because
a second spacecraft or body is not modeled. Validation for landing, rendezvous, or docking
maneuvers may not make this assumption. At the input boundary, command sensors cannot be
distinguished from their associated thrusters. In this case, isolation should be associated with a
thruster failure because command sensors simply read the control commands in software. One
superstructure between two hardware components involves the thruster and the associated
temperature sensor. A fault in the temperature sensor results in a superstructure containing both
the sensor and the thruster. However, the reverse is not true, as a thruster fault will not implicate
a temperature sensor. In this case, extra logic in the software or an extra temperature sensor on
each thruster could eliminate the superstructure. This superstructure analysis would be useful in
early-stage system design for insight into fault observability. In summary, all physical
components in the selected constraint model, except for the temperature sensors, can be isolated
to the specific faulty component with the chosen constraint model. The analysis in Section 4.4
only considers thruster and navigation sensor failures because the models of these hardware
components are the most realistic in the simulation. In addition to constraint model organization,
superstructures depend on signal-to-noise ratios. A lower SNR at a given node may lead to a
higher probability of isolating to a larger superstructure. This will be shown in Section 4.4.2
with the degradation of the temperature sensors.
Figure 18: Constraint Model without Split Dynamics (on/off commands into thruster components; forces/temperatures to temperature sensors and a single dynamics component that outputs accelerations/positions)
Superstructures in SCS are largely determined by the chosen constraint model. There are
also limitations within the hardware that are not reflected in the constraint model. For example,
the GPS is an integrated hardware unit and is typically at the lowest replaceable unit level. The
position measurement is split into three axes to improve diagnostic resolution in the constraint
model, but in reality a GPS failure would likely result in error on all three axes. To
accommodate these types of hardware connections, the SCS algorithm includes logic to group
components or sensors in the model. Groups of components or sensors can be identified for
simultaneous, rather than individual, suspension. Otherwise, a failed GPS resulting in errors on
all three axes would require the algorithm to attempt all combinations of 1, 2, and 3 simultaneous
suspensions in order to isolate the fault. This large number of isolations would require
significant computation effort and can be avoided by grouping components or sensors that are
actually part of the same hardware.
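A minimal sketch of such grouping logic is shown below. The group and sensor names are hypothetical, and the structure is only one possible way to express the idea; it is not the thesis's implementation.

    # Illustrative sketch: declare hardware groupings so the isolation search
    # suspends all members of one replaceable unit together.
    SUSPENSION_GROUPS = {
        "GPS_RECEIVER": ["gps_x", "gps_y", "gps_z"],  # one LRU -> suspend all axes at once
        "IMU": ["accel_x", "accel_y", "accel_z"],
    }

    def expand_suspects(suspect, groups=SUSPENSION_GROUPS):
        """If a suspected sensor belongs to a grouped hardware unit, suspend the group."""
        for unit, members in groups.items():
            if suspect in members:
                return members
        return [suspect]

    print(expand_suspects("gps_y"))   # -> ['gps_x', 'gps_y', 'gps_z']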
Figure 19: Component Inputs and Outputs with Connections (forward and reverse constraints; command into a thruster component, thruster forces F1, F2, F3 and temperature out; forces into the x dynamics component, acceleration a out; acceleration into the integrator, position P out)
Each component contains forward and reverse constraint functions to map inputs to
outputs and vice versa. Figure 19 identifies the inputs and outputs for the three basic types of
components and maps the connections between them. The y and z dynamics differ from the x
dynamics component only in the number of input forces. The fundamental component constraint
models are the building blocks for the more complex system level diagram shown earlier in
Figure 16. The constraint functions for measurements and associated uncertainties are provided
here for each component. They are based on the simulation truth equations provided in Section
4.2. The inputs to the thruster components are on or off commands, which can be represented by
a 1 or 0, respectively. The outputs are forces in each body direction that the thruster affects, as
well as a temperature. Therefore, the forward constraint function from a command to a force
component in direction i is
F_i = cmd_i \cdot F_{nom} \cdot dir_i    (4.11)

where F_{nom} is the nominal thruster force, dir_i is the corresponding component of the direction
unit vector for thruster i, and cmd_i is the on or off command for thruster i represented by a 1 or
0. The uncertainty is propagated using Eq. 3.1, written here as

\sigma_{out}^2 = \sigma_{in}^2 \cdot (F_{nom} \cdot dir_i)^2 + (F_{std} \cdot dir_i)^2    (4.12)
where \sigma_{out}^2 and \sigma_{in}^2 are the output and input variances, respectively, and F_{std} is the standard
deviation of the thrust force when the thruster is on. The forward constraint from command to
temperature is

T = T_{off} + (T_{on} - T_{off}) \cdot cmd_i    (4.13)

where T_{on} and T_{off} are the nominal temperatures when the thruster is on and off, respectively.
The uncertainty is propagated using

\sigma_{out}^2 = \sigma_{in}^2 \cdot (T_{on} - T_{off})^2    (4.14)
where no process noise was added because there is no uncertainty in the simulation for on or off
temperature. The reverse constraint function from force to command is

cmd_i = \frac{F_i}{F_{nom} \cdot dir_i}    (4.15)

and the associated uncertainty equation is

\sigma_{out}^2 = \sigma_{in}^2 \cdot \left(\frac{1}{F_{nom} \cdot dir_i}\right)^2 + \left(\frac{F_{std}}{F_{nom}}\right)^2    (4.16)
where the process noise is estimated with the ratio of the thrust standard deviation to the nominal
magnitude. The reverse constraint function from temperature to command is
cmd_i = \frac{T - T_{off}}{T_{on} - T_{off}}    (4.17)

where T is the measured temperature. The corresponding command uncertainty is given by

\sigma_{out}^2 = \sigma_{in}^2 \cdot \left(\frac{1}{T_{on} - T_{off}}\right)^2    (4.18)
where no process noise was added because we assume commands are known perfectly. This is
reasonable because controller commands are directly read in software.
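The thruster component's constraint functions and uncertainty propagation (Eqs. 4.11-4.18) can be sketched as follows. This is an illustrative Python rendering in which each constraint returns a (mean, variance) pair; it is not the thesis's EML implementation, and the helper names are hypothetical.

    # Illustrative sketch: thruster component forward and reverse constraints
    # carrying (mean, variance) pairs (Eqs. 4.11-4.18).
    F_NOM, F_STD = 216.0, 10.0      # N
    T_ON, T_OFF = 400.0, 300.0      # K

    def fwd_cmd_to_force(cmd, var_cmd, dir_i):
        mean = cmd * F_NOM * dir_i                                      # Eq. 4.11
        var = var_cmd * (F_NOM * dir_i) ** 2 + (F_STD * dir_i) ** 2     # Eq. 4.12
        return mean, var

    def fwd_cmd_to_temp(cmd, var_cmd):
        mean = T_OFF + (T_ON - T_OFF) * cmd                             # Eq. 4.13
        var = var_cmd * (T_ON - T_OFF) ** 2                             # Eq. 4.14
        return mean, var

    def rev_force_to_cmd(force, var_force, dir_i):
        mean = force / (F_NOM * dir_i)                                  # Eq. 4.15
        var = var_force / (F_NOM * dir_i) ** 2 + (F_STD / F_NOM) ** 2   # Eq. 4.16
        return mean, var

    def rev_temp_to_cmd(temp, var_temp):
        mean = (temp - T_OFF) / (T_ON - T_OFF)                          # Eq. 4.17
        var = var_temp / (T_ON - T_OFF) ** 2                            # Eq. 4.18
        return mean, var

    # Example: an 'on' command (known perfectly) propagated to the z-force of thruster 4.
    print(fwd_cmd_to_force(cmd=1.0, var_cmd=0.0, dir_i=-0.707))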
The dynamics blocks sum thruster forces and output vehicle acceleration on a body axis.
The forward constraint functions for the three dynamics blocks are
a_x = \frac{1}{m}(F_1 + F_2 + F_3)    (4.19)

a_y = \frac{1}{m}(F_4 + F_5 + F_6 + F_7 + F_8 + F_9 + F_{10} + F_{11})    (4.20)

a_z = \frac{1}{m}(F_4 + F_5 + F_6 + F_7 + F_8 + F_9 + F_{10} + F_{11})    (4.21)
where m is the vehicle mass, a_i is the acceleration along the i body axis, and F_i is the force from
thruster i along that axis, taking into account the thruster direction. The uncertainty is propagated
with

\sigma_{out}^2 = \left(\sigma_{in,1}^2 + \sigma_{in,2}^2 + \cdots + \sigma_{in,n}^2 + \sigma_{env}^2\right) \cdot \left(\frac{1}{m}\right)^2    (4.22)

where \sigma_{env} is the standard deviation of the zero mean environmental force. Section 2.2
explained that no technique has been found to propagate through reverse constraint functions that
rely on multiple input node values. Reverse constraint functions are not included for the
dynamics blocks because the constraints are dependent on knowledge of all force inputs. The
force from a given thruster cannot be computed based on the acceleration component without
knowledge of the force component from the remaining thrusters.
The integration block computes vehicle position components based on accelerations. The
discrete integration is performed using
r = r_0 + v_0 \cdot dt + \frac{1}{2} a \cdot dt^2    (4.23)

where r_0 and v_0 are the initial position and velocity stored in the VHM code from the previous time
step and dt is the size of the time step based on the VHM rate. Uncertainty is propagated with

\sigma_{out}^2 = \sigma_{in}^2 \cdot \left(\frac{1}{2} dt^2\right)^2    (4.24)
where no process noise was added because the integration has no uncertainty. Uncertainty in
previous time step values is not incorporated in this constraint model. The reverse constraint
function is a discrete differentiation written as
a = 2 \cdot \frac{r - r_0 - v_0 \cdot dt}{dt^2}    (4.25)
and the uncertainty is propagated as
\sigma_{out}^2 = \sigma_{in}^2 \cdot \left(\frac{2}{dt^2}\right)^2    (4.26)
where no process noise was added.
In a more complicated example, constraint functions might need to incorporate frame
transformations and time-varying parameters, such as mass. However, in Section 4.4 we will
show that even with this 3-DOF example, the constraint functions are sufficient for analyzing
algorithm performance in the presence of realistic uncertainties.
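For completeness, the dynamics and integrator constraint functions (Eqs. 4.19-4.26) admit the same (mean, variance) treatment, sketched below under the same caveats; the 1 s time step reflects the assumed 1 Hz VHM rate, and the names are hypothetical.

    # Illustrative sketch: dynamics and integrator constraints as (mean, variance) pairs.
    MASS, DT, SIGMA_ENV = 10000.0, 1.0, 1.0   # kg, s (assumed VHM rate), N

    def fwd_dynamics(forces, force_vars):
        """Sum the input forces on one axis and map to acceleration (Eqs. 4.19-4.22).
        No reverse constraint exists because it would need all force inputs."""
        mean = sum(forces) / MASS
        var = (sum(force_vars) + SIGMA_ENV ** 2) / MASS ** 2
        return mean, var

    def fwd_integrator(a, var_a, r0, v0):
        """Discrete integration of acceleration to position (Eqs. 4.23-4.24)."""
        mean = r0 + v0 * DT + 0.5 * a * DT ** 2
        var = var_a * (0.5 * DT ** 2) ** 2
        return mean, var

    def rev_integrator(r, var_r, r0, v0):
        """Discrete differentiation of position to acceleration (Eqs. 4.25-4.26)."""
        mean = 2.0 * (r - r0 - v0 * DT) / DT ** 2
        var = var_r * (2.0 / DT ** 2) ** 2
        return mean, var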
4.4 Performance Analysis
We investigated VHM performance using the vehicle simulation and constraint models.
We first validated the code by running a fault scenario for each component and sensor at very
high signal-to-noise ratios in order to demonstrate perfect FDI. All faults were repeatedly
successfully isolated within their expected superstructures, confirming the simulation and
constraint model consistency. Several analysis cases are described in this section. System level
P(FA) as a function of input P(FA) to each node is found empirically. The effect of a degrading
sensor on diagnostic resolution is illustrated using the temperature sensors. Sensitivity of SCS
performance to the system SNR is investigated using dynamics uncertainty and a comparison
between two IMU models. SCS is then compared against Fixed Threshold (FT) to demonstrate
the system level improvement. Finally, SCS executions on an embedded processor are briefly
discussed.
Manned spacecraft VHM requirements are typically written in terms of fault tolerance
and general ability to detect and isolate key faults [36]. However, this requirements framework
does not translate well to the performance metrics often reported in literature, such as P(FA) and
P(MD). A thorough literature search finds very few instances of specific requirements for
P(FA), P(MD), and for similar isolation metrics. This suggests these requirements either are not
well defined or are proprietary. For the purpose of validating SCS in this thesis, we define a set
of metrics that constitute acceptable VHM performance. In detection, both the P(FA) and the
P(MD) shall not exceed 5%. In isolation, the algorithm must be correct in at least 95% of cases,
in that the faulty component or sensor is reported by the VHM (even if it is not the only
component or sensor reported). Results reported in this section will be compared to these
measures of acceptable performance. If specific VHM requirements become available in the
future, these results can be revisited.
4.4.1 System Level P(FA)
The relationship between system level P(FA) and the P(FA) input to each node for parity
space and hypothesis testing is critical to VHM design because it links the algorithm inputs to the
algorithm performance. A desired P(FA) is input to each node for parity space and hypothesis
testing calculations. However, the system is composed of many nodes, and if any node produces
a false alarm, there is a false alarm at the system level. Therefore, the system level P(FA) is
expected to be higher than the node input P(FA). We define the system level P(FA) here as the
empirical false alarm rate achieved in simulation. In general, the inputs to SCS are the node
level P(FA) but our requirements refer to the system level P(FA). No analytical relationship
between the input and system level P(FA) has been created yet because of the large number of
variable dependencies in the constraint model. The measurements at each node are not
independent because the measurements are propagated through the system. Analytical
measurements compared for consistency at one node may not be independent of the analytical
measurements compared at a second node because they may originate at the same sensor. These
dependencies are constraint model dependent. Empirical system level P(FA) as a function of
node level input P(FA) is plotted in Figure 20 for the vehicle constraint model, using 10,000
trials to create each data point. The same P(FA) input was applied to each node in the system,
but in practice nodes can be tuned individually if desired. The plot shows a strong linear fit
with a slope of approximately 3 for the range investigated. Due to the significant scale factor
between input and system P(FA), the input P(FA) must be set relatively low to establish
acceptable system wide P(FA). The relationship between system and input P(FA) is expected to
be highly system dependent so the simulations and analysis required to find this relationship
should be repeated for new constraint models.
Figure 20: System Level P(FA) as a Function of Input P(FA) (empirical data with linear fit y = 2.9x + 0.00015, r^2 = 0.9895)
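As a rough sanity check on the observed scale factor, the sketch below gives the system-level P(FA) that would follow if the node checks were independent. The number of effective independent checks in the example is hypothetical, and since the node measurements are in fact correlated, the empirical relationship in Figure 20 remains constraint-model specific.

    # First-cut sketch under an independence assumption (which does not strictly
    # hold here): if n node checks each fire falsely with probability p, the
    # system-level false alarm rate is bounded by 1 - (1 - p)**n, roughly n*p for small p.
    def system_pfa_independent(p_node, n_nodes):
        return 1.0 - (1.0 - p_node) ** n_nodes

    # Example with a hypothetical n: a 0.003 node-level P(FA) over 3 effective
    # independent checks gives roughly the 3x scale factor seen empirically.
    print(system_pfa_independent(0.003, 3))   # ~0.009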
4.4.2 Sensor Degradation
As described in Section 4.3, a constraint model with no temperature sensors was initially
considered. Testing demonstrated that observability into each thruster's state is required to
eliminate superstructures of several thrusters. Simulations with varying temperature sensor noise
levels illustrate the sensor quality required to isolate single thruster failures. With higher quality
sensor measurements, isolation to a single thruster is likely. As the temperature sensor degrades
in quality, isolation to a superstructure of several thrusters occurs with increasing probability.
Figure 21 shows the average extra number of objects isolated for thruster number 5 stuck off for
varying signal-to-noise ratios. The extra objects metric counts the number of objects reported by
the VHM algorithm as potentially faulty in addition to the correct object. The metric
includes both components and sensors as 'objects' in the constraint model. The signal level is
defined here as the difference between the expected temperature readings when the thruster is on
versus off (i.e., T_on - T_off) and the noise level is the standard deviation of the temperature sensor
measurements. One thousand sets of simulated onboard data were generated for each data point
and tested on the VHM code. The input P(FA) to each node was 0.003. A constraint model
identical to that in Figure 16, but without any temperature sensors, was created to determine the
extra components in the superstructure of the thruster investigated. It was discovered that at least
17 extra components and sensors would be isolated if no temperature sensor existed. The
average extra components metric approaches this value as the SNR decreases, indicating that a
degrading temperature sensor approaches the situation with no temperature sensor present. As
described in Section 4.3, there will always be one extra component isolated, because the control,
command and thruster form a superstructure. The experiment shows that a SNR of
approximately 8 is sufficient for reliable isolation to a single thruster.
Figure 21: Temperature Sensor Degradation Forms Superstructure (average extra objects isolated vs. temperature sensor signal-to-noise ratio, with the no-sensor case shown for reference)
4.4.3 Signal-to-Noise Sensitivity
The SNR in the system directly impacts VHM performance as shown in Section 3.3.2.
As the noise level approaches the signal or fault bias magnitude level, it becomes more difficult
to determine if a variation is due to a fault or to random noise. The signal-to-noise ratio was
investigated in two ways. First, the sensor hardware was varied, changing the measurement
noise level relative to a constant chosen fault bias. Second, the environmental disturbance forces
were varied, effectively varying the actuator thrust-to-dynamics uncertainty ratio. Accelerometer
channel faults were chosen to illustrate performance, because experimentation on the constraint
model with a variety of faults showed that accelerometer FDI is the most sensitive to the
variables investigated in these test cases. The figures in this section plot the experimental P(MD)
vs. P(FA), which shows the tradeoff in the design space with varying node level input P(FA). A
higher chosen input P(FA) for a node corresponds to points with lower P(MD) and higher P(FA)
on a given trade off curve. Varying the system signal-to-noise ratio shifts the curve relative to
the origin. For example, increasing the signal-to-noise will shift the curve towards the origin,
lowering both P(FA) and P(MD) and improving VHM performance. Using the FOM in Eq. 3.6,
the FOM-maximizing input P(FA) can be selected based on the empirical curves shown in this
section.
First, two IMU models were compared for a given fault mode. The IMU models were the
MIMU introduced in Section 4.2 and the Litton LN200 Inertial Measurement Unit, because they
are flight proven hardware [33]. Two IMU models are sufficient for checking the algorithm and
model. Specifications for the IMU models are shown in Table 3 [33]. The standard deviation of
the white noise on each accelerometer measurement was formed using the random walk and
assuming a 200 Hz sampling rate. This sampling rate is reasonable for a commercially available
IMU [37]. The conversion equation using dimensional analysis was
x \left[\frac{m}{s\sqrt{s}}\right] \cdot \sqrt{y \left[\frac{1}{s}\right]} = x\sqrt{y} \left[\frac{m}{s^2}\right]    (4.27)
where x is the random walk specification and y is the sample rate [38]. Using Eq. 4.27, the
estimated accelerometer noise standard deviation was calculated to be 0.00212 and 0.00693 m/s^2
for the MIMU and LN200, respectively. The inserted fault was a deterministic bias in the x
accelerometer channel. The bias was equal to 9 times the nominal accelerometer noise standard
deviation given in Table 2, because in preliminary testing the algorithm performance was found
to be most sensitive in this bias region. Figure 22 shows the simulation results where each data
point represents 1000 nominal and faulty system runs. Figure 23 shows the low P(FA) and
P(MD) region of Figure 22 so that the MIMU performance is visible. The lower signal-to-noise ratio of
the LN200 for the given fault bias results in the empirical curve shifted significantly away from
the origin as expected. This implies the MIMU was capable of robustly detecting smaller fault
biases than the LN200, which is expected because the MIMU has less noise in its measurements.
The algorithm ranged from 95.6 to 99.5% correct in its isolations for the MIMU and from 84.5 to
98.0% correct for the LN200. The ability to isolate simultaneous failures was disabled for all
tests in this section because the execution time required to perform a simultaneous isolation was
impractical and, since no simultaneous faults were inserted, any simultaneous isolation would have been incorrect anyway. If
the algorithm failed to isolate the failure with single component or sensor suspensions, it was
forced to stop execution and report an incorrect isolation rather than attempt every combination
of multiple components and/or sensors. Also, the algorithm could only isolate the fault to a
superstructure of approximately 8 components on average due to the low signal-to-noise ratio
tested. The signal-to-noise ratio used here was chosen because preliminary testing demonstrated
that the detection performance is sensitive in this region. In isolation, the probability of isolating
to a larger superstructure is high because of the low SNR. This is similar to the reasoning in
Section 4.4.2 , shown in Figure 21, where temperature sensor degradation results in a lower SNR
and more components are isolated in the superstructure on average. Additional testing showed
that a higher SNR causes movement along the curve toward a lower average number of
components isolated. Based on the requirements defined earlier in this section, SCS
demonstrates acceptable detection performance for this small fault bias with recorded P(MD) and
P(FA) lower than 1% and 3%, respectively.
IMU Model    Accelerometer Random Walk (m/s/√s)
MIMU         0.00015
LN200        0.00049

Table 3: IMU Hardware Specifications [33]
Figure 22: IMU Model Comparison for Accelerometer Fault (P(MD) vs. P(FA) for the MIMU and LN200)
Figure 23: Low P(FA) and P(MD) Region of Figure 22
In the second noise sensitivity test, the environmental disturbance force was varied to
investigate the effect of dynamics uncertainty on accelerometer FDI performance. The
stochastic environmental disturbance was simulated with zero mean and a varying standard
deviation. The deterministic accelerometer fault bias was equal to that in the IMU model
comparison. Figure 24 plots the simulation results with each data point again representing 1000
trials for both nominal and faulty systems. The ratio in the legend represents the thrust force (F)
divided by the standard deviation of the environmental disturbance (env), because this can be
considered the effective SNR in vehicle dynamics. Figure 25 shows the performance at low
P(MD) and P(FA) for F/env = 100. This is considered a conservative estimate of SNR because
in practice, aerodynamic drag in low earth orbit (LEO) for a vehicle with this cross sectional area
is less than 1 N [39] and the thrust level here is 216 N, resulting in an F/env ratio greater than 200.
SCS correctly isolated the accelerometer in 93.3 to 99.4% of the trials for all data points for this
conservative SNR estimate. Most data points achieved the acceptable 95% correct isolation
metric defined at the beginning of this section. Therefore, it is possible to select an input P(FA)
to meet our requirements. For the same reasons outlined above in the IMU test analysis, the
accelerometer again could only be isolated to a superstructure averaging 8 components in size.
SCS demonstrated acceptable detection performance according to the requirements stated at the
start of this section with P(MD) and P(FA) less than 2% and 5%, respectively, for certain input
P(FA).
Figure 24: Dynamics Uncertainty Comparison for Accelerometer Fault (P(MD) vs. P(FA) for F/env = 100 and F/env = 10)
Figure 25: Low P(FA) and P(MD) Region of Figure 24
4.4.4 System Level SCS and FT Comparison
This section expands on the node level demonstration in Section 3.3.2 by comparing SCS
and Fixed Threshold (FT) at the system level. The spacecraft constraint model presented in
Section 4.3 was slightly modified to include an extra set of accelerometers, and FT was tuned to
a specific accelerometer fault bias in the model. We demonstrated SCS system level
improvement over FT for this sample fault for both detection and isolation.
For this simulation, we developed a procedure for selecting the fixed thresholds. First,
the expected mean and variance of the analytical measurements at each node in the constraint
model were calculated for a nominal system with no faults. The constraint model was identical
to that in Figure 16, but a redundant accelerometer was added at each accelerometer node. The
redundant sensor resulted in 3 measurements at the accelerometer nodes, originating from the 2
sensors and a forward propagated value. The accelerometers were based on the MIMU and
LN200 models presented in Table 3. The second accelerometer was added to create a scenario
similar to that in Section 3.3.1, wherein three measurements have different uncertainties. Next,
we derived the impact of the selected fault on the analytical measurement means at each node.
Knowledge of the constraint functions and fault bias allows the designer to compute the expected
mean of the analytical measurements throughout the system. We inserted a fault bias in a MIMU
accelerometer channel because the higher uncertainty of the LN200 measurements would render
the fault more difficult to detect for a given fault bias to noise ratio. Based on known means,
variances, and fault biases at each node, the node level numerical optimization tool from Section
3.3.2 was used to find the FOM-maximizing threshold for each node. In order to simulate
realistic implementation, threshold tuning was performed at each node. This resulted in
extremely low performance for FT due to the presence of at least three measurements with
different levels of uncertainty at several nodes. Therefore, we employed additional knowledge
of the fault to boost FT performance using the following strategy. Most nodes were unaffected
by an accelerometer fault due to the inability to reverse propagate information through the
dynamics components. Unaffected nodes were given thresholds high enough to avoid false
alarms. Also, the thresholds at the branching nodes forward of the dynamics components were
set high even though they are impacted by the accelerometer fault bias. This is because detection
and isolation for this fault are possible using consistency checks only at the accelerometer nodes.
Figure 26 summarizes the FT threshold settings. Red circles highlight the
nodes that had thresholds tuned. Purple circled nodes indicate nodes that would see the effect of
the accelerometer fault bias but were set high to boost FT performance. All other nodes would
be unaffected by the example fault and had thresholds set high.
Figure 26: Constraint Model Highlighting FT Settings (threshold-tuned nodes circled in red; nodes affected by the fault but set high circled in purple)
SCS inputs were selected based on the performance in Figure 23. Input P(FA) was 0.007
for every node because this produced low P(MD) and P(FA) rates in the earlier analysis. No
optimization tool was used to select input P(FA) and no nodes were set high as with the FT
method. Therefore, FT was given a significant advantage over SCS by using knowledge of the
simulated fault mode.
Figure 27 shows the detection performance of each algorithm measured by the FOM used
in Chapter 3 (with ci and c2 equal to 1) for a range of accelerometer fault biases. FT was tuned
for a fault bias to noise ratio of 8. During flight, arbitrarily many fault biases are possible, and it
is impractical to tune a threshold for each, so the threshold was not changed with the independent
variable. Each data point represents 1000 trials for both nominal and fault scenarios. SCS
outperforms FT in detection for the SNR plotted, but the trend implies FT would asymptotically
approach or possibly surpass SCS performance at low fault biases. Further testing at lower fault
biases showed that this is due to the chosen input P(FA) for SCS. For the selected input P(FA),
FT can marginally outperform SCS in detection at low fault biases. SCS can easily be tuned
to have a higher FOM than FT in this domain by increasing the input P(FA) value, which would
also slightly decrease the SCS FOM at higher fault biases. However, at these low fault bias
levels, neither algorithm achieves acceptable VHM performance as defined at the top of this
section. Therefore, we consider both algorithms ineffective in FDI in this SNR range and the
range plotted here illustrates the relevant SCS improvement over FT. Inserting the acceptable
detection performance in P(MD) and P(FA) defined at the start of this section into the FOM here,
an acceptable FOM is 0.95. SCS achieves acceptable performance at a SNR of 8, while FT does
not achieve acceptable performance in this SNR range.
Figure 28 provides isolation performance measured by the percent of isolations that
contained the correct faulty component. In isolation, SCS consistently achieves approximately
95% correct while FT reaches 85% correct for this fault bias to noise range. SCS achieves
acceptable isolation performance for SNR greater than approximately 6. FT again does not
achieve acceptable performance in the SNR range considered.
As shown in Figure 27 and Figure 28, SCS performed better than FT in detecting the
fault and isolating the correct component despite numerically optimizing FT around the given
fault mode. The threshold selection process was highly dependent on the constraint models and
fault mode, and in reality many failure modes would need to be taken into account, rendering
threshold tuning less practical than the SCS input process. More complex constraint models may
also result in a greater number of nodes with three or more measurements to compare. This
would make any threshold selection at these nodes less favorable than SCS as shown in Section
3.3.
Figure 27: System Level Detection Performance Comparison for Accelerometer Bias (FOM vs. fault bias to noise ratio for SCS and FT)
Figure 28: System Level Isolation Performance Comparison for Accelerometer Bias (percent correct isolations vs. fault bias to noise ratio for SCS and FT)
4.4.5 Embedded Processor Testing
The simulations presented here were run using MATLAB on a Windows desktop
computer. During flight, the software will run on an embedded processor without the overhead
processes that a desktop computer runs. In order to facilitate testing of the algorithm in a more
realistic scenario, the VHM code was written in Embedded MATLAB (EML). EML is a subset
of the MATLAB language that supports select MATLAB features and facilitates efficient code
generation for deployment into embedded systems [40]. Appendix B provides more detail on the
implemented software architecture. Draper Laboratory engineers compiled and auto-coded the
EML into C code and successfully ran it on an embedded processor for a variety of constraint
models, including nonlinear and hierarchical systems. This demonstrated the code functionality
in an environment more closely resembling a flight system.
Future software development work will focus on reducing the memory overhead
requirements and lowering computational complexity. Once the SCS code is refined in terms of
memory and computation, additional testing on an embedded processor can provide accurate
metrics on memory usage and execution time. Testing on a variety of constraint models can lead
to a relationship between these metrics and model complexity.
Chapter 5: Summary and Future Work
This thesis presents a general vehicle health monitoring (VHM) method built on
constraint suspension. Key results of this thesis are summarized here. Future work and
challenges in implementing Stochastic Constraint Suspension (SCS) on a flight vehicle are also
outlined.
5.1 Summary of Results
Section 1.1 illustrated how autonomous VHM is a critical technology for future space
exploration missions through the use of specific mission examples and NASA priorities [1].
Section 1.2 argued that many VHM methods have been proposed and applied to various degrees,
but none have been able to detect and isolate a sufficient variety of faults while remaining simple
enough for near term deployment on a complex system. This conclusion is based on a survey of
literature on proposed and implemented VHM algorithms, including several forms of hardware
and analytical redundancy [6]. Fesq's analog constraint suspension [21] was shown to be a
promising approach to the VHM problem by linking quantitative and qualitative redundancy
algorithms. The fundamentals of constraint suspension, including its merits and limitations,
were detailed in Chapter 2 along with the parity space and hypothesis testing approaches.
SCS, presented in Chapter 3, strengthened analog constraint suspension by using parity
space and hypothesis testing rather than the Fixed Threshold (FT) algorithm for consistency
checking of analytically redundant measurements. This was enabled by propagating uncertainty
associated with each value through the constraint functions. Low level fault detection and
isolation (FDI) remains an efficient tool for comparing redundant hardware and can fit well into
the constraint suspension framework. The expected performance of SCS at the measurement
consistency checking level was derived for a given fault mode and algorithm input. Conceptual
and numerical examples then demonstrated the improvement SCS provides over FT, especially
when greater than two measurements with unequal uncertainties are being compared. Over the
measurement uncertainty range considered in Section 3.3.2, SCS maintained 97% of its
performance as measured by the defined figure of merit (FOM) in Eq. 3.6, while FT fell to 87%
of its original performance.
Chapter 4 provided SCS performance data for a simulated spacecraft GNC subsystem.
The subsystem application and associated simulation were described and the constraint model
representation of the subsystem was presented. Repeated simulations were used to record key
metrics of SCS performance for a variety of test cases. First, the empirical relationship between
system level P(FA) and node level P(FA) was investigated and found to be linear with a slope of
approximately 3 for the given constraint model. Next, degrading temperature sensor quality was
shown to increase the probability of isolating a fault to a large group of components and sensors
rather than to the individual thruster for a thruster fault. The presence of a degrading sensor
causes the constraint model to approach the constraint model without the sensor present. Third,
SCS performance measured by P(FA) and P(MD) was presented for different signal-to-noise
ratios. This included varying the IMU hardware and the uncertainty of vehicle dynamics in the
simulation. SCS achieved empirical P(FA) and P(MD) of less than 5% and 2% respectively for
conservative signal-to-noise estimates. Fourth, SCS was directly compared to FT at the system
level using the detection FOM presented in Chapter 3 and the percentage of correct isolations.
SCS consistently outperformed FT in detection and isolation using the defined metrics. For
isolation over the signal-to-noise range examined, SCS included the correct component in its
output list at least 95% of the time, while FT achieved less than 85%.
5.2 Challenges and Future Work
Future research will focus on several challenges in SCS. First, computational limitations
may become apparent with more complex constraint models. The SCS algorithm must be
characterized on an embedded processor for a better understanding of computation and memory
metrics. The current implementation is auto-codable into C as a first step towards this goal, but
significant work remains in lowering the computation and memory load. For example, new
suspension strategies during isolation or hierarchical model construction may mitigate the
computational challenge. Second, a set of design rules would be helpful for the creation of constraint models
for a given set of hardware. These rules would attempt to maximize diagnostic resolution,
mitigate system complexity, and ensure that the VHM design process is a repeatable
procedure rather than an art. Third, the ability to recognize superstructures in software would be
useful in practice and in analysis. Superstructures are groups of components and sensors at the
limit of diagnostic resolution [21]. A fault in a component or sensor within a superstructure can
only be isolated to the entire superstructure. During the design process, this would be valuable in
identifying observability limitations and could influence vehicle design based on VHM
requirements. In operation, the algorithm could potentially reduce the number of suspension
steps performed during isolation. For instance, if a component within a superstructure is found
to be potentially faulty, the entire superstructure could be added to the suspect list without
individually suspending each component. Additionally, following a possible constraint model
reconfiguration due to a faulty component, the new superstructures could be found
autonomously. Fourth, VHM over an extended number of time steps has not been explored in
depth here. Many constraint models, such as the one used in Chapter 4, require integration or
differentiation over time. An algorithm for updating integrated values periodically should be
included in SCS to avoid integration errors that build over time. Filtering may be needed when
differentiating values to mitigate noise. Additionally, a moving window has been implemented
in FDI in the past to improve VHM robustness [26]. For example, the algorithm may require
isolation to a component in 3 out of 5 consecutive time steps to reconfigure the hardware. This
approach may improve the P(FA) and P(MD) but will increase the time to recover from a failure
and may prevent FDI of transient events, so this trade-off must be made carefully.
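A minimal sketch of such an M-of-N persistence rule is shown below. The window length, the threshold, and the variable names are illustrative assumptions and are not part of the current SCS implementation.

    % Illustrative M-of-N persistence filter for isolation results (assumed
    % values; not part of the implemented SCS algorithm). isolation_hits is a
    % logical history of whether the same component appeared in the suspect
    % list at each time step.
    M = 3;                                           % required detections (assumed)
    N = 5;                                           % window length in time steps (assumed)
    isolation_hits = [false false true true true];   % example history

    window = isolation_hits(max(1, end-N+1):end);    % last N time steps
    confirm_fault = sum(window) >= M;                % true -> reconfigure hardware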
SCS must be validated on progressively higher fidelity models leading up to flight
vehicle deployment. High fidelity vehicle simulations exist to provide realistic sensor data to the
algorithm. Hardware-in-the-loop testing could then provide the next step in characterizing VHM
performance and feasibility. Ultimately, the algorithm must be flight-proven. However, flight
vehicles may not experience the wide array of faults required to validate SCS, so simulated
onboard faults may be used instead.
References
[1] "NASA Space Technology Roadmaps and Priorities," National Research Council,
Washington, D.C., 2012.
[2] Brian Welch, "Limits to Inhibit," Space News Roundup, vol. 24, no. 14, pp. 1-3, August
1985.
[3] "Report of the Presidential Commission on the Space Shuttle Challenger Accident,"
Washington, D.C., 1986.
[4] David K. Geller, "Orbital Rendezvous: When is Autonomy Required?," Journal of
Guidance, Control, and Dynamics, vol. 30, no. 4, pp. 974-981, July-August 2007.
[5] Lorraine M. Fesq. (2009, March) White Paper Report: Spacecraft Fault Management
Workshop Results. [Online].
http://discoverynewfrontiers.nasa.gov/lib/pdf/SpacecraftFaultManagementWorkshopResults.pdf
[6] Inseok Hwang, Sungwan Kim, Youdan Kim, and Chze Eng Seah, "A Survey of Fault
Detection, Isolation, and Reconfiguration Methods," IEEE Transactions on Control Systems
Technology, vol. 18, no. 3, pp. 636-653, May 2010.
[7] Douglas Zimpfer, "Flight Control Health Management," in System Health Management with
Aerospace Applications. John Wiley & Sons, 2011, ch. 30.
[8] Mukund Desai, "A Fault Detection and Isolation Methodology," in Decision and Control
Including the Symposium on Adaptive Processes, 1981 20th IEEE Conference on, vol. 20,
1981, pp. 1363-1369.
[9] James E. Potter and James C. Deckert, "Gyro and Accelerometer Failure Detection and
Identification in Redundant Sensor Systems," NASA Technical Report E-2686, 1972.
[10] J. Gertler, "Fault Detection and Isolation Using Parity Relations," Control Eng. Practice,
vol. 5, no. 5, pp. 653-661, 1997.
[11] R.K. Mehra and J. Peschon, "An Innovations Approach to Fault Detection and Diagnosis in
Dynamic Systems," Automatica, vol. 7, no. 5, pp. 637-640, September 1971.
[12] D.T. Magill, "Optimal Adaptive Estimation of Sampled Stochastic Processes," IEEE
Transactions on Automatic Control, vol. 10, no. 4, pp. 434-439, October 1965.
[13] Michele Basseville, Albert Benveniste, Maurice Goursat, and Laurent Mevel, "Subspace-Based
Algorithms for Structural Identification, Damage Detection, and Sensor Data Fusion,"
EURASIP Journal on Advances in Signal Processing, 2007.
[14] R.J. Patton, C.J. Lopez-Toribio, and F.J. Uppal, "Artificial Intelligence Approaches to Fault
Diagnosis," in Condition Monitoring: Machinery, ExternalStructures andHealth (RefNo.
1999/034), IEE Colloquium, 1999, pp. 5/1-5/18.
[15] H. A. Talebi and K. Khorasani, "An Intelligent Sensor and Actuator Fault Detection and
Isolation Scheme for Nonlinear Systems," in Decision and Control, Proceedings of the 46th
IEEE Conference on, New Orleans, LA, 2007, pp. 2620-2625.
[16] C. Angeli and A. Chatzinikolaou, "On-Line Fault Detection Techniques for Technical
Systems: A Survey," InternationalJournalof Computer Science & Applications, vol. 1, no.
79
1, pp. 12-30, 2004.
[17] Randall Davis, "Diagnostic Reasoning Based on Structure and Behavior," Artificial
Intelligence, vol. 24, no. 1-3, pp. 347-410, December 1984.
[18] Sandra C. Hayden, Adam J. Sweet, and Seth Shulman, "Lessons Learned in the Livingstone
2 on Earth Observing One Flight Experiment," in AIAA 1st Intelligent Systems Technical
Conference, Chicago, IL, 2004, pp. 1-11.
[19] Douglas Bernard et al., "Spacecraft Autonomy Flight Experience: The DS1 Remote Agent
Experiement," in AIAA Space Technology Conference and Exhibition, Albuquerque, NM,
1999.
[20] Marie-Odile Cordier et al., "Conflicts Versus Analytical Redundancy Relations: A
Comparative Analysis of the Model Based Diagnosis Approach From the Artificial
Intelligence and Automatic Control Perspectives," IEEE Transactions on Systems, Man, and
Cybernetics, Part B: Cybernetics, vol. 34, no. 5, pp. 2163-2177, October 2004.
[21] Lorraine M. Fesq, "Marple: An Autonomous Diagnostician for Isolating System Hardware
Failures," UCLA, Ph.D. Dissertation 1993.
[22] Ksenia O. Kolcio, Mark L. Hanson, Lorraine M. Fesq, and David J. Forrest, "Integrating
Autonomous Fault Management With Conventional Flight Software: A Case Study," in
Aerospace Conference, 1999. Proceedings. 1999 IEEE, vol. 1, 1999, pp. 307-314.
[23] K. Kolcio, M. Hanson, and L. Fesq, "Validation of Autonomous Fault Diagnostic Software,"
in Aerospace Conference, 1998 IEEE, vol. 4, Aspen, CO, 1998, pp. 251-264.
[24] Scott Gleason and Demoz Gebre-Egziabher, GNSS Applications and Methods. Artech
House, 2009.
[25] Mark A. Sturza, "Navigation System Integrity Monitoring Using Redundant
Measurements," Journal of the Insitute ofNavigation, vol. 35, no. 4, pp. 69-87, Winter
1988-89.
[26] Russell Sargent et al., "A Fault Management Strategy for Autonomous Rendezvous and
Capture with the ISS," in AIAA Info Tech at Aerospace 2011, St. Louis, MO, 2011.
[27] Milton Abramowitz and Irene A. Stegun, Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables. Courier Dover Publications, 1972, pp. 940-943.
[28] Johan de Kleer and Brian C. Williams, "Diagnosis With Behavioral Modes," in Proceedings
of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), 1989,
pp. 1324-1330.
[29] Daniel L. Dvorak, "Monitoring and Diagnosis of Continuous Dynamic Systems Using
Semiquantitative Simulation," Unversity of Texas at Austin, Ph.D. Dissertation 1992.
[30] David J. Goldstone, "Controlling Inequality Reasoning in a TMS-based Analog Diagnosis
System," in Proceedings of the AAAI-91 National Conference on Artificial Intelligence,
1991, pp. 215-517.
[31] Philip R. Bevington and D. Keith Robinson, Data Reduction and Error Analysis for the
Physical Sciences, 3rd ed. McGraw-Hill, 2003.
[32] (2012, March) Astrium. [Online]. http://cs.astrium.eads.net/sp/spacecraftpropulsion/bipropellant-thrusters/220n-atv-thrusters.html
[33] Stephen C. Paschall II, "Mars Entry Navigation Performance Analysis using Monte Carlo
Techniques," MIT, Master's Thesis 2004.
[34] Oliver Montenbruck, Miquel Garcia-Fernandez, and Jacob Williams, "Performance
Comparison of Semicodeless GPS Receivers for LEO Satellites," GPS Solut., vol. 10, no. 4,
pp. 249-261, November 2006.
[35] Doug Myhre, "Space Shuttle Main Engine Hot Gas Temperature Sensor," in International
Instrumentation Symposium, 28th, Las Vegas, NV, 1982, pp. 405-416.
[36] NASA Office of Safety and Mission Assurance. (2011, November) Human-Rating
Requirements for Space Systems (NPR 8705.2B). [Online].
http://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPR&c=8705&s=2B
[37] (2009, October) Gladiator Technologies, Inc. [Online].
http://www.gladiatortechnologies.com/DATASHEET/Legacy/LandMark30_IMUdatasheet_102309.pdf
[38] Walter Stockwell. Angle Random Walk. [Online].
http://www.xbow.com/pdf/AngleRandomWalkAppNote.pdf
[39] David A. Vallado and David Finkleman, "A Critical Assessment of Satellite Drag and
Atmospheric Density Modeling," in AAS/AIAA Astrodynamics Specialist Conference,
Honolulu, HI, 2008.
[40] MathWorks. (2012) Code Generation from MATLAB: User's Guide. [Online].
http://www.mathworks.com/help/pdf_doc/eml/eml_ug.pdf
Appendix A: Chi-Squared Distribution
Noncentrality Parameter Derivation
The chi-squared distribution noncentrality parameter is derived here for use in Stochastic
Constraint Suspension (SCS). The noncentrality parameter λ is necessary for the calculation of
the expected P(MD) using Eq. 3.3 for a given fault and input P(FA). This parameter describes
the means of the normal random variables used in the chi-squared distribution.
The noncentral chi-squared distribution can be written as the sum of squared random
normal variables,

    \chi^2 = \sum_{i=1}^{k} \frac{X_i^2}{\sigma_i^2}    (A.1)

where σ_i is the standard deviation of random normal variable X_i, and k is the number of degrees
of freedom. The probability density function for x can be written as a function of k and the
noncentrality parameter λ,

    f(x \mid k, \lambda) = \frac{1}{2} e^{-(x+\lambda)/2} \left( \frac{x}{\lambda} \right)^{k/4 - 1/2} I_{k/2-1}\!\left( \sqrt{\lambda x} \right)    (A.2)

where I_ν(y) is a modified Bessel function of the first kind given by

    I_{\nu}(y) = \left( \frac{y}{2} \right)^{\nu} \sum_{j=0}^{\infty} \frac{(y^2/4)^j}{j!\,\Gamma(\nu + j + 1)}    (A.3)

where Γ(n) is the gamma function, computed for integer n using

    \Gamma(n) = (n-1)!    (A.4)

The noncentrality parameter λ can be written as

    \lambda = \sum_{i=1}^{k} \frac{\mu_i^2}{\sigma_i^2}    (A.5)

where μ_i is the mean of random normal variable X_i [27].
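For reference, Eq. A.2 can be evaluated numerically in a few lines of MATLAB. The values of k, λ, and x below are arbitrary, and besseli is MATLAB's built-in modified Bessel function of the first kind; the cross-check against ncx2pdf assumes the Statistics Toolbox is available.

    % Numerical evaluation of the noncentral chi-squared pdf in Eq. A.2
    % (illustrative values only).
    k      = 3;          % degrees of freedom (example)
    lambda = 4.2;        % noncentrality parameter (example)
    x      = 6.0;        % evaluation point (example)

    f = 0.5 * exp(-(x + lambda)/2) * (x/lambda)^(k/4 - 1/2) ...
        * besseli(k/2 - 1, sqrt(lambda * x));

    % With the Statistics Toolbox, the same value can be cross-checked with
    % ncx2pdf(x, k, lambda).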
The decision variable D in Eq. 2.8 was written as

    D = \frac{f_1^2}{\sigma_1^2} + \cdots + \frac{f_m^2}{\sigma_m^2}    (A.6)

where f_i is the residual for measurement i. Assuming the residuals f_i follow random normal
distributions, D follows a chi-squared distribution. For a nominal system, the residuals will have
zero mean, reducing λ to zero and forming the central chi-squared distribution. However, a fault
observed as a bias in one of the measurements will cause a bias in at least one of the residuals,
causing a nonzero λ and forming a noncentral chi-squared distribution.
The noncentrality parameter can be computed analytically using the parity space procedure
outlined in Section 2.1.1. First, the computation is reduced by setting

    H = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T    (A.7)

where the measurement geometry matrix H is simplified for SCS because the analytically
redundant measurements all measure the same quantity. H_{inv} is computed by combining Eqs.
2.3, 2.4, and A.7 as

    H_{inv} = (H^T W H)^{-1} H^T W = K \begin{bmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{1}{\sigma_2^2} & \cdots & \dfrac{1}{\sigma_m^2} \end{bmatrix}    (A.8)

where

    K = \left( \sum_{i=1}^{m} \frac{1}{\sigma_i^2} \right)^{-1}    (A.9)

and H^T W H was expanded to

    H^T W H = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}
    \begin{bmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_m^2 \end{bmatrix}
    \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \sum_{i=1}^{m} \frac{1}{\sigma_i^2}    (A.10)

Next, the S matrix is computed using the derivation in Eq. 2.7,

    S = I_m - H \, H_{inv}    (A.11)

and is expressed as

    S = I_m - K \begin{bmatrix} 1/\sigma_1^2 & 1/\sigma_2^2 & \cdots & 1/\sigma_m^2 \\ 1/\sigma_1^2 & 1/\sigma_2^2 & \cdots & 1/\sigma_m^2 \\ \vdots & \vdots & & \vdots \\ 1/\sigma_1^2 & 1/\sigma_2^2 & \cdots & 1/\sigma_m^2 \end{bmatrix}    (A.12)

using substitution from Eq. A.8. S is directly used to compute the residual vector f as

    f = S z    (A.13)

which, written out for measurements z_1 through z_m, gives the residual elements

    f_i = z_i - K \sum_{j=1}^{m} \frac{z_j}{\sigma_j^2}    (A.14)
which can be expanded to

    f_{1,nom} = \left( 1 - \frac{K}{\sigma_1^2} \right) z_1 - \frac{K}{\sigma_2^2} z_2 - \cdots - \frac{K}{\sigma_m^2} z_m    (A.15)

    f_{2,nom} = -\frac{K}{\sigma_1^2} z_1 + \left( 1 - \frac{K}{\sigma_2^2} \right) z_2 - \cdots - \frac{K}{\sigma_m^2} z_m    (A.16)
where Eqs. A.15 and A.16 give explicit expressions for the measurement residuals of a nominal
system. Adding a fault bias b to a given measurement z_j results in the faulty residual expressions

    f_i = f_{i,nom} - \frac{K b}{\sigma_j^2}    (for nominal measurements i)    (A.17)

    f_j = f_{j,nom} + b \left( 1 - \frac{K}{\sigma_j^2} \right)    (for measurement j with fault bias b)    (A.18)

where Eqs. A.17 and A.18 separate the bias induced in each residual. The residual biases are
the nonzero means that can be substituted into Eq. A.5 to form the noncentrality parameter,
    \lambda = \frac{b^2 K^2}{\sigma_j^4} \sum_{\substack{i=1 \\ i \neq j}}^{m} \frac{1}{\sigma_i^2} + \frac{b^2}{\sigma_j^2} \left( 1 - \frac{K}{\sigma_j^2} \right)^2    (A.19)

and simplified to

    \lambda = \frac{b^2}{\sigma_j^2} \left[ S_{jj}^2 + \frac{K^2}{\sigma_j^2} \sum_{\substack{i=1 \\ i \neq j}}^{m} \frac{1}{\sigma_i^2} \right]    (A.20)

where

    K = \left( \sum_{i=1}^{m} \frac{1}{\sigma_i^2} \right)^{-1}    (A.21)

Eq. A.20 provides a general expression for λ that can be used to compute the expected P(MD)
for a given fault using Eq. 3.3. Using the example from Section 3.3.2, for m = 3 and a bias b on
measurement 2, the resulting λ is
    \lambda = \frac{b^2}{\sigma_2^2} \left[ S_{22}^2 + \frac{K^2}{\sigma_2^2} \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_3^2} \right) \right]    (A.22)

Eqs. A.20 and A.21 can be simplified for equal measurement uncertainties σ. The simplified
expression for λ is

    \lambda = \frac{b^2}{\sigma^2} \left[ \left( 1 - \frac{1}{m} \right)^2 + \frac{m-1}{m^2} \right] = \left( 1 - \frac{1}{m} \right) \frac{b^2}{\sigma^2} = S_{jj} \, \frac{b^2}{\sigma^2}    (A.23)

where S_{jj} is the diagonal entry of S corresponding to the measurement j with fault bias b and

    K = \frac{\sigma^2}{m}    (A.24)

The biased residuals are

    f_i = f_{i,nom} - \frac{b}{m}    (for nominal measurements i)    (A.25)

    f_j = f_{j,nom} + b \left( 1 - \frac{1}{m} \right)    (for measurement j with fault bias b)    (A.26)

Eq. A.23 matches the expression given by Sturza for parity space and hypothesis testing on
redundant hardware [25].
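As a worked numerical sketch of Eqs. A.8 through A.22, the fragment below computes K, S, the noncentrality parameter for a bias on measurement 2, and the corresponding expected P(MD) for a chosen node-level P(FA). The uncertainty and bias values are arbitrary, the degrees of freedom used for the threshold are an assumption standing in for the definition that accompanies Eq. 2.8, and chi2inv and ncx2cdf are assumed available from the Statistics Toolbox.

    % Worked example of the noncentrality derivation (illustrative values only).
    sigma = [0.8; 1.0; 1.2];           % measurement standard deviations (assumed)
    b     = 2.5;                       % fault bias on measurement 2 (assumed)
    j     = 2;                         % faulted measurement index
    m     = numel(sigma);

    w = 1 ./ sigma.^2;                 % weights 1/sigma_i^2
    K = 1 / sum(w);                    % Eqs. A.9 and A.21
    S = eye(m) - ones(m,1) * (K * w'); % Eqs. A.8, A.11, A.12

    % Noncentrality parameter, Eq. A.20
    lambda = (b^2 / sigma(j)^2) * (S(j,j)^2 + (K^2 / sigma(j)^2) * (sum(w) - w(j)));

    % Expected P(MD) for a chosen node-level P(FA), following Eq. 3.3
    PFA       = 0.01;                  % input false alarm probability (assumed)
    dof       = m - 1;                 % degrees of freedom (assumption; see Eq. 2.8)
    threshold = chi2inv(1 - PFA, dof); % detection threshold on D
    PMD       = ncx2cdf(threshold, dof, lambda);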
Appendix B: Software Architecture
The vehicle health monitoring (VHM) software used in this thesis was developed in
Embedded MATLAB (EML). EML is a subset of the MATLAB language that supports select
MATLAB features and facilitates efficient code generation for deployment into embedded
systems [40]. The software implements the algorithm pseudocode presented in
Figure 8. Additionally, the software incorporates logic for hierarchical models and execution
over multiple time steps. This appendix provides the algorithm interface and the organization
and purpose of the functions called in Stochastic Constraint Suspension (SCS).
The constraint model is initialized in the Model_iLoad, Model_NodesInit, and
Model_ComponentInit functions. These functions define the structures required throughout the
code. Model_iLoad defines key constants used by the algorithm, such as the maximum depth of
the constraint model hierarchy. Physical constants required for constraint functions, such as
vehicle mass, are also defined here. Model_NodesInit defines the structure containing
information on nodes and sensors. Sensor locations, connections between nodes, and the
structures required to contain forward and reverse propagated values and uncertainties are
defined here. Model_ComponentInit initializes a similar structure for components. The
connection matrix linking nodes and components is one example of information defined in this
function. The initialized structures are saved for input to the algorithm each time step.
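As a purely illustrative example, a node structure of the kind produced by Model_NodesInit might hold fields such as those below; the field names and sizes are hypothetical and are listed only to mirror the information described in this paragraph.

    % Hypothetical node structure (illustrative field names only, not the
    % actual Model_NodesInit output).
    num_nodes = 12;                            % example model size (assumed)
    nodes.sensor_nodes = [1 4 7];              % nodes with sensors attached
    nodes.adjacency    = zeros(num_nodes);     % connections between nodes
    nodes.fwd_val      = nan(num_nodes, 1);    % forward-propagated values
    nodes.fwd_var      = nan(num_nodes, 1);    % forward-propagated variances
    nodes.rev_val      = nan(num_nodes, 1);    % reverse-propagated values
    nodes.rev_var      = nan(num_nodes, 1);    % reverse-propagated variances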
HM_Shell provides the interface to the SCS algorithm code. Figure 29 provides the
interface details. Constraint model structures, sensor measurements, and consistency checking
inputs are input to HM_Shell. The consistency checking inputs for SCS are the probabilities of
false alarm (P(FA)) selected for hypothesis testing at each node, as described in Section 2.1.2. SCS
outputs the VHM solution, the forward and reverse propagated values and uncertainties (for use
in the next time step), updated physical parameters, and data used for debugging. The solution
structure contains the fault detection result. If a fault was detected, the solution also indicates if
isolation was successful. If isolation was successful, the solution structure contains a list of
potentially faulty components and sensors.
Input:
    iLoad: physical and model parameters
    comps: information about the component structure
    nodes: information about the node structure
    sensed: sensor values for the current time step
    old_sensed: sensor values from the previous time step
    alg_in: input P(FA) for each node
Output:
    sol_out: VHM solution
    old_val: values to be stored at each node for the next time step
    old_var: variances associated with each stored value
    new_iLoad: updated iLoad parameters for the next time step

Figure 29: SCS Interface
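Based on the interface in Figure 29, a single time step of the algorithm might be invoked as in the sketch below. The argument ordering, the structure field names, and the P(FA) value are assumptions drawn from the figure and the surrounding text, not the verified flight signature.

    % Hypothetical single-time-step call to the SCS interface (see Figure 29).
    % Assumes iLoad, comps, nodes, sensed, old_sensed, old_val, and old_var
    % were produced by the initialization functions and the previous time step.
    num_nodes = 12;                              % example model size (assumed)
    alg_in    = 0.01 * ones(num_nodes, 1);       % node-level P(FA) inputs (assumed value)

    [sol_out, old_val, old_var, new_iLoad] = HM_Shell(iLoad, comps, nodes, ...
        sensed, old_sensed, alg_in, old_val, old_var);

    if sol_out.fault_detected && sol_out.isolated   % field names are hypothetical
        disp(sol_out.suspect_list);                 % potentially faulty components/sensors
    end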
Figure 30 lists the remainder of the functions required for SCS hierarchically, and Figure
31 provides a visual representation of the hierarchy. From HM_Shell, HM_Main is called to
perform nominal VHM. HM_Main conducts the propagation step (HM_Propagate) and
consistency checking at each node (HM_CheckNodes). HM_Propagate makes use of the
forward and reverse constraint functions (Model_ForwardConstraints and
Model_ReverseConstraints) and the HM_SelectValue function. When multiple values are
available at a node for propagation, HM_SelectValue chooses the best value to propagate. The
best value is defined as the value with the lowest uncertainty, with preference given to sensor
measurements: if the available measurement uncertainties are within a predetermined tolerance
of one another, the sensor value is chosen. HM_CheckNodes makes use of the HM_chiLookUp
function. As described in Section 2.1.2, the chi-squared distribution is required for setting the
threshold in hypothesis testing. HM_chiLookUp stores a lookup table for the chi-squared
distribution and interpolates within the table to set the threshold for consistency checking.
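The selection rule described for HM_SelectValue can be sketched as below. This is a minimal stand-in with an assumed tolerance constant and an assumed flag marking which values come from sensors; it is not the actual function.

    function [val, var_out] = select_value(vals, vars, is_sensor)
    % Illustrative version of the HM_SelectValue rule: pick the value with the
    % lowest uncertainty, but prefer a sensor measurement when its uncertainty
    % is within a tolerance of the best available value. Tolerance is assumed.
    tol = 1.10;                               % assumed: within 10% of the best variance

    [best_var, best_idx] = min(vars);
    sensor_idx = find(is_sensor & vars <= tol * best_var, 1);

    if ~isempty(sensor_idx)
        best_idx = sensor_idx;                % prefer the sensor measurement
    end
    val     = vals(best_idx);
    var_out = vars(best_idx);
    end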
Model_iLoad
Model_NodesInit
Model_ComponentInit
HM_Shell
    HM_Main
        HM_Propagate
            HM_SelectValue
            Model_ForwardConstraints
            Model_ReverseConstraints
        HM_CheckNodes
            HM_chiLookUp
    HM_GenCandidates
    HM_Diagnose
        HM_Suspend
        HM_SuspendSens
        HM_Propagate
            HM_SelectValue
            Model_ForwardConstraints
            Model_ReverseConstraints
        HM_CheckNodes
            HM_chiLookUp
        HM_UnSuspend
        HM_UnSuspendSens
    Model_SwitchLevel
    HM_Propagate
        HM_SelectValue
        Model_ForwardConstraints
        Model_ReverseConstraints
    HM_Update
        HM_WLSE
    Model_Update_iLoad

Figure 30: Function Call Hierarchy List
If a fault is detected, HM_GenCandidates generates a list of candidate components and
sensors to consider in isolation. HM_Diagnose then performs the isolation process. For each
component and sensor in the candidate list, HM_Diagnose performs suspension, propagation,
and consistency checking using the same functions as those called by HM_Main. HM_Suspend and
HM_SuspendSens suspend components and sensors, respectively. HM_UnSuspend and
HM_UnSuspendSens reverse the suspension process.
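The isolation loop can be sketched as follows; the loop structure, the variable names, and the boolean consistency output are simplifying assumptions, and the real functions carry additional model state.

    % Simplified sketch of the HM_Diagnose isolation loop (structure and
    % variable names are assumptions, not the flight implementation).
    suspects = {};
    for c = 1:numel(candidates)
        model = HM_Suspend(model, candidates{c});     % suspend candidate's constraints
        model = HM_Propagate(model, sensed);          % re-propagate values and uncertainties
        consistent = HM_CheckNodes(model, alg_in);    % hypothesis test at each node
        if consistent
            suspects{end+1} = candidates{c};          %#ok<AGROW> candidate explains the fault
        end
        model = HM_UnSuspend(model, candidates{c});   % restore constraints for next candidate
    end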
Also within HM_Shell, Model_SwitchLevel switches to a new constraint model within a
hierarchical model if applicable. If a fault is isolated to a single component that contains a
sublevel constraint model, the algorithm reinitializes the structures representing the constraint
model and calls HM_Main on the new model. HM_Propagate is also called from HM_Shell
because it is required in the hierarchy level switching process.
HM_Update stores a value and associated uncertainty at each node for the next time step.
Previous time step values are sometimes required in constraint functions for integration and
differentiation over time. HM_Update calls HM_WLSE, which computes the weighted least
squares estimate at each node using the analytically redundant values available at the node (from
propagation and sensors). Model_Update_iLoad updates any physical parameters
that are known to change over time.
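For scalar redundant values at a node, the weighted least squares combination performed by HM_WLSE reduces to inverse-variance weighting, as in the illustrative stand-in below; the function name and signature are assumptions, not the flight code.

    function [x_hat, var_hat] = wlse_scalar(vals, vars)
    % Inverse-variance weighted least squares estimate of a scalar quantity
    % from redundant values (propagated and sensed) at a node. Illustrative only.
    w       = 1 ./ vars;               % weights 1/sigma_i^2
    x_hat   = sum(w .* vals) / sum(w); % combined estimate
    var_hat = 1 / sum(w);              % variance of the combined estimate
    end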
Figure 31: Function Call Hierarchy