Vehicle Health Monitoring Using Stochastic Constraint Suspension

by

Christopher Rossi
B.S.E. Aerospace Engineering, University of Michigan, 2010

Submitted to the Department of Aeronautics and Astronautics in Partial Fulfillment of the Requirements for the Degree of Master of Science in Aeronautics and Astronautics at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2012

© Christopher Rossi, 2012. All rights reserved.

The author hereby grants to MIT and Draper Laboratory permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.

Signature of Author: Department of Aeronautics and Astronautics, May 24, 2012

Certified by: Jeffrey A. Hoffman, Aeronautics and Astronautics, Thesis Supervisor

Certified by: Russell Sargent, Member of the Technical Staff, Draper Laboratory, Thesis Supervisor

Accepted by: Eytan H. Modiano, Professor of Aeronautics and Astronautics, Chair, Graduate Program Committee

Vehicle Health Monitoring Using Stochastic Constraint Suspension

by

Christopher Rossi

Submitted to the Department of Aeronautics and Astronautics on May 24, 2012, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Aeronautics and Astronautics

Abstract

Autonomous vehicle health monitoring (VHM) has been identified as a high priority technology for future space exploration in NASA's 2012 technology roadmap. Traditional VHM approaches are often designed for a specific application and are unable to detect and isolate a wide variety of faults. Proposed methods are often too computationally complex for NASA's manned flight software verification and validation (V&V) process. An innovative VHM algorithm is presented that addresses these weaknesses by integrating the constraint suspension technique with parity space and hypothesis testing. The approach relies on on-board sensor measurements, knowledge of control commands, and a modular mathematical system model to provide a VHM solution. Improvement over original constraint suspension is demonstrated using conceptual and numerical examples. Feasibility of the VHM method on a spacecraft is explored using a numerical simulation of a generic vehicle.

Thesis Supervisor: Jeffrey A. Hoffman
Title: Professor of the Practice of Aerospace Engineering

Thesis Supervisor: Russell Sargent
Title: Member of the Technical Staff, Draper Laboratory

Acknowledgements

This thesis would not have been possible without all of the individuals who provided guidance and support throughout my two years at MIT. First and foremost, I would like to thank my family and friends for their constant support and encouragement. Second, I am very grateful to my MIT and Draper advisors, Professor Hoffman, Paul Huxel, Russell Sargent, and Louis Breger, for donating an extensive amount of their time to guide my research. Many other Draper employees also deserve recognition for their instrumental work on TALARIS and my thesis research. Third, I'd like to acknowledge Bobby Cohanim and Phillip Cunio for their TALARIS leadership and wide-ranging guidance outside of the project.
Fourth, I feel very fortunate to have been surrounded by such talented peers, especially everyone on the TALARIS project, who I learned an incredible amount from. Finally, I'd like to thank Draper Laboratory and MIT for generously funding me through graduate school.

Table of Contents

Chapter 1: Introduction .......... 15
1.1 Motivation .......... 15
1.2 Literature Review .......... 16
1.3 Thesis Overview .......... 19
Chapter 2: Key Algorithms .......... 21
2.1 Parity Space and Hypothesis Testing .......... 21
2.1.1 Parity Space .......... 22
2.1.2 Hypothesis Testing in FDI .......... 23
2.2 Constraint Suspension .......... 25
2.2.1 Detection .......... 25
2.2.2 Isolation .......... 29
2.2.3 Merits and Limitations .......... 31
Chapter 3: Stochastic Constraint Suspension .......... 33
3.1 Algorithm Description .......... 33
3.1.1 Uncertainty Propagation .......... 35
3.1.2 Utilizing Hardware Redundancy .......... 35
3.2 Analytical Expected Performance .......... 36
3.3 Demonstration of Improvement .......... 37
3.3.1 Conceptual .......... 38
3.3.2 Numerical .......... 39
Chapter 4: Spacecraft Application .......... 45
4.1 Application Overview .......... 45
4.2 Vehicle Simulation .......... 47
4.3 Constraint Models .......... 52
4.4 Performance Analysis .......... 60
4.4.1 System Level P(FA) .......... 61
4.4.2 Sensor Degradation .......... 62
4.4.3 Signal-to-Noise Sensitivity .......... 63
4.4.4 System Level SCS and FT Comparison .......... 69
4.4.5 Embedded Processor Testing .......... 73
Chapter 5: Summary and Future Work .......... 75
5.1 Summary of Results .......... 75
5.2 Challenges and Future Work .......... 76
References .......... 79
Appendix A: Chi-Squared Distribution Noncentrality Parameter Derivation .......... 83
Appendix B: Software Architecture .......... 87

List of Figures

Figure 1: The Hypothesis Testing Tradeoff between P(FA) and P(MD) .......... 24
Figure 2: Example System Constraint Model .......... 26
Figure 3: Example Adder Component Constraint Model .......... 27
Figure 4: Detection Example for a Nominal System .......... 28
Figure 5: Detection Example for a Faulty System .......... 28
Figure 6: Pseudocode for Constraint Suspension .......... 30
Figure 7: Isolation Example for a Faulty System .......... 31
Figure 8: Stochastic Constraint Suspension Pseudocode .......... 34
Figure 9: Utilizing Redundant Sensor Hardware .......... 36
Figure 10: Representative Measurement Comparison Conceptual Example .......... 38
Figure 11: Numerical Simulation Results Comparing SCS and FT .......... 40
Figure 12: Zoomed in Numerical Simulation Results Comparing SCS and FT .......... 41
Figure 13: Numerical Simulation Results for Two Measurements Scenario .......... 42
Figure 14: Generalized Thruster Configuration and Coordinate System .......... 47
Figure 15: Spacecraft Simulation Architecture .......... 51
Figure 16: Constraint Model for Spacecraft Application .......... 53
Figure 17: Constraint Model without Temperature Sensors .......... 55
Figure 18: Constraint Model without Split Dynamics .......... 56
Figure 19: Component Inputs and Outputs with Connections .......... 57
Figure 20: System Level P(FA) as a Function of Input P(FA) .......... 62
Figure 21: Temperature Sensor Degradation Forms Superstructure .......... 63
Figure 22: IMU Model Comparison for Accelerometer Fault .......... 66
Figure 23: Low P(FA) and P(MD) Region of Figure 22 .......... 66
Figure 24: Dynamics Uncertainty Comparison for Accelerometer Fault .......... 68
Figure 25: Low P(FA) and P(MD) Region of Figure 24 .......... 68
Figure 26: Constraint Model Highlighting FT Settings .......... 70
Figure 27: System Level Detection Performance Comparison for Accelerometer Bias .......... 72
Figure 28: System Level Isolation Performance Comparison for Accelerometer Bias .......... 72
Figure 29: SCS Interface .......... 88
Figure 30: Function Call Hierarchy List .......... 89
Figure 31: Function Call Hierarchy .......... 90

List of Tables

Table 1: Generalized Hardware Locations and Thruster Directions .......... 46
Table 2: Model Parameters and Uncertainties .......... 50
Table 3: IMU Hardware Specifications .......... 65

Nomenclature

A = linearized dynamics matrix
D = decision scalar used in consistency checking
DOF = degrees of freedom
FDI = fault detection and isolation
FOM = figure of merit
FT = Fixed Threshold (an implementation of analog constraint suspension)
f = measurement residual vector
GNC = guidance, navigation, and control
GPS = Global Positioning System
H = measurement geometry matrix
IMU = inertial measurement unit
λ = noncentrality parameter for chi-squared distribution
M = number of measurements
N = number of states
P(FA) = probability of false alarm
P(MD) = probability of missed detection
Q = uncertainty in constraint function propagation
Σ = covariance matrix
σ = standard deviation
SCS = Stochastic Constraint Suspension
SNR = signal-to-noise ratio
T = threshold used in consistency checking
V&V = verification and validation
VHM = vehicle health monitoring
W = weighting matrix of measurement uncertainties for least squares estimation
x = state vector
z = measurement vector

Chapter 1: Introduction

The latest NASA technology roadmap [1] identifies autonomous vehicle health monitoring (VHM) as a "high priority" technology critical to achieving NASA's goals. Spacecraft autonomous VHM monitors on-board sensor measurements and provides a system-wide fault identification and isolation solution without ground or human support. No standard method exists for VHM, and many traditional approaches are limited by their complexity or the types of faults they are able to identify and isolate.
The objective of this research is to develop and demonstrate an improved VHM algorithm that is adequate for detecting and isolating a large class of faults, yet simple enough to fly on current manned vehicles.

1.1 Motivation

Countless mission failures have occurred in spaceflight for a variety of reasons. Two Space Shuttle missions are presented here to illustrate the importance of VHM in spacecraft operations. More general reasons are then given to motivate autonomous VHM.

During the ascent of STS 51-F in 1985, one Space Shuttle Main Engine (SSME) prematurely shut down due to the failure of two redundant temperature sensors [2]. Due to a defect, the sensors experienced a common cause failure and incorrectly measured temperatures outside the acceptable limits for the engine. The majority vote of two redundant sensors triggered the shutdown. Soon after, a temperature sensor failed in a second SSME, while a second sensor in the same SSME approached the redline value that would have triggered another shutdown. Fortunately, a mission controller correctly determined in a matter of seconds that the error was a sensor failure and the engines were performing nominally. STS 51-F performed an Abort to Orbit (ATO) using the remaining two engines, possibly saving the vehicle and crew. Over 25 years later, most traditional VHM techniques are still unable to recognize this kind of common cause sensor failure. An improved VHM system could potentially have isolated the original failures to the sensors, preventing any engine shutdowns, saving the nominal mission, and eliminating the need for a risky mission control decision.

The 1986 Challenger disaster provides the second example. Shortly after liftoff, a failure in a Space Shuttle Solid Rocket Booster (SRB) caused the disintegration of the vehicle and loss of crew. The accident report [3] states that no anomalies were apparent to the crew or ground control until live video feed showed the breakup. However, post-processing of downlinked onboard data showed discrepancies in SRB pressures and attitude up to 12 seconds before breakup. Assuming an escape system was available, a VHM system could potentially have isolated the failure and triggered an abort in time to save the crew. Planned future systems will have escape systems, so reliable VHM will be critical.

Response time requirements and increasing vehicle complexity motivate autonomous VHM capability. Currently, common practice after a non-time-critical failure indication is to place a spacecraft in 'safe' mode, downlink telemetry, and diagnose the fault using a team of experts. However, this approach is not feasible for many mission scenarios because of the response time needed and the lags due to communication availability, speed-of-light delay, and diagnosis time on the ground. For instance, Geller [4] presents rendezvous cases where autonomy is required for mission success. This reasoning can be extended to VHM during rendezvous, entry, landing, abort, and any other dynamic scenarios. Additionally, as spacecraft become increasingly complex, the failure modes increase exponentially [5], making it more difficult for humans to diagnose faults efficiently. The National Research Council, recognizing the importance of autonomous VHM, identified it in 2012 as a "high priority" technology for NASA to address in order to meet its exploration goals [1].
The report also points out the potential to apply the technology across missions and industries, including non-aerospace applications as well as "deep space exploration, robotic science missions, planetary landers and rovers."

1.2 Literature Review

Recent surveys on VHM algorithms in use or under development divide the approaches into three classes of redundancy [6][7]. First, there is hardware redundancy, where signals from multiple hardware components are compared. Second, quantitative analytical redundancy uses on-board mathematical system models to compare dissimilar measurements. Third, qualitative analytical redundancy applies logic similar to the quantitative form, but uses a discrete system model rather than an analog model. Analytical redundancy is also known as model-based diagnosis.

Hardware redundancy is commonly employed in aerospace because of its simplicity and reliability. Two sensors are sufficient for detecting a fault, but three are required for isolation of a fault to a single sensor. The comparison between sensors in flight software is often done using parity space [8] or limit value checking [9]. Parity space creates a residual vector whose magnitude and direction enable fault detection and isolation, respectively. Chapter 2 describes parity space techniques in more detail. Limit checking directly compares measurements to predetermined thresholds. Though these techniques can detect and isolate faults in a single system, hardware redundancy is unable to identify higher-level faults, such as common cause sensor failures. Also, this method requires redundant hardware, which results in increased mass, volume, and power requirements.

Quantitative analytical redundancy is applied in many forms in current vehicles. Fundamentally, a real-time mathematical model of the system is used to form connections between systems and compare dissimilar information. For example, a controller command to fire a certain thruster should result in a known acceleration measurement within a tolerance. Forms of quantitative analytical redundancy include full state observers, parity relations, Kalman filtering, and parameter estimation [6]. Full state observer methods use full state estimation with gain matrices tuned to be sensitive to specific fault modes. Parity relations generate residuals that exhibit predictable behavior in response to modeled faults while filtering system transients and noise [10]. Kalman filtering can also be used to identify faults by performing statistical tests on the whiteness, mean, and covariance of the residuals [11]. Some forms of Kalman filtering use a bank of filters, each sensitive to a specific fault mode [12]. Parameter estimation, or system identification, compares estimated physical constants to modeled parameters [13]. All of these approaches reduce the need for redundant hardware, but they can be more computationally expensive and are often sensitive to modeling errors. Most importantly, many require some level of failure mode modeling a priori.

Qualitative analytical redundancy is typically used in the artificial intelligence (AI) community. Neural networks, expert systems, and constraint suspension fall into this class. Neural networks use pattern recognition to identify faults in nonlinear functions, but this method cannot present its reasoning to a human operator, and directly incorporating expert knowledge is difficult [14][15]. Expert systems use heuristic knowledge to emulate human reasoning and perform VHM [16].
These systems are unable to detect gaps and inconsistencies in the knowledge base and cannot learn from their errors. Constraint suspension uses a discrete system model to check for inconsistencies in analytically redundant measurements without knowledge of failure modes [17]. Chapter 2 describes constraint suspension in more detail.

The most prominent uses of AI in spacecraft VHM were the technology demonstration missions Deep Space 1 (DS-1) and Earth Observing 1 (EO-1), which flew versions of the Livingstone AI VHM system [18][19]. Livingstone 2 on EO-1 successfully isolated simulated faults while running for as long as 55 days at a time and over 143 days total. However, traditional software verification and validation (V&V) testing approaches were not feasible for these algorithms due to their complexity, and the V&V approaches used may be insufficient for manned spacecraft V&V requirements. Therefore, AI approaches under development offer the ability to detect and isolate a variety of faults but are not ready for implementation in current vehicles.

Recent research has attempted to bridge the gap between quantitative and qualitative analytical redundancy [16][20]. Constraint suspension provides one starting point for integration of the two fields. Davis introduced constraint suspension as a simple way to diagnose discrete systems without modeling failure modes [17]. Constraint suspension was used effectively on circuits, but could not be used on analog systems in its original form. Fesq built on Davis's work with the Marple software package, extending constraint suspension for use on spacecraft by adding analog capability [21]. Marple has been tested on spacecraft flight data and run on a real-time processor with a high fidelity subsystem model [22][23]. Despite its many advantages, the algorithm is unable to robustly detect faults without significant tuning and has never been flight-tested.

Based on this survey of traditional VHM approaches, there does not exist a VHM method that is both simple enough for deployment on current manned flight systems and adequate for detecting and isolating a large class of faults. This thesis uses proven quantitative analytical methods to address a key weakness in the fault detection piece of Marple. The research goal is thus a VHM algorithm simple enough for implementation in current vehicles, but able to detect and isolate a large class of faults.

1.3 Thesis Overview

This thesis comprises five chapters. Chapter 2 provides background on the key algorithms used throughout this thesis. Constraint suspension, parity space, and hypothesis testing are provided as tools to be combined in the final VHM algorithm. Chapter 3 details how these tools are integrated into Stochastic Constraint Suspension (SCS) and derives the expected performance of the VHM approach. Conceptual and numerical examples are shown to illustrate the improvement over original constraint suspension. Chapter 4 applies SCS to a generic numerical spacecraft subsystem application. Simulated sensor measurements from the model are used to illustrate the algorithm's performance and investigate its sensitivity to inputs and modeling inaccuracies. Chapter 5 summarizes the research and presents challenges and recommendations for future research.

Chapter 2: Key Algorithms

This chapter presents the three key methods in VHM that will be integrated into Stochastic Constraint Suspension (SCS) in Chapter 3.
Parity space and hypothesis testing are proven methods for detecting and isolating faults in redundant hardware. Constraint suspension in its analog form is a powerful VHM approach that has the potential for detecting and isolating a large class of faults. This chapter details the theory behind the approaches, and Chapter 3 will describe how they are used in SCS.

2.1 Parity Space and Hypothesis Testing

Parity space and hypothesis testing have been used extensively in aerospace applications for fault detection and isolation (FDI) in redundant hardware. Together, they determine consistency between noisy measurements with known uncertainties using preset false alarm rates. FDI typically utilizes the residual generation portion of parity space and the threshold generation and comparison aspects of hypothesis testing.

Gleason and Gebre-Egziabher discuss the common use of parity space and hypothesis testing for global navigation satellite system (GNSS) integrity monitoring [24]. For instance, Global Positioning System (GPS) receivers monitor the integrity of the signals they receive from satellites using the redundant information available from extra satellites in view. Sturza presents the approach for use in monitoring accelerometers and gyroscopes in inertial measurement systems [25]. Strapdown inertial measurement systems are often placed in skewed configurations to maximize redundancy. Parity space uses the redundancy with known geometry to monitor sensor failures. Recently, Draper Laboratory implemented parity space and hypothesis testing techniques to monitor redundant sets of GPS, inertial measurement unit (IMU), and Light Detection And Ranging (LIDAR) navigation sensors on the Orbital Sciences Corporation Cygnus vehicle [26]. Cygnus is meant to provide cargo resupply to the International Space Station (ISS), a manned spacecraft and valuable national asset, meaning knowledge of failures is especially critical.

2.1.1 Parity Space

Historically, parity space techniques use information about the geometry and uncertainty of measurements to compute a decision scalar that represents the inconsistencies between sensors. The decision scalar is then compared against a threshold computed from hypothesis testing methods. More specifically, parity space transformations project the measurement vector into the space orthogonal to the measurement space. The result is a vector containing information about the measurement error, independent of the sensed state. The decision scalar is the weighted magnitude of this vector and is directly used to detect a fault. This thesis only utilizes the fault detection component of the parity space algorithm.

The decision scalar is computed as follows. The linearized measurements can be expressed as

z = H x    (2.1)

where H is the measurement geometry matrix and x is the system state vector. The analysis assumes the measurements contain Gaussian white noise. The weighted least squares estimate of the state, \hat{x}, is computed as

\hat{x} = H_{inv} z    (2.2)

where H_{inv} is the pseudoinverse of H and is defined as

H_{inv} = (H^T W H)^{-1} H^T W    (2.3)

where W is a weighting matrix containing the measurement uncertainties and correlations [24]. For independent measurements, W can be written as

W = \mathrm{diag}(1/\sigma_1^2, 1/\sigma_2^2, \ldots, 1/\sigma_n^2)    (2.4)

where n is the number of measurements available at a given time step and \sigma_i represents the standard deviation of the ith sensor's measurement error.
The residual vector f is the difference

f = z - \hat{z}    (2.5)

where \hat{z} is computed using

\hat{z} = H \hat{x} = H H_{inv} z    (2.6)

with substitution from Eq. 2.2. Substituting Eq. 2.6 and simplifying,

f = z - \hat{z} = z - H H_{inv} z = (I - H H_{inv}) z = S z    (2.7)

where I is the square identity matrix of appropriate dimension and S is a transformation matrix used in practice to compute f. The decision scalar D is computed as

D = f^T W f = \sum_{i=1}^{n} (f_i / \sigma_i)^2    (2.8)

where f_i is the ith element of the residual vector. The decision scalar D represents a weighted magnitude of the residual vector f. This decision scalar is the result of the parity space algorithm to be used in hypothesis testing.

2.1.2 Hypothesis Testing in FDI

Hypothesis testing techniques provide a threshold T to directly compare against the decision scalar D. The acceptance or rejection of the null hypothesis proceeds according to

H_0: D < T    (2.9)
H_1: D > T    (2.10)

where the null hypothesis H_0 states that the system is healthy and the alternate hypothesis H_1 states that there is a fault in the system [24]. As with all hypothesis testing techniques, two errors are possible. A rejection of the null hypothesis for a healthy system is a Type I error, called a false alarm (FA). The reverse scenario, the acceptance of the null hypothesis for a faulty system, is a Type II error referred to as a missed detection (MD). The setting of the threshold T dictates the tradeoff between the probability of false alarm P(FA) and the probability of missed detection P(MD).

In order to avoid excessive false alarms and missed detections in setting T, characterization of the distribution of the decision scalar D is crucial. As shown previously, D is the sum of squared random normal variables if the residuals are assumed to be normally distributed. The sum of squared random normal variables forms a chi-squared distribution [27]. When there is no fault, the residuals have zero mean and D follows a central chi-squared distribution. This analysis assumes faults are in the form of a measurement bias, as is typical in failure modeling. The algorithm is able to detect other forms of faults, such as a component failing on or off, but a fault bias is convenient for the hypothesis testing illustration. The measurement bias induces a bias into the residuals, meaning D follows a noncentral chi-squared distribution for a faulty system. Appendix A provides more insight into the form of chi-squared distributions.

Figure 1 illustrates the central and noncentral chi-squared distributions along with an example threshold. A decision scalar D that is greater than T for a nominal system (central chi-squared distribution) results in a false alarm. Therefore, the area under the central distribution curve to the right of T represents the P(FA). Conversely, the P(MD) is the area under the noncentral chi-squared distribution less than T and is also labeled. Figure 1 shows that the selection of T is a tradeoff between the P(MD) and P(FA) for a given system. Chi-squared distributions are defined by the degrees of freedom (DOF), and in FDI the DOF can be represented as the difference M - N, where M is the number of measurements and N is the number of state vector components. The noncentral chi-squared distribution is also defined by the noncentrality parameter λ, which will be discussed in more depth in Chapter 3. Additional redundant measurements increase the DOF and lower both the P(FA) and P(MD). Improving the signal-to-noise ratio (SNR) provides a similar effect.

[Figure 1: The Hypothesis Testing Tradeoff between P(FA) and P(MD) — probability density functions of the central χ²(M − N) and noncentral χ²(M − N, λ) distributions, with the threshold T separating the P(FA) and P(MD) regions.]
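For concreteness, the following is a minimal sketch of this consistency check in Python with NumPy and SciPy. It is not the thesis software: the measurement geometry, uncertainties, values, and P(FA) are hypothetical, and the threshold is computed with the formula derived below in Eq. 2.13.

import numpy as np
from scipy.stats import chi2

def parity_space_check(z, H, sigmas, p_fa):
    """Parity space decision scalar (Eqs. 2.1-2.8) tested against a
    chi-squared threshold (Eq. 2.13)."""
    W = np.diag(1.0 / np.asarray(sigmas, float) ** 2)   # Eq. 2.4
    H_inv = np.linalg.inv(H.T @ W @ H) @ H.T @ W        # Eq. 2.3
    S = np.eye(len(z)) - H @ H_inv                      # Eq. 2.7
    f = S @ z                                           # residual vector
    D = f @ W @ f                                       # Eq. 2.8
    dof = len(z) - H.shape[1]                           # M - N
    T = chi2.ppf(1.0 - p_fa, dof)                       # Eq. 2.13
    return D > T, D, T                                  # True means reject H0 (fault)

# Hypothetical example: three sensors measuring the same scalar state,
# with a bias on the third measurement.
H = np.ones((3, 1))
z = np.array([5.02, 4.97, 6.10])
fault, D, T = parity_space_check(z, H, sigmas=[0.05, 0.05, 0.05], p_fa=1e-3)

With these numbers the decision scalar falls far above the threshold, so the null hypothesis is rejected and a fault is declared.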
Explicit formulas for selecting T are presented in the literature for use with the parity space approach [24]. Probabilistic reasoning using Figure 1 leads to

P(FA) = \chi^2(D > T \mid M - N) = 1 - \chi^2(D < T \mid M - N)    (2.11)

where \chi^2(\cdot \mid M - N) denotes probability under the central chi-squared distribution with M - N degrees of freedom. Eq. 2.11 can be rearranged and expressed as

\chi^2(D < T \mid M - N) = 1 - P(FA) = \chi^2_{cdf}(T \mid M - N)    (2.12)

where \chi^2_{cdf} is the central chi-squared cumulative distribution function. Solving Eq. 2.12 for T,

T = \chi^{2,-1}_{cdf}(1 - P(FA) \mid M - N)    (2.13)

where P(FA) is the desired probability of false alarm, M - N is the DOF, and \chi^{2,-1}_{cdf} is the inverse of the central chi-squared cumulative distribution function. Due to the complexity of the chi-squared cumulative distribution function, a numeric lookup table is used for setting T in practice. The ability to set T according to a desired P(FA) is a significant advantage of hypothesis testing, because VHM performance is often challenging to predict.

2.2 Constraint Suspension

Constraint suspension forms the basis of the SCS algorithm presented in Chapter 3. Constraint suspension does not have flight heritage like parity space and hypothesis testing, but it is a powerful approach for detecting and isolating a wide variety of system level faults. The version of constraint suspension described and used here is taken from the Marple VHM software pioneered by Fesq [21]. Fesq extended constraint suspension to handle analog systems with the goal of applying the approach to spacecraft. The detection and isolation procedures are outlined here, followed by a discussion of the algorithm's merits and limitations.

2.2.1 Detection

The system to be diagnosed is first modeled in a modular mathematical form. The modules are called components and can represent specific hardware and subsystems, or abstract models such as dynamics. The components are connected by nodes that transmit information within the model. Forward and reverse constraint functions are defined for each component, mapping inputs to outputs and outputs to inputs respectively. Constraint functions are not restricted to be linear or continuous and may be based on analytical or empirical relationships. Sensors are present at nodes where observability exists.

At a given time step, sensor values are propagated forward and backward through the model using the constraint functions. The propagation stops when it reaches a node where a sensor is present. This procedure results in multiple values at every node, originating from forward propagation, reverse propagation, or sensors at that particular node. In a nominal system, the values will be consistent at each node. Inconsistency at any node triggers the isolation algorithm.

Figure 2 shows a generic system constraint model with example data paths. Sensed values at the boundaries are propagated through components using constraint functions, resulting in analytically redundant values at each node for comparison. Figure 3 illustrates example constraint function connections within an adder component and the resulting values to be checked for consistency. Each node is a function of the values at the other two nodes. The output node has three values to compare against each other, originating from forward propagation through the adder, a sensor at the node, and reverse propagation from nodes upstream.
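As a small illustration of how a component's forward and reverse constraint functions and the resulting node values might be represented in software, consider the following Python sketch of the adder in Figure 3. The class and the numeric values are hypothetical and are not part of the Marple or thesis implementations.

# Hypothetical representation of the Figure 3 adder component.
class AdderComponent:
    def forward(self, x, y):
        """Forward constraint: output from both inputs (z = x + y)."""
        return x + y

    def reverse_x(self, z, y):
        """Reverse constraint: first input from output and second input (x = z - y)."""
        return z - y

    def reverse_y(self, z, x):
        """Reverse constraint: second input from output and first input (y = z - x)."""
        return z - x

# Analytically redundant values collected at the adder's output node:
adder = AdderComponent()
values_at_output_node = {
    "sensor": 10.2,                         # direct measurement at the node
    "forward": adder.forward(5.1, 5.0),     # propagated from upstream sensed inputs
    "reverse": 10.1,                        # propagated back from elsewhere in the model
}

A consistency check then compares the entries collected at every such node; Chapter 3 replaces the simple range comparison used for this check with a weighted statistical test.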
The adder illustrates that propagation through reverse constraint functions is not always possible in practice, because knowledge of the second input value would be required and may not be available [21].

[Figure 2: Example System Constraint Model — components connected by nodes, with sensed values at the boundary nodes and propagated values carried along forward and reverse data paths to the interior nodes.]

[Figure 3: Example Adder Component Constraint Model — the forward constraint (x + y) and reverse constraints propagate values between the adder's nodes, and the resulting values at each node feed a consistency check.]

Figure 4 and Figure 5 provide basic examples to demonstrate constraint suspension fault detection. The system consists of one adder and two gains, and sensor noise is ignored for simplicity. Values at the central node are displayed after propagation from the sensors at the boundary nodes. The Figure 4 measurements agree because the system is performing nominally, while the measurements in Figure 5 are inconsistent due to a fault in the adder component. The location alone of the inconsistent node does not reveal the failed component, and an additional isolation procedure is needed.

In real systems, noise and model inaccuracies will be present, making perfect agreement at any node unlikely. A more sophisticated method for checking consistency is therefore required. This approach must distinguish inconsistency due to noise from inconsistency due to a fault. Fesq provides two ideas as starting points [21]. First, tolerances can be set for the difference between the minimum and maximum values at each node. This fixed threshold (FT) algorithm compares the range to a predetermined value at each node. The second approach uses percentage differences rather than absolute differences. These methods require significant tuning and have minimal quantitative basis. Chapter 3 provides a conceptual example that illustrates the weakness in the FT technique and offers an improved methodology.

[Figure 4: Detection Example for a Nominal System — sensed boundary values propagated to the central node agree.]

[Figure 5: Detection Example for a Faulty System — sensed boundary values propagated to the central node disagree because of a fault in the adder.]

2.2.2 Isolation

The propagation and consistency checking algorithms used in detection enable isolation of the faulty components or sensors. Isolation is performed by systematically suspending each component and again propagating the sensor values through the system to compare analytically redundant values. No information is propagated forward or backward through a suspended component, effectively removing all assumptions about its operation. Sensors may also be suspended by ignoring their measurements. If, after suspending a component or sensor, consistency is achieved at all nodes, then the suspended component is added to a list of potentially faulty components. However, if inconsistencies remain, then the component is exonerated and the fault has not yet been properly isolated. Simultaneous failures can also be isolated if all faulty components are suspended. However, suspending all combinations of possible faulty components can be very computationally expensive, so it can usually be assumed that two component failures do not occur within the same time step. The algorithm outputs a list of potentially faulty components, with sensor placement and component configuration determining the diagnostic resolution.
Without observability into certain nodes, a fault may only be isolated to a set of components rather than to a specific component. Diagnostic resolution in constraint suspension is discussed by Fesq [21], where component sets with no internal observability are called superstructures.

Figure 6 provides the basic pseudocode for constraint suspension. In order to simplify the pseudocode, this version does not include the logic needed to handle simultaneous faults or hierarchical systems. The constraint suspension implementation tested in Chapters 3 and 4 contains these capabilities.

Inputs: Constraint model (includes forward and reverse constraint functions and the appropriate connections between components), sensor measurements, fixed tolerance levels for each node
0. Place sensor data into model
1. Propagate sensor values forward and backward
2. Check nodes for consistency
3. If (fault detected)
   a) Generate list of possible faulty components and sensors
   b) For (each component and sensor on list)
      i. Suspend component/sensor
      ii. Propagate sensor values forward and backward
      iii. Check nodes for consistency
      iv. If nominal: add component/sensor to output list
      v. Unsuspend component/sensor
4. Return output list
Output: List of potentially faulty components and sensors

Figure 6: Pseudocode for Constraint Suspension

Figure 7 continues the simple example presented in Figure 5. Suspending the faulty adder component results in agreement at the central node because no information is propagated through it. It can be shown that suspending either of the two gain components would not remove the inconsistency. Removing the sensor at the output of the adder would also result in agreement in the model. This illustrates the diagnostic resolution limitation at the boundary of a system model.

[Figure 7: Isolation Example for a Faulty System — with the faulty adder suspended, the values propagated to the central node agree again.]

2.2.3 Merits and Limitations

Analog constraint suspension has several inherent advantages compared to many of the methods described in the literature review. First, it takes advantage of modeling and characterization work done during the design and test process. The VHM designer can organize these results into constraint models rather than build a new system model. VHM testing results could even be valuable in influencing the system design by demonstrating observability of critical faults. Second, the algorithm can be applied to a wide range of system models. Hierarchical models can be used to scale the algorithm to more complex systems. For instance, a top-level model may have components representing subsystems so that a fault could first be isolated to a subsystem. Lower-level constraint models within this component could then be used to further isolate the failure. The algorithm's flexibility also extends to its ability to handle discrete, continuous, or hybrid systems through the constraint functions. Third, the application system model of defined connections and constraint functions is an input to the algorithm, facilitating algorithm code reuse between applications. The reuse of VHM architectures between applications addresses a key need in the fault management community identified in a 2008 workshop [5]. Fourth, and most importantly, a large class of faults can be detected and isolated if they are observable, because no failure mode information is needed in the flight software (FSW).
The FSW only contains nominal system models, and any behavior detected as off-nominal is flagged as a fault. Single sensor, common cause sensor, actuator, multiple component, and subsystem level failures are all within the scope of constraint suspension. Common cause sensor failures refer to the failure of a line of redundant hardware components, as in the STS 51-F example presented in Chapter 1.

Computation time may be a limitation of the constraint suspension approach. The computation needed to detect and isolate a fault may be greater than that of traditional VHM methods such as parity space and limit checking. Kolcio proposed and successfully demonstrated the "chase data" implementation of constraint suspension to address this issue, where each isolation step uses a new time step of sensor data rather than the original time step where the fault was detected [22]. This assumes the fault is persistent across multiple time steps, meaning transient faults would be difficult to isolate. Today, faster processors and multi-core technology may enable new solutions to the computing challenge.

Constraint suspension is unable to determine the underlying cause of a given fault. The lack of failure mode information is an advantage in robust isolation to a component or sensor, but prevents the algorithm from specifying how the hardware failed. In a manned spacecraft application, redundancy is typically present in all critical systems, and the isolation of a fault to a specific piece of hardware is enough to reconfigure the system by swapping a redundant system for the faulty system. Failure mode information is not necessary in this scenario and may be deduced at a later time by additional analysis offline. If failure mode information is necessary during FDI, additional logic must be added to the constraint suspension algorithm. The Sherlock VHM system presented by de Kleer achieves this by combining constraint suspension with fault modeling [28]. Diagnosis of the underlying cause of a fault is beyond the scope of this thesis.

Chapter 3 will show that the FT consistency checking method outlined previously is a significant weakness of analog constraint suspension. The thresholds directly impact VHM performance in both detection and isolation, yet there is not a rigorous procedure for selecting them appropriately. Dvorak [29] and Goldstone [30] proposed alternative methods for consistency checking that use interval propagation followed by checking for overlap. Fesq chose tolerance checks at each node to allow for more granularity in tuning VHM performance [21]. Chapter 3 incorporates a proven, quantitative approach for consistency checking in constraint suspension using parity space and hypothesis testing. Conceptual and numerical examples explicitly show improvement over the FT implementation.

Chapter 3: Stochastic Constraint Suspension

Stochastic Constraint Suspension (SCS) utilizes parity space, hypothesis testing, and constraint suspension techniques in order to significantly improve fault detection and isolation performance. SCS explicitly accounts for measurement uncertainties and provides a generalized approach for consistency checking. This chapter details the SCS approach, including the uncertainty propagation algorithm and the efficient use of hardware redundancy. Analytical expected performance of SCS is derived, and conceptual and numerical examples are provided for a single node case.
The numerical and conceptual results demonstrate improvement in detection over FT constraint suspension and agree with analytical predictions.

3.1 Algorithm Description

Stochastic Constraint Suspension builds upon the architecture of constraint suspension as detailed in Chapter 2. Figure 8 gives the simplified pseudocode for SCS. The changes from the FT constraint suspension implementation in Figure 6 are in bold. Rather than fixed tolerance levels, the P(FA) at each node is an input to the algorithm based on the acceptable risk requirements. In addition to measurement values, associated uncertainties are propagated through the system constraint models. The SCS consistency check then uses parity space and hypothesis testing techniques to more robustly determine agreement at each node. Eqs. 2.8 and 2.13 are used to compute the decision scalar and the threshold, respectively. Eqs. 2.9 and 2.10 are then used to determine the presence of a fault. The rejection of the null hypothesis at any node indicates a fault and triggers the isolation algorithm.

Inputs: Constraint model (includes forward and reverse constraint functions and the appropriate connections between components), sensor measurements, P(FA) for each node
0. Place sensor data into model
1. Propagate sensor values and uncertainties forward and backward
2. Check nodes for consistency using parity space and hypothesis testing
3. If (fault detected)
   a) Generate list of possible faulty components and sensors
   b) For (each component and sensor on list)
      i. Suspend component/sensor
      ii. Propagate sensor values and uncertainties forward and backward
      iii. Check nodes for consistency using parity space and hypothesis testing
      iv. If nominal: add component/sensor to output list
      v. Unsuspend component/sensor
4. Return output list
Output: List of potentially faulty components and sensors

Figure 8: Stochastic Constraint Suspension Pseudocode

The algorithm dynamically calculates the threshold value at each node based on both the number of redundant analytical measurements available at the node and the P(FA) input parameter. As shown in Eq. 2.13, the number of analytical measurements determines the DOF of the chi-squared distribution. As the number of DOF increases, the chi-squared distributions in Figure 1 are shifted, reducing the missed detection and false alarm areas. Therefore, additional measurements always improve the performance of the algorithm, as would be expected for an estimator. As the input P(FA) increases, the threshold decreases in magnitude, reducing the P(MD) at the expense of P(FA). The fault bias affects the noncentrality parameter λ. A larger fault bias skews the mean of the noncentral distribution to the right, reducing the P(MD) for a given T. Appendix A explicitly derives the relationship between the fault bias and λ.

3.1.1 Uncertainty Propagation

As discussed above, parity space and hypothesis testing require an estimate of the uncertainties associated with each analytical measurement. Though the uncertainties of the sensor measurements are assumed to be well characterized from vendor specifications and/or testing, the analytical measurement uncertainties are not necessarily known a priori. The calculation of the decision scalar using Eq. 2.8 requires uncertainty estimates at each node. Without accurate uncertainty estimates for the analytical measurements, parity space computations are not feasible. Knowledge of measurement uncertainty is critical in improving VHM performance using SCS.
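To show how the steps of Figure 8 might fit together in software, the following Python sketch uses a hypothetical constraint-model interface (insert, propagate, suspend, estimates, and so on) that is not part of the thesis implementation. The node-level check is the parity space and hypothesis test of Chapter 2 applied to the redundant estimates gathered at a single scalar node, with the propagated uncertainties discussed in this subsection supplying the weights.

import numpy as np
from scipy.stats import chi2

def node_is_consistent(values, sigmas, p_fa):
    """SCS node check: parity space decision scalar against a chi-squared
    threshold, for M redundant estimates of one scalar node value (N = 1)."""
    values = np.asarray(values, float)
    sigmas = np.asarray(sigmas, float)
    W = np.diag(1.0 / sigmas**2)
    H = np.ones((len(values), 1))                     # every entry estimates the same quantity
    H_inv = np.linalg.inv(H.T @ W @ H) @ H.T @ W
    f = (np.eye(len(values)) - H @ H_inv) @ values    # residual vector, Eq. 2.7
    D = f @ W @ f                                     # Eq. 2.8
    T = chi2.ppf(1.0 - p_fa, len(values) - 1)         # Eq. 2.13 with DOF = M - 1
    return D <= T

def stochastic_constraint_suspension(model, measurements, p_fa):
    """Sketch of the Figure 8 loop: detect, then isolate by suspension."""
    model.insert(measurements)                                    # step 0
    model.propagate()                                             # step 1: values and variances
    if all(node_is_consistent(*node.estimates(), p_fa=p_fa)       # step 2
           for node in model.nodes):
        return []                                                 # no fault detected
    suspects = []
    for item in model.components_and_sensors():                   # step 3
        model.suspend(item)                                       # step 3.b.i
        model.insert(measurements)
        model.propagate()                                         # step 3.b.ii
        if all(node_is_consistent(*node.estimates(), p_fa=p_fa)   # step 3.b.iii
               for node in model.nodes):
            suspects.append(item)                                 # step 3.b.iv
        model.unsuspend(item)                                     # step 3.b.v
    return suspects                                               # step 4

Here each node's estimates() is assumed to return the redundant values and their one-sigma uncertainties; a flight implementation would also incorporate the hardware-redundancy screening of Section 3.1.2 and the hierarchical and simultaneous-fault logic omitted from the pseudocode.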
Measurement uncertainty includes sensor noise, misalignment, disturbances, and any variables other than a fault that may impact measurements of the state. In order to solve this problem, the sensor uncertainties are propagated with the analog measurement values through the constraint model components. Several algorithms exist for propagating uncertainty through a system with known dynamics. The method chosen here uses the linearized constraint functions. The uncertainty is propagated using

\Sigma_f = A_{\hat{x}} \Sigma_i A_{\hat{x}}^T + Q    (3.1)

where A_{\hat{x}} is the Jacobian of the constraint function evaluated at the current estimate \hat{x}, \Sigma_i is the initial covariance matrix, \Sigma_f is the final covariance matrix, and Q represents uncertainty in the constraint function [31]. Using Equation 3.1, normal probability distributions for each measurement, represented by a mean and variance, can be propagated through component constraints in the model. Though better methods exist for propagating uncertainty in nonlinear systems, this method was chosen for its simplicity and low computation cost. More sophisticated methods can be used if higher fidelity knowledge of uncertainty is required.

3.1.2 Utilizing Hardware Redundancy

The constraint suspension framework is not intended to eliminate the use of traditional sensor-level FDI, such as parity space and hypothesis testing, for consistency checks of redundant hardware. Manned spacecraft will continue to have significant hardware redundancy, and the direct use of sensor-level FDI on redundant hardware is a proven, computationally efficient method. Therefore, single sensor failures will be isolated using traditional methods where possible. Higher-level FDI logic such as that used in SCS requires significantly more computation than sensor-level FDI. An advantage of SCS is its ability to isolate a wide variety of faults, such as actuator and common cause sensor failures.

[Figure 9: Utilizing Redundant Sensor Hardware — redundant sensors (Sensor 1, Sensor 2, Sensor 3) feed a low-level parity space and hypothesis testing check whose validated output is passed to Stochastic Constraint Suspension.]

Figure 9 illustrates how the low-level parity space and hypothesis testing on redundant hardware fits into the SCS framework. Low-level sensor checks save computation by isolating single sensor failures before the SCS algorithm is initiated. Sensors placed at a node represent all redundant sensors with direct observability of that node. Parity space and hypothesis testing check the consistency of the measurements before they are placed at a node for use in SCS. Any faulty measurements will be removed from the system before the constraint suspension process is initiated, and SCS will see the redundant sensors as a single sensor. If a common cause sensor failure occurs, the constraint suspension process can isolate the fault to the sensor group.

3.2 Analytical Expected Performance

The ability to predict consistency checking performance in terms of P(FA) and P(MD) is a significant advantage of parity space and hypothesis testing over the FT implementation. All VHM methods must make tradeoffs between P(FA) and P(MD). In SCS, P(FA) is an input to the algorithm, which then fixes P(MD) for a given system. The expected P(MD) as a function of P(FA) and the fault bias at a given node is derived here. As in Chapter 2, the fault is modeled as a bias in one of the measurements.
Using Figure 1, the P(MD) for fault mode i is

P(MD_i) = P(D < T \mid M - N, \lambda_i) = \chi^2_{NC,cdf}(T \mid M - N, \lambda_i)    (3.2)

where \chi^2_{NC,cdf} is the noncentral chi-squared cumulative distribution function and \lambda_i is the noncentrality parameter for fault mode i. An expression for \lambda_i is derived in Appendix A and given in Eq. A.20. \lambda_i is a function of the fault bias and measurement uncertainties. A larger fault bias results in a larger \lambda_i, causing the noncentral distribution to be skewed more to the right. As expected, the larger fault bias results in easier detection and thus better performance through a lower P(MD) for any given P(FA).

P(MD_i) can be expressed as a function of the P(FA) input with

P(MD_i) = \chi^2_{NC,cdf}\big(\chi^{2,-1}_{cdf}(1 - P(FA) \mid M - N) \mid M - N, \lambda_i\big)    (3.3)

where Eq. 2.13 was substituted for T. \chi^2_{NC,cdf} and \chi^2_{cdf} represent the noncentral and central chi-squared cumulative distribution functions, respectively. Section 3.3 uses this result to compare simulation results to predicted performance.

The overall P(MD) at the node is computed by summing the weighted probabilities of each fault mode using

P(MD) = \sum_i P(Fault_i) \, P(MD_i)    (3.4)

where P(Fault_i) is the probability of occurrence for fault mode i. If all faults are assumed equally likely, Eq. 3.4 can be simplified to

P(MD) = \frac{1}{m} \sum_{i=1}^{m} P(MD_i)    (3.5)

where m is the number of fault modes. The equations derived here are valid for P(MD) at a given node. The system level performance measured by P(FA) and P(MD) will depend on the system model and sensor placement.

3.3 Demonstration of Improvement

This section demonstrates the value of the SCS implementation as compared to the FT approach for consistency checking at nodes. A representative conceptual detection scenario is presented and then implemented numerically. The improvement in detection is expected to result in better isolation performance.

3.3.1 Conceptual

The following conceptual example illustrates the value of parity space and hypothesis testing for node consistency checks. This representative scenario has three measurements to compare at a node. Since the purpose of this example is to compare arbitrary analytical measurements, the origin of the values is not important here. The measurements may result from forward propagation, reverse propagation, or direct sensing at the node. Figure 10 shows example origins and distributions for the three measurements. Two measurements have relatively low uncertainty (x1 and x2), while the third has a relatively high uncertainty (x3). The distributions of x1 and x2 are offset, indicating the possible presence of a fault in the system.

[Figure 10: Representative Measurement Comparison Conceptual Example — probability distributions of three values at one node: x1 (from a sensor) and x2 (from forward propagation) are narrow but offset from each other, while x3 (from reverse propagation) is much wider.]

In the FT implementation of constraint suspension, the difference between the maximum and minimum values is computed and compared to a predetermined threshold T [21]. The presence of x3 with relatively high uncertainty forces the choice of a large T to account for the possible variation and to avoid excessive false alarms. However, a large T prevents the algorithm from recognizing the significant difference in the higher quality measurements x1 and x2, increasing the missed detection rate. Despite having two relatively high quality measurements, the VHM algorithm is forced to make a suboptimal tradeoff between the P(MD) and P(FA). The extra information provided by x3 is decreasing VHM performance, when additional measurements should only improve an estimator's performance.
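To put rough numbers on this scenario, the following Python sketch compares the two checks at a single node. The values, uncertainties, tolerance, and P(FA) are made up for illustration and are not taken from the thesis simulations.

import numpy as np
from scipy.stats import chi2

# Hypothetical node values for the Figure 10 scenario: x1 and x2 are precise but
# disagree by 2.0 (a possible bias); x3 is consistent with either, but very noisy.
values = np.array([5.0, 7.0, 9.0])      # x1 (sensor), x2 (forward), x3 (reverse)
sigmas = np.array([0.3, 0.3, 3.0])

# FT check: the single range tolerance must accommodate x3's spread (~3*sigma3),
# so the disagreement between the trustworthy x1 and x2 goes unnoticed.
ft_tolerance = 9.0
ft_detects = (values.max() - values.min()) > ft_tolerance    # False: missed detection

# SCS check: weighted parity space decision scalar against a chi-squared threshold.
W = np.diag(1.0 / sigmas**2)
H = np.ones((3, 1))
H_inv = np.linalg.inv(H.T @ W @ H) @ H.T @ W
f = (np.eye(3) - H @ H_inv) @ values
D = f @ W @ f                                 # about 23 for these numbers
T = chi2.ppf(1.0 - 1e-3, df=2)                # about 13.8 for P(FA) = 0.001, DOF = 2
scs_detects = D > T                           # True: fault detected

With these numbers the FT range (4.0) stays well under its tolerance while the weighted decision scalar exceeds its threshold, matching the qualitative argument above.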
The weakness of the FT approach is its inability to use knowledge of measurement uncertainty. The FT method is sufficient if measurements have equal uncertainties, but this is rarely true in constraint suspension consistency checking. This is because the analytically redundant measurements originate from different sensors and some are propagated through constraint functions. A threshold for each combination of two measurements is a feasible solution but would be difficult to implement in practice. Parity space and hypothesis testing address these issues by explicitly accounting for measurement uncertainties. They also ease the burden on the system designer by removing the need to tune T at every node. P(FA) is an input to each node, making the VHM performance directly tunable.

3.3.2 Numerical

The simulation presented here was created to quantitatively demonstrate the conceptual scenario in Section 3.3.1. The FT and SCS consistency checking algorithms were implemented at a single node. Three measurements, x1, x2, and x3, were present at the node, with a deterministic bias added to x2 to simulate a fault. The uncertainty of measurement x3 was varied to test VHM performance as the measurement uncertainties increasingly differed.

Performance was defined relative to the P(FA) and P(MD). A single standard performance metric does not exist for VHM algorithms, so a new one is presented here [5]. A figure of merit (FOM) was developed so that it yields values between 0 and 1, where higher values indicate better performance. The FOM is expressed as

FOM = 1 - \frac{c_1 P(FA) + c_2 P(MD)}{c_1 + c_2}    (3.6)

where c_1 and c_2 are constants allowing the designer to tune the relative importance of false alarms and missed detections, as the requirements will vary between applications. For this simulation, c_1 and c_2 are set to 1.

First, the inputs to the FT and SCS algorithms, T and P(FA) respectively, must be set appropriately. Selecting the thresholds for the FT method is not straightforward due to the difficulty in mapping T to P(FA) and P(MD). In order to fairly assess the performance of the FT implementation, the algorithm was numerically optimized at each data point. A range of thresholds was input for each set of data, and the threshold resulting in the highest FOM was chosen. Therefore, the data presented for FT represents the best possible performance for the given data and the chosen FOM. The P(MD) is known for a given failure mode and input P(FA) in parity space and hypothesis testing. Therefore, the optimal input P(FA) can be calculated for a given FOM, measurement DOF, measurement uncertainties, and expected fault bias using Eq. 3.3. A closed-form analytical solution is cumbersome due to the form of the chi-squared cumulative distribution functions, but a lookup table was computed a priori to provide the P(FA) input expected to maximize the FOM. This lookup table was used to select the input P(FA) for each SCS data point. Simulations comparing performance with numerically optimized P(FA) input and with P(FA) input based on the optimal lookup table showed nearly indistinguishable performance, rendering numerical optimization unnecessary for SCS.

[Figure 11: Numerical Simulation Results Comparing SCS and FT — FOM versus the uncertainty ratio σ3/σ0 for the FT, SCS, and SCS expected performance curves.]

[Figure 12: Zoomed in Numerical Simulation Results Comparing SCS and FT — the high-FOM region of Figure 11.]

Figure 11 presents the simulation results, and Figure 12 shows a zoomed-in area from Figure 11.
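Each data point was generated by the Monte Carlo procedure described in the next paragraph. A minimal sketch of one way to implement that procedure for the SCS check is shown below; the bias, uncertainties, P(FA), and ratio values are illustrative, and the FT threshold optimization performed in the thesis simulation is omitted.

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def decision_scalar(values, sigmas):
    """Parity space decision scalar (Eq. 2.8) for redundant scalar estimates."""
    W = np.diag(1.0 / sigmas**2)
    H = np.ones((len(values), 1))
    H_inv = np.linalg.inv(H.T @ W @ H) @ H.T @ W
    f = (np.eye(len(values)) - H @ H_inv) @ values
    return f @ W @ f

def estimate_performance(sigmas, bias, p_fa, trials=1000, c1=1.0, c2=1.0):
    """Monte Carlo estimate of P(FA), P(MD), and the FOM of Eq. 3.6 for the SCS
    node check, with the fault modeled as a deterministic bias on x2."""
    sigmas = np.asarray(sigmas, float)
    T = chi2.ppf(1.0 - p_fa, len(sigmas) - 1)
    false_alarms = missed_detections = 0
    for _ in range(trials):
        nominal = 5.0 + rng.normal(0.0, sigmas)       # arbitrary true node value of 5.0
        faulty = nominal.copy()
        faulty[1] += bias                             # inject the bias into x2
        false_alarms += decision_scalar(nominal, sigmas) > T
        missed_detections += decision_scalar(faulty, sigmas) <= T
    p_fa_hat = false_alarms / trials
    p_md_hat = missed_detections / trials
    fom = 1.0 - (c1 * p_fa_hat + c2 * p_md_hat) / (c1 + c2)
    return fom, p_fa_hat, p_md_hat

# Sweep the sigma3/sigma0 ratio as in Figures 11 and 12 (illustrative values).
for ratio in (1.0, 5.0, 10.0, 15.0):
    fom, pfa, pmd = estimate_performance(sigmas=[1.0, 1.0, ratio], bias=5.0, p_fa=1e-2)
    print(f"sigma3/sigma0 = {ratio:4.1f}: FOM = {fom:.3f}")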
Figure 11 presents the simulation results and Figure 12 shows a zoomed-in area from Figure 11.

Figure 11: Numerical Simulation Results Comparing SCS and FT

Figure 12: Zoomed-in Numerical Simulation Results Comparing SCS and FT

1000 sets of measurements for both the faulty and nominal scenarios were generated for each data point. These measurements were fed to the FT and SCS implementations of consistency checking to numerically determine P(FA) and P(MD) for each method. Between data points, the ratio \sigma_3/\sigma_0 was varied, where \sigma_3 is the uncertainty of x3 and \sigma_0 is the uncertainty of measurements x1 and x2. This effectively increased the difference between uncertainties among the measurements considered. At low ratios, the measurements had very similar uncertainties and the FT method could be optimized to work as well as the SCS implementation. As the ratio grew, FT performance declined rapidly while SCS maintained a relatively high FOM. SCS maintained 97% of its original performance after a one order of magnitude increase in the ratio, while FT degraded to 87% in the same interval. This ratio increase is the equivalent of a sensor performance degradation leading to one order of magnitude more noise in the signal. The decrease in the SCS FOM is due to a lower overall signal-to-noise ratio (SNR) in the system at higher ratios. The decrease in the FT FOM is primarily due to the logic presented in Section 3.3.1. An expected performance curve derived from Eq. 3.3 and knowledge of the only fault mode validates the simulation results.

Further validation of the logic presented in Section 3.3.1 is possible through a simulation with only two measurements available at a node. Due to the presence of only one possible combination of measurements for comparison, a single threshold is sufficient and the FT method can be optimized to perform as well as SCS for any noise ratio. As with the three measurement analysis presented earlier, 1000 data sets were generated for both the faulty and nominal scenarios, with fault insertion done by adding a deterministic bias to x2. The ratio \sigma_2/\sigma_0 was varied, where \sigma_2 is the uncertainty of x2 and \sigma_0 is the uncertainty of measurement x1. The threshold for the FT method was again numerically optimized, while the P(FA) input for SCS was found using a lookup table tuned for one less DOF.

Figure 13: Numerical Simulation Results for Two Measurements Scenario

Figure 13 shows the simulation results for the two-measurement scenario. As predicted, the performance of SCS and FT are indistinguishable, with performance degradation occurring due to decreasing SNR. Again, the expected curve for SCS provides validation of the data. Though performance in the two measurement case is identical, the numerical optimization of the FT approach is less practical. The SCS implementation is therefore more desirable because of its predictable performance.

Chapter 4: Spacecraft Application

As with any algorithm, Stochastic Constraint Suspension (SCS) must be validated before use on a flight system [5]. Simulated sensor data from a representative system provides an inexpensive way to validate and refine SCS on the ground. The selected spacecraft application is the guidance, navigation, and control (GNC) subsystem of a generic manned spacecraft with typical sensors and actuators present. This application leverages Draper Laboratory's extensive experience with manned spacecraft GNC. For this purpose, we created a new simulation to produce realistic sensor data for the VHM algorithm. We present the 3-DOF simulation architecture and the models of vehicle actuators, dynamics, and sensors.
We then converted the vehicle simulation into a constraint model for embedding inside SCS. The constraint functions and framework are detailed along with the alternative constraint representations considered. Finally, SCS performance data is provided for a variety of test cases, including a direct comparison to the Fixed Threshold (FT) implementation at the system level.

4.1 Application Overview

The application used for validation in this chapter is a generic GNC subsystem on a manned spacecraft. The sensors, actuators, and associated uncertainty levels present are representative of a portion of a typical spacecraft GNC subsystem. Noise levels are based on space qualified hardware specifications where open source data is available. This 3-DOF simulation accounts for translational motion only, so attitude sensors (e.g., star trackers, gyroscopes) are not included. The simulated spacecraft performs maneuvers in orbit using a set of 11 primary thrusters. The configuration was chosen for 6-DOF control on a lifting body design. Each thruster produces 216 N of force with a standard deviation of 10 N when on. These values are based on the European Space Agency (ESA) Automated Transfer Vehicle (ATV) reaction control system thrusters [32]. Second and third string redundant thrusters are typically present in manned spacecraft but are not included here, because they are not necessary for validating the SCS algorithm. To improve the observability of thruster failures, one temperature sensor was added to each thruster; these sensors determine the on or off state. Alternatively, pressure sensors could have been used to provide similar information. The navigation system uses an inertial measurement unit (IMU) containing an accelerometer and a gyroscope for each axis. The accelerometers and gyroscopes provide body frame accelerations and angular velocities, respectively. A Global Positioning System (GPS) receiver is modeled as providing absolute position and velocity data. A real navigation system would also carry redundant sensors for fault detection; these are not included in the model because the low level FDI shown in Figure 9 is assumed to be in place. Table 1 presents the vehicle hardware layout and thruster directions. Figure 14 illustrates the approximate thruster locations and directions and defines the coordinate system using side, top, and rear views. The rear view does not show thrusters parallel to the x axis. Thrusters 4 through 11 are canted relative to the coordinate axes and are split into their component vectors for illustration purposes.
Unit                        Location (m)        Direction Unit Vector
Center of Mass              [0, 0, 0]           N/A
GPS Receiver                [0, 0, 0]           N/A
Inertial Measurement Unit   [0, 0, 0]           N/A
Thruster 1                  [-4, 0, 0]          [-1, 0, 0]
Thruster 2                  [3, -2, 0]          [1, 0, 0]
Thruster 3                  [3, 2, 0]           [1, 0, 0]
Thruster 4                  [-3, -1, 0]         [0, -0.707, -0.707]
Thruster 5                  [-3, 1, 0]          [0, 0.707, -0.707]
Thruster 6                  [-2.5, -0.5, 1]     [0, -0.707, 0.707]
Thruster 7                  [-2.5, 0.5, 1]      [0, 0.707, 0.707]
Thruster 8                  [1.5, -2, -1]       [0, -0.707, 0.707]
Thruster 9                  [2, -2, 0]          [0, -0.707, -0.707]
Thruster 10                 [1.5, 2, -1]        [0, 0.707, 0.707]
Thruster 11                 [2, 2, 0]           [0, 0.707, -0.707]

Table 1: Generalized Hardware Locations and Thruster Directions

Figure 14: Generalized Thruster Configuration and Coordinate System

4.2 Vehicle Simulation

A 3-DOF simulation of the vehicle hardware and dynamics produced sensor data for VHM validation. The simulation was open loop because we are only assessing the algorithm's ability to detect and isolate faults. The VHM algorithm is independent of the GNC approach and thus there is no need to close the GNC loop. Therefore, the sensed state has no effect on the programmed thruster commands. The complete architecture is presented in Figure 15. Open loop on or off commands for each thruster are inputs to the simulation. Spacecraft maneuvers are performed using predetermined sequences of commands. All simulations in this chapter command all of the thrusters to the 'on' position for each time step to ensure observability into thrusters stuck off. There is still a net acceleration on the vehicle in the x direction to facilitate VHM testing. Forces on the vehicle are computed from these commands based on the thrust vector and placement of each actuator. The actuator force magnitude for thruster i is computed as

F_i = cmd_i \cdot N(F_{nom}, F_{std}^2)    (4.1)

where F_nom is the nominal thruster force, F_std is the expected normal standard deviation of the thruster force, cmd_i is 0 for an 'off' command and 1 for an 'on' command to thruster i, and N(\mu, \sigma^2) indicates a random number generated from a normal distribution with mean \mu and standard deviation \sigma. The force vector from thruster i is then

\vec{F}_i = -F_i \cdot \widehat{dir}_i    (4.2)

where \widehat{dir}_i is the unit vector in the ith thruster nozzle direction. The negative sign is present because the thrust is in the opposite direction to the thruster nozzle. External environmental disturbances also produce stochastic disturbance forces on the vehicle. The environmental forces are produced with

F_{env} = N(0, \sigma_{env}^2)    (4.3)

where F_env is the zero mean environmental disturbance force and \sigma_{env} is the normal standard deviation of this force. The total force on the vehicle in the body frame, \vec{F}, is

\vec{F} = \vec{F}_{env} + \sum_{i=1}^{11} \vec{F}_i    (4.4)

where the forces from the eleven thrusters and the environment are summed. The total force from thrusters and disturbances is the input to the dynamics model. The dynamics model uses the vehicle mass properties to compute linear accelerations on each axis in the inertial frame. The body and inertial frames are chosen to be equivalent in this simulation, because no rotational motion is included. The accelerations are integrated over time to produce the current vehicle state.
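A minimal sketch of this force model is given below, using the thruster directions from Table 1 and the noise levels from Table 2. It is a Python illustration of Eqs. 4.1-4.4 as reconstructed here, not the thesis MATLAB simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

F_NOM, F_STD, SIGMA_ENV = 216.0, 10.0, 1.0   # Table 2 values
DIRS = np.array([                            # thruster nozzle directions from Table 1
    [-1, 0, 0], [1, 0, 0], [1, 0, 0],
    [0, -0.707, -0.707], [0, 0.707, -0.707],
    [0, -0.707, 0.707], [0, 0.707, 0.707],
    [0, -0.707, 0.707], [0, -0.707, -0.707],
    [0, 0.707, 0.707], [0, 0.707, -0.707],
])

def total_body_force(cmds):
    """Eqs. 4.1-4.4: stochastic thruster forces (opposite the nozzle
    direction) plus a zero-mean environmental disturbance."""
    mags = cmds * rng.normal(F_NOM, F_STD, size=len(cmds))   # Eq. 4.1
    thruster_forces = -mags[:, None] * DIRS                  # Eq. 4.2
    f_env = rng.normal(0.0, SIGMA_ENV, size=3)               # Eq. 4.3
    return f_env + thruster_forces.sum(axis=0)               # Eq. 4.4

# All thrusters commanded 'on': the net force lies along the x axis (plus noise)
# because the canted thrusters cancel in y and z.
print(total_body_force(np.ones(11)))
```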
The acceleration a, velocity v, and position r on each axis were computed using

a = \frac{F}{m}    (4.5)

v = v_0 + a \cdot dt    (4.6)

r = r_0 + v_0 \cdot dt + \frac{1}{2} a \cdot dt^2    (4.7)

where F is the force on the vehicle, m is the vehicle mass, initial values are indicated by the 0 subscript, and dt is the time step used in the simulation (set to 0.01 seconds). The navigation sensors measure the truth acceleration, velocity, and position components of the vehicle with given uncertainties. The accelerometer is based on the Honeywell Miniature Inertial Measurement Unit (MIMU) [33] and the GPS is based on the Integrated GPS and Occultation Receiver (IGOR) [34]. For GPS and accelerometer measurements, the sensed values were simulated as

\hat{a} = N(truth, \sigma_{accel}^2)    (4.8)

\hat{r} = N(truth, \sigma_{GPS}^2)    (4.9)

where \hat{a} is the measured acceleration on a given axis, \hat{r} is the GPS measured position on a given axis, truth is the state component value computed by the simulated dynamics, \sigma_{accel} is the standard deviation of white noise on the accelerometer measurement, and \sigma_{GPS} is the standard deviation of white noise on the GPS measurement. The temperature sensor parameters were chosen to have a conservative signal-to-noise ratio (SNR). Here the signal can be defined as the difference between the on and off temperature because we are interested in determining the on or off state rather than the absolute temperature. The temperature sensors are modeled as

T = 300 + N(0, \sigma_T^2) (thruster off),   T = 400 + N(0, \sigma_T^2) (thruster on)    (4.10)

where T is the measured temperature on a given thruster in K, 300 K is the nominal off temperature, 400 K is the nominal on temperature, and \sigma_T is the standard deviation of the white noise on the sensor measurements. These values provide a very conservative estimate of the SNR because in reality, the temperature extremes could be much greater for a sensor measuring thruster exhaust temperature. The space shuttle main engine (SSME) hot gas temperature sensor was required to measure temperatures of approximately 1000 K during engine firings and 100 K when the engine was off due to the cryogenic propellants. The sensor was required to have <0.5% uncertainty in these conditions [35]. To avoid the need for such a complex and expensive sensor, this simulation assumes coarse sensors slightly upstream of the thruster exhaust. This is acceptable, because the purpose of the sensors here is to determine the on or off state for each thruster, whereas the Shuttle temperature sensors performed more involved VHM on the thruster performance. Table 2 summarizes the nominal model parameters and associated uncertainties.

Parameter                          Mean                           Standard Deviation
Thruster Force                     216 N                          10 N
Environmental Disturbance Force    0 N                            1 N
Accelerometer component            Truth                          0.00212 m/s^2
GPS position component             Truth                          1.2 m
Temperature Sensor                 T_on = 400 K, T_off = 300 K    5 K
Vehicle Mass                       10,000 kg                      0
Control and Sensor Rates           1 Hz                           0

Table 2: Model Parameters and Uncertainties

Faults can be inserted for thrusters or sensors. We model thruster faults as valves stuck in the on or off position. For algorithm evaluation purposes, we designed simulated maneuvers that yielded observable results. If a thruster is failed off, it must be commanded on in the maneuver for detection to be possible. Simulations in this chapter consider only one time step of data in order to eliminate any dependence on the time elapsed before a fault. Therefore, the maneuvers in this chapter command all thrusters 'on' for one time step in order for any thruster stuck off to be observable.
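The sketch below translates the dynamics and sensor models of Eqs. 4.5-4.10 into Python, using parameter values from Table 2. The structure and names are illustrative assumptions rather than the thesis code.

```python
import numpy as np

rng = np.random.default_rng(1)

MASS, DT = 10_000.0, 0.01                     # vehicle mass (kg), truth time step (s)
SIG_ACC, SIG_GPS, SIG_T = 0.00212, 1.2, 5.0   # Table 2 sensor noise levels
T_ON, T_OFF = 400.0, 300.0                    # nominal thruster temperatures (K)

def propagate(r0, v0, force):
    """Eqs. 4.5-4.7: discrete kinematics on each axis for one time step."""
    a = force / MASS
    v = v0 + a * DT
    r = r0 + v0 * DT + 0.5 * a * DT**2
    return a, v, r

def sense(a_truth, r_truth, cmds):
    """Eqs. 4.8-4.10: white-noise accelerometer and GPS measurements plus
    coarse thruster temperature measurements keyed to the on/off state."""
    acc_meas = rng.normal(a_truth, SIG_ACC)
    gps_meas = rng.normal(r_truth, SIG_GPS)
    temp_meas = rng.normal(np.where(cmds == 1, T_ON, T_OFF), SIG_T)
    return acc_meas, gps_meas, temp_meas

a, v, r = propagate(np.zeros(3), np.zeros(3), np.array([-216.0, 0.0, 0.0]))
print(sense(a, r, cmds=np.ones(11, dtype=int)))
```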
Though commanding every thruster on is not a realistic firing sequence, this maneuver ensures observability into thrusters stuck off for validation purposes, and there is a net acceleration on the vehicle because the thrusters are not balanced in the x direction. For sensors, a deterministic fault bias can be added to any hardware in the system. Sensor failures are modeled as a step bias shift for consistency with the literature [25]. For example, a fault bias of 1 m/s^2 may be added to the accelerometer x axis measurement.

Several assumptions were made to simplify the simulation. Vehicle mass properties were assumed to be constant and perfectly known. In practice, fuel consumption, flexible structures, and vehicle reconfigurations would require mass properties to be periodically updated in the constraint models. Thrusters are assumed to be fixed and either on or off, with no gimbaling or throttling capability. All thrusters are assumed to have the same uncertainty in force. No uncertainty in thruster direction or on time is incorporated.

Figure 15: Spacecraft Simulation Architecture

Sensor models incorporate the following assumptions. Sensors are assumed to be located at the center of mass and the IMU frame is assumed to be aligned with the body frame to remove the need for any frame transformations. Frame transformations are easily incorporated into constraint functions if required. No uncertainty in sensor locations or alignment was included. Thruster commands occur at 1 Hz. In reality, sensor rates will vary from the control rate. The simulation assumes the SCS algorithm runs at the controller rate and samples each sensor at this rate. Thus, sensor measurements are produced at 1 Hz in the simulation for use in the VHM algorithm. Simulated sensor measurements are produced by adding white noise to the truth state. In practice, logic must be in place to provide all sensor measurements at the desired VHM rate. SCS requires sensor measurements at the VHM rate but sensor sampling rates will vary. Therefore, a procedure for supplying the available sensor measurements at the appropriate rate to SCS is required. However, the algorithm for selecting sensor measurements at the correct rate for SCS is unnecessary for initial validation. The availability of sensor measurements at a faster rate than VHM would enable the use of filtering and should only increase the signal-to-noise ratio.

Vehicle orbital and rotational dynamics were not incorporated into the simulation in order to mitigate the complexity of the constraint models for initial validation. Linear equations of motion were used with discrete integration steps. Environmental disturbances were generated using random normally distributed forces with zero mean and a given standard deviation. Models of specific disturbances such as gravity gradient and drag were not included because the level of dynamics uncertainty is more important in SCS validation than the specific disturbance sources.

4.3 Constraint Models

SCS requires a constraint model of the system that is going to be monitored. As described in Chapter 2, this model consists of a network of components connected by nodes, constraint functions within components, and sensors at appropriate nodes. Section 4.1 presented the GNC spacecraft subsystem application. GNC subsystems are closed loop, creating an extra complication in VHM implementation. Fault information may propagate through the feedback loop and make it challenging to detect the faulty component or sensor.
Fesq determined that closed loop systems could be addressed by breaking the loop and not modeling the controller [21]. However, this loop-breaking approach requires that the VHM software be fed sensor data at a rate at least as fast as the control loop rate. This prevents faults in one cycle from affecting data in the next cycle. All components of the algorithm and simulation operate at 1 Hz in order for the control loop to be broken for VHM.

A given system has more than one possible constraint model representation, and the designer's selection of a constraint model can be significant in determining the VHM's diagnostic resolution and computational load. In Section 2.3, we showed that superstructures represent the limits of diagnostic resolution based on a given constraint model and sensor placement. A fault in a component within a superstructure can only be isolated to the full group of components forming the superstructure. For example, decoupling vehicle dynamics into multiple components may eliminate superstructures and allow isolation to a single component rather than a group of components. However, extra components add complexity to the constraint model and require additional computation. Constraint model complexity also impacts the designer's ability to comprehend and verify the integrated model and algorithm. Figure 16 shows the selected constraint model. Blue blocks and yellow circles represent components and sensors as defined by SCS, respectively. The arrows point in the forward propagation direction. The model contains more components than may be intuitive, but this design choice increases diagnostic resolution for the given simulated hardware configuration.

Figure 16: Constraint Model for Spacecraft Application

We initially considered a variety of constraint models and ultimately chose the model in Figure 16 based on preliminary testing of the alternatives. Two alternate constraint model implementations were initially considered for the vehicle GNC subsystem. The first model did not contain any temperature sensors and, as a result, had no observability into the on/off state of each thruster. Figure 17 shows this constraint model representation. Testing demonstrated that large superstructures formed without any observability into each thruster's state. The low diagnostic resolution occurs because multiple thrusters affect each degree of freedom. A fault detected in a given degree of freedom could only be isolated to all thrusters that affect that degree of freedom. Therefore, without a measurement to trace a fault to a particular thruster, all thrusters affecting a given degree of freedom become a single superstructure. The second model, shown in Figure 18, incorporated coarse temperature sensors on each thruster to determine actuator on/off state, allowing for isolation of thruster failures to a single thruster. Still, testing showed that navigation sensor errors were isolated to large superstructures due to the dynamics modeling approach. The dynamics were initially modeled to sum the forces input to each degree of freedom and then integrate the acceleration to produce position measurements. The dynamics component output both acceleration and position quantities. By combining the summation and integration dynamics into a single component, a sensor error could not be distinguished from a dynamics error.
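For illustration, one way such a constraint model might be represented in software is as a set of components carrying forward and reverse constraint functions, with sensors attached to named nodes. The sketch below (Python dataclasses; every name is hypothetical and not taken from the thesis implementation) captures only this bookkeeping, not the detection or suspension logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Component:
    """A constraint-model component: forward/reverse functions map values and
    variances between its input and output nodes."""
    name: str
    inputs: list                        # input node names
    outputs: list                       # output node names
    forward: Callable                   # value and variance propagation, input -> output
    reverse: Optional[Callable] = None  # omitted where undefined (e.g., dynamics blocks)

@dataclass
class Sensor:
    name: str
    node: str                           # node the sensor measures
    sigma: float                        # measurement standard deviation
    group: Optional[str] = None         # hardware group suspended together (e.g., one GPS unit)

@dataclass
class ConstraintModel:
    components: list = field(default_factory=list)
    sensors: list = field(default_factory=list)
```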
To distinguish sensor errors from dynamics errors, a third configuration was therefore used that improved sensor diagnostic resolution without adding extra hardware. In this model, the integration of acceleration into position was separated from the original dynamics component. The inputs to the selected constraint model are the commands from control, which are on or off for all 11 thrusters. The commands are sensed in software in order for the VHM algorithm to have knowledge of the expected thruster behavior. These commands propagate through individual thruster components that output the forces along the vehicle axes and the thruster temperature. Thruster force outputs are only connected to the nominal DOFs affected. In reality, there may be residual thrust components along unintended axes, but this uncertainty in thrust direction is accounted for in the process noise of the dynamics constraint functions. The thruster temperatures are sensed and the forces are input into dynamics components. The dynamics are separated into the three translational degrees of freedom to improve diagnostic resolution. The dynamics components output accelerations, which are sensed by the accelerometers. The accelerations are also connected to integrator components that output vehicle absolute position. The absolute position is sensed by the GPS receiver. The constraint function equations that capture this model are provided later in this section.

Figure 17: Constraint Model without Temperature Sensors

Superstructures as defined by Fesq [21] are still present but can be eliminated with minor hardware and logic additions. For instance, at constraint model boundaries, it is not possible to distinguish between the boundary sensor and the associated boundary component. For the navigation sensors, the boundary components are dynamics models rather than hardware. Typically, a failure isolated to this superstructure should be associated with a sensor failure because the dynamics cannot fail if the model is correct. On the other hand, dynamics failures could be associated with unexpected external disturbances, such as bumping into another spacecraft during a docking maneuver. This simulation assumes bumping cannot occur because a second spacecraft or body is not modeled. Validation for landing, rendezvous, or docking maneuvers may not make this assumption. At the input boundary, command sensors cannot be distinguished from their associated thrusters. In this case, isolation should be associated with a thruster failure because command sensors simply read the control commands in software. One superstructure between two hardware components involves the thruster and the associated temperature sensor. A fault in the temperature sensor results in a superstructure containing both the sensor and the thruster. However, the reverse is not true, as a thruster fault will not implicate a temperature sensor. In this case, extra logic in the software or an extra temperature sensor on each thruster could eliminate the superstructure. This superstructure analysis would be useful in early-stage system design for insight into fault observability. In summary, with the chosen constraint model, a fault in any physical component except a temperature sensor can be isolated to the specific faulty component.
The analysis in Section 4.4 only considers thruster and navigation sensor failures because the models of these hardware components are the most realistic in the simulation. In addition to constraint model organization, superstructures depend on signal-to-noise ratios. A lower SNR at a given node may lead to a higher probability of isolating to a larger superstructure. This will be shown in Section 4.4.2 with the degradation of the temperature sensors.

Figure 18: Constraint Model without Split Dynamics

Superstructures in SCS are largely determined by the chosen constraint model. There are also limitations within the hardware that are not reflected in the constraint model. For example, the GPS is an integrated hardware unit and is typically at the lowest replaceable unit level. The position measurement is split into three axes to improve diagnostic resolution in the constraint model, but in reality a GPS failure would likely result in error on all three axes. To accommodate these types of hardware connections, the SCS algorithm includes logic to group components or sensors in the model. Groups of components or sensors can be identified for simultaneous, rather than individual, suspension. Otherwise, a failed GPS resulting in errors on all three axes would require the algorithm to attempt all combinations of 1, 2, and 3 simultaneous suspensions in order to isolate the fault. This large number of isolations would require significant computation effort and can be avoided by grouping components or sensors that are actually part of the same hardware.

Figure 19: Component Inputs and Outputs with Connections

Each component contains forward and reverse constraint functions to map inputs to outputs and vice versa. Figure 19 identifies the inputs and outputs for the three basic types of components and maps the connections between them. The y and z dynamics differ from the x dynamics component only in the number of input forces. The fundamental component constraint models are the building blocks for the more complex system level diagram shown earlier in Figure 16. The constraint functions for measurements and associated uncertainties are provided here for each component. They are based on the simulation truth equations provided in Section 4.2.

The inputs to the thruster components are on or off commands, which can be represented by a 1 or 0, respectively. The outputs are forces in each body direction that the thruster affects, as well as a temperature. Therefore, the forward constraint function from a command to a force component in direction i is

F_i = cmd_i \cdot F_{nom} \cdot dir_i    (4.11)

where F_nom is the nominal thruster force, dir_i is the corresponding component of the direction unit vector for the thruster, and cmd_i is the on or off command for thruster i represented by a 1 or 0. The uncertainty is propagated using Eq. 3.1, written here as

\sigma_{out}^2 = \sigma_{in}^2 \cdot (F_{nom} \cdot dir_i)^2 + (F_{std} \cdot dir_i)^2    (4.12)

where \sigma_{out}^2 and \sigma_{in}^2 are the output and input variances, respectively, and F_std is the standard deviation of the thrust force when the thruster is on. The forward constraint from command to temperature is

T = T_{off} + (T_{on} - T_{off}) \cdot cmd_i    (4.13)

where T_on and T_off are the nominal temperatures when the thruster is on and off, respectively.
The uncertainty is propagated using

\sigma_{out}^2 = \sigma_{in}^2 \cdot (T_{on} - T_{off})^2    (4.14)

where no process noise was added because there is no uncertainty in the simulation for the on or off temperature. The reverse constraint function from force to command is

cmd_i = \frac{F_i}{F_{nom} \cdot dir_i}    (4.15)

and the associated uncertainty equation is

\sigma_{out}^2 = \sigma_{in}^2 \cdot \left(\frac{1}{F_{nom} \cdot dir_i}\right)^2 + \left(\frac{F_{std}}{F_{nom}}\right)^2    (4.16)

where the process noise is estimated with the ratio of the thrust standard deviation to the nominal magnitude. The reverse constraint function from temperature to command is

cmd_i = \frac{T - T_{off}}{T_{on} - T_{off}}    (4.17)

where T is the measured temperature. The corresponding command uncertainty is given by

\sigma_{out}^2 = \sigma_{in}^2 \cdot \left(\frac{1}{T_{on} - T_{off}}\right)^2    (4.18)

where no process noise was added because we assume commands are known perfectly. This is reasonable because controller commands are directly read in software.

The dynamics blocks sum thruster forces and output vehicle acceleration on a body axis. The forward constraint functions for the three dynamics blocks are

a_x = \frac{1}{m} (F_1 + F_2 + F_3)    (4.19)

a_y = \frac{1}{m} (F_4 + F_5 + F_6 + F_7 + F_8 + F_9 + F_{10} + F_{11})    (4.20)

a_z = \frac{1}{m} (F_4 + F_5 + F_6 + F_7 + F_8 + F_9 + F_{10} + F_{11})    (4.21)

where m is the vehicle mass, a_i is the acceleration along the i body axis, and F_i is the force from thruster i along that axis taking into account the thruster direction. The uncertainty is propagated with

\sigma_{out}^2 = \left(\frac{1}{m}\right)^2 \left(\sigma_{in,1}^2 + \sigma_{in,2}^2 + \cdots + \sigma_{in,n}^2 + \sigma_{env}^2\right)    (4.22)

where \sigma_{env} is the standard deviation of the zero mean environmental force. Section 2.2 explained that no technique has been found to propagate through reverse constraint functions that rely on multiple input node values. Reverse constraint functions are not included for the dynamics blocks because the constraints are dependent on knowledge of all force inputs. The force from a given thruster cannot be computed based on the acceleration component without knowledge of the force components from the remaining thrusters.

The integration block computes vehicle position components based on accelerations. The discrete integration is performed using

r = r_0 + v_0 \cdot dt + \frac{1}{2} a \cdot dt^2    (4.23)

where r_0 and v_0 are the initial position and velocity stored in the VHM code from the previous time step and dt is the size of the time step based on the VHM rate. Uncertainty is propagated with

\sigma_{out}^2 = \sigma_{in}^2 \cdot \left(\frac{1}{2} dt^2\right)^2    (4.24)

where no process noise was added because the integration has no uncertainty. Uncertainty in previous time step values is not incorporated in this constraint model. The reverse constraint function is a discrete differentiation written as

a = \frac{2 (r - r_0 - v_0 \cdot dt)}{dt^2}    (4.25)

and the uncertainty is propagated as

\sigma_{out}^2 = \sigma_{in}^2 \cdot \left(\frac{2}{dt^2}\right)^2    (4.26)

where no process noise was added. In a more complicated example, constraint functions might need to incorporate frame transformations and time-varying parameters, such as mass. However, Section 4.4 will show that even with this 3-DOF example, the constraint functions are sufficient for analyzing algorithm performance in the presence of realistic uncertainties.
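As an illustration of how each component pairs a value mapping with a variance mapping, the sketch below implements the thruster constraint functions of Eqs. 4.11-4.16 as reconstructed above (Python; constants from Table 2; the function names are hypothetical and this is not the thesis implementation).

```python
F_NOM, F_STD = 216.0, 10.0     # nominal thrust and its standard deviation (Table 2)
T_ON, T_OFF = 400.0, 300.0     # nominal on/off thruster temperatures (Table 2)

def thruster_forward_force(cmd, var_cmd, dir_i):
    """Eqs. 4.11-4.12: command -> force component, with variance propagation."""
    force = cmd * F_NOM * dir_i
    var = var_cmd * (F_NOM * dir_i) ** 2 + (F_STD * dir_i) ** 2
    return force, var

def thruster_reverse_force(force, var_force, dir_i):
    """Eqs. 4.15-4.16: force component -> command, with variance propagation."""
    cmd = force / (F_NOM * dir_i)
    var = var_force * (1.0 / (F_NOM * dir_i)) ** 2 + (F_STD / F_NOM) ** 2
    return cmd, var

def thruster_forward_temp(cmd, var_cmd):
    """Eqs. 4.13-4.14: command -> temperature, with variance propagation."""
    temp = T_OFF + (T_ON - T_OFF) * cmd
    var = var_cmd * (T_ON - T_OFF) ** 2
    return temp, var

# Forward propagate a perfectly known 'on' command through one canted thruster axis.
print(thruster_forward_force(cmd=1, var_cmd=0.0, dir_i=-0.707))
print(thruster_forward_temp(cmd=1, var_cmd=0.0))
```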
4.4 Performance Analysis

We investigated VHM performance using the vehicle simulation and constraint models. We first validated the code by running a fault scenario for each component and sensor at very high signal-to-noise ratios in order to demonstrate perfect FDI. All faults were repeatedly and successfully isolated within their expected superstructures, confirming the simulation and constraint model consistency. Several analysis cases are described in this section. System level P(FA) as a function of the input P(FA) to each node is found empirically. The effect of a degrading sensor on diagnostic resolution is illustrated using the temperature sensors. Sensitivity of SCS performance to the system SNR is investigated using dynamics uncertainty and a comparison between two IMU models. SCS is then compared against Fixed Threshold (FT) to demonstrate the system level improvement. Finally, SCS executions on an embedded processor are briefly discussed.

Manned spacecraft VHM requirements are typically written in terms of fault tolerance and a general ability to detect and isolate key faults [36]. However, this requirements framework does not translate well to the performance metrics often reported in the literature, such as P(FA) and P(MD). A thorough literature search finds very few instances of specific requirements for P(FA), P(MD), and similar isolation metrics. This suggests these requirements either are not well defined or are proprietary. For the purpose of validating SCS in this thesis, we define a set of metrics that constitute acceptable VHM performance. In detection, both the P(FA) and the P(MD) shall not exceed 5%. In isolation, the algorithm must be correct in at least 95% of cases, in that the faulty component or sensor is reported by the VHM (even if it is not the only component or sensor reported). Results reported in this section will be compared to these measures of acceptable performance. If specific VHM requirements become available in the future, these results can be revisited.

4.4.1 System Level P(FA)

The relationship between the system level P(FA) and the P(FA) input to each node for parity space and hypothesis testing is critical to VHM design because it links the algorithm inputs to the algorithm performance. A desired P(FA) is input to each node for parity space and hypothesis testing calculations. However, the system is composed of many nodes, and if any node produces a false alarm, there is a false alarm at the system level. Therefore, the system level P(FA) is expected to be higher than the node input P(FA). We define the system level P(FA) here as the empirical false alarm rate achieved in simulation. In general, the inputs to SCS are the node level P(FA) but our requirements refer to the system level P(FA). No analytical relationship between the input and system level P(FA) has been created yet because of the large number of variable dependencies in the constraint model. The measurements at each node are not independent because the measurements are propagated through the system. Analytical measurements compared for consistency at one node may not be independent of the analytical measurements compared at a second node because they may originate at the same sensor. These dependencies are constraint model dependent. Empirical system level P(FA) as a function of node level input P(FA) is plotted in Figure 20 for the vehicle constraint model, using 10,000 trials to create each data point. The same P(FA) input was applied to each node in the system, but in practice nodes can be tuned individually if desired. The plot shows a strong linear fit with a slope of approximately 3 for the range investigated. Due to the significant scale factor between input and system P(FA), the input P(FA) must be set relatively low to establish an acceptable system wide P(FA).
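As a rough sanity check that is not derived in the thesis: if the node level consistency tests were statistically independent, the system level false alarm probability would follow

P(FA)_{sys} = 1 - (1 - P(FA)_{node})^{n} \approx n \cdot P(FA)_{node}    for small P(FA)_{node},

where n is the number of contributing node tests, so the observed slope of roughly 3 would correspond to about three effectively independent tests. Because the analytical measurements at different nodes share sensors, this approximation is only indicative, which is why the relationship is established empirically here.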
The relationship between the system and input P(FA) is expected to be highly system dependent, so the simulations and analysis required to find this relationship should be repeated for new constraint models.

Figure 20: System Level P(FA) as a Function of Input P(FA) (linear fit: y = 2.9x + 0.00015, r^2 = 0.9895)

4.4.2 Sensor Degradation

As described in Section 4.3, a constraint model with no temperature sensors was initially considered. Testing demonstrated that observability into each thruster's state is required to eliminate superstructures of several thrusters. Simulations with varying temperature sensor noise levels illustrate the sensor quality required to isolate single thruster failures. With higher quality sensor measurements, isolation to a single thruster is likely. As the temperature sensor degrades in quality, isolation to a superstructure of several thrusters occurs with increasing probability. Figure 21 shows the average number of extra objects isolated for thruster number 5 stuck off, for varying signal-to-noise ratios. The extra objects number measures the number of extra objects reported by the VHM algorithm as potentially faulty in addition to the correct object. The metric includes both components and sensors as 'objects' in the constraint model. The signal level is defined here as the difference between the expected temperature readings when the thruster is on versus off (i.e., T_on - T_off) and the noise level is the standard deviation of the temperature sensor measurements. One thousand sets of simulated onboard data were generated for each data point and tested on the VHM code. The input P(FA) to each node was 0.003. A constraint model identical to that in Figure 16, but without any temperature sensors, was created to determine the extra components in the superstructure of the thruster investigated. It was discovered that at least 17 extra components and sensors would be isolated if no temperature sensor existed. The average extra components metric approaches this value as the SNR decreases, indicating that a degrading temperature sensor approaches the situation with no temperature sensor present. As described in Section 4.3, there will always be one extra component isolated, because the command sensor and the thruster form a superstructure. The experiment shows that an SNR of approximately 8 is sufficient for reliable isolation to a single thruster.

Figure 21: Temperature Sensor Degradation Forms Superstructure

4.4.3 Signal-to-Noise Sensitivity

The SNR in the system directly impacts VHM performance as shown in Section 3.3.2. As the noise level approaches the signal or fault bias magnitude, it becomes more difficult to determine whether a variation is due to a fault or to random noise. The signal-to-noise ratio was investigated in two ways. First, the sensor hardware was varied, changing the measurement noise level relative to a constant chosen fault bias. Second, the environmental disturbance forces were varied, effectively varying the actuator thrust-to-dynamics uncertainty ratio. Accelerometer channel faults were chosen to illustrate performance, because experimentation on the constraint model with a variety of faults showed that accelerometer FDI is the most sensitive to the variables investigated in these test cases.
The figures in this section plot the experimental P(MD) vs. P(FA), which shows the tradeoff in the design space with varying node level input P(FA). A higher chosen input P(FA) for a node corresponds to points with lower P(MD) and higher P(FA) on a given tradeoff curve. Varying the system signal-to-noise ratio shifts the curve relative to the origin. For example, increasing the signal-to-noise ratio will shift the curve towards the origin, lowering both P(FA) and P(MD) and improving VHM performance. Using the FOM in Eq. 3.6, the FOM-maximizing input P(FA) can be selected based on the empirical curves shown in this section.

First, two IMU models were compared for a given fault mode. The IMU models were the MIMU introduced in Section 4.2 and the Litton LN200 Inertial Measurement Unit, because they are flight proven hardware [33]. Two IMU models are sufficient for checking the algorithm and model. Specifications for the IMU models are shown in Table 3 [33]. The standard deviation of the white noise on each accelerometer measurement was formed using the random walk and assuming a 200 Hz sampling rate. This sampling rate is reasonable for a commercially available IMU [37]. The conversion equation, using dimensional analysis, was

\sigma_{accel} = x \cdot \sqrt{y}, \qquad \left[\frac{m}{s\sqrt{s}}\right] \cdot \sqrt{\left[\frac{1}{s}\right]} = \left[\frac{m}{s^2}\right]    (4.27)

where x is the random walk specification and y is the sample rate [38]. Using Eq. 4.27, the estimated accelerometer noise standard deviation was calculated to be 0.00212 and 0.00693 m/s^2 for the MIMU and LN200, respectively. The inserted fault was a deterministic bias in the x accelerometer channel. The bias was equal to 9 times the nominal accelerometer noise standard deviation given in Table 2, because in preliminary testing the algorithm performance was found to be most sensitive in this bias region. Figure 22 shows the simulation results where each data point represents 1000 nominal and faulty system runs. Figure 23 shows the low P(FA) and P(MD) section of Figure 22 to show the MIMU performance. The lower signal-to-noise ratio of the LN200 for the given fault bias results in an empirical curve shifted significantly away from the origin, as expected. This implies the MIMU was capable of robustly detecting smaller fault biases than the LN200, which is expected because the MIMU has less noise in its measurements. The algorithm ranged from 95.6 to 99.5% correct in its isolations for the MIMU and from 84.5 to 98.0% correct for the LN200. The ability to isolate simultaneous failures was disabled for all tests in this section, because the execution time required to perform a simultaneous isolation was impractical and no simultaneous faults were inserted, so any simultaneous isolation would have been incorrect. If the algorithm failed to isolate the failure with single component or sensor suspensions, it was forced to stop execution and report an incorrect isolation rather than attempt every combination of multiple components and/or sensors. Also, the algorithm could only isolate the fault to a superstructure of approximately 8 components on average due to the low signal-to-noise ratio tested. The signal-to-noise ratio used here was chosen because preliminary testing demonstrated that the detection performance is sensitive in this region. In isolation, the probability of isolating to a larger superstructure is high because of the low SNR. This is similar to the reasoning in Section 4.4.2, shown in Figure 21, where temperature sensor degradation results in a lower SNR and more components are isolated in the superstructure on average.
Additional testing showed that a higher SNR causes movement along the curve toward a lower average number of components isolated. Based on the requirements defined earlier in this section, SCS demonstrates acceptable detection performance for this small fault bias with recorded P(MD) and P(FA) lower than 1% and 3%, respectively.

IMU Model    Accelerometer Random Walk (m/s/sqrt(s))
MIMU         0.00015
LN200        0.00049

Table 3: IMU Hardware Specifications [33]

Figure 22: IMU Model Comparison for Accelerometer Fault

Figure 23: Low P(FA) and P(MD) Region of Figure 22

In the second noise sensitivity test, the environmental disturbance force was varied to investigate the effect of dynamics uncertainty on accelerometer FDI performance. The stochastic environmental disturbance was simulated with zero mean and a varying standard deviation. The deterministic accelerometer fault bias was equal to that in the IMU model comparison. Figure 24 plots the simulation results with each data point again representing 1000 trials for both nominal and faulty systems. The ratio in the legend represents the thrust force (F) divided by the standard deviation of the environmental disturbance (\sigma_{env}), because this can be considered the effective SNR in the vehicle dynamics. Figure 25 shows the performance at low P(MD) and P(FA) for F/\sigma_{env} = 100. This is considered a conservative estimate of the SNR because in practice, aerodynamic drag in low earth orbit (LEO) for a vehicle with this cross sectional area is less than 1 N [39] and the thrust level here is 216 N, resulting in an F/\sigma_{env} ratio greater than 200. SCS correctly isolated the accelerometer in 93.3 to 99.4% of the trials for all data points for this conservative SNR estimate. Most data points achieved the acceptable 95% correct isolation metric defined at the beginning of this section. Therefore, it is possible to select an input P(FA) to meet our requirements. For the same reasons outlined above in the IMU test analysis, the accelerometer again could only be isolated to a superstructure averaging 8 components in size. SCS demonstrated acceptable detection performance according to the requirements stated at the start of this section, with P(MD) and P(FA) less than 2% and 5%, respectively, for certain input P(FA).

Figure 24: Dynamics Uncertainty Comparison for Accelerometer Fault

Figure 25: Low P(FA) and P(MD) Region of Figure 24

4.4.4 System Level SCS and FT Comparison

This section expands on the node level demonstration in Section 3.3.2 by comparing SCS and Fixed Threshold (FT) at the system level. The spacecraft constraint model presented in Section 4.3 was slightly modified to include an extra set of accelerometers, and FT was tuned to a specific accelerometer fault bias in the model. We demonstrated SCS system level improvement over FT for this sample fault for both detection and isolation. For this simulation, we developed a procedure for selecting the fixed thresholds. First, the expected mean and variance of the analytical measurements at each node in the constraint model were calculated for a nominal system with no faults.
The constraint model was identical to that in Figure 16, but a redundant accelerometer was added at each accelerometer node. The redundant sensor resulted in 3 measurements at the accelerometer nodes, originating from the 2 sensors and a forward propagated value. The accelerometers were based on the MIMU and LN200 models presented in Table 3. The second accelerometer was added to create a scenario similar to that in Section 3.3.1, wherein three measurements have different uncertainties. Next, we derived the impact of the selected fault on the analytical measurement means at each node. Knowledge of the constraint functions and fault bias allows the designer to compute the expected mean of the analytical measurements throughout the system. We inserted a fault bias in a MIMU accelerometer channel because the higher uncertainty of the LN200 measurements would render the fault more difficult to detect for a given fault bias to noise ratio. Based on known means, variances, and fault biases at each node, the node level numerical optimization tool from Section 3.3.2 was used to find the FOM-maximizing threshold for each node. In order to simulate a realistic implementation, threshold tuning was performed at each node. This resulted in extremely low performance for FT due to the presence of at least three measurements with different levels of uncertainty at several nodes. Therefore, we employed additional knowledge of the fault to boost FT performance using the following strategy. Most nodes were unaffected by an accelerometer fault due to the inability to reverse propagate information through the dynamics components. Unaffected nodes were given thresholds high enough to avoid false alarms. Also, the thresholds at the branching nodes forward of the dynamics components were set high even though they are impacted by the accelerometer fault bias. This is because detection and isolation for this fault are possible using consistency checks only at the accelerometer nodes. Figure 26 highlights the FT threshold settings at each node. Red circles indicate the nodes that had thresholds tuned. Purple circles indicate nodes that would see the effect of the accelerometer fault bias but were set high to boost FT performance. All other nodes would be unaffected by the example fault and had thresholds set high.

Figure 26: Constraint Model Highlighting FT Settings

SCS inputs were selected based on the performance in Figure 23. The input P(FA) was 0.007 for every node because this produced low P(MD) and P(FA) rates in the earlier analysis. No optimization tool was used to select the input P(FA) and no nodes were set high as with the FT method. Therefore, FT was given a significant advantage over SCS by using knowledge of the simulated fault mode. Figure 27 shows the detection performance of each algorithm measured by the FOM used in Chapter 3 (with c1 and c2 equal to 1) for a range of accelerometer fault biases. FT was tuned for a fault bias to noise ratio of 8. During flight, arbitrarily many fault biases are possible, and it is impractical to tune a threshold for each, so the threshold was not changed with the independent variable. Each data point represents 1000 trials for both nominal and fault scenarios.
SCS outperforms FT in detection for the SNR range plotted, but the trend implies FT would asymptotically approach or possibly surpass SCS performance at low fault biases. Further testing at lower fault biases showed that this is due to the chosen input P(FA) for SCS. For the selected input P(FA), FT can marginally outperform SCS in detection at low fault biases. SCS can easily be tuned to have a higher FOM than FT in this domain by increasing the input P(FA) value, which would also slightly decrease the SCS FOM at higher fault biases. However, at these low fault bias levels, neither algorithm achieves acceptable VHM performance as defined at the top of this section. Therefore, we consider both algorithms ineffective in FDI in this SNR range, and the range plotted here illustrates the relevant SCS improvement over FT. Inserting the acceptable detection performance in P(MD) and P(FA) defined at the start of this section into the FOM, an acceptable FOM is 0.95. SCS achieves acceptable performance at an SNR of 8, while FT does not achieve acceptable performance in this SNR range. Figure 28 provides isolation performance measured by the percent of isolations that contained the correct faulty component. In isolation, SCS consistently achieves approximately 95% correct while FT reaches 85% correct for this fault bias to noise range. SCS achieves acceptable isolation performance for an SNR greater than approximately 6. FT again does not achieve acceptable performance in the SNR range considered. As shown in Figure 27 and Figure 28, SCS performed better than FT in detecting the fault and isolating the correct component despite FT being numerically optimized around the given fault mode. The threshold selection process was highly dependent on the constraint models and fault mode, and in reality many failure modes would need to be taken into account, rendering threshold tuning less practical than the SCS input process. More complex constraint models may also result in a greater number of nodes with three or more measurements to compare. This would make any threshold selection at these nodes less favorable than SCS, as shown in Section 3.3.

Figure 27: System Level Detection Performance Comparison for Accelerometer Bias

Figure 28: System Level Isolation Performance Comparison for Accelerometer Bias

4.4.5 Embedded Processor Testing

The simulations presented here were run using MATLAB on a Windows desktop computer. During flight, the software will run on an embedded processor without the overhead processes that a desktop computer runs. In order to facilitate testing of the algorithm in a more realistic scenario, the VHM code was written in Embedded MATLAB (EML). EML is a subset of the MATLAB language that supports select MATLAB features and facilitates efficient code generation for deployment into embedded systems [40]. Appendix B provides more detail on the implemented software architecture. Draper Laboratory engineers compiled and auto-coded the EML into C code and successfully ran it on an embedded processor for a variety of constraint models, including nonlinear and hierarchical systems. This demonstrated the code functionality in an environment more closely resembling a flight system.
Future software development work will focus on reducing the memory overhead requirements and lowering computational complexity. Once the SCS code is refined in terms of memory and computation, additional testing on an embedded processor can provide accurate metrics on memory usage and execution time. Testing on a variety of constraint models can lead to a relationship between these metrics and model complexity.

Chapter 5: Summary and Future Work

This thesis presents a general vehicle health monitoring (VHM) method built on constraint suspension. Key results of this thesis are summarized here. Future work and challenges in implementing Stochastic Constraint Suspension (SCS) on a flight vehicle are also outlined.

5.1 Summary of Results

Section 1.1 illustrated how autonomous VHM is a critical technology for future space exploration missions through the use of specific mission examples and NASA priorities [1]. Section 1.2 argued that many VHM methods have been proposed and applied to various degrees, but none have been able to detect and isolate a sufficient variety of faults while remaining simple enough for near term deployment on a complex system. This conclusion is based on a survey of literature on proposed and implemented VHM algorithms, including several forms of hardware and analytical redundancy [6]. Fesq's analog constraint suspension [21] was shown to be a promising approach to the VHM problem by linking quantitative and qualitative redundancy algorithms. The fundamentals of constraint suspension, including its merits and limitations, were detailed in Chapter 2 along with the parity space and hypothesis testing approaches.

SCS, presented in Chapter 3, strengthened analog constraint suspension by using parity space and hypothesis testing rather than the Fixed Threshold (FT) algorithm for consistency checking of analytically redundant measurements. This was enabled by propagating the uncertainty associated with each value through the constraint functions. Low level fault detection and isolation (FDI) remains an efficient tool for comparing redundant hardware and can fit well into the constraint suspension framework. The expected performance of SCS at the measurement consistency checking level was derived for a given fault mode and algorithm input. Conceptual and numerical examples then demonstrated the improvement SCS provides over FT, especially when more than two measurements with unequal uncertainties are being compared. Over the measurement uncertainty range considered in Section 3.3.2, SCS maintained 97% of its performance as measured by the figure of merit (FOM) defined in Eq. 3.6, while FT fell to 87% of its original performance.

Chapter 4 provided SCS performance data for a simulated spacecraft GNC subsystem. The subsystem application and associated simulation were described and the constraint model representation of the subsystem was presented. Repeated simulations were used to record key metrics of SCS performance for a variety of test cases. First, the empirical relationship between system level P(FA) and node level P(FA) was investigated and found to be linear with a slope of approximately 3 for the given constraint model. Next, degrading temperature sensor quality was shown to increase the probability of isolating a thruster fault to a large group of components and sensors rather than to the individual thruster. A degrading sensor effectively drives the constraint model toward the behavior of the model without that sensor present.
Third, SCS performance measured by P(FA) and P(MD) was presented for different signal-to-noise ratios. This included varying the IMU hardware and the uncertainty of vehicle dynamics in the simulation. SCS achieved empirical P(FA) and P(MD) of less than 5% and 2%, respectively, for conservative signal-to-noise estimates. Fourth, SCS was directly compared to FT at the system level using the detection FOM presented in Chapter 3 and the percentage of correct isolations. SCS consistently outperformed FT in detection and isolation using the defined metrics. In isolation over the signal-to-noise range examined, SCS identified the correct component in its output list at least 95% of the time while FT achieved less than 85% correct.

5.2 Challenges and Future Work

Future research will focus on several challenges in SCS. First, computation limitations may become apparent with more complex constraint models. The SCS algorithm must be characterized on an embedded processor for a better understanding of computation and memory metrics. The current implementation is auto-codable into C as a first step towards this goal, but significant work remains in lowering the computation and memory load. For example, new suspension strategies in isolation or building models hierarchically may mitigate the computation challenge. Second, a set of design rules would be helpful for the creation of constraint models for a given set of hardware. These rules would attempt to maximize diagnostic resolution and mitigate system complexity, making the VHM design process a repeatable procedure rather than an art. Third, the ability to recognize superstructures with software would be useful in practice and in analysis. Superstructures are groups of components and sensors at the limit of diagnostic resolution [21]. A fault in a component or sensor within a superstructure can only be isolated to the entire superstructure. During the design process, this capability would be valuable in identifying observability limitations and could influence vehicle design based on VHM requirements. In operation, the algorithm could potentially reduce the number of suspension steps performed during isolation. For instance, if a component within a superstructure is found to be potentially faulty, the entire superstructure could be added to the suspect list without individually suspending each component. Additionally, following a possible constraint model reconfiguration due to a faulty component, the new superstructures could be found autonomously. Fourth, VHM over an extended number of time steps has not been explored in depth here. Many constraint models, such as the one used in Chapter 4, require integration or differentiation over time. An algorithm for updating integrated values periodically should be included in SCS to avoid integration errors that build over time. Filtering may be needed for differentiated values to mitigate noise. Additionally, a moving window has been implemented in FDI in the past to improve VHM robustness [26]. For example, the algorithm may require isolation to a component in 3 out of 5 consecutive time steps before reconfiguring the hardware. This approach may improve the P(FA) and P(MD) but will increase the time to recover from a failure and may prevent FDI of transient events, so a trade-off must be carefully made. SCS must be validated on progressively higher fidelity models leading up to flight vehicle deployment. High fidelity vehicle simulations exist to provide realistic sensor data to the algorithm.
SCS must be validated on progressively higher fidelity models leading up to flight vehicle deployment. High fidelity vehicle simulations exist to provide realistic sensor data to the algorithm. Hardware-in-the-loop testing could then provide the next step in characterizing VHM performance and feasibility. Ultimately, the algorithm must be flight proven. However, flight vehicles may not experience the wide array of faults required to validate SCS, so simulated onboard faults may be used instead.

References

[1] "NASA Space Technology Roadmaps and Priorities," National Research Council, Washington, D.C., 2012.
[2] Brian Welch, "Limits to Inhibit," Space News Roundup, vol. 24, no. 14, pp. 1-3, August 1985.
[3] "Report of the Presidential Commission on the Space Shuttle Challenger Accident," Washington, D.C., 1986.
[4] David K. Geller, "Orbital Rendezvous: When is Autonomy Required?," Journal of Guidance, Control, and Dynamics, vol. 30, no. 4, pp. 974-981, July-August 2007.
[5] Lorraine M. Fesq. (2009, March) White Paper Report: Spacecraft Fault Management Workshop Results. [Online]. http://discoverynewfrontiers.nasa.gov/lib/pdf/SpacecraftFaultManagementWorkshopResults.pdf
[6] Inseok Hwang, Sungwan Kim, Youdan Kim, and Chze Eng Seah, "A Survey of Fault Detection, Isolation, and Reconfiguration Methods," IEEE Transactions on Control Systems Technology, vol. 18, no. 3, pp. 636-653, May 2010.
[7] Douglas Zimpfer, "Flight Control Health Management," in System Health Management with Aerospace Applications. John Wiley & Sons, 2011, ch. 30.
[8] Mukund Desai, "A Fault Detection and Isolation Methodology," in Decision and Control Including the Symposium on Adaptive Processes, 20th IEEE Conference on, vol. 20, 1981, pp. 1363-1369.
[9] James E. Potter and James C. Deckert, "Gyro and Accelerometer Failure Detection and Identification in Redundant Sensor Systems," NASA Technical Report E-2686, 1972.
[10] J. Gertler, "Fault Detection and Isolation Using Parity Relations," Control Eng. Practice, vol. 5, no. 5, pp. 653-661, 1997.
[11] R.K. Mehra and J. Peschon, "An Innovations Approach to Fault Detection and Diagnosis in Dynamic Systems," Automatica, vol. 7, no. 5, pp. 637-640, September 1971.
[12] D.T. Magill, "Optimal Adaptive Estimation of Sampled Stochastic Processes," IEEE Transactions on Automatic Control, vol. 10, no. 4, pp. 434-439, October 1965.
[13] Michele Basseville, Albert Benveniste, Maurice Goursat, and Laurent Mevel, "Subspace-Based Algorithms for Structural Identification, Damage Detection, and Sensor Data Fusion," EURASIP Journal on Advances in Signal Processing, 2007.
[14] R.J. Patton, C.J. Lopez-Toribio, and F.J. Uppal, "Artificial Intelligence Approaches to Fault Diagnosis," in Condition Monitoring: Machinery, External Structures and Health (Ref. No. 1999/034), IEE Colloquium, 1999, pp. 5/1-5/18.
[15] H.A. Talebi and K. Khorasani, "An Intelligent Sensor and Actuator Fault Detection and Isolation Scheme for Nonlinear Systems," in Decision and Control, Proceedings of the 46th IEEE Conference on, New Orleans, LA, 2007, pp. 2620-2625.
[16] C. Angeli and A. Chatzinikolaou, "On-Line Fault Detection Techniques for Technical Systems: A Survey," International Journal of Computer Science & Applications, vol. 1, no. 1, pp. 12-30, 2004.
[17] Randall Davis, "Diagnostic Reasoning Based on Structure and Behavior," Artificial Intelligence, vol. 24, no. 1-3, pp. 347-410, December 1984.
[18] Sandra C. Hayden, Adam J. Sweet, and Seth Shulman, "Lessons Learned in the Livingstone 2 on Earth Observing One Flight Experiment," in AIAA 1st Intelligent Systems Technical Conference, Chicago, IL, 2004, pp. 1-11.
[19] Douglas Bernard et al., "Spacecraft Autonomy Flight Experience: The DS1 Remote Agent Experiment," in AIAA Space Technology Conference and Exhibition, Albuquerque, NM, 1999.
[20] Marie-Odile Cordier et al., "Conflicts Versus Analytical Redundancy Relations: A Comparative Analysis of the Model Based Diagnosis Approach From the Artificial Intelligence and Automatic Control Perspectives," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 5, pp. 2163-2177, October 2004.
[21] Lorraine M. Fesq, "Marple: An Autonomous Diagnostician for Isolating System Hardware Failures," Ph.D. Dissertation, UCLA, 1993.
[22] Ksenia O. Kolcio, Mark L. Hanson, Lorraine M. Fesq, and David J. Forrest, "Integrating Autonomous Fault Management With Conventional Flight Software: A Case Study," in Aerospace Conference, 1999. Proceedings. 1999 IEEE, vol. 1, 1999, pp. 307-314.
[23] K. Kolcio, M. Hanson, and L. Fesq, "Validation of Autonomous Fault Diagnostic Software," in Aerospace Conference, 1998 IEEE, vol. 4, Aspen, CO, 1998, pp. 251-264.
[24] Scott Gleason and Demoz Gebre-Egziabher, GNSS Applications and Methods. Artech House, 2009.
[25] Mark A. Sturza, "Navigation System Integrity Monitoring Using Redundant Measurements," Journal of the Institute of Navigation, vol. 35, no. 4, pp. 69-87, Winter 1988-89.
[26] Russell Sargent et al., "A Fault Management Strategy for Autonomous Rendezvous and Capture with the ISS," in AIAA Infotech@Aerospace 2011, St. Louis, MO, 2011.
[27] Milton Abramowitz and Irene A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Courier Dover Publications, 1972, pp. 940-943.
[28] Johan de Kleer and Brian C. Williams, "Diagnosis With Behavioral Modes," in Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (IJCAI-89), 1989, pp. 1324-1330.
[29] Daniel L. Dvorak, "Monitoring and Diagnosis of Continuous Dynamic Systems Using Semiquantitative Simulation," Ph.D. Dissertation, University of Texas at Austin, 1992.
[30] David J. Goldstone, "Controlling Inequality Reasoning in a TMS-based Analog Diagnosis System," in Proceedings of the AAAI-91 National Conference on Artificial Intelligence, 1991, pp. 215-517.
[31] Philip R. Bevington and D. Keith Robinson, Data Reduction and Error Analysis for the Physical Sciences, 3rd ed. McGraw-Hill, 2003.
[32] (2012, March) Astrium. [Online]. http://cs.astrium.eads.net/sp/spacecraftpropulsion/bipropellant-thrusters/220n-atv-thrusters.html
[33] Stephen C. Paschall II, "Mars Entry Navigation Performance Analysis using Monte Carlo Techniques," Master's Thesis, MIT, 2004.
[34] Oliver Montenbruck, Miquel Garcia-Fernandez, and Jacob Williams, "Performance Comparison of Semicodeless GPS Receivers for LEO Satellites," GPS Solut., vol. 10, no. 4, pp. 249-261, November 2006.
[35] Doug Myhre, "Space Shuttle Main Engine Hot Gas Temperature Sensor," in International Instrumentation Symposium, 28th, Las Vegas, NV, 1982, pp. 405-416.
[36] NASA Office of Safety and Mission Assurance. (2011, November) Human-Rating Requirements for Space Systems (NPR 8705.2B). [Online]. http://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPR&c=8705&s=2B
[37] (2009, October) Gladiator Technologies, Inc. [Online]. http://www.gladiatortechnologies.com/DATASHEET/Legacy/LandMark30_IMUdatasheet_102309.pdf
[38] Walter Stockwell. Angle Random Walk. [Online]. http://www.xbow.com/pdf/AngleRandomWalkAppNote.pdf
[39] David A. Vallado and David Finkleman, "A Critical Assessment of Satellite Drag and Atmospheric Density Modeling," in AAS/AIAA Astrodynamics Specialist Conference, Honolulu, HI, 2008.
[40] MathWorks. (2012) Code Generation from MATLAB: User's Guide. [Online]. http://www.mathworks.com/help/pdf_doc/eml/eml_ug.pdf

Appendix A: Chi-Squared Distribution Noncentrality Parameter Derivation

The chi-squared distribution noncentrality parameter is derived here for use in Stochastic Constraint Suspension (SCS). The noncentrality parameter $\lambda$ is necessary for the calculation of the expected P(MD) using Eq. 3.3 for a given fault and input P(FA). This parameter describes the means of the normal random variables used in the chi-squared distribution. The noncentral chi-squared random variable $\chi^2$ can be written as the sum of squared normal random variables,

$$\chi^2 = \sum_{i=1}^{k} \left( \frac{X_i}{\sigma_i} \right)^2 \qquad (A.1)$$

where $\sigma_i$ is the standard deviation of the normal random variable $X_i$, and $k$ is the number of degrees of freedom. The probability density function for $x$ can be written as a function of $k$ and the noncentrality parameter $\lambda$,

$$f(x \mid k, \lambda) = \frac{1}{2} e^{-(x+\lambda)/2} \left( \frac{x}{\lambda} \right)^{k/4 - 1/2} I_{k/2-1}\!\left(\sqrt{\lambda x}\right) \qquad (A.2)$$

where $I_\nu(y)$ is a modified Bessel function of the first kind given by

$$I_\nu(y) = \left( \frac{y}{2} \right)^{\nu} \sum_{j=0}^{\infty} \frac{(y^2/4)^j}{j!\,\Gamma(\nu + j + 1)} \qquad (A.3)$$

where $\Gamma(n)$ is the gamma function computed using

$$\Gamma(n) = (n-1)! \qquad (A.4)$$

where $n$ is an integer. The noncentrality parameter $\lambda$ can be written as

$$\lambda = \sum_{i=1}^{k} \left( \frac{\mu_i}{\sigma_i} \right)^2 \qquad (A.5)$$

where $\mu_i$ is the mean of the normal random variable $X_i$ [27].

The decision variable $D$ in Eq. 2.8 was written as

$$D = \left( \frac{f_1}{\sigma_1} \right)^2 + \dots + \left( \frac{f_m}{\sigma_m} \right)^2 \qquad (A.6)$$

where $f_i$ is the residual for measurement $i$. Assuming the residuals $f_i$ follow normal distributions, $D$ follows a chi-squared distribution. For a nominal system, the residuals have zero mean, reducing $\lambda$ to zero and forming the central chi-squared distribution. However, a fault observed as a bias in one of the measurements will cause a bias in at least one of the residuals, producing a nonzero $\lambda$ and forming a noncentral chi-squared distribution.

The noncentrality parameter can be computed analytically using the parity space procedure outlined in Section 2.1.1. First, the computation is reduced by setting

$$H = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T \qquad (A.7)$$

where the measurement geometry matrix $H$ is simplified for SCS because the analytically redundant measurements are measuring the same quantity. $H_{inv}$ is computed by combining Eqs. 2.3, 2.4, and A.7 as

$$H_{inv} = (H^T W H)^{-1} H^T W = K \begin{bmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{1}{\sigma_2^2} & \cdots & \dfrac{1}{\sigma_m^2} \end{bmatrix} \qquad (A.8)$$

where

$$K = \frac{1}{\displaystyle\sum_{i=1}^{m} \frac{1}{\sigma_i^2}} \qquad (A.9)$$

and $H^T W H$ was expanded to

$$H^T W H = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix} \begin{bmatrix} \dfrac{1}{\sigma_1^2} & 0 & \cdots & 0 \\ 0 & \dfrac{1}{\sigma_2^2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \dfrac{1}{\sigma_m^2} \end{bmatrix} \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \sum_{i=1}^{m} \frac{1}{\sigma_i^2} \qquad (A.10)$$

Next, the $S$ matrix is computed using the derivation in Eq. 2.7,

$$S = I_m - H\,H_{inv} \qquad (A.11)$$

and is expressed as

$$S = K \begin{bmatrix} \dfrac{1}{K} - \dfrac{1}{\sigma_1^2} & -\dfrac{1}{\sigma_2^2} & \cdots & -\dfrac{1}{\sigma_m^2} \\ -\dfrac{1}{\sigma_1^2} & \dfrac{1}{K} - \dfrac{1}{\sigma_2^2} & \cdots & -\dfrac{1}{\sigma_m^2} \\ \vdots & & \ddots & \vdots \\ -\dfrac{1}{\sigma_1^2} & -\dfrac{1}{\sigma_2^2} & \cdots & \dfrac{1}{K} - \dfrac{1}{\sigma_m^2} \end{bmatrix} \qquad (A.12)$$

using substitution from Eq. A.8. $S$ is directly used to compute the residual vector $f$ as

$$f = S z = K \begin{bmatrix} \dfrac{1}{K} - \dfrac{1}{\sigma_1^2} & \cdots & -\dfrac{1}{\sigma_m^2} \\ \vdots & \ddots & \vdots \\ -\dfrac{1}{\sigma_1^2} & \cdots & \dfrac{1}{K} - \dfrac{1}{\sigma_m^2} \end{bmatrix} \begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix} \qquad (A.13,\ A.14)$$

which can be expanded to

$$f_{1,nom} = \left(1 - \frac{K}{\sigma_1^2}\right) z_1 - \frac{K}{\sigma_2^2} z_2 - \dots - \frac{K}{\sigma_m^2} z_m \qquad (A.15)$$

$$f_{2,nom} = -\frac{K}{\sigma_1^2} z_1 + \left(1 - \frac{K}{\sigma_2^2}\right) z_2 - \dots - \frac{K}{\sigma_m^2} z_m \qquad (A.16)$$

where Eqs. A.15 and A.16 give explicit expressions for the measurement residuals of a nominal system. Adding a fault bias $b$ to a given measurement $z_j$ results in the faulty residual expressions

$$f_i = f_{i,nom} - \frac{K}{\sigma_j^2} b \quad \text{(for nominal measurements } i\text{)} \qquad (A.17)$$

$$f_j = f_{j,nom} + b\left(1 - \frac{K}{\sigma_j^2}\right) \quad \text{(for measurement } j \text{ with fault bias } b\text{)} \qquad (A.18)$$

where Eqs. A.17 and A.18 separate the bias induced on each residual. The residual biases are the nonzero means that can be substituted into Eq. A.5 to form the noncentrality parameter,

$$\lambda = \frac{b^2 S_{jj}^2}{\sigma_j^2} + \frac{K^2 b^2}{\sigma_j^4} \sum_{\substack{i=1 \\ i \neq j}}^{m} \frac{1}{\sigma_i^2} \qquad (A.19)$$

and simplified to

$$\lambda = \frac{b^2}{\sigma_j^2} \left[ S_{jj}^2 + \frac{K^2}{\sigma_j^2} \sum_{\substack{i=1 \\ i \neq j}}^{m} \frac{1}{\sigma_i^2} \right] \qquad (A.20)$$

where

$$K = \frac{1}{\displaystyle\sum_{i=1}^{m} \frac{1}{\sigma_i^2}} \qquad (A.21)$$

Eq. A.20 provides a general expression for $\lambda$ that can be used to compute the expected P(MD) for a given fault using Eq. 3.3. Using the example from Section 3.3.2, for $m=3$ and a bias $b$ on measurement 2, the resulting $\lambda$ is

$$\lambda = \frac{b^2}{\sigma_2^2} \left[ S_{22}^2 + \frac{K^2}{\sigma_2^2} \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_3^2} \right) \right] \qquad (A.22)$$

Eqs. A.20 and A.21 can be simplified for equal measurement uncertainties $\sigma$. The simplified expression for $\lambda$ is

$$\lambda = \frac{b^2}{\sigma^2} \left[ \left(1 - \frac{1}{m}\right)^2 + \frac{m-1}{m^2} \right] = \left(1 - \frac{1}{m}\right) \frac{b^2}{\sigma^2} = S_{jj}\,\frac{b^2}{\sigma^2} \qquad (A.23)$$

where $S_{jj}$ is the diagonal entry of $S$ corresponding to the measurement $j$ with fault bias $b$, and

$$K = \frac{\sigma^2}{m} \qquad (A.24)$$

The biased residuals are

$$f_i = f_{i,nom} - \frac{b}{m} \quad \text{(for nominal measurements } i\text{)} \qquad (A.25)$$

$$f_j = f_{j,nom} + b\left(1 - \frac{1}{m}\right) \quad \text{(for measurement } j \text{ with fault bias } b\text{)} \qquad (A.26)$$

Eq. A.23 matches the expression given by Sturza for parity space and hypothesis testing on redundant hardware [25].
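As a numerical illustration of Eq. A.20, the following MATLAB sketch computes the noncentrality parameter for a bias fault and the corresponding expected probability of missed detection. It is an illustration only, not the thesis code: the chi2inv and ncx2cdf calls assume the MATLAB Statistics Toolbox, the numerical values are arbitrary examples, the exact form of Eq. 3.3 is not reproduced here, and m - 1 degrees of freedom are assumed for m analytically redundant measurements of a single quantity, consistent with the parity space formulation of [25].

    % Minimal sketch (not the thesis code): expected P(MD) for a bias fault,
    % using Eq. A.20 for the noncentrality parameter. Assumes the Statistics
    % Toolbox (chi2inv, ncx2cdf) and m-1 degrees of freedom.
    sigma = [0.10 0.15 0.20];        % measurement standard deviations (example values)
    j     = 2;                       % index of the biased measurement
    b     = 0.5;                     % fault bias magnitude (example value)
    PFA   = 0.01;                    % input probability of false alarm

    m      = numel(sigma);
    K      = 1 / sum(1 ./ sigma.^2);                    % Eq. A.9
    Sjj    = 1 - K / sigma(j)^2;                        % diagonal entry of S for measurement j
    others = setdiff(1:m, j);
    lambda = (b^2 / sigma(j)^2) * ...                   % Eq. A.20
             (Sjj^2 + (K^2 / sigma(j)^2) * sum(1 ./ sigma(others).^2));

    T   = chi2inv(1 - PFA, m - 1);                      % detection threshold from P(FA)
    PMD = ncx2cdf(T, m - 1, lambda);                    % expected probability of missed detection
    fprintf('lambda = %.4f, expected P(MD) = %.4f\n', lambda, PMD);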
Appendix B: Software Architecture

The vehicle health monitoring (VHM) software used in this thesis was developed in Embedded MATLAB (EML). EML is a subset of the MATLAB language that supports select MATLAB features and facilitates efficient code generation for deployment into embedded systems [40]. The software implements the algorithm pseudocode presented in Figure 8. Additionally, the software incorporates logic for hierarchical models and execution over multiple time steps. This appendix provides the algorithm interface and the organization and purpose of the functions called in Stochastic Constraint Suspension (SCS).

The constraint model is initialized in the Model_iLoad, ModelNodesInit, and ModelComponentInit functions. These functions define the structures required throughout the code. Model_iLoad defines key constants used by the algorithm, such as the maximum depth of the constraint model hierarchy. Physical constants required for constraint functions, such as vehicle mass, are also defined here. ModelNodesInit defines the structure containing information on nodes and sensors. Sensor locations, connections between nodes, and the structures required to contain forward and reverse propagated values and uncertainties are defined here. ModelComponentInit initializes a similar structure for components. The connection matrix linking nodes and components is one example of the information defined in this function. The initialized structures are saved for input to the algorithm each time step.

HMShell provides the interface to the SCS algorithm code. Figure 29 provides the interface details. Constraint model structures, sensor measurements, and consistency checking inputs are passed to HMShell. The consistency checking inputs for SCS are the probabilities of false alarm (P(FA)) selected for hypothesis testing at each node, as described in Section 2.1.2. SCS outputs the VHM solution, the forward and reverse propagated values and uncertainties (for use in the next time step), updated physical parameters, and data used for debugging. The solution structure contains the fault detection result. If a fault was detected, the solution also indicates whether isolation was successful. If isolation was successful, the solution structure contains a list of potentially faulty components and sensors.
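To make the role of the per-node P(FA) input concrete, the sketch below shows one way the consistency check of Section 2.1.2 can be written in MATLAB. It is an illustration only, not the thesis EML source: the variable names are hypothetical, chi2inv assumes the Statistics Toolbox (the flight code instead interpolates a stored chi-squared lookup table, as described below), and m - 1 degrees of freedom are assumed for m redundant values of the same quantity.

    % Minimal sketch of a node-level consistency check (illustration only).
    z     = [1.02 0.97 1.10];             % analytically redundant values at a node (example)
    sigma = [0.05 0.05 0.10];             % 1-sigma uncertainties propagated with each value
    PFA   = 0.01;                         % input probability of false alarm for this node

    m = numel(z);
    W = diag(1 ./ sigma.^2);              % weighting matrix
    H = ones(m, 1);                       % all values measure the same quantity (Eq. A.7)
    S = eye(m) - H * ((H' * W * H) \ (H' * W));   % residual projection matrix (Eq. A.11)
    f = S * z';                           % residual vector
    D = sum((f ./ sigma').^2);            % decision variable (Eq. A.6)
    T = chi2inv(1 - PFA, m - 1);          % threshold corresponding to the chosen P(FA)
    faultDetected = D > T;                % node flagged inconsistent if threshold exceeded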
Figure 29: SCS Interface
Input: iLoad (physical and model parameters); comps (information about the component structure); nodes (information about the node structure); sensed (sensor values for the current time step); old_sensed (sensor values from the previous time step); alg_in (input P(FA) for each node).
Output: sol_out (VHM solution); old_val (values to be stored for each node for the next time step); old_var (variances associated with each stored value); new_iLoad (updated iLoad parameters for the next time step).

Figure 30 lists the remaining functions required for SCS hierarchically, and Figure 31 provides a visual representation of the hierarchy. From HMShell, HMMain is called to perform nominal VHM. HMMain conducts the propagation step (HMPropagate) and consistency checking at each node (HMCheckNodes). HMPropagate makes use of the forward and reverse constraint functions (ModelForwardConstraints and ModelReverseConstraints) and the HMSelectValue function. When multiple values are available at a node for propagation, HMSelectValue chooses the best value to propagate. The best value is defined as the value with the lowest uncertainty, with preference given to sensor measurements: if the available measurement uncertainties are within a predetermined tolerance, the sensor value is chosen. HMCheckNodes makes use of the HM_chiLookUp function. As described in Section 2.1.2, the chi-squared distribution is required for setting the threshold in hypothesis testing. HM_chiLookUp stores a lookup table for the chi-squared distribution and interpolates within the table to set the threshold for consistency checking.

Figure 30: Function Call Hierarchy List
- Model_iLoad
- ModelNodesInit
- ModelComponentInit
- HMShell
  - HMMain
    - HMPropagate
      - HMSelectValue
      - ModelForwardConstraints
      - ModelReverseConstraints
    - HMCheckNodes
      - HM_chiLookUp
    - HMGenCandidates
    - HMDiagnose
      - HMSuspend
      - HMSuspendSens
      - HMPropagate
        - HMSelectValue
        - ModelForwardConstraints
        - ModelReverseConstraints
      - HMCheckNodes
        - HM_chiLookUp
      - HMUnSuspend
      - HMUnSuspendSens
  - ModelSwitchLevel
  - HMPropagate
    - HMSelectValue
    - ModelForwardConstraints
    - ModelReverseConstraints
  - HMUpdate
    - HM_WLSE
  - ModelUpdate_iLoad

If a fault is detected, HMGenCandidates generates a list of candidate components and sensors to consider in isolation. HMDiagnose then performs the isolation process. For each component and sensor in the candidate list, HMDiagnose performs suspension, propagation, and consistency checking using the same functions as those called by HMMain. HMSuspend and HMSuspendSens suspend components and sensors, respectively. HMUnSuspend and HMUnSuspendSens reverse the suspension process.

Also within HMShell, ModelSwitchLevel switches to a new constraint model within a hierarchical model if applicable. If a fault is isolated to a single component that contains a sublevel constraint model, the algorithm reinitializes the structures representing the constraint model and calls HMMain on the new model. HMPropagate is also called from HMShell because it is required in the hierarchy level switching process. HMUpdate stores a value and an associated uncertainty at each node for the next time step. Previous time step values are sometimes required in constraint functions for integration and differentiation over time. HMUpdate calls HM_WLSE, which computes the weighted least squares estimate at each node using the available analytically redundant measurements (from propagation and sensors) at the node.
ModelUpdate_iLoad updates any physical parameters that are known to change over time.

Figure 31: Function Call Hierarchy
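The weighted least squares combination performed by HM_WLSE, described above, amounts to an inverse-variance weighting of the redundant values at a node. The following MATLAB fragment is a minimal sketch of that operation only; it is not the thesis EML source, and the numerical values are arbitrary examples.

    % Minimal sketch of the weighted least squares fusion performed at a node
    % (illustration only, not the thesis code): redundant values are combined
    % with inverse-variance weights, and the fused value and variance are the
    % quantities stored for the next time step.
    vals = [10.2 9.8 10.5];              % redundant values at a node (propagated + sensed), example
    vars = [0.04 0.04 0.25];             % variances associated with each value

    w      = 1 ./ vars;                  % inverse-variance weights
    xhat   = sum(w .* vals) / sum(w);    % weighted least squares estimate for the node
    varhat = 1 / sum(w);                 % variance of the fused estimate
    fprintf('Stored value %.3f with variance %.4f\n', xhat, varhat);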