Issues in Integrated Health Management of Life Support Systems

David Kortenkamp, NASA Johnson Space Center/Metrica Inc.
Gautam Biswas and Eric-Jan Manders, Vanderbilt University, Institute for Software Integrated Systems

Abstract

The Environmental Control and Life Support (ECLS) system of a space vehicle or habitat is responsible for maintaining a livable environment for human crew members. Depending on the duration of the mission, ECLS systems can vary from a set of simple subsystems to a set of complex interacting systems. ECLS systems are critical to the manned space vehicles and surface habitats being planned for the Moon and Mars. Operating them continuously in a safe, reliable, and possibly autonomous manner, while ensuring very efficient utilization of all resources, will require careful monitoring and integrated control of the system. As NASA begins to work toward longer and more distant missions, the complexity of this task will escalate. This paper adopts an integrated system health management (ISHM) focus and uses it to establish the unique requirements of life support systems with respect to monitoring and control. The different components of a life support system are discussed, including Advanced Life Support Systems (ALSS) that are designed to operate in a closed loop by regenerating and recycling consumables to reduce launch mass. We present a high-level ISHM architecture for life support and some recent results in implementing the architecture. Finally, the life support monitoring and control issues for the Crew Exploration Vehicle (CEV) and for lunar and Martian habitats are discussed and compared.

1 Introduction

Environmental Control and Life Support (ECLS) systems are among the most vital of all manned spacecraft systems. They are designed to sustain life under the harshest environmental conditions, in space and on lunar and planetary surfaces, that are very different from what we experience on Earth.
Thus, the reliability, dependability, and efficient operation of ECLS systems are vital to the health and survival of the crew, and to the overall success of the mission. ECLS systems can vary greatly in complexity depending on the duration of the manned mission and its distance from Earth. Some of the complexity can be attributed to the highly nonlinear behavior of the individual subsystems that make up the ECLS. The complexity is further magnified by the number of interacting subsystems that make up the system, and the fact that these systems have to operate with limited resources in unpredictable environments.

The simpler ECLS systems currently used for shorter missions that operate close to the Earth are open loop, i.e., all of the essential consumables, such as oxygen, water, and food, are stored and then released into the astronaut working and living areas to ensure these spaces are habitable, and all of the by-products and waste generated, such as carbon dioxide, urine, and solid waste, are removed from the astronaut living areas and either dumped or stored for later disposal. As the duration of the missions (e.g., Space Station) and the distance from Earth (e.g., lunar and Mars missions) increase, the corresponding ECLS systems (often called Advanced Life Support (ALS) systems) are designed to be closed loop and regenerative, i.e., the goal is to reclaim or regenerate the essential consumables from the waste products continually so that the total amount of resources required for the duration of the mission is greatly reduced. In addition to conserving resources, this approach can significantly reduce launch weight, a very important factor in determining the feasibility and cost of the overall mission. To further elaborate, two reasons why closed loop systems are essential for extended human space flight are:

1. the amount of consumable resources required for a long mission (air, water, and food) would far exceed the payload that current propulsion technology is capable of launching; and
2. resupply of resources during the mission is not a feasible alternative.

Autonomy (or at least human-supported autonomy) is also an important consideration for ECLS (and ALS) systems, which have to operate continuously at high efficiency to ensure safety, keep energy consumption low, and prevent losses in consumables. There is no doubt that as mission durations and distances increase, managing complex ECLS systems in a safe and reliable manner using current technology and methodologies will become very hard.

Figure 1: A proposed ISHM architecture for life support systems.

Integrated Systems Health Management (ISHM) refers to a collection of techniques that provide the functionality for maintaining system health and performance over the life of a system. This is illustrated in Figure 1. ISHM requires an integrated approach to monitoring, control, fault diagnosis, adaptation, and maintenance. In all systems that operate for long periods, components are bound to suffer degradations in performance, and sometimes fail. Therefore, for autonomous and human-in-the-loop systems, ISHM schemes must have the ability to detect these degradations from deviations in system behavior, analyze performance and resource usage, and use this information to determine when maintenance is necessary to preserve system functionality and minimize downtime. The two interacting loops in Figure 1 illustrate this concept. The lower loop includes the traditional monitoring, diagnosis, and feedback control systems.
The introduction of a supervisory controller in this loop enables a choice between two responses: autonomous fault-adaptive control approaches that mitigate or compensate for the effects of degradations and faults, and activation of the second loop, where monitors inform the human operators or crew about the status of the system as a whole. The humans then decide when to schedule maintenance operations and, in some cases, alter the goals of the mission because the loss of functionality and/or resources will not allow the previous goals to be accomplished safely.

Comprehensive ISHM must involve multiple interacting control strategies. At the fast time scales, robust control can be employed to make the functionality of the system independent of disturbances to the system [32]. At this level, the degradations and fault magnitudes are small, and a robust controller can compensate for discrepancies without noticeable changes in system behavior. The field of robust control is well developed, and the capabilities and limitations of these approaches are well understood [32]. At the next level, Fault Adaptive Control (FAC) [9] goes beyond disturbance handling by changing the system control strategy to adapt the system structure and/or functionality to mitigate the fault effects. Examples of the FAC approach are:

1. model-predictive control techniques, where diagnosis schemes are applied to compute parameter value changes due to faults and degradations, and update the system models used for online control [1]; and
2. supervisory schemes for reconfiguring system structure to nullify fault effects [11].

ISHM for ECLS systems, but especially for ALS systems, poses several significant and unique issues that include:

• Interacting subsystems: Life support systems contain many interacting subsystems. As described in the next section, air, water, food, and waste systems all generate and consume shared resources, such as electrical power, water, and oxygen.
Life support systems also cover a variety of domains, from physiological and biological processes to physical processes that include fluid, mechanical, thermal, electrical, and pneumatic systems.
• Sensing: Sensing for life support systems is particularly challenging because of the wide variety of sensors required. Moreover, current state-of-the-art sensors in these domains often produce noisy output []. Biological elements, including humans, are difficult to monitor. In-line sensing of water and air quality is difficult, and the current state of the art often requires that samples be analyzed offline in a laboratory to get reliable estimates. In addition, the different subsystems operate at very different rates, so monitoring and data analysis schemes have to accommodate multi-rate analysis.
• Decision-making: There are several decision-making loops in integrated health management for life support systems (see Figure 1). Short-term loops operate in continuous time and handle regulation and feedback control of the life support processes. Intermediate loops respond to events (typically fault events) and deal with fault-adaptive control and reconfiguration decisions. The long-duration upper loop usually includes humans as decision makers, aided by software tools that monitor scarce, consumable resources and make duration-of-mission predictions about them. To build effective ISHM systems, these loops must be integrated.
• Human involvement: Humans are not only a part of the system in that they produce and consume life support resources, but they may also need to be a part of the decision-making process at all levels, i.e., in both the lower and upper loops in Figure 1. This places significant requirements on an integrated health management system, requirements that may not be necessary in other domains.

This paper presents an overview of integrated health management schemes for ECLS systems.
We begin by describing the unique attributes of an ECLS system. We do this by looking at the various modules that comprise an ECLS system and the connections between these modules. Next we discuss modeling of life support systems, which is at the core of our ISHM approach. Then we present a high-level integrated health management architecture for ECLS systems and describe some recent results using the architecture. Some of this work is futuristic; however, the diagnostic techniques that we present have already been applied to a number of real-life applications [19, 7]. We demonstrate the effectiveness of the diagnostic techniques by showing results we have obtained for degradation and fault analysis in aircraft fuel transfer systems. Finally, we look at future integrated health management needs for life support systems, including the Crew Exploration Vehicle (CEV) and lunar and Martian habitats.

1.1 Life support systems

Figure 2 shows a complete and connected ECLS system. Simpler ECLS systems for short missions will not have several of these components, and they may not operate in a completely connected closed loop form. For example, there may be no biomass or food processor module in a two-week manned mission of the CEV, only a food store with sufficient food to last the astronauts for the duration of the mission. The more complete, regenerative system shown in the figure is essential for long duration human missions, such as a trip to the surface of Mars, which would last 15-18 months and have limited resupply opportunities. Typically, most missions allocate thin margins for mass, energy, and buffers for each system of the spacecraft. This requires optimization during the design phase and tight control during operations to keep the systems within their desired range of behavior, i.e., providing the necessary output while ensuring resource consumption does not exceed pre-specified limits.
Since advanced life support systems have many interconnected subsystems, all of which share resources and interact in predictable and unpredictable ways, the design and control optimization tasks become quite difficult, especially if one has to consider the dynamic behavior of the system. In other work [1] we have shown that using dynamic models (as opposed to static models) during the design phase, and integrating controller and system design, leads to much smaller buffer sizes (and, therefore, a lower Equivalent Systems Mass) for the ECLS system.

Figure 2: The various subsystems and recovery systems (RS) and their connections that comprise an environmental control and life support system.

In this section, we briefly review the various interacting subsystems of an environmental control and life support system, using Figure 2 as a reference configuration. While each subsystem can be self-contained, subsystems also interact by sharing different resources. This section provides sufficient background for readers to become aware of issues that are unique to ISHM design of ECLS systems. For detailed documentation on advanced life support systems see [31] or go to: http://advlifesupport.jsc.nasa.gov.

One of the key issues to take into account is that the human crew are very much a part of the physical and biological processes that define the life support systems, and at the same time they play an important role in controlling the operations and managing the working of the system. When developing autonomy by automating the ISHM, control, and resource management functions, interesting issues arise in the interactions between the humans in the control loop and the autonomous systems. These issues are discussed in other papers [1, 5]. However, we do take into account biological models of the crew and crew activity models as part of the overall ECLS system.
1.1.1 Crew

The crew as a subsystem places demands on the life support system for various resources necessary to sustain life. Equations that capture human consumption models are available in many papers (e.g., [15]). These models are parameterized by the number, gender, age, and weight of the crew and their typical activity profiles for the particular mission. An integrated monitoring and control system for the ECLS system would track crew resource use (oxygen, water, food, etc.) over time. This would require a variety of sensors, such as those that monitor air flow in and out of the crew living and working chambers, its contents (percentage of oxygen and carbon dioxide, water vapor, and trace contaminants), and its pressure and temperature, while at the same time monitoring the stores of the consumable resources associated with the air, such as the amount of oxygen and the amount of chemical filters available for carbon dioxide removal. In a closed loop system, there will be additional subsystems that replenish the consumable resources, and these processes have to be monitored so that the control system can maintain the proper balance between consumption and replenishment, while ensuring that other resource constraints, such as available power, are not violated. A more advanced life support controller may take into account the biological crew models and their activities, and provide a tool for scheduling crew activities and accommodating crew requests in a way that resource constraints are not violated [5].

1.1.2 Water

The water recovery subsystem converts dirty and waste water into potable and grey water (i.e., water that can be used for washing but not drinking). An example water recovery system from a recent NASA JSC test consists of four subsystems that process the water [10]. The biological water processing (BWP) subsystem removes organic compounds.
Then the water passes to a reverse osmosis (RO) subsystem, which removes inorganic and particulate matter by pushing the water through a membrane. About 85% of the dirty water passing through the RO subsystem is converted into grey water. The remaining 15% of the water from the RO (called brine) is passed to the air evaporation subsystem (AES), which recovers the rest by an evaporation/condensation process. These two streams of grey water (from the RO and the AES) are combined and passed through a post-processing subsystem (PPS) to remove bacterial traces and generate potable water.

This system can run in various configurations. For example, the BWP can operate with the pump running at various speeds. The RO is more sophisticated in that it has four modes of operation: (i) a primary mode, where the water circulates on a longer path, (ii) a secondary mode, where the water path is shortened so it speeds up and pushes harder against the membrane, (iii) a purge mode, where the brine is transferred to the AES, and (iv) a clean mode, where the membrane is cleaned of particulate matter by creating a reverse flow through the membrane. The different modes of operation help the system maintain desired levels of throughput without exceeding energy consumption constraints. The water recovery system (WRS) has an external controller that can turn various subsystems on or off (e.g., if needed, all the water can be passed through the AES, but then the purification process has a high energy cost). In other work, we have designed sophisticated model-predictive controllers that have fault-adaptive properties, and they operate to maintain a tradeoff between energy consumption, throughput, and water quality.

1.1.3 Air

The air subsystem takes in exhaled carbon dioxide (CO2) and produces oxygen (O2) as long as sufficient energy is being provided to the system.
An example Air Revitalization System (ARS) from a test at NASA JSC [22] consists of three interacting air subsystems: the Carbon Dioxide Removal System (CRS), in which CO2 is removed from the air stream; the Carbon Dioxide Reduction Assembly (CDRA), which uses water to break CO2 down into methane (CH4) and water (H2O); and the Oxygen Generation System (OGS), in which O2 is added to the air stream by breaking water down into hydrogen and oxygen. It is important to note that both the removal of CO2 and the addition of O2 are required for human survival.

1.1.4 Biomass

The biomass subsystem is where crops are grown. It consumes water, energy (light), and CO2 and produces O2 and biomass, which can be turned into food. Models of crop growth and crop resource consumption can be found in [17]. This subsystem is optional for all but the longest missions, as food can be carried on board fairly cheaply if it is dehydrated. However, many mid-term missions could benefit from salad crops (lettuce, tomatoes, carrots) to provide the psychological benefits of eating fresh food. Crops can also be viewed as redundant air and water processors. When one considers the entire ALS configuration, one notes that the time constants involved in the biomass system differ greatly from those of the air and water systems.

1.1.5 Food processing

Before biomass can be consumed by the crew it must be converted to food. The inputs to the food processing subsystem are biomass, energy, and crew time, and the outputs are food and solid waste. Unless significant automation is provided, this is a labor-intensive process.

1.1.6 Waste

The waste subsystem consumes energy, O2, and solid waste and produces CO2. Some tests have used an incinerator to burn solid waste [28]. There are many other forms of waste recycling besides incineration that might be used. Most short duration missions will simply dispose of waste by leaving it behind or packaging it for return to Earth.
1.1.7 Power

Although it is not exclusively a part of the life support system, power is a common thread that enables all of the life support subsystems. It is also one of the factors that defines the interactions among these subsystems. An integrated monitoring and control system for ECLS will need to monitor and control power consumption both to stay within power budgets and to react to reductions in available power.

2 Modeling

We adopt a model-based approach to ISHM. The basic idea is that well-constructed models capture the relationships between the variables in the system, and between variables and components. These relationships form the basis for designing powerful approaches to diagnosis, fault-adaptive control, and prognosis in a common framework [6]. In its basic form, a model is an executable software representation of a system that can be used to simulate system behavior. A number of different modeling forms for physical processes have been proposed by design engineers, operations specialists, and maintenance personnel. Integrated health management systems for most complex systems, such as the ALS, require models for analysis of dynamic system behavior at different levels of detail. Attempting to build a comprehensive, detailed first-principles model of the system is very difficult and time consuming, and analysis with that kind of model will most likely be computationally intractable. Therefore, it is important to build models at the right level of detail to support the tasks for which they are to be used. For example, a model for diagnosis should have an explicit representation for the components that are under diagnostic scrutiny. Models for diagnosis and control also need to capture the dynamic behaviors that influence system functionality and performance.
On the other hand, tracking the interactions between subsystems and tracking system performance may be accomplished by higher level functional models that focus on interactions governed by material balance, energy transfer, and resource consumption; these models do not need to include details of individual component dynamics.

To develop ISHM applications for the ALS, we build two kinds of models: (i) those that model subsystem behavior by composing component dynamic behaviors using principles of energy transfer and energy conservation [18, 24], and (ii) higher level models that define subsystem interactions in terms of material and energy balance and resource consumption [21]. From another perspective, these models correspond to the ones that apply to the control loop and the ones that apply to the decision-making loop in Figure 1. Our approach to building models for diagnosis and control is to develop physics-based component models using the bond graph [18] and hybrid bond graph [24] formalisms for continuous and hybrid system behaviors, respectively. The decision-making loop models are resource-based and operate as discrete-time and discrete-event models using very coarse time scales [21]. We discuss our approach to the two modeling paradigms next.

2.1 Physics-based modeling

We develop our physics-based component models and compose them into subsystem and system models using well-defined component interfaces defined by our modeling environment toolset [19, 23, 30]. The toolset includes component-oriented model libraries of physical processes. Each component has well-designed interfaces to allow for construction of subsystem and system-level models by composition. The toolset also allows for designing sensor and actuator interfaces for plant models, and software-based controllers for managing plant behavior.
Modeling physical system dynamics is based on bond graphs, a methodology that captures multi-domain system dynamics in an integrated, homogeneous, energy-based compositional modeling framework [18]. The Hybrid Bond Graph (HBG) paradigm is an extension that allows discrete switching between modes of behavior to capture both continuous and discrete behaviors of a system [24]. The discrete changes may be attributed to control actions that turn system components on and off or change system parameter settings, and to autonomous changes that flip on-off switches when state variables of the system cross pre-specified threshold values. In an HBG, mode changes are implemented by switching bond graph junctions off and on using signals that are computed by parameterized decision functions. Nonlinear systems are modeled by components that have time-varying parameters, i.e., the parameter values are defined by modulation functions, whose arguments are again system variables. Parameters for both the decision and modulation functions can be system variables and external signals.

The FACT toolset includes translators that can generate simulation test beds for diagnosis and control applications [23, 4]. Convenient user interfaces allow the user to enter faults with pre-defined profiles at specific times into the simulation, and the fault data generated can be used for testing diagnosis and health management routines.

Figures 3 and 4 illustrate the component-oriented models for an air revitalization system (ARS) and a bio-regenerative water recovery system (WRS), respectively. The WRS model corresponds to the physical system test-bed described in [10]. The ARS model is a preliminary model of an advanced ARS that incorporates CO2 removal and reduction and O2 generation. The models capture the main subsystems as components, and provide interfaces (ports) for both actuator and sensor signals, and energy flows.
For example, the principal energy flows through the WRS are potable and waste water flows (grey water is not explicitly captured in this model). The component model of the reverse osmosis (RO) subsystem is illustrated in Figure 5. The physical system modeling technique and diagnosis experiments for this subsystem were reported previously [7]. Hierarchical refinement of the component models to the lowest level results in HBG model fragments that are drawn from a library of standard components (e.g., pumps, pipes, and valves) as well as custom components for this application domain. The HBG model of the membrane, a custom component of the RO subsystem, is illustrated in Figure 6. This lumped parameter model adjusts the flow resistance through the membrane based on the computed conductivity value of the water in the recirculation loop of the system.

Simulation tools are essential for developing the right models of complex systems. Through an iterative process, system behavior generated by simulating the models allows the designer to refine the models by comparing against actual system measurements, and then using parameter estimation techniques to improve model accuracy. The simulation environment provides added functionality in that it allows modelers to insert parametric faults into system components at user-specified times during system operation, with a chosen fault profile and fault magnitude. This provides a powerful tool for testing fault-adaptive performance of the system in a simulation environment.

The model interpreter constructs an abstract block diagram representation of the HBG model, and then synthesizes a MATLAB/Simulink representation for the hybrid model. A Graphical User Interface allows easy scenario construction. The simulation consists of two main components: (i) the Simulink block diagram, and (ii) a causality assignment (SCAP) algorithm.
These two components contain all the information described in the HBG as well as the Input/Output aspect information of the model. HBG components are implemented as a library of Simulink blocks. The Simulink model preserves the component-based hierarchy of the system model. The simulation models generated by the interpreter have formed the basis for running most of the ISHM studies described in this paper.

Figure 3: Component-oriented model of an Air Revitalization System. [Diagram title: ARS Concept model.]

Figure 4: Component-oriented model of a Water Recovery System. [Diagram title: BioRegenerative WRS with reject valve.]

Figure 5: Component-oriented model of the Reverse Osmosis system. [Diagram title: RO subsystem: Conductivity calculation decoupled from system.]

Figure 6: Component HBG model of the Reverse Osmosis system membrane.

2.2 Resource-based modeling

A more abstract view of a life support system looks upon its subsystems as consumers of some resources and producers of other resources and waste products. This includes the human elements of the system, who are modeled as biological systems that consume oxygen, water and food depending on their metabolism and the activities (see Section 1.1.1). Subsystems of the ALS are modeled as producers and consumers of resources.
The underlying technologies, the individual component dynamics and the particular configuration of valves, pumps, blowers, and other components within the subsystem is unimportant for these kinds of analyses, and, therefore, not included in the models. Over the last several years NASA JSC has been developing a resource-based model of life support systems called BioSim [20, 21]. BioSim consists of all of the life support components described in Section 1.1 and shown in Figure 2. Readers interested in experimenting with monitoring or controlling life support systems can obtain the simulation from http://www.traclabs.com/biosim. BioSim is a discrete event simulation with a variable time step that is currently set to one hour. In simulation, each time step is mapped onto a simulation “tick.” Each module has a local counter that advances that module’s state from t to t + 1, i.e., advances its state one hour in the default setting. While each module is run sequentially, data is cached so that all modules use data generated from the previous tick, which effectively makes all modules run in parallel. When modeling life support systems we need to consider nominal and off-nominal situations. BioSim models malfunctions in each module and has an application programmer’s interface (API) to introduce those malfunctions at any time in the simulation. Each module can have malfunctions of varying degrees of severity and temporal length. For simplicity, the malfunctions have been divided into two categories: length (permanent and temporary) and severity (low, medium and high). These malfunctions are interpreted differently by each module. For example, a temporary but severe malfunction in the potable water store would be a large water leak. A permanent but low severity malfunction in the power production module would be the loss of a small part of a solar array. BioSim also models stochastic processes. 
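The tick scheme described above, in which modules run sequentially but all read data cached from the previous tick, can be sketched in a few lines. This is only an illustration and not BioSim's code or API: the `Module` class, the `run_tick` function, the two step functions, and all rates and units are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # resource name -> amount (arbitrary units, assumed)


@dataclass
class Module:
    """A hypothetical resource producer/consumer (not a BioSim class)."""
    name: str
    # Maps the previous tick's shared state to this module's resource deltas.
    step: Callable[[State], State]


def run_tick(state: State, modules: List[Module]) -> State:
    """Advance every module by one tick using the cached previous state."""
    snapshot = dict(state)   # data generated during the previous tick
    new_state = dict(state)
    for module in modules:   # sequential execution...
        # ...but each module only ever reads the snapshot, never new_state,
        # so the result does not depend on the execution order.
        for resource, delta in module.step(snapshot).items():
            new_state[resource] = new_state.get(resource, 0.0) + delta
    return new_state


def crew_step(prev: State) -> State:
    """Toy crew model: turns up to 2.0 units of O2 into CO2 per tick."""
    breathed = min(prev["o2"], 2.0)
    return {"o2": -breathed, "co2": breathed}


def air_step(prev: State) -> State:
    """Toy air model: turns up to 1.5 units of CO2 back into O2 per tick."""
    removed = min(prev["co2"], 1.5)
    return {"co2": -removed, "o2": removed}


crew = Module("crew", crew_step)
air = Module("air", air_step)
start = {"o2": 10.0, "co2": 1.0}

forward = run_tick(start, [crew, air])
backward = run_tick(start, [air, crew])
print(forward, backward)
```

Because both modules read the same snapshot, running the crew before the air module gives the same state as running them in the opposite order, which is the sense in which the modules effectively run in parallel.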
Because real life support systems are not deterministic, neither is the simulation. For example, the exact amount of air that is breathed in by a crew member is different with every breath. This is modeled using a Gaussian function with adjustable parameters. The Gaussian noise can be set to zero to produce a deterministic simulation.

3 System architecture

An ISHM architecture for life support systems will require many interacting components. Figure 1 shows a potential architecture for an ISHM system for life support. Parts of this architecture have been implemented in various life support systems over the past ten years. We will describe each component of the architecture in turn and discuss experimental results.

3.1 Behavior monitors and diagnoser

Model-based approaches to fault detection, isolation and identification (FDII) encompass many different techniques that have been developed over the past decades [12]. Our focus in this area has been on physical system component faults, rather than sensor or actuator faults. These faults result in transient behavior in the system response, and analysis of the transient is at the core of the fault isolation algorithms [19, 26, 27]. Our approach to diagnosis explicitly separates the fault detection task from the fault isolation and identification tasks. A numeric observer is realized using an extended Kalman filter-based [14] state estimator [19]. Fault detection is realized through a sliding window hypothesis testing scheme in the time domain using a bank of Z-test detectors [8], and in the time-frequency domain [23] using an energy-based scheme. The energy-based scheme explicitly utilizes the properties of the energy in a fault transient response to design a statistical test that is tuned to trade sensitivity to faults against likely false alarms.
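As a rough illustration of sliding-window hypothesis testing on a single residual (the difference between a measurement and the observer's estimate), the sketch below applies a Z-test to the windowed residual mean. It is a minimal sketch under stated assumptions, not the detector of [8]: the function name, the window length, the threshold, and the assumption of a zero-mean nominal residual with known standard deviation `sigma` are all illustrative. A bank of detectors would run one such test per measured variable.

```python
import math
from collections import deque


def ztest_detector(residuals, sigma, window=20, z_threshold=3.0):
    """Return the index of the first sample at which a fault is flagged.

    Assumes the nominal residual is zero-mean with known standard
    deviation `sigma` (an assumption of this sketch). Returns None if
    the windowed mean never deviates significantly from zero.
    """
    buf = deque(maxlen=window)
    total = 0.0
    for k, r in enumerate(residuals):
        if len(buf) == window:
            total -= buf[0]          # oldest sample is about to be evicted
        buf.append(r)
        total += r
        if len(buf) == window:
            mean = total / window
            # Standard error of the mean of `window` independent samples.
            z = mean / (sigma / math.sqrt(window))
            if abs(z) > z_threshold:
                return k
    return None


# Idealized residual: nominal for 50 samples, then a step change of 1.0.
residuals = [0.0] * 50 + [1.0] * 50
print(ztest_detector(residuals, sigma=1.0))
print(ztest_detector([0.0] * 100, sigma=1.0))
```

With these settings the step that starts at sample 50 is flagged at sample 63, once 14 of the 20 samples in the window are faulty. A longer window or a higher threshold lowers the false alarm rate at the cost of a longer detection delay, which is the tradeoff any such detector has to tune.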
For faults that do not manifest with distinctive transient behaviors, i.e., incipient or degradation faults, our work exploits the results in the literature on change detection to design fault detection filters that are based on likelihood ratio derived techniques [16].

Fault isolation and identification is implemented as a two-stage process. The first stage uses a qualitative fault isolation engine that operates on a symbolic transformation of the residual [25, 26]. This generates a potential candidate list and fault signatures that predict measurement fault dynamics after the fault occurrence. As time progresses and additional measurement deviations are observed, the fault isolation scheme removes spurious candidates from that initial candidate set. Qualitative symbolic analysis is fast, but the loss of information in the transformation can result in multiple candidates. At an appropriate time, the system switches from fault isolation to fault identification [8, 26]. Fault identification uses a search method to perform quantitative parameter estimation with multiple candidate hypotheses. Once reliable estimates are obtained, a minimum square error technique is employed to determine the unique candidate and its estimated parameter value [8]. The fault isolation and identification scheme, initially developed for continuous systems, has been extended to diagnosis of hybrid systems [26].

We illustrate our fault diagnosis scheme with two realistic applications: (i) detection, isolation, and identification of faults and degradation in fuel transfer systems of fighter aircraft (an aerospace application), and (ii) detection, isolation, and identification of faults in the RO system of the water system described in Section 1.1.2 [7]. The two projects used actual data provided by Boeing and a NASA JSC RO system test, respectively.
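The two-stage process described in this section can be illustrated with a small Python sketch. It is a simplified illustration and not the engine of [25, 26]: signatures are reduced to a single symbol per measurement, the fault names and signature values are invented, and the quantitative stage simply compares already-computed predictions instead of performing parameter estimation.

```python
from typing import Dict, List, Sequence, Tuple

Signature = Dict[str, str]  # measurement name -> predicted deviation symbol


def prune_candidates(signatures: Dict[str, Signature],
                     observations: Sequence[Tuple[str, str]]) -> List[str]:
    """Stage 1: drop every candidate whose predicted symbol contradicts an
    observed deviation. Candidates with no prediction for a measurement
    are kept."""
    remaining = dict(signatures)
    for measurement, symbol in observations:
        remaining = {
            name: sig for name, sig in remaining.items()
            if sig.get(measurement, symbol) == symbol
        }
    return sorted(remaining)


def select_by_squared_error(measured: Sequence[float],
                            predictions: Dict[str, Sequence[float]]) -> str:
    """Stage 2: pick the candidate whose predicted output has the smallest
    sum of squared errors against the measured output."""
    def sse(name: str) -> float:
        return sum((m - p) ** 2 for m, p in zip(measured, predictions[name]))
    return min(predictions, key=sse)


# Invented signatures for three hypothetical fault candidates.
signatures = {
    "pump_efficiency_drop": {"flow": "-", "pressure": "-"},
    "pipe_block": {"flow": "-", "pressure": "+"},
    "membrane_clog": {"flow": "-", "pressure": "+"},
}
after_first = prune_candidates(signatures, [("flow", "-")])
after_second = prune_candidates(signatures, [("flow", "-"), ("pressure", "+")])
best = select_by_squared_error(
    [1.0, 0.8, 0.7],
    {"pipe_block": [1.0, 0.9, 0.85], "membrane_clog": [1.0, 0.8, 0.72]},
)
print(after_first, after_second, best)
```

In this toy case the first deviation is consistent with all three candidates, the second removes one of them, and the two that remain can only be separated quantitatively, which mirrors the reason given above for switching from isolation to identification.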
3.1.1 Diagnosis of component faults in the Fuel Transfer System

The generic fuel transfer system for fighter aircraft is illustrated in Figure 7. The system is designed to provide an uninterrupted supply of fuel at a constant rate to the aircraft engines while maintaining the center of gravity of the aircraft. The system is symmetrically divided into left and right parts (top and bottom in the schematic). The four supply tanks (Left Wing (LWT), Right Wing (RWT), Left Transfer (LTT), and Right Transfer (RTT)) are full initially, and so are the two receiving tanks (Left Feed (LFT) and Right Feed (RFT)) that directly feed the engines. During engine operation, fuel is transferred from the supply tanks through a common manifold to the two feed tanks in a sequence determined by the fuel system controller. The controller generates on/off signals for the pumps in the supply tanks and the valves in the pipes to achieve different flow configurations.

Figure 7: Fuel Transfer System of fighter aircraft. [Schematic titled "Typical Fuel System Configuration": left and right wing, transfer, and feed tanks with transfer pumps (P), boost pumps (BP), level control valves (LV), interconnect valves (IV), flow meters (FM), and fuel quantity sensors, feeding the left and right engines.]

Table 1 illustrates the results of a set of diagnosis experiments that we ran for a set of faults using the HBG scheme. In the experiments, we varied the fault size and amount of measurement noise in the signal. In designing the experiments, we had to set parameters for the Kalman filter, fault detector, and symbol generator. A high fidelity simulator (from Boeing PhantomWorks in St. Louis, MO) was used to generate the data for the experimental runs, and measurement noise was added to the simulated data. Ten runs were conducted for each noise level and fault size, and the mean values of the detection and isolation times, the candidates generated by qualitative fault isolation, and the parameter value error after least squares estimation are reported in the table.

Table 1: Fuel System Experiments with different fault magnitudes and noise levels. The 2% and 3% columns are the measurement noise levels.

                                       Detection time (s)   Isolation time (s)   Final candidate set   Estimation error (%)
Fault type                Fault size   2%        3%         2%        3%         2%        3%          2%        3%
LTT-Pump Efficiency Drop  33%          422       555        225       398        3         4           2.19      5.43
                          60%          182       183        144       240        4         4           1.28      1.79
                          80%          134       134        124       197        4         5           0.88      1.49
RWT-Pump Efficiency Drop  33%          117       285        170       211        4         4           2.15      6.11
                          60%          83        93         139       183        4         4           1.52      1.67
                          80%          5         5          55        106        3         4           0.68      0.68
RLCV Valve Block          × 1.5        63        65         97        103        2         2           0.62      0.5
                          × 1.75       51        58         86        398        2         1           0.28      0.46
                          × 2.0        51        52         46        79         1         2           0.2       0.2
Leg 21 Pipe Block         × 1.5        99        100        136       150        3         3           1.58      1.65
                          × 1.75       95        95         90        303        2         3           0.78      1.57
                          × 2.0        93        93         76        202        2         2           0.19      0.34

Note that the results of the qualitative fault isolation are ambiguous and they produce multiple fault candidates. The quantitative parameter estimation procedure reduces the fault hypotheses to a single candidate, and also estimates the magnitude of change in the parameter as a result of the fault or degradation in the particular component. The results indicate that as the noise levels in the measurements increase and the fault magnitudes become smaller, the times to detection, isolation, and identification (i.e., parameter estimation) increase, and the parameter estimation error increases. Further details of the fuel system models and diagnosis experiments can be found in [26].

3.1.2 Diagnosis of component faults in the Reverse Osmosis system

Similar health monitoring studies were conducted on the Reverse Osmosis (RO) subsystem mentioned earlier. The WRS and RO system models illustrated in Figure 4 and Figure 5 capture the physical flow through the system.
Input water from the previous subsystem, a Biological Waste Processor (BWP), is pushed at high pressure through the membrane. Clean water (permeate) leaves the system, and the remaining water (with a larger concentration of brine) is recirculated in a feedback loop. As a result, the concentration of impurities in the recirculating water increases with time. The system cycles through three operating modes, which are set by the 4-way multi-position valve. The feed pump, which is on in all modes, pulls effluent from the BWP and creates a flow into the system through a coiled pipe, which acts as a tubular reservoir. In the primary mode (valve setting 1), the input flow is mixed with the water in the primary recirculation loop. The recirculation pump boosts the liquid pressure as it flows into the membrane. The flow through the membrane causes dirt to accumulate in it, which increases its resistance to flow and thus causes the outflow from the system to decrease with time. At a predetermined fluid pressure value at the membrane, the system switches to the secondary mode (valve setting 2), and the recirculating fluid is routed back to the membrane in a smaller secondary loop. This causes the liquid velocity (and, therefore, flow rate) to increase, and as a result the outflow from the system does not keep decreasing as sharply as it does in the primary loop. As clean water leaves the system, the concentration of brine in the residual water in the RO loop keeps increasing. At some point the increasing concentration plus the collection of impurities in the membrane decreases the output flow significantly, and again at a predetermined pressure value the RO switches to the purge mode (valve setting 3), where the recirculation pump is turned off, and concentrated brine is pushed out to the next subsystem, the Air Evaporation System (AES). Following the purge operation, the system goes back to primary mode.
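The mode cycle described above can be written down as a small state machine. The Python sketch below is illustrative only: the function and parameter names are hypothetical, the threshold values in the example are invented, and it assumes that a mode switch is triggered when the membrane pressure reaches or exceeds the predetermined value, a direction the text does not state explicitly.

```python
from enum import Enum


class Mode(Enum):
    PRIMARY = 1    # valve setting 1: primary recirculation loop
    SECONDARY = 2  # valve setting 2: smaller secondary loop
    PURGE = 3      # valve setting 3: brine pushed out to the AES


def next_mode(mode: Mode, membrane_pressure: float, purge_done: bool,
              primary_limit: float, secondary_limit: float) -> Mode:
    """Return the next operating mode of the RO cycle.

    Assumes (for this sketch) that a switch happens when the membrane
    pressure reaches or exceeds the predetermined limit for the mode.
    """
    if mode is Mode.PRIMARY and membrane_pressure >= primary_limit:
        return Mode.SECONDARY
    if mode is Mode.SECONDARY and membrane_pressure >= secondary_limit:
        return Mode.PURGE
    if mode is Mode.PURGE and purge_done:
        return Mode.PRIMARY
    return mode


def recirculation_pump_on(mode: Mode) -> bool:
    # The feed pump runs in all modes; the recirculation pump is off in purge.
    return mode is not Mode.PURGE


# Walk once around the cycle with invented pressure limits.
limits = dict(primary_limit=100.0, secondary_limit=120.0)
m1 = next_mode(Mode.PRIMARY, 101.0, False, **limits)
m2 = next_mode(m1, 125.0, False, **limits)
m3 = next_mode(m2, 0.0, True, **limits)
print(m1, m2, m3)
```

Representing the valve setting as the discrete state makes the hybrid nature of the RO system explicit: the continuous dynamics differ in each mode, and the transitions are guarded by measured pressure.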
For the health monitoring experiments, we used five of the measurements (see Figure 5): (i) the pressure immediately after the recirculation pump, $P_{pump}$, (ii) the pressure of the permeate at the membrane, $P_{memb}$, (iii) the pressure of the liquid in the return path of the recirculation loop, $P_{back}$, (iv) the flow rate of the effluent, $F_{perm}$, and (v) the conductivity of the liquid in the return path of the recirculation loop, $K$. Simulation experiments were run on a number of fault scenarios. Empirical information on sensor noise was not available, so we simulated measurement noise as Gaussian white noise with a noise power level set at 2% of the average signal power for each measurement. Fault scenarios were created that correspond to abrupt faults in the pump (loss of efficiency and increased friction in the bearings), membrane (clogging), and the connecting pipes (blocks). Table 2 presents the comprehensive results for selected faults in the RO system. For each scenario, the qualitative fault isolation scheme reduces the initial candidate set considerably, and parameter estimation converges to the correct fault candidate. The estimated parameter values were also quite acceptable for all scenarios. This demonstrated the effectiveness of the health monitoring, fault isolation, and fault identification methodology.

3.2 Fault-adaptive controller

The Fault Adaptive Control scheme is designed as a hierarchical limited look-ahead control scheme [2, 3], where the overall control scheme tries to satisfy given specifications (e.g., throughput for the WRS system) by continuously monitoring the system state and selecting input from a finite control set that will best meet the given specifications. In addition, the controller is required to keep the system stable within the domain that satisfies the specifications. In this setting, the controller is simply an agent that generates a sequence of events to achieve a given objective.
This objective is typically expressed as a multi-attribute utility function that takes the form $\sum_i V_i(P_i)$, where each $V_i$ corresponds to a value function associated with performance parameter $P_i$. The parameters $P_i$ can be continuous or discrete-valued, and they are derived from the system state variables, $x(t)$, i.e., $P_i(t) = p_i(x(t))$. The value functions employed have been simple weighted functions of the form $V_i(P_i) = w_i \cdot P_i$, where the weights $w_i \in [-1, 1]$ represent the importance of the parameter in the overall operation of the system. For example, the utility function for the RO system is given by

$$V(k) = \sum_{i=k}^{k+N} a_K \left( \frac{K(i)}{K_{MAX}} \right) + a_f \left( \frac{f(i)}{f_{MAX}} \right) + a_{SV} \cdot SV + a_P \left( \frac{P(i)}{P_{MAX}} \right),$$

where $K(i)$ represents the conductivity of the water in the RO loop at time step $i$ (conductivity is a measure of the concentration of brine in the water), $f(i)$ represents the flow rate of clean water out of the membrane, $SV$ is a measure of the cumulative number of valve switches that occur in the RO, and $P(i)$ is a measure of the power consumed by the RO subsystem. This utility function trades off power consumed and switching on the one hand against the conductivity (dirtiness) of the water in the RO and the outflow rate of clean water from the RO on the other. The relative weights and sign of the contribution to the utility function are determined by the magnitude and sign of the coefficient weights $a_K$, $a_f$, $a_{SV}$, and $a_P$. For the RO, $a_{SV}$ and $a_P$ are negative, whereas $a_K$ and $a_f$ are assigned positive values. During operation the weights could be adjusted to handle situations ranging from power restrictions to a requirement for high outflow. The utility-based controller also helps maintain desired performance under degraded and faulty conditions.

Table 2: RO diagnosis results for selected faults. Each row gives the time since fault occurrence (t − t_f), the isolation step, the symbolic form of the observed measurement deviation, and the remaining candidate set. Superscripts on candidates give the hypothesized direction of parameter change and are shown only where the source attaches them unambiguously.

Fault R_memb^+, 5%, t_f: 20000
  t − t_f  Step  Symbolic               Candidate set
  800      0     F_perm (f25): (−, ·)   C_c^+, C_memb, I_fp^+, I_ep, R_brine, TF^+, R_pipe, R_memb, R_fp^+, R_ep, GY^−
  7200     1     P_back (e1): (+, ·)    I_fp^−, TF^+, R_pipe, R_memb, R_fp^+, R_ep
  8280     2     P_memb (e16): (+, ·)   R_pipe, R_memb, R_ep
  Parameter estimation selects R_memb^+, indicates change by 1.042

Fault GY^−, 5%, t_f: 17500
  t − t_f  Step  Symbolic               Candidate set
  200      0     P_pump (e37): (−, ·)   C_c^+, C_memb, I_fp^+, I_ep, R_brine, TF^+, R_pipe, R_memb, C_k^+, R_fp^+, R_ep, GY^−
  880      1     F_perm (f25): (−, ·)   I_fp^+, I_ep, R_brine, TF^+, R_fp^+, R_ep, GY^−
  1240     2     P_back (e1): (−, ·)    I_ep, R_brine, R_ep, GY^−
  1960     3     K (e35): (−, ·)        I_ep, R_ep, GY^−
  Parameter estimation: GY^− changed by 0.934

Fault R_ep^+, 35%, t_f: 20000
  t − t_f  Step  Symbolic               Candidate set
  88       0     P_pump (e37): (−, ·)   C_c^+, C_memb, I_fp^+, I_ep, R_brine, TF^+, R_pipe, R_memb, C_k^+, R_fp^+, R_ep, GY^−
  640      1     F_perm (f25): (−, ·)   I_fp^+, I_ep, R_brine, TF^+, R_fp^+, R_ep, GY^−
  720      2     P_back (e1): (−, ·)    I_ep, R_brine, R_ep, GY^−
  960      3     P_pump (e37): (−, −)   R_brine, R_ep
  4640     4     K (e35): (−, ·)        R_ep
  Parameter estimation: R_ep^+ changed by 0.374

Fault R_pipe^+, 15%, t_f: 18000
  t − t_f  Step  Symbolic               Candidate set
  640      0     P_back (e1): (−, ·)    C_c^−, C_memb, I_fp^−, I_ep, R_brine, TF^+, R_pipe, R_memb, C_k^+, R_fp^−, R_ep, GY^+
  800      1     P_memb (e16): (−, ·)   R_brine, TF^+, R_pipe, R_ep
  Parameter estimation: R_pipe^+ changed by 1.134

Fault C_memb^−, 10%, t_f: 19600
  t − t_f  Step  Symbolic               Candidate set
  360      0     F_perm (f25): (−, ·)   C_c^−, C_memb, I_fp^−, I_ep, R_brine, TF^+, R_pipe, R_memb, C_k^+, R_fp^−, R_ep, GY^+
  480      1     P_back (e1): (−, ·)    C_memb, R_brine, TF^+, GY^+
  8680     2     P_memb (e16): (−, ·)   C_memb, R_brine, GY^+
  Parameter estimation: C_memb^− changed by 0.856

A set of simulation experiments were conducted to illustrate multi-level fault adaptive control of the RO system. Figure 8 shows the behavior of the system under online control in the presence of a fault. A block in a pipe (resulting in a 35% increase in its resistance) was introduced at time t = 400 sec and was isolated at time t = 430 sec using the model-based FDI scheme described in the previous section.
The online controller managed to compensate for the fault by increasing the time spent in the primary loop of RO operation. The overall average utility in this case was only 0.93% less than the utility in the non-faulty situation. In Figure 8 the original system output (no failure) is shown as a dotted line for comparison.

Figure 8: System Performance with Utility-based controller under fault conditions.

3.3 Supervisory controller

The supervisory controller extends hybrid and fault adaptive control of individual subsystems to the control of interacting distributed subsystems that operate under resource constraints [1]. The supervisory controller uses a plug and play computational architecture, where the higher-level global controller acts more as a resource manager and scheduler, whereas the lower level controllers can take on one of two forms: (i) a model-predictive controller based on decision theoretic utility functions as described in Section 3.2, and (ii) a procedure-based execution component. The global controller reasons over resources, system configurations, and desired system output. The procedure-based execution component executes pre-defined scripts that place the system into different modes, respond to specific events, and allow for direct human interaction. We discuss each of these in turn.

3.3.1 Supervisory Controller

Since a detailed behavioral model of the underlying distributed system may be very complex, reasoning at this level uses an abstract (simplified) model to describe the composite behavior of the system components that is relevant to the overall requirements and operational constraints. The abstract model uses a set of global variables that are related by the input-output interactions between the individual systems. Moreover, the global controller’s decisions are based on aggregate behaviors, which are determined over longer time frames compared to the individual systems.
The global model is represented by $y(k+1) = g(y(k), v(k), \mu(k))$, where $y(k)$ is the global state vector, $v(k) \in V$ is the global control input, $V$ is the set of global control inputs, each of which represents a set of local control settings for the local modules, and $\mu(k)$ are the global environmental inputs. The map $g$ defines how the global state variables respond to relevant changes in environment inputs with respect to the global control inputs. The objective of the model-based reasoner is to minimize a given cost function over the operation span of the system. We assume here also that the cost function takes the form of the set point specification. The global specifications are communicated to the procedure-based executor for implementation.

3.3.2 Procedure-based execution

Procedures are standardized methods for operating a system. They are pre-defined by system engineers. They typically involve a sequence of commands given to the system to move it from one configuration to another. They can be initiated by automation or by a human. In previous life support applications we have used the Reactive Action Packages (RAP) system [13] for procedure representation and procedure execution [10]. Each procedure in the RAP system consists of a set of preconditions (conditions that must be true before the procedure can be executed), a set of commands to be executed, and a set of succeed conditions (conditions that are true after executing the procedure). The set of commands can be ordered in various ways (e.g., parallel, sequential) and controlled via timing relationships between the steps. Procedures cannot be created on-the-fly but are all pre-defined and available for execution. Automation or humans can request that a procedure be executed. In addition, procedures can be triggered automatically by specific external events.

Figure 9: Drawing of the proposed Northrop Grumman/Boeing CEV.

3.4 Resource monitors

Resources are vital to the success of any space mission and to life support systems specifically.
The ability to manage resources directly affects the mass of a space vehicle, which directly affects its cost. For life support systems, resources include gases (such as oxygen, nitrogen and carbon dioxide), water, food, waste (liquid and solid), power, storage tanks and any spare parts such as filters. Resource monitors are responsible for predicting the need for a particular resource over the length of the mission and for allocating and optimizing resource usage. Resource monitors provide an absolute constraint on the supervisory controller described above.

3.5 Planner and scheduler

Life support activities, including crew activities that impact life support systems such as exercise, need to be scheduled so as to balance system and crew activities. In current space mission operations this is primarily a manual process done by ground controllers. In ground tests we have begun experimenting with automated planning and scheduling of life support activities. For example, in a space habitat test in 1998 an automated planner was used to schedule solid waste incineration [29].

4 Future NASA life support applications

NASA is embarking on a new exploration vision that will take it to the Moon and beyond. A new set of vehicles and spacecraft is currently being designed to achieve this mission. Each vehicle or spacecraft will require different kinds of life support systems and, therefore, different kinds of health management systems for life support.

4.1 Crew exploration vehicle

The Crew Exploration Vehicle (CEV) will be NASA’s successor to the Space Shuttle and will carry humans to the International Space Station by 2012 and back to the Moon by 2018. Because it will primarily be used for short-duration flights it will not have complex, regenerative life support systems. However, there will still be a need for integrated system health management for both the CEV and the ECLS system of the CEV.
Most of this will focus on the air subsystem, i.e., those components that create oxygen, remove carbon dioxide and detect trace contaminants. System health management for CEV will encompass more than just fault detection. It will need to be proactive in allocating resources (especially power), scheduling ECLS activities and assessing the life support system’s state and capabilities. The current NASA lunar exploration architecture states that the CEV will be uncrewed in lunar orbit while astronauts explore the lunar surface. Some scenarios envision uncrewed operation for nearly six months. Such uncrewed activities will pose significant system health management requirements: the crew on the surface needs to know that they are returning to a habitable spacecraft. The life support systems will either need to shut down and be restarted or will need to operate during the uncrewed periods. These systems will need to be checked out or restarted before the crew returns.

4.2 Lunar habitats

A long-term lunar habitat will require significantly more complex life support systems because of the cost of resupplying resources. In particular, regenerative life support systems will be required, especially for air and water. Such life support systems will need even more complicated and integrated system health managers. Planning and scheduling will become more prominent with long-duration missions. Resource monitoring and management will extend mission life at lower costs.

4.3 Mars habitats

Mars habitats will require significant regeneration of resources, possibly including food. Because of the significant time delays these life support systems will have to be almost entirely autonomous. Adding crops into a life support system adds redundancy (crops can produce oxygen, consume carbon dioxide and clean water) in addition to providing food. However, being entirely biological, crops pose significant problems to integrated health management.
They are difficult to model and almost impossible to control. Crop planting and harvesting must be planned and scheduled and is driven by a variety of constraints.

5 Conclusions

Integrated health management for life support systems poses several interesting challenges, mostly because of the human’s impact on the life support system. In most other vehicle systems (propulsion, guidance, navigation and control, power, etc.) the human impact is minimal. In life support systems the human impact is substantial. Humans are producers and consumers of life support system resources. This leads to modeling challenges, human-interaction challenges and control challenges. In this paper, we have outlined a potential approach to building an integrated health management system for life support systems for long-duration missions. Pieces of this approach have already been tested in simulation and in hardware tests. We have briefly described some of our previous work in applying diagnosis and fault-adaptive control techniques to aircraft and ALSS subsystems. For NASA to realize its human exploration vision, additional development and testing of health management for life support systems needs to be done. For economic and practical reasons, the ISHM design should be incorporated into the early design phase of the CEV and future mission spacecraft systems.

References

[1] S. Abdelwahed, J. Wu, G. Biswas, and E.-J. Manders. Hierarchical online control design for autonomous resource management in advanced life support systems. In Proc. of the 35th Intl. Conf. on Environmental Systems, Rome, Italy, July 2005.
[2] S. Abdelwahed, J. Wu, G. Biswas, J. Ramirez, and E.-J. Manders. On-line hierarchical fault adaptive control for advanced life support systems. In Proc. of the 34th Intl. Conf. on Environmental Systems, Colorado Springs, CO, July 2004.
[3] S. Abdelwahed, J. Wu, G. Biswas, J. Ramirez, and E.-J. Manders. Online adaptive control for effective resource management in advanced life support systems. Habitation: An International Journal for Human Support Research, 10(2):105–115, February 2005.
[4] C. D. Beers, E.-J. Manders, G. Biswas, and P. J. Mosterman. Building efficient simulations from hybrid bond graph models. In IFAC Conference on Analysis and Design of Hybrid Systems, Alghero, Italy, June 2006. To appear.
[5] G. Biswas, P. Bonasso, S. Abdelwahed, E.-J. Manders, J. Wu, D. Kortenkamp, and S. Bell. Requirements for an autonomous control architecture for advanced life support systems. In Proc. of the 35th Intl. Conf. on Environmental Systems, Rome, Italy, July 2005.
[6] G. Biswas and E.-J. Manders. Integrated systems health management to achieve autonomy in complex systems. In Proc. of 6th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes, Beijing, PR China, August 2006. To appear.
[7] G. Biswas, E.-J. Manders, J. Ramirez, N. Mahadevan, and S. Abdelwahed. Online model-based diagnosis to support autonomous operation of an advanced life support system. Habitation: An International Journal for Human Support Research, 10(1):21–38, January 2004.
[8] G. Biswas, G. Simon, N. Mahadevan, S. Narasimhan, J. Ramirez, and G. Karsai. A robust method for hybrid diagnosis of complex systems. In Proc. of 5th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes, pages 1125–1131, Washington, DC, June 2003.
[9] M. Blanke, R. Izadi-Zamanabadi, S. A. Bogh, and C. P. Lunau. Fault-tolerant control systems - a holistic view. Control Engineering Practice, 5(5):693–702, 1997.
[10] R. Peter Bonasso, David Kortenkamp, and Carroll Thronesbery. Intelligent control of a water recovery system. AI Magazine, 24(1), 2003.
[11] C. G. Cassandras and S. Lafortune. Introduction to Discrete Event Systems. Kluwer, Norwell, MA, 1999.
[12] J. Chen and R. J. Patton. Robust Model-Based Fault Diagnosis for Dynamic Systems. Kluwer Academic Publishers, Boston, MA, USA, 1998.
[13] R. James Firby. An investigation into reactive planning in complex domains. In Proceedings of the National Conference on Artificial Intelligence (AAAI), 1987.
[14] A. Gelb. Applied Optimal Estimation. MIT Press, Cambridge, MA, 1994.
[15] Sara Goudarzi and K. C. Ting. Top-level modeling of crew component of ALSS. In Proceedings International Conference on Environmental Systems, 1999.
[16] F. Gustafsson. Adaptive Filtering and Change Detection. John Wiley & Sons, Ltd, United Kingdom, 2000.
[17] Harry Jones and James Cavazzoni. Top-level crop models for advanced life support analysis. In Proceedings International Conference on Environmental Systems, SAE paper 2000-01-2261, 2000.
[18] D. C. Karnopp, D. L. Margolis, and R. C. Rosenberg. Systems Dynamics: Modeling and Simulation of Mechatronic Systems. John Wiley & Sons, Inc., New York, third edition, 2000.
[19] G. Karsai, G. Biswas, T. Pasternak, S. Narasimhan, G. Peceli, G. Simon, and T. Kovacshazy. Towards fault-adaptive control of complex dynamical systems. In T. Samad and G. Balas, editors, Software-Enabled Control: Information Technology for Dynamical Systems, chapter 17, pages 347–368. Wiley-IEEE Press, Piscataway, NJ, 2003.
[20] David Kortenkamp and Scott Bell. Simulating advanced life support systems for integrated controls research. In Proceedings International Conference on Environmental Systems, 2003.
[21] David Kortenkamp, Scott Bell, and Luis Rodriguez. Simulating lunar habitats and activities to derive system requirements. In Proceedings 1st AIAA Space Exploration Conference, 2005.
[22] Jane Malin, Joseph Nieten, Debra Schreckenghost, Matt MacMahon, Jeffrey Graham, Carroll Thronesbery, R. Peter Bonasso, Jeffrey Kowing, and Land Fleming. Multi-agent diagnosis and control of an air revitalization system for life support in space. In Proceedings of the IEEE Aerospace Conference, 2000.
[23] E.-J. Manders, G. Biswas, J. Ramirez, N. Mahadevan, J. Wu, and S. Abdelwahed. A model-integrated computing tool-suite for fault adaptive control. In Working Papers of the Fifteenth Intl. Workshop on Principles of Diagnosis, Carcassonne, France, June 2004.
[24] P. J. Mosterman and G. Biswas. A theory of discontinuities in physical system models. Journal of the Franklin Institute, 335B(3):401–439, 1998.
[25] P. J. Mosterman and G. Biswas. Diagnosis of continuous valued systems in transient operating regions. IEEE Trans. on Systems, Man and Cybernetics, Part A, 29(6):554–565, 1999.
[26] S. Narasimhan and G. Biswas. Model based diagnosis of hybrid systems. IEEE Trans. on Systems, Man and Cybernetics, Part B, 2006.
[27] I. Roychoudhury, G. Biswas, X. Koutsoukos, and S. Abdelwahed. Distributed diagnosis. In Working Papers of the Fifteenth Intl. Workshop on Principles of Diagnosis, Monterey, CA, June 2005.
[28] Debra Schreckenghost, Cheryl Martin, Pete Bonasso, David Kortenkamp, Tod Milam, and Carroll Thronesbery. Supporting group interaction among humans and autonomous agents. In AAAI 2002 Workshop on Autonomy, Delegation, and Control: From Inter-Agent to Groups, 2002.
[29] Debra Schreckenghost, Daniel Ryan, Carroll Thronesbery, R. Peter Bonasso, and Daniel Poirot. Intelligent control of life support systems for space habitats. In Proceedings of the Conference on Innovative Applications of Artificial Intelligence, 1998.
[30] G. Simon, G. Karsai, G. Biswas, S. Abdelwahed, N. Mahadevan, T. Szemethy, G. Peceli, and T. Kovacshazy. Model-based fault adaptive control of complex dynamic systems. In Proc. of the 20th IEEE Instrumentation and Measurement Technology Conf., Vail, CO, May 2003.
[31] T. O. Tri. Bioregenerative planetary life support systems test complex (BIO-Plex): Test mission objectives and facility development. In Proc. of the 29th Intl. Conf. on Environmental Systems, 1999.
[32] K. Zhou and J. Doyle. Essentials of Robust Control. Prentice Hall, Inc., 1998.