Service-Oriented Architecture on the Grid for FDI Integration

X. Ren*, M. Ong*, G. Allan*, V. Kadirkamanathan*, H. A. Thompson*† and P. J. Fleming*†

† Rolls-Royce University Technology Centre in Control and Systems Engineering
* Department of Automatic Control and Systems Engineering
University of Sheffield, Sheffield S1 3JD, UK
Email: {X.Ren, M.Ong, Jeff.Allan, Visakan, H.Thompson, P.Fleming}@shef.ac.uk

Abstract

Many model-based fault diagnosis approaches have been proposed so far, and some of them are applied in industrial practice. For modern complex processes, however, the variable nature of faults and model uncertainty mean that no single approach can diagnose all faults and meet different, contradictory criteria. In this paper, the importance of integrating different fault diagnosis schemes in a common framework is emphasised. A service-oriented architecture for this integration is proposed based on Grid technologies. Some implementations of this integration for gas turbine engine fault diagnosis are presented.

1. Introduction

Model-based approaches are widely accepted modern approaches for industrial fault detection and isolation (FDI). They are based upon the idea that measurements from dissimilar sensors are functionally related because they are all derived from the same state of the system. Any violation of these relationships indicates the occurrence of faults. Although the model-based approach is commonly accepted as a modern approach for fault diagnosis, there is no single widely accepted generic solution for model-based fault diagnosis, owing to model uncertainties, intense computational requirements and the unknown, complicated nature of fault diversity. Researchers working in this area have proposed different approaches using different algorithms under the name ‘model-based’ [1-3]. Each approach, however, has its own focus, and none of them is universal: none is suitable or available for all fault types.
To overcome these shortfalls and exploit the advantages of different approaches, the integration of different methods, or hybrid schemes, is highly recommended [4, 5]. With the latest developments in Internet and Intranet technologies, and especially the development of Grid computing, it is now possible to provide different algorithms as individual Grid services and to combine these services dynamically to generate a ‘virtual organisation’ for fault diagnostic purposes. In this paper, a service-oriented architecture on the Grid for integrated fault diagnostics is proposed. Different fault diagnosis algorithms and analysis tools are provided as Grid services in this framework. Through configuration and workflow management, these services can be dynamically invoked to form a flexible distributed environment for fault diagnosis and maintenance.

This paper is organised as follows. Section 2 presents an overview of model-based fault diagnosis; different approaches developed for model-based fault diagnosis are summarised and compared. In Section 3, the concept of Grid computing is introduced, the service-oriented architecture on the Grid for fault diagnosis is proposed and its advantages are highlighted. In Section 4, the DAME project is introduced and some experimental work on FDI integration is presented. The gas turbine engine simulation services developed for fault detection and the case-based reasoning services developed for decision-making are detailed. Finally, some concluding remarks are made.

2. Model-based Fault Diagnosis

Model-based approaches are widely accepted modern approaches for solving fault diagnosis problems. Model-based fault detection and isolation focuses on dynamic consistency (parity) relations and parameter estimation. The basic procedure of model-based FDI is first to generate analytic symptoms by using analytical knowledge about the process, based on process observation.
Then the generated analytic symptoms, together with heuristic symptoms, are analysed at the fault diagnosis stage to find the type, size and location of a fault as well as its time of detection.

In general, model-based FDI is a two-step procedure: residual generation and residual evaluation. Residual generation is a process in which the inputs and outputs of a system are monitored and manipulated to generate a signal or vector, the so-called residual. The residual should normally be zero or close to zero when no fault is present, but distinguishably different from zero when a fault occurs. Residual generation is thus a procedure for extracting fault symptoms from the system, with the fault symptom represented by the residual signal. The residual should ideally carry only fault information. To ensure reliable FDI, the loss of fault information in residual generation should be as small as possible.

Residual evaluation is the analysis of the residual to examine the likelihood of faults. A decision is made based on knowledge about the process and the symptoms. If a fault has occurred, further analysis should be made to isolate or even identify the fault. A decision process may consist of a simple threshold test on the instantaneous values or moving averages of the residuals, or it may use methods from more sophisticated decision theories.

Among the various approaches used for residual generation and evaluation, there are roughly four different approaches used in model-based FDI:

• Observer approach or parity relations approach. The underlying idea of the observer approach is to estimate the system outputs from the available inputs and outputs of that system. The residual will then be a weighted difference between the estimated and actual outputs.
In a similar way, the parity relations approach is based either on direct redundancy, making use of static algebraic relations between sensor and actuator signals, or on temporal redundancy, when dynamic relations between inputs and outputs are used.

• Parameter estimation approach. This approach makes use of the fact that component faults of a dynamic system can be thought of as reflected in the physical parameters of the system, e.g. friction or mass velocity resistance. A fault can then be detected through parameter estimation or model identification.

• Statistical approaches. Mainly used to improve the fault detection capability, statistical approaches such as the generalised likelihood ratio (GLR) test can be used to find changes in the residual signal more quickly and accurately. Principal component analysis (PCA) and Fisher discriminant analysis (FDA) are the most widely used techniques for dimensionality reduction and pattern classification in FDI.

• Qualitative approach. The qualitative approach is based upon the concept of a qualitative model which, unlike its quantitative counterpart, only requires declarative (heuristic) information. An expert system, for example, is one of the qualitative approaches; it uses if-then rules to represent human knowledge of the relation between normal or abnormal system behaviour and the causes of faults. The fault tree approach traces the evolution of the fault through the dynamic system described by a fault tree, event trees or causal networks. There are also other qualitative model-based approaches which use a qualitative model derived directly from the physical laws of the system under consideration. Bond Graphs and Petri Nets, for example, can be used to assist this modelling.
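The two-step procedure described above, residual generation followed by residual evaluation, can be illustrated with a minimal Python sketch. It assumes that a model prediction of the output is already available, takes the residual as the plain (unweighted) difference between measured and predicted outputs, and applies a threshold test to a moving average of the absolute residual. The function names, the data and the threshold value are hypothetical and chosen only for illustration; they are not taken from the implementation discussed in this paper.

```python
from collections import deque


def generate_residuals(measured, predicted):
    """Residual generation: measured output minus model-predicted output."""
    return [y - y_hat for y, y_hat in zip(measured, predicted)]


def evaluate_residuals(residuals, threshold, window=3):
    """Residual evaluation: moving-average threshold test.

    Returns the index of the first sample at which the moving average of
    the absolute residual exceeds the threshold, or None if it never does.
    """
    recent = deque(maxlen=window)
    for k, r in enumerate(residuals):
        recent.append(abs(r))
        if len(recent) == window and sum(recent) / window > threshold:
            return k
    return None


# Hypothetical data: the model predicts a constant output of 10.0 and a
# sensor bias of about +2.0 appears from sample 5 onwards.
predicted = [10.0] * 10
measured = [10.1, 9.9, 10.0, 10.1, 9.9, 12.0, 12.1, 11.9, 12.0, 12.1]

residuals = generate_residuals(measured, predicted)
alarm_index = evaluate_residuals(residuals, threshold=1.0, window=3)
print(alarm_index)  # 6: one sample after the bias appears
```

The moving average delays the alarm by one sample here but suppresses isolated spikes, which reflects the trade-off between promptness of detection and false alarm rate.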
In general, a fault diagnosis technique should be able to complete the following two main tasks:

• detect and isolate different faults occurring in a process, including sensor faults, actuator faults and component faults;
• detect and isolate incipient faults as well as abrupt faults.

At the same time, a fault diagnosis scheme should also consider the following criteria for better fault diagnosis performance:

• Promptness of fault detection
• Sensitivity to incipient faults
• False alarm rate and missed fault detection
• Incorrect fault identification

Although different model-based approaches are designed to solve fault diagnosis problems, each approach has its own focus, and none of the above-mentioned methods is a generic solution for fault diagnosis of complex modern systems. This is due to the complicated nature of the monitored system and its faults, the applicability of different modelling approaches, or insufficient knowledge about the monitored process.

Table 1. A comparison of different model-based approaches for FDI

Table 1 is a summarised comparison of selected model-based approaches for fault diagnosis. From this comparison, it is clear that none of these approaches can fully satisfy the requirements of modern fault diagnosis such as promptness, accuracy and sensitivity to faults. It is commonly agreed that hybrid schemes would provide better solutions for future complex system fault diagnostics [6, 7]. Thus an evaluation of different FDI approaches and the integration of assorted FDI approaches in an open computational environment are crucial. Additionally, it is important to consider how these approaches should be used in conjunction to provide the most accurate diagnosis in the decision-making process.

Another restriction of some model-based FDI techniques is their inherent demand for intensive computing power for modelling and simulation. These computation requirements also limit their application to large-scale complex systems.
Grid technology provides the necessary high-performance, high-throughput computing power to overcome these restrictions. Its distributed computing structure and organised resource sharing features provide excellent opportunities to integrate different FDI schemes to obtain better fault diagnostic performance.

3. Service-Oriented Architecture on the Grid for FDI Integration

Service-oriented architecture (SOA) is not a new concept. Early service-oriented architectures used the Distributed Component Object Model (DCOM) [8] or Object Request Brokers (ORBs) based on the CORBA specification [9]. Over the last few decades, software has slowly been decoupled. The introduction of the client/server structure removed the database from the fat client. The thin client decoupled the user interface from the business logic. Service-oriented architectures were proposed to decouple the integration logic from the business logic. Basically, services and service-oriented architectures are about designing and building systems using heterogeneous, network-addressable software components. A service-oriented architecture is thus an architecture that is made up of components and interconnections and that stresses interoperability and location transparency.

With the introduction of Web services and Grid services, there has been renewed interest in building service-oriented architectures for ‘virtual organisations’ [10]. Because they essentially use XML to create a robust connection, Web services are the most likely connection technology for service-oriented architectures. At the core of the Web services model is the notion of a service, which is defined as a collection of operations that carry out some type of task. Within the context of Web services, there are three components, namely service providers, service requesters and service brokers. A service is deployed on the Web by the service provider.
The functions provided by a given Web service are described using the Web Services Description Language (WSDL) [11] and published on the Web. A service broker helps the service provider and service requester find each other through a UDDI (Universal Description, Discovery, and Integration) [12] based registry. A service requester uses the standard API to ask the service broker about the services it needs and then uses SOAP (Simple Object Access Protocol) [13] to invoke the applications on the remote service provider side.

Figure 1. Integrated fault diagnosis on the Grid

The Grid is a name that was first coined in the mid-'90s to describe a vision for a distributed computing infrastructure for advanced science projects. First properly explained by Ian Foster and Carl Kesselman [14], the Grid should enable “resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organisations”. With the first generation Grid involving “metacomputers” and the second generation Grid focused on middleware and communication protocols, it is now claimed that the third generation Grid combines service-oriented architecture concepts and Web services technologies to create the Open Grid Services Architecture (OGSA) [15], whereby a set of common interface specifications supports the interoperability of discrete, independently developed services. The Open Grid Services Infrastructure (OGSI) service specification is the keystone in implementing this architecture, followed by the recent Web Service Resource Framework (WSRF), which is a set of six Web services specifications that try to meld Web services with Grid computing by defining how to model and manage state in a Web services context. As defined by OGSA, a Grid service is basically a Web service, which is a set of Internet-based distributed processes.
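The provider, broker and requester roles described in this section can be sketched in a few lines of Python. This is an in-memory illustration of the publish, find and invoke pattern only: it does not use WSDL, UDDI or SOAP, and the class names, the "fdi" category and the threshold_check service are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceDescription:
    """Stand-in for a service description: name, category and endpoint."""
    name: str
    category: str
    endpoint: Callable[[List[float]], str]


class ServiceRegistry:
    """Stand-in for the registry kept by the service broker."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceDescription] = {}

    def publish(self, description: ServiceDescription) -> None:
        # Called by a service provider to make its service discoverable.
        self._services[description.name] = description

    def find(self, category: str) -> List[ServiceDescription]:
        # Called by a service requester to discover matching services.
        return [s for s in self._services.values() if s.category == category]


def threshold_check(residuals: List[float]) -> str:
    """Hypothetical diagnostic service: a simple residual threshold test."""
    return "fault" if max(abs(r) for r in residuals) > 1.0 else "no fault"


# Provider side: publish the service with the broker's registry.
registry = ServiceRegistry()
registry.publish(ServiceDescription("threshold-check", "fdi", threshold_check))

# Requester side: find a service by category, then invoke it.
matches = registry.find("fdi")
result = matches[0].endpoint([0.1, -0.2, 2.3])
print(result)  # fault
```

The requester depends only on the registry and the published description, not on where or how the service is implemented, which is the location transparency that the architecture stresses.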
Based on emerging standards such as XML, SOAP, WSDL and UDDI, the promise of Grid services is to enable a distributed environment in which any number of applications, or application components, can interoperate seamlessly among organisations in a platform-neutral, language-neutral fashion on the Grid.

To support complex system fault diagnostics, a methodology has been proposed to integrate suites of modelling, estimation and analysis tools for fault diagnosis on the Grid. The service-oriented architecture is adopted here to support this integration, and the latest developments on the Grid have been adopted to meet the specifications of OGSA and WSRF. By adopting this open service-oriented architecture for integrated fault diagnostics, the most commonly used techniques for FDI, which include parameter estimation, state observers, parity relations, statistical approaches and symbolic approaches, can all be developed individually as fault diagnostic services. As illustrated in Figure 1, these services can be created and maintained by different institutions for different fault diagnostic purposes and are all defined through a commonly accepted description format, namely the Web Services Description Language (WSDL). Registered with the Universal Description, Discovery, and Integration (UDDI) registry, these fault diagnostic services can be discovered and organised in a flexible way. The result is a versatile virtual FDI toolbox on the Grid. Users can access this toolbox by using an Internet browser via a Grid portal, and they can select different FDI schemes to fit their own unique requirements. By adopting the OGSA specification, the proposed architecture is a Grid solution to integrated fault diagnostics, which allows different FDI applications to share algorithms, data and computing resources and to access them across multiple organisations in an efficient and secure way.

Figure 2. Engine simulation Grid service

4.
Implementation

The UK e-Science pilot project Distributed Aircraft Maintenance Environment (DAME) is developing a distributed diagnostics and prognostics system for the maintenance of civil aerospace engines. The techniques will generalise to other diagnostic domains such as medicine, transport and manufacturing. The DAME system uses Grid technology to demonstrate how remote and diverse applications and services can be linked into a virtual diagnostic environment. Various techniques are employed in the project, including advanced pattern matching to search very large data sets (terabytes), modelling for fault diagnosis and simulation for decision-making, case-based advice, workflow management and collaboration environments [16].

As one effort carried out in the DAME project, the proposed service-oriented architecture for integrated fault diagnostics has been implemented. Initially, a gas turbine engine performance model was provided as a Grid service to facilitate the further development and analysis of different model-based FDI approaches. Figure 2 illustrates a running scenario of this Grid service through a Web portal.

Figure 3 shows one basic usage of the engine simulation Grid service for fault diagnosis. When an accurate system performance simulation is available on the Grid, experienced maintenance engineers can invoke this simulation against the real monitored process data. The system being analysed is compared against the simulation results. The differences between the current state of the engine and the ideal model generate residuals. These residuals then need to be intelligently analysed to form a decision about the current state of the engine. This can be used to track changes in engine parameters which may indicate impending faults.

Figure 3. Simulation-based fault diagnosis

The advantage of providing an engine performance simulation as a Grid service is that the engine simulation service is identified by a URI, and its public interfaces and bindings are defined and described using XML. Authorised users can run the engine performance simulation remotely through a Web browser without knowing the details of how the simulation is executed. The simulation service itself is distributed among a set of high-performance computers on the Grid. Based on the Globus Toolkit 3 (GT3) [17], this engine simulation Grid service can be invoked simultaneously in different ‘virtual organisations’ for different applications.

The engine simulation Grid service has also been used in event generation and analysis for engine fault diagnosis and maintenance. As illustrated in Figure 4, there are two stages in this work. In the observation processing stage, raw measurements of different engine performance variables and engine simulation results are analysed as inputs. A change detection method is used to characterise the input time series. The goal is to recognise changes that are important in the context of engine performance behaviour and that correspond to engine faults. This process has two aspects: segmenting the data in a meaningful way, and extracting features that are useful for deciding whether the engine is exhibiting normal or abnormal behaviour. In the combination stage, the two event sequences, one from the raw measurements and one from the engine simulation, are compared; any discrepancy indicates a possible fault. The advantage of introducing a separate event history based on the engine simulation is that a reference history of a healthy engine is provided to assist the fault diagnostic decision-making. The comparison of two discrete event histories instead of the original binary time series data can help to overcome model uncertainty and unmodelled noise. As a result, the robustness of the fault diagnosis is improved.

Figure 4. Event generation and analysis

Another effort of the DAME project on the integration of fault diagnostics on the Grid is the use of Case-Based Reasoning (CBR) [18] services to correlate and integrate fault indicators from different aero engine input monitoring systems, BITE reports, maintenance data and dialogue with maintenance personnel to allow troubleshooting of faults. As a qualitative fault diagnosis approach, case-based reasoning is a problem-solving paradigm that resolves new problems by adapting the solutions used to solve problems of a similar nature experienced in the past. A further advantage of this approach is that it allows consolidation of rule knowledge and provides a reasoning engine that is capable of probabilistic matching. With CBR technology, development can take place in an incremental fashion, facilitating rapid prototyping of an initial system. Robust strategies for the integration of multiple health information sources can be developed with reasoning algorithms of progressively increasing complexity.

Also deployed as Grid services in the Grid environment, the CBR services can be invoked by other authorised Grid services or by maintenance analysts to perform high-level fault diagnostics and decision-making, as illustrated in Figure 5. The advantage of deploying CBR as a Grid service is that maintenance personnel can access the service over a secured connection via a Web browser on any computer connected to the Internet. The maintenance personnel will then have access to stores of accumulated diagnostic knowledge and maintenance data, as well as large computing resources, to support the fault analysis and the decision-making process. Other fault diagnostic services can be used to perform the preliminary fault diagnosis, and the results can be used to facilitate the CBR analysis. As standard Grid services, the CBR services can be invoked to produce useful outcomes that are profitable to other decision-making services as well.

Figure 5. Case-based reasoning Grid service

In the future, the CBR services can be upgraded to accommodate a dynamic learning process. Anomalous data (data containing unknown faults) may be analysed in DAME to produce new fault cases that are dynamically appended to the case base, further increasing the knowledge of the system.

A typical use case which encompasses both the engine simulation and CBR services in the fault analysis and maintenance process is described as follows. Data downloaded from an aircraft is first analysed for novelties (known fault occurrences). The existence of a fault and the possible fault type can be checked against the engine simulation. If a novelty exists, then further information is extracted from the data and from other available fault diagnostic services to form a query within the CBR services. The result returned to the maintenance personnel consists of previous similar fault cases, known solutions to the problem, and a confidence ranking for each case. The maintenance analysts and domain experts can further take advantage of the integrated fault diagnostic tools to confirm the fault diagnosis findings. For example, the domain experts can substantiate a proposed fault analysis by injecting a similar fault into an engine model and performing a simulation to check the uniformity.

By using Grid technologies and the proposed service-oriented architecture, more modelling, simulation and decision-making services from different providers or institutes can be shared, coordinated and integrated for fault diagnostics, prognostics and engine maintenance.

5. Concluding Remarks

In this paper, the use of service-oriented architecture and Grid technologies to support the distributed fault diagnosis of complex systems was discussed. An open framework based on the OGSA has been proposed to address the requirements of integrated diagnostics.
The business benefits of this open, flexible approach to integrated fault diagnostics are not only improved fault diagnosis performance, but also reusable service assembly, better maintainability, better parallelism in development, higher availability and better scalability. The deployment and integration of more fault diagnostic schemes on the Grid are needed for this work. Future research will also focus on workflow management, which orchestrates different fault diagnosis services to deliver the desired coherent fault diagnostics.

6. Acknowledgment

This work and the DAME project are supported by the UK EPSRC under contract 2382. The authors would also like to thank the anonymous referees for their helpful suggestions on improving the presentation.

7. References

[1] R. Isermann, “Supervision, fault-detection and fault diagnosis methods - an introduction,” Control Engineering Practice, vol. 5, no. 4, pp. 639–652, 1997.
[2] R. J. Patton, “Robust model-based fault diagnosis: the state of the art,” in IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes, Espoo, Finland, 1994, pp. 1–24.
[3] J. Gertler, Fault Detection and Diagnosis in Engineering Systems. Marcel Dekker, 1998.
[4] R. J. Patton, F. J. Uppal, and C. J. Lopez-Toribio, “Soft computing approaches to fault diagnosis for dynamic systems: A survey,” in 4th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes, Budapest, Hungary, June 2000, pp. 198–211.
[5] R. Isermann and P. Balle, “Trends in the application of model-based fault detection and diagnosis of technical processes,” Control Engineering Practice, vol. 5, no. 5, pp. 709–719, 1997.
[6] J. Chen and R. J. Patton, Robust Model-based Fault Diagnosis for Dynamic Systems. USA: Kluwer Academic Publishers, 1999.
[7] Y. G. Li, “Performance-analysis-based gas turbine diagnostics: A review,” Proc. Instn. Mech. Engrs, Part A, vol. 216, pp. 363–377, 2002.
[8] Microsoft, “Distributed Component Object Model,” URL: http://www.microsoft.com/com/tech/DCOM.asp, 2004.
[9] OMG, “Object Management Group,” URL: http://www.omg.org, 2004.
[10] I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke, “The physiology of the Grid: An open services architecture for distributed systems integration,” 2002. URL: http://www.ggf.org
[11] WSDL, “The Web Services Description Language,” URL: http://www.w3.org/TR/wsdl, 2004.
[12] UDDI, “The Universal Description, Discovery and Integration protocol,” URL: http://www.uddi.org/, 2004.
[13] SOAP, “The Simple Object Access Protocol,” URL: http://www.w3.org/TR/2000/NOTE-SOAP-20000508/, 2004.
[14] I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.
[15] GGF, “Global Grid Forum,” URL: http://www.ggf.org, 2004.
[16] DAME, “Distributed Aircraft Maintenance Environment project,” URL: http://www.cs.york.ac.uk/dame/, 2004.
[17] Globus, “Globus alliance,” URL: http://www.globus.org, 2004.
[18] J. Kolodner, Case-based Reasoning. Morgan Kaufmann, 1993.