Chapter 1
Dynamic Reconfiguration of Complex Systems to Avoid Failure
Fred M. Discenzo, Francisco P. Maturana, Raymond J. Staron, Kenwood H. Hall
Rockwell Automation, USA
{Fmdiscenzo,Fpmaturana,Rjstaron,Khhal}@ra.rockwell.com
Pavel Tichý, Petr Šlechta, Jan Bezdicek, Vladimír Marík
Rockwell Automation, Czech Republic
{Ptichy,Pslechta,Jbezdicek,Vmarik}@ra.rockwell.com
1.1. Introduction
There is increasing pressure for low-cost, reliable automation systems even as these systems become more complex and are deployed in critical applications. The automation of historically manual or mechanical systems is contributing to the growth in coupled automation systems while placing a greater demand on machinery reliability. In spite of the greater focus on maintenance activities, machinery failures do occur, causing process upsets and equipment damage while potentially compromising worker safety and negatively affecting the environment (Figure 1 Example Failure of a Coupled Process).
The development of intelligent machines is often targeted at providing more reliable machines, machines that are easier to configure and diagnose, and machines that can be more readily integrated to help manage the growing complexity of automation systems. Examples of intelligent, self-diagnosing machines include a self-diagnosing smart valve [Marritt 2001], an intelligent motor with embedded sensors, processor, and motor diagnostic algorithms [Discenzo(a) 2000], and an intelligent variable frequency drive with embedded pump diagnostic algorithms and automatic pump protection capability [Discenzo(b) 2002]. Distributed intelligent machines support distributed computing and distributed control architectures while reducing the requirement for large, complex central controllers. Distributed control implemented in intelligent machines provides the framework for highly distributed intelligent agents for diagnostics and control.
A distributed multi-agent system employs intelligent agents that encapsulate the fundamental behavior or function of the intelligent device as an autonomous component. These components exhibit primitive, local goal-seeking capabilities to realize local objectives and also collaborate with other intelligent devices to define and realize higher-level cluster goals or overarching system-level goals. It is significant that new goals may emerge dynamically and replace previous goals. The suite of goals is hierarchical and changes dynamically with machinery condition, predicted operating states, and changing system goals, objectives, or missions. The emerging suite of goals and the strategies for realizing these goals are determined by collaboration and negotiation among groups of intelligent agents.
Figure 1 Example Failure of a Coupled Process
This paper presents the foundation technologies that are essential to realizing an adaptive, reconfigurable automation system. A dynamic agent registry is presented along with a hierarchical framework for organizing agent clusters. This framework, called cluster associations, provides the basis for coordinating the dynamic reconfiguration of multiple subsystems that must be coordinated but are loosely coupled.
1.2. Intelligent Agents
1.2.1. Machine Intelligence
For over 50 years there has been an ongoing effort to understand cognition and intelligence, beginning with Turing's famous paper. An important objective of these efforts is to provide a rigorous foundation to enhance the capabilities of machines and to make machines more useful [Charniak 1985] [Nilsson 1980]. The techniques pursued include a suite of artificial intelligence (AI) techniques such as expert systems, fuzzy logic, genetic algorithms, analogical reasoning, artificial neural networks, and related model-based and model-free techniques [Charniak 1985].
Many of the automation successes reported apply biologically inspired architectures and techniques to solve well-targeted, specific automation problems such as adaptive control, defect classification, and job scheduling [Zurada 1994]. The capabilities provided by intelligent machines may be categorized by the degree of embedded knowledge, with the most capable systems employing real-time goal adjustment, cooperation, preemption, and dynamic reconfiguration [Discenzo(c) 2000][Discenzo(d) 2002]. These capabilities may be effectively integrated in an agent-based system employing intelligent machines in a distributed automation system. This architecture, built on a foundation of a society of locally intelligent cooperating machines, provides an effective framework for the efficient and robust automation of complex systems.
1.2.2. Autonomous Agents
The approach taken is to encapsulate the fundamental behavior of intelligent devices as an autonomous component. The autonomous component employs a model of the primitive device behavior and enables agents to act on behalf of physical devices or complex processes. The approach of establishing application-specific agent behavior in a reusable and scalable manner finds counterparts in other research activities such as Multi-agent Systems (MAS), Autonomous Agents, Flexible Manufacturing, and Virtual Enterprise [Shen 2001][Vasko 2000][Zhang 1999]. Our focus for intelligent agents is on device prognostics and reconfiguration to realize local and system-level goals. This is complementary to previously reported developments employing autonomous control that incorporates agents for planning, communication, diagnostics, and control [Vasko 2000][Maturana 2000][Maturana 2002]. The core capabilities of the intelligent agents are summarized in Table I below [Wooldridge 1995]. Agent collaboration utilizes an agent registry facility and communicates using open, standard interfaces.
The FIPA (Foundation for Intelligent Physical Agents) standard is used for multi-agent system operation. This facilitates system development and provides a basis for different and outside agents to be discovered and to participate in control and reconfiguration planning and execution [FIPA][FIPAOS].
Table I Typical Agent Characteristics
1. Autonomous
2. Reactive
3. Proactive
4. Social
To collaborate, agents usually need a facility for registering their capabilities and for inquiring about additional capabilities they require. We employ a Matchmaker-based architecture which is based on Directory Facilitators (DF) and is consistent with the FIPA standard. All agents register their capabilities with the DF agent and provide updates dynamically as their functioning changes. With each request for a capability, the DF agent provides a list of agents that match the requested service or function. The DF agent acts as an information broker and passively provides information services. The DF agent organization may be hierarchical, and agent capability registration may be made in a breadth-first or depth-first manner. Alternatively, to limit propagation, capability information may be propagated only locally; new capability requests are then processed by local DF agents, which carry out meta-level communications to discover the needed capabilities. DF agents then provide location and service capability information of remote agents to the initial requesting agent. Relevant information may be stored locally to enhance organizational learning. Since a single DF agent in the system represents a single point of failure and a communication bottleneck, it is advantageous to use more than one DF agent. A structure of DF agents called Dynamic Hierarchical Teams (DHT) has been developed to ensure user-defined levels of fault tolerance while preserving scalability [Tichy 2004].
The implementation of self-emerging organizational structures is based on the dynamic gathering of system capabilities.
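The Directory Facilitator registration and matchmaking described above can be illustrated with a minimal sketch. It assumes an in-memory registry with one optional parent DF; the class name, method names, agent names, and capability strings are all hypothetical and are not the FIPA API or the authors' implementation. A search is answered locally when possible and is otherwise forwarded to the parent DF, which loosely mirrors the hierarchical organization mentioned in the text.

```python
from collections import defaultdict


class DirectoryFacilitator:
    """Hypothetical in-memory capability registry (not the FIPA API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # optional higher-level DF in a hierarchy
        self._providers = defaultdict(set)  # capability -> set of agent names

    def register(self, agent, capabilities):
        """Record that `agent` offers each of the listed capabilities."""
        for capability in capabilities:
            self._providers[capability].add(agent)

    def deregister(self, agent, capability):
        """Remove one capability of `agent`, e.g. after a detected fault."""
        self._providers[capability].discard(agent)

    def search(self, capability):
        """Return matching agents; ask the parent DF if none are known locally."""
        local = sorted(self._providers.get(capability, ()))
        if local or self.parent is None:
            return local
        return self.parent.search(capability)


# Illustrative usage with made-up agents and capabilities.
root_df = DirectoryFacilitator("root")
zone_df = DirectoryFacilitator("zone-1", parent=root_df)
zone_df.register("pump-1", ["circulate-water"])
zone_df.register("valve-3", ["isolate-section"])
root_df.register("chiller-2", ["chill-water"])
root_df.register("pump-7", ["circulate-water"])

local_pumps = zone_df.search("circulate-water")      # answered by the local DF
remote_chillers = zone_df.search("chill-water")      # forwarded to the parent DF
zone_df.deregister("pump-1", "circulate-water")      # capability update
pumps_after_failure = zone_df.search("circulate-water")
```

In this sketch the DF stays passive, as in the text: it only answers queries and never assigns work. Redundant DF agents and fault tolerance are not modeled here.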
The mechanism for collaboration and control is dynamic clustering of agents, where transient clusters are representative of system capabilities for short periods of time. User and system tasks trigger the dynamic clustering. The task complexity determines the size and configuration of the clusters. Gathering multiple clusters forms a cooperation domain. An important characteristic of this mechanism is the capability of the system to aggregate resources as needed into the emergent organizations. This organizational feature provides an architecture that is robust and survivable.
Machinery prognostics and diagnostics provide the foundation for autonomous agents to define the need for reconfiguration and the urgency for changing configuration and control, and to prescribe viable options for dynamic reconfiguration.
1.3. Machinery Prognostics
Many characteristics such as component type, information descriptors, and diagnostic reference information are similar across components (e.g. motors, pumps, and compressors) and processes that use these components (e.g. industrial automation processes, aircraft actuators, shipboard auxiliary systems, and building services). An open, general framework for describing machinery health information was recently developed. The system, OSA-CBM (Open Systems Architecture for Condition-Based Maintenance), provides a framework for the real-time integration of machinery health and prognostic information with decision support activities. The scope of this framework includes legacy analog sensors and smart transducers (e.g. IEEE 1451), signal processing, state assessment, prognostics, and decision support (www.osacbm.org). This architecture specification is open to the public and may be implemented in a DCOM, CORBA, or HTTP/XML environment.
Figure 2 Intelligent Agent with OSA-CBM Data
Closely coupling diagnostic and prognostic information with real-time automatic control can provide important new capabilities.
It is possible to define the expected evolution of a state variable of interest. The evolution of a state variable, under specific assumptions about environmental and operating conditions, permits tracking the expected degradation of a critical system component that eventually leads to a device failure.
Figure 3 Integrated Prognostics and Control (axes: critical state variable x(t) versus time t; annotated event: e.g. PM / Tank Empty)
The rate of component degradation may be determined continuously during machinery operation using general rules of thumb, simulation models, or dynamically updated models. For example, L10 bearing life may be calculated using speed, temperature, and loading. Similarly, motor winding lifetime is reported to be reduced by half with each 10 degree F rise in temperature. Alternatively, on-line diagnostic and prognostic algorithms such as those used for pumps, motors, or rolling element anti-friction bearings may provide a more accurate estimate of the degradation rate or lifetime trajectory of critical system components. Changing operating conditions such as speed, temperature, frequency, or acceleration / deceleration times may change the stress on critical components and cause the state variable to take a different time trajectory. The family of possible state trajectories represents the control space in which we may operate the system. Some of the possible trajectories are better than others, and some states are critical or unstable and are to be avoided (Figure 3 Integrated Prognostics and Control). It is possible to dynamically drive the system to achieve a prescribed trajectory. This trajectory may lead to a better state than would occur if we did not alter the control based on machinery health information. Furthermore, the future state we achieve is chosen to be optimal in some sense, such as machinery operating cost, machinery lifetime, or life-cycle cost. The lifetime curves for many devices or components will be linked dynamically.
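The rule-of-thumb lifetime estimates mentioned above can be sketched as two small functions. The first uses the standard basic rating-life relation for rolling bearings, L10 = (C/P)^p million revolutions, converted to hours at a given speed; this basic form does not include the temperature adjustment that the text also mentions. The second applies the halving rule for winding life exactly as the text states it, with the temperature step left as a parameter. Function names and all numeric values are assumptions chosen for illustration, not data from the pilot system.

```python
def l10_life_hours(dynamic_load_rating, equivalent_load, speed_rpm, exponent=3.0):
    """Basic rating life L10 = (C/P)^p million revolutions, converted to hours.

    exponent is 3 for ball bearings and 10/3 for roller bearings.
    """
    if equivalent_load <= 0 or speed_rpm <= 0:
        raise ValueError("load and speed must be positive")
    million_revs = (dynamic_load_rating / equivalent_load) ** exponent
    return million_revs * 1_000_000 / (60.0 * speed_rpm)


def winding_life_hours(rated_life_hours, temperature_rise, halving_step=10.0):
    """Halve the rated life for every `halving_step` degrees of temperature rise."""
    return rated_life_hours * 0.5 ** (temperature_rise / halving_step)


# Illustrative values only: C = 30 kN, P = 3 kN, two candidate speeds.
full_speed = l10_life_hours(dynamic_load_rating=30_000, equivalent_load=3_000, speed_rpm=1_800)
half_speed = l10_life_hours(dynamic_load_rating=30_000, equivalent_load=3_000, speed_rpm=900)

# A winding rated for 20,000 hours running 20 degrees above its rated temperature.
hot_winding = winding_life_hours(rated_life_hours=20_000, temperature_rise=20.0)
```

Comparing `full_speed` with `half_speed` shows how a change in an operating condition moves the component onto a different lifetime trajectory, which is the lever that prognostics-driven control relies on.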
It is not uncommon for degradation in one component to cause excessive stress in another, perfectly good component and the premature failure of this previously healthy component. Control decisions and reconfiguration options must consider the device stress coupling that exists and the degraded state of system components when defining the new configuration state required and the transition plan to this new state. Intelligent distributed agents provide an effective framework for managing this complexity and for controlling the dynamic reconfiguration of critical system elements.
1.4. Cluster Associations
Multi-agent collaboration and control are achieved through the dynamic formation of agent clusters. Agent clusters directly support the collaboration of distributed autonomous devices. For example, in a chilled water system, agent clusters may be formed representing coupled pump, valve, chiller, and heat load entities. The cluster facility coupled with the registry facility enables intelligent agents to identify component or device faults, degraded components, or system services that, although not critical, will need to be satisfied in the near future. The cluster facility also supports defining a suite of possible reconfiguration options, evaluating the potential new configurations, defining a transition plan to the most desirable configuration, carrying out the prescribed reconfiguration plan, and implementing the associated new control action. The same agent cluster facility described above can be applied to other systems such as fluid handling systems, material handling systems, and power distribution systems.
Many processes employ multiple coupled subsystems working together. For example, the operation of the chilled water system described above is affected by the supply power provided to the system components and the demand of the various heat loads such as facility cooling and equipment cooling requirements.
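The steps of defining a suite of reconfiguration options and evaluating them can be sketched as follows. This is a simplified illustration under stated assumptions: the pump names, capacities, health indices, and the stress-cost formula are all hypothetical, and a single function stands in for what the text describes as negotiation among agents in a cluster. The sketch enumerates candidate pump sets, discards those that cannot meet demand, and picks the one that places the least load on degraded components.

```python
from itertools import combinations

# Hypothetical pumps: flow capacity (arbitrary units) and a 0..1 health index.
PUMPS = {
    "pump-A": {"capacity": 60.0, "health": 0.9},
    "pump-B": {"capacity": 60.0, "health": 0.3},  # degraded component
    "pump-C": {"capacity": 40.0, "health": 0.8},
}


def candidate_configurations(pumps):
    """Yield every non-empty subset of pumps as a possible configuration."""
    names = sorted(pumps)
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def stress_cost(config, pumps, demand):
    """Illustrative cost: shared load fraction weighted by each pump's poor health."""
    total_capacity = sum(pumps[name]["capacity"] for name in config)
    load_fraction = demand / total_capacity
    return sum(load_fraction * (1.0 - pumps[name]["health"]) for name in config)


def best_configuration(pumps, demand):
    """Return the feasible configuration with the lowest stress cost, or None."""
    feasible = [
        config for config in candidate_configurations(pumps)
        if sum(pumps[name]["capacity"] for name in config) >= demand
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda config: stress_cost(config, pumps, demand))


chosen = best_configuration(PUMPS, demand=90.0)
```

With these made-up numbers the degraded pump is left out of the chosen set whenever the healthy pumps can carry the demand. A transition plan from the current configuration to the chosen one is not modeled.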
A structure for coordinating the interface between agent clusters is needed to provide the required sub-system coordination while not inducing excessive communications and coordination demands. The coordination required between loosely coupled systems may be accomplished by associating clusters that operate concurrently in separate but coupled domains. This is shown graphically in Figure 4 Cluster Associations Across Application Domains. Cluster associations are represented as agent properties, and this information is stored in the appropriate agent registry. Agent associations may function as connectors as described by Gladwell [Barabasi 2002].
Figure 4 Cluster Associations Across Application Domains (labels: MISSION; SUBSYSTEM / AUXILIARY; CHILLER / BALLAST / JP5; POWER DISTRIBUTION)
Intelligent agent negotiation within a cluster will also propagate the reconfiguration option through the cluster association to agent clusters managing other linked domains such as power. For example, a decision to operate a pump at full load must be made in concert with the agent-based power management system to ensure that power will be provided and maintained for the new critical load in spite of possibly diminished overall power capacity. Power reconfiguration planning will take into account existing and future reconfigured power requirements to support the various required services. Agent negotiations will necessarily be performed in parallel in multiple application service domains, with coordination achieved across service domains through cluster associations. Coordinating the reconfiguration and control of coupled systems using cluster associations can provide the capability for maintaining critical system functions in spite of unexpected disturbances, unforeseen damage, and unique or severe loading requirements. The use of both intelligent agent virtual clusters and cluster associations provides the framework for managing the complexity of dynamically reconfiguring coupled systems.
1.5. System Implementation
We have developed autonomous agents for distributed control of a land-based Chilled Water System (CWS) pilot system. The laboratory system is a Reduced Scale Advanced Development (RSAD) model that is a scaled-down model of a chilled water system from a US Navy ship. The RSAD model employs distributed diagnostic and control agents deployed on commercially available controllers (Figure 5).
Figure 5 Chilled Water Pilot System
There are currently two chiller plants in the system and an infrastructure of pipes, valves, pumps, sensors, and ship services (i.e. heat loads), both vital and non-vital. This system employs 68 agents running on 23 commercially available programmable logic controllers. Clusters are formed dynamically, and the Contract Net protocol [Smith 1980] is used to establish dynamic negotiations among the agents to realize ship-level and mission goals and to meet local and immediate operational objectives. This implementation utilizes a full suite of Directory Facilitators (DF) that provide matchmaking and Agent Management Services (AMS) functionality. Group goals emerge dynamically and are agreed upon by the agents through negotiation. For example, an agent that detects a water leakage problem in a pipe section establishes a goal to isolate the leakage and informs adjacent agents so that they can evaluate the problem according to their own data.
A simulation model and associated development tools were developed to facilitate designing and deploying the intelligent agent system and to validate performance over a wide range of operating conditions. A screen capture showing the system schematic and the simulation screens is shown in Figure 6 Intelligent Agent System Simulation. It is significant that this is a highly distributed autonomous system. There is no central controller and no single point of failure.
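The Contract Net negotiation used among the agents can be illustrated with a single simplified round: a task is announced, each participant either proposes a bid or refuses, and the task is awarded to the lowest-cost bidder. This is only a sketch under stated assumptions. It omits message transport, timeouts, and the confirmation steps of the full protocol [Smith 1980], and the agent class, the cost model, and all names are hypothetical rather than taken from the pilot system.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    agent: str
    cost: float


class ChillerAgent:
    """Hypothetical participant that bids only if it has spare capacity."""

    def __init__(self, name, spare_capacity, cost_per_unit):
        self.name = name
        self.spare_capacity = spare_capacity
        self.cost_per_unit = cost_per_unit

    def propose(self, required_capacity):
        """Return a bid for the announced task, or None to refuse it."""
        if required_capacity > self.spare_capacity:
            return None
        return Bid(self.name, required_capacity * self.cost_per_unit)

    def accept(self, required_capacity):
        """Commit capacity once the task has been awarded to this agent."""
        self.spare_capacity -= required_capacity


def contract_net_round(participants, required_capacity):
    """Announce a task, collect proposals, and award it to the cheapest bidder."""
    bids = []
    for agent in participants:
        bid = agent.propose(required_capacity)
        if bid is not None:
            bids.append(bid)
    if not bids:
        return None  # nobody can serve the task; the initiator must replan
    winner = min(bids, key=lambda bid: bid.cost)
    by_name = {agent.name: agent for agent in participants}
    by_name[winner.agent].accept(required_capacity)
    return winner


# Illustrative usage: two cooling loads are negotiated one after the other.
agents = [
    ChillerAgent("chiller-1", spare_capacity=50.0, cost_per_unit=2.0),
    ChillerAgent("chiller-2", spare_capacity=80.0, cost_per_unit=1.5),
    ChillerAgent("chiller-3", spare_capacity=20.0, cost_per_unit=1.0),
]
first_award = contract_net_round(agents, required_capacity=40.0)
second_award = contract_net_round(agents, required_capacity=45.0)
unserved = contract_net_round(agents, required_capacity=100.0)
```

Because each award reduces the winner's spare capacity, later rounds can go to a different agent, which gives a rough picture of how allocations shift as conditions change. In the system described, any agent may act as initiator, so no central coordinator is implied by the helper function here.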
This system has been shown to dynamically establish new goals and automatically reconfigure system operation to minimize damage and to meet critical cooling needs. New operating goals may emerge based on equipment prognostics or predicted component failure to avoid reaching a predicted or probable state that is undesirable (e.g. catastrophic component failure). The potential undesirable states may be efficiently avoided while continuing to satisfy critical system needs (e.g. radar cooling). This system serves to validate the agent methodology to manage the inherent complexity of highly distributed systems while responding dynamically to changes in operating requirements and degraded or failed components.
Figure 6 Intelligent Agent System Simulation
1.6. Opportunities and Challenges
The software content and complexity of automation systems continue to increase rapidly while software problems represent a leading cause of production breakdowns [Salimen 1992]. The new paradigm of intelligent multi-agent systems can provide significant benefits such as scalability, reliability, and survivability for complex critical systems. The broad-scale deployment of intelligent devices such as intelligent pumps, fans, drives, valves, and motors may readily utilize this paradigm. The potential benefits of deploying intelligent devices in an autonomous multi-agent framework are significant and far surpass those of merely implementing a collection of intelligent devices. Some of the challenges that remain include the need for a consistent framework and information model that will encompass integrating disparate agents, operating constraints, mission planning, dynamic optimization criteria, adaptive learning, and self-organizing behavior.
The technological developments cited above, combined with intelligent devices implemented in an agent-based / holonic framework, promise to provide unprecedented capabilities for the automation of a broad class of complex systems.
These unique and important capabilities are provided by integrating prognostics and reconfigurable control in an intelligent agent framework. Laboratory demonstrations and simulation studies have exhibited unprecedented capabilities for survivability, adaptability, and dynamic adaptation to changing demands and unexpected faults. The techniques outlined above promise to change the way future complex automation systems are designed and deployed. The technologies outlined above represent new and important capabilities that are broadly applicable across a wide range of industrial and commercial systems.
References
Barabasi, A., Linked: The New Science of Networks, Perseus Publishing, Cambridge, Massachusetts, 2002, pp. 50-54
Charniak, E., & McDermott, D., 1985, Introduction to Artificial Intelligence, Addison-Wesley
Discenzo(a), F. M., Unsworth, P. J., Loparo, K. A., Marcy, H. O., Intelligent Motor Provides Enhanced Diagnostics and Control for Next Generation Manufacturing Systems, IEE Computing and Control Engineering Journal, Special Issue on Intelligent Sensors, Summer 2000
Discenzo(b), F. M., Rusnak, D., Hanson, L., Chung, D., and Zevchek, J., Next Generation Pump Systems Enable New Opportunities for Asset Management and Economic Optimization, Fluid Handling Systems, 2002, 5(3), pp. 35-42
Discenzo(c), F. M., Marik, V., Maturana, F., Loparo, K. A., Intelligent Devices Enable Enhanced Modeling and Control of Complex Real-Time Systems, International Conference on Complex Systems ICCS, Nashua NH, May 2000
Discenzo(d), F. M., Maturana, F. P., Chung, D., Managed Complexity in An Agent based Vent Fan Control System Based on Dynamic Re-configuration, International Conference on Complex Systems ICCS, Nashua NH, June 9-14, 2002
FIPA, The Foundation for Intelligent Physical Agents (FIPA): www.fipa.org
FIPA-OS (Open Source), Emorphia: http://fipa-os.sourceforge.net
Maturana F., Balasubramanian S., and Vasko D.: An Autonomous Cooperative Systems for Material handling Applications. ECAI 2000, Berlin, Germany, 2000
Maturana F., Staron R., Tichy P., and Slechta P.: Autonomous Agent Architecture for Industrial Distributed Control. 56th Meeting of the Society for Machinery Failure Prevention Technology, Section 1A, Virginia Beach, April 15-19, 2002
Nilsson, N., 1980, Principles of Artificial Intelligence, Morgan Kaufmann (Los Altos)
Salimen, V., Verho, A., 1992, Multidisciplinary Problems in Mechatronics and Some Solutions, Computers Electr. Engng., Volume 18, Number 1, pp. 1-9
Shen W., Norrie D., and Barthès J. P.: Multi-Agent Systems for Concurrent Intelligent Design and Manufacturing. Taylor & Francis, London, 2001
Smith, R. G.: The Contract Net Protocol. High-level Communication and Control in a Distributed Problem Solver. In IEEE Transactions on Computers, C-29(12), pp. 1104-1113, 1980
Tichý P.: Fault Tolerant and Fixed Scalable Structure of Middle Agents. Fourth Computational Logic in Multi-Agent Systems Conference (CLIMA IV), Fort Lauderdale, FL, USA, 2004
Vasko D., Maturana F., Bowles A., and Vandenberg S.: Autonomous Cooperative Systems Factory Control. PRIMA 2000, Australia, 2000
Wooldridge M. J. and Jennings N. R.: Intelligent Agents: Theory and Practice. In the Knowledge Engineering Review, Vol. 10, No. 2, pp. 115-152, 1995
Zhang, X., Norrie, D.: Agentic Control at the Production and Controller Levels. IMS 99, Sept. 22-24, 1999, pp. 215-224, Leuven, Belgium, 1999
Zurada, J. M., & Marks II, R. J., & Robinson, C. J., 1994, Computational Intelligence: Imitating Life, IEEE Press, ISBN 0-7803-1104-3, 1994, pp. 5-12