Autonomic System Design
Visa Holopainen, visa@netlab.hut.fi

Enabling autonomic behavior in systems software with hot swapping, J. Appavoo et al., 2003
• Focus on object-oriented systems software
• With hot swapping, new algorithms and monitoring code can be added to a running system without disruption
• Hot swapping is accomplished either by interposition of code or by replacement of code
• Interposition inserts a new component between two existing ones. This enables more detailed monitoring when problems occur, while minimizing run-time costs when the system is performing acceptably
• Replacement switches an active component to a different implementation of that component while the system is running

Triggering hot swapping
• In many cases an object is expected to trigger a replacement itself (autonomously). For example, if an object designed to support small files registers an increase in file size, it can trigger a hot swap to an object that supports large files
• In other cases, the system infrastructure is expected to determine the need for an object replacement through a hot swap. This requires monitoring

Adaptive code vs. hot swapping
• Among other features, hot swapping allows systems software to react to changes in its environment
• A more traditional approach to handling varying environments is adaptive code
• In a system using adaptive code, all possible configurations must be built into the system beforehand
• Adaptive code has several problems (presented below)

Illustration of adaptive code vs. hot swapping
• An adaptive-code implementation (A) vs. a hot-swapping implementation (B) of the same function
• The adaptive-code approach is monolithic and includes monitoring code that collects the data the adaptive algorithm needs to choose a particular code path
• With hot swapping, each algorithm is implemented independently (reducing the complexity of each component) and is hot swapped in when needed

Benefits of hot swapping
Hot swapping can be beneficial at least in the following respects:
• Optimizing for the (non-)common case: dynamic replacement allows efficient implementations of common paths to be used when suitable, and less efficient, less common implementations to be switched in when necessary
• Optimizing for a wide range of file attribute values: for example, although the vast majority of files accessed are small (< 4 KB), OSs must also support large files
• Access patterns: researchers have shown up to 30 percent fewer cache misses by using the appropriate cache management policy
• Multiprocessor optimizations: some applications perform better when distributed across many processors, while others perform better when run on a single processor
• Enabling client-specific customization
• Exporting system structure information: always gathering the necessary profiling information increases overhead

Testing system
• A research operating system (K42) has been developed to test the hot-swapping approach
• Runs on PowerPC and MIPS architectures (x86 support was expected soon)
• K42 scales well on multiprocessor systems
• Performance advantages of hot swapping have been demonstrated in K42
• K42 is available at http://www.research.ibm.com/K42

Adding Autonomic Functionality to object-oriented applications, M. Schanne, W. Tichy, T. Gelhausen, 2003
• The goal is to separate autonomic functionality from applications (similar to hot swapping)
• This is accomplished with a system based on class renaming and proxy/wrapper generation
• A list of the proxy objects is kept in a registry
• A proxy object always holds a pointer to the latest version of the actual object and has access to its member functions
• This is accomplished with the Byte Code Engineering Library (BCEL)
• Wrapper functions ensure synchronization of variables
• The user does not need to adapt the source code in any way, or even to restart the program
• Supported environments: platforms like Java 2

Usable Autonomic Computing Systems: the Administrator’s Perspective, R. Barrett, P. Maglio, E. Kandogan, J. Bailey, 2004
• Autonomic computing seeks to solve the problem of increasingly complex configurations through increased automation
• However, the AC strategy of managing complexity through automation runs the risk of making management harder, because each command becomes more powerful
Autonomic systems should therefore:
• Provide facilities that make rehearsing and planning easy
• Allow administrators to undo changes quickly, making operations (on production or test systems) less risky and therefore easier
• Inform the administrator when undoing a command will not be (easily) possible
• Provide enhanced capabilities for testing complex end-to-end systems, so that administrators can be confident their changes have no unintended consequences
• Provide access to arbitrary levels of configuration detail when needed
Autonomic systems should also:
• Provide a command-line interface (in addition to a GUI)

An Architectural Approach to Autonomic Computing, S. White, J. Hanson, I. Whalley, D. Chess, J. Kephart, 2004
• An autonomic system can be decomposed into 1) interfaces, 2) interactions and 3) design patterns
• A somewhat RFC-style paper with MUST and SHOULD statements about Autonomic Elements (AEs)
MUST examples:
• An AE MUST be self-managing
• An AE MUST handle problems locally whenever possible
• An AE MUST be capable of establishing and maintaining relationships with other autonomic elements
SHOULD examples:
• An AE SHOULD ask for a realistic set of requirements when requesting a service from another element
• An AE SHOULD offer a range of performance, reliability, availability and security levels associated with its service
• An AE SHOULD protect itself against inappropriate service requests and responses

Use of policies
• The use of policies is essential for autonomic systems
• Three policy levels are presented:
1) Action policies (IF condition THEN action)
   • An AE employing action policies MUST measure and/or synthesize the quantities stated in the condition
2) Goal policies ("Response time must not exceed 2 s")
   • AEs employing goal policies MUST possess sufficient modeling or planning capabilities to translate goals into actions
3) Utility function policies (automatically determine the most valuable goal in any situation)
   • AEs employing utility function policies MUST have sophisticated modeling and optimization capabilities to translate utility functions into actions

Interfaces
• Making a system autonomic requires additional interfaces to be added to the system
Monitoring and test interfaces
• Enable an element to be monitored by any other element that has established the appropriate administrative relationship with it
Lifecycle interfaces
• Enable administrative elements to determine the lifecycle state of an element (e.g. starting, paused), to cause a state change, and to determine the lifecycle model that applies to the element
Policy interfaces
• Enable administrative elements to send new policies to an element, and to determine the policies currently in use by the element
Negotiation and binding interfaces
• Permit an element to request a service from other elements, or to offer to provide a service

Relationships
• When an AE has agreed to provide a service to another AE, the two elements have a relationship
• Relationships are typically formed at run time
• Autonomic systems are built from relationships
• A request-response paradigm is used to form relationships

From autonomic elements to autonomic systems
Assembling an autonomic system requires:
1) A collection of AEs that implement the desired function
2) Additional autonomic elements that implement system functions enabling the needed system-level behaviors (infrastructure elements)
3) Design patterns for system self-management
An infrastructure element can be a:
• Registry (provides mechanisms for elements to find one another)
• Sentinel (provides monitoring services to other elements)
• Aggregator (combines two or more existing elements and uses them to provide improved service)
• Broker (facilitates interaction)
• Negotiator (assists elements with complex negotiations)

Towards Requirements-Driven Autonomic Systems Design, A. Lapouchnian, S. Liaskos, J. Mylopoulos, Y. Yu, 2005
There are three basic ways to make a system autonomic:
1) Design the system to support a space of possible behaviors
2) Equip the system with planning and social capabilities so that it can delegate tasks to external software components (agents)
3) Build the system with evolutionary capabilities (like biological systems)
The paper studies the first approach.

Requirements engineering
• Development of a framework for capturing and analyzing stakeholder intentions to generate functional and non-functional requirements

Illustration of requirements engineering: goal model
• Top-level "hard" goal: Schedule meeting, AND-composed of lower-level hard goals
• 4 top-level "softgoals": Good quality schedule, Minimal effort, Minimal disturbances, Accurate constraints
• Lower-level softgoals can be related to higher-level ones by help (+), hurt (-), make (++) or break (--) relationships
• 6 alternative ways to fulfill the goal "Schedule meeting"
• An autonomic system should address all the different ways of fulfilling the top-level goals
• Goal model → Feature model → Component-connector model
• The goal model is integrated into the knowledge of an autonomic element

Architectural Design of a Distributed Application with Autonomic Quality Requirements, D. Weyns, K. Schelfthout and T. Holvoet, 2005
• A reference architecture for situated multi-agent systems (situated MAS) was developed
• This reference architecture was applied to a real-world software system

The architecture
• A situated MAS consists of an environment populated with agents (autonomous entities)
• Intelligence in a situated MAS originates from the interaction between agents, rather than from their individual capabilities
• The architecture has three abstractions: agents, ongoing activities and the environment

High-level model view of the architecture
• The Perception module maps the local state of the environment onto a percept for the agent
• The Consumption module handles the effects of environment changes that affect the agent
• The Decision module is responsible for action selection

The application
• A system in which robots (automated guided vehicles, AGVs) transport loads from one place to another within a warehouse and recharge themselves whenever needed
• Old system: a centralized server controlled the robots
• Main problem: inflexibility; robots cannot adapt to changing situations
• Improvement: robots are agents acting in a MAS
• Drawback: a more complicated system

Module view of the application
• Two kinds of agents: transport agents and AGV agents
• Transport agents are "managers": they determine the priority of the transport, assign transports to AGVs and ensure that the transport succeeds
• AGV agents are responsible for executing the assigned transport

Architecture of the environment
• To cope with the complexity of the environment, it is structured as a layered architecture
• The virtual environment uses a middleware layer that enables agents to communicate with each other
• The virtual environment enables agent routing and prevents collisions:
   • An agent observes a 3-5 meter circle of the virtual environment at a time
   • Within this circle the agent marks the path it is going to use, and removes the mark when leaving the circle
   • This way collisions can be avoided
• Transport agents use the virtual environment to locate AGV agents

A Control Theory Foundation for Self-Managing Computing Systems, Y. Diao, J. Hellerstein, S. Parekh, R. Griffith, G. Kaiser, D. Phung, 2005
• Control theory is used as a way to identify a number of requirements for, and challenges in, building self-managing systems

What does control theory bring to the table in terms of self-management?
• Autonomic computing and control theory have slightly different points of focus:
   • autonomic computing focuses on the specification and construction of management components that interoperate well
   • control theory focuses on analyzing and/or developing components and algorithms so that the resulting system achieves the control objectives
• For example, control theory provides design techniques for determining the parameter values of commonly used control algorithms so that the resulting control system is stable and settles quickly in response to disturbances

Feedback control theory
• Reference input: desired output (as specified by a human)
• Control error: reference input - measured output
• Control input: parameters that affect the behavior of the system
• Disturbance input: affects the control input
• Controller: changes the control input to achieve the reference input
• Measured output: a measurable feature of the system
• Noise input: affects the measured output
• Transducer: transforms the measured output so that it can be compared with the reference input

Properties of control systems (SASO)
• Stable: a bounded input produces a bounded output; unstable systems are not usable in mission-critical work
• Accurate: the measured output converges to the reference (desired) input
• Short settling times: the system converges to the steady-state value quickly
• No overshoot: the system achieves its objectives in a steady manner

Control analysis and design
• Transfer functions and the Z-transform are used to model and control response times and settling times
[Figure: control input MaxUsers u(k) → Notes Server → actual RIS y(k)]
• Model of system dynamics: $y(k+1) = a_1 y(k) + b_1 u(k)$
• Transfer function: $N(z) = \frac{b_1}{z - a_1}$, with $a_1 = 0.43$ and $b_1 = 0.47$

Example: control theory approach to web server management
• Objective: CPU utilization < 50%
• Measured output: CPU utilization
• Control input: "MaxClients"
• During the first 300 s, the system operates without feedback control. When the controller is turned on, a reference input of 0.5 is used. At this point the system begins to oscillate, and the amplitude of the oscillations increases. This results from a controller design that overreacts to the stochastic variation in the CPU utilization measurement.

<username>, I Need You! Initiative and Interaction in Autonomic Systems, P. Kaminski, P. Agrawal, H. Kienle, H. Müller, 2005

Autonomic job requirements
• "If I hired a person instead, what qualities would I look for?"
• Attention to detail, strong communication skills, initiative tempered by job boundaries, self-knowledge, and willingness to seek help
• Treat users as partners, not masters

Basic idea
• The system has an optimization engine that decides whether the preferred course of action in a given situation is to 1) contact a human or 2) try to repair the system itself
• The decision is based on 1) explicit instructions and 2) learning
• Balance match, bother, rush and risk
• The system learns from human actions and becomes more competent at solving problems on its own

Balance initiative and interaction
• Send messages via e-mail, instant messenger, etc.
• The human (operator) is added to the traditional autonomic computing cycle
[Figure: an autonomic interaction manager attached to the Monitor, Analyze, Plan, Execute loop around shared Knowledge, with "ask for help" and "receive advice" links to the human]
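For the hot-swapping paper (Appavoo et al.), the two mechanisms can be sketched in Python under stated assumptions. This is not K42's implementation: `SwappableRef`, `SmallFileStore`, `LargeFileStore`, `CallCounter` and the 4 KB switch-over threshold are all hypothetical. Clients hold an indirection reference. A small-file object replaces itself once its data outgrows the threshold (autonomous replacement), and a counting wrapper can be swapped in front of any implementation (interposition).

```python
from __future__ import annotations


class SwappableRef:
    """Indirection that clients hold; the object behind it can be replaced at run time."""

    def __init__(self) -> None:
        self.target = None

    def hot_swap(self, new_target) -> None:
        # Replacement: later calls through this reference reach the new object.
        self.target = new_target

    def __getattr__(self, name):
        # Called only for names not found on the reference itself: forward them.
        return getattr(self.target, name)


class SmallFileStore:
    """Hypothetical implementation tuned for small files: one contiguous buffer."""

    LIMIT = 4 * 1024  # assumed switch-over point (4 KB)

    def __init__(self, ref: SwappableRef) -> None:
        self._ref = ref
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data
        if len(self._buf) > self.LIMIT:
            # Autonomous trigger: the object requests its own replacement,
            # handing its state over to the large-file implementation.
            self._ref.hot_swap(LargeFileStore.from_bytes(bytes(self._buf)))

    def size(self) -> int:
        return len(self._buf)


class LargeFileStore:
    """Hypothetical implementation for large files: data kept as a list of chunks."""

    CHUNK = 4 * 1024

    def __init__(self) -> None:
        self._chunks: list[bytes] = []
        self._size = 0

    @classmethod
    def from_bytes(cls, data: bytes) -> LargeFileStore:
        store = cls()
        store.write(data)
        return store

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), self.CHUNK):
            self._chunks.append(data[i:i + self.CHUNK])
        self._size += len(data)

    def size(self) -> int:
        return self._size


class CallCounter:
    """Interposed object: counts calls, then delegates to the wrapped implementation."""

    def __init__(self, inner) -> None:
        self.inner = inner
        self.calls = 0

    def __getattr__(self, name):
        attr = getattr(self.inner, name)
        if not callable(attr):
            return attr

        def counted(*args, **kwargs):
            self.calls += 1
            return attr(*args, **kwargs)

        return counted
```

For example, after `ref = SwappableRef()` and `ref.hot_swap(SmallFileStore(ref))`, writing 5,000 bytes through `ref` leaves `ref.target` as a `LargeFileStore`. Swapping in `counter = CallCounter(ref.target)` adds monitoring, and swapping `counter.inner` back removes it, so the counting cost is paid only while the interposer is installed.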
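Schanne et al. generate proxies and wrappers at the Java bytecode level with BCEL, so no source changes are needed. The Python sketch below only imitates the run-time behavior: a registry keeps the latest version of each object, and a proxy forwards every access to that version. `Registry`, `Proxy`, the counter classes, and the field copying in `update` are all assumptions. The field copying is a simple stand-in for the paper's wrapper-based variable synchronization.

```python
from __future__ import annotations


class Registry:
    """Hypothetical registry holding the latest version of each managed object."""

    def __init__(self) -> None:
        self._latest: dict[str, object] = {}

    def register(self, key: str, obj: object) -> Proxy:
        self._latest[key] = obj
        return Proxy(self, key)

    def update(self, key: str, new_version: object) -> None:
        # Stand-in for wrapper-based variable synchronization (an assumption):
        # copy the old instance's fields into the new version before switching.
        old = self._latest[key]
        for name, value in vars(old).items():
            setattr(new_version, name, value)
        self._latest[key] = new_version

    def latest(self, key: str) -> object:
        return self._latest[key]


class Proxy:
    """Forwards attribute reads and writes to the latest registered version."""

    def __init__(self, registry: Registry, key: str) -> None:
        # Bypass our own __setattr__ so these two fields live on the proxy itself.
        object.__setattr__(self, "_registry", registry)
        object.__setattr__(self, "_key", key)

    def __getattr__(self, name):
        return getattr(self._registry.latest(self._key), name)

    def __setattr__(self, name, value):
        setattr(self._registry.latest(self._key), name, value)


class CounterV1:
    def __init__(self) -> None:
        self.count = 0

    def tick(self) -> int:
        self.count += 1
        return self.count


class CounterV2:
    """A newer version with different behavior; it keeps the same field name."""

    def __init__(self) -> None:
        self.count = 0

    def tick(self) -> int:
        self.count += 10
        return self.count
```

A client holding `counter = reg.register("counter", CounterV1())` keeps using the same proxy after `reg.update("counter", CounterV2())`. The next `counter.tick()` runs the new code on the carried-over `count` value, and the program is never restarted.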
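One way to read the undo recommendations of Barrett et al. is as a command history in which every command carries its own inverse, and commands without an inverse are flagged before they run. The names `Command` and `ConfigShell` and the warning text are hypothetical. A real tool would ask the administrator for confirmation rather than only recording a warning.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Command:
    description: str
    do: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # None marks an irreversible command


class ConfigShell:
    """Hypothetical admin shell: keeps an undo history and flags irreversible steps."""

    def __init__(self) -> None:
        self.history: list[Command] = []
        self.warnings: list[str] = []

    def run(self, cmd: Command) -> None:
        if cmd.undo is None:
            # Tell the administrator up front that this step cannot be rolled back.
            self.warnings.append(f"'{cmd.description}' cannot be undone")
        cmd.do()
        self.history.append(cmd)

    def undo_last(self) -> bool:
        """Undo the most recent command; refuse if there is none or it is irreversible."""
        if not self.history or self.history[-1].undo is None:
            return False
        self.history.pop().undo()
        return True
```

Because `undo_last` refuses to step back past an irreversible command, the administrator learns immediately that earlier changes can no longer be rolled back automatically.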
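The three policy levels of White et al. differ in how much reasoning the element must do. The Python sketch below contrasts them on a hypothetical scaling decision. The performance model `predicted_response_time`, the server counts and the utility function are invented for illustration and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


def predicted_response_time(load: float, servers: int) -> float:
    # Hypothetical model: 0.5 s of work per unit of load, shared evenly by the servers.
    return 0.5 * load / servers


@dataclass
class ActionPolicy:
    """IF condition THEN action: the element only has to measure the condition's quantity."""
    condition: Callable[[dict], bool]
    action: str

    def decide(self, measured: dict) -> Optional[str]:
        return self.action if self.condition(measured) else None


@dataclass
class GoalPolicy:
    """A desired state (response time <= max_rt); a model translates it into an action."""
    max_rt: float

    def decide(self, load: float, max_servers: int) -> Optional[int]:
        for servers in range(1, max_servers + 1):
            if predicted_response_time(load, servers) <= self.max_rt:
                return servers  # fewest servers that satisfy the goal
        return None  # goal unreachable with the available resources


@dataclass
class UtilityPolicy:
    """A utility function over states; the element optimizes it over candidate actions."""
    utility: Callable[[float, int], float]

    def decide(self, load: float, max_servers: int) -> int:
        return max(range(1, max_servers + 1),
                   key=lambda s: self.utility(predicted_response_time(load, s), s))
```

With a load of 12, an action policy such as `ActionPolicy(lambda m: m["response_time"] > 2.0, "add_server")` only reacts to a measured response time above 2 s. `GoalPolicy(max_rt=2.0)` uses the model to pick the fewest servers that meet the goal (3). A utility policy with the hypothetical utility `10 / (1 + rt) - s` trades response time against server cost and settles on 2.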
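The AND/OR structure of a goal model, and the use of softgoal contributions to choose among alternatives (Lapouchnian et al.), can be sketched as follows. The decomposition of "Schedule meeting" into four leaf tasks is a hypothetical simplification and is not the paper's model, which has six alternatives. Only the softgoal names and the make/help/hurt/break labels come from the slides. The numeric weights (+2, +1, -1, -2) are an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import product

CONTRIBUTION = {"make": 2, "help": 1, "hurt": -1, "break": -2}  # assumed numeric scale


@dataclass
class Goal:
    name: str
    kind: str = "leaf"  # "and", "or" or "leaf"
    children: list[Goal] = field(default_factory=list)


def alternatives(goal: Goal) -> list[frozenset[str]]:
    """Each alternative is the set of leaf goals that together fulfill `goal`."""
    if goal.kind == "leaf":
        return [frozenset([goal.name])]
    child_alts = [alternatives(c) for c in goal.children]
    if goal.kind == "or":
        return [alt for alts in child_alts for alt in alts]  # any one child suffices
    # "and": choose one alternative per child and combine them
    return [frozenset().union(*combo) for combo in product(*child_alts)]


def score(alt, contributions, weights) -> float:
    """Weighted sum of the leaf goals' contributions to the softgoals."""
    return sum(weights.get(soft, 1.0) * CONTRIBUTION[label]
               for leaf in alt
               for soft, label in contributions.get(leaf, {}).items())


def best(alts, contributions, weights):
    return max(alts, key=lambda a: score(a, contributions, weights))


# Hypothetical decomposition; leaf task names are illustrative only.
schedule = Goal("Schedule meeting", "and", [
    Goal("Collect timetables", "or",
         [Goal("Ask participants by e-mail"), Goal("Read shared calendars")]),
    Goal("Choose schedule", "or",
         [Goal("Choose manually"), Goal("Choose automatically")]),
])
contributions = {
    "Ask participants by e-mail": {"Minimal disturbances": "hurt", "Accurate constraints": "help"},
    "Read shared calendars": {"Minimal disturbances": "help", "Minimal effort": "help"},
    "Choose manually": {"Minimal effort": "hurt", "Good quality schedule": "help"},
    "Choose automatically": {"Minimal effort": "make"},
}
```

With equal softgoal weights, reading shared calendars and choosing automatically scores best. Raising the weight of "Accurate constraints" to 5 makes asking participants by e-mail the preferred alternative. Re-running this selection when priorities change is one way an autonomic element could move through its space of alternative behaviors.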
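The collision-avoidance scheme in the AGV application of Weyns et al. (mark the path inside the observation circle, unmark it when leaving) can be sketched as an all-or-nothing reservation on a shared map. The grid of 1 m cells, the `VirtualEnvironment` class and its methods are assumptions for illustration. They are not the middleware's actual API.

```python
from __future__ import annotations

import math

Cell = tuple[int, int]


class VirtualEnvironment:
    """Hypothetical shared map of path reservations (grid cell -> agent id)."""

    def __init__(self) -> None:
        self.reserved: dict[Cell, str] = {}

    def try_reserve(self, agent: str, path: list[Cell]) -> bool:
        # All-or-nothing: refuse if any cell on the path is held by another agent.
        if any(self.reserved.get(c, agent) != agent for c in path):
            return False
        for c in path:
            self.reserved[c] = agent
        return True

    def release(self, agent: str, cells: list[Cell]) -> None:
        # Remove only this agent's marks, e.g. for cells it has left behind.
        for c in cells:
            if self.reserved.get(c) == agent:
                del self.reserved[c]


def visible_part(position: Cell, route: list[Cell], radius: float) -> list[Cell]:
    """Cells of the planned route that lie inside the agent's observation circle."""
    return [c for c in route if math.dist(position, c) <= radius]
```

If `agv1` has marked the cells (0, 0) to (3, 0), a request by `agv2` whose path crosses (2, 0) is refused until `agv1` releases its marks, so the two vehicles never hold the same cell at the same time.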
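The first-order model from Diao et al. can be closed with a feedback controller to show why controller gain matters. The integral control law $u(k) = u(k-1) + K_I\,e(k)$ used here is an assumption, not the controller used in the paper. With $e(k) = r - y(k)$, substituting into $y(k+1) = a_1 y(k) + b_1 u(k)$ gives the closed-loop characteristic polynomial $z^2 - (1 + a_1 - b_1 K_I)\,z + a_1$. The Jury stability conditions then reduce to $0 < K_I < (2 + 2a_1)/b_1 \approx 6.09$ for $a_1 = 0.43$, $b_1 = 0.47$. The numbers come from the Notes server model. They are reused only to illustrate the overreacting-controller behavior described for the web server example.

```python
A1, B1 = 0.43, 0.47  # model from the slides: y(k+1) = A1*y(k) + B1*u(k)


def simulate(k_i: float, reference: float, steps: int) -> list[float]:
    """Closed loop with an assumed integral controller u(k) = u(k-1) + k_i * e(k)."""
    y, u = 0.0, 0.0
    outputs = []
    for _ in range(steps):
        error = reference - y  # control error = reference input - measured output
        u += k_i * error       # integral action accumulates the error
        y = A1 * y + B1 * u    # plant update from the first-order model
        outputs.append(y)
    return outputs


def is_stable(k_i: float) -> bool:
    """Jury test on the closed-loop polynomial z^2 + c1*z + c0."""
    c1, c0 = -(1 + A1 - B1 * k_i), A1
    return abs(c0) < 1 and 1 + c1 + c0 > 0 and 1 - c1 + c0 > 0
```

With $K_I = 1$ the output settles at the reference. With $K_I = 7$ (outside the stable range), the dominant closed-loop pole is about -1.59, so the output alternates in sign with growing amplitude. This is the same qualitative failure as a controller that overreacts to noisy measurements.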
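Kaminski et al. describe an engine that balances match, bother, rush and risk, uses explicit instructions first, and learns from human actions. The sketch below shows that shape only. The decision rule compares the risk of a poor autonomous repair with the cost of interrupting the human and waiting for an answer. This rule, the learning step, and all names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InteractionManager:
    """Hypothetical decision engine: ask a human, or try to repair autonomously?"""
    explicit: dict[str, str] = field(default_factory=dict)      # problem -> "ask" | "repair"
    competence: dict[str, float] = field(default_factory=dict)  # learned match in [0, 1]
    learning_rate: float = 0.25

    def decide(self, problem: str, risk: float, bother: float, rush: float) -> str:
        if problem in self.explicit:                 # 1) explicit instructions win
            return self.explicit[problem]
        match = self.competence.get(problem, 0.0)    # 2) learned competence
        cost_of_acting = risk * (1.0 - match)        # poor repair is likelier when match is low
        cost_of_asking = bother + rush               # interrupting the human and waiting
        return "ask" if cost_of_acting > cost_of_asking else "repair"

    def learn_from_human(self, problem: str) -> None:
        """Each time a human shows how to fix `problem`, confidence grows (capped at 1)."""
        current = self.competence.get(problem, 0.0)
        self.competence[problem] = min(1.0, current + self.learning_rate)
```

For an unfamiliar, risky problem the manager asks for help. After it has watched a human resolve the same problem a few times, the identical situation is handled autonomously, which mirrors the slide's point that the system becomes more competent on its own.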