IBM Research

Research Challenges in Autonomic Computing

Jeff Kephart

IBM Research

kephart@us.ibm.com

www.research.ibm.com/autonomic

© 2003 IBM Corporation

Outline

• Background and Motivation
• Autonomic Computing Research at IBM
  • Architecture
  • Overview of Research Program
• Autonomic Computing Research Challenges
• Conclusions

Research Challenges in Autonomic Computing | EPSRC e-Science Meeting, March 26, 2004 | © 2003 IBM Corporation

Background and Motivation (Kephart)

• My role in autonomic computing
  • My group does research on agents and multi-agent systems
    – Architecture, communication, negotiation, machine learning
  • AC research strategy; joint program manager
  • University relations; faculty awards, equipment grants
  • Co-chair, International Conference on Autonomic Computing 2004
• What I hope to achieve here
  • Explore overlaps between research interests of the e-Science and AC communities
  • Explore potential collaborations

Complex heterogeneous infrastructures are a reality!

[Figure: infrastructure diagram (DNS server, Web, and application tiers) annotated "Dozens of systems and applications", "Hundreds of components", "Thousands of tuning parameters"]

Autonomic Computing: Motivation

• Individual system elements are increasingly difficult to maintain and operate
  • 100s of configuration and tuning parameters for commercial databases, servers, storage
• Heterogeneous systems are becoming increasingly connected
  • Integration is becoming ever more difficult
• Architects can't intricately plan component interactions
  • Systems are increasingly dynamic, and more frequently include unanticipated components
• This places a greater burden on system administrators, but
  • they are already overtaxed
  • they are already a major source of cost (6:1 for storage) and error
• We need self-managing computing systems
  • Behavior specified by system administrators via high-level policies
  • System and its components figure out how to carry out policies

Facets of Self-Management

Self-Configure
  The Human-Intensive Present: Corporate data centers are multivendor, multi-platform. Installing, configuring, integrating systems is time-consuming, error-prone.
  The Autonomic Future: Automated configuration of components, systems according to high-level policies; rest of system adjusts seamlessly.

Self-Heal
  The Human-Intensive Present: Problem determination in large, complex systems can take a team of programmers weeks.
  The Autonomic Future: Automated detection, diagnosis, and repair of localized software/hardware problems.

Self-Optimize
  The Human-Intensive Present: Web servers, databases have hundreds of nonlinear tuning parameters; many new ones with each release. Adjusted manually.
  The Autonomic Future: Components and systems will continually strive to improve their own performance and efficiency.

Self-Protect
  The Human-Intensive Present: Manual vulnerability analysis. Manual detection and recovery from attacks, cascading failures.
  The Autonomic Future: Automated defense against malicious attacks or cascading failures; use early warning to anticipate and prevent system-wide failures.

Business case: Increased resiliency, responsiveness, efficiency, ROI; reduced down-time, risk, time-to-value, cost


Evolving towards Autonomic Computing Systems


Outline

• Architecture
• AI Research Challenges
• Conclusions


Autonomic Computing Architecture

The Autonomic Element

• AEs are the basic atoms of autonomic systems
• An AE contains
  • Exactly one autonomic manager
  • Zero or more managed elements
• An AE is responsible for
  • Managing its own behavior in accordance with policies
  • Interacting with other autonomic elements to provide or consume computational services

[Figure: An Autonomic Element. An Autonomic Manager (Monitor, Analyze, Plan, Execute around shared Knowledge) sits on top of a Managed Element; both expose S and E interfaces (sensors and effectors). Callouts: Service-oriented architecture; Software agents.]

Examples of autonomic elements: software app, workload mgr, sentinel, arbiter, OGSA infrastructure elements
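The autonomic element described above can be illustrated with a minimal sketch of the Monitor, Analyze, Plan, Execute loop over shared knowledge. Everything concrete here is an assumption made for illustration: the `ManagedElement` and `AutonomicManager` class names, the `max_load` policy key, the `add_worker` action, and the toy load model are hypothetical and are not taken from the IBM toolset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ManagedElement:
    """Hypothetical managed resource with a sensor (S) and an effector (E)."""
    load: float = 0.9   # fraction of capacity in use (assumed toy model)
    workers: int = 2

    def sense(self) -> Dict[str, float]:
        # Sensor interface: report current state.
        return {"load": self.load, "workers": float(self.workers)}

    def effect(self, action: str) -> None:
        # Effector interface: apply an action chosen by the manager.
        if action == "add_worker":
            # Same total work spread over one more worker.
            self.load = self.load * self.workers / (self.workers + 1)
            self.workers += 1


@dataclass
class AutonomicManager:
    """One manager per element: Monitor -> Analyze -> Plan -> Execute."""
    element: ManagedElement
    policy: Dict[str, float]  # high-level policy, e.g. {"max_load": 0.7}
    knowledge: List[Dict[str, float]] = field(default_factory=list)

    def monitor(self) -> Dict[str, float]:
        reading = self.element.sense()
        self.knowledge.append(reading)  # keep history as shared knowledge
        return reading

    def analyze(self, reading: Dict[str, float]) -> bool:
        # True when the policy is violated.
        return reading["load"] > self.policy["max_load"]

    def plan(self, violated: bool) -> List[str]:
        return ["add_worker"] if violated else []

    def execute(self, actions: List[str]) -> None:
        for action in actions:
            self.element.effect(action)

    def step(self) -> List[str]:
        actions = self.plan(self.analyze(self.monitor()))
        self.execute(actions)
        return actions


element = ManagedElement()
manager = AutonomicManager(element, policy={"max_load": 0.7})
first = manager.step()    # policy violated, so the manager adds a worker
second = manager.step()   # load is now within policy, so nothing to do
```

The administrator only supplies the policy; the manager decides which actions carry it out, which is the division of labor the slide describes.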


Autonomic Computing Architecture

Element interactions

• System self-* properties and behavior arise from interactions among autonomic managers
• Interactions are
  • Dynamic, ephemeral
  • Formed by (negotiated) agreement
  • Flexible in pattern; determined by policies
  • Based on OGSI -> WSRF, …? and specific AC extensions

A multi-agent system!


Example scenario: Autonomic Data Center

[Figure: Autonomic Data Center. Components shown: System Manager, Resource Arbiter, Policy Repository, Registry, and two Application Environments, each with an Application Manager, Database, Router, Server, and Storage, and each driven by external Demand. Labels: resource-level utility, service-level utility.]


Overview of IBM’s Autonomic Computing Research Program

• 100-150 researchers working on various aspects of Autonomic Computing
  • Some projects predate AC initiative; now trying to realign them with AC architecture
• Technologies for specific autonomic elements
  • Database, storage, network, server, client…
• Generic element technologies for autonomic elements
  • Autonomic Manager Toolset integrates many element-level technologies
    – Modeling, analysis, forecasting, optimization, planning, feedback control, etc.
  • Uses Open Grid Services Architecture standards for inter-element communication
  • Available (with ETTK v1.1) on www.alphaworks.ibm.com; open source later
• Generic system-level technologies
  • Dependency management, problem determination and remediation, workload management, provisioning, …
• System scenarios and prototypes
  • Small- to medium-scale autonomic systems
  • Demonstrate self-* arising from AC architecture + technology
  • Identify gaps, necessary modifications



Autonomic Manager ToolSet

W. Arnold et al., Watson

• Facilitates autonomic manager construction
  • In accordance with AC architecture
• Catcher for generic AM technologies
  • OGSI (Globus 3.0 beta) -> WSRF
  • Policy tools
  • Monitoring standards and technologies
  • AI tools for knowledge representation, reasoning, planning
  • Math libraries for modeling, optimization
  • Feedback control
• AMTS V1.0 available as part of Emerging Technologies Toolkit v1.1 on IBM alphaWorks (www.alphaworks.ibm.com)
• Considering open source
  • Should we think about OMII?

[Figure: autonomic element diagram (Monitor, Analyze, Plan, Execute, Knowledge; S and E interfaces; Managed Element)]



Dependency Mgt & Self-Healing

G. Kar, Watson and H. Lee & S. Ma, Watson

• Determine functional dependencies among elements
  • Mine design docs, system config metadata, log files
  • Actively probe running system
• Use dependency information for system configuration, healing
• Problem management lifecycle
  • Monitor -> Detect -> Localize -> Repair -> Learn

[Figure: probes feed an Analysis & Control module that monitors a Router, Web Server, App Server, and DB Server. A 0/1 Dependency Matrix relates the probes (pWS, pAS, pDBS, pingR, pingWS, pingAS, pingDBS) to the components (WS, AS, DBS, R, HWS, HAS, HDBS).]
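The "Localize" step of the lifecycle above can be sketched as a lookup against a probe-to-component dependency matrix. This is a minimal sketch under stated assumptions, not the method used in the project: the probe and component labels are borrowed from the figure, but the matrix entries below are invented for illustration, the `localize` function is hypothetical, and the logic assumes a single faulty component.

```python
from typing import Dict, Set

# Hypothetical dependency matrix: each probe maps to the components it
# exercises. Labels follow the figure; the entries are assumed, not recovered.
DEPENDENCIES: Dict[str, Set[str]] = {
    "pingR": {"R"},
    "pingWS": {"R", "HWS"},
    "pWS": {"R", "HWS", "WS"},
    "pAS": {"R", "HWS", "WS", "HAS", "AS"},
}


def localize(deps: Dict[str, Set[str]], results: Dict[str, bool]) -> Set[str]:
    """Return components that could explain the observed probe failures.

    Under a single-fault assumption, a suspect must be exercised by every
    failed probe and by no probe that succeeded.
    """
    failed = [probe for probe, ok in results.items() if not ok]
    if not failed:
        return set()
    suspects = set.intersection(*(deps[probe] for probe in failed))
    for probe, ok in results.items():
        if ok:
            suspects -= deps[probe]  # a passing probe exonerates its components
    return suspects


probe_results = {"pingR": True, "pingWS": True, "pWS": False, "pAS": False}
suspects = localize(DEPENDENCIES, probe_results)
```

With these assumed entries, the two ping probes succeed and exonerate R and HWS, leaving WS as the only component common to both failed probes.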


Enterprise Workload Management

D. Dillenberger

[Figure labels: Data and Transaction; Internet/Extranet]

• Large, distributed, heterogeneous system
• Achieves end-to-end performance via adaptive algorithms
  • Administrator defines policy
    – Desired response times for various classes of users, apps
  • eWLM managers on each resource cooperate to adaptively tune parameters
    – OS, network, storage, virtual server knobs
    – JVM heap size, number of garbage collection threads
    – Workload balancing, routing parameters
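The idea of adaptively tuning a parameter toward an administrator-defined response-time goal can be illustrated with a simple feedback step. This is only a sketch: the slides do not describe eWLM's algorithms, and the `tune_share` and `simulate` functions, the proportional gain, and the toy plant model (response time = work / share) are all assumptions.

```python
def tune_share(share: float, measured_rt: float, goal_rt: float,
               gain: float = 0.5, lo: float = 0.05, hi: float = 1.0) -> float:
    """One proportional feedback step on a hypothetical resource share.

    The share grows when the measured response time exceeds the goal and
    shrinks when the goal is beaten; it is clamped to [lo, hi].
    """
    error = (measured_rt - goal_rt) / goal_rt
    return min(hi, max(lo, share * (1.0 + gain * error)))


def simulate(goal_rt: float, work: float = 0.1, share: float = 0.2,
             steps: int = 20) -> tuple:
    """Run the controller against a toy plant: response time = work / share."""
    for _ in range(steps):
        measured_rt = work / share
        share = tune_share(share, measured_rt, goal_rt)
    return share, work / share


final_share, final_rt = simulate(goal_rt=0.25)
```

In this toy model the share settles near the value that meets the 0.25 s goal. A real system would have to cope with noisy measurements, delayed effects, and many interacting knobs, which is where the cooperating managers come in.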



Human Interaction with Autonomic Systems

P. Maglio, Almaden

• Basic questions
  • What do middleware administrators do?
  • How can we better support their practices and the problems they face?
• Learn answers to these questions via ethnographic studies
• Use insights to develop new ways to interact with complex computing systems

… but we thought that was the return port!

We had it wrong. Our assumption of how it worked was incorrect.

We start with looking at the proxy server log files, then the web server log files, then the application server admin log files then the application log files.




Outline

• Architecture
• Scenarios
• Autonomic Computing Research Challenges
  • Systems and Software
    – Architecture, software engineering & tools, testing/validation
    – Prototyping a large-scale self-* system
  • Human-Computer Interaction
    – Policies, Interfaces
  • Artificial Intelligence
    – Learning, Negotiation, Self-healing, Emergent Behavior
• Conclusions


Challenge: Architecture

Define set of fundamental architectural principles from which self-* emerges

• AE: How to coordinate multiple threads of activity?
  • AEs live in complex environments
  • Multiple task instances and types
    – concurrent, asynchronous
  • Multiple interacting expert modules
• AE: How to detect/resolve conflicts arising from
  • Internal decisions by independent expert modules
  • External directives (possibly asynchronous)
  • Internal policies vs. external directives
• System-level: Enable more flexible, service-oriented patterns of interaction
  • As opposed to traditional top-down, hierarchical systems management
  • Multi-agent architecture
    – Communication
    – Representing and reasoning about needs, capabilities, dependencies

[Figure: An Autonomic Element (Monitor, Analyze, Plan, Execute, Knowledge; S and E interfaces; Managed Element)]


Challenge: Policy

Policy: “Set of guidelines or directives provided to autonomic element to influence its behavior”

• Human interface
  • Authoring and understanding policies
  • Avoiding or ameliorating specification errors
• Developing a universal representation and grammar
  • Many different application domains, disciplines
  • Many different flavors of policy
  • Covers service agreements too?
• Algorithms that operate upon policies (and agreements?)
  • Automated derivation of actions (e.g. planning, optimization)
  • Automated derivation of lower-level policies from high-level policies
    – E.g. “Maximize profit from this set of service contracts”
• Conflict resolution
  • Both design time and run time
  • Need to establish protocols, interfaces, algorithms

[Figure: autonomic element diagram (Monitor, Analyze, Plan, Execute, Knowledge; S and E interfaces; Managed Element)]


Three flavors of policy (policy = “decision-making guide”)

• Action rule
  • If ($S$) then do $a_2$
  • Results implicitly in desired state $s_2$
• Goal
  • Achieve a most desired state $s_2$
  • Compute $a_2$ most likely to result in $s_2$
  • Assumes that most desired state can be determined a priori
• Utility function
  • Achieve state $s$ with maximal net value $V(s) - C(a_{S \to s})$
  • Benefit and burden of being explicit about value
  • States have intrinsic value; value of policy is a derived quantity

[Figure: from Current State $S$, actions $a_1$, $a_2$, $a_3$ lead to Possible States $s_1$, $s_2$, $s_3$]

[Figure: hierarchy of specification levels rising toward higher-level specifications. Levels shown: Machine code, [More levels of code hierarchy], Workflows, Rules, Actions, Element Goals, Element utility functions, System utility functions. Labels on the steps between levels: Programming; Adapters, Translators; Generative Planning; Decision-theoretic Planning; Optimization; Modeling, Optimization.]

Research Challenges in Autonomic Computing | CMU, September 4, 2003
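The three flavors of policy (action rule, goal, utility function) can be contrasted in a few lines of code. This is a sketch under stated assumptions: the outcome model is deterministic (the slide speaks of the action "most likely" to reach a state), and the `OUTCOME`, `VALUE`, and `COST` tables and all function names are hypothetical values chosen for illustration.

```python
from typing import Optional

# Hypothetical model: from current state S, action a_i leads to state s_i.
OUTCOME = {"a1": "s1", "a2": "s2", "a3": "s3"}
VALUE = {"s1": 5.0, "s2": 8.0, "s3": 9.0}   # V(s), assumed values
COST = {"a1": 1.0, "a2": 2.0, "a3": 6.0}    # C(a), assumed values


def action_rule(state: str) -> Optional[str]:
    """Action rule: 'if (S) then do a2'. The desired state stays implicit."""
    return "a2" if state == "S" else None


def goal_policy(goal_state: str) -> Optional[str]:
    """Goal: the desired state is given; the system computes the action."""
    for action, outcome in OUTCOME.items():
        if outcome == goal_state:
            return action
    return None  # no known action reaches the goal


def utility_policy() -> str:
    """Utility function: pick the action maximizing net value V(s) - C(a)."""
    return max(OUTCOME, key=lambda a: VALUE[OUTCOME[a]] - COST[a])


rule_choice = action_rule("S")
goal_choice = goal_policy("s2")
utility_choice = utility_policy()
```

With these assumed numbers all three flavors select the same action, but only the utility function would adapt automatically if the values or costs changed: s3 has the highest intrinsic value, yet a3 is too costly, so a2 gives the best net value.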


Challenge: Human-System Interface

• Develop new languages, metaphors and translation technologies that enable humans to monitor, visualize, and control AC systems
  • Specify goals and objectives to AC systems, and visualize their potential effect
  • Techniques must be
    – Sufficiently expressive of preferences regarding cost vs. performance, security, risk and reliability
    – Sufficiently structured and/or naturally suited to human psychology and cognition to keep specification errors to an absolute minimum
    – Robust to specification errors


Challenge: Learning

Establish theoretical foundation for understanding and performing learning and optimization in multi-agent systems.

• Single element level
  • AE needs to learn a model of itself and environment quickly; environment is noisy, and dynamic in both state and structure
  • On-line, so exploration of the space can be costly and/or harmful
  • May be several hundreds of tunable parameters!
    – Maybe only a few dozen are relevant, but which ones?
    – Some of them can only be changed upon reboot. Is it worthwhile?
• System level
  • Multi-agent system: several interacting learners
  • What are good learning algorithms for cooperative, competitive systems?
    – What are conditions for stability?
    – What is sensitivity to perturbations?
  • Opportunities for layered learning


Challenge: Negotiation

• Develop and analyze
  • Methods for expressing or computing preferences
  • Negotiation protocols
  • Negotiation algorithms
• Establish theoretical foundation for negotiation
  • Explore conditions under which to apply
    – Bilateral
    – Multi-lateral (mediated, or not)
    – Supply-chain
  • Study how system behavior depends on mixture of negotiation algorithms in AE population


Challenge: Control and Harness Emergent Behavior

• Understand, control, and exploit emergent behavior in autonomic systems
  • How do self-*, stability, etc. depend on
    – Behaviors and goals of the autonomic elements
    – Pattern and type of interactions among AEs
    – External influences and demands on system
  • Invert relationship to attain desired global behavior
    – How?
    – Are there fundamental limits?
• Develop theory of interacting feedback loops
  • Hierarchical
  • Distributed


Outline

• Architecture
• Scenarios
• Overview of Research Program
• Research Challenges
• Conclusions


Conclusions

• Autonomic Computing is a grand challenge, requiring advances in several fields of science and technology
  • Policy, planning, learning, knowledge representation, multi-agent systems, negotiation, emergent behavior
  • Human-system interfaces
• Integrating these technologies to support self-management in complex, realistic environments is a research challenge in itself
  • What are the best architectures and design patterns? Role of (multi-)agent systems?
  • Building system prototypes is key to developing and validating AC technology and architecture
• The e-Science community is facing many of the same challenges
  • Which ones are you most interested in tackling?
  • How might we collaborate?
    – AMTS in OMII?
    – Conferences (come to ICAC ’04 in NYC May 17-18)
    – Encouragement from EPSRC?
    – Seek effective collaborations with IBM Researchers


Additional Information

• A Vision of Autonomic Computing
  • IEEE Computer, January 2003
• IBM Systems Journal special issue on Autonomic Computing
  • http://www.research.ibm.com/journal/sj42-1.html
• Web site
  • www.research.ibm.com/autonomic
• International Conference on Autonomic Computing
  • www.autonomic-conference.org
  • May 17-18, New York City
  • Submission deadline: January 12, 2003


Backup Slides

