The State-Space Approach to Self-Management of Enterprise Systems

Vibhore Kumar, Karsten Schwan, Subu Iyer*, Yuan Chen*, Akhil Sahai*
Georgia Institute of Technology; *Hewlett-Packard Labs

Outline
• Motivation: Enterprise Complexity
• Issues
• Solution Overview
• Policy-Driven Self-Management
• Dynamic SLA Decomposition
• Results
• Future Work

Enterprise Complexity: Some Facts
• From a survey conducted by Forrester Research:
  • Enterprises now devote 80% of their overall IT budget to maintenance and ongoing operations
  • More than half of the 347 participating companies used at least 3 database vendors
  • A major banking-industry client had 18 different travel and expense systems in the organization
• “VP of IT Governance”: says tons about the state of enterprise IT infrastructure

The Complexity Wall
“If we don’t get a handle on complexity, it will stop the expansion”
- Paul Horn, Senior Vice President, IBM Research
“Our enterprise customers are working with enormous complexity”
- Dick Lampman, Former Director, HP Labs

The Complexity Wall @ Worldspan
• Worldspan, one of our industry collaborators, provides services to the travel industry
• One of their airline ticket pricing/availability services is hosted on a farm of 1400 servers
• In 2006 alone, they processed around 9.6 billion messages
• Highly varying request rates and request type mix
• Several behaviors of their system are not well understood:
  • Effects of Ticket Geography
  • Effects of Cache Refresh Time
  • Effects of Time of Day
  • …

To Handle The Complexity…
• One must enable self-management of complex enterprise infrastructures driven by high-level goals

Enterprise Self-Management: The Hurdles
• Enterprise systems are too big
  • The problem of Scale
• It is tough to relate high-level goals to low-level actions
  • The problem of Complex System Modeling
• The operating environment is very dynamic
  • The problem of Dynamism
• Administrators find it hard to trust black-box solutions
  • The problem of Trust & Tractability
Solution Overview: System State-Space
[Figure: an enterprise system whose monitored system variables and monitored component variables form the system state space]
System state space: V = (v1, v2, v3, …, vn)
• Variables of Interest: Vø ⊂ V, e.g. Response-Time, QoI
• Controllable Variables: Vα ⊂ V, e.g. Allocated-Servers, Memory
• The aim is to establish a relation between Vø and Vα under current operating conditions

Simple Automated Operation
• SLO: “Response Time < 10msec”
• Event: SLO Violation
• Condition: Bandwidth=90Mbps, Request Rate=30
• Action: set Allocated Servers to 3
• ƒ: Vα → Vø given V − (Vα ∪ Vø)
[Figure: table of observed system states with columns Allocated Servers (Vα), Bandwidth, Request Rate, and Response Time (Vø)]

Solution Overview: The Function ƒ
• Learn ƒ from observed system states
• But there are problems
  • Different behavior in different sub-spaces
  • Large state space, |V| ≈ 10² to 10³
[Figure: machine learning applied to observed system states (v1, v2, …, vn), with regions labeled CPU Bottleneck and Network Bottleneck]

Solution Overview: The Function ƒ
• We decided to model the system using multiple µ-models: ƒ = {ƒ1, ƒ2, …, ƒn}
• We intelligently partition the set of observed system states
  • Partitions exhibit homogeneous behavior
  • Partitions have a reduced number of relevant variables
[Figure: observed system states (v1, v2, …, vn) split into partitions, with a reduced number of relevant variables in each µ-model]
• Partitioning and µ-modeling solve two problems:
  • The problem of Scale
  • The problem of Complex System Modeling

Solution Overview: µ-Models
• We use a Tree Augmented Naïve Bayes (TAN) classifier to build µ-models
• The model returns the following probability: γ = Pr(Vα | Vdesired)
• Find the assignment of values to variables in Vα that maximizes the probability of moving the system to the desired state

Solution Approach: Dynamism
• As the system keeps running, more system states are generated, which can be incorporated into the µ-models
• µ-models are easier to update than monolithic system models
• As a result of a µ-model update:
  • Policy Invalidation
  • Policy Adaptation
  • New Policies can Result
• This addresses the problem of Dynamism

Solution Approach: Tractability & Trust
• Each self-management action that assigns values to variables in Vα is associated with a probability γ = Pr(Vα | V − Vø)
• An action is taken only when γ > γthreshold
• This can be used to fine-tune self-management
• TANs can be easily understood by administrators

Policy-Driven Self-Management
• SLO: “Response Time < 10msec”
• Event: SLO Violation
• Condition: Bandwidth=90Mbps, Request Rate=30
• Current State: (90,30,12); Goal State: (90,30,9)
• Given the goal state (90,30,9), find the µ-model to use
• Evaluate c : Pr(c | 90,30,9) = max(Pr(ci | 90,30,9)), ci ∈ Vα
• Action: set Allocated Servers to 3
[Figure: table of observed system states with columns Allocated Servers (Vα), Bandwidth, Request Rate, and Response Time (Vø)]

Dynamic SLA Decomposition
• Problem: To determine sub-SLAs for components that lead to SLA conformance
• Sub-SLAs can be thought of as per-component ranges of values for controllable variables
• If each component adheres to its sub-SLA, then the SLA is not violated
• Our techniques can handle SLA decomposition
conformance(SLA1, SLA2, …, SLAn) ⇒ conformance(System SLA)
[Figure: a System-Level SLA decomposed into sub-SLAs SLA1, SLA2, SLA3, SLA4, SLA5]
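The policy evaluation described on the Policy-Driven Self-Management and Tractability & Trust slides (pick the value of a controllable variable that is most probable given the goal state, and act only when γ > γthreshold) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: a plain frequency estimate stands in for the TAN classifier, the observations are made up, and the names `OBSERVED` and `choose_action` are hypothetical.

```python
from collections import Counter

# Hypothetical observed system states (made-up data):
# (allocated_servers, bandwidth_mbps, request_rate, response_time_ms)
OBSERVED = [
    (1, 90, 30, 12),
    (3, 90, 30, 9),
    (3, 90, 30, 9),
    (2, 90, 30, 9),
    (2, 90, 30, 11),
    (3, 60, 20, 7),
]


def choose_action(observed, bandwidth, request_rate, target_rt, gamma_threshold):
    """Pick the server allocation most often seen in the goal state.

    Assumption: an empirical frequency replaces the TAN-based estimate of
    gamma = Pr(allocated_servers = c | goal state).
    Returns (servers, gamma), or None when no action should be taken.
    """
    goal = (bandwidth, request_rate, target_rt)
    # Allocations observed whenever the system was in the goal state.
    matches = [s for (s, bw, rr, rt) in observed if (bw, rr, rt) == goal]
    if not matches:
        return None  # goal state never observed: nothing to learn from
    servers, hits = Counter(matches).most_common(1)[0]
    gamma = hits / len(matches)
    # Act only when the estimate clears the administrator's threshold.
    if gamma > gamma_threshold:
        return servers, gamma
    return None


if __name__ == "__main__":
    # Goal state from the slides: Bandwidth=90, Request Rate=30, Response Time=9.
    print(choose_action(OBSERVED, 90, 30, 9, gamma_threshold=0.5))
    print(choose_action(OBSERVED, 90, 30, 9, gamma_threshold=0.9))
```

With this made-up data, a threshold of 0.5 selects 3 allocated servers, while a threshold of 0.9 suppresses the action. Raising or lowering the threshold is the fine-tuning knob the slides mention.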
Experimental Results: SOA Simulator
[Figure: two panels, Without Self-Management and With Self-Management]

Experimental Results: RUBiS over VMs
[Figure: two panels, Without Self-Management and With Self-Management; annotations: Database Perturbation, Partition Change]

Conclusions & Future Work
• Our techniques are applicable to a variety of enterprise systems
• In our experiments the techniques have proven to be very scalable and accurate
• Monitoring overheads can be reduced by taking inputs about relevant variables from the state-space partitions
• Design & implement techniques that can proactively avoid SLA violations

Thank You!

References
[1] V. Kumar, K. Schwan, S. Iyer, Y. Chen, A. Sahai. The State-Space Approach to SLA-Based Management. In submission to NOMS 2008.
[2] V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan. iManage: Policy-Driven Self-Management for Enterprise-Scale Systems. Middleware 2007.
[3] V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan. Enabling Policy-Driven Self-Management for Enterprise Systems. PBAC 2007, in conjunction with ICAC 2007.
[4] V. Kumar, et al. Implementing Diverse Messaging Models with Self-Managing Properties using IFLOW. ICAC 2006.
