RESEARCH

Peeking into Cloud for better Application Manageability
Sambit Sahu
IBM Research
© 2009 IBM Corporation

Cloud Virtualizes Resources (cpu, mem, network, storage..)

Cloud simplifies resource access through virtualization
– Users get easy and inexpensive access to virtualized resources
– Lower entry barrier with pay as you use
– On-demand elastic resource access

Implications for managing applications hosted on the Cloud
– Decoupling between underlying resources and the application management layer
– Users may not even have the ability to control application placement or deployment
– Typically the Cloud provider and the Service Provider or Users belong to different organizations

Challenges
– How to efficiently manage applications when there is
  – No physical resource access or control due to the virtualization layer
  – No or limited visibility into “what-is-going-on” at the physical resource level
– What support or primitives need to be made available to facilitate efficient problem resolution?

[Figure: layered stack, top to bottom: Service X and Service Y; Application 1 and Application 2; Application Management Layer; Cloud Resource Provision Layer; Hypervisor Layer; Trusted Virtual Domains; server, network, storage, middleware resources]

Analysis of Open Forum Message Threads of a Large Cloud Provider

Analyzed open forum message threads to classify problems and understand the resolution process
– Collected messages posted on a Cloud open forum over the last three years
– About 8000 unique message threads with about 38000 messages
– Analyzed all messages over a 3-month period and a random sample of messages over the rest of the period
– Major problem categories and classifications
– Nature of resolutions

Objective
– Understand the top problems and the solutions provided by the cloud provider
– Understand the impact of information hiding due to virtualization on problem resolution
– Determine whether any information and/or tooling would help service providers and users with efficient problem resolution/isolation

Common Symptoms and stage based problem classification

Fig 1: Commonly Encountered Symptoms (chart labels: Instance Unreachable, Instance Unbootable, Provisioning Pending)
• Instance reachability problems
  • Virtualization masks failures
• Connectivity issues
  • Infrastructure network problems, firewall rules,…
• Component: Virtual infrastructure issues
  • Management of instance depends on instance
• Instance unbootable

Fig 2: State based problem classification
• Provisioning stage: fewest problems encountered
  • Most management and third-party tools target this stage
• Running stage: most problems encountered
• Terminating stage: surprisingly the second largest contributor of problems
  • Unable to disconnect/connect virtual infrastructure

Some Problem Resolution Statistics

Figure 3: About 97 administrators responded to users via the open forum, and a few administrators answered the majority of the problems (20 administrators accounted for more than 80%)

Figure 4: Resolution Time
• 60% of problems were resolved within a day
• A further 20% of problems took more than 3 additional days

Multiple problems manifest with same symptoms

Typical steps taken to resolve/fix the problems
• Level of support and solutions for resolving problem
• Instance Reboot
• Host Reboot
• Software Fixes
• Manual modification of infrastructure state
• Manual modification of network configuration
  • Firewall
  • Router configuration

Fig 3: Same symptoms for different set of problems

Summary

Layering in a Cloud (virtualization of physical resources) hides crucial information from users
– Multiple problems manifest as the same symptoms

Issues
– What information can and should be shared between the cloud provider and cloud users or service providers?
– How to achieve this peeking without compromising security?

Some initial thoughts on information unlayering
– Topology view
– Dependent resource health
– Resource logs
– Dependent access policies
– …
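
The unlayering ideas in the summary can be made concrete with a small sketch. This is an illustration only, not something the slides specify: `PeekView`, `narrow_causes`, the cause labels, and the resource names are all hypothetical. It assumes the provider exposes a topology view and the health of dependent resources, and shows how that could narrow down which of several problems is behind a single symptom.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical symptom-to-cause table. The labels are illustrative,
# loosely following the problem categories named in the slides.
CANDIDATE_CAUSES: Dict[str, List[str]] = {
    "instance_unreachable": ["firewall_rule", "infrastructure_network", "host_failure"],
    "instance_unbootable": ["virtual_infrastructure", "host_failure"],
}


@dataclass
class PeekView:
    """Hypothetical provider-supplied view of what an instance depends on."""
    # instance id -> ids of the resources it depends on (topology view)
    topology: Dict[str, List[str]] = field(default_factory=dict)
    # resource id -> "ok" or "degraded" (dependent resource health)
    health: Dict[str, str] = field(default_factory=dict)
    # resource id -> cause label from CANDIDATE_CAUSES
    resource_kind: Dict[str, str] = field(default_factory=dict)


def narrow_causes(symptom: str, instance: str, view: PeekView) -> List[str]:
    """Keep only the candidate causes backed by a dependent resource that is not "ok".

    Resources with no reported health are treated as suspect. If the view
    rules nothing in, all candidates are returned unchanged, which is the
    situation a user is in today with no visibility below the instance.
    """
    candidates = CANDIDATE_CAUSES.get(symptom, [])
    suspect_kinds = {
        view.resource_kind[r]
        for r in view.topology.get(instance, [])
        if view.health.get(r) != "ok" and r in view.resource_kind
    }
    narrowed = [c for c in candidates if c in suspect_kinds]
    return narrowed or candidates


# Usage: the same symptom, with and without the extra visibility.
view = PeekView(
    topology={"vm-1": ["fw-1", "sw-1"]},
    health={"fw-1": "ok", "sw-1": "degraded"},
    resource_kind={"fw-1": "firewall_rule", "sw-1": "infrastructure_network"},
)
print(narrow_causes("instance_unreachable", "vm-1", PeekView()))
print(narrow_causes("instance_unreachable", "vm-1", view))
```

With an empty view, every candidate cause remains possible. With topology and health exposed, only the degraded network dependency remains. What such a view may safely reveal across organizations is the open security question raised under Issues.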