“Cloud bursting” on SZTAKI Cloud
Attila Csaba Marosi
Cloud Computing Research Group
MTA SZTAKI LPDS
marosi.attila@sztaki.mta.hu
Summer School on Grid and Cloud Workflows and Gateways 2013

Outline
• Terminology
• Recap: SZTAKI Cloud and LPDS Cloud
• Cloud-Manager
• Cloud bursting definition, scalability in general
• Scaling scenarios @ SZTAKI Cloud
• Summary
• Additional Reading and References

Terminology I.
• Based on deployment model:
  o Public Cloud: “The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.” [3]
  o Private Cloud: “The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise.” [3]
  o Hybrid Cloud: Environment created by the combination of public and private cloud offerings
  o (Community Cloud) [3]

Terminology II.
• Based on location:
  o Internal Cloud: Subset of the Private Cloud model where it is offered by an IT organization to its own business [1] (“on premise” [3]).
  o External Cloud: Not hosted by own organization and offered by a 3rd party. It can be either public or private [1] (“off premise” [3]).
• Point of view of architectural service layers
  o Software as a Service (SaaS)
  o Platform as a Service (PaaS)
  o Infrastructure as a Service (IaaS): Cloud bursting (scaling) at this level

Recap
• SZTAKI Cloud*
  o Institutional IaaS Cloud service by SZTAKI (private, internal)
  o 7 nodes (7*64 cores, 7*256 GB RAM), 2*32 TB storage
  o OpenNebula 3.8.3 based
  o Quotas for users
• LPDS Cloud*
  o Similar, but smaller scale
  o Internal private cloud for LPDS
• Typically we use the LPDS Cloud for internal needs and scale out to SZTAKI Cloud when needed.
* Sándor Ács: “SZTAKI Cloud”. Monday, 1st July @ 12:00.

Definition, scalability
• Cloud Bursting:
  o “Cloud bursting is an application deployment model in which an application runs in a private cloud or data center and bursts into a public cloud when the demand for computing capacity spikes.” [4]
• More generally, however, cloud bursting is a subset of the general scaling-out problem
• It can be split into 2 parts:
  1. The capability to scale out to a cloud to maintain QoS requirements (e.g., for handling short-term spikes in computing capacity demand).
  2. Making the decision of (a) when, (b) how much, (c) how long and (d) where to scale out.

The ability to scale out (to a cloud) + Making the decision
• The ability to scale out: scaling out scenarios (with SZTAKI Cloud), covered in this talk
• Making the decision: auto-scaling techniques, covered in “Cloud bursting from WS-PGRADE/gUSE”, Thursday, 11:00-11:30

Cloud-Manager
• Part of the FCM [5] (“Federated Cloud Management”) Architecture
• We’ll now focus on the Cloud-Manager
  o For FCM, cf. Attila Kertesz: “Cloud Federation Approaches”, @ 11:00 today
• Schedules service calls to VMs and manages VMs
• REST/SOAP Web service interface for service call and VM queues
• The Cloud Resource Manager (CRM) component is responsible for the scaling decision (when, where, …)
• Initially it was intended for scaling services in a single cloud
• We use this component internally for different multi-cloud scaling (bursting) scenarios.

[Figure: FCM architecture with the Generic Meta-Broker Service, the FCM Repository (VAx..VAy) and the Cloud-Manager for Cloud a: service call queue Q1, Service Handler, VM queues VMQx and VMQy, VM Handler, and the VMs VMx1..VMxn and VMy1..VMym]

Cloud-Manager
1. Single queue for incoming service calls (or tasks)
2. Multiple VM queues
   o A different one for each VA and resource combination
   o VM queues can be managed automatically (CRM) or manually
3. Manages the VM lifecycle (EC2 REST API)
4. Performs the scheduling of service calls to resources (Q1→VM)

[Figure: the Cloud-Manager diagram with callouts: 1 at Q1, 2 at the VM queues, 3 at the VM Handler, 4 at the Service Handler]

Scenarios @ SZTAKI
• Source: current infrastructure type (not necessarily cloud based!)
• Destination: target cloud infrastructure type

Source / Destination   Public                                           Private
Private                Private→Public (Scenario A, “Cloud bursting”)    Private→Private (Scenario B)
Volunteer              Volunteer→Public (Scenario C/1)                  Volunteer→Private (Scenario C/2)

Scenario A: Private → Public
• Form a hybrid cloud: when local resources are insufficient, allocate resources from a public cloud provider
• Real-world example: Prezi.com
  o Uses private resources with Amazon EC2 to handle peak traffic
  o Batch processing of tasks
    • Zip files for download, fetch images for presentations, conversion jobs
  o Prezi.com Scale Contest: http://prezi.com/scale/
    • Jobs spend at most 5 seconds in the queue, VMs take 2 minutes to boot, instances are paid by the hour; the goal is to minimize cost while honoring these requirements
• At SZTAKI we have the following possibilities for bursting:
  1. OpenNebula based bursting
  2. Cloud-Manager based bursting
• However, we prefer to use private clouds over public ones: bursting to public clouds is set up as an absolute last resort

OpenNebula: Building a Hybrid Cloud (Scenario A)*
• OpenNebula supports accessing multiple remote providers through the EC2 API, not necessarily just Amazon EC2
• A remote provider appears as a new host in OpenNebula
• Resource limits are set by the administrator for the number and type of instances
• VMs can be started in EC2 or locally
• The VM counterpart at the remote provider is given in the EC2 section of the VM template
• Network connectivity via VPN

OpenNebula: Hybrid Cloud Use Cases*
On-demand Scaling of Computing Clusters
• E.g., elastic execution of a Condor computing cluster
• Dynamic growth of the number of worker nodes to meet demands using EC2
• Private network with NIS and NFS
• EC2 worker nodes connect via VPN
On-demand Scaling of Web Servers
• E.g., elastic execution of the Nginx web server
• The capacity of the elastic web application can be dynamically increased or decreased by adding or removing Nginx instances

* Sándor Ács: “OpenNebula”. Monday, 1st July @ 11:00.
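The hybrid set-up in Scenario A (start VMs locally while there is room, use the remote provider only within administrator-defined limits) can be illustrated with a small placement sketch. This is not OpenNebula code: the `Provider` class, the `place_vms` function, the provider names and the limits are all hypothetical, and the real scheduler considers far more than instance counts.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """A place where VMs can run; names and limits are illustrative only."""
    name: str
    max_instances: int   # administrator-defined cap for this provider
    running: int = 0     # instances currently running there

    def free_slots(self) -> int:
        return max(self.max_instances - self.running, 0)


def place_vms(requested: int, providers: list[Provider]) -> dict[str, int]:
    """Fill providers in preference order; a later provider is used only
    once the earlier ones have no free slots left."""
    placement: dict[str, int] = {}
    remaining = requested
    for provider in providers:
        if remaining == 0:
            break
        count = min(remaining, provider.free_slots())
        if count > 0:
            provider.running += count
            placement[provider.name] = count
            remaining -= count
    if remaining > 0:
        # Requests that exceed every limit are reported, not silently dropped.
        placement["unplaced"] = remaining
    return placement


# Local cloud first, the remote EC2-style provider as the last resort.
providers = [
    Provider("local", max_instances=4, running=3),
    Provider("remote-ec2", max_instances=2),
]
print(place_vms(4, providers))
# prints {'local': 1, 'remote-ec2': 2, 'unplaced': 1}
```

Ordering the list is what encodes the "public cloud as last resort" preference; the caps play the role of the administrator's resource limits.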
Cloud-Manager: multi-cloud (Scenario A)
• Cloud-Manager supports multiple providers through the EC2 REST/SOAP API
  o OpenNebula, OpenStack, Eucalyptus and Amazon EC2
• Primarily for scaling Distributed Computing Infrastructures (DCIs)
• Service calls are bound to VAs
  o Each configured provider must have the counterpart (AMI-ID)
• Network connectivity via VPN when needed

[Figure: Cloud-Manager with two providers: service call queue Q1 and Service Handler, VM queue VMQx for Cloud a and for Cloud b, a Cloud a Handler and a Cloud b Handler, VMs VMx1..VMxn in Cloud a and VMx1..VMxm in Cloud b; VAx and VAy shown as appliances]

Scenario B: Private → Private
• Scale from a private infrastructure to another private infrastructure
  o E.g., scale from your local infrastructure (e.g., private internal) to another academic cloud (e.g., private external)
• Typical use case for us: scaling out from LPDS Cloud to SZTAKI Cloud (however, both can be considered internal clouds)

SZTAKI: Scenario B+A (1/2)
• We primarily scale computing clusters (Condor, BOINC) with Cloud-Manager
  1. We use the LPDS Cloud (private)
  2. Scale out to SZTAKI Cloud (private)
  3. As a last resort, scale out to Amazon EC2 (public)

SZTAKI: Scenario B+A (2/2)
• The master node (1) and the Cloud-Manager (2) are usually hosted on a dedicated resource
• The VPN head (3) must typically be on a node with a public IP
  o We use a patched version of TINC with public key authentication
• The Cloud Resource Manager (4) is responsible for auto-scaling
• New VM instances are created and destroyed through the EC2 REST/SOAP API (5)

[Figure: deployment diagram with callouts 1 to 5 for the components listed above]

Example: Scaling a Condor cluster with Cloud-Manager
1. CM service calls → jobs for Condor
   • Through the REST/SOAP interface (e.g., WS-PGRADE/gUSE)
2. VPN head on a public IP
3. Manager node: Cloud-Manager and Condor Master
4. VAs are deployed at LPDS, SZTAKI, Amazon EC2
   • Contextualization by Cloud-Manager:
     o Key for VPN
     o VPN head public IP
     o Condor Master IP on VPN

[Figure: Condor cluster scaled with Cloud-Manager, with callouts 1 to 4 for the items listed above]
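The Cloud-Manager queue model from this talk (a single queue Q1 for service calls, one VM queue per VA and cloud combination, and scheduling of calls from Q1 to VMs) can be sketched as a toy model. This is only an illustration under assumptions: the class and method names are hypothetical and are not the Cloud-Manager REST/SOAP interface, and preferring clouds in the order LPDS, SZTAKI, EC2 is taken from the scale-out order described for SZTAKI.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VM:
    vm_id: str
    busy: bool = False


@dataclass
class CloudManagerSketch:
    """Toy model, not the real Cloud-Manager: one service-call queue (Q1)
    and one VM queue per (VA, cloud) pair."""
    cloud_order: list[str]                         # preferred clouds first
    q1: deque = field(default_factory=deque)       # (va, call) tuples
    vm_queues: dict = field(default_factory=dict)  # (va, cloud) -> [VM]

    def add_vm(self, va: str, cloud: str, vm: VM) -> None:
        self.vm_queues.setdefault((va, cloud), []).append(vm)

    def submit(self, va: str, call: str) -> None:
        self.q1.append((va, call))

    def _find_idle(self, va: str):
        # Walk the clouds in preference order and return the first idle VM.
        for cloud in self.cloud_order:
            for vm in self.vm_queues.get((va, cloud), []):
                if not vm.busy:
                    return vm, cloud
        return None, None

    def dispatch(self) -> list[tuple[str, str, str]]:
        """Assign queued calls to idle VMs of the matching VA.
        Calls that find no idle VM stay in Q1, in their original order."""
        assigned, waiting = [], deque()
        while self.q1:
            va, call = self.q1.popleft()
            vm, cloud = self._find_idle(va)
            if vm is None:
                waiting.append((va, call))
                continue
            vm.busy = True
            assigned.append((call, cloud, vm.vm_id))
        self.q1 = waiting
        return assigned


cm = CloudManagerSketch(cloud_order=["LPDS", "SZTAKI", "EC2"])
cm.add_vm("condor-worker", "LPDS", VM("lpds-1"))
cm.add_vm("condor-worker", "SZTAKI", VM("sztaki-1"))
for job in ("job-a", "job-b", "job-c"):
    cm.submit("condor-worker", job)
print(cm.dispatch())
# prints [('job-a', 'LPDS', 'lpds-1'), ('job-b', 'SZTAKI', 'sztaki-1')]
print(len(cm.q1))
# prints 1
```

The call left in Q1 is the kind of backlog signal a scaling component could react to by starting another VM; how and when that decision is made is outside this sketch.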
Scenario C: Volunteer → {Public, Private}
• LPDS runs multiple BOINC based volunteer computing projects: SZTAKI Desktop Grid, EDGeS@home
  o People donate their computers’ idle computing cycles to science
  o We do not own the resources
  o We do not have any control over the resources
• These resources are “free”, but not very reliable
  o Jobs might be returned late or go missing
• We burst to clouds to provide reliable computing resources for problematic jobs when needed
  o LPDS → SZTAKI → Academic Clouds → Amazon EC2
• Cf. Jozsef Kovacs: “Integrating clouds with grid systems – the SZTAKI-BOINC experience” @ 11:30

Summary
• Bursting (scaling) consists of the capability + decision making
• In this presentation I showed some scenarios from SZTAKI:
  o Private → {Public, Private}; Volunteer → {Private, Public}
  o OpenNebula and Cloud-Manager based
• The decision making process (i.e., auto-scaling) will be the topic of my presentation on Thursday
  o “Cloud bursting from WS-PGRADE/gUSE”, Thursday, 11:00-11:30

References and Additional reading
[1] S. K. Nair, S. Porwal, T. Dimitrakos, A. J. Ferrer, J. Tordsson, T. Sharif, C. Sheridan, M. Rajarajan and A. U. Khan: Towards secure cloud bursting, brokerage and aggregation. Paper presented at the IEEE European conference on Web Services, 1 Dec 2010 – 3 Dec 2010, Cyprus.
[2] D. McDysan: Cloud Bursting Use Case. IETF. http://tools.ietf.org/html/draft-mcdysansdnp-cloudbursting-usecase-00
[3] National Institute of Standards and Technology (NIST): The NIST Definition of Cloud Computing. September, 2011. http://csrc.nist.gov/publications/nistpubs/800145/SP800-145.pdf
[4] SearchCloudComputing. http://searchcloudcomputing.techtarget.com/definition/cloudbursting
[5] A. Cs. Marosi, G. Kecskemeti, A. Kertesz and P. Kacsuk: FCM: an Architecture for Integrating IaaS Cloud Systems. In Proceedings of The Second International Conference on Cloud Computing, GRIDs, and Virtualization. Rome, Italy. September, 2011.

Thank you! Questions?