Architecting Your Cloud: Lessons Learned from 100 CloudStack Deployments

Speaker: Shannon Williams, Vice President Market Development, Cloud Platforms
EMEA contact: Olivier Maes, Sr Dir Market Development EMEA, Cloud Platforms
Olivier.maes@citrix.com, twitter: @omaes72

Cloud computing in 10 years
• Computing clouds will have standardized
• Servers/Storage/Networking will be commodities available on demand
• Applications will be designed to leverage distributed computing resources
• Key questions won’t have changed
  ᵒ Application Performance
  ᵒ Application Reliability
  ᵒ Infrastructure Security/Compliance
  ᵒ Operational Costs
Goal: Deliver applications more quickly and more reliably at a fraction of the current cost.

Cloud computing today
• Start-ups and Web companies are achieving the 10-year vision today
  ᵒ Standardizing on big public clouds (Amazon, Softlayer, BT, Terremark, etc.)
  ᵒ Designing applications that can leverage distributed availability zones for reliability
• Enterprises are generally not leveraging cloud computing
  ᵒ Most apps aren’t written for distribution
  ᵒ Security/Compliance concerns over leveraging shared resources
  ᵒ Proven mechanism for delivering apps remains standard
Goal: Provide improved access for developers and operators.

Today’s goal: provide a basic understanding of different cloud architectures
• Outline a process for defining a cloud
• Describe the building blocks used to deploy a computing cloud
• Look at traditional workloads and cloud workloads
• Consider architectures that meet a broad set of requirements

Since 2008 CloudStack has powered hundreds of clouds
• Secure, multi-tenant cloud orchestration platform
  – Turnkey platform for delivering IaaS clouds
  – Hypervisor agnostic
  – Highly scalable, secure and open
  – Complete self-service portal
  – Open source, open standards
  – Deploys on premise or as a hosted solution

Since becoming part of Apache, CloudStack has exploded
“It's just amazing!
In just 3 months, CloudStack has gone directly to the same level as OpenStack is. This is much steeper community growth than I could have predicted (if anyone had asked me for predictions, that is...).”
Source: Cloudstack has proof: Foundations is the way to create a FOSS community
http://openlife.cc/blogs/2012/july/cloudstack-hasproof-foundations-way-create-foss-community

[Diagram: solution stack]
• Use cases: Windows On-Demand, Dev & Test, Disaster Recovery, Bridge & Gateway, BYO Platform
• Layers: Infrastructure, Your Service
• Products: CloudPortal, NetScaler, CloudPlatform (Powered by Apache CloudStack), CloudBridge
• Virtualization: ESX, Hyper-V, XenServer, KVM, OVM
• Resources: Compute, Network, Storage

CloudPortal Delivers Cloud Apps & the Business Logic
• Functional areas: Account Management, Pricing & Billing, Self Service, Cloud Apps, Customer Relationship
• Functions shown: Dashboard, Authentication, Account Self Service, Product Definition, Sales CRM, Usage Reporting, Account Provisioning, Delegated Account Management, Catalog Management, Ticketing / HelpDesk, Messaging, Customer Management, Usage Tracking, Community Forums, Alerts, Account Management, Cloud Management, User Roles, Portal Administration, Billing, Service Status, Payment Processing
• Flexible and Extensible SDK
• CloudPortal Plugins
  ᵒ Content Management: Liferay, Drupal
  ᵒ Customer Relationship: Salesforce.com
  ᵒ Billing: Zuora
  ᵒ Authentication: CAS (LDAP/AD)

Each cloud drives unique requirements
• Service Providers
• Web 2.0
• Enterprise

Architecture definition is a process
IaaS Cloud
1. Define target workloads
2. Determine how that workload will be delivered reliably
3. Determine the necessary functionality and performance
4. Develop your technical architecture
5. Implement your environment

Workload categories give us a starting point
• Traditional Enterprise Applications
• Disaster Recovery
• Software Development, Testing and Maintenance
• Social Media Applications
• Managed IT Services
• Batch processing
• High Performance Computing

Possible to categorize workloads into two sets
Traditional Workload: Reliable hardware, back up entire cloud, and restore for
users when failure happens
Cloud Workload: Tell users to expect failure. Users build apps that can withstand infrastructure failure

Both types of workloads must run reliably in the cloud
Reliability & DR are Workload Specific
[Chart: RPO (Recovery Point Objective) plotted against RTO (Recovery Time Objective), with three workload tiers (1 Regular, 2 Critical, 3 Mission Critical) marked with cost levels from $ to $$]
• Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be determined based on workloads
• Deployment and DR plan should be designed per RPO, RTO requirements
• Different types of workloads will achieve workload reliability in different ways

Workload reliability drives unique requirements
Traditional Workload
• Link Aggregation
• Storage Multi-pathing
• VM HA, Fault Tolerance
• VM Live Migration
Expect reliability. Back up entire cloud. Admin controlled failure handling. Think Server Virtualization 1.0
Cloud Workload
• VM Backup/Snapshots
• Ephemeral Resources
• Chaos Monkey
• Multi-site Redundancy
Expect failure. Design app for failure. Self-service failure handling. Think Amazon Web Services

Other functionality will impact design as well
VM Features
• Resizing
• High Availability
• Cloning
• Monitoring
• Windows Support
• Linux Support
• Naming
• Grouping
• Security
Networking Features
• Dedicated user networks
• Integrated Firewall
• Integrated Load Balancing
• IP Address Management
• Multiple Guest Networks
• VPN Termination
• Intrusion Prevention
Storage Features
• Persistent Storage
• Ephemeral Disk
• Automated Disk Snapshots
• Cloud Storage access
• Disk Monitoring
• Encryption
Template Management
• Master Template Library
• User Template upload
• User ISO upload
• Blank VM creation
• Private templates
• Template migration
Management Features
• Delegated Administration
• Live Migration of VMs
• Live Migration of Storage
• Usage Metering
• User Interface
• Console Access
• MultiHypervisor
• Open-Source
• MultiDatacenter

Every cloud starts with basic building blocks
[Diagram labels: Servers, Storage, Networking; Hypervisor; Resources; Server Clusters; Availability Zones; Clouds]

Two sample zone architectures
• Traditional server virtualization zone
• Amazon-style availability zone

Designing a zone for a traditional workload
• Hypervisor: Feature rich (vSphere, vCenter)
• Storage: SAN
• Networking: L2 VLANs
• Network Services: Load Balancing, PV-LANs
• Multi-tier Apps: Multi-tier VLANs, OVF
[Diagram: vCenter managing three ESXi Clusters, with Enterprise Networking (e.g., VLAN) and Enterprise Storage (e.g., SAN)]
• Can achieve significant reliability for applications running in one zone.
• Reliability of individual nodes is very high.
• All zone storage is replicated to a second storage platform (synchronous or asynchronous).
• In event of failure, images are recovered from second storage array.
• Existing workloads will run reliably.
• Little cost benefit over existing approaches.

Designing a zone for an Amazon-style workload
• Hypervisor: Simple (XenServer)
• Storage: Local, EBS
• Networking: L3, SDN based L2
• Network Services: Elastic IP, Security Groups, ELB, GSLB
• Multi-tier Apps: L3 SDN based VPC, CloudFormation
[Diagram: Amazon-Style Availability Zone built from Server Racks, with Software Defined Networks (e.g., Security Groups, EIP, ELB,...), Elastic Block Storage and Object store]

Object store is critical for Amazon-style cloud
[Diagram: Amazon-Style Cloud with a CloudStack Mgmt. Server, several Availability Zones and shared Object Storage; each Amazon-Style Availability Zone contains Server Racks, Software Defined Networks (e.g., Security Groups, EIP, ELB,...) and Elastic Block Storage]
• Workloads are distributed across availability zones
• No guarantee on zone reliability
• Applications designed to handle node-level failure
• DBs and Templates snapped to object store.
• In event of failure, images are recreated on new availability zone.
• Dramatically less expensive

Cloud Transition – General to Workload specific
Past: General Architecture
• General architecture for any workload
• Limited definitive failure/disaster recovery strategy
• Focused on legacy or cloud app architectures
Today: Traditional-Style and Amazon-Style
• Workload-centric architecture
• Workload-specific failure/disaster recovery
• Separate legacy and cloud app architectures with interoperability

Support for different styles is required
[Diagram: one CloudStack Mgmt. Server managing a Server Virtualization Availability Zone (vCenter, ESXi Clusters, Enterprise Networking (e.g., VLAN), Enterprise Storage (e.g., SAN)) alongside several Availability Zones and Object Storage]

Availability zones will be distributed globally
[Diagram: CloudStack Management Cluster with zones in San Jose, London, Delhi, Miami, Rio and Tokyo; some zones are marked Hosted]

Availability zones are becoming on-demand
On Premise
• Private Cloud (Enterprise Data Center)
  ᵒ Dedicated resource
  ᵒ Total control/security
  ᵒ Internal network
• Managed Private Cloud (Enterprise Data Center, 3rd party operated)
Hosted
• Hosted Private Cloud (3rd party hosted & operated)
  ᵒ 3rd party owned and operated
  ᵒ SLA bound
  ᵒ Security
  ᵒ Dedicated resource
• Federated/Hybrid Cloud Services
  ᵒ Mix of shared and dedicated resources
  ᵒ Shared facility and staff
  ᵒ VPN access
• Public Cloud Services (multi-tenant users)
  ᵒ Shared resources
  ᵒ Elastic scaling
  ᵒ Pay as you go
  ᵒ Public internet

Key takeaways
1. Understand your workload and the type of cloud you want to build.
2. Consider the services you will be delivering from the cloud in the future.
3. Choose a platform and architecture that is flexible enough to support you today and in the future.

Work better. Live better.
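
The first takeaway, matching the workload to the type of zone, can be written down as a small lookup. The sketch below is illustrative only and is not part of CloudStack: the names `WorkloadStyle`, `ZoneProfile`, `ZONE_PROFILES` and `pick_zone_profile`, and the single yes/no decision rule, are assumptions made for this example. The building-block values are taken from the two sample zone slides in this deck.

```python
from dataclasses import dataclass
from enum import Enum


class WorkloadStyle(Enum):
    TRADITIONAL = "traditional"  # expects reliable infrastructure
    CLOUD = "cloud"              # designed to survive infrastructure failure


@dataclass(frozen=True)
class ZoneProfile:
    hypervisor: str
    storage: str
    networking: str
    failure_handling: str


# Building blocks as named on the two sample-zone slides (hypothetical structure).
ZONE_PROFILES = {
    WorkloadStyle.TRADITIONAL: ZoneProfile(
        hypervisor="vSphere (feature rich)",
        storage="SAN",
        networking="L2 VLANs",
        failure_handling="admin controlled; recover images from second storage array",
    ),
    WorkloadStyle.CLOUD: ZoneProfile(
        hypervisor="XenServer (simple)",
        storage="local disk plus object store",
        networking="L3 with software defined networks",
        failure_handling="self-service; recreate images on a new availability zone",
    ),
}


def pick_zone_profile(app_tolerates_node_failure: bool) -> ZoneProfile:
    """Assumed rule: apps built to survive node loss get the Amazon-style zone."""
    if app_tolerates_node_failure:
        style = WorkloadStyle.CLOUD
    else:
        style = WorkloadStyle.TRADITIONAL
    return ZONE_PROFILES[style]


if __name__ == "__main__":
    # A legacy enterprise app that cannot tolerate node failure.
    print(pick_zone_profile(app_tolerates_node_failure=False))
    # A distributed web app designed for failure.
    print(pick_zone_profile(app_tolerates_node_failure=True))
```

A real selection would also weigh RPO/RTO targets, cost and the feature lists above; the single boolean here only mirrors the traditional versus cloud workload split.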