Cloud Computing and High Performance Networking
David Irwin
Computer Science Department
University of Massachusetts, Amherst
UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science • 2008

Cloud Computing
Wikipedia: “Internet-based computing, whereby shared resources, software and information are provided to computers and other devices on-demand, like the electricity grid”

Cloud Computing
Shared resources == lots of computers in data centers
Benefit from “economy of scale”
  Cost per unit falls as scale increases
  i.e., like Costco, but for computers and software services

Outline
Virtualized Data Centers
Hardware Virtualization
  The foundation for virtualized data centers
Public/private Cloud Computing
  The foundation for cloud computing
  Relevance to education
Shared testbeds
  NSF’s GENI prototype

Data Center Overview
Large server and storage farms
  Used by enterprises to run server applications
  Used by Internet companies: Google, Facebook, YouTube, Amazon
  Size varies depending on needs
Architecture
  Traditional: applications run on physical servers
    Manual mapping of applications to servers
    IT admins deal with “change”
  Modern: virtualized data centers
    Applications run inside virtual servers; VMs are mapped to physical servers
    Provides flexibility in mapping from virtual to physical resources

Virtualized Data Center
Simplifies resource management
  Applications started from preconfigured VM images, e.g., virtual appliances
  Virtualization layer permits resource allocations to vary dynamically
  Migrate VMs between physical machines with little or no downtime (live migration)
Workload management
  Internet applications have dynamic workloads
  How much capacity should be allocated to each application?
  Traditional approach: IT admins estimate peak workloads and provision sufficient servers
  Flash crowds: react manually by adding capacity
    Time scale of hours: lost revenue, bad publicity for the application

Dynamic Provisioning
Track workload and dynamically provision capacity: monitor, predict, provision
Predictive versus reactive provisioning
  Predictive: predict future workload and provision for it
  Reactive: react whenever capacity falls short of demand
Traditional data centers: bring up a new server
  Borrow from free pool or reclaim under-used server
Virtualized data center: exploit virtualization to speed up application startup time

Outline (next section: Hardware Virtualization)

Virtual machines are hot
Headlines from August 2007
  VMware IPO: $19.1 billion
  Xen sale: $500 million

Traditional OS Structure
[Figure: four applications (App) running on a single operating system, which runs on the host machine]

OS abstractions
[Figure: the OS sits between applications and hardware. Labels: threads / instructions / CPU; virtual memory / virtual addrs / physical mem; “kernel library” / system calls / I/O devices]
What are the interfaces and the resources?
What is being virtualized?
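The Dynamic Provisioning slide contrasts predictive and reactive provisioning. The sketch below simulates both on a made-up load trace. Everything in it is an illustrative assumption rather than the talk's method: the per-server capacity (`SERVER_CAPACITY`), the class and function names, and especially the forecast rule (peak of the last few observations plus the latest upward change). An allocation chosen at the end of one interval takes effect in the next, which is what exposes the reactive policy's lag.

```python
import math
from collections import deque

# Hypothetical capacity of one server, in requests/second.
SERVER_CAPACITY = 100.0


def servers_for(load: float, capacity: float = SERVER_CAPACITY) -> int:
    """Smallest number of servers (at least one) that covers `load`."""
    return max(1, math.ceil(load / capacity))


class ReactiveProvisioner:
    """Grows capacity only after observed demand has exceeded it."""

    def __init__(self, servers: int = 1):
        self.servers = servers

    def step(self, observed_load: float) -> int:
        needed = servers_for(observed_load)
        if needed > self.servers:  # capacity fell short of demand
            self.servers = needed
        return self.servers


class PredictiveProvisioner:
    """Provisions for a forecast of the next interval's load.

    Forecast rule (an assumption for illustration): peak of the last
    `window` observations plus the most recent upward change.
    """

    def __init__(self, window: int = 3):
        self.history = deque(maxlen=window)

    def step(self, observed_load: float) -> int:
        self.history.append(observed_load)
        trend = 0.0
        if len(self.history) >= 2:
            trend = max(0.0, self.history[-1] - self.history[-2])
        return servers_for(max(self.history) + trend)


def count_shortfalls(provisioner, loads, initial_servers: int = 1) -> int:
    """Count intervals whose demand exceeded the capacity in effect.

    The allocation returned after observing interval t applies to t + 1.
    """
    servers = initial_servers
    shortfalls = 0
    for load in loads:
        if servers_for(load) > servers:
            shortfalls += 1
        servers = provisioner.step(load)
    return shortfalls


if __name__ == "__main__":
    trace = [80, 150, 260, 300, 120]  # made-up requests/second per interval
    print("reactive shortfalls:", count_shortfalls(ReactiveProvisioner(), trace))
    print("predictive shortfalls:", count_shortfalls(PredictiveProvisioner(), trace))
```

On this trace the reactive policy is short of capacity in two intervals and the predictive one in one, at the cost of over-provisioning in the later intervals. A real predictor, and the choice between borrowing from a free pool and reclaiming under-used servers, would need more care than this toy rule.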
Virtual Machine Structure
[Figure: three guest applications, each on its own guest OS; the guest OSes run on a virtual machine monitor (hypervisor) on the host machine]

Why are VMs useful?
Code reuse
  Can run old operating systems + apps on new hardware
  IBM's original purpose for VMs in the 1960s
Encapsulation
  Can put the entire state of an “application” in one thing
  Move it, restore it, copy it, etc.
Isolation, security
  All interactions with hardware are mediated
  Hypervisor can keep one VM from affecting another
  Hypervisor cannot be corrupted by guest operating systems

Encapsulation
Say I want to suspend/restore an application
  I decide to write the process memory to disk
  I reboot my kernel and restart the process
Will this work?
  No, application state is spread out in many places
  Application might involve multiple processes
  Applications have state in the kernel that is lost on reboot (e.g., open files, locks, process ids, driver states, etc.)

Encapsulation
Virtual machines capture all of this state
Can suspend/restore an application
  On the same machine between boots
  On different machines
Very useful in server farms
  As we discussed earlier

Examples
Full virtualization: run any OS; expose the full x86 ISA (e.g., VMware, Xen on HVM)
Para-virtualization: run any (slightly modified) OS; expose a (slightly modified) x86 ISA (e.g., Xen)
OS-level virtualization: run multiple copies of the same OS (e.g., VServers, User-mode Linux)

Outline (next section: Public/private Cloud Computing)

Types of Cloud Computing
Implementations at different levels of abstraction
  Analogous to a software stack: hardware → OS → application
  Lower-level == more difficult to use + more freedom to innovate
Software-as-a-Service
  E.g., Gmail, Google Calendar
Platform-as-a-Service
  E.g., Azure, AppEngine
Infrastructure-as-a-Service
  E.g., Amazon EC2, S3, EBS
  Focus of much of this talk
  Good for the classroom, since it provides the most freedom to innovate

Cloud Computing Benefits
Low upfront capital expenditure
  No need to buy, power, cool, or maintain hardware
  Cheaper for small or short-term operations
    E.g., educators teaching project-based courses
Predictable costs
  Flat fees for usage, e.g., $/hour
  Nice for fixed budgets: $250 class computing budget
  Computing budgets are hard to predict
Flexible pricing plans
  On-demand: pay $0.10 every hour, quit at any time
  Spot: use computers only while the price is <= $0.10/hour
  Reserved: reserve computers for a long period at $0.05/hour

Example: Amazon EC2

Types of Amazon Resources
Elastic Compute Cloud (EC2) ($0.085/hour)
Elastic IPs
AutoScaling
CloudWatch ($0.015/hour)
Elastic MapReduce
Elastic Load Balancing
Simple Storage Service (S3) ($0.15/GB-month)
Elastic Block Store (EBS) ($0.10/GB-month)
Elastic Snapshots
SimpleDB
Prices are continuing to fall as usage increases...

Educators and Cloud Computing
Excellent platforms for experimentation
  Don’t care if students “break” things
  Give root access to machine
  Install arbitrary software
  Students isolated from other students/users
Enables distributed application projects
  Can access many machines for a short time period
  E.g., a class of 20 working in pairs, each pair developing an application that runs over 5 machines, needs 50 computers!
  Costs $500 for a two-week assignment (via Amazon Spot Pricing)
Can also use “free” testbeds, e.g., PlanetLab, GENI

Importance of Distributed Apps
Internet services use many computers to serve Internet users
  E.g., Google, Facebook, YouTube, Yahoo, Microsoft
  Services may use thousands of computers running at multiple data centers throughout the world
Importance of these applications is still increasing at a rapid rate
Important for students to learn how to develop distributed applications in this environment
  Traditionally a difficult task, since access is expensive

Private Clouds
If you already own a lot of mostly idle machines, you can install private cloud software
  Make your own infrastructure look like a cloud
  You don’t get the low-cost benefits...
  ...but you may make your infrastructure more flexible or usable
Good for class project maintenance if you have the time to learn and install the software
Example: Eucalyptus, which emulates the Amazon EC2 interfaces

Outline (next section: Shared testbeds)

Shared Testbeds
Run by universities and companies to experiment with new research prototypes
Often “free”: may need to contribute a few computers
  Formed from donations by participants
Not as stable as commercial clouds
Focus on one or more characteristics
  Compute isolation
  Network isolation
  Geographic diversity
  Hardware diversity
Examples: PlanetLab, Emulab, NSF’s Global Environment for Network Innovations (GENI)
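The class-project example on the Educators and Cloud Computing slide (20 students in pairs, 5 machines per pair, a two-week assignment) and the pricing plans on the Cloud Computing Benefits slide can be checked with a little arithmetic. This sketch assumes, beyond what the slides state, that every machine runs 24 hours a day for all 14 days, and it uses the slides' illustrative on-demand ($0.10/hour) and reserved ($0.05/hour) rates while ignoring any upfront reservation fees. The function names are hypothetical. Under those assumptions the quoted $500 spot cost works out to an average of about $0.03 per machine-hour.

```python
# Back-of-the-envelope cost model for the class-project example.
# Assumptions (not from the slides): machines run 24 hours/day for all
# 14 days; rates are the slides' illustrative prices, with no upfront fees.

STUDENTS = 20
STUDENTS_PER_TEAM = 2
MACHINES_PER_TEAM = 5
DAYS = 14
HOURS_PER_DAY = 24

ON_DEMAND_RATE = 0.10  # $/machine-hour (slide example)
RESERVED_RATE = 0.05   # $/machine-hour (slide example)
SPOT_BUDGET = 500.00   # $ total quoted for spot pricing


def machines_needed(students: int, per_team: int, machines_per_team: int) -> int:
    """Machines required when every team gets its own set of machines."""
    teams = students // per_team
    return teams * machines_per_team


def machine_hours(machines: int, days: int, hours_per_day: int = HOURS_PER_DAY) -> int:
    """Total machine-hours if all machines run for the whole period."""
    return machines * days * hours_per_day


def cost(hours: int, rate: float) -> float:
    """Flat per-hour pricing: dollars = machine-hours * $/hour."""
    return hours * rate


if __name__ == "__main__":
    m = machines_needed(STUDENTS, STUDENTS_PER_TEAM, MACHINES_PER_TEAM)
    h = machine_hours(m, DAYS)
    print(f"machines: {m}, machine-hours: {h}")
    print(f"on-demand: ${cost(h, ON_DEMAND_RATE):,.2f}")
    print(f"reserved:  ${cost(h, RESERVED_RATE):,.2f}")
    print(f"implied average spot price: ${SPOT_BUDGET / h:.4f}/hour")
```

With 50 machines and 16,800 machine-hours, the same workload would cost $1,680 on-demand and $840 at the reserved rate; if students only need the machines part of each day, `hours_per_day` shrinks and every figure scales down with it.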
PlanetLab
Get access to machines hosted around the world
  Experiment with global network services
No resource isolation
Network presence but little compute power
Success led to GENI

Status
GENI connects diverse testbeds together
  Many components/link types
  Routers, edge nodes, wireless, wired, storage, sensors, fiber-optic, etc.
Prototyping started last year (http://geni.net)
  Hasn't been built, but isn't vaporware
  Existing systems form the foundation
80 projects clustered around 4 “control frameworks”
  PlanetLab, Emulab, Orca, Orbit
  4 projects at UMass-Amherst!
Run by BBN (Cambridge); also at UMass-Lowell and Williams

GENI Overview
[Figure: experiments (guests occupying slices) embedded in the sliverable GENI substrate (contributing domains/aggregates). Labels: Observatory, Wind tunnel, Embedding, Petri dish]
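The GENI Overview figure describes experiments as guests occupying slices of a sliverable substrate contributed by aggregates. One way to picture that relationship is the toy model below: an aggregate hands out slivers (portions of its capacity), and a slice groups the slivers an experiment holds across aggregates. The class names, the single integer capacity per aggregate, and the aggregate names in the example are hypothetical; the GENI control frameworks (PlanetLab, Emulab, Orca, Orbit) each define their own interfaces, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Aggregate:
    """Hypothetical contributing domain offering a pool of one resource type."""
    name: str
    capacity: int  # units it can hand out (e.g., nodes or VMs)
    allocated: int = 0

    def carve_sliver(self, amount: int) -> "Sliver":
        """Reserve `amount` units and return them as a sliver."""
        if amount <= 0:
            raise ValueError("sliver size must be positive")
        if self.allocated + amount > self.capacity:
            raise RuntimeError(f"{self.name}: not enough free capacity")
        self.allocated += amount
        return Sliver(self, amount)


@dataclass
class Sliver:
    """A portion of a single aggregate's resources."""
    aggregate: Aggregate
    amount: int


@dataclass
class Slice:
    """An experiment's container: slivers drawn from one or more aggregates."""
    owner: str
    slivers: List[Sliver] = field(default_factory=list)

    def add(self, aggregate: Aggregate, amount: int) -> None:
        self.slivers.append(aggregate.carve_sliver(amount))

    def release(self) -> None:
        """Return every sliver's resources to its aggregate."""
        for sliver in self.slivers:
            sliver.aggregate.allocated -= sliver.amount
        self.slivers.clear()


if __name__ == "__main__":
    edge = Aggregate("edge-nodes", capacity=10)          # hypothetical aggregate
    wireless = Aggregate("wireless-testbed", capacity=4)  # hypothetical aggregate
    experiment = Slice(owner="student-team-1")
    experiment.add(edge, 3)
    experiment.add(wireless, 2)
    print(edge.allocated, wireless.allocated)  # 3 2
    experiment.release()
    print(edge.allocated, wireless.allocated)  # 0 0
```

A request that exceeds an aggregate's free capacity is rejected without changing any state, which mirrors the idea that many experiments share one substrate without stepping on each other's allocations.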