Amazon EC2
Andrew Chekerylla & Edward Kim

What is EC2?
• Amazon Elastic Compute Cloud
• Infrastructure as a Service (IaaS)
• Allows customers to rent virtual computers by the hour.
• Customers pay only for usage and receive a virtual server instance in return.

Development: Team
• Amazon.com in Cape Town, South Africa
• Chris Pinkham, VP IT Infrastructure
• Christopher Brown, Design Architect
• Willem Van Biljon, Product Manager

Development: Product
• Amazon.com Elastic Compute Cloud (EC2)
• Web service that provides scalable computing resources in the cloud.

Development: Motivation
• Previous data center solutions required large financial investment and became cost-inefficient when data needs changed.
• Amazon saw an opportunity to provide scalable cloud computing that avoided these costs.
• They could charge clients only for what they needed, using a variable pricing model.

Development: Timeline
• March 2006: Filed initial patents
• August 2006: Public beta test with UNIX platforms
• October 2008: Production release with Windows Server platforms
• Since then: Added SQL Server, NetBSD and FreeBSD.

Development: Product Features
• EC2 Compute Units (ECUs) for variable computing power
• Elastic Block Store (EBS) for network-based storage
• Xen Virtual Machines (VMs) for computing resources
• Elastic IP Addresses for user-controlled IP addresses
• CloudWatch for a real-time dashboard of computing resource utilization
• Automated Scaling to automatically add or remove EC2 instances as needed
• Availability Zones to ensure failure isolation between clusters

Development: Product Innovations
• Design details are proprietary information.
• However, the initial patents are available and can be downloaded.
• They are the closest glimpse into the core technology of Amazon EC2.
• Two patents filed in March 2006:
  • Managing execution of programs by multiple computing systems [1]
  • Managing communications between computing nodes [2]

Patents [1]
• March 2006: Managing execution of programs by multiple computing systems
• Central program execution service for distributing jobs to available computing resources.
• The service can discriminate between resources by physical proximity or by similar software state.
• Physical proximity allows for reduced latency, since data travels over a shorter distance.
• Similar software state allows for faster response, since copies of the program are already available and possibly running.

Patents [1]: Network Diagram
• The next slide contains a network diagram from the original patent.
• The diagram shows multiple computing systems exchanging and running program copies.
• Note that System Manager nodes 140 and 150 take responsibility for managing computing resources by initiating program exchange or execute requests.

Patents [1]: Groups of Systems
• The next slide contains a picture of groups of computing systems that can store and exchange program copies.
• The diagram shows several computing systems that have different programs stored locally.
• Note that not all programs are distributed to all nodes, since that would add needless transmission overhead and hurt system performance.

Patents [1]: Block Diagram
• The next slide contains a block diagram from the original patent.
• The diagram shows how computing systems could manage the execution of programs on other computing systems.
• Note that the System Manager Computing System and the Machine Manager Computing System are shown on previous slides as parts of the same local network or cloud system.
• They each run a core routine that implements the program exchange and execution events in a master-slave architecture.
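The placement idea described in patent [1] can be illustrated with a small sketch. This is not Amazon's implementation, whose details are proprietary; the `Node` class, the `distance` metric, and the rule of preferring a local program copy before proximity are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical computing resource managed by the execution service."""
    name: str
    distance: int                                # assumed proximity metric, e.g. network hops
    programs: set = field(default_factory=set)   # program copies already stored locally
    busy: bool = False


def choose_node(nodes, program):
    """Pick an idle node: prefer one that already holds the program, then the closest."""
    idle = [n for n in nodes if not n.busy]
    if not idle:
        return None
    # Tuples sort element by element: False (has a copy) sorts before True (no copy),
    # and ties are broken by the smaller distance.
    return min(idle, key=lambda n: (program not in n.programs, n.distance))


def dispatch(nodes, program):
    """Assign the program to a node and return its name, or None if all nodes are busy."""
    node = choose_node(nodes, program)
    if node is None:
        return None
    if program not in node.programs:
        node.programs.add(program)   # stands in for transferring a program copy
    node.busy = True
    return node.name
```

With three nodes where only some hold a copy of the program, successive calls to `dispatch` use up the nodes with local copies first (closest first) and only then fall back to a node that needs a transfer.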
Patents [1]: Flow Diagram
• The next slide contains a partial flow diagram of the System Manager Module Routine, which runs on the system manager.
• It is complemented by a Machine Manager Module Routine running on each computing resource.
• Note that the System Manager Module Routine is a large function and has additional steps.
• It negotiates with the machine managers to provide program copies as needed.

Patents [2]
• March 2006: Managing communications between computing nodes
• Groups of computing nodes use access policies to manage communication between virtual machines.
• Authorization can be negotiated dynamically and stored, so that future transmissions are authorized automatically.

Job Management
• Patent [1] describes a master-slave architecture between master computing resources and machine computing resources.

Fault Tolerance
• Patent [1] describes how multiple program instances can be replicated on machines in different Availability Zones, to protect against network outages.

EC2 Layers

EC2 Diagram

Xen Hypervisor
• Basic abstraction layer of software that sits directly on the hardware, below any operating systems.
• Responsible for CPU scheduling and memory partitioning of the various virtual machines running on the hardware device.
• Controls the execution of virtual machines as they share the common processing environment.
• Has no knowledge of networking, external storage devices, video, or any other common I/O functions found on a computing system.

Virtualization Specifications
• Xen Hypervisor for virtualization
• Provides services that allow multiple operating systems to execute on the same computer hardware.
• Hardware specifications are tailored to the needs of the user: Storage, Computing, Memory, Graphics
• Why did Amazon choose Xen?
Virtualization
• Paravirtual
  • Paravirtual AMIs boot with a special boot loader called PV-GRUB, which starts the boot cycle and then chain-loads the kernel specified in the menu.lst file on your image.
• Hardware Virtual Machine
  • Unlike PV guests, HVM guests can take advantage of hardware extensions that provide fast access to the underlying hardware on the host system.
  • Allows the user to run an operating system directly on top of a virtual machine without any modification, as if it were running on the bare-metal hardware.

EC2 Instances

Security
• Key pairs are used to authenticate when you log in to the instance.
• Security groups can be used for more protection.
• Instances are contained in your own Virtual Private Cloud (VPC).

Competitors
• Microsoft Azure
• Google Compute Engine
• GoGrid
• Rackspace
• Storm
• Voxel
• Linode VPS
• Joyent
• …

Benefits
• Less downtime setting up new servers
• Highly scalable
• High availability (over 99%)
• Saves a lot of money:
  • Costs of upfront hardware
  • Costs of leasing the space for the data center
  • Operational overhead
• Easy to perform software updates or major upgrades
• Who would benefit most from this service?

Benefits
• How/Why is it used?

Availability
• US East (N. Virginia)
• US West (Oregon, Northern California)
• Asia Pacific (Tokyo, Sydney)
• Europe (Ireland, Frankfurt)
• South America (Sao Paulo)
• AWS GovCloud (US)
• Benefits of breaking down into regions?
  • Network transfer distance
  • Options for backup servers in different regions

Cloud Computing for Job Management
• What does this mean for parallel computing?
• In what ways can we utilize this capability to handle large amounts of data?
• Amazon Elastic MapReduce (EMR)

Storage
• Amazon EC2 uses two different kinds of storage.
• One is local storage, known as Instance Storage, which is non-persistent: its data is lost after an instance terminates.
• The other is persistent, network-based storage called Elastic Block Store (EBS), which can be attached to running instances or used as a persistent boot medium.
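The difference between the two kinds of storage can be shown with a toy model. This is a sketch only and does not use the AWS API; the `Volume` and `Instance` classes and their attributes are hypothetical names chosen for illustration.

```python
class Volume:
    """Toy stand-in for an EBS volume: it exists independently of any instance."""

    def __init__(self):
        self.data = {}


class Instance:
    """Toy stand-in for an EC2 instance with local storage and an optional volume."""

    def __init__(self, ebs=None):
        self.instance_store = {}   # local storage, lost on termination
        self.ebs = ebs             # attached network volume (assumed to outlive the instance)
        self.running = True

    def terminate(self):
        self.instance_store = None   # local data is gone for good
        self.ebs = None              # the volume is detached, not destroyed
        self.running = False


# Write to both kinds of storage, terminate, then attach the volume elsewhere.
volume = Volume()
first = Instance(ebs=volume)
first.instance_store["scratch"] = "temporary results"
first.ebs.data["db"] = "customer records"
first.terminate()

second = Instance(ebs=volume)
recovered = second.ebs.data["db"]   # data on the volume is still available
```

After `first.terminate()`, the scratch data no longer exists anywhere, while the data written to the volume can be read from the second instance.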
Instance Storage

EBS
• Elastic Block Store
• Provides raw data blocks that can be attached to EC2 instances (essentially works as network drives).
• Can be backed up and restored to another instance when failures occur on the current instance.

EBS Pros / Cons
• Good for elasticity
• Built-in redundancy
• Poor I/O rates on EBS volumes
• More costs involved:
  • S3 storage space
  • IOPS
  • Instance Storage

Network
• Elastic IP Address
  • Addresses belong to the account they were created on, not to an instance. An address continues to exist even if the instance is deleted.
• IP addresses cannot be used outside the Amazon environment; customers must use the FQDN provided by Amazon to access their systems.
• Instances within the environment can communicate using the IP addresses.
• Control what goes in/out of your VPC using Network Address Translation (NAT).

Elasticity
• Things to think about when choosing your type of instance:
  • VPC vs Classic
  • IP Address
  • Data Persistence

Types of Instances
• Free Tier: Use AWS instances for up to 12 months (minimal performance)
• On-Demand: Set up and tear down whenever you need to
• Reserved: Pay up front for servers with contracts
• Spot: Bid for unused capacity, but no control over when it’s terminated

Costs (On Demand)
• Why did Amazon choose this method of charging customers?
• Compute
• Storage
• Network IOPS

What others are saying about EC2
• Seldo from awe.sm had some issues with the service:
  • Whole-zone failure patterns
  • Lifecycle of virtual systems
  • Costs to have multi-zone redundancy
  • EBS

Leaked Information
• Q: How about some detailed info on the Xen setup? Do they silo the instances (e.g., have like-sized instances run on the same machine)?
• A: Hardware nodes (HN) run a copy of Amazon Linux, which has several internal flavors. Each HN is silo'd like you say. So, if you're running m1.xl, you'll be sharing with only other m1.xl's. Once your server is in a slot, it gets that internal IP address and an EIP is NAT'd to that internal IP.
• Q: Is it really possible to push more than 1 Gbit on the larger Amazon EC2 instances?
  I've heard that the larger (4 GB+?) instances are on different nodes which are connected by 10G.
• A: You're drifting more into the EC2 Development Team realm, but from what I know it works like this. In any typical Linux application you have a runq and an I/O elevator. Prioritization of various pieces is included in the kernel. So, in the case of networking, the networking gets higher I/O elevator priority because it also carries EBS. This higher priority directly affects the runq, ensuring that you get a two-for-one increase, both in storage performance and network performance, since it all runs over the same NIC.

Summary
• One of the first major IaaS offerings to be implemented
• Everything within EC2 has a cost to it
• Still, there are a lot of reasons why companies use EC2

Sources
1. Awe.sm: http://blog.awe.sm/2012/12/18/aws-the-good-the-bad-and-the-ugly/#~p5i4KuJAFmwJnv
2. Wikipedia: http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud
3. Xenproject: http://www-archive.xenproject.org/files/Marketing/HowDoesXenWork.pdf
4. AmazonAws: http://aws.amazon.com/ec2/
5. Masterclass Webinar: https://www.youtube.com/watch?v=TORzO9Oc9oU
6. Rightscale: http://www.rightscale.com/blog/cloud-industry-insights/amazons-elastic-block-store-explained
7. Chris Pinkham Patent #1 in 2006: https://www.google.com/patents/US8190682
8. Chris Pinkham Patent #2 in 2006: https://www.google.com/patents/US7801128
9. Amazon EMR: https://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-what-is-emr.html
10. Reddit – Ask Me Anything (ex-Amazon AWS engineer): http://www.reddit.com/r/IAmA/comments/1e5o4p/iaman_exaws_engineer_ask_me_anything_about_the/
11. PCMag: http://www.pcmag.com/article2/0,2817,2458757,00.asp