CLOUD INFRASTRUCTURE
In this chapter we give an overview of the cloud computing infrastructure offered at this time by Amazon,
Google, and Microsoft; these providers support one or more of the three cloud computing paradigms: IaaS
(Infrastructure as a Service), SaaS (Software as a Service), and PaaS (Platform as a Service).
Amazon is a pioneer in IaaS, while Google's efforts are focused on SaaS and PaaS paradigms. Sun
and IBM offer their own cloud computing platforms, Sun Cloud and Blue Cloud, respectively.
In 2011, HP announced plans to enter the cloud computing club. Private clouds are an alternative
to commercial cloud platforms. Open-source cloud computing platforms such as Eucalyptus,
OpenNebula, and Nimbus can be used as a control infrastructure for a private cloud.
We continue our discussion of the cloud infrastructure with an overview of SLAs (Service Level
Agreements), followed by a brief discussion of software licensing and of the energy consumption and
ecological impact of cloud computing.
Cloud computing at Amazon
Amazon was one of the first providers of cloud computing (http://aws.amazon.com); it announced
a limited public beta release of its Elastic Computing platform called EC2 in August 2006.
EC2 is based on the Xen virtualization strategy. In EC2 each virtual machine functions as a virtual
private server and is called an instance; an instance specifies the maximum amount of resources
available to an application, the interface for that instance, as well as the cost per hour.
Amazon uses two categories, Region and Availability Zone, to describe the physical and virtual
placement of the systems providing each type of service. For example, the S3 storage service is
available in the US Standard and US West, Europe, and Asia Pacific Regions; the corresponding
storage facilities are located in Northern Virginia and Northern California, Ireland, Singapore, and
Tokyo, respectively. An application developer has the option to use these categories to reduce
communication latency, minimize costs, address regulatory requirements, and increase
reliability and security.
The AWS - Amazon Web Services infrastructure offers a palette of services available through the
AWS Management Console discussed next; these services include:
Elastic Compute Cloud (EC2);
Simple Storage Service (S3);
Elastic Block Store (EBS);
SimpleDB;
Simple Queue Service (SQS);
CloudWatch; and
Virtual Private Cloud (VPC).
Elastic Compute Cloud is a Web service with a simple interface for launching instances of an
application under several operating systems, such as several Linux distributions, Microsoft
Windows Server 2003 and 2008, OpenSolaris, FreeBSD, and NetBSD. EC2 allows a user to load
instances of an application with a custom application environment, manage network access
permissions, and run the images using as many or as few systems as desired.
EC2 instances boot from an AMI (Amazon Machine Image) digitally signed and stored in S3; one
could use the few images provided by Amazon or customize an image and store it in S3.
A user can interact with EC2 using a set of SOAP (Simple Object Access Protocol) messages and
can list available AMI images, boot an instance from an image, terminate an instance, display the
user's running instances, display console output, and so on. The user has root access to each instance in
the elastic and secure computing environment of EC2. The instances can be placed in multiple
locations in different Regions and Availability Zones.
EC2 allows the import of virtual machine images from the user environment to an instance through
a facility called VM import. It also automatically distributes the incoming application traffic
among multiple instances using the elastic load balancing facility. EC2 associates an elastic IP
address with an account; this mechanism allows a user to mask the failure of an instance and
re-map a public IP address to any instance of the account, without the need to interact with the
software support team. Another facility, called auto scaling, allows the user to seamlessly scale
up and down the number of instances used by an application.
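As an illustration of these operations, a minimal sketch using the Python AWS SDK (boto3) is shown below; using the SDK rather than raw SOAP messages is an assumption made for the example, and the AMI identifier, region, and instance type are placeholders rather than values taken from the text.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Boot one instance from an AMI (the image ID below is a placeholder).
    reservation = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
    )
    instance_id = reservation["Instances"][0]["InstanceId"]

    # Display the user's running instances.
    running = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    for r in running["Reservations"]:
        for instance in r["Instances"]:
            print(instance["InstanceId"], instance["InstanceType"])

    # Terminate the instance when it is no longer needed.
    ec2.terminate_instances(InstanceIds=[instance_id])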
The EC2 system offers several instance types:
- Standard instances: micro (StdM), small (StdS), large (StdL), extra large (StdXL); small is the default.
- High-memory instances: high-memory extra large (HmXL), high-memory double extra large (Hm2XL), and high-memory quadruple extra large (Hm4XL).
- High-CPU instances: high-CPU extra large (HcpuXL).
- Cluster computing instances: cluster computing quadruple extra large (Cl4XL).
Table 2 summarizes the features and the amount of resources supported by each instance.
The resources supported by each configuration are: main memory, virtual computers (VCs) with a
32- or 64-bit architecture, instance memory (I-memory) on persistent storage, and I/O performance
at two levels, moderate (M) or high (H). The computing power of a virtual core is measured in EC2
compute units (CUs).
A main attraction of Amazon cloud computing is its low cost; the dollar amounts charged for
one hour of running under Linux or Unix and under Windows at the time of this writing are summarized
in Table 3.
Simple Storage Service (S3) is a storage service with a minimal set of functions: write, read,
and delete. It allows an application to handle an unlimited number of objects ranging in size
from 1 byte to 5 TB. An object is stored in a bucket and retrieved via a unique, developer-assigned
key; a bucket can be stored in a Region selected by the user. S3 maintains for each
object: the name, modification time, an access control list, and up to 4 KB of user-defined
metadata; the object names are global. Authentication mechanisms ensure that data is kept secure;
objects can be made public, and rights can be granted to other users. The Amazon S3 SLA
guarantees reliability. S3 uses standards-based REST and SOAP interfaces; the default
download protocol is HTTP, and a BitTorrent protocol interface is provided to lower costs for
high-scale distribution. S3 supports PUT, GET, and DELETE primitives to manipulate objects but
does not support primitives to copy, to rename, or to move an object from one bucket to another.
S3 computes the MD5 of every object written and returns it in a field called ETag. A user is
expected to compute the MD5 of an object stored or written and compare this with the ETag; if the
two values do not match, then the object was corrupted during transmission or storage. S3 is
designed to store large objects.
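The integrity check described above can be sketched with the Python AWS SDK (boto3) and the standard hashlib module; the bucket and key names are placeholders, and the sketch assumes a single-part upload, since for multipart uploads the ETag is not a plain MD5.

    import hashlib

    import boto3

    s3 = boto3.client("s3")
    data = b"example payload"

    # Write the object; S3 returns the ETag of the stored object.
    response = s3.put_object(Bucket="my-example-bucket", Key="example-object", Body=data)
    etag = response["ETag"].strip('"')

    # Compare the locally computed MD5 with the ETag returned by S3.
    local_md5 = hashlib.md5(data).hexdigest()
    if local_md5 != etag:
        raise RuntimeError("object corrupted during transmission or storage")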
Elastic Block Store provides persistent block level storage volumes for use with Amazon EC2
instances. A volume appears to an application as a raw, unformatted and reliable physical disk.
The size of the storage volumes ranges from 1 GB to 1 TB; the volumes are grouped together in
Availability Zones and are automatically replicated in each zone.
An EC2 instance may mount multiple volumes, but a volume cannot be shared among multiple
instances. The EBS supports the creation of snapshots of the volumes attached to an instance and
then uses them to restart an instance. The storage strategy provided by EBS is suitable for database
applications, file systems, and applications using raw data devices.
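The typical EBS workflow described above (create a volume, attach it to an instance, and snapshot it) can be sketched with the Python AWS SDK (boto3); the instance identifier, device name, and Availability Zone below are placeholders.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create a 100 GB volume in a specific Availability Zone.
    volume = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100)
    volume_id = volume["VolumeId"]

    # Attach the volume to a running EC2 instance; it appears as a raw block device.
    ec2.attach_volume(VolumeId=volume_id, InstanceId="i-0123456789abcdef0", Device="/dev/sdf")

    # Create a snapshot that can later be used to restore the volume or seed a new instance.
    ec2.create_snapshot(VolumeId=volume_id, Description="backup before upgrade")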
SimpleDB is a non-relational data store that allows developers to store and query data items via
web services requests; it supports store and query functions traditionally provided only by
relational databases. SimpleDB creates multiple geographically distributed copies of each data
item and supports high-performance Web applications; at the same time, it automatically manages
infrastructure provisioning, hardware and software maintenance, replication and indexing of
data items, and performance tuning.
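A hedged sketch of these store and query functions follows, assuming the Python AWS SDK (boto3) still exposes the legacy SimpleDB API as the "sdb" service; the domain, item, and attribute names are invented for the example.

    import boto3

    sdb = boto3.client("sdb", region_name="us-east-1")

    # Create a domain (roughly the analog of a table) and store an item with attributes.
    sdb.create_domain(DomainName="songs")
    sdb.put_attributes(
        DomainName="songs",
        ItemName="song-001",
        Attributes=[
            {"Name": "title", "Value": "Example Title"},
            {"Name": "year", "Value": "2011"},
        ],
    )

    # Query items with a SQL-like select expression.
    result = sdb.select(SelectExpression="select * from songs where year = '2011'")
    for item in result.get("Items", []):
        print(item["Name"], item["Attributes"])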
Simple Queue Service (SQS) is a hosted message queue. SQS is a system for supporting automated
workflows; it allows multiple Amazon EC2 instances to coordinate their activities by sending and
receiving SQS messages. Any computer connected to the Internet can add or read messages
without any installed software or special firewall configurations.
Applications using SQS can run independently and asynchronously, and do not need to be
developed with the same technologies. A received message is "locked" during processing; if
processing fails, the lock expires and the message is available again. The timeout for locking can
be changed dynamically via the ChangeMessageVisibility operation. Developers can access SQS
through standards-based SOAP and Query interfaces. Queues can be shared with other AWS
accounts and anonymously; queue sharing can also be restricted by IP address and time-of-day.
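The coordination pattern described above can be sketched with the Python AWS SDK (boto3); the queue name, message body, and timeouts are placeholders.

    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")
    queue_url = sqs.create_queue(QueueName="work-items")["QueueUrl"]

    # A producer instance enqueues a unit of work.
    sqs.send_message(QueueUrl=queue_url, MessageBody="process dataset 42")

    # A consumer instance receives the message; it is "locked" (invisible to other
    # consumers) for the visibility timeout while it is being processed.
    received = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, VisibilityTimeout=30)
    for message in received.get("Messages", []):
        handle = message["ReceiptHandle"]

        # If processing takes longer than expected, extend the lock dynamically;
        # this is the ChangeMessageVisibility operation mentioned in the text.
        sqs.change_message_visibility(QueueUrl=queue_url, ReceiptHandle=handle, VisibilityTimeout=120)

        # On success, delete the message; if the consumer fails instead, the lock
        # expires and the message becomes available to another consumer.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=handle)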
CloudWatch is a monitoring infrastructure used by application developers, users, and system
administrators to collect and track metrics important for optimizing the performance of
applications and for increasing the efficiency of resource utilization. Without installing any
software, a user can monitor, free of charge, seven or eight pre-selected metrics collected at
one-minute or five-minute intervals and then view graphs and statistics for these metrics.
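For example, a user might retrieve the average CPU utilization of an instance over the last hour with the Python AWS SDK (boto3), as in the sketch below; the instance identifier is a placeholder.

    from datetime import datetime, timedelta

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,                  # five-minute intervals, as mentioned in the text
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"])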
Virtual Private Cloud provides a bridge between the existing IT infrastructure of an organization
and the AWS cloud; the existing infrastructure is connected via a Virtual Private Network (VPN)
to a set of isolated AWS compute resources. VPC allows existing management capabilities such as
security services, firewalls, and intrusion detection systems to operate seamlessly within the cloud.
In 2007 Garfinkel reported the results of an early evaluation of the Amazon Web Services. The
paper reports that EC2 instances are fast, responsive, and very reliable; a new instance could be
started in less than two minutes. During the year of testing, one unscheduled reboot and one
instance freeze were experienced; no data was lost during the reboot, but no data could be
recovered from the virtual disks of the frozen instance.
To test S3, a bucket was created and loaded with objects in sizes of 1 byte, 1 KB, 1 MB, 16 MB,
and 100 MB. The measured throughput for the 1-byte objects reflected the transaction speed of S3
because the testing program required that each transaction be successfully resolved before the next
was initiated. The measurements showed that a user could execute at most 50 non-overlapping S3
transactions. The 100 MB probes measured the maximum data throughput that the S3 system could
deliver to a single client thread.
From the measurements the author concluded that the data throughput for large objects was
considerably larger than for small objects due to a high transaction overhead. The write bandwidth
for 1 MB data was roughly 5 MB/s while the read bandwidth was 5 times lower, 1 MB/s.
Another test was designed to see if concurrent requests could improve the throughput of S3; the
experiment involved two virtual machines running on two different clusters and accessing the
same bucket with repeated 100 MB GET and PUT operations. The virtual machines were
coordinated, with each one executing 1 to 6 threads for 10 minutes and then repeating the pattern
for 11 hours. As the number of threads increased from 1 to 6, the bandwidth received by each
thread was roughly cut in half and the aggregate bandwidth of the six threads was 30 MB/s,
roughly three times the aggregate bandwidth of one thread.
In 107,556 tests of EC2, each consisting of multiple read and write probes, only 6 write
retries, 3 write errors, and 4 read retries were encountered.
The AWSLA (Amazon Web Services Licensing Agreement) allows the company to terminate
service to any customer at any time for any reason and contains a covenant not to sue Amazon or
its affiliates as the result of any damages that might arise out of the use of AWS. As noted in [92],
AWSLA prohibits the use of "other information obtained through AWS for the purpose of direct
marketing, spamming, contacting sellers or customers." It prohibits AWS from being used to store
any content that is "obscene, libellous, defamatory or otherwise malicious or harmful to any person
or entity;" it also prohibits S3 from being used "in any way that is otherwise illegal or promotes
illegal activities, including without limitation in any manner that might be discriminatory based on
race, sex, religion, nationality, disability, sexual orientation or age."
Cloud computing, the Google perspective
Google's efforts are concentrated in the area of Software as a Service (SaaS); Gmail, Google docs,
Google calendar, Picasa, and Google Groups are Google services free of charge for individual
users and available for a fee for organizations. These services are running on a cloud and can be
invoked from a broad spectrum of devices including mobile ones such as iPhones, iPads,
BlackBerries, and laptops and tablets; the data for these services is stored at data centers on the
cloud.
The Gmail service hosts Email on Google servers, provides a Web interface to access it,
and offers tools for migrating from Lotus Notes and Microsoft Exchange. Google docs is Web-based
software for building text documents, spreadsheets, and presentations. It supports features such as
tables, bullet points, basic fonts, and text size; it allows multiple users to edit and update the same
document, to view the history of document changes, and it has a spell checker. The service
allows users to import and export files in several formats, including Office, PDF, text, and
OpenOffice extensions.
Google calendar is a browser-based scheduler; it supports multiple calendars for a user, the ability
to share a calendar with other users, daily/weekly/monthly views, event search, and
synchronization with the Outlook Calendar. The calendar is accessible from mobile devices;
event reminders can be received via SMS, desktop pop-ups, or Email.
It is also possible to share a calendar with other Google calendar users. Picasa is a tool to
upload, share, and edit images; it provides 1 GB of disk space per user. Users can add tags to
images and attach locations to photos using Google Maps. Google Groups allows users to host
discussion forums and to create messages online or via Email.
Google is also a leader in the Platform-as-a-Service (PaaS) space. AppEngine is a developer
platform hosted on the cloud; initially it supported only Python, and support for Java was added
later; detailed documentation for Java is available. The database for code development can be
accessed with GQL (Google Query Language), which has an SQL-like syntax.
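As an illustration, the sketch below shows a GQL query in the original App Engine Python runtime; it assumes the google.appengine.ext.db API, and the Song model and its properties are invented for the example.

    from google.appengine.ext import db

    # A simple datastore model; kind and property names are invented for the example.
    class Song(db.Model):
        composer = db.StringProperty()
        title = db.StringProperty()

    # GQL uses an SQL-like syntax but queries the App Engine datastore.
    query = db.GqlQuery("SELECT * FROM Song WHERE composer = :1", "Lennon, John")
    for song in query.fetch(10):
        print(song.title)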
The concept of structured data is important for Google's service strategy. The change of search
philosophy reflects the transition from unstructured Web content to structured data, data that
contains additional information, e.g., the place where a photograph was taken, information about
the singer of a digital recording of a song, the local services at a geographic location, and so on.
Search engine crawlers rely on hyperlinks to discover new content. The deep Web is content stored
in databases and served as pages created dynamically by querying HTML forms; such content is
unavailable to crawlers, which are unable to fill out such forms. Examples of deep Web sources are:
sites with geographic-specific information such as local stores, services, and businesses; sites which
report statistics and analysis produced by governmental and non-governmental organizations; art
collections; photo galleries; bus, train, and airline schedules; and so on. Structured content is
created by labelling:
Flickr and Google Co-op are examples of structures where labels and annotations are added to
objects, images, and pages stored on the Web. Google Co-op allows users to create customized
search engines based on a set of facets or categories; for example, the facets for a search engine for
the database research community, available at
http://data.cs.washington.edu/coop/dbresearch/index.html, are: professor, project, publication,
jobs.
Google Base is a service allowing users to load structured data from different sources into a
central repository, a very large, self-describing, semi-structured, heterogeneous database;
it is self-describing because each item follows a simple schema: (item type, attribute names). Few
users are aware of this service; thus, Google Base is accessed in response to keyword queries posed
on Google.com, provided that there is relevant data in the database.
To fully integrate Google Base, the results should be ranked across properties; the service also
needs to propose appropriate refinements with candidate values in select menus; this is done by
computing histograms on attributes and their values at query time.
Specialized structure-aware search engines for several areas, including travel, weather and local
services, have already been implemented. But the data available on the Web covers a wealth of
human knowledge; it is not feasible to define all the possible domains and it is nearly impossible to
decide where one domain ends and another begins.
Google has also redefined the laptop with the introduction of the Chromebook, a purely
Web-centric device running Chrome OS. Cloud-based apps, extreme portability, built-in 3G
connectivity, almost instant-on operation, and all-day battery life are the main attractions of this
tablet with a keyboard.
Google adheres to a bottom-up, engineer-driven, and liberal licensing and user application
development philosophy, while Apple, a recent entry in cloud computing, tightly controls the
technology stack, builds its own hardware and requires the applications developed to follow strict
rules. Apple products including the iPhone, the iOS, the iTunes Store, Mac OS X, and iCloud offer
unparalleled polish and effortless interoperability, while the flexibility of Google results in more
cumbersome user interfaces for the broad spectrum of devices running the Android OS.
Windows Azure and Online Services
Azure and Online Services are, respectively, PaaS (Platform as a Service) and SaaS (Software as a
Service) cloud platforms from Microsoft. Windows Azure is an operating system, SQL Azure is a
cloud-based version of SQL Server, and Azure AppFabric (formerly .NET Services) is a
collection of services for cloud applications.
Figure: The components of Windows Azure: Compute runs cloud applications; Storage uses blobs,
tables, and queues to store data; Fabric Controller deploys, manages, and monitors applications;
CDN maintains cache copies of data; Connect allows IP connections between the user systems
and applications running on Windows Azure.
Windows Azure has three core components (see Figure): Compute, which provides a computation
environment; Storage, for scalable storage; and Fabric Controller, which deploys, manages, and
monitors applications; it interconnects nodes consisting of servers, high-speed connections, and
switches.
The Content Delivery Network (CDN) maintains cache copies of data to speed up computations.
The Connect subsystem supports IP connections between the users and their applications running
on Windows Azure. The API interface to Windows Azure is built on REST, HTTP and XML. The
platform includes five services: Live Services, SQL Azure, AppFabric, SharePoint and Dynamics
CRM. A client library and tools are also provided for developing cloud applications in Visual
Studio.
The computations carried out by an application are implemented as one or more roles; an
application typically runs multiple instances of a role. One distinguishes: (i) Web role instances
used to create Web applications; (ii) Worker role instances used to run Windows-based code; and
(iii) VM role instances which run a user-provided Windows Server 2008 R2 image.
Scaling, load balancing, memory management, and reliability are ensured by a fabric controller, a
distributed application replicated across a group of machines that owns all of the resources in its
environment (computers, switches, load balancers) and is aware of every Windows Azure
application. The fabric controller decides where new applications should run; it chooses the
physical servers to optimize utilization, using configuration information uploaded with each
Windows Azure application. The configuration information is an XML-based description of how
many Web role instances and how many Worker role instances the application needs, along with its
other needs; the fabric controller uses this configuration file to determine how many VMs to create.
Blobs, tables, queues, and drives are used as scalable storage. A blob contains binary data; a
container consists of one or more blobs. Blobs can be up to a terabyte and they may have
associated metadata, e.g., the information about where a JPEG photograph was taken. Drives allow
a Windows Azure role instance to interact with persistent storage as if it were a local NTFS file
system. Queues enable Web role instances to communicate asynchronously with Worker role
instances.
The Microsoft Azure platform currently does not provide or support any distributed parallel
computing frameworks, such as MapReduce, Dryad or MPI, other than the support for
implementing basic queue-based job scheduling.
After reviewing the cloud services provided by Amazon, Google, and Microsoft we are in a better
position to understand the differences between SaaS, IaaS, and PaaS. There is no confusion about
SaaS: the service provider supplies both the hardware and the application software; the user has
direct access to these services through a Web interface and has no control over cloud resources.
Typical examples are Google with Gmail, Google docs, Google calendar, Google Groups, and
Picasa, and Microsoft with the Online Services.
In the case of IaaS, the service provider supplies the hardware (servers, storage, networks), and
system software (operating systems, databases); in addition, the provider ensures system attributes
such as security, fault-tolerance, and load balancing. The representative of IaaS is Amazon AWS.
PaaS provides a platform including the hardware and system software such as operating
systems and databases; the service provider is responsible for system updates, patches, and
software maintenance. PaaS does not allow any user control over the operating system, security
features, or the ability to install applications. Typical examples are Google App Engine, Microsoft
Azure, and Force.com provided by Salesforce.com.
The level of user control over the system is different in IaaS versus PaaS; IaaS provides total
control, while PaaS typically provides none. Consequently, IaaS incurs administration costs similar
to those of a traditional computing infrastructure, while the administrative costs are virtually zero for PaaS.
Open-source software platforms for private clouds
Private clouds provide a cost-effective alternative for very large organizations. A private cloud has
essentially the same structural components as a commercial one: the servers, the network,
Virtual Machine Monitors (VMMs) running on individual systems, an archive containing disk
images of Virtual Machines (VMs), a front end for communication with the user, and a cloud
control infrastructure.
Open-source cloud computing platforms such as Eucalyptus, OpenNebula, and Nimbus can be
used as a control infrastructure for a private cloud.
Schematically, a cloud infrastructure carries out the following steps to run an application (see the sketch after this list):
- retrieves the user input from the front-end;
- retrieves the disk image of a VM (Virtual Machine) from a repository;
- locates a system and requests the VMM (Virtual Machine Monitor) running on that system to set up a VM;
- invokes the DHCP and the IP bridging software to set up a MAC and IP address for the VM.
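The sketch below casts these generic steps as Python pseudocode; every object and function in it (front_end, image_repository, scheduler, dhcp, bridge, and their methods) is a hypothetical placeholder rather than any real platform's API.

    # Schematic sketch of the generic steps listed above; all helpers are hypothetical.
    def run_application(front_end, image_repository, scheduler, dhcp, bridge):
        # Step 1: retrieve the user input and the requested VM disk image.
        request = front_end.get_user_request()
        disk_image = image_repository.fetch_disk_image(request.image_id)

        # Step 2: locate a system and ask the VMM running on it to set up a VM.
        host = scheduler.locate_system(request.resources)
        vm = host.vmm.setup_vm(disk_image, request.resources)

        # Step 3: invoke the IP bridging software and DHCP to give the VM a MAC and an IP address.
        mac = bridge.create_virtual_nic(vm)
        ip = dhcp.assign_address(mac)
        return vm, mac, ip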
We now discuss briefly the three open-source software systems, Eucalyptus, OpenNebula,
and Nimbus.
Eucalyptus (http://www.eucalyptus.com/) can be viewed as an open-source counterpart of
Amazon's EC2. The system supports a strong separation between the user space and administrator
space; users access the system via a Web interface, while administrators need root access. The
system supports a decentralized resource management of multiple clusters with multiple cluster
controllers, but a single head node for handling user interfaces. It implements a distributed storage
system, the analog of Amazon's S3 system, called Walrus. The procedure to construct a virtual
machine is based on the generic one described above:
- the euca2ools front-end is used to request a VM;
- the VM disk image is transferred to a compute node;
- this disk image is modified for use by the VMM on the compute node;
- the compute node sets up network bridging to provide a virtual NIC with a virtual MAC address;
- in the head node the DHCP is set up with the MAC/IP pair;
- the VMM activates the VM;
- the user can now ssh directly into the VM.
The system can support a large number of users in a corporate enterprise environment. Users are
shielded from the complexity of disk configurations and can choose a VM from a set of five
configurations for available processors, memory, and hard drive space set up
by the system administrators.
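Because Eucalyptus implements the EC2 interface, a standard EC2 client can also be pointed at the private cloud's endpoint, as in the hedged sketch below using the Python AWS SDK (boto3); the endpoint URL, credentials, and image identifier are placeholders that would be supplied by the Eucalyptus administrator.

    import boto3

    # An EC2-compatible client pointed at the private Eucalyptus endpoint (placeholder URL).
    euca = boto3.client(
        "ec2",
        endpoint_url="https://cloud.example.org:8773/services/compute",
        aws_access_key_id="EUCA_ACCESS_KEY",
        aws_secret_access_key="EUCA_SECRET_KEY",
        region_name="eucalyptus",
    )

    # Request a VM using one of the administrator-defined configurations (instance types).
    reservation = euca.run_instances(
        ImageId="emi-0123456789abcdef0",
        InstanceType="m1.small",
        MinCount=1,
        MaxCount=1,
    )
    print(reservation["Instances"][0]["InstanceId"])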
OpenNebula (http://www.opennebula.org/) is a private cloud whose users actually log into the
head node to access cloud functions. The system is centralized and its default configuration uses
the NFS filesystem.
The procedure to construct a virtual machine consists of several steps:
(i) a user signs in to the head node using ssh;
(ii) next, the user invokes the onevm command to request a VM;
(iii) the VM template disk image is transformed to fit the correct size and configuration within the
NFS directory on the head node;
(iv) the oned daemon on the head node uses ssh to log into a compute node;
(v) the compute node sets up network bridging to provide a virtual NIC with a virtual MAC;
(vi) the files needed by the VMM are transferred to the compute node via NFS;
(vii) the VMM on the compute node starts the VM;
(viii) the user is able to ssh directly to the VM on the compute node.
The system is best suited for an operation involving a small to medium size group of trusted and
knowledgeable users who are able to configure this versatile system based on their needs.
Nimbus (http://www.nimbusproject.org/) is a cloud solution for scientific applications based on
the Globus software; the system inherits from Globus the image storage, the credentials for user
authentication, and the requirement that the running Nimbus process can ssh into all compute
nodes. Customization in this system can only be done by the system administrators.
Table 4 summarizes the features of the three systems.
The conclusions of the comparative analysis are as follows: Eucalyptus is best suited for a large
corporation with its own private cloud, as it ensures a degree of protection from user malice and
mistakes; OpenNebula is best suited for a testing environment with a few servers; Nimbus is more
adequate for a scientific community less interested in the technical internals of the system, but with
broad customization requirements.
Service level agreements and compliance level agreements
A Service Level Agreement (SLA) is a negotiated contract between two parties, the customer and
the service provider; the agreement can be legally binding or informal and specifies the services
that the customer receives, rather than how the service provider delivers the services.
The objectives of the agreement are:
- Identify and define the customers' needs and constraints, including the level of resources, security, timing, and quality of service.
- Provide a framework for understanding; a critical aspect of this framework is a clear definition of classes of service and the costs.
- Simplify complex issues; for example, clarify the boundaries between the responsibilities of the clients and those of the provider of service in case of failures.
- Reduce areas of conflict.
- Encourage dialog in the event of disputes.
- Eliminate unrealistic expectations.
An SLA records a common understanding in several areas: (i) services, (ii) priorities,
(iii) responsibilities, (iv) guarantees, and (v) warranties.
An agreement usually covers: services to be delivered, performance, tracking and reporting,
problem management, legal compliance and resolution of disputes, customer duties and
responsibilities, security, handling of confidential information, and termination.
Each area of service in cloud computing should define a "target level of service" or
"minimum level of service" and specify the levels of availability, serviceability, performance,
operation, or other attributes of the service, such as billing; penalties may also be specified in the
case of non-compliance with the SLA. It is expected that any Service-Oriented Architecture (SOA)
will eventually include middleware supporting SLA management; the Framework 7 project
supported by the European Union is researching this area, see http://sla-at-soi.eu/.
The common metrics specified by an SLA are service-specific. For example, the metrics used by a
call center usually are: (i) abandonment rate - percentage of calls abandoned while waiting to be
answered; (ii) average speed to answer - average time before the service desk answers a call; (iii)
time service factor - percentage of calls answered within a definite time frame; (iv) first-call
resolution - percentage of incoming calls that can be resolved without a callback; and (v)
turnaround time - time to complete a certain task.
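As a small worked example, the sketch below computes four of these metrics from invented counts; all numbers are for illustration only, and turnaround time is omitted because it is measured per task.

    # Invented call-center counts for one reporting period.
    calls_offered = 1000          # total incoming calls
    calls_abandoned = 50          # callers who hung up while waiting
    calls_answered = calls_offered - calls_abandoned
    answered_within_20s = 760     # calls answered within the agreed time frame
    resolved_first_call = 700     # calls resolved without a callback
    total_wait_seconds = 14250    # cumulative time before the service desk answered

    abandonment_rate = 100.0 * calls_abandoned / calls_offered            # (i)   5%
    average_speed_to_answer = total_wait_seconds / calls_answered         # (ii)  15 seconds
    time_service_factor = 100.0 * answered_within_20s / calls_answered    # (iii) 80%
    first_call_resolution = 100.0 * resolved_first_call / calls_answered  # (iv)  about 74%

    print(abandonment_rate, average_speed_to_answer, time_service_factor, first_call_resolution)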
There are two well-differentiated phases in SLA management: the negotiation of the contract and
the monitoring of its fulfilment in real time. In turn, automated negotiation has three main
components: (i) the object of negotiation, which defines the attributes and constraints under
negotiation; (ii) the negotiation protocols, which describe the interaction between negotiating
parties; and (iii) the decision models responsible for processing proposals and generating
counter-proposals.
The concept of compliance in cloud computing arises in the context of the user's ability to select a
service provider; the selection process is subject to customizable compliance with user
requirements such as security, deadlines, and costs.
The authors propose an infrastructure called Compliant Cloud Computing (C3) consisting of: (i) a
language to express user requirements and the Compliance Level Agreements (CLA), and (ii) the
middleware for managing CLAs.
The Web Service Agreement Specification (WS-Agreement) uses an XML-based language to
define a protocol for creating an agreement using a pre-defined template with some customizable
aspects; it only supports one-round negotiation without counter-proposals.