g1 - Ankurm.com

Aim: Introduction to cloud computing, and a case study of the working of Google App Engine and
Amazon Web Services (AWS).
Theory:
Cloud Computing
In computer networking, cloud computing is a phrase used to describe a variety of computing
concepts that involve a large number of computers connected through a communication
network such as the Internet. It is very similar to the concept of utility computing. In science,
cloud computing is a synonym for distributed computing over a network, and means the
ability to run a program or application on many connected computers at the same time.
The phrase is often used in reference to network-based services, which appear to be provided
by real server hardware but are in fact served up by virtual hardware, simulated by software
running on one or more real machines. Such virtual servers do not physically exist and can
therefore be moved around and scaled up or down on the fly without affecting the end user,
somewhat like a cloud becoming larger or smaller without being a physical object.
In common usage, the term "the cloud" is essentially a metaphor for the Internet. Marketers
have further popularized the phrase "in the cloud" to refer to software, platforms and
infrastructure that are sold "as a service", i.e. remotely through the Internet. Typically, the
seller has actual energy-consuming servers which host products and services from a remote
location, so end-users don't have to; they can simply log on to the network without installing
anything. The major models of cloud computing service are known as software as a service,
platform as a service, and infrastructure as a service. These cloud services may be offered in a
public, private or hybrid network. Google, Amazon, Oracle Cloud, Salesforce, Zoho,
Access2MyPC, and Microsoft Azure are some well-known cloud vendors.
Advantages
Cloud computing relies on sharing of resources to achieve coherence and economies of scale,
similar to a utility (like the electricity grid) over a network. At the foundation of cloud
computing is the broader concept of converged infrastructure and shared services.
The cloud also focuses on maximizing the effectiveness of the shared resources. Cloud
resources are usually not only shared by multiple users but are also dynamically reallocated
on demand, so the same hardware can serve different users at different times. For example, a cloud computer
facility that serves European users during European business hours with a specific application
(e.g., email) may reallocate the same resources to serve North American users during North
America's business hours with a different application (e.g., a web server). This approach
should maximize the use of computing power and thereby reduce environmental damage,
since less power, air conditioning, rack space, etc. is required for a variety of functions. With
cloud computing, multiple users can access a single server to retrieve and update their data
without purchasing licenses for different applications.
Proponents claim that cloud computing allows companies to avoid upfront infrastructure
costs, and focus on projects that differentiate their businesses instead of infrastructure.
Proponents also claim that cloud computing allows enterprises to get their applications up
and running faster, with improved manageability and less maintenance, and enables IT to
more rapidly adjust resources to meet fluctuating and unpredictable business demand. Cloud
providers typically use a "pay-as-you-go" model, which can lead to unexpectedly high charges
if administrators do not adapt to the cloud pricing model.
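Pay-as-you-go billing is simply usage multiplied by a unit price, which is why a server that is running but idle still costs money. The sketch below makes that arithmetic concrete; the hourly rate is a hypothetical placeholder, not any provider's actual price.

```python
# Hypothetical on-demand price per instance-hour; real prices vary by provider,
# region and instance type.
HOURLY_RATE_USD = 0.10


def monthly_cost(instances, hours_per_day, days=30, rate=HOURLY_RATE_USD):
    """Pay-as-you-go: cost = instances x hours used per day x days x hourly rate."""
    return round(instances * hours_per_day * days * rate, 2)


# Ten servers left running around the clock versus only during an 8-hour business day.
always_on = monthly_cost(10, 24)
business_hours = monthly_cost(10, 8)
```

With these assumed numbers the always-on fleet costs 720.00 USD a month and the business-hours fleet 240.00 USD: the same capacity, billed three times over simply because nobody switched it off.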
Service models
Cloud computing providers offer their services according to several fundamental models:
infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service
(SaaS), where IaaS is the most basic and each higher model abstracts away the details of the
lower models.
• Infrastructure as a service (IaaS)
In the most basic cloud-service model, providers of IaaS offer computers – physical or
(more often) virtual machines – and other resources. IaaS clouds often offer additional
resources such as a virtual-machine disk image library, raw (block) and file-based
storage, firewalls, load balancers, IP addresses, virtual local area networks (VLANs),
and software bundles. IaaS-cloud providers supply these resources on-demand from
their large pools installed in data centers. For wide-area connectivity, customers can
use either the Internet or carrier clouds (dedicated virtual private networks).
• Platform as a service (PaaS)
In the PaaS model, cloud providers deliver a computing platform, typically including an
operating system, a programming-language execution environment, a database, and a web
server. Application developers can develop and run their software solutions on a cloud
platform without the cost and complexity of buying and managing the underlying
hardware and software layers. With some PaaS offerings, such as Windows Azure, the
underlying compute and storage resources scale automatically to match application
demand, so the cloud user does not have to allocate resources manually. Such automatic
scaling has also been proposed in an architecture aimed at supporting real-time
applications in cloud environments.
• Software as a service (SaaS)
In the business model using software as a service (SaaS), users are provided access to
application software and databases. Cloud providers manage the infrastructure and
platforms that run the applications. SaaS is sometimes referred to as "on-demand
software" and is usually priced either on a pay-per-use basis or, more commonly, through a
subscription fee.
Fig: Cloud Computing
Google App Engine
Google App Engine (often referred to as GAE or simply App Engine) is a platform as a
service (PaaS) cloud computing platform for developing and hosting web applications in
Google-managed data centers. Applications are sandboxed and run across multiple servers.
App Engine offers automatic scaling for web applications—as the number of requests
increases for an application, App Engine automatically allocates more resources for the web
application to handle the additional demand.
Google App Engine is free up to a certain level of consumed resources. Fees are charged for
additional storage, bandwidth, or instance hours required by the application. It was first
released as a preview version in April 2008, and came out of preview in September 2011.
Supported features/restrictions
Runtimes and framework
Currently, the supported programming languages are Python, Java (and, by extension, other
JVM languages such as Groovy, JRuby, Scala and Clojure), Go, and PHP; Go and PHP are still
experimental. Google has said that it plans to support more languages in the future and
that App Engine has been written to be language-independent.
Python web frameworks that run on Google App Engine include Django, CherryPy, Pyramid,
Flask, web2py and webapp2, as well as Google's own webapp framework and several others
that have emerged since the release and are designed specifically for the platform. Any
Python framework that supports WSGI (through the CGI adapter) can be used to create an
application, and the framework can be uploaded together with the application. Third-party
libraries written in pure Python may also be uploaded.
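Since any WSGI-compatible framework can be used, the smallest possible application is a plain WSGI callable. The sketch below uses only the standard library's `wsgiref` package so it can be exercised locally; the deployment configuration App Engine needs (such as mapping URLs to the application) is not shown and depends on the SDK version. The `call_app` helper is a hypothetical test harness, not part of App Engine.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable; WSGI frameworks ultimately expose an object like this."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Invoke the app in-process with a fake request instead of opening a socket."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

To serve it on a local port instead, `wsgiref.simple_server.make_server("", 8080, application).serve_forever()` is enough; a framework such as webapp2 or Flask would replace the hand-written `application` function.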
Google App Engine supports many Java standards and frameworks. Core to this is Java Servlet
2.5 technology running on the open-source Jetty web server, along with accompanying
technologies such as JSP. JavaServer Faces works with some workarounds. Although the
datastore may be unfamiliar to programmers, it is easily accessed through JPA, and JDO and
other methods of reading and writing data are also provided. The Spring
Framework works with GAE; however, the Spring Security module (if used) requires
workarounds. Apache Struts 1 is supported, and Struts 2 runs with workarounds.
The Django web framework and applications running on it can be used on App Engine with
modification. Django-nonrel aims to allow Django to work with non-relational databases and
the project includes support for App Engine.
Reliability and Support
App Engine is designed to sustain multiple data-center outages without
any downtime. As evidence of this resilience, the High Replication
Datastore saw 0% downtime over a period of a year. In general, App Engine applications
have a 99.95% uptime SLA.
Paid support from Google engineers is offered as part of Premier Accounts. Free support is
offered in the App Engine Groups and on Stack Overflow; however, assistance from a Google staff
member is not guaranteed.
Bulk downloading
SDK version 1.2.2 added support for bulk downloads of data using Python. The open source
Python projects gaebar, approcket, and gawsh also allow users to download and backup App
Engine data. No method for bulk downloading data from GAE using Java currently exists.
Restrictions
• Developers have read-only access to the filesystem on App Engine. Applications can
use only virtual filesystems, such as gae-filestore.
• App Engine can only execute code called from an HTTP request (scheduled
background tasks allow for self-calling HTTP requests).
• Users may upload arbitrary Python modules, but only if they are pure Python; C and
Pyrex modules are not supported.
• Java applications may only use a subset (the JRE Class White List) of the classes
from the JRE Standard Edition.
• 'Naked' domains (without www), such as http://example.com, are not supported. The
required alias to ghs.google.com is implemented with a DNS CNAME record so that
changes in Google's server IP addresses do not affect the service. A CNAME record
cannot coexist with other DNS records for the same name, including the Start of
Authority record required for the example.com DNS zone.
• The Datastore cannot use inequality filters on more than one entity property per query.
• A process started on the server to answer a request cannot last more than 60 seconds
(since the 1.4.0 release, this restriction no longer applies to background jobs).
• Sticky sessions (a.k.a. session affinity) are not supported; only replicated sessions are,
and these are limited in the amount of data that can be serialized and in the time allowed
for session serialization.
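The single-inequality-property rule is easy to check before a query is sent. The hypothetical `check_filters` helper below models a query's filters as `(property, operator, value)` tuples and rejects inequality filters on two different properties; it is a sketch of the rule as stated above, not part of any App Engine SDK.

```python
# Operators treated as inequality filters in this sketch.
INEQUALITY_OPS = {"<", "<=", ">", ">=", "!="}


def check_filters(filters):
    """Raise ValueError if inequality filters touch more than one property.

    `filters` is a list of (property, operator, value) tuples, a hypothetical
    representation chosen for illustration.
    """
    inequality_props = {prop for prop, op, _ in filters if op in INEQUALITY_OPS}
    if len(inequality_props) > 1:
        raise ValueError(
            "inequality filters on more than one property: "
            + ", ".join(sorted(inequality_props))
        )
    return True
```

A range on a single property such as `age >= 18 AND age < 65` passes, as does an equality filter combined with one inequality; `age > 18 AND height < 180` is rejected and would have to be split, with the second condition applied in application code.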
Major differences
• Differences with other application hosting
Compared to other scalable hosting services such as Amazon EC2, App Engine
provides more infrastructure to make it easy to write scalable applications, but can
only run a limited range of applications designed for that infrastructure.
App Engine's infrastructure removes many of the system administration and
development challenges of building applications to scale to hundreds of requests per
second and beyond. Google handles deploying code to a cluster, monitoring, failover,
and launching application instances as necessary.
While other services let users install and configure nearly any *NIX-compatible
software, App Engine requires developers to use only its supported languages, APIs,
and frameworks. Current APIs allow storing and retrieving data from a BigTable
non-relational database, making HTTP requests, sending e-mail, manipulating images, and
caching. Existing web applications that require a relational database will not run on
App Engine without modification.
Per-day and per-minute quotas restrict bandwidth and CPU use, number of requests
served, number of concurrent requests, and calls to the various APIs, and individual
requests are terminated if they take more than 60 seconds or return more than 32MB
of data.
• Differences between SQL and GQL
Google App Engine's datastore has a SQL-like syntax called "GQL". GQL
intentionally does not support the JOIN statement, because joins are inefficient
when a query spans more than one machine. Instead, one-to-many and many-to-many
relationships can be expressed using ReferenceProperty(). This shared-nothing
approach allows disks to fail without the system failing. Switching from a relational
database to the Datastore requires a paradigm shift for developers when modelling
their data, because the Datastore API is not relational in the SQL sense.
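Without joins, a one-to-many relationship is resolved in application code: fetch the parent, then select the children whose reference points at it. The sketch below imitates that pattern with plain in-memory Python lists; `Author`, `Book` and the `author_key` field are hypothetical stand-ins for datastore entities and a `ReferenceProperty`, not the datastore API itself.

```python
from dataclasses import dataclass


@dataclass
class Author:
    key: int
    name: str


@dataclass
class Book:
    title: str
    author_key: int  # plays the role of a ReferenceProperty pointing at an Author


AUTHORS = [Author(1, "Asha"), Author(2, "Ravi")]
BOOKS = [Book("Clouds", 1), Book("Grids", 1), Book("Queues", 2)]


def books_by(author_name):
    """Two lookups instead of a SQL JOIN: find the author, then filter books by its key."""
    author = next(a for a in AUTHORS if a.name == author_name)
    return [b.title for b in BOOKS if b.author_key == author.key]
```

Each lookup can be served by a single machine holding the relevant data, which is the property the shared-nothing design relies on.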
The Java version supports asynchronous non-blocking queries using the Twig Object
Datastore interface. This offers an alternative to using threads for parallel data
processing.
Portability concerns
Developers have worried that applications written for App Engine will not be portable and fear being
locked into the technology. In response, there are a number of projects to create open-source
back-ends for the various proprietary/closed APIs of App Engine, especially the datastore.
Although these projects are at various levels of maturity, none of them are at the point where
installing and running an App Engine app is as simple as it is on Google's service. AppScale
and TyphoonAE are two of the open source efforts.
AppScale can run Python, Java, and Go GAE applications on EC2 and other cloud vendors.
TyphoonAE can run Python App Engine applications on any cloud that supports Linux
machines.
The web2py framework offers migration between SQL databases and Google App Engine;
however, it does not support several App Engine-specific features such as transactions and
namespaces.
Backends
At Google I/O 2011, Google announced App Engine Backends, which are allowed to run
continuously and consume more memory.
Google Cloud SQL
In October 2011, Google previewed a zero-maintenance SQL database, which supports JDBC and
DB-API. This service allows developers to create, configure and use relational databases with App
Engine applications. The database engine is MySQL 5.1.59, and the database size must
be no larger than 10 GB.
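Because the service speaks the standard Python DB-API, code written against it follows the usual connect, cursor, execute, commit pattern. The sketch below uses the standard library's `sqlite3` module, which implements the same DB-API 2.0 interface, purely as a local stand-in; a real Cloud SQL connection would use a MySQL driver and connection details not covered here. The `guestbook` table is hypothetical.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")  # local stand-in for a Cloud SQL instance
    try:
        cur = conn.cursor()
        cur.execute(
            "CREATE TABLE guestbook (id INTEGER PRIMARY KEY, author TEXT, message TEXT)"
        )
        cur.executemany(
            "INSERT INTO guestbook (author, message) VALUES (?, ?)",
            [("asha", "hello"), ("ravi", "hi there")],
        )
        conn.commit()
        cur.execute("SELECT author, message FROM guestbook ORDER BY id")
        return cur.fetchall()
    finally:
        conn.close()
```

One portability detail: `sqlite3` uses `?` as its parameter placeholder, whereas MySQL drivers for Python typically use `%s`, so query strings are not always interchangeable between DB-API drivers.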
Usage quotas of Google App Engine
Google App Engine requires a Google account to get started, and an account may allow the
developer to register up to 10 applications. This limit can be increased by Google staff.
Google App Engine defines usage quotas for free applications. Extensions to these quotas can
be requested, and application authors can pay for additional resources. The limits and
quotas defined per application are listed below.
Hard limits
Quota | Limit
Time per request | 60 sec per normal request, 10 minutes for tasks, unlimited for backends
HTTP response size | 32 MB
Datastore item size | 1 MB
Free quotas
Application creators who enable billing pay only for instance hours, bandwidth, storage, and
API usage in excess of the free quotas. Free quotas were reduced on May 25, 2009, and reduced
again on June 22, 2009; they were then revised in May 2011 to allow for infrastructure and
pricing changes.
Quota | Limit (per day)
Instance-hours | 28 hours
Emails | 100 (5,000 admin emails)
Bandwidth in | Unlimited
Bandwidth out | 1 GB
Datastore | 1 GB
Datastore operations | 50k
Blob storage | 5 GB
XMPP API | 10k stanzas
Channel API | 100 channels opened
URLFetch API calls | 657,000
Amazon Web Services
Amazon Elastic Compute Cloud (EC2) is a central part of Amazon.com's cloud computing
platform, Amazon Web Services (AWS). EC2 allows users to rent virtual computers on which
to run their own computer applications. EC2 allows scalable deployment of applications by
providing a Web service through which a user can boot an Amazon Machine Image to create
a virtual machine, which Amazon calls an "instance", containing any software desired. A user
can create, launch, and terminate server instances as needed, paying by the hour for active
servers, hence the term "elastic". EC2 provides users with control over the geographical
location of instances that allows for latency optimization and high levels of redundancy.
The Amazon Elastic Block Store (EBS) provides raw block devices that can be attached to
Amazon EC2 instances. These block devices can then be used like any raw block device. In a
typical use case, this would include formatting the device with a file system and mounting
said file system. In addition, EBS supports a number of advanced storage features, including
cloning. EBS volumes can be up to 1TB in size. EBS volumes are built on replicated storage,
so that the failure of a single component will not cause data loss.
Features
1. Operating Systems
When it launched in August 2006, the EC2 service offered Linux; Sun Microsystems'
OpenSolaris and Solaris Express Community Edition were added later. In October 2008,
EC2 added the Windows Server 2003 and Windows Server 2008 operating systems to
the list of available operating systems. NetBSD AMIs became available in March 2011, and in
November 2012 Amazon officially began supporting FreeBSD on EC2.
2. Amazon Linux AMI
Amazon has its own low-cost Linux distribution, based on Fedora and Red Hat Enterprise
Linux, known as the Amazon Linux AMI. Version 2013.03 included:
• Linux kernel version 3.4.34
• Java OpenJDK Runtime Environment (IcedTea6 1.11.4)
• GNU Compiler Collection gcc.x86_64 4.4.6-3.45.amzn1
3. Persistent storage
An EC2 instance may be launched with a choice of two types of storage for its boot
disk or "root device." The first option is a local "instance-store" disk as a root device
(originally the only choice). The second option is to use an EBS volume as a root
device. Instance-store volumes are temporary storage, which survive rebooting an
EC2 instance, but when the instance is stopped or terminated (e.g., by an API call, or
due to a failure), this store is lost.
EBS volumes provide persistent storage independent of the lifetime of the EC2
instance, and act much like hard drives on a real server. More accurately, they appear
as block devices to the operating system that are backed by Amazon's disk arrays. The
OS is free to use the device however it wants. In the most common case, a file system
is created on the volume and it acts as a hard drive. Another possible use is the creation of
RAID arrays by combining two or more EBS volumes, which can increase the
speed and/or reliability of EBS storage. Users can set up and manage storage volumes of sizes
from 1 GB to 1 TB. The volumes support snapshots, which can be taken from a GUI
tool or the API. EBS volumes can be attached to or detached from instances while the instances
are running, and moved from one instance to another.
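Striping EBS volumes into RAID 0 raises throughput because consecutive chunks of data land on different volumes and can be read or written in parallel. The sketch below shows only the striping arithmetic; the 64 KiB chunk size is an arbitrary assumption, and in practice the striping is done by the guest operating system (for example, Linux software RAID), not by EBS itself.

```python
CHUNK_SIZE = 64 * 1024  # bytes per stripe chunk (assumed)


def locate(offset, volumes, chunk=CHUNK_SIZE):
    """Map a byte offset on a RAID 0 array to (volume index, byte offset on that volume)."""
    chunk_no, within = divmod(offset, chunk)  # which chunk, and where inside it
    volume = chunk_no % volumes               # chunks are dealt round-robin to volumes
    stripe = chunk_no // volumes              # full rounds completed before this chunk
    return volume, stripe * chunk + within
```

With four volumes, byte 0 lives on volume 0, the next 64 KiB chunk on volume 1, and so on until the fifth chunk wraps back to volume 0. RAID 0 improves speed only: losing any one volume loses the whole array, so mirroring (RAID 1) is the layout that improves reliability instead.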
Simple Storage Service (S3) is a storage system in which data is accessible to EC2
instances, or directly over the network to suitably authenticated callers (all
communication is over HTTP). Amazon does not charge for the bandwidth for
communications between EC2 instances and S3 storage "in the same region."
Accessing S3 data stored in a different region (for example, data stored in Europe
from a US East Coast EC2 instance) will be billed at Amazon's normal rates.
S3-based storage is priced per gigabyte per month. Applications access S3 through an
API. For example, Apache Hadoop supports a special s3: filesystem to support
reading from and writing to S3 storage during a MapReduce job. There are also S3
filesystems for Linux, which mount a remote S3 filestore on an EC2 image, as if it
were local storage. As S3 is not a full POSIX filesystem, things may not behave the
same as on a local disk (e.g., no locking).
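The regional billing rule above can be expressed as a small cost function. All prices here are hypothetical placeholders, and real S3 billing has further dimensions (request counts, storage classes, tiered rates) that this sketch ignores.

```python
STORAGE_USD_PER_GB_MONTH = 0.03  # hypothetical storage price
TRANSFER_USD_PER_GB = 0.02       # hypothetical inter-region transfer price


def s3_monthly_cost(stored_gb, read_gb, bucket_region, instance_region):
    """Storage is always billed; EC2-to-S3 transfer is billed only across regions."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    transfer = 0.0 if bucket_region == instance_region else read_gb * TRANSFER_USD_PER_GB
    return round(storage + transfer, 2)
```

For 100 GB stored and 50 GB read per month, an instance in the bucket's own region pays only for storage (3.00 USD at these assumed rates), while an instance in another region also pays for the 50 GB transferred (4.00 USD in total). Keeping compute and data in the same region avoids the transfer charge entirely.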
4. Elastic IP addresses
Amazon's elastic IP address feature is similar to a static IP address in traditional data
centers, with one key difference: a user can programmatically map an elastic IP
address to any virtual machine instance without a network administrator's help and
without having to wait for DNS to propagate the new binding. In this sense, an Elastic
IP address belongs to the account and not to a virtual machine instance. It exists until
it is explicitly removed and remains associated with the account even while it is
associated with no instance. Partial IPv6 support is provided in the US East (Northern
Virginia), EU (Ireland) and Asia Pacific (Tokyo and Singapore) regions.
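The defining property of an Elastic IP, that it belongs to the account rather than to an instance, can be modelled with a simple mapping. The `Account` class below is a hypothetical illustration of those semantics, not the EC2 API; the address comes from the 203.0.113.0/24 documentation range and the instance IDs are made up.

```python
class Account:
    """Toy model: elastic IPs are owned by the account and only mapped to instances."""

    def __init__(self):
        self.addresses = {}  # ip -> instance id, or None when unassociated

    def allocate(self, ip):
        self.addresses[ip] = None

    def associate(self, ip, instance_id):
        if ip not in self.addresses:
            raise KeyError(f"{ip} is not allocated to this account")
        self.addresses[ip] = instance_id  # remapping takes effect at once, no DNS change

    def disassociate(self, ip):
        self.addresses[ip] = None  # the address stays with the account

    def release(self, ip):
        del self.addresses[ip]  # only an explicit release removes it
```

Failing over then amounts to calling `associate` with the replacement instance: clients keep using the same address, and nothing has to wait for DNS caches to expire.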
5. Amazon CloudWatch
Amazon CloudWatch is a Web service that provides real-time monitoring of Amazon
EC2 customers' resource utilization, such as CPU, disk I/O and network traffic.
CloudWatch does not provide any memory, disk space, or load average metrics
without running additional software on the instance; Amazon provides example
scripts for Linux instances. The data is aggregated and provided through the AWS
Management Console. It can also be accessed through command-line tools and Web
APIs, if customers want to monitor their EC2 resources through their enterprise
monitoring software.
The metrics collected by Amazon CloudWatch enable the Auto Scaling feature to
dynamically add or remove EC2 instances. Customers are charged by the number
of monitored instances.
Since May 2011, Amazon CloudWatch accepts custom metrics that can be submitted
programmatically via the Web Services API and then monitored the same way as all other
internal metrics, including setting up alarms for them.
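An alarm on a metric, built-in or custom, boils down to comparing recent datapoints with a threshold. The sketch below assumes an alarm that fires when the last `periods` datapoints all exceed the threshold; the state names match CloudWatch's alarm states, but the evaluation logic is a simplified stand-in, not CloudWatch's exact semantics.

```python
def alarm_state(datapoints, threshold, periods):
    """Return "ALARM" if the most recent `periods` datapoints all exceed `threshold`."""
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"  # not enough history to decide yet
    recent = datapoints[-periods:]
    return "ALARM" if all(v > threshold for v in recent) else "OK"
```

Requiring several consecutive breaches, rather than one, keeps a single CPU spike from triggering an alarm or a scaling action.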
6. Automated scaling
Amazon's auto scaling feature of EC2 allows it to automatically adapt computing
capacity to site traffic. The schedule-based (e.g., time-of-day) and rule-based (e.g.,
CPU utilization thresholds) auto scaling mechanisms are easy to use and efficient for
simple applications. However, one potential problem is that VMs may take up to
several minutes to be ready to use, which makes them unsuitable for time-critical
applications. VM startup time depends on image size, VM type, data center
location, operating system, etc.
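A rule-based policy of the kind described above can be sketched as a pure function from average CPU utilization to a desired instance count. The thresholds, step size and minimum/maximum bounds are assumptions chosen for illustration, and the sketch ignores the VM startup delay just discussed, which is why real policies add headroom or a cooldown period.

```python
def desired_capacity(current, avg_cpu, scale_out_at=70.0, scale_in_at=30.0,
                     step=1, minimum=1, maximum=10):
    """Threshold-based scaling: add instances when busy, remove them when idle."""
    if avg_cpu > scale_out_at:
        target = current + step
    elif avg_cpu < scale_in_at:
        target = current - step
    else:
        target = current  # inside the comfortable band: leave the fleet alone
    return max(minimum, min(maximum, target))  # never leave the allowed range
```

The gap between the two thresholds matters: if scale-out and scale-in triggered at the same utilization, the fleet could oscillate, adding and removing an instance on alternate evaluations.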
Benefits
Elastic Web-Scale Computing
Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days.
You can commission one, hundreds or even thousands of server instances simultaneously. Of
course, because this is all controlled with web service APIs, your application can
automatically scale itself up and down depending on its needs.
Completely Controlled
You have complete control of your instances. You have root access to each one, and you can
interact with them as you would any machine. You can stop your instance while retaining the
data on your boot partition and then subsequently restart the same instance using web service
APIs. Instances can be rebooted remotely using web service APIs. You also have access to
console output of your instances.
Flexible Cloud Hosting Services
You have the choice of multiple instance types, operating systems, and software packages.
Amazon EC2 allows you to select a configuration of memory, CPU, instance storage, and the
boot partition size that is optimal for your choice of operating system and application. For
example, your choice of operating systems includes numerous Linux distributions, and
Microsoft Windows Server.
Designed for use with other Amazon Web Services
Amazon EC2 works in conjunction with Amazon Simple Storage Service (Amazon S3),
Amazon Relational Database Service (Amazon RDS), Amazon SimpleDB and Amazon
Simple Queue Service (Amazon SQS) to provide a complete solution for computing, query
processing and storage across a wide range of applications.
Reliable
Amazon EC2 offers a highly reliable environment where replacement instances can be
rapidly and predictably commissioned. The service runs within Amazon’s proven network
infrastructure and datacenters. The Amazon EC2 Service Level Agreement commitment is
99.95% availability for each Amazon EC2 Region.
Secure
Amazon EC2 works in conjunction with Amazon VPC to provide security and robust
networking functionality for your compute resources.
• Your compute instances are located in a Virtual Private Cloud (VPC) with an IP range
that you specify. You decide which instances are exposed to the Internet and which
remain private.
• Security groups and network ACLs allow you to control inbound and outbound
network access to and from your instances.
• You can connect your existing IT infrastructure to resources in your VPC using
industry-standard encrypted IPsec VPN connections.
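Security-group rules are allow-only: inbound traffic is accepted if some rule matches its protocol, port and source address, and dropped otherwise. The sketch below evaluates a hypothetical rule list with the standard `ipaddress` module; it ignores outbound rules and network ACLs (which, unlike security groups, can also contain explicit deny rules).

```python
import ipaddress
from typing import NamedTuple


class Rule(NamedTuple):
    protocol: str   # "tcp" or "udp"
    from_port: int
    to_port: int
    source: str     # CIDR block, e.g. "0.0.0.0/0"


# Hypothetical group: HTTPS from anywhere, SSH only from one office range.
RULES = [
    Rule("tcp", 443, 443, "0.0.0.0/0"),
    Rule("tcp", 22, 22, "198.51.100.0/24"),
]


def inbound_allowed(protocol, port, source_ip, rules=RULES):
    """Allow the packet if any rule matches; a security group has no explicit deny."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        r.protocol == protocol
        and r.from_port <= port <= r.to_port
        and addr in ipaddress.ip_network(r.source)
        for r in rules
    )
```

Because there is no deny rule, tightening access means removing or narrowing an allow rule; here SSH from outside 198.51.100.0/24 is refused simply because nothing permits it.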
Conclusion: Hence we have studied cloud computing, Google App Engine and Amazon
Web Services.