
Developing Applications for
Cloud Computing Platforms
Jeremy Cohen
Department of Computing
Imperial College London
The Influence and Impact of Web 2.0 on
e-Research Infrastructure, Applications and Users
NeSC, Edinburgh
24th March 2009
Outline
• Why migrate my apps to the Cloud?
• Application / Usage profiles
• Challenges
• Client / Server-side Technologies
• Examples
Why migrate?
• Need more compute power / storage than easily accessible
locally / free up local resources
• Avoid costs/problems of local resource hosting
• Power, cooling, space, maintenance, …
• Flexibility / Scalability
• Discontinuous demand
• Rapid growth / decline
• Provisioning resources in-house takes too long
Why migrate?
• Pay only for what you use
• Local networking / bandwidth constraints
• Move some/most costs from Capex to Opex
• Greater control – firewalls, resource types, etc.
• Transparent technology refresh
Why not migrate?
• Unsuitable application model
• Security concerns – confidential data / algorithms / …
• Specific hardware/infrastructure requirements (e.g. high-performance inter-node linking)
• Infrastructure location issues
• Latency concerns
• Resource/data storage locations
• SLA guarantees not satisfactory
What services are on offer?
• Limited number of raw infrastructure providers
• Increasing numbers of higher level service providers
• Infrastructure – dynamic DNS, load balancing, etc.
• Brokering / Marketplace
• Software toolkits
• Simplified resource management – APIs, GUIs
• Consultants / Application enablers
• Different payment models
Application Profiles
Where does your app fit in?
Application profiles
• Batch applications – limited / no interactivity
• HPC applications
• Client / server – Web 2.0 apps, Software-as-a-Service
• Standalone interactive applications
[Diagram: Data in, Results out]
Application profiles
• Batch applications
• Code takes some input data and carries out processing,
returning result data
• Generally no interactivity
• Individual tasks may be
• Computationally intensive – long running
• Computationally simple but high throughput
• May require significant data to carry out processing – either
as input or from third-party source
• Likely to be produced as a native executable so may require
a specific CPU type for execution
✔
Application profiles
• Web 2.0 apps – client / server model
• High throughput, interactivity
• May be data intensive / processor intensive
• Loosely-coupled, client/server design
• Message-based communication between application
components
• Handle state / sessions for support of multiple concurrent
clients
• SaaS
• Service enabled application core
• Client-side (web) application provides remote GUI
✔
Application profiles
• Standalone interactive applications
• Traditional desktop applications
• Highly interactive but generally not highly processor intensive
• Tight coupling between application functionality and user
interface
• Generally not designed for access by multiple (concurrent)
users
✖
?
Application profiles
• HPC Applications
• Processor/Memory intensive
• Data intensive
• Generally batch applications but may have elements of
interactivity
• May be parallelised – operation across multiple CPUs (e.g.
MPI, OpenMP, Hadoop, …)
• May require extensive communication between parallel nodes
(high performance interconnects required)
• Visualisation / steering of output often necessary
✓
Usage profiles
• Frequency
• How frequently an application is used
• Is usage predictable?
• Load
• Does application require significant processing power?
• Is the processing requirement similar for each application
run?
• Is it dependent on input data?
• Can required processing capacity be identified
programmatically in advance of an application run?
Usage profiles
• Data volume / proximity / coupling
• How much data is involved in a run of the application?
• Is data proximity of importance – if there is a lot of transfer of
data between storage and execution resource, data should be
stored close to where the app is run
• How tightly coupled is the data – can data transfer be
optimised?
• Availability / Reliability – need SLA?
• Are guarantees on uptime / reliability needed?
• If the resources running the application go down, how long
will it take / how complex will it be to restart it?
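The data proximity question can be made concrete with a rough transfer-time estimate. The following is a minimal sketch only: the function name, the example data size, the link speed and the efficiency factor are illustrative assumptions, not figures from this talk.

```python
def transfer_time_hours(data_gb, bandwidth_mbit_s, efficiency=0.8):
    """Estimate the hours needed to move data_gb gigabytes over a network link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually achieved (protocol overhead, contention, ...).
    """
    if data_gb < 0 or bandwidth_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid data size, bandwidth or efficiency")
    data_mbit = data_gb * 8 * 1000  # decimal units: 1 GB = 8000 Mbit
    seconds = data_mbit / (bandwidth_mbit_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical case: 500 GB of input data over a 100 Mbit/s link.
    print("%.1f hours" % transfer_time_hours(500, 100))
```

If the estimate is large compared with the run time of the application itself, it suggests storing the data close to where the app is run.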
Usage profiles
• Information Security
• How critical is data/code security?
• IP in code (algorithms, etc.), data
• Data protection issues – where can data be sent / stored?
• Is third party data being used? Can this be transferred to
another location for processing?
• Latency requirements
• Real time data processing applications
• Are there specific requirements for latency on network
connections?
• Are these catered for under SLA?
Challenges – Preparing Your
Application for the Cloud
Preparing your application
• What are you aiming for?
• One-off/occasional manual execution of an application on a
remote resource from a terminal
• e.g. long running HPC app, don’t want to hog CPU on
local resource for a long period of time
• Use a Cloud platform such as Amazon EC2 to create an
instance of a Cloud resource and interact with it via a
terminal to upload and run your application
• Full remote deployment of application
• Remote execution / interaction
Preparing your application
• Batch applications (e.g. scientific HPC codes)
• If native code, need to ensure CPU/OS requirements are
supported
• Same goes for apps based on JIT-compiled / interpreted languages – the
required runtime must be supported on the Cloud platform
• Does application have a GUI?
• Data transfer issues – if very data intensive, data transfer may
present problems
• Dynamic deployment / wrapping?
Preparing your application
• Web 2.0 / SaaS applications
• Deploy necessary application server and server-side code
• If supported by Cloud provider, bundle deployed system in
platform wrapper for easy restart / creating additional nodes
• Storage considerations
• How much output data is there?
• Where are you going to put it?
Preparing your application - Web 2.0
• Aim for loosely-coupled SOA model
• Decoupling of GUI from backend
[Diagram: Client Interface and three separate Application Components]
Preparing your application - Batch
• Getting native executables onto remote platform and
controlling execution
• Deploy app at runtime – e.g. via job manager /
middleware installed on Cloud instance
• Lightweight application wrapping
• Provide service interface for basic execution control of apps
• e.g. start, getOutput, getError
• Static deployment of application into Cloud instance
[Diagram: Service Wrapper – Interface, Messaging APIs, Native Code Executable, Native Libraries]
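The lightweight wrapping idea can be sketched as a small class around a native executable. This is a hedged illustration, not the wrapper used in the talk: only the operation names start, getOutput and getError come from the slide, while the class name `ServiceWrapper`, the extra `wait` method and the use of temporary files for captured output are assumptions. A real deployment would expose these methods through a service interface rather than calling them locally.

```python
import os
import subprocess
import sys
import tempfile


class ServiceWrapper:
    """Hypothetical wrapper giving basic execution control over one executable."""

    def __init__(self, command):
        self.command = list(command)
        self._process = None
        # Output is captured in files so a chatty application cannot block
        # on a full pipe while nobody is reading.
        self._workdir = tempfile.TemporaryDirectory(prefix="wrapper-")
        self._out_path = os.path.join(self._workdir.name, "stdout.txt")
        self._err_path = os.path.join(self._workdir.name, "stderr.txt")

    def start(self):
        if self._process is not None:
            raise RuntimeError("application already started")
        with open(self._out_path, "wb") as out, open(self._err_path, "wb") as err:
            # The child process keeps its own copies of the file handles.
            self._process = subprocess.Popen(self.command, stdout=out, stderr=err)

    def wait(self, timeout=None):
        if self._process is None:
            raise RuntimeError("application not started")
        return self._process.wait(timeout=timeout)

    @staticmethod
    def _read(path):
        if not os.path.exists(path):
            return ""
        with open(path, "rb") as handle:
            return handle.read().decode(errors="replace")

    def getOutput(self):
        return self._read(self._out_path)

    def getError(self):
        return self._read(self._err_path)


if __name__ == "__main__":
    # The Python interpreter stands in for a native executable here.
    app = ServiceWrapper([sys.executable, "-c", "print('done')"])
    app.start()
    app.wait(timeout=30)
    print(app.getOutput())
```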
Technologies –
Server-side / Client-side
Service-enabling your application
Server side software / technologies
• Cloud environments may provide a managed interface to physical
hardware, or a virtualised platform on which you install your own
OS/application image
• An Application Server / Servlet Container may be needed to host
your application and provide the messaging infrastructure to
communicate with it
• e.g. Apache Tomcat, Glassfish, JBoss, etc.
Server-side software / technologies
• Services / Messaging / Transport – Getting messages to Cloud apps
• Web Services (WSDL, SOAP) – Apache Axis, JAX-WS, …
• HTTP GET/POST
• JMS
• Adobe BlazeDS
• RMI
• CORBA, …
[Diagram: Client App and Server – Messaging (e.g. SOAP over HTTP), Service Description (e.g. WSDL)]
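Of the transport options above, plain HTTP POST is the simplest to illustrate. The sketch below is an assumption-laden example rather than anything from the talk: it uses a JSON message body instead of SOAP, and the names `AppHandler`, `start_server` and `call_service`, as well as the toy "sum" operation, are hypothetical. It uses only the Python standard library.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class AppHandler(BaseHTTPRequestHandler):
    """Server side: receives a message, runs the 'application', returns a result."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        message = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"sum": sum(message.get("values", []))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AppHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_service(url, values):
    """Client side: POST a message to the service and decode the reply."""
    request = Request(url, data=json.dumps({"values": values}).encode(),
                      headers={"Content-Type": "application/json"}, method="POST")
    opener = build_opener(ProxyHandler({}))  # ignore any proxy settings
    with opener.open(request, timeout=10) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_server()
    print(call_service("http://127.0.0.1:%d/" % server.server_address[1], [1, 2, 3]))
    server.shutdown()
    server.server_close()
```

The same client / server split applies to the other options listed; what changes is the message format and the toolkit that generates the client and server stubs.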
Client-side software / technologies
• Client-side tools / RIA Platforms
• Web development – e.g.
• HTML, JavaScript, AJAX, …
• JavaScript Libraries – e.g.
• Prototype, jQuery, Yahoo
• Dojo, Script.aculo.us, …
• RIA platforms – e.g.
• Adobe Flex
• Sun JavaFX
• Microsoft Silverlight
• …
Examples –
The MESSAGE Project
Dynamic Application Deployment
The MESSAGE Project
• Mobile Environmental Sensing System Across a Grid Environment
• 3 year project starting October 2006
• Funded jointly by EPSRC and DfT (~£4m), under
EPSRC’s e-Science demonstration programme
• 5 Universities, 19 industrial partners
• Pioneering combination and extension of leading
edge grid, sensor, communication and positioning
technologies
• Create radically new sensing infrastructure based
on combination of ad-hoc mobile and fixed sensors
• www.message-project.org
MESSAGE Objectives

• To extend existing e-Science, sensor, communication and modelling
technologies to enable the integration of data from heterogeneous fixed and
mobile environmental sensor grids in real time to provide dynamic estimates
of pollutant and hazard concentrations.
• To demonstrate how these data can be usefully correlated with a wide range of
other complementary dynamic data on, for example, weather conditions,
transport network performance, vehicle mix and performance, driver behaviour,
travel demand, pollutant exposure and health outcomes.
• To implement relevant e-Science tool sets and (fixed and mobile) sensor and
communication systems in a number of selected real-world case study
applications, involving close collaboration with business and the public sector,
and thereby to demonstrate their value to the research and policy community.
Architecture Overview
Three Layer Architecture
• Application Layer
• Realtime Data Layer
• Sensor Layer
MESSAGE Project – Data Capture
Data Capture Platform
Reliable, efficient capture of data from an environment with an unreliable
communications infrastructure and varying load.
• Different types of sensors, different pre-processing requirements
• Different communications technologies
• Real time streaming and intermittent burst
[Diagram labels: Multiple sensor and communications technologies; Scalable Cloud-based processing infrastructure; Multiple DBs distributed across several sites]
Processing data from sensors
• Sensors join and leave the network stochastically
• Joining sensors need to know where to send their data – this
information is provided by the Root Gateway
• Difficult to know how many sensors are active at any time
• Scalable infrastructure = more flexibility, less waste
[Diagram: Sensors querying the Root Gateway]
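The role of the Root Gateway described above can be sketched as a small registry that tells each joining sensor where to send its data. This is a hypothetical illustration, not the MESSAGE implementation: the class and method names, the gateway identifiers and the least-loaded assignment policy are all assumptions.

```python
class RootGateway:
    """Hypothetical registry that assigns joining sensors to a Sensor Gateway."""

    def __init__(self, gateway_urls):
        if not gateway_urls:
            raise ValueError("at least one Sensor Gateway is required")
        self._sensors = {url: set() for url in gateway_urls}
        self._assignment = {}

    def join(self, sensor_id):
        """Return the gateway a sensor should send its data to."""
        if sensor_id in self._assignment:
            return self._assignment[sensor_id]
        # Assumed policy: pick the gateway currently serving the fewest sensors.
        url = min(self._sensors, key=lambda u: len(self._sensors[u]))
        self._sensors[url].add(sensor_id)
        self._assignment[sensor_id] = url
        return url

    def leave(self, sensor_id):
        """Forget a sensor that has left the network."""
        url = self._assignment.pop(sensor_id, None)
        if url is not None:
            self._sensors[url].discard(sensor_id)

    def active_sensors(self):
        return len(self._assignment)


if __name__ == "__main__":
    root = RootGateway(["gateway-a", "gateway-b"])
    for sensor in ("s1", "s2", "s3"):
        print(sensor, "->", root.join(sensor))
```

Because the registry knows how many sensors are active at any moment, it is also a natural place to decide when more or fewer gateway resources are needed.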
MESSAGE Project – Cloud Computing
• Using Amazon EC2 (http://aws.amazon.com/ec2) to provide scalable
computing infrastructure for MESSAGE
• An Amazon Machine Image (AMI) has been prepared for the Sensor
Gateway software
• Sensor Gateway AMI is stored in the Amazon S3 Simple Storage
Service
• Resources based on this image can be started on-demand
• Paid for on a CPU-hour basis
MESSAGE Project – Cloud Computing
• Minimal Linux distribution to reduce image size and provide faster
start up
• Image contains only necessary software to run Sensor Gateway:
• Java, Glassfish Application Server, Sensor Gateway Web Service
• Start up scripts start application server and Sensor Gateway service
when image boots up
• Root Gateway Service uses an embedded client to start / stop
Sensor Gateway instances as required
• Pre-processing may be carried out by Sensor Gateway nodes, data
then sent on to database for storage
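The start / stop decision made by the Root Gateway can be sketched as a simple sizing rule. This is an assumed policy for illustration only: the capacity of 50 sensors per gateway, the function names and the minimum pool size are hypothetical, and the actual calls to the Cloud provider are deliberately left out.

```python
import math


def desired_gateways(active_sensors, sensors_per_gateway=50, min_gateways=1):
    """Number of Sensor Gateway instances needed for the current sensor count."""
    if active_sensors < 0 or sensors_per_gateway <= 0 or min_gateways < 0:
        raise ValueError("invalid sizing parameters")
    return max(min_gateways, math.ceil(active_sensors / sensors_per_gateway))


def scaling_action(running, desired):
    """Return ("start", n), ("stop", n) or ("none", 0) for the instance pool."""
    if desired > running:
        return ("start", desired - running)
    if desired < running:
        return ("stop", running - desired)
    return ("none", 0)


if __name__ == "__main__":
    # Hypothetical snapshot: 120 active sensors, one gateway currently running.
    print(scaling_action(running=1, desired=desired_gateways(120)))
```

In practice some damping would be needed so that instances paid for by the hour are not started and stopped on every small fluctuation in the number of sensors.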
MESSAGE Project – Cloud Computing
[Diagram labels: Sensors; Scalable Sensor Gateway Pool (Cloud Computing Resources); Data Storage; Visualisation/Application Platform]
Dynamic application deployment
Dynamic application deployment
• Have varying application requirements
• Avoid preparing separate Cloud resources for each application
• Use Cloud resources with a generic configuration
• Use a deployment service to move application executables into
execution environment as required, at runtime
• Well suited to HPC, batch type applications that need to be run
occasionally
• Potential for automating workflow execution on Cloud resources
Dynamic application deployment
• JSDL Job description sent to GridSAM service on
execution resource
• Application and input files staged onto execution resource for
execution
[Diagram: Application 1 and Application 2, each with (Executable, Libraries), Input Data and a JSDL Job Description; Service Interface; GridSAM Job Submission and Monitoring Service using local fork launcher; Cloud Computing Resource]
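A job description of the kind sent to the job submission service can be sketched by building a minimal JSDL 1.0 document. This is a generic example under stated assumptions: the function name, the executable path and the input URL are hypothetical, and which JSDL elements a particular GridSAM deployment requires or supports is not verified here.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the JSDL 1.0 specification.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(executable, arguments, input_uri, input_name):
    """Return a minimal JSDL document: one executable plus one staged input file."""
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-posix", POSIX_NS)

    job = ET.Element("{%s}JobDefinition" % JSDL_NS)
    description = ET.SubElement(job, "{%s}JobDescription" % JSDL_NS)

    # What to run on the execution resource.
    application = ET.SubElement(description, "{%s}Application" % JSDL_NS)
    posix = ET.SubElement(application, "{%s}POSIXApplication" % POSIX_NS)
    ET.SubElement(posix, "{%s}Executable" % POSIX_NS).text = executable
    for argument in arguments:
        ET.SubElement(posix, "{%s}Argument" % POSIX_NS).text = argument

    # Input file to stage onto the execution resource before the run.
    staging = ET.SubElement(description, "{%s}DataStaging" % JSDL_NS)
    ET.SubElement(staging, "{%s}FileName" % JSDL_NS).text = input_name
    ET.SubElement(staging, "{%s}CreationFlag" % JSDL_NS).text = "overwrite"
    source = ET.SubElement(staging, "{%s}Source" % JSDL_NS)
    ET.SubElement(source, "{%s}URI" % JSDL_NS).text = input_uri

    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical application and input location.
    print(build_jsdl("./simulate", ["input.dat"],
                     "http://example.org/data/input.dat", "input.dat"))
```

The executable and its libraries could be staged in the same way as the input file, by adding further DataStaging elements.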
Conclusions
• Many different considerations when moving applications to a Cloud
environment
• Not necessarily suited to all apps but new models/services emerging
THANK YOU!
jeremy.cohen@imperial.ac.uk