Data-Intensive Research Workshop
Soaring through clouds with Meandre
Xavier Llorà and Bernie Ács
xllora@illinois.edu
bernie@ncsa.illinois.edu
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Part 1: Cloud Overview & Introduction
• Basic Cloud Concepts
• An Ideological Metaphor & Definition
• Example: TechNet Virtual Labs
• Cloud Classification Types
• Public, Private, & Hybrid Deployments
• Cloud Computing Models
• Infrastructure aaS, Platform aaS, & Software aaS
• NCSA Virtual Machines & Enterprise Cloud
• VMware, Xen, & Eucalyptus
• ElasticFox & AMS Web Application
• NCSA Cloud Conduits
• Cloud Computing & Programming Paradigms
Imaginations unbound
An Ideological Metaphor & Definition
• Cloud Metaphor
• The term cloud is used as a metaphor for
the Internet, based on how it is depicted in
computer network diagrams and is an
abstraction for the complex infrastructure
it conceals
• Cloud Computing – Definition
• The first academic use of this term appears to define it as a computing
paradigm where the boundaries of computing will be determined
by economic rationale rather than technical limits.
• Cloud computing is a paradigm of computing in which dynamically
scalable and often virtualized resources are provided as a service over
the Internet. Users need not have knowledge of, expertise in, or control
over the technology infrastructure in the "cloud" that supports them
http://en.wikipedia.org/wiki/Cloud_computing
An Example: TechNet Virtual Labs
http://www.microsoft.com/events/vlabs/defaults.aspx
Step 1: Build the Lab
Step 2: Lab is Ready
Step 3: Controlling the Lab Machines
Step 4: Interacting with Virtual Machines
The Tutorial Session Can Be Freely Used
Cloud Classification Types
• Public cloud or external cloud describes cloud
computing in the traditional mainstream sense, whereby
resources are dynamically provisioned on a fine-grained,
self-service basis over the Internet, via web
applications/web services, from an off-site third-party
provider who shares resources and bills on a fine-grained utility computing basis
• Private cloud and internal cloud are neologisms that
describe configurations that emulate (public) cloud
computing on private networks
• Hybrid cloud consists of multiple internal and/or
external cloud deployments
http://en.wikipedia.org/wiki/Cloud_Computing
Cloud Computing Models
• Infrastructure as a Service (IaaS)
• the delivery of computer infrastructure (typically a platform
virtualization environment) as a service
• Rather than purchasing servers, software, data center space
or network equipment, clients instead buy those resources as
a fully outsourced service.
• The service is typically billed on a utility computing basis, and the
amount of resources consumed (and therefore the cost) will
typically reflect the level of activity.
• Supersedes the term Hardware as a Service (HaaS)
• It is an evolution of web hosting and virtual private server
offerings.
• Example: Amazon EC2/S3 services
http://en.wikipedia.org/wiki/Infrastructure_as_a_service
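The utility billing model described above can be illustrated with a small worked example: cost is metered usage times a rate, so a busier month costs more. This is a minimal Python sketch; the `Usage` record, instance sizes, and hourly rates are hypothetical placeholders, not Amazon EC2 prices or APIs.

```python
from dataclasses import dataclass

@dataclass
class Usage:
    """One metered resource (all values here are hypothetical)."""
    instance_type: str
    hours: float
    rate_per_hour: float  # dollars per instance-hour, illustrative only

def utility_bill(usages):
    """Cost tracks consumption: the sum of hours x rate, rounded to cents."""
    return round(sum(u.hours * u.rate_per_hour for u in usages), 2)

# A quiet month versus a busy month on the same hypothetical price list.
quiet = [Usage("small", 100, 0.10)]
busy = [Usage("small", 720, 0.10), Usage("large", 200, 0.40)]
print(utility_bill(quiet))  # 10.0
print(utility_bill(busy))   # 152.0
```

Nothing is bought up front; the bill is derived entirely from the usage records, which is the contrast with purchasing servers that the slide draws.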
Cloud Computing Models
• Platform as a Service (PaaS)
• delivery of a computing platform and solution stack as a service
• It facilitates deployment of applications without the cost and
complexity of buying and managing the underlying hardware
and software layers, providing all of the facilities required to
support the complete life cycle of building and delivering web
applications and services entirely available from the Internet
—with no software downloads or installation for developers,
IT managers or end-users
• Open Platform as a Service (OPaaS)
• another step in the Application Service Provider, SaaS, PaaS
evolution
• Example: Microsoft TechNet VLabs
http://en.wikipedia.org/wiki/Platform_as_a_service
Cloud Computing Models
• Software as a Service (SaaS)
• A model of software deployment whereby a provider licenses
an application to customers for use as a service on demand
• Vendors may host the application on their own web servers or
download the application to the consumer device, disabling it
after use or after the on-demand contract expires
• Examples:
• Google Apps (Maps, Docs, and Others)
• Adobe (Connect & Buzzword)
• Microsoft (Office Live Workspace)
http://en.wikipedia.org/wiki/Software_as_a_service
NCSA Virtual Machines & Enterprise Cloud
NCSA Uses Virtual Machine Technologies
• Virtual machine technology to support projects &
services using VMware, XenServer, & Others
• An Example Case: ICLCS & WebMO
• Institute for Chemistry Literacy Through Computational Science
(http://Iclcs.uiuc.edu/workshops & http://www.webmo.net/)
[Figure: ICLCS/WebMO deployment with Internet users, active and passive load-balancer (LB) nodes, multiple worker nodes, a shared network file system, and a centralized relational database.]
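The deployment in this figure places load-balancer nodes in front of a pool of worker nodes. A minimal Python sketch of how such a pair could dispatch requests, assuming round-robin scheduling and simple active-to-passive failover; the slide does not state which policy ICLCS/WebMO uses, and `LoadBalancerPair` is a hypothetical name.

```python
import itertools

class LoadBalancerPair:
    """Hypothetical active/passive load-balancer pair over a worker pool.

    Requests go round-robin across the workers (an assumed policy). The
    active node handles traffic; if it is marked failed, the passive node
    takes over with the same worker list and rotation.
    """

    def __init__(self, workers):
        self.workers = list(workers)
        if not self.workers:
            raise ValueError("at least one worker node is required")
        self.healthy = {"active": True, "passive": True}
        self._rotation = itertools.cycle(self.workers)

    def fail(self, node):
        self.healthy[node] = False

    def route(self, request):
        # Prefer the active node; fall back to the passive one.
        for node in ("active", "passive"):
            if self.healthy[node]:
                return node, next(self._rotation), request
        raise RuntimeError("no healthy load-balancer node")
```

Because the workers share the file system and database, any worker can serve any request, which is what makes a stateless rotation like this plausible.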
NCSA Enterprise Cloud
• Virtual Machine Infrastructure Expansion
• Dedicated Resources
• 176 cores / 18 machines with 50 TB storage and 40 Gb InfiniBand (IB)
• Dedicated Switches, Network services for VM & Cloud.
• Eucalyptus installation base
• “Amazon at home”
• EC2/S3/EBS
• Potential future support for
• dynamic load-balanced services & load-based procurement
• High degree of variability possible in configurations
• Account-based virtual private enterprise
• Elastic IP, Elastic Block Storage, & Elastic Computing
• Empowers users rather than constraining them
• Cloud mechanics require a steep learning curve
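One simple form of load-based procurement is to size the instance pool from the current backlog. A hedged Python sketch, assuming a hypothetical jobs-per-instance capacity and pool bounds; this is not a Eucalyptus feature or API.

```python
import math

def instances_needed(pending_jobs, jobs_per_instance,
                     min_instances=1, max_instances=10):
    """Hypothetical load-based rule: enough instances for the pending load,
    clamped to [min_instances, max_instances]."""
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    wanted = math.ceil(pending_jobs / jobs_per_instance)
    return max(min_instances, min(max_instances, wanted))

def scaling_action(current, target):
    """Translate a target pool size into a launch/terminate/hold decision."""
    if target > current:
        return f"launch {target - current}"
    if target < current:
        return f"terminate {current - target}"
    return "hold"
```

For example, nine pending jobs at four jobs per instance call for three instances, so a pool of one would `launch 2`.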
NCSA Enterprise Cloud User Tools
• Command Line Tools
• Amazon Web Services API compatible tools (euca-*)
• Customizations and Refinements
• ElasticFox (Version 1.6)
• Firefox plugin works well; it has required modification, and more remains to be done.
• ElasticFox screens: List, Launch, & Manage Images; Enterprise Security Rules; SSH Key-Pair Management; Allocate, Assign, & Associate Elastic IP; Allocate, Assign, & Associate Elastic Block Storage
• AWS Manager
• Statically deployed Web Application
NCSA Enterprise Cloud Conduits
• Private Cloud to Grid Conduit
• Dynamically Scalable Web Front-end & Middleware Layers
• Next Generation WebMO “Science Gateway”
• Batch Queue Proxy Integration, Metering, and Monitoring
• Private Cloud to Private Cloud Conduit
• Exploring Transparent Integration with Remote Sites
• UIUC Computer Science Hadoop Cluster
• Dynamic Integration with other Eucalyptus Sites
• Private Cloud to Public Cloud Conduit
• Exploring Transparent Integration with Amazon EC2 Service
• Roles of Virtual Private Network Services
• Dynamic Scalability and Data Localities
Part 2: Cloud Programming Paradigm
• How are Software Architecture and Design Impacted by
Virtual Machines & Cloud technologies?
• Natural Match for Multi-tier applications
• To best leverage cloud technology, applications need to be more
modular and less monolithic
• Service-oriented architectures can benefit from JeOS (Just
Enough Operating System) platforms and can easily be
configured to scale dynamically
• Meandre: Overview & Introduction
• Agile Infrastructure for Data-Intensive Applications
• Semantic-Oriented, Component-Based Architecture
• Data-Driven Execution Paradigm
• SEASR Application Examples
MONK Project – GSLIS
The SEASR project and its Meandre infrastructure
are sponsored by The Andrew W. Mellon Foundation
Feature Lens Blow up
Date Entities to Simile Timeline
Analyzing CSPAN Archives
NEMA – Son of Blinkie - GSLIS
NESTER – GSLIS
NESTER - Birdie Audio – GSLIS
Evolution Highway – IGB
Fedora Commons Repository
[Figure: Components & Flows; Interactive Web Application; Web Service]
Twitter For Research
Data-intensive Computing for the Cloud
• Meandre
• Integrates within existing applications
• May be a free-standing service
• Capitalizes on elasticity
• Provides complex data computing as a service
• Collocates computation and data
• Natively accesses data in the cloud:
• Hadoop Distributed File System (HDFS)
• Document stores
• Key-value stores
• Relational stores
Meandre: The Dataflow Component
• Data dictates component execution semantics
[Figure: a component with inputs and outputs, paired with a descriptor in RDF of its behavior and with the component implementation.]
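"Data dictates component execution semantics" means a component runs when data arrives on its inputs rather than when something calls it. The Python sketch below assumes one simple firing rule (fire once every input port holds a datum, consuming one datum per port); the `Component` class is hypothetical and is not Meandre's Java component API.

```python
from collections import deque

class Component:
    """Hypothetical data-driven component (not Meandre's Java API).

    It fires only when every input port holds at least one datum and
    consumes exactly one datum per port on each firing. A component with
    no input ports is always ready, which is what lets it start a flow.
    """

    def __init__(self, name, inputs, fn):
        self.name = name
        self.queues = {port: deque() for port in inputs}
        self.fn = fn  # maps {input_port: value} -> {output_port: value}

    def push(self, port, value):
        self.queues[port].append(value)

    def ready(self):
        # all() over zero ports is True, so source components are always ready.
        return all(self.queues.values())

    def fire(self):
        if not self.ready():
            return None  # data, not the caller, decides when execution happens
        args = {port: q.popleft() for port, q in self.queues.items()}
        return self.fn(args)
```

A two-input component stays idle after receiving only one value and fires as soon as the second arrives.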
Meandre: Flow (Complex Tasks)
• A flow is a collection of connected components
[Figure: dataflow execution of a flow connecting the components Read, Get, Merge, Do, and Show.]
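A flow can be executed by repeatedly delivering data along connectors until nothing is pending. A minimal Python sketch, assuming a hypothetical wiring in which Read and Get both feed Merge (treated here as interleaving two streams), Merge feeds Do, and Do feeds Show; the figure does not fix these connections, and `run_flow` is not part of Meandre.

```python
from collections import deque

def run_flow(sources, edges, fns):
    """Deliver data along connectors until no component has pending input.

    sources: {name: [values]}  values emitted by components with no inputs
    edges:   {name: [downstream component names]}  the connectors
    fns:     {name: callable(value) -> value}      non-source components
    Returns a trace of (component, output) pairs in execution order.
    """
    pending = deque()
    for name, values in sources.items():
        for value in values:
            for dst in edges.get(name, []):
                pending.append((dst, value))
    trace = []
    while pending:
        name, value = pending.popleft()
        result = fns[name](value)  # a component runs once per datum received
        trace.append((name, result))
        for dst in edges.get(name, []):
            pending.append((dst, result))
    return trace
```

With Read emitting "a" and "b", Get emitting "c", and Do upper-casing its input, Show receives "A", "B", "C". This sketch assumes an acyclic flow; a cyclic connector would keep it running indefinitely.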
Meandre Connectors
• Flows are made up of one or more components with zero or more ("None to Many") connectors, which are described to the Meandre Server for management
• Flows may contain connectors that are cyclical over one or more components
• Flows must contain at least one component with no inputs to cause an execute call to be made
• Outputs are always optional
• Flow components may have multiple connectors assigned to any input data port
• Flows can have any number of components with zero to many input data ports and zero to many output data ports
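The connector rules on this slide can be turned into a simple check. A Python sketch over a hypothetical flow description (a dict of components with port lists plus a list of connector tuples); this is not the RDF description the Meandre Server actually consumes, and `validate_flow` is an illustrative name.

```python
def validate_flow(components, connectors):
    """Check a hypothetical flow description against the connector rules.

    components: {name: {"inputs": [...], "outputs": [...]}}
    connectors: list of (src_component, src_port, dst_component, dst_port)
    Returns a list of problems (empty if the flow looks valid).
    """
    problems = []
    if not components:
        problems.append("a flow needs one or more components")
    elif not any(not c["inputs"] for c in components.values()):
        problems.append("no component without inputs: nothing triggers execution")
    for src, out_port, dst, in_port in connectors:
        if src not in components or out_port not in components[src]["outputs"]:
            problems.append(f"unknown output {src}.{out_port}")
        if dst not in components or in_port not in components[dst]["inputs"]:
            problems.append(f"unknown input {dst}.{in_port}")
    # Deliberately not flagged: cycles, several connectors into one input
    # port, and unconnected outputs (outputs are always optional).
    return problems
```

The push, pass, print flow from the ZigZag example would pass this check, because push has no inputs.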
Meandre: ZigZag Script Language
• Automatic Parallelization
• Adding the operator [+4] would result in a directed graph in which the pass component is replicated four times
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4]
print( object:pt.string )
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4!]
print( object:pt.string )
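As a rough analogue of what the [+4] operator asks for, the Python sketch below runs four copies of a stateless component over a stream while keeping results in input order. It swaps in a standard-library thread pool purely for illustration; it is not how Meandre implements replication, and `pass_component` and `run_replicated` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def pass_component(s):
    """Stand-in for the ZigZag `pass` component: forwards its input unchanged."""
    return s

def run_replicated(fn, inputs, replicas=4):
    """Run `replicas` copies of a stateless component over a stream of inputs.

    Executor.map spreads items over the worker threads and yields results
    in input order, so a downstream consumer sees the original ordering.
    """
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        return list(pool.map(fn, inputs))
```

Replication like this is only safe for components that keep no state between firings, which is why `pass` is a natural candidate.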
Scaling Genetic Algorithms with Meandre
Intel 2.8 GHz quad-core, 4 GB RAM. Average of 20 runs.
And Beyond with Hadoop
60 dual quad-core Xeons with 8 GB RAM, Gigabit Ethernet
[Figure annotation: resource exhaustion]
Are Components Black-Box Wrappers?
• Programming components is multilingual
• Natively supported: Java, Scala, Python, and Clojure
• Easily wrapped: R, C, and C++
• Components can also interact with the OS
• Leverage OS tools
• Orchestrate other programs
• The question:
• Can Meandre help orchestrate and facilitate interaction and
cooperation between cloud and grid assets?
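Wrapping an external program can be as simple as piping data through a subprocess. A minimal Python sketch of such a wrapper, assuming the tool reads standard input and writes standard output; `run_external` is a hypothetical helper, and the demo uses the Python interpreter as a portable stand-in for an R, C, or C++ program.

```python
import subprocess
import sys

def run_external(args, stdin_text=""):
    """Hypothetical wrapper component: feed text to an external program and
    return its standard output, raising CalledProcessError if it fails."""
    completed = subprocess.run(
        args, input=stdin_text, capture_output=True, text=True, check=True
    )
    return completed.stdout

if __name__ == "__main__":
    # The current interpreter stands in for an external R, C, or C++ tool.
    script = "import sys; print(sys.stdin.read().upper())"
    print(run_external([sys.executable, "-c", script], "meandre"), end="")
```

Passing the command as an argument list (no shell) avoids quoting problems when the wrapped program's arguments come from upstream data.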
Meandre Components for Amazon & Eucalyptus
Cloud Conduits to the Grid
• Cloud mechanics have a steep learning curve
• Can Meandre help simplify the process?
• Orchestrating clouds with Meandre
• Amazon/Eucalyptus model
• Components can be created to:
• List images
• List instances
• Launch instances
• Allocate Elastic IP and Elastic Block Storage
• Transfer Data or Programs to running instances
• Trigger process computation
• Monitor processes and/or executing persistent services
• Terminate instances
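The component steps listed above chain naturally into a flow. The Python sketch below runs list, launch, transfer, compute, and terminate against an in-memory `FakeCloud`; that class and all its method names are hypothetical stand-ins, not the Amazon, Eucalyptus, or euca-* interfaces, and Elastic IP, Elastic Block Storage, and monitoring are omitted for brevity.

```python
import itertools

class FakeCloud:
    """Hypothetical in-memory stand-in for an EC2/Eucalyptus-style service."""

    def __init__(self, images):
        self.images = list(images)
        self.instances = {}
        self._ids = itertools.count(1)

    def list_images(self):
        return list(self.images)

    def launch(self, image):
        if image not in self.images:
            raise ValueError(f"unknown image: {image}")
        iid = f"i-{next(self._ids):04d}"
        self.instances[iid] = {"image": image, "state": "running", "files": {}}
        return iid

    def transfer(self, iid, name, data):
        # "Transfer data or programs to running instances"
        self.instances[iid]["files"][name] = data

    def run(self, iid, name, compute):
        # "Trigger process computation" on a file already on the instance
        return compute(self.instances[iid]["files"][name])

    def terminate(self, iid):
        self.instances[iid]["state"] = "terminated"


def orchestrate(cloud, image, payload, compute):
    """Chain the steps: list images, launch, transfer, compute, terminate."""
    if image not in cloud.list_images():
        raise ValueError(f"image not available: {image}")
    iid = cloud.launch(image)
    try:
        cloud.transfer(iid, "input.txt", payload)
        return cloud.run(iid, "input.txt", compute)
    finally:
        # Always release the instance, even if the computation fails.
        cloud.terminate(iid)
```

For example, counting words in "a b c" on a hypothetical image returns 3 and leaves the instance marked terminated; the `finally` clause mirrors the importance of not leaking paid instances.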
Meandre Cloud Orchestration Data Flow
Conclusions
• Next-generation data-intensive applications will:
• Use cloud computing technologies and conduits
• Require adaptation of programming paradigms
• Leverage a flexible, modular architecture
• Promote processing and resources at scale
• Meandre
• Data-intensive execution engine
• Component-based programming architecture
• Distributed data flow designs to allow processing to be colocated with data sources and enable transparent scalability
• Orchestrate cloud deployments
• Leverage cloud conduits