Data-Intensive Research Workshop
Soaring through clouds with Meandre
Xavier Llorà and Bernie Ács
xllora@illinois.edu bernie@ncsa.illinois.edu
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

Part 1: Cloud Overview & Introduction
• Basic Cloud Concepts
  • An Ideological Metaphor & Definition
  • Example: TechNet Virtual Labs
• Cloud Classification Types
  • Public, Private, & Hybrid Deployments
• Cloud Computing Models
  • Infrastructure aaS, Platform aaS, & Software aaS
• NCSA Virtual Machines & Enterprise Cloud
  • VMware, Xen, & Eucalyptus
  • ElasticFox & AWS Web Application
• NCSA Cloud Conduits
• Cloud Computing & Programming Paradigms
Imaginations unbound

An Ideological Metaphor & Definition
• Cloud Metaphor
  • The term "cloud" is used as a metaphor for the Internet, based on how it is depicted in computer network diagrams, and is an abstraction for the complex infrastructure it conceals.
• Cloud Computing – Definition
  • The first academic use of the term appears to define it as a computing paradigm in which the boundaries of computing are determined by economic rationale rather than technical limits.
  • Cloud computing is a paradigm of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
    Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
http://en.wikipedia.org/wiki/Cloud_computing

An Example: TechNet Virtual Labs
http://www.microsoft.com/events/vlabs/defaults.aspx
Step 1: Builds Lab
Step 2: Lab is Ready
Step 3: Controlling with Lab Machines
Step 4: Interacting with Virtual Machines
The Tutorial Session Can Be Freely Used

Cloud Classification Types
• Public cloud, or external cloud, describes cloud computing in the traditional mainstream sense: resources are dynamically provisioned on a fine-grained, self-service basis over the Internet, via web applications or web services, from an off-site third-party provider who shares resources and bills on a fine-grained utility computing basis.
• Private cloud and internal cloud are neologisms that describe configurations that emulate (public) cloud computing on private networks.
• Hybrid cloud consists of multiple internal and/or external cloud deployments.
http://en.wikipedia.org/wiki/Cloud_Computing

Cloud Computing Models
• Infrastructure as a Service (IaaS)
  • The delivery of computer infrastructure (typically a platform virtualization environment) as a service.
  • Rather than purchasing servers, software, data center space, or network equipment, clients buy those resources as a fully outsourced service.
  • The service is typically billed on a utility computing basis, so the amount of resources consumed (and therefore the cost) typically reflects the level of activity.
  • Supersedes the term Hardware as a Service (HaaS).
  • It is an evolution of web hosting and virtual private server offerings.
  • Example: Amazon EC2/S3 services
http://en.wikipedia.org/wiki/Infrastructure_as_a_service

Cloud Computing Models
• Platform as a Service (PaaS)
  • The delivery of a computing platform and solution stack as a service.
  • It facilitates deployment of applications without the cost and complexity of buying and managing the underlying hardware and software layers. It provides all of the facilities required to support the complete life cycle of building and delivering web applications and services, entirely available from the Internet, with no software downloads or installation for developers, IT managers, or end users.
• Open Platform as a Service (OPaaS)
  • Another step in the Application Service Provider, SaaS, PaaS evolution.
• Example: Microsoft TechNet VLabs
http://en.wikipedia.org/wiki/Platform_as_a_service

Cloud Computing Models
• Software as a Service (SaaS)
  • A model of software deployment whereby a provider licenses an application to customers for use as a service on demand.
  • Vendors may host the application on their own web servers, or download the application to the consumer device and disable it after use or after the on-demand contract expires.
• Examples:
  • Google Apps (Maps, Docs, and others)
  • Adobe (Connect & Buzzword)
  • Microsoft (Office Live Workspace)
http://en.wikipedia.org/wiki/Platform_as_a_service

NCSA Virtual Machines & Enterprise Cloud

NCSA Uses Virtual Machine Technologies
• Virtual machine technology to support projects & services using VMware, XenServer, & others
• An Example Case: ICLCS & WebMO
  • Institute for Chemistry Literacy Through Computational Science (http://Iclcs.uiuc.edu/workshops & http://www.webmo.net/)
[Diagram labels: Internet Users; Active LB Node; Passive LB Node; Worker Nodes; Shared Network File System; Centralized Relational Database]

NCSA Enterprise
Cloud
• Virtual Machine Infrastructure Expansion
• Dedicated Resources
• 176 cores across 18 machines, with 50 TB of storage and 40 Gb InfiniBand
• Dedicated switches and network services for VM & Cloud
• Eucalyptus installation base
• “Amazon at home”
• EC2/S3/EBS
• Potential future support for
• dynamic load-balanced services & load-based procurement
• High degree of variability possible in configurations
• Account based virtual private enterprise
• Elastic IP, Elastic Block Storage, & Elastic Computing
• Empowers users versus Constrains users
• Cloud mechanics require a steep learning curve

NCSA Enterprise Cloud User Tools
• Command Line Tools
  • Amazon Web Services API compatible tools (euca-*)
  • Customizations and Refinements
• ElasticFox (Version 1.6)
  • Firefox plugin works well; it has required modification, with more to do.
List, Launch, & Manage Images
Enterprise Security Rules
SSH Key-Pair Management
Allocate, Assign, & Associate Elastic IP
Allocate, Assign, & Associate Elastic Block Storage

NCSA Enterprise Cloud User Tools
• Command Line Tools
  • Amazon Web Services API compatible tools (euca-*)
  • Customizations and Refinements
• AWS Manager
  • Statically deployed Web-Application

NCSA Enterprise Cloud Conduits
• Private Cloud to Grid Conduit
  • Dynamically Scalable Web Front-end & Middleware Layers
  • Next Generation WebMO “Science Gateway”
  • Batch Queue Proxy Integration, Metering, and Monitoring
• Private Cloud to Private Cloud Conduit
  • Exploring Transparent Integration with Remote Sites
  • UIUC Computer Science Hadoop Cluster
  • Dynamic Integration with other Eucalyptus Sites
• Private Cloud to Public Cloud Conduit
  • Exploring Transparent Integration with Amazon EC2 Service
  • Roles of Virtual Private Network Services
  • Dynamic Scalability and Data Localities

Part 2: Cloud Programming Paradigm
• How are Software Architecture and Design Impacted by Virtual Machines & Cloud technologies?
  • Natural Match for Multi-tier applications
  • To best leverage cloud technology, applications need to be more modular and less monolithic
  • Service-oriented architecture can benefit from JeOS (Just Enough Operating System) platforms and can be easily configured to scale dynamically
• Meandre: Overview & Introduction
  • Agile Infrastructure for Data Intensive Applications
  • Semantic Orientated Component Based Architecture
  • Data Driven Execution Paradigm
  • SEASR Application Examples

MONK Project – GSLIS
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation
Feature Lens Blow up
Date Entities to Simile Timeline
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W.
Mellon Foundation
Analyzing CSPAN Archives
NEMA – Son of Blinkie – GSLIS
NESTER – GSLIS
NESTER – Birdie Audio – GSLIS
Evolution Highway – IGB
Fedora Commons Repository Components & Flows
[Diagram labels: Interactive Web Application; Web Service]
Twitter For Research

Data-intensive Computing for the Cloud
• Meandre
  • Integrates within Existing Applications
  • May be a Free Standing Service
  • Capitalize on elasticity
  • Provide complex data computing as a service
  • Co-locating computation and data
  • Natively access data in the cloud
    • Hadoop Distributed File System (HDFS)
    • Document stores
    • Key-value stores
    • Relational stores

Meandre: The Dataflow Component
• Data dictates component execution semantics
[Diagram labels: Inputs; Outputs; Component; P; Descriptor in RDF of its behavior; The component implementation]

Meandre: Flow (Complex Tasks)
• A flow is a collection of connected components
[Diagram labels: Read; Merge; Show; Get; Do; P; Dataflow execution]
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W.
Mellon Foundation

Meandre Connectors
• Flows are made up of “One or More” components with “None to Many” connectors that are described to the Meandre Server for management
• Flows may contain connectors that are cyclical over one or more components
• Flows must contain at minimum one component with NO Inputs to cause an Execute call to be made. *Outputs are Always Optional.
• Flow components may have multiple connectors assigned to any input data port
• Flows can have any number of components with “None to Many” Input data ports and “None to Many” Output data ports

Meandre: ZigZag Script Language
• Automatic Parallelization
• Adding the operator [+4] would result in a directed graph

    # Describes the data-intensive flow
    #
    @pu = push()
    @pt = pass( string:pu.string ) [+4]
    print( object:pt.string )

    # Describes the data-intensive flow
    #
    @pu = push()
    @pt = pass( string:pu.string ) [+4!]
    print( object:pt.string )

Scaling Genetic Algorithms with Meandre
Intel 2.8 GHz quad-core, 4 GB RAM. Average of 20 runs.

And Beyond with Hadoop
60 dual quad-core Xeons with 8 GB RAM. Gigabit Ethernet.
Resources exhaustion

Are Components Black-Box Wrappers?
• Programming Components is multilingual
  • Natively support: Java, Scala, Python, and Clojure
  • Easily Wrap: R, C, and C++
• Components can also interact with the OS
  • Leverage OS tools
  • Orchestrate other programs
• The question:
  • Can Meandre help orchestrate and facilitate interaction and cooperation between cloud and grid assets?

Meandre Components for Amazon & Eucalyptus

Cloud Conduits to the Grid
• Cloud mechanics have a steep learning curve
  • Can Meandre help simplify the process?
• Orchestrating clouds with Meandre
  • Amazon/Eucalyptus model
  • Components can be created to:
    • List images
    • List instances
    • Launch instances
    • Allocate Elastic IP and Elastic Block Storage
    • Transfer Data or Programs to running instances
    • Trigger process computation
    • Monitor processes and/or executing persistent services
    • Terminate instances

Meandre Cloud Orchestration Data Flow

Conclusions
• Next generation data-intensive applications will:
  • Use cloud computing technologies and conduits
  • Require adaptation of programming paradigms
  • Leverage a flexible and modular architecture
  • Promote processing and resources at scale
• Meandre
  • Data-intensive execution engine
  • Component-based programming architecture
  • Distributed data flow designs to allow processing to be co-located with data sources and enable transparent scalability
  • Orchestrate cloud deployments
  • Leverage cloud conduits
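
Sketch: cloud orchestration as a data-driven flow
The orchestration steps listed in this deck (list images, launch an instance, trigger a computation, terminate) can be sketched as a small data-driven flow. This is a minimal illustration in Python, not Meandre's API or the authors' implementation. FakeCloud, Component, run_flow, build_flow, and the image id emi-demo are hypothetical names introduced here, and the in-memory FakeCloud only stands in for an Amazon/Eucalyptus endpoint. The sketch assumes two rules stated in the slides: a component with no inputs starts execution, and data arriving on input ports decides when a component fires.

```python
from collections import deque


class FakeCloud:
    """Hypothetical in-memory stand-in for an EC2/Eucalyptus endpoint (not a real API)."""

    def __init__(self, images):
        self.images = list(images)
        self.instances = {}  # instance id -> state
        self._next_id = 0

    def list_images(self):
        return list(self.images)

    def launch(self, image_id):
        if image_id not in self.images:
            raise ValueError(f"unknown image: {image_id}")
        self._next_id += 1
        instance_id = f"i-{self._next_id:04d}"
        self.instances[instance_id] = "running"
        return instance_id

    def terminate(self, instance_id):
        self.instances[instance_id] = "terminated"


class Component:
    """A dataflow component: it fires once every input port holds a value."""

    def __init__(self, name, inputs, func):
        self.name = name
        self.inputs = tuple(inputs)
        self.func = func


def run_flow(components, connections):
    """Fire components in data-driven order and return the firing order.

    connections maps a source component name to a list of (target name, port).
    Components with no inputs fire first, which is what starts the flow.
    This sketch handles acyclic flows only.
    """
    by_name = {c.name: c for c in components}
    pending = {c.name: {} for c in components}  # values waiting on input ports
    ready = deque(c.name for c in components if not c.inputs)
    if not ready:
        raise ValueError("a flow needs at least one component with no inputs")
    fired = []
    while ready:
        name = ready.popleft()
        output = by_name[name].func(**pending[name])
        pending[name] = {}
        fired.append(name)
        for target, port in connections.get(name, []):
            pending[target][port] = output
            if set(pending[target]) == set(by_name[target].inputs):
                ready.append(target)
    return fired


def build_flow(cloud):
    """Wire four orchestration steps into one flow (hypothetical component names)."""
    components = [
        Component("list_images", [], cloud.list_images),
        Component("launch", ["images"], lambda images: cloud.launch(images[0])),
        Component("compute", ["instance"], lambda instance: (instance, "done")),
        Component("terminate", ["result"], lambda result: cloud.terminate(result[0])),
    ]
    connections = {
        "list_images": [("launch", "images")],
        "launch": [("compute", "instance")],
        "compute": [("terminate", "result")],
    }
    return components, connections


cloud = FakeCloud(["emi-demo"])
order = run_flow(*build_flow(cloud))
print(order)
print(cloud.instances)
```

Running the script prints the firing order followed by the final instance states. Only list_images has no inputs, so it is the component that starts the flow; each later step fires because data arrived on its input port, not because of an explicit call order.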