MOCCA – Component-based Grid Environment for Programming Scientific Applications
Maciej Malawski
Outline
• Problem: programming applications on Grid
• Programming models and virtualization
• CCA + H2O
• Extensions to the environment
• Applications and tests
• Summary and future work
Experience (CrossGrid): Grid is complex
[Figure: CrossGrid architecture. Applications (Flood Simulation, Meteo/Pollution, Particle Physics, Medical Support) together with tools and services such as Performance Prediction, Benchmarks, MPI Verification, Performance Analysis, Postprocessing, OCM-G monitoring, Network and Infrastructure Monitoring (JMX), Roaming Access Server, Scheduler, Data Access, Portal, Migrating Desktop and Visualization Kernel, linked through SOAP, APIs, plugins and OMIS, on top of the MPI library, DataGrid and the Globus Toolkit.]
• Testbed: 17 sites in 9 countries, over 200 CPUs, 4 TB of storage
Problem – how to program grid applications
• Scientific applications:
  – Compute-intensive
  – May be data-intensive
  – Often custom-made
  – Written in many programming languages (e.g. Fortran)
  – Collaborative
• Current practice on Grid:
  – “Write a JDL script which submits a shell script as a batch job, which uses SSH to launch a process on the head node of the cluster to serve as a proxy for communication...” (from CGW'06 presentation by ICM)
  – “Submit a shell script which queries the LFC catalog, retrieves TAR archive from SE using GRIDFTP, unpacks the archive, runs another computing script, stores the output on SE and registers in LFC catalog.” – a biomedical application (CGW'06)
• Problems with scientific computing (IPDPS'05 panel discussion):
  – Software
  – Software
  – Software... engineering
Two key challenges
• Programming model
  – Suitable for the distributed environment
  – Allowing management of complex applications
  – Supported by standards
  – Supporting scientific applications
  – Facilitating programming
• Virtualization
  – Hiding the complexity of a heterogeneous environment
  – Allowing pools of resources to be created or acquired dynamically, on demand
Research objectives
• Concept of a programming environment for scientific applications on Grid
  – Analysis of programming models for grid applications
  – Identification of desired features of the programming environment
• Prototype implementation and feasibility study
• Verification of the model and prototype with typical applications
• Thesis (provisional):
  – An extended component model may be used to create a grid environment for programming and running complex scientific applications.
Many programming models
• MPI, PVM
• Custom protocols
• Tuple spaces, HLA
• Distributed objects
• Active objects
• Components
• Skeletons
• Service Oriented Architectures, Web Services
Virtualization: state of the art (incomplete)
• Globus GRAM, Condor, VDT, gLite, Unicore
  – Large-scale, batch-job-oriented submission systems
• Virtual Workspaces: using Globus to submit VMware (or other) virtual machines that form a Condor pool of resources, which is in turn accessible through the Globus Toolkit
  – Hardly a lightweight solution!
• SOA – everything accessible as a Web Service
  – Efforts to support dynamic service deployment
• Component model: a container provides a virtualization layer for hosting components
  – Dynamic deployment directly embedded into the programming model (component = unit of deployment)
What are components?
• A unit of software development/deployment/reuse
  – i.e. has useful functionality
  – Ideally, functionality that someone else might be able to (re)use
  – Can be developed independently of other components
• Interacts with the outside world only through well-defined interfaces
• Can be composed with other components
  – “Plug and play” model to build applications
  – Composition based on interfaces
• Hosted in a framework/container responsible for other services (communication, security)
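A minimal sketch of the provides/uses idea in plain Java follows. The names `Port`, `Services`, `Component`, `setServices`, `addProvidesPort`, `registerUsesPort` and `getPort` echo CCA terminology, but these classes are hypothetical stand-ins, not the `gov.cca` API; `ComponentDemo` plays the role of the framework that connects a uses port to a provides port.

```java
import java.util.HashMap;
import java.util.Map;

// Marker for anything a component exposes or consumes (hypothetical, CCA-style "port").
interface Port {}

// The port type is the only thing client and server have to agree on.
interface GreeterPort extends Port {
    String greet(String name);
}

// Hypothetical stand-in for the per-component services object a framework provides.
class Services {
    private final Map<String, Port> provided = new HashMap<>();
    private final Map<String, Port> connected = new HashMap<>();

    void addProvidesPort(String name, Port port) { provided.put(name, port); }
    void registerUsesPort(String name) { connected.putIfAbsent(name, null); }
    Port getProvidedPort(String name) { return provided.get(name); }
    // Called by the framework when it connects a uses port to a provides port.
    void connect(String usesName, Port target) { connected.put(usesName, target); }
    Port getPort(String usesName) { return connected.get(usesName); }
}

// Components only ever see their own Services object, never each other.
interface Component {
    void setServices(Services services);
}

class ServerComponent implements Component, GreeterPort {
    @Override public void setServices(Services s) { s.addProvidesPort("greeter", this); }
    @Override public String greet(String name) { return "Hello, " + name; }
}

class ClientComponent implements Component {
    private Services services;
    @Override public void setServices(Services s) { services = s; s.registerUsesPort("greeter"); }
    String run() {
        GreeterPort greeter = (GreeterPort) services.getPort("greeter");
        return greeter.greet("grid");
    }
}

class ComponentDemo {
    // Acts as the framework: instantiate, initialize, then connect the ports.
    static String wireAndRun() {
        Services serverServices = new Services();
        Services clientServices = new Services();
        ServerComponent server = new ServerComponent();
        ClientComponent client = new ClientComponent();
        server.setServices(serverServices);
        client.setServices(clientServices);
        clientServices.connect("greeter", serverServices.getProvidedPort("greeter"));
        return client.run();
    }

    public static void main(String[] args) {
        System.out.println(wireAndRun());
    }
}
```

Neither component refers to the other's class; only `GreeterPort` is shared, which is what allows a framework to swap implementations or place them in different processes.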
Benefits of Component-based Approach
• Enables composing applications from blocks which were not originally designed to be combined
• Addresses software complexity issues
• Many frameworks provide language interoperability
• Enforcement of separation of interface from implementation
• Facilitates managing third-party libraries
• Allows easy swapping of implementations
• Increases software productivity
• Mature and successful technology in business and desktop applications
Components vs. Web Services
• Component:
  – Formal models for component programming (e.g. Fractal)
  – May be created on demand, e.g. more components deployed when needed
  – Explicitly declares required interfaces (uses ports), so components can be directly connected – no need to pass invocation data via a central workflow engine
  – May have parallel connections
  – Does not require SOAP as a protocol
[Figure: one server component used by clients 1…N, and one client component using servers 1…N]
Proposed approach to building a grid environment
• Use a component model
• Apply a virtualization layer
• Design a base component environment with a set of desired features
• Extend the environment features
Desired features of Grid components
• Scalable to different environments (from laptops to HPC clusters)
  – Lightweight platform
  – Dynamic, pluggable, reconfigurable at runtime
• Facilitated deployment on shared resources
  – Virtualization (creating dynamic workspaces)
  – Dynamic (hot) deployment
• Communication adjusted to various levels of coupling
  – P2P, WANs, LANs, intercluster connections, direct binding in one process
  – Supporting parallelism
• Facilitating programming
  – Composable in space and in time
  – Taking advantage of semantic description and reasoning
• Supporting multiple languages
  – Allowing easy adaptation of legacy code
  – Combining Java flexibility with optimized Fortran libraries
• Interoperability with grid standards
  – Web Services – SOAP, WSDL, possibly WSRF
  – Grid Component Model (ProActive/Fractal)
• Adapted to unreliable Grid environment
  – Supporting dynamic and interactive reconfiguration of connections, locations, bindings
  – Providing support for migration and checkpointing
State of the art – examples of solutions (incomplete)
• Scalable to different environments (from laptops to HPC clusters)
  – HPC: CCAFFEINE, GridCCM
  – Lightweight: XCAT, ProActive, ICENI
• Facilitated deployment on shared resources
  – ProActive, XCAT (using Globus)
• Communication adjusted to various levels of coupling
  – CCAFFEINE – direct binding, MPI; XCAT – SOAP
  – Optimized communication: IBIS, GridCCM
  – Parallel, collective communication: GridCCM, IBIS, ProActive
• Facilitating programming
  – Composable in space and in time: XCAT, ICENI, GCM – hierarchical
  – Skeleton approach: HOC, ASSIST
  – Taking advantage of semantic description and reasoning: ICENI, Semantic Web Services
• Supporting multiple languages
  – Legacy code: BABEL
  – Interoperability: CORBA, SOAP
• Interoperability with grid standards
  – Web Services – XCAT, ProActive
  – Grid Component Model: ProActive reference implementation
• Adapted to unreliable Grid environment
  – Dynamic and interactive reconfiguration: ProActive, XCAT, Web Services model
  – Migration and checkpointing: ProActive, XCAT
Base for the Solution: CCA and H2O
• Common Component Architecture (CCA)
  – Component standard for HPC
  – Uses and provides ports described in SIDL
  – Support for scientific data types
  – Existing tightly coupled (CCAFFEINE) and loosely coupled, distributed (XCAT) frameworks
• H2O
  – Java-based distributed resource sharing platform
  – Providers set up the H2O kernel (container)
  – Authorized parties can deploy pluglets (components)
  – Separation of roles: decoupling providers from deployers, and providers from each other
  – RMIX: efficient multiprotocol RMI extension
[Figure: in the traditional model, the provider creates the container on the provider host and also deploys components A and B, which clients look up and use; in the proposed model, the provider only creates the container, while the provider, a client or a reseller deploys components, which clients then look up and use.]
Example scenarios of H2O
[Figure: registration and discovery – components are published and found through UDDI, JNDI, LDAP, DNS, GIS, e-mail, phone, ...; components A, B and C are deployed from repositories onto provider kernels by providers, resellers or clients, including a provider wrapping a legacy application with native code.]
1. Provider = deployer
  – e.g. resource = legacy application
2. Reseller: developer = deployer
  – e.g. computational service offered within a grid system
3. Client = deployer
  – e.g. client runs a custom distributed application on shared resources
Features of the environment
• Scalable to different environments (from laptops to HPC clusters)
  – Lightweight platform: use H2O
  – Dynamic, pluggable, reconfigurable at runtime: dynamic CCA model + H2O kernel facilities
• Facilitated deployment on shared resources
  – Static virtualization by using the H2O kernel as a daemon
  – Dynamic virtualization using a pool of transient H2O kernels created on demand
• Communication adjusted to various levels of coupling
  – Offered by the RMIX library of H2O
  – Parallel extensions for CCA: multiple ports
• Facilitating programming
  – Composition in time: low-level Python or Ruby scripting; high-level: Virolab/GridSpace programming environment
  – Semantic description: under development within Virolab
• Supporting multiple languages
  – Integration of RMIX with Babel
  – Integration of MOCCA with Babel – pending
• Interoperability with grid standards
  – Web Services – future work (technically feasible: either RMIX or an embedded server such as XFire)
  – Grid Component Model (ProActive/Fractal) interoperability – recent work
• Adapted to unreliable Grid environment
  – Supporting dynamic and interactive reconfiguration of connections, locations, bindings
  – Providing fault-tolerance support: migration and checkpointing – future work
MOCCA – a basic component framework
• Each component is a separate pluglet
  – Dynamic remote deployment of components
  – Components packaged as JAR files
  – Security: Java sandboxing, detailed access policy
• Using RMIX for communication – efficiency, multiprotocol interoperability
• Flexibility and multiple scenarios – as in H2O
• MOCCA_Light: pure Java implementation
• Java API or Jython and Ruby scripting for application assembly
http://www.icsr.agh.edu.pl/mambo/mocca
Dynamic virtualization
[Figure: a user's virtual resource pool of H2O kernels, started via SSH on a standalone machine, via PBS on a cluster and via a resource broker on LCG grid nodes; a name service (NS) offers bind() and lookup().]
• A pool of computing resources may be created by submitting a number of H2O kernels to many Grid sites
• Application components may be deployed on the kernels belonging to the pool
• The virtual resource pool may be used by a single user or shared for collaboration
• Interaction with cluster nodes in a private network – JXTA transport (needs more testing)
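The pool idea can be sketched as an in-memory name service. Everything below is hypothetical: `NameService`, `KernelHandle`, the `pool/<user>/kernelN` naming scheme and round-robin deployment are illustrative assumptions, not the H2O or MOCCA API, and real kernels would be remote processes started through SSH, PBS or a grid broker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical handle for a transient kernel started on some resource.
class KernelHandle {
    final String name;
    final String launchedVia; // e.g. "ssh", "pbs", "lcg"
    final List<String> deployed = new ArrayList<>();
    KernelHandle(String name, String launchedVia) { this.name = name; this.launchedVia = launchedVia; }
}

// Minimal in-memory name service: kernels bind() themselves, applications lookup() them.
class NameService {
    private final Map<String, KernelHandle> entries = new TreeMap<>(); // sorted by name

    synchronized void bind(String name, KernelHandle kernel) { entries.put(name, kernel); }
    synchronized KernelHandle lookup(String name) { return entries.get(name); }

    // All kernels registered under a common prefix form one virtual pool;
    // any user who knows the prefix can use it, which is how a pool can be shared.
    synchronized List<KernelHandle> pool(String prefix) {
        List<KernelHandle> result = new ArrayList<>();
        for (Map.Entry<String, KernelHandle> e : entries.entrySet()) {
            if (e.getKey().startsWith(prefix)) result.add(e.getValue());
        }
        return result;
    }
}

class VirtualPoolDemo {
    // Simulates kernels submitted through different access methods registering themselves.
    static NameService buildPool() {
        NameService ns = new NameService();
        String[] via = {"ssh", "pbs", "lcg", "lcg"};
        for (int i = 0; i < via.length; i++) {
            String name = "pool/alice/kernel" + i;
            ns.bind(name, new KernelHandle(name, via[i]));
        }
        return ns;
    }

    // Places components round-robin on the kernels of a pool.
    static void deploy(NameService ns, String prefix, List<String> components) {
        List<KernelHandle> kernels = ns.pool(prefix);
        for (int i = 0; i < components.size(); i++) {
            kernels.get(i % kernels.size()).deployed.add(components.get(i));
        }
    }

    public static void main(String[] args) {
        NameService ns = buildPool();
        deploy(ns, "pool/alice/", Arrays.asList("generator", "annealer1", "annealer2", "storeroom"));
        for (KernelHandle k : ns.pool("pool/alice/")) {
            System.out.println(k.name + " (" + k.launchedVia + "): " + k.deployed);
        }
    }
}
```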
Communication extension: RMIX over JXTA
• Fully operational RMI implementation running over the JXTA P2P network
• Methods can be invoked on remote objects located behind firewalls or NATs
• Our implementation of JXTA socket factories manages all JXTA connectivity transparently from the user’s point of view
Parallelism: Extensions of CCA for Multiple Ports and Connections
• Multiple users of one provides port (easy part)
  – Single provides port
  – Naming convention for client components (client1, client2, ...)
• Single client of multiple providers:
  – Needs multiple uses ports on the client side
  – Uses the CCA ParameterPort to parameterize the number of uses ports
  – Client component creates the required number of uses ports
  – Naming convention for server components and uses port names
• Extension of the CCA BuilderService: MultiBuilder
  – Creation of multiple components
  – Handling multiple connections
[Figure: one server component used by clients 1…N, and one client component using servers 1…N]
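The naming conventions can be sketched as plain string bookkeeping. `MultiBuilderSketch` and its methods are hypothetical and only compute which uses port is connected to which provides port; a real MultiBuilder would also create the component instances through the framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the naming conventions; not the MOCCA MultiBuilder API.
class MultiBuilderSketch {
    // Uses port names a client creates once a parameter (e.g. set through a
    // ParameterPort) tells it how many servers to talk to: out1, out2, ...
    static List<String> usesPortNames(String base, int count) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) names.add(base + i);
        return names;
    }

    // One client, many providers: uses port i of the client goes to server i.
    // Returns "component.usesPort" -> "component.providesPort".
    static Map<String, String> connectOneToMany(String client, String usesBase,
                                                String serverBase, String providesPort, int count) {
        Map<String, String> connections = new LinkedHashMap<>();
        List<String> ports = usesPortNames(usesBase, count);
        for (int i = 0; i < count; i++) {
            connections.put(client + "." + ports.get(i), serverBase + (i + 1) + "." + providesPort);
        }
        return connections;
    }

    // Many clients (client1..clientN) sharing the single provides port of one server.
    static Map<String, String> connectManyToOne(String clientBase, String usesPort,
                                                String server, String providesPort, int count) {
        Map<String, String> connections = new LinkedHashMap<>();
        for (int i = 1; i <= count; i++) {
            connections.put(clientBase + i + "." + usesPort, server + "." + providesPort);
        }
        return connections;
    }

    public static void main(String[] args) {
        System.out.println(connectOneToMany("client", "out", "server", "in", 3));
        System.out.println(connectManyToOne("client", "out", "server", "in", 2));
    }
}
```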
Support for composition in space and in time
• Declarative vs. imperative programming
• Composition in space
  – Graph of component connections
  – ADL – Application Description Language
  – Supported by MOCCAccino
• Composition in time
  – Workflow model (script)
  – Centralized execution
  – Currently supported: low-level scripting in Jython and JRuby
  – High-level scripting developed within Virolab
[Figure: a runtime system invokes init(), simulate(), getMolecule() and store() on the Configuration Generator, Simulated Annealing and Storeroom components; the legend distinguishes invocations from direct connections between components.]
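Composition in time can be sketched as a central driver that calls component ports step by step. In MOCCA such scripts are written in Jython or JRuby against remote components; the Java version below uses hypothetical local stand-ins (`GeneratorPort`, `AnnealingPort`, `StoreroomPort`) named after the methods in the figure, and the "energy" it computes is a placeholder, not the gold cluster model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical port interfaces mirroring the method names in the figure.
interface GeneratorPort { void init(long seed); double[] getMolecule(); }
interface AnnealingPort { double simulate(double[] molecule); }
interface StoreroomPort { void store(double energy); List<Double> results(); }

class LocalGenerator implements GeneratorPort {
    private Random random;
    @Override public void init(long seed) { random = new Random(seed); }
    @Override public double[] getMolecule() {
        double[] coords = new double[3];
        for (int i = 0; i < coords.length; i++) coords[i] = random.nextDouble();
        return coords;
    }
}

class LocalAnnealing implements AnnealingPort {
    // Placeholder "energy": sum of squared coordinates (a real component would anneal).
    @Override public double simulate(double[] molecule) {
        double energy = 0;
        for (double x : molecule) energy += x * x;
        return energy;
    }
}

class LocalStoreroom implements StoreroomPort {
    private final List<Double> stored = new ArrayList<>();
    @Override public void store(double energy) { stored.add(energy); }
    @Override public List<Double> results() { return stored; }
}

class WorkflowScript {
    // The "script": a central driver invoking components in sequence (composition in time).
    static List<Double> run(GeneratorPort gen, AnnealingPort sa, StoreroomPort room, int steps) {
        gen.init(42L);
        for (int i = 0; i < steps; i++) {
            room.store(sa.simulate(gen.getMolecule()));
        }
        return room.results();
    }

    public static void main(String[] args) {
        System.out.println(run(new LocalGenerator(), new LocalAnnealing(), new LocalStoreroom(), 5));
    }
}
```

Unlike composition in space, no component here holds a reference to another; all data passes through the driver, which is what "centralized execution" means.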
Composition in space – MOCCAccino
• ADLM (ADL for MOCCAccino) – an XML-based language for:
  – Describing the types and number of components and their connections
  – Concept of hierarchical component groups
  – Optional information to specify resources
  – Hints for deployment of components (whether they are computation-intensive or communication-intensive)
• Application Manager – responsible for:
  – Discovering the available kernel pool
  – Planning the optimal location of components
  – Deploying components in specified kernels
  – Connecting components
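The parsing and graph-expansion steps can be sketched with the JDK's DOM parser. The XML vocabulary (`application`, `component`, `count`, `hint`, `connect`) is a hypothetical stand-in, not the actual ADLM schema; the sketch expands component groups into instances and connections, and leaves the deployment hints to a planner that is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AdlSketch {
    // Hypothetical ADL-like document; element and attribute names are illustrative only.
    static final String EXAMPLE =
          "<application name='gold'>"
        + "  <component name='generator' count='1' hint='communication'/>"
        + "  <component name='annealing' count='3' hint='computation'/>"
        + "  <connect from='annealing' uses='molecule' to='generator' provides='molecule'/>"
        + "</application>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid ADL document", e);
        }
    }

    // Expands each component group into instance names: annealing -> annealing1..annealing3.
    static Map<String, List<String>> instances(Document doc) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList groups = doc.getElementsByTagName("component");
        for (int i = 0; i < groups.getLength(); i++) {
            Element group = (Element) groups.item(i);
            String name = group.getAttribute("name");
            int count = Integer.parseInt(group.getAttribute("count"));
            List<String> names = new ArrayList<>();
            for (int k = 1; k <= count; k++) names.add(count == 1 ? name : name + k);
            result.put(name, names);
        }
        return result;
    }

    // Connects every instance of the "from" group to every instance of the "to" group.
    static List<String> connections(Document doc, Map<String, List<String>> inst) {
        List<String> result = new ArrayList<>();
        NodeList links = doc.getElementsByTagName("connect");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            for (String from : inst.get(link.getAttribute("from"))) {
                for (String to : inst.get(link.getAttribute("to"))) {
                    result.add(from + "." + link.getAttribute("uses")
                            + " -> " + to + "." + link.getAttribute("provides"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(EXAMPLE);
        System.out.println(connections(doc, instances(doc)));
    }
}
```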
MOCCAccino usage
[Figure: the Application Manager pipeline – Parser, Graph Builder, Deployment Planner (using a kernel information provider and the HDNS registry), Application Deployer and MOCCA Builder. Example component graph: the builder creates one Ping component instance, with a 2-element list (index 0 and 1) of Pong components, each with a map with “left” and “right” keys of Zonk components.]
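The example graph can be reproduced as a small in-memory structure. `Ping`, `Pong` and `Zonk` are the names from the figure, but these classes and `GraphBuilderSketch` are hypothetical; the count shows how many instances the builder has to create: one Ping, two Pongs and four Zonks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory version of the component graph from the figure.
class Zonk {
    final String id;
    Zonk(String id) { this.id = id; }
}

class Pong {
    final String id;
    final Map<String, Zonk> zonks = new LinkedHashMap<>(); // keys "left" and "right"
    Pong(String id) { this.id = id; }
}

class Ping {
    final List<Pong> pongs = new ArrayList<>(); // 2-element list (index 0 and 1)
}

class GraphBuilderSketch {
    static Ping build() {
        Ping ping = new Ping();
        for (int i = 0; i < 2; i++) {
            Pong pong = new Pong("pong" + i);
            pong.zonks.put("left", new Zonk(pong.id + "-left"));
            pong.zonks.put("right", new Zonk(pong.id + "-right"));
            ping.pongs.add(pong);
        }
        return ping;
    }

    // Number of component instances a deployer would have to create for this graph.
    static int instanceCount(Ping ping) {
        int count = 1; // the Ping itself
        for (Pong pong : ping.pongs) count += 1 + pong.zonks.size();
        return count;
    }

    public static void main(String[] args) {
        System.out.println(instanceCount(build()));
    }
}
```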
Motivation for multiprotocol and multilanguage interoperability
• Grids are heterogeneous
• Multiple programming languages – in a single application
  – Java for middleware
  – C for system programming
  – FORTRAN for computing
  – Python for scripting
• Multiple protocols – in a single application
  – High-speed local networks (Myrinet)
  – TCP/SSL/TLS in WAN
  – SOAP for loosely coupled message exchange
  – Overlay P2P networks for traversing private network boundaries (NATs)
• Context: MOCCA component framework
Multilanguage Solution – Babel
• SIDL – Scientific Interface Definition Language
  – Standard for CCA components
  – Supports arrays and complex types
  – Focus on interfaces
• Babel:
  – SIDL parser
  – Code generator
  – Runtime library
• Intermediate Object Representation (IOR)
  – Core of a Babel object
  – Array of function pointers
  – Generated code in C
[Figure: Babel connects C, C++, f77, f90, Java and Python.]

SIDL interface:

    package example version 1.2 {
      class Hello {
        string hello(in string hello);
      }
    }

C binding:

    /**
     * Method: hello[]
     */
    char*
    example_Hello_hello(
      /*in*/ example_Hello self,
      /*in*/ const char* hello);

Java implementation (user code goes between the splicer comments):

    // user defined non-static methods:
    /**
     * Method: hello[]
     */
    public java.lang.String hello_Impl (
      /*in*/ java.lang.String hello )
    {
      // DO-NOT-DELETE splicer.begin(example.Hello.hello)
      // Insert-Code-Here {example.Hello.hello} (hello)
      return "Server says: " + hello;
      // DO-NOT-DELETE splicer.end(example.Hello.hello)
    }
Currently: Babel for Local Applications
• All Babel objects in one process
• Implemented in the CCAFFEINE framework
• Existing multilanguage CCA components – see the CCA tutorial
[Figure: a Java application calls a Fortran native library and a C++ native library within one process, each through SIDL interfaces and the Babel IOR.]
Our Solution
• Babel + RMIX
• Implementation of Babel RMI extensions
  – Generic mechanism of method invocation (reflection)
  – Dynamic loading of the communication library
  – No need for code generation and compilation
[Figure: a Java application and C++/Fortran native libraries, each behind SIDL interfaces and the Babel IOR, communicate over the network through RMIX libraries.]
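Generic invocation by reflection can be sketched with `java.lang.reflect.Method`. This is not Babel's or RMIX's code: `GenericInvoker` is a hypothetical helper that dispatches on a method name and argument count, which is the property that removes the need to generate and compile a stub per SIDL interface. `HelloImpl` mirrors the `example.Hello` class from the Babel example.

```java
import java.lang.reflect.Method;

class GenericInvoker {
    // Finds a public method by name and argument count and invokes it reflectively.
    // One generic routine serves every interface, so no per-interface stub is needed.
    static Object invoke(Object target, String methodName, Object... args)
            throws ReflectiveOperationException {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == args.length) {
                return m.invoke(target, args);
            }
        }
        throw new NoSuchMethodException(methodName + " with " + args.length + " argument(s)");
    }
}

// Stand-in for the server side of the SIDL class example.Hello.
class HelloImpl {
    public String hello(String hello) { return "Server says: " + hello; }
}

class InvokerDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        // In a remote setting, the method name and arguments would arrive over the network.
        System.out.println(GenericInvoker.invoke(new HelloImpl(), "hello", "hi"));
    }
}
```

Matching on the argument count alone is a simplification; overloaded methods with the same arity would need a comparison of parameter types as well.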
Interoperability with Grid Component Model (CoreGRID)
• Based on the Fractal model
[Figure: a Fractal component, with Identity, Binding, LifeCycle and Content controllers around its content]
• Deployment functionalities
• Asynchronous and extensible port semantics
• Collective interfaces
• Autonomicity and adaptivity thanks to “autonomic” and “dynamic” controllers
• Support for language neutrality and interoperability
Motivation for interoperability
• Framework interoperability is an important issue for GCM
• Existing component models and frameworks for Grids
  – CCA, CCM
• Already existing “legacy” components
• ProActive/Fractal and H2O/MOCCA – alternative Java-based frameworks for distributed computing: can they interoperate?
Fractal vs. CCA
• Similarities: general for most component models
  – Separation of interface from implementation
  – Composition by connecting interfaces
• Differences
  – Fractal components are reflective (introspection), whereas CCA components take the initiative to add/remove their ports at runtime
  – BindingController in Fractal vs. BuilderService in CCA
  – No ContentController in CCA (and no hierarchy)
  – Factory interface in Fractal vs. BuilderService in CCA
  – AttributeController in Fractal vs. ParameterPort in CCA
  – No ADL in CCA
Approaches to integration
• Single component integration
  – Wrapping a CCA component into a primitive GCM one
  – Allows a CCA component to be used in a GCM framework
• Framework interoperability
  – Ability of two component frameworks to interoperate
  – Allows a CCA component assembly (running in a CCA framework) to be connected to a GCM component application
[Figure: left, a CCA component inside a GCM wrapper that provides cca.Services; right, a GCM component whose binding controllers (BC) connect through glue components and the CCA framework's BuilderService to CCA components.]
Solutions to typing issues
1. Generate the type of a wrapped CCA component at runtime (at initialization)
  – Pros: fully automated
  – Cons: restricted to the ports declared by the CCA component during initialization (at the setServices() call)
2. Manual description of a CCA component in ADL format
  – Pros: generic solution
  – Cons: requires additional work from the developer
3. (Semi)automatic generation of ADL
  – May combine approaches 1 and 2
4. Reuse existing CCA type specifications (SIDL, CCAFFEINE scripting, others – not standardized)
Technical approach – CCA controller
[Figure: a GCM component C whose membrane contains a CCA controller and binding controllers (BC); server glue A and client glue B, hosted in H2O kernels, connect the membrane's interfaces A and B to CCA components, wired through the CCA framework's BuilderService.]
• Creates glue components for all ports (client and server)
• Connects glue to the CCA system (using the CCA builder) and to the membrane (using BC)
Glue Components
• Server Glue:
  – Deployed as a Fractal component
  – Uses MOCCA client code to delegate invocations to the CCA interface
  – Can also be deployed on an H2O kernel
• Client Glue:
  – Deployed as a CCA component in an H2O kernel
  – Launches a ProActive runtime in the H2O kernel
  – Creates a Fractal component in this runtime
• Both:
  – Can be generated from the interface type (TODO)
[Figure: server glue A forwards calls on interface A to a CCA component in an H2O kernel; client glue B, in an H2O kernel, receives calls from a CCA component on interface B and is bound (BC) to the Fractal side.]
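A server glue object can be sketched with `java.lang.reflect.Proxy`: it implements the interface seen by the Fractal side and forwards each call to the CCA port by method name and parameter types. `MoleculeService`, `RemoteCcaPort` and `GlueFactory` are hypothetical; in the real setting the target would be MOCCA client code reaching a remote CCA component rather than a local object.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// Interface expected on the Fractal/GCM side (hypothetical).
interface MoleculeService { String getMolecule(int id); }

// Stand-in for the CCA provides port; note it does not implement MoleculeService.
class RemoteCcaPort {
    public String getMolecule(int id) { return "Au" + id; }
}

class GlueFactory {
    // Builds an object implementing `type` whose every call is forwarded to the
    // method with the same name and parameter types on `ccaPort`.
    static <T> T serverGlue(Class<T> type, Object ccaPort, AtomicInteger calls) {
        InvocationHandler handler = (proxy, method, args) -> {
            calls.incrementAndGet();
            Method target = ccaPort.getClass().getMethod(method.getName(), method.getParameterTypes());
            return target.invoke(ccaPort, args);
        };
        Object glue = Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
        return type.cast(glue);
    }
}

class GlueDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        MoleculeService glue = GlueFactory.serverGlue(MoleculeService.class, new RemoteCcaPort(), calls);
        System.out.println(glue.getMolecule(7) + " after " + calls.get() + " forwarded call(s)");
    }
}
```

Generating glue from the interface type, listed as TODO above, is roughly what a dynamic proxy gives for free; the cost is one reflective lookup per call, which a generated class would avoid.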
ProActive + MOCCA
• MOCCA invocations are synchronous
  – The composite (membrane) should be synchronous to avoid deadlocks
  – Alternatively, we may consider generating glue with wrapped types (IntWrapper, etc.) – this changes the types of the interfaces
• Class loading issues
  – The classes generated by the ProActive runtime must be visible to the code running in the H2O kernel
  – RMI class loading works fine if the codebase is set properly on the ProActive side
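The type change implied by wrapped results can be sketched with `java.util.concurrent`. This is not ProActive's future mechanism, and `EnergyPort`, `IntResult` and `AsyncGlue` are hypothetical names. The original port returns a primitive `int`, so the caller must wait for the call to finish; the wrapped port returns an object that can be handed back at once and only blocks when its value is read.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Original synchronous port: a primitive result cannot stand for a pending value.
interface EnergyPort { int evaluate(int size); }

// Hypothetical wrapper type: an object result can represent a value still being computed.
class IntResult {
    private final CompletableFuture<Integer> pending;
    IntResult(CompletableFuture<Integer> pending) { this.pending = pending; }
    boolean isAvailable() { return pending.isDone(); }
    int intValue() { return pending.join(); } // blocks only when the value is actually needed
}

// The glue's interface differs from the original one: int became IntResult.
interface WrappedEnergyPort { IntResult evaluate(int size); }

class AsyncGlue implements WrappedEnergyPort {
    private final EnergyPort target;
    private final ExecutorService executor;
    AsyncGlue(EnergyPort target, ExecutorService executor) {
        this.target = target;
        this.executor = executor;
    }
    @Override public IntResult evaluate(int size) {
        // Runs the synchronous call on another thread and returns immediately.
        return new IntResult(CompletableFuture.supplyAsync(() -> target.evaluate(size), executor));
    }
}

class AsyncDemo {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        WrappedEnergyPort port = new AsyncGlue(size -> size * size, executor);
        IntResult result = port.evaluate(12);
        System.out.println(result.intValue());
        executor.shutdown();
    }
}
```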
Communication Intensive Application Benchmark
• Simplified scenario:
  – 2 components
  – Provides port: receive and send back an array of doubles (ping-pong)
• Tested on local Gigabit Ethernet and on the transatlantic Internet between Atlanta and Krakow
• 2.4 GHz Linux machines
• Comparison with XCAT
Small Data Packets
Factors:
• SOAP header overhead in XCAT
• Connection pools in RMIX
Large Data Packets
• Encoding (binary vs. base64)
• CPU saturation on Gigabit LAN (serialization)
• Variance caused by Java garbage collection
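The encoding factor is easy to quantify. Assuming, as the slide suggests, that the SOAP path carries arrays base64-encoded while RMIX sends raw binary, the sketch below compares the two sizes for a `double[]` using `java.nio.ByteBuffer` and `java.util.Base64`; `EncodingOverhead` is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

class EncodingOverhead {
    // Raw binary form of a double[]: 8 bytes per value.
    static byte[] toBinary(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(Double.BYTES * values.length);
        for (double v : values) buf.putDouble(v);
        return buf.array();
    }

    // Base64 text, as used to embed binary data in XML/SOAP messages.
    static String toBase64(double[] values) {
        return Base64.getEncoder().encodeToString(toBinary(values));
    }

    // Padded base64 length: 4 output characters per started group of 3 input bytes.
    static long base64Length(long binaryBytes) {
        return 4 * ((binaryBytes + 2) / 3);
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        System.out.println(toBinary(data).length + " bytes binary, "
                + toBase64(data).length() + " characters base64");
    }
}
```

For 1,000 doubles the binary form is 8,000 bytes and the base64 text is 10,668 characters, about a third more, before any XML markup or the CPU time spent encoding and decoding.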
Automatic Flow Composer Example
• Composes an application graph from initial data (e.g. initial ports) or an incomplete graph
• First implemented for the XCAT framework
• Easy migration to MOCCA
• Modification of code required (xcat.Port)
• Similar performance for XCAT and MOCCA (exchange of text documents)
[Figure: Flow Composer, Flow Optimizer, Link Evaluator, Site Evaluator and Component Registry components, interacting through compose, evaluate and lookup calls.]
Other applications
• Domain decomposition (some student toy apps)
• Data mining using Weka (as a Virolab example)
Gold Cluster Application
• Components
  – Starter – a “driver” component for the application, provides a Go port
  – Configuration generator – random initial configurations
  – Simulated annealing – compute-intensive simulation component
  – Storeroom – used for keeping results and statistics
  – Gather – auxiliary component for passing molecules
• Ports
  – Molecule – offers the getMolecule() method
  – Control ports – for steering the application
[Figure: the Starter steers the Configuration Generator and the Simulated Annealing components through Generator Control and Annealing Control ports; molecules flow through Molecule ports from the generator to the annealing components and on through Gather to the Storeroom.]
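The compute-intensive component is built around simulated annealing. The sketch below is a generic one-dimensional version with Metropolis acceptance and geometric cooling, under assumed parameters (start temperature 1.0, cooling factor 0.995, uniform moves of width 1); `AnnealingSketch` is hypothetical and is not the energy model or move set used for gold clusters.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

class AnnealingSketch {
    // Minimizes f starting from x0; returns the best point visited.
    static double anneal(DoubleUnaryOperator f, double x0, long seed, int steps) {
        Random rnd = new Random(seed);
        double x = x0;
        double fx = f.applyAsDouble(x);
        double best = x;
        double fBest = fx;
        double temperature = 1.0;
        for (int i = 0; i < steps; i++) {
            double candidate = x + (rnd.nextDouble() - 0.5); // random move in [-0.5, 0.5)
            double fc = f.applyAsDouble(candidate);
            // Metropolis rule: always accept improvements, accept a worse move
            // with probability exp(-(fc - fx) / T).
            if (fc < fx || rnd.nextDouble() < Math.exp((fx - fc) / temperature)) {
                x = candidate;
                fx = fc;
                if (fx < fBest) {
                    best = x;
                    fBest = fx;
                }
            }
            temperature *= 0.995; // geometric cooling schedule
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(anneal(x -> (x - 3) * (x - 3), 0.0, 1L, 5000));
    }
}
```

Accepting some uphill moves while the temperature is high lets the search escape local minima; in the gold cluster application this loop runs over atomic configurations, which is why several annealing components are run in parallel.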
Resources and Results
• Using heterogeneous infrastructure available ad hoc
  – Local machine
    • SSH access
  – Cluster in CYFRONET
    • PBS
  – CrossGrid testbed (LCG-based middleware)
    • Java VMs already installed
    • Cluster nodes allow remote point-to-point communication (MPICH-enabled: no firewalls!)
    • Clusters in PSNC Poznan and IFCA Santander
• Problem size grows with the number of nodes (weak scaling)
[Figure: computing time [s] (0–375) versus number of nodes (1–10)]
Future work
• Optimization algorithms (scheduling) for ADL
and scripting models
• Monitoring support (Gemini)
• Formal model (adapted from GCM)
• Further integration with Babel
• More applications
Summary
• Analysis of programming models for Grid, selection of the component model
• Design and implementation of a CCA framework based on the H2O platform
• Extending the applicability of H2O to dynamically created pools of resources (user-centric or ad hoc created VOs)
• Extensions for parallel-distributed CCA components
• Support for composition in time and in space through high-level scripting and ADL-based application description
• Towards multilanguage interoperability
• Supporting interoperability between component models
Acknowledgements
• Vaidy Sunderam, Dawid Kurzyniec – Emory University,
Atlanta
• Daniel Harężlak, Michał Placek
• Tomek Bartyński, Eryk Ciepiela, Joanna Kocot,
Przemysław Pelczar, Iwona Ryszka
• Paweł Jurczyk, Maciej Golenia
• Tomasz Gubała, Marek Kasztelnik, Piotr Nowakowski
• Ludovic Henrio, Matthieu Morel, Francoise Baude, Denis
Caromel – Sophia-Antipolis, France
• Marian Bubak