DIET Overview and some recent work
A middleware for the large scale deployment
of applications over the Grid
Frédéric Desprez
LIP ENS Lyon / INRIA
GRAAL Research Team
Joint work with
N. Bard, R. Bolze, Y. Caniou, E. Caron, B. Depardon,
D. Loureiro, G. Le Mahec, A. Muresan, V. Pichon, …
Distributed Interactive Engineering Toolbox
Introduction
• Transparency and simplicity represent the holy grail for Grids (maybe even before performance)!
• Scheduling tunability to take into account the characteristics of specific application classes
• Several applications are ready (and not only number-crunching ones!)
• Many incarnations of the Grid (metacomputing, cluster computing, global computing, peer-to-peer systems, Web Services, …)
 Many research projects around the world
 Significant technology base
• Do not forget the good ol’ research on scheduling and distributed systems!
 Most scheduling problems are very difficult to solve even in their simplest form …
 … but simple solutions often lead to better performance in real life
Introduction, cont.
One long-term idea for the Grid: offering (or renting) computational power and/or storage through the Internet
 Very high potential
• Need for Problem Solving and Application Service Provider environments
• More performance, more storage capacity
• Installation difficulties for some libraries and applications
• Some libraries or codes need to stay where they were developed
• Some data need to stay in place for security reasons
 Using computational servers through a simple interface
RPC and Grid-Computing: GridRPC
• One simple idea
– Implementing the RPC programming model over the Grid
– Using resources accessible through the network
– Mixed parallelism model (a data-parallel model at the server level and task parallelism between the servers)
• Features needed
– Load-balancing (resource localization and performance evaluation,
scheduling),
– IDL,
– Data and replica management,
– Security,
– Fault-tolerance,
– Interoperability with other systems,
– …
• Design of a standard interface
– Within the OGF (GridRPC and SAGA WGs)
– Both computation requests and data management
– Existing implementations: NetSolve, Ninf, DIET, OmniRPC
RPC and Grid Computing: GridRPC
[Figure: a client sends a request to the agent hierarchy, which selects server S2 ("S2 !") among servers S1–S4; the client then issues Op(C, A, B) to S2.]
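To make the model concrete, here is a minimal client-side sketch using the standard GridRPC API (GFD-R.52); the service name "op", the argument list, and the configuration file are placeholders.

```cpp
#include <cstdio>
#include <grpc.h>   // standard GridRPC header (GFD-R.52)

int main(int argc, char* argv[]) {
  double A[4] = {1, 2, 3, 4}, B[4] = {4, 3, 2, 1}, C[4];
  char op_name[] = "op";                       // placeholder service name

  if (argc < 2 || grpc_initialize(argv[1]) != GRPC_NO_ERROR)
    return 1;                                  // argv[1]: middleware config file

  grpc_function_handle_t h;
  grpc_function_handle_default(&h, op_name);   // let the middleware pick a server
  if (grpc_call(&h, C, A, B) == GRPC_NO_ERROR) // synchronous call Op(C, A, B)
    std::printf("C[0] = %g\n", C[0]);

  grpc_function_handle_destruct(&h);
  grpc_finalize();
  return 0;
}
```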
DIET’s Goals
• Our goals
 To develop a toolbox for the deployment of environments using the Application
Service Provider (ASP) paradigm with different applications
 Use as much as possible public domain and standard software
 To obtain a high performance and scalable environment
 Implement and validate our more theoretical results
 Scheduling for heterogeneous platforms, data (re)distribution and replication, performance evaluation, algorithmics for heterogeneous and distributed platforms, …
• Based on CORBA, NWS, LDAP, and our own software developments
 FAST for performance evaluation,
 LogService for monitoring,
 VizDIET for the visualization,
 GoDIET for the deployment
DIET Dashboard
• Several applications in different fields (simulation, bioinformatics, …)
• Release 2.2 available on the web
• Projects: ACI Grid ASP, TLSE, ACI MD GDS, RNTL GASP, ANR LEGO, Gwendia, COOP, Grid’5000
http://graal.ens-lyon.fr/DIET/
DIET Architecture
[Figure: DIET hierarchical architecture — clients contact a Master Agent (MA); multiple MAs can be federated (e.g. through JXTA); each MA forwards requests down a tree of Local Agents (LA) to the server front ends.]
Client and server interface
• Client side
 So easy … (a client sketch follows this slide)
 Multi-interface (C, C++, Fortran, Java, Scilab, Web, etc.)
 GridRPC compliant
• Server side
 Install and submit a new server to the agent (LA)
 Problem and parameter description
 Client IDL transfer from the server
 Dynamic services
  new service
  new version
  security update
  outdated service
  etc.
 GridRPC compliant
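The client code really is short: below is a sketch following the DIET C client API, callable from C++ as well. The "MatSUM" service, the matrix sizes, and the configuration path are illustrative, and the signatures follow the DIET user manual (they may differ slightly between releases).

```cpp
#include <cstdio>
#include "DIET_client.h"   // DIET client API

int main(int argc, char* argv[]) {
  double A[6] = {1, 2, 3, 4, 5, 6};
  double B[6] = {6, 5, 4, 3, 2, 1};
  double* C = NULL;

  if (diet_initialize("./client.cfg", argc, argv))  // contact the DIET hierarchy
    return 1;

  // Profile layout: parameters 0..1 are IN, none are INOUT, 2 is OUT.
  diet_profile_t* prof = diet_profile_alloc("MatSUM", 1, 1, 2);
  diet_matrix_set(diet_parameter(prof, 0), A, DIET_VOLATILE, DIET_DOUBLE,
                  2, 3, DIET_ROW_MAJOR);
  diet_matrix_set(diet_parameter(prof, 1), B, DIET_VOLATILE, DIET_DOUBLE,
                  2, 3, DIET_ROW_MAJOR);
  diet_matrix_set(diet_parameter(prof, 2), NULL, DIET_VOLATILE, DIET_DOUBLE,
                  2, 3, DIET_ROW_MAJOR);

  if (!diet_call(prof)) {                           // synchronous GridRPC call
    size_t m, n;
    diet_matrix_get(diet_parameter(prof, 2), &C, NULL, &m, &n, NULL);
    std::printf("C[0] = %g\n", C[0]);
  }
  diet_profile_free(prof);
  diet_finalize();
  return 0;
}
```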
Data/replica management
• Two needs
 Keep the data in place to reduce the overhead of communications between clients and servers
 Replicate data whenever possible
• Three approaches for DIET
 DTM (LIFC, Besançon)
  Hierarchy similar to DIET’s
  Distributed data manager
  Redistribution between servers
 JuxMem (Paris, Rennes)
  P2P data cache
 DAGDA (IN2P3, Clermont-Ferrand)
  Joining task scheduling and data management
• Work done within the GridRPC Working Group (OGF)
 Relations with workflow management
[Figure: data items A, B, F, G, X, Y distributed and replicated between two clients, Server 1, and Server 2.]
DAGDA
Data Arrangement for Grid and Distributed Applications
• A new data manager for the DIET middleware providing
 Explicit data replication: using the API.
 Implicit data replication: the data are replicated on the selected SeDs.
 Direct data get/put through the API.
 Automatic data management: using a selected data replacement algorithm when necessary (a victim-selection sketch follows this slide).
  LRU: the Least Recently Used data item is deleted.
  LFU: the Least Frequently Used data item is deleted.
  FIFO: the "oldest" data item is deleted.
 Transfer optimization by selecting the most convenient source.
  Using statistics on previous transfers.
 Storage resource usage management.
  The space reserved for data is configured by the user.
 Data status backup/restoration.
  Allows stopping and restarting DIET, saving the data status on each node.
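These replacement policies are the classical ones. As a minimal illustration (not DAGDA's actual code), LRU victim selection amounts to scanning last-access timestamps:

```cpp
#include <climits>
#include <cstddef>
#include <map>
#include <string>

// Toy LRU victim selection: each stored item carries the timestamp of its
// last access; the item with the oldest timestamp is evicted first.
struct Item { std::size_t size; long lastAccess; };

std::string lruVictim(const std::map<std::string, Item>& store) {
  std::string victim;
  long oldest = LONG_MAX;
  for (const auto& kv : store)
    if (kv.second.lastAccess < oldest) {  // least recently used so far
      oldest = kv.second.lastAccess;
      victim = kv.first;
    }
  return victim;  // identifier of the data item to delete
}
```

LFU and FIFO differ only in the field being compared (an access counter or an insertion timestamp).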
DAGDA
• Transfer model
 Uses the pull model.
 The data are sent independently of the service call.
 The data can be sent in several parts.
1. The client sends a request for a service.
2. DIET selects some SeDs according to the chosen scheduler.
3. The client sends its request to the SeD.
4. The SeD downloads the data from the client and/or from other DIET nodes.
5. The SeD performs the call.
6. The persistent data are updated.
DAGDA
• DAGDA architecture
 Each data item is associated with a unique identifier.
 DAGDA controls the disk and memory space limits; if necessary, it uses a data replacement algorithm.
 The CORBA interface is used for communication between DAGDA nodes.
 Users can access data and perform replications through the API (see the sketch below).
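A rough sketch of explicit replication through the API follows; the function names are recalled from the DAGDA chapter of the DIET manual and should be treated as assumptions to check against your release.

```cpp
#include "DIET_Dagda.h"   // DAGDA API (names to be checked against the manual)

// Put a persistent 2x3 matrix into DAGDA, get its unique identifier back,
// then ask for an explicit replication ("*" as a catch-all target rule is
// an assumption; the rule syntax is release-specific).
void shareMatrix(double* M) {
  char* id = NULL;
  dagda_put_matrix(M, DIET_DOUBLE, DIET_PERSISTENT, 2, 3, DIET_ROW_MAJOR,
                   "example matrix", &id);
  dagda_replicate_data(id, "*");
}
```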
DIET Scheduling
• Collector of Resource Information (CoRI)
 Interface to gather performance information
 Functional requirements
  Set of basic metrics
  One single access interface (usage sketched below)
  Extensibility
 Non-functional requirements
  Accuracy and latency
  Non-intrusiveness
• Currently 2 modules available
 CoRI-Easy
 FAST
 Extension possibilities: Ganglia, Nagios, R-GMA, Hawkeye, INCA, MDS, …
[Figure: the CoRI Manager offers a single interface on top of the CoRI-Easy collector, the FAST collector, and other collectors such as Ganglia.]
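Thanks to the single access interface, a SeD-side metric function can fill its estimation vector with one CoRI call per metric, roughly as below (tags and signature per the DIET manual; treat the details as assumptions):

```cpp
#include "DIET_server.h"   // SeD-side API, including CoRI

// Performance-metric callback attached to a service: fill the estimation
// vector with values gathered through the CoRI-Easy collector.
static void cori_metric(diet_profile_t* pb, estVector_t ev) {
  (void)pb;  // the request profile is not needed for these metrics
  diet_estimate_cori(ev, EST_FREECPU, EST_COLL_EASY, NULL);  // idle CPU fraction
  diet_estimate_cori(ev, EST_FREEMEM, EST_COLL_EASY, NULL);  // free memory
}
```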
FAST: Fast Agent’s System Timer
• Performance evaluation of the platform enables finding an efficient server (redistribution and computation costs) without testing every configuration  performance database for the scheduler
• Based on NWS (Network Weather Service)
[Figure: FAST architecture — the client application calls the FAST API; static data acquisition (LDAP, BDB) provides machine data (memory amount, CPU speed, batch system, up/down status), network data (bandwidths, latencies, topology, protocols), and computation data (feasibility, execution time on a given architecture); dynamic data acquisition (NWS and other low-level software) provides load, memory, batch queue status, and network bandwidths and latencies.]
Plugin Schedulers
• “First” version of DIET performance management
 Each SeD answers a profile (COMP_TIME, COMM_TIME, TOTAL_TIME,
AVAILABLE_MEMORY) for each request
 Profile is filled by FAST
 Local Agents sort the results by execution time and send them back up to the
Master Agent
• Limitations
 Limited availability of FAST/NWS
 Hard to install and configure
 Priority given to FAST-enabled servers
 Extensions hard to handle
 Non-standard application- and platform-specific performance measures not supported
 Firewall problems with some performance evaluation tools
 No use of integrated performance estimators (e.g. Ganglia)
DIET Plug-in Schedulers
• SeD level
 Performance estimation function
 Estimation Metric Vector (estVector_t): a dynamic collection of performance estimation values
 Performance measures available through DIET
  FAST-NWS performance metrics
  Time elapsed since the last execution
  CoRI (Collector of Resource Information)
  Developer-defined values
 Standard estimation tags for accessing the fields of an estVector_t
  EST_FREEMEM
  EST_TCOMP
  EST_TIMESINCELASTSOLVE
  EST_FREECPU
• Aggregation Methods
 Define how SeD responses are sorted: associated with the service and defined at the SeD level
 Tunable comparison/aggregation routines for scheduling
 Priority Scheduler
  Performs pairwise server estimation comparisons, returning a sorted list of server responses;
  Can minimize or maximize based on SeD estimations, taking into account the order in which the performance estimations were specified at the SeD level (a sketch follows).
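Below is a sketch of how a SeD might declare a developer-defined metric and a priority aggregator, following the plugin-scheduler API described in the DIET manual (names recalled from the manual; the queue-length helper is hypothetical, and parameter type declarations are omitted):

```cpp
#include "DIET_server.h"

int solve_MatSUM(diet_profile_t* pb);                // the service's solve function

static double local_queue_length() { return 0.0; }   // hypothetical helper

// Store a developer-defined value at user tag 0 of the estimation vector.
static void perf_metric(diet_profile_t* pb, estVector_t ev) {
  (void)pb;
  diet_est_set(ev, 0, local_queue_length());
}

void declare_service() {
  diet_profile_desc_t* profile = diet_profile_desc_alloc("MatSUM", 1, 1, 2);
  diet_aggregator_desc_t* agg = diet_profile_desc_aggregator(profile);
  diet_aggregator_set_type(agg, DIET_AGG_PRIORITY);  // priority scheduler
  diet_aggregator_priority_minuser(agg, 0);          // 1st key: minimize user tag 0
  diet_aggregator_priority_min(agg, EST_TCOMP);      // 2nd key: minimize comp. time
  diet_service_use_perfmetric(perf_metric);          // attach the metric function
  diet_service_table_add(profile, NULL, solve_MatSUM);
  diet_profile_desc_free(profile);
}
```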
Workflow Management (ANR Gwendia)
• Workflow representation
 Directed Acyclic Graph (DAG)
 Each vertex is a task
 Each directed edge represents communication between tasks
• Goals
 Build and execute workflows
 Use different heuristics to solve scheduling problems
 Extensibility to address multi-workflow submission and large grid platforms
 Manage the heterogeneity and variability of the environment
Architecture with MA DAG
• Specific agent for workflow management (MA DAG)
• Two modes:
 MA DAG defines a complete schedule of the workflow (ordering and mapping)
 MA DAG defines only an ordering for the workflow execution; the mapping is done in the next step by the client, which goes through the Master Agent to find the servers on which to execute the workflow services (a minimal ordering sketch follows).
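For the ordering-only mode, the DAG must at least be put into a valid execution order. A generic topological sort (Kahn's algorithm; an illustration, not MA DAG's actual code) is enough for that:

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Return one valid execution order of a task DAG given as adjacency lists
// (an edge u -> v means task v consumes data produced by task u).
std::vector<int> executionOrder(const std::vector<std::vector<int>>& succ) {
  std::vector<int> indeg(succ.size(), 0), order;
  for (const auto& s : succ)
    for (int v : s) ++indeg[v];
  std::queue<int> ready;                            // tasks with all inputs ready
  for (std::size_t u = 0; u < succ.size(); ++u)
    if (indeg[u] == 0) ready.push(static_cast<int>(u));
  while (!ready.empty()) {
    int u = ready.front(); ready.pop();
    order.push_back(u);
    for (int v : succ[u])
      if (--indeg[v] == 0) ready.push(v);           // last input just became available
  }
  return order;  // fewer entries than tasks would indicate a cycle
}
```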
Workflow Designer
• Applications viewed as services within DIET
• Compose services to get a complete application workflow in a drag-and-drop fashion
DIET: Batch System interface
• A parallel world
 Grid resources are parallel (parallel machines or clusters of compute nodes)
 Applications/services can be parallel
• Problem
 Many types of batch systems exist, each with its own behavior and user interface
• Solution
 Use a layer of intermediary meta-variables
 Use an abstract BatchSystem factory (sketched below)
[Figure: class diagram — the abstract BatchSystem class with concrete subclasses Loadleveler_BatchSystem, OAR1_6BatchSystem, SGE_BatchSystem, and PBS_BatchSystem.]
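A minimal sketch of the abstract-factory idea (simplified names, not DIET's exact internal classes):

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Abstract interface: one concrete subclass per batch system.
class BatchSystem {
public:
  virtual ~BatchSystem() {}
  virtual std::string submit(const std::string& script) = 0;  // returns a job ID
};

class LoadlevelerBS : public BatchSystem {
  std::string submit(const std::string&) override { return "ll-job"; }   // llsubmit
};
class OARBS : public BatchSystem {
  std::string submit(const std::string&) override { return "oar-job"; }  // oarsub
};

// The SeD reads the batch system name from its configuration and asks the
// factory for the matching implementation; services stay batch-agnostic.
std::unique_ptr<BatchSystem> makeBatchSystem(const std::string& name) {
  if (name == "loadleveler") return std::unique_ptr<BatchSystem>(new LoadlevelerBS);
  if (name == "oar")         return std::unique_ptr<BatchSystem>(new OARBS);
  throw std::runtime_error("unknown batch system: " + name);
}
```

Adding a new batch system then means adding one subclass and one factory branch, while the meta-variable layer keeps the job scripts portable.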
Grid’5000
1) Building a nationwide experimental platform for Grid & P2P research (like a particle accelerator for computer scientists)
• 9 geographically distributed sites hosting clusters with 256 to 1K CPUs
• All sites are connected by RENATER (the French Research and Education Network)
• RENATER hosts probes to trace network load conditions
• Design and develop a system/middleware environment to safely run and repeat experiments
2) Use the platform for Grid experiments in real-life conditions
• Address critical issues of Grid systems/middleware:
 Programming, scalability, fault tolerance, scheduling
• Address critical issues of Grid networking:
 High-performance transport protocols, QoS
• Port and test applications
• Investigate original mechanisms:
 P2P resource discovery, desktop Grids
4 main features:
• High security for Grid’5000 and the Internet, despite the deep reconfiguration feature
• A software infrastructure allowing users to access Grid’5000 from any Grid’5000 site and have a home directory on every site
• Reservation/scheduling tools allowing users to select node sets and schedule experiments
• A user toolkit to reconfigure the nodes and monitor experiments
Goals and Protocol of the Experiment (Grid’5000)
• Validation of the DIET architecture at large scale over different administrative domains
• Protocol
 DIET deployment over a maximum of processors
 Large number of clients
 Comparison of DIET execution times with average local execution times
 1 MA, 8 LAs, 540 SeDs
 2 requests/SeD
 1120 clients on 140 machines
 DGEMM requests (2000x2000 matrices)
 Simple round-robin scheduling using time_since_last_solve
Results (Grid’5000)
 Orsay: 40 s
 Lyon: 38 s
 Toulouse: 33 s
 Sophia: 40 s
 Parasol: 33 s
 Bordeaux: 33 s
 Paraci: 11 s
 Lille: 34 s
 Paravent: 9 s
Deployment example: Décrypthon platform
SeD = Server Daemon, installed on any server running LoadLeveler (rescue SeDs can be defined).
MA = Master Agent, coordinates jobs (rescue or multiple Master Agents can be defined).
WN = worker node.
[Figure: Décrypthon platform — project users reach a Décrypthon web interface connected to the DIET Master Agent; SeD + LoadLeveler pairs serve the sites (Orsay with Decrypthon1 and Decrypthon2, Lille, Bordeaux, Jussieu); a DB2 data manager (CRIHAN) hosts the AFM and clinical databases.]
Eucalyptus – the Open Source Cloud
• Eucalyptus is:
 A research project of a team from the University of California, Santa
Barbara
 An Open Source Project
 An IaaS Cloud Platform
• Base principles
 A collection of Web Services on each node
 Virtualization to host user images (Xen technology)
 Virtual networks to provide security
 Implements the Amazon EC2 interface
  Systems/tools built for EC2 are usable
  A "Turing test" for Eucalyptus
 Uses commonly known and available Linux technologies
• http://open.eucalyptus.com/
Eucalyptus platform
[Figure: Eucalyptus platform — a Cloud Controller (CLC) on top of Cluster Controllers (CC) managing Node Controllers (NC).]
DIETCloud architecture
[Figure: DIETCloud = DIET (MA, LAs, SeDs) + Eucalyptus (CLC, CCs, NCs), joined through a SeD.]
DIETCloud Architecture
• Several solutions that differ in how much the architectures of the two systems overlap or are included in one another
 DIET is completely included in Eucalyptus
 DIET is completely outside of Eucalyptus
 …and all the possibilities in between
DIET completely included in Eucalyptus
[Figure: the DIET hierarchy (MA, LA, SeDs) runs inside virtual machines on the Eucalyptus nodes (CLC, CCs, NCs).]
• The DIET platform is virtualized inside Eucalyptus
• Very flexible and scalable, as DIET nodes can be launched when needed
• Scheduling is more complex
DIET completely outside of Eucalyptus
[Figure: the DIET hierarchy (MA, LA, SeDs) runs outside Eucalyptus (CLC, CC, NCs); the SeDs provision virtual machines on demand.]
• The SeD requests resources from Eucalyptus
• The SeD works directly with the virtual machines
• Useful when Eucalyptus is a third-party resource
Implemented Architecture
• We chose the architecture that takes the most benefit from the DIET design
 DIET is completely outside of Eucalyptus
• Eucalyptus is treated as a new batch system
 An easy and natural way to use it in DIET
 DIET is designed to make adding a new batch scheduler easy
 Provide a new implementation of the BatchSystem abstract class
• Handling a service call is done in three steps (sketched below):
1. Obtain the requested virtual machines through a SOAP call to Eucalyptus
2. Execute the service on the instantiated virtual machines, bypassing the Eucalyptus controllers
3. Terminate the virtual machines through a SOAP call to Eucalyptus
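Building on the BatchSystem sketch from the batch-system slide, the three steps could look like the following (a hypothetical sketch: the SOAP plumbing is stubbed out and the method names are illustrative, not the actual DIET or Eucalyptus API):

```cpp
#include <string>
#include <vector>

class EucalyptusBS : public BatchSystem {
public:
  std::string submit(const std::string& script) override {
    std::vector<std::string> vms = runInstances(1);  // 1) get VMs (SOAP RunInstances)
    std::string out = sshRun(vms.front(), script);   // 2) run directly on the VM,
                                                     //    bypassing the controllers
    terminateInstances(vms);                         // 3) release (SOAP TerminateInstances)
    return out;
  }
private:
  std::vector<std::string> runInstances(int n) {     // placeholder for the SOAP call
    return std::vector<std::string>(n, "vm-address");
  }
  std::string sshRun(const std::string&, const std::string&) { return ""; }
  void terminateInstances(const std::vector<std::string>&) {}
};
```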
DIETCloud: a new DIET architecture
[Figure: the SeD uses an Eucalyptus BatchSystem back end, which can target either Eucalyptus or Amazon EC2.]
Some thoughts about DIET and Clouds
• The door to using Cloud platforms through DIET has been opened
• The first DIET Cloud architecture was designed
• The current work serves as a proof of concept of using the DIET
Grid-RPC middleware on top of the Eucalyptus Cloud system to
demonstrate general purpose computing using Cloud platforms
• Possible ways of connecting the two architectures have been
studied
• Several issues still remain to be solved
 Instance startup time needs to be taken into account
 A new scheduling strategy is needed for more complex architectures
 The performance of such a system needs to be measured
Conclusions and future work
• GridRPC
 An interesting approach for several applications
 Simple, flexible, and efficient
 Many interesting research issues (scheduling, data management, resource discovery and reservation, deployment, fault tolerance, …)
• DIET
 A scalable, open-source, multi-application platform
 Concentration on several issues like resource discovery, scheduling (distributed scheduling and plugin schedulers), deployment (GoDIET and GRUDU), performance evaluation (CoRI), monitoring (LogService and VizDIET), and data management and replication (DTM, JuxMem, and DAGDA)
 Large-scale validation on the Grid’5000 platform
 A middleware designed for, and tunable to, a given application
• And now …
 Client/server DIET for Décrypthon applications
 Deployment and validation on execution
 Duplicate and check requests from UD
 Validation using SeD_batch (LoadLeveler version)
 Data management optimization
 Scheduling optimization
 More information and statistics for users
 Fault tolerance mechanisms
http://graal.ens-lyon.fr/DIET
http://www.grid5000.org/
Research Topics
• Scheduling
 Distributed scheduling
 Software platform deployment with or without dynamic connections between components
 Plug-in schedulers
 Multiple (parallel) workflow scheduling
 Links with batch schedulers
 Many-task scheduling
• Data management
 Scheduling of computation requests and links with data management
 Replication, data prefetching
 Workflow scheduling
• Performance evaluation
 Application modeling
 Dynamic information about the platform (network, clusters)
Questions?
Download: http://graal.ens-lyon.fr/DIET