Workflow management within DIET
Raphaël Bolze
LIP ENS Lyon, CNRS
INRIA Rhône-Alpes,
GRAAL project
http://graal.ens-lyon.fr
Introduction
• Distributed Interactive Engineering Toolbox
  – RPC and grid computing: GridRPC
  – DIET goals
  – DIET environment & architecture
  – Request management
  – Research topics & features
• DIET and workflow management
  – Needs
  – Language
  – Architectures
  – Scheduling proposal
• Target applications
  – PipeAlign
  – Docking
  – Robinson
  – Cosmology
• Current work
R. Bolze – 19 oct 2006 Edinburgh
Distributed Interactive Engineering Toolbox
RPC and Grid-Computing: GridRPC
• One simple idea
  – One simple (and efficient) paradigm for grid computing: offering (or leasing) computational power and/or storage capacity through the Internet
  – One simple solution: implementing the RPC programming model over the Grid
    – Using resources accessible through the network
    – Mixed parallelism model (data-parallel model at server level and task parallelism between servers)
• Features needed
  – Load balancing (resource localization and performance evaluation, scheduling)
  – Data and replica management
  – Security
  – Fault tolerance
  – Interoperability with other systems
  – …
• Design of a standard interface
  – Within the GGF/OGF (GridRPC WG, C. Lee)
  – www.ogf.org, forge.gridforum.org/projects/gridrpc-wg
  – Existing implementations: GridSolve, Ninf, DIET, XtremWeb
RPC and Grid Computing: GridRPC
[Figure: the client sends a request to the AGENT(s), which answer "S2 !"; the client then calls Op(C, A, B) on S2, one of the servers S1 to S4.]
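The request flow in the figure can be sketched in a few lines of Python. This is an in-process illustration, not the GridRPC API: the `Agent` and `Server` classes, the `load` field, and the "least loaded server wins" policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Server:
    """Hypothetical compute server offering named operations."""
    name: str
    load: float  # assumed load indicator; lower is better
    services: Dict[str, Callable[..., object]]


class Agent:
    """Hypothetical agent: knows the servers and answers 'which one?'."""

    def __init__(self, servers: List[Server]) -> None:
        self.servers = servers

    def find_server(self, service: str) -> Server:
        candidates = [s for s in self.servers if service in s.services]
        if not candidates:
            raise LookupError(f"no server offers {service!r}")
        # Assumed policy: pick the least loaded candidate.
        return min(candidates, key=lambda s: s.load)


def call(agent: Agent, service: str, *args):
    """Client side: ask the agent for a server, then run the operation there."""
    server = agent.find_server(service)
    return server.name, server.services[service](*args)


def add(a, b):
    return a + b


servers = [
    Server("S1", 0.9, {"Op": add}),
    Server("S2", 0.1, {"Op": add}),
    Server("S3", 0.5, {"Op": add}),
    Server("S4", 0.7, {}),  # does not offer "Op"
]
chosen, result = call(Agent(servers), "Op", 2, 3)
print(chosen, result)
```

The client never chooses a server itself: it only names the service, and the agent's answer decides where the call runs.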
DIET’s Goals
• Our goals
  – To develop a toolbox for the deployment of environments using the Application Service Provider (ASP) paradigm with different applications
  – To use public-domain and standard software as much as possible
  – To obtain a high-performance and scalable environment
  – To implement and validate our more theoretical results
    – Scheduling for heterogeneous platforms, data (re)distribution and replication, performance evaluation, algorithms for heterogeneous and distributed platforms, …
• Based on CORBA, NWS, LDAP, and our own software developments
  – CoRI for performance evaluation
    – FAST
    – CoRI-Easy
  – LogService for monitoring
  – VizDIET for visualization
  – GoDIET for deployment
• Several applications in different fields (simulation, bioinformatics, cosmology, …)
• Release 2.1 available on the web
• Release 2.2 coming soon
http://graal.ens-lyon.fr/DIET/
DIET Environment
[Figure: DIET environment. Clients (C), a hierarchy of agents (A), and servers (S); labels: sequential application, parallel application, data management, application.]
DIET Architecture
[Figure: DIET architecture. A client, the Master Agent (MA), several Local Agents (LA), and the Server Daemons.]
Requests Management
[Figure: request flow through the hierarchy]
client: FindServer(); bestServer = S3; runService(…);
agent: Aggregate() { min(…); FindServer() }
agent: Aggregate() { min(…); FindServer() }
server: estimate() { predExecTime(…); }
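The FindServer / Aggregate / estimate chain above can be illustrated with a small recursive sketch. It assumes a tree of agents whose leaves are servers and a "smallest predicted time wins" aggregation. The class names `AgentNode` and `SeD`, the `predicted_time` field, and the sample numbers are hypothetical and do not come from the DIET code base.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class SeD:
    """Leaf of the hierarchy: a server that can estimate its own run time."""
    name: str
    predicted_time: float  # stand-in for predExecTime(...)

    def estimate(self, request: str) -> float:
        return self.predicted_time

    def find_server(self, request: str) -> Tuple[float, str]:
        return (self.estimate(request), self.name)


@dataclass
class AgentNode:
    """Inner node: forwards the request and aggregates the children's answers."""
    children: List[Union["AgentNode", SeD]] = field(default_factory=list)

    def find_server(self, request: str) -> Tuple[float, str]:
        answers = [child.find_server(request) for child in self.children]
        return self.aggregate(answers)

    @staticmethod
    def aggregate(answers):
        # Keep the answer with the smallest predicted time.
        return min(answers)


la1 = AgentNode([SeD("S1", 12.0), SeD("S2", 9.5)])
la2 = AgentNode([SeD("S3", 4.2), SeD("S4", 7.0)])
ma = AgentNode([la1, la2])

best_time, best_server = ma.find_server("solve")
print(best_server)
```

Each agent only sees the answers of its direct children, so the comparison work is spread over the hierarchy instead of being concentrated at the top.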
Research Topics
• Scheduling
  – Distributed scheduling
  – Plug-in schedulers
• Data management
  – Scheduling of computation requests and links with data management
  – Replication, data prefetching
• Deployment
  – Mapping components on available (selected) resources
  – Software platform deployment with or without dynamic connections between components
• Performance evaluation
  – Application modeling
  – Dynamic information about the platform (network, clusters)
• Fault tolerance
  – Failure detection
  – Application recovery …
Scheduling
DIET Scheduling
• SeD level
  – Performance estimation function
  – Estimation metric vector (estVector_t): dynamic collection of performance estimation values
    – Performance measures available through DIET
      – FAST-NWS performance metrics
      – Time elapsed since the last execution
      – CoRI (Collector of Resource Information)
    – Developer-defined values
  – Standard estimation tags for accessing the fields of an estVector_t
    – EST_FREEMEM
    – EST_TCOMP
    – EST_TIMESINCELASTSOLVE
    – EST_FREECPU
• Aggregation methods
  – Mechanism defining how to sort SeD responses: associated with the service and defined at SeD level
  – Tunable comparison/aggregation routines for scheduling
  – Priority scheduler
    – Performs pairwise server estimation comparisons, returning a sorted list of server responses
    – Can minimize or maximize based on SeD estimations, taking into consideration the order in which the requests for those performance estimations were specified at SeD level
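The priority scheduler described above can be sketched as a pairwise comparison that walks an ordered list of (tag, direction) pairs. This is not DIET's implementation: estimation vectors are modelled as plain dictionaries keyed by the tag names from the slide, and the `priorities` list and the sample values are assumptions.

```python
from functools import cmp_to_key

# Estimation vectors as plain dicts keyed by the standard tags (sample values).
responses = [
    {"server": "S1", "EST_TCOMP": 8.0, "EST_FREEMEM": 2048.0},
    {"server": "S2", "EST_TCOMP": 5.0, "EST_FREEMEM": 512.0},
    {"server": "S3", "EST_TCOMP": 5.0, "EST_FREEMEM": 4096.0},
]

# Assumed priority list, in the order the criteria were declared.
priorities = [("EST_TCOMP", "min"), ("EST_FREEMEM", "max")]


def compare(a, b):
    """Pairwise comparison: negative if a should be ranked before b."""
    for tag, direction in priorities:
        if a[tag] == b[tag]:
            continue  # tie on this tag: fall through to the next priority
        a_first = a[tag] < b[tag] if direction == "min" else a[tag] > b[tag]
        return -1 if a_first else 1
    return 0


ranked = sorted(responses, key=cmp_to_key(compare))
print([r["server"] for r in ranked])
```

S2 and S3 tie on computation time, so the second criterion (more free memory) decides between them.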
DIET Scheduling
• Collector of Resource Information (CoRI)
• CoRI-Easy – provides basic measurements of the environment
• CoRI Manager – manages the use of different collectors
[Figure: the CoRI Manager above the CoRI-Easy collector, the FAST collector, and other collectors such as Ganglia.]
Data management
Data/replica management
• Two needs
  – Keep the data in place to reduce the overhead of communications between clients and servers
  – Replicate data whenever possible
• Two approaches for DIET
  – DTM (LIFC, Besançon)
    – Hierarchy similar to DIET's
    – Distributed data manager
    – Redistribution between servers
  – JuxMem (Paris, Rennes)
    – P2P data cache
• Work done within the GridRPC Working Group (OGF)
  – Relations with workflow management
[Figure: two clients, Server 1 and Server 2, with data items A, B, F, G, X, Y.]
Data management with DTM within DIET
• Persistence at the server level
• To avoid useless data transfers
  – Intermediate results
  – Between clients and servers
  – Between servers
  – "Transparent" for the client
• Data Manager/Loc Manager
  – Hierarchy mapped on the DIET one
  – Modularity
• Proposition to the GridRPC WG (OGF)
  – Data handles
  – Persistence flag
  – Data management functions
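The idea of data handles combined with a persistence flag can be illustrated with a toy server that keeps an intermediate result in place and hands back only a handle. Everything here is hypothetical (the `Server` class, the `persistent` argument, the `data-N` handle format); it is not the DTM or GridRPC data management API.

```python
import itertools


class Server:
    """Hypothetical server that can keep results in place between calls."""

    def __init__(self):
        self._store = {}             # handle -> data kept on the server
        self._ids = itertools.count(1)
        self.bytes_received = 0      # data shipped from the client

    def _resolve(self, arg):
        if isinstance(arg, str) and arg in self._store:
            return self._store[arg]          # handle: no transfer needed
        self.bytes_received += len(arg)      # raw data: counts as a transfer
        return arg

    def call(self, func, arg, persistent=False):
        result = func(self._resolve(arg))
        if persistent:
            handle = f"data-{next(self._ids)}"
            self._store[handle] = result
            return handle                    # the client only gets a handle
        return result


server = Server()
raw = bytes(1000)
# First call: the input is shipped once, the intermediate result stays on the server.
handle = server.call(lambda d: d[::-1], raw, persistent=True)
# Second call: only the handle travels, not the 1000 bytes.
size = server.call(len, handle)
print(handle, size, server.bytes_received)
```

Only the first call moves data from the client; the second call reuses what the server already holds, which is the "useless transfer" the persistence flag is meant to avoid.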
JUXMEM
PARIS project, IRISA, France
• A peer-to-peer architecture for a data-sharing service in memory
• Persistence and data coherency mechanism
• Transparent data localization

[Figure: peers, each with an ID, connected over TCP/IP and, across firewalls, over HTTP.]
• Toolbox for the development of P2P applications
  – Set of protocols
• One peer
  – Unique ID
  – Several communication protocols (TCP, HTTP, …)
Deployment and visualization
Deployment Management
[Figure: deployment management. GoDIET takes an XML description (resources, machines, storage, DIET hierarchy) and handles the distributed deployment and administration of DIET. Traces go to LogService, and a trace subset goes to VizDIET.]
VizDIET
Workflow management
Workflow management: needs?
• Workflow representation:
  – Directed Acyclic Graph (DAG)
    – Each vertex is a task
    – Each directed edge represents communication between tasks
• Questions:
  – Ordering problem?
  – Mapping problem?
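The ordering problem on a DAG has a classical baseline: any topological order respects all dependencies. The sketch below uses Kahn's algorithm on a small hypothetical workflow; the task names are illustrative, and the mapping problem (which server runs which task) is deliberately left out.

```python
from collections import deque


def topological_order(deps):
    """Kahn's algorithm. deps maps each task to the tasks it depends on.

    Every task that appears as a dependency must also be a key of deps.
    """
    remaining = {task: set(pre) for task, pre in deps.items()}
    successors = {task: [] for task in deps}
    for task, pre in deps.items():
        for p in pre:
            successors[p].append(task)

    # Tasks without unmet dependencies are ready to run.
    ready = deque(sorted(t for t, pre in remaining.items() if not pre))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(successors[task]):
            remaining[nxt].discard(task)
            if not remaining[nxt]:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("the graph contains a cycle")
    return order


deps = {
    "params": [],
    "docking1": ["params"],
    "docking2": ["params"],
    "merge": ["docking1", "docking2"],
}
order = topological_order(deps)
print(order)
```

A topological order is only one valid answer to the ordering question; scheduling heuristics choose among the valid orders using cost estimates.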
Workflow management: goals
• Goals
  – Build and execute workflows
  – Use different heuristic methods to solve scheduling problems
  – Extensibility to address multi-workflow submission and large grid platforms
  – Manage heterogeneity and variability of the environment
Workflow management: existing languages?
• Workflow languages:
  – No standard (XML, scripts)
  – Examples:
    – Condor DAGMan: script
    – Pegasus: DAX (XML)
    – Taverna: XScufl (XML)
  – Two levels of description:
    – Abstract: application description
    – Concrete: execution description
Workflow management
• Workflow description in DIET
  – XML format
  – DIET profile: problem (id), parameters (in, inout, out)
  – Description of tasks and data dependencies
<!-- NORMD 2 -->
<node id="normd2" path="normd">
  <in name="in_file" type="DIET_FILE" source="rascal1#out_file" />
  <out name="normd_value" type="DIET_FLOAT" />
  <out name="srv_time" type="DIET_DOUBLE" />
  <prec id="rascal1" />
</node>
<!-- LEON 1 -->
<node id="leon1" path="leon">
  <arg name="protein_name" type="DIET_STRING" value="P07942" />
  <in name="clustalw_file" type="DIET_FILE" source="clustalw1#out_file" />
  <in name="rascal_file" type="DIET_FILE" source="rascal1#out_file" />
  <in name="clustalw_normd" type="DIET_FLOAT" source="normd1#normd_value" />
  <in name="rascal_normd" type="DIET_FLOAT" source="normd2#normd_value" />
  <out name="srv_time" type="DIET_DOUBLE" />
  <out name="out_file1" type="DIET_FILE" />
  <out name="out_file2" type="DIET_FILE" />
  <prec id="normd2" />
</node>
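To show how task dependencies can be read from such a description, here is a small sketch using Python's standard `xml.etree.ElementTree`. The enclosing `<dag>` element is an assumption (the slide shows only two `<node>` elements), and `dependencies` is a hypothetical helper, not part of DIET.

```python
import xml.etree.ElementTree as ET

# The <dag> wrapper is an assumption: the slide shows only the <node> elements.
DOC = """
<dag>
  <node id="normd2" path="normd">
    <in name="in_file" type="DIET_FILE" source="rascal1#out_file" />
    <out name="normd_value" type="DIET_FLOAT" />
    <out name="srv_time" type="DIET_DOUBLE" />
    <prec id="rascal1" />
  </node>
  <node id="leon1" path="leon">
    <arg name="protein_name" type="DIET_STRING" value="P07942" />
    <in name="clustalw_file" type="DIET_FILE" source="clustalw1#out_file" />
    <in name="rascal_file" type="DIET_FILE" source="rascal1#out_file" />
    <in name="clustalw_normd" type="DIET_FLOAT" source="normd1#normd_value" />
    <in name="rascal_normd" type="DIET_FLOAT" source="normd2#normd_value" />
    <out name="srv_time" type="DIET_DOUBLE" />
    <out name="out_file1" type="DIET_FILE" />
    <out name="out_file2" type="DIET_FILE" />
    <prec id="normd2" />
  </node>
</dag>
"""


def dependencies(xml_text):
    """Map each node id to the sorted ids it depends on (data sources and <prec>)."""
    deps = {}
    for node in ET.fromstring(xml_text.strip()).iter("node"):
        # A source attribute has the form "producer_id#port_name".
        producers = {
            e.get("source").split("#")[0]
            for e in node.findall("in")
            if e.get("source")
        }
        producers |= {p.get("id") for p in node.findall("prec")}
        deps[node.get("id")] = sorted(producers)
    return deps


deps = dependencies(DOC)
print(deps)
```

In this fragment the `source` attributes of `leon1` name four producer nodes while its `<prec>` element names only `normd2`, so the sketch takes the union of both.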
Workflow management: architecture
• Two architectures:
  – Meta scheduler on the client side
  – Meta scheduler distributed between the client and the MA-DAG
Workflow management: Meta scheduler: client
• Architecture 1:
  – Meta scheduler on the client side
[Figure: client and MA above three LAs and five SeDs.]
Workflow management: Meta scheduler: client
• Disadvantages:
  – No coordination between the different clients
  – Depends on client capability
• Benefits:
  – More flexible for evolution:
    – The client can use its own algorithm.
  – More scalable, depending on client capability.
Workflow management
• Architecture 2:
  – Meta scheduler distributed between the client and the MA-DAG
[Figure: client, MA DAG, and MA above three LAs and five SeDs.]
Workflow management - Meta scheduler
• Base schedulers:
  – No ranking, respects the topological order of the DAG
  – HEFT heuristic
• Flexibility:
  – Architecture 1:
    – The client can have its own schedule
    – No need to rebuild the platform
  – Architecture 2:
    – Schedulers are defined at compile time.
    – The platform needs to be rebuilt if someone decides to change them.
[Figure: class diagram]
Abstract Workflow Scheduler
  virtual void execute();
  virtual void reSchedule();
User-defined Scheduler
  virtual void execute();
  virtual void reSchedule();
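The abstract scheduler with a user-defined subclass can be sketched in Python as follows. The two method names mirror the slide; everything else is an assumption. The subclass implements only the ranking phase of HEFT (upward rank, with communication costs ignored) and does not do the processor-selection phase, so it is a simplified illustration rather than the full heuristic.

```python
from abc import ABC, abstractmethod


class AbstractWorkflowScheduler(ABC):
    """Python counterpart of the two virtual methods named on the slide."""

    def __init__(self, succ, cost):
        self.succ = succ    # task -> list of successor tasks
        self.cost = cost    # task -> assumed average execution time (> 0)
        self.order = []

    @abstractmethod
    def execute(self):
        """Compute the order in which tasks are submitted."""

    @abstractmethod
    def re_schedule(self):
        """Recompute the order, e.g. after the platform has changed."""


class HeftRankScheduler(AbstractWorkflowScheduler):
    """User-defined scheduler: orders tasks by decreasing upward rank."""

    def upward_rank(self, task):
        # rank(t) = cost(t) + max over successors s of rank(s)
        return self.cost[task] + max(
            (self.upward_rank(s) for s in self.succ[task]), default=0.0
        )

    def execute(self):
        # With positive costs a task always outranks its successors,
        # so this order respects the DAG dependencies.
        self.order = sorted(self.succ, key=self.upward_rank, reverse=True)
        return self.order

    def re_schedule(self):
        # Assumed behaviour: rebuild the ranking from the current costs.
        return self.execute()


succ = {"params": ["dock1", "dock2"], "dock1": ["merge"], "dock2": ["merge"], "merge": []}
cost = {"params": 1.0, "dock1": 5.0, "dock2": 3.0, "merge": 2.0}
scheduler = HeftRankScheduler(succ, cost)
print(scheduler.execute())
```

Swapping in another policy means writing another subclass. In a compiled setting, that is where the rebuild mentioned for Architecture 2 would come from.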
Target applications
Docking application
• Detection of protein-protein and protein-DNA interactions.
• Screening a database containing thousands of proteins for functional sites involved in binding to other proteins, DNA or ligand targets.
[Figure: workflow with a params task, four docking tasks, and a merge task.]
PipeAlign application
• The sequence-to-function relationship can be understood through the analysis of conserved patterns and the evolution of protein organization, mainly based on amino acid sequence comparisons in the context of multiple alignments.
[Figure: workflow with tasks blastall, ballast, filtering, clustalw, rascal, leon, and three normd tasks.]
Robinson application
• This application annotates human genes according to their expression in neurological or muscular tissues, and also according to the expression of their homologs in other species.
[Figure: workflow with six extract tasks, a Build DB task, and four blastall tasks.]
Cosmology application
• Simulate the evolution of dark matter particles over time and compare it with real observations.
Centre de Recherche en Astronomie de Lyon
[Figure: workflow with tasks rollWhiteNoise, Grapfic1 (×6), Grapfic2 (×10), Ramses3D, HaloMaker (×4), and TreeMaker + GalaxyMaker.]
Current Work
Multi-Workflow
• Deal with multiple workflow submissions
  – On-line scheduling, different submission times
  – Implement "fair" scheduling strategies
  – Implement specific scheduling heuristics
  – Distribute the workflow management
[Figure: workflows submitted to the grid.]
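The slides do not define what a "fair" strategy is. One simple interpretation, used here purely as an assumption, is round-robin: each active workflow gets one task per turn, and workflows join the rotation when they are submitted. The function name and the input format below are hypothetical.

```python
from collections import deque


def round_robin(workflows):
    """Interleave tasks so that each active workflow gets one task per turn.

    workflows: list of (submission_step, name, tasks), with tasks already in a
    valid execution order. Returns the global order as (name, task) pairs.
    """
    pending = deque(sorted(workflows))   # ordered by submission step
    active = deque()
    schedule, step = [], 0
    while pending or active:
        # On-line arrival: admit workflows whose submission step has come.
        while pending and pending[0][0] <= step:
            _, name, tasks = pending.popleft()
            active.append((name, deque(tasks)))
        if active:
            name, tasks = active.popleft()
            schedule.append((name, tasks.popleft()))
            if tasks:
                active.append((name, tasks))   # back of the queue
        step += 1
    return schedule


workflows = [(0, "wf1", ["a", "b", "c"]), (1, "wf2", ["x", "y"])]
schedule = round_robin(workflows)
print(schedule)
```

The second workflow arrives one step late and still gets served before the first one finishes, which is the behaviour a first-come-first-served queue would not give.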
Multi-Workflow
• Simulations
• Real experiments on Grid’5000
Conclusion
• DIET
  – Workflow enabled
  – Data management: DTM, JuxMem
  – Performance information: CoRI, FAST
  – Plug-in schedulers
  – Multi-application
Questions ?
http://graal.ens-lyon.fr