G-Fluxo: A Workflow portal specialized in Computational BioChemistry
Eduardo Gutiérrez
Project Technician

eduardo@cesga.es
Mission statement

To provide high-performance computing and communications resources and services to the scientific community of Galicia and to the National Research Council, as well as to institutions and enterprises with R&D activity.

To promote the use of new information and communication technologies applied to research within the scientific community of Galicia.
Index
Motivation: Our users ...
G-Fluxo proposal
P-GRADE Portal
Local Infrastructures (not Grid related). DRMAA
DAG Workflow (Condor supported)
Portlet development. Visualization.
Future work
Motivation
An e-scientist has access to different computing resources:
A Personal Computer
A Computer Cluster (Lab or institution)
Grid resources
Supercomputing center
What happens if a specific job requires HPC resources (high memory demand, low-latency/high-bandwidth network) or needs a private license that is only available on a specific machine?
Motivation
Our users:
They usually belong to small research groups with access to their own local clusters, CESGA servers and grid infrastructures:
A single access point is needed
Applications have very different requirements depending on the case to be run, so different computational resources should be used.
G-Fluxo proposal
P-GRADE Portal
General-purpose, workflow-oriented computational Grid portal. Supports the development and execution of workflow-based Grid applications: a tool for Grid orchestration
Based on GridSphere-2.2.10
Easy to expand with new portlets (e.g. application-specific portlets)
Easy to tailor to end-user needs
Solves the Grid interoperability problem at the workflow level
http://www.lpds.sztaki.hu/pgportal
Local Computing Infrastructures
P-GRADE Portal has been modified so that it is possible to create a “GRID configuration” formed by just one local computational resource.
The local resource must be:
Accessible by SSH
Running a scheduler supported by the DRMAA C binding
Local Computing Infrastructures
Local cluster setup:
The portal administrator creates a “GRID configuration” with the syntax:
[host DN|host IP] CLUSTER-GRID
Each user adds the portal's SSH public key to the file ~/.ssh/authorized_keys
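The key registration step can be sketched in Python as follows. This is a minimal illustration, not part of G-Fluxo: the function name `add_public_key` and the example key are hypothetical, and the only assumption taken from the slide is that the portal's public key must end up in the user's `~/.ssh/authorized_keys` on the local cluster.

```python
from pathlib import Path


def add_public_key(authorized_keys: Path, public_key: str) -> bool:
    """Append public_key to authorized_keys unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    key = public_key.strip()
    # sshd expects ~/.ssh to be private to the user.
    authorized_keys.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    with authorized_keys.open("a") as handle:
        # Keep one key per line even if the file lacks a trailing newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    authorized_keys.chmod(0o600)
    return True


# Example call (hypothetical key text), run by the user on the local cluster:
# add_public_key(Path.home() / ".ssh" / "authorized_keys", "ssh-rsa AAAA... portal")
```

The check before appending makes the step safe to repeat, so a user who registers the key twice does not end up with duplicate entries.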
Local Computing Infrastructures
For each job, the following directory is created in the local
cluster:
[local cluster]:gfluxo/[workflow name]/[job name]
A new syntax has been defined in the workflow editor for files located on the local cluster:
cluster:[host DN|host IP]:[path to file]
Each user can indicate, for each job, the necessary resources for
the local cluster execution.
# gfluxo: [resource name] [resource value]
The administrator also has to configure a file to map each portal
user to a local cluster user.
[portal directory]/portal_work/users/[user]/usermap.[host DN|host IP]
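The conventions on this slide can be illustrated with a short Python sketch. It is not the portal's code (the portal itself relies on Bash scripts), and it rests on assumptions: the helper names are hypothetical, the `# gfluxo:` lines are assumed to appear as comments in the job script, hosts are assumed to be names or IPv4 addresses, and resource names such as `memory` are invented for the example. The format of the user map file is not given in the slides, so it is not modelled here.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, NamedTuple, Optional


class ClusterFile(NamedTuple):
    host: str
    path: str


def parse_cluster_reference(reference: str) -> Optional[ClusterFile]:
    """Split 'cluster:[host]:[path]' into host and path; None if it does not match."""
    prefix, _, rest = reference.partition(":")
    if prefix != "cluster":
        return None
    host, separator, path = rest.partition(":")
    if not separator or not host or not path:
        return None
    return ClusterFile(host, path)


# Matches lines of the form '# gfluxo: [resource name] [resource value]'.
DIRECTIVE = re.compile(r"^#\s*gfluxo:\s*(\S+)\s+(.+?)\s*$")


def parse_resource_directives(script_text: str) -> Dict[str, str]:
    """Collect every '# gfluxo:' directive found in a job script."""
    resources: Dict[str, str] = {}
    for line in script_text.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            resources[match.group(1)] = match.group(2)
    return resources


def job_directory(workflow: str, job: str) -> str:
    """Build the per-job path 'gfluxo/[workflow name]/[job name]'."""
    return str(PurePosixPath("gfluxo") / workflow / job)
```

For instance, `parse_cluster_reference("cluster:cluster.example.org:/home/user/input.gro")` yields the host `cluster.example.org` and the path `/home/user/input.gro`, while a plain file name returns `None`.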
Local Computing Infrastructures
Implementation:
In P-GRADE Portal, Bash scripts are called by Condor.
The files 'wkf_pre.sh' and 'wkf_post.sh' have been modified.
The files 'wkf_pre_CLUSTER-GRID.sh', 'wkf_post_CLUSTERGRID.sh', 'wkf_CLUSTER-GRID.sh' and 'ff_CLUSTER-GRID.sh'
have been created.
DAG Workflow (Condor supported)
Workflow support:
P-GRADE Portal supports DAG workflows through the use of Condor for job management.
In this way, quite complex workflows are supported.
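As an illustration of what a DAG workflow means in practice, the sketch below represents job dependencies as a mapping from each job to its parents, derives a valid execution order, and renders the graph with the `JOB` and `PARENT ... CHILD` keywords of a Condor DAGMan input file. The job names, the `.submit` file naming and the use of Python are assumptions made for the example; how the portal actually produces its Condor input is not described in the slides. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_order(parents: Dict[str, Set[str]]) -> List[str]:
    """Return one valid run order; raises graphlib.CycleError if there is a cycle."""
    return list(TopologicalSorter(parents).static_order())


def to_dagman(parents: Dict[str, Set[str]]) -> str:
    """Render the graph in DAGMan syntax. Every job must appear as a key."""
    lines = [f"JOB {job} {job}.submit" for job in sorted(parents)]
    for child in sorted(parents):
        for parent in sorted(parents[child]):
            lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


# Hypothetical four-job workflow: two simulations run after a common
# preparation step, and an analysis job waits for both of them.
workflow = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "analyse": {"simulate_a", "simulate_b"},
}
```

Calling `execution_order(workflow)` always places `prepare` first and `analyse` last; the two simulation jobs are independent of each other, so a scheduler is free to run them at the same time on different resources.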
Workflows
Workflow example with HPC and HTC:
Three GROMACS simulation jobs.
Two jobs run on the local cluster and one on the Grid.
Workflows
We still need communication between applications:
Common data format: for example Q5COST
Wrappers
Conversion routines
Portlet development: Visualization
Expanding P-GRADE functionality:
A graphical interface for specific applications:
A Jmol-based portlet to show GROMACS simulation output. The user can view the job results interactively in 3D.
Integration in COMPCHEM
Our Jmol portlet, together with a GROMACS workflow, has been tested in the P-GRADE Portal used by the COMPCHEM VO of EGEE
http://compchem.unipg.it
Portlet development: Visualization
Future Work
Development of new portlets:
File manager for local clusters.
Specific portlets to produce application input.
Support for data format conversion routines like Open Babel.
Better application support
Future Work
Scheduling based on demanded resources:
The user only specifies the resources needed for the job, and the Portal Scheduler submits it to the most suitable computational resource.
The Portal administrator defines a policy based on the demanded resources and the resources available.
Still quite far from this …
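To make the idea concrete, here is one possible shape for such a policy, written as a Python sketch. Nothing here exists in G-Fluxo: the `Resource` class, the `priority` field, the resource names and the numbers are all hypothetical, and a real policy would also need to consider queue load, licenses and user mappings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    name: str
    capacity: Dict[str, float]  # e.g. {"memory_gb": 64, "cores": 32}
    priority: int = 0           # higher values are preferred by this example policy


def select_resource(demand: Dict[str, float],
                    resources: List[Resource]) -> Optional[Resource]:
    """Pick the highest-priority resource that satisfies every demanded amount."""
    candidates = [
        resource for resource in resources
        if all(resource.capacity.get(key, 0) >= amount
               for key, amount in demand.items())
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda resource: resource.priority)


# Hypothetical infrastructure description an administrator might maintain.
infrastructure = [
    Resource("personal-computer", {"memory_gb": 8, "cores": 4}, priority=3),
    Resource("local-cluster", {"memory_gb": 64, "cores": 32}, priority=2),
    Resource("grid", {"memory_gb": 2, "cores": 1000}, priority=1),
]
```

With this description, a job demanding 32 GB of memory and 16 cores would go to `local-cluster`, while a job that no resource can satisfy yields `None` so that the portal could report the problem to the user.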
Summarizing
G-Fluxo tries to answer the computational needs an e-scientist may have:
Orchestration of the use of many different computational infrastructures
Specific support for applications
Acknowledgements
P-GRADE portal people (MTA-SZTAKI)
Aurelio Rodríguez, Javier López Cacheiro (CESGA)
COST D37 action (Alessandro Costantini)
Xunta de Galicia for the funding through the INCITE
project 07SIN001CT
Thanks for your attention!!
Questions??