G-Fluxo: A Workflow Portal Specialized in Computational Biochemistry

Eduardo Gutiérrez
Project Technician
eduardo@cesga.es

Mission statement
- To provide high performance computing, communications resources and services to the scientific community of Galicia and to the National Research Council, as well as to institutions and enterprises with R&D activity.
- To promote the use of new information and communication technologies applied to research within the scientific community of Galicia.

Index
- Motivation: Our users ...
- G-Fluxo proposal
  - P-GRADE Portal
  - Local Infrastructures (not Grid related). DRMAA
  - DAG Workflow (Condor supported)
  - Portlet development. Visualization.
- Future work

Motivation
What happens if a specific job requires HPC resources (high memory demand, a low-latency/high-bandwidth network) or has a private license only available on a specific machine?
An e-scientist has access to different computing resources:
- A personal computer
- A computer cluster (lab or institution)
- Grid resources
- A supercomputing center

Motivation
Our users:
- They usually belong to small research groups with access to their own local clusters, CESGA servers and Grid infrastructures: single access point
- Applications have very different requirements depending on the case to be run, so different computational resources should be used.

G-Fluxo proposal

P-GRADE Portal
- General-purpose, workflow-oriented computational Grid portal.
- Supports the development and execution of workflow-based Grid applications: a tool for Grid orchestration
- Based on GridSphere-2.2.10
- Easy to expand with new portlets (e.g. application-specific portlets)
- Easy to tailor to end-user needs
- Solves the Grid interoperability problem at the workflow level
- http://www.lpds.sztaki.hu/pgportal

Local Computing Infrastructures
P-GRADE Portal has been modified so that it is possible to create a “GRID configuration” formed by just one local computational resource.
The local resource must:
- Be accessible by SSH
- Run a scheduler with DRMAA C support

Local cluster setup:
- The portal administrator creates a “GRID configuration” with the syntax:
  [host DN|host IP] CLUSTER-GRID
- Each user adds the portal's SSH public key to the file ~/.ssh/authorized_keys
- For each job, the following directory is created on the local cluster:
  [local cluster]:gfluxo/[workflow name]/[job name]
- A new syntax has been defined specifically in the workflow editor:
  cluster:[host DN|host IP]:[path to file]
- Each user can indicate, for each job, the resources needed for execution on the local cluster:
  # gfluxo: [resource name] [resource value]
- The administrator also has to configure a file that maps each portal user to a local cluster user:
  [portal directory]/portal_work/users/[user]/usermap.[host DN|host IP]

Implementation:
- In P-GRADE Portal, Bash scripts are called by Condor.
- The files 'wkf_pre.sh' and 'wkf_post.sh' have been modified.
- The files 'wkf_pre_CLUSTER-GRID.sh', 'wkf_post_CLUSTERGRID.sh', 'wkf_CLUSTER-GRID.sh' and 'ff_CLUSTER-GRID.sh' have been created.

DAG Workflow (Condor supported)
Workflow support:
- P-GRADE Portal supports DAG workflows through the use of Condor for job management.
- In this way, quite complex workflows are supported.

Workflows
Workflow example with HPC and HTC:
- Three GROMACS simulation jobs.
- Two jobs go to the local cluster and one to the Grid.

Workflows
We still need application communication:
- Common data format: for example Q5COST
- Wrappers
- Conversion routines

Portlet development: Visualization
Expanding P-GRADE functionality:
- A graphical interface for specific applications: a portlet, based on Jmol, that shows GROMACS simulation output.
- The user can view the job results interactively and in 3D.
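
The naming conventions from the local cluster setup (the per-job directory, the cluster: file reference and the # gfluxo: resource lines) can be illustrated with a small Python sketch. This is not the portal's code, which the slides say is implemented as Bash scripts called by Condor. The function names, the example host, the resource names "memory" and "cpus", and the assumption that the # gfluxo: lines appear as comment lines in a job script are all hypothetical.

```python
from typing import Dict, Optional, Tuple

CLUSTER_PREFIX = "cluster:"
DIRECTIVE_PREFIX = "# gfluxo:"


def parse_cluster_reference(ref: str) -> Optional[Tuple[str, str]]:
    """Split 'cluster:<host>:<path>' into (host, path).

    Returns None when the string is not a cluster reference.
    Hosts that themselves contain ':' (e.g. IPv6 literals) are not handled.
    """
    if not ref.startswith(CLUSTER_PREFIX):
        return None
    host, sep, path = ref[len(CLUSTER_PREFIX):].partition(":")
    if not sep or not host or not path:
        raise ValueError(f"malformed cluster reference: {ref!r}")
    return host, path


def parse_resource_directives(script_text: str) -> Dict[str, str]:
    """Collect '# gfluxo: <resource name> <resource value>' lines.

    Assumption: directives are written one per line inside the job script.
    """
    resources: Dict[str, str] = {}
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith(DIRECTIVE_PREFIX):
            continue
        fields = stripped[len(DIRECTIVE_PREFIX):].split(None, 1)
        if len(fields) == 2:  # ignore directives without a value
            resources[fields[0]] = fields[1].strip()
    return resources


def job_directory(workflow: str, job: str) -> str:
    """Relative per-job directory on the cluster: gfluxo/<workflow>/<job>."""
    return f"gfluxo/{workflow}/{job}"


if __name__ == "__main__":
    # Hypothetical host, path, and resource names, for illustration only.
    print(parse_cluster_reference("cluster:cluster.example.org:/home/user/input.gro"))
    print(parse_resource_directives("#!/bin/bash\n# gfluxo: memory 4G\n# gfluxo: cpus 8\n"))
    print(job_directory("gromacs_wf", "md_run_1"))
```

Keeping the three conventions in separate small functions mirrors how the slides present them as independent pieces of syntax; how the real scripts validate or combine them is not stated in the source.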

Integration in COMPCHEM
Our Jmol portlet, with a GROMACS workflow, has been tested in the P-GRADE Portal used by the COMPCHEM VO of EGEE.
http://compchem.unipg.it

Future Work
New portlet development:
- File manager for local clusters.
- Specific portlets to produce application input.
- Support for data format conversion routines like Open Babel.
- Better application support

Future Work
Scheduling based on demanded resources:
- The user only specifies the resources needed for the job, and the Portal Scheduler submits it to the most adequate computational resource.
- The Portal administrator defines a policy based on the demanded resources and the resources available.
- Still quite far from this …

Summarizing...
G-Fluxo tries to answer the computational needs an e-scientist could have:
- Orchestration of the use of many different computational infrastructures
- Specific support for applications

Acknowledgements
- P-GRADE portal people (MTA-SZTAKI)
- Aurelio Rodríguez, Javier López Cacheiro (CESGA)
- COST D37 action (Alessandro Costantini)
- Xunta de Galicia for the funding through the INCITE project 07SIN001CT

Thanks for your attention!!
Questions??