Enabling Grids for E-sciencE
gUSE: grid User Support Environment
Peter Kacsuk, Krisztian Karoczkai, Andras Schnautigel, Istvan Marton, Gabor Herman
MTA SZTAKI
www.lpds.sztaki.hu
www.eu-egee.org
EGEE-II INFSO-RI-031688
EGEE and gLite are registered trademarks

Content
• Motivations
  – Lessons learnt from P-GRADE portal
  – Lessons learnt from accessing production Grid infrastructures
  – Lessons learnt from providing multi-grid service
• The service-oriented architecture of gUSE
• Services in gUSE
• Workflow concept of gUSE
• Parameter sweep support of gUSE
  – CancerGrid
• Usage of gUSE
  – EDGeS
• Conclusions

EGEE-II INFSO-RI-031688 3rd EGEE User Forum

Lessons learnt from P-GRADE portal
• Popular because it provides
  – An easy-to-use but powerful workflow system (graphical editor, wf manager, etc.)
  – Easy-to-use parameter sweep concept support
  – Easy-to-use MPI program execution support
  – A multi-grid/multi-VO access mechanism (job submission grid interoperability at workflow level) for LCG-2, gLite and GT2
• Its extension with GEMLCA enables
  – The usage of legacy codes as grid-enabled services
  – The usage of a service/job repository
  – Access to SRB and OGSA-DAI
  – A multi-grid/multi-VO access mechanism for LCG-2, gLite, GT2 and GT4
  – Data management level of grid interoperability

Popularity of P-GRADE portal
• It has been used in many EGEE and EGEE-related VOs:
  – GILDA, VOCE, SEE-GRID, BalticGrid, BioInfoGrid, EGRID, etc.
• It has been used in many national grids:
  – UK NGS (a GT2-based grid), Grid-Ireland, Turkish Grid, Croatian Grid, Ukrainian Grid, etc.
• It has been used as the GIN VO Resource Testing Portal
• It became OSS at the beginning of January 2008: https://sourceforge.net/projects/pgportal/

Download of OSS P-GRADE portal
• 130 downloads within a month

Limitations of P-GRADE portal
• Restricted workflow capabilities
  – No cycle construct, no if-then-else, no embedding
• Static parameter sweep capabilities
  – PS cannot be used inside a workflow
• Single user view
  – Too simple for IT people
  – Too complicated for end-users
• Lack of collaborative tools supporting user communities
• Monolithic architecture and, as a result, problems with
  – Scalability:
    number of simultaneous jobs in the range of 100s
    number of simultaneous users in the range of 30-50
  – Adaptivity: difficult to adapt to new grid services

Lessons learnt from accessing production Grid infrastructures
• Production Grids do not let you modify anything; you can only use their services (whether they are good or bad)
• Usually they provide basic grid services
• The user should construct higher level services
• However, to avoid being locked into one particular grid, the user-written services should be interoperable with the basic grid services provided by different grids

Motivations of creating gUSE
• We wanted
  – To overcome the limitations of the current P-GRADE portal
  – To create a set of high-level grid services that can be used with many different grids
• Therefore we have defined a new service-oriented grid layer that can be deployed
  – on a single machine
  – on a cluster
  – on different grid sites as Web Services
• Performance comparison
  – P-GRADE portal monolithic architecture: 100-200 jobs
  – WS-PGRADE/gUSE SOA architecture: 10,000 jobs

Monolithic architecture of P-GRADE portal
[Figure: a single Web container on a single computer. Components: WEB UI; WFS and file storage (special file formats); Workflow Engine; Grid Clients (built-in Grid API plus hack for non-supported APIs). Interactions: workflow save/read; workflow submit (special protocol); read workflow to run (special protocol).]

gUSE architecture
[Figure: three layers.]
• Graphical User Interface: WS-PGRADE (Gridsphere portlets)
• Autonomous Services: high level middleware service layer
  – Workflow storage
  – Workflow Engine
  – Broker/Meta-broker
  – gUSE information system
  – Submitters
  – File storage
  – Application repository
  – Logging
• gLite resources, Globus resources and Web services: low level middleware service layer

Generic service communication scheme in gUSE
[Figure labels: Function definitions (definition of client functions, definition of server functions); Client Interface; Client Implementation (concrete implementation of service calls); RPC service request; RPC server; Service Interface; Service Front-end (front-end implementation); Service Back-end (function implementations, service logic).]

Distributed SOA architecture
[Figure: numbered interactions 1-8 between the Web container (WEB UI), WF Storage (WFS, special file formats inside), Workflow Executor (WFE), File Storage (special file formats inside) and Grid Clients (Grid API). Labelled messages: workflow list and config descriptor; workflow submit; workflow descriptor; files needed for wf execution; job submit; job info; status back.]

Application developers’ view
• Users of gUSE can be either
  – grid application developers
  – or end-users.
• Application developers can develop sophisticated workflow applications in which
  – workflows can be embedded into each other at any depth
  – recursive workflows are allowed
• gUSE supports the following workflow types:
  – graphs (abstract workflows)
  – workflow templates
  – concrete workflows
  – workflow instances
• Parametric sweep nodes and normal nodes can be used in a mixed way.

Collaboration support between user communities
• Application developers can
  – publish into a workflow repository
    incomplete wf applications (projects) and wf parts (templates, graphs, concrete wf, wf instances) for the use of other developers
    ready-to-run wf applications for end-users
  – import workflows from the repository and continue working on them even if they were published by other developers
• End-users can
  – import ready-to-run wf applications from the repository
  – execute ready-to-run wf applications imported from the repository through a simplified portal interface that hides grid details
• The grid is exposed only to application developers.
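
The four workflow types and the embedding rule above can be pictured as a small data model. The following is only an illustrative sketch, not gUSE's internal representation: every class, field and method name (Graph, Template, ConcreteWorkflow, WorkflowInstance, locked_settings, nesting_depth, submit) is hypothetical and chosen for this example. Recursive workflows are not modelled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Graph:
    """Abstract workflow: topology only (node names and directed edges)."""
    nodes: list[str]
    edges: list[tuple[str, str]]


@dataclass
class Template:
    """A graph plus restrictions on what may be changed when it is reused."""
    graph: Graph
    locked_settings: dict[str, str] = field(default_factory=dict)


@dataclass
class ConcreteWorkflow:
    """A graph with per-node configuration; a node may embed another workflow."""
    graph: Graph
    config: dict[str, str] = field(default_factory=dict)
    embedded: dict[str, ConcreteWorkflow] = field(default_factory=dict)

    def nesting_depth(self) -> int:
        """1 for a flat workflow, plus the deepest chain of embedded workflows."""
        inner = (wf.nesting_depth() for wf in self.embedded.values())
        return 1 + max(inner, default=0)

    def submit(self) -> WorkflowInstance:
        """Submitting a concrete workflow creates a new workflow instance."""
        return WorkflowInstance(workflow=self, state="running")


@dataclass
class WorkflowInstance:
    """One execution of a concrete workflow: its running state and outputs."""
    workflow: ConcreteWorkflow
    state: str = "submitted"
    outputs: dict[str, str] = field(default_factory=dict)


# Usage: a two-node workflow whose second node embeds another workflow.
inner = ConcreteWorkflow(Graph(nodes=["a"], edges=[]))
outer = ConcreteWorkflow(
    Graph(nodes=["x", "y"], edges=[("x", "y")]),
    embedded={"y": inner},
)
instance = outer.submit()
print(outer.nesting_depth(), instance.state)
```

Keeping the graph separate from the configuration mirrors the idea that one abstract graph can be reused by several templates and concrete workflows.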

User activities
[Figure: the items a user works with and what each contains. Actions shown in the figure include New, Edit, Copy, Delete, Configure, Submit, Export, Import, Observe, Download and Suspend.]
• Graph: Jobs, Edges, Ports
• Template: Constraints, Comments, Form Generators
• Concrete Workflow: Algorithms, Resource references, Inputs
• Workflow Instance: Running state, Outputs
• Repository Item: Applications, Projects, Workflow parts (G, T, CW, WI)

The workflow concept of gUSE
• The workflow concept of gUSE is much more flexible than that of P-GRADE portal and many other workflow systems
• Its DAG topology is extended with
  – embedded WFs
  – recursive embedded WFs
  – parameter sweep nodes
  – conditional control mechanism
  – special workflow starting control mechanisms based on external events or periodic timing

Workflow Graph: Overview
[Figure: the Workflow Editor as it appears to the user. Labels: Input Port; Node: job, service call (WS, legacy), wf; Output Port.]

Configuring the Workflow: Overview
[Figure: an annotated workflow graph with file counts m, n and h and a generator factor K. Legend: Cross Product, Dot Product.]
• Determine the number of accepted files on external input ports
• A Generator job produces multiple data items on the output port within one job submission step
• Determine the dot or cross product relation of input ports to define the number of job submissions
• Determine a job to be a Collector by defining a Gathering Input Port.
  The job execution will be postponed until all input files for that port have arrived.

Animation: the number of generated output files
[Figure labels: m, n, h; m*n; h*K; S = max(m*n, h*K); m*n*h*K.]
• In case of a Generator job, the number of job submissions may differ from the number of files on the output ports
• In case of dot product, the job is submitted with input files having a common index number in each input port
• In case of cross product, an individual job submission is generated for each possible input file combination

An example CancerGrid workflow
• N = 20e-30e, M = 100 => very large number of executions and files
[Figure: workflow graph with two Generator jobs; edge labels x1, xN and NxM.]

Interoperability support
• gUSE supports:
  – grid interoperability
  – workflow interoperability
• gUSE can easily be connected to any known grid middleware. It is already connected to GT2, GT4, LCG-2, gLite and WS-based grid systems
• gUSE can also be connected to local systems like clusters or supercomputers
• It contains a built-in grid broker that can automatically distribute the jobs of a workflow into any of the connected grids
• It can use other grid brokers like the gLite broker or GridWay

Interoperability support: EDGeS
• EDGeS: Enabling Desktop Grids for e-Science
• Its aim is to integrate EGEE with Desktop Grids
• gUSE can provide transparent access to EGEE and DGs
[Figure labels: WS-PGRADE; Appl. Repository; gUSE; Service Grid (EGEE); University DG (LocalDEG); Volunteer DG (GlobalDEG); further LocalDEG instances.]

Family of user support products
[Figure: 1st generation: P-GRADE portal, P-GRADE/GEMLCA portal; 2nd generation: WS-PGRADE portal.]
• P-GRADE portal and gUSE/WS-PGRADE represent a family of user support products
• They support the whole range of user types:
  – Novice application developers: 1st generation P-GRADE portals
  – Advanced application developers: 2nd generation WS-PGRADE portal developer view
  – End-users without grid knowledge: 2nd generation WS-PGRADE portal end-user view

Family of P-GRADE products and their use
• P-GRADE
  – Parallelizing applications for clusters and grids
• P-GRADE portal
  – Creating simple workflow and parameter sweep applications for grids
• P-GRADE/GEMLCA portal
  – Creating workflow applications using legacy codes and community codes from repository
• gUSE/WS-PGRADE
  – Creating complex workflow and parameter sweep applications for clusters, service grids and desktop grids
  – Creating workflow applications using embedded workflows, legacy codes and community workflows from workflow repository

Conclusions / Future plans
• gUSE solves all the limitations of P-GRADE portal:
  – The implementation of gUSE is highly scalable and can be distributed on a cluster or even on different grid sites.
  – Stress tests show that it can simultaneously serve thousands of jobs
  – Its workflow concept is much more expressive than that of P-GRADE portal (recursive wf, generic PS support, etc.)
  – Its user interface, called WS-PGRADE, provides a graphical workflow editor that is much faster than the one in P-GRADE portal
  – gUSE provides a workflow repository that can be used by both end-users and application developers
  – gUSE solves grid interoperability at workflow level
    among service grids
    between service grids and desktop grids (see EDGeS project)

Roadmap of gUSE
• First version was demonstrated at SC’07
• First version will be released in March 2008 with full support for EGEE, GT2 and GT4
• Second version will be released in July 2008 with full support for desktop grids
• Third version, solving interoperability between EGEE and desktop grids, will be released by SC’08
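
Supplementary sketch for the parameter sweep slides: the dot and cross product rules determine how many job submissions a node generates from the file counts on its input ports. The code below illustrates that arithmetic only and is not gUSE code; all function names are hypothetical. The slides give S = max(m*n, h*K) for the dot product but do not say how a port with fewer files is paired once its files run out; reusing its last file is an assumption of this sketch.

```python
from itertools import product
from math import prod


def cross_product_submissions(file_counts):
    """Cross product: one job submission per combination of input files."""
    return prod(file_counts)


def dot_product_submissions(file_counts):
    """Dot product: files are paired by common index, so S = max of the counts."""
    return max(file_counts)


def cross_product_indices(file_counts):
    """File index used on each input port, for every cross product submission."""
    return list(product(*(range(n) for n in file_counts)))


def dot_product_indices(file_counts):
    """File index used on each input port, for every dot product submission."""
    # ASSUMPTION of this sketch: a port with fewer files reuses its last file
    # once its own indices are exhausted. File counts must be positive.
    steps = max(file_counts)
    return [tuple(min(i, n - 1) for n in file_counts) for i in range(steps)]


# Example values (hypothetical): two external input ports with m and n files,
# and a generator job producing h*K files.
m, n, h, K = 2, 3, 2, 4
external = cross_product_submissions([m, n])   # m*n
generated = h * K                              # h*K
print("dot product submissions:", dot_product_submissions([external, generated]))
print("cross product submissions:", cross_product_submissions([external, generated]))
```

With the CancerGrid figures in mind, the cross product count grows multiplicatively with every additional port, which is why the choice between dot and cross product dominates the number of executions and files.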