P-GRADE Portal Family for e-Science Communities
Peter Kacsuk
MTA SZTAKI / Univ. of Westminster
www.lpds.sztaki.hu/pgportal
pgportal@lpds.sztaki.hu

The community aspects of e-science
• Web 2.0 is about creating and supporting web communities
• Grid is about creating virtual organizations in which e-science communities
  – can share resources and
  – can collaborate
• A portal should support e-science communities in their collaboration and resource sharing
• Even more: it should provide simultaneous access to any accessible
  – Resources
  – Databases
  – Legacy applications
  – Workflows, etc.
  no matter which grid they are operated in.

Who are the members of an e-science community?
End-users (e-scientists)
• Execute the published applications with custom input parameters by creating application instances, using the published applications as templates
Grid Application Developers
• Develop grid applications with the portal
• Publish the completed applications for end-users
Grid Portal Developers
• Develop the portal core services (job submission, etc.)
• Develop higher-level portal services (workflow management, etc.)
• Develop specialized/customized portal services (grid testing, rendering, etc.)
• Write technical, user and installation manuals

What does an individual e-scientist need?
• Access to a large set of ready-to-run scientific applications (services) in an application repository
• A portal to parameterize and run these applications, transparently accessing a large set of various IT resources of the e-science infrastructure
[Figure: the portal and the application repository sit on top of the e-science infrastructure: cluster-based service grids (SGs) such as EGEE and OSG, supercomputer-based SGs such as DEISA and TeraGrid, desktop grids (DGs) such as BOINC and Condor, local clusters, clouds and supercomputers]

What does an e-science community need?
• The same as an individual scientist, but in collaboration with the other members of the community
[Figure: the same infrastructure as on the previous slide, now shared by e-scientists and application developers through the portal and the application repository]

Collaboration between e-scientists and application developers
Application Developers
• Develop e-science applications via the portal in collaboration with e-scientists
• Publish the completed applications for end-users via an application repository
End-users (e-scientists)
• Specify the problem/application needs
• Execute the published applications via the portal with custom input parameters by creating application instances (a small sketch of this template/instance idea follows below)
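The template/instance idea above recurs throughout the talk (in the roles, ASM and repository slides). The sketch below is purely illustrative; the class names and the string-map parameter binding are our assumptions, not the P-GRADE data model. It only shows the intended division of labour: the developer publishes a fixed workflow with named input slots, and the end-user merely binds files to those slots.

```java
// Hypothetical sketch only: not the actual P-GRADE data model. A published
// application fixes the workflow; an instance binds the user's inputs to it.
import java.util.HashMap;
import java.util.Map;

/** A published application: a workflow plus the input slots it exposes. */
class PublishedApplication {
    final String workflowId;    // reference to the stored workflow
    final String[] inputSlots;  // names of the inputs the end-user must fill

    PublishedApplication(String workflowId, String... inputSlots) {
        this.workflowId = workflowId;
        this.inputSlots = inputSlots;
    }

    /** Create a runnable instance by binding concrete files to the slots. */
    Map<String, String> instantiate(Map<String, String> userInputs) {
        for (String slot : inputSlots) {
            if (!userInputs.containsKey(slot)) {
                throw new IllegalArgumentException("missing input: " + slot);
            }
        }
        return new HashMap<>(userInputs);  // the bound parameters of the instance
    }
}

public class TemplateDemo {
    public static void main(String[] args) {
        PublishedApplication app =
            new PublishedApplication("docking-wf-v1", "receptor.pdb", "ligand.pdb");
        Map<String, String> inputs = new HashMap<>();
        inputs.put("receptor.pdb", "/home/user/1abc.pdb");
        inputs.put("ligand.pdb", "/home/user/lig.pdb");
        System.out.println("instance parameters: " + app.instantiate(inputs));
    }
}
```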
Collaboration between application developers
• Application developers use the portal to develop complex applications (e.g. parameter sweep workflows) for the e-science infrastructure
• They publish templates, legacy code applications and half-made applications in the repository, to be continued by other application developers
[Figure: application developers working through the portal and the application repository on the e-science infrastructure shown earlier]

Collaboration between e-scientists
• Jointly run applications via the portal in the e-science infrastructure
• Jointly observe and control application execution via the portal
• Share parameterized applications via the repository

Requirements for an e-science portal from the e-scientists' point of view
It should be able to
• Support a large number of e-scientists (~100) with good response time
• Enable storing and sharing ready-to-run applications
• Enable parameterizing and running applications
• Enable observing and controlling application execution
• Provide a reliable application execution service even on top of unreliable infrastructures (such as grids)
• Provide specific user community views
• Enable access to the various components of an e-science infrastructure (grids, databases, clouds, local clusters, etc.)
• Support the users' collaboration via sharing:
  – Applications (legacy, workflow, etc.)
  – Databases

Requirements for an e-science portal from the application developers' point of view
It should be able to
• Support a large number of application developers (~100) with good response time
• Enable storing and sharing half-made applications and application templates
• Provide graphical application development tools (e.g. a workflow editor) to develop new applications
• Enable parameterizing and running applications
• Enable observing and controlling application execution
• Provide methods and an API to customize the portal interface to specific user community needs by creating user-specific portlets (a minimal portlet sketch follows below)
• Enable access to the various components of an e-science infrastructure (grids, databases, clouds, local clusters, etc.)
• Support the application developers' collaboration via sharing:
  – Applications (legacy, workflow, etc.)
  – Databases
• Enable the integration/calling of other services
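Since the portal is built on GridSphere-2 with JSR-168 portlets (see the architecture slide later), a "user-specific portlet" is in practice a small Java class. A minimal sketch, assuming only the standard javax.portlet API; the class name and the rendered application list are invented for illustration.

```java
// Minimal JSR-168 portlet sketch; only javax.portlet is assumed, the class
// name and rendered content are invented for illustration.
import java.io.IOException;
import java.io.PrintWriter;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

public class CommunityViewPortlet extends GenericPortlet {
    @Override
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        // A community-specific view would render its own application list here.
        out.println("<h3>My community's ready-to-run applications</h3>");
        out.println("<ul><li>docking-wf-v1</li><li>rendering-wf</li></ul>");
    }
}
```

Such a portlet would additionally be declared in the web application's portlet.xml deployment descriptor before the portlet container can host it.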
Choice of an e-science portal
• Basic question for a community:
  – Buy a commercial portal? (Usually expensive)
  – Download an OSS portal? (A good choice, but will the OSS project survive for a long time?)
  – Develop its own portal? (Takes a long time and can become very costly)
• The best choice: download an OSS portal that has an active development community behind it

The role of the Grid portal developers' community
Grid Portal Developers
• Jointly develop the portal core services (e.g. GridSphere, OGCE, Jetspeed-2, etc.)
• Jointly develop higher-level portal services (workflow management, data management, etc.)
• Jointly develop specialized/customized portal services (grid testing, rendering, etc.)
• Never build a new portal from scratch; use the power of the community to create really good portals
• Unfortunately, we are not quite there yet:
  – Hundreds of e-science portals have been developed
  – Some of them are really good: Genius, Lead, etc.
  – However, not many of them are OSS (see the SourceForge list on the next slide)
  – Even fewer are actively maintained
  – Even fewer satisfy the generic requirements of a good e-science portal

Downloadable Grid portals from SourceForge

Portal        | Generic?           | Since      | Downloads | Activity
P-GRADE       | yes                | 2008-01-04 | 1468      | Active
SDSC Gridport | yes                | 2003-10-01 | 1266      | Finished (2004-01-15)
Lunarc App.   | yes                | 2006-10-05 | 783       | Active
GRIDPortal    | yes (for NorduGrid)| 2006-07-07 | 231       | Finished (2006-08-09)
NCHC          | yes                | 2007-11-07 | 161       | Active
Telemed       | App. specific      | 2007-11-15 | 283       | Active

P-GRADE portal family
The goals of the P-GRADE portal family:
• To meet all the requirements of end-users and application developers listed above
• To provide a generic portal that can be used by a large set of e-science communities
• To provide a community code base on which the portal developers' community can start to develop specialized and customized portals

P-GRADE portal family (timeline 2008-2010)
[Figure: roadmap. P-GRADE portal 2.4: basic concept, open source from Jan. 2008; 2.5: parameter sweep; NGS P-GRADE portal: GEMLCA (Grid Legacy Code Architecture) and the repository concept; 2.8: current release; 2.9: under development. WS-PGRADE portal: 3.3 beta release, 3.4 release.]

P-GRADE Portal in a nutshell
• A general-purpose, workflow-oriented Grid portal
• Supports the development and execution of workflow-based Grid applications: a tool for Grid orchestration
• Based on GridSphere-2
  – Easy to extend with new portlets (e.g. application-specific portlets)
  – Easy to tailor to end-user needs
• Basic Grid services supported by the portal:

Service                      | EGEE grids (LCG-2/gLite)        | Globus 2 grids
Job submission               | Computing Element               | GRAM
File storage                 | Storage Element, LFC            | GridFTP server
Certificate management       | MyProxy/VOMS                    | MyProxy/VOMS
Information system           | BDII                            | MDS-2, MDS-4
Brokering                    | WMS (Workload Management System)| GTbroker
Job monitoring               | Mercury                         | Mercury
Workflow & job visualization | PROVE                           | PROVE

The typical user scenario, part 1 - development phase
[Figure: the user starts the editor from the portal server, opens and edits or develops a workflow, then saves the workflow and uploads local files to the portal server; the certificate servers and grid services are not yet involved.]

The typical user scenario, part 2 - execution phase
[Figure: the user submits the workflow on the portal server; the server downloads proxy certificates from the certificate servers, transfers files and submits jobs to the grid services, monitors the jobs, visualizes job and workflow progress, and downloads the (small) results.]

P-GRADE Portal architecture
[Figure: Client: Java Web Start workflow editor and web browser. Frontend layer on the portal server: Tomcat with the P-GRADE Portal portlets (JSR-168 GridSphere-2 portlets). Backend layer: DAGMan workflow manager, shell scripts, information system clients, CoG API & scripts, grid middleware clients. Grid: gLite and Globus information systems, MyProxy server & VOMS, grid middleware services (gLite WMS, LFC, …; Globus GRAM, …).]

P-GRADE portal in a nutshell
• Certificate and proxy management
• Grid and Grid resource management
• Graphical editor to define workflows and parametric studies
• Access to resources in multiple VOs
• Built-in workflow manager and execution visualization
• GUI customizable to specific applications

What is a P-GRADE Portal workflow?
• A directed acyclic graph (DAG) where
  – Nodes represent jobs (batch programs to be executed on a computing element)
  – Ports represent the input/output files the jobs expect/produce
  – Arcs represent file transfer operations and job dependencies
• Semantics of the workflow: a job can be executed as soon as all of its input files are available (a small sketch of this firing rule follows below)
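A minimal sketch of this firing rule, independent of the portal's actual implementation (which delegates to Condor DAGMan): a job runs as soon as every file on its input ports is available, and its outputs then enable the downstream jobs. All names are illustrative; a real workflow manager would submit ready jobs to computing elements in parallel rather than print them.

```java
import java.util.*;

/** Illustrative only: a workflow node with named input and output files. */
class Job {
    final String name;
    final Set<String> inputs;   // files this job needs
    final Set<String> outputs;  // files this job produces
    Job(String name, Set<String> inputs, Set<String> outputs) {
        this.name = name; this.inputs = inputs; this.outputs = outputs;
    }
}

public class DagDemo {
    /** Dataflow order: a job fires once all of its input files exist. */
    static void execute(List<Job> jobs, Set<String> availableFiles) {
        Set<Job> pending = new LinkedHashSet<>(jobs);
        boolean progress = true;
        while (!pending.isEmpty() && progress) {
            progress = false;
            for (Iterator<Job> it = pending.iterator(); it.hasNext(); ) {
                Job j = it.next();
                if (availableFiles.containsAll(j.inputs)) {
                    // Here a real manager would submit j to a computing element;
                    // all jobs that are ready at once could run in parallel.
                    System.out.println("running " + j.name);
                    availableFiles.addAll(j.outputs);  // outputs enable successors
                    it.remove();
                    progress = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Job> wf = List.of(
            new Job("A", Set.of("in.dat"), Set.of("a.out")),
            new Job("B", Set.of("a.out"), Set.of("b.out")),
            new Job("C", Set.of("a.out", "b.out"), Set.of("c.out")));
        execute(wf, new HashSet<>(Set.of("in.dat")));
    }
}
```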
Introducing three levels of parallelism
1. Parallel execution inside a workflow node: each job can be a parallel program
2. Parallel execution among workflow nodes: multiple jobs run in parallel
3. Parameter study execution of the workflow: multiple instances of the same workflow run with different data files

Parameter sweep (PS) workflow execution based on the black box concept
• In a PS workflow, a PS port holds several instances of an input file
• One PS workflow execution with a PS port holding 4 file instances and another holding 3 expands into 4 x 3 = 12 normal executable workflows (e-workflows)
• This provides the 3rd level of parallelism, resulting in a very large demand for Grid resources

Workflow parameter studies in P-GRADE Portal
• Generator component(s): generate the input set or cut the initial input data into smaller pieces
• Core workflow: executed once per generated input (the e-workflows)
• Collector component(s): aggregate the results
• The input files are kept in the same LFC catalog (e.g. /grid/gilda/sipos/myinputs), and the results are produced in the same catalog

Generic structure of PS workflows and their execution
• 1st phase: execute all Generator jobs in parallel to generate the set of input files
• 2nd phase: execute all generated e-workflows (instances of the core workflow) in parallel
• 3rd phase: execute all Collector jobs in parallel to collect and process the set of output files
(A small sketch of the cross-product expansion behind the 2nd phase follows below.)
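The black-box expansion is just a cross product over the PS ports. A small illustrative sketch under our own naming: each PS port contributes its file instances, and every combination of choices becomes one e-workflow.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: expand PS ports into e-workflows (cross product). */
public class PsExpansion {
    /** Each inner list holds the file instances of one PS port. */
    static List<List<String>> expand(List<List<String>> psPorts) {
        List<List<String>> combos = new ArrayList<>();
        combos.add(new ArrayList<>());          // start with one empty binding
        for (List<String> port : psPorts) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> combo : combos) {
                for (String file : port) {      // one e-workflow per file choice
                    List<String> extended = new ArrayList<>(combo);
                    extended.add(file);
                    next.add(extended);
                }
            }
            combos = next;
        }
        return combos;                          // |port1| x |port2| x ... bindings
    }

    public static void main(String[] args) {
        List<List<String>> ports = List.of(
            List.of("a1", "a2", "a3", "a4"),    // PS port with 4 file instances
            List.of("b1", "b2", "b3"));         // PS port with 3 file instances
        System.out.println(expand(ports).size() + " e-workflows");  // prints 12
    }
}
```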
Integrating P-GRADE portal with DSpace repository
• Goal: to make workflow applications available to the whole P-GRADE portal user community
• Solution: integrating the P-GRADE portal with a DSpace repository
• Functions:
  – Application developers can publish their ready-to-use and half-made applications in the repository
  – End-users can download, parameterize and execute the applications stored in the repository
• Advantages:
  – Application developers can collaborate with end-users
  – Members of a portal user community can share their workflows
  – Different portal user communities can share their workflows

Integrating P-GRADE portal with DSpace repository
[Figure: several portals connected to one DSpace repository; an application developer uploads a workflow to DSpace, and end-users download workflows from DSpace into their own portals.]

Creating application-specific portals from the generic P-GRADE portal
• Creating an application-specific portal does not mean developing it from scratch
• P-GRADE is a generic portal that can quickly and easily be customized to any application type
• Advantages:
  – You do not have to develop the generic parts (workflow editor, workflow manager, job submission, monitoring, etc.)
  – You can concentrate on the application-specific part
  – Much shorter development time

Concept of creating application-specific portals
[Figure: the end-user reaches a Custom User Interface (written in Java, JSP, JSTL) through a web browser; the application developer and the P-GRADE portal developer build the Application Specific Module on top of the services of the P-GRADE Portal (workflow management, parameter study management, fault tolerance, …), which in turn uses the EGEE and Globus Grid services (gLite WMS, LFC, …; Globus GRAM, …).]

Roles of people in creating and using customized P-GRADE portals
Grid Application Developer (can be the same group as the portal developer)
• Develops a grid application with P-GRADE Portal
• Sends the application to the grid portal developer
Grid Portal Developer
• Creates new classes from the ASM (Application Specific Module) for P-GRADE by changing the names of the classes
• Develops one or more GridSphere portlets that fit the application's I/O pattern and the end-users' needs
• Connects the GUI to P-GRADE Portal using the programming API of the P-GRADE ASM
• Publishes the grid application and its GUI for end-users using the ASM
End User
• Executes the published application with custom input parameters by creating application instances, using the published application as a template
(A hypothetical sketch of such an application-specific portlet follows below.)
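The ASM programming API itself is not shown on these slides, so the following sketch is hypothetical from end to end: only javax.portlet is a real API, and the WorkflowService interface merely stands in for whatever the P-GRADE ASM exposes. It illustrates the shape of the portal developer's task: an application-specific form whose action handler creates and submits an instance of the published application.

```java
// Hypothetical sketch: only javax.portlet is a real API here. WorkflowService
// and its methods are placeholders for the (unshown) P-GRADE ASM API.
import java.io.IOException;
import javax.portlet.ActionRequest;
import javax.portlet.ActionResponse;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

interface WorkflowService {                     // placeholder for the ASM API
    String createInstance(String publishedAppId, String user);
    void setInput(String instanceId, String port, String file);
    void submit(String instanceId);
}

public class DockingPortlet extends GenericPortlet {
    // Stub so the sketch is self-contained; the portal would supply the real ASM.
    private final WorkflowService asm = new WorkflowService() {
        public String createInstance(String app, String user) { return app + "@" + user; }
        public void setInput(String id, String port, String file) { }
        public void submit(String id) { System.out.println("submitted " + id); }
    };

    @Override
    public void processAction(ActionRequest request, ActionResponse response)
            throws PortletException, IOException {
        // The end-user filled the application-specific form: bind inputs and run.
        String id = asm.createInstance("docking-wf-v1", request.getRemoteUser());
        asm.setInput(id, "receptor.pdb", request.getParameter("receptor"));
        asm.setInput(id, "ligand.pdb", request.getParameter("ligand"));
        asm.submit(id);
    }

    @Override
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        response.setContentType("text/html");
        response.getWriter().println(
            "<form method='post' action='" + response.createActionURL() + "'>"
            + "Receptor: <input name='receptor'/> Ligand: <input name='ligand'/>"
            + "<input type='submit' value='Run docking'/></form>");
    }
}
```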
Application-specific P-GRADE portals
• Rendering portal by the Univ. of Westminster
• OMNeT++ portal by SZTAKI
• Traffic simulation portal by the Univ. of Westminster

Grid interoperation by P-GRADE portal
• P-GRADE Portal enables the simultaneous usage of several production Grids at the workflow level
• Currently connectable grids:
  – LCG-2 and gLite: EGEE, SEE-GRID, BalticGrid
  – GT-2: UK NGS, US OSG, US TeraGrid
• In progress:
  – Campus Grids with PBS or LSF
  – BOINC desktop Grids
  – ARC: NorduGrid
  – UNICORE: D-Grid

Simultaneous use of production Grids at workflow level
• Supports both direct and brokered job submission
[Figure: a user's workflow on the SZTAKI portal server sends jobs directly to UK NGS GT2 sites (Manchester, Leeds) and, through the gLite WMS broker, to EGEE-VOCE sites (Budapest, Athens, Brno).]

P-GRADE Portal references
P-GRADE Portal services:
• SEE-GRID, BalticGrid
• Central European VO of EGEE
• GILDA: training VO of EGEE
• Many national Grids (UK, Ireland, Croatia, Turkey, Spain, Belgium, Malaysia, Kazakhstan, Switzerland, Australia, etc.)
• US Open Science Grid, TeraGrid
• Economy-Grid, Swiss BioGrid, Bio and Biomed EGEE VOs, MathGrid, etc.
Portal services and account request: portal.p-grade.hu/index.php?m=5&s=0

Community-based business model for the sustainability of P-GRADE portal
• Some of the developments are related to EU projects. Examples:
  – PS feature: SEE-GRID-2
  – Integration with DSpace: SEE-GRID-SCI
  – Integration with BOINC: EDGeS, CancerGrid
• There is an open Portal Developer Alliance with the current active members:
  – Middle East Technical Univ. (Ankara, Turkey)
    • gLite file catalog management portlet
  – Univ. of Westminster (London, UK)
    • GEMLCA legacy code service extension
    • SRB integration (workflow and portlet)
    • OGSA-DAI integration (workflow and portlet)
    • Embedding Taverna, Kepler and Triana WFs into the P-GRADE workflow
• All these features are available in the UK NGS P-GRADE portal

Business model for the sustainability of P-GRADE portal
• Some of the developments are ordered by customer academic institutes:
  – Collaborative WF editor: Reading Univ. (UK)
  – Accounting portlet: MIMOS (Malaysia)
  – Separation of front-end and back-end: MIMOS
  – Shibboleth integration: ETH Zurich
  – ARC integration: ETH Zurich
• Benefits for the customer academic institutes:
  – They basically like the portal but have some special needs that require extra development
  – Instead of developing a new portal from scratch (many person-months), they pay only for the small extension/modification of the portal they need
  – Solving their problem gets priority
  – They become experts in the internal structure of the portal and will be able to develop it further according to their needs
  – Joint publications

Main features of NGS P-GRADE portal
• Extends P-GRADE portal with:
  – GEMLCA legacy code architecture and repository
  – SRB file management
  – OGSA-DAI database access
  – Workflow-level interoperation of grid data resources
  – Workflow interoperability support
• All these features are provided as a production service for the UK NGS

Interoperation of grid data resources
[Figure: a workflow engine drives jobs J1-J5 spread over Grid 1 and Grid 2, reading and writing the file storage systems FS1, FS2 and the databases DB1, DB2 across the two grids.]
Legend: J = job; FS = file storage system, e.g. SRB or SRM; DB = database management system (based on OGSA-DAI)

Workflow-level interoperation of local, SRB, SRM and GridFTP file systems
• Jobs can run in various grids (UK NGS, EGEE, OSG) and can read and write files stored in different grid systems managed by different file management systems (local files, NGS SRB, NGS GridFTP, EGEE SRM)
(A small sketch of such protocol-independent file access follows below.)
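One way to read this slide: each job sees storage through a protocol-independent staging interface, so the same workflow can mix local, SRB, SRM and GridFTP files. The sketch below is our illustration of that idea, not portal code; LocalStorage is runnable as-is, while the SRB placeholder only marks where a real data-management client would be called.

```java
// Our illustration only, not portal code: one staging interface hides where
// a file lives (local disk, SRB, SRM, GridFTP), so any job can use any source.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

interface FileResource {
    Path stageIn(String name, Path workDir) throws IOException;  // before the job
    void stageOut(Path file, String name) throws IOException;    // after the job
}

/** Runnable as-is: "storage" is simply a local directory. */
class LocalStorage implements FileResource {
    private final Path root;
    LocalStorage(Path root) { this.root = root; }
    public Path stageIn(String name, Path workDir) throws IOException {
        return Files.copy(root.resolve(name), workDir.resolve(name),
                          StandardCopyOption.REPLACE_EXISTING);
    }
    public void stageOut(Path file, String name) throws IOException {
        Files.copy(file, root.resolve(name), StandardCopyOption.REPLACE_EXISTING);
    }
}

/** Placeholder: a real version would call an SRB (or SRM/GridFTP) client here. */
class SrbStorage implements FileResource {
    public Path stageIn(String name, Path workDir) {
        throw new UnsupportedOperationException("SRB client call goes here");
    }
    public void stageOut(Path file, String name) {
        throw new UnsupportedOperationException("SRB client call goes here");
    }
}

public class StagingDemo {
    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("storage");
        Path work = Files.createTempDirectory("job");
        Files.write(root.resolve("in.dat"), "data".getBytes());
        FileResource src = new LocalStorage(root);  // could equally be SrbStorage
        System.out.println("staged in: " + src.stageIn("in.dat", work));
    }
}
```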
WF interoperability: P-GRADE workflow embedding Triana, Taverna and Kepler workflows
• A P-GRADE workflow can host Triana, Taverna and Kepler workflows as nodes
• Available for UK NGS users as a production service

WS-PGRADE and gUSE
• New product in the P-GRADE portal family: WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment)
• WS-PGRADE uses the high-level services of the gUSE (Grid User Support Environment) architecture
• Integrates and generalizes the P-GRADE portal and NGS P-GRADE portal features:
  – Advanced data-flows (PS features)
  – GEMLCA
  – Workflow repository
• gUSE features:
  – Scalable architecture (can be installed on one or more servers)
  – Various grid submission services (GT2, GT4, LCG-2, gLite, BOINC, local)
  – Built-in inter-grid broker (seamless access to various types of resources)
• Comfort features:
  – Different, separated user views supported by the gUSE application repository

gUSE: service-oriented architecture
[Figure: three layers. Graphical user interface: WS-PGRADE (GridSphere portlets). Autonomous services (high-level middleware service layer): workflow storage, workflow engine, meta-broker, gUSE information system, file storage, application repository, logging and a set of submitters. Resources (middleware service layer): local resources, service grid resources, desktop grid resources, web services, databases.]

Ergonomics
• Users can be grid application developers or end-users
• Application developers design sophisticated dataflow graphs
  – Embedding to any depth, recursive invocations, conditional structures, generators and collectors at any position
  – Publish applications in the repository at certain stages of the work:
    • Applications
    • Projects
    • Concrete workflows
    • Templates
    • Graphs
• End-users see the WS-PGRADE portal as a science gateway
  – List of ready-to-use applications in the gUSE repository
  – Import and execute applications without knowledge of programming, dataflow or grids

Dataflow programming concept for application developers
• Cross and dot product data-pairing (a concept similar to Taverna's)
  – All-to-all vs. one-to-one pairing of data items (a small pairing sketch follows below)
• Any component can be a generator, PS node or collector; there is no ordering restriction
• Conditional execution based on equality of data
• Nesting and recursion
[Figure: an example dataflow graph whose port cardinalities (1, 20, 40, 50, 1000, 5000, …) combine into 7042 tasks.]
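The two pairing modes are easy to pin down in code. A small sketch under our own naming; the semantics follows the slide (and Taverna's cross/dot products): cross pairing yields |a| x |b| task inputs, dot pairing yields min(|a|, |b|).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the two data-pairing modes (names are ours). */
public class Pairing {
    /** Cross product: all-to-all pairing, |a| x |b| task inputs. */
    static List<String[]> cross(List<String> a, List<String> b) {
        List<String[]> pairs = new ArrayList<>();
        for (String x : a)
            for (String y : b)
                pairs.add(new String[] { x, y });
        return pairs;
    }

    /** Dot product: one-to-one pairing, min(|a|, |b|) task inputs. */
    static List<String[]> dot(List<String> a, List<String> b) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < Math.min(a.size(), b.size()); i++)
            pairs.add(new String[] { a.get(i), b.get(i) });
        return pairs;
    }

    public static void main(String[] args) {
        List<String> a = List.of("a1", "a2", "a3");
        List<String> b = List.of("b1", "b2");
        System.out.println("cross: " + cross(a, b).size() + " tasks"); // 6
        System.out.println("dot:   " + dot(a, b).size() + " tasks");   // 2
    }
}
```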
Current users of the gUSE beta release
• CancerGrid project
  – Predicting various properties of molecules to find anti-cancer leads
  – Creating a science gateway for chemists
• EDGeS project (Enabling Desktop Grids for e-Science)
  – Integrating EGEE with the BOINC and XtremWeb technologies
  – User interfaces and tools
• ProSim project
  – In-silico simulation of intermolecular recognition
  – JISC ENGAGE program (UK)

The CancerGrid infrastructure
[Figure: the portal (gUSE) executes workflows; local jobs go to a local resource, while desktop grid jobs go through the 3G Bridge to a BOINC server, whose work units are processed by the partners' desktop grid clients (BOINC client with GenWrapper for batch execution of the legacy application); a molecule database server feeds the portal, which lets users browse molecules.]

CancerGrid workflow
[Figure: generator jobs fan the N = 30K molecules out into N x M = 3 million tasks (M = 100 property calculations per molecule), executed on the local desktop grid; about 0.5 year of total execution time.]

gUSE in the ProSim project
• Protein molecule simulation on the Grid
• Grid Computing team of the Univ. of Westminster

The user scenario
• Inputs: PDB file 1 (receptor) and PDB file 2 (ligand)
• Check (MolProbity)
• Energy minimization (GROMACS)
• Perform docking (AutoDock)
• Validate (MolProbity)
• Molecular dynamics (GROMACS)

The workflow in gUSE
• Parameter sweeps in phases 3 and 4
• Executed on 5 different sites of the UK NGS

The ProSim visualiser
[Figure: screenshot of the ProSim visualiser.]

P-GRADE portal family summary

Feature                    | P-GRADE           | NGS P-GRADE                | WS-PGRADE
Scalability                | ++                | +                          | +++
Repository                 | DSpace/WF         | Job & legacy code services | WF (own development)
Graphical workflow editor  | +                 | +                          | +
Parameter sweep support    | +                 | -                          | ++
Access to various grids    | GT2, LCG-2, gLite | GT2, LCG-2, gLite, GT4     | GT2, LCG-2, gLite, GT4, BOINC, campus
Access to clouds           | In progress       | -                          | In progress
Access to databases        | -                 | via OGSA-DAI               | SQL
Support for WF interop.    | -                 | +                          | In progress

Further information
• Take a look at www.lpds.sztaki.hu/pgportal (manuals, slide shows, installation procedure, etc.)
• Visit or request a training event! (The list of events is on the P-GRADE Portal homepage.)
  – Lectures, demos, hands-on tutorials, application development support
• Get an account for the GILDA P-GRADE Portal: www.portal.p-grade.hu/gilda
• Get an account for one of its production installations:
  – Multi-grid portal (SZTAKI) for VOCE, SEE-GRID, HUNGrid, Biomed VO, Compchem VO, ASTRO VO
  – NGS P-GRADE portal (Univ. of Westminster) for the UK NGS
• Install a portal for your community:
  – If you are the administrator of a Grid/VO, download the portal from SourceForge (http://sourceforge.net/projects/pgportal/)
  – SZTAKI is pleased to help you install a portal for your community!

Thank you for your attention! Any questions?
www.portal.p-grade.hu
www.wspgrade.hu