P-GRADE Portal Family
for e-Science Communities
Peter Kacsuk
MTA SZTAKI
Univ. of Westminster
www.lpds.sztaki.hu/pgportal
pgportal@lpds.sztaki.hu
1
The community aspects of e-science
• Web 2.0 is about creating and supporting web communities
• Grid is about creating virtual organizations where e-science communities
  – can share resources and
  – can collaborate
• A portal should support e-science communities in their collaborations and resource sharing
• Even more: it should provide simultaneous access to any accessible
  – Resources
  – Databases
  – Legacy applications
  – Workflows, etc.
  no matter which grid they are operated in.
2
Who are the members of an e-science community?
End-users (e-scientists)
• Execute the published applications with custom input parameters by creating application instances, using the published applications as templates
Grid Application Developers
• Develop grid applications with the portal
• Publish the completed applications for end-users
Grid Portal Developers
• Develop the portal core services (job submission, etc.)
• Develop higher level portal services (workflow management, etc.)
• Develop specialized/customized portal services (grid testing, rendering, etc.)
• Write technical, user and installation manuals
3
What does an individual e-scientist need?
• Access to a large set of ready-to-run scientific applications (services), stored in an application repository
• A portal to parameterize and run these applications by transparently accessing a large set of various IT resources from the e-science infrastructure
• The e-science infrastructure consists of grid systems and other resources:
  – Cluster based service grids (SGs) (EGEE, OSG, etc.)
  – Supercomputer based SGs (DEISA, TeraGrid)
  – Desktop grids (DGs) (BOINC, Condor, etc.)
  – Local clusters, supercomputers, clouds
4
What does an e-science community need?
• The same as an individual scientist, but in collaboration with other members of the community
• E-scientists and application developers share the application repository and the portal, which give access to the same e-science infrastructure (cluster and supercomputer based service grids, desktop grids, local clusters, supercomputers, clouds)
5
Collaboration between e-scientists
and application developers
(Diagram: e-scientists and application developers share the App. Repository and the Portal)
Application Developers
• Develop e-science applications via the portal in collaboration with e-scientists
• Publish the completed applications for end-users via an application repository
End-users (e-scientists)
• Specify the problem/application needs
• Execute the published applications via the portal with custom input parameters by creating application instances
6
Collaboration between application developers
• Application developers use the portal to develop complex applications (e.g. parameter sweep workflows) for the e-science infrastructure
• They publish templates, legacy code applications and half-made applications in the repository to be continued by other application developers
(Diagram: application developers work through the App. Repository and the Portal on top of the usual e-science infrastructure)
7
Collaboration between e-scientists
• Jointly run applications via the portal in the e-science infrastructure
• Jointly observe and control application execution via the portal
• Share parameterized applications via the repository
(Diagram: e-scientists share the App. Repository and the Portal on top of the e-science infrastructure)
8
Requirements for an e-science portal
from the e-scientists’ point of view
It should be able to
• Support a large number of e-scientists (~100) with good response time
• Enable storing and sharing of ready-to-run applications
• Enable parameterizing and running applications
• Enable observing and controlling application execution
• Provide a reliable application execution service even on top of unreliable infrastructures (such as grids)
• Provide specific user community views
• Enable access to the various components of an e-science infrastructure (grids, databases, clouds, local clusters, etc.)
• Support users' collaboration via sharing:
  – Applications (legacy, workflow, etc.)
  – Databases
9
Requirements for an e-science portal
from the app. developers’ point of view
It should be able to
• Support a large number of application developers (~100) with good response time
• Enable storing and sharing of half-made applications and application templates
• Provide graphical application development tools (e.g. workflow editor) to develop new applications
• Enable parameterizing and running applications
• Enable observing and controlling application execution
• Provide methods and an API to customize the portal interface towards specific user community needs by creating user-specific portlets
• Enable access to the various components of an e-science infrastructure (grids, databases, clouds, local clusters, etc.)
• Support application developers' collaboration via sharing:
  – Applications (legacy, workflow, etc.)
  – Databases
• Enable the integration/call of other services
10
Choice of an e-science portal
• Basic question for a community:
  – Buy a commercial portal? (Usually expensive)
  – Download an OSS portal? (Good choice, but will the OSS project survive for a long time?)
  – Develop an own portal? (Requires a long time and can become very costly)
• The best choice: download an OSS portal that has an active development community behind it
11
The role of the Grid portal developers’
community
Grid Portal Developers
• Jointly develop the portal core services (e.g. GridSphere, OGCE, Jetspeed-2, etc.)
• Jointly develop higher level portal services (workflow management, data management, etc.)
• Jointly develop specialized/customized portal services (grid testing, rendering, etc.)
• Never build a new portal from scratch; use the power of the community to create really good portals
• Unfortunately, we are not quite there:
  – Hundreds of e-science portals have been developed
  – Some of them are really good: Genius, Lead, etc.
  – However, not many of them are OSS (see the SourceForge list on the next slide)
  – Even fewer are actively maintained
  – Even fewer satisfy the generic requirements of a good e-science portal
12
Downloadable Grid portals from SourceForge

Portal                     Generic?     Since        Downloads   Activity
P-GRADE                    yes          2008-01-04   1468        Active
SDSC Gridport              yes          2003-10-01   1266        Finished 2004-01-15
Lunarc App.                yes          2006-10-05   783         Active
GRIDPortal for NorduGrid   yes          2006-07-07   231         Finished 2006-08-09
NCHC                       yes          2007-11-07   161         Active
Telemed                    app. spec.   2007-11-15   283         Active
13
P-GRADE portal family
• The goal of the P-GRADE portal family
– To meet all the requirements of end-users and
application developers listed above
– To provide a generic portal that can be used by
a large set of e-science communities
– To provide a community code based on which
the portal developers’ community can start to
develop specialized and customized portals
14
P-GRADE portal family (evolution 2008-2010)
• P-GRADE portal 2.4 – basic concept; open source from Jan. 2008
• GEMLCA (Grid Legacy Code Architecture) branch – NGS P-GRADE portal (GEMLCA, repository concept)
• Parameter sweep branch – P-GRADE portal 2.5
• P-GRADE portal 2.8 – current release
• P-GRADE portal 2.9 – under development
• WS-PGRADE Portal – beta release 3.3
• WS-PGRADE Portal – release 3.4
15
P-GRADE Portal in a nutshell
• General purpose, workflow-oriented Grid portal. Supports the development and execution of workflow-based Grid applications – a tool for Grid orchestration
• Based on GridSphere-2
  – Easy to expand with new portlets (e.g. application-specific portlets)
  – Easy to tailor to end-user needs
• Basic Grid services supported by the portal:

  Service                        EGEE grids (LCG-2/gLite)            Globus 2 grids
  Job submission                 Computing Element                   GRAM
  File storage                   Storage Element, LFC                GridFTP server
  Certificate management         MyProxy/VOMS (both)
  Information system             BDII                                MDS-2, MDS-4
  Brokering                      WMS (Workload Management System)    GTbroker
  Job monitoring                 Mercury (both)
  Workflow & job visualization   PROVE (both)
16
The typical user scenario – Part 1: development phase
• Start the workflow editor from the portal server
• Open & edit or develop a workflow
• Save the workflow and upload local files to the portal server
(Diagram actors: client, portal server, certificate servers, grid services)
17
The typical user scenario – Part 2: execution phase
• The portal server downloads proxy certificates from the certificate servers
• Submit the workflow: the portal server transfers files and submits jobs to the grid services
• The portal server monitors the jobs; the user visualizes job and workflow progress
• Download the (small) results
18
P-GRADE Portal architecture
• Client: Java Webstart workflow editor, web browser
• Frontend layer (P-GRADE Portal server): Tomcat, P-GRADE Portal portlets (JSR-168 GridSphere-2 portlets)
• Backend layer: DAGMan workflow manager, shell scripts, information system clients, CoG API & scripts, grid middleware clients
• Grid: gLite and Globus information systems, MyProxy server & VOMS, grid middleware services (gLite WMS, LFC, …; Globus GRAM, …)
19
P-GRADE portal in a nutshell
• Certificate and proxy management
• Grid and Grid resource management
• Graphical editor to define workflows and parametric studies
• Accessing resources in multiple VOs
• Built-in workflow manager and execution visualization
• GUI is customizable to certain applications
20
What is a P-GRADE Portal workflow?
• A directed acyclic graph where
  – Nodes represent jobs (batch programs to be executed on a computing element)
  – Ports represent input/output files the jobs expect/produce
  – Arcs represent file transfer operations and job dependencies
• Semantics of the workflow:
  – A job can be executed if all of its input files are available (see the sketch below)
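A minimal sketch (not P-GRADE code) of this execution semantics: jobs declare input and output files, and a job may be submitted as soon as all of its input files exist. Job and file names are purely illustrative.

    from collections import namedtuple

    Job = namedtuple("Job", ["name", "inputs", "outputs"])

    workflow = [
        Job("preprocess", inputs={"raw.dat"},    outputs={"clean.dat"}),
        Job("simulate",   inputs={"clean.dat"},  outputs={"result.dat"}),
        Job("visualize",  inputs={"result.dat"}, outputs={"plot.png"}),
    ]

    def execute(workflow, initial_files):
        """Fire each job once all of its input files are available."""
        available = set(initial_files)
        pending = list(workflow)
        while pending:
            ready = [j for j in pending if j.inputs <= available]
            if not ready:
                raise RuntimeError("deadlock: remaining jobs lack input files")
            for job in ready:                 # these could run in parallel
                print("submitting", job.name)
                available |= job.outputs      # outputs feed successor jobs
                pending.remove(job)

    execute(workflow, initial_files={"raw.dat"})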
21
Introducing three levels of parallelism
• Parallel execution inside a workflow node: each job can be a parallel program
• Parallel execution among workflow nodes: multiple jobs run in parallel
• Parameter study execution of the workflow: multiple instances of the same workflow run with different data files
22
Parameter sweep (PS) workflow execution based on the black box concept
• One PS port receives 3 instances of its input file, another PS port receives 4 instances
• 1 PS workflow execution = 4 x 3 normal executable workflows (e-workflows)
• This provides the 3rd level of parallelism, resulting in a very large demand for Grid resources (see the sketch below)
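A small sketch (with assumed file names) of the black-box expansion above: every combination of the PS-port input instances yields one independent e-workflow, so 4 x 3 input files produce 12 e-workflows.

    from itertools import product

    port_a_inputs = ["a_0.dat", "a_1.dat", "a_2.dat", "a_3.dat"]   # 4 instances
    port_b_inputs = ["b_0.dat", "b_1.dat", "b_2.dat"]              # 3 instances

    e_workflows = [{"port_a": a, "port_b": b}
                   for a, b in product(port_a_inputs, port_b_inputs)]

    print(len(e_workflows))   # 12 = 4 x 3 independent workflow instances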
23
Workflow parameter studies in P-GRADE Portal
• Generator component(s): generate the initial input data or cut it into smaller pieces; the pieces are placed as files in the same LFC catalog (e.g. /grid/gilda/sipos/myinputs)
• Core workflow: executed once per input piece as an e-workflow; the results are produced in the same catalog
• Collector component(s): aggregate the results
24
Generic structure of PS workflows and their execution
• 1st phase: executing all Generators in parallel – generator jobs generate the set of input files
• 2nd phase: executing all generated e-workflows in parallel – the core workflow is executed as a PS
• 3rd phase: executing all Collectors in parallel – collector jobs collect and process the set of output files
(A sketch of this three-phase pattern follows.)
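A rough sketch of the three-phase pattern (function bodies and names are illustrative placeholders, not P-GRADE code): generators fan the input out, the e-workflows run in parallel, and collectors aggregate the outputs; the phases themselves run one after the other.

    from concurrent.futures import ThreadPoolExecutor

    def generator(initial_input):            # phase 1: cut the input into pieces
        return [f"{initial_input}.part{i}" for i in range(4)]

    def core_workflow(piece):                # phase 2: one e-workflow per piece
        return f"result-of-{piece}"

    def collector(results):                  # phase 3: aggregate all results
        return sorted(results)

    with ThreadPoolExecutor() as pool:
        pieces  = generator("input.dat")
        results = list(pool.map(core_workflow, pieces))   # e-workflows in parallel
        summary = collector(results)

    print(summary)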
25
Integrating P-GRADE portal with DSpace repository
• Goal: make workflow applications available for the whole P-GRADE portal user community
• Solution: integrating the P-GRADE portal with a DSpace repository
• Functions:
  – Application developers can publish their ready-to-use and half-made applications in the repository
  – End-users can download, parameterize and execute the applications stored in the repository
• Advantages:
  – Application developers can collaborate with end-users
  – Members of a portal user community can share their WFs
  – Different portal user communities can share their WFs
(Diagram: several portals connect to one DSpace repository; application developers publish into it, end-users download from it)
26
Integrating P-GRADE portal with DSpace repository
(Screenshots: uploading a WF to DSpace and downloading a WF from DSpace through the portal)
27
Creating application specific portals
from the generic P-GRADE portal
• Creating an application-specific portal does not mean developing it from scratch
• P-GRADE is a generic portal that can quickly and easily be customized to any application type
• Advantages:
  – You do not have to develop the generic parts (WF editor, WF manager, job submission, monitoring, etc.)
  – You can concentrate on the application-specific part
  – Much shorter development time
28
Concept of creating application specific portals
• Client: end user with a web browser
• P-GRADE Portal server:
  – Custom User Interface (written in Java, JSP, JSTL)
  – Application Specific Module (ASM)
  – Services of P-GRADE Portal (workflow management, parameter study management, fault tolerance, …)
• Grid: EGEE and Globus Grid services (gLite WMS, LFC, …; Globus GRAM, …)
• Roles involved: end user, application developer, P-GRADE portal developer (see the next slide)
29
Roles of people in creating and using customized P-GRADE portals
Grid Application Developer (can be the same group as the portal developer)
• Develops a grid application with P-GRADE Portal
• Sends the application to the grid portal developer
Grid Portal Developer
• Creates new classes from the ASM for P-GRADE by changing the names of the classes
• Develops one or more Gridsphere portlets that fit the application I/O pattern and the end users' needs
• Connects the GUI to P-GRADE Portal using the programming API of the P-GRADE ASM
• Using the ASM, publishes the grid application and its GUI for end users
End User
• Executes the published application with custom input parameters by creating application instances, using the published application as a template
30
Application Specific P-GRADE portals
• Rendering portal by Univ. of Westminster
• OMNeT++ portal by SZTAKI
• Traffic simulation portal by Univ. of Westminster
31
Grid interoperation by P-GRADE portal
• P-GRADE Portal enables simultaneous usage of several production Grids at workflow level
• Currently connectable grids:
  – LCG-2 and gLite: EGEE, SEE-GRID, BalticGrid
  – GT-2: UK NGS, US OSG, US TeraGrid
• In progress:
  – Campus Grids with PBS or LSF
  – BOINC desktop Grids
  – ARC: NorduGrid
  – UNICORE: D-Grid
32
Simultaneous use of production Grids at workflow level
• The user's workflow runs in the P-GRADE Portal on the SZTAKI portal server
• Jobs of the same workflow are sent to the UK NGS (GT2 sites: Manchester, Leeds) and to EGEE-VOCE (gLite sites: Budapest, Athens, Brno)
• Both direct and brokered job submission are supported (gLite jobs can go through the WMS broker)
33
P-GRADE Portal references
• P-GRADE Portal services:
  – SEE-GRID, BalticGrid
  – Central European VO of EGEE
  – GILDA: training VO of EGEE
  – Many national Grids (UK, Ireland, Croatia, Turkey, Spain, Belgium, Malaysia, Kazakhstan, Switzerland, Australia, etc.)
  – US Open Science Grid, TeraGrid
  – Economy-Grid, Swiss BioGrid, Bio and Biomed EGEE VOs, MathGrid, etc.
• Portal services and account request:
  – portal.p-grade.hu/index.php?m=5&s=0
34
Community based business model for the sustainability of P-GRADE portal
• Some of the developments are related to EU projects. Examples:
  – PS feature: SEE-GRID-2
  – Integration with DSpace: SEE-GRID-SCI
  – Integration with BOINC: EDGeS, CancerGrid
• There is an open Portal Developer Alliance with the current active members:
  – Middle East Technical Univ. (Ankara, Turkey)
    • gLite file catalog management portlet
  – Univ. of Westminster (London, UK)
    • GEMLCA legacy code service extension
    • SRB integration (workflow and portlet)
    • OGSA-DAI integration (workflow and portlet)
    • Embedding Taverna, Kepler and Triana WFs into the P-GRADE workflow
• All these features are available in the UK NGS P-GRADE portal
35
Business model for the sustainability of P-GRADE portal
• Some of the developments are ordered by customer academic institutes:
  – Collaborative WF editor: Reading Univ. (UK)
  – Accounting portlet: MIMOS (Malaysia)
  – Separation of front-end and back-end: MIMOS
  – Shibboleth integration: ETH Zurich
  – ARC integration: ETH Zurich
• Benefits for the customer academic institutes:
  – Basically they like the portal but have some special needs that require extra development
  – Instead of developing a new portal from scratch (using many person-months), they pay only for the required small extension/modification of the portal
  – Solving their problem gets priority
  – They become experts in the internal structure of the portal and will be able to develop it further according to their needs
  – Joint publications
36
Main features of NGS P-GRADE portal
• Extends P-GRADE portal with:
  – GEMLCA legacy code architecture and repository
  – SRB file management
  – OGSA-DAI database access
  – WF level interoperation of grid data resources
  – Workflow interoperability support
• All these features are provided as a production service for the UK NGS
37
Interoperation of grid data resources
• A single workflow (jobs J1-J5) spans Grid 1 and Grid 2: the workflow engine moves data between file storage systems (FS1, FS2) and databases (DB1, DB2) located in the two grids
• J: job; FS: file storage system, e.g. SRB or SRM; DB: database management system (based on OGSA-DAI)
38
Workflow level interoperation of local, SRB, SRM and GridFTP file systems
• Jobs can run in various grids (UK NGS, EGEE, OSG) and can read and write files stored in different grid systems by different file management systems
• Example from the diagram: jobs take their inputs from NGS SRB, NGS GridFTP or local files, and write their outputs to EGEE SRM or NGS SRB
39
WF interoperability: P-GRADE workflow embedding Triana, Taverna, and Kepler workflows
• A P-GRADE workflow hosts the other workflows (Triana, Taverna, Kepler) as embedded nodes
• Available for UK NGS users as a production service
40
WS-PGRADE and gUSE
• New product in the P-GRADE portal family:
  – WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment)
• WS-PGRADE uses the high-level services of
  – gUSE (Grid User Support Environment) architecture
• Integrates and generalizes P-GRADE portal and NGS P-GRADE portal features
  – Advanced data-flows (PS features)
  – GEMLCA
  – Workflow repository
• gUSE features
  – Scalable architecture (can be installed on one or more servers)
  – Various grid submission services (GT2, GT4, LCG-2, gLite, BOINC, local)
  – Built-in inter-grid broker (seamless access to various types of resources)
• Comfort features
  – Different separated user views supported by the gUSE application repository
41
gUSE: service-oriented architecture
• Graphical User Interface: WS-PGRADE (Gridsphere portlets)
• Autonomous services (high level middleware service layer): workflow storage, workflow engine, meta-broker, gUSE information system, file storage, application repository, submitters, logging
• Resources (middleware service layer): local resources, service grid resources, desktop grid resources, web services, databases
42
Ergonomics
• Users can be grid application developers or end-users
• Application developers design sophisticated dataflow graphs
  – Embedding to any depth, recursive invocations, conditional structures, generators and collectors at any position
  – Publish applications in the repository at certain stages of the work: applications, projects, concrete workflows, templates, graphs
• End-users see the WS-PGRADE portal as a science gateway
  – List of ready-to-use applications in the gUSE repository
  – Import and execute applications without knowledge of programming, dataflow or grid
43
Dataflow programming concept for appl. developers
• Cross & dot product data pairing
  – Concept similar to Taverna
  – All-to-all vs. one-to-one pairing of data items (see the sketch below)
• Any component can be a generator, PS node or collector; no ordering restriction
• Conditional execution based on equality of data
• Nesting, recursion
(Diagram: an example graph whose port cardinalities combine to 7042 tasks)
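A tiny sketch of the two pairing modes mentioned above (the data items are illustrative): dot product pairs items one-to-one, cross product pairs them all-to-all, which is what makes the task count multiply across ports.

    from itertools import product

    port_x = ["x0", "x1", "x2"]
    port_y = ["y0", "y1", "y2"]

    dot_pairs   = list(zip(port_x, port_y))        # one-to-one: 3 tasks
    cross_pairs = list(product(port_x, port_y))    # all-to-all: 3 x 3 = 9 tasks

    print(len(dot_pairs), len(cross_pairs))        # 3 9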
44
Current users of gUSE
beta release
• CancerGrid project
– Predicting various properties of
molecules to find anti-cancer
leads
– Creating science gateway for
chemists
• EDGeS project (Enabling Desktop
Grids for e-Science)
– Integrating EGEE with BOINC
and XtremWeb technologies
– User interfaces and tools
• ProSim project
– In silico simulation of
intermolecular recognition
– JISC ENGAGE program (UK)
45
The CancerGrid infrastructure
• The portal and DesktopGrid server runs gUSE, which executes the workflows; the portal storage is used for browsing molecules
• Local jobs run on a local resource; DG jobs are passed through the 3G Bridge to a BOINC server as work units (WU 1 … WU N)
• DG clients from all partners run the BOINC client with GenWrapper for batch execution of the legacy application
• A separate molecule database server hosts the molecule database
46
CancerGrid workflow
• N = 30K molecules, M = 100; N x M = 3 million tasks --> about 0.5 year execution time
• Generator jobs fan the work out from single inputs (x1) to N and then N x M instances; the N x M stages are executed on the local desktop Grid
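A quick back-of-the-envelope check of the numbers above (the per-task figure is only an implication of the quoted totals, not a measured value): 30,000 x 100 = 3,000,000 tasks, and half a year of wall-clock time over that many tasks corresponds to roughly 5 seconds of sustained aggregate throughput per task across the desktop grid.

    tasks = 30_000 * 100                    # N x M
    half_year_s = 0.5 * 365 * 24 * 3600     # ~15.8 million seconds
    print(tasks)                            # 3000000
    print(round(half_year_s / tasks, 1))    # ~5.3 s of aggregate throughput per task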
47
gUSE in the ProSim project
Protein molecule simulation on the Grid
Grid Computing team of Univ. of Westminster
48
The user scenario
• Inputs: PDB file 1 (receptor) and PDB file 2 (ligand)
• Check (MolProbity)
• Energy minimization (Gromacs)
• Perform docking (AutoDock)
• Validate (MolProbity)
• Molecular dynamics (Gromacs)
(The sketch below chains these steps.)
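A schematic rendering of that chain as a linear pipeline (the function bodies are placeholders; in the real workflow each step is a grid job calling MolProbity, Gromacs or AutoDock, and whether both PDB files pass through every preparation step is an assumption here).

    def check(pdb):              return f"checked({pdb})"                 # MolProbity
    def minimize(pdb):           return f"minimized({pdb})"               # Gromacs
    def dock(receptor, ligand):  return f"docked({receptor},{ligand})"    # AutoDock
    def validate(complex_):      return f"validated({complex_})"          # MolProbity
    def dynamics(complex_):      return f"md({complex_})"                 # Gromacs

    receptor = minimize(check("receptor.pdb"))
    ligand   = minimize(check("ligand.pdb"))
    print(dynamics(validate(dock(receptor, ligand))))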
49
The workflow in gUSE
• Parameter sweeps in phases 3 and 4
• Executed on 5 different sites of the UK NGS
50
The ProSim visualiser
51
P-GRADE portal family summary

Feature                          P-GRADE             NGS P-GRADE                  WS-PGRADE
Scalability                      ++                  +                            +++
Repository                       DSpace/WF           Job & legacy code services   WF (own development)
Graphical workflow editor        +                   +                            +
Parameter sweep support          +                   -                            ++
Access to various grids          GT2, LCG-2, gLite   GT2, LCG-2, gLite, GT4       GT2, LCG-2, gLite, GT4, BOINC, campus
Access to clouds                 In progress         -                            In progress
Access to databases              -                   via OGSA-DAI                 SQL
Support for WF interoperability  -                   +                            In progress
52
Further information…
– Take a look at www.lpds.sztaki.hu/pgportal
(manuals, slide shows, installation procedure, etc.)
– Visit or request a training event!
(the list of events is on the P-GRADE Portal homepage)
• Lectures, demos, hands-on tutorials, application development support
– Get an account for the GILDA P-GRADE Portal:
www.portal.p-grade.hu/gilda
– Get an account for one of its production installations:
• Multi-grid portal (SZTAKI) for VOCE, SEEGRID, HUNGrid,
Biomed VO, Compchem VO, ASTRO VO
• NGS P-GRADE portal (Univ. of Westminster) for UK NGS
– Install a portal for your community:
• If you are the administrator of a Grid/VO, download the portal from
SourceForge (http://sourceforge.net/projects/pgportal/)
• SZTAKI is pleased to help you install a portal for your community!
53
Thank you for your attention!
Any questions?
www.portal.p-grade.hu
www.wspgrade.hu
54