gUse White Paper
gUse is an easily usable, highly flexible and scalable cooperative Grid application
development and enactment infrastructure. It connects developers and end users of Grid
applications with computational resources of different technologies, enabling the design,
integration and submission of sophisticated (layered and parameter-sweep enabled)
workflows that run in parallel across classic Grids (GT2, GT4, LCG, gLite, ...), individual
Web services, clusters, and even the local resources of gUse itself.
gUse is the successor of the successful parameter-sweep enabled P-GRADE portal,
inheriting its merits:
user friendliness, by the “Learn once – use forever and everywhere” principle;
expandability, by the selected portlet technology, enabling user-defined custom
services to be plugged in;
simplicity, by hiding the different technologies of the backend Grid middleware
and resources;
graphical user interface, facilitating a natural overview of the objects used in the design
and in the workflow submission process;
flexibility, to add new middleware (Grids) to its backend;
comfort, by the implemented standard services (certification management, Grid
information query, Grid resource connection management, and Grid file and
catalogue management);
connectivity, by accepting workflows whose parts run in parallel in different
Grids, even ones built on different technologies.
However, gUse is much more than a simple portal:
1. gUse no longer has the former “unintelligent” interface between a simple Condor
DAG interpreter and the job submission systems of the backend; instead it has its
own workflow enactor capable of, among other things, brokering (scheduling and
distributing jobs among the available and suitable resources).
2. This workflow enactor enables the abstract, hierarchical definition and the
embedded invocation of workflows; even the recursive embedding of workflows
is possible.
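The hierarchical embedding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not gUse code: class and job names are invented, and a real enactor would submit jobs to Grid resources rather than collect their names.

```python
# Minimal sketch of hierarchical workflow embedding: a workflow node is
# either a concrete job or an embedded sub-workflow, and enactment
# recurses into embedded workflows. All names here are illustrative.

class Job:
    def __init__(self, name):
        self.name = name

    def enact(self):
        # A real enactor would submit the job to a Grid resource here.
        return [self.name]

class Workflow:
    def __init__(self, name, nodes):
        self.name = name
        self.nodes = nodes  # Jobs and/or embedded Workflows, in order

    def enact(self):
        # Recursively enact every node; embedded workflows enact themselves.
        executed = []
        for node in self.nodes:
            executed.extend(node.enact())
        return executed

# A workflow embedding another workflow:
inner = Workflow("inner", [Job("preprocess"), Job("simulate")])
outer = Workflow("outer", [Job("stage_in"), inner, Job("stage_out")])
print(outer.enact())  # → ['stage_in', 'preprocess', 'simulate', 'stage_out']
```

Because `Workflow.enact` calls `enact` on its own nodes, a workflow can embed another workflow to any depth, which is the essence of the recursive embedding mentioned above.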
3. The workflow enactor makes it possible for workflows to be started not
only directly by the user but also by an external event (described by a WSDL
protocol) issued by an automatic foreign system, or by a crontab-like schedule
table set by the user.
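The crontab-like triggering can be illustrated with a small matcher: a schedule entry fires when every field matches the current time. The entry format and field order below are an assumption borrowed from classic crontab, not gUse's actual schedule-table format.

```python
# Hypothetical sketch of crontab-like timed workflow triggering.
# An entry is (minute, hour, day_of_month, month); '*' matches anything,
# otherwise a comma-separated list of integers is accepted.
from datetime import datetime

def field_matches(field, value):
    if field == "*":
        return True
    return value in {int(v) for v in field.split(",")}

def schedule_matches(entry, when):
    minute, hour, dom, month = entry
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month))

# Trigger workflow submission at 03:30 on the 1st of every month:
entry = ("30", "3", "1", "*")
print(schedule_matches(entry, datetime(2008, 5, 1, 3, 30)))  # True
print(schedule_matches(entry, datetime(2008, 5, 2, 3, 30)))  # False
```

A scheduler loop would evaluate each stored entry once per minute and submit the associated workflow whenever the match succeeds.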
4. The versatility and scalability of the managed backend systems (middleware) are
facilitated by an array of built-in submitters. All submitters have a common
standard WS interface toward the workflow enactor. In addition to the built-in
standard backend systems (GT2, GT4, LCG2, gLite, Condor, GEMLCA, Axis, ...),
the administrator can add his or her own submitter(s) to the infrastructure at run
time. These submitters may be user defined – for example to exploit a local
resource – or may be standard ones balancing the load of existing submitters.
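The common-interface idea behind the submitters can be sketched as a registry of interchangeable implementations. The class and method names are assumptions for illustration; the real submitters are web services behind a WS interface, not Python objects.

```python
# Sketch of the "common submitter interface": every submitter exposes the
# same operations toward the enactor, and new submitters can be plugged
# in at run time. Names are illustrative, not gUse's actual API.

class Submitter:
    middleware = "abstract"

    def submit(self, job):
        raise NotImplementedError

class GT2Submitter(Submitter):
    middleware = "GT2"

    def submit(self, job):
        # A real submitter would translate the job to GT2's job language
        # and talk to the remote gatekeeper.
        return f"GT2 submitted {job}"

class LocalSubmitter(Submitter):
    middleware = "local"

    def submit(self, job):
        return f"ran {job} locally"

class SubmitterRegistry:
    def __init__(self):
        self._by_middleware = {}

    def register(self, submitter):
        # Run-time plug-in: the administrator can add submitters on the fly.
        self._by_middleware[submitter.middleware] = submitter

    def submit(self, middleware, job):
        return self._by_middleware[middleware].submit(job)

registry = SubmitterRegistry()
registry.register(GT2Submitter())
registry.register(LocalSubmitter())   # added later, without a restart
print(registry.submit("local", "job42"))  # → ran job42 locally
```

Because the enactor only ever talks to the common interface, adding a new middleware means adding one registry entry, which mirrors the run-time pluggability described above.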
5. The own workflow enactor facilitates a much more flexible parameter-sweep
solution than that of the P-GRADE portal. Instead of the whole workflow, a single
job is the unit whose submission is multiplied. As a consequence, within a
workflow the number of submissions of individual jobs (or of a group of
connected jobs) can be determined independently of the submissions of other jobs
(or groups), so the computational load can be reduced to the needed minimum.
A series of instruments supports this basically data-driven execution of
workflows: input file containers, threshold numbers associated to the input ports
of jobs, cross-product / dot-product relation switches connecting input ports,
generator / collector jobs (enabling the writing / reading of several files during
one job submission), and user-programmed logic control over the input ports of
jobs (conditionally excluding them from workflow elaboration).
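The cross-product and dot-product port relations mentioned above can be illustrated with plain Python: a cross product pairs every file of one port with every file of the other, while a dot product pairs them index by index. The file names are invented for the example.

```python
# Cross product vs. dot product of two input ports: the relation switch
# decides how many times the consuming job is submitted.
from itertools import product

port_a = ["a0.dat", "a1.dat"]
port_b = ["b0.dat", "b1.dat", "b2.dat"]

cross = list(product(port_a, port_b))  # 2 x 3 = 6 job submissions
dot   = list(zip(port_a, port_b))      # min(2, 3) = 2 job submissions

print(len(cross))  # 6
print(dot)         # [('a0.dat', 'b0.dat'), ('a1.dat', 'b1.dat')]
```

Since each job (or connected group) carries its own relation switches, the submission count of one job is decoupled from that of the others, which is exactly the flexibility point 5 claims over workflow-level multiplication.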
6. The palette of job-level activities has been expanded beyond the already
mentioned invocation of embedded workflows by the invocation of Web services
and of GEMLCA-interfaced legacy systems. Even the group of classical binary
job executables has been extended by the possibility of calling the Java JVMs
and MPI interpreters of the target systems.
7. The fault-tolerant, stable, scalable performance of gUse is supported by its
distributed modular structure. Just as in the case of the submitters, the whole
system is a run-time pluggable – and, in the case of bottlenecks, multipliable –
set of independent services that speak with each other via a standard XML
protocol. It is therefore up to the administrator to put the whole system on a
common host or to distribute it among several machines.
8. In addition to the common input files (uploaded from the local machine of the
portal user or fetched from a remote Grid environment), a job can be fed by a
direct value or by the result set of an online SQL query against a remote database.
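The SQL-fed input port can be sketched with the standard library, using an in-memory SQLite database as a stand-in for the remote database. The table, column and values are invented for the example; each row of the result set plays the role of one input value, much as each file in an input file container would.

```python
# Sketch of feeding a job input port from an SQL query result set.
# An in-memory SQLite database stands in for the remote database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE params (temperature REAL)")
conn.executemany("INSERT INTO params VALUES (?)",
                 [(280.0,), (290.0,), (300.0,)])

# Each row becomes one input value for the job, so this query would
# drive three parameter-sweep submissions of the consuming job.
rows = conn.execute("SELECT temperature FROM params ORDER BY temperature")
inputs = [row[0] for row in rows]
print(inputs)  # [280.0, 290.0, 300.0]
```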
9. The workflow developer is supported by a log book storing each change of the
workflow. The logging is subdivided into job histories.
10. Perhaps the most important novelty of gUse is its emphasis on supporting the
different needs of developers and of end users.
11. Experienced developers receive additional support to make their code
reusable through new concepts that express the hierarchical grouping of code:
Graphs define the common topology of a group of workflows by
identifying jobs and job input/output connections.
Concrete workflows define the semantics of workflows over a given Graph.
Templates inherit the part of the semantics that must be preserved in a different
concrete workflow. Additionally, templates are a natural place for the
documentation and other kinds of verbal end-user support (in the form of
user-digestible labels).
An Application is the end product of the developer. It is a packed, directly
applicable, whole solution which gives the end user at most limited freedom to
modify it (input file associations, command-line parameters of jobs, resource
selection). An Application, which may contain several workflows – as the
embedded workflows to be called must be included as well – is exported and
stored in a common repository accessible to all users.
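The export step can be pictured as packing the application's constituent files, embedded workflows included, into one archive for the repository. The file names and contents below are invented; the real export format of gUse is not specified here.

```python
# Sketch of "exporting" an application: everything it contains (including
# embedded workflows and template labels) is packed into a single archive
# that the repository stores. File names are illustrative.
import io
import zipfile

def export_application(files):
    # files: mapping of archive path -> file content
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, content in files.items():
            zf.writestr(path, content)
    return buf.getvalue()

packed = export_application({
    "main.workflow": "<workflow .../>",
    "embedded/sub.workflow": "<workflow .../>",  # embedded workflow included too
    "template/labels.txt": "user-digestible labels",
})
with zipfile.ZipFile(io.BytesIO(packed)) as zf:
    print(sorted(zf.namelist()))
```

Packing the embedded workflows alongside the main one is what makes the exported Application self-contained and directly applicable by an end user.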
12. The Repository stores not only full Applications. The community of
developers may also help each other by publishing tested Graphs, Concrete
Workflows and Templates, or – horribile dictu – by sharing unfinished
Applications (consisting of several Graphs, Concrete Workflows and Templates),
called Projects.
13. The end user imports the Application and configures it with the help of an
automatically generated, simplified Web form, which derives its parameters
from the Template(s) belonging to the given Application.
14. A new – and hopefully running – Workflow Instance is generated from an
imported Application, completed by the end user, upon workflow submission.
The same happens when a developer submits a full workflow. The workflow
instance inherits all the definitions of the workflow and additionally contains
the runtime state and output (result) information.
It follows that the user can change the (permitted) configuration of a workflow
at run time, and the change may influence all jobs in all instances (certainly
excluding the currently running or terminated ones).
Another consequence is that the invocation of embedded workflows is
implemented by creating a separate Workflow Instance belonging to the called
workflow.
15. The concept of the workflow instance and the concept of job-level parameter
sweeping are reflected by the new, hierarchically structured visualization and
manipulation of activated elements. In hierarchically descending order,
Workflows, Workflow Instances, Job sets (sets of more than one element when
the multiple input files force multiple submissions of the same job) and Jobs
can be visualized by reading their states, messages and outputs.
Suspending, killing and resuming workflows is done at the workflow instance
level. In conclusion, and in contrast to the old P-GRADE portal, every item a
workflow has ever produced can be fetched and visualized on a common
interface.
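The descending hierarchy in point 15 (Workflow → Workflow Instance → Job set → Job) can be modelled as nested dictionaries; fetching the state of any item is then a walk down the levels. All names and states below are illustrative, not gUse data.

```python
# Nested-dictionary sketch of the visualization hierarchy. A "job set"
# holds one entry per submission, so parameter-sweep siblings of the
# same job appear side by side.

states = {
    "render": {                        # workflow
        "instance-1": {                # workflow instance
            "simulate": {              # job set
                "simulate[0]": "finished",
                "simulate[1]": "running",  # parameter-sweep sibling
            },
        },
    },
}

def job_states(tree, workflow, instance, job_set):
    # Walk Workflow -> Instance -> Job set and return the jobs below it.
    return tree[workflow][instance][job_set]

print(job_states(states, "render", "instance-1", "simulate"))
```

Suspend/kill/resume operating at the instance level corresponds to acting on one second-level subtree while leaving the workflow definition (the top level) untouched.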
The static organization of the gUse
gUse has a three-tier structure (see Figure 1):
1. The user interface contains two sub-parts: the Graph Editor, to define the
static skeleton of workflows, and an HTML interface, interpretable by a web
browser, to access the portlets of the system; these enable the configuration,
submission, observation and harvesting of workflows and provide auxiliary
services such as the information system, certificate management, remote file
access management, resource management and workflow repository
management.
The Graph Editor is a Web Start application callable from the workflow
management portlet: it downloads itself to, and runs on, the user’s desktop
machine on demand.
2. The middle layer composes the gUse infrastructure, hosted on one or more
server machines, where the implementation of portlet activities is mapped onto
a sequence of web service calls. These services are responsible for handling
several databases (users, user proxy certificates, user source code and the
history of its changes, runtime states and results of workflow instances), for
resource management and for the management of submitted workflows.
The set of submitters composes the most important back end of the middle
layer, connecting the gUse infrastructure with the job submission systems of
the different Grids.
3. The Grids, and the middleware on them, compose the third, “remote” layer of
the structure. It includes the job submission, certificate and VOMS
management, remote data, and information system management systems,
which may differ from each other in their technology. This layer is
maintained and operated by external vendors and organizations.
Figure 1 Configuration of gUse
(from the point of view of workflow management)
The details of the Workflow manager of Tier 2
The Portal, embedded in GridSphere, contains the interpretations of the UIF
operations on the server side (Tier 2 + Tier 3) of the system.
WF storage, implemented by a MySQL database, contains the main part of
the definition of workflows. It also contains the logbook recording the
time-stamped changes the user has made during the workflow definition
process. Moreover, it contains the runtime identifiers and states of instances
of submitted workflows.
The storage is subdivided by users and by object types.
File Storage, implemented by a traditional Unix file system, contains the
input data files uploaded by the user and the files of executable code which
will perform the activities of the workflow jobs when the jobs are delivered
to remote Grid resources for execution.
The file system also contains the eventual “local” output files of terminated
workflow jobs.
A file is a local output file of a job if the content of the file will be
consumed by an input port of a subsequent job with the help of the WF
enactor, or if the file is regarded as a result file of the workflow and the user
wants to receive and download it as part of the packed result of a
Workflow Instance.
It must be noted that the names of input data files and files of executables
are referenced in the WF storage.
The WF enactor is the central workflow interpreter. It maps a submitted
workflow instance to the component jobs to submit.
During this process it checks the availability of each needed input file of the
job, and evaluates optional user-defined conditions that may prohibit the job's
submission.
The parameter-sweep ability of gUse is also ensured by the WF enactor,
which determines the next group of associated input files to be fed to the
subsequent job submission.
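The data-driven readiness check performed by the enactor can be sketched as a simple predicate: a job becomes submittable once every one of its input ports has a file available and no user-defined condition excludes it. The job and file names, and the tuple layout, are assumptions for illustration.

```python
# Data-driven enactment in miniature. jobs maps a job name to
# (needed_input_files, excluded_by_user_condition).

def ready_jobs(jobs, available_files):
    ready = []
    for name, (needed_inputs, excluded) in jobs.items():
        if excluded:
            continue  # user-programmed logic removed this job from elaboration
        if all(f in available_files for f in needed_inputs):
            ready.append(name)
    return ready

jobs = {
    "preprocess": (["raw.dat"], False),
    "simulate":   (["clean.dat"], False),  # waits for preprocess output
    "debug_dump": (["raw.dat"], True),     # conditionally excluded
}
print(ready_jobs(jobs, {"raw.dat"}))  # ['preprocess']
```

After "preprocess" terminates and deposits "clean.dat", a second call with the enlarged file set would report "simulate" as ready, which is how the availability of outputs drives the next round of submissions.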
Jobs are submitted via an (optionally two-level) mapping backed by
submitters. The dedicated submitters fit the job description to the
technically different job submission forms, and communicate with the
remote system via the corresponding protocol in order to send the job,
report back its state, and fetch its result upon termination.
The gUse-internal BROKER is able to perform an optimization on user
demand if the job is able to run on more than one type of Grid resource.
A special submitter serves jobs containing no executable code but the
invocation of an existing Web service. At present only the SOAP protocol
is built in, but the structure of gUse makes it possible to plug in other WS
call communication protocols.
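The shape of such a SOAP invocation can be shown with the standard library alone: a SOAP 1.1 request envelope is just XML with a fixed namespace. The operation name, target namespace and parameter below are invented for illustration; only the envelope namespace is the real SOAP 1.1 constant.

```python
# Assembling a minimal SOAP 1.1 request envelope with the standard
# library. A WS submitter would POST this document to the service
# endpoint and parse the response envelope for the result.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def soap_envelope(operation, namespace, params):
    ET.register_namespace("soap", SOAP_NS)
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(env, encoding="unicode")

# Hypothetical operation and namespace, for illustration only:
xml = soap_envelope("runJob", "http://example.org/guse", {"jobId": 42})
print("Envelope" in xml and "runJob" in xml)  # True
```

Because only the envelope construction and transport differ between WS protocols, swapping this piece out is what makes other call protocols pluggable behind the same submitter interface.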
A special type of remote service is when the GEMLCA infrastructure is needed
to start a legacy application.
A different special case is when the user wants to use the infrastructure of
gUse itself as the destination to submit a job. This is an ideal choice – first of
all for test purposes – to check the semantics of a workflow while excluding
the Grid-imposed time delays and errors due to bad networks, firewall
problems, or non-existent or expired certificates.
The repository is a store of tested and reusable workflow solutions
accessible to the whole user community.
The IS is the central manager of the distributed gUse system. It orchestrates
the cooperation of the above-mentioned functions implemented by independent
services. Moreover, it produces reports (not shown in Figure 1) for the
administrator of gUse and ensures the seamless runtime insertion of a
host and the subsequent redistribution of services in case of bottlenecks.