gUse White Paper

gUse is an easy-to-use, highly flexible and scalable cooperative Grid application development and enactment infrastructure. It connects developers and end users of Grid applications with computational resources of different technologies, enabling the design, integration and submission of sophisticated (layered and parameter-sweep enabled) workflows that run in parallel across classic Grids (GT2, GT4, LCG, gLite, ...), individual Web services, clusters, and even the local resources of the gUse installation itself.

gUse is the successor of the successful parameter-sweep enabled P-GRADE portal, inheriting its merits:

- user friendliness, following the "Learn once - use forever and everywhere" principle;
- expandability, through the selected portlet technology, which allows user-defined custom services to be plugged in;
- simplicity, by hiding the different technologies of the backend Grid middleware and resources;
- a graphical user interface facilitating a natural overview of the objects used in workflow design and submission;
- flexibility to add new middleware (Grids) to its backend;
- comfort, through the implemented standard services (certificate management, Grid information query, Grid resource connection management, and Grid file and catalogue management);
- connectivity, by accepting workflows whose parts run in parallel in different Grids, even ones built with different technologies.

However, gUse is much more than a simple portal:

1. gUse no longer has the former "unintelligent" interface between a simple Condor DAG interpreter and the job submission systems of the backend; instead it has its own workflow enactor capable of, among other things, brokering (scheduling and distributing jobs among the available and suitable resources).

2. This workflow enactor enables the abstract, hierarchical definition and the embedded invocation of workflows, where even recursive embedding of workflows is possible.

3.
The workflow enactor makes it possible for workflows to be started not only directly by the user, but also by an external event (described via a WSDL interface) issued by an automatic third-party system, or by a crontab-like schedule table set up by the user.

4. The versatility and scalability of the managed backend systems (middleware) is provided by an array of built-in submitters. All submitters present a common, standard WS interface towards the workflow enactor. In addition to the built-in standard backends (GT2, GT4, LCG-2, gLite, Condor, GEMLCA, Axis, ...), the administrator can add his or her own submitter(s) to the infrastructure at run time. These submitters may be user defined, for example to exploit a local resource, or they may be standard ones that balance the load of existing submitters.

5. The enactor provides a much more flexible parameter sweep solution than the P-GRADE portal. The unit of multiplied submission is a single job rather than the whole workflow. As a consequence, within a workflow the number of submissions of a job (or of a group of connected jobs) can be determined independently of the submissions of other jobs (or groups), so the computational load can be reduced to the minimum actually needed. A series of instruments supports this basically data-driven execution of workflows: input file containers, threshold numbers associated with the input ports of jobs, cross-product / dot-product relation switches connecting input ports, generator / collector jobs (enabling the writing / reading of several files during one job submission), and user-programmed logic control over the input ports of jobs (conditionally excluding them from workflow elaboration).

6. The palette of job-level activities has been expanded beyond the already mentioned invocation of embedded workflows by the invocation of Web services and of GEMLCA-interfaced legacy applications.
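The cross-product / dot-product port relations mentioned under point 5 determine how many job submissions a parameter sweep generates. The following minimal sketch illustrates the two pairing rules (the function names are illustrative, not gUse's actual implementation):

```python
from itertools import product

def dot_product(port_a, port_b):
    """Dot-product relation: the i-th file of one port is paired with the
    i-th file of the other, so the number of submissions equals the
    (common) length of the two file containers."""
    return list(zip(port_a, port_b))

def cross_product(port_a, port_b):
    """Cross-product relation: every file of one port is paired with every
    file of the other, giving len(port_a) * len(port_b) submissions."""
    return list(product(port_a, port_b))

# Two input ports of a parameter-sweep job, each holding a file container:
port0 = ["a0.dat", "a1.dat"]
port1 = ["b0.dat", "b1.dat", "b2.dat"]

print(len(dot_product(port0, port1[:2])))  # 2 paired submissions
print(len(cross_product(port0, port1)))    # 6 submissions, one per combination
```

Choosing dot product keeps related input files together in one submission, while cross product explores every combination; this is why the per-job (rather than per-workflow) multiplication can cut the computational load to the needed minimum.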
The group of classical binary job executables has also been extended: jobs may invoke the Java virtual machines and the MPI runtimes of the target systems.

7. The fault-tolerant, stable and scalable performance of gUse is supported by its distributed, modular structure. Just as discussed for the submitters, the whole system is a set of independent services, pluggable at run time (and, in the case of bottlenecks, multipliable), which talk to each other via a standard XML protocol. It is therefore up to the administrator whether to run the whole system on a single host or distribute it among several machines.

8. In addition to the usual input files (uploaded from the local machine of the portal user or fetched from a remote Grid environment), a job can be fed with a direct value or with the result set of an online SQL query against a remote database.

9. The workflow developer is supported by a logbook storing every change made to the workflow. The log is subdivided into job histories.

10. Perhaps the most important novelty of gUse is its emphasis on supporting the different needs of developers and of end users.

11. Experienced developers receive additional support to make their code reusable through new concepts expressing the hierarchical grouping of code abstractions:
- Graphs define the common topology of a group of workflows by identifying jobs and the input/output connections between them.
- Concrete Workflows define the semantics of workflows over a given Graph.
- Templates fix the part of the semantics that must be preserved in a derived workflow. Templates are also the natural place to attach documentation and other kinds of verbal end-user support (in the form of user-digestible labels).
- An Application is the end product of the developer.
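The abstraction hierarchy of point 11 (Graph, Concrete Workflow, Template) can be pictured with a minimal data-model sketch. All class and field names here are hypothetical illustrations of the concepts, not gUse's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    """Topology only: jobs and the connections between their ports."""
    jobs: list
    edges: list  # (producer_job, consumer_job) pairs

@dataclass
class ConcreteWorkflow:
    """Semantics over a Graph: which executable runs in each job."""
    graph: Graph
    executables: dict                      # job name -> executable
    locked: set = field(default_factory=set)

@dataclass
class Template:
    """A workflow whose locked parts must be preserved by derived workflows,
    plus end-user documentation labels."""
    base: ConcreteWorkflow
    labels: dict                           # per-field verbal support

    def derive(self, overrides):
        # Refuse to change anything the template has locked.
        illegal = set(overrides) & self.base.locked
        if illegal:
            raise ValueError(f"template forbids overriding: {illegal}")
        execs = {**self.base.executables, **overrides}
        return ConcreteWorkflow(self.base.graph, execs, self.base.locked)
```

The point of the hierarchy is reuse: one Graph can underlie many Concrete Workflows, and a Template lets a developer hand over a workflow while guaranteeing that its essential parts survive end-user customization.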
An Application is a packaged, directly usable, complete solution that gives the end user at most limited freedom to modify it (input file associations, command-line parameters of jobs, resource selection). An Application, which may contain several workflows (since any embedded workflows it calls must be included as well), can be exported to and stored in a common repository accessible to all users.

12. The Repository stores more than full Applications. The community of developers can also help each other by publishing tested Graphs, Concrete Workflows and Templates, or even, horribile dictu, by sharing unfinished Applications (consisting of several Graphs, Concrete Workflows and Templates), called Projects.

13. The end user imports an Application and configures it with the help of an automatically generated, simplified web form whose fields are derived from the Template(s) belonging to the given Application.

14. A new, and hopefully runnable, Workflow Instance is generated from an imported Application, completed by the end user, upon workflow submission. The same happens when a developer submits a full workflow. The workflow instance inherits all the definitions of the workflow and additionally holds the runtime state and output (result) information. It follows that the user can change the permitted parts of a workflow's configuration at run time, which may affect all jobs in all of its instances (obviously excluding those already running or terminated). Another consequence is that the invocation of an embedded workflow is implemented by creating a separate Workflow Instance belonging to the called workflow.

15. The concepts of the workflow instance and of job-level parameter sweeping are reflected in the new, hierarchically structured visualization and manipulation of activated elements.
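The relation described in point 14, a Workflow Instance inheriting the workflow's definition while adding runtime state, and an embedded workflow call spawning a separate instance of its own, can be sketched as follows (all names are hypothetical, for illustration only):

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Workflow:
    """Static definition: jobs plus any embedded workflows it calls."""
    name: str
    jobs: list
    embedded: list = field(default_factory=list)

@dataclass
class WorkflowInstance:
    """Inherits the definition and adds runtime state and results."""
    workflow: Workflow
    instance_id: str
    job_states: dict                       # runtime state per job
    outputs: dict = field(default_factory=dict)

def submit(workflow):
    """Create one instance for the workflow and, recursively, a separate
    instance for every embedded workflow it calls."""
    inst = WorkflowInstance(
        workflow=workflow,
        instance_id=uuid.uuid4().hex,
        job_states={job: "init" for job in workflow.jobs},
    )
    instances = [inst]
    for child in workflow.embedded:
        instances.extend(submit(child))    # embedded call -> own instance
    return instances
```

Because state lives in the instance rather than in the workflow definition, the user can still edit the permitted parts of the definition at run time without corrupting already running or terminated instances.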
In hierarchically descending order, the Workflow, the Workflow Instance, the Job set (a set of more than one element when multiple input files force multiple submissions of the same job) and the individual Jobs can be visualized, together with their states, messages and outputs. Suspending and killing/resuming of workflows is done at the workflow instance level. In conclusion, and in contrast to the old P-GRADE portal, every item of a workflow ever produced can be fetched and visualized on a common interface.

The static organization of gUse

gUse has a 3-tier structure (see Figure 1):

1. The user interface contains two sub-parts: the Graph Editor, used to define the static skeleton of workflows, and an HTML interface, interpretable by a web browser, giving access to the portlets of the system. These portlets enable the configuration, submission, observation and harvesting of workflows, and provide auxiliary services such as the information system, certificate management, remote file access management, resource management and workflow repository management. The Graph Editor is a Web Start application callable from the workflow management portlet; this means that, on demand, it downloads itself to and runs on the user's desktop machine.

2. The middle layer constitutes the gUse infrastructure, hosted on one or more server machines, where portlet activities are mapped onto sequences of web service calls. These services are responsible for handling several databases (users, user proxy certificates, user source code and the history of its changes, timed states and results of workflow instances), for resource management, and for the management of submitted workflows. The set of submitters forms the most important back end of the middle layer, connecting the gUse infrastructure with the job submission systems of the different Grids.

3. The Grids and the middleware running on them compose the third, "remote" layer of the structure.
It includes the job submission, certificate and VOMS management, remote data management and information system services, which may differ from each other in their technology. This layer is maintained and operated by external vendors / organizations.

Figure 1: Configuration of gUse (from the point of view of workflow management)

The details of the workflow manager of Tier 2

The Portal, embedded in GridSphere, implements the operations of the user interface on the server side (Tier 2 + Tier 3) of the system.

WF storage, implemented as a MySQL database, contains the main part of the definition of workflows. It also contains the logbook recording the time-stamped changes the user has made during the workflow definition process. Moreover, it contains the runtime identifiers and states of the instances of submitted workflows. The storage is subdivided by users and by object types, where the main base object types are GRAPH, CONCRETE WORKFLOW, TEMPLATE and INSTANCE of a CONCRETE WORKFLOW.

File storage, implemented on a traditional Unix file system, contains the input data files uploaded by the user and the files of executable code that will perform the activities of the workflow jobs once the jobs are delivered to remote Grid resources for execution. The file system also holds the eventual "local" output files of terminated workflow jobs. A file is a local output file of a job if its content will be consumed, with the help of the WF enactor, by an input port of a subsequent job, or if the file is regarded as a result file of the workflow and the user wants to receive and download it as part of the packed result of a Workflow Instance. It must be noted that the names of input data files and of executable files are referenced from the WF storage.

The WF enactor is the central workflow interpreter. It maps a submitted workflow instance onto the component jobs to be submitted.
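The enactor's data-driven mapping loop can be sketched roughly as follows: a job is handed to a submitter as soon as all of its input files are available and no user-defined condition excludes it. This is a simplified illustration; the names are hypothetical, not gUse's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Job:
    name: str
    inputs: list                           # files required on input ports
    outputs: list                          # files produced on output ports
    condition: Optional[Callable] = None   # optional user-defined guard

def enact(jobs, initial_files):
    """Submit jobs in data-driven order; return the submission sequence."""
    available = set(initial_files)
    pending = list(jobs)
    submitted = []
    progress = True
    while pending and progress:
        progress = False
        for job in list(pending):
            if not set(job.inputs) <= available:
                continue                   # some input file not yet produced
            if job.condition and not job.condition(available):
                pending.remove(job)        # user logic excludes this job
                progress = True
                continue
            submitted.append(job.name)     # hand the job to a submitter
            available |= set(job.outputs)  # its results feed later jobs
            pending.remove(job)
            progress = True
    return submitted
```

In the real system each "submit" step goes through the submitter array, and parameter-sweep jobs are multiplied per group of associated input files; the sketch only shows the availability-driven ordering.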
During this mapping the enactor investigates the availability of each input file the job needs, and checks the optional user-defined conditions that may prohibit the job's submission. The parameter sweep ability of gUse is also ensured by the WF enactor, which determines the next group of associated input files to be fed to the subsequent job submission.

Jobs are submitted through an (optionally two-level) mapping backed by submitters. The dedicated submitters adapt the job description to the technically different job submission forms, and communicate with the remote system via the corresponding protocol in order to send the job, report back its state, and fetch its results upon termination. The gUse internal BROKER can perform an optimization on user demand if the job is able to run on more than one type of Grid resource.

A special submitter serves jobs that contain no executable code but the invocation of an existing Web service. At present only the SOAP protocol is built in, but the structure of gUse makes it possible to plug in other WS call protocols. Another special type of remote service arises when the GEMLCA infrastructure is needed to start a legacy application. A different special case is when the user wants to use the gUse infrastructure itself as the destination of a job submission. This is an ideal choice, first of all for test purposes, to verify the semantics of a workflow while excluding Grid-imposed delays and errors due to bad networks, firewall problems, or non-existent or expired certificates.

The Repository is a store of tested and reusable workflow solutions accessible to the whole user community.

The IS is the central manager of the distributed gUse system. It orchestrates the cooperation of the above-mentioned functions implemented by independent services. Moreover, it produces reports (not shown in Figure 1) for the gUse administrator and ensures the seamless runtime insertion of a host and the subsequent redistribution of services in bottleneck situations.
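The submitter architecture described above, one common interface towards the enactor with many middleware-specific implementations registrable at run time, can be sketched like this. Class and method names are hypothetical, not gUse's actual WS contract:

```python
from abc import ABC, abstractmethod

class Submitter(ABC):
    """The common contract every submitter offers to the workflow enactor."""
    @abstractmethod
    def submit(self, job_description: dict) -> str: ...
    @abstractmethod
    def status(self, job_id: str) -> str: ...
    @abstractmethod
    def fetch_result(self, job_id: str) -> bytes: ...

class LocalSubmitter(Submitter):
    """Runs the job inside the gUse host itself, e.g. for test purposes."""
    def __init__(self):
        self._jobs = {}
    def submit(self, job_description):
        job_id = f"local-{len(self._jobs)}"
        self._jobs[job_id] = job_description
        return job_id
    def status(self, job_id):
        return "Finished" if job_id in self._jobs else "Unknown"
    def fetch_result(self, job_id):
        return b"result of " + job_id.encode()

class SubmitterRegistry:
    """The enactor picks a submitter by the middleware type of the job;
    the administrator can register new submitters at run time."""
    def __init__(self):
        self._submitters = {}
    def register(self, middleware, submitter):
        self._submitters[middleware] = submitter   # runtime plug-in
    def for_job(self, job_description):
        return self._submitters[job_description["middleware"]]
```

A GT2, gLite or Web-service submitter would implement the same three methods over its own protocol, which is what lets one workflow span several Grids built with different technologies.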