EU project: RI031844-OMII-Europe

Project no: RI031844-OMII-Europe
Project acronym: OMII-Europe
Project title: Open Middleware Infrastructure Institute for Europe
Instrument: Integrated Infrastructure Initiative
Thematic Priority: Communication network development

M:JRA1.8 – GridSAM Integration into UNICORE

Due date of deliverable: 30 April 2007
Actual submission date: 30 April 2007
Start date of project: 1 May 2006
Duration: 2 years
Organisation name of lead contractor for this deliverable: INFN
Revision: [draft 0.1]

Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006)

Dissemination Level
PU  Public
PP  Restricted to other programme participants (including the Commission Services)
RE  Restricted to a group specified by the consortium (including the Commission Services)
CO  Confidential, only for members of the consortium (including the Commission Services)  x

For additional information see http://omii-europe.com or contact info@omii-europe.com

Document Control Sheet
Title: GridSAM Integration into UNICORE
ID: M:JRA1.8
Version: 0.1
Status: Draft
Available at: http://omii-europe.org
Software Tool: Microsoft Word 2007
File(s): only this one

Document Authorship
Written by: Morris Riedel (FZJ), Shahbaz Memon (FZJ), Shiraz Memon (FZJ)
Contributors: Sven van de Berghe (FLE)
Reviewed by:
Approved by:

Document Status Sheet
Version: 0.1
Date: 10 April 2007
Status: Draft
Comments: All sections written

Table of Contents
Document Control Sheet
Document Status Sheet
Table of Contents
1. Introduction
2. GridSAM Overview
2.1 Typical GridSAM Use Cases
2.2 GridSAM Architecture
3. GridSAM Integration into UNICORE
3.1 UNICORE Job Submission Functionalities and Services
3.2 GridSAM Job Submission Functionalities
3.3 Overall Integrated Architecture
4. Conclusion
5. References

1. Introduction

Many existing Distributed Resource Managers (DRMs) lack a standardized job submission interface in Grid environments. As a result, users must learn a large number of job description languages and deployment mechanisms in order to exploit Grid resources. The evolution of job description languages (e.g. JSDL) paves the way to improve the standardization process with respect to interoperability and simple execution of jobs in Grid environments. At the time of writing, Web services are considered the preferred technology in the broader Grid context.
This implies seamless machine-to-machine, message-oriented communication among Grid resources. The Grid community agreed to adopt Web services technologies and has gathered pace in defining standards for Grid services based on them. The service-oriented nature of Grid resources allows access to resources without complex configurations and dependencies, and it supports the goal of global standardization. The JSDL and OGSA-BES standards are efforts from the OGF that promote interoperability among various DRMs. OGSA-BES [5] is based upon the notion of providing a standard execution context and exposes a common Web services interface for job creation, submission, execution and monitoring. The kind of job that OGSA-BES supports is understood in a broad sense and flexibly corresponds to any type of job that encapsulates the data staging, resource and application requirements of a particular user job. It is being implemented by several state-of-the-art Grid middleware systems, namely UNICORE [1], gLite [2] and, in an older version, Globus Toolkit 4 [3, 4]. Furthermore, an OGSA-BES implementation is also provided by the OMII-UK middleware component GridSAM (Grid Job Submission And Monitoring Service) [6].

GridSAM is an open source project providing Web services interfaces for the submission, monitoring and execution of JSDL jobs. It is funded by the Open Middleware Infrastructure Institute (OMII) UK [7]. Along with UNICORE and Globus Toolkit 4, GridSAM was one of the first systems to adopt Web services and standardized job descriptions. It is specifically designed for users who want to submit jobs transparently to Grids. It supports job launching mechanisms such as forking, Condor, Globus and Secure Shell. The basis of GridSAM is its Job Management Library, which encapsulates the conversion from JSDL to submission actions compatible with the underlying DRM.
This encapsulation is loosely coupled, so clients can use the functions not only through Web services but also as an independent library. GridSAM can therefore be used within other distributed system technologies such as EJB (Enterprise Java Beans).

TBD: GridSAM and UNICORE Task of this milestone…

In order to integrate the GridSAM system into the UNICORE environment, we first analyze the GridSAM system with respect to its features in the area of job submission and monitoring. We then provide an example of how the GridSAM system can be integrated into UNICORE Grids.

2. GridSAM Overview

The primary objective of GridSAM is to deliver a robust, reliable and high-quality job submission and management service that can be deployed and used with the OMII-UK framework. It provides a common Web services interface for job submission. GridSAM can therefore be used by any kind of client that wants to submit jobs, for example from a Web browser or a Java Swing application. The user’s job and resource requirements must be described in a standard JSDL document. Furthermore, GridSAM provides simple job monitoring, which involves checking the state of a job at different points during its lifetime, i.e. Running, Held, Finished or Failed. In addition, it can perform data staging operations upon the user’s request. This allows users to specify a virtual file system for their jobs, on which files are staged to the resource before execution and staged back after execution.

2.1 Typical GridSAM Use Cases

This section covers common use cases of GridSAM. As illustrated in Figure 1, GridSAM has three major use cases, which are invoked by the end-user [6].

Submit Job

The end-user is both a stakeholder and an actor of this use case. The job has to be prepared by the end-user in the form of a file and given to the system as input. The computational resource owner, who provides the resource for the required computation, is also a stakeholder.
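The job file mentioned in the Submit Job use case is a JSDL document. As a minimal sketch, and not GridSAM's own tooling, the following Python code builds such a document with the standard library. The function name `build_jsdl`, its parameters and all example values are assumptions made for illustration; only the element names and the two namespaces come from the JSDL 1.0 specification.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the JSDL 1.0 specification.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(executable, arguments, stage_in_uri, stage_in_name):
    """Build a minimal JSDL job description and return it as an XML string.

    Hypothetical helper for illustration; not part of GridSAM or UNICORE.
    """
    ET.register_namespace("jsdl", JSDL_NS)
    ET.register_namespace("jsdl-posix", POSIX_NS)

    job = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL_NS}}}JobDescription")

    # Application requirements: which program to run and with which arguments.
    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = arg

    # Data staging: fetch a remote file into the job's working area before it runs.
    staging = ET.SubElement(desc, f"{{{JSDL_NS}}}DataStaging")
    ET.SubElement(staging, f"{{{JSDL_NS}}}FileName").text = stage_in_name
    ET.SubElement(staging, f"{{{JSDL_NS}}}CreationFlag").text = "overwrite"
    source = ET.SubElement(staging, f"{{{JSDL_NS}}}Source")
    ET.SubElement(source, f"{{{JSDL_NS}}}URI").text = stage_in_uri

    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    # Example values only; the URI is a placeholder.
    print(build_jsdl("/bin/cat", ["input.dat"],
                     "http://example.org/data/input.dat", "input.dat"))
```

The resulting string is what an end-user would hand to a submission client; a stage-out entry would use a `Target` element in place of `Source`.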
Figure 1: GridSAM UML Use Case Diagram (actor: End-User; use cases: Submit Job, Monitor Job, Terminate Job)

Monitor Job

This use case is related to the Submit Job use case discussed above. An end-user submits a job to the system and wants to monitor its state. The computational resource owner makes available the status of the computation executing in its local scheduling system. As a precondition of this use case, the end-user must be authenticated and authorized before obtaining the job status.

Terminate Job

This use case also depends on the Submit Job use case. The end-user may want to terminate a job abruptly, provided that the job is not finished, halted or already terminated. In addition, the security conditions must be met before any termination request is carried out.

2.2 GridSAM Architecture

Transparency of job description and submission is one of the main goals of GridSAM. It lets users execute applications through distributed resource managers transparently. This transparency is achieved by describing jobs in a common job description language, JSDL, and by offering a uniform networked access interface via Web services. User requests are translated into resource-specific actions to stage, launch and monitor a job. This translation functionality is encapsulated within the GridSAM Job Management Library (JML). The JML is responsible for orchestrating the execution of DRM-Connectors, as illustrated in Figure 2. A DRM-Connector is a reusable component in GridSAM that comprises job management functions. Several DRM-Connectors are joined together to form a chain or network of stages called a Job Pipeline, which is created by a deployer. The JML API also supports system engineers in programming common tasks such as concurrency management and failure recovery.
Figure 2: GridSAM System Architecture [8]

The JML API is exposed as a Web services interface and is concretely implemented as a JAX-RPC style Web service. It can be deployed in any Java Servlet compliant container. This leaves deployers free to choose a container or application server that meets their scalability requirements. In terms of security, the Web services interface makes use of HTTPS transport security and utilizes the OMII-UK security framework for the authentication and authorization of users. In addition, the JML API can be embedded in other applications offering different modes of interaction [8].

In particular, GridSAM consists of a submission pipeline, which is a network of stages connected by event queues. This pipelining architecture is derived from the Staged Event-Driven Architecture (SEDA). Instead of treating each job submission as a single action, it is treated as a network of robust stages, and each stage is in turn conditioned to load by thresholding or filtering its event queue. This strategy has been tested in a number of other systems and has yielded positive results in terms of robust load balancing.

Figure 3 depicts a typical pipeline for Condor job submissions through the GridSAM system. Each stage of a job represents an implementation of the DRMConnector event handler interface. An instance of a DRM-Connector represents a specific functionality (e.g. handling a stage-in event) as well as a particular stage of the execution process. Once a stage has completed its part of the work, control passes asynchronously to another stage in the pipeline. This helps in dealing with long-running jobs; for example, huge file transfer tasks are decomposed further into sub-stages that are handled by non-blocking I/O libraries. The motivation for providing a message-oriented, event-based DRM-Connector interface is to hide the complexity and details of resource management and concurrency mechanisms from system developers.
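The staged, event-driven pipeline described above can be illustrated with a small sketch. It is written in Python rather than Java and does not reproduce GridSAM's DRMConnector interface or its Quartz-based scheduling. The class `Stage`, the function `run_pipeline` and the stage names are assumptions chosen for illustration. Each stage owns a bounded event queue served by a small thread pool, and every event carries its own job state, so the stages themselves stay stateless.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


class Stage:
    """One pipeline stage: a bounded event queue served by worker threads.

    Hypothetical class for illustration; not GridSAM's DRMConnector API.
    """

    def __init__(self, name, forward, workers=2, max_queue=100):
        self.name = name
        self.forward = forward  # callable that receives the handled event
        # Bounding the queue is a simple form of thresholding: submit()
        # blocks while the stage is overloaded.
        self.events = queue.Queue(maxsize=max_queue)
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]

    def start(self):
        for thread in self._threads:
            thread.start()

    def submit(self, event):
        self.events.put(event)

    def handle(self, event):
        # The event carries the job's state, so nothing is stored in the stage.
        return {**event, "trail": event["trail"] + [self.name]}

    def _run(self):
        while True:
            event = self.events.get()
            if event is _STOP:
                return
            self.forward(self.handle(event))

    def stop(self):
        for _ in self._threads:
            self.events.put(_STOP)
        for thread in self._threads:
            thread.join()


def run_pipeline(job_ids, stage_names=("stage-in", "launch", "stage-out")):
    """Push jobs through the stages and return the finished events by job id."""
    done = queue.Queue()
    stages = []
    forward = done.put
    # Build the chain back to front so each stage forwards to its successor.
    for name in reversed(stage_names):
        stage = Stage(name, forward)
        stages.insert(0, stage)
        forward = stage.submit
    for stage in stages:
        stage.start()
    for job_id in job_ids:
        stages[0].submit({"job": job_id, "trail": []})
    results = [done.get() for _ in job_ids]
    for stage in stages:
        stage.stop()
    return sorted(results, key=lambda event: event["job"])


if __name__ == "__main__":
    for event in run_pipeline(["job-1", "job-2"]):
        print(event["job"], event["trail"])
```

Control passes between stages only through the queues, so a slow stage delays its own queue without blocking the threads of the other stages until that queue fills up.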
State is represented by encapsulating it in the incoming events, so it can be distributed easily without complex state management.

Figure 3: Submission pipeline for a Condor job in GridSAM [8]

Fault recovery in GridSAM is handled by persisting the event queues and the information pertinent to each job instance; this allows states to be restarted in case of failure. The JML API specifies the JobInstanceStore API, which stores individual job instance information using the Hibernate toolkit to provide transactional object-relational mapping. Using Hibernate allows the DRM implementation to be independent of the underlying persistence mechanism (in-memory or RDBMS persistence); by default, job instances and event queues are stored in an RDBMS.

Concurrency management in GridSAM is achieved through the Quartz framework [9]. In this framework, every stage in the pipeline is served by a thread pool that invokes the stage-specific DRMConnector. It also schedules stages and allocates threads. In [10], Welsh et al. propose the dynamic self-tuning of resource management parameters with the help of an application controller, based on run-time demands and performance targets. The Quartz framework also provides advanced date-based scheduling, fault recovery and clustering support, which are seen as unavailable in other frameworks.

3. GridSAM Integration into UNICORE

This section describes the integration of UNICORE with the GridSAM middleware components. The first part discusses an integration architecture that folds both middleware components into one subsystem. The second part focuses on the realization of the integration in the light of usage scenarios, specifically job management functions.

3.1 UNICORE Job Submission Functionalities and Services

UNICORE is a middleware providing seamless and transparent access to computational and application resources distributed across the Grid.
UNICORE provides the baseline functionality that all other major Grid middleware systems provide, such as access to heterogeneous, platform-agnostic application resources exposing computational and storage capabilities. Its underlying architecture follows a service-oriented model. The latest version, UNICORE 6, builds on current Web services trends; specifically, it exposes its functionality as WS-RF compliant service interfaces. The UNICORE core architecture provides an implementation of the UNICORE Atomic Services (UAS): a set of WS-RF services for storage, job management and target system interfaces. These services cover a great deal of functionality, ranging from resource advertisement for service providers to a resource acquisition interface for consumers. In parallel to these services, UNICORE also provides an OGSA-BES interface that was developed within OMII-Europe.

The UNICORE Gateway acts as an authentication layer between any WS-based client and the Web services tier. It is positioned at the server layer, thereby encapsulating and enhancing the security provisioning for the services layer, which includes authentication based on X.509 certificates. As a component that is loosely coupled to the services themselves, it also protects the services against denial-of-service attacks (e.g. 100000 WS-based requests at nearly the same time). The Web services tier is the functional layer responsible for performing the actual job incarnation via the Incarnation Database (IDB), which is managed by the enhanced Network Job Supervisor (XNJS), while user authorization is done by the enhanced UNICORE User Database (UUDB). During job execution, the XNJS communicates with target resources via the Target System Interface (TSI), which consists of native routines performing the actual job execution on the available resource management systems.

In most deployment scenarios, the UNICORE GUI client is represented by two user interfaces.
First, the Grid Programming Environment (GPE) offers a rich graphical user interface through which a user can interactively submit and monitor jobs. It is based on a pluggable architecture, so that other Grid applications can be defined within the client and make use of the underlying UNICORE 6 infrastructure. Second, UNICORE provides a command line interface (CLI), which allows users to submit jobs from the console.

In more detail, UNICORE 6 follows a WS-RF-based Web services architecture to expose its core functionalities.

The Registry service is a WS-ServiceGroup in which each tuple is called a ServiceGroupEntry. The Registry service manages the information about all the WS-Resources and services available at a UNICORE 6 site. Specifically, this service is called by GPE clients to look up the EPRs of Target System Services (TSSs) for job execution. This service is one of the candidates for the integration of GridSAM into UNICORE Grids.

The Target System Factory (TSF) is used to create a Target System Service and its WS-Resource and is thus an implementation of the WS-RF factory pattern. In the context of WS-RF, the factory pattern is defined as any service that is capable of bringing a WS-Resource into existence.

The major goal of the TSS is that clients can use this service to submit computational jobs described in the emerging OGF-standardized Job Submission Description Language (JSDL) [10]. In particular, the TSS provides access to a stateful Target System WS-Resource. This WS-Resource models a physical computational Grid resource such as a supercomputer, a cluster or a server farm. It exposes various kinds of information, for instance resource details such as the number of CPUs, as WS-Resource properties.

The response to a job submission via the TSS consists of a WS-Addressing Endpoint Reference (EPR) [8] that points to the Job Management Service (JMS).
In particular, this EPR can be used to control the job via the Job Management Service (JMS).

The Storage Management Service (SMS) supports the data staging of JSDL jobs. It can be used to access storage WS-Resources that represent, for instance, a remote directory from which data should be transferred for job execution.

Finally, the File Transfer Service (FTS) is an implementation of the open OGF standards RandomByteIO and StreamableByteIO. The SMS works in conjunction with the FTS to address the data staging requirements of job execution.

3.2 GridSAM Job Submission Functionalities

GridSAM is an implementation of the OGSA-BES and JSDL standards. It provides execution services by following the OGSA specification of BES and hence uses its standard port types. The standard interfaces of OGSA-BES are as follows:

BES-Management: A management port type to instrument multiple BES-Factory instances. Its major function is to manage the BES-Factory for executing further activities.

BES-Factory: A Web service interface for a resource responsible for creating, terminating and monitoring BES-Activity instances. BES-Factory acts as a factory for BES-Activity. The resource property document of BES-Factory contains the underlying attributes of the computing or data resource being advertised.

BES-Activity: A Web service resource interface that represents individual activities instantiated through the BES-Factory interface.

GridSAM also supports JSDL in conjunction with OGSA-BES. Data staging is one of the basic requirements of the majority of job management applications. GridSAM handles extensive and complex jobs with varying requirements by using the pipeline mechanism (see above).

In more detail, the GridSAM server-side component consists of two layers encapsulating mature job management functions, as shown in Figure 4.

The GridSAM Service provides Web service interfaces to the job monitoring and submission functions.
This layer provides the implementation of the emerging OGSA Basic Execution Services (BES) standard and of GridSAM’s customized Web service interfaces. The GridSAM service layer communicates with the core component to achieve remote job launching and file staging capability in compliance with the Job Submission Description Language (JSDL). Figure 4 illustrates the submission and monitoring port types on top of the architecture.

The GridSAM Core is a pipeline framework that comes with a set of components providing integration with Distributed Resource Manager Service Provider Interfaces (DRM-SPI). GridSAM currently supports the Sun Grid Engine, Globus GRAM and Condor-G DRMs, as shown in Figure 4. For extensibility, this component also has a pluggable architecture, so that developers can implement plug-ins for other resource managers. The GridSAM service component is a Web service wrapper around the core engine.

Figure 4: GridSAM architecture showing the monitoring and submission port types; beneath these interfaces, the JobManager API communicates with the resource managers.

Finally, the GridSAM client offers an interface for client components to communicate with the server-side components with the aid of wrapper interfaces. For end-users, the client provides command line tools, and it also provides an API for third-party applications. This component is used to prepare and send client job requests in compliance with the server interfaces.

3.3 Overall Integrated Architecture

In Grid applications, integration is becoming an unavoidable concern, as many applications implicitly depend on the interoperability of Grid components. Leveraging different Grid resources hosted on heterogeneous platforms is becoming an increasingly daunting task for users, who have to interact with many applications deployed on different platforms.
Integration implies that tools of the same kind at the same tier, from different vendors, communicate with each other in an interoperable way. Every Grid infrastructure essentially introduces its own in-house functionality to meet its functional and non-functional requirements. For example, the job management system is one of the core middleware components, with dependencies on underlying custom execution management systems. In order to use the job management function of one middleware system with another middleware’s client, it may be necessary to first extend the former so that it supports third-party clients. This makes the job management of the middleware available not only to its own clients but also to clients communicating from other platforms. Identifying and positioning suitable integration points is not a trivial task. Therefore, care has to be taken when analyzing the functional interfaces.

The functionalities of both middleware systems follow the emerging Web service standards for application interoperability. The GridSAM implementation is based on WS-I plain Web services, whereas the UNICORE services are implemented using WS-RF-compliant Web services standards. GridSAM services are deployed in the OMII-UK hosting environment, which is an extension of the Apache Axis SOAP engine. UNICORE uses the XFire-based SOAP engine and the Jetty server as its hosting environment.

The Registry service enables clients to discover target services for job execution and management. The TSS advertises itself in the Registry in order to be discovered by any WS-* client. In the same way, the GridSAM service can be an integrated target service behind the UNICORE Gateway despite being deployed in a different hosting environment. GridSAM publishes its EPR (endpoint reference) to the Registry and is thereby exposed inside the Registry service. As soon as a client interacts with the Registry, it can retrieve a set of target services.
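The publish-and-discover idea can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the UNICORE Registry API: the real Registry is a WS-ServiceGroup accessed through SOAP, whereas the classes `EndpointReference` and `Registry`, the function `build_example_registry`, the interface labels and the example.org addresses below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EndpointReference:
    """Simplified stand-in for a WS-Addressing EPR (hypothetical model)."""
    address: str    # service URL as seen through the Gateway
    interface: str  # label for the job execution interface the service offers


@dataclass
class Registry:
    """In-memory stand-in for a registry of target services."""
    entries: list = field(default_factory=list)

    def publish(self, epr):
        # Publishing the same EPR twice must not create a duplicate entry.
        if epr not in self.entries:
            self.entries.append(epr)

    def lookup(self, interface=None):
        # Without a filter, return every published target service.
        return [e for e in self.entries
                if interface is None or e.interface == interface]


def build_example_registry():
    """Register a UNICORE TSS and a GridSAM BES endpoint side by side."""
    registry = Registry()
    # Placeholder addresses; real deployments use site-specific URLs.
    registry.publish(EndpointReference(
        "https://gateway.example.org/SITE/services/TargetSystemService", "TSS"))
    registry.publish(EndpointReference(
        "https://gateway.example.org/SITE/gridsam/services/bes", "OGSA-BES"))
    return registry


if __name__ == "__main__":
    for epr in build_example_registry().lookup():
        print(epr.interface, epr.address)
```

A client that calls `lookup()` without a filter sees both entries in one list, which mirrors how the GridSAM endpoint would appear next to the UNICORE target services.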
The client then finds the previously published GridSAM service endpoint reference, which is shown within the list of target job execution services. Hence, GridSAM acts as an alternative job execution service within UNICORE 6. GridSAM implements two job execution interfaces: the GridSAM core job execution service and an OGSA-BES implementation. For the integration, if the EPRs of GridSAM’s OGSA-BES interface are exposed inside the UNICORE Registry service, clients can easily discover any of the published services. Upon discovering the GridSAM service, a client is able to communicate with it.

Figure 5: UNICORE – GridSAM integration architecture showing their association at the Web services and resource tiers.

Figure 5 depicts the architectural integration of UNICORE and GridSAM, showing components at the Web services tier and the target resource tier, respectively. The color legend in the diagram distinguishes the components of the two systems and their associations. GridSAM services interact with the UNICORE Registry to publish their access points, so that they can then be queried and discovered by clients.

4. Conclusion

GridSAM is a robust, reliable and high-quality job submission and management service that is typically deployed and used with the OMII-UK framework. It provides a common Web services interface for job submission. GridSAM can therefore be used by any kind of client that wants to submit jobs, for example from a Web browser or a Java Swing application. GridSAM provides an interface that is compliant with the OGSA-BES specification of the OGF, which includes the usage of JSDL for job descriptions. Furthermore, it provides simple job monitoring, which involves checking the state of a job at different points during its lifetime, i.e. Running, Held, Finished or Failed. In addition, it can perform data staging operations upon the user’s request.
This allows users to specify a virtual file system for their jobs, on which files are staged to the resource before execution and staged back after execution.

In this document, we presented the integration of GridSAM into UNICORE 6 by using the UNICORE Registry to expose a GridSAM endpoint behind the UNICORE Gateway. The Gateway itself provides the authentication, so GridSAM gains authentication automatically: the Gateway simply forwards HTTPS requests to the GridSAM endpoint within UNICORE 6 sites after the authentication checks have been done. Hence, the SOAP body (with the OGSA-BES operation call) is not changed, and it is therefore not necessary to change the GridSAM installation for the integration into UNICORE 6 sites. To sum up, the GridSAM system can be seen, and actually used, as an alternative job submission service for UNICORE 6 sites secured via the UNICORE Gateway.

5. References

[1] UNICORE Forum e.V., Retrieved: 17 April 2007, <http://www.unicore.eu>
[2] gLite, Retrieved: 9 July 2006, <http://glite.web.cern.ch/glite/>
[3] Globus Alliance, Retrieved: 14 May 2006, <http://www.globus.org/>
[4] M. Riedel and M. Marzolla, Milestone M:JRA1.17 Evaluation of OGSA-BES with respect to its adoption in the middleware of the OMII – Europe partners, 2007.
[5] OGSA-BES WG.
[6] GridSAM - Grid Job Submission and Monitoring Web Service, Retrieved: 17 April 2007, <http://gridsam.sourceforge.net>
[7] OMII: Open Middleware Infrastructure Institute UK, Retrieved: 17 April 2007, <http://www.omii.ac.uk>
[8] W. Lee, A. S. McGough and J. Darlington, Performance Evaluation of the GridSAM Job Submission and Monitoring System, London e-Science Centre, Imperial College London, South Kensington Campus, London SW7 2AZ, UK.
[9] Quartz Job Scheduling Framework, Retrieved: 17 April 2007, <http://www.opensymphony.com/quartz>
[10] M. Welsh, D. Culler and E. Brewer, SEDA: An architecture for well-conditioned, scalable internet services, in Eighteenth Symposium on Operating Systems Principles (SOSP-18), 2001.