EU project: RI031844-OMII-Europe
Project no: RI031844-OMII-Europe
Project acronym: OMII-Europe
Project title: Open Middleware Infrastructure Institute for Europe
Instrument: Integrated Infrastructure Initiative
Thematic Priority: Communication network development
M:JRA1.8 – GridSAM Integration into UNICORE
Due date of deliverable: 30 April 2007
Actual submission date: 30 April 2007
Start date of project: 1 May 2006
Duration: 2 years
Organisation name of lead contractor for this deliverable: INFN
Revision [draft 0.1]
Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006)
Dissemination Level
PU: Public
PP: Restricted to other programme participants (including the Commission Services)
RE: Restricted to a group specified by the consortium (including the Commission Services)
CO: Confidential, only for members of the consortium (including the Commission Services)
x
For additional information see
http://omii-europe.com or contact info@omii-europe.com
Document Control Sheet
Title: GridSAM Integration into UNICORE
ID: M:JRA1.8
Version: 0.1
Status: Draft
Available at: http://omii-europe.org
Software Tool: Microsoft Word 2007
File(s): only this one
Document Authorship
Written by: Morris Riedel (FZJ), Shahbaz Memon (FZJ), Shiraz Memon (FZJ)
Contributors: Sven van de Berghe (FLE)
Reviewed by:
Approved by:
Document Status Sheet
Version: 0.1
Date: 10 April 2007
Status: Draft
Comments: All sections written
Table of Contents
Document Control Sheet
Document Status Sheet
Table of Contents
1. Introduction
2. GridSAM Overview
2.1 Typical GridSAM Use Cases
2.2 GridSAM Architecture
3. GridSAM Integration into UNICORE
3.1 UNICORE Job Submission Functionalities and Services
3.2 GridSAM Job Submission Functionalities
3.3 Overall Integrated Architecture
4. Conclusion
5. References
1. Introduction
Many existing Distributed Resource Managers (DRMs) lack a standardized job submission interface in
Grid environments. As a result, users have to learn a large number of job description languages
and deployment mechanisms in order to exploit Grid resources. The evolution of job description
languages (e.g. JSDL) paves the way for improved standardization with respect to interoperability
and the simple execution of jobs in Grid environments.
At the time of writing, Web services are considered the preferred technology in the broader Grid
context, as they enable seamless, message-oriented machine-to-machine communication among Grid
resources. The Grid community has agreed on Web services technologies and has gathered pace in
defining standards for Grid services based on them. The service-oriented nature of Grid resources
allows access to resources without complex configurations and dependencies, and it supports the
goal of global standardization. The JSDL and OGSA-BES standards are efforts of the OGF that promote
interoperability among various DRMs. OGSA-BES [5] is based on the notion of a standard execution
context and exposes a common Web services interface for job creation, submission, execution and
monitoring. The notion of a job in OGSA-BES is deliberately broad: it covers any type of job that
encapsulates the data staging, resource and application requirements of a particular user job.
OGSA-BES is being implemented by several state-of-the-art Grid middleware systems, namely
UNICORE [1] and gLite [2], and, in an older version, within Globus Toolkit 4 [3, 4]. Furthermore,
an OGSA-BES implementation is provided by the OMII-UK middleware component GridSAM (Grid Job
Submission And Monitoring Service) [6]. GridSAM is an open source project providing Web services
interfaces for the submission, monitoring and execution of JSDL jobs. It is funded by the Open
Middleware Infrastructure Institute (OMII) UK [7].
Along with UNICORE and Globus Toolkit 4, GridSAM was one of the first systems to adopt Web
services and standardized job descriptions. It is designed for users who want to submit jobs
transparently to Grids, and it supports job launching mechanisms such as forking, Condor, Globus
and Secure Shell. The basis of GridSAM is its Job Management Library, which encapsulates the
conversion from JSDL to submission actions compatible with the underlying DRM. This encapsulation
is loosely coupled, so clients can use these functions not only through Web services but also as
an independent library. GridSAM can therefore also be used within other distributed system
technologies such as EJB (Enterprise JavaBeans).
TBD: GridSAM and UNICORE Task of this milestone…
In order to integrate the GridSAM system into the UNICORE environment, we first analyze the
GridSAM system with respect to its features in the area of job submission and monitoring. We then
provide an example of how the GridSAM system can be integrated into UNICORE Grids.
2. GridSAM Overview
The primary objective of GridSAM is to deliver a robust, reliable and high-quality job submission
and management service that can be deployed and used with the OMII-UK framework. It provides a
common Web services interface for job submission, so GridSAM can be used by any kind of client
that wants to submit jobs, for example a Web browser or a Java Swing application. The user's job
and its resource requirements must be described in a standard JSDL document. Furthermore, GridSAM
provides simple job monitoring, i.e. querying the state of a job during its lifetime (e.g.
Running, Held, Finished or Failed). In addition, it can perform data staging on the user's
request: users specify a virtual file system for their jobs, into which files are staged on the
resource before execution and from which they are staged back after execution.
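As an illustration of the kind of job description discussed here, the following Python sketch builds a minimal JSDL document with one stage-in and one stage-out file. The element names and namespaces are those of JSDL 1.0 and its POSIX application extension; the helper `build_job`, the executable, the file names and the example.org URIs are assumptions made for this example and are not taken from GridSAM.

```python
import xml.etree.ElementTree as ET

# Namespaces of JSDL 1.0 and its POSIX application extension.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_job(executable, args, stdout=None, stage_in=None, stage_out=None):
    """Return a JSDL JobDefinition element for a simple POSIX job.

    stage_in and stage_out map a file name in the job's working
    directory to a remote URI (hypothetical helper, not a GridSAM API).
    """
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("posix", POSIX)
    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")

    # Application requirements: what to run and with which arguments.
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in args:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    if stdout is not None:
        ET.SubElement(posix, f"{{{POSIX}}}Output").text = stdout

    # One DataStaging element per file: Source = stage in, Target = stage out.
    for direction, files in (("Source", stage_in or {}), ("Target", stage_out or {})):
        for name, uri in files.items():
            staging = ET.SubElement(desc, f"{{{JSDL}}}DataStaging")
            ET.SubElement(staging, f"{{{JSDL}}}FileName").text = name
            ET.SubElement(staging, f"{{{JSDL}}}CreationFlag").text = "overwrite"
            endpoint = ET.SubElement(staging, f"{{{JSDL}}}{direction}")
            ET.SubElement(endpoint, f"{{{JSDL}}}URI").text = uri
    return job


job = build_job(
    "/bin/sort",
    ["input.txt"],
    stdout="sorted.txt",
    stage_in={"input.txt": "ftp://example.org/data/input.txt"},
    stage_out={"sorted.txt": "ftp://example.org/results/sorted.txt"},
)
jsdl_text = ET.tostring(job, encoding="unicode")
print(jsdl_text)
```

The resulting XML text is what a client would hand to a job submission interface; how it is transported (SOAP envelope, security headers) is outside the scope of this sketch.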
2.1 Typical GridSAM Use Cases
This section covers common use cases of GridSAM. As illustrated in Figure 1, GridSAM has three
major use cases, all of which are invoked by the end-user [6].
Submit Job
The end-user is both a stakeholder and the actor of this use case. The job has to be prepared by
the end-user in the form of a file and given to the system as input. The owner of a computational
resource is also a stakeholder, providing the resource for the required computation.
Figure 1: GridSAM UML Use Case Diagram (actor: End-User; use cases: Submit Job, Monitor Job,
Terminate Job)
Monitor Job
This use case builds on the Submit Job use case discussed above. An end-user has submitted a job
to the system and wants to monitor its state. The computational resource owner makes available the
status of the computation executing in its local scheduling system. As a precondition of this use
case, the end-user must be authenticated and authorized before obtaining the job status.
Terminate Job
This use case also depends on the Submit Job use case. The end-user wants to terminate a job
abruptly, provided that the job has not finished, halted or already been terminated. In addition,
the security conditions must be met before any termination request is carried out.
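The three use cases and their preconditions can be sketched as a small in-memory service. This is only an illustration: the class `JobService`, its method names and the user names are hypothetical, authentication and authorization are reduced to a lookup in a set, and treating the Held state as "halted" (and a Failed job as no longer terminable) is an assumption of this example.

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "Running"
    HELD = "Held"
    FINISHED = "Finished"
    FAILED = "Failed"
    TERMINATED = "Terminated"


# States in which a termination request is rejected. FINISHED, HELD
# ("halted") and TERMINATED follow the use case text; FAILED is an
# assumption added for this sketch.
NOT_TERMINABLE = {JobState.FINISHED, JobState.HELD,
                  JobState.TERMINATED, JobState.FAILED}


class JobService:
    """Hypothetical stand-in for the three end-user use cases."""

    def __init__(self, authorized_users):
        self._authorized = set(authorized_users)
        self._jobs = {}

    def _check(self, user):
        # Security precondition shared by all use cases.
        if user not in self._authorized:
            raise PermissionError(f"{user} is not authorized")

    def submit(self, user, job_description):
        self._check(user)
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = JobState.RUNNING
        return job_id

    def monitor(self, user, job_id):
        self._check(user)
        return self._jobs[job_id]

    def terminate(self, user, job_id):
        self._check(user)
        if self._jobs[job_id] in NOT_TERMINABLE:
            raise ValueError(f"{job_id} can no longer be terminated")
        self._jobs[job_id] = JobState.TERMINATED


service = JobService(authorized_users={"alice"})
job_id = service.submit("alice", "<JobDefinition/>")
print(service.monitor("alice", job_id))
service.terminate("alice", job_id)
print(service.monitor("alice", job_id))
```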
2.2 GridSAM Architecture
Transparency of job description and submission is one of the main goals of GridSAM: it lets users
execute applications through distributed resource managers transparently. This transparency is
achieved by describing jobs in a common job description language, JSDL, and by offering a uniform
networked access interface via Web services. User requests are translated into resource-specific
actions to stage, launch and monitor a job. This translation functionality is encapsulated within
the GridSAM Job Management Library (JML). The JML is responsible for orchestrating the execution of
DRM-Connectors, as illustrated in Figure 2. A DRM-Connector is a reusable GridSAM component that
implements job management functions. Several DRM-Connectors are joined together to form a chain or
network of stages called a Job Pipeline, which is created by a deployer. The JML API also supports
system engineers in programming common tasks such as concurrency management and failure recovery.
Figure 2: GridSAM System Architecture [8]
The JML API is exposed through a Web services interface that is implemented as a JAX-RPC style
Web service. It can be deployed in any Java Servlet compliant container, which leaves deployers
free to choose a container or application server that meets their scalability requirements. In
terms of security, the Web services interface uses HTTPS transport security and the OMII-UK
security framework for the authentication and authorization of users. In addition, the JML API
can be embedded in other applications that offer different modes of interaction [8].
In particular, GridSAM consists of a submission pipeline, which is a network of stages connected
by event queues. This pipelining architecture is derived from the Staged Event-Driven Architecture
(SEDA). Instead of treating each job submission as a single action, it is treated as a sequence of
events passing through a network of robust stages, and each stage is conditioned to load by
thresholding or filtering its event queue. This strategy has been tested in a number of other
systems and has yielded positive results in terms of robust load balancing. Figure 3 depicts the
typical pipeline for Condor job submissions through the GridSAM system. Each stage is an
implementation of the DRM-Connector event handler interface. An instance of a DRM-Connector
represents a specific functionality (e.g. handling a stage-in event) as well as a particular stage
of the execution process. After a stage has completed, control passes asynchronously to the next
stage in the pipeline. This helps in dealing with long-running jobs; for example, large file
transfer tasks are decomposed further into sub-stages that are handled by non-blocking I/O
libraries. The motivation for providing a message-oriented, event-based DRM-Connector interface is
to hide the complexity and details of resource management and concurrency mechanisms from system
developers. State is represented by encapsulating it in the incoming events, so it can be
distributed easily without complex state management.
Figure 3. Submission pipeline for Condor job in GridSAM [8]
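The staged, event-driven idea can be illustrated with a few lines of Python. This is a minimal sketch and not GridSAM code: the class `Stage`, the stage labels and the layout of the job events are assumptions of this example, each stage is served by a single thread, and a bounded queue stands in for the thresholding of a stage's event queue.

```python
import queue
import threading

STOP = object()  # sentinel event that shuts a stage down


class Stage(threading.Thread):
    """One pipeline stage: takes events from its own queue, handles them
    and passes the result on to the queue of the next stage."""

    def __init__(self, name, handler, next_stage=None, max_pending=100):
        super().__init__(name=name, daemon=True)
        # A bounded queue is a simple form of thresholding: producers block
        # once max_pending events are waiting for this stage.
        self.events = queue.Queue(maxsize=max_pending)
        self.handler = handler
        self.next_stage = next_stage

    def run(self):
        while True:
            event = self.events.get()
            if event is STOP:
                if self.next_stage is not None:
                    self.next_stage.events.put(STOP)  # propagate shutdown
                break
            result = self.handler(event)
            if self.next_stage is not None:
                self.next_stage.events.put(result)


def make_handler(label):
    def handle(job):
        # The state travels inside the event, so the stage itself
        # keeps no per-job state.
        return {**job, "history": job["history"] + [label]}
    return handle


def run_pipeline(jobs, labels=("stage-in", "launch", "stage-out")):
    finished = []
    stages = [Stage("sink", finished.append)]  # last stage collects results
    for label in reversed(labels):
        stages.insert(0, Stage(label, make_handler(label), next_stage=stages[0]))
    for stage in stages:
        stage.start()
    for job in jobs:
        stages[0].events.put(job)
    stages[0].events.put(STOP)
    for stage in stages:
        stage.join()
    return finished


results = run_pipeline([{"id": 1, "history": []}, {"id": 2, "history": []}])
for job in results:
    print(job["id"], job["history"])
```

Because every stage only communicates through its queue, a slow stage (for example a large file transfer) does not block the stages before it until its queue threshold is reached.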
Fault recovery in GridSAM is handled by persisting the event queues and the information pertinent
to each job instance; this allows stages to be restarted in case of failure. The JML API specifies
a JobInstanceStore API, which stores individual job instance information, typically using the
Hibernate toolkit to provide transactional object-relational mapping. Using Hibernate allows the
DRM implementation to be independent of the underlying persistence mechanism (in-memory or RDBMS
persistence); by default, job instances and event queues are stored in an RDBMS. Concurrency
management in GridSAM is achieved through the Quartz framework [9]. In this framework, every stage
in the pipeline is served by a thread pool that invokes the stage-specific DRM-Connector. Quartz
also schedules stages and allocates threads. In [10], Welsh et al. propose the dynamic self-tuning
of resource management parameters with the help of an application controller, based on run-time
demands and performance targets. The Quartz framework also provides advanced date-based
scheduling, fault recovery and clustering support, which are reported to be unavailable in other
frameworks.
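The recovery principle, persisting the progress of each job instance so that processing can resume at the interrupted stage, can be sketched as follows. The example uses Python's built-in `sqlite3` module in place of Hibernate and an RDBMS; the class below only borrows the name JobInstanceStore, and its methods, the table layout and the stage names are assumptions of this sketch rather than the actual JML API.

```python
import sqlite3

PIPELINE = ["stage-in", "launch", "stage-out"]  # illustrative stage names


class JobInstanceStore:
    """Much-simplified store that remembers the next stage of each job."""

    def __init__(self, connection):
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS job_instance ("
            "job_id TEXT PRIMARY KEY, next_stage INTEGER NOT NULL)"
        )

    def save(self, job_id, next_stage):
        # The connection context manager commits on success and rolls
        # back if the statement raises.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO job_instance VALUES (?, ?)",
                (job_id, next_stage),
            )

    def unfinished(self):
        rows = self.db.execute(
            "SELECT job_id, next_stage FROM job_instance "
            "WHERE next_stage < ? ORDER BY job_id",
            (len(PIPELINE),),
        )
        return rows.fetchall()


def run(store, job_id, start=0, fail_at=None):
    """Run the remaining stages, persisting progress before each one."""
    executed = []
    for index in range(start, len(PIPELINE)):
        store.save(job_id, index)
        if PIPELINE[index] == fail_at:
            raise RuntimeError(f"simulated crash in {fail_at}")
        executed.append(PIPELINE[index])
    store.save(job_id, len(PIPELINE))  # past the last stage: job is done
    return executed


def recover(store):
    """Resume every unfinished job at the stage where it was interrupted."""
    return {job_id: run(store, job_id, start)
            for job_id, start in store.unfinished()}


connection = sqlite3.connect(":memory:")
store = JobInstanceStore(connection)
try:
    run(store, "job-1", fail_at="launch")
except RuntimeError as error:
    print(error)
print(recover(JobInstanceStore(connection)))
```

The second `JobInstanceStore` created over the same connection plays the role of a restarted service that finds the persisted job instance and continues from the launch stage.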
3. GridSAM Integration into UNICORE
This section describes the integration of the GridSAM and UNICORE middleware components. Section
3.1 summarizes the job submission functionality and services of UNICORE, and Section 3.2 the job
submission functionality of GridSAM. Section 3.3 then presents an integration architecture that
combines both middleware components within one system, with a focus on job management functions.
3.1 UNICORE Job Submission Functionalities and Services
UNICORE is a middleware providing seamless and transparent access to computational and
application resources distributed across the Grid. UNICORE provides the baseline functionality
that all other major Grid middleware systems provide, such as platform-agnostic access to
heterogeneous resources exposing computational and storage capabilities. Its underlying
architecture follows a service-oriented model. The latest version, UNICORE 6, exploits current Web
services trends, in particular by exposing its functionality through WS-RF compliant service
interfaces. The UNICORE core architecture provides an implementation of the UNICORE Atomic Services
(UAS): a set of WS-RF services for storage, job management and target system access. These
services cover a wide range of functionality, from resource advertisement for service providers to
resource acquisition interfaces for consumers. In parallel to these services, UNICORE also
provides an OGSA-BES interface that was developed within OMII-Europe.
The UNICORE Gateway acts as an authentication layer between any WS-based client and the Web
services tier. It is positioned at the server layer and encapsulates and enhances the security
provided to the services layer, including authentication based on X.509 certificates. As a
component that is loosely coupled to the services themselves, it also protects the services
against denial-of-service attacks (e.g. 100,000 WS-based requests at nearly the same time).
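The role of the Gateway as a loosely coupled authentication layer in front of the services can be sketched as follows. This is an illustration under strong simplifications: the class `Gateway`, the paths and the subject names are hypothetical, and X.509 authentication is reduced to a lookup of the certificate subject in a set of trusted subjects, whereas a real deployment validates the certificate chain during the TLS handshake.

```python
class Gateway:
    """Hypothetical authentication layer in front of backend services."""

    def __init__(self, trusted_subjects):
        self._trusted = frozenset(trusted_subjects)
        self._services = {}

    def register(self, path, service):
        # 'service' is any callable that accepts the request body.
        self._services[path] = service

    def handle(self, client_subject, path, body):
        # Authentication happens once, before any service is contacted.
        if client_subject not in self._trusted:
            return 403, "client certificate not trusted"
        service = self._services.get(path)
        if service is None:
            return 404, "unknown service"
        # The body is forwarded unchanged to the service behind the gateway.
        return 200, service(body)


def echo_service(body):
    return f"received {len(body)} characters"


gateway = Gateway(trusted_subjects={"CN=Alice,O=Example"})
gateway.register("/SITE/services/TargetSystemService", echo_service)

print(gateway.handle("CN=Alice,O=Example",
                     "/SITE/services/TargetSystemService", "<Body/>"))
print(gateway.handle("CN=Mallory,O=Example",
                     "/SITE/services/TargetSystemService", "<Body/>"))
```

Because unauthenticated requests are rejected at the gateway, they never reach the service implementation, which is one way in which such a layer shields the services from request floods.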
The Web services tier is the functional layer responsible for the actual job incarnation via the
Incarnation Database (IDB), which is managed by the enhanced Network Job Supervisor (XNJS), while
user authorization is performed by the enhanced UNICORE User Database (UUDB). During job
execution, the XNJS communicates with target resources via the Target System Interface (TSI),
which consists of native routines that perform the actual job execution on the available resource
management systems.
In most deployment scenarios, the UNICORE client side is represented by two user interfaces.
First, the Grid Programming Environment (GPE) offers a rich graphical user interface through which
a user can interactively submit and monitor jobs. It is based on a pluggable architecture, so that
other Grid applications can be defined within the client and make use of the underlying UNICORE 6
infrastructure. Second, UNICORE provides a command line interface (CLI), which allows users to
submit jobs from the console.
In more detail, UNICORE 6 follows a WS-RF-based Web services architecture to expose its core
functionalities. The Registry service is a WS-ServiceGroup in which each entry is called a
ServiceGroupEntry. It manages information about all the WS-Resources and services available at a
UNICORE 6 site. Specifically, this service is called by GPE clients to look up the endpoint
references (EPRs) of Target System Services (TSSs) for job execution. This makes the Registry
service one of the candidate points for integrating GridSAM into UNICORE Grids.
The Target System Factory (TSF) is used to create a Target System Service and its WS-Resource and
is thus an implementation of the WS-RF factory pattern. In the context of WS-RF, a factory is
defined as any service that is capable of bringing a WS-Resource into existence.
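The factory pattern can be illustrated with a small in-memory model. The classes below are hypothetical stand-ins and not the UNICORE API: an endpoint reference is reduced to a service address plus a resource identifier, and the resource property `cpus` and the example.org address are assumptions of this sketch.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EndpointReference:
    address: str      # URL of the service
    resource_id: str  # identifies one WS-Resource behind that URL


@dataclass
class TargetSystemResource:
    properties: dict = field(default_factory=dict)


class TargetSystemFactory:
    """Brings new target system resources into existence on request."""

    def __init__(self, service_address):
        self._address = service_address
        self._resources = {}

    def create_target_system(self, **properties):
        # Each call creates a fresh resource and returns a reference to it,
        # not the resource itself.
        resource_id = uuid.uuid4().hex
        self._resources[resource_id] = TargetSystemResource(dict(properties))
        return EndpointReference(self._address, resource_id)

    def resolve(self, epr):
        return self._resources[epr.resource_id]


factory = TargetSystemFactory(
    "https://gateway.example.org/SITE/services/TargetSystemService")
epr = factory.create_target_system(cpus=64)
print(epr.address)
print(factory.resolve(epr).properties)
```

Clients keep only the returned reference; the same service address can therefore serve many independent resources, distinguished by their identifiers.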
The major purpose of the TSS is to let clients submit computational jobs described in the
emerging OGF standard Job Submission Description Language (JSDL) [10]. In particular, the TSS
provides access to a stateful Target System WS-Resource. This WS-Resource models a physical
computational Grid resource such as a supercomputer, a cluster or a server farm. It exposes
various kinds of information, for instance resource details such as the number of CPUs, as
WS-Resource properties.
The response to a job submission via the TSS is a WS-Addressing Endpoint Reference (EPR) [8] to
the Job Management Service (JMS). This EPR can then be used to control the job via the JMS.
The Storage Management Service (SMS) supports the data staging of JSDL jobs. It can be used to
access storage WS-Resources that represent, for instance, a remote directory from which data
should be transferred for job execution.
Finally, the File Transfer Service (FTS) is an implementation of the open OGF standards
RandomByteIO and StreamableByteIO. The SMS works in conjunction with the FTS to address the data
staging requirements of job execution.
3.2 GridSAM Job Submission Functionalities
GridSAM is an implementation of the OGSA-BES and JSDL standards. It provides execution services
that follow the OGSA specification of BES and hence uses its standard port types. The standard
interfaces of OGSA-BES are as follows:
• BES-Management: A management port type used to manage multiple BES-Factory instances, in
  particular with respect to executing further activities.
• BES-Factory: A Web service interface for a resource responsible for creating, terminating and
  monitoring BES-Activity instances; it acts as a factory for BES-Activity instances. The resource
  property document of a BES-Factory contains the attributes of the underlying computing or data
  resource being advertised.
• BES-Activity: A Web service resource interface that represents an individual activity
  instantiated through the BES-Factory interface.
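The division of labour between these port types can be sketched with an in-memory model. The method names below mirror operation names of the published OGSA-BES 1.0 specification (CreateActivity, GetActivityStatuses, TerminateActivities, GetFactoryAttributesDocument, StartAcceptingNewActivities, StopAcceptingNewActivities); the draft that was current when this document was written may differ. The class itself, the use of plain dictionaries and the activity identifiers are assumptions of this sketch, and no SOAP messaging is involved.

```python
import itertools


class BESFactory:
    """In-memory sketch of BES-Factory and BES-Management operations."""

    def __init__(self, attributes):
        self._attributes = dict(attributes)
        self._activities = {}
        self._ids = itertools.count(1)
        self._accepting = True

    # BES-Management operations
    def stop_accepting_new_activities(self):
        self._accepting = False

    def start_accepting_new_activities(self):
        self._accepting = True

    # BES-Factory operations
    def create_activity(self, jsdl_document):
        if not self._accepting:
            raise RuntimeError("not accepting new activities")
        activity_id = f"activity-{next(self._ids)}"
        self._activities[activity_id] = {
            "document": jsdl_document,
            "status": "Pending",
        }
        return activity_id

    def get_activity_statuses(self, activity_ids):
        return {a: self._activities[a]["status"] for a in activity_ids}

    def terminate_activities(self, activity_ids):
        results = {}
        for a in activity_ids:
            activity = self._activities[a]
            # Only activities that have not reached a final state can be
            # terminated.
            terminable = activity["status"] in ("Pending", "Running")
            if terminable:
                activity["status"] = "Terminated"
            results[a] = terminable
        return results

    def get_factory_attributes_document(self):
        return {
            **self._attributes,
            "IsAcceptingNewActivities": self._accepting,
            "TotalNumberOfActivities": len(self._activities),
        }


bes = BESFactory({"CommonName": "example-bes"})
activity = bes.create_activity("<JobDefinition/>")
print(bes.get_activity_statuses([activity]))
print(bes.terminate_activities([activity]))
print(bes.get_factory_attributes_document())
```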
GridSAM supports JSDL in conjunction with OGSA-BES. Data staging is one of the basic requirements
of the majority of job management applications. GridSAM handles extensive and complex jobs with
varying requirements by means of its pipeline mechanism (see Section 2.2).
In more detail, the GridSAM server-side component consists of two layers that encapsulate mature
job management functions, as shown in Figure 4. The GridSAM Service layer provides Web service
interfaces to the job submission and monitoring functions. This layer implements the emerging OGSA
Basic Execution Services (BES) standard as well as GridSAM's own Web service interfaces. It
communicates with the core component to achieve remote job launching and file staging in
compliance with the Job Submission Description Language (JSDL). Figure 4 shows the submission and
monitoring port types at the top of the architecture.
The GridSAM Core is a pipeline framework that comes with a set of components providing integration
with Distributed Resource Manager Service Provider Interfaces (DRM-SPI). GridSAM currently supports
the Sun Grid Engine, Globus GRAM and Condor-G DRMs, as shown in Figure 4. For extensibility, this
component has a pluggable architecture, so that developers can implement plug-ins for other
resource managers. The GridSAM Service component is a Web service wrapper around the core engine.
Figure 4. GridSAM architecture showing the monitoring and submission port types; beneath these
interfaces, the JobManager API communicates with the resource managers.
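The pluggable architecture of the core can be illustrated by a small plug-in registry. This is a sketch of the general idea only: the abstract class `DRMConnector`, the `connector` decorator and the `launch` function are hypothetical names that do not reproduce the GridSAM DRM-SPI, and the only plug-in shown simply forks a local process.

```python
import subprocess
import sys
from abc import ABC, abstractmethod

CONNECTORS = {}  # maps a resource manager name to its plug-in class


def connector(name):
    """Class decorator that registers a plug-in under the given name."""
    def register(cls):
        CONNECTORS[name] = cls
        return cls
    return register


class DRMConnector(ABC):
    """Interface every resource manager plug-in has to implement."""

    @abstractmethod
    def launch(self, executable, args):
        """Run the job and return (exit status, standard output)."""


@connector("fork")
class ForkConnector(DRMConnector):
    def launch(self, executable, args):
        # Launches the job as a local child process.
        completed = subprocess.run(
            [executable, *args], capture_output=True, text=True, check=False
        )
        return completed.returncode, completed.stdout


def launch(drm, executable, args):
    # The core only depends on the interface; the concrete plug-in is
    # looked up by name at run time.
    return CONNECTORS[drm]().launch(executable, args)


status, output = launch("fork", sys.executable, ["-c", "print('hello')"])
print(status, output.strip())
```

Support for a further resource manager would be added by registering another subclass, without changing the `launch` function or its callers.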
Finally, the GridSAM client offers an interface through which client components communicate with
the server-side components by means of wrapper interfaces. For end-users, the client provides
command line tools; it also provides an API for third-party applications. This component is used
to prepare client job requests and send them in compliance with the server interfaces.
3.3 Overall Integrated Architecture
In Grid applications, integration is becoming an unavoidable concern, as many applications
implicitly depend on the interoperability of Grid components. When leveraging different Grid
resources hosted on heterogeneous platforms, it is becoming an increasingly daunting task for
users to interact with many applications deployed on different platforms. Integration here means
that tools of the same kind at the same tier, from different vendors, communicate with each other
in an interoperable way. Every Grid infrastructure essentially introduces its own in-house
functionality to meet its functional and non-functional requirements. For example, the job
management system is one of the core middleware components, with dependencies on the underlying
custom execution management systems.
In order to use the job management function of one middleware system from the client of another
middleware system, it may be necessary to extend the former so that it supports third-party
clients. Such an extension makes the job management of the middleware available not only to its
own clients but also to clients from other platforms. Identifying and positioning suitable
integration points is not a trivial task; therefore, the functional interfaces have to be analyzed
with care.
The functionality of both middleware systems follows the emerging Web service standards for
application interoperability. The GridSAM implementation is based on plain WS-I Web services,
whereas the UNICORE services are implemented as WS-RF compliant Web services. GridSAM services are
deployed in the OMII-UK hosting environment, which is an extension of the Apache Axis SOAP engine.
UNICORE uses the XFire SOAP engine and the Jetty server as its hosting environment.
The Registry service enables clients to discover target services for job execution and
management. A TSS advertises itself in the Registry in order to be discoverable by any WS-* client.
In the same way, the GridSAM service can become an integrated target service behind the UNICORE
Gateway, despite being deployed in a different hosting environment. GridSAM publishes its EPR
(endpoint reference) to the Registry and is thereby exposed through the Registry service. When a
client interacts with the Registry, it retrieves a set of target services, in which the previously
published GridSAM service endpoint reference appears within the list of target job execution
services. Hence, GridSAM acts as an alternative job execution service within UNICORE 6. GridSAM
implements two job execution interfaces: a GridSAM core job execution service and an OGSA-BES
implementation. For the integration, the EPRs of the OGSA-BES interface of GridSAM are exposed in
the UNICORE Registry service, so that clients can easily discover the published services. Having
discovered the GridSAM service, a client can then communicate with it.
Figure 5. UNICORE – GridSAM integration architecture showing their association at the Web
services and resource tier.
Figure 5 depicts the architectural integration of UNICORE and GridSAM, showing the components at
the Web services tier and the target resource tier. The color legend in the diagram distinguishes
the components of the two systems and their associations. The GridSAM services interact with the
UNICORE Registry to publish their access points, so that they can be queried and discovered by
clients.
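The publish and discover interaction described in this section can be sketched with an in-memory registry. The class `Registry`, the entry fields, the interface labels and the example.org addresses are assumptions made for this illustration; the real UNICORE Registry is a WS-ServiceGroup accessed through Web service calls.

```python
class Registry:
    """Hypothetical stand-in for a registry of service entries."""

    def __init__(self):
        self._entries = []

    def publish(self, epr, interface, provider):
        # Each entry records where a service lives and what it offers.
        self._entries.append(
            {"epr": epr, "interface": interface, "provider": provider}
        )

    def lookup(self, interface=None):
        return [
            entry for entry in self._entries
            if interface is None or entry["interface"] == interface
        ]


registry = Registry()

# A UNICORE Target System Service advertises itself ...
registry.publish(
    epr="https://gateway.example.org/SITE/services/TargetSystemService",
    interface="TargetSystemService",
    provider="UNICORE",
)
# ... and GridSAM publishes the endpoint of its OGSA-BES interface
# in the same way.
registry.publish(
    epr="https://gateway.example.org/SITE/gridsam/services/bes",
    interface="OGSA-BES",
    provider="GridSAM",
)

# A client sees both execution services and may select the GridSAM one.
for entry in registry.lookup():
    print(entry["provider"], entry["epr"])
gridsam_entries = registry.lookup(interface="OGSA-BES")
print(gridsam_entries[0]["epr"])
```

Both published addresses point to the gateway host in this sketch, reflecting the idea that the GridSAM endpoint is reached through the same entry point as the native UNICORE services.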
4. Conclusion
GridSAM is a robust, reliable and high-quality job submission and management service that is
typically deployed and used with the OMII-UK framework. It provides a common Web services
interface for job submission, so it can be used by any kind of client that wants to submit jobs,
for example a Web browser or a Java Swing application. GridSAM provides an interface that is
compliant with the OGSA-BES specification of the OGF, which includes the use of JSDL for job
descriptions. Furthermore, it provides simple job monitoring, i.e. querying the state of a job
during its lifetime (e.g. Running, Held, Finished or Failed). In addition, it can perform data
staging on the user's request: users specify a virtual file system for their jobs, into which
files are staged on the resource before execution and from which they are staged back after
execution.
In this document, we presented the integration of GridSAM into UNICORE 6 by using the UNICORE
Registry to expose a GridSAM endpoint behind the UNICORE Gateway. The Gateway itself provides the
authentication, so GridSAM gains authentication automatically: the Gateway simply forwards HTTPS
requests to the GridSAM endpoint within the UNICORE 6 site after the authentication checks have
been performed. The SOAP body (with the OGSA-BES operation call) is not changed, and thus the
GridSAM installation does not need to be modified for the integration into UNICORE 6 sites. To sum
up, the GridSAM system can be used as an alternative job submission service for UNICORE 6 sites
secured via the UNICORE Gateway.
5. References
[1] UNICORE Forum e.V., Retrieved: 17 April 2007, <http://www.unicore.eu>
[2] gLite, Retrieved: 9 July 2006, <http://glite.web.cern.ch/glite/>
[3] Globus Alliance, Retrieved: 14 May 2006, <http://www.globus.org/>
[4] M. Riedel and M. Marzolla, Milestone M:JRA1.17 Evaluation of OGSA-BES with respect to its
adoption in the middleware of the OMII-Europe partners, 2007.
[5] OGSA-BES WG.
[6] GridSAM - Grid Job Submission and Monitoring Web Service, Retrieved: 17 April 2007,
<http://gridsam.sourceforge.net>
[7] OMII: Open Middleware Infrastructure Institute UK, Retrieved: 17 April 2007,
<http://www.omii.ac.uk>
[8] W. Lee, A. S. McGough and J. Darlington, Performance Evaluation of the GridSAM Job
Submission and Monitoring System, London e-Science Centre, Imperial College London, South
Kensington Campus, London SW7 2AZ, UK.
[9] Quartz Job Scheduling Framework, Retrieved: 17 April 2007,
<http://www.opensymphony.com/quartz>
[10] M. Welsh, D. Culler and E. Brewer, SEDA: An Architecture for Well-Conditioned, Scalable
Internet Services, in Eighteenth Symposium on Operating Systems Principles (SOSP-18), 2001.