Designing a Java-based Grid Scheduler using Commodity Services

Patrick Wendel
patrick@inforsense.com
InforSense, London

Arnold Fung
arnold@inforsense.com
InforSense, London

Yike Guo
yg@doc.ic.ac.uk
Computing Department, Imperial College London

Moustafa Ghanem
mmg@doc.ic.ac.uk
Computing Department, Imperial College London

Abstract

Common approaches to implementing Grid schedulers usually rely directly on relatively low-level protocols and services, to benefit from better performance by having full control of file and network usage patterns. Following this approach, schedulers become bound to particular network protocols, communication patterns and persistence layers. With the availability of standardised high-level application hosting environments that provide a level of abstraction between the application and the resources and protocols it uses, we present the design and implementation of a Java-based, protocol-agnostic scheduler for Grid applications built on commodity services for messaging and persistence. We also show how it can be deployed following two different strategies, either as a scheduler for a campus grid or as a scheduler for a wide-area network grid.

1 Motivation

This project was started as part of the development of the Discovery Net platform [1], a workflow-based platform for the analysis of large-scale scientific data.

The platform's architecture consists of one or more workflow execution servers and a workflow submission server tightly coupled with an interactive client tool for building, executing and monitoring workflows. The client tool thus benefits from the ability to exchange complex objects, as well as code, with the other components of the system, allowing rich interaction between these components. Interoperability of the workflow server with other services within a more loosely coupled Grid architecture is enabled by providing a set of stateless services accessible through Web Services protocols, although only for a subset of the functionality.

The platform also relies on the services of a Java application server to provide a hosting environment in which the workflow activities are executed. This environment can, for instance, provide an activity with support for authentication and authorisation management, or logging.

However, such an approach has the drawback of complicating integration with schedulers based on native process submission, which is usually the case, since, in our case, each workflow execution has to run within a hosting Java-based environment that provides a different set of services above the operating system. Equally, it is difficult to reuse the clustering features usually provided by Java application servers, as they are designed for short executions, typically transaction-based web applications, over a closed cluster of machines.

2 Approach

Instead of trying to integrate with schedulers built around command-line tools and native processes, the approach is to build a generic scheduler for long-running tasks using the services provided by the hosting environment. In particular, the availability of a messaging service providing both point-to-point and publish/subscribe models, container-managed persistence of long-lived objects, and the ability to bind object types to specific point-to-point message services are of interest for building the scheduler.

It follows from this approach that the implementation:

• only requires a few classes,
• does not access I/O and resources directly,
• is network protocol agnostic,
• is agnostic to the Java application server it runs atop.

The scheduling policy resides in the messaging service's handling of the subscribers to its point-to-point model. The service provider used in the experiment allows that handling to be configured and extended, making it possible to use various scheduling algorithms or services. As an example, the Sun Grid Engine was used to determine which resource the scheduler should choose.
3 Container services

The design is based around a set of four standard mechanisms available in Java-based application servers, part of the Enterprise JavaBeans [3] (EJB) and Java Message Service [4] (JMS) specifications, as shown in Figure 1.

Figure 1: Container Services

3.1 Stateless Objects

Stateless remote objects, also known as Stateless Session Beans, have a very simple lifecycle as they cannot keep any state. The container provides support for accessing these objects through several protocols:

• RMI/JRMP: for Java-based systems, allowing any serialisable Java object to be exchanged.
• RMI/IIOP: to support CORBA-IIOP interoperability.
• SOAP/WSDL: to support Web Services interoperability.
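As an illustration of this mechanism, the following sketch shows a minimal EJB 2.x stateless session bean in the style of the TaskManagement service introduced in Section 4; the interface shape, method signature and names are illustrative assumptions rather than the platform's actual code.

// Minimal EJB 2.x stateless session bean sketch (illustrative names and signature;
// the three types would normally live in separate source files).
import java.rmi.RemoteException;
import javax.ejb.*;

// Remote interface: callable over RMI/JRMP, RMI/IIOP or SOAP depending on container configuration.
public interface TaskManagement extends EJBObject {
    String submit(String workflowDefinition) throws RemoteException;
}

// Home interface used by clients to obtain instances.
interface TaskManagementHome extends EJBHome {
    TaskManagement create() throws CreateException, RemoteException;
}

// Bean class: keeps no conversational state between calls, so the container
// can pool instances and dispatch any call to any instance.
class TaskManagementBean implements SessionBean {
    public String submit(String workflowDefinition) {
        // ... create a JobEntity and post a request to the Job queue (see Section 4.2) ...
        return "job-id";
    }
    public void ejbCreate() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbRemove() {}
    public void setSessionContext(SessionContext ctx) {}
}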
3.2 Message-driven Objects

Associated with the messaging service, special object types can be registered to be instantiated to handle messages arriving at a Queue object (point-to-point model), thus removing the need for subscribers to act as factories for the instances that actually process the objects taken from the queue.
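The sketch below shows the shape such an object takes as an EJB 2.x message-driven bean; the ExecutionHandler-style name and the message content are illustrative assumptions.

// Illustrative EJB 2.x message-driven bean (hypothetical ExecutionHandler-style bean).
import javax.ejb.MessageDrivenBean;
import javax.ejb.MessageDrivenContext;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

public class ExecutionHandlerBean implements MessageDrivenBean, MessageListener {
    private MessageDrivenContext context;

    // Called by the container for each message taken from the Job queue;
    // the container manages the pool of instances, so no explicit subscriber code is needed.
    public void onMessage(Message message) {
        try {
            String jobId = ((TextMessage) message).getText();
            // ... load the JobEntity for jobId, subscribe to the Job topic, start execution ...
        } catch (javax.jms.JMSException e) {
            throw new javax.ejb.EJBException(e);
        }
    }

    public void ejbCreate() {}
    public void ejbRemove() {}
    public void setMessageDrivenContext(MessageDrivenContext ctx) { this.context = ctx; }
}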
3.3 Security Modules

The authentication and authorisation mechanism for Java containers [8] (JAAS) supports the definition of authentication policies as part of the configuration of the containers and the descriptors of the application, instead of being coupled to the application code itself. It also allows defining authorisation information such as the roles associated with a user. Modules can be defined to authenticate access to components using most standard mechanisms such as LDAP-based authentication infrastructures, NT authentication, UNIX authentication, as well as support for Shibboleth [10]. In our application, this facility is used to enable secure propagation of authentication information from the submission server to the execution server.
3.4 Messaging

JMS is a messaging service that supports both a point-to-point model, using Queue objects, and a publish/subscribe model, using Topic objects.

JMS providers are responsible for the network protocol that they use to communicate and deliver messages. In particular, the provider we used, JBossMQ, has support for the following communication protocols:

• RMI/JRMP: allows faster communication by pushing notifications to the subscriber, but requires the subscriber to be able to export RMI objects. Being able to export RMI objects adds constraints on the network architecture, as it means that the machine that exports the object must know the IP address or name by which it can be reached by the caller. This is the main reason why this protocol cannot easily be used for WAN deployments if the subscribers to the message service are behind a firewall or belong to a network using NAT.
• HTTP: on the subscriber side, the messaging service regularly pulls information from the messaging provider. This approach solves the issue discussed above, but is not as efficient as sending each notification as it happens.
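To make both models concrete, the sketch below posts a request to a queue and subscribes to a topic through the standard JMS 1.1 API; the JNDI names, destinations and selector property are assumptions rather than the actual configuration.

// Illustrative JMS usage; JNDI and destination names are assumptions.
import javax.jms.*;
import javax.naming.InitialContext;

public class JmsSketch {
    public static void main(String[] args) throws Exception {
        InitialContext jndi = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) jndi.lookup("ConnectionFactory");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Point-to-point: post a job request to the Job queue.
        Queue jobQueue = (Queue) jndi.lookup("queue/Job");
        MessageProducer producer = session.createProducer(jobQueue);
        producer.send(session.createTextMessage("job-42"));

        // Publish/subscribe: listen on the Status topic, filtered by a message selector.
        Topic statusTopic = (Topic) jndi.lookup("topic/Status");
        MessageConsumer consumer = session.createConsumer(statusTopic, "user = 'alice'");
        connection.start();
        Message status = consumer.receive(5000);   // wait up to 5 seconds for a notification

        connection.close();
    }
}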
3.5 Persistence Management for Stateful Objects

This service allows the container to manage the lifecycle and the persistence of stateful objects. The objects are mapped into a relational database using a predefined mapping, and the container is responsible for ensuring consistency between the database instance and the object in memory. This service is provided as Container-Managed Persistence for Entity Beans.

4 Design

4.1 Architecture

The scheduler was designed to be applied to workflow executions. One of the differences between the submission of workflows and the submission of executables and scripts invoked through command lines, as is often the case for job submission, is the size of the workflow description. Potentially, the workflow can represent a complex process, and its entire description needs to be submitted for execution. The system must therefore ensure that the workflow is reliably stored in a database before performing the execution, as the risks of failure at that stage are greater.

The overall architecture is shown in Figure 2. Both Web and thick clients talk to the TaskManagement service, a stateless service implemented by a stateless session bean for job submission, control and some basic level of monitoring. The client also connects to the messaging service to receive monitoring information about the execution. The TaskManagement service is hosted by a container that also provides hosting to the JobEntity bean, a container-managed persistence entity bean stored in a common persistent storage, and access to the JMS service providers for the Job queue, the Job topic and the Status topic. The server hosting this service is also called the submission server.

A JobEntity instance has the following main variables: unique ID, execution status, workflow definition, workflow status information, last status update time, start date, end date and user information.

Execution servers host a pool of message-driven ExecutionHandler beans which, on receiving a message from the Job queue, subscribe to messages added to the Job topic. The pool size represents the maximum number of tasks that the execution server allows to be processed concurrently. The container hosting an execution server also has access to the same persistent storage and messaging service providers as the submission server, and so can host instances of Job entities.

Figure 2: Architecture
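As a sketch of what the JobEntity bean could look like under container-managed persistence, the abstract class below declares accessors for the fields listed above; the field types, method names and creation signature are assumptions, not the actual source.

// Sketch of a container-managed persistence entity bean holding the job state.
import javax.ejb.EntityBean;
import javax.ejb.EntityContext;

public abstract class JobEntityBean implements EntityBean {
    // CMP fields: the container generates the persistent implementation
    // and maps them to the relational database.
    public abstract String getJobId();
    public abstract void setJobId(String id);
    public abstract String getExecutionStatus();
    public abstract void setExecutionStatus(String status);
    public abstract byte[] getWorkflowDefinition();
    public abstract void setWorkflowDefinition(byte[] definition);
    public abstract byte[] getWorkflowStatus();
    public abstract void setWorkflowStatus(byte[] status);
    public abstract long getLastStatusUpdate();
    public abstract void setLastStatusUpdate(long time);
    public abstract long getStartDate();
    public abstract void setStartDate(long date);
    public abstract long getEndDate();
    public abstract void setEndDate(long date);
    public abstract String getUserInfo();
    public abstract void setUserInfo(String info);

    public String ejbCreate(String jobId, byte[] definition, String userInfo) {
        setJobId(jobId);
        setWorkflowDefinition(definition);
        setUserInfo(userInfo);
        setExecutionStatus("SUBMITTED");
        return null;   // for CMP, the container ignores the returned key
    }
    public void ejbPostCreate(String jobId, byte[] definition, String userInfo) {}

    // Standard EntityBean callbacks, left empty for this sketch.
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbLoad() {}
    public void ejbStore() {}
    public void ejbRemove() {}
    public void setEntityContext(EntityContext ctx) {}
    public void unsetEntityContext() {}
}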
4.2 Submission

The following sequence of events happens when a workflow is submitted for execution (see Figure 3):

1. The client submits the workflow to the TaskManagement service.

2. The service then creates a new JobEntity object, which is transparently persisted by the container in the database.

3. It then publishes a request for execution to the Job queue and returns the ID of the JobEntity to the caller.

4. That execution request is picked up by one of the ExecutionHandlers of any execution server, following the allocation policy of the JMS provider.

5. The ExecutionHandler subscribes to the Job topic, selecting only to be notified of messages related to the ID of the JobEntity it must handle.

6. The ExecutionHandler then instantiates the JobEntity object and starts its execution.

Figure 3: Job Submission
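A minimal sketch of steps 2 and 3 on the submission server is given below, assuming a hypothetical local home interface for the JobEntity bean and illustrative JNDI names.

// Sketch of steps 2 and 3 (hypothetical local interfaces and JNDI names).
import javax.ejb.CreateException;
import javax.ejb.EJBLocalHome;
import javax.ejb.EJBLocalObject;
import javax.jms.*;
import javax.naming.InitialContext;

// Hypothetical local view of the JobEntity bean of Section 4.1.
interface JobEntityLocal extends EJBLocalObject {}
interface JobEntityLocalHome extends EJBLocalHome {
    JobEntityLocal create(String jobId, byte[] definition, String user) throws CreateException;
}

public class SubmissionSketch {
    public String submit(byte[] workflowDefinition, String user) throws Exception {
        InitialContext jndi = new InitialContext();

        // Step 2: persist the workflow description first, so that a later failure cannot lose it.
        JobEntityLocalHome jobs = (JobEntityLocalHome) jndi.lookup("java:comp/env/ejb/JobEntity");
        String jobId = java.util.UUID.randomUUID().toString();
        jobs.create(jobId, workflowDefinition, user);

        // Step 3: publish the execution request to the Job queue and return the ID to the caller.
        ConnectionFactory factory = (ConnectionFactory) jndi.lookup("ConnectionFactory");
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue jobQueue = (Queue) jndi.lookup("queue/Job");
            session.createProducer(jobQueue).send(session.createTextMessage(jobId));
        } finally {
            connection.close();
        }
        return jobId;
    }
}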
4.3 Control

The following sequence of events happens when the user sends a control command to the execution handler, such as pause, resume, stop or kill (see Figure 4):

1. The TaskManagement service receives the request for a control command to a given JobEntity ID.

2. If the request to execute that JobEntity is not in the Job queue and its state is running, it posts the control request to the Job topic.

3. The listening ExecutionHandler receives the notification and performs the control action on the JobEntity, which accordingly modifies its execution status.

Figure 4: Job Control
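The sketch below illustrates step 2: the control command is published to the Job topic with the job ID attached as a message property, so that only the ExecutionHandler whose subscription selector matches that ID is notified; the names and property key are assumptions.

// Sketch of posting a control command to the Job topic (illustrative names).
import javax.jms.*;
import javax.naming.InitialContext;

public class ControlSketch {
    public void control(String jobId, String command) throws Exception {
        InitialContext jndi = new InitialContext();
        ConnectionFactory factory = (ConnectionFactory) jndi.lookup("ConnectionFactory");
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Topic jobTopic = (Topic) jndi.lookup("topic/Job");
            TextMessage message = session.createTextMessage(command);   // "pause", "resume", "stop" or "kill"
            message.setStringProperty("jobId", jobId);
            session.createProducer(jobTopic).send(message);
        } finally {
            connection.close();
        }
    }
}
// The ExecutionHandler side subscribes with a selector such as:
//   session.createConsumer(jobTopic, "jobId = '" + jobId + "'");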
4.4 Monitoring

As the scheduler is used to execute workflows which must be monitored from a client tool, the monitoring mechanism needs to support relatively large workflow status information, which can include complex activity-specific objects describing the status of each running activity in the workflow. The sequence of events for monitoring the execution is as follows (see Figure 5):

1. The ExecutionHandler for a running JobEntity regularly requests the latest workflow status information from the running workflow.

2. If that status has changed since the last status update, the state of the JobEntity is updated and the new execution status and workflow status information are published to the Status topic.

3. While the client tool is running, it subscribes to publications on the Status topic for the tasks that have been submitted by the current user, and therefore receives the notification and the associated status information.

The policy followed by the ExecutionHandler to schedule the requests for the latest workflow status information is based on a base period and a maximum period. The status is requested according to the base period, except if the status has not changed, in which case the update period is doubled, up to the maximum period.

Figure 5: Job Monitoring
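A sketch of this adaptive polling policy is given below; it assumes that the period resets to the base period as soon as a change is observed, and the WorkflowExecution interface is a hypothetical stand-in for the running workflow.

// Sketch of the adaptive status-polling loop: poll at the base period, back off
// while the status is unchanged, and cap the period at the maximum period.
public class StatusPoller {
    private final long basePeriodMs;
    private final long maxPeriodMs;

    public StatusPoller(long basePeriodMs, long maxPeriodMs) {
        this.basePeriodMs = basePeriodMs;
        this.maxPeriodMs = maxPeriodMs;
    }

    public void run(WorkflowExecution execution) throws InterruptedException {
        long period = basePeriodMs;
        String lastStatus = null;
        while (execution.isRunning()) {
            String status = execution.currentStatus();
            if (status.equals(lastStatus)) {
                period = Math.min(period * 2, maxPeriodMs);   // back off while nothing changes
            } else {
                period = basePeriodMs;                        // assumed: reset on any change
                lastStatus = status;
                // ... update the JobEntity and publish to the Status topic ...
            }
            Thread.sleep(period);
        }
    }

    // Minimal stand-in for the running workflow, for the purpose of this sketch.
    public interface WorkflowExecution {
        boolean isRunning();
        String currentStatus();
    }
}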
4.5 Failure detection

One problem with the de-coupled architecture presented is that there is no immediate notification that an execution server has failed, as it only communicates through the messaging service, which does not by default provide the application with information about its subscribers. The approach used to detect failures of the execution server is to check regularly, on the server hosting the TaskManagement service, the last status update time of all running JobEntity instances. If that update time is significantly above the maximum update period, then that job is stopped, killed if necessary, and restarted.
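A sketch of such a check is given below; the JobStore interface, the grace factor of 3 and the resubmission call are illustrative assumptions.

// Sketch of the periodic failure-detection check run on the submission server.
import java.util.List;

public class FailureDetector {
    public interface JobStore {
        List<JobRecord> findRunningJobs();
        void stopAndResubmit(String jobId);
    }
    public static class JobRecord {
        String jobId;
        long lastStatusUpdateMs;
    }

    private final JobStore store;
    private final long maxUpdatePeriodMs;
    private final long graceFactor = 3;   // assumed meaning of "significantly above" the maximum period

    public FailureDetector(JobStore store, long maxUpdatePeriodMs) {
        this.store = store;
        this.maxUpdatePeriodMs = maxUpdatePeriodMs;
    }

    public void check() {
        long now = System.currentTimeMillis();
        for (JobRecord job : store.findRunningJobs()) {
            if (now - job.lastStatusUpdateMs > graceFactor * maxUpdatePeriodMs) {
                // Stop or kill the stale job and post it back to the Job queue.
                store.stopAndResubmit(job.jobId);
            }
        }
    }
}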
4.6 Security

In order to make sure that a workflow runs in the correct context and has the correct associated roles and authorisations, the ExecutionHandler needs to impersonate the user who submitted the workflow. This is implemented by a specific JAAS module that enables that impersonation to happen for a particular security policy used by the ExecutionHandler to log in.
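The sketch below shows how such impersonation can be driven through the standard JAAS API; the "Impersonation" login-configuration entry, the name-only callback and the absence of credential handling are simplifying assumptions, not the actual module.

// Sketch of JAAS-based impersonation of the submitting user.
import javax.security.auth.Subject;
import javax.security.auth.callback.*;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;
import java.security.PrivilegedAction;

public class ImpersonationSketch {
    public void runAs(final String submittingUser, final Runnable work) throws LoginException {
        CallbackHandler handler = new CallbackHandler() {
            public void handle(Callback[] callbacks) throws UnsupportedCallbackException {
                for (Callback callback : callbacks) {
                    if (callback instanceof NameCallback) {
                        ((NameCallback) callback).setName(submittingUser);
                    } else {
                        throw new UnsupportedCallbackException(callback);
                    }
                }
            }
        };
        // "Impersonation" stands for the login-configuration entry backed by the custom module.
        LoginContext context = new LoginContext("Impersonation", handler);
        context.login();
        Subject.doAs(context.getSubject(), new PrivilegedAction<Void>() {
            public Void run() {
                work.run();   // execute the workflow under the submitting user's identity
                return null;
            }
        });
    }
}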
4.7 Scheduling Policy

The scheduling policy is defined by the way the JMS service provider decides which ExecutionHandler should receive requests added to the Job queue. Several policies are provided by default with the JMS provider that was used; in particular, we used a simple round-robin policy at first.

In order to integrate at that stage with a resource management tool such as the Grid Engine [11], the scheduling policy was extended. To find out which subscriber to use, we assumed that the set of execution servers was the same as the set of resources managed by the grid engine, submitted a request to execute the command hostname, and used the information returned to choose the relevant subscriber.

Although these policies can be sufficient in general, for our application to workflow execution another policy was created to make sure that we use the resource that holds any intermediate results that have already been produced in the workflow, in order to optimise its execution. In this case, the policy has to check the workflow description associated with the request, find any intermediate results associated with any activity, and then decide to use the corresponding server if possible.
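The decision logic can be summarised by the provider-neutral sketch below; this is not the JMS provider's actual extension interface, and ResourceManager and WorkflowDescription are hypothetical stand-ins for the Grid Engine integration and the intermediate-result lookup.

// Provider-neutral sketch of the scheduling decision described above (hypothetical interfaces).
import java.util.List;

public class SchedulingPolicySketch {
    public interface ResourceManager {
        // Ask the resource management tool which host should run the next job,
        // e.g. by submitting a trivial hostname job and reading back its output.
        String selectHost(List<String> candidateHosts);
    }
    public interface WorkflowDescription {
        // Host that already holds intermediate results for this workflow, or null.
        String hostHoldingIntermediateResults();
    }

    private final ResourceManager resourceManager;

    public SchedulingPolicySketch(ResourceManager resourceManager) {
        this.resourceManager = resourceManager;
    }

    public String chooseExecutionServer(WorkflowDescription workflow, List<String> subscriberHosts) {
        // Prefer the server that already holds intermediate results for this workflow.
        String preferred = workflow.hostHoldingIntermediateResults();
        if (preferred != null && subscriberHosts.contains(preferred)) {
            return preferred;
        }
        // Otherwise delegate the decision to the resource management tool.
        return resourceManager.selectHost(subscriberHosts);
    }
}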
5 Deployment

The persistence service provider used is HSQL [5]. It provides support for the persistence of basic types as well as Java objects. The messaging service is JBossMQ [7]. Messages are stored, if needed, in the same HSQL database instance used for the persistence of container-managed entity objects.

5.1 Campus Grid scheduler

The first deployment of the scheduler uses protocols that are only suitable over an open network without communication restrictions or network address translation (NAT) in some parts. This is usually the case for deployments inside an organisation. The RMI protocol can be used for simplicity and efficiency, and direct connection to, and notification of, the client tool can be performed without the risk of network configuration issues. This setup is described in Figure 6.

Figure 6: Deployment as Campus Grid Scheduler

5.2 Scheduling over WAN

The second deployment of the scheduler uses HTTP tunnelling for method calls and an HTTP-based polling mechanism for queue and topic subscribers, as shown in Figure 7. The main advantage is that the client does not need an IP address that is accessible by the messaging service, as there is no direct call-back. This means that although the client tool, in our application, performs rich interactions with the workflow, it does not have to be on the same network as the task management server.

The execution servers also do not need to have a public IP, which makes it theoretically possible, given the right software delivery mechanism for the execution server, to use the scheduler in configurations such as the one supported by the SETI@home [2] scheduler.

Other configuration settings need to be modified to support such a deployment. In particular, the time-out and retry values for connections to the persistence manager and the messaging service need to be increased, as network delays or even network failures are more likely.

Figure 7: WAN Deployment
6 Evaluation

6.1 Functional Evaluation

We evaluate each element of the architecture with respect to its scalability and robustness properties:

• Task Management Service: this service is stateless and can therefore easily be made highly available through standard load-balancing and clustering techniques. If it fails, the system can carry on processing the jobs already submitted. Only the clients currently connected to it will not be able to submit and control workflows, and the check for failed execution servers will not be performed.

• Execution Server: the number of concurrent execution servers is only limited by the maximum number of subscribers that the messaging service supports and the maximum number of connections that the persistence provider can handle. In case of failure, the job will not be lost: once the failure is detected, the task is resubmitted to the queue.

• Messaging Service Provider: this is a service provided to our implementation. Its robustness and scalability characteristics depend on its implementation and on the database it uses for persistence.

• Persistence Service Provider: as for the previous provider, the robustness and scalability characteristics of the database vary with providers. While we used HSQL, which does not provide specific failover, scalability or high-availability features, there are many database vendors providing such capabilities.

The main potential bottlenecks needing further investigation are the behaviour of the messaging service with an increasing number of subscribers, in particular execution servers, as well as with growing sizes of workflow descriptions.
6.2 Experimental Evaluation

We have implemented and tested the scheduler using a variety of Discovery Net bioinformatics and cheminformatics application workflows. A complete empirical evaluation of the scheduling is beyond the scope of this paper, since it would have to take into account the characteristics of the application workflows themselves. However, in this section we provide a brief overview of the experimental settings used to test and evaluate the scheduler implementation.

The scheduler was deployed for testing over an ad-hoc and heterogeneous set of machines. The submission server was hosted on a Linux server where the persistence and messaging services were also held. The execution servers were running on a set of 15 single-processor Windows desktop machines on the same organisation's network, without restrictions. The scheduler sustained constant overnight submission and execution of workflows, each execution server handling a maximum of 3 concurrent executions, without apparent bottlenecks.

It has also been deployed over a cluster of 12 IBM Blade servers running Linux, where the execution servers run on a private network not accessible by the client machine.

Finally, it was also deployed over WAN and networks with NAT. The client was hosted on the organisation's internal network in the UK, with only a private IP address. The submission server was hosted in the US on a machine that had a public IP address. The execution servers were hosted on a private network in the US, although directly accessible by the submission server. Even though the use of tunnelling and pull mechanisms does affect the overall feel of the client application, in terms of latency when submitting and monitoring jobs and particularly when visualising results, because of the increased communication overheads, it does not affect the performance of the workflow execution itself, which is the main concern.

7 Comparison

The Java CoG Kit [12] is a Java wrapper around the Globus toolkit and provides a range of functionalities for Grid applications, including job submission. It is therefore based on native process execution, while our approach is to distribute executions of entities that run in a Java hosting environment. However, the Java CoG Kit could be used in the proposed scheduler as a way to implement the scheduling policy for the Job queue, but not to submit the jobs directly.

The Grid Application Toolkit [9] has wrappers for Java called JavaGAT. This interface is a wrapper over the main native GAT engine, which itself aims to provide a consistent interface layer above several Grid infrastructures such as Globus, Condor and Unicore. Again, the difference with our approach is the reliance on the submission of native process executions over the grid, instead of handling that process at a higher level and leaving the scheduling, potentially, to a natively implemented Grid or resource management service.

ProActive [6] takes a Java-oriented approach by providing a library for parallel, distributed and concurrent computing that interoperates with several Grid standards. While it shares the same general approach as ours, we based our engineering on commodity Java messaging and persistence services, so that the robustness of the system and the network protocols it uses depend mainly on these services rather than on our own implementation.
8 Conclusion

Being able to build a robust Java-based scheduler on commodity services could enable a wider range of Grid applications to benefit from the rich framework provided by Java application servers, and help to simplify their implementation. This paper presents a possible way to implement such a scheduler in this framework, as well as its deployment and some of its robustness characteristics.

9 Acknowledgements

The authors would like to thank the European Commission for funding this research through the SIMDAT project.

References

[1] S. AlSairafi, F.-S. Emmanouil, M. Ghanem, N. Giannadakis, Y. Guo, D. Kalaitzopoulos, M. Osmond, A. Rowe, J. Syed, and P. Wendel. The design of Discovery Net: Towards open grid services for knowledge discovery. International Journal of High Performance Computing Applications, 17, Aug. 2003.

[2] D. P. Anderson, J. Cobb, E. Korpela, M. Lebofsky, and D. Werthimer. SETI@home: an experiment in public-resource computing. Communications of the ACM, 45(11):56-61, 2002.

[3] Enterprise JavaBeans Technology. http://java.sun.com/products/ejb.

[4] M. Hapner, R. Burridge, R. Sharma, J. Fialli, and K. Stout. Java Message Service. Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, CA 94303, USA, 2002.

[5] HSQL. http://www.hsqldb.org.

[6] F. Huet, D. Caromel, and H. E. Bal. A high performance Java middleware with a real application. In SC'2004 Conference CD, Pittsburgh, PA, Nov. 2004. IEEE/ACM SIGARCH.

[7] JBossMQ. http://www.jboss.com/products/messaging.

[8] C. Lai, L. Gong, L. Koved, A. Nadalin, and R. Schemers. User authentication and authorization in the Java platform. In Proceedings of the 15th Annual Computer Security Applications Conference, pages 285-290, Scottsdale, Arizona, Dec. 1999. IEEE Computer Society Press.

[9] E. Seidel, G. Allen, A. Merzky, and J. Nabrzyski. GridLab: a grid application toolkit and testbed. Future Generation Computer Systems, 18(8):1143-1153, Oct. 2002.

[10] Shibboleth-aware Portals and Information Environments (SPIE) project. http://spie.oucs.ox.ac.uk/.

[11] Sun Grid Engine. http://gridengine.sunsource.net.

[12] G. von Laszewski, I. T. Foster, J. Gawor, and P. Lane. A Java commodity grid kit. Concurrency and Computation: Practice and Experience, 13(8-9):645-662, 2001.