Toward an Interactive Grid Adaptive Resource Broker

Abdulla Othman
Peter Dew
Karim Djemame
Iain Gourlay
School of Computing
University of Leeds
Leeds, LS2 9JT, UK
Emails: {othman, dew, karim, iain}@comp.leeds.ac.uk
Abstract
This paper presents an approach used as a basis for interactive system adaptation in which Grid jobs are
maintained at runtime. Most Grid research on resource brokers assumes that the broker’s task,
other than monitoring the job status, is over once the job has been submitted, i.e. the user submits the job
to the broker or the scheduler and simply waits for the results. So far little attention has been given to
interactive jobs. The idea behind an interactive job running on the Grid is that the user may, for example,
with the push of a button, change an attribute of the job during run-time. Additionally, the broker may
migrate a job to another resource on behalf of the user if the status of a resource changes significantly
during run-time. A solution using a reflective technique is proposed here. The benefit of a
reflective technique is that it is easy to upgrade in order to adapt to changes in application requirements.
It can also provide the flexibility to customise policies dynamically to suit the run-time environment, and high-level transparency to applications.
The design of an Adaptive Resource Broker is described and experimentally evaluated. Results indicate
that this approach enhances the likelihood of timely job completion in a dynamic Grid system.
1. Introduction
Grid computing infrastructures offer a wide
range of distributed resources to applications
[Fos01]. To support application execution in the
context of the Grid, a Grid Resource Broker is
desirable. Grid Resource Brokering is defined as
the process of making scheduling decisions
involving resources over multiple administrative
domains [Sch02]. This can include searching
multiple administrative domains to use a single
machine or scheduling a single job to use
multiple resources at a single site or multiple
sites. A Grid broker must make resource
selection decisions in an environment where it
has no control over the local resources. The
resources are distributed, and information about
the resources is often limited or dated.
Moreover a Grid system must integrate
heterogeneous resources with varying quality
and availability. This places importance on the
system’s ability to monitor the state of these
resources and adapt to changes in their
availability over time. The Grid is a dynamic
system where resources are subjected to changes
due to system performance degradation, new
resources, system failure, etc.
There is much ongoing work in the Grid to
provide access to resources for applications, e.g.
Nimrod/G, AppLeS, and Condor-G [Buy00,
Ber96, Fre02]. These brokers provide
monitoring of the application and the user can
view this information. However, these brokers do
not enable alterations to be made to the
application during run-time. Such brokers can be
considered “static”, in the sense that the broker
does not intervene in the running of the
application during run-time. This paper discusses
a “dynamic” broker, i.e. a broker that enables
interaction with the application during run-time.
This can include, e.g., automatic job migration
when the status of a resource changes (e.g.
another user submits a job to the same resource)
and enabling the user to change a job attribute
during run-time. The user will be able to view
application status, resources used by the
application, predicted completion time etc.
Specifically, the paper presents the design and
implementation of an Adaptive Resource Broker.
Adaptability is crucial in the context of the Grid,
as it has the potential to significantly improve the
performance of global computing applications.
An adaptive application can change its behaviour
depending on available resources, optimising
itself to its dynamic environment [Fri94]. For
example, when resource load changes, a system
could seek to improve the quality of its compute
resources or relocate to another compute
resource. Adaptation can be implemented in an
ad-hoc fashion by embedding adaptability in the
application’s code. While this may work for local
adaptation (i.e. for a single node), it does not
work well for global adaptation (e.g. a
multi-institutional virtual working environment) or in
cases where multiple adaptation operations have
to be coordinated. It also complicates both the
application and adaptation code and makes the
reuse of adaptation strategies impossible.
Rather than the ad-hoc approach explained
above, this paper presents a solution in which a
reflective technique is used to simplify run-time
adaptation in the Grid application. The design of
the Resource Broker is described and
experimentally evaluated. The experiment
involves a time-constrained application with
requirements specified by the user. The CPU
usage of the application is monitored and the
Resource Broker can migrate the job if it is
anticipated that it will not otherwise meet the
user’s time constraint.
The reflective technique aims to separate
concerns for functional and non-functional
behaviour of a program, resulting in some code
that is highly reusable and extensible [Bla99]. In
this case, the aim is to address the concerns
relating to dynamic adaptability without affecting
existing user applications. Experimental results
indicate that the approach adopted in this paper
enhances the likelihood of timely job completion
in a dynamic Grid environment.
2. Adaptive Resource Broker Design
This section describes the design of the Adaptive
Resource Broker, which is enhanced by adding
the functionality to enable adaptability. The
Resource Broker performs a number of basic
functions. The first step is the discovery and
selection of resources that best fit the needs of
the Grid application. The broker will then submit
jobs in the application to the chosen machines.
The broker thus handles submission of jobs but
not how the job is actually executed on the
resource. Execution is handled by the Resource
Management System (RMS) that resides on the
resource involved. The discovery, selection and
submission steps are referred to
as scheduling in the Resource Broker. Once jobs
are being executed (or waiting to be executed),
the broker monitors the resources and the
progression of the jobs.
Figure 1 depicts the proposed design of an
adaptable broker. The broker comprises
basic components implementing Resource
Discovery & Selection, Dispatching, Monitoring,
and the Adapter Manager.
The Adapter Manager controls migration, which
is supported by job monitoring and enabled by
re-scheduling and check-pointing. The broker
gathers dynamic information about the resources
during run-time, such as accessibility, system
workload and performance. Dynamic
information (e.g. performance slowdown, target
system failure, job cancellation) is
reported to the monitor.
The monitor provides predictive information to
the Adapter Manager, which uses this
information to make a decision as to whether job
migration is required. The main task for the
adapter manager is to ensure the job requirements
are fulfilled.
Figure 1: Proposed Broker Design
Adaptability is implemented in the broker in such
a way as to isolate the user from the complexities
of the system. In particular, the user is not
obliged to alter his/her code in order to achieve
adaptability. This is achieved using reflection. A
reflective system, as defined by [Mea87], is one
that is capable of reasoning about itself (besides
reasoning about the application domain). The
benefits of using reflection are [Bla98, Cou01]:
• Flexibility to customise policies dynamically to
suit run-time environment.
• High-level transparency to applications.
Reflection is used to bind the broker components
to the application object. This ensures a clean
separation between the application and the
broker. Hence the broker components are
transparent to the application. This use of
reflection is the basis of the Adaptive Resource
Broker implementation. The reflective technique
is implemented using OpenJava [Tat00, Tat99].
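The binding described above can be illustrated with a minimal sketch. It is written in Python rather than OpenJava, and it uses a run-time proxy rather than OpenJava's compile-time metaobjects, so it is an analogy and not the implementation used in this paper. The names `BrokerProxy`, `Monitor` and `Simulation` are hypothetical.

```python
import time


class Monitor:
    """Hypothetical broker component that records intercepted calls."""

    def __init__(self):
        self.events = []

    def record(self, method_name, elapsed_seconds):
        self.events.append((method_name, elapsed_seconds))


class BrokerProxy:
    """Wraps an application object and intercepts its method calls.

    The application class is left unchanged; all broker logic lives
    in this wrapper, which plays the role of the meta level.
    """

    def __init__(self, target, monitor):
        self._target = target
        self._monitor = monitor

    def __getattr__(self, name):
        # Invoked only for attributes that the proxy itself lacks.
        attribute = getattr(self._target, name)
        if not callable(attribute):
            return attribute

        def intercepted(*args, **kwargs):
            start = time.perf_counter()
            try:
                return attribute(*args, **kwargs)
            finally:
                # Report to the monitor even if the call raises.
                self._monitor.record(name, time.perf_counter() - start)

        return intercepted


class Simulation:
    """Stand-in for a user application with no broker-specific code."""

    def step(self, n):
        return sum(i * i for i in range(n))


monitor = Monitor()
job = BrokerProxy(Simulation(), monitor)
result = job.step(1000)
```

The point of the sketch is that `Simulation` contains no monitoring code, yet every call made through the proxy is reported to the monitor, which mirrors the separation between application and broker described in this section.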
3. Experiments and Results
The experiments presented in this section involve
the submission of jobs, with requirements
specified by the user, to the adaptive broker.
While a job is running, other jobs may be
submitted to the same resource(s). The results
obtained are compared to the case where the
adaptive middleware is not used. In particular,
the experiments address the following questions:
1. When job requirements are not met, are jobs
being successfully migrated?
2. Does this result in shorter job execution time,
compared to the case when the adaptive
middleware is not used?
The experiments ran on a Grid test-bed
consisting of 10 machines. Each machine has a
Pentium IV processor (1.2 GHz) with 256 MB of RAM.
The operating system is Linux 2.4. All machines
have Globus 2.2 installed. Communication
between resources is via a fast Ethernet LAN.
Figure 2: CPU usage during run-time in
resource 3 (CPU Usage and CPU Mean Usage versus Time Elapsed in minutes)
Question (1) above is addressed as follows. A job
is submitted to one of the resources. The
Adaptive Resource Broker monitors the job
during run-time, keeping track of the current and
mean CPU usage. If the mean CPU usage
decreases significantly then the broker should
identify this and migrate the job to another
resource. This is tested by overloading the
resource on which the job is running by
submitting another job. The job total execution
time (TE) is recorded. Question (2) is addressed
by comparing TE with the total execution time
when the adaptive middleware is not used.
The resources are labelled 1 to 10. The
broker submits the job to a resource that is
expected to be able to execute the job within the
user’s time-constraint (81 minutes: the time the
job is expected to take at 90% CPU usage).
Hence a resource with a high percentage (>90%)
of free CPU is chosen, in this case resource 3.
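The selection step above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Resource` type, the `select_resource` function and the free-CPU values are hypothetical, and choosing the resource with the most free CPU is an assumption. The text only states that a resource with more than 90% free CPU is chosen.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resource:
    label: int       # resources are labelled 1 to 10 in the experiment
    free_cpu: float  # fraction of CPU currently free, from 0.0 to 1.0


def select_resource(resources: Sequence[Resource],
                    min_free_cpu: float = 0.90) -> Optional[Resource]:
    """Return the candidate with the most free CPU above the threshold.

    Returns None when no resource qualifies.
    """
    candidates = [r for r in resources if r.free_cpu > min_free_cpu]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.free_cpu)


# Hypothetical snapshot of free CPU; these values are not measurements.
snapshot = [
    Resource(1, 0.40), Resource(2, 0.85), Resource(3, 0.97),
    Resource(4, 0.93), Resource(5, 0.10),
]
chosen = select_resource(snapshot)
```

With this invented snapshot the function returns resource 3, matching the choice reported in the experiment.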
Figure 2 depicts the fractional CPU usage of the
application during run-time. This information is
provided by the monitor. Specifically, the actual
CPU usage over time is shown, in addition to the
mean CPU usage, averaged over the time
elapsed. This information is used to determine
whether a job migration is required. As shown in
Figure 2, the job executes normally (i.e. with
close to full CPU usage) for about 30 minutes.
The CPU usage then decreases sharply. Hence
the resource is no longer expected to finish the
job within the specified time constraint. The
broker therefore has to take action and restart
the job on another resource.
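The migration decision described above can be sketched as follows. The paper does not give the broker's exact prediction rule, so the rule here is an assumption: the work done so far is the mean CPU usage multiplied by the time elapsed, and the remaining work is assumed to proceed at the most recent usage level. The names `CpuMonitor`, `predicted_total_minutes` and `should_migrate` are hypothetical, as are the sample values. The required work of 0.9 × 81 CPU-minutes is derived from the stated time constraint.

```python
from typing import List


class CpuMonitor:
    """Tracks one CPU usage sample per minute (fractions from 0 to 1)."""

    def __init__(self) -> None:
        self.samples: List[float] = []

    def add_sample(self, usage: float) -> None:
        self.samples.append(usage)

    @property
    def elapsed_minutes(self) -> int:
        return len(self.samples)

    @property
    def mean_usage(self) -> float:
        # Mean CPU usage averaged over the time elapsed so far.
        return sum(self.samples) / len(self.samples)

    @property
    def current_usage(self) -> float:
        return self.samples[-1]


def predicted_total_minutes(monitor: CpuMonitor,
                            required_cpu_minutes: float) -> float:
    """Estimate total run time if the current usage level persists."""
    done = monitor.mean_usage * monitor.elapsed_minutes
    remaining = max(required_cpu_minutes - done, 0.0)
    if remaining == 0.0:
        return float(monitor.elapsed_minutes)
    if monitor.current_usage <= 0.0:
        return float("inf")  # a stalled job never finishes
    return monitor.elapsed_minutes + remaining / monitor.current_usage


def should_migrate(monitor: CpuMonitor, required_cpu_minutes: float,
                   deadline_minutes: float) -> bool:
    predicted = predicted_total_minutes(monitor, required_cpu_minutes)
    return predicted > deadline_minutes


DEADLINE = 81.0                  # user's time constraint in minutes
REQUIRED = 0.9 * DEADLINE        # CPU-minutes of work, derived from the text

monitor = CpuMonitor()
for _ in range(30):              # about 30 minutes of near-full usage
    monitor.add_sample(1.0)
before_drop = should_migrate(monitor, REQUIRED, DEADLINE)

monitor.add_sample(0.3)          # illustrative sharp drop in CPU usage
after_drop = should_migrate(monitor, REQUIRED, DEADLINE)
```

In this sketch no migration is requested during the first 30 minutes, and the first low sample pushes the predicted completion time past the 81-minute constraint. A real broker would probably smooth over several samples before migrating, to avoid reacting to a short spike.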
The broker chose resource 4, and the job
resumed from where it stopped on resource 3. Figure 3
shows the CPU usage during execution on the
new resource. As a consequence of the use of
reflective middleware, the job is adaptable to the
environment and continues to meet the user
requirements. The total execution time was 74
minutes.
The same job was also executed without the reflective
technique; in this case the job exceeded its
time constraint, with a total execution time
of 125 minutes. This results from the fact that
there was another job running on the same
resource during the job execution. Referring to
the questions posed at the beginning of this
section, it has been shown through experiments
that the adaptive middleware successfully
supports job migration. This results in a reduction
in the job execution time compared to the case
where adaptive middleware is not used.
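Using only the two totals reported above and the 81-minute time constraint, the size of the reduction can be worked out directly:

```latex
\begin{align*}
T_{\text{static}} - T_{\text{adaptive}} &= 125 - 74 = 51 \text{ minutes} \\
\frac{T_{\text{static}} - T_{\text{adaptive}}}{T_{\text{static}}} &= \frac{51}{125} \approx 41\% \\
81 - T_{\text{adaptive}} &= 7 \text{ minutes inside the constraint} \\
T_{\text{static}} - 81 &= 44 \text{ minutes beyond the constraint}
\end{align*}
```

These figures come from a single job on a single test-bed run, so they indicate the size of the effect in this experiment rather than a general speed-up.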
Figure 3: CPU usage during run-time in
resource 4 (CPU Usage and CPU Mean Usage versus Time Elapsed in minutes)
4. Conclusions and Future Work
This paper describes an Adaptive Resource
Broker that enables application configuration and
adaptation based on resource characteristics and
user preferences. Reflective middleware is
proposed, controlling each aspect of an
application in a different program. The reflective
middleware permits run-time mechanisms to
automatically decide when and how to adapt the
application in reaction to changes in resource
conditions.
We have shown that our Adaptive Resource
Broker is a viable contender for use in future
Grid implementations. This is supported by the
experimental results obtained on a Grid testbed.
Future work will focus on extending the current
broker in order to support interactive jobs, where
the user can read the job intermediate outputs,
suspend it, change some input parameters and
migrate it. During the whole session a bidirectional channel is opened between the user
client and the job running on a remote machine.
The entire job input and output streams are
exchanged with the user client via this channel.
The user sends input data to the job and receives
output results and error messages.
5. References
[Mea87] P. Maes. Concepts and Experiments in
Computational Reflection. In Proceedings of
OOPSLA’87, pp. 147-155. ACM. October 1987.
[Bla98] G. Blair and G. Coulson. The Case
for Reflective Middleware. Internal Report
No MPG-98-38, Distributed Multimedia
Research Group. Lancaster University. 1998.
[Bla99] G. Blair, F. Costa and G. Coulson.
Experiments with Reflective Middleware. In
Proceedings of ECOOP'98 Workshop on
Reflective Object Oriented Programming and
Systems, Springer Verlag, 1998. Internal Report
No MPG-98-11, Distributed Multimedia
Research Group. Lancaster University. 1999.
[Ber96] F. Berman, R. Wolski, S. Figueira, J.
Schopf and G. Shao. Application-Level Scheduling
on Distributed Heterogeneous Networks.
Supercomputing'96, November 1996. Also
UCSD CS Tech Report #CS96-482.
[Buy00] R. Buyya, D. Abramson and J. Giddy.
Nimrod/G: An Architecture for a Resource
Management and Scheduling System in
a Global Computational Grid. In Proceedings of
the 4th International Conference on High
Performance Computing in Asia-Pacific Region,
Beijing, China, IEEE Computer Society Press,
USA, 2000.
[Cou01] G. Coulson. What is Reflective
Middleware? IEEE Distributed Systems Online.
http://boole.computer.org/dsonline/middleware/RMarticle1.htm
[Fos01] I. Foster, C. Kesselman and S. Tuecke.
The Anatomy of the Grid: Enabling Scalable
Virtual Organizations. International Journal of
Supercomputer Applications, 15(3), 2001.
[Fre02] J. Frey, T. Tannenbaum, I. Foster, M.
Livny and S. Tuecke. Condor-G: A Computation
Management Agent for Multi-Institutional Grids.
Cluster Computing, Vol. 5, No. 3, pp. 237-246,
2002.
[Fri94] N. Friedman and K. Lieberherr. Reuse of
Adaptive Software through Opportunistic
Parameterization. Technical Report NU-CCS-94-17,
Northeastern University, May 1994.
[Hsi96] W. Hsinho, C. Chaofeng and F. Taylor. A
Multiprocessor Self-Organizing Task Scheduler.
http://www.hsdal.ufl.edu/Projects/Osculant/osc06961.html
[Sch02] J. Schopf. A General Architecture for
Scheduling on the Grid. Submitted to the Journal
of Parallel and Distributed Computing, 2002.
[Tat99] M. Tatsubori. An Extension Mechanism
for the Java Language. Master of Engineering
Dissertation, Graduate School of Engineering,
University of Tsukuba, Ibaraki, Japan, Feb. 1999.
[Tat00] M. Tatsubori, S. Chiba, M-O. Killijian
and K. Itano. OpenJava: A Class-Based Macro
System for Java. In Reflection and Software
Engineering, W. Cazzola, R.J. Stroud, F. Tisato
(Eds.), Lecture Notes in Computer Science 1826,
pp. 117-133, Springer-Verlag, 2000.