Toward an Interactive Grid Adaptive Resource Broker

Abdulla Othman, Peter Dew, Karim Djemame, Iain Gourlay
School of Computing, University of Leeds, Leeds, LS2 9JT, UK
Emails: {othman, dew, karim, iain}@comp.leeds.ac.uk

Abstract

This paper presents an approach that serves as a basis for interactive system adaptation, in which Grid jobs are maintained at run-time. Most Grid research on resource brokers assumes that the broker's task, other than monitoring the job status, is over once the job has been submitted, i.e. the user submits the job to the broker or the scheduler and simply waits for the results. So far little attention has been given to interactive jobs. The idea of an interactive job running on the Grid is that the user may, for example, change an attribute of the job during run-time with the push of a button. Additionally, the broker may migrate a job to another resource on behalf of the user if the status of a resource changes significantly during run-time. A solution based on a reflective technique is proposed here. The benefit of a reflective technique is that it is easy to upgrade in order to adapt to changes in application requirements. It also provides the flexibility to customise policies dynamically to suit the run-time environment, and high-level transparency to applications. The design of an Adaptive Resource Broker is described and experimentally evaluated. Results indicate that this approach enhances the likelihood of timely job completion in a dynamic Grid system.

1. Introduction

Grid computing infrastructures offer a wide range of distributed resources to applications [Fos01]. To support application execution in the context of the Grid, a Grid Resource Broker is desirable. Grid Resource Brokering is defined as the process of making scheduling decisions involving resources over multiple administrative domains [Sch02].
This can include searching multiple administrative domains to use a single machine, or scheduling a single job to use multiple resources at a single site or at multiple sites. A Grid broker must make resource selection decisions in an environment where it has no control over the local resources. The resources are distributed, and information about them is often limited or out of date. Moreover, a Grid system must integrate heterogeneous resources of varying quality and availability. This places importance on the system's ability to monitor the state of these resources and adapt to changes in their availability over time. The Grid is a dynamic system in which resources are subject to change due to performance degradation, the arrival of new resources, system failure, etc.

There is much ongoing work in the Grid to provide access to resources for applications, e.g. Nimrod/G, AppLeS, and Condor-G [Buy00, Ber96, Fre02]. These brokers monitor the application, and the user can view this information. However, they do not enable alterations to be made to the application during run-time. Such brokers can be considered "static", in the sense that the broker does not intervene in the running of the application during run-time.

This paper discusses a "dynamic" broker, i.e. a broker that enables interaction with the application during run-time. This can include, e.g., automatic job migration when the status of a resource changes (e.g. another user submits a job to the same resource) and enabling the user to change a job attribute during run-time. The user will be able to view the application status, the resources used by the application, the predicted completion time, etc. Specifically, the paper presents the design and implementation of an Adaptive Resource Broker.

Adaptability is crucial in the context of the Grid, as it has the potential to significantly improve the performance of global computing applications.
An adaptive application can change its behaviour depending on available resources, optimising itself to its dynamic environment [Fri94]. For example, when resource load changes, a system could seek to improve the quality of its compute resources or relocate to another compute resource. Adaptation can be implemented in an ad-hoc fashion by embedding adaptability in the application's code. While this may work for local adaptation (i.e. for a single node), it does not work well for global adaptation (e.g. a multi-institutional virtual working environment) or in cases where multiple adaptation operations have to be coordinated. It also complicates both the application and the adaptation code, and makes the reuse of adaptation strategies impossible.

Rather than the ad-hoc approach explained above, this paper presents a solution in which a reflective technique is used to simplify run-time adaptation in the Grid application. The design of the Resource Broker is described and experimentally evaluated. The experiment involves a time-constrained application with requirements specified by the user. The CPU usage of the application is monitored, and the Resource Broker can migrate the job if it is anticipated that it will not otherwise meet the user's time constraint. The reflective technique aims to separate the concerns for functional and non-functional behaviour of a program, resulting in code that is highly reusable and extensible [Bla99]. In this case, the aim is to address the concerns relating to dynamic adaptability without affecting existing user applications. Experimental results indicate that the approach adopted in this paper enhances the likelihood of timely job completion in a dynamic Grid environment.

2. Adaptive Resource Broker Design

This section describes the design of the Adaptive Resource Broker, which extends a Resource Broker with the functionality needed to enable adaptability. The Resource Broker performs a number of basic functions.
The first step is the discovery and selection of resources that best fit the needs of the Grid application. The broker then submits the jobs in the application to the chosen machines. The broker thus handles the submission of jobs, but not how a job is actually executed on the resource; that is the responsibility of the Resource Management System (RMS) that resides on the resource involved. These actions are referred to as scheduling in the Resource Broker. Once jobs are being executed (or are waiting to be executed), the broker monitors the resources and the progress of the jobs.

Figure 1 depicts the proposed design of an adaptable broker. The broker comprises basic components implementing Resource Discovery & Selection, Dispatching, Monitoring, and the Adapter Manager. The Adapter Manager controls migration, which is supported by job monitoring and enabled by re-scheduling and check-pointing. The broker gathers dynamic information about the resources at run-time, such as accessibility, system workload and performance. Dynamic information (e.g. performance slowdown, target system failure, job cancellation, etc.) is reported to the monitor. The monitor provides predictive information to the Adapter Manager, which uses this information to decide whether job migration is required. The main task of the Adapter Manager is to ensure that the job requirements are fulfilled.

Figure 1: Proposed Broker Design

Adaptability is implemented in the broker in such a way as to isolate the user from the complexities of the system. In particular, the user is not obliged to alter his/her code in order to achieve adaptability. This is achieved using reflection. A reflective system, as defined by [Mea87], is one that is capable of reasoning about itself (besides reasoning about the application domain). The benefits of using reflection are [Bla98, Cou01]:
• Flexibility to customise policies dynamically to suit the run-time environment.
• High-level transparency to applications.
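The separation that reflection provides can be sketched in a few lines. The example below is only an illustration in Python, not the OpenJava mechanism used in this work, and the names `Job`, `Monitored` and `events` are hypothetical. It shows a meta-level wrapper that intercepts calls on an unmodified application object and reports them to the broker side, so the application class itself carries no monitoring or adaptation code.

```python
import time


class Job:
    """Plain application class; it contains no monitoring or adaptation code."""

    def __init__(self, steps):
        self.steps = steps
        self.done = 0

    def run_step(self):
        self.done += 1
        return self.done


class Monitored:
    """Hypothetical meta-level wrapper that intercepts calls on a base object."""

    def __init__(self, target, events):
        self._target = target
        self._events = events

    def __getattr__(self, name):
        # Python calls this only for attributes not found on the wrapper itself,
        # so every access to the application's members passes through here.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def intercepted(*args, **kwargs):
            start = time.perf_counter()
            result = attr(*args, **kwargs)
            # Report the call to the broker side (here simply a list).
            self._events.append((name, time.perf_counter() - start))
            return result

        return intercepted


events = []
job = Monitored(Job(steps=3), events)
while job.done < job.steps:
    job.run_step()
```

After the loop, `events` holds one record per intercepted `run_step` call, while `Job` remains unaware that it was observed. A different policy could be attached by swapping the wrapper rather than editing the application.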
Reflection is used to bind the broker components to the application object. This ensures a clean separation between the application and the broker, so that the broker components are transparent to the application. This use of reflection is the basis of the Adaptive Resource Broker implementation. The reflective technique is implemented using OpenJava [Tat00, Tat99].

3. Experiments and Results

The experiments presented in this section involve the submission of jobs, with requirements specified by the user, to the adaptive broker. While a job is running, other jobs may be submitted to the same resource(s). The results obtained are compared to the case where the adaptive middleware is not used. In particular, the experiments address the following questions:
1. When job requirements are not met, are jobs being successfully migrated?
2. Does this result in shorter job execution time, compared to the case when the adaptive middleware is not used?

The experiments ran on a Grid test-bed consisting of 10 machines. Each machine has a Pentium IV processor (1.2 GHz) with 256 MB RAM. The operating system is Linux 2.4. All machines have Globus 2.2 installed. Resources communicate over a fast Ethernet LAN.

Figure 2: CPU usage during run-time in resource 3 (CPU usage and mean CPU usage against time elapsed in minutes)

Question (1) above is addressed as follows. A job is submitted to one of the resources. The Adaptive Resource Broker monitors the job during run-time, keeping track of the current and mean CPU usage. If the mean CPU usage decreases significantly, the broker should identify this and migrate the job to another resource. This is tested by overloading the resource on which the job is running by submitting another job. The total job execution time (TE) is recorded. Question (2) is addressed by comparing TE with the total execution time when the adaptive middleware is not used.
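The monitoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the broker's actual code: the class `CpuMonitor`, the function `needs_migration`, the threshold of 0.9 and the sample values are all hypothetical. It keeps the current reading and a running mean of fractional CPU usage, and flags migration once the mean falls below the threshold.

```python
class CpuMonitor:
    """Tracks current and running-mean fractional CPU usage of one job."""

    def __init__(self):
        self.samples = 0
        self.current = 0.0
        self.mean = 0.0

    def record(self, usage):
        # Incremental mean: avoids storing the full sample history.
        self.samples += 1
        self.current = usage
        self.mean += (usage - self.mean) / self.samples


def needs_migration(monitor, min_mean_usage=0.9):
    """Assumed policy: migrate once the mean usage falls below the threshold."""
    return monitor.samples > 0 and monitor.mean < min_mean_usage


monitor = CpuMonitor()
decisions = []
# Made-up samples: three near-full readings, then a contended resource.
for usage in [1.0, 1.0, 1.0, 0.5, 0.5]:
    monitor.record(usage)
    decisions.append(needs_migration(monitor))
```

Because the mean is averaged over the whole elapsed time, a short dip does not trigger migration immediately; the decision only flips once the loss of CPU share has accumulated enough to pull the mean under the threshold.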
The resources are labelled 1 to 10. The broker submits the job to a resource that is expected to be able to execute the job within the user's time constraint (81 minutes: the time the job is expected to take at 90% CPU usage). Hence a resource with a high percentage (>90%) of free CPU is chosen, in this case resource 3.

Figure 2 depicts the fractional CPU usage of the application during run-time. This information is provided by the monitor. Specifically, the actual CPU usage over time is shown, in addition to the mean CPU usage, averaged over the time elapsed. This information is used to determine whether a job migration is required. As shown in Figure 2, the job executes normally (i.e. with close to full CPU usage) for about 30 minutes, after which the CPU usage decreases sharply. Hence the resource is no longer expected to finish the job within the specified time constraint. This means the broker has to take action and restart the job on another resource. The broker chose resource 4, and the job resumed from the point at which it stopped on resource 3. Figure 3 shows the CPU usage during execution on the new resource.

As a consequence of the use of reflective middleware, the job is adaptable to the environment and continues to meet the user requirements. The total execution time was 74 minutes. The same job was also executed without the reflective technique; in this case the job took longer than the time it was assigned, with a total execution time of 125 minutes. This results from the fact that another job was running on the same resource during the job execution.

Referring to the questions posed at the beginning of this section, the experiments show that the adaptive middleware successfully supports job migration, and that this reduces the job execution time compared to the case where adaptive middleware is not used.
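The decision that the resource "is no longer expected to finish the job" can be made concrete with a simple projection. The paper does not state the prediction formula, so the model below is an assumption: the job needs a fixed amount of CPU work (81 minutes at 90% usage, i.e. 72.9 CPU-minutes), the work done so far is the mean usage times the elapsed time, and the remainder proceeds at the current usage. The function name `projected_finish` and the usage reading of 0.4 are hypothetical; the totals of 74 and 125 minutes are the reported results.

```python
def projected_finish(elapsed_min, mean_usage, current_usage, work_cpu_min):
    """Estimate total run time if the current CPU share persists (assumed model)."""
    done = mean_usage * elapsed_min            # CPU-minutes consumed so far
    remaining = max(work_cpu_min - done, 0.0)
    if current_usage <= 0:
        return float("inf")                    # no progress is being made
    return elapsed_min + remaining / current_usage


DEADLINE_MIN = 81.0
WORK_CPU_MIN = 0.9 * DEADLINE_MIN              # 72.9 CPU-minutes, from the text

# Hypothetical reading: 30 minutes at full usage, then the share drops to 0.4.
estimate = projected_finish(30.0, 1.0, 0.4, WORK_CPU_MIN)
migrate = estimate > DEADLINE_MIN

# Reported totals: 74 minutes with migration, 125 minutes without.
reduction = (125 - 74) / 125
```

Under these assumptions the projected finish lies well beyond the 81-minute constraint, so migration is warranted. The reported totals correspond to a reduction of 51 minutes, or about 41% of the unmigrated run time.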
Figure 3: CPU usage during run-time in resource 4 (CPU usage and mean CPU usage against time elapsed in minutes)

4. Conclusions and Future Work

This paper describes an Adaptive Resource Broker that enables application configuration and adaptation based on resource characteristics and user preferences. Reflective middleware is proposed, in which each aspect of an application is controlled in a separate program. The reflective middleware permits run-time mechanisms to decide automatically when and how to adapt the application in reaction to changes in resource conditions. We have shown that our Adaptive Resource Broker is a viable contender for use in future Grid implementations. This is supported by the experimental results obtained on a Grid test-bed.

Future work will focus on extending the current broker to support interactive jobs, where the user can read the intermediate outputs of the job, suspend it, change some input parameters and migrate it. During the whole session, a bidirectional channel is kept open between the user client and the job running on a remote machine. All job input and output streams are exchanged with the user client via this channel: the user sends input data to the job and receives output results and error messages.

5. References

[Bla98] G. Blair and G. Coulson. The Case for Reflective Middleware. Internal Report No. MPG-98-38, Distributed Multimedia Research Group, Lancaster University, 1998.
[Bla99] G. Blair, F. Costa and G. Coulson. Experiments with Reflective Middleware. In Proceedings of ECOOP'98 Workshop on Reflective Object Oriented Programming and Systems, Springer-Verlag, 1998. Internal Report No. MPG-98-11, Distributed Multimedia Research Group, Lancaster University, 1999.
[Ber96] F. Berman, R. Wolski, S. Figueira, J. Schopf and G. Shao. Application-Level Scheduling on Distributed Heterogeneous Networks. Supercomputing'96, November 1996. Also UCSD CS Tech Report #CS96-482.
[Buy00] R. Buyya, D. Abramson and J. Giddy. Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid. In Proceedings of the 4th International Conference on High Performance Computing in Asia-Pacific Region, Beijing, China, IEEE Computer Society Press, USA, 2000.
[Cou01] G. Coulson. What is Reflective Middleware? IEEE Distributed Systems Online. http://boole.computer.org/dsonline/middleware/RMarticle1.htm
[Fos01] I. Foster, C. Kesselman and S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 2001.
[Fre02] J. Frey, T. Tannenbaum, I. Foster, M. Livny and S. Tuecke. Condor-G: A Computation Management Agent for Multi-Institutional Grids. Cluster Computing, Vol. 5, No. 3, pp. 237-246, 2002.
[Fri94] N. Friedman and K. Lieberherr. Reuse of Adaptive Software through Opportunistic Parameterization. Technical Report NU-CCS-94-17, Northeastern University, May 1994.
[Hsi96] W. Hsinho, C. Chaofeng and F. Taylor. A Multiprocessor Self-Organizing Task Scheduler. http://www.hsdal.ufl.edu/Projects/Osculant/osc06961.html
[Mea87] P. Maes. Concepts and Experiments in Computational Reflection. In Proceedings of OOPSLA'87, pp. 147-155, ACM, October 1987.
[Sch02] J. Schopf. A General Architecture for Scheduling on the Grid. Submitted to the Journal of Parallel and Distributed Computing, 2002.
[Tat99] M. Tatsubori. An Extension Mechanism for the Java Language. Master of Engineering Dissertation, Graduate School of Engineering, University of Tsukuba, Ibaraki, Japan, February 1999.
[Tat00] M. Tatsubori, S. Chiba, M.-O. Killijian and K. Itano. OpenJava: A Class-Based Macro System for Java. In Reflection and Software Engineering, W. Cazzola, R. J. Stroud and F. Tisato (Eds.), Lecture Notes in Computer Science 1826, pp. 117-133, Springer-Verlag, 2000.