4005-739 Seminar Grid Computing I: Concepts and Practice · http://grid.rit.edu/wiki/doku.php?id=grid:seminar1

Globus and Xgrid in the GridShell Environment

Andrew J Younge and Gregor von Laszewski, PhD
Rochester Institute of Technology, 102 Lomb Memorial Drive, Rochester, NY 14623-5608, USA

Abstract - In order to lower the entry barrier to grid computing for the scientific community, we developed a shell interface to distributed computing environments. The GridShell has managerial services and capabilities to enhance usability and simplify workflow management. The shell also provides transparent access to and management of grid resources. The shell is divided into three major components: the client shell, the shell backend, and managerial services. This paper describes the current functionality of the shell backend, which allows the GridShell to reach vast and diverse resources that were previously unavailable to most scientists. In this paper we focus on describing the integration of OS X Xgrid as one of its resources.

I. INTRODUCTION

Grid computing has made significant progress over the past 15 years in harnessing the potential computing power available in many systems. Using extensive queuing systems and revolutionary middleware, scientists and engineers with diverse backgrounds have been able to use grid computing technologies to solve some of the world's most challenging computational problems [?]. However, as the software solutions for grid computing become larger, they become more complicated to use, and the high entry barrier often deters users.

The Globus Toolkit [1,2] is one of the most extensive Grid middleware solutions available today. It is able to bridge spatially and administratively different computing systems, allowing its users to gain access to resources that would otherwise be unavailable. The potential of Globus is unmatched by any grid technology today; it offers the largest set of tools developed for any grid environment to date. However, it has a large learning curve associated with its deployment and use that can be prohibitive to researchers interested in grid computing.

Xgrid [6,7] is another grid solution, developed by Apple Computer, Inc. Xgrid is known for its easy setup and management as well as its ability to be used across the Internet in a grid-like fashion.

The difference between Xgrid and Globus is … Xgrid is easy to set up and use; however, it is only available on Mac OS X machines. This results in resources that are much more homogeneous, but the potential span is very limited when compared to other grid technologies, especially if you do not have a Macintosh-based computing environment already in place. [THERE ARE OTHER DIFFERENCES]

The goal of the GridShell is to consolidate the advantages of a variety of distributed computing solutions including Globus, SSH, and Xgrid. Together they provide integrated access to a diverse set of resources that can be easily accessed by the scientific user community.

Many different interfaces have been developed for grids over the years; however, most of them have focused on the advanced scientific community developing their own solutions, leaving a high point of entry for an incoming researcher. In addition, the interfaces have also focused on one specific grid environment, typically Globus. While these grid services do meet their requirements, they keep the practice of grid computing limited instead of making it a commodity available to all. [We need to add better reason: STANDARDS ON PROTOCOL LEVEL NOT ON USER INTERFACE …]

The GridShell was created to fill this niche.

II. GRIDSHELL BACKEND

Incorporating Xgrid and the Globus Toolkit into the shell backend is imperative to the success of the GridShell itself. Without the ability to use these technologies, the GridShell will not be able to interface with the existing resources that are readily available. It is also important to construct a framework that allows for easy integration of other grid systems, as this will enable the GridShell to scale freely into the future as new technologies and user requirements arise.

The GridShell introduces the notion of a Mediator, which is the main interface between the GridShell front end and the remote grid resources. The buffer set up by the Mediator acts as a layer of separation and abstraction between the client interface and the Globus and Xgrid remote resources. This results in a client that can be deployed easily in multiple locations. Each client location can then handle its own tasks and resources independently of other client locations. The architectural model presented by the Mediator can be seen in the following figure.

Figure 1: Architectural overview of the GridShell system

The current task flow of the GridShell is straightforward and easy to follow. First, the client starts up the GridShell and issues a submit command. Based on the task and resource command arguments, the GridShell will talk to the Mediator service. Then, the Mediator will submit the desired task to a specific resource, effectively creating a specific job on the given resource. Finally, a user can monitor tasks by using the status command, which queries the remote resources and returns each job's status back to the user.

The GridShell uses SSH to communicate not only between the client and the Mediator but also between the Mediator and the remote resource, allowing the simple submission interface to be deployed anywhere SSH is available. To avoid typing the password continuously, the ssh-add command can be used to load a key into the SSH agent for the session, thereby allowing SSH connections without the need to enter a password every time a command is submitted.
However, there are some security risks in using ssh-add, so extra caution must be taken to secure the client system, as it would be detrimental if the private key were to be stolen. To minimize the risk, we recommend not using passwordless key authentication.

The GridShell uses the notion of objects to describe entities in the shell; objects are stored as text files in the ~/.cyberaide/objects/resource/ directory. Typically, objects take the form of either a resource or a task and are described in a METADATA section. A resource object also contains an ATTRIBUTES section where the information about the resource is held. For a host resource this includes the host, the type of resource, the version, the mediator information, and any passwords.

    [METADATA]
    type: resource
    name: iris01
    id: 70

    [ATTRIBUTES]
    # INFO FOR RESOURCE PERFORMING EXECUTION
    host: iris01.rit.edu
    password: (not shown)
    providertype: xgrid
    version: 1
    # submit for asynchronous submission
    submittype: submit
    jobmanager:
    # LOGIN INFO
    mediator: iris01.rit.edu
    username: grid

Figure 2: The example Xgrid resource object (iris01.txt)

A. Globus Toolkit

The Globus Toolkit's integration into the backend is the cornerstone of the GridShell project. Most true grid resources use some form of the Globus Toolkit, so being able to interface with these systems is a valuable asset. This is because Globus can encompass many high performance clustering systems such as Condor, PBS, LSF, SGE, and even BOINC [3], allowing researchers and scientists to leverage computing resources that were previously unobtainable.

One of the major downsides to Globus is its lack of backward compatibility between major versions. The mechanisms used in version 2 of the Globus Toolkit (GT2) are totally different from the version 4 (GT4) mechanisms. The GT2 implementation is based on standard GRAM job submission and a gatekeeper authentication system [18] to distribute tasks to resources, whereas the GT4 implementation is based entirely on the Web Services Resource Framework (WSRF) [16] using WS-GRAM and GSI services.

Based on the lessons learned from the CoG Kit, we have chosen to use abstractions for interfacing with Grids. This means that, in order to have the GridShell take advantage of all the different Globus grids, the GridShell has two separate mechanisms in place. The GT2 implementation, primarily used to submit jobs on the Open Science Grid [16], is based on the globus-job-run command. GT4, on the other hand, is based on WS-GRAM, so the only way to submit jobs to it is through the globusrun-ws command. Because of this, the Mediator has two separate implementations for submitting jobs to Globus: one for GT2 grids and another for GT4 grids.

Both toolkits can be set up using MyProxy, an X.509 Public Key Infrastructure (PKI) credential manager [5,6]. Therefore, the handlers for managing credentials are streamlined and easily used for both versions, simplifying deployment. Most production level Globus installations use MyProxy, so this is an acceptable stance. If a grid exists that doesn't support MyProxy, the GridShell framework is designed in such a way that implementing the grid-proxy functionality would be trivial.

B. Apple Xgrid

Xgrid is a community-based grid computing system that allows multiple computers to submit jobs to multiple systems, similar in ideology to Globus. Xgrid differs from Globus by only being available on the Mac OS X 10.3, 10.4 and 10.5 operating systems [9,10,11]. This leads to the obvious downside of only being able to reach a limited number of possible resources, as OS X is only available on Apple hardware. The overlooked advantage is that it creates a very homogeneous grid environment, which greatly simplifies application development for scientists. The decision of whether or not to use Xgrid for distributed computing really depends on the application being run; however, just having the ability to choose Xgrid is a new concept to many in the grid computing community.

There has been only limited research into using Xgrid at a production level. The biggest known Xgrid system is the OpenMacGrid project, which has recently integrated with the Xgrid@Stanford project [8], combining hundreds of agents together under Xgrid. Through Stanford, a few additional tools have been created to help monitor and submit tasks to an Xgrid environment; however, they are independent GUI implementations that simply mask the command line interface. The Xgrid@MIT project [12,13] has also had a good amount of success in deploying a production system, which is used as part of the STAR Collaboration. The project has resulted in the creation of a Globus GRAM Job Manager to submit jobs to Xgrid through Globus GT4; however, it is only designed for the MIT grid and has various bugs pertaining to security and returning stdout and stderr results. One possibility for future work is to expand the job manager for use in other Xgrid environments and fix the errors that exist in the current version.

Like most distributed and grid computing middleware systems, Xgrid depends on a main controller to coordinate all operations to and from the grid. The controller software is available on OS X Server 10.4 and later. From here, remote and local resources can connect as agents to the controller in order to provide their computational resources to the grid. A noteworthy aspect of Xgrid is that it treats dedicated resources and common resources the same way, thereby allowing the maximum number of OS X machines to be collected into the grid. Once a grid system has been created, Xgrid clients can log in to the controller using either a pre-selected password or a Kerberos single sign-on key to start submitting jobs. Figure 3 shows an example of a topological setup of an Xgrid system.

Figure 3: An example of an Xgrid deployment [6]

Using the Xgrid client is also a straightforward and relatively simple process. The main command, xgrid, is installed on all machines with OS X version 10.4 and later. There is also support in the Objective-C framework through Xcode; however, this was not needed for the GridShell. The command line can submit jobs either synchronously or asynchronously, depending on the command line arguments. The GridShell uses the -job submit option to submit a job asynchronously. Upon successful submission, the job will return a jobIdentifier number, which is used to check the status, return the results, or delete the job. The table below lists the corresponding Xgrid command line arguments to perform the tasks described.

    Task               Xgrid arguments
    Submit job         -job submit
    Get status         -job attributes -id [id]
    Retrieve results   -job results -id [id]
    Cancel job         -job stop -id [id]

Figure 4: Table of Xgrid commands

[SECTION RESULTS: Section with example is missing, as is some performance data. Maybe if Joel or Brad and Jon get further we can use their benchmarks; otherwise I suggest we use Tachyon.]

III. CONCLUSION

The purpose of the GridShell is to lower the barrier for scientists and researchers to enter grid computing and provide an extensive range of services. While the GridShell has done this, the fact remains that there is a need for the GridShell to interface with the grid computing technologies that exist today. This project has created the ability for the GridShell to submit jobs to almost all Globus grids as well as any Xgrid system.

ACKNOWLEDGMENT

This project would not be possible without the Center for Advancing the Study of Cyber Infrastructure and its laboratory resources. Thanks to Jeffery Robble, Kyle Tirak, Anthony Vaglio and Frank Curran of the GridShell SE team and their advisor Jim Vallino, who have worked on the implementation of the GridShell and helped the authors use the project.

REFERENCES

[1] Ian Foster, Hai Jin, Daniel A. Reed, W. J. (ed.). Globus Toolkit 4: Software for Service-Oriented Systems. Network And Parallel Computing: IFIP International Conference, Birkhauser, 2005, 2-13
[2] Ian Foster. A Globus Primer: Describing Globus Toolkit 4. 2005
[3] Myers, D. S.; Bazinet, A. L. & Cummings, M. P. Zomaya, A. (ed.). Expanding the reach of Grid computing: combining Globus- and BOINC-based systems. Grids for Bioinformatics and Computational Biology, Wiley Book Series on Parallel and Distributed Computing, John Wiley & Sons, 2008, 71-85
[4] Welch, V.; Foster, I.; Kesselman, C.; Mulmo, O.; Pearlman, L.; Tuecke, S.; Gawor, J.; Meder, S. & Siebenlist, F. X.509 Proxy Certificates for Dynamic Delegation. 3rd Annual PKI R&D Workshop, 2004
[5] Barton, T.; Basney, J.; Freeman, T.; Scavo, T.; Siebenlist, F.; Welch, V.; Ananthakrishnan, R.; Baker, B.; Goode, M. & Keahey, K. Identity Federation and Attribute-based Authorization through the Globus Toolkit, Shibboleth, Gridshib, and MyProxy. 5th Annual PKI R&D Workshop, 2006
[6] Xgrid Programming Guide. Advanced Computation Group, 2007
[7] Xgrid Administration and High Performance Computing. Advanced Computing Group, 2007
[8] Xgrid@Stanford. http://cmgm.stanford.edu/~cparnot/xgrid-stanford/html/goodies/GridStuffer-info.html
[9] Hughes, B. Building computational grids with apple's Xgrid middleware. Proceedings of the 2006 Australasian workshops on Grid computing and e-research-Volume 54, Australian Computer Society, Inc., 2006, 47-54
[10] C. Parnot. Xgrid Leopard: the good, the bad, the ugly, and the new stuff. B2007
[11] Parnot, C. The Xgrid Tutorials (Part I): Xgrid Basics, 2007
[12] Kocoloski, A. & Miller, M. SUMS Schedules MIT. International Science Grid This Week, 2006
[13] Kocoloski, A. & Miller, M. Xgrid@MIT: An innovative campus grid prototype. Open Science Grid consortium, 2006
[14] Kramer, D. & MacInnis, M. Utilization of a Local Grid of Mac OS X-Based Computers Using Xgrid. Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing, 2004
[15] Baden Hughes. Building computational grids with apple's Xgrid middleware. ACSW Frontiers '06: Proceedings of the 2006 Australasian workshops on Grid computing and e-research, Australian Computer Society, Inc., 2006, 47-54
[16] Pordes, R.; Petravick, D.; Kramer, B.; Olson, D.; Livny, M.; Roy, A.; Avery, P.; Blackburn, K.; Wenaus, T.; Würthwein, F. The open science grid. Journal of Physics: Conference Series, Institute of Physics Publishing, 2007, 78, 12-57
[17] Humphrey, M.; Wasson, G.; Gawor, J.; Bester, J.; Lang, S.; Foster, I.; Pickles, S.; Mc Keown, M.; Jackson, K.; Boverhof, J. State and events for web services: a comparison of five WS-resource framework and WS-notification implementations. High Performance Distributed Computing, 2005. HPDC-14. Proceedings. 14th IEEE International Symposium on, 2005, 3-13
[18] Lorch, M.; Kafura, D. & Shah, S. An XACML-based policy management and authorization service for globus resources. Grid Computing, 2003. Proceedings. Fourth International Workshop on, 2003, 208-210
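
Illustrative sketch (not part of the GridShell implementation). The resource object of Figure 2 and the Xgrid arguments of Figure 4 can be tied together in a few lines of Python. The sketch below assumes the object file is a simple sectioned "key: value" format in which lines starting with # are comments; the paper does not specify the actual parsing rules. The function names parse_object and xgrid_args and the action names (submit, status, results, cancel) are hypothetical. Only the argument pairs are taken from Figure 4. Controller host and authentication options are deliberately left out, and the command is only assembled, never executed.

```python
from typing import Dict, List, Optional

# Resource object text transcribed from Figure 2 (the password is not shown there).
EXAMPLE_OBJECT = """\
[METADATA]
type: resource
name: iris01
id: 70

[ATTRIBUTES]
# INFO FOR RESOURCE PERFORMING EXECUTION
host: iris01.rit.edu
password: (not shown)
providertype: xgrid
version: 1
# submit for asynchronous submission
submittype: submit
jobmanager:
# LOGIN INFO
mediator: iris01.rit.edu
username: grid
"""

# Argument pairs copied from Figure 4; the action names on the left are hypothetical.
XGRID_ACTIONS: Dict[str, List[str]] = {
    "submit": ["-job", "submit"],
    "status": ["-job", "attributes"],
    "results": ["-job", "results"],
    "cancel": ["-job", "stop"],
}


def parse_object(text: str) -> Dict[str, Dict[str, str]]:
    """Parse an object file into {section: {key: value}} (assumed format)."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            sections[current][key.strip()] = value.strip()
    return sections


def xgrid_args(action: str, job_id: Optional[str] = None,
               command: Optional[List[str]] = None) -> List[str]:
    """Build (but do not run) an xgrid argument list for one action."""
    args = ["xgrid"] + XGRID_ACTIONS[action]
    if action == "submit":
        if not command:
            raise ValueError("submit needs a command to run")
        return args + list(command)
    if job_id is None:
        raise ValueError(f"{action} needs a job identifier")
    return args + ["-id", str(job_id)]


if __name__ == "__main__":
    obj = parse_object(EXAMPLE_OBJECT)
    print(obj["METADATA"]["name"], obj["ATTRIBUTES"]["providertype"])
    print(xgrid_args("submit", command=["/bin/hostname"]))
    print(xgrid_args("status", job_id="1"))
```

Keeping the argument table in one dictionary mirrors Figure 4 directly, so a different provider type could be supported by swapping in another table rather than changing the calling code.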