4005-739 Seminar Grid Computing I: Concepts and Practice · http://grid.rit.edu/wiki/doku.php?id=grid:seminar1
Globus and Xgrid in the GridShell Environment
Andrew J. Younge and Gregor von Laszewski, PhD
Rochester Institute of Technology, 102 Lomb Memorial Drive, Rochester, NY 14623-5608, USA
Abstract - In order to lower the entry barrier to grid computing for the scientific community, we developed a shell interface to
distributed computing environments. The GridShell provides managerial services and capabilities that enhance usability and simplify
workflow management. The shell also provides transparent access to and management of grid resources. The shell is divided into
three major components: the client shell, the shell backend, and managerial services. This paper describes the current functionality of
the shell backend, which allows the GridShell to reach vast and diverse resources that were previously unavailable to most scientists.
In this paper we focus on the integration of Mac OS X Xgrid as one of its resources.
I. INTRODUCTION
GRID computing has made significant progress over the past 15 years in harnessing the potential
computing power available in many systems. Using extensive queuing systems and revolutionary
middleware, scientists and engineers with diverse backgrounds have been able to use grid computing
technologies to solve some of the world's most challenging computational problems [?]. However, as
the software solutions for grid computing become larger, they become more complicated to use, and
their high entry barrier often deters users.
The Globus Toolkit [1,2] is one of the most extensive Grid middleware solutions available today. It
is able to bridge spatially and administratively different computing systems, allowing its users to
gain access to resources that would otherwise be unavailable. The potential of Globus is unmatched
by any grid technology today. It is the largest set of tools developed for any grid environment that
exists today. However, it has a large learning curve associated with its deployment and use that can
be prohibitive to researchers interested in grid computing.
Xgrid [6,7] is another grid solution developed by Apple Computer, Inc. Xgrid is known for its easy
setup and management as well as its ability to be used across the Internet in a grid-like fashion.
The difference between Xgrid and Globus is … Xgrid is easy to set up and use; however, it is only
available on Mac OS X machines. This results in resources that are much more homogeneous; however,
the potential span is very limited when compared to other grid technologies, especially if you do
not have a Macintosh-based computing environment already in place. [THERE ARE OTHER DIFFERENCES]
The goal of the GridShell is to consolidate the advantages of a variety of distributed computing
solutions, including Globus, SSH, and Xgrid. Together they provide integrated access to a diverse
set of resources that can be easily accessed by the scientific user community.
Many different interfaces have been developed for grids over the years; however, most of them have
focused on the advanced scientific community developing their own solutions, leaving a high point of
entry for an incoming researcher. In addition, these interfaces have focused on one specific grid
environment, typically Globus. While these grid services do meet their requirements, they keep the
practice of grid computing limited instead of making it a commodity item available to all.
[We need to add better reason: STANDARDS ON PROTOCOL LEVEL NOT ON USER INTERFACE …]
The GridShell was created to fill this niche.
II. GRIDSHELL BACKEND
Incorporating Xgrid and the Globus Toolkit into the shell backend is imperative to the success of
the GridShell itself. Without the ability to use these technologies, the GridShell will not be able
to interface with the existing resources that are readily available. It is also important to
construct a framework that allows for easy integration of other grid systems, as this will enable
the GridShell to scale freely into the future as new technologies and user requirements arise.
The GridShell introduces the notion of a Mediator, which is the main interface between the GridShell
front end and the remote grid resources. The buffer set up by using the Mediator acts as a layer of
separation and abstraction between the client interface and the Globus and Xgrid remote resources.
This results in a client that can be deployed easily in multiple locations. Each client location can
then handle its own tasks and resources independently of other client locations. The architectural
model presented by the Mediator can be seen in the following figure.
Figure 1: Architectural overview of the GridShell system
The current task flow of the GridShell is straightforward and easy to follow. First, the client
starts up the GridShell and issues a submit command. Based on the task and resource command
arguments, the GridShell will talk to the Mediator service. Then, the Mediator will submit the
desired task to a specific resource, effectively creating a specific job on the given resource.
Finally, a user can monitor tasks by using the status command, which queries the remote resources
and returns each job's status back to the user.
The GridShell uses SSH to communicate not only between the client and the Mediator but also between
the Mediator and the remote resource, allowing the simple submission interface to be deployed
anywhere SSH is available. To avoid typing in the password continuously, the ssh-add command can be
used to add the user's key to the SSH authentication agent for the session, thereby allowing SSH
without the need to enter a password every time a command gets submitted. However, there are some
security risks in using ssh-add, so extra caution must be taken to secure the client system, as it
would be detrimental
if the private key were to be stolen. To minimize the risk, we recommend not using passwordless key
authentication.
The GridShell uses the notion of objects to describe entities in the shell; objects are stored in
txt files in the ~/.cyberaide/objects/resource/ directory. Typically, objects take the form of
either a resource or a task and are described in a METADATA section. A resource object also contains
an ATTRIBUTES section where the information about the resource is held. For a host resource this
includes the host, the type of resource, the version, the Mediator information, and any passwords.
[METADATA]
type: resource
name: iris01
id: 70
[ATTRIBUTES]
# INFO FOR RESOURCE PERFORMING EXECUTION
host: iris01.rit.edu
password: (not shown)
providertype: xgrid
version: 1
# submit for asynchronous submission
submittype: submit
jobmanager:
# LOGIN INFO
mediator: iris01.rit.edu
username: grid
Figure 2: The example Xgrid resource object (iris01.txt)
A. Globus Toolkit
The Globus Toolkit's integration into the backend is the cornerstone of the GridShell project. Most
true grid resources use some form of the Globus Toolkit, so being able to interface with these
systems is a valuable asset. This is because Globus can encompass many high performance clustering
systems such as Condor, PBS, LVM, SGE, and even BOINC [3], allowing researchers and scientists to
leverage computing resources that were previously unobtainable.
One of the major downsides to Globus is its lack of backward compatibility between major versions.
The mechanisms used in version 2 of the Globus Toolkit (GT2) are totally different from the version
4 (GT4) mechanisms. The GT2 implementation is based on standard GRAM job submission and a gatekeeper
authentication system [18] to distribute tasks to resources, whereas the GT4 implementation is based
entirely on the Web Services Resource Framework (WSRF) [16] using WS-GRAM and GSI services. Based on
the lessons learned from the CoG Kit, we have chosen to use abstractions for interfacing with Grids.
This means that, in order for the GridShell to take advantage of all the different Globus grids, the
GridShell has two separate mechanisms in place. The GT2 implementation, primarily used to submit
jobs on the Open Science Grid [16], is based on the globus-job-run command. GT4, on the other hand,
is based on WS-GRAM, so the only way to submit jobs to it is through the globusrun-ws command.
Because of this, the Mediator has two separate implementations for submitting jobs to Globus: one
for GT2 grids and another for GT4 grids.
Both toolkits can be set up using MyProxy, an X.509 Public Key Infrastructure (PKI) credential
manager [5,6]. Therefore, the handlers for managing credentials are streamlined and easily used for
both versions, simplifying deployment. Most production-level Globus installations use MyProxy, so
this is an acceptable stance. If a grid exists that
doesn't support MyProxy, the GridShell framework is designed in such a way that implementing the
grid-proxy functionality would be trivial.
B. Apple Xgrid
Xgrid is a community-based grid computing system that allows multiple computers to submit jobs to
multiple systems, similar in ideology to Globus. Xgrid differs from Globus by only being available
on the Mac OS X 10.3, 10.4, and 10.5 operating systems [9,10,11]. This leads to the obvious downside
of only being able to reach a limited number of possible resources, as OS X is only available on
Apple hardware. The overlooked advantage is that it creates a very homogeneous grid environment,
which greatly simplifies application development for scientists. The decision of whether or not to
use Xgrid for distributed computing really depends on the application being run; however, just
having the ability to choose Xgrid is a new concept to many in the grid computing community.
There has been only some research into using Xgrid on a production level. The biggest known Xgrid
system is the OpenMacGrid project, which has recently integrated with the Xgrid@Stanford project
[8], combining hundreds of agents together under Xgrid. Through Stanford, a few additional tools
have been created to help monitor and submit tasks to an Xgrid environment; however, they are
independent GUI implementations that simply mask the command line interface. The Xgrid@MIT project
[12,13] has also had a good amount of success in deploying a production system, which is used as
part of the STAR Collaboration. The project has resulted in the creation of a Globus GRAM Job
Manager to submit jobs to Xgrid through Globus GT4; however, it is only designed for the MIT grid
and has various bugs pertaining to security and returning stdout and stderr results. One possibility
for future work is to expand the job manager for use in other Xgrid environments and fix the errors
that exist in the current version.
Like most distributed and grid computing middleware systems, Xgrid depends on a main controller to
channel all operations to and from the grid. The controller software is available on OS X Server
10.4 and later. From here, remote and local resources can connect as agents to the controller in
order to provide their computational resources to the grid. A noteworthy aspect of Xgrid is that it
treats dedicated resources and common resources the same way, therefore allowing the maximum number
of OS X machines to be collected into the grid. Once a grid system has been created, Xgrid clients
can log in to the controller using either a pre-selected password or a Kerberos single sign-on key
to start submitting jobs. Figure 3 shows an example of a topological setup of an Xgrid system.
Figure 3: An example of an Xgrid deployment [6]
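As an illustration of the client login step, the following minimal Python sketch assembles an xgrid
command line that addresses a controller and authenticates with either a password or Kerberos. It is
not the GridShell implementation. The helper names xgrid_login_args and xgrid_submit_command are
hypothetical, the host name is borrowed from the example resource object, the password value is a
placeholder, and the -h, -p, and -auth options are assumed to behave as described in the xgrid
manual page. The sketch only builds the argument list and does not execute anything.

```python
import shlex
from typing import List, Optional, Sequence


def xgrid_login_args(controller: str, password: Optional[str] = None) -> List[str]:
    """Build the leading xgrid arguments that select and authenticate to a controller.

    Hypothetical helper: with a password it uses -p, otherwise it assumes
    Kerberos single sign-on via -auth Kerberos.
    """
    if not controller:
        raise ValueError("controller hostname is required")
    args = ["xgrid", "-h", controller]
    if password is None:
        # No password supplied: assume a Kerberos ticket is already available.
        args += ["-auth", "Kerberos"]
    else:
        args += ["-p", password]
    return args


def xgrid_submit_command(controller: str, command: Sequence[str],
                         password: Optional[str] = None) -> List[str]:
    """Append an asynchronous job submission to the login arguments."""
    if not command:
        raise ValueError("a command to run on the grid is required")
    return xgrid_login_args(controller, password) + ["-job", "submit"] + list(command)


if __name__ == "__main__":
    # Placeholder credentials; a real deployment would not hard-code a password.
    argv = xgrid_submit_command("iris01.rit.edu", ["/bin/hostname"],
                                password="example-password")
    print(shlex.join(argv))
```

Keeping the command as an argument list rather than a single string avoids shell quoting problems
if the list is later handed to a process launcher or forwarded over SSH.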
Using the Xgrid client is also a straightforward and relatively simple process. The main command,
xgrid, is installed on all machines with OS X version 10.4 and later. There is also support in the
Objective-C framework through Xcode; however, this was not needed for the GridShell. The command
line can submit jobs either synchronously or asynchronously, depending on the command line
arguments. The GridShell uses the -job submit option to submit a job asynchronously. Upon successful
submission, the command returns a jobIdentifier number, which is used to check the status, return
the results, or delete the job. The table below lists the corresponding Xgrid command line arguments
to perform the tasks described.
Task                Arguments
Submit Job          -job submit
Get Status          -job attributes -id [id]
Retrieve Results    -job results -id [id]
Cancel Job          -job stop -id [id]
Figure 4: Table of Xgrid commands
SECTION RESULTS
Section with example is missing and some performance data, maybe if joel or brad and jon a further
we can use their benchmarks, otherwise I suggest we use tachyon
III. CONCLUSION
The purpose of the GridShell is to lower the barrier for scientists and researchers to enter grid
computing and provide an extensive range of services. While the GridShell has done this, the fact
remains that there is a need for the GridShell to interface with current grid computing technologies
that exist today. This project has created the ability for the GridShell to submit jobs to almost
all Globus grids as well as any Xgrid system.
ACKNOWLEDGMENT
This project would not be possible without the Center for Advancing the Study of Cyber
Infrastructure and its laboratory resources. Thanks to Jeffery Robble, Kyle Tirak, Anthony Vaglio
and Frank Curran of the GridShell SE team and their advisor Jim Vallino, who have worked on the
implementation of the GridShell and helped the authors use the project.
REFERENCES
[1] Ian Foster, Hai Jin, Daniel A. Reed, W. J. (ed.). Globus Toolkit 4: Software for Service-Oriented Systems. Network And Parallel Computing: IFIP International Conference, Birkhauser, 2005, 2-13
[2] Ian Foster. A Globus Primer: Describing Globus Toolkit 4. 2005
[3] Myers, D. S.; Bazinet, A. L. & Cummings, M. P. Zomaya, A. (ed.). Expanding the reach of Grid computing: combining Globus- and BOINC-based systems. Grids for Bioinformatics and Computational Biology, Wiley Book Series on Parallel and Distributed Computing, John Wiley & Sons, 2008, 71-85
[4] Welch, V.; Foster, I.; Kesselman, C.; Mulmo, O.; Pearlman, L.; Tuecke, S.; Gawor, J.; Meder, S. & Siebenlist, F. X.509 Proxy Certificates for Dynamic Delegation. 3rd Annual PKI R&D Workshop, 2004
[5] Barton, T.; Basney, J.; Freeman, T.; Scavo, T.; Siebenlist,
F.; Welch, V.; Ananthakrishnan, R.; Baker, B.; Goode, M. & Keahey, K. Identity Federation and Attribute-based Authorization through the Globus Toolkit, Shibboleth, Gridshib, and MyProxy. 5th Annual PKI R&D Workshop, 2006
[6] Xgrid Programming Guide. Advanced Computation Group, 2007
[7] Xgrid Administration and High Performance Computing. Advanced Computing Group, 2007
[8] Xgrid@Stanford. http://cmgm.stanford.edu/~cparnot/xgrid-stanford/html/goodies/GridStuffer-info.html
[9] Hughes, B. Building computational grids with apple's Xgrid middleware. Proceedings of the 2006 Australasian workshops on Grid computing and e-research-Volume 54, Australian Computer Society, Inc., 2006, 47-54
[10] C. Parnot. Xgrid Leopard: the good, the bad, the ugly, and the new stuff. B2007
[11] Parnot, C. The Xgrid Tutorials (Part I): Xgrid Basics, 2007
[12] Kocoloski, A. & Miller, M. SUMS Schedules MIT. International Science Grid This Week, 2006
[13] Kocoloski, A. & Miller, M. Xgrid@MIT: An innovative campus grid prototype. Open Science Grid consortium, 2006
[14] Kramer, D. & MacInnis, M. Utilization of a Local Grid of Mac OS X-Based Computers Using Xgrid. Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing, 2004
[15] Baden Hughes. Building computational grids with apple's Xgrid middleware. ACSW Frontiers '06: Proceedings of the 2006 Australasian workshops on Grid computing and e-research, Australian Computer Society, Inc., 2006, 47-54
[16] Pordes, R.; Petravick, D.; Kramer, B.; Olson, D.; Livny, M.; Roy, A.; Avery, P.; Blackburn, K.; Wenaus, T.; Würthwein, F. The open science grid. Journal of Physics: Conference Series, Institute of Physics Publishing, 2007, 78, 12-57
[17] Humphrey, M.; Wasson, G.; Gawor, J.; Bester, J.; Lang, S.; Foster, I.; Pickles, S.; Mc Keown, M.; Jackson, K.; Boverhof, J. State and events for web services: a comparison of five WS-resource framework and WS-notification implementations. High Performance Distributed Computing, 2005. HPDC-14. Proceedings. 14th IEEE International Symposium on, 2005, 3-13
[18] Lorch, M.; Kafura, D. & Shah, S. An XACML-based policy management and authorization service for globus resources. Grid Computing, 2003. Proceedings. Fourth International Workshop on, 2003, 208-210