Sun Data and Compute Grids

T. M. Sloan1,2, R. Abrol1,2, G. Cawood1,2, T. Seed1,2, F. Ferstl3
1 EPCC, The University of Edinburgh, James Clerk Maxwell Building, King's Buildings, Mayfield Road, Edinburgh, EH9 3JZ, UK
2 National e-Science Centre, e-Science Institute, 15 South College Street, Edinburgh, EH8 9AA, UK
3 Sun Microsystems GmbH, Dr.-Leo-Ritter-Str. 7, D-93049 Regensburg, Germany
Abstract
The Sun Data and Compute Grids (SunDCG) project [1] aims to develop an industry-strength, fully Globus-enabled compute and data scheduler based around Grid Engine [2] and Globus [3], plus a wide variety of data technologies. The project started in February 2002 and will run until January 2004. The partners are the National e-Science Centre [4], represented in this project by EPCC [5], and Sun Microsystems [6]. This paper describes the project and its current status as of August 2003. The project is funded as part of the UK e-Science Core Programme [7].
Introduction
According to [8], Grid computing can be classified at three levels of deployment, as illustrated in Figure 1:
• Cluster Grid – a single team or project and their associated resources.
• Enterprise Grid – multiple teams and projects but within a single organisation, facilitating collaboration of resources across the enterprise.
• Global Grid – linked Enterprise and Cluster Grids, providing collaboration amongst organisations.
Grid Engine [2] is a distributed resource management system that allows the efficient use of compute resources within an organisation. Grid Engine meets the first two levels, Cluster and Enterprise, by allowing a user to transparently make use of any number of compute resources within an organisation. However, Grid Engine alone does not yet meet the third level, the Global Grid.
The Globus Toolkit is essentially a Grid API
for connecting distributed compute and
instrument resources via the internet.
Integration with Globus allows Grid Engine to
meet this global level. That is, it allows
collaboration amongst enterprises.
The Sun Data and Compute Grids (SunDCG)
project aims to develop a scheduler based on
Grid Engine that allows user jobs to be
scheduled across a global grid and allow these
jobs to have access to their necessary data
sources. Globus will be used as the Grid API
to provide secure communications.
As a first step, the project has developed a global compute grid scheduler. This integrates Grid Engine V5.3 and Globus Toolkit V2.2.x to allow access to remote resources. This integration is achieved by use of the Transfer-queue Over Globus (TOG) software developed by the project [11].
Figure 1 : Three levels of grid computing: cluster, enterprise and global grids (taken from [8])
Following the development of TOG, the
project has investigated the integration of
access to data sources via data grid
technologies such as OGSA-DAI, GridFTP
and SRB. The next step for the project team is
to develop a hierarchical scheduler that scales
better in a grid environment and enables access
to remote data sources via data grid
technologies.
This paper describes how TOG can be used to create a global grid and so allow Grid Engine to schedule jobs for execution on that grid. In addition, the paper outlines the progress being made in developing a hierarchical scheduler solution that integrates access to data sources across a global grid.
The TOG software has been used to create a
global compute grid between the universities
of Glasgow and Edinburgh. Researchers at the
Glasgow site of the National e-Science Centre
have been able to access compute resources at
EPCC using a Grid Engine installation
configured with the TOG software [9]. TOG is also being used to set up a biomedical e-Science demonstration using the new SRIF network linking three sites within the University of Edinburgh – EPCC, the Scottish Centre for Genomic Technology and Informatics (GTI) at the New Royal Infirmary of Edinburgh, and the MRC Human Genetics Unit (HGU) at the Western General Hospital [10].
Using TOG, a global compute grid can use the Grid Engine interface for job scheduling, submission and control, with remote system administrators still retaining full control of their resources.
Building a Global Compute Grid – the Transfer-queue Over Globus (TOG)
Figure 2 illustrates how an enterprise can
access remote compute resources at a
collaborating enterprise and thus create a
global compute grid. This is achieved by
configuring a queue on a local Grid Engine to
use TOG. TOG provides secure job submission
and control functionality between the
enterprises. TOG enables an enterprise to
schedule jobs for execution on remote
resources when local resources are busy. Data
and executables can be transferred over to the
remote resource with subsequent transfer of
results back to the local installation.
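The transfer-queue mechanism [12] works by pointing a queue's method attributes at wrapper scripts that forward each operation through Globus instead of running the job locally. A sketch of such a queue configuration is shown below; the queue name and script paths are illustrative assumptions, not the paths shipped with the TOG release:

```
# Excerpt of a Grid Engine queue configuration (edited via qconf -mq).
# Queue name and script paths are illustrative; the real wrapper
# scripts are supplied by the TOG distribution.
qname             proxy_to_b
starter_method    /opt/tog/bin/submit_remote.sh
suspend_method    /opt/tog/bin/suspend_remote.sh
resume_method     /opt/tog/bin/resume_remote.sh
terminate_method  /opt/tog/bin/terminate_remote.sh
```

Users then submit to this queue through the usual qsub interface, and Grid Engine invokes the configured methods to run and control the job at the remote enterprise.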
The TOG software and documentation are available for download from the open source Grid Engine site at http://gridengine.sunsource.net/project/gridengine/tog.html.
JOSH – A Hierarchical Scheduling System
Following the development and release of TOG, the SunDCG project is now developing a hierarchical job scheduling system referred to as JOSH (Job Scheduling Hierarchically). This system will match a user's job requirements against Grid Engine instances at available compute sites. A job can then be sent to the chosen compute site for local scheduling and execution.
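The matching step can be sketched in Python as follows. This is an illustrative sketch only, not the JOSH implementation; the Site fields and the ranking rule (capability first, then lowest load, then best data access) are assumptions based on the selection criteria described elsewhere in this paper.

```python
# Illustrative sketch of hierarchical site selection; not the actual
# JOSH implementation. Site fields and scoring rules are assumptions.
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    resources: dict          # what this site's Grid Engine can offer, e.g. {"arch": "sparc"}
    load: float              # normalised load, lower is better
    data_sources: set = field(default_factory=set)  # data sources with good access


def select_site(job_requirements: dict, needed_data: set, sites: list) -> "Site | None":
    """Pick a site that (1) can run the job, then rank candidates by
    (2) lowest load and (3) best access to the required data sources."""
    capable = [s for s in sites
               if all(s.resources.get(k) == v for k, v in job_requirements.items())]
    if not capable:
        return None
    # Lower load wins; number of reachable data sources breaks ties.
    return min(capable,
               key=lambda s: (s.load, -len(needed_data & s.data_sources)))


# Two hypothetical child Grid Engine installations.
sites = [
    Site("edinburgh", {"arch": "sparc"}, load=0.9, data_sources={"gridftp://a"}),
    Site("glasgow", {"arch": "sparc"}, load=0.2),
]
best = select_site({"arch": "sparc"}, {"gridftp://a"}, sites)
```

In this toy run both sites are capable, so the lightly loaded site is chosen; a real scheduler would weigh load against data locality rather than using a strict ordering.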
In Figure 2, queue e at Enterprise A acts as a proxy for a queue at B. This 'proxy queue' is configured to use TOG to run the job on the queue at B. In Grid Engine, a queue that passes a job to a third party is known as a 'transfer queue'. TOG employs a similar mechanism to that used by transfer queues [12]: that is, execution methods are used to provide the additional functionality to Grid Engine so that jobs can be run elsewhere.
Figure 2: By configuring queue e to use the Transfer-queue Over Globus (TOG) software, Enterprise A can access resources at Enterprise B. Similarly, Enterprise B can access resources at Enterprise A by configuring queue d to use the TOG software.
Before execution, any input files will be pulled to the compute site from their data sources (notably GridFTP servers). Similarly, output files will be pushed to their target data repositories after the job has completed.
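The stage-in / execute / stage-out cycle can be sketched as below. The transfer function is a local-copy stand-in for a real GridFTP client, and all function and path names here are illustrative assumptions, not JOSH APIs.

```python
# Illustrative sketch of the stage-in / execute / stage-out cycle.
# transfer() is a local-copy stand-in for a real GridFTP client;
# names and paths are assumptions, not JOSH APIs.
import shutil
from pathlib import Path


def transfer(src: Path, dest: Path) -> None:
    """Stand-in for a GridFTP get/put between sites."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)


def run_staged_job(inputs, outputs, job, workdir: Path) -> None:
    """Pull inputs to the compute site, run the job, push outputs back."""
    for src, local in inputs:                 # stage in
        transfer(src, workdir / local)
    job(workdir)                              # local scheduling and execution
    for local, dest in outputs:               # stage out
        transfer(workdir / local, dest)


# Tiny demonstration with local files standing in for remote data sources.
data_site = Path("data_site")
data_site.mkdir(exist_ok=True)
(data_site / "in.txt").write_text("42\n")
work = Path("compute_site")


def job(wd: Path) -> None:
    n = int((wd / "in.txt").read_text())
    (wd / "out.txt").write_text(str(n * 2))


run_staged_job([(data_site / "in.txt", "in.txt")],
               [("out.txt", data_site / "out.txt")],
               job, work)
```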
By scaling better in a grid environment and enabling access to remote data sources via data grid technologies, JOSH will improve upon the recognised limitations of TOG in these areas.
Figure 3 illustrates how JOSH will query child Grid Engine installations at collaborating sites to determine if they are able to run a user's job. JOSH will then place the user's job at the site that best matches the following criteria.
1. It is capable of running the job.
2. It has the lowest load of the available sites.
3. It has the best access to the required data sources.
For those user jobs that are not data grid aware, a data component will handle the transfer of data between sites.
Figure 3: Using a hierarchical scheduler to provide an extensible, scalable global grid.
A middleware layer will handle secure communications and data transfer between JOSH, Grid Engine and any remote data sources. The OGSA-compliant Globus Toolkit version 3 will form this layer. User interface software will be developed to allow job submission and monitoring from the user's site.
Further Information
For more information on the project and its deliverables please access the project web site at http://www.epcc.ed.ac.uk/sungrid/.
References
[1] Sun Data and Compute Grids project home page, http://www.epcc.ed.ac.uk/sungrid/
[2] Grid Engine home page, http://gridengine.sunsource.net/
[3] Globus home page, http://www.globus.org/
[4] National e-Science Centre home page, http://www.nesc.ac.uk/
[5] EPCC home page, http://www.epcc.ac.uk/
[6] Sun Microsystems home page, http://www.sun.com/
[7] UK e-Science Core Programme, http://www.escience-grid.org.uk/
[8] "Sun Cluster Grid Architecture – A technical white paper describing the foundation of Sun Grid Computing", May 2002, http://wwws.sun.com/software/grid/SunClusterGridArchitecture.pdf
[9] T. Sloan, "Going global with Globus and Grid Engine", EPCC News, Issue 48, Spring 2003, http://www.epcc.ed.ac.uk/overview/publications/newsletters/EPCCnews/EPCCNews48.pdf
[10] R. Baxter, "ODDGenes: Development Plan", EPCC Internal Document, May 2003.
[11] T. Seed, "Transfer-queue Over Globus (TOG): How-To", July 2003, http://gridengine.sunsource.net/download/TOG/tog-howto.pdf
[12] C. Chaubal, "A Prototype of a Multi-Clustering Implementation using Transfer Queues", http://gridengine.sunsource.net/project/gridengine/howto/TransferQueues/transferqueues.html