Resource Management and Resource Brokering using UNICORE Jon MacLaren

NeSC 13th February 2003
Summary of Talk
• Overview of UNICORE
– What is UNICORE?
– The UNICORE Architecture
– Advantages of the UNICORE Architecture
• The UNICORE Model
– What does a UNICORE Job Look Like?
– Execution of a UNICORE Job
– What the User Sees…
– The UNICORE Security Model
• Resource Brokering in UNICORE
– Resources in UNICORE
– Resources and Seamlessness
– Resource Brokering – What, Why and How
– EUROGRID Resource Broker: Overview
– What do the Brokering Messages Convey?
– Resource Refinement
– Abstract Resource Description: Why?
– What next?
– What the broker looks like
• Further Resources
Background on Manchester…
• Home of the CSAR National Supercomputing Service:
– 512-processor O3000, 816-processor T3E, 128-processor O2000
• Involved in many major UK e-Science projects:
– RealityGrid
– OGSA-DAI, Geodise, myGrid
– Markets for Computational Economies
• Involved in major European projects:
– EUROGRID, GRIP
• Home of e-Science NorthWest – eSNW
• First Access Grid in the UK
• My Interests:
– Resource Brokering (wrote EUROGRID Resource Broker)
– Advance Reservation (co-chair of GGF-WG GRAAP-WG)
– Grid Economies (co-chair of GGF-WG GESA-WG)
What is UNICORE?
• Grid middleware (pardon?)
• Targeted specifically at the execution of compute-intensive
tasks on supercomputers and large clusters
• Provides single sign-on using X.509 certificates
• Can interface to practically any hardware and batch queue
system, even “exotic” platforms.
• UNICORE is a complete Grid computing environment as
opposed to a toolkit.
• There is a GUI UNICORE Client – a simple command line
interface was developed later.
• UNICORE can be extended with application-specific interfaces
(called Plugins)
The UNICORE Architecture
[Figure: UNICORE architecture diagram, built up over three slides. Labels: Client, Gateway, Firewall (separating Internet from Intranet), UPL (over SSL) connections, and two Vsites named green and turing, each shown with an NJS, a UUDB, TSIs and an FNTP label. The final build groups both Vsites as the UoM Usite.]
The UNICORE Architecture: Advantages
• The load on the UNICORE-enabled compute resources is
negligible, as only the lightweight TSI daemons, written in Perl,
run on the compute resource itself. The Gateway and NJS
processes are Java programs but run on other, cheaper
resources, e.g. Linux workstations.
• Porting UNICORE to obscure or rare architectures is made
much simpler by the fact that only a Perl interpreter needs to
be available on the target system. Target systems include:
– Cray T3E, Origin 3000, Fujitsu VPP300, Linux clusters
– Sony Playstation 2, IBM Mainframe system
• All incoming connections to the UNICORE Vsites are routed
via a single port on a single machine, which has proved
advantageous when using UNICORE at sites with firewalls.
What Does a UNICORE Job Look Like?
• The user composes their job into an AJO (Abstract Job
Object).
• This is a hierarchy of components, such as scripts, compilation
tasks, file transfer tasks, or application-specific components.
• Structure is added to the job by two mechanisms:
– Tasks may be grouped together:
  • RepeatGroup, ForGroup, AbstractJob
– Within a group, dependences control the order of execution
• The AJO components are all members of a Java Class
Package, org.unicore.ajo, which also contains all the
methods a user requires to construct an AJO ready for
dispatching.
• Online JavaDoc for these classes is available.
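
The two structuring mechanisms above (grouping, and dependences within a group) can be illustrated with a small sketch. This is not the org.unicore.ajo API: the names Task, Group and execution_order are hypothetical, and Python is used only for brevity. It shows how a set of dependences within one group determines a valid execution order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    name: str


@dataclass
class Group:
    """Hypothetical stand-in for a task group: members plus dependences."""
    name: str
    members: list = field(default_factory=list)
    # (before, after) pairs: 'after' may start only once 'before' has finished
    dependences: list = field(default_factory=list)


def execution_order(group):
    """Return member names in an order that respects the dependences."""
    graph = {member.name: set() for member in group.members}
    for before, after in group.dependences:
        graph[after].add(before)
    return list(TopologicalSorter(graph).static_order())


job = Group(
    "example job",
    members=[Task("import input"), Task("run script"), Task("export output")],
    dependences=[("import input", "run script"), ("run script", "export output")],
)
order = execution_order(job)
print(order)
```

A cyclic set of dependences would make static_order raise graphlib.CycleError, which is one reason to model the ordering as an explicit graph.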
Execution of a UNICORE Job
• AJO sent to green (primary Vsite)
• Input file is exported from user’s workstation
• Gaussian98 job runs in batch queue
• Intermediate output transferred to bezier for visualisation
• After visualisation, the rendered movie is imported to the workstation
• NJS incarnates individual tasks:
– Knows locations of applications
– Knows details of the batch queue system
• NJS gives very simple directives to the TSI:
– User x places this file here.
– User y runs this executable in batch queue A.
• NJS manages workspace of jobs (Uspace)
[Figure: structure of the example job. AbstractJob (green): FileTransfer (Import Input Deck), ExecuteTask (Gaussian98 Job), TransferFile (Transfer Output). AbstractJob (bezier): ExecuteTask (Visualisation), FileTransfer (Import movie).]
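
Incarnation, as described above, means turning an abstract task into a site-specific directive. The sketch below is illustrative only and is not taken from the NJS source: SITE_CONFIG, incarnate, the file paths and the directive layout are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecuteTask:
    application: str  # abstract application name, e.g. "Gaussian98"
    input_file: str


# Hypothetical per-Vsite knowledge: application locations and the batch
# queue to use. The path and queue name here are made up for illustration.
SITE_CONFIG = {
    "green": {"apps": {"Gaussian98": "/opt/g98/bin/g98"}, "queue": "A"},
}


def incarnate(task, vsite, local_user):
    """Turn an abstract task into a concrete directive for the target system."""
    site = SITE_CONFIG[vsite]
    executable = site["apps"].get(task.application)
    if executable is None:
        raise LookupError(f"{task.application} is not installed at {vsite}")
    return {
        "user": local_user,
        "queue": site["queue"],
        "command": [executable, task.input_file],
    }


directive = incarnate(ExecuteTask("Gaussian98", "input.com"), "green", "user_y")
print(directive)
```

The job description never mentions a path or a queue name; only the site-side table does, which is what keeps the job portable between Vsites.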
What the User Sees…
The UNICORE Security Model
• Mutual Authentication (between Gateway/NJS and User) using X.509
Certificates.
• No proxy certificates, no generalised delegation.
• Authorisation performed by the NJS using the UUDB Interface. (So
authorisation is (potentially) moved away from the target system.)
• Separation of consigner and endorser: only a user can endorse a
job; an NJS or a user can consign a job.
• The signed AJO is sent to an NJS as a serialised Java object, via
UPL (each AbstractJob component is signed).
• This model ensures no tampering with multi-Vsite jobs by
intermediate NJSs.
• The user’s private key never leaves the encrypted keystore on
his/her workstation.
• At no point is any private key which could be used to
impersonate the user (for any lifetime) ever created
on a remote resource.
Resources and Seamlessness
• Simple extensible resource model.
• Resources divided into CapabilityResource and CapacityResource
subclasses
• Same resource model used to describe resources required by a job,
and resources provided by a Vsite.
• Example CapabilityResource: SoftwareResource
– Application name, version number
– Job meta-data (XML document)
• Example CapacityResource: Nodes
– Number of nodes in the system/required by job
• Resource description abstract enough to hide
– Architectural details
– Batch queue details (systems, queue names, etc.)
– Locations of applications
• Job can only run on a Vsite if the resources
required by the job can be satisfied by the Vsite.
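
The matching rule in the last bullet can be sketched as follows. The class names echo the slide, but the fields and the vsite_satisfies function are assumptions for illustration, not the UNICORE resource classes. A capability is treated as present or absent, and a capacity as a quantity that must be large enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareResource:
    """Capability: the Vsite either offers this application version or not."""
    name: str
    version: str


@dataclass(frozen=True)
class Nodes:
    """Capacity: a quantity offered by a Vsite or required by a job."""
    count: int


def vsite_satisfies(offered_software, offered_nodes, required_software, required_nodes):
    """True if every required capability is offered and capacity suffices."""
    has_software = all(sw in offered_software for sw in required_software)
    return has_software and required_nodes.count <= offered_nodes.count


g98 = SoftwareResource("Gaussian98", "A.7")  # version string is made up
site_software = {g98}
print(vsite_satisfies(site_software, Nodes(512), [g98], Nodes(64)))
print(vsite_satisfies(site_software, Nodes(512), [g98], Nodes(1024)))
```

Because the same types describe both sides, the check is a plain comparison with no translation step between "what the job needs" and "what the Vsite has".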
Resource Broker – What, Why and How
What is Resource Brokering for?
Simply put, Resource Brokering is a method for a user to discover resources
suitable for running their work on.
Why do you need it?
At the moment, people use Grid middleware to access resources that they are
already familiar with. They manually target their work at the machine that suits
their needs best. For Grids to offer something genuinely new, they need to
become much larger, so the user can find and use resources they have never
even heard of before. However, as Grids become larger, the manual solution is
simply not scalable…
How does it work?
The user describes their needs to a third-party piece of software, a Resource
Broker, in a Resource Description Language they both understand, together
with their preferences (e.g. quick turnaround time at
any cost). The broker searches for suitable resources, and
passes these back to the user…
EUROGRID Resource Broker: Overview
• Resource Broker component locates suitable execution
environments for the user’s jobs.
• Resource Broker functionality is part of NJS
• Protocol can be used symmetrically by Broker, allowing
multiple stages.
• Broker is configured with a list of target NJSs to get offers from
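[Figure: brokering message flow. (1) The User sends CheckQoS to the Broker NJS. (2) The Broker NJS sends CheckQoS to each Execution NJS. (3) Each Execution NJS replies with CheckQoS_Outcome. (4) The Broker NJS returns CheckQoS_Outcome to the User.]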
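
Because the Broker answers the same request it sends, brokers can be stacked. The sketch below illustrates that symmetry under simplifying assumptions: requests and offers are plain dictionaries, and the class and method names (ExecutionNJS, BrokerNJS, check_qos) are hypothetical rather than the EUROGRID implementation.

```python
class ExecutionNJS:
    """Hypothetical execution site that offers if it has enough free nodes."""

    def __init__(self, vsite, free_nodes):
        self.vsite = vsite
        self.free_nodes = free_nodes

    def check_qos(self, request):
        """Return a list of offers (possibly empty) for the request."""
        if request["nodes"] <= self.free_nodes:
            return [{"vsite": self.vsite, "nodes": request["nodes"]}]
        return []


class BrokerNJS:
    """Hypothetical broker: forwards the request to its configured targets."""

    def __init__(self, targets):
        self.targets = targets  # configured list of NJSs to get offers from

    def check_qos(self, request):
        offers = []
        for target in self.targets:
            # Targets may be execution sites or further brokers.
            offers.extend(target.check_qos(request))
        return offers


site_broker = BrokerNJS([ExecutionNJS("green", 512), ExecutionNJS("turing", 128)])
top_broker = BrokerNJS([site_broker, ExecutionNJS("bezier", 16)])
offers = top_broker.check_qos({"nodes": 256})
print(offers)
```

The caller cannot tell whether it is talking to an execution site or to another broker, which is what allows multiple stages.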
What do these Messages Convey?
CheckQoS
• A TaskResourceDAG containing the resources specified by the job
CheckQoS_Outcome
• A set of Estimates for a number of different Vsites, each containing:
– Start time (earliest and latest);
– End time (earliest and latest);
– Cost of job (and units of cost);
– A replacement resource set; and
– A Ticket object, with a validity time (or an advertising string).
• The intended semantics are that, to accept an Estimate, the user
must present the job at the target Vsite together with the Ticket, using
the replacement resource set that accompanies it; provided that the job
arrives within the lifetime of the Ticket, the Vsite should turn the job
around within the stated range for the end-time, and
for the estimated cost.
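
A client-side view of these semantics might look like the sketch below. The Estimate fields are a simplified subset of those listed above, and choose is a hypothetical helper: it discards Estimates whose Ticket has expired, then applies the user's preference.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Estimate:
    """Simplified, hypothetical subset of the Estimate contents."""
    vsite: str
    latest_end: datetime
    cost: float
    ticket_valid_until: datetime


def choose(estimates, now, prefer="turnaround"):
    """Pick the best Estimate whose Ticket is still valid, or None."""
    usable = [e for e in estimates if e.ticket_valid_until > now]
    if not usable:
        return None
    if prefer == "turnaround":
        return min(usable, key=lambda e: e.latest_end)
    return min(usable, key=lambda e: e.cost)


now = datetime(2003, 2, 13, 12, 0)
hour = timedelta(hours=1)
estimates = [
    Estimate("green", now + 4 * hour, 100.0, now + hour),
    Estimate("turing", now + 2 * hour, 300.0, now + hour),
    Estimate("bezier", now + hour, 50.0, now - hour),  # Ticket already expired
]
print(choose(estimates, now).vsite)
print(choose(estimates, now, prefer="cost").vsite)
```

The expired Estimate is excluded even though it is both the fastest and the cheapest, since the guarantee only holds while the Ticket is valid.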
Resource Refinement
Resource Refinement – the Brokering protocol supports the inclusion
of a modified resource set with a Ticket. This resource set must be
adopted by the client for the Ticket to be valid.
This allows:
• Brokers to make offers on resource requirements they cannot exactly
match, e.g. a 256-processor Origin could return an offer when the
user asked for 512 processors. If the turnaround time and cost were
right, the user may accept.
• Brokers to offer different versions of requested applications, or
different applications supporting the same API.
• Most importantly, Brokers to be extended to handle abstract
application-specific resource descriptions.
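
The 512-versus-256 processor example can be written down as a tiny sketch. Both functions are hypothetical and reduce a resource set to a single processor count; they show only the rule that the client must adopt the replacement resource set for the Ticket to remain valid.

```python
def refine(requested_processors, available_processors):
    """Return the processor count the broker can offer, or None if it has none."""
    if available_processors <= 0:
        return None
    return min(requested_processors, available_processors)


def ticket_is_valid_for(submitted_processors, replacement_processors):
    """The Ticket only holds if the job adopts the replacement resource set."""
    return submitted_processors == replacement_processors


replacement = refine(requested_processors=512, available_processors=256)
print(replacement)
print(ticket_is_valid_for(512, replacement))  # original request: Ticket not honoured
print(ticket_is_valid_for(256, replacement))  # refined request: Ticket honoured
```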
Abstract Resource Description: Why?
• Currently, the user of an application has to know about the performance of
this code for a number of architectures. This is a waste of the application
scientist’s time, and is unnecessary.
• Users are not interested in knowing the performance characteristics of a
particular application on a given parallel architecture. They want to think in
terms of what their job does (e.g. simulates 24 hours of weather over the
Manchester area), turnaround time, and cost.
• We can do this using the job meta-data of the SoftwareResource – it’s
XML, so it can encode a set of integers, or a Gaussian Input Deck.
• The broker uses its programmed knowledge of the application’s
performance, and the characteristics of the machines that it is brokering for,
to turn this into concrete resource requirements, e.g. 4 hours on 32
processors of Manchester’s Cray T3E, or 2 hours on 128 processors of
Manchester’s Origin 3000.
• So the user gives their application domain requirement, and gets back a
list of turnaround times and costs.
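
The translation from a domain-level request to concrete requirements can be sketched with a toy performance model. Everything here is an assumption for illustration: the table of processor-hours is chosen only to reproduce the example figures above, and the model assumes perfect scaling, which a real broker could not rely on.

```python
# Hypothetical performance model: processor-hours needed to simulate
# 24 hours of weather on each machine. The figures are chosen only to
# reproduce the example numbers in the text and assume perfect scaling.
PROCESSOR_HOURS_PER_SIMULATED_DAY = {
    "Cray T3E": 128.0,
    "Origin 3000": 256.0,
}


def concrete_offers(simulated_hours, processors_by_machine):
    """Translate a domain-level request into (machine, processors, wall hours)."""
    offers = []
    for machine, processors in processors_by_machine.items():
        work = PROCESSOR_HOURS_PER_SIMULATED_DAY[machine] * simulated_hours / 24
        offers.append((machine, processors, work / processors))
    return offers


offers = concrete_offers(24, {"Cray T3E": 32, "Origin 3000": 128})
for machine, processors, hours in offers:
    print(f"{hours:g} hours on {processors} processors of {machine}")
```

The user supplies only the 24 simulated hours; the machine names, processor counts and timings all come from the broker's side.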
What next?
• Mechanism will be added to allow Broker to find out actual execution
time and cost, allowing comparison with estimate.
– This would provide feedback suitable for input into a learning engine.
– The Broker could modify its application performance characteristics.
– For a University of Manchester broker for Gaussian98 jobs, there would be
many points of comparison every day.
– This would mean that its estimates would increase in accuracy over time.
– Can also assess the reliability of certain sites, etc.
• Interoperability with other Grid middleware.
• Already have some interoperability with Globus Toolkit V2.
• Interoperability will increase as UNICORE moves towards the Open
Grid Services Architecture.
• With multi-tier brokering, one can imagine a series of steps of
resource abstraction/refinement.
• Payment for jobs – the brokers taking a cut.
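
The feedback loop proposed in the first bullet above could take many forms. As one minimal sketch, and not something the talk specifies, the Calibrator class below keeps an exponentially smoothed ratio of actual to estimated run time and applies it to future estimates. The class name and the smoothing parameter are assumptions.

```python
class Calibrator:
    """Hypothetical learner: tracks how far actual run times drift from estimates."""

    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing  # weight given to each new observation
        self.factor = 1.0           # current correction factor

    def record(self, estimated_hours, actual_hours):
        """Fold one completed job into the correction factor."""
        ratio = actual_hours / estimated_hours
        self.factor += self.smoothing * (ratio - self.factor)

    def corrected(self, estimated_hours):
        """Apply the learned correction to a raw estimate."""
        return estimated_hours * self.factor


calibrator = Calibrator(smoothing=0.5)
calibrator.record(estimated_hours=4.0, actual_hours=6.0)
print(calibrator.factor)
print(calibrator.corrected(4.0))
```

With many comparison points per day, a smaller smoothing value would make the factor less sensitive to any single unusual job.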
What the Broker Looks Like
Further Resources
• UNICORE downloads:
– http://www.unicore.org
• EUROGRID website:
– http://www.eurogrid.org
• Grid Interoperability website:
– http://www.grid-interoperability.org
• Online JavaDoc for AJO classes:
– http://people.man.ac.uk/~zzcgujm/unicore/AJO_V4/
• GGF GRAAP-WG Home Page:
– http://people.man.ac.uk/~zzcgujm/GGF/sched-graap-2.0.html
• GGF GESA-WG Home Page:
– http://www.doc.ic.ac.uk/~sjn5/GGF/GESA-WG3.htm