globus-job

CENG 546
Dr. Esma Yıldırım

A fundamental enabling technology for the
"Grid," letting people share computing power,
databases, and other tools securely online
across corporate, institutional, and geographic
boundaries without sacrificing local autonomy

It includes software services and libraries for
resource monitoring, discovery, and
management, plus security and file
management
 security
 information infrastructure
 resource management
 data management
 communication
 fault detection
 portability


How did Globus become the “de facto standard”
for Grid Computing?
A small team led by Ian Foster at Argonne created new
protocols that allowed I-WAY users to run applications
on computers across the country at Supercomputing '95.
 The experiment got the attention of DOE and NSF.
 With funding from many national agencies, the Globus
project began in 1996.
 The project has spurred a revolution in the way science is
conducted.

 High-energy physicists designing the Large Hadron Collider
at CERN are developing Globus-based technologies through
the European Data Grid and through U.S. efforts such as the
Grid Physics Network (GriPhyN) and the Particle Physics Data Grid.


The Grid Resource Allocation and
Management (GRAM5) component is used to
locate, submit, monitor, and cancel jobs on
Grid computing resources.
GRAM5 is not a Local Resource Manager (LRM), but
rather a set of services and clients for
communicating with a range of different
batch/cluster job schedulers using a common
protocol.

GRAM5 Components
 Gatekeeper
 Job Manager
 Scheduler Event Generator
 LRM Adapter




The globus-gatekeeper service provides a
network interface to the GRAM5 system.
It authenticates client identities and starts Job
Manager processes using the local user account
to which the client identity is mapped.
One instance of the globus-gatekeeper process
runs to accept network connections, and forks a
new short-lived process to process each new
connection.


The globus-job-manager daemon processes job
requests and coordinates file transfers.
There is one long-lived instance of this per user
per LRM and one short-lived instance per job.


The globus-scheduler-event-generator process
parses LRM-specific data relating to job
startup, execution, and termination into an
LRM-independent data format.
There is optionally one instance of this
program per LRM.


The LRM adapter provides an interface
between the GRAM5 system components and
the LRM.
It provides concrete implementations of the
submit, cancel, and poll functionality for a
particular system's LRM, and it generates job
status change events.




GRAM jobs consist of file transfers and program
execution on one or more compute elements
managed by a local resource manager.
The GRAM client can submit the job and then later
poll for its status, or it can request that the GRAM
service notify it when the job changes state or
completes.
While the job is executing, the client may send
control messages to the GRAM service to monitor
or modify the job.
GRAM provides reliable job submission, job
recovery in case of service or client failures, file
staging, and asynchronous notification messages.

GRAM achieves its uniform interface through
a domain-specific language, the Resource
Specification Language (RSL). RSL provides a
simple way to express job requirements,
environment, and commands in a specification
that is independent of the local resource
manager that will actually execute the job.
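
To make the RSL syntax concrete, here is a minimal Python sketch that renders attribute/value pairs as an RSL string of the form used in the globusrun examples in these slides. The helper names (rsl_value, build_rsl) are hypothetical, and the quoting rule (double quotes around values containing whitespace or punctuation, with embedded quotes doubled) is a conservative assumption rather than a full implementation of the RSL grammar.

```python
def rsl_value(value):
    """Render one RSL value, quoting it when it contains special characters.

    Assumption: quoting anything with whitespace, parentheses, quotes, or
    operator characters is safe, and an embedded double quote is doubled.
    """
    text = str(value)
    special = set(" \t\n()&|+=<>!\"'")
    if text == "" or any(ch in special for ch in text):
        return '"' + text.replace('"', '""') + '"'
    return text


def build_rsl(relations):
    """Render (attribute, value) pairs as a conjunctive RSL request.

    A value may be a single item or a list/tuple of items, for example
    several program arguments.
    """
    parts = []
    for name, value in relations:
        values = value if isinstance(value, (list, tuple)) else [value]
        rendered = " ".join(rsl_value(v) for v in values)
        parts.append(f"({name}={rendered})")
    return "&" + "".join(parts)


# The two job descriptions used with globusrun in these slides.
print(build_rsl([("executable", "/bin/sleep"), ("arguments", ["500"])]))
print(build_rsl([("executable", "/bin/hostname"), ("count", 5)]))
```

The same description can be handed to any GRAM service regardless of whether PBS, SGE, or another scheduler sits behind it, which is the point of keeping the specification LRM-independent.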


GRAM uses a proxy certificate, which is a short-term
credential digitally signed with a private key.
You must first obtain a security credential
(an X.509 certificate).

Before interacting with a GRAM service, you
must know its contact address:
grid.example.org:2120/jobmanager-sge:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
 Host name: grid.example.org
 Port number: 2120
 Service name: /jobmanager-sge
 Credential name: /C=US/O=Example/OU=Grid/CN=host/grid.example.org
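
As an illustration of how the four parts fit together, the following Python sketch splits a contact string of the form host[:port][/service][:credential name]. The GramContact type and parse_contact function are hypothetical helpers, not part of the Globus Toolkit, and the regular expression assumes a plain host name (no IPv6 literals). Parts that are absent are returned as None rather than filled with defaults.

```python
import re
from typing import NamedTuple, Optional


class GramContact(NamedTuple):
    host: str
    port: Optional[int]
    service: Optional[str]
    subject: Optional[str]  # the credential name (certificate subject)


# host, then optional ":port", optional "/service", optional ":subject".
_CONTACT = re.compile(
    r"(?P<host>[^:/]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<service>/[^:]*)?"
    r"(?::(?P<subject>.+))?"
)


def parse_contact(contact: str) -> GramContact:
    """Split a GRAM contact string into its parts."""
    match = _CONTACT.fullmatch(contact)
    if match is None:
        raise ValueError(f"not a recognizable GRAM contact: {contact!r}")
    port = match.group("port")
    return GramContact(
        host=match.group("host"),
        port=int(port) if port else None,
        service=match.group("service"),
        subject=match.group("subject"),
    )


full = parse_contact(
    "grid.example.org:2120/jobmanager-sge:"
    "/C=US/O=Example/OU=Grid/CN=host/grid.example.org"
)
print(full.host, full.port, full.service)
print(full.subject)
print(parse_contact("grid.example.org/jobmanager-pbs"))
```

The shorter form grid.example.org/jobmanager-pbs, used in the job examples in these slides, parses with no port and no credential name.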



globus-job-run : waits until the job terminates
before exiting, then prints the job's standard output
and standard error
globus-job-submit : submits the job and then
exits immediately, printing the job contact to its
standard output stream
globusrun : runs jobs described in RSL

Minimal job running
% globus-job-run grid.example.org/jobmanager-pbs /bin/hostname
node1.grid.example.org

submits a single instance of the /bin/hostname
executable to the resource named by
grid.example.org/jobmanager-pbs

Multiprocess job running
% globus-job-run grid.example.org/jobmanager-pbs -np 4 /bin/hostname
node1.grid.example.org
node3.grid.example.org
node2.grid.example.org
node10.grid.example.org


submits four instances of the executable /bin/hostname.
The output of the job is the names of the four hosts that
the job ran on. The -np COUNT option causes
globus-job-run to run COUNT instances of the executable.

Staging an executable file
% globus-job-run grid.example.org/jobmanager-pbs -s my-executable
node1.grid.example.org



submits an executable which is local to the submit machine to
the GRAM resource, then executes it.
The executable is removed automatically from the GRAM
resource after the job completes.
The -s option prior to the executable name causes
globus-job-run to stage the executable using GASS (an https-based
protocol) from the machine running globus-job-run to the
GRAM resource.

Providing an input file to a job
% globus-job-run grid.example.org/jobmanager-pbs -stdin inputfile.txt /bin/cat
Hello, Grid



submits a job to a GRAM resource.
When this job runs, its standard input will
read from the file $HOME/inputfile.txt, which
is located on the GRAM resource.
The -stdin command-line option indicates this
path.

Staging an input file to a job
% globus-job-run grid.example.org/jobmanager-pbs -stdin -s inputfile.txt /bin/cat
Hello, staged input on the Grid



submits a job to a GRAM resource.
When this job runs, its standard input will read
from the file inputfile.txt, which is located on
the submit client machine.
The -stdin -s command-line option
combination causes the input file to be staged
from the client, in the same way as the executable
in the staging example above.
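
The option combinations shown in the globus-job-run examples can be summarized with a small Python sketch that assembles the argument list. The function job_run_argv and its parameter names are hypothetical; it only reproduces the option orderings that appear in the examples (-np COUNT, -s before the executable, -stdin with an optional -s before the input file) and does not run anything.

```python
import shlex


def job_run_argv(contact, executable, args=(), count=None,
                 stage_executable=False, stdin=None, stage_stdin=False):
    """Build a globus-job-run argument list matching the slide examples."""
    argv = ["globus-job-run", contact]
    if count is not None:
        argv += ["-np", str(count)]          # run COUNT instances
    if stdin is not None:
        argv.append("-stdin")
        if stage_stdin:
            argv.append("-s")                # input file lives on the client
        argv.append(stdin)
    if stage_executable:
        argv.append("-s")                    # executable lives on the client
    argv.append(executable)
    argv.extend(args)
    return argv


contact = "grid.example.org/jobmanager-pbs"
print(shlex.join(job_run_argv(contact, "/bin/hostname", count=4)))
print(shlex.join(job_run_argv(contact, "my-executable", stage_executable=True)))
print(shlex.join(job_run_argv(contact, "/bin/cat",
                              stdin="inputfile.txt", stage_stdin=True)))
```

The resulting list could be passed to subprocess.run on a machine where the Globus client tools and a valid proxy credential are available.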

Submitting a job and polling its status
% globus-job-submit grid.example.org/jobmanager-pbs /bin/hostname
https://grid.example.org:38843/16001600430615223386/5295612977486013582/
% globus-job-status https://grid.example.org:38843/16001600430615223386/5295612977486013582/
PENDING
% globus-job-status https://grid.example.org:38843/16001600430615223386/5295612977486013582/
ACTIVE
% globus-job-status https://grid.example.org:38843/16001600430615223386/5295612977486013582/
DONE
% globus-job-get-output -r grid.example.org/jobmanager-fork \
    https://grid.example.org:38843/16001600430615223386/5295612977486013582/
node1.grid.example.org
% globus-job-clean -r grid.example.org/jobmanager-fork \
    https://grid.example.org:38843/16001600430615223386/5295612977486013582/
WARNING: Cleaning a job means:
    - Kill the job if it still running, and
    - Remove the cached output on the remote resource
Are you sure you want to cleanup the job now (Y/N) ? y
Cleanup successful.
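
The submit-then-poll pattern in this transcript can be automated. The sketch below is a hypothetical Python helper, not a Globus API: wait_for_job repeatedly calls a status function until the job reaches a terminal state. Treating DONE and FAILED as the terminal states is an assumption, and the status function is injected so the loop can be exercised without a Grid; globus_job_status shows how it might wrap the real globus-job-status command.

```python
import subprocess
import time

# Assumption: these are the states after which polling can stop.
TERMINAL_STATES = {"DONE", "FAILED"}


def globus_job_status(job_contact):
    """Ask globus-job-status for the state of one job (needs Globus clients)."""
    result = subprocess.run(
        ["globus-job-status", job_contact],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_job(get_status, job_contact, interval=5.0,
                 max_polls=None, sleep=time.sleep):
    """Poll until the job is terminal; return the distinct states observed."""
    seen = []
    polls = 0
    while True:
        state = get_status(job_contact)
        if not seen or seen[-1] != state:
            seen.append(state)               # record each state change once
        if state in TERMINAL_STATES:
            return seen
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job still {state} after {polls} polls")
        sleep(interval)


# Dry run with canned states instead of a real GRAM service.
canned = iter(["PENDING", "PENDING", "ACTIVE", "DONE"])
history = wait_for_job(
    lambda _contact: next(canned),
    "https://grid.example.org:38843/16001600430615223386/5295612977486013582/",
    sleep=lambda _seconds: None,
)
print(history)  # ['PENDING', 'ACTIVE', 'DONE']
```

On a real client machine, wait_for_job(globus_job_status, job_contact) would poll the service; the alternative is to let GRAM send asynchronous state notifications instead of polling.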

Basic interactive job
% globusrun -s -r grid.example.org/jobmanager-pbs "&(executable=/bin/hostname)(count=5)"
node03.grid.example.org
node01.grid.example.org
node02.grid.example.org
node05.grid.example.org
node04.grid.example.org

submits an interactive job with globusrun. When the -s option is
used, the output of the job command is returned to
the client and displayed as if the command ran
locally. This is similar to the behavior of the
globus-job-run program described above.

Basic batch job
% globusrun -b -r grid.example.org/jobmanager-pbs "&(executable=/bin/sleep)(arguments=500)"
globus_gram_client_callback_allow successful
GRAM Job submission successful
https://grid.example.org:38824/16001608125017717261/5295612977486019989/
GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING
% globusrun -status https://grid.example.org:38824/16001608125017717261/5295612977486019989/
PENDING
% globusrun -k https://grid.example.org:38824/16001608125017717261/5295612977486019989/
%

submits, monitors, and cancels a batch job using globusrun. This method is
useful when the job may run for a long time, when it may be
queued for a long time, or when there are network reliability issues
between the client and the service.



One of the foundational issues in HPC
is the ability to move large (multi-gigabyte,
and even terabyte) file-based data
sets between sites.
Simple file transfer mechanisms such as FTP
and SCP are not sufficient from either a
reliability or a performance perspective.
GridFTP extends the standard FTP protocol to
provide a high-performance, secure, reliable
protocol for bulk data transfer.




Performance - GridFTP protocol supports using
parallel TCP streams and multi-node transfers to
achieve high performance.
Checkpointing - GridFTP protocol requires that the
server send restart markers (checkpoint) to the client.
Third-party transfers - The FTP protocol on which
GridFTP is based separates control and data channels,
enabling third-party transfers, that is, the transfer of
data between two end hosts, mediated by a third host.
Security - Provides strong security on both control and
data channels. Control channel is encrypted by default.
Data channel is authenticated by default with optional
integrity protection and encryption.
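
To illustrate why restart markers matter, here is a hedged Python sketch of the bookkeeping a client could do with them. It assumes the markers have already been decoded into (start, end) byte ranges that the server reports as safely written (with parallel streams, these ranges may arrive out of order and overlap). The function remaining_ranges is a hypothetical helper and does not parse the GridFTP wire format.

```python
def remaining_ranges(total_size, received):
    """Return the [start, end) byte ranges not yet covered by `received`.

    `received` is an iterable of (start, end) pairs with end exclusive.
    """
    missing = []
    cursor = 0                               # first byte not known to be written
    for start, end in sorted(received):
        if start > cursor:
            missing.append((cursor, min(start, total_size)))
        cursor = max(cursor, end)
        if cursor >= total_size:
            break
    if cursor < total_size:
        missing.append((cursor, total_size))
    return missing


# A 100-byte file where two blocks were acknowledged before a failure.
print(remaining_ranges(100, [(50, 70), (0, 30)]))  # [(30, 50), (70, 100)]
```

After a failure, only the missing ranges need to be transferred again, rather than the whole file.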



The Globus GridFTP implementation provides:
 A server implementation called globus-gridftp-server
 A scriptable command-line client called globus-url-copy
 A set of development libraries for custom clients

globus-url-copy -vb -p 4 source_url destination_url
 -vb -> outputs transfer performance
 -p -> sets the number of parallel streams


Directory transfer
globus-url-copy -vb -p 4 -r -cd -cc 4 source_url destination_url
 -r -> copy files in subdirectories
 -cd -> create destination directory
 -cc -> number of concurrent connections
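
The options above can be collected in a small Python sketch that builds a globus-url-copy argument list. The function url_copy_argv and its parameter names are hypothetical; only the flags listed in these slides (-vb, -p, -r, -cd, -cc) are used, and the command is assembled, not executed.

```python
import shlex


def url_copy_argv(source_url, destination_url, parallel=None, verbose=True,
                  recursive=False, create_dirs=False, concurrency=None):
    """Build a globus-url-copy argument list from the documented options."""
    for name, value in (("parallel", parallel), ("concurrency", concurrency)):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be a positive integer")
    argv = ["globus-url-copy"]
    if verbose:
        argv.append("-vb")                   # report transfer performance
    if parallel is not None:
        argv += ["-p", str(parallel)]        # parallel streams per transfer
    if recursive:
        argv.append("-r")                    # descend into subdirectories
    if create_dirs:
        argv.append("-cd")                   # create destination directories
    if concurrency is not None:
        argv += ["-cc", str(concurrency)]    # concurrent connections
    argv += [source_url, destination_url]
    return argv


print(shlex.join(url_copy_argv("source_url", "destination_url", parallel=4)))
print(shlex.join(url_copy_argv("source_url", "destination_url", parallel=4,
                               recursive=True, create_dirs=True,
                               concurrency=4)))
```

Note the difference between the two tuning knobs: -p splits one file across several TCP streams, while -cc moves several files at the same time.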


Source and Destination URLs

file:///path/to/my/file
 if you are accessing a file on a file system accessible by
the host on which you are running your client.

gsiftp://hostname/path/to/remote/file
 if you are accessing a file from a GridFTP server.

Uploading a File
globus-url-copy -vb -p 4 file:///tmp/foo gsiftp://remote.machine.my.edu/tmp/bar

Downloading a File
globus-url-copy -vb -p 4 gsiftp://remote.machine.my.edu/tmp/bar file:///tmp/foo

Third-party Transfers
globus-url-copy -vb -p 4 gsiftp://other.machine.my.edu/tmp/foo gsiftp://remote.machine.my.edu/tmp/bar
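
The three transfer cases differ only in the URL schemes of the source and destination. As a minimal illustration, the hypothetical Python helper below classifies a transfer from its two URLs; it recognizes only the file and gsiftp schemes described in these slides and rejects anything else.

```python
from urllib.parse import urlsplit


def transfer_kind(source_url, destination_url):
    """Classify a transfer as upload, download, third-party, or local."""
    src = urlsplit(source_url).scheme
    dst = urlsplit(destination_url).scheme
    for scheme in (src, dst):
        if scheme not in ("file", "gsiftp"):
            raise ValueError(f"unsupported URL scheme: {scheme!r}")
    if src == "file" and dst == "gsiftp":
        return "upload"
    if src == "gsiftp" and dst == "file":
        return "download"
    if src == "gsiftp" and dst == "gsiftp":
        return "third-party"                 # server-to-server, client mediates
    return "local"                           # file to file on the client host


print(transfer_kind("file:///tmp/foo",
                    "gsiftp://remote.machine.my.edu/tmp/bar"))
print(transfer_kind("gsiftp://other.machine.my.edu/tmp/foo",
                    "gsiftp://remote.machine.my.edu/tmp/bar"))
```

In the third-party case the data flows directly between the two GridFTP servers; the client only drives the control channels.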

Job Submission -> GRAM
Data Transfer -> GridFTP
Security -> GSI (coming soon)