Tutorial 1: Client command line tutorial

This document is a tutorial on using the OMII_1 Client Distribution’s
Command Line API to write programs that run an application installed on an OMII_1 Server
(sometimes referred to as a service provider). The Command Line API is made up of a set
of commands provided by the "ogre_client" script in the OMII_1 Client Distribution.
This tutorial is targeted at developers with a basic to intermediate level of bash scripting
expertise, who are familiar with running basic jobs from the OMII_1 Client Distribution using
the command line on Linux.
Example 1: Single Job
Example 2: Two Sequential Jobs
Example 3: Two Parallel Jobs
Example 4: N Parallel Jobs
Document Scope
This tutorial will focus on the client-side aspects of using the command line to implement
simple, typical grid workflow models to drive the execution of jobs on a service provider. It will
deal with how to use the client API to utilize the four services contained within a service
provider installation that facilitate obtaining an allocation, uploading input data, executing a
number of jobs, and downloading output data.
Prerequisites
The prerequisites for the tutorial are:
1. An installation of an OMII_1 Client on a Linux platform. This will be used as a platform to
submit jobs from. If you are using an OMII_1 client on Windows, you may want to rewrite the
bash scripts shown in this tutorial with Windows batch scripting.
2. Access to an OMII_1 Server which has the OMII Base, Extension and Services installed.
    ● An open, credit-checked account on the OMII_1 Server. (See Creating a new account.)
    ● A working and tested GRIATestApp application (installed by default as part of the
    OMII_1 services installation). See How to install the OMII Services.
Scenario Background
The simple scenario used by this tutorial is one that represents a class of problem that is
commonly solved using grid software: a user has a set of data that he wants processed by an
application, and he wishes to use the computational and data resources provided by another
user to expedite this process. This may involve running the application many times over this
data to produce output for each element in the data set.
Using an OMII_1 approach, this process involves the following steps, described in a task-oriented form:
1. Ensure application is installed on the service provider: Since the OMII_1
Distribution uses a static application model the user needs to ensure that the
application they wish to use is installed on the service provider.
2. Obtain account on the service provider: This is required before anything can be
done by the client on the server.
3. Obtain resource allocation: This involves determining the requirements of the
entire process and submitting them as a request for resources to the service provider.
If the request is approved, an allocation is sent back to the client that provides a
context within which applications can be run.
4. Upload input data: Data to be processed by the application is uploaded to input
data stager area(s) on the server.
5. Execute the application on the input data: Execute the application on each
element in the input data (held in the input data stager areas). Output is held in output
data stager area(s). A single run of an application on a set of input data that produces
output data is referred to as a job.
6. Download the output produced by the application: Data produced by the
application from the input data is downloaded from the output data stager areas on the
server.
7. Finish the allocation: The client informs the server that they no longer require the
allocation.
Note that the first step is a manual one achieved by interaction between the client user and a
service provider administrator, who must inspect and approve the application before installing
it on their server. The second step is initiated using the OMII_1 Client software as a request
to a service provider who must approve, and possibly credit check, the user’s request. Upon
success of the request, the user receives an account URI that they can use to access the
functionality provided by the server itself. Taking the nature of steps 1-2 into account, it is
steps 3-7 that we can automate into a command line workflow.
An interesting point to note is that when a job is submitted to the service provider, the job may
be run on the machine local to the service provider or it may be redistributed to a cluster
manager (e.g. PBS or Condor), depending on how the service-provider is configured. You will
have no control over which platform your job is running on. The benefit of this is that you will
not have to worry about writing different submission scripts for jobs running on different
cluster resource manager platforms.
Example Implementations
This section will look at how to write command line clients that use the OMII_1 Client API to
run jobs. The examples shown in this tutorial use the GRIATestApp application preinstalled
in any standard installation of the OMII_1 Server, and the command line examples will focus
on Linux Bash scripts. The GRIATestApp is an application implemented in Perl which inputs
one or more text files, sorts the words in each file into alphabetical order and outputs the
same number of text files. On the service provider, the application wrapper script for the
GRIATestApp accepts input and produces output in a zipped file format.
Environment
● All the examples shown later are expected to be run from the OMII_1 Client home
directory (referred to as <omii-client-home> from now on). The default installation
value for <omii-client-home> is ~/OMIICLIENT under Linux.
● The examples expect to find the following files under
<omii-client-home>/test-services/test:
    ● your account file, i.e. Account-test.xml,
    ● the resource requirements XML file, i.e. Requirements-test.xml. This file
    describes the resource requirements, including the current start time, for all the
    jobs you wish to run in a session. As you will see later, we can generate this file
    automatically just before we submit a job.
    ● the job description file, i.e. Work-test.xml. This file describes the resource
    requirements for one job.
    ● the input file, e.g. input.zip.
● Each example uses a Java utility class called CreateRequirementsFile to create the
resource requirements XML at run time. The use of CreateRequirementsFile requires
CreateRequirementsFile.class and CreateRequirementsFileException.class. These two
Java classes can be found in the OMII_1 Client distribution. The examples expect to
find these classes under <omii-client-home>.
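Before running any of the examples, it can help to confirm that these files are in place. The following is a minimal sketch of such a check, not part of the OMII_1 Client: the function name missing_files is hypothetical, and Requirements-test.xml is deliberately left out because the examples generate it at run time.

```shell
#!/bin/sh
# Sketch: list the files the examples expect but that are not present.
# missing_files is a hypothetical helper name; the file names are the ones
# listed in the Environment section. $1 is <omii-client-home>.

missing_files() {
    home=$1
    dir=$home/test-services/test
    for f in \
        "$dir/Account-test.xml" \
        "$dir/Work-test.xml" \
        "$dir/input.zip" \
        "$home/CreateRequirementsFile.class" \
        "$home/CreateRequirementsFileException.class"
    do
        # Print the path only when the file does not exist.
        [ -f "$f" ] || echo "$f"
    done
}
```

Called as missing_files . from <omii-client-home>, the function prints one path per missing file and prints nothing when everything is present, so a script can stop early if the output is not empty.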
Example 1: Single Job
This example shows a basic workflow for submitting a job that takes in an input file named
input.zip and produces an output file called output.zip. The example is a bash script with DIR
defined as ./test-services/test.
#!/bin/sh
DIR=./test-services/test
Before we can run any job, we need to first request some resources from the service provider.
To do so, we use the class CreateRequirementsFile to create an XML file which details the
requirements of all the jobs to be executed in this example. The class takes in two
parameters: (i) the application suite name, i.e. http://omii.org/GRIATestApp, and (ii) the path
and file name of the XML file to be generated, e.g. Requirements-test.xml. Always make sure
the requirements detailed in this XML file are sufficient for all the jobs you wish to run. You
can edit this file manually if needed.
java CreateRequirementsFile http://omii.org/GRIATestApp $DIR/Requirements-test.xml
With the Requirements-test.xml file, we can request an allocation of the resources specified in
Requirements-test.xml file from the service provider using the ogre_client.sh tender
command. The command takes in three arguments: (i) the Account-test.xml file (which is
obtained when we open an account with our service provider) and its path name, (ii) the
Requirements-test.xml file and its path name, and (iii) MyTask, which is the name given to
this resource allocation.
./ogre_client.sh tender $DIR/Account-test.xml $DIR/Requirements-test.xml MyTask
Next, the input data is uploaded to the service provider using the command ogre_client.sh
upload. For this command, we specify (i) the name of the resource allocation to be used, i.e.
MyTask, and (ii) the directory path and the input file name. The input file name, i.e. input.zip is
used later as a handler to refer to the input data staging area on the server.
./ogre_client.sh upload MyTask $DIR/input.zip
Now we are ready to run the job. ogre_client.sh run takes five parameters: (i) MyTask,
the name of the resource allocation used to run the job, (ii) http://omii.org/GRIATestApp, the
name of the application suite containing the application GRIATestApp, (iii) Work-test.xml and
its directory path (this file describes the processing requirements of this particular job), (iv)
input.zip, which is the input data staging handler, and (v) output.zip, which is the handler
specified to refer to the output data staging area on the server.
./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml \
    --input input.zip --output output.zip
output.zip is used next in the ogre_client.sh download command to download the output
from the server.
./ogre_client.sh download output.zip
Once the output is downloaded, we use ogre_client.sh finish to release the resources back
to the service provider.
./ogre_client.sh finish
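As written, the script above carries on to the next command even if a step fails. The following is a minimal sketch of one way to stop at the first failure. It assumes that ogre_client.sh returns a non-zero exit status when a command fails, which this tutorial does not state, and run_step is a hypothetical helper name rather than part of the client.

```shell
#!/bin/sh
# Sketch: run one workflow step and report whether it succeeded.
# ASSUMPTION: ogre_client.sh exits with a non-zero status on failure.
# run_step is a hypothetical helper, not an OMII_1 command.

run_step() {
    label=$1
    shift
    echo "$label ..."
    # Run the remaining arguments as the command for this step.
    if "$@"; then
        return 0
    fi
    echo "FAILED: $label" >&2
    return 1
}

# Possible use inside the Example 1 script:
#   run_step "Uploading input" ./ogre_client.sh upload MyTask $DIR/input.zip || exit 1
#   run_step "Downloading output" ./ogre_client.sh download output.zip || exit 1
```

If the assumption about exit codes does not hold for your installation, the helper will never report a failure, so it is worth testing against a command that is known to fail.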
Note that if you wish to run an application which can take in more than one input and/or
produce more than one output, you may do so using the multiple input/output options of
the ogre_client.sh run or ogre_client.sh start command (see Example 3 for the usage of
ogre_client.sh start). For example, to run jobs on GRIATestApp with two inputs and two
outputs, we use:
./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml \
    --input input1.zip --input input2.zip --output output1.zip --output output2.zip
Each zip file in this case contains exactly one input/output file, and exactly one data staging
area is defined for each of the zip files. Note that this is an application-dependent feature.
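For more than a couple of inputs, the option list can be generated rather than typed. The following is a minimal sketch that follows the input1.zip/output1.zip naming and the option order used in the command above. The helper name build_io_args is hypothetical, and whether a given application accepts N inputs is application-dependent, as noted.

```shell
#!/bin/sh
# Sketch: print the --input/--output options for N input/output pairs,
# inputs first and outputs second, as in the two-input example.
# build_io_args is a hypothetical helper name.

build_io_args() {
    n=$1
    inputs=""
    outputs=""
    i=1
    while [ "$i" -le "$n" ]; do
        inputs="$inputs --input input$i.zip"
        outputs="$outputs --output output$i.zip"
        i=$((i + 1))
    done
    all="$inputs$outputs"
    # Strip the single leading space before printing.
    echo "${all# }"
}

# Possible use (left unquoted on purpose so the options are split into words):
#   ./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml $(build_io_args 2)
```

Each input named in the generated options would still need to be uploaded first with ogre_client.sh upload.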
Example 2: Two Sequential Jobs
This example illustrates a workflow where two jobs are run in sequence within the same
resource allocation. The ogre_client.sh run command is used to run each job, and the
command only returns control to the script after the job has finished. This then allows us to
start the second job only after the first run returns its control. While the job is running, the
status of the job is monitored periodically until the job completes.
#!/bin/sh
DIR=./test-services/test
echo Create Requirements-test.xml file ...
java CreateRequirementsFile http://omii.org/GRIATestApp $DIR/Requirements-test.xml
echo Using a valid account to request resources ...
./ogre_client.sh tender $DIR/Account-test.xml $DIR/Requirements-test.xml MyTask
echo Uploading input 1...
./ogre_client.sh upload MyTask $DIR/input-1.zip
echo Running job 1...
./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml \
    --input input-1.zip --output output-1.zip
echo Downloading output 1 ...
./ogre_client.sh download output-1.zip
echo Uploading input 2...
./ogre_client.sh upload MyTask $DIR/input-2.zip
echo Running job 2...
./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml \
    --input input-2.zip --output output-2.zip
echo Downloading output 2 ...
./ogre_client.sh download output-2.zip
echo Releasing resource allocation ...
./ogre_client.sh finish
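The upload, run and download steps of this example repeat for each job, so they can be written as a loop. The following is a minimal sketch for N sequential jobs, using only the commands shown above. OGRE and run_sequential are hypothetical names. Setting OGRE=echo prints the commands instead of running them, and the early return assumes that ogre_client.sh exits with a non-zero status on failure.

```shell
#!/bin/sh
# Sketch: run N jobs one after another within one resource allocation.
# OGRE and run_sequential are hypothetical names, not part of the client.
# ASSUMPTION: ogre_client.sh exits non-zero when a step fails.

OGRE=${OGRE:-./ogre_client.sh}
DIR=${DIR:-./test-services/test}

run_sequential() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        # Each job uploads its own input, runs to completion, then downloads.
        $OGRE upload MyTask "$DIR/input-$i.zip" || return 1
        $OGRE run MyTask http://omii.org/GRIATestApp "$DIR/Work-test.xml" \
            --input "input-$i.zip" --output "output-$i.zip" || return 1
        $OGRE download "output-$i.zip" || return 1
        i=$((i + 1))
    done
}

# Possible use, between the tender and finish commands:
#   run_sequential 2
```

Because ogre_client.sh run only returns once the job has finished, job i+1 is not uploaded or started until job i has been downloaded.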
Example 3: Two Parallel Jobs
This example illustrates a workflow involving two jobs running in parallel. Here, we use
ogre_client.sh start instead of ogre_client.sh run. The former is different from the latter in
that ogre_client.sh start returns control to the script right after the job is started. This then
allows us to start off the second job and use ogre_client.sh monitor to monitor all the current
jobs until they have finished.
#!/bin/sh
DIR=./test-services/test
echo Create Requirements-test.xml file ...
java CreateRequirementsFile http://omii.org/GRIATestApp $DIR/Requirements-test.xml
echo Using a valid account to request resources ...
./ogre_client.sh tender $DIR/Account-test.xml $DIR/Requirements-test.xml MyTask
echo Uploading input...
./ogre_client.sh upload MyTask $DIR/input-1.zip
./ogre_client.sh upload MyTask $DIR/input-2.zip
echo Running job ...
./ogre_client.sh start MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml \
    --input input-1.zip --output output-1.zip
./ogre_client.sh start MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml \
    --input input-2.zip --output output-2.zip
echo Monitoring submitted jobs ...
./ogre_client.sh monitor
echo Downloading output ...
./ogre_client.sh download output-1.zip
./ogre_client.sh download output-2.zip
echo Releasing resource allocation ...
./ogre_client.sh finish
Note that the OMII_1 Client command line interface currently logs all job states in a single
local state file named client.state, located at <omii-client-home>. If more than one script
reads or writes the state file at the same time, unexpected behaviour may occur. In addition,
when ogre_client.sh monitor is called from a script, the command reads all the current jobs
recorded in the state file, including those launched by other scripts, which is not the desired
behaviour. In short, it is not safe to run more than one script from the same client platform to
drive different jobs simultaneously. To work around this problem, if you wish to run more than
one job simultaneously, use a single script to do the work, as demonstrated by Example 3 and
Example 4. This problem has been identified and will be rectified in a future OMII release.
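Where several scripts might be launched from the same <omii-client-home> by accident, a simple lock can enforce the one-script-at-a-time rule. The following is a minimal sketch and not a feature of the OMII_1 Client: LOCK_DIR, acquire_lock and release_lock are hypothetical names. It relies on mkdir either creating the directory or failing in one step.

```shell
#!/bin/sh
# Sketch: allow only one workflow script at a time to use the client.
# LOCK_DIR, acquire_lock and release_lock are hypothetical names and are
# not part of the OMII_1 Client.

LOCK_DIR=${LOCK_DIR:-./ogre_client.lock}

acquire_lock() {
    # mkdir fails if the directory already exists, i.e. the lock is held.
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Possible use at the top of a workflow script:
#   acquire_lock || { echo "Another workflow is running" >&2; exit 1; }
#   trap release_lock EXIT
#   ... tender, upload, start, monitor, download, finish ...
```

This only protects scripts that all use the same lock, and it makes whole workflows run one after another rather than concurrently. A lock left behind by a script that was killed has to be removed by hand.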
Example 4: N Parallel Jobs
This example shows you how to submit n parallel jobs:
#!/bin/sh
DIR=./test-services/test
INPUT_FILE_NUM=2
echo Create Requirements.xml file...
java CreateRequirementsFile http://omii.org/GRIATestApp $DIR/Requirements-test.xml
echo Tender...
./ogre_client.sh tender $DIR/Account-test.xml $DIR/Requirements-test.xml MyTask
echo Stage input...
for IN in `seq 1 $INPUT_FILE_NUM`; do
./ogre_client.sh upload MyTask $DIR/input-$IN.zip
done
echo Run jobs...
for IN in `seq 1 $INPUT_FILE_NUM`; do
./ogre_client.sh start MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml \
    --input input-$IN.zip --output output-$IN.zip
done
echo Monitor until all jobs complete...
./ogre_client.sh monitor
echo Stage output...
for OUT in `seq 1 $INPUT_FILE_NUM`; do
./ogre_client.sh download output-$OUT.zip
done
echo Release resource...
./ogre_client.sh finish
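INPUT_FILE_NUM is hard-coded in the script above. The following is a minimal sketch of deriving it from the input-1.zip, input-2.zip, ... files actually present, assuming they are numbered consecutively from 1. The helper name count_inputs is hypothetical.

```shell
#!/bin/sh
# Sketch: count consecutively numbered input files (input-1.zip, input-2.zip,
# ...) in a directory, stopping at the first missing number.
# count_inputs is a hypothetical helper name.

count_inputs() {
    dir=$1
    n=0
    while [ -f "$dir/input-$((n + 1)).zip" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

# Possible use in place of the fixed value:
#   INPUT_FILE_NUM=$(count_inputs "$DIR")
```

Stopping at the first gap keeps the count consistent with the seq 1 $INPUT_FILE_NUM loops, which expect every number in the range to have a matching file.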
1.2 P1 © University of Southampton, Open Middleware Infrastructure Institute