Tutorial 1: Client command line tutorial

This document provides a tutorial for using the OMII_1 Client Distribution's Command Line API to write programs that run an application installed on an OMII_1 Server (also referred to as a service provider). The Command Line API is made up of a set of commands provided by the "ogre_client" script in the OMII_1 Client Distribution.

This tutorial is targeted at developers with basic to intermediate bash scripting expertise who are familiar with running basic jobs from the OMII_1 Client Distribution using the command line on Linux.

Example 1: Single Job
Example 2: Two Sequential Jobs
Example 3: Two Parallel Jobs
Example 4: N Parallel Jobs

Document Scope

This tutorial focuses on the client-side aspects of using the command line to implement simple, typical grid workflow models that drive the execution of jobs on a service provider. It deals with how to use the client API to exercise the four services contained within a service provider installation, which facilitate obtaining an allocation, uploading input data, executing a number of jobs, and downloading output data. See the corresponding examples in Globus, PBS or Globus/PBS.

Prerequisites

The prerequisites for the tutorial are:
1. An installation of an OMII_1 Client on a Linux platform. This will be used as the platform from which to submit jobs. If you are using an OMII_1 Client on Windows, you may want to rewrite the bash scripts shown in this tutorial as Windows batch scripts.
2. Access to an OMII_1 Server which has the OMII Base, Extension and Services installed.
● An open, credit-checked account on the OMII_1 Server. (See Creating a new account.)
● A working and tested GRIATestApp application (installed by default as part of the OMII_1 Services installation). See How to install the OMII Services.
Scenario Background

The simple scenario used by this tutorial represents a class of problem that is commonly solved using grid software: a user has a set of data that they want processed by an application, and they wish to use the computational and data resources provided by another user to expedite this process. This may involve running the application many times over this data to produce output for each element in the data set. Using an OMII_1 approach, this process involves the following steps, described in a task-oriented form:

1. Ensure the application is installed on the service provider: since the OMII_1 Distribution uses a static application model, the user needs to ensure that the application they wish to use is installed on the service provider.
2. Obtain an account on the service provider: this is required before anything can be done by the client on the server.
3. Obtain a resource allocation: this involves determining the requirements of the entire process and submitting them as a request for resources to the service provider. If the request is approved, an allocation is sent back to the client that provides a context within which applications can be run.
4. Upload input data: data to be processed by the application is uploaded to input data stager area(s) on the server.
5. Execute the application on the input data: execute the application on each element of the input data (held in the input data stager areas). Output is held in output data stager area(s). A single run of an application on a set of input data that produces output data is referred to as a job.
6. Download the output produced by the application: data produced by the application from the input data is downloaded from the output data stager areas on the server.
7. Finish the allocation: the client informs the server that they no longer require the allocation.
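The scriptable portion of these steps (3-7) can be sketched as a small shell function. The wrapper function, its parameters, and the fixed file names here are ours for illustration; the ogre_client.sh subcommands are the ones used throughout the examples in this tutorial.

```shell
# Sketch of the automatable workflow, steps 3-7.
# The wrapper function and its parameters are illustrative only.
run_workflow() {
  client=$1   # path to the client script, e.g. ./ogre_client.sh
  dir=$2      # directory holding Account-test.xml, Requirements-test.xml, etc.

  # Step 3: request a resource allocation, here named MyTask
  "$client" tender "$dir/Account-test.xml" "$dir/Requirements-test.xml" MyTask
  # Step 4: upload the input data to the input staging area
  "$client" upload MyTask "$dir/input.zip"
  # Step 5: run the job against the staged input
  "$client" run MyTask http://omii.org/GRIATestApp "$dir/Work-test.xml" \
      --input input.zip --output output.zip
  # Step 6: download the output produced by the application
  "$client" download output.zip
  # Step 7: release the allocation
  "$client" finish
}
```

Passing a stand-in command such as echo as the first parameter prints the commands without contacting a server, which is a cheap way to dry-run the sequence.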
Note that the first step is a manual one, achieved by interaction between the client user and a service provider administrator, who must inspect and approve the application before installing it on their server. The second step is initiated using the OMII_1 Client software as a request to a service provider, who must approve, and possibly credit check, the user's request. Upon success of the request, the user receives an account URI that they can use to access the functionality provided by the server itself. Taking the nature of steps 1-2 into account, it is steps 3-7 that we can automate into a command line workflow.

An interesting point to note is that when a job is submitted to the service provider, the job may be run on the machine local to the service provider or it may be redistributed to a cluster manager (e.g. PBS or Condor), depending on how the service provider is configured. You have no control over which platform your job runs on. The benefit of this is that you do not have to write different submission scripts for jobs running on different cluster resource manager platforms.

Example Implementations

This section looks at how to write command line clients that use the OMII_1 Client API to run jobs. The examples shown in this tutorial use the GRIATestApp application pre-installed in any standard installation of the OMII_1 Server, and the command line examples focus on Linux bash scripts. GRIATestApp is an application implemented in Perl which takes one or more text files as input, sorts the words in each file into alphabetical order and outputs the same number of text files. On the service provider, the application wrapper script for GRIATestApp accepts input and produces output in a zipped file format.

Environment

● All the examples shown later are expected to be run from the OMII_1 Client home directory (referred to as <omii-client-home> from now on).
● The default installation value for <omii-client-home> is ~/OMIICLIENT under Linux.
● The examples expect to find the following files under <omii-client-home>/test-services/test:
  ● your account file, i.e. Account-test.xml;
  ● the resource requirements XML file, i.e. Requirements-test.xml. This file describes the resource requirements, including the current start time, for all the jobs you wish to run in a session. As you will see later, we can generate this file automatically just before we submit a job;
  ● the job description file, i.e. Work-test.xml. This file describes the resource requirements for one job;
  ● the input file, e.g. input.zip.

Each example uses a Java utility class called CreateRequirementsFile to create the resource requirements XML at run time. The use of CreateRequirementsFile requires CreateRequirementsFile.class and CreateRequirementsFileException.class. These two Java classes can be found in the OMII_1 Client distribution, and the examples expect to find them under <omii-client-home>.

Example 1: Single Job

This example shows a basic workflow for submitting a job that takes in an input file named input.zip and produces an output file called output.zip. The example is a bash script with DIR defined as ./test-services/test:

#!/bin/sh
DIR=./test-services/test

Before we can run any job, we first need to request some resources from the service provider. To do so, we use the class CreateRequirementsFile to create an XML file which details the requirements of all the jobs to be executed in this example. The class takes two parameters: (i) the application suite name, i.e. http://omii.org/GRIATestApp, and (ii) the path and file name of the XML file to be generated, e.g. Requirements-test.xml. Always make sure the requirements detailed in this XML file are sufficient for all the jobs you wish to run. You can edit this file manually if needed.
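Since a missing file only surfaces once a command fails part-way through a workflow, it can help to check up front that the files listed in the Environment section are in place. A small sketch in plain shell (the function name is ours; Requirements-test.xml is omitted because CreateRequirementsFile generates it at run time):

```shell
# Verify that the files the examples expect are present in the given
# directory before starting a workflow. Returns non-zero if any is missing.
check_files() {
  dir=$1
  missing=0
  for f in Account-test.xml Work-test.xml input.zip; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return $missing
}
```

A script could call check_files "$DIR" immediately after setting DIR and exit early on failure.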
java CreateRequirementsFile http://omii.org/GRIATestApp $DIR/Requirements-test.xml

With the Requirements-test.xml file in place, we can request an allocation of the resources it specifies from the service provider using the ogre_client.sh tender command. The command takes three arguments: (i) the Account-test.xml file (obtained when we open an account with our service provider) and its path, (ii) the Requirements-test.xml file and its path, and (iii) MyTask, the name given to this resource allocation.

./ogre_client.sh tender $DIR/Account-test.xml $DIR/Requirements-test.xml MyTask

Next, the input data is uploaded to the service provider using the ogre_client.sh upload command. For this command, we specify (i) the name of the resource allocation to be used, i.e. MyTask, and (ii) the directory path and the input file name. The input file name, i.e. input.zip, is used later as a handle to refer to the input data staging area on the server.

./ogre_client.sh upload MyTask $DIR/input.zip

Now we are ready to run the job. ogre_client.sh run takes five parameters: (i) MyTask, the name of the resource allocation used to run the job, (ii) http://omii.org/GRIATestApp, the name of the application suite containing the application GRIATestApp, (iii) Work-test.xml and its directory path (this file describes the processing requirements of this particular job), (iv) input.zip, the input data staging handle, and (v) output.zip, the handle specified to refer to the output data staging area on the server.

./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml --input input.zip --output output.zip

output.zip is used next in the ogre_client.sh download command to download the output from the server.

./ogre_client.sh download output.zip

Once the output is downloaded, we use ogre_client.sh finish to release the resources back to the service provider.
./ogre_client.sh finish

Note that if you wish to run an application which can take more than one input and/or produce more than one output, you may do so using the multiple input/output options of the ogre_client.sh run or ogre_client.sh start command (see Example 3 for the usage of ogre_client.sh start). For example, to run GRIATestApp with two inputs and two outputs, we use:

./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml --input input1.zip --input input2.zip --output output1.zip --output output2.zip

Each zip file in this case contains exactly one input/output file, and exactly one data staging area is defined for each of the zip files. Note that this is an application-dependent feature.

Example 2: Two Sequential Jobs

This example illustrates a workflow where two jobs are run in sequence within the same resource allocation. The ogre_client.sh run command is used to run each job, and the command only returns control to the script after the job has finished. This allows us to start the second job only after the first run returns control. While a job is running, its status is monitored periodically until it completes.

#!/bin/sh
DIR=./test-services/test

echo Create Requirements-test.xml file
java CreateRequirementsFile http://omii.org/GRIATestApp $DIR/Requirements-test.xml

echo Using a valid account to request resources ...
./ogre_client.sh tender $DIR/Account-test.xml $DIR/Requirements-test.xml MyTask

echo Uploading input 1...
./ogre_client.sh upload MyTask $DIR/input-1.zip

echo Running job 1...
./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml --input input-1.zip --output output-1.zip

echo Downloading output 1 ...
./ogre_client.sh download output-1.zip

echo Uploading input 2...
./ogre_client.sh upload MyTask $DIR/input-2.zip

echo Running job 2...
./ogre_client.sh run MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml --input input-2.zip --output output-2.zip

echo Downloading output 2 ...
./ogre_client.sh download output-2.zip

echo Releasing resource allocation ...
./ogre_client.sh finish

Example 3: Two Parallel Jobs

This example illustrates a workflow involving two jobs running in parallel. Here, we use ogre_client.sh start instead of ogre_client.sh run. The former differs from the latter in that ogre_client.sh start returns control to the script as soon as the job has been started. This allows us to start the second job immediately and then use ogre_client.sh monitor to monitor all the current jobs until they have finished.

#!/bin/sh
DIR=./test-services/test

echo Create Requirements-test.xml file
java CreateRequirementsFile http://omii.org/GRIATestApp $DIR/Requirements-test.xml

echo Using a valid account to request resources ...
./ogre_client.sh tender $DIR/Account-test.xml $DIR/Requirements-test.xml MyTask

echo Uploading input...
./ogre_client.sh upload MyTask $DIR/input-1.zip
./ogre_client.sh upload MyTask $DIR/input-2.zip

echo Running jobs ...
./ogre_client.sh start MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml --input input-1.zip --output output-1.zip
./ogre_client.sh start MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml --input input-2.zip --output output-2.zip

echo Monitoring submitted jobs ...
./ogre_client.sh monitor

echo Downloading output ...
./ogre_client.sh download output-1.zip
./ogre_client.sh download output-2.zip

echo Releasing resource allocation ...
./ogre_client.sh finish

One important thing to note is that, currently, if the OMII_1 Client command line interface is used, all job states are logged in a single local state file named client.state, located at <omii-client-home>. Therefore, if more than one script tries to read or write the state file at the same time, unexpected behaviour might occur.
Also, when ogre_client.sh monitor is called from a script, the command reads all the current jobs recorded in the state file, including those launched by other scripts. This is clearly not the desired behaviour. In short, the command line interface is not safe to use when more than one script is run from the same client platform to drive different jobs simultaneously. To work around this problem, if you wish to run more than one job simultaneously, use a single script to do the work, as demonstrated by Example 3 and Example 4. This problem has been identified and will be rectified in a future OMII release.

Example 4: N Parallel Jobs

This example shows how to submit n parallel jobs:

#!/bin/sh
DIR=./test-services/test
INPUT_FILE_NUM=2

echo Create Requirements.xml file...
java CreateRequirementsFile http://omii.org/GRIATestApp $DIR/Requirements-test.xml

echo Tender...
./ogre_client.sh tender $DIR/Account-test.xml $DIR/Requirements-test.xml MyTask

echo Stage input...
for IN in `seq 1 $INPUT_FILE_NUM`; do
  ./ogre_client.sh upload MyTask $DIR/input-$IN.zip
done

echo Run jobs...
for IN in `seq 1 $INPUT_FILE_NUM`; do
  ./ogre_client.sh start MyTask http://omii.org/GRIATestApp $DIR/Work-test.xml --input input-$IN.zip --output output-$IN.zip
done

echo Monitor until all jobs complete...
./ogre_client.sh monitor

echo Stage output...
for OUT in `seq 1 $INPUT_FILE_NUM`; do
  ./ogre_client.sh download output-$OUT.zip
done

echo Release resource...
./ogre_client.sh finish

1.2 P1 © University of Southampton, Open Middleware Infrastructure Institute