HUBbub 2013:
Developing hub tools that submit
HPC jobs
Rob Campbell
Purdue University
Thursday, September 5, 2013
Example
▪ “SubmitR” tool running on the DiaGrid hub
• DiaGrid: distributed
research computing
network
• SubmitR: hub tool for
running R scripts on
DiaGrid
SubmitR
▪ Move files, run job on remote system, view results
Hub
Building a job:
Files, options/arguments, job parameters
Job Types
• One process
• Multiple processes, communicating
• Independent processes (parameter sweep)
The “submit” command:
▪ Runs user command on a remote system
submit
1. Connect to remote system
2. Transfer input files and program
3. Create script for user’s command
4. Talk to batch or workflow system
5. Output periodic status updates
6. Transfer files back to hub
submit options:
• VENUES - remote systems
• MANAGERS - commands that can be run on remote systems
For SubmitR, submit uses:
• PBS job scheduling on Purdue’s Hansen cluster (single or parallel jobs)
• Pegasus workflow management with HTCondor (parameter sweeps)
Building the submit command:
Job should use 2 processors, 60
minutes walltime, run on Hansen
cluster, and collect metrics.
File “inp.dat” should be included
(transported to remote system).
submit -n 2 -w 60 -v hansen -M -i inp.dat R-2.15.1 CMD BATCH -q "--args inp.dat" myscript.R
Use manager “R-2.15.1”. Causes “R”
interpreter to run on remote system.
Options for the R interpreter. Note: submit
detects that “myscript.R” is used and
transports it to remote system.
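A tool written in Python might assemble this command as an argument list rather than a single string, so that the quoted "--args inp.dat" stays one argument. The sketch below is only an illustration: the helper name build_submit_command and its parameters are hypothetical, and it uses only the submit options shown above (-n, -w, -v, -M, -i).

```python
import shlex


def build_submit_command(ncpus, walltime_min, venue, input_files,
                         manager, manager_args, metrics=True):
    """Return a submit invocation as a list of arguments (hypothetical helper)."""
    cmd = ["submit", "-n", str(ncpus), "-w", str(walltime_min), "-v", venue]
    if metrics:
        cmd.append("-M")            # collect metrics
    for path in input_files:
        cmd += ["-i", path]         # files to transport to the remote system
    cmd.append(manager)             # e.g. the "R-2.15.1" manager
    cmd += list(manager_args)       # options passed on to the R interpreter
    return cmd


cmd = build_submit_command(
    2, 60, "hansen", ["inp.dat"], "R-2.15.1",
    ["CMD", "BATCH", "-q", "--args inp.dat", "myscript.R"],
)

# Print a shell-safe rendering of the command for logging or debugging.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be handed to subprocess.Popen(cmd) without going through a shell, which avoids quoting problems with arguments that contain spaces.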
Executing the submit command, getting status updates:
Tips for using submit:
▪ Use submit’s email notification feature to alert user when job finishes:
$> submit mail2self -s 'Hey' -t 'Your job is done.'
▪ Test submit from the hub’s command line (workspace):
$> submit -n 1 -w 5 -v hansen -M R-2.15.1 CMD BATCH -q "--args 1 2" testargs.R
=SUBMIT-METRICS=> job=1214144
(5073894) Job Submitted at hansen-a Mon Sep 2 17:38:54 2013
(5073894) Simulation Queued at hansen-a Mon Sep 2 17:39:04 2013
(5073894) Simulation Complete at hansen-a Mon Sep 2 17:39:20 2013
(5073894) Simulation Done at hansen-a Mon Sep 2 17:39:30 2013
=SUBMIT-METRICS=> job=1214144 venue=1:sshPBS:5073894:diagrida@hansen.rcac.purdue.edu status=0 cpu=3.290000 real=3.000000 wait=14.000000
(end of output)
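A tool that wants to show progress can read this output line by line. The sketch below parses the two kinds of lines seen in the sample run above. The patterns are inferred only from that sample, not from a documented format, and the function names parse_status and parse_metrics are hypothetical.

```python
import re

# Pattern inferred from the sample output above (an assumption, not a spec):
# "(5073894) Simulation Queued at hansen-a Mon Sep 2 17:39:04 2013"
STATUS_RE = re.compile(
    r"^\((\d+)\)\s+(?:Job|Simulation)\s+(\w+)\s+at\s+(\S+)\s+(.*)$"
)
METRICS_PREFIX = "=SUBMIT-METRICS=>"


def parse_status(line):
    """Return a dict for a status line, or None if the line does not match."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    remote_id, state, host, when = match.groups()
    return {"remote_id": remote_id, "state": state, "host": host, "time": when}


def parse_metrics(line):
    """Return the key=value fields of a metrics line, or None otherwise."""
    line = line.strip()
    if not line.startswith(METRICS_PREFIX):
        return None
    fields = {}
    for token in line[len(METRICS_PREFIX):].split():
        key, sep, value = token.partition("=")
        if sep:                      # keep only well-formed key=value tokens
            fields[key] = value
    return fields


status = parse_status("(5073894) Simulation Queued at hansen-a Mon Sep 2 17:39:04 2013")
metrics = parse_metrics(
    "=SUBMIT-METRICS=> job=1214144 "
    "venue=1:sshPBS:5073894:diagrida@hansen.rcac.purdue.edu "
    "status=0 cpu=3.290000 real=3.000000 wait=14.000000"
)
```

Values are kept as strings; a tool would convert fields such as status or cpu to numbers only where it needs them.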
Additional submit feature:
▪ Automatic breakout of parameter combinations (for sweeps)
User wants six runs.
Parameters (p1, p2):
• 1, 7
• 1, 9
• 2, 7
• 2, 9
• 3, 7
• 3, 9

“ submit … -p @@p1=1-3;@@p2=7,9 … ”
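To make the breakout concrete, the sketch below expands the same parameter string into its six combinations. It only mimics the behavior described above and is not submit's own parser. It assumes integer values, inclusive ranges written as "a-b", and comma-separated lists; the function names are hypothetical.

```python
from itertools import product


def expand_values(spec):
    """Expand '1-3' into [1, 2, 3] and '7,9' into [7, 9] (integers only)."""
    values = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            values.extend(range(int(low), int(high) + 1))  # inclusive range
        else:
            values.append(int(part))
    return values


def expand_sweep(param_spec):
    """Return one dict per run for a spec such as '@@p1=1-3;@@p2=7,9'."""
    names, value_lists = [], []
    for item in param_spec.split(";"):
        name, spec = item.split("=", 1)
        names.append(name.lstrip("@"))
        value_lists.append(expand_values(spec))
    # Cartesian product: every value of p1 paired with every value of p2.
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_sweep("@@p1=1-3;@@p2=7,9")
for run in runs:
    print(run)
```

Three values for p1 times two values for p2 gives the six runs listed above.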
Directories:
▪ “Run” directory:
• A tool-specific directory under hub’s
session directory.
• Current working directory for executing
submit.
• Isolates job-related files.
• Ex. “~/data/sessions/6716/submitr”
▪ Parameter sweep output:
• Job directory created under run directory.
• Pegasus puts each run’s (sub-job’s) output in
separate directory under job directory.
• Pegasus bookkeeping files in job directory.
Exiting the tool, canceling the job:
Moving files:
▪ Concept: File “import / export”
• Bringing files into and out of tool.
• Two flavors:
1. Browse - moving files between
directories on hub
(“os.rename(pathname, newpath)”)
2. Upload / download - moving files
between workstation and hub
• Hub commands: importfile and
exportfile.
• Execute importfile from a separate
thread to handle user-canceled
uploads.
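One way to keep a tool responsive while an upload is pending is to run the command in a worker thread and terminate the child process if the user cancels. The sketch below shows that pattern with Python's threading and subprocess modules. The class CancellableCommand is hypothetical, and a short Python sleep stands in for the real command; the arguments a tool would pass to importfile are left to the caller.

```python
import subprocess
import sys
import threading
import time


class CancellableCommand:
    """Run an external command in a worker thread; allow cancellation."""

    def __init__(self, argv):
        self.argv = argv
        self.returncode = None       # stays None if cancelled before launch
        self._proc = None
        self._cancelled = False
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        with self._lock:
            if self._cancelled:      # cancelled before the process started
                return
            self._proc = subprocess.Popen(self.argv)
        self.returncode = self._proc.wait()

    def cancel(self):
        """Terminate the child process if it is still running."""
        with self._lock:
            self._cancelled = True
            if self._proc is not None and self._proc.poll() is None:
                self._proc.terminate()

    def join(self, timeout=None):
        """Wait for the worker thread; return True if it has finished."""
        self._thread.join(timeout)
        return not self._thread.is_alive()


# Stand-in for a long-running upload: a child process that sleeps.
upload = CancellableCommand([sys.executable, "-c", "import time; time.sleep(30)"])
upload.start()
time.sleep(0.5)                      # the user changes their mind
upload.cancel()
finished = upload.join(timeout=10)
```

In a tool, the main (GUI) thread would call start() when the user requests an upload and cancel() from the handler for the Cancel button.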
Information
Resource | Link
Rob Campbell | mailto:rcampbel@purdue.edu
Research Computing at Purdue | http://www.rcac.purdue.edu
DiaGrid Hub | http://diagrid.org
SubmitR | https://diagrid.org/tools/submitr
Tool Developers Guide | http://hubzero.org/documentation/1.1.0/tooldevs
The submit command | http://hubzero.org/documentation/1.1.0/tooldevs/grid.submitcmd
Pegasus | http://pegasus.isi.edu/
HTCondor | http://research.cs.wisc.edu/htcondor/