Quick Reference Guide

.pbs Script File / qsub Reference
Common Torque qsub options for submitting jobs to the stat-cl cluster. These can be given on the command
line, or placed within your .pbs script file by prefixing each option listed here with #PBS.
Torque Command        Description
-V                    Export all environment variables to the job.
-N <jobname>          Sets the name of your job.
-o <outfile>          Override the default output file to be <outfile>.
-e <errorfile>        Override the default error file to be <errorfile>.
-q <queue>            Specify the queue to submit your job to. Should just be
                      -q default for stat-cl.
-l nodes=#:ppn=#      Request the number of CPUs you need for your job. The total
                      number of CPUs is (number of nodes) x (processors per node).
                      The maximum number of processors per node is 8 for stat-cl.
                      The maximum number of nodes is 20 for stat-cl. However,
                      please choose something reasonable (start small)!
-l ncpus=#            Request CPUs in an alternative way, by directly specifying
                      the number of CPUs required. However, nodes=#:ppn=# is
                      preferred, particularly for parallel jobs.
-l walltime=HH:MM:SS  Maximum amount of time to allow your job to run on the
                      cluster.
-l mem=#mb            Maximum amount of physical memory, in megabytes, to allow
                      your job to use.
-t <0-9,->            Request an array job. The argument can be a comma-separated
                      and/or hyphenated list of numbers. E.g., -t 0-9,12,15-17
                      will run an array job with PBS_ARRAYID taking on the values
                      0,1,2,3,4,5,6,7,8,9,12,15,16,17.
-m bea                Send mail to the user when the job begins (b), ends (e) and
                      aborts (a). Use in conjunction with -M.
-M <email_addr>       Specify your email address to receive notifications from
                      stat-cl. Use in conjunction with -m.
-j oe                 Merge the standard error and standard output streams into
                      one.
-k oe                 Force real-time error and output streams. Output files are
                      redirected to $HOME/<jobname>.e<jobnum> and
                      $HOME/<jobname>.o<jobnum>. One can view these in real time
                      using tail -f filename.
-I                    Start an interactive shell session on the cluster. Useful
                      for development or debugging.
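To illustrate how a -t argument expands, the following is a minimal sketch in plain shell that lists the PBS_ARRAYID values a given argument would produce. The function name expand_array_spec is hypothetical and is not part of Torque; it only mimics the expansion described above and assumes a well-formed list of non-negative integers.

```shell
#!/bin/bash
# Hypothetical helper (not a Torque command): prints the PBS_ARRAYID values
# that an array specification such as "0-9,12,15-17" would produce.
expand_array_spec() {
    local spec="$1" ids="" part first last i
    local IFS=','              # split the specification on commas
    for part in $spec; do
        case "$part" in
            *-*)
                # A hyphenated range such as 15-17
                first="${part%-*}"
                last="${part#*-}"
                i="$first"
                while [ "$i" -le "$last" ]; do
                    ids="$ids $i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single value such as 12
                ids="$ids $part"
                ;;
        esac
    done
    echo "${ids# }"
}

expand_array_spec "0-9,12,15-17"
# prints: 0 1 2 3 4 5 6 7 8 9 12 15 16 17
```

Each value printed corresponds to one sub-job of the array, so this example would start 14 sub-jobs.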
Note: The default output file is <jobname>.o<jobnum> and, similarly, the default error file is
<jobname>.e<jobnum>. If you specify different output/error files using -o and -e, the files you specify will be
overwritten on subsequent runs of your job unless the specified name includes the variable $PBS_JOBID in order to
uniquely identify the different runs.
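As an illustration of how these options fit together, here is a minimal sketch of a .pbs script file. The job name, resource values, and email address are placeholders chosen for this example, not stat-cl recommendations. The ${VAR:-...} fallbacks are an assumption added so that the script can also be run directly in a shell as a quick check before submitting.

```shell
#!/bin/bash
# Job name, queue, and resource request (placeholder values)
#PBS -N example_job
#PBS -q default
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:10:00
#PBS -l mem=500mb
# Merge standard error into standard output
#PBS -j oe
# Mail on begin, end, and abort (placeholder address)
#PBS -m bea
#PBS -M user@example.com
# Export the submitting shell's environment to the job
#PBS -V

# Torque sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to the current directory when run outside the cluster.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# PBS_JOBID is set by Torque for each run of the job.
jobid="${PBS_JOBID:-local}"
echo "Job $jobid started in $workdir on $(hostname)"

# The real work of the job would go here.
```

Lines starting with #PBS are ordinary comments to the shell, which is why the same file works both as a Torque job script and as a plain shell script.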
Submit your job to stat-cl using the qsub command. View the status of your job(s) and the cluster using the qstat
command, and delete your job (if needed) from the cluster using the qdel command (see Job Control Commands below).
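The submit-and-check cycle described above can be sketched as a small shell function. The function name submit_and_check and the script file name passed to it are hypothetical; only qsub, qstat -u, and qdel are taken from this guide.

```shell
#!/bin/bash
# Hypothetical helper: submit a .pbs script, then list the user's jobs.
submit_and_check() {
    local script="$1" jobid
    # qsub prints the identifier of the new job on standard output.
    jobid=$(qsub "$script") || return 1
    echo "Submitted $jobid"
    # Show status information for the current user's jobs only.
    qstat -u "${USER:-$(id -un)}"
    # To remove the job again, one would run: qdel "$jobid"
}
```

On the cluster this would be called as submit_and_check example.pbs, where example.pbs stands for your own script file.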
Job Control Commands
Users will typically only need to use qstat and qdel to manage their jobs running on the cluster.
Command / Option    Options/Description
qstat               Displays general status information about the cluster, such as
                    the names and numbers of all jobs running on stat-cl, the users
                    running the jobs, status of the job (R=running, Q=queued,
                    C=completed, E=exiting, H=held, T=transitioning to new location,
                    W=waiting for start time, S=suspended) and how long jobs have
                    been running for.
  -u <username>     Displays general status information only for user <username>.
                    Useful way to view the status of your jobs.
  -f                Gives detailed output about the jobs running on the cluster, but
                    it is extensive. Not really needed for day-to-day use.
  -a                Displays output in an alternative format. Gives more information
                    on the running jobs.
qdel <jobnum>       Removes the job <jobnum> from the cluster by killing the running
                    process and removing the job from the queue. If you want to
                    remove many jobs at once, this can be done by passing the output
                    of qselect to qdel.
qselect             Selects jobs from the queue by matching on username, job name,
                    status, etc. These are determined by the options stated below.
                    Can be combined with qdel to remove multiple jobs from the
                    cluster in one command.
  -u <username>     Select jobs according to your username.
  -s <jobstate>     Select jobs according to their state in the queue. Relevant
                    options for <jobstate> include ‘r’ for running jobs and ‘q’ for
                    queued jobs.
showq               Gives more detailed information on the status of the cluster. In
                    particular, displays how many nodes and CPUs are being used, and
                    how many are free. Also lists all active jobs, idle jobs and
                    blocked jobs.
checkjob <jobnum>   Displays detailed information, but only for the job <jobnum>.
qsig                For advanced users only. For example, in some rare cases, if
                    your program becomes unresponsive or crashes due to a bug, qdel
                    might not be able to kill the process. The command qsig allows
                    you to directly send a signal to your job, which is useful in
                    this case. Avoid using this command unless absolutely required.
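Passing the output of qselect to qdel, as mentioned above, might look like the following sketch. The function name delete_my_jobs is hypothetical. The state letter is written in uppercase in the usage note, matching the letters qstat displays; this is an assumption about what qselect accepts on stat-cl.

```shell
#!/bin/bash
# Hypothetical helper: delete all of the current user's jobs in one state.
delete_my_jobs() {
    local state="$1" jobs
    # qselect prints one matching job identifier per line.
    jobs=$(qselect -u "${USER:-$(id -un)}" -s "$state")
    if [ -z "$jobs" ]; then
        echo "No jobs in state $state"
        return 0
    fi
    # Unquoted on purpose: each job identifier becomes a separate argument.
    qdel $jobs
}
```

For example, delete_my_jobs Q would remove all of your queued jobs and leave the running ones alone. Since this deletes jobs without asking, it may be worth running the qselect command on its own first to see what it matches.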