Network for Computational Nanotechnology (NCN)
Purdue, Norfolk State, Northwestern, UC Berkeley, Univ. of Illinois, UTEP
Basic Portable Batch System (PBS)
Xufeng Wang, Kaspar Haume, Gerhard Klimeck
Network for Computational Nanotechnology (NCN)
Electrical and Computer Engineering
wang159@purdue.edu
khaume@purdue.edu
Last reviewed May 2013
The Portable Batch System (PBS)
• One major computational resource available to NCN is the cluster
system at Purdue.
• The cluster system, with its abundant computational power, serves
many users and carries a massive number of tasks, so a workload
management system runs on top of it.
• To use these resources, we as users must go through this
management system, called the Portable Batch System (PBS), to
properly schedule cluster usage.
• PBS is one of the most widely used batch systems on Linux clusters.
Xufeng Wang, Kaspar Haume, Gerhard Klimeck
Demo #1: a PBS run example
• We will use this example as a guide to learn
about PBS piece by piece.
• In the script file, all lines beginning with #PBS
are understood by the PBS system as PBS
directives, not as comments.
• They have to be at the top, however, before the
first command of the script. Otherwise PBS
ignores them and the shell treats them as
ordinary comments.
• The script does not need a specific file
extension, but to make it easier to sort out
your files, I recommend the extension .pbs,
as in myjobfile.pbs
Composition of the PBS script
Shell header
PBS in-script options
PBS environment variables
Commands
In addition, PBS has its own Unix executable functions to submit the
script and to monitor and modify the jobs. We will see them later.
Together, the in-script options, environment variables and
executable functions form the basis of PBS commands.
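The four components can be seen together in a minimal sketch of a script, assuming bash; the job name skeleton_demo is hypothetical. The ${VAR:-default} fallbacks are an addition of this sketch (not required by PBS) so the file can also be test-run outside PBS, and the last #PBS line illustrates that directives placed after the first command are not read by PBS.

```bash
#!/bin/bash
# (1) Shell header: the line above runs the script with bash.

# (2) PBS in-script options: only read if they come before the first command.
#PBS -N skeleton_demo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

# (3) PBS environment variables, set by PBS when the job runs.
#     The :- defaults exist only so the script also runs outside PBS.
workdir="${PBS_O_WORKDIR:-$PWD}"
jobid="${PBS_JOBID:-not-under-PBS}"

# (4) Commands
cd "$workdir" || exit 1
echo "Job $jobid running in $workdir"

#PBS -l nodes=2   # too late: after the first command PBS ignores this; bash sees a comment
```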
Shell Header
By default PBS will run the script in your login shell, so if you are
happy with that, you don’t need a shell header.
If you want to run the script in a different shell than your login shell,
specify it with a shell header (a first line such as #!/bin/bash), or with
the option -S:
#PBS -S /bin/bash
Elements of PBS: PBS in-script options
PBS in-script options can be
divided into two categories:
» Resource allocation options
» Run options
• Resource allocation options
can be further divided into:
» Chunk allocation options
» Job allocation options
PBS in-script options
» Resource allocation (#PBS -l)
   Chunk-wise: #PBS -l nodes …
   Job-wise: #PBS -l walltime …
» Run options: #PBS -q/N/…
Resource allocation options
Before we hand the program to PBS, we need to explicitly request
the resource.
• Resources in PBS have two levels, chunk-wise and job-wise:
» A chunk is basically a node on Purdue’s system
» A job is basically the sum of all its chunks (nodes)
• Per node (chunk-wise), we usually specify:
» How many nodes we need
» Cores on each node
» Memory for each core
• Per job (job-wise), we usually specify:
» Maximum runtime
» Total memory for the job
Resource Allocation: Asking for multiple cores and nodes
• procs: N total cores on any number of nodes, distributed
automatically by the queuing system (the job might start
faster)
Example
#PBS -l procs=N
• nodes: N total cores distributed on exactly N nodes, one
process per node
Example
#PBS -l nodes=N
• -n and nodes: N total cores distributed on exactly N nodes, one
process per node, with exclusive access to each entire node
Example
#PBS -n -l nodes=N
Resource Allocation: Asking for multiple cores and nodes
• nodes and ppn: N total cores on the same node
Example
#PBS -l nodes=1:ppn=N
• To get exclusive access to an entire node, set N to the maximum
number of cores per node on the given cluster
Example (Rossmann)
#PBS -l nodes=1:ppn=24
• nodes and ppn: (N*P) total cores on exactly N nodes with P
processes per node
Example
#PBS -l nodes=N:ppn=P
Resource Allocation: Asking for memory
• Size of memory per job: mem
Example
#PBS -l nodes=1:ppn=4
#PBS -l mem=10gb
• This asks for 4 cores on 1 node with a total memory of 10 GB.
• Size of memory per core/process: pmem
Example
#PBS -l nodes=1:ppn=4
#PBS -l pmem=2gb
• This asks for 4 cores on 1 node with 2 GB per process, so the
total memory requested is 4 × 2 GB = 8 GB.
• Acceptable memory size units are: b, kb, mb, gb, tb
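The pmem arithmetic above can be written as a small bash helper. This is a hypothetical function for illustration, assuming whole-number sizes in gb; it only reproduces the calculation and does not query PBS.

```bash
# Total memory implied by a pmem request: every process on every node gets pmem.
# Hypothetical helper; sizes are whole numbers in gb to keep the arithmetic simple.
total_mem_gb() {
  local nodes=$1 ppn=$2 pmem_gb=$3
  echo $(( nodes * ppn * pmem_gb ))
}

total_mem_gb 1 4 2   # the example above (nodes=1:ppn=4, pmem=2gb); prints 8
```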
Resource Allocation: Asking for multiple cores and nodes
• Remember that procs and ppn cannot be greater than the
number of cores per node for the cluster.
• For example, for Coates this is 8, for Rossmann this is 24.
• The same goes for mem and pmem: their sizes are limited by the
cluster you are working on.
• Refer to the user guides for each cluster for more information:
http://www.rcac.purdue.edu/userinfo/resources/
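A hypothetical bash helper can catch a ppn request that does not fit on one node before the job is submitted. The cores-per-node values are the ones quoted in this guide (Coates 8, Rossmann 24) and may have changed since; both function names are made up for illustration.

```bash
# Hypothetical helper: cores per node as quoted in this guide (may be out of date).
cores_per_node() {
  case "$1" in
    coates)   echo 8 ;;
    rossmann) echo 24 ;;
    *)        return 1 ;;   # unknown cluster: no output, non-zero status
  esac
}

# Check that a requested ppn fits on one node of the given cluster.
check_ppn() {
  local cluster=$1 ppn=$2 max
  if ! max=$(cores_per_node "$cluster"); then
    echo "unknown cluster: $cluster" >&2
    return 2
  fi
  if (( ppn > max )); then
    echo "ppn=$ppn is more than the $max cores per node on $cluster" >&2
    return 1
  fi
  echo "nodes=1:ppn=$ppn fits on $cluster"
}

check_ppn rossmann 24   # prints: nodes=1:ppn=24 fits on rossmann
```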
Run options: Requesting walltime
• Maximum program run time
walltime
Example
#PBS -l walltime=8:00:00
• Notice that this maximum time is a job-wise limit (that is, the same on
all nodes).
• If the program runs longer than this, PBS kills it and the job ends
with an error.
• Walltime is specified in [hours]:[minutes]:[seconds] form.
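To make the [hours]:[minutes]:[seconds] format concrete, here is a small bash sketch (a hypothetical helper, not part of PBS) that converts a walltime string into seconds.

```bash
# Convert a walltime in [hours]:[minutes]:[seconds] form to seconds.
# 10# forces base 10, so fields such as "08" are not read as octal.
walltime_to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_to_seconds 8:00:00   # prints 28800
```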
Run options: Queue
• PBS queue
-q
Example
#PBS -q ncn
• A queue determines where a user is allowed to submit jobs. As an
NCN member, you are eligible to submit jobs to the NCN-owned
queue and to a common queue called “standby”.
• If no queue is specified, the job is sent to the standby queue.
About the “standby” queue
• The “standby” queue is composed of all unused resources on the
cluster, and every user can access it.
• Its maximum walltime is 4 hours.
Run options: Output and Error files
• PBS output/error file locations and file names
-o / -e
Example
#PBS -o /wang159/mydir -e /wang159/mydir
#PBS -o output.txt -e error.txt
• By default PBS writes the output and error messages into two
separate files in the directory the job was submitted from, named
after the job name and job ID (e.g. myjob.o12345 and myjob.e12345).
• With these options you can choose where to place those files and
what to name them.
• To put output and error messages together in one file, use -j:
oe merges them into the output file, eo into the error file.
-j
Example
#PBS -j oe
Run options: Job Name
• PBS job name
-N
Example
#PBS -N nanowire_a1
• This option lets you choose a specific name for your job.
• It makes it easier to tell your jobs apart when several are running.
Run options: Email Notifications
• Have PBS email you with information about your job
-M and -m abe
Example
#PBS -M username@gmail.com
#PBS -m abe
• -M sets the address the notifications are sent to.
• -m selects when to send them:
» a : job aborted
» b : job begun
» e : job terminated
Demo #1
• Example script with the options covered so far, grouped into
resource allocation (chunk-wise and job-wise) and run options
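The Demo #1 script itself is not reproduced here. A sketch of a script using the options covered so far could look as follows; the program name nanowire_sim and the e-mail address are placeholders, and the 4-hour walltime is chosen to fit the standby limit.

```bash
#!/bin/bash
# --- Resource allocation, chunk-wise: 1 node, 4 cores, 2 GB per process ---
#PBS -l nodes=1:ppn=4
#PBS -l pmem=2gb
# --- Resource allocation, job-wise: at most 4 hours (the standby limit) ---
#PBS -l walltime=04:00:00
# --- Run options ---
#PBS -q standby
#PBS -N nanowire_a1
#PBS -o output.txt
#PBS -j oe
#PBS -M username@gmail.com
#PBS -m abe

# --- Commands ---
cd "$PBS_O_WORKDIR"   # start in the directory qsub was run from
./nanowire_sim        # hypothetical program name
```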
Elements of PBS script: PBS environment variables
• The PBS environment variables serve the same purpose as the
ones commonly used in Unix: they contain information about the
run-time environment of PBS, such as the working directory, user ID,
job ID and others.
• There are two kinds of PBS environment variables:
» Ones inherited from the shell you submitted your PBS script from. They have the
form PBS_O_*.
» Ones that are not inherited. Their names do not contain “O_”.
Common environment variables
• If running under PBS
PBS_ENVIRONMENT
Set only when running under PBS: PBS_BATCH for batch jobs, PBS_INTERACTIVE
for interactive jobs. Useful for telling whether the script is currently running
under PBS.
• PBS job ID / job name
PBS_JOBID /
PBS_JOBNAME
PBS_JOBID is unique to each job, which is useful for telling apart
jobs that are running at the same time.
• Executable PATH (inherited)
PBS_O_PATH
If a program is not within the paths defined here, it will not be found
when called by name.
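A minimal bash sketch of using these variables: the hypothetical function pbs_status below reports whether it runs inside a PBS job, relying only on PBS_ENVIRONMENT, PBS_JOBID and PBS_JOBNAME.

```bash
# Describe the PBS context of the current shell.
# Outside PBS, PBS_ENVIRONMENT is unset, so the else branch runs.
pbs_status() {
  if [ -n "${PBS_ENVIRONMENT:-}" ]; then
    echo "PBS job ${PBS_JOBID:-?} (${PBS_JOBNAME:-?}), mode $PBS_ENVIRONMENT"
  else
    echo "not under PBS"
  fi
}

pbs_status
```

Inside a batch job this would print the job ID, the job name and PBS_BATCH; on a login node it prints "not under PBS".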
Common environment variables
• Current working directory (inherited)
PBS_O_WORKDIR
• When the job starts, you end up in your home folder on the cluster.
• If you want to run files in the directory where the qsub command
was executed, you have to cd to that folder.
• $PBS_O_WORKDIR contains the absolute path of the directory where
qsub was executed, and is thus an easy way to get back to that folder.
Demo #2
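The Demo #2 script is not reproduced here either. A sketch of a script that uses PBS_O_WORKDIR and PBS_JOBID might look like this; the job name env_demo and the output file name info_<jobID>.txt are hypothetical.

```bash
#!/bin/bash
#PBS -N env_demo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00

# Jobs start in the home directory: go back to where qsub was run
cd "$PBS_O_WORKDIR" || exit 1

# Record which job produced this file; PBS_JOBID keeps file names unique across jobs
echo "Job $PBS_JOBID ($PBS_JOBNAME) ran in $PWD" > "info_${PBS_JOBID}.txt"
```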
Executable Functions
Remember that the in-script options, environment variables and
executable functions together form the basis of PBS commands.
Let’s take a look at the executable functions used to submit
the script, and monitor and modify the jobs.
Submitting a PBS script
Submission of a job: qsub
• qsub [script] submits the job request to PBS. A
successful submission prints the job ID.
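When scripting several submissions, the job ID printed by qsub can be captured in a shell variable. A minimal sketch, assuming the script is called myjobfile.pbs:

```bash
# Submit the script and keep the job ID that qsub prints on success
if jobid=$(qsub myjobfile.pbs); then
  echo "Submitted job $jobid"
else
  echo "qsub failed" >&2
fi
```

The saved ID can then be passed to qstat -f or qdel.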
Checking job status
qstat -a
Column          Meaning
NDS             Number of nodes used
TSK             Total number of processes
Req’d Memory    Total RAM requested
Req’d Time      Walltime requested
Elap Time       How long the job has run
S               Job state

S    Meaning
Q    queued
R    running
H    hold
E    exiting
Checking job status
qstat -u
• qstat and qstat -a give you a quick view of all current jobs
on the server, in all queues.
• To see only your own jobs, use qstat -u yourusername
Checking job status
qstat -f
• qstat -f [jobID] gives
you full information about
the job, such as the nodes
used, output paths and
memory used.
• Alternatively, the command
checkjob -v [jobID] gives
almost the same information.
Checking queue status
qstat -Q
• qstat -Q gives an overview of the queues
• Refer to man qstat, section “Displaying Queue Status”
NCN queue
qstat -Qf ncn
• Gives full information about the ncn queue.
Delete your jobs
• Delete specific job(s): qdel [jobID]
Example
• Deleting a job of your own succeeds.
• Attempting to delete another user’s job is refused.
• You may only delete your own jobs, not those of others.
Selecting specific jobs
• Selecting specific job(s): qselect
Selecting your jobs: qselect -u yourusername
• The job IDs returned by qselect can be passed to qdel to delete all of them
• Or you can write a list of all your jobs to a file
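A sketch of the qselect uses described above, run from a login node; my_jobs.txt is a hypothetical file name, and xargs -r is a GNU extension that skips running qdel when no jobs are selected.

```bash
# List the IDs of all your jobs (qselect prints one job ID per line)
qselect -u "$USER"

# Keep that list in a file
qselect -u "$USER" > my_jobs.txt

# Delete all of them; -r (GNU xargs) skips qdel when the list is empty
qselect -u "$USER" | xargs -r qdel
```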
Interactive jobs
• If you want direct control of your job, there is one more way to run
jobs on the cluster, called “interactive jobs”.
• If you supply the option -I (capital “i”), you will be interactively
connected to your requested nodes, which means that you can
navigate your folders and files as normal.
• The job waits in the queue until resources are available. Then just run
the commands as you otherwise would. When done, type exit;
otherwise you will be disconnected when your walltime is up.
• If the job is interactive, all commands past the last #PBS line are
ignored.
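A sketch of an interactive session, assuming the standby queue and a hypothetical program my_program; the -l values are examples only.

```bash
# Request an interactive session: 1 node, 4 cores, 1 hour, on standby
qsub -I -q standby -l nodes=1:ppn=4,walltime=01:00:00

# ...once the prompt appears on the compute node:
cd "$PBS_O_WORKDIR"
./my_program   # hypothetical program
exit           # end the job and release the node before the walltime runs out
```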
PBS scripts together with command line arguments
• This guide was about writing PBS scripts that take care of
everything, to make submitting jobs fast and easy. But that’s
not the only way…
• You can also combine PBS script options with command-line
arguments. All options we have seen in this guide can also be
given as command-line arguments to qsub.
• To see all possible options, refer to the qsub manual:
bash$ man qsub
• In case of a conflict between command line arguments and
arguments in the script, the command line takes precedence.
Example: Script and Command Line comparison
• As mentioned, all #PBS lines in scripts can be written out on the
command line as arguments to qsub.
• A purely command-line call to qsub, without a script, is mainly useful
for interactive jobs, since everything after the last #PBS line is ignored
in that case anyway.
• The PBS script and the command line therefore carry the same
options; only the place where they are written differs.
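A sketch of the comparison, using options from earlier slides with a hypothetical script myjobfile.pbs: the #PBS lines in (a) and the qsub arguments in (b) request the same job, and (c) illustrates that a command-line value takes precedence over the script.

```bash
# (a) Options written inside the script myjobfile.pbs:
#PBS -q standby
#PBS -N nanowire_a1
#PBS -l nodes=1:ppn=4
#PBS -l walltime=04:00:00

# (b) The same options given on the command line instead:
qsub -q standby -N nanowire_a1 -l nodes=1:ppn=4 -l walltime=04:00:00 myjobfile.pbs

# (c) Mixing both: this walltime overrides the 04:00:00 in the script
qsub -l walltime=02:00:00 myjobfile.pbs
```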
Topics in next presentation “Advanced PBS”
• Modifying queued job attributes
• Job arrays
• Holding jobs and reshuffling job order
• Sending messages/signals to jobs
• Moving jobs between queues
• Passing variables to jobs
• Job dependencies