LSF for Users
ZORRO HPC
What is LSF?
LSF (Load Sharing Facility) is a batch management subsystem for multi-host, multi-vendor complexes, with the capability to manage computing resources across multiple platforms.
LSF runs on the AU-HPC cluster.
Documentation: /app/docs/LSF/7.0/*.pdf
Hardware description: http://www.american.edu/hpc
At a command line enter: man lsfintro
To be able to access LSF
This has been added to your login processing:
. /opt/lsf/conf/profile.lsf (sh users)
or
source /opt/lsf/conf/cshrc.lsf (csh users)
These commands are executed before you receive a command prompt.
There is no need for you to add anything to your login files in order to use LSF.
These commands define the LSF environment variables:
LSF_SERVERDIR, LSF_BINDIR, LSF_LIBDIR, XLSF_UIDDIR, LSF_ENVDIR, PATH, MANPATH
To check them: env | grep -i lsf
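A minimal sketch for checking that the login scripts defined the LSF variables listed above. The function name check_lsf_env is hypothetical; it only verifies that each variable is set and non-empty, not that its value is correct, and it skips PATH and MANPATH because those are set even without LSF.

```shell
#!/bin/bash
# Hypothetical helper: report LSF environment variables that are unset or empty.
# Returns 0 when all are present, 1 otherwise. PATH and MANPATH are not checked
# because they are normally set even without LSF.
check_lsf_env() {
    local missing=0 var
    for var in LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR XLSF_UIDDIR LSF_ENVDIR; do
        # ${!var} is bash indirect expansion: the value of the variable named in $var
        if [[ -z "${!var}" ]]; then
            echo "missing: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage after logging in:
#   check_lsf_env && echo "LSF environment looks OK"
```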
Essential Commands for Users
• bhosts
• bqueues
• bsub
• bjobs
• bhist
• bpeek
• bmod
• bbot/btop
• bswitch
• bstop/bresume
• bkill
Essential Commands: Purpose
• bhosts - information about available hosts (see also lshosts)
• bqueues - information about available queues
• bsub - submits jobs to the batch subsystem
• bjobs - lists jobs in the batch subsystem
• bhist - displays historical information about the user's jobs
• bpeek - displays stdout and stderr of the user's unfinished job
• bmod - modifies job submission options for the user's job
• bbot/btop - moves a pending job relative to the user's last/first job in a queue
• bswitch - switches the user's unfinished jobs from one queue to another
• bstop/bresume - suspends/resumes the user's unfinished jobs
• bkill - kills, suspends, or resumes the user's jobs
Essential Commands: bhosts
bhosts [-w | -l] [-R "res_req"] [host_name | host_group]
Displays information about hosts/platforms
lshosts [-w | -l] [-R "res_req"] [host_name | cluster_name]
lshosts -s [shared_resource_name ...]
Displays hosts and their static resource information
[root@hpchead ~]$ lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hpchead    X86_64  Intel_EM  60.0  12     24097M  2015M   Yes     (mpich2 mg)
node15     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node14     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node13     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node12     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node11     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node10     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node09     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node08     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node07     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node06     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node05     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node04     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node03     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node02     X86_64  Intel_EM  60.0  12     24097M  2000M   Yes     (cuda mpich2)
node01     X86_64  Intel_EM  60.0  12     24094M  2000M   Yes     (cuda mpich2)
Essential Commands: bqueues
bqueues [-w|-l|-r][-m host_name|-m all]
[-u user_name|-u all][queue_name …]
Displays information about queues.
By default, returns the following information about all queues: queue
name, queue priority, queue status, job slot statistics, and job state
statistics.
[root@hpchead]$ bqueues
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
dynamic_provisi  60  Open:Active       -    -    -    -     0     0     0     0
owners           43  Open:Active       -    -    -    -     0     0     0     0
priority         43  Open:Active       -    -    -    -     0     0     0     0
night            40  Open:Inact        -    -    -    -     0     0     0     0
chkpnt_rerun_qu  40  Open:Active       -    -    -    -     0     0     0     0
short            35  Open:Active       -    -    -    -     0     0     0     0
license          33  Open:Active       -    -    -    -     0     0     0     0
normal           30  Open:Active       -    -    -    -     0     0     0     0
hpc_linux        30  Open:Active       -    -    -    -     0     0     0     0
hpc_linux_tv     30  Open:Active       -    -    -    -     0     0     0     0
idle             20  Open:Active       -    -    -    -     0     0     0     0
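A minimal sketch for picking out the queues that currently accept and dispatch jobs from the default bqueues listing. The function name active_queues is hypothetical, and the column positions assume the default layout shown above; running bqueues -w avoids truncated names such as dynamic_provisi.

```shell
#!/bin/bash
# Hypothetical helper: read default bqueues output on stdin and print the name
# and priority of every queue whose STATUS is Open:Active.
active_queues() {
    # NR > 1 skips the header line; $1 = QUEUE_NAME, $2 = PRIO, $3 = STATUS
    awk 'NR > 1 && $3 == "Open:Active" { print $1, $2 }'
}

# Usage on the cluster:
#   bqueues -w | active_queues
```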
Essential Commands: bsub
bsub [options] command [cmd_args]
Submits a job for batch execution
OPTION LIST
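-B
Sends mail when the job is dispatched and when it begins execution
-H
Holds the job in the PSUSP state at submission until it is resumed with bresume
-I | -Ip | -Is
Submits a batch interactive job (-Ip and -Is also create a pseudo-terminal; -Is runs it in shell mode)
-K
Submits the job and waits for it to complete, printing status messages to the terminal
-N
Sends the job report by e-mail when the job finishes (use only with -I | -Is | -Ip or -o)
-r
Reruns the job on another host if the execution host becomes unavailable
-x
Exclusive execution mode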
-a esub_parameters
Specifies the parallel job launcher (PJL) to be used
-b [[month:]day:]hour:minute
Dispatches the job on or after the specified date/time
-C core_limit
Sets the per-process core file size limit (-C 0 prevents core dumps)
-c [hours:]minutes[/host_name | /host_model]
Sets the CPU time limit
-D data_limit
Sets the per-process data segment size limit
-e err_file
File to use as stderr
-E "pre_exec_command [arguments ...]"
Runs a pre-execution command on the execution host before the job itself starts
-ext[sched] "external_scheduler_options"
N/A
-f "local_file operator [remote_file]" ...
Copies files between the local (submission) and remote (execution) hosts
-F file_limit
Sets the per-process file size limit
-g job_group_name
Submits job to a job group
-G user_group
Associates job with a specific group
-i input_file | -is input_file
Specifies stdin for job
-J job_name | -J "job_name[index_list]%job_slot_limit"
Specifies the job name (the second form submits a job array)
-k "checkpoint_dir [checkpoint_period][method=method_name]"
Makes a job checkpointable and specifies checkpoint directory
-L login_shell
Uses login_shell for runtime environment
-m "host_name[@cluster_name][+[pref_level]] | host_group[+[pref_level]]"
Selects and ranks hosts/groups on which to run
-M mem_limit
Sets per process memory limit
-n min_proc[,max_proc]
Sets min/max number of processors required to run job
-o out_file
Specifies stdout
-P project_name
Specifies project name
-p process_limit
Limits total number of processes
-q queue_name
Specifies queue for job (default provided by system)
-R "res_req"
Specifies resource requirements
-sla service_class_name
Specifies service class for job
-sp priority
Specifies priority amongst user’s jobs
-S stack_limit
Sets per-process stack limit
-t [[month:]day:]hour:minute
Specifies job termination date
-T thread_limit
Sets the limit on the number of concurrent threads for the whole job
-U reservation_ID
Uses an advance reservation created with the brsvadd command
-u mail_user
Mail-to address
-v swap_limit
Sets total process virtual memory limit
-w 'dependency_expression'
Defines dependencies to be met before job initiation
-wa '[signal | command | CHKPNT]'
Specifies action to be taken before job control step occurs
-wt '[hours:]minutes'
Specifies time interval before job control occurs to send warning signal
-W [hours:]minutes[/host_name | /host_model]
Specifies run time limit for job
-Zs
Spools the command file and runs the job from the spooled copy
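The -w option takes a dependency expression such as done(job_ID), so chaining jobs needs the ID that bsub reports at submission. A minimal sketch, assuming bsub's usual confirmation line of the form "Job <1234> is submitted to queue <normal>."; the function name parse_job_id and the scripts step1.lsf and step2.lsf are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from the confirmation line
# that bsub prints, e.g. "Job <1234> is submitted to queue <normal>."
# Prints nothing if the line does not have that form.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p' <<< "$1"
}

# Sketch of a two-step pipeline (commented out so it is safe to run anywhere):
#   first=$(parse_job_id "$(bsub -q normal < step1.lsf)")
#   bsub -q normal -w "done($first)" < step2.lsf
```

The second job stays pending until the first finishes successfully; if it should run regardless of the first job's exit status, ended(job_ID) is the corresponding condition.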
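The Importance of Being <
LSF usage differs from other job schedulers. All of the following submit jobs:
bsub a.out
bsub -n 2 a.out
bsub myscript
bsub -q queuename a.out
bsub -i infile -o outfile -e errfile a.out
bsub < myscript
Only the last form feeds the script to bsub on standard input, which is what makes bsub read the #BSUB directives inside it; with bsub myscript, the script simply runs as the job command and its #BSUB lines are treated as ordinary comments.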
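Because the #BSUB lines only take effect when the script is submitted with <, it can help to see exactly which directives a script contains before submitting it. A minimal sketch; both function names are hypothetical. The second function flags directives typed with an en dash (for example "–q" pasted from a word processor), which is not a valid option prefix.

```shell
#!/bin/bash
# Hypothetical helper: print the #BSUB directive lines of a job script, i.e. the
# options bsub reads when the script is submitted as "bsub < script".
bsub_directives() {
    # keep only lines beginning with #BSUB and strip that prefix
    sed -n 's/^#BSUB[[:space:]]*//p' "$1"
}

# Hypothetical check: list #BSUB lines containing an en dash
# (UTF-8 bytes E2 80 93) together with their line numbers.
bad_dashes() {
    LC_ALL=C grep -n "^#BSUB.*"$'\xe2\x80\x93' "$1"
}

# Usage: bsub_directives serial.lsf; bad_dashes mpi.lsf
```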
LSF Job Submission
bsub < jobfile
* By default, the job output is sent by mail.
Each LSF job runs in a queue. If you don't give LSF a queue name, your job will go to the default queue, normal.
Each LSF job will be dispatched to a compute node. If you don't specify the node, LSF will
choose one for you. To find the name of the server and the current status of the job, use
the bjobs command:
[root@hpchead ~]$ bjobs 103
JOBID  USER  STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
103    User  DONE  normal  hpchead    hpchead    hostname  Jun 7 11:38
This job executed on hpchead, the same host from which it was submitted. Unless told otherwise,
LSF will choose an execution host with the same architecture as the submission host. If more than
one server meets that criterion, LSF will choose the most powerful host with the lightest load.
LSF Job Submission
LSF output/error logs
By default, LSF will send you email containing the standard output (stdout) and standard error
(stderr) from your job, as well as some basic information about the execution of the job. If your
program produces additional output files, they are separate and are not included in this email.
To save your job's output in a file instead of receiving it in email, use the -o option on
the bsub command:
bsub -o my_output <myjob
You can also put stdout and stderr in different files if you wish:
bsub -o my_out -e my_err <myjob
To make it easier to keep track of the output from multiple runs of the same program, you can use
the special %J variable in your file names. LSF will substitute the job number for the %J variable:
bsub -o out.%J -e err.%J <myjob
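Job arrays extend the same idea to many runs of one program: with the array form of -J, LSF runs one element per index, sets LSB_JOBINDEX in each element's environment, and replaces %I in output file names with the element's index. A minimal sketch of such a script; the file name array.lsf, the array name myarray, and the input.N naming scheme are hypothetical.

```shell
#!/bin/bash
# array.lsf (hypothetical name): submit with  bsub < array.lsf
# -J "name[1-3]" creates a job array with elements 1, 2 and 3.
#BSUB -J "myarray[1-3]"
# %J is the job ID and %I the array index, so each element gets its own files
#BSUB -o out.%J.%I
#BSUB -e err.%J.%I
#BSUB -q normal

# LSF sets LSB_JOBINDEX in each element's environment; fall back to 1 so the
# script can also be tried outside LSF. input.N is a hypothetical naming scheme.
input_file() {
    echo "input.${LSB_JOBINDEX:-1}"
}

echo "processing $(input_file)"
# ./a.out < "$(input_file)"    # replace with your own program
```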
LSF Job Submission
Submit job at specific time:
To force your job to begin at a specific time, use the -b option on the bsub command:
bsub -b 11:00 job01
* Tells LSF to start your job at 11:00 a.m. If the current time is after 11:00 a.m., the job will be held until the next day.
bsub -b 2:15:23:15 job01
* Tells LSF to start the job at 11:15 p.m. on February 15.
Submit job to specific host:
If you want your job to run on a specific host, use the -m option:
bsub -m node01 < myjob
Sample LSF script
Serial Job
bsub < serial.lsf
#!/bin/bash
# enable your environment, which will use .bashrc configuration in your home directory
#BSUB -L /bin/bash
# the name of your job as shown in the queue system
#BSUB -J FortranJob
# the following BSUB line specifies the queue that you will use
#BSUB -q normal
# standard output and standard error files; %J is replaced by your job ID
#BSUB -o %J.out
#BSUB -e %J.err
# number of CPUs (job slots) to request (lshosts reports 12 CPUs per node)
#BSUB -n 1
#Fortran example
pgf90 -o samp_f -Mextend samp.f
./samp_f
# C example
pgcc -o samp_c samp.c
./samp_c
# C++ example
pgCC --no_auto_instantiation -o samp_cc samp.cc
./samp_cc
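When many similar serial jobs are needed, the script can be generated from a template and piped to bsub's standard input, which has the same effect as bsub < serial.lsf. A minimal sketch; the function name make_serial_job and its three arguments are hypothetical, and the directives mirror the sample above.

```shell
#!/bin/bash
# Hypothetical generator: print a serial LSF job script for a given job name,
# queue and command. The heredoc delimiter is unquoted so $name, $queue and $cmd
# are expanded; %J is not a shell expansion and is left for LSF to substitute.
make_serial_job() {
    local name=$1 queue=$2 cmd=$3
    cat <<EOF
#!/bin/bash
#BSUB -L /bin/bash
#BSUB -J $name
#BSUB -q $queue
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -n 1
$cmd
EOF
}

# Usage on the cluster (the generated script goes to bsub on standard input):
#   make_serial_job run42 normal "./samp_f input42" | bsub
```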
Sample LSF script
MPI Job
bsub < mpi.lsf
#!/bin/bash
# enable your environment, which will use .bashrc configuration in your home directory
#BSUB -L /bin/bash
# LSF batch script to run the test MPI code
#
#BSUB -P 93300070
# Project 93300070
#BSUB -a mpich_gm
# select the mpich-gm esub (parallel job launcher)
#BSUB -x
# exclusive use of node (not shared)
#BSUB -n 2
# number of total tasks
#BSUB -R "span[ptile=1]"
# run 1 task per node
#BSUB -J mpilsf.test
# job name
#BSUB -o mpilsf.out
# output filename
#BSUB -e mpilsf.err
# error filename
#BSUB -q normal
# queue
# Fortran example
mpif90 -o mpi_samp_f mpisamp.f
mpirun.lsf ./mpi_samp_f
# C example
mpicc -o mpi_samp_c mpisamp.c
mpirun.lsf ./mpi_samp_c
# C++ example
mpicxx -o mpi_samp_cc mpisamp.cc
mpirun.lsf ./mpi_samp_cc
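Inside a running job, LSF lists the allocated hosts in the LSB_HOSTS environment variable, one entry per job slot, which makes it easy to confirm that span[ptile=1] really placed one task per node. A minimal sketch; the function name summarize_hosts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarize a space-separated host list such as $LSB_HOSTS,
# which LSF fills with one host name per allocated job slot.
summarize_hosts() {
    local -a slots
    read -r -a slots <<< "$1"
    if (( ${#slots[@]} == 0 )); then
        echo "0 slots on 0 hosts"
        return
    fi
    local nhosts
    # count distinct host names
    nhosts=$(printf '%s\n' "${slots[@]}" | sort -u | wc -l)
    echo "${#slots[@]} slots on $((nhosts)) hosts"
}

# In mpi.lsf, just before mpirun.lsf:
#   summarize_hosts "$LSB_HOSTS"    # with -n 2 and span[ptile=1]: 2 slots on 2 hosts
```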
Sample LSF script
Matlab Job
bsub < matlab.lsf
#!/bin/bash
# enable your environment, which will use .bashrc configuration in your home directory
#BSUB -L /bin/bash
# the name of your job as shown in the queue system
#BSUB -J MatlabJob
# the following BSUB line specifies the queue that you will use
#BSUB -q normal
# standard output and standard error files; %J is replaced by your job ID
#BSUB -o %J.out
#BSUB -e %J.err
# number of CPUs (job slots) to request (lshosts reports 12 CPUs per node)
#BSUB -n 1
# send e-mail notification to this address when the job finishes
#BSUB -u user@american.edu
#BSUB -N
# enter your working directory first, so MATLAB can find myplot.m
cd /home/username/matlab
# run your MATLAB code; "exit" makes MATLAB quit when the script is done
matlab -nodisplay -r "myplot; exit"
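MATLAB's -r option takes a MATLAB statement, which is why the script is named without its .m suffix (myplot for myplot.m). A minimal sketch of a wrapper that builds such a command line from a file path; the function name matlab_batch_cmd is hypothetical, and appending "; exit" is an assumption meant to make MATLAB quit when the script finishes.

```shell
#!/bin/bash
# Hypothetical helper: turn "path/to/program_file.m" into a MATLAB batch command
# line. It strips the directory and the .m suffix, because -r expects a MATLAB
# statement (the script name), not a file name.
matlab_batch_cmd() {
    local file=$1 dir name
    dir=$(dirname "$file")
    name=$(basename "$file" .m)
    # cd first so MATLAB finds the script in its current folder; exit afterwards
    printf 'cd %q && matlab -nodisplay -nosplash -r "%s; exit"\n' "$dir" "$name"
}

# Usage on the cluster:
#   bsub -q normal -o matlab.%J.out "$(matlab_batch_cmd /home/username/matlab/myplot.m)"
```

As in the Mathematica example below, the whole command is passed to bsub as one quoted string, so the cd and the matlab call run in the same shell on the execution host.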
LSF – Running Matlab in batch
To submit a batch Matlab job, first prepare a file with your Matlab
commands, say "program_file.m". Then issue the command:
bsub -q normal matlab -nodisplay -nojvm -nosplash -r program_file -logfile output_file.txt
NB: if your code uses Java, do not include the -nojvm flag.
Note that the suffix ".m" is omitted from the command file name. This
submits a batch job to the normal queue, taking input from the file
program_file.m and placing text output in output_file.txt.
LSF – Running Mathematica in batch
To submit a batch Mathematica job, first prepare a file with your Mathematica commands, say "test.m". Then issue the command:
bsub -q normal "math < test.m > test-out.txt"
* Note that the suffix ".m" is included in the command file name. This submits a batch job to the normal queue, taking input from
the file test.m and placing text output in test-out.txt.
Saving graphical output is somewhat trickier. To illustrate the simplest approach, here is a sample Mathematica job:
AppendTo[$Echo, "stdout"]
3+5
Integrate[Exp[-x^2],{x,-Infinity,Infinity}]
FactorInteger[120]
sc = Plot[{Sin[x], Cos[x]}, {x, 0, 2*Pi}, PlotStyle -> {
{RGBColor[1, 0, 0], Thickness[0.01]},
{RGBColor[0, 1, 0], Thickness[0.01]}}]
Export["sc.m",sc,"TEXT"]
abc=Table[Plot[x^n,{x,0,1}], {n, 1, 3}]
Do[Export["abc"<>ToString[n]<>".m",abc[[n]],"TEXT"],{n,1,3}]
7+9
Quit
To display graphics interactively instead, connect with X11 forwarding: from Linux or a Mac with X11 installed, simply use ssh -Y when connecting to HPCHEAD.american.edu. If you are
connecting from Windows, you must run an X server such as Cygwin/X, Xming or X-Win32 to do the same.
LSF – Running R in batch
To submit a batch R job, first prepare a file with your R commands, say
"program_file.R". Then issue the command:
bsub -q normal R CMD BATCH program_file.R output_file.txt
This command submits a batch job to the normal queue, taking input from the file
program_file.R and placing text output in output_file.txt. Graphical output is
saved to a PDF file via the "pdf" command within R, for example:
pdf("graphs.pdf") # create graphical output file
X=rnorm(100) # generate 100 N(0,1) variates
Y=rexp(100) # generate 100 Exp(1) variates
c(mean(X),mean(Y)) # mean of both samples
hist(X) # plot N(0,1) histogram
hist(Y) # plot Exp(1) histogram
dev.off() # close the file
Both histograms are saved to the same PDF file (one graph per page).
Essential Commands: bjobs
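bjobs - displays information about LSF jobs
bjobs -u user_name    jobs submitted by the specified user
bjobs -u all          jobs of all users
bjobs -l              long format, with full details of each job
bjobs -r              running jobs only
bjobs -s              suspended jobs only
bjobs -q queue_name   jobs in the specified queue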
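For a quick overview of many jobs, the STAT column of the default bjobs listing can be tallied. A minimal sketch; the function name count_by_state is hypothetical, and it assumes the default column layout shown earlier (JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME).

```shell
#!/bin/bash
# Hypothetical helper: read default bjobs output on stdin and print how many
# jobs are in each state (STAT column), one "STATE COUNT" pair per line.
count_by_state() {
    # NR > 1 skips the header; $3 is STAT. Lines with fewer than 4 fields
    # (extra execution hosts of parallel jobs) are skipped.
    awk 'NR > 1 && NF >= 4 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on the cluster:
#   bjobs -u all | count_by_state
```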
Essential Commands: bhist
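bhist - displays historical information about jobs
bhist -J job_name              jobs with the specified name
bhist -C start_time,end_time   jobs that completed or exited during the period
bhist -D start_time,end_time   jobs dispatched during the period
bhist -S start_time,end_time   jobs submitted during the period
bhist -T start_time,end_time   job events during the period (used together with -t)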
Essential Commands: bpeek
bpeek - displays stdout and stderr of user’s selected, unfinished job
bpeek -f uses ‘tail -f’ to display output instead of ‘cat’
bpeek [-q queue_name | -m host_name | -J job_name |
job_ID | "job_ID[index_list]"]
Essential Commands: bmod
bmod - modifies job submission options of a job
bmod [bsub options] [job_ID | "job_ID[index]"]
bmod -g job_group_name | -gn [job_ID]
bmod [-sla service_class_name | -slan] [job_ID]
bmod [-h | -V]
Essential Commands: bbot, btop
bbot - moves a pending job relative to the last job in the
queue
bbot job_ID | "job_ID[index_list]" [position]
bbot [-h | -V]
btop - moves a pending job relative to the first job in the
queue
btop job_ID | "job_ID[index_list]" [position]
btop [-h | -V]
Essential Commands: bswitch
bswitch - switches unfinished jobs from one queue to
another
bswitch [-J job_name] [-m host_name | -m host_group]
[-q queue_name] [-u user_name | -u user_group | -u all]
destination_queue [0]
bswitch destination_queue [job_ID | "job_ID[index_list]"] ...
bswitch [-h | -V]
Essential Commands: bstop/bresume
bstop - suspends unfinished jobs
bstop [-a] [-d] [-g job_group_name |-sla service_class_name]
[-J job_name] [-m host_name | -m host_group]
[-q queue_name] [-u user_name | -u user_group | -u all] [0]
[job_ID | "job_ID[index]"] ...
bstop [-h | -V]
bresume - resumes one or more suspended jobs
bresume [-g job_group_name] [-J job_name] [-m host_name ]
[-q queue_name] [-u user_name | -u user_group | -u all ] [0]
bresume [job_ID | "job_ID[index_list]"] ...
bresume [-h | -V]
Essential Commands: bkill
bkill - sends signals to kill, suspend, or resume unfinished
jobs
bkill [-l] [-g job_group_name | -sla service_class_name]
[-J job_name] [-m host_name | -m host_group]
[-q queue_name] [-r | -s (signal_value | signal_name)]
[-u user_name | -u user_group | -u all]
[job_ID ... | 0 | "job_ID[index]" ...]
bkill [-h | -V]