Running Jobs on Blue Gene

Running Batch Jobs
Batch jobs are run using the LoadLeveler scheduler. The path to the LoadLeveler
binaries should be set by default in your login shell. To submit a job using
LoadLeveler:
llsubmit run.script
run.script is a LoadLeveler script such as the following (exec refers to your
executable in the following script):
#!/usr/bin/ksh
#@ environment = COPY_ALL;
#@ job_type = BlueGene
#@ account_no = <your user account>
#@ class = parallel
#@ bg_partition = <partition name; for example: top>
#@ output = file.$(jobid).out
#@ error = file.$(jobid).err
#@ notification = complete
#@ notify_user = <your email address>
#@ wall_clock_limit = 00:10:00
#@ queue
mpirun -mode VN -np <number of procs> -exe <your executable> -cwd <working directory>
Please note that the partition should be specified only with the bg_partition keyword
and NOT in the mpirun arguments. In addition, job_type must be set to BlueGene.
Use llq to check the status of jobs and llcancel to cancel them. Additional options are
described in the Blue Gene sections of the LoadLeveler user guide (IBM site).
MPIRUN Options
Some key mpirun parameters are:
Option      Definition
-mode       compute mode: CO or VN
-np         number of compute processors
-mapfile    logical mapping of processors
-cwd        current working directory
-exe        full path of executable
-args       arguments of executable
-env        environmental variables
Additional parameters are described in the mpirun user's manual (from IBM).
The parameters that specify the choice of resources are -mode, -np, and -mapfile.
The behavior of these parameters is interdependent. Jobs run in partitions or blocks,
which are typically in powers of two. A partition must be allocated (or booted) before
a run and is restricted to a single user at a time. Please ensure that you use the
defined partitions by specifying one in the LoadLeveler script (using bg_partition). If you
do not, an ad hoc partition is created for your run, which may be inefficient
and will interfere with other users of the defined partitions.
Two compute modes are available:
1. In coprocessor (CO) mode, only one processor per compute node performs
computation, while the other processor performs communication and I/O.
2. In virtual node (VN) mode, both processors in a compute node perform
computation as well as communication and I/O. Each processor is thus a
virtual node.
For a given number of compute nodes, VN mode is usually faster than CO mode and
so is preferred, since it makes better use of the machine. However, the memory per
node on Blue Gene is relatively small, and in VN mode the memory per processor is
half that of CO mode. Thus some problems may run only in CO mode rather than VN
mode.
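The relationship between -mode, -np, and the number of compute nodes can be
sketched as a small shell helper. This is a minimal illustration, not part of
mpirun or LoadLeveler: the function name nodes_needed is hypothetical, and it
assumes the two processors per compute node described above (one compute task
per node in CO mode, two in VN mode).

```shell
# Hypothetical helper: compute nodes required for a given mode and task count.
# Usage: nodes_needed <CO|VN> <np>
nodes_needed() {
    mode=$1
    np=$2
    case "$mode" in
        CO) echo "$np" ;;                  # one compute task per node
        VN) echo $(( (np + 1) / 2 )) ;;    # two compute tasks per node, rounded up
        *)  echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, nodes_needed VN 512 reports 256 nodes, while nodes_needed CO 512
reports 512, so the same -np fits a partition half the size in VN mode,
provided the halved memory per processor is sufficient.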
Partition Layout and Usage Guidelines
To make effective use of the Blue Gene, production runs should generally use
one-fourth or more of one rack of the machine, i.e., 256 or more compute nodes. Thus the
following seven predefined partitions are provided for production runs:
Partition name            Number of nodes
SDSC                      all 3,072 nodes
R01R02                    2,048 nodes combining rack 1 and rack 2
rack, R1, and R2          These 3 partitions each consist of all 1,024 nodes of rack 0, rack 1, and rack 2 respectively.
top & bot                 512 nodes in the top, 512 nodes in the bottom of rack 0
R01-top & R01-bot         512 nodes in the top, 512 nodes in the bottom of rack 1
R02-top & R02-bot         512 nodes in the top, 512 nodes in the bottom of rack 2
top256-1 & top256-2       256 nodes in each half of the top
bot256-1 & bot256-2       256 nodes in each half of the bottom
Smaller 64 (bot64-1, …, bot64-8) and 128 (bot128-1 , … , bot128-4) node partitions
are available for test runs. The partition layout on rack 0 and usage guidelines are
detailed in the following diagram:
Diagram 1: Availability and Time Limits for Blue Gene Partitions
Partition    Availability                                  Time limit
Batch        All times                                     18 hrs.
Batch        7PM-7AM (PST) Mon-Fri; all day on weekends    18 hrs.
Test         7AM-7PM (PST) Mon-Fri                         30 min.
Please note that the smaller partitions are contained within the larger partitions.
Hence, if a job is running on the bot128-1 partition, the bot64-1, bot64-2,
bot256-1, bot, and rack partitions will be unavailable in addition to bot128-1
itself. Similarly, if a job is running on the rack partition, all the other partitions
will be unavailable. If you have a small job, please choose the smallest
partition that fits it so that other users can run on the remaining partitions.
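Choosing the smallest partition that fits a job can be sketched as follows.
This is only an illustration under stated assumptions: the function name
smallest_partition_size is hypothetical, and the list of sizes is simply the
set of partition sizes named in this section (64 to 3,072 nodes). It does not
check whether a partition of that size is currently free.

```shell
# Hypothetical helper: smallest partition size (in nodes) that fits a job.
# The sizes are the partition sizes named in this section.
# Usage: smallest_partition_size <nodes required>
smallest_partition_size() {
    needed=$1
    for size in 64 128 256 512 1024 2048 3072; do
        if [ "$needed" -le "$size" ]; then
            echo "$size"
            return 0
        fi
    done
    echo "no partition holds $needed nodes" >&2
    return 1
}
```

For example, smallest_partition_size 100 prints 128, which corresponds to one
of the bot128 test partitions rather than a 256-node or larger partition.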
Accounting
The following algorithm determines the Service Units (SUs) charged from your
allocation:
SUs = Wallclock_Hours x (Num of nodes in partition) x 2
Specifying a partition on Blue Gene precludes any other users from using the nodes
in that partition. Therefore, you are charged for the entire partition you use, even if
you do not use all the nodes for computations. For example, if you are using
bot128-2, you are charged for 256 processors, whether they are used or not.
We advise you to use the smallest partition capable of running your job to minimize
your charges.
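The charging formula above can be worked through with a short shell function.
This is a sketch, not an official accounting tool: the name su_charge is
hypothetical, and awk is used only because the wallclock hours may be
fractional.

```shell
# Hypothetical helper: Service Units charged for a run, using
# SUs = Wallclock_Hours x (Num of nodes in partition) x 2.
# Usage: su_charge <wallclock hours> <nodes in partition>
su_charge() {
    awk -v hours="$1" -v nodes="$2" 'BEGIN { printf "%.2f\n", hours * nodes * 2 }'
}
```

A one-hour run on bot128-2 (128 nodes) corresponds to su_charge 1 128, which
prints 256.00, regardless of how many of those nodes the job actually used.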
How To Check Your Remaining Allocation
Users can check their remaining allocation using the reslist command (see the
example below). Complete information on the usage and options of reslist may be
found by typing reslist --help on bglogin.sdsc.edu.
bg-login1 % reslist
Querying database, this may take several seconds ...
Output shown is local machine usage. For full usage on roaming accounts,
please use tgusage.
                                 SU Hours    SU Hours
Name     UID    ACID  ACC  PCTG  ALLOCATED   USED        USER
jdoe     88888  300   U    100   500000      5000.00     DOE, JOHN
use300          300              500000      450000.00
To determine the allocation usage for a single user:
% reslist -u username
To determine the allocation usage for all users under a given account:
% reslist -a grp000
To determine the allocation usage for jobs run within a particular time period:
% reslist -j -u username -a grp000 --begindate=mm-dd-yyyy --enddate=mm-dd-yyyy
Monitoring Jobs
You can monitor jobs in the queue using the llq command with the -b option. This
gives details of jobs currently in the queue and the partitions they are using. For
example:
bg-login1 /users/consult> llq -b
Id                       Owner      Submitted   LL BG PT Partition        Size
________________________ __________ ___________ __ __ __ ________________ ______
bgsn.13985.0             u8240      9/6 09:47   C  FR    bot              512
1 job step(s) in queue, 0 waiting, 0 pending, 0 running, 1 held, 0 preempted