Running Jobs on Blue Gene

Running Batch Jobs

Batch jobs are run using the LoadLeveler scheduler. The path to the LoadLeveler binaries should be set by default in your login shell. To submit a job using LoadLeveler:

    llsubmit run.script

Here run.script is a LoadLeveler script such as the following (replace each item in angle brackets with your own value):

    #!/usr/bin/ksh
    #@ environment = COPY_ALL;
    #@ job_type = BlueGene
    #@ account_no = <your user account>
    #@ class = parallel
    #@ bg_partition = <partition name; for example: top>
    #@ output = file.$(jobid).out
    #@ error = file.$(jobid).err
    #@ notification = complete
    #@ notify_user = <your email address>
    #@ wall_clock_limit = 00:10:00
    #@ queue
    mpirun -mode VN -np <number of procs> -exe <your executable> -cwd <working directory>

Please note that the partition should be specified only with the bg_partition keyword and NOT in the mpirun arguments. In addition, job_type must be set to BlueGene.

Use llq to check the status of jobs and llcancel to cancel jobs. Additional options are described in the Blue Gene sections of the LoadLeveler user guide (IBM site).

MPIRUN Options

Some key mpirun parameters are:

    Option     Definition
    -mode      compute mode: CO or VN
    -np        number of compute processors
    -mapfile   logical mapping of processors
    -cwd       current working directory
    -exe       full path of executable
    -args      arguments of executable
    -env       environment variables

Additional parameters are described in the mpirun user's manual (from IBM).

The parameters that specify the choice of resources are -mode, -np, and -mapfile. The behavior of these parameters is interdependent.

Jobs run in partitions, or blocks, whose sizes are typically powers of two. A partition must be allocated (or booted) before a run and is restricted to a single user at a time. Please ensure that you use the defined partitions by specifying one in the LoadLeveler script (using bg_partition).
If you do not do so, an ad hoc partition is created for your run, which may not be efficient and will interfere with other users who are using the defined partitions.

Two compute modes are available:

1. In coprocessor (CO) mode, only one processor per compute node performs computation, while the other processor performs communication and I/O.
2. In virtual node (VN) mode, both processors in a compute node perform computation as well as communication and I/O. Each processor is thus a virtual node.

For a given number of compute nodes, VN mode is usually faster than CO mode and so is preferred, since it makes better use of the machine. However, the memory per node on Blue Gene is relatively small, and in VN mode the memory per processor is half that of CO mode. Thus some problems may run only in CO mode.

Partition Layout and Usage Guidelines

To make effective use of the Blue Gene, production runs should generally use one-fourth or more of one rack of the machine, i.e., 256 or more compute nodes. The following predefined partitions are provided for production runs:

    Partition name          Number of nodes
    SDSC                    all 3,072 nodes
    R01R02                  2,048 nodes combining rack 1 and rack 2
    rack, R1, and R2        these 3 partitions each consist of all 1,024 nodes
                            of rack 0, rack 1, and rack 2 respectively
    top & bot               512 nodes in the top, 512 nodes in the bottom of rack 0
    R01-top & R01-bot       512 nodes in the top, 512 nodes in the bottom of rack 1
    R02-top & R02-bot       512 nodes in the top, 512 nodes in the bottom of rack 2
    top256-1 & top256-2     256 nodes in each half of the top
    bot256-1 & bot256-2     256 nodes in each half of the bottom

Smaller 64-node (bot64-1, …, bot64-8) and 128-node (bot128-1, …, bot128-4) partitions are available for test runs.

The partition layout on rack 0 and usage guidelines are detailed in the following diagram:

Diagram 1: Availability and Time Limits for Blue Gene Partitions

    Partition   Availability                                    Time limit
    Batch       All times                                       18 hrs.
    Batch       7PM-7AM (PST) Mon-Fri; all day on weekends      18 hrs.
    Test        7AM-7PM (PST) Mon-Fri                           30 min.

Please note that the smaller partitions are contained within the larger partitions. Hence, if there is a job running on the bot128-1 partition, the bot64-1, bot64-2, bot256-1, bot, and rack partitions will be unavailable in addition to the bot128-1 partition. Similarly, if there is a job running on the rack partition, all the other partitions will be unavailable. If you have a small job, please choose the smallest partition that fits it so that other users can run on the remaining partitions.

Accounting

The following formula determines the Service Units (SUs) charged to your allocation:

    SUs = Wallclock_Hours x (Number of nodes in partition) x 2

Specifying a partition on Blue Gene precludes any other users from using the nodes in that partition. Therefore, you are charged for the entire partition you use, even if you do not use all the nodes for computations. For example, if you are using bot128-2, you are charged for 256 processors, whether they are used or not. We advise you to use the smallest partition capable of running your job to minimize your charges.

How To Check Your Remaining Allocation

Users can check their remaining allocation using the reslist command (see the example below). Complete information on the usage and options of reslist may be found by typing reslist --help on bglogin.sdsc.edu.

    bg-login1 % reslist
    Querying database, this may take several seconds ...
    Output shown is local machine usage.
    For full usage on roaming accounts, please use tgusage.

                                       SU Hours    SU Hours
    Name     UID     ACID  ACC  PCTG   ALLOCATED   USED        USER
    jdoe     88888   300   U    100    500000      5000.00     DOE, JOHN
    use300           300               500000      450000.00

To determine the allocation usage for a single user:

    % reslist -u username

To determine the allocation usage for all users under a given account:

    % reslist -a grp000

To determine the allocation usage for jobs run within a particular time period:

    % reslist -j -u username -a grp000 --begindate=mm-dd-yyyy --enddate=mm-dd-yyyy

Monitoring Jobs

You can monitor jobs in the queue using the llq command with the -b option. This gives details of the jobs currently in the queue and the partitions they are using. For example:

    bg-login1 /users/consult> llq -b
    Id                       Owner      Submitted   LL BG PT Partition        Size
    ________________________ __________ ___________ __ __ __ ________________ ______
    bgsn.13985.0             u8240      9/6 09:47   C  FR    bot              512

    1 job step(s) in queue, 0 waiting, 0 pending, 0 running, 1 held, 0 preempted
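
As a worked illustration of the formula in the Accounting section, the following is a minimal shell sketch for estimating the charge before submitting a job. The function name su_charge is hypothetical (it is not a command provided on the system), and the sketch assumes only that awk is available for the floating-point arithmetic. The node count passed in should be the size of the whole partition, since that is what is charged.

```shell
#!/bin/sh
# Estimate the Service Units (SUs) charged for a Blue Gene job, using
#   SUs = Wallclock_Hours x (Number of nodes in partition) x 2
# su_charge is a hypothetical helper name, not a site-provided command.
su_charge() {
    hours=$1   # wall-clock time in hours (may be fractional)
    nodes=$2   # nodes in the partition, not the nodes actually used
    awk -v h="$hours" -v n="$nodes" 'BEGIN { printf "%.2f\n", h * n * 2 }'
}

# A 30-minute test run on a 128-node partition such as bot128-2.
su_charge 0.5 128

# An 18-hour production run on a 512-node partition such as top.
su_charge 18 512
```

Comparing the two results shows why choosing the smallest partition that fits the job matters: the charge scales with the partition size even when some of its nodes sit idle.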