Job Submission Using PBSPro and Globus Job Commands

Overview
- Computational Resources
- Queuing Systems
- PBSPro
- Globus Toolkit

Computational Resources
- Hard disks (permanent storage)
- Number of CPUs (processing power)
- CPU time (processing time)
- Physical memory (program size)
- All computers have limited resources
- These resources may span multiple processors/machines
- How can these resources be allocated fairly among the many users?

Queuing Systems
- Holding area for pending requests
- A method for allocating the needed resources based on a user request
- Processes requests dynamically, quickly and fairly
- Many different implementations

PBSPro
- Queuing system used to control the allocation of computational resources to user-submitted jobs
- Allows optimal sharing of all resources
- Ensures that the limited resources are not overrun
- Unattended processing of requests

PBS Queues
- Allow resources to be divided into clearly defined groups, called queues
- Queues can be defined by:
  - Maximum CPU time
  - Number of CPUs available
  - Memory needed
  - Concurrently executing jobs
- Queuing schemes evolve over time as user requests and workload vary

Interacting With PBS
- qsub – Submit a job to the queues
- qstat – Check the status of a job
- qdel – Delete a job from the queues
- qmgr – Create/modify queue settings
- xpbs – Monitor all queues and jobs

qsub – Submit A Job
- To submit a job to the queuing system, use "qsub". E.g.
    qsub test.sub
- test.sub is a script file containing the commands to be executed
- qsub returns your "QueueID" if the job was submitted successfully:
    61606.master

Submit File For A Serial Job
    #!/bin/bash
    # Maximum wall-clock time and memory for the job
    #PBS -l walltime=2:00:00
    #PBS -l mem=5mb
    # Join standard output and standard error into one file
    #PBS -j oe
    # Send mail when the job begins and ends
    #PBS -m be
    cd Serial
    ./test

Submit File For A Parallel Job
    #!/bin/sh
    # Request 4 nodes with 2 processors per node
    #PBS -l nodes=4:ppn=2
    #PBS -l walltime=48:00:00
    #PBS -j oe
    cd ./MPI/Examples
    mpiexec ./test

qstat – Check Job Status
- To check the status of a job submitted to the queuing system, use "qstat"

    Job id        Name   User     Time      S  Queue
    ------------  -----  -------  --------  -  -----
    61395.master  G5D2C  ngs0140  70:00:40  R  cpu16
    61494.master  STDIN  ngs0227  17:58:40  R  cpu1
    61495.master  STDIN  ngs0227  18:15:42  R  cpu1
    61496.master  STDIN  ngs0227  18:15:02  R  cpu1
    61497.master  STDIN  ngs0227  18:13:42  R  cpu1
    61498.master  STDIN  ngs0227  18:14:13  R  cpu8
    61555.master  Co_V   ngs0133  20:57:53  R  cpu24
    61567.master  20_12  ngs0234  00:31:17  R  cpu1
    61576.master  20_21  ngs0234  00:11:59  R  cpu4
    61578.master  20_23  ngs0234  00:05:51  R  cpu1
    61580.master  20_25  ngs0234  00:03:09  R  cpu1

qdel – Delete A Job
- To delete a job submitted to the queuing system, use "qdel". E.g.
    qdel QueueID
    qdel QueueID1 QueueID2

Globus Toolkit
- An open source toolkit for developing Grid-based applications and connectivity
- Allocates computational resources on remote (Globus-aware) machines for the execution of user-submitted jobs

Globus Job Commands
    globus-job-run <options>
    globus-job-submit <options>
    globus-job-status URL
    globus-job-get-output URL

globus-job-run
- Allows you to run a job as though it were interactive, on a local or remote machine
- You do not need to log on to the machine itself
- Not submitted to the queuing system
- Returns the program's output as though you were running it interactively

globus-job-run Examples
    globus-job-run grid-data.man.ac.uk /bin/date

    globus-job-run grid-data.rl.ac.uk ./test

    globus-job-run \
        grid-compute.leeds.ac.uk/jobmanager-pbs \
        -np 8 -x '(jobtype=mpi)(environment=(NGSMODULES clusteruser))' \
        ./MPI/test

    globus-job-run grid-data.rl.ac.uk -s ./test

globus-job-submit
- Submits a job through a Globus Job Manager
- The command returns a URL to the program's output:
    https://grid-compute.leeds.ac.uk:64167/5291/1094639422/
- The status and output of the job can be checked through this URL

globus-job-submit Examples
    globus-job-submit \
        grid-data.rl.ac.uk/jobmanager-pbs ./test

    globus-job-submit \
        grid-compute.leeds.ac.uk/jobmanager-pbs \
        -x '(jobtype=mpi)(directory=/home/bob/mpi)(environment=(NGSMODULES clusteruser))(count=8)' \
        ./mpi_program

    globus-job-submit \
        grid-data.man.ac.uk/jobmanager-pbs -s ./test

globus-job-status URL
- Gets the status of a submitted job
    globus-job-status \
        https://grid-compute.leeds.ac.uk:64167/5291/1094639422/
- Returns one of: Pending, Active, Done, Failed

globus-job-get-output URL
- Retrieves the output of a job submitted through a Globus Job Manager
    globus-job-get-output \
        https://grid-compute.leeds.ac.uk:64167/5291/1094639422/
- Returns the output of the program to the console
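
Because globus-job-submit returns immediately with a URL, a script usually has to poll globus-job-status before fetching the output. The following is a minimal sketch of such a loop, not a definitive implementation. The function name wait_for_job and the variables STATUS_CMD and POLL_INTERVAL are hypothetical names introduced here for illustration. The sketch assumes the status command prints one of the four states listed above on standard output, and it compares them case-insensitively because the exact capitalisation may differ between installations.

```shell
#!/bin/sh
# Hypothetical knobs for this sketch: the command used to query status
# (defaults to globus-job-status) and the delay between polls in seconds.
STATUS_CMD="${STATUS_CMD:-globus-job-status}"
POLL_INTERVAL="${POLL_INTERVAL:-30}"

# wait_for_job URL
# Polls the job until it reaches a terminal state.
# Prints "Done" and returns 0, prints "Failed" and returns 1,
# or returns 2 if the status query itself fails.
wait_for_job() {
    url="$1"
    while :; do
        # Ask the job manager for the current state.
        state="$($STATUS_CMD "$url")" || return 2
        # Normalise case so that "DONE" and "Done" are treated alike.
        state="$(printf '%s' "$state" | tr '[:upper:]' '[:lower:]')"
        case "$state" in
            "done")   echo "Done";   return 0 ;;
            "failed") echo "Failed"; return 1 ;;
            # Pending, Active, or anything else: wait and ask again.
            *)        sleep "$POLL_INTERVAL" ;;
        esac
    done
}
```

With this function defined, a script could capture the URL with url=$(globus-job-submit grid-data.rl.ac.uk/jobmanager-pbs ./test) and then run wait_for_job "$url" && globus-job-get-output "$url" to print the program's output once the job has finished. Polling at a fixed interval keeps the sketch simple; a long-running job might warrant a longer interval or an upper limit on the number of polls.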