Job Submission on the Olympus Cluster
Jay DePasse
Public Health Applications Specialist
Pittsburgh Supercomputing Center
MISSION 2.0 Training, Dec 11th, 2014
ISG We build general capability

Learning Objectives
After this tutorial, you should:
• Be comfortable submitting and monitoring jobs through the batch queuing system on Olympus
• Be able to modify the supplied job scripts for your own work
• Understand how to efficiently use the filesystems on Olympus
• Know where to go for help…

Resources
• The Olympus git repository contains the source code for the examples used in this tutorial; all of today’s presentations are available for download
• The Olympus cluster wiki provides documentation specific to Olympus
• The website of the Pittsburgh Supercomputing Center contains additional training materials and account management tools
• Email remarks@psc.edu with questions about available software, problems with your account, and requests for advanced consultation

Types of Jobs
• Serial Jobs: individual, independent jobs that each run on a single core of a single processor on a single node
• Multicore Parallel Jobs: use multiple cores on a single node
  • e.g., OpenMP
• Message Passing Jobs: can use multiple cores distributed over multiple nodes
  • e.g., Open MPI
Notes:
• These categories are fuzzy; jobs that fit into more than one (or all) of them aren’t uncommon
• It boils down to what resources are needed: How many nodes? How many cores?
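As a rough sketch of how these job types translate into resource requests, the snippet below prints a PBS “-l nodes=N:ppn=M” line (the syntax used in the job scripts in this tutorial) for each category. The helper name pbs_resource_line and the node and core counts are illustrative assumptions, not Olympus limits; check the cluster wiki for what a node actually offers.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Olympus examples): print the PBS
# resource request for a given number of nodes and cores per node.
pbs_resource_line() {
    local nodes=$1 cores_per_node=$2
    echo "#PBS -l nodes=${nodes}:ppn=${cores_per_node}"
}

# Serial job: one core on one node.
pbs_resource_line 1 1

# Multicore (e.g., OpenMP) job: several cores on one node.
# 8 cores is an assumed count for illustration.
pbs_resource_line 1 8

# Message passing (e.g., Open MPI) job: cores spread over several nodes.
# 4 nodes x 8 cores is an assumed example.
pbs_resource_line 4 8
```

The printed line can be pasted near the top of a job script, before the first executable statement.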

Job Scripts
• A job script is a step-by-step recipe for completing work on a compute cluster
• The recipe is written in a scripting language; we will use bash in our examples
• To submit a job script on the Olympus compute cluster, we will use the qsub command
Examples can be found on the Olympus gitlab site: https://git.isg.pitt.edu/depasse/olympus/blob/master/examples/fred/fred.bash

PBS Directives in a Job Submission Script

    #!/bin/bash -f
    # Remarks: A line beginning with # is a comment.
    # A line beginning with #PBS is a PBS directive.
    # PBS directives must come first; any directives after the first
    # executable statement are ignored.
    #PBS -N test.bash
    # #PBS -o stdout_file
    # #PBS -e stderr_file

• The first line, the “hash-bang” or “shebang”, specifies the scripting language used
• “#PBS -N test.bash” is an active PBS directive
• “# #PBS -o stdout_file” and “# #PBS -e stderr_file” are commented-out PBS directives

Simple Job Submission Script

    #!/bin/bash -f
    # Set PBS Directives…
    #PBS -N test.bash
    #PBS -l nodes=1:ppn=1

    # Get your input files together
    cp ~/inputs.txt myInput.txt

    # Run your program
    myProgram -i myInput.txt -o myOutput.txt

    # Collect the output
    cp myOutput.txt ~/outputs

Submitting a Job and Monitoring Progress
• After you submit your script with qsub, it is entered into the queue
• A queue is a prioritized list of jobs to be completed
• Once submitted, the status of your job can be viewed with qstat
• “qstat -a” gives you more verbose output
• After your job completes, its output will be available in your home directory

Example Jobs
• Clone the git repository with the command:
  • “git clone https://git.isg.pitt.edu/depasse/olympus.git”
• Enter the examples directory (git names the cloned directory “olympus”, without the “.git” suffix):
  • “cd olympus/examples”
• View the directories by typing “ls”; you should see:
  • “sanity”: a basic diagnostic sanity check
  • “mpihello”: a simple example of parallel, multi-node MPI code
  • “flute”: a basic, real-world example of parallel MPI code
  • “fred”: a basic, real-world example of OpenMP multithreaded code

Example: “sanity”
• Go to the examples/sanity directory
• View the contents using “less”:
  • “less sanity.bash”
  • Navigate with the up and down arrows; exit by pressing “q”
• The script is heavily commented, explaining each step
• Submit your job!
  • “qsub sanity.bash”
You can also view it here: https://git.isg.pitt.edu/depasse/olympus/tree/master/examples/sanity

Using Olympus File Systems
• Each node in Olympus has a “local” disk, physically located inside the node.
  • Fast and reliable for work on its own node
• Olympus has a “shared” file system that is accessible to all nodes via the network.
  • This is where your home directory is
  • The home directory is persistent, and its contents are never deleted
• While running, jobs should write output to the “local” disks.
  • Local disks are for temporary work and will be periodically “scrubbed”
• For convenience, the “local” directory can be accessed on the head node through the path /net/<node name>/tmp
  • Example: to reach the local disk of node n002, the path would be /net/n002/tmp
• The local directory’s location is stored in the $LOCAL environment variable

What does this look like in a job submission script?

    local_scratch_path="/net/$execution_compute_node$LOCAL"

    # make a directory for this job; name created using job id
    local_working_dir_name="$PBS_JOBID.output.directory"
    local_working_dir_net_path="$local_scratch_path/$local_working_dir_name"

    # create the directory
    mkdir -p "$local_working_dir_net_path"

    # dump all environment variables to a compressed file
    env | gzip > "$local_working_dir_net_path/$PBS_JOBID.env.gzip"

    # create a symlink to the local working dir, available through the
    # execution compute node's NFS export
    ln -s "$local_working_dir_net_path" "$PBS_O_WORKDIR/$local_working_dir_name"

• Set an environment variable defining the path to the “local” directory.
• Define a directory name that is unique to your job.
• Make that directory.
• Make the output of your job go to that directory.
• Create a shortcut to your output so you can access it on the head node.

Try the other examples
• Navigate to the other directories (flute, fred, mpihello)
• Each contains a “README” text file with instructions for submission
• Each job should take only a few minutes and will produce output in the directory from which you run qsub

Working in the shell
• “man”: Most important command of all. Opens the manual page for a command. Try “man man” to start. Type “q” to quit.
• “ls”: List files in a directory. Similar to “dir” in Windows/DOS.
• “cd”: Change directory. Move up and down the directory tree.
• “less”: A pager that allows you to view (but not edit) a file’s contents.
• “vi”: The ubiquitous text editor.

Text Editing with VI
• Type “i” to enter insert mode. Now you can navigate, delete, and type much the same as in other editors.
• Press “ESC” to exit insert mode
• Type “:w” to write your changes to disk
• Type “:q” to quit the vi editor
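
Returning to the file system slides: the directory setup shown in the job submission script can be wrapped in a small function so that it can be tried outside the batch system first. This is only a sketch. The function name setup_job_dir and its three arguments are assumptions made for illustration; inside a real job the values would come from $PBS_JOBID, $PBS_O_WORKDIR, and the /net/<node name> path to the local disk described earlier.

```shell
#!/bin/bash
# Sketch only: setup_job_dir is a hypothetical helper, not part of the
# Olympus examples.
# Arguments: scratch root, job id, directory qsub was run from.
setup_job_dir() {
    local scratch_root=$1 job_id=$2 submit_dir=$3
    local dir_name="$job_id.output.directory"
    local work_dir="$scratch_root/$dir_name"

    # Create the per-job directory on the scratch disk.
    mkdir -p "$work_dir" || return 1

    # Record the job's environment alongside its output.
    env | gzip > "$work_dir/$job_id.env.gzip"

    # Link the directory into the submission directory, replacing a stale link.
    ln -sfn "$work_dir" "$submit_dir/$dir_name"

    # Print the working directory so the caller can cd into it.
    echo "$work_dir"
}

# Inside a PBS job, use the values the batch system provides. Outside one,
# do nothing, so the function can be defined and tested safely.
if [ -n "${PBS_JOBID:-}" ]; then
    # Assumption: $execution_compute_node is set earlier in the job script,
    # as in the script shown on the slide.
    setup_job_dir "/net/${execution_compute_node}${LOCAL}" \
        "$PBS_JOBID" "$PBS_O_WORKDIR"
fi
```

For a dry run outside the queue, the function can be called with any writable directories and a made-up job id, for example: setup_job_dir /tmp test.job "$PWD"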