Job Submission on the
Olympus Cluster
Jay DePasse
Public Health Applications Specialist
Pittsburgh Supercomputing Center
MISSION 2.0 Training, Dec 11th, 2014
ISG
We build general capability
Learning Objectives
After this tutorial, you should:
• Be comfortable submitting and monitoring jobs
through the batch queuing system on Olympus
• Be able to modify the supplied job scripts for your
own work
• Understand how to efficiently use the filesystems
on Olympus
• Know where to go for help…
Resources
• The Olympus git repository contains the source
code for the examples used in this tutorial; all of
the presentations today are available for download
• The Olympus cluster wiki provides specific
documentation for Olympus
• The website of the Pittsburgh Supercomputing
Center contains additional training materials and
account management tools.
• Email remarks@psc.edu with questions about
available software, problems with your account,
and requests for advanced consultation.
Types of Jobs
• Serial Jobs: individual, independent jobs that run
using a single core of a single processor on a single
node
• Multicore Parallel Jobs: use multiple cores on a
single node
• e.g., OpenMP
• Message Passing Jobs: can use multiple cores
distributed over multiple nodes
• e.g., Open MPI
Notes:
• These categories are fuzzy; jobs that fit into more than one (or all) aren’t uncommon
• It boils down to which resources are needed: How many nodes? How many cores?
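In PBS terms, the job type mostly shows up in the resource request. The sketch below prints a request line for each of the three categories, using the "nodes=…:ppn=…" form that appears in the job scripts in this tutorial. The helper name pbs_resource_line and the core and node counts (8 and 4) are illustrative assumptions, not Olympus defaults.

```shell
# Print the resource-request directive for a job of the given shape.
# Arguments: number of nodes, cores (processors) per node.
# pbs_resource_line is a hypothetical helper, not an Olympus or PBS command.
pbs_resource_line() {
  local nodes="$1" ppn="$2"
  echo "#PBS -l nodes=${nodes}:ppn=${ppn}"
}

pbs_resource_line 1 1   # serial job: one core on one node
pbs_resource_line 1 8   # multicore job (e.g., OpenMP): 8 cores on one node (assumed count)
pbs_resource_line 4 8   # message passing job (e.g., Open MPI): 8 cores on each of 4 nodes (assumed counts)
```

The printed line is what would go near the top of a job script; only the two numbers change between job types.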
Job Scripts
• A job script is a step-by-step recipe for completing
work on a compute cluster
• The recipe is written in a scripting language; we will
use bash in our examples
• In order to submit this job script on the Olympus
compute cluster, we will use the qsub command
Examples can be found on the Olympus gitlab site:
https://git.isg.pitt.edu/depasse/olympus/blob/master/examples/fred/fred.bash
PBS Directives in a Job Submission Script
#!/bin/bash -f
# Remarks: A line beginning with # is a comment.
# A line beginning with #PBS is a PBS directive.
# PBS directives must come first; any directives after the first executable statement are ignored.
#PBS -N test.bash
# #PBS -o stdout_file
# #PBS -e stderr_file
• The first line, the “hash-bang” or “shebang”, specifies the scripting language used
• “#PBS -N test.bash” is an active PBS directive
• “# #PBS -o stdout_file” and “# #PBS -e stderr_file” are commented-out PBS directives
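To check which directives in a script are active, the rule above can be approximated in a few lines of bash. This is a rough sketch, not the scheduler's own parser: it assumes directives start at the beginning of a line, and the function name list_active_directives is hypothetical.

```shell
# Print the #PBS lines that appear before the first executable statement.
# Rough approximation of the rule on this slide; not the PBS parser itself.
list_active_directives() {
  local line
  while IFS= read -r line; do
    case "$line" in
      '#PBS'*)        printf '%s\n' "$line" ;;  # active directive
      '#'*)           ;;                        # shebang, comment, or commented-out directive
      *[![:space:]]*) break ;;                  # first executable statement: stop reading
      *)              ;;                        # blank line
    esac
  done < "$1"
}

# Usage (file name is an example): list_active_directives sanity.bash
```

Run against the script on this slide, only the "#PBS -N test.bash" line would be printed; the two "# #PBS" lines are skipped as ordinary comments.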
Simple Job Submission Script
#!/bin/bash -f
# Set PBS Directives…
#PBS -N test.bash
#PBS -l nodes=1:ppn=1
# Get your input files together
cp ~/inputs.txt myInput.txt
#Run your program
myProgram -i myInput.txt -o myOutput.txt
#Collect the output
cp myOutput.txt ~/outputs
Submitting a Job and Monitoring
Progress
• After submitting your script with qsub it will be
entered into the queue
• A queue is a prioritized list of jobs to be completed
• Once submitted, the status of your job can be
viewed with qstat
• “qstat -a” gives you more verbose output
• After your job completes, the output of your job will
be available in your home directory
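The submit-then-monitor steps can be wrapped in a small function. This is a minimal sketch that assumes qsub prints the new job's identifier on standard output and that qstat accepts a job identifier as an argument; the function name submit_and_show is hypothetical.

```shell
# Submit a job script and show its status in the queue.
# submit_and_show is a hypothetical helper built on qsub and qstat.
submit_and_show() {
  local script="$1" job_id
  # qsub prints the identifier of the newly queued job
  job_id=$(qsub "$script") || return 1
  echo "Submitted $script as job $job_id"
  # show the status of just this job
  qstat "$job_id"
}

# Usage on the cluster: submit_and_show sanity.bash
```

Keeping the job identifier in a variable makes it easy to check on one job rather than scanning the whole queue.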
Example Jobs
• Clone the git repository with the command:
• “git clone https://git.isg.pitt.edu/depasse/olympus.git”
• Enter the examples directory:
• “cd olympus/examples”
• View the directories by typing “ls”; you should see:
• “sanity”: a basic diagnostic sanity check
• “mpihello”: a simple example of parallel, multi-node
MPI code
• “flute”: a basic, real-world example of parallel MPI code
• “fred”: a basic, real-world example of OpenMP multithreaded code
Example: “sanity”
• Go to the examples/sanity directory
• View the contents using “less”:
• “less sanity.bash”
• Navigate with up and down arrows, exit by pressing ‘q’
• The script is heavily commented, explaining each step
• Submit your job!
• “qsub sanity.bash”
Can also view here:
https://git.isg.pitt.edu/depasse/olympus/tree/master/examples/sanity
Using Olympus File Systems
• Each node in Olympus has a “local” disk, physically located inside the node.
• Fast, reliable for work on its own node
• Olympus has a “shared” file system that is accessible to all nodes via the
network.
• This is where your home directory is
• Home directory is persistent, and contents are never deleted
• While running, jobs should write output to the “local” disks.
• Local disks are for temporary work and will be periodically “scrubbed”
• For convenience, the “local” directory can be accessed on the head
node through the path /net/<node name>/tmp
• Example: if you want to go to node n002’s local disk, the path would
be /net/n002/tmp.
• The local directory’s location is stored in the $LOCAL environment variable
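The path rule above is simple enough to express as a one-line helper. This sketch only builds the path string; the function name head_node_path is hypothetical, and the fallback text for $LOCAL is there because the variable may not be set outside a job.

```shell
# Build the head-node path to a compute node's local disk.
# head_node_path is a hypothetical helper following the /net/<node name>/tmp rule.
head_node_path() {
  local node="$1"
  echo "/net/${node}/tmp"
}

head_node_path n002   # prints /net/n002/tmp

# Inside a job, the local directory is given by $LOCAL instead
echo "LOCAL is: ${LOCAL:-(not set in this shell)}"
```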
What does this look like in a job submission script?
• Set an environment variable defining the path to the “local” directory.
• Define a directory name that is unique to your job.
• Make that directory.
• Make the output of your job go to that directory.
• Create a shortcut to your output so you can access it on the head node.
local_scratch_path="/net/$execution_compute_node$LOCAL"
# make a directory for this job; name created using job id
local_working_dir_name="$PBS_JOBID.output.directory"
local_working_dir_net_path="$local_scratch_path/$local_working_dir_name"
# create the directory
mkdir -p "$local_working_dir_net_path"
# dump all environment variables to a compressed file
env | gzip > "$local_working_dir_net_path/$PBS_JOBID.env.gzip"
# create a symlink to the local working dir, available through the execution
# compute node's NFS export
ln -s "$local_working_dir_net_path" "$PBS_O_WORKDIR/$local_working_dir_name"
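Because local disks are scrubbed periodically, results you want to keep should eventually be copied to the shared file system. The sketch below shows one possible way to do that at the end of a job script; the function name stage_out and the destination path in the usage comment are assumptions, not part of the Olympus examples.

```shell
# Copy everything from a job's local working directory to persistent storage.
# Both arguments are directories; the destination is created if needed.
# stage_out is a hypothetical helper, not part of the Olympus examples.
stage_out() {
  local src="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # "$src"/. copies the directory's contents, including hidden files
  cp -r "$src"/. "$dest"/
}

# Hypothetical use at the end of a job script:
# stage_out "$local_working_dir_net_path" "$HOME/results/$PBS_JOBID"
```

Copying once at the end, rather than writing to the home directory throughout the run, keeps the job's own writes on the local disk.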
Try the other examples
• Navigate to the other directories (flute, fred,
mpihello)
• Each contains a “README” text file with
instructions for submission
• Each job should take only a few minutes, and will
produce output in the same directory that you run
qsub
Working in the shell
• “man”: Most important command of all. Opens
the manual page for a command. “man man” to
start. Type “q” to quit.
• “ls”: List files in a directory. Similar to “dir” in
Windows/DOS
• “cd”: Change directory. Move up and down the
directory tree.
• “less”: A pager that allows you to view (but not
edit) a file’s contents.
• “vi”: The ubiquitous text editor.
Text Editing with VI
• Type “i” to enter insert mode. Now you can
navigate, delete, and type much the same as in
other editors.
• Type “ESC” to exit insert mode
• Type “:w” to write your changes to disk
• Type “:q” to quit the vi editor