An Introduction to Gauss
Paul D. Baines
University of California, Davis
November 20th 2012
What is Gauss?
• http://wiki.cse.ucdavis.edu/support:systems:gauss
• 12-node compute cluster (2 x 16 cores per node)
• 1 TB storage per node
• ~11 TB storage on head node
• 64 GB RAM per node
• 416 cores in total (inc. head node)
What is Gauss good for?
• Running large numbers of independent jobs
• Running long-running jobs
• Running jobs involving parallel computing
• Running large-memory jobs
What Gauss is not designed for…
• Running simple, fast jobs (just use your laptop)
• Running interactive R sessions
• Running GPU-based calculations
Gauss Overview
• Create your public/private key pair (see Wiki for details)
• Provide CSE with your public key and campus username
(via email to help@cse.ucdavis.edu)
• Log in to Gauss via ssh:
(e.g., ssh -X username@gauss.cse.ucdavis.edu)
• When you ssh into Gauss, you log in to the head node
• If you type R directly at the command line, you will be
running R on the head node
(Please do not do this!)
• To use the compute nodes you submit jobs via SLURM
• SLURM manages which jobs run on which nodes
Gauss Structure
[Diagram: the head node dispatches jobs through SLURM to compute nodes 1 to 12]
SLURM Basics
Important commands to know:
sbatch     (submit a job to Gauss)
sarray     (submit an array job to Gauss)
squeue     (check the status of running jobs)
scancel    (cancel a job)
Examples (more detailed examples later):
squeue                 # view all queued and running jobs
squeue -u pdbaines     # check all of pdbaines’ jobs
scancel -u pdbaines    # cancel all of pdbaines’ jobs
scancel 19213          # cancel job 19213
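When the queue is busy, it can help to summarise squeue output per user. The sketch below is a hypothetical helper (jobs_per_user is not a SLURM or Gauss command). It assumes the default squeue column layout, with the user name in the fourth column and the job state in the fifth; adjust the column numbers if Gauss prints a different format.

```shell
# jobs_per_user: read squeue-style text on stdin and count jobs per user and state.
# Assumes the default squeue columns:
#   JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
jobs_per_user() {
    awk 'NR > 1 { count[$4 " " $5]++ }          # skip the header line
         END    { for (key in count) print key, count[key] }' | sort
}

# On the head node this would be used as:
#   squeue | jobs_per_user
```

Each output line has the form "user state count", where state is the squeue code (R for running, PD for pending).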
Resource Allocation on Gauss
• The compute resources (CPUs, memory) are shared across
all Gauss users.
• When users submit jobs, SLURM allocates resources.
• You must be sure to request sufficient resources (e.g.,
cores, memory) for your jobs to run
• Resource requests are made when submitting your job (via
your sbatch or sarray scripts)
• Resources are allocated as per user requests, but strict
limits are not enforced
• If you use more memory than you requested it can
~massively~ slow down your (and others’) jobs!
• To check the memory usage of your jobs you can use the
‘myjobs’ command (see examples later)
Gauss Etiquette
• Gauss is a shared resource: your bad code can
(potentially) ruin someone else’s simulation!
• Test your code thoroughly before running large jobs
• Make sure you request the correct amount of
resources for your jobs
• Regularly check memory usage for long-running jobs
• Be considerate of others!
Aside: Linux Basics
• To use Gauss you need to know some basic Linux
commands (these work in a Mac terminal too)
• You should already be, or quickly get, familiar with
the following commands:
ls, cd, cp, mv, rm, pwd, cat, tar, grep
• It helps if you learn how to use a command line editor
such as vim or nano. (hint: use vim)
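For anyone new to these commands, here is a short practice session that touches each of them once. It is only a sketch: the file names are made up, and everything happens in a temporary scratch directory so no real files are affected.

```shell
# A scratch directory keeps the demo away from real files.
workdir="$(mktemp -d)"
cd "$workdir"
pwd                                   # show the current directory

printf 'alpha\nbeta\ngamma\n' > notes.txt
ls                                    # list the directory contents
cat notes.txt                         # print a file
grep 'beta' notes.txt                 # print the lines matching a pattern

cp notes.txt copy.txt                 # copy a file
mv copy.txt renamed.txt               # rename (move) a file
tar -czf notes.tar.gz notes.txt renamed.txt   # bundle files into a compressed archive
rm renamed.txt                        # delete a file (there is no undo)
ls
```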
Ways to use Gauss: Example 1
Bob has been given a large dataset by a collaborator
and told to analyze it in R. The dataset is large and the job
will take about 3 days to complete, so he doesn’t want
to use his laptop!
Bob can submit the job on Gauss and keep working
on other things in the meantime.
Example 1 cont…
Code files:
bob_example_1.R
bob_example_1.sh
To submit:
sbatch bob_example_1.sh
Example 1 Code: SLURM script
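A minimal sketch of what a submit script for this kind of job could look like. This is not the script from the slide: the file name bob_example_1_sketch.sh, the memory request and the time limit are all assumptions chosen for illustration. Only standard sbatch options are used. The block writes the script to a file so it can be inspected before submitting.

```shell
# Write a hypothetical submit script; every value below is a placeholder.
cat > bob_example_1_sketch.sh <<'EOF'
#!/bin/bash -l
#SBATCH --job-name=bob_example_1
# stdout and stderr of the job go to this file
#SBATCH --output=bob_example_1.out
# a single serial R process
#SBATCH --ntasks=1
# memory in MB (assumed; measure it with a small trial run)
#SBATCH --mem-per-cpu=2000
# days-hours:minutes:seconds (assumed; the job needs about 3 days)
#SBATCH --time=4-00:00:00

srun Rscript bob_example_1.R
EOF

# Submit from the head node with:
#   sbatch bob_example_1_sketch.sh
```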
Allocating Resources
• How do you know how much memory to request? Run small trial jobs!
• Use the ‘myjobs’ command e.g.,
pdbaines@gauss:~/Examples/Example_3$ myjobs
Tue Nov 20 10:27:45 PST 2012 - pdbaines has jobs running on: c0-11
jobs for pdbaines on c0-11
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
pdbaines 13932 99.0 0.3 408424 216492 ? R 10:25 3:12 R
pdbaines 13949 99.1 0.3 434308 242336 ? R 10:25 3:12 R
pdbaines 13975 99.1 0.2 367720 175780 ? R 10:25 3:12 R
pdbaines 13995 99.1 0.3 425100 233172 ? R 10:25 3:12 R
VSZ and RSS give a rough indication of how much memory your job is using (in KB)
e.g., the above R jobs each have roughly 170-240 MB resident (RSS) and 360-425 MB virtual (VSZ).
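Converting the KB figures by hand gets tedious, so here is a sketch of a small filter that does it. The name rss_summary is hypothetical. It assumes the ps-style column order shown in the myjobs output (PID in column 2, RSS in column 6, in KB) and ignores any line that does not fit that pattern.

```shell
# rss_summary: read ps-style rows (as printed by myjobs) on stdin and report
# each process's resident memory in MB, plus the total.
# Assumes the column order USER PID %CPU %MEM VSZ RSS ..., with RSS in KB.
rss_summary() {
    awk '$2 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {   # keep process rows, skip headers
             printf "PID %s: %.1f MB\n", $2, $6 / 1024
             total += $6
         }
         END { printf "total: %.1f MB\n", total / 1024 }'
}

# Rows copied from the myjobs output shown earlier.
rss_summary <<'EOF'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
pdbaines 13932 99.0 0.3 408424 216492 ? R 10:25 3:12 R
pdbaines 13949 99.1 0.3 434308 242336 ? R 10:25 3:12 R
pdbaines 13975 99.1 0.2 367720 175780 ? R 10:25 3:12 R
pdbaines 13995 99.1 0.3 425100 233172 ? R 10:25 3:12 R
EOF
```

On Gauss the same function could be fed directly, e.g. myjobs | rss_summary.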
Ways to use Gauss: Example 2
Bob has been given 3 more datasets to analyze by his
collaborator (or three new analyses to perform on the
same dataset).
He just needs to set up the same thing as example 1
multiple times.
Example 2 cont…
Code files:
bob_example_2A.R,
bob_example_2B.R,
bob_example_2C.R
bob_example_2A.sh,
bob_example_2B.sh,
bob_example_2C.sh
To submit:
sbatch bob_example_2A.sh
sbatch bob_example_2B.sh
sbatch bob_example_2C.sh
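Typing near-identical submit commands invites copy-paste slips, so a loop is a reasonable alternative. The sketch below assumes the three scripts sit in the current directory; submit_example_2 is a hypothetical helper name, and sbatch is the only SLURM command it calls.

```shell
# submit_example_2: submit the A, B and C scripts in one go.
# Relies on sbatch being on the PATH, as it is on the Gauss head node.
submit_example_2() {
    local tag script
    for tag in A B C; do
        script="bob_example_2${tag}.sh"
        if [ -f "$script" ]; then
            sbatch "$script"
        else
            echo "skipping $script: file not found" >&2
        fi
    done
}

# Usage, from the directory holding the scripts:
#   submit_example_2
```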
Example 2 Code: SLURM script
Ways to use Gauss: Example 3
Bob has developed a new methodology for analyzing super-complicated data.
He wants to run a simulation to prove to the world how
awesome his method is compared to his competitors’
methods.
He decides to simulate 100 datasets, and analyze each of
them with his method and his competitors’ methods.
This is done using an array job.
Example 3 cont…
• Bob writes an R script to randomly generate and analyze one
dataset at a time
• He would like to run the script 100 times on Gauss
• To do this, he writes a shell script to submit to SLURM
• Each run must use a different random seed, otherwise he will analyze
the same dataset 100 times!
• He will also need to write an R script to combine the results from
all 100 jobs
• He will also need a shell script to submit the post-processing
portion of the analysis
• (Note: I have described this process in detail on the Gauss page
of the CSE Wiki:
http://wiki.cse.ucdavis.edu/support:systems:gauss)
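One simple way to give each run its own seed is to add the array index to a fixed base seed. The sketch below shows the idea under stated assumptions: base_seed, task_seed, task_outfile and the results/sim_NNN.rds naming scheme are all hypothetical, and task_id stands for whatever array index the scheduler provides. Current SLURM releases expose it as SLURM_ARRAY_TASK_ID for sbatch --array jobs; the variable set by Gauss's sarray wrapper may have a different name, so check the wiki. On the R side, the values would be read with commandArgs(trailingOnly = TRUE) and passed to set.seed().

```shell
# Hypothetical base seed; any fixed integer works, as long as it is recorded.
base_seed=20121120

# task_seed: map an array index (1..100) to a distinct, reproducible seed.
task_seed() {
    echo $(( base_seed + $1 ))
}

# task_outfile: give each run its own result file so runs never overwrite each other.
task_outfile() {
    printf 'results/sim_%03d.rds\n' "$1"
}

# Inside the array job's shell script this could be used as below, where
# task_id holds the array index supplied by the scheduler:
#   Rscript bob_example_3.R "$(task_seed "$task_id")" "$(task_outfile "$task_id")"
task_seed 1
task_seed 2
task_outfile 7
```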
Example 3 cont…
Code files:
bob_example_3.R
bob_example_3.sh
bob_post_process.R
bob_post_process.sh
To submit:
sarray bob_example_3.sh
sbatch bob_post_process.sh
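The post-processing step only makes sense once all 100 runs have written their results. Here is a sketch of a check that could be run first. It assumes each run writes a non-empty file named sim_NNN.rds into one directory; both that naming scheme and the helper name missing_results are hypothetical.

```shell
# missing_results DIR N: print the index of every run in 1..N whose result file
# DIR/sim_NNN.rds is absent or empty; return non-zero if any are missing.
# The sim_NNN.rds naming scheme is an assumption for illustration.
missing_results() {
    local dir="$1" n="$2" i=1 missing=0
    while [ "$i" -le "$n" ]; do
        if [ ! -s "$(printf '%s/sim_%03d.rds' "$dir" "$i")" ]; then
            echo "$i"
            missing=1
        fi
        i=$(( i + 1 ))
    done
    return "$missing"
}

# Usage: only post-process once all 100 runs have produced output.
#   missing_results results 100 && sbatch bob_post_process.sh
```

Any indices it prints are the runs to investigate or resubmit before combining the results.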
Example 3: SLURM script
Example 3: Modified R Code
Retrieving your results
• To copy results back from Gauss to your laptop:
• Archive them e.g.,
tar -cvzf all_results.tar.gz my_results/
• Copy them either with a file transfer (sftp) program or
from the command line (Linux/Mac users) e.g.,
scp myusername@gauss.cse.ucdavis.edu:~/all_results.tar.gz ./
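It is worth listing an archive's contents before copying it, to confirm it holds what you expect. The sketch below runs the archive and check steps on a stand-in directory; the file run_001.csv and its contents are made up so the commands can be tried anywhere.

```shell
# Stand-in results directory so the sketch runs anywhere; on Gauss, my_results/
# would already hold the job output.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir my_results
echo "estimate,0.42" > my_results/run_001.csv   # made-up placeholder content

tar -czf all_results.tar.gz my_results/          # c: create, z: gzip, f: archive name
tar -tzf all_results.tar.gz                      # t: list contents to check the archive

# After copying the archive to the laptop with scp, unpack it there with:
#   tar -xzf all_results.tar.gz
```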
More Advanced Usage
• Gauss can be set up to run parallel computing jobs
using MPI, OpenMP etc.
• SLURM submit files need to be modified to specify
number of tasks, CPUs, memory per CPU etc.
• New (free) software can be installed on Gauss at your
request by emailing help@cse.ucdavis.edu
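As an illustration of the kind of changes involved, here is a sketch of a submit script for a multithreaded (OpenMP) program. The core count, memory figure and executable name are assumptions, and the options Gauss actually requires may differ. The directives used (--ntasks, --cpus-per-task, --mem-per-cpu) are standard sbatch options.

```shell
#!/bin/bash -l
# Hypothetical resource requests for a multithreaded (OpenMP) job.
#SBATCH --job-name=omp_sketch
# One task (process) ...
#SBATCH --ntasks=1
# ... with 8 cores for its threads (assumed value).
#SBATCH --cpus-per-task=8
# Memory per core in MB (assumed value).
#SBATCH --mem-per-cpu=1000

# SLURM sets SLURM_CPUS_PER_TASK inside the job when --cpus-per-task is given;
# fall back to 1 thread when the script is run outside SLURM.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} OpenMP thread(s)"

# The program itself would be launched here, e.g.:
#   srun ./my_openmp_program     (hypothetical executable name)
```

Tying the thread count to the requested cores keeps the job within the resources it asked for, which matters on a cluster where limits are not strictly enforced.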
References
Pre-requisite Linux skills:
• http://code.google.com/edu/tools101/linux/basics.html
Gauss/SLURM Links:
• http://wiki.cse.ucdavis.edu/support:general:security:ssh#moving_and_copying_keys
• http://wiki.cse.ucdavis.edu/support:faq:getting_started
• http://wiki.cse.ucdavis.edu/support:systems:gauss
• http://wiki.cse.ucdavis.edu
• http://www.sph.umich.edu/biostat/computing/cluster/slurm.html
• https://computing.llnl.gov/linux/slurm/faq.html
• https://computing.llnl.gov/linux/slurm/documentation.html