Getting Started on Killdevil and Kure

Using Kure and Killdevil
Mark Reed
Sandeep Sarangi
ITS Research Computing
Outline
 Compute Clusters
• Killdevil
• Kure
 Logging In
 File Spaces
 User Environment and
Applications, Compiling
 Job Management
Links
 UNC Research Computing
• http://its.unc.edu/research
 Getting started Killdevil page
• http://help.unc.edu/CCM3_031537
 Killdevil FAQ
• http://help.unc.edu/CCM3_031548
 Getting started Kure page
• http://help.unc.edu/ccm3_015682
What is a compute cluster?
What exactly is Killdevil?
Kure?
What is a compute cluster?
Some Typical Components
 Compute Nodes
 Interconnect
 Shared File System
 Software
 Operating System (OS)
 Job Scheduler/Manager
 Mass Storage
Compute Cluster Advantages
 fast interconnect, tightly coupled
 aggregated compute resources
• can run parallel jobs to access more compute
power and more memory
 large (scratch) file spaces
 installed software base
 scheduling and job management
 high availability
 data backup
Multi-Core Computing
 The trend in High Performance
Computing is towards multi-core or many-core
computing.
 More cores at slower clock speeds for less
heat
 Dual and quad core processors are now
common.
 Soon 64+ core processors will be common
Kure
 An HPC/HTC research
compute cluster in Research Computing (RC)
 Named after the beach
in North Carolina
 Pronounced like the name of
the Nobel Prize-winning
physicist and chemist,
Madame Curie
Kure Compute Cluster
 Heterogeneous Research
Cluster
 Hewlett Packard Blades
 200+ compute nodes,
mostly
• Xeon 5560 2.8 GHz
• Nehalem Microarchitecture
• Dual socket, quad core
• 48 GB memory
• over 1800 cores
• some higher memory nodes
 Infiniband 4x QDR
 priority usage for patrons
• Buy-in is cheap
 Storage
• /netscr – 197 TB
• Isilon space
Kure Cont.
 The original configuration of Kure was
mostly homogeneous but it became
increasingly heterogeneous as patrons
added to it.
 Most (non-patron) compute nodes have 48 GB of memory,
but there are additional high memory nodes
 3 nodes each with 192 GB of memory
 2 nodes each with 96 GB of memory
 patron nodes with 72 GB of memory
Multi-Purpose Killdevil Cluster
 High Performance Computing
• Large parallel jobs, high speed interconnect
 High Throughput Computing (HTC)
• high volume serial jobs
 Large memory jobs
• special nodes for extreme memory
 GPGPU computing
• computing on Nvidia processors
Killdevil Nodes
 Three types of nodes:
• compute nodes
• large memory nodes
• GPGPU nodes
Killdevil Cluster –
Compute Nodes
 Intel Xeon processors, Model X5670
 Dual socket hex core (12 cores per node)
 2.93 GHz clock speed for each core
 48 or 96 GB memory per node
Killdevil Cluster –
Compute Nodes
 Intel Xeon processors, Model E5-2670
 Dual socket oct core (16 cores per node)
 2.60 GHz clock speed for each core
 64 GB memory per node
Killdevil Cluster –
Compute Nodes
 68 nodes with 64 GB memory per node
 604 nodes with 48 GB memory per node
 68 nodes with 96 GB memory per node
 total of 740 nodes with 9152 cores
• plus GPU and large memory nodes
• So 774 nodes with 9600 cores total
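The totals above can be checked with a little shell arithmetic. This is only a sketch: it assumes the 64 GB nodes are the 16-core E5-2670 nodes, the 48 GB and 96 GB nodes are the 12-core X5670 nodes, and that the extra nodes are the 32 GPGPU hosts (12 cores each) and the 2 extreme memory hosts (32 cores each) described on the other slides. The variable names are illustrative.

```shell
# Sanity check of the Killdevil node and core counts (POSIX shell arithmetic).
# Assumption: 64 GB nodes have 16 cores; 48 GB and 96 GB nodes have 12 cores.
nodes_64gb=68
nodes_48gb=604
nodes_96gb=68

compute_nodes=$(( nodes_64gb + nodes_48gb + nodes_96gb ))
compute_cores=$(( nodes_64gb * 16 + (nodes_48gb + nodes_96gb) * 12 ))

# Assumption: the "plus" nodes are 32 GPGPU hosts (12 cores each)
# and 2 extreme memory hosts (32 cores each).
gpu_nodes=32
bigmem_nodes=2
total_nodes=$(( compute_nodes + gpu_nodes + bigmem_nodes ))
total_cores=$(( compute_cores + gpu_nodes * 12 + bigmem_nodes * 32 ))

echo "compute: $compute_nodes nodes, $compute_cores cores"
echo "total:   $total_nodes nodes, $total_cores cores"
```

Under these assumptions the numbers agree with the slide: 740 compute nodes with 9152 cores, and 774 nodes with 9600 cores overall.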
Killdevil Extreme Memory
Nodes
 2 nodes each with 1 TB of memory
• extremely large shared memory node!
 Intel Xeon Model X7550
 32 cores per node
 2.0 GHz processors
 Use the bigmem queue
Killdevil GPGPU
Computing
 General Purpose computing on Graphics
Processing Units (GPGPU)
 32 compute nodes are paired with 64 GPUs,
2 GPUs per node
• this is configurable and may vary
 compute nodes are Intel Xeon X5650, 2.67
GHz, 12 cores, with 48 GB of memory
 GPUs are Nvidia Tesla (M2070), each with 448
compute cores
 Use the gpu queue
Infiniband Connections
 Connections come in single (SDR), double (DDR), and
quad (QDR) data rates. Now also FDR and EDR.
• Killdevil is QDR.
 Single data rate is 2.5 Gbit/s in each direction per link.
 Links can be aggregated: 1x, 4x, 12x.
• Killdevil is 4x.
 Links use 8B/10B encoding: 10 bits carry 8 bits of data,
so the useful data transmission rate is four-fifths of the raw
rate. Thus single, double, and quad data rates carry 2,
4, or 8 Gbit/s respectively.
 Data rate for Killdevil is 32 Gbit/s or 4 GB/s (4x QDR).
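The 4x QDR figure can be reproduced step by step. The sketch below uses only the numbers quoted on this slide and works in Mbit/s so that plain integer shell arithmetic is enough; the variable names are made up for the example.

```shell
# Effective InfiniBand data rate for a 4x QDR link (values from the slide).
sdr_mbit=2500       # single data rate: 2.5 Gbit/s per link, expressed in Mbit/s
rate_multiplier=4   # QDR = quad data rate
link_width=4        # 4x aggregated links

raw_mbit=$(( sdr_mbit * rate_multiplier * link_width ))   # raw signalling rate
data_mbit=$(( raw_mbit * 8 / 10 ))                        # 8B/10B: 8 data bits per 10 bits sent
data_mbyte=$(( data_mbit / 8 ))                           # bits to bytes

echo "raw:  $(( raw_mbit / 1000 )) Gbit/s"
echo "data: $(( data_mbit / 1000 )) Gbit/s = $(( data_mbyte / 1000 )) GB/s"
```

The raw rate works out to 40 Gbit/s, of which four-fifths (32 Gbit/s, or 4 GB/s) is useful data.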
Login to Killdevil/Kure
 Use ssh to connect:
• ssh killdevil.unc.edu
• ssh kure.unc.edu
 SSH Secure Shell with Windows
• see http://shareware.unc.edu/software.html
 For use with X-Windows Display:
• ssh -X killdevil.unc.edu or ssh -X kure.unc.edu
• ssh -Y killdevil.unc.edu or ssh -Y kure.unc.edu
 Off-campus users (i.e. domains outside of
unc.edu) must use a VPN connection
File Spaces
Killdevil File Spaces
 Home directories
• /nas02/home/<a>/<b>/<onyen>
 a = first letter of onyen, b = second letter of onyen
• hard limit of 15 GB
 Scratch Space
• NOT backed up
• purged regularly (21 days or less)
• run jobs with large output in these spaces
 /netscr – 197 TB (tuned for small files)
 /lustre – 126 TB (tuned for large files)
 Mass Storage
• ~/ms
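As an illustration of the home directory layout described above, the path can be derived from an onyen as in this minimal sketch. The function name is invented for the example, and "jdoe" is a made-up onyen.

```shell
# Build the home directory path for an onyen, following the
# /nas02/home/<a>/<b>/<onyen> layout. "jdoe" is a made-up onyen.
home_dir_for() {
    onyen=$1
    a=$(printf '%s' "$onyen" | cut -c1)   # first letter of onyen
    b=$(printf '%s' "$onyen" | cut -c2)   # second letter of onyen
    printf '/nas02/home/%s/%s/%s\n' "$a" "$b" "$onyen"
}

home_dir_for jdoe
```

For the made-up onyen this prints /nas02/home/j/d/jdoe.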
Kure File Spaces
 Home directories
• /nas02/home/<a>/<b>/<onyen>
 a = first letter of onyen, b = second letter of onyen
• hard limit of 15 GB
 Scratch Space
• NOT backed up
• purged regularly (21 days or less)
• run jobs with large output in these spaces
 /netscr – 197 TB (tuned for small files)
 Mass Storage
• ~/ms
File System Notes
 Note that the same home directory is
mounted on Killdevil and Kure
 Check your home file space usage with the
quota command
• quota -s
(the -s option reports usage in more readable units)
 Lustre file space on Killdevil is attached via
Infiniband and may be faster
 Best practice for jobs with large output is to
run them in scratch space, tar and compress
results, and store them in mass storage.
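The tar-and-compress step might look like the following sketch. A temporary directory stands in for a scratch work area such as /netscr, the file names are placeholders, and the final copy to mass storage is left as a comment because it only makes sense on the clusters.

```shell
# Sketch: bundle job output into one compressed archive before archiving it.
# A temporary directory stands in for a scratch work area such as /netscr.
workdir=$(mktemp -d)
mkdir "$workdir/results"
echo "sample output 1" > "$workdir/results/run1.out"
echo "sample output 2" > "$workdir/results/run2.out"

# Create a single gzip-compressed tar file from the results directory.
archive="$workdir/results.tar.gz"
tar -czf "$archive" -C "$workdir" results

# List the archive contents to verify it before copying it anywhere.
tar -tzf "$archive"

# On the cluster, the archive would then be copied to mass storage, e.g.:
#   cp "$archive" ~/ms/
```

Storing one archive rather than many small files also follows the mass storage advice about small files.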
Mass Storage
 long term archival storage
 access via ~/ms
 looks like ordinary disk file system – data
is actually stored on tape
 “limitless” capacity
 data is backed up
 For storage only, not a work directory
(i.e. don’t run jobs from here)
 if you have many small files, use tar or
zip to create a single file for better
performance
 Sign up for this service on onyen.unc.edu
“To infinity … and beyond”
- Buzz Lightyear
User Environment and
Applications, Compiling Code
Modules
 The user environment is managed by modules.
They provide a convenient way to access software
applications
 Modules modify the user environment by
modifying and adding environment variables such
as PATH or LD_LIBRARY_PATH
 Typically you set these once and leave them
 Note there are two module settings, one for your
current environment and one to take effect on
your next login (e.g. batch jobs running on
compute nodes)
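To make the effect of a module concrete, here is a rough illustration of the kind of environment change that adding a module performs. It does not call the module command itself, and /opt/example/1.0 is a hypothetical install prefix, not a real path on either cluster.

```shell
# Rough illustration of the kind of change "module add" makes.
# /opt/example/1.0 is a hypothetical install prefix, not a real cluster path.
prefix=/opt/example/1.0

# Prepend the application's executables to the command search path.
PATH="$prefix/bin:$PATH"
# Prepend its shared libraries, keeping any existing value.
LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH

# Show the first entry of each variable.
echo "$PATH" | cut -d: -f1
echo "$LD_LIBRARY_PATH" | cut -d: -f1
```

A real module also records what it changed so that removing the module can undo it, which is one reason to prefer modules over editing these variables by hand.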
Common Module Commands
 module avail
• module avail apps
 module help

Change Current Shell    Login version
 module list            module initlist
 module add             module initadd
 module rm              module initrm

For more on modules see
http://help.unc.edu/CCM3_006660
Compiling on
Killdevil/Kure
 Serial Programming
 Suites for C, C++, Fortran90, Fortran77, etc
 Intel Compilers
• icc, icpc, ifort
 GNU
• gcc, g++, gfortran
 Portland Group (PGI)
• pgcc, pgCC, pgf90, pgf77
 Generally speaking, the Intel or PGI compilers will
give slightly better performance than the GNU compilers
Parallel Jobs with MPI
 There are three implementations of the MPI
standard installed on both systems:
• mvapich
• mvapich2
• openmpi
 Performance is similar for all three, and all three
run on the IB fabric. Mvapich is the default.
Openmpi and mvapich2 have more of the MPI-2
features implemented.
Compiling MPI programs
 Use the MPI wrappers to compile your
program
• mpicc, mpiCC, mpif90, mpif77
• the wrappers will find the appropriate
include files and libraries and then invoke
the actual compiler
• for example, mpicc will invoke either gcc,
pgcc or icc depending upon which module
you have loaded
Compiling on
Killdevil/Kure
Parallel Programming
 MPI (see previous page)
 OpenMP
• Compiler flag:
 -openmp for Intel
 -fopenmp for GNU
 -mp for PGI
• Must set OMP_NUM_THREADS in submission script
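A submission script for an OpenMP program might look like the sketch below, which writes the script to a file and prints it. The executable name myOmpExe, the thread count of 8, and the file location are placeholders, and requesting all 8 slots on one host with span[ptile=8] is an assumption about how such a job would usually be laid out.

```shell
# Write a sample LSF submission script for an OpenMP program.
# myOmpExe is a placeholder executable (built, for example, with gcc -fopenmp);
# 8 threads is an arbitrary choice.
job_script="${TMPDIR:-/tmp}/omp_job.bsub"
cat > "$job_script" <<'EOF'
#!/bin/bash
#BSUB -q week
#BSUB -o out.%J
#BSUB -n 8
#BSUB -R "span[ptile=8]"
export OMP_NUM_THREADS=8
./myOmpExe
EOF

cat "$job_script"
# On the cluster this would be submitted with: bsub < "$job_script"
```

Keeping OMP_NUM_THREADS equal to the number of slots requested with -n avoids oversubscribing the cores on the host.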
Job Scheduling and Management
What does a Job Scheduler
and batch system do?
Manage Resources
 allocate user tasks to resources
 monitor tasks
 process control
 manage input and output
 report status, availability, etc.
 enforce usage policies
Job Scheduling Systems
 Allocates compute nodes to job submissions
based on user priority, requested resources,
execution time, etc.
 Many types of schedulers
• Load Sharing Facility (LSF) – Used by Killdevil/Kure
• IBM LoadLeveler
• Portable Batch System (PBS)
• Sun Grid Engine (SGE)
• Simple Linux Utility for Resource Management
(SLURM)
LSF
 All Research Computing clusters use LSF to do job
scheduling and management
 LSF (Load Sharing Facility) is a (licensed) product
from Platform Computing (now owned by IBM)
• Fairly distribute compute nodes among users
• enforce usage policies for established queues
 most common queues: int, now, week, month
• RC uses Fair Share scheduling, not first come, first
served (FCFS)
 LSF commands typically start with the letter b (as
in batch), e.g. bsub, bqueues, bjobs, bhosts, …
• see man pages for much more info!
Simplified view of LSF
[Figure: a user logged in to the login node submits a job (bsub -q week myjob); the job is routed to a queue (jobs queued: job_J, job_F, myjob, job_7) and is then dispatched to run on an available host which satisfies the job requirements.]
Running Programs on
Killdevil
 When you ssh to Killdevil/Kure, you land on a
login node.
 Programs SHOULD NOT be run on the login node.
 Submit programs to one of the many, many
compute nodes.
 Submit jobs using Load Sharing Facility (LSF)
via the bsub command.
Common batch commands
 bsub – submit jobs
 bqueues – view info on defined queues
• bqueues -l week
 bkill – stop/cancel submitted job
 bjobs – view submitted jobs
• bjobs -u all
 bhist – job history
• bhist -l <jobID>
Common batch commands
 bhosts – status and resources of hosts (nodes)
 bpeek – display output of running job
 Use man pages to get much more info!
• man bjobs
Submitting Jobs: bsub
Command
 Submit Jobs - bsub
• Run large jobs out of scratch space, smaller jobs can
run out of your home space
 bsub [-bsub_opts] executable [-exec_opts]
 Common bsub options:
• -o <filename>
 -o out.%J
• -q <queue name>
 -q week
• -R "resource specification"
 -R "span[ptile=8]"
• -n <number of processes>
 used for parallel, MPI jobs
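The -n and span[ptile=...] options interact: with span[ptile=P], LSF places at most P processes on each host, so a job asking for N processes spreads over roughly N/P hosts, rounded up. A small sketch of that arithmetic follows; the helper name is invented for the example.

```shell
# How many hosts does a job span? With -R "span[ptile=P]", LSF places at most
# P processes on each host, so -n N needs ceil(N / P) hosts.
hosts_needed() {
    n=$1
    ptile=$2
    echo $(( (n + ptile - 1) / ptile ))   # integer ceiling division
}

hosts_needed 64 8    # 64 processes, 8 per host
hosts_needed 20 8    # the last host is only partly filled
```

The first call gives 8 hosts and the second gives 3, since 20 processes at 8 per host leave 4 for the third host.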
Two methods to submit jobs:
 bsub example: submit the executable, myexe,
to the week queue and redirect output to the
file out.<jobID> (the default is to mail the output)
 Method 1: Command Line
• bsub -q week -o out.%J myexe
 Method 2: Create a file (details to follow)
called, for example, myexe.bsub, and then
submit that file. Note the redirect symbol, <
• bsub < myexe.bsub
Method 2 cont.
 The file you submit contains all the bsub
options you want, so for this example
myexe.bsub will look like this:
#BSUB -q week
#BSUB -o out.%J
myexe
 This is actually a shell script, so the top line could
be the usual #!/bin/csh, etc., and you can run any
commands you would like.
• if this doesn’t mean anything to you then never mind :)
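Because the submitted file is an ordinary shell script, it can mix #BSUB lines with other commands. The sketch below writes such a file and checks its syntax without running it; the choice of bash, the extra echo commands, and the file location are illustrative assumptions.

```shell
# Write a submission script that also runs ordinary shell commands.
# The interpreter line and the extra commands are illustrative only.
job_script="${TMPDIR:-/tmp}/myexe.bsub"
cat > "$job_script" <<'EOF'
#!/bin/bash
#BSUB -q week
#BSUB -o out.%J
echo "Job started on $(hostname) at $(date)"
myexe
echo "Job finished at $(date)"
EOF

# Check the script for shell syntax errors without running it.
sh -n "$job_script" && echo "syntax OK"
# On the cluster: bsub < "$job_script"
```

The #BSUB lines are comments to the shell, so the same file can be parsed by the shell and read by bsub for its options.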
Parallel Job example
Batch Command Line Method
 bsub -q week -o out.%J -n 64 mpirun myParallelExe
Batch File Method
 bsub < myexe.bsub
 where myexe.bsub will look like this
#BSUB -q week
#BSUB -o out.%J
#BSUB -n 64
mpirun myParallelExe
Minor Killdevil caveats
 Memory limits: Killdevil has a default memory
limit of 4 GB for a job. If you need more than
the default, use the “-M” LSF option:
bsub -q week -o out.%J -M 9 myExe
 PI groups: On Killdevil, when you submit a job
make sure you use the correct PI group (only
applicable if you belong to more than one PI
group) by using the “-G” LSF option:
bsub -q week -G itsrc_grp myExe
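The two options can be combined on one command line. The sketch below is a dry run that only assembles and prints the command rather than submitting anything; the helper name build_bsub is made up, and the values 9 and itsrc_grp are simply the ones used in the examples above.

```shell
# Dry run: assemble a bsub command line with optional -M and -G settings
# and print it instead of submitting. The helper name is made up.
build_bsub() {
    queue=$1
    mem=$2
    group=$3
    exe=$4
    cmd="bsub -q $queue -o out.%J"
    if [ -n "$mem" ]; then cmd="$cmd -M $mem"; fi       # raise the memory limit
    if [ -n "$group" ]; then cmd="$cmd -G $group"; fi   # name the PI group
    echo "$cmd $exe"
}

build_bsub week 9 itsrc_grp myExe
build_bsub week "" "" myExe
```

Printing the command first is a cheap way to check the options before a real submission.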
Minor Killdevil caveats
(cont’d)
 Using the correct PI group is important
for the bookkeeping we do on
cluster usage by PI group
 To check the PI groups to which you
belong:
bugroup | grep <onyen>
Interactive Jobs
 To run long shell scripts on Kure, use the int
(interactive) queue
 bsub -q int -Ip /bin/bash
• This bsub command provides a prompt on a
compute node
• You can run a program or shell script interactively
from the compute node
 on Killdevil use the hour or day queue as needed
• bsub -q hour -Ip /bin/bash
Specialty Scripts
 There are specialty scripts provided on Kure for
user convenience.
 Batch scripts
• bmatlab, bsas, bstata
 X-window scripts
• xmatlab, xsas, xstata
 Interactive scripts
• imatlab, istata
 Killdevil only provides the *matlab scripts listed
above
MPI/OpenMP Training
 Courses are taught throughout the year by
Research Computing
• http://learnit.unc.edu/workshops
• http://help.unc.edu/CCM3_008194
 See schedule for next course
• MPI
• OpenMP
Further Help with
Killdevil/Kure
 More details can be found on the Getting Started help
documents:
• http://help.unc.edu/CCM3_031537 - Killdevil
• http://help.unc.edu/ccm3_015682 - Kure
 For assistance with Killdevil/Kure, please contact the
ITS Research Computing group
• Email: research@unc.edu
• Phone: 919-962-HELP
• Submit help ticket at http://help.unc.edu
 For immediate assistance on a particular
command, see the manual pages
• man <command>