Session5-cja

Parallel Programming
Workshop
HPC 470
August, 2015
Credits
Contributors:
Dr Charles Antonelli (LSA IT)
Mark Champe (LSA IT)
Bennet Fauber (ARC)
Dr Alexander Gaenko (ARC)
Nancy Herlocher (LSA IT)
Seth Meyer (LSA IT)
Todd Raeker (ARC)
Brought to you under the auspices of
Advanced Research Computing, U-M Office of Research
LSA IT ARS / cja © 2015
10 Aug 2015
Roadmap
Morning sessions (10 AM - 1 PM)
Monday, August 10: Session 1
Introduction/roadmap (Antonelli)
git (Herlocher)
Data copy intro (Antonelli): scp/sftp, flux-xfer, Globus Connect
Intro to parallelism (Antonelli)
Tuesday, August 11: Session 3
Parallel Python (Champe)
Wednesday, August 12: Session 5
Parallel C & Fortran (Antonelli): Profiling & debugging, MPI (message-passing)
Thursday, August 13: Session 7
Accelerator Parallelism (Meyer): CUDA
Friday, August 14: Session 9
Intro to Globus+ (Raeker): in particular, will go through examples of sharing features
Intro to Cloud computing (Raeker): cloud-based compute sources (AWS, Azure, Google Compute Cloud, ...)
Lab (all)
Afternoon sessions (2 PM - 5 PM)
Monday, August 10: Session 2
Parallel R (Fauber): basic R functions, list applicable functions, and converting list applicable functions to parallel execution
Tuesday, August 11: Session 4
Parallel MATLAB (Fauber): two examples will be shown, one involving processing many input files, the other a Monte Carlo simulation
Wednesday, August 12: Session 6
Parallel C & Fortran (Antonelli): OpenMP (multi-core), OpenACC (accelerators)
Thursday, August 13: Session 8
Accelerator Parallelism (Meyer): CUDA (continued), Intel Xeon Phi
Friday, August 14: Session 10
Lab (all)
Lab report-outs (participants)
Please see our course web page for more information and registration information.
http://arc-ts.umich.edu/hpc470/
Copying data
From Linux or Mac OS X, use scp or sftp
Non-interactive (scp)
scp localfile uniqname@flux-xfer.engin.umich.edu:remotefile
scp -r localdir uniqname@flux-xfer.engin.umich.edu:remotedir
scp uniqname@flux-login.engin.umich.edu:remotefile localfile
Use "." as destination to copy to your Flux home directory:
scp localfile uniqname@flux-xfer.engin.umich.edu:.
... or to your Flux scratch directory:
scp localfile uniqname@flux-xfer.engin.umich.edu:/scratch/allocname/uniqname
Interactive (sftp)
sftp uniqname@flux-xfer.engin.umich.edu
From Windows, use WinSCP
U-M Blue Disc: http://www.itcs.umich.edu/bluedisc/
Globus Online
Features
High-speed data transfer, much faster than SCP or SFTP
Reliable & persistent
Minimal client software: Mac OS X, Linux, Windows
GridFTP Endpoints
Gateways through which data flow
Exist for XSEDE, OSG, …
UMich: umich#flux, umich#nyx
Add your own client endpoint!
Add your own server endpoint: contact flux-support@umich.edu
More information
http://arc.umich.edu/flux-and-other-hpc-resources/flux/usingflux/transferring-files-with-globus-gridftp/
Parallelism Review
[Figure: Compute Node. Labels: Processor (P), Process, RAM, Local disk]
[Figure: Fine-grained parallelism. Labels: Cores, P, RAM, Local disk]
Programming Models (1)
Fine-grained parallelism
The parallel application consists of a single process
containing several parallel threads that communicate
with each other using synchronization primitives
Used when the data can fit into a single process, and the
communications overhead of the message-passing model is
intolerable
“Symmetric multiprocessing (SMP)” or “Shared-memory
parallelism” or “multi-threaded parallelism” or …
Implemented using compilers and software libraries
OpenMP (Open Multi-Processing)
[Figure: Coarse-grained parallelism]
Programming Models (2)
Coarse-grained parallelism
The parallel application consists of several processes
running on different nodes and communicating with
each other over the network
Used when the data are too large to fit on a single node, and
simple synchronization is adequate
“Message-passing”
Implemented using software libraries
MPI (Message Passing Interface)
Good parallel
Embarrassingly parallel
Folding@home, RSA Challenges, Bitcoin mining, password cracking, …
http://en.wikipedia.org/wiki/List_of_distributed_computing_projects
Amdahl’s Law
If you enhance a fraction f of a computation
by a speedup S, the overall speedup is:
Speedup = 1 / ((1 - f) + f / S)
MPI
Crib sheet
Log in to Stampede
ssh trainNNN@stampede.tacc.utexas.edu
Get and compile the code
cp -r ~cja/hpc470/jacobi .
cd hpc470/jacobi
make
Get on a compute node
idev -m 30
Run the vanilla version
cd hpc470/jacobi
mpirun -bootstrap fork -np N ./oned
Excellent user guide
https://portal.tacc.utexas.edu/user-guides/stampede
Debugging & profiling
Debugging with GDB
Command-line debugger
Start programs or attach to running programs
Display source program lines
Display and change variables or memory
Plant breakpoints, watchpoints
Examine stack frames
Excellent tutorial documentation
http://www.gnu.org/s/gdb/documentation/
Compiling for GDB
Debugging is easier if you ask the compiler to generate extra
source-level debugging information
Add -g flag to your compilation
icc -g serialprogram.c -o serialprogram
or
mpicc -g mpiprogram.c -o mpiprogram
GDB will work without symbols
Need to be fluent in machine instructions and hexadecimal
Be careful using -O with -g
Some compilers won't optimize code when debugging
Most will, but you sometimes won't recognize the resulting source
code at optimization level -O2 and higher
Use -O0 -g to suppress optimization
Running GDB
Two ways to invoke GDB:
Debugging a serial program:
gdb ./serialprogram
Debugging an MPI program:
mpirun -np N xterm -e gdb ./mpiprogram
This gives you N separate GDB sessions, each debugging one
rank of the program
Remember to use the -X or -Y option to ssh when connecting
to Flux, or you can't start xterms there
Useful GDB commands
gdb exec                start gdb on executable exec
gdb exec core           start gdb on executable exec with core file core
l [m,n]                 list source
disas                   disassemble function enclosing current instruction
disas func              disassemble function func
b func                  set breakpoint at entry to func
b line#                 set breakpoint at source line#
b *0xaddr               set breakpoint at address addr
i b                     show breakpoints
d bp#                   delete breakpoint bp#
r [args]                run program with optional args
bt                      show stack backtrace
c                       continue execution from breakpoint
step                    single-step one source line
next                    single-step, don't step into function
stepi                   single-step one instruction
p var                   display contents of variable var
p *var                  display value pointed to by var
p &var                  display address of var
p arr[idx]              display element idx of array arr
x 0xaddr                display hex word at addr
x *0xaddr               display hex word pointed to by addr
x/20x 0xaddr            display 20 words in hex starting at addr
i r                     display registers
i r ebp                 display register ebp
set var = expression    set variable var to expression
q                       quit gdb
Debugging with DDT
Allinea's Distributed Debugging Tool is a
comprehensive graphical debugger designed for the
complex task of debugging parallel code
Advantages include
Provides a graphical interface to debugging
Capabilities similar to those of, e.g., Eclipse or Visual Studio
Supports parallel debugging of MPI programs
Scales much better than GDB
Running DDT
Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram
Load the DDT module:
module load ddt
Start DDT:
ddt mpiprogram
This starts a DDT session, debugging all ranks concurrently
Remember to use the -X or -Y option to ssh when connecting to Flux, or
you can't start ddt there
http://arc-ts.umich.edu/software/
http://content.allinea.com/downloads/userguide.pdf
Application Profiling with MAP
Allinea's MAP Tool is a statistical application profiler
designed for the complex task of profiling parallel
code
Advantages include
Provides a graphical interface to profiling
Observe cumulative results, drill down for details
Supports parallel profiling of MPI programs
Handles most of the details under the covers
Running MAP
Compile with -g:
mpicc -g mpiprogram.c -o mpiprogram
Load the MAP module:
module load ddt
Start MAP:
map mpiprogram
This starts a MAP session
Runs your program, gathers profile data, displays summary statistics
Remember to use the -X or -Y option to ssh when connecting to
Flux, or you can't start MAP there
http://content.allinea.com/downloads/userguide.pdf
OpenMP
OpenACC
Crib sheet
Log in to Flux
ssh flux-login.arc-ts.umich.edu
Get on a GPU node
qsub -I -V -X -l nodes=1:gpus=1 -q fluxg -l qos=flux -A hpc470_fluxg -l walltime=4:00:00
Get, compile, run the code
cp -r ~cja/hpc470/saxpy .
cd saxpy
module load cuda
module load pgi
pgcc -ta=nvidia,cc11 -acc -Minfo=accel saxpy.c -o saxpy
./saxpy
Resources
http://arc-ts.umich.edu/flux/
U-M Advanced Research Computing Flux pages
http://arc.research.umich.edu/software/
Flux Software Catalog
http://arc-ts.umich.edu/flux/flux-faqs/
Flux FAQs
http://www.youtube.com/user/UMCoECAC
ARC-TS YouTube channel
For assistance: flux-support@umich.edu
Read by a team of people including unit support staff
Cannot help with programming questions, but can help with
operational Flux and basic usage questions