Software and Services Group

Intel ® Cluster Tools
Introduction and Hands on Sessions
MSU Summer School
Intel Cluster Software and Technologies
Software & Services Group
July 8, 2010, MSU Moscow
Agenda
– Intel Cluster Tools settings and configuration
– Intel MPI fabrics
– Message Checker
– ITAC Introduction
– ITAC practice
Setup configuration
• source /opt/intel/cc/11.0.74/bin/iccvars.sh intel64
• source /opt/intel/fc/11.0.74/bin/ifortvars.sh intel64
• source /opt/intel/impi/4.0.0.25/bin64/mpivars.sh
• source /opt/intel/itac/8.0.0.011/bin/itacvars.sh impi4
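The setup step can be wrapped in a small guard so that a missing installation path is reported instead of failing silently. This is only a sketch for bash: the helper name source_if_present is hypothetical, and the paths are the ones from the slide, which may differ on your cluster.

```shell
#!/bin/bash
# Hypothetical helper: source an environment script only if it is readable.
# Extra arguments (e.g. "intel64") are passed on to the script.
source_if_present() {
    local script=$1
    shift
    if [ -r "$script" ]; then
        source "$script" "$@"
    else
        echo "skipped (not found): $script" >&2
        return 1
    fi
}

# Paths taken from the slide; adjust them to your installation.
source_if_present /opt/intel/cc/11.0.74/bin/iccvars.sh intel64 || true
source_if_present /opt/intel/fc/11.0.74/bin/ifortvars.sh intel64 || true
source_if_present /opt/intel/impi/4.0.0.25/bin64/mpivars.sh || true
source_if_present /opt/intel/itac/8.0.0.011/bin/itacvars.sh impi4 || true
```

Each skipped path is printed to stderr, which makes a wrong version number in a path easy to spot.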
Check configuration
• which icc
• which ifort
• which mpiexec
• which traceanalyzer
• echo $LD_LIBRARY_PATH
• set | grep I_MPI
• set | grep VT_
Compile your first MPI application
• Using Intel compilers
• mpiicc, mpiicpc, mpiifort, ...
• Using GNU compilers
• mpicc, mpicxx, mpif77, ...
• mpiicc -o hello_c test.c
• mpiifort -o hello_f test.f
Create mpd.hosts file
• Create an mpd.hosts file in the working directory with the list of available nodes
Create mpd ring
• mpdboot -r ssh -n #nodes
Check mpd ring
• mpdtrace
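The mpd.hosts step can be sketched as follows. The node names node01 to node04 are hypothetical placeholders, and the script only prints the mpdboot and mpdtrace commands from the slide rather than running them, so it can be tried on a machine without Intel MPI.

```shell
#!/bin/bash
# Hypothetical node names; replace them with the hosts of your cluster.
nodes="node01 node02 node03 node04"

# Write one host name per line into mpd.hosts in the working directory.
hosts_file=./mpd.hosts
: > "$hosts_file"
for n in $nodes; do
    echo "$n" >> "$hosts_file"
done

# Count the non-empty lines to get the value for "mpdboot -n".
nnodes=$(grep -c . "$hosts_file")

# Commands to run next (printed only, not executed here).
echo "mpdboot -r ssh -n $nnodes"
echo "mpdtrace"
```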
Start your first application
• mpiexec -n 16 ./hello_c
• mpiexec -n 16 ./hello_f
Kill mpd ring
• mpdallexit
• mpdcleanup -a
Start your first application with mpirun (it boots the mpd ring, runs the job and shuts the ring down)
• mpirun -r ssh -n 16 ./hello_c
• mpirun -r ssh -n 16 ./hello_f
Alternative Process Manager
• Use mpiexec.hydra for better scalability
• All options are the same
OFED & DAPL
• OFED - OpenFabrics Enterprise Distribution
http://openfabrics.org/
• DAPL - Direct Access Programming Library
http://www.openfabrics.org/downloads/dapl/
• Check /etc/dat.conf
• Set I_MPI_DAPL_PROVIDER=OpenIB-mlx4_0-2
Fabrics selection
I_MPI_DEVICE  I_MPI_FABRICS  Description
sock          tcp            TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*)
shm           shm            Shared-memory only
ssm           shm:tcp        Shared-memory + TCP/IP
rdma          dapl           DAPL-capable network fabrics, such as InfiniBand*, iWarp*, Dolphin*, and XPMEM* (through DAPL*)
rdssm         shm:dapl       Shared-memory + DAPL + sockets
-             ofa            OFA-capable network fabric including InfiniBand* (through native OFED* verbs)
-             shm:ofa        OFA-capable network fabric with shared memory for intra-node communication
-             tmi            TMI-capable network fabrics including Qlogic*, Myrinet* (through Tag Matching Interface)
-             shm:tmi        TMI-capable network fabric with shared memory for intra-node communication
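When migrating old job scripts, the table can be read as a lookup from the older I_MPI_DEVICE values to I_MPI_FABRICS values. The function below is a hypothetical helper, not part of Intel MPI; it only encodes the five rows of the table that have an I_MPI_DEVICE entry.

```shell
#!/bin/bash
# Hypothetical helper: translate an I_MPI_DEVICE value into the
# I_MPI_FABRICS value listed in the same row of the table.
device_to_fabrics() {
    case "$1" in
        sock)  echo tcp ;;
        shm)   echo shm ;;
        ssm)   echo shm:tcp ;;
        rdma)  echo dapl ;;
        rdssm) echo shm:dapl ;;
        *)     echo "no I_MPI_FABRICS value listed for device: $1" >&2
               return 1 ;;
    esac
}

# Example: a script that used the rdssm device now sets I_MPI_FABRICS.
fabrics=$(device_to_fabrics rdssm) && export I_MPI_FABRICS="$fabrics"
echo "I_MPI_FABRICS=$I_MPI_FABRICS"
```

The ofa and tmi fabrics have no I_MPI_DEVICE counterpart in the table, so the helper rejects them rather than guessing.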
Fabrics selection (cont.)
• Use I_MPI_FABRICS to set the desired fabric
– export I_MPI_FABRICS=shm:tcp
– mpirun -r ssh -n # -env I_MPI_FABRICS shm:tcp ./a.out
• DAPL varieties:
– export I_MPI_FABRICS=shm:dapl
– export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1
– export I_MPI_DAPL_UD=enable
• Connectionless communication
• Better scalability
• Less memory is required
Fabrics selection (cont.)
• OFA fabric
– export I_MPI_FABRICS=shm:ofa
• Multi-rail feature
– export I_MPI_OFA_NUM_ADAPTERS=<n>
– export I_MPI_OFA_NUM_PORTS=<n>
• For OFA devices the Intel® MPI Library recognizes some hardware events: it can stop using a failed line and restore the connection when the line is OK again
How to get information from Intel MPI library
• Use I_MPI_DEBUG env variable
– Use a number from 2 to 1001 for different levels of detail
– Level 2 shows data transfer mode
– Level 4 shows pinning information
cpuinfo utility
• Use this utility to get information about
processors used in your system
Intel(R) Xeon(R) Processor (Intel64 Harpertown)
===== Processor composition =====
Processors(CPUs)  : 8
Packages(sockets) : 2
Cores per package : 4
Threads per core  : 1
===== Processor identification =====
Processor  Thread Id.  Core Id.  Package Id.
0          0           0         0
1          0           0         1
2          0           1         0
3          0           1         1
4          0           2         0
5          0           2         1
6          0           3         0
7          0           3         1
===== Placement on packages =====
Package Id.  Core Id.  Processors
0            0,1,2,3   0,2,4,6
1            0,1,2,3   1,3,5,7
===== Cache sharing =====
Cache  Size   Processors
L1     32 KB  no sharing
L2     6 MB   (0,2)(1,3)(4,6)(5,7)
Pinning
• One can change default pinning settings
– export I_MPI_PIN=on (or off)
– export I_MPI_PIN_DOMAIN=cache2 (for hybrid)
– export I_MPI_PROCESSOR_LIST=allcores
– export I_MPI_PROCESSOR_LIST=shift=socket
OpenMP and Hybrid applications
• Check command line for application building
– Use the thread-safe version of the Intel® MPI Library (-mt_mpi option)
– Use the libraries with SMP parallelization (e.g. parallel MKL)
– Use the -openmp compiler option to enable OpenMP* directives
$ mpiicc -openmp your_app.c -o ./your_app
• Set application execution environment for hybrid applications
– Set OMP_NUM_THREADS to the number of threads per process
– Use the -perhost option to control the number of processes per host
$ export OMP_NUM_THREADS=4
$ export I_MPI_FABRICS=shm:dapl
$ export KMP_AFFINITY=compact
$ mpirun -perhost 4 -n <N> ./a.out
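One way to choose the -perhost value is to divide the cores of a node by the threads per process, so that every thread gets its own core. The numbers below are assumptions for illustration: 8 cores per node as in the cpuinfo example, and a hypothetical 4-node job. The script prints the resulting mpirun line instead of running it.

```shell
#!/bin/bash
# Assumed node layout: 2 packages x 4 cores, as in the cpuinfo example.
cores_per_node=8
# Hypothetical job size.
nodes=4

# Threads per MPI process, as on the slide.
export OMP_NUM_THREADS=4

# One thread per core: MPI processes per node = cores / threads.
ranks_per_node=$((cores_per_node / OMP_NUM_THREADS))
total_ranks=$((ranks_per_node * nodes))

# Command to run next (printed only, not executed here).
echo "mpirun -perhost $ranks_per_node -n $total_ranks ./a.out"
```

With these values each node hosts 2 processes of 4 threads, which fills the 8 cores without oversubscription.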
Intel® MPI Library and MKL
• MKL creates its own threads (OpenMP, TBB, …)
• MKL from version 10.2 understands the settings of the Intel® MPI Library and doesn’t create more threads than cores
• Use OMP_NUM_THREADS and MKL_NUM_THREADS carefully
How to run a debugger
• TotalView
– mpirun -r ssh -tv -n # ./a.out
• GDB
– mpirun -r ssh -gdb -n # ./a.out
• Allinea DDT (from GUI)
• IDB
– mpirun -r ssh -idb -n # ./a.out
– You need idb available in your $PATH
– Some settings are required
Message Checker
• Local checks: isolated to a single process
– Unexpected process termination
– Buffer handling
– Request and data type management
– Parameter errors found by MPI
• Global checks: all processes
– Global checks for collectives and p2p ops
• Data type mismatches
• Corrupted data transmission
• Pending messages
• Deadlocks (hard & potential)
– Global checks for collectives – one report per operation
• Operation, size, reduction operation, root mismatch
• Parameter error
• Mismatched MPI_Comm_free()
Message Checker (cont.)
• Levels of severity:
– Warnings: application can continue
– Error: application can continue but almost certainly not as intended
– Fatal error: application must be aborted
• Some checks may find both warnings and errors
– Example: CALL_FAILED check due to invalid parameter
• Invalid parameter in MPI_Send() => msg cannot be sent => error
• Invalid parameter in MPI_Request_free() => resource leak => warning
Message Checker (cont.)
• Usage model:
– Recommended:
• -check option when running an MPI job
$ mpiexec -check -n 4 ./a.out
• Use fail-safe version in case of crash
$ mpiexec -check libVTfs.so -n 4 ./a.out
– Alternatively:
• -check_mpi option during link stage
$ mpiicc -check_mpi -g test.c -o a.out
• Configuration
– Each check can be enabled/disabled individually
• set in VT_CONFIG file, e.g. to enable local checks only:
CHECK ** OFF
CHECK LOCAL:** ON
– Change the number of warnings and errors printed and/or tolerated before abort
See lab/poisson_ITAC_dbglibs
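The configuration step can be sketched as below. The file name vt_check.conf is a hypothetical choice; the two CHECK lines are the ones from the slide, and the file is announced through the VT_CONFIG environment variable. The mpiexec line is printed, not executed.

```shell
#!/bin/bash
# Hypothetical file name for the checker configuration.
conf_file="$PWD/vt_check.conf"

# Switch all checks off, then re-enable only the local ones.
cat > "$conf_file" <<'EOF'
CHECK ** OFF
CHECK LOCAL:** ON
EOF

# Point the checking library at the configuration file.
export VT_CONFIG="$conf_file"

# Command to run next (printed only, not executed here).
echo "mpiexec -check -n 4 ./a.out"
```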
Trace Collector
• Link with trace library:
– mpiicc -trace test.c -o a.out
• Run with -trace option
– mpiexec -trace -n # ./a.out
• Using the itcpin utility
– mpirun -r ssh -n # itcpin --run -- ./a.out
– Binary instrumentation
• Use -tcollect link option
– mpiicc -tcollect test.c -o a.out
Using Trace Collector for OpenMP applications
• ITA shows only those threads that call MPI functions. A simple trick is to add an MPI call, e.g. before "#pragma omp barrier":
– { int size; MPI_Comm_size(MPI_COMM_WORLD, &size); }
• After this modification ITA will show information about OpenMP threads.
• Remember that thread support requires the thread-safe MPI Library. Don't forget to set the VT_MPI_DLL environment variable:
– $ set VT_MPI_DLL=impimt.dll (for Windows)
– $ export VT_MPI_DLL=libmpi_mt.so (for Linux)
Lightweight statistics
Use the I_MPI_STATS environment variable
– export I_MPI_STATS=# (up to 10)
– export I_MPI_STATS_SCOPE=p2p:csend
~~~~ Process 0 of 256 on node C-21-23
Data Transfers
Src --> Dst   Amount(MB)    Transfers
----------------------------------------
000 --> 000   0.000000e+00  0
000 --> 001   1.548767e-03  60
000 --> 002   1.625061e-03  60
000 --> 003   0.000000e+00  0
000 --> 004   1.777649e-03  60
…
=========================================
Totals        3.918986e+03  1209
Communication Activity
Operation     Volume(MB)    Calls
----------------------------------------
P2P
Csend         9.147644e-02  1160
Send          3.918895e+03  49
Collectives
Barrier       0.000000e+00  12
Bcast         3.051758e-05  6
Reduce        3.433228e-05  6
Allgather     2.288818e-04  30
Allreduce     4.108429e-03  97
Intel® Trace Analyzer
• Generate a trace file for Game of Life
• Investigate blocking Send using ITA
• Change code
• Look at difference
Ideal Interconnect Simulator (IIS)
Helps to identify an application's imbalance by simulating its behavior in an "ideal communication environment"
[Figure: real trace compared with ideal trace]
Imbalance diagram
[Diagram: Calculation and MPI_Allreduce phases in the ITAC trace and in the traceidealizer model; legend: load imbalance, interconnect]
Trace Analyzer - Filtering
mpitune utility
Cluster-specific tuning
• Run it once after installation and each time after cluster configuration change
• Best configuration is recorded for each combination of communication device, number of nodes, MPI ranks and process distribution model
# Collect configuration values:
$ mpitune
# Reuse recorded values:
$ mpiexec -tune -n 32 ./your_app
Application-specific tuning
• Tune any kind of MPI application by specifying its command line
• By default performance is measured as the inverse of execution time
• To reduce overall tuning time use the shortest representative application workload (if applicable)
# Collect configuration settings
$ mpitune --application \"mpiexec -n 32 ./my_app\" -of ./my_app.conf
Note: the backslashes and quotes are mandatory
# Reuse recorded values
$ mpiexec -tune ./my_app.conf -n 32 ./my_app
Stay tuned!
• Learn more online
– Intel® MPI self-help pages http://www.intel.com/go/mpi
• Ask questions and share your knowledge
– Intel® MPI Library support page http://software.intel.com/en-us/articles/intel-cluster-toolkit-support-resources/
– Intel® Software Network Forum http://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/