Intel® Cluster Tools: Introduction and Hands-on Sessions
MSU Summer School "Intel Cluster Software and Technologies"
Software & Services Group
July 8, 2010, MSU, Moscow

Agenda
– Intel Cluster Tools settings and configuration
– Intel MPI fabrics
– Message Checker
– ITAC introduction
– ITAC practice

Setup configuration
• source /opt/intel/cc/11.0.74/bin/iccvars.sh intel64
• source /opt/intel/fc/11.0.74/bin/ifortvars.sh intel64
• source /opt/intel/impi/4.0.0.25/bin64/mpivars.sh
• source /opt/intel/itac/8.0.0.011/bin/itacvars.sh impi4

Check configuration
• which icc
• which ifort
• which mpiexec
• which traceanalyzer
• echo $LD_LIBRARY_PATH
• set | grep I_MPI
• set | grep VT_

Compile your first MPI application
• Using the Intel compilers: mpiicc, mpiicpc, mpiifort, ...
• Using the GNU compilers: mpicc, mpicxx, mpif77, ...
• mpiicc -o hello_c test.c (a minimal test.c sketch is shown below)
• mpiifort -o hello_f test.f

Create mpd.hosts file
• Create an mpd.hosts file in the working directory with the list of available nodes
Create the mpd ring
• mpdboot -r ssh -n #nodes
Check the mpd ring
• mpdtrace

Start your first application
• mpiexec -n 16 ./hello_c
• mpiexec -n 16 ./hello_f
Kill the mpd ring
• mpdallexit
• mpdcleanup -a
Start your first application with mpirun (it boots the mpd ring, runs the job, and tears the ring down in one step)
• mpirun -r ssh -n 16 ./hello_c
• mpirun -r ssh -n 16 ./hello_f
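For reference, a minimal MPI program in the spirit of the hello_c example above. This is only an illustrative sketch; the actual lab source test.c may differ.

    /* hello.c - each rank reports its rank, the job size, and its host.
       Build: mpiicc -o hello_c hello.c
       Run:   mpiexec -n 16 ./hello_c                                    */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);

        printf("Hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }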
Alternative process manager
• Use mpiexec.hydra for better scalability
• All options are the same

OFED & DAPL
• OFED: OpenFabrics Enterprise Distribution, http://openfabrics.org/
• DAPL: Direct Access Programming Library, http://www.openfabrics.org/downloads/dapl/
• Check /etc/dat.conf
• Set I_MPI_DAPL_PROVIDER=OpenIB-mlx4_0-2

Fabrics selection

I_MPI_DEVICE   I_MPI_FABRICS   Description
sock           tcp             TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*)
shm            shm             Shared memory only
ssm            shm:tcp         Shared memory + TCP/IP
rdma           dapl            DAPL-capable network fabrics, such as InfiniBand*, iWarp*, Dolphin*, and XPMEM* (through DAPL*)
rdssm          shm:dapl        Shared memory + DAPL + sockets
               ofa             OFA-capable network fabrics, including InfiniBand* (through native OFED* verbs)
               shm:ofa         OFA-capable network fabric with shared memory for intra-node communication
               tmi             TMI-capable network fabrics, including Qlogic* and Myrinet* (through the Tag Matching Interface)
               shm:tmi         TMI-capable network fabric with shared memory for intra-node communication

Fabrics selection (cont.)
• Use I_MPI_FABRICS to set the desired fabric
  – export I_MPI_FABRICS=shm:tcp
  – mpirun -r ssh -n <N> -env I_MPI_FABRICS shm:tcp ./a.out
• DAPL varieties:
  – export I_MPI_FABRICS=shm:dapl
  – export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1
  – export I_MPI_DAPL_UD=enable
    • Connectionless communication
    • Better scalability
    • Less memory required

Fabrics selection (cont.)
• OFA fabric
  – export I_MPI_FABRICS=shm:ofa
• Multi-rail feature
  – export I_MPI_OFA_NUM_ADAPTERS=<n>
  – export I_MPI_OFA_NUM_PORTS=<n>
• For OFA devices the Intel® MPI Library recognizes certain hardware events: it can stop using a failed rail and restore the connection once that rail is healthy again

How to get information from the Intel MPI Library
• Use the I_MPI_DEBUG environment variable
  – Use a number from 2 to 1001 for different levels of detail
  – Level 2 shows the data transfer mode
  – Level 4 shows pinning information

cpuinfo utility
• Use this utility to get information about the processors in your system

Intel(R) Xeon(R) Processor (Intel64 Harpertown)
=====  Processor composition  =====
Processors (CPUs)  : 8
Packages (sockets) : 2
Cores per package  : 4
Threads per core   : 1
=====  Processor identification  =====
Processor  Thread Id.  Core Id.  Package Id.
0          0           0         0
1          0           0         1
2          0           1         0
3          0           1         1
4          0           2         0
5          0           2         1
6          0           3         0
7          0           3         1
=====  Placement on packages  =====
Package Id.  Core Id.  Processors
0            0,1,2,3   0,2,4,6
1            0,1,2,3   1,3,5,7
=====  Cache sharing  =====
Cache  Size   Processors
L1     32 KB  no sharing
L2     6 MB   (0,2)(1,3)(4,6)(5,7)

Pinning
• One can change the default pinning settings
  – export I_MPI_PIN=on/off
  – export I_MPI_PIN_DOMAIN=cache2 (for hybrid applications)
  – export I_MPI_PROCESSOR_LIST=allcores
  – export I_MPI_PROCESSOR_LIST=shift=socket

OpenMP and hybrid applications
• Check the command line used to build the application
  – Use the thread-safe version of the Intel® MPI Library (the -mt_mpi option)
  – Use libraries with SMP parallelization (e.g. the parallel MKL)
  – Use the -openmp compiler option to enable OpenMP* directives
    $ mpiicc -openmp -o ./your_app
• Set the execution environment for hybrid applications
  – Set OMP_NUM_THREADS to the number of threads
  – Use the -perhost option to control process placement
    $ export OMP_NUM_THREADS=4
    $ export I_MPI_FABRICS=shm:dapl
    $ export KMP_AFFINITY=compact
    $ mpirun -perhost 4 -n <N> ./a.out

Intel® MPI Library and MKL
• MKL creates its own threads (OpenMP, TBB, ...)
• Starting with version 10.2, MKL understands the Intel® MPI Library settings and does not create more threads than there are available cores
• Use OMP_NUM_THREADS and MKL_NUM_THREADS carefully

How to run a debugger
• TotalView
  – mpirun -r ssh -tv -n # ./a.out
• GDB
  – mpirun -r ssh -gdb -n # ./a.out
• Allinea DDT (launched from its GUI)
• IDB
  – mpirun -r ssh -idb -n # ./a.out
  – idb must be available in your $PATH
  – Some additional settings are required

Message Checker
• Local checks: isolated to a single process
  – Unexpected process termination
  – Buffer handling
  – Request and data type management
  – Parameter errors found by MPI
• Global checks: all processes
  – Global checks for collectives and point-to-point operations
    • Data type mismatches
    • Corrupted data transmission
    • Pending messages
    • Deadlocks, hard & potential (see the example below)
  – Global checks for collectives: one report per operation
    • Operation, size, reduction operation, or root mismatch
    • Parameter errors
    • Mismatched MPI_Comm_free()
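As an illustration of the deadlock checks (this example is not part of the lab sources), consider two ranks that each post a blocking send before the matching receive. With small messages the run completes only because the MPI library buffers the data internally, which Message Checker can flag as a potential deadlock; with large messages both sends block and the pattern becomes a real deadlock. Run it under the Message Checker as described on the following slides.

    /* deadlock.c - illustrative send/recv ordering bug for exactly 2 ranks.
       Both ranks call MPI_Send first; completion depends on internal
       buffering for small COUNT and the program hangs for large COUNT.
       Build: mpiicc -g deadlock.c -o deadlock
       Run:   mpiexec -check -n 2 ./deadlock                              */
    #include <mpi.h>
    #include <stdio.h>

    #define COUNT 10   /* increase to ~1000000 to turn this into a hard deadlock */

    int main(int argc, char **argv)
    {
        int rank, peer;
        static double sendbuf[COUNT], recvbuf[COUNT];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;   /* assumes the job was started with -n 2 */

        /* Send-before-receive on both sides: a send/send - recv/recv cycle */
        MPI_Send(sendbuf, COUNT, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, COUNT, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        printf("rank %d received %d doubles\n", rank, COUNT);
        MPI_Finalize();
        return 0;
    }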
Message Checker (cont.)
• Levels of severity:
  – Warning: the application can continue
  – Error: the application can continue, but almost certainly not as intended
  – Fatal error: the application must be aborted
• Some checks may produce both warnings and errors
  – Example: the CALL_FAILED check due to an invalid parameter
    • Invalid parameter in MPI_Send() => the message cannot be sent => error
    • Invalid parameter in MPI_Request_free() => resource leak => warning

Message Checker (cont.)
• Usage model:
  – Recommended: the -check option when running an MPI job
    $ mpiexec -check -n 4 ./a.out
    Use the fail-safe version in case of a crash:
    $ mpiexec -check libVTfs.so -n 4 ./a.out
  – Alternatively: the -check_mpi option at the link stage
    $ mpiicc -check_mpi -g test.c -o a.out
• Configuration
  – Each check can be enabled/disabled individually in the VT_CONFIG file, e.g. to enable local checks only:
    CHECK ** OFF
    CHECK LOCAL:** ON
  – The number of warnings and errors printed and/or tolerated before abort can be changed
• See lab/poisson_ITAC_dbglibs

Trace Collector
• Link with the trace library:
  – mpiicc -trace test.c -o a.out
• Run with the -trace option:
  – mpiexec -trace -n # ./a.out
• Use the itcpin utility (binary instrumentation):
  – mpirun -r ssh -n # itcpin --run -- ./a.out
• Use the -tcollect link option:
  – mpiicc -tcollect test.c -o a.out

Using the Trace Collector with OpenMP applications
• ITA can show only those threads that call MPI functions. There is a very simple trick: before "#pragma omp barrier", for example, add an MPI call (see the sketch below):
  – { int size; MPI_Comm_size(MPI_COMM_WORLD, &size); }
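In context the trick might look like the following sketch. This is illustrative only; the surrounding code and the exact build line are assumptions, not part of the lab sources.

    /* hybrid.c - make OpenMP threads visible in the Intel Trace Analyzer.
       Possible build line (thread-safe MPI, OpenMP, tracing):
       mpiicc -mt_mpi -openmp -trace hybrid.c -o hybrid                    */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        #pragma omp parallel
        {
            /* ... per-thread computation ... */

            /* The trick: a cheap MPI call from every thread, so that ITA
               records the thread before it reaches the OpenMP barrier. */
            { int size; MPI_Comm_size(MPI_COMM_WORLD, &size); }

            #pragma omp barrier

            /* ... next computation phase ... */
        }

        MPI_Finalize();
        return 0;
    }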
• After such a modification ITA will show information about the OpenMP threads.
• Please remember that thread support requires the thread-safe MPI library. Don't forget to set the VT_MPI_DLL environment variable:
  – $ set VT_MPI_DLL=impimt.dll (for Windows)
  – $ export VT_MPI_DLL=libmpi_mt.so (for Linux)

Lightweight statistics
• Use the I_MPI_STATS environment variable
  – export I_MPI_STATS=# (a number up to 10)
  – export I_MPI_STATS_SCOPE=p2p:csend
• Sample output (fragment):

Process 0 of 256 on node C-21-23

Data Transfers
Src --> Dst     Amount(MB)      Transfers
-----------------------------------------
000 --> 000     0.000000e+00    0
000 --> 001     1.548767e-03    60
000 --> 002     1.625061e-03    60
000 --> 003     0.000000e+00    0
000 --> 004     1.777649e-03    60
...
=========================================
Totals          3.918986e+03    1209

Communication Activity
Operation       Volume(MB)      Calls
-----------------------------------------
P2P
Csend           9.147644e-02    1160
Send            3.918895e+03    49
Collectives
Barrier         0.000000e+00    12
Bcast           3.051758e-05    6
Reduce          3.433228e-05    6
Allgather       2.288818e-04    30
Allreduce       4.108429e-03    97

Intel® Trace Analyzer
• Generate a trace file for the Game of Life application
• Investigate the blocking Send using ITA
• Change the code
• Look at the difference

Ideal Interconnect Simulator (IIS)
• Helps to identify an application's load imbalance by simulating its behavior in an "ideal communication environment"
• [Screenshots: real trace vs. ideal trace]

Imbalance diagram
• [Diagram: the real trace recorded by ITAC (alternating Calculation and MPI_Allreduce phases) and the ideal trace produced by the trace idealizer, with the difference broken down into load imbalance and interconnect time]

Trace Analyzer: filtering
• [Screenshot: the Trace Analyzer filtering dialog]

mpitune utility
• Cluster-specific tuning
  – Run it once after installation and each time the cluster configuration changes
  – The best configuration is recorded for each combination of communication device, number of nodes, MPI ranks, and process distribution model
    # Collect configuration values:
    $ mpitune
    # Reuse the recorded values:
    $ mpiexec -tune -n 32 ./your_app
• Application-specific tuning
  – Tune any kind of MPI application by specifying its command line
  – By default, performance is measured as the inverse of the execution time
  – To reduce the overall tuning time, use the shortest representative application workload (if applicable)
    # Collect configuration settings:
    $ mpitune --application \"mpiexec -n 32 ./my_app\" -of ./my_app.conf
    Note: the backslashes and quotes are mandatory
    # Reuse the recorded values:
    $ mpiexec -tune ./my_app.conf -n 32 ./my_app

Stay tuned!
• Learn more online
  – Intel® MPI self-help pages: http://www.intel.com/go/mpi
• Ask questions and share your knowledge
  – Intel® MPI Library support page: http://software.intel.com/en-us/articles/intel-cluster-toolkit-support-resources/
  – Intel® Software Network Forum: http://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/