Getting Started with HPC on Iceberg

Michael Griffiths and Deniz Savas
Corporate Information and Computing Services
The University of Sheffield
www.sheffield.ac.uk/wrgrid

Outline
• Review of hardware and software
• Accessing
• Managing Jobs
• Building Applications
• Resources
• Getting Help

Checkpointing Jobs
• Simplest method for checkpointing
  – Ensure that applications save configurations at regular intervals so that jobs may be restarted (if necessary) using these configuration files.
• Using the BLCR checkpointing environment
  – BLCR commands
  – Using BLCR checkpoint with an SGE job

Checkpointing Jobs Using BLCR
• BLCR commands relating to checkpointing: cr_run, cr_checkpoint, cr_restart
• Set an environment variable to avoid an error:
    export LIBCR_DISABLE_NSCD=1
• Start running the code under the control of the checkpoint system:
    cr_run myexecutable [parameters]
• Find out its process ID (PID):
    ps | grep myexecutable
• Checkpoint it and write the state into a file:
    cr_checkpoint -f checkpoint.file PID
• If the executable fails, crashes, runs out of time etc., it can be restarted from the checkpoint file you specified:
    cr_restart checkpoint.file

Using BLCR checkpoint with an SGE Job
• A checkpoint environment called BLCR has been set up; it is accessible through the test queue cstest.q.
• An example checkpointing job looks something like:
    #
    #$ -l h_cpu=168:00:00
    #$ -q cstest.q
    #$ -c hh:mm:ss
    #$ -ckpt blcr
    cr_run ./executable >> output.file
• The -c hh:mm:ss option tells SGE to checkpoint at the specified time interval.
• The -c sx option tells SGE to checkpoint if the queue is suspended, or if the execution daemon is killed.

Building Applications Overview
• The operating system on iceberg provides full facilities for
  – scientific code development,
  – compilation and execution of programs.
• The development environment includes
  – debugging tools provided by the Portland test suite,
  – the Eclipse IDE.
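Returning to the SGE checkpointing job shown on the BLCR slides: rather than editing the hh:mm:ss placeholder by hand each time, the job script could be generated by a small shell function. The sketch below is only an illustration. The function name make_blcr_job and its two arguments are hypothetical, and the directives it prints are simply the ones listed on the slide.

```shell
#!/bin/bash
# Sketch only: make_blcr_job is a hypothetical helper, not an Iceberg command.
# It prints an SGE job script with the same directives as the slide's example,
# filling in the checkpoint interval and the executable to run under cr_run.
make_blcr_job() {
    local interval=$1   # checkpoint interval, hh:mm:ss
    local exe=$2        # program to start under BLCR control
    printf '%s\n' \
        '#' \
        '#$ -l h_cpu=168:00:00' \
        '#$ -q cstest.q' \
        "#\$ -c $interval" \
        '#$ -ckpt blcr' \
        "cr_run $exe >> output.file"
}

# Example: print a job script that checkpoints every hour.
make_blcr_job 01:00:00 ./executable
```

Redirecting the output to a file (for example job.sh) gives a script that can then be passed to qsub in the usual way.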
Compilers
• PGI, GNU and Intel C and Fortran compilers are installed on iceberg.
• PGI (Portland Group) compilers are available as soon as you log into a worker node.
• A suitable module command is necessary to access the Intel or GNU compilers.
• The following compiler modules are available to use with the module add (or module load) command:
    compilers/pgi
    compilers/intel
    compilers/gcc
• The Java compiler and the Python development environment can also be made available by using the following modules respectively:
    apps/java
    apps/phyton

Building Applications Compilers
• C and Fortran programs may be compiled using the PGI (Portland Group), Intel or GNU compilers. The commands that invoke these compilers are summarized in the following table:

    Language         PGI Compiler   Intel Compiler   GNU Compilers
    C                pgcc           icc              gcc
    C++              pgCC           icpc             g++
    FORTRAN 77       pgf77          ifort            g77
    FORTRAN 90/95    pgf90          ifort            gfortran

Building Applications Compilers
• All of these commands take the name of the file containing the source to be compiled as one argument, together with a list of optional parameters.
• Example:
    pgcc myhelloworld.c -o hello
• The file type (suffix) usually determines how the syntax of the source file will be treated. For example, myprogram.f will be treated as fixed-format (FTN77 style) source, whereas myprogram.f90 will be assumed to be free-format (Fortran90 style) by the compiler.
• Most compilers have a --help or -help switch that lists the available compiler options.
• The -V parameter lists the version number of the compiler you are using.

Help and documentation on compilers
• As well as the -help or --help parameters of the compiler commands, there are man (manual) pages available for these compilers on iceberg.
  For example: man pgcc, man icc, man gcc
• Full documentation provided with the PGI and Intel compilers is accessible via your browser from any platform via the page: http://www.shef.ac.uk/wrgrid/software/compilers

Building Applications A few Compiler Options

    Option         Effect
    -c             Compile, do not link.
    -o exefile     Specifies a name for the resulting executable.
    -g             Produce debugging information (no optimization).
    -Mbounds       Check arrays for out of bounds access.
    -fast          Full optimisation with function unrolling and code reordering.

Building Applications Compiler Options

    Option             Effect
    -Mvect=sse2        Turn on streaming SIMD extensions (SSE) and SSE2 instructions. SSE2 instructions operate on 64 bit floating point data.
    -Mvect=prefetch    Generate prefetch instructions.
    -tp k8-64          Specify the target processor type to be an Opteron processor running a 64 bit system.
    -g77libs           Link time option allowing object files generated by g77 to be linked into programs (n.b. may cause problems with parallel libraries).

Building Applications Sequential Fortran
• Assuming that the Fortran program source code is contained in the file mycode.f90, to compile using the Portland Group compiler type:
    pgf90 mycode.f90
• In this case the executable will be output into the file a.out. To run this code issue:
    ./a.out
  at the UNIX prompt.
• To add some optimization when using the Portland Group compiler, the -fast flag may be used. Also -o may be used to specify the name of the compiled executable, i.e.:
    pgf90 -o mycode -fast mycode.f90
• The resultant executable will have the name mycode and will have been optimized by the compiler.
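The compile lines above follow one pattern: choose the PGI command from the source file's suffix, name the executable with -o, and optionally add -fast. As a rough sketch of that pattern, the shell function below prints (but does not run) such a command line. The name pgi_compile_cmd is hypothetical, and the suffix mapping is an assumption based on the compiler table and the .f / .f90 convention described earlier.

```shell
#!/bin/bash
# Sketch only: pgi_compile_cmd is a hypothetical helper, not an Iceberg tool.
# It prints a PGI compile command chosen from the source file's suffix;
# nothing is compiled, so it can be tried on a machine without PGI installed.
pgi_compile_cmd() {
    local src=$1
    local exe=${src%.*}              # strip the suffix: mycode.f90 -> mycode
    local compiler
    case "$src" in
        *.f90|*.f95)    compiler=pgf90 ;;   # free-format Fortran
        *.f)            compiler=pgf77 ;;   # fixed-format Fortran 77
        *.c)            compiler=pgcc  ;;   # C
        *.cpp|*.cc|*.C) compiler=pgCC  ;;   # C++
        *) echo "unknown suffix: $src" >&2; return 1 ;;
    esac
    echo "$compiler -o $exe -fast $src"
}

# Example: the optimized Fortran build from the slide.
pgi_compile_cmd mycode.f90
```

On a worker node the printed line could be checked by eye and then pasted at the prompt; dropping -fast gives an unoptimized build.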
Building Applications Sequential C
• Assuming that the program source code is contained in the file mycode.c, to compile using the Portland C compiler, type:
    pgcc -o mycode mycode.c
• In this case, the executable will be output into the file mycode, which can be run by typing its name at the command prompt:
    ./mycode

Memory Issues
• Programs requiring more than 2 gigabytes of memory for their data (i.e. using very large arrays etc.) may get into difficulties due to addressing issues, when pointers cannot hold the values of these large addresses.
• It is also advisable that variables that store and use array indices have a sufficient number of bytes allocated to them. For example, it is not wise to use short int (C) or integer*2 (Fortran) for variables holding array indices. Such variables must be re-declared as long int or integer*4.
• To avoid such problems:
  – when using the PGI compilers use the option
      -mcmodel=medium
  – when using the Intel compilers use the options
      -mcmodel=medium -shared-intel

Setting other resource limits ulimit
• ulimit provides control over the resources available to processes
  – ulimit -a          report all available resource limits
  – ulimit -s XXXXX    set maximum stack size
• It is sometimes necessary to set the hard limit, e.g.
  – ulimit -sH XXXXXX

Building Applications 8: Debugging
• The Portland Group debugger is a
  – symbolic debugger for Fortran, C and C++ programs.
• It allows the control of program execution using
  – breakpoints,
  – single stepping
• and enables the state of a program to be checked by examination of
  – variables
  – and memory locations.

Building Applications 9: Debugging
• The PGDBG debugger is invoked using
  – the pgdbg command as follows:
  – pgdbg arguments program arg1 arg2 ... argn
  – arguments may be any of the pgdbg command line arguments,
  – program is the name of the target program being debugged,
  – arg1, arg2, ... argn are the arguments to the program.
• To get help from pgdbg use:
    pgdbg -help

Building Applications 10: Debugging
• PGDBG GUI
  – invoked by default using the command pgdbg.
  – Note that in order to use the debugging tools, applications must be compiled with the -g switch, which enables the generation of symbolic debugger information.

Building Applications 11: Profiling
• The PGPROF profiler enables
  – the profiling of single process, multi-process MPI or SMP OpenMP programs, or
  – programs compiled with the -Mconcur option.
• The generated profiling information enables the identification of the portions of the application that will benefit most from performance tuning.
• Profiling generally involves three stages:
  – compilation
  – execution
  – analysis (using the profiler)

Building Applications 12: Profiling
• To use profiling it is necessary to compile your program with the options indicated in the table below:

    Option          Effect
    -Mprof=func     Insert calls to produce function level pgprof output.
    -Mprof=lines    Insert calls to produce line level pgprof output.
    -Mprof=mpi      Link in the MPI profile library that intercepts MPI calls to record message sizes and count message sends and receives, e.g. -Mprof=mpi,func.
    -pg             Enable sample based profiling.

Building Applications 13: Profiling
• The PG profiler is executed using the command
    pgprof [options] [datafile]
  – datafile is a pgprof.out file generated from the program execution.

Shared Memory applications using OpenMP
• Fortran and C programs containing OpenMP compiler directives can be compiled to take advantage of parallel processing on iceberg.
• The OpenMP model of programming uses a thread model whereby a number of instances ("threads") of a program run simultaneously, communicating with each other when necessary via the memory that is shared by all threads.
• Although any given processor can run multiple threads of the same program via the operating system's multi-tasking ability, it is more efficient to allocate one thread per processor in a shared memory machine.
• On Iceberg we have the following types of compute nodes:
    2 dual-core (= 2*2 = 4 processors) AMD nodes
    2 quad-core (= 2*4 = 8 processors) AMD nodes
    2 six-core (= 2*6 = 12 processors) Intel nodes
  Therefore it is usually advisable to restrict OpenMP jobs to about 12 threads when using iceberg.

Shared Memory Applications Compiling OpenMP applications
Source code that contains OpenMP directives (!$OMP in Fortran, #pragma omp in C/C++) for parallel programming can be compiled using the following flags:
  – PGI C, C++, Fortran77 or Fortran90
      pgf77, pgf90, pgcc or pgCC -mp [other options] filename
  – Intel C/C++, Fortran
      ifort, icc or icpc -openmp [other options] filename
  – GNU C/C++, Fortran
      gcc or gfortran -fopenmp [other options] filename
Note that compiling the source code does not require working within a job using the openmp environment. Only the execution of an OpenMP parallel executable requires such an environment, which is requested with the -pe openmp flag to the qsub or qsh commands.

Shared Memory Applications Specifying Required Number of Threads
• The number of parallel execution threads at execution time is controlled by setting the environment variable OMP_NUM_THREADS to the appropriate value.
• For the bash or sh shell (which is the default shell on iceberg) use
    export OMP_NUM_THREADS=6
• If you are using the csh or tcsh shell, use
    setenv OMP_NUM_THREADS 6

Shared Memory Applications Starting an OpenMP interactive job
• Short interactive jobs that use OpenMP parallel programming are allowed. Although up to 48-way parallel jobs can theoretically be run this way, due to the high utilisation of the cluster we recommend that you do not exceed 12-way jobs. Here is an example of starting a 12-way interactive job:
    qsh -pe openmp 12
  or
    qrsh -pe openmp 12
  and in the new shell that starts type:
    export OMP_NUM_THREADS=12
  Alternatively, the effect of these two commands can be achieved via the -v parameter, e.g.
    qsh -pe openmp 12 -v OMP_NUM_THREADS=12
  The number of threads to use can later be redefined in the same job, to experiment with hyper-threading for example.
• Important note: although the number of processors required is specified with the -pe option, it is still necessary to ensure that the OMP_NUM_THREADS environment variable is set to the correct value.

Shared Memory Applications Submitting an OpenMP Job to Sun Grid Engine
• The job is submitted to a special parallel environment that ensures the job occupies the required number of slots.
• Using the SGE command qsub, the openmp parallel environment is requested with the -pe option as follows:
    qsub -pe openmp 12 -v OMP_NUM_THREADS=12 myjobfile.sh
• The following job script, job.sh, is submitted using
    qsub job.sh
  where job.sh is
    #!/bin/bash
    #$ -cwd
    #$ -pe openmp 12
    #$ -v OMP_NUM_THREADS=12
    ./executable

Parallel Programming with MPI Introduction
• Iceberg is designed with the aim of running MPI (Message Passing Interface) parallel jobs;
• the Sun Grid Engine is able to handle MPI jobs.
• In a message passing parallel program each process executes the same binary code but
  – executes a different path through the code;
  – this is SPMD (single program multiple data) execution.
• Iceberg uses
  – the openmpi-ib and mvapich2-ib implementations provided by infiniband (quadrics/connectX), using the IB fast interconnect at 32GigaBits/second.

MPI Tutorials
From an iceberg worker, execute the following command:
    tar -zxvf /usr/local/courses/intrompi.tgz
The directory which has been created contains some sample MPI applications which you may compile and run.
Set The Correct Environment for MPI
• For batch jobs the environment is normally set in the Makefile or job script
• See the script file mpienv.sh (in the intrompi directory)
• Set the correct environment by pasting the export lines below into your .bashrc file
• Or set the environment by typing
    source mpienv.sh
• The export lines:
    export MPIR_HOME="/usr/local/packages5/openmpi-gnu"
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$MPIR_HOME/lib"
    export PATH="$PATH:$MPIR_HOME/bin"

Environments for MPI on Iceberg
• Openmpi with gigabit ethernet
  – SGE PE is ompigige
  – MPIR_HOME="/usr/local/packages5/openmpi-pgi"
  – /usr/local/packages5/openmpi-pgi/bin/mpirun ./executable
• Openmpi with infiniband
  – SGE PE is openmpi-ib
  – MPIR_HOME="/usr/mpi/pgi/openmpi-1.2.8"
  – /usr/mpi/pgi/openmpi-1.2.8/bin/mpirun ./executable
• Mvapich2 with infiniband
  – SGE PE is mvapich2-ib
  – MPIR_HOME="/usr/mpi/pgi/mvapich2-1.2p1"
  – /usr/mpi/pgi/mvapich2-1.2p1/bin/mpirun_rsh -rsh -np $NSLOTS -hostfile $TMPDIR/machines ./executable
NOTE: There are environments for the GNU, PGI and Intel compilers.

Parallel Programming with MPI 2: Hello MPI World!

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank;   /* my rank in MPI_COMM_WORLD */
        int size;   /* size of MPI_COMM_WORLD */

        /* Always initialise MPI by this call before using any MPI functions. */
        MPI_Init(&argc, &argv);

        /* Find out how many processes are taking part in the computations. */
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Get the rank of the current process. */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Only the process with rank 0 prints the greeting. */
        if (rank == 0)
            printf("Hello MPI world from C!\n");

        /* Every process reports its own rank. */
        printf("There are %d processes in my world, and I have rank %d\n", size, rank);

        MPI_Finalize();
        return 0;
    }

Parallel Programming with MPI Output from Hello MPI World!
• When run on 4 processors the MPI Hello World program produces the following output:
    Hello MPI world from C!
    There are 4 processes in my world, and I have rank 2
    There are 4 processes in my world, and I have rank 0
    There are 4 processes in my world, and I have rank 3
    There are 4 processes in my world, and I have rank 1

Parallel Programming with MPI Compiling MPI Applications Using Infiniband
• To compile C, C++, Fortran77 or Fortran90 MPI code using the Portland compiler, type
    mpif77 [compiler options] filename
    mpif90 [compiler options] filename
    mpicc [compiler options] filename
    mpiCC [compiler options] filename

Parallel Programming with MPI Compiling MPI Applications Using Gigabit ethernet on X2200s
• To compile C, C++, Fortran77 or Fortran90 MPI code using the Portland compiler with OpenMPI, type
    export MPI_HOME="/usr/local/packages5/openmpi-pgi/bin"
    $MPI_HOME/mpif77 [compiler options] filename
    $MPI_HOME/mpif90 [compiler options] filename
    $MPI_HOME/mpicc [compiler options] filename
    $MPI_HOME/mpiCC [compiler options] filename

Parallel Programming with MPI Submitting an MPI Job to Sun Grid Engine
• To submit an MPI job to Sun Grid Engine,
  – use the openmpi-ib parallel environment,
  – which ensures that the job occupies the required number of slots.
• Using the SGE command qsub,
  – the openmpi-ib parallel environment is requested using the -pe option as follows:
      qsub -pe openmpi-ib 4 myjobfile.sh

Parallel Programming with MPI Sun Grid Engine MPI Job Script
• The following job script, job.sh, is submitted using
  – qsub job.sh
  – job.sh is
      #!/bin/sh
      #$ -cwd
      #$ -pe openmpi-ib 4
      # SGE_HOME to locate sge mpi execution script
      #$ -v SGE_HOME=/usr/local/sge6_2
      /usr/mpi/pgi/openmpi-1.2.8/bin/mpirun ./mpiexecutable

Parallel Programming with MPI Sun Grid Engine MPI Job Script
• Using this executable directly, the job is submitted using qsub in the same way, but the script file job.sh is
      #!/bin/sh
      #$ -cwd
      #$ -pe mvapich2-ib 4
      # MPIR_HOME from submitting environment
      #$ -v MPIR_HOME=/usr/mpi/pgi/mvapich2-1.2p1
      $MPIR_HOME/bin/mpirun_rsh -rsh -np 4 -hostfile $TMPDIR/machines ./mpiexecutable

Parallel Programming with MPI Sun Grid Engine OpenMPI Job Script
• Using this executable directly, the job is submitted using qsub in the same way, but the script file job.sh is
      #!/bin/sh
      #$ -cwd
      #$ -pe ompigige 4
      # MPIR_HOME from submitting environment
      #$ -v MPIR_HOME=/usr/local/packages5/openmpi-pgi
      $MPIR_HOME/bin/mpirun -np 4 -machinefile mpiexecutable

Parallel Programming with MPI 10: Extra Notes
• The number of slots required and the parallel environment must be specified using -pe openmpi-ib NSLOTS
• The job must be executed using the correct PGI/Intel/GNU implementation of mpirun. Note also:
  – The number of processors is specified using -np NSLOTS
  – Specify the location of the machinefile used for your parallel job; this will be located in a temporary area on the node that SGE submits the job to.

Parallel Programming with MPI 10: Pros and Cons.
• The downside to message passing codes is that they are harder to write than scalar or shared memory codes.
  – The system bus on a modern CPU can pass in excess of 4Gbits/sec between the memory and CPU.
  – A fast ethernet between PCs may only pass up to 200Mbits/sec between machines over a single ethernet cable, and
• this can be a potential bottleneck when passing data between compute nodes.
• The solution to this problem for a high performance cluster such as iceberg is to use a high performance network solution, such as the 16Gbit/sec interconnect provided by infiniband.
  – The availability of such high performance networking makes possible a scalable parallel machine.

Supported Parallel Applications on Iceberg
• Abaqus
• Fluent
• Matlab
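
To put the interconnect figures quoted on the Pros and Cons slide into perspective, the short sketch below estimates how long an idealised transfer would take at 200Mbits/sec and at 16Gbit/sec. The helper name transfer_ms and the 1 gigabyte payload are assumptions chosen purely for illustration, and the estimate ignores latency and protocol overhead.

```shell
#!/bin/bash
# Sketch only: transfer_ms is a hypothetical helper for a back-of-envelope
# estimate. It prints the milliseconds needed to move SIZE megabits over a
# link of RATE megabits per second, ignoring latency and protocol overhead.
transfer_ms() {
    local size_mbit=$1
    local rate_mbit_per_s=$2
    echo $(( size_mbit * 1000 / rate_mbit_per_s ))
}

# Assumed payload: 1 gigabyte = 8000 megabits (decimal units).
payload_mbit=8000

echo "fast ethernet, 200 Mbit/s: $(transfer_ms "$payload_mbit" 200) ms"
echo "infiniband, 16 Gbit/s:     $(transfer_ms "$payload_mbit" 16000) ms"
```

Under these assumptions the same payload takes 40000 ms over fast ethernet and 500 ms over the infiniband figure, which is the kind of gap that makes the network the bottleneck for message passing codes.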