Introduction to Parallel Computing
Presented by the UTPA Division of Information Technology
IT Support Department – Office of Faculty and Research Support

Outline
• Introduction
• Using HiPAC – logging in, transferring files
• Getting Help
• Customizing the User Environment
• Finding and Using Applications
• Building and Running Applications
• Performance
• References
• Files

Introduction
Parallel computing is a computational method in which multiple computations are carried out simultaneously. Large computational problems can be broken down into smaller ones and solved concurrently by multiple processors or by multiple computers, with the results gathered at a single collection point. In the former case multiple processors work on the same task; in the latter case multiple computers work on the same task.
Parallel computing architectures are available in many configurations, most notably multiprocessor systems and multicomputer clusters. The HiPAC cluster – thumper.utpa.edu – incorporates both architectures, using common parallel computing algorithms and mechanisms.

Introduction
Two major parallel computing mechanisms:
- SMP – Symmetric Multiprocessing works with multiprocessor/multicore systems, enabling any CPU to work on any task in memory. Tasks can be moved between processors to balance the workload. SMP directives are placed in the program source code and are handled by the compiler when the program is compiled.
- MPI – Message Passing Interface is a communications protocol that allows multiple systems to communicate with each other for parallel applications. MPI provides the functionality for multiple computers to execute the processes of a parallel application together. MPI directives placed in source code are compiled into MPI-based executables.

Introduction
thumper.utpa.edu specifications
Nodes:            3 management, 1 storage, 68 compute
Network media:    InfiniBand (internal), 1 Gb Ethernet (external)
Node counts:      management – 4, compute – 68
Node hardware:    compute, management, head – IBM x3550; storage array – IBM x3650
Cores:            816 (compute), 96 (vSMP), 36 (management), total = 852 (Intel Xeon CPUs)
Memory:           management – 12 GB, 48 GB; compute – 24 GB
Disk:             storage array, 24 TB
Operating system: Red Hat Enterprise Linux
Scheduler:        Sun Grid Engine

Introduction
Parallelism
• Task-parallel computations (loops) – code is distributed across multiple workers for execution. Task parallelism applies when independent tasks perform different operations on different parts of the data. For example, if the value produced by task y depends on task x, those two tasks cannot run concurrently.
• Data-parallel computations (arrays) – data is distributed across multiple compute nodes for execution. Data parallelism applies when independent tasks perform the same operation on different elements of the data. For example, if data set x contains odd numbers and data set y contains even numbers, the task that looks for primes can run on both sets concurrently.
• Speedup – the ratio of serial execution time to parallel execution time. All programs contain instructions that are executed sequentially and cannot be parallelized, so measured speedup is always limited by that sequential portion.

speedup = sequential execution time / parallel execution time
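As a worked example with hypothetical timings: a job that takes 240 seconds to run serially and 40 seconds when spread across 8 cores achieves a speedup of 240 / 40 = 6. The shortfall from the ideal speedup of 8 comes from the sequential portion of the program plus communication and scheduling overhead.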
Using the HiPAC Cluster
Logging into the Cluster
• Access to the cluster is provided in two ways:
- Secure Shell (ssh) access (from MS Windows, Linux, or Mac)
  a) for Windows, download the PuTTY ssh client from:
     http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
  b) for a graphical environment on Windows, download Xming and Xming-fonts from:
     http://www.straightrunning.com/XmingNotes/ (public domain releases)
  c) for secure file transfers between a Windows desktop and the cluster, download WinSCP from:
     http://winscp.net/eng/download.php
- Gompute Portal (web based; note: Java is required)
  a) in a browser, open http://thumper.thumper.utpa.edu.
  b) click the 'Gompute Portal' link.
  c) enter your username and password to log in.
  d) in the Gompute Portal window the available options are: Welcome, Live, Monitoring, Accounting, GomputeXplorer.

(Screenshots: logging in using the Gompute Portal; transferring files using the Gompute Portal – drag and drop between the two panes; logging in using ssh – PuTTY; logging in with ssh in graphics mode – PuTTY with X11 forwarding; transferring files with WinSCP – drag and drop between the two panes.)

Getting Help
• The following can be used to get help:
- man pages: man <command> ; e.g. 'man qsub'.
- man tutorial ; these slides.
• Online references:
  http://thumper.thumper.utpa.edu/
  http://utpa.edu/hipac
• UTPA Help Desk x2020 – log a service request.
• OFRS – ofrs@utpa.edu, rjack@utpa.edu, magañac@utpa.edu

Customizing the User Environment
• Users can customize their environment with aliases and other settings added to the .bashrc file:
- alias rec='ls -alt | less' ; display the most recently modified files at the top of the directory listing.
- alias vf='cd' ; no more mistyping the change-directory command.
- alias cpu='grep -i --color "model name" /proc/cpuinfo' ; check the CPUs.
- alias loacte='locate' ; locate files, directories, etc.
- alias moda='module avail' ; list available modules.
- alias ml='module load' ; load a module, e.g. "ml jaguar/2012".
- alias list='module list' ; list loaded modules.
- alias mu='module unload' ; unload a module.

(Screenshot: an example .bashrc file.)

Finding and Using Applications
• Use command-line commands to find and use applications:
- which <command> ; the command must be in the PATH.
- echo $PATH ; display the command search path.
- locate somefile ; locate a file, directory, or application.
- whereis <command> ; e.g. 'whereis locate'.
- file <somefile> ; what type of file is it? e.g. 'file namd2'.
- info <somefile> ; get information about a file.
- module avail ; list the available modules.
- module load <modulename> ; load an execution environment.
- module list ; list loaded modules.
- module unload <modulename> ; unload a module.
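For example, a typical session for finding and loading an application might look like the following (a minimal sketch; 'namd' is a hypothetical module name here – use one shown by 'module avail'; namd2 is the executable name used as an example above):

which namd2          # not found until the application's module is loaded
module avail         # list the modules installed on the cluster
module load namd     # hypothetical module name - pick a real one from 'module avail'
module list          # confirm the module is now loaded
which namd2          # the executable should now be found in the PATH
module unload namd   # remove it from the environment when finished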
Finding and Using Applications
• The scheduler
- So that jobs are handled effectively and efficiently, the cluster uses the Sun Grid Engine (SGE) scheduler. This manager queues jobs, assigns compute nodes for execution, and runs jobs in an orderly manner so that all users get fair usage of cluster resources. SGE is managed by users from the Linux command line. Three queues are available: all.q, matlab.q, and vsmp.q. All jobs must be submitted through the scheduler!
- qsub somescript ; submit a job. The job runs when resources are available.
- qrsh ; run an interactive job.
- qstat -f ; display scheduled jobs.
- qdel jobid ; delete your job from the queue.
- qacct -j jobid ; detailed display of a finished job.
- qconf -sql ; show available queues.
- qconf -spl ; show available parallel environments.

Finding and Using Applications
• Now that we know how to find applications and how the scheduler works, let's run an application.
SGE submission script – myscript
#!/bin/bash
#$ -N testjob
#$ -M rjack@utpa.edu
#$ -o /work/rjack/testjob.$$.out
#$ -e /work/rjack/testjob.$$.err
#$ -S /bin/bash
#$ -V
#$ -q all.q
#$ -cwd
# change the value on the next line to add processors
#$ -pe orte 5
##module load mpi
mpirun -np 5 --mca btl openib,self,sm --mca btl_tcp_if_exclude lo,eth1,eth0 ./hello1

Finding and Using Applications
The SGE scheduler, continued
• Run the job
- qsub myscript
• Check the submission
- qstat -f
• Check the results file
- ls -lat testjob.o* ; the most recent run will be the top file listed.
- less testjob.oXXXXXX ; list the contents of the file.
• Check run specifics
- qacct -j XXXXXX | less ; for performance, pay attention to ru_utime – user instruction execution time – and ru_stime – OS process execution time.

Building and Running Applications
• Building the hello_mpi MPI application.
/* hello.mpi.c */
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int myrank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Processor %d of %d: Hello World!\n", myrank, size);
    MPI_Finalize();
    return 0;
}

Building and Running Applications
• Compile the MPI source code – hello.mpi.c
1) module list ; check that the mpi module is loaded.
2) mpicc -o hello1 hello.mpi.c ; the compilation step.
3) ls -l hello1 ; check that the file was built and is executable.
4) build the SGE submission script and add hello1 to the mpirun line:
   mpirun -np 5 --mca btl openib,self,sm --mca btl_tcp_if_exclude lo,eth1,eth0 ./hello1

Building and Running Applications
SGE submission script – myscript (MPI job submission script)
#!/bin/bash
#$ -N testjob
#$ -M rjack@utpa.edu
#$ -o testjob.$$.out
#$ -e testjob.$$.err
#$ -S /bin/bash
#$ -V
#$ -q all.q
#$ -cwd
# parallel environment and run command for MPI jobs
#$ -pe orte 5
##module load mpi
mpirun -np 5 --mca btl openib,self,sm --mca btl_tcp_if_exclude lo,eth1,eth0 ./hello1

Building and Running Applications
SGE submission script – myscript (MPI job submission script)
- Submit myscript for execution
  qsub myscript
- Check the submission
  qstat -f
- Examine the output and error files for results and errors
  less testjob.XXXXX.out
  less testjob.XXXXX.err
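The script above hard-codes the processor count in two places (the '#$ -pe orte 5' line and 'mpirun -np 5'). A common variant is to let mpirun read the number of granted slots from the NSLOTS environment variable, which SGE sets for every batch job, so the count only needs to be edited on the -pe line. This is a sketch based on the script above, not a different site-supplied script:

#!/bin/bash
#$ -N testjob
#$ -M rjack@utpa.edu
#$ -o testjob.$$.out
#$ -e testjob.$$.err
#$ -S /bin/bash
#$ -V
#$ -q all.q
#$ -cwd
# change only this line to add or remove processors
#$ -pe orte 8
##module load mpi
# NSLOTS is set by SGE to the number of slots granted by the -pe request above
mpirun -np $NSLOTS --mca btl openib,self,sm --mca btl_tcp_if_exclude lo,eth1,eth0 ./hello1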
Building and Running Applications
• Building the hello_smp vSMP application.
/* hello.smp.c */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
    int nthreads, tid;

    /* Fork a team of threads, giving each its own copies of the variables */
    #pragma omp parallel private(nthreads, tid)
    {
        /* Obtain the thread number */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* Only the master thread does this */
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join the master thread and disband */

    return 0;
}

Building and Running Applications
• Compile the OpenMP (vSMP) source code – hello.smp.c
1) gcc -fopenmp hello.smp.c -o hello_smp ; compile the source into an executable.
2) ls -l hello_smp ; check that the executable was built.
3) ./hello_smp ; run the executable to make sure it works.

Building and Running Applications
SGE submission script – myscript2 (vSMP job submission script)
#!/bin/bash
#$ -N testjob
#$ -M rjack@utpa.edu
#$ -o testjob.$$.out
#$ -e testjob.$$.err
#$ -S /bin/bash
#$ -V
# use the vsmp queue for vSMP jobs
#$ -q vsmp.q
#$ -cwd
# select the number of processes; 5 here to match OMP_NUM_THREADS below
#$ -pe smp 5
##module load mpi
# run command for the vSMP job
export OMP_NUM_THREADS=5; ./hello_smp

Building and Running Applications
SGE submission script – myscript2 (vSMP job submission script)
- Submit myscript2 for execution
  qsub myscript2
- Check the submission
  qstat -f
- Examine the output and error files for results and errors
  less testjob.XXXXX.out
  less testjob.XXXXX.err

Performance
- Use qacct -j <jobid> on completed jobs and compare ru_utime (user CPU time) and ru_stime (system CPU time) across runs with different processor counts to see how well a job scales.

Conclusion
The computational capability of the HiPAC cluster is equivalent to that of roughly 2,500 desktop PCs. For today's presentation a trivial application was demonstrated; currently, users are running MATLAB simulations of economic indicators and molecular simulations of water/Nafion systems that use much more of the system's computational power. HiPAC takes jobs that would normally take months to run and in some cases reduces that time to weeks, days, or even hours. That said, some programs will not achieve maximum speedup and may see only minimal gains; the commercial programs that run on the cluster are custom built to take full advantage of the distributed environment.
For accounts, setup, and support regarding HiPAC and parallel processing please contact Robert Jackson at 665-2455 or Carlos Magaña at 665-7263.

References
• http://thumper.thumper.utpa.edu/
• http://utpa.edu/hipac
• rjack@utpa.edu
• magañac@utpa.edu

Q&A