Almustapha Lawal & Lucy Ternent
CH925 - Numerical Methods

GPU Computing in MATLAB

What is GPU Computing?

The central processing unit (CPU) of a computer is the hardware that performs the arithmetic, logical and input/output operations specified by a computer program. It consists of a few cores with a large amount of cache memory, optimized for executing operations sequentially. A graphics processing unit (GPU), by contrast, consists of hundreds of smaller cores (integer and floating-point processors) that can perform many operations simultaneously. GPUs were traditionally used for graphics rendering in computer games, but they are now also used for scientific computing, where their parallel processing capabilities can be exploited to accelerate computations. Compute-intensive portions of the code, together with their data, are transferred to the GPU over the PCI Express (Peripheral Component Interconnect Express) bus and executed in parallel, while the remaining, less computationally intensive code runs on the CPU.

When to use GPU Computing

• Massively parallel computations: vectorized MATLAB calculations on large arrays can be broken down into independent units of work, each performed on one of the GPU's cores, thus exploiting the parallel nature of the GPU.
• Computationally intensive computations: before a computation can be performed on the GPU, its data must first be transferred from the CPU, and this takes time. A GPU is therefore suitable only when the time spent on computation significantly exceeds the time spent transferring data between the CPU and the GPU; the achievable acceleration is limited by the amount of data transfer involved.

If the code does not fit these criteria, using a GPU may actually slow the computation down.

GPU in MATLAB

GPU functionality in MATLAB requires the Parallel Computing Toolbox, a supported NVIDIA GPU and the appropriate CUDA drivers. The GPU currently in use, and its properties, can be displayed with the command gpuDevice.
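One way to apply the transfer-versus-computation criterion above is to time the two separately. The sketch below checks whether a supported GPU is available, prints its name, and then uses gputimeit to time the host-to-GPU copy of a matrix separately from an FFT on data already on the GPU. It assumes MATLAB R2019b or later (for canUseGPU) and the Parallel Computing Toolbox (gpuDevice, gputimeit); the 3000 x 3000 size is an arbitrary choice, and the measured times depend entirely on the hardware.

```matlab
% Sketch: check for a usable GPU, then compare the cost of moving data to
% the GPU with the cost of computing on it.
% Assumes MATLAB R2019b+ (canUseGPU) and the Parallel Computing Toolbox.
if canUseGPU()
    d = gpuDevice;                        % query/select the current GPU
    fprintf('Using GPU: %s\n', d.Name);

    A  = rand(3000, 3000);                % data created on the CPU
    Ag = gpuArray(A);                     % copy kept on the GPU for the compute timing

    % gputimeit runs each function several times and waits for the GPU
    % to finish, so asynchronous execution does not distort the timings.
    tTransfer = gputimeit(@() gpuArray(A));   % CPU -> GPU copy only
    tCompute  = gputimeit(@() fft(Ag));       % FFT on data already on the GPU

    fprintf('Transfer: %.4f s, FFT on GPU: %.4f s\n', tTransfer, tCompute);
else
    disp('No supported GPU available; run the computation on the CPU.');
end
```

If tTransfer is comparable to or larger than tCompute, the computation is unlikely to benefit from being moved to the GPU on its own.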
More than 100 built-in MATLAB functions can run directly on the GPU: the data are transferred to the GPU by converting them to a gpuArray, which is then passed as an input argument to these functions, and the results are transferred back to the CPU with the gather command. To list the functions that accept gpuArray inputs, use the command methods('gpuArray').

Examples

The first example shows how to transfer arrays to and from the GPU and how to compute the fast Fourier transform (FFT) on the GPU, timing the CPU computation, the GPU computation alone, and the GPU computation including the transfers.

Listing 1: MATLAB code

    A = rand(3000, 3000);   % create a random 3000 x 3000 matrix

    % Time CPU
    tic;                    % start timer
    B  = fft(A);            % fast Fourier transform on the CPU
    B1 = ifft(B);           % inverse transform on the CPU
    timeCPU = toc           % stop timer

    % Time GPU (without transfer)
    C = gpuArray(A);        % transfer matrix to the GPU
    tic;                    % start timer
    D  = fft(C);            % fast Fourier transform on the GPU
    D1 = ifft(D);           % inverse transform on the GPU
    wait(gpuDevice);        % GPU calls are asynchronous: wait before stopping the timer
    timeGPU = toc

    % Time GPU (including transfers)
    tic;                    % start timer
    E  = gpuArray(A);       % transfer matrix to the GPU
    F  = fft(E);            % fast Fourier transform on the GPU
    E1 = ifft(F);           % inverse transform on the GPU
    G  = gather(E1);        % transfer result back to the CPU (waits for the GPU)
    timeTOTAL = toc

The second example shows how using the GPU changes the computation time of matrix multiplication as the size of the matrices increases.
Listing 2: MATLAB code

    sizes = power(2, 12:2:24);          % number of matrix elements (n^2)
    N = sqrt(sizes);                    % matrix dimensions (n x n)
    TimeCPU = zeros(size(sizes));       % initialize time vectors
    TimeGPU = zeros(size(sizes));
    TimeTot = zeros(size(sizes));
    GPU = gpuDevice;                    % handle to the current GPU, used by wait

    for ii = 1:numel(sizes)
        % create random n x n matrices
        A = rand(N(ii), N(ii));
        B = rand(N(ii), N(ii));

        % start timer and perform CPU matrix multiplication
        timer1 = tic();
        C = A*B;
        TimeCPU(ii) = toc(timer1);

        % start timers and perform GPU matrix multiplication, timing both
        % the total time (including transfers) and the GPU computation only
        timer3 = tic();
        D = gpuArray(A);
        E = gpuArray(B);
        timer2 = tic();
        F = D*E;
        wait(GPU);                      % GPU calls are asynchronous: wait for completion
        TimeGPU(ii) = toc(timer2);
        F = gather(F);                  % transfer the result back to the CPU
        TimeTot(ii) = toc(timer3);
    end

    plot(sizes, TimeCPU, 'b.-', sizes, TimeGPU, 'r.-', sizes, TimeTot, 'g.-')   % plot results
    grid on
    legend('CPU', 'GPU', 'Total Time')
    xlabel('Number of matrix elements (n^2)');
    ylabel('Time (seconds)');

Optimizing Code for GPU Computations

To optimize the performance of GPU computations, the following guidelines should be followed where possible:
• Minimize the number of transfers between the GPU and the CPU
  – Do as much as possible while the data are on the GPU
  – Only transfer data back to the CPU when necessary
• Create the data directly on the GPU
• Don't use the GPU when it is not beneficial (e.g. when the transfer time exceeds the computation time).
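The guidelines above can be sketched as follows: the matrices are created directly on the GPU using the 'gpuArray' option of rand, several operations are chained without leaving the GPU, and only a single scalar is gathered at the end. This is a minimal illustration assuming the Parallel Computing Toolbox and a supported GPU; the particular operations (a product, an element-wise exponential and a sum) and the size n = 4096 are arbitrary choices made for the example.

```matlab
% Sketch of the optimization guidelines: create data on the GPU, keep
% intermediate results there, and gather only the final answer.
% Assumes the Parallel Computing Toolbox and a supported GPU.
n = 4096;

% Data created directly on the GPU: no CPU -> GPU copy is needed.
X = rand(n, n, 'gpuArray');
Y = rand(n, n, 'gpuArray');

% Several operations in a row, all executed on the GPU.
Z = X * Y;          % matrix multiplication (gpuArray)
Z = exp(-Z / n);    % element-wise operations (gpuArray)
s = sum(Z(:));      % reduction; s is still a gpuArray

% A single transfer back to the CPU, of one scalar rather than an n x n matrix.
result = gather(s);
fprintf('Result: %g\n', result);
```

Compared with Listing 2, no time is spent copying the input matrices to the GPU, and the only GPU-to-CPU transfer is the final scalar.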