Almustapha Lawal & Lucy Ternent
CH925 - Numerical Methods
GPU Computing in MATLAB
What is GPU Computing?
The central processing unit (CPU) of a computer is the hardware responsible for performing the arithmetic, logical and input/output operations instructed by a computer program. It is composed of a few cores with a large amount of cache memory, designed to execute a small number of instruction streams quickly and largely sequentially.
A graphics processing unit (GPU), by contrast, is composed of hundreds (in modern devices, thousands) of smaller integer and floating-point cores, which can perform many operations simultaneously. Traditionally used for graphics rendering in computer games, GPUs are now also used for scientific computing, where their parallel processing capabilities can be exploited to accelerate computations. Compute-intensive portions of the code are transferred to the GPU via the PCI Express (Peripheral Component Interconnect Express) bus to be performed in parallel, while the remainder of the less computationally intensive code is run on the CPU.
When to use GPU Computing
Massively parallel computations: Vectorized MATLAB calculations on large arrays can be broken down into independent units of work, each performed on one of the GPU's cores, thus exploiting the parallel nature of the GPU.
Computationally intensive: Before computations can be performed on the GPU, the data first needs to be transferred from the CPU. This transfer takes time, so the GPU is only suitable when the time spent on computation significantly exceeds the time spent transferring data between the CPU and the GPU. The achievable acceleration is therefore limited by the amount of data transfer involved.
However, if the code does not meet these criteria, using a GPU may actually slow the computations down.
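To make the transfer-versus-computation trade-off concrete, the following sketch times a CPU-to-GPU copy of a matrix against two operations on the copied data: an element-wise addition, whose work grows only with the number of elements, and a matrix multiplication, whose work grows much faster with the matrix size. It assumes the Parallel Computing Toolbox and a supported GPU are available. The matrix size n = 2000 is an arbitrary illustrative choice, and gputimeit (a Parallel Computing Toolbox function that waits for the GPU to finish before stopping its timer) is not used in the original notes. No particular timings are implied; they depend entirely on the hardware.

```matlab
% Sketch: compare the cost of moving data to the GPU with the cost of
% two computations on that data. Assumes Parallel Computing Toolbox and
% a supported GPU; n is an arbitrary illustrative size.
n = 2000;                                  % matrix dimension
A = rand(n);                               % n x n matrix created on the CPU
G = gpuArray(A);                           % copy kept on the GPU for the compute timings

tTransfer = gputimeit(@() gpuArray(A));    % CPU -> GPU copy
tCheap    = gputimeit(@() G + 1);          % element-wise work, O(n^2)
tHeavy    = gputimeit(@() G * G);          % matrix multiplication, O(n^3)

fprintf('Transfer to GPU:        %.4f s\n', tTransfer);
fprintf('Element-wise addition:  %.4f s\n', tCheap);
fprintf('Matrix multiplication:  %.4f s\n', tHeavy);
```

On typical hardware one would expect the element-wise addition to take less time than the transfer itself, so on its own it is unlikely to benefit from the GPU, whereas the matrix multiplication has a better chance of outweighing the transfer cost.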
GPU in MATLAB
GPU functionality in MATLAB requires the Parallel Computing Toolbox, a CUDA-enabled NVIDIA GPU and the correct CUDA drivers. The GPU currently in use can be queried with the command gpuDevice. More than 100 built-in MATLAB functions can run directly on the GPU when they are given an input argument of type gpuArray (created by transferring an array to the GPU with the gpuArray command); the results stay on the GPU and are transferred back to the CPU using the gather command. To see which functions can be used on the GPU, use the command methods('gpuArray').
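A minimal sketch of the workflow just described, assuming the Parallel Computing Toolbox and a supported GPU: query the device with gpuDevice, move an array to the GPU with gpuArray, apply built-in functions to it, and copy the result back with gather. The variable names and the test computation (sin^2 + cos^2, which should equal 1 everywhere) are illustrative only.

```matlab
% Sketch of the basic gpuArray round trip (illustrative variable names).
gpu = gpuDevice;                           % query the currently selected GPU
fprintf('Using %s (%.1f GB memory)\n', gpu.Name, gpu.TotalMemory / 1e9);

x  = linspace(0, 2*pi, 1e6);               % ordinary CPU array
xg = gpuArray(x);                          % transfer the array to the GPU
yg = sin(xg).^2 + cos(xg).^2;              % built-in functions run on the GPU
disp(class(yg))                            % result is still a gpuArray

y = gather(yg);                            % transfer the result back to the CPU
disp(class(y))                             % now an ordinary double array
fprintf('Max deviation from 1: %g\n', max(abs(y - 1)));

m = methods('gpuArray');                   % cell array of functions accepting gpuArray
fprintf('%d gpuArray methods available\n', numel(m));
```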
Examples
The following example shows how to transfer arrays to and from the GPU in MATLAB and how to run the fast Fourier transform (FFT) on the GPU, timing each approach.
Listing 1: MATLAB code
% Time CPU
A = rand(3000, 3000);      % create a random 3000 x 3000 matrix
tic;                       % start timer
B = fft(A);                % perform fast Fourier transform on the matrix
B1 = ifft(B);              % inverse transform
timeCPU = toc              % end timer
% Time GPU (without transfer)
gpu = gpuDevice;           % handle to the current GPU, used to synchronize timing
C = gpuArray(A);           % transfer matrix to GPU
tic;                       % start timer
D = fft(C);                % perform fast Fourier transform on the GPU
D1 = ifft(D);              % inverse transform on the GPU
wait(gpu);                 % GPU calls run asynchronously: wait for them to finish
timeGPU = toc
% Time GPU (including transfers)
tic;                       % start timer
E = gpuArray(A);           % transfer matrix to GPU
F = fft(E);                % forward transform on the GPU
F1 = ifft(F);              % inverse transform on the GPU
G = gather(F1);            % transfer result back to CPU (gather waits for the GPU)
timeTOTAL = toc
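A single tic/toc measurement, as in Listing 1, can be distorted by one-off start-up costs and by the asynchronous execution of GPU calls. As an alternative sketch (not part of the original listing), MATLAB's timeit function and the Parallel Computing Toolbox function gputimeit run a function handle several times and return a typical execution time, with gputimeit also waiting for the GPU to finish. The function-handle names cpuFcn, gpuFcn and roundTripFcn are hypothetical.

```matlab
% Sketch: repeat the Listing 1 comparison with timeit / gputimeit.
% Assumes Parallel Computing Toolbox and a supported GPU.
A = rand(3000, 3000);                                % same test matrix as Listing 1
C = gpuArray(A);                                     % copy already resident on the GPU

cpuFcn       = @() ifft(fft(A));                     % CPU only
gpuFcn       = @() ifft(fft(C));                     % GPU, data already transferred
roundTripFcn = @() gather(ifft(fft(gpuArray(A))));   % GPU including both transfers

timeCPU   = timeit(cpuFcn);
timeGPU   = gputimeit(gpuFcn);
timeTOTAL = gputimeit(roundTripFcn);
fprintf('CPU: %.4f s   GPU: %.4f s   GPU + transfers: %.4f s\n', ...
        timeCPU, timeGPU, timeTOTAL);
```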
The next example demonstrates how using the GPU alters the computation time for the multiplication of matrices of increasing size.
Listing 2: MATLAB code
sizes = power(2, 12:2:24);              % number of matrix elements (n^2)
N = sqrt(sizes);                        % matrix dimensions (n x n)
TimeCPU = zeros(size(sizes));           % initialize time vectors
TimeGPU = zeros(size(sizes));
TimeTot = zeros(size(sizes));
GPU = gpuDevice;                        % handle to the current GPU, used by wait
for ii = 1:numel(sizes)
    % create random n x n matrices
    A = rand(N(ii), N(ii));
    B = rand(N(ii), N(ii));
    % start timer and perform CPU matrix multiplication
    timer1 = tic();
    C = A*B;
    TimeCPU(ii) = toc(timer1);
    % start timers and perform GPU matrix multiplication, timing both
    % the total time (including transfers) and the GPU computation only
    timer3 = tic();
    D = gpuArray(A);
    E = gpuArray(B);
    timer2 = tic();
    F = D*E;
    wait(GPU);                          % wait for the GPU before stopping the timer
    TimeGPU(ii) = toc(timer2);
    G = gather(F);                      % transfer the result back to the CPU
    TimeTot(ii) = toc(timer3);
end
plot(sizes, TimeCPU, 'b.-', sizes, TimeGPU, 'r.-', sizes, TimeTot, 'g.-')  % plot results
grid on
legend('CPU', 'GPU', 'Total Time')
xlabel('Number of matrix elements (n^2)');
ylabel('Time (seconds)');
Optimizing Code for GPU Computations
To optimize the performance of GPU computations, the following guidelines should be followed where possible:
• Minimize the number of transfers between the GPU and CPU
– Do as much as possible while the data is on the GPU
– Only transfer data back to the CPU when necessary
• Create data directly on the GPU rather than transferring it from the CPU
• Don’t use the GPU if it is not beneficial (e.g. if the transfer time exceeds the computation time).
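The first two guidelines can be sketched as follows, assuming the Parallel Computing Toolbox and a supported GPU. Functions such as rand, zeros and ones accept a 'gpuArray' argument that creates the array directly in GPU memory; all intermediate results then stay on the GPU and only a final scalar is gathered. The particular computation (the sum of all entries of X*X.') is a hypothetical stand-in for real work.

```matlab
% Sketch: create data on the GPU and keep intermediates there.
% Assumes Parallel Computing Toolbox and a supported GPU; the
% computation itself is a hypothetical placeholder.
n = 4096;
X = rand(n, 'gpuArray');         % n x n random matrix generated on the GPU
v = ones(n, 1, 'gpuArray');      % n x 1 vector of ones created on the GPU

% Keep every intermediate result on the GPU ...
Y = X * X.';                     % matrix product stays in GPU memory
s = Y * v;                       % row sums, still on the GPU
total = sum(s);                  % scalar, still a gpuArray

% ... and transfer only the final scalar back to the CPU.
result = gather(total);
fprintf('Result: %g (class %s)\n', result, class(result));
```

Writing X = gpuArray(rand(n)) instead would first generate the matrix in CPU memory and then copy it across the PCI Express bus, which is exactly the transfer these guidelines aim to avoid.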