CUDA Library and Demo
Yafeng Yin, Lei Zhou, Hong Man
07/21/2010

Outline
• Basic CUDA computation libraries: GPULib, CUBLAS, CUFFT
• Advanced CUDA computation libraries: CULA/MAGMA, VSIPL
• CUDA FIR Demo (UMD)
• Discussion and future work

Basic lib - GPULib
• GPULib provides a library of mathematical functions
  – addition, subtraction, multiplication, and division, as well as unary functions, including sin(), cos(), gamma(), and exp()
  – interpolation, array reshaping, array slicing, and reduction operations

Basic lib - CUBLAS
• BLAS: Basic Linear Algebra Subprograms
• CUBLAS provides a set of functions for basic vector and matrix operations, such as vector copy, swap, dot product, and Euclidean norm
  – Real data
    • Level 1 (vector-vector, O(N))
    • Level 2 (matrix-vector, O(N^2))
    • Level 3 (matrix-matrix, O(N^3))
  – Complex data
    • Level 1

CUBLAS - Level 2 functions
cublasSgbmv()    y = alpha * op(A) * x + beta * y
cublasSgemv()    y = alpha * op(A) * x + beta * y
cublasSger()     A = alpha * x * y^T + A
cublasSsbmv()    y = alpha * A * x + beta * y
cublasSspmv()    y = alpha * A * x + beta * y
cublasSspr()     A = alpha * x * x^T + A
cublasSspr2()    A = alpha * x * y^T + alpha * y * x^T + A
cublasSsymv()    y = alpha * A * x + beta * y
cublasSsyr()     A = alpha * x * x^T + A
cublasSsyr2()    A = alpha * x * y^T + alpha * y * x^T + A
cublasStbmv()    x = op(A) * x
cublasStbsv()    op(A) * x = b, output x

Basic lib - CUFFT
• CUFFT is the CUDA FFT library
  – Provides a simple interface for computing parallel FFTs on an NVIDIA GPU
  – Allows users to leverage the floating-point power and parallelism of the GPU without having to develop a GPU-based FFT implementation
  – cufftPlan1d(), cufftPlan2d(), cufftPlan3d(): create a 1D, 2D, or 3D FFT plan configuration for a specified signal size

Advanced lib - CULA and MAGMA
• CULA: GPU-accelerated linear algebra
  – provides LAPACK (Linear Algebra PACKage) functions on CUDA GPUs
• MAGMA: Matrix Algebra on GPU and Multicore Architectures
  – develops a dense linear algebra library similar
    to LAPACK but for heterogeneous/hybrid architectures and "Multicore+GPU" systems

Advanced lib - CULA functions
• Linear equation routines
  – solve a general system of linear equations AX = B
• Orthogonal factorizations
  – LQ, RQ factorization
• Least squares routines
• Symmetric and nonsymmetric eigenvalue routines
• Singular value decomposition (SVD) routines

Advanced lib - MAGMA
• LAPACK on CUDA GPUs
  – LU, QR, and Cholesky factorizations in both real and complex arithmetic (single and double)
  – Linear solvers based on LU, QR, and Cholesky in real arithmetic (single and double)
  – Mixed-precision iterative refinement solvers based on LU, QR, and Cholesky in real arithmetic
  – Reduction to upper Hessenberg form in real arithmetic (single and double)
  – MAGMA BLAS in real arithmetic (single and double)

Advanced lib - VSIPL
• VSIPL: Vector Signal Image Processing Library
  – Generalized matrix product
  – Fast FIR filtering
  – Correlation
  – Fast Fourier Transform
  – QR decomposition
  – Random number generation
  – Elementwise arithmetic, logical, and comparison operators; linear algebra procedures

CUDA library summary
• Basic vector or matrix computation
  – GPULib, CUBLAS, CUFFT
  – vector or matrix addition, subtraction, multiplication, and division; sin(), cos(), sort, dot product
• Libraries that can be used for signal processing
  – CULA/MAGMA, VSIPL
  – LU, QR, and Cholesky factorizations
  – Singular value decomposition (SVD)

CUDA Demo (FIR)
GPU: NVIDIA GeForce 8600 GT
CPU: Intel Duo CPU, 2.33 GHz
Software: Visual Studio 2005

CUDA Demo (FIR)
Output NO    GPU Run        Memory         Total Time    CPU Only
             Time (msec)    Time (msec)    CPU+GPU       Time (msec)
1000           0.312121       0.166641
10000          0.667264       0.284254
100000         4.210870       1.489784
1000000       39.460812       5.597150
10000000     391.816345      48.080204

CUDA Demo (FIR)
[Figure: FIR Performance. Time in msec (axis 0 to 5000) versus number of outputs (1000 to 10000000), with two series: CPU and CPU+GPU.]

Discussion and future work
• how to connect CUDA to the SSP re-hosting demo
• how to change the sequentially executed code in
  the signal processing system into CUDA code
• how to convert the XML code into CUDA code to generate the CUDA input

Reference
• CUDA Zone: http://www.nvidia.com/object/cuda_home_new.html
• http://en.wikipedia.org/wiki/CUDA
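
Note on the FIR demo
The source of the FIR demo is not included in these slides. As a rough illustration of the kind of sequential code that the discussion slide proposes converting, the following is a minimal sketch of a direct-form FIR filter in plain C. The function name fir_cpu, its parameters, and the choice to treat samples before the start of the input as zero are assumptions made for illustration, not details taken from the demo. Each output y[i] depends only on the input samples and the taps, so the iterations of the outer loop are independent of one another; that independence is what would allow a CUDA version to compute one output sample per thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical CPU reference for a direct-form FIR filter:
 *   y[i] = sum over k of h[k] * x[i - k],  k = 0 .. ntaps - 1
 * Samples before the start of the input (i - k < 0) are treated as zero.
 * This is an illustrative sketch, not the code used in the demo.
 */
void fir_cpu(const float *x, size_t n, const float *h, size_t ntaps, float *y)
{
    assert(x != NULL && h != NULL && y != NULL && ntaps > 0);

    /* Outer loop: one iteration per output sample. Iterations do not
     * depend on each other, so each could map to one GPU thread. */
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;

        /* Inner loop: accumulate the taps that overlap valid input. */
        for (size_t k = 0; k < ntaps && k <= i; ++k) {
            acc += h[k] * x[i - k];
        }
        y[i] = acc;
    }
}
```

For example, with x = {1, 2, 3, 4} and the two-tap averaging filter h = {0.5, 0.5}, fir_cpu writes y = {0.5, 1.5, 2.5, 3.5}.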