Introduction to Parallel Computing
Intel Math Kernel Library

Huan-Ting Yen
Department of Mathematics, National Taiwan University
2011/07/22

Parallel Computing

What is parallel computing?
  Traditionally, software has been written for serial computation.
  In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.

Resource
  The compute resource can be:
    A single computer with multiple processors;
    An arbitrary number of computers connected by a network;
    A combination of both.
  [Diagram: four cores (Core 1 to Core 4), each running one thread (thread 1 to thread 4)]
  [Diagram: four cores (core 1 to core 4), each running several threads]

Why use parallel computing?
  The primary reasons for using parallel computing:
    Save time (wall-clock time)
    Solve larger problems
    Provide concurrency (do many things at the same time)
  Other reasons might include:
    Taking advantage of non-local resources
    Cost savings
    Overcoming memory constraints

Amdahl's Law
  The speedup of a parallel program is limited by the amount of serial work.
Flynn's Taxonomy
  Classification for parallel computers and programs

                   Single Instruction             Multiple Instruction
  Single Data      SISD (single-core CPU)         MISD (very rare)
  Multiple Data    SIMD (GPU/vector processor)    MIMD (multi-core CPU)

  [Diagrams: SISD, SIMD, MISD, MIMD]

Intel Math Kernel Library

Overview
  The Intel® Math Kernel Library (Intel® MKL) provides Fortran routines and functions that perform a wide variety of operations on vectors and matrices, including sparse matrices. The library also includes fast Fourier transform (FFT) functions, as well as vector mathematical and vector statistical functions with Fortran and C interfaces.
  The versions of Intel MKL intended for Windows* and Linux* operating systems also include ScaLAPACK software and Cluster FFT software for solving the respective computational problems on distributed-memory parallel computers.

Intel MKL Quickstart 2011/07/22

Intel MKL: Intel Math Kernel Library
  Functionality
    BLAS and Sparse BLAS Routines
    LAPACK Routines: Linear Equations
    LAPACK Routines: Eigenvalue Problems
    ScaLAPACK
    Sparse Solver Routines
    Fast Fourier Transforms
    Cluster Fast Fourier Transforms

System Requirements (Hardware)
  Hardware:
    Intel® Core™ processor family
    Intel® Xeon® processor family
    Intel® Pentium® 4 processor family
    Intel® Pentium® III processor
    Intel® Pentium® processor (300 MHz or faster)
    Intel® Celeron® processor
    AMD Athlon* and Opteron* processors
  How can you find information about the CPUs?
    $ cat /proc/cpuinfo

System Requirements (Software)
  The following operating systems are supported:
    Red Hat* Enterprise Linux* 3, 4, 5
    Red Hat* Fedora* 9
    Debian* GNU/Linux 4.0
    Ubuntu* 8.04
  How can you find information about the operating system?
    $ cat /etc/*release
  The following C/C++ and Fortran compilers are supported:
    Intel® Fortran Compiler 10.1 for Linux*
    Intel® C++ Compiler 10.1 for Linux*
    GNU Compiler Collection (gcc, g77, gfortran 4.2.0)

Installing Intel MKL on a Linux* System
  Tools & Downloads: http://software.intel.com/en-us/ (search for "intel software")
  [Screenshots of the download pages not reproduced]

    user@host:~/software$ wget "URL"
    user@host:~/software$ ll
    $ tar -zxvf l_mkl_p_10.2.x.yyy.tar.gz

    $ cd l_mkl_p_10.2.x.yyy
    $ ./install.sh
  [Screenshots of the installer not reproduced]

Some Examples

Example
  Brief examples of:
    BLAS Level 1 Routines (vector-vector operations)
    BLAS Level 2 Routines (matrix-vector operations)
    BLAS Level 3 Routines (matrix-matrix operations)
    Compute the LU factorization of a matrix (LAPACK)
    Solve a linear system (LAPACK)
    Solve an eigensystem (LAPACK)
    Fast Fourier Transforms

Ex1. The complex dot product ( $res = \sum_i \overline{x_i}\, y_i$ )

  #include <stdio.h>
  #include "mkl_blas.h"

  #define N 5

  int main()
  {
      int n, incx = 1, incy = 1, i;
      /* MKL_Complex16 comes from mkl_types.h (pulled in by mkl_blas.h);
         its fields are named real and imag. */
      MKL_Complex16 x[N], y[N], res;

      n = N;
      for( i = 0; i < n; i++ ){
          x[i].real = (double)i;
          x[i].imag = (double)i * 2.0;
          y[i].real = (double)(n - i);
          y[i].imag = (double)i * 2.0;
      }

      /* res = sum over i of conj(x[i]) * y[i] */
      zdotc( &res, &n, x, &incx, y, &incy );
      printf( "The complex dot product is: ( %6.2f, %6.2f )\n", res.real, res.imag );
      return 0;
  }

?dotc
  Computes a dot product of a conjugated vector with another vector.
  Description: the routine is declared in
    Fortran 77: mkl_blas.fi
    Fortran 95: blas.f90
    C:          mkl_blas.h
  Input Parameters ( zdotc(&res, &n, x, &incx, y, &incy) )
    n: The number of elements in each of the two vectors.
    incx: Specifies the increment for the elements of x.
    incy: Specifies the increment for the elements of y.
  Output Parameters ( zdotc(&res, &n, x, &incx, y, &incy) )
    res: The final result.

Makefile (Sequential)
  (In the actual Makefile, the command line under a target must begin with a tab.)

  Test : blas_c
  CC = icc
  MKL_HOME = /home/opt/intel/mkl/10.2.2.025
  MKL_INCLUDE = $(MKL_HOME)/include
  MKL_PATH = $(MKL_HOME)/lib/em64t
  EXE = blas_c.exe

  blas_c:
      $(CC) -o $(EXE) blas_c.c -I$(MKL_INCLUDE) -L$(MKL_PATH) -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

Makefile (Parallel)
  Test = blas_c
  CC = icc
  MKL_HOME = /home/opt/intel/mkl/10.2.2.025
  MKL_INCLUDE = $(MKL_HOME)/include
  MKL_PATH = $(MKL_HOME)/lib/em64t
  EXE = blas_c.exe

  blas_c:
      $(CC) -o $(EXE) blas_c.c -I$(MKL_INCLUDE) -L$(MKL_PATH) -Wl,--start-group -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -Wl,--end-group -liomp5 -lpthread

BLAS Routines
  Routine Naming Conventions
  BLAS routine names have the following structure:
    <character> <name> <mode> ( )
  The <character> field indicates the data type:
    s   real, single precision
    c   complex, single precision
    d   real, double precision
    z   complex, double precision
  The <mode> field, when present, gives additional details of the operation:
    c   conjugated vector
    u   unconjugated vector
    g   Givens rotation
BLAS Routines
  Routine Naming Conventions (continued)
  In BLAS Level 2 and 3, the <name> field indicates the matrix type:
    ge   general matrix
    gb   general band matrix
    sy   symmetric matrix
    sb   symmetric band matrix
    he   Hermitian matrix
    hb   Hermitian band matrix
    tr   triangular matrix
    tb   triangular band matrix

BLAS Level 1 Routines
  Routine   Data Type            Description
  ?asum     s, d, sc, dz         Sum of vector magnitudes
  ?axpy     s, d, c, z           Scalar-vector product
  ?copy     s, d, c, z           Copy vector
  ?dot      s, d                 Dot product
  ?dotc     c, z                 Dot product, conjugated
  ?nrm2     s, d, sc, dz         Vector 2-norm (Euclidean norm)
  ?rotg     s, d, c, z           Givens rotation of points
  ?rot      s, d, cs, zd         Plane rotation of points
  ?scal     s, d, c, z, cs, zd   Vector-scalar product
  ?swap     s, d, c, z           Vector-vector swap
  i?amax    s, d, c, z           Index of the element of a vector with the maximum absolute value

Ex2-1. Matrix-vector product ( $y = \alpha A x + \beta y$ )

  #include <stdlib.h>
  #include "mkl_blas.h"

  int main()
  {
      int m, n, incx, incy, lda, idxi, idxj;
      double alpha, beta, *x, *y, *A;
      char trans;

      m     = 3;
      n     = 3;
      incx  = 1;
      incy  = 1;
      lda   = m;
      alpha = 1.0;
      beta  = 1.0;
      trans = 'n';

      x = (double*)malloc(sizeof(double)*n);      /* x has n elements */
      y = (double*)malloc(sizeof(double)*m);      /* y has m elements */
      A = (double*)malloc(sizeof(double)*lda*n);  /* m-by-n, column-major */

Ex2-2. Matrix-vector product ( $y = \alpha A x + \beta y$ ), continued

      for( idxi = 0; idxi < n; idxi++ )
          *(x+idxi) = 1.0;
      for( idxi = 0; idxi < m; idxi++ )
          *(y+idxi) = 1.0;

      /* BLAS expects column-major storage: A(i,j) is at A[i + j*lda] */
      for( idxi = 0; idxi < m; idxi++ )
          for( idxj = 0; idxj < n; idxj++ )
              *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;

      /* on return, y holds alpha*A*x + beta*y */
      dgemv(&trans, &m, &n, &alpha, A, &lda, x, &incx, &beta, y, &incy);

      free(x); free(y); free(A);
      return 0;
  }

?gemv
  Computes a matrix-vector product using a general matrix.
  Description: the routine is declared in
    Fortran 77: mkl_blas.fi
    Fortran 95: blas.f90
    C:          mkl_blas.h
  Input Parameters
    dgemv(&trans, &m, &n, &alpha, A, &lda, x, &incx, &beta, y, &incy)
    trans: if trans = 'N' or 'n', then $y = \alpha A x + \beta y$
           if trans = 'T' or 't', then $y = \alpha A^{T} x + \beta y$
           if trans = 'C' or 'c', then $y = \alpha A^{*} x + \beta y$ (conjugate transpose)
    m: The number of rows of the matrix A.
    n: The number of columns of the matrix A.
    lda: The first (leading) dimension of A; lda must be at least max(1, m).
    incx: Specifies the increment for the elements of x.
    incy: Specifies the increment for the elements of y.
  Output Parameters
    y: Updated vector y.

Ex2.
Result
  [Screenshot of the program output not reproduced]

BLAS Level 2 Routines
  Routine   Data Type     Description
  ?gemv     s, d, c, z    Matrix-vector product using a general matrix
  ?gbmv     s, d, c, z    Matrix-vector product using a general band matrix
  ?symv     s, d          Matrix-vector product using a symmetric matrix
  ?sbmv     s, d          Matrix-vector product using a symmetric band matrix
  ?hemv     c, z          Matrix-vector product using a Hermitian matrix
  ?hbmv     c, z          Matrix-vector product using a Hermitian band matrix
  ?trmv     s, d, c, z    Matrix-vector product using a triangular matrix
  ?tbmv     s, d, c, z    Matrix-vector product using a triangular band matrix

Ex3-1. Matrix-matrix product ( $C = \alpha A B + \beta C$ )

  #include <stdlib.h>
  #include "mkl_blas.h"

  int main()
  {
      int m, n, k, lda, ldb, ldc, idxi, idxj;
      double alpha, beta, *A, *B, *C;
      char transa, transb;

      m      = 3;
      n      = 3;
      k      = 3;
      lda    = m;
      ldb    = k;
      ldc    = m;
      alpha  = 1.0;
      beta   = 1.0;
      transa = 'n';
      transb = 'n';

Ex3-2. Matrix-matrix product ( $C = \alpha A B + \beta C$ ), continued

      A = (double*)malloc(sizeof(double)*lda*k);  /* A is m-by-k */
      B = (double*)malloc(sizeof(double)*ldb*n);  /* B is k-by-n */
      C = (double*)malloc(sizeof(double)*ldc*n);  /* C is m-by-n */

      /* column-major storage: element (i,j) is at index i + j*ld */
      for( idxi = 0; idxi < m; idxi++ )
          for( idxj = 0; idxj < k; idxj++ )
              *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;
      for( idxi = 0; idxi < k; idxi++ )
          for( idxj = 0; idxj < n; idxj++ )
              *(B+idxi+idxj*ldb) = (double)(idxi+1) + idxj;
      for( idxi = 0; idxi < m; idxi++ )
          for( idxj = 0; idxj < n; idxj++ )
              *(C+idxi+idxj*ldc) = (double)(idxi+1) + idxj;

      /* on return, C holds alpha*A*B + beta*C */
      dgemm(&transa, &transb, &m, &n, &k, &alpha, A, &lda, B, &ldb, &beta, C, &ldc);

      free(A); free(B); free(C);
      return 0;
  }

?gemm
  Input Parameters
    k: The number of columns of the matrix A and the number of rows of the matrix B.
    lda: When transa = 'N' or 'n', lda must be at least max(1, m); otherwise at least max(1, k).
    ldb: When transb = 'N' or 'n', ldb must be at least max(1, k); otherwise at least max(1, n).
    ldc: The first (leading) dimension of C; ldc must be at least max(1, m).
  Output Parameters
    C: Overwritten by the m-by-n result matrix.

Ex3. Result
  [Screenshot of the program output not reproduced]

BLAS Level 3 Routines
  Routine   Data Type     Description
  ?gemm     s, d, c, z    Matrix-matrix product of general matrices
  ?hemm     c, z          Matrix-matrix product of Hermitian matrices
  ?symm     s, d, c, z    Matrix-matrix product of symmetric matrices
  ?trmm     s, d, c, z    Matrix-matrix product of triangular matrices

Ex4. LU Factorization ( $A = P L U$ )

  #include <stdlib.h>
  #include "mkl_lapack.h"

  int main()
  {
      int m, n, lda, info, *ipiv;
      double *A;

      m   = 3;
      n   = 3;
      lda = m;

      ipiv = (int*)malloc(sizeof(int)*m);          /* at least min(m,n) entries */
      A    = (double*)malloc(sizeof(double)*lda*n);

      /* column-major storage: the first column of A is (1, 2, 6) */
      *(A+0)=1;  *(A+1)=2; *(A+2)=6;
      *(A+3)=-2; *(A+4)=3; *(A+5)=5;
      *(A+6)=4;  *(A+7)=8; *(A+8)=1;

      dgetrf(&m, &n, A, &lda, ipiv, &info);

      free(ipiv); free(A);
      return 0;
  }

?getrf
  Description: the routine is declared in
    Fortran 77: mkl_lapack.fi
    Fortran 95: lapack.f90
    C:          mkl_lapack.h
  Input Parameters
    m: The number of rows of the matrix A.
    n: The number of columns of the matrix A.
    lda: The first (leading) dimension of the matrix A.
    A: Array,
       REAL for sgetrf
       DOUBLE PRECISION for dgetrf
       COMPLEX for cgetrf
       DOUBLE COMPLEX for zgetrf.
  Output Parameters
    A: Overwritten by L and U. The unit diagonal elements of L are not stored.
    ipiv: An integer array, dimension at least max(1, min(m,n)). The pivot indices; row i is interchanged with row ipiv(i).
    info: Integer. If info = 0, the execution is successful.
          If info = -i, the i-th parameter had an illegal value.
          If info = i, $u_{ii} = 0$. The factorization has been completed, but U is singular.

Ex4-1. Result
  [Screenshot of the program output not reproduced]

Ex4-2. Result
  [Screenshot of the program output not reproduced]

LAPACK Computational Routines
                                             general   symmetric    symmetric           triangular
                                             matrix    indefinite   positive-definite   matrix
  Factorize matrix                           ?getrf    ?sytrf       ?potrf
  Solve linear system with factored matrix   ?getrs    ?sytrs       ?potrs              ?trtrs
  Condition number                           ?gecon    ?sycon       ?pocon              ?trcon
  Compute the inverse using factorization    ?getri    ?sytri       ?potri              ?trtri

LAPACK Routines: Linear Equations
  To solve a particular problem, you can call two or more computational routines, or call a corresponding driver routine that combines several tasks in one call.
  For example, to solve a system of linear equations with a general matrix, call ?getrf (LU factorization) and then ?getrs (computing the solution). Alternatively, use the driver routine ?gesv, which performs all these tasks in one call.

Ex5-1. Solve the Linear Equation ( $Ax = b$ )

  #include <stdio.h>
  #include <stdlib.h>
  #include "mkl_lapack.h"

  int main()
  {
      int n, nrhs, lda, ldb, info, idxi, idxj, *ipiv;
      double *A, *b;

      n    = 3;
      nrhs = 1;
      lda  = n;
      ldb  = n;

      ipiv = (int*)malloc(sizeof(int)*n);
      A    = (double*)malloc(sizeof(double)*lda*n);
      b    = (double*)malloc(sizeof(double)*ldb*nrhs);

      /* column-major storage: A(i,j) is at A[i + j*lda].
         Note: this particular A is singular (row 1 + row 3 = 2 * row 2),
         so check info after the call. */
      for( idxi = 0; idxi < n; idxi++ )
          for( idxj = 0; idxj < n; idxj++ )
              *(A+idxi+idxj*lda) = (double)(idxi+1) + idxj;

Ex5-2. Solve the Linear Equation ( $Ax = b$ ), continued

      *(b+0) = 6;
      *(b+1) = 9;
      *(b+2) = 12;

      dgesv(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);
      if( info != 0 )
          printf("dgesv returned info = %d\n", info);

      free(ipiv); free(A); free(b);
      return 0;
  }

?gesv
  Input Parameters
    nrhs: The number of right-hand sides, that is, the number of columns of the matrix b.
  Output Parameters
    A: Overwritten by the factors L and U from the factorization $A = P L U$.
    b: Overwritten by the solution matrix x.

Ex5. Result
  [Screenshot of the program output not reproduced]

Ex6-1. Solve the Eigen Equation ( $Ax = \lambda x$ )

  #include <stdlib.h>
  #include "mkl_lapack.h"

  int main()
  {
      int n, lda, lwork, ldvl, ldvr, info;
      double *wr, *wi, *A, *work, *vl, *vr;
      char jobvl, jobvr;

      n     = 3;
      lda   = n;
      ldvl  = 1;      /* vl is not referenced when jobvl = 'N' */
      ldvr  = n;
      lwork = 4*n;    /* eigenvectors are requested, so 3*n is not enough */
      jobvl = 'N';
      jobvr = 'V';
      A     = (double*)malloc(sizeof(double)*n*n);
      wr    = (double*)malloc(sizeof(double)*n);
      wi    = (double*)malloc(sizeof(double)*n);
      vl    = (double*)malloc(sizeof(double)*ldvl*n);
      vr    = (double*)malloc(sizeof(double)*ldvr*n);
      work  = (double*)malloc(sizeof(double)*lwork);

Ex6-2. Solve the Eigen Equation ( $Ax = \lambda x$ ), continued

      *(A+0) = 2;  *(A+1) = -1; *(A+2) = 0;
      *(A+3) = -1; *(A+4) = 2;  *(A+5) = -1;
      *(A+6) = 0;  *(A+7) = -1; *(A+8) = 2;

      /* wr and wi are already pointers, so they are passed as-is */
      dgeev(&jobvl, &jobvr, &n, A, &lda, wr, wi, vl, &ldvl, vr, &ldvr, work, &lwork, &info);

      free(A); free(wr); free(wi); free(vl); free(vr); free(work);
      return 0;
  }

?geev
  Input Parameters
    jobvl: If jobvl = 'N', the left eigenvectors of A are not computed.
           If jobvl = 'V', the left eigenvectors of A are computed.
    jobvr: If jobvr = 'N', the right eigenvectors of A are not computed.
           If jobvr = 'V', the right eigenvectors of A are computed.
    work: A workspace array of dimension max(1, lwork).
    lwork: The dimension of the array work. lwork ≥ max(1, 3n); if eigenvectors are requested (jobvl = 'V' or jobvr = 'V'), lwork ≥ max(1, 4n) (for real flavors).
    ldvl, ldvr: The leading dimensions of the output arrays vl and vr, respectively.

?geev
  Output Parameters
    wr, wi: Contain the real and imaginary parts, respectively, of the computed eigenvalues.
    vl, vr: If jobvl = 'V', the left eigenvectors u(j) are stored one after another in the columns of vl, in the same order as their eigenvalues. If jobvl = 'N', vl is not referenced. If the j-th eigenvalue is real, then u(j) = vl(:,j), the j-th column of vl. The right eigenvectors are stored in vr in the same way when jobvr = 'V'.
    info: If info = 0, the execution is successful.
          If info = -i, the i-th parameter had an illegal value.
          If info = i, the QR algorithm failed to compute all the eigenvalues, and no eigenvectors have been computed.

Ex6. Result
  [Screenshot of the program output not reproduced]

LAPACK Computational Routines
  Orthogonal Factorizations (QR, QZ)
  Singular Value Decomposition
  Symmetric Eigenvalue Problems
  Generalized Symmetric-Definite Eigenvalue Problems
  Nonsymmetric Eigenvalue Problems
  Generalized Nonsymmetric Eigenvalue Problems
  Generalized Singular Value Decomposition

LAPACK Driver Routines
  Linear Least Squares (LLS) Problems
  Generalized LLS Problems
  Symmetric Eigenproblems
  Nonsymmetric Eigenproblems
  Singular Value Decomposition
  Generalized Symmetric Definite Eigenproblems
  Generalized Nonsymmetric Eigenproblems

Five Stage Usage Model for Computing FFT
  Allocate a fresh descriptor for the problem with a call to the DftiCreateDescriptor function. (precision, rank, sizes, scaling factor, …)
  Optionally adjust the descriptor configuration with a call to the DftiSetValue function.
  Commit the descriptor with a call to the DftiCommitDescriptor function.
  Compute the transform with a call to the DftiComputeForward/DftiComputeBackward function.
  Deallocate the descriptor with a call to the DftiFreeDescriptor function.

Ex7-1. Three-Dimensional Complex FFT

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include "mkl_dfti.h"

  /* Each array takes m*n*k*16 bytes (about 16 GB at these sizes);
     use smaller dimensions for a quick test. */
  #define m 1000
  #define n 1000
  #define k 1000

  typedef struct { double re; double im; } mkl_complex;

  int main()
  {
      int idxi, idxj, idxk;
      double backward_scale;
      MKL_LONG status, length[3];
      mkl_complex *vec_src, *vec_tmp, *vec_dst;
      DFTI_DESCRIPTOR_HANDLE handle = 0;

Ex7-2. Three-Dimensional Complex FFT, continued

      vec_src = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);
      vec_tmp = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);
      vec_dst = (mkl_complex*)malloc(sizeof(mkl_complex)*m*n*k);

      length[0] = m; length[1] = n; length[2] = k;

      /* clear the whole arrays, not just sizeof(size_t) bytes */
      memset(vec_src, 0, sizeof(mkl_complex)*m*n*k);
      memset(vec_tmp, 0, sizeof(mkl_complex)*m*n*k);
      memset(vec_dst, 0, sizeof(mkl_complex)*m*n*k);

      for(idxk=0; idxk<k; idxk++)
          for(idxj=0; idxj<n; idxj++)
              for(idxi=0; idxi<m; idxi++) {
                  (vec_src+(idxk*n+idxj)*m+idxi)->re = 1.0;
                  (vec_src+(idxk*n+idxj)*m+idxi)->im = 0.0;
              }

Ex7-3. Three-Dimensional Complex FFT, continued

      status = DftiCreateDescriptor( &handle, DFTI_DOUBLE, DFTI_COMPLEX, 3, length );
      if(status && !DftiErrorClass(status, DFTI_NO_ERROR)) {
          printf("Error : %s\n", DftiErrorMessage(status));
          printf("TEST FAILED : DftiCreateDescriptor(&handle, ...)\n");
      }

      status = DftiSetValue( handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE );
      status = DftiCommitDescriptor( handle );
      status = DftiComputeForward( handle, vec_src, vec_tmp );

      /* scale the backward transform so that vec_dst reproduces vec_src */
      backward_scale = 1.0/((double)m*n*k);
      status = DftiSetValue( handle, DFTI_BACKWARD_SCALE, backward_scale );
      status = DftiCommitDescriptor( handle );
      status = DftiComputeBackward( handle, vec_tmp, vec_dst );

      status = DftiFreeDescriptor( &handle );
      free(vec_src); free(vec_tmp); free(vec_dst);
      return 0;
  }

FFT Functions
  Function Name          Operation
  DftiCreateDescriptor   Allocates memory for the descriptor data structure and preliminarily initializes it.
  DftiCommitDescriptor   Performs all initialization for the actual FFT computation.
  DftiCopyDescriptor     Copies an existing descriptor.
  DftiFreeDescriptor     Frees memory allocated for a descriptor.
  DftiComputeForward     Computes the forward FFT.
  DftiComputeBackward    Computes the backward FFT.
  DftiSetValue           Sets one particular configuration parameter with the specified configuration value.
  DftiGetValue           Gets the value of one particular configuration parameter.

Reference
  LLNL tutorials web site (https://computing.llnl.gov/tutorials/parallel_comp/)
  Intel® Math Kernel Library Reference Manual (mklman.pdf)
  Intel® Math Kernel Library for the Linux OS User's Guide (userguide.pdf)