Monica Garika, Chandana Guduru

METHODS TO SOLVE LINEAR SYSTEMS

Direct methods:
- Gaussian elimination method
- LU factorization method
- Simplex method of linear programming

Iterative methods:
- Jacobi method
- Gauss-Seidel method
- Multigrid method
- Conjugate gradient method

Conjugate Gradient Method

The conjugate gradient (CG) method is an algorithm for the numerical solution of particular systems of linear equations Ax = b: those in which A is symmetric (A = A^T) and positive definite (x^T A x > 0 for all nonzero vectors x). If A is symmetric and positive definite, then the quadratic function

    Q(x) = 1/2 x^T A x - x^T b + c

attains its minimum exactly at the solution of Ax = b, so minimizing Q is equivalent to solving the linear system.

The conjugate gradient method builds up a solution x* in R^n in at most n steps in the absence of round-off errors. When round-off errors are present, more than n steps may be needed to reach a good approximation of the exact solution x*. For sparse matrices, a good approximation of the exact solution can be reached in fewer than n steps, even with round-off errors.

Practical Example

In oil reservoir simulation:
- the number of linear equations corresponds to the number of grid cells of the reservoir;
- the unknown vector x is the oil pressure of the reservoir;
- each element of x is the oil pressure of a specific grid cell.

Linear System

Ax = b, where A is a square matrix, x is the unknown vector (what we want to find), and b is the known vector. A is positive definite when [x1 x2 ... xn] A [x1 x2 ... xn]^T > 0 for every nonzero vector x.

Procedure

- Find an initial guess x_0 for the solution.
- Generate successive approximations starting from x_0.
- Generate residuals.
- Generate search directions.

Conjugate Gradient Iteration

x_0 = 0, r_0 = b, p_0 = r_0
for k = 1, 2, 3, ...
    α_k = (r_{k-1}^T r_{k-1}) / (p_{k-1}^T A p_{k-1})   (step length)
    x_k = x_{k-1} + α_k p_{k-1}                         (approximate solution)
    r_k = r_{k-1} - α_k A p_{k-1}                       (residual)
    β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})             (improvement)
    p_k = r_k + β_k p_{k-1}                             (search direction)

Each iteration of the conjugate gradient method has the form

    x(t) = x(t-1) + s(t) d(t)

where x(t) is a function of the old value of the vector x, s(t) is a scalar step size, and d(t) is the direction vector. Before the first iteration, the values of x(0), d(0), and g(0) must be set.

Steps of the Conjugate Gradient Method

Every iteration t computes x(t) in four steps:

Step 1: Compute the gradient: g(t) = A x(t-1) - b
Step 2: Compute the direction vector: d(t) = -g(t) + [g(t)^T g(t) / g(t-1)^T g(t-1)] d(t-1)
Step 3: Compute the step size: s(t) = -[d(t)^T g(t)] / [d(t)^T A d(t)]
Step 4: Compute the new approximation of x: x(t) = x(t-1) + s(t) d(t)

Sequential Algorithm

1) x_0 := 0
2) r_0 := b - A x_0
3) p_0 := r_0
4) k := 0
5) k_max := maximum number of iterations to be done
6) if k < k_max then perform steps 8 to 16
7) if k = k_max then exit
8) v := A p_k
9) α_k := (r_k^T r_k) / (p_k^T v)
10) x_{k+1} := x_k + α_k p_k
11) r_{k+1} := r_k - α_k v
12) if r_{k+1} is sufficiently small then go to 16
13) β_k := (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
14) p_{k+1} := r_{k+1} + β_k p_k
15) k := k + 1; go to 6
16) result := x_{k+1}
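Before turning to parallelization, the sequential algorithm above maps almost line for line onto code. Below is a minimal sketch in Python with NumPy; the function name `conjugate_gradient` and the stopping tolerance `tol` are our own additions, not part of the original slides.

```python
import numpy as np

def conjugate_gradient(A, b, k_max=1000, tol=1e-10):
    """Solve A x = b for symmetric positive definite A (steps 1-16 above)."""
    x = np.zeros_like(b)            # 1) x0 := 0
    r = b - A @ x                   # 2) r0 := b - A x0
    p = r.copy()                    # 3) p0 := r0
    rs_old = r @ r
    for k in range(k_max):          # 5)-7) iterate at most k_max times
        v = A @ p                   # 8)  v := A pk
        alpha = rs_old / (p @ v)    # 9)  step length
        x = x + alpha * p           # 10) new approximate solution
        r = r - alpha * v           # 11) new residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:   # 12) residual sufficiently small
            break
        beta = rs_new / rs_old      # 13) improvement
        p = r + beta * p            # 14) new search direction
        rs_old = rs_new
    return x                        # 16) result

# Usage on a small symmetric positive definite system:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))     # ~ [0.0909, 0.6364], i.e. [1/11, 7/11]
```

Because A here is 2 x 2, exact arithmetic would converge in at most n = 2 steps, matching the at-most-n-steps property stated above.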
Complexity Analysis

Analyzing the algorithm lets us identify data dependencies and the communications they will eventually require. The method involves a large number of operations: each iteration is dominated by the matrix-vector product A p_k, so as the number of equations increases, the complexity increases as well.

Why Parallelize?

- Parallelizing the conjugate gradient method is a way to increase its performance.
- It saves memory, because each processor stores only the portions of the rows of the matrix A that contain nonzero elements.
- It executes faster, because the matrix is divided into portions processed concurrently.

How to Parallelize?

For example, choose a row-wise block striped decomposition of A and replicate all vectors. The multiplication of A by a vector can then be performed without any communication, but an all-gather communication is needed to replicate the result vector. The overall time complexity of the parallel algorithm is Θ(n^2 w / p + n log p), where p is the number of processors. (A code sketch of this communication pattern appears before the Conclusion below.)

[Figure: Row-wise block striped decomposition of a symmetrically banded matrix, panels (a) and (b)]

[Figure: Dependency graph in CG]

Algorithm of a Parallel CG on Each Computing Worker (cw)

1) Receive cw_r0, cw_A, cw_x0
2) cw_p0 := cw_r0
3) k := 0
4) k_max := maximum number of iterations to be done
5) if k < k_max then perform steps 7 to 22
6) if k = k_max then exit
7) v := cw_A cw_p_k
8) compute cw_Nα_k, the local part of the numerator of α_k
9) compute cw_Dα_k, the local part of the denominator of α_k
10) send cw_Nα_k, cw_Dα_k
11) receive α_k
12) cw_x_{k+1} := cw_x_k + α_k cw_p_k
13) compute the partial result of r_{k+1}: cw_r_{k+1} := cw_r_k - α_k v
14) send cw_x_{k+1}, cw_r_{k+1}
15) receive signal
16) if signal = solution_reached then go to 23
17) compute cw_Nβ_k, the local part of the numerator of β_k
18) compute cw_Dβ_k, the local part of the denominator of β_k
19) send cw_Nβ_k, cw_Dβ_k
20) receive β_k
21) cw_p_{k+1} := cw_r_{k+1} + β_k cw_p_k
22) k := k + 1; go to 5
23) result reached

[Figure: Speedup of the parallel CG on the grid versus the sequential CG on Intel]

[Figure: Communication and waiting time of the parallel CG on the grid]

Why the minimizer of f solves Ax = b: we consider the difference between f at the solution x and at any other vector p.

i)   f(x) = 1/2 x^T A x - b^T x + c
ii)  f(p) = 1/2 p^T A p - b^T p + c
iii) f(p) - f(x) = 1/2 p^T A p - b^T p - 1/2 x^T A x + b^T x
simplify, replacing b with Ax:
     f(p) - f(x) = 1/2 p^T A p - x^T A p + 1/2 x^T A x
positive definiteness:
     f(p) - f(x) = 1/2 (p - x)^T A (p - x) > 0 for all p ≠ x

so f(x) ≤ f(p) for every p, and the minimizer of f is the solution of Ax = b.

Parallel Computation Design

- Iterations of the conjugate gradient method can be executed only in sequence, so the most advisable approach is to parallelize the computations carried out within each iteration.
- The most time-consuming computation is the multiplication of the matrix A by the vectors x and d.
- Additional operations of lower computational complexity order are the various vector operations (inner product, addition and subtraction, multiplication by a scalar).
- When implementing the parallel conjugate gradient method, parallel matrix-vector multiplication algorithms can therefore be reused.

"Pure" Conjugate Gradient Method (Quadratic Case)

0) Starting at any x_0, define d_0 = -g_0 = b - Q x_0, where g_k is the gradient of the objective function at the point x_k (Q is the matrix of the quadratic objective, playing the role of A above).
1) Using d_k, calculate the new point x_{k+1} = x_k + α_k d_k, where
       α_k = -(g_k^T d_k) / (d_k^T Q d_k)
2) Calculate the new conjugate gradient direction d_{k+1} = -g_{k+1} + β_k d_k, where
       β_k = (g_{k+1}^T Q d_k) / (d_k^T Q d_k)

ADVANTAGES

1) The gradient is always nonzero and linearly independent of all previous direction vectors.
2) A simple formula determines the new direction; the update is only slightly more complicated than steepest descent.
3) The process makes good progress because it is based on gradients.

The simple formulae for updating the direction vector are what make the method attractive: it is slightly more complicated than steepest descent, but converges faster.

The conjugate gradient method is an indirect (iterative) solver. It is used to solve large systems, requires only a small amount of memory, and does not rely on direct numerical linear algebra routines such as matrix factorizations.
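To make the row-wise block striped strategy concrete, here is a minimal sketch of the distributed matrix-vector product at the heart of the parallel CG. It assumes mpi4py and NumPy are installed; the test matrix, the worker count, and all variable names are our own illustration, not from the slides.

```python
# Run with e.g.: mpiexec -n 4 python parallel_matvec.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 8                        # problem size; assume size divides n for simplicity
rows = n // size             # rows of A owned by this computing worker

# The master builds a symmetric positive definite A and a vector x ...
if rank == 0:
    M = np.random.rand(n, n)
    A_full = M @ M.T + n * np.eye(n)   # SPD by construction
    x = np.random.rand(n)
else:
    A_full, x = None, np.empty(n)

# ... then scatters row blocks of A and replicates x on every worker.
A_local = np.empty((rows, n))
comm.Scatter(A_full, A_local, root=0)
comm.Bcast(x, root=0)

# The local multiplication needs no communication ...
y_local = A_local @ x
# ... but an all-gather is needed to replicate the result vector.
y = np.empty(n)
comm.Allgather(y_local, y)

if rank == 0:
    print(np.allclose(y, A_full @ x))  # True: matches the sequential product
```

In a full parallel CG, the inner products behind α_k and β_k are formed the same way: each worker computes a local partial sum (steps 8-9 and 17-18 of the worker algorithm above) and a reduction combines them, which is exactly the send/receive exchange of steps 10-11 and 19-20.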
Conclusion

The conjugate gradient method is a linear solver used in a wide range of engineering and science applications. However, it has a complexity drawback due to the high number of arithmetic operations involved in its matrix-vector and vector-vector products. Our implementation reveals that, despite the communication cost involved in a parallel CG, a performance improvement over the sequential algorithm is still possible.