Using LU Decomposition to Optimize the modconcen.m Routine Matt Tornowske April 1, 2002 Review of Gaussian Elimination Find the augmented matrix of the system of linear equations. Find an echelon form of the augmented matrix using elementary row operations. Find the system of equations corresponding to the echelon form. Use back substitution to arrive at the solution. Example using Gaussian Elimination Solve the following system of linear equations using the method of Gaussian Elimination: Starting with the augmented matrix create zeros below the pivot in the first column: Gaussian Elimination : Echelon Form At this stage we create a zero only below the pivot: The corresponding system of equations is: Gaussian Elimination : Back Substitution We now solve the system using back substitution. Since we have x4 = 2 we substitute this into the row above: Substituting x4 = 2 and x3 = -5 into the first equation: And we’re left with: Computational Cost of Gaussian Elimination for an n x n Matrix For multiplication: For addition: Total operations: For 1532 x 1532 matrix: Method of LU Decomposition Find the LU decomposition of the matrix Solve LY=B using forward substitution Solve UX=Y by back substitution Hopefully, we save computational time Finding the LU Factors The LU Factorization is done by finding the Upper and Lower triangle matrices of an original matrix. The upper matrix, U, is the reduced echelon form of the original matrix, A. The lower matrix, L, is the inverse of the elementary matrices obtained from the operations to obtain U. LU decomposition: Getting U Solve the following system of equations using LU decomposition: First reduce the associated matrix into upper triangular form: LU Decomposition: Elementary Matrices The elementary matrices that correspond to these row operations are: The inverses of these matrices are: LU Decomposition: Getting L Thus, LU Decomposition: Solving LUX = B We now solve the given system LUX = B by solving the two subsystems LY = B and UX = Y: This lower triangular system has solution y1 = -1, y2 = 7, y3 = 2. The solution to the upper triangular system is x1 = 5, x2 = -8, x3 = -1, which happens to be the solution to A. Computational Cost of LU Decomposition for an n x n Matrix Total number of arithmetic operations to solve a system of equations using LU decomposition is exactly the same as Gaussian Elimination. The method is the most efficient method available for solving many systems all having the same matrix of coefficients, when the right hand sides are not all available in advance. Since the factorization only has to be done once, it uses (4n3 – 3n2 – n)/6 operations. The solutions of the two triangular systems can be carried out in 2n2 – n operations compared to (4n3 + 9n2 – 7n)/6 operations for full Gaussian elimination each time. Ideally, for 1528 x 1528 matrix, this is a savings of nearly 99.8% per system! Permutation Matrix Our LU decomposition won’t work properly if there are any pivots needed We can use the lu() function in Matlab to create a permutation matrix P to keep track of these pivots We’ll simply change the algorithm slightly to solve LY=PB instead of LY=B Implementation in Matlab We use the Matlab lu() function to give us our L, U, and P matrices We will adjust the code in Jennifer’s modified concentration routine to use LU factorization. Modified Concentration Routine with LU Implemented In the last outer for block (modifications in red): % do the LU factorization here. [L,U,P]=lu(w); clear w; for m=1:ntimes m rhs = zeros(1528,1); for k = 1:1528 N = nb(k,find((nbor(k,:) == 1))); W = nb(k,find((nbor(k,:) == 3))); S = nb(k,find((nbor(k,:) == 5))); E = nb(k,find((nbor(k,:) == 7))); if(~isempty(N) & ~isempty(W) & ~isempty(S) & ~isempty(E)) Modified Routine, continued ux = scaledvelocity(k,1); uy = scaledvelocity(k,2); rhs(k,1) = -concentration(k,m)/deltat; else rhs(k,1) = 0; end end %xjen=w\rhs; replace this line with %backward/forward substitution rhs=P*rhs; xjen2=L\rhs; xjen2=U\xjen2; Modified Routine, continued concentration(:,m+1)=xjen2; end Note: We use the left matrix divide instead of inverse operation. While some improvement can Be gained by performing some inversions outside of the for loop, the increase in speed can be far offset in the larger memory requirements. Performance Gains on Pentium 4 1400 Mhz with 384 MB of RAM With deltat=0.5, old routine took 73 seconds while the new routine took 11 seconds Adjusting deltat for 0.25, old = 145 seconds, new = 17 seconds Deltat for .1, old = 350 seconds, new = 38 seconds Conclusions Depending on the timestep, we’ve created a speedup of 7-9 times the original routine Use LU decomposition, all the cool people do!