Using LU Decomposition to Optimize the modconcen.m Routine

advertisement
Using LU Decomposition
to Optimize the
modconcen.m Routine
Matt Tornowske
April 1, 2002
Review of Gaussian
Elimination




Find the augmented matrix of the system of
linear equations.
Find an echelon form of the augmented matrix
using elementary row operations.
Find the system of equations corresponding to
the echelon form.
Use back substitution to arrive at the solution.
Example using Gaussian
Elimination
Solve the following system of linear equations using the method
of Gaussian Elimination:
Starting with the augmented matrix create zeros below the pivot in
the first column:
Gaussian Elimination :
Echelon Form
At this stage we create a zero only below the pivot:
The corresponding system of equations is:
Gaussian Elimination : Back
Substitution
We now solve the system using back substitution.
Since we have x4 = 2 we substitute this into the row
above:
Substituting x4 = 2 and x3 = -5 into the first equation:
And we’re left with:
Computational Cost of Gaussian
Elimination for an n x n Matrix

For
multiplication:
For addition:

Total operations:

For 1532 x 1532
matrix:

Method of LU
Decomposition
Find the LU decomposition of the
matrix
 Solve LY=B using forward
substitution
 Solve UX=Y by back substitution
 Hopefully, we save computational
time

Finding the LU Factors



The LU Factorization is done by finding
the Upper and Lower triangle matrices of
an original matrix.
The upper matrix, U, is the reduced
echelon form of the original matrix, A.
The lower matrix, L, is the inverse of the
elementary matrices obtained from the
operations to obtain U.
LU decomposition: Getting
U
Solve the following system of equations using LU decomposition:
First reduce the associated matrix into upper triangular form:
LU Decomposition:
Elementary Matrices
The elementary matrices that correspond to these row operations are:
The inverses of these matrices are:
LU Decomposition: Getting
L
Thus,
LU Decomposition: Solving
LUX = B
We now solve the given system LUX = B by solving the two
subsystems LY = B and UX = Y:
This lower triangular system has solution y1 = -1, y2 = 7, y3 = 2.
The solution to the upper triangular system is x1 = 5, x2 = -8, x3 = -1,
which happens to be the solution to A.
Computational Cost of LU
Decomposition for an n x n Matrix





Total number of arithmetic operations to solve a system
of equations using LU decomposition is exactly the same
as Gaussian Elimination.
The method is the most efficient method available for
solving many systems all having the same matrix of
coefficients, when the right hand sides are not all
available in advance.
Since the factorization only has to be done once, it uses
(4n3 – 3n2 – n)/6 operations.
The solutions of the two triangular systems can be
carried out in 2n2 – n operations compared to (4n3 +
9n2 – 7n)/6 operations for full Gaussian elimination
each time.
Ideally, for 1528 x 1528 matrix, this is a savings of
nearly 99.8% per system!
Permutation Matrix



Our LU decomposition won’t work
properly if there are any pivots needed
We can use the lu() function in Matlab to
create a permutation matrix P to keep
track of these pivots
We’ll simply change the algorithm slightly
to solve LY=PB instead of LY=B
Implementation in Matlab
We use the Matlab lu() function to
give us our L, U, and P matrices
 We will adjust the code in Jennifer’s
modified concentration routine to
use LU factorization.

Modified Concentration Routine with
LU Implemented
In the last outer for block (modifications in red):
% do the LU factorization here.
[L,U,P]=lu(w);
clear w;
for m=1:ntimes
m
rhs = zeros(1528,1);
for k = 1:1528
N = nb(k,find((nbor(k,:) == 1)));
W = nb(k,find((nbor(k,:) == 3)));
S = nb(k,find((nbor(k,:) == 5)));
E = nb(k,find((nbor(k,:) == 7)));
if(~isempty(N) & ~isempty(W) & ~isempty(S) &
~isempty(E))
Modified Routine, continued
ux = scaledvelocity(k,1);
uy = scaledvelocity(k,2);
rhs(k,1) = -concentration(k,m)/deltat;
else
rhs(k,1) = 0;
end
end
%xjen=w\rhs; replace this line with
%backward/forward substitution
rhs=P*rhs;
xjen2=L\rhs;
xjen2=U\xjen2;
Modified Routine, continued
concentration(:,m+1)=xjen2;
end
Note: We use the left matrix divide instead of
inverse operation. While some improvement can
Be gained by performing some inversions outside
of the for loop, the increase in speed can be far
offset in the larger memory requirements.
Performance Gains on Pentium 4
1400 Mhz with 384 MB of RAM
With deltat=0.5, old routine took 73
seconds while the new routine took
11 seconds
 Adjusting deltat for 0.25, old = 145
seconds, new = 17 seconds
 Deltat for .1, old = 350 seconds,
new = 38 seconds

Conclusions
Depending on the timestep, we’ve
created a speedup of 7-9 times the
original routine
 Use LU decomposition, all the cool
people do!

Download