CM0368 Scientific Computing

CM0368
Scientific Computing
Spring 2009
Professor David Walker
david@cs.cf.ac.uk
Schedule
• Weeks 1 & 2 (6): Numerical linear algebra (DWW)
– Solving Ax = b by Gaussian elimination and Gauss-Seidel.
– Algebraic eigenvalue problem. Power method and QR method.
• Weeks 3 & 4 (6): Numerical solution of differential
equations (BMB)
– Ordinary differential equations (finite difference and Runge-Kutta
methods).
– Partial differential equations (finite difference method)
• Weeks 5 & 6 (5): Applications in Physics (BMB)
– Laplace and Helmholtz equations, Schrödinger problem for the hydrogen atom, etc.
• Weeks 6 & 7 (4): Numerical optimisation (DWW)
• Weeks 2 – 11: Tutorials on Wednesdays at 1pm in
T2.07. These will be given by Kieran Robert.
Text Book
• “Numerical Computing with MATLAB” by
Cleve B. Moler, SIAM Press, 2004. ISBN
0898715601.
• http://www.readinglists.co.uk/rsl/student/sviewlist.dfp?id=20248
• Web edition at
http://www.mathworks.com/moler/
Web Site
• Lecture notes and other module material
can be found at:
http://users.cs.cf.ac.uk/David.W.Walker/CM0368
Numerical Linear Algebra
• A system of n simultaneous linear equations can
be represented in matrix notation as:
Ax = b
where A is an nn matrix, and x and b are
vectors of length n.
• Can write the solution as x = A⁻¹b, where A⁻¹ is the inverse of A.
• A square matrix is said to be non-singular if its
inverse exists. If the inverse does not exist then
the matrix is singular.
Geometrical Interpretation
• If A is a 22 matrix, for example
 2  1 x1   3 

    
 3 4  x2    1
then 2x1 – x2 = 3 and 3x1 + 4x2 = -1. Each
represents a straight line and the solution of the
above is given by their intersection.
• If A is a 33 matrix each of the three equations
represents a plane, and the solution is the point lying at
the intersection of the three planes.
MATLAB Solution
• In MATLAB we can find the solution to Ax = b by
writing:
x = A\b
• Write: A = [10 -7 0;-3 2 6;5 -1 5]
• Then: b = [7;4;6]
• Then: x = A\b
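For example, as a quick sketch (per the worked example on the next few slides, the exact solution is (0, -1, 1)T):
A = [10 -7 0; -3 2 6; 5 -1 5];
b = [7; 4; 6];
x = A\b        % expect x close to (0, -1, 1)'
r = b - A*x    % residual: should be near zero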
Gaussian Elimination
• Eliminate x1 from all the equations after the first.
• Then eliminate x2 from all the equations after the
second.
• Then eliminate x3 from all the equations after the
third.
• And so on, until after n-1 steps we have
eliminated xj from all the equations after the jth,
for j = 1, 2, …, n-1.
• These steps are referred to as the forward
elimination stage of Gaussian elimination.
Example
 10  7 0  x1   7 

   
  3 2 6  x2    4 
 5  1 5  x   6 

 3   
Subtract -3/10 times equation 1 from equation 2, and 5/10 times
equation 1 from equation 3.
10  7 0  x1   7 

   
 0  0.1 6  x2    6.1 
 0 2.5 5  x   2.5 

 3   
Next we swap equations 2 and 3. This is called partial pivoting. It is done
to get the largest absolute value on or below the diagonal in column 2 onto
the diagonal. This makes the algorithm more stable with respect to roundoff errors (see later).
Example (continued)
10  7 0  x1   7 

   
 0 2.5 5  x2    2.5 
 0  0.1 6  x   6.1 

 3   
Now subtract -0.1/2.5 times equation 2 from equation 3.
10  7 0  x1   7 

   
 0 2.5 5  x2    2.5 
 0 0 6.2  x   6.2 

 3   
This completes the forward elimination stage of the
Gaussian elimination algorithm.
Pseudocode for Forward
Elimination
make b column n+1 of A
for k=1 to n-1
    find pivot A(m,k) in column k with m ≥ k (largest absolute value)
    swap rows k and m
    for i=k+1 to n
        mult = A(i,k)/A(k,k)
        for j=k+1 to n+1
            A(i,j) = A(i,j) – mult*A(k,j)
        end
    end
end
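A runnable MATLAB version of this pseudocode might look as follows (a sketch; the function name gauss_forward is our own choice):
function A = gauss_forward(A, b)
% Forward elimination with partial pivoting on the augmented matrix [A b].
% Returns the augmented matrix in upper triangular form.
n = length(b);
A = [A b];                           % make b column n+1 of A
for k = 1:n-1
    [~, m] = max(abs(A(k:n,k)));     % pivot row m >= k ...
    m = m + k - 1;                   % ... as an index into the full matrix
    A([k m], :) = A([m k], :);       % swap rows k and m
    for i = k+1:n
        mult = A(i,k) / A(k,k);
        A(i,k:n+1) = A(i,k:n+1) - mult*A(k,k:n+1);
    end
end
end
Running it on the example above produces the triangular system just derived; back substitution (described next) then finishes the solve.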
Back Substitution
• After the forward elimination phase, the matrix
has been transformed into upper triangular form.
• Equation n just involves xn and so can now be
solved immediately.
• Equation n-1 just involves xn-1 and xn, and since
we already know xn we can find xn-1.
• Working our way backwards through the
equations we can find xn, xn-1,…, x1.
• This is called the back substitution phase of the
Gaussian elimination algorithm.
The Example Again
10  7 0  x1   7 

   
 0 2.5 5  x2    2.5 
 0 0 6.2  x   6.2 

 3   
Equation 3 is 6.2x3 = 6.2, so x3 = 1. This value is
substituted into equation 2:
2.5x2 + (5)(1) = 2.5
so x2 = -1. Substituting for x2 and x3 in equation 1:
10x1 + (-7)(-1) = 7
so x1 = 0.
Pseudocode for Back Substitution
This solves Ux = b:
x(n) = b(n)/U(n,n)
for k=n-1 down to 1
    sum = 0
    for j=k+1 to n
        sum = sum + U(k,j)*x(j)
    end
    x(k) = (b(k) – sum)/U(k,k)
end
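A runnable MATLAB version (a sketch; the name back_substitute is our own choice):
function x = back_substitute(U, b)
% Solve the upper triangular system Ux = b by back substitution.
n = length(b);
x = zeros(n, 1);
x(n) = b(n) / U(n,n);
for k = n-1:-1:1
    % U(k,k+1:n)*x(k+1:n) is the 'sum' of the pseudocode
    x(k) = (b(k) - U(k,k+1:n)*x(k+1:n)) / U(k,k);
end
end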
LU Factorisation
• The Gaussian elimination process can be expressed in
terms of three matrices.
• The first matrix has 1’s on the main diagonal and the
multipliers used in the forward elimination below the
diagonal. This is a lower triangular matrix with unit
diagonal, and is usually denoted by L.
• The second matrix, denoted by U, is the upper triangular
matrix obtained at the end of the forward elimination.
• The third matrix, denoted by P, is a permutation matrix
representing the row interchanges performed in pivoting.
L, U, and P
L = [  1     0     0 ]
    [  0.5   1     0 ]
    [ -0.3  -0.04  1 ]

U = [ 10  -7    0   ]
    [  0   2.5  5   ]
    [  0   0    6.2 ]

P = [ 1  0  0 ]
    [ 0  0  1 ]
    [ 0  1  0 ]
The original matrix can be expressed as:
LU = PA
The permutation matrix is the identity matrix with its rows
reordered. If Pij = 1 then row i of the permuted matrix is
row j of the original matrix.
The same information can be represented in a vector, the ith entry of which gives the number of the column containing the 1 in row i:
p = (1, 3, 2)T
Some MATLAB Code
• L, U, and P can be found in MATLAB as follows:
[L,U,P] = lu(A)
• Solution of the system Ax=b is equivalent to
solving the two triangular systems
Ly = Pb
and Ux = y
• Once you have L, U, and P it is simple to solve
the original system of equations:
y = L\(P*b) and x = U\y, or just x = U\(L\(P*b))
• This should give the same answer as:
x = A\b
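For example, on the system from the earlier slides (a sketch; the expected results are noted in comments):
A = [10 -7 0; -3 2 6; 5 -1 5];
b = [7; 4; 6];
[L, U, P] = lu(A);         % the L, U, and P shown on the previous slide
norm(P*A - L*U)            % should be at round-off level
x = U \ (L \ (P*b))        % same answer as A\b: (0, -1, 1)'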
LDU Factorisation
• It is sometimes useful to explicitly separate out the diagonal of U, which contains the pivots.
• We write U = DU', where D is a diagonal matrix containing the diagonal elements of U, and U' = D⁻¹U is obtained from U by dividing each row by its pivot, so that U' has 1's on the diagonal.
• With this form of the factorisation we have
LDU' = PA
L  = [  1     0     0 ]
     [  0.5   1     0 ]
     [ -0.3  -0.04  1 ]

D  = [ 10   0    0   ]
     [  0   2.5  0   ]
     [  0   0    6.2 ]

U' = [ 1  -0.7  0 ]
     [ 0   1    2 ]
     [ 0   0    1 ]
Pseudocode for LU Factorisation
make b column n+1 of A
for k=1 to n-1
    find pivot A(m,k) in column k with m ≥ k (largest absolute value)
    swap rows k and m
    for i=k+1 to n
        A(i,k) = A(i,k)/A(k,k)
        for j=k+1 to n+1
            A(i,j) = A(i,j) – A(i,k)*A(k,j)
        end
    end
end
Explanation of LU
• At stage i of forward elimination we do
pivoting to find the largest absolute value
in column i on or below the diagonal, and
then exchange rows to bring it onto the
diagonal.
• Then we subtract multiples of row i from
rows i+1,…,n.
• Each of these operations can be
represented by a matrix multiplication.
Elementary Matrices
• An elementary matrix, M, is one that has 1’s along the
main diagonal and 0’s everywhere else, except for one
non-zero value (-m, say) in row i and column j.
• Multiplying A by M has the effect of subtracting m times
row j of matrix A from row i.
• Ignoring pivoting, the GE algorithm applies a series of
elementary matrices to A to get U:
U = Mn-1···M2M1A
• If Li = Mi⁻¹ then
L1L2···Ln-1U = A
so taking L = L1L2···Ln-1 we have LU = A.
• Li is the same as Mi except that the sign of the non-zero value is changed.
• With pivoting, U = Mn-1Pn-1···M2P2M1P1A, and it can be shown in a similar way that LU = PA, where P = Pn-1···P2P1.
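As a sketch, here is the first elimination step of the earlier example expressed as a matrix product, ignoring pivoting:
A = [10 -7 0; -3 2 6; 5 -1 5];
M1 = eye(3);
M1(2,1) = 3/10;     % negative of the multiplier -3/10 for row 2
M1(3,1) = -5/10;    % negative of the multiplier 5/10 for row 3
M1*A                % column 1 is now zero below the diagonal
L1 = inv(M1)        % same as M1 but with the signs of the multipliers flipped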
The Need for Pivoting
• Suppose we change our problem slightly to:
 7 0  x1   7 
 10

  

  3 2.099 6  x2    3.901
 5
 x   6 

1
5

 3  

where the solution is still (0,-1,1)T.
• Now suppose we solve the problem on a computer that does arithmetic to five significant decimal digits.
Pivoting (continued)
• The first step of the elimination gives:
[ 10  -7      0 ] [x1]   [   7   ]
[  0  -0.001  6 ] [x2] = [ 6.001 ]
[  0   2.5    5 ] [x3]   [  2.5  ]
• The (2,2) element is quite small. Now we
continue without pivoting.
• The next step is to subtract -2.5×10³ times equation 2 from equation 3:
(5 - (-2.5×10³)(6))x3 = 2.5 - (-2.5×10³)(6.001)
• The righthand side is 2.5 + 1.50025×10⁴. The second term is rounded to 1.5002×10⁴. When we add the 2.5 the result is rounded to 1.5004×10⁴.
Pivoting (continued)
• So the equation for x3 becomes:
1.5005×10⁴ x3 = 1.5004×10⁴
which gives x3 = 0.99993 (instead of 1).
• Then x2 is found from:
-0.001x2 + (6)(0.99993) = 6.001
which gives:
-0.001x2 = 1.5×10⁻³
so x2 = -1.5 (instead of -1).
• Finally, x1 is found from:
10x1 + (-7)(-1.5) = 7
which gives x1 = -0.35 (instead of 0).
• The problem arises from using a small pivot, which leads to a large multiplier.
• Partial pivoting ensures that the multipliers are always less than or
equal to 1 in magnitude, and results in a satisfactory solution.
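A sketch of this effect in MATLAB (our own illustration, not from the slides): single precision plays the role of the low-precision arithmetic, and gauss_nopivot is a hypothetical helper that eliminates without row interchanges.
A = single([10 -7 0; -3 2.099 6; 5 -1 5]);
b = single([7; 3.901; 6]);       % exact solution is (0, -1, 1)'
x_bad  = gauss_nopivot(A, b)     % tiny pivot -0.001: typically only a few correct digits
x_good = A \ b                   % backslash pivots: accurate to near single precision

function x = gauss_nopivot(A, b)
% Forward elimination with no row interchanges, then back substitution.
n = length(b); A = [A b];
for k = 1:n-1
    for i = k+1:n
        mult = A(i,k) / A(k,k);
        A(i,k:n+1) = A(i,k:n+1) - mult*A(k,k:n+1);
    end
end
x = zeros(n, 1, 'single');
for k = n:-1:1
    x(k) = (A(k,n+1) - A(k,k+1:n)*x(k+1:n)) / A(k,k);
end
end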
Measuring Errors
• The discrepancy due to rounding error in a
solution can be measured by the error:
e = x – x*
and by the residual:
r = b - Ax*
where x is the exact solution and x* is the
computed solution.
• e=0 if and only if r=0, however, e and r are
not necessarily both small.
Error Example
• Consider an example in
which the matrix is almost
singular.
• If GE is used to compute
the solution with low
precision we get a large
error but small residual.
• Geometrically, the lines represented by the equations are almost parallel.
• GE with partial pivoting
will always produce small
residuals, but not
necessarily small errors.
 0.780 0.563  x1   0.217 

   

 0.913 0.659  x2   0.254 
computed solution
exact solution
Sensitivity
• We want to know how sensitive the solution to
Ax=b is to perturbations in A and b.
• To do this we first have to come up with some
measure of how close to singular a matrix is.
• If A is singular, a solution will not exist for some b’s
(inconsistency), while for other b’s it is not unique.
• So if A is nearly singular we would expect small
changes in A and b to cause large changes in x.
• If A is the identity matrix then x=b, so if A is close
to the identity matrix we expect small changes in A
and b to result in small changes in x.
Singularity and Pivots
• In GE all the pivots are non-zero if and
only if A is non-singular, provided exact
arithmetic is used.
• So if the pivots are small we expect the
matrix to be close to singular.
• However, with finite precision arithmetic, a
matrix might still be close to singular even
if none of the pivots are small.
Norms
• The size of the pivots is not adequate to decide how close a matrix is to being singular.
• Define the lp family of norms (1 ≤ p ≤ ∞):
||x||p = ( Σi=1..n |xi|^p )^(1/p)
• Most common norms use p = 1, 2, and ∞:
||x||1 = Σi=1..n |xi|
||x||2 = ( Σi=1..n |xi|² )^(1/2)
||x||∞ = maxi |xi|
Properties of Norms
• All these norms have the following properties:
x  0 if x  0
0 0
cx  c x for all scalars c
x  y  x  y (the triangle inequality)
• In MATLAB use norm(x,p) to find a norm:
norm1 = norm(x,1)
norm2 = norm(x)
norminf = norm(x,inf)
Condition Number
• Ax may have a very different norm from x, and this change in norm is related to the sensitivity of the solution to changes in A. Denote:
M = max ||Ax|| / ||x||   and   m = min ||Ax|| / ||x||
where the max and min are taken over all nonzero vectors x. Note that if A is singular, m = 0.
• The condition number, κ(A), of A is defined as the ratio M/m. Usually we are interested in the order of magnitude of κ(A), so it doesn't matter which norm is used.
Relative Errors
• Suppose we have an error δb in b, which results in an error δx in x. So Ax = b becomes:
A(x + δx) = b + δb
so Aδx = δb. From the definition of M and m we have:
||b|| ≤ M ||x||   and   ||δb|| ≥ m ||δx||
• Then if m ≠ 0 we have the following relationship between the relative error in b and in x:
||δx|| / ||x|| ≤ κ(A) ||δb|| / ||b||
κ(A) measures the amplification of the relative error!
Uses of Condition Number
1. As a measure of the amplification of
relative error due to changes in rhs.
2. As a measure of the amplification of
relative error due to changes in matrix A.
3. As a measure of how close a matrix is to being singular (hard maths omitted). If κ(A) is large then A is close to singular.
Some Properties of (A)
• κ(A) ≥ 1, since M ≥ m.
• κ(P) = 1, if P is a permutation matrix.
• κ(cA) = κ(A) for any scalar c.
• For a diagonal matrix D, κ(D) is the ratio of the largest diagonal value (in magnitude) to the smallest.
Relative Error Example
• In this example we use the l1-norm:
A = [ 4.1  2.8 ]     b = [ 4.1 ]
    [ 9.7  6.6 ]         [ 9.7 ]
• Solution is x = (1, 0)T, with ||b|| = 13.8 and ||x|| = 1. Change b slightly to:
b̃ = (4.11, 9.70)T
then x becomes:
x̃ = (0.34, 0.97)T
A small change in b gives a big change in x.
• Errors are ||δb|| = 0.01 and ||δx|| = 1.63, so the relative errors are:
||δb|| / ||b|| = 0.0007246   and   ||δx|| / ||x|| = 1.63
so κ(A) is large: at least 1.63/0.0007246 = 2249.4.
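We can check this in MATLAB (a sketch; cond is introduced on the Matrix Norms slide below):
A = [4.1 2.8; 9.7 6.6];
cond(A, 1)               % 2249.4: here the amplification bound is attained
A \ [4.1; 9.7]           % (1, 0)'
A \ [4.11; 9.70]         % (0.34, 0.97)'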
Relative Error and Residual
• The condition number also governs the effect of round-off error in GE. It can be shown that:
b  Ax*
A x*
 
x  x*
x*
   ( A) 
where x* is the numerical solution of Ax=b,  is a
constant smaller than about 10, and  the machine
precision.
• The first inequality says that the relative residual will be
about the same size as the round-off error no matter how
ill-conditioned A is.
• The second inequality says that the relative error is small if κ(A) is small but might be large if κ(A) is large.
Matrix Norms
• The norm of matrix A is ||A|| = M, i.e.,
||A|| = max ||Ax|| / ||x||
• Since ||A⁻¹|| = 1/m, it follows that the condition number can also be defined as:
κ(A) = ||A|| ||A⁻¹||
• In MATLAB, cond(A,p) computes the condition
number of A relative to the lp-norm.
Iterative methods for Ax=b
• Gaussian Elimination is a direct method that computes the solution of Ax=b in O(n³/3) operations.
• If n is large we might want a faster, less
accurate, method.
• With an iterative method we can stop the
iteration whenever we think we have a
sufficiently accurate solution.
Iterative methods
• Suppose we split the matrix as A = S-T, then Sx
= Tx + b.
• We can turn this into an iteration:
Sxk+1 = Txk + b
or
xk+1 = S⁻¹Txk + S⁻¹b
• So if this sequence converges then we can start the iteration with a guess at the solution x0 and get an approximate solution.
• We need S to be easily invertible.
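A minimal MATLAB sketch of this iteration, using an illustrative 2×2 system and the Jacobi splitting described on the following slides (the tolerance and iteration limit are our own choices):
A = [2 -1; -1 2];  b = [1; 1];     % exact solution is (1, 1)'
S = diag(diag(A));  T = S - A;     % a Jacobi-style splitting: S = diagonal of A
x = zeros(2, 1);                   % initial guess x0
for k = 1:100
    xnew = S \ (T*x + b);          % solve S*x_{k+1} = T*x_k + b
    if norm(xnew - x) < 1e-8*norm(xnew)
        x = xnew;
        break                      % sufficiently accurate: stop iterating
    end
    x = xnew;
end
x                                  % should be close to A\b = (1, 1)'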
Common Iterative Methods
1. S = diagonal part of A (Jacobi’s method)
2. S = lower triangular part of A (Gauss-Seidel method)
3. S = combination of 1 and 2 (successive overrelaxation or SOR)
S is called a pre-conditioner. The choice of S affects the convergence properties of the iteration.
Jacobi Method
• S = diag(A) so formula for iteration k+1 becomes:
i 1
aii ( xi ) k 1   aij ( x j ) k 
j 1
n
 a (x )
j i 1
ij
j k
 bi
• Expressed in matrices:
Dxk+1 = - (L+U)xk + b
where D, L, and U are the diagonal, strictly lower
triangular, and strictly upper triangular parts of A,
respectively. Note: these are not the D, L, and U of the
LDU factorisation.
Jacobi Example
• Example:
A = [  2  -1 ]   S = [ 2  0 ]   T = [ 0  1 ]   S⁻¹T = [  0   1/2 ]
    [ -1   2 ]       [ 0  2 ]       [ 1  0 ]          [ 1/2   0  ]
• Then:
[ x1 ]     = [  0   1/2 ] [ x1 ]  + [ b1/2 ]
[ x2 ]k+1    [ 1/2   0  ] [ x2 ]k   [ b2/2 ]
Pseudocode for Jacobi Method
choose an initial guess x0
for k=0,1,2,...
    for i=1 to n
        sum = 0.0
        for j=1 to i-1 and i+1 to n
            sum = sum + A(i,j)*xk(j)
        end
        xk+1(i) = (b(i) – sum)/A(i,i)
    end
    check convergence and continue if needed
end
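A runnable MATLAB version of this pseudocode (a sketch; the function name, argument list, and convergence test are our own choices):
function x = jacobi(A, b, x, tol, maxit)
% Jacobi iteration: every component of the new iterate is computed
% from the complete previous iterate xold.
n = length(b);
for k = 1:maxit
    xold = x;
    for i = 1:n
        j = [1:i-1, i+1:n];                     % all indices except i
        x(i) = (b(i) - A(i,j)*xold(j)) / A(i,i);
    end
    if norm(x - xold) <= tol*norm(x)            % simple convergence check
        return
    end
end
end
For example: x = jacobi(A, b, zeros(size(b)), 1e-8, 100).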
Gauss-Seidel Method
• A problem with the Jacobi method is that we
have to store all of xk until we have finished
computing xk+1.
• In the Gauss-Seidel method the xk+1 are used
as soon as they are computed, and replace the
corresponding xk on the righthand side
i 1
aii ( xi ) k 1   aij ( x j ) k 1 
j 1
n
 a (x )
j i 1
ij
j k
 bi
• This uses half the storage of the Jacobi method.
Gauss-Seidel in Matrix Notation
• Expressed in matrices:
(D+L)xk+1 = -Uxk + b
where D+L is the lower triangular part of A,
and U is the strictly upper triangular part of
A.
Gauss-Seidel Example
• Example:
A = [  2  -1 ]   S = [  2  0 ]   T = [ 0  1 ]   S⁻¹T = [ 0  1/2 ]
    [ -1   2 ]       [ -1  2 ]       [ 0  0 ]          [ 0  1/4 ]
• Then:
[  2  0 ] [ x1 ]     = [ 0  1 ] [ x1 ]  + [ b1 ]
[ -1  2 ] [ x2 ]k+1    [ 0  0 ] [ x2 ]k   [ b2 ]
• Gauss-Seidel is better than Jacobi because it
uses half the storage and often converges
faster.
Pseudocode for Gauss-Seidel
choose an initial guess x0
for k=0,1,2,...
    for i=1 to n
        sum = 0.0
        for j=1 to i-1
            sum = sum + A(i,j)*xk+1(j)
        end
        for j=i+1 to n
            sum = sum + A(i,j)*xk(j)
        end
        xk+1(i) = (b(i) – sum)/A(i,i)
    end
    check convergence and continue if needed
end
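A runnable MATLAB version (a sketch; same caveats as the Jacobi code earlier):
function x = gauss_seidel(A, b, x, tol, maxit)
% Gauss-Seidel iteration: updated components of x are used immediately,
% so only one vector of unknowns needs to be stored.
n = length(b);
for k = 1:maxit
    xold = x;                                   % kept only for the convergence test
    for i = 1:n
        % x(1:i-1) already holds the new values; x(i+1:n) still holds the old ones
        s = A(i,1:i-1)*x(1:i-1) + A(i,i+1:n)*x(i+1:n);
        x(i) = (b(i) - s) / A(i,i);
    end
    if norm(x - xold) <= tol*norm(x)
        return
    end
end
end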