CM0368 Scientific Computing
Spring 2009
Professor David Walker
david@cs.cf.ac.uk

Schedule
• Weeks 1 & 2 (6): Numerical linear algebra (DWW)
  – Solving Ax = b by Gaussian elimination and Gauss-Seidel.
  – Algebraic eigenvalue problem. Power method and QR method.
• Weeks 3 & 4 (6): Numerical solution of differential equations (BMB)
  – Ordinary differential equations (finite difference and Runge-Kutta methods).
  – Partial differential equations (finite difference method).
• Weeks 5 & 6 (5): Applications in Physics (BMB)
  – Laplace and Helmholtz equations, Schrödinger problem for the hydrogen atom, etc.
• Weeks 6 & 7 (4): Numerical optimisation (DWW)
• Weeks 2 – 11: Tutorials on Wednesdays at 1pm in T2.07. These will be given by Kieran Robert.

Text Book
• "Numerical Computing with MATLAB" by Cleve B. Moler, SIAM Press, 2004. ISBN 0898715601.
• http://www.readinglists.co.uk/rsl/student/sviewlist.dfp?id=20248
• Web edition at http://www.mathworks.com/moler/

Web Site
• Lecture notes and other module material can be found at:
  http://users.cs.cf.ac.uk/David.W.Walker/CM0368

Numerical Linear Algebra
• A system of n simultaneous linear equations can be represented in matrix notation as:
      Ax = b
  where A is an n×n matrix, and x and b are vectors of length n.
• We can write the solution as x = A⁻¹b, where A⁻¹ is the inverse of A.
• A square matrix is said to be non-singular if its inverse exists. If the inverse does not exist then the matrix is singular.

Geometrical Interpretation
• If A is a 2×2 matrix, for example
      [ 2  -1 ] [x1]   [  3 ]
      [ 3   4 ] [x2] = [ -1 ]
  then 2x1 - x2 = 3 and 3x1 + 4x2 = -1. Each equation represents a straight line, and the solution of the system is given by their intersection.
• If A is a 3×3 matrix, each of the three equations represents a plane, and the solution is the point lying at the intersection of the three planes.

MATLAB Solution
• In MATLAB we can find the solution to Ax = b by writing: x = A\b
• Write: A = [10 -7 0;-3 2 6;5 -1 5]
• Then: b = [7;4;6]
• Then: x = A\b

Gaussian Elimination
• Eliminate x1 from all the equations after the first.
• Then eliminate x2 from all the equations after the second.
• Then eliminate x3 from all the equations after the third.
• And so on, until after n-1 steps we have eliminated xj from all the equations after the jth, for j = 1, 2, …, n-1.
• These steps are referred to as the forward elimination stage of Gaussian elimination.

Example
      [ 10  -7   0 ] [x1]   [ 7 ]
      [ -3   2   6 ] [x2] = [ 4 ]
      [  5  -1   5 ] [x3]   [ 6 ]
Subtract -3/10 times equation 1 from equation 2, and 5/10 times equation 1 from equation 3:
      [ 10  -7    0 ] [x1]   [ 7   ]
      [  0  -0.1  6 ] [x2] = [ 6.1 ]
      [  0   2.5  5 ] [x3]   [ 2.5 ]
Next we swap equations 2 and 3. This is called partial pivoting. It is done to bring the entry of largest absolute value on or below the diagonal in column 2 onto the diagonal. This makes the algorithm more stable with respect to round-off errors (see later).

Example (continued)
      [ 10  -7    0 ] [x1]   [ 7   ]
      [  0   2.5  5 ] [x2] = [ 2.5 ]
      [  0  -0.1  6 ] [x3]   [ 6.1 ]
Now subtract -0.1/2.5 times equation 2 from equation 3:
      [ 10  -7    0   ] [x1]   [ 7   ]
      [  0   2.5  5   ] [x2] = [ 2.5 ]
      [  0   0    6.2 ] [x3]   [ 6.2 ]
This completes the forward elimination stage of the Gaussian elimination algorithm.

Pseudocode for Forward Elimination
  make b column n+1 of A
  for k=1 to n-1
    find the pivot A(m,k), the entry of largest absolute value in column k with row m >= k
    swap rows k and m
    for i=k+1 to n
      mult = A(i,k)/A(k,k)
      for j=k to n+1
        A(i,j) = A(i,j) - mult*A(k,j)
      end
    end
  end
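Forward Elimination in MATLAB (a sketch)
• A possible MATLAB version of the forward elimination pseudocode above. The function name forward_elim is just illustrative, and this is a sketch rather than production code:

  function A = forward_elim(A, b)
  % Forward elimination with partial pivoting.
  % A is n-by-n and b is a column vector of length n; b is appended
  % as column n+1 of A, and the reduced augmented matrix is returned.
  n = size(A,1);
  A = [A b];                            % make b column n+1 of A
  for k = 1:n-1
      [pivmax, m] = max(abs(A(k:n,k))); % largest entry in column k, rows k..n
      m = m + k - 1;                    % convert offset into a row number
      A([k m],:) = A([m k],:);          % swap rows k and m
      for i = k+1:n
          mult = A(i,k)/A(k,k);
          A(i,k:n+1) = A(i,k:n+1) - mult*A(k,k:n+1);
      end
  end

• For the worked example, forward_elim([10 -7 0; -3 2 6; 5 -1 5], [7;4;6]) returns the augmented matrix [10 -7 0 7; 0 2.5 5 2.5; 0 0 6.2 6.2], matching the result above.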
Back Substitution
• After the forward elimination phase, the matrix has been transformed into upper triangular form.
• Equation n just involves xn and so can now be solved immediately.
• Equation n-1 just involves xn-1 and xn, and since we already know xn we can find xn-1.
• Working our way backwards through the equations we can find xn, xn-1, …, x1.
• This is called the back substitution phase of the Gaussian elimination algorithm.

The Example Again
      [ 10  -7    0   ] [x1]   [ 7   ]
      [  0   2.5  5   ] [x2] = [ 2.5 ]
      [  0   0    6.2 ] [x3]   [ 6.2 ]
Equation 3 is 6.2 x3 = 6.2, so x3 = 1. This value is substituted into equation 2:
      2.5 x2 + (5)(1) = 2.5, so x2 = -1.
Substituting for x2 and x3 in equation 1:
      10 x1 + (-7)(-1) = 7, so x1 = 0.

Pseudocode for Back Substitution
This solves Ux = b:
  x(n) = b(n)/U(n,n)
  for k=n-1 down to 1
    sum = 0
    for j=k+1 to n
      sum = sum + U(k,j)*x(j)
    end
    x(k) = (b(k) - sum)/U(k,k)
  end

LU Factorisation
• The Gaussian elimination process can be expressed in terms of three matrices.
• The first matrix has 1's on the main diagonal and the multipliers used in the forward elimination below the diagonal. This is a lower triangular matrix with unit diagonal, and is usually denoted by L.
• The second matrix, denoted by U, is the upper triangular matrix obtained at the end of the forward elimination.
• The third matrix, denoted by P, is a permutation matrix representing the row interchanges performed in pivoting.

L, U, and P
      [  1     0     0 ]        [ 10  -7    0   ]        [ 1  0  0 ]
  L = [  0.5   1     0 ]    U = [  0   2.5  5   ]    P = [ 0  0  1 ]
      [ -0.3  -0.04  1 ]        [  0   0    6.2 ]        [ 0  1  0 ]
The original matrix can be expressed as: LU = PA.
The permutation matrix is the identity matrix with its rows reordered. If Pij = 1 then row i of the permuted matrix is row j of the original matrix. The same information can be represented in a vector p, the ith entry of which gives the number of the column containing the 1 in row i:
      p = [ 1 ]
          [ 3 ]
          [ 2 ]

Some MATLAB Code
• L, U, and P can be found in MATLAB as follows: [L,U,P] = lu(A)
• Solution of the system Ax=b is equivalent to solving the two triangular systems Ly = Pb and Ux = y.
• Once you have L, U, and P it is simple to solve the original system of equations: y = L\(P*b) and x = U\y, or just x = U\(L\(P*b))
• This should give the same answer as: x = A\b

LDU Factorisation
• It is sometimes useful to explicitly separate out the diagonal of U, which contains the pivots.
• We write U = DU', where U' is the same as U except that it has 1's on the diagonal, and D is a diagonal matrix containing the diagonal elements of U.
• With this form of the factorisation we have LDU' = PA.
      [  1     0     0 ]        [ 10  0    0   ]         [ 1  -0.7  0 ]
  L = [  0.5   1     0 ]    D = [  0  2.5  0   ]    U' = [ 0   1    2 ]
      [ -0.3  -0.04  1 ]        [  0  0    6.2 ]         [ 0   0    1 ]

Pseudocode for LU Factorisation
  make b column n+1 of A
  for k=1 to n-1
    find the pivot A(m,k), the entry of largest absolute value in column k with row m >= k
    swap rows k and m
    for i=k+1 to n
      A(i,k) = A(i,k)/A(k,k)
      for j=k+1 to n+1
        A(i,j) = A(i,j) - A(i,k)*A(k,j)
      end
    end
  end

Explanation of LU
• At stage i of forward elimination we do pivoting to find the largest absolute value in column i on or below the diagonal, and then exchange rows to bring it onto the diagonal.
• Then we subtract multiples of row i from rows i+1, …, n.
• Each of these operations can be represented by a matrix multiplication.

Elementary Matrices
• An elementary matrix, M, is one that has 1's along the main diagonal and 0's everywhere else, except for one non-zero value (-m, say) in row i and column j.
• Multiplying A by M has the effect of subtracting m times row j of matrix A from row i.
• Ignoring pivoting, the GE algorithm applies a series of elementary matrices to A to get U:
      U = Mn-1 … M2 M1 A
• If Li⁻¹ = Mi then L1 L2 … Ln-1 U = A, so taking L = L1 L2 … Ln-1 we have LU = A.
• Li is the same as Mi except the sign of the non-zero value is changed.
• With pivoting, U = Mn-1 Pn-1 … M2 P2 M1 P1 A, and it can be shown in a similar way that LU = PA, where P = Pn-1 … P2 P1.
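Checking LU = PA in MATLAB
• A quick check (a sketch) of the factorisation for the worked example, using MATLAB's built-in lu:

  A = [10 -7 0; -3 2 6; 5 -1 5];
  b = [7; 4; 6];
  [L, U, P] = lu(A);       % LU factorisation with partial pivoting
  norm(L*U - P*A)          % zero (up to round-off), confirming LU = PA
  y = L \ (P*b);           % forward substitution: solve Ly = Pb
  x = U \ y;               % back substitution:    solve Ux = y
  norm(x - A\b)            % agrees with the direct solve x = A\b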
The Need for Pivoting
• Suppose we change our problem slightly to:
      [ 10  -7      0 ] [x1]   [ 7     ]
      [ -3   2.099  6 ] [x2] = [ 3.901 ]
      [  5  -1      5 ] [x3]   [ 6     ]
  where the solution is still (0, -1, 1)ᵀ.
• Now suppose we solve the problem on a computer that does arithmetic to five significant digits.

Pivoting (continued)
• The first step of the elimination gives:
      [ 10  -7      0 ] [x1]   [ 7     ]
      [  0  -0.001  6 ] [x2] = [ 6.001 ]
      [  0   2.5    5 ] [x3]   [ 2.5   ]
• The (2,2) element is quite small. Now we continue without pivoting.
• The next step is to subtract -2.5×10³ times equation 2 from equation 3:
      (5 - (-2.5×10³)(6)) x3 = 2.5 - (-2.5×10³)(6.001)
• The right-hand side is 2.5 + 1.50025×10⁴. The second term is rounded to 1.5002×10⁴. When we add the 2.5 the result is rounded to 1.5004×10⁴.

Pivoting (continued)
• So the equation for x3 becomes:
      1.5005×10⁴ x3 = 1.5004×10⁴
  which gives x3 = 0.99993 (instead of 1).
• Then x2 is found from:
      -0.001 x2 + (6)(0.99993) = 6.001
  which gives:
      -0.001 x2 = 1.5×10⁻³, so x2 = -1.5 (instead of -1).
• Finally, x1 is found from:
      10 x1 + (-7)(-1.5) = 7
  which gives x1 = -0.35 (instead of 0).
• The problem arises from using a small pivot, which leads to a large multiplier.
• Partial pivoting ensures that the multipliers are always less than or equal to 1 in magnitude, and results in a satisfactory solution.

Measuring Errors
• The discrepancy due to rounding error in a solution can be measured by the error:
      e = x - x*
  and by the residual:
      r = b - Ax*
  where x is the exact solution and x* is the computed solution.
• e = 0 if and only if r = 0; however, e and r are not necessarily both small.

Error Example
• Consider an example in which the matrix is almost singular:
      [ 0.780  0.563 ] [x1]   [ 0.217 ]
      [ 0.913  0.659 ] [x2] = [ 0.254 ]
• If GE is used to compute the solution with low precision we get a large error but a small residual.
• Geometrically, the lines represented by the equations are almost parallel.
• GE with partial pivoting will always produce small residuals, but not necessarily small errors.
[Figure: the two nearly parallel lines, with the computed and exact solutions marked.]

Sensitivity
• We want to know how sensitive the solution of Ax=b is to perturbations in A and b.
• To do this we first have to come up with some measure of how close to singular a matrix is.
• If A is singular, a solution does not exist for some b's (inconsistency), while for other b's it is not unique.
• So if A is nearly singular we would expect small changes in A and b to cause large changes in x.
• If A is the identity matrix then x=b, so if A is close to the identity matrix we expect small changes in A and b to result in small changes in x.

Singularity and Pivots
• In GE all the pivots are non-zero if and only if A is non-singular, provided exact arithmetic is used.
• So if the pivots are small we expect the matrix to be close to singular.
• However, with finite precision arithmetic, a matrix might still be close to singular even if none of the pivots are small.

Norms
• The size of the pivots is not adequate to decide how close a matrix is to being singular.
• Define the lp family of norms (1 ≤ p ≤ ∞):
      ||x||p = ( |x1|^p + |x2|^p + … + |xn|^p )^(1/p)
• The most common norms use p = 1, 2, and ∞:
      ||x||1 = |x1| + |x2| + … + |xn|
      ||x||2 = ( |x1|² + |x2|² + … + |xn|² )^(1/2)
      ||x||∞ = max_i |xi|

Properties of Norms
• All these norms have the following properties:
      ||x|| > 0 if x ≠ 0
      ||0|| = 0
      ||cx|| = |c| ||x|| for all scalars c
      ||x + y|| ≤ ||x|| + ||y||   (the triangle inequality)
• In MATLAB use norm(x,p) to find a norm:
      norm1 = norm(x,1)
      norm2 = norm(x)
      norminf = norm(x,inf)

Condition Number
• Ax may have a very different norm from x, and this change in norm is related to the sensitivity of the solution to changes in A. Denote:
      M = max ||Ax|| / ||x||    and    m = min ||Ax|| / ||x||
  where the max and min are taken over all non-zero vectors x.
• Note that if A is singular, m = 0.
• The condition number, κ(A), of A is defined as the ratio M/m. Usually we are interested in the order of magnitude of κ(A), so it doesn't matter which norm is used.
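Error and Residual in MATLAB
• Now that the condition number has been defined, the near-singular system from the Error Example slide can be explored in MATLAB. This is only a sketch: the approximate solution used below is an illustrative poor guess (not taken from the lecture), chosen so that its residual is tiny even though its error is large:

  A = [0.780 0.563; 0.913 0.659];
  b = [0.217; 0.254];
  x_exact = [1; -1];              % exact solution of this system
  x_poor  = [0.341; -0.087];      % illustrative approximate "solution"
  r = b - A*x_poor;               % residual: about 1e-6 in size
  e = x_exact - x_poor;           % error: of order 1
  cond(A)                         % very large, so A is close to singular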
Relative Errors
• Suppose we have an error δb in b, which results in an error δx in x. Then Ax=b becomes:
      A(x + δx) = b + δb,  so  A δx = δb.
• From the definitions of M and m we have:
      ||b|| ≤ M ||x||    and    ||δb|| ≥ m ||δx||
• Then if m ≠ 0 we have the following relationship between the relative error in b and the relative error in x:
      ||δx|| / ||x||  ≤  κ(A) ||δb|| / ||b||
  κ(A) measures the amplification of the relative error!

Uses of Condition Number
1. As a measure of the amplification of the relative error due to changes in the right-hand side b.
2. As a measure of the amplification of the relative error due to changes in the matrix A.
3. As a measure of how close a matrix is to being singular (hard maths omitted). If κ(A) is large then A is close to singular.

Some Properties of κ(A)
• κ(A) ≥ 1, since M ≥ m.
• κ(P) = 1, if P is a permutation matrix.
• κ(cA) = κ(A) for any non-zero scalar c.
• For D a diagonal matrix, κ(D) is the ratio of the largest diagonal value (in magnitude) to the smallest.

Relative Error Example
• In this example we use the l1-norm:
      A = [ 4.1  2.8 ]      b = [ 4.1 ]
          [ 9.7  6.6 ]          [ 9.7 ]
• The solution is x = (1, 0)ᵀ, and ||b|| = 13.8 and ||x|| = 1.
• Change b slightly to:
      b~ = [ 4.11 ]    then x becomes    x~ = [ 0.34 ]
           [ 9.70 ]                           [ 0.97 ]
  A small change in b gives a big change in x.
• The errors are ||δb|| = 0.01 and ||δx|| = 1.63, so the relative errors are:
      ||δb|| / ||b|| = 0.0007246        ||δx|| / ||x|| = 1.63
  κ(A) is large: 1.63 / 0.0007246 = 2249.4.

Relative Error and Residual
• The condition number is also important for the round-off error in GE. It can be shown that:
      ||b - Ax*|| / ( ||A|| ||x*|| )  ≤  ρ ε
      ||x - x*|| / ||x*||             ≤  ρ κ(A) ε
  where x* is the numerical solution of Ax=b, ρ is a constant smaller than about 10, and ε is the machine precision.
• The first inequality says that the relative residual will be about the same size as the round-off error, no matter how ill-conditioned A is.
• The second inequality says that the relative error is small if κ(A) is small, but might be large if κ(A) is large.

Matrix Norms
• The norm of matrix A is ||A|| = M, i.e.,
      ||A|| = max ||Ax|| / ||x||   (over all non-zero x)
• Since ||A⁻¹|| = 1/m, it follows that the condition number can also be defined as:
      κ(A) = ||A|| ||A⁻¹||
• In MATLAB, cond(A,p) computes the condition number of A relative to the lp-norm.

Iterative methods for Ax=b
• Gaussian elimination is a direct method that computes the solution of Ax=b in about n³/3 operations.
• If n is large we might want a faster, less accurate, method.
• With an iterative method we can stop the iteration whenever we think we have a sufficiently accurate solution.

Iterative methods
• Suppose we split the matrix as A = S - T; then Sx = Tx + b.
• We can turn this into an iteration:
      S x_k+1 = T x_k + b    or    x_k+1 = S⁻¹T x_k + S⁻¹b
• So if this sequence converges, we can start the iteration with a guess at the solution x_0 and get an approximate solution.
• We need S to be easily invertible.

Common Iterative Methods
1. S = diagonal part of A (Jacobi's method)
2. S = lower triangular part of A (Gauss-Seidel method)
3. S = a combination of 1 and 2 (successive over-relaxation, or SOR)
S is called a pre-conditioner. The choice of S affects the convergence properties of the solution.

Jacobi Method
• S = diag(A), so the formula for iteration k+1 becomes:
      aii (xi)_k+1 = bi - Σj<i aij (xj)_k - Σj>i aij (xj)_k
• Expressed in matrices:
      D x_k+1 = -(L+U) x_k + b
  where D, L, and U are the diagonal, strictly lower triangular, and strictly upper triangular parts of A, respectively. Note: these are not the D, L, and U of the LDU factorisation.
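Jacobi Iteration in MATLAB (a sketch)
• A minimal MATLAB version of the Jacobi iteration above. The function name jacobi, the tolerance test, and the iteration limit are illustrative choices, not part of the lecture:

  function x = jacobi(A, b, x0, tol, maxit)
  % Jacobi iteration for Ax = b:  D*x_new = -(L+U)*x_old + b,
  % where D is the diagonal of A and L+U is the rest of A.
  n = length(b);
  x = x0;
  for k = 1:maxit
      xnew = zeros(n,1);
      for i = 1:n
          j = [1:i-1, i+1:n];              % all columns except i
          xnew(i) = (b(i) - A(i,j)*x(j)) / A(i,i);
      end
      if norm(xnew - x, inf) < tol         % simple convergence test
          x = xnew;
          return
      end
      x = xnew;
  end

• For example, jacobi([2 -1; -1 2], [1; 1], [0; 0], 1e-10, 100) converges to (1, 1)ᵀ, the solution of the 2×2 matrix used in the examples that follow with right-hand side b = (1, 1)ᵀ.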
Jacobi Example
• Example:
      A = [  2  -1 ]    S = [ 2  0 ]    T = [ 0  1 ]    S⁻¹T = [ 0    1/2 ]
          [ -1   2 ]        [ 0  2 ]        [ 1  0 ]           [ 1/2  0   ]
  Then:
      [x1]         [ 0    1/2 ] [x1]     [ b1/2 ]
      [x2]_k+1  =  [ 1/2  0   ] [x2]_k + [ b2/2 ]

Pseudocode for Jacobi Method
  choose an initial guess x_0
  for k=0,1,2,…
    for i=1 to n
      sum = 0.0
      for j=1 to i-1 and j=i+1 to n
        sum = sum + A(i,j)*x_k(j)
      end
      x_k+1(i) = (b(i) - sum)/A(i,i)
    end
    check convergence and continue if needed
  end

Gauss-Seidel Method
• A problem with the Jacobi method is that we have to store all of x_k until we have finished computing x_k+1.
• In the Gauss-Seidel method the components of x_k+1 are used as soon as they are computed, and replace the corresponding components of x_k on the right-hand side:
      aii (xi)_k+1 = bi - Σj<i aij (xj)_k+1 - Σj>i aij (xj)_k
• This uses half the storage of the Jacobi method.

Gauss-Seidel in Matrix Notation
• Expressed in matrices:
      (D+L) x_k+1 = -U x_k + b
  where D+L is the lower triangular part of A, and U is the strictly upper triangular part of A.

Gauss-Seidel Example
• Example:
      A = [  2  -1 ]    S = [  2  0 ]    T = [ 0  1 ]    S⁻¹T = [ 0  1/2 ]
          [ -1   2 ]        [ -1  2 ]        [ 0  0 ]           [ 0  1/4 ]
  Then:
      [  2  0 ] [x1]         [ 0  1 ] [x1]     [ b1 ]
      [ -1  2 ] [x2]_k+1  =  [ 0  0 ] [x2]_k + [ b2 ]
• Gauss-Seidel is better than Jacobi because it uses half the storage and often converges faster.

Pseudocode for Gauss-Seidel
  choose an initial guess x_0
  for k=0,1,2,…
    for i=1 to n
      sum = 0.0
      for j=1 to i-1
        sum = sum + A(i,j)*x_k+1(j)
      end
      for j=i+1 to n
        sum = sum + A(i,j)*x_k(j)
      end
      x_k+1(i) = (b(i) - sum)/A(i,i)
    end
    check convergence and continue if needed
  end
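Gauss-Seidel in MATLAB (a sketch)
• A matching MATLAB version of the Gauss-Seidel pseudocode above (again, the function name gauss_seidel and the convergence test are illustrative choices). Because new components overwrite old ones in place, only a single copy of x is needed for the iteration itself:

  function x = gauss_seidel(A, b, x0, tol, maxit)
  % Gauss-Seidel iteration for Ax = b: each new component x(i) is used
  % as soon as it has been computed (the update is made in place).
  n = length(b);
  x = x0;
  for k = 1:maxit
      xold = x;                            % kept only for the convergence test
      for i = 1:n
          s = A(i,1:i-1)*x(1:i-1) + A(i,i+1:n)*x(i+1:n);
          x(i) = (b(i) - s) / A(i,i);
      end
      if norm(x - xold, inf) < tol         % simple convergence test
          return
      end
  end

• For the same 2×2 system, gauss_seidel([2 -1; -1 2], [1; 1], [0; 0], 1e-10, 100) also converges to (1, 1)ᵀ, and in fewer iterations than Jacobi.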