Class Notes 3: GAUSS-SEIDEL
Math 639d
Due Date: Sept. 16 (updated: September 11, 2014)

The Jacobi method of the last class was our first example of an iterative method defined by a splitting. Specifically, we started with Ax = b and split A = D + (A − D). Moving the two terms to different sides of the equation, we obtained Dx = −(A − D)x + b. The resulting iterative method replaces x on the left by x_{i+1} and x on the right by x_i. The method is automatically consistent (why?).

The Gauss-Seidel method is developed in a similar fashion but uses the splitting A = (L + D) + U, resulting in the (consistent) iterative method

(3.1)    (D + L) x_{i+1} = −U x_i + b.

Here D is the diagonal matrix with entries D_{i,i} = A_{i,i}, i = 1, ..., n, L is strictly lower triangular (i.e., L_{i,j} = 0 when j ≥ i) and U is strictly upper triangular (i.e., U_{i,j} = 0 when i ≥ j). Let e_i = x − x_i; then a simple computation (do it!) shows that e_{i+1} = G e_i with

(3.2)    G = −(D + L)^{−1} U.

Thus, the Gauss-Seidel method is a linear iterative method. As in the Jacobi method, for this method to make sense we need to assume that A_{i,i} ≠ 0 for i = 1, ..., n. Indeed, each step of the Gauss-Seidel method requires inversion of the lower triangular matrix D + L. As

det(D + L) = ∏_{i=1}^{n} D_{i,i}

(why?), D + L is nonsingular if and only if D has nonzero diagonal entries.

The Gauss-Seidel method can be implemented as a "sweep." To illustrate this, we consider the following pseudo-code (a runnable sketch appears after the proof of Theorem 1 below):

FUNCTION gs(X, B, A, n)
   FOR j = 1, 2, ..., n DO
      X_j = (B_j − ∑_{k≠j, A_{j,k}≠0} A_{j,k} X_k) A_{j,j}^{−1}
   END
RETURN

The arguments to the above routine are as follows. X and B are n-dimensional vectors. On input, X contains the vector x_i. On return, X contains x_{i+1} (x_i is overwritten and lost). B contains the right-hand-side vector b.

Often the matrix A is sparse (contains relatively few nonzero entries per row). Note that the sum above only includes terms where A_{j,k} differs from zero. It is important that the entries with A_{j,k} = 0 are not included. To do this, one needs to use a sparse storage structure for A. The judicious use of a sparse matrix structure such as CSR can result in significant savings in both execution time and memory.

Theorem 1. If A is diagonally dominant, then the Gauss-Seidel method converges for any starting iterate and any right-hand side.

Proof. Set

γ = max_{1≤j≤n} ( ∑_{k>j} |A_{j,k}| ) / ( |A_{j,j}| − ∑_{k<j} |A_{j,k}| ).

Now, by diagonal dominance, for each j,

|A_{j,j}| − ∑_{k<j} |A_{j,k}| > ∑_{k>j} |A_{j,k}|,

so γ is less than one. We will show that

(3.3)    ‖G‖_∞ ≤ γ,

where G is the reduction matrix associated with the Gauss-Seidel method. Suppose that x ∈ R^n with x ≠ 0. Let y = Gx (equivalently, (D + L)y = −U x). For any j,

(3.4)    A_{j,j} y_j + ∑_{k<j} A_{j,k} y_k = −∑_{k>j} A_{j,k} x_k.

Let j be such that ‖y‖_∞ = |y_j|. Then the absolute value of the left-hand side of (3.4) is bounded from below by

|A_{j,j} y_j| − |∑_{k<j} A_{j,k} y_k| ≥ |A_{j,j}| |y_j| − ∑_{k<j} |A_{j,k}| |y_k|
                                     ≥ |A_{j,j}| |y_j| − ∑_{k<j} |A_{j,k}| |y_j|
                                     = (|A_{j,j}| − ∑_{k<j} |A_{j,k}|) ‖y‖_∞.

Analogously, the absolute value of the right-hand side of (3.4) is bounded from above by

∑_{k>j} |A_{j,k}| |x_k| ≤ ∑_{k>j} |A_{j,k}| ‖x‖_∞.

Combining the above gives

(|A_{j,j}| − ∑_{k<j} |A_{j,k}|) ‖y‖_∞ ≤ ∑_{k>j} |A_{j,k}| ‖x‖_∞,

from which (3.3) immediately follows. □
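To make the sweep concrete, here is a minimal Python/NumPy sketch of one Gauss-Seidel sweep. It is not part of the notes: the function name gs_sweep, the 3×3 test system, and the fixed number of 50 sweeps are illustrative choices. The sketch works on a dense array for simplicity; a serious implementation would use a sparse format such as CSR so that only the nonzero A_{j,k} are visited.

import numpy as np

def gs_sweep(X, B, A):
    # One Gauss-Seidel sweep: X holds x_i on entry and is overwritten
    # with x_{i+1}, exactly as in the pseudo-code above.
    n = len(X)
    for j in range(n):
        # Entries k < j already hold components of x_{i+1}, while entries
        # k > j still hold components of x_i; this is precisely (3.1).
        s = sum(A[j, k] * X[k] for k in range(n) if k != j and A[j, k] != 0.0)
        X[j] = (B[j] - s) / A[j, j]
    return X

# Illustrative diagonally dominant system; Theorem 1 guarantees convergence.
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([2.0, 4.0, 10.0])
x = np.zeros(3)
for _ in range(50):
    gs_sweep(x, b, A)
print(np.allclose(A @ x, b))   # expect True

Overwriting X in place is what distinguishes Gauss-Seidel from Jacobi: a Jacobi sweep would keep a separate copy of x_i while assembling x_{i+1}.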
Our goal will be to formulate a somewhat more general theorem concerning the convergence of iterative methods. To this end, we consider the following definitions.

Definition 1. The spectrum of an n × n matrix A (denoted by σ(A)) is the set of eigenvalues of A, i.e.,

σ(A) = {λ : λ is an eigenvalue of A}.

Definition 2. The spectral radius ρ(A) is defined by

ρ(A) = max_{λ∈σ(A)} |λ|.

Our goal will be to prove the following theorem:

Theorem 2. Let A be an n × n matrix with complex entries. Given ε > 0, there exists a norm ‖·‖_* with ‖A‖_* < ρ(A) + ε.

In some sense, this theorem reduces the question of convergence of linear iterative methods to the computation of the spectral radius of the error reduction matrix. This is stated more precisely in the following corollary.

Corollary 1. Let G be the reduction matrix for a linear iterative method. Then the iterative method converges for any (complex) starting iterate and any (complex) right-hand side if and only if ρ(G) < 1.

Remark 1. The proof (below) of the corollary uses the (proof of the) theorem to construct a norm ‖·‖_* for which

‖G‖_* < γ = ρ(G) + (1 − ρ(G))/2 < 1.

It then follows that

(3.5)    ‖e_i‖_* ≤ γ^i ‖e_0‖_*.

Any interesting application of iterative methods involves very large matrices. If the matrix is sparse, then the number of nonzeros might be O(n). (Here O(n) denotes a number which is bounded by a constant times n; for example, A may have at most 5 nonzero entries per row, in which case the constant would be 5.) This means that each matrix evaluation involves computational work proportional to n. We would ideally like to design iterative methods which converge and have asymptotic work as close to O(n) as possible. This is not always possible but gives us something to strive for.

Example 1. Consider applying Gauss-Seidel to the matrix defined by A_{i,j} = 1 if i = j, A_{i,j} = −1 if j = i + 1, and A_{i,j} = 0 otherwise. The error reduction matrix is given by

G = −(D + L)^{−1} U = −D^{−1} U = −U.

The matrix G has only the eigenvalue 0 (why?) and so ρ(G) = 0. The above corollary shows that Gauss-Seidel converges (for this example) for any starting iterate and any right-hand side. Consider iterating for the solution of Ax = 0 with initial iterate x_0 = (0, 0, 0, ..., 0, −1)^t. We clearly have e_0 = (0, 0, 0, ..., 0, 1)^t, and a simple computation gives

e_i = (0, 0, ..., 0, 1, 0, ..., 0)^t,

where the 1 appears in the (n − i)'th entry (when 0 ≤ i < n). Finally, e_n = 0. If we measure the error in the ℓ∞ norm, then

‖e_0‖_∞ = ‖e_1‖_∞ = ··· = ‖e_{n−1}‖_∞ = 1.

Even though the corollary guarantees eventual convergence, we do not obtain any reduction of the error in the ℓ∞ norm until the n'th step. The work required to solve this problem using Gauss-Seidel is O(n^2), since each iteration requires O(n) operations. This is a pedagogical example. Note that ‖G‖_∞ = 1, so the bound ‖e_i‖_∞ ≤ ‖G‖_∞^i ‖e_0‖_∞ shows no decay and is of little use. On the other hand, the norm ‖·‖_* unequally scales the Cartesian coordinates of e_i, enabling (3.5) with γ < 1.

The above example illustrates that the corollary and remark may not tell the whole story. The trouble in this example stems from the fact that G has a large Jordan block (an eigenvalue of high multiplicity with a low-dimensional eigenspace). Jordan blocks and their relation to the proof of the above theorem will be discussed in the next reading assignment.
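The behavior in Example 1 is easy to reproduce numerically. The following Python/NumPy sketch is not part of the notes (the size n = 8 and the variable names are illustrative): it builds A, forms G from (3.2), checks that G is nilpotent (so ρ(G) = 0) with ‖G‖_∞ = 1, and prints the ℓ∞ error, which stays equal to 1 until the n'th step.

import numpy as np

n = 8                                              # illustrative size
A = np.eye(n) - np.diag(np.ones(n - 1), 1)         # 1 on the diagonal, -1 on the superdiagonal
G = -np.linalg.solve(np.tril(A), np.triu(A, 1))    # reduction matrix (3.2); here G = -U

print(np.allclose(np.linalg.matrix_power(G, n), 0))  # True: G is nilpotent, so rho(G) = 0
print(np.linalg.norm(G, np.inf))                     # 1.0

# Iterate for Ax = 0 from x_0 = (0, ..., 0, -1)^t; the exact solution is x = 0,
# so e_i = -x_i.  For b = 0 one Gauss-Seidel step is just x_{i+1} = G x_i.
x = np.zeros(n)
x[-1] = -1.0
for i in range(n + 1):
    print(i, np.linalg.norm(x, np.inf))            # stays 1 for i = 0, ..., n-1, then 0
    x = G @ x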
We will prove the above theorem in the next class, but here we prove the corollary assuming that the theorem has already been verified.

Proof of the Corollary. Suppose that ρ(G) ≥ 1. Then there is an eigenvalue λ with |λ| ≥ 1. Let φ be a corresponding eigenvector (notice that φ may have to be complex even if A has real-valued entries). Consider solving a problem with initial error e_0 = φ. Then

e_i = G^i φ = λ^i φ

does not converge to zero. If, on the other hand, ρ(G) < 1, set ε = (1 − ρ(G))/2. Then, by the theorem, there is a norm ‖·‖_* with

‖G‖_* < ρ(G) + ε = (1 + ρ(G))/2 < 1.

It follows that

‖e_i‖_* ≤ ((1 + ρ(G))/2)^i ‖e_0‖_*,

and so ‖e_i‖_* converges to zero as i goes to infinity. Since all norms on a finite dimensional space are equivalent, e_i converges to zero in any norm. □
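As a quick numerical illustration of the corollary (not from the notes; the helper name reduction_matrix and the two 2×2 matrices are illustrative), one can form the Gauss-Seidel reduction matrix (3.2) for a given A and test whether ρ(G) < 1:

import numpy as np

def reduction_matrix(A):
    # G = -(D + L)^{-1} U from (3.2): np.tril(A) is D + L, np.triu(A, 1) is U.
    return -np.linalg.solve(np.tril(A), np.triu(A, 1))

examples = [np.array([[3.0, 1.0], [1.0, 3.0]]),    # diagonally dominant
            np.array([[1.0, 2.0], [3.0, 1.0]])]    # not diagonally dominant
for A in examples:
    G = reduction_matrix(A)
    rho = np.max(np.abs(np.linalg.eigvals(G)))
    print(rho, "converges" if rho < 1 else "diverges")
# Expected output (approximately): 0.111 converges, then 6.0 diverges.

For the second matrix ρ(G) = 6, so by the corollary there are starting iterates for which the error grows rather than decays, even though the linear system itself is perfectly solvable.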