Class Notes 3: GAUSS-SEIDEL

Math 639d
Due Date: Sept. 16
(updated: September 11, 2014)
The Jacobi method in the last class was our first example of an iterative
method defined by splitting. Specifically, we started with Ax = b
and split A = D + (A − D). Moving the two terms to different sides of the
equation, we obtain
Dx = −(A − D)x + b.
The resulting iterative method replaces x on the left by x_{i+1} and
x on the right by x_i. The method is automatically consistent
(why?).
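For concreteness, here is a minimal NumPy sketch of one such Jacobi step obtained from this splitting (the function name and variable names are illustrative, not part of the routine defined below):

import numpy as np

def jacobi_step(A, b, x):
    """One Jacobi step from the splitting A = D + (A - D):
    solve D x_new = -(A - D) x + b, where D = diag(A)."""
    d = np.diag(A)                         # diagonal entries A_{i,i} (assumed nonzero)
    return (b - (A - np.diag(d)) @ x) / d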
The Gauss-Seidel method is developed in a similar fashion but uses the
splitting A = (L + D) + U, resulting in the (consistent) iterative method
(3.1)   (D + L)x_{i+1} = −U x_i + b.
Here D is the diagonal matrix with entries D_{i,i} = A_{i,i}, i = 1, . . . , n, L is
strictly lower triangular (i.e., L_{i,j} = 0 when j ≥ i) and U is strictly upper
triangular (i.e., U_{i,j} = 0 when i ≥ j).
Let e_i = x − x_i; then a simple computation (do it!) shows that
e_{i+1} = G e_i
with
(3.2)   G = −(D + L)^{−1} U.
Thus, the Gauss-Seidel method is a linear iterative method.
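For a concrete feel, a small NumPy sketch (the test matrix and right hand side are chosen arbitrarily for illustration) that forms G and checks e_1 = G e_0 for one step of (3.1):

import numpy as np

def gauss_seidel_reduction_matrix(A):
    """Form G = -(D + L)^{-1} U from the splitting A = (L + D) + U."""
    DL = np.tril(A)                    # D + L: lower triangle including the diagonal
    U = np.triu(A, k=1)                # strictly upper triangular part
    return -np.linalg.solve(DL, U)

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = np.linalg.solve(A, b)              # exact solution
x0 = np.zeros(3)
x1 = np.linalg.solve(np.tril(A), b - np.triu(A, 1) @ x0)   # one step of (3.1)
G = gauss_seidel_reduction_matrix(A)
print(np.allclose(x - x1, G @ (x - x0)))                   # e_1 = G e_0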
As in the Jacobi method, for this method to make sense, we need to
assume that A_{i,i} ≠ 0 for i = 1, . . . , n. Indeed, each step in the Gauss-Seidel
method requires inversion of the lower triangular matrix (D + L). As
det(D + L) = ∏_{i=1}^{n} D_{i,i}
(why?), D + L is nonsingular if and only if D has nonzero diagonal entries.
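Each Gauss-Seidel step therefore amounts to a forward substitution with D + L. A minimal sketch (my own illustration, dense storage assumed for simplicity):

import numpy as np

def forward_substitution(DL, r):
    """Solve (D + L) y = r, where D + L is lower triangular with nonzero diagonal."""
    n = len(r)
    y = np.zeros(n)
    for j in range(n):
        y[j] = (r[j] - DL[j, :j] @ y[:j]) / DL[j, j]
    return y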
The Gauss-Seidel method can be implemented as a “sweep.” To illustrate
this we consider the following pseudo-code:
FUNCTION gs(X, B, A, n)
    FOR j = 1, 2, . . . , n DO {
        X_j = (B_j − ∑_{k ≠ j, A_{j,k} ≠ 0} A_{j,k} X_k) A_{j,j}^{−1}
    }
    END
    RETURN
The arguments to the above routine are as follows. X and B are n-dimensional vectors. On input, X contains the vector x_i. On return, X
contains x_{i+1} (x_i is overwritten and lost). B contains the right hand side vector
b. Often the matrix A is sparse (contains relatively few nonzero entries per
row). Note that the sum above only includes terms where A_{j,k} differs from
zero. It is important that the entries with A_{j,k} = 0 are not included. To
do this, one needs to use a sparse storage structure for A. The judicious use
of a sparse matrix structure such as CSR can result in significant savings in
both execution time and memory.
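One possible realization of the sweep with CSR storage might look as follows (a sketch assuming SciPy's csr_matrix layout; function and variable names are illustrative):

import numpy as np
from scipy.sparse import csr_matrix

def gs_sweep_csr(A_csr, b, x):
    """One Gauss-Seidel sweep, overwriting x, using only the stored nonzeros of A.

    A_csr : scipy.sparse.csr_matrix with nonzero diagonal entries
    b, x  : NumPy vectors; on return x holds the next iterate
    """
    indptr, indices, data = A_csr.indptr, A_csr.indices, A_csr.data
    n = len(b)
    for j in range(n):
        s = b[j]
        diag = 0.0
        for p in range(indptr[j], indptr[j + 1]):   # loop over nonzeros in row j only
            k = indices[p]
            if k == j:
                diag = data[p]
            else:
                s -= data[p] * x[k]
        x[j] = s / diag
    return x

A = csr_matrix([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
x = np.zeros(3)
gs_sweep_csr(A, np.array([1.0, 2.0, 3.0]), x)       # x now holds x_1

Each sweep then costs work proportional to the number of stored nonzeros rather than n^2.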
Theorem 1. If A is diagonally dominant, then the Gauss-Seidel method
converges for any starting iterate and any right hand side.
Proof. Set
γ = max_{1≤j≤n} ( ∑_{k>j} |A_{j,k}| ) / ( |A_{j,j}| − ∑_{k<j} |A_{j,k}| ).
Now, by diagonal dominance, for each j,
|A_{j,j}| − ∑_{k<j} |A_{j,k}| > ∑_{k>j} |A_{j,k}|
so γ is less than one. We will show that
(3.3)   ‖G‖∞ ≤ γ
where G is the reduction matrix associated with the Gauss-Seidel method.
Suppose that x ∈ R^n with x ≠ 0. Let y = Gx (or (D + L)y = −U x). For
any j,
(3.4)   A_{j,j} y_j + ∑_{k<j} A_{j,k} y_k = − ∑_{k>j} A_{j,k} x_k.
Let j be such that ‖y‖∞ = |y_j|. Then the absolute value of the left hand side of
(3.4) is bounded from below by
|A_{j,j} y_j| − |∑_{k<j} A_{j,k} y_k| ≥ |A_{j,j}| |y_j| − ∑_{k<j} |A_{j,k}| |y_k|
                                     ≥ |A_{j,j}| |y_j| − ∑_{k<j} |A_{j,k}| |y_j|
                                     = (|A_{j,j}| − ∑_{k<j} |A_{j,k}|) ‖y‖∞.
Analogously, the absolute value of the right hand side of (3.4) is bounded
from above by
∑_{k>j} |A_{j,k}| |x_k| ≤ ∑_{k>j} |A_{j,k}| ‖x‖∞.
Combining the above gives,
(|A_{j,j}| − ∑_{k<j} |A_{j,k}|) ‖y‖∞ ≤ ∑_{k>j} |A_{j,k}| ‖x‖∞
from which (3.3) immediately follows.
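A quick numerical sanity check of (3.3), using an arbitrarily chosen diagonally dominant matrix (the values below are illustrative only):

import numpy as np

A = np.array([[5.0, 1.0, -2.0], [1.0, 4.0, 1.0], [-1.0, 2.0, 6.0]])   # diagonally dominant
G = -np.linalg.solve(np.tril(A), np.triu(A, 1))                       # Gauss-Seidel reduction matrix

absA = np.abs(A)
n = A.shape[0]
gamma = max(
    absA[j, j + 1:].sum() / (absA[j, j] - absA[j, :j].sum())           # gamma from the proof
    for j in range(n)
)
print(np.linalg.norm(G, np.inf), "<=", gamma)                          # ||G||_inf <= gamma < 1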
Our goal will be to formulate a somewhat more general theorem concerning the convergence of iterative methods. To this end, we consider the
following definitions.
Definition 1. The spectrum of an n × n matrix A (denoted by σ(A)) is the
set of eigenvalues of A, i.e.,
σ(A) = {λ : λ is an eigenvalue of A}.
Definition 2. The spectral radius ρ(A) is defined by
ρ(A) = max_{λ∈σ(A)} |λ|.
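Numerically, the spectral radius can be computed from the eigenvalues; a one-line NumPy sketch (the function name is illustrative):

import numpy as np

def spectral_radius(M):
    """rho(M): the largest modulus among the eigenvalues of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))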
Our goal will be to prove the following theorem:
Theorem 2. Let A be an n × n matrix with complex entries. Given ε > 0
there exists a norm ‖ · ‖∗ with
‖A‖∗ < ρ(A) + ε.
In some sense, this theorem reduces the question of convergence of linear
iterative methods to the computation of the spectral radius of the error
reduction matrix. This is stated more precisely in the following corollary.
Corollary 1. Let G be the reduction matrix for a linear iterative method.
Then the iterative method converges for any (complex) starting iterate and
any (complex) right hand side if and only if
ρ(G) < 1.
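A small numerical illustration of the corollary (the matrix is an arbitrary choice): compute ρ(G) for the Gauss-Seidel reduction matrix and watch the error contract.

import numpy as np

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
G = -np.linalg.solve(np.tril(A), np.triu(A, 1))      # Gauss-Seidel reduction matrix
rho = np.max(np.abs(np.linalg.eigvals(G)))

e = np.ones(3)                                       # some initial error e_0
for i in range(50):
    e = G @ e                                        # e_{i+1} = G e_i
print(rho < 1, np.linalg.norm(e, np.inf))            # rho(G) < 1 and the error is near zero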
Remark 1. The proof (below) of the corollary involves using the (proof of
the) theorem to construct a norm ‖ · ‖∗ for which
‖G‖∗ < γ = ρ(G) + (1 − ρ(G))/2 < 1.
It then follows that
(3.5)   ‖e_i‖∗ ≤ γ^i ‖e_0‖∗.
Any interesting application of iterative methods involves very large matrices. If the matrix is sparse, then the number of nonzeros might be O(n).
(Here O(n) denotes a number which is bounded by a constant times n; for
example, A may have at most 5 nonzero entries per row, in which case the
constant would be 5.) This means that each application of the matrix involves
computational work proportional to n. We would ideally like to design iterative methods which converge and have asymptotic work as close to O(n)
as possible. This is not always possible, but it gives us something to strive for.
Example 1. Consider applying Gauss-Seidel to the matrix
A_{i,j} =  1   if i = j,
          −1   if j = i + 1,
           0   otherwise.
The error reduction matrix is given by
G = −(D + L)^{−1} U = −D^{−1} U = −U.
The matrix G has only the eigenvalue 0 (why?) and so ρ(G) = 0. The
above corollary shows that Gauss-Seidel converges (for this example) for any
starting iterate and any right hand side.
Consider iterating for the solution of Ax = 0 with initial iterate given by
x_0 = (0, 0, 0, . . . , 0, −1)^t.
We clearly have
e_0 = (0, 0, 0, . . . , 0, 1)^t
and a simple computation gives
e_i = (0, 0, . . . , 0, 1, 0, . . . , 0)^t
where the 1 appears in the (n − i)'th entry (when 0 ≤ i < n). Finally, e_n = 0.
If we measure the error in the ℓ∞ norm, then
‖e_0‖∞ = ‖e_1‖∞ = · · · = ‖e_{n−1}‖∞ = 1.
Even though the corollary guarantees eventual convergence, we do not obtain
any convergence in the ℓ∞ norm until the n'th step. The work required to solve
this problem using Gauss-Seidel is O(n^2) since each iteration requires O(n)
operations.
This is a pedagogical example. Note that ‖G‖∞ = 1, so the bound
‖e_i‖∞ ≤ ‖G‖∞^i ‖e_0‖∞
is of little use. On the other hand, the norm ‖ · ‖∗ scales the
Cartesian coordinates of e_i in an unusual way, enabling (3.5) with γ < 1.
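The behavior in this example is easy to reproduce numerically; a short sketch (my own, with n = 8 chosen arbitrarily):

import numpy as np

n = 8
A = np.eye(n) - np.diag(np.ones(n - 1), k=1)    # 1 on the diagonal, -1 on the superdiagonal
x = np.zeros(n)                                 # exact solution of A x = 0
x_i = np.zeros(n); x_i[-1] = -1.0               # x_0 = (0, ..., 0, -1)^t

for i in range(n + 1):
    e_i = x - x_i
    print(i, np.linalg.norm(e_i, np.inf))       # stays 1 until step n, then drops to 0
    x_i = np.linalg.solve(np.tril(A), -np.triu(A, 1) @ x_i)   # Gauss-Seidel step with b = 0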
The above example illustrates that the corollary and remark may not
tell the whole story. The trouble in this example stems from the fact that
G has a large Jordan Block (an eigenvalue of high multiplicity with a low
dimensional eigenspace). Jordan Blocks and their relation to the proof of
the above theorem will be discussed in the next reading assignment.
We will prove the above theorem in the next class but here we prove the
corollary assuming that the theorem has already been verified.
Proof of the Corollary. Suppose that ρ(G) ≥ 1. Then there is an eigenvalue
λ with |λ| ≥ 1. Let φ be the corresponding eigenvector (notice that φ may
have to be complex even if A has real valued entries). Consider solving a
problem with initial error e_0 = φ. Then
e_i = G^i φ = λ^i φ
does not converge to zero.
If, on the other hand, ρ(G) < 1, set ε = (1 − ρ(G))/2. Then, by the
theorem, there is a norm ‖ · ‖∗ with
‖G‖∗ < ρ(G) + ε = (1 + ρ(G))/2 < 1.
It follows that
‖e_i‖∗ ≤ ((1 + ρ(G))/2)^i ‖e_0‖∗
and so ‖e_i‖∗ converges to zero as i goes to infinity.