Class Notes 2: NORMS AND THE JACOBI METHOD

Math 639d
Due Date: Sept. 11
(updated: September 4, 2014)
We shall be primarily interested in solving matrix equations with real
coefficients. However, the analysis of iterative methods sometimes requires
the use of complex matrices for reasons which shall become clear in the
subsequent classes.
We shall first consider vector norms on $\mathbb{C}^n$, the set of vectors with $n$ components, each one a complex number. We start with the Euclidean norm (the standard vector length) defined by
$$\|x\|_2 = \Bigl(\sum_{i=1}^n |x_i|^2\Bigr)^{1/2}$$
for $x = (x_1, x_2, \ldots, x_n)^t \in \mathbb{C}^n$. This norm satisfies the usual norm axioms:

(2.1)
$\|x\|_2 \ge 0$, for all $x \in \mathbb{C}^n$, (non-negativity)
$\|x\|_2 = 0$, only if $x = 0$, (definiteness)
$\|\alpha x\|_2 = |\alpha|\,\|x\|_2$, for $\alpha \in \mathbb{C}$, $x \in \mathbb{C}^n$,
$\|x + y\|_2 \le \|x\|_2 + \|y\|_2$, for $x, y \in \mathbb{C}^n$, (triangle inequality).
We shall sometimes refer to this norm as the $\ell^2$ norm. This norm obviously can be restricted to real vectors in $\mathbb{R}^n$.
Norms give a notion of length to vectors. Consider the case of $\mathbb{R}^2$. You can think of vectors in $\mathbb{R}^2$ as points (the vector going from the origin to the point). The set of vectors having Euclidean norm less than or equal to one consists of the points within the circle (including the boundary) of radius one centered at the origin.
The conditions in (2.1) represent the set of axioms which a norm is required to satisfy. Many other norms are possible. For example, the "$\ell^\infty$" norm is defined by

$$\|x\|_\infty = \max_{j=1,\ldots,n} |x_j|.$$
It is not difficult to check that $\|\cdot\|_\infty$ satisfies (2.1). The set of vectors in $\mathbb{R}^2$ of length at most one in this norm is no longer a circle as was the case with the Euclidean norm. Find this set by tracing out its boundary, i.e., the set of vectors in $\mathbb{R}^2$ whose $\ell^\infty$ norm equals one.
Another useful norm is the "$\ell^1$" norm. It is given by

$$\|x\|_1 = \sum_{j=1}^n |x_j|.$$
Again, it is not hard to see that it satisfies all of the norm axioms (2.1). Sketch the set of vectors of $\mathbb{R}^2$ of length less than or equal to one when measured in the $\|\cdot\|_1$ norm.
These norms can be generalized into a whole scale of norms, i.e.,

(2.2) $$\|x\|_p = \Bigl(\sum_{j=1}^n |x_j|^p\Bigr)^{1/p}.$$

These are norms for $1 \le p < \infty$. When $0 < p < 1$, these expressions fail to satisfy the triangle inequality.
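For readers who want to experiment, the following minimal Python/NumPy sketch (not part of the notes; the test vector is an arbitrary choice) evaluates $\|x\|_p$ from (2.2), treating $p = \infty$ as the maximum of the absolute entries:

```python
import numpy as np

def lp_norm(x, p):
    """Evaluate ||x||_p as in (2.2); p = np.inf gives the l-infinity norm."""
    x = np.asarray(x, dtype=complex)
    if p == np.inf:
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])        # an arbitrary vector in R^2
for p in (1, 2, np.inf):
    print(p, lp_norm(x, p))      # prints 7.0, 5.0, 4.0
```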
Problem 1. For $n = 2$ and $p = 1/2$, find two vectors $x, y \in \mathbb{R}^2$ satisfying $\|x + y\|_p > \|x\|_p + \|y\|_p$.
We shall use norms to measure the convergence of iterative methods more precisely. Our discussion of an error vector $e_i$ converging to the zero vector in Class 1 was somewhat intuitive. In contrast, we can be more rigorous if we show that, for some norm $\|\cdot\|$ on $\mathbb{R}^n$, $\|e_i\| \to 0$ as $i \to \infty$.
Exercise 1. Let $M$ be an $n \times n$ nonsingular matrix with complex coefficients and $\|\cdot\|$ be a norm on $\mathbb{C}^n$. Show that $\|\cdot\|_M$ defined by $\|x\|_M = \|Mx\|$ is also a norm on $\mathbb{C}^n$, i.e., show that this norm satisfies the axioms (2.1).
Let $G$ be an $n \times n$ complex matrix. A vector norm $\|\cdot\|$ on $\mathbb{C}^n$ induces a matrix norm on $G$ by

(2.3) $$\|G\| = \sup_{x \in \mathbb{C}^n,\, x \neq 0} \frac{\|Gx\|}{\|x\|}.$$
The set of $n \times n$ matrices is a vector space $V$ (under matrix addition and the usual definition of scalar-matrix multiplication) and $\|G\|$ satisfies the norm axioms (2.1) on $V$. This is sometimes called the operator norm and, in general, depends on the choice of vector norm.
If $G$ is an $n \times n$ real matrix, we can define an alternative matrix norm by

(2.4) $$\|G\|_r = \sup_{x \in \mathbb{R}^n,\, x \neq 0} \frac{\|Gx\|}{\|x\|}.$$

It is clear that

$$\|G\|_r \le \|G\|.$$
Most of our results below hold for this matrix norm as well and we shall often use the same notation $\|G\|$ for both. We shall sometimes refer to this as the "real" matrix norm.
Remark 1. Because the "operator norm" is a norm, we can make conclusions of the form

$$\|A + \epsilon B\| \le \|A\| + |\epsilon|\,\|B\|.$$

Here $A$ and $B$ are $n \times n$ matrices and $\epsilon$ is a scalar. We used two of the norm axioms (the triangle inequality and homogeneity) for the above inequality.
Remark 2. The spectral radius of a real or complex matrix $A$ is defined by

$$\rho(A) = \max |\lambda_i|$$

where the maximum is over all eigenvalues $\lambda_i$ of $A$. We have that $\|A\| \ge \rho(A)$ no matter what norm $\|\cdot\|$ we choose to use on $\mathbb{C}^n$. Indeed, if $\lambda_i$ is an eigenvalue of $A$ and $x_i$ is a corresponding eigenvector, then $Ax_i = \lambda_i x_i$ so

$$\frac{\|Ax_i\|}{\|x_i\|} = \frac{\|\lambda_i x_i\|}{\|x_i\|} = |\lambda_i|.$$

Now, $\|A\|$ is the supremum of the quantity on the left hand side above over all non-zero vectors and so $\|A\| \ge \rho(A)$.
The opposite inequality does not hold in general. The matrix

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

has only the eigenvalue $\lambda = 1$ and so $\rho(A) = 1$. However, $\|A\|_\infty = 2$ as we shall see below (see Proposition 1).
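A quick NumPy check of this example (a sketch, not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

rho = max(abs(np.linalg.eigvals(A)))     # spectral radius
norm_inf = np.linalg.norm(A, np.inf)     # infinity-norm = maximum absolute row sum
print(rho, norm_inf)                     # 1.0 and 2.0, so rho(A) < ||A||_inf
```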
Remark 3. For an $n \times n$ real matrix, the real matrix norm given by (2.4) also satisfies

$$\|G\|_r \ge \rho(G).$$

This is nowhere near as obvious as in the complex case (Remark 2). We shall prove this in Class 4.
As well as satisfying the norm axioms, the operator norm satisfies two important additional inequalities. We state and prove these for the real operator norm and real matrices. The proofs for the complex case are essentially identical. Specifically, if $A, B$ are $n \times n$ real matrices and $y \in \mathbb{R}^n$,

(2.5) $$\|AB\| \le \|A\|\,\|B\| \quad \text{and} \quad \|Ay\| \le \|A\|\,\|y\|.$$
We shall prove the second inequality first. It is obviously true if $y = 0$ as both the left and right hand sides are zero. If $y \neq 0$ then by the definiteness property of the norm, $\|y\| \neq 0$ and

$$\frac{\|Ay\|}{\|y\|} \le \sup_{x \in \mathbb{R}^n,\, x \neq 0} \frac{\|Ax\|}{\|x\|} = \|A\|.$$

This is just the second inequality of (2.5) in disguise.
For the first inequality in (2.5), if $x \in \mathbb{R}^n$ with $x \neq 0$,

$$\frac{\|ABx\|}{\|x\|} = \frac{\|A(Bx)\|}{\|x\|} \le \frac{\|A\|\,\|Bx\|}{\|x\|} \le \frac{\|A\|\,\|B\|\,\|x\|}{\|x\|} = \|A\|\,\|B\|$$

where we used the second inequality of (2.5) twice above. The first inequality of (2.5) follows by taking the supremum over $x \in \mathbb{R}^n$, $x \neq 0$.
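A quick numerical sanity check of (2.5) (a sketch with random matrices, using the $\infty$-norm; the sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
y = rng.standard_normal(4)

inf = np.inf
# Both inequalities of (2.5) hold for the infinity-norm.
print(np.linalg.norm(A @ B, inf) <= np.linalg.norm(A, inf) * np.linalg.norm(B, inf))  # True
print(np.linalg.norm(A @ y, inf) <= np.linalg.norm(A, inf) * np.linalg.norm(y, inf))  # True
```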
Now, consider a linear$^1$ iterative method for solving $Ax = b$ with $A$ an $n \times n$ matrix. Let $x_1, x_2, \ldots$ be the sequence of iterates generated by the method and $x_0$ be the starting iterate. Since the method is linear, the errors $e_i = x - x_i$ are related by

$$e_{i+1} = G e_i$$

for some $n \times n$ matrix $G$. When the iteration is linear, the corresponding matrix $G$ will be referred to as the reduction matrix associated with the iteration. Let $\|\cdot\|$ be a vector norm and $\|G\|$ be the corresponding matrix norm. Applying (2.5) gives

$$\|e_{i+1}\| \le \|G\|\,\|e_i\|.$$

Repetitively applying this gives

$$\|e_i\| \le \|G\|^i\,\|e_0\|.$$

Now if $\gamma = \|G\|$ is less than one, then the $\|\cdot\|$-norm of the error converges to zero as $i \to \infty$. Moreover, each step of the iteration reduces this norm of the error by at least a factor of $\gamma$. Thus, we have shown:
A linear iteration method with reduction matrix $G$ converges for any starting iterate and any right hand side provided that there is a vector norm $\|\cdot\|$ which leads to $\|G\| < 1$.
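To illustrate the estimate $\|e_i\| \le \|G\|^i \|e_0\|$ numerically, here is a small Python sketch; the reduction matrix $G$ and the starting error are made up for illustration only:

```python
import numpy as np

# A made-up reduction matrix with ||G||_inf = 0.9 < 1.
G = np.array([[0.5, 0.4],
              [0.3, 0.6]])
gamma = np.linalg.norm(G, np.inf)    # maximum absolute row sum = 0.9

e = np.array([1.0, -2.0])            # an arbitrary initial error e_0
e0_norm = np.linalg.norm(e, np.inf)
for i in range(1, 6):
    e = G @ e
    # each iterate satisfies ||e_i||_inf <= gamma**i * ||e_0||_inf
    print(i, np.linalg.norm(e, np.inf), gamma**i * e0_norm)
```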
Thus, it is convenient (when possible) to characterize the matrix norm
corresponding to a given vector norm. Such a characterization is illustrated
in the following proposition.
$^1$Defined in Class Notes 1.
Proposition 1. Let $G$ be an $n \times n$ matrix. Then

$$\|G\|_\infty = \max_{i=1,\ldots,n} \sum_{j=1}^n |G_{ij}|.$$
Proof. Set

$$\gamma = \max_{i=1,\ldots,n} \sum_{j=1}^n |G_{ij}|.$$

Let $x$ be in $\mathbb{C}^n$ with $x \neq 0$ and define $y = Gx$. Then,

$$|y_i| = \Bigl|\sum_{j=1}^n G_{ij} x_j\Bigr| \le \sum_{j=1}^n |G_{ij}|\,|x_j| \le \sum_{j=1}^n |G_{ij}|\,\|x\|_\infty \le \gamma\,\|x\|_\infty.$$

Taking the maximum over $i = 1, \ldots, n$ gives

$$\|Gx\|_\infty = \|y\|_\infty \le \gamma\,\|x\|_\infty.$$

Dividing by $\|x\|_\infty$ and taking the supremum over all such $x$ gives $\|G\|_\infty \le \gamma$.
For the other direction, we first note that if $G = 0$, the identity is obvious. Otherwise, we let $i$ be an index for which

$$\gamma = \sum_{j=1}^n |G_{ij}|$$

and set $x \in \mathbb{C}^n$ by

$$x_j = \begin{cases} 0 & \text{if } G_{ij} = 0, \\ \bar{G}_{ij}/|G_{ij}| & \text{otherwise.} \end{cases}$$

Here $\bar{G}_{ij}$ denotes the complex conjugate of $G_{ij}$. Note that

$$\|x\|_\infty = 1 \quad \text{and} \quad \|Gx\|_\infty \ge |(Gx)_i| = \sum_{j=1}^n |G_{ij}| = \gamma.$$

Thus

$$\gamma \le \frac{\|Gx\|_\infty}{\|x\|_\infty} \le \|G\|_\infty.$$

This completes the proof of the proposition.
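The characterization in Proposition 1 is easy to test numerically. The following sketch (not part of the notes; sizes and seed are arbitrary) compares the max-row-sum formula against NumPy's built-in $\infty$-norm and against ratios $\|Gx\|_\infty/\|x\|_\infty$, which can never exceed it:

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))

row_sum_norm = np.max(np.sum(np.abs(G), axis=1))   # formula from Proposition 1
print(row_sum_norm, np.linalg.norm(G, np.inf))     # the two values agree

# Sample ratios ||Gx||_inf / ||x||_inf; each is bounded by the norm above.
for _ in range(3):
    x = rng.standard_normal(5)
    print(np.linalg.norm(G @ x, np.inf) / np.linalg.norm(x, np.inf))
```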
Example 1 (The Classical Jacobi Method). Let $A$ be an $n \times n$ matrix with nonzero diagonal entries and $D$ be the diagonal$^2$ matrix with entries $D_{jj} = A_{jj}$. The Jacobi Method for solving $Ax = b$ is given by

(2.6) $$D x_{i+1} = D x_i + (b - A x_i).$$

Note that this method is cheap to implement as it only requires simple linear operations on vectors and applying the inverse of a diagonal matrix to a vector.
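A minimal dense-matrix sketch of the iteration (2.6) in Python; the stopping tolerance, iteration cap, and the small test system are arbitrary illustrative choices, not part of the notes:

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=1000):
    """Run the Jacobi iteration (2.6): D x_{i+1} = D x_i + (b - A x_i)."""
    d = np.diag(A)                       # diagonal of A (assumed nonzero)
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        x_new = x + (b - A @ x) / d      # apply D^{-1} to the residual
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, 1.0], [2.0, 5.0]])   # diagonally dominant test matrix
b = np.array([1.0, 2.0])
print(jacobi(A, b, np.zeros(2)))         # converges to about [1/6, 1/3]
```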
Remark 4. The whole point of using iterative methods to solve matrix equations is to get a sufficiently accurate solution without doing an enormous
amount of computation. Accordingly, each step of the iterative method should
be relatively cheap (at least much cheaper than the amount of computational
work required to invert the matrix using direct methods such as Gaussian
Elimination). Thus, whenever we propose an iterative method, we will always consider the computational effort required per step. Of course, we shall
also be interested in convergence/divergence and the rate of convergence.
Exercise 2. Show that the Jacobi Method is consistent and that the error for the Jacobi Method satisfies

(2.7) $$D e_{i+1} = D e_i - A e_i,$$

i.e.,

$$e_{i+1} = (I - D^{-1} A) e_i.$$
Definition 1. A matrix $A$ is called (strictly) diagonally dominant if

$$|A_{jj}| > \sum_{i=1,\, i \neq j}^n |A_{ji}|$$

for $j = 1, \ldots, n$.
Theorem 1. If $A$ is (strictly) diagonally dominant then the Jacobi method converges for any right hand side and initial iterate.

Proof. A simple computation shows that the reduction matrix $G = (I - D^{-1}A)$ is given by

$$G_{ji} = \begin{cases} 0 & \text{if } i = j, \\ -A_{j,i}/A_{j,j} & \text{otherwise.} \end{cases}$$

Applying the previous proposition gives

$$\|G\|_\infty = \max_{j=1,\ldots,n} \frac{\sum_{i=1,\, i \neq j}^n |A_{ji}|}{|A_{jj}|}$$

which is less than one when $A$ is diagonally dominant.
$^2$A matrix is diagonal if all of its off-diagonal entries are zero, i.e., $A_{ij} = 0$ when $i \neq j$.
Remark 5. The above proof illustrates how it may be more convenient to work with a specific norm to obtain a specific result. The $\ell^\infty$ norm fits very well with the diagonal dominance assumption.
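As a quick numerical companion to Theorem 1 (a sketch; the test matrix is made up for illustration), one can check diagonal dominance and evaluate the bound $\|I - D^{-1}A\|_\infty$ directly:

```python
import numpy as np

def jacobi_contraction_factor(A):
    """Return ||I - D^{-1}A||_inf; it is < 1 when A is strictly diagonally dominant."""
    d = np.abs(np.diag(A))
    off_diag_row_sums = np.sum(np.abs(A), axis=1) - d
    assert np.all(d > off_diag_row_sums), "A is not strictly diagonally dominant"
    return np.max(off_diag_row_sums / d)

A = np.array([[4.0, 1.0, 1.0],
              [2.0, 6.0, 1.0],
              [0.0, 1.0, 3.0]])
print(jacobi_contraction_factor(A))   # 0.5 for this matrix, so Jacobi converges
```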
We shall be applying iterative methods to sparse matrices. A matrix is sparse if it contains relatively few non-zero elements. An example is the tridiagonal $n \times n$ matrix $A_3$ defined by

$$(A_3)_{ij} = \begin{cases} 2 & \text{if } i = j, \\ -1 & \text{if } |i - j| = 1, \\ 0 & \text{otherwise.} \end{cases}$$

A full $n \times n$ matrix has $n^2$ non-zero entries. In contrast, $A_3$ only has $3n - 2$ nonzero entries.
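A short sketch (illustration only; the choice $n = 6$ is arbitrary) that builds $A_3$ densely and confirms the $3n - 2$ count:

```python
import numpy as np

def build_A3(n):
    """Dense construction of the tridiagonal matrix A_3 defined above."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

A = build_A3(6)
print(np.count_nonzero(A))   # 3*6 - 2 = 16 nonzero entries
```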
Dealing with sparse matrices efficiently requires avoiding computations with the zero entries. To do this, the matrix must be stored in a scheme which involves only the nonzero entries. We shall use a modified "Compressed Sparse Row" (CSR) structure. A nice discussion of compressed
sparse row structure can be found at
http://www.cs.utk.edu/~dongarra/etemplates/node373.html.
This structure is designed so that it is easy to access the entries in a row.
Our modification is made so that it is also easy to access the diagonal entry
in any row.
The CSR structure involves three arrays, VAL, CIND and RIND. VAL is an array of real numbers and stores the actual (nonzero) entries of $A$. CIND is an integer array which contains the column indices of the nonzero entries in $A$. The lengths of VAL and CIND are equal to the number of nonzero entries in $A$. Finally, RIND is an integer array of dimension $n + 1$ and contains the row offsets (into the arrays VAL and CIND). By convention, RIND($n+1$) is set to the total number of nonzeroes plus one. This structure should become clear by examining the following example (see also the URL above).
Consider

$$A = \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1/3 & 3 & -2/3 & 0 \\ 0 & -1/4 & 4 & -3/4 \\ 0 & 0 & -1/5 & 5 \end{pmatrix}.$$
The modified CSR structure is as follows:
VAL:   2   -1   3   -1/3   -2/3   4   -1/4   -3/4   5   -1/5
CIND:  1    2   2    1      3     3    2      4     4    3

and

RIND:  1  3  6  9  11.
Note that the $i$'th entry of RIND points to the start of the nonzero values (in VAL) for the $i$'th row. It also points to the start of the column indices for that row. The modification is that we always put the diagonal entry at that location, i.e., VAL(RIND($i$)) $= A_{i,i}$. The general CSR structure does not do this. In fact, the general CSR storage scheme does not even store a diagonal entry when the diagonal entry is zero.
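To make the storage scheme concrete, here is a sketch (not part of the notes) that stores the $4 \times 4$ example above in the modified CSR arrays and uses them for a matrix-vector product. The index arrays are kept 1-based, exactly as in the tables above, and converted to Python's 0-based indexing inside the loop:

```python
import numpy as np

# Modified CSR arrays for the 4x4 example matrix A above.
# Indices are 1-based, as in the notes; RIND has n+1 entries.
VAL  = np.array([2, -1, 3, -1/3, -2/3, 4, -1/4, -3/4, 5, -1/5])
CIND = np.array([1, 2, 2, 1, 3, 3, 2, 4, 4, 3])
RIND = np.array([1, 3, 6, 9, 11])

def csr_matvec(VAL, CIND, RIND, x):
    """Compute y = A x using the (modified) CSR arrays with 1-based indices."""
    n = len(RIND) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(RIND[i] - 1, RIND[i + 1] - 1):   # nonzeros of row i+1
            y[i] += VAL[k] * x[CIND[k] - 1]
    return y

# The modification makes the diagonal easy to find: VAL[RIND(i)] = A_{i,i}.
print([VAL[RIND[i] - 1] for i in range(4)])       # [2.0, 3.0, 4.0, 5.0]

x = np.ones(4)
print(csr_matvec(VAL, CIND, RIND, x))             # row sums of A: about [1, 2, 3, 4.8]
```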