Jim Lambers
MAT 610
Summer Session 2009-10
Lecture 3 Notes
These notes correspond to Sections 2.5-2.7 in the text.
Orthogonality
A set of vectors {x1 , x2 , . . . , xπ‘˜ } in ℝ𝑛 is said to be orthogonal if x𝑖𝑇 x𝑗 = 0 whenever 𝑖 ≠ 𝑗. If, in
addition, x𝑖𝑇 x𝑖 = 1 for 𝑖 = 1, 2, . . . , π‘˜, then we say that the set is orthonormal. In this case, we can
also write x𝑖𝑇 x𝑗 = 𝛿𝑖𝑗 , where

    𝛿𝑖𝑗 = { 1,  𝑖 = 𝑗,
          { 0,  𝑖 ≠ 𝑗,

is called the Kronecker delta.
Example The vectors x = [ 1  −3  2 ]𝑇 and y = [ 4  2  1 ]𝑇 are orthogonal, as x𝑇 y = 1(4) −
3(2) + 2(1) = 4 − 6 + 2 = 0. Using the fact that βˆ₯xβˆ₯2 = √(x𝑇 x), we can normalize these vectors to
obtain orthonormal vectors

    x̃ = x/βˆ₯xβˆ₯2 = (1/√14) [ 1  −3  2 ]𝑇 ,    ỹ = y/βˆ₯yβˆ₯2 = (1/√21) [ 4  2  1 ]𝑇 ,

which satisfy x̃𝑇 x̃ = ỹ𝑇 yΜƒ = 1, x̃𝑇 yΜƒ = 0. β–‘
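These computations are easy to check numerically. The following sketch (it assumes NumPy, which is not part of the original notes) verifies the inner products and normalizations above.

    # Hypothetical NumPy check of the example above: verify orthogonality,
    # then normalize x and y and confirm the results are orthonormal.
    import numpy as np

    x = np.array([1.0, -3.0, 2.0])
    y = np.array([4.0, 2.0, 1.0])
    print(x @ y)                          # 0.0, so x and y are orthogonal

    x_hat = x / np.linalg.norm(x)         # divide by ||x||_2 = sqrt(14)
    y_hat = y / np.linalg.norm(y)         # divide by ||y||_2 = sqrt(21)
    print(x_hat @ x_hat, y_hat @ y_hat)   # both 1.0 (up to roundoff)
    print(x_hat @ y_hat)                  # 0.0 (up to roundoff)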
The concept of orthogonality extends to subspaces. If 𝑆1 , 𝑆2 , . . . , π‘†π‘˜ is a set of subspaces of β„π‘š ,
we say that these subspaces are mutually orthogonal if x𝑇 y = 0 whenever x ∈ 𝑆𝑖 , y ∈ 𝑆𝑗 , for 𝑖 ≠ 𝑗.
Furthermore, we define the orthogonal complement of a subspace 𝑆 ⊆ β„π‘š to be the set 𝑆⊥ defined
by

    𝑆⊥ = {y ∈ β„π‘š ∣ x𝑇 y = 0 for all x ∈ 𝑆}.
Example Let 𝑆 be the subspace of ℝ3 defined by 𝑆 = span{v1 , v2 }, where

    v1 = [ 1  1  1 ]𝑇 ,    v2 = [ 1  −1  1 ]𝑇 .

Then 𝑆 has the orthogonal complement 𝑆⊥ = span{ [ 1  0  −1 ]𝑇 }.
It can be verified directly, by computing inner products, that the single basis vector of 𝑆⊥ ,
v3 = [ 1  0  −1 ]𝑇 , is orthogonal to each of the basis vectors of 𝑆, v1 and v2 . It follows that any
multiple of this vector, which describes any vector in the span of v3 , is orthogonal to any linear
combination of these basis vectors, which accounts for any vector in 𝑆, due to the linearity of the
inner product:

    (𝑐3 v3 )𝑇 (𝑐1 v1 + 𝑐2 v2 ) = 𝑐3 𝑐1 v3𝑇 v1 + 𝑐3 𝑐2 v3𝑇 v2 = 𝑐3 𝑐1 (0) + 𝑐3 𝑐2 (0) = 0.

Therefore, any vector in 𝑆⊥ is in fact orthogonal to any vector in 𝑆. β–‘
It is helpful to note that given a basis {v1 , v2 } for any two-dimensional subspace 𝑆 ⊆ ℝ3 , a basis
{v3 } for 𝑆 ⊥ can be obtained by computing the cross-product of the basis vectors of 𝑆,
    v3 = v1 × v2 = [ (v1)2(v2)3 − (v1)3(v2)2
                     (v1)3(v2)1 − (v1)1(v2)3
                     (v1)1(v2)2 − (v1)2(v2)1 ] .
A similar process can be used to compute the orthogonal complement of any (𝑛 − 1)-dimensional
subspace of ℝ𝑛 . This orthogonal complement will always have dimension 1, because the sum of the
dimension of any subspace of a finite-dimensional vector space, and the dimension of its orthogonal
complement, is equal to the dimension of the vector space. That is, if dim(𝑉 ) = 𝑛, and 𝑆 ⊆ 𝑉 is a
subspace of 𝑉 , then
dim(𝑆) + dim(𝑆 ⊥ ) = 𝑛.
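As a concrete illustration (a sketch assuming NumPy, not part of the original notes), the cross product recovers the orthogonal complement found in the earlier example, and the dimensions add up as claimed.

    # Hypothetical NumPy sketch: a basis for the orthogonal complement of a
    # two-dimensional subspace of R^3, via the cross product of its basis vectors.
    import numpy as np

    v1 = np.array([1.0, 1.0, 1.0])
    v2 = np.array([1.0, -1.0, 1.0])
    v3 = np.cross(v1, v2)          # basis vector for the orthogonal complement
    print(v3)                      # [ 2.  0. -2.], a multiple of [1, 0, -1]^T
    print(v3 @ v1, v3 @ v2)        # both 0.0
    # dim(S) + dim(S-perp) = 2 + 1 = 3, consistent with the formula above.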
Let 𝑆 = range(𝐴) for an π‘š × π‘› matrix 𝐴. By the definition of the range of a matrix, if y ∈ 𝑆,
then y = 𝐴x for some vector x ∈ ℝ𝑛 . If z ∈ β„π‘š is such that z𝑇 y = 0, then we have
z𝑇 𝐴x = z𝑇 (𝐴𝑇 )𝑇 x = (𝐴𝑇 z)𝑇 x = 0.
If this is true for all y ∈ 𝑆, that is, if z ∈ 𝑆 ⊥ , then we must have 𝐴𝑇 z = 0. Therefore, z ∈ null(𝐴𝑇 ),
and we conclude that 𝑆 ⊥ = null(𝐴𝑇 ). That is, the orthogonal complement of the range is the null
space of the transpose.
Example Let

    𝐴 = [ 1   1
          2  −1
          3   2 ] .

If 𝑆 = range(𝐴) = span{a1 , a2 }, where a1 and a2 are the columns of 𝐴, then 𝑆⊥ = span{b}, where
b = a1 × a2 = [ 7  1  −3 ]𝑇 . We then note that

    𝐴𝑇 b = [ 1   2  3 ] [ 7  1  −3 ]𝑇 = [ 0  0 ]𝑇 .
           [ 1  −1  2 ]
That is, b is in the null space of 𝐴𝑇 . In fact, because of the relationships
rank(𝐴𝑇 ) + dim(null(𝐴𝑇 )) = 3,
rank(𝐴) = rank(𝐴𝑇 ) = 2,
we have dim(null(𝐴𝑇 )) = 1, so null(𝐴𝑇 ) = span{b}. β–‘
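This, too, can be verified in a few lines (a sketch assuming NumPy): b = a1 × a2 is annihilated by 𝐴𝑇 , so it spans null(𝐴𝑇 ).

    # Hypothetical NumPy check: b = a1 x a2 satisfies A^T b = 0, so b spans
    # null(A^T), the orthogonal complement of range(A).
    import numpy as np

    A = np.array([[1.0,  1.0],
                  [2.0, -1.0],
                  [3.0,  2.0]])
    b = np.cross(A[:, 0], A[:, 1])
    print(b)                       # [ 7.  1. -3.]
    print(A.T @ b)                 # [0. 0.]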
A basis for a subspace 𝑆 ⊆ β„π‘š is an orthonormal basis if the vectors in the basis form an
orthonormal set. For example, let 𝑄1 be an π‘š × π‘Ÿ matrix with orthonormal columns. Then these
columns form an orthonormal basis for range(𝑄1 ). If π‘Ÿ < π‘š, then this basis can be extended to a
basis of all of β„π‘š by obtaining an orthonormal basis of range(𝑄1 )⊥ = null(𝑄𝑇1 ).
If we define 𝑄2 to be a matrix whose columns form an orthonormal basis of range(𝑄1 )⊥ , then
the columns of the π‘š × π‘š matrix 𝑄 = [ 𝑄1  𝑄2 ] are also orthonormal. It follows that 𝑄𝑇 𝑄 = 𝐼.
That is, 𝑄𝑇 = 𝑄−1 . Furthermore, because the inverse of a matrix is unique, 𝑄𝑄𝑇 = 𝐼 as well. We
say that 𝑄 is an orthogonal matrix.
One interesting property of orthogonal matrices is that they preserve certain norms. For example,
if 𝑄 is an orthogonal 𝑛 × π‘› matrix, then βˆ₯𝑄xβˆ₯2² = x𝑇 𝑄𝑇 𝑄x = x𝑇 x = βˆ₯xβˆ₯2². That is, orthogonal
matrices preserve β„“2 -norms. Geometrically, multiplication by an 𝑛 × π‘› orthogonal matrix preserves
the length of a vector, and performs a rotation in 𝑛-dimensional space.
Example The matrix
    𝑄 = [  cos πœƒ   0   sin πœƒ
             0     1     0
          − sin πœƒ   0   cos πœƒ ]
is an orthogonal matrix, for any angle πœƒ. For any vector x ∈ ℝ3 , the vector 𝑄x can be obtained by
rotating x clockwise by the angle πœƒ in the π‘₯𝑧-plane, while 𝑄𝑇 = 𝑄−1 performs a counterclockwise
rotation by πœƒ in the π‘₯𝑧-plane. β–‘
In addition, orthogonal matrices preserve the matrix β„“2 and Frobenius norms. That is, if 𝐴 is
an π‘š × π‘› matrix, and 𝑄 and 𝑍 are π‘š × π‘š and 𝑛 × π‘› orthogonal matrices, respectively, then
βˆ₯𝑄𝐴𝑍βˆ₯𝐹 = βˆ₯𝐴βˆ₯𝐹 ,
βˆ₯𝑄𝐴𝑍βˆ₯2 = βˆ₯𝐴βˆ₯2 .
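The following sketch (assuming NumPy; the random test matrices are our own) checks these invariance properties for the rotation matrix above and for orthogonal factors 𝑄 and 𝑍.

    # Hypothetical NumPy check: Q(theta) is orthogonal, preserves vector l2-norms,
    # and orthogonal factors leave the matrix l2- and Frobenius norms unchanged.
    import numpy as np

    theta = 0.3
    c, s = np.cos(theta), np.sin(theta)
    Q = np.array([[ c,  0.0,  s],
                  [0.0, 1.0, 0.0],
                  [-s,  0.0,  c]])
    print(np.allclose(Q.T @ Q, np.eye(3)))            # True: Q^T Q = I

    x = np.array([1.0, -3.0, 2.0])
    print(np.linalg.norm(Q @ x), np.linalg.norm(x))   # equal: ||Qx||_2 = ||x||_2

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    Z, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # another orthogonal matrix
    print(np.isclose(np.linalg.norm(Q @ A @ Z), np.linalg.norm(A)))        # Frobenius norm
    print(np.isclose(np.linalg.norm(Q @ A @ Z, 2), np.linalg.norm(A, 2)))  # l2-norm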
The Singular Value Decomposition
The matrix β„“2 -norm can be used to obtain a highly useful decomposition of any π‘š × π‘› matrix 𝐴.
Let x be a unit β„“2 -norm vector (that is, βˆ₯xβˆ₯2 = 1) such that βˆ₯𝐴xβˆ₯2 = βˆ₯𝐴βˆ₯2 . If z = 𝐴x, and we
define y = z/βˆ₯zβˆ₯2 , then we have 𝐴x = 𝜎y, with 𝜎 = βˆ₯𝐴βˆ₯2 , and both x and y are unit vectors.
𝑛 and β„π‘š , respectively, to obtain
We can extend x and y,
to orthonormal [bases of ℝ
]
]
[ separately,
orthogonal matrices 𝑉1 = x 𝑉2 ∈ ℝ𝑛×𝑛 and π‘ˆ1 = y π‘ˆ2 ∈ β„π‘š×π‘š . We then have
[ 𝑇 ]
]
[
y
𝑇
π‘ˆ1 𝐴𝑉1 =
𝐴 x 𝑉2
𝑇
π‘ˆ2
[ 𝑇
]
y 𝐴x y𝑇 𝐴𝑉2
=
π‘ˆ2𝑇 𝐴x π‘ˆ2𝑇 𝐴𝑉2
[
]
𝜎y𝑇 y y𝑇 𝐴𝑉2
=
πœŽπ‘ˆ2𝑇 y π‘ˆ2𝑇 𝐴𝑉2
[
]
𝜎 w𝑇
=
0 𝐡
≡ 𝐴1 ,
where 𝐡 = π‘ˆ2𝑇 𝐴𝑉2 and w = 𝑉2𝑇 𝐴𝑇 y. Now, let

    s = [ 𝜎 ]
        [ w ] .

Then βˆ₯sβˆ₯2² = 𝜎² + βˆ₯wβˆ₯2². Furthermore, the first component of 𝐴1 s is 𝜎² + βˆ₯wβˆ₯2².
It follows from the invariance of the matrix β„“2 -norm under orthogonal transformations that

    𝜎² = βˆ₯𝐴βˆ₯2² = βˆ₯𝐴1 βˆ₯2² ≥ βˆ₯𝐴1 sβˆ₯2²/βˆ₯sβˆ₯2² ≥ (𝜎² + βˆ₯wβˆ₯2²)²/(𝜎² + βˆ₯wβˆ₯2²) = 𝜎² + βˆ₯wβˆ₯2²,

and therefore we must have w = 0. We then have

    π‘ˆ1𝑇 𝐴𝑉1 = 𝐴1 = [ 𝜎  0 ]
                    [ 0  𝐡 ] .
Continuing this process on 𝐡, and keeping in mind that βˆ₯𝐡βˆ₯2 ≤ βˆ₯𝐴βˆ₯2 , we obtain the decomposition
π‘ˆ 𝑇 𝐴𝑉 = Σ
where
π‘ˆ=
[
u1 ⋅ ⋅ ⋅ uπ‘š
]
∈ β„π‘š×π‘š ,
𝑉 =
[
v1 ⋅ ⋅ ⋅ v𝑛
]
∈ ℝ𝑛×𝑛
are both orthogonal matrices, and
Σ = diag(𝜎1 , . . . , πœŽπ‘ ) ∈ β„π‘š×𝑛 ,
𝑝 = min{π‘š, 𝑛}
is a diagonal matrix, in which Σ𝑖𝑖 = πœŽπ‘– for 𝑖 = 1, 2, . . . , 𝑝, and Σ𝑖𝑗 = 0 for 𝑖 βˆ•= 𝑗. Furthermore, we
have
βˆ₯𝐴βˆ₯2 = 𝜎1 ≥ 𝜎2 ≥ ⋅ ⋅ ⋅ ≥ πœŽπ‘ ≥ 0.
This decomposition of 𝐴 is called the singular value decomposition, or SVD. It is more commonly
written as a factorization of 𝐴,
𝐴 = π‘ˆ Σ𝑉 𝑇 .
The diagonal entries of Σ are the singular values of 𝐴. The columns of π‘ˆ are the left singular
vectors, and the columns of 𝑉 are the right singular vectors. It follows from the SVD itself that
the singular values and vectors satisfy the relations
𝐴𝑇 u𝑖 = πœŽπ‘– v𝑖 ,
𝐴v𝑖 = πœŽπ‘– u𝑖 ,
𝑖 = 1, 2, . . . , min{π‘š, 𝑛}.
For convenience, we denote the 𝑖th largest singular value of 𝐴 by πœŽπ‘– (𝐴), and the largest and smallest
singular values are commonly denoted by πœŽπ‘šπ‘Žπ‘₯ (𝐴) and πœŽπ‘šπ‘–π‘› (𝐴), respectively.
The SVD conveys much useful information about the structure of a matrix, particularly with
regard to systems of linear equations involving the matrix. Let π‘Ÿ be the number of nonzero singular
values. Then π‘Ÿ is the rank of 𝐴, and
range(𝐴) = span{u1 , . . . , uπ‘Ÿ },
null(𝐴) = span{vπ‘Ÿ+1 , . . . , v𝑛 }.
That is, the SVD yields orthonormal bases of the range and null space of 𝐴.
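In floating-point arithmetic, these bases are obtained from a computed SVD by treating sufficiently small singular values as zero. A sketch of this (assuming NumPy; the tolerance below is our own choice, not from the notes):

    # Hypothetical NumPy sketch: compute an SVD and read off the rank and
    # orthonormal bases for range(A) and null(A).
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # rank-2, 5 x 3

    U, sigma, Vt = np.linalg.svd(A)        # full SVD: U is 5 x 5, Vt is 3 x 3
    tol = 1e-12 * sigma[0]                 # treat singular values below tol as zero
    r = int(np.sum(sigma > tol))           # rank of A (here 2)
    range_basis = U[:, :r]                 # orthonormal basis of range(A)
    null_basis = Vt[r:, :].T               # orthonormal basis of null(A)
    print(r, np.allclose(A @ null_basis, 0.0))   # 2 True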
It follows that we can write

    𝐴 = βˆ‘_{𝑖=1}^{π‘Ÿ} πœŽπ‘– u𝑖 v𝑖𝑇 .
This is called the SVD expansion of 𝐴. If π‘š ≥ 𝑛, then this expansion yields the “economy-size”
SVD
𝐴 = π‘ˆ1 Σ 1 𝑉 𝑇 ,
where

    π‘ˆ1 = [ u1 ⋅ ⋅ ⋅ u𝑛 ] ∈ β„π‘š×𝑛 ,    Σ1 = diag(𝜎1 , . . . , πœŽπ‘› ) ∈ ℝ𝑛×𝑛 .
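Both forms are easy to produce numerically; in NumPy (an assumption here, not part of the notes) the economy-size factorization corresponds to full_matrices=False.

    # Hypothetical NumPy sketch: the economy-size SVD and the SVD expansion
    # A = sum_i sigma_i u_i v_i^T for an m x n matrix with m >= n.
    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 3))

    U1, sigma, Vt = np.linalg.svd(A, full_matrices=False)   # U1 is 6 x 3
    print(np.allclose(A, U1 @ np.diag(sigma) @ Vt))          # True

    # Rebuild A term by term from the SVD expansion.
    A_sum = sum(sigma[i] * np.outer(U1[:, i], Vt[i, :]) for i in range(len(sigma)))
    print(np.allclose(A, A_sum))                             # True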
Example The matrix
    𝐴 = [ 11  19  11
           9  21   9
          10  20  10 ]

has the SVD 𝐴 = π‘ˆ Σ𝑉 𝑇 , where

    π‘ˆ = [ 1/√3   −1/√2    1/√6
          1/√3    1/√2    1/√6
          1/√3      0    −√(2/3) ] ,

    𝑉 = [  1/√6   −1/√3    1/√2
          √(2/3)   1/√3      0
           1/√6   −1/√3   −1/√2 ] ,

and

    Σ = [ 42.4264     0      0
              0     2.4495   0
              0        0     0 ] .

Let π‘ˆ = [ u1  u2  u3 ] and 𝑉 = [ v1  v2  v3 ] be column partitions of π‘ˆ and 𝑉 , respectively.
Because there are only two nonzero singular values, we have rank(𝐴) = 2. Furthermore, range(𝐴) =
span{u1 , u2 }, and null(𝐴) = span{v3 }. We also have
𝐴 = 42.4264u1 v1𝑇 + 2.4495u2 v2𝑇 .
β–‘
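These values can be reproduced with a few lines of code (a sketch; the use of NumPy is our own assumption):

    # Hypothetical NumPy check of the example: singular values, null space, and
    # the two-term SVD expansion of A.
    import numpy as np

    A = np.array([[11.0, 19.0, 11.0],
                  [ 9.0, 21.0,  9.0],
                  [10.0, 20.0, 10.0]])

    U, sigma, Vt = np.linalg.svd(A)
    print(np.round(sigma, 4))                 # approximately [42.4264, 2.4495, 0]
    print(np.allclose(A @ Vt[2, :], 0.0))     # True: v3 spans null(A)

    A_rebuilt = sigma[0] * np.outer(U[:, 0], Vt[0, :]) + sigma[1] * np.outer(U[:, 1], Vt[1, :])
    print(np.allclose(A, A_rebuilt))          # True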
The SVD is also closely related to the β„“2 -norm and Frobenius norm. We have

    βˆ₯𝐴βˆ₯2 = 𝜎1 ,    βˆ₯𝐴βˆ₯𝐹² = 𝜎1² + 𝜎2² + ⋅ ⋅ ⋅ + πœŽπ‘Ÿ²,

and

    min_{x ≠ 0} βˆ₯𝐴xβˆ₯2 /βˆ₯xβˆ₯2 = πœŽπ‘ ,    𝑝 = min{π‘š, 𝑛}.
These relationships follow directly from the invariance of these norms under orthogonal transformations.
The SVD has many useful applications, but one of particular interest is that the truncated SVD
expansion

    π΄π‘˜ = βˆ‘_{𝑖=1}^{π‘˜} πœŽπ‘– u𝑖 v𝑖𝑇 ,

where π‘˜ < π‘Ÿ = rank(𝐴), is the best approximation of 𝐴 by a rank-π‘˜ matrix. It follows that the
distance between 𝐴 and the set of matrices of rank π‘˜ is

    min_{rank(𝐡)=π‘˜} βˆ₯𝐴 − 𝐡βˆ₯2 = βˆ₯𝐴 − π΄π‘˜ βˆ₯2 = βˆ₯ βˆ‘_{𝑖=π‘˜+1}^{π‘Ÿ} πœŽπ‘– u𝑖 v𝑖𝑇 βˆ₯2 = πœŽπ‘˜+1 ,
because the β„“2 -norm of a matrix is its largest singular value, and πœŽπ‘˜+1 is the largest singular value
of the matrix obtained by removing the first π‘˜ terms of the SVD expansion. Therefore, πœŽπ‘ , where
𝑝 = min{π‘š, 𝑛}, is the distance between 𝐴 and the set of all rank-deficient matrices (which is zero
when 𝐴 is already rank-deficient). Because a matrix of full rank can have arbitrarily small, but still
positive, singular values, it follows that the set of all full-rank matrices in β„π‘š×𝑛 is both open and
dense.
Example The best approximation of the matrix 𝐴 from the previous example, which has rank
two, by a matrix of rank one is obtained by taking only the first term of its SVD expansion,
    𝐴1 = 42.4264 u1 v1𝑇 = [ 10  20  10
                            10  20  10
                            10  20  10 ] .
The absolute error in this approximation is
βˆ₯𝐴 − 𝐴1 βˆ₯2 = 𝜎2 βˆ₯u2 v2𝑇 βˆ₯2 = 𝜎2 = 2.4495,
while the relative error is

    βˆ₯𝐴 − 𝐴1 βˆ₯2 /βˆ₯𝐴βˆ₯2 = 2.4495/42.4264 = 0.0577.
That is, the rank-one approximation of 𝐴 is off by a little less than six percent. β–‘
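A sketch of this computation (assuming NumPy, not part of the notes):

    # Hypothetical NumPy sketch: the best rank-1 approximation of A and the
    # absolute and relative errors quoted above.
    import numpy as np

    A = np.array([[11.0, 19.0, 11.0],
                  [ 9.0, 21.0,  9.0],
                  [10.0, 20.0, 10.0]])

    U, sigma, Vt = np.linalg.svd(A)
    A1 = sigma[0] * np.outer(U[:, 0], Vt[0, :])     # truncated SVD expansion, k = 1
    print(np.round(A1, 4))                          # the matrix of 10s and 20s above

    abs_err = np.linalg.norm(A - A1, 2)             # equals sigma_2
    rel_err = abs_err / np.linalg.norm(A, 2)        # equals sigma_2 / sigma_1
    print(round(abs_err, 4), round(rel_err, 4))     # 2.4495 0.0577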
Suppose that the entries of a matrix 𝐴 have been determined experimentally, for example by
measurements, and the error in the entries is determined to be bounded by some value πœ–. If
πœŽπ‘˜ > πœ– ≥ πœŽπ‘˜+1 for some π‘˜ < min{π‘š, 𝑛}, then πœ– is at least as large as the distance between 𝐴 and
the set of all matrices of rank π‘˜. Therefore, due to the uncertainty of the measurement, 𝐴 could
very well be a matrix of rank π‘˜, but it cannot be of lower rank, because πœŽπ‘˜ > πœ–. We say that the
πœ–-rank of 𝐴, defined by

    rank(𝐴, πœ–) = min_{βˆ₯𝐴−𝐡βˆ₯2 ≤ πœ–} rank(𝐡),

is equal to π‘˜. If π‘˜ < min{π‘š, 𝑛}, then we say that 𝐴 is numerically rank-deficient.
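In practice the πœ–-rank can be computed by counting the singular values that exceed πœ–, as in the following sketch (assuming NumPy; the helper function and its name are our own, not from the notes):

    # Hypothetical helper: rank(A, eps) is the number of singular values of A
    # larger than eps, since discarding the remaining terms of the SVD expansion
    # changes A by at most eps in the l2-norm.
    import numpy as np

    def numerical_rank(A, eps):
        sigma = np.linalg.svd(A, compute_uv=False)
        return int(np.sum(sigma > eps))

    A = np.array([[11.0, 19.0, 11.0],
                  [ 9.0, 21.0,  9.0],
                  [10.0, 20.0, 10.0]])
    print(numerical_rank(A, 1.0))    # 2: an error bound of 1.0 cannot hide sigma_2
    print(numerical_rank(A, 3.0))    # 1: with error 3.0 > sigma_2, A may be rank 1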
Projections
Given a vector z ∈ ℝ𝑛 , and a subspace 𝑆 ⊆ ℝ𝑛 , it is often useful to obtain the vector w ∈ 𝑆 that
best approximates z, in the sense that z − w ∈ 𝑆 ⊥ . In other words, the error in w is orthogonal to
the subspace in which w is sought. To that end, we define the orthogonal projection onto 𝑆 to be
an 𝑛 × π‘› matrix 𝑃 such that if w = 𝑃 z, then w ∈ 𝑆, and w𝑇 (z − w) = 0.
Substituting w = 𝑃 z into w𝑇 (z − w) = 0, and noting the commutativity of the inner product,
we see that 𝑃 must satisfy the relations
z𝑇 (𝑃 𝑇 − 𝑃 𝑇 𝑃 )z = 0,
z𝑇 (𝑃 − 𝑃 𝑇 𝑃 )z = 0,
for any vector z ∈ ℝ𝑛 . It follows from these relations that 𝑃 must have the following properties:
range(𝑃 ) = 𝑆, and 𝑃 𝑇 𝑃 = 𝑃 = 𝑃 𝑇 , which yields 𝑃 2 = 𝑃 . Using these properties, it can be shown
that the orthogonal projection 𝑃 onto a given subspace 𝑆 is unique. Furthermore, it can be shown
that 𝑃 = 𝑉 𝑉 𝑇 , where 𝑉 is a matrix whose columns are an orthonormal basis for 𝑆.
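A sketch of this construction (assuming NumPy; the particular subspace simply reuses the earlier example):

    # Hypothetical NumPy sketch: the orthogonal projection P = V V^T onto
    # S = span{v1, v2}, where the columns of V are an orthonormal basis of S.
    import numpy as np

    v1 = np.array([1.0, 1.0, 1.0])
    v2 = np.array([1.0, -1.0, 1.0])
    V, _ = np.linalg.qr(np.column_stack([v1, v2]))   # orthonormal basis for S

    P = V @ V.T
    print(np.allclose(P, P.T), np.allclose(P @ P, P))   # True True: P^T = P, P^2 = P

    z = np.array([3.0, -1.0, 2.0])
    w = P @ z                                # w lies in S
    print(np.allclose(V.T @ (z - w), 0.0))   # True: the error z - w is orthogonal to S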
Projections and the SVD
Let 𝐴 ∈ β„π‘š×𝑛 have the SVD 𝐴 = π‘ˆ Σ𝑉 𝑇 , where
    π‘ˆ = [ π‘ˆπ‘Ÿ  π‘ˆΛœπ‘Ÿ ] ,    𝑉 = [ π‘‰π‘Ÿ  π‘‰Λœπ‘Ÿ ] ,

where π‘Ÿ = rank(𝐴) and

    π‘ˆπ‘Ÿ = [ u1 ⋅ ⋅ ⋅ uπ‘Ÿ ] ,    π‘‰π‘Ÿ = [ v1 ⋅ ⋅ ⋅ vπ‘Ÿ ] .
Then, we obtain the following projections related to the SVD:
βˆ™ π‘ˆπ‘Ÿ π‘ˆπ‘Ÿπ‘‡ is the projection onto the range of 𝐴
βˆ™ 𝑉˜π‘Ÿ 𝑉˜π‘Ÿπ‘‡ is the projection onto the null space of 𝐴
˜π‘Ÿ π‘ˆ
˜π‘Ÿπ‘‡ is the projection onto the orthogonal complement of the range of 𝐴, which is the null
βˆ™ π‘ˆ
space of 𝐴𝑇
βˆ™ π‘‰π‘Ÿ π‘‰π‘Ÿπ‘‡ is the projection onto the orthogonal complement of the null space of 𝐴, which is the
range of 𝐴𝑇 .
These projections can also be determined by noting that the SVD of 𝐴𝑇 is 𝐴𝑇 = 𝑉 Σπ‘ˆ 𝑇 .
Example For the matrix

    𝐴 = [ 11  19  11
           9  21   9
          10  20  10 ] ,

which has rank two, the orthogonal projection onto range(𝐴) is

    π‘ˆ2 π‘ˆ2𝑇 = [ u1  u2 ] [ u1𝑇 ] = u1 u1𝑇 + u2 u2𝑇 ,
                        [ u2𝑇 ]

where π‘ˆ = [ u1  u2  u3 ] is the previously defined column partition of the matrix π‘ˆ from the
SVD of 𝐴. β–‘
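All four projections listed above can be formed directly from a computed SVD, as in this sketch (assuming NumPy; the variable names are our own):

    # Hypothetical NumPy sketch: the four orthogonal projections associated with
    # the SVD of the rank-2 example matrix A.
    import numpy as np

    A = np.array([[11.0, 19.0, 11.0],
                  [ 9.0, 21.0,  9.0],
                  [10.0, 20.0, 10.0]])

    U, sigma, Vt = np.linalg.svd(A)
    r = int(np.sum(sigma > 1e-12 * sigma[0]))   # r = 2
    Ur, Ur_perp = U[:, :r], U[:, r:]            # U_r and U~_r
    Vr, Vr_perp = Vt[:r, :].T, Vt[r:, :].T      # V_r and V~_r

    print(np.allclose((Ur @ Ur.T) @ A, A))            # projects onto range(A)
    print(np.allclose(A @ (Vr_perp @ Vr_perp.T), 0))  # projects onto null(A)
    print(np.allclose((Ur_perp @ Ur_perp.T) @ A, 0))  # projects onto null(A^T)
    print(np.allclose(A @ (Vr @ Vr.T), A))            # projects onto range(A^T)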
Sensitivity of Linear Systems
Let 𝐴 ∈ β„π‘š×𝑛 be nonsingular, and let 𝐴 = π‘ˆ Σ𝑉 𝑇 be the SVD of 𝐴. Then the solution x of the
system of linear equations 𝐴x = b can be expressed as
x = 𝐴−1 b = 𝑉 Σ−1 π‘ˆ 𝑇 b =
𝑛
∑
u𝑇 b
𝑖
𝑖=1
πœŽπ‘–
v𝑖 .
This formula for x suggests that if πœŽπ‘› is small relative to the other singular values, then the system
𝐴x = b can be sensitive to perturbations in 𝐴 or b. This makes sense, considering that πœŽπ‘› is the
distance between 𝐴 and the set of all singular 𝑛 × π‘› matrices.
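A sketch of this formula in action (assuming NumPy; the test matrix is our own):

    # Hypothetical NumPy sketch: solving Ax = b through the SVD expansion
    # x = sum_i (u_i^T b / sigma_i) v_i.
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))
    b = rng.standard_normal(4)

    U, sigma, Vt = np.linalg.svd(A)
    x = sum((U[:, i] @ b) / sigma[i] * Vt[i, :] for i in range(4))

    print(np.allclose(A @ x, b))                  # True
    print(np.allclose(x, np.linalg.solve(A, b)))  # True
    # A small sigma_n makes 1/sigma_n large, which is the source of the
    # sensitivity discussed below.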
In an attempt to measure the sensitivity of this system, we consider the parameterized system
(𝐴 + πœ–πΈ)x(πœ–) = b + πœ–e,
where E ∈ ℝ𝑛×𝑛 and e ∈ ℝ𝑛 are perturbations of 𝐴 and b, respectively. Taking the Taylor
expansion of x(πœ–) around πœ– = 0 yields
x(πœ–) = x + πœ–x′ (0) + 𝑂(πœ–2 ),
where
x′ (πœ–) = (𝐴 + πœ–πΈ)−1 (e − 𝐸x),
which yields x′ (0) = 𝐴−1 (e − 𝐸x).
Using norms to measure the relative error in x, we obtain

    βˆ₯x(πœ–) − xβˆ₯/βˆ₯xβˆ₯ = βˆ£πœ–βˆ£ βˆ₯𝐴−1 (e − 𝐸x)βˆ₯/βˆ₯xβˆ₯ + 𝑂(πœ–2)
                    ≤ βˆ£πœ–βˆ£ βˆ₯𝐴−1 βˆ₯ { βˆ₯eβˆ₯/βˆ₯xβˆ₯ + βˆ₯𝐸βˆ₯ } + 𝑂(πœ–2).
Multiplying and dividing by βˆ₯𝐴βˆ₯, and using 𝐴x = b to obtain βˆ₯bβˆ₯ ≤ βˆ₯𝐴βˆ₯βˆ₯xβˆ₯, yields

    βˆ₯x(πœ–) − xβˆ₯/βˆ₯xβˆ₯ ≤ πœ…(𝐴) βˆ£πœ–βˆ£ ( βˆ₯eβˆ₯/βˆ₯bβˆ₯ + βˆ₯𝐸βˆ₯/βˆ₯𝐴βˆ₯ ) + 𝑂(πœ–2),
where
πœ…(𝐴) = βˆ₯𝐴βˆ₯βˆ₯𝐴−1 βˆ₯
is called the condition number of 𝐴. We conclude that the relative errors in 𝐴 and b can be
amplified by πœ…(𝐴) in the solution. Therefore, if πœ…(𝐴) is large, the problem 𝐴x = b can be quite
sensitive to perturbations in 𝐴 and b. In this case, we say that 𝐴 is ill-conditioned; otherwise, we
say that 𝐴 is well-conditioned.
The definition of the condition number depends on the matrix norm that is used. Using the
β„“2 -norm, we obtain
    πœ…2 (𝐴) = βˆ₯𝐴βˆ₯2 βˆ₯𝐴−1 βˆ₯2 = 𝜎1 (𝐴)/πœŽπ‘› (𝐴).
It can readily be seen from this formula that πœ…2 (𝐴) is large if πœŽπ‘› is small relative to 𝜎1 . We
also note that because the singular values are the lengths of the semi-axes of the hyperellipsoid
{𝐴x ∣ βˆ₯xβˆ₯2 = 1}, the condition number in the β„“2 -norm measures the elongation of this hyperellipsoid.
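This quantity is straightforward to compute, as in the sketch below (assuming NumPy, where np.linalg.cond uses the β„“2 -norm by default):

    # Hypothetical NumPy check: kappa_2(A) computed as ||A||_2 ||A^{-1}||_2 and
    # as sigma_1/sigma_n, for the matrix A_1 of the example below.
    import numpy as np

    A = np.array([[0.7674, 0.0477],
                  [0.6247, 0.1691]])

    sigma = np.linalg.svd(A, compute_uv=False)
    print(sigma[0] / sigma[-1])                                         # about 10
    print(np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))   # same value
    print(np.linalg.cond(A))                                            # same value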
Example The matrices

    𝐴1 = [ 0.7674  0.0477 ] ,    𝐴2 = [ 0.7581  0.1113 ]
         [ 0.6247  0.1691 ]          [ 0.6358  0.0933 ]

do not appear to be very different from one another, but πœ…2 (𝐴1 ) = 10 while πœ…2 (𝐴2 ) = 10¹⁰. That
is, 𝐴1 is well-conditioned while 𝐴2 is ill-conditioned.
To illustrate the ill-conditioned nature of 𝐴2 , we solve the systems 𝐴2 x1 = b1 and 𝐴2 x2 = b2
for the unknown vectors x1 and x2 , where

    b1 = [ 0.7662 ] ,    b2 = [ 0.7019 ]
         [ 0.6426 ]          [ 0.7192 ] .

These vectors differ from one another by roughly 10%, but the solutions

    x1 = [ 0.9894 ] ,    x2 = [ −1.4522 × 10⁸ ]
         [ 0.1452 ]          [  9.8940 × 10⁸ ]

differ by several orders of magnitude, because of the sensitivity of 𝐴2 to perturbations. β–‘
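The following sketch reproduces this kind of behavior (assuming NumPy; it constructs its own matrix with πœ…2 = 10¹⁰ rather than reusing the rounded entries printed above, which do not carry enough digits to reproduce the exact numbers of the example):

    # Hypothetical NumPy demonstration: when kappa_2(A) is large, a tiny change
    # in b can produce an enormous relative change in the solution x.
    import numpy as np

    rng = np.random.default_rng(3)
    Q1, _ = np.linalg.qr(rng.standard_normal((2, 2)))
    Q2, _ = np.linalg.qr(rng.standard_normal((2, 2)))
    A = Q1 @ np.diag([1.0, 1e-10]) @ Q2.T        # kappa_2(A) = 1e10 by construction
    print(np.linalg.cond(A))                     # about 1e10

    x_true = np.array([1.0, 2.0])
    b = A @ x_true                               # chosen so the exact solution is O(1)
    db = 1e-6 * rng.standard_normal(2)           # a tiny perturbation of b

    x = np.linalg.solve(A, b)
    x_pert = np.linalg.solve(A, b + db)
    print(np.linalg.norm(x_pert - x) / np.linalg.norm(x))   # huge, on the order of
                                                            # kappa_2(A) * ||db|| / ||b||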
Just as the largest singular value of 𝐴 is the β„“2 -norm of 𝐴, and the smallest singular value is
the distance from 𝐴 to the nearest singular matrix in the β„“2 -norm, we have, for any ℓ𝑝 -norm,
    1/πœ…π‘ (𝐴) = min_{𝐴+Δ𝐴 singular} βˆ₯Δ𝐴βˆ₯𝑝 /βˆ₯𝐴βˆ₯𝑝 .
That is, in any ℓ𝑝 -norm, πœ…π‘ (𝐴) measures the relative distance in that norm from 𝐴 to the set of
singular matrices.
Because det(𝐴) = 0 if and only if 𝐴 is singular, it would appear that the determinant could be
used to measure the distance from 𝐴 to the nearest singular matrix. However, this is generally not
the case. It is possible for a matrix to have a relatively large determinant, but be very close to a
singular matrix, or for a matrix to have a relatively small determinant, but not be nearly singular.
In other words, there is very little correlation between det(𝐴) and the condition number of 𝐴.
Example Let
⎑
⎒
⎒
⎒
⎒
⎒
⎒
⎒
𝐴=⎒
⎒
⎒
⎒
⎒
⎒
⎒
⎣
1 −1 −1 −1 −1 −1 −1 −1 −1 −1
0
1 −1 −1 −1 −1 −1 −1 −1 −1
0
0
1 −1 −1 −1 −1 −1 −1 −1
0
0
0
1 −1 −1 −1 −1 −1 −1
0
0
0
0
1 −1 −1 −1 −1 −1
0
0
0
0
0
1 −1 −1 −1 −1
0
0
0
0
0
0
1 −1 −1 −1
0
0
0
0
0
0
0
1 −1 −1
0
0
0
0
0
0
0
0
1 −1
0
0
0
0
0
0
0
0
0
1
⎀
βŽ₯
βŽ₯
βŽ₯
βŽ₯
βŽ₯
βŽ₯
βŽ₯
βŽ₯.
βŽ₯
βŽ₯
βŽ₯
βŽ₯
βŽ₯
βŽ₯
⎦
Then det(𝐴) = 1, but πœ…2 (𝐴) ≈ 1918, and 𝜎10 ≈ 0.0029. That is, 𝐴 is quite close to a singular
matrix, even though det(𝐴) is not near zero. For example, the nearby matrix 𝐴˜ = 𝐴 − 𝜎10 u10 v10𝑇 ,
whose entries differ from those of 𝐴 by at most 𝜎10 ≈ 0.0029 (the largest changes appear below the
diagonal, where the zero entries of 𝐴 become small negative numbers of magnitude up to about
0.0022), is singular. β–‘
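The following sketch verifies these claims numerically (assuming NumPy; not part of the original notes):

    # Hypothetical NumPy check: det(A) = 1, yet A is close to singular, and
    # subtracting sigma_10 u_10 v_10^T yields a singular matrix near A.
    import numpy as np

    n = 10
    A = np.eye(n) + np.triu(-np.ones((n, n)), 1)   # 1 on the diagonal, -1 above it

    print(np.linalg.det(A))                        # 1.0 up to roundoff
    U, sigma, Vt = np.linalg.svd(A)
    print(np.linalg.cond(A), sigma[-1])            # about 1918 and 0.0029, as quoted

    A_tilde = A - sigma[-1] * np.outer(U[:, -1], Vt[-1, :])
    print(np.linalg.svd(A_tilde, compute_uv=False)[-1])   # essentially 0: A~ is singular
    print(np.abs(A - A_tilde).max() <= sigma[-1])         # True: entries change by at most sigma_10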