Jim Lambers
MAT 610
Summer Session 2009-10
Lecture 3 Notes

These notes correspond to Sections 2.5-2.7 in the text.

Orthogonality

A set of vectors $\{x_1, x_2, \ldots, x_k\}$ in $\mathbb{R}^n$ is said to be orthogonal if $x_i^T x_j = 0$ whenever $i \neq j$. If, in addition, $x_i^T x_i = 1$ for $i = 1, 2, \ldots, k$, then we say that the set is orthonormal. In this case, we can also write $x_i^T x_j = \delta_{ij}$, where
\[
\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}
\]
is called the Kronecker delta.

Example The vectors $x = \begin{bmatrix} 1 & -3 & 2 \end{bmatrix}^T$ and $y = \begin{bmatrix} 4 & 2 & 1 \end{bmatrix}^T$ are orthogonal, as $x^T y = 1(4) - 3(2) + 2(1) = 4 - 6 + 2 = 0$. Using the fact that $\|x\|_2 = \sqrt{x^T x}$, we can normalize these vectors to obtain the orthonormal vectors
\[
\hat{x} = \frac{x}{\|x\|_2} = \frac{1}{\sqrt{14}} \begin{bmatrix} 1 \\ -3 \\ 2 \end{bmatrix}, \qquad
\hat{y} = \frac{y}{\|y\|_2} = \frac{1}{\sqrt{21}} \begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix},
\]
which satisfy $\hat{x}^T \hat{x} = \hat{y}^T \hat{y} = 1$, $\hat{x}^T \hat{y} = 0$. □

The concept of orthogonality extends to subspaces. If $S_1, S_2, \ldots, S_k$ is a set of subspaces of $\mathbb{R}^n$, we say that these subspaces are mutually orthogonal if $x^T y = 0$ whenever $x \in S_i$, $y \in S_j$, for $i \neq j$. Furthermore, we define the orthogonal complement of a subspace $S \subseteq \mathbb{R}^n$ to be the set $S^\perp$ defined by
\[
S^\perp = \{ y \in \mathbb{R}^n \mid x^T y = 0 \text{ for all } x \in S \}.
\]

Example Let $S$ be the subspace of $\mathbb{R}^3$ defined by $S = \operatorname{span}\{v_1, v_2\}$, where
\[
v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.
\]
Then $S$ has the orthogonal complement $S^\perp = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}^T \right\}$.

It can be verified directly, by computing inner products, that the single basis vector of $S^\perp$, $v_3 = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}^T$, is orthogonal to each basis vector of $S$, $v_1$ and $v_2$. It follows that any multiple of this vector, which describes any vector in the span of $v_3$, is orthogonal to any linear combination of these basis vectors, which accounts for any vector in $S$, due to the linearity of the inner product:
\[
(c_3 v_3)^T (c_1 v_1 + c_2 v_2) = c_3 c_1 v_3^T v_1 + c_3 c_2 v_3^T v_2 = c_3 c_1 (0) + c_3 c_2 (0) = 0.
\]
Therefore, any vector in $S^\perp$ is in fact orthogonal to any vector in $S$. □

It is helpful to note that given a basis $\{v_1, v_2\}$ for any two-dimensional subspace $S \subseteq \mathbb{R}^3$, a basis $\{v_3\}$ for $S^\perp$ can be obtained by computing the cross-product of the basis vectors of $S$,
\[
v_3 = v_1 \times v_2 = \begin{bmatrix} (v_1)_2 (v_2)_3 - (v_1)_3 (v_2)_2 \\ (v_1)_3 (v_2)_1 - (v_1)_1 (v_2)_3 \\ (v_1)_1 (v_2)_2 - (v_1)_2 (v_2)_1 \end{bmatrix}.
\]
A similar process can be used to compute the orthogonal complement of any $(n-1)$-dimensional subspace of $\mathbb{R}^n$. This orthogonal complement will always have dimension 1, because the sum of the dimension of any subspace of a finite-dimensional vector space and the dimension of its orthogonal complement equals the dimension of the vector space. That is, if $\dim(V) = n$, and $S \subseteq V$ is a subspace of $V$, then
\[
\dim(S) + \dim(S^\perp) = n.
\]

Let $S = \operatorname{range}(A)$ for an $m \times n$ matrix $A$. By the definition of the range of a matrix, if $y \in S$, then $y = Ax$ for some vector $x \in \mathbb{R}^n$. If $z \in \mathbb{R}^m$ is such that $z^T y = 0$, then we have
\[
z^T A x = z^T (A^T)^T x = (A^T z)^T x = 0.
\]
If this is true for all $y \in S$, that is, if $z \in S^\perp$, then we must have $A^T z = 0$. Therefore, $z \in \operatorname{null}(A^T)$, and we conclude that $S^\perp = \operatorname{null}(A^T)$. That is, the orthogonal complement of the range is the null space of the transpose.

Example Let
\[
A = \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 3 & 2 \end{bmatrix}.
\]
If $S = \operatorname{range}(A) = \operatorname{span}\{a_1, a_2\}$, where $a_1$ and $a_2$ are the columns of $A$, then $S^\perp = \operatorname{span}\{b\}$, where
\[
b = a_1 \times a_2 = \begin{bmatrix} 7 & 1 & -3 \end{bmatrix}^T.
\]
We then note that
\[
A^T b = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 2 \end{bmatrix} \begin{bmatrix} 7 \\ 1 \\ -3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
That is, $b$ is in the null space of $A^T$. In fact, because of the relationships
\[
\operatorname{rank}(A^T) + \dim(\operatorname{null}(A^T)) = 3, \qquad \operatorname{rank}(A) = \operatorname{rank}(A^T) = 2,
\]
we have $\dim(\operatorname{null}(A^T)) = 1$, so $\operatorname{null}(A^T) = \operatorname{span}\{b\}$. □
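The computation in this example is easy to check numerically. The following sketch (not part of the original notes; it assumes NumPy is available) computes the cross product of the columns of $A$ and verifies that the result lies in $\operatorname{null}(A^T)$:

```python
import numpy as np

# Columns of the example matrix A; S = range(A) = span{a1, a2}.
A = np.array([[1.0,  1.0],
              [2.0, -1.0],
              [3.0,  2.0]])
a1, a2 = A[:, 0], A[:, 1]

# A basis vector for the orthogonal complement of S, via the cross product.
b = np.cross(a1, a2)
print(b)            # expected: [ 7.  1. -3.]

# b should be orthogonal to every column of A, i.e. A^T b = 0,
# confirming that range(A)^perp = null(A^T).
print(A.T @ b)      # expected: [0. 0.]
```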
A basis for a subspace $S \subseteq \mathbb{R}^n$ is an orthonormal basis if the vectors in the basis form an orthonormal set. For example, let $Q_1$ be an $n \times r$ matrix with orthonormal columns. Then these columns form an orthonormal basis for $\operatorname{range}(Q_1)$. If $r < n$, then this basis can be extended to a basis of all of $\mathbb{R}^n$ by obtaining an orthonormal basis of $\operatorname{range}(Q_1)^\perp = \operatorname{null}(Q_1^T)$.

If we define $Q_2$ to be a matrix whose columns form an orthonormal basis of $\operatorname{range}(Q_1)^\perp$, then the columns of the $n \times n$ matrix $Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}$ are also orthonormal. It follows that $Q^T Q = I$. That is, $Q^T = Q^{-1}$. Furthermore, because the inverse of a matrix is unique, $Q Q^T = I$ as well. We say that $Q$ is an orthogonal matrix.

One interesting property of orthogonal matrices is that they preserve certain norms. For example, if $Q$ is an orthogonal $n \times n$ matrix, then
\[
\|Qx\|_2^2 = x^T Q^T Q x = x^T x = \|x\|_2^2.
\]
That is, orthogonal matrices preserve $\ell_2$-norms. Geometrically, multiplication by an $n \times n$ orthogonal matrix preserves the length of a vector, and performs a rotation (or reflection) in $n$-dimensional space.

Example The matrix
\[
Q = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}
\]
is an orthogonal matrix, for any angle $\theta$. For any vector $x \in \mathbb{R}^3$, the vector $Qx$ can be obtained by rotating $x$ clockwise by the angle $\theta$ in the $xz$-plane, while $Q^T = Q^{-1}$ performs a counterclockwise rotation by $\theta$ in the $xz$-plane. □

In addition, orthogonal matrices preserve the matrix $\ell_2$ and Frobenius norms. That is, if $A$ is an $m \times n$ matrix, and $U$ and $V$ are $m \times m$ and $n \times n$ orthogonal matrices, respectively, then
\[
\|UAV\|_F = \|A\|_F, \qquad \|UAV\|_2 = \|A\|_2.
\]

The Singular Value Decomposition

The matrix $\ell_2$-norm can be used to obtain a highly useful decomposition of any $m \times n$ matrix $A$. Let $x$ be a unit $\ell_2$-norm vector (that is, $\|x\|_2 = 1$) such that $\|Ax\|_2 = \|A\|_2$. If $z = Ax$, and we define $y = z/\|z\|_2$, then we have $Ax = \sigma y$, with $\sigma = \|A\|_2$, and both $x$ and $y$ are unit vectors.

We can extend $x$ and $y$, separately, to orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, to obtain orthogonal matrices $V_1 = \begin{bmatrix} x & V_2 \end{bmatrix} \in \mathbb{R}^{n \times n}$ and $U_1 = \begin{bmatrix} y & U_2 \end{bmatrix} \in \mathbb{R}^{m \times m}$. We then have
\[
U_1^T A V_1 = \begin{bmatrix} y^T \\ U_2^T \end{bmatrix} A \begin{bmatrix} x & V_2 \end{bmatrix}
= \begin{bmatrix} y^T A x & y^T A V_2 \\ U_2^T A x & U_2^T A V_2 \end{bmatrix}
= \begin{bmatrix} \sigma y^T y & y^T A V_2 \\ \sigma U_2^T y & U_2^T A V_2 \end{bmatrix}
= \begin{bmatrix} \sigma & w^T \\ 0 & B \end{bmatrix} \equiv A_1,
\]
where $B = U_2^T A V_2$ and $w = V_2^T A^T y$. Now, let
\[
s = \begin{bmatrix} \sigma \\ w \end{bmatrix}.
\]
Then $\|s\|_2^2 = \sigma^2 + \|w\|_2^2$. Furthermore, the first component of $A_1 s$ is $\sigma^2 + \|w\|_2^2$. It follows from the invariance of the matrix $\ell_2$-norm under orthogonal transformations that
\[
\sigma^2 = \|A\|_2^2 = \|A_1\|_2^2 \ge \frac{\|A_1 s\|_2^2}{\|s\|_2^2} \ge \frac{(\sigma^2 + \|w\|_2^2)^2}{\sigma^2 + \|w\|_2^2} = \sigma^2 + \|w\|_2^2,
\]
and therefore we must have $w = 0$. We then have
\[
U_1^T A V_1 = A_1 = \begin{bmatrix} \sigma & 0 \\ 0 & B \end{bmatrix}.
\]
Continuing this process on $B$, and keeping in mind that $\|B\|_2 \le \|A\|_2$, we obtain the decomposition
\[
U^T A V = \Sigma,
\]
where
\[
U = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix} \in \mathbb{R}^{m \times m}, \qquad
V = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \in \mathbb{R}^{n \times n}
\]
are both orthogonal matrices, and
\[
\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}, \qquad p = \min\{m, n\},
\]
is a diagonal matrix, in which $\Sigma_{ii} = \sigma_i$ for $i = 1, 2, \ldots, p$, and $\Sigma_{ij} = 0$ for $i \neq j$. Furthermore, we have
\[
\|A\|_2 = \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0.
\]

This decomposition of $A$ is called the singular value decomposition, or SVD. It is more commonly written as a factorization of $A$,
\[
A = U \Sigma V^T.
\]
The diagonal entries of $\Sigma$ are the singular values of $A$. The columns of $U$ are the left singular vectors, and the columns of $V$ are the right singular vectors. It follows from the SVD itself that the singular values and vectors satisfy the relations
\[
A^T u_i = \sigma_i v_i, \qquad A v_i = \sigma_i u_i, \qquad i = 1, 2, \ldots, \min\{m, n\}.
\]
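The construction above establishes that the SVD exists; in practice it is computed with library routines. As an illustrative check (not part of the original notes; NumPy and the small test matrix are assumptions), the following sketch computes the full SVD of a rectangular matrix and verifies the factorization and the relations above:

```python
import numpy as np

# A small rectangular test matrix; any m x n matrix works here.
A = np.array([[1.0,  1.0],
              [2.0, -1.0],
              [3.0,  2.0]])

# full_matrices=True returns square U (m x m) and V (n x n), as in the notes.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

# Rebuild the m x n diagonal matrix Sigma and confirm A = U Sigma V^T.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ V.T))          # True

# The largest singular value equals the matrix 2-norm of A.
print(np.isclose(s[0], np.linalg.norm(A, 2)))   # True

# Check the relations A v_i = sigma_i u_i and A^T u_i = sigma_i v_i.
for i in range(min(A.shape)):
    print(np.allclose(A @ V[:, i], s[i] * U[:, i]),
          np.allclose(A.T @ U[:, i], s[i] * V[:, i]))
```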
For convenience, we denote the $i$th largest singular value of $A$ by $\sigma_i(A)$, and the largest and smallest singular values are commonly denoted by $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$, respectively.

The SVD conveys much useful information about the structure of a matrix, particularly with regard to systems of linear equations involving the matrix. Let $r$ be the number of nonzero singular values. Then $r$ is the rank of $A$, and
\[
\operatorname{range}(A) = \operatorname{span}\{u_1, \ldots, u_r\}, \qquad
\operatorname{null}(A) = \operatorname{span}\{v_{r+1}, \ldots, v_n\}.
\]
That is, the SVD yields orthonormal bases of the range and null space of $A$. It follows that we can write
\[
A = \sum_{i=1}^{r} \sigma_i u_i v_i^T.
\]
This is called the SVD expansion of $A$. If $m \ge n$, then this expansion yields the "economy-size" SVD
\[
A = U_1 \Sigma_1 V^T,
\]
where
\[
U_1 = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix} \in \mathbb{R}^{m \times n}, \qquad
\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{n \times n}.
\]

Example The matrix
\[
A = \begin{bmatrix} 11 & 19 & 11 \\ 9 & 21 & 9 \\ 10 & 20 & 10 \end{bmatrix}
\]
has the SVD $A = U \Sigma V^T$ where
\[
U = \begin{bmatrix} 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -\sqrt{2/3} \end{bmatrix}, \qquad
V = \begin{bmatrix} 1/\sqrt{6} & -1/\sqrt{3} & 1/\sqrt{2} \\ \sqrt{2/3} & 1/\sqrt{3} & 0 \\ 1/\sqrt{6} & -1/\sqrt{3} & -1/\sqrt{2} \end{bmatrix},
\]
and
\[
\Sigma = \begin{bmatrix} 42.4264 & 0 & 0 \\ 0 & 2.4495 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]
Let $U = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}$ and $V = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}$ be column partitions of $U$ and $V$, respectively. Because there are only two nonzero singular values, we have $\operatorname{rank}(A) = 2$. Furthermore, $\operatorname{range}(A) = \operatorname{span}\{u_1, u_2\}$, and $\operatorname{null}(A) = \operatorname{span}\{v_3\}$. We also have
\[
A = 42.4264\, u_1 v_1^T + 2.4495\, u_2 v_2^T. \quad \square
\]

The SVD is also closely related to the $\ell_2$-norm and Frobenius norm. We have
\[
\|A\|_2 = \sigma_1, \qquad \|A\|_F^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_p^2, \qquad
\min_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_p, \qquad p = \min\{m, n\}.
\]
These relationships follow directly from the invariance of these norms under orthogonal transformations.

The SVD has many useful applications, but one of particular interest is that the truncated SVD expansion
\[
A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T,
\]
where $k < r = \operatorname{rank}(A)$, is the best approximation of $A$ by a rank-$k$ matrix. It follows that the distance between $A$ and the set of matrices of rank $k$ is
\[
\min_{\operatorname{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2 = \left\| \sum_{i=k+1}^{r} \sigma_i u_i v_i^T \right\|_2 = \sigma_{k+1},
\]
because the $\ell_2$-norm of a matrix is its largest singular value, and $\sigma_{k+1}$ is the largest singular value of the matrix obtained by removing the first $k$ terms of the SVD expansion. Therefore, $\sigma_p$, where $p = \min\{m, n\}$, is the distance between $A$ and the set of all rank-deficient matrices (which is zero when $A$ is already rank-deficient). Because a matrix of full rank can have arbitrarily small, but still positive, singular values, it follows that the set of all full-rank matrices in $\mathbb{R}^{m \times n}$ is both open and dense.

Example The best approximation of the matrix $A$ from the previous example, which has rank two, by a matrix of rank one is obtained by taking only the first term of its SVD expansion,
\[
A_1 = 42.4264\, u_1 v_1^T = \begin{bmatrix} 10 & 20 & 10 \\ 10 & 20 & 10 \\ 10 & 20 & 10 \end{bmatrix}.
\]
The absolute error in this approximation is
\[
\|A - A_1\|_2 = \sigma_2 \|u_2 v_2^T\|_2 = \sigma_2 = 2.4495,
\]
while the relative error is
\[
\frac{\|A - A_1\|_2}{\|A\|_2} = \frac{2.4495}{42.4264} = 0.0577.
\]
That is, the rank-one approximation of $A$ is off by a little less than six percent. □

Suppose that the entries of a matrix $A$ have been determined experimentally, for example by measurements, and the error in the entries is determined to be bounded by some value $\epsilon$. Then, if $\sigma_k > \epsilon \ge \sigma_{k+1}$ for some $k < \min\{m, n\}$, then $\epsilon$ is at least as large as the distance between $A$ and the set of all matrices of rank $k$. Therefore, due to the uncertainty of the measurement, $A$ could very well be a matrix of rank $k$, but it cannot be of lower rank, because $\sigma_k > \epsilon$.
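The rank-one approximation worked out above can be reproduced numerically. The following sketch (an illustration assuming NumPy, not part of the original notes) computes the SVD of the example matrix, forms the truncated expansion, and checks that the $\ell_2$-norm error equals $\sigma_2$:

```python
import numpy as np

# The rank-two example matrix from the notes.
A = np.array([[11.0, 19.0, 11.0],
              [ 9.0, 21.0,  9.0],
              [10.0, 20.0, 10.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                                 # approximately [42.4264, 2.4495, 0]

# Best rank-one approximation: keep only the first term of the SVD expansion.
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(np.round(A1, 4))                   # rows approximately [10, 20, 10]

# The 2-norm error of the rank-one approximation equals sigma_2.
print(np.linalg.norm(A - A1, 2), s[1])   # both approximately 2.4495
print(np.linalg.norm(A - A1, 2) / s[0])  # relative error, about 0.0577
```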
We say that the $\epsilon$-rank of $A$, defined by
\[
\operatorname{rank}(A, \epsilon) = \min_{\|A - B\|_2 \le \epsilon} \operatorname{rank}(B),
\]
is equal to $k$. If $k < \min\{m, n\}$, then we say that $A$ is numerically rank-deficient.

Projections

Given a vector $z \in \mathbb{R}^n$, and a subspace $S \subseteq \mathbb{R}^n$, it is often useful to obtain the vector $w \in S$ that best approximates $z$, in the sense that $z - w \in S^\perp$. In other words, the error in $w$ is orthogonal to the subspace in which $w$ is sought. To that end, we define the orthogonal projection onto $S$ to be an $n \times n$ matrix $P$ such that if $w = Pz$, then $w \in S$, and $w^T (z - w) = 0$.

Substituting $w = Pz$ into $w^T (z - w) = 0$, and noting the commutativity of the inner product, we see that $P$ must satisfy the relations
\[
z^T (P^T - P^T P) z = 0, \qquad z^T (P - P^T P) z = 0,
\]
for any vector $z \in \mathbb{R}^n$. It follows from these relations that $P$ must have the following properties: $\operatorname{range}(P) = S$, and $P^T P = P = P^T$, which yields $P^2 = P$. Using these properties, it can be shown that the orthogonal projection $P$ onto a given subspace $S$ is unique. Furthermore, it can be shown that $P = U U^T$, where $U$ is a matrix whose columns form an orthonormal basis for $S$.

Projections and the SVD

Let $A \in \mathbb{R}^{m \times n}$ have the SVD $A = U \Sigma V^T$, where
\[
U = \begin{bmatrix} U_r & \tilde{U}_r \end{bmatrix}, \qquad V = \begin{bmatrix} V_r & \tilde{V}_r \end{bmatrix},
\]
where $r = \operatorname{rank}(A)$ and
\[
U_r = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix}, \qquad V_r = \begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix}.
\]
Then, we obtain the following projections related to the SVD:

• $U_r U_r^T$ is the projection onto the range of $A$

• $\tilde{V}_r \tilde{V}_r^T$ is the projection onto the null space of $A$

• $\tilde{U}_r \tilde{U}_r^T$ is the projection onto the orthogonal complement of the range of $A$, which is the null space of $A^T$

• $V_r V_r^T$ is the projection onto the orthogonal complement of the null space of $A$, which is the range of $A^T$.

These projections can also be determined by noting that the SVD of $A^T$ is $A^T = V \Sigma^T U^T$.

Example For the matrix
\[
A = \begin{bmatrix} 11 & 19 & 11 \\ 9 & 21 & 9 \\ 10 & 20 & 10 \end{bmatrix},
\]
which has rank two, the orthogonal projection onto $\operatorname{range}(A)$ is
\[
U_2 U_2^T = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} = u_1 u_1^T + u_2 u_2^T,
\]
where $U = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}$ is the previously defined column partition of the matrix $U$ from the SVD of $A$. □

Sensitivity of Linear Systems

Let $A \in \mathbb{R}^{n \times n}$ be nonsingular, and let $A = U \Sigma V^T$ be the SVD of $A$. Then the solution $x$ of the system of linear equations $Ax = b$ can be expressed as
\[
x = A^{-1} b = V \Sigma^{-1} U^T b = \sum_{i=1}^{n} \frac{u_i^T b}{\sigma_i} v_i.
\]
This formula for $x$ suggests that if $\sigma_n$ is small relative to the other singular values, then the system $Ax = b$ can be sensitive to perturbations in $A$ or $b$. This makes sense, considering that $\sigma_n$ is the distance between $A$ and the set of all singular $n \times n$ matrices.

In an attempt to measure the sensitivity of this system, we consider the parameterized system
\[
(A + \epsilon E) x(\epsilon) = b + \epsilon e,
\]
where $E \in \mathbb{R}^{n \times n}$ and $e \in \mathbb{R}^n$ are perturbations of $A$ and $b$, respectively. Taking the Taylor expansion of $x(\epsilon)$ around $\epsilon = 0$ yields
\[
x(\epsilon) = x + \epsilon x'(0) + O(\epsilon^2),
\]
where
\[
x'(\epsilon) = (A + \epsilon E)^{-1} (e - E x(\epsilon)),
\]
which yields $x'(0) = A^{-1} (e - Ex)$.

Using norms to measure the relative error in $x$, we obtain
\[
\frac{\|x(\epsilon) - x\|}{\|x\|} = |\epsilon| \frac{\|A^{-1}(e - Ex)\|}{\|x\|} + O(\epsilon^2)
\le |\epsilon| \|A^{-1}\| \left\{ \frac{\|e\|}{\|x\|} + \|E\| \right\} + O(\epsilon^2).
\]
Multiplying and dividing by $\|A\|$, and using $Ax = b$ to obtain $\|b\| \le \|A\| \|x\|$, yields
\[
\frac{\|x(\epsilon) - x\|}{\|x\|} \le \kappa(A)\, |\epsilon| \left( \frac{\|e\|}{\|b\|} + \frac{\|E\|}{\|A\|} \right) + O(\epsilon^2),
\]
where
\[
\kappa(A) = \|A\| \|A^{-1}\|
\]
is called the condition number of $A$. We conclude that the relative errors in $A$ and $b$ can be amplified by $\kappa(A)$ in the solution. Therefore, if $\kappa(A)$ is large, the problem $Ax = b$ can be quite sensitive to perturbations in $A$ and $b$.
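This first-order bound is easy to observe numerically. The sketch below is illustrative only (the test matrix, right-hand side, and random perturbations are arbitrary choices, not from the notes); it compares the observed relative error in the solution with $\kappa(A)$ times the relative perturbations:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary nonsingular test system (illustrative only).
A = np.array([[3.0, 1.0],
              [1.0, 0.5]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)

# Small perturbations eps*E of A and eps*e of b.
eps = 1e-6
E = rng.standard_normal(A.shape)
e = rng.standard_normal(b.shape)
x_pert = np.linalg.solve(A + eps * E, b + eps * e)

rel_err = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
kappa = np.linalg.cond(A)     # 2-norm condition number ||A||_2 ||A^{-1}||_2
bound = kappa * eps * (np.linalg.norm(e) / np.linalg.norm(b)
                       + np.linalg.norm(E, 2) / np.linalg.norm(A, 2))

# Up to O(eps^2) terms, the observed relative error should stay below the bound.
print(rel_err, bound)
```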
In this case, we say that $A$ is ill-conditioned; otherwise, we say that $A$ is well-conditioned.

The definition of the condition number depends on the matrix norm that is used. Using the $\ell_2$-norm, we obtain
\[
\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_1(A)}{\sigma_n(A)}.
\]
It can readily be seen from this formula that $\kappa_2(A)$ is large if $\sigma_n$ is small relative to $\sigma_1$. We also note that because the singular values are the lengths of the semi-axes of the hyperellipsoid $\{Ax \mid \|x\|_2 = 1\}$, the condition number in the $\ell_2$-norm measures the elongation of this hyperellipsoid.

Example The matrices
\[
A_1 = \begin{bmatrix} 0.7674 & 0.0477 \\ 0.6247 & 0.1691 \end{bmatrix}, \qquad
A_2 = \begin{bmatrix} 0.7581 & 0.1113 \\ 0.6358 & 0.0933 \end{bmatrix}
\]
do not appear to be very different from one another, but $\kappa_2(A_1) = 10$ while $\kappa_2(A_2) = 10^{10}$. That is, $A_1$ is well-conditioned while $A_2$ is ill-conditioned.

To illustrate the ill-conditioned nature of $A_2$, we solve the systems $A_2 x_1 = b_1$ and $A_2 x_2 = b_2$ for the unknown vectors $x_1$ and $x_2$, where
\[
b_1 = \begin{bmatrix} 0.7662 \\ 0.6426 \end{bmatrix}, \qquad
b_2 = \begin{bmatrix} 0.7019 \\ 0.7192 \end{bmatrix}.
\]
These vectors differ from one another by roughly 10%, but the solutions
\[
x_1 = \begin{bmatrix} 0.9894 \\ 0.1452 \end{bmatrix}, \qquad
x_2 = \begin{bmatrix} -1.4522 \times 10^8 \\ 9.8940 \times 10^8 \end{bmatrix}
\]
differ by several orders of magnitude, because of the sensitivity of $A_2$ to perturbations. □

Just as the largest singular value of $A$ is the $\ell_2$-norm of $A$, and the smallest singular value is the distance from $A$ to the nearest singular matrix in the $\ell_2$-norm, we have, for any $\ell_p$-norm,
\[
\frac{1}{\kappa_p(A)} = \min_{A + \Delta A \ \text{singular}} \frac{\|\Delta A\|_p}{\|A\|_p}.
\]
That is, in any $\ell_p$-norm, $\kappa_p(A)$ measures the relative distance in that norm from $A$ to the set of singular matrices.

Because $\det(A) = 0$ if and only if $A$ is singular, it would appear that the determinant could be used to measure the distance from $A$ to the nearest singular matrix. However, this is generally not the case. It is possible for a matrix to have a relatively large determinant, but be very close to a singular matrix, or for a matrix to have a relatively small determinant, but not be nearly singular. In other words, there is very little correlation between $\det(A)$ and the condition number of $A$.

Example Let
\[
A = \begin{bmatrix}
1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
0 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
0 & 0 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
0 & 0 & 0 & 1 & -1 & -1 & -1 & -1 & -1 & -1 \\
0 & 0 & 0 & 0 & 1 & -1 & -1 & -1 & -1 & -1 \\
0 & 0 & 0 & 0 & 0 & 1 & -1 & -1 & -1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & -1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
\]
Then $\det(A) = 1$, but $\kappa_2(A) \approx 1{,}918$, and $\sigma_{10} \approx 0.0029$. That is, $A$ is quite close to a singular matrix, even though $\det(A)$ is not near zero. For example, the nearby matrix $\tilde{A} = A - \sigma_{10} u_{10} v_{10}^T$, which has entries
\[
\tilde{A} \approx \begin{bmatrix}
1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
-0 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
-0 & -0 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
-0 & -0 & -0 & 1 & -1 & -1 & -1 & -1 & -1 & -1 \\
-0.0001 & -0 & -0 & -0 & 1 & -1 & -1 & -1 & -1 & -1 \\
-0.0001 & -0.0001 & -0 & -0 & -0 & 1 & -1 & -1 & -1 & -1 \\
-0.0003 & -0.0001 & -0.0001 & -0 & -0 & -0 & 1 & -1 & -1 & -1 \\
-0.0005 & -0.0003 & -0.0001 & -0.0001 & -0 & -0 & -0 & 1 & -1 & -1 \\
-0.0011 & -0.0005 & -0.0003 & -0.0001 & -0.0001 & -0 & -0 & -0 & 1 & -1 \\
-0.0022 & -0.0011 & -0.0005 & -0.0003 & -0.0001 & -0.0001 & -0 & -0 & -0 & 1
\end{bmatrix},
\]
is singular. □
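This last example can also be reproduced numerically. The following sketch (an illustration assuming NumPy, not part of the original notes) builds the $10 \times 10$ matrix above, confirms that its determinant is 1 while its condition number is large, and checks that subtracting the last term of its SVD expansion produces a singular matrix:

```python
import numpy as np

# The 10 x 10 upper triangular matrix with 1 on the diagonal and -1 above it.
n = 10
A = np.triu(-np.ones((n, n)), k=1) + np.eye(n)

# The determinant is exactly 1 (product of the diagonal entries), yet the
# matrix is nearly singular: its condition number is large and its smallest
# singular value is small.
print(np.linalg.det(A))        # approximately 1
print(np.linalg.cond(A))       # approximately 1918
U, s, Vt = np.linalg.svd(A)
print(s[-1])                   # approximately 0.0029

# Subtracting the last term of the SVD expansion gives a nearby matrix that
# is singular (its smallest singular value is zero to machine precision).
A_tilde = A - s[-1] * np.outer(U[:, -1], Vt[-1, :])
print(np.linalg.svd(A_tilde, compute_uv=False)[-1])   # essentially 0
```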