Linear Algebra

Contents
1 Eigenanalysis
2 Extensions and Alternative Proofs of Some Theorems Above
3 Singular Value Decomposition

1 Eigenanalysis

In this note, all vector spaces V are finite dimensional.

Definition 1.1. Let T be a linear transformation on a vector space V (that is, T ∈ End(V)). Let X ⊆ V be a subspace of V. We say that X is an invariant subspace of T if T(X) ⊆ X.

Definition 1.2. Let T ∈ End(V). An eigenvector of T is a nonzero vector v ∈ V such that T(v) = λv for some λ ∈ F, where F is the underlying field of V. In this case, we say that λ is an eigenvalue of T and that (v, λ) is an eigenpair of T.

Theorem 1.3. Let T ∈ End(V) with underlying field C and let X ⊆ V be a nonzero invariant subspace of T. Then there exists an eigenvector v ∈ X such that T(v) = λv for some λ ∈ C.

Proof. Since X ≠ {0}, there exists a nonzero vector v ∈ X. Suppose dim(X) = r. Then {v, T(v), ..., T^r(v)} consists of r + 1 vectors of X (using the invariance of X), so it must be a linearly dependent set. Hence there exist a_0, ..., a_r ∈ C, not all zero, such that

(a_0 I + a_1 T + ··· + a_r T^r)v = 0.

Since C is algebraically closed, the nonzero polynomial p(t) = a_0 + a_1 t + ··· + a_r t^r factors as c(t − λ_1)···(t − λ_s) with c ≠ 0, s ≤ r, and λ_1, ..., λ_s ∈ C (the λ_i's may coincide), so p(T) = c(T − λ_s I)···(T − λ_1 I). Let v_0 = v and v_i = (T − λ_i I)···(T − λ_1 I)v for i = 1, ..., s; each v_i lies in X by invariance. We shall prove that there exists v_j, j ∈ {0, ..., s − 1}, such that (v_j, λ_{j+1}) is an eigenpair of T. Suppose not. Then whenever v_j ≠ 0 we also have v_{j+1} = (T − λ_{j+1} I)v_j ≠ 0, for otherwise v_j would be an eigenvector with eigenvalue λ_{j+1}. Since v_0 = v ≠ 0, induction gives v_s = (T − λ_s I)···(T − λ_1 I)v ≠ 0, contradicting p(T)v = c·v_s = 0.

Theorem 1.4. Let T ∈ End(V) and let (v_1, λ_1), ..., (v_m, λ_m) be eigenpairs of T with λ_i ≠ λ_j whenever i ≠ j. Then {v_1, ..., v_m} is a linearly independent set of V.

Proof. Consider a_1 v_1 + ··· + a_m v_m = 0. Applying T to both sides, we get a_1 λ_1 v_1 + ··· + a_m λ_m v_m = 0. Multiplying the original relation by λ_1, we get a_1 λ_1 v_1 + ··· + a_m λ_1 v_m = 0. Subtracting, a_2(λ_2 − λ_1)v_2 + ··· + a_m(λ_m − λ_1)v_m = 0. Continuing this process, we get a_m(λ_m − λ_1)···(λ_m − λ_{m−1})v_m = 0, which implies a_m = 0. By induction, we are done.

Definition 1.5. Let T ∈ End(V). We say that a nonzero vector v ∈ V is a generalized eigenvector if there exist λ ∈ F and k ∈ N such that (T − λI)^k v = 0 (if such a v exists, λ is called a (generalized) eigenvalue). For a (generalized) eigenvalue λ, the index of λ is the minimum k ∈ N such that ker(T − λI)^k contains all generalized eigenvectors with eigenvalue λ. Such a k exists since

ker(T − λI) ⊆ ker(T − λI)^2 ⊆ ··· ⊆ ker(T − λI)^n ⊆ ···

is an ascending chain in a finite dimensional vector space V.

Theorem 1.6. Let T ∈ End(V) and let λ be an eigenvalue with index k. Then ker((T − λI)^k) = ker((T − λI)^n), where n = dim(V).

Proof. Since λ is of index k, there exists a nonzero vector v such that (T − λI)^k v = 0 while (T − λI)^j v ≠ 0 whenever j < k. Consider a_0 v + a_1(T − λI)v + ··· + a_{k−1}(T − λI)^{k−1} v = 0. Multiplying both sides by (T − λI)^{k−1}, we deduce that a_0(T − λI)^{k−1} v = 0, so a_0 = 0. Repeating this process, we obtain that {v, (T − λI)v, ..., (T − λI)^{k−1} v} is a linearly independent set. Hence k ≤ n, and therefore ker((T − λI)^k) ⊆ ker((T − λI)^n). For the converse, suppose, to get a contradiction, that x ∈ ker((T − λI)^n) with (T − λI)^k x ≠ 0. Then x is a generalized eigenvector with eigenvalue λ that does not lie in ker((T − λI)^k), so the index of λ is greater than k, a contradiction.
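The index in Definition 1.5 and Theorem 1.6 can be watched numerically. The following is a minimal sketch (assuming numpy is available; the matrix and all names are illustrative and not part of the notes) for a defective 2 × 2 matrix whose only eigenvalue λ = 2 has index 2.

```python
import numpy as np

# Illustrative defective matrix: ker(A - 2I) is 1-dimensional,
# but ker(A - 2I)^2 is all of C^2, so lambda = 2 has index 2.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
n = A.shape[0]
N = A - 2.0 * np.eye(n)  # T - lambda*I

for k in range(1, n + 1):
    # dim ker (T - lambda I)^k = n - rank((T - lambda I)^k)
    dim_ker = n - np.linalg.matrix_rank(np.linalg.matrix_power(N, k))
    print(f"dim ker (A - 2I)^{k} = {dim_ker}")
# prints 1, then 2: the ascending chain of Definition 1.5 stabilizes at k = 2.
```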
Theorem 1.7. If λ is an eigenvalue of T ∈ End(V), then ker(T − λI)^n ⊕ im(T − λI)^n = V, where n = dim(V).

Proof. By the rank-nullity theorem, dim(ker(T − λI)^n) + dim(im(T − λI)^n) = n. It remains to prove that ker(T − λI)^n ∩ im(T − λI)^n = {0}. Suppose x ∈ ker(T − λI)^n ∩ im(T − λI)^n. Then (T − λI)^n x = 0 and there exists y ∈ V such that x = (T − λI)^n y. Hence (T − λI)^{2n} y = 0. By 1.6 (the chain of kernels stabilizes at the index k ≤ n), ker(T − λI)^{2n} = ker(T − λI)^n, so x = (T − λI)^n y = 0, as required.

Theorem 1.8 (Principal decomposition theorem). Let V be a vector space of dimension n with underlying field C and let T ∈ End(V). Then the generalized eigenvectors of T span V.

Proof. By 1.3, there exists an eigenvalue λ_1 ∈ C of T. By 1.7, decompose V into ker(T − λ_1 I)^n ⊕ im(T − λ_1 I)^n. If dim(ker(T − λ_1 I)^n) = n, then we are done. Otherwise, for any vector y ∈ im(T − λ_1 I)^n there is a vector x ∈ V such that y = (T − λ_1 I)^n x, and then Ty = (T − λ_1 I)^n(Tx) ∈ im(T − λ_1 I)^n. Hence im(T − λ_1 I)^n is a nonzero invariant subspace of T. Applying 1.3 and 1.7 to the restriction of T to this subspace, we can decompose im(T − λ_1 I)^n into ker(T − λ_2 I)^n and im(T − λ_2 I)^n for some eigenvalue λ_2 ∈ C. Continuing this process, we end up with λ_1, ..., λ_m ∈ C such that

V = ⊕_{i=1}^{m} ker(T − λ_i I)^n = ⊕_{i=1}^{m} ker(T − λ_i I)^{k_i},

where k_i is the index of the eigenvalue λ_i. It is easy to verify that each ker(T − λ_i I)^n is an invariant subspace of T.

Definition 1.9. Let T ∈ End(V). If λ is an eigenvalue of T, then we call dim(ker(T − λI)^n) the arithmetic multiplicity of λ and dim(ker(T − λI)) the geometric multiplicity of λ. It is clear that the arithmetic multiplicity is not smaller than the geometric one.

Theorem 1.10. Let T ∈ End(V) with underlying field C and let λ_1, ..., λ_m be the eigenvalues of T. Then ∑_i β_i = dim(V), where β_i is the arithmetic multiplicity of λ_i.

Proof. By 1.8, we are done.

Definition 1.11. Let T ∈ End(V) with underlying field C. The characteristic polynomial of T, denoted p_T(t), is the polynomial (t − λ_1)^{β_1}···(t − λ_m)^{β_m}, where the λ_i, β_i are as in 1.10.

Theorem 1.12 (Jordan normal form). Let T ∈ End(V). If T is nilpotent on V, that is, T^m = 0 for some m ∈ N, then there is a basis of V of the form

u_1, Tu_1, ..., T^{a_1 − 1} u_1, ..., u_k, Tu_k, ..., T^{a_k − 1} u_k,

where T^{a_i} u_i = 0 for 1 ≤ i ≤ k. The exponents a_1, ..., a_k are uniquely determined up to permutation by T.

Proof. We shall prove it by induction on dim V. For the inductive step we may assume that dim V ≥ 1. It is clear that T(V) ⊂ V properly, since otherwise T^m(V) = ··· = T(V) = V, a contradiction. Moreover, if T = 0 then the result is trivial. We may now suppose that 0 ⊂ T(V) ⊂ V. By applying the inductive hypothesis to the map T|_{T(V)} we may find v_1, ..., v_l ∈ T(V) such that

v_1, Tv_1, ..., T^{b_1 − 1} v_1, ..., v_l, Tv_l, ..., T^{b_l − 1} v_l

is a basis for T(V) and T^{b_i} v_i = 0 for 1 ≤ i ≤ l. For 1 ≤ i ≤ l choose u_i ∈ V such that Tu_i = v_i. Clearly ker T contains the linearly independent vectors T^{b_1 − 1} v_1, ..., T^{b_l − 1} v_l; extend these to a basis of ker T by adjoining vectors w_1, ..., w_m. We claim that

u_1, Tu_1, ..., T^{b_1} u_1, ..., u_l, Tu_l, ..., T^{b_l} u_l, w_1, ..., w_m

is a basis for V. Linear independence may easily be checked by applying T to a given linear relation among the vectors. By rank-nullity, dim(V) = dim ker(T) + dim T(V) = (l + m) + (b_1 + ··· + b_l), which is the number of vectors in our claimed list. We have thus constructed a basis for V of the desired form.

For the uniqueness, first notice that S_i = span{u_i, Tu_i, ..., T^{a_i − 1} u_i} is an invariant subspace for each i. Let n_j be the number of S_i of dimension j. Then

n_1 + n_2 + ··· + n_t = dim(ker T),
n_2 + ··· + n_t = dim(ker T^2) − dim(ker T),
···
n_t = dim(ker T^t) − dim(ker T^{t−1}),

where t = min{h : T^h = 0} is the largest chain length. Therefore n_1, ..., n_t, and hence the a_i up to permutation, are uniquely determined.
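The counting argument just given can be run on a concrete matrix: the numbers dim ker T^j determine the n_j. Below is a minimal sketch (assuming numpy; the test matrix, assembled from chains of sizes 3, 2, and 1, is illustrative).

```python
import numpy as np

def jordan_block(size):
    # nilpotent Jordan chain of the given length: ones on the superdiagonal
    return np.eye(size, k=1)

T = np.zeros((6, 6))
T[0:3, 0:3] = jordan_block(3)
T[3:5, 3:5] = jordan_block(2)
# the remaining 1x1 block is zero

n = T.shape[0]
dim_ker = [0]                      # dim ker T^0 = 0
P = np.eye(n)
while True:
    P = P @ T
    dim_ker.append(n - np.linalg.matrix_rank(P))
    if np.allclose(P, 0):
        break

t = len(dim_ker) - 1               # t = min{h : T^h = 0}
# #{chains of length >= j} = dim ker T^j - dim ker T^{j-1}
at_least = [dim_ker[j] - dim_ker[j - 1] for j in range(1, t + 1)]
n_j = [at_least[j] - (at_least[j + 1] if j + 1 < t else 0) for j in range(t)]
print("n_1, ..., n_t =", n_j)      # [1, 1, 1]: one chain each of sizes 1, 2, 3
```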
By 1.8, for each V_i = ker(T − λ_i I)^n, the restriction of T − λ_i I to V_i is nilpotent. Hence, by 1.12, there exists a basis α of V_i of the form

((T − λ_i I)|_{V_i})^{a_1 − 1} u_1, ..., u_1, ..., ((T − λ_i I)|_{V_i})^{a_l − 1} u_l, ..., u_l

such that ((T − λ_i I)|_{V_i})^{a_j} u_j = 0 for all j. Then the matrix representation of T|_{V_i} with respect to α is

$$[T|_{V_i}]_\alpha = \bigoplus_{j=1}^{l} J_{a_j}(\lambda_i), \qquad J_{a_j}(\lambda_i) = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix}_{a_j \times a_j}.$$

Matrices of the form J_{a_j}(λ_i) are called Jordan blocks. In turn, with respect to the union of these bases, the matrix of T is a direct sum of Jordan blocks; it is called the Jordan normal form of T, denoted J_T. By the uniqueness statement above, J_T is unique up to permutation of its Jordan blocks.

Corollary 1.12.1 (Cayley-Hamilton). For T ∈ End(V) with underlying field C, p_T(T) = 0.

Proof. By 1.12 and the definition of p_T(t), we have p_T(T)|_{ker(T − λ_i I)^n} = 0 for every eigenvalue λ_i. Then by 1.8, p_T(T) = 0.

Theorem 1.13. Let A, B ∈ M_n(C). Then A ∼ B if and only if J_A is equal to J_B up to permutation of blocks.

Proof. Suppose A ∼ B. By 1.12, J_A ∼ A and J_B ∼ B, and therefore J_A, J_B are both Jordan normal forms of A. By 1.12 again, J_A is equal to J_B up to permutation. Conversely, suppose J_A is equal to J_B up to permutation. Then J_A ∼ J_B and therefore A ∼ B.

Definition 1.14. By a functional, we mean a linear mapping φ from V to F.

Theorem 1.15 (Riesz representation theorem). Let V be an inner product space and φ a functional. Then there uniquely exists v_φ ∈ V such that φ(x) = ⟨x, v_φ⟩ for all x ∈ V.

Proof. By the Gram-Schmidt process, there exists an orthonormal basis β = {q_1, ..., q_n}. Writing x = ∑_{i=1}^{n} x_i q_i with x_1, ..., x_n ∈ F, we have

φ(x) = ∑_{i=1}^{n} x_i φ(q_i) = ⟨x, ∑_{i=1}^{n} \overline{φ(q_i)} q_i⟩ = ⟨x, v_φ⟩,

where v_φ = ∑_i \overline{φ(q_i)} q_i, which proves the existence. For the uniqueness, suppose there are two such vectors, say v_1 and v_2. Then for any x ∈ V we deduce that ⟨x, v_1⟩ = φ(x) = ⟨x, v_2⟩, so ⟨x, v_1 − v_2⟩ = 0; taking x = v_1 − v_2 gives v_1 = v_2, as required.

Corollary 1.15.1. Let T ∈ L(V, W), where V, W are inner product spaces. Then there is a unique linear transformation T* ∈ L(W, V) such that for any x ∈ V, y ∈ W we have ⟨T(x), y⟩_W = ⟨x, T*(y)⟩_V. This linear transformation is called the adjoint of T.

Proof. Given y ∈ W, let φ : V → F be defined by φ(x) = ⟨T(x), y⟩_W. It is clear that φ is a functional. By 1.15, there exists a unique vector z_y ∈ V such that φ(x) = ⟨x, z_y⟩_V. Let T* : W → V be the mapping T*(y) = z_y; T* is well-defined since z_y is unique. Given ay_1 + y_2, where y_1, y_2 ∈ W and a ∈ F, we have

⟨x, T*(ay_1 + y_2)⟩_V = ⟨Tx, ay_1 + y_2⟩_W = \overline{a}⟨Tx, y_1⟩_W + ⟨Tx, y_2⟩_W = ⟨x, aT*(y_1) + T*(y_2)⟩_V

for any x ∈ V. In turn, T* ∈ L(W, V). Suppose there is another such linear transformation, say S. Then given y ∈ W, we have ⟨x, T*(y)⟩_V = ⟨Tx, y⟩_W = ⟨x, S(y)⟩_V for any x ∈ V. Hence T*(y) = S(y) for all y ∈ W since y is arbitrary, and therefore T* = S.

Definition 1.16. We say that T ∈ End(V) is diagonalizable if there is a basis consisting of eigenvectors (not generalized) of T. Moreover, if such a basis is orthonormal, then we say that T is orthogonally diagonalizable.

Definition 1.17. Let T ∈ End(V), where V is an inner product space. Then T is self-adjoint or Hermitian if T* = T.
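For matrices over C with the standard inner product ⟨u, v⟩ = ∑_i u_i \overline{v_i}, the adjoint of Corollary 1.15.1 is realized by the conjugate transpose. A minimal numerical check (assuming numpy; the random data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# <u, v> linear in the first slot; np.vdot conjugates its first argument.
inner = lambda u, v: np.vdot(v, u)

lhs = inner(A @ x, y)            # <Tx, y>
rhs = inner(x, A.conj().T @ y)   # <x, T*y> with T* the conjugate transpose
assert np.isclose(lhs, rhs)
```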
Theorem 1.18. Let V be an inner product space and let T ∈ End(V) be a self-adjoint operator. Then for any nonzero invariant subspace X, there is an eigenvector v ∈ X.

Proof. If the underlying field is C, then by 1.3 we are done. Now suppose F = R. Choose an orthonormal basis β = {q_1, ..., q_n} of X, where here n = dim X. Since X is invariant, T restricts to X; let [T|_X]_β = [T_ij] ∈ M(n, R). Notice that T_ij = ⟨Tq_j, q_i⟩ = ⟨q_j, Tq_i⟩ = T_ji for i, j ∈ {1, ..., n}, since T is self-adjoint. Define f : S^{n−1} ⊆ R^n → R by f(x_1, ..., x_n) = ∑_{i,j} T_ij x_i x_j, where S^{n−1} is the unit sphere. Since S^{n−1} is compact and f is of class C^1, by the extreme value theorem f attains a maximum on S^{n−1}, and by the Lagrange multiplier theorem, at a maximizer we have ∇(f − λg) = 0 for some λ ∈ R, where g(x_1, ..., x_n) = x_1^2 + ··· + x_n^2 − 1. Computing the gradients, for i = 1, ..., n we deduce that

2∑_{j=1}^{n} T_ij x_j = 2λx_i ⟺ [T|_X]_β [x]_β = λ[x]_β

for the vector x ∈ X such that [x]_β = (x_1, ..., x_n) ∈ R^n. Since ∥x∥ = 1, x ≠ 0, and therefore x is an eigenvector of T, as required.

Theorem 1.19. Let T be a self-adjoint operator on an inner product space V. If X is an invariant subspace of T, then the orthogonal complement X^⊥ of X is also an invariant subspace of T.

Proof. Let y ∈ X^⊥. Given x ∈ X, we have ⟨x, Ty⟩ = ⟨Tx, y⟩ = 0, since Tx ∈ X by invariance, y ∈ X^⊥, and T is self-adjoint. Since x is arbitrary, Ty ∈ X^⊥, which shows that X^⊥ is also an invariant subspace.

Theorem 1.20. Let T be a self-adjoint operator on an inner product space V. Then T is orthogonally diagonalizable.

Proof. Let dim V = n. Suppose, to get a contradiction, that a maximal orthonormal set of eigenvectors α = {q_1, ..., q_k} of T has only k < n elements. Let X = span(q_1, ..., q_k), which is an invariant subspace since the q_i's are eigenvectors. By Gram-Schmidt, extend α to an orthonormal basis β = {q_1, ..., q_k, ..., q_n}. Then X^⊥ = span(q_{k+1}, ..., q_n) is nonzero, and by 1.19 and 1.18 there is an eigenvector of T in X^⊥, contradicting the maximality of α.

Theorem 1.21. Let T be a self-adjoint operator on an inner product space V. Then the eigenvalues of T are real.

Proof. Let (v, λ) be an eigenpair of T. Then

λ∥v∥^2 = ⟨Tv, v⟩ = ⟨v, Tv⟩ = \overline{λ}∥v∥^2,

and since v ≠ 0, λ = \overline{λ}, that is, λ ∈ R.

Theorem 1.22. Let S ∈ L(V, W) and T ∈ L(W, U), where U, V, W are inner product spaces. Then (TS)* = S*T*.

Proof. Given u ∈ U and v ∈ V, we have

⟨(TS)*u, v⟩_V = ⟨u, T(Sv)⟩_U = ⟨T*u, Sv⟩_W = ⟨S*T*u, v⟩_V.

Since u, v are arbitrary, we have (TS)* = S*T*.

Theorem 1.23. Let T ∈ L(V, W), where V, W are inner product spaces. Then (T*)* = T.

Proof. Given v ∈ V and w ∈ W, we have

⟨w, (T*)*v⟩_W = ⟨T*w, v⟩_V = \overline{⟨v, T*w⟩_V} = \overline{⟨Tv, w⟩_W} = ⟨w, Tv⟩_W.

Since v, w are arbitrary, we have (T*)* = T.
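Theorems 1.20 and 1.21 can be checked numerically for real symmetric matrices. A minimal sketch, assuming numpy (np.linalg.eigh is the standard routine for self-adjoint input; the random matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
T = (B + B.T) / 2                  # real symmetric, hence self-adjoint

w, Q = np.linalg.eigh(T)           # real eigenvalues, orthonormal eigenvectors
assert np.allclose(Q.T @ Q, np.eye(4))        # columns are an orthonormal basis
assert np.allclose(Q @ np.diag(w) @ Q.T, T)   # T = Q diag(w) Q^T
print("real eigenvalues:", w)
```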
Theorem 1.24 (SVD). Let A : R^n → R^m (resp. A : C^n → C^m) be a matrix. Then there exist unitary matrices, that is, matrices whose inverse is their adjoint, U : R^m → R^m (resp. C^m → C^m) and V : R^n → R^n (resp. C^n → C^n) such that A = UΣV*, where

$$\Sigma = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$$

is an m × n real matrix, D = diag(σ_1, ..., σ_q) with σ_1 ≥ ··· ≥ σ_r > 0 and σ_{r+1} = ··· = σ_q = 0, and q = min{m, n}. The numbers σ_1, ..., σ_q are called the singular values of A.

Proof. (The case A : C^n → C^m is similar.) Consider A*A : R^n → R^n. Since A*A is self-adjoint by 1.23 and 1.22, by 1.20 there exists an orthogonal matrix V : R^n → R^n whose columns are orthonormal eigenvectors of A*A, so that A*A = VΛV^{−1} with Λ diagonal and V* = V^{−1}. Moreover, observe that the eigenvalues of A*A are nonnegative, since λ∥v∥^2 = ⟨A*Av, v⟩ = ∥Av∥^2. Arrange the columns of V so that Λ = diag(λ_1, ..., λ_n) with λ_1 ≥ λ_2 ≥ ··· ≥ λ_r > 0 and λ_{r+1} = ··· = λ_n = 0. For i ∈ {1, ..., r}, define σ_i = √λ_i and u_i = Av_i/σ_i, where the v_i's are the columns of V. Notice that

⟨u_i, u_j⟩ = (1/(σ_i σ_j))⟨Av_i, Av_j⟩ = (1/(σ_i σ_j))⟨v_i, A*Av_j⟩ = (λ_j/(σ_i σ_j))⟨v_i, v_j⟩ = δ_ij

whenever i, j ∈ {1, ..., r}. By Gram-Schmidt, extend α = {u_1, ..., u_r} to an orthonormal basis, say β = {u_1, ..., u_r, u_{r+1}, ..., u_m}. Then

AV = A(v_1 ··· v_n) = (σ_1 u_1 ··· σ_r u_r 0 ··· 0) = (u_1 ··· u_m)Σ = UΣ,

and hence A = UΣV*, as required.

Theorem 1.25 (Schur's decomposition). Suppose V is a complex inner product space and T ∈ End(V). Then T has an upper-triangular matrix with respect to an orthonormal basis of V.

Proof. We shall show that T is upper-triangular with respect to some basis by induction on dim V. Clearly the desired result holds if dim V = 1. Suppose now that dim V > 1 and the desired result holds for all complex vector spaces whose dimension is less than dim V. By 1.3, there exists an eigenpair (v, λ) of T. Let U = im(T − λI). Since v ∈ ker(T − λI), rank-nullity gives dim U < dim V. Given u ∈ U, write u = (T − λI)x for some x ∈ V; then Tu = T((T − λI)x) = (T − λI)(Tx) ∈ U. In turn, U is invariant under T, and therefore by the inductive hypothesis there exists a basis α = {u_1, ..., u_m} of U with respect to which T|_U is upper-triangular. Extend α to a basis of V, say β = {u_1, ..., u_m, v_1, ..., v_n}. Then for each v_j ∈ {v_1, ..., v_n}, we have

T(v_j) = (T − λI)v_j + λv_j ∈ span(u_1, ..., u_m, v_1, ..., v_j),

so [T]_β is upper-triangular. Finally, applying Gram-Schmidt to β does not change the subspaces span(u_1, ..., u_m, v_1, ..., v_j), so T remains upper-triangular with respect to the resulting orthonormal basis, and we are done.

2 Extensions and Alternative Proofs of Some Theorems Above

Lemma 2.1. Suppose p ∈ P(R) is a nonconstant polynomial. Then p has a factorization of the form

p(x) = c(x − λ_1)···(x − λ_m)(x^2 + b_1 x + c_1)···(x^2 + b_M x + c_M),

where c, λ_1, ..., λ_m, b_1, ..., b_M, c_1, ..., c_M ∈ R, with b_j^2 < 4c_j for each j.

Proof. Think of p as an element of P(C). If all the (complex) zeros of p are real, then by the fundamental theorem of algebra we're done. Thus suppose p has a zero λ ∈ C\R. Then \overline{λ} is also a zero of p, and we can write

p(x) = (x − λ)(x − \overline{λ})q(x) = (x^2 − 2(ℜλ)x + |λ|^2)q(x).

By induction on deg p, it remains to prove that q(x) ∈ P(R). Since q(x) = p(x)/(x^2 − 2(ℜλ)x + |λ|^2), we have q(R) ⊆ R. Setting

q(x) = a_0 + a_1 x + ··· + a_{n−2} x^{n−2},

where n = deg p and a_0, ..., a_{n−2} ∈ C, we thus have

0 = ℑq(x) = (ℑa_0) + (ℑa_1)x + ··· + (ℑa_{n−2})x^{n−2}

for all x ∈ R. Hence ℑa_0, ..., ℑa_{n−2} are all zero. In turn, q(x) ∈ P(R), as required.

Lemma 2.2. Suppose T ∈ End(V) is self-adjoint and b, c ∈ R are such that b^2 < 4c. Then T^2 + bT + cI is invertible.

Proof. Let v be a nonzero vector in V. Using ⟨T^2 v, v⟩ = ⟨Tv, Tv⟩ = ∥Tv∥^2 (T is self-adjoint) and the Cauchy-Schwarz inequality,

⟨(T^2 + bT + cI)v, v⟩ = ⟨T^2 v, v⟩ + b⟨Tv, v⟩ + c⟨v, v⟩
  ≥ ∥Tv∥^2 − |b|∥Tv∥∥v∥ + c∥v∥^2
  = (∥Tv∥ − |b|∥v∥/2)^2 + (c − b^2/4)∥v∥^2 > 0.

Hence T^2 + bT + cI is injective, which implies that it is invertible.

Theorem 2.3. Suppose V ≠ {0} and T ∈ End(V) is a self-adjoint operator. Then T has an eigenvalue.

Proof. By 1.3, we only need to consider the case where V is real. Let n = dim V and choose v ∈ V with v ≠ 0. Then

v, Tv, T^2 v, ..., T^n v

cannot be linearly independent. Thus there exist real numbers a_0, ..., a_n, not all 0, such that

0 = a_0 v + a_1 Tv + ··· + a_n T^n v = (a_0 + a_1 T + ··· + a_n T^n)v = p(T)v, where p ∈ P(R).

By 2.1, we then have

0 = p(T)v = c(T^2 + b_1 T + c_1 I)···(T^2 + b_M T + c_M I)(T − λ_1 I)···(T − λ_m I)v,

where c is a nonzero real number, each b_j, c_j, and λ_j is real with b_j^2 < 4c_j, and m + M ≥ 1. By 2.2 and c ≠ 0, we have (T − λ_1 I)···(T − λ_m I)v = 0; moreover, m ≥ 1, since otherwise the invertible quadratic factors would force v = 0. Hence T − λ_j I is not injective for at least one j. In other words, T has an eigenvalue.
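Lemma 2.2 admits a quick numerical sanity check: for self-adjoint T, the eigenvalues of T^2 + bT + cI are p(λ) = λ^2 + bλ + c ≥ c − b^2/4, matching the lower bound in the proof. A minimal sketch (assuming numpy; the matrix and the constants b, c are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
T = (B + B.T) / 2                  # a self-adjoint operator on R^4
b, c = 1.0, 5.0                    # b^2 < 4c

Q = T @ T + b * T + c * np.eye(4)  # T^2 + bT + cI, still symmetric
eigs = np.linalg.eigvalsh(Q)
# every eigenvalue is at least c - b^2/4 > 0, so Q is invertible
assert np.all(eigs >= c - b * b / 4 - 1e-9)
print("smallest eigenvalue of T^2 + bT + cI:", eigs.min())
```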
Theorem 2.4 (Real spectrum theorem). Suppose T ∈ End(V), where V is a real inner product space. Then T is self-adjoint if and only if V has an orthonormal basis consisting of eigenvectors of T.

Proof. One direction is proved in 1.20. Conversely, suppose V has an orthonormal basis β = {q_1, ..., q_n} consisting of eigenvectors of T. Let x = ∑_{i=1}^{n} x_i q_i ∈ V and y = ∑_{i=1}^{n} y_i q_i ∈ V, where the x_i, y_i are real. Then

⟨x, T*y⟩ = ⟨Tx, y⟩ = ∑_{i,j=1}^{n} x_i y_j ⟨T(q_i), q_j⟩ = ∑_{i,j=1}^{n} x_i y_j ⟨q_i, T(q_j)⟩ = ⟨x, Ty⟩,

where the middle equality holds because ⟨T(q_i), q_j⟩ and ⟨q_i, T(q_j)⟩ both equal λ_i δ_ij. Since x, y are arbitrary, we have T = T*, as required.

Theorem 2.5 (Complex spectrum theorem). Suppose T ∈ End(V), where V is a complex inner product space. Then T is normal, that is, T*T = TT*, if and only if T is orthogonally diagonalizable.

Proof. Suppose T is orthogonally diagonalizable. Then there exists an orthonormal basis β = {q_1, ..., q_n} of eigenvectors, so [T]_β = Λ for some diagonal matrix Λ. Hence [T*T]_β = Λ*Λ = ΛΛ* = [TT*]_β, and therefore TT* = T*T.

Now suppose T is normal. By 1.25, there is an orthonormal basis α = {e_1, ..., e_n} of V with respect to which T has an upper-triangular matrix; thus [T]_α = [a_ij] with a_ij = 0 whenever i > j. Since T*T = TT*, we have

∥Tv∥^2 = ⟨Tv, Tv⟩ = ⟨v, T*Tv⟩ = ⟨v, TT*v⟩ = ⟨T*v, T*v⟩ = ∥T*v∥^2

for any v ∈ V. Since [T*]_α is the conjugate transpose of [T]_α, we obtain

∥Te_1∥^2 = |a_11|^2 and ∥T*e_1∥^2 = |a_11|^2 + |a_12|^2 + ··· + |a_1n|^2,

which implies that a_12 = ··· = a_1n = 0. Continuing with e_2, ..., e_n, we see that all off-diagonal entries of [T]_α equal 0, and therefore T is orthogonally diagonalizable.

Definition 2.6. Let T ∈ End(V). We say that R ∈ End(V) is a square root of T if T = R^2.

Proposition 2.7. Let T ∈ End(V). Then the following are equivalent:

1. T is positive semidefinite;
2. T is self-adjoint and all eigenvalues of T are nonnegative;
3. T has a positive semidefinite square root;
4. T has a self-adjoint square root;
5. there exists an operator R ∈ End(V) such that T = R*R.

Proof. We will prove that 1 ⇒ 2 ⇒ 3 ⇒ 4 ⇒ 5 ⇒ 1. First suppose 1 holds. Then T is self-adjoint by the definition of positive semidefiniteness, and if (v, λ) is an eigenpair of T, then 0 ≤ ⟨Tv, v⟩ = λ⟨v, v⟩, hence λ ≥ 0. Suppose 2 holds. By 2.4 or 2.5, there is an orthonormal basis e_1, ..., e_n consisting of eigenvectors of T. Let λ_1, ..., λ_n be the corresponding eigenvalues; then each λ_j ≥ 0. Let R ∈ End(V) be such that Re_j = √λ_j e_j. Then it is clear that R is a positive semidefinite square root of T. Clearly 3 implies 4. Suppose 4 holds. Then there exists R ∈ End(V) such that R* = R and T = R^2; hence T = R*R. Finally, suppose 5 holds. Then there exists R ∈ End(V) such that T = R*R. It is clear that T is self-adjoint. Also, given v ∈ V, we have ⟨Tv, v⟩ = ∥Rv∥^2 ≥ 0, which proves 1.

Corollary 2.7.1. The positive semidefinite square root in 2.7 is unique.

Proof. Suppose R ≥ 0 with R^2 = T, and let (v, λ) be an eigenpair of T. We shall prove that Rv = √λ v; since the eigenvectors of T span V (T is diagonalizable by 2.4 or 2.5), this determines R. By 2.4 or 2.5 again, there is an orthonormal basis e_1, ..., e_n consisting of eigenvectors of R, and since R ≥ 0 there exist λ_1, ..., λ_n ≥ 0 such that Re_j = √λ_j e_j for j = 1, ..., n. Since e_1, ..., e_n is a basis, we can write v = a_1 e_1 + ··· + a_n e_n for some a_1, ..., a_n ∈ F. Thus

Rv = a_1 √λ_1 e_1 + ··· + a_n √λ_n e_n,

and hence

R^2 v = a_1 λ_1 e_1 + ··· + a_n λ_n e_n.

But R^2 = T and Tv = λv. This implies that a_j(λ − λ_j) = 0 for j = 1, ..., n. Hence

v = ∑_{j : λ_j = λ} a_j e_j,

and thus

Rv = ∑_{j : λ_j = λ} a_j √λ e_j = √λ v,

as required. The unique positive semidefinite square root of T is denoted by √T.
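The construction in the proof of 2.7 (2 ⇒ 3) is directly computable: orthogonally diagonalize and take square roots of the eigenvalues. A minimal sketch, assuming numpy (the clip call guards against tiny negative eigenvalues introduced by rounding):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
T = B.T @ B                        # T = B*B is positive semidefinite (item 5)

# Square root as in the proof of 2 => 3: R = Q diag(sqrt(lambda_j)) Q^T.
w, Q = np.linalg.eigh(T)
R = Q @ np.diag(np.sqrt(np.clip(w, 0, None))) @ Q.T

assert np.allclose(R, R.T)         # R is self-adjoint
assert np.allclose(R @ R, T)       # R is a square root of T
```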
Proposition 2.8. Suppose S ∈ End(V). Then the following are equivalent:

1. S is an isometry, that is, ∥Sv∥ = ∥v∥ for all v ∈ V;
2. S preserves inner products, that is, ⟨Su, Sv⟩ = ⟨u, v⟩ for all u, v ∈ V;
3. Se_1, ..., Se_n is orthonormal for every orthonormal list of vectors e_1, ..., e_n ∈ V;
4. there exists an orthonormal basis e_1, ..., e_n of V such that Se_1, ..., Se_n is orthonormal;
5. S*S = I;
6. SS* = I;
7. S* is an isometry;
8. S is invertible and S^{−1} = S*.

Proof. We will prove that 1 ⇒ 2 ⇒ 3 ⇒ 4 ⇒ 5 ⇒ 6 ⇒ 7 ⇒ 8 ⇒ 1. First suppose 1 holds. If V is real, then by polarization,

4⟨Su, Sv⟩ = ∥S(u + v)∥^2 − ∥S(u − v)∥^2 = ∥u + v∥^2 − ∥u − v∥^2 = 4⟨u, v⟩,

so ⟨Su, Sv⟩ = ⟨u, v⟩. If V is complex, then

4⟨Su, Sv⟩ = ∥S(u + v)∥^2 − ∥S(u − v)∥^2 + i∥S(u + iv)∥^2 − i∥S(u − iv)∥^2 = 4⟨u, v⟩,

and again ⟨Su, Sv⟩ = ⟨u, v⟩. Clearly 2 implies 3 and 3 implies 4. Now suppose 4 holds. Then there exists an orthonormal basis e_1, ..., e_n such that Se_1, ..., Se_n is orthonormal, so

⟨S*Se_j, e_k⟩ = ⟨Se_j, Se_k⟩ = δ_jk = ⟨e_j, e_k⟩

for j, k = 1, ..., n; hence S*S = I. Clearly 5 implies 6, since in finite dimensions a left inverse is also a right inverse. Suppose 6 holds; then

∥S*v∥^2 = ⟨S*v, S*v⟩ = ⟨SS*v, v⟩ = ∥v∥^2

for every v ∈ V, hence S* is an isometry. Suppose 7 holds. Replacing S by S* in the implications already proved (and using (S*)* = S), we get SS* = S*S = I, hence 8 holds. Finally, suppose 8 holds; then

∥Sv∥^2 = ⟨Sv, Sv⟩ = ⟨S*Sv, v⟩ = ⟨v, v⟩ = ∥v∥^2

for every v ∈ V, hence S is an isometry.

3 Singular Value Decomposition

Proposition 3.1. Let T ∈ Hom(V, W). Then

1. T*T ≥ 0 and TT* ≥ 0;
2. ker T*T = ker T;
3. rank T*T = rank TT* = rank T = rank T*.

Proof. 1. Trivial, since ⟨T*Tv, v⟩ = ∥Tv∥^2 ≥ 0 (and similarly for TT*).

2. It is clear that ker T ⊆ ker T*T. On the other hand, suppose T*Tv = 0 for some v ∈ V; then 0 = ⟨T*Tv, v⟩ = ⟨Tv, Tv⟩, so Tv = 0.

3. By 2 and rank-nullity, rank(T*T) = rank(T). We next prove that im(T*)^⊥ = ker(T). Let v ∈ ker T. Then for any T*(w) ∈ im(T*), where w ∈ W, we have ⟨v, T*w⟩ = ⟨Tv, w⟩ = 0. Since v is arbitrary, ker(T) ⊆ im(T*)^⊥. On the other hand, let v ∈ im(T*)^⊥; then for any w ∈ W we have 0 = ⟨v, T*w⟩ = ⟨Tv, w⟩, so Tv = 0. Therefore im(T*)^⊥ = ker(T), and by rank-nullity,

n − rank(T*) = dim ker(T) = n − rank(T) ⟺ rank(T) = rank(T*).

Applying the same argument to T* in place of T gives rank(TT*) = rank(T*).

Corollary 3.1.1. The mapping L : T*(W) → T(V), L(x) = T(x), is an isomorphism.

Proof. Since rank(T*) = rank(T), we only need to verify that L is injective. Notice that ker L = im T* ∩ ker T = im T* ∩ im(T*)^⊥ = {0}, and we are done.

Theorem 3.2 (SVD). Let T ∈ Hom(V, W). Suppose dim V = n, dim W = m, and rank(T) = r. Then there exist an orthonormal basis β_1, ..., β_r, ..., β_n of V, an orthonormal basis γ_1, ..., γ_r, ..., γ_m of W, and σ_1 ≥ ··· ≥ σ_r > 0 such that:

1. T(β_i) = σ_i γ_i if 1 ≤ i ≤ r, and T(β_i) = 0 otherwise;
2. T*(γ_i) = σ_i β_i if 1 ≤ i ≤ r, and T*(γ_i) = 0 otherwise.

Proof. By 3.1 and the spectrum theorem, there exists an orthonormal basis β_1, ..., β_n of V consisting of eigenvectors of T*T. By 2.7 (and since rank T*T = r), we may assume that λ_1 ≥ ··· ≥ λ_r > λ_{r+1} = ··· = λ_n = 0 are the eigenvalues of T*T. Let σ_i = √λ_i for i = 1, ..., r. By 3.1 again, we know that T(β_i) ≠ 0 since T*T(β_i) ≠ 0 for i = 1, ..., r, and that T(β_j) = 0 since T*T(β_j) = 0 for j = r + 1, ..., n. Notice that {T(β_1), ..., T(β_r)} is an orthogonal set with ∥T(β_i)∥ = σ_i, since ⟨T(β_i), T(β_j)⟩ = ⟨T*T(β_i), β_j⟩ = λ_i δ_ij. Let γ_i = T(β_i)/σ_i for i = 1, ..., r, and extend {γ_1, ..., γ_r} to an orthonormal basis {γ_1, ..., γ_r, γ_{r+1}, ..., γ_m} of W. It is clear that 1 is fulfilled. For 2, we have T*(γ_i) = (1/σ_i)T*T(β_i) = σ_i β_i for i = 1, ..., r. For j = r + 1, ..., m, γ_j ⊥ T(V), and im(T)^⊥ = ker(T*) by the argument in 3.1 applied to T*, so T*(γ_j) = 0.
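The proof of 3.2 can be mirrored numerically: eigenvectors of T*T give the β_i, and γ_i = T(β_i)/σ_i. A minimal sketch (assuming numpy; the matrix is illustrative and the rank threshold 1e-12 is a numerical-tolerance choice):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))             # T : V -> W with n = 3, m = 5

w, Beta = np.linalg.eigh(A.T @ A)           # eigenpairs of T*T
order = np.argsort(w)[::-1]                 # lambda_1 >= ... >= lambda_n
w, Beta = w[order], Beta[:, order]
r = int(np.sum(w > 1e-12))                  # numerical rank
sigma = np.sqrt(w[:r])
Gamma = (A @ Beta[:, :r]) / sigma           # gamma_i = T(beta_i)/sigma_i

assert np.allclose(Gamma.T @ Gamma, np.eye(r))        # orthonormal, as proved
assert np.allclose(A @ Beta[:, :r], Gamma * sigma)    # T(beta_i) = sigma_i gamma_i
assert np.allclose(A.T @ Gamma, Beta[:, :r] * sigma)  # T*(gamma_i) = sigma_i beta_i
# cross-check against the library SVD of Theorem 1.24:
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False)[:r])
```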
Definition 3.3. Let T ∈ Hom(V, W). Let P : W → T(V) be the orthogonal projection from W onto T(V), and let L : T*(W) → T(V) be the isomorphism in 3.1.1. The pseudoinverse of T, denoted T†, is the mapping

T† : W → V, w ↦ (L^{−1}P)(w).

Proposition 3.4. Let T, β, γ, σ be as in 3.2. Then T†(γ_i) = β_i/σ_i if i = 1, ..., r, and T†(γ_i) = 0 otherwise.

Proof. Since W = T(V) ⊕ ker(T*), for i = 1, ..., r we have P(γ_i) = γ_i, so T†(γ_i) = L^{−1}(γ_i). Notice that L(β_i) = T(β_i) = σ_i γ_i, hence L^{−1}(γ_i) = β_i/σ_i. For j = r + 1, ..., m, P(γ_j) = 0, hence T†(γ_j) = 0.

Corollary 3.4.1. T†T is the orthogonal projection from V onto T*(W), and TT† is the orthogonal projection from W onto T(V).

Proof. The second claim is similar, so we only prove the first. By 3.2 and 3.4, we have T†T(β_i) = β_i if i = 1, ..., r, and T†T(β_i) = 0 otherwise. Since T*(W) = span(β_1, ..., β_r) by 3.2 and T*(W)^⊥ = span(β_{r+1}, ..., β_n), this says exactly that T†T is the orthogonal projection onto T*(W).

Proposition 3.5. Let T, β, γ, σ be as in 3.2. Then

1. (T†)† = T;
2. (T†)* = (T*)†.

Proof. By 3.4, T† has (up to reordering) the singular value decomposition with bases γ, β and singular values 1/σ_i. For i = 1, ..., r, T†(γ_i) = β_i/σ_i implies (T†)†(β_i) = σ_i γ_i = T(β_i), and for j = r + 1, ..., n, (T†)†(β_j) = 0 = T(β_j). Hence (T†)† = T, which proves 1. For 2, since T(β_i) = σ_i γ_i ⟺ T†(γ_i) = β_i/σ_i, by 3.2 applied to T† we get (T†)*(β_i) = γ_i/σ_i. On the other hand, T*(γ_i) = σ_i β_i, so by 3.4 applied to T*, (T*)†(β_i) = γ_i/σ_i. The cases j = r + 1, ... are similar (both sides vanish), and we are done.

Proposition 3.6 (Least squares). Let T ∈ Hom(V, W). Then for any x ∈ V and b ∈ W,

∥T(x) − b∥ ≥ ∥T(T†(b)) − b∥.

Proof. Write T(x) − b = (T(x) − (TT†)(b)) + ((TT†)(b) − b), where T(x) − (TT†)(b) ∈ im T and (TT†)(b) − b ∈ ker T*: indeed, by 3.4.1, TT† is the orthogonal projection onto T(V), so (TT†)(b) − b ∈ T(V)^⊥ = ker T*. Since im T ⊥ ker T*, the Pythagorean theorem gives

∥T(x) − b∥^2 = ∥T(x) − (TT†)(b)∥^2 + ∥(TT†)(b) − b∥^2 ≥ ∥(TT†)(b) − b∥^2,

hence ∥T(x) − b∥ ≥ ∥T(T†(b)) − b∥.
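Proposition 3.6 is the statement that x = T†(b) solves the least squares problem. numpy's pinv builds the pseudoinverse from the SVD in the manner of Proposition 3.4 (reciprocals of the nonzero singular values, zero on the rest). A minimal sketch, assuming numpy; the data is illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))    # overdetermined system: T : R^3 -> R^6
b = rng.standard_normal(6)

x_dagger = np.linalg.pinv(A) @ b                   # T†(b)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # library least squares
assert np.allclose(x_dagger, x_lstsq)

# Any other x gives a residual at least as large, as Proposition 3.6 asserts.
for _ in range(100):
    x = rng.standard_normal(3)
    assert np.linalg.norm(A @ x - b) >= np.linalg.norm(A @ x_dagger - b) - 1e-12
```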