Linear Algebra

FIN
Contents

1 Eigenanalysis
2 Extensions and Alternative Proofs of Some Theorems Above
3 Singular Value Decomposition
1 Eigenanalysis
In this note, all vector spaces V are finite dimensional.
Definition 1.1. Let T be a linear transformation on a vector space V (that is, T ∈ End(V)). Let X ⊆ V be a subspace of V. We say that X is an invariant subspace of T if T(X) ⊆ X.
Definition 1.2. Let T ∈ End(V), then an eigenvector of T is a nonzero vector v ∈ V such that T (v) = λv
for some λ ∈ F, where F is the underlying field of V. In this case, we say that λ is an eigenvalue of T and
(v, λ) is an eigenpair of T .
Theorem 1.3. Let T ∈ End(V) with underlying field C and X ⊆ V be a nonzero invariant subspace of T .
Then there exists an eigenvector v ∈ X such that T (v) = λv for some λ ∈ C.
Proof. Since X ̸= {0}, there exists a nonzero vector v ∈ X. Suppose dim(X) = r; then the r + 1 vectors v, T(v), · · · , T^r(v) all lie in X (as X is invariant) and must form a linearly dependent set. Hence, there exist a0, · · · , ar ∈ C, not all zero, such that (a0 I + a1 T + · · · + ar T^r)v = 0. The polynomial p(t) = a0 + a1 t + · · · + ar t^r is nonzero and nonconstant (if p were a nonzero constant, then p(T)v = a0 v ̸= 0). Since C is algebraically closed, p(T) can be factorized into c(T − λs I) · · · (T − λ1 I) with c ̸= 0, 1 ≤ s ≤ r and some λ1, · · · , λs ∈ C (the λi's may coincide).
Let v0 = v and vi = (T − λi I) · · · (T − λ1 I)v for i = 1, · · · , s. Since v0 ̸= 0 and vs = c^{−1} p(T)v = 0, there is a smallest index j ∈ {0, · · · , s − 1} with vj ̸= 0 and vj+1 = (T − λj+1 I)vj = 0. Hence (vj, λj+1) is an eigenpair of T, and vj ∈ X since X is invariant.
Theorem 1.4. Let T ∈ End(V) and let (v1, λ1), · · · , (vm, λm) be eigenpairs of T with λi ̸= λj whenever i ̸= j. Then {v1, · · · , vm} is a linearly independent subset of V.
Proof. Consider a1 v1 + · · · + am vm = 0. Applying T to both sides gives a1 λ1 v1 + · · · + am λm vm = 0, while multiplying both sides by λ1 gives a1 λ1 v1 + · · · + am λ1 vm = 0. Subtracting, a2(λ2 − λ1)v2 + · · · + am(λm − λ1)vm = 0. Continuing this process, we get am(λm − λ1) · · · (λm − λm−1)vm = 0, which implies that am = 0. By induction on m, all the ai vanish and we are done.
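As a quick numerical illustration of 1.4 (not part of the text; the matrix below is an arbitrary choice), the eigenvectors a linear-algebra library returns for distinct eigenvalues do form a full-rank set:

```python
import numpy as np

# An arbitrary matrix with three distinct eigenvalues (2, 3 and 5).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

eigvals, eigvecs = np.linalg.eig(A)      # columns of eigvecs are eigenvectors

# Theorem 1.4: eigenvectors for distinct eigenvalues are linearly independent,
# so the matrix whose columns are these eigenvectors has full rank.
print(sorted(eigvals))                   # [2.0, 3.0, 5.0]
print(np.linalg.matrix_rank(eigvecs))    # 3
```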
Definition 1.5. Let T ∈ End(V). We say that a nonzero vector v ∈ V is a generalized eigenvector if there exist λ ∈ F and k ∈ N such that (T − λI)^k v = 0 (if such v exists, λ is called a (generalized) eigenvalue). For a (generalized) eigenvalue λ, the index of λ is the minimum k ∈ N such that ker(T − λI)^k contains all generalized eigenvectors with eigenvalue λ. Such k exists since ker(T − λI) ⊆ ker(T − λI)^2 ⊆ · · · ⊆ ker(T − λI)^n ⊆ · · · is an ascending chain in a finite dimensional vector space V, and hence stabilizes.
Theorem 1.6. Let T ∈ End(V). Let λ be an eigenvalue with index k. Then
ker((T − λI)^k) = ker((T − λI)^n),
where n = dim(V).
Proof. Since λ is of index k, there exists a nonzero vector v such that (T − λI)^k v = 0 while (T − λI)^j v ̸= 0 whenever j < k. Consider
a0 v + a1 (T − λI)v + · · · + ak−1 (T − λI)^{k−1} v = 0.
Applying (T − λI)^{k−1} to both sides, we deduce that a0 (T − λI)^{k−1} v = 0, so a0 = 0. Repeating this process, we obtain that {v, (T − λI)v, · · · , (T − λI)^{k−1} v} is a linearly independent set. Hence k ≤ n, and since the kernels form an ascending chain, ker((T − λI)^k) ⊆ ker((T − λI)^n).
For the converse, suppose, to get a contradiction, that x ∈ ker((T − λI)^n) with (T − λI)^k x ̸= 0. Then x is a generalized eigenvector not contained in ker((T − λI)^k), so by definition the index of λ is greater than k, a contradiction.
Theorem 1.7. If λ is an eigenvalue of T ∈ End(V), then
ker(T − λI)^n ⊕ im(T − λI)^n = V,
where n = dim(V).
Proof. By the rank–nullity theorem, dim(ker(T − λI)^n) + dim(im(T − λI)^n) = n. It remains to prove that ker(T − λI)^n ∩ im(T − λI)^n = {0}. Suppose x ∈ ker(T − λI)^n ∩ im(T − λI)^n; then (T − λI)^n x = 0 and there exists y ∈ V such that x = (T − λI)^n y. Hence (T − λI)^{2n} y = 0. Since the chain of kernels stabilizes by the index k ≤ n (see 1.6), ker((T − λI)^{2n}) = ker((T − λI)^n), so x = (T − λI)^n y = 0, as required.
Theorem 1.8 (Principal decomposition theorem). Let V be a vector space with dimension n and underlying field C, and let T ∈ End(V). Then the generalized eigenvectors of T span V.
Proof. By 1.3, there exists an eigenvalue λ1 ∈ C of T. By 1.7, decompose V into ker(T − λ1 I)^n and im(T − λ1 I)^n. If dim(ker(T − λ1 I)^n) = n, then we are done. Otherwise, for any vector y ∈ im(T − λ1 I)^n there is a vector x ∈ V such that y = (T − λ1 I)^n x, and then T y = (T − λ1 I)^n (T x) ∈ im(T − λ1 I)^n. Hence im(T − λ1 I)^n is a nonzero invariant subspace of T. By 1.3 and 1.7 again, we can decompose im(T − λ1 I)^n into ker(T − λ2 I)^n and im(T − λ2 I)^n for some eigenvalue λ2 ∈ C. Continuing this process, we end up with λ1, · · · , λm ∈ C such that
V = ⊕_{i=1}^{m} ker(T − λi I)^n = ⊕_{i=1}^{m} ker(T − λi I)^{ki},
where ki is the index of the eigenvalue λi (the second equality is 1.6).
It is easy to verify that each ker(T − λi I)^n is an invariant subspace of T.
Definition 1.9. Let T ∈ End(V). If λ is an eigenvalue of T, then we say that dim(ker(T − λI)^n) is the arithmetic multiplicity of λ and dim(ker(T − λI)) is the geometric multiplicity of λ. It is clear that the arithmetic multiplicity is not smaller than the geometric one.
Theorem 1.10. Let T ∈ End(V) with underlying field C and let λ1, · · · , λm be the eigenvalues of T. Then Σ_{i=1}^{m} βi = dim(V), where βi is the arithmetic multiplicity of λi.
Proof. Immediate from 1.8.
Definition 1.11. Let T ∈ End(V) with underlying field C. The characteristic polynomial of T, denoted by pT(t), is the polynomial (t − λ1)^{β1} · · · (t − λm)^{βm}, where λi, βi are defined in 1.10.
Theorem 1.12 (Jordan normal form). Let T ∈ End(V). If T is nilpotent on V, that is, T^m = 0 for some m ∈ N, then there is a basis of V of the form
u1, T u1, · · · , T^{a1−1} u1, · · · , uk, T uk, · · · , T^{ak−1} uk,
where T^{ai} ui = 0 for 1 ≤ i ≤ k. The numbers a1, · · · , ak are uniquely determined up to permutation by T.
Proof. We shall prove it by induction on dim V. For the inductive step we may assume that dim V ≥ 1. It is clear that T(V) ⊂ V, since otherwise T^m(V) = · · · = T(V) = V, a contradiction. Moreover, if T = 0 then the result is trivial. We may now suppose that 0 ⊂ T(V) ⊂ V.
By applying the inductive hypothesis to the map T|_{T(V)} we may find v1, · · · , vl ∈ T(V) such that
v1, T v1, · · · , T^{b1−1} v1, · · · , vl, T vl, · · · , T^{bl−1} vl
is a basis for T(V) and T^{bi} vi = 0 for 1 ≤ i ≤ l.
For 1 ≤ i ≤ l choose ui ∈ V such that T ui = vi. Clearly ker T contains the linearly independent vectors T^{b1−1} v1, · · · , T^{bl−1} vl; extend these to a basis of ker T by adjoining vectors w1, · · · , wm. We claim that
u1, T u1, · · · , T^{b1} u1, · · · , ul, T ul, · · · , T^{bl} ul, w1, · · · , wm
is a basis for V. Linear independence may easily be checked by applying T to a given linear relation between the vectors. By rank–nullity, dim(V) = dim ker(T) + dim T(V) = l + m + b1 + · · · + bl, which is the number of vectors in our claimed basis. We have therefore constructed a basis for V of the desired form.
For the uniqueness, first notice that Si = span{ui, T ui, · · · , T^{ai−1} ui} is an invariant subspace for all i. Let nj be the number of Si of dimension j. Then
n1 + n2 + · · · + nt = dim(ker T),
n2 + · · · + nt = dim(ker T^2) − dim(ker T),
· · ·
nt = dim(ker T^t) − dim(ker T^{t−1}),
where t = min{h : T^h = 0} is the largest block size. Therefore n1, · · · , nt are uniquely determined.
By 1.8, for each Vi = ker(T − λi I)^n, the restriction of T − λi I to Vi is nilpotent. Hence there exists a basis α of the form ((T − λi I)|_{Vi})^{a1−1} u1, · · · , u1, · · · , ((T − λi I)|_{Vi})^{al−1} ul, · · · , ul such that ((T − λi I)|_{Vi})^{aj} uj = 0 for all j. Then the matrix representation of T|_{Vi} with respect to α is

$$[T|_{V_i}]_{\alpha} = \bigoplus_{j=1}^{l} J_{a_j}(\lambda_i), \qquad
J_{a_j}(\lambda_i) = \begin{pmatrix}
\lambda_i & 1 & 0 & \cdots & 0 \\
0 & \lambda_i & 1 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & 0 \\
\vdots & & & \ddots & 1 \\
0 & \cdots & \cdots & 0 & \lambda_i
\end{pmatrix}_{a_j \times a_j}.$$
The matrices of the form J_{aj}(λi) are called Jordan blocks. In turn, [T]_α is a direct sum of Jordan blocks; this matrix is called the Jordan normal form of T, denoted by JT. By the uniqueness above, JT is unique up to permutation of the Jordan blocks.
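For a concrete computation, a computer algebra system can produce the Jordan normal form directly; a minimal sketch using sympy (the matrix is an arbitrary example with one nontrivial Jordan block):

```python
import sympy as sp

# Eigenvalue 2 with a 2x2 Jordan block, plus a 1x1 block for eigenvalue 3.
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])

P, J = A.jordan_form()                 # A = P * J * P**(-1)
sp.pprint(J)                           # direct sum of J_2(2) and J_1(3)
assert (P * J * P.inv() - A) == sp.zeros(3, 3)
```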
Corollary 1.12.1 (Cayley–Hamilton). For T ∈ End(V) with underlying field C, pT(T) = 0.
Proof. By 1.12 and the definition of pT(t), we have pT(T)|_{ker(T−λi I)^n} = 0 for every eigenvalue λi. Then by 1.8, pT(T) = 0.
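A quick numerical sanity check of Cayley–Hamilton (the random matrix and the Horner-style evaluation are illustrative choices, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Coefficients of the characteristic polynomial, highest degree first:
# p(t) = c[0] t^4 + c[1] t^3 + c[2] t^2 + c[3] t + c[4], with c[0] = 1.
c = np.poly(A)

# Evaluate p(A) by Horner's rule, with t replaced by the matrix A.
pA = np.zeros_like(A)
for coeff in c:
    pA = pA @ A + coeff * np.eye(4)

print(np.allclose(pA, 0))   # True: p_A(A) = 0 up to rounding error
```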
Theorem 1.13. Let A, B ∈ Mn(C). Then A ∼ B if and only if JA equals JB up to permutation of blocks.
Proof. Suppose A ∼ B. By 1.12, JA ∼ A and JB ∼ B, and therefore JA and JB are both Jordan normal forms of A. By the uniqueness in 1.12, JA equals JB up to permutation.
Conversely, suppose JA equals JB up to permutation. Then JA ∼ JB and therefore A ∼ B.
Definition 1.14. By a functional, we mean a linear mapping φ from V to F.
Theorem 1.15 (Riesz representation theorem). Let V be an inner product space and φ be a functional. Then there exists a unique vφ ∈ V such that
φ(x) = ⟨x, vφ⟩
for all x ∈ V.
Proof. By the Gram–Schmidt process, there exists an orthonormal basis β = {q1, · · · , qn}. Writing x = x1 q1 + · · · + xn qn with x1, · · · , xn ∈ F, we have
φ(x) = x1 φ(q1) + · · · + xn φ(qn).
Then
φ(x) = ⟨x, \overline{φ(q1)} q1 + · · · + \overline{φ(qn)} qn⟩ = ⟨x, vφ⟩,
where vφ = Σ_i \overline{φ(qi)} qi (the conjugates are needed since the inner product is conjugate-linear in its second slot), which proves the existence.
For the uniqueness, suppose there are two such vectors, say vφ1 and vφ2. Then for any x ∈ V, we deduce that
⟨x, vφ1⟩ = φ(x) = ⟨x, vφ2⟩ =⇒ vφ1 = vφ2,
as required.
Corollary 1.15.1. Let T ∈ L(V, W), where V, W are inner product spaces. Then there is a unique linear
transformation T ∗ ∈ L(W, V) such that for any x ∈ V, y ∈ W, we have
⟨T (x), y⟩W = ⟨x, T ∗ (y)⟩V .
Such a linear transformation is called the adjoint of T.
Proof. Given y ∈ W, let φ : V → F be defined by φ(x) = ⟨T(x), y⟩_W. It is clear that φ is a functional. By 1.15, there exists a unique vector in V, say zy, such that φ(x) = ⟨x, zy⟩_V. Let T∗ : W → V be the mapping T∗(y) = zy; T∗ is well-defined since zy is unique. Given ay1 + y2, where y1, y2 ∈ W and a ∈ F, we have
⟨x, T∗(ay1 + y2)⟩_V = ⟨T x, ay1 + y2⟩_W = \overline{a} ⟨T x, y1⟩_W + ⟨T x, y2⟩_W = ⟨x, aT∗(y1) + T∗(y2)⟩_V
for any x ∈ V. In turn, T∗ is linear, so T∗ ∈ L(W, V).
Suppose there is another such linear transformation, say S. Then given y ∈ W, we have
⟨x, T∗(y)⟩_V = ⟨T x, y⟩_W = ⟨x, S(y)⟩_V
for any x ∈ V. Since x is arbitrary, T∗(y) = S(y); since y ∈ W is arbitrary, T∗ = S.
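In coordinates, with the standard inner product ⟨x, y⟩ = Σ xi \overline{yi} on C^n, the adjoint of a matrix is its conjugate transpose. A small numerical check (random data chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))  # T : C^4 -> C^3
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def inner(u, v):
    # <u, v>, linear in the first slot, conjugate-linear in the second.
    return np.vdot(v, u)

T_star = T.conj().T    # the adjoint as a matrix: the conjugate transpose

# <T x, y>_W == <x, T* y>_V
print(np.isclose(inner(T @ x, y), inner(x, T_star @ y)))   # True
```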
Definition 1.16. We say that T ∈ End(V) is diagonalizable if there is a basis consisting of eigenvectors (not generalized) of T. Moreover, if such a basis is orthonormal, then we say that T is orthogonally diagonalizable.
Definition 1.17. Let T ∈ End(V ), where V is an inner product space. Then T is self-adjoint or Hermitian
if T ∗ = T .
Theorem 1.18. Let V be an inner product space and T ∈ End(V) be a self-adjoint operator. Then for any nonzero invariant subspace X, there is an eigenvector v ∈ X.
Proof. If the underlying field is C, then we are done by 1.3. Now suppose F = R. There exists an orthonormal basis β = {q1, · · · , qn} of X (here n = dim X). Define f : S ⊆ R^n → R by f(x1, · · · , xn) = Σ_{i,j} Tij xi xj, where [T|_X]_β = [Tij] ∈ M(n, R) and S is the unit sphere in R^n. Notice that Tij = Tji for i, j ∈ {1, · · · , n}, since T is self-adjoint and β is orthonormal. Since S is compact and f is of class C^1, by the extreme value theorem f attains a maximum on S, and by the Lagrange multiplier theorem we have ∇(f − λg) = 0 at a maximizer, for some λ ∈ R, where g(x1, · · · , xn) = x1^2 + · · · + xn^2 − 1. Computing the gradients, for i = 1, · · · , n we deduce that
2 Σ_{j=1}^{n} Tij xj = 2λxi ⟺ [T|_X]_β [x]_β = λ[x]_β
for the vector x ∈ X with [x]_β = (x1, · · · , xn) ∈ R^n. Since ∥x∥ = 1, x ̸= 0, and therefore x is an eigenvector of T, as required.
Theorem 1.19. Let T be a self-adjoint operator on an inner product space V. If X is an invariant subspace of T, then the orthogonal complement X⊥ is also an invariant subspace of T.
Proof. Let y ∈ X⊥. Given x ∈ X, we have
⟨x, T y⟩ = ⟨T x, y⟩ = 0,
since T x ∈ X (X is invariant and T is self-adjoint) and y ∈ X⊥. Since x is arbitrary, we have T y ∈ X⊥, which shows that X⊥ is also an invariant subspace.
Theorem 1.20. Let T be a self-adjoint operator on an inner product space V. Then T is orthogonally diagonalizable.
Proof. Let dim V = n. Suppose, to get a contradiction, that a maximal orthonormal set of eigenvectors α = {q1, · · · , qk} of T has only k < n elements. Let X = span(q1, · · · , qk), which is an invariant subspace since the qi's are eigenvectors. By Gram–Schmidt, extend α to an orthonormal basis β = {q1, · · · , qk, · · · , qn}. Then X⊥ = span(qk+1, · · · , qn) ̸= {0} is invariant by 1.19, so by 1.18 there is an eigenvector in X⊥, contradicting the maximality of α.
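Numerically, the orthogonal diagonalization of a self-adjoint operator is what numpy.linalg.eigh computes; a minimal sketch with an arbitrary real symmetric matrix (its real eigenvalues also illustrate 1.21 below):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # an arbitrary real symmetric matrix

w, Q = np.linalg.eigh(A)                 # A = Q diag(w) Q^T

print(np.allclose(Q.T @ Q, np.eye(4)))          # columns of Q are orthonormal
print(np.allclose(Q @ np.diag(w) @ Q.T, A))     # orthogonal diagonalization
print(np.isrealobj(w))                          # the eigenvalues are real
```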
Theorem 1.21. Let T be a self-adjoint operator on an inner product space V. Then the eigenvalues of T are real.
Proof. Let (v, λ) be an eigenpair of T. Then
λ∥v∥^2 = ⟨T v, v⟩ = ⟨v, T v⟩ = \overline{λ}∥v∥^2,
and since v ̸= 0, λ = \overline{λ}, i.e., λ ∈ R.
Theorem 1.22. Let S ∈ L(V, W), T ∈ L(W, U), where U, V, W are inner product spaces. Then
(T S)∗ = S ∗ T ∗ .
Proof. Given u ∈ U and v ∈ V, we have
⟨(T S)∗ u, v⟩V = ⟨u, T (Sv)⟩U = ⟨T ∗ u, Sv⟩W = ⟨S ∗ T ∗ u, v⟩V .
Since u, v are arbitrary, we have (T S)∗ = S ∗ T ∗ .
Theorem 1.23. Let T ∈ L(V, W), where V, W are inner product spaces. Then (T ∗ )∗ = T .
Proof. Given v ∈ V, w ∈ W, we have
⟨w, (T∗)∗ v⟩_W = ⟨T∗ w, v⟩_V = \overline{⟨v, T∗ w⟩_V} = \overline{⟨T v, w⟩_W} = ⟨w, T v⟩_W.
Since v, w are arbitrary, we have (T ∗ )∗ = T .
Theorem 1.24 (SVD). Let A : R^n → R^m (resp. A : C^n → C^m) be a matrix. Then there exist unitary matrices (that is, matrices whose inverse equals their adjoint) U : R^m → R^m (resp. C^m → C^m) and V : R^n → R^n (resp. C^n → C^n) such that
A = U Σ V∗,
where
$$\Sigma = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$$
is an m × n real matrix, D = diag(σ1, · · · , σq), σ1 ≥ · · · ≥ σr > 0, σr+1 = · · · = σq = 0, and q = min{m, n}. The numbers σ1, · · · , σq are called the singular values of A.
Proof. (The case A : C^n → C^m is similar.) Consider A∗A : R^n → R^n. Since A∗A is self-adjoint by 1.23 and 1.22, by 1.20 there exists an orthogonal matrix V : R^n → R^n whose columns are orthonormal eigenvectors of A∗A, so that
A∗A = V Λ V^{−1},
where Λ is diagonal. Since the columns of V are orthonormal, V∗ = V^{−1}. Moreover, the eigenvalues of A∗A are nonnegative, since λ∥v∥^2 = ⟨A∗Av, v⟩ = ∥Av∥^2 ≥ 0. Arrange the columns of V so that Λ = diag(λ1, · · · , λn) with λ1 ≥ λ2 ≥ · · · ≥ λr > 0 and λr+1 = · · · = λn = 0. For i ∈ {1, · · · , r}, define σi = √λi and ui = Avi/σi, where the vi's are the columns of V. Notice that
⟨ui, uj⟩ = (1/(σiσj)) ⟨Avi, Avj⟩ = (1/(σiσj)) ⟨vi, A∗Avj⟩ = δij
whenever i, j ∈ {1, · · · , r}. By Gram–Schmidt, extend α = {u1, · · · , ur} to an orthonormal basis, say β = {u1, · · · , ur, ur+1, · · · , um}. Then
AV = A [v1 · · · vn] = [σ1u1 · · · σrur 0 · · · 0] = [u1 · · · um] Σ = U Σ,
and hence A = U Σ V∗, as required.
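The proof is constructive, and can be mirrored numerically: diagonalize A∗A to get V and the σi, then set ui = Avi/σi. A sketch under the assumption that A has full column rank (the random matrix is an arbitrary choice), compared against the library SVD:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))              # full column rank almost surely

# Step 1 of the proof: orthogonally diagonalize the symmetric PSD matrix A^T A.
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]                # arrange eigenvalues decreasingly
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)                         # sigma_i = sqrt(lambda_i)
U_r = A @ V / sigma                          # u_i = A v_i / sigma_i (here r = n = 3)

# With r = n, the first r columns of U already reproduce A:
print(np.allclose(U_r @ np.diag(sigma) @ V.T, A))

# Compare with the built-in factorization.
U, s, Vt = np.linalg.svd(A)
print(np.allclose(sigma, s))
```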
Theorem 1.25 (Schur's decomposition). Suppose V is a complex vector space and T ∈ End(V). Then T has an upper-triangular matrix with respect to some orthonormal basis of V.
Proof. We shall show that T is upper triangular with respect to some basis by induction on dim V. Clearly the desired result holds if dim V = 1.
Suppose now that dim V > 1 and that the desired result holds for all complex vector spaces whose dimension is less than dim V.
By 1.3, there exists an eigenpair (v, λ) of T. Let U = im(T − λI). Since T − λI is not injective (v ∈ ker(T − λI)), it is not surjective, so dim U < dim V. Given u ∈ U, write u = (T − λI)x for some x ∈ V; then T u = T((T − λI)x) = (T − λI)(T x) ∈ U. In turn, U is invariant under T, and therefore by the inductive hypothesis there exists a basis α = {u1, · · · , um} of U with respect to which T|_U is upper triangular. Extend α to a basis of V, say β = {u1, · · · , um, v1, · · · , vn}. Then given vj ∈ {v1, · · · , vn}, we have
T(vj) = (T − λI)vj + λvj ∈ span(u1, · · · , um, v1, · · · , vj),
so [T]_β is upper triangular. Now apply Gram–Schmidt to β; since Gram–Schmidt preserves the spans of initial segments of a basis, T remains upper triangular with respect to the resulting orthonormal basis, and we are done.
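scipy exposes this factorization directly as the (complex) Schur form A = ZTZ∗ with Z unitary and T upper triangular; a quick check on an arbitrary random matrix:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

T, Z = schur(A, output='complex')       # A = Z T Z^*, T upper triangular

print(np.allclose(np.tril(T, -1), 0))             # T is upper triangular
print(np.allclose(Z @ Z.conj().T, np.eye(4)))     # Z is unitary
print(np.allclose(Z @ T @ Z.conj().T, A))         # recovers A
```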
2 Extensions and Alternative Proofs of Some Theorems Above
Lemma 2.1. Suppose p ∈ P(R) is a nonconstant polynomial. Then p has a factorization of the form
p(x) = c(x − λ1) · · · (x − λm)(x^2 + b1x + c1) · · · (x^2 + bMx + cM),
where c, λ1, · · · , λm, b1, · · · , bM, c1, · · · , cM ∈ R, with bj^2 < 4cj for each j.
Proof. Think of p as an element of P(C). If all the (complex) zeros of p are real, then by the fundamental theorem of algebra we are done. Thus suppose p has a zero λ ∈ C \ R. Then \overline{λ} is also a zero of p, so we can write
p(x) = (x − λ)(x − \overline{λ})q(x) = (x^2 − 2(ℜλ)x + |λ|^2)q(x).
By induction on deg p, it remains to prove that q(x) ∈ P(R).
Since q(x) = p(x)/(x^2 − 2(ℜλ)x + |λ|^2), we have q(R) ⊆ R. Set
q(x) = a0 + a1x + · · · + an−2 x^{n−2},
where n = deg p and a0, · · · , an−2 ∈ C. We thus have
0 = ℑq(x) = (ℑa0) + (ℑa1)x + · · · + (ℑan−2)x^{n−2}
for all x ∈ R. Hence ℑa0, · · · , ℑan−2 are all zero. In turn, q(x) ∈ P(R), as required.
Lemma 2.2. Suppose T ∈ End(V) is self-adjoint and b, c ∈ R are such that b^2 < 4c. Then
T^2 + bT + cI
is invertible.
Proof. Let v be a nonzero vector in V. Using self-adjointness (⟨T^2v, v⟩ = ∥Tv∥^2) and the Cauchy–Schwarz inequality,
⟨(T^2 + bT + cI)v, v⟩ = ⟨T^2v, v⟩ + b⟨Tv, v⟩ + c⟨v, v⟩
≥ ∥Tv∥^2 − |b| ∥Tv∥ ∥v∥ + c∥v∥^2
= (∥Tv∥ − |b|∥v∥/2)^2 + (c − b^2/4)∥v∥^2 > 0.
Hence T^2 + bT + cI is injective, and since V is finite dimensional this implies that it is invertible.
Theorem 2.3. Suppose V ̸= {0} and T ∈ End(V) is a self-adjoint operator. Then T has an eigenvalue.
Proof. By 1.3, we only need to consider the case when V is real. Let n = dim V and choose v ∈ V with v ̸= 0. Then
v, T v, T^2 v, · · · , T^n v
cannot be linearly independent. Thus there exist real numbers a0, · · · , an, not all 0, such that
0 = a0 v + a1 T v + · · · + an T^n v = (a0 + a1 T + · · · + an T^n)v = p(T)v, where p(x) = a0 + a1x + · · · + anx^n ∈ P(R).
By 2.1, we then have
0 = (a0 + a1 T + · · · + an T^n)v = c(T^2 + b1 T + c1 I) · · · (T^2 + bM T + cM I)(T − λ1 I) · · · (T − λm I)v,
where c is a nonzero real number, each bj, cj and λj is real with bj^2 < 4cj, and m + M ≥ 1. By 2.2 and c ̸= 0, we have
(T − λ1 I) · · · (T − λm I)v = 0.
In particular m ≥ 1, since otherwise this reads v = 0. Hence T − λj I is not injective for at least one j. In other words, T has an eigenvalue.
Theorem 2.4 (Real spectrum theorem). Suppose T ∈ End(V), where V is real. Then T is self-adjoint if and only if V has an orthonormal basis consisting of eigenvectors of T.
Proof. One direction is proved in 1.20. Conversely, suppose V has an orthonormal basis β = {q1, · · · , qn} consisting of eigenvectors of T. Let x = Σ_{i=1}^{n} xi qi and y = Σ_{i=1}^{n} yi qi in V, where the xi, yi are real. Then
⟨x, T∗y⟩ = ⟨T x, y⟩ = Σ_{i,j=1}^{n} xi yj ⟨T(qi), qj⟩ = Σ_{i,j=1}^{n} xi yj ⟨qi, T(qj)⟩ = ⟨x, T y⟩,
where the middle equality holds because ⟨T(qi), qj⟩ = λi δij = λj δij = ⟨qi, T(qj)⟩. Since x, y are arbitrary, we have T = T∗, as required.
Theorem 2.5 (Complex spectrum theorem). Suppose T ∈ End(V), where V is complex. Then T is normal, that is, T∗T = TT∗, if and only if T is orthogonally diagonalizable.
Proof. Suppose T is orthogonally diagonalizable. Then there exists an orthonormal basis β = {q1, · · · , qn} consisting of eigenvectors of T, so [T]_β = Λ for some diagonal matrix Λ. Hence
[T∗T]_β = Λ∗Λ = ΛΛ∗ = [TT∗]_β,
and therefore T∗T = TT∗.
Now suppose T is normal. By 1.25, there is an orthonormal basis α = {e1, · · · , en} of V with respect to which T has an upper-triangular matrix; thus [T]_α = [aij] with aij = 0 whenever i > j. Since T∗T = TT∗, we have
∥Tv∥^2 = ⟨Tv, Tv⟩ = ⟨v, T∗Tv⟩ = ⟨v, TT∗v⟩ = ⟨T∗v, T∗v⟩ = ∥T∗v∥^2
for any v ∈ V. From [T]_α we obtain
∥Te1∥^2 = |a11|^2 and ∥T∗e1∥^2 = |a11|^2 + |a12|^2 + · · · + |a1n|^2,
so ∥Te1∥ = ∥T∗e1∥ implies a12 = · · · = a1n = 0. Continuing with e2, · · · , en, we see that all the off-diagonal entries of [T]_α equal 0, and therefore T is orthogonally diagonalizable.
Definition 2.6. Let T ∈ End(V). We say that T has a square root R ∈ End(V) if T = R^2.
Proposition 2.7. Let T ∈ End(V). Then the following are equivalent:
1. T is positive semidefinite;
2. T is self-adjoint and all eigenvalues of T are nonnegative;
3. T has a positive semidefinite square root;
4. T has a self-adjoint square root;
5. there exists an operator R ∈ End(V) such that T = R∗R.
Proof. We will prove that 1 ⇒ 2 ⇒ 3 ⇒ 4 ⇒ 5 ⇒ 1.
First suppose 1 holds. Then T is self-adjoint by the definition of positive semidefiniteness. Let (λ, v) be an eigenpair of T; then 0 ≤ ⟨Tv, v⟩ = λ⟨v, v⟩, hence λ ≥ 0.
Suppose 2 holds. By 2.4 or 2.5, there is an orthonormal basis e1, · · · , en consisting of eigenvectors of T. Let λ1, · · · , λn be the corresponding eigenvalues; then each λj ≥ 0. Let R ∈ End(V) be such that Rej = √λj ej. Then it is clear that R is a positive semidefinite square root of T.
Clearly 3 implies 4.
Suppose 4 holds. Then there exists R ∈ End(V) such that R∗ = R and T = R^2. Hence T = R∗R.
Finally, suppose 5 holds. Then there exists R ∈ End(V) such that T = R∗R. It is clear that T is self-adjoint. Also, given v ∈ V, we have ⟨Tv, v⟩ = ∥Rv∥^2 ≥ 0, which proves 1.
Corollary 2.7.1. The positive semidefinite square root in 2.7 is unique.
Proof. Suppose T ≥ 0, let R be a positive semidefinite square root of T, and let (λ, v) be an eigenpair of T. We shall prove that Rv = √λ v; since the eigenvectors of T span V, this determines R uniquely.
By 2.4 or 2.5, there is an orthonormal basis e1, · · · , en consisting of eigenvectors of R. Since R ≥ 0, there exist λ1, · · · , λn ≥ 0 such that Rej = √λj ej for j = 1, · · · , n. Since e1, · · · , en is a basis, we can write
v = a1e1 + · · · + anen
for some a1, · · · , an ∈ F. Thus
Rv = a1√λ1 e1 + · · · + an√λn en,
and hence
R^2 v = a1λ1e1 + · · · + anλnen.
But R^2 = T and Tv = λv, which implies that aj(λ − λj) = 0 for j = 1, · · · , n. Hence
v = Σ_{j : λj = λ} aj ej,
and thus
Rv = Σ_{j : λj = λ} aj √λ ej = √λ v,
as required.
The unique positive semidefinite square root of T is denoted by √T.
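The construction in 2.7 (take square roots of the eigenvalues in an orthonormal eigenbasis) translates directly into code; a sketch on an arbitrary positive semidefinite matrix, checked against scipy.linalg.sqrtm, whose agreement reflects the uniqueness just proved:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(5)
R0 = rng.standard_normal((4, 4))
T = R0.T @ R0                            # T = R*R is positive semidefinite (2.7.5)

w, Q = np.linalg.eigh(T)                 # orthonormal eigenvectors, w >= 0
root = Q @ np.diag(np.sqrt(np.clip(w, 0, None))) @ Q.T

print(np.allclose(root @ root, T))       # root is a square root of T
print(np.allclose(root, root.T))         # root is self-adjoint (indeed PSD)
print(np.allclose(root, sqrtm(T).real))  # matches the library square root
```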
Proposition 2.8. Suppose S ∈ End(V). Then the following are equivalent:
1. S is an isometry, that is, ∥Sv∥ = ∥v∥ for all v ∈ V;
2. ⟨Su, Sv⟩ = ⟨u, v⟩ for all u, v ∈ V;
3. Se1, · · · , Sen is orthonormal for every orthonormal list e1, · · · , en ∈ V;
4. there exists an orthonormal basis e1, · · · , en of V such that Se1, · · · , Sen is orthonormal;
5. S∗S = I;
6. SS∗ = I;
7. S∗ is an isometry;
8. S is invertible and S^{−1} = S∗ (that is, S is unitary).
Proof. We will prove that 1 ⇒ 2 ⇒ 3 ⇒ 4 ⇒ 5 ⇒ 6 ⇒ 7 ⇒ 8 ⇒ 1. First suppose 1 holds. If V is real, then by the polarization identity,
4⟨Su, Sv⟩ = ∥S(u + v)∥^2 − ∥S(u − v)∥^2 = ∥u + v∥^2 − ∥u − v∥^2 = 4⟨u, v⟩ =⇒ ⟨Su, Sv⟩ = ⟨u, v⟩.
If V is complex, then
4⟨Su, Sv⟩ = ∥S(u + v)∥^2 − ∥S(u − v)∥^2 + i∥S(u + iv)∥^2 − i∥S(u − iv)∥^2 = ∥u + v∥^2 − ∥u − v∥^2 + i∥u + iv∥^2 − i∥u − iv∥^2 = 4⟨u, v⟩ =⇒ ⟨Su, Sv⟩ = ⟨u, v⟩.
Clearly 2 implies 3, and 3 implies 4.
Now suppose 4 holds. Then there exists an orthonormal basis e1, · · · , en such that Se1, · · · , Sen is orthonormal. Then
⟨S∗Sej, ek⟩ = ⟨Sej, Sek⟩ = δjk = ⟨ej, ek⟩
for j, k = 1, · · · , n, hence S∗S = I.
5 implies 6: since V is finite dimensional, S∗S = I implies that S is invertible with S^{−1} = S∗, so SS∗ = I.
Suppose 6 holds. Then
∥S∗v∥^2 = ⟨S∗v, S∗v⟩ = ⟨SS∗v, v⟩ = ∥v∥^2
for every v ∈ V, hence S∗ is an isometry.
Suppose 7 holds. Applying the implications 1 ⇒ · · · ⇒ 6 to S∗ in place of S gives (S∗)∗S∗ = S∗(S∗)∗ = I, that is, SS∗ = S∗S = I; hence 8 holds.
Finally, suppose 8 holds. Then
∥Sv∥^2 = ⟨Sv, Sv⟩ = ⟨S∗Sv, v⟩ = ∥v∥^2
for every v ∈ V, hence S is an isometry.
3 Singular Value Decomposition
Proposition 3.1. Let T ∈ Hom(V, W). Then
1. T ∗ T, T T ∗ ≥ 0;
2. ker T ∗ T = ker T ;
3. rank T ∗ T = rank T T ∗ = rank T = rank T ∗ .
Proof.
1. Trivial.
2. It is clear that ker T ⊆ ker T∗T. On the other hand, suppose T∗Tv = 0 for some v ∈ V; then 0 = ⟨T∗Tv, v⟩ = ⟨Tv, Tv⟩ =⇒ Tv = 0.
3. By 2 and rank–nullity, rank(T∗T) = rank(T), and symmetrically rank(TT∗) = rank(T∗). We now prove that im(T∗)⊥ = ker(T). Let v ∈ ker T; then for any T∗(w) ∈ im(T∗), where w ∈ W, we have
⟨v, T∗w⟩ = ⟨Tv, w⟩ = 0,
so v ∈ im(T∗)⊥; since v is arbitrary, ker(T) ⊆ im(T∗)⊥. On the other hand, let v ∈ im(T∗)⊥; then for any w ∈ W,
0 = ⟨v, T∗w⟩ = ⟨Tv, w⟩,
so Tv = 0. Therefore im(T∗)⊥ = ker(T), and by rank–nullity, n − rank(T∗) = n − rank(T) ⟺ rank(T) = rank(T∗).
Corollary 3.1.1. The mapping L : T∗(W) → T(V), L(x) = T(x), is an isomorphism.
Proof. Since rank(T∗) = rank(T), we only need to verify that L is injective. Notice that ker L = im(T∗) ∩ ker(T) = {0}, since ker(T) = im(T∗)⊥; then we are done.
Theorem 3.2 (SVD). Let T ∈ Hom(V, W). Suppose dim V = n, dim W = m and rank(T) = r. Then there exist an orthonormal basis β1, · · · , βr, · · · , βn of V, an orthonormal basis γ1, · · · , γr, · · · , γm of W, and σ1 ≥ · · · ≥ σr > 0 such that:
1. T(βi) = σiγi if 1 ≤ i ≤ r, and T(βi) = 0 otherwise;
2. T∗(γi) = σiβi if 1 ≤ i ≤ r, and T∗(γi) = 0 otherwise.
Proof. By 3.1 and the spectrum theorem, there exists an orthonormal basis β1, · · · , βn of V consisting of eigenvectors of T∗T. By 2.7, we may assume that λ1 ≥ · · · ≥ λr > λr+1 = · · · = λn = 0 are the eigenvalues of T∗T, where r = rank(T∗T) = rank(T) by 3.1. Let σi = √λi for i = 1, · · · , r. By 3.1 again, T(βi) ̸= 0 since T∗T(βi) ̸= 0 for i = 1, · · · , r, and T(βj) = 0 since T∗T(βj) = 0 for j = r + 1, · · · , n. Notice that {T(β1), · · · , T(βr)} is an orthogonal set with ∥T(βi)∥ = σi, since ⟨T(βi), T(βj)⟩ = ⟨βi, T∗T(βj)⟩ = λjδij. Let γi = T(βi)/σi for i = 1, · · · , r, and extend {γ1, · · · , γr} to an orthonormal basis {γ1, · · · , γr, γr+1, · · · , γm} of W. It is clear that 1 is fulfilled. For 2, we have T∗(γi) = (1/σi)T∗T(βi) = σiβi for i = 1, · · · , r. For j = r + 1, · · · , m, γj ⊥ span{γ1, · · · , γr} = im(T) and ker(T∗) = im(T)⊥, so T∗(γj) = 0.
Definition 3.3. Let T ∈ Hom(V, W). Let P : W → T(V) be the orthogonal projection from W onto T(V), and let L : T∗(W) → T(V) be the isomorphism in 3.1.1. Then the pseudo inverse of T, denoted by T†, is the mapping
T† : W → V, w ↦ (L^{−1}P)(w).
Proposition 3.4. Let T, β, γ, σ be as in 3.2. Then T†(γi) = βi/σi if i = 1, · · · , r, and T†(γi) = 0 otherwise.
Proof. Since W = T(V) ⊕ ker(T∗), for i = 1, · · · , r we have P(γi) = γi, so T†(γi) = L^{−1}P(γi) = L^{−1}(γi). Notice that L(βi) = T(βi) = σiγi =⇒ L^{−1}(γi) = βi/σi. For j = r + 1, · · · , m, P(γj) = 0 =⇒ T†(γj) = 0.
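In matrix terms, 3.4 says that the pseudo inverse inverts the nonzero singular values and annihilates the rest; numpy.linalg.pinv computes exactly this. A sketch with an arbitrary rank-deficient matrix (the rank threshold 1e-10 is an ad hoc choice):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # 5x3 of rank 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                      # numerical rank

# 3.4: A^dagger sends gamma_i to beta_i / sigma_i for i <= r, and to 0 otherwise.
A_dagger = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

print(np.allclose(A_dagger, np.linalg.pinv(A)))   # matches the library routine
```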
Corollary 3.4.1. T†T is the orthogonal projection from V onto T∗(W), and TT† is the orthogonal projection from W onto T(V).
Proof. The second claim is similar, so we only prove the first. By 3.2 and 3.4, T†T(βi) = βi if i = 1, · · · , r, and T†T(βi) = 0 otherwise. Since T∗(W) = span{β1, · · · , βr} by 3.2, T†T is the orthogonal projection onto T∗(W).
Proposition 3.5. Let T, β, γ, σ be as in 3.2. Then
1. (T†)† = T;
2. (T†)∗ = (T∗)†.
Proof. By 3.4, T†(γi) = βi/σi for i = 1, · · · , r and T†(γj) = 0 for j = r + 1, · · · , m; that is, T† has singular values 1/σi with the roles of the β's and γ's exchanged. Applying 3.4 to T† gives (T†)†(βi) = σiγi = T(βi) for i = 1, · · · , r, and (T†)†(βj) = 0 = T(βj) for j = r + 1, · · · , n. Hence T = (T†)†, which proves 1.
Since T(βi) = σiγi ⟺ T†(γi) = βi/σi, applying 3.2 to T† gives (T†)∗(βi) = γi/σi. On the other hand, T(βi) = σiγi ⟺ T∗(γi) = σiβi, so applying 3.4 to T∗ gives (T∗)†(βi) = γi/σi. The case j = r + 1, · · · , n is similar (both sides vanish), so we are done.
Proposition 3.6 (Least squares). Let T ∈ Hom(V, W). Then for any x ∈ V and b ∈ W,
∥T(x) − b∥ ≥ ∥T(T†(b)) − b∥.
Proof. Write T(x) − b = (T(x) − (TT†)(b)) + ((TT†)(b) − b), where T(x) − (TT†)(b) ∈ im(T) and (TT†)(b) − b ∈ ker(T∗) = im(T)⊥, since TT† is the orthogonal projection onto T(V) by 3.4.1. Hence, by the Pythagorean theorem,
∥T(x) − b∥^2 = ∥T(x) − (TT†)(b)∥^2 + ∥(TT†)(b) − b∥^2 ≥ ∥(TT†)(b) − b∥^2,
and therefore ∥T(x) − b∥ ≥ ∥T(T†(b)) − b∥.
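This is the least-squares property of the pseudo inverse: T†(b) minimizes ∥T(x) − b∥ over x. A numerical check on an arbitrary overdetermined system, against numpy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))     # more equations than unknowns
b = rng.standard_normal(6)

x_dagger = np.linalg.pinv(A) @ b                    # T^dagger(b)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)     # library least-squares solution

print(np.allclose(x_dagger, x_lstsq))               # the same minimizer

# Any other x has a residual at least as large (3.6):
x_other = rng.standard_normal(3)
print(np.linalg.norm(A @ x_other - b) >= np.linalg.norm(A @ x_dagger - b))
```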