Notes on Linear Algebra∗
Jay R. Walton
Department of Mathematics
Texas A&M University
October 3, 2007
1 Introduction
Linear algebra provides the foundational setting for the study of multivariable mathematics
which in turn is the bedrock upon which most modern theories of mathematical physics rest
including classical mechanics (rigid body mechanics), continuum mechanics (the mechanics
of deformable material bodies), relativistic mechanics, quantum mechanics, etc. At the heart
of linear algebra is the notion of a (linear) vector space which is an abstract mathematical
structure introduced to make rigorous the classical, intuitive concept of vectors as physical
quantities possessing the two attributes of length and direction.
In these brief notes, vector spaces and linear transformations are introduced in a three
step presentation. First they are studied as algebraic objects and a few important consequences of the concept of linearity are explored. Next the algebraic structure is augmented
by introducing a topological structure (via a metric or a norm, for example) providing a
convenient framework for extending key concepts from the calculus to the vector space setting permitting a rigorous framework for studying nonlinear, multivariable functions between
vector spaces. Finally, an inner product structure for vector spaces is introduced in order to
define the geometric notion of angle (and as a special case, the key concept of orthogonality)
augmenting the notion of length or distance provided previously by a norm or a metric.
2 Vector Space
The central object in the study of linear algebra is a Vector Space which is defined here as
follows.
Definition 1 A Vector Space over a field F (see footnote 1) is a set, V, whose elements are called vectors, endowed with two binary operations, called vector addition and scalar multiplication, subject to the following axioms.
∗ Copyright © 2006 by Jay R. Walton. All rights reserved.
1 While in general F can be any field, in these notes it will denote either the real numbers R or the complex numbers C.
1. If a, b ∈ V, then their sum a + b is also in V.
2. Vector addition is commutative and associative, i.e. a + b = b + a for all a, b ∈ V and a + (b + c) = (a + b) + c for all a, b, c ∈ V.
3. There exists a zero element, 0, in V satisfying 0 + a = a for all a ∈ V.
4. Every element a ∈ V possesses a (unique) additive inverse, −a, satisfying −a + a = 0.
5. If a ∈ V and α ∈ F, then αa ∈ V.
6. For all a ∈ V, 1 a = a.
7. If a, b ∈ V and α, β ∈ F, then α(a+b) = αa+αb, (α+β)a = αa+βa, α(βa) = (αβ)a.2
Remark 1 It follows readily from these axioms that for any a ∈ V, 0a = 0 and (−1)a = −a.
(Why?)
Remark 2 The above axioms are sufficient to conclude that for any collection of vectors
a1 , . . . , aN ∈ V and scalars α1 , . . . , αN ∈ F, the Linear Combination α1 a1 + . . . + αN aN is
an unambiguously defined vector in V. (Why?)
Example 1 The prototype examples of vector spaces over the real numbers R and the
complex numbers C are the Euclidean spaces RN and CN of N -tuples of real or complex
numbers. Specifically, the elements a ∈ RN are defined by
a := \begin{pmatrix} a1 \\ \vdots \\ aN \end{pmatrix}
where a1 , . . . , aN ∈ R. Vector addition and scalar multiplication are defined componentwise, that is
αa + βb = α \begin{pmatrix} a1 \\ \vdots \\ aN \end{pmatrix} + β \begin{pmatrix} b1 \\ \vdots \\ bN \end{pmatrix} = \begin{pmatrix} αa1 + βb1 \\ \vdots \\ αaN + βbN \end{pmatrix} .
Analogous relations hold for the complex case.
Example 2 The vector space R∞ is defined as the set of all (infinite) sequences, a, of real
numbers, that is, a ∈ R∞ means
a = \begin{pmatrix} a1 \\ a2 \\ \vdots \end{pmatrix}
with vector addition and scalar multiplication done componentwise.
2 The symbol “+” has been used to denote both the addition of vectors and the addition of scalars. This should not generate any confusion since the meaning of “+” should be clear from the context in which it is used.
Definition 2 A subset U of a vector space V is called a Subspace provided it is closed with
respect to vector addition and scalar multiplication (and hence is also a vector space).
Example 3
One shows easily that RN is a subspace of RM whenever N ≤ M . (Why?)
Example 4 The vector space R∞_f is the set of all infinite sequences of real numbers with only finitely many non-zero components. It follows readily that R∞_f is a subspace of R∞. (Why?)
Definition 3 Let A be a subset of a vector space V. Then the Span of A, denoted span(A),
is defined to be the set of all (finite) linear combinations formed from the vectors in A. A
subset A ⊂ V is called a Spanning Set for V if span(A) = V. A spanning set A ⊂ V is called
a Minimal Spanning Set if no proper subset of A is a spanning set for V.
Remark 3 One readily shows that span(A) is a subspace of V. (How?)

2.1 Linear Independence and Dependence
The most important concepts associated with vector spaces are Linear Independence and
Linear Dependence.
Definition 4 A set of vectors A ⊂ V is said to be Linearly Independent if given any vectors
a1 , . . . , aN ∈ A, then the only way to write the zero vector as a linear combination of these
vectors is if all coefficients are zero, that is,
α1 a1 + . . . + αN aN = 0
if and only if
α1 = . . . = αN = 0.
Remark 4 If a set of vectors A is linearly independent and B ⊂ A, then B is also linearly
independent. (Why?)
Remark 5 A minimal spanning set is easily seen to be linearly independent. Indeed,
suppose A is a minimal spanning set and suppose it is not linearly independent. Then there
exists a subset {a1 , . . . , aN } of A such that aN can be written as a linear combination of
a1 , . . . , aN −1 . It follows that A ∼ {aN } (set difference) is a proper subset of A that spans V
contradicting the assumption that A is a minimal spanning set.
Definition 5 If a ≠ 0, then span({a}) is called a Line in V. If {a, b} is linearly independent, then span({a, b}) is called a Plane in V.
Definition 6 A set of vectors A ⊂ V is said to be Linearly Dependent if there exists a set of vectors a1 , . . . , aN ∈ A and scalars α1 , . . . , αN ∈ F, not all zero, with
α1 a1 + . . . + αN aN = 0.
Remark 6
A set of vectors that contains the zero vector is linearly dependent. (Why?)
Remark 7 If the set of vectors A is linearly dependent and A ⊂ B, then B is also linearly
dependent. (Why?)
Remark 8 If A = {a1 , . . . , aN } ⊂ V is linearly dependent, then one of the aj can be
written as a non-trivial (not all scalar coefficients zero) linear combination of the others,
that is, one of the aj is in the span of the others.
2.2
Basis
The notion of Basis for a vector space is intimately connected with spanning sets of minimal
cardinality. Here attention is restricted to vector spaces with spanning sets containing only
finitely many vectors.
Definition 7 A vector space V is called Finite Dimensional provided it has a spanning set
with finitely many vectors.
Theorem 1 Suppose V is finite dimensional with minimal spanning set A = {a1 , . . . , aN }.
Let B be another minimal spanning set for V. Then #(B) = N , where #(B) denotes the
cardinality of the set B.
Proof: It suffices to assume that #(B) ≥ N . (Why?) Then choose B′ := {b1 , . . . , bN } ⊂ B. Clearly, B′ is linearly independent. (Why?) Since A is spanning, there exist scalars αij such that bi = \sum_{j=1}^{N} αij aj . Then the matrix L := [αij ] is invertible. (Why?) Suppose L−1 = [βij ]. Then ai = \sum_{j=1}^{N} βij bj , that is, the set {a1 , . . . , aN } is in the span of B′. But since A spans V, it follows that B′ must also span V. But since B was assumed to be a minimal spanning set, it must be that B = B′. (Why?) Hence, #(B) = N .3
This theorem allows one to define unambiguously the notion of dimension for a finite
dimensional vector space.
Definition 8 Let V be a finite dimensional vector space. Then its Dimension, denoted
dim(V), is defined to be the cardinality of any minimal spanning set.
Definition 9 A Basis for a finite dimensional vector space is defined to be any minimal
spanning set.
It follows that given a basis B = {b1 , . . . , bN } for a finite dimensional vector space V,
every vector a ∈ V can be written uniquely as a linear combination of the base vectors, that
is
a = \sum_{j=1}^{N} αj bj   (1)
with the coefficients α1 , . . . , αN being uniquely defined. These coefficients are called the
coordinates or components of a with respect to the basis B. It is often convenient to collect these components into an N -tuple denoted

[a]B := \begin{pmatrix} α1 \\ \vdots \\ αN \end{pmatrix} .

3 The symbol is used in these notes to indicate the end of a proof.
Notational Convention: It is useful to introduce simplifying notation for sums of the
sort (1). Specifically, in the Einstein Summation Convention, expressions in which an index
is repeated are to be interpreted as being summed over the range of the index. Thus (1)
reduces to
a = \sum_{j=1}^{N} αj bj = αj bj .
With the Einstein summation convention in effect, if one wants to write an expression like
αj bj for a particular, but unspecified, value of j, then one must write
αj bj (no sum)
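For readers who like to check such formulas numerically, the contraction over a repeated index corresponds directly to index-summation routines in numerical libraries. The following is a minimal sketch assuming Python with NumPy; the coefficients and basis vectors are arbitrary illustrative choices, not data from these notes.

    import numpy as np

    # Illustrative coefficients alpha_j and basis vectors b_j (the rows of `basis`).
    alpha = np.array([2.0, -1.0, 0.5])
    basis = np.array([[1.0, 0.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [1.0, 1.0, 1.0]])

    # Explicit sum over the repeated index j, as in a = sum_j alpha_j b_j.
    a_explicit = sum(alpha[j] * basis[j] for j in range(len(alpha)))

    # The same contraction written in Einstein-convention form: a_i = alpha_j (b_j)_i.
    a_einsum = np.einsum('j,ji->i', alpha, basis)

    assert np.allclose(a_explicit, a_einsum)
    print(a_einsum)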
Example 5 In the Euclidean vector space RN , it is customary to define the Natural Basis
N = {e1 , . . . , eN } to be
e1 := \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} , e2 := \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} , . . . , eN := \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} .
Finding components of vectors in RN with respect to the natural basis is then immediate.
a = \begin{pmatrix} a1 \\ \vdots \\ aN \end{pmatrix} ∈ RN =⇒ [a]N = \begin{pmatrix} a1 \\ \vdots \\ aN \end{pmatrix} ,
that is,
a = aj ej .
In general, finding the components of a vector with respect to a prescribed basis usually
involves solving a linear system of equations. For example, suppose a basis B = {b1 , . . . , bN }
in RN is given as
b1 := \begin{pmatrix} b11 \\ \vdots \\ bN1 \end{pmatrix} , . . . , bN := \begin{pmatrix} b1N \\ \vdots \\ bNN \end{pmatrix} .
Then to find the components
[a]B = \begin{pmatrix} α1 \\ \vdots \\ αN \end{pmatrix}
of a vector
a := \begin{pmatrix} a1 \\ \vdots \\ aN \end{pmatrix}
with respect to B, one solves the linear system of equations ai = bij αj , i = 1, . . . , N which
in expanded form is
a1 = b11 α1 + . . . + b1N αN
\vdots
aN = bN1 α1 + . . . + bNN αN .
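Numerically, this is just a square linear system whose coefficient matrix has the basis vectors b1 , . . . , bN as its columns. A minimal sketch assuming Python with NumPy; the basis and the vector a below are arbitrary illustrative choices.

    import numpy as np

    # Columns of B_mat are the basis vectors b_1, b_2, b_3 (an illustrative basis of R^3).
    B_mat = np.column_stack([[1.0, 0.0, 0.0],
                             [1.0, 1.0, 0.0],
                             [1.0, 1.0, 1.0]])
    a = np.array([2.0, 3.0, 1.0])

    # Solve a_i = b_{ij} alpha_j for the components alpha of a with respect to the basis B.
    alpha = np.linalg.solve(B_mat, a)

    # Check: reassembling a from its components recovers a, i.e. a = alpha_j b_j.
    assert np.allclose(B_mat @ alpha, a)
    print(alpha)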
2.3 Change of Basis
Let B = {b1 , . . . , bN } and B̃ = {b̃1 , . . . , b̃N } be two bases for the vector space V. The
question considered here is how to change components of vectors with respect to the basis B
into components with respect to the basis B̃? To answer the question, one needs to specify
some relation between the two sets of base vectors. Suppose, for example, one can write
the base vectors b̃j as linear combinations of the base vectors bj , that is, one knows the
components of each N -tuple [b̃j ]B , j = 1, . . . , N . To be more specific, suppose
b̃j = tij bi ,
for j = 1, . . . , N.
(2)
Form the matrix T := [tij ] called the Transition Matrix between the bases B and B̃. Then
T is invertible. (Why?) Let its inverse have components T −1 = [sij ]. By definition of matrix
inverse, one sees that
sik tkj = δij
where the Kronecker symbol δij is defined as
δij := \begin{cases} 1 & when i = j, \\ 0 & when i ≠ j. \end{cases}
Suppose now that a vector a has components
[a]B = \begin{pmatrix} α1 \\ \vdots \\ αN \end{pmatrix} .
Then one shows readily (How?) that the components of a with respect to the basis B̃ are given by
[a]B̃ = \begin{pmatrix} α̃1 \\ \vdots \\ α̃N \end{pmatrix}
with
α̃i = sij αj .
In matrix notation this change of basis formula becomes
[a]B̃ = T −1 [a]B .   (3)
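A quick numerical illustration of (3), again assuming NumPy; the transition matrix T below, whose columns are the B-components of the new base vectors as in (2), is an arbitrary invertible example.

    import numpy as np

    # T[i, j] = t_{ij} from (2): the j-th column holds the B-components of b_tilde_j.
    T = np.array([[1.0, 1.0],
                  [0.0, 2.0]])

    a_B = np.array([3.0, 4.0])          # components of a vector a with respect to B

    # Change of basis formula (3): [a]_Btilde = T^{-1} [a]_B.
    a_Btilde = np.linalg.solve(T, a_B)  # solving is preferable to forming T^{-1} explicitly

    # Consistency check: since b_tilde_j = t_{ij} b_i, one has [a]_B = T [a]_Btilde.
    assert np.allclose(T @ a_Btilde, a_B)
    print(a_Btilde)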
3 Linear Transformations
Linear transformations are at the heart of the subject of linear algebra. Important in their
own right, they also are the principal tool for understanding nonlinear functions between
subsets of vector spaces. They will be studied here first from an algebraic perspective.
Subsequently, they will be studied geometrically.
Definition 10 A linear transformation, L : V −→ U, between the vector spaces V and U is a function satisfying
L(αa + βb) = αLa + βLb,   (4)
for every α, β ∈ F and a, b ∈ V.4
Remark 9: The symbol “+” on the left hand side of (4) denotes vector addition in V
whereas on right hand side of (4) it denotes vector addition in U.
Remark 10: From the definition (4) it follows immediately that once a linear transformation
has been specified on a basis for the domain space, its values are determined on all of the
domain space through linearity.
Two important subspaces associated with a linear transformation are its Range and
Nullspace. Consider a linear transformation
L : V −→ U.
(5)
Definition 11 The Range of a linear transformation (5) is the subset of U, denoted Ran(L),
defined by Ran(L) := {u ∈ U| there exists v ∈ V such that u = Lv}.
Definition 12 The Nullspace of a linear transformation (5) is the subset of V, denoted
Nul(L), defined by Nul(L) := {v| Lv = 0}.
Remark 9 It is an easy exercise to show that Ran(L) is a subspace of U and Nul(L) is a
subspace of V. (How?)
Example 6 Let A be an M × N matrix
A := \begin{pmatrix} a11 & \cdots & a1N \\ \vdots & & \vdots \\ aM1 & \cdots & aMN \end{pmatrix} = (a1 · · · aN )
where a1 , . . . , aN denote the column vectors of A, that is, A is viewed as consisting of N column vectors from RM .
4 In analysis it is customary to make a notational distinction between the name of a function and the values the function takes. Thus, for example, one might denote a function by f and the values the function takes by f (x). For a linear transformation, L, it is customary to denote its values by La. This notation is suggestive of matrix multiplication.
Definition 13 The Column Rank of A is defined to be the dimension of the subspace of RM
spanned by its column vectors.
One can define a linear transformation A : RN −→ RM by
Av := Av (matrix multiplication).
(6)
It is readily shown that the range of A is the span of the column vectors of A,
Ran(A) = span{a1 , . . . , aN }
and the nullspace of A is the set of N -tuples
v = \begin{pmatrix} v1 \\ \vdots \\ vN \end{pmatrix} ∈ RN
satisfying
v1 a1 + · · · + vN aN = 0.
It follows that dim(Ran(A)) equals the number of linearly independent column vectors of A.
One can also view the matrix A as consisting of M -row vectors from RN and define its
row rank by
Definition 14 The Row Rank of A is defined to be the dimension of the subspace of RN
spanned by its row vectors.
A fundamental result about matrices is
Theorem 2 Let A be an M × N matrix. Then
Row Rank of A = Column Rank of A.
(7)
Proof: Postponed.
A closely related result is
Theorem 3 Let A be an M × N matrix and A the linear transformation (6). Then
dim(Ran(A)) + dim(Nul(A)) = dim(RN ) = N.
(8)
The result (8) is a special case of the more general
Theorem 4 Let L : V −→ U be a linear transformation. Then
dim(Ran(L)) + dim(Nul(L)) = dim(V).
(9)
The proof of this last theorem is facilitated by introducing the notion of Direct Sum
Decomposition of a vector space V.
Definition 15 Let V be a finite dimensional vector space. A Direct Sum Decomposition of
V, denoted V = U ⊕ W, consists of a pair of subspaces, U, W of V with the property that
every vector v ∈ V can be written uniquely as a sum v = u + w with u ∈ U and w ∈ W.
Remark 10 It follows easily from the definition of direct sum decomposition that
V = U ⊕ W =⇒ dim(V) = dim(U) + dim(W). (Why?)
Example 7 Two distinct lines through the origin form a direct sum decomposition of R2 (Why?), and a plane through the origin together with a line through the origin not contained in that plane form a direct sum decomposition of R3 (Why?).
Proof of Theorem 4: Let {f1 , . . . , fn } be a basis for Nul(L) where n = dim(Nul(L)). Fill
out this set to form a basis for V, {g1 , . . . , gN −n , f1 , . . . , fn } (How?). Define the subspace
Ṽ := span({g1 , . . . , gN −n }). Then Ṽ and Nul(L) form a direct sum decomposition of V
(Why?), that is
V = Ṽ ⊕ Nul(L).
Moreover, one can easily show that L restricted to Ṽ is a one-to-one transformation onto
Ran(L) (Why?). It follows that {Lg1 , . . . , LgN −n } forms a basis for Ran(L) (Why?) and
hence that dim(Ran(L)) = dim(Ṽ) = N − n (Why?) from which one concludes (9).
3.1 Matrices and Linear Transformations
Let L : V −→ U be a linear transformation and let B := {b1 , . . . , bN } be a basis for V
and C := {c1 , . . . , cM } a basis for U. Then the action of the linear transformation L can be
carried out via matrix multiplication of coordinates in the following manner. Given v ∈ V,
define u := Lv. Then the Matrix of L with respect to the bases B and C is defined to be the
unique matrix L = [L]B,C satisfying
[u]C = L[v]B
(matrix multiplication)
or in more expanded form
[Lv]C = [L]B,C [v]B .
(10)
To see how to construct [L]B,C , first construct the N column vectors (M -tuples)
lj := [Lbj ]C , that is the coordinates with respect to the basis C in U of the image vectors Lbj of the base vectors in B. Then, one easily shows that defining
[L]B,C := [l1 . . . lN ] (matrix whose columns are l1 , . . . , lN )
(11)
gives the unique M × N matrix satisfying (10). Indeed, if v ∈ V has components [v]B = [vj ],
that is v = vj bj , then
[Lv]C = [vj Lbj ]C = [l1 . . . lN ][v]B
as required to prove (11).
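The construction of [L]B,C translates directly into code: apply L to each base vector of B, express the image in the basis C, and use the resulting M -tuples as columns. A sketch assuming NumPy; the map, the bases, and the helper coords are illustrative choices.

    import numpy as np

    # An illustrative linear map L : R^2 -> R^3, given by its action on natural coordinates.
    L_nat = np.array([[1.0, 2.0],
                      [0.0, 1.0],
                      [3.0, 0.0]])

    # Illustrative bases: columns of B span the domain R^2, columns of C span the codomain R^3.
    B = np.column_stack([[1.0, 1.0], [1.0, -1.0]])
    C = np.column_stack([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]])

    def coords(basis, v):
        # Components of v with respect to the given basis: solve basis @ x = v.
        return np.linalg.solve(basis, v)

    # Columns l_j := [L b_j]_C assembled into the matrix [L]_{B,C}, as in (11).
    L_BC = np.column_stack([coords(C, L_nat @ B[:, j]) for j in range(B.shape[1])])

    # Check (10): [Lv]_C = [L]_{B,C} [v]_B for a test vector v.
    v = np.array([2.0, 5.0])
    assert np.allclose(coords(C, L_nat @ v), L_BC @ coords(B, v))
    print(L_BC)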
Remark 11 If U = V and B = C, then the matrix of L with respect to B is denoted [L]B
and is constructed by using the B-coordinates of the vectors Lbj , [Lbj ]B , as its columns.
Example 8
The identity transformation I : V −→ V (Iv = v for all v ∈ V) has
[I]B = [δij ]
where δij is the Kronecker symbol. (Why?)
3.2 Change of Basis
Suppose L : V −→ V and B = {b1 , . . . , bN } and B̃ = {b̃1 , . . . , b̃N } are two bases for V.
The question of interest here is: How are the matrices [L]B and [L]B̃ related? The question
is easily answered by using the change of basis formula (3). From the definition of [L]B and
[L]B̃ one has
[Lv]B = [L]B [v]B and [Lv]B̃ = [L]B̃ [v]B̃ .
It follows from (3) that
[Lv]B̃ = T −1 [Lv]B = T −1 [L]B [v]B = T −1 [L]B T [v]B̃
from which one concludes
[L]B̃ = T −1 [L]B T.   (12)

Remark 12 Square matrices A and B satisfying
A = T −1 BT
for some (non-singular) matrix T are said to be Similar. Thus, matrices related to a given
linear transformation through a change of basis are similar and conversely similar matrices
correspond to the same linear transformation under a change of basis.
3.3 Trace and Determinant of a Linear Transformation
Consider linear transformations on a given finite dimensional vector space V, i.e. L : V −→
V. Then its Trace is defined by
Definition 16 The trace of L, denoted tr(L), is defined by
tr(L) := tr([L]B )
for any matrix representation of L.
To see that this notion of trace of a linear transformation is well (unambiguously) defined,
let [L]B and [L]B̃ denote two different matrix representations of L. Then they must be similar
matrices, i.e. there exists a non-singular matrix T such that
[L]B̃ = T −1 [L]B T.
But then from the elementary property of the trace of a square matrix, tr(AB) = tr(BA)
for any two square matrices A, B (Why true?), one has
tr([L]B̃ ) = tr(T −1 [L]B T ) = tr(T T −1 [L]B ) = tr([L]B ).
In similar fashion, one can define the Determinant of a linear transformation on V.
Definition 17 The determinant of a linear transformation L on a finite dimensional vector
space V is defined by
det(L) := det([L]B )
where [L]B is any matrix representation of L.
That this notion of the determinant of a linear transformation is well defined can be
seen from an argument similar to that above for the trace. The relevant property of the
determinant of square matrices needed in the argument is det(AB) = det(A) det(B) (Why
true?).
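A numerical sanity check of both invariance arguments, assuming NumPy; the matrices below are arbitrary examples (a random T is invertible with probability one).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))     # a matrix representation [L]_B
    T = rng.standard_normal((4, 4))     # a transition matrix

    # The similar matrix [L]_Btilde = T^{-1} [L]_B T from (12).
    A_sim = np.linalg.inv(T) @ A @ T

    # tr(AB) = tr(BA) makes the trace independent of the representation ...
    assert np.isclose(np.trace(A_sim), np.trace(A))
    # ... and det(AB) = det(A) det(B) does the same for the determinant.
    assert np.isclose(np.linalg.det(A_sim), np.linalg.det(A))
    print(np.trace(A), np.linalg.det(A))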
4 Metric, Norm and Inner Product
In this section, the algebraic structure for vector spaces presented above is augmented
through the inclusion of geometric structure. Perhaps the most basic geometric notion is
that of distance or length. A convenient way to introduce a notion of distance into a vector
space is through use of a Metric.
Definition 18 A Metric on a vector space V is a real valued function of two variables d(·, ·) : V × V −→ R satisfying the following axioms. For any a, b, c ∈ V,
d(a, b) ≥ 0, with d(a, b) = 0 ⇔ a = b   (Positivity)
d(a, b) = d(b, a)   (Symmetry)
d(a, b) ≤ d(a, c) + d(c, b).   (Triangle Inequality)   (13)
Remark 13 Given a metric on a vector space, one can define the important concept of
a metric ball. Specifically, an (open) metric ball, B(c, r), centered at c and of radius r is
defined to be the set
B(c, r) := {v ∈ V| d(c, v) < r}.
One can also define a notion of metric convergent sequence as follows. A sequence {aj , j = 1, 2, . . .} ⊂ V is said to be metric convergent to a ∈ V provided
\lim_{j→∞} d(aj , a) = 0.
An equivalent definition in terms of metric balls states that for all ε > 0, there exists N > 0 such that
aj ∈ B(a, ε) for all j > N.
An important class of metrics on a vector space comes from the notion of a Norm.
Definition 19 A Norm on a vector space is a non-negative, real valued function
| · | : V −→ R satisfying the following axioms. For all a, b ∈ V and α ∈ F,
|a| ≥ 0, with |a| = 0 ⇔ a = 0   (Positivity)
|αa| = |α||a|   (Homogeneity)
|a + b| ≤ |a| + |b|.   (Triangle Inequality)   (14)

Remark 14 A norm has an associated metric defined by
d(a, b) := |a − b|.
An important class of norms on a vector space comes from the notion of Inner Product.
Here a distinction must be made between real and complex vector spaces.
Definition 20 An inner product on a real vector space is a positive definite, symmetric
bilinear form h·, ·i : V × V −→ R satisfying the following axioms. For all a, b, c ∈ V and α ∈ R,
ha, ai ≥ 0 with ha, ai = 0 ⇔ a = 0   (Positivity)
ha, bi = hb, ai   (Symmetry)
hαa, bi = αha, bi   (Homogeneity)
ha + c, bi = ha, bi + hc, bi.   (Additivity)   (15)
Definition 21 An inner product on a complex vector space is a positive definite, conjugate-symmetric form h·, ·i : V × V −→ C satisfying the following axioms. For all a, b, c ∈ V and α ∈ C,
ha, ai ≥ 0 with ha, ai = 0 ⇔ a = 0   (Positivity)
ha, bi = \overline{hb, ai}   (Conjugate-Symmetry)
hαa, bi = αha, bi   (Homogeneity)
ha + c, bi = ha, bi + hc, bi   (Additivity)   (16)
where in (16), \overline{hb, ai} denotes the complex conjugate of hb, ai.
It is evident from the definition that a complex inner product is linear in the first variable
but conjugate linear in the second. In particular, for all a, b ∈ V and α ∈ C
hαa, bi = αha, bi whereas ha, αbi = \overline{α}ha, bi.
Remark 15 Given an inner product on either a real or complex vector space, one can
define an associated norm by
|a| := \sqrt{ha, ai}.
(Why does this define a norm?)
Remark 16 In RN , the usual inner product, defined by
ha, bi := a1 b1 + . . . + aN bN = aj bj , where a = \begin{pmatrix} a1 \\ \vdots \\ aN \end{pmatrix} , b = \begin{pmatrix} b1 \\ \vdots \\ bN \end{pmatrix} ,
is often denoted by a · b, the so-called “dot product”. In CN , the usual inner product is
given by
ha, bi = a · b = a1 \overline{b1} + . . . + aN \overline{bN}
where now aj , bj ∈ C and the over-bar denotes complex conjugation.
Example 9 Let A be a positive definite, symmetric, real, N × N matrix.5 Then A defines
an inner product on RN through
ha, biA := a · (Ab).
5 Recall that symmetric means that A^T = A, where A^T denotes the transpose matrix, and positive definite means a · (Aa) > 0 for all non-zero a ∈ RN .
Example 10 An important class of norms on RN and CN are the p-norms defined for p ≥ 1 by
k a kp := \left( \sum_{j=1}^{N} |aj |^p \right)^{1/p} .   (17)
One must show that (17) satisfies the axioms (14) for a norm. The positivity and homogeneity axioms from (14) are straightforward to show, but the triangle inequality is not. A proof
that (17) satisfies the triangle inequality in (14) can be found in the supplemental Notes on
Inequalities. The special case p = 2 is the famous Minkowski Inequality
k a + b k2 ≤ k a k2 + k b k2
for any a, b ∈ RN .
Example 11
The ∞-norm is defined by
k a k∞ := max{|a1 |, . . . , |aN |}.
(18)
The task of showing that (18) satisfies the axioms (14) for a norm is left as an exercise. An
important observation about the p-norms is that
\lim_{p→∞} k a kp = k a k∞
whose proof is also left as an exercise. That the ∞-norm is the limit of the p-norms as
p → ∞ is what motivates the definition (18).
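A short numerical illustration of (17), (18) and the limit above, assuming NumPy; the vector is an arbitrary example.

    import numpy as np

    a = np.array([3.0, -4.0, 1.0])

    def p_norm(a, p):
        # The p-norm (17): (sum_j |a_j|^p)^(1/p).
        return np.sum(np.abs(a) ** p) ** (1.0 / p)

    inf_norm = np.max(np.abs(a))        # the infinity-norm (18)

    # As p grows, the p-norm decreases toward the infinity-norm.
    for p in (1, 2, 10, 100):
        print(p, p_norm(a, p))
    assert np.isclose(p_norm(a, 200), inf_norm, atol=1e-2)
    print(inf_norm)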
Remark 17 The 2-norm is very special among the class of p-norms being the only one
coming from an inner product. The 2-norm is also called the Euclidean Norm since it comes
from the usual inner product on the Euclidean space RN .
Remark 18 In a real vector space (V, h·, ·i), one can define a generalized notion of angle
between vectors. Specifically, given non-zero a, b ∈ V, one defines the angle, θ, between a and b to be the unique number 0 ≤ θ ≤ π satisfying
cos θ = \frac{ha, bi}{|a||b|} .
As a special case, two vectors a, b ∈ V are said to be Orthogonal (in symbols a ⊥ b) if
ha, bi = 0.
(19)
Remark 19 Analytically or topologically all norms on a finite dimensional vector space
are equivalent in the sense that if a sequence is convergent with respect to one norm, it is
convergent with respect to all norms on the space. This observation follows directly from
the following important
Theorem 5 Let V be a finite dimensional vector space and let k · kI and k · kII be two norms
on V. Then there exist positive constants m, M such that for all v ∈ V
mkvkI ≤ kvkII ≤ M kvkI .
Proof: The proof of this result is left as an exercise.
While all norms on a finite dimensional vector space might be topologically equivalent,
they are not geometrically equivalent. In particular, the “shape” of the (metric) unit ball
can vary greatly for different norms. That fact is illustrated in the following
Exercise 1 Sketch the unit ball in R2 for the p-norms, 1 ≤ p ≤ ∞.

4.1 Orthonormal Basis
In a finite dimensional inner product space (V, h·, ·i), an important class of bases consists of
the Orthonormal Bases defined as follows.
Definition 22 In an inner product space (V, h·, ·i), a basis B = {b1 , . . . , bN } is called
Orthonormal provided
hbi , bj i = δij for i, j = 1, . . . , N.
(20)
Thus, an orthonormal basis consists of pairwise orthogonal unit vectors. Finding coordinates
of arbitrary vectors with respect to such a basis is as convenient as it is for the natural basis
in FN . Indeed, what makes the natural basis so useful is that it is an orthonormal basis with
respect to the usual inner product on RN .
Remark 20 Suppose B = {b1 , . . . , bN } is an orthonormal basis for (V, h·, ·i) and a ∈ V.
Then one easily deduces that
a = a1 b1 + . . . + aN bN
=⇒
aj = ha, bj i
as can be seen by taking the inner product of the first equation with each base vector bj .
Moreover, one has
ha, ci = [a]B · [c]B
(21)
where the right hand side is the usual dot-product of N -tuples in RN
[a]B · [c]B = a1 c1 + . . . + aN cN .
Thus, computing the inner product of vectors using components with respect to an orthonormal basis uses the familiar formula for the dot-product on RN with respect to the natural
basis.
In similar fashion one can derive a simple formula for the components of the matrix
representation of a linear transformation on an inner product space with respect to an orthonormal basis. Specifically, suppose L : V −→ V and B = {b1 , . . . , bN } is an orthonormal
basis. Then it is straightforward to show that
L := [L]B = [lij ] with lij = hbi , Lbj i.   (22)
One way to derive (22) is to first recall the formula
lij = ei · (Lej )
(23)
for the components of an N × N matrix, L := [lij ], where {e1 , . . . , eN } is the natural basis
for RN . It follows from (21) that for all a, c ∈ V
ha, Lci = [a]B · ([Lc]B ) = [a]B · ([L]B [c]B ).
(24)
Now letting a = bi , c = bj and noting that [bj ]B = ej for all j = 1, . . . , N , one deduces (22)
from (23,24).
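The formulas aj = ha, bj i and (22) are easy to check numerically once an orthonormal basis is in hand. A sketch assuming NumPy and the usual dot product; the basis (the natural basis of R2 rotated by 30 degrees) and the map L are illustrative.

    import numpy as np

    t = np.deg2rad(30.0)
    b1 = np.array([np.cos(t), np.sin(t)])
    b2 = np.array([-np.sin(t), np.cos(t)])
    B = np.column_stack([b1, b2])
    assert np.allclose(B.T @ B, np.eye(2))          # orthonormality: <b_i, b_j> = delta_ij

    a = np.array([2.0, -1.0])
    # Components with respect to an orthonormal basis are inner products: a_j = <a, b_j>.
    comps = np.array([a @ b1, a @ b2])
    assert np.allclose(comps[0] * b1 + comps[1] * b2, a)

    # Matrix of a linear map in this basis, entry by entry via (22): l_ij = <b_i, L b_j>.
    L = np.array([[1.0, 2.0],
                  [0.0, 3.0]])                      # L acting on natural coordinates
    L_B = np.array([[bi @ (L @ bj) for bj in (b1, b2)] for bi in (b1, b2)])
    # The same matrix via the change of basis formula, since B is orthogonal.
    assert np.allclose(L_B, B.T @ L @ B)
    print(L_B)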
4.1.1 General Bases and the Metric Tensor
More generally, one would like to be able to conveniently compute the inner product of
two vectors using coordinates with respect to some general (not necessarily orthonormal)
basis. Suppose B := {f1 , . . . , fN } is a basis for (V, h·, ·i) and let a, b ∈ V with coordinates
[a]B = [aj ], [b]B = [bj ], respectively. Then one has
ha, bi = h \sum_{j=1}^{N} aj fj , \sum_{k=1}^{N} bk fk i = \sum_{j,k=1}^{N} aj bk hfj , fk i = [a]B · (F [b]B )
where the matrix
F := [hfi , fj i] = [fij ]
(25)
is called the Metric Tensor associated with the basis B. It follows that the familiar formula
(21) for the dot-product in RN holds in general if and only if the basis B is orthonormal.
The matrix F is called a metric tensor because it is needed to compute lengths and
angles using coordinates with respect to the basis B. In particular, if v ∈ V has coordinates
[v]B = [vj ], then |v| (the length of v) is given by
|v|2 = hv, vi = fij vi vj .
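A small numerical check of (25) and of the length formula above, assuming NumPy; the non-orthonormal basis of R2 is an arbitrary illustrative choice.

    import numpy as np

    # An illustrative non-orthonormal basis of R^2, stored as columns f_1, f_2.
    f_cols = np.column_stack([[1.0, 0.0], [1.0, 1.0]])

    # Metric tensor (25): F_ij = <f_i, f_j>, with the usual dot product on R^2.
    F = f_cols.T @ f_cols

    # Two vectors given by their components with respect to this basis.
    a_comp = np.array([2.0, 1.0])
    b_comp = np.array([-1.0, 3.0])

    # Inner product computed from components via the metric tensor ...
    ip_via_F = a_comp @ F @ b_comp
    # ... agrees with the ordinary dot product of the assembled vectors.
    a_vec, b_vec = f_cols @ a_comp, f_cols @ b_comp
    assert np.isclose(ip_via_F, a_vec @ b_vec)

    # Length formula |v|^2 = f_ij v_i v_j.
    assert np.isclose(a_comp @ F @ a_comp, a_vec @ a_vec)
    print(F, ip_via_F)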
4.1.2 The Gram-Schmidt Orthogonalization Procedure
Given any basis B = {f1 , . . . , fN } for an inner product space (V, h·, ·i), there is a simple
algorithm, called the Gram-Schmidt Orthogonalization Procedure, for constructing from B
an orthonormal basis B̃ = {e1 , . . . , eN } satisfying
span{f1 , . . . , fK } = span{e1 , . . . , eK } for each 1 ≤ K ≤ N.
(26)
The algorithm proceeds inductively as follows.
Step 1: Define e1 := f1 /kf1 k, where k·k denotes the norm associated with the inner product
h·, ·i.
Step 2: Subtract from f2 its component in the direction of e1 and then normalize. More
specifically, define
g2 := f2 − hf2 , e1 ie1
and then define
e2 := g2 /kg2 k.
It is easily seen that {e1 , e2 } is an orthonormal pair satisfying (26) for K = 1, 2.
Inductive Step: Suppose orthonormal vectors {e1 , . . . , eJ } have been constructed satisfying
(26) for 1 ≤ K ≤ J with J < N . Define
gJ+1 := fJ+1 − hfJ+1 , e1 ie1 − . . . − hfJ+1 , eJ ieJ
and then define
eJ+1 := gJ+1 /kgJ+1 k.
One easily sees that {e1 , . . . , eJ+1 } is an orthonormal set satisfying (26) for 1 ≤ K ≤ J + 1
thereby completing the inductive step.
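The procedure translates almost line for line into code. A minimal sketch assuming NumPy and the usual dot product on RN ; the input basis is an arbitrary example, and no special handling of nearly dependent inputs is attempted.

    import numpy as np

    def gram_schmidt(f):
        # Orthonormalize the rows of f (assumed linearly independent), following Steps 1, 2, ...
        e = []
        for fj in f:
            # Subtract from f_j its components along the previously constructed e_1, ..., e_{j-1} ...
            g = fj - sum((fj @ ek) * ek for ek in e)
            # ... and then normalize.
            e.append(g / np.linalg.norm(g))
        return np.array(e)

    f = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    E = gram_schmidt(f)

    # The output is orthonormal: <e_i, e_j> = delta_ij.
    assert np.allclose(E @ E.T, np.eye(3))
    print(E)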
4.2 Reciprocal Basis
Let (V, h·, ·i) be a finite dimensional inner product space with general basis B = {f1 , . . . , fN }.
Then B has an associated basis, called its Reciprocal or Dual Basis defined through
Definition 23 Given a basis B = {f1 , . . . , fN } on the finite dimensional inner product space
(V, h·, ·i), there is a unique Reciprocal Basis B ∗ := {f 1 , . . . , f N } satisfying
hfi , f j i = δij for i, j = 1, . . . , N.   (27)
Thus, every vector v ∈ V has the two representations
v = \sum_{i=1}^{N} v^i fi   (Contravariant Expansion)   (28)
  = \sum_{i=1}^{N} vi f^i .   (Covariant Expansion)   (29)
The N -tuples [v]B = [v i ] and [v]B ∗ = [vi ] are called the Contravariant Coordinates and
Covariant Coordinates of v, respectively. It follows immediately from (27,28,29) that the
contravariant and covariant coordinates of v can be conveniently computed via the formulae
v i = hv, f i i and vi = hv, fi i.
(30)
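One way to compute a reciprocal basis numerically: with the metric tensor F = [hfi , fj i] from (25), the vectors f^i = (F −1 )ij fj satisfy (27). A sketch assuming NumPy and the usual dot product; the basis and the vector v are arbitrary examples.

    import numpy as np

    # An illustrative non-orthonormal basis of R^3, stored as rows f_1, f_2, f_3.
    f = np.array([[1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 1.0, 1.0]])

    F = f @ f.T                        # metric tensor F_ij = <f_i, f_j>
    f_dual = np.linalg.inv(F) @ f      # reciprocal basis: f^i = (F^{-1})_ij f_j

    # Defining property (27): <f_i, f^j> = delta_ij.
    assert np.allclose(f @ f_dual.T, np.eye(3))

    # Contravariant and covariant coordinates of a vector v, as in (30).
    v = np.array([2.0, -1.0, 3.0])
    v_contra = f_dual @ v              # v^i = <v, f^i>
    v_co = f @ v                       # v_i = <v, f_i>

    # The expansions (28) and (29) both reconstruct v.
    assert np.allclose(v_contra @ f, v)
    assert np.allclose(v_co @ f_dual, v)
    print(v_contra, v_co)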
Exercise 2 Let B = {f1 , f2 } be any basis for the Cartesian plane R2 . Show how the
reciprocal basis B ∗ = {f 1 , f 2 } can be constructed graphically from B.
Exercise 3 Let B = {f1 , f2 , f3 } be any basis for the Cartesian space R3 . Show that the
reciprocal basis B ∗ = {f 1 , f 2 , f 3 } is given by
f^1 = \frac{f2 × f3}{f1 · (f2 × f3 )} , f^2 = \frac{f1 × f3}{f2 · (f1 × f3 )} , f^3 = \frac{f1 × f2}{f3 · (f1 × f2 )} ,
where a × b denotes the vector cross product of a and b.
5 Linear Transformations Revisited
The theory of linear transformations between finite dimensional vector spaces is revisited in
this section. After some general considerations, several important special classes of linear
transformations will be studied from both an algebraic and a geometric perspective. The
first notion introduced is the vector space Lin(V, U) of linear transformations between two
vector spaces.
5.1 The Vector Space Lin(V, U)
Definition 24 Given two vector spaces, V, U, Lin(V, U) denotes the collection of all linear
transformations between V and U. It is a vector space under the operations of addition and
scalar multiplication of linear transformations defined as follows. Given two linear transformations L1 , L2 : V −→ U their sum, (L1 + L2 ), is defined to be the linear transformation
satisfying
(L1 + L2 )v := L1 v + L2 v for all v ∈ V.
Moreover, scalar multiplication of a linear transformation is defined by
(αL)v := αLv for all α ∈ F and v ∈ V.
For the special case V = U, one denotes the vector space of linear transformations on V
by Lin(V).
5.1.1 Adjoint Transformation
Let L ∈ Lin(V, U) be a linear transformation between two inner product spaces (V, h·, ·i)
and (U, h·, ·i). The Adjoint Transformation, L∗ , is defined to be the unique transformation
in Lin(U, V) satisfying
hLv, ui = hv, L∗ ui for all v ∈ V
and u ∈ U.
(31)
Remark 21 The right hand side of (31) is the inner product on V whereas the left hand
side is the inner product on U.
Remark 22
This definition of adjoint makes sense due to the
Theorem 6 (Riesz Representation Theorem) Suppose l : V −→ F is a linear, scalar valued
transformation on the inner product space (V, h·, ·i). Then there exists a unique vector a ∈ V
such that
lv = hv, ai for all v ∈ V.
Note that for fixed u ∈ U, lv := hLv, ui is a scalar valued linear transformation on V. By
the Riesz Representation Theorem, there exists a unique vector a ∈ V such that lv = hv, ai
for all v ∈ V. One now defines L∗ u := a ∈ V.
Example 12 Suppose V = RN , U = RM and L ∈ MM,N (R), the space of all M × N
(real) matrices. Then L∗ = LT , the transpose of L. (Why?) On the other hand, suppose V = CN , U = CM and L ∈ MM,N (C), the space of all M × N (complex) matrices. Then L∗ = \overline{L}^T , the conjugate transpose of L. (Why?)
Remark 23 Consider the special case V = U (over F = R) and let B := {f1 , . . . , fN } be
any basis for V. Then one can show that for any A ∈ Lin(V)
[A∗ ]B = F −1 ([A]B )^T F   (32)
where the matrix F is given by
F := [fij ]
with
fij := hfi , fj i.
The proof of (32) is left as an exercise.
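A numerical check of (32), assuming NumPy: represent a map A in a non-orthonormal basis of R3 with the usual dot product (so that A∗ is the transpose in natural coordinates) and compare the two sides. All matrices below are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)

    # Columns of P are the base vectors f_1, f_2, f_3; A_nat is A in natural coordinates.
    P = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0]])
    A_nat = rng.standard_normal((3, 3))

    F = P.T @ P                                   # metric tensor F_ij = <f_i, f_j>
    A_B = np.linalg.inv(P) @ A_nat @ P            # matrix of A in the basis B
    Astar_B = np.linalg.inv(P) @ A_nat.T @ P      # matrix of the adjoint A* in the basis B

    # Formula (32): [A*]_B = F^{-1} ([A]_B)^T F.
    assert np.allclose(Astar_B, np.linalg.inv(F) @ A_B.T @ F)
    print(Astar_B)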
5.1.2 Two Natural Norms for Lin(V, U)
One can define a natural inner product on Lin(V) by
L1 · L2 := tr(L∗1 L2 )
(33)
where L∗1 denotes the adjoint of L1 . (Why does this satisfy the axioms (15) or (16)?)
Remark 24 In the case V = RN with L1 and L2 given by N × N matrices L1 = [l^{(1)}_{ij}], L2 = [l^{(2)}_{ij}], respectively, then one readily shows (How?) that
L1 · L2 = \sum_{i,j=1}^{N} l^{(1)}_{ij} l^{(2)}_{ij} .
One can readily show that (33) also defines an inner product on Lin(V, U) where V and
U are any two finite dimensional vector spaces. (How?) The norm associated to this inner
product, called the trace norm, is then defined by
|L| := \sqrt{L · L} = \sqrt{tr(L∗ L)} .   (34)
Another natural norm on Lin(V, U), where (V, | · |) and (U, | · |) are normed vector spaces,
called the operator norm, is defined by
kLk := \sup_{|v|=1} |Lv| .   (35)
Question: What is the geometric interpretation of (35)? An important relationship between
the trace norm and operator norm is given by
Theorem 7 Let (V, h·, ·i) and (U, h·, ·i) be two finite dimensional, inner product spaces.
Then for all L ∈ Lin(V, U),
kLk ≤ |L| ≤ \sqrt{N} kLk.   (36)
Proof: The proof uses ideas to be introduced in the upcoming study of spectral theory.
To see how spectral theory plays a role in proving (36), first notice that
kLk = \sup_{|v|=1} |Lv| = \sup_{|v|=1} \sqrt{hLv, Lvi} = \sup_{|v|=1} \sqrt{hv, L∗ Lvi} .
Next observe that L∗ L is a self-adjoint, positive linear transformation in Lin(V). As part
of spectral theory for self-adjoint operators on a finite dimensional inner product space, it is
shown that
\sup_{|v|=1} hv, L∗ Lvi = \max_{j=1,...,N} |λj |^2
where {|λj |^2 , j = 1, . . . , N } are the (positive) eigenvalues of L∗ L. Thus, one has
kLk = \max_{j=1,...,N} |λj | .   (37)
On the other hand, spectral theory will show that for the trace norm, one has
|L|^2 = tr(L∗ L) = \sum_{j=1}^{N} |λj |^2 .   (38)
Appealing to the obvious inequalities
\max_{j=1,...,N} |λj |^2 ≤ \sum_{j=1}^{N} |λj |^2 ≤ N \max_{j=1,...,N} |λj |^2 ,
one readily deduces (36) from (37,38).
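A numerical illustration of (34), (35) and the bound (36), assuming NumPy: for a matrix acting on Euclidean RN , the norm (34) is the Frobenius norm and (35) is the largest singular value. The matrix is an arbitrary example.

    import numpy as np

    rng = np.random.default_rng(2)
    N = 5
    L = rng.standard_normal((N, N))

    trace_norm = np.sqrt(np.trace(L.T @ L))   # |L| = sqrt(tr(L^* L))
    op_norm = np.linalg.norm(L, 2)            # ||L|| = sup_{|v|=1} |Lv|

    # The two-sided bound (36): ||L|| <= |L| <= sqrt(N) ||L||.
    assert op_norm <= trace_norm <= np.sqrt(N) * op_norm + 1e-12
    print(op_norm, trace_norm)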
5.1.3 Elementary Tensor Product
Among the most useful linear transformations is the class of Elementary Tensor Products
which can be used as the basic building blocks of all other linear transformations. First
consider the case of Lin(V) where (V, h·, ·i) is an inner product space.
Definition 25 Let a, b ∈ V. Then the elementary tensor product of a and b, denoted a ⊗ b,
is defined to be the linear transformation on V satisfying
(a ⊗ b) v := hv, bia
for all v ∈ V.
(39)
Remark 25 It follows immediately from the definition that Ran(a ⊗ b) = span(a) and
Nul(a ⊗ b) = b⊥ , where b⊥ denotes the orthogonal complement of b, i.e. the (N − 1)-dimensional subspace of all vectors orthogonal to b (where dim(V) = N ). The elementary
tensor products are thus the rank one linear transformations in Lin(V).
Remark 26 The following identities can also be derived easily from the definition of
elementary tensor product. Given any a, b, c, d ∈ V and A ∈ Lin(V)
(a ⊗ b)∗ = b ⊗ a
(a ⊗ b) · (c ⊗ d) = ha, cihb, di
A · (a ⊗ b) = ha, Abi
(a ⊗ b) (c ⊗ d) = hb, ci a ⊗ d
A (a ⊗ b) = (Aa) ⊗ b
(a ⊗ b) A = a ⊗ (A∗ b)
tr(a ⊗ b) = ha, bi.
Example 13 When V = RN and a = (ai ), b = (bi ), then the matrix of a ⊗ b with respect
to the natural basis on RN is
[a ⊗ b]N = [ai bj ]. (Why?)   (40)
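In coordinates, (39) and (40) say that a ⊗ b acts as the outer product matrix [ai bj ]. A short check assuming NumPy; the vectors are arbitrary.

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([0.0, 1.0, -1.0])
    v = np.array([4.0, 5.0, 6.0])

    T = np.outer(a, b)          # the matrix [a_i b_j] of a (x) b in the natural basis, as in (40)

    # Definition (39): (a (x) b) v = <v, b> a.
    assert np.allclose(T @ v, (v @ b) * a)

    # Two of the identities from Remark 26: (a (x) b)^* = b (x) a and tr(a (x) b) = <a, b>.
    assert np.allclose(T.T, np.outer(b, a))
    assert np.isclose(np.trace(T), a @ b)

    # a (x) b is a rank-one transformation (for non-zero a and b).
    assert np.linalg.matrix_rank(T) == 1
    print(T)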
Remark 27 The result (40) has a natural generalization to a general finite dimensional
inner product space (V, h·, ·i). Let B := {e1 , . . . , eN } be an orthonormal basis for V and
a, b ∈ V. Then
[a ⊗ b]B = [ai bj ]
(41)
where [a]B = [ai ] and [b]B = [bi ]. Hence, the formula (40) generalizes to the matrix representation of an elementary tensor product with respect to any orthonormal basis for V.
More generally, if B = {f1 , . . . , fN } is any (not necessarily orthonormal) basis for V, then
[a ⊗ b]B = [ai bj ]F
(matrix product)
where F is the metric tensor associated with the basis B defined by (25). Verification of this
last formula is left as an exercise.
The notion of elementary tensor product is readily generalized to the setting of linear
transformations between two inner product spaces (V, h·, ·i) and (U, h·, ·i) as follows.
Definition 26 Let a ∈ U and b ∈ V. Then the elementary tensor product a ⊗ b is defined
to be the linear transformation in Lin(V, U) satisfying
(a ⊗ b) v := hv, bia for all v ∈ V.

5.1.4 A Natural Orthonormal Basis for Lin(V, U)
Using the elementary tensor product, one can construct a natural orthonormal basis for
Lin(V, U). First consider the special case of Lin(V). Suppose (V, h·, ·i) is an inner product
space with orthonormal basis B := {e1 , . . . , eN }. Then one easily shows that the set
B := {ei ⊗ ej , i, j = 1, . . . , N }
provides an orthonormal basis for Lin(V) equipped with the inner product (33). It follows
immediately that
dim(Lin(V)) = N 2 .
Remark 28
If L ∈ Lin(V), then
L = lij ei ⊗ ej
with lij = hL, ei ⊗ ej i = hei , Lej i. (Why?)
(42)
More generally, suppose BU = {b1 , . . . , bM } is an orthonormal basis for U and BV =
{d1 , . . . , dN } is an orthonormal basis for V. Then a natural orthonormal basis for Lin(V, U)
(with respect to the trace inner product on Lin(V, U)) is
{bi ⊗ dj | i = 1, . . . , M, j = 1, . . . , N }.
Indeed, that these tensor products are pairwise orthogonal and have trace norm one follows
from
hbi ⊗ dj , bk ⊗ dl i = tr((bi ⊗ dj )∗ bk ⊗ dl )
= tr(dj ⊗ bi bk ⊗ dl )
= hbi , bk i tr(dj ⊗ dl )
= hbi , bk ihdj , dl i = δik δjl .
The generalization of (42) for L ∈ Lin(V, U) is
L = lij bi ⊗ dj
with lij := hL, bi ⊗ dj i = hbi , Ldj i. (Why?)
Remark 29 Suppose B := {e1 , . . . , eN } is an orthonormal basis for V. Then every L ∈
Lin(V) has the associated matrix [ˆlij ] defined through
L = \sum_{i,j=1}^{N} ˆlij ei ⊗ ej .
On the other hand, L also has the associated matrix representation [L]B = [lij ] defined
through the relation
[Lv]B = [L]B [v]B
which must hold for all v ∈ V with
lij = hei , Lej i.
It is straightforward to show (How?) that these two matrices are equal, i.e.
[L]B = [lij ] = [ˆlij ].
5.1.5 General Tensor Product Bases for Lin(V) and Their Duals
Let B = {f1 , . . . , fN } be a general basis for (V, h·, ·i) with associated reciprocal basis
B ∗ = {f 1 , . . . , f N }. Then Lin(V) has four associated tensor product bases and corresponding
reciprocal bases given by
{fi ⊗ fj } ←→ {f i ⊗ f j }   (43)
{f i ⊗ f j } ←→ {fi ⊗ fj }   (44)
{fi ⊗ f j } ←→ {f i ⊗ fj }   (45)
{f i ⊗ fj } ←→ {fi ⊗ f j }   (46)
where i, j = 1, . . . , N in each set. Every L ∈ Lin(V) has four matrix representations defined
through
L = \sum_{i,j=1}^{N} ˆl^{ij} fi ⊗ fj   (Pure Contravariant)   (47)
  = \sum_{i,j=1}^{N} ˆl_{ij} f^i ⊗ f^j   (Pure Covariant)   (48)
  = \sum_{i,j=1}^{N} ˆl^{i}_{j} fi ⊗ f^j   (Mixed Contravariant-Covariant)   (49)
  = \sum_{i,j=1}^{N} ˆl_{i}^{j} f^i ⊗ fj   (Mixed Covariant-Contravariant).   (50)
It follows easily (How?) that one can compute these various matrix components of L from
the formulae
ˆl^{ij} = hf i , Lf j i
ˆl_{ij} = hfi , Lfj i
ˆl^{i}_{j} = hf i , Lfj i
ˆl_{i}^{j} = hfi , Lf j i.
Exercise 4 Let L = I, the identity transformation on V. Then I has the four matrix
representations (47,48,49,50) with
ˆl^{ij} = hf i , f j i = f ij
ˆl_{ij} = hfi , fj i = fij
ˆl^{i}_{j} = hf i , fj i = δ^{i}_{j}
ˆl_{i}^{j} = hfi , f j i = δ_{i}^{j}
where δji denotes the usual Kronecker symbol. Thus, the matrix for the identity transformation when using a general (non-orthonormal) basis has the accustomed form only using
a mixed covariant and contravariant representation.
Remark 30 Equivalent Matrix Representations.
Given a general basis B = {f1 , . . . , fN } and its dual basis B ∗ = {f 1 , . . . , f N }, one also has
the four matrix representations of a linear transformation L ∈ Lin(V) defined through
[w]B = [Lv]B = [L](B,B) [v]B
[w]B ∗ = [Lv]B ∗ = [L](B ∗ ,B ∗ ) [v]B ∗
[w]B ∗ = [Lv]B ∗ = [L](B,B ∗ ) [v]B
[w]B = [Lv]B = [L](B ∗ ,B) [v]B ∗
which in component form becomes
w^{i} = l^{i}_{j} v^{j}
w_{i} = l_{i}^{j} v_{j}
w_{i} = l_{ij} v^{j}
w^{i} = l^{ij} v_{j} .
One now readily shows that
ˆl^{ij} = l^{ij} , ˆl_{ij} = l_{ij} , ˆl^{i}_{j} = l^{i}_{j} and ˆl_{i}^{j} = l_{i}^{j} .
Indeed, consider the following calculation:
Lv = \left( \sum_{i,j} ˆl^{ij} fi ⊗ fj \right) ( v_{k} f^{k} )
   = \sum_{i,j} ˆl^{ij} fi v_{k} hfj , f^{k} i
   = \sum_{i,j} ˆl^{ij} fi v_{k} δ_{j}^{k}
   = \sum_{i} fi \left( \sum_{j} ˆl^{ij} v_{j} \right)
   = \sum_{i} w^{i} fi
where
w^{i} = ˆl^{ij} v_{j} .
It now follows from the definition of lij that ˆlij = lij . The remaining identities can be deduced
by similar reasoning. Henceforth, the matrices [ˆlij ], [lij ], etc, will be used interchangeably.
Remark 31 Raising and Lowering Indices.
The metric tensor F = [fij ] = [hfi , fj i] and its dual counterpart [f ij ] = [hf i , f j i] can
be used to switch between covariant and contravariant component forms of vectors and
linear transformations (and indeed, tensors of all orders). This process is called “raising and
lowering indices” in classical tensor algebra. Thus, for example, if v = v i fi = vi f i , then
v i = hv, f i i = hvk f k , f i i = f ik vk .
Thus, [f ij ] is used to transform the covariant components [vi ] of a vector v into the contravariant components [v i ], a process called “raising the index”. By similar reasoning one
shows that
vi = fik v k ,
that is, the metric tensor F = [fij ] is used to transform the contravariant components of a
vector into the covariant components, a process called “lowering the index”.
Analogous raising/lowering index formulas hold for linear transformations. In particular,
suppose w = Lv and consider the string of identities
Lv = \left( \sum_{i,j} l^{ij} fi ⊗ fj \right) ( v^{k} fk )
   = \sum_{i,j} l^{ij} fi v^{k} hfj , fk i
   = \sum_{i} fi \left( \sum_{k} l^{ik} f_{kj} \right) v^{j} .
From the definition of lij , it follows that
l^{i}_{j} = l^{ik} f_{kj} .
Thus, the tensor [fij ] can be used to “lower” the column index of [lij ] by right multiplication
of matrices. Similarly, [f ij ] can be used to raise the column index by right multiplication.
Moreover, they can be used to lower and raise the row index of the matrices [lij ] as illustrated
by the following string of identities.
L = \sum_{i,j} l^{ij} fi ⊗ fj
  = \sum_{i,j} l^{ij} (f_{iq} f^{q} ) ⊗ fj
  = \sum_{q,j} \left( \sum_{i} f_{iq} l^{ij} \right) f^{q} ⊗ fj
which from the definition of the matrix [l_{q}^{j} ] implies that
l_{i}^{j} = f_{ik} l^{kj} .   (51)
The following identities are readily obtained by similar arguments.
l^{ij} = f^{ik} l_{k}^{j} = l^{i}_{k} f^{kj} ,
l_{ij} = f_{ik} l^{k}_{j} = l_{i}^{k} f_{kj} ,
etc. . . .
Applying (51) to the identity transformation, for which l^{i}_{j} = δ^{i}_{j} , l_{i}^{j} = δ_{i}^{j} , l^{ij} = f^{ij} and l_{ij} = fij , one sees that
δ_{i}^{j} = f_{ik} f^{kj} .
It follows that the metric tensor [fij ] and the matrix [f ij ] are inverses of one another, i.e.
[f ij ] = [fij ]−1 .
5.1.6 Eigenvalues, Eigenvectors and Eigenspaces
Invariance principles are central tools in most areas of mathematics and they come in many
forms. In linear algebra, the most important invariance principle resides in the notion of an
Eigenvector.
Definition 27 Let V be a vector space and L ∈ Lin(V) a linear transformation on V. Then
a non-zero vector v ∈ V is called an Eigenvector and λ ∈ F its associated Eigenvalue provided
Lv = λv.
(52)
From the definition (52) it follows that L(span(v)) ⊂ span(v), that is the line spanned
by the eigenvector v is invariant under L in that L maps the line back onto itself. The set
of all eigenvectors corresponding to a given eigenvalue λ (with the addition of the zero vector) forms a subspace of V (Why?), denoted E(λ), called the Eigenspace corresponding
to the eigenvalue λ. One readily shows (How?) from the definition (52) that eigenvectors corresponding to distinct eigenvalues of a given linear transformation L ∈ Lin(V) are
independent, from which one concludes
E(λ1 ) ∩ E(λ2 ) = {0} if λ1 ≠ λ2 .
(53)
Re-writing the definition (52) as
(L − λI)v = 0
one sees that v is an eigenvector with eigenvalue λ if and only if
v ∈ Nul(L − λI).
It follows that λ is an eigenvalue for L if and only if
det(L − λI) = 0.
Recalling that det(L − λI) is a polynomial of degree N , one defines the Characteristic
Polynomial pL (λ) of the transformation L to be
pL (λ) := det(L − λI) = (−λ)N + i1 (L)(−λ)N −1 + . . . + iN −1 (L)(−λ) + iN (L).
(54)
The coefficients IL := {i1 (L), . . . , iN (L)} of the characteristic polynomial are called the
Principal Invariants of the transformation L because they are invariant under similarity
transformation. More specifically, let T ∈ Lin(V) be non-singular. Then the characteristic
polynomial for T−1 LT is identical to that for L as can be seen from the calculation
pT−1 LT (λ) =
=
=
=
=
det(T−1 LT − λI)
det(T−1 LT − λT−1 T)
det(T−1 (L − λI)T)
det(L − λI)
pL (λ).
An explanation of why the elements of IL are called the Principal Invariants of L must wait
until the upcoming section on spectral theory.
The principal invariants i1 (L) and iN (L) have simple forms. Indeed one can readily show
(How?) that
i1 (L) = tr(L) and iN (L) = det(L).
Moreover, if the characteristic polynomial has the factorization
pL (λ) = \prod_{j=1}^{N} (λj − λ),
then one also has that
i1 (L) = tr(L) = λ1 + . . . + λN and iN (L) = det(L) = λ1 · · · λN .
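A numerical illustration of these relations, assuming NumPy; the matrix is an arbitrary example whose eigenvalues may be complex.

    import numpy as np

    rng = np.random.default_rng(3)
    L = rng.standard_normal((4, 4))

    lam = np.linalg.eigvals(L)          # the (possibly complex) eigenvalues of L

    # i_1(L) = tr(L) is the sum of the eigenvalues; i_N(L) = det(L) is their product.
    assert np.isclose(np.sum(lam), np.trace(L))
    assert np.isclose(np.prod(lam), np.linalg.det(L))

    # Each eigenvalue is a root of the characteristic polynomial p_L(s) = det(L - s I).
    for s in lam:
        assert abs(np.linalg.det(L - s * np.eye(4))) < 1e-8
    print(lam)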
5.2 Classes of Linear Transformations
An understanding of general linear transformations in Lin(V ) can be gleaned from a study
of special classes of transformations and representation theorems expressing general linear
transformations as sums or products of transformations from the special classes. The discussion is framed within the setting of a finite dimensional, inner product vector space. The
first class of transformations introduced are the projections.
5.2.1 Projections
Definition 28 A linear transformation P on (V, h·, ·i) is called a Projection provided
P 2 = P.
(55)
Remark 32 It follows readily from the definition (55) that P is a projection if and only
if I − P is a projection. Indeed, assume P is a projection, then
(I − P )2 = I 2 − 2P + P 2 = I − 2P + P = I − P
which shows that I −P is a projection. The reverse implication follows by a similar argument
since if I − P is a projection, then one need merely write P = I − (I − P ) and apply the
above calculation.
Remark 33 The geometry of projections is illuminated by the observation that there
is a simple relationship between their range and nullspace. In particular, one can show easily
from the definition (55) that
Ran(P ) = Nul(I − P ) and Nul(P ) = Ran(I − P ).   (56)
For example, to prove that Ran(P ) = Nul(I − P ), let v ∈ Ran(P ). Then v = P u for some
u ∈ V and (I − P )v = (I − P )P u = P u − P 2 u = P u − P u = 0. It follows that v ∈ Nul(I − P )
and hence that
Ran(P ) ⊂ Nul(I − P ).
(57)
Conversely, if v ∈ Nul(I −P ), then (I −P )v = 0, or equivalently v = P v. Thus, v ∈ Ran(P )
from which it follows that
Nul(I − P ) ⊂ Ran(P ).
(58)
From (57,58) one concludes that Ran(P ) = Nul(I − P ) as required. The second equality in
(56) follows similarly.
Remark 34 Another important observation concerning projections is that V is a direct
sum of the subspaces Ran(P ) and Nul(P )
V = Ran(P ) ⊕ Nul(P ).
(59)
To see (59), one merely notes that for any v ∈ V
v = P v + (I − P )v,
with P v ∈ Ran(P ) and (I − P )v ∈ Nul(P ).
This fact and (9) show (59). Because of (59) one often refers to a projection P as “projecting
V onto Ran(P ) along Nul(P ).”
An important subclass of projections are the Perpendicular Projections.
Definition 29 A projection P is called a Perpendicular Projection provided Ran(P ) ⊥
Nul(P ).
A non-obvious fact about perpendicular projections is the
Theorem 8 A projection P is a perpendicular projection if and only if P is self-adjoint, i.e.
P = P ∗.
Proof: First assume P is self-adjoint and let v ∈ Ran(P ) and w ∈ Nul(P ). Then, v = P u
for some u ∈ V and
hv, wi = hP u, wi = hu, P ∗ wi = hu, P wi = hu, 0i = 0
from which it follows that v ⊥ w and hence that Ran(P ) ⊥ Nul(P ).
Conversely, suppose P is a perpendicular projection. One must show that P is self-adjoint, i.e. that
hP v, ui = hv, P ui for all v, u ∈ V.
(60)
To that end, notice that
hP v, ui = hP v, P u + (I − P )ui = hP v, P ui + hP v, (I − P )ui = hP v, P ui. (Why?)
Since hP v, P ui is symmetric in v and u, the same must be true for hP v, ui, from which
one concludes (60).
Question: Among the class of elementary tensor products, which (if any) are projections?
Which are perpendicular projections?
Answer: An elementary tensor product a ⊗ b is a projection if and only if ha, bi = 1. It
is a perpendicular projection if and only if b = a and |a| = 1. (Why?)
Thus, if a is a unit vector, then a ⊗ a is perpendicular projection onto the line spanned
by a. More generally, if a and b are perpendicular, unit vectors, then a ⊗ a + b ⊗ b is
a perpendicular projection onto the plane (2-dimensional subspace) spanned by a and b.
(Exercise.) Generalizing further, if a1 , . . . , aK are pairwise orthogonal unit vectors, then
a1 ⊗ a1 + . . . + aK ⊗ aK is a perpendicular projection onto span(a1 , . . . , aK ).
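A quick numerical check of these statements, assuming NumPy; the orthonormal pair a, b in R3 is an arbitrary illustrative choice.

    import numpy as np

    a = np.array([1.0, 0.0, 0.0])
    b = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)

    P1 = np.outer(a, a)                     # a (x) a: perpendicular projection onto span(a)
    P2 = np.outer(a, a) + np.outer(b, b)    # perpendicular projection onto the plane span(a, b)

    for P in (P1, P2):
        assert np.allclose(P @ P, P)        # P^2 = P (projection)
        assert np.allclose(P, P.T)          # P = P^* (perpendicular projection)

    # a (x) c with <a, c> = 1 is a projection, but in general not a perpendicular one.
    c = a + b
    Q = np.outer(a, c)
    assert np.allclose(Q @ Q, Q) and not np.allclose(Q, Q.T)
    print(P2)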
5.2.2 Diagonalizable Transformations
A linear transformation L on a finite dimensional vector space V is called Diagonalizable
provided it has a matrix representation that is diagonal, that is, there is a basis B for V such
that
[L]B = diag(l1 , . . . , lN ).
(61)
Notation: An N × N diagonal matrix will be denoted A = diag(a1 , . . . , aN ) where A = [aij ]
with a11 = a1 , . . . , aN N = aN and aij = 0 for i ≠ j.
The fundamental result concerning diagonalizable transformations is
Theorem 9 A linear transformation L on V is diagonalizable if and only if V has a basis
consisting entirely of eigenvectors of L.
Proof: Assume first that V has a basis B = {f1 , . . . , fN } consisting of eigenvectors of L, that
is Lfj = λj fj for j = 1, . . . , N . Recall that the column N -tuples comprising the matrix [L]B
are given by [Lfj ]B , the coordinates with respect to B of the vectors Lfj for j = 1, . . . , N .
But
[Lf1 ]B = [λ1 f1 ]B = \begin{pmatrix} λ1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} , [Lf2 ]B = [λ2 f2 ]B = \begin{pmatrix} 0 \\ λ2 \\ 0 \\ \vdots \\ 0 \end{pmatrix} , etc.
It now follows immediately that
[L]B = diag(λ1 , . . . , λN ). (Why?)
Conversely, suppose L is diagonalizable, that is, there exists a basis B = {f1 , . . . , fN } for
V such that [L]B = diag(λ1 , . . . , λN ). But then for each j = 1, . . . , N
 
0
 .. 
.
 
0
 
[Lfj ]B = [L]B [fj ]B = diag(λ1 , . . . , λN ) 1 (j th -component=1) = [λj fj ]B .
 
0
.
 .. 
0
It follows that for each j = 1, . . . , N , Lfj = λj fj (Why?), and hence that B = {f1 , . . . , fN } is
a basis of eigenvectors of L.
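A numerical illustration of Theorem 9, assuming NumPy: collect an eigenvector basis into the columns of a matrix P and check that P −1 LP is diagonal. The matrix below is an arbitrary example with distinct eigenvalues.

    import numpy as np

    L = np.array([[2.0, 1.0],
                  [0.0, 3.0]])          # eigenvalues 2 and 3

    lam, P = np.linalg.eig(L)           # columns of P are eigenvectors: L P[:, j] = lam[j] P[:, j]

    # In the eigenvector basis, the matrix of L is diagonal with the eigenvalues on the diagonal.
    L_diag = np.linalg.inv(P) @ L @ P
    assert np.allclose(L_diag, np.diag(lam))
    print(lam)
    print(L_diag)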
5.2.3 Orthogonal and Unitary Transformations
Let (V, h·, ·i) be a finite dimensional, inner product vector space. The class of Orthogonal Transformations, in the case of a vector space over R, and the class of Unitary Transformations, in the case of a vector space over C, are defined by
Definition 30 A transformation Q ∈ Lin(V) is called Orthogonal (real case) or Unitary
(complex case) provided
Q−1 = Q∗
or equivalently I = Q∗ Q.
(62)
That is, the inverse of Q equals the adjoint of Q.
Remark 35 It follows immediately from (62) that orthogonal transformations preserve
the inner product on V and hence leave lengths and angles invariant. Specifically, one has
hQa, Qbi = ha, Q∗ Qbi = ha, Q−1 Qbi = ha, bi.
(63)
In particular, from (63) it follows that
|Qa|2 = hQa, Qai = ha, ai = |a|2
showing that Q preserves lengths.
Remark 36 The class of orthogonal transformations will be denoted Orth. One easily shows that
det(Q) = ±1. (Why?)
It is useful to define the subclasses
Orth+ := {Q ∈ Orth | det(Q) = 1} and Orth− := {Q ∈ Orth | det(Q) = −1}.
Example 14 If V is the Euclidean plane R2 , then Orth+ (R2 ) is the set of (rigid) Rotations,
i.e. given Q ∈ Orth+ (R2 ), there exists a 0 ≤ θ < 2π such that
Q = \begin{pmatrix} cos(θ) & − sin(θ) \\ sin(θ) & cos(θ) \end{pmatrix} .   (64)
Also, the transformation
RM := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
is in Orth− (R2 ) (Why?) (How does RM act geometrically?). More generally,
Orth− (R2 ) = {R ∈ Orth(R2 )| R = RM Q for some Q ∈ Orth+ (R2 )}. (Why?)
Example 15 If V is the Euclidean space R3 , then Orth+ is called the set of Proper
Orthogonal Transformations or the set of Rotations. To motivate calling Q ∈ Orth+ (R3 ) a Rotation, one can prove that
Theorem 10 If Q ∈ Orth+ (R3 ), then there exists a non-zero vector a such that Qa = a
(i.e. a is an eigenvector corresponding to the eigenvalue λ = 1) and Q restricted to the plane
orthogonal to a is a two dimensional rotation of the form (64). The line spanned by a is
called the Axis of Rotation.
Proof: Exercise. (Hint: Use the following facts to show that 1 ∈ σ(Q).
1. 1 = det(Q) = λ1 λ2 λ3 where the (complex) spectrum of Q = {λ1 , λ2 , λ3 }.
2. At least one of the eigenvalues must be real (Why?), say λ1 , and the other two must
be complex conjugates (Why?), i.e. λ2 λ3 = |λ2 |2 .
3. |λj | = 1 for j = 1, 2, 3 (Why?) and hence λ2 λ3 = 1.
4. Since λ1 = 1 is an eigenvalue for Q, it has an eigenvector a, i.e. Qa = a.
5. Q is in Orth+ when restricted to the plane orthogonal to a (Why?).
The conclusion of the theorem now follows readily.)
It then follows that
Orth− (R3 ) = {R ∈ Orth(R3 ) | R = −Q for some Q ∈ Orth+ (R3 )}. (Why?)
Remark 37 Let V = RN equipped with the Euclidean inner product and natural basis.
One can now make the identification of Lin(RN ) with the set of N × N -matrices, MN . Then
Orth(RN ), the set of orthogonal transformations on RN , corresponds to the set of orthogonal
matrices in MN . It follows from the definition (62) that Q ∈ Orth(RN ) ⊂ MN if and only
if the columns (and hence rows) of Q form an orthonormal basis for RN . (Why?)
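A short numerical illustration of (62), (63) and (64), assuming NumPy; the angle and test vectors are arbitrary.

    import numpy as np

    theta = np.deg2rad(40.0)
    Q = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # a rotation, as in (64)

    # Orthogonality (62): Q^* Q = I, hence Q^{-1} = Q^T, and det(Q) = +1 for a rotation.
    assert np.allclose(Q.T @ Q, np.eye(2))
    assert np.isclose(np.linalg.det(Q), 1.0)

    # Q preserves inner products, hence lengths and angles, as in (63).
    a, b = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
    assert np.isclose((Q @ a) @ (Q @ b), a @ b)

    # The reflection R_M from Example 14 is orthogonal with determinant -1, so it lies in Orth^-.
    RM = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
    assert np.isclose(np.linalg.det(RM), -1.0)
    print(Q)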
5.2.4 Self-Adjoint Transformations
Let (V, h·, ·i) be a finite dimensional inner product space. An important subspace of Lin(V)
is the collection of Hermitian or Self-Adjoint transformations denoted by Herm(V). The
self-adjoint transformations are just those transformations equal to their adjoints, i.e. S ∈
Herm(V) means S ∗ = S. The goal of this section is to present the celebrated Spectral Theorem
for self-adjoint transformations. The presentation proceeds in steps through a sequence of
lemmas. For much of the discussion, the scalar field can be either R or C.
Definition 31 The Spectrum of a transformation in Lin(V), denoted σ(L), is the set of all
eigenvalues of L.
Lemma 1 Let L be a self-adjoint transformation on the inner product space (V, h·, ·i). Then
σ(L) ⊂ R, i.e. the spectrum of L contains only real numbers.
Proof: Let λ ∈ σ(L) with associated unit eigenvector e, i.e. |e| = 1 and Le = λe. Then
λ = λhe, ei = hλe, ei = hLe, ei = he, Lei = he, λei = \overline{λ}he, ei = \overline{λ}.
Thus, λ = \overline{λ} from which it follows that λ ∈ R.
Lemma 2 Let L be a self-adjoint transformation on the inner product space (V, h·, ·i). Then
distinct eigenvalues of L have orthogonal eigenspaces.
Proof: Let λ1 , λ2 ∈ σ(L) be distinct (real) eigenvalues of L with associated eigenvectors
e1 , e2 , respectively. What must be shown is that e1 ⊥ e2 , i.e. that he1 , e2 i = 0. To that
end,
λ1 he1 , e2 i = hλ1 e1 , e2 i = hLe1 , e2 i = he1 , Le2 i = he1 , λ2 e2 i = λ2 he1 , e2 i.
It follows that
(λ1 − λ2 )he1 , e2 i = 0.
But since λ1 ≠ λ2 , it must be that he1 , e2 i = 0, i.e. that e1 ⊥ e2 , as required.
Lemma 3 Let L be a self-adjoint transformation on the inner product space (V, h·, ·i) and
let λ ∈ σ(L) with associated eigenspace E(λ). Then E(λ) and its orthogonal complement
E(λ)⊥ are invariant subspaces of L, i.e.
L(E(λ)) ⊂ E(λ)   (65)
L(E(λ)⊥ ) ⊂ E(λ)⊥ .   (66)
Proof: Assertion (65) follows immediately from the definitions of eigenvalue and eigenspace.
For the second assertion (66), one must show that if v ∈ E(λ)⊥ , then Lv is also in E(λ)⊥ .
To that end, let e ∈ E(λ) and v ∈ E(λ)⊥ . Then v ⊥ e and
hLv, ei = hv, L∗ ei = hv, Lei = hv, λei = λhv, ei = 0.
It follows that Lv is perpendicular to every eigenvector in E(λ) from which one concludes
that Lv ∈ E(λ)⊥ as required.
With the aid of these lemmas, one can now prove the Spectral Theorem for self-adjoint
transformations.
Theorem 11 Let L ∈ Lin(V) be self-adjoint. Then there is an orthonormal basis for V
consisting entirely of eigenvectors of L. In particular, the eigenspaces of L form a direct
sum decomposition of V
V = E(λ1 ) ⊕ . . . ⊕ E(λK )
(67)
where σ(L) = {λ1 , . . . , λK } is the spectrum of L.
Proof: Since all of the eigenvalues of L are real numbers, the characteristic polynomial
factors completely over the real numbers, i.e.
det(L − λI) = pL (λ) = \prod_{j=1}^{K} (λj − λ)^{nj}
with
n1 + . . . + nK = N
where dim(V) = N . The exponent nj is called the algebraic multiplicity of the eigenvalue λj .
The geometric multiplicity, mj , of each eigenvalue λj is defined to be dim(E(λj )). To prove
the theorem, it suffices to show that
m1 + . . . + mK = N. (68)
Indeed, (68) shows that there is a basis for V consisting of eigenvectors of L, called an eigenbasis, and Lemma 2 shows one can find an orthonormal eigenbasis for V. Since the eigenspaces
E(λj ), j = 1, . . . , K are pairwise orthogonal, one can form the direct sum subspace
V1 := E(λ1 ) ⊕ . . . ⊕ E(λK ).
The problem is to show that V1 = V. Suppose not. Then one can write
V = V1 ⊕ V1⊥ .
First note that
V1⊥ = E(λ1)⊥ ∩ . . . ∩ E(λK)⊥. (Why?)
It follows from Lemma 3 that L(V1⊥) ⊂ V1⊥. (Why?) Let L1 denote the restriction of L to
the subspace V1⊥
L1 : V1⊥ −→ V1⊥ ,
that is, if v ∈ V1⊥ , then L1 v := Lv ∈ V1⊥ . Notice that L1 is self-adjoint. Hence, the
characteristic polynomial of L1 factors completely over the real numbers. Let µ be an
eigenvalue of L1 with associated eigenvector e, i.e. L1 e = Le = µe. Thus, e is also an
eigenvector of L with eigenvalue µ. It follows that µ must be one of the λj, j = 1, . . . , K and
e ∈ E(λj) for some j. But e must be orthogonal to all of the eigenspaces E(λj),
j = 1, . . . , K. This contradiction implies that V1⊥ = {0} and that V1 = V as required.
Let Pj denote the perpendicular projection operator onto the eigenspace E(λj ). Then
since the eigenspaces are pairwise orthogonal, it follows that (Why?)
Pi Pj = 0 if i ≠ j, and Pj² = Pj. (69)
The spectral theorem implies that a self-adjoint transformation L has a Spectral Decomposition defined by
L = λ1 P1 + . . . + λK PK (70)
where the spectrum of L is σ(L) = {λ1, . . . , λK}. Moreover, for each positive integer n, L^n
is also self-adjoint (Why?), and appealing to (70) one easily shows (How?) that L^n has the
spectral decomposition
L^n = λ1^n P1 + . . . + λK^n PK.
Thus, L^n has the same eigenspaces as L with spectrum σ(L^n) = {λ1^n, . . . , λK^n}. More
generally, let f(x) = a0 + a1 x + . . . + an x^n be a polynomial. Then f(L) is self-adjoint (Why?)
and has the spectral decomposition
f(L) = f(λ1)P1 + . . . + f(λK)PK. (71)
It follows that σ(f(L)) = {f(λ1), . . . , f(λK)} and f(L) has the same eigenspaces as L.
Generalizing further, let f(x) = a0 + a1 x + a2 x² + . . . be an entire function, i.e. f(x) has a
power series representation convergent for all x. Then, again one concludes that f(L) (Why is
the power series with L convergent?) is self-adjoint and has the spectral decomposition (71).
Example 16 Consider the entire function exp(x) = 1 + x + x²/2! + x³/3! + . . . . Then exp(L)
is self-adjoint (if L is) and has the spectral representation
exp(L) = exp(λ1)P1 + . . . + exp(λK)PK.
Example 17 Let V = RN with the usual inner product and let A ∈ MN(R) be a self-adjoint
N × N real matrix. Consider the system of ordinary differential equations
ẋ(t) = A x(t) (72)
with initial condition x(0) = x0. Then the solution to the initial value problem for the
system (72) is
x(t) = exp(At)x0
which by the spectral theorem can be written
x(t) = exp(λ1 t)P1 x0 + . . . + exp(λK t)PK x0
where A has the spectrum σ(A) = {λ1, . . . , λK} and spectral projections Pj, j = 1, . . . , K.
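As an aside not in the original notes, the following Python/NumPy sketch illustrates the spectral solution formula for (72); the symmetric matrix A and the initial vector x0 are made up for the example, and the rank-one products v v^T play the role of the projections Pj (a repeated eigenvalue simply contributes several rank-one terms).

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])   # a self-adjoint (symmetric) matrix
x0 = np.array([1.0, 0.0, 2.0])    # initial condition

# eigh returns real eigenvalues and an orthonormal basis of eigenvectors.
lam, V = np.linalg.eigh(A)

def x(t):
    # x(t) = sum_j exp(lam_j t) (v_j v_j^T) x0
    return sum(np.exp(l * t) * v * (v @ x0) for l, v in zip(lam, V.T))

def xdot(t):
    return sum(l * np.exp(l * t) * v * (v @ x0) for l, v in zip(lam, V.T))

t = 0.7
print(np.allclose(xdot(t), A @ x(t)))   # True: x(t) satisfies the system (72)
print(np.allclose(x(0.0), x0))          # True: the initial condition holds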
5.2.5
Square-root of Positive Definite Hermitian Transformations
Suppose L ∈ Herm(V) has only positive eigenvalues. Then L has a unique positive definite
Hermitian square-root. Indeed, if L is positive definite Hermitian, then it has the spectral
decomposition
L = µ1 P1 + . . . + µK PK
with µj ∈ R+ and Pj the perpendicular projection onto the eigenspace E(µj). Define
√L := √µ1 P1 + . . . + √µK PK.
Then clearly √L is positive definite Hermitian and (√L)² = L.
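A short NumPy check of this construction, not from the notes (the positive definite matrix L below is manufactured for the example):

import numpy as np

B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
L = B.T @ B + np.eye(2)                  # symmetric and positive definite

mu, V = np.linalg.eigh(L)                # spectral decomposition: mu > 0, V orthogonal
sqrtL = V @ np.diag(np.sqrt(mu)) @ V.T   # sqrt(L) = sum_j sqrt(mu_j) P_j

print(np.allclose(sqrtL @ sqrtL, L))           # True: (sqrt L)^2 = L
print(np.all(np.linalg.eigvalsh(sqrtL) > 0))   # True: sqrt L is positive definite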
Remark 38 The spectral theorem shows that a necessary and sufficient condition for a
transformation L ∈ Lin(V) to have a real spectrum σ(L) = {λ1 , . . . , λN } ⊂ R and a set of
eigenvectors forming an orthonormal basis for V, is for L to be Hermitian or self-adjoint.
A natural question to ask is under what conditions on L does this result hold when the
spectrum is permitted to be complex. The desired condition on L is called Normality.
5.2.6
Normal Transformations
Definition 32 A transformation L ∈ Lin(V) is called Normal provided it commutes with
its adjoint, i.e. L∗ L = LL∗ .
Remark 39 It follows immediately from the definitions that unitary and Hermitian transformations are also normal. Moreover, if L has a spectral decomposition
L = λ1 e1 ⊗ e1 + . . . + λN eN ⊗ eN (73)
with λj ∈ C, j = 1, . . . , N and {e1 , . . . , eN } an orthonormal basis for V, then L is normal.
The following generalization of the spectral theorem says that this condition is both necessary
and sufficient for L to be normal.
Theorem 12 A necessary and sufficient condition for L ∈ Lin(V) to be normal is that it
have a Spectral Decomposition of the form (73).
Proof: The proof follows exactly as in the case for self-adjoint transformations with the
three key lemmas being modified as follows.
Lemma 4 If L ∈ Lin(V) is normal, then Lej = λj ej if and only if L∗ej = λ̄j ej, and hence
σ(L∗) = {λ̄ | λ ∈ σ(L)} and EL∗(λ̄j) = EL(λj) for each λj ∈ σ(L).
Proof: Note first that if L is normal then so is L − λ I since (L − λ I)∗ = L∗ − λ̄ I. Next
observe that if L is normal, then |Lv| = |L∗v| for all v ∈ V. This last fact follows from
|Lv|² = ⟨Lv, Lv⟩ = ⟨L∗Lv, v⟩ = ⟨LL∗v, v⟩ = ⟨L∗v, L∗v⟩ = |L∗v|².
One now deduces that
|(L − λ I)v| = |(L∗ − λ̄ I)v|
from which the conclusion of the lemma readily follows. (Why?)
Lemma 5 If L is normal and λi, λj are distinct eigenvalues, then their eigenspaces are
orthogonal, i.e. EL(λj) ⊥ EL(λi).
Proof: Suppose ei ∈ EL(λi) and ej ∈ EL(λj). Then
λi⟨ei, ej⟩ = ⟨λi ei, ej⟩ = ⟨Lei, ej⟩ = ⟨ei, L∗ej⟩ = ⟨ei, λ̄j ej⟩ = λj⟨ei, ej⟩
from which one concludes that ⟨ei, ej⟩ = 0 if λi ≠ λj, thereby proving the lemma.
Lemma 6 If L is normal with eigenvalue λ and associated eigenspace EL(λ), then
L(EL(λ)⊥) ⊂ EL(λ)⊥.
Proof: Let v ∈ EL(λ)⊥ and e ∈ EL(λ). Then ⟨v, e⟩ = 0 and
⟨Lv, e⟩ = ⟨v, L∗e⟩ = ⟨v, λ̄e⟩ = λ⟨v, e⟩ = 0
from which one concludes that Lv ∈ EL(λ)⊥, thus proving the lemma.
Using these lemmas, the proof of the spectral theorem for self-adjoint transformations
now readily applies to the case of normal transformations. (How?)
5.2.7
Skew Transformation
The Skew Transformations, also called Anti-Self-Adjoint, are defined as follows.
Definition 33 A transformation W ∈ Lin(V) is called Skew provided it satisfies W∗ =
−W.
It follows immediately that skew transformations are normal, i.e. they commute with
their adjoints, and therefore can be diagonalized (over the complex numbers) by a unitary
transformation. For example, in R2, the matrix
W = [  0   1 ]
    [ −1   0 ]
is skew. Its spectrum is σ(W) = {i, −i}, where i = √−1 (Why?), and
Q∗WQ = diag(i, −i)
where
Q = (1/√2) [ 1    1 ]
           [ i   −i ]
is unitary (Why?).
In R3, every skew matrix has the form (Why?)
W = [  0   −γ    β ]
    [  γ    0   −α ]
    [ −β    α    0 ].
One then shows (How?) that W acts on a general vector v ∈ R3 through vector cross
product with a vector w, called the axial vector of W,
Wv = w × v
with
w = [ α ]
    [ β ]
    [ γ ].
(What is the spectrum of W?)
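The cross-product action and the spectrum can be explored numerically; the sketch below (not part of the notes) uses an arbitrary, made-up axial vector w and test vector v.

import numpy as np

alpha, beta, gamma = 1.0, 2.0, 3.0
w = np.array([alpha, beta, gamma])             # axial vector
W = np.array([[0.0,   -gamma,  beta],
              [gamma,  0.0,   -alpha],
              [-beta,  alpha,  0.0]])          # the corresponding skew matrix

v = np.array([0.5, -1.0, 2.0])                 # an arbitrary test vector
print(np.allclose(W @ v, np.cross(w, v)))      # True: W v = w x v
print(np.round(np.linalg.eigvals(W), 6))       # numerically: {0, i|w|, -i|w|}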
5.3
Decomposition Theorems
Decomposition theorems give insight into the structure and properties of general linear transformations by expressing them as multiplicative or additive combinations of special classes
of transformations whose structure and properties are easily understood. The spectral decomposition is one such result, expressing self-adjoint or normal transformations as linear combinations of
perpendicular projections. Perhaps the most basic of decompositions expresses a general
real transformation as the sum of a symmetric and a skew-symmetric transformation, and a
general complex transformation as the linear combination of two Hermitian transformations.
5.3.1
Sym-Skew and Hermitian-Hermitian Decompositions
Theorem 13 Let L ∈ Lin(V) be a transformation on the real vector space V. Then L can
be written uniquely as the sum
L = S + W (74)
with
S ∈ Sym(V) and W ∈ Skew(V) (75)
where Sym(V) denotes the set of symmetric transformations (i.e. S ∈ Sym(V) ⇐⇒ S^T = S)
and Skew(V) denotes the set of skew-symmetric transformations (i.e. W ∈ Skew(V) ⇐⇒
W^T = −W).
Proof: The proof is by construction. Define
S := (1/2)(L + L^T) and W := (1/2)(L − L^T).
Then clearly (74,75) hold. Conversely, if (74,75) hold, then
L + L^T = (S + W) + (S + W)^T = 2S
and
L − L^T = (S + W) − (S + W)^T = 2W
thereby proving uniqueness of the representation (74,75).
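A brief NumPy illustration of this decomposition (not from the notes; the matrix L is a random example):

import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((4, 4))    # a generic real transformation

S = 0.5 * (L + L.T)                # symmetric part
W = 0.5 * (L - L.T)                # skew-symmetric part

print(np.allclose(S, S.T))         # True
print(np.allclose(W, -W.T))        # True
print(np.allclose(S + W, L))       # True: L = S + W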
Theorem 14 Let L ∈ Lin(V) be a transformation on the complex vector space V. Then L
can be written uniquely as the sum
L = S + i W (76)
with
S, W ∈ Herm(V). (77)
Proof: The proof is identical to that above for the real transformation case except that
S := (1/2)(L + L∗) and W := −(i/2)(L − L∗).
Then
W∗ = (i/2)(L∗ − L) = −(i/2)(L − L∗) = W
and
S + i W = (1/2)(L + L∗) + (1/2)(L − L∗) = L.
Remark 40 Notice that (76,77) is the analog of the Cartesian representation of a complex
number z = x + i y with x, y ∈ R.
5.3.2
Polar Decomposition
The Polar Decomposition gives a multiplicative decomposition of a nonsingular linear transformation into the product of a positive definite Hermitian transformation and a unitary
transformation. It is useful to define the classes of transformations Orth+ := {Q ∈ Orth | det(Q) =
1}, PHerm := {S ∈ Herm | σ(S) ⊂ R+ } and PSym := {S ∈ Sym | σ(S) ⊂ R+ }.
Theorem 15 Let L ∈ Lin(V) for the vector space V with det(L) ≠ 0. Then there exist
U, V ∈ PHerm(V) and Q ∈ Unit(V) with
L = QU   Right Polar Decomposition (78)
  = VQ   Left Polar Decomposition. (79)
Proof: Note first that L∗L ∈ PHerm(V) (Why?). Define U to be the (unique) positive
definite Hermitian square-root of L∗L, i.e. U := √(L∗L), and then define
Q := LU⁻¹. (80)
It follows that Q∗ = U⁻¹L∗ (Why?) and hence that
Q∗Q = U⁻¹L∗LU⁻¹ = U⁻¹U²U⁻¹ = I.
Therefore, Q is unitary and (78) holds. One now defines
V := QUQ∗
and observes that V is Hermitian and positive definite, thereby proving (79). Note also that
V² = LL∗.
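The constructive proof translates directly into NumPy; the sketch below (not part of the notes) works over R, so adjoints are transposes, and the random nonsingular matrix L is only an example.

import numpy as np

rng = np.random.default_rng(1)
L = rng.standard_normal((3, 3))
assert abs(np.linalg.det(L)) > 1e-12          # L must be nonsingular

mu, V = np.linalg.eigh(L.T @ L)               # spectral decomposition of L^T L
U = V @ np.diag(np.sqrt(mu)) @ V.T            # U := sqrt(L^T L)
Q = L @ np.linalg.inv(U)                      # Q := L U^{-1}

print(np.allclose(Q.T @ Q, np.eye(3)))        # True: Q is orthogonal
print(np.allclose(Q @ U, L))                  # True: right polar decomposition
Vleft = Q @ U @ Q.T                           # V := Q U Q^*
print(np.allclose(Vleft @ Q, L))              # True: left polar decomposition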
Remark 41 An important application of the Polar Decomposition is to the definition of
strain in nonlinear elasticity. To that end, let F denote a deformation gradient.⁶ Then one
defines the Cauchy-Green strains by
C := F∗F   Right Cauchy-Green Strain
B := FF∗   Left Cauchy-Green Strain.
Defining the Right and Left Stretches by
U := √C   Right Stretch
V := √B   Left Stretch,
the left and right polar decompositions of F give
F = VQ = QU
which gives a multiplicative decomposition of the deformation gradient into a product of a
stretch tensor and a rigid rotation.
⁶The term deformation gradient will be defined rigorously in the Notes on Tensor Analysis.
5.4
Singular Value Decomposition
As noted previously, not all square (real or complex) matrices can be diagonalized, that
is, become diagonal relative to a suitable basis. Indeed, only those (N × N)-matrices possessing a basis for CN consisting of eigenvectors can be diagonalized. The Singular Value
Decomposition gives a representation reminiscent of the spectral decomposition for any (not
necessarily square) matrix.
Theorem 16 Let A ∈ MM,N(C) be an arbitrary M × N matrix. Then there exist unitary
matrices V ∈ MM(C) and W ∈ MN(C) and a diagonal matrix D ∈ MM,N(C) satisfying
A = VDW∗ (81)
with D = diag(σj), σ1 ≥ σ2 ≥ . . . ≥ σk > σk+1 = . . . = σL = 0, where k is the rank of A and
L := min{M, N}.⁷
⁷A general matrix D = [dij] ∈ MM,N is called diagonal if dij = 0 for i ≠ j.
Remark 42 The nonnegative numbers {σ1 , . . . , σL } are called the Singular Values for the
matrix A.
Proof: It suffices to assume M ≤ N since otherwise, one need merely prove the result for
A∗ and take the adjoint of the resulting decomposition. The first step in the proof is to take
the adjoint of the desired decomposition (81)
A∗ = WD∗V∗
and form the M × M Hermitian matrix
AA∗ = VDD∗V∗. (82)
The Hermitian matrix AA∗ has nonnegative eigenvalues σ1² ≥ σ2² ≥ . . . ≥ σk² > σk+1² =
. . . = σM² = 0 and (82) is its spectral decomposition. Define D := diag(σ1, . . . , σM) and V to be
the unitary matrix in the spectral decomposition (82). The unitary matrix W∗ must satisfy
V∗A = DW∗
or equivalently
A∗V = WD^T. (83)
The N × N matrix W will be constructed by specifying its column vectors
W = [w1 w2 . . . wN]. (84)
The left hand side of (83) has the form
A∗V = [(A∗v1) . . . (A∗vM)] (85)
where V has the column vectors vj, j = 1, . . . , M. On the other hand, the right hand side
of (83) has the column structure
WD^T = [(σ1 w1) . . . (σk wk) 0 . . . 0] (86)
in which the last M − k columns are the (N-dimensional) zero vector. Note that
A∗vj = 0,   for j = k + 1, . . . , M
as can be seen from
0 = (AA∗vj) · vj = (A∗vj) · (A∗vj) = |A∗vj|²   for j = k + 1, . . . , M
which follows from
AA∗vj = 0 for j = k + 1, . . . , M.
Equating columns in (85) and (86), one defines the first k columns of W by
wj := (1/σj) A∗vj,   for j = 1, . . . , k.
These k vectors w1 , . . . , wk are easily seen to form an orthonormal set. (Why?) One now
defines the remaining N − k column vectors of W in such a way that the set {w1 , . . . , wN }
is orthonormal thereby proving the theorem.
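Numerically, the decomposition of Theorem 16 is available directly; the following sketch (not from the notes, with a random 3 × 5 matrix as the example) also checks that the squared singular values are the eigenvalues of AA∗, as used in the proof.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))

# numpy.linalg.svd returns A = V @ D @ Wh with singular values in decreasing order.
V, s, Wh = np.linalg.svd(A, full_matrices=True)
D = np.zeros((3, 5))
D[:3, :3] = np.diag(s)

print(np.allclose(V @ D @ Wh, A))     # True: A = V D W*
print(np.allclose(s**2, np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]))
# True: the sigma_j^2 are the eigenvalues of A A*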
5.5
Solving Equations
An important application of linear algebra is to the problem of solving simultaneous systems
of linear algebraic equations. Specifically, let A ∈ MM,N (C) and consider the system of
linear equations
Ax = b (87)
where b ∈ CM is given and x ∈ CN is sought. Thus, (87) is a system of M linear equations
in N unknowns. In general, the system (87) can have no solutions, a unique solution or
infinitely many solutions. If A is regarded as defining a linear transformation from CN into
CM , then (87) has solutions if and only if b ∈ Ran(A) and such solutions are unique if and
only if Nul(A) = {0}. Moreover, Ran(A) is the span of the columns of A. However, testing
whether or not b ∈ Ran(A) can be computationally expensive. The Fredholm Alternative
Theorem gives another characterization of Ran(A) that is often much easier to check.
Theorem 17 Let A ∈ MM,N (C). Then
Ran(A) = (Nul(A∗))⊥. (88)
Proof: First, note that if b ∈ Ran(A) and a ∈ Nul(A∗), then b = Ax for some x ∈ CN
and
⟨b, a⟩ = ⟨Ax, a⟩ = ⟨x, A∗a⟩ = 0.
Thus,
Ran(A) ⊂ Nul(A∗)⊥. (89)
But one also has
dim(Nul(A∗)⊥) = M − dim(Nul(A∗)) (Why?)
             = dim(Ran(A∗)) (Why?)
             = dim(Ran(A)). (Why?) (90)
Combining (89) and (90), one concludes that
Ran(A) = Nul(A∗)⊥ (Why?)
as required.
Remark 43 The Fredholm Alternative Theorem says that (87) has solutions if and only
if b is orthogonal to the null-space of A∗ . Usually, the null-space of A∗ has small dimension,
so testing for b ∈ Nul(A∗ )⊥ is often much easier than testing to see if it is in the span of the
columns of A.
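A small NumPy example of the alternative in action (not from the notes; the 3 × 2 matrix A and right hand side b are chosen so that Nul(A^T) is one-dimensional):

import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

n = np.array([1.0, 1.0, -1.0])     # spans Nul(A^T): check that A^T n = 0
print(np.allclose(A.T @ n, 0.0))   # True

# Fredholm alternative: Ax = b is solvable iff b is orthogonal to Nul(A^T).
print(np.isclose(b @ n, 0.0))      # True here, so a solution exists
x = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(A @ x, b))       # True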
In many applications, the system (87) is badly overdetermined (i.e. far more independent
equations than unknowns) and one does not expect there to be a solution. In such cases,
one might want to find best approximate solutions. A typical such example arises in linear
regression, in which one wishes to find a linear function that “best fits” a large number of
data points, where best is usually meant in the least squares sense. These ideas are made
rigorous through the concept of Minimum Norm Least Squares Solution to (87).
5.5.1
Minimum Norm Least Squares Solution
A Least Squares Solution to (87) is any x ∈ RN that minimizes the Residual Error |Ax − b|.
Let Q = {x ∈ RN | |Ax − b| = min_{z∈RN} |Az − b|}. The set Q is always non-empty but may
contain many vectors x. Indeed, form the direct sum decomposition
RM = Ran(A) ⊕ Ran(A)⊥
and write
b = br + b0 (91)
where
br ∈ Ran(A) and b0 ∈ Ran(A)⊥ = Nul(A^T). (92)
Then, Ax − br ∈ Ran(A) and
|Ax − b|² = |Ax − br|² + |b0|²   (Why?). (93)
Clearly, x ∈ Q if and only if Ax = br and
|b0| = min_{z∈RN} |Az − b|. (94)
Since br ∈ Ran(A), Q is non-empty. (Why?) Another characterization of Q is that x ∈ Q if
and only if A^T(Ax − b) = 0. To see why, observe that if x ∈ Q, then Ax = br from which
it follows that
A^T(Ax − b) = A^T(Ax − br) − A^T b0
            = A^T(Ax − br) = 0.
Conversely, if A^T(Ax − b) = 0 then A^T(Ax − br) = 0. Also Nul(A^T) = Ran(A)⊥. Hence,
Ax − br ∈ Ran(A)⊥. But since it is obvious that (Ax − br) ∈ Ran(A), one concludes that
(Ax − br) = 0.
Next, note that Q is a convex subset of RN . To see this, let x1 , x2 ∈ Q and let α1 , α2 > 0
with α1 + α2 = 1. Then,
A^T A(α1 x1 + α2 x2) = α1 A^T Ax1 + α2 A^T Ax2
                     = α1 A^T br + α2 A^T br
                     = (α1 + α2) A^T br
                     = A^T br
which implies that α1 x1 + α2 x2 ∈ Q and hence that Q is convex.
It follows that Q has a unique vector of minimum norm, i.e. closest to the origin 0.
(Why?) This motivates the following
Definition 34 The Minimum Norm Least Squares Solution to the linear system (87) is the
unique element x ∈ Q such that |x| = min_{z∈Q} |z|.
While the minimum norm least squares solution is well-defined, there remains the question
of how to find it. The answer lies in the Moore-Penrose Pseudo-Inverse of A.
5.5.2
Moore-Penrose Pseudo-Inverse
The Moore-Penrose Pseudo-Inverse of A is defined through its singular value decomposition (81).
Definition 35 Given any matrix A ∈ MM,N (C) with Singular Value Decomposition A =
VDW∗ , its Moore-Penrose Pseudo-Inverse A† ∈ MN,M is defined by
A† := WD†V∗ (95)
where D† ∈ MN,M is the N × M diagonal matrix
D† := diag(δj) (96)
with
δj := 1/σj for j = 1, . . . , k, and δj = 0 for j = k + 1, . . . , L (97)
and L = min{M, N}, k := Rank(A).
Remark 44 Note that if M ≤ N and k = Rank(A) = M, then A† is a right inverse of
A, i.e. AA† = IM where IM denotes the M × M identity matrix. Similarly, if N ≤ M and
k = Rank(A) = N, then A† is a left inverse of A, i.e. A†A = IN, the N × N identity matrix.
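The definition can be checked against NumPy's built-in pseudo-inverse; the sketch below (not part of the notes) uses a made-up full row rank matrix A and vector b, and the tolerance 1e-12 used to decide the numerical rank is an arbitrary choice.

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])            # M = 2, N = 3, Rank(A) = 2 = M
b = np.array([1.0, 2.0])

# Build A-dagger from the SVD, inverting only the nonzero singular values.
V, s, Wh = np.linalg.svd(A, full_matrices=True)
k = int(np.sum(s > 1e-12))                 # numerical rank
Ddag = np.zeros((3, 2))
Ddag[:k, :k] = np.diag(1.0 / s[:k])
Adag = Wh.conj().T @ Ddag @ V.conj().T     # A-dagger = W D-dagger V*

print(np.allclose(Adag, np.linalg.pinv(A)))   # True: agrees with numpy.linalg.pinv
print(np.allclose(A @ Adag, np.eye(2)))       # True: a right inverse, as in Remark 44
x = Adag @ b                                  # minimum norm least squares solution
print(np.allclose(A @ x, b))                  # True: here the system is solved exactly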
The principal use of the Moore-Penrose pseudo-inverse is given in the following theorem.
Theorem 18 The Minimum Norm Least Squares solution of the system of equations
Ax = b is given by
x = A† b.
Proof: Consider the singular value decomposition A = VDW^T and suppose Rank(A) = k.
Moreover, define a := W^T x and let a have components [a] = [aj]. Also, assume V has the
column structure V = [v1 . . . vM]. Observe first that
|Ax − b|² = |V^T(Ax − b)|² (Why?)
          = |V^T A W W^T x − V^T b|²
          = |D W^T x − V^T b|²
          = (σ1 a1 − v1 · b)² + . . . + (σk ak − vk · b)² + (vk+1 · b)² + . . . + (vM · b)².
It follows that |Ax − b| is minimized by choosing
aj = (1/σj) vj · b,   for j = 1, . . . , k. (98)
Also,
|x|² = |W^T x|² = a1² + . . . + ak² + ak+1² + . . . + aN²
which is minimized by choosing
ak+1 = . . . = aN = 0.
One now sees that (How?)
x = WD†V^T b = A†b
thereby proving the theorem.
A natural question to ask is: How are the singular value decomposition and spectral
decomposition related? The answer is given in the following
Remark 45 Suppose A ∈ MN(C) is normal, i.e. AA∗ = A∗A. Then A has a spectral
decomposition A = V diag(λ1, . . . , λN)V∗ with V unitary and the spectrum σ(A) ⊂ C.
Write λj = |λj| e^{iθj} and define the diagonal matrix D := diag(|λ1|, . . . , |λN|) and the
unitary matrix W := V diag(e^{−iθ1}, . . . , e^{−iθN}) (Why is it unitary?). Then A has the
singular value decomposition A = VDW∗. (Verify.)
Remark 46 A bit more surprising is the following relationship between the polar decomposition and the singular value decomposition. Suppose A ∈ MN(R) with det(A) ≠ 0 has polar
decomposition A = UQ with U ∈ PSym and Q ∈ Orth+. Then U has a spectral decomposition U = VDV^T with V ∈ Orth, D = diag(σ1, . . . , σN) and spectrum {σ1, . . . , σN} ⊂ R+.
Then A has the singular value decomposition A = VDW^T with W := Q^T V. (Verify.)
Example 18 Let A be the 1 × N matrix A := [a1 . . . aN]. Then show that the singular
value decomposition of A is A = VDW∗ with V = [1] the 1 × 1 identity matrix,
D = [|a| 0 . . . 0] and W = [w1 . . . wN] with w1 := a/|a| and w2, . . . , wN chosen so that
{w1, . . . , wN} forms an orthonormal basis for CN, where a is the N-tuple
a := (a1, . . . , aN)^T.
Exercise 5 Find the minimum norm least squares solution to the N × 1 system of equations
a1 x = b1
  ⋮
aN x = bN.
(Hint: To find the singular value decomposition for the N × 1 matrix A := (a1, . . . , aN)^T,
first construct the SVD for its 1 × N transpose A^T = [a1 . . . aN].)
6
Rayleigh-Ritz Theory
Thus far, eigenvalues and eigenvectors have been studied from an algebraic and geometric
perspective. However, they also have a variational characterization that is revealed in the
Rayleigh-Ritz Theory. The first result considered is the foundational Rayleigh-Ritz Theorem.
Theorem 19 Let L ∈ Lin(V)⁸ be self-adjoint, i.e. L∗ = L. Let its (real) eigenvalues be
ordered λ1 ≤ λ2 ≤ . . . ≤ λN. Then
λ1 ≤ ⟨Lv, v⟩ ≤ λN   for all v ∈ V, |v| = 1 (99)
λmax = λN = sup_{|v|=1} ⟨Lv, v⟩ (100)
λmin = λ1 = inf_{|v|=1} ⟨Lv, v⟩. (101)
Proof: By the Spectral Theorem, there is an orthonormal basis {b1 , . . . , bN } for V consisting
of eigenvectors of L satisfying Lbj = λj bj for j = 1, . . . , N . It follows that for every unit
vector v ∈ V,
⟨Lv, v⟩ = λ1|⟨b1, v⟩|² + . . . + λN|⟨bN, v⟩|² (Why?) (102)
and
|⟨b1, v⟩|² + . . . + |⟨bN, v⟩|² = 1. (Why?) (103)
Equation (102) says that the quadratic form ⟨Lv, v⟩ for unit vectors v is a weighted average
of the eigenvalues of L. One then immediately deduces the inequalities (99). (How?)
Moreover, choosing v = bN and v = b1 gives (100) and (101), respectively. (How?)
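A quick NumPy experiment (not from the notes; the random symmetric matrix and the sample of unit vectors are only for illustration) is consistent with the bounds (99)-(101):

import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5))
L = 0.5 * (B + B.T)                      # a random real symmetric matrix

lam, W = np.linalg.eigh(L)               # eigenvalues in increasing order

vs = rng.standard_normal((1000, 5))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)   # random unit vectors
q = np.einsum('ij,jk,ik->i', vs, L, vs)           # Rayleigh quotients <Lv, v>

print(np.all((q >= lam[0] - 1e-12) & (q <= lam[-1] + 1e-12)))   # True: (99)
print(np.isclose(W[:, -1] @ L @ W[:, -1], lam[-1]))             # True: (100)
print(np.isclose(W[:, 0] @ L @ W[:, 0], lam[0]))                # True: (101)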
The Rayleigh-Ritz theorem gives a variational characterization of the smallest and largest
eigenvalues of a self-adjoint linear transformation. However, an easy extension of the theorem
characterizes all of the eigenvalues as solutions to constrained variational problems.
⁸Unless otherwise indicated, throughout this section (V, ⟨·, ·⟩) is assumed to be a complex inner product space.
Theorem 20 With the same hypotheses and notation as the previous theorem, the eigenvalues λ1 ≤ . . . ≤ λN of L satisfy
λk = min_{|v|=1, v⊥{b1,...,bk−1}} ⟨Lv, v⟩ (104)
and
λN−k = max_{|v|=1, v⊥{bN,...,bN−k+1}} ⟨Lv, v⟩. (105)
Proof: The proof of the previous theorem is easily generalized to prove (104,105). To that
end, suppose v is a unit vector orthogonal to the first k − 1 eigenvectors, {b1 , . . . , bk−1 }.
Then one easily shows (How?) that
1 = |⟨v, bk⟩|² + . . . + |⟨v, bN⟩|²
and
⟨Lv, v⟩ = λ1|⟨v, b1⟩|² + . . . + λN|⟨v, bN⟩|² = λk|⟨v, bk⟩|² + . . . + λN|⟨v, bN⟩|² ≥ λk. (Why?) (106)
But one also has bk ⊥ {b1, . . . , bk−1} and
⟨Lbk, bk⟩ = λk. (107)
The result (104) now follows readily from (106,107). The proof of (105) is similar and left
as an exercise.
The Rayleigh-Ritz theory is a powerful theoretical tool for studying eigenvalues of a self-adjoint linear transformation. However, the necessity of knowing explicitly the eigenvectors
in order to estimate eigenvalues other than the smallest and largest seriously limits its use as
a practical tool. This limitation can be avoided through the following generalization known
as the Courant-Fischer Theorem, which gives a min-max and max-min characterization of the
eigenvalues of a self-adjoint linear transformation.
Theorem 21 With the same hypotheses and notation of the previous two theorems, the
eigenvalues λ1 , . . . , λN of L satisfy
λk = min_{w1,...,wN−k}  max_{|v|=1, v⊥{w1,...,wN−k}} ⟨Lv, v⟩ (108)
and
λk = max_{w1,...,wk−1}  min_{|v|=1, v⊥{w1,...,wk−1}} ⟨Lv, v⟩. (109)
Proof: From the Spectral Theorem it follows that for any unit vector v ∈ V
v = b1⟨v, b1⟩ + . . . + bN⟨v, bN⟩
Lv = λ1 b1⟨v, b1⟩ + . . . + λN bN⟨v, bN⟩
1 = |⟨v, b1⟩|² + . . . + |⟨v, bN⟩|²
and hence that
⟨Lv, v⟩ = λ1|⟨v, b1⟩|² + . . . + λN|⟨v, bN⟩|².
That (108,109) are true for k = 1 or N follows immediately from the Rayleigh-Ritz Theorem.
As a special case, consider first k = N − 1. Then (108) takes the form
λN−1 = min_w  max_{|v|=1, v⊥w} ⟨Lv, v⟩. (110)
The idea now is to choose amongst all the unit vectors v orthogonal to a given vector w
only those lying in the plane orthogonal to {b1, . . . , bN−2}. Such unit vectors v must satisfy
1 = |⟨v, bN−1⟩|² + |⟨v, bN⟩|² (Why?)
v = bN−1⟨v, bN−1⟩ + bN⟨v, bN⟩ (Why?)
⟨Lv, v⟩ = λN−1|⟨v, bN−1⟩|² + λN|⟨v, bN⟩|². (Why?)
It follows that (Why?)
max_{|v|=1, v⊥w} ⟨Lv, v⟩ ≥ max_{1=|⟨v,bN−1⟩|²+|⟨v,bN⟩|², v⊥w} ( λN−1|⟨v, bN−1⟩|² + λN|⟨v, bN⟩|² ) ≥ λN−1. (111)
Since (111) holds for all vectors w, one concludes that
min_w  max_{|v|=1, v⊥w} ⟨Lv, v⟩ ≥ λN−1. (112)
The desired result (110) now follows from (112) and the observation that
max_{|v|=1, v⊥bN} ⟨Lv, v⟩ = λN−1
due to the Rayleigh-Ritz theorem.
The proof of (108) for all eigenvalues λk proceeds along the same lines as that just given
for λN −1 . More specifically, one chooses amongst all the unit vectors v orthogonal to a given
set of N − k vectors {w1 , . . . , wN −k } only those orthogonal to {b1 , . . . , bk−1 } (Why can this
be done?). Then such v must satisfy
1 = |⟨v, bk⟩|² + . . . + |⟨v, bN⟩|²
v = bk⟨v, bk⟩ + . . . + bN⟨v, bN⟩
⟨Lv, v⟩ = λk|⟨v, bk⟩|² + . . . + λN|⟨v, bN⟩|².
It follows that (Why?)
max_{|v|=1, v⊥{w1,...,wN−k}} ⟨Lv, v⟩ ≥ max_{1=|⟨v,bk⟩|²+...+|⟨v,bN⟩|², v⊥{w1,...,wN−k}} ( λk|⟨v, bk⟩|² + . . . + λN|⟨v, bN⟩|² ) ≥ λk. (113)
The desired result (108) now follows from (113) and the fact that
max_{1=|⟨v,bk⟩|²+...+|⟨v,bN⟩|², v⊥{bk+1,...,bN}} ( λk|⟨v, bk⟩|² + . . . + λN|⟨v, bN⟩|² ) = λk
due to the Rayleigh-Ritz Theorem. The proof of (109) proceeds along similar lines and is
left as an exercise.
Remark 47 The Rayleigh-Ritz Theorem (105) says that the k-th eigenvalue, λk, is the
maximum value of the Rayleigh Quotient
⟨Lv, v⟩ / ⟨v, v⟩ (114)
over the k-dimensional subspace {bk+1, . . . , bN}⊥ = span{b1, . . . , bk}. The Courant-Fischer
Theorem (108) says that the maximum of the Rayleigh quotient (114) over an arbitrary
k-dimensional subspace is always at least as large as λk.