Notes on Linear Algebra∗

Jay R. Walton
Department of Mathematics
Texas A&M University

October 3, 2007

∗ Copyright © 2006 by Jay R. Walton. All rights reserved.

1 Introduction

Linear algebra provides the foundational setting for the study of multivariable mathematics, which in turn is the bedrock upon which most modern theories of mathematical physics rest, including classical mechanics (rigid body mechanics), continuum mechanics (the mechanics of deformable material bodies), relativistic mechanics, quantum mechanics, etc. At the heart of linear algebra is the notion of a (linear) vector space, an abstract mathematical structure introduced to make rigorous the classical, intuitive concept of vectors as physical quantities possessing the two attributes of length and direction.

In these brief notes, vector spaces and linear transformations are introduced in a three step presentation. First they are studied as algebraic objects and a few important consequences of the concept of linearity are explored. Next the algebraic structure is augmented by a topological structure (via a metric or a norm, for example), providing a convenient framework for extending key concepts from the calculus to the vector space setting and permitting a rigorous study of nonlinear, multivariable functions between vector spaces. Finally, an inner product structure for vector spaces is introduced in order to define the geometric notion of angle (and, as a special case, the key concept of orthogonality), augmenting the notion of length or distance provided previously by a norm or a metric.

2 Vector Space

The central object in the study of linear algebra is a Vector Space, defined here as follows.

Definition 1 A Vector Space over a field F¹ is a set, V, whose elements are called vectors, endowed with two operations, called vector addition and scalar multiplication, subject to the following axioms.

1. If a, b ∈ V, then their sum a + b is also in V.
2. Vector addition is commutative and associative, i.e. a + b = b + a for all a, b ∈ V and a + (b + c) = (a + b) + c for all a, b, c ∈ V.
3. There exists a zero element, 0, in V satisfying 0 + a = a for all a ∈ V.
4. Every element a ∈ V possesses a (unique) additive inverse, −a, satisfying −a + a = 0.
5. If a ∈ V and α ∈ F, then αa ∈ V.
6. For all a ∈ V, 1a = a.
7. If a, b ∈ V and α, β ∈ F, then α(a + b) = αa + αb, (α + β)a = αa + βa, α(βa) = (αβ)a.²

¹ While in general F can be any field, in these notes it will denote either the real numbers R or the complex numbers C.
² The symbol “+” has been used to denote both the addition of vectors and the addition of scalars. This should not generate any confusion since the meaning of “+” should be clear from the context in which it is used.

Remark 1 It follows readily from these axioms that for any a ∈ V, 0a = 0 and (−1)a = −a. (Why?)

Remark 2 The above axioms are sufficient to conclude that for any collection of vectors a1, . . . , aN ∈ V and scalars α1, . . . , αN ∈ F, the Linear Combination α1 a1 + . . . + αN aN is an unambiguously defined vector in V. (Why?)

Example 1 The prototype examples of vector spaces over the real numbers R and the complex numbers C are the Euclidean spaces RN and CN of N-tuples of real or complex numbers. Specifically, the elements a ∈ RN are defined by
\[ a := \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix} \]
where a1, . . . , aN ∈ R. Vector addition and scalar multiplication are defined componentwise, that is
\[ \alpha a + \beta b = \alpha \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix} + \beta \begin{pmatrix} b_1 \\ \vdots \\ b_N \end{pmatrix} = \begin{pmatrix} \alpha a_1 + \beta b_1 \\ \vdots \\ \alpha a_N + \beta b_N \end{pmatrix}. \]
Analogous relations hold for the complex case.

Example 2 The vector space R∞ is defined as the set of all (infinite) sequences, a, of real numbers, that is, a ∈ R∞ means
\[ a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \end{pmatrix} \]
with vector addition and scalar multiplication defined componentwise.
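To make Example 1 concrete, here is a small numerical sketch (not part of the original notes) using Python with NumPy, in which N-tuples of reals model elements of RN and the componentwise operations of Example 1 are spot-checked against a few of the axioms of Definition 1; the vectors and scalars are arbitrary illustrative choices.

```python
import numpy as np

# Elements of R^3 represented as NumPy arrays (N-tuples of reals).
a = np.array([1.0, -2.0, 0.5])
b = np.array([3.0, 0.0, 4.0])
alpha, beta = 2.0, -1.5

# Componentwise vector addition and scalar multiplication (Example 1).
print(alpha * a + beta * b)   # the linear combination alpha*a + beta*b

# Spot-check a few axioms of Definition 1.
assert np.allclose(a + b, b + a)                              # commutativity
assert np.allclose(alpha * (a + b), alpha * a + alpha * b)    # distributivity over vectors
assert np.allclose((alpha + beta) * a, alpha * a + beta * a)  # distributivity over scalars
assert np.allclose(1.0 * a, a)                                # unit scalar axiom
```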
Definition 2 A subset U of a vector space V is called a Subspace provided it is closed with respect to vector addition and scalar multiplication (and hence is also a vector space).

Example 3 One shows easily that RN is a subspace of RM whenever N ≤ M (identifying RN with the set of vectors in RM whose last M − N components are zero). (Why?)

Example 4 The vector space R∞f is the set of all infinite sequences of real numbers with only finitely many non-zero components. It follows readily that R∞f is a subspace of R∞. (Why?)

Definition 3 Let A be a subset of a vector space V. Then the Span of A, denoted span(A), is defined to be the set of all (finite) linear combinations formed from the vectors in A. A subset A ⊂ V is called a Spanning Set for V if span(A) = V. A spanning set A ⊂ V is called a Minimal Spanning Set if no proper subset of A is a spanning set for V.

Remark 3 One readily shows that span(A) is a subspace of V. (How?)

2.1 Linear Independence and Dependence

The most important concepts associated with vector spaces are Linear Independence and Linear Dependence.

Definition 4 A set of vectors A ⊂ V is said to be Linearly Independent if given any vectors a1, . . . , aN ∈ A, the only way to write the zero vector as a linear combination of these vectors is with all coefficients zero, that is,
α1 a1 + . . . + αN aN = 0 if and only if α1 = . . . = αN = 0.

Remark 4 If a set of vectors A is linearly independent and B ⊂ A, then B is also linearly independent. (Why?)

Remark 5 A minimal spanning set is easily seen to be linearly independent. Indeed, suppose A is a minimal spanning set and suppose it is not linearly independent. Then there exists a subset {a1, . . . , aN} of A such that aN can be written as a linear combination of a1, . . . , aN−1. It follows that A ∼ {aN} (set difference) is a proper subset of A that spans V, contradicting the assumption that A is a minimal spanning set.

Definition 5 If a ≠ 0, then span({a}) is called a Line in V. If {a, b} is linearly independent, then span({a, b}) is called a Plane in V.

Definition 6 A set of vectors A ⊂ V is said to be Linearly Dependent if there exist vectors a1, . . . , aN ∈ A and scalars α1, . . . , αN ∈ F, not all zero, with
α1 a1 + . . . + αN aN = 0.

Remark 6 A set of vectors that contains the zero vector is linearly dependent. (Why?)

Remark 7 If the set of vectors A is linearly dependent and A ⊂ B, then B is also linearly dependent. (Why?)

Remark 8 If A = {a1, . . . , aN} ⊂ V is linearly dependent, then one of the aj can be written as a non-trivial (not all scalar coefficients zero) linear combination of the others, that is, one of the aj is in the span of the others.

2.2 Basis

The notion of Basis for a vector space is intimately connected with spanning sets of minimal cardinality. Here attention is restricted to vector spaces with spanning sets containing only finitely many vectors.

Definition 7 A vector space V is called Finite Dimensional provided it has a spanning set with finitely many vectors.
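Definition 4 can be tested numerically in RN: a finite set of vectors is linearly independent exactly when the matrix having those vectors as columns has column rank equal to the number of vectors. The following Python/NumPy sketch (added to these notes for illustration, with made-up data, and subject to floating-point rank tolerances) performs this check.

```python
import numpy as np

def linearly_independent(vectors):
    """Return True if the given vectors in R^N are linearly independent.

    The vectors are stacked as the columns of a matrix; by Definition 4 they
    are independent exactly when only the trivial combination gives zero,
    i.e. when the rank of that matrix equals the number of vectors.
    """
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, -1.0])
v3 = v1 + 2.0 * v2            # deliberately a linear combination of v1 and v2

print(linearly_independent([v1, v2]))      # True
print(linearly_independent([v1, v2, v3]))  # False
```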
Theorem 1 Suppose V is finite dimensional with minimal spanning set A = {a1, . . . , aN}. Let B be another minimal spanning set for V. Then #(B) = N, where #(B) denotes the cardinality of the set B.

Proof: It suffices to assume that #(B) ≥ N. (Why?) Then choose B′ := {b1, . . . , bN} ⊂ B. Clearly, B′ is linearly independent. (Why?) Since A is spanning, there exist scalars αij such that $b_i = \sum_{j=1}^{N} \alpha_{ij} a_j$. Then the matrix L := [αij] is invertible. (Why?) Suppose L⁻¹ = [βij]. Then $a_i = \sum_{j=1}^{N} \beta_{ij} b_j$, that is, the set {a1, . . . , aN} is in the span of B′. But since A spans V, it follows that B′ must also span V. But since B was assumed to be a minimal spanning set, it must be that B = B′. (Why?) Hence, #(B) = N. □³

³ The symbol □ is used in these notes to indicate the end of a proof.

This theorem allows one to define unambiguously the notion of dimension for a finite dimensional vector space.

Definition 8 Let V be a finite dimensional vector space. Then its Dimension, denoted dim(V), is defined to be the cardinality of any minimal spanning set.

Definition 9 A Basis for a finite dimensional vector space is defined to be any minimal spanning set.

It follows that given a basis B = {b1, . . . , bN} for a finite dimensional vector space V, every vector a ∈ V can be written uniquely as a linear combination of the base vectors, that is
\[ a = \sum_{j=1}^{N} \alpha_j b_j \]  (1)
with the coefficients α1, . . . , αN being uniquely defined. These coefficients are called the coordinates or components of a with respect to the basis B. It is often convenient to collect these components into an N-tuple denoted
\[ [a]_B := \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix}. \]

Notational Convention: It is useful to introduce simplifying notation for sums of the sort (1). Specifically, in the Einstein Summation Convention, expressions in which an index is repeated are to be interpreted as being summed over the range of the index. Thus (1) reduces to
\[ a = \sum_{j=1}^{N} \alpha_j b_j = \alpha_j b_j. \]
With the Einstein summation convention in effect, if one wants to write an expression like αj bj for a particular, but unspecified, value of j, then one must write αj bj (no sum).

Example 5 In the Euclidean vector space RN, it is customary to define the Natural Basis N = {e1, . . . , eN} to be
\[ e_1 := \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 := \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad e_N := \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}. \]
Finding components of vectors in RN with respect to the natural basis is then immediate:
\[ a = \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix} \in R^N \;\Longrightarrow\; [a]_N = \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix}, \]
that is, a = aj ej.

In general, finding the components of a vector with respect to a prescribed basis usually involves solving a linear system of equations. For example, suppose a basis B = {b1, . . . , bN} in RN is given as
\[ b_1 := \begin{pmatrix} b_{11} \\ \vdots \\ b_{N1} \end{pmatrix}, \quad \dots, \quad b_N := \begin{pmatrix} b_{1N} \\ \vdots \\ b_{NN} \end{pmatrix}. \]
Then to find the components
\[ [a]_B = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} \]
of a vector
\[ a := \begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix} \]
with respect to B, one solves the linear system of equations ai = bij αj, i = 1, . . . , N, which in expanded form is
\[ a_1 = b_{11}\alpha_1 + \dots + b_{1N}\alpha_N, \quad \dots, \quad a_N = b_{N1}\alpha_1 + \dots + b_{NN}\alpha_N. \]

2.3 Change of Basis

Let B = {b1, . . . , bN} and B̃ = {b̃1, . . . , b̃N} be two bases for the vector space V. The question considered here is how to change components of vectors with respect to the basis B into components with respect to the basis B̃. To answer the question, one needs to specify some relation between the two sets of base vectors. Suppose, for example, one can write the base vectors b̃j as linear combinations of the base vectors bj, that is, one knows the components of each N-tuple [b̃j]B, j = 1, . . . , N. To be more specific, suppose
\[ \tilde b_j = t_{ij}\, b_i, \quad \text{for } j = 1, \dots, N. \]  (2)
Form the matrix T := [tij], called the Transition Matrix between the bases B and B̃. Then T is invertible. (Why?) Let its inverse have components T⁻¹ = [sij].
By definition of matrix inverse, one sees that sik tkj = δij where the Kronecker symbol δij is defined as 1 when i = j, δij := 0 when i = 6 j. Suppose now that a vector a has components α1 [a]B = ... . αN Then one shows readily (How?) that the components of a with respect to the basis B̃ are given by α̃1 .. [a]B̃ = . α̃N with α̃i = sij αj . In matrix notation this change of basis formula becomes [a]B̃ = T −1 [a]B . 6 (3) 3 Linear Transformations Linear transformations are at the heart of the subject of linear algebra. Important in their own right, they also are the principal tool for understanding nonlinear functions between subsets of vector spaces. They will be studied here first from an algebraic perspective. Subsequently, they will be studied geometrically. V −→ U, between the vector spaces V and U Definition 10 A linear transformation, L : is a function satisfying L(αa + βb) = αLa + βLb, for every α, β ∈ F and a, b ∈ V.4 (4) Remark 9: The symbol “+” on the left hand side of (4) denotes vector addition in V whereas on right hand side of (4) it denotes vector addition in U. Remark 10: From the definition (4) it follows immediately that once a linear transformation has been specified on a basis for the domain space, its values are determined on all of the domain space through linearity. Two important subspaces associated with a linear transformation are its Range and Nullspace. Consider a linear transformation L : V −→ U. (5) Definition 11 The Range of a linear transformation (5) is the subset of U, denoted Ran(L), defined by Ran(L) := {u ∈ U| there exists v ∈ V such that u = Lv}. Definition 12 The Nullspace of a linear transformation (5) is the subset of V, denoted Nul(L), defined by Nul(L) := {v| Lv = 0}. Remark 9 It is an easy exercise to show that Ran(L) is a subspace of U and Nul(L) is a subspace of V. (How?) Example 6 Let A be an M × N matrix a11 a1N A := ... · · · ... = (a1 · · · aN ) aM 1 aM N where a1 , . . . , aN denote the column vectors of A, that is, A is viewed as consisting of N column vectors from RM . 4 In analysis it is customary to make a notational distinction between the name of a function and the values the function takes. Thus, for example, one might denote a function by f and values the function takes by f (x). For a linear transformation, L, it is customary to denote its values by La. This notation is suggestive of matrix multiplication. 7 Definition 13 The Column Rank of A is defined to be the dimension of the subspace of RM spanned by its column vectors. One can define a linear transformation A : RN −→ RM by Av := Av (matrix multiplication). (6) It is readily shown that the range of A is the span of the column vectors of A, Ran(A) = span{a1 , . . . , aN } and the nullspace of A is the set of N -tuples v1 .. v = . ∈ RN vN satisfying v1 a1 + · · · + vN aN = 0. It follows that dim(Ran(A)) equals the number of linearly independent column vectors of A. One can also view the matrix A as consisting of M -row vectors from RN and define its row rank by Definition 14 The Row Rank of A is defined to be the dimension of the subspace of RN spanned by its row vectors. A fundamental result about matrices is Theorem 2 Let A be an M × N matrix. Then Row Rank of A = Column Rank of A. (7) Proof: Postponed. A closely related result is Theorem 3 Let A be an M × N matrix and A the linear transformation (6). Then dim(Ran(A)) + dim(Nul(A)) = dim(RN ) = N. (8) The result (8) is a special case of the more general Theorem 4 Let L : V −→ U be a linear transformation. 
Then dim(Ran(L)) + dim(Nul(L)) = dim(V). (9) The proof of this last theorem is facilitated by introducing the notion of Direct Sum Decomposition of a vector space V. 8 Definition 15 Let V be a finite dimensional vector space. A Direct Sum Decomposition of V, denoted V = U ⊕ W, consists of a pair of subspaces, U, W of V with the property that every vector v ∈ V can be written uniquely as a sum v = u + w with u ∈ U and w ∈ W. Remark 10 It follows easily from the definition of direct sum decomposition that V =U ⊕W =⇒ dim(V) = dim(U) + dim(W). (Why?) Example 7 Two non-colinear lines through the origin form a direct sum decomposition of R2 (Why?), and a plane and non-coincident line through the origin form a direct sum decomposition of R3 (Why?). Proof of Theorem 4: Let {f1 , . . . , fn } be a basis for Nul(L) where n = dim(Nul(L)). Fill out this set to form a basis for V, {g1 , . . . , gN −n , f1 , . . . , fn } (How?). Define the subspace Ṽ := span({g1 , . . . , gN −n }). Then Ṽ and Nul(L) form a direct sum decomposition of V (Why?), that is V = Ṽ ⊕ Nul(L). Moreover, one can easily show that L restricted to Ṽ is a one-to-one transformation onto Ran(L) (Why?). It follows that {Lg1 , . . . , LgN −n } forms a basis for Ran(L) (Why?) and hence that dim(Ran(L)) = dim(Ṽ) = N − n (Why?) from which one concludes (9). 3.1 Matrices and Linear Transformations Let L : V −→ U be a linear transformation and let B := {b1 , . . . , bN } be a basis for V and C := {c1 , . . . , cM } a basis for U. Then the action of the linear transformation L can be carried out via matrix multiplication of coordinates in the following manner. Given v ∈ V, define u := Lv. Then the Matrix of L with respect to the bases B and C is defined to be the unique matrix L = [L]B,C satisfying [u]C = L[v]B (matrix multiplication) or in more expanded form [Lv]C = [L]B,C [v]B . (10) To see how to construct [L]B,C , first construct the N column vectors (M -tuples) lj := [Lbj ]C , that is the coordinates with respect to the basis C in U of the image vectors Lbj of the base vectors in B. Then, one easily shows that defining [L]B,C := [l1 . . . lN ] (matrix whose columns are l1 , . . . , lN ) (11) gives the unique M × N matrix satisfying (10). Indeed, if v ∈ V has components [v]B = [vj ], that is v = vj bj , then [Lv]C = [vj Lbj ]C = [l1 . . . lN ][v]B as required to prove (11). Remark 11 If U = V and B = C, then the matrix of L with respect to B is denoted [L]B and is constructed by using the B-coordinates of the vectors Lbj , [Lbj ]B , as its columns. Example 8 The identity transformation I : V −→ V (Iv = v for all v ∈ V) has [I]B = [δij ] where δij is the Kronecker symbol. (Why?) 9 3.2 Change of Basis Suppose L : V −→ V and B = {b1 , . . . , bN } and B̃ = {b̃1 , . . . , b̃N } are two bases for V. The question of interest here is: How are the matrices [L]B and [L]B̃ related? The question is easily answered by using the change of basis formula (3). From the definition of [L]B and [L]B̃ one has [Lv]B = [L]B [v]B and [Lv]B̃ = [L]B̃ [v]B̃ . It follows from (3) that [Lv]B̃ = T −1 [Lv]B = T −1 [L]B [v]B = T −1 [L]B T [v]B̃ from which one concludes [L]B̃ = T −1 [L]B T. Remark 12 (12) Square matrices A and B satisfying A = T −1 BT for some (non-singular) matrix T are said to be Similar. Thus, matrices related to a given linear transformation through a change of basis are similar and conversely similar matrices correspond the same linear transformation under a change of basis. 
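The change of basis formulas (3) and (12) are easy to verify numerically. The Python/NumPy sketch below is an illustration added to these notes: the two bases of R2 and the transformation are made-up examples, each basis vector is represented by its coordinates in the natural basis, and the transition matrix T of (2) has j-th column [b̃j]B.

```python
import numpy as np

# Two bases of R^2, stored columnwise in natural-basis coordinates (made-up examples).
P_B    = np.array([[1.0, 1.0],
                   [0.0, 1.0]])          # basis B = {b1, b2}
P_Btil = np.array([[2.0, 0.0],
                   [1.0, 1.0]])          # basis B~ = {b1~, b2~}

# Transition matrix T of (2): its j-th column holds the B-coordinates of b~_j.
T = np.linalg.solve(P_B, P_Btil)

# A vector and a linear transformation, expressed in the natural basis.
a   = np.array([3.0, -1.0])
L_N = np.array([[0.0, 2.0],
                [1.0, 1.0]])

a_B, a_Btil = np.linalg.solve(P_B, a), np.linalg.solve(P_Btil, a)
L_B    = np.linalg.solve(P_B, L_N @ P_B)        # [L]_B
L_Btil = np.linalg.solve(P_Btil, L_N @ P_Btil)  # [L]_B~

# Change of basis formulas (3) and (12): similar matrices represent the same L.
assert np.allclose(a_Btil, np.linalg.solve(T, a_B))      # [a]_B~ = T^{-1} [a]_B
assert np.allclose(L_Btil, np.linalg.solve(T, L_B @ T))  # [L]_B~ = T^{-1} [L]_B T
```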
3.3 Trace and Determinant of a Linear Transformation

Consider a linear transformation on a given finite dimensional vector space V, i.e. L : V −→ V. Its Trace is defined as follows.

Definition 16 The trace of L, denoted tr(L), is defined by tr(L) := tr([L]B) for any matrix representation [L]B of L.

To see that this notion of trace of a linear transformation is well (unambiguously) defined, let [L]B and [L]B̃ denote two different matrix representations of L. Then they must be similar matrices, i.e. there exists a non-singular matrix T such that [L]B̃ = T⁻¹[L]B T. But then from the elementary property of the trace of a square matrix, tr(AB) = tr(BA) for any two square matrices A, B (Why true?), one has
tr([L]B̃) = tr(T⁻¹[L]B T) = tr(T T⁻¹[L]B) = tr([L]B).

In similar fashion, one can define the Determinant of a linear transformation on V.

Definition 17 The determinant of a linear transformation L on a finite dimensional vector space V is defined by det(L) := det([L]B) where [L]B is any matrix representation of L.

That this notion of the determinant of a linear transformation is well defined can be seen from an argument similar to that above for the trace. The relevant property of the determinant of square matrices needed in the argument is det(AB) = det(A) det(B) (Why true?).

4 Metric, Norm and Inner Product

In this section, the algebraic structure for vector spaces presented above is augmented through the inclusion of geometric structure. Perhaps the most basic geometric notion is that of distance or length. A convenient way to introduce a notion of distance into a vector space is through use of a Metric.

Definition 18 A Metric on a vector space V is a real valued function of two variables d(·, ·) : V × V −→ R satisfying the following axioms. For any a, b, c ∈ V,
d(a, b) ≥ 0, with d(a, b) = 0 ⇔ a = b   (Positivity)
d(a, b) = d(b, a)   (Symmetry)
d(a, b) ≤ d(a, c) + d(c, b).   (Triangle Inequality)
(13)

Remark 13 Given a metric on a vector space, one can define the important concept of a metric ball. Specifically, an (open) metric ball, B(c, r), centered at c and of radius r is defined to be the set B(c, r) := {v ∈ V | d(c, v) < r}. One can also define a notion of metric convergent sequence as follows. A sequence {aj, j = 1, 2, . . .} ⊂ V is said to be metric convergent to a ∈ V provided
\[ \lim_{j \to \infty} d(a_j, a) = 0. \]
An equivalent definition in terms of metric balls states that for all ε > 0, there exists N > 0 such that aj ∈ B(a, ε) for all j > N.

An important class of metrics on a vector space comes from the notion of a Norm.

Definition 19 A Norm on a vector space is a non-negative, real valued function | · | : V −→ R satisfying the following axioms. For all a, b ∈ V and α ∈ F,
|a| ≥ 0, with |a| = 0 ⇔ a = 0   (Positivity)
|αa| = |α| |a|   (Homogeneity)
|a + b| ≤ |a| + |b|.   (Triangle Inequality)
(14)

Remark 14 A norm has an associated metric defined by d(a, b) := |a − b|.

An important class of norms on a vector space comes from the notion of Inner Product. Here a distinction must be made between real and complex vector spaces.

Definition 20 An inner product on a real vector space is a positive definite, symmetric bilinear form ⟨·, ·⟩ : V × V −→ R satisfying the following axioms. For all a, b, c ∈ V and α ∈ R,
⟨a, a⟩ ≥ 0, with ⟨a, a⟩ = 0 ⇔ a = 0   (Positivity)
⟨a, b⟩ = ⟨b, a⟩   (Symmetry)
⟨αa, b⟩ = α⟨a, b⟩   (Homogeneity)
⟨a + c, b⟩ = ⟨a, b⟩ + ⟨c, b⟩.   (Additivity)
(15)
Definition 21 An inner product on a complex vector space is a positive definite, conjugate-symmetric form ⟨·, ·⟩ : V × V −→ C satisfying the following axioms. For all a, b, c ∈ V and α ∈ C,
⟨a, a⟩ ≥ 0, with ⟨a, a⟩ = 0 ⇔ a = 0   (Positivity)
⟨a, b⟩ = $\overline{\langle b, a\rangle}$   (Conjugate-Symmetry)
⟨αa, b⟩ = α⟨a, b⟩   (Homogeneity)
⟨a + c, b⟩ = ⟨a, b⟩ + ⟨c, b⟩   (Additivity)
(16)
where in (16), $\overline{\langle b, a\rangle}$ denotes the complex conjugate of ⟨b, a⟩. It is evident from the definition that a complex inner product is linear in the first variable but conjugate linear in the second. In particular, for all a, b ∈ V and α ∈ C,
⟨αa, b⟩ = α⟨a, b⟩ whereas ⟨a, αb⟩ = $\overline{\alpha}$⟨a, b⟩.

Remark 15 Given an inner product on either a real or complex vector space, one can define an associated norm by
\[ |a| := \sqrt{\langle a, a\rangle}. \]
(Why does this define a norm?)

Remark 16 In RN, the usual inner product, defined by
⟨a, b⟩ := a1 b1 + . . . + aN bN = aj bj, where a = (a1, . . . , aN)ᵀ and b = (b1, . . . , bN)ᵀ,
is often denoted by a · b, the so-called “dot product”. In CN, the usual inner product is given by
⟨a, b⟩ = a · b = $a_1\overline{b_1} + \dots + a_N\overline{b_N}$
where now $a_j \overline{b_j} \in C$ and the over-bar denotes complex conjugation.

Example 9 Let A be a positive definite, symmetric, real, N × N matrix.⁵ Then A defines an inner product on RN through ⟨a, b⟩A := a · (Ab).

⁵ Recall that symmetric means that Aᵀ = A, where Aᵀ denotes the transpose matrix, and positive definite means a · (Aa) > 0 for all non-zero a ∈ RN.

Example 10 An important class of norms on RN and CN are the p-norms, defined for p ≥ 1 by
\[ \|a\|_p := \Big( \sum_{j=1}^{N} |a_j|^p \Big)^{1/p}. \]  (17)
One must show that (17) satisfies the axioms (14) for a norm. The positivity and homogeneity axioms from (14) are straightforward to show, but the triangle inequality is not. A proof that (17) satisfies the triangle inequality in (14) can be found in the supplemental Notes on Inequalities. This triangle inequality for the p-norms is the famous Minkowski Inequality,
‖a + b‖p ≤ ‖a‖p + ‖b‖p for any a, b ∈ RN;
the special case p = 2 is the familiar triangle inequality for the Euclidean norm.

Example 11 The ∞-norm is defined by
‖a‖∞ := max{|a1|, . . . , |aN|}.   (18)
The task of showing that (18) satisfies the axioms (14) for a norm is left as an exercise. An important observation about the p-norms is that
\[ \lim_{p \to \infty} \|a\|_p = \|a\|_\infty, \]
whose proof is also left as an exercise. That the ∞-norm is the limit of the p-norms as p → ∞ is what motivates the definition (18).

Remark 17 The 2-norm is very special among the class of p-norms, being the only one coming from an inner product. The 2-norm is also called the Euclidean Norm since it comes from the usual inner product on the Euclidean space RN.

Remark 18 In a real vector space (V, ⟨·, ·⟩), one can define a generalized notion of angle between vectors. Specifically, given non-zero a, b ∈ V, one defines the angle, θ, between a and b to be the unique number 0 ≤ θ ≤ π satisfying
\[ \cos\theta = \frac{\langle a, b\rangle}{|a|\,|b|} \]
(the quotient lies in [−1, 1] by the Cauchy–Schwarz inequality, so θ is well defined). As a special case, two vectors a, b ∈ V are said to be Orthogonal (in symbols a ⊥ b) if
⟨a, b⟩ = 0.   (19)

Remark 19 Analytically or topologically, all norms on a finite dimensional vector space are equivalent in the sense that if a sequence is convergent with respect to one norm, it is convergent with respect to all norms on the space. This observation follows directly from the following important theorem.

Theorem 5 Let V be a finite dimensional vector space and let ‖ · ‖I and ‖ · ‖II be two norms on V. Then there exist positive constants m, M such that for all v ∈ V,
m‖v‖I ≤ ‖v‖II ≤ M‖v‖I.

Proof: The proof of this result is left as an exercise.
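The p-norms of Examples 10 and 11 and the equivalence asserted in Theorem 5 are easy to explore numerically. The short Python/NumPy sketch below (added for illustration, with arbitrary vectors) computes ‖a‖p for several p, shows ‖a‖p approaching ‖a‖∞ as p grows, checks a Minkowski inequality instance, and checks the standard equivalence constants ‖a‖∞ ≤ ‖a‖2 ≤ √N ‖a‖∞ for vectors in RN.

```python
import numpy as np

a = np.array([3.0, -4.0, 1.0, 0.5])
N = a.size

def p_norm(a, p):
    """The p-norm (17); p = np.inf gives the max-norm (18)."""
    if np.isinf(p):
        return np.max(np.abs(a))
    return np.sum(np.abs(a) ** p) ** (1.0 / p)

for p in [1, 2, 4, 10, 100, np.inf]:
    print(p, p_norm(a, p))        # the p-norms decrease toward the max-norm

# Triangle (Minkowski) inequality for a sample pair and p = 3.
b = np.array([1.0, 2.0, -2.0, 0.0])
assert p_norm(a + b, 3) <= p_norm(a, 3) + p_norm(b, 3)

# An instance of the equivalence in Theorem 5 (here with m = 1, M = sqrt(N)).
assert p_norm(a, np.inf) <= p_norm(a, 2) <= np.sqrt(N) * p_norm(a, np.inf)
```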
While all norms on a finite dimensional vector space might be topologically equivalent, they are not geometrically equivalent. In particular, the “shape” of the (metric) unit ball can vary greatly for different norms. That fact is illustrated in the following Exercise 1 4.1 Sketch the unit ball in R2 for the p-norms, 1 ≤ p ≤ ∞. Orthonormal Basis In a finite dimensional inner product space (V, h·, ·i), an important class of bases consists of the Orthonormal Bases defined as follows. Definition 22 In an inner product space (V, h·, ·i), a basis B = {b1 , . . . , bN } is called Orthonormal provided hbi , bj i = δij for i, j = 1, . . . , N. (20) Thus, an orthonormal basis consists of pairwise orthogonal unit vectors. Finding coordinates of arbitrary vectors with respect to such a basis is as convenient as it is for the natural basis in FN . Indeed, what makes the natural basis so useful is that it is an orthonormal basis with respect to the usual inner product on RN . Remark 20 Suppose B = {b1 , . . . , bN } is an orthonormal basis for (V, h·, ·i) and a ∈ V. Then one easily deduces that a = a1 b1 + . . . + aN bN =⇒ aj = ha, bj i as can be seen by taking the inner product of the first equation with each base vector bj . Moreover, one has ha, ci = [a]B · [c]B (21) where the right hand side is the usual dot-product of N -tuples in RN [a]B · [c]B = a1 c1 + . . . + aN cN . Thus, computing the inner product of vectors using components with respect to an orthonormal basis uses the familiar formula for the dot-product on RN with respect to the natural basis. In similar fashion one can derive a simple formula for the components of the matrix representation of a linear transformation on an inner product space with respect to an orthonormal basis. Specifically, suppose L : V −→ V and B = {b1 , . . . , bN } is an orthonormal basis. Then it is straightforward to show that L := [L]B = [lij ] with lij = hbi , Lbj i. 14 (22) One way to derive (22) is to first recall the formula lij = ei · (Lej ) (23) for the components of an N × N matrix, L := [lij ], where {e1 , . . . , eN } is the natural basis for RN . It follows from (21) that for all a, c ∈ V ha, Lci = [a]B · ([Lc]B ) = [a]B · ([L]B [c]B ). (24) Now letting a = bi , c = bj and noting that [bj ]B = ej for all j = 1, . . . , N , one deduces (22) from (23,24). 4.1.1 General Bases and the Metric Tensor More generally, one would like to be able to conveniently compute the inner product of two vectors using coordinates with respect to some general (not necessarily orthonormal) basis. Suppose B := {f1 , . . . , fN } is a basis for (V, h·, ·i) and let a, b ∈ V with coordinates [a]B = [aj ], [b]B = [bj ], respectively. Then one has N N N X X X aj bk hfj , fk i = [a]B · (F [b]B ) bk fk i = ha, bi = h aj f j , j=1 k=1 j,k=1 where the matrix F := [hfi , fj i] = [fij ] (25) is called the Metric Tensor associated with the basis B. It follows that the familiar formula (21) for the dot-product in RN holds in general if and only if the basis B is orthonormal. The matrix F is called a metric tensor because it is needed to compute lengths and angles using coordinates with respect to the basis B. In particular, if v ∈ V has coordinates [v]B = [vj ], then |v| (the length of v) is given by |v|2 = hv, vi = fij vi vj . 4.1.2 The Gram-Schmidt Orthogonalization Procedure Given any basis B = {f1 , . . . 
, fN } for an inner product space (V, h·, ·i), there is a simple algorithm, called the Gram-Schmidt Orthogonalization Procedure, for constructing from B an orthonormal basis B̃ = {e1 , . . . , eN } satisfying span{f1 , . . . , fK } = span{e1 , . . . , eK } for each 1 ≤ K ≤ N. (26) The algorithm proceeds inductively as follows. Step 1: Define e1 := f1 /kf1 k, where k·k denotes the norm associated with the inner product h·, ·i. Step 2: Subtract from f2 its component in the direction of e1 and the normalize. More specifically, define g2 := f2 − hf2 , e1 ie1 15 and then define e2 := g2 . kg2 k It is easily seen that {e1 , e2 } is an orthonormal pair satisfying (26) for K = 1, 2. Inductive Step: Suppose orthonormal vectors {e1 , . . . , eJ } have been constructed satisfying (26) for 1 ≤ K ≤ J with J < N . Define gJ+1 := fJ+1 − hfJ+1 , e1 ie1 − . . . − hfJ+1 , eJ ieJ and then define eJ+1 := gJ+1 . kgJ+1 k One easily sees that {e1 , . . . , eJ+1 } is an orthonormal set satisfying (26) for 1 ≤ K ≤ J + 1 thereby completing the inductive step. 4.2 Reciprocal Basis Let (V, h·, ·i) be a finite dimensional inner product space with general basis B = {f1 , . . . , fN }. Then B has an associated basis, called its Reciprocal or Dual Basis defined through Definition 23 Given a basis B = {f1 , . . . , fN } on the finite dimensional inner product space (V, h·, ·i), there is a unique Reciprocal Basis B ∗ := {f 1 , . . . , f N } satisfying hfi , f j i = δij for i, j = 1, . . . , N. (27) Thus, every vector v ∈ V has the two representations v = N X v i fi (Contravariant Expansion) (28) i=1 = N X vi f i . (Covariant Expansion) (29) i=1 The N -tuples [v]B = [v i ] and [v]B ∗ = [vi ] are called the Contravariant Coordinates and Covariant Coordinates of v, respectively. It follows immediately from (27,28,29) that the contravariant and covariant coordinates of v can be conveniently computed via the formulae v i = hv, f i i and vi = hv, fi i. (30) Exercise 2 Let B = {f1 , f2 } be any basis for the Cartesian plane R2 . Show how the reciprocal basis B ∗ = {f 1 , f 2 } can be constructed graphically from B. Exercise 3 Let B = {f1 , f2 , f3 } be any basis for the Cartesian space R3 . Show that the reciprocal basis B ∗ = {f 1 , f 2 , f 3 } is given by f1 = f2 × f3 , f1 · (f2 × f3 ) f2 = f1 × f3 , f2 · (f1 × f3 ) where a × b denotes the vector cross product of a and b. 16 f3 = f1 × f2 f3 · (f1 × f2 ) 5 Linear Transformations Revisited The theory of linear transformations between finite dimensional vector spaces is revisited in this section. After some general considerations, several important special classes of linear transformations will be studied from both an algebraic and a geometric perspective. The first notion introduced is the vector space Lin(V, U) of linear transformations between two vector spaces. 5.1 The Vector Space Lin(V, U) Definition 24 Given two vector spaces, V, U, Lin(V, U) denotes the collection of all linear transformations between V and U. It is a vector space under the operations of addition and scalar multiplication of linear transformations defined as follows. Given two linear transformations L1 , L2 : V −→ U their sum, (L1 + L2 ), is defined to be the linear transformation satisfying (L1 + L2 )v := L1 v + L2 v for all v ∈ V. Moreover, scalar multiplication of a linear transformation is defined by (αL)v := αLv for all α∈F and v ∈ V. For the special case V = U, one denotes the vector space of linear transformations on V by Lin(V). 
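Returning to the reciprocal basis of Section 4.2, the formulas of Exercise 3 can be verified numerically. In the Python/NumPy sketch below (added for illustration, with a made-up basis of R3 represented in natural coordinates and the usual dot product), the reciprocal basis is computed both from the cross product formulas and from the inverse of the metric tensor F = [⟨fi, fj⟩] of (25) via f^j = f^{jk} fk, and the defining relation (27) is checked; the equality of the two constructions reflects the uniqueness of the reciprocal basis.

```python
import numpy as np

# A made-up (non-orthonormal) basis of R^3, stored as the columns of P.
f1 = np.array([1.0, 0.0, 0.0])
f2 = np.array([1.0, 1.0, 0.0])
f3 = np.array([1.0, 1.0, 1.0])
P = np.column_stack([f1, f2, f3])

# Reciprocal basis via the cross product formulas of Exercise 3.
g1 = np.cross(f2, f3) / np.dot(f1, np.cross(f2, f3))
g2 = np.cross(f1, f3) / np.dot(f2, np.cross(f1, f3))
g3 = np.cross(f1, f2) / np.dot(f3, np.cross(f1, f2))

# Reciprocal basis via the metric tensor F = [<f_i, f_j>] of (25).
F = P.T @ P
P_star = P @ np.linalg.inv(F)            # columns are f^1, f^2, f^3

assert np.allclose(np.column_stack([g1, g2, g3]), P_star)
assert np.allclose(P.T @ P_star, np.eye(3))   # <f_i, f^j> = delta_ij, relation (27)
```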
5.1.1 Adjoint Transformation Let L ∈ Lin(V, U) be a linear transformation between two inner product spaces (V, h·, ·i) and (U, h·, ·i). The Adjoint Transformation, L∗ , is defined to be the unique transformation in Lin(U, V) satisfying hLv, ui = hv, L∗ ui for all v ∈ V and u ∈ U. (31) Remark 21 The right hand side of (31) is the inner product on V whereas the left hand side is the inner product on U. Remark 22 This definition of adjoint makes sense due to the Theorem 6 (Riesz Representation Theorem) Suppose l : V −→ F is a linear, scalar valued transformation on the inner product space (V, h·, ·i). Then there exists a unique vector a ∈ V such that lv = hv, ai for all v ∈ V. Note that for fixed u ∈ U, lv := hLv, ui is a scalar valued linear transformation on V. By the Riesz Representation Theorem, there exists a unique vector a ∈ V such that lv = hv, ai for all v ∈ V. One now defines L∗ u := a ∈ V. 17 Example 12 Suppose V = RN , U = RM and L ∈ MM,N (R), the space of all M × N (real) matrices. Then L∗ = LT , the transpose of L. (Why?) On the other hand, if V = CN , T U = CM and L ∈ MM,N (C), the space of all M × N (complex) matrices. Then L∗ = L , the conjugate transpose of L. (Why?) Remark 23 Consider the special case V = U (over F = R) and let B := {f1 , . . . , fN } be any basis for V. Then one can show that for any A ∈ Lin(V) [A∗ ]B = F −1 [A]TB F (32) where the matrix F is given by F := [fij ] with fij := hfi , fj i. The proof of (32) is left as an exercise. 5.1.2 Two Natural Norms for Lin(V, U) One can define a natural inner product on Lin(V) by L1 · L2 := tr(L∗1 L2 ) (33) where L∗1 denotes the adjoint of L1 . (Why does this satisfy the axioms (15) or (16)?) (1) Remark 24 In the case V = RN with L1 and L2 given by N × N - matrices L1 = [lij ] , (2) (2) L2 = [lij ], respectively, then one readily shows (How?) that L1 · L2 = N X (1) (2) lij lij . i,j=1 One can readily show that (33) also defines an inner product on Lin(V, U) where V and U are any two finite dimensional vector spaces. (How?) The norm associated to this inner product, called the trace norm, is then defined by p √ |L| := L · L = tr(L∗ L). (34) Another natural norm on Lin(V, U), where (V, | · |) and (U, | · |) are normed vector spaces, called the operator norm, is defined by kLk := sup |Lv|. (35) |v|=1 Question: What is the geometric interpretation of (35)? An important relationship between the trace norm and operator norm is given by 18 Theorem 7 Let (V, h·, ·i) and (U, h·, ·i) be two finite dimensional, inner product spaces. Then for all L ∈ Lin(V, U), √ kLk ≤ |L| ≤ N kLk. (36) Proof: The proof uses ideas to be introduced in the up coming study of spectral theory. To see how spectral theory plays a role in proving (36), first notice that p p kLk = sup |Lv| = sup hLv, Lvi = sup hv, L∗ Lvi. |v|=1 |v|=1 |v|=1 Next observe that L∗ L is a self-adjoint, positive linear transformation in Lin(V). As part of spectral theory for self-adjoint operators on a finite dimensional inner product space, it is shown that sup hv, L∗ Lvi = max |λj |2 j=1,...,N |v|=1 where {|λj |2 , j = 1, . . . , N } are the (positive) eigenvalues of L∗ L. Thus, one has kLk = max |λj |. j=1,...,N (37) On the other hand, spectral theory will show that for the trace norm, one has ∗ 2 |L| = tr(L L) = N X |λj |2 . (38) j=1 Appealing to the obvious inequalities 2 max |λj | ≤ j=1,...,N N X |λj |2 ≤ N max |λj |2 , j=1,...,N j=1 one readily deduces (36) from (37,38). 
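The two-sided inequality (36) relating the operator norm and the trace norm can be checked numerically. The Python/NumPy sketch below is an added illustration using a random matrix; for matrices, np.linalg.norm with ord=2 returns the operator norm (largest singular value) and ord='fro' returns the trace (Frobenius) norm.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
L = rng.standard_normal((N, N))          # a random transformation of R^5

op_norm    = np.linalg.norm(L, 2)        # operator norm (35): largest singular value
trace_norm = np.linalg.norm(L, 'fro')    # trace norm (34): sqrt(tr(L^T L))

# The two-sided bound (36), up to floating-point tolerance.
assert op_norm <= trace_norm + 1e-12
assert trace_norm <= np.sqrt(N) * op_norm + 1e-12

# The same quantities computed directly from the definitions.
singular_values = np.linalg.svd(L, compute_uv=False)
assert np.isclose(op_norm, singular_values.max())
assert np.isclose(trace_norm, np.sqrt(np.trace(L.T @ L)))
```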
5.1.3 Elementary Tensor Product

Among the most useful linear transformations is the class of Elementary Tensor Products, which can be used as the basic building blocks of all other linear transformations. First consider the case of Lin(V) where (V, ⟨·, ·⟩) is an inner product space.

Definition 25 Let a, b ∈ V. Then the elementary tensor product of a and b, denoted a ⊗ b, is defined to be the linear transformation on V satisfying
(a ⊗ b) v := ⟨v, b⟩a for all v ∈ V.   (39)

Remark 25 It follows immediately from the definition that, when a and b are non-zero, Ran(a ⊗ b) = span(a) and Nul(a ⊗ b) = b⊥, where b⊥ denotes the orthogonal complement of b, i.e. the (N − 1)-dimensional subspace of all vectors orthogonal to b (where dim(V) = N). The elementary tensor products (with a, b ≠ 0) are thus the rank one linear transformations in Lin(V).

Remark 26 The following identities can also be derived easily from the definition of elementary tensor product. Given any a, b, c, d ∈ V and A ∈ Lin(V),
(a ⊗ b)∗ = b ⊗ a
(a ⊗ b) · (c ⊗ d) = ⟨a, c⟩⟨b, d⟩
A · (a ⊗ b) = ⟨a, Ab⟩
(a ⊗ b)(c ⊗ d) = ⟨b, c⟩ a ⊗ d
A (a ⊗ b) = (Aa) ⊗ b
(a ⊗ b) A = a ⊗ (A∗b)
tr(a ⊗ b) = ⟨a, b⟩.

Example 13 When V = RN and a = (ai), b = (bi), then the matrix of a ⊗ b with respect to the natural basis on RN is
[a ⊗ b]N = [ai bj].   (40)
(Why?)

Remark 27 The result (40) has a natural generalization to a general finite dimensional inner product space (V, ⟨·, ·⟩). Let B := {e1, . . . , eN} be an orthonormal basis for V and a, b ∈ V. Then
[a ⊗ b]B = [ai bj]   (41)
where [a]B = [ai] and [b]B = [bi]. Hence, the formula (40) generalizes to the matrix representation of an elementary tensor product with respect to any orthonormal basis for V. More generally, if B = {f1, . . . , fN} is any (not necessarily orthonormal) basis for V, then
[a ⊗ b]B = [ai bj] F (matrix product)
where F is the metric tensor associated with the basis B defined by (25). Verification of this last formula is left as an exercise.

The notion of elementary tensor product is readily generalized to the setting of linear transformations between two inner product spaces (V, ⟨·, ·⟩) and (U, ⟨·, ·⟩) as follows.

Definition 26 Let a ∈ U and b ∈ V. Then the elementary tensor product a ⊗ b is defined to be the linear transformation in Lin(V, U) satisfying
(a ⊗ b) v := ⟨v, b⟩a for all v ∈ V.

5.1.4 A Natural Orthonormal Basis for Lin(V, U)

Using the elementary tensor product, one can construct a natural orthonormal basis for Lin(V, U). First consider the special case of Lin(V). Suppose (V, ⟨·, ·⟩) is an inner product space with orthonormal basis B := {e1, . . . , eN}. Then one easily shows that the set {ei ⊗ ej, i, j = 1, . . . , N} provides an orthonormal basis for Lin(V) equipped with the inner product (33). It follows immediately that dim(Lin(V)) = N².
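In RN with the natural basis, the elementary tensor product a ⊗ b of Definition 25 is the outer product matrix [ai bj] of (40). The Python/NumPy sketch below (an added illustration with arbitrary vectors) checks the defining action (39), the rank-one property noted in Remark 25, and two of the identities from Remark 26.

```python
import numpy as np

a = np.array([1.0, 2.0, -1.0])
b = np.array([0.5, 0.0, 3.0])
v = np.array([2.0, -1.0, 1.0])

T = np.outer(a, b)        # matrix of a (x) b in the natural basis, [a_i b_j] as in (40)

# Defining action (39): (a (x) b) v = <v, b> a.
assert np.allclose(T @ v, np.dot(v, b) * a)

# Rank one, with range spanned by a (Remark 25).
assert np.linalg.matrix_rank(T) == 1

# Two identities from Remark 26: (a (x) b)* = b (x) a and tr(a (x) b) = <a, b>.
assert np.allclose(T.T, np.outer(b, a))
assert np.isclose(np.trace(T), np.dot(a, b))
```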
, eN } is an orthonormal basis for V. Then every L ∈ Lin(V) has the associated matrix [ˆlij ] defined through L= N X ˆlij ei ⊗ ej . i,j=1 On the other hand, L also has the associated matrix representation [L]B = [lij ] defined through the relation [Lv]B = [L]B [v]B which must hold for all v ∈ V with lij = hei , Lej i. It is straight forward to show (How?) that these two matrices are equal, i.e. [L]B = [lij ] = [ˆlij ]. 5.1.5 General Tensor Product Bases for Lin(V) and Their Duals Let B = {f1 , . . . , fN } be a general basis for (V, h·, ·i) with associated reciprocal basis B ∗ = {f 1 , . . . , f N }. Then Lin(V) has four associated tensor product bases and corresponding reciprocal bases given by {fi ⊗ fj } ←→ {f i ⊗ f j } {f i ⊗ f j } ←→ {fi ⊗ fj } {fi ⊗ f j } ←→ {f i ⊗ fj } {f i ⊗ fj } ←→ {fi ⊗ f j } 21 (43) (44) (45) (46) where i, j = 1, . . . , N in each set. Every L ∈ Lin(V) has four matrix representations defined through L = N X ˆlij fi ⊗ fj (Pure Contravariant) (47) ˆlij f i ⊗ f j (Pure Covariant) (48) ˆli fi ⊗ f j j (Mixed Contravariant-Covariant) (49) ˆl j f i ⊗ fj i (Mixed Covariant-Contravariant). (50) i,j=1 = N X i,j=1 = N X i,j=1 = N X i,j=1 It follows easily (How?) that one can compute these various matrix components of L from the formulae ˆlij = hf i , Lf j i ˆlij = hfi , Lfj i ˆli = hf i , Lfj i j ˆl j = hfi , Lf j i. i Exercise 4 Let L = I, the identity transformation on V. Then I has the four matrix representations (47,48,49,50) with ˆlij = hf i , f j i = f ij ˆlij = hfi , fj i = fij ˆli = hf i , fj i = δ i j j ˆl j = hfi , f j i = δ i j i where δji denotes the usual Kronecker symbol. Thus, the matrix for the identity transformation when using a general (non-orthonormal) basis has the accustomed form only using a mixed covariant and contravariant representation. Remark 30 Equivalent Matrix Representations. Given a general basis B = {f1 , . . . , fN } and its dual basis B ∗ = {f 1 , . . . , f N }, one also has the four matrix representations of a linear transformation L ∈ Lin(V) defined through [w]B [w]B ∗ [w]B ∗ [w]B = = = = [Lv]B = [L](B,B) [v]B [Lv]B ∗ = [L](B ∗ ,B ∗ ) [v]B ∗ [Lv]B ∗ = [L](B,B ∗ ) [v]B [Lv]B = [L](B ∗ ,B) [v]B ∗ 22 which in component form becomes wi = lij v j wi = lij vj wi = lij v j wi = lij vj . One now readily shows that ˆlij = lij , ˆlij = lij , ˆli = li j j and ˆlij = lij . Indeed, consider the following calculation: X ˆlij fi ⊗ fj vk f k Lv = i,j = X = X ˆlij fi vk hfj , f k i i,j ˆlij fi vk δ k j i,j ! = X = X fi X i ˆlij vj j w i fi i where wi = ˆlij vj . It now follows from the definition of lij that ˆlij = lij . The remaining identities can be deduced by similar reasoning. Henceforth, the matrices [ˆlij ], [lij ], etc, will be used interchangeably. Remark 31 Raising and Lowering Indices. The metric tensor F = [fij ] = [hfi , fj i] and its dual counterpart [f ij ] = [hf i , f j i] can be used to switch between covariant and contravariant component forms of vectors and linear transformations (and indeed, tensors of all orders). This process is called “raising and lowering indices” in classical tensor algebra. Thus, for example, if v = v i fi = vi f i , then v i = hv, f i i = hvk f k , f i i = f ik vk . Thus, [f ij ] is used to transform the covariant components [vi ] of a vector v into the contravariant components [v i ], a process called “raising the index”. 
By similar reasoning one shows that $v_i = f_{ik} v^k$, that is, the metric tensor F = [fij] is used to transform the contravariant components of a vector into the covariant components, a process called “lowering the index”.

Analogous raising/lowering index formulas hold for linear transformations. In particular, suppose w = Lv and consider the string of identities
\[ Lv = \sum_{i,j} l^{ij}\, f_i \otimes f_j \Big(\sum_k v^k f_k\Big) = \sum_{i,j,k} l^{ij} v^k \langle f_j, f_k\rangle\, f_i = \sum_i f_i \Big(\sum_{j,k} l^{ij} f_{jk} v^k\Big). \]
From the definition of $l^i{}_j$, it follows that $l^i{}_j = l^{ik} f_{kj}$. Thus, the tensor [fij] can be used to “lower” the column index of $[l^{ij}]$ by right multiplication of matrices. Similarly, $[f^{ij}]$ can be used to raise the column index by right multiplication. Moreover, they can be used to lower and raise the row index of the matrices $[l^{ij}]$, as illustrated by the following string of identities:
\[ L = \sum_{i,j} l^{ij}\, f_i \otimes f_j = \sum_{i,j,q} l^{ij} f_{iq}\, f^q \otimes f_j = \sum_{q,j} \Big(\sum_i f_{iq}\, l^{ij}\Big) f^q \otimes f_j, \]
which from the definition of the matrix $[l_q{}^j]$ implies that
\[ l_i{}^j = f_{ik}\, l^{kj}. \]  (51)
The following identities are readily obtained by similar arguments:
$l^i{}_j = f^{ik} l_{kj} = l^{ik} f_{kj}$, $l_{ij} = f_{ik}\, l^k{}_j = l_i{}^k f_{kj}$, etc.

Applying (51) to the identity transformation, for which $l^i{}_j = \delta^i_j$, $l_i{}^j = \delta_i^j$, $l^{ij} = f^{ij}$ and $l_{ij} = f_{ij}$, one sees that $\delta_i{}^j = f_{ik} f^{kj}$. It follows that the metric tensor [fij] and the matrix $[f^{ij}]$ are inverses of one another, i.e. $[f^{ij}] = [f_{ij}]^{-1}$.

5.1.6 Eigenvalues, Eigenvectors and Eigenspaces

Invariance principles are central tools in most areas of mathematics and they come in many forms. In linear algebra, the most important invariance principle resides in the notion of an Eigenvector.

Definition 27 Let V be a vector space and L ∈ Lin(V) a linear transformation on V. Then a non-zero vector v ∈ V is called an Eigenvector and λ ∈ F its associated Eigenvalue provided
Lv = λv.   (52)

From the definition (52) it follows that L(span(v)) ⊂ span(v), that is, the line spanned by the eigenvector v is invariant under L in that L maps the line back into itself. The set of all eigenvectors corresponding to a given eigenvalue λ (with the addition of the zero vector) forms a subspace of V (Why?), denoted E(λ), called the Eigenspace corresponding to the eigenvalue λ. One readily shows (How?) from the definition (52) that eigenvectors corresponding to distinct eigenvalues of a given linear transformation L ∈ Lin(V) are linearly independent, from which one concludes
E(λ1) ∩ E(λ2) = {0} if λ1 ≠ λ2.   (53)

Re-writing the definition (52) as (L − λI)v = 0, one sees that a non-zero vector v is an eigenvector with eigenvalue λ if and only if v ∈ Nul(L − λI). It follows that λ is an eigenvalue for L if and only if det(L − λI) = 0. Recalling that det(L − λI) is a polynomial of degree N, one defines the Characteristic Polynomial pL(λ) of the transformation L to be
\[ p_L(\lambda) := \det(L - \lambda I) = (-\lambda)^N + i_1(L)(-\lambda)^{N-1} + \dots + i_{N-1}(L)(-\lambda) + i_N(L). \]  (54)
The coefficients IL := {i1(L), . . . , iN(L)} of the characteristic polynomial are called the Principal Invariants of the transformation L because they are invariant under similarity transformation. More specifically, let T ∈ Lin(V) be non-singular. Then the characteristic polynomial for T⁻¹LT is identical to that for L, as can be seen from the calculation
\[ p_{T^{-1}LT}(\lambda) = \det(T^{-1}LT - \lambda I) = \det(T^{-1}LT - \lambda T^{-1}T) = \det\big(T^{-1}(L - \lambda I)T\big) = \det(L - \lambda I) = p_L(\lambda). \]
An explanation of why the elements of IL are called the Principal Invariants of L must wait until the upcoming section on spectral theory. The principal invariants i1(L) and iN(L) have simple forms.
Indeed one can readily show (How?) that i1 (L) = tr(L) and iN (L) = det(L). Moreover, if the characteristic polynomial has the factorization N Y pL (λ) = (λj − λ), j=1 then one also has that i1 (λ) = tr(L) = λ1 + . . . + λN 5.2 and iN (L) = det(L) = λ1 . . . λN . Classes of Linear Transformations An understanding of general linear transformations in Lin(V ) can be gleaned from a study of special classes of transformations and representation theorems expressing general linear transformations as sums or products of transformations from the special classes. The discussion is framed within the setting of a finite dimensional, inner product vector space. The first class of transformations introduced are the projections. 5.2.1 Projections Definition 28 A linear transformation P on (V, h·, ·i) is called a Projection provided P 2 = P. (55) Remark 32 It follows readily from the definition (55) that P is a projection if and only if I − P is a projection. Indeed, assume P is a projection, then (I − P )2 = I 2 − 2P + P 2 = I − 2P + P = I − P which shows that I −P is a projection. The reverse implication follows by a similar argument since if I − P is a projection, then one need merely write P = I − (I − P ) and apply the above calculation. Remark 33 The geometry of projections is illuminated by the observation that the there is a simple relationship between their range and nullspace. In particular, one can show easily from the definition (55) that Ran(P ) = Nul(I − P ) and Nul(P ) = Ran(I − P ). 26 (56) For example, to prove that Ran(P ) = Nul(I − P ), let v ∈ Ran(P ). Then v = P u for some u ∈ V and (I − P )v = (I − P )P u = P u − P 2 u = P u − P u = 0. It follows that v ∈ Nul(P ) and hence that Ran(P ) ⊂ Nul(I − P ). (57) Conversely, if v ∈ Nul(I −P ), then (I −P )v = 0, or equivalently v = P v. Thus, v ∈ Ran(P ) from which it follows that Nul(I − P ) ⊂ Ran(P ). (58) From (57,58) one concludes that Ran(P ) = Nul(I − P ) as required. The second equality in (56) follows similarly. Remark 34 Another important observation concerning projections is that V is a direct sum of the subspaces Ran(P ) and Nul(P ) V = Ran(P ) ⊕ Nul(P ). (59) To see (59), one merely notes that for any v ∈ V v = P v + (I − P )v, with P v ∈ Ran(P ) and (I − P )v ∈ Nul(P ). This fact and (9) show (59). Because of (59) one often refers to a projection P as “projecting V onto Ran(P ) along Nul(P ).” An important subclass of projections are the Perpendicular Projections. Definition 29 A projection P is called a Perpendicular Projection provided Ran(P ) ⊥ Nul(P ). A non-obvious fact about perpendicular projections is the Theorem 8 A projection P is a perpendicular projection if and only if P is self-adjoint, i.e. P = P ∗. Proof: First assume P is self-adjoint and let v ∈ Ran(P ) and w ∈ Nul(P ). Then, v = P u for some u ∈ V and hv, wi = hP u, wi = hu, P ∗ wi = hu, P wi = hu, 0i = 0 from which it follows that v ⊥ w and hence that Ran(P ) ⊥ Nul(P ). Conversely, suppose P is a perpendicular projection. One must show that P is selfadjoint, i.e. that hP v, ui = hv, P ui for all v, u ∈ V. (60) To that end, notice that hP v, ui = hP v, P u + (I − P )ui = hP v, P ui + hP v, (I − P )ui = hP v, P ui. (Why?) Since hP v, P ui is symmetric in v and u, the same must be true for hP v, ui, from which ones concludes (60). Question: Among the class of elementary tensor products, which (if any) are projections? Which are perpendicular projections? 27 Answer: An elementary tensor product a ⊗ b is a projection if and only if ha, bi = 1. 
It is a perpendicular projection if and only if b = a and |a| = 1. (Why?) Thus, if a is a unit vector, then a ⊗ a is the perpendicular projection onto the line spanned by a. More generally, if a and b are perpendicular unit vectors, then a ⊗ a + b ⊗ b is a perpendicular projection onto the plane (2-dimensional subspace) spanned by a and b. (Exercise.) Generalizing further, if a1, . . . , aK are pairwise orthogonal unit vectors, then a1 ⊗ a1 + . . . + aK ⊗ aK is a perpendicular projection onto span(a1, . . . , aK).

5.2.2 Diagonalizable Transformations

A linear transformation L on a finite dimensional vector space V is called Diagonalizable provided it has a matrix representation that is diagonal, that is, there is a basis B for V such that
[L]B = diag(l1, . . . , lN).   (61)

Notation: An N × N diagonal matrix will be denoted A = diag(a1, . . . , aN), where A = [aij] with a11 = a1, . . . , aNN = aN and aij = 0 for i ≠ j.

The fundamental result concerning diagonalizable transformations is

Theorem 9 A linear transformation L on V is diagonalizable if and only if V has a basis consisting entirely of eigenvectors of L.

Proof: Assume first that V has a basis B = {f1, . . . , fN} consisting of eigenvectors of L, that is, Lfj = λj fj for j = 1, . . . , N. Recall that the column N-tuples comprising the matrix [L]B are given by [Lfj]B, the coordinates with respect to B of the vectors Lfj for j = 1, . . . , N. But
\[ [Lf_1]_B = [\lambda_1 f_1]_B = \begin{pmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad [Lf_2]_B = [\lambda_2 f_2]_B = \begin{pmatrix} 0 \\ \lambda_2 \\ \vdots \\ 0 \end{pmatrix}, \quad \text{etc.} \]
It now follows immediately that [L]B = diag(λ1, . . . , λN). (Why?)

Conversely, suppose L is diagonalizable, that is, there exists a basis B = {f1, . . . , fN} for V such that [L]B = diag(λ1, . . . , λN). But then for each j = 1, . . . , N,
\[ [Lf_j]_B = [L]_B [f_j]_B = \mathrm{diag}(\lambda_1, \dots, \lambda_N)\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \;(j\text{th component} = 1) = [\lambda_j f_j]_B. \]
It follows that for each j = 1, . . . , N, Lfj = λj fj (Why?), and hence that B = {f1, . . . , fN} is a basis of eigenvectors of L.

5.2.3 Orthogonal and Unitary Transformations

Let (V, ⟨·, ·⟩) be a finite dimensional, inner product vector space. The class of Orthogonal Transformations, in the case of a vector space over R, and the class of Unitary Transformations, in the case of a vector space over C, are defined as follows.

Definition 30 A transformation Q ∈ Lin(V) is called Orthogonal (real case) or Unitary (complex case) provided
Q⁻¹ = Q∗, or equivalently I = Q∗Q.   (62)
That is, the inverse of Q equals the adjoint of Q.

Remark 35 It follows immediately from (62) that orthogonal transformations preserve the inner product on V and hence leave lengths and angles invariant. Specifically, one has
⟨Qa, Qb⟩ = ⟨a, Q∗Qb⟩ = ⟨a, Q⁻¹Qb⟩ = ⟨a, b⟩.   (63)
In particular, from (63) it follows that |Qa|² = ⟨Qa, Qa⟩ = ⟨a, a⟩ = |a|², showing that Q preserves lengths.

Remark 36 The class of orthogonal transformations will be denoted Orth. One easily shows that det(Q) = ±1. (Why?) It is useful to define the subclasses
Orth+ := {Q ∈ Orth | det(Q) = 1} and Orth− := {Q ∈ Orth | det(Q) = −1}.

Example 14 If V is the Euclidean plane R2, then Orth+(R2) is the set of (rigid) Rotations, i.e. given Q ∈ Orth+(R2), there exists a 0 ≤ θ < 2π such that
\[ Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]  (64)
Also, the transformation
\[ R_M := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]
is in Orth−(R2). (Why?) (How does R_M act geometrically?) More generally,
Orth−(R2) = {R ∈ Orth(R2) | R = R_M Q for some Q ∈ Orth+(R2)}. (Why?)
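The properties of plane rotations listed in Example 14 are easily confirmed numerically. The Python/NumPy sketch below (added for illustration, with an arbitrary angle) checks that the matrix (64) satisfies QᵀQ = I and det Q = 1, that it preserves inner products as in (63), and that R_M Q lies in Orth−(R²).

```python
import numpy as np

theta = 0.7                                        # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # the rotation (64)
R_M = np.array([[0.0, 1.0],
                [1.0, 0.0]])                       # the reflection R_M of Example 14

# Orthogonality (62) and det Q = 1, so Q is in Orth+(R^2).
assert np.allclose(Q.T @ Q, np.eye(2))
assert np.isclose(np.linalg.det(Q), 1.0)

# Q preserves the inner product, relation (63).
a, b = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
assert np.isclose(np.dot(Q @ a, Q @ b), np.dot(a, b))

# R_M Q is orthogonal with determinant -1, i.e. an element of Orth-(R^2).
R = R_M @ Q
assert np.allclose(R.T @ R, np.eye(2)) and np.isclose(np.linalg.det(R), -1.0)
```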
Example 15 If V is the Euclidean space R3 , then Orth+ is called the set of Proper Orthogonal Transformations or the set of Rotations. To motivate calling Q ∈ Orth+ (R3 ), one can prove that 29 Theorem 10 If Q ∈ Orth+ (R3 ), then there exists a non-zero vector a such that Qa = a (i.e. a is an eigenvector corresponding to the eigenvalue λ = 1) and Q restricted to the plane orthogonal to a is a two dimensional rotation of the form (64). The line spanned by a is called the Axis of Rotation. Proof: Exercise. (Hint: Use the following facts to show that 1 ∈ σ(Q). 1. 1 = det(Q) = λ1 λ2 λ3 where the (complex) spectrum of Q = {λ1 , λ2 , λ3 }. 2. At least one of the eigenvalues must be real (Why?), say λ1 , and the other two must be complex conjugates (Why?), i.e. λ2 λ3 = |λ2 |2 . 3. |λj | = 1 for j = 1, 2, 3 (Why?) and hence λ2 λ3 = 1. 4. Since λ1 = 1 is an eigenvalue for Q, it has an eigenvector a, i.e. Qa = a. 5. Q is in Orth+ when restricted to the plane orthogonal to a (Why?). The conclusion of the theorem now follows readily.) It then follows that Orth− (R3 ) = {R ∈ Orth(R3 ) | R = −Q for some Q ∈ Orth+ (R3 )}. (Why?) Remark 37 Let V = RN equipped with the Euclidean inner product and natural basis. One can now make the identification of Lin(RN ) with the set of N × N -matrices, MN . Then Orth(RN ), the set of orthogonal transformations on RN , corresponds to the set of orthogonal matrices in MN . It follows from the definition (62) that Q ∈ Orth(RN ) ⊂ MN if and only if the columns (and hence rows) of Q form an orthonormal basis for RN . (Why?) 5.2.4 Self-Adjoint Transformations Let (V, h·, ·i) be a finite dimensional inner product space. An important subspace of Lin(V) is the collection of Hermitian or Self-Adjoint transformations denoted by Herm(V). The self-adjoint transformations are just those transformations equal to their adjoints, i.e. S ∈ Herm(V) means S ∗ = S. The goal of this section is to present the celebrated Spectral Theorem for self-adjoint transformations. The presentation proceeds in steps through a sequence of lemmas. For much of the discussion, the scalar field can be either R or C. Definition 31 The Spectrum of a transformation in Lin(V), denoted σ(L), is the set of all eigenvalues of L. Lemma 1 Let L be a self-adjoint transformation on the inner product space (V, h·, ·i). Then σ(L) ⊂ R, i.e. the spectrum of L contains only real numbers. Proof: Let λ ∈ σ(L) with associated unit eigenvector e, i.e. |e| = 1 and Le = λe. Then λ = λhe, ei = hλe, ei = hLe, ei = he, Lei = he, λei = λhe, ei = λ. Thus, λ = λ from which it follows that λ ∈ R. 30 Lemma 2 Let L be a self-adjoint transformation on the inner product space (V, h·, ·i). Then distinct eigenvalues of L have orthogonal eigenspaces. Proof: Let λ1 , λ2 ∈ σ(L) be distinct (real) eigenvalues of L with associated eigenvectors e1 , e2 , respectively. What must be shown is that e1 ⊥ e2 , i.e. that he1 , e2 i = 0. To that end, λ1 he1 , e2 i = hλ1 e1 , e2 i = hLe1 , e2 i = he1 , Le2 i = he1 , λ2 e2 i = λ2 he1 , e2 i. It follows that (λ1 − λ2 )he1 , e2 i = 0. But since λ1 6= λ2 , it must be that he1 , e2 i = 0, i.e. that e1 ⊥ e2 , as required. Lemma 3 Let L be a self-adjoint transformation on the inner product space (V, h·, ·i) and let λ ∈ σ(L) with associated eigenspace E(λ). Then E(λ) and its orthogonal complement E(λ)⊥ are invariant subspaces of L, i.e. L(E(λ)) ⊂ E(λ) L(E(λ)⊥ ) ⊂ E(λ)⊥ . (65) (66) Proof: Assertion (65) follows immediately from the definitions of eigenvalue and eigenspace. 
For the second assertion (66), one must show that if v ∈ E(λ)⊥ , the Lv is also in E(λ)⊥ . To that end, let e ∈ E(λ) and v ∈ E(λ)⊥ . Then v ⊥ e and hLv, ei = hv, L∗ ei = hv, Lei = hv, λei = λhv, ei = 0. It follows that Lv is perpendicular to every eigenvector in E(λ) from which one concludes that Lv ∈ E(λ)⊥ as required. With the aid of these lemmas, one can now prove the Spectral Theorem for self-adjoint transformations. Theorem 11 Let L ∈ Lin(V) be self-adjoint. Then there is an orthonormal basis for V consisting entirely of eigenvectors of L. In particular, the eigenspaces of L form a direct sum decomposition of V V = E(λ1 ) ⊕ . . . ⊕ E(λK ) (67) where σ(L) = {λ1 , . . . , λK } is the spectrum of L. Proof: Since all of the eigenvalues of L are real numbers, the characteristic polynomial factors completely over the real numbers, i.e. K Y det(L − λ I) = pL (λ) = (λj − λ)nj j=1 with n1 + . . . + nK = N 31 where dim(V) = N . The exponent nj is called the algebraic multiplicity of the eigenvalue λj . The geometric multiplicity, mj , of each eigenvalue λj is defined to be dim(E(λj )). To prove the theorem, it suffices to show that m1 + . . . + mK = N. (68) Indeed, (68) shows that there is a basis for V consisting of eigenvectors of L, called an eigenbasis, and Lemma 2 shows one can find an orthonormal eigenbasis for V. Since the eigenspaces E(λj ), j = 1, . . . , K are pairwise orthogonal, one can form the direct sum subspace V1 := E(λ1 ) ⊕ . . . ⊕ E(λK ). The problem is to show that V1 = V. Suppose not. Then one can write V = V1 ⊕ V1⊥ . First note that ⊥ V1⊥ = ∩K j=1 E(λj ) . (Why?) It follows from Lemma 3, that L(V1⊥ ) ⊂ V1⊥ . (Why?) Let L1 denote the restriction of L to the subspace V1⊥ L1 : V1⊥ −→ V1⊥ , that is, if v ∈ V1⊥ , then L1 v := Lv ∈ V1⊥ . Notice that L1 is self-adjoint. Hence, the characteristic polynomial of L1 factors completely over the real numbers. Let µ be an eigenvalue of L1 with associated eigenvector e, i.e. L1 e = Le = µe. Thus, e is also an eigenvector of L with eigenvalue µ. It follows that µ must be one of the λj , j = 1, . . . , K and e ∈ E(λj) for some j = 1, . . . , K. But e must be orthogonal to all of the eigenspaces E(λJ ), j = 1, . . . , K. This contradiction implies that V1⊥ = {0} and that V1 = V as required. Let Pj denote the perpendicular projection operator onto the eigenspace E(λj ). Then since the eigenspaces are pairwise orthogonal, it follows that (Why?) Pi Pj = 0 if i 6= j, and P2j = Pj . (69) The spectral theorem implies that a self-adjoint transformation L has a Spectral Decomposition defined by L = λ1 P 1 + . . . + λK P K (70) where the spectrum of L is σ(L) = {λ1 , . . . , λK }. Moreover, for each positive integer n, Ln is also self-adjoint (Why?), and appealing to (70) one easily shows (How?) that Ln has the spectral decomposition Ln = λn1 P1 + . . . + λnK PK . Thus, Ln has the eigenspaces as L with spectrum σ(Ln ) = {λn1 , . . . , λnK }. More generPsame n ally, let f (x) = k=0 ak xk be a polynomial. Then f (L) is self-adjoint (Why?) and has the spectral decomposition K X f (L) = f (λj )Pj . (71) j=1 32 It follows that σ(f (L)) = {f (λ1 ), . . .P , f (λK )} and f (L) has the same eigenspaces as L. k Generalizing further, let f (x) = ∞ k=0 ak x be an entire function, i.e. f (x) has a power series representation convergent for all x. Then, again one concludes that f (L) (Why is the power series with L convergent?) is self-adjoint and has the spectral decomposition (71). 
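The spectral decomposition (70) and the functional calculus (71) can be made concrete for a real symmetric matrix. In the Python/NumPy sketch below (added for illustration; the matrix is made up and its eigenvalues happen to be distinct, so each spectral projection is the rank-one ej ⊗ ej), the perpendicular projections are assembled from an orthonormal eigenbasis and used to rebuild L and the polynomial f(L) = L² + 2L + 3I.

```python
import numpy as np

# A self-adjoint (real symmetric) transformation of R^3, made up for illustration.
L = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(L)       # orthonormal eigenbasis (Theorem 11)

# Perpendicular projections onto the (here one-dimensional) eigenspaces: P_j = e_j (x) e_j.
P = [np.outer(eigvecs[:, j], eigvecs[:, j]) for j in range(3)]
assert all(np.allclose(Pj @ Pj, Pj) for Pj in P)      # (69): each P_j is a projection
assert np.allclose(P[0] @ P[1], np.zeros((3, 3)))     # (69): P_i P_j = 0 for i != j

# Spectral decomposition (70): L = sum_j lambda_j P_j.
assert np.allclose(L, sum(lam * Pj for lam, Pj in zip(eigvals, P)))

# Functional calculus (71) for the polynomial f(x) = x^2 + 2x + 3.
f = lambda x: x**2 + 2*x + 3
f_of_L = L @ L + 2*L + 3*np.eye(3)
assert np.allclose(f_of_L, sum(f(lam) * Pj for lam, Pj in zip(eigvals, P)))
```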
Example 16 Consider the entire function exp(x) = Σ_{k=0}^{∞} x^k / k!. Then exp(L) is self-adjoint (if L is) and has the spectral representation

exp(L) = Σ_{j=1}^{K} exp(λj) Pj.

Example 17 Let V = RN with the usual inner product and let A ∈ MN(R) be a self-adjoint N × N real matrix. Consider the system of ordinary differential equations

ẋ(t) = A x(t)   (72)

with initial condition x(0) = x0. Then the solution to the initial value problem for the system (72) is x(t) = exp(At) x0, which by the spectral theorem can be written

x(t) = Σ_{j=1}^{K} exp(λj t) Pj x0,

where A has the spectrum σ(A) = {λ1, . . . , λK} and spectral projections Pj, j = 1, . . . , K.

5.2.5 Square-root of Positive Definite Hermitian Transformations

Suppose L ∈ Herm has only positive eigenvalues. Then L has a unique positive definite Hermitian square-root. Indeed, if L is positive definite Hermitian, then it has the spectral decomposition

L = µ1 P1 + . . . + µK PK

with µj ∈ R+ and Pj the perpendicular projection onto the eigenspace E(µj). Define

√L := √µ1 P1 + . . . + √µK PK.

Then clearly √L is positive definite Hermitian and (√L)^2 = L.

Remark 38 The spectral theorem shows that a necessary and sufficient condition for a transformation L ∈ Lin(V) to have a real spectrum σ(L) = {λ1, . . . , λN} ⊂ R and a set of eigenvectors forming an orthonormal basis for V is that L be Hermitian, or self-adjoint. A natural question to ask is under what conditions on L this result holds when the spectrum is permitted to be complex. The desired condition on L is called Normality.

5.2.6 Normal Transformations

Definition 32 A transformation L ∈ Lin(V) is called Normal provided it commutes with its adjoint, i.e. L*L = LL*.

Remark 39 It follows immediately from the definitions that unitary and Hermitian transformations are also normal. Moreover, if L has a spectral decomposition

L = λ1 e1 ⊗ e1 + . . . + λN eN ⊗ eN   (73)

with λj ∈ C, j = 1, . . . , N, and {e1, . . . , eN} an orthonormal basis for V, then L is normal. The following generalization of the spectral theorem says that this condition is both necessary and sufficient for L to be normal.

Theorem 12 A necessary and sufficient condition for L ∈ Lin(V) to be normal is that it have a Spectral Decomposition of the form (73).

Proof: The proof follows exactly as in the case for self-adjoint transformations with the three key lemmas being modified as follows.

Lemma 4 If L ∈ Lin(V) is normal, then Lej = λj ej if and only if L*ej = λ̄j ej, and hence σ(L*) = {λ̄j | λj ∈ σ(L)} and EL*(λ̄j) = EL(λj) for each λj ∈ σ(L).

Proof: Note first that if L is normal then so is L − λI, since (L − λI)* = L* − λ̄I. Next observe that if L is normal, then |Lv| = |L*v| for all v ∈ V. This last fact follows from

|Lv|^2 = ⟨Lv, Lv⟩ = ⟨L*Lv, v⟩ = ⟨LL*v, v⟩ = ⟨L*v, L*v⟩ = |L*v|^2.

One now deduces that |(L − λI)v| = |(L* − λ̄I)v|, from which the conclusion of the lemma readily follows. (Why?)

Lemma 5 If L is normal and λi, λj are distinct eigenvalues, then their eigenspaces are orthogonal, i.e. EL(λj) ⊥ EL(λi).

Proof: Suppose ei ∈ EL(λi) and ej ∈ EL(λj). Then

λi⟨ei, ej⟩ = ⟨λi ei, ej⟩ = ⟨Lei, ej⟩ = ⟨ei, L*ej⟩ = ⟨ei, λ̄j ej⟩ = λj⟨ei, ej⟩,

from which one concludes that ⟨ei, ej⟩ = 0 if λi ≠ λj, thereby proving the lemma.

Lemma 6 If L is normal with eigenvalue λ and associated eigenspace EL(λ), then L(EL(λ)⊥) ⊂ EL(λ)⊥.

Proof: Let v ∈ EL(λ)⊥ and e ∈ EL(λ). Then ⟨v, e⟩ = 0 and

⟨Lv, e⟩ = ⟨v, L*e⟩ = ⟨v, λ̄e⟩ = λ⟨v, e⟩ = 0,

from which one concludes that Lv ∈ EL(λ)⊥, thus proving the lemma.
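The square-root construction of Section 5.2.5 is easy to check numerically. The sketch below (Python with NumPy; the matrix is illustrative) forms √L from the spectral decomposition of a positive definite symmetric L and verifies (√L)^2 = L.

```python
import numpy as np

L = np.array([[4.0, 1.0],
              [1.0, 3.0]])                  # symmetric with positive eigenvalues

mu, E = np.linalg.eigh(L)                   # mu holds the eigenvalues, E the eigenvectors
assert np.all(mu > 0)

# sqrt(L) = sum_j sqrt(mu_j) P_j, i.e. E diag(sqrt(mu)) E^T in matrix form.
sqrtL = E @ np.diag(np.sqrt(mu)) @ E.T

assert np.allclose(sqrtL @ sqrtL, L)        # (sqrt(L))^2 = L
assert np.allclose(sqrtL, sqrtL.T)          # sqrt(L) is symmetric
print("square root verified")
```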
Using these lemmas, the proof of the spectral theorem for self-adjoint transformations now readily applies to the case of normal transformations. (How?)

5.2.7 Skew Transformations

The Skew Transformations, also called Anti-Self-Adjoint, are defined as follows.

Definition 33 A transformation W ∈ Lin(V) is called Skew provided it satisfies W* = −W.

It follows immediately that skew transformations are normal, i.e. they commute with their adjoints, and therefore can be diagonalized (over the complex numbers) by a unitary transformation. For example, in R2,

W = [ 0  1
     −1  0 ]

is skew. Its spectrum is σ(W) = {i, −i}, where i = √−1 (Why?), and Q*WQ = diag(i, −i), where

Q = (1/√2) [ 1   1
             i  −i ]

is unitary (Why?). In R3, every skew matrix has the form (Why?)

W = [ 0  −γ   β
      γ   0  −α
     −β   α   0 ].

One then shows (How?) that W acts on a general vector v ∈ R3 through the vector cross product with a vector w, called the axial vector of W,

Wv = w × v  with  w = (α, β, γ)^T.

(What is the spectrum of W?)

5.3 Decomposition Theorems

Decomposition theorems give insight into the structure and properties of general linear transformations by expressing them as multiplicative or additive combinations of special classes of transformations whose structure and properties are easily understood. The spectral decomposition is one such result, expressing self-adjoint or normal transformations as linear combinations of perpendicular projections. Perhaps the most basic of decompositions expresses a general real transformation as the sum of a symmetric and a skew-symmetric transformation, and a general complex transformation as a linear combination of two Hermitian transformations.

5.3.1 Sym-Skew and Hermitian-Hermitian Decompositions

Theorem 13 Let L ∈ Lin(V) be a transformation on the real vector space V. Then L can be written uniquely as the sum

L = S + W   (74)

with

S ∈ Sym(V) and W ∈ Skew(V),   (75)

where Sym(V) denotes the set of symmetric transformations (i.e. S ∈ Sym(V) ⇐⇒ S^T = S) and Skew(V) denotes the set of skew-symmetric transformations (i.e. W ∈ Skew(V) ⇐⇒ W^T = −W).

Proof: The proof is by construction. Define

S := (1/2)(L + L^T)  and  W := (1/2)(L − L^T).

Then clearly (74,75) hold. Conversely, if (74,75) hold, then

L + L^T = (S + W) + (S + W)^T = 2S  and  L − L^T = (S + W) − (S + W)^T = 2W,

thereby proving uniqueness of the representation (74,75).

Theorem 14 Let L ∈ Lin(V) be a transformation on the complex vector space V. Then L can be written uniquely as the sum

L = S + iW   (76)

with

S, W ∈ Herm(V).   (77)

Proof: The proof is identical to that above for the real transformation case except that

S := (1/2)(L + L*)  and  W := −(i/2)(L − L*).

Then clearly S* = S, while

W* = (i/2)(L* − L) = −(i/2)(L − L*) = W

and

S + iW = (1/2)(L + L*) + (1/2)(L − L*) = L.

Remark 40 Notice that (76,77) is the analog of the Cartesian representation of a complex number z = x + iy with x, y ∈ R.
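Here is a small numerical sketch (Python with NumPy; the matrix and test vector are made-up data) of the symmetric/skew decomposition of Theorem 13 together with the axial-vector identity Wv = w × v from Section 5.2.7.

```python
import numpy as np

L = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

S = 0.5 * (L + L.T)                     # symmetric part
W = 0.5 * (L - L.T)                     # skew-symmetric part
assert np.allclose(S, S.T) and np.allclose(W, -W.T) and np.allclose(S + W, L)

# Axial vector of W: with W = [[0, -g, b], [g, 0, -a], [-b, a, 0]], w = (a, b, g).
w = np.array([W[2, 1], W[0, 2], W[1, 0]])

v = np.array([0.3, -1.2, 2.5])          # an arbitrary test vector
assert np.allclose(W @ v, np.cross(w, v))
print("L = S + W and W v = w x v verified")
```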
5.3.2 Polar Decomposition

The Polar Decomposition gives a multiplicative decomposition of a nonsingular linear transformation into the product of a positive definite Hermitian transformation and a unitary transformation. It is useful to define the classes of transformations

Orth+ := {Q ∈ Orth | det(Q) = 1},  PHerm := {S ∈ Herm | σ(S) ⊂ R+}  and  PSym := {S ∈ Sym | σ(S) ⊂ R+}.

Theorem 15 Let L ∈ Lin(V) for the vector space V with det(L) ≠ 0. Then there exist U, V ∈ PHerm(V) and Q ∈ Unit(V) with

L = QU   Right Polar Decomposition   (78)
  = VQ   Left Polar Decomposition.   (79)

Proof: Note first that L*L ∈ PHerm(V) (Why?). Define U to be the (unique) positive definite Hermitian square-root of L*L, i.e.

U := √(L*L),

and then define

Q := LU^{−1}.   (80)

It follows that Q* = U^{−1}L* (Why?) and hence that

Q*Q = U^{−1}L*LU^{−1} = U^{−1}U^2 U^{−1} = I.

Therefore, Q is unitary and (78) holds. One now defines V := QUQ* and observes that V is Hermitian and positive definite, thereby proving (79). Note also that V^2 = LL*.

Remark 41 An important application of the Polar Decomposition is to the definition of strain in nonlinear elasticity. To that end, let F denote a deformation gradient. (The term deformation gradient will be defined rigorously in the Notes on Tensor Analysis.) Then one defines the Cauchy-Green strains by

C := F*F   Right Cauchy-Green Strain
B := FF*   Left Cauchy-Green Strain.

Defining the Left and Right Stretches by

U := √C   Right Stretch
V := √B   Left Stretch,

the left and right polar decompositions of F read F = VQ = QU, which gives a multiplicative decomposition of the deformation gradient into a product of a stretch tensor and a rigid rotation.

5.4 Singular Value Decomposition

As noted previously, not all square (real or complex) matrices can be diagonalized, that is, made diagonal relative to a suitable basis. Indeed, only those (N × N)-matrices possessing a basis for CN consisting of eigenvectors can be diagonalized. The Singular Value Decomposition gives a representation reminiscent of the spectral decomposition for any (not necessarily square) matrix.

Theorem 16 Let A ∈ MM,N(C) be an arbitrary M × N matrix. Then there exist unitary matrices V ∈ MM(C) and W ∈ MN(C) and a diagonal matrix D ∈ MM,N(C) satisfying

A = VDW*   (81)

with D = diag(σj), σ1 ≥ σ2 ≥ . . . ≥ σk > σ_{k+1} = . . . = σL = 0, where k is the rank of A and L := min{M, N}. (A general matrix D = [dij] ∈ MM,N is called diagonal if dij = 0 for i ≠ j.)

Remark 42 The nonnegative numbers {σ1, . . . , σL} are called the Singular Values for the matrix A.

Proof: It suffices to assume M ≤ N since otherwise one need merely prove the result for A* and take the adjoint of the resulting decomposition. The first step in the proof is to take the adjoint of the desired decomposition (81),

A* = WD*V*,

and form the M × M Hermitian matrix

AA* = VDD*V*.   (82)

The Hermitian matrix AA* has nonnegative eigenvalues σ1^2 ≥ σ2^2 ≥ . . . ≥ σk^2 > σ_{k+1}^2 = . . . = σM^2 = 0, and (82) is its spectral decomposition. Define D := diag(σ1, . . . , σM) (as an M × N diagonal matrix) and V to be the unitary matrix in the spectral decomposition (82). The unitary matrix W* must satisfy

V*A = DW*  or equivalently  A*V = WD^T.   (83)

The N × N matrix W will be constructed by specifying its column vectors

W = [w1 w2 . . . wN].   (84)

The left hand side of (83) has the form

A*V = [(A*v1) . . . (A*vM)]   (85)

where V has the column vectors vj, j = 1, . . . , M. On the other hand, the right hand side of (83) has the column structure

WD^T = [(σ1 w1) . . . (σk wk) 0 . . . 0]   (86)

in which the last M − k columns are the (N-dimensional) zero vector. Note that A*vj = 0 for j = k + 1, . . . , M, as can be seen from

0 = (AA*vj) · vj = (A*vj) · (A*vj) = |A*vj|^2 for j = k + 1, . . . , M,

which follows from AA*vj = 0 for j = k + 1, . . . , M. Equating columns in (85) and (86), one defines the first k columns of W by

wj := (1/σj) A*vj, for j = 1, . . . , k.

These k vectors w1, . . . , wk are easily seen to form an orthonormal set. (Why?) One now defines the remaining N − k column vectors of W in such a way that the set {w1, . . . , wN} is orthonormal, thereby proving the theorem.
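The construction in the proof of Theorem 16 can be checked directly. The sketch below (Python with NumPy; the matrix is made-up data) verifies that the σj^2 are the eigenvalues of AA* as in (82) and that the leading columns of W can be taken as wj = (1/σj) A*vj.

```python
import numpy as np

A = np.array([[3.0, 1.0, 0.0, 2.0],
              [1.0, 2.0, 1.0, 0.0]])            # M = 2, N = 4

V, sigma, Wh = np.linalg.svd(A)                 # A = V diag(sigma) W*, with Wh = W*
D = np.zeros(A.shape)
D[:len(sigma), :len(sigma)] = np.diag(sigma)
assert np.allclose(A, V @ D @ Wh)               # the decomposition (81)

# sigma_j^2 are the eigenvalues of A A* (equation (82)).
eigs = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
assert np.allclose(eigs, sigma**2)

# w_j = (1/sigma_j) A* v_j for j = 1, ..., k reproduces the leading columns of W.
W = Wh.conj().T
for j in range(len(sigma)):
    wj = (A.T @ V[:, j]) / sigma[j]
    assert np.allclose(wj, W[:, j])
print("SVD construction verified")
```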
5.5 Solving Equations

An important application of linear algebra is to the problem of solving simultaneous systems of linear algebraic equations. Specifically, let A ∈ MM,N(C) and consider the system of linear equations

Ax = b   (87)

where b ∈ CM is given and x ∈ CN is sought. Thus, (87) is a system of M linear equations in N unknowns. In general, the system (87) can have no solutions, a unique solution or infinitely many solutions. If A is regarded as defining a linear transformation from CN into CM, then (87) has solutions if and only if b ∈ Ran(A), and such solutions are unique if and only if Nul(A) = {0}. Moreover, Ran(A) is the span of the columns of A. However, testing whether or not b ∈ Ran(A) can be computationally expensive. The Fredholm Alternative Theorem gives another characterization of Ran(A) that is often much easier to check.

Theorem 17 Let A ∈ MM,N(C). Then

Ran(A) = (Nul(A*))⊥.   (88)

Proof: First, note that if b ∈ Ran(A) and a ∈ Nul(A*), then b = Ax for some x ∈ CN and

⟨b, a⟩ = ⟨Ax, a⟩ = ⟨x, A*a⟩ = 0.

Thus,

Ran(A) ⊂ Nul(A*)⊥.   (89)

But one also has

dim(Nul(A*)⊥) = M − dim(Nul(A*)) (Why?) = dim(Ran(A*)) (Why?) = dim(Ran(A)). (Why?)   (90)

Combining (89) and (90), one concludes that Ran(A) = Nul(A*)⊥ (Why?) as required.

Remark 43 The Fredholm Alternative Theorem says that (87) has solutions if and only if b is orthogonal to the null-space of A*. Usually, the null-space of A* has small dimension, so testing whether b ∈ Nul(A*)⊥ is often much easier than testing whether it is in the span of the columns of A.

In many applications, the system (87) is badly overdetermined (i.e. far more independent equations than unknowns) and one does not expect there to be a solution. In such cases, one might want to find best approximate solutions. A typical such example arises in linear regression, in which one wishes to find a linear function that "best fits" a large number of data points, where best is usually in the least squares sense. These ideas are made rigorous through the concept of Minimum Norm Least Squares Solution to (87).

5.5.1 Minimum Norm Least Squares Solution

A Least Squares Solution to (87) is any x ∈ RN that minimizes the Residual Error |Ax − b|. Let Q = {x ∈ RN | |Ax − b| = min_{z∈RN} |Az − b|}. The set Q is always non-empty but may contain many vectors x. Indeed, form the direct sum decomposition RM = Ran(A) ⊕ Ran(A)⊥ and write

b = br + b0   (91)

where

br ∈ Ran(A) and b0 ∈ Ran(A)⊥ = Nul(A^T).   (92)

Then Ax − br ∈ Ran(A) and

|Ax − b|^2 = |Ax − br|^2 + |b0|^2 (Why?).   (93)

Clearly, x ∈ Q if and only if Ax = br, and

|b0| = min_{z∈RN} |Az − b|.   (94)

Since br ∈ Ran(A), Q is non-empty. (Why?)

Another characterization of Q is that x ∈ Q if and only if A^T(Ax − b) = 0. To see why, observe that if x ∈ Q, then Ax = br, from which it follows that

A^T(Ax − b) = A^T(Ax − br) − A^T b0 = A^T(Ax − br) = 0.

Conversely, if A^T(Ax − b) = 0 then A^T(Ax − br) = 0. Also Nul(A^T) = Ran(A)⊥. Hence, Ax − br ∈ Ran(A)⊥. But since it is obvious that (Ax − br) ∈ Ran(A), one concludes that (Ax − br) = 0.

Next, note that Q is a convex subset of RN. To see this, let x1, x2 ∈ Q and let α1, α2 > 0 with α1 + α2 = 1. Then

A^T A(α1 x1 + α2 x2) = α1 A^T A x1 + α2 A^T A x2 = α1 A^T br + α2 A^T br = (α1 + α2) A^T br = A^T br = A^T b,

which implies that A^T(A(α1 x1 + α2 x2) − b) = 0, i.e. α1 x1 + α2 x2 ∈ Q, and hence that Q is convex. It follows that Q has a unique vector of minimum norm, i.e. closest to the origin 0. (Why?)
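A short sketch (Python with NumPy; the data are random and purely illustrative) of the least squares characterization just derived: x minimizes |Ax − b| exactly when A^T(Ax − b) = 0, the so-called normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))      # overdetermined: 8 equations, 3 unknowns
b = rng.standard_normal(8)

# Least squares solution from the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same solution from the library routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)
assert np.allclose(A.T @ (A @ x_normal - b), 0.0)   # residual is orthogonal to Ran(A)
print("normal equations characterize the least squares solution")
```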
This motivates the following definition.

Definition 34 The Minimum Norm Least Squares Solution to the linear system (87) is the unique element x ∈ Q such that |x| = min_{z∈Q} |z|.

While the minimum norm least squares solution is well-defined, there remains the question of how to find it. The answer lies in the Moore-Penrose Pseudo-Inverse of A.

5.5.2 Moore-Penrose Pseudo-Inverse

The Moore-Penrose Pseudo-Inverse of A is defined through its singular value decomposition (81).

Definition 35 Given any matrix A ∈ MM,N(C) with Singular Value Decomposition A = VDW*, its Moore-Penrose Pseudo-Inverse A† ∈ MN,M is defined by

A† := WD†V*   (95)

where D† ∈ MN,M is the N × M diagonal matrix D† := diag(δj) with

δj := 1/σj for j = 1, . . . , k,   (96)

and

δj := 0 for j = k + 1, . . . , L,   (97)

where L = min{M, N} and k := Rank(A).

Remark 44 Note that if M ≤ N and k = Rank(A) = M, then A† is a right inverse of A, i.e. AA† = IM, where IM denotes the M × M identity matrix. Similarly, if N ≤ M and k = Rank(A) = N, then A† is a left inverse of A, i.e. A†A = IN, the N × N identity matrix.

The principal use of the Moore-Penrose pseudo-inverse is given in the following theorem.

Theorem 18 The Minimum Norm Least Squares solution of the system of equations Ax = b is given by x = A†b.

Proof: Consider the singular value decomposition A = VDW^T and suppose Rank(A) = k. Moreover, define a := W^T x and let a have components [a] = [aj]. Also, assume V has the column structure V = [v1 . . . vM]. Observe first that

|Ax − b|^2 = |V^T(Ax − b)|^2 (Why?) = |V^T A W W^T x − V^T b|^2 = |D W^T x − V^T b|^2 = Σ_{j=1}^{k} (σj aj − vj · b)^2 + Σ_{j=k+1}^{M} (vj · b)^2.

It follows that |Ax − b| is minimized by choosing

aj = (1/σj) vj · b, for j = 1, . . . , k.   (98)

Also,

|x|^2 = |W^T x|^2 = a1^2 + . . . + ak^2 + a_{k+1}^2 + . . . + aN^2,

which, given (98), is minimized by choosing a_{k+1} = . . . = aN = 0. One now sees that (How?)

x = WD†V^T b = A†b,

thereby proving the theorem.

A natural question to ask is: How are the singular value decomposition and the spectral decomposition related? The answer is in the following remark.

Remark 45 Suppose A ∈ MN(C) is normal, i.e. AA* = A*A. Then A has a spectral decomposition A = V diag(λ1, . . . , λN) V* with V unitary and the spectrum σ(A) ⊂ C. Write λj = |λj| e^{iθj} and define the diagonal matrix D := diag(|λ1|, . . . , |λN|) and the unitary matrix W := V diag(e^{−iθ1}, . . . , e^{−iθN}) (Why is it unitary?). Then A has the singular value decomposition A = VDW*. (Verify.)

Remark 46 A bit more surprising is the following relationship between the polar decomposition and the singular value decomposition. Suppose A ∈ MN(R) with det(A) ≠ 0 has polar decomposition A = UQ with U ∈ PSym and Q ∈ Orth. Then U has a spectral decomposition U = VDV^T with V ∈ Orth, D = diag(σ1, . . . , σN) and spectrum {σ1, . . . , σN} ⊂ R+. Then A has the singular value decomposition A = VDW^T with W := Q^T V. (Verify.)

Example 18 Let A be the 1 × N matrix A := [a1 . . . aN]. Show that A has the singular value decomposition A = VDW* with V = [1] the 1 × 1 identity matrix, D = [|a| 0 . . . 0], and W = [w1 . . . wN], where w1 := a/|a|, the vectors w2, . . . , wN are chosen so that {w1, . . . , wN} forms an orthonormal basis for CN, and the N-tuple a is defined by a := (a1, . . . , aN)^T.

Exercise 5 Find the minimum norm least squares solution to the N × 1 system of equations

a1 x = b1, . . . , aN x = bN.

(Hint: To find the singular value decomposition of the N × 1 matrix A := (a1, . . . , aN)^T, first construct the SVD for its 1 × N transpose A^T = [a1 . . . aN].)
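A sketch (Python with NumPy; the data are random and illustrative) of Definition 35 and Theorem 18: build the Moore-Penrose pseudo-inverse from the SVD and check that x = A†b is the minimum norm least squares solution.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))            # underdetermined: many least squares solutions
b = rng.standard_normal(3)

V, sigma, Wh = np.linalg.svd(A)            # A = V diag(sigma) W*, with Wh = W*
k = int(np.sum(sigma > 1e-12))             # rank of A

# D^+ is the N x M diagonal matrix with entries 1/sigma_j for j <= k and 0 otherwise.
D_plus = np.zeros((A.shape[1], A.shape[0]))
D_plus[:k, :k] = np.diag(1.0 / sigma[:k])

A_plus = Wh.conj().T @ D_plus @ V.conj().T # A^+ = W D^+ V*
assert np.allclose(A_plus, np.linalg.pinv(A))

x = A_plus @ b                             # minimum norm least squares solution
assert np.allclose(A @ x, b)               # here A has full row rank, so Ax = b exactly

# Any other solution x + n with n in Nul(A) has norm at least |x|.
n = np.ones(5) - A_plus @ (A @ np.ones(5)) # a vector in Nul(A), since A^+ A projects onto Ran(A^T)
assert np.linalg.norm(x) <= np.linalg.norm(x + n) + 1e-12
print("pseudo-inverse and minimum norm solution verified")
```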
6 Rayleigh-Ritz Theory

Thus far, eigenvalues and eigenvectors have been studied from an algebraic and geometric perspective. However, they also have a variational characterization that is revealed in the Rayleigh-Ritz Theory. The first result considered is the foundational Rayleigh-Ritz Theorem. (Unless otherwise indicated, throughout this section (V, ⟨·, ·⟩) is assumed to be a complex inner product space.)

Theorem 19 Let L ∈ Lin(V) be self-adjoint, i.e. L* = L, and let its (real) eigenvalues be ordered λ1 ≤ λ2 ≤ . . . ≤ λN. Then

λ1 ≤ ⟨Lv, v⟩ ≤ λN for all v ∈ V with |v| = 1,   (99)
λmax = λN = sup_{|v|=1} ⟨Lv, v⟩,   (100)
λmin = λ1 = inf_{|v|=1} ⟨Lv, v⟩.   (101)

Proof: By the Spectral Theorem, there is an orthonormal basis {b1, . . . , bN} for V consisting of eigenvectors of L satisfying Lbj = λj bj for j = 1, . . . , N. It follows that for every unit vector v ∈ V,

⟨Lv, v⟩ = Σ_{j=1}^{N} λj |⟨bj, v⟩|^2 (Why?)   (102)

and

|⟨b1, v⟩|^2 + . . . + |⟨bN, v⟩|^2 = 1. (Why?)   (103)

Equation (102) says that the quadratic form ⟨Lv, v⟩ for unit vectors v is a weighted average of the eigenvalues of L. One then immediately deduces the inequalities (99). (How?) Moreover, choosing v = bN and v = b1 gives (100) and (101), respectively. (How?)

The Rayleigh-Ritz theorem gives a variational characterization of the smallest and largest eigenvalues of a self-adjoint linear transformation. However, an easy extension of the theorem characterizes all of the eigenvalues as solutions to constrained variational problems.

Theorem 20 With the same hypotheses and notation as the previous theorem, the eigenvalues λ1 ≤ . . . ≤ λN of L satisfy

λk = min_{|v|=1, v⊥{b1,...,b_{k−1}}} ⟨Lv, v⟩   (104)

and

λ_{N−k} = max_{|v|=1, v⊥{bN,...,b_{N−k+1}}} ⟨Lv, v⟩.   (105)

Proof: The proof of the previous theorem is easily generalized to prove (104,105). To that end, suppose v is a unit vector orthogonal to the first k − 1 eigenvectors, {b1, . . . , b_{k−1}}. Then one easily shows (How?) that

|⟨v, bk⟩|^2 + . . . + |⟨v, bN⟩|^2 = 1

and

⟨Lv, v⟩ = Σ_{j=1}^{N} λj |⟨v, bj⟩|^2 = Σ_{j=k}^{N} λj |⟨v, bj⟩|^2 ≥ λk. (Why?)   (106)

But one also has

bk ⊥ {b1, . . . , b_{k−1}} and ⟨Lbk, bk⟩ = λk.   (107)

The result (104) now follows readily from (106,107). The proof of (105) is similar and left as an exercise.
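A quick numerical sketch of Theorems 19 and 20 (Python with NumPy; the symmetric matrix is random test data): the Rayleigh quotient of a unit vector lies in [λ1, λN], and constraining v to be orthogonal to b1, . . . , b_{k−1} raises its minimum to λk.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
L = 0.5 * (M + M.T)                       # a random symmetric (self-adjoint) matrix

lam, B = np.linalg.eigh(L)                # ascending eigenvalues, orthonormal columns b_j

# (99): lambda_1 <= <Lv, v> <= lambda_N for random unit vectors v.
for _ in range(200):
    v = rng.standard_normal(6)
    v /= np.linalg.norm(v)
    r = v @ L @ v
    assert lam[0] - 1e-10 <= r <= lam[-1] + 1e-10

# (104) with k = 3: restrict v to the orthogonal complement of {b_1, b_2}; the
# minimum of the Rayleigh quotient over that subspace is lambda_3.
k = 3
C = B[:, k-1:]                            # orthonormal basis of {b_1,...,b_{k-1}}^perp
restricted = C.T @ L @ C                  # L restricted to that subspace
assert np.isclose(np.linalg.eigvalsh(restricted)[0], lam[k-1])
print("Rayleigh-Ritz bounds verified")
```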
The Rayleigh-Ritz theory is a powerful theoretical tool for studying the eigenvalues of a self-adjoint linear transformation. However, the necessity of knowing the eigenvectors explicitly in order to estimate eigenvalues other than the smallest and largest seriously limits its use as a practical tool. This limitation can be avoided through the following generalization, known as the Courant-Fischer Theorem, which gives a min-max and a max-min characterization of the eigenvalues of a self-adjoint linear transformation.

Theorem 21 With the same hypotheses and notation of the previous two theorems, the eigenvalues λ1, . . . , λN of L satisfy

λk = min_{w1,...,w_{N−k}} max_{|v|=1, v⊥{w1,...,w_{N−k}}} ⟨Lv, v⟩   (108)

and

λk = max_{w1,...,w_{k−1}} min_{|v|=1, v⊥{w1,...,w_{k−1}}} ⟨Lv, v⟩.   (109)

Proof: From the Spectral Theorem it follows that for any unit vector v ∈ V,

v = b1⟨v, b1⟩ + . . . + bN⟨v, bN⟩,
Lv = λ1 b1⟨v, b1⟩ + . . . + λN bN⟨v, bN⟩,
1 = |⟨v, b1⟩|^2 + . . . + |⟨v, bN⟩|^2,

and hence that

⟨Lv, v⟩ = λ1 |⟨v, b1⟩|^2 + . . . + λN |⟨v, bN⟩|^2.

That (108,109) are true for k = 1 or N follows immediately from the Rayleigh-Ritz Theorem. As a special case, consider first k = N − 1. Then (108) takes the form

λ_{N−1} = min_{w} max_{|v|=1, v⊥w} ⟨Lv, v⟩.   (110)

The idea now is to choose, amongst all the unit vectors v orthogonal to a given vector w, only those lying in the plane orthogonal to {b1, . . . , b_{N−2}}. Such unit vectors v must satisfy

1 = |⟨v, b_{N−1}⟩|^2 + |⟨v, bN⟩|^2, (Why?)
v = b_{N−1}⟨v, b_{N−1}⟩ + bN⟨v, bN⟩, (Why?)
⟨Lv, v⟩ = λ_{N−1} |⟨v, b_{N−1}⟩|^2 + λN |⟨v, bN⟩|^2. (Why?)

It follows that (Why?)

max_{|v|=1, v⊥w} ⟨Lv, v⟩ ≥ max_{v⊥w, 1=|⟨v, b_{N−1}⟩|^2+|⟨v, bN⟩|^2} ( λ_{N−1} |⟨v, b_{N−1}⟩|^2 + λN |⟨v, bN⟩|^2 ) ≥ λ_{N−1}.   (111)

Since (111) holds for all vectors w, one concludes that

min_{w} max_{|v|=1, v⊥w} ⟨Lv, v⟩ ≥ λ_{N−1}.   (112)

The desired result (110) now follows from (112) and the observation that

max_{|v|=1, v⊥bN} ⟨Lv, v⟩ = λ_{N−1},

due to the Rayleigh-Ritz theorem.

The proof of (108) for all eigenvalues λk proceeds along the same lines as that just given for λ_{N−1}. More specifically, one chooses, amongst all the unit vectors v orthogonal to a given set of N − k vectors {w1, . . . , w_{N−k}}, only those orthogonal to {b1, . . . , b_{k−1}} (Why can this be done?). Then such v must satisfy

1 = |⟨v, bk⟩|^2 + . . . + |⟨v, bN⟩|^2,
v = bk⟨v, bk⟩ + . . . + bN⟨v, bN⟩,
⟨Lv, v⟩ = λk |⟨v, bk⟩|^2 + . . . + λN |⟨v, bN⟩|^2.

It follows that (Why?)

max_{|v|=1, v⊥{w1,...,w_{N−k}}} ⟨Lv, v⟩ ≥ max_{v⊥{w1,...,w_{N−k}}, 1=|⟨v, bk⟩|^2+...+|⟨v, bN⟩|^2} ( λk |⟨v, bk⟩|^2 + . . . + λN |⟨v, bN⟩|^2 ) ≥ λk.   (113)

The desired result (108) now follows from (113) and the fact that

max_{v⊥{b_{k+1},...,bN}, 1=|⟨v, bk⟩|^2+...+|⟨v, bN⟩|^2} ( λk |⟨v, bk⟩|^2 + . . . + λN |⟨v, bN⟩|^2 ) = λk,

due to the Rayleigh-Ritz Theorem. The proof of (109) proceeds along similar lines and is left as an exercise.

Remark 47 The Rayleigh-Ritz Theorem (105) says that the k-th eigenvalue, λk, is the maximum value of the Rayleigh Quotient

⟨Lv, v⟩ / ⟨v, v⟩   (114)

over the k-dimensional subspace {b_{k+1}, . . . , bN}⊥ = span{b1, . . . , bk}. The Courant-Fischer Theorem (108) says that the maximum of the Rayleigh quotient (114) over an arbitrary k-dimensional subspace is always at least as large as λk.
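Remark 47 admits a simple numerical check of one direction of the Courant-Fischer bound. The sketch below (Python with NumPy; data random, and only the lower-bound half of (108) is sampled) verifies that the maximum of the Rayleigh quotient over span{b1, . . . , bk} is exactly λk, while over randomly chosen k-dimensional subspaces it is at least λk.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 7, 3
M = rng.standard_normal((N, N))
L = 0.5 * (M + M.T)
lam, B = np.linalg.eigh(L)                    # lambda_1 <= ... <= lambda_N

def max_rayleigh_over(subspace_basis, L):
    """Max of <Lv,v>/<v,v> over a subspace = largest eigenvalue of the restriction."""
    Q, _ = np.linalg.qr(subspace_basis)       # orthonormalize the spanning columns
    return np.linalg.eigvalsh(Q.T @ L @ Q)[-1]

# Equality on span{b_1, ..., b_k}:
assert np.isclose(max_rayleigh_over(B[:, :k], L), lam[k-1])

# ">= lambda_k" on random k-dimensional subspaces:
for _ in range(200):
    S = rng.standard_normal((N, k))           # random spanning set (full rank a.s.)
    assert max_rayleigh_over(S, L) >= lam[k-1] - 1e-10
print("Courant-Fischer lower bound verified")
```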