THE GRAM-SCHMIDT PROCESS

MATH 5316, FALL 2012

LANCE D. DRAGER

Version Time-stamp: "2012-11-29 16:58:23 drager".

1. The Gram-Schmidt Process

Algorithm 1.1. Let $V$ be an inner product space over $\mathbb{K}$. Let $v_1, v_2, \dots, v_n$ be a basis of $V$. The Gram-Schmidt Process below constructs an orthonormal basis $u_1, u_2, \dots, u_n$ such that
\[
\operatorname{span}(u_1, u_2, \dots, u_k) = \operatorname{span}(v_1, v_2, \dots, v_k), \qquad k = 1, 2, \dots, n.
\]

Introduce the notation
\[
U_k = \operatorname{span}(u_1, u_2, \dots, u_k), \qquad
M_k = \operatorname{span}(v_1, v_2, \dots, v_k).
\]

Recall that if $S \subseteq V$ is a subspace with orthonormal basis $u_1, u_2, \dots, u_k$, and $v \in V$, we can write $v$ uniquely as $v = w + p$, where $w \in S$ and $p$ is orthogonal to $S$. Specifically,
\[
w = \operatorname{proj}_S(v) = \sum_{j=1}^{k} \langle u_j, v \rangle u_j, \qquad
p = \operatorname{proj}_S^{\perp}(v) = v - \operatorname{proj}_S(v).
\]

To start the inductive construction, let
\[
u_1' = v_1, \qquad u_1 = u_1' / \|u_1'\|.
\]
It should be clear that $U_1 = M_1$, and we record the fact that $v_1 = \|u_1'\| u_1$.

For the next step, we define
\[
u_2' = \operatorname{proj}_{U_1}^{\perp}(v_2) = v_2 - \operatorname{proj}_{U_1}(v_2) = v_2 - \langle u_1, v_2 \rangle u_1, \qquad
u_2 = u_2' / \|u_2'\|.
\]

Clearly $u_2'$, and hence $u_2$, is orthogonal to $u_1$. If $u_2' = 0$, then $v_2 \in U_1 = M_1$, which implies that $v_1$ and $v_2$ are dependent, contrary to our assumption. Thus $u_2' \neq 0$ and the definition of $u_2$ is legitimate.

Since $u_1 \in U_1 = M_1$, we can say that $u_2' \in \operatorname{span}(v_1, v_2) = M_2$, and $u_1 \in M_1 \subseteq M_2$, so $\operatorname{span}(u_1, u_2') \subseteq M_2$. On the other hand, we have
\[
v_2 = u_2' + \langle u_1, v_2 \rangle u_1 \in \operatorname{span}(u_1, u_2'),
\]
and $v_1 \in M_1 = U_1 = \operatorname{span}(u_1)$. Thus $M_2 = \operatorname{span}(v_1, v_2) \subseteq \operatorname{span}(u_1, u_2')$, and we conclude that $\operatorname{span}(u_1, u_2') = \operatorname{span}(v_1, v_2) = M_2$. Since $u_2$ is just a scalar multiple of $u_2'$, $\operatorname{span}(u_1, u_2') = \operatorname{span}(u_1, u_2) = U_2$. Thus $U_2 = M_2$. We record the equation
\[
v_2 = \|u_2'\| u_2 + \langle u_1, v_2 \rangle u_1.
\]

For the next step, we define
\[
u_3' = \operatorname{proj}_{U_2}^{\perp}(v_3) = v_3 - \operatorname{proj}_{U_2}(v_3) = v_3 - \langle u_1, v_3 \rangle u_1 - \langle u_2, v_3 \rangle u_2, \qquad
u_3 = u_3' / \|u_3'\|,
\]
and we can prove $U_3 = M_3$.
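Read computationally, the construction so far is a loop: subtract from each $v_k$ its projections onto the $u$'s already found, then normalize. A minimal numerical sketch in Python (NumPy assumed; the standard dot product on $\mathbb{R}^3$ plays the role of $\langle \cdot, \cdot \rangle$; the function name `gram_schmidt` is ours, not from the notes):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors in R^n.

    Mirrors the construction above: u'_k = v_k - proj_{U_{k-1}}(v_k),
    then u_k = u'_k / ||u'_k||.
    """
    us = []
    for v in vectors:
        # subtract the projection onto the span of the previous u's
        u_prime = v - sum(np.dot(u, v) * u for u in us)
        norm = np.linalg.norm(u_prime)
        if norm < 1e-12:
            # u'_k = 0 would mean v_k lies in U_{k-1} = M_{k-1}
            raise ValueError("input vectors are linearly dependent")
        us.append(u_prime / norm)
    return us

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
us = gram_schmidt(vs)
```

Afterwards the `us` are pairwise orthogonal unit vectors with $\operatorname{span}(u_1, \dots, u_k) = \operatorname{span}(v_1, \dots, v_k)$ for each $k$, matching the algorithm's guarantee.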
The reader should check this as an exercise, but we'll do the general case in a moment.

For the inductive step, suppose that we have constructed an orthonormal list $u_1, u_2, \dots, u_\ell$ so that $U_j = M_j$ for $j = 1, 2, \dots, \ell$. We define
\[
u_{\ell+1}' = \operatorname{proj}_{U_\ell}^{\perp}(v_{\ell+1}) = v_{\ell+1} - \operatorname{proj}_{U_\ell}(v_{\ell+1})
= v_{\ell+1} - \sum_{j=1}^{\ell} \langle u_j, v_{\ell+1} \rangle u_j, \qquad
u_{\ell+1} = u_{\ell+1}' / \|u_{\ell+1}'\|.
\]
If we had $u_{\ell+1}' = 0$, we would have $v_{\ell+1} \in U_\ell = M_\ell$, which would contradict the independence of $v_1, v_2, \dots, v_n$. Thus, our definition of $u_{\ell+1}$ is legitimate.

Since $U_\ell = M_\ell$, we have $u_{\ell+1}' \in M_{\ell+1}$, and we already know that $u_1, \dots, u_\ell$ are in $M_\ell \subseteq M_{\ell+1}$. Thus,
\[
\operatorname{span}(u_1, u_2, \dots, u_\ell, u_{\ell+1}') \subseteq M_{\ell+1}.
\]
On the other hand,
\[
v_{\ell+1} = u_{\ell+1}' + \sum_{j=1}^{\ell} \langle u_j, v_{\ell+1} \rangle u_j,
\]
so $v_{\ell+1} \in \operatorname{span}(u_1, u_2, \dots, u_\ell, u_{\ell+1}')$. We already know $v_1, \dots, v_\ell \in \operatorname{span}(u_1, \dots, u_\ell) = U_\ell = M_\ell$. Thus,
\[
\operatorname{span}(v_1, v_2, \dots, v_{\ell+1}) \subseteq \operatorname{span}(u_1, u_2, \dots, u_\ell, u_{\ell+1}').
\]
Thus, $\operatorname{span}(u_1, u_2, \dots, u_\ell, u_{\ell+1}') = M_{\ell+1}$. Since $u_{\ell+1}$ is just a scalar multiple of $u_{\ell+1}'$, we conclude that $U_{\ell+1} = M_{\ell+1}$. We record the fact that
\[
v_{\ell+1} = \|u_{\ell+1}'\| u_{\ell+1} + \sum_{j=1}^{\ell} \langle u_j, v_{\ell+1} \rangle u_j.
\]

Continuing this inductive construction, we arrive at $u_1, u_2, \dots, u_n$, as stated in the algorithm.

Corollary 1.2. Every finite dimensional inner product space over $\mathbb{K}$ has an orthonormal basis.

In the rest of these notes, we'll work out some consequences of the Gram-Schmidt algorithm.

2. Adjoint Transformations

First, we prove a classic theorem, which is simple in this case.

Theorem 2.1 (Riesz Representation Theorem). Let $V$ be an inner product space over $\mathbb{K}$. Let $f \colon V \to \mathbb{K}$ be a linear map. Then there is a unique vector $w \in V$ so that
\[
f(v) = \langle w, v \rangle, \qquad \forall v \in V.
\]
Briefly, $f = \langle w, \cdot \rangle$.

Remark 2.2. By tradition, a linear map $f \colon V \to \mathbb{K}$ is called a linear functional.

Proof of Theorem. Let $f$ be a linear functional on $V$.
Choose an orthonormal basis $u_1, u_2, \dots, u_n$, where $n = \dim(V)$. Define a vector $w \in V$ by
\[
w = \overline{f(u_1)}\, u_1 + \overline{f(u_2)}\, u_2 + \dots + \overline{f(u_n)}\, u_n.
\]
Then we have
\[
\langle w, u_k \rangle
= \Bigl\langle \sum_{j=1}^{n} \overline{f(u_j)}\, u_j,\; u_k \Bigr\rangle
= \sum_{j=1}^{n} f(u_j) \langle u_j, u_k \rangle
= \sum_{j=1}^{n} f(u_j) \delta_{jk}
= f(u_k).
\]
Since $k$ was arbitrary, we conclude $f(u_k) = \langle w, u_k \rangle$ for $k = 1, 2, \dots, n$. Thus, the linear maps $f$ and $\langle w, \cdot \rangle$ agree on a basis, so they must be the same.

To prove the vector $w$ is unique, suppose that $\langle w_1, \cdot \rangle = f = \langle w_2, \cdot \rangle$. Then, for all $v \in V$, we have
\[
\langle w_1, v \rangle = \langle w_2, v \rangle
\implies \langle w_1, v \rangle - \langle w_2, v \rangle = 0
\implies \langle w_1 - w_2, v \rangle = 0.
\]
Taking $v = w_1 - w_2$ gives $\|w_1 - w_2\|^2 = 0$. Thus, $w_1 - w_2 = 0$.

Theorem 2.3. Let $V$ and $W$ be inner product spaces over $\mathbb{K}$ and let $T \colon V \to W$ be a linear map. Then there is a unique linear map $S \colon W \to V$ so that
\[
\langle w, T(v) \rangle_W = \langle S(w), v \rangle_V.
\]
Here $\langle \cdot, \cdot \rangle_V$ is the inner product on $V$ and $\langle \cdot, \cdot \rangle_W$ is the inner product on $W$.

Remark 2.4. Usually we'll drop the subscripts on the inner products, which should be clear from context, unless it seems particularly useful to show the distinction.

Proof of Theorem. If we fix a vector $w \in W$, then the map $v \mapsto \langle w, T(v) \rangle$ is a linear functional on $V$. By the Riesz Representation Theorem, there is a unique vector $u \in V$ so that $\langle w, T(v) \rangle = \langle u, v \rangle$. Since $u$ is determined by $w$, there is a unique function $S \colon W \to V$ that sends $w$ to the corresponding vector $u$. Thus,
\[
\langle w, T(v) \rangle = \langle S(w), v \rangle
\]
for all $v$ and $w$. It remains to prove that this function $S$ is linear.

To do this, let $w_1$ and $w_2$ be vectors in $W$ and let $\alpha$ and $\beta$ be scalars. Consider $\langle \alpha w_1 + \beta w_2, T(v) \rangle$. On the one hand,
\[
\langle \alpha w_1 + \beta w_2, T(v) \rangle = \langle S(\alpha w_1 + \beta w_2), v \rangle.
\]
On the other hand,
\[
\langle \alpha w_1 + \beta w_2, T(v) \rangle
= \bar{\alpha} \langle w_1, T(v) \rangle + \bar{\beta} \langle w_2, T(v) \rangle
= \bar{\alpha} \langle S(w_1), v \rangle + \bar{\beta} \langle S(w_2), v \rangle
= \langle \alpha S(w_1) + \beta S(w_2), v \rangle.
\]
Thus,
\[
\langle S(\alpha w_1 + \beta w_2), v \rangle = \langle \alpha S(w_1) + \beta S(w_2), v \rangle.
\]
Since $v$ is arbitrary, we conclude that
\[
S(\alpha w_1 + \beta w_2) = \alpha S(w_1) + \beta S(w_2).
\]
We've now shown that $S$ is linear, and the proof is complete.

Definition 2.5. The unique linear transformation $S$ in Theorem 2.3 will be denoted $T^*$. We call $T^*$ the adjoint of $T$.

The reader is strongly advised to write out the details of the following Theorem.

Theorem 2.6. The operation $T \mapsto T^*$ has the following properties.
(1) $(T^*)^* = T$.
(2) $(\alpha S + \beta T)^* = \bar{\alpha} S^* + \bar{\beta} T^*$.
(3) $(ST)^* = T^* S^*$. Note that the order reverses.

Exercise 2.7. Prove the following.
(1) $T$ is injective if and only if $T^*$ is surjective.
(2) $T$ is surjective if and only if $T^*$ is injective.

Let's examine what happens in the case of the standard inner products on $\mathbb{K}^n$. The usual inner product is
\[
\langle x, y \rangle = \sum_{j=1}^{n} \bar{x}_j y_j,
\]
where
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]
are column vectors. If $A$ is an $m \times n$ matrix over $\mathbb{K}$, with entries $A = [a_{ij}]$, we define the $n \times m$ matrix $A^* = [b_{ij}]$ by $b_{ij} = \bar{a}_{ji}$. In other words, $A^* = (\bar{A})^t = \overline{(A^t)}$, where we take the conjugate of each entry in the matrix and then take the transpose. In the case $\mathbb{K} = \mathbb{R}$, $A^* = A^t$.

Remark 2.8. One convention sometimes used is to write $A^T$ for the transpose and $A^H$ for $A^*$ (H for Hermitian transpose).

Exercise 2.9. Show that the operation on matrices sending $A$ to $A^*$ has the properties
(1) $(A^*)^* = A$.
(2) $(\alpha A + \beta B)^* = \bar{\alpha} A^* + \bar{\beta} B^*$.
(3) $(AB)^* = B^* A^*$.

With this definition, we can write
\[
\langle x, y \rangle = \sum_{j=1}^{n} \bar{x}_j y_j
= \begin{bmatrix} \bar{x}_1 & \bar{x}_2 & \dots & \bar{x}_n \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= x^* y.
\]

Let $A$ be an $m \times n$ matrix, which we can think of as defining a linear transformation $\mathbb{K}^n \to \mathbb{K}^m$. We then have
\[
\langle A x, y \rangle = (A x)^* y = (x^* A^*) y = x^* (A^* y) = \langle x, A^* y \rangle.
\]
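This identity is easy to check numerically. A small sketch in Python (NumPy assumed), using the complex inner product $\langle x, y \rangle = \sum_j \bar{x}_j y_j$, which is conjugate-linear in its first slot:

```python
import numpy as np

rng = np.random.default_rng(0)

# a random complex matrix A : C^3 -> C^2, and test vectors x, y
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)

def inner(a, b):
    # <a, b> = sum_j conj(a_j) b_j
    return np.sum(np.conj(a) * b)

A_star = A.conj().T  # the adjoint matrix: conjugate transpose

lhs = inner(A @ x, y)       # <Ax, y>   in C^2
rhs = inner(x, A_star @ y)  # <x, A*y>  in C^3
```

Up to floating-point roundoff, `lhs` and `rhs` agree, matching the computation $\langle Ax, y \rangle = (Ax)^* y = x^* (A^* y) = \langle x, A^* y \rangle$ above.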
In other words, if $T \colon \mathbb{K}^n \to \mathbb{K}^m \colon x \mapsto Ax$ is the transformation given by multiplication by $A$, then $T^* \colon \mathbb{K}^m \to \mathbb{K}^n$, the adjoint of $T$, is given by multiplication by $A^*$ (so our notation should not cause any confusion).

We can carry this idea further to general vector spaces.

Theorem 2.10. Let $V$ be an inner product space over $\mathbb{K}$ and let $n = \dim(V)$. Let $\mathcal{V}$ be an orthonormal basis of $V$. Recall that the coordinate map $c_{\mathcal{V}} \colon V \to \mathbb{K}^n$ sends $v$ to its coordinate vector, denoted $[v]_{\mathcal{V}}$, with respect to $\mathcal{V}$. In other words, $[v]_{\mathcal{V}}$ is the unique column vector so that $v = \mathcal{V} [v]_{\mathcal{V}}$. Then we have
\[
\langle v, w \rangle_V = \langle [v]_{\mathcal{V}}, [w]_{\mathcal{V}} \rangle_{\mathbb{K}^n} = [v]_{\mathcal{V}}^* [w]_{\mathcal{V}}.
\]
Perhaps it will help to say that the following diagram commutes: applying the map $c_{\mathcal{V}} \times c_{\mathcal{V}} \colon (v, w) \mapsto ([v]_{\mathcal{V}}, [w]_{\mathcal{V}})$ from $V \times V$ to $\mathbb{K}^n \times \mathbb{K}^n$ and then the inner product $\langle \cdot, \cdot \rangle_{\mathbb{K}^n}$ gives the same element of $\mathbb{K}$ as applying $\langle \cdot, \cdot \rangle_V$ directly.

Proof of Theorem. Our orthonormal basis is $\mathcal{V} = \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix}$. If $[v]_{\mathcal{V}} = x$, then
\[
v = x_1 v_1 + x_2 v_2 + \dots + x_n v_n.
\]
Similarly, if $[w]_{\mathcal{V}} = y$,
\[
w = y_1 v_1 + y_2 v_2 + \dots + y_n v_n.
\]
Then,
\[
\langle v, w \rangle
= \Bigl\langle \sum_{i=1}^{n} x_i v_i,\; \sum_{j=1}^{n} y_j v_j \Bigr\rangle
= \sum_{i=1}^{n} \sum_{j=1}^{n} \bar{x}_i y_j \langle v_i, v_j \rangle
= \sum_{i=1}^{n} \sum_{j=1}^{n} \bar{x}_i y_j \delta_{ij}
= \sum_{i=1}^{n} \bar{x}_i y_i
= \langle x, y \rangle
= \langle [v]_{\mathcal{V}}, [w]_{\mathcal{V}} \rangle.
\]

Theorem 2.11. Let $V$ and $W$ be inner product spaces over $\mathbb{K}$. Let $n = \dim(V)$ and $m = \dim(W)$. Choose orthonormal bases $\mathcal{V}$ for $V$ and $\mathcal{W}$ for $W$. Let $T \colon V \to W$ be a linear transformation and let $A = [T]_{\mathcal{V}\mathcal{W}}$ be the matrix of $T$ with respect to our chosen bases. Then the matrix of $T^* \colon W \to V$ is $A^*$, i.e.,
\[
[T^*]_{\mathcal{W}\mathcal{V}} = [T]_{\mathcal{V}\mathcal{W}}^*.
\]
To put it yet another way, if $v \in V$ and $w \in W$, then
\[
(2.1) \qquad
\langle T(v), w \rangle_W
= \langle A [v]_{\mathcal{V}}, [w]_{\mathcal{W}} \rangle_{\mathbb{K}^m}
= \bigl( A [v]_{\mathcal{V}} \bigr)^* [w]_{\mathcal{W}}
= [v]_{\mathcal{V}}^* \bigl( A^* [w]_{\mathcal{W}} \bigr)
= \langle [v]_{\mathcal{V}}, A^* [w]_{\mathcal{W}} \rangle_{\mathbb{K}^n}
= \langle v, T^*(w) \rangle_V.
\]

Remark 2.12. Warning! Warning! The last theorem only works if you choose orthonormal bases.

Proof of Theorem. The manipulations in (2.1) are straightforward.
We just have to show that (2.1) implies that $[T^*]_{\mathcal{W}\mathcal{V}} = A^*$. Let's focus on
\[
\langle [v]_{\mathcal{V}}, A^* [w]_{\mathcal{W}} \rangle_{\mathbb{K}^n} = \langle v, T^*(w) \rangle_V.
\]
For notational convenience, let $C = [T^*]_{\mathcal{W}\mathcal{V}}$. By Theorem 2.10,
\[
\langle v, T^*(w) \rangle_V
= \langle [v]_{\mathcal{V}}, [T^*(w)]_{\mathcal{V}} \rangle_{\mathbb{K}^n}
= \langle [v]_{\mathcal{V}}, [T^*]_{\mathcal{W}\mathcal{V}} [w]_{\mathcal{W}} \rangle_{\mathbb{K}^n}
= \langle [v]_{\mathcal{V}}, C [w]_{\mathcal{W}} \rangle_{\mathbb{K}^n}.
\]
Thus,
\[
\langle [v]_{\mathcal{V}}, C [w]_{\mathcal{W}} \rangle_{\mathbb{K}^n} = \langle [v]_{\mathcal{V}}, A^* [w]_{\mathcal{W}} \rangle_{\mathbb{K}^n}
\]
for all vectors $v \in V$ and $w \in W$. But we can make $[v]_{\mathcal{V}}$ and $[w]_{\mathcal{W}}$ any vectors we want by an appropriate choice of $v$ and $w$. Thus, we must have $\langle x, A^* y \rangle = \langle x, C y \rangle$ for all vectors $x \in \mathbb{K}^n$ and $y \in \mathbb{K}^m$. If we fix $y$, we have $\langle x, A^* y - C y \rangle = 0$ for all $x$, which implies $A^* y - C y = 0$. Thus $A^* y = C y$ for all $y$, which we know from previous work implies $A^* = C$. Our proof is complete.

3. Orthogonal Decompositions

Let $V$ be an inner product space over $\mathbb{K}$. If $S \subseteq V$ is any set, we define
\[
S^{\perp} = \{\, v \in V \mid \forall x \in S,\ \langle v, x \rangle = 0 \,\},
\]
i.e., the set of vectors that are orthogonal to everything in $S$.

Theorem 3.1. Let $V$ be an inner product space over $\mathbb{K}$ and let $S \subseteq V$ be any set.
(1) $S^{\perp}$ is a subspace of $V$.
(2) For any set $S \subseteq V$, $S^{\perp} = \operatorname{span}(S)^{\perp}$.
(3) If $W = \operatorname{span}(s_1, s_2, \dots, s_k)$, then $v \in W^{\perp}$ if and only if $\langle s_j, v \rangle = 0$ for $j = 1, 2, \dots, k$.

Proof. For the first part, note first that $0 \in S^{\perp}$. To show that $S^{\perp}$ is closed under addition and scalar multiplication, let $v_1, v_2 \in S^{\perp}$ and let $c_1, c_2 \in \mathbb{K}$. For any $x \in S$, we have
\[
\langle x, c_1 v_1 + c_2 v_2 \rangle = c_1 \langle x, v_1 \rangle + c_2 \langle x, v_2 \rangle = c_1 0 + c_2 0 = 0,
\]
so $c_1 v_1 + c_2 v_2 \in S^{\perp}$.

Since we didn't say that $S$ is finite, we should add that $\operatorname{span}(S)$ is defined to be the set of all finite linear combinations of elements of $S$, i.e., all sums of the form
\[
c_1 x_1 + c_2 x_2 + \dots + c_k x_k,
\]
where $x_1, x_2, \dots, x_k \in S$ and the $c_j$'s are scalars. See Exercise 3.2. Clearly, $S \subseteq \operatorname{span}(S)$, since if $x \in S$, then $x = 1x \in \operatorname{span}(S)$.

Consider the second statement in the Theorem. We first show that $\operatorname{span}(S)^{\perp} \subseteq S^{\perp}$.
To do this, suppose that $v \in \operatorname{span}(S)^{\perp}$. This means that $\langle v, s \rangle = 0$ for all $s \in \operatorname{span}(S)$. But $S \subseteq \operatorname{span}(S)$, so $\langle v, x \rangle = 0$ for all $x \in S$. Thus $v \in S^{\perp}$. See Exercise 3.3.

Secondly, we show the other inclusion $S^{\perp} \subseteq \operatorname{span}(S)^{\perp}$. Suppose $v \in S^{\perp}$, which means $\langle v, x \rangle = 0$ for all $x \in S$. If $s \in \operatorname{span}(S)$, then
\[
s = c_1 x_1 + c_2 x_2 + \dots + c_k x_k
\]
for some $x_j \in S$ and scalars $c_j$. But then
\[
\langle v, s \rangle = \Bigl\langle v, \sum_{j=1}^{k} c_j x_j \Bigr\rangle
= \sum_{j=1}^{k} c_j \langle v, x_j \rangle
= \sum_{j=1}^{k} c_j 0 = 0.
\]
Thus, $v \in \operatorname{span}(S)^{\perp}$.

The proof of the third statement is very similar to the proof of the second statement and is left as (yet another) exercise.

Exercise 3.2. Suppose that $S \subseteq V$. Show that $\operatorname{span}(S)$, as defined in the proof, is a subspace. Show $\operatorname{span}(S)$ is the smallest subspace containing $S$. Show that if $S$ is finite, $\operatorname{span}(S)$ is the span of finitely many vectors as we have previously defined it.

The next exercise will (probably) be used later.

Exercise 3.3. Let $R \subseteq S \subseteq V$, where $V$ is an inner product space. Then $S^{\perp} \subseteq R^{\perp}$.

Theorem 3.4 (Orthogonal Decomposition Theorem). Let $V$ be an inner product space of dimension $n$ over $\mathbb{K}$. If $W$ is a subspace of $V$, then
\[
V = W \oplus W^{\perp}.
\]
It follows that $W^{\perp\perp} = W$.

Proof. Let the dimension of $W$ be $k$. Choose any basis $v_1, v_2, \dots, v_k$ of $W$. We can complete this linearly independent set to a basis
\[
v_1, v_2, \dots, v_k, v_{k+1}, \dots, v_n
\]
of $V$. Apply the Gram-Schmidt process to the basis $v_1, v_2, \dots, v_n$ to get an orthonormal basis $u_1, u_2, \dots, u_n$ of $V$. By the properties of the Gram-Schmidt process,
\[
\operatorname{span}(u_1, u_2, \dots, u_k) = \operatorname{span}(v_1, v_2, \dots, v_k) = W,
\]
so $u_1, u_2, \dots, u_k$ is an orthonormal basis of $W$.

Define $X = \operatorname{span}(u_{k+1}, u_{k+2}, \dots, u_n)$. We have, of course, $V = W \oplus X$ (if that's not obvious, check the definition of direct sum). We claim that $X = W^{\perp}$.

To see this, first suppose that $w \in X$. Then we have
\[
w = \sum_{j=1}^{n-k} c_{k+j} u_{k+j}
\]
for some scalars $c_{k+j}$. Consider $u_i$, where $i \in \{1, \dots, k\}$. We have
\[
\langle u_i, w \rangle
= \Bigl\langle u_i, \sum_{j=1}^{n-k} c_{k+j} u_{k+j} \Bigr\rangle
= \sum_{j=1}^{n-k} c_{k+j} \langle u_i, u_{k+j} \rangle
= \sum_{j=1}^{n-k} c_{k+j} \delta_{i,k+j}
= \sum_{j=1}^{n-k} c_{k+j}\, 0
= 0,
\]
because $i \neq k + j$. By the third statement in Theorem 3.1, we conclude that $w \in W^{\perp}$. Thus, $X \subseteq W^{\perp}$.

To do the reverse inclusion, suppose that $x \in W^{\perp}$. Since $x \in V$, we can write it in terms of our orthonormal basis; in fact, we know what the coefficients must be. We have
\[
x = \langle u_1, x \rangle u_1 + \langle u_2, x \rangle u_2 + \dots + \langle u_k, x \rangle u_k + \langle u_{k+1}, x \rangle u_{k+1} + \dots + \langle u_n, x \rangle u_n.
\]
But $x \in W^{\perp}$, so we must have $\langle u_j, x \rangle = 0$ for $j = 1, 2, \dots, k$ (since these $u_j$'s are in $W$). But then
\[
x = \langle u_{k+1}, x \rangle u_{k+1} + \dots + \langle u_n, x \rangle u_n \in X.
\]
We've now shown $W^{\perp} \subseteq X$, so $X = W^{\perp}$. We now have
\[
V = W \oplus W^{\perp}.
\]

To get the last statement of the theorem, we have to show that $W = W^{\perp\perp} = (W^{\perp})^{\perp}$. Let $v \in V$. Then we can write $v = w + p$ uniquely, where $w \in W$ and $p \in W^{\perp}$. Thus, $v \in W$ if and only if $p = 0$. We have
\[
v \in (W^{\perp})^{\perp}
\iff v \perp W^{\perp}
\iff \langle v, q \rangle = 0,\ \forall q \in W^{\perp}
\iff 0 = \langle w + p, q \rangle = \langle w, q \rangle + \langle p, q \rangle = \langle p, q \rangle,\ \forall q \in W^{\perp}
\iff p = 0, \text{ since } p \in W^{\perp}
\iff v = w
\iff v \in W.
\]
Thus, $(W^{\perp})^{\perp} = W$.

Exercise 3.5. If $S$ is just a subset of $V$, show that $(S^{\perp})^{\perp} = \operatorname{span}(S)$.

This theorem has many nice consequences.

Exercise 3.6. Let $V$ and $W$ be inner product spaces over $\mathbb{K}$ and let $T \colon V \to W$ be a linear transformation, so $T^* \colon W \to V$. Show that
\[
W = \operatorname{im}(T) \oplus \ker(T^*), \qquad
V = \operatorname{im}(T^*) \oplus \ker(T),
\]
where these are orthogonal direct sums, e.g., $\operatorname{im}(T)^{\perp} = \ker(T^*)$.

Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409-1042
E-mail address: lance.drager@ttu.edu