The Orthogonal Projection Matrix onto C(X) and the Normal Equations

Copyright 2012 Dan Nettleton (Iowa State University), Statistics 611

Theorem 2.1: $P_X$, the orthogonal projection matrix onto $\mathcal{C}(X)$, is equal to $X(X'X)^-X'$. The matrix $X(X'X)^-X'$ satisfies

(a) $[X(X'X)^-X'][X(X'X)^-X'] = X(X'X)^-X'$,

(b) $X(X'X)^-X'y \in \mathcal{C}(X) \;\; \forall\; y \in \mathbb{R}^n$,

(c) $X(X'X)^-X'x = x \;\; \forall\; x \in \mathcal{C}(X)$,

(d) $X(X'X)^-X' = [X(X'X)^-X']'$,

(e) $X(X'X)^-X'$ is the same for every generalized inverse (GI) of $X'X$.

It is useful to establish a few results before proving Theorem 2.1.

Show that if $A$ is an $n \times n$ symmetric matrix and $G$ is an $n \times n$ GI of $A$, then $G'$ is also a GI of $A$.

Proof:
$$AGA = A \;\Rightarrow\; (AGA)' = A' \;\Rightarrow\; A'G'A' = A' \;\Rightarrow\; AG'A = A \quad (\because A = A').$$
$\therefore\; G'$ is a GI of $A$.

Result 2.4: $X'XA = X'XB \iff XA = XB$.

One direction ($\Leftarrow$) is obvious. Can you prove the other ($\Rightarrow$)?

Proof ($\Rightarrow$):
$$(XA - XB)'(XA - XB) = (A'X' - B'X')(XA - XB)$$
$$= A'X'XA - A'X'XB - B'X'XA + B'X'XB$$
$$= A'X'XA - A'X'XA - B'X'XA + B'X'XA = 0.$$
$\therefore$ by Lemma A.1, $XA - XB = 0 \Rightarrow XA = XB$.

Result 2.5: Suppose $(X'X)^-$ is any GI of $X'X$. Then $(X'X)^-X'$ is a GI of $X$, i.e.,
$$X(X'X)^-X'X = X.$$

Proof of Result 2.5: Since $(X'X)^-$ is a GI of $X'X$,
$$X'X(X'X)^-X'X = X'X.$$
By Result 2.4, the result follows. (Take $A = (X'X)^-X'X$ and $B = I$. Then $XA = X(X'X)^-X'X = X = XB$.)

Corollary to Result 2.5: For $(X'X)^-$ any GI of $X'X$,
$$X'X(X'X)^-X' = X'.$$

Proof of Corollary: Suppose $(X'X)^-$ is any GI of $X'X$. $\because\; X'X$ is symmetric, $[(X'X)^-]'$ is also a GI of $X'X$. Thus Result 2.5 implies that
$$X[(X'X)^-]'X'X = X \;\Rightarrow\; \{X[(X'X)^-]'X'X\}' = X' \;\Rightarrow\; X'X(X'X)^-X' = X'.$$

Now we can prove Theorem 2.1.

First show $X(X'X)^-X'$ is idempotent:
$$X(X'X)^-X'X(X'X)^-X' = X(X'X)^-X' \quad \text{by Result 2.5.}$$

Now show that $X(X'X)^-X'y \in \mathcal{C}(X) \;\; \forall\; y \in \mathbb{R}^n$:
$$X(X'X)^-X'y = Xz, \quad \text{where } z = (X'X)^-X'y.$$
Thus, $X(X'X)^-X'y \in \mathcal{C}(X)$.

Now show that $X(X'X)^-X'z = z \;\; \forall\; z \in \mathcal{C}(X)$. If $z \in \mathcal{C}(X)$, $\exists\; c \ni z = Xc$.
$$\therefore\; X(X'X)^-X'z = X(X'X)^-X'Xc = Xc \;\;\text{(by Result 2.5)}\; = z.$$
Thus, we have $X(X'X)^-X'z = z \;\; \forall\; z \in \mathcal{C}(X)$.

Now we know $X(X'X)^-X'$ is a projection matrix onto $\mathcal{C}(X)$. To be the unique orthogonal projection matrix onto $\mathcal{C}(X)$, $X(X'X)^-X'$ must be symmetric.
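Before continuing, here is a small numerical sanity check of properties (a)-(c), which is not part of the original notes. It is a minimal sketch that uses NumPy's Moore-Penrose pseudoinverse as one convenient choice of $(X'X)^-$; the design matrix and variable names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient design: the third column is the sum of the first two,
# so X'X is singular and a generalized inverse is genuinely needed.
A = rng.normal(size=(8, 2))
X = np.column_stack([A, A[:, 0] + A[:, 1]])

G = np.linalg.pinv(X.T @ X)   # the Moore-Penrose pseudoinverse is one GI of X'X
P = X @ G @ X.T               # candidate P_X = X (X'X)^- X'

# (a) idempotent
print(np.allclose(P @ P, P))

# (b) P_X y lies in C(X) for an arbitrary y: regress P_X y on X and
#     check that the fitted vector reproduces P_X y exactly
y = rng.normal(size=8)
b, *_ = np.linalg.lstsq(X, P @ y, rcond=None)
print(np.allclose(X @ b, P @ y))

# (c) P_X x = x for any x already in C(X)
x = X @ rng.normal(size=3)
print(np.allclose(P @ x, x))
```

A rank-deficient $X$ is used deliberately, since that is exactly the case in which a generalized inverse, rather than $(X'X)^{-1}$, is required.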
To show symmetry of $X(X'X)^-X'$, it will help to first show that
$$X(X'X)_1^-X' = X(X'X)_2^-X'$$
for any two GIs of $X'X$, denoted $(X'X)_1^-$ and $(X'X)_2^-$.

By the Corollary to Result 2.5,
$$X' = X'X(X'X)_2^-X'.$$
Therefore,
$$X(X'X)_1^-X' = X(X'X)_1^-X'X(X'X)_2^-X' = X(X'X)_2^-X' \quad \text{by Result 2.5.}$$
$\therefore\; X(X'X)^-X'$ is the same regardless of which GI for $X'X$ is used.

Now show that $X(X'X)^-X'$ is symmetric:
$$[X(X'X)^-X']' = X[(X'X)^-]'X' = X(X'X)^-X',$$
$\because$ symmetry of $X'X$ implies that $[(X'X)^-]'$ is a GI of $X'X$, and $X(X'X)^-X'$ is the same regardless of the choice of GI.

We have shown that $X(X'X)^-X'$ is the orthogonal projection matrix onto $\mathcal{C}(X)$. We know that it is the only symmetric projection matrix onto $\mathcal{C}(X)$ by Result A.16.

Example: Find the orthogonal projection matrix onto $\mathcal{C}(\mathbf{1})$, where $\mathbf{1}$ is the $n \times 1$ vector of ones. Also, simplify $P_Xy$ in this case.

$X'X = \mathbf{1}'\mathbf{1} = n$. Thus $(X'X)^- = \frac{1}{n}$, and
$$X(X'X)^-X' = \mathbf{1}\,\tfrac{1}{n}\,\mathbf{1}' = \tfrac{1}{n}\mathbf{1}\mathbf{1}'.$$
Thus, when $X = \mathbf{1}$, $P_X$ is an $n \times n$ matrix whose entries are each $\tfrac{1}{n}$. We saw previously the special case
$$\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}.$$
Furthermore,
$$P_Xy = \tfrac{1}{n}\mathbf{1}\mathbf{1}'y = \mathbf{1}\,\tfrac{1}{n}\sum_{i=1}^n y_i = \mathbf{1}\bar{y} = \begin{bmatrix} \bar{y} \\ \bar{y} \\ \vdots \\ \bar{y} \end{bmatrix}.$$

By Corollary A.4 to Result A.16, we know that $I - P_X = I - X(X'X)^-X'$ is the orthogonal projection matrix onto $\mathcal{C}(X)^\perp = \mathcal{N}(X')$. This is Result 2.6.

Fitted Values and Residuals

By Result A.4, any $y \in \mathbb{R}^n$ may be written as $y = s + t$, where $s \in \mathcal{C}(X)$ and $t \in \mathcal{C}(X)^\perp = \mathcal{N}(X')$. Furthermore, the vectors $s \in \mathcal{C}(X)$ and $t \in \mathcal{C}(X)^\perp$ are unique.

Because $y = P_Xy + (I - P_X)y$, $P_Xy \in \mathcal{C}(X)$, and $(I - P_X)y \in \mathcal{C}(X)^\perp$, we get $s = P_Xy$ and $t = (I - P_X)y$. Moreover,
$$\hat{y} \equiv P_Xy = \text{the vector of fitted values, and}$$
$$\hat{\varepsilon} \equiv (I - P_X)y = y - \hat{y} = \text{the vector of residuals.}$$

Let $P_W$ and $P_X$ denote the orthogonal projection matrices onto $\mathcal{C}(W)$ and $\mathcal{C}(X)$, respectively. Suppose $\mathcal{C}(W) \subseteq \mathcal{C}(X)$. Show that $P_WP_X = P_XP_W = P_W$.

Proof: $\mathcal{C}(W) \subseteq \mathcal{C}(X) \Rightarrow \exists\; B \ni XB = W$.
$$\therefore\; P_XP_W = X(X'X)^-X'W(W'W)^-W' = X(X'X)^-X'XB(W'W)^-W' = XB(W'W)^-W' = W(W'W)^-W' = P_W.$$
Now
$$P_XP_W = P_W \Rightarrow (P_XP_W)' = P_W' \Rightarrow P_W'P_X' = P_W' \Rightarrow P_WP_X = P_W.$$
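As an aside, not from the original slides, the following NumPy sketch illustrates the results just established: the nested-projection identity $P_WP_X = P_XP_W = P_W$ with $W = \mathbf{1}$ and $X = [\mathbf{1},\, x]$, the fact that $P_{\mathbf{1}}y = \bar{y}\mathbf{1}$, and the decomposition $y = \hat{y} + \hat{\varepsilon}$ with $X'\hat{\varepsilon} = 0$. The helper name `proj` and the particular matrices are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10

# W = 1 (intercept only) and X = [1, x], so C(W) is contained in C(X).
ones = np.ones((n, 1))
x = rng.normal(size=(n, 1))
W = ones
X = np.hstack([ones, x])

def proj(M):
    """Orthogonal projection matrix onto C(M), via one choice of GI of M'M."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T

PW, PX = proj(W), proj(X)

# P_W P_X = P_X P_W = P_W when C(W) is a subspace of C(X)
print(np.allclose(PW @ PX, PW), np.allclose(PX @ PW, PW))

# Projection onto C(1) replaces every entry of y with the sample mean
y = rng.normal(size=n)
print(np.allclose(PW @ y, np.full(n, y.mean())))

# Fitted values and residuals: y = yhat + ehat, with X'ehat = 0
yhat = PX @ y
ehat = (np.eye(n) - PX) @ y
print(np.allclose(y, yhat + ehat), np.allclose(X.T @ ehat, 0))
```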
Theorem 2.2: If $\mathcal{C}(W) \subseteq \mathcal{C}(X)$, then $P_X - P_W$ is the orthogonal projection matrix onto $\mathcal{C}((I - P_W)X)$.

Proof of Theorem 2.2:
$$(P_X - P_W)' = P_X' - P_W' = P_X - P_W.$$
$\therefore\; P_X - P_W$ is symmetric.

$$(P_X - P_W)(P_X - P_W) = P_XP_X - P_WP_X - P_XP_W + P_WP_W = P_X - P_W - P_W + P_W = P_X - P_W.$$
$\therefore\; P_X - P_W$ is idempotent.

Is $(P_X - P_W)y \in \mathcal{C}((I - P_W)X) \;\; \forall\; y$?
$$(P_X - P_W)y = (P_X - P_WP_X)y = (I - P_W)P_Xy = (I - P_W)X(X'X)^-X'y \in \mathcal{C}((I - P_W)X).$$

Is $(P_X - P_W)z = z \;\; \forall\; z \in \mathcal{C}((I - P_W)X)$?
$$(P_X - P_W)(I - P_W)X = P_X(I - P_W)X - P_W(I - P_W)X$$
$$= (P_X - P_XP_W)X - (P_W - P_WP_W)X$$
$$= (P_X - P_W)X - (P_W - P_W)X$$
$$= (P_X - P_WP_X)X = (I - P_W)P_XX = (I - P_W)X.$$
Now $z \in \mathcal{C}((I - P_W)X) \Rightarrow z = (I - P_W)Xc$ for some $c$. Therefore,
$$(P_X - P_W)z = (P_X - P_W)(I - P_W)Xc = (I - P_W)Xc = z.$$

We have previously seen that
$$Q(b) = (y - Xb)'(y - Xb) = \|y - Xb\|^2 \ge \|y - P_Xy\|^2 \;\; \forall\; b \in \mathbb{R}^p,$$
with equality iff $Xb = P_Xy$.

Now we know that $P_X = X(X'X)^-X'$. Thus, $\hat{\beta}$ minimizes $Q(b)$ iff $X\hat{\beta} = X(X'X)^-X'y$. By Result 2.4, this equation is equivalent to
$$X'X\hat{\beta} = X'X(X'X)^-X'y.$$
Because $X'X(X'X)^-X' = X'$, this is in turn equivalent to
$$X'X\hat{\beta} = X'y.$$
This system of linear equations is known as the Normal Equations (NE).

We have established Result 2.3: $\hat{\beta}$ is a solution to the NE ($X'Xb = X'y$) iff $\hat{\beta}$ minimizes $Q(b)$.

Corollary 2.1: The NE are consistent.

Proof of Corollary 2.1: The NE are $X'Xb = X'y$. If we take $\hat{\beta} = (X'X)^-X'y$, then
$$X'X\hat{\beta} = X'X(X'X)^-X'y = X'y.$$

By Result A.13, $\hat{\beta}$ is a solution to $X'Xb = X'y$ iff
$$\hat{\beta} = (X'X)^-X'y + [I - (X'X)^-X'X]z \quad \text{for some } z \in \mathbb{R}^p.$$

Corollary 2.3: $X\hat{\beta}$ is invariant to the choice of a solution $\hat{\beta}$ to the NE, i.e., if $\hat{\beta}_1$ and $\hat{\beta}_2$ are any two solutions to the NE, then $X\hat{\beta}_1 = X\hat{\beta}_2$.

Proof of Corollary 2.3:
$$X'X\hat{\beta}_1 = X'X\hat{\beta}_2 \;(= X'y) \;\Rightarrow\; X\hat{\beta}_1 = X\hat{\beta}_2 \quad \text{by Result 2.4.}$$
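The following sketch, not part of the original notes, illustrates Corollary 2.1, Result A.13, and Corollary 2.3 numerically: for a rank-deficient $X$ the normal equations have many solutions, any of which can be written as $(X'X)^-X'y + [I - (X'X)^-X'X]z$, and all of which yield the same fitted vector $X\hat{\beta} = P_Xy$. The pseudoinverse again stands in for one choice of $(X'X)^-$, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 12, 3

# Rank-deficient X, so the normal equations have infinitely many solutions.
A = rng.normal(size=(n, 2))
X = np.column_stack([A, A @ [1.0, 1.0]])          # third column = col1 + col2
y = rng.normal(size=n)

G = np.linalg.pinv(X.T @ X)                       # one GI of X'X
beta1 = G @ X.T @ y                               # one solution of the NE

# Result A.13-style family of solutions: beta1 + (I - G X'X) z
z = rng.normal(size=p)
beta2 = beta1 + (np.eye(p) - G @ X.T @ X) @ z     # another solution

# Both satisfy the normal equations X'X b = X'y (consistency, Corollary 2.1)
print(np.allclose(X.T @ X @ beta1, X.T @ y))
print(np.allclose(X.T @ X @ beta2, X.T @ y))

# The solutions differ, but X beta is invariant (Corollary 2.3) and equals P_X y
P = X @ G @ X.T
print(not np.allclose(beta1, beta2))
print(np.allclose(X @ beta1, X @ beta2), np.allclose(X @ beta1, P @ y))
```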
We will finish this set of notes with some other results from Section 2.2 of the text.

Lemma 2.1: $\mathcal{N}(X'X) = \mathcal{N}(X)$.

Result 2.2: $\mathcal{C}(X'X) = \mathcal{C}(X')$.

Corollary 2.2: $\operatorname{rank}(X'X) = \operatorname{rank}(X)$.

Proof of Lemma 2.1: $Xc = 0 \Rightarrow X'Xc = 0$, $\therefore\; \mathcal{N}(X) \subseteq \mathcal{N}(X'X)$. Conversely, $X'Xc = 0 \Rightarrow c'X'Xc = 0 \Rightarrow Xc = 0$, $\therefore\; \mathcal{N}(X'X) \subseteq \mathcal{N}(X)$. Thus, $\mathcal{N}(X'X) = \mathcal{N}(X)$.

Proof of Result 2.2: $X'Xc = X'(Xc) \Rightarrow \mathcal{C}(X'X) \subseteq \mathcal{C}(X')$. Conversely,
$$X'c = X'P_Xc = X'X(X'X)^-X'c = X'X[(X'X)^-X'c] \;\Rightarrow\; \mathcal{C}(X') \subseteq \mathcal{C}(X'X).$$
Thus, $\mathcal{C}(X'X) = \mathcal{C}(X')$.

Proof of Corollary 2.2:
$$\mathcal{C}(X'X) = \mathcal{C}(X') \Rightarrow \dim(\mathcal{C}(X'X)) = \dim(\mathcal{C}(X')) \Rightarrow \operatorname{rank}(X'X) = \operatorname{rank}(X') \Rightarrow \operatorname{rank}(X'X) = \operatorname{rank}(X).$$
(This can facilitate finding the $(X'X)^-$ needed for $\hat{\beta}$ and $P_X$.)
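Finally, a brief numerical illustration of Lemma 2.1 and Corollary 2.2, again an addition rather than part of the notes, using a deliberately rank-deficient design assumed for the example:

```python
import numpy as np

rng = np.random.default_rng(3)

# A 10 x 3 design whose third column is a linear combination of the first two
A = rng.normal(size=(10, 2))
X = np.column_stack([A, A @ [2.0, -1.0]])

# Corollary 2.2: rank(X'X) = rank(X)  (here both are 2, not 3)
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X.T @ X))

# Lemma 2.1: a vector in N(X'X) is also in N(X); c = (2, -1, -1)' works here
c = np.array([2.0, -1.0, -1.0])
print(np.allclose(X.T @ X @ c, 0), np.allclose(X @ c, 0))
```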