The Orthogonal Projection Matrix onto C(X) and the Normal Equations

© 2012 Dan Nettleton (Iowa State University), Statistics 611

Theorem 2.1: P_X, the orthogonal projection matrix onto C(X), is equal to X(X'X)⁻X'. The matrix X(X'X)⁻X' satisfies

(a) [X(X'X)⁻X'][X(X'X)⁻X'] = X(X'X)⁻X',
(b) X(X'X)⁻X'y ∈ C(X) ∀ y ∈ Rⁿ,
(c) X(X'X)⁻X'x = x ∀ x ∈ C(X),
(d) X(X'X)⁻X' = [X(X'X)⁻X']', and
(e) X(X'X)⁻X' is the same for every GI of X'X.

It is useful to establish a few results before proving Theorem 2.1.

Show that if the n × n matrix A is symmetric and the n × n matrix G is a GI of A, then G' is also a GI of A.

Result 2.4: X'XA = X'XB ⟺ XA = XB. One direction (⟸) is obvious. Can you prove the other (⟹)?

Result 2.5: Suppose (X'X)⁻ is any GI of X'X. Then (X'X)⁻X' is a GI of X, i.e., X(X'X)⁻X'X = X.

Corollary to Result 2.5: For (X'X)⁻ any GI of X'X, X'X(X'X)⁻X' = X'.

Now we can prove Theorem 2.1.

First show that X(X'X)⁻X' is idempotent.

Next show that X(X'X)⁻X'y ∈ C(X) ∀ y ∈ Rⁿ.

Now show that X(X'X)⁻X'z = z ∀ z ∈ C(X).

Now we know X(X'X)⁻X' is a projection matrix onto C(X). To be the unique orthogonal projection matrix onto C(X), X(X'X)⁻X' must also be symmetric.

To show symmetry of X(X'X)⁻X', it will help to first show that

X(X'X)₁⁻X' = X(X'X)₂⁻X'

for any two GIs of X'X, denoted (X'X)₁⁻ and (X'X)₂⁻.

Now show that X(X'X)⁻X' is symmetric.

We have shown that X(X'X)⁻X' is the orthogonal projection matrix onto C(X). We know that it is the only symmetric projection matrix onto C(X) by Result A.16.

Example: Find the orthogonal projection matrix onto C(1), where 1 is the n × 1 vector of ones. Also, simplify P_X y in this case.

By Corollary A.4 to Result A.16, we know that I − P_X = I − X(X'X)⁻X' is the orthogonal projection matrix onto C(X)⊥ = N(X'). This is Result 2.6.

By Result A.4, any y ∈ Rⁿ may be written as y = s + t, where s ∈ C(X) and t ∈ C(X)⊥ = N(X'). Furthermore, the vectors s ∈ C(X) and t ∈ C(X)⊥ are unique.

Because y = P_X y + (I − P_X)y, with P_X y ∈ C(X) and (I − P_X)y ∈ C(X)⊥, we get s = P_X y and t = (I − P_X)y. Moreover,

ŷ ≡ P_X y = the vector of fitted values, and
ε̂ ≡ (I − P_X)y = y − ŷ = the vector of residuals.

Let P_W and P_X denote the orthogonal projection matrices onto C(W) and C(X), respectively. Suppose C(W) ⊆ C(X). Show that P_W P_X = P_X P_W = P_W.
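As a numerical illustration of Theorem 2.1 (not part of the original notes), the following minimal NumPy sketch builds P_X = X(X'X)⁻X' for a small, deliberately rank-deficient design matrix and checks properties (a)–(e). The one-way ANOVA design, the second generalized inverse G2, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rank-deficient design: intercept plus indicators for 3 groups
# (one-way ANOVA parameterization), n = 9 observations, 4 columns, rank 3.
groups = np.repeat(np.arange(3), 3)
X = np.column_stack([np.ones(9), (groups[:, None] == np.arange(3)).astype(float)])
y = rng.normal(size=9)

A = X.T @ X                      # X'X (singular here)
G1 = np.linalg.pinv(A)           # the Moore-Penrose inverse is one GI of X'X
# If G1 is a GI of A, then G1 + (I - G1 A) M is also a GI of A for any M:
M = rng.normal(size=A.shape)
G2 = G1 + (np.eye(4) - G1 @ A) @ M
assert np.allclose(A @ G2 @ A, A)          # confirms G2 satisfies A G2 A = A

P1 = X @ G1 @ X.T                # P_X built from the first GI
P2 = X @ G2 @ X.T                # P_X built from the second GI

print(np.allclose(P1, P2))       # (e) invariant to the choice of GI
print(np.allclose(P1 @ P1, P1))  # (a) idempotent
print(np.allclose(P1, P1.T))     # (d) symmetric
print(np.allclose(P1 @ X, X))    # (c) P_X x = x for every x = Xv in C(X)
print(np.allclose(X.T @ (np.eye(9) - P1) @ y, 0))  # residual y - ŷ lies in N(X')
```

The last line also illustrates the decomposition y = P_X y + (I − P_X)y: the residual vector (I − P_X)y is orthogonal to every column of X.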
Theorem 2.2: If C(W) ⊆ C(X), then P_X − P_W is the orthogonal projection matrix onto C((I − P_W)X).

We have previously seen that

Q(b) = (y − Xb)'(y − Xb) = ‖y − Xb‖² ≥ ‖y − P_X y‖² ∀ b ∈ Rᵖ,

with equality iff Xb = P_X y.

Now we know that P_X = X(X'X)⁻X'. Thus, β̂ minimizes Q(b) iff Xβ̂ = X(X'X)⁻X'y. By Result 2.4, this equation is equivalent to X'Xβ̂ = X'X(X'X)⁻X'y.

Because X'X(X'X)⁻X' = X', the equation X'Xβ̂ = X'X(X'X)⁻X'y is equivalent to X'Xβ̂ = X'y. This system of linear equations is known as the Normal Equations (NE).

We have established Result 2.3: β̂ is a solution to the NE (X'Xb = X'y) iff β̂ minimizes Q(b).

Corollary 2.1: The NE are consistent.

By Result A.13, β̂ is a solution to X'Xb = X'y iff

β̂ = (X'X)⁻X'y + [I − (X'X)⁻X'X]z for some z ∈ Rᵖ.

Corollary 2.3: Xβ̂ is invariant to the choice of a solution β̂ to the NE, i.e., if β̂₁ and β̂₂ are any two solutions to the NE, then Xβ̂₁ = Xβ̂₂.

We will finish this set of notes with some other results from Section 2.2 of the text.

Lemma 2.1: N(X'X) = N(X).

Result 2.2: C(X'X) = C(X').

Corollary 2.2: rank(X'X) = rank(X).
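As a companion sketch (again an illustration, not from the notes), the NumPy code below solves the NE with one generalized inverse, generates a second solution via the general form from Result A.13, and checks Result 2.3 and Corollaries 2.1–2.3 numerically. It reuses the same hypothetical rank-deficient design as the earlier sketch; all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical rank-deficient one-way ANOVA design, as in the earlier sketch.
groups = np.repeat(np.arange(3), 3)
X = np.column_stack([np.ones(9), (groups[:, None] == np.arange(3)).astype(float)])
y = rng.normal(size=9)

A, c = X.T @ X, X.T @ y          # normal equations: X'X b = X'y
G = np.linalg.pinv(A)            # one GI of X'X

# Two different solutions of the NE (Result A.13): the second adds an
# arbitrary component [I - (X'X)^- X'X] z from the null space of X'X.
b1 = G @ c
z = rng.normal(size=4)
b2 = G @ c + (np.eye(4) - G @ A) @ z

print(np.allclose(A @ b1, c), np.allclose(A @ b2, c))        # both solve the NE
print(np.allclose(b1, b2))                                   # False: β̂ is not unique
print(np.allclose(X @ b1, X @ b2))                           # True: Xβ̂ is invariant (Corollary 2.3)
print(np.allclose(X @ b1, X @ G @ X.T @ y))                  # fitted values equal P_X y
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(X))  # Corollary 2.2
```

Because X here has more columns than its rank, individual solutions β̂ differ, but the fitted values Xβ̂ = P_X y agree across solutions, which is exactly the invariance that Corollary 2.3 asserts.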