Least Squares Estimation of the Expected Value of the Response Vector

© 2012 Dan Nettleton (Iowa State University), Statistics 611

Consider the General Linear Model (GLM) previously introduced, where y = Xβ + ε with E(ε) = 0. Suppose we want to estimate E(y) = Xβ based on our observed response y.

Because we know E(y) = Xβ ∈ C(X), a reasonable strategy may be to find the vector in C(X) that is closest to y in terms of Euclidean distance.

For example, suppose

    (y₁, y₂)′ = (1, 1)′ µ + (ε₁, ε₂)′.

Then

    E((y₁, y₂)′) = (1, 1)′ µ = (µ, µ)′.

If we observe y = (4, 2)′ and have to guess E(y) = Xβ = (1, 1)′ µ = (µ, µ)′, it may be reasonable to find the value of µ that makes (µ, µ)′ as close to (4, 2)′ as possible.

In general, we seek a vector β̂ such that Xβ̂ is closer to y than any other vector in C(X). If

    Q(b) = ‖y − Xb‖² = (y − Xb)′(y − Xb),

we seek β̂ such that Q(β̂) ≤ Q(b) ∀ b ∈ ℝᵖ. If Q(β̂) ≤ Q(b) ∀ b ∈ ℝᵖ, then Xβ̂ is called the Least Squares Estimate (LSE) of E(y) = Xβ.

Suppose P_X is the orthogonal projection matrix onto C(X). We will show that P_X y is the unique vector in C(X) that is closest to y.

Proof: P_X is the orthogonal projection matrix onto C(X). Therefore, P_X is the unique symmetric and idempotent matrix satisfying

(i) P_X a ∈ C(X) ∀ a ∈ ℝⁿ, and
(ii) P_X z = z ∀ z ∈ C(X).

Note that P_X z = z ∀ z ∈ C(X) implies P_X Xb = Xb ∀ b ∈ ℝᵖ, which in turn implies P_X X = X.

Now note that

    Q(b) = (y − Xb)′(y − Xb) = ‖y − Xb‖²
         = ‖y − P_X y + P_X y − Xb‖²
         = ‖y − P_X y‖² + ‖P_X y − Xb‖² + 2(y − P_X y)′(P_X y − Xb).

One-half of the cross-product term is

    (y − P_X y)′(P_X y − Xb) = y′(I − P_X)′(P_X y − Xb)
                             = y′(I − P_X)(P_X y − Xb)
                             = y′(I − P_X)P_X y − y′(I − P_X)Xb = 0,

because I − P_X is symmetric, (I − P_X)P_X = P_X − P_X P_X = P_X − P_X = 0, and (I − P_X)X = X − P_X X = X − X = 0.

Thus,

    ‖y − Xb‖² = ‖y − P_X y‖² + ‖P_X y − Xb‖².

This shows that ‖y − P_X y‖² ≤ ‖y − Xb‖², with equality if and only if P_X y = Xb. Because P_X projects onto C(X), we know P_X y ∈ C(X). Thus P_X y is the vector in C(X) that is closest to y.

Because P_X y ∈ C(X), we know there exists at least one vector b such that P_X y = Xb. We can take β̂ to be any such vector b. Then we have shown that β̂ will minimize Q(b) over b ∈ ℝᵖ.
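To make this concrete, here is a minimal numerical sketch of the two-observation example above, assuming NumPy is available (the function and variable names are mine, not part of the notes):

    import numpy as np

    # Two-observation example: y = (4, 2)' and X = (1, 1)', so C(X) is
    # the line {(mu, mu)' : mu in R}.
    y = np.array([4.0, 2.0])
    X = np.ones((2, 1))

    # Orthogonal projection matrix onto C(X); for this X, P_X is the
    # 2 x 2 matrix with every entry equal to 1/2.
    P_X = X @ np.linalg.pinv(X)
    y_hat = P_X @ y
    print(y_hat)  # [3. 3.], i.e., mu_hat = 3

    # Q(b) = ||y - Xb||^2; a grid search confirms that the minimizer
    # matches P_X y.
    def Q(b):
        r = y - X @ np.array([b])
        return r @ r

    grid = np.linspace(0.0, 6.0, 601)
    print(grid[np.argmin([Q(b) for b in grid])])  # 3.0

The grid search is only a sanity check; the proof above shows the minimum is attained exactly at P_X y with no search required.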
Now we know that the least squares estimator of E(y) = Xβ is Xβ̂ = P_X y. How do we find P_X? We need to find the symmetric and idempotent matrix that projects onto C(X). In the next set of notes, we will show that P_X = X(X′X)⁻X′.
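As a preview of that result, a small numerical check (again assuming NumPy; the random design matrix is purely illustrative) confirms the properties the proof above relied on, namely symmetry, idempotence, and P_X X = X:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 3))  # an arbitrary n x p design matrix

    # P_X = X (X'X)^- X'; the Moore-Penrose inverse computed by pinv is
    # one valid choice of generalized inverse of X'X.
    P_X = X @ np.linalg.pinv(X.T @ X) @ X.T

    print(np.allclose(P_X, P_X.T))      # True: symmetric
    print(np.allclose(P_X @ P_X, P_X))  # True: idempotent
    print(np.allclose(P_X @ X, X))      # True: P_X z = z for z in C(X)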