Partition of a total sum of squares

For $Y \in \mathbb{R}^n$, the squared length of $Y$ is
\[
\sum_{i=1}^{n} y_i^2 = Y^T Y .
\]
The fitted vector $\hat{Y} = P_X Y$ lies in the vector space spanned by the columns of $X$; the dimension of this space is $\mathrm{rank}(X)$. The residual vector $e = Y - \hat{Y} = (I - P_X)Y$ lies in the space orthogonal to the space spanned by the columns of $X$; that space has dimension $n - \mathrm{rank}(X)$.

Squared length of $\hat{Y} = P_X Y$:
\begin{align*}
\sum_{i=1}^{n} \hat{Y}_i^2 = \hat{Y}^T \hat{Y}
  &= (P_X Y)^T (P_X Y) \\
  &= Y^T P_X^T P_X Y \\
  &= Y^T P_X P_X Y && \text{since $P_X$ is symmetric} \\
  &= Y^T P_X Y && \text{since $P_X$ is idempotent.}
\end{align*}

Squared length of the residual vector:
\[
\sum_{i=1}^{n} e_i^2 = e^T e = [(I - P_X)Y]^T (I - P_X)Y = Y^T (I - P_X) Y .
\]

We have
\[
Y^T Y = Y^T (P_X + I - P_X) Y = Y^T P_X Y + Y^T (I - P_X) Y .
\]

ANOVA

Source of Variation    Degrees of Freedom       Sums of Squares
model (uncorrected)    $\mathrm{rank}(X)$       $\hat{Y}^T\hat{Y} = Y^T P_X Y$
residuals              $n - \mathrm{rank}(X)$   $e^T e = Y^T (I - P_X) Y$
total (uncorrected)    $n$                      $Y^T Y = \sum_{i=1}^{n} y_i^2$

Result 3.6 For the linear model $E(Y) = X\beta$ and $Var(Y) = \Sigma$, the OLS estimator $\hat{Y} = Xb = P_X Y$ for $X\beta$
(i) is unbiased, i.e., $E(\hat{Y}) = X\beta$,
(ii) is a linear function of $Y$,
(iii) has variance-covariance matrix $Var(\hat{Y}) = P_X \Sigma P_X$.
This is true for any solution $b = (X^TX)^- X^T Y$ to the normal equations.

Proof:
(ii) is trivial, since $\hat{Y} = P_X Y$.
(iii) follows from Result 2.1(ii).
(i) $E(\hat{Y}) = E(P_X Y) = P_X E(Y)$ from Result 2.1(i), and $P_X E(Y) = P_X X\beta = X\beta$ since $P_X X = X$ and $E(Y) = X\beta$.

Comments:
$\hat{Y} = Xb = P_X Y$ is said to be a linear unbiased estimator for $X\beta$.
For the Gauss-Markov model, $Var(Y) = \sigma^2 I$ and
\[
Var(\hat{Y}) = P_X (\sigma^2 I) P_X = \sigma^2 P_X P_X = \sigma^2 P_X = \sigma^2 X (X^TX)^- X^T .
\]
The matrix $P_X = X(X^TX)^- X^T$ is sometimes called the "hat" matrix.

Estimable Functions

Questions:
Is $\hat{Y} = Xb = P_X Y$ the "best" estimator for $E(Y) = X\beta$?
Is $\hat{Y} = Xb = P_X Y$ the "best" estimator for $E(Y) = X\beta$ in the class of linear, unbiased estimators?
What other linear functions of $\beta$, say $c^T\beta = c_1\beta_1 + c_2\beta_2 + \cdots + c_k\beta_k$, have OLS estimators that are invariant to the choice of $b = (X^TX)^- X^T Y$ that solves the normal equations?

Some estimates of linear functions of the parameters have the same value regardless of which solution to the normal equations is used. These are called estimable functions. An example is $E(Y) = X\beta$. Check that $Xb$ has the same value for each solution to the normal equations obtained in Example 3.2, i.e.,
\[
Xb = \left( \bar{Y}_{1\cdot},\ \bar{Y}_{1\cdot},\ \bar{Y}_{2\cdot},\ \bar{Y}_{3\cdot},\ \bar{Y}_{3\cdot},\ \bar{Y}_{3\cdot} \right)^T .
\]
(A numerical check appears after the model below.)

Example 3.2. Blood coagulation times

Diet 1: $Y_{11} = 62$, $Y_{12} = 60$
Diet 2: $Y_{21} = 71$
Diet 3: $Y_{31} = 72$, $Y_{32} = 68$, $Y_{33} = 67$

Defn 3.6: For a linear model $E(Y) = X\beta$ and $Var(Y) = \Sigma$, we will say that $c^T\beta = c_1\beta_1 + c_2\beta_2 + \cdots + c_k\beta_k$ is estimable if there exists a linear unbiased estimator $a^TY$ for $c^T\beta$, i.e., for some non-random vector $a$ we have $E(a^TY) = c^T\beta$.

The "effects" model $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ can be written as
\[
\begin{bmatrix} Y_{11}\\ Y_{12}\\ Y_{21}\\ Y_{31}\\ Y_{32}\\ Y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0\\
1 & 1 & 0 & 0\\
1 & 0 & 1 & 0\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \alpha_3 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{31}\\ \varepsilon_{32}\\ \varepsilon_{33} \end{bmatrix} .
\]
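Here is the numerical check promised above, written as a small Python/NumPy sketch of my own (it is not part of the original notes). It fits the Example 3.2 effects model using two different solutions to the rank-deficient normal equations, confirms that both give the same fitted vector $Xb = P_X Y$ (the vector of diet means), and verifies the sum-of-squares partition $Y^TY = Y^TP_XY + Y^T(I-P_X)Y$.

```python
import numpy as np

# Example 3.2: blood coagulation times, effects model Y_ij = mu + alpha_i + eps_ij
Y = np.array([62., 60., 71., 72., 68., 67.])
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)   # columns: mu, alpha_1, alpha_2, alpha_3

XtX, XtY = X.T @ X, X.T @ Y

# Two different solutions to the rank-deficient normal equations (X'X)b = X'Y
b1 = np.linalg.pinv(XtX) @ XtY                      # Moore-Penrose generalized inverse
b2 = np.linalg.lstsq(X[:, 1:], Y, rcond=None)[0]    # another solution: set mu = 0
b2 = np.concatenate(([0.0], b2))

# Both solutions give the same fitted vector Xb = P_X Y: the diet means
print(X @ b1)    # [61. 61. 71. 69. 69. 69.]
print(X @ b2)    # same vector

# Partition of the (uncorrected) total sum of squares: Y'Y = Y'P_X Y + Y'(I - P_X)Y
P_X = X @ np.linalg.pinv(XtX) @ X.T
print(Y @ Y, Y @ P_X @ Y + Y @ (np.eye(6) - P_X) @ Y)    # equal
```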
Examples of estimable functions

$\mu + \alpha_1$: Choose $a^T = (1\ 0\ 0\ 0\ 0\ 0)$ and note that $E(a^TY) = E(Y_{11}) = \mu + \alpha_1$. Alternatively, choose $a^T = (\tfrac{1}{2}\ \tfrac{1}{2}\ 0\ 0\ 0\ 0)$. Then
\[
E(a^TY) = E(\tfrac{1}{2}Y_{11} + \tfrac{1}{2}Y_{12}) = \tfrac{1}{2}E(Y_{11}) + \tfrac{1}{2}E(Y_{12}) = \tfrac{1}{2}(\mu + \alpha_1) + \tfrac{1}{2}(\mu + \alpha_1) = \mu + \alpha_1 .
\]

$\mu + \alpha_2$: Choose $a^T = (0\ 0\ 1\ 0\ 0\ 0)$. Then $a^TY = Y_{21}$ and $E(a^TY) = E(Y_{21}) = \mu + \alpha_2$.

$\mu + \alpha_3$: Choose $a^T = (0\ 0\ 0\ 1\ 0\ 0)$. Then $E(a^TY) = E(Y_{31}) = \mu + \alpha_3$.

$\alpha_1 - \alpha_2$: Note that
\[
\alpha_1 - \alpha_2 = (\mu + \alpha_1) - (\mu + \alpha_2) = E(Y_{11}) - E(Y_{21}) = E(Y_{11} - Y_{21}) = E(a^TY), \quad \text{where } a^T = (1\ 0\ {-1}\ 0\ 0\ 0) .
\]

$2\mu + 3\alpha_1 - \alpha_2$: Note that
\[
2\mu + 3\alpha_1 - \alpha_2 = 3(\mu + \alpha_1) - (\mu + \alpha_2) = 3E(Y_{11}) - E(Y_{21}) = E(3Y_{11} - Y_{21}) = E(a^TY), \quad \text{where } a^T = (3\ 0\ {-1}\ 0\ 0\ 0) .
\]

Quantities that are not estimable include $\mu$, $\alpha_1$, $\alpha_2$, $\alpha_3$, $3\alpha_1$, and $\alpha_1 + \alpha_2$. To show that a linear function of the parameters, $c_0\mu + c_1\alpha_1 + c_2\alpha_2 + c_3\alpha_3$, is not estimable, one must show that there is no non-random vector $a^T = (a_1, a_2, \ldots, a_6)$ for which $E(a^TY) = c_0\mu + c_1\alpha_1 + c_2\alpha_2 + c_3\alpha_3$.

For $\alpha_1$ to be estimable we would need to find an $a$ that satisfies
\begin{align*}
\alpha_1 = E(a^TY)
  &= a_1E(Y_{11}) + a_2E(Y_{12}) + a_3E(Y_{21}) + a_4E(Y_{31}) + a_5E(Y_{32}) + a_6E(Y_{33}) \\
  &= (a_1 + a_2)(\mu + \alpha_1) + a_3(\mu + \alpha_2) + (a_4 + a_5 + a_6)(\mu + \alpha_3) .
\end{align*}
Matching the coefficients of $\alpha_2$ and $\alpha_3$ implies $0 = a_3 = (a_4 + a_5 + a_6)$. Then $\alpha_1 = (a_1 + a_2)(\mu + \alpha_1)$ would have to hold for every value of $\mu$ and $\alpha_1$, which is impossible.

Example 3.1. Yield of a chemical process
\[
\begin{bmatrix} Y_1\\ Y_2\\ Y_3\\ Y_4\\ Y_5 \end{bmatrix}
=
\begin{bmatrix}
1 & 160\\
1 & 165\\
1 & 165\\
1 & 170\\
1 & 175
\end{bmatrix}
\begin{bmatrix} \beta_0\\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \varepsilon_3\\ \varepsilon_4\\ \varepsilon_5 \end{bmatrix}
\]
Since $X$ has full column rank, each element of $\beta$ is estimable. Consider $\beta_1 = c^T\beta$ where $c = (0,\ 1)^T$. Since $X$ has full column rank, the unique least squares estimator for $\beta$ is $b = (X^TX)^{-1}X^TY$, and an unbiased linear estimator for $c^T\beta$ is $c^Tb = c^T(X^TX)^{-1}X^TY = a^TY$ with $a^T = c^T(X^TX)^{-1}X^T$.

Result 3.7 For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$:
(i) The expectation of any observation is estimable.
(ii) A linear combination of estimable functions is estimable.
(iii) Each element of $\beta$ is estimable if and only if $\mathrm{rank}(X) = k$ = number of columns of $X$.
(iv) Every $c^T\beta$ is estimable if and only if $\mathrm{rank}(X) = k$ = number of columns of $X$.

Proof:
(i) For $Y = (Y_1, \ldots, Y_n)^T$ with $E(Y) = X\beta$, we have $Y_i = a_i^TY$, where $a_i$ has a one in the $i$-th position and zeros elsewhere. Then $E(Y_i) = E(a_i^TY) = a_i^TE(Y) = a_i^TX\beta = c_i^T\beta$, where $c_i^T$ is the $i$-th row of $X$.
(ii) Suppose $c_i^T\beta$ is estimable for $i = 1, \ldots, p$. Then there is an $a_i$ such that $E(a_i^TY) = c_i^T\beta$. Now consider a linear combination of estimable functions $w_1c_1^T\beta + w_2c_2^T\beta + \cdots + w_pc_p^T\beta$. Let $a = w_1a_1 + w_2a_2 + \cdots + w_pa_p$. Then
\[
E(a^TY) = E(w_1a_1^TY + \cdots + w_pa_p^TY) = w_1E(a_1^TY) + \cdots + w_pE(a_p^TY) = w_1c_1^T\beta + \cdots + w_pc_p^T\beta .
\]
(iii) Previous argument.
(iv) Follows from (ii) and (iii).

Result 3.8 For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, each of the following is true if and only if $c^T\beta$ is estimable.
(i) $c^T = a^TX$ for some $a$, i.e., $c$ is in the space spanned by the rows of $X$.
(ii) $c^Td = 0$ for every $d$ for which $Xd = 0$.
(iii) $c^Tb$ is the same for any solution to the normal equations $(X^TX)b = X^TY$, i.e., there is a unique least squares estimator for $c^T\beta$.

Use Result 3.8(ii) to show that $\mu$ is not estimable in Example 3.2. In that case
\[
E(Y) = X\beta =
\begin{bmatrix}
1 & 1 & 0 & 0\\
1 & 1 & 0 & 0\\
1 & 0 & 1 & 0\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1\\
1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \alpha_3 \end{bmatrix}
\]
and $c^T = [1\ 0\ 0\ 0]$. Let $d^T = [1\ {-1}\ {-1}\ {-1}]$; then $Xd = 0$, but $c^Td = 1 \neq 0$. Hence $\mu$ is not estimable.

Part (ii) of Result 3.8 sometimes provides a convenient way to identify all possible estimable functions of $\beta$. In Example 3.2, $Xd = 0$ if and only if
\[
d = w \begin{bmatrix} 1\\ -1\\ -1\\ -1 \end{bmatrix}
\]
for some scalar $w$. Then $c^T\beta$ is estimable if and only if $0 = c^Td = w(c_1 - c_2 - c_3 - c_4)$, that is, if and only if $c_1 = c_2 + c_3 + c_4$. Then $(c_2 + c_3 + c_4)\mu + c_2\alpha_1 + c_3\alpha_2 + c_4\alpha_3$ is estimable for any $(c_2,\ c_3,\ c_4)$, and these are the only estimable functions of $\mu, \alpha_1, \alpha_2, \alpha_3$. Some estimable functions are $\mu + \tfrac{1}{3}(\alpha_1 + \alpha_2 + \alpha_3)$ (take $c_2 = c_3 = c_4 = \tfrac{1}{3}$) and $\mu + \alpha_2$ (take $c_3 = 1$, $c_2 = c_4 = 0$), but $\mu + 2\alpha_2$ is not estimable. (The sketch following this paragraph checks these cases numerically.)
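Result 3.8(ii) lends itself to a direct computational check. The sketch below is my own illustration (Python/NumPy, not part of the notes): it computes a basis for $\{d : Xd = 0\}$ for the Example 3.2 model matrix and then tests whether $c^Td = 0$ for several coefficient vectors $c$, reproducing the conclusions above.

```python
import numpy as np

# Example 3.2 effects-model matrix (columns: mu, alpha_1, alpha_2, alpha_3)
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)

# Basis for {d : Xd = 0}: right singular vectors belonging to zero singular values
U, s, Vt = np.linalg.svd(X)
null_basis = Vt[np.sum(s > 1e-10):]     # a single row here, proportional to (1, -1, -1, -1)

def estimable(c):
    """c'beta is estimable iff c'd = 0 for every d with Xd = 0 (Result 3.8(ii))."""
    return np.allclose(null_basis @ np.asarray(c, dtype=float), 0.0)

print(estimable([1, 0, 0, 0]))          # mu                                -> False
print(estimable([1, 1, 0, 0]))          # mu + alpha_1                      -> True
print(estimable([0, 1, -1, 0]))         # alpha_1 - alpha_2                 -> True
print(estimable([1, 1/3, 1/3, 1/3]))    # mu + (alpha_1+alpha_2+alpha_3)/3  -> True
print(estimable([1, 0, 2, 0]))          # mu + 2*alpha_2                    -> False
```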
Defn 3.7: For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, where $X$ is an $n \times k$ matrix, $C_{r\times k}\,\beta$ is said to be estimable if all of the elements of
\[
C\beta = \begin{bmatrix} c_1^T\beta\\ c_2^T\beta\\ \vdots\\ c_r^T\beta \end{bmatrix},
\qquad \text{where } C = \begin{bmatrix} c_1^T\\ c_2^T\\ \vdots\\ c_r^T \end{bmatrix},
\]
are estimable.

Result 3.9 For the linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, where $X$ is an $n \times k$ matrix, each of the following conditions holds if and only if $C\beta$ is estimable.
(i) $AX = C$ for some matrix $A$, i.e., each row of $C$ is in the space spanned by the rows of $X$.
(ii) $Cd = 0$ for any $d$ for which $Xd = 0$.
(iii) $Cb$ is the same for any solution to the normal equations $(X^TX)b = X^TY$.

Summary

For a linear model $Y = X\beta + \varepsilon$ with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, we have:
- Any estimable function has a unique interpretation.
- The OLS estimator for an estimable function $C\beta$ is unique: $Cb = C(X^TX)^-X^TY$.
- The OLS estimator for an estimable function $C\beta$ is
  - a linear estimator
  - an unbiased estimator.

In the class of linear unbiased estimators for $c^T\beta$, is the OLS estimator the "best"? Here "best" means smallest expected squared error. Let $t(Y)$ denote an estimator for $c^T\beta$. Then the expected squared error is
\begin{align*}
MSE &= E[t(Y) - c^T\beta]^2 \\
  &= E[t(Y) - E(t(Y)) + E(t(Y)) - c^T\beta]^2 \\
  &= E[t(Y) - E(t(Y))]^2 + [E(t(Y)) - c^T\beta]^2 + 2[E(t(Y)) - c^T\beta]\,E[t(Y) - E(t(Y))] \\
  &= E[t(Y) - E(t(Y))]^2 + [E(t(Y)) - c^T\beta]^2 \\
  &= Var(t(Y)) + [\text{bias}]^2 .
\end{align*}

If we restrict our attention to linear unbiased estimators for $c^T\beta$, i.e., estimators with
\[
t(Y) = a^TY \text{ for some } a \qquad \text{and} \qquad E(t(Y)) = c^T\beta ,
\]
then $t(Y) = a^TY$ is the best linear unbiased estimator (BLUE) for $c^T\beta$ if
\[
Var(a^TY) \le Var(d^TY)
\]
for every other linear unbiased estimator $d^TY$ of $c^T\beta$ and any value of $\beta$.
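As a concrete preview of the theorem stated next, compare two of the linear unbiased estimators of $\mu + \alpha_1$ constructed earlier for Example 3.2: the OLS estimator $\bar{Y}_{1\cdot} = \tfrac{1}{2}(Y_{11} + Y_{12})$ and the single observation $Y_{11}$. Under the Gauss-Markov model both are unbiased, but the first has variance $\sigma^2/2$ versus $\sigma^2$ for the second. The simulation below is my own sketch; the "true" parameter values and $\sigma$ are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, alpha1, sigma = 65.0, -4.0, 2.0   # assumed "true" values, illustration only
n_rep = 100_000

# Replicates of (Y11, Y12) for diet 1 under the Gauss-Markov model
Y11 = mu + alpha1 + sigma * rng.standard_normal(n_rep)
Y12 = mu + alpha1 + sigma * rng.standard_normal(n_rep)

est_ols   = 0.5 * (Y11 + Y12)   # a' = (1/2, 1/2, 0, ..., 0): the OLS estimator of mu + alpha_1
est_other = Y11                 # a' = (1, 0, 0, ..., 0): also linear and unbiased

print(est_ols.mean(), est_other.mean())   # both close to mu + alpha_1 = 61
print(est_ols.var(), est_other.var())     # close to sigma^2/2 = 2 and sigma^2 = 4
```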
Result 3.10 (Gauss-Markov Theorem) For the Gauss-Markov model, $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$, the OLS estimator of an estimable function $c^T\beta$ is the unique best linear unbiased estimator (BLUE) of $c^T\beta$.

Proof:
(i) For any solution $b = (X^TX)^-X^TY$ to the normal equations, the OLS estimator for $c^T\beta$ is $c^Tb = c^T(X^TX)^-X^TY$, which is a linear function of $Y$.

(ii) From Result 3.8(i), there exists a vector $a$ such that $c^T = a^TX$. Then
\begin{align*}
E(c^Tb) &= E(c^T(X^TX)^-X^TY) \\
  &= c^T(X^TX)^-X^TE(Y) \\
  &= c^T(X^TX)^-X^TX\beta \\
  &= a^TX(X^TX)^-X^TX\beta && \text{($X(X^TX)^-X^T = P_X$ is the projection onto the column space of $X$)} \\
  &= a^TX\beta = c^T\beta .
\end{align*}
Hence $c^Tb$ is an unbiased estimator.

(iii) Minimum variance in the class of linear unbiased estimators. Suppose $d^TY$ is any other linear unbiased estimator for $c^T\beta$. Then $E(d^TY) = d^TE(Y) = d^TX\beta = c^T\beta$ for every $\beta$. Hence $d^TX = c^T$ and $c = X^Td$. We must show that $Var(c^Tb) \le Var(d^TY)$. First, note that
\[
Var(d^TY) = Var(c^Tb + [d^TY - c^Tb]) = Var(c^Tb) + Var(d^TY - c^Tb) + 2\,Cov(c^Tb,\ d^TY - c^Tb) .
\]
Then
\[
Var(d^TY) \ge Var(c^Tb) + 2\,Cov(c^Tb,\ d^TY - c^Tb) = Var(c^Tb)
\]
because $Cov(c^Tb,\ d^TY - c^Tb) = 0$. To show this, first note that $c^Tb = c^T(X^TX)^-X^TY$ is invariant with respect to the choice of $(X^TX)^-$ (Result 3.8(iii)). Consequently, we can use the Moore-Penrose generalized inverse, which is symmetric and satisfies $(X^TX)^-(X^TX)(X^TX)^- = (X^TX)^-$ by definition. (Not every generalized inverse of $X^TX$ is symmetric.) Then
\begin{align*}
Cov(c^Tb,\ d^TY - c^Tb)
  &= Cov\!\left(c^T(X^TX)^-X^TY,\ [d^T - c^T(X^TX)^-X^T]Y\right) \\
  &= c^T(X^TX)^-X^T\,Var(Y)\,[d^T - c^T(X^TX)^-X^T]^T \\
  &= c^T(X^TX)^-X^T\,\sigma^2 I\,[d - X(X^TX)^-c] && \text{(this is where the symmetry of $(X^TX)^-$ is used)} \\
  &= \sigma^2\!\left[c^T(X^TX)^-X^Td - c^T(X^TX)^-X^TX(X^TX)^-c\right] \\
  &= \sigma^2\!\left[c^T(X^TX)^-c - c^T(X^TX)^-c\right] && \text{since $X^Td = c$ and $(X^TX)^-(X^TX)(X^TX)^- = (X^TX)^-$} \\
  &= 0 .
\end{align*}
Consequently, $Var(d^TY) \ge Var(c^Tb)$ and $c^Tb$ is BLUE.

(iv) To show that the OLS estimator is the unique BLUE, note that
\[
Var(d^TY) = Var(c^Tb + [d^TY - c^Tb]) = Var(c^Tb) + Var(d^TY - c^Tb)
\]
because $Cov(c^Tb,\ d^TY - c^Tb) = 0$. Then $d^TY$ is BLUE if and only if $Var(d^TY - c^Tb) = 0$. This is equivalent to $d^TY - c^Tb = \text{constant}$. Since both estimators are unbiased, $E(d^TY - c^Tb) = E(d^TY) - E(c^Tb) = 0$. Consequently, $d^TY - c^Tb = 0$ for all $Y$, and $c^Tb$ is the unique BLUE.

What if you have a linear model that is not a Gauss-Markov model, i.e.,
\[
E(Y) = X\beta \quad \text{and} \quad Var(Y) = \Sigma \neq \sigma^2 I ?
\]
Parts (i) and (ii) of the proof of Result 3.10 do not require $Var(Y) = \sigma^2 I$. Result 3.8 does not require $Var(Y) = \sigma^2 I$ either, and the OLS estimator for any estimable quantity, $c^Tb = c^T(X^TX)^-X^TY$, is invariant to the choice of $(X^TX)^-$. Consequently, the OLS estimator for $c^T\beta$ is still a linear unbiased estimator. However, the OLS estimator $c^Tb$ may not be BLUE; there may be other linear unbiased estimators with smaller variance.

Variance of the OLS estimator of an estimable quantity:
\[
Var(c^Tb) = Var(c^T(X^TX)^-X^TY) = c^T(X^TX)^-X^T\,\Sigma\,X[(X^TX)^-]^Tc .
\]
For the Gauss-Markov model, $Var(Y) = \Sigma = \sigma^2 I$ and
\[
Var(c^Tb) = \sigma^2\, c^T(X^TX)^-X^TX[(X^TX)^-]^Tc = \sigma^2\, c^T(X^TX)^-c .
\]

Generalized Least Squares (GLS) Estimation

Defn 3.8: For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, where $\Sigma$ is positive definite, a generalized least squares estimator for $\beta$ minimizes
\[
(Y - Xb_{GLS})^T\,\Sigma^{-1}\,(Y - Xb_{GLS}) .
\]

Strategy: Transform $Y$ to a random vector $Z$ for which the Gauss-Markov model applies. The spectral decomposition of $\Sigma$ yields
\[
\Sigma = \sum_{j=1}^{n} \lambda_j u_j u_j^T .
\]
Define
\[
\Sigma^{-1/2} = \sum_{j=1}^{n} \lambda_j^{-1/2} u_j u_j^T
\]
and create the random vector $Z = \Sigma^{-1/2}Y$. Then
\[
E(Z) = E(\Sigma^{-1/2}Y) = \Sigma^{-1/2}E(Y) = \Sigma^{-1/2}X\beta = W\beta
\]
and
\[
Var(Z) = Var(\Sigma^{-1/2}Y) = \Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = I ,
\]
so we have a Gauss-Markov model for $Z$, where $W = \Sigma^{-1/2}X$ is the model matrix.

Note that
\[
(Z - Wb)^T(Z - Wb) = (\Sigma^{-1/2}Y - \Sigma^{-1/2}Xb)^T(\Sigma^{-1/2}Y - \Sigma^{-1/2}Xb) = (Y - Xb)^T\Sigma^{-1/2}\Sigma^{-1/2}(Y - Xb) = (Y - Xb)^T\Sigma^{-1}(Y - Xb) .
\]
Hence, any GLS estimator for the $Y$ model is an OLS estimator for the $Z$ model. It must be a solution to the normal equations for the $Z$ model:
\[
W^TWb = W^TZ \iff (X^T\Sigma^{-1/2}\Sigma^{-1/2}X)b = X^T\Sigma^{-1/2}\Sigma^{-1/2}Y \iff (X^T\Sigma^{-1}X)b = X^T\Sigma^{-1}Y .
\]
These are the generalized least squares estimating equations. Any solution
\[
b_{GLS} = (W^TW)^-W^TZ = (X^T\Sigma^{-1}X)^-X^T\Sigma^{-1}Y
\]
is called a generalized least squares (GLS) estimator for $\beta$.

Result 3.11 For a linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, the GLS estimator of an estimable function $c^T\beta$,
\[
c^Tb_{GLS} = c^T(X^T\Sigma^{-1}X)^-X^T\Sigma^{-1}Y ,
\]
is the unique BLUE of $c^T\beta$.

Proof: Since $c^T\beta$ is estimable, there is an $a$ such that
\[
c^T\beta = E(a^TY) = E(a^T\Sigma^{1/2}\Sigma^{-1/2}Y) = E(a^T\Sigma^{1/2}Z) .
\]
Consequently, $c^T\beta$ is estimable for the $Z$ model. Apply the Gauss-Markov theorem (Result 3.10) to the $Z$ model.

A small numerical sketch of the GLS computation follows.
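This sketch is mine, not part of the notes; the responses and the heteroscedastic $\Sigma$ are hypothetical values chosen only to illustrate the two equivalent routes to a GLS estimate: solve the GLS estimating equations directly, or form $Z = \Sigma^{-1/2}Y$ and $W = \Sigma^{-1/2}X$ from the spectral decomposition and solve the OLS normal equations for the transformed model.

```python
import numpy as np

# Example 3.1 design (yield of a chemical process); the responses and Sigma
# below are hypothetical, chosen only to illustrate the computations.
X = np.column_stack([np.ones(5), [160., 165., 165., 170., 175.]])
Y = np.array([63., 67., 70., 74., 78.])
Sigma = np.diag([1., 2., 2., 4., 8.])

# Route 1: solve the GLS estimating equations (X'Sigma^{-1}X) b = X'Sigma^{-1}Y
Sinv = np.linalg.inv(Sigma)
b_gls = np.linalg.solve(X.T @ Sinv @ X, X.T @ Sinv @ Y)

# Route 2: OLS for the transformed Gauss-Markov model Z = Sigma^{-1/2}Y, W = Sigma^{-1/2}X
vals, vecs = np.linalg.eigh(Sigma)                    # spectral decomposition of Sigma
S_neg_half = vecs @ np.diag(vals ** -0.5) @ vecs.T    # Sigma^{-1/2}
Z, W = S_neg_half @ Y, S_neg_half @ X
b_z = np.linalg.solve(W.T @ W, W.T @ Z)

print(b_gls)
print(b_z)    # identical to b_gls, up to floating-point rounding
```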
Comments

- For the Gauss-Markov model, $c^Tb_{GLS} = c^Tb_{OLS}$.
- For the linear model with $E(Y) = X\beta$ and $Var(Y) = \Sigma$, both the OLS and GLS estimators for an estimable function $c^T\beta$ are linear unbiased estimators. Their variances are
\[
Var(c^Tb_{OLS}) = c^T(X^TX)^-X^T\,\Sigma\,X[(X^TX)^-]^Tc
\]
and
\[
Var(c^Tb_{GLS}) = c^T(X^T\Sigma^{-1}X)^-X^T\Sigma^{-1}X(X^T\Sigma^{-1}X)^-c ,
\]
with
\[
Var(c^Tb_{OLS}) \ge Var(c^Tb_{GLS}) .
\]
(These formulas are evaluated numerically in the sketch at the end of this section.)
- The BLUE property of $c^Tb_{GLS}$ assumes that $Var(Y) = \Sigma$ is known.
- The same results, including Result 3.12, hold for the Aitken model, where $E(Y) = X\beta$ and $Var(Y) = \sigma^2 V$ for some known matrix $V$.
- In practice, $Var(Y) = \Sigma$ is usually unknown. An approximation to
\[
b_{GLS} = (X^T\Sigma^{-1}X)^-X^T\Sigma^{-1}Y
\]
is obtained by substituting a consistent estimator $\hat{\Sigma}$ for $\Sigma$:
  - use method of moments or maximum likelihood estimation to obtain $\hat{\Sigma}$
  - the resulting estimator
    - is not a linear estimator
    - is consistent but not necessarily unbiased
    - does not provide a BLUE for estimable functions
    - may have larger mean squared error than the OLS estimator.
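To make the variance comparison concrete, here is a final sketch of my own that reuses the assumed design and $\Sigma$ from the previous sketch, evaluates $Var(c^Tb_{OLS})$ and $Var(c^Tb_{GLS})$ for $c^T\beta = \beta_1$, and confirms that the GLS variance is no larger.

```python
import numpy as np

# Same assumed design and covariance as in the previous sketch (illustration only)
X = np.column_stack([np.ones(5), [160., 165., 165., 170., 175.]])
Sigma = np.diag([1., 2., 2., 4., 8.])
Sinv = np.linalg.inv(Sigma)
c = np.array([0., 1.])                        # c'beta = beta_1, the slope

XtX_inv = np.linalg.inv(X.T @ X)              # X has full column rank here
A = np.linalg.inv(X.T @ Sinv @ X)

# Var(c'b_OLS) = c'(X'X)^- X' Sigma X [(X'X)^-]' c
var_ols = c @ XtX_inv @ X.T @ Sigma @ X @ XtX_inv.T @ c
# Var(c'b_GLS) = c'(X'Sigma^{-1}X)^- X'Sigma^{-1}X (X'Sigma^{-1}X)^- c
var_gls = c @ A @ (X.T @ Sinv @ X) @ A @ c    # = c'(X'Sigma^{-1}X)^{-1} c here

print(var_ols, var_gls, var_ols >= var_gls)   # the GLS variance is never larger
```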