RNy, econ4160 spring 2009. Notes to slides for Lecture 1.

Residuals and multiple correlation

The "residual maker" $M = I - X(X'X)^{-1}X'$ plays a central role in many derivations. The following properties are worth noting:

$M = M'$, symmetric matrix
$M^{2} = M$, idempotent matrix
$MX = 0$, regression of $X$ on $X$ gives a perfect fit

$P = X(X'X)^{-1}X'$ is the prediction or projection matrix. It is also symmetric and idempotent. $M$ and $P$ are orthogonal: $MP = PM = 0$. From $y = \hat{y} + e = Py + My$ we have

\[
y'y = (y'P' + y'M')(Py + My) = \hat{y}'\hat{y} + e'e
\]

which we also often write (when the regression contains a constant term) as

\[
\underbrace{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}}_{SST} = \underbrace{\sum_{i=1}^{n}(\hat{y}_{i}-\bar{y})^{2}}_{SSR} + \underbrace{\sum_{i=1}^{n}e_{i}^{2}}_{SSE}
\]

Alternative ways of writing SSE:

\[
e'e = (y'M')My = y'My = y'e = e'y
\]
\[
e'e = y'y - y'P'Py = y'y - \big[y'X(X'X)^{-1}X'\big]\big[X(X'X)^{-1}X'y\big] = y'y - y'X(X'X)^{-1}X'y = y'y - y'Xb = y'y - b'X'y.
\]

Inference in the regression model

The Wald approach: Testing H0 based on unrestricted estimation

Study first carefully the general representation of linear hypotheses in terms of

\[ H_{0}: R\beta = q \]

(Greene Ch. 5.3, p. 83). Note that the number of restrictions is $J$, so $R$ is $J \times K$ and $q$ is $J \times 1$.

To develop a so-called Wald test statistic of $H_{0}$ we need (only) to estimate the model without imposing $H_{0}$ (a.k.a. unrestricted estimation). We start from the discrepancy vector

\[ Rb - q = m \]

If $b$ is close to the hypothesized value of $\beta$ in $H_{0}$, then $m$ is small. To derive a formal test we need to know the distribution of $m$ under the null. Since $m$ is a linear combination of the normally distributed $b$, $m$ is also normal with

\[ E[m \mid X] = 0 \]

and

\[
Var[m \mid X] = Var[Rb \mid X] = E\big[(R(b-\beta))(R(b-\beta))' \mid X\big] = R\,E\big[(b-\beta)(b-\beta)' \mid X\big]R' = R\,Var[b \mid X]\,R' = \sigma^{2}R(X'X)^{-1}R'
\]

The so-called Wald criterion is a quadratic form in the $J$ deviations collected in the $m$ vector. It is defined by

\[ W = m'\{Var[Rb \mid X]\}^{-1}m \]

We know that a sum of $J$ squared independent standard normal variables (i.e. each $N(0,1)$) is $\chi^{2}(J)$ distributed. This generalizes to $W$, which therefore becomes

\[ W \mid X, H_{0} \sim \chi^{2}(J) \]

Since $\sigma^{2}$ is unknown it must be estimated; we use

\[ s^{2} = \frac{1}{n-K}\sum_{i=1}^{n}e_{i}^{2} = \frac{e'e}{n-K} \]

Since $MX = 0$, we have $e = M(X\beta + \varepsilon) = M\varepsilon$ and

\[ s^{2} = \frac{1}{n-K}\sum_{i=1}^{n}e_{i}^{2} = \frac{1}{n-K}\,\varepsilon'M\varepsilon. \]

The statistic

\[ \frac{(n-K)s^{2}}{\sigma^{2}} = \frac{\varepsilon'M\varepsilon}{\sigma^{2}} \]

is a quadratic form in the standard normal vector $\varepsilon/\sigma$. However this Chi-square does not have $n$ degrees of freedom but $n-K$. This is due to $M$ being symmetric and idempotent, which means that the characteristic roots of $M$ are either 0 or 1. There are $n-K$ roots equal to 1, and $K$ roots equal to zero. As a result $\varepsilon'M\varepsilon/\sigma^{2}$ is a sum of $n-K$ independent squared standard normal variables:

\[ \frac{(n-K)s^{2}}{\sigma^{2}} = \frac{\varepsilon'M\varepsilon}{\sigma^{2}} \sim \chi^{2}(n-K \mid X) \]

and therefore

\[ F = \frac{W/J}{\dfrac{(n-K)s^{2}/\sigma^{2}}{n-K}} = \frac{W}{J}\,\frac{\sigma^{2}}{s^{2}} \sim F(J, n-K \mid X, H_{0}). \]

Using the definition of $W$ we see that the unknown $\sigma^{2}$ disappears from the expression for $F$:

\[ F = \frac{m'\big[\sigma^{2}R(X'X)^{-1}R'\big]^{-1}m}{J}\,\frac{\sigma^{2}}{s^{2}} = \frac{m'\big[s^{2}R(X'X)^{-1}R'\big]^{-1}m}{J} \tag{1} \]

see Greene p. 85.
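These steps lend themselves to a small numerical check. The following sketch is not part of the original notes: it simulates a regression in Python/NumPy (the data, variable names and the particular restriction $\beta_{3}=0$ are illustrative assumptions), verifies the properties of $M$ and $P$ and the $SST = SSR + SSE$ decomposition, and computes the Wald-based F statistic in (1).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, K, J = 100, 3, 1                      # sample size, regressors (incl. constant), restrictions

# Simulated data (illustrative only): y = X beta + eps, with a constant term in X
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta_true = np.array([1.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T                    # projection matrix
M = np.eye(n) - P                        # residual maker

# Properties: M symmetric, idempotent, MX = 0, MP = 0
assert np.allclose(M, M.T)
assert np.allclose(M @ M, M)
assert np.allclose(M @ X, 0)
assert np.allclose(M @ P, 0)

b = XtX_inv @ X.T @ y                    # OLS estimates
e = M @ y                                # residuals
s2 = e @ e / (n - K)                     # s^2 = e'e / (n - K)

# SST = SSR + SSE (holds because X contains a constant)
SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((P @ y - y.mean()) ** 2)
SSE = e @ e
assert np.isclose(SST, SSR + SSE)

# Wald-based F test of H0: R beta = q, here the hypothetical restriction beta_3 = 0
R = np.array([[0.0, 0.0, 1.0]])
q = np.array([0.0])
m = R @ b - q                            # discrepancy vector
F = m @ np.linalg.inv(s2 * R @ XtX_inv @ R.T) @ m / J    # equation (1)
print("F =", F)
\end{verbatim}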
In the simplest case there is only one restriction, on one parameter. Let the hypothesis be $H_{0}: \beta_{2} = \beta_{2}^{0}$. This implies that $R$ is $(1 \times K)$ and $q = \beta_{2}^{0}$, so that $m = b_{2} - \beta_{2}^{0}$ and

\[ Var[m \mid X] = Var[b_{2} \mid X] = \sigma^{2}\,\{(X'X)^{-1}\}_{22} \]

where $\{(X'X)^{-1}\}_{22}$ denotes the second element along the diagonal of $(X'X)^{-1}$.

The Wald criterion for this case becomes

\[ W = (b_{2}-\beta_{2}^{0})\,\{Var[b_{2} \mid X]\}^{-1}(b_{2}-\beta_{2}^{0}) = \frac{(b_{2}-\beta_{2}^{0})^{2}}{\sigma^{2}\{(X'X)^{-1}\}_{22}} \]

and the F statistic that results from substituting $\sigma^{2}$ by $s^{2}$ is

\[ F = \frac{(b_{2}-\beta_{2}^{0})^{2}}{s^{2}\{(X'X)^{-1}\}_{22}} = \frac{(b_{2}-\beta_{2}^{0})^{2}}{\sigma^{2}\{(X'X)^{-1}\}_{22}} \bigg/ \frac{(n-K)s^{2}/\sigma^{2}}{n-K} \sim F(1, n-K) \]

As you know from before, in this special case with a single parameter restriction $F = t^{2}$, where

\[ t = \frac{b_{2}-\beta_{2}^{0}}{\sqrt{s^{2}\{(X'X)^{-1}\}_{22}}} = \frac{b_{2}-\beta_{2}^{0}}{\sqrt{\text{Est.}\,Var[b_{2}]}} \]

i.e. the usual t-statistic, which is Student-t distributed under the null hypothesis: $t(n-K) \mid H_{0}$.

The Lagrange multiplier approach: Test based on both restricted and unrestricted estimation

If we impose the $J$ restrictions on the model and then do OLS on the restricted model, we obtain restricted estimates of the remaining $K-J$ parameters; see Greene p. 88 for the restricted least squares estimator. If the restrictions hold exactly, so that $m = 0$, then the restricted estimates are the same as the unrestricted ones (see Greene p. 88) and the Lagrange multiplier of the constrained minimization problem is zero. If the restrictions do not hold exactly, then the Lagrange multiplier is different from zero, and the inference problem is whether this is due to a difference in population parameters or to sampling variability.

In practice we typically compute the Lagrange multiplier (LM) test by estimating both the restricted and the unrestricted model. The sum of squared residuals from the regression with the $J$ restrictions imposed is denoted $SSE^{*}$. If the two sums ($SSE^{*}$ and the $SSE$ from unrestricted estimation) are equal, then $m = 0$ (and the associated Lagrange multiplier is also zero). Moreover, we can construct $\chi^{2}$ statistics from $SSE^{*}$ and $SSE$, with degrees of freedom $n-(K-J)$ and $n-K$ respectively, since

\[ \frac{\varepsilon_{R}'M_{R}\varepsilon_{R}}{\sigma^{2}} \sim \chi^{2}(n-(K-J)) \quad \text{and} \quad \frac{\varepsilon'M\varepsilon}{\sigma^{2}} \sim \chi^{2}(n-K). \]

Therefore, since the difference $(\varepsilon_{R}'M_{R}\varepsilon_{R}-\varepsilon'M\varepsilon)/\sigma^{2}$ is $\chi^{2}(J)$ and independent of the denominator:

\[ F = \frac{(SSE^{*}-SSE)/J}{SSE/(n-K)} = \frac{(\varepsilon_{R}'M_{R}\varepsilon_{R}-\varepsilon'M\varepsilon)/(\sigma^{2}J)}{\varepsilon'M\varepsilon/\big(\sigma^{2}(n-K)\big)} \sim F(J, n-K) \mid H_{0}. \tag{2} \]

Hence the Wald and LM approaches lead to test statistics that have the same F distribution under the null hypothesis. The two test statistics are also numerically identical, see Greene p. 90. Intuitively, this is because in the Wald case we do in fact impose exactly the same restrictions as in the LM case, but not on the model: they are imposed after estimation of the unrestricted model, inside the Wald test statistic.
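In the same illustrative spirit as the sketch above (again with simulated data and the hypothetical single restriction $\beta_{3}=0$; none of this is part of the original notes), the code below computes the F statistic in (2) from the restricted and unrestricted sums of squared residuals and checks that it coincides numerically with the Wald-based F in (1) and with $t^{2}$, as stated above.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, K, J = 100, 3, 1                      # same illustrative setup as in the sketch above

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

# Unrestricted OLS
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
SSE = e @ e
s2 = SSE / (n - K)

# Restricted OLS under the hypothetical H0: beta_3 = 0 (drop the third regressor)
X_r = X[:, :2]
b_r = np.linalg.solve(X_r.T @ X_r, X_r.T @ y)
e_r = y - X_r @ b_r
SSE_star = e_r @ e_r                     # restricted sum of squared residuals

# F statistic in (2), based on the two sums of squared residuals
F_lm = ((SSE_star - SSE) / J) / (SSE / (n - K))

# Wald-based F from (1) and the squared t-statistic for the same restriction
R, q = np.array([[0.0, 0.0, 1.0]]), np.array([0.0])
m = R @ b - q
F_wald = float(m @ np.linalg.inv(s2 * R @ XtX_inv @ R.T) @ m / J)
t = b[2] / np.sqrt(s2 * XtX_inv[2, 2])   # third diagonal element of (X'X)^{-1}

print(F_lm, F_wald, t ** 2)              # the three numbers coincide
assert np.isclose(F_lm, F_wald) and np.isclose(F_lm, t ** 2)
\end{verbatim}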