DEPARTMENT OF ECONOMICS, UNIVERSITY OF VICTORIA
ECONOMICS 366: ECONOMETRICS II
SPRING TERM 2005: ASSIGNMENT TWO
Brief Suggested Solutions

Question One: Consider the classical T-observation, K-regressor linear regression model:

y = Xβ + e ;   e ~ N(0, σ²I_T)

where X is nonstochastic of full column rank. Suppose we specify prior, nonsample, information on β as J exact linear restrictions Rβ = r, where R is a (J×K) nonstochastic, full row rank, matrix and r is a nonstochastic (J×1) vector. We consider testing whether the sample is compatible with the prior beliefs by examining the null hypothesis H0: Rβ = r versus Ha: Rβ ≠ r.

(a) We can test H0 using the F-statistic:

F = (Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r) / (Jσ̂²)    (1)

where b is the OLS estimator of β, with corresponding residual vector ê = y − Xb, and σ̂² = ê′ê/(T − K). Let:

b* = b − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rb − r)   and   e* = y − Xb*.

Show that F can also be written as:

F = (b − b*)′R′[R(X′X)⁻¹R′]⁻¹R(b − b*) / (Jσ̂²)    (2)

and as:

F = (e*′e* − ê′ê) / (Jσ̂²)    (3)

(5 marks)

2.5 marks for each part of the proof. Each ratio has the same denominator, so we only need to work with the numerators.

For (2), we begin by noting that Rb* = r (b* is derived to ensure that it satisfies the restrictions), so that Rb − r = Rb − Rb* = R(b − b*), and we can write:

(Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r) = (b − b*)′R′[R(X′X)⁻¹R′]⁻¹R(b − b*)

as required.

For (3), we use the result given in Hint 1 for part (b), which you considered in Test 1, that:

e*′e* = ê′ê + (Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)

so the result follows immediately, as this implies that:

e*′e* − ê′ê = (Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)
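(As an aside, the equivalence of (1), (2) and (3), and the Hint 1 decomposition they rest on, are easy to verify numerically. The following is a minimal Python sketch using simulated data; the dimensions, seed and coefficient values are purely illustrative and are not part of the assignment.)

```python
import numpy as np

# Simulate a small classical regression and verify that the three expressions
# for F agree, and that e*'e* = e^'e^ + (Rb-r)'[R(X'X)^-1 R']^-1 (Rb-r).
rng = np.random.default_rng(0)
T, K, J = 100, 4, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 0.5, -0.3, 0.2])
y = X @ beta + rng.normal(size=T)

R = np.array([[0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)  # restrict last two betas
r = np.zeros(J)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                                    # OLS estimator
e_hat = y - X @ b
sigma2_hat = e_hat @ e_hat / (T - K)

A_inv = np.linalg.inv(R @ XtX_inv @ R.T)                 # [R(X'X)^-1 R']^-1
b_star = b - XtX_inv @ R.T @ A_inv @ (R @ b - r)         # restricted LS estimator
e_star = y - X @ b_star

# Hint 1 decomposition of the restricted sum of squared residuals:
assert np.isclose(e_star @ e_star,
                  e_hat @ e_hat + (R @ b - r) @ A_inv @ (R @ b - r))

F1 = (R @ b - r) @ A_inv @ (R @ b - r) / (J * sigma2_hat)
F2 = (b - b_star) @ R.T @ A_inv @ R @ (b - b_star) / (J * sigma2_hat)
F3 = (e_star @ e_star - e_hat @ e_hat) / (J * sigma2_hat)
assert np.isclose(F1, F2) and np.isclose(F1, F3)         # forms (1), (2), (3) agree
```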
(b) Suppose we estimate the classical model subject to the restrictions Rβ = r; i.e., we use the restricted least squares estimator, b*, as the estimator of β. Consider a corresponding estimator of σ²:

σ*² = (y − Xb*)′(y − Xb*) / (T − K + J) = e*′e* / (T − K + J)

Use the following Theorems and Hints to prove that σ*² is an unbiased estimator of σ² under the null hypothesis that Rβ = r. (10 marks)

Theorem 1: If the (n×1) vector x ~ N(0, I_n) and A is an idempotent (n×n) matrix, then x′Ax has a chi-squared distribution with degrees of freedom equal to the rank of A.
Theorem 2: If the (n×1) vector x ~ N(0, Ω), then x′Ω⁻¹x has a chi-squared distribution with n degrees of freedom.
Theorem 3: If z1 and z2 are independent chi-squared variables with m1 and m2 degrees of freedom, respectively, then z1 + z2 ~ χ²(m1 + m2).
Theorem 4: If z is a chi-squared variable with m degrees of freedom, then E(z) = m.

Hints:
1. Use the result that you proved in Test 1: e*′e* = ê′ê + (Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r).
2. Show that (ê′ê/σ²) ~ χ²(T − K) using Theorem 1.
3. Show that, under the null hypothesis that Rβ = r, ((Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)/σ²) ~ χ²(J).
4. The random variables (ê′ê/σ²) and ((Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)/σ²) are independently distributed (you may use this result without proof).

Our aim is to prove that σ*² is an unbiased estimator of σ² under the null hypothesis that Rβ = r; i.e., that E(σ*²) = σ² when Rβ = r. We have:

σ*² = e*′e* / (T − K + J) = [ê′ê + (Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)] / (T − K + J)

using the result given as Hint 1. As the only random variables are in the numerator, I work only with those terms to begin.

Considering ê′ê first, recall from Econ 365 that ê′ê = e′Me, where M = I − X(X′X)⁻¹X′ is idempotent and symmetric, of rank (T − K). As the (T×1) vector e ~ N(0, σ²I_T), (e/σ) ~ N(0, I_T), so applying Theorem 1 we have (ê′ê/σ²) = (e′Me/σ²) ~ χ²(T − K), which is the result in Hint 2.

Moving now to the result given in Hint 3, we want to consider the random variable ((Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)/σ²) when H0 is true. The (K×1) vector b is distributed as:

b ~ N(β, σ²(X′X)⁻¹)

so the (J×1) vector (Rb − r) is distributed as:

Rb − r ~ N(Rβ − r, σ²R(X′X)⁻¹R′)

and, when Rβ = r:

Rb − r ~ N(0, σ²R(X′X)⁻¹R′)   and   (Rb − r)/σ ~ N(0, R(X′X)⁻¹R′).

So, using Theorem 2, ((Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)/σ²) ~ χ²(J) when Rβ = r, as Cov((Rb − r)/σ) = R(X′X)⁻¹R′ is a (J×J) matrix.

Combining the results, we have, when Rβ = r:

(e*′e*)/σ² = (ê′ê)/σ² + (Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)/σ² = z1 + z2

where z1 = (ê′ê/σ²) ~ χ²(T − K) and z2 = (Rb − r)′[R(X′X)⁻¹R′]⁻¹(Rb − r)/σ² ~ χ²(J) when Rβ = r. Given the independence result in Hint 4, it follows from Theorem 3 that:

(e*′e*)/σ² ~ χ²(T − K + J) when Rβ = r.

Finally, using Theorem 4, we have E((e*′e*)/σ²) = (T − K + J) when Rβ = r, so that E(e*′e*) = σ²(T − K + J) when Rβ = r, and:

E(σ*²) = E(e*′e*)/(T − K + J) = σ²(T − K + J)/(T − K + J) = σ²

as desired; that is, σ*² is an unbiased estimator of σ² when Rβ = r.

Question Two: The impact of parental characteristics on a child's birth weight is often explored; e.g., Mullahy, J. (1997), "Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior", Review of Economics and Statistics 79, 586-593. Suppose we hypothesize the following model:

bwght_t = β1 + β2 cigs_t + β3 parity_t + β4 faminc_t + β5 mothereduc_t + β6 fathereduc_t + e_t    (1)

where for the t'th child:

bwght = birth weight, in ounces
cigs = average number of cigarettes the mother smoked per day during the pregnancy
parity = the birth order of this child
faminc = annual family income, in thousands of dollars
mothereduc = years of schooling for the mother
fathereduc = years of schooling for the father

The workfile "366ass2.wf1", available from the course web page, contains data on these variables for 1,191 children.

(a) What signs would you expect for the slope parameters? Explain. (1 mark)

A priori, I would expect β2 < 0, β3 > 0, β4 > 0, β5 > 0 and β6 > 0. Ceteris paribus, more cigarettes smoked should decrease birth weight; birth weight typically increases with each additional child the mother carries; and higher income might be expected to result in better nutrition and care, and so higher birth weight. I am unclear about the impact of education, though if there is an effect then more education might imply more information, better care and so on, and hence higher birth weight, though I think this is stretching things on this one!

(b) Estimate Model (1) using least squares; report the results. Comment on the results in the light of part (a). (2 marks)

The output follows:

Dependent Variable: BWGHT
Method: Least Squares
Date: 01/19/05  Time: 19:29
Sample: 1 1191
Included observations: 1191

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C              114.5243      3.728453     30.71631     0.0000
CIGS          -0.595936      0.110348    -5.400524     0.0000
PARITY         1.787603      0.659406     2.710932     0.0068
FAMINC         0.056041      0.036562     1.532794     0.1256
MOTHEDUC      -0.370450      0.319855    -1.158182     0.2470
FATHEDUC       0.472394      0.282643     1.671345     0.0949

R-squared            0.038748    Mean dependent var     119.5298
Adjusted R-squared   0.034692    S.D. dependent var     20.14124
S.E. of regression   19.78878    Akaike info criterion  8.813133
Sum squared resid    464041.1    Schwarz criterion      8.838737
Log likelihood      -5242.220    F-statistic            9.553500
Durbin-Watson stat   1.938419    Prob(F-statistic)      0.000000

The signs of the coefficients are as expected, except for the estimate of the impact of mother's education, though that effect is statistically insignificant.
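For readers without EViews, this estimation is easy to reproduce in Python with statsmodels; the sketch below is hypothetical, in that the file name "bwght.csv" and the lower-case column names are assumptions about how the workfile's series might be exported, not part of the assignment.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical re-estimation of Model (1); "bwght.csv" and the column names
# are assumed exports of the series in the EViews workfile 366ass2.wf1.
df = pd.read_csv("bwght.csv")
X = sm.add_constant(df[["cigs", "parity", "faminc", "motheduc", "fatheduc"]])
unrestricted = sm.OLS(df["bwght"], X).fit()
print(unrestricted.summary())  # coefficients, std. errors, t-stats, R-squared
```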
(c) Suppose we hypothesize, after controlling for cigs, parity, and faminc, that parents' education has no effect on birth weight. Specify the null hypothesis that corresponds to this prior belief. What alternative hypothesis would you specify? Explain. (2 marks)

This belief corresponds to the null hypothesis H0: β5 = 0 and β6 = 0. As there are two restrictions under the null, the alternative hypothesis is that at least one of the parameters is nonzero; i.e., Ha: at least one of β5 or β6 ≠ 0, which covers three possibilities: β5 = 0 and β6 ≠ 0; β6 = 0 and β5 ≠ 0; or β5 ≠ 0 and β6 ≠ 0.

(d) Write the nonsample information from part (c) in the form Rβ = r, appropriately defining R and r, including their dimensions and ranks. (2 marks)

As there are two restrictions under H0, R is a (2×6) matrix of rank 2, while r is the (2×1) zero vector. Specifically, we have:

R = [ 0  0  0  0  1  0 ]   (2×6),   β = (β1, β2, β3, β4, β5, β6)′   (6×1),   r = (0, 0)′   (2×1)
    [ 0  0  0  0  0  1 ]

which produces the restrictions under test: the first row of Rβ = r gives β5 = 0 and the second row gives β6 = 0.

(e) Specify the restricted model that imposes the nonsample information from part (d). Estimate this model to obtain the restricted least squares (RLS) estimates of the parameters; report the results. (1 mark)

Imposing β5 = 0 and β6 = 0 gives the restricted model:

bwght_t = β1 + β2 cigs_t + β3 parity_t + β4 faminc_t + e_t    (2)

The EViews output from estimating this model is:

Dependent Variable: BWGHT
Method: Least Squares
Date: 01/19/05  Time: 19:30
Sample: 1 1191
Included observations: 1191

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C              115.4699      1.655898     69.73251     0.0000
CIGS          -0.597852      0.108770    -5.496474     0.0000
PARITY         1.832274      0.657540     2.786558     0.0054
FAMINC         0.067062      0.032394     2.070204     0.0386

R-squared            0.036416    Mean dependent var     119.5298
Adjusted R-squared   0.033981    S.D. dependent var     20.14124
S.E. of regression   19.79607    Akaike info criterion  8.812197
Sum squared resid    465166.8    Schwarz criterion      8.829267
Log likelihood      -5243.663    F-statistic            14.95330
Durbin-Watson stat   1.939941    Prob(F-statistic)      0.000000

(f) Compare the unrestricted and restricted least squares estimates of the parameters and the sum of squared errors. If the prior information is correct, would you expect the unrestricted and restricted outcomes to be "close" or not? Explain. Accordingly, would your comparisons suggest that the prior, nonsample, information is perhaps true? (2 marks)

The table below summarises the estimates, with standard errors in parentheses:

Parameter   OLS Estimates      RLS Estimates
β1          114.524 (3.728)    115.470 (1.656)
β2          -0.596 (0.110)     -0.598 (0.109)
β3           1.788 (0.659)      1.832 (0.658)
β4           0.056 (0.037)      0.067 (0.032)
β5          -0.370 (0.320)      0
β6           0.472 (0.283)      0

If the prior information is correct, then we would expect little difference between the RLS and OLS estimates, as the sample will approximately reflect the prior information without our having to impose it. The RLS and OLS estimates of β5 and β6 are not similar in magnitude, which may suggest that the prior information is not compatible with the sample; however, this needs to be weighed against the precision with which we estimate the parameters. Note that the estimates of the first four parameters are similar in magnitude and, as expected, we estimate them with greater precision using RLS than OLS.

Comparing the sums of squared errors, we have SSE_U = 464041.1 and SSE_R = 465166.8; as expected, SSE_U < SSE_R, though there is only about a 0.2% difference between them, which suggests that we are likely to find the sample compatible with the restrictions.
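Continuing the Python sketch from part (b), a minimal numerical check of parts (d)-(f): build R and r, form the RLS estimator b* = b − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rb − r), and compare the two sums of squared residuals (again, the data file and variable names are assumptions).

```python
import numpy as np

# R and r from part (d): the two rows pick out beta5 (motheduc) and beta6 (fatheduc).
R = np.array([[0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1]], dtype=float)
r = np.zeros(2)

Xmat = X.to_numpy(dtype=float)                    # design matrix incl. constant
b = unrestricted.params.to_numpy()                # OLS estimates from the fit above
XtX_inv = np.linalg.inv(Xmat.T @ Xmat)
A_inv = np.linalg.inv(R @ XtX_inv @ R.T)
b_star = b - XtX_inv @ R.T @ A_inv @ (R @ b - r)  # RLS: last two entries exactly 0

e_star = df["bwght"].to_numpy() - Xmat @ b_star
print(unrestricted.ssr, e_star @ e_star)          # SSE_U and SSE_R; cf. 464041.1 and 465166.8
```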
(g) Given your view in part (f), and the formulae for the F-ratio considered in Question One, would you expect a "close to zero" or "far from zero" value for the F-ratio that examines whether the sample is compatible with the prior beliefs? Carefully explain. (2 marks)

One of the versions of the F-statistic examined in Question One is:

F = (e*′e* − ê′ê) / (Jσ̂²)

where SSE_U = ê′ê, SSE_R = e*′e*, and σ̂² = SSE_U/(T − K). As we found little relative difference between the two sums of squares, it is likely that our sample value for this ratio is "close to zero".

(h) Use EViews to undertake the F-test to ascertain whether this sample is compatible with the nonsample belief that parents' education has no impact on birth weight, after controlling for cigs, parity, and faminc. Report the results. Comment on the outcome in light of your expectations from parts (f) and (g). (3 marks)

You may have used any of the F-ratio formulae provided in Question One to undertake the test; I used the first one, based only on the unrestricted model. The EViews output is:

Wald Test:
Equation: UNRESTRICTED

Test Statistic    Value       df           Probability
F-statistic       1.437269    (2, 1185)    0.2380
Chi-square        2.874537    2            0.2376

Null Hypothesis Summary:

Normalized Restriction (= 0)    Value        Std. Err.
C(5)                           -0.370450     0.319855
C(6)                            0.472394     0.282643

Restrictions are linear in coefficients.

This gives a sample value for the F-statistic of 1.437, close to zero as anticipated in part (g). The associated p-value is p = Pr(F(2,1185) > 1.437) = 0.238, which is the smallest significance level at which I would reject the null hypothesis; the results therefore support the null at traditional levels of significance, e.g., at the 10% or 5% level. That is, the sample is compatible with the prior belief that the education of the mother and father is not relevant in determining a child's birth weight, consistent with our expectations from parts (f) and (g).
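As a cross-check on the EViews Wald test, the F-statistic and p-value can be reproduced by hand from the two reported sums of squared residuals; a minimal sketch using only the numbers reported above:

```python
from scipy import stats

# F = (SSE_R - SSE_U) / (J * sigma2_hat), with sigma2_hat = SSE_U / (T - K).
SSE_U, SSE_R = 464041.1, 465166.8
T, K, J = 1191, 6, 2
F = (SSE_R - SSE_U) / (J * SSE_U / (T - K))
p = stats.f.sf(F, J, T - K)                # Pr(F(2, 1185) > F)
print(F, p)                                # approx. 1.437 and 0.238, as in EViews
```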
(i) Suppose that the prior information is correct. Compare the sampling properties of the OLS and RLS estimators of the parameters in this case. Which estimator would you prefer to use? Why? (2 marks)

The sampling distributions of the two estimators are:

b ~ N(β, σ²(X′X)⁻¹)

and

b* ~ N( β + (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ), σ²(X′X)⁻¹[I_K − R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹] )

Irrespective of whether the null hypothesis is correct or false, the OLS estimator, b, is unbiased and best, in the sense that it has the "smallest" (in the matrix sense) variance-covariance matrix of any unbiased estimator of β based solely on the sample data. When the prior information is correct, b* is also unbiased, and its variance-covariance matrix is:

Var(b*) = σ²(X′X)⁻¹ − σ²(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹
        = Var(b)[I_K − R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹]
        < Var(b) in the matrix sense.

Accordingly, when the prior information is correct, our preference is to use b* over b, as it has a "smaller" variance-covariance matrix, incorporating the additional information.

(j) Suppose that the prior information is invalid. Compare the sampling properties of the OLS and RLS estimators of the parameters in this case. Which estimator would you prefer to use? Why? (3 marks)

I gave the sampling distributions in part (i). As seen there, the OLS estimator is unbiased and is the "best" estimator among all unbiased estimators based on the sample data, irrespective of whether the prior information is valid or invalid. The RLS estimator, b*, is biased when the prior information is invalid, but it still has a "smaller" variance-covariance matrix than the OLS estimator: additional information, whether correct or incorrect, improves precision, but may result in bias. This suggests that we compare b* with b using a matrix mean squared error (MSE) comparison, to trade off the gain in variance against the loss of unbiasedness. We have:

MSE(b*) = Bias(b*)Bias(b*)′ + Var(b*)
        = (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(r − Rβ)(r − Rβ)′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹
          + σ²(X′X)⁻¹ − σ²(X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹R(X′X)⁻¹

and MSE(b) = Var(b) = σ²(X′X)⁻¹, so that:

MSE(b*) − MSE(b) = (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹[ (r − Rβ)(r − Rβ)′[R(X′X)⁻¹R′]⁻¹ − σ²I_J ]R(X′X)⁻¹

Whether this difference is > or < 0, in the matrix sense, depends on how far Rβ is from r; clearly, when Rβ is sufficiently far from r, our preference for b* over b will switch.
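The sign pattern of MSE(b*) − MSE(b) can be illustrated numerically; the sketch below reuses the simulated design from Question One and an equivalent symmetric factorisation of the expression above, with d = r − Rβ scaled up to mimic increasingly invalid prior information (all values illustrative).

```python
import numpy as np

# Eigenvalues of MSE(b*) - MSE(b) as the restriction error d = r - R(beta) grows.
rng = np.random.default_rng(1)
T, K, J, sigma2 = 100, 4, 2, 1.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
R = np.array([[0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
A = R @ XtX_inv @ R.T                              # R(X'X)^-1 R'
A_inv = np.linalg.inv(A)

for scale in (0.0, 0.1, 1.0):
    d = np.full(J, scale)                          # d = r - R(beta)
    inner = np.outer(d, d) - sigma2 * A            # symmetric core of the difference
    diff = XtX_inv @ R.T @ A_inv @ inner @ A_inv @ R @ XtX_inv
    # When d = 0 all eigenvalues are <= 0 (b* weakly dominates); positive
    # eigenvalues appear once the violation is large enough, so neither
    # estimator then dominates in the matrix MSE sense.
    print(scale, np.linalg.eigvalsh(diff).round(5))
```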