
DEPARTMENT OF ECONOMICS
UNIVERSITY OF VICTORIA
ECONOMICS 366: ECONOMETRICS II
SPRING TERM 2005: ASSIGNMENT TWO
Brief Suggested Solutions
Question One:
Consider the classical T-observation, K-regressor linear regression model:
y = Xβ + e,  e ~ N(0, σ2IT)
where X is nonstochastic of full column rank. Suppose we specify prior, nonsample, information
on β as J exact linear restrictions Rβ=r, where R is a (J×K) nonstochastic, full row rank, matrix
and r is a nonstochastic (J×1) vector. We consider testing whether the sample is compatible with
the prior beliefs by examining the null hypothesis H0: Rβ=r versus Ha: Rβ≠r.
(a)
We can test H0 using the F-statistic:
F = (Rb − r)′[R(X′X)-1R′]-1(Rb − r) / (Jσ̂2)    (1)
where b is the OLS estimator of β, with corresponding residual vector ê = y − Xb, and σ̂2 = ê′ê/(T − K). Let:
b* = b − (X′X)-1R′[R(X′X)-1R′]-1(Rb − r) and e* = y − Xb*.
Show that F can also be written as:
F = (b − b*)′R′[R(X′X)-1R′]-1R(b − b*) / (Jσ̂2)    (2)
and as:
F = (e*′e* − ê′ê) / (Jσ̂2)    (3)
(5 marks)
2.5 marks for each part of the proof. Each ratio has the same denominator, so we only need to work with the numerators. For (2), we begin by noting that Rb* = r (b* is derived to ensure that it satisfies the restrictions), so that Rb − r = Rb − Rb* = R(b − b*), and we can write:
(Rb − r)′[R(X′X)-1R′]-1(Rb − r) = (b − b*)′R′[R(X′X)-1R′]-1R(b − b*)
as required.
For (3), we use the result given in Hint 1 for part (b), which you considered in Test 1, that:
e*′e* = ê′ê + (Rb − r)′[R(X′X)-1R′]-1(Rb − r)
so the result follows immediately, as this implies that:
e*′e* − ê′ê = (Rb − r)′[R(X′X)-1R′]-1(Rb − r)
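To see the equivalence numerically, here is a minimal Python sketch (not part of the original solutions; the simulated design, R, and r are purely illustrative) that computes all three forms of the F-statistic on the same simulated sample and confirms they agree.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, J = 100, 4, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 0.5, -0.3, 0.2])
y = X @ beta + rng.normal(size=T)

# illustrative restrictions: beta3 = 0 and beta4 = 0
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
r = np.zeros(J)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                          # OLS estimator b
e_hat = y - X @ b                              # OLS residuals e-hat
sig2_hat = e_hat @ e_hat / (T - K)             # sigma-hat squared

A = np.linalg.inv(R @ XtX_inv @ R.T)           # [R(X'X)^-1 R']^-1
b_star = b - XtX_inv @ R.T @ A @ (R @ b - r)   # RLS estimator b*
e_star = y - X @ b_star                        # restricted residuals e*

F1 = (R @ b - r) @ A @ (R @ b - r) / (J * sig2_hat)              # form (1)
F2 = (b - b_star) @ R.T @ A @ R @ (b - b_star) / (J * sig2_hat)  # form (2)
F3 = (e_star @ e_star - e_hat @ e_hat) / (J * sig2_hat)          # form (3)
print(F1, F2, F3)  # all three agree up to rounding error
```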
(b)
Suppose we estimate the classical model subject to the restrictions Rβ=r; i.e., we use the
restricted least squares estimator, b*, as the estimator of β. Consider a corresponding
estimator of σ2:
σ*2 = (y − Xb*)′(y − Xb*) / (T − K + J) = e*′e* / (T − K + J)
Use the following Theorems and Hints to prove that σ*2 is an unbiased estimator of σ2 under the
null hypothesis that Rβ=r.
(10 marks)
Theorem 1:
If the (n×1) vector x ~ N(0, In) and A is an idempotent (n×n) matrix, then x′Ax has a chi-squared
distribution with degrees of freedom equal to the rank of A.
Theorem 2:
If the (n×1) vector x~N(0, Ω), then x′Ω-1x has a chi-squared distribution with n degrees of
freedom.
Theorem 3:
If z1 and z2 are independent chi-squared variables with m1 and m2 degrees of freedom,
respectively, then
z1 + z2 ~ χ2(m1+m2)
Theorem 4:
If z is a chi-squared variable with m degrees of freedom, then E(z)=m.
Hints:
1. Use the result that you proved in Test 1:
e*′e* = ê′ê + (Rb − r)′[R(X′X)-1R′]-1(Rb − r)
2. Show that (ê′ê/σ2) ~ χ2(T − K) using Theorem 1.
3. Show that, under the null hypothesis that Rβ=r, ((Rb − r)′[R(X′X)-1R′]-1(Rb − r)/σ2) ~ χ2(J).
4. The random variables (ê′ê/σ2) and ((Rb − r)′[R(X′X)-1R′]-1(Rb − r)/σ2) are independently distributed (you may use this result without proof).
Our aim is to prove that σ*2 is an unbiased estimator of σ2 under the null hypothesis that
Rβ=r; i.e., that E(σ*2)=σ2 when Rβ=r. We have:
σ*2 = e*′e* / (T − K + J) = [ê′ê + (Rb − r)′[R(X′X)-1R′]-1(Rb − r)] / (T − K + J)
using the result given as Hint 1. As the only random variables are in the numerator, I work only with those terms to begin. Considering ê′ê first, recall from Econ 365 that:
ê′ê = e′Me
where M = IT − X(X′X)-1X′ is idempotent and symmetric, of rank (T − K). As the (T×1) vector e ~ N(0, σ2IT), (e/σ) ~ N(0, IT), so applying Theorem 1 we have that:
(ê′ê/σ2) = (e′Me/σ2) ~ χ2(T − K)
so that we have shown the result in Hint 2. Moving now to the result given in Hint 3, we want to consider the random variable ((Rb − r)′[R(X′X)-1R′]-1(Rb − r)/σ2) when H0 is true. We have that the (K×1) vector b is distributed as:
b ~ N(β, σ2(X′X)-1)
so that the (J×1) vector (Rb-r) is distributed as:
Rb-r ~ N(Rβ-r, σ2R(X′X)-1R′)
and, when Rβ=r then:
Rb-r ~ N(0, σ2R(X′X)-1R′) and (Rb-r)/σ ~ N(0, R(X′X)-1R′)
so, using Theorem 2,
((Rb-r)′[R(X′X)-1R′]-1(Rb-r)/σ2) ~ χ2(J) when Rβ=r
as Cov((Rb-r)/σ)=R(X′X)-1R′ is a (J×J) matrix.
So, combining the results, we have, when Rβ=r, that:
(e*′e*)/σ2 = (ê′ê)/σ2 + (Rb − r)′[R(X′X)-1R′]-1(Rb − r)/σ2 = z1 + z2
where z1 = (ê′ê/σ2) ~ χ2(T − K) and z2 = ((Rb − r)′[R(X′X)-1R′]-1(Rb − r)/σ2) ~ χ2(J) when Rβ=r.
So, given the result in Hint 4, it follows, using Theorem 3, that:
(e*′e*)/σ2 ~ χ2(T-K+J) when Rβ=r.
Finally, using Theorem 4, we have:
E((e*′e*)/σ2) = (T-K+J) when Rβ=r
so that:
E(e*′e*) = σ2(T-K+J) when Rβ=r
and
E(σ*2) = E(e*′e*)/(T-K+J) = σ2(T-K+J)/(T-K+J) = σ2 as desired;
that is, σ*2 is an unbiased estimator of σ2 when Rβ=r.
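As an illustrative check of this result, the following Monte Carlo sketch (not in the original solutions; the design and restriction are made up for illustration) averages σ*2 over repeated samples generated under H0 and compares the average to the true σ2.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K, J, sigma2 = 60, 3, 1, 4.0
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 2.0, 2.0])   # chosen so that H0 below is true
R = np.array([[0.0, 1.0, -1.0]])   # H0: beta2 - beta3 = 0
r = np.zeros(J)

XtX_inv = np.linalg.inv(X.T @ X)
A = np.linalg.inv(R @ XtX_inv @ R.T)

draws = []
for _ in range(10000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=T)
    b = XtX_inv @ X.T @ y
    b_star = b - XtX_inv @ R.T @ A @ (R @ b - r)   # RLS estimator
    e_star = y - X @ b_star
    draws.append(e_star @ e_star / (T - K + J))    # sigma*^2

print(np.mean(draws))  # should be close to sigma2 = 4.0 under H0
```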
Question Two:
The impact of parental characteristics on a child’s birth weight is often explored; e.g., Mullahy, J. (1997), “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior”, Review of Economics and Statistics 79, 586-593. Suppose we hypothesize the following model:
bwghtt = β1 + β2cigst + β3parityt + β4faminct + β5mothereduct + β6fathereduct + et    (1)
where for the t’th child:
bwght = birth weight, in pounds
cigs = average number of cigarettes the mother smoked per day during the pregnancy
parity = the birth order of this child
faminc = annual family income, dollars
mothereduc = years of schooling for the mother
fathereduc = years of schooling for the father
The workfile “366ass2.wf1”, available from the course web page, contains data on these variables
for 1,191 children.
(a) What signs would you expect for the slope parameters? Explain.
(1 mark)
A priori, I would expect β2<0, β3>0, β4>0, β5>0 and β6>0: ceteris paribus, more cigarettes smoked will decrease birth weight; birth weight typically increases with each additional child that the mother carries; and higher income might be expected to result in better nutrition and care, and so higher birth weight. I am unclear as to the impact of education, though if there is an effect then higher education might imply more information, and so better care and higher birth weight, though I think this is stretching things on this one!
(b) Estimate Model (1) using least squares; report the results. Comment on the results in the light
of part (a).
(2 marks)
The output follows:
Dependent Variable: BWGHT
Method: Least Squares
Date: 01/19/05   Time: 19:29
Sample: 1 1191
Included observations: 1191

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             114.5243      3.728453     30.71631      0.0000
CIGS          -0.595936     0.110348     -5.400524     0.0000
PARITY        1.787603      0.659406     2.710932      0.0068
FAMINC        0.056041      0.036562     1.532794      0.1256
MOTHEDUC      -0.370450     0.319855     -1.158182     0.2470
FATHEDUC      0.472394      0.282643     1.671345      0.0949

R-squared            0.038748    Mean dependent var      119.5298
Adjusted R-squared   0.034692    S.D. dependent var      20.14124
S.E. of regression   19.78878    Akaike info criterion   8.813133
Sum squared resid    464041.1    Schwarz criterion       8.838737
Log likelihood       -5242.220   F-statistic             9.553500
Durbin-Watson stat   1.938419    Prob(F-statistic)       0.000000
The signs of the coefficients are as expected, except for the estimate of the impact of
mother’s education, though the effect is statistically insignificant.
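For readers without EViews, a hedged replication sketch in Python follows: it assumes the workfile’s series have been exported to a CSV file (the file name "366ass2.csv" and the lower-case column names are hypothetical, not part of the original assignment).

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("366ass2.csv")  # hypothetical export of the EViews workfile
X = sm.add_constant(df[["cigs", "parity", "faminc", "motheduc", "fatheduc"]])
unrestricted = sm.OLS(df["bwght"], X).fit()
print(unrestricted.summary())    # coefficients should match the EViews output
```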
(c) Suppose we hypothesize, after controlling for cigs, parity, and faminc, that parents’ education
has no effect on birth weight. Specify the null hypothesis that corresponds to this prior belief.
What alternative hypothesis would you specify? Explain.
(2 marks)
This belief would correspond to the null hypothesis H0: β5=0 and β6=0. As there are two restrictions under the null, the alternative hypothesis is that at least one of the parameters is nonzero; i.e., Ha: at least one of β5 or β6 ≠ 0, which corresponds to three possibilities: β5=0 and β6≠0; β5≠0 and β6=0; or β5≠0 and β6≠0.
(d) Write the nonsample information from part (c) in the form Rβ=r, appropriately defining R and r, including their dimensions and ranks.
(2 marks)
As there are two restrictions under H0, the dimension of R is (2×6) of rank 2, while the
dimension of r is (2×1) and of rank 1. Specifically, we have:
R = [0 0 0 0 1 0 ; 0 0 0 0 0 1] (2×6),  β = (β1, β2, β3, β4, β5, β6)′ (6×1),  r = (0, 0)′ (2×1),  with Rβ = r
which produces the restrictions under test; the first row providing β5=0 and the second
row providing β6=0.
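For concreteness, a minimal numpy sketch (illustrative only, not part of the original solutions) constructing this R and r:

```python
import numpy as np

K, J = 6, 2
R = np.zeros((J, K))
R[0, 4] = 1.0   # first row picks out beta5 (motheduc)
R[1, 5] = 1.0   # second row picks out beta6 (fatheduc)
r = np.zeros(J)

print(np.linalg.matrix_rank(R))  # 2: R has full row rank
```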
(e) Specify the restricted model that imposes the nonsample information from part (d). Estimate this model to obtain the restricted least squares (RLS) estimates of the parameters; report the results.
(1 mark)
Imposing that β5=0 and β6=0 gives the restricted model:
bwghtt = β1 + β2cigst + β3parityt + β4faminct + et    (2)
The EViews output from estimating this model is:
Dependent Variable: BWGHT
Method: Least Squares
Date: 01/19/05   Time: 19:30
Sample: 1 1191
Included observations: 1191

Variable      Coefficient   Std. Error   t-Statistic   Prob.
C             115.4699      1.655898     69.73251      0.0000
CIGS          -0.597852     0.108770     -5.496474     0.0000
PARITY        1.832274      0.657540     2.786558      0.0054
FAMINC        0.067062      0.032394     2.070204      0.0386

R-squared            0.036416    Mean dependent var      119.5298
Adjusted R-squared   0.033981    S.D. dependent var      20.14124
S.E. of regression   19.79607    Akaike info criterion   8.812197
Sum squared resid    465166.8    Schwarz criterion       8.829267
Log likelihood       -5243.663   F-statistic             14.95330
Durbin-Watson stat   1.939941    Prob(F-statistic)       0.000000
(f) Compare the unrestricted and restricted least squares estimates of the parameters and the sum
of squared errors. If the prior information is correct, would you expect the unrestricted and
restricted outcomes to be “close” or not? Explain. Accordingly, would your comparisons
suggest that the prior, nonsample, information is perhaps true?
(2 marks)
The table below summarises the estimates, with standard errors in parentheses:

Parameter   OLS Estimates     RLS Estimates
β1          114.524 (3.728)   115.470 (1.656)
β2          -0.596 (0.110)    -0.598 (0.109)
β3          1.788 (0.659)     1.832 (0.658)
β4          0.056 (0.037)     0.067 (0.032)
β5          -0.370 (0.320)    0
β6          0.472 (0.283)     0
If the prior information is correct then we would expect the RLS and OLS estimates of the parameters of interest to be similar, as the sample will approximately reflect the prior information without our having to impose it. The RLS and OLS estimates of β5 and β6 are not similar in magnitude, which may suggest that the prior information is not compatible with the sample; however, this needs to be weighed against the precision with which we estimate these parameters. Note that the estimates of the first four parameters are similar in magnitude and, as expected, we estimate them with greater precision using RLS than OLS. Comparing the sums of squared errors, we have SSEU = 464041.1 and SSER = 465166.8; as expected SSEU < SSER, though there is only a 0.2% difference between them, which suggests that we are likely to find that the sample is compatible with the restrictions.
(g) Given your view in part (f), and the formulae for the F-ratio considered in Question One,
would you expect a “close to zero” or “far from zero” value for the F-ratio that examines
whether the sample is compatible with the prior beliefs? Carefully explain.
(2 marks)
One of the versions of the F-statistic examined in Question One is:
F = (e*′e* − ê′ê) / (Jσ̂2)
where SSEU = ê′ê, SSER = e*′e*, and σ̂2 = SSEU/(T − K). As we found little relative difference between the two sums of squares, it is likely that our sample value for this ratio is “close to zero”.
(h) Use EViews to undertake the F-test to ascertain whether this sample is compatible with the
nonsample belief that parents’ education has no impact on birth weight, after controlling for
cigs, parity, and faminc. Report the results. Comment on the outcome in light of your
expectations from parts (f) and (g).
(3 marks)
You may have used any of the F-ratio formulae provided in Question One to undertake the test; I used the first one, based only on the unrestricted model. The EViews output is:
Wald Test:
Equation: UNRESTRICTED

Test Statistic   Value      df          Probability
F-statistic      1.437269   (2, 1185)   0.2380
Chi-square       2.874537   2           0.2376

Null Hypothesis Summary:
Normalized Restriction (= 0)   Value       Std. Err.
C(5)                           -0.370450   0.319855
C(6)                           0.472394    0.282643

Restrictions are linear in coefficients.
which gives a sample value for the F-statistic of 1.437. The associated p-value is p = Pr(F(2,1185) > 1.437) = 0.238, which is the smallest significance level at which I would reject the null hypothesis; the results thus support the null at traditional levels of significance, e.g., at the 10% or 5% level. That is, the sample is compatible with the prior belief that the education of the mother and father is not relevant in determining a child’s birth weight.
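As a cross-check, the reported F-statistic can be reproduced from the two sums of squared residuals in parts (b) and (e) using form (3) of the F-ratio; a short Python sketch:

```python
from scipy.stats import f

SSE_U, SSE_R = 464041.1, 465166.8   # unrestricted and restricted SSEs
T, K, J = 1191, 6, 2

sig2_hat = SSE_U / (T - K)
F = (SSE_R - SSE_U) / (J * sig2_hat)
p_value = f.sf(F, J, T - K)

print(F, p_value)  # approximately 1.437 and 0.238, matching the Wald test
```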
(i) Suppose that the prior information is correct. Compare the sampling properties of the OLS
and RLS estimators of the parameters in this case. Which estimator would you prefer to use?
Why?
(2 marks)
The sampling distributions of the two estimators are:
b ~ N(β, σ2(X′X)-1)
and
b* ~ N(β + (X′X)-1R′[R(X′X)-1R′]-1(r − Rβ), σ2(X′X)-1[IK − R′[R(X′X)-1R′]-1R(X′X)-1])
Irrespective of whether the null hypothesis is correct or false, the OLS estimator, b, is unbiased and best, in the sense that it has the “smallest” (in the matrix sense) variance-covariance matrix among all unbiased estimators of β based solely on the sample data. When the prior information is correct, b* is also an unbiased estimator and its variance-covariance matrix is:
Var(b*) = σ2(X′X)-1 − σ2(X′X)-1R′[R(X′X)-1R′]-1R(X′X)-1 = Var(b)[IK − R′[R(X′X)-1R′]-1R(X′X)-1]
so that Var(b*) ≤ Var(b) in the matrix sense; that is, Var(b) − Var(b*) is positive semidefinite. Accordingly, when the prior information is correct, our preference is to use b* over b, as it has a “smaller” variance-covariance matrix, reflecting the additional information incorporated.
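A small numeric illustration (not from the original solutions; X and R are simulated placeholders) that Var(b) − Var(b*) is indeed positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(2)
T, K = 50, 6
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
R = np.array([[0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1]], dtype=float)

XtX_inv = np.linalg.inv(X.T @ X)
var_b = XtX_inv                         # Var(b) up to the factor sigma^2
var_bstar = XtX_inv - XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ R @ XtX_inv

diff = var_b - var_bstar
print(np.all(np.linalg.eigvalsh(diff) >= -1e-12))  # True: PSD difference
```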
(j) Suppose that the prior information is invalid. Compare the sampling properties of the OLS
and RLS estimators of the parameters in this case. Which estimator would you prefer to use?
Why?
(3 marks)
I gave the sampling distributions in part (i); as seen, the OLS estimator is unbiased and is the “best” estimator among all unbiased estimators based on the sample data, irrespective of whether the prior information is valid or invalid. The RLS estimator, b*, is biased when the prior information is invalid, but still has a “smaller” variance-covariance matrix than the OLS estimator; additional information, whether correct or incorrect, improves precision, but may result in bias. This suggests that we compare b* with b using a matrix MSE comparison, to enable a trade-off between the gain in variance and the loss of unbiasedness. We have:
MSE(b*) = Bias(b*)Bias(b*)′ + Var(b*)
= (X′X)-1R′[R(X′X)-1R′]-1(r − Rβ)(r − Rβ)′[R(X′X)-1R′]-1R(X′X)-1 + σ2(X′X)-1 − σ2(X′X)-1R′[R(X′X)-1R′]-1R(X′X)-1
and MSE(b) = Var(b) = σ2(X′X)-1, so that:
MSE(b*) − MSE(b) = (X′X)-1R′[R(X′X)-1R′]-1[(r − Rβ)(r − Rβ)′[R(X′X)-1R′]-1 − σ2IJ]R(X′X)-1
Whether this is > or < 0, in the matrix sense, will depend on the degree to which Rβ differs from r, but, clearly, when Rβ is sufficiently far from r, our preference for b* over b will switch.
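To illustrate this switch, here is a Monte Carlo sketch (purely illustrative, not from the original solutions; the design and the size of the violation are made up) comparing the total (trace) MSE of OLS and RLS as the violation of the restriction grows:

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 40, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
R = np.array([[0.0, 0.0, 1.0]])
XtX_inv = np.linalg.inv(X.T @ X)
A = np.linalg.inv(R @ XtX_inv @ R.T)

for delta in [0.0, 0.5, 1.0, 2.0]:       # size of the restriction violation
    beta = np.array([1.0, 1.0, delta])   # H0: beta3 = 0 is violated by delta
    mse_ols, mse_rls = 0.0, 0.0
    for _ in range(5000):
        y = X @ beta + rng.normal(size=T)
        b = XtX_inv @ X.T @ y
        b_star = b - XtX_inv @ R.T @ A @ (R @ b)   # imposes beta3 = 0
        mse_ols += np.sum((b - beta) ** 2)
        mse_rls += np.sum((b_star - beta) ** 2)
    print(delta, mse_ols / 5000, mse_rls / 5000)   # RLS wins only for small delta
```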