Heteroskedasticity-robust F

Cross section and panel method
Lecture 10 (Ch8)
Heteroskedasticity
Understanding the problem of heteroskedasticity
Heteroskedasticity means that the variance of the error term u depends on X:
Var(u|X) ≠ σ²
Consider the following model.
y = β0 + β1x1 + β2x2 + … + βkxk + u
Now recall the series of assumptions.
MLR1. Linear in parameters
MLR2. Random sampling
MLR3. No perfect collinearity
MLR4. Zero conditional mean: E(u|X)=0
MLR4’. Uncorrelatedness of x and u: Cov(xj,u)=0
MLR5. Homoskedasticity: Var(u|X)=σ²
MLR6. u follows a normal distribution.
Note: MLR4’ is a weaker assumption than MLR4.
What each set of assumptions delivers for the OLS estimators of β1, …, βk:
MLR1~MLR4: the estimators are unbiased and consistent.
MLR1~MLR4’: the estimators are consistent.
MLR1~MLR4 and MLR5: the estimators are approximately normal in large samples, so the usual t-test is valid.
MLR1~MLR4’ and MLR5: the estimators are approximately normal in large samples, so the usual t-test is valid.
MLR1~MLR4, MLR5 and MLR6: the t-statistics have an exact t-distribution, so the t-test is valid in any sample size.
Note that MLR1~MLR4 (or MLR4’) are sufficient conditions for consistency.
But in order to conduct the “usual” t-test, you need MLR5, the homoskedasticity assumption. This is because the standard error formula we used before is not consistent without MLR5.
If MLR5 is not satisfied but you apply the usual standard error anyway, you may mistakenly judge a parameter to be significant when it actually is not.
Heteroskedasticity-robust standard errors: one explanatory variable case
Heteroskedasticity means that Var(u|X) depends on X. In such a case, we have to modify the standard error.
Fortunately, there is a method to deal with heteroskedasticity. For illustrative purposes, I use the case of one explanatory variable.
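Consider the following model.
yi = β0 + β1xi + ui
Using the argument for asymptotic normality in handout 4, we can show that
$$\sqrt{n}\,(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N\!\left(0,\ \frac{E[(x_i - \mu_x)^2 u_i^2]}{[\sigma_x^2]^2}\right) \qquad (1)$$
where $\mu_x$ and $\sigma_x^2$ are the population mean and variance of $x_i$.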
(1) means that, in large samples, the variance of $\hat{\beta}_1$ is given approximately by:
$$\mathrm{Var}(\hat{\beta}_1) = \frac{1}{n}\,\frac{E[(x_i - \mu_x)^2 u_i^2]}{[\sigma_x^2]^2} \qquad (2)$$
Of course, you do not know $E[(x_i - \mu_x)^2 u_i^2]$ and $\sigma_x^2$. But, fortunately, you can consistently estimate them.
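To see this, notice that the OLS estimators for the coefficients are consistent even under heteroskedasticity. So you can replace $u_i$ with the OLS residual $\hat{u}_i$. Replacing the expectation with a sample average and $\sigma_x^2$ with $SST_x/n$, where $SST_x = \sum_{i=1}^{n}(x_i - \bar{x})^2$, the factors of $n$ cancel.
The estimator for the heteroskedasticity-robust variance of $\hat{\beta}_1$ is then given by:
$$\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2\,\hat{u}_i^2}{[SST_x]^2} \qquad (3)$$
The square root of (3) is the heteroskedasticity-robust standard error.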
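As an illustration, equation (3) can be computed by hand in a few lines. The following Python sketch is not part of the original slides: the function name simple_ols_robust and the four data points are made up for illustration. It applies equation (3) exactly as written, with no small-sample adjustment, so statistical software that applies such an adjustment can report slightly different numbers.

```python
import math


def simple_ols_robust(x, y):
    """Regress y on x (with an intercept); return slope, usual SE, robust SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sst_x = sum(d * d for d in dx)  # SST_x

    # OLS slope, intercept and residuals
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Usual standard error, valid only under homoskedasticity (MLR5)
    sigma2_hat = sum(u * u for u in resid) / (n - 2)
    se_usual = math.sqrt(sigma2_hat / sst_x)

    # Equation (3): sum of (x_i - x_bar)^2 * u_hat_i^2, divided by SST_x^2
    var_robust = sum(d * d * u * u for d, u in zip(dx, resid)) / sst_x ** 2
    se_robust = math.sqrt(var_robust)
    return b1, se_usual, se_robust


# Hypothetical data, for illustration only
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0, 4.0]
b1, se_usual, se_robust = simple_ols_robust(x, y)
print(f"slope = {b1:.4f}, usual SE = {se_usual:.4f}, robust SE = {se_robust:.4f}")
```

Comparing the two standard errors on your own data gives a rough sense of how much the heteroskedasticity correction matters.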
Heteroskedasticity-robust standard errors: multiple explanatory variable case
Now, consider the following regression.
y = β0 + β1x1 + β2x2 + … + βkxk + u    (4)
A valid estimator of $\mathrm{Var}(\hat{\beta}_j)$ under assumptions MLR1~MLR4 (or MLR4’) is given by
$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{\sum_{i=1}^{n}\hat{r}_{ij}^2\,\hat{u}_i^2}{[SSR_j]^2} \qquad (5)$$
This is the heteroskedasticity-robust variance,
where $\hat{u}_i$ is the OLS residual from the original regression model (4), $\hat{r}_{ij}$ is the i-th OLS residual from regressing xj on all other explanatory variables, and $SSR_j$ is the sum of squared residuals from this regression.
The heteroskedasticity-robust standard error is the square root of (5). Sometimes this is simply called the robust standard error.
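The partialling-out structure of equation (5) can be sketched for the special case of two explanatory variables. This Python example is an illustration under stated assumptions, not the method used by any particular software: the function names residualize and robust_se_x1 and the data are hypothetical, only the coefficient on x1 is handled, and no small-sample adjustment is applied. It relies on the Frisch-Waugh-Lovell result that, after x2 is partialled out of both y and x1, the residuals are the same as those from the full regression (4).

```python
import math


def residualize(v, w):
    """Residuals from an OLS regression of v on w and an intercept."""
    n = len(v)
    v_bar = sum(v) / n
    w_bar = sum(w) / n
    dw = [wi - w_bar for wi in w]
    slope = sum(d * (vi - v_bar) for d, vi in zip(dw, v)) / sum(d * d for d in dw)
    return [(vi - v_bar) - slope * d for vi, d in zip(v, dw)]


def robust_se_x1(y, x1, x2):
    """Coefficient on x1 and its robust SE from equation (5), two-regressor case."""
    r1 = residualize(x1, x2)        # r_hat_{i1}: part of x1 not explained by x2
    ssr_1 = sum(r * r for r in r1)  # SSR_1 (zero only under perfect collinearity)
    y_tilde = residualize(y, x2)    # y with x2 partialled out
    b1 = sum(r * yt for r, yt in zip(r1, y_tilde)) / ssr_1
    # Same residuals as the full regression of y on x1 and x2 (Frisch-Waugh-Lovell)
    u_hat = [yt - b1 * r for yt, r in zip(y_tilde, r1)]
    var_robust = sum(r * r * u * u for r, u in zip(r1, u_hat)) / ssr_1 ** 2
    return b1, math.sqrt(var_robust)


# Hypothetical data, for illustration only
y = [0.0, 1.0, 1.0, 4.0]
x1 = [0.0, 1.0, 0.0, 1.0]
x2 = [0.0, 0.0, 1.0, 1.0]
b1, se_robust = robust_se_x1(y, x1, x2)
print(f"b1 = {b1:.4f}, robust SE = {se_robust:.4f}")
```

With more than two explanatory variables the same steps apply, but the auxiliary regressions need a general least-squares routine rather than the one-regressor formula used here.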
Heteroskedasticity-robust t-statistic
Once the heteroskedasticity-robust standard error is computed, the heteroskedasticity-robust t-statistic is computed as
$$t = \frac{\hat{\beta}_j - \text{hypothesized value}}{\text{robust standard error}}$$
Heteroskedasticity-robust F-statistic
When the error term is heteroskedastic, the usual F-test is not valid. A heteroskedasticity-robust F-statistic needs to be computed for testing a joint hypothesis.
The heteroskedasticity-robust F-statistic is also called the heteroskedasticity-robust Wald statistic.
The heteroskedasticity-robust F-statistic involves fairly complex matrix notation, so the details are not covered in this class. However, STATA computes it automatically.
Heteroskedasticity-robust inference with STATA
STATA computes the heteroskedasticity-robust standard errors and F-statistic automatically. Just use the ‘robust’ option when running a regression.
The next slide shows the log salary regression of academic economists in Japan.
Homoskedasticity version (usual standard errors):

reg lsalary female fullprof assocprof expacademic expacademicsq evermarried kids6 phdabroad extgrant privuniv phdoffer

      Source |       SS       df       MS              Number of obs =     338
-------------+------------------------------           F( 11,   326) =   60.94
       Model |  19.7186266    11  1.79260242           Prob > F      =  0.0000
    Residual |  9.58915481   326  .029414585           R-squared     =  0.6728
-------------+------------------------------           Adj R-squared =  0.6618
       Total |  29.3077814   337   .08696671           Root MSE      =  .17151

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.0725573   .0258508    -2.81   0.005    -.1234127     -.021702
    fullprof |   .3330248   .0505602     6.59   0.000     .2335594     .4324903
   assocprof |   .1665502   .0397755     4.19   0.000     .0883011     .2447994
 expacademic |   .0214346   .0042789     5.01   0.000     .0130168     .0298524
expacademi~q |  -.0003603   .0000925    -3.90   0.000    -.0005423    -.0001783
 evermarried |   .0847564    .027398     3.09   0.002     .0308573     .1386556
       kids6 |   .0051719   .0224497     0.23   0.818    -.0389927     .0493364
   phdabroad |   .0442625   .0310316     1.43   0.155     -.016785       .10531
    extgrant |   .0001081   .0000506     2.14   0.033     8.56e-06     .0002076
    privuniv |   .1675923   .0199125     8.42   0.000     .1284191     .2067654
    phdoffer |   .0751014   .0202832     3.70   0.000      .035199     .1150039
       _cons |   6.200925   .0412649   150.27   0.000     6.119746     6.282104
------------------------------------------------------------------------------
Heteroskedasticity version (robust standard errors):

reg lsalary female fullprof assocprof expacademic expacademicsq evermarried kids6 phdabroad extgrant privuniv phdoffer , robust

Linear regression                                      Number of obs =     338
                                                       F( 11,   326) =   70.31
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.6728
                                                       Root MSE      =  .17151

------------------------------------------------------------------------------
             |               Robust
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.0725573   .0297252    -2.44   0.015    -.1310348    -.0140799
    fullprof |   .3330248   .0710746     4.69   0.000     .1932021     .4728476
   assocprof |   .1665502   .0535357     3.11   0.002     .0612312     .2718693
 expacademic |   .0214346   .0056958     3.76   0.000     .0102294     .0326398
expacademi~q |  -.0003603   .0001225    -2.94   0.004    -.0006013    -.0001193
 evermarried |   .0847564   .0293188     2.89   0.004     .0270786     .1424343
       kids6 |   .0051719   .0168028     0.31   0.758    -.0278836     .0382274
   phdabroad |   .0442625   .0204516     2.16   0.031     .0040288     .0844962
    extgrant |   .0001081   .0000333     3.24   0.001     .0000426     .0001736
    privuniv |   .1675923   .0179795     9.32   0.000     .1322218     .2029627
    phdoffer |   .0751014   .0209559     3.58   0.000     .0338755     .1163274
       _cons |   6.200925   .0475831   130.32   0.000     6.107316     6.294533
------------------------------------------------------------------------------
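As you can see, the standard errors in the heteroskedasticity version are somewhat higher for most of the coefficients, although a few (kids6, phdabroad, extgrant, privuniv) are lower.
The statistical significance of the majority of the variables is not altered. The exception is phdabroad, which is not significant at the 5% level with the usual standard error (p = 0.155) but is with the robust one (p = 0.031).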
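The t-statistics in both versions can be reproduced from the reported coefficients and standard errors, since each one tests a hypothesized value of zero. The short Python check below is only an illustration: the numbers are copied from the two regression outputs, and the variable names are made up.

```python
# Coefficient, usual SE and robust SE copied from the two regression outputs
estimates = {
    "female": (-0.0725573, 0.0258508, 0.0297252),
    "phdabroad": (0.0442625, 0.0310316, 0.0204516),
}

t_stats = {}
for name, (coef, se_usual, se_robust) in estimates.items():
    # t = (estimate - hypothesized value) / standard error, with hypothesized value 0
    t_usual = (coef - 0.0) / se_usual
    t_robust = (coef - 0.0) / se_robust
    t_stats[name] = (round(t_usual, 2), round(t_robust, 2))
    print(name, t_stats[name])
```

The coefficient is the same in both versions; only the denominator changes, which is why phdabroad moves from an insignificant to a significant t-statistic.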
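Now, suppose that you want to test whether there is a gender salary gap for those with experience greater than 20. You then estimate the following model,
Log(salary) = β0 + β1female + β2female×(exp>20) + …
and test
H0: β1 + β2 = 0
Here β1 is the gender gap for those with experience of 20 or less, and β1 + β2 is the gender gap for those with experience greater than 20.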
reg lsalary female female_exp20 fullprof assocprof expacademic expacademicsq evermarried kids6 phdabroad extgrant privuniv phdoffer, robust

Linear regression                                      Number of obs =     338
                                                       F( 12,   325) =   64.33
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.6745
                                                       Root MSE      =  .17134

------------------------------------------------------------------------------
             |               Robust
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.0873223   .0332972    -2.62   0.009    -.1528275    -.0218172
female_exp20 |   .0824519   .0676964     1.22   0.224    -.0507266     .2156303
    fullprof |   .3418992    .070402     4.86   0.000     .2033981     .4804003
   assocprof |   .1713122   .0533299     3.21   0.001     .0663969     .2762275
 expacademic |   .0202163   .0057524     3.51   0.001     .0088997     .0315329
expacademi~q |  -.0003409   .0001234    -2.76   0.006    -.0005836    -.0000981
 evermarried |   .0888877   .0294255     3.02   0.003     .0309993     .1467761
       kids6 |   .0051322   .0168447     0.30   0.761    -.0280062     .0382705
   phdabroad |   .0459212   .0202115     2.27   0.024     .0061594     .0856831
    extgrant |   .0000922    .000035     2.63   0.009     .0000233     .0001611
    privuniv |   .1661488   .0181462     9.16   0.000       .13045     .2018476
    phdoffer |   .0747895   .0208938     3.58   0.000     .0336854     .1158937
       _cons |   6.205193   .0478927   129.56   0.000     6.110974     6.299412
------------------------------------------------------------------------------

. test female+female_exp20=0

 ( 1)  female + female_exp20 = 0

       F(  1,   325) =    0.01
            Prob > F =    0.9348
We fail to reject the null hypothesis. Thus we do not find evidence of a gender gap for those with experience greater than 20. But notice that, for those with experience of 20 or less, there is a gender salary gap of about 8.7% (the coefficient on female is -0.087 and significant at the 1% level). Thus, the gender gap is concentrated among less experienced workers.
Note
The heteroskedasticity-robust standard error is valid (in large samples) under any form of heteroskedasticity. Since homoskedasticity is a special case, it is valid under homoskedasticity as well.
The majority of empirical research uses heteroskedasticity-robust standard errors. It is highly recommended that students always use them.
Testing for heteroskedasticity
Although the heteroskedasticity-robust standard errors work for any type of heteroskedasticity, including the special case of homoskedasticity, there are still reasons to have a simple test that can detect heteroskedasticity.
The basic idea is to test whether u² is correlated with the explanatory variables. Since u is not observed, use the squared OLS residuals and consider the following regression.
$$\hat{u}_i^2 = \delta_0 + \delta_1 x_{i1} + \dots + \delta_k x_{ik} + \text{error} \qquad (6)$$
If the error terms are homoskedastic, all the slope coefficients should be zero. Thus, consider the following null hypothesis.
H0: δ1=0, δ2=0, …, δk=0
If we reject the null hypothesis, this is an indication that heteroskedasticity is present.
We can test the null using either the F-statistic or the LM statistic:
$$F = \frac{R_{\hat{u}^2}^2 / k}{(1 - R_{\hat{u}^2}^2)/(n - k - 1)}$$
$$LM = n \cdot R_{\hat{u}^2}^2$$
where $R_{\hat{u}^2}^2$ is the R-squared from regression (6).
Note that, under the null, the F-statistic is distributed (approximately) as F(k, n-k-1) and the LM statistic is distributed asymptotically as χ² with k degrees of freedom.
The LM version of the test is called the Breusch-Pagan test for heteroskedasticity.
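The steps of the test can be sketched for the case of a single explanatory variable (k = 1). This Python example is an illustration, not STATA's implementation: the function name breusch_pagan_one_regressor and the data are hypothetical. The p-value line uses the fact that a χ² variable with one degree of freedom is the square of a standard normal, so its upper tail can be written with the complementary error function.

```python
import math


def breusch_pagan_one_regressor(x, y):
    """R-squared of regression (6), F, LM and the LM p-value, for k = 1."""
    n, k = len(x), 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sst_x = sum(d * d for d in dx)

    # Step 1: OLS of y on x, keep the squared residuals
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    u2 = [((yi - y_bar) - b1 * d) ** 2 for d, yi in zip(dx, y)]

    # Step 2: regression (6), squared residuals on x, and its R-squared
    u2_bar = sum(u2) / n
    d1 = sum(d * (v - u2_bar) for d, v in zip(dx, u2)) / sst_x
    sst_u2 = sum((v - u2_bar) ** 2 for v in u2)
    ssr_u2 = sum(((v - u2_bar) - d1 * d) ** 2 for d, v in zip(dx, u2))
    r2 = 1.0 - ssr_u2 / sst_u2

    # Step 3: the two test statistics from the slide
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    lm = n * r2
    p_lm = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-squared, 1 df only
    return r2, f_stat, lm, p_lm


# Hypothetical data, for illustration only
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0, 4.0]
r2, f_stat, lm, p_lm = breusch_pagan_one_regressor(x, y)
print(f"R2 = {r2:.4f}, F = {f_stat:.4f}, LM = {lm:.4f}, p(LM) = {p_lm:.4f}")
```

With only four observations the asymptotic χ² approximation is not reliable; the data are chosen to keep the arithmetic easy to follow.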
STATA implements a slightly different version of the Breusch-Pagan test, and it is done automatically. You can either compute the LM statistic as described in the previous slide, or use the STATA command to run the test.
To use the STATA command, first run OLS without the robust option, then type the following command.
estat hettest
(The output on the next slide types the abbreviated form, estat hettes.)
[Regression output repeated: the same OLS output without the robust option as in the homoskedasticity version above (338 observations, F(11, 326) = 60.94, R-squared = 0.6728).]
. estat hettes

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lsalary

         chi2(1)      =     4.01
         Prob > chi2  =   0.0452

The null hypothesis that the error is homoskedastic is rejected at the 5% significance level.
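The reported p-value can be checked from the test statistic alone. The Python sketch below is an illustration with a hypothetical helper name, chi2_df1_pvalue; it only works for one degree of freedom, where the χ² upper tail equals the two-sided tail of a standard normal.

```python
import math


def chi2_df1_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # P(chi2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)) for standard normal Z
    return math.erfc(math.sqrt(stat / 2.0))


p_value = chi2_df1_pvalue(4.01)  # chi2(1) statistic from the STATA output
print(round(p_value, 4))
```

Because the statistic is reported to two decimals, the check can only be expected to agree with STATA's Prob > chi2 up to rounding.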