Dealing With Heteroskedasticity

A. The Concept of Variation in Error Variances. To review, the sphericity assumption implies that we have homoskedasticity of errors: their variance is generally constant across cases. This assumption is violated, giving us another fun word to say (heteroskedasticity), when our predictive model performs particularly poorly in some set of cases.
When might this happen? One possibility is that we have measurement error in
some subset of our observations. Suppose you are explaining GDP growth across countries,
but you have less faith in the GDP estimates from post-Soviet and African nations. In this
case, there should be more random variation, and thus larger mean error variances, in these
countries. Or perhaps your model of state expenditures does fairly well in explaining
variation in most regions of the country, but just does terribly in the South. Here, errors in
southern cases will have a larger variation than errors in cases from the rest of the country.
In either case, Ω will not equal σ²I. Even if we do not have any covariance between our errors, the Ω matrix will be non-spherical and look something like:
3.2 0
 0 3.1

0
0

0
0
0
0
8.5
0





8.4
0
0
0
B. Tests for Heteroskedasticity. You have gone over many of the tests for heteroskedasticity in 204b. If you want, we can review these, but the main thing to keep in mind is that different tests suit different substantive situations, and your test will be most powerful when you can say something about the pattern of the errors. If you think that the error variance is a function of some variables (such as the South, or Democracy), you should use the Breusch-Pagan Test. If you think that you can order your observations by their error variance (by putting Western democracies, with their lower measurement error, first), you should use the Goldfeld-Quandt Test. If you know nothing about the error variance, you can use the least powerful but most general White's Test.
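As a minimal Stata sketch of all three (y, x1, and x2 are hypothetical variables; estat hettest and estat imtest are the post-estimation commands in recent Stata versions, and there is no canned Goldfeld-Quandt command, so it is computed by hand):

reg y x1 x2

* Breusch-Pagan: is the error variance a function of the named variable?
* (with no variable list, it uses the fitted values instead)
estat hettest x1

* White's test: most general, assumes nothing about the pattern
estat imtest, white

* Goldfeld-Quandt by hand: order cases from low to high suspected variance,
* fit the model on each end, and compare residual variances with an F ratio
* (hypothetical 49-case dataset; the middle cases are dropped)
sort x1
reg y x1 x2 in 1/20
scalar rss_low = e(rss)
reg y x1 x2 in 30/49
scalar rss_high = e(rss)
display rss_high / rss_low    // compare to F(17, 17): n minus k in each fit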
C. What's the Problem? Heteroskedasticity will not bias your coefficients. We never used Ω in the process of estimating β; we only used (X'X)⁻¹X'Y. But where did we use Ω? We used it when we were estimating the variance of our coefficient estimates, which we then use to get our standard errors. So the standard errors of our coefficients will be biased, which can cause just as many problems for our causal inference when we get to the stargazing phase of interpretation.
D. Modeling Heteroskedasticity using Generalized Least Squares. GLS relaxes the assumption that Ω = σ²I. Instead, it uses the information in Ω to obtain efficient estimates of β and unbiased estimates of the variance of our parameter estimates. In the case with no autocorrelation:
3.2 0
 0 3.1

0
0

0
0
0
0
8.5
0
0
0 
0

8.4
 1
 3.2

 0
1
 

 0

 0

0
0
1
3.1
0
0
1
8.5
0
0

0 

0 


0 

1 
8.4 
$$\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$$
$$\mathrm{Var}(\hat{\beta}_{GLS}) = \sigma^2(X'\Omega^{-1}X)^{-1}$$
I. Relaxing the Sphericity Assumption
A. Another likely instance of heteroskedasticity. Another reason that your errors
may not have a generally constant variance is that the absolute value of errors will likely be
larger when the absolute value of the dependent variable is larger. If you are looking at raw
GDP, then your error in predicting the GDP of a rich country will probably be larger than
your error in predicting the GDP of a poor country. The silver lining here is that you have a
strong theory about the pattern in your error variance and a good measure to use when
modeling your error variance.
B. Rules of Thumb. When you are thinking about the implications of heteroskedasticity for your OLS estimates, here are three rules of thumb to keep in mind:
i. Your coefficient estimates will not be biased, but they will be inefficient, and your reported standard errors will be biased, which can be just as troubling for causal inference.
ii. The greater the dispersion of your error variances, the greater the
inefficiency of OLS will be compared to GLS.
iii. If the heteroskedasticity is not correlated with your explanatory variables,
OLS is not misleading (see Greene, pages 217-219 for a proof). The problem is, you never
really know whether the pattern in error variances is correlated (though you can plot your
errors to get an eyeball check).
II. Generalized Least Squares
A. Another way to write the Omega Matrix. We can separate out the diagonal elements of Ω into the product of some constant σ² and elements ωᵢ that are potentially unique to each observation. This is analogous to saying that the diagonal elements can be unique, and since σ² is now a scalar multiplying the matrix, it will drop out of the matrix algebra for things like the GLS estimators. The advantage of this approach is that it focuses our attention on the differences across observations' error variances.
 21
0
0
0 


2
0
 2
0
0 


 0
0
 23
0 


0
0
 24 
 0
 
2
1
1

 1
0


0


0

0
1
2
0
0
1
0
3
0
0
1 0 0 0 
0 
0 0 
2
2

 
 0 0 3 0 


 0 0 0 4 

0 

0 


0 

1 
4 
B. Estimators using GLS. If you think you know what Ω looks like, then you can obtain efficient estimators by using this version of generalized least squares. It is identical to weighted least squares, because what you are really doing is weighting each observation's X and Y values by the inverse of its estimated error variance, ω. If the error variance is large, the observation gets less weight; if the error variance is small, 1/ω is larger, and it gets relatively more weight. Here are the estimators, and you can look at Greene, pp. 207 and 225-227, for more:
$$\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$$
$$\mathrm{Var}(\hat{\beta}_{GLS}) = \sigma^2(X'\Omega^{-1}X)^{-1}$$
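To make the matrix formula concrete, here is a minimal Mata sketch for the diagonal-Ω case (all variable names, including omega_hat holding each observation's estimated ωᵢ, are hypothetical, and complete data are assumed). Because Ω is diagonal, X'Ω⁻¹X is just an inverse-variance-weighted cross product, which is exactly why GLS here coincides with weighted least squares:

* Minimal Mata sketch of the GLS estimator with a diagonal Omega.
* "y", "x1", "x2", and "omega_hat" are hypothetical variables.
mata:
    y = st_data(., "y")
    X = st_data(., "x1 x2"), J(st_nobs(), 1, 1)    // append a constant column
    w = 1 :/ st_data(., "omega_hat")               // inverse-variance weights
    b = invsym(cross(X, w, X)) * cross(X, w, y)    // (X'O^-1 X)^-1 X'O^-1 y
    b'                                             // display the coefficients
end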
C. How to Build Your Estimated Omega Matrix. You are going to have to use
some theory to get a general idea of the patterns, and then use your data to fill in the ωs
individually or as groups.
i. If you think different regions have different error variances, then run regressions on subsets of your data and use their root mean squared errors to estimate the ωs.
ii. If you think your variance increases as some variable increases (or decreases), then use the inverse of that variable (or the variable itself) as your weight. Both approaches are sketched below.
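A minimal Stata sketch of both approaches, using hypothetical variables (y, x1, a south dummy, and z). Since aweights are treated as inversely proportional to each case's error variance, the subset root MSEs enter squared, as 1/rmse²:

* (i) group-specific error variances, estimated from subset regressions
reg y x1 if south
scalar s_south = e(rmse)
reg y x1 if !south
scalar s_north = e(rmse)
gen w = cond(south, 1/scalar(s_south)^2, 1/scalar(s_north)^2)
reg y x1 [aweight=w]

* (ii) error variance proportional to some variable z: weight by its inverse
reg y x1 [aweight=1/z]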
Here is a model that uses various rules of committee procedures, elements of
legislative professionalism, and the number of bills introduced in state legislatures to explain
their “batting averages” of bill passage in 1997-1998.
. reg batting senhear uplimit ksalary ksession kstaff introreg

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =    8.34
       Model |  .730694678     6  .121782446           Prob > F      =  0.0000
    Residual |  .612959937    42  .014594284           R-squared     =  0.5438
-------------+------------------------------           Adj R-squared =  0.4786
       Total |  1.34365461    48  .027992804           Root MSE      =  .12081

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .1022993   .0469589     2.18   0.035     .0075324    .1970662
     uplimit |    .100388   .0492296     2.04   0.048     .0010387    .1997373
     ksalary |  -.0147299   .0167533    -0.88   0.384    -.0485394    .0190795
    ksession |  -.0251643   .0118604    -2.12   0.040    -.0490997    -.001229
      kstaff |   .0071363   .0111804     0.64   0.527    -.0154267    .0296992
    introreg |  -8.27e-06   4.22e-06    -1.96   0.056    -.0000168    2.36e-07
       _cons |   .4732023   .0598806     7.90   0.000     .3523583    .5940463
------------------------------------------------------------------------------

. predict battingrs, r
(951 missing values generated)
Rather than relying solely on my theoretical hunch, I can look for patterns in the error variance by plotting the residuals I saved from my initial model against variables in the model:
. plot battingrs introreg
[ASCII scatterplot: residuals (-.261077 to .317322) against "bills introduced in regular legislative" (745 to 32263)]
. reg batting senhear uplimit ksalary ksession kstaff introreg [aweight=1/introreg]
(sum of wgt is   2.2627e-02)

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =    8.84
       Model |  .738968006     6  .123161334           Prob > F      =  0.0000
    Residual |  .585426577    42  .013938728           R-squared     =  0.5580
-------------+------------------------------           Adj R-squared =  0.4948
       Total |  1.32439458    48  .027591554           Root MSE      =  .11806

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |    .081729   .0462613     1.77   0.085      -.01163     .175088
     uplimit |   .1031672   .0435644     2.37   0.023     .0152506    .1910838
     ksalary |   -.012647   .0165156    -0.77   0.448    -.0459768    .0206828
    ksession |  -.0206915   .0110394    -1.87   0.068      -.04297     .001587
      kstaff |   .0038876   .0108663     0.36   0.722    -.0180413    .0258166
    introreg |  -.0000229   9.73e-06    -2.35   0.023    -.0000425   -3.26e-06
       _cons |   .5147589   .0577829     8.91   0.000     .3981483    .6313696
------------------------------------------------------------------------------
. reg batting senhear uplimit ksalary ksession kstaff introreg [aweight=introreg]
(sum of wgt is   1.9167e+05)

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =   10.76
       Model |  .675155622     6  .112525937           Prob > F      =  0.0000
    Residual |  .439093138    42  .010454599           R-squared     =  0.6059
-------------+------------------------------           Adj R-squared =  0.5496
       Total |  1.11424876    48  .023213516           Root MSE      =  .10225

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .0870888   .0409285     2.13   0.039     .0044917     .169686
     uplimit |   .1326276   .0524956     2.53   0.015     .0266871     .238568
     ksalary |  -.0171153   .0160612    -1.07   0.293     -.049528    .0152974
    ksession |  -.0312976   .0118149    -2.65   0.011     -.055141   -.0074541
      kstaff |   .0159856   .0108125     1.48   0.147     -.005835    .0378061
    introreg |  -3.36e-06   1.90e-06    -1.77   0.084    -7.21e-06    4.78e-07
       _cons |   .4247547   .0602255     7.05   0.000     .3032146    .5462948
------------------------------------------------------------------------------
. plot battingrs senhear
[ASCII scatterplot: residuals (-.261077 to .317322) against "senate committees must hear all bills" (0 to 1)]
. plot battingrs ksession
[ASCII scatterplot: residuals (-.261077 to .317322) against "karl kurtz's session score" (3 to 10)]
D. What if you have no clue about Omega? You can use White's estimator, an option on just about all Stata estimation commands (see Greene, pp. 219-220):
$$\text{Estimated Asymptotic Variance}(\hat{\beta}) = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2 x_i x_i'\right)\left(\frac{X'X}{n}\right)^{-1}$$
. reg batting senhear uplimit ksalary ksession kstaff introreg, robust

Regression with robust standard errors                 Number of obs =      49
                                                       F(  6,    42) =   17.25
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5438
                                                       Root MSE      =  .12081

------------------------------------------------------------------------------
             |               Robust
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .1022993   .0424499     2.41   0.020      .016632    .1879666
     uplimit |    .100388   .0556844     1.80   0.079    -.0119877    .2127636
     ksalary |  -.0147299   .0113512    -1.30   0.201    -.0376376    .0081777
    ksession |  -.0251643   .0125633    -2.00   0.052    -.0505182    .0001895
      kstaff |   .0071363   .0110034     0.65   0.520    -.0150696    .0293421
    introreg |  -8.27e-06   4.79e-06    -1.73   0.091    -.0000179    1.39e-06
       _cons |   .4732023   .0620882     7.62   0.000     .3479032    .5985014
------------------------------------------------------------------------------