Heteroskedasticity

A. The Concept of Variation in Error Variances.

To review, the sphericality assumption implies that we have homoskedasticity of errors: their variance is generally constant across cases. This assumption is violated, giving us another fun word to say (heteroskedasticity), when our predictive model performs particularly poorly in some set of cases. When might this happen? One possibility is that we have measurement error in some subset of our observations. Suppose you are explaining GDP growth across countries, but you have less faith in the GDP estimates from post-Soviet and African nations. In this case, there should be more random variation, and thus larger error variances, in these countries. Or perhaps your model of state expenditures does fairly well in explaining variation in most regions of the country, but does terribly in the South. Here, errors in southern cases will have a larger variance than errors in cases from the rest of the country. In either case, Ω will not equal σ²I. Even if we do not have any covariance between our errors, the Ω matrix will be non-spherical and look something like:

$$\Omega = \begin{bmatrix} 3.2 & 0 & 0 & 0 \\ 0 & 3.1 & 0 & 0 \\ 0 & 0 & 8.5 & 0 \\ 0 & 0 & 0 & 8.4 \end{bmatrix}$$

Tests for Heteroskedasticity.

You have gone over many of the tests for heteroskedasticity in 204b. If you want, we can review these, but the main thing to keep in mind is that different tests are suitable for different substantive situations, and your test will be most powerful when you can say something about the pattern of the errors. If you think that the error variance is a function of some variables (such as the South, or democracy), you should use the Breusch-Pagan test. If you think that you can order your observations by their error variance (by putting Western democracies, with their lower measurement error, first), you should use the Goldfeld-Quandt test. If you know nothing about the error variance, you can use the least powerful but most general White's test.
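Here is a minimal sketch of how each test might be run, assuming a hypothetical dataset with outcome y, regressors x1 and x2, a south dummy, and a measerror variable that orders cases by suspected measurement error (the estat syntax is for current Stata; older versions ran hettest directly after regress):

    * Breusch-Pagan: lets the error variance be a function of named variables
    reg y x1 x2
    estat hettest south

    * White's test: assumes nothing about the pattern of the error variances
    estat imtest, white

    * Goldfeld-Quandt by hand: sort cases by suspected error variance, fit the
    * model separately in each tail, and compare the residual variances. With
    * equal subsample sizes and the same regressors, the ratio of residual
    * sums of squares is F-distributed under the null of homoskedasticity.
    sort measerror
    reg y x1 x2 in 1/30
    scalar rss_low = e(rss)
    reg y x1 x2 in -30/l
    scalar rss_high = e(rss)
    display "Goldfeld-Quandt F = " rss_high/rss_low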
What's the Problem.

Heteroskedasticity will not bias your coefficients. We never used Ω in the process of estimating β; we only used

$$\hat{\beta} = (X'X)^{-1}X'Y$$

But where did we use Ω? We used it when we were estimating the variance of our coefficient estimates, which we then use to get our standard errors. So the standard errors of our coefficients will be biased, which can cause just as many problems for our causal inference when we get to the stargazing phase of our interpretation.

Modeling Heteroskedasticity using Generalized Least Squares.

GLS relaxes the assumption that Ω = σ²I. Instead, it uses the information in Ω to obtain unbiased estimates of β and of the variance of our parameter estimates. In the case with no autocorrelation:

$$\Omega = \begin{bmatrix} 3.2 & 0 & 0 & 0 \\ 0 & 3.1 & 0 & 0 \\ 0 & 0 & 8.5 & 0 \\ 0 & 0 & 0 & 8.4 \end{bmatrix}, \qquad \Omega^{-1} = \begin{bmatrix} \frac{1}{3.2} & 0 & 0 & 0 \\ 0 & \frac{1}{3.1} & 0 & 0 \\ 0 & 0 & \frac{1}{8.5} & 0 \\ 0 & 0 & 0 & \frac{1}{8.4} \end{bmatrix}$$

$$\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$$

$$\operatorname{Var}(\hat{\beta}_{GLS}) = \sigma^2(X'\Omega^{-1}X)^{-1}$$
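To make the matrix algebra concrete, here is a minimal Mata sketch that computes the GLS estimator for a toy dataset (the y and X values are invented for illustration) using the Ω above, and then verifies that GLS is just OLS after reweighting each observation by 1/√ωᵢ, which foreshadows the weighted least squares discussion below:

    mata:
        // invented toy data: 4 observations, a constant plus one regressor
        y = (1.2 \ 2.3 \ 6.1 \ 5.8)
        X = (1, 0.5 \ 1, 1.1 \ 1, 2.9 \ 1, 2.7)

        // the known error variances from the Omega example above
        Omega = diag((3.2 \ 3.1 \ 8.5 \ 8.4))
        Oinv  = luinv(Omega)

        // GLS estimator: (X' Omega^-1 X)^-1 X' Omega^-1 y
        b_gls = luinv(X'*Oinv*X) * X'*Oinv*y
        b_gls

        // equivalently, OLS after dividing each row by sqrt(omega_i)
        w  = 1 :/ sqrt(diagonal(Omega))
        Xw = X :* w
        yw = y :* w
        luinv(Xw'*Xw) * Xw'*yw    // identical to b_gls
    end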
I. Relaxing the Sphericality Assumption

A. Another likely instance of heteroskedasticity.

Another reason that your errors may not have a generally constant variance is that the absolute value of the errors will likely be larger when the absolute value of the dependent variable is larger. If you are looking at raw GDP, then your error in predicting the GDP of a rich country will probably be larger than your error in predicting the GDP of a poor country. The silver lining here is that you have a strong theory about the pattern in your error variance and a good measure to use when modeling it.

B. Rules of Thumb.

When you are thinking about the implications of heteroskedasticity for your OLS estimates, here are three rules of thumb to keep in mind:

i. Your coefficient estimates will not be biased, but they will be inefficient. This gives you biased reported standard errors, and that can be just as troubling for causal inference.

ii. The greater the dispersion of your error variances, the greater the inefficiency of OLS compared to GLS.

iii. If the heteroskedasticity is not correlated with your explanatory variables, OLS is not misleading (see Greene, pp. 217-219 for a proof). The problem is, you never really know whether the pattern in error variances is correlated with them (though you can plot your errors to get an eyeball check).

II. Generalized Least Squares

A. Another way to write the Omega Matrix.

We can separate out the diagonal elements of Ω into the product of some constant σ² and elements ωᵢ that are potentially unique to each observation. This is analogous to saying that the diagonal elements can be unique, and since σ² is now a scalar multiplying the matrix, it will drop out of the matrix algebra for things like the GLS estimators. The advantage of this approach is that it focuses our attention on the differences between each observation's error variance.

$$\Omega = \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{bmatrix} = \sigma^2 \begin{bmatrix} \omega_1 & 0 & 0 & 0 \\ 0 & \omega_2 & 0 & 0 \\ 0 & 0 & \omega_3 & 0 \\ 0 & 0 & 0 & \omega_4 \end{bmatrix}$$

B. Estimators using GLS.

If you think you know what Ω looks like, then you can obtain efficient estimators by using this version of generalized least squares. It is identical to weighted least squares, because what you are really doing is weighting each observation's X and Y values by the inverse of its estimated error variance, ω. If the error variance is large, the observation gets less weight. If the error variance is small, 1/ω is larger, and the observation gets relatively more weight. Here are the estimators; see Greene, pp. 207 and 225-227 for more:

$$\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$$

$$\operatorname{Var}(\hat{\beta}_{GLS}) = \sigma^2(X'\Omega^{-1}X)^{-1}$$

C. How to Build Your Estimated Omega Matrix.

You are going to have to use some theory to get a general idea of the patterns, and then use your data to fill in the ωs individually or in groups:

i. If you think different regions have different error variances, then run regressions on subsets of your data and use their root mean squared errors as estimates of the ωs (see the sketch following this list).

ii. If you think your variance increases as some variable increases (or decreases), then use the inverse of that variable (or the variable itself) as your weight.
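Here is a minimal sketch of approach (i), assuming a hypothetical dataset with outcome y, regressors x1 and x2, and a south dummy. Since the root mean squared error estimates the error standard deviation, its square serves as each group's estimated ω:

    * estimate a separate error variance for each region
    reg y x1 x2 if south == 1
    scalar rmse_s = e(rmse)
    reg y x1 x2 if south == 0
    scalar rmse_n = e(rmse)

    * each observation's omega is its group's squared RMSE; weighting by
    * 1/omega downweights the high-variance group
    gen omega = cond(south == 1, scalar(rmse_s)^2, scalar(rmse_n)^2)
    reg y x1 x2 [aweight = 1/omega]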
Here is a model that uses various rules of committee procedure, elements of legislative professionalism, and the number of bills introduced in state legislatures to explain their "batting averages" of bill passage in 1997-1998.

. reg batting senhear uplimit ksalary ksession kstaff introreg

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =    8.34
       Model |  .730694678     6  .121782446           Prob > F      =  0.0000
    Residual |  .612959937    42  .014594284           R-squared     =  0.5438
-------------+------------------------------           Adj R-squared =  0.4786
       Total |  1.34365461    48  .027992804           Root MSE      =  .12081

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .1022993   .0469589     2.18   0.035     .0075324    .1970662
     uplimit |    .100388   .0492296     2.04   0.048     .0010387    .1997373
     ksalary |  -.0147299   .0167533    -0.88   0.384    -.0485394    .0190795
    ksession |  -.0251643   .0118604    -2.12   0.040    -.0490997    -.001229
      kstaff |   .0071363   .0111804     0.64   0.527    -.0154267    .0296992
    introreg |  -8.27e-06   4.22e-06    -1.96   0.056    -.0000168    2.36e-07
       _cons |   .4732023   .0598806     7.90   0.000     .3523583    .5940463
------------------------------------------------------------------------------

. predict battingrs, r
(951 missing values generated)

Rather than relying solely on my theoretical hunch, I can look for patterns in the error variance by plotting the residuals I saved after running my initial model against variables in the model:

. plot battingrs introreg

[ASCII scatterplot: residuals (battingrs, -.261077 to .317322) against bills introduced in regular legislative session (introreg, 745 to 32263).]

. reg batting senhear uplimit ksalary ksession kstaff introreg [aweight=1/introreg]
(sum of wgt is 2.2627e-02)

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =    8.84
       Model |  .738968006     6  .123161334           Prob > F      =  0.0000
    Residual |  .585426577    42  .013938728           R-squared     =  0.5580
-------------+------------------------------           Adj R-squared =  0.4948
       Total |  1.32439458    48  .027591554           Root MSE      =  .11806

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |    .081729   .0462613     1.77   0.085      -.01163     .175088
     uplimit |   .1031672   .0435644     2.37   0.023     .0152506    .1910838
     ksalary |   -.012647   .0165156    -0.77   0.448    -.0459768    .0206828
    ksession |  -.0206915   .0110394    -1.87   0.068      -.04297     .001587
      kstaff |   .0038876   .0108663     0.36   0.722    -.0180413    .0258166
    introreg |  -.0000229   9.73e-06    -2.35   0.023    -.0000425   -3.26e-06
       _cons |   .5147589   .0577829     8.91   0.000     .3981483    .6313696
------------------------------------------------------------------------------

. reg batting senhear uplimit ksalary ksession kstaff introreg [aweight=introreg]
(sum of wgt is 1.9167e+05)

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =   10.76
       Model |  .675155622     6  .112525937           Prob > F      =  0.0000
    Residual |  .439093138    42  .010454599           R-squared     =  0.6059
-------------+------------------------------           Adj R-squared =  0.5496
       Total |  1.11424876    48  .023213516           Root MSE      =  .10225

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .0870888   .0409285     2.13   0.039     .0044917     .169686
     uplimit |   .1326276   .0524956     2.53   0.015     .0266871     .238568
     ksalary |  -.0171153   .0160612    -1.07   0.293     -.049528    .0152974
    ksession |  -.0312976   .0118149    -2.65   0.011     -.055141   -.0074541
      kstaff |   .0159856   .0108125     1.48   0.147     -.005835    .0378061
    introreg |  -3.36e-06   1.90e-06    -1.77   0.084    -7.21e-06    4.78e-07
       _cons |   .4247547   .0602255     7.05   0.000     .3032146    .5462948
------------------------------------------------------------------------------

. plot battingrs senhear

[ASCII scatterplot: residuals (battingrs) against "senate committees must hear all bills" (senhear, 0 to 1).]

. plot battingrs ksession

[ASCII scatterplot: residuals (battingrs) against Karl Kurtz's session score (ksession, 3 to 10).]

D. What if you have no clue about Omega?

You can use White's estimator, an option on just about all Stata estimation commands (see Greene, pp. 219-220):

$$\widehat{\operatorname{Asy.Var}}(\hat{\beta}) = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2 x_i x_i'\right)\left(\frac{X'X}{n}\right)^{-1}$$

. reg batting senhear uplimit ksalary ksession kstaff introreg, robust

Regression with robust standard errors                 Number of obs =      49
                                                       F(  6,    42) =   17.25
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5438
                                                       Root MSE      =  .12081

------------------------------------------------------------------------------
             |               Robust
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .1022993   .0424499     2.41   0.020      .016632    .1879666
     uplimit |    .100388   .0556844     1.80   0.079    -.0119877    .2127636
     ksalary |  -.0147299   .0113512    -1.30   0.201    -.0376376    .0081777
    ksession |  -.0251643   .0125633    -2.00   0.052    -.0505182    .0001895
      kstaff |   .0071363   .0110034     0.65   0.520    -.0150696    .0293421
    introreg |  -8.27e-06   4.79e-06    -1.73   0.091    -.0000179    1.39e-06
       _cons |   .4732023   .0620882     7.62   0.000     .3479032    .5985014
------------------------------------------------------------------------------
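To connect the robust output back to the formula, here is a minimal Mata sketch that builds White's estimator by hand after the unweighted regression (the ehat and touse variable names are my own; note that Stata's robust option also applies a small-sample adjustment of n/(n-k), so its reported standard errors will be slightly larger than these):

    reg batting senhear uplimit ksalary ksession kstaff introreg
    predict double ehat if e(sample), residuals
    gen byte touse = e(sample)

    mata:
        // pull the estimation-sample regressors and residuals into Mata
        X = st_data(., ("senhear", "uplimit", "ksalary", "ksession", "kstaff", "introreg"), "touse")
        e = st_data(., "ehat", "touse")
        X = X, J(rows(X), 1, 1)              // append the constant
        n = rows(X)

        XXn = X'*X / n                       // (X'X)/n
        S0  = (X :* e)'*(X :* e) / n         // (1/n) sum of e_i^2 x_i x_i'
        V   = (1/n) * luinv(XXn) * S0 * luinv(XXn)
        sqrt(diagonal(V))                    // White standard errors
    end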