Chapter 9
Heteroskedasticity

Copyright © 2014 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.

Learning Objectives
• Understand methods for detecting heteroskedasticity
• Correct for heteroskedasticity

What is Heteroskedasticity?
Heteroskedasticity is when the error term has a nonconstant variance, or $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$.
Homoskedasticity is when the error term has a constant variance, or $\mathrm{Var}(\varepsilon_i) = \sigma^2$.
Notice that for homoskedasticity there is no i subscript on $\sigma^2$, so the variance is constant, while for heteroskedasticity the i subscript denotes that the variance can differ across observations.

[Figure: A Picture of Homoskedasticity Versus Heteroskedasticity]

The Issues And Consequences Associated With Heteroskedastic Data
Problem: Heteroskedasticity violates assumption M6, which states that the error term must have constant variance.
Consequences: Under heteroskedasticity
• Parameter estimates are unbiased.
• Parameter estimates are not minimum variance among all unbiased estimators.
• Estimated standard errors are incorrect, and all measures of precision based on the estimated standard errors are also incorrect.

Goals of this Chapter

An Important Caveat before Continuing
• With more advanced statistical packages, many researchers include a very simple command asking their chosen statistical program to provide standard error estimates that automatically correct for heteroskedasticity (White’s heteroskedastic consistent standard errors).
• Even though correcting for heteroskedasticity is straightforward, it is important to first work through the more “old-school” examples that we do below before learning how to calculate White’s heteroskedastic consistent standard errors.
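To make the definition of heteroskedasticity concrete, here is a minimal simulation sketch that is not part of the original slides. It assumes, purely for illustration, that the heteroskedastic error has a standard deviation proportional to a made-up regressor x, and it compares the error variance for large x with the error variance for small x. The function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_errors(xs, heteroskedastic, sigma=1.0, seed=0):
    """Draw one error per observation.

    Homoskedastic: every error has standard deviation sigma.
    Heteroskedastic (assumed form): standard deviation is sigma * x_i.
    """
    rng = random.Random(seed)
    if heteroskedastic:
        return [rng.gauss(0.0, sigma * x) for x in xs]
    return [rng.gauss(0.0, sigma) for _ in xs]


def variance_ratio(xs, errors):
    """Error variance in the upper half of x divided by that in the lower half."""
    pairs = sorted(zip(xs, errors))          # sort observations by x
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[half:]]
    return statistics.pvariance(high) / statistics.pvariance(low)


x_rng = random.Random(42)
xs = [x_rng.uniform(1.0, 10.0) for _ in range(2000)]

homo_ratio = variance_ratio(xs, simulate_errors(xs, heteroskedastic=False))
hetero_ratio = variance_ratio(xs, simulate_errors(xs, heteroskedastic=True))

print(f"homoskedastic variance ratio:   {homo_ratio:.2f}")
print(f"heteroskedastic variance ratio: {hetero_ratio:.2f}")
```

Under homoskedasticity the ratio should be close to 1, while under the assumed heteroskedastic form it should be well above 1 because the errors spread out as x grows.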
Understand Methods For Detecting Heteroskedasticity
Informal methods
- Graphs
Formal methods using statistical tests
- Breusch-Pagan Test
- General White’s Test
- Modified White’s Test
- Goldfeld-Quandt Test

Informal Method
Graph any of the following:
(1) The dependent variable against each independent variable
(2) The residuals against each independent variable
(3) The residuals squared against each independent variable
(4) The standardized residuals against each independent variable
and look for a pattern in the dispersion of the observations. If a pattern exists, that is evidence of heteroskedasticity.

[Figure: Regression of Number of Olympic Medals on per capita GDP by Country]

Notice how the variance increases as the independent variable increases. This is evidence of heteroskedasticity.

This residual plot is obtained by checking the residual plot option in Excel when running a regression. As in the previous slide, notice how the variance increases as the independent variable (GDP per Capita) increases. This is evidence of heteroskedasticity.

The primary drawback of the informal method is that it is not clear how much of a pattern needs to exist to lead us to the conclusion that the model is heteroskedastic. This leads us to the need for formal tests of heteroskedasticity.

Formal Methods for Detecting Heteroskedasticity
The formal methods that we consider are all based on statistical tests of the following general null and alternative hypotheses:
$H_0$: the error term is homoskedastic
$H_1$: the error term is heteroskedastic

Testing for Heteroskedasticity
(1) Breusch-Pagan Test
(2) Modified White’s Test
(3) Goldfeld-Quandt Test

Breusch-Pagan Test
How to do it:
(1) Estimate the population regression model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$ and obtain the residuals, $e_i$.
(2) Square the residuals to obtain $e_i^2$.
(3) Estimate the population regression model $e_i^2 = \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \cdots + \gamma_k x_{ki} + u_i$.
(4) Perform an F-test for overall significance to see if the squared residuals are statistically related to any of the independent variables.

Breusch-Pagan Test
Why It Works:
If the squared residuals are found to be statistically related to the independent variables, then we conclude that the data are heteroskedastic and we should take the appropriate steps to correct for the problem.

Breusch-Pagan Test for Olympic Medal vs GDP per Capita Data
[Regression output: dependent variable is residuals squared]
The Significance F is much less than 0.05 (or 0.01 for that matter), so we reject the null hypothesis of homoskedasticity and conclude the model is heteroskedastic.

Modified White’s Test
How to do it:
(1) Estimate the population regression model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$ and obtain the residuals, $e_i$, and predicted values, $\hat{y}_i$.
(2) Square the residuals.
(3) Estimate the population regression model $e_i^2 = \delta_0 + \delta_1 \hat{y}_i + \delta_2 \hat{y}_i^2 + u_i$.
(4) Perform an F-test for overall significance to see if the squared residuals are statistically related to the $\hat{y}_i$ and $\hat{y}_i^2$ variables.

Modified White’s Test
Why It Works:
This test works for the same reason that the Breusch-Pagan test works. The primary difference is that the $\hat{y}_i$ and $\hat{y}_i^2$ variables are a function of the independent variables, the independent variables squared, and the cross-products of the independent variables. Including those terms in the squared residual regression therefore tests whether the squared residuals are a function of all of those terms rather than a function of the independent variables alone.

Modified White’s Test for Olympic Medal vs GDP per Capita Data
[Regression output: dependent variable is residuals squared]
The Significance F is much less than 0.05 (or 0.01 for that matter), so we reject the null hypothesis of homoskedasticity and conclude the model is heteroskedastic.
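The auxiliary regressions behind the Breusch-Pagan and Modified White’s tests can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are simulated (not the Olympic medal data), the error standard deviation is assumed proportional to x, and the helper names `ols` and `overall_f` are hypothetical. The sketch stops at the overall F statistic, which would then be compared with an F critical value or with the Significance F reported by a regression package.

```python
import random


def ols(columns, y):
    """OLS of y on an intercept plus the given regressor columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Returns (coefficients, fitted values, residuals).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return beta, fitted, resid


def overall_f(columns, y):
    """F statistic for overall significance of a regression of y on columns."""
    _, _, resid = ols(columns, y)
    n, k = len(y), len(columns)
    y_bar = sum(y) / n
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    unexplained_ss = sum(e * e for e in resid)
    r2 = 1.0 - unexplained_ss / total_ss
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Simulated data with error standard deviation proportional to x (assumption).
rng = random.Random(1)
n = 500
x = [rng.uniform(1.0, 10.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, xi) for xi in x]

# Steps (1)-(2): original regression, then squared residuals.
_, y_hat, e = ols([x], y)
e_sq = [ei * ei for ei in e]

# Breusch-Pagan: squared residuals on the independent variable(s).
bp_f = overall_f([x], e_sq)

# Modified White: squared residuals on predicted values and their squares.
white_f = overall_f([y_hat, [v * v for v in y_hat]], e_sq)

print(f"Breusch-Pagan F statistic:  {bp_f:.2f}")
print(f"Modified White F statistic: {white_f:.2f}")
```

A large F statistic in either auxiliary regression points toward rejecting the null hypothesis of homoskedasticity.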
Goldfeld-Quandt Test
How to do it:
(1) Identify which independent variable is suspected of contributing towards heteroskedasticity and sort the data from smallest to largest on that variable.
(2) Omit the middle $d$ observations.
(3) Run two regressions with the remaining $(n - d)$ observations, one on the smallest values and one on the largest values, and obtain the unexplained sums of squares $USS_1$ and $USS_2$.
(4) Form the test statistic $GQ = \dfrac{USS_2}{USS_1}$, where $USS_2$ is the larger value (because the F-statistic must be greater than or equal to 1). When the two subsamples have the same size and the same number of regressors, the degrees of freedom cancel, so this ratio equals the ratio of the two estimated error variances.
(5) Reject the null hypothesis of homoskedasticity if $GQ > F_{n_1 - k_1,\, n_2 - k_2,\, 0.05}$.

Goldfeld-Quandt Test
Why It Works:
This test works when the suspected heteroskedasticity is of the type that the error variances either increase or decrease with the value of a given independent variable. If we find that the unexplained sum of squares for the largest values is “large” relative to the unexplained sum of squares for the smallest values, then we conclude that the error variance changes significantly with the value of the independent variable, suggesting that the data are heteroskedastic.

Goldfeld-Quandt Test
How to do it:
For the Olympic Medal Data, there are 408 observations. Dividing the data into thirds, the first regression should contain the 136 (408/3) observations with the smallest GDP per capita, and the second regression should contain the 136 observations with the largest GDP per capita.

[Regression output: USS1]

[Regression output: USS2]

Goldfeld-Quandt Test Example
$GQ = \dfrac{USS_2}{USS_1} = \dfrac{63{,}534.37}{18{,}545.19} = 3.4259$
Critical Value $= F_{\infty,\infty,0.05} = 1$
Because 3.4259 > 1, we reject the null hypothesis of homoskedasticity and conclude that the model is heteroskedastic.

Correcting for Heteroskedasticity
(1) Weighted least squares
(2) White’s heteroskedastic consistent standard errors

Weighted Least Squares
How to Do It:
(1) Assume the form of heteroskedasticity, say $\mathrm{Var}(\varepsilon_i) = \sigma^2 h(x_i)$.
(2) Create new variables by dividing through by the square root of $h(x_i)$:
$y_i^* = \dfrac{y_i}{\sqrt{h(x_i)}},\quad x_0^* = \dfrac{1}{\sqrt{h(x_i)}},\quad x_{1i}^* = \dfrac{x_{1i}}{\sqrt{h(x_i)}},\quad x_{2i}^* = \dfrac{x_{2i}}{\sqrt{h(x_i)}},\ \ldots,\ x_{ki}^* = \dfrac{x_{ki}}{\sqrt{h(x_i)}}$.
(3) Estimate the population regression model $y_i^* = \beta_0 x_0^* + \beta_1 x_{1i}^* + \cdots + \beta_k x_{ki}^* + \varepsilon_i^*$.

Weighted Least Squares
Why It Works:
Weighted least squares changes the model from one that was initially heteroskedastic into one that is homoskedastic. The new error term $\varepsilon_i^* = \varepsilon_i / \sqrt{h(x_i)}$ has variance $\mathrm{Var}(\varepsilon_i^*) = \sigma^2 h(x_i) / \left(\sqrt{h(x_i)}\right)^2 = \sigma^2$. This only works as long as the assumed form of heteroskedasticity is correct.

Weighted Least Squares Example
Assume that the form of heteroskedasticity is $\mathrm{Var}(\varepsilon_i) = \sigma^2\, \mathit{GDPperCapita}_i$, so that
$h(x_i) = \mathit{GDPperCapita}_i$ and $\sqrt{h(x_i)} = \sqrt{\mathit{GDPperCapita}_i}$.
The transformed variables are
$\mathit{Medals}_i^* = \dfrac{\mathit{Medals}_i}{\sqrt{\mathit{GDPperCapita}_i}}$
$\mathit{Intercept}_i^* = \dfrac{1}{\sqrt{\mathit{GDPperCapita}_i}}$
$\mathit{GDPperCapita}_i^* = \dfrac{\mathit{GDPperCapita}_i}{\sqrt{\mathit{GDPperCapita}_i}} = \sqrt{\mathit{GDPperCapita}_i}$

Weighted Least Squares Example
[Regression output: Excel Results]

Breusch-Pagan Test of Transformed Weighted Least Squares Data
Unfortunately, even after the transformation this model still suffers from heteroskedasticity.

Robust Standard Errors
The preferred method to correct for heteroskedasticity is to use White’s heteroskedastic consistent standard errors. The coefficient estimates are still unbiased, so only the standard errors need to be corrected.
In STATA, the command is
    reg y x1 x2 x3, robust
The ", robust" option (or even ", r") is the portion of the command that corrects the standard errors.

[Regression output: STATA Results with Original Standard Errors]
[Regression output: STATA Results with Robust Standard Errors]
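Both corrections discussed above can be sketched for a single-regressor model in plain Python. This is an illustrative sketch, not the slides' Excel or STATA workflow: the data are simulated under the assumption $\mathrm{Var}(\varepsilon_i) = \sigma^2 x_i$, all function names are hypothetical, and the robust standard error uses White's formula with an n/(n − 2) small-sample scaling (often labeled HC1), which is an assumption about the variant being used.

```python
import math
import random


def simple_ols(x, y):
    """OLS with intercept for one regressor; returns (b0, b1, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid


def slope_standard_errors(x, resid):
    """Conventional and heteroskedasticity-robust standard errors of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    s2 = sum(e * e for e in resid) / (n - 2)          # usual error variance estimate
    conventional = math.sqrt(s2 / sxx)
    white_var = sum(d * d * e * e for d, e in zip(dev, resid)) / sxx ** 2
    robust = math.sqrt(white_var * n / (n - 2))       # assumed HC1-style scaling
    return conventional, robust


def wls_through_transformation(x, y, h):
    """WLS by dividing every variable by sqrt(h(x_i)), then OLS with no constant."""
    w = [1.0 / math.sqrt(h(xi)) for xi in x]
    y_star = [yi * wi for yi, wi in zip(y, w)]
    x0_star = w                                       # transformed intercept column
    x1_star = [xi * wi for xi, wi in zip(x, w)]
    # 2x2 normal equations for the regression of y* on x0* and x1*.
    a = sum(v * v for v in x0_star)
    b = sum(u * v for u, v in zip(x0_star, x1_star))
    d = sum(v * v for v in x1_star)
    c0 = sum(u * v for u, v in zip(x0_star, y_star))
    c1 = sum(u * v for u, v in zip(x1_star, y_star))
    det = a * d - b * b
    return (d * c0 - b * c1) / det, (a * c1 - b * c0) / det


# Simulated data with Var(error_i) = x_i (assumption), true line y = 2 + 3x.
rng = random.Random(7)
x = [rng.uniform(1.0, 10.0) for _ in range(1000)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, math.sqrt(xi)) for xi in x]

ols_b0, ols_b1, resid = simple_ols(x, y)
conventional_se, robust_se = slope_standard_errors(x, resid)
wls_b0, wls_b1 = wls_through_transformation(x, y, h=lambda v: v)

print(f"OLS slope {ols_b1:.3f}, conventional SE {conventional_se:.4f}, robust SE {robust_se:.4f}")
print(f"WLS intercept {wls_b0:.3f}, WLS slope {wls_b1:.3f}")
```

Note that the robust correction leaves the OLS coefficients unchanged and only replaces the standard errors, whereas weighted least squares re-estimates the coefficients and relies on the assumed form of h(x) being correct.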