Heteroskedasticity
APS 425 - Advanced Managerial Data Analysis

APS 425 Fall 2015
Heteroskedasticity
Instructor: G. William Schwert
585-275-2470
schwert@schwert.ssb.rochester.edu

Heteroskedasticity (Nonconstant Variance of the Errors)
• Recall assumption 5:
  – Homoskedasticity: $\mathrm{var}(e_i)$ = constant
  – That means the variance of $e_i$ is the same for all observations in the sample, and thus the variance of $Y_i$ is the same for all observations in the sample
  – The uncertainty in $Y_i$ is the same when $X_i$ is small as when $X_i$ is large
• When you have heteroskedasticity, the spread of the dependent variable Y could depend on the value of X, for example
• Some observations are inherently less influenced by unmeasured factors

(c) Prof. G. William Schwert, 2001-2015

Heteroskedasticity
• Graphical example: [Figure: plot of Y (axis 0 to 25) against X (axis 0 to 15)]
• It appears that there is more dispersion among the Y values when X is larger

Heteroskedasticity
• Example: database with 249 small to medium sized companies, containing both employee and sales information for the year 2000
• SALES = total company sales, in thousands of dollars
• EMPLOYEES = number of FTEs employed by the company
• Model: $\mathrm{SALES}_i = \beta_0 + \beta_1\,\mathrm{EMPLOYEES}_i + e_i$

Heteroskedasticity & Eviews
[Eviews screenshots not reproduced]
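
For readers without Eviews, here is a minimal sketch of the regression of SALES on EMPLOYEES in plain Python. The data are simulated, not the course data: the sample size of 249 matches the example, but the coefficients (50 and 100) and the error standard deviation (20 times EMPLOYEES) are assumptions chosen only to make the heteroskedasticity visible. The function names `simulate_firms` and `ols_fit` are hypothetical.

```python
import random
import statistics


def simulate_firms(n=249, seed=425):
    """Hypothetical data: SALES = 50 + 100*EMPLOYEES + e, with SD(e) = 20*EMPLOYEES."""
    rng = random.Random(seed)
    employees = [rng.uniform(5, 500) for _ in range(n)]
    sales = [50.0 + 100.0 * x + rng.gauss(0.0, 20.0 * x) for x in employees]
    return employees, sales


def ols_fit(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x + e."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


employees, sales = simulate_firms()
b0, b1 = ols_fit(employees, sales)
residuals = [y - (b0 + b1 * x) for x, y in zip(employees, sales)]

# Compare the residual spread in the smaller and larger halves of the sample.
cutoff = statistics.median(employees)
sd_small = statistics.pstdev([e for x, e in zip(employees, residuals) if x <= cutoff])
sd_large = statistics.pstdev([e for x, e in zip(employees, residuals) if x > cutoff])

print(f"b0 = {b0:.1f}, b1 = {b1:.2f}")
print(f"residual SD, smaller firms: {sd_small:.0f}; larger firms: {sd_large:.0f}")
```

In this simulated sample the residuals are more dispersed among the larger firms, which is the pattern the graphical example describes.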
Heteroskedasticity & Eviews
• Look only at this part:
• Consider the p-value for the F-statistic
• The null hypothesis for the White test is homoskedasticity
• If we fail to reject the null hypothesis, then we have no evidence against homoskedasticity
• If we reject the null hypothesis, then we conclude that there is heteroskedasticity
• A significance level of 5% is commonly used for this test
• Conclusion: REJECT, so assume heteroskedasticity

Heteroskedasticity & Eviews
• How to tell Eviews to assume heteroskedasticity:
  – Click on the Estimate button at the top of the Equation window
  – Click on the Options button in the Equation Specification window
  – Check the Heteroskedasticity checkbox

Heteroskedasticity & Eviews
Indicates heteroskedasticity was assumed by Eviews in these results

Heteroskedasticity & Eviews
• Since there is heteroskedasticity,
  – Estimators ($b_0$ and $b_1$) for both sets of results are unbiased and consistent
  – Standard errors in the standard results are WRONG (i.e., incorrect)
  – Standard errors in the White results are correct (they remain valid under heteroskedasticity in large samples)
  – Estimators ($b_0$ and $b_1$) are not efficient (i.e., they don’t have minimum standard errors), but this is the best we can do unless we know the precise nature of the heteroskedasticity
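
The test and the correction described above can be sketched in plain Python. This is an illustration under stated assumptions, not a reproduction of the Eviews output: the data are simulated (coefficients and error scale are made up), the test is the Obs*R-squared (LM) form of the White test rather than the F form, and the robust standard error is the basic White (HC0) formula for the slope of a simple regression. With one regressor the auxiliary regression has two slope terms, so the LM statistic is compared with a chi-square with 2 degrees of freedom, whose upper-tail probability is exactly exp(-LM/2). All function names are hypothetical.

```python
import math
import random
import statistics


def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def ols(columns, y):
    """OLS of y on the given regressor columns; returns (coefficients, residuals)."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    resid = [yi - sum(coef[j] * columns[j][i] for j in range(k))
             for i, yi in enumerate(y)]
    return coef, resid


def white_test(x, resid):
    """LM form of the White test: regress e^2 on a constant, x and x^2.

    x is standardized first; this leaves R^2 unchanged but keeps the
    normal equations well conditioned.
    """
    n = len(x)
    xbar, sd = statistics.fmean(x), statistics.pstdev(x)
    z = [(xi - xbar) / sd for xi in x]
    e2 = [e * e for e in resid]
    _, u = ols([[1.0] * n, z, [zi * zi for zi in z]], e2)
    e2bar = statistics.fmean(e2)
    r2 = 1.0 - sum(ui * ui for ui in u) / sum((v - e2bar) ** 2 for v in e2)
    lm = n * r2
    return lm, math.exp(-lm / 2.0)  # chi-square(2) upper-tail probability


def slope_standard_errors(x, resid):
    """Conventional and White (HC0) standard errors for the slope."""
    n, xbar = len(x), statistics.fmean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    conventional = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    white = math.sqrt(sum(((xi - xbar) * e) ** 2 for xi, e in zip(x, resid))) / sxx
    return conventional, white


# Simulated (hypothetical) data: SD of the error is proportional to EMPLOYEES.
rng = random.Random(425)
employees = [rng.uniform(5, 500) for _ in range(249)]
sales = [50.0 + 100.0 * x + rng.gauss(0.0, 20.0 * x) for x in employees]

(b0, b1), resid = ols([[1.0] * len(employees), employees], sales)
lm_stat, p_value = white_test(employees, resid)
conventional_se, white_se = slope_standard_errors(employees, resid)

print(f"White LM statistic = {lm_stat:.1f}, p-value = {p_value:.4g}")
print(f"slope = {b1:.2f}, conventional SE = {conventional_se:.2f}, White SE = {white_se:.2f}")
```

The coefficient estimates are the same in both sets of results; only the standard errors (and therefore the t-statistics) change.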
Weighted Least Squares
• Up to this point we have merely been correcting the OLS results for the bias in the estimated standard errors (and t-statistics)
• We can also get better estimators of the coefficients if we can correct for the heteroskedasticity => Weighted Least Squares

Weighted Least Squares
Example: Suppose that you have a regression of sales on employees (with no constant):
$\mathrm{SALES}_i = \beta_1\,\mathrm{EMPLOYEES}_i + e_i$
It looks like the variance of the errors is going to be positively related to the level of employees

White Test Confirms Heteroskedasticity
It looks like there is significant heteroskedasticity in the residuals from this regression model
Heteroskedasticity-consistent t-stats are about 2/3 the size of those from the “raw model”

Weighted Least Squares
Consider three possible hypotheses:
(1) $\mathrm{Var}(e_i) = \sigma^2$ => $\mathrm{SD}(e_i) = \sigma$
(2) $\mathrm{Var}(e_i) = \sigma^2\,\mathrm{EMPLOYEES}_i$ => $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i^{1/2}$
(3) $\mathrm{Var}(e_i) = \sigma^2\,\mathrm{EMPLOYEES}_i^{2}$ => $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i$
WLS would imply dividing the equation by the appropriate variable so that the transformed residual has constant standard deviation and variance:
(1) $\mathrm{SALES}_i = \beta_1\,\mathrm{EMPLOYEES}_i + e_i$
(2) $\mathrm{SALES}_i / \mathrm{EMPLOYEES}_i^{1/2} = \beta_1\,\mathrm{EMPLOYEES}_i^{1/2} + e_i / \mathrm{EMPLOYEES}_i^{1/2}$
(3) $\mathrm{SALES}_i / \mathrm{EMPLOYEES}_i = \beta_1 + e_i / \mathrm{EMPLOYEES}_i$
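
As a short check on why the division works, the variance of the transformed error can be worked out under hypotheses (2) and (3). The only assumption beyond the hypotheses themselves is that EMPLOYEES is treated as a known, positive number for each firm.

```latex
% Hypothesis (2): Var(e_i) = \sigma^2 EMPLOYEES_i, divide by EMPLOYEES_i^{1/2}
\mathrm{Var}\!\left(\frac{e_i}{\mathrm{EMPLOYEES}_i^{1/2}}\right)
  = \frac{\mathrm{Var}(e_i)}{\mathrm{EMPLOYEES}_i}
  = \frac{\sigma^2\,\mathrm{EMPLOYEES}_i}{\mathrm{EMPLOYEES}_i}
  = \sigma^2

% Hypothesis (3): Var(e_i) = \sigma^2 EMPLOYEES_i^2, divide by EMPLOYEES_i
\mathrm{Var}\!\left(\frac{e_i}{\mathrm{EMPLOYEES}_i}\right)
  = \frac{\mathrm{Var}(e_i)}{\mathrm{EMPLOYEES}_i^{2}}
  = \frac{\sigma^2\,\mathrm{EMPLOYEES}_i^{2}}{\mathrm{EMPLOYEES}_i^{2}}
  = \sigma^2
```

In both cases the transformed error has the same variance for every firm, so OLS applied to the transformed equation satisfies the homoskedasticity assumption.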
WLS: (2) $\mathrm{SALES}_i / \mathrm{EMPLOYEES}_i^{1/2} = \beta_1\,\mathrm{EMPLOYEES}_i^{1/2} + e_i / \mathrm{EMPLOYEES}_i^{1/2}$
It looks like there may still be a little bit of heteroskedasticity, but this specification is better than the “raw model”: the F-test statistic is half as big as for the levels regression
Note that the t-stat for the slope is about 25% bigger, because WLS is more efficient

WLS in Eviews
In the estimation I tell Eviews to use 1/SQR(EMPLOYEES) as the weights and you get exactly the same results as when we did the regression manually, above
Note that Eviews also gives you summary statistics in terms of the unweighted/raw data

WLS: (3) $\mathrm{SALES}_i / \mathrm{EMPLOYEES}_i = \beta_1 + e_i / \mathrm{EMPLOYEES}_i$
Note that you can’t do a White test here because there are no regressors (only a constant)
Also, you can’t compare the $R^2$ statistics across these models because the dependent variables are different

WLS in Eviews
In this case, I tell Eviews to use 1/EMPLOYEES as the weights and you get exactly the same results as when we did the regression manually, above

WLS: Different Estimators of $\beta_1$
(here $x_i$ = EMPLOYEES$_i$ and $y_i$ = SALES$_i$)
OLS: $b_1 = \sum x_i y_i / \sum x_i^2$ => [the usual slope of the regression line with no constant]
WLS [if $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i^{1/2}$]: $b_1 = \sum x_i^{1/2}\,(y_i / x_i^{1/2}) / \sum [x_i^{1/2}]^2 = \sum y_i / \sum x_i = \bar{y} / \bar{x}$ => [the ratio of the sample means]
WLS [if $\mathrm{SD}(e_i) = \sigma\,\mathrm{EMPLOYEES}_i$]: $b_1 = \sum 1 \cdot (y_i / x_i) / \sum [1]^2 = \sum (y_i / x_i) / N = \overline{(y/x)}$ => [the sample mean of the ratio]

WLS: From Eviews Manual
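
The three estimator formulas can be checked numerically. The sketch below runs the no-constant regression on the raw data and on the two transformed data sets, then compares the WLS slopes with the ratio of sample means and the sample mean of the ratios. The five (EMPLOYEES, SALES) pairs are made up for illustration and `slope_through_origin` is a hypothetical helper name.

```python
import math
import statistics


def slope_through_origin(x, y):
    """OLS slope for y = b1*x + e with no constant: sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical (EMPLOYEES, SALES) pairs, not the course data.
employees = [4.0, 9.0, 16.0, 25.0, 100.0]
sales = [450.0, 800.0, 1700.0, 2400.0, 11000.0]

# Model (1): OLS on the raw data.
b1_ols = slope_through_origin(employees, sales)

# Model (2): divide both sides by EMPLOYEES^(1/2), then run OLS.
x2 = [math.sqrt(x) for x in employees]
y2 = [y / math.sqrt(x) for x, y in zip(employees, sales)]
b1_model2 = slope_through_origin(x2, y2)

# Model (3): divide both sides by EMPLOYEES; the regressor becomes a constant 1.
x3 = [1.0] * len(employees)
y3 = [y / x for x, y in zip(employees, sales)]
b1_model3 = slope_through_origin(x3, y3)

ratio_of_means = statistics.fmean(sales) / statistics.fmean(employees)
mean_of_ratios = statistics.fmean(y3)

print(f"OLS slope:       {b1_ols:.3f}")
print(f"WLS (2) slope:   {b1_model2:.3f}  vs ratio of means  {ratio_of_means:.3f}")
print(f"WLS (3) slope:   {b1_model3:.3f}  vs mean of ratios  {mean_of_ratios:.3f}")
```

The equalities hold for any data with positive EMPLOYEES, because they follow from the algebra of the transformed regressions rather than from the particular numbers used here.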
WLS: Some Diagnostics
• One of the assumptions we make when estimating a regression using least squares is that the errors are Normally distributed (with constant mean and variance)
• If the variance is not constant, one thing you will generally see is a “fat-tailed” histogram; i.e., kurtosis > 3, and outliers
• Thus, we can use the histogram of the residuals as a further diagnostic for whether we have “fixed” the heteroskedasticity problem

Histogram from OLS Model (1)
• Note that the kurtosis is large, the Jarque-Bera statistic (testing Normality) has a p-value of 0, and there are outliers in the histogram
• Also note that the plot of the residuals shows erratic spikes (mostly associated with larger firms)

Histogram from WLS Model (2)
• Note that the kurtosis is smaller than in the OLS model, but the Jarque-Bera statistic still has a p-value of 0, and there are outliers in the histogram
• Also note that the plot of the standardized residuals still shows erratic spikes (mostly associated with larger firms)

Histogram from WLS Model (3)
• Note that the kurtosis is close to 3, and the Jarque-Bera statistic has a p-value of 0.11, implying that the data are consistent with a Normal distribution (and constant variance)
• Also note that the plot of the standardized residuals looks much more regular in its spread
• Thus, this model seems best
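
The kurtosis and Jarque-Bera diagnostics used above can be computed by hand. The sketch below uses the textbook formula JB = N/6 * (S^2 + (K - 3)^2 / 4), where S is skewness and K is kurtosis, and the chi-square(2) upper-tail probability exp(-JB/2); Eviews may apply a slightly different degrees-of-freedom adjustment, so treat the match as approximate. The errors are simulated with a standard deviation proportional to a made-up EMPLOYEES variable; they are not the course residuals.

```python
import math
import random
import statistics


def jarque_bera(values):
    """Return (skewness, kurtosis, JB statistic, p-value) for a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb, math.exp(-jb / 2.0)  # chi-square(2) tail probability


# Simulated errors: Normal, but with SD proportional to a hypothetical EMPLOYEES.
rng = random.Random(425)
employees = [rng.uniform(5, 500) for _ in range(5000)]
raw_errors = [rng.gauss(0.0, 20.0 * x) for x in employees]

# Dividing each error by EMPLOYEES removes the changing variance.
standardized = [e / x for e, x in zip(raw_errors, employees)]

_, kurt_raw, jb_raw, p_raw = jarque_bera(raw_errors)
_, kurt_std, jb_std, p_std = jarque_bera(standardized)

print(f"raw errors:          kurtosis = {kurt_raw:.2f}, JB p-value = {p_raw:.3g}")
print(f"standardized errors: kurtosis = {kurt_std:.2f}, JB p-value = {p_std:.3g}")
```

Mixing Normal errors with different variances produces the fat tails (kurtosis above 3) described on the slide, while the correctly standardized errors have kurtosis close to 3.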
More General Approach to WLS
• Sometimes it will not be obvious how to use a single independent variable to create appropriate weights
• This is a more data-driven approach
• Start with the OLS model (I am going to put the constant term back in), then create a new variable for the absolute value of the residuals, ABSRES

Forecast Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i^{1/2}$
• Next, I am going to regress the absolute value of the residuals from the OLS regression against several functions of the independent variable
  – But I could use any variable or combination of variables here if I thought they could explain the heteroskedasticity in the original OLS regression
  – Start with $\mathrm{EMPLOYEES}_i^{1/2}$ and then create forecasts of residual standard deviation from this model, ABSRESF1

Forecast Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i^{1/2}$
• This looks a lot like the earlier WLS results, but note that the constant term now looks significant

Histogram from WLS Forecast of Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i^{1/2}$
• The kurtosis is about 4.8, the Jarque-Bera statistic has a p-value of 0, and there are outliers in the histogram
• Similar to what we saw in the residuals from Model (2), which also relied on a relationship with the square root of EMPLOYEES

Forecast Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i$
• Next, I use $\mathrm{EMPLOYEES}_i$ and then create forecasts of residual standard deviation from this model, ABSRESF2

Forecast Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i$
• In this case, the constant term is not significantly different from 0
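
The two-step procedure described above (OLS, then a regression of ABSRES on a function of EMPLOYEES, then WLS using the forecasts as standard deviations) can be sketched as follows. Several details are assumptions rather than something the slides state: the data are simulated, the auxiliary regression includes a constant, and forecasts below 10% of the average forecast are raised to that floor so that no weight becomes negative or explosive. The names `ols_line` and `wls_line` are hypothetical.

```python
import random
import statistics


def ols_line(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def wls_line(x, y, sd_forecast):
    """WLS of y on a constant and x, dividing every term by the forecast SD."""
    scale = statistics.fmean(sd_forecast)      # rescaling weights changes nothing
    s = [f / scale for f in sd_forecast]
    c = [1.0 / si for si in s]                 # transformed constant
    xs = [xi / si for xi, si in zip(x, s)]     # transformed regressor
    ys = [yi / si for yi, si in zip(y, s)]     # transformed dependent variable
    scc = sum(v * v for v in c)
    scx = sum(u * v for u, v in zip(c, xs))
    sxx = sum(v * v for v in xs)
    scy = sum(u * v for u, v in zip(c, ys))
    sxy = sum(u * v for u, v in zip(xs, ys))
    det = scc * sxx - scx * scx                # 2x2 normal equations
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det


# Simulated (hypothetical) data with SD of the error proportional to EMPLOYEES.
rng = random.Random(425)
employees = [rng.uniform(5, 500) for _ in range(249)]
sales = [50.0 + 100.0 * x + rng.gauss(0.0, 20.0 * x) for x in employees]

# Step 1: OLS with a constant, then the absolute residuals (ABSRES).
b0_ols, b1_ols = ols_line(employees, sales)
absres = [abs(y - (b0_ols + b1_ols * x)) for x, y in zip(employees, sales)]

# Step 2: regress ABSRES on EMPLOYEES and forecast the residual SD.
a, b = ols_line(employees, absres)
absresf = [a + b * x for x in employees]
floor = 0.1 * statistics.fmean(absresf)        # ad hoc safeguard, an assumption
absresf = [max(f, floor) for f in absresf]

# Step 3: WLS using the forecasts as standard deviations.
b0_wls, b1_wls = wls_line(employees, sales, absresf)

print(f"OLS: b0 = {b0_ols:.1f}, b1 = {b1_ols:.2f}")
print(f"WLS: b0 = {b0_wls:.1f}, b1 = {b1_wls:.2f}")
```

To try the square-root version, replace `employees` in step 2 with its square roots; any other variable thought to explain the heteroskedasticity could be used the same way.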
Histogram from WLS Forecast of Residual Standard Deviation Using $\mathrm{EMPLOYEES}_i$
• The kurtosis is about 3.4, and the Jarque-Bera statistic has a p-value of 0.094, which suggests that Normality (with constant variance) is a reasonable assumption
• Similar to what we saw in the residuals from Model (3), which also relied on a relationship with the level of EMPLOYEES

Try Using Logs
• Often people find that using log transformations helps to solve heteroskedasticity problems
• Here is the log-log scatter diagram
• Looks like it might work, let’s see . . .

Try Using Logs
• A 1% increase in employees is associated with a 1% increase in sales
• Big negative outlier (obs # 40)

Try Using Logs
• Omitting obs # 40 reduces skewness & kurtosis
• Seems inferior to WLS model 3

Links
Sales Data: http://schwert.ssb.rochester.edu/a425/a425_sales.wf1
Return to APS 425 Home Page: http://schwert.ssb.rochester.edu/a425/a425main.htm