Heteroskedasticity
APS 425 - Advanced Managerial Data Analysis
Fall 2015
Instructor: G. William Schwert
585-275-2470
schwert@schwert.ssb.rochester.edu
Heteroskedasticity
(Nonconstant Variance of the Errors)
• Recall assumption 5:
– Homoskedasticity: $\mathrm{var}(e_i) = \sigma^2$, a constant
– That means the variance of $e_i$ is the same for all observations in the sample, and thus the variance of $Y_i$ (given $X_i$) is the same for all observations in the sample
– The uncertainty in $Y_i$ is the same when $X_i$ is small as when $X_i$ is large
• When you have heteroskedasticity, the spread of the dependent variable Y could depend on the value of X, for example
• Some observations are inherently less influenced than others by unmeasured factors
(c) Prof. G. William Schwert, 2001-2015
Heteroskedasticity
• Graphical example: [Figure: scatter plot of Y (axis 0 to 25) against X (axis 0 to 15)]
• Appears that there is more dispersion among the Y values when X is larger
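To make the picture concrete, here is a minimal Python sketch (assuming NumPy is available) that simulates data in which the error standard deviation grows in proportion to X. The intercept 1.0, slope 1.2 and scale 0.5 are made-up illustration values, not the course data.

```python
import numpy as np

# Simulated (hypothetical) data: SD(e_i) = 0.5 * X_i, so the errors fan out as X grows
rng = np.random.default_rng(seed=0)
n = 500
x = rng.uniform(1.0, 15.0, size=n)
errors = rng.normal(loc=0.0, scale=0.5 * x)
y = 1.0 + 1.2 * x + errors

# Compare the spread of the errors for observations with small vs. large X
small_x = x < np.median(x)
sd_small = errors[small_x].std(ddof=1)
sd_large = errors[~small_x].std(ddof=1)
print(f"SD of errors when X is below its median: {sd_small:.2f}")
print(f"SD of errors when X is above its median: {sd_large:.2f}")
```

With homoskedastic errors the two standard deviations would be roughly equal; with this data-generating process the second should come out noticeably larger.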
Heteroskedasticity
• Example: database with 249 small to medium-sized companies, containing both employee and sales information for the year 2000
• SALES = total company sales in $1000s
• EMPLOYEES = number of FTEs (full-time equivalents) employed by the company
• Model: $SALES_i = \beta_0 + \beta_1 EMPLOYEES_i + e_i$
Heteroskedasticity & Eviews
Heteroskedasticity & Eviews
• Look only at this part:
• Consider the p-value for the F-statistic
• The null hypothesis for the White test is homoskedasticity
• If we fail to reject the null hypothesis, then we proceed as if the errors are homoskedastic
• If we reject the null hypothesis, then we conclude that there is heteroskedasticity
• A significance level of 5% is commonly used for this test
• Conclusion: REJECT, so assume heteroskedasticity
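For readers working outside Eviews, the sketch below computes the White test by hand with NumPy for a regression of y on a constant and a single regressor x. It uses the n·R² (Lagrange multiplier) form of the statistic rather than the F form that the Eviews output reports; the two are asymptotically equivalent. With one regressor the auxiliary regression has two slope terms (x and x²), so the statistic is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exactly exp(-stat/2). The function names and the simulated data are illustrative assumptions.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients and residuals for y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def white_test_lm(y, x):
    """White test (n * R^2 form) for a regression of y on a constant and x.

    Auxiliary regression: squared residuals on a constant, x and x^2.
    Under homoskedasticity, n * R^2 is approximately chi-square(2).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    _, resid = ols(y, X)
    u2 = resid ** 2
    Z = np.column_stack([np.ones(n), x, x ** 2])
    _, aux_resid = ols(u2, Z)
    r2 = 1.0 - (aux_resid @ aux_resid) / np.sum((u2 - u2.mean()) ** 2)
    lm_stat = n * r2
    p_value = np.exp(-lm_stat / 2.0)   # chi-square(2) upper-tail probability
    return lm_stat, p_value

# Hypothetical data: one sample with SD growing in x, one with constant SD
rng = np.random.default_rng(seed=1)
x = rng.uniform(1.0, 15.0, size=250)
y_het = 1.0 + 1.2 * x + rng.normal(scale=0.5 * x)
y_hom = 1.0 + 1.2 * x + rng.normal(scale=3.0, size=250)

stat_het, p_het = white_test_lm(y_het, x)
stat_hom, p_hom = white_test_lm(y_hom, x)
print(f"SD grows with x: LM = {stat_het:.1f}, p = {p_het:.4f}")
print(f"constant SD:     LM = {stat_hom:.1f}, p = {p_hom:.4f}")
```

The decision rule is the same as above: reject homoskedasticity when the p-value is below the chosen significance level (commonly 5%).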
Heteroskedasticity & Eviews
• How to tell Eviews to assume Heteroskedasticity:
– Click on the Estimate button at the top of the Equation window
– Click on the Options button in the Equation Specification window
– Check the Heteroskedasticity checkbox
Heteroskedasticity & Eviews
[Figure: Eviews output, annotated "Indicates heteroskedasticity was assumed by Eviews in these results"]
Heteroskedasticity & Eviews
• Since there is heteroskedasticity,
– Estimators (b0 and b1) for both sets of results are unbiased and consistent
– Standard errors in the standard results are WRONG (i.e., biased, so the usual t-statistics are unreliable)
– Standard errors in the White results are correct (consistent in large samples)
– Estimators (b0 and b1) are not efficient (i.e., they don't have minimum standard errors), but this is the best we can do unless we know the precise nature of the heteroskedasticity
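A minimal NumPy sketch of the two sets of standard errors compared above: the conventional OLS formula, which assumes a constant error variance, and White's heteroskedasticity-consistent "sandwich" formula. This is the basic HC0 version; software packages (possibly including Eviews) may apply a small degrees-of-freedom adjustment, so their numbers can differ slightly. The function name and the simulated data are hypothetical.

```python
import numpy as np

def ols_with_standard_errors(y, X):
    """OLS coefficients with conventional and White (HC0) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Conventional: assumes var(e_i) = sigma^2 for every observation
    sigma2 = (resid @ resid) / (n - k)
    se_conventional = np.sqrt(np.diag(sigma2 * XtX_inv))
    # White (HC0): (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1
    meat = (X * resid[:, None] ** 2).T @ X
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_conventional, se_white

# Hypothetical data whose error SD grows with x
rng = np.random.default_rng(seed=2)
x = rng.uniform(1.0, 15.0, size=250)
y = 1.0 + 1.2 * x + rng.normal(scale=0.5 * x)
X = np.column_stack([np.ones_like(x), x])

beta, se_conv, se_white = ols_with_standard_errors(y, X)
print("coefficients:        ", beta)
print("conventional t-stats:", beta / se_conv)
print("White t-stats:       ", beta / se_white)
```

The coefficient estimates are the same in both cases; only the standard errors, and therefore the t-statistics, change.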
Weighted Least Squares
• Up to this point we have merely been correcting OLS estimates for the bias in the estimated standard errors (and t-statistics)
• We can also get better (more efficient) estimators of the coefficients if we can correct for the heteroskedasticity
=> Weighted Least Squares
Weighted Least Squares
Example:
Suppose that you have a regression of sales on employees (with no constant):
$SALES_i = \beta_1 EMPLOYEES_i + e_i$
It looks like the variance of the errors is going to be positively related to the level of employees
White Test Confirms Heteroskedasticity
It looks like there is significant heteroskedasticity in the residuals from this regression model
Heteroskedasticity-consistent t-stats are about 2/3 the size of those from the “raw model”
Weighted Least Squares
Consider three possible hypotheses:
(1) $\mathrm{Var}(e_i) = \sigma^2 \Rightarrow SD(e_i) = \sigma$
(2) $\mathrm{Var}(e_i) = \sigma^2 EMPLOYEES_i \Rightarrow SD(e_i) = \sigma\, EMPLOYEES_i^{1/2}$
(3) $\mathrm{Var}(e_i) = \sigma^2 EMPLOYEES_i^2 \Rightarrow SD(e_i) = \sigma\, EMPLOYEES_i$
WLS would imply dividing the equation by the appropriate variable so that the transformed residual has constant standard deviation and variance:
(1) $SALES_i = \beta_1 EMPLOYEES_i + e_i$
(2) $SALES_i / EMPLOYEES_i^{1/2} = \beta_1 EMPLOYEES_i^{1/2} + e_i / EMPLOYEES_i^{1/2}$
(3) $SALES_i / EMPLOYEES_i = \beta_1 + e_i / EMPLOYEES_i$
WLS:
(2) $SALES_i / EMPLOYEES_i^{1/2} = \beta_1 EMPLOYEES_i^{1/2} + e_i / EMPLOYEES_i^{1/2}$
It looks like there may still be a little bit of heteroskedasticity, but this specification is better than the “raw model”: the White test F-statistic is about half as big as for the levels regression
Note that the t-stat for the slope is about 25% bigger, because WLS is more efficient
WLS in Eviews
In the estimation I tell Eviews to use 1/SQR(EMPLOYEES) as the weights and you get
exactly the same results as when we did the regression manually, above
Note that Eviews also gives you summary
statistics in terms of the unweighted/raw
data
WLS:
(3) $SALES_i / EMPLOYEES_i = \beta_1 + e_i / EMPLOYEES_i$
Note that you can't do a White test here because there are no regressors other than the constant
Also, you can't compare the $R^2$ statistics across these models because the dependent variables are different
WLS in Eviews
In this case, I tell Eviews to use 1/EMPLOYEES as the weights and you get
exactly the same results as when we did the regression manually, above
WLS: Different Estimators of $\beta_1$ (where $x_i = EMPLOYEES_i$ and $y_i = SALES_i$)
OLS [the usual slope of the regression line]:
$b_1 = \sum x_i y_i / \sum x_i^2$
WLS [if $SD(e_i) = \sigma\, EMPLOYEES_i^{1/2}$] => the ratio of the sample means:
$b_1 = \sum x_i^{1/2} (y_i / x_i^{1/2}) / \sum [x_i^{1/2}]^2 = \sum y_i / \sum x_i = \bar{y} / \bar{x}$
WLS [if $SD(e_i) = \sigma\, EMPLOYEES_i$] => the sample mean of the ratio:
$b_1 = \sum 1 \cdot (y_i / x_i) / \sum 1^2 = \sum (y_i / x_i) / N = \overline{(y / x)}$
WLS:
From Eviews Manual
WLS: Some Diagnostics
• One of the assumptions we make when estimating a
regression using least squares is that the errors are
Normally distributed (with constant mean and variance)
• If the variance is not constant, one thing you will generally
see is a “fat-tailed” histogram; i.e., kurtosis > 3, and
outliers
• Thus, we can use the histogram of the residuals as a further
diagnostic for whether we have “fixed” the
heteroskedasticity problem
Histogram from OLS Model (1)
• Note that the kurtosis is large, the Jarque-Bera statistic (testing Normality) has
a p-value of 0, and there are outliers in the histogram
• Also note that the plot of the residuals shows erratic spikes (mostly associated
with larger firms)
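The kurtosis and Jarque-Bera figures in these histograms can be reproduced with a short NumPy function. The sketch uses the standard formula JB = (n/6)[S² + (K - 3)²/4], where S is skewness and K is kurtosis (K = 3 for a Normal distribution); under Normality JB is approximately chi-square with 2 degrees of freedom, so its p-value is exp(-JB/2). Eviews may compute the moments with slightly different conventions, so small differences are possible. The residuals here are simulated and hypothetical: one set with constant variance, one whose SD grows with x.

```python
import numpy as np

def jarque_bera(resid):
    """Return skewness, kurtosis, the Jarque-Bera statistic and its p-value."""
    e = resid - resid.mean()
    n = len(e)
    m2 = np.mean(e ** 2)
    skewness = np.mean(e ** 3) / m2 ** 1.5
    kurtosis = np.mean(e ** 4) / m2 ** 2      # 3 for a Normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = np.exp(-jb / 2.0)               # chi-square(2) upper-tail probability
    return skewness, kurtosis, jb, p_value

# Hypothetical residuals: constant SD vs. SD proportional to x
rng = np.random.default_rng(seed=4)
n = 5000
x = rng.uniform(1.0, 15.0, size=n)
resid_hom = rng.normal(scale=4.0, size=n)
resid_het = rng.normal(scale=0.5 * x)

for label, r in [("constant variance", resid_hom), ("SD grows with x", resid_het)]:
    s, k, jb, p = jarque_bera(r)
    print(f"{label}: skewness={s:.2f}, kurtosis={k:.2f}, JB={jb:.1f}, p={p:.4f}")
```

Pooling Normal errors that have different variances produces the fat tails (kurtosis above 3) described above, even though each individual error is Normal.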
Histogram from WLS Model (2)
• Note that the kurtosis is smaller than in the OLS model, but the Jarque-Bera
statistic still has a p-value of 0, and there are outliers in the histogram
• Also note that the plot of the standardized residuals still shows erratic spikes
(mostly associated with larger firms)
Histogram from WLS Model (3)
• Note that the kurtosis is close to 3, and the Jarque-Bera statistic has a p-value
of 0.11, implying that the data are consistent with a Normal distribution (and
constant variance)
• Also note that the plot of the standardized residuals looks much more regular
in its spread
• Thus, this model seems best
More General Approach to WLS
• Sometimes it will not be obvious how to use a single independent variable to
create appropriate weights
• This is a more data-driven approach
• Start with OLS model (I am going to put the constant term back in), then create a
new variable for the absolute value of the residuals, ABSRES
Forecast Residual Standard Deviation Using $EMPLOYEES_i^{1/2}$
• Next, I am going to regress the absolute value of the residuals from the OLS regression against several functions of the independent variable
– but I could use any variable or combination of variables here if I thought they could explain the heteroskedasticity in the original OLS regression
– Start with $EMPLOYEES_i^{1/2}$ and then create forecasts of the residual standard deviation from this model, ABSRESF1
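A minimal NumPy sketch of this data-driven procedure, under stated assumptions: the function name, the simulated data and the guard against non-positive forecasts are illustrative choices, not part of the Eviews workflow. Step 1 fits OLS with a constant and stores ABSRES = |residuals|; step 2 regresses ABSRES on a constant and a chosen function of EMPLOYEES and keeps the fitted values (the forecast residual SD, playing the role of ABSRESF1 or ABSRESF2); step 3 divides the original equation, constant included, by that forecast and reruns OLS. For Normal errors E|e_i| is proportional to SD(e_i), and a constant scale factor does not change the WLS estimates, so the fitted absolute residuals can stand in for the SD.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def wls_from_abs_residuals(y, x, sd_regressor):
    """Two-step WLS using fitted absolute OLS residuals as the SD forecast."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    _, resid = ols(y, X)                           # step 1: OLS with a constant
    absres = np.abs(resid)                         # ABSRES
    Z = np.column_stack([np.ones(n), sd_regressor])
    gamma, _ = ols(absres, Z)                      # step 2: model the residual SD
    absresf = Z @ gamma                            # forecast SD (ABSRESF)
    if np.any(absresf <= 0):
        raise ValueError("forecast SD must be positive for every observation")
    beta_wls, _ = ols(y / absresf, X / absresf[:, None])   # step 3: weighted fit
    return beta_wls, absresf

# Hypothetical data: SD(e_i) = 100 + 20 * EMPLOYEES_i^(1/2)
rng = np.random.default_rng(seed=5)
employees = rng.uniform(5.0, 500.0, size=1000)
sales = 200.0 + 150.0 * employees + rng.normal(scale=100.0 + 20.0 * np.sqrt(employees))

beta_sqrt, absresf1 = wls_from_abs_residuals(sales, employees, np.sqrt(employees))
beta_level, absresf2 = wls_from_abs_residuals(sales, employees, employees)
print("WLS using EMPLOYEES^(1/2):", beta_sqrt)
print("WLS using EMPLOYEES:      ", beta_level)
```

Changing the third argument is all it takes to switch between the square-root specification and one that uses the level of EMPLOYEES; any other variable thought to drive the heteroskedasticity could be passed the same way.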
Forecast Residual Standard Deviation Using $EMPLOYEES_i^{1/2}$
• This looks a lot like the earlier WLS
results, but note that the constant term
now looks significant
Histogram from WLS Forecast of Residual Standard Deviation Using $EMPLOYEES_i^{1/2}$
• The kurtosis is about 4.8, the Jarque-Bera statistic has a p-value of essentially 0, and there are outliers in the histogram
• Similar to what we saw in the residuals from Model (2)
– which also relied on a relationship with the square root of EMPLOYEES
Forecast Residual Standard Deviation Using $EMPLOYEES_i$
• Next, I use $EMPLOYEES_i$ and then create forecasts of the residual standard deviation from this model, ABSRESF2
Forecast Residual Standard Deviation Using $EMPLOYEES_i$
• In this case, the constant term is not
significantly different from 0.
Histogram from WLS Forecast of Residual Standard Deviation Using $EMPLOYEES_i$
• The kurtosis is about 3.4, and the Jarque-Bera statistic has a p-value of 0.094,
which suggests that Normality (with constant variance) is a reasonable
assumption
• Similar to what we saw in the residuals from Model (3)
– which also relied on a relationship with the level of EMPLOYEES
Try Using Logs
• Often people find that using log transformations helps to solve heteroskedasticity problems
• Here is the log-log scatter diagram
• It looks like it might work; let's see . . .
Try Using Logs
• A 1% increase in employees is associated with a 1% increase in sales
• Big negative outlier (obs # 40)
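In a log-log regression the slope is an elasticity: the percentage change in SALES associated with a 1% change in EMPLOYEES. The sketch below fits ln(SALES) on a constant and ln(EMPLOYEES) with NumPy. The data are simulated, as an illustrative assumption, with multiplicative errors and a true elasticity of 1, so the errors have constant variance in logs even though they fan out in levels; they are not the course data.

```python
import numpy as np

# Hypothetical data: SALES = 150 * EMPLOYEES * exp(u), with u ~ Normal(0, 0.3^2)
rng = np.random.default_rng(seed=6)
employees = rng.uniform(5.0, 500.0, size=249)
sales = 150.0 * employees * np.exp(rng.normal(scale=0.3, size=249))

# Regress ln(SALES) on a constant and ln(EMPLOYEES); the slope is the elasticity
X = np.column_stack([np.ones_like(employees), np.log(employees)])
(b0, b1), *_ = np.linalg.lstsq(X, np.log(sales), rcond=None)
print(f"intercept = {b0:.3f}, elasticity = {b1:.3f}")
```

A slope near 1 means sales scale roughly in proportion to employees.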
Try Using Logs
• Omitting obs # 40 reduces skewness and kurtosis
• Still seems inferior to WLS Model (3)
Links
Sales Data
http://schwert.ssb.rochester.edu/a425/a425_sales.wf1
Return to APS 425 Home Page
http://schwert.ssb.rochester.edu/a425/a425main.htm