Heteroskedasticity

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
Regression Analysis -- Heteroskedasticity and WLS
Recall that one of the assumptions of the OLS method is that the variance of the error term is the same for all
individuals in the population under study. Heteroskedasticity occurs when the variance of the error term is NOT
the same for all individuals in the population. Heteroskedasticity occurs more often in cross-section datasets than
in time-series datasets.
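In symbols, the OLS assumption and its violation can be written as:

Homoskedasticity (the OLS assumption):  Var(ei) = σ²  (the same σ² for every individual i)
Heteroskedasticity:  Var(ei) = σi²  (where σi² differs across individuals i)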
Consequences of Heteroskedasticity:
1. the estimates of the β̂’s are still unbiased if heteroskedasticity is present (and that’s good),
2. but the s.e.’s of the β̂’s will be biased, and we don’t know whether they will be biased upward
or downward, so we could make incorrect conclusions about whether the X’s affect Y, and
3. the estimate of the S.E.R. is biased, so we could make incorrect conclusions about model fit.
Detecting Heteroskedasticity:
1. Heteroskedasticity can be observed using a "Residual Plot." Plot the regression residuals/errors,
the “ehats,” against the X variables (you should plot the residuals against each X variable
separately to check which of the X variables might be a source of Heteroskedasticity). Some
researchers like to plot the squared residuals, the "ehats-squared", against X. So, the figures
below show both types of plots.
a. If Heteroskedasticity is not present, the variation in the ehats around (above and below) zero
will be the same for all values of X. Figures 1a and 1b below are examples of residual plots
when Heteroskedasticity is NOT present.
b. If Heteroskedasticity is present, the variation in the ehats will not be the same for all values
of X. Figures 2a and 2b are examples of residual plots when Heteroskedasticity IS present.
2. Or, you can use a more sophisticated statistical test to detect heteroskedasticity, such as the Park
test, Breusch-Pagan test, Glejser test, White test, etc. (we do not cover these tests in this course).
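As an aside, SAS can report one of these tests automatically: the “spec” option on the “model” line in
Proc Reg requests a version of White’s test, where a small p-value suggests heteroskedasticity. A minimal
sketch, assuming the dataset and variable names used in the program at the end of this handout:

proc reg data=dataset01;
model Y = X / spec;  /* "spec" requests White's test for heteroskedasticity */
run;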
[Figure 1a: ehats vs. X, no heteroskedasticity.  Figure 2a: ehats vs. X, heteroskedasticity.]
[Figure 1b: ehats² vs. X, no heteroskedasticity.  Figure 2b: ehats² vs. X, heteroskedasticity.]
Correcting a Regression Analysis for Heteroskedasticity:
If Heteroskedasticity is present, we can correct for its effects using the Weighted Least Squares (WLS) method.
The WLS method multiplies the intercept and every variable in the model (both the Y and X variables) by a
weight, called “w,” that is constructed in a way that will remove the effects of Heteroskedasticity from the
regression. A formula is used to calculate the weight. The formula depends on the type of Heteroskedasticity
present. There is more than one type of Heteroskedasticity; the type depends on the pattern in the residuals
(ehats).
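To see why a weight can remove heteroskedasticity, suppose the variance of the error term is proportional to
some function of X, say Var(ei) = σ²·h(Xi). If the weight is chosen as wi = 1/√h(Xi), then the weighted error
term has the same variance for every individual:

Var(wi·ei) = wi²·Var(ei) = (1/h(Xi))·σ²·h(Xi) = σ²

so the weighted (transformed) regression satisfies the equal-variance assumption of OLS.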
If the residual plot shows that the heteroskedasticity is proportional to variable X (the spread of the
ehats grows with X), as shown in Figure 3 below, then the weight is:

w = 1/√X

[Figure 3: ehats vs. X, with the spread of the ehats increasing as X increases.]
Or, if the residual plot shows that the heteroskedasticity is proportional to the square of variable X,
as shown in Figure 4 below, then the weight is:

w = 1/√(X²) = 1/X

[Figure 4: ehats vs. X, with the spread of the ehats increasing sharply as X increases.]
For example, assume that we estimate the following regression equation:
Yi = β0 + β1X1i + β2X2i + ei
After running the regression, we want to check for heteroskedasticity. So, we graph the ehats from the regression
against X1 and X2. Suppose that a graph of the ehats against X1 indicates that there does appear to be
heteroskedasticity, and the heteroskedasticity appears to be proportional to X1.
We decide to use Weighted Least Squares to correct for the effects of heteroskedasticity. When we suspect that
the heteroskedasticity is proportional to X1, the appropriate weight would be:
wi = 1/√X1i
To use Weighted Least Squares (WLS), we multiply each term in the regression equation by the weight:
wi·Yi = wi·β0 + β1·wi·X1i + β2·wi·X2i + wi·ei
substituting the formula for w:
(1/√X1i)·Yi = (1/√X1i)·β0 + β1·(1/√X1i)·X1i + β2·(1/√X1i)·X2i + (1/√X1i)·ei
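Notice that the error term in this weighted equation, (1/√X1i)·ei, is homoskedastic when the original
error variance is proportional to X1 (that is, when Var(ei) = σ²·X1i):

Var((1/√X1i)·ei) = (1/X1i)·Var(ei) = (1/X1i)·σ²·X1i = σ²

so the weighted equation satisfies the equal-variance assumption of OLS.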
Then, to perform WLS, simply use the OLS method on the "transformed" (weighted) equation above. This is
WLS. NOTE THAT THE VARIABLES TO INCLUDE IN THE WLS REGRESSION ARE:
Yinew = (1/√X1i)·Yi
wi = 1/√X1i
X1inew = (1/√X1i)·X1i = √X1i
X2inew = (1/√X1i)·X2i
The weighting variable, “wi,” is itself added to the regression equation as an additional “variable.” The
coefficient on “wi” will be the intercept, β̂0. As a result, we must tell SAS not to estimate β̂0 separately (as SAS
usually does). We tell SAS not to estimate β̂0 by including the option word “noint” on the “model” command line
in Proc Reg (see the example SAS program below).
Detecting and Correcting Heteroskedasticity in SAS
/*
SOFTWARE: SAS Statistical Software program, version 9.2
AUTHOR: Dr. Chris Dumas, UNC-Wilmington, Spring, 2013.
TITLE: Program to perform weighted least squares (WLS) regression
to correct for heteroskedasticity.
*/
/* The original cross-section data set has observations for 18 firms on two
variables:
Y = firm expenditures on research and development, and
X = firm sales. The purpose of the regression analysis is to determine whether
firm sales affects firm R&D expenditures. */
proc import datafile="v:\ECN377\HeteroData.xls" dbms=xls out=dataset01 replace;
run;
/* The SAS commands below conduct a regression of Y on X.
The residuals are saved as variable "ehat". */
proc reg data=dataset01;
model Y = X;
output out=dataset01 residual=ehat;
run;
/* Proc Plot below plots the residuals (ehat) from the regression above
against X to check for heteroskedasticity.*/
proc plot data=dataset01;
plot ehat*X;
run;
/* Suppose that the pattern in the residual plot indicates that heteroskedasticity
appears to be present, and the heteroskedasticity appears to be proportional to X.
*/
/* The data command below creates a new dataset, dataset02,
copies dataset01 into dataset02, creates the new weighting variable, "w",
and creates weighted Y and X variables, called Ynew and Xnew. */
data dataset02;
set dataset01;
w = 1/sqrt(X);
Ynew = w*Y;
Xnew = w*X;
run;
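/* Aside: if the regression also included a second explanatory variable, say X2
(as in the worked example above -- X2 is a hypothetical variable name here),
it would be weighted the same way inside the data step, e.g., X2new = w*X2; */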
/* The regression below performs Weighted Least Squares (WLS) by running
the regression on the weighted variables (including w itself).
NOTE THAT WE NEED TO TELL SAS TO DROP THE AUTOMATIC INTERCEPT TERM,
BECAUSE THE COEFFICIENT ON THE W VARIABLE WILL BE THE INTERCEPT!
Include the “noint” option on the “model” command line to turn off
the automatic intercept. We save the residuals from the WLS regression and
name them “ehat_new”. We can check for patterns in these new residuals to see
whether the heteroskedasticity problem has gone away.*/
proc reg data=dataset02;
model Ynew = w Xnew / noint;
output out=dataset02 residual=ehat_new;
run;
/* The plot command below creates a graph of the residuals from the WLS regression
(the ehat_new’s) against the original X variable to check whether a pattern still
exists after we have adjusted for heteroskedasticity. The pattern should be gone
or greatly reduced. (Compare the Y axes from the ehat and the ehat_new plots and
notice how correcting for heteroskedasticity greatly reduces the range in the
ehats.) */
proc plot data=dataset02;
plot ehat_new*X;
run;
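As an aside, Proc Reg also has a built-in “weight” statement that performs WLS without transforming the
variables by hand. With that approach, the weight variable should equal 1 divided by the factor that the error
variance is proportional to (here, 1/X), and the automatic intercept is kept (no “noint”). It should give the same
coefficient estimates as the transformed regression above. A minimal sketch, assuming the same dataset and
variable names as above (dataset03 and wvar are names introduced here for illustration):

data dataset03;
set dataset01;
wvar = 1/X;  /* weight = 1 / (factor the error variance is proportional to) */
run;
proc reg data=dataset03;
weight wvar;
model Y = X;
run;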