UNC-Wilmington Department of Economics and Finance ECN 377 Dr. Chris Dumas

Regression Analysis -- Heteroskedasticity and WLS

Recall that one of the assumptions of the OLS method is that the variance of the error term is the same for all individuals in the population under study. Heteroskedasticity occurs when the variance of the error term is NOT the same for all individuals in the population. Heteroskedasticity occurs more often in cross-section datasets than in time-series datasets.

Consequences of Heteroskedasticity:
1. The $\hat{\beta}$'s are still unbiased estimates of the $\beta$'s when heteroskedasticity is present (and that's good).
2. But the s.e.'s of the $\hat{\beta}$'s will be biased, and we don't know whether they will be biased upward or downward, so we could draw incorrect conclusions about whether the X's affect Y.
3. The estimate of the S.E.R. is biased, so we could draw incorrect conclusions about model fit.

Detecting Heteroskedasticity:
1. Heteroskedasticity can be observed using a "residual plot." Plot the regression residuals (the estimated errors), the "ehats," against the X variables. Plot the residuals against each X variable separately to check which of the X variables might be a source of heteroskedasticity. Some researchers prefer to plot the squared residuals, the "ehats-squared," against X, so the figures below show both types of plots.
   a. If heteroskedasticity is not present, the variation in the ehats around (above and below) zero will be the same for all values of X. Figures 1a and 1b below are examples of residual plots when heteroskedasticity is NOT present.
   b. If heteroskedasticity is present, the variation in the ehats will not be the same for all values of X. Figures 2a and 2b are examples of residual plots when heteroskedasticity IS present.
2. Or, you can use a more formal statistical test to detect heteroskedasticity, such as the Park test, Breusch-Pagan test, Glejser test, or White test (we do not cover these tests in this course).

[Figure 1a. No Heteroskedasticity (ehats vs. X) | Figure 2a. Heteroskedasticity (ehats vs. X) | Figure 1b. No Heteroskedasticity (ehats-squared vs. X) | Figure 2b. Heteroskedasticity (ehats-squared vs. X)]

Correcting a Regression Analysis for Heteroskedasticity:
If heteroskedasticity is present, we can correct for its effects using the Weighted Least Squares (WLS) method. The WLS method multiplies the intercept and every variable in the model (both the Y and X variables) by a weight, called "w," that is constructed to remove the effects of heteroskedasticity from the regression.

A formula is used to calculate the weight, and the formula depends on the type of heteroskedasticity present. There is more than one type of heteroskedasticity; the type depends on the pattern in the residuals (ehats).

If the residual plot shows that the variance of the ehats (their spread around zero) is proportional to variable X, as shown in Figure 3 below, then the weight is:

$$w = \frac{1}{\sqrt{X}}$$

[Figure 3. ehats vs. X: spread proportional to X]

Or, if the residual plot shows that the variance of the ehats is proportional to the square of variable X, as shown in Figure 4 below, then the weight is:

$$w = \frac{1}{\sqrt{X^2}} = \frac{1}{X}$$

[Figure 4. ehats vs. X: spread proportional to X squared]

For example, assume that we estimate the following regression equation:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$$

After running the regression, we want to check for heteroskedasticity, so we graph the ehats from the regression against $X_1$ and against $X_2$. Suppose that the graph of the ehats against $X_1$ indicates that heteroskedasticity does appear to be present, and that it appears to be proportional to $X_1$. We decide to use Weighted Least Squares to correct for the effects of heteroskedasticity.
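A short worked derivation may make clear why a weight of $1/\sqrt{X_{1i}}$ removes this kind of heteroskedasticity. It treats the weight as a fixed (non-random) function of $X_{1i}$ and writes the suspected pattern as $\operatorname{Var}(e_i) = \sigma^2 X_{1i}$ with $X_{1i} > 0$; the constant $\sigma^2$ is notation introduced here for illustration, not taken from the notes:

```latex
\operatorname{Var}(w_i e_i) = w_i^2 \,\operatorname{Var}(e_i)
  = \left(\frac{1}{\sqrt{X_{1i}}}\right)^2 \sigma^2 X_{1i}
  = \frac{\sigma^2 X_{1i}}{X_{1i}}
  = \sigma^2
```

The same step for the Figure 4 pattern, $\operatorname{Var}(e_i) = \sigma^2 X_{1i}^2$ with $w_i = 1/X_{1i}$, gives $\operatorname{Var}(w_i e_i) = \sigma^2 X_{1i}^2 / X_{1i}^2 = \sigma^2$. Either way, the weighted error $w_i e_i$ has the same variance for every observation, which is exactly the OLS assumption that heteroskedasticity violated, so OLS can be applied to the weighted equation.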
When we suspect that the heteroskedasticity is proportional to $X_1$, the appropriate weight is:

$$w_i = \frac{1}{\sqrt{X_{1i}}}$$

To use Weighted Least Squares (WLS), we multiply each term in the regression equation by the weight:

$$w_i Y_i = w_i \beta_0 + \beta_1 w_i X_{1i} + \beta_2 w_i X_{2i} + w_i e_i$$

Substituting the formula for $w_i$:

$$\frac{1}{\sqrt{X_{1i}}} Y_i = \beta_0 \frac{1}{\sqrt{X_{1i}}} + \beta_1 \frac{1}{\sqrt{X_{1i}}} X_{1i} + \beta_2 \frac{1}{\sqrt{X_{1i}}} X_{2i} + \frac{1}{\sqrt{X_{1i}}} e_i$$

Then, to perform WLS, simply use the OLS method on the "transformed" (weighted) equation above. This is WLS.

NOTE THAT THE VARIABLES TO INCLUDE IN THE WLS REGRESSION ARE:

$$Y_i^{new} = \frac{1}{\sqrt{X_{1i}}} Y_i, \qquad w_i = \frac{1}{\sqrt{X_{1i}}}, \qquad X_{1i}^{new} = \frac{1}{\sqrt{X_{1i}}} X_{1i}, \qquad X_{2i}^{new} = \frac{1}{\sqrt{X_{1i}}} X_{2i}$$

The weighting variable, $w_i$, is itself added to the regression equation as an additional "variable." The coefficient on $w_i$ will be the intercept, $\hat{\beta}_0$. As a result, we must tell SAS not to estimate $\hat{\beta}_0$ separately (as SAS usually does). We tell SAS not to estimate $\hat{\beta}_0$ by including the option word "noint" on the "model" command line in Proc Reg (see the example SAS program below).

Detecting and Correcting Heteroskedasticity in SAS

```sas
/*
SOFTWARE: SAS Statistical Software program, version 9.2
AUTHOR:   Dr. Chris Dumas, UNC-Wilmington, Spring, 2013.
TITLE:    Program to perform weighted least squares (WLS) regression
          to correct for heteroskedasticity.
*/

/* The original cross-section data set has observations for 18 firms
   on two variables:
     Y = firm expenditures on research and development, and
     X = firm sales.
   The purpose of the regression analysis is to determine whether
   firm sales affect firm R&D expenditures. */
proc import datafile="v:\ECN377\HeteroData.xls" dbms=xls out=dataset01 replace;
run;

/* The SAS commands below conduct a regression of Y on X.
   The residuals are saved as variable "ehat". */
proc reg data=dataset01;
   model Y = X;
   output out=dataset01 residual=ehat;
run;

/* Proc Plot below plots the residuals (ehat) from the regression above
   against X to check for heteroskedasticity. */
proc plot data=dataset01;
   plot ehat*X;
run;

/* Suppose that the pattern in the residual plot indicates that
   heteroskedasticity appears to be present, and the heteroskedasticity
   appears to be proportional to X. */

/* The data step below creates a new dataset, dataset02, copies dataset01
   into dataset02, creates the new weighting variable, "w", and creates
   weighted Y and X variables, called Ynew and Xnew. */
data dataset02;
   set dataset01;
   w    = 1/sqrt(X);   /* requires X > 0 (firm sales are positive) */
   Ynew = w*Y;
   Xnew = w*X;
run;

/* The regression below performs Weighted Least Squares (WLS) by running
   the regression on the weighted variables (including w itself).
   NOTE THAT WE NEED TO TELL SAS TO DROP THE AUTOMATIC INTERCEPT TERM,
   BECAUSE THE COEFFICIENT ON THE W VARIABLE WILL BE THE INTERCEPT!
   Include the "noint" option on the "model" command line to turn off the
   automatic intercept. We save the residuals from the WLS regression and
   name them "ehat_new". We can check for patterns in these new residuals
   to see whether the heteroskedasticity problem has gone away. */
proc reg data=dataset02;
   model Ynew = w Xnew / noint;
   output out=dataset02 residual=ehat_new;
run;

/* The plot command below creates a graph of the residuals from the WLS
   regression (the ehat_new's) against the original X variable to check
   whether a pattern still exists after we have adjusted for
   heteroskedasticity. The pattern should be gone or greatly reduced.
   (Compare the Y axes from the ehat and the ehat_new plots and notice how
   correcting for heteroskedasticity greatly reduces the range in the ehats.) */
proc plot data=dataset02;
   plot ehat_new*X;
run;
```
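The program above handles only the Figure 3 pattern. The sketch below adapts it to the Figure 4 pattern (variance of the ehats proportional to X squared, so w = 1/X), and adds an optional cross-check that uses the WEIGHT statement of Proc Reg instead of transforming the variables by hand. It assumes dataset01 has been created by the program above and that X > 0 for every firm; the dataset names dataset03 and dataset04 and the variables ehat_new2 and wt are hypothetical names chosen for illustration, not part of the original program.

```sas
/* Sketch 1 (hypothetical names): WLS for the Figure 4 pattern, where the
   variance of the ehats appears proportional to X squared, so w = 1/X.
   Assumes dataset01 from the program above and X > 0 for every firm. */
data dataset03;
   set dataset01;
   w    = 1/X;
   Ynew = w*Y;
   Xnew = w*X;   /* equals 1 (up to rounding) for every firm */
run;

/* Because Xnew is a column of ones here, the coefficient on Xnew is the
   slope on X and the coefficient on w is the intercept. "noint" is still
   required so that SAS does not add a second column of ones. */
proc reg data=dataset03;
   model Ynew = w Xnew / noint;
   output out=dataset03 residual=ehat_new2;
run;

proc plot data=dataset03;
   plot ehat_new2*X;
run;

/* Sketch 2 (hypothetical names): cross-check of the w = 1/sqrt(X)
   regression in dataset02 using the WEIGHT statement of Proc Reg, which
   minimizes the sum of wt*(residual squared). With wt = 1/X, the
   intercept and slope estimates and their standard errors should match
   the coefficients on w and Xnew from "model Ynew = w Xnew / noint". */
data dataset04;
   set dataset01;
   wt = 1/X;
run;

proc reg data=dataset04;
   model Y = X;
   weight wt;
run;
```

The WEIGHT statement is given 1/X rather than 1/sqrt(X) because it multiplies each squared residual, whereas the hand transformation multiplies each term before squaring. The R-square values of the two regressions are computed differently (the "noint" model reports an uncorrected R-square), so only the coefficients and standard errors should be compared.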