Problems in Regression Analysis
Spring 02

• Heteroscedasticity: violation of the assumption that the error variance is constant. Typical of cross-sectional data.
• Serial correlation: violation of the assumption that the error terms are uncorrelated. Typical of time-series data.

Heteroscedasticity
The OLS model assumes homoscedasticity, i.e., that the variance of the errors is constant. In some regressions, especially in cross-sectional studies, this assumption may be violated. When heteroscedasticity is present, OLS estimation puts more weight on the observations that have large error variances than on those with small error variances. The OLS estimates remain unbiased, but they are inefficient: their variance is larger than the minimum attainable.

Tests of Heteroscedasticity
• Lagrange multiplier tests
• Goldfeld-Quandt test
• White's test

Goldfeld-Quandt Test
• Order the data by the magnitude of the independent variable X that is thought to be related to the error variance.
• Omit the middle d observations (d might be about 1/5 of the total sample size).
• Fit two separate regressions: one for the low values of X, another for the high values.
• Calculate ESS1 and ESS2, the error sums of squares of the low-value and high-value regressions.
• Calculate the test statistic, where N is the sample size and k is the number of estimated parameters in each regression:
$$F_{\frac{N-d-2k}{2},\ \frac{N-d-2k}{2}} = \frac{ESS_2 / \frac{N-d-2k}{2}}{ESS_1 / \frac{N-d-2k}{2}} = \frac{ESS_2}{ESS_1}$$
A value above the critical F indicates heteroscedasticity.

Problem
Salvatore: data on disposable income (Yd) and consumption (C), with three consumption observations at each income level (N = 30).

Yd             12    13    14    15    16    17    18    19    20    21
Consumption  10.6  11.4  12.3  13.0  13.8  14.4  15.0  15.9  16.9  17.2
             10.8  11.7  12.6  13.3  14.0  14.9  15.7  16.5  17.5  17.8
             11.1  12.1  13.2  13.6  14.2  15.3  16.4  16.9  18.1  18.5

[Figure: scatter plot of consumption (vertical axis, 10 to 19) against disposable income Yd (horizontal axis, 10 to 22)]

Problem
Regression on the whole sample:
$$\hat{C} = 1.48 + 0.788\,Y_d$$
Regressions on the first twelve and the last twelve observations (the middle six, at Yd = 16 and 17, are omitted, so d = 6):
$$\hat{C}_1 = 0.85 + 0.837\,Y_d, \qquad R_1^2 = 0.91, \qquad ESS_1 = 1.069$$
$$\hat{C}_2 = 2.31 + 0.747\,Y_d, \qquad R_2^2 = 0.71, \qquad ESS_2 = 3.344$$
$$F_{10,10} = \frac{ESS_2}{ESS_1} = \frac{3.344}{1.069} \approx 3.13 > F_{5\%,\,crit} \approx 2.98$$
so the hypothesis of homoscedasticity is rejected at the 5% level.

To Correct for Heteroscedasticity
To correct for heteroscedasticity of the form $\mathrm{Var}(e_i) = C X_i^2$, where C is a positive constant, transform the variables by dividing through by the problematic variable.
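As a check on the arithmetic, the following plain-Python sketch recomputes the Goldfeld-Quandt statistic from the Salvatore data as transcribed in the table, then fits the transformed regression suggested by the correction, assuming Var(e_i) is proportional to Yd². Dividing through by Yd turns the model into C/Yd = β1(1/Yd) + β2 + u/Yd, so the intercept of the transformed regression estimates the original slope and the coefficient on 1/Yd estimates the original intercept. The helper names `ols` and `goldfeld_quandt` are hypothetical illustrations, not an established library API.

```python
# Salvatore data as transcribed in the table: three consumption values
# at each disposable-income level Yd = 12, ..., 21 (N = 30 observations).
YD_LEVELS = list(range(12, 22))
C_ROWS = [
    [10.6, 11.4, 12.3, 13.0, 13.8, 14.4, 15.0, 15.9, 16.9, 17.2],
    [10.8, 11.7, 12.6, 13.3, 14.0, 14.9, 15.7, 16.5, 17.5, 17.8],
    [11.1, 12.1, 13.2, 13.6, 14.2, 15.3, 16.4, 16.9, 18.1, 18.5],
]
yd = [float(level) for _ in C_ROWS for level in YD_LEVELS]
c = [value for row in C_ROWS for value in row]


def ols(y, x):
    """Simple OLS of y on a constant and x; returns (intercept, slope, ESS)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ess = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, ess


def goldfeld_quandt(y, x, d, k=2):
    """Order by x, drop the middle d observations, fit the low and high
    groups separately; return F = ESS2 / ESS1 and its (common) df."""
    pairs = sorted(zip(x, y))          # sort by x (ties broken by y; ESS unaffected)
    m = (len(pairs) - d) // 2          # observations in each tail group
    low, high = pairs[:m], pairs[-m:]
    ess1 = ols([p[1] for p in low], [p[0] for p in low])[2]
    ess2 = ols([p[1] for p in high], [p[0] for p in high])[2]
    df = (len(pairs) - d - 2 * k) // 2
    return ess2 / ess1, df


f_stat, df = goldfeld_quandt(c, yd, d=6)
print(f"Goldfeld-Quandt F({df},{df}) = {f_stat:.2f}")  # 3.13

# Correction assuming Var(e_i) = C * Yd_i**2: regress C/Yd on 1/Yd.
# Intercept -> beta_2 (original slope); coefficient on 1/Yd -> beta_1.
b2_hat, b1_hat, _ = ols([ci / yi for ci, yi in zip(c, yd)], [1.0 / yi for yi in yd])
print(f"C-hat = {b1_hat:.2f} + {b2_hat:.2f} Yd")  # 1.42 + 0.79 Yd
```

The F statistic is compared with a tabulated critical value; no statistics library is used, so the sketch stays dependency-free.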
In the two-variable case,
$$\frac{Y_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2 + \frac{e_i}{X_i}$$
The transformed error term is now homoscedastic, since $\mathrm{Var}(e_i / X_i) = C$.

Problem
Dividing the consumption function through by Yd:
$$\frac{C}{Y_d} = \beta_1 \frac{1}{Y_d} + \beta_2 + \frac{u_i}{Y_d}$$
$$\widehat{\left(\frac{C}{Y_d}\right)} = 0.792 + 1.421\,\frac{1}{Y_d}$$
Multiplying back through by Yd:
$$\hat{C} = 1.421 + 0.792\,Y_d$$

Serial Correlation
Serial correlation is the problem that arises in OLS estimation when the errors are not independent: the error term in one period is correlated with error terms in previous periods. If $e_t$ is correlated with $e_{t-1}$, we say there is first-order serial correlation. Serial correlation may be positive, $\mathrm{Cov}(e_t, e_{t-1}) > 0$, or negative, $\mathrm{Cov}(e_t, e_{t-1}) < 0$.

If serial correlation is present, the OLS estimates are still unbiased and consistent, but the standard errors are biased, leading to incorrect statistical tests and confidence intervals.
• With positive serial correlation, the standard errors of $\hat{\beta}$ are biased downward, leading to inflated t statistics.
• With negative serial correlation, the standard errors of $\hat{\beta}$ are biased upward, leading to deflated t statistics.

Durbin-Watson Statistic
$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}, \qquad 0 \le d \le 4$$
• $0 \le d < d_L$: positive serial correlation
• $d_L \le d \le d_U$: inconclusive
• $d_U < d < 4 - d_U$: no serial correlation
• $4 - d_U \le d \le 4 - d_L$: inconclusive
• $4 - d_L < d \le 4$: negative serial correlation

Problem
Data 9-4 shows corporate profits and sales, in billions of dollars, for the manufacturing sector of the U.S. from 1974 to 1994. Estimate the equation
$$\text{Profits} = \beta_1 + \beta_2\,\text{Sales} + e$$
and test for first-order serial correlation.

Problem
OLS estimate of profit as a function of sales (SPSS Coefficients table; dependent variable: PROFITS):

Model 1       B         Std. Error   Beta    t       Sig.
(Constant)    34.014    24.041               1.415   .173
SALES         .02654    .011         .496    2.492   .022

$$\hat{\pi}_t = 34.01 + 0.027\,\text{Sales}_t$$

Problem
Test for serial correlation (SPSS Model Summary; predictors: (Constant), SALES; dependent variable: PROFITS):

Model 1   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
          .496   .246       .207                31.251                       1.080

With n = 21 and one regressor, the 5% bounds are approximately $d_L = 1.22$ and $d_U = 1.42$. Since $d = 1.080 < d_L$, the test indicates positive first-order serial correlation.

Correcting for Serial Correlation
We assume
$$e_t = \rho e_{t-1} + u_t, \qquad \mathrm{Cov}(e_t, e_{t-1}) = \rho\,\sigma_e^2$$
where $u_t$ is normally distributed with zero mean and constant variance. Follow the Durbin procedure:
$$Y_t = \beta_1 + \beta_2 X_{2t} + \dots + \beta_k X_{kt} + e_t$$
$$Y_{t-1} = \beta_1 + \beta_2 X_{2,t-1} + \dots + \beta_k X_{k,t-1} + e_{t-1}$$
$$\rho Y_{t-1} = \rho\beta_1 + \rho\beta_2 X_{2,t-1} + \dots + \rho\beta_k X_{k,t-1} + \rho e_{t-1}$$
$$Y_t - \rho Y_{t-1} = \beta_1(1-\rho) + \beta_2 (X_{2t} - \rho X_{2,t-1}) + \dots + \beta_k (X_{kt} - \rho X_{k,t-1}) + (e_t - \rho e_{t-1})$$

• Move the lagged dependent variable term to the right-hand side and estimate the equation using OLS. The estimated coefficient on the lagged dependent variable is $\hat{\rho}$. Note that $e_t - \rho e_{t-1} = u_t$:
$$Y_t = \beta_1(1-\rho) + \rho Y_{t-1} + \beta_2 (X_{2t} - \rho X_{2,t-1}) + \dots + \beta_k (X_{kt} - \rho X_{k,t-1}) + u_t$$
• Create new independent and dependent variables:
$$X_t^* = X_t - \hat{\rho} X_{t-1}, \qquad Y_t^* = Y_t - \hat{\rho} Y_{t-1}$$
• Estimate the equation on the transformed variables:
$$Y_t^* = \beta_1(1-\rho) + \beta_2 X_{2t}^* + \dots + \beta_k X_{kt}^* + u_t$$

The slope coefficients in the transformed equation are the same parameters as in the original equation, now estimated with a correction for serial correlation. The constant of the regression on the transformed variables is
$$\beta_1^* = \beta_1(1-\rho), \qquad \text{so} \qquad \beta_1 = \frac{\beta_1^*}{1-\rho}$$

Problem
Begin by regressing profit (π) on profit lagged one period, sales, and sales lagged one period:
$$\pi_t = \beta_1(1-\rho) + \rho\,\pi_{t-1} + \beta_2 S_t - \rho\beta_2 S_{t-1} + u_t$$
The estimated coefficient on the lagged dependent variable is $\hat{\rho}$.

Problem
$\hat{\rho} = 0.49$ (SPSS Coefficients table; dependent variable: PROFITS):

Model 1       B         Std. Error   Beta     t        Sig.
(Constant)    -1.419    24.387                -.058    .954
PROFITSL      .492      .209         .419     2.358    .031
SALES         .176      .052         3.106    3.355    .004
SALESL        -.161     .053         -2.840   -3.046   .008

Problem
Then generate the transformed (starred) variables and run the regression on them (SPSS Coefficients table; dependent variable: PROFITSS):

Model 1       B         Std. Error   Beta    t       Sig.
(Constant)    .167      24.855               .007    .995
SALESS        .04234    .020         .442    2.091   .051

$$\pi_t^* = 0.167 + 0.042\,\text{Sales}_t^*$$
Recovering the original constant, $\hat{\beta}_1 = 0.167 / (1 - 0.49) \approx 0.327$, gives the equation corrected for serial correlation:
$$\hat{\pi}_t = 0.327 + 0.042\,\text{Sales}_t$$
compared with the OLS estimate $\hat{\pi}_t = 34.01 + 0.027\,\text{Sales}_t$.
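A minimal plain-Python sketch of the two serial-correlation tools above: the Durbin-Watson statistic with its decision regions, and Durbin's two-step correction for a single regressor. The Data 9-4 series is not reproduced in these notes, so the demonstration uses hypothetical simulated data (y_t = 5 + 2x_t + e_t with ρ = 0.6). All function names are illustrative assumptions, and the d_L, d_U bounds passed to `dw_decision` must be taken from a Durbin-Watson table.

```python
import random


def durbin_watson(resid):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def dw_decision(d, d_lower, d_upper):
    """Map d to the decision regions; d_lower/d_upper come from a DW table."""
    if d < d_lower:
        return "positive serial correlation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no serial correlation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative serial correlation"


def lstsq(rows, y):
    """OLS coefficients from the normal equations (X'X)b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting. rows[i] = regressors of obs i."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def durbin_two_step(y, x):
    """Durbin's two-step correction for AR(1) errors with one regressor.
    Step 1: regress y_t on 1, y_{t-1}, x_t, x_{t-1}; rho_hat = coef on y_{t-1}.
    Step 2: OLS of y*_t = y_t - rho_hat*y_{t-1} on 1 and x*_t = x_t - rho_hat*x_{t-1}
    (the first observation is dropped). Returns (rho_hat, beta_1, beta_2)."""
    rows = [[1.0, y[t - 1], x[t], x[t - 1]] for t in range(1, len(y))]
    rho = lstsq(rows, y[1:])[1]
    y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    x_star = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    b1_star, b2 = lstsq([[1.0, v] for v in x_star], y_star)
    return rho, b1_star / (1.0 - rho), b2   # beta_1 = beta_1* / (1 - rho)


# Hypothetical simulated data (NOT the Data 9-4 series):
# y_t = 5 + 2 x_t + e_t, with e_t = 0.6 e_{t-1} + u_t.
rng = random.Random(0)
n = 2000
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
e = [0.0]
for _ in range(n - 1):
    e.append(0.6 * e[-1] + rng.gauss(0.0, 1.0))
y = [5.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

ols_b1, ols_b2 = lstsq([[1.0, xi] for xi in x], y)
ols_resid = [yi - ols_b1 - ols_b2 * xi for xi, yi in zip(x, y)]
print(f"DW on OLS residuals: {durbin_watson(ols_resid):.2f}")
rho, b1, b2 = durbin_two_step(y, x)
print(f"rho_hat = {rho:.2f}, beta_1 = {b1:.2f}, beta_2 = {b2:.2f}")

# Decision for the profits regression, d = 1.080, using assumed 5% bounds
# for n = 21 and one regressor (d_L = 1.22, d_U = 1.42).
print(dw_decision(1.080, 1.22, 1.42))  # positive serial correlation
# Back-transforming the reported constant: beta_1 = 0.167 / (1 - 0.49)
print(round(0.167 / (1 - 0.49), 3))  # 0.327
```

With ρ = 0.6 the Durbin-Watson statistic on the OLS residuals should come out near 2(1 - 0.6) = 0.8, and the two-step estimates should land close to the simulated values 0.6, 5 and 2. Solving the normal equations directly keeps the sketch dependency-free; a numerical library's least-squares routine would be the more robust choice for real data.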