10.1 Heteroskedasticity
Objectives
• What is heteroskedasticity?
• What are the consequences?
• How is heteroskedasticity identified?
• How is heteroskedasticity corrected?
ECON 7710, 2010

10.2 Main empirical model for Unit 10
$\text{foodexp}_i = \beta_0 + \beta_1\,\text{income}_i + \varepsilon_i$
foodexp: family food expenditure
income: family income
Least squares estimates, US data (UE_Tab0301):
$\widehat{\text{foodexp}}_i = 40.77 + 0.128^{***}\,\text{income}_i$
se: (22.14) (0.031)
$R^2 = 0.3171$, $N = 40$.
Is this the best estimated equation?

10.3 1. The Nature of Heteroskedasticity
In a regression about firms, the same mistake may amount to millions for one firm and billions for another.

10.4
Heteroskedasticity is a problem that occurs when the error term does not have a constant variance.
CLRM: Each error term comes from the same probability distribution.
Assumption CLRM.5 is violated!

10.5 Regression Model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
zero mean: $E(\varepsilon_i \mid X_{1i}, X_{2i}) = 0$
homoskedasticity: $\mathrm{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma^2$
no autocorrelation: $\mathrm{cov}(\varepsilon_i, \varepsilon_j \mid X_{1i}, X_{2i}, X_{1j}, X_{2j}) = 0$, $i \neq j$

10.6 Identical distributions for observations i and j
[Figure: distribution for $\varepsilon_i$ and distribution for $\varepsilon_j$]

10.7 Homoskedasticity
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\mathrm{var}(\varepsilon_i \mid X_i) = \sigma^2$ for all $i$
[Figure: conditional distribution f(Y) at X1, X2, X3, X4]

10.8 Heteroskedasticity
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\mathrm{var}(\varepsilon_i \mid X_i) = \sigma_i^2$ for all $i$
[Figure: conditional distribution]

10.11
Pure heteroskedasticity: different variances of the error term; correctly specified PRF.
Impure heteroskedasticity: different variances of the error term; specification error.

10.12 2. Detecting Heteroskedasticity
2.1 Graphical Method
Plotting foodexp against income (for one regressor)
Example 1: Food expenditure, US data (UE_Tab0301)
[Figure: scatter diagram of foodexp against income]

10.13 Example 1: Food expenditure, US data (UE_Tab0301)
Plotting $e^2$ against income.
Plotting $e$ against income.
[Figure: residual against income, and squared residual against income]

10.14 Example 2: textbook data (Woody3)
$\hat{Y} = 102{,}192^{***} - 9{,}075^{***}\,N + 0.35^{***}\,P + 1.29^{**}\,I$
$R^2 = 0.6182$, $N = 33$.
[Figure: residual against population]

10.15 2.2 Park Test
Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \varepsilon_i$, $i = 1, \dots, N$ (*)
Suppose it is suspected that $\mathrm{var}(\varepsilon_i)$ depends on $Z_i$ in the form
$\mathrm{var}(\varepsilon_i) = \sigma_i^2 = \sigma^2 Z_i^{\alpha_1} e^{v_i}$
$\ln \sigma_i^2 = \ln \sigma^2 + \alpha_1 \ln Z_i + v_i$
$H_0$: $\alpha_1 = 0$ (homoskedastic errors); $H_A$: $\alpha_1 \neq 0$ (heteroskedastic errors).

10.16
Step 1: Estimate equation (*) with OLS and obtain the residuals.
$e_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \dots - \hat\beta_K X_{Ki}$
Step 2: Regress the natural log of the squared residuals on the natural log of a possible proportionality factor.
$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln Z_i + v_i$
where $v_i$ is an error term satisfying all classical assumptions.

10.17
Step 3: If the coefficient of $\ln Z$ is significantly different from zero, this suggests a heteroskedastic pattern in the residuals with respect to $Z$. Otherwise, homoskedastic errors cannot be rejected.
Example 3: Park test, US data (UE_Tab0301)
$\widehat{\ln(e^2)} = -7.46 + 2.07^{**}\,\ln(\text{income})$
t: (2.28)
p-value: (0.0284)

10.18
Advantages of the Park test:
a. The test is simple.
b. It provides information about the variance structure.
Limitations of the Park test:
a. The distribution of the dependent variable is problematic.
b. It assumes a specific functional form.
c. It does not work when the variance depends on two or more variables.
d. The correct variable with which to order the observations must be identified first.
e. It cannot handle partitioned data.
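The three Park test steps can be sketched in code. This is a minimal illustration with NumPy on simulated data, not the course software and not the UE_Tab0301 file. The helper names `ols` and `park_test`, the seed, and the simulated income and foodexp values are all assumptions made for the example. The error standard deviation is set proportional to income, so the slope on ln(income) should come out near 2.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals and conventional standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, resid, se

def park_test(y, x, z):
    """Park test for a regression of y on x with proportionality factor z."""
    X = np.column_stack([np.ones_like(x), x])
    _, e, _ = ols(X, y)                               # Step 1: OLS residuals
    Z = np.column_stack([np.ones_like(z), np.log(z)])
    alpha, _, se = ols(Z, np.log(e ** 2))             # Step 2: ln(e^2) on ln(Z)
    return alpha[1], alpha[1] / se[1]                 # Step 3: slope and its t-ratio

# Simulated data (hypothetical values, not the US data set).
rng = np.random.default_rng(0)
income = rng.uniform(200, 1200, size=1000)
eps = rng.normal(0, 1, size=1000) * 0.05 * income     # sd proportional to income
foodexp = 40 + 0.13 * income + eps

alpha1_hat, t_ratio = park_test(foodexp, income, income)
print(f"alpha1 = {alpha1_hat:.2f}, t = {t_ratio:.2f}")
```

A large t-ratio on ln(Z) leads to rejecting homoskedasticity, as in Step 3.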
10.19 2.3 White's Test
Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, $i = 1, \dots, N$ (*)
Suppose heteroskedasticity is suspected but its functional form is unknown.
$H_0$: The conditional variance of $\varepsilon_i$ is constant.
$H_A$: The conditional variance of $\varepsilon_i$ is not constant.

10.20
Step 1: Estimate equation (*) with OLS and obtain the residuals.
$e_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}$
Step 2: Regress the squared residuals on all explanatory variables, all cross-product terms and the square of each explanatory variable.
$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$

10.21
Step 3: Test the overall significance of the equation in Step 2 (df = number of regressors).
Statistic: $N R^2 \sim \chi^2_{df}$, where $R^2$ is taken from the Step 2 regression.
Critical value: $cv = \chi^2_{df,\alpha}$.
Reject the hypothesis of homoskedasticity if $N R^2 > cv$.
Example 4: White test, US data (UE_Tab0301)
$\widehat{e^2} = 1924 - 7.4\,\text{income} + 0.0088^{*}\,\text{income}^2$
$R^2 = 0.3646$, $N = 40$, $N R^2 = 14.58$
$cv = \chi^2(2, 0.01) = 9.21$.
Since 14.58 > 9.21, homoskedasticity is rejected at the 1% level.

10.22
Advantages of the White test:
a. It does not assume a specific functional form.
b. It is applicable when the variance depends on two or more variables.
Limitations of the White test:
a. It is a large-sample test.
b. It provides no information about the variance structure.
c. It loses many degrees of freedom when there are many regressors.
d. It cannot handle partitioned data.
e. It also captures specification errors.

10.23 3. Consequences of Heteroskedasticity
If heteroskedasticity is present but OLS is used for estimation, how are the OLS estimates affected?
Unaffected: OLS estimators are still linear and unbiased because, on average, overestimates are as likely as underestimates.
$E(\hat\beta_k) = \beta_k$, $k = 0, 1, \dots, K$

10.24 3.1 OLS estimators are inefficient.
Some fluctuations of the error term are attributed to the variation in the independent variables.
There are other linear and unbiased estimators that have smaller variances than the OLS estimator.
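The two consequences stated so far (OLS stays unbiased but is no longer the minimum-variance linear unbiased estimator) can be illustrated with a small simulation. This is a sketch under assumed values, not a result from the course data: the true coefficients, the design points, the seed, and an error standard deviation proportional to X are all hypothetical. The competing estimator divides every term by the known error standard deviation before running least squares.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 2000
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([2.0, 0.5])      # hypothetical true intercept and slope
sd = 0.5 * x                          # assumed: error sd proportional to x

Xw = X / sd[:, None]                  # regressors divided by the error sd
ols_slopes = np.empty(reps)
wls_slopes = np.empty(reps)
for r in range(reps):
    y = X @ beta_true + rng.normal(0.0, sd)   # heteroskedastic errors
    ols_slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    wls_slopes[r] = np.linalg.lstsq(Xw, y / sd, rcond=None)[0][1]

print("mean OLS slope:", ols_slopes.mean())
print("mean weighted slope:", wls_slopes.mean())
print("variance of OLS slope:", ols_slopes.var())
print("variance of weighted slope:", wls_slopes.var())
```

Both averages should sit close to the true slope, while the weighted estimator should show the smaller sampling variance.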
10.25 3.2 Unreliable Hypothesis Testing
$\mathrm{var}_{OLS}(\hat\beta_k) \neq \mathrm{var}_{hetero}(\hat\beta_k)$ → biased $se(\hat\beta_k)$ → unreliable testing conclusion

10.26 4. Remedies
4.1 Heteroskedasticity-Corrected Standard Errors
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
heteroskedasticity: $\mathrm{var}(\varepsilon_i) = \sigma_i^2$
OLS estimators are unbiased.
The standard errors of OLS are biased.

10.27
A heteroskedasticity-consistent (HC) standard error is the standard error of an estimated coefficient adjusted for heteroskedasticity.
a. HC standard errors are consistent for any type of heteroskedasticity.
b. Hypothesis tests are valid with HC standard errors in large samples.
c. Typically, HC se > OLS se.

10.28 Example 5
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\mathrm{var}(\varepsilon_i \mid X_i) = \sigma_i^2$.
Incorrect variance formula:
$\mathrm{var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$
Correct variance formula:
$\mathrm{var}(\hat\beta_1) = \dfrac{\sum (X_i - \bar{X})^2 \sigma_i^2}{\left[\sum (X_i - \bar{X})^2\right]^2}$

10.29
HC estimator of the variance of the slope coefficient in a simple regression model:
$\widehat{\mathrm{var}}(\hat\beta_1) = \dfrac{\sum (X_i - \bar{X})^2 e_i^2}{\left[\sum (X_i - \bar{X})^2\right]^2}$
Example 6: HC standard errors, US data (UE_Tab0301)
$\widehat{\text{foodexp}}_i = 40.77 + 0.13^{***}\,\text{income}_i$
OLS se: (22.14) (0.031)
HC se: (24.32) (0.039)
$R^2 = 0.3171$, $N = 40$.

10.30 4.2 Weighted Least Squares
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
$E(\varepsilon_i) = 0$
$\mathrm{var}(\varepsilon_i) = \sigma_i^2$
$\mathrm{cov}(\varepsilon_t, \varepsilon_s) = 0$, $t \neq s$
$\sigma_i^2 = c Z_i^2$
The variance is assumed to be proportional to $Z_i^2$.

10.31
Step 1: Decide which variable is proportional to the heteroskedasticity.
Step 2: Divide all terms in the original model by that variable (divide by $Z_i$).
$\dfrac{Y_i}{Z_i} = \beta_0 \dfrac{1}{Z_i} + \beta_1 \dfrac{X_{1i}}{Z_i} + \beta_2 \dfrac{X_{2i}}{Z_i} + \dfrac{\varepsilon_i}{Z_i}$
$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \varepsilon_i^*$

10.32
Step 3: Run least squares on the transformed model, which has new variables.
Note that the transformed model has an intercept only if Z is one of the explanatory variables. For example, if $Z_i = X_{2i}$, then
$\dfrac{Y_i}{Z_i} = \beta_0 \dfrac{1}{Z_i} + \beta_1 \dfrac{X_{1i}}{Z_i} + \beta_2 + \dfrac{\varepsilon_i}{Z_i}$
$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 + \varepsilon_i^*$

10.33 Example 7: WLS, US data (UE_Tab0301)
$\widehat{\left(\dfrac{\text{foodexp}}{\text{income}}\right)} = 0.1577^{***} + 21.2858\,\dfrac{1}{\text{income}}$
se: (0.02342) (14.0380)
$R^2 = 0.0570$, $N = 40$.
What are the values of the estimated coefficients of the original model?
Has the problem of heteroskedasticity been solved?

10.34 Comparing different estimates: US data (UE_Tab0301)
                $\beta_0$    $\beta_1$
OLS estimate    40.77    0.128***
OLS se          22.14    0.031
HC se           24.32    0.039
WLS estimate    21.28    0.158***
WLS se          14.03    0.023
The WLS estimates have improved upon those of OLS.

10.35 Other possibilities
• $\mathrm{var}(\varepsilon_i) = c Z_i$
• $\mathrm{var}(\varepsilon_i) = c Z_i$
• $\mathrm{var}(\varepsilon_i) = c(a_1 X_{1i} + a_2 X_{2i})$

10.36
In large samples, HC standard errors are consistent for any type of heteroskedasticity. Confidence intervals and t-tests are valid.

10.37 4.3 Re-specifying the Regression Model
The heteroskedasticity may be impure.
4.3.1 Use another functional form
E.g., double-log: less variation.
Example 8: US data (UE_Tab0301)
$\widehat{\ln \text{foodexp}} = 0.30 + 0.69^{***}\,\ln \text{income}$
se: (0.90) (0.14)
$R^2 = 0.4014$, $N = 40$.
The hypothesis of constant variance can be rejected.

10.38 Example 9: India data (Food_India55)
Empirical model: $\text{foodexp}_i = \beta_0 + \beta_1\,\text{totexp}_i + \varepsilon_i$.
$\widehat{\text{foodexp}}_i = 94.21^{**} + 0.44^{***}\,\text{totexp}_i$
se: (50.86) (0.078)
$R^2 = 0.3698$, $N = 55$.
The hypothesis of homoskedasticity can be rejected by the Park and White tests.

10.39 Which model is the best?
Double-log:
$\widehat{\ln \text{foodexp}} = 1.15 + 0.74^{***}\,\ln \text{totexp}$
se: (0.78) (0.12)
$R^2 = 0.4125$, $N = 55$.
HC:
$\widehat{\text{foodexp}}_i = 94.21^{**} + 0.44^{***}\,\text{totexp}_i$
OLS se: (50.86) (0.078)
HC se: (43.26) (0.074)
WLS:
$\widehat{\left(\dfrac{\text{foodexp}}{\text{totexp}}\right)} = 76.5439^{**}\,\dfrac{1}{\text{totexp}} + 0.4650^{***}$
se: (37.9435) (0.0632)

10.40 4.3.2 Other reformulations
E.g., take averages of variables relative to the size of the observed units, or add more variables.
Example 10: Data set "Concert"
The concert tour of a singer in the US
$\text{revenue} = \beta_0 + \beta_1\,\text{adv} + \beta_2\,\text{stad} + \beta_3\,\text{cd} + \beta_4\,\text{radio} + \beta_5\,\text{weekend} + \varepsilon$.
10.41 Estimated equations for Example 10
The operators between terms and the standard errors are not legible in the source; coefficients are listed by term.
(1) Dependent variable: revenue
    constant 73; adv 3.15; stad 34.66; cd 8.30; radio 300; weekend 356
(2) Dependent variable: revenue/stad
    1/stad 81; adv/stad 2.10; constant 50.20; cd/stad 7.53; radio/stad 176; weekend/stad 293
(3) Dependent variable: revenue/pop
    1/pop 22; adv/pop 2.21; stad/pop 109; cd/pop 7.93; radio 2.53; weekend 4.28

10.42 Remarks
• The variable Z is difficult to identify, and the functional relationship between the error and Z is not known. Use WLS only as a last resort.
• With correct WLS, we expect the standard errors of the regression coefficients to be smaller than their OLS counterparts.
• A log transformation usually reduces the degree of heteroskedasticity.
• The hypothesis of homoskedasticity should not be rejected in the new model.

10.43 5. A Complete Example
Sources: Section 8.2.2 (pp. 255-256), Section 10.5 (pp. 369-376)
Empirical regression model:
$\text{pcon}_i = \beta_0 + \beta_1\,\text{reg}_i + \beta_2\,\text{tax}_i + \beta_3\,\text{uhm}_i + \varepsilon_i$.
pcon_i: petroleum consumption in the ith state
reg_i: motor vehicle registrations in the ith state ('000)
tax_i: the gasoline tax rate in the ith state (cents per gallon)
uhm_i: urban highway miles within the ith state

10.44
Equation 1
$\widehat{\text{pcon}} = 389.57^{***} - 0.061\,\text{reg} - 36.47^{***}\,\text{tax} + 60.76^{***}\,\text{uhm}$
(se, vif): reg (0.04, 24.3); tax (13.15, 1.1); uhm (10.26, 24.9)
Adj. $R^2 = 0.9192$, $N = 50$.
Equation 2
$\widehat{\text{pcon}} = 551.69^{***} + 0.19^{***}\,\text{reg} - 53.59^{***}\,\text{tax}$
se: reg (0.012); tax (16.86)
Adj. $R^2 = 0.8607$, $N = 50$.

10.45 Graphical investigation
[Figure: residual against REG]

10.46
Park test
$\widehat{\ln(e^2)} = 1.65 + 0.95^{***}\,\ln(\text{REG})$
se: (0.3083)
$R^2 = 0.1657$, $N = 50$
White test
$\widehat{e^2} = 11{,}098{,}291 + 140\,\text{REG} - 0.0005\,\text{REG}^2 - 12.84\,\text{REG}\cdot\text{TAX} - 237{,}873\,\text{TAX} + 12{,}347\,\text{TAX}^2$
$R^2 = 0.6645$, $N = 50$, $N R^2 = 33.22$.
Checking for other specifications: double-log, quadratic.

10.47
(4) $\widehat{\text{pcon}} = 551.69^{***} + 0.19^{***}\,\text{reg} - 53.59^{***}\,\text{tax}$
    HC se: reg (0.022); tax (23.90)
    $R^2 = 0.8664$, $N = 50$.
(5) Dependent variable: pcon/reg; regressors: 1/reg and tax/reg.
    Values as printed (assignment to terms not legible in the source): 0.01367, 218.539, 48.1033, 17.3890, 4.6822
    $R^2 = 0.3600$, $N = 50$.
(6) Dependent variable: pcon/pop; regressors: reg/pop and tax.
    Values as printed (assignment to terms not legible in the source): 0.1678***, 0.1684**, 0.1082, 0.07159, 0.0103, 0.00349
    $R^2 = 0.1989$, $N = 50$.

10.48 Selected Exercises
Ch. 10: Q. 1, 3, 4, 5, 8, 10, 12, 14

10.49 Regression Model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
zero mean: $E(\varepsilon_i \mid X_{1i}, X_{2i}) = 0$
homoskedasticity: $\mathrm{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma^2$
no autocorrelation: $\mathrm{cov}(\varepsilon_i, \varepsilon_j \mid X_{1i}, X_{2i}, X_{1j}, X_{2j}) = 0$, $i \neq j$
heteroskedasticity: $\mathrm{var}(\varepsilon_i \mid X_{1i}, X_{2i}) = \sigma_i^2$

10.50 Heteroskedasticity
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\mathrm{var}(\varepsilon_i \mid X_i) = \sigma_i^2$ for all $i$
[Figure: conditional distribution f(Y) at X1, X2, X3]

10.51
Step 3: Test the overall significance of the equation in Step 2 (df = number of regressors).
Statistic: $N R^2 \sim \chi^2_{df}$
Critical value: $cv = \chi^2_{df,\alpha}$
Reject the hypothesis of homoskedasticity if $N R^2 > cv$.

10.52
Step 1: Decide which variable is proportional to the heteroskedasticity.
Step 2: Divide all terms in the original model by that variable (divide by $Z_i$).
$\dfrac{Y_i}{Z_i} = \beta_0 \dfrac{1}{Z_i} + \beta_1 \dfrac{X_{1i}}{Z_i} + \beta_2 \dfrac{X_{2i}}{Z_i} + \dfrac{\varepsilon_i}{Z_i}$
$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 X_{2i}^* + \varepsilon_i^*$

10.53
Step 3: Run least squares on the transformed model, which has new variables.
Note that the transformed model has an intercept only if Z is one of the explanatory variables. For example, if $Z_i = X_{2i}$, then
$\dfrac{Y_i}{Z_i} = \beta_0 \dfrac{1}{Z_i} + \beta_1 \dfrac{X_{1i}}{Z_i} + \beta_2 + \dfrac{\varepsilon_i}{Z_i}$
$Y_i^* = \beta_0 X_{0i}^* + \beta_1 X_{1i}^* + \beta_2 + \varepsilon_i^*$

10.54
In large samples, HC standard errors are consistent for any type of heteroskedasticity. Confidence intervals and t-tests are valid.
                        WLS       HC se's
OLS estimator           Improve   No
Standard errors         Improve   Improve
Specific form needed    Yes       No
Large sample needed     No        Yes
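The two remedies compared in the closing table can be put side by side in a short sketch. It uses NumPy on simulated data rather than any of the course data sets. The function names `ols_with_hc` and `wls`, the seed, and the simulated values are assumptions made for illustration. The HC standard errors are the basic White form, which for the slope in a simple regression reduces to the formula given in Section 4.1, and the WLS step assumes var(ε_i) = c·Z_i² as in Section 4.2.

```python
import numpy as np

def ols_with_hc(X, y):
    """OLS coefficients, residuals, conventional se and White (HC) se."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    n, k = X.shape
    se_ols = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    meat = X.T @ (X * (e ** 2)[:, None])          # sum of e_i^2 * x_i x_i'
    se_hc = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, e, se_ols, se_hc

def wls(X, y, z):
    """WLS assuming var(eps_i) = c * z_i**2: divide all terms by z_i, then OLS."""
    beta, _, se, _ = ols_with_hc(X / z[:, None], y / z)
    return beta, se

# Simulated data (hypothetical values): error sd proportional to income.
rng = np.random.default_rng(1)
income = rng.uniform(200, 1200, size=200)
foodexp = 40 + 0.13 * income + rng.normal(0, 1, size=200) * 0.05 * income
X = np.column_stack([np.ones_like(income), income])

b_ols, e, se_ols, se_hc = ols_with_hc(X, foodexp)
b_wls, se_wls = wls(X, foodexp, income)

# Slope variance from the simple-regression HC formula, for comparison.
d2 = (income - income.mean()) ** 2
hc_var_slope = np.sum(d2 * e ** 2) / np.sum(d2) ** 2

print("OLS:", b_ols, "se:", se_ols, "HC se:", se_hc)
print("WLS:", b_wls, "se:", se_wls)
```

The WLS coefficients are returned in the order of the original model (intercept, slope), because dividing the column of ones by Z gives the 1/Z regressor whose coefficient is β0.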