Econ 326 - Chapter 8

Chapter 8 Heteroskedasticity

Introduction

The linear regression equation is:

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + e_i \qquad \text{for } i = 1, 2, \dots, N$$

The adequacy of the least squares (OLS) estimation results depends on a number of assumptions:

• Correct model specification: all relevant variables are included and the functional form is correct. Specification error gives biased estimators.

• The random error term satisfies the assumptions (*):

$$E(e_i) = 0 \ \text{for all } i, \qquad \operatorname{var}(e_i) = \sigma^2 \ \text{for all } i, \qquad \operatorname{cov}(e_i, e_j) = 0 \ \text{for } i \neq j$$

The assumptions (*) state that the errors are homoskedastic (equal error variance) and uncorrelated.

What are the consequences of using the least squares method if the assumptions (*) are violated?

• The least squares estimator is still unbiased.

• The standard errors calculated for the least squares estimators are incorrect. Therefore, hypothesis tests may lead to unreliable conclusions.

• The least squares estimator does not have minimum variance in the class of linear unbiased estimators.

This suggests that methods are needed for:

• Tests for detecting heteroskedasticity and correlation patterns in the residuals.

• Calculation formulas for correct standard errors for the least squares estimators.

• Estimation procedures that give minimum variance estimators.

Chapter 8 looks at models with heteroskedastic errors. Chapter 9 looks at models where the uncorrelated error assumption does not describe the economic behaviour.

Heteroskedasticity

Heteroskedastic errors can be stated as:

$$\operatorname{var}(e_i) = \sigma_i^2 \qquad \text{for } i = 1, 2, \dots, N$$

The error variance can be different for each observation.

Example

With cross-section survey data, a model that explains household expenditure as a function of household income may feature heteroskedastic errors. Households with relatively high income will have more discretionary income and, therefore, more variability in expenditure habits. Higher income households will have larger error variance compared to households in a lower income group.

Detecting Heteroskedasticity

The Breusch-Pagan Test

The linear regression equation with heteroskedastic errors is:

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + e_i \qquad \text{with } \operatorname{var}(e_i) = \sigma_i^2$$

A proposal is that the error variance is a function of a set of explanatory variables. This leads to a general functional form for the error variance stated as:

$$\sigma_i^2 = h(\alpha_1 + \alpha_2 x_{i2} + \alpha_3 x_{i3} + \dots + \alpha_K x_{iK})$$

When $\alpha_2 = \alpha_3 = \dots = \alpha_K = 0$ the equation errors are homoskedastic. A test of interest is:

$$H_0: \alpha_2 = \alpha_3 = \dots = \alpha_K = 0 \quad \text{(homoskedasticity)}$$

$$H_1: \text{not all } \alpha \text{ in } H_0 \text{ are zero} \quad \text{(heteroskedasticity)}$$

A test method is now presented.

Estimate the regression equation by least squares and obtain the least squares residuals $\hat{e}_i$. Then run the artificial regression:

$$\hat{e}_i^2 = \alpha_1 + \alpha_2 x_{i2} + \alpha_3 x_{i3} + \dots + \alpha_K x_{iK} + v_i$$

Using the $R^2$ goodness-of-fit statistic from the artificial regression, a test statistic for the overall significance of the artificial regression is:

$$N \cdot R^2$$

This statistic, named the Breusch-Pagan test for heteroskedasticity, can be compared with a chi-square ($\chi^2$) distribution with $K - 1$ degrees of freedom.

With Stata the estat hettest command can be used to report the Breusch-Pagan test for heteroskedasticity.

Example

The data set introduced in Chapter 2 contains observations on weekly food expenditure (y) and income (x), in dollars, for a sample of 40 households with three family members. The linear regression equation is:

$$y_i = \beta_1 + \beta_2 x_i + e_i$$

Consider that the error variance is potentially a function of income. The Breusch-Pagan test statistic for heteroskedasticity is calculated as 7.38. The p-value for the test statistic is:

$$p = P(\chi^2_{(1)} > 7.38) = 0.007$$

The small p-value gives evidence to reject the null hypothesis of homoskedasticity.

Log-transformed variables have useful application. For the food expenditure data consider the log-log model:

$$\ln(y_i) = \beta_1 + \beta_2 \ln(x_i) + e_i$$

For this model the Breusch-Pagan test statistic for heteroskedasticity has a calculated value of 2.30 with an accompanying p-value of 0.13.

The conclusion now is that, for this data set, the homoskedasticity assumption suits the log-log model. The log transformation rescales the data and therefore may correct for heteroskedasticity that is observed in the linear model. In particular, the observations in the upper quartile are compressed so that the difference with the other observations is less extreme.

The Goldfeld-Quandt Test

For the household expenditure function recognize that higher variability may be associated with higher household income. Sort the data set in ascending order of household income (x). A heteroskedastic error assumption is:

$$\operatorname{var}(e_i) = \begin{cases} \sigma_1^2 & \text{for group 1 (the 'low' income group)} \\ \sigma_2^2 & \text{for group 2 (the 'high' income group)} \end{cases} \qquad \text{with } \sigma_1^2 < \sigma_2^2$$

Another way of stating the error variance assumption that explicitly recognizes higher error variance for higher household income (x) is:

$$\operatorname{var}(e_i) = \sigma^2 x_i$$

To test for the two-group form of heteroskedasticity consider the one-sided test:

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{against} \quad H_1: \sigma_1^2 < \sigma_2^2$$

The test method is called the Goldfeld-Quandt test.

The test statistic is calculated as follows. Split the sample into two groups, with $N_1$ observations in group 1 and $N_2$ observations in group 2. Fit separate least squares (OLS) regressions to each group of observations and estimate an error variance for each group as:

$$\hat{\sigma}_1^2 = \frac{SSE_1}{N_1 - K} \qquad \text{and} \qquad \hat{\sigma}_2^2 = \frac{SSE_2}{N_2 - K}$$

where $SSE_1$ and $SSE_2$ are the sums of squared residuals for each group.

The Goldfeld-Quandt test statistic is:

$$GQ = \frac{\hat{\sigma}_2^2}{\hat{\sigma}_1^2}$$

This can be compared with an F distribution with $(N_2 - K,\ N_1 - K)$ degrees of freedom. A p-value is calculated as:

$$p = P(F_{(N_2 - K,\ N_1 - K)} > GQ)$$

An application may suggest a higher error variance in the first group. The one-tail test of interest is now:

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{against} \quad H_1: \sigma_1^2 > \sigma_2^2$$

The Goldfeld-Quandt test statistic and p-value are computed as:

$$GQ = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} \qquad \text{and} \qquad p = P(F_{(N_1 - K,\ N_2 - K)} > GQ)$$

A two-tail test can also be considered:

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{against} \quad H_1: \sigma_1^2 \neq \sigma_2^2$$

With $\hat{\sigma}_1^2 > \hat{\sigma}_2^2$ the Goldfeld-Quandt test statistic and p-value are:

$$GQ = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} > 1 \qquad \text{and} \qquad p = 2 \cdot P(F_{(N_1 - K,\ N_2 - K)} > GQ)$$

Example

For the household food expenditure data set sort the observations in ascending order of household income (x) and then split the sample into two groups each with 20 observations. Estimation results for the linear regression equation show:

SSE1 = 64346 ('low' income group)
SSE2 = 232595 ('high' income group)

The Goldfeld-Quandt test statistic is:

$$GQ = \frac{\hat{\sigma}_2^2}{\hat{\sigma}_1^2} = \frac{SSE_2 / (20 - 2)}{SSE_1 / (20 - 2)} = 3.61$$

For the one-tail test:

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{against} \quad H_1: \sigma_1^2 < \sigma_2^2$$

the p-value is calculated as:

$$p = P(F_{(18,\ 18)} > 3.61)$$

The p-value calculation is illustrated in the figure.

[Figure: probability density function of F(18,18); the tail area to the right of GQ = 3.61 is p = 0.005]

With Microsoft Excel, the function F.DIST.RT(3.61, 18, 18) gives the p-value 0.005 (this calculation was also confirmed by the Stata results).

The calculated p-value for the Goldfeld-Quandt test is less than any standard significance level (such as 0.05 or 0.01) and therefore the null hypothesis of homoskedasticity is rejected. This result agrees with the finding of the Breusch-Pagan test statistic presented earlier.

For the log-log version of the food expenditure model a Goldfeld-Quandt test statistic was calculated as 2.93. With the one-sided alternative of higher error variance in the 'high' income group the calculated p-value of 0.014 suggests that, at a 1% significance level, the homoskedasticity hypothesis is not rejected (at a 5% significance level it would be rejected).
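The p-values quoted in this section can be checked from the reported test statistics. The following is a minimal sketch using only the Python standard library. The helper names `chi2_sf_1df` and `f_sf` are illustrative choices, not part of the notes, and the F tail area is obtained by simple numerical integration rather than by a statistical package.

```python
import math


def chi2_sf_1df(x):
    """P(chi-square(1) > x), using the link to the standard normal distribution."""
    return math.erfc(math.sqrt(x / 2.0))


def f_sf(x, d1, d2, steps=2000):
    """P(F(d1, d2) > x) by Simpson's rule (steps must be even).

    Uses P(F > x) = I_z(d2/2, d1/2) with z = d2 / (d2 + d1 * x), where I_z is
    the regularized incomplete beta function.
    Assumes d1 > 2 and d2 > 2 so that the integrand is zero at the end points 0 and 1.
    """
    a, b = d2 / 2.0, d1 / 2.0
    z = d2 / (d2 + d1 * x)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    h = z / steps
    total = density(0.0) + density(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return total * h / 3.0


# Values reported in the notes for the food expenditure data.
sse_low, sse_high = 64346.0, 232595.0
n_group, k = 20, 2

gq = (sse_high / (n_group - k)) / (sse_low / (n_group - k))
p_gq = f_sf(gq, n_group - k, n_group - k)  # one-tail Goldfeld-Quandt p-value
p_bp_linear = chi2_sf_1df(7.38)            # Breusch-Pagan, linear model
p_bp_loglog = chi2_sf_1df(2.30)            # Breusch-Pagan, log-log model

print(f"GQ = {gq:.2f}, p = {p_gq:.4f}")
print(f"BP (linear) p = {p_bp_linear:.4f}")
print(f"BP (log-log) p = {p_bp_loglog:.4f}")
```

The chi-square helper only covers one degree of freedom, which is all that the simple regression examples here need (K - 1 = 1).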
Heteroskedasticity-Consistent Standard Errors

Consider a simple model with heteroskedastic errors:

$$y_i = \beta_1 + \beta_2 x_i + e_i \qquad \text{for } i = 1, 2, \dots, N$$

with $E(e_i) = 0$, $\operatorname{var}(e_i) = \sigma_i^2$ and $\operatorname{cov}(e_i, e_j) = 0$ for $i \neq j$.

The least squares principle gives an unbiased estimation rule for the parameters. With heteroskedastic errors, it can be shown that the variance of the slope estimator $b_2$ is:

$$\operatorname{var}(b_2) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sigma_i^2}{\left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^2} \tag{1}$$

In the case of homoskedasticity $\sigma_i^2 = \sigma^2$ for all i and (1) simplifies to:

$$\operatorname{var}(b_2) = \sigma^2 \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{\left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^2} = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \tag{2}$$

Equation (2) is the calculation formula used for obtaining the variances of the least squares estimators that are routinely reported on the least squares (OLS) estimation computer output. With heteroskedastic errors, this may overstate or understate the correct variance given by Equation (1).

However, Equation (1) is not operational as stated since the error variances $\sigma_i^2$ are unknown. To obtain an operational formula, the White variance estimator (proposed by Halbert White of the University of California at San Diego) approximates Equation (1) by:

$$\widehat{\operatorname{var}}(b_2) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2 \hat{e}_i^2}{\left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^2}$$

where the $\hat{e}_i$ for $i = 1, \dots, N$ are the least squares residuals.

With Stata the robust option on the regress command will report estimates of the variances and covariances of the parameter estimators that are adjusted for general heteroskedasticity. As a technical note, for a bias adjustment, Stata scales the calculations by multiplying the variances and covariances by $N/(N - K)$.

The robust option still reports the least squares parameter estimates. Only the variances (and therefore the standard errors) are adjusted. This is intended to permit more reliable hypothesis testing.
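As an illustration of the formulas above, the following sketch computes the usual OLS variance of the slope (Equation (2) with the error variance estimated by SSE/(N - K)), the White approximation to Equation (1), and the White estimate scaled by N/(N - K). The function name `slope_variances` and the data are made up for illustration, and the code is not a reproduction of any Stata routine.

```python
def slope_variances(x, y, k=2):
    """Slope variance estimates for the simple regression y = b1 + b2*x + e."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)

    # Least squares estimates and residuals.
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

    # Equation (2), with sigma^2 replaced by SSE / (N - K).
    sigma2_hat = sum(e * e for e in resid) / (n - k)
    var_ols = sigma2_hat / sxx

    # White estimator: Equation (1) with sigma_i^2 replaced by squared residuals.
    var_white = sum((xi - xbar) ** 2 * e * e for xi, e in zip(x, resid)) / sxx ** 2

    return {
        "b1": b1,
        "b2": b2,
        "var_ols": var_ols,
        "var_white": var_white,
        "var_white_scaled": var_white * n / (n - k),  # N/(N-K) bias adjustment
    }


# Made-up data in which the size of the error grows with x.
x_demo = [float(i) for i in range(1, 11)]
y_demo = [2.0 + 3.0 * xi + (-1) ** i * 0.5 * xi for i, xi in enumerate(x_demo)]

for name, value in slope_variances(x_demo, y_demo).items():
    print(f"{name}: {value:.4f}")
```

When all squared residuals happen to be equal, the scaled White estimate coincides with the usual OLS formula, which is a convenient sanity check on the implementation.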
Generalized Least Squares

Suppose that a linear regression equation is estimated by the least squares principle (OLS) and diagnostic testing shows that heteroskedastic errors are an important feature. How can this information be used in model estimation?

Consider the linear regression equation:

$$y_i = \beta_1 + \beta_2 x_i + e_i \qquad \text{for } i = 1, 2, \dots, N$$

with $E(e_i) = 0$, $\operatorname{var}(e_i) = \sigma_i^2$ and $\operatorname{cov}(e_i, e_j) = 0$ for $i \neq j$.

To make use of the information about the heteroskedastic errors, a proposal is to find estimators of $\beta_1$ and $\beta_2$ (the intercept and slope coefficients) that minimize the 'weighted' sum of squared errors:

$$\sum_{i=1}^{N} \left( \frac{e_i}{\sigma_i} \right)^2$$

This method is known as weighted least squares (WLS). It is a special case of generalized least squares (GLS).

The generalized (weighted) least squares estimator can be obtained as follows. Transform the regression equation by dividing the observations by $\sigma_i$. The transformed model is:

$$\frac{y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{x_i}{\sigma_i} + v_i$$

The error of the transformed model is:

$$v_i = \frac{e_i}{\sigma_i}$$

The statistical properties of the transformed error are:

$$E(v_i) = E\left( \frac{e_i}{\sigma_i} \right) = \frac{1}{\sigma_i} E(e_i) = 0 \qquad \text{(zero mean)}$$

$$\operatorname{var}(v_i) = \operatorname{var}\left( \frac{e_i}{\sigma_i} \right) = \frac{1}{\sigma_i^2} \operatorname{var}(e_i) = 1 \qquad \text{(unit variance, homoskedastic errors)}$$

$$E(v_i v_j) = \frac{1}{\sigma_i \sigma_j} E(e_i e_j) = 0 \ \text{for } i \neq j \qquad \text{(uncorrelated errors)}$$

Therefore, the transformed error satisfies the standard assumptions of the Gauss-Markov theorem. Least squares (OLS) estimation of the transformed model gives the WLS or GLS estimator.

A practical problem is that $\sigma_i^2$ is unknown. To make this operational a form for $\sigma_i^2$ must be specified. Least squares (OLS) estimation can then be applied to the transformed model to get the weighted least squares (WLS) estimates.
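The transformation described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the standard deviations are known up to a common scale factor. The function name `wls_by_transformation` and the data are hypothetical. The transformed model has two regressors, 1/σ_i and x_i/σ_i, and no constant term, so the estimates come from a 2 by 2 system of normal equations.

```python
import math


def wls_by_transformation(x, y, sigma):
    """WLS estimates of (beta1, beta2), obtained by OLS on the transformed model.

    Each observation is divided by its error standard deviation sigma[i];
    the transformed regression has no intercept.
    """
    y_star = [yi / s for yi, s in zip(y, sigma)]
    x1_star = [1.0 / s for s in sigma]               # transformed constant
    x2_star = [xi / s for xi, s in zip(x, sigma)]    # transformed regressor

    # Normal equations for a regression on (x1_star, x2_star) without intercept.
    a11 = sum(u * u for u in x1_star)
    a12 = sum(u * v for u, v in zip(x1_star, x2_star))
    a22 = sum(v * v for v in x2_star)
    c1 = sum(u * w for u, w in zip(x1_star, y_star))
    c2 = sum(v * w for v, w in zip(x2_star, y_star))

    det = a11 * a22 - a12 * a12
    b1 = (c1 * a22 - c2 * a12) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2


# Made-up data; sigma_i is assumed proportional to sqrt(x_i) purely for illustration.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_demo = [5.5, 7.2, 11.9, 13.1, 18.4, 18.9]
sigma_demo = [math.sqrt(xi) for xi in x_demo]

b1_wls, b2_wls = wls_by_transformation(x_demo, y_demo, sigma_demo)
print(f"WLS intercept = {b1_wls:.3f}, slope = {b2_wls:.3f}")
```

Multiplying every σ_i by the same constant leaves the estimates unchanged, so only the relative sizes of the error variances need to be specified.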
Example

For modelling household food expenditure (y) as a function of income (x) a reasonable assumption may be:

$$\operatorname{var}(e_i) = \sigma_i^2 = \sigma^2 x_i$$

The error variance increases as income increases (this assumes $x_i > 0$ for all i since non-positive variance is not allowed). It can be noted that:

$$\sigma_i = \sigma \sqrt{x_i}$$

The transformed model is:

$$\frac{y_i}{\sqrt{x_i}} = \beta_1 \frac{1}{\sqrt{x_i}} + \beta_2 \frac{x_i}{\sqrt{x_i}} + v_i \qquad \text{where } v_i = \frac{e_i}{\sqrt{x_i}}$$

For this model:

$$\operatorname{var}(v_i) = \frac{1}{x_i} \operatorname{var}(e_i) = \sigma^2 \qquad \text{(homoskedastic errors)}$$

Note that the transformed model has no intercept coefficient.

A problem is that the specification of the error variance equation may not be clear-cut. For example, another variance form is:

$$\operatorname{var}(e_i) = \sigma^2 x_i^2$$

The transformed model is now:

$$\frac{y_i}{x_i} = \beta_1 \frac{1}{x_i} + \beta_2 \frac{x_i}{x_i} + v_i = \beta_1 \frac{1}{x_i} + \beta_2 + v_i$$

In this form $\beta_2$ plays the role of the intercept of the transformed model.

Grouped Data

Example: Wheat Production in Australia

A data set, from an Australian wheat-growing district, contains 26 years of time-series data. A linear regression equation is specified as:

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + e_i \qquad \text{for } i = 1, 2, \dots, 26$$

where

$y_i$ is the quantity of wheat produced in year i,
$x_{i2}$ is the price of wheat guaranteed for year i,
$x_{i3} = 1, 2, \dots, 26$ is a time trend variable that serves as a proxy for technological improvements, and
$e_i$ is a random error.

The influence of weather is reflected in the error term. New wheat varieties were introduced after year 13. Their yield was less dependent on weather conditions and therefore lower error variance is suggested for years 14 to 26.

The error assumptions for the model are:

$$E(e_i) = 0 \ \text{for all } i$$

$$\operatorname{var}(e_i) = E(e_i^2) = \begin{cases} \sigma_1^2 & \text{for } i = 1, 2, \dots, 13 \\ \sigma_2^2 & \text{for } i = 14, 15, \dots, 26 \end{cases}$$

$$\operatorname{cov}(e_i, e_j) = 0 \ \text{for all } i \neq j \qquad \text{(uncorrelated errors)}$$

It is expected that $\sigma_1^2 > \sigma_2^2$. This is an example of a model with heteroskedastic errors.

The equation can be estimated by least squares (OLS) and the heteroskedasticity assumption can be tested with the Goldfeld-Quandt test. The one-sided test of interest is:

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{against} \quad H_1: \sigma_1^2 > \sigma_2^2$$

For the Australian wheat data set, the estimation results reported a Goldfeld-Quandt test statistic of 11.11. The p-value for the test was calculated as less than 0.0005 to give the conclusion that the null hypothesis of homoskedasticity is rejected (at any reasonable significance level such as 0.01 or 0.05) in favour of the alternative of lower variance in the second half of the sample period.

The presence of heteroskedasticity means that the least squares standard errors will be unreliable for confidence interval estimation and hypothesis testing. The White standard errors make adjustments for general heteroskedasticity (see the earlier lecture notes).

Least squares estimation results are:

$$\hat{y}_i = 139.9 + 19.54\, x_{i2} + 3.64\, x_{i3}$$

                          intercept    x_i2     x_i3
  t-statistics (OLS)      6.03         1.12     2.57
  p-values (OLS)          <0.0005      0.27     0.02
  t-statistics (White)    5.35         0.94     2.13
  p-values (White)        <0.0005      0.36     0.04

In this case, the t-statistics for individual tests of significance based on the least squares standard errors, which ignore the heteroskedasticity, were bigger than the t-statistics based on the White standard errors, which adjust for it. That is, the least squares standard errors were smaller than the White standard errors and, therefore, overstated the precision of the estimation.

It can be noted that the results show that the coefficient on the price of wheat ($x_2$) is not significantly different from zero.

In this application, there is some useful information about the source of heteroskedasticity: there are two subsets of observations, each with a different variance. By including this information in the estimation better estimates may be obtained.

Generalized least squares (GLS) estimates can be obtained by working with the transformed model:

$$\frac{y_i}{\sigma_1} = \beta_1 \frac{1}{\sigma_1} + \beta_2 \frac{x_{i2}}{\sigma_1} + \beta_3 \frac{x_{i3}}{\sigma_1} + \frac{e_i}{\sigma_1} \qquad \text{for } i = 1, 2, \dots, 13$$

$$\frac{y_i}{\sigma_2} = \beta_1 \frac{1}{\sigma_2} + \beta_2 \frac{x_{i2}}{\sigma_2} + \beta_3 \frac{x_{i3}}{\sigma_2} + \frac{e_i}{\sigma_2} \qquad \text{for } i = 14, 15, \dots, 26$$

The important feature of the transformed model is that the error term is homoskedastic. That is,

$$\operatorname{var}\left( \frac{e_i}{\sigma_1} \right) = \frac{1}{\sigma_1^2} \operatorname{var}(e_i) = 1 \qquad \text{for } i = 1, 2, \dots, 13$$

$$\operatorname{var}\left( \frac{e_i}{\sigma_2} \right) = \frac{1}{\sigma_2^2} \operatorname{var}(e_i) = 1 \qquad \text{for } i = 14, 15, \dots, 26$$

A practical problem is that the error variances $\sigma_1^2$ and $\sigma_2^2$ are unknown. A feasible estimator can be obtained by a two-step estimation method.

STEP 1
Apply separate least squares (OLS) estimation to each subset of observations. With $N_1$ observations in the first group and $N_2$ observations in the second group the error variances are estimated from the least squares residuals as:

$$\hat{\sigma}_1^2 = \frac{1}{N_1 - K} SSE_1 = \frac{1}{N_1 - K} \sum_{i=1}^{N_1} \hat{e}_i^2 \qquad \text{and} \qquad \hat{\sigma}_2^2 = \frac{1}{N_2 - K} SSE_2 = \frac{1}{N_2 - K} \sum_{i=N_1+1}^{N} \hat{e}_i^2$$

STEP 2
Obtain the feasible GLS (generalized least squares) estimator by applying least squares (OLS) to the transformed model:

$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + \beta_3 x_{i3}^* + v_i \qquad \text{for } i = 1, 2, \dots, N$$

where the transformed observations are constructed as:

$$y_i^* = \begin{cases} y_i / \hat{\sigma}_1 \\ y_i / \hat{\sigma}_2 \end{cases} \qquad x_{i1}^* = \begin{cases} 1 / \hat{\sigma}_1 \\ 1 / \hat{\sigma}_2 \end{cases} \qquad x_{i2}^* = \begin{cases} x_{i2} / \hat{\sigma}_1 \\ x_{i2} / \hat{\sigma}_2 \end{cases} \qquad x_{i3}^* = \begin{cases} x_{i3} / \hat{\sigma}_1 & \text{for } i = 1, 2, \dots, 13 \\ x_{i3} / \hat{\sigma}_2 & \text{for } i = 14, 15, \dots, 26 \end{cases}$$

Note: The GLS estimator is no longer a linear function of $y_i$ because $\hat{\sigma}_1$ and $\hat{\sigma}_2$ depend on $y_i$. The usual interval estimates and hypothesis tests are now only approximate in 'small' samples.

For the Australian wheat production equation, the GLS estimation results are:

$$\hat{y}_i = 138.1 + 21.72\, x_{i2} + 3.28\, x_{i3}$$

                          intercept    x_i2     x_i3
  standard errors         12.8         8.92     0.82
  t-statistics            10.77        2.43     3.99
  p-values                <0.0005      0.023    0.001

These results can be compared with the least squares (OLS) estimation results reported earlier:

$$\hat{y}_i = 139.9 + 19.54\, x_{i2} + 3.64\, x_{i3}$$

                              intercept    x_i2     x_i3
  standard errors (White)     26.13        20.74    1.71
  t-statistics (White)        5.35         0.94     2.13
  p-values                    <0.0005      0.36     0.04

The GLS estimation gives t-statistics for individual tests of significance that are all significant at a 5% significance level. The results show that the GLS standard errors are smaller than the White standard errors that accompany the least squares estimation. That is, the GLS method gives increased precision for the estimation.

A 95% confidence interval estimate for the coefficient on the price of wheat is calculated as:

$$b_2^{GLS} \pm t_c\, se(b_2^{GLS}) = 21.72 \pm 2.069\,(8.92) = [3.3,\ 40.2]$$

Although the interval estimate appears to be relatively wide, the results can be compared with the interval estimate from least squares estimation:

$$b_2 \pm t_c\, se(b_2)_{White} = 19.54 \pm 2.069\,(20.74) = [-23.4,\ 62.4]$$

Least squares has poor ability to estimate the price coefficient with any precision.

Conclusions So Far

Economic theory is used to specify a linear regression equation. The intercept and slope parameters can be estimated by the least squares principle (OLS). Following model estimation a variety of diagnostic tests can be inspected. Examples are:

• the Jarque-Bera test for normality of the residuals (Chapter 4)
• the Ramsey RESET test for model misspecification (Chapter 6)
• the Chow test for structural change (Chapter 7)
• the Breusch-Pagan test for heteroskedastic errors (Chapter 8)
• the Goldfeld-Quandt test for heteroskedastic errors (Chapter 8)

Other tests are also available, but not presented here.
Suppose a test shows evidence of heteroskedasticity. How should this be interpreted? Three alternative approaches can be considered.

• The model may be misspecified, for example through an incorrect functional form. Log transformations of the variables may transform the heteroskedastic errors to homoskedastic errors.

• With a correctly specified model, heteroskedastic errors lead to least squares estimators that are unbiased. But the least squares standard errors are incorrect. Therefore, report the least squares parameter estimates and use the White standard errors that are adjusted for general heteroskedasticity for confidence interval estimation and hypothesis testing.

• The above approach is inefficient. That is, it does not give a minimum variance estimator. To get an efficient estimator use generalized (weighted) least squares (WLS or GLS). This method requires the specification of a variance function for the error variances $\sigma_i^2$. In practice, this may not be clear-cut.
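As a sketch of the generalized least squares approach in the grouped-data setting, the two-step feasible GLS procedure (STEP 1 and STEP 2 of the wheat example) can be written out for a simplified case. The assumptions here are a single regressor (K = 2) rather than the three-coefficient wheat equation, observations already ordered so that the first `n1` belong to group 1, and made-up data. The function names `ols_line` and `fgls_two_groups` are illustrative.

```python
import math


def ols_line(x, y):
    """OLS intercept, slope and sum of squared residuals for y = b1 + b2*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, b2, sse


def fgls_two_groups(x, y, n1, k=2):
    """Two-step feasible GLS when the error variance differs between two groups."""
    n2 = len(x) - n1

    # STEP 1: separate OLS fits give an error variance estimate for each group.
    _, _, sse1 = ols_line(x[:n1], y[:n1])
    _, _, sse2 = ols_line(x[n1:], y[n1:])
    s1_sq = sse1 / (n1 - k)
    s2_sq = sse2 / (n2 - k)

    # STEP 2: divide each observation by its group's estimated standard deviation
    # and apply OLS (without intercept) to the transformed model.
    sig = [math.sqrt(s1_sq)] * n1 + [math.sqrt(s2_sq)] * n2
    y_star = [yi / s for yi, s in zip(y, sig)]
    x1_star = [1.0 / s for s in sig]
    x2_star = [xi / s for xi, s in zip(x, sig)]

    a11 = sum(u * u for u in x1_star)
    a12 = sum(u * v for u, v in zip(x1_star, x2_star))
    a22 = sum(v * v for v in x2_star)
    c1 = sum(u * w for u, w in zip(x1_star, y_star))
    c2 = sum(v * w for v, w in zip(x2_star, y_star))
    det = a11 * a22 - a12 * a12

    b1 = (c1 * a22 - c2 * a12) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2, s1_sq, s2_sq


# Made-up data: the first four observations are noisier than the last four.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_demo = [7.0, 6.0, 9.0, 16.0, 18.0, 19.0, 22.0, 27.0]

b1_fgls, b2_fgls, s1_sq, s2_sq = fgls_two_groups(x_demo, y_demo, n1=4)
gq_demo = s1_sq / s2_sq  # Goldfeld-Quandt ratio for H1: sigma_1^2 > sigma_2^2
print(f"intercept = {b1_fgls:.3f}, slope = {b2_fgls:.3f}, GQ = {gq_demo:.2f}")
```

The sketch returns only the coefficient estimates and the two variance estimates. Standard errors would come from the usual OLS formulas applied to the transformed model and, as noted in the wheat example, are only approximate in small samples.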