Topic 7: Heteroskedasticity

Advanced Econometrics (I)
Dong Chen
School of Economics, Peking University

1 Introduction

If the disturbance variance is not constant across observations, the regression is heteroskedastic. That is,

    Var(ε_i) = σ_i², i = 1, ..., n.    (1)

We continue to assume that the disturbances are pairwise uncorrelated. This implies that

    E(εε') = σ²Ω = diag(σ_1², σ_2², ..., σ_n²).    (2)

Heteroskedasticity may arise in many applications, especially in cross-sectional data.

Example 1:
(i) The variation in profits of large firms may be greater than that of small ones, even after accounting for differences in firm size.
(ii) The variation of expenditure on certain commodity groups may be higher for high-income families than for low-income ones.
(iii) When estimating the return to education, ability is unobservable and thus enters the disturbance. It is possible that the variance of ability varies with the level of education.
(iv) Sometimes heteroskedasticity is a consequence of aggregation (e.g., taking averages) of data.

By eyeballing the patterns of residuals from OLS estimation, we may find some evidence of heteroskedasticity.

Example 2: Consider the following model:

    EXP = β_1 + β_2 AGE + β_3 INCOME + β_4 INCOME² + β_5 OWNER + ε,    (3)

where EXP is credit card expenditure and OWNER is a dummy variable indicating whether an individual owns a house. Model (3) is estimated by OLS and the residuals are saved. In Figure 1 the residuals are plotted against INCOME, and in Figure 2 against AGE. In Figure 1 the spread of the residuals becomes wider for higher income, while in Figure 2 the distribution of the residuals is largely random.

[Fig. 1: Plot of the OLS Residuals against INCOME]

Figures 1 and 2 suggest that a common cause of heteroskedasticity is that the variances of the disturbance terms may depend on some of the x variables, i.e., σ_i² = h(x_i).
In this case, it appears that σ_i² is positively related to INCOME.

STATA Tips: To obtain graphs like those in Figures 1 and 2, use the following commands in STATA.

    reg exp age income income2 owner
    predict e, resid
    graph twoway scatter e income, msymbol(oh) yline(0)

2 Consequences

Recall from our previous discussion that if we use OLS when Var(ε) = σ²Ω, then
(i) b is unbiased;
(ii) b is inefficient, while the GLS estimator, β̂, is BLUE;
(iii) Var(b) = σ²(X'X)^{-1} X'ΩX (X'X)^{-1}.
So the use of σ²(X'X)^{-1} is incorrect, and it leads to incorrect standard errors and unreliable inferences about the population parameters.

[Fig. 2: Plot of the OLS Residuals against AGE]

3 Robust Estimation of the Asymptotic Covariance Matrix

The above discussion suggests that if we are to continue using OLS in the presence of heteroskedasticity, then we should at least use the correct formula for Var(b). Note that in the expression for Var(b), σ² and Ω are both unknown. To estimate Var(b), we need to estimate the matrix σ²X'ΩX. White (1980, Econometrica) shows that under very general conditions, the matrix

    S_0 = (1/n) Σ_{i=1}^n e_i² x_i x_i'    (4)

is a consistent estimator of

    Σ = (1/n) σ²X'ΩX = (1/n) Σ_{i=1}^n σ_i² x_i x_i',    (5)

where e_i is the OLS residual for observation i and x_i' = (x_{i1}, x_{i2}, ..., x_{iK}). Therefore, we can obtain a consistent estimator of Var(b), which is given by

    Est.Asy.Var(b) = (X'X)^{-1} ( Σ_{i=1}^n e_i² x_i x_i' ) (X'X)^{-1}.    (6)

This is usually called the White heteroskedasticity-consistent (or robust) estimator of the covariance matrix of b. Note that in forming this estimator, we do not have to assume any specific form of heteroskedasticity, so it is a very useful result. The asymptotic properties of the estimator are unambiguous, but its usefulness in small samples is open to question.
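To make the sandwich formula in (6) concrete, here is a minimal sketch in Python/numpy (the notes use STATA; this standalone example, including the simulated data and the function name white_robust_cov, is hypothetical and only meant to illustrate the matrix algebra):

```python
import numpy as np

def white_robust_cov(X, y):
    """OLS coefficients b and the White estimator from equation (6):
    (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                # OLS coefficients
    e = y - X @ b                        # OLS residuals
    S = (X * e[:, None] ** 2).T @ X      # sum_i e_i^2 x_i x_i' (the "meat")
    V = XtX_inv @ S @ XtX_inv            # the sandwich
    return b, V

# Simulated heteroskedastic data: sd of the disturbance proportional to x_i
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(0, 1, n) * x

b, V = white_robust_cov(X, y)
robust_se = np.sqrt(np.diag(V))
print(b, robust_se)
```

With data like this simulation, standard errors from the robust matrix V are consistent, while the conventional σ̂²(X'X)^{-1} standard errors are not.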
Some Monte Carlo studies suggest that in small samples the White estimator tends to underestimate the variance matrix.

Remark 1: With the White robust estimator of the covariance matrix, we can construct the t statistic as usual, which is called the heteroskedasticity-robust t statistic. Note that this robust statistic follows a t distribution only asymptotically; in small samples, its sampling distribution is unknown.

Remark 2: We cannot use the F test for testing exact linear restrictions, because the distributional assumption of the F statistic requires homoskedasticity. But we can use a Wald test. The statistic is

    W = (Rb − q)' {R [Est.Asy.Var(b)] R'}^{-1} (Rb − q) ~ χ²_J    (7)

under H_0: Rβ = q. That is, the statistic is asymptotically distributed as χ² with degrees of freedom equal to the number of restrictions.

STATA Tips: In STATA, to obtain the White estimator, we simply add the option "robust" to the "regress" command. For example,

    reg y x1 x2 x3, robust

Then the output will report standard errors computed from the White estimator of the covariance matrix of b.

4 Testing for Heteroskedasticity

Among others, three tests are common in practice for detecting heteroskedasticity: (1) White's general test; (2) the Goldfeld-Quandt test; and (3) the Breusch-Pagan LM test. These tests are based on the following strategy. The OLS estimator of β is consistent even in the presence of heteroskedasticity; therefore, the OLS residuals will mimic the heteroskedasticity of the true disturbances. Hence, tests designed to detect heteroskedasticity are applied to the OLS residuals.

4.1 White's General Test

The hypotheses under examination are H_0: σ_i² = σ² vs. H_1: not H_0. Note that to conduct the White test, we do not have to assume any specific form of heteroskedasticity. The White test is motivated by the observation that if the model does not have heteroskedasticity, then ε_i² should not be correlated with any of the regressors, their squares, or their cross products.
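This motivation can be coded up directly: regress the squared OLS residuals on a constant, the regressors, their squares, and their cross products, and refer nR² from that auxiliary regression to a chi-squared distribution. A minimal Python/numpy sketch (not the course's STATA workflow; the simulated data and the helper names ols_r2 and white_test_stat are hypothetical, chosen only so the example is self-contained):

```python
import numpy as np

def ols_r2(Z, g):
    """R^2 from an OLS regression of g on Z (Z must include a constant)."""
    coef, *_ = np.linalg.lstsq(Z, g, rcond=None)
    resid = g - Z @ coef
    tss = (g - g.mean()) @ (g - g.mean())
    return 1.0 - resid @ resid / tss

def white_test_stat(X, y):
    """nR^2 from the auxiliary regression of e_i^2 on a constant, the
    regressors, their squares, and their cross products.
    X holds the non-constant regressors, one column per variable."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    e2 = (y - Xc @ b) ** 2                       # squared OLS residuals
    cols = [np.ones(n)] + [X[:, j] for j in range(k)]
    cols += [X[:, j] * X[:, l] for j in range(k) for l in range(j, k)]
    Z = np.column_stack(cols)                    # constant, levels, squares, crosses
    return n * ols_r2(Z, e2)                     # compare to chi2(Z.shape[1] - 1)

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 2))
y_hom = 1 + X @ np.array([1.0, -1.0]) + rng.normal(size=n)             # homoskedastic
y_het = 1 + X @ np.array([1.0, -1.0]) + rng.normal(size=n) * X[:, 0]   # sd = |x_1|
print(white_test_stat(X, y_hom), white_test_stat(X, y_het))
```

Here the auxiliary regression has six regressors (constant, x_1, x_2, x_1², x_1x_2, x_2²), so under the null nR² is approximately χ² with 5 degrees of freedom; the first statistic should typically be small and the second large.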
A simple operational version of the White test is carried out by obtaining nR² from the auxiliary regression of e_i² on a constant and all unique variables contained in x_i, together with all the squares and cross products of the variables in x_i.

Example 3: Suppose we have four regressors: x_1, x_2, x_3, and a constant term. Then the White test is carried out by first obtaining the residuals, e_i, from OLS estimation of the original model, and then estimating an auxiliary regression of e_i² on a constant and x_1, x_2, x_3, x_1², x_2², x_3², x_1x_2, x_1x_3, x_2x_3. Finally, record the R² from the auxiliary regression and construct the test statistic nR².

The test statistic, nR², is asymptotically distributed as chi-squared with P − 1 degrees of freedom, where P is the number of regressors in the auxiliary regression, including the constant:

    nR² ~ χ²_{P−1} asymptotically.    (8)

Remark 3: The White test is very general in that it does not specify any particular form of heteroskedasticity.

Remark 4: Due to its generality, the White test may simply identify some other specification error (such as the omission of x² from a simple regression) instead of heteroskedasticity.

Remark 5: The power of the White test may be low in some cases.

Remark 6: The White test is nonconstructive: if we reject the null hypothesis, the result of the test does not provide any guidance for the next step.

STATA Tips: To perform the White test in STATA, you can either manually construct the test statistic as in (8) or use the "whitetst" command following a "regress" command on the original model. "whitetst" is not an official STATA command and has to be downloaded. Type "findit whitetst" in STATA, follow the link, and the command will be installed automatically.

4.2 Goldfeld-Quandt Test

The Goldfeld-Quandt test assumes some particular form of heteroskedasticity. It tests that E(ε_i²) = σ²h(x_{ik}), e.g., σ²x_{ik}². This test is applicable if one of the x variables is thought to cause the heteroskedasticity.

Steps:

1.
Reorder the observations by the values of x_k.
2. Omit c central observations, so that we are left with two samples of (n − c)/2 observations each.
3. Let σ_1² (σ_2²) be the error variance of the first (second) sample. Test H_0: σ_1² = σ_2² vs. H_1: σ_2² > σ_1².
4. Estimate the regression y = Xβ + ε in each sub-sample (which requires that (n − c)/2 > K). Obtain e_1'e_1 and e_2'e_2, where e_1 and e_2 are the residual vectors from the two sub-samples, respectively.
5. Form R = e_2'e_2 / e_1'e_1.

It can be shown that under H_0,

    R ~ F_{n*, n*},    (9)

where n* = (n − c − 2K)/2.

Remark 7: c can be zero. Introducing c is intended to increase the power of the test. However, as c increases, (n − c)/2 decreases, which lowers the degrees of freedom in the estimation with each sub-sample, and this tends to diminish the power of the test. So there is a trade-off in choosing the appropriate c. Some studies suggest that no more than a third of the observations should be dropped. One choice is c ≈ n/3 − 2K.

Remark 8: The Goldfeld-Quandt statistic is exactly distributed as F under H_0 if the disturbances are normally distributed. If not, the F distribution is only an approximation.

4.3 Breusch-Pagan LM Test

The Goldfeld-Quandt test is reasonably powerful if we know, or are able to identify correctly, the variable to use in the sample separation. This limits its generality. For example, what if a set of regressors jointly determines the nature of the heteroskedasticity? In this regard, the Breusch-Pagan LM test is more general. Assume

    σ_i² = h(z_i'α),

where h(·) is some function, α is a coefficient vector unrelated to β, and z_i is a vector of variables causing heteroskedasticity, with the first element being 1. Within this framework, if α_2 = α_3 = ··· = α_P = 0, then σ_i² = h(α_1) = σ², i.e., homoskedasticity. Therefore, we test H_0: α_2 = α_3 = ··· = α_P = 0 vs. H_1: not H_0.

Steps:
1. Regress y on X. Obtain the OLS residual vector e.
2. Compute σ̂² = e'e/n and g_i = e_i²/σ̂² − 1.

3.
Estimate, by OLS, the auxiliary regression

    g_i = α_1 + α_2 z_{i2} + α_3 z_{i3} + ··· + α_P z_{iP} + v_i.    (10)

4. Compute the regression sum of squares (SSR),

    SSR = Σ_{i=1}^n (ĝ_i − ḡ)², where ḡ = (1/n) Σ_{i=1}^n g_i.    (11)

Under H_0,

    LM = SSR/2 ~ χ²_{P−1} asymptotically.    (12)

STATA Tips: To perform the Breusch-Pagan LM test in STATA, you can use the "hettest" or the "bpagan" command following the "regress" command on the original model. "bpagan" is unofficial and thus needs to be downloaded. The syntax is the following:

    hettest var_list

where var_list specifies z_i without the 1. The same syntax applies to bpagan.

5 Generalized Least Squares Estimator

5.1 Weighted Least Squares when Ω Is Known

Suppose the variance matrix of ε is given by (2), where Ω is known. Without loss of generality, we may write

    σ_i² = σ²ω_i.    (13)

So, Ω = diag(ω_1, ω_2, ..., ω_n). Now consider a "weight" matrix, P, as follows:

    P = diag(1/√ω_1, 1/√ω_2, ..., 1/√ω_n).    (14)

Hence, P'P = Ω^{-1}, and

    Py = (y_1/√ω_1, y_2/√ω_2, ..., y_n/√ω_n)', while PX has rows x_i'/√ω_i.    (15)

Regressing Py on PX using OLS gives the GLS estimator,

    β̂ = (X'P'PX)^{-1} X'P'Py
       = (X'Ω^{-1}X)^{-1} X'Ω^{-1}y
       = [ Σ_{i=1}^n w_i x_i x_i' ]^{-1} [ Σ_{i=1}^n w_i x_i y_i ],    (16)

where w_i = 1/ω_i. In this case, β̂ is also called the weighted least squares (WLS) estimator.

A common specification is that the variance is proportional to one of the regressors or to its square. For example, if

    σ_i² = σ²x_{ik}²    (17)

for some k, then the transformed regression model for GLS (or WLS) is

    y/x_k = β_k + β_1 (x_1/x_k) + β_2 (x_2/x_k) + ... + β_K (x_K/x_k) + ε/x_k.    (18)

If the variance is proportional to x_k instead of x_k², then the weight applied to each observation is 1/√x_k instead of 1/x_k.

STATA Tips: In STATA, you can perform WLS either by manually transforming the data and then running OLS, or by using the "aweight" feature in the "regress" command. The syntax is as follows.

    regress y x1 x2 ...
        xk [aweight=var_name]

The weight to be used is 1/ω_i. For example, if σ_i² = σ²x_{ik}², then you should first generate a variable, say w, which equals 1/x_{ik}², and then write [aweight=w] in the "regress" command. If σ_i² = σ²x_{ik}, then w should be 1/x_{ik}.

5.2 Estimation when Ω Is Unknown

It is rare that the form of Ω is known, so usually it has to be estimated. The general form of the heteroskedastic regression model has too many parameters to estimate. Typically, the model is restricted by formulating σ²Ω as a function of a few parameters, α. Write this function as Ω(α). FGLS based on a consistent estimator of Ω(α) is asymptotically equivalent to full GLS. Recall that for the heteroskedastic model, the GLS estimator is

    β̂ = [ Σ_{i=1}^n (1/σ_i²) x_i x_i' ]^{-1} [ Σ_{i=1}^n (1/σ_i²) x_i y_i ].    (19)

Basically, we first need to obtain estimates of σ_i², say σ̂_i², usually using some function of the OLS residuals. Then we can compute the estimator from (19) using the σ̂_i². Note that E(ε_i²) = σ_i², so

    ε_i² = σ_i² + v_i,    (20)

where v_i is the difference between ε_i² and its expectation. Since ε_i is unobservable, we use the least squares residuals, for which

    e_i = ε_i − x_i'(b − β) = ε_i + u_i.    (21)

Then

    e_i² = ε_i² + u_i² + 2ε_i u_i.    (22)

However, we know that b is consistent, i.e., b converges in probability to β. Therefore, the terms in u_i become negligible, and thus approximately we have

    e_i² = σ_i² + v_i*.    (23)

The above reasoning leads to the following estimation strategy. If σ_i² = h(z_i'α), where z_i may or may not coincide with x_i, then we can obtain a consistent estimator of α by estimating

    e_i² = h(z_i'α) + v_i*.    (24)

Obtaining the fitted values of e_i², say ê_i², we can use them in place of σ_i² in (19) to construct the feasible generalized least squares (FGLS) estimator. This estimation method is called two-step estimation.

A common functional form for h(·) is the exponential. Suppose we have the model

    y_i = β_1 + β_2 x_{2i} + ··· + β_K x_{Ki} + ε_i, where ε_i ~ (0, σ_i²).    (25)
We may write

    σ_i² = exp(α_1 + α_2 z_{2i} + ··· + α_P z_{Pi}) v_i,    (26)

where v_i is uncorrelated with the z's and has expectation 1. Then

    ln σ_i² = α_1 + α_2 z_{2i} + ··· + α_P z_{Pi} + v_i*.    (27)

In this case, the procedure to obtain the FGLS estimator is the following.
1. Regress y on (1, x_2, ..., x_K) and obtain the residuals e_i.
2. Compute ln e_i² and use it as the dependent variable in model (27). Obtain the fitted values of ln e_i².
3. Compute ĥ_i = exp(fitted value of ln e_i²) and its reciprocal, w_i = 1/ĥ_i.
4. Use w_i as the weight to compute the weighted least squares estimator of β.
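The four steps above can be sketched in a few lines of code. The following Python/numpy example is a minimal illustration (not the course's STATA workflow; the simulated data and the function names ols and fgls_exponential are hypothetical, chosen only so the example is self-contained), taking z_i = (1, x_i):

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

def fgls_exponential(X, y, Z):
    """Two-step FGLS assuming sigma_i^2 = exp(z_i' alpha) * v_i.
    X: regressors including a constant; Z: variables (with constant)
    driving the heteroskedasticity."""
    # Step 1: OLS on the original model, save the residuals
    e = y - X @ ols(X, y)
    # Step 2: regress ln(e_i^2) on Z, keep the fitted values
    ln_e2_hat = Z @ ols(Z, np.log(e ** 2))
    # Step 3: h_i = exp(fitted value), weights w_i = 1/h_i
    w = 1.0 / np.exp(ln_e2_hat)
    # Step 4: WLS = OLS on sqrt(w)-scaled data
    sw = np.sqrt(w)
    return ols(X * sw[:, None], y * sw)

# Simulated data with sigma_i^2 = exp(0.5 + x_i)
rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])
sigma = np.exp(0.5 + 1.0 * x) ** 0.5
y = 1.0 + 2.0 * x + sigma * rng.normal(size=n)

beta_fgls = fgls_exponential(X, y, Z=X)       # z_i = (1, x_i)
print(beta_fgls)
```

The estimates should be close to the true coefficients (1.0, 2.0). A convenient feature of the exponential form in step 3 is that the implied variance estimates ĥ_i are automatically positive.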