7 Regression with heteroskedastic random disturbances

7.1 Introduction

In the regression equations we have studied up to now, we have always assumed that the disturbances $\varepsilon_i$ satisfy the so-called standard conditions:

(7.1.1) $E(\varepsilon_i) = 0$ for all $i$

(7.1.2) $\operatorname{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2$ for all $i$

(7.1.3) $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$

It is evident that condition (7.1.2), which says that all the disturbances should have the same variability, describes an idealized situation that econometricians very seldom meet in practice. For instance, consider the case where the dependent variable $Y_i$ denotes a household's expenditure on food and the exogenous variable $X_i$ denotes the household's income. It is a well-known empirical fact that the variation of food expenditure among high-income households is much larger than the variation among low-income households. In such applications an assumption of constant variance is simply not appropriate. Instead of (7.1.2) we have to assume that the error variance is some function of the household's income. For example, we might assume that

(7.1.4) $\sigma_i^2 = \sigma^2 X_i$,

but, of course, many other functional forms could be imagined.

Under the assumptions (7.1.1)-(7.1.3) we have previously clarified the properties of the ordinary least squares (OLS) method and the relevant procedures for testing hypotheses on the regression coefficients. A natural question to ask is which of these properties and testing procedures survive when we retain assumptions (7.1.1) and (7.1.3) but drop assumption (7.1.2) of homoskedastic disturbances.

7.2 Consequences of heteroskedastic disturbances

In order to discuss these topics explicitly we can, without sacrificing any essential points, consider the simple regression

(7.2.1) $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, N$

where we replace assumption (7.1.2) by the more general

(7.2.2) $\operatorname{Var}(\varepsilon_i) = \sigma_i^2, \quad i = 1, 2, \ldots, N$

In this regression we know that the OLS estimator of the slope parameter $\beta_1$ is given by

(7.2.3) $\hat\beta_1 = \dfrac{\sum_{i=1}^{N}(X_i - \bar X)\,Y_i}{\sum_{i=1}^{N}(X_i - \bar X)^2}$

which, by substituting for $Y_i$, can be written

(7.2.4) $\hat\beta_1 = \beta_1 + \dfrac{\sum_{i=1}^{N}(X_i - \bar X)\,\varepsilon_i}{\sum_{i=1}^{N}(X_i - \bar X)^2}$

From this formula we see directly that

(7.2.5) $E(\hat\beta_1) = \beta_1$

and that

(7.2.6) $\operatorname{Var}(\hat\beta_1) = \dfrac{\sum_{i=1}^{N}\sigma_i^2\,(X_i - \bar X)^2}{\left(\sum_{i=1}^{N}(X_i - \bar X)^2\right)^2}$

From (7.2.5) we observe directly that the OLS estimator $\hat\beta_1$ is still an unbiased estimator of $\beta_1$. Under general conditions it also follows that $\hat\beta_1$ is a consistent estimator of $\beta_1$. Hence, the OLS estimator obtained when the disturbances are heteroskedastic shares these two 'good' properties with the homoskedastic case. But, of course, when the disturbances are heteroskedastic we can find more efficient methods of estimation, e.g. generalized least squares (GLS).

However, a more serious objection to the OLS estimator when the disturbances are heteroskedastic is that the variances of the estimators differ from those obtained when the disturbances are homoskedastic. For example, we remember that in the homoskedastic case the variance of $\hat\beta_1$ is given by

(7.2.7) $\operatorname{Var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^{N}(X_i - \bar X)^2}$

which can be quite different from that given by (7.2.6). We also remember that the assumption that all disturbances have the same variance $\sigma^2$ was crucial in deriving the distributions of our test statistics, the T statistic and the F statistic. This means that the standard errors of estimates and the standard test statistics shown in the output of traditional regression programs will be wrong and unreliable.
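To make this point concrete, here is a minimal simulation sketch (the variable names and all numerical values are illustrative assumptions, not taken from the text). It generates data from (7.2.1) with $\sigma_i^2 = \sigma^2 X_i$ as in (7.1.4) and compares the empirical variance of the OLS slope across replications with the correct formula (7.2.6) and with the homoskedastic formula (7.2.7):

```python
import numpy as np

rng = np.random.default_rng(42)

N, beta0, beta1, sigma2 = 100, 1.0, 0.5, 2.0
X = np.linspace(1.0, 20.0, N)              # fixed (non-stochastic) regressor
var_i = sigma2 * X                         # heteroskedastic variances, eq. (7.1.4)
Xc = X - X.mean()
Sxx = np.sum(Xc ** 2)

slopes = []
for _ in range(20000):
    eps = rng.normal(0.0, np.sqrt(var_i))  # eps_i ~ N(0, sigma^2 * X_i)
    Y = beta0 + beta1 * X + eps
    slopes.append(np.sum(Xc * Y) / Sxx)    # OLS slope, eq. (7.2.3)

print("mean of slope estimates :", np.mean(slopes))                 # close to beta1: unbiased
print("empirical variance      :", np.var(slopes))                  # close to (7.2.6)
print("correct formula (7.2.6) :", np.sum(var_i * Xc ** 2) / Sxx ** 2)
print("naive formula (7.2.7)   :", var_i.mean() / Sxx)              # sigma^2 replaced by the average variance
```

The naive figure differs noticeably from the empirical variance of the slope estimates, which is exactly why the printed standard errors cannot be trusted here.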
However, White (1980) showed how one can derive a consistent estimator of the variance given by (7.2.6). In the literature this estimator is usually called White's consistent variance estimator, although it was already known in the statistical literature. A look at (7.2.6) shows that this variance depends on the unknown disturbance variances, so the number of unknown parameters grows with the number of observations. That situation often raises difficult estimation problems; for instance, the stationary point of the criterion function being optimized may correspond to a saddle point rather than a minimum. But in the present case things come out nicely. The reason is that the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are consistent estimators. White's proposal is to replace $\sigma_i^2$ in (7.2.6) by the corresponding squared residuals $\hat\varepsilon_i^2$, so that White's estimator becomes

(7.2.8) $\widehat{\operatorname{Var}}(\hat\beta_1) = \dfrac{\sum_{i=1}^{N}\hat\varepsilon_i^2\,(X_i - \bar X)^2}{\left(\sum_{i=1}^{N}(X_i - \bar X)^2\right)^2}$

In order to indicate loosely why this might be a successful proposal, let us consider the numerator of (7.2.6). Since the explanatory variable $X$ is non-stochastic, the following equation evidently holds:

(7.2.9) $E\!\left(\sum_{i=1}^{N}\varepsilon_i^2\,(X_i - \bar X)^2\right) = \sum_{i=1}^{N}E(\varepsilon_i^2)\,(X_i - \bar X)^2$

Since the disturbances are given by

(7.2.10) $\varepsilon_i = Y_i - \beta_0 - \beta_1 X_i$

it is intuitive that an appeal to the law of large numbers will imply the convergence

(7.2.11) $\dfrac{1}{N}\sum_{i=1}^{N}\varepsilon_i^2\,(X_i - \bar X)^2 - \dfrac{1}{N}\sum_{i=1}^{N}E(\varepsilon_i^2)\,(X_i - \bar X)^2 \;\to\; 0$

Continuing this line of reasoning, since the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are consistent, it is also reasonable to expect that the residuals

(7.2.12) $\hat\varepsilon_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i, \quad i = 1, \ldots, N$

will in some way converge to the disturbances $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$. Then a reasonable guess is that

(7.2.13) $\dfrac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i^2\,(X_i - \bar X)^2 - \dfrac{1}{N}\sum_{i=1}^{N}E(\varepsilon_i^2)\,(X_i - \bar X)^2 \;\to\; 0$

where the arrows indicate convergence in probability.

A great advantage of White's variance estimator is that it does not require a parametric specification of the heteroskedasticity. Unlike other procedures, there is no need for subsidiary variables to explain the heteroskedasticity, so the method is quite general. However, White's estimator is strictly justified only for large sample sizes, so we can use it to construct large-sample tests on the regression coefficients. It must be admitted, though, that in practice it is also used in applications with only moderate sample sizes.
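A minimal sketch of (7.2.8) in code (the function name and the simulated data are my own illustrative choices):

```python
import numpy as np

def white_slope_variance(X, Y):
    """White's heteroskedasticity-consistent variance of the OLS slope, eq. (7.2.8)."""
    Xc = X - X.mean()
    Sxx = np.sum(Xc ** 2)
    b1 = np.sum(Xc * Y) / Sxx              # OLS slope, eq. (7.2.3)
    b0 = Y.mean() - b1 * X.mean()          # OLS intercept
    resid = Y - b0 - b1 * X                # residuals, eq. (7.2.12)
    return np.sum(resid ** 2 * Xc ** 2) / Sxx ** 2

# Illustrative usage on data whose disturbance variance is proportional to X:
rng = np.random.default_rng(0)
X = np.linspace(1.0, 20.0, 200)
Y = 1.0 + 0.5 * X + rng.normal(0.0, np.sqrt(2.0 * X))
print("White variance estimate:", white_slope_variance(X, Y))
```

The square root of this figure is the heteroskedasticity-robust standard error one would use in a large-sample T test on $\beta_1$.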
7.3 Detection of heteroskedastic disturbances

When we specify our assumptions regarding the random disturbances, we have to base our considerations on the type of application under study. Above we mentioned cross-section studies of the demand for food as a function of income as an obvious example where the assumption of homoskedastic disturbances is doubtful, but the econometric literature abounds with similar examples. So what shall we do to make sure that our methods rest on a firm footing? An easy and convenient first step is to run an OLS regression, calculate the residuals $\hat\varepsilon_i$ and plot them against the explanatory variables. This will indicate whether there is any foundation for our suspicion. If the empirical plots indicate that some kind of heteroskedasticity is at work in our data, we have to look for test procedures that can help reveal its presence.

In the literature several such tests have been developed. They are in a way refinements of the residual plots, in that their framework is regressions of $\hat\varepsilon_i^2$ on different functions (often polynomials) of the explanatory variables or of the estimated dependent variable $(\hat Y_i, \hat Y_i^2, \ldots)$. Note that there is no point in regressing $\hat\varepsilon_i$ itself on the explanatory variables, since by the working of OLS $\hat\varepsilon_i$ is uncorrelated with the explanatory variables. We shall not give a review of these tests, but the following details are useful:

(7.3.1) $\hat\varepsilon_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i) = (\varepsilon_i - \bar\varepsilon) - (\hat\beta_1 - \beta_1)(X_i - \bar X)$

implying

(7.3.2) $\hat\varepsilon_i^2 = (\hat\beta_1 - \beta_1)^2 (X_i - \bar X)^2 + (\varepsilon_i - \bar\varepsilon)^2 - 2(\hat\beta_1 - \beta_1)(X_i - \bar X)(\varepsilon_i - \bar\varepsilon)$

From (7.3.2) we deduce (do that for yourself) that

(7.3.3) $E\!\left(\sum_{i=1}^{N}\hat\varepsilon_i^2\right) = \sum_{i=1}^{N}\sigma_i^2 - \dfrac{1}{N}\sum_{i=1}^{N}\sigma_i^2 - \dfrac{\sum_{i=1}^{N}\sigma_i^2\,(X_i - \bar X)^2}{\sum_{i=1}^{N}(X_i - \bar X)^2}$

If $\sigma_i^2 = \sigma^2$, i.e. the disturbances are homoskedastic, (7.3.3) reduces to

(7.3.4) $E\!\left(\sum_{i=1}^{N}\hat\varepsilon_i^2\right) = N\sigma^2 - 2\sigma^2 = (N-2)\,\sigma^2$

which shows that the estimator

(7.3.5) $\hat\sigma^2 = \dfrac{\sum_{i=1}^{N}\hat\varepsilon_i^2}{N-2}$

is an unbiased estimator of $\sigma^2$ in the homoskedastic case. From expression (7.3.2) we also find that when the disturbances are homoskedastic

(7.3.6) $E(\hat\varepsilon_i^2) = \sigma^2 - \dfrac{\sigma^2}{N} - \dfrac{\sigma^2\,(X_i - \bar X)^2}{\sum_{j=1}^{N}(X_j - \bar X)^2}$

Hence, even when the disturbances are homoskedastic the squared residuals depend on $(X_i - \bar X)^2$. Under mild conditions on the explanatory variable $X$, however, this dependency vanishes as the number of observations increases. This explains why tests based on the squared residuals $\hat\varepsilon_i^2$ are valid only when the sample size is large; they are what we call asymptotic tests. Although these tests are quite simple 'regression tests', we shall not dwell any longer on them in this course.
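Even so, the mechanics of such a regression test are simple enough to show in a few lines. The sketch below implements one member of this family, essentially the studentized Breusch-Pagan statistic $N \cdot R^2$ from regressing $\hat\varepsilon_i^2$ on $X_i$; that name is not used in these notes, and all code names and data are my own illustrative choices:

```python
import numpy as np
from scipy import stats

def squared_residual_test(X, Y):
    """N * R^2 from regressing squared OLS residuals on a constant and X.

    Large values signal heteroskedasticity; the statistic is asymptotically
    chi-square with 1 degree of freedom (one auxiliary regressor)."""
    Xc = X - X.mean()
    b1 = np.sum(Xc * Y) / np.sum(Xc ** 2)
    b0 = Y.mean() - b1 * X.mean()
    e2 = (Y - b0 - b1 * X) ** 2            # squared residuals
    # Auxiliary regression of e2 on a constant and X:
    g1 = np.sum(Xc * e2) / np.sum(Xc ** 2)
    fitted = e2.mean() + g1 * Xc
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    stat = len(X) * r2
    return stat, stats.chi2.sf(stat, df=1)  # statistic and asymptotic p-value

# Illustrative usage:
rng = np.random.default_rng(0)
X = np.linspace(1.0, 20.0, 200)
Y = 1.0 + 0.5 * X + rng.normal(0.0, np.sqrt(2.0 * X))
print("statistic, p-value:", squared_residual_test(X, Y))
```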
As a formal test for the presence of heteroskedasticity we shall only have a look at the Goldfeld-Quandt test. As background for this test, let us again consider the model of food expenditure as a function of income. As noted above, it is reasonable to assume that the variability of food expenditure is much larger for high-income households than for low-income households. A simple test of heteroskedasticity based on this idea is to split the sample into two sub-samples, using income as the sorting criterion. To be specific, let us split the sample into two equal parts, where the first sub-sample contains the households with the highest incomes and the second sub-sample contains the remaining households. One then assumes that the disturbance variance in the first sub-sample is $\sigma_1^2$ and that in the second it is $\sigma_2^2$, but note that the regression coefficients are still assumed to be the same in the two samples. Applying OLS regression to the two sub-samples gives us estimates of the regression coefficients $\beta_0$ and $\beta_1$, and from these estimates we derive estimates of the two variances $\sigma_1^2$ and $\sigma_2^2$ in the usual way. If $RSS_1$ denotes the residual sum of squares from the first sub-sample regression and $N_1$ the number of observations allocated to this sample, then from standard statistical theory

(7.3.7) $V_1 = \dfrac{RSS_1}{\sigma_1^2} = \dfrac{\sum_{i \in 1}\hat\varepsilon_i^2}{\sigma_1^2} = \dfrac{(N_1 - 2)\,\hat\sigma_1^2}{\sigma_1^2}$ is $\chi^2$ distributed with $N_1 - 2$ degrees of freedom.

The treatment of the second sub-sample is, of course, analogous. The Goldfeld-Quandt test of heteroskedasticity in this case tests the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ against $H_A: \sigma_1^2 > \sigma_2^2$. The test procedure is as usual: choose a level of significance $\alpha$, decide which test statistic to use, find the distribution of the test statistic under the null hypothesis, and finally determine the rejection region.

In this case we use the test statistic

(7.3.8) $F = \dfrac{RSS_1 / (N_1 - 2)}{RSS_2 / (N_2 - 2)}$

where $N_2$ is the number of observations in the second sub-sample and, of course, $RSS_2$ also refers to the second sub-sample. Note that under the null hypothesis the two variances are equal and therefore cancel in expression (7.3.8), so that $F$ follows an F distribution with $(N_1 - 2,\, N_2 - 2)$ degrees of freedom. If the test statistic $F$ is around 1 the null hypothesis is supported; if, however, the statistic is considerably larger, this tends towards rejection of $H_0$.
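A minimal sketch of the test in code (the function name, the data, and the use of a p-value in place of an explicit rejection region are my own illustrative choices):

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(X, Y):
    """Goldfeld-Quandt F test, eq. (7.3.8): split on X, high-X half first."""
    order = np.argsort(X)[::-1]            # sort by income, highest first
    Xs, Ys = X[order], Y[order]
    half = len(Xs) // 2

    def rss(x, y):
        """OLS residual sum of squares and its degrees of freedom, N - 2."""
        xc = x - x.mean()
        b1 = np.sum(xc * y) / np.sum(xc ** 2)
        b0 = y.mean() - b1 * x.mean()
        return np.sum((y - b0 - b1 * x) ** 2), len(x) - 2

    rss1, df1 = rss(Xs[:half], Ys[:half])  # sub-sample 1: highest incomes
    rss2, df2 = rss(Xs[half:], Ys[half:])  # sub-sample 2: lowest incomes
    F = (rss1 / df1) / (rss2 / df2)        # eq. (7.3.8)
    p = stats.f.sf(F, df1, df2)            # one-sided: H_A is sigma_1^2 > sigma_2^2
    return F, p

# Reject H_0 at level alpha when the returned p-value is below alpha.
```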
If we reject the null hypothesis, then we are fairly convinced that the disturbances are heteroskedastic. But what do we do then? In that case it is reasonable to reflect on the nature of the heteroskedasticity. If we are convinced that it has a specific form, the next step is to apply an appropriate transformation to the variables entering the model.

7.4 Transformations of variables

Let us illustrate this approach by considering a specific application, the model

(7.4.1) $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

For the disturbances we specify

(7.4.2 a-c) $E(\varepsilon_i) = 0$ for all $i$, $\quad \operatorname{Var}(\varepsilon_i) = \sigma^2 X_i$, $\quad \operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$

Hence, we suppose that we know the form of the heteroskedasticity. Although OLS applied to (7.4.1) provides unbiased and consistent estimators of the regression coefficients, we know that OLS is an inefficient method of estimation in this case: the so-called 'BLUE' property of OLS presupposes that the disturbances are homoskedastic. Since we know the form of the heteroskedasticity, it is very tempting to transform the variables in order to restore the homoskedasticity of the disturbances. In the present simple case we see directly how this can be done. We simply divide equation (7.4.1) through by the square root of $X_i$:

(7.4.3) $\dfrac{Y_i}{\sqrt{X_i}} = \beta_0 \dfrac{1}{\sqrt{X_i}} + \beta_1 \sqrt{X_i} + \dfrac{\varepsilon_i}{\sqrt{X_i}}$

Since $X_i$ is observable, the transformed model is an ordinary regression with two explanatory variables but without an intercept term. We also note that the disturbances in the transformed model are homoskedastic, since $\operatorname{Var}(\varepsilon_i / \sqrt{X_i}) = \sigma^2 X_i / X_i = \sigma^2$. With obvious notation we write (7.4.3) as

(7.4.4) $V_i = \beta_0 Z_{i1} + \beta_1 Z_{i2} + u_i$

Since the disturbances $u_i$ are homoskedastic, OLS applied to regression (7.4.4) gives us 'BLUE' estimators of $\beta_0$ and $\beta_1$, and the conventional procedures can now be applied to test hypotheses on the regression coefficients.

Generally, if we know the form of the heteroskedasticity, we apply the appropriate transformation and everything is put in order. Unfortunately, we very seldom know the exact form of the heteroskedasticity. There is, however, one case arising in applications where this approach can always be applied. Suppose our data are group averages, but the number of observations in each group varies. That is, we have the model

(7.4.5) $Y_{ij} = \beta_0 + \beta_1 X_j + \varepsilon_{ij}$

where $j = 1, 2, \ldots, k$ counts the groups and we have $n_j$ observations in group $j$. The disturbances $\varepsilon_{ij}$ are assumed to be homoskedastic. We do not have observations on the $Y_{ij}$, but we have data on the group averages

(7.4.6) $\bar Y_{\cdot j} = \dfrac{\sum_{i=1}^{n_j} Y_{ij}}{n_j}$

From the regression (7.4.5) we derive the regression

(7.4.7) $\bar Y_{\cdot j} = \beta_0 + \beta_1 X_j + \bar\varepsilon_{\cdot j}$ where $\bar\varepsilon_{\cdot j} = \dfrac{\sum_{i=1}^{n_j}\varepsilon_{ij}}{n_j}$

The disturbances $\bar\varepsilon_{\cdot j}$ are heteroskedastic, since

(7.4.8) $\operatorname{Var}(\bar\varepsilon_{\cdot j}) = \dfrac{\sigma^2}{n_j}$

Evidently, in this case we should multiply the regression (7.4.7) through by $\sqrt{n_j}$, so the transformed regression is

(7.4.9) $\sqrt{n_j}\,\bar Y_{\cdot j} = \beta_0 \sqrt{n_j} + \beta_1 \sqrt{n_j}\,X_j + \sqrt{n_j}\,\bar\varepsilon_{\cdot j}, \quad j = 1, 2, \ldots, k$

The disturbances in this regression are homoskedastic, since

(7.4.10) $\operatorname{Var}(\sqrt{n_j}\,\bar\varepsilon_{\cdot j}) = \sigma^2$

Applying OLS regression to (7.4.9) provides us with 'BLUE' estimators of $\beta_0$ and $\beta_1$.

Applying OLS regression to transformed variables, as exemplified in regressions (7.4.4) and (7.4.9), is called weighted OLS regression and is an example of generalized least squares (GLS); a code sketch follows below. We will learn more about GLS in more advanced courses in econometrics.
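A minimal sketch of the transformation (7.4.3)-(7.4.4) in code (the function name and the simulated data are my own illustrative choices):

```python
import numpy as np

def wls_sqrt_x(X, Y):
    """Weighted OLS for Var(eps_i) = sigma^2 * X_i: divide (7.4.1) by sqrt(X_i).

    Returns (beta0_hat, beta1_hat) from the transformed regression (7.4.4),
    whose regressors are Z1 = 1/sqrt(X) and Z2 = sqrt(X), with no intercept."""
    w = 1.0 / np.sqrt(X)
    V = Y * w                                # V_i = Y_i / sqrt(X_i)
    Z = np.column_stack([w, X * w])          # columns Z_{i1} and Z_{i2}
    coef, *_ = np.linalg.lstsq(Z, V, rcond=None)
    return coef

# Illustrative usage on data generated with Var(eps_i) = 2 * X_i:
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 10.0, 200)
Y = 1.0 + 0.5 * X + rng.normal(0.0, np.sqrt(2.0 * X))
print("WLS estimates (beta0, beta1):", wls_sqrt_x(X, Y))
```

The same recipe handles the grouped-data case (7.4.9): there the weights are $\sqrt{n_j}$ instead of $1/\sqrt{X_i}$.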