7 Regression with heteroskedastic random disturbances

7.1 Introduction
In the regression equations we have studied up to now, we have always assumed that the disturbances $\varepsilon_i$ satisfy the so-called standard conditions given by

(7.1.1) $E(\varepsilon_i) = 0$ for all $i$

(7.1.2) $\operatorname{Var}(\varepsilon_i) = E(\varepsilon_i^2) = \sigma^2$ for all $i$

(7.1.3) $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
It is evident that the condition described by (7.1.2), saying that all the disturbances should have the same variability, is an idealized situation which econometricians very seldom meet in practice. For instance, consider the case where the dependent variable $Y_i$ denotes a household's expenditure on food and the exogenous variable $X_i$ denotes the household's income. It is a well-known empirical fact that the variation of food expenditure among high-income households is much larger than the variation among low-income households. In these applications an assumption of a constant variance is simply not appropriate. Instead of (7.1.2) we have to assume that the error variance is some function of the household's income.
For example, we might assume that

(7.1.4) $\sigma_i^2 = \sigma^2 X_i$,

but, of course, many other functional forms could be imagined.
Under the assumptions (7.1.1)-(7.1.3) we have clarified above the properties of the ordinary least squares method and the relevant procedures for testing hypotheses on the regression coefficients. A natural question to ask is: which of these properties and testing procedures survive when we retain the assumptions (7.1.1) and (7.1.3) but drop the assumption (7.1.2) of homoskedastic disturbances?
7.2 Consequences of heteroskedastic disturbances
In order to discuss these topics explicitly we can, without sacrificing any essential points, consider the simple regression

(7.2.1) $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, N$

where we replace assumption (7.1.2) by the more general

(7.2.2) $\operatorname{Var}(\varepsilon_i) = \sigma_i^2, \quad i = 1, 2, \ldots, N$
In this regression we know that the OLS estimator of the slope parameter $\beta_1$ is given by

(7.2.3) $\hat\beta_1 = \dfrac{\sum_{i=1}^{N}(X_i - \bar X)\,Y_i}{\sum_{i=1}^{N}(X_i - \bar X)^2}$
which by substituting for $Y_i$ can be written

(7.2.4) $\hat\beta_1 = \beta_1 + \dfrac{\sum_{i=1}^{N}(X_i - \bar X)\,\varepsilon_i}{\sum_{i=1}^{N}(X_i - \bar X)^2}$
From this formula we see directly that
(7.2.5) $E(\hat\beta_1) = \beta_1$

and that

(7.2.6) $\operatorname{Var}(\hat\beta_1) = \dfrac{\sum_{i=1}^{N}\sigma_i^2 (X_i - \bar X)^2}{\left(\sum_{i=1}^{N}(X_i - \bar X)^2\right)^2}$
From (7.2.5) we observe directly that the OLS estimator $\hat\beta_1$ is still an unbiased estimator of $\beta_1$. Under general conditions it also follows that $\hat\beta_1$ will be a consistent estimator of $\beta_1$. Hence, the OLS estimators obtained when the disturbances are heteroskedastic share these two 'good' properties with the OLS estimators of the homoskedastic case. But, of course, when the disturbances are heteroskedastic we can find more efficient methods of estimation, e.g. generalized least squares (GLS). However, a more serious objection to the OLS estimator when the disturbances are heteroskedastic is that the variances of the estimators will be different from those obtained when the disturbances are homoskedastic. For example, we remember that in the homoskedastic case the variance of $\hat\beta_1$ is given by
(7.2.7) $\operatorname{Var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^{N}(X_i - \bar X)^2}$
which can be quite different from that given by (7.2.6). We also remember that the assumption that all disturbances have the same variance $\sigma^2$ was crucial in deriving the distribution of our test statistics, the T statistic and the F statistic. This means that the standard errors of estimates and the standard test statistics shown in the outputs of traditional regression programs will be wrong and unreliable.
However, White (1980) showed how one can derive a consistent estimator of the variance given by (7.2.6). In the literature one usually calls this estimator White's consistent variance estimator, although the estimator was already known in the statistical literature. A look at (7.2.6) shows that this variance depends on the unknown disturbance variances. This situation, in which the number of unknown parameters grows with the number of observations, often raises difficult estimation problems; for instance, the stationary point of the quadratic form underlying OLS might correspond to a saddle point and not to a minimum point. But in the present case things come out nicely. The reason is that the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are consistent estimators. White's proposal is to replace $\sigma_i^2$ in (7.2.6) by the corresponding squared residuals $\hat\varepsilon_i^2$, so that White's estimator becomes
(7.2.8) $\widehat{\operatorname{Var}}(\hat\beta_1) = \dfrac{\sum_{i=1}^{N}\hat\varepsilon_i^2 (X_i - \bar X)^2}{\left(\sum_{i=1}^{N}(X_i - \bar X)^2\right)^2}$
In order to indicate loosely why this might be a successful proposal, let us consider the numerator of (7.2.6). Since the explanatory variable $X$ is non-stochastic, it is evident that the following equation holds

(7.2.9) $E\left(\sum_{i=1}^{N}\varepsilon_i^2 (X_i - \bar X)^2\right) = \sum_{i=1}^{N} E(\varepsilon_i^2)(X_i - \bar X)^2$
Since the disturbances are given by

(7.2.10) $\varepsilon_i = Y_i - \beta_0 - \beta_1 X_i$

it is intuitive that an appeal to the law of large numbers will imply the convergence

(7.2.11) $\dfrac{1}{N}\sum_{i=1}^{N}\varepsilon_i^2 (X_i - \bar X)^2 \;\longrightarrow\; \dfrac{1}{N}\sum_{i=1}^{N} E(\varepsilon_i^2)(X_i - \bar X)^2$
Continuing this line of reasoning, since the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are consistent estimators, it is also reasonable to expect that the residuals

(7.2.12) $\hat\varepsilon_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i, \quad i = 1, \ldots, N$
in some way will converge to the disturbances $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$. Then a reasonable guess is that

(7.2.13) $\dfrac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i^2 (X_i - \bar X)^2 \;\longrightarrow\; \dfrac{1}{N}\sum_{i=1}^{N} E(\varepsilon_i^2)(X_i - \bar X)^2$

where the arrows indicate convergence in probability.
A great advantage of White's variance estimator is that it does not require a parametric specification of the heteroskedasticity. Unlike other approaches, there is no need for subsidiary variables to explain the heteroskedasticity, so the method is quite general. White's estimator is, however, strictly justified only for large sample sizes, so strictly speaking we can only use this variance estimator to construct large-sample tests on the regression coefficients. It must be admitted, though, that in practice it is also used in applications with only moderate sample sizes.
7.3 Detection of heteroskedastic disturbances
When we specify our assumptions regarding the random disturbances we have to base our considerations on the type of application under study. Above we mentioned cross-section studies of the demand for food as a function of income as an obvious example where the assumption of homoskedastic disturbances is doubtful, but the econometric literature abounds with similar examples. So what shall we do to ensure that our methods rest on a firm footing? Well, an easy and convenient first step is to run an OLS regression, then calculate the residuals $\hat\varepsilon_i$ and plot the residuals against the explanatory variables, as in the sketch below. This will indicate whether there is any foundation for our suspicion.
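Such a diagnostic plot might be produced as follows; this is only a sketch with made-up data (the fan shape in the residual cloud is what we are looking for):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Made-up data whose disturbance variance grows with X
N = 100
X = rng.uniform(1.0, 10.0, size=N)
Y = 1.0 + 0.5 * X + rng.normal(0.0, np.sqrt(X))

# OLS fit and residuals
x_dev = X - X.mean()
b1 = np.sum(x_dev * Y) / np.sum(x_dev ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - b0 - b1 * X

# A fan-shaped cloud, i.e. a spread that grows with X, is the
# typical visual sign of heteroskedasticity.
plt.scatter(X, resid)
plt.axhline(0.0, color="black", linewidth=0.8)
plt.xlabel("X")
plt.ylabel("OLS residual")
plt.show()
```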
If the empirical plots indicate that some kind of heteroskedasticity is at work in our data, we have to look for test procedures that can help reveal the presence of heteroskedasticity. Several such tests have been developed in the literature, and they are in a way refinements of the statistical plots in that the framework is regressions of $\hat\varepsilon_i^2$ on different functions (often polynomials) of the explanatory variables or of the estimated dependent variable $(\hat Y_i, \hat Y_i^2, \ldots)$. Note that there is no point in regressing $\hat\varepsilon_i$ on the explanatory variables since, by the working of OLS, $\hat\varepsilon_i$ will be uncorrelated with the explanatory variables. We shall not give a review of these tests, but the following details are useful:
(7.3.1) $\hat\varepsilon_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i) = -(\hat\beta_1 - \beta_1)(X_i - \bar X) + (\varepsilon_i - \bar\varepsilon)$

implying

(7.3.2) $\hat\varepsilon_i^2 = (\hat\beta_1 - \beta_1)^2 (X_i - \bar X)^2 + (\varepsilon_i - \bar\varepsilon)^2 - 2(\hat\beta_1 - \beta_1)(X_i - \bar X)(\varepsilon_i - \bar\varepsilon)$
From (7.3.2) we deduce (do that for yourself) that

(7.3.3) $\sum_{i=1}^{N} E(\hat\varepsilon_i^2) = \sum_{i=1}^{N}\sigma_i^2 - \dfrac{1}{N}\sum_{i=1}^{N}\sigma_i^2 - \dfrac{\sum_{i=1}^{N}\sigma_i^2 (X_i - \bar X)^2}{\sum_{i=1}^{N}(X_i - \bar X)^2}$
If $\sigma_i^2 = \sigma^2$, i.e. the disturbances are homoskedastic, (7.3.3) reduces to

(7.3.4) $\sum_{i=1}^{N} E(\hat\varepsilon_i^2) = N\sigma^2 - \sigma^2 - \sigma^2 = (N - 2)\sigma^2$
which shows that the estimator

(7.3.5) $\hat\sigma^2 = \dfrac{\sum_{i=1}^{N}\hat\varepsilon_i^2}{N - 2}$
is an unbiased estimator of $\sigma^2$ in the homoskedastic case. From the expression (7.3.2) we also find that when the disturbances are homoskedastic

(7.3.6) $E(\hat\varepsilon_i^2) = \sigma^2 - \dfrac{\sigma^2}{N} - \dfrac{\sigma^2 (X_i - \bar X)^2}{\sum_{j=1}^{N}(X_j - \bar X)^2}$
Hence, even though the disturbances are homoskedastic, the squared residuals will depend on $(X_i - \bar X)^2$. However, under mild conditions on the explanatory variable $X$ this dependency will vanish when the number of observations increases. This explains why tests based on the squared residuals $\hat\varepsilon_i^2$ only have validity when the sample size is large; they are what we call asymptotic tests. Although these tests are quite simple 'regression tests', we shall not dwell any longer on them in this course. As a formal test for investigating the presence of heteroskedasticity we shall only have a look at the Goldfeld-Quandt test. As background for this test let us again consider the model of food expenditure as a function of income. As we noted above, it is reasonable to assume that the variability of food expenditure is much larger for high-income households than for low-income households. A simple test of heteroskedasticity based on this idea is to split the sample into two sub-samples, using income as the sorting criterion. In order to be specific, let us split the sample into two equal parts, where the first sub-sample contains the households with the highest incomes and the second sub-sample contains the remaining households. One then assumes that the disturbance variance in the first sub-sample is $\sigma_1^2$ and the variance in the second is $\sigma_2^2$, but note that one still assumes that the regression coefficients are the same for the two samples. Applying OLS regression to the two sub-samples will give us estimates of the regression coefficients $\beta_0$ and $\beta_1$, and from these estimates we deduce estimates of the two variances $\sigma_1^2$ and $\sigma_2^2$ in the usual way. If, for example, the variance in the first sub-sample is $\sigma_1^2$ and the number of observations allocated to this sample is $N_1$, then from standard statistical knowledge
(7.3.7) $V_1 = \dfrac{\mathrm{RSS}_1}{\sigma_1^2} = \dfrac{\sum_{i=1}^{N_1}\hat\varepsilon_{i1}^2}{\sigma_1^2} = \dfrac{(N_1 - 2)\hat\sigma_1^2}{\sigma_1^2}$ is $\chi^2$ distributed with DF equal to $N_1 - 2$.
The treatment of the second sub-sample is, of course, analogous. So the Goldfeld-Quandt test of heteroskedasticity in this case tests

the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$

against

$H_A: \sigma_1^2 > \sigma_2^2$

The test procedure is now as usual: choose a level of significance $\alpha$, then clarify the test statistic one wants to use, find the distribution of the test statistic under the null hypothesis, and finally determine the rejection region.
In this case we use the test statistic

(7.3.8) $F = \dfrac{\mathrm{RSS}_1/(N_1 - 2)}{\mathrm{RSS}_2/(N_2 - 2)}$
where $N_2$ is the number of observations in the second sub-sample and, of course, $\mathrm{RSS}_2$ also refers to the second sub-sample. Note that under the null hypothesis the two variances are equal and therefore cancel in expression (7.3.8), so that $F$ follows an F distribution with $(N_1 - 2, N_2 - 2)$ degrees of freedom. If the test statistic $F$ is around 1 the null hypothesis is supported; if, however, the statistic is considerably larger, this will tend towards rejection of $H_0$.
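A minimal sketch of the whole procedure in Python; the simulated data, the 50/50 split, and the helper function rss are our own illustration:

```python
import numpy as np
from scipy import stats

def rss(X: np.ndarray, Y: np.ndarray) -> float:
    """Residual sum of squares from a simple OLS regression of Y on X."""
    x_dev = X - X.mean()
    b1 = np.sum(x_dev * Y) / np.sum(x_dev ** 2)
    b0 = Y.mean() - b1 * X.mean()
    return float(np.sum((Y - b0 - b1 * X) ** 2))

rng = np.random.default_rng(2)
N = 120
X = rng.uniform(1.0, 10.0, size=N)               # 'income'
Y = 1.0 + 0.5 * X + rng.normal(0.0, np.sqrt(X))  # variance grows with income

# Sort by income and split into two equal halves; the first sub-sample
# contains the households with the highest incomes.
order = np.argsort(X)[::-1]
half = N // 2
i1, i2 = order[:half], order[half:]
N1, N2 = half, N - half

# Separate OLS regressions on the two sub-samples
rss1, rss2 = rss(X[i1], Y[i1]), rss(X[i2], Y[i2])

# Test statistic (7.3.8); under H0 it is F-distributed with
# (N1 - 2, N2 - 2) degrees of freedom.
F = (rss1 / (N1 - 2)) / (rss2 / (N2 - 2))
p_value = stats.f.sf(F, N1 - 2, N2 - 2)
print(f"F = {F:.3f}, p-value = {p_value:.4f}")
```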
If we reject the null hypothesis, then we are pretty convinced that the disturbances are
heteroskedastic. But what do we then do? Well, in that case it is reasonable to reflect on the
nature of the heteroskedasticity. If we are convinced that this has a specific form, the next
step is to apply an appropriate transformation to the variables entering the model.
7.4 Transformations of variables
Let us illustrate this approach by considering a specific application, so we consider the following model

(7.4.1) $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

For the disturbances we specify

(7.4.2 a-c)
$E(\varepsilon_i) = 0$ for all $i$
$\operatorname{Var}(\varepsilon_i) = \sigma^2 X_i$
$\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
Hence, we suppose that we know the form of the heteroskedasticity. Although OLS applied to (7.4.1) provides unbiased and consistent estimators of the regression coefficients, we know that OLS is an inefficient method of estimation in this case. The so-called 'blue' property of OLS presupposes that the disturbances are homoskedastic. Since we know the form of the heteroskedasticity, it is very tempting to try to transform the variables in order to restore the homoskedasticity property of the disturbances. In the present simple case we see directly how this can be done. We simply divide equation (7.4.1) through by the square root of $X_i$:
(7.4.3) $\dfrac{Y_i}{\sqrt{X_i}} = \dfrac{\beta_0}{\sqrt{X_i}} + \beta_1 \sqrt{X_i} + \dfrac{\varepsilon_i}{\sqrt{X_i}}$
Since $X_i$ is observable, the transformed model is an ordinary regression with two explanatory variables but without an intercept term. We also note that the disturbances in the transformed model are homoskedastic, since $\operatorname{Var}(\varepsilon_i/\sqrt{X_i}) = \operatorname{Var}(\varepsilon_i)/X_i = \sigma^2$. With obvious notation we write (7.4.3) as

(7.4.4) $V_i = \beta_0 Z_{i1} + \beta_1 Z_{i2} + u_i$
Since the disturbances $u_i$ are homoskedastic, OLS applied to the regression (7.4.4) will give us 'blue' estimators of $\beta_0$ and $\beta_1$, and moreover the conventional procedures can now be applied to test hypotheses on the regression coefficients. Generally, if we know the form of the heteroskedasticity, we apply the appropriate transformation and everything will be put in order; a sketch of this procedure in code is given below. Unfortunately, we very seldom know the exact form of the heteroskedasticity.
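Here is a sketch of the transformation (7.4.3)-(7.4.4) in Python, under the assumed variance form $\operatorname{Var}(\varepsilon_i) = \sigma^2 X_i$; the data are simulated for illustration, and the no-intercept regression is solved with np.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up data satisfying (7.4.2 a-c) with sigma^2 = 2
N = 200
X = rng.uniform(1.0, 10.0, size=N)
Y = 1.0 + 0.5 * X + rng.normal(0.0, np.sqrt(2.0 * X))

# Divide through by sqrt(X_i) as in (7.4.3)
w = np.sqrt(X)
V = Y / w                           # transformed dependent variable
Z = np.column_stack((1.0 / w, w))   # Z_1 = 1/sqrt(X), Z_2 = sqrt(X)

# OLS without an intercept on the transformed model (7.4.4)
coef, *_ = np.linalg.lstsq(Z, V, rcond=None)
print(f"beta0_hat = {coef[0]:.4f}, beta1_hat = {coef[1]:.4f}")
```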
In applications there is, however, one case where this approach can readily be applied. Suppose our data are group averages, but the number of observations in each group varies. That is, we have the model
(7.4.5) $Y_{ij} = \beta_0 + \beta_1 X_j + \varepsilon_{ij}$

where $j = 1, 2, \ldots, k$ indexes the groups, and we have $n_j$ observations in group $j$. Regarding the disturbances $\varepsilon_{ij}$ we assume they are homoskedastic. We don't have observations on the $Y_{ij}$, but we have data on the group averages

(7.4.6) $\bar Y_{\cdot j} = \dfrac{\sum_{i=1}^{n_j} Y_{ij}}{n_j}$
From the regression (7.4.5) we derive the regression

(7.4.7) $\bar Y_{\cdot j} = \beta_0 + \beta_1 X_j + \bar\varepsilon_{\cdot j}$, where $\bar\varepsilon_{\cdot j} = \dfrac{\sum_{i=1}^{n_j}\varepsilon_{ij}}{n_j}$
The disturbances $\bar\varepsilon_{\cdot j}$ will be heteroskedastic since

(7.4.8) $\operatorname{Var}(\bar\varepsilon_{\cdot j}) = \dfrac{\sigma^2}{n_j}$
Evidently, in this case we shall multiply the regression (7.4.7) by $\sqrt{n_j}$, so the transformed regression will be

(7.4.9) $\sqrt{n_j}\,\bar Y_{\cdot j} = \beta_0 \sqrt{n_j} + \beta_1 \sqrt{n_j}\,X_j + \sqrt{n_j}\,\bar\varepsilon_{\cdot j}, \quad j = 1, 2, \ldots, k$
The disturbances in this regression are homoskedastic since

(7.4.10) $\operatorname{Var}(\sqrt{n_j}\,\bar\varepsilon_{\cdot j}) = \sigma^2$
Applying OLS regression to (7.4.9) will provide us with 'blue' estimators of $\beta_0$ and $\beta_1$. Applying OLS regression to transformed variables, as exemplified in the regressions (7.4.4) and (7.4.9), is called weighted OLS regression and is an example of generalized least squares (GLS) regression. We will learn more about GLS in more advanced courses in econometrics.
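As a final illustration, a loose sketch of the weighting in (7.4.9), again in Python with simulated group sizes (all names and numbers are our own):

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up group data: k groups, n_j observations behind each group mean
k = 30
n = rng.integers(5, 50, size=k)        # group sizes n_j
X = rng.uniform(1.0, 10.0, size=k)     # group-level regressor X_j

# Group means with Var(eps_bar_j) = sigma^2 / n_j, cf. (7.4.8); sigma = 1
Ybar = 1.0 + 0.5 * X + rng.normal(0.0, 1.0 / np.sqrt(n))

# Multiply through by sqrt(n_j) as in (7.4.9); note that the 'intercept'
# column of the transformed regression is sqrt(n_j) itself.
w = np.sqrt(n)
Z = np.column_stack((w, w * X))
coef, *_ = np.linalg.lstsq(Z, w * Ybar, rcond=None)
print(f"beta0_hat = {coef[0]:.4f}, beta1_hat = {coef[1]:.4f}")
```

In both sketches the weighting simply restores a constant disturbance variance before ordinary OLS is applied.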