8 Regression with auto-correlated disturbances

8.1 Introduction
8.2 Properties of least square estimators when the disturbances are auto-correlated
8.3 The Durbin-Watson test for auto-correlation
8.4 Generalized least squares
8.1 Introduction
In the last section we investigated the consequences for the least squares estimator when the disturbances ε_1, ε_2, ....., ε_n had varying variances but otherwise satisfied the classical standard conditions. Now we consider a problem we often encounter when analysing time series data, namely that the random disturbances are correlated. We already know that the disturbances also take care of the numerous factors which influence the endogenous variable but which, for various reasons, have not been explicitly specified in the regression equation. It is evident that some of these factors may show a definite temporal pattern, that is to say they are correlated over time. Since the disturbances summarize these factors, we intuitively realize that in many situations the disturbances in the regression equation will, as a consequence, be correlated. Since this is a breach of one of the classical assumptions of least squares regression, we expect that some of the properties we have learned the least squares estimator to have will not hold in this situation.
8.2 Properties of least square estimators when the disturbances are auto-correlated
Since our concern is consequences of auto-correlated disturbances we can as well consider a
regression with only one independent variable. That is to say, we specify the regression
equation:
(8.2.1)   Y_t = β_0 + β_1 X_t + ε_t,   t = 1, 2, 3, ....., T
where T denotes the size of the sample.
Regarding the auto-correlated disturbances {ε_t} there are two popular specifications in econometrics:
(8.2.2)   ε_t = ρ ε_{t−1} + u_t,   |ρ| < 1
(8.2.3)   ε_t = α_1 u_t + α_2 u_{t−1}
In case (8.2.2) the random process {ε_t} is said to be a first-order auto-regressive process, and is usually denoted by the obvious symbol AR(1). The second process {ε_t} is called a moving average process, and (8.2.3) specifies an MA(1) process. The number 1 appearing in these notations is meant to tell us that the processes {ε_t} are specified by using one lag only (t − 1). We also note that the process {u_t} appearing in these specifications is supposed to be a purely random process with mean zero and variance σ_u². In the present course we always assume that the disturbances {ε_t} are AR(1) processes. We also note that the condition |ρ| < 1 is important. It means that the process specified by (8.2.2) is a stationary process, which implies that the mean of ε_t is a constant independent of time t, and that the covariance of ε_t and ε_{t−s} depends only on the time-lag s and not on the time t. These properties are easily derived by using the recursion (8.2.2) to solve for ε_t in terms of the white noise process u_t.
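As a concrete illustration of such a stationary AR(1) process, here is a minimal simulation sketch in Python; the parameter values ρ = 0.7 and σ_u = 1 are chosen purely for illustration and are not taken from the text. The sample variance can be compared with the stationary value σ_u²/(1 − ρ²) derived below in (8.2.8).

```python
import numpy as np

# Minimal sketch: simulate the AR(1) disturbance eps_t = rho*eps_{t-1} + u_t, |rho| < 1,
# and check that its sample mean and variance settle at the stationary values.
rng = np.random.default_rng(0)
T, rho, sigma_u = 10_000, 0.7, 1.0          # illustrative values only

u = rng.normal(0.0, sigma_u, size=T)        # purely random process u_t
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]        # the recursion (8.2.2)

print("sample mean of eps_t:", eps.mean())                      # close to 0
print("sample variance     :", eps.var())                       # close to sigma_u^2/(1 - rho^2)
print("stationary variance :", sigma_u**2 / (1 - rho**2))
```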
However, the mean and variance of ε_t, and the various covariances of ε_t and ε_{t−s}, can also be derived directly from the specification (8.2.2). Using the standard formula for calculating the mean, we obtain from (8.2.2) that
(8.2.4)   E(ε_t) = ρ E(ε_{t−1}) + E(u_t)
Since {ε_t} describes a stationary process it follows from what we have said above that
E(ε_t) = E(ε_{t−1}) = μ, say, so that (8.2.4) reduces to
(8.2.5)   μ = ρμ + 0,   since E(u_t) = 0
From (8.2.5) it follows, since |ρ| < 1 so that 1 − ρ ≠ 0, that μ = E(ε_t) = 0.
The variance of ε_t is derived in a similar way. From the specification (8.2.2) we obtain
(8.2.6)   Var(ε_t) = E(ε_t²) = ρ² E(ε_{t−1}²) + E(u_t²) + 2ρ E(ε_{t−1} u_t)
Since ε_{t−1} and u_t are uncorrelated (why?) the last term of (8.2.6) will vanish. Hence, (8.2.6) reduces to
(8.2.7)   σ_ε² = ρ² σ_ε² + σ_u²
implying that
(8.2.8)   σ_ε² = σ_u² / (1 − ρ²)
Calculating the covariance between ε_t and ε_{t−1}, and generally between ε_t and ε_{t−s}, we use again the recursion (8.2.2). Obviously we have
(8.2.9)    Cov(ε_t, ε_{t−1}) = γ(1) = ρ Var(ε_{t−1}) = ρ σ_ε²
(8.2.10)   Cov(ε_t, ε_{t−2}) = γ(2) = ρ² σ_ε²
Proceeding in this way we will find that generally
(8.2.11)   Cov(ε_t, ε_{t−s}) = γ(s) = ρ^s σ_ε²
In time-series analysis the covariances calculated above will often be called auto-covariances. If we normalize the auto-covariances by the variance we obtain the auto-correlations ρ(s), so we realize that
(8.2.12)   ρ(s) = γ(s) / γ(0) = ρ^s
The interesting question is now whether the auto-correlated disturbances {ε_t} have any consequences for the least squares estimators of the regression parameters β_0 and β_1 appearing in regression (8.2.1). We know already that the OLS estimators of these parameters are given by
(8.2.13)   β̂_0 = Ȳ − β̂_1 X̄
(8.2.14)   β̂_1 = Σ_t Y_t (X_t − X̄) / Σ_t (X_t − X̄)²  =  β_1 + Σ_t ε_t (X_t − X̄) / Σ_t (X_t − X̄)²
Using the independence of {ε_t} of the exogenous variable {X_t}, we readily derive that
(8.2.15)   E(β̂_1) = β_1   and   E(β̂_0) = β_0
From (8.2.14) we also find that
(8.2.16)   Var(β̂_1) = E(β̂_1 − β_1)² = (σ_ε² / Σ x_t²) [ 1 + 2ρ (Σ x_t x_{t+1}) / (Σ x_t²) + 2ρ² (Σ x_t x_{t+2}) / (Σ x_t²) + ..... + 2ρ^{T−1} (x_1 x_T) / (Σ x_t²) ]
where x_t = X_t − X̄.
When the number of observations T grows, the sum Σ_{t=1}^{T} x_t² will become infinitely large, which implies that the right-hand side of (8.2.16) will tend to zero. Thus, the estimator β̂_1 will converge to β_1 in probability, and we say that β̂_1 is a consistent estimator. By similar reasoning we can show that β̂_0 is also a consistent estimator.
Combining these facts with (8.2.15) we conclude that the OLS estimators β̂_0 and β̂_1 are unbiased and consistent even though the disturbance process {ε_t} is auto-correlated. So what goes wrong in this situation? Well, we observe from (8.2.16) that auto-correlation changes the expression for the variance of β̂_1. Note that when the error process {ε_t} is purely random, ρ = 0 and (8.2.16) reduces to the standard expression for the variance of β̂_1. From this we understand that the standard t and F tests and the standard procedures for calculating confidence intervals are not valid when the disturbances are auto-correlated.
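The practical consequence can be illustrated with a small Monte Carlo sketch in Python. The design below (ρ = 0.8, a smooth trending regressor, 2000 replications) is an assumption made purely for illustration; it shows the usual formula σ̂²/Σ x_t², which ignores ρ, understating the actual sampling variance of β̂_1.

```python
import numpy as np

# Sketch: compare the nominal OLS variance of the slope estimator with its
# actual sampling variance when the disturbances follow an AR(1) process.
rng = np.random.default_rng(1)
T, rho, sigma_u = 100, 0.8, 1.0             # illustrative values only
beta0, beta1 = 1.0, 2.0
X = np.linspace(0.0, 10.0, T)               # smooth, positively auto-correlated regressor
x = X - X.mean()

b1_draws, nominal_vars = [], []
for _ in range(2000):
    u = rng.normal(0.0, sigma_u, size=T)
    eps = np.zeros(T)
    for t in range(1, T):
        eps[t] = rho * eps[t - 1] + u[t]    # AR(1) disturbances as in (8.2.2)
    Y = beta0 + beta1 * X + eps
    b1 = np.sum(Y * x) / np.sum(x**2)       # OLS slope, as in (8.2.14)
    resid = (Y - Y.mean()) - b1 * x         # residuals Y_t - b0_hat - b1_hat*X_t
    s2 = np.sum(resid**2) / (T - 2)
    b1_draws.append(b1)
    nominal_vars.append(s2 / np.sum(x**2))  # textbook variance formula ignoring rho

print("actual variance of beta1_hat :", np.var(b1_draws))
print("average nominal OLS variance :", np.mean(nominal_vars))
```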
8.3 The Durbin-Watson test for auto-correlation
We noted above that auto-correlation in the disturbance process can take several patterns. The auto-regressive and moving average patterns are perhaps the most common, but the error process can also be a combination of these two forms. Since the presence of auto-correlation can cause serious problems for applied statistical analyses, we should like to have reliable tests designed to expose this problem. A finite sample test derived for this purpose is the so-called Durbin-Watson test. The original D-W test is constructed to disclose the existence of a simple auto-regressive disturbance process, i.e. to disclose whether the error process {ε_t} has an AR(1) form. The specific model under consideration is
(8.3.1)   Y_t = β_0 + β_1 X_t + ε_t,   t = 1, 2, 3, ....., T
(8.3.2)   ε_t = ρ ε_{t−1} + u_t,   |ρ| < 1
As the null hypothesis the D-W test uses the hypothesis of no auto-correlation, so that H_0: ρ = 0, and in addition that the purely random error term {u_t} is normally distributed, u_t ~ N(0, σ_u²).
This hypothesis can be tested against the alternatives
H_1: ρ ≠ 0,   H_2: ρ < 0,   H_3: ρ > 0.
The construction of the D-W test is simple and very intuitive. One starts by regressing Y_t on X_t as indicated by (8.3.1). Having obtained the estimates β̂_0 and β̂_1 we calculate the residuals
(8.3.3)   ε̂_t = Y_t − Ŷ_t = Y_t − β̂_0 − β̂_1 X_t
Then the D-W test is based on the test statistic
(8.3.4)   d = Σ_{t=2}^{T} (ε̂_t − ε̂_{t−1})² / Σ_{t=1}^{T} ε̂_t²
We readily see that we have approximately
(8.3.5)   d ≈ 2(1 − ρ̂)
where ρ̂ is the OLS estimate of ρ obtained from the ‘regression’ (8.3.2), or
(8.3.6)   ρ̂ = Σ_{t=2}^{T} ε̂_t ε̂_{t−1} / Σ_{t=2}^{T} ε̂_{t−1}²
We observe that ρ̂ is almost equal to the empirical correlation coefficient r between ε̂_t and ε̂_{t−1} since
(8.3.7)   r = Σ_{t=2}^{T} ε̂_t ε̂_{t−1} / √( (Σ_{t=2}^{T} ε̂_t²)(Σ_{t=2}^{T} ε̂_{t−1}²) )
Hence, we can also write
(8.3.7)   d ≈ 2(1 − r)
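A minimal sketch in Python of how d in (8.3.4) and ρ̂ in (8.3.6) are computed from residuals, and of the approximation d ≈ 2(1 − ρ̂); the residuals here are generated from a simulated AR(1) process with ρ = 0.5, an illustrative choice rather than data from the text.

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic d, as in (8.3.4), from OLS residuals e_1, ..., e_T."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)

def rho_hat(e):
    """OLS estimate of rho from regressing e_t on e_{t-1}, as in (8.3.6)."""
    return np.sum(e[1:] * e[:-1]) / np.sum(e[:-1]**2)

# Illustrative residuals from a simulated AR(1) process (not data from the text).
rng = np.random.default_rng(2)
e = np.zeros(200)
for t in range(1, 200):
    e[t] = 0.5 * e[t - 1] + rng.normal()

d = durbin_watson(e)
print("d            :", d)
print("2*(1 - rho^) :", 2 * (1 - rho_hat(e)))   # close to d, as in (8.3.5)
```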
The value of the test statistic d depends on the observations of the exogenous variable X_t and the values of ε_t. However, Durbin and Watson showed that, for given values of ε_t, d is necessarily contained between two limits d_L and d_U which are independent of the values of X_t and are functions only of the number of observations T and the number of exogenous variables k, so that
(8.3.8)   d_L ≤ d ≤ d_U
The limits d_L and d_U are random variables whose distributions can be determined for each pair (k, T) under given assumptions on the distribution of {ε_t}. We noted above that under the null hypothesis {ε_t} is normally distributed with mean 0 and variance σ_u². The distributions of d_L and d_U under the null hypothesis have been tabulated by Durbin and Watson.
Since the correlation coefficient r is restricted to the interval [−1, 1], we observe from (8.3.7) that d is restricted (approximately) to the interval [0, 4]. This fact provides us with useful guidelines for when we shall reject the null hypothesis when testing against the various alternative hypotheses.
Suppose we wish to test the null hypothesis H_0: ρ = 0 against H_3: ρ > 0.
Since ρ̂ is approximately equal to r, it follows from (8.3.7) that we have every reason to be doubtful of the null hypothesis if the calculated value of the test statistic d is in the neighbourhood of zero. But whether or not to reject the null hypothesis has to be decided on the basis of the distributions of the two bounds d_L and d_U. The usual test procedure recommends that we first choose the level of significance α; for this level of significance we determine the relevant fractile values from the distributions of the two bounds, so that α = P(d_L ≤ d_1) and α = P(d_U ≤ d_2). The decision process is now:
If d̂ < d_1, reject H_0.
If d_1 ≤ d̂ ≤ d_2, we can neither reject nor accept H_0; the statistical material is indeterminate.
If d̂ > d_2, do not reject H_0.
In a similar way we can test H_0: ρ = 0 against H_2: ρ < 0, and finally against H_1: ρ ≠ 0.
Most textbooks give tables for the distributions of the two bounds, for example Hill et al., Table 5.
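The three-way decision rule can be written down directly. In the Python sketch below, the critical bounds d_1 and d_2 must be looked up in a Durbin-Watson table (for example Hill et al., Table 5) for the relevant k, T and α; the numbers in the example call are placeholders, not tabulated values.

```python
def dw_decision(d, d1, d2):
    """Decision rule for H0: rho = 0 against positive auto-correlation (H3).

    d1 and d2 are the lower and upper critical values taken from a
    Durbin-Watson table for the given k, T and significance level alpha.
    """
    if d < d1:
        return "reject H0"
    if d <= d2:
        return "inconclusive: neither reject nor accept H0"
    return "do not reject H0"

# Placeholder bounds for illustration only -- use real tabulated values.
print(dw_decision(d=0.9, d1=1.5, d2=1.6))
```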
8.4 Generalized least squares
When the random disturbances {ε_t} follow an AR(1) process, we have seen that the disturbances are correlated ((8.2.9) – (8.2.11)). This is a breach of the classical conditions underpinning the ordinary least squares method. Although OLS gives us unbiased and consistent estimators when the disturbances are auto-correlated, it is evidently possible to find better and more convenient estimating methods. Generalized least squares is such a method, which we will illustrate in this section by applying it to the regression model specified by (8.2.1) and (8.2.2).
If we shift the time index one period backwards and multiply by ρ, the regression (8.2.1) is transformed to
(8.4.1)   ρY_{t−1} = ρβ_0 + ρβ_1 X_{t−1} + ρε_{t−1}
Subtracting (8.4.1) from (8.2.1), and noting from (8.2.2) that ε_t − ρε_{t−1} = u_t, we obtain
(8.4.2)   Y_t − ρY_{t−1} = (1 − ρ)β_0 + β_1 (X_t − ρX_{t−1}) + u_t
If we define the variables
(8.4.3)   Ỹ_t = Y_t − ρY_{t−1},   X̃_t = X_t − ρX_{t−1}
we can write (8.4.2) as
(8.4.4)   Ỹ_t = (1 − ρ)β_0 + β_1 X̃_t + u_t
If we know ρ, then Ỹ_t and X̃_t are observable and (8.4.4) turns out to be an ordinary linear regression with a disturbance u_t satisfying all the classical conditions. However, it follows from the definition of Ỹ_t and X̃_t that the time index has to run from 2 up to T, so restricting our analysis to (8.4.4) we in effect lose one observation and hence some efficiency in the estimation. But the ‘good’ situation is easily recovered. For the first observation the regression satisfies
(8.4.5)   Y_1 = β_0 + β_1 X_1 + ε_1
The variance of ε_1 is σ_ε² = σ_u² / (1 − ρ²), where, of course, σ_u² is the variance of u_t. Hence, if we multiply the first observation by √(1 − ρ²), that is
(8.4.6)   √(1 − ρ²) Y_1 = (√(1 − ρ²)) β_0 + β_1 (√(1 − ρ²)) X_1 + √(1 − ρ²) ε_1
we observe that the variance of the random error appearing in (8.4.6) is simply σ_u², that is, equal to the variance of u_t, since Var(√(1 − ρ²) ε_1) = (1 − ρ²) σ_u² / (1 − ρ²) = σ_u². So if we supplement regression (8.4.4) with (8.4.6) as the first observation, we get an extended regression which uses all T observations and has independent and homoskedastic disturbances. We observe that this extended regression now has two explanatory variables: the first is Ĩ = (√(1 − ρ²), 1 − ρ, 1 − ρ, ........, 1 − ρ), and the vector of the second explanatory variable is, of course, X̃ = (√(1 − ρ²) X_1, X̃_2, X̃_3, ......., X̃_T).
The method we have described above is in effect an application of generalized least squares. Generalized least squares will always involve some kind of transformation of the observable variables. Above we have tacitly assumed that the parameter ρ is known; usually it is not. When ρ is not known it has to be estimated in some way in order to be able to use the transformations above. An approach often used is to start by running the regression (8.2.1) and then calculate the residuals
(8.4.7)   ε̂_t = Y_t − Ŷ_t = Y_t − β̂_0 − β̂_1 X_t
Then one can estimate ρ by running the regression of ε̂_t on ε̂_{t−1} (of course without an intercept term). Having obtained an estimate ρ̂ of ρ, one proceeds as above.
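The two-step procedure just described can be sketched as follows in Python; the simulated data and the parameter values are illustrative assumptions only, while the transformation itself follows (8.4.4), (8.4.6) and (8.4.7).

```python
import numpy as np

# Sketch of the two-step procedure described above: estimate rho from the OLS
# residuals, then run OLS on the transformed data (8.4.4) with the rescaled
# first observation (8.4.6). All numerical values are illustrative only.
rng = np.random.default_rng(3)
T, rho_true, beta0, beta1 = 200, 0.6, 1.0, 2.0

X = rng.normal(size=T).cumsum()             # some exogenous series
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + rng.normal()
Y = beta0 + beta1 * X + eps                 # data generated from (8.2.1)-(8.2.2)

def ols(y, Z):
    """Ordinary least squares coefficients of y on the columns of Z."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# Step 1: OLS on the original data, residuals (8.4.7), and rho estimated by
# regressing eps_hat_t on eps_hat_{t-1} without an intercept.
Z = np.column_stack([np.ones(T), X])
b_ols = ols(Y, Z)
e = Y - Z @ b_ols
rho = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1]**2)

# Step 2: transform the data as in (8.4.3)-(8.4.4) and (8.4.6), then rerun OLS.
Y_til = np.empty(T)
X_til = np.empty(T)
I_til = np.empty(T)                         # the transformed 'intercept' column
Y_til[0] = np.sqrt(1 - rho**2) * Y[0]
X_til[0] = np.sqrt(1 - rho**2) * X[0]
I_til[0] = np.sqrt(1 - rho**2)
Y_til[1:] = Y[1:] - rho * Y[:-1]
X_til[1:] = X[1:] - rho * X[:-1]
I_til[1:] = 1 - rho

b_gls = ols(Y_til, np.column_stack([I_til, X_til]))
print("feasible GLS estimates (beta0, beta1):", b_gls)
```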