Lecture 18 - BYU Department of Economics

Econ 388 R. Butler 2014 revisions lecture 18
I. Transformations on Highly Persistent Time Series
A. Weakly dependent time series are integrated of order 0, denoted I(0); nothing needs to
be done to such series before we invoke large-sample niceness over them.
B. Random walks (with or without drift) are integrated of order 1, I(1); they need to
be transformed before using them, usually by first differencing them.
An AR(1) stochastic process, $y_t = \rho y_{t-1} + e_t$, is I(0) if $|\rho| < 1$, and I(1) if $\rho = 1$.
You would think that the regression of $y_t$ on its lagged value would indicate whether it is
I(1) or I(0), but when the process is a random walk (I(1)), the estimated $\rho$ can be shown
to be downward biased. This is the essence of the "testing for unit roots" problem. (The
bias tends to go away as the sample size goes to infinity.) The rule of thumb given in the
book is that if the estimated $\rho$ is .9 or higher, then you should difference the variables of
interest in the equation before estimating the model.
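A minimal Stata sketch of this check, assuming tsset data with a time variable year and hypothetical series y and x1 (dfuller is Stata's augmented Dickey-Fuller unit-root test):
tsset year
regress y L.y              // inspect the coefficient on the lag
display _b[L.y]            // rule of thumb: .9 or higher suggests differencing
dfuller y                  // formal unit-root test of H0: rho = 1
regress D.y D.x1           // if so, estimate the model in first differences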
II. Autocorrelation or Serial Correlation: $E(\varepsilon_t \varepsilon_s) \neq 0$ when $s \neq t$.
A. Introduction
One of the most common violations of the regression assumptions in time series
models is the presence of autocorrelated random disturbances. In general,
autocorrelated errors do not affect the consistency or unbiasedness of the least
squares estimators (if it is "pure" autocorrelation, and not due to an omitted variable; see
chapter 11 of Wooldridge for one instance in which there is bias with lagged values of the
dependent variable); but autocorrelation does tend to invalidate t-tests and F-tests, because
standard errors won't be correctly measured with OLS. The case of positive
autocorrelation might be depicted as follows:
[Figure: plot of $y_t$ against $x_t$ illustrating positively autocorrelated disturbances around the regression line.]
Note that positive random disturbances tend to be followed by positive random
disturbances and negative random disturbances tend to be followed by negative random
disturbances. Thus, we are faced with a situation in which the off-diagonal elements of
the variance/covariance matrix are nonzero; therefore $E(\varepsilon_i \varepsilon_j) \neq 0$ for some $i \neq j$, and the least squares
estimator of $\beta$ will not equal the MLE or BLUE of $\beta$, and is therefore not a
minimum variance estimator.
Possible causes of autocorrelated random disturbances include omitting a
relevant variable, selecting the incorrect functional form, or a correctly
specified model whose error terms happen to be correlated. Since the problem with autocorrelation only
occurs when the data are naturally ordered--i.e., with time series data but not with cross
section data--we'll use a "t" (for time period) subscript for variables coming from year "t".
The $V(\varepsilon)$ matrix, $\Sigma$, contains $n(n+1)/2$ distinct elements (for example, with
$n = 100$ observations that is $100 \cdot 101/2 = 5050$ distinct elements to estimate). In the
context of the generalized regression model, we lack sufficient data to obtain separate
independent estimates for each of the $\mathrm{Cov}(\varepsilon_i, \varepsilon_j)$. In order to circumvent this problem
we frequently assume that the $\varepsilon$'s are related in such a manner that fewer parameters
describe the process. One such model which provides an accurate approximation for
many time series (this is the most common stochastic process) is the first order
autoregressive process

$$\varepsilon_t = \rho \varepsilon_{t-1} + e_t$$

where the $e_t$ are assumed to be independently and identically distributed as $N(0, \sigma_e^2)$, i.e.,
they are a white noise stochastic process. Recall that for this stochastic process we
have shown that

$$E(\varepsilon_t) = 0, \qquad \mathrm{Var}(\varepsilon_t) = \sigma^2 = \frac{\sigma_e^2}{1-\rho^2}, \qquad \mathrm{Cov}(\varepsilon_t, \varepsilon_{t-s}) = \rho^s \frac{\sigma_e^2}{1-\rho^2}.$$
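A one-line derivation of the variance, using the stationarity of $\varepsilon_t$ (so that $\mathrm{Var}(\varepsilon_t) = \mathrm{Var}(\varepsilon_{t-1}) = \sigma^2$) and the independence of $e_t$ from $\varepsilon_{t-1}$:

$$\mathrm{Var}(\varepsilon_t) = \rho^2 \mathrm{Var}(\varepsilon_{t-1}) + \mathrm{Var}(e_t) \;\Longrightarrow\; \sigma^2 = \rho^2 \sigma^2 + \sigma_e^2 \;\Longrightarrow\; \sigma^2 = \frac{\sigma_e^2}{1-\rho^2}.$$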
We observe that the random disturbances $\varepsilon_t$ are characterized by constant variance
(homoskedasticity) and are uncorrelated if and only if $\rho = 0$, in which case
$\varepsilon_t = e_t$. We also note that since

$$\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-1}) = E(\varepsilon_t \varepsilon_{t-1}) = \rho \frac{\sigma_e^2}{1-\rho^2},$$

we expect positive random disturbances to be followed by positive random disturbances
if $\rho > 0$.
Based upon the assumption that the process $\varepsilon_t$ is a first order process, we can write
the associated variance covariance matrix as

$$\Sigma = \frac{\sigma_e^2}{1-\rho^2}
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & \rho & \cdots & \rho^{n-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1
\end{pmatrix}.$$

$\Sigma$ is now completely characterized by the two parameters $\rho$ and $\sigma_e^2$, and the estimation
problem is considerably simplified.
B. t-test from a residual regression. If the regressors are strictly exogenous, then a
two stage procedure is to 1) get the residuals from the OLS regression, and then 2) run the
regression

$$\hat{\varepsilon}_t = \rho \hat{\varepsilon}_{t-1} + e_t$$

without an intercept (though it would also work with an intercept, asymptotically), and
test whether $\rho = 0$ using the usual t-statistic on $\hat\rho$. STATA code:
regress y x1 x2 x3 x4
predict resids, residuals
gen lag_resids = resids[_n-1]     // be sure to use square brackets
regress resids lag_resids, noconstant
regress resids lag_resids         // just so you can see that having a constant doesn't matter
SAS code:
proc reg; model y = x1 x2 x3 x4;
output out=next_to r=resids; run;
data diane; set next_to;
lag_resids = lag(resids); *lag2(.) for lagging twice, etc.; run;
proc reg; model resids = lag_resids / noint; run;
C. Durbin-Watson Test (chapter 12)
A related test, the most common test for autocorrelated errors (it tests whether the
errors follow an AR(1) process), is the Durbin-Watson test. It is defined by
$$\mathrm{D.W.} = \frac{\sum_{t=2}^{n} (\hat\varepsilon_t - \hat\varepsilon_{t-1})^2}{\sum_{t=1}^{n} \hat\varepsilon_t^2}$$
where $\hat\varepsilon_t$ denotes the least squares residual, the estimate of the random disturbance $\varepsilon_t$. This
expression can be written in a useful alternative form by noting that (writing $e_t = \hat\varepsilon_t$ for
purposes of this derivation):
$$\sum_{t=2}^{n} (e_t - e_{t-1})^2 = \sum_{t=2}^{n} e_t^2 + \sum_{t=2}^{n} e_{t-1}^2 - 2\sum_{t=2}^{n} e_t e_{t-1}.$$

Since $\sum_{t=2}^{n} e_t^2 = \sum_{t=1}^{n} e_t^2 - e_1^2$ and $\sum_{t=2}^{n} e_{t-1}^2 = \sum_{t=1}^{n} e_t^2 - e_n^2$, this equals

$$2\left(\sum_{t=1}^{n} e_t^2 - \sum_{t=2}^{n} e_t e_{t-1}\right) - e_1^2 - e_n^2;$$

hence,

$$\mathrm{D.W.} = \frac{2\left(\sum_{t=1}^{n} e_t^2 - \sum_{t=2}^{n} e_t e_{t-1}\right) - e_1^2 - e_n^2}{\sum_{t=1}^{n} e_t^2} = 2(1 - \hat\rho) - \frac{e_1^2 + e_n^2}{\sum_{t=1}^{n} e_t^2}$$

where

$$\hat\rho = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2},$$

so that $\mathrm{D.W.} \approx 2(1 - \hat\rho)$, where $\hat\rho$ denotes an estimator of $\rho$.
From this expression we note that if $\rho = 0$, we would expect $\hat\rho$ to be "close" to
zero and the value of the Durbin-Watson statistic (DW) to be close to two. Because the
exact distribution of DW depends on the data, Durbin and Watson derived the
distributions of two statistics which bound DW. Consequently, the reported critical
regions for the hypothesis $\rho = 0$ (derived from the distributions of the bounds) may
appear somewhat peculiar, as illustrated by the following figure.
[Figure: densities of the lower bound distribution, the true distribution, and the upper bound distribution of the Durbin-Watson statistic, with critical values $d_L$, $d_{true}$, and $d_U$ marked near 2 on the horizontal axis.]
The values of $d_L$ (the lower bound, for, say, the five percent level) and $d_U$ (the upper bound, for,
say, the five percent level) define the critical region bounds (around the true value for the
5 percent level, $d_{true}$), and are tabulated according to the critical level ($\alpha$ level), sample
size ($n$), and number of non-intercept coefficients in the model ($k$). The tables have been
extended to cover additional sample sizes and numbers of explanatory variables by Savin
and White [Econometrica, 1977]. The null hypothesis $H_0$: $\rho = 0$ is rejected if

D.W. $< d_L$ or D.W. $> 4 - d_L$.

We fail to reject the hypothesis if

$d_U <$ D.W. (for positive autocorrelation, and D.W. $< 4 - d_U$ for negative
autocorrelation),

and the test is inconclusive if

$d_L <$ D.W. $< d_U$ (or $4 - d_U <$ D.W. $< 4 - d_L$ for negative autocorrelation).
STATA will calculate the exact DW statistic for you in its time series program that
does regression with first order autoregressive errors (the "prais" procedure):
prais y x1 x2 x3          // this does Prais-Winsten estimation using the T2 transformation below
In SAS, just use the DW option in proc reg:
proc reg; model y = x1 x2 x3 / dw; run; /* OR: proc autoreg; model y = x1 x2 x3 / dw=1 dwprob; run; */
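To see the formula at work, here is a minimal Stata sketch that computes D.W. directly from the residuals; the variable names (y, x1, x2) are hypothetical, and estat dwatson reports the built-in version after a regression on tsset data:
quietly regress y x1 x2
predict e, residuals
gen d2 = (e - e[_n-1])^2      // squared successive differences (missing for t = 1)
gen e2 = e^2
quietly summarize d2
scalar num = r(sum)
quietly summarize e2
scalar denom = r(sum)
display "DW by hand = " num/denom
estat dwatson                 // built-in Durbin-Watson after regress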
In general, the Durbin-Watson test is not strictly appropriate for models that include lagged
dependent variables; in this case one should use the Durbin h statistic (see
Durbin, Econometrica, 1970; mentioned, but not given, on p. 384 of Wooldridge) or the
simple procedure mentioned in chapter 12 of Wooldridge: the same as the t-test above,
but including the independent regressors in the second stage as well:

$$\hat\varepsilon_t = \rho \hat\varepsilon_{t-1} + \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + e_t$$

and test whether $\rho = 0$. This test can be readily extended to AR(q) processes, using an
F-test for the joint significance of several $\hat\rho_i$ coefficients (see Wooldridge, chapter 12)
or the LM test described just after that.
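A minimal Stata sketch of this second-stage regression, with hypothetical regressors x1 and x2 on tsset data (estat bgodfrey runs the closely related Breusch-Godfrey LM test after regress):
quietly regress y x1 x2
estat bgodfrey, lags(2)       // built-in Breusch-Godfrey LM test for serial correlation
predict e, residuals
regress e L.e x1 x2           // t-statistic on L.e tests for AR(1) errors
regress e L.e L2.e x1 x2      // AR(2) version
test L.e L2.e                 // joint F-test for AR(2) serial correlation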
III. Estimation with Autocorrelated Errors
For applications in which the hypothesis of no autocorrelation is rejected, we may
want to obtain maximum likelihood estimators of the vector $\beta$. These will be obtained by
proceeding in the same manner as in the case of heteroskedasticity, i.e., we will attempt
to transform the model so that the transformed random disturbances satisfy the
assumptions of the linear regression model and then apply least squares.
Consider the model

$$y_t = X_t \beta + \varepsilon_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_k x_{kt} + \varepsilon_t$$

where

$$\varepsilon_t = \rho \varepsilon_{t-1} + e_t, \qquad t = 1, 2, \ldots, n.$$

Replacing the $t$ in the expression for $y_t$ by $t-1$ and multiplying by $\rho$ we obtain

$$\rho y_{t-1} = \rho X_{t-1}\beta + \rho\varepsilon_{t-1} = \rho\beta_0 + \rho\beta_1 x_{1,t-1} + \cdots + \rho\beta_k x_{k,t-1} + \rho\varepsilon_{t-1}.$$

Subtracting $\rho y_{t-1}$ from $y_t$ yields

$$y_t - \rho y_{t-1} = \beta_0(1-\rho) + \beta_1(x_{1t} - \rho x_{1,t-1}) + \cdots + \beta_k(x_{kt} - \rho x_{k,t-1}) + (\varepsilon_t - \rho\varepsilon_{t-1})$$

or

$$y_t^* = \beta_0(1-\rho) + \beta_1 x_{1t}^* + \cdots + \beta_k x_{kt}^* + e_t$$

where

$$y_t^* = y_t - \rho y_{t-1}, \qquad t = 2, \ldots, n$$
$$x_{it}^* = x_{it} - \rho x_{i,t-1}, \qquad t = 2, \ldots, n, \quad i = 1, \ldots, k.$$
Note that we have $(n-1)$ observations on $y^*$ and the $x_i^*$. The random disturbance term
associated with the transformed equation satisfies the usual regression assumptions. The
transformed data matrices are given by

$$y^* = \begin{pmatrix} y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix} = \begin{pmatrix} -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = T_1 y$$

and

$$X^* = \begin{pmatrix} 1-\rho & x_{12} - \rho x_{11} & \cdots & x_{k2} - \rho x_{k1} \\ 1-\rho & x_{13} - \rho x_{12} & \cdots & x_{k3} - \rho x_{k2} \\ \vdots & \vdots & & \vdots \\ 1-\rho & x_{1n} - \rho x_{1,n-1} & \cdots & x_{kn} - \rho x_{k,n-1} \end{pmatrix} = T_1 X$$
A common technique of estimation is then based upon applying least squares to

$$y^* = X^*\beta + e$$

or

$$y_t - \rho y_{t-1} = (X_t - \rho X_{t-1})\beta + e_t, \qquad t = 2, \ldots, n.$$

Several comments need to be made about this approach. First, $\rho$ is generally not known
and an estimate of $\rho$ will need to be used. Second, note that the intercept in the transformed
equation is $\beta_0(1-\rho)$, and hence the final estimate of the intercept must be divided by
$(1 - \hat\rho)$ in order to recover an estimate of $\beta_0$. Finally, we need to mention that even if $\rho$ is
known, this estimator of $\beta$ will not be identically equal to the MLE of $\beta$ because $n-1$
observations are used rather than $n$ observations, i.e., we are not using all of the sample
information.
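A minimal Stata sketch of this two-step (Cochrane-Orcutt style) procedure by hand, assuming tsset data and hypothetical variables y and x1; note the intercept recovery in the last line:
quietly regress y x1
predict e, residuals
quietly regress e L.e, noconstant
scalar rho = _b[L.e]                      // first-stage estimate of rho
gen ystar = y - rho*L.y                   // quasi-differenced data (loses observation 1)
gen xstar = x1 - rho*L.x1
regress ystar xstar
display "intercept estimate: " _b[_cons]/(1 - rho)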
The MLE of $\beta$ can be obtained by noting that

$$\sqrt{1-\rho^2}\, y_1 = \sqrt{1-\rho^2}\, X_1 \beta + \sqrt{1-\rho^2}\, \varepsilon_1 \qquad \text{(transformation of the first observation)}$$

where

$$\sqrt{1-\rho^2}\, \varepsilon_1 \sim N(0, \sigma_e^2)$$

and then applying least squares to the transformed equation

$$y^{**} = X^{**}\beta + \varepsilon^*$$

where
$$y^{**} = \begin{pmatrix} \sqrt{1-\rho^2}\, y_1 \\ y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix} = T_2 y$$

and

$$X^{**} = \begin{pmatrix} \sqrt{1-\rho^2} & \sqrt{1-\rho^2}\, x_{11} & \cdots & \sqrt{1-\rho^2}\, x_{k1} \\ 1-\rho & x_{12} - \rho x_{11} & \cdots & x_{k2} - \rho x_{k1} \\ \vdots & \vdots & & \vdots \\ 1-\rho & x_{1n} - \rho x_{1,n-1} & \cdots & x_{kn} - \rho x_{k,n-1} \end{pmatrix} = T_2 X$$
where

$$T_2 = \begin{pmatrix} \sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 & 0 \\ -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix},$$

i.e., $T_2$ stacks the row $(\sqrt{1-\rho^2}, 0, \ldots, 0)$ on top of $T_1$.
Note:
(1) $T_2$ is $n \times n$ whereas $T_1$ is $(n-1) \times n$; hence, $y^{**}$ is $n \times 1$ and $y^*$ is $(n-1) \times 1$. $T_2$
is the Prais-Winsten transformation, and is probably preferred over the $T_1$ (Cochrane-Orcutt)
transformation, especially for small samples. SAS uses a Yule-Walker
transformation (a Prais-like transformation) in its proc autoreg.
(2) In cases in which $\rho$ is known, the above procedures are relatively straightforward.
When $\rho$ is not known, alternative techniques have been developed. A common
technique can be outlined as follows:
(a) Estimate $\hat\rho$ as discussed above when making the t-tests for first order
autocorrelation.
(b) Transform the data using $\hat\rho$ instead of $\rho$; either $T_1$ or $T_2$ can be used. Apply least squares
to the transformed data. The associated estimators are referred to as two stage estimators.
(Don't confuse this with the two stage "least squares" estimator, which will be discussed
later.)
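Stata's prais command packages this procedure; a minimal sketch with hypothetical variables, where the corc option switches from the $T_2$ (Prais-Winsten) to the $T_1$ (Cochrane-Orcutt) transformation:
tsset year
prais y x1 x2, twostep        // two-stage Prais-Winsten (T2), as outlined above
prais y x1 x2, corc twostep   // two-stage Cochrane-Orcutt (T1, drops the first observation)
prais y x1 x2                 // default: iterate the estimate of rho to convergence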
What we said about oblique projections using GLS at the end of lecture 17 applies
here as well.