```Econ 388 R. Butler 2014 revisions Lecture 17
I. Some important definitions
A time series is a collection of random variables, ordered in time, and also known as a
stochastic (Greek for “pertaining to chance”) process. An example would be:
$x_{t_1}, x_{t_2}, x_{t_3}, x_{t_4}, \ldots, x_{t_m}$, where in the data we have just a single realization for each of the $t_j$
time periods. Hence, we cannot replicate our data. Also, the time series is often not
independent. Because of the lack of replication and the lack of independence, we often
need to impose some restrictive assumptions on the stochastic process. Those
assumptions usually concern the mean, variance, and covariances of the stochastic
process. Suppose that you look at the annual real price of gold from 1880 to 1909, a
sample of 30 time series observations (which you can get from the Historical Statistics of
the United States, volume 1), so n=30. Then pick any other 30-year period (1910 to
1939, or 1895 to 1924, or 1940 to 1969, to name just a few examples), and see whether it
has the same mean, the same variance, and the same covariances. The next three
definitions focus on the properties of these series when we choose different time
subscripts.
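The window comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions. The helper name `window_moments` is hypothetical. It divides by n rather than n − 1. We do not reproduce the gold price data here, so the series below is simulated white noise standing in for 90 annual observations.

```python
import random
import statistics

def window_moments(x, start, n):
    """Sample mean, variance and lag-1 autocovariance of the window x[start:start+n].

    Variance and autocovariance divide by n (not n - 1), a common time series convention.
    """
    w = x[start:start + n]
    m = statistics.fmean(w)
    var = sum((v - m) ** 2 for v in w) / n
    cov1 = sum((w[i] - m) * (w[i + 1] - m) for i in range(n - 1)) / n
    return m, var, cov1

# Hypothetical stand-in data: 90 simulated draws, NOT actual gold prices.
random.seed(1)
series = [random.gauss(0.0, 1.0) for _ in range(90)]

# Compare three non-overlapping 30-period windows, as in the gold price exercise.
for start in (0, 30, 60):
    m, v, c = window_moments(series, start, 30)
    print(f"periods {start:2d}-{start + 29:2d}: mean={m:+.3f}  var={v:.3f}  cov(lag 1)={c:+.3f}")
```

For a stationary series, the three rows should differ only by sampling noise. A trending series, such as a running sum of the same draws, would make the window means drift apart.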
A. (Strict) stationary stochastic process: $x_{t_1}, x_{t_2}, x_{t_3}, x_{t_4}, \ldots, x_{t_m}$ has the same joint distribution
as the shifted series $x_{t_1+h}, x_{t_2+h}, x_{t_3+h}, x_{t_4+h}, \ldots, x_{t_m+h}$ for any integer value of $h$ (as long as
the series exists). This means that the series is identically distributed, with the same
correlations between similarly spaced (in time) terms:
$$\operatorname{corr}(x_{t_j}, x_{t_l}) = \operatorname{corr}(x_{t_j+h}, x_{t_l+h})$$
for all integer values of $h$, and the same higher order moments as well. The distribution is
the same whatever period we examine (whatever 30-year period we happen to use for our
gold prices).
B. (Weak or) covariance stationary stochastic process: the process has a finite second moment
and 1) a constant mean, 2) a constant variance, and 3) a covariance between any two
terms that depends only on how far apart they are and not on $t$; that is,
$$\operatorname{cov}(x_{t_j}, x_{t_l}) = \operatorname{cov}(x_{t_j+h}, x_{t_l+h})$$
depends only on how many periods apart the $t_j$ and $t_l$ terms are, and not on the value of $h$.
Since the normal distribution is completely characterized by its means and covariances,
a weakly stationary process that is also normally distributed is strictly stationary.
Stationarity of either type makes it easier to employ the law of large numbers and the
central limit theorem when working with time series data.
C. Weak dependence: $x_t$ and $x_{t+h}$ are weakly dependent if they are almost independent as
$h$ gets “large”; that is, as $h$ increases without bound, the correlation between $x_t$ and $x_{t+h}$
goes to zero.
II. Four fun cases to use our definitions on
A. White noise (purely random stochastic process)
$e_t$
such that there is a constant mean (say zero), a constant variance (say $\sigma^2$), and all of the
covariances (other than the variance) equal zero. “White noise” is essentially the
assumption we make in the cross-sectional models studied earlier, where the subscript
varied across individuals rather than over time. Since the covariances are all zero
(constants, independent of time), white noise is a stationary stochastic process. It is also
weakly dependent; more strongly, white noise terms are uncorrelated at every lag $h \ge 1$, not just as $h$ gets large.
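To see the zero covariances in data, the sketch below simulates white noise and computes its sample autocorrelations. It assumes Gaussian draws, although white noise need not be normal. The helper `sample_autocorr` is hypothetical, and only Python's standard `random` module is used.

```python
import random

def sample_autocorr(x, h):
    """Sample autocorrelation of the sequence x at lag h >= 1 (divides by n throughout)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    ch = sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n
    return ch / c0

random.seed(0)
e = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # Gaussian white noise with sigma = 1

for h in (1, 2, 5):
    # Each value should be near zero, within a sampling error of roughly 1/sqrt(n) = 0.01.
    print(h, round(sample_autocorr(e, h), 3))
```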
B. Random Walk
$$y_t = y_{t-1} + e_t$$
where $e_t$ is a white noise stochastic process, $y_t$ is known as a random walk. If we keep
substituting for lagged values of $y$, we get:
$$y_t = (y_{t-2} + e_{t-1}) + e_t = ([y_{t-3} + e_{t-2}] + e_{t-1}) + e_t = \cdots = e_t + e_{t-1} + e_{t-2} + \cdots + e_1 + y_0$$
so that $E(y_t) = y_0$. (Often, $y_0$ is assumed to be zero.) The variance of a random walk
is the sum of the variances of the $e$ terms (why?): $\operatorname{Var}(y_t) = \sigma^2 t$, where $t$ is the number of time periods.
Since the variance is not constant, a random walk is an example of a stochastic process
that is not stationary.
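We cannot replicate a single historical series, but we can replicate a simulated one. The sketch below generates many independent random walks and computes the variance of $y_t$ across replications at a few dates. It assumes $\sigma = 1$, $y_0 = 0$ and Gaussian errors. The function names are hypothetical.

```python
import random

def random_walk(n_steps, sigma=1.0, y0=0.0, rng=random):
    """Return y_1, ..., y_n for y_t = y_{t-1} + e_t with e_t ~ N(0, sigma^2) and start value y0."""
    path, y = [], y0
    for _ in range(n_steps):
        y += rng.gauss(0.0, sigma)
        path.append(y)
    return path

def cross_section_var(paths, t):
    """Sample variance of y_t across independent replications (paths)."""
    vals = [p[t - 1] for p in paths]  # p[t - 1] holds y_t
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)

rng = random.Random(42)
paths = [random_walk(50, rng=rng) for _ in range(4000)]

for t in (10, 25, 50):
    print(t, round(cross_section_var(paths, t), 2), "theory:", t)  # sigma^2 * t with sigma = 1
```

The simulated variance should track $\sigma^2 t$ and grow linearly with $t$, which is exactly the non-stationarity described above.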
Is it weakly dependent in the sense that $\operatorname{corr}(y_t, y_{t+h})$ goes to zero as $h$ gets large, for any
given value of $t$? Recall that $\operatorname{corr}(y_t, y_{t+h}) = \operatorname{cov}(y_t, y_{t+h}) / \sqrt{\operatorname{Var}(y_t)\operatorname{Var}(y_{t+h})}$, so we need to
look at each term separately and then put them together. We know from above that
$\operatorname{Var}(y_t) = \sigma^2 t$ and that $\operatorname{Var}(y_{t+h}) = \sigma^2 (t+h)$. Finally, the covariance is
$E[(y_t - E(y_t))(y_{t+h} - E(y_{t+h}))]$. Assuming $y_0 = 0$, we get
$$\operatorname{cov}(y_t, y_{t+h}) = E(y_t y_{t+h}) = E([e_t + e_{t-1} + \cdots + e_1]\{e_{t+h} + e_{t+h-1} + \cdots + e_t + e_{t-1} + \cdots + e_1\}).$$
Because the $e$'s are white noise, every cross product with differing subscripts has zero
expected value, so only the own-product terms (those with the same subscripts) remain. The
expected value of each of these is simply the variance $\sigma^2$, and there are $t$ of them. Hence, we have
$\operatorname{cov}(y_t, y_{t+h}) = t\sigma^2$. Now putting this all together, we have
$$\operatorname{corr}(y_t, y_{t+h}) = \frac{t\sigma^2}{\sqrt{\sigma^2 t \cdot \sigma^2 (t+h)}} = \sqrt{\frac{t}{t+h}}$$
For any fixed $t$, this correlation does go to zero as $h$ grows, but only slowly and not uniformly in $t$.
No matter how big we make $h$, we can always make $t$ large enough that the correlation is close to one.
Hence, a random walk is not weakly dependent (the terms are not asymptotically uncorrelated).
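A short Python sketch evaluates the formula $\sqrt{t/(t+h)}$ on a small grid of $t$ and $h$. It uses only the expression derived above. The function name `rw_corr` is hypothetical.

```python
import math

def rw_corr(t, h):
    """corr(y_t, y_{t+h}) = sqrt(t / (t + h)) for a random walk with y_0 = 0."""
    return math.sqrt(t / (t + h))

lags = (1, 10, 100, 10_000)
print("rows: t; columns: h =", lags)
for t in (10, 100, 10_000):
    print(f"{t:>6}", [round(rw_corr(t, h), 3) for h in lags])
```

Read across a row (fixed $t$) and the correlation falls as $h$ grows. Read down a column and it climbs back toward one as $t$ grows. For example, at $t = 10{,}000$ and $h = 100$ it is still about 0.995.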
Another way to look at the absence of weak dependence is to consider the expected value
of y at time t+h, given the value of y at time t. That is, does the current value of y tell us
anything about the future value of y, even when h is very large? The answer is yes,
because applying the same recursive substitution to $y_{t+h}$, stopping at $y_t$, yields
$$y_{t+h} = e_{t+h} + e_{t+h-1} + e_{t+h-2} + \cdots + e_{t+1} + y_t$$
so that $E(y_{t+h} \mid y_t) = y_t$. Today, at time $t$, my best guess of the value of $y$ $h$ periods from
today is just today’s value. This is one reason Wooldridge calls the random walk a
highly persistent time series. Another closely related non-stationary process
is the random walk with drift:
$$y_t = \alpha_0 + y_{t-1} + e_t$$
which, by successive substitution as before, can be rewritten as
$$y_t = \alpha_0 t + e_t + e_{t-1} + \cdots + e_1 + y_0$$
In this case, not only is the variance increasing over time, but the mean, $E(y_t) = \alpha_0 t + y_0$,
also changes with $t$ (it grows when $\alpha_0 > 0$). Again, the process is neither stationary nor weakly dependent.
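The sketch below simulates many random walks with drift. It checks that both the mean and the variance of $y_T$ move with $T$, as the substitution above implies. The drift $\alpha_0 = 0.5$, start value $y_0 = 2$, horizon $T = 40$ and Gaussian errors with $\sigma = 1$ are hypothetical choices, and the function name is illustrative.

```python
import random

def rw_with_drift(alpha0, n_steps, sigma=1.0, y0=0.0, rng=random):
    """Return y_1, ..., y_n for y_t = alpha0 + y_{t-1} + e_t with e_t ~ N(0, sigma^2)."""
    path, y = [], y0
    for _ in range(n_steps):
        y = alpha0 + y + rng.gauss(0.0, sigma)
        path.append(y)
    return path

rng = random.Random(7)
alpha0, y0, T = 0.5, 2.0, 40  # hypothetical drift, starting value and horizon
ends = [rw_with_drift(alpha0, T, y0=y0, rng=rng)[-1] for _ in range(5000)]  # y_T in each replication

mean_T = sum(ends) / len(ends)
var_T = sum((v - mean_T) ** 2 for v in ends) / (len(ends) - 1)
print("mean of y_T:", round(mean_T, 2), " theory:", alpha0 * T + y0)  # alpha0*T + y0
print("var of y_T: ", round(var_T, 2), " theory:", T)                # sigma^2 * T with sigma = 1
```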
C. Moving Average Processes
Suppose that
$$y_t = \theta_0 e_t + \theta_1 e_{t-1} + \cdots + \theta_m e_{t-m}$$
then $y_t$ is known as a moving average (MA) process of order $m$, denoted MA($m$),
where the $e_t$ are white noise. In this case, the mean is zero and the variance equals
$$V(y_t) = \theta_0^2 V(e_t) + \theta_1^2 V(e_{t-1}) + \cdots + \theta_m^2 V(e_{t-m}) = \sigma^2(\theta_0^2 + \theta_1^2 + \cdots + \theta_m^2)$$
which is also constant, independent of $t$. The covariances of $y_t$ and $y_{t+h}$ are zero as long
as $h > m$: none of the $e$ terms overlap, so all of the $E(e_{t-j} e_{t+h-k}) = 0$. Hence the
MA process is weakly dependent. Indeed, all of the covariance terms are independent of
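time, so the MA process is stationary. As an example, consider an MA(2) process
and the covariance between values of $y$ that are one period apart (recall that $E(y_t) = 0$):
$$\operatorname{cov}(y_t, y_{t-1}) = E([\theta_0 e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}][\theta_0 e_{t-1} + \theta_1 e_{t-2} + \theta_2 e_{t-3}])$$
$$= E(\theta_0\theta_1 e_{t-1}e_{t-1} + \theta_1\theta_2 e_{t-2}e_{t-2} + \theta_0\theta_0 e_t e_{t-1} + \theta_0\theta_1 e_t e_{t-2} + \theta_0\theta_2 e_t e_{t-3} + \theta_1\theta_1 e_{t-1}e_{t-2} + \cdots + \theta_2\theta_2 e_{t-2}e_{t-3})$$
$$= \theta_0\theta_1\sigma^2 + \theta_1\theta_2\sigma^2 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = \sigma^2(\theta_0\theta_1 + \theta_1\theta_2)$$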
We get this result because only the first two terms have common time subscripts, so
their expected values involve the variance of the white noise random variable. All the other
terms have different time subscripts, so their expected values (the covariances of the
white noise terms) are zero. Hence, the covariance is independent of the time subscripts,
depending only on how far apart the y time subscripts are (here, h=1). Again, the series
is stationary.
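The same bookkeeping generalizes to any lag. Only products of $e$'s with matching subscripts survive, so $\operatorname{cov}(y_t, y_{t-h}) = \sigma^2 \sum_{j=0}^{m-h} \theta_j \theta_{j+h}$ for $h \le m$ and zero otherwise. Lag 0 gives the variance formula above, and $h = 1$ with $m = 2$ gives the MA(2) result. The Python sketch below codes this and checks the lag-1 value by simulation. The coefficients, $\sigma^2 = 2$ and the name `ma_autocov` are hypothetical choices for illustration.

```python
import math
import random

def ma_autocov(theta, h, sigma2=1.0):
    """cov(y_t, y_{t-h}) for y_t = theta[0]*e_t + ... + theta[m]*e_{t-m}, with Var(e_t) = sigma2."""
    h = abs(h)
    m = len(theta) - 1  # order of the MA process
    if h > m:
        return 0.0      # no e terms in common once the lag exceeds m
    return sigma2 * sum(theta[j] * theta[j + h] for j in range(m - h + 1))

theta = [1.0, 0.6, 0.3]  # hypothetical MA(2) coefficients theta_0, theta_1, theta_2
sigma2 = 2.0             # hypothetical white noise variance
for h in range(4):
    print("lag", h, "autocovariance", round(ma_autocov(theta, h, sigma2), 4))

# Monte Carlo check of the lag-1 formula sigma^2 * (theta_0*theta_1 + theta_1*theta_2).
rng = random.Random(5)
N = 100_000
e = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(N + 2)]
y = [theta[0] * e[t] + theta[1] * e[t - 1] + theta[2] * e[t - 2] for t in range(2, N + 2)]
ybar = sum(y) / N
sim_cov1 = sum((y[t] - ybar) * (y[t - 1] - ybar) for t in range(1, N)) / N
print("simulated lag-1 autocovariance:", round(sim_cov1, 3))
```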
D. Autoregressive Stochastic Process
Suppose that
$$y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \rho_3 y_{t-3} + \cdots + \rho_r y_{t-r} + e_t$$
where the last term is the usual white noise term, then $y_t$ is said to be an autoregressive
stochastic process of order $r$, denoted AR($r$). Since you regress $y$ on lagged values of
itself, it is called “auto-regressive.” Again, by successive substitution you can show that
the mean is zero, given that $E(y_0) = 0$. The variance and covariances take some more
work; they are easy to illustrate with the special (and quite important) case of AR(1):
$$y_t = \rho y_{t-1} + e_t$$
We start by successively substituting for lagged values of $y$ to get
$$y_t = \rho(\rho y_{t-2} + e_{t-1}) + e_t = \rho^2(\rho y_{t-3} + e_{t-2}) + \rho e_{t-1} + e_t = \rho^3(\rho y_{t-4} + e_{t-3}) + \rho^2 e_{t-2} + \rho e_{t-1} + e_t = \cdots = \sum_{i=0}^{T-1} \rho^i e_{t-i} + \rho^T y_0,$$
where $T$ is how many periods in the past the process began (so $y_0$ is the starting value). So the expected
value of $y_t$ is zero, as long as $y_0$ is zero. Since $E(y_t) = 0$,
$$\operatorname{Var}(y_t) = E(y_t^2) = E\Big(\Big(\sum_{i=0}^{T-1} \rho^i e_{t-i}\Big)^2\Big).$$
Suppose that $T$ is infinite, the largest value it can take (the process began infinitely far in the past). Then
we obviously have a lot of terms in $\big(\sum_{i=0}^{\infty} \rho^i e_{t-i}\big)^2$. The nice thing is that all of the “cross
product terms” (like $e_t e_{t-v}$, whenever $v$ is not equal to zero) have zero expected value and
so drop out when we take the expectation. So we have
$$E\Big(\Big(\sum_{i=0}^{\infty} \rho^i e_{t-i}\Big)^2\Big) = \sum_{i=0}^{\infty} \rho^{2i} E(e_{t-i}^2) = \sigma^2 \sum_{i=0}^{\infty} \rho^{2i} = \frac{\sigma^2}{1-\rho^2}.$$
The last equality follows if the infinite series converges, which it does as long as $\rho$ lies
strictly between $-1$ and $1$ (with $\rho = 1$, the process would be a random walk). Wooldridge
provides a “sort of” alternative derivation of this result at the bottom of p. 350.
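The convergence argument can be checked numerically. The Python sketch below compares partial sums $\sigma^2 \sum_{i=0}^{T-1} \rho^{2i}$ with the closed form $\sigma^2/(1-\rho^2)$. It also compares both with the sample variance of a long simulated AR(1), and it discards early values so the starting value $y_0 = 0$ is forgotten. The value $\rho = 0.8$, $\sigma = 1$, the 500-period burn-in and the function names are assumptions made for illustration.

```python
import random

def ar1_var_closed(rho, sigma2=1.0):
    """Var(y_t) = sigma2 / (1 - rho^2) for a stable AR(1); requires |rho| < 1."""
    if abs(rho) >= 1:
        raise ValueError("the geometric series diverges unless |rho| < 1")
    return sigma2 / (1 - rho ** 2)

def ar1_var_truncated(rho, T, sigma2=1.0):
    """sigma2 * sum_{i=0}^{T-1} rho^(2i): the variance if the process began T periods back at y_0 = 0."""
    return sigma2 * sum(rho ** (2 * i) for i in range(T))

def simulate_ar1(rho, n, sigma=1.0, burn=500, rng=random):
    """Simulate y_t = rho*y_{t-1} + e_t from y_0 = 0; drop `burn` early values so y_0 is forgotten."""
    y, out = 0.0, []
    for t in range(n + burn):
        y = rho * y + rng.gauss(0.0, sigma)
        if t >= burn:
            out.append(y)
    return out

rho = 0.8  # hypothetical AR(1) coefficient
for T in (5, 20, 100):
    print("T =", T, "partial sum:", round(ar1_var_truncated(rho, T), 4))
print("closed form sigma^2/(1-rho^2):", round(ar1_var_closed(rho), 4))

ys = simulate_ar1(rho, 100_000, rng=random.Random(3))
m = sum(ys) / len(ys)
sim_var = sum((v - m) ** 2 for v in ys) / len(ys)
print("simulated variance:", round(sim_var, 3))
```

The guard in `ar1_var_closed` mirrors the condition in the text. At $\rho = \pm 1$ or beyond, the geometric series has no finite sum.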
To get the covariances for an autoregressive model, recall that $E(y_t) = 0$, so that we need
only consider
$$\operatorname{Cov}(y_t, y_{t+h}) = E(y_t y_{t+h}) = E\Big(\Big(\sum_{i=0}^{\infty} \rho^i e_{t-i}\Big)\Big(\sum_{i=0}^{\infty} \rho^i e_{t+h-i}\Big)\Big)$$
As before, the expected value of all the cross product terms will equal zero, and only the
white noise terms with the same time subscripts will remain (though one will have a $\rho^i$
coefficient and the other a $\rho^{i+h}$ coefficient), so that the covariance will be
$$E\Big(\sum_{i=0}^{\infty} \rho^{h+2i} e_{t-i}^2\Big) = \sum_{i=0}^{\infty} \rho^{h+2i} E(e_{t-i}^2) = \rho^h \sigma^2 \sum_{i=0}^{\infty} \rho^{2i} = \rho^h \frac{\sigma^2}{1-\rho^2}.$$

With a stable AR(1) process (the absolute value of $\rho$ less than one), $\rho^h$ goes to zero as $h$ gets large,
and so do the covariance and the correlation. The correlation is simply $\rho^h$, the covariance divided by the
variance $\sigma^2/(1-\rho^2)$. This implies that the AR(1) process (with $|\rho| < 1$) is weakly dependent.
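A Python sketch makes the decay of $\rho^h$ concrete. It simulates a long stable AR(1) and compares sample autocorrelations at a few lags with the theoretical value $\rho^h$. The choice $\rho = 0.8$, Gaussian errors, the 500-period burn-in and the helper names are assumptions for illustration.

```python
import random

def ar1_autocov(rho, h, sigma2=1.0):
    """cov(y_t, y_{t+h}) = rho^|h| * sigma2 / (1 - rho^2) for a stable AR(1)."""
    return rho ** abs(h) * sigma2 / (1 - rho ** 2)

def sample_autocorr(x, h):
    """Sample autocorrelation of x at lag h >= 1 (divides by n throughout)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return sum((x[t] - m) * (x[t + h] - m) for t in range(n - h)) / n / c0

# Simulate a stable AR(1) with a hypothetical rho = 0.8, discarding a 500-period burn-in.
rng = random.Random(11)
rho, y, ys = 0.8, 0.0, []
for t in range(100_500):
    y = rho * y + rng.gauss(0.0, 1.0)
    if t >= 500:
        ys.append(y)

for h in (1, 5, 20):
    # Theoretical correlation is cov / var = rho^h, which dies out as h grows.
    print("h =", h, "sample:", round(sample_autocorr(ys, h), 3), " theory:", round(rho ** h, 3))
```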
III. OLS assumptions and properties of the least squares estimator vector
The OLS estimators are consistent when the following assumptions hold:
1. Linearity and weak dependence (of $x_t$ and $y_t$, the variables in the model).
The weak dependence assumption ensures that the law of large numbers and the central limit
theorem can be applied when we take plims.
2. Zero conditional mean of the error term $u_t$ given the current-period regressors $x_t$: $E(u_t \mid x_t) = 0$.
(This is a weaker assumption than one in which $u_t$ is uncorrelated with $x$ from all
periods, but it does preclude omitted variables that are correlated with included
regressors; i.e., if we leave out important variables and these left-out variables
are correlated with the included independent variables, our estimates will not be
consistent. But this assumption does allow for models with lagged dependent variables,
as Wooldridge notes in chapter 12.)
3. No perfect collinearity between the independent variables in the model.
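The OLS estimators are asymptotically normally distributed (and all the usual tests can
be used in large samples) when the following hold:
1.-3. Assumptions 1 through 3 above hold, and
4. The errors are homoskedastic: $\operatorname{Var}(u_t \mid x_t) = \sigma^2$.
5. There is no serial correlation: $E(u_t u_s \mid x_t, x_s) = 0$ for $t \ne s$.
These conditions (indeed, assumptions 1 through 3 alone) are enough to assure consistency of the OLS estimators,
but not their unbiasedness. Unbiasedness requires the stronger assumption that the error has zero mean conditional
on the regressors from all periods (strict exogeneity).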
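A lagged dependent variable illustrates "consistent but biased." In $y_t = \beta_0 + \beta_1 y_{t-1} + u_t$, the regressor $y_{t-1}$ satisfies assumption 2 but is not strictly exogenous, because $u_t$ feeds into future regressors. The Monte Carlo sketch below averages the OLS slope over many simulated samples at $n = 20$ and $n = 2000$. The true values $\beta_0 = 1$ and $\beta_1 = 0.5$, the $N(0,1)$ errors, the replication counts and the helper names are hypothetical choices.

```python
import math
import random

def ols_simple(x, y):
    """Intercept and slope from an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def simulate_ar1_with_intercept(beta0, beta1, n, rng):
    """y_0..y_n from y_t = beta0 + beta1*y_{t-1} + u_t, u_t ~ N(0, 1), y_0 from the stationary distribution."""
    y = [beta0 / (1 - beta1) + rng.gauss(0.0, 1.0 / math.sqrt(1 - beta1 ** 2))]
    for _ in range(n):
        y.append(beta0 + beta1 * y[-1] + rng.gauss(0.0, 1.0))
    return y

def mean_slope_estimate(beta0, beta1, n, reps, seed=0):
    """Average OLS estimate of beta1 (regressing y_t on y_{t-1}) across Monte Carlo replications."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = simulate_ar1_with_intercept(beta0, beta1, n, rng)
        total += ols_simple(y[:-1], y[1:])[1]
    return total / reps

beta0, beta1 = 1.0, 0.5  # hypothetical true parameters
small = mean_slope_estimate(beta0, beta1, n=20, reps=2000, seed=1)
large = mean_slope_estimate(beta0, beta1, n=2000, reps=200, seed=2)
print("n=20:  ", round(small, 3))  # noticeably below 0.5: biased in small samples
print("n=2000:", round(large, 3))  # close to 0.5: consistent
```

The downward small-sample bias shrinks as $n$ grows. This is the distinction drawn above between consistency and unbiasedness.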
```