
Lecture 3 – Stationary Processes and the Ergodic LLN
(Reference – Section 2.2, Hayashi)
Our immediate goal is to formulate an LLN and a
CLT which can be applied to establish sufficient
conditions for the consistency and asymptotic
normality of the OLS estimator in time series
regressions with temporally dependent,
predetermined (but not necessarily strictly
exogenous) regressors and serially uncorrelated
disturbances. [We will deal with serially correlated
disturbances later in the course.]
In this lecture we will state the Ergodic Theorem, an
LLN that applies to “stationary and ergodic stochastic
processes.” We will begin by defining and describing
stationary and ergodic processes.
In the next lecture we will state the Ergodic
Stationary Martingale Differences CLT, providing
a definition and description of martingales and
martingale difference sequences before presenting
that theorem.
Then, in the following lecture, we apply these
theorems to formulate a set of conditions under
which the OLS estimator is consistent and
asymptotically normal.
Definition – Stochastic Process
A sequence of random variables is also called a
stochastic process or a time series (if the index
refers to a period or point in time).
Note – Sometimes it is most convenient to define the
stochastic process {zi} over the positive (or
nonnegative) integers i = 1,2,… (or i = 0,1,2,…),
and sometimes it is most convenient to define the
process over the entire set of integers i = …,-2,-1,0,1,2,…
Definition – Realizations of a Stochastic Process
The outcome of a stochastic process forms a
sequence of real numbers, which we also write as
{zi}. This sequence of real numbers is called a
realization of the stochastic process or a
(realization of the) time series.
In econometrics, the time series data we observe, e.g.,
quarterly U.S. real GDP from 1960-2004, are thought
of as part of a realization of a stochastic process. Our
goal in applied time series analysis is to draw
inferences about the stochastic process based upon
the realization we have observed.
The most useful class of stochastic processes is the
class of stationary stochastic processes. The basic
idea underlying the notion of a stationary process is
that in a probability sense, made precise by the
following definition, the process behaves the same
way over time.
Definition – Stationarity
The stochastic process {zi}, i = …,-1,0,1,… is
strictly stationary if all of its finite dimensional
distributions are time invariant. That is,
$$\Pr(z_{i_1} \le \alpha_1, \ldots, z_{i_k} \le \alpha_k) = \Pr(z_{h+i_1} \le \alpha_1, \ldots, z_{h+i_k} \le \alpha_k)$$

for all positive integers k, integers i1,…,ik and h,
and real numbers α1,…,αk.
Note – the z’s can be random variables or random
vectors (provided that they have the same
dimension). In the case where the z’s are random
vectors, we would say that the process is jointly
stationary. (It can be the case that each element of z
is strictly stationary, but the vector process is not
jointly stationary. See Example 2.3 in Hayashi.)
Fact –
If {zi} is strictly stationary, then the kth moment
of zi, E(zi^k), is the same for all i, provided it
exists and is finite. (Why? Because the distribution
of zi is the same for all i by the definition of
stationarity.)
Fact –
If {zi} is strictly stationary and f(.) is a
continuous function, then {f(zi)} is also strictly
stationary.
So, for example, {yi} with $y_i = a_0 + a_1 z_{i+1} + a_2 z_i + a_3 z_{i-1}$
is strictly stationary.
Also, if the z's are m-dimensional and jointly
stationary, then zz′ and z′z are strictly stationary.
If zz′ is nonsingular, then $(zz')^{-1}$ is stationary.
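For concreteness, here is a minimal simulation sketch of this fact (the i.i.d. normal process, the coefficients, and the sample size are illustrative assumptions, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# z: an i.i.d. (hence strictly stationary) sequence -- illustrative choice
z = rng.normal(size=1000)

# y_i = a0 + a1*z_{i+1} + a2*z_i + a3*z_{i-1} is a continuous function
# of (z_{i-1}, z_i, z_{i+1}), so {y_i} is strictly stationary as well
a0, a1, a2, a3 = 1.0, 0.5, -0.3, 0.2
y = a0 + a1 * z[2:] + a2 * z[1:-1] + a3 * z[:-2]
```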
A couple of (extreme) examples of stationary
stochastic processes:
An i.i.d. sequence is a strictly stationary sequence.
(This follows almost immediately from the
definition: 1) use the independence property to
factor the joint distribution into the product of the
marginal distributions; 2) then use the identical
distribution property that Prob(zi ≤ α) = Prob(zj ≤ α)
for all i, j, and α.)
A constant sequence is a strictly stationary
sequence.
Suppose we flip a coin. If H, then zi = 0 for all i; if
T, then zi = 1 for all i. Then, for example, Prob(zi <
1/2) = Prob(zj < 1/2) = Prob(H) for all i,j. …
Note that in the first example, the process has no
memory; in the second example the process has
infinite memory – the initial value completely
determines the remainder of the sequence.
In order for stationary processes to be of use to us,
we will need to restrict the class of stationary
processes to those with sufficiently weak memory.
(In example 2, there will be no way to infer, e.g.,
the probability of H, from a single realization of
the process, regardless of how many observations
we get to see.)
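A small simulation makes the problem concrete (a sketch; the fair-coin probability and the number of observations per realization are illustrative assumptions): every realization has a sample mean of exactly 0 or 1, so no amount of data from a single realization reveals that E(zi) = 1/2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000  # observations per realization (illustrative)

for r in range(5):
    # a single coin flip fixes the entire realization: H -> 0, T -> 1
    z0 = 0.0 if rng.random() < 0.5 else 1.0
    z = np.full(n, z0)
    # the sample mean is exactly z0; it is never near E(z_i) = 1/2
    print(f"realization {r}: sample mean = {z.mean():.1f}")
```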
Ergodicity –
Stationarity is a useful concept because it means
that there is something that is fixed across the
sequence of random variables for us to learn
about from observing outcomes of the process:
the fixed finite dimensional distributions and
their moments (provided these exist).
However, in order for us to learn about the
characteristics of the stationary process as the
realization unfolds, there must be new
information contained in the new observations.
An additional condition that relates to this
requirement is the condition of ergodicity.
Ergodicity is a condition that restricts the
memory of the process. It can be defined in a
variety of ways. A loose definition of ergodicity
is that the process is asymptotically
independent. That is, for sufficiently large n, zi
and zi+n are nearly independent. A more formal
definition is provided in the text. All of these
definitions essentially say that the effect of the
present on the future eventually disappears.
An i.i.d. sequence is ergodic (though ergodic
sequences need not be i.i.d.). The stochastic
process defined above by the coin toss example
is not ergodic.
Bottom line – Stationary and ergodic processes
allow for processes that are temporally
dependent but with sufficiently weak memory
for learning to take place as new observations are
revealed.
The Ergodic Theorem –
Let {zi} be stationary and ergodic with E(zi) = μ
(i.e., the mean of the process exists and is finite).
Then

$$\frac{1}{n}\sum_{i=1}^{n} z_i \xrightarrow{a.s.} \mu$$
That is, if {zi} is stationary and ergodic with a
finite mean, then the sample mean is a (strongly)
consistent estimator of that mean.
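As an illustration, here is a sketch using a Gaussian AR(1) process started from its stationary distribution, which is a standard example of a stationary and ergodic process (that claim is assumed here, not proved in these notes); the parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# Gaussian AR(1): z_i = mu + phi*(z_{i-1} - mu) + eps_i, |phi| < 1,
# started from its stationary distribution N(mu, 1/(1 - phi**2))
mu, phi, n = 3.0, 0.8, 100_000
z = np.empty(n)
z[0] = mu + rng.normal(scale=np.sqrt(1.0 / (1.0 - phi**2)))
for i in range(1, n):
    z[i] = mu + phi * (z[i - 1] - mu) + rng.normal()

# along a single realization, sample means approach mu
for m in (100, 1_000, 10_000, 100_000):
    print(f"n = {m:>6}: sample mean = {z[:m].mean():.3f}  (mu = {mu})")
```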
An important corollary to the Ergodic Theorem –
Let {zi} be stationary and ergodic and let f(.) be
a continuous function. Assume that E(f(zi)) = η.
Then

$$\frac{1}{n}\sum_{i=1}^{n} f(z_i) \xrightarrow{a.s.} \eta$$
(The corollary follows from the Ergodic
Theorem because {f(zi)} will be stationary and
ergodic if {zi} is stationary and ergodic.)
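Continuing the AR(1) sketch above (and reusing its z, mu, and phi), one continuous choice is f(z) = z², so the corollary says the sample second moment converges to E(zi²) = μ² + 1/(1 − φ²):

```python
# continuing the AR(1) sketch above: f(z) = z**2 is continuous, so by the
# corollary (1/n) * sum(z_i**2) -> E(z_i**2) = mu**2 + 1/(1 - phi**2) a.s.
print((z ** 2).mean())                   # sample second moment
print(mu ** 2 + 1.0 / (1.0 - phi ** 2))  # population second moment
```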
Digression on Covariance Stationary Process –
A second commonly encountered class of stationary
processes is the class of covariance stationary
processes (also called weakly stationary or stationary
in the wide-sense processes).
Definition – Covariance Stationarity
The stochastic process {zi}, i = 1,2,… is
covariance stationary if
i. E(zi) = μ for i = 1,2,…
ii. Var(zi) = 2 < ∞ for i = 1,2,…
iii. Cov(zi,zi-j) = j for all i,j
That is, a stochastic process is covariance
stationary if it has a constant and finite variance,
a constant mean, and the covariance between two
elements of the sequence only depends on how
far apart they are.
Note that –
• a strictly stationary process will be
covariance stationary if it has a finite
variance.
• a covariance stationary process does not
require that the zi's have identical
distributions; thus strictly stationary
processes are d.i.d. (dependent, identically
distributed) while covariance stationary
processes can be d.ni.d. (dependent,
non-identically distributed).
Fact – If {zi} is stationary and ergodic and if
Var(zi) = 2 < ∞, then the Ergodic Theorem can be
applied to show that
$$\hat{\gamma}_{j,n} = \frac{1}{n}\sum_{i=j+1}^{n} (z_i - \hat{\mu})(z_{i-j} - \hat{\mu}) \xrightarrow{a.s.} \gamma_j$$

(where $\hat{\mu}$ denotes the sample mean) and

$$\hat{\rho}_{j,n} = \hat{\gamma}_{j,n}/\hat{\gamma}_{0,n} \xrightarrow{a.s.} \rho_j$$
That is, the sample autocovariances and sample
autocorrelations are consistent estimators of the
population autocovariances and autocorrelations.
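The estimators can be transcribed directly (a minimal sketch; the Gaussian AR(1) input, for which the population autocorrelations are ρj = φ^j, is an illustrative assumption):

```python
import numpy as np

def sample_autocov(z, j):
    """gamma_hat_{j,n} = (1/n) * sum_{i=j+1}^{n} (z_i - z_bar)(z_{i-j} - z_bar)"""
    n, zbar = len(z), z.mean()
    return np.sum((z[j:] - zbar) * (z[:n - j] - zbar)) / n

def sample_autocorr(z, j):
    """rho_hat_{j,n} = gamma_hat_{j,n} / gamma_hat_{0,n}"""
    return sample_autocov(z, j) / sample_autocov(z, 0)

# illustrative input: a Gaussian AR(1) with phi = 0.8, for which rho_j = phi**j
rng = np.random.default_rng(3)
phi, n = 0.8, 100_000
z = np.empty(n)
z[0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - phi ** 2)))
for i in range(1, n):
    z[i] = phi * z[i - 1] + rng.normal()

for j in (1, 2, 5):
    print(f"j = {j}: rho_hat = {sample_autocorr(z, j):.3f}  vs  rho = {phi ** j:.3f}")
```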
Definition –
The stochastic process {zi} is a white noise process if
i. E(zi) = 0 for i = 1,2,…
ii. Var(zi) = σ² < ∞ for i = 1,2,…
iii. Cov(zi,zi-j) = 0 for all i ≠ j
That is, a white noise (w.n.) process is a zero-mean,
constant-variance, serially uncorrelated process.
A w.n. process is covariance stationary (but not
necessarily strictly stationary, since the zi's are not
necessarily identically distributed). An i.i.d. sequence
is a white noise sequence if it has a finite variance.
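One concrete example of a white noise process that is not strictly stationary (an illustrative construction, not from the text): let the zi be independent, standard normal for even i and uniform on [−√3, √3] for odd i. Every zi then has mean 0 and variance 1, and independence makes all covariances zero, but the zi are not identically distributed.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
s3 = np.sqrt(3.0)  # Uniform(-sqrt(3), sqrt(3)) has mean 0 and variance 1

# even i: N(0, 1); odd i: Uniform(-sqrt(3), sqrt(3)) -- independent draws
z = np.where(np.arange(n) % 2 == 0,
             rng.normal(size=n),
             rng.uniform(-s3, s3, size=n))
# every z_i has mean 0 and variance 1, and Cov(z_i, z_{i-j}) = 0 for j != 0,
# so {z_i} is white noise; but the z_i are not identically distributed,
# so the process is not strictly stationary
```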
White noise processes are the fundamental building
blocks of covariance stationary processes
and play a very important role in time series analysis.
The difference between strict stationarity and
covariance stationarity is, for the most part, only of
interest to the theoretician. That is, if we are willing
to treat a particular time series as a covariance
stationary process there is usually little reason to
think that it’s not also strictly stationary and vice
versa.
So, why are both definitions useful? In theoretical
settings, when we are trying to establish consistency
and asymptotic normality it is often easier to work
under the assumption of strict stationarity. However,
in applications when we look at a time series and
consider whether it looks like a realization from a
stationary process, we usually think in terms of the
conditions for covariance stationarity.
In applications, we observe part of a single
realization of a stochastic process, say, the real
numbers z1,…,zn, and then we have to decide whether
it is reasonable to assume that this is a realization of
a stationary stochastic process (or not).
Later in this course, if we have time, we will talk
about testing this assumption against a particular type
of non-stationarity. But, often our willingness to
make this assumption is based on observing the
time series graph of the series and asking the
following questions –
1. Does it look like a realization of a process with
a constant mean? Or, does it look like the
realization of a process with an increasing
mean? (I.e., does the series display a time
trend?)
2. Does it look like a realization of a process with
a constant variance? Or, does it look like the
volatility of the process is varying
systematically with time?
Consider, for example, the U.S. unemployment rate
and U.S. real GDP.
Many economic time series, like real GDP, seem to
be nonstationary because their means are increasing
with time. This would seem to greatly limit the
appeal and usefulness of stationarity. Although these
series appear to be nonstationary, there might be
simple transformations that can be applied to create
stationary series: first differencing, removing a linear
trend,…
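As a sketch of both transformations (the trend-plus-AR(1) series below is a stylized stand-in for log real GDP, an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200

# stylized "log real GDP": a linear time trend plus a stationary AR(1) deviation
t = np.arange(n)
dev = np.empty(n)
dev[0] = rng.normal(scale=0.01)
for i in range(1, n):
    dev[i] = 0.7 * dev[i - 1] + rng.normal(scale=0.01)
y = 0.008 * t + dev           # nonstationary: the mean grows with i

dy = np.diff(y)               # first differences: a constant-mean series

# removing a linear trend estimated by least squares:
b1, b0 = np.polyfit(t, y, 1)  # slope, intercept (highest degree first)
detrended = y - (b0 + b1 * t)
```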