Lecture 3 – Stationary Processes and the Ergodic LLN (Reference – Section 2.2, Hayashi)

Our immediate goal is to formulate an LLN and a CLT which can be applied to establish sufficient conditions for the consistency and asymptotic normality of the OLS estimator in time series regressions with temporally dependent, predetermined (but not necessarily strictly exogenous) regressors and serially uncorrelated disturbances. [We will deal with serially correlated disturbances later in the course.]

In this lecture we will state the Ergodic Theorem, an LLN that applies to "stationary and ergodic stochastic processes." We will begin by defining and describing stationary and ergodic processes. In the next lecture we will state the Ergodic Stationary Martingale Differences CLT, providing a definition and description of martingales and martingale difference sequences before presenting that theorem. Then, in the following lecture, we apply these theorems to formulate a set of conditions under which the OLS estimator is consistent and asymptotically normal.

Definition – Stochastic Process
A sequence of random variables is also called a stochastic process, or a time series if the index refers to a period or point in time.

Note – Sometimes it is most convenient to define the stochastic process {zi} over the positive (or nonnegative) integers, i = 1,2,… (or i = 0,1,2,…), and sometimes it is most convenient to define the process over the entire set of integers, i = …,−2,−1,0,1,2,…

Definition – Realizations of a Stochastic Process
The outcome of a stochastic process forms a sequence of real numbers, which we also write as {zi}. This sequence of real numbers is called a realization of the stochastic process or a (realization of the) time series. In econometrics, the time series data we observe, e.g., quarterly U.S. real GDP from 1960-2004, are thought of as part of a realization of a stochastic process. Our goal in applied time series analysis is to draw inferences about the stochastic process based upon the realization we have observed.

The most useful class of stochastic processes is the class of stationary stochastic processes. The basic idea underlying the notion of a stationary process is that, in a probability sense made precise by the following definition, the process behaves the same way over time.

Definition – Stationarity
The stochastic process {zi}, i = …,−1,0,1,…, is strictly stationary if all of its finite dimensional distributions are time invariant. That is,

Prob(z_{i1} ≤ α1, …, z_{ik} ≤ αk) = Prob(z_{i1+h} ≤ α1, …, z_{ik+h} ≤ αk)

for all positive integers k, all integers i1,…,ik, all integers h, and all real numbers α1,…,αk.

Note – the z's can be random variables or random vectors (provided that they have the same dimension). In the case where the z's are random vectors, we would say that the process is jointly stationary. (It can be the case that each element of z is strictly stationary, but the vector process is not jointly stationary. See Example 2.3 in Hayashi.)

Fact – If {zi} is strictly stationary, the moments of zi, e.g., the mean E(zi), are the same for all i, provided they exist and are finite. (Why? Because the distribution of zi is the same for all i by the definition of stationarity.)

Fact – If {zi} is strictly stationary and f(·) is a continuous function, then {f(zi)} is also strictly stationary. So, for example, {yi}, yi = a0 + a1zi+1 + a2zi + a3zi-1, is strictly stationary. Also, if the z's are m-dimensional and jointly stationary, then zz′ and z′z are strictly stationary. If zz′ is nonsingular, then (zz′)⁻¹ is stationary.
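As an informal numerical check of this last fact (my own sketch, not from Hayashi), the following Python snippet simulates a strictly stationary process, here simply an i.i.d. Gaussian sequence, builds yi = a0 + a1zi+1 + a2zi + a3zi-1 with arbitrarily chosen coefficients, and verifies that sample moments computed over different time windows roughly agree, as time invariance of the finite dimensional distributions implies.

```python
import numpy as np

rng = np.random.default_rng(0)

# An i.i.d. sequence is strictly stationary; use it as the building block.
n = 200_000
z = rng.normal(loc=0.0, scale=1.0, size=n)

# y_i = a0 + a1*z_{i+1} + a2*z_i + a3*z_{i-1} is a continuous function of
# (z_{i-1}, z_i, z_{i+1}), so {y_i} is strictly stationary as well.
a0, a1, a2, a3 = 1.0, 0.5, -0.3, 0.2   # arbitrary illustrative coefficients
y = a0 + a1 * z[2:] + a2 * z[1:-1] + a3 * z[:-2]

# Time invariance implies that sample moments over different windows should
# agree up to sampling noise.
first, second = y[: len(y) // 2], y[len(y) // 2 :]
print("means:    ", first.mean(), second.mean())
print("variances:", first.var(), second.var())
```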
A couple of (extreme) examples of stationary stochastic processes:

An i.i.d. sequence is a strictly stationary sequence. (This follows almost immediately from the definition: 1) use the independence property to factor the joint distribution into the product of the marginal distributions; 2) then use the identical distribution property that Prob(zi ≤ α) = Prob(zj ≤ α) for all i, j and all real α.)

A constant sequence is a strictly stationary sequence. Suppose we flip a coin. If H, then zi = 0 for all i; if T, then zi = 1 for all i. Then, for example, Prob(zi < 1/2) = Prob(zj < 1/2) = Prob(H) for all i, j, and similarly for all of the other finite dimensional distributions.

Note that in the first example the process has no memory; in the second example the process has infinite memory – the initial value completely determines the remainder of the sequence. In order for stationary processes to be of use to us, we will need to restrict the class of stationary processes to those with sufficiently weak memory. (In example 2, there will be no way to infer, e.g., the probability of H, from a single realization of the process, regardless of how many observations we get to see.)

Ergodicity – Stationarity is a useful concept because it means that there is something that is fixed across the sequence of random variables for us to learn about from observing outcomes of the process: the fixed finite dimensional distributions and their moments (provided these exist). However, in order for us to learn about the characteristics of the stationary process as the realization unfolds, there must be new information contained in the new observations. An additional condition that relates to this requirement is the condition of ergodicity.

Ergodicity is a condition that restricts the memory of the process. It can be defined in a variety of ways. A loose definition of ergodicity is that the process is asymptotically independent: for sufficiently large n, zi and zi+n are nearly independent. A more formal definition is provided in the text. All of these definitions essentially say that the effect of the present on the future eventually disappears. An i.i.d. sequence is ergodic (though ergodic sequences need not be i.i.d.). The stochastic process defined above by the coin toss example is not ergodic.

Bottom line – Stationary and ergodic processes allow for processes that are temporally dependent but with sufficiently weak memory for learning to take place as new observations are revealed.

The Ergodic Theorem
Let {zi} be stationary and ergodic with E(zi) = μ (i.e., the mean of the process exists and is finite). Then

(1/n) Σ_{i=1}^{n} zi → μ a.s.

That is, if {zi} is stationary and ergodic with a finite mean, then the sample mean is a (strongly) consistent estimator of that mean.

An important corollary to the Ergodic Theorem – Let {zi} be stationary and ergodic and let f(·) be a continuous function. Assume that E(f(zi)) = η. Then

(1/n) Σ_{i=1}^{n} f(zi) → η a.s.

(The corollary follows from the Ergodic Theorem because {f(zi)} will be stationary and ergodic if {zi} is stationary and ergodic.)
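The contrast between the two coin-flip examples above, and the content of the Ergodic Theorem itself, can be made concrete numerically. The following Python sketch is an illustration of my own: a fresh fair-coin flip each period gives an i.i.d. (hence stationary and ergodic) process whose sample mean converges to the population mean 1/2, while the constant sequence determined by a single flip is stationary but not ergodic, so its time average converges to 0 or 1 and never reveals the probability of heads.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Ergodic case: an i.i.d. sequence of fair-coin flips (z_i = 1 if T, 0 if H).
z_iid = rng.integers(0, 2, size=n)
print("i.i.d. coin:     sample mean =", z_iid.mean(), "(population mean = 0.5)")

# Non-ergodic case: one flip fixes the whole path (z_i = 0 for all i if H,
# z_i = 1 for all i if T). Each realization is constant, so the time average
# converges to 0 or 1, not to the population mean 0.5.
for trial in range(3):
    flip = rng.integers(0, 2)          # a single flip for the entire path
    z_const = np.full(n, flip)
    print(f"constant path {trial}: sample mean =", z_const.mean())
```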
Digression on Covariance Stationary Processes – A second commonly encountered class of stationary processes is the class of covariance stationary processes (also called weakly stationary or stationary-in-the-wide-sense processes).

Definition – Covariance Stationarity
The stochastic process {zi}, i = 1,2,… is covariance stationary if
i. E(zi) = μ for i = 1,2,…
ii. Var(zi) = σ² < ∞ for i = 1,2,…
iii. Cov(zi, zi-j) = γj for all i, j

That is, a stochastic process is covariance stationary if it has a constant mean, a constant and finite variance, and a covariance between two elements of the sequence that depends only on how far apart they are.

Note that –
a strictly stationary process will be covariance stationary if it has a finite variance.
a covariance stationary process does not require that the zi's have identical distributions; thus strictly stationary processes are d.i.d. (dependent, identically distributed), while covariance stationary processes can be d.ni.d. (dependent, not identically distributed).

Fact – If {zi} is stationary and ergodic and Var(zi) = σ² < ∞, then the Ergodic Theorem can be applied to show that

γ̂_{j,n} = (1/n) Σ_{i=j+1}^{n} (zi − μ̂n)(zi-j − μ̂n) → γj a.s.

and

ρ̂_{j,n} = γ̂_{j,n} / γ̂_{0,n} → ρj a.s.,

where μ̂n denotes the sample mean. That is, the sample autocovariances and sample autocorrelations are consistent estimators of the population autocovariances and autocorrelations.
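To see this consistency result in action, here is a minimal Python sketch (my illustration; the AR(1) model and its parameter are arbitrary choices of a convenient stationary and ergodic process). For an AR(1) process zi = φzi-1 + εi with |φ| < 1 and i.i.d. standard normal εi, the population autocorrelations are ρj = φ^j, so the estimates can be checked against known values.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a stationary, ergodic AR(1): z_i = phi*z_{i-1} + eps_i, |phi| < 1.
phi, n = 0.8, 100_000
eps = rng.normal(size=n)
z = np.empty(n)
z[0] = eps[0] / np.sqrt(1 - phi**2)   # start from the stationary distribution
for i in range(1, n):
    z[i] = phi * z[i - 1] + eps[i]

def gamma_hat(x, j):
    """Sample autocovariance: (1/n) * sum_{i=j+1}^{n} (x_i - mean)(x_{i-j} - mean)."""
    m = x.mean()
    return np.sum((x[j:] - m) * (x[: len(x) - j] - m)) / len(x)

for j in range(4):
    rho_hat = gamma_hat(z, j) / gamma_hat(z, 0)
    print(f"j = {j}: sample autocorrelation = {rho_hat:.4f}, population = {phi**j:.4f}")
```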
Definition – The stochastic process {zi} is a white noise process if
i. E(zi) = 0 for i = 1,2,…
ii. Var(zi) = σ² < ∞ for i = 1,2,…
iii. Cov(zi, zi-j) = 0 for all j ≠ 0

That is, a white noise (w.n.) process is a zero-mean, constant-variance, serially uncorrelated process. A w.n. process is covariance stationary (but not necessarily strictly stationary, since the zi's are not necessarily identically distributed). An i.i.d. sequence is a white noise sequence if it has a zero mean and a finite variance. White noise processes are the fundamental building blocks of covariance stationary processes and play a very important role in time series analysis.

The difference between strict stationarity and covariance stationarity is, for the most part, only of interest to the theoretician. That is, if we are willing to treat a particular time series as a covariance stationary process, there is usually little reason to think that it is not also strictly stationary, and vice versa. So, why are both definitions useful? In theoretical settings, when we are trying to establish consistency and asymptotic normality, it is often easier to work under the assumption of strict stationarity. However, in applications, when we look at a time series and consider whether it looks like a realization from a stationary process, we usually think in terms of the conditions for covariance stationarity.

In applications, we observe part of a single realization of a stochastic process, say, the real numbers z1,…,zn, and then we have to decide whether it is reasonable to assume that this is a realization of a stationary stochastic process (or not). Later in this course, if we have time, we will talk about testing this assumption against a particular type of non-stationarity. But often our willingness to make this assumption is based on observing the time series graph of the series and asking the following questions –
1. Does it look like a realization of a process with a constant mean? Or does it look like the realization of a process with an increasing mean? (I.e., does the series display a time trend?)
2. Does it look like a realization of a process with a constant variance? Or does it look like the volatility of the process is varying systematically with time?

Consider, for example, the U.S. unemployment rate and U.S. real GDP. Many economic time series, like real GDP, seem to be nonstationary because their means are increasing with time. This would seem to greatly limit the appeal and usefulness of stationarity. Although these series appear to be nonstationary, there are often simple transformations that can be applied to create stationary series: first differencing, removing a linear trend,…
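As a closing illustration of these transformations (a sketch using simulated data; the series below mimics log real GDP with a linear trend plus a stationary AR(1) deviation, it is not actual GDP data), the following Python snippet applies first differencing and linear detrending, either of which removes the trending mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "log GDP"-like series: linear trend plus a stationary AR(1) deviation.
n, slope, phi = 200, 0.008, 0.7
dev = np.zeros(n)
for i in range(1, n):
    dev[i] = phi * dev[i - 1] + 0.01 * rng.normal()
log_y = 10.0 + slope * np.arange(n) + dev

# Transformation 1: first differencing (growth rates) removes the trend in the mean.
growth = np.diff(log_y)

# Transformation 2: removing a fitted linear trend.
t = np.arange(n)
b1, b0 = np.polyfit(t, log_y, 1)       # fitted slope and intercept
detrended = log_y - (b0 + b1 * t)

# Compare first-half vs second-half means: the raw series drifts, the
# transformed series do not.
for name, x in [("raw series", log_y), ("differenced", growth), ("detrended", detrended)]:
    h = len(x) // 2
    print(f"{name:12s}: first-half mean = {x[:h].mean():8.4f}, "
          f"second-half mean = {x[h:].mean():8.4f}")
```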