Econ 388    R. Butler    2014 revisions    Lecture 17

I. Some important definitions

A time series is a collection of random variables, ordered in time, also known as a stochastic (Greek for "pertaining to chance") process. An example would be $x_{t_1}, x_{t_2}, x_{t_3}, x_{t_4}, \ldots, x_{t_m}$, where in the data we have just a single realization for each of the $t_j$ time periods. Hence, we cannot replicate our data. Also, the time series is often not independent. Because of the lack of replication and the lack of independence, we often need to impose some restrictive assumptions on the stochastic process. Those assumptions usually concern the mean, variance, and covariances of the stochastic process.

Suppose that you look at the annual real price of gold from 1880 to 1909, a sample of 30 time series observations (which you can get from the Historical Statistics of the United States, volume 1), so n = 30. Then pick any other 30-year period (1910 to 1939, or 1895 to 1924, or 1940 to 1969, to name just a few examples), and see whether it has the same mean, the same variance, and the same covariances. The next three definitions focus on the properties of these series when we choose different time subscripts.

A. (strict) stationary stochastic process: $x_{t_1}, x_{t_2}, x_{t_3}, x_{t_4}, \ldots, x_{t_m}$ has the same joint distribution as the series $x_{t_1+h}, x_{t_2+h}, x_{t_3+h}, x_{t_4+h}, \ldots, x_{t_m+h}$ for any given integer value of h (as long as the series exists). This means that the series is identically distributed, including the same correlations between similarly spaced (in time) terms, $corr(x_{t_j}, x_{t_l}) = corr(x_{t_j+h}, x_{t_l+h})$ for all integer values of h, and the same higher-order moments as well. The distribution is the same whatever period we examine (whatever 30-year period we happen to use for our gold prices).

B. (weak or) covariance stationary stochastic process: a process with a finite second moment and 1) a constant mean, 2) a constant variance, and 3) covariances between any two terms that depend only on how far apart the terms are spaced and not on t; that is, $cov(x_{t_j}, x_{t_l}) = cov(x_{t_j+h}, x_{t_l+h})$ for any h, so the covariance depends only on how many periods apart the $t_j$ and $t_l$ terms are. Since the normal distribution is completely characterized by its means and covariances, weak stationarity together with normality equals strict stationarity. Stationarity of either type makes it easier to employ the law of large numbers and the central limit theorem when working with time series data.

C. weak dependence: $x_t$ and $x_{t+h}$ are weakly dependent if they are almost independent as h gets "large"; that is, as h increases without bound, the correlation between $x_t$ and $x_{t+h}$ goes to zero.

II. Four fun cases to use our definitions on

A. White noise (purely random stochastic process)

$e_t$ is white noise if it has a constant mean (say zero), a constant variance (say $\sigma^2$), and all of the covariances (other than the variance) equal to zero. "White noise" is essentially the assumption we made in the cross-sectional models studied earlier, where the subscript varied across individuals rather than over time. Since the covariances are all zero (constants, independent of time), white noise is a stationary stochastic process, and it is weakly dependent (more strongly, the white noise variates are often assumed to be fully independent).
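The white noise properties can be checked by simulation. Below is a minimal sketch in Python (not part of the original notes, and the value of sigma is just an illustrative choice) that draws Gaussian white noise and verifies the constant mean, constant variance, and near-zero autocovariances in a large sample.

```python
import numpy as np

# A rough check of the white noise properties: constant mean (here 0),
# constant variance (here sigma**2), and zero covariance between terms
# at different dates. The value of sigma is an arbitrary illustration.
rng = np.random.default_rng(0)
sigma = 2.0
n = 100_000
e = rng.normal(loc=0.0, scale=sigma, size=n)

print("sample mean:    ", e.mean())    # close to 0
print("sample variance:", e.var())     # close to sigma**2 = 4

# Sample autocovariance at lag h: average of (e_t - mean)(e_{t+h} - mean).
for h in (1, 5, 20):
    cov_h = np.mean((e[:-h] - e.mean()) * (e[h:] - e.mean()))
    print(f"autocovariance at lag {h}: {cov_h:.4f}")   # close to 0
```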
B. Random Walk

If $y_t = y_{t-1} + e_t$, where $e_t$ is a white noise stochastic process, then $y_t$ is known as a random walk. If we keep substituting for lagged values of y we get

$y_t = (y_{t-2} + e_{t-1}) + e_t = ([y_{t-3} + e_{t-2}] + e_{t-1}) + e_t = \cdots = e_t + e_{t-1} + e_{t-2} + \cdots + e_1 + y_0$

so that $E(y_t) = y_0$. (Often, $y_0$ is assumed to be zero.) The variance of a random walk is the sum of the variances (why?): $Var(y_t) = \sigma^2 t$, where t is the number of time periods. Since the variance is not constant, a random walk is an example of a stochastic process that is not stationary.

Is it weakly dependent in the sense that $corr(y_t, y_{t+h})$ goes to zero as h gets large, for any given value of t? Recall that

$corr(y_t, y_{t+h}) = cov(y_t, y_{t+h}) / \sqrt{Var(y_t)\, Var(y_{t+h})},$

so we need to look at each term separately, and then put them together. We know from above that $Var(y_t) = \sigma^2 t$ and that $Var(y_{t+h}) = \sigma^2 (t+h)$. Finally, the covariance is $E[(y_t - E(y_t))(y_{t+h} - E(y_{t+h}))]$; assuming $y_0 = 0$, we get

$cov(y_t, y_{t+h}) = E(y_t\, y_{t+h}) = E([e_t + e_{t-1} + \cdots + e_1][e_{t+h} + e_{t+h-1} + \cdots + e_t + e_{t-1} + \cdots + e_1]).$

All of the cross products here (terms with differing subscripts) have zero expected value (recall the white noise stochastic process above, whose covariances all vanish), so only the own-product terms (those with the same subscripts) remain, and the expected value of each is simply the variance of that term. Hence we have $cov(y_t, y_{t+h}) = t\sigma^2$. Putting this all together,

$corr(y_t, y_{t+h}) = \frac{t\sigma^2}{\sqrt{t\sigma^2 \cdot (t+h)\sigma^2}} = \sqrt{\frac{t}{t+h}}.$

The problem with these correlations is that no matter how big we make h, we can always make t large enough that the correlation is close to one. Hence, a random walk is not weakly dependent (the terms are not asymptotically uncorrelated).

Another way to look at the absence of weak dependence is to consider the expected value of y at time t+h, given the value of y at time t. That is, does the current value of y tell us anything about the future value of y, even when h is very large? The answer is yes, because an application of the recursion technique above to y at time t+h yields

$y_{t+h} = e_{t+h} + e_{t+h-1} + e_{t+h-2} + \cdots + e_{t+1} + y_t$

so that $E(y_{t+h} \mid y_t) = y_t$: today, at time t, my best guess of the value of y h periods from today is just today's value. This is why Wooldridge calls the random walk a highly persistent time series.

Another non-stationary process, closely related, is the random walk with drift:

$y_t = \beta_0 + y_{t-1} + e_t$

which, by successive substitution as before, can be rewritten as

$y_t = \beta_0 t + e_t + e_{t-1} + \cdots + e_1 + y_0.$

In this case, not only is the variance increasing over time, but the mean is increasing as well. Again, it is not stationary and it is not weakly dependent.
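As a check on the two results just derived, here is a minimal simulation sketch (not part of the original notes) that estimates $Var(y_t)$ and $corr(y_t, y_{t+h})$ across many simulated random walks; the values of sigma, t, and h are arbitrary illustrative choices.

```python
import numpy as np

# Simulate many independent random walks with y_0 = 0 and check the two
# results above: Var(y_t) is close to sigma**2 * t, and corr(y_t, y_{t+h})
# is close to sqrt(t / (t + h)), so it does not die out as h grows.
rng = np.random.default_rng(1)
sigma, n_walks, t, h = 1.0, 20_000, 100, 400

e = rng.normal(0.0, sigma, size=(n_walks, t + h))
y = e.cumsum(axis=1)                 # column j holds y_{j+1} for each walk

print("Var(y_t):          ", y[:, t - 1].var())
print("sigma^2 * t:       ", sigma**2 * t)

corr = np.corrcoef(y[:, t - 1], y[:, t + h - 1])[0, 1]
print("corr(y_t, y_{t+h}):", corr)
print("sqrt(t / (t + h)): ", np.sqrt(t / (t + h)))
```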
C. Moving Average Processes

Suppose that $y_t = \theta_0 e_t + \theta_1 e_{t-1} + \cdots + \theta_m e_{t-m}$, where the $e_t$ are white noise; then $y_t$ is known as a moving average (MA) process of order m, denoted MA(m). In this case, the mean is zero and the variance equals

$Var(y_t) = \theta_0^2 Var(e_t) + \theta_1^2 Var(e_{t-1}) + \cdots + \theta_m^2 Var(e_{t-m}) = \sigma^2(\theta_0^2 + \cdots + \theta_m^2),$

which is also constant, independent of t. The covariances of $y_t$ and $y_{t+h}$ are zero as long as h > m (so that none of the $e_t$ terms overlap, and hence all of the $E(e_{t-j}\, e_{t+h-k}) = 0$), so the MA process is weakly dependent. Indeed, all of the covariance terms are independent of time, so the MA process is stationary.

As an example, consider an MA(2) process and the covariance of y terms that are one period apart (recall that $E(y_t) = 0$):

$cov(y_t, y_{t+1}) = E\big([\theta_0 e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}][\theta_0 e_{t+1} + \theta_1 e_t + \theta_2 e_{t-1}]\big) = \theta_0\theta_1 E(e_t^2) + \theta_1\theta_2 E(e_{t-1}^2) + 0 + \cdots + 0 = \sigma^2(\theta_0\theta_1 + \theta_1\theta_2).$

We get this result because only two of the product terms ($e_t$ with $e_t$, and $e_{t-1}$ with $e_{t-1}$) have common time subscripts, and hence their expected value is the variance of the white noise random variable, while all the other terms have different time subscripts so that their expected values (the covariances of the white noise terms) are zero. Hence, the covariance is independent of the time subscripts, depending only on how far apart the y time subscripts are (here, h = 1). Again, the series is stationary.

D. Autoregressive Stochastic Process

Suppose that $y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \rho_3 y_{t-3} + \cdots + \rho_r y_{t-r} + e_t$, where the last term is the usual white noise term; then $y_t$ is said to be an autoregressive stochastic process of order r, denoted AR(r). Since you regress y on lagged values of itself, it is called "auto-regressive." Again, by successive substitution you can show that the mean is zero, given that $E(y_0) = 0$. The variance and covariances take some more work; it is easiest to illustrate with the special (and quite important) case of AR(1):

$y_t = \rho y_{t-1} + e_t.$

We start by successively substituting for lagged values of y to get

$y_t = \rho(\rho y_{t-2} + e_{t-1}) + e_t = \rho^2(\rho y_{t-3} + e_{t-2}) + \rho e_{t-1} + e_t = \rho^3(\rho y_{t-4} + e_{t-3}) + \rho^2 e_{t-2} + \rho e_{t-1} + e_t = \cdots = \sum_{i=0}^{T-1} \rho^i e_{t-i} + \rho^T y_0,$

where T is how many periods in the past the process began. So the expected value of $y_t$ is zero, as long as $y_0$ is zero. Since $E(y_t) = 0$,

$Var(y_t) = E(y_t^2) = E\Big[\Big(\sum_{i=0}^{T-1} \rho^i e_{t-i}\Big)^2\Big].$

Suppose that T is infinite (generally regarded as the biggest value we can have for T); then we obviously have a lot of terms in $\sum_i \rho^i e_{t-i}$. The nice thing is that all of the "cross-product terms" (like $e_t e_{t-v}$, whenever v is not equal to zero) have zero expected value and so drop out when we take the expectation. So we have

$E\Big[\Big(\sum_{i=0}^{\infty} \rho^i e_{t-i}\Big)^2\Big] = \sum_{i=0}^{\infty} \rho^{2i} E(e_{t-i}^2) = \sigma^2 \sum_{i=0}^{\infty} \rho^{2i} = \frac{\sigma^2}{1-\rho^2}.$

The last equality follows if the infinite series converges, which it will as long as $\rho$ lies between -1 and 1 (not including one, since the process would then be a random walk). Wooldridge provides a "sort of" alternative derivation of this result at the bottom of p. 350.

To get the covariances for an autoregressive model, recall that $E(y_t) = 0$, so that we need only consider

$cov(y_t, y_{t+h}) = E(y_t\, y_{t+h}) = E\Big[\Big(\sum_{i=0}^{\infty} \rho^i e_{t-i}\Big)\Big(\sum_{i=0}^{\infty} \rho^i e_{t+h-i}\Big)\Big].$

As before, the expected value of all the cross-product terms will equal zero, and only the white noise terms with the same time subscripts will remain (though one will have a $\rho^i$ coefficient, and the other will have a $\rho^{i+h}$ coefficient), so that the covariance is

$E\Big[\sum_{i=0}^{\infty} \rho^{h} \rho^{2i} e_{t-i}^2\Big] = \rho^h \sigma^2 \sum_{i=0}^{\infty} \rho^{2i} = \frac{\rho^h \sigma^2}{1-\rho^2}.$

With a stable AR(1) process, with the absolute value of $\rho$ less than one, $\rho^h$ goes to zero (and so do the covariance and the correlation) as h gets large. This implies that the AR(1) process (with $|\rho|$ less than one) is weakly dependent.
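The contrast between the two weakly dependent processes can be seen in a short simulation. The sketch below (not part of the original notes; the theta and rho values are arbitrary illustrative choices) computes sample autocorrelations for an MA(2) process, which cut off after lag 2, and for a stable AR(1) process, which decay geometrically like $\rho^h$.

```python
import numpy as np

# Compare the autocorrelation patterns of the two weakly dependent processes
# above: an MA(2) process, whose autocovariances cut off after lag m = 2, and
# a stable AR(1) process, whose autocovariances decay like rho**h.
rng = np.random.default_rng(2)
sigma, n = 1.0, 200_000
e = rng.normal(0.0, sigma, size=n)

theta0, theta1, theta2 = 1.0, 0.5, 0.25          # MA(2) coefficients
y_ma = theta0 * e[2:] + theta1 * e[1:-1] + theta2 * e[:-2]

rho = 0.8                                        # AR(1) coefficient
y_ar = np.zeros(n)
for t in range(1, n):
    y_ar[t] = rho * y_ar[t - 1] + e[t]

def sample_autocorr(x, h):
    x = x - x.mean()
    return np.mean(x[:-h] * x[h:]) / x.var()

for h in (1, 2, 3, 10):
    print(f"lag {h:2d}:  MA(2) {sample_autocorr(y_ma, h):+.3f}   "
          f"AR(1) {sample_autocorr(y_ar, h):+.3f}   (rho**h = {rho**h:.3f})")
```

The MA(2) autocorrelations are essentially zero beyond lag 2, while the AR(1) autocorrelations decline smoothly toward zero, matching the covariance formulas derived above.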
III. OLS assumptions and properties of the least squares estimator vector

The OLS estimators are consistent when the following assumptions hold:

1. Linearity and weak dependence (of $x_t, y_t$, the variables in the model). The weak dependence assumption assures us that the law of large numbers and the central limit theorem can be applied as we take plims.

2. Zero conditional mean of the error term $\varepsilon_t$ given the current period regressors $x_t$. (This is a weaker assumption than one in which $\varepsilon_t$ is uncorrelated with x from all periods, but it does preclude omitted variables that are correlated with included regressors; i.e., if we leave out important variables and these left-out variables are correlated with the included independent variables, our estimates will not be consistent. This assumption does, however, allow for models with lagged dependent variables, as Wooldridge notes in chapter 12.)

3. No perfect collinearity between the independent variables in the model.

The OLS estimators are asymptotically normally distributed (and all the usual tests can be used in large samples) when the following hold:

1.-3. Assumptions 1 through 3 above, and

4. The errors are homoskedastic.

5. There is no serial correlation.

These conditions are enough to assure consistency of the OLS estimators, but not their unbiasedness.
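To make the consistency claim concrete, here is a minimal simulation sketch (not from the notes; the slope, intercept, and rho values are arbitrary illustrative choices) in which the regressor is weakly dependent, the error satisfies the zero conditional mean assumption, and the OLS slope estimate settles down near the true value as the sample grows.

```python
import numpy as np

# A rough illustration of consistency under assumptions 1-3: the regressor x_t
# is weakly dependent (a stable AR(1)), the error is white noise with zero
# conditional mean, and the OLS slope estimate approaches the true value.
rng = np.random.default_rng(3)
beta0, beta1, rho = 1.0, 2.0, 0.5

def ols_slope(n):
    shocks = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + shocks[t]        # weakly dependent regressor
    u = rng.normal(size=n)                       # white noise error term
    y = beta0 + beta1 * x + u
    X = np.column_stack([np.ones(n), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

for n in (50, 500, 5_000, 50_000):
    print(f"n = {n:>6}: estimated slope = {ols_slope(n):.4f}  (true = {beta1})")
```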