Econ 388   R. Butler   2014 revisions   Lecture 18

I. Transformations on Highly Persistent Time Series

A. Weakly dependent time series are integrated of order 0, denoted I(0); nothing needs to be done to such series before we invoke large-sample niceness over them.

B. Random walks—with or without drift—are integrated of order 1, I(1), and they need to be transformed before using them, usually by first differencing. An AR(1) stochastic process, y_t = ρ y_{t-1} + e_t, is I(0) if |ρ| < 1 and I(1) if ρ = 1. You would think that the regression of y on its lagged value would indicate whether the series is I(1) or I(0), but when the process is a random walk (I(1)), the estimated ρ can be shown to be downward biased. This is the essence of the "testing for unit roots" problem. (The bias tends to go away as the sample size goes to infinity.) The rule of thumb given in the book is that if the estimated ρ is .9 or higher, then you should difference the variables of interest in the equation before estimating the model.

II. Autocorrelation or Serial Correlation: E(ε_t ε_s) ≠ 0 when s ≠ t

A. Introduction
One of the most common violations of the regression assumptions in time series models is the presence of autocorrelated random disturbances. In general, autocorrelated errors do not affect the consistency or unbiasedness of the least squares estimators (if it is "pure" autocorrelation, and not due to an omitted variable—see chapter 11 of Wooldridge for one instance in which there is bias with lagged values of the dependent variable); but autocorrelation does tend to bias t-tests and F-tests, because the standard errors won't be correctly measured with OLS.

[Figure: the case of positive autocorrelation depicted as a scatter of y_t against x_t, with runs of points above the fitted line followed by runs of points below it.]

Note that positive random disturbances tend to be followed by positive random disturbances and negative random disturbances tend to be followed by negative random disturbances. Thus, we are faced with a situation in which the off-diagonal elements of the variance/covariance matrix are nonzero; therefore E(ε_i ε_j) ≠ 0, the least squares estimators of β again will not equal the MLE or BLUE of β, and they are therefore not minimum variance estimators. Possible causes of autocorrelated random disturbances include deleting a relevant variable, selecting the incorrect functional form, or the model may be correctly specified but the error terms are correlated. Since the problem with autocorrelation only occurs when the data are naturally ordered—i.e., with time series data but not with cross section data—we'll use a "t" (for time period) subscript for variables coming from year "t".

The variance/covariance matrix V(ε) = Ω contains n(n + 1)/2 distinct elements. In the context of the generalized regression model, we lack sufficient data to obtain separate independent estimates of each Cov(ε_i, ε_j). In order to circumvent this problem we frequently assume that the ε's are related in such a manner that fewer parameters describe the process. One such model which provides an accurate approximation for many time series (it is the most common stochastic process) is the first order autoregressive process

    ε_t = ρ ε_{t-1} + e_t,

where the e_t are assumed to be independently and identically distributed N(0, σ_e²), i.e., they are a white noise stochastic process. Recall that for this stochastic process we have shown that

    E(ε_t) = 0,   Var(ε_t) = σ_ε² = σ_e² / (1 − ρ²),   Cov(ε_t, ε_{t−s}) = ρ^s σ_e² / (1 − ρ²).
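To see where these formulas come from, note that under stationarity

    Var(ε_t) = Var(ρ ε_{t−1} + e_t) = ρ² Var(ε_{t−1}) + σ_e²   (since e_t is independent of ε_{t−1}),

and stationarity means Var(ε_t) = Var(ε_{t−1}) = σ_ε², so σ_ε² = ρ² σ_ε² + σ_e², which gives σ_ε² = σ_e²/(1 − ρ²). Likewise, repeated substitution gives ε_t = ρ^s ε_{t−s} + (terms in e_t, ..., e_{t−s+1}), so Cov(ε_t, ε_{t−s}) = ρ^s σ_ε² = ρ^s σ_e²/(1 − ρ²).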
We observe that the random disturbances ε_t are characterized by constant variance (homoskedasticity), and are uncorrelated if and only if ρ = 0, in which case ε_t = e_t. We also note that since Cov(ε_t, ε_{t−1}) = E(ε_t ε_{t−1}) = ρ σ_e²/(1 − ρ²), we expect positive random disturbances to be followed by positive random disturbances if ρ > 0.

Based upon the assumption that the process ε_t is a first order process, we can write the associated variance/covariance matrix as

             σ_e²      [ 1         ρ         ρ²        ...   ρ^{n−1} ]
    Ω  =   --------    [ ρ         1         ρ         ...   ρ^{n−2} ]
            1 − ρ²     [ ρ²        ρ         1         ...   ρ^{n−3} ]
                       [ ...       ...       ...       ...   ...     ]
                       [ ρ^{n−1}   ρ^{n−2}   ρ^{n−3}   ...   1       ]

Ω is now completely characterized by the two parameters ρ and σ_e², and the estimation problem is considerably simplified.

B. t-test from a residual regression
If the regressors are strictly exogenous, then a two-stage procedure is to 1) get the residuals from the OLS regression, and then 2) run the regression

    ε̂_t = ρ ε̂_{t−1} + e_t

without an intercept (though it would also work with an intercept, asymptotically), and test whether ρ̂ = 0.

STATA code:
regress y x1 x2 x3 x4
predict resids, residuals
gen lag_resids = resids[_n-1]
* be sure to use square brackets
regress resids lag_resids, noconstant
regress resids lag_resids
* just so you can see that having a constant doesn't matter

SAS code:
proc reg;
  model y = x1 x2 x3 x4;
  output out=next_to r=resids;
run;
data diane;
  set next_to;
  lag_resids = lag(resids);  * lag2(.) for lagging twice, etc.;
run;
proc reg;
  model resids = lag_resids / noint;
run;

C. Durbin-Watson Test (chapter 12)
A related test—the most common test for autocorrelated errors (it checks whether there is an AR(1) process)—is the Durbin-Watson test. It is defined by

    D.W. = Σ_{t=2}^{n} (ε̂_t − ε̂_{t−1})² / Σ_{t=1}^{n} ε̂_t²,

where ε̂_t denotes the least squares estimator of the random disturbance ε_t. This expression can be written in a useful alternative form by noting that (denote e_t = ε̂_t for purposes of this derivation):

    Σ_{t=2}^{n} (e_t − e_{t−1})²
        = Σ_{t=2}^{n} e_t² + Σ_{t=2}^{n} e_{t−1}² − 2 Σ_{t=2}^{n} e_t e_{t−1}
        = (Σ_{t=1}^{n} e_t² − e_1²) + (Σ_{t=1}^{n} e_t² − e_n²) − 2 Σ_{t=2}^{n} e_t e_{t−1}
        = 2 (Σ_{t=1}^{n} e_t² − Σ_{t=2}^{n} e_t e_{t−1}) − e_1² − e_n².

Hence

    D.W. = [ 2 (Σ_{t=1}^{n} e_t² − Σ_{t=2}^{n} e_t e_{t−1}) − e_1² − e_n² ] / Σ_{t=1}^{n} e_t²
         = 2 (1 − ρ̂) − (e_1² + e_n²) / Σ_{t=1}^{n} e_t²,

where ρ̂ = Σ_{t=2}^{n} e_t e_{t−1} / Σ_{t=1}^{n} e_t², so that D.W. ≈ 2(1 − ρ̂), where ρ̂ denotes an estimator of ρ. From this expression we note that if ρ = 0, we would expect ρ̂ to be "close" to zero and the value of the Durbin-Watson statistic (DW) to be close to two. In attempting to determine critical values for DW, Durbin and Watson derived the distributions of two statistics which bound DW. Consequently, the reported critical values for the hypothesis ρ = 0 (derived from the bounds) may appear somewhat peculiar, as illustrated by the following figure.

[Figure: sampling distributions of the lower bound, the true statistic, and the upper bound of the Durbin-Watson statistic, with critical values d_L, d_true, and d_U lying below 2.]

The values of d_L (the lower bound, for, say, the five percent level) and d_U (the upper bound, for, say, the five percent level) bracket the true critical value for the 5 percent level, d_true, and are tabulated according to the critical level (α level), sample size (n), and number of non-intercept coefficients in the model (k). The tables have been extended to cover additional sample sizes and numbers of explanatory variables by Savin and White [Econometrica, 1977]. The null hypothesis H0: ρ = 0 is rejected if D.W. < d_L (positive autocorrelation) or D.W. > 4 − d_L (negative autocorrelation). We fail to reject the hypothesis if d_U < D.W. < 4 − d_U, and the test is inconclusive if d_L < D.W. < d_U (or, for negative autocorrelation, 4 − d_U < D.W. < 4 − d_L).
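For concreteness, here is a minimal Stata sketch of computing D.W. and the 2(1 − ρ̂) approximation directly from the OLS residuals; the variable names (y, x1, x2, and the time variable t) are placeholders, and the data are assumed to be tsset. The built-in Stata and SAS commands discussed next report the same statistic.

* hand computation of the Durbin-Watson statistic and the 2(1 - rhohat) approximation
tsset t
regress y x1 x2
predict e, residuals
gen num_t = (e - L.e)^2
* squared successive differences of the residuals; missing for the first observation
gen e2 = e^2
quietly summarize num_t
scalar num = r(sum)
quietly summarize e2
scalar den = r(sum)
display "DW = " num/den
quietly regress e L.e, noconstant
display "2*(1 - rhohat) = " 2*(1 - _b[L.e])
* built-in: estat dwatson after regress (with tsset data) reports the same DW statistic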
STATA will calculate the exact DW statistic for you in its time series program that does regression with first order autoregressive errors (the "prais" procedure):

prais y x1 x2 x3
* this does the Prais-Winsten estimator using the T2 transformation below

In SAS, just use the DW option in proc reg:

proc reg;
  model y = x1 x2 x3 / dw;
run;
/* OR
proc autoreg;
  model y = x1 x2 x3 / dw;
run;
*/

In general, the Durbin-Watson test is not strictly appropriate for models with lagged dependent variables included; in this case one should use the Durbin h statistic (see Durbin, Econometrica, 1970, mentioned, but not given, on p. 384 in Wooldridge), or the simple procedure mentioned in chapter 12 of Wooldridge: the same as the t-test above, but include the independent regressors in the second stage as well,

    ε̂_t = ρ ε̂_{t−1} + δ_0 + δ_1 x_{1t} + δ_2 x_{2t} + e_t,

and test whether ρ̂ = 0. This test can be readily extended to AR(q) processes, using an F-test for the joint significance of several ρ̂_i coefficients (see Wooldridge, chapter 12) or the LM test described just after that.

III. Estimation with Autocorrelated Errors
For applications in which the hypothesis of no autocorrelation is rejected, we may want to obtain maximum likelihood estimators of the vector β. These will be obtained by proceeding in the same manner as in the case of heteroskedasticity, i.e., we will attempt to transform the model so that the transformed random disturbances satisfy the assumptions of the linear regression model and then apply least squares. Consider the model

    y_t = X_t β + ε_t = β_0 + β_1 x_{1t} + ... + β_k x_{kt} + ε_t,   where ε_t = ρ ε_{t−1} + e_t,   t = 1, 2, ..., n.

Lagging the equation one period and multiplying by ρ, we obtain

    ρ y_{t−1} = ρ X_{t−1} β + ρ ε_{t−1} = ρ β_0 + ρ β_1 x_{1,t−1} + ... + ρ β_k x_{k,t−1} + ρ ε_{t−1}.

Subtracting ρ y_{t−1} from y_t yields

    y_t − ρ y_{t−1} = β_0 (1 − ρ) + β_1 (x_{1t} − ρ x_{1,t−1}) + ... + β_k (x_{kt} − ρ x_{k,t−1}) + (ε_t − ρ ε_{t−1}),

or

    y*_t = β_0 (1 − ρ) + β_1 x*_{1t} + ... + β_k x*_{kt} + e_t,   t = 2, ..., n,

where y*_t = y_t − ρ y_{t−1} and x*_{it} = x_{it} − ρ x_{i,t−1}, t = 2, ..., n, i = 1, ..., k. Note that we have (n − 1) observations on y* and the x*_i. The random disturbance term associated with the transformed equation satisfies the usual regression assumptions. The transformed data matrices are given by

    y* = ( y_2 − ρ y_1,  y_3 − ρ y_2,  ...,  y_n − ρ y_{n−1} )′ = T1 y,

where T1 is the (n − 1) × n matrix

           [ −ρ   1   0   ...   0    0 ]
    T1 =   [  0  −ρ   1   ...   0    0 ]
           [  .   .   .         .    . ]
           [  0   0   0   ...  −ρ    1 ]

and

           [ 1 − ρ   x_{12} − ρ x_{11}      ...   x_{k2} − ρ x_{k1}     ]
    X* =   [ 1 − ρ   x_{13} − ρ x_{12}      ...   x_{k3} − ρ x_{k2}     ]   = T1 X.
           [   .            .                            .              ]
           [ 1 − ρ   x_{1n} − ρ x_{1,n−1}   ...   x_{kn} − ρ x_{k,n−1}  ]

A common technique of estimation is then based upon applying least squares to

    y* = X* β + e,   i.e.,   y_t − ρ y_{t−1} = (X_t − ρ X_{t−1}) β + e_t,   t = 2, ..., n.

Several comments need to be made about this approach. First, ρ is generally not known, and an estimate of ρ will need to be used. Second, note that the intercept in the transformed equation is β_0 (1 − ρ), and hence the final estimate of the intercept must be divided by (1 − ρ̂) in order to recover an estimate of β_0. Finally, we need to mention that even if ρ is known, this estimator of β will not be identically equal to the MLE of β because n − 1 observations are used rather than n observations, i.e., we are not using all of the sample information.
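As a concrete illustration of this two-step procedure (estimate ρ, quasi-difference the data, then run OLS), here is a minimal Stata sketch; y, x1, x2, and the time variable t are placeholder names, and the data are assumed to be tsset.

* two-step feasible GLS via the T1 (Cochrane-Orcutt) transformation
tsset t
regress y x1 x2
predict ehat, residuals
quietly regress ehat L.ehat, noconstant
scalar rhohat = _b[L.ehat]
* quasi-difference the data; the first observation is lost
gen ystar  = y  - rhohat*L.y
gen x1star = x1 - rhohat*L.x1
gen x2star = x2 - rhohat*L.x2
regress ystar x1star x2star
* the intercept here estimates b0*(1 - rho), so recover b0 by dividing
display "b0 = " _b[_cons]/(1 - rhohat)
* built-in (iterated) version: prais y x1 x2, corc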
The MLE of β can be obtained by noting that

    √(1 − ρ²) y_1 = √(1 − ρ²) X_1 β + √(1 − ρ²) ε_1    (transformation of the first observation),

where √(1 − ρ²) ε_1 ~ N(0, σ_e²), and then applying least squares to the transformed equation

    y** = X** β + ε*,

where

    y** = ( √(1 − ρ²) y_1,  y_2 − ρ y_1,  y_3 − ρ y_2,  ...,  y_n − ρ y_{n−1} )′ = T2 y,

            [ √(1 − ρ²)   √(1 − ρ²) x_{11}      ...   √(1 − ρ²) x_{k1}     ]
    X** =   [ 1 − ρ        x_{12} − ρ x_{11}    ...   x_{k2} − ρ x_{k1}    ]   = T2 X,
            [   .                .                            .             ]
            [ 1 − ρ        x_{1n} − ρ x_{1,n−1} ...   x_{kn} − ρ x_{k,n−1} ]

and T2 is the n × n matrix

            [ √(1 − ρ²)   0   0   ...   0   0 ]
            [    −ρ       1   0   ...   0   0 ]
    T2 =    [     0      −ρ   1   ...   0   0 ]    (i.e., the row (√(1 − ρ²), 0, ..., 0) stacked on top of T1).
            [     .       .   .         .   . ]
            [     0       0   0   ...  −ρ   1 ]

Notes:
(1) T2 is n × n whereas T1 is (n − 1) × n; hence y** is n × 1 and y* is (n − 1) × 1.
(2) T2 is the Prais-Winsten transformation, and is probably preferred over the T1 (Cochrane-Orcutt) transformation, especially for small samples. SAS uses a Yule-Walker transformation (a Prais-like transformation) in its proc autoreg.
(3) In cases in which ρ is known, the above procedures are relatively straightforward. When ρ is not known, alternative techniques have been developed. A common technique can be outlined as follows:
    (a) Estimate ρ̂ as discussed above when making the t-tests for first order autocorrelation.
    (b) Transform the data using ρ̂ instead of ρ (either T1 or T2 can be used) and apply least squares to the transformed data.
The associated estimators are referred to as two-stage estimators. (Don't confuse these with the two stage "least squares" estimator, which will be discussed later.) What we said about oblique projections using GLS at the end of lecture 17 applies here as well.
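To make step (b) concrete when the T2 (Prais-Winsten) transformation is used, the earlier Cochrane-Orcutt sketch can be extended by keeping the first observation. This is a minimal sketch with the same placeholder names (y, x1, x2, rhohat, ystar, x1star, x2star), and it assumes the data are sorted so that _n == 1 is the first time period.

* Prais-Winsten (T2) version of the two-step estimator: keep observation 1
replace ystar  = sqrt(1 - rhohat^2)*y  if _n == 1
replace x1star = sqrt(1 - rhohat^2)*x1 if _n == 1
replace x2star = sqrt(1 - rhohat^2)*x2 if _n == 1
* the "intercept" column of X** is not constant, so build it and suppress the constant
gen constar = cond(_n == 1, sqrt(1 - rhohat^2), 1 - rhohat)
regress ystar constar x1star x2star, noconstant
* the coefficient on constar estimates b0 directly
* built-in equivalents: prais y x1 x2, twostep   (two-step Prais-Winsten)
*                       prais y x1 x2            (iterated Prais-Winsten)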