A Common Error in the Treatment of Trending Time Series

by Danny Quah and Jeffrey M. Wooldridge*

Working Paper No. 483, February 1988
Department of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, Cambridge, Mass. 02139

* Both are at the Department of Economics and the Statistics Center, MIT. Quah is also affiliated with the NBER. We thank Olivier Blanchard, Francis Diebold, Stanley Fischer, N. Gregory Mankiw and Angelo Melino for helpful comments on an earlier draft.

Abstract

There are two common misconceptions in the analysis of trending time series. First, if a series is difference-stationary, removing a linear time trend introduces spurious cyclicality. Second, regardless of whether a series is difference-stationary or trend-stationary, taking first differences produces, in either case, a covariance stationary sequence, and so is recommended econometric practice. We show that the first statement is incorrect and that the second can be misleading.

1. Introduction.

A number of recent papers have recommended taking first-differences of observed time series prior to econometric analysis (see for example Campbell and Mankiw (1988) and others). The reasoning is as follows. If a series is truly difference-stationary, then removing a linear time-trend produces spurious cyclicality in the residuals. Under the same condition, taking first differences produces a series that is covariance stationary, and so convenient for econometric analysis. If, on the other hand, the series is truly trend-stationary, taking first-differences nevertheless produces a covariance stationary series, albeit one with a zero in the spectral density at frequency zero. This is still satisfactory however (the reasoning goes), as the unit root in the moving average part produced by over-differencing will manifest in the final estimates. Thus, if one is to remain agnostic as to the cyclicality of the observed time series, the recommended practice is always to take first-differences prior to econometric analysis. Put another way, the recommended practice is not to detrend, for detrending leads to spurious "cyclical" behavior in the residuals. This view has become quite widely held by many macroeconomists. See for example Campbell (1987), Deaton (1986), Nelson (1987), Nelson and Kang (1981, 1984), Mankiw and Shapiro (1985), Romer (1987), and Shapiro (1986) among others.

Nelson and Kang (1981) and Nelson and Plosser (1982) have forcefully argued that least squares detrending of a unit root process produces spurious cyclicality. It is known that when the data are trend-stationary, OLS estimators of the intercept and time trend coefficients converge to their probability limits quite rapidly. Thus there is good reason to believe that in this case, the detrended data appropriately reveal the true underlying dynamics about trend. In addition, Durlauf and Phillips (1986) have shown that when the data are difference-stationary, the OLS estimator for the time trend coefficient, while converging in probability, does so much more slowly relative to that in the trend-stationary case. Further, the OLS estimator for the intercept in this case actually diverges. Thus it is not surprising that the "spurious cyclicality" view is so prevalent.

This paper demonstrates that this reasoning is fundamentally incorrect. First, we demonstrate that removing a linear time trend actually preserves the true stochastic characteristics of the data. We show that data detrended by least squares regression asymptotically provide the correct picture of the underlying dynamics, independent of whether the data are truly trend- or difference-stationary. More precisely, we establish two results here: First, if the data are trend-stationary, the covariogram estimator using detrended data is asymptotically indistinguishable from that using the true unobserved fluctuations about trend. Consequently, the same is true of the correlogram estimator in this case. Second, suppose instead the data are difference-stationary, and a researcher detrends the data by least squares. We show then that the correlogram estimator at each lag converges in probability to 1, which correctly indicates the presence of a unit root. In other words, the researcher will appropriately conclude that the residuals are a unit root process, and are not cyclical about trend.

These statements however are asymptotic; it may still be the case that in finite samples, measures of dynamic correlation subsequent to least squares detrending are severely misleading when the data are truly difference stationary. To analyze this, we take as starting point the findings of Nelson and Kang (1981) on the finite sample bias due to using detrended residuals. We show that using detrended residuals, the exact bias in the covariogram estimator is of approximately the same magnitude, regardless of whether the underlying process is trend-stationary or difference-stationary. With finite samples, any reasonable transformation of the estimated covariogram, such as the estimator for the spectral density, will have approximately the same bias properties, again regardless of whether the data are truly trend- or difference-stationary.
Thus the bias indicated by Nelson and Kang (1981) is irreducible even in the "best practice" case of using least squares to detrend trend-stationary data. We conclude that emphasizing bias due to detrending a difference stationary process is misguided: there is a significant finite sample bias associated with least squares detrending, period.

Clearly, if the data are difference-stationary, it is a misspecification to estimate a time trend, and then to interpret the residuals as being close to a stationary process. We agree that estimating a correctly specified model is better than estimating an incorrectly specified model. However, unless proponents wish to extend the case that least squares detrending produces misleading results to when the data are trend-stationary as well, we do not find convincing the argument that detrending produces spurious cyclicality.

Next, we show that taking first-differences is treacherous for subsequent econometric analysis, should the true data generating process actually be trend-stationary. We illustrate this for three different statistical procedures: estimating the mean of such a transformed process, estimating its moving average coefficient, and finally, estimating causality patterns when one of the variables is such an over-differenced process.

In sum, our analysis almost exactly overturns the conclusions indicated in the introductory paragraph. We contend with the suggestion that macroeconomists should always use first-differenced series so as not to pre-judge the cyclicality in the data: we strongly disagree with this view.

The remainder of the paper is organized as follows. Section 2 considers the effects of removing a fixed but arbitrary linear time trend from a difference-stationary process. The conclusion here is obvious: the deviations from any fixed linear time trend remain difference-stationary. Section 3 considers the effects of first-differencing a trend-stationary process.
The results here are less benign: this introduction of a stochastic singularity manifests in the spectral density vanishing at frequency zero. This situation is shown to cause problems for a number of inferential questions of interest. In Sections 2 and 3, we treat the artificial but usefully intuitive case where by detrending we mean removal of a fixed although completely arbitrary linear time trend. Section 4 considers the relevant practical situation when detrending is performed by least squares regression. We show here that our earlier conclusions are unmodified: regardless of whether the true data generating process is trend- or difference-stationary, the correlogram estimators using least squares detrended data are consistent asymptotically, and are "similarly" behaved in finite samples. Section 5 concludes the paper.

2. Difference Stationary Data Generating Process.

Suppose that the underlying data generating mechanism is difference-stationary:

(i) $Y_t = \beta_0 + Y_{t-1} + u_t$, $t \ge 1$,
(ii) $Y_0$ a given random variable, and
(iii) $u_t$ covariance stationary with mean zero and spectral density bounded away from zero.

Iterating on (i), we have that:

$$Y_t = Y_0 + \beta_0 t + \sum_{j=1}^{t} u_j.$$

A linear time trend specification is a pair of real numbers $(\alpha_1, \beta_1)$. Removing this linear time trend from $Y_t$ results in:

$$X_t = Y_t - \alpha_1 - \beta_1 t = (Y_0 - \alpha_1) + (\beta_0 - \beta_1) t + \sum_{j=1}^{t} u_j.$$

The "detrended residuals" $X_t$ have the integrated form of a difference-stationary sequence. In particular, $X_t$ can be written as:

(a.) $X_t = (\beta_0 - \beta_1) + X_{t-1} + u_t$, $t \ge 1$,
(b.) $X_0$ a given random variable, $Y_0 - \alpha_1$,
(c.) the same as (iii) above.

Thus, the detrended residuals are another difference-stationary sequence, where the first-difference sequence $X_t - X_{t-1}$ has exactly the same probability properties as the first-difference of the original sequence, $Y_t - Y_{t-1}$ (except possibly in mean). Therefore, unless one supposes that the original series already has a tendency to show "spurious cyclicality," there is absolutely no reason to draw that conclusion for the detrended series. We return in Section 4 below to this observation that detrending does not alter the dynamic properties of the data.

3. Trend Stationary Data Generating Process.

Suppose now that the true data generating mechanism is trend-stationary:

(i) $Y_t = \alpha_0 + \beta_0 t + u_t$,
(ii) $u_t$ covariance stationary with mean zero and spectral density bounded away from zero.

As before, a linear time trend is a pair of real numbers $(\alpha_1, \beta_1)$. Removing a time trend therefore produces:

$$X^1_t = Y_t - \alpha_1 - \beta_1 t \;\Rightarrow\; X^1_t = (\alpha_0 - \alpha_1) + (\beta_0 - \beta_1) t + u_t.$$

The resulting process $X^1_t$ is seen to be trend-stationary, with exactly the same dynamic stochastic properties as in the original data $Y$ (in $u$). In the special case where $\alpha_1$ and $\beta_1$ are equal to $\alpha_0$ and $\beta_0$, $X^1_t = u_t$. Since $u_t$ is covariance stationary, this is simply a special case of trend-stationarity. Comparing this result with the conclusion from Section 2, we note: Removing a fixed but arbitrary linear time trend preserves the true properties of the data, regardless of whether the underlying model is difference stationary or trend stationary. Hence linear de-trending actually has the very desirable property that it does not distort the true features of the data, contrary to numerous assertions that the opposite is true.

Next consider taking first differences of trend-stationary data:

$$X^2_t = Y_t - Y_{t-1} = \beta_0 + (u_t - u_{t-1}).$$

This $X^2$ is also covariance stationary, although it has a zero in its spectral density at frequency zero. This last characteristic has been noted by macroeconomists, but has always been relegated to the statement that "it will show up as a unit root in the moving average part." We now point out that instead its presence renders $X^2$ treacherous to analyze econometrically.

Inference about $X^2$ requires that, as a stochastic process, it generate statistics that admit nondegenerate asymptotic distributions. Then the researcher can be confident that a given sample is informative on a hypothesis of interest. We demonstrate that $X^2$, being an over-differenced sequence, in fact does not have desirable properties. First, we show that in the leading case of expectation estimation, such an over-differenced process produces a degenerate asymptotic distribution for the sample mean estimator. In particular, it produces a statistic that is tantamount to using only the first and last data points in estimation. Second, we show that the nonlinear least squares estimator for the moving average coefficient in an over-differenced $X^2$ has the same asymptotic properties as that for the autoregressive coefficient in the levels of a difference stationary process. Thus, if macroeconomists are using first-differenced data because they do not wish to use the nonstandard distribution theory associated with unit root processes, they should recognize that they are faced with precisely the same problem when they first-difference a process that is truly trend-stationary. Third, we show that using an over-differenced sequence in a bivariate causality test will produce spurious evidence of causality, when in fact the data are actually not Granger causally related. This last point may be found in Sims (1972) as well, although it does not appear to be well-known.

To begin, consider the problem of estimating the mean of $X^2$. For a sample of size $n$, the sample mean is:

$$n^{-1} \sum_{t=1}^{n} X^2_t = \beta_0 + n^{-1}(u_n - u_0).$$

Notice that only two (covariance stationary) random variables $u_n$ and $u_0$ enter the calculation of the sample mean statistic. Thus the sample mean converges to $\beta_0$, but at a rate that does not allow a nondegenerate asymptotic distribution for (any normalized version of) the statistic. In other words, after first-differencing a trend-stationary sequence, the sample mean and associated statistics are econometrically useless. Econometric techniques that rely in part on a central limit property for the sample mean will consequently not allow any valid statistical inference. Thus, we see that taking first differences, when the true data generating process is actually trend stationary, will produce data that may not be particularly informative for statistical inference.

For our second point, suppose for simplicity that $\beta_0$ is 0, and that $u_0$ is 0. Consider estimating the model:

$$X^2_t = u_t - \gamma_0 u_{t-1}, \quad \gamma_0 = 1, \quad t = 1, 2, \ldots$$

For parameter $\gamma$, define the residual function:

$$R_0(\gamma) = 0, \qquad R_t(\gamma) = X^2_t + \gamma R_{t-1}(\gamma), \quad t = 1, 2, \ldots$$

Thus, the true disturbance $u_t$ is obtained as $R_t(1)$ for all $t \ge 1$. Estimation by nonlinear least squares solves the problem:

$$\min_{\gamma} \; \tfrac{1}{2} \sum_t R_t(\gamma)^2.$$

To obtain the asymptotic properties of such an estimator, it is convenient to study the asymptotic distribution of the score. For the $t$-th term, the score is $s_t(\gamma) = R_t(\gamma) \cdot (\partial R_t(\gamma)/\partial\gamma)$. But from above, the random sequence $\partial R_t(1)/\partial\gamma$ at the true parameter value $\gamma_0$ is itself a unit root process, with first difference equal to $u_{t-1}$. Then it follows from the initial condition $\partial R_0(1)/\partial\gamma = 0$ that the score at the true parameter behaves as:

$$s_1(1) = 0, \qquad s_t(1) = u_t \sum_{j=1}^{t-1} u_j, \quad t \ge 2.$$

Notice that the score is the product of a covariance stationary sequence $u_t$ with its accumulation $\sum_{j<t} u_j$. Clearly, the resulting process will not have the usual central limit properties; thus neither will the nonlinear least squares estimator of the coefficient. Moreover, the hessian itself can be seen to have nonstandard asymptotic properties. In fact, the score here has exactly the same form as the score in least squares estimation of the auto-regressive coefficient in a difference-stationary process.
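
The residual recursion and the form of the score at the true parameter can be checked numerically. The following is a minimal sketch, assuming Gaussian white noise for $u_t$ purely for illustration (the text only requires covariance stationarity); the helper names `residuals_and_derivative` and `score_at_truth` are hypothetical and not from the paper. It confirms that $R_t(1)$ recovers $u_t$ and that the derivative sequence is the lagged partial sum of the disturbances.

```python
import numpy as np

def residuals_and_derivative(x, gamma):
    """Run R_t = x_t + gamma * R_{t-1} and its derivative
    D_t = dR_t/dgamma = R_{t-1} + gamma * D_{t-1}, with R_0 = D_0 = 0."""
    n = len(x)
    R = np.zeros(n + 1)
    D = np.zeros(n + 1)
    for t in range(1, n + 1):
        R[t] = x[t - 1] + gamma * R[t - 1]
        D[t] = R[t - 1] + gamma * D[t - 1]
    return R[1:], D[1:]

def score_at_truth(u):
    """Given u_1..u_n (with u_0 = 0), build the over-differenced series
    x_t = u_t - u_{t-1} and return R_t(1), dR_t(1)/dgamma and their product."""
    u_lag = np.concatenate(([0.0], u[:-1]))
    x = u - u_lag
    R, D = residuals_and_derivative(x, 1.0)
    return R, D, R * D

# Illustrative assumption: u_t is Gaussian white noise.
rng = np.random.default_rng(0)
u = rng.standard_normal(200)
R, D, s = score_at_truth(u)

# Lagged partial sums: entry t-1 holds sum_{j=1}^{t-1} u_j (zero for t = 1).
partial_sums_lagged = np.concatenate(([0.0], np.cumsum(u)[:-1]))
print(np.allclose(R, u), np.allclose(D, partial_sums_lagged))
```

The derivative sequence accumulates the disturbances exactly as the levels of a random walk would, which is why the score inherits unit root behavior even though the differenced data are covariance stationary.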
Thus (the score for) the nonlinear least squares estimator of the moving average coefficient in an "over-differenced process" converges not to a normal random variable, but instead to a now-familiar functional of Brownian motion.[1] If the data were truly difference stationary to begin, then of course the usual asymptotic normality theory would apply. The distribution theory for the non-standard (first-differenced) case is now known, and so it is not that a procedure taking this into account is completely intractable. (We have not seen that any economist using first-differenced data has actually used this however.) Our point instead is that the distribution theory that applies to first-differenced data is different across trend stationary and difference stationary models. Thus, the unifying framework that authors such as Campbell and Mankiw (1988) seem to suggest for the use of first-differenced data is simply non-existent.

[1] This is related to a familiar result from Lagrange Multiplier theory for standard models: for instance, Godfrey (1978) shows that Lagrange Multiplier tests against stationary autoregressive and moving average alternatives are identical when the null is white noise; the Lagrange Multiplier of course is just the score. That literature is however not particularly concerned with unit root models.

Finally, we turn to the use of over-differenced processes in Granger causality tests when the true lag length is unknown and so is instead made a function of sample size. This last condition may seem unusual, but the reader should remember that applied researchers often do have to decide on a lag-length specification in time domain work. They do this based on observed data (and hence sample size), and of course the "true" lag length is never known for certain.

Consider two random sequences $Z$ and $Y$, in whose causality relations we are interested. We will use the notation $X^2$ to indicate a variable that is the outcome of first-differencing $Y$, which is already stationary: thus $X^2$ has a zero in its spectral density at frequency zero. Suppose that $Z$ is a covariance stationary process with spectral density bounded away from zero. From Sims's Theorem 2, the causality relations between $Z$ and $Y$ can be studied by considering the two-sided projection of one variable on current, lagged and future values of the other. For simplicity assume that $Z$ and $Y$ are in fact uncorrelated at all leads and lags. Thus neither $Z$ nor $Y$ Granger cause the other: the two-sided projections are identically zero and thus are trivially one-sided. A researcher using $Z$ and $Y$ will necessarily discover this for sufficiently large sample sizes.

However consider instead the two-sided projection of $Z$ on $X^2$:

$$Z_t = \sum_{j=-\infty}^{\infty} b(j) X^2_{t-j} + v_t, \qquad E\, v_t X^2_{t-j} = 0 \text{ for all } j.$$

The true projection coefficients $b$ again are zero. Equivalently, the true (optimal) $R^2$ is zero, and least squares estimation will attempt to achieve this. Let $n$ denote sample size, and let $b_n$ denote the sequence of fitted projections $b_n(j)$, $j = -\infty, \ldots, \infty$, $n \ge 1$. Call $S_X(\omega)$ the spectral density of $X^2$, and call $\tilde b$ the fourier transform of a (square summable) sequence $b$. The difference in mean squared error between fitted projections (implied by) $b_n$ and the true best fit (implied by) $b_0$ is given by Sims's approximation error formula as:

$$[d_X(b_n, b_0)]^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \tilde b_n(\omega) - \tilde b_0(\omega) \right|^2 S_X(\omega)\, d\omega = E \left| \sum_{j} b_n(j) X^2_{t-j} \right|^2,$$

where the last equality uses $b_0 = 0$.

Now consider the following family of candidate fitted projection coefficients: for $n > 0$,

$$b_n(j) = n^{-1}(n - |j|) \text{ for } |j| \le n, \qquad b_n(j) = 0 \text{ otherwise.}$$

For each $n$, this is a triangular shaped two-sided lag distribution symmetric about zero; the distribution takes the value 1 at $j = 0$, and then declines as a straight line to 0 at lag and lead $n$. But when $X^2_t = Y_t - Y_{t-1} = u_t - u_{t-1}$ with $u$ white noise, such a family of lag distributions implies a deterioration in mean squared error given by:

$$[d_X(b_n, b_0)]^2 = E \left| \sum_{j=-n}^{n} b_n(j) \left[ u_{t-j} - u_{t-j-1} \right] \right|^2$$
$$= E \left| b_n(-n) u_{t+n} + \sum_{j=-n+1}^{n} \left[ b_n(j) - b_n(j-1) \right] u_{t-j} - b_n(n) u_{t-n-1} \right|^2$$
$$= \mathrm{Var}(u) \sum_{j=-n+1}^{n} \left( b_n(j) - b_n(j-1) \right)^2$$
$$= 2 n^{-1}\, \mathrm{Var}(u) \to 0 \text{ as } n \to \infty.$$

As sample size increases to infinity, such a family of lag distributions fits arbitrarily well in the sense of mean squared error. However, at any sample size all members of this family are two-sided: thus Granger causality statistics would lead the researcher to conclude incorrectly that $Z$ Granger causes $X^2$.

This rather surprising result can be understood in terms of ordinary least squares theory. The spectral density of the right hand side variable in a distributed lag regression corresponds to the $X'X$ matrix in ordinary least squares models. If the spectral density is zero at some frequency, that is equivalent to singularity in the $X'X$. This can be interpreted to mean that the population regression is not identified. In the case of interest here, this is exactly what happens. The lag distribution is not uniquely identified: more than one distribution will fit the data equally well.

Using first-differenced data therefore renders particularly subtle the interpretation of Granger causality statistics. We emphasize that this is fundamentally different from the usual prefiltering effects. It is easy to show that with covariance stationary data, prefiltering by arbitrary one-sided filters whose fourier transforms are bounded away from zero leaves unaltered patterns of Granger causality. The result here is that prefiltering one of the series by first-differencing does affect Granger causality relations.
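
The mean squared error of the triangular lag family can be computed exactly. The sketch below uses exact rational arithmetic to evaluate the sum of squared first differences of the triangular weights (the mean squared error divided by Var(u)) and compares it with 2/n. The function names `triangular_lag` and `mse_over_var_u` are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def triangular_lag(n, j):
    """Triangular two-sided lag weight: (n - |j|)/n for |j| <= n, else 0."""
    return Fraction(n - abs(j), n) if abs(j) <= n else Fraction(0)

def mse_over_var_u(n):
    """Sum over j = -n+1..n of (b_n(j) - b_n(j-1))^2.

    When the regressor is u_t - u_{t-1} with u white noise, this equals the
    mean squared error of the triangular lag distribution divided by Var(u)."""
    return sum(
        (triangular_lag(n, j) - triangular_lag(n, j - 1)) ** 2
        for j in range(-n + 1, n + 1)
    )

for n in (1, 5, 50, 500):
    # Each line shows n, the exact sum, and whether it equals 2/n.
    print(n, mse_over_var_u(n), mse_over_var_u(n) == Fraction(2, n))
```

Each of the 2n first differences has magnitude 1/n, so the sum is 2n times 1/n², which shrinks to zero even though every member of the family puts weight on future values of the regressor.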
Thus, contrary to the sanguine conclusion that over-differencing a trend-stationary process will simply "show up as a unit root in the moving average part," we conclude that this may instead produce altogether unreliable econometric results. It is true that in either case of difference stationarity or trend stationarity, the resulting process is always covariance stationary. However, the resulting process is not necessarily amenable to econometric analysis when the data were trend stationary to begin. We have presented three examples that seem to us especially convincing of the difficulties associated with using "over-differenced" data. At the same time, these examples are of particular interest to macroeconomists. The first and third examples (estimating the mean and testing for Granger causality) are related in that they are both due directly to the spectral density of an over-differenced process vanishing at frequency zero. Information on such a process does not accumulate as the observed sample size increases. Our second example cautions that first-differencing to produce stationarity, no matter whether the original data are difference- or trend-stationary, is by no means a panacea for the econometric difficulties confronting researchers when they analyze trending time series. In particular, if they wish to avoid the nonstandard distribution theory associated with analyzing difference stationary processes in levels, they should realize that they have simply re-created those difficulties when they take first-differences of a trend-stationary sequence.

4. The Effects of Least Squares Detrending.

In the previous two sections, we used the convenient fiction that detrending involved removing a fixed although arbitrary time trend. We now show that appropriate versions of the reasoning above apply even when the detrending line is estimated by regression.

First, we establish that if the data have a unit root, the correlogram estimated from detrended data converges pointwise at each lag to the correct value of 1. This may initially seem surprising: when the data generating process has a unit root, the OLS estimator for the intercept is known to diverge (see for instance Durlauf and Phillips (1986)). Thus one might conjecture that the detrended residuals have no desirable properties. However, recall that the Durbin-Watson statistic in that case nevertheless converges in probability to its correct value of zero (again, see Durlauf and Phillips (1986)). On further thought, this happens because the Durbin-Watson statistic involves the first difference of the detrended data. Thus the ill-behaved estimator for the intercept is simply subtracted out, and its lack of convergence in probability is inconsequential. Next, the estimator for the time trend coefficient only converges at rate $\sqrt{n}$, whereas it gets multiplied by a variable (time) growing with the sample size as $n$, so that the error from using estimated rather than true residuals grows. However, again recall that the Durbin-Watson statistic involves the first difference of the fitted residuals. Thus it is only the first difference of the time variable that is relevant. This of course is just constant. This feature controls the rate at which the estimation error grows, and in particular, that error actually converges at rate $\sqrt{n}$ in probability. But now we note that this reasoning applies not only at the first lag, as for the Durbin-Watson statistic, but in fact applies at every fixed lag. Thus even in the difference stationary case it is asymptotically irrelevant that we use estimated rather than true residuals in estimating the correlogram.

For the trend stationary case, the estimators for the intercept and time trend coefficients converge in probability at rates $n^{1/2}$ and $n^{3/2}$ respectively. Thus in this case, it is straightforward to establish that the correlogram estimator using estimated residuals converges in probability to the correlogram of the true residuals. From the discussion above, and the formal results below, we are able to establish that this statement extends to difference stationary data as well. Note that by contrast with trend-stationary data, in the difference-stationary case, the estimator for the intercept diverges at rate $n^{1/2}$, and the estimator for the time trend coefficient converges only at the slower rate $n^{1/2}$.

Second, we turn to behavior in finite samples. Nelson and Kang (1981) have argued that the use of least squares detrended data, when in fact the data are difference stationary, results in estimators for the covariogram and the spectral density that are biased towards showing cyclicality in finite samples. We interpret this as saying that incorrect econometric specification will lead to an incorrect conclusion. In our view, this argument carries weight only if correct specification in this context does not lead to that erroneous conclusion.

Suppose on the other hand, that the true data generating mechanism is stationary about trend. Then, the "best practice" procedure is to detrend by least squares: it is irrelevant both asymptotically and in finite samples whether one estimates the residual serial correlation simultaneously with the trend line, or with a two-step procedure by first detrending. In this situation, would a researcher following "best practice" similarly discover spurious cyclicality? The answer is yes. If the true data generating mechanism were white noise about deterministic trend, the fitted residuals will necessarily be serially correlated. This follows directly from the properties of BLUS residuals (see for example introductory textbooks such as Theil (1971)). We show below that for every finite sample, at any fixed lag, the exact bias in the covariogram estimator is "continuous" in the serial persistence of the data generating mechanism, in a neighborhood containing unit root processes. Loosely speaking, in finite samples, there is certainly a nonzero bias in the estimated dynamics of an inappropriately detrended unit root process; however, this bias is only of the same order of magnitude as the bias when trend stationary processes are correctly detrended.

One possible conclusion from this is that the message due to Nelson and Kang (1981) applies to persistent trend-stationary models as well as to difference-stationary models: researchers will just always find spurious cyclicality, regardless of the true model. We think this is a little extreme: our preferred interpretation instead is that finite sample theory does not suggest that detrending is a bad thing to do. Putting this together with our results for the asymptotic behavior of the detrended residuals, we conclude that there is no evidence that detrending produces misleading results when the true model has a unit root. Of course using the correct econometric specification is always better than using an incorrect specification, but we do not think that is the issue here.

Some economists have suggested to us that "if you just look at the picture of fitting a trend line to a unit root process, you'll see clearly why there is spurious cyclicality; the end-points of both the trend and the unit root process come close to each other. Thus the detrended data are made to look cyclical." While there is certainly something to this graphical intuition, its effects show up nowhere in our calculations below. Consequently, we believe this intuition to be incorrect, and we do not find these "look at the picture"-type arguments persuasive.

To begin the formal analysis, we impose some standard assumptions on the heterogeneity and serial dependence permitted in our data. Let $\{u_t\}_{t=1}^{\infty}$ be the disturbance identified above, and define the partial sum sequence: $S_0 = 0$, $S_t = \sum_{j=1}^{t} u_j$. We impose the following conditions.

Assumption 4.1 (Regularity): Let $u$ satisfy:

(a.) $E u_t = 0$ for all $t$;
(b.) $E u_t^2 = \sigma_u^2 > 0$ for all $t$;
(c.) $\sup_t E|u_t|^{2+\delta} < \infty$ for some $\delta > 0$;
(d.) $\sigma_0^2 = \lim_{n \to \infty} E(n^{-1} S_n^2)$ exists, $0 < \sigma_0^2 < \infty$;
(e.) $\{u_t\}_{t=1}^{\infty}$ is strong mixing, with mixing coefficients $\alpha_m$ such that:

$$\sum_{m=1}^{\infty} \alpha_m^{1 - 2/(2+\delta)} < \infty.$$

We use these conditions as they are by now relatively familiar in the literature: they are not the weakest conditions possible to obtain our results. The reader is referred to Phillips (1987) and Phillips and Perron (1988) for further discussion of these conditions. Finite order covariance stationary ARMA processes with Gaussian disturbances (where the moving average part does not have a unit root) can be shown to satisfy these assumptions.

We will use the following result repeatedly in the subsequent discussion:

Lemma 4.2 (Asymptotic Distributions): Assume the conditions of Assumption 4.1. Then as $n \to \infty$:

(a.) $n^{-1/2} \sum_{t=1}^{n} u_t \Rightarrow \sigma_0 N(0,1)$;
(b.) $n^{-3/2} \sum_{t=1}^{n} S_t \Rightarrow \sigma_0 \int_0^1 W(r)\, dr$;
(c.) $n^{-3/2} \sum_{t=1}^{n} t u_t \Rightarrow \sigma_0 \left( W(1) - \int_0^1 W(r)\, dr \right)$;

where $N(0,1)$ is the standard normal, $W$ is standard Brownian motion, and $\Rightarrow$ denotes weak convergence.

Our asymptotic results are expressed in the following:

Theorem 4.3 (Consistency): Let $\{Y_t, t = 1, 2, \ldots, n\}$ be an observed sample; and let $\hat u_t(n)$ be the fitted residuals from an OLS regression on a constant and time trend.

(a.) If $Y_t$ is trend-stationary, then for each fixed lag $j$, as $n \to \infty$:

$$(n-j)^{-1} \sum_{t=j+1}^{n} \hat u_t(n) \hat u_{t-j}(n) - (n-j)^{-1} \sum_{t=j+1}^{n} u_t u_{t-j} \to 0 \text{ in probability.}$$

(b.) If $Y_t$ is difference stationary, then for each fixed lag $j$, as $n \to \infty$:

$$\frac{\sum_{t=j+1}^{n} \hat u_t(n) \hat u_{t-j}(n)}{\sum_{t=1}^{n} \hat u_t(n)^2} \to 1 \text{ in probability.}$$

Thus in either case, detrending does not affect convergence in probability of the correlogram estimator to the correct values. While this result may be initially surprising for the reasons articulated above, it nevertheless is reassuring.

It is still an asymptotic statement. It may be that in finite samples, detrending a unit root process is severely misleading. We therefore establish a continuity proposition: If the data are trend-stationary, with the true disturbances about trend highly correlated, then the finite sample bias using detrended data is in fact of the same order of magnitude as the finite sample bias when the data are difference stationary.

We will somewhat redefine the model for convenience: Let

$$Y_t = \alpha_0 + \beta_0 t + e_t, \quad t \ge 1,$$

where $e_t$ can either be covariance stationary, or be generated as a unit root sequence with zero drift.

Theorem 4.4 (Finite Sample Bias): Let $u_0$ be a random variable, uncorrelated with $\{u_t\}_{t \ge 1}$, and let $\{u_t\}_{t \ge 1}$ satisfy Assumption 4.1; let the disturbance $\{e_t\}_{t \ge 1}$ be generated by two alternative models:

(a.) $e_{\lambda t} = \lambda e_{\lambda, t-1} + u_t$, $|\lambda| < 1$, $t \ge 1$, $e_{\lambda 0} = u_0$;
(b.) $e_{1t} = e_{1, t-1} + u_t$, $t \ge 1$, $e_{10} = u_0$.

Given a sample of size $n \ge 3$, let $\hat e_{\lambda t}(n)$, $\hat e_{1t}(n)$, $t = 1, 2, \ldots, n$, denote the detrended data (fitted residuals). For each $j = 0, 1, \ldots, n-1$, the exact biases in the covariogram estimator are:

$$B_\lambda(j, n) = E \left[ (n-j)^{-1} \sum_{t=j+1}^{n} \hat e_{\lambda t}(n) \hat e_{\lambda, t-j}(n) - (n-j)^{-1} \sum_{t=j+1}^{n} e_{\lambda t} e_{\lambda, t-j} \right],$$

and

$$B_1(j, n) = E \left[ (n-j)^{-1} \sum_{t=j+1}^{n} \hat e_{1t}(n) \hat e_{1, t-j}(n) - (n-j)^{-1} \sum_{t=j+1}^{n} e_{1t} e_{1, t-j} \right].$$

Then for each fixed $n \ge 3$, for each fixed $j = 0, 1, \ldots, n-1$:

$$B_\lambda(j, n) \to B_1(j, n) \text{ as } \lambda \uparrow 1.$$

Case (b.) in the Theorem specifies a zero drift unit root process; however since the true parameter $\beta_0$ is arbitrary, our discussion certainly covers unit root processes with nonzero drift.

Theorem 4.4 states the following: the bias that arises from using estimated residuals (detrended by least squares) is of the same order of magnitude, independent of whether the underlying data are difference-stationary or trend-stationary.
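
The continuity of the finite sample bias in the persistence parameter can be illustrated with an exact matrix calculation. The following is a sketch under stated assumptions, not the paper's own derivation: it takes the disturbances $u_0, u_1, \ldots$ to be uncorrelated with mean zero and unit variance (a simplification of Assumption 4.1), and it reads the bias as the expected covariogram of the least squares residuals minus that of the true disturbances. The function names `disturbance_cov`, `detrend_projector` and `exact_bias` are hypothetical.

```python
import numpy as np

def disturbance_cov(lam, n):
    """Covariance matrix of e_1..e_n where e_t = lam * e_{t-1} + u_t, e_0 = u_0.

    Illustrative assumption: u_0, u_1, ... are uncorrelated with unit variance,
    so e_t = sum_{k=0}^{t} lam^(t-k) u_k and the covariance is A A'."""
    t = np.arange(1, n + 1)[:, None]
    k = np.arange(0, n + 1)[None, :]
    A = np.where(k <= t, float(lam) ** np.clip(t - k, 0, None), 0.0)
    return A @ A.T

def detrend_projector(n):
    """Residual-maker matrix for OLS on a constant and a linear time trend."""
    t = np.arange(1, n + 1, dtype=float)
    X = np.column_stack([np.ones(n), t])
    return np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

def exact_bias(lam, n, j):
    """Expected lag-j covariogram of detrended data minus that of true disturbances."""
    S = disturbance_cov(lam, n)
    M = detrend_projector(n)
    diff = M @ S @ M - S  # fitted residuals are M e, with covariance M S M
    # Sum the entries (t, t-j) for t = j+1..n, then average over n-j terms.
    return float(np.trace(diff, offset=-j)) / (n - j)

n = 20
for lam in (0.5, 0.9, 0.99, 0.999, 1.0):
    print(lam, [round(exact_bias(lam, n, j), 4) for j in (0, 1, 2)])
```

Because the fitted residuals are a fixed linear transformation of the disturbances, the bias is a polynomial in the persistence parameter for each fixed sample size and lag, so the values printed for persistence near one line up with the unit root row rather than jumping discontinuously.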
To summarize the results of this section, while there are clearly differences in the behavior of least squares estimators of trend coefficients across trend- and difference-stationary data, the resulting detrended data have similarly revealing properties for their true underlying dynamics.

5. Conclusion.

The specific technical contribution of this paper is two-fold: First we have shown that the correlogram estimators using least squares detrended data are consistent for the true values of the correlogram, regardless of whether the data are actually trend- or difference-stationary. Second, we have established that the finite sample bias in "cyclicality" that results from detrending difference-stationary data is no worse than that in the best practice case of detrending trend-stationary data. Our conclusions draw on both exact as well as asymptotic arguments. This is in keeping with the reasoning that has motivated the erroneous conclusions we list in the Introduction. We observe that quite a number of applied workers have adopted these incorrect statements in their own research.

We emphasize that: (i) removing arbitrary fixed linear time trends actually preserves the true properties of the time series data, regardless of whether the true model is difference stationary or trend stationary, (ii) taking first differences, if the true model is in fact trend stationary, produces data that are econometrically useless, and, (iii) least squares detrending does not disguise the cyclical properties of the data. In our view, the discussion here overturns the conventional wisdom among many applied researchers. Our results contradict the observation that incorrect detrending distorts the statistical properties of the data and produces "spurious cyclicality," and warn against the undiscriminating use of first differencing: First differencing is not a procedure we would recommend to researchers confronted with trending time series data.

References

Campbell, J. Y.
(1987): "Does Saving Anticipate Declining Labor Income? An Alternative Test of the Permanent Income Hypothesis," Econometrica, November, 55 no. 6, 1249-1274.

Campbell, J.Y. and N.G. Mankiw (1988): "Are Output Fluctuations Transitory?" Quarterly Journal of Economics, forthcoming.

Deaton, A.S. (1986): "Life-Cycle Models of Consumption: Is the Evidence Consistent with the Theory?" NBER Working Paper No. 1910, Cambridge.

Durlauf, S.N. and P.C.B. Phillips (1986): "Trends versus Random Walks in Time Series Analysis," Cowles Foundation Discussion Paper no. 788, Yale University.

Godfrey, L.G. (1978): "Testing against General Autoregressive and Moving Average Error Models when the Regressors include Lagged Dependent Variables," Econometrica, 46, 1293-1302.

Mankiw, N.G. and M.D. Shapiro (1985): "Trends, Random Walks, and Tests of the Permanent Income Hypothesis," Journal of Monetary Economics, September, 16, 165-174.

Nelson, C. (1987): "A Reappraisal of Recent Tests of the Permanent Income Hypothesis," Journal of Political Economy, 95 No. 3, 641-646.

Nelson, C. and H. Kang (1981): "Spurious Periodicity in Inappropriately Detrended Time Series," Econometrica, May, 49 no. 3, 741-751.

Nelson, C. and H. Kang (1984): "Pitfalls in the Use of Time as an Explanatory Variable in Regression," Journal of Business and Economic Statistics, January, 2, 73-82.

Nelson, C. and C. Plosser (1982): "Trends and Random Walks in Macroeconomic Time Series," Journal of Monetary Economics, 10, 139-162.

Phillips, P.C.B. (1987): "Time Series Regression with a Unit Root," Econometrica, 55, 277-301.

Phillips, P.C.B. and P. Perron (1988): "Testing for a Unit Root in Time Series Regression," forthcoming Biometrika.

Romer, C.D. (1987): "Changes in the Cyclical Behavior of Individual Production Series," NBER Working Paper No. 2440, Cambridge.

Shapiro, M.D. (1986): "Investment, Output, and the Cost of Capital," Brookings Papers on Economic Activity, 1, 111-152.

Sims, C.A.
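(1972): "The Role of Approximate Prior Restrictions in Distributed Lag Estimation," Journal of the American Statistical Association, 67, 169-175.

Theil, H. (1971): Principles of Econometrics, New York: John Wiley.

White, H. (1984): Asymptotic Theory for Econometricians, New York: Academic Press.

Appendix.

Proof of Lemma 4.2: Part (a.) is the usual central limit result, see e.g. White (1984) Theorem 5.19. The remainder is simply Lemma 2.3 in Phillips and Perron (1988). Q.E.D.

To prove the main Theorems, it is convenient to consider an alternative regression model:

$$Y_t = \theta_{01} + \theta_{02}\left(t - \frac{n+1}{2}\right) + \epsilon_t = Z_t\,\theta_0 + \epsilon_t,$$

where $Z_t = \left(1,\; t - \frac{n+1}{2}\right)$. This alters the original specification of $\theta_{01}$ in the text to the extent that $\theta_{02}$ is different from zero, but it is inconsequential for studying the regression residuals. Also, the modification is extremely convenient, and is the specification that most researchers have used, see for example Nelson and Kang (1981), Durlauf and Phillips (1986), Phillips and Perron (1988), and others. The Assumptions made in the text regarding the disturbance are taken to apply to the disturbances of this equation, $\epsilon$, and detrended data refers to the fitted residuals of this equation. We abuse notation and call processes covariance stationary if they satisfy Assumption 4.1. Similarly call processes difference stationary if their first differences satisfy Assumption 4.1. We first establish a Lemma; this comprises known results, but it is easy, and we place it here for completeness.

Lemma A. Let $\hat\theta_n = (\hat\theta_{n1}, \hat\theta_{n2})'$ denote the OLS estimator for $\theta_0$.

(a.) When $\epsilon$ is covariance stationary,

$$n^{1/2}\left(\hat\theta_{n1} - \theta_{01}\right) \Rightarrow \sigma_0\,N(0,1); \qquad n^{3/2}\left(\hat\theta_{n2} - \theta_{02}\right) \Rightarrow 6\sigma_0\left[ W(1) - 2\int_0^1 W(r)\,dr \right].$$

(b.) When $\epsilon$ is difference stationary,

$$n^{-1/2}\left(\hat\theta_{n1} - \theta_{01}\right) \Rightarrow \sigma_0 \int_0^1 W(r)\,dr; \qquad n^{1/2}\left(\hat\theta_{n2} - \theta_{02}\right) \Rightarrow 6\sigma_0\left[ 2\int_0^1 r\,W(r)\,dr - \int_0^1 W(r)\,dr \right].$$

Proof of Lemma A. By the usual OLS formula,

$$\hat\theta_n - \theta_0 = \left( \sum_{t=1}^{n} Z_t' Z_t \right)^{-1} \sum_{t=1}^{n} Z_t'\,\epsilon_t.$$

Let

$$\left( D_1(n),\; D_2(n) \right) = \begin{cases} \left( n^{1/2},\; n^{3/2} \right) & \text{if } \epsilon \text{ is covariance stationary;} \\ \left( n^{-1/2},\; n^{1/2} \right) & \text{if } \epsilon \text{ is difference stationary.} \end{cases}$$

Rewrite the above as:

$$\begin{pmatrix} D_1(n)\left(\hat\theta_{n1} - \theta_{01}\right) \\ D_2(n)\left(\hat\theta_{n2} - \theta_{02}\right) \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & n^{-3}\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)^2 \end{pmatrix}^{-1} \begin{pmatrix} D_1(n)\,n^{-1}\sum_{t=1}^{n}\epsilon_t \\ D_2(n)\,n^{-3}\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)\epsilon_t \end{pmatrix},$$

which, since $n^{-3}\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)^2 \to 1/12$, implies that:

$$D_1(n)\left(\hat\theta_{n1} - \theta_{01}\right) - D_1(n)\,n^{-1}\sum_{t=1}^{n}\epsilon_t \to 0; \qquad D_2(n)\left(\hat\theta_{n2} - \theta_{02}\right) - 12\,D_2(n)\,n^{-3}\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)\epsilon_t \to 0.$$

(a.) When $\epsilon$ is covariance stationary:

$$D_1(n)\,n^{-1}\sum_{t=1}^{n}\epsilon_t = n^{-1/2}\sum_{t=1}^{n}\epsilon_t \Rightarrow \sigma_0\,N(0,1);$$

$$D_2(n)\,n^{-3}\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)\epsilon_t = n^{-3/2}\sum_{t=1}^{n} t\,\epsilon_t - \frac{n+1}{2n}\,n^{-1/2}\sum_{t=1}^{n}\epsilon_t \Rightarrow \sigma_0\left( W(1) - \int_0^1 W(r)\,dr - \frac{1}{2}W(1) \right).$$

Thus:

$$n^{1/2}\left(\hat\theta_{n1} - \theta_{01}\right) \Rightarrow \sigma_0\,N(0,1); \qquad n^{3/2}\left(\hat\theta_{n2} - \theta_{02}\right) \Rightarrow 6\sigma_0\left[ W(1) - 2\int_0^1 W(r)\,dr \right].$$

(b.) When $\epsilon$ is difference stationary:

$$D_1(n)\,n^{-1}\sum_{t=1}^{n}\epsilon_t = n^{-3/2}\sum_{t=1}^{n}\epsilon_t \Rightarrow \sigma_0\int_0^1 W(r)\,dr;$$

$$D_2(n)\,n^{-3}\sum_{t=1}^{n}\left(t - \frac{n+1}{2}\right)\epsilon_t = n^{-5/2}\sum_{t=1}^{n} t\,\epsilon_t - \frac{n+1}{2n}\,n^{-3/2}\sum_{t=1}^{n}\epsilon_t \Rightarrow \sigma_0\left[ \int_0^1 r\,W(r)\,dr - \frac{1}{2}\int_0^1 W(r)\,dr \right].$$

Thus:

$$n^{-1/2}\left(\hat\theta_{n1} - \theta_{01}\right) \Rightarrow \sigma_0\int_0^1 W(r)\,dr; \qquad n^{1/2}\left(\hat\theta_{n2} - \theta_{02}\right) \Rightarrow 6\sigma_0\left[ 2\int_0^1 r\,W(r)\,dr - \int_0^1 W(r)\,dr \right].$$

This establishes the Lemma. Q.E.D.

Part (b.) is due to Durlauf and Phillips (1986); we have not been able to find a reference for part (a.), although it seems to be relatively well-known.

Proof of Theorem 4.3 (Consistency): By definition, $\hat\epsilon_t(n) = \epsilon_t - Z_t\left(\hat\theta_n - \theta_0\right)$, so that:

$$\hat\epsilon_t(n)\,\hat\epsilon_{t-j}(n) = \epsilon_t\,\epsilon_{t-j} + \mathrm{tr}\left[ \left(\hat\theta_n - \theta_0\right)\left(\hat\theta_n - \theta_0\right)' Z_t' Z_{t-j} \right] - \left(\hat\theta_n - \theta_0\right)'\left[ Z_t'\,\epsilon_{t-j} + Z_{t-j}'\,\epsilon_t \right].$$

(a.) Suppose $\epsilon$ is covariance stationary. Rewriting the above,

$$\mathrm{tr}\left[ \left(\hat\theta_n - \theta_0\right)\left(\hat\theta_n - \theta_0\right)' \sum_{t=j+1}^{n} Z_t' Z_{t-j} \right] - \left(\hat\theta_n - \theta_0\right)' \sum_{t=j+1}^{n}\left[ Z_t'\,\epsilon_{t-j} + Z_{t-j}'\,\epsilon_t \right] = \sum_{t=j+1}^{n} \hat\epsilon_t(n)\,\hat\epsilon_{t-j}(n) - \sum_{t=j+1}^{n} \epsilon_t\,\epsilon_{t-j}.$$

But by Lemma A, $\hat\theta_{n1} - \theta_{01} = O_p(n^{-1/2})$ and $\hat\theta_{n2} - \theta_{02} = O_p(n^{-3/2})$, which imply that $\left(\hat\theta_{n1} - \theta_{01}\right)^2 = O_p(n^{-1})$, $\left(\hat\theta_{n1} - \theta_{01}\right)\left(\hat\theta_{n2} - \theta_{02}\right) = O_p(n^{-2})$, and $\left(\hat\theta_{n2} - \theta_{02}\right)^2 = O_p(n^{-3})$. Thus the first term on the left hand side is $O_p(1)$. Further, by Lemma 4.2 again, $\sum \epsilon_t = O_p(n^{1/2})$ and $\sum t\,\epsilon_t = O_p(n^{3/2})$, so that the second term on the left hand side is $O_p(1) + O_p(1) = O_p(1)$. Thus the right hand side satisfies:

$$n^{-1}\sum_{t=j+1}^{n} \hat\epsilon_t(n)\,\hat\epsilon_{t-j}(n) - n^{-1}\sum_{t=j+1}^{n} \epsilon_t\,\epsilon_{t-j} \to 0 \quad \text{in probability as } n \to \infty.$$

(b.) Suppose $\epsilon$ is difference stationary. Write:

$$\frac{\sum_{t=j+1}^{n} \hat\epsilon_t(n)\,\hat\epsilon_{t-j}(n)}{\sum_{t=j+1}^{n} \hat\epsilon_t(n)^2} = 1 + \frac{\sum_{t=j+1}^{n} \hat\epsilon_t(n)\left[ \hat\epsilon_{t-j}(n) - \hat\epsilon_t(n) \right]}{\sum_{t=j+1}^{n} \hat\epsilon_t(n)^2}.$$

Without loss of generality, suppose $j > 0$. Then:

$$\hat\epsilon_{t-j}(n) - \hat\epsilon_t(n) = \left(\epsilon_{t-j} - \epsilon_t\right) + \left(Z_t - Z_{t-j}\right)\left(\hat\theta_n - \theta_0\right) = -\sum_{k=0}^{j-1} u_{t-k} + (0,\; j)\left(\hat\theta_n - \theta_0\right).$$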
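Therefore:

$$\hat\epsilon_t(n)\left[ \hat\epsilon_{t-j}(n) - \hat\epsilon_t(n) \right] = \left[ \epsilon_t - Z_t\left(\hat\theta_n - \theta_0\right) \right]\left[ -\sum_{k=0}^{j-1} u_{t-k} + (0,\; j)\left(\hat\theta_n - \theta_0\right) \right]$$

$$= -\sum_{k=0}^{j-1} u_{t-k}\,\epsilon_t + \left(\hat\theta_n - \theta_0\right)'(0,\; j)'\,\epsilon_t + \left(\hat\theta_n - \theta_0\right)'\sum_{k=0}^{j-1} Z_t'\,u_{t-k} - \left(\hat\theta_n - \theta_0\right)' Z_t'\,(0,\; j)\left(\hat\theta_n - \theta_0\right),$$

which implies:

$$\sum_{t} \hat\epsilon_t(n)\left[ \hat\epsilon_{t-j}(n) - \hat\epsilon_t(n) \right] = -\sum_{k=0}^{j-1}\sum_{t} u_{t-k}\,\epsilon_t + \left(\hat\theta_n - \theta_0\right)'(0,\; j)'\sum_{t}\epsilon_t + \left(\hat\theta_n - \theta_0\right)'\sum_{k=0}^{j-1}\sum_{t} Z_t'\,u_{t-k} - \left(\hat\theta_n - \theta_0\right)'\left[ \sum_{t} Z_t'\,(0,\; j) \right]\left(\hat\theta_n - \theta_0\right).$$

Now apply Lemma 4.2 and Lemma A repeatedly:

$$\sum_{t} u_{t-k}\,\epsilon_t = \sum_{t} u_{t-k}\,\epsilon_{t-k-1} + \sum_{t}\sum_{l=0}^{k} u_{t-k}\,u_{t-l} = O_p(n),$$

$$\left(\hat\theta_n - \theta_0\right)'(0,\; j)'\sum_{t}\epsilon_t = O_p(n^{-1/2})\,O_p(n^{3/2}) = O_p(n),$$

$$\left(\hat\theta_n - \theta_0\right)'\sum_{t} Z_t'\,u_{t-k} = O_p(n^{1/2})\,O_p(n^{1/2}) + O_p(n^{-1/2})\,O_p(n^{3/2}) = O_p(n),$$

and finally,

$$\left(\hat\theta_n - \theta_0\right)'\left[ \sum_{t} Z_t'\,(0,\; j) \right]\left(\hat\theta_n - \theta_0\right) = O_p(n^{1/2})\,O(n)\,O_p(n^{-1/2}) + O_p(n^{-1/2})\,O(n^{2})\,O_p(n^{-1/2}) = O_p(n).$$

Thus the numerator

$$\sum_{t} \hat\epsilon_t(n)\left( \hat\epsilon_{t-j}(n) - \hat\epsilon_t(n) \right) = O_p(n).$$

To complete the proof, we now establish the asymptotic probability order of the denominator. By Lemma 4.2 and Lemma A, $n^{-2}\sum_{t}\hat\epsilon_t(n)^2$ converges weakly to some functional of $W$. In particular, the limiting random variable takes on the value $0$ with probability $0$. Therefore $\left( n^{-2}\sum_{t}\hat\epsilon_t(n)^2 \right)^{-1}$ is $O_p(1)$. Consequently, as $n \to \infty$:

$$\frac{\sum_{t=j+1}^{n} \hat\epsilon_t(n)\,\hat\epsilon_{t-j}(n)}{\sum_{t=j+1}^{n} \hat\epsilon_t(n)^2} \to 1 \quad \text{in probability.}$$

This establishes the Theorem. Q.E.D.

As remarked in the text, the crucial point in evaluating $\left(\hat\theta_n - \theta_0\right)'(0,\; j)'\sum_{t}\epsilon_t$ and $\left(\hat\theta_n - \theta_0\right)'\left[\sum_{t} Z_t'\,(0,\; j)\right]\left(\hat\theta_n - \theta_0\right)$ in part (b.) is that the divergent first entry of $\hat\theta_n - \theta_0$ is multiplied by zero. Otherwise, the numerator would be $O_p(n^2)$, and given the asymptotic probability order of the denominator, the estimator would fail to converge in probability.

Proof of Theorem 4.4 (Finite Sample Bias): In either case, the cross product is

$$\hat\epsilon_t(n)\,\hat\epsilon_{t-j}(n) - \epsilon_t\,\epsilon_{t-j} = Z_t\left(\hat\theta_n - \theta_0\right)\left(\hat\theta_n - \theta_0\right)' Z_{t-j}' - Z_t\left(\hat\theta_n - \theta_0\right)\epsilon_{t-j} - Z_{t-j}\left(\hat\theta_n - \theta_0\right)\epsilon_t.$$

Thus in either case the exact bias in the covariogram estimator is:

$$B(j,n) = (n-j)^{-1}\sum_{t=j+1}^{n} Z_t\,E\left[ \left(\hat\theta_n - \theta_0\right)\left(\hat\theta_n - \theta_0\right)' \right] Z_{t-j}' - (n-j)^{-1}\sum_{t=j+1}^{n}\left( Z_t\,E\left[ \left(\hat\theta_n - \theta_0\right)\epsilon_{t-j} \right] + Z_{t-j}\,E\left[ \left(\hat\theta_n - \theta_0\right)\epsilon_t \right] \right).$$

Consider case (a.). By iterating, $\epsilon_{\lambda t} = \lambda^{t}\epsilon_0 + \sum_{k=0}^{t-1}\lambda^{k}u_{t-k}$, which implies for all $s, t$:

$$E\,\epsilon_{\lambda s}\,\epsilon_{\lambda t} = \lambda^{s+t}\,E\,\epsilon_0^2 + \sum_{k=0}^{s-1}\sum_{l=0}^{t-1}\lambda^{k+l}\,E\left( u_{s-k}\,u_{t-l} \right).$$

Define the function:

$$f(s,t,\lambda) \stackrel{\text{def}}{=} E\,\epsilon_{\lambda s}\,\epsilon_{\lambda t}.$$

Next consider case (b.). By iterating, $\epsilon_{1t} = \epsilon_0 + \sum_{k=0}^{t-1}u_{t-k}$, so that for all $s, t$:

$$E\,\epsilon_{1s}\,\epsilon_{1t} = E\,\epsilon_0^2 + \sum_{k=0}^{s-1}\sum_{l=0}^{t-1} E\left( u_{s-k}\,u_{t-l} \right).$$

Define the function:

$$g(s,t) \stackrel{\text{def}}{=} E\,\epsilon_{1s}\,\epsilon_{1t}.$$

Then in case (a.):

$$E\left[ \left(\hat\theta_n - \theta_0\right)\epsilon_{\lambda t} \right] = \left( \sum_{s=1}^{n} Z_s' Z_s \right)^{-1}\sum_{s=1}^{n} Z_s'\,f(s,t,\lambda),$$

and

$$E\left[ \left(\hat\theta_n - \theta_0\right)\left(\hat\theta_n - \theta_0\right)' \right] = \left( \sum_{s=1}^{n} Z_s' Z_s \right)^{-1}\left( \sum_{s=1}^{n}\sum_{r=1}^{n} Z_s' Z_r\,f(s,r,\lambda) \right)\left( \sum_{s=1}^{n} Z_s' Z_s \right)^{-1}.$$

Similarly, in case (b.):

$$E\left[ \left(\hat\theta_n - \theta_0\right)\epsilon_{1t} \right] = \left( \sum_{s=1}^{n} Z_s' Z_s \right)^{-1}\sum_{s=1}^{n} Z_s'\,g(s,t),$$

and

$$E\left[ \left(\hat\theta_n - \theta_0\right)\left(\hat\theta_n - \theta_0\right)' \right] = \left( \sum_{s=1}^{n} Z_s' Z_s \right)^{-1}\left( \sum_{s=1}^{n}\sum_{r=1}^{n} Z_s' Z_r\,g(s,r) \right)\left( \sum_{s=1}^{n} Z_s' Z_s \right)^{-1}.$$

Therefore $B_\lambda(j,n)$ is obtained by substituting the expressions for case (a.) into $B(j,n)$. The expression for $B_1$ is identical with $g$ in place of $f$. The difference in bias is then:

$$B_\lambda(j,n) - B_1(j,n) = (n-j)^{-1}\sum_{t=j+1}^{n}\Bigg\{ Z_t\left( \sum_{s} Z_s' Z_s \right)^{-1}\left( \sum_{s}\sum_{r} Z_s' Z_r\left( f(s,r,\lambda) - g(s,r) \right) \right)\left( \sum_{s} Z_s' Z_s \right)^{-1} Z_{t-j}'$$

$$\qquad - Z_t\left( \sum_{s} Z_s' Z_s \right)^{-1}\sum_{s} Z_s'\left( f(s,t-j,\lambda) - g(s,t-j) \right) - Z_{t-j}\left( \sum_{s} Z_s' Z_s \right)^{-1}\sum_{s} Z_s'\left( f(s,t,\lambda) - g(s,t) \right) \Bigg\}.$$

For each fixed $s, t$, as $\lambda \to 1$, we have $f(s,t,\lambda) - g(s,t) \to 0$. Further, $B_\lambda(j,n) - B_1(j,n)$ is evidently continuous in $f(s,t,\lambda) - g(s,t)$. Therefore, we see immediately that as $\lambda \to 1$, $B_\lambda(j,n) - B_1(j,n) \to 0$. This establishes the Theorem. Q.E.D.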