Working Paper
Department of Economics

A COMMON ERROR IN THE TREATMENT OF
TRENDING TIME SERIES
by
Danny Quah
Jeffrey M. Wooldridge

No. 483
February 1988

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139
A Common Error in the Treatment of Trending Time Series.
by
Danny Quah and Jeffrey M. Wooldridge *
February 1988.

* Both are at the Department of Economics and the Statistics Center, MIT. Quah is also affiliated with the NBER. We thank Olivier Blanchard, Francis Diebold, Stanley Fischer, N. Gregory Mankiw and Angelo Melino for helpful comments on an earlier draft.
Abstract

There are two common misconceptions in the analysis of trending time series. First, if a series is difference-stationary, removing a linear time trend introduces spurious cyclicality. Second, regardless of whether a series is difference-stationary or trend-stationary, taking first differences produces, in either case, a covariance stationary sequence, and so is recommended econometric practice. We show that the first statement is incorrect and that the second can be misleading.

1. Introduction.
A number of recent papers have recommended taking first-differences of observed time series prior to econometric analysis (see for example Campbell and Mankiw (1988) and others). The reasoning is as follows. If a series is truly difference-stationary, then removing a linear time-trend produces spurious cyclicality in the residuals. Under the same condition, taking first differences produces a series that is covariance stationary, and so is convenient for econometric analysis. If, on the other hand, the series is truly trend-stationary, taking first-differences nevertheless produces a covariance stationary series, albeit one with a zero in the spectral density at frequency zero. This is still satisfactory however (the reasoning goes), as the unit root in the moving average part produced by over-differencing will manifest in the final estimates. Thus, if one is to remain agnostic as to the cyclicality of the observed time series, the recommended practice is always to take first-differences prior to econometric analysis. Put another way, the recommended practice is not to detrend, for detrending leads to spurious "cyclical" behavior in the residuals. This view has become quite widely held by many macroeconomists. See for example Campbell (1987), Deaton (1986), Nelson (1987), Nelson and Kang (1981, 1984), Mankiw and Shapiro (1985), Romer (1987), and Shapiro (1986) among others.
Nelson and Kang (1981) and Nelson and Plosser (1982) have forcefully argued that least squares detrending of a unit root process produces spurious cyclicality. It is known that when the data are trend-stationary, OLS estimators of the intercept and time trend coefficients converge to their probability limits quite rapidly. Thus there is good reason to believe that in this case, the detrended data appropriately reveal the true underlying dynamics about trend. In addition, Durlauf and Phillips (1986) have shown that when the data are difference-stationary, the OLS estimator for the time trend coefficient, while converging in probability, does so much more slowly relative to that in the trend-stationary case. Further, the OLS estimator for the intercept in this case actually diverges. Thus it is not surprising that the "spurious cyclicality" view is so prevalent.
This paper demonstrates that this reasoning is fundamentally incorrect. First, we demonstrate that removing a linear time trend actually preserves the true stochastic characteristics of the data. We show that data detrended by least squares regression asymptotically provide the correct picture of the underlying dynamics, independent of whether the data are truly trend- or difference-stationary. More precisely, we establish two results here: First, if the data are trend-stationary, the covariogram estimator using detrended data is asymptotically indistinguishable from that using the true unobserved fluctuations about trend. Consequently, the same is true of the correlogram estimator in this case. Second, suppose instead the data are difference-stationary, and a researcher detrends the data by least squares. We show then that the correlogram estimator at each lag converges in probability to 1, which correctly indicates the presence of a unit root. In other words, the researcher will appropriately conclude that the residuals are a unit root process, and are not cyclical about trend.
These statements however are asymptotic; it may still be the case that in finite samples, measures of dynamic correlation subsequent to least squares detrending are severely misleading. To analyze this, we take as starting point the findings of Nelson and Kang (1981) on the finite sample bias due to using detrended residuals when the data are truly difference stationary. We show that using detrended residuals, the exact bias in the covariogram estimator is of approximately the same magnitude, regardless of whether the underlying process is trend-stationary or difference-stationary. With finite samples, any reasonable transformation of the estimated covariogram, such as the estimator for the spectral density, will have approximately the same bias properties, again regardless of whether the data are truly trend- or difference-stationary. Thus the bias indicated by Nelson and Kang (1981) is irreducible even in the "best practice" case of using least squares to detrend trend-stationary data.
We conclude that emphasizing bias due to detrending a difference stationary process is misguided: there is a significant finite sample bias associated with least squares detrending, period. Clearly, if the data are difference-stationary, it is a misspecification to estimate a time trend, and then to interpret the residuals as being close to a stationary process. We agree that estimating a correctly specified model is better than estimating an incorrectly specified model. However, unless its proponents wish to extend the case that least squares detrending produces misleading results to when the data are trend-stationary as well, we do not find convincing the argument that detrending produces spurious cyclicality.

Next, we show that taking first-differences is treacherous for subsequent econometric analysis, should the true data generating process actually be trend-stationary. We illustrate this for three different statistical procedures: estimating the mean of such a transformed process, estimating its moving average coefficient, and finally, estimating causality patterns when one of the variables is such an over-differenced process.
In sum, our analysis almost exactly overturns the conclusions indicated in the introductory paragraph. We contend with the suggestion that macroeconomists should always use first-differenced series so as not to pre-judge the cyclicality in the data: we strongly disagree with this view.

The remainder of the paper is organized as follows. Section 2 considers the effects of removing a fixed but arbitrary linear time trend from a difference-stationary process. The conclusion here is obvious: the deviations from any fixed linear time trend remain difference-stationary. Section 3 considers the effects of first-differencing a trend-stationary process. The results here are less benign: this introduction of a stochastic singularity manifests in the spectral density vanishing at frequency zero. This situation is shown to cause problems for a number of inferential questions of interest. In Sections 2 and 3, we treat the artificial but usefully intuitive case where by detrending we mean removal of a fixed although completely arbitrary linear time trend. Section 4 considers the relevant practical case when detrending is performed by least squares regression. We show here that our earlier conclusions are unmodified: regardless of whether the true data generating process is trend- or difference-stationary, the correlogram estimators using least squares detrended data are consistent asymptotically, and are "similarly" behaved in finite samples. Section 5 concludes the paper.
2. Difference Stationary Data Generating Process.

Suppose that the underlying data generating mechanism is difference-stationary:

(i) $Y_t = \beta_0 + Y_{t-1} + u_t$, $t \ge 1$,
(ii) $Y_0$ a given random variable, and
(iii) $u_t$ covariance stationary with mean zero and spectral density bounded away from zero.

Iterating on (i), we have that:

$$Y_t = Y_0 + \beta_0 \cdot t + \sum_{j=1}^{t} u_j .$$

A linear time trend specification is a pair of real numbers $(\alpha_1, \beta_1)$. Removing this linear time trend from $Y_t$ results in:

$$X_t = Y_t - \alpha_1 - \beta_1 \cdot t$$
$$\Rightarrow \; X_t = [Y_0 - \alpha_1] + (\beta_0 - \beta_1) \cdot t + \sum_{j=1}^{t} u_j .$$

The "detrended residuals" $X_t$ have the integrated form of a difference-stationary sequence. In particular, $X_t$ can be written as:

(a.) $X_t = (\beta_0 - \beta_1) + X_{t-1} + u_t$, $t \ge 1$,
(b.) $X_0$ a given random variable, $Y_0 - \alpha_1$,
(c.) the same as (iii) above.

Thus, the detrended residuals are another difference-stationary sequence, where the first-difference sequence $X_t - X_{t-1}$ has exactly the same probability properties as the first-difference of the original sequence, $Y_t - Y_{t-1}$ (except possibly in mean). Therefore, unless one supposes that the original series already has a tendency to show "spurious cyclicality," there is absolutely no reason to draw that conclusion for the detrended series. We return in Section 4 below to this observation that detrending does not alter the dynamic properties of the data.
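The algebra of this section can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names, the Gaussian shocks, the seed, and the particular values of the drift and of the trend pair (alpha1, beta1) are all illustrative assumptions. It simulates a difference-stationary series, removes a fixed linear trend, and confirms that the first differences of the detrended series equal those of the original series shifted by the constant beta1.

```python
import random


def random_walk_with_drift(n, drift, seed=0):
    """Simulate Y_t = drift + Y_{t-1} + u_t for t = 1..n, with Y_0 = 0.

    Gaussian white-noise shocks are an illustrative assumption.
    """
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(drift + y[-1] + rng.gauss(0.0, 1.0))
    return y


def remove_fixed_trend(y, alpha1, beta1):
    """Return X_t = Y_t - alpha1 - beta1 * t for a fixed (not estimated) trend."""
    return [y_t - alpha1 - beta1 * t for t, y_t in enumerate(y)]


def first_differences(x):
    """Return the sequence x_t - x_{t-1} for t = 1..n."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


beta1 = 0.2  # hypothetical trend slope, deliberately different from the drift
y = random_walk_with_drift(n=200, drift=0.5)
x = remove_fixed_trend(y, alpha1=3.0, beta1=beta1)

# X_t - X_{t-1} should equal (Y_t - Y_{t-1}) - beta1 at every date.
max_gap = max(
    abs((dy - beta1) - dx)
    for dy, dx in zip(first_differences(y), first_differences(x))
)
print(max_gap)
```

The gap is zero up to floating-point rounding, whatever trend pair is removed, which is the sense in which fixed detrending changes only the mean of the first-difference sequence.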
3. Trend Stationary Data Generating Process.

Suppose now that the true data generating mechanism is trend-stationary:

(i) $Y_t = \alpha_0 + \beta_0 \cdot t + u_t$,
(ii) $u_t$ covariance stationary with mean zero and spectral density bounded away from zero.

As before, a linear time trend is a pair of real numbers $(\alpha_1, \beta_1)$. Removing a time trend therefore produces:

$$X^1_t = Y_t - \alpha_1 - \beta_1 \cdot t$$
$$\Rightarrow \; X^1_t = (\alpha_0 - \alpha_1) + (\beta_0 - \beta_1) \cdot t + u_t .$$

The resulting process $X^1_t$ is seen to be trend-stationary, with exactly the same dynamic stochastic properties as in the original data (in $u$). In the special case where $\alpha_1$ and $\beta_1$ are equal to $\alpha_0$ and $\beta_0$, $X^1$ is covariance stationary, which is simply a special case of trend-stationarity. Comparing this result with the conclusion from Section 2, we note: Removing a fixed but arbitrary linear time trend preserves the true properties of the data, regardless of whether the underlying model is difference stationary or trend stationary. Hence linear de-trending actually has the very desirable property that it does not distort the true features of the data, contrary to numerous assertions that the opposite is true.

Next consider taking first differences of trend-stationary data:

$$X^2_t = Y_t - Y_{t-1} = \beta_0 + (u_t - u_{t-1}).$$

Since $u_t$ is covariance stationary, $X^2$ is also covariance stationary, although it has a zero in its spectral density at frequency zero. This last characteristic has been noted by macroeconomists, but its significance has always been relegated to the statement that "it will show up as a unit root in the moving average part." We now point out that instead its presence renders $X^2$ treacherous to analyze econometrically. Inference about $X^2$ requires that as a stochastic process, $X^2$ produces statistics that admit nondegenerate asymptotic distributions. Then the researcher can be confident that a given sample is actually informative on a hypothesis of interest. We demonstrate that $X^2$, being an over-differenced sequence, in fact does not have desirable properties. First, we show that in the leading case of expectation estimation, such an over-differenced process produces a degenerate asymptotic distribution for the sample mean estimator.
In particular, it produces a statistic that is tantamount to using only the first and last data points in estimation. Second, we show that the nonlinear least squares estimator for the moving average coefficient in an over-differenced $X^2$ has the same asymptotic properties as that for the autoregressive coefficient in the levels of a difference stationary process. Thus, if macroeconomists are using first-differenced data because they do not wish to use the nonstandard distribution theory associated with unit root processes, they should recognize that they are faced with precisely the same problem when they first-difference a process that is truly trend-stationary. Third, we show that using an over-differenced sequence in a bivariate causality test will produce spurious evidence of causality, when in fact the data are actually not Granger causally related. This last point may be found in Sims (1972) as well, although it does not appear to be well-known.

To begin, consider the problem of estimating the mean of $X^2$. For a sample of size $n$, the sample mean is:

$$n^{-1} \sum_{t=1}^{n} X^2_t = \beta_0 + n^{-1} (u_n - u_0).$$

Notice that only two (covariance stationary) random variables $u_n$ and $u_0$ enter the calculation of the sample mean statistic. Thus the sample mean converges to $\beta_0$, but at a rate that does not allow a nondegenerate asymptotic distribution for (any normalized version of) the statistic. In other words, after first-differencing a trend-stationary sequence, the sample mean and associated statistics are econometrically useless. Econometric techniques that rely in part on a central limit property for the sample mean will consequently not allow any valid statistical inference. Thus, we see that taking first differences, when the true data generating process is actually trend stationary, will produce data that may not be particularly informative for statistical inference.

For our second point, suppose for simplicity that $\beta_0$ is 0, and that $u_0$ is 0.
Consider estimating the model:

$$X^2_t = u_t - \gamma_0 u_{t-1}, \quad \gamma_0 = 1, \quad u_0 = 0, \quad t = 1, 2, \ldots$$

For parameter $\gamma$, define the residual function:

$$R_0(\gamma) = 0, \qquad R_t(\gamma) = X^2_t + \gamma R_{t-1}(\gamma), \quad t = 1, 2, \ldots$$

Thus, the true disturbance $u_t$ is obtained as $R_t(1)$ for all $t \ge 1$. Estimation by nonlinear least squares solves the problem:

$$\min_{\gamma} \; \tfrac{1}{2} \sum_{t=1}^{n} R_t(\gamma)^2 .$$

To obtain the asymptotic properties of such an estimator, it is convenient to study the asymptotic distribution of the score. For the $t$-th term, the score is $s_t(\gamma) = R_t(\gamma) \cdot (\partial R_t(\gamma) / \partial \gamma)$. But from above, the random sequence $\partial R_t(\gamma) / \partial \gamma$ at the true parameter value $\gamma_0 = 1$ is itself a unit root process, with first difference equal to $u_{t-1}$. Then it follows from the initial condition $\partial R_0(1) / \partial \gamma = 0$ that the score at the true parameter $\gamma_0$ behaves as:

$$s_1(1) = 0, \qquad s_t(1) = u_t \cdot \sum_{j=1}^{t-1} u_j, \quad t \ge 2.$$

Notice that the score is the product of a covariance stationary sequence $u_t$ with its accumulation $\sum_{j=1}^{t-1} u_j$. Clearly, the resulting process will not have the usual central limit properties; thus neither will the nonlinear least squares estimator of the coefficient. Moreover, the hessian itself can be seen to have nonstandard asymptotic properties. In fact, the score here has exactly the same form as the score in least squares estimation of the auto-regressive coefficient in a difference-stationary process. [1] Thus (the score for) the nonlinear least squares estimator of the moving average coefficient in an "over-differenced process" converges not to a normal random variable, but instead to a now-familiar functional of Brownian motion. If the data were truly difference stationary to begin, then of course the usual asymptotic normality theory would apply.

[1] This is related to a familiar result from Lagrange Multiplier theory for standard models: for instance, Godfrey (1978) shows that Lagrange Multiplier tests against stationary autoregressive and moving average alternatives are identical when the null is white noise; the Lagrange Multiplier of course is just the score. That literature is however not particularly concerned with unit root models.

The distribution theory for the non-standard (first-differenced) case is now known; it is not that a procedure taking this into account is completely intractable. (We have not seen that any economist using first-differenced data has actually used this however.) Our point instead is that the distribution theory that applies to first-differenced data is different across trend stationary and difference stationary models. Thus, the unifying framework that authors such as Campbell and Mankiw (1988) seem to suggest for the use of first-differenced data is simply non-existent.

Finally, we turn to the use of over-differenced processes in Granger causality tests when the true lag length is unknown and so is instead made a function of sample size. This last condition may seem unusual, but the reader should remember that applied researchers often do have to decide on a lag-length specification in time domain work. They do this based on observed data (and hence sample size), and of course the "true" lag length is never known for certain.

Consider two random sequences $Z$ and $Y$, in whose causality relations we are interested. We will use the notation $X^2$ to indicate a variable that is the outcome of first-differencing $Y$, which is already stationary: thus $X^2$ has a zero in its spectral density at frequency zero. Suppose that $Z$ is a covariance stationary process with spectral density bounded away from zero. From Sims's Theorem 2, the causality relations between $Z$ and $Y$ can be studied by considering the two-sided projection of one variable on current, lagged and future values of the other. For simplicity assume that $Y$ is just white noise, and that $Z$ and $Y$ are in fact uncorrelated at all leads and lags. Thus neither $Z$ nor $Y$ Granger cause the other: the two-sided projections are identically zero and thus are trivially one-sided. A researcher using $Z$ and $Y$ will necessarily discover this for sufficiently large sample sizes.

However consider instead the two-sided projection of $Z$ on $X^2$:

$$Z_t = \sum_{j=-\infty}^{\infty} b(j) X^2_{t-j} + v_t, \qquad E\, v_t X^2_{t-j} = 0 \text{ for all } j.$$

The true projection coefficients $b$ again are zero. Equivalently, the true (optimal) $R^2$ is 0, and least squares estimation will attempt to achieve this. Let $n$ denote sample size, and call $b_n(j)$, $j = -\infty, \ldots, \infty$, $n \ge 1$, the sequence of fitted projections. Call $S_X(\omega)$ the spectral density of $X^2$, and let $\tilde{b}$ denote the fourier transform of a (square summable) sequence $b$. The difference in mean squared error between fitted projections (implied by) $b_n$ and the true best projections (implied by) $b_0$ is given by Sims's approximation error formula as:

$$[d_X(b_n, b_0)]^2 = E \Big| \sum_{j} \big(b_n(j) - b_0(j)\big) X^2_{t-j} \Big|^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \big| \tilde{b}_n(\omega) - \tilde{b}_0(\omega) \big|^2 S_X(\omega)\, d\omega .$$

Now consider the following family of candidate fitted projection coefficients: for $n \ge 1$,

$$b_n(j) = n^{-1} (n - |j|) \text{ for } |j| \le n, \qquad b_n(j) = 0 \text{ otherwise.}$$

For each $n$, this is a triangular shaped two-sided lag distribution symmetric about zero: the distribution takes the value 1 at $j = 0$, and then declines as a straight line to 0 at lag and lead $n$. But when $X^2_t = Y_t - Y_{t-1} = u_t - u_{t-1}$ with $u$ white noise, such a family of lag distributions implies a deterioration in mean squared error given by:

$$[d_X(b_n, b_0)]^2 = E \Big| \sum_{j=-\infty}^{\infty} b_n(j) [u_{t-j} - u_{t-j-1}] \Big|^2$$
$$= E \Big| b_n(-n) u_{t+n} + \sum_{j=-n+1}^{n} [b_n(j) - b_n(j-1)] u_{t-j} - b_n(n) u_{t-n-1} \Big|^2$$
$$= \mathrm{Var}(u) \sum_{j=-n+1}^{n} \big(b_n(j) - b_n(j-1)\big)^2$$
$$= 2 n^{-1}\, \mathrm{Var}(u) \to 0 \text{ as } n \to \infty .$$

As sample size increases to infinity, such a family of lag distributions fits arbitrarily well in the sense of mean squared error. However, at any sample size all members of this family are two-sided: thus Granger causality statistics would lead the researcher to conclude incorrectly that $Z$ Granger causes $X^2$.

This rather surprising result can be understood in terms of ordinary least squares theory. The spectral density of the right hand side variable in a distributed lag regression corresponds to the $X'X$ matrix in ordinary least squares models. If the spectral density is zero at some frequency, that is equivalent to singularity in the $X'X$. This can be interpreted to mean that the population regression is not identified. In the case of interest here, this is exactly what happens. The lag distribution produced is not uniquely identified: more than one distribution will fit the data equally well. Using first-differenced data therefore renders particularly subtle the interpretation of Granger causality statistics.

We emphasize that this is fundamentally different from the usual prefiltering effects. It is easy to show that with covariance stationary data, prefiltering by arbitrary one-sided filters whose fourier transforms are bounded away from zero leaves unaltered patterns of Granger causality. The result here is that prefiltering one of the series by first-differencing does affect Granger causality relations.
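The rate at which the triangular two-sided lag distribution "fits" can be verified by direct summation. The sketch below is illustrative only: the function names are hypothetical, and it assumes the triangular weights take the form (n - |j|)/n for |j| <= n and zero otherwise. It collects the coefficient on each white-noise term after substituting the first difference into the projection, then sums the squared coefficients.

```python
def triangular_lag(n, j):
    """Assumed triangular weights: 1 at j = 0, falling linearly to 0 at |j| = n."""
    return (n - abs(j)) / n if abs(j) <= n else 0.0


def mse_deterioration(n, var_u=1.0):
    """Mean squared error of sum_j b_n(j) * (u_{t-j} - u_{t-j-1}) for white noise u.

    Collecting terms, the coefficient on u_{t-j} is b_n(j) - b_n(j-1), so the
    mean squared error is var_u times the sum of squared coefficient differences.
    """
    total = 0.0
    # The range is one step wider than the support so that both edges are covered.
    for j in range(-n, n + 2):
        coefficient = triangular_lag(n, j) - triangular_lag(n, j - 1)
        total += coefficient * coefficient
    return var_u * total


for n in (10, 100, 1000):
    print(n, mse_deterioration(n))
```

Each of the 2n nonzero coefficient differences has magnitude 1/n, so the sum is 2/n times the shock variance: a two-sided lag distribution that is never zero at leads still fits arbitrarily well as n grows.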
Thus, contrary to the sanguine conclusion that over-differencing a trend-stationary process will simply "show up as a unit root in the moving average part," we conclude that this may instead produce altogether unreliable econometric results. It is true that in either case of difference stationarity or trend stationarity, the resulting process is always covariance stationary. However, the resulting process is not necessarily amenable to econometric analysis when the data were trend stationary to begin. We have presented three examples that seem to us especially convincing of the difficulties associated with using "over-differenced" data. At the same time, these examples are of particular interest to macroeconomists. The first and third examples (estimating the mean and testing for Granger causality) are related in that they are both due directly to the spectral density of an over-differenced process vanishing at frequency zero. Information on such a process does not accumulate as the observed sample size increases.

Our second example cautions that first-differencing to produce stationarity, no matter whether the original data are difference- or trend-stationary, is by no means a panacea for the econometric difficulties confronting researchers when they analyze trending time series. In particular, if they wish to avoid the nonstandard distribution theory associated with analyzing difference stationary processes in levels, they should realize that they have simply re-created those difficulties when they take first-differences of a trend-stationary sequence.
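The first of the three examples, the sample mean of an over-differenced series, reduces to a telescoping sum. The following minimal sketch (the function names, Gaussian disturbances, seed, and parameter values are illustrative assumptions) simulates a trend-stationary series and confirms that the sample mean of its first differences is identical to a statistic built from the first and last observations alone.

```python
import random


def trend_stationary(n, alpha0, beta0, seed=0):
    """Simulate Y_t = alpha0 + beta0 * t + u_t for t = 0..n.

    Gaussian white-noise u_t is an illustrative assumption.
    """
    rng = random.Random(seed)
    return [alpha0 + beta0 * t + rng.gauss(0.0, 1.0) for t in range(n + 1)]


def mean_of_first_differences(y):
    """Sample mean of Y_t - Y_{t-1} over t = 1..n, using every observation."""
    n = len(y) - 1
    return sum(y[t] - y[t - 1] for t in range(1, n + 1)) / n


y = trend_stationary(n=500, alpha0=1.0, beta0=0.3)

full_sample = mean_of_first_differences(y)
# The sum telescopes: only the first and last data points survive.
endpoints_only = (y[-1] - y[0]) / (len(y) - 1)
print(full_sample, endpoints_only)
```

The two numbers agree up to rounding, so the interior observations carry no information about the mean of the differenced series.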
4. The Effects of Least Squares Detrending.

In the previous two sections, we used the convenient fiction that detrending involved removing a fixed although arbitrary time trend. We now show that appropriate versions of the reasoning above apply even when the detrending line is estimated by regression.

First, we establish that if the data have a unit root, the correlogram estimated from detrended data converges pointwise at each lag to the correct value of 1. This may initially seem surprising: when the data generating process has a unit root, the OLS estimator for the intercept is known to diverge (see for instance Durlauf and Phillips (1986)). Thus one might conjecture that the detrended residuals have no desirable properties. However, recall that the Durbin-Watson statistic in that case nevertheless converges in probability to its correct value of zero (again, see Durlauf and Phillips (1986)). On further thought, this happens because the Durbin-Watson statistic involves the first difference of the detrended data. Thus the ill-behaved estimator for the intercept is simply subtracted out, and its lack of convergence in probability is inconsequential.

Next, the estimator for the time trend coefficient only converges at rate $\sqrt{n}$, whereas it gets multiplied by a variable (time) growing as $n$, so that the error from using estimated rather than true residuals grows with the sample size. However, again recall that the Durbin-Watson statistic involves the first difference of the fitted residuals. Thus it is only the first difference of the time variable that is relevant. This of course is just constant. This feature controls the rate at which the estimation error grows, and in particular, that error actually converges at rate $\sqrt{n}$ in probability.

But now we note that this reasoning applies not only at the first lag, as for the Durbin-Watson statistic, but in fact applies at every fixed lag. Thus even in the difference stationary case it is asymptotically irrelevant that we use estimated rather than true residuals in estimating the correlogram.

For the trend stationary case, the estimators for the intercept and time trend coefficients converge in probability at rates $n^{1/2}$ and $n^{3/2}$ respectively. Thus in this case, it is straightforward to establish that the correlogram estimator using estimated residuals converges in probability to the correlogram of the true residuals. From the discussion above, and the formal results below, we are able to establish that this statement extends to difference stationary data as well. Note that by contrast with trend-stationary data, in the difference-stationary case, the estimator for the intercept diverges at rate $n^{1/2}$, and the estimator for the time trend coefficient converges only at the slower rate $n^{1/2}$.

Second, we turn to behavior in finite samples. Nelson and Kang (1981) have argued that the use of least squares detrended data, when in fact the data are difference stationary, results in estimators for the covariogram and the spectral density that are biased towards showing cyclicality in finite samples. We interpret this as saying that incorrect econometric specification will lead to an incorrect conclusion. In our view, this argument carries weight only if correct specification in this context does not lead to that same erroneous conclusion. Suppose on the other hand, that the true data generating mechanism is stationary about trend. Then, the "best practice" procedure is to detrend by least squares: it is irrelevant both asymptotically and in finite samples whether one estimates the residual serial correlation simultaneously with the trend line, or with a two-step procedure by first detrending. In this situation, would a researcher following "best practice" similarly discover spurious cyclicality? The answer is yes. If the true data generating mechanism were white noise about deterministic trend, the fitted residuals will necessarily be serially correlated. This follows directly from the properties of BLUS residuals (see for example introductory textbooks such as Theil (1971)). We show below that for every finite sample, at any fixed lag, the exact bias in the covariogram estimator is "continuous" in the serial persistence of the data generating mechanism, in a neighborhood containing unit root processes. Loosely speaking, in finite samples, there is certainly a nonzero bias in the estimated dynamics of an inappropriately detrended unit root process; however, this bias is only of the same order of magnitude as the bias when trend stationary processes are correctly detrended.
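The claim that correctly detrended white noise already yields serially correlated residuals can be made concrete with an exact calculation. The sketch below is an illustration under stated assumptions, not the paper's derivation: it assumes white-noise disturbances with variance var_u, dates t = 1..n, and a covariogram normalized by (n - j); the function names are hypothetical. It uses the standard fact that OLS residuals have covariance matrix var_u * (I - H), where H is the projection onto a constant and a linear trend.

```python
def hat_entry(n, t, s):
    """(t, s) entry of the projection onto a constant and a trend, for t, s = 1..n."""
    t_bar = (n + 1) / 2.0
    s_tt = n * (n * n - 1) / 12.0  # closed form for the sum of (t - t_bar)^2
    return 1.0 / n + (t - t_bar) * (s - t_bar) / s_tt


def expected_covariogram(n, j, var_u=1.0):
    """Exact expectation of (n-j)^{-1} * sum_{t=j+1}^{n} uhat_t * uhat_{t-j}.

    For white noise u, E[uhat_t * uhat_s] = var_u * (1{t == s} - H[t, s]).
    """
    total = 0.0
    for t in range(j + 1, n + 1):
        identity = 1.0 if j == 0 else 0.0
        total += identity - hat_entry(n, t, t - j)
    return var_u * total / (n - j)


n = 50
for lag in range(4):
    # The true covariogram of white noise is var_u at lag 0 and 0 elsewhere.
    print(lag, expected_covariogram(n, lag))
```

At lag 0 the expectation is (n - 2)/n rather than 1, and at the first lag it is strictly negative rather than 0, so a nonzero bias of order 1/n appears even when a trend-stationary process is detrended correctly.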
One possible conclusion from this is that the message due to Nelson and Kang (1981) applies to persistent trend-stationary models as well as to difference-stationary models: researchers will just always find spurious cyclicality, regardless of the true model. We think this is a little extreme: our preferred interpretation instead is that finite sample theory does not suggest that detrending is a bad thing to do. Putting this together with our results for the asymptotic behavior of the detrended residuals, we conclude that there is no evidence that detrending produces misleading results when the true model has a unit root. Of course using the correct econometric specification is always better than using an incorrect specification, but we do not think that is the issue here.
Some economists have suggested to us that "if you just look at the picture of fitting a trend line to a unit root process, you'll see clearly why there is spurious cyclicality; the end-points of both the trend and the unit root process are made to come close to each other. Thus the detrended data are made to look cyclical." While there is certainly something to this graphical intuition, its effects show up nowhere in our calculations below. Consequently, we believe this intuition to be incorrect, and we do not find these "look at the picture"-type arguments persuasive.
To begin
the formal analysis,
dependence permitted
sum
sequence: So
Assumption
(a.)
=
in
our data. Let
S =
,
we impose some standard assumptions on
t
2J,-=1
£u =
for all
t
Eu\ = al >
(c.)
sup t £'|u t 2+ *
uj- ^' e impose the following conditions.
J
;
for all t
<
oo
for
;
some
6
|
a%
be the disturbance identified above, and define the partial
4.1 (Regularity): Let u satisfy:
(b.)
(d.)
{u^}^
= lim^oo E (n~
1
S%)
exists,
>
;
<
c%
<
the heterogeneity and serial
00
;
'
'
(e.)
{u,}^.j
is
strong mixing, with mixing coefficients a m such that:
oo
£«*l-2/« <
oo.
m=l
We use these conditions as they are by now relatively familiar in the literature: they are not the weakest conditions possible to obtain our results. The reader is referred to Phillips (1987) and Phillips and Perron (1988) for further discussion of these conditions. Finite order covariance stationary ARMA processes with Gaussian disturbances (where the moving average part does not have a unit root) can be shown to satisfy these assumptions.
We will use the following result repeatedly in the subsequent discussion:

Lemma 4.2 (Asymptotic Distributions): Assume the conditions of Assumption 4.1. Then as $n \to \infty$:

(a.) $n^{-1/2} \sum_{t=1}^{n} u_t \Rightarrow \sigma_0 N(0, 1)$;
(b.) $n^{-3/2} \sum_{t=1}^{n} S_t \Rightarrow \sigma_0 \int_0^1 W(r)\, dr$;
(c.) $n^{-3/2} \sum_{t=1}^{n} t\, u_t \Rightarrow \sigma_0 \big( W(1) - \int_0^1 W(r)\, dr \big)$;

where $N(0, 1)$ is the standard normal, $W$ is standard Brownian motion, and $\Rightarrow$ denotes weak convergence.

Our asymptotic results are expressed in the following:

Theorem 4.3 (Consistency): Let $\{Y_t, t = 1, 2, \ldots, n\}$ be an observed sample; let $\hat{u}_t(n)$ be the fitted residuals from an OLS regression on a constant and time trend.

(a.) If $Y_t$ is trend-stationary, then for each fixed lag $j$, as $n \to \infty$:

$$n^{-1} \sum_{t=j+1}^{n} \hat{u}_t(n) \hat{u}_{t-j}(n) - n^{-1} \sum_{t=j+1}^{n} u_t u_{t-j} \to 0 \text{ in probability.}$$

(b.) If $Y_t$ is difference stationary, then for each fixed lag $j$, as $n \to \infty$:

$$\frac{\sum_{t=j+1}^{n} \hat{u}_t(n) \hat{u}_{t-j}(n)}{\sum_{t=1}^{n} \hat{u}_t(n)^2} \to 1 \text{ in probability.}$$
Thus
detrending does not affect convergence
in either case,
the correct values. While this result
it
nevertheless
is
process
We
is
reassuring,
is
an asymptotic statement.
still
therefore establish a continuity proposition: If the data are trend-stationary, with the true distur-
model somewhat
redefine the
w iU
Theorem
e xt
(b.) e
=
4.4 (Finite
u -
Ae
M _j + u
ei,t-i
+u
t
,
t
,
Sample
For each j
t
=
Bx{j,-n).=
|A|
1,
£
1,
0, 1,
.
. .
,
<
=
10
n >
size
E
Bias): Let u
is
A>*
+
same
£«,
1,
u
(»-j)
Jet
{u
(
}
uncorreiated with
t>1 be
=
£ A0
modeb:
"o;
.
Jet
3,
n—
be a random variable, and
1,
l
?At("), «it( n ),
t
=
l,2,...,n, denote the detrended data (fitted
the exact biases in the covariogram estimator are:
13
ht[n)ix.t- :in)
- [n-j)
1
eit[n)ei, t -j[n)
- [n-j)
1
]P
e At
^
£
(n)e A t _y(n)
,
t=j+i
and
*i(j»
=E
[n-j]
:
Yi
t=j+i
Then
of the
in fact difference stationary.
disturbance {ft} t >! be generated by two alternative
4.1; let the
>
t>
Given a sampJe of
residuals).
+
Qo
using detrended data
Let
for convenience:
=
in
either be covariance stationary, or be generated as a unit root sequence with zero drift.
uo and satisfy Assumption
(a.)
sample bias
finite
sample bias when the data are
finite
Yt
et
that in finite samples, detrending a unit root
severely misleading.
order of magnitude as the
where
correlogram estimator to
reasons articulated above,
initially surprising for the
may be
It
bances about trend highly correlated, then the
We
and
in probability of the
for each
£xed n >
3,
for each fixed j
t=y+i
=
Bx [j,
0, 1,
n) -»
n
—
Bx {j,
n)
. .
.
,
1:
asA-t
1.
"( n ) e i-'-j( n)
15
Case (b.) in the Theorem specifies a zero drift unit root process; however, since the true parameter $\beta_0$ is arbitrary, our discussion certainly covers unit root processes with nonzero drift.
Theorem 4.4 states the following: the bias that arises from using estimated residuals (detrended by least squares) is of the same order of magnitude, independent of whether the underlying data are difference-stationary or trend-stationary.
To summarize
the results of this section, while there are clearly differences in the behavior of least
squares estimators of trend coefficients across trend- and difference-stationary data, the resulting detrended
data have similarly revealing properties
for their true underlying
dynamics.
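The consistency result can be illustrated by simulation. The following is a minimal sketch, not taken from the paper: it assumes Gaussian white-noise innovations, a zero initial disturbance, and hypothetical trend parameters (intercept 1, slope 0.5); the helper names `detrend`, `correlogram` and `simulate` are introduced here for illustration only. With a trend-stationary AR(1) disturbance the lag-one correlogram of the detrended data should be close to the true autocorrelation, and with a unit root disturbance it should be close to one.

```python
import random

def detrend(y):
    """Residuals from an OLS regression of y on a constant and a linear trend."""
    n = len(y)
    t_bar = (n + 1) / 2.0
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(1, n + 1))
    slope = sum((t - t_bar) * (y[t - 1] - y_bar) for t in range(1, n + 1)) / sxx
    return [y[t - 1] - y_bar - slope * (t - t_bar) for t in range(1, n + 1)]

def correlogram(e, j):
    """Lag-j product sum divided by the full sum of squares."""
    num = sum(e[t] * e[t - j] for t in range(j, len(e)))
    den = sum(x * x for x in e)
    return num / den

def simulate(n, lam, rng, intercept=1.0, slope=0.5):
    """Y_t = intercept + slope*t + eps_t, eps_t = lam*eps_{t-1} + u_t, eps_0 = 0.

    lam < 1 gives trend-stationary data; lam = 1 gives a unit root disturbance.
    The parameter values are assumptions made for this illustration.
    """
    eps, y = 0.0, []
    for t in range(1, n + 1):
        eps = lam * eps + rng.gauss(0.0, 1.0)
        y.append(intercept + slope * t + eps)
    return y

rng = random.Random(0)
n = 5000
r_trend_stationary = correlogram(detrend(simulate(n, 0.5, rng)), 1)
r_difference_stationary = correlogram(detrend(simulate(n, 1.0, rng)), 1)
print(r_trend_stationary, r_difference_stationary)
```

A single draw at one sample size is of course only suggestive; repeating the experiment over several values of `n` would show how quickly the two statistics settle.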
5. Conclusion.

The specific technical contribution of this paper is two-fold: First, we have shown that the correlogram estimators using least squares detrended data are consistent for the true values of the correlogram, regardless of whether the data are actually trend- or difference-stationary. Second, we have established that the finite sample bias in "cyclicality" that results from detrending difference-stationary data is no worse than that in the best practice case of detrending trend-stationary data.

Our conclusions draw on both exact as well as asymptotic arguments. This is in keeping with the reasoning that has motivated the erroneous conclusions we list in the Introduction. We observe that quite a number of applied workers have adopted these incorrect statements in their own research.

We emphasize that: (i) removing arbitrary fixed linear time trends actually preserves the true properties of the time series data, regardless of whether the true model is difference stationary or trend stationary, (ii) taking first differences, if the true model is in fact trend stationary, produces data that are econometrically useless, and, (iii) least squares detrending does not disguise the cyclical properties of the data.

In our view, the discussion here overturns the conventional wisdom among many applied researchers. Our results contradict the observation that incorrect detrending distorts the statistical properties of the data and produces "spurious cyclicality," and warn against the undiscriminating use of first differencing: First differencing is not a procedure we would recommend to researchers confronted with trending time series data.
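Point (ii) can be made concrete with a small numerical sketch. The example is an illustration chosen here rather than one drawn from the text: it assumes a trend-stationary series whose disturbance is Gaussian white noise, with hypothetical trend parameters (intercept 2, slope 0.3). The first difference is then a constant plus $\varepsilon_t - \varepsilon_{t-1}$, a moving average with a unit root, whose lag-one autocorrelation is $-1/2$ and whose higher-order autocorrelations are zero. The function name `autocorr` is introduced only for this sketch.

```python
import random

def autocorr(x, j):
    """Sample autocorrelation of x at lag j, after removing the sample mean."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    return sum(d[t] * d[t - j] for t in range(j, n)) / sum(v * v for v in d)

rng = random.Random(1)
n = 20000

# Trend-stationary data with white-noise disturbance (assumed parameter values).
y = [2.0 + 0.3 * t + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]

# First differences: 0.3 + eps_t - eps_{t-1}.
dy = [y[t] - y[t - 1] for t in range(1, n)]

r1 = autocorr(dy, 1)  # population value is -0.5
r2 = autocorr(dy, 2)  # population value is 0
print(r1, r2)
```

The negative first-order autocorrelation here is produced entirely by the differencing, since the underlying disturbance is serially uncorrelated by construction.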
References
Campbell, J.Y. (1987): "Does Saving Anticipate Declining Labor Income? An Alternative Test of the Permanent Income Hypothesis," Econometrica, November, 55 no. 6, 1249-1274.

Campbell, J.Y. and N.G. Mankiw (1988): "Are Output Fluctuations Transitory?" Quarterly Journal of Economics, forthcoming.

Deaton, A.S. (1986): "Life-Cycle Models of Consumption: Is the Evidence Consistent with the Theory?" NBER Working Paper No. 1910, Cambridge.

Durlauf, S.N. and P.C.B. Phillips (1986): "Trends versus Random Walks in Time Series Analysis," Cowles Foundation Discussion Paper no. 788, Yale University.

Godfrey, L.G. (1978): "Testing against General Autoregressive and Moving Average Error Models when the Regressors include Lagged Dependent Variables," Econometrica, 46, 1293-1302.

Mankiw, N.G. and M.D. Shapiro (1985): "Trends, Random Walks, and Tests of the Permanent Income Hypothesis," Journal of Monetary Economics, September, 16, 165-174.

Nelson, C. (1987): "A Reappraisal of Recent Tests of the Permanent Income Hypothesis," Journal of Political Economy, 95 no. 3, 641-646.

Nelson, C. and H. Kang (1981): "Spurious Periodicity in Inappropriately Detrended Time Series," Econometrica, May, 49 no. 3, 741-751.

Nelson, C. and H. Kang (1984): "Pitfalls in the Use of Time as an Explanatory Variable in Regression," Journal of Business and Economic Statistics, January, 2, 73-82.

Nelson, C. and C. Plosser (1982): "Trends and Random Walks in Macroeconomic Time Series," Journal of Monetary Economics, 10, 139-162.

Phillips, P.C.B. (1987): "Time Series Regression with a Unit Root," Econometrica, 55, 277-301.

Phillips, P.C.B. and P. Perron (1988): "Testing for a Unit Root in Time Series Regression," forthcoming Biometrika.

Romer, C.D. (1987): "Changes in the Cyclical Behavior of Individual Production Series," NBER Working Paper No. 2440, Cambridge.

Shapiro, M.D. (1986): "Investment, Output, and the Cost of Capital," Brookings Papers on Economic Activity, 1, 111-152.

Sims, C.A. (1972): "The Role of Approximate Prior Restrictions in Distributed Lag Estimation," Journal of the American Statistical Association, 67, 169-175.

Theil, H. (1971): Principles of Econometrics, New York: John Wiley.

White, H. (1984): Asymptotic Theory for Econometricians, New York: Academic Press.
Appendix.
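Proof of Lemma 4.2: Part (a.) is the usual central limit result, see e.g. White (1984) Theorem 5.19. The remainder is simply Lemma 2.3 in Phillips and Perron (1988). Q.E.D.

To prove the main Theorems, it is convenient to consider an alternative regression model:

$$ Y_t = \theta_{01} + \theta_{02}\left(t - \frac{n+1}{2}\right) + \varepsilon_t = Z_t\,\theta_0 + \varepsilon_t, $$

where $Z_t = \left(1,\; t - \frac{n+1}{2}\right)$. This alters the original specification of $\theta_{01}$ to the extent that $\theta_{02}$ is different from zero, but it is inconsequential for studying the regression residuals. The modification is extremely convenient, and is the specification that most researchers have used, see for example Nelson and Kang (1981), Durlauf and Phillips (1986), Phillips and Perron (1988), and others. Also, the alternative assumptions made in the text regarding the disturbance $\varepsilon$ are taken to apply to the disturbances of this equation, and detrended data refers to the fitted residuals of this equation.

We abuse notation and call processes covariance stationary if they satisfy Assumption 4.1. Similarly call processes difference stationary if their first differences satisfy Assumption 4.1. We first establish a Lemma; this comprises known results, but it is easy, and we place it here for completeness.

Lemma A. Let $\hat\theta_n = (\hat\theta_{n1}, \hat\theta_{n2})'$ denote the OLS estimator for $\theta_0$.

(a.) When $\varepsilon$ is covariance stationary,

$$ n^{1/2}(\hat\theta_{n1} - \theta_{01}) \Rightarrow \sigma_0\,N(0,1); \qquad n^{3/2}(\hat\theta_{n2} - \theta_{02}) \Rightarrow 6\sigma_0\left[ W(1) - 2\int_0^1 W(r)\,dr \right]. $$

(b.) When $\varepsilon$ is difference stationary,

$$ n^{-1/2}(\hat\theta_{n1} - \theta_{01}) \Rightarrow \sigma_0 \int_0^1 W(r)\,dr; \qquad n^{1/2}(\hat\theta_{n2} - \theta_{02}) \Rightarrow 6\sigma_0\left[ 2\int_0^1 r\,W(r)\,dr - \int_0^1 W(r)\,dr \right]. $$

Proof of Lemma A. By the usual OLS formula,

$$ \hat\theta_n - \theta_0 = \left[ \sum_t Z_t' Z_t \right]^{-1} \sum_t Z_t'\,\varepsilon_t. $$

Let

$$ \big(D_1(n),\, D_2(n)\big) = \begin{cases} \big(n^{-1/2},\; n^{-3/2}\big) & \text{if } \varepsilon \text{ is covariance stationary;} \\ \big(n^{-3/2},\; n^{-5/2}\big) & \text{if } \varepsilon \text{ is difference stationary.} \end{cases} $$

Rewrite the above as:

$$ \hat\theta_n - \theta_0 = \begin{pmatrix} n & 0 \\ 0 & \sum_t \left(t - \frac{n+1}{2}\right)^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_t \varepsilon_t \\ \sum_t \left(t - \frac{n+1}{2}\right)\varepsilon_t \end{pmatrix}, $$

which, since $\sum_{t=1}^n \left(t - \frac{n+1}{2}\right)^2 = n(n^2-1)/12$, implies that:

$$ \begin{pmatrix} D_1(n)\,n\,(\hat\theta_{n1} - \theta_{01}) \\ \frac{1}{12}\,D_2(n)\,n^3\,(\hat\theta_{n2} - \theta_{02}) \end{pmatrix} - \begin{pmatrix} D_1(n) \sum_t \varepsilon_t \\ D_2(n) \sum_t \left(t - \frac{n+1}{2}\right)\varepsilon_t \end{pmatrix} \to 0. $$

(a.) When $\varepsilon$ is covariance stationary:

$$ D_1(n) \sum_{t=1}^n \varepsilon_t = n^{-1/2} \sum_{t=1}^n \varepsilon_t \Rightarrow \sigma_0\,N(0,1); $$

$$ D_2(n) \sum_{t=1}^n \left(t - \frac{n+1}{2}\right)\varepsilon_t = n^{-3/2} \sum_{t=1}^n t\,\varepsilon_t - \frac{n+1}{2n}\,n^{-1/2} \sum_{t=1}^n \varepsilon_t \Rightarrow \sigma_0\left( W(1) - \int_0^1 W(r)\,dr - \frac{1}{2}W(1) \right). $$

Thus:

$$ n^{1/2}(\hat\theta_{n1} - \theta_{01}) \Rightarrow \sigma_0\,N(0,1); \qquad n^{3/2}(\hat\theta_{n2} - \theta_{02}) \Rightarrow 6\sigma_0\left[ W(1) - 2\int_0^1 W(r)\,dr \right]. $$

(b.) When $\varepsilon$ is difference stationary:

$$ D_1(n) \sum_{t=1}^n \varepsilon_t = n^{-3/2} \sum_{t=1}^n \varepsilon_t \Rightarrow \sigma_0 \int_0^1 W(r)\,dr; $$

$$ D_2(n) \sum_{t=1}^n \left(t - \frac{n+1}{2}\right)\varepsilon_t = n^{-5/2} \sum_{t=1}^n t\,\varepsilon_t - \frac{n+1}{2n}\,n^{-3/2} \sum_{t=1}^n \varepsilon_t \Rightarrow \sigma_0\left[ \int_0^1 r\,W(r)\,dr - \frac{1}{2}\int_0^1 W(r)\,dr \right]. $$

Thus:

$$ n^{-1/2}(\hat\theta_{n1} - \theta_{01}) \Rightarrow \sigma_0 \int_0^1 W(r)\,dr; \qquad n^{1/2}(\hat\theta_{n2} - \theta_{02}) \Rightarrow 6\sigma_0\left[ 2\int_0^1 r\,W(r)\,dr - \int_0^1 W(r)\,dr \right]. $$

This establishes the Lemma. Q.E.D.

Part (b.) is due to Durlauf and Phillips (1986); we have not been able to find a reference for part (a.), although it seems to be relatively well-known.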
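The rates in Lemma A for the slope estimator can be checked against exact finite-sample variances in a special case. The sketch below is an illustration under assumptions that are not in the text: the innovations are white noise with variance `sigma2`, and in the unit root case the initial disturbance is zero. A direct covariance calculation for $W$ (made here, not quoted from the paper) gives variance $12\sigma_0^2$ for the limit in part (a.) and $(6/5)\sigma_0^2$ for the limit in part (b.), so $n^3$ times the slope variance should approach 12 in the first case and $n$ times the slope variance should approach 1.2 in the second. The function name `slope_variance` is hypothetical.

```python
def slope_variance(n, unit_root, sigma2=1.0):
    """Exact variance of the OLS slope on the centered trend t - (n+1)/2.

    Assumes white-noise innovations with variance sigma2. If unit_root is
    False the disturbance is the innovation itself; if True it is the partial
    sum eps_t = u_1 + ... + u_t (zero initial condition, an assumption here).
    """
    t_bar = (n + 1) / 2.0
    z = [t - t_bar for t in range(1, n + 1)]
    sxx = sum(v * v for v in z)
    w = [v / sxx for v in z]  # slope error equals sum_t w_t * eps_t
    if not unit_root:
        return sigma2 * sum(v * v for v in w)
    # sum_t w_t eps_t = sum_i u_i * (sum_{t >= i} w_t), so the variance is
    # sigma2 times the sum of squared tail sums of the weights.
    tail, total = 0.0, 0.0
    for v in reversed(w):
        tail += v
        total += tail * tail
    return sigma2 * total

n = 200
scaled_stationary = n ** 3 * slope_variance(n, unit_root=False)  # near 12
scaled_unit_root = n * slope_variance(n, unit_root=True)         # near 1.2
print(scaled_stationary, scaled_unit_root)
```

The calculation is deterministic, so it isolates the difference in convergence rates ($n^{3/2}$ against $n^{1/2}$) from any sampling noise.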
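Proof of Theorem 4.3 (Consistency): By the definition, $\hat\varepsilon_t(n) = \varepsilon_t - Z_t(\hat\theta_n - \theta_0)$, so that:

$$ \hat\varepsilon_t(n)\,\hat\varepsilon_{t-j}(n) = \varepsilon_t\,\varepsilon_{t-j} - \left[ Z_t\,\varepsilon_{t-j} + Z_{t-j}\,\varepsilon_t \right](\hat\theta_n - \theta_0) + (\hat\theta_n - \theta_0)'\,Z_t' Z_{t-j}\,(\hat\theta_n - \theta_0). $$

(a.) Suppose $\varepsilon$ is covariance stationary. Rewriting the above,

$$ \mathrm{tr}\left[ (\hat\theta_n - \theta_0)(\hat\theta_n - \theta_0)' \sum_t Z_t' Z_{t-j} \right] - (\hat\theta_n - \theta_0)' \sum_t \left[ Z_t'\,\varepsilon_{t-j} + Z_{t-j}'\,\varepsilon_t \right] = \sum_t \hat\varepsilon_t(n)\,\hat\varepsilon_{t-j}(n) - \sum_t \varepsilon_t\,\varepsilon_{t-j}. $$

But by Lemma A, $\hat\theta_{n1} - \theta_{01} = O_p(n^{-1/2})$, and $\hat\theta_{n2} - \theta_{02} = O_p(n^{-3/2})$, which imply that $(\hat\theta_{n1} - \theta_{01})^2 = O_p(n^{-1})$, $(\hat\theta_{n1} - \theta_{01})(\hat\theta_{n2} - \theta_{02}) = O_p(n^{-2})$, and $(\hat\theta_{n2} - \theta_{02})^2 = O_p(n^{-3})$. Thus the first term on the left hand side is $O_p(1) + O_p(1) + O_p(1) = O_p(1)$. Further, by Lemma 4.1, $\sum_t \varepsilon_t = O_p(n^{1/2})$ and $\sum_t t\,\varepsilon_t = O_p(n^{3/2})$, so that the second term on the left hand side is again $O_p(1) + O_p(1) = O_p(1)$. Thus the right hand side satisfies:

$$ n^{-1} \sum_t \hat\varepsilon_t(n)\,\hat\varepsilon_{t-j}(n) - n^{-1} \sum_t \varepsilon_t\,\varepsilon_{t-j} \to 0 \quad \text{in probability as } n \to \infty. $$

(b.) Suppose $\varepsilon$ is difference stationary. Write:

$$ \frac{\sum_{t=j+1}^n \hat\varepsilon_t(n)\,\hat\varepsilon_{t-j}(n)}{\sum_{t=1}^n \hat\varepsilon_t(n)^2} = \frac{\sum_{t=j+1}^n \hat\varepsilon_t(n)^2}{\sum_{t=1}^n \hat\varepsilon_t(n)^2} + \frac{\sum_{t=j+1}^n \hat\varepsilon_t(n)\left[ \hat\varepsilon_{t-j}(n) - \hat\varepsilon_t(n) \right]}{\sum_{t=1}^n \hat\varepsilon_t(n)^2}. $$

The first term on the right differs from one only by the contribution of the $j$ initial squared residuals, which is negligible relative to the denominator. Without loss of generality, suppose $j > 0$. Then:

$$ \hat\varepsilon_{t-j}(n) - \hat\varepsilon_t(n) = (\varepsilon_{t-j} - \varepsilon_t) + (Z_t - Z_{t-j})(\hat\theta_n - \theta_0) = -\sum_{k=0}^{j-1} u_{t-k} + (0,\, j)\,(\hat\theta_n - \theta_0). $$

Therefore:

$$ \hat\varepsilon_t(n)\left[ \hat\varepsilon_{t-j}(n) - \hat\varepsilon_t(n) \right] = \left[ \varepsilon_t - Z_t(\hat\theta_n - \theta_0) \right]\left[ -\sum_{k=0}^{j-1} u_{t-k} + (0,\, j)(\hat\theta_n - \theta_0) \right] $$

$$ = -\sum_{k=0}^{j-1} u_{t-k}\,\varepsilon_t + (\hat\theta_n - \theta_0)'(0,\, j)'\,\varepsilon_t + (\hat\theta_n - \theta_0)' \sum_{k=0}^{j-1} Z_t'\,u_{t-k} - Z_t\,(\hat\theta_n - \theta_0)(\hat\theta_n - \theta_0)'(0,\, j)', $$

which implies:

$$ \sum_t \hat\varepsilon_t(n)\left[ \hat\varepsilon_{t-j}(n) - \hat\varepsilon_t(n) \right] = -\sum_{k=0}^{j-1} \sum_t u_{t-k}\,\varepsilon_t + (\hat\theta_n - \theta_0)'(0,\, j)' \sum_t \varepsilon_t + (\hat\theta_n - \theta_0)' \sum_{k=0}^{j-1} \sum_t Z_t'\,u_{t-k} - (\hat\theta_n - \theta_0)'(0,\, j)' \sum_t Z_t\,(\hat\theta_n - \theta_0). $$

Now apply Lemma 4.1 and Lemma A repeatedly:

$$ \sum_t u_{t-k}\,\varepsilon_t = \sum_t u_{t-k}\,\varepsilon_{t-k-1} + \sum_t \sum_{l=0}^{k} u_{t-k}\,u_{t-l} = O_p(n) + O_p(n) = O_p(n), $$

$$ (\hat\theta_n - \theta_0)'(0,\, j)' \sum_t \varepsilon_t = O_p(n^{-1/2})\,O_p(n^{3/2}) = O_p(n), $$

$$ (\hat\theta_n - \theta_0)' \sum_t Z_t'\,u_{t-k} = O_p(n^{1/2})\,O_p(n^{1/2}) + O_p(n^{-1/2})\,O_p(n^{3/2}) = O_p(n), $$

and finally,

$$ (\hat\theta_n - \theta_0)'(0,\, j)' \sum_t Z_t\,(\hat\theta_n - \theta_0) = O_p(n^{-1/2})\left[ O_p(n^{1/2})\,O(n) + O_p(n^{-1/2})\,O(n^2) \right] = O_p(n). $$

Thus the numerator $\sum_t \hat\varepsilon_t(n)\left( \hat\varepsilon_{t-j}(n) - \hat\varepsilon_t(n) \right) = O_p(n)$.

To complete the proof, we now establish the asymptotic probability order of the denominator. By Lemma 4.1 and Lemma A, $n^{-2} \sum_t \hat\varepsilon_t(n)^2$ converges weakly to some functional of $W$. In particular, the limiting random variable takes on the value 0 with probability 0. Therefore $\left( n^{-2} \sum_t \hat\varepsilon_t(n)^2 \right)^{-1}$ is $O_p(1)$. Consequently, as $n \to \infty$:

$$ \frac{\sum_{t=j+1}^n \hat\varepsilon_t(n)\,\hat\varepsilon_{t-j}(n)}{\sum_{t=1}^n \hat\varepsilon_t(n)^2} \to 1 \quad \text{in probability.} $$

This establishes the Theorem. Q.E.D.

As remarked in the text, the divergent first entry of $\hat\theta_n - \theta_0$ in part (b.) is multiplied by zero at crucial points, in evaluating the terms that involve $(\hat\theta_n - \theta_0)'(0,\, j)'$. Otherwise, the numerator would be $O_p(n^2)$, and given the asymptotic probability order of the denominator, the estimator would fail to converge in probability.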
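Proof of Theorem 4.4 (Finite Sample Bias): In either case, the cross product is:

$$ \hat\varepsilon_t(n)\,\hat\varepsilon_{t-j}(n) = \left[ \varepsilon_t - Z_t(\hat\theta_n - \theta_0) \right]\left[ \varepsilon_{t-j} - Z_{t-j}(\hat\theta_n - \theta_0) \right]. $$

Thus in either case the exact bias in the covariogram estimator is:

$$ B(j,n) = (n-j)^{-1} \sum_{t=j+1}^n Z_t\,E\left[ (\hat\theta_n - \theta_0)(\hat\theta_n - \theta_0)' \right] Z_{t-j}' \;-\; (n-j)^{-1} \sum_{t=j+1}^n \left( Z_t\,E\left[ (\hat\theta_n - \theta_0)\,\varepsilon_{t-j} \right] + Z_{t-j}\,E\left[ (\hat\theta_n - \theta_0)\,\varepsilon_t \right] \right). $$

Consider case (a.). By iterating, $\varepsilon_{\lambda t} = \lambda^t \varepsilon_0 + \sum_{k=0}^{t-1} \lambda^k u_{t-k}$, which implies for all $s, t$:

$$ E\,\varepsilon_{\lambda s}\,\varepsilon_{\lambda t} = \lambda^{s+t}\,E\,\varepsilon_0^2 + \sum_{k=0}^{s-1} \sum_{l=0}^{t-1} \lambda^{k+l}\,E\left( u_{s-k}\,u_{t-l} \right). $$

Define the function:

$$ f(s,t,\lambda) \stackrel{\text{def}}{=} E\,\varepsilon_{\lambda s}\,\varepsilon_{\lambda t}. $$

Next consider case (b.). By iterating, $\varepsilon_{1t} = \varepsilon_0 + \sum_{k=0}^{t-1} u_{t-k}$, so that for all $s, t$:

$$ E\,\varepsilon_{1s}\,\varepsilon_{1t} = E\,\varepsilon_0^2 + \sum_{k=0}^{s-1} \sum_{l=0}^{t-1} E\left( u_{s-k}\,u_{t-l} \right). $$

Define the function:

$$ g(s,t) \stackrel{\text{def}}{=} E\,\varepsilon_{1s}\,\varepsilon_{1t}. $$

Then in case (a.):

$$ E\left[ (\hat\theta_n - \theta_0)\,\varepsilon_{\lambda t} \right] = \left( \sum_{s=1}^n Z_s' Z_s \right)^{-1} \sum_{s=1}^n Z_s'\,E\left( \varepsilon_{\lambda s}\,\varepsilon_{\lambda t} \right) = \left( \sum_{s=1}^n Z_s' Z_s \right)^{-1} \sum_{s=1}^n Z_s'\,f(s,t,\lambda), $$

and

$$ E\left[ (\hat\theta_n - \theta_0)(\hat\theta_n - \theta_0)' \right] = \left( \sum_{s=1}^n Z_s' Z_s \right)^{-1} \left( \sum_{s=1}^n \sum_{t=1}^n Z_s' Z_t\,f(s,t,\lambda) \right) \left( \sum_{s=1}^n Z_s' Z_s \right)^{-1}. $$

Similarly, in case (b.):

$$ E\left[ (\hat\theta_n - \theta_0)\,\varepsilon_{1t} \right] = \left( \sum_{s=1}^n Z_s' Z_s \right)^{-1} \sum_{s=1}^n Z_s'\,g(s,t), $$

and

$$ E\left[ (\hat\theta_n - \theta_0)(\hat\theta_n - \theta_0)' \right] = \left( \sum_{s=1}^n Z_s' Z_s \right)^{-1} \left( \sum_{s=1}^n \sum_{t=1}^n Z_s' Z_t\,g(s,t) \right) \left( \sum_{s=1}^n Z_s' Z_s \right)^{-1}. $$

Therefore $B_\lambda(j,n)$ is obtained by substituting the case (a.) expressions, which involve $f$, into the formula for $B(j,n)$. The expression for $B_1(j,n)$ is identical with $g$ in place of $f$. The difference in bias is then:

$$ B_\lambda(j,n) - B_1(j,n) = (n-j)^{-1} \sum_{t=j+1}^n \Bigg\{ Z_t \left( \sum_{s} Z_s' Z_s \right)^{-1} \left( \sum_{s} \sum_{r} Z_s' Z_r \big( f(s,r,\lambda) - g(s,r) \big) \right) \left( \sum_{s} Z_s' Z_s \right)^{-1} Z_{t-j}' $$

$$ \qquad - Z_t \left( \sum_{s} Z_s' Z_s \right)^{-1} \sum_{s} Z_s' \big( f(s,t-j,\lambda) - g(s,t-j) \big) - Z_{t-j} \left( \sum_{s} Z_s' Z_s \right)^{-1} \sum_{s} Z_s' \big( f(s,t,\lambda) - g(s,t) \big) \Bigg\}. $$

For each fixed $s, t$, as $\lambda \to 1$, we have $f(s,t,\lambda) - g(s,t) \to 0$. Further, $B_\lambda(j,n) - B_1(j,n)$ is evidently continuous in $f(s,t,\lambda) - g(s,t)$. Therefore, we see immediately that as $\lambda \to 1$, $B_\lambda(j,n) - B_1(j,n) \to 0$. This establishes the Theorem. Q.E.D.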
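The exact bias expression in this proof can be evaluated numerically in a special case. The following sketch assumes, beyond what the text states, that the innovations $u_t$ are white noise with variance `sigma2` and that $E\,\varepsilon_0^2$ equals `v0`; under these assumptions $f(s,t,\lambda) = \lambda^{s+t} v_0 + \sigma^2 \sum_{i=1}^{\min(s,t)} \lambda^{s+t-2i}$, and $g(s,t)$ is the same expression at $\lambda = 1$. The function names `cov` and `exact_bias` are hypothetical. Because the trend regressor is centered, $\sum_s Z_s' Z_s$ is diagonal, which the code exploits.

```python
def cov(s, t, lam, v0=1.0, sigma2=1.0):
    """E[eps_s eps_t] for eps_t = lam*eps_{t-1} + u_t with white-noise u_t.

    v0 = E[eps_0^2] and sigma2 = Var(u_t) are assumptions of this sketch.
    With lam = 1 this is the unit root covariance g(s, t).
    """
    m = min(s, t)
    return lam ** (s + t) * v0 + sigma2 * sum(
        lam ** (s + t - 2 * i) for i in range(1, m + 1)
    )

def exact_bias(j, n, lam):
    """Exact bias of the lag-j covariogram estimator using detrended data."""
    idx = range(1, n + 1)
    z = {t: t - (n + 1) / 2.0 for t in idx}      # centered trend regressor
    S = sum(z[t] ** 2 for t in idx)              # sum of squared regressor
    c = {(s, t): cov(s, t, lam) for s in idx for t in idx}
    # v(t) = E[(theta_hat - theta_0) eps_t], a 2-vector (v1, v2)
    v1 = {t: sum(c[s, t] for s in idx) / n for t in idx}
    v2 = {t: sum(z[s] * c[s, t] for s in idx) / S for t in idx}
    # V = E[(theta_hat - theta_0)(theta_hat - theta_0)'], symmetric 2 x 2
    V11 = sum(c.values()) / n ** 2
    V12 = sum(z[r] * c[s, r] for s in idx for r in idx) / (n * S)
    V22 = sum(z[s] * z[r] * c[s, r] for s in idx for r in idx) / S ** 2
    total = 0.0
    for t in range(j + 1, n + 1):
        quad = V11 + V12 * (z[t] + z[t - j]) + V22 * z[t] * z[t - j]
        cross = (v1[t - j] + z[t] * v2[t - j]) + (v1[t] + z[t - j] * v2[t])
        total += quad - cross
    return total / (n - j)

n, j = 20, 1
b1 = exact_bias(j, n, 1.0)
gaps = [abs(exact_bias(j, n, lam) - b1) for lam in (0.9, 0.99, 0.999999)]
print(b1, gaps)
```

Evaluating the gap on a grid of values of `lam` approaching one gives a direct picture of the continuity property that the Theorem asserts.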