Models with Trends and Nonstationary Time Series
Ref: Enders Chapter 4, Favero Chapter 2, Cochrane Chapter 10.
The general solution to a stochastic linear difference equation has three parts:
yt  trend  stationary component  noise
The noise component: ARCH and GARCH approaches model this variance (volatility) component.
The stationary component: AR(p), MA(q), ARMA(p,q) models. These require the roots of the characteristic equation to lie inside the unit circle (or, equivalently, the roots of the inverse characteristic equation, the lag polynomial, to lie outside the unit circle).
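A quick worked check of this condition (my own numerical example): for the AR(1) process y_t = 0.5y_{t−1} + e_t, the lag polynomial is 1 − 0.5L; its root solves 1 − 0.5z = 0, giving z = 2, outside the unit circle. Equivalently, the characteristic root is 0.5, inside the unit circle, so the process is stationary. For y_t = y_{t−1} + e_t the root lies exactly on the unit circle, which is the unit-root case studied below.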
Here: we examine the trend component.
Trend = deterministic trend + stochastic trend
Deterministic trend: a nonrandom trend (constant, possibly accelerating).
Stochastic trend: random; it can be due to any shock, such as technology, oil prices, policy, etc.
Until the 1960s researchers modeled time series as covariance stationary.
Problem: this assumption did not describe macroeconomic time series, which generally grow over time.
Originally some proposed ways of dealing with the problems of growing series:
• taking the log of Y, and
• assuming the DGP could be described by log(Y_t) = y_t = α + bt + z_t, where z was assumed covariance stationary with E(z) = 0, which led to an expected growth rate of b in the series: E(Δy_t) = b. Such series y were said to be trend stationary.
Box and Jenkins first proposed the idea that, instead of treating the macro series as covariance stationary around a deterministic trend, we should accept that they are not covariance stationary and instead first-difference them to make them covariance stationary:
If y_t = b + y_{t−1} + u_t, then a stationary model for Δy would be: Δy_t = b + u_t. They then modeled u as a covariance stationary ARMA(p,q) process, so that y is an ARIMA(p,1,q) process.
Reminder: “I” stands for integrated process, and the “1” shows that the process needs to be differenced once to be stationary: integrated of degree 1, I(1). A covariance stationary series is I(0). If a time series needs to be differenced d times to become stationary, it is integrated of degree d, I(d). Such a series can be represented by an autoregressive integrated moving average process of order (p, d, q), an ARIMA(p,d,q). Usually d = 1 is sufficient; in economics d = 2 is the maximum we would need to difference. For example: the rate of growth of inflation (difference the log price level twice).
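As a minimal EViews sketch of the Box-Jenkins approach (assuming a workfile containing a series y; eq_arima is an illustrative name), an ARIMA(1,1,1) is estimated by attaching ARMA terms to the first difference d(y):

' estimate an ARMA(1,1) on d(y), i.e. an ARIMA(1,1,1) for y
equation eq_arima.ls d(y) c ar(1) ma(1)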
They showed that many time series could be successfully modeled this way. Nelson and Plosser (1982) later tested most macroeconomic and financial series and could not reject nonstationarity for the majority of them. They suggested technology shocks as an explanation for this finding, but others later interpreted it as evidence of rigidities. Since this finding, which established that most macroeconomic and financial series are well described by ARIMA processes, nonstationarity has been part of macroeconometrics.
Why is it important to recognize this?
Most macro variables are very persistent (nonstationary). But standard inference
techniques are unreliable with nonstationary data.
Dickey and Fuller: OLS estimates are biased towards stationarity, suggesting that series that looked stationary in OLS regressions could in fact be generated by random walks. This finding made many of the conclusions in the macro literature wrong, or at least undependable.
In this lesson, we will look at:
 Trend stationary models
 Random walk models
 Stochastic trend models
 Trends and univariate decompositions
When a process is unit-root nonstationary, it has a stochastic trend. If linear combinations of two or more nonstationary processes do not have stochastic trends, these variables are cointegrated.
I. Trend Stationary and Difference Stationary Models
Consider the process
x_t = α + ρx_{t−1} + e_t
where e is white noise (e_t ~ n.i.d.(0, σ²)).
If ρ < 1 we can write
(1 − ρL)x_t = α + e_t, hence x_t = Σ_{i=0}^∞ ρ^i(α + e_{t−i}), a stationary process.
If ρ = 1, then (1 − L)x_t = α + e_t: there is a unit root in the AR part of the series and we have to solve the equation recursively. Starting from the initial value x_0, recursive substitution back to t = 0 gives the solution:
x_t = Σ_{i=0}^{t−1}(α + e_{t−i}) + x_0.
Rewriting:

(2)   x_t = αt + Σ_{i=0}^{t−1} e_{t−i} + x_0,

where αt is the deterministic trend and Σ_{i=0}^{t−1} e_{t−i} is the stochastic trend.
If there are no shocks, the intercept is x_0.
Suppose there is a shock at time t − i (e.g. an oil-price shock): it shifts the intercept by e_{t−i}, and the effect is permanent (with coefficient 1). This is a stochastic trend, since each shock shifts the mean randomly. The model behaves very differently from traditional covariance stationary models, where the effect of shocks dies out over time.
[Figure: a GDP-type series following a random walk; a shock e_{t−i} at date t − i permanently shifts the path away from the initial intercept x_0.]
If xt ~I(0) then
xt
Trend Stationary (TS)
xt has a linear time trend
Difference Stationary (DS)
I(1) or unit root stationary
xt    bt  et
LR var(xt )  0
Non-zero serial correlation.
Special case: random walk
LR var(xt )  0
Look at special cases of x_t = α + ρx_{t−1} + e_t.

1. Difference Stationary models (DS)

(i) Random walk: AR(1) model with α = 0, ρ = 1 (a unit root process)
x_t = x_{t−1} + e_t
Note: x is still a martingale even if e is not an i.i.d.(0, σ²) process, provided E(e_t | e_{t−1}, e_{t−2}, ...) = 0.
Δx_t = e_t ⇒ the process is stationary in its first difference.
The solution to this difference equation, given the initial value x_0, is:
x_t = Σ_{i=0}^{t−1} e_{t−i} + x_0.
Properties of x_t:
E(x_t) = E(x_{t+s}) = x_0, or E(x_t) = E(x_{t+s}) = 0 if x_0 = 0. The mean of a random walk is constant.
γ(0) = Var(x_t) = E(x_t²) = E[(Σ_{i=0}^{t−1} e_{t−i})²] = E(Σ_{i=0}^{t−1} e²_{t−i}) = tσ², and
Var(x_{t+s}) = E(x²_{t+s}) = E[(Σ_{i=0}^{t+s−1} e_{t+s−i})²] = (t + s)σ².
lim_{t→∞} E(x_t²) = ∞: the variance explodes as t grows. The RW is nonstationary.
Covariances and correlation coefficient:
γ(s) = Cov(x_t, x_{t−s}) = E[(x_t − E(x_t))(x_{t−s} − E(x_{t−s}))] = (t − s)σ²
ρ(s) = γ(s)/√(Var(x_t)Var(x_{t−s})) = (t − s)σ²/√[tσ² (t − s)σ²] = √((t − s)/t),
so ρ(1) = √((t − 1)/t), ρ(2) = √((t − 2)/t), and so on.
As s → 0, ρ → 1, and as s → t, ρ → 0. So the correlation coefficient dies out, though it takes a long time.
• It may be difficult to distinguish the ACF of an AR(1) from that of a random walk, especially if the autocorrelation coefficient is large. So the ACF is not a useful tool for determining whether a process is nonstationary.
Variance and covariance are time dependent; thus the random walk process is nonstationary and needs to be first-differenced to become stationary. It is called an I(1), or difference stationary (DS), process. This means that stochastic shocks have a nondecaying effect on the level of the series: they never disappear, not even slowly, over time.
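To get a feel for how slowly the ACF decays (a numerical illustration of the formulas above, my own arithmetic): with t = 100 observations, ρ(1) = √(99/100) ≈ 0.995 and ρ(10) = √(90/100) ≈ 0.949, values that are hard to distinguish from the ACF of a stationary AR(1) with a coefficient near one.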
(ii) Random walk with drift: AR(1) model with α ≠ 0, ρ = 1
x_t = α + x_{t−1} + e_t
Δx_t = α + e_t ⇒ the process is stationary in its first difference.
We saw that the solution to this difference equation is:
x_t = Σ_{i=0}^{t−1} e_{t−i} + x_0 + αt.
Thus x has a linear trend when we have a RW with drift. This process is the sum of two nonstationary processes:
x_t = z_t + (x_0 + αt)
where x_0 + αt = linear (deterministic) trend
z_t = Σ_{i=0}^{t−1} e_{t−i} = stochastic trend (random walk without drift).
As t grows, the linear trend will dominate the random walk.
Since Δz_t = e_t, the first difference of x is stationary:
Δx_t = e_t + α, with constant mean (α), constant variance (Var(Δx_t) = σ²), and zero covariances.
A RW with drift is also called a difference stationary (DS) model.
(iii) ARIMA(p,1,q) model
A(L)x_t = B(L)e_t
If A(L) has one unit root and B(L) has all roots outside the unit circle, we can write the model as:
(1 − L)A*(L)x_t = B(L)e_t, where the polynomial A*(L) has all its roots outside the unit circle and is of order p − 1, so
A*(L)Δx_t = B(L)e_t.
Now the Δx_t sequence is stationary, since A*(L)'s roots all lie outside the unit circle.
With an ARIMA(p,d,q), we can first-difference d times and the resulting sequence will be stationary as well.
First-differencing is used to make a nonstationary series stationary. It removes both the deterministic and the stochastic trend. But as we will see in the topic on cointegration, this makes the researcher lose valuable long-run information.
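In EViews, first differences do not have to be built by hand from lags; the built-in d() function does the same job (dx is an illustrative name):

series dx = d(x)    ' identical to series dx = x - x(-1)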
2. Trend-Stationary (TS) processes:
x_t = α + βt + u_t, where u is a white-noise process.
Here nonstationarity is removed by regressing the series on the deterministic trend. The process fluctuates around the trend, but it has no memory and the variation is predictable.
Ex: log(GNP) can be stationary around a linear trend. If you difference this process, the resulting series is not well behaved:

1. First-differencing a TS process
Δx_t = β + u_t − u_{t−1} = β + (1 − L)u_t
We are introducing a unit root into the MA component. Thus Δx_t becomes noninvertible.

2. Subtracting the deterministic trend: “detrending”
Subtract the fitted values x̂_t = α̂ + β̂t from the observed series. If nonstationarity is only due to the deterministic trend, the resulting series û_t = x_t − x̂_t will be a stationary process and can thus be modeled, for example, as an ARMA(p,q) process. More generally, the trend function can be a polynomial whose degree is determined by the AIC or SBC:
x_t = α + Σ_{i=1}^n β_i t^i + u_t
• Similarly, if you try to detrend a DS model, the detrending does not remove the stochastic trend: you are fitting a deterministic trend to what is in fact a stochastic trend, and the stochastic component remains in the resulting series.

• Warning: detrending can therefore be seriously misleading: a RW generates a lot of low-frequency fluctuation, and in a short sample a drift may be wrongly interpreted as a trend or a broken trend. If you fit a trend to a series whose true representation is a RW, you are estimating the wrong model. This is the problem with “technical analysis” in the stock or FX markets (head-and-shoulders patterns). A sketch of trend fitting and detrending in EViews follows below.
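A minimal EViews sketch of detrending by regression (assuming a workfile containing a series x; the names eqtrend and uhat are illustrative):

' fit a quadratic deterministic trend by OLS
equation eqtrend.ls x c @trend @trend^2
' save the residuals: the detrended series
eqtrend.makeresids uhat

If x is truly TS, uhat can then be modeled as a stationary ARMA(p,q); if x is DS, uhat still contains the stochastic trend.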
Illustration
Eviews graphs
TREND.PRG
BeveridgeNelson.WF
Generate three series: one with a deterministic trend (DT), with α = 0, β = 0.1, and two with stochastic trends (ST1, ST2), which differ only in that their error terms are different drawings from the same distribution. The STs are RWs with a drift of 0.1.
smpl 1 1
genr ST1=0
genr ST2=0
smpl 2 200
series ST1= 0.1+ST1(-1) +nrnd
series ST2=0.1+ST2(-1)+nrnd
series DT= 0.1*@trend +nrnd
[Figure: the simulated series DT, ST1 and ST2 plotted over observations 1-200.]
Unlike for the series DT, detrending the series ST1 and ST2 will not make the series
stationary.
series dtdet=dt-0.1*@trend
series st1det=st1-0.1*@trend
series st2det=st2-0.1*@trend
plot dtdet st1det st2det
[Figure: the detrended series DTDET, ST1DET and ST2DET over observations 1-200; DTDET is stationary around zero while the detrended stochastic-trend series still wander.]
Now try first-differencing: all three series become stationary.
series ddt=dt-dt(-1)
series dst1=st1-st1(-1)
series dst2=st2-st2(-1)
plot ddt dst1 dst2
[Figure: the first-differenced series DDT, DST1 and DST2 over observations 1-200; all fluctuate around a constant mean.]
The TS model is a special case of the DS model:
Consider a general form of a DS model:
(1 − L)x_t = α + a(L)e_t (the random walk with drift is the simplified version where a(L) = 1).
And a general form of a TS model:
x_t = α + βt + b(L)e_t, or:
(1 − L)x_t = β + (1 − L)b(L)e_t = β + a(L)e_t, where a(L) = (1 − L)b(L) ⇒ a(L) has a unit root.
** The main difference between the two models is that the MA part of the TS model, written in first differences, has a unit root.
Both the DS (integrated variable) model and the TS (deterministic trend) model exhibit systematic variation.
Differences:
• TS models: the variation is predictable ⇒ it can be removed by removing the trend.
• DS models: the variation is not predictable ⇒ it cannot be removed by detrending.
Alternative Representation of an AR(p) Process
This section shows that any lag polynomial applied to the process, φ(L)x_t, can be rewritten using
(3)   φ(L) = φ(1) + (1 − L)φ*(L),
where φ*(L) = Σ_{j=0}^{p−1} φ*_j L^j with φ*_j = φ_{j+1} + φ_{j+2} + ... + φ_p.
This is a useful representation that will be used to derive the Beveridge-Nelson decomposition.
Consider a polynomial
(4)   φ(L) = 1 − φ_1 L − φ_2 L² − ... − φ_{p−1}L^{p−1} − φ_p L^p.
Define φ(1) = 1 − φ_1 − φ_2 − ... − φ_{p−1} − φ_p
and φ*_j = φ_{j+1} + φ_{j+2} + ... + φ_p for j = 0, 1, 2, ..., p − 1.
The right-hand side of (3) then expands as:
(5)   φ(1) + φ*(L)(1 − L) = φ(1) + (φ*_0 + φ*_1 L + φ*_2 L² + ... + φ*_{p−1}L^{p−1})(1 − L)
= φ(1) + φ*_0 + (φ*_1 − φ*_0)L + (φ*_2 − φ*_1)L² + ... + (φ*_{p−1} − φ*_{p−2})L^{p−1} − φ*_{p−1}L^p.
Replacing φ(1) and the φ*_j:
= (1 − φ_1 − φ_2 − ... − φ_p) + (φ_1 + φ_2 + ... + φ_p) + [(φ_2 + ... + φ_p) − (φ_1 + φ_2 + ... + φ_p)]L + [(φ_3 + ... + φ_p) − (φ_2 + ... + φ_p)]L² + ... − φ_p L^p
= 1 − φ_1 L − φ_2 L² − ... − φ_{p−1}L^{p−1} − φ_p L^p = φ(L).
Thus:
φ(L) = φ(1) + (1 − L)φ*(L).
A more elegant way of showing the same thing (Favero): C(L) = C(1) + (1 − L)C*(L).
Consider C(L), a polynomial of order q.
Define a polynomial D(L) such that:
D(L) = C(L) − C(1), also of order q since C(1) is a constant.
Thus D(1) = 0, meaning that 1 is a root of D(L), so
D(L) = C*(L)(1 − L) = C(L) − C(1),
thus C(L) = C(1) + (1 − L)C*(L).
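A quick numeric check of the identity (my own example, not from the references): take φ(L) = 1 − 0.5L − 0.3L², so p = 2. Then φ(1) = 0.2, φ*_0 = φ_1 + φ_2 = 0.8 and φ*_1 = φ_2 = 0.3, and
φ(1) + (1 − L)φ*(L) = 0.2 + (1 − L)(0.8 + 0.3L) = 0.2 + 0.8 + (0.3 − 0.8)L − 0.3L² = 1 − 0.5L − 0.3L² = φ(L), as claimed.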
II. Decomposition of Univariate Time-series
Ref: Favero Ch.2, Enders Ch.4, Pagan online lecture notes #4
The idea is that it is informative to decompose a nonstationary sequence into its
permanent and temporary (stationary) components.
Beveridge and Nelson (1981, JME): expressed an ARIMA(p,1,q) model in terms of
random walk+drift+stationary components. They showed how to recover from the data
the trend and the stationary components.
This idea goes back to measuring the “output gap” used to assess the business cycle or to estimate the Phillips curve. Also, if you assume, as in Blanchard and Quah, that demand shocks affect output temporarily while supply shocks affect it permanently, you can infer the demand shocks from the temporary component. Overall, however, this is not a good way to approach that question, since the temporary vs. permanent components should be model-determined. Moreover, there are problems with the assumptions behind trend-extraction approaches, since such decompositions are not unique.
Consider again equation (2):
(2)   x_t = αt + (Σ_{i=0}^{t−1} e_{t−i} + x_0)
with x = deterministic trend due to drift + (difference stationary process).
The BN decomposition further decomposes the second term on the RHS into a stationary component and a random walk component.
Deriving the temporary vs. permanent effects (Favero p. 51):
Consider the first difference of an integrated process:
(6)   Δx_t = μ + C(L)e_t, where e_t ~ i.i.d.(0, σ²) and C(L) is a polynomial of order q.
Define a polynomial D(L) such that D(L) = C(L) − C(1), also of order q since C(1) is a constant. ⇒ D(1) = 0, meaning 1 is a root of D(L), and hence we can express D(L) as:
D(L) = C*(L)(1 − L).
Also, D(L) = C(L) − C(1) ⇒ C(L) = C(1) + C*(L)(1 − L).
Using this result, we can rewrite (6), substituting for C(L):
(7)   Δx_t = μ + C(1)e_t + C*(L)(1 − L)e_t.
Integrating this equation (i.e., dividing both sides by (1 − L)) we get:
x_t = [μt + C(1)z_t] + C*(L)e_t = TR_t + C_t
where z is a process such that Δz_t = e_t, so that TR_t = TR_{t−1} + μ + C(1)e_t.
TR = deterministic trend + stochastic trend = permanent (random walk) component;
C = cyclical component = temporary (stationary) component.
Alternative interpretation of the decomposition (Pagan, online lecture notes):
Rewrite (7) as:
(1 − L)x_t = μ + C(1)e_t + C*(L)(1 − L)e_t
x_t = (1 − L)^{−1}[μ + C(1)e_t] + C*(L)e_t = (1 + L + L² + ...)[μ + C(1)e_t] + C*(L)e_t
x_t = [μt + C(1)Σ_{j=0}^t e_{t−j}] + C*(L)e_t = P_t (permanent) + T_t (temporary).
The term with the summation is an integrated series, which reflects permanent shocks ⇒ P is an I(1) process, and T is an I(0) process since lim_{j→∞} C*_j = 0.
This is what is called the
1. Beveridge and Nelson decomposition:
For any time series x_t that is I(1), we can represent it as in (6),
(1 − L)x_t = μ + C(L)ε_t, with ε_t ~ i.i.d. and C(L) a polynomial of order q,
and we can write
x_t = C_t + TR_t
where
C_t = temporary (cyclical) component = C*(L)ε_t
TR_t = permanent (trend) component = deterministic trend + stochastic trend
= μ + TR_{t−1} + C(1)ε_t (= μt + C(1)Σ_j ε_{t−j}).
To see this, apply C(L) = C(1) + (1 − L)C*(L) from (3) to the polynomial C(L) in (6). The equation Δx_t = μ + C(L)ε_t can then be written as
(1 − L)x_t = μ + C(1)ε_t + (1 − L)C*(L)ε_t = (1 − L)TR_t + (1 − L)C_t
⇒ x_t = TR_t + C_t.
Two features of the B&N decomposition:
1. The shocks to the permanent component, C(1)ε_t, are white noise.
2. The shocks to the permanent and temporary components are perfectly negatively correlated.
Examples: find the cyclical and trend components in
1. An ARIMA(0,1,1) process Δx_t = ε_t + θε_{t−1} with 0 < θ < 1.
The BN decomposition gives x_t = C_t + TR_t
with C_t = C*(L)ε_t and (1 − L)TR_t = μ + C(1)ε_t.
From the example, we have:
C(L) = 1 + θL, so C(1) = 1 + θ.
From C(L) = C(1) + (1 − L)C*(L):
C*(L) = [C(L) − C(1)]/(1 − L) = (θL − θ)/(1 − L) = −θ.
Thus x_t = −θε_t + (1 + θ)z_t,
where z_t = ε_t/(1 − L), a process for which Δz_t = ε_t.
2. An ARIMA(1,1,1) process Δx_t = ρΔx_{t−1} + ε_t + θε_{t−1} with 0 < ρ < 1.
Rewrite the process as (1 − ρL)Δx_t = (1 + θL)ε_t ⇒ Δx_t = (1 + θL)ε_t/(1 − ρL).
Thus C(L) = (1 + θL)/(1 − ρL) and C(1) = (1 + θ)/(1 − ρ),
and C*(L) = [C(L) − C(1)]/(1 − L) = −(ρ + θ)/[(1 − ρ)(1 − ρL)].
Hence x_t = C*(L)ε_t + C(1)ε_t/(1 − L) = −[(ρ + θ)/((1 − ρ)(1 − ρL))]ε_t + [(1 + θ)/(1 − ρ)]z_t.
The longer the memory (the higher ρ), the bigger is the effect of the stochastic trend on the process.
3. An AR(1) process in the first difference: Δx_t = ρΔx_{t−1} + ε_t
HW2: Find the transitory and the permanent components.
Show that the permanent component's effect on the process is larger, the longer the memory (the higher ρ).
When the transitory component is measured by the Beveridge-Nelson decomposition and the growth rate is AR(1), show that the transitory component is proportional to the growth rate of x.
Procedure for using the BN technique in empirical studies:
Step 1: Estimate the first difference of the series and identify the best ARIMA model for the sequence. Recover the error terms and all the parameters entering the constant, TR, and C(1).
Step 2: Given an initial value for TR, compute the permanent component TR of the sequence.
Step 3: Compute the cyclical component by subtracting the permanent component from the observed values in each period.
Numerical Example
1. Favero p.52
Suppose you estimated an ARIMA(2,1,1) model and got the following regression result:
x_t = x_{t−1} + 0.6x_{t−1} − 0.6x_{t−2} + u_t + 0.5u_{t−1}, i.e. Δx_t = 0.6Δx_{t−1} + u_t + 0.5u_{t−1}.
How can we generate the permanent and the cyclical (transitory) components of this process? For this we need to calculate the parameters in C(1) and TR, and compute C*(L).
Rewriting the model with lag operators, you transform it into an ARMA(1,1) model in first differences:
(1 − L)x_t − 0.6L(1 − L)x_t = (1 + 0.5L)u_t
(1 − L)x_t = [(1 + 0.5L)/(1 − 0.6L)]u_t
Using the expression in (6), (1 − L)x_t = μ + C(L)ε_t with μ = 0:
C(L) = (1 + 0.5L)/(1 − 0.6L) and C(1) = 1.5/0.4.
From C(L) = C(1) + (1 − L)C*(L):
C*(L) = [C(L) − C(1)]/(1 − L) = [(1 + 0.5L)/(1 − 0.6L) − 1.5/0.4]/(1 − L) = (−1.1 + 1.1L)/[0.4(1 − 0.6L)(1 − L)] = −1.1/[0.4(1 − 0.6L)].
Thus the BN decomposition of the x process is:
x_t = C_t + TR_t = C*(L)u_t + [μ + C(1)u_t]/(1 − L)
= −[1.1/(0.4(1 − 0.6L))]u_t + (1.5/0.4)z_t
where z_t = u_t/(1 − L), μ = 0, and TR_t = TR_{t−1} + (1.5/0.4)u_t.
We can compute the permanent and the cyclical (transitory) components (tr and
cycle, respectively) in Eviews as follows:
Use bn.prg and bn.wf
smpl 1 2
genr x=0
smpl 1 200
genr u=nrnd
smpl 3 200
series x=x(-1) +0.6*x(-1)-0.6*x(-2) +u+0.5*u(-1)
smpl 1 2
genr TR=0
smpl 3 200
series TR= TR(-1)+(1.5/0.4)*u
genr CYCLE=X-TR
plot cycle TR x
[Figure: the simulated series X together with its BN permanent component TR and cyclical component CYCLE over observations 1-200; TR tracks the wandering level of X while CYCLE fluctuates around zero.]
2. REER_Enders: the Canadian real exchange rate.
Workfile: reer_enders.wf
Program: bn.prg
Consider an ARIMA(1,1,1) model. Estimating an ARMA(1,1) in levels gives
x_t = 108 + 0.98x_{t−1} + u_t + 0.39u_{t−1},
and the AR root is essentially on the unit circle (unstable). So fit the MA model to the first difference instead:
Δx_t = −0.186 + u_t + 0.37u_{t−1}   (eqreer)
Dependent Variable: CANADA-CANADA(-1)

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -0.186171     0.343883     -0.541381     0.5896
MA(1)       0.370357     0.099292      3.729962     0.0003

R-squared            0.114123    Mean dependent var     -0.186517
Adjusted R-squared   0.103941    S.D. dependent var      2.506556
S.E. of regression   2.372716    Akaike info criterion   4.588163
Sum squared resid    489.7908    Schwarz criterion       4.644087
Log likelihood      -202.1733    F-statistic             11.20779
Durbin-Watson stat   2.020327    Prob(F-statistic)       0.001205

Inverted MA Roots: -.37
The inverted MA root is -0.37, inside the unit circle, so the ARMA model is invertible.
The residuals are nearly white noise, with no serial correlation: DW = 2 (we can use the DW statistic because there is no lagged dependent variable on the RHS), and all autocorrelations and partial autocorrelations are close to zero.
Correlogram of the residuals
Date: 10/03/07 Time: 09:26
Sample: 1980Q2 2002Q2
Included observations: 89
Q-statistic probabilities adjusted for 1 ARMA term(s)

Lag    AC       PAC      Q-Stat   Prob
1     -0.013   -0.013    0.0154
2      0.036    0.036    0.1344   0.714
3      0.147    0.148    2.1585   0.340
4      0.033    0.037    2.2604   0.520
5      0.053    0.045    2.5293   0.639
6      0.041    0.020    2.6949   0.747
7     -0.010   -0.022    2.7040   0.845
8      0.113    0.097    3.9707   0.783
9      0.028    0.023    4.0524   0.852
10    -0.188   -0.201    7.6886   0.566
11    -0.099   -0.152    8.7160   0.559
12    -0.062   -0.078    9.1245   0.610
13    -0.239   -0.213   15.210    0.230
14     0.037    0.063   15.358    0.286
15    -0.193   -0.153   19.447    0.149
16    -0.100   -0.055   20.562    0.151
17    -0.071   -0.074   21.134    0.173
18     0.025    0.148   21.204    0.217
19    -0.093   -0.001   22.196    0.223
20    -0.063   -0.037   22.663    0.252
Compute the cyclical and permanent components:
Δx_t = −0.186 + (1 + 0.37L)u_t
C(L) = 1 + 0.37L, C(1) = 1.37, μ = −0.186.
Thus C*(L) = [C(L) − C(1)]/(1 − L) = (0.37L − 0.37)/(1 − L) = −0.37, and therefore
C_t = C*(L)u_t = −0.37u_t   (the cyclical, transitory component)
(1 − L)TR_t = μ + C(1)u_t = −0.186 + 1.37u_t   (the permanent component)
TR_t = TR_{t−1} + μ + C(1)u_t = TR_{t−1} − 0.186 + 1.37u_t.
Then we can rewrite x in terms of its cyclical and permanent components as:
x_t = (−0.37u_t) + (−0.186t + 1.37z_t).
EViews:
Generate the permanent and the transitory components of the x process (a sketch follows below).
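A minimal sketch in the style of bn.prg (it assumes the workfile contains the series canada and the estimated equation eqreer; initializing TR at the first observation is an arbitrary choice, since the BN trend is defined only up to a constant):

' recover the residuals from eqreer
eqreer.makeresids u
' initialize the trend at the first observation of the series
smpl @first @first
genr tr=canada
' build the permanent component recursively: tr = tr(-1) + mu + C(1)*u
smpl @first+1 @last
series tr=tr(-1)-0.186+1.37*u
' the cycle is what is left over
smpl @first @last
series cycle=canada-tr
plot canada tr cycle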
The decomposition of a series into a random walk and a stationary component is not unique; BN is one way of doing it, but there are also other ways, such as the Hodrick-Prescott filter, as we will see below.
BN requires:
(i) the correlation between the innovations to the trend and the cyclical components to be −1;
(ii) var(trend) > var(the x series): the permanent component is more volatile than the series, because the negative correlation between the trend and the cyclical component smooths out the x sequence.
We can see this using Example 1 above, where we got
x_t = −[1.1/(0.4(1 − 0.6L))]u_t + (1.5/0.4)z_t
where z_t = u_t/(1 − L) and TR_t = TR_{t−1} + (1.5/0.4)u_t.
Thus
ρ = E[(C_t − E(C_t))(TR_t − E(TR_t))] / √(E[(C_t − E(C_t))²] E[(TR_t − E(TR_t))²])
= E{[−1.1u_t/(0.4(1 − 0.6L))][1.5u_t/(0.4(1 − L))]} / √(E[1.1u_t/(0.4(1 − 0.6L))]² E[1.5u_t/(0.4(1 − L))]²) = −1,
since the two components are driven by the same innovation u_t, with coefficients of opposite sign.
Turning to the variances, compare the innovations. The innovation to the trend is (1.5/0.4)u_t, so, using E(u_{t−i}u_{t−j}) = 0 for i ≠ j,
Var(ΔTR_t) = E[(1.5/0.4)u_t]² = (1.5/0.4)²σ²_u ≈ 14σ²_u.
The observed first difference is Δx_t = [(1 + 0.5L)/(1 − 0.6L)]u_t, an MA(∞) with coefficients c_0 = 1 and c_i = 1.1(0.6)^{i−1} for i ≥ 1, so
Var(Δx_t) = σ²_u Σ_i c_i² = [1 + (1.1)²/(1 − 0.36)]σ²_u ≈ 2.9σ²_u.
Hence σ²_Δx < σ²_ΔTR: the innovations to the permanent component are more volatile than the first differences of the series itself. The gap is absorbed by the cyclical component, whose innovations are perfectly negatively correlated with those of the trend.
If the correlation between the innovations to the trend and to the stationary component is not −1 but, say, 0 (they are uncorrelated), and you nevertheless use a BN decomposition, you impose an incorrect partitioning of the variances on the data. And there is no way to learn the correct identification from the data when you use a univariate decomposition.
2. Hodrick-Prescott Decomposition (Filter)
This is another way of decomposing a series into its trend (permanent) component TR and its cyclical (temporary) component x − TR. It has vast applications in the real business cycle literature. The idea is to minimize variation of the series around the trend (permanent component), subject to a penalty on variation in the growth of the trend:
Min_{TR} Σ_{t=1}^T (x_t − TR_t)² + λ Σ_{t=2}^{T−1} [(TR_{t+1} − TR_t) − (TR_t − TR_{t−1})]².
The penalty parameter λ (in HP's formulation the ratio of the variance of the cyclical component to that of the change in the trend's growth) penalizes variability in the growth component (TR) series. The larger the value of λ, the smoother is the solution series TR.
See Hodrick and Prescott (JMCB 1997) for an application to growth in GNP and the components of aggregate demand (all variables are in natural logs, so changes in TR correspond to growth rates).
Illustration
Workfile bn.wf
Select X, Proc, Hodrick-Prescott Filter. Call the smoothed series hptrendx, with λ = 100.
Compare the results with BN: the HP trend is smoother than the BN trend. As λ falls toward zero, the trend's volatility rises, up to a maximum of the volatility of the actual series x itself (at λ = 0, TR = x).
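The same filter can also be run from the command line; a sketch using the hpf proc (hptrendx and hpcycle are illustrative names):

x.hpf(lambda=100) hptrendx
genr hpcycle=x-hptrendx
plot x hptrendx hpcycle

By convention λ = 1600 is used for quarterly data and λ = 100 for annual data.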
[Figure: left panel, Hodrick-Prescott filter (lambda=100) applied to X, showing X, Trend and Cycle; right panel, the BN decomposition of the same series, showing X, TR and CYCLE, over observations 1-200.]
The two methods give different decompositions. Since we have no prior information on the relationship between the innovations to the trend and to the stationary component, the decomposition of a series into permanent and temporary components is not unique.
The advantage of the HP filter is that it applies the same trend extraction to all the series you use. It is therefore often applied in business cycle analyses, where most series are assumed to share the same stochastic trend. The BN decomposition, by contrast, extracts a different trend from each series. The disadvantage of the HP filter is that, because it smooths the trend, it may introduce spurious volatility into the stationary component of the series.
Which method you want to use depends on the type of identification problem you are analyzing. Theoretical models imply that several time series share common stochastic trends. In empirical methodology this translates into multivariate models. Then we need to see whether we can group series by their common stochastic trends and check whether this makes the stochastic trend disappear: cointegration.
To consider whether different series with stochastic trends are cointegrated, we first need to look at unit roots and unit root tests.
III. Unit Roots and Spurious Regressions
We mentioned that many economic time series are nonstationary. This includes most national income statistics and goods and asset prices. Nonstationarity is usually due to a few causes, and it gets transmitted to all other variables.
• Productivity shocks: affect output and the real side of the economy.
• Monetary policy: if money supply deviates from constant long-run growth, so that the log of money supply is a random walk with drift,
Δln M_t = α + e_t,
then all nominal variables and nominal prices become nonstationary.
• Individual budget constraints:
a_t + c_t = y_t + (1 + r_t)a_{t−1}, where a = assets, c = consumption, y = income, r = interest rate.
Solving for a:
[1 − (1 + r_t)L]a_t = y_t − c_t, or [1 − λL]a_t = y_t − c_t with λ = 1 + r_t.
But λ > 1 (a root greater than one), so the system is unstable (λ = 1 would be a random walk); a is thus I(d).
Implications of nonstationarity in economics and finance
1. Model building:
Nonstationary series are more volatile than stationary series, and series with drift terms tend to be volatile around strong trends.
If the dependent variable is a nonstationary process with drift, the explanatory series must exhibit the same behavior for the estimates to be meaningful. For example, you cannot explain GDP with unemployment, since GDP has a trend while unemployment does not. If the dependent variable is nonstationary, you need at least a subset of the independent variables to be nonstationary as well.
2. Econometrics
If data are nonstationary, the sampling distributions of coefficient estimates may not be well approximated by the normal distribution. This is particularly true if the series have a drift.
Suppose you regress one independent random walk process on another: you may end up with a high R² and significant estimates, but the result is called spurious, and it is meaningless since the two series have no relation (Granger and Newbold, 1974, J. of Econometrics).
Consider:
(8)   x_t = α + βz_t + e_t
The traditional regression methodology requires x_t ~ I(0), z_t ~ I(0), and e_t ~ i.i.d.(0, σ²). If e is nonstationary, the estimates are meaningless because shocks have permanent effects, and thus the regression has permanent errors.
Suppose that x and z are random walks with zero drift (α = 0):
x_t = x_{t−1} + ε_t
z_t = z_{t−1} + ν_t.
Then e_t = x_t − βz_t = Σ_{i=0}^t ε_{t−i} − β Σ_{i=0}^t ν_{t−i}, assuming initial conditions of zero.
Note that the first moment of e is not constant; rather, it follows a martingale, E_t(e_{t+s}) = e_t:
e_{t+1} = x_{t+1} − βz_{t+1} = x_t + ε_{t+1} − β(z_t + ν_{t+1}) = e_t + ε_{t+1} − βν_{t+1},
and iterating gives E_t(e_{t+s}) = e_t for any s. The second moment increases with t, so it is not constant either.
Hence traditional statistical inference methods, such as OLS, are not valid. The problem does not disappear in large samples either, making asymptotic results inapplicable as well.
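A minimal EViews simulation of this point, in the spirit of Granger and Newbold (the workfile and object names are illustrative): generate two independent driftless random walks and regress one on the other in levels. The regression typically delivers a sizable R² and a "significant" slope even though the true relation is zero; the telltale sign is a very low DW statistic.

' undated workfile, two independent random walks
workfile spurious u 1 500
smpl 1 1
genr rw1=0
genr rw2=0
smpl 2 500
series rw1=rw1(-1)+nrnd
series rw2=rw2(-1)+nrnd
' spurious levels regression
smpl 1 500
equation eqspur.ls rw1 c rw2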
Illustration
USUK.wf
Regress the series LCUS on c and LYUK:
EqTable2_2
Dependent Variable: LCUS
Method: Least Squares
Date: 01/02/07 Time: 16:17
Sample (adjusted): 1959Q2 1998Q1
Included observations: 156 after adjustments

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -5.612166     0.162908     -34.44987     0.0000
LYUK        1.208546     0.014643      82.53580     0.0000

R-squared            0.977893    Mean dependent var      7.829120
Adjusted R-squared   0.977750    S.D. dependent var      0.351691
S.E. of regression   0.052460    Akaike info criterion  -3.044783
Sum squared resid    0.423821    Schwarz criterion      -3.005682
Log likelihood       239.4931    F-statistic             6812.158
Durbin-Watson stat   0.138834    Prob(F-statistic)       0.000000
High R² and t-statistics, but meaningless results: a spurious regression. Note also the very low DW statistic.
Check whether the series LCUS and LYUK are AR(1) (equations eqlcus and eqlyuk):
Dependent Variable: LCUS

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C           9.964851     0.595449     16.73503      0.0000
AR(1)       0.996284     0.001022     975.2605      0.0000

R-squared            0.999836    Mean dependent var      7.836301
Adjusted R-squared   0.999835    S.D. dependent var      0.355190
S.E. of regression   0.004563    Akaike info criterion  -7.929080
Sum squared resid    0.003248    Schwarz criterion      -7.890313
Log likelihood       628.3973    F-statistic             951133.0
Durbin-Watson stat   1.366495    Prob(F-statistic)       0.000000
Inverted AR Roots: 1.00

Dependent Variable: LYUK

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C           0.049911     0.049873     1.000764      0.3185
LYUK(-1)    0.996123     0.004486     222.0758      0.0000

R-squared            0.996887    Mean dependent var      11.12186
Adjusted R-squared   0.996867    S.D. dependent var      0.287769
S.E. of regression   0.016108    Akaike info criterion  -5.406311
Sum squared resid    0.039956    Schwarz criterion      -5.367210
Log likelihood       423.6923    F-statistic             49317.67
Durbin-Watson stat   2.296729    Prob(F-statistic)       0.000000
In both equations ρ̂ ≈ 1, and the error terms appear i.i.d. with zero mean and constant (finite) variance; thus both series are close to random walks.
[Figure: residual series RELCUS and RELYUK from the AR(1) equations, plotted from 1960 to 2000; both fluctuate around zero.]
This corresponds to the solution of (8) with a stochastic trend: in (8) we are regressing one trending variable on another trending variable, hence the high R². The residuals from running LCUS on LYUK show that a stochastic trend is still present in the error term, so this is a spurious regression.
[Figure: actual, fitted and residual series from the regression of LCUS on LYUK, 1960-1995; the residuals show long, persistent swings.]
Several cases (Enders):

1. x_t ~ I(0) and y_t ~ I(0): the classical regression model is appropriate.
2. y_t ~ I(d) and x_t ~ I(k), d ≠ k (different orders of integration): the classical regression model is not appropriate; the regression is spurious.
3. y_t ~ I(d) and x_t ~ I(d), and the residual sequence has a stochastic trend: the classical regression model is not appropriate; the regression is spurious.
4. y_t ~ I(d) and x_t ~ I(d), and the residual sequence is stationary: the series are cointegrated.
It is therefore important to check whether the series are nonstationary. Tests such as the Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests have been widely used to check the stationarity of a process.
Dickey-Fuller (DF) test (1979)
Δx_t = α + βt + γx_{t−1} + ε_t, where ε is NID.
But this test is valid only for AR(1) processes. If there is higher-order serial correlation, we need to use the ADF test.
Augmented Dickey-Fuller (ADF) or Said-Dickey test
Consider the following AR(p) equation:
x_t = α + βt + Σ_{i=1}^p φ_i x_{t−i} + e_t.
Assume that there is at most one unit root. We will reparameterize this equation. To see how, consider an AR(2) process (subtract x_{t−1} from both sides, then subtract and add φ_2 x_{t−1} on the RHS):
x_t − x_{t−1} = α + βt − x_{t−1} + φ_1 x_{t−1} + φ_2 x_{t−2} + φ_2 x_{t−1} − φ_2 x_{t−1} + e_t
Δx_t = α + βt − (1 − φ_1 − φ_2)x_{t−1} − φ_2(x_{t−1} − x_{t−2}) + e_t
Δx_t = α + βt + γx_{t−1} + β_1 Δx_{t−1} + e_t.
And for an AR(p):
Δx_t = α + βt + γx_{t−1} + Σ_{i=1}^{p−1} β_i Δx_{t−i} + e_t,
where γ = −(1 − φ_1 − φ_2 − ... − φ_p) and β_i = −(φ_{i+1} + φ_{i+2} + ... + φ_p) for i = 1, 2, ..., p − 1.
This is the augmented version of the DF test (for the DF test, all β_i = 0). It is general enough to be asymptotically valid even when there is an MA component, provided sufficient lagged difference terms are included.
Null hypothesis: γ = 0. Alternative: γ < 0.
If the null is not rejected, then x is nonstationary.
The test is evaluated with the t-ratio: t_γ = γ̂ / se(γ̂).
(Note: for p = 1, γ = −(1 − φ_1), and the test is equivalent to testing whether the root is stable.)
This statistic does not follow the conventional Student's t-distribution. For both tests, critical values were calculated by Dickey and Fuller and depend on whether there is an intercept and/or a deterministic trend, and on whether it is a DF or an ADF test. EViews uses the more recent MacKinnon estimates of the critical values, which are valid for any sample size.
Intercept and deterministic time trend
The inclusion of an intercept and trend depends on the alternative hypothesis.
• If the alternative is that the series is zero-mean stationary, include no intercept. This is unlikely for macroeconomic series.
• If the alternative is constant-mean stationarity, include an intercept. This is a good assumption for time series that do not grow over time.
• If the alternative is that the series is trend stationary, include a constant and a linear trend. This specification fits most macroeconomic time series, which grow over time.
Choice of lag length
1. Sequential t/F-statistics approach: start from a large number of lags and re-estimate, reducing by one lag each time. Stop when the last lag is significant. With quarterly data: look at the t-statistics for the last lag and the F-statistics for the last quarter. Then check that the error term is white noise.
2. Information Criteria: AIC or SBC or HQC
** Note that the ADF test is not robust to the choice of p.
Phillips-Perron (PP) test (1988)
It uses a nonparametric method to correct for serial correlation in ε_t. PP make this adjustment to both the DF and ADF statistics, calling the adjusted tests Z(t) and Z(ρ), respectively; these have the same asymptotic distributions as the DF and ADF tests.
- Advantage: more powerful than the ADF test.
- Disadvantage: more severe size distortions than the ADF test (the size of the test in small samples differs significantly from the size implied by asymptotic theory).
Kwiatkowski, Phillips, Schmidt and Shin (KPSS) test (1992)
Often we model variables assuming that they are stationary. The tests so far have nonstationarity as the null. The KPSS test is one of the first to set the null hypothesis to be that the series is (trend) stationary. It uses an LM statistic, with critical values based on asymptotic results given in KPSS.
Procedure for a unit root test:
1. Decide on the model: whether or not a constant and/or time trend is required. If there is no trend, the alternative hypothesis should be that the process is stationary around a non-zero mean, without trend. If there is a trend, the alternative is that the process is trend stationary with a non-zero constant.
2. Choose the maximum order of lags.
3. If the t-statistic is negative and greater than the critical value in absolute value, reject the null of a unit root. (An EViews sketch follows below.)
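A one-line EViews sketch of this procedure (x is an illustrative series name; the option syntax is my reading of the uroot view, with ADF test, constant and trend, and SIC lag selection up to 8 lags, so check the options against the EViews manual):

x.uroot(adf, trend, info=sic, maxlag=8)

EViews reports the ADF t-statistic next to the MacKinnon critical values, so step 3 reduces to comparing the two.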
Problem with the nonstationarity tests:
The power of the DF and ADF tests is very low, i.e. they fail to reject the null of nonstationarity too often (suggesting a unit root even when there is none).