Local to Unity, Long-Horizon Forecasting Thresholds
for Model Selection in the AR(1)
John L. Turner#
Abstract: The paper develops a framework for analyzing long-horizon forecasting in the AR(1)
model using the local to unity specification of the autoregressive parameter. I report new
asymptotic results for the distributions of the forecast errors when the AR(1) parameters are
estimated with ordinary least squares (OLS), and also for the distributions of Random Walk
(RW) forecasts. There exist functions, relating local to unity “drift” to forecast horizon, such
that OLS and RW forecasts share the same expected square error. RW forecasts are preferred on
one side of these “forecasting thresholds,” while OLS forecasts are preferred on the other. I
identify these forecasting thresholds, use them to develop novel model selection criteria and
show how they help a forecaster reduce error.
JEL Classification: C22
Keywords: forecasting, autoregressive, near-non-stationary, long-horizon
September 2003
# Department of Economics, Terry College of Business, University of Georgia, Brooks Hall 5th Floor, Athens, GA 30602. Tel: 706-542-3682. Fax: 706-542-3376. E-mail: jlturner@terry.uga.edu. I thank Christopher Otrok, Tim Vogelsang, Jonathan Wright, Bill Lastrapes, two anonymous referees and seminar participants at the University of Virginia for their helpful comments.
Time series forecasting of near non-stationary, possibly I(1), autoregressive processes
remains poorly understood despite the frequent use of such models in the econometric and
macroeconomic literature. Research has focused on a choice between two forecasting frameworks: the
VAR framework and the I(1)-specified co-integration framework. However, neither the unit root nor the
forecasting literature has yielded precise theoretical rules for selecting between these two models.
Consider the univariate AR(1) model, for which the I(1) framework specifies the Random
Walk model (RW) and the VAR framework uses a simple ordinary least squares (OLS)
regression to estimate the parameters of the model. It is clear that, in finite samples, RW
forecasts have lower mean-square error (MSE) than OLS forecasts when the autoregressive
parameter is sufficiently close, but perhaps not equal, to unity (see, e.g. Stock (1996) and
Diebold and Kilian (2000)). As a result, RW forecasts can be preferred even where the Random
Walk model itself represents a misspecification of the true time series process.
In such cases, RW produces biased forecasts. Letting the autoregressive parameter shrink
away from unity, this bias increases, driving up the error of RW forecasts (relative to OLS).
Eventually, for sufficiently small values of the autoregressive parameter, the MSE of RW
forecasts becomes equal to, then larger than, the MSE of OLS forecasts.
I term the value of the autoregressive parameter where MSE(RW) = MSE(OLS) a
forecasting “threshold,” since on one side of the threshold RW is a preferred forecasting model,
while on the other side OLS is preferred.
Since the threshold value of the autoregressive
parameter changes with sample size, I appeal to local to unity theory to identify asymptotic
Pitman drift thresholds that are robust to changes in sample size.
The identification of these thresholds is important for three primary reasons. First, it
extends Stock (1996) in a way that allows us to accurately predict the relative performance of
RW and OLS forecasts. Second, it helps explain how downward bias in the OLS estimate of the
autoregressive parameter affects forecasting. Most importantly, though, it reframes the question
of how best to select between RW and OLS. This paper shows that rendering a statistical
judgment as to the presence or absence of a unit root should not be a major concern of a
forecaster.
1. Introduction
Consider the canonical AR(1) model:1
y_t = d_t + u_t
d_t = \mu + \theta t
u_t = \rho u_{t-1} + \varepsilon_t,   \rho \in [-1, 1]                                        (0)
\varepsilon_t \sim iid(0, \sigma^2)
u_0 = 0
While the model is only univariate with a simple innovation structure, it is nonetheless very
important to economists. Theory often predicts that certain macroeconomic aggregates have an
autoregressive structure, and empirical research confirms the first-order autoregressive
tendencies (see, e.g., Hall (1978)). Given this, it is important for economists to understand fully
how best to forecast this model. Additionally, the lessons learned from the AR(1) model can
then be extended in more general VAR and ARMA specifications.
I consider three different versions of (0):
Case I:    \mu = 0, \theta = 0    \Rightarrow    y_t = \rho y_{t-1} + \varepsilon_t                                        (1)

Case II:   \mu \neq 0, \theta = 0    \Rightarrow    y_t = \alpha + \rho y_{t-1} + \varepsilon_t                                        (2)

Case III:  \mu \neq 0, \theta \neq 0    \Rightarrow    y_t = \alpha + \delta t + \rho y_{t-1} + \varepsilon_t                                        (3)

where \alpha = \mu(1 - \rho) + \theta\rho and \delta = \theta(1 - \rho). Cases II and III are widely applicable to forecasting
economic variables. Case I is of mostly theoretical interest but is relatively straightforward to
analyze. I consider it primarily for expository purposes.
Several previous papers have expressed analytically the distribution of forecasts of cases
I and II when the model’s parameters are estimated by least-squares. Box and Jenkins (1970)
derive crude, analytically tractable, approximations for the distribution of AR(1) forecasts for
Case I, while Phillips (1979, Case I), Fuller and Hasza (1981, Case II) and Magnus and Pesaran
(1989, Case II) derive more precise, analytically cumbersome, approximations for the
distributions of these forecasts.
This past research, which employs traditional probability theory, offers considerable
insight into AR(1) forecasts. However, the development and dissemination of local to unity
1 My specification of u_0 = 0 follows Stock (1991), for reasons that will become clear later in the paper.
asymptotic theory has laid the foundation for significantly more powerful analysis of forecasting.
It specifies the autoregressive parameter ρ as being in an O(1/T) neighborhood of unity:
\rho = 1 + \frac{c}{T}                                        (4)
For asymptotic analysis, the drift c is held constant as T approaches infinity.
Using the
specification in (4) and the functional central limit theorem, it is possible to explain why nearly
non-stationary AR processes retain some important properties of I(1) processes.2 Many authors
use this specification to develop unit root tests and median unbiased estimation procedures (Cf.
Bobkoski (1983), Chan and Wei (1987), Phillips (1987), Stock (1991), Elliott et al (1996) and
Elliott (1999)) while Stock (1996), Phillips (1998), Kemp (1999) and Ng and Vogelsang (2002)
use it to analyze forecasting.
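To make the specification concrete, the following sketch (my own illustration, not from the paper) simulates a Case II path under (4); the sample size, drift, and mean below are arbitrary placeholder values.

```python
import numpy as np

def simulate_case_ii(T=200, c=-5.0, mu=1.0, sigma=1.0, seed=0):
    """Simulate Case II of (0): y_t = mu + u_t, u_t = rho*u_{t-1} + eps_t, rho = 1 + c/T."""
    rng = np.random.default_rng(seed)
    rho = 1.0 + c / T                      # local to unity autoregressive parameter (4)
    eps = rng.normal(0.0, sigma, size=T)   # iid Gaussian innovations
    u = np.zeros(T)
    u[0] = eps[0]                          # u_0 = 0, so u_1 = eps_1
    for t in range(1, T):
        u[t] = rho * u[t - 1] + eps[t]
    return mu + u                          # observed series y_1, ..., y_T

y = simulate_case_ii()
```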
Stock (1996), Phillips (1998) and Kemp (1999), who consider Case I, consider both one-step-ahead and “long” horizons.
For the latter case, they treat the forecast horizon
asymptotically as a fraction η of the sample size of the time series (i.e. η = h/T, where h is the
forecast horizon and T is the sample size). Stock (1996) shows that the first-order asymptotic
performance of long-horizon forecasts depends on whether one chooses a VAR or co-integration
framework. An implication of this result is that, as T increases to infinity, OLS and RW
forecasts retain the same dispersion relative to each other if the drift c and the horizon η are held
constant.
This paper adopts both the local to unity and the long-horizon specifications. I extend
Stock (1996) to show that, for a given forecast horizon η, since OLS forecast errors improve
monotonically relative to RW forecasts as c falls, there is a value of c such that the expected
squared forecast errors from these models are asymptotically identical. Specifically, for a given
forecast horizon η, there is a value of drift, c*(η), such that RW and OLS are equally accurate. I
thus say that for a given value of η, c*(η) is the forecasting threshold of the AR(1) model at
forecast horizon η. Then, if c > c*, RW is a preferred model and if c < c*, OLS is preferred.
In this paper, I identify forecasting thresholds for cases I, II and III. Since Stock (1996),
Phillips (1998) and Kemp (1999) each report local to unity, long-horizon asymptotic
2 It also allows a researcher to abstract from some distributional assumptions.
distributions of Case I forecast errors for RW and OLS, I must only apply their theorems to build
the threshold function c*(η) for Case I. I do so in section 3.
For cases II and III, however, much more work is needed. Ng and Vogelsang (2002)
report local to unity asymptotic distributions for one-step-ahead OLS forecasts and prove the
invariance of forecasts to the values of the trend parameters (µ and θ). They do not report long-horizon distributions, however.
I give theorems describing local to unity, long-horizon
asymptotic distributions for Case II and III forecast errors for both RW and OLS in section 2. I
then use them to build forecasting thresholds for cases II and III in section 3.
The identification and analysis of the structure of these forecasting thresholds forms a
body of research capable of speaking to numerous important questions about forecasting and
about the AR(1) in particular. I focus on three points in this paper. First, the failure of OLS to
outperform RW when RW is misspecified (but c is very close to 0) is due, in part, to bias in OLS
estimates of the autoregressive parameter but is due primarily to the additional estimation error
present in OLS forecasts.
This illustrates clearly the fundamental advantage of using
parsimonious models for forecasting.
Second, it is clear that optimal criteria for choosing between RW and OLS are linked to
the forecasting thresholds but not to the presence or absence of a unit root. In fact, the unit root
hypothesis and test are only indirectly related to selecting a forecasting model. Thus, in general,
one should not expect a 5% unit root test to choose optimally between RW and OLS forecasts.
In section 5, I use a forecasting example to show the superiority of a novel model selection
criterion that uses these forecasting thresholds explicitly. Specifically, I propose using median
unbiased estimation of the drift c, interpreted with respect to the forecasting thresholds, to select
the model most likely to minimize forecast error.
This offers substantial improvement in
forecasting over a selection criterion using a 5% unit root test.
Third, although I only present theoretical results for the AR(1), the methods in this paper
are likely to be useful in studying forecasting in near-non-stationary AR processes of larger
order. In section 6, I give some preliminary Monte Carlo results comparing forecasts from an
OLS-estimated AR(q) to forecasts obtained by specifying a unit root and estimating the
parameters of the AR(q) under that restriction. It is clear both that the lessons learned from the
AR(1) help one understand the forecasting of these processes and that much more work is
needed to understand it fully.
The balance of this paper is as follows. In section 2, I derive asymptotic distributions of
the forecast errors for RW and OLS forecasts for cases I-III. In section 3, I use computer
simulation to identify forecasting threshold functions c*(η) for all cases. In section 4, I discuss
the results of sections 2 and 3 in detail. In section 5, I demonstrate the usefulness of the
forecasting threshold functions in a forecasting example. In section 6, I offer some preliminary
results on forecasting thresholds for AR(q) processes with q > 1 and give brief concluding
remarks.
2. Analysis
In this section I derive the asymptotic distributions of forecast errors from the Random
Walk (RW) and OLS-estimated forecasting models. Ultimately, I wish to compare the mean-square error (MSE) of forecasts for these models. To derive limiting distributions for the
forecast errors (Stock’s Theorems and Theorems 1-3), I use local to unity asymptotic theory.
The forecast errors are functions of Ornstein-Uhlenbeck processes and, for the OLS-estimated
model, the limiting distributions are quite complicated. Indeed, closed-form expressions for
squared errors are mathematically intractable.
Recall that the local to unity assumption (equation (4)) specifies the autoregressive
parameter ρ as a function of sample size T. When the drift parameter c is a negative number, ρ is
less than unity, which would seem to imply that the autoregressive process is strictly stationary.
However, for a finite sample size T, autoregressive processes with ρ close to but slightly below
unity retain some evolutionary properties of the non-stationary random walk process (c = 0).
Primarily, the closer c is to zero, the less is the tendency of the time series to mean-revert.
I start with the following definition of terminology.
Definition: Let J_c(r) be defined as follows:

dJ_c(r) = c J_c(r)\,dr + dB(r)                                        (5)

where B(r) = \sigma W(r) and W(r) is a standard Wiener process. This is a standard Ornstein-Uhlenbeck process, and

J_c(r) \sim N\!\left( 0, \; \sigma^2 \left( \frac{e^{2cr} - 1}{2c} \right) \right)                                        (6)
It is well known that, for the data generating process (0):3
\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} \varepsilon_t \;\Rightarrow\; \sigma W(r) = B(r)
\qquad \text{and} \qquad
\frac{u_{[Tr]}}{\sqrt{T}} = \frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} \left( 1 + \frac{c}{T} \right)^{[Tr]-t} \varepsilon_t \;\Rightarrow\; J_c(r)
where [w] denotes the integer part of w for any real w and the symbol “⇒” denotes convergence
in distribution. All of the asymptotic results used in this paper to build forecast error limiting
distributions derive from these two.
The “total error” in a single forecast can be broken down into two parts: “forecast error” and “innovation error.” For the h-period-ahead forecast \bar{y}_{T+h|T}, this “total error” is given as:

\bar{y}_{T+h|T} - y_{T+h} = \left( \bar{y}_{T+h|T} - y_{T+h|T} \right) + \left( y_{T+h|T} - y_{T+h} \right)                                        (7)
where y_{T+h|T} is the (infeasible) correct forecast, using the model’s true parameters. The second term on the right hand side of (7) is the “innovation error”:

y_{T+h|T} - y_{T+h} = \rho^{h-1} \varepsilon_{T+1} + \rho^{h-2} \varepsilon_{T+2} + \dots + \rho\, \varepsilon_{T+h-1} + \varepsilon_{T+h}                                        (8)
Clearly, this error is common to forecasts from both the RW and OLS models. The expected square of this term is straightforward to evaluate:

E\left( y_{T+h|T} - y_{T+h} \right)^2 = \sigma^2 \left( \frac{1 - \rho^{2h}}{1 - \rho^2} \right)
The first term in parentheses on the right-hand side of (7) is the difference between the feasible forecast and the infeasible forecast that would be made if the model’s parameters were known with certainty. For OLS and RW, respectively, we define the “forecast errors” for Case K:

\bar{S}^{K}_{T+h|T} = \bar{y}^{OLS(K)}_{T+h|T} - y_{T+h|T}                                        (9)

\bar{R}^{K}_{T+h|T} = \bar{y}^{RW(K)}_{T+h|T} - y_{T+h|T}                                        (10)
My use of the term “forecast error” is the same as that in Ng and Vogelsang (2002). Clearly, if a
forecast has a lower expected squared forecast error, it has lower expected squared total error.
3 See, e.g., Phillips (1987).
Case I
For Case I OLS forecasts, \rho is estimated using a least squares regression of \{y_t\}_{t=1}^{T} on \{y_{t-1}\}_{t=1}^{T}. The forecast is then:

\bar{y}^{OLS(I)}_{T+h|T} = (\bar{\rho})^h y_T                                        (11)

where the bar over \rho denotes the OLS estimate. For forecasts from this model, the “forecast error” is thus given:

\bar{S}^{I}_{T+h|T} = \bar{y}^{OLS(I)}_{T+h|T} - y_{T+h|T} = \left( (\bar{\rho})^h - \rho^h \right) y_T
The following result was first reported by Stock (1996). It is also a special case of a theorem due
to Kemp (1999) and is similar to a result in Phillips (1998).
Theorem (Stock (1996))^4: Let the data be generated by (1). Let \bar{S}^{I}_{T+\eta T|T} be the forecast error for the OLS-estimated forecast (equation (11)) for horizon \eta = h/T. Then

\frac{\bar{S}^{I}_{T+\eta T|T}}{\sqrt{T}} \;\Rightarrow\; e^{\eta c} \left( e^{\eta(\bar{c} - c)} - 1 \right) J_c(1)

where:

(\bar{c} - c) = \lim_{T \to \infty} T(\bar{\rho} - \rho) = \frac{\int_0^1 J_c(r)\,dB(r)}{\int_0^1 J_c(r)^2\,dr}

The asymptotic distribution of (\bar{c} - c) is non-normal and non-central. Its moments depend upon c only.

4 This is equivalent to equation (4) from Stock (1996), p. 689.

For cases I and II, the RW model imposes \rho = 1 and forecasts all future values of the time series by using the final available value of the time series:

\bar{y}^{RW(I,II)}_{T+h|T} = y_T                                        (12)

The forecast error is thus defined as follows:

\bar{R}^{I,II}_{T+h|T} = \bar{y}^{RW(I,II)}_{T+h|T} - y_{T+h|T} = \left( 1 - \rho^h \right) y_T

Theorem (Stock (1996))^5: Let the data be generated by (1) or (2). Let \bar{R}^{I,II}_{T+\eta T|T} be the forecast error for the random walk forecast (equation (12)) for horizon \eta = h/T. Then

\frac{\bar{R}^{I,II}_{T+\eta T|T}}{\sqrt{T}} \;\Rightarrow\; \left( 1 - e^{\eta c} \right) J_c(1)

5 This is equivalent to equation (6) from Stock (1996), p. 689.
Note that no estimation error is present in the forecast error. All potential error is due to
bias and if c = 0, RW is correctly specified and the forecast error is zero.
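For concreteness, here is a minimal sketch (my own, under the paper's definitions, not code from the paper) of the two competing Case I forecasts, (11) and (12), and the corresponding forecast errors measured against the infeasible forecast \rho^h y_T.

```python
import numpy as np

def case_i_forecast_errors(y, rho_true, h):
    """Case I: OLS forecast (11), RW forecast (12), and their forecast errors
    relative to the infeasible forecast rho^h * y_T."""
    y = np.asarray(y, dtype=float)
    # OLS estimate of rho: regress y_t on y_{t-1} with no deterministic terms
    rho_hat = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
    yT = y[-1]
    ols = rho_hat ** h * yT          # equation (11)
    rw = yT                          # equation (12)
    infeasible = rho_true ** h * yT  # forecast using the true parameter
    return ols - infeasible, rw - infeasible   # the "forecast errors" of (9)-(10)
```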
Case II
For Case II OLS forecasts, \rho and \alpha are estimated using a least-squares regression of \{y_t\}_{t=1}^{T} on \{y_{t-1}\}_{t=1}^{T} and \{1, 1, \dots, 1\}'. The forecast is then:

\bar{y}^{OLS(II)}_{T+h|T} = \bar{\alpha} \left( \frac{1 - (\bar{\rho})^h}{1 - \bar{\rho}} \right) + (\bar{\rho})^h y_T                                        (13)

The construction of the forecast error \bar{S}^{II}_{T+h|T} is analogous to \bar{S}^{I}_{T+h|T}. I now give the following theorem.

Theorem 1: Let the data be generated by (2). Let \bar{S}^{II}_{T+\eta T|T} be the forecast error for the forecast from the OLS-estimated model (equation (13)) for horizon \eta = h/T. Then

\frac{\bar{S}^{II}_{T+\eta T|T}}{\sqrt{T}} \;\Rightarrow\; \left( \frac{1 - e^{\eta \bar{c}}}{-\bar{c}} \right) \left( B(1) - (\bar{c} - c) \int_0^1 J_c(r)\,dr \right) + e^{\eta c} \left( e^{\eta(\bar{c} - c)} - 1 \right) J_c(1)

where:

(\bar{c} - c) = \frac{\int_0^1 J_c(r)\,dB(r) - B(1) \int_0^1 J_c(r)\,dr}{\int_0^1 J_c(r)^2\,dr - \left( \int_0^1 J_c(r)\,dr \right)^2}
Proof: See Appendix
This limiting distribution appears to include more error than the limiting distribution for the Case
I OLS forecast error. Since the RW model uses the same forecast for Case II as in Case I, it is to
be expected that it should perform even better against the OLS-estimated model for Case II.
Case III
For Case III, the form of RW is known as the Random Walk with Drift (RWD) and collapses to:

y_t = \theta + y_{t-1} + \varepsilon_t                                        (14)

To forecast h steps ahead with this model, a researcher estimates \theta (by taking the mean of the first differences of the time series) and then uses the forecast:

\bar{y}^{RW(III)}_{T+h|T} = \bar{\theta} h + y_T                                        (15)

where the bar over \theta denotes the estimated value.

Theorem 2: Let the data be generated by (3). Let \bar{R}^{III}_{T+\eta T|T} be the forecast error for the random walk (with drift) forecast (equation (15)) for horizon \eta = h/T. Then

\frac{\bar{R}^{III}_{T+\eta T|T}}{\sqrt{T}} \;\Rightarrow\; \left( 1 + \eta - e^{\eta c} \right) J_c(1)
Proof: See Appendix
Note that, whenever η > 0, the expected square of the above term will be larger than for
the random walk without drift (cases I and II). This additional error stems exclusively from the
error in estimating the drift θ.
For Case III OLS forecasts, \rho, \alpha and \delta are estimated using a least squares regression of \{y_t\}_{t=1}^{T} on \{y_{t-1}\}_{t=1}^{T}, \{1, 1, \dots, 1\}' and \{1, 2, \dots, T\}'. The forecast is then:

\bar{y}^{OLS(III)}_{T+h|T} = \left( \bar{\alpha} + \bar{\delta} T \right) \left( \frac{1 - (\bar{\rho})^h}{1 - \bar{\rho}} \right) + \bar{\delta} \sum_{j=1}^{h} (h - j + 1)(\bar{\rho})^{j-1} + (\bar{\rho})^h y_T                                        (16)
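As a quick illustration of how (15) and (16) are built in practice (a sketch of my own, not code from the paper; it assumes the OLS estimate of \rho is not exactly one, so the geometric term in (16) is well defined):

```python
import numpy as np

def case_iii_forecasts(y, h):
    """Case III h-step forecasts: random walk with drift (15) and OLS with trend (16)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # RWD: estimate theta as the mean first difference, then extrapolate from y_T
    theta_hat = np.mean(np.diff(y))
    rwd = theta_hat * h + y[-1]                                   # equation (15)
    # OLS: regress y_t on an intercept, a time trend, and y_{t-1}
    X = np.column_stack([np.ones(T - 1), np.arange(2, T + 1), y[:-1]])
    alpha_hat, delta_hat, rho_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    level = (alpha_hat + delta_hat * T) * (1 - rho_hat ** h) / (1 - rho_hat)
    trend = delta_hat * sum((h - j + 1) * rho_hat ** (j - 1) for j in range(1, h + 1))
    ols = level + trend + rho_hat ** h * y[-1]                    # equation (16)
    return rwd, ols
```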
The following theorem gives the asymptotic distribution for the forecast errors.
Theorem 3: Let the data be generated by (3). Let \bar{S}^{III}_{T+\eta T|T} be the forecast error for the forecast from the OLS-estimated model (equation (16)) for horizon \eta = h/T. Then

\frac{\bar{S}^{III}_{T+\eta T|T}}{\sqrt{T}} \;\Rightarrow\; \left\{ \left( \frac{e^{\eta \bar{c}} - \eta \bar{c} - 1}{\bar{c}^{\,2}} \right) \left[ -\int_0^1 (6 - 12r)\,dB(r) + (\bar{c} - c) \left( 6 \int_0^1 J_c(r)\,dr - 12 \int_0^1 r J_c(r)\,dr \right) \right] \right.

\left. + \left( \frac{1 - e^{\eta \bar{c}}}{-\bar{c}} \right) \left[ -\int_0^1 (2 - 6r)\,dB(r) + (\bar{c} - c) \left( 2 \int_0^1 J_c(r)\,dr - 6 \int_0^1 r J_c(r)\,dr \right) \right] \right\} + e^{\eta c} \left( e^{\eta(\bar{c} - c)} - 1 \right) J_c(1)

where:
(\bar{c} - c) = \frac{ B(1) \left( 6 \int_0^1 r J_c(r)\,dr - 4 \int_0^1 J_c(r)\,dr \right) + \int_0^1 J_c(r)\,dB(r) + \int_0^1 r\,dB(r) \left( 6 \int_0^1 J_c(r)\,dr - 12 \int_0^1 r J_c(r)\,dr \right) }{ \int_0^1 J_c(r)^2\,dr - 4 \left( \int_0^1 J_c(r)\,dr \right)^2 + 12 \left( \int_0^1 J_c(r)\,dr \right) \left( \int_0^1 r J_c(r)\,dr \right) - 12 \left( \int_0^1 r J_c(r)\,dr \right)^2 }
Proof: See Appendix
3. Simulation of Forecasting Thresholds
In this section, I identify forecasting thresholds for the RW/OLS decision in two ways:
(1) “predicted” thresholds using the formulas in the Theorems of section 2; and (2) “actual”
thresholds using h-period-ahead forecasts from RW and OLS models. I rely upon simulation; for
each Monte Carlo trial I use Gaussian errors for \{\varepsilon_t\}_{t=1}^{T}, with T = 1000, in the DGP given in (0).^6
I consider the forecasting horizons η = {.001, .01, .05, .1, .2, .3, .4} and drift values c ∈ [-12, 0],
and use a grid search to determine the threshold values of drift c* as a function of horizon η. For
each (c,η) pair I use 50,000 trials for precise estimates of the expected squared forecast error
from RW and OLS. I then use linear interpolation to estimate the function c*(η) where the
expected squared forecast errors are identical for these two models. The estimated functions for
cases I, II and III are shown in Figures 1, 2 and 3 respectively.
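A schematic version of the “actual” threshold search for Case I is sketched below (my own simplification, not the paper's code: the trial count and drift grid are small placeholders, far below the 50,000 trials used in the paper).

```python
import numpy as np

def mse_ratio(c, eta, T=1000, trials=2000, seed=0):
    """Monte Carlo estimate of MSE(RW)/MSE(OLS) for Case I forecast errors at horizon h = eta*T."""
    rng = np.random.default_rng(seed)
    rho, h = 1.0 + c / T, max(1, int(round(eta * T)))
    se_rw = se_ols = 0.0
    for _ in range(trials):
        eps = rng.standard_normal(T)
        u = np.zeros(T)
        u[0] = eps[0]
        for t in range(1, T):
            u[t] = rho * u[t - 1] + eps[t]
        rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
        infeasible = rho ** h * u[-1]
        se_rw += (u[-1] - infeasible) ** 2                  # RW forecast error, eq. (12)
        se_ols += (rho_hat ** h * u[-1] - infeasible) ** 2  # OLS forecast error, eq. (11)
    return se_rw / se_ols

def threshold(eta, c_grid=np.arange(0.0, -12.5, -0.5)):
    """Linearly interpolate c*(eta), the drift at which MSE(RW)/MSE(OLS) crosses 1."""
    ratios = np.array([mse_ratio(c, eta) for c in c_grid])
    for i in range(1, len(c_grid)):
        if (ratios[i - 1] - 1.0) * (ratios[i] - 1.0) <= 0.0:
            w = (1.0 - ratios[i - 1]) / (ratios[i] - ratios[i - 1])
            return c_grid[i - 1] + w * (c_grid[i] - c_grid[i - 1])
    return None  # no crossing found on the grid
```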
I also compute actual forecasts from the RW and OLS models for η = {.01, .05, .1, .2, .4}
for a typical empirical sample size for post-war quarterly US macroeconomic aggregates (T =
200). I compare the mean-squared error of forecasting, which includes both the forecast error
and the innovation error, for these two models, to describe the relative difference in forecasting
efficiency. For simplicity, I report simulated measures of:
R(c, \eta) = \frac{MSE(RW, c, \eta)}{MSE(OLS, c, \eta)}                                        (17)
for each c and η. Thus, if this value is below 1, RW is a preferred forecasting model.
Figure 1 and Tables 1a and 1b give the results for Case I. The region above the curves in
Figure 1 is where RW forecasts have lower forecast error (and thus lower MSE) than OLS
forecasts. In the graph, the drift c is plotted against the forecast horizon fraction η. The
difference between the predicted and actual thresholds is small, indicating that the asymptotic
approximations given in Stock’s Theorems are very accurate. Note that, for each forecast
horizon, RW is always preferred for c very close to 0, and OLS is preferred for c far from 0. For
instance, at η = .001, so that h = 1, the threshold c* ≈ -2.8. Thus, RW is preferred whenever ρ ∈ (.9972, 1] and OLS is preferred whenever ρ < .9972. However, the “threshold” value of c, where RW and OLS have the same MSE, changes with the horizon η. The simulated function c*(η) achieves a minimum at η = .001, and rises with η until η = .3, then it declines again.

6 Simulation of the “predicted” forecast errors, using the theorems of section 2, consists of simply building partial sums that converge to the various pieces of the asymptotic distributions and then combining them according to the theorems. For example, to simulate \int_0^1 J_c(r)\,dr, use \frac{1}{T^{3/2}} \sum_{t=1}^{T} u_t. For simulation of “actual” forecast errors, for each trial I build the h-period-ahead forecast \bar{y}_{T+h|T}, compare it to the actual h-period-ahead value y_{T+h|T}, and compute the square. Averaging over all trials gives the expected square.
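Following the partial-sum recipe in footnote 6, a single draw from the Case I limiting distributions can be approximated as in the sketch below (my own illustration; the choice T = 1000 mirrors the text).

```python
import numpy as np

def case_i_limit_draw(c, eta, T=1000, rng=None):
    """One approximate draw from the Case I limits: (1 - e^{eta c}) J_c(1) for RW and
    e^{eta c}(e^{eta(cbar - c)} - 1) J_c(1) for OLS, built from partial sums."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(T)
    u = np.zeros(T)
    u[0] = eps[0]
    for t in range(1, T):
        u[t] = (1.0 + c / T) * u[t - 1] + eps[t]
    J1 = u[-1] / np.sqrt(T)                      # approximates J_c(1)
    num = np.sum(u[:-1] * eps[1:]) / T           # approximates int_0^1 J_c dB
    den = np.sum(u[:-1] ** 2) / T ** 2           # approximates int_0^1 J_c^2 dr
    cbar_minus_c = num / den                     # limit of T*(rho_bar - rho)
    rw = (1.0 - np.exp(eta * c)) * J1
    ols = np.exp(eta * c) * (np.exp(eta * cbar_minus_c) - 1.0) * J1
    return rw, ols
```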
In Table 1a, I report the threshold functions c*(η), which gives the numbers used to build
the graphs in Figure 1. Note the “predicted” and “actual” values of c*(η) never differ by more
than .04. In Table 1b, I report values of R(c,η) for selected c and η. At c = -1, η = .05, for
instance, R = .95, so that RW has 5% lower MSE than OLS. This ratio decreases with the
forecast horizon whenever RW is clearly preferred (c = 0 or c = -1), and increases with the
forecast horizon whenever OLS is clearly preferred (c ≤ -3). Hence, the expected loss from
choosing the wrong forecasting model is greater at longer forecasting horizons. When c = -2,
RW is preferred at some horizons, while OLS is preferred at others.
For Case II, c*(η) is shown in Figure 2. This function shares several features with that of
Case I. Again c*(η) achieves a minimum at η = .001 and grows with η. In this case, however,
there is no peak; the function achieves its largest value at η = .4. Note that the function is
beneath that of Case I at all forecast horizons. As expected, RW has a greater advantage against
this model.
Next consider Tables 2a and 2b. Again the “predicted” and “actual” values of c*(η) are
close, never differing by more than .09. Note that the loss from using OLS, when c is very close
to zero, is greater here than in Case I. For instance, we report in Table 2b that R(-1,.05) = .92,
for a difference of 8%, as compared to a 5% difference in Case I. Again, the expected loss from
choosing the wrong forecasting model is greater at longer forecasting horizons.
For Case III, c*(η), as shown in Figure 3, is well below the Case I and Case II functions.
It, too, achieves a minimum at η = .001 and is concave, but peaks at η = .2. In Table 3a, the
“predicted” and “actual” values of c*(η) are again close. In Table 3b, the loss from using OLS
when c is very close to zero is shown to be greater than in cases I and II. For instance, R(-2, .05) = .88, for a difference in MSE of 12%. The expected loss from choosing the wrong
forecasting model is usually magnified at longer forecasting horizons, although not between η =
.2 and η = .4 when c = -7, -8, -9 and -10, where c*(η) is decreasing in η.
4. The Importance of Downward Bias
Intuitively, RW forecasts should be preferred when c is very close to 0. There are fewer
parameters to estimate and bias due to misspecification is also small. It should not be surprising,
then, that there are even theoretical cases where RW is preferred to OLS when the autoregressive
parameter is known with certainty.7 However, downward bias in the OLS estimate of the
autoregressive parameter, a well-known phenomenon, does matter.8 In fact, this largely explains
why RW forecasts are preferred more often at short horizons in my analysis.
Consider a version of Case I where c < 0, so that ρ < 1. The RW error is (1 - \rho^h)u_T, which clearly increases monotonically in h. For most values of \bar{\rho} < 1, usually the case for the OLS estimate, its error, ((\bar{\rho})^h - \rho^h)u_T, also grows in absolute value with h.^9 But even if it grows, it does so more slowly than does (1 - \rho^h)u_T. Thus, OLS errors typically improve, relative to RW, as h increases.^10

Because of downward bias, ((\bar{\rho})^h - \rho^h)u_T will frequently be larger in absolute value than (1 - \rho^h)u_T when c is close to 0. If \bar{\rho} is below ρ, the ill effects of this bias will be most damaging, relative to RW, when h = 1. Thus, holding c constant, the relative improvement of OLS forecasts as η increases, as illustrated by the positive slope of the forecasting thresholds in Figures 1-3, is magnified by downward bias in the OLS estimate of ρ.^11

7 In Case II, for instance, if OLS must estimate α but not ρ, RW is preferred at all forecast horizons whenever -2 < c < 0. This is simple to show analytically.

8 Phillips (1979) gives an excellent discussion of downward bias. For Case II, an oft-cited (crude) approximate formula for downward bias in the OLS estimate of ρ is E(\bar{\rho} - \rho) \cong -T^{-1}(1 + 3\rho) (e.g., Mark (1995)).

9 It will only shrink initially if \bar{\rho} is near 0.

10 For instance, suppose ρ = .98. If \bar{\rho} = .96 is the OLS estimate, then at h = 1 the term (\bar{\rho} - .98)u_T is the same in absolute value (.02 u_T) for the RW as for the OLS model. At h = 4, however, this term is (1 - .922)u_T = .078 u_T in the RW model but only (.849 - .922)u_T = -.073 u_T in the OLS model. At h = 20, this term is (1 - .667)u_T = .333 u_T in the RW model, but only (.442 - .667)u_T = -.225 u_T in the OLS model. Thus, it is intuitive that the OLS model should improve, relative to RW, as the forecast horizon increases. Since the term ((\bar{\rho})^h - \rho^h)u_T appears in the forecast errors for cases I, II and III, this effect holds in all three cases.

11 While the threshold functions for cases I and II become nearly flat after η ≥ .2, the function for Case III has a pronounced peak at η = .2 and declines significantly as the horizon increases further. This appears to be due to the need to estimate the trend in the OLS model. In a hypothetical version of this case where ρ is known with certainty, it can be shown that the threshold is near c* ≈ -5.9 at η ≅ 0 and declines monotonically with η, reaching c* ≈ -8.5 at η = .4. In Figure 3, as η increases from 0 to .2, this downward pressure on the threshold function is dominated by the upward pressure on ((\bar{\rho})^h - \rho^h)u_T described earlier in this section. After η reaches .2, however, the forecasting threshold does indeed decline.
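As a numerical check on the bias being discussed (my own illustration, not from the paper), one can compare the simulated mean bias of the Case II OLS estimate with the approximation E(\bar{\rho} - \rho) \cong -T^{-1}(1 + 3\rho) cited in footnote 8; the parameter values below are placeholders.

```python
import numpy as np

def mean_bias(rho=0.98, T=200, trials=5000, seed=1):
    """Simulated mean bias of the Case II OLS estimate of rho versus the crude approximation."""
    rng = np.random.default_rng(seed)
    biases = np.empty(trials)
    for m in range(trials):
        eps = rng.standard_normal(T)
        u = np.zeros(T)
        u[0] = eps[0]
        for t in range(1, T):
            u[t] = rho * u[t - 1] + eps[t]
        y = 1.0 + u                                    # Case II with mu = 1
        X = np.column_stack([np.ones(T - 1), y[:-1]])  # regress y_t on (1, y_{t-1})
        rho_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0][1]
        biases[m] = rho_hat - rho
    return biases.mean(), -(1.0 + 3.0 * rho) / T       # simulated vs. approximate bias

print(mean_bias())
```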
5. Selecting Between RW and OLS
The predicted thresholds in this paper help inform the decision between RW and OLS
forecasts. As was mentioned in the introduction to this paper, if one had strong evidence that ρ
is very close to or equal to 1, RW would be a good choice for a forecasting model. Unit root
tests can provide such evidence.
Recent research has investigated forecasting strategies of the model in (0) that use unit
root tests to decide whether or not to specify a RW model. Stock (1990), Campbell and Perron
(1991), Stock (1996) and Diebold and Kilian (2000) have investigated the Monte Carlo accuracy
of these “pretest” strategies across a considerable range of parameterizations of this model, and
found that these strategies have some obvious value relative to uniform strategies that either
always or never use RW. In all cases, the size of pretests chosen has been either 5% or 10%.
Sizing pretests according to conventional hypothesis test sizes is due to two facts: (1) that
5% and 10% are generally accepted thresholds for statistical significance in hypothesis testing;
and (2) the critical values are readily available for these sizes. As Diebold and Kilian (2000)
point out, however, these sizes satisfy no criterion of optimality for forecasting. Indeed, as this
paper has shown, the unit root hypothesis is far stricter than a hypothesis that says that RW will
be a preferred forecasting model. In fact, what matters is whether c is above or below c*(η).
Unfortunately, c is not consistently estimable in a univariate framework. Thus, it is
impossible to test, using traditional asymptotic theory, whether c is above or below c*(η).
However, Stock (1991) shows that it is possible to find estimates of c that are median unbiased.
These values can be used to predict, for a given time series, which side of c*(η) the true c most
likely falls on. Hence, it is possible to choose the forecast that is most likely to minimize the forecast error.
We now test the usefulness of this proposition. Figure 4 shows the relative performance
of two intuitive strategies in forecasting Case II of the AR(1). One strategy uses the traditional
statistical criterion, employing a 5% Dickey-Fuller test and choosing OLS only if the DF test
rejects the unit root null.
The second strategy obtains a median unbiased estimate of drift c, then compares this
value to the appropriate forecasting threshold.
OLS is used if this estimate is below the
threshold. This consists of first building the Dickey-Fuller t-statistic:
\bar{\tau} = \frac{\bar{\rho} - 1}{\bar{\sigma}_{\bar{\rho}}}                                        (18)

where \bar{\sigma}_{\bar{\rho}} is the usual OLS standard error. A median-unbiased estimate of c can be found for \bar{\tau} using the tables in Stock (1991). For forecast horizon η, if that estimate is below c*(η), OLS is used. This decision rule essentially produces a critical value for the t-statistic (\bar{\tau}^*) that is different from the 5% Dickey-Fuller critical value of -2.86. I report the critical \bar{\tau}^* values for Cases II and III beneath Figure 4.
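A sketch of Strategy 2's decision rule for Case II is given below (my own code, not the paper's; it uses the Case II \bar{\tau}^* values reported beneath Figure 4 and matches the horizon to the nearest tabulated value, rather than reproducing Stock's (1991) median-unbiased tables directly).

```python
import numpy as np

# Case II critical values tau_bar*(eta) reported beneath Figure 4
TAU_STAR_CASE_II = {0.001: -2.13, 0.01: -2.09, 0.05: -1.99,
                    0.1: -1.92, 0.2: -1.87, 0.3: -1.85, 0.4: -1.85}

def choose_model_case_ii(y, h):
    """Strategy 2 for Case II: compute the Dickey-Fuller t-statistic (18) and
    use OLS only if it lies below the threshold-based critical value tau_bar*(h/T)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    X = np.column_stack([np.ones(T - 1), y[:-1]])        # regress y_t on (1, y_{t-1})
    coef = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    resid = y[1:] - X @ coef
    s2 = resid @ resid / (T - 1 - 2)                     # residual variance
    se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])  # usual OLS standard error of rho
    tau = (coef[1] - 1.0) / se_rho                       # equation (18)
    eta = h / T
    eta_key = min(TAU_STAR_CASE_II, key=lambda e: abs(e - eta))
    return "OLS" if tau < TAU_STAR_CASE_II[eta_key] else "RW"
```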
I employ a Monte Carlo simulation to estimate the MSE of each strategy.^12 The surface in Figure 4 measures the ratio of MSEs of these two strategies:

R^*(c, \eta) = \frac{MSE(\text{Strategy 2}, c, \eta)}{MSE(\text{Strategy 1}, c, \eta)}                                        (19)
Whenever the surface in Figure 4 is above 1, Strategy 1 has lower MSE than our proposed
strategy 2, which uses forecasting thresholds and median-unbiased estimation. This only occurs
when c is larger than –4, and in these cases R* is never more than 1.05. When c is less than –4,
on the other hand, Strategy 2 is dominant and by a larger amount. R* sinks to .81 when c = -13,
η = .1, and reaches .7 at η = .4. This is an extremely significant gain in forecast accuracy by
Strategy 2. For example, consider an AR(1) with a non-zero mean and true ρ = .935. If one had
a time series of length 200 generated by this process, Strategy 2 would reduce the MSE of 20-step-ahead forecasts by nearly 20%.
12 T = 200; number of trials = 20,000.
Clearly, Strategy 2 has an advantage over Strategy 1 if there is any reason to believe that c
is less than the forecasting threshold. Moreover, it is simple to employ, and amounts essentially
to using a more powerful DF test. In sum, the proper use of unit root tests to select forecasting
models must account for the fact that the RW model can be a better forecasting model even if
there is no unit root. Thus, there is no argument for generally using unit root tests, with the
traditional 5% and 10% statistical criteria, as pretests to select forecasting models. Indeed, the
selection criterion introduced here represents an obvious improvement.
6. Thresholds for the AR(q) with q > 1
In forecasting an empirical time series with autoregressive tendencies, a researcher will
seldom know with absolute certainty whether the process is in fact an AR(1). A typical method
for handling this problem is to treat the series as an AR(q) and estimate q directly. Hence, it is
an important practical matter to ask how the thresholds in this paper would change if q > 1. For
case II, then, I specify the time series as follows:
y_t = \alpha + \sum_{i=1}^{q} \rho_i y_{t-i} + \varepsilon_t                                        (20)

where:

\alpha = \mu \left( 1 - \sum_{i=1}^{q} \rho_i \right)

I refer to forecasts built from estimating (20) as OLS forecasts. Under the AR(q) assumption, the series is stationary whenever the persistence of the time series, as defined by the sum of the autoregressive parameters, is less than one in absolute value.^13 Here, then, I model the persistence as local-to-unity:

\sum_{i=1}^{q} \rho_i = 1 + \frac{c}{T}                                        (21)
13 I also assume that, with the exception of the c = 0 case, the system is stable, in the sense that all eigenvalues are less than one in modulus.
Note that if c = 0, RW is no longer necessarily the correctly specified model when q > 1.^14 In this case, however, since it will always be true that the sum of the autoregressive coefficients is 1, there exists a restriction of the series in (20) that is a natural qth-order analog to RW:

y_t = \rho_1 y_{t-1} + \rho_2 y_{t-2} + \dots + \left( 1 - \sum_{i=1}^{q-1} \rho_i \right) y_{t-q} + \varepsilon_t                                        (22)

As is the case with RW, there is no need to estimate the parameter α that is present in (20). In addition, there is one less \rho_i parameter to estimate. Thus, this unit root (UR) specified AR(q) estimates two fewer parameters than the full specification of (20). This is the same difference as between RW and OLS in the AR(1) case.

14 RW is correctly specified only when c = 0 and \rho_2 = 0. Even with this additional reason that RW may be misspecified, it may nonetheless continue to perform well as a forecasting model. We discuss this in more detail later in this section.
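One simple way to estimate (22) by least squares (a sketch of my own; the paper does not spell out an estimation routine) is to note that (22) is equivalent to regressing y_t - y_{t-q} on the differences y_{t-i} - y_{t-q}, i = 1, ..., q-1, with no intercept.

```python
import numpy as np

def fit_ur_restricted_ar(y, q):
    """Estimate (22), an AR(q) whose coefficients are restricted to sum to one (assumes q >= 2).
    Equivalent to regressing (y_t - y_{t-q}) on (y_{t-i} - y_{t-q}), i = 1..q-1, no intercept."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    lhs = y[q:] - y[:T - q]                                      # y_t - y_{t-q}
    rhs = np.column_stack([y[q - i:T - i] - y[:T - q] for i in range(1, q)])
    rho = np.linalg.lstsq(rhs, lhs, rcond=None)[0]               # rho_1, ..., rho_{q-1}
    return np.append(rho, 1.0 - rho.sum())                       # last coefficient imposed
```

Forecasts from the UR-specified model are then built by iterating (22) forward from the last q observed values, just as OLS forecasts iterate the unrestricted (20).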
This suggests that the forecasting thresholds for the AR(q) may share some features of
the AR(1) thresholds. As we now demonstrate in a Monte Carlo analysis, if the true process is
actually an AR(1) but a higher order AR(q) is used, the forecasting thresholds are very similar to
the AR(1) thresholds. Figure 5 shows estimated threshold functions for the AR(1), AR(2) and
AR(5) cases together. The levels of the estimated functions are very similar, although the shape
of the AR(2) and AR(5) thresholds is different at very short horizons. While the AR(1) threshold
is at its minimum at η = .001, the AR(2) and AR(5) thresholds fall from η = .001 to η = .01.
When q > 1, the fact that the persistence of the series does not uniquely determine the
autoregressive parameters affects the thresholds. In the AR(2), for instance, as \rho_2 moves,
holding c constant, the thresholds move as well. This is demonstrated in Figure 6, which shows
the estimated AR(2) threshold from Figure 5 along with two additional specifications of \rho_2: -.1
and .1. All threshold functions have short-horizon features that are different from those in the
AR(1) thresholds. At most horizons, the thresholds for \rho_2 = -.1 favor OLS at more values of c
than in the AR(1), while the thresholds for \rho_2 = .1 favor UR at more values of c. The shapes of
these functions are similar, but the levels are clearly different at long horizons, more so than the
threshold functions in Figure 5.
We form two important but preliminary conclusions from these results. First, it appears
that if the autoregressive process is local to unity, the first autoregressive parameter is very close
to unity, and the other autoregressive parameters are very close to zero, then the AR(1) thresholds
from this paper provide excellent approximations to the true AR(q) thresholds. That is, if the true
series is very nearly an AR(1) but an AR(5) is used for forecasting, the AR(1) thresholds
accurately frame the choice between OLS and UR.
Second, the factors affecting forecasting for higher-order AR processes are potentially
many and complex. Clearly, it remains to explain the short-horizon dip in the AR(2) and AR(5)
threshold functions. Moreover, although the AR(1) thresholds appear to be quite reasonable
approximations for AR(2) thresholds when ρ 2 is close to but not equal to zero, both the
magnitude of ρ 2 and its sign clearly affect the thresholds. This is not surprising, given that the
eigenvalues of the system, which determine its dynamics, are not uniquely determined by the
local-to-unity persistence.15
Thus, in order to more fully understand these results, challenging theoretical work may
be necessary. However, such research is likely to lead to a far greater understanding of the
forecasting of near non-stationary AR processes. Given the prevalence of such processes in
studies of empirical macroeconomic and financial data, this research is likely to be productive.
It will also be interesting to investigate the forecasting performance of RW relative to the
OLS-estimated AR(q). It is well established that, in forecasting persistent economic and
financial time series, RW is very difficult to beat. If the advantages of RW forecasts extend
theoretically to near-non-stationary AR(q) processes, such research may help explain why.
15 For instance, Hamilton (1994, pp. 13-18) shows that, in the AR(2), the dynamic multiplier follows different patterns depending upon the eigenvalues of the system. Namely, if the eigenvalues are real and of less than unit modulus, which is true whenever \rho_2 and \rho_1 are both greater than zero, the dynamic multiplier follows a pattern of geometric decay. If they are complex and of less than unit modulus, which may hold if \rho_2 is below zero, the dynamic multiplier follows a pattern of damped oscillation.
7. References
Bobkoski, M.J. 1983. Hypothesis testing in non-stationary time series. Unpublished PhD dissertation,
Department of Statistics, University of Wisconsin.
Campbell, John Y. and Pierre Perron. 1991. Pitfalls and opportunities: what macroeconomists should
know about unit roots. In O. Blanchard and S. Fischer (eds.), NBER Macroeconomics Annual,
Boston, MA.
Chan, N.H. and Wei, C.Z. 1987. Asymptotic inference for nearly non-stationary AR(1) processes. Annals
of Statistics, 15,1050-63.
Diebold, Francis X. and Lutz Kilian. 2000. Unit root tests are useful for selecting forecasting models.
Journal of Business and Economic Statistics 18 (3), 265-73.
Elliott, Graham, Rothenberg, T.J. and J.H. Stock. 1996. Efficient tests for an autoregressive unit root.
Econometrica 64, 813-36.
Elliott, Graham. 1999. Efficient tests for a unit root when the initial observation is drawn from its
unconditional distribution. International Economic Review 40, 767-783.
Fuller, Wayne A. and David P. Hasza. 1981. Properties of Predictors for Autoregressive Time Series.
Journal of the American Statistical Association 76, 155-161.
Hall, Robert E. 1978. Stochastic implications of the life cycle permanent income hypothesis: theory and
evidence. Journal of Political Economy 86, 971-988.
Hamilton, James D. 1994. Time Series Analysis. Princeton University Press, Princeton, NJ.
Kemp, Gordon C.R. 1999. The behavior of forecast errors from a nearly integrated AR(1) model as both
sample size and forecast horizon become large. Econometric Theory 15, 238-256.
Magnus, Jan R. and M. Hashem Pesaran. 1989. The exact multi-period mean-square forecast error for the
first-order autoregressive model with an intercept. Journal of Econometrics 42 (2), 157-87.
Mark, Nelson. 1995. Exchange rates and fundamentals: Evidence on long-horizon predictability.
American Economic Review 85 (1), 201-218.
Ng, Serena and Timothy J. Vogelsang. 2002. Forecasting autoregressive time series in the presence of
deterministic components. Econometrics Journal 5, 196-224.
Phillips, P.C.B. 1979. The sampling distribution of forecasts from a first-order autoregression. Journal of
Econometrics 9, 241-261.
Phillips, P.C.B. 1987. Toward a unified asymptotic theory for autoregression. Biometrika 74, 535-47.
Phillips, P.C.B. 1998. Impulse response and forecast error variance asymptotics in nonstationary VARs.
Journal of Econometrics 83, 21-56.
Sims, Christopher A., James H. Stock and Mark W. Watson. 1990. Inference in linear time series with
some unit roots. Econometrica 58, 113-44.
Stock, James H. 1990. Unit roots in economic time series: do we know and do we care? A comment.
Carnegie-Rochester Conference Series on Public Policy 32, 63-82.
Stock, James H. 1991. Confidence intervals for the largest autoregressive root in US macroeconomic time
series. Journal of Monetary Economics 28, 435-459.
Stock, James H. 1996. VAR, error correction, and pretest forecasts at long horizons. Oxford Bulletin of
Economics and Statistics 58, 685-701.
Case I

Figure 1 - Forecasting Thresholds
[Figure: predicted c* and actual c* plotted against horizon (eta); vertical axis is drift (c).]
Table 1a - Forecasting Thresholds

Horizon (η)     0.001    0.01     0.05     0.1      0.2      0.3      0.4
Predicted c*    -2.78    -2.66    -2.27    -2.07    -1.93    -1.89    -2.01
Actual c*       -2.8     -2.66    -2.3     -2.07    -1.9     -1.87    -1.97
Table 1b - MSE(RW) / MSE(OLS)

                         Horizon (η)
Drift (c)    0.01     0.05     0.1      0.2      0.4
  0          0.98     0.92     0.85     0.74     0.51
 -1          0.99     0.95     0.92     0.86     0.75
 -2          0.99     0.99     1.00     1.01     0.98
 -3          1.00     1.02     1.06     1.13     1.25
 -4          1.01     1.06     1.12     1.26     1.45
 -5          1.01     1.07     1.17     1.35     1.62
 -6          1.02     1.11     1.23     1.44     1.72
 -8          1.03     1.16     1.32     1.61     1.89
-10          1.04     1.22     1.43     1.72     1.97
Case II

Figure 2 - Forecasting Thresholds
[Figure: predicted c* and actual c* plotted against horizon (eta); vertical axis is drift (c).]
Table 2a - Forecasting Thresholds

Horizon (η)     0.001    0.01     0.05     0.1      0.2      0.3      0.4
Predicted c*    -5.6     -5.2     -4.24    -3.56    -2.97    -2.84    -2.81
Actual c*       -5.69    -5.31    -4.23    -3.59    -2.98    -2.81    -2.76
Table 2b - MSE(RW) / MSE(OLS)

                         Horizon (η)
Drift (c)    0.01     0.05     0.1      0.2      0.4
  0          0.97     0.90     0.86     0.80     0.70
 -1          0.97     0.92     0.89     0.86     0.81
 -2          0.98     0.93     0.92     0.93     0.94
 -3          0.98     0.97     0.97     1.01     1.06
 -4          0.99     1.00     1.02     1.09     1.19
 -5          1.00     1.02     1.08     1.18     1.30
 -6          1.00     1.05     1.13     1.26     1.40
 -7          1.01     1.08     1.17     1.33     1.47
 -8          1.01     1.10     1.22     1.41     1.55
 -9          1.02     1.13     1.25     1.47     1.61
-10          1.03     1.15     1.31     1.51     1.64
Case III

Figure 3 - Forecasting Thresholds
[Figure: predicted c* and actual c* plotted against horizon (eta); vertical axis is drift (c).]
Table 3a - Forecasting Thresholds

Horizon (η)     0.001     0.01     0.05     0.1      0.2      0.3      0.4
Predicted c*    -10.32    -9.65    -7.87    -7.05    -6.73    -7.04    -7.83
Actual c*       -10.5     -9.63    -7.86    -7       -6.82    -7.05    -7.87
Table 3b - MSE(RW) / MSE(OLS)

                         Horizon (η)
Drift (c)    0.01     0.05     0.1      0.2      0.4
  0          0.95     0.87     0.82     0.78     0.73
 -2          0.96     0.88     0.84     0.79     0.71
 -4          0.96     0.90     0.89     0.86     0.78
 -5          0.97     0.93     0.93     0.90     0.83
 -6          0.98     0.95     0.97     0.95     0.89
 -7          0.98     0.98     0.99     1.01     0.94
 -8          0.99     1.00     1.03     1.06     1.01
 -9          0.99     1.02     1.07     1.11     1.06
-10          0.99     1.05     1.11     1.15     1.12
-11          1.00     1.07     1.15     1.21     1.16
-12          1.01     1.09     1.18     1.25     1.21
Figure 4 – {MSE(Strategy 2) / MSE(Strategy 1)} (Case II)

Strategy 1: Use a 5% Dickey-Fuller pretest to decide between RW and OLS. Thus use OLS if the Dickey-Fuller t-statistic \bar{\tau} is less than the 5% critical value (-2.86 for Case II and -3.41 for Case III) and RW otherwise.

Strategy 2: Get a Median Unbiased (MU) estimate of c, using tables from Stock (1991). For forecasting horizon η, use OLS if the MU estimate of c is less than the forecasting threshold c*(η) given in Tables 2a and 3a (Cases II and III respectively) and use RW otherwise. This corresponds to using OLS if the Dickey-Fuller t-statistic \bar{\tau} is less than the \bar{\tau}^* given below.

Horizon (η)              0.001    0.01     0.05     0.1      0.2      0.3      0.4
Case II \bar{\tau}^*     -2.13    -2.09    -1.99    -1.92    -1.87    -1.85    -1.85
Case III \bar{\tau}^*    -2.86    -2.81    -2.67    -2.6     -2.57    -2.6     -2.66
Figure 5 - AR(p) Forecasting Thresholds, True Process is an AR(1)
[Figure: estimated threshold functions c*(η) for the AR(1), AR(2) and AR(5), plotted against horizon (eta).]
Figure 6 - AR(2) Forecasting Thresholds
[Figure: estimated AR(2) threshold functions for rho2 = -.1, rho2 = 0 and rho2 = .1, plotted against horizon (eta); vertical axis is drift (c).]