Local to Unity, Long-Horizon Forecasting Thresholds for Model Selection in the AR(1)

John L. Turner*

Abstract: This paper develops a framework for analyzing long-horizon forecasting in the AR(1) model using the local to unity specification of the autoregressive parameter. I report new asymptotic results for the distributions of forecast errors when the AR(1) parameters are estimated by ordinary least squares (OLS), and also for the distributions of Random Walk (RW) forecasts. There exist functions, relating local to unity "drift" to forecast horizon, such that OLS and RW forecasts share the same expected square error. RW forecasts are preferred on one side of these "forecasting thresholds," while OLS forecasts are preferred on the other. I identify these forecasting thresholds, use them to develop novel model selection criteria, and show how they help a forecaster reduce error.

JEL Classification: C22
Keywords: forecasting, autoregressive, near-non-stationary, long-horizon

September 2003

* Department of Economics, Terry College of Business, University of Georgia, Brooks Hall 5th Floor, Athens, GA 30602. Tel: 706-542-3682. Fax: 706-542-3376. E-mail: jlturner@terry.uga.edu. I thank Christopher Otrok, Tim Vogelsang, Jonathan Wright, Bill Lastrapes, two anonymous referees and seminar participants at the University of Virginia for their helpful comments.

Time series forecasting of near non-stationary, possibly I(1), autoregressive processes remains poorly understood despite the frequent use of such models in the econometric and macroeconomic literature. Research has focused on a choice between two forecasting frameworks: the VAR framework and the I(1)-specified co-integration framework. However, neither the unit root literature nor the forecasting literature has yielded precise theoretical rules for selecting between these two models. Consider the univariate AR(1) model, for which the I(1) framework specifies the Random Walk model (RW) and the VAR framework uses a simple ordinary least squares (OLS) regression to estimate the parameters of the model. It is clear that, in finite samples, RW forecasts have lower mean-square error (MSE) than OLS forecasts when the autoregressive parameter is sufficiently close, but perhaps not equal, to unity (see, e.g., Stock (1996) and Diebold and Kilian (2000)). As a result, RW forecasts can be preferred even where the Random Walk model itself represents a misspecification of the true time series process.

In such cases, RW produces biased forecasts. Letting the autoregressive parameter shrink away from unity, this bias increases, driving up the error of RW forecasts relative to OLS. Eventually, for sufficiently small values of the autoregressive parameter, the MSE of RW forecasts becomes equal to, then larger than, the MSE of OLS forecasts. I term the value of the autoregressive parameter where MSE(RW) = MSE(OLS) a forecasting "threshold," since on one side of the threshold RW is the preferred forecasting model, while on the other side OLS is preferred. Since the threshold value of the autoregressive parameter changes with sample size, I appeal to local to unity theory to identify asymptotic Pitman drift thresholds that are robust to changes in sample size.

The identification of these thresholds is important for three primary reasons. First, it extends Stock (1996) in a way that allows us to accurately predict the relative performance of RW and OLS forecasts. Second, it helps explain how downward bias in the OLS estimate of the autoregressive parameter affects forecasting.
Most importantly, though, it reframes the question of how best to select between RW and OLS. This paper shows that rendering a statistical judgment as to the presence or absence of a unit root should not be a major concern of a forecaster.

1. Introduction

Consider the canonical AR(1) model:[1]

    y_t = d_t + u_t
    d_t = µ + θt
    u_t = ρu_{t−1} + ε_t,   ρ ∈ [−1, 1]                         (0)
    ε_t ~ iid(0, σ²)
    u_0 = 0

[1] My specification of u_0 = 0 follows Stock (1991), for reasons that will become clear later in the paper.

While the model is only univariate with a simple innovation structure, it is nonetheless very important to economists. Theory often predicts that certain macroeconomic aggregates have an autoregressive structure, and empirical research confirms the first-order autoregressive tendencies (see, e.g., Hall (1978)). Given this, it is important for economists to understand fully how best to forecast this model. Additionally, the lessons learned from the AR(1) model can then be extended to more general VAR and ARMA specifications.

I consider three different versions of (0):

    Case I:    µ = 0, θ = 0   ⇒   y_t = ρy_{t−1} + ε_t                  (1)
    Case II:   µ ≠ 0, θ = 0   ⇒   y_t = α + ρy_{t−1} + ε_t              (2)
    Case III:  µ ≠ 0, θ ≠ 0   ⇒   y_t = α + δt + ρy_{t−1} + ε_t         (3)

where α = µ(1 − ρ) + θρ and δ = θ(1 − ρ). Cases II and III are widely applicable to forecasting economic variables. Case I is of mostly theoretical interest but is relatively straightforward to analyze; I consider it primarily for expository purposes.

Several previous papers have expressed analytically the distribution of forecasts of Cases I and II when the model's parameters are estimated by least squares. Box and Jenkins (1970) derive crude, analytically tractable approximations for the distribution of AR(1) forecasts for Case I, while Phillips (1979, Case I), Fuller and Hasza (1981, Case II) and Magnus and Pesaran (1989, Case II) derive more precise, analytically cumbersome approximations for the distributions of these forecasts. This past research, which employs traditional probability theory, offers considerable insight into AR(1) forecasts. However, the development and dissemination of local to unity asymptotic theory has laid the foundation for significantly more powerful analysis of forecasting. It specifies the autoregressive parameter ρ as being in an O(1/T) neighborhood of unity:

    ρ = 1 + c/T                                                 (4)

For asymptotic analysis, the drift c is held constant as T approaches infinity. Using the specification in (4) and the functional central limit theorem, it is possible to explain why nearly non-stationary AR processes retain some important properties of I(1) processes.[2] Many authors use this specification to develop unit root tests and median unbiased estimation procedures (cf. Bobkoski (1983), Chan and Wei (1987), Phillips (1987), Stock (1991), Elliott et al. (1996) and Elliott (1999)), while Stock (1996), Phillips (1998), Kemp (1999) and Ng and Vogelsang (2002) use it to analyze forecasting.

[2] It also allows a researcher to abstract from some distributional assumptions.

Stock (1996), Phillips (1998) and Kemp (1999), who consider Case I, examine both one-step-ahead and "long" horizons. For the latter, they treat the forecast horizon asymptotically as a fraction η of the sample size of the time series (i.e. η = h/T, where h is the forecast horizon and T is the sample size). Stock (1996) shows that the first-order asymptotic performance of long-horizon forecasts depends on whether one chooses a VAR or co-integration framework. An implication of this result is that, as T increases to infinity, the forecast errors of OLS and RW forecasts retain the same dispersion relative to one another, so long as the drift c and the horizon η are held constant.
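Although the paper works analytically, the data generating process in (0) under the local to unity parameterization (4) is straightforward to simulate, and a simulated path helps fix ideas. The sketch below is a minimal illustration only; the function name, sample size and parameter values are my own choices, not the paper's.

```python
import numpy as np

def simulate_ar1(T, c, mu=0.0, theta=0.0, sigma=1.0, seed=0):
    """Simulate y_t = mu + theta*t + u_t with u_t = rho*u_{t-1} + eps_t,
    u_0 = 0 and rho = 1 + c/T, as in (0) and (4)."""
    rng = np.random.default_rng(seed)
    rho = 1.0 + c / T
    eps = rng.normal(0.0, sigma, T)
    u = np.zeros(T + 1)
    for t in range(1, T + 1):
        u[t] = rho * u[t - 1] + eps[t - 1]
    trend = mu + theta * np.arange(T + 1)
    return trend + u                      # y_0, y_1, ..., y_T

# Case I: mu = 0, theta = 0;  Case II: mu != 0, theta = 0;  Case III: mu != 0, theta != 0
y_case1 = simulate_ar1(T=200, c=-5.0)
y_case2 = simulate_ar1(T=200, c=-5.0, mu=10.0)
y_case3 = simulate_ar1(T=200, c=-5.0, mu=10.0, theta=0.1)
```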
This paper adopts both the local to unity and the long-horizon specifications. I extend Stock (1996) to show that, for a given forecast horizon η, since OLS forecast errors improve monotonically relative to RW forecast errors as c falls, there is a value of c such that the expected squared forecast errors from these models are asymptotically identical. Specifically, for a given forecast horizon η, there is a value of drift, c*(η), such that RW and OLS are equally accurate. I thus say that, for a given value of η, c*(η) is the forecasting threshold of the AR(1) model at forecast horizon η. Then, if c > c*, RW is the preferred model, and if c < c*, OLS is preferred.

In this paper, I identify forecasting thresholds for Cases I, II and III. Since Stock (1996), Phillips (1998) and Kemp (1999) each report local to unity, long-horizon asymptotic distributions of Case I forecast errors for RW and OLS, I need only apply their theorems to build the threshold function c*(η) for Case I. I do so in section 3. For Cases II and III, however, much more work is needed. Ng and Vogelsang (2002) report local to unity asymptotic distributions for one-step-ahead OLS forecasts and prove the invariance of forecasts to the values of the trend parameters (µ and θ). They do not report long-horizon distributions, however. I give theorems describing local to unity, long-horizon asymptotic distributions for Case II and III forecast errors for both RW and OLS in section 2. I then use them to build forecasting thresholds for Cases II and III in section 3.

The identification and analysis of the structure of these forecasting thresholds forms a body of research capable of speaking to numerous important questions about forecasting and about the AR(1) in particular. I focus on three points in this paper. First, the failure of OLS to outperform RW when RW is misspecified (but c is very close to 0) is due, in part, to bias in OLS estimates of the autoregressive parameter, but is due primarily to the additional estimation error present in OLS forecasts. This illustrates clearly the fundamental advantage of using parsimonious models for forecasting. Second, it is clear that optimal criteria for choosing between RW and OLS are linked to the forecasting thresholds but not to the presence or absence of a unit root. In fact, the unit root hypothesis and test are only indirectly related to selecting a forecasting model. Thus, in general, one should not expect a 5% unit root test to choose optimally between RW and OLS forecasts. In section 5, I use a forecasting example to show the superiority of a novel model selection criterion that uses these forecasting thresholds explicitly. Specifically, I propose using median unbiased estimation of the drift c, interpreted with respect to the forecasting thresholds, to select the model most likely to minimize forecast error. This offers substantial improvement in forecasting over a selection criterion based on a 5% unit root test. Third, although I only present theoretical results for the AR(1), the methods in this paper are likely to be useful in studying forecasting in near-non-stationary AR processes of larger order.
In section 6, I give some preliminary Monte Carlo results comparing forecasts from an OLS-estimated AR(q) to forecasts obtained by specifying a unit root and estimating the parameters of the AR(q) under that restriction. It is clear both that the lessons learned from the AR(1) help one understand the forecasting of these processes and that much more work is needed to understand it fully.

The balance of this paper is as follows. In section 2, I derive asymptotic distributions of the forecast errors for RW and OLS forecasts for Cases I-III. In section 3, I use computer simulation to identify forecasting threshold functions c*(η) for all cases. In section 4, I discuss the results of sections 2 and 3 in detail. In section 5, I demonstrate the usefulness of the forecasting threshold functions in a forecasting example. In section 6, I offer some preliminary results on forecasting thresholds for AR(q) processes with q > 1 and give brief concluding remarks.

2. Analysis

In this section I derive the asymptotic distributions of forecast errors from the Random Walk (RW) and OLS-estimated forecasting models. Ultimately, I wish to compare the mean-square error (MSE) of forecasts for these models. To derive limiting distributions for the forecast errors (Stock's theorems and Theorems 1-3), I use local to unity asymptotic theory. The forecast errors are functions of Ornstein-Uhlenbeck processes and, for the OLS-estimated model, the limiting distributions are quite complicated. Indeed, closed-form expressions for the expected squared errors are mathematically intractable.

Recall that the local to unity assumption (equation (4)) specifies the autoregressive parameter ρ as a function of sample size T. When the drift parameter c is a negative number, ρ is less than unity, which would seem to imply that the autoregressive process is strictly stationary. However, for a finite sample size T, autoregressive processes with ρ close to but slightly below unity retain some evolutionary properties of the non-stationary random walk process (c = 0). Primarily, the closer c is to zero, the weaker is the tendency of the time series to mean-revert.

I start with the following definition of terminology.

Definition: Let J_c(r) be defined as follows:

    dJ_c(r) = cJ_c(r)dr + dB(r)                                 (5)

where B(r) = σW(r) and W(r) is a standard Wiener process. This is a standard Ornstein-Uhlenbeck process, and

    J_c(r) ~ N( 0, σ²(e^{2cr} − 1)/(2c) )                       (6)

It is well known that, for the data generating process (0):[3]

    T^{−1/2} Σ_{t=1}^{[Tr]} ε_t ⇒ σW(r) = B(r)

and

    u_{[Tr]}/√T = T^{−1/2} Σ_{t=1}^{[Tr]} (1 + c/T)^{[Tr]−t} ε_t ⇒ J_c(r)

where [w] denotes the integer part of w for any real w and the symbol "⇒" denotes convergence in distribution. All of the asymptotic results used in this paper to build forecast error limiting distributions derive from these two.

[3] See, e.g., Phillips (1987).

The "total error" in a single forecast can be broken down into two parts: "forecast error" and "innovation error." For the h-period-ahead forecast ȳ_{T+h|T}, this total error is given as:

    ȳ_{T+h|T} − y_{T+h} = (ȳ_{T+h|T} − y_{T+h|T}) + (y_{T+h|T} − y_{T+h})        (7)

where y_{T+h|T} is the (infeasible) correct forecast, using the model's true parameters. The second term on the right-hand side of (7) is the "innovation error":

    y_{T+h|T} − y_{T+h} = ρ^{h−1}ε_{T+1} + ρ^{h−2}ε_{T+2} + ... + ρε_{T+h−1} + ε_{T+h}        (8)

Clearly, this error is common to forecasts from both the RW and OLS models. The expected square of this term is straightforward to evaluate:

    E(y_{T+h|T} − y_{T+h})² = σ²(1 − ρ^{2h})/(1 − ρ²)
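The convergence results above lend themselves to simulation: the scaled partial sum u_[Tr]/√T behaves like J_c(r), whose variance is given in (6). The following sketch, with illustrative choices of c, T and the number of draws that are my own rather than the paper's, checks the variance formula numerically.

```python
import numpy as np

def jc_endpoint_draws(c, r=1.0, T=1000, sigma=1.0, n_draws=20000, seed=1):
    """Approximate draws of J_c(r) via the scaled partial sum u_[Tr]/sqrt(T),
    where u_t = (1 + c/T) u_{t-1} + eps_t and u_0 = 0."""
    rng = np.random.default_rng(seed)
    rho = 1.0 + c / T
    n = int(T * r)
    weights = rho ** np.arange(n - 1, -1, -1)         # rho^{n-1}, ..., rho^0
    eps = rng.normal(0.0, sigma, size=(n_draws, n))   # one row per Monte Carlo draw
    return (eps @ weights) / np.sqrt(T)

c, r, sigma = -5.0, 1.0, 1.0
draws = jc_endpoint_draws(c, r, sigma=sigma)
print("simulated variance of J_c(r):", draws.var())
print("formula (6):", sigma**2 * (np.exp(2 * c * r) - 1) / (2 * c))
```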
The first term in parentheses on the right-hand side of (7) is the difference between the feasible forecast and the infeasible forecast that would be made if the model's parameters were known with certainty. For OLS and RW, respectively, we define the "forecast errors" for Case K:

    S̄^K_{T+h|T} = ȳ^{OLS(K)}_{T+h|T} − y_{T+h|T}                (9)
    R̄^K_{T+h|T} = ȳ^{RW(K)}_{T+h|T} − y_{T+h|T}                 (10)

My use of the term "forecast error" is the same as that in Ng and Vogelsang (2002). Clearly, if a forecast has a lower expected squared forecast error, it has a lower expected squared total error.

Case I

For Case I OLS forecasts, ρ is estimated using a least squares regression of {y_t}_{t=1}^T on {y_{t−1}}_{t=1}^T. The forecast is then:

    ȳ^{OLS(I)}_{T+h|T} = (ρ̄)^h y_T                               (11)

where the bar over ρ denotes the OLS estimate. For forecasts from this model, the forecast error is thus:

    S̄^I_{T+h|T} = ȳ^{OLS(I)}_{T+h|T} − y_{T+h|T} = ((ρ̄)^h − ρ^h) y_T

The following result was first reported by Stock (1996). It is also a special case of a theorem due to Kemp (1999) and is similar to a result in Phillips (1998).

Theorem (Stock (1996)):[4] Let the data be generated by (1). Let S̄^I_{T+ηT|T} be the forecast error for the OLS-estimated forecast (equation (11)) for horizon η = h/T. Then:

    S̄^I_{T+ηT|T}/√T ⇒ e^{ηc}(e^{η(c̄−c)} − 1) J_c(1)

where:

    c̄ − c = lim_{T→∞} T(ρ̄ − ρ) = ∫₀¹ J_c(r)dB(r) / ∫₀¹ J_c(r)²dr

The asymptotic distribution of c̄ − c is non-normal and non-central. Its moments depend upon c only.

[4] This is equivalent to equation (4) from Stock (1996), p. 689.

For Cases I and II, the RW model imposes ρ = 1 and forecasts all future values of the time series using the final available value of the time series:

    ȳ^{RW(I,II)}_{T+h|T} = y_T                                   (12)

The forecast error is thus defined as follows:

    R̄^{I,II}_{T+h|T} = ȳ^{RW(I,II)}_{T+h|T} − y_{T+h|T} = (1 − ρ^h) y_T

Theorem (Stock (1996)):[5] Let the data be generated by (1) or (2). Let R̄^{I,II}_{T+ηT|T} be the forecast error for the random walk forecast (equation (12)) for horizon η = h/T. Then:

    R̄^{I,II}_{T+ηT|T}/√T ⇒ (1 − e^{ηc}) J_c(1)

Note that no estimation error is present in the forecast error. All potential error is due to bias, and if c = 0, RW is correctly specified and the forecast error is zero.

[5] This is equivalent to equation (6) from Stock (1996), p. 689.

Case II

For Case II OLS forecasts, ρ and α are estimated using a least-squares regression of {y_t}_{t=1}^T on {y_{t−1}}_{t=1}^T and {1, 1, ..., 1}'. The forecast is then:

    ȳ^{OLS(II)}_{T+h|T} = ᾱ (1 − (ρ̄)^h)/(1 − ρ̄) + (ρ̄)^h y_T       (13)

The construction of the forecast error S̄^{II}_{T+h|T} is analogous to S̄^I_{T+h|T}. I now give the following theorem.

Theorem 1: Let the data be generated by (2). Let S̄^{II}_{T+ηT|T} be the forecast error for the forecast from the OLS-estimated model (equation (13)) for horizon η = h/T. Then:

    S̄^{II}_{T+ηT|T}/√T ⇒ −[(1 − e^{ηc̄})/(−c̄)]·[−B(1) + (c̄ − c)∫₀¹ J_c(r)dr] + e^{ηc}(e^{η(c̄−c)} − 1) J_c(1)

where:

    c̄ − c = [ ∫₀¹ J_c(r)dB(r) − B(1)∫₀¹ J_c(r)dr ] / [ ∫₀¹ J_c(r)²dr − (∫₀¹ J_c(r)dr)² ]

Proof: See Appendix.

This limiting distribution appears to include more error than the limiting distribution for the Case I OLS forecast error. Since the RW model uses the same forecast for Case II as in Case I, it is to be expected that it should perform even better against the OLS-estimated model for Case II.
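For concreteness, the RW forecast in (12) and the Case II OLS forecast in (13) are simple to compute from a sample. The helper functions below are a minimal sketch of that computation; the names are mine, and the OLS version implicitly assumes the estimated ρ̄ is not exactly 1, so the geometric sum in (13) is well defined.

```python
import numpy as np

def rw_forecast(y):
    """RW forecast for Cases I and II (equation (12)): the last observed value,
    used for every horizon h."""
    return float(np.asarray(y, float)[-1])

def ols_case2_forecast(y, h):
    """h-step-ahead forecast from the OLS-estimated Case II model (equation (13)):
    regress y_t on a constant and y_{t-1}, then iterate the fitted model forward."""
    y = np.asarray(y, float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    alpha_hat, rho_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return alpha_hat * (1.0 - rho_hat**h) / (1.0 - rho_hat) + rho_hat**h * y[-1]

# usage with any series y (e.g., a simulated Case II path):
# print(rw_forecast(y), ols_case2_forecast(y, h=20))
```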
Case III

For Case III, the appropriate form of RW is the Random Walk with Drift (RWD):

    y_t = θ + y_{t−1} + ε_t                                      (14)

To forecast h steps ahead with this model, a researcher estimates θ (by taking the mean of the first differences of the time series) and then uses the forecast:

    ȳ^{RW(III)}_{T+h|T} = θ̄h + y_T                               (15)

where the bar over θ denotes the estimated value.

Theorem 2: Let the data be generated by (3). Let R̄^{III}_{T+ηT|T} be the forecast error for the random walk (with drift) forecast (equation (15)) for horizon η = h/T. Then:

    R̄^{III}_{T+ηT|T}/√T ⇒ (1 + η − e^{ηc}) J_c(1)

Proof: See Appendix.

Note that, whenever η > 0, the expected square of the above term will be larger than for the random walk without drift (Cases I and II). This additional error stems exclusively from the error in estimating the drift θ.

For Case III OLS forecasts, ρ, α and δ are estimated using a least squares regression of {y_t}_{t=1}^T on {y_{t−1}}_{t=1}^T, {1, 1, ..., 1}' and {1, 2, ..., T}'. The forecast is then:

    ȳ^{OLS(III)}_{T+h|T} = (ᾱ + δ̄T)·(1 − (ρ̄)^h)/(1 − ρ̄) + δ̄ Σ_{j=1}^{h} (h − j + 1)(ρ̄)^{j−1} + (ρ̄)^h y_T        (16)

The following theorem gives the asymptotic distribution of the forecast errors.

Theorem 3: Let the data be generated by (3). Let S̄^{III}_{T+ηT|T} be the forecast error for the forecast from the OLS-estimated model (equation (16)) for horizon η = h/T. Then:

    S̄^{III}_{T+ηT|T}/√T ⇒ [(e^{ηc̄} − ηc̄ − 1)/c̄²]·[−∫₀¹(6 − 12r)dB(r) + (c̄ − c)(6∫₀¹ J_c(r)dr − 12∫₀¹ rJ_c(r)dr)]
                          + [(1 − e^{ηc̄})/(−c̄)]·[−∫₀¹(2 − 6r)dB(r) + (c̄ − c)(2∫₀¹ J_c(r)dr − 6∫₀¹ rJ_c(r)dr)]
                          + e^{ηc}(e^{η(c̄−c)} − 1) J_c(1)

where:

    c̄ − c = [ B(1)(6∫₀¹ rJ_c(r)dr − 4∫₀¹ J_c(r)dr) + ∫₀¹ J_c(r)dB(r) + ∫₀¹ r dB(r)·(6∫₀¹ J_c(r)dr − 12∫₀¹ rJ_c(r)dr) ]
            / [ ∫₀¹ J_c(r)²dr − 4(∫₀¹ J_c(r)dr)² + 12∫₀¹ J_c(r)dr·∫₀¹ rJ_c(r)dr − 12(∫₀¹ rJ_c(r)dr)² ]

Proof: See Appendix.

3. Simulation of Forecasting Thresholds

In this section, I identify forecasting thresholds for the RW/OLS decision in two ways: (1) "predicted" thresholds, using the formulas in the theorems of section 2; and (2) "actual" thresholds, using h-period-ahead forecasts from the RW and OLS models. I rely upon simulation; for each Monte Carlo trial I use Gaussian errors for {ε_t}_{t=1}^T, with T = 1000, in the DGP given in (0).[6] I consider the forecasting horizons η = {.001, .01, .05, .1, .2, .3, .4} and drift values c ∈ [−12, 0], and use a grid search to determine the threshold values of drift c* as a function of horizon η. For each (c, η) pair I use 50,000 trials to obtain precise estimates of the expected squared forecast error from RW and OLS. I then use linear interpolation to estimate the function c*(η) where the expected squared forecast errors are identical for these two models. The estimated functions for Cases I, II and III are shown in Figures 1, 2 and 3, respectively.

[6] Simulation of the "predicted" forecast errors, using the theorems of section 2, consists of simply building partial sums that converge to the various pieces of the asymptotic distributions and then combining them according to the theorems. For example, to simulate ∫₀¹ J_c(r)dr, use T^{−3/2} Σ_{t=1}^T u_t. For simulation of "actual" forecast errors, for each trial I build the h-period-ahead forecast ȳ_{T+h|T}, compare it to the actual h-period-ahead value y_{T+h}, and compute the square. Averaging over all trials gives the expected square.

I also compute actual forecasts from the RW and OLS models for η = {.01, .05, .1, .2, .4} for a typical empirical sample size for post-war quarterly US macroeconomic aggregates (T = 200). I compare the mean-squared error of forecasting, which includes both the forecast error and the innovation error, for these two models, to describe the relative difference in forecasting efficiency. For simplicity, I report simulated measures of:

    R(c, η) = MSE(RW, c, η) / MSE(OLS, c, η)                     (17)

for each c and η. Thus, if this value is below 1, RW is the preferred forecasting model.
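A stripped-down version of the "actual" threshold simulation just described might look like the sketch below. It is my own illustration: it uses Case II data with far fewer trials and a smaller sample than the paper's T = 1000 and 50,000 trials, and the threshold c*(η) is simply the point where the reported ratio crosses 1.

```python
import numpy as np

def mse_ratio(c, eta, T=200, n_trials=2000, sigma=1.0, seed=0):
    """Monte Carlo estimate of R(c, eta) = MSE(RW)/MSE(OLS) for Case II data,
    in the spirit of equation (17) (illustrative settings, not the paper's)."""
    rng = np.random.default_rng(seed)
    rho = 1.0 + c / T
    h = max(1, int(round(eta * T)))
    sse_rw = sse_ols = 0.0
    for _ in range(n_trials):
        eps = rng.normal(0.0, sigma, T + h)
        u = np.zeros(T + h + 1)
        for t in range(1, T + h + 1):
            u[t] = rho * u[t - 1] + eps[t - 1]
        y, target = u[1:T + 1], u[T + h]          # mu set to 0: forecasts are invariant to it
        sse_rw += (y[-1] - target) ** 2                     # RW forecast, equation (12)
        X = np.column_stack([np.ones(T - 1), y[:-1]])       # Case II OLS regression
        a, r = np.linalg.lstsq(X, y[1:], rcond=None)[0]
        f_ols = a * (1 - r**h) / (1 - r) + r**h * y[-1]     # equation (13)
        sse_ols += (f_ols - target) ** 2
    return sse_rw / sse_ols

# scan drift values at one horizon and look for the crossing of 1
for c in (-1.0, -2.0, -3.0, -4.0, -5.0, -6.0):
    print(c, round(mse_ratio(c, eta=0.1), 3))
```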
Figure 1 and Tables 1a and 1b give the results for Case I. The region above the curves in Figure 1 is where RW forecasts have lower forecast error (and thus lower MSE) than OLS forecasts. In the graph, the drift c is plotted against the forecast horizon fraction η. The difference between the predicted and actual thresholds is small, indicating that the asymptotic approximations given in Stock's theorems are very accurate. Note that, for each forecast horizon, RW is always preferred for c very close to 0, and OLS is preferred for c far from 0. For instance, at η = .001, so that h = 1, the threshold is c* ≈ −2.8. Thus, RW is preferred whenever ρ ∈ (.9972, 1] and OLS is preferred whenever ρ < .9972. However, the threshold value of c, where RW and OLS have the same MSE, changes with the horizon η. The simulated function c*(η) achieves a minimum at η = .001 and rises with η until η = .3, then declines again.

In Table 1a, I report the threshold function c*(η), which gives the numbers used to build the graphs in Figure 1. Note that the "predicted" and "actual" values of c*(η) never differ by more than .04. In Table 1b, I report values of R(c, η) for selected c and η. At c = −1, η = .05, for instance, R = .95, so that RW has 5% lower MSE than OLS. This ratio decreases with the forecast horizon whenever RW is clearly preferred (c = 0 or c = −1), and increases with the forecast horizon whenever OLS is clearly preferred (c ≤ −3). Hence, the expected loss from choosing the wrong forecasting model is greater at longer forecasting horizons. When c = −2, RW is preferred at some horizons, while OLS is preferred at others.

For Case II, c*(η) is shown in Figure 2. This function shares several features with that of Case I. Again, c*(η) achieves a minimum at η = .001 and grows with η. In this case, however, there is no peak; the function achieves its largest value at η = .4. Note that the function is beneath that of Case I at all forecast horizons. As expected, RW has a greater advantage against this model. Next consider Tables 2a and 2b. Again the "predicted" and "actual" values of c*(η) are close, never differing by more than .09. Note that the loss from using OLS, when c is very close to zero, is greater here than in Case I. For instance, Table 2b reports that R(−1, .05) = .92, for a difference of 8%, as compared to a 5% difference in Case I. Again, the expected loss from choosing the wrong forecasting model is greater at longer forecasting horizons.

For Case III, c*(η), as shown in Figure 3, is well below the Case I and Case II functions. It, too, achieves a minimum at η = .001 and is concave, but peaks at η = .2. In Table 3a, the "predicted" and "actual" values of c*(η) are again close. In Table 3b, the loss from using OLS when c is very close to zero is shown to be greater than in Cases I and II. For instance, R(−2, .05) = .88, for a difference in MSE of 12%. The expected loss from choosing the wrong forecasting model is usually magnified at longer forecasting horizons, although not between η = .2 and η = .4 when c = −7, −8, −9 and −10, where c*(η) is decreasing in η.
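The "predicted" thresholds come from the limiting distributions of section 2, simulated as described in footnote 6. A minimal Case I version of that calculation is sketched below; the settings are my own, and the paper itself uses 50,000 trials.

```python
import numpy as np

def case1_predicted_ratio(c, eta, T=1000, n_trials=2000, seed=2):
    """Draws from the Case I limiting forecast-error distributions in Stock's
    theorems, built from partial sums as in footnote 6, and the implied
    predicted ratio E[RW error^2] / E[OLS error^2]."""
    rng = np.random.default_rng(seed)
    rho = 1.0 + c / T
    ols, rw = np.empty(n_trials), np.empty(n_trials)
    for i in range(n_trials):
        eps = rng.standard_normal(T)
        u = np.zeros(T + 1)
        for t in range(1, T + 1):
            u[t] = rho * u[t - 1] + eps[t - 1]
        J1 = u[T] / np.sqrt(T)                   # J_c(1)
        int_J2 = np.sum(u[:-1] ** 2) / T**2      # int_0^1 J_c(r)^2 dr
        int_JdB = np.sum(u[:-1] * eps) / T       # int_0^1 J_c(r) dB(r)
        cbar = c + int_JdB / int_J2              # c-bar = c + lim T(rho_bar - rho)
        ols[i] = np.exp(eta * c) * (np.exp(eta * (cbar - c)) - 1.0) * J1
        rw[i] = (1.0 - np.exp(eta * c)) * J1
    return np.mean(rw**2) / np.mean(ols**2)

# near 1, since c = -2 is close to the Case I threshold at eta = 0.1
print(case1_predicted_ratio(c=-2.0, eta=0.1))
```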
4. The Importance of Downward Bias

Intuitively, RW forecasts should be preferred when c is very close to 0. There are fewer parameters to estimate, and the bias due to misspecification is also small. It should not be surprising, then, that there are even theoretical cases where RW is preferred to OLS when the autoregressive parameter is known with certainty.[7] However, downward bias in the OLS estimate of the autoregressive parameter, a well-known phenomenon, does matter.[8] In fact, this largely explains why RW forecasts are preferred more often at short horizons in my analysis.

Consider a version of Case I where c < 0, so that ρ < 1. The RW error is (1 − ρ^h)u_T, which clearly increases monotonically in h. For most values of ρ̄ < 1, usually the case for the OLS estimate, its error, ((ρ̄)^h − ρ^h)u_T, also grows in absolute value with h.[9] But even if it grows, it does so more slowly than does (1 − ρ^h)u_T. Thus, OLS errors typically improve, relative to RW, as h increases.[10]

Because of downward bias, ((ρ̄)^h − ρ^h)u_T will frequently be larger in absolute value than (1 − ρ^h)u_T when c is close to 0. If ρ̄ is below ρ, the ill effects of this bias will be most damaging, relative to RW, when h = 1. Thus, holding c constant, the relative improvement of OLS forecasts as η increases, as illustrated by the positive slope of the forecasting thresholds in Figures 1-3, is magnified by downward bias in the OLS estimate of ρ.[11]

[7] In Case II, for instance, if OLS must estimate α but not ρ, RW is preferred at all forecast horizons whenever −2 < c < 0. This is simple to show analytically.
[8] Phillips (1979) gives an excellent discussion of downward bias. For Case II, an oft-cited (crude) approximate formula for downward bias in the OLS estimate of ρ is E(ρ̄ − ρ) ≅ −T^{−1}(1 + 3ρ) (e.g., Mark (1995)).
[9] It will only shrink initially if ρ̄ is near 0.
[10] For instance, suppose ρ = .98 and ρ̄ = .96 is the OLS estimate. At h = 1, the error term (.96 − .98)u_T is the same in absolute value (.02u_T) for the OLS model as the RW term (1 − .98)u_T. At h = 4, however, this term is (1 − .922)u_T = .078u_T in the RW model but only (.849 − .922)u_T = −.073u_T in the OLS model. At h = 20, this term is (1 − .667)u_T = .333u_T in the RW model, but only (.442 − .667)u_T = −.225u_T in the OLS model. Thus, it is intuitive that the OLS model should improve, relative to RW, as the forecast horizon increases. Since the term ((ρ̄)^h − ρ^h)u_T appears in the forecast errors for Cases I, II and III, this effect holds in all three cases.
[11] While the threshold functions for Cases I and II become nearly flat for η ≥ .2, the function for Case III has a pronounced peak at η = .2 and declines significantly as the horizon increases further. This appears to be due to the need to estimate the trend in the OLS model. In a hypothetical version of this case where ρ is known with certainty, it can be shown that the threshold is near c* ≈ −5.9 at η ≅ 0 and declines monotonically with η, reaching c* ≈ −8.5 at η = .4. In Figure 3, as η increases from 0 to .2, this downward pressure on the threshold function is dominated by the upward pressure on ((ρ̄)^h − ρ^h)u_T described earlier in this section. After η reaches .2, however, the forecasting threshold does indeed decline.
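The arithmetic behind footnote 10 is easy to verify; the following few lines reproduce it exactly.

```python
# Footnote 10: true rho = .98, OLS estimate rho_bar = .96. The RW error term
# (1 - rho^h) u_T keeps growing with h, while the OLS error term
# (rho_bar^h - rho^h) u_T grows more slowly in absolute value, so OLS improves
# relative to RW as the horizon lengthens.
rho, rho_bar = 0.98, 0.96
for h in (1, 4, 20):
    rw_term = 1.0 - rho**h
    ols_term = rho_bar**h - rho**h
    print(f"h = {h:2d}:  RW term = {rw_term:.3f} u_T   OLS term = {ols_term:+.3f} u_T")
```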
5. Selecting Between RW and OLS

The predicted thresholds in this paper help inform the decision between RW and OLS forecasts. As was mentioned in the introduction to this paper, if one had strong evidence that ρ is very close to or equal to 1, RW would be a good choice of forecasting model. Unit root tests can provide such evidence. Recent research has investigated forecasting strategies for the model in (0) that use unit root tests to decide whether or not to specify a RW model. Stock (1990), Campbell and Perron (1991), Stock (1996) and Diebold and Kilian (2000) have investigated the Monte Carlo accuracy of these "pretest" strategies across a considerable range of parameterizations of this model, and found that these strategies have clear value relative to uniform strategies that either always or never use RW. In all cases, the size of the pretests chosen has been either 5% or 10%. Sizing pretests at conventional hypothesis-test levels reflects two facts: (1) 5% and 10% are generally accepted thresholds for statistical significance in hypothesis testing; and (2) critical values are readily available for these sizes. As Diebold and Kilian (2000) point out, however, these sizes satisfy no criterion of optimality for forecasting. Indeed, as this paper has shown, the unit root hypothesis is far stricter than the hypothesis that RW will be the preferred forecasting model. In fact, what matters is whether c is above or below c*(η).

Unfortunately, c is not consistently estimable in a univariate framework. Thus, it is impossible to test, using traditional asymptotic theory, whether c is above or below c*(η). However, Stock (1991) shows that it is possible to find estimates of c that are median unbiased. These values can be used to predict, for a given time series, which side of c*(η) the true c most likely falls on. Hence, it is possible to choose the forecast that is most likely to minimize the forecast error. We now test the usefulness of this proposition.

Figure 4 shows the relative performance of two intuitive strategies for forecasting Case II of the AR(1). The first strategy uses the traditional statistical criterion, employing a 5% Dickey-Fuller test and choosing OLS only if the DF test rejects the unit root null. The second strategy obtains a median unbiased estimate of the drift c and then compares this value to the appropriate forecasting threshold; OLS is used if this estimate is below the threshold. This consists of first building the Dickey-Fuller t-statistic:

    τ̄ = (ρ̄ − 1)/σ̄_ρ̄                                           (18)

where σ̄_ρ̄ is the usual OLS standard error. A median-unbiased estimate of c can be found for τ̄ using the tables in Stock (1991). For forecast horizon η, if that estimate is below c*(η), OLS is used. This decision rule essentially produces a critical value for the t-statistic (τ̄*) that is different from the 5% Dickey-Fuller critical value of −2.86. I report the critical τ̄* values for Cases II and III beneath Figure 4.
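In code, the two strategies reduce to comparing the same Dickey-Fuller t-statistic (18) with different critical values. The sketch below is my own illustration: the horizon-specific τ̄* values are the Case II numbers reported beneath Figure 4, so the median-unbiased step is folded into those critical values rather than looked up in Stock's (1991) tables directly.

```python
import numpy as np

def df_tstat_case2(y):
    """Dickey-Fuller t-statistic (intercept, no trend): (rho_bar - 1)/se(rho_bar),
    as in equation (18)."""
    y = np.asarray(y, float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    resid = y[1:] - X @ beta
    s2 = resid @ resid / (len(resid) - 2)
    se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return (beta[1] - 1.0) / se_rho

# tau* values reported beneath Figure 4 (Case II)
TAU_STAR_CASE2 = {0.001: -2.13, 0.01: -2.09, 0.05: -1.99, 0.1: -1.92,
                  0.2: -1.87, 0.3: -1.85, 0.4: -1.85}

def choose_model(y, eta, strategy):
    """Strategy 1: 5% DF pretest (critical value -2.86). Strategy 2: threshold-based
    rule, equivalent to comparing tau with the horizon-specific tau*."""
    tau = df_tstat_case2(y)
    crit = -2.86 if strategy == 1 else TAU_STAR_CASE2[eta]
    return "OLS" if tau < crit else "RW"
```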
I employ a Monte Carlo simulation to estimate the MSE of each strategy.[12] The surface in Figure 4 measures the ratio of MSEs of these two strategies:

    R*(c, η) = MSE(Strategy 2, c, η) / MSE(Strategy 1, c, η)     (19)

Whenever the surface in Figure 4 is above 1, Strategy 1 has lower MSE than the proposed Strategy 2, which uses forecasting thresholds and median-unbiased estimation. This only occurs when c is larger than −4, and in these cases R* is never more than 1.05. When c is less than −4, on the other hand, Strategy 2 is dominant, and by a larger amount. R* sinks to .81 when c = −13, η = .1, and reaches .7 at η = .4. This is an extremely significant gain in forecast accuracy by Strategy 2. For example, consider an AR(1) with a non-zero mean and true ρ = .935. If one had a time series of length 200 generated by this process, Strategy 2 would reduce the MSE of 20-step-ahead forecasts by nearly 20%.

[12] T = 200, number of trials = 20,000.

Clearly, Strategy 2 has an advantage over Strategy 1 if there is any reason to believe that c is less than the forecasting threshold. Moreover, it is simple to employ, and amounts essentially to using a more powerful DF test. In sum, the proper use of unit root tests to select forecasting models must account for the fact that the RW model can be a better forecasting model even if there is no unit root. Thus, there is no argument for generally using unit root tests, with the traditional 5% and 10% statistical criteria, as pretests to select forecasting models. Indeed, the selection criterion introduced here represents a clear improvement.

6. Thresholds for the AR(q) with q > 1

In forecasting an empirical time series with autoregressive tendencies, a researcher will seldom know with absolute certainty whether the process is in fact an AR(1). A typical method for handling this problem is to treat the series as an AR(q) and estimate q directly. Hence, it is an important practical matter to ask how the thresholds in this paper would change if q > 1. For Case II, then, I specify the time series as follows:

    y_t = α + Σ_{i=1}^{q} ρ_i y_{t−i} + ε_t                      (20)

where:

    α = µ(1 − Σ_{i=1}^{q} ρ_i)

I refer to forecasts built from estimating (20) as OLS forecasts. Under the AR(q) assumption, the series is stationary whenever the persistence of the time series, as defined by the sum of the autoregressive parameters, is less than one in absolute value.[13] Here, then, I model the persistence as local to unity:

    Σ_{i=1}^{q} ρ_i = 1 + c/T                                    (21)

[13] I also assume that, with the exception of the c = 0 case, the system is stable, in the sense that all eigenvalues are less than one in modulus.

Note that if c = 0, RW is no longer necessarily the correctly specified model when q > 1.[14] In this case, however, since it will always be true that the sum of the autoregressive coefficients is 1, there exists a restriction of the series in (20) that is a natural qth-order analog of RW:

    y_t = ρ_1 y_{t−1} + ρ_2 y_{t−2} + ... + (1 − Σ_{i=1}^{q−1} ρ_i) y_{t−q} + ε_t        (22)

[14] RW is correctly specified only when c = 0 and ρ_2 = 0. Even with this additional reason that RW may be misspecified, it may nonetheless continue to perform well as a forecasting model. We discuss this in more detail later in this section.

As is the case with RW, there is no need to estimate the parameter α that is present in (20). In addition, there is one less ρ_i parameter to estimate. Thus, this unit root (UR) specified AR(q) estimates two fewer parameters than the full specification of (20). This is the same difference as between RW and OLS in the AR(1) case. This suggests that the forecasting thresholds for the AR(q) may share some features of the AR(1) thresholds.
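One way to impose the restriction in (22) when estimating is by rearrangement: subtracting y_{t−q} from both sides gives y_t − y_{t−q} = Σ_{i=1}^{q−1} ρ_i (y_{t−i} − y_{t−q}) + ε_t, a regression with no intercept. The sketch below is my own implementation of that transformation (for q ≥ 2; for q = 1 the restriction is simply RW), and I am not claiming it is the exact estimator used in the paper's simulations.

```python
import numpy as np

def fit_ar_unit_root(y, q):
    """Estimate the AR(q) of (20) under the restriction sum(rho_i) = 1, i.e. (22),
    by regressing (y_t - y_{t-q}) on (y_{t-i} - y_{t-q}), i = 1..q-1, no intercept.
    Returns (rho_1, ..., rho_q); requires q >= 2."""
    y = np.asarray(y, float)
    T = len(y)
    Y = y[q:] - y[:T - q]
    X = np.column_stack([y[q - i:T - i] - y[:T - q] for i in range(1, q)])
    rho = np.linalg.lstsq(X, Y, rcond=None)[0]
    return np.append(rho, 1.0 - rho.sum())

def forecast_ar(y, rho, h):
    """Iterate the fitted AR forward h steps (no deterministic terms are needed
    under the unit-root restriction)."""
    hist = list(np.asarray(y, float))
    for _ in range(h):
        hist.append(sum(r * hist[-i - 1] for i, r in enumerate(rho)))
    return hist[-1]
```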
As we now demonstrate in a Monte Carlo analysis, if the true process is actually an AR(1) but a higher-order AR(q) is used, the forecasting thresholds are very similar to the AR(1) thresholds. Figure 5 shows estimated threshold functions for the AR(1), AR(2) and AR(5) cases together. The levels of the estimated functions are very similar, although the shape of the AR(2) and AR(5) thresholds is different at very short horizons. While the AR(1) threshold is at its minimum at η = .001, the AR(2) and AR(5) thresholds fall from η = .001 to η = .01.

When q > 1, the fact that the persistence of the series does not uniquely determine the autoregressive parameters affects the thresholds. In the AR(2), for instance, as ρ_2 moves, holding c constant, the thresholds move as well. This is demonstrated in Figure 6, which shows the estimated AR(2) threshold from Figure 5 along with two additional specifications of ρ_2: −.1 and .1. All threshold functions have short-horizon features that differ from those of the AR(1) thresholds. At most horizons, the thresholds for ρ_2 = −.1 favor OLS at more values of c than in the AR(1), while the thresholds for ρ_2 = .1 favor UR at more values of c. The shapes of these functions are similar, but the levels are clearly different at long horizons, more so than for the threshold functions in Figure 5.

We form two important but preliminary conclusions from these results. First, it appears that if the autoregressive process is local to unity, the first autoregressive parameter is very close to unity, and the other autoregressive parameters are very close to zero, then the AR(1) thresholds from this paper provide excellent approximations to the true AR(q) thresholds. That is, if the true series is very nearly an AR(1) but an AR(5) is used for forecasting, the AR(1) thresholds accurately frame the choice between OLS and UR. Second, the factors affecting forecasting for higher-order AR processes are potentially many and complex. Clearly, it remains to explain the short-horizon dip in the AR(2) and AR(5) threshold functions. Moreover, although the AR(1) thresholds appear to be quite reasonable approximations for AR(2) thresholds when ρ_2 is close to but not equal to zero, both the magnitude of ρ_2 and its sign clearly affect the thresholds. This is not surprising, given that the eigenvalues of the system, which determine its dynamics, are not uniquely determined by the local-to-unity persistence.[15] Thus, in order to more fully understand these results, challenging theoretical work may be necessary. However, such research is likely to lead to a far greater understanding of the forecasting of near non-stationary AR processes. Given the prevalence of such processes in studies of empirical macroeconomic and financial data, this research is likely to be productive.

It will also be interesting to investigate the relative forecasting performance of RW against the OLS-estimated AR(q). It is well established that, in forecasting persistent economic and financial time series, RW is very difficult to beat. If the advantages of RW forecasts extend theoretically to near-non-stationary AR(q) processes, such research may help explain why.

[15] For instance, Hamilton (1994, pp. 13-18) shows that, in the AR(2), the dynamic multiplier follows different patterns depending upon the eigenvalues of the system. Namely, if the eigenvalues are real and of less than unit modulus, which is true whenever ρ_2 and ρ_1 are both greater than zero, the dynamic multiplier follows a pattern of geometric decay. If they are complex and of less than unit modulus, which may hold if ρ_2 is below zero, the dynamic multiplier follows a pattern of damped oscillation.
References

Bobkoski, M.J. 1983. Hypothesis Testing in Non-stationary Time Series. Unpublished PhD dissertation, Department of Statistics, University of Wisconsin.

Campbell, John Y. and Pierre Perron. 1991. Pitfalls and opportunities: what macroeconomists should know about unit roots. In O. Blanchard and S. Fischer (eds.), NBER Macroeconomics Annual. Boston, MA.

Chan, N.H. and Wei, C.Z. 1987. Asymptotic inference for nearly non-stationary AR(1) processes. Annals of Statistics 15, 1050-63.

Diebold, Francis X. and Lutz Kilian. 2000. Unit root tests are useful for selecting forecasting models. Journal of Business and Economic Statistics 18 (3), 265-73.

Elliott, Graham, Rothenberg, T.J. and J.H. Stock. 1996. Efficient tests for an autoregressive unit root. Econometrica 64, 813-36.

Elliott, Graham. 1999. Efficient tests for a unit root when the initial observation is drawn from its unconditional distribution. International Economic Review 40, 767-783.

Fuller, Wayne A. and David P. Hasza. 1981. Properties of predictors for autoregressive time series. Journal of the American Statistical Association 76, 155-161.

Hall, Robert E. 1978. Stochastic implications of the life cycle permanent income hypothesis: theory and evidence. Journal of Political Economy 86, 971-988.

Hamilton, James D. 1994. Time Series Analysis. Princeton University Press, Princeton, NJ.

Kemp, Gordon C.R. 1999. The behavior of forecast errors from a nearly integrated AR(1) model as both sample size and forecast horizon become large. Econometric Theory 15, 238-256.

Magnus, Jan R. and M. Hashem Pesaran. 1989. The exact multi-period mean-square forecast error for the first-order autoregressive model with an intercept. Journal of Econometrics 42 (2), 157-87.

Mark, Nelson. 1995. Exchange rates and fundamentals: Evidence on long-horizon predictability. American Economic Review 85 (1), 201-218.

Ng, Serena and Timothy J. Vogelsang. 2002. Forecasting autoregressive time series in the presence of deterministic components. Econometrics Journal 5, 196-224.

Phillips, P.C.B. 1979. The sampling distribution of forecasts from a first-order autoregression. Journal of Econometrics 9, 241-261.

Phillips, P.C.B. 1987. Toward a unified asymptotic theory for autoregression. Biometrika 74, 535-47.

Phillips, P.C.B. 1998. Impulse response and forecast error variance asymptotics in nonstationary VARs. Journal of Econometrics 83, 21-56.

Sims, Christopher A., James H. Stock and Mark W. Watson. 1990. Inference in linear time series with some unit roots. Econometrica 58, 113-44.

Stock, James H. 1990. Unit roots in economic time series: do we know and do we care? A comment. Carnegie-Rochester Conference Series on Public Policy 32, 63-82.

Stock, James H. 1991. Confidence intervals for the largest autoregressive root in US macroeconomic time series. Journal of Monetary Economics 28, 435-459.

Stock, James H. 1996. VAR, error correction, and pretest forecasts at long horizons. Oxford Bulletin of Economics and Statistics 58, 685-701.
Case I

Figure 1 - Forecasting Thresholds. [Plot of drift (c) against horizon (η), showing the predicted c* and actual c* functions for Case I.]

Table 1a - Forecasting Thresholds

Horizon (η)     0.001   0.01    0.05    0.1     0.2     0.3     0.4
Predicted c*    -2.78   -2.66   -2.27   -2.07   -1.93   -1.89   -2.01
Actual c*       -2.8    -2.66   -2.3    -2.07   -1.9    -1.87   -1.97

Table 1b - MSE(RW) / MSE(OLS)

Drift (c)   η=0.01   η=0.05   η=0.1   η=0.2   η=0.4
  0          0.98     0.92     0.85    0.74    0.51
 -1          0.99     0.95     0.92    0.86    0.75
 -2          0.99     0.99     1.00    1.01    0.98
 -3          1.00     1.02     1.06    1.13    1.25
 -4          1.01     1.06     1.12    1.26    1.45
 -5          1.01     1.07     1.17    1.35    1.62
 -6          1.02     1.11     1.23    1.44    1.72
 -8          1.03     1.16     1.32    1.61    1.89
-10          1.04     1.22     1.43    1.72    1.97

Case II

Figure 2 - Forecasting Thresholds. [Plot of drift (c) against horizon (η), showing the predicted c* and actual c* functions for Case II.]

Table 2a - Forecasting Thresholds

Horizon (η)     0.001   0.01    0.05    0.1     0.2     0.3     0.4
Predicted c*    -5.6    -5.2    -4.24   -3.56   -2.97   -2.84   -2.81
Actual c*       -5.69   -5.31   -4.23   -3.59   -2.98   -2.81   -2.76

Table 2b - MSE(RW) / MSE(OLS)

Drift (c)   η=0.01   η=0.05   η=0.1   η=0.2   η=0.4
  0          0.97     0.90     0.86    0.80    0.70
 -1          0.97     0.92     0.89    0.86    0.81
 -2          0.98     0.93     0.92    0.93    0.94
 -3          0.98     0.97     0.97    1.01    1.06
 -4          0.99     1.00     1.02    1.09    1.19
 -5          1.00     1.02     1.08    1.18    1.30
 -6          1.00     1.05     1.13    1.26    1.40
 -7          1.01     1.08     1.17    1.33    1.47
 -8          1.01     1.10     1.22    1.41    1.55
 -9          1.02     1.13     1.25    1.47    1.61
-10          1.03     1.15     1.31    1.51    1.64

Case III

Figure 3 - Forecasting Thresholds. [Plot of drift (c) against horizon (η), showing the predicted c* and actual c* functions for Case III.]

Table 3a - Forecasting Thresholds

Horizon (η)     0.001    0.01    0.05    0.1     0.2     0.3     0.4
Predicted c*    -10.32   -9.65   -7.87   -7.05   -6.73   -7.04   -7.83
Actual c*       -10.5    -9.63   -7.86   -7      -6.82   -7.05   -7.87

Table 3b - MSE(RW) / MSE(OLS)

Drift (c)   η=0.01   η=0.05   η=0.1   η=0.2   η=0.4
  0          0.95     0.87     0.82    0.78    0.73
 -2          0.96     0.88     0.84    0.79    0.71
 -4          0.96     0.90     0.89    0.86    0.78
 -5          0.97     0.93     0.93    0.90    0.83
 -6          0.98     0.95     0.97    0.95    0.89
 -7          0.98     0.98     0.99    1.01    0.94
 -8          0.99     1.00     1.03    1.06    1.01
 -9          0.99     1.02     1.07    1.11    1.06
-10          0.99     1.05     1.11    1.15    1.12
-11          1.00     1.07     1.15    1.21    1.16
-12          1.01     1.09     1.18    1.25    1.21

Figure 4 - MSE(Strategy 2) / MSE(Strategy 1) (Case II). [Surface plot of the ratio R*(c, η) over drift c and horizon η.]

Strategy 1: Use a 5% Dickey-Fuller pretest to decide between RW and OLS. Thus, use OLS if the Dickey-Fuller t-statistic τ̄ is less than the 5% critical value (-2.86 for Case II and -3.41 for Case III) and RW otherwise.

Strategy 2: Obtain a median unbiased (MU) estimate of c, using the tables from Stock (1991). For forecasting horizon η, use OLS if the MU estimate of c is less than the forecasting threshold c*(η) given in Tables 2a and 3a (Cases II and III respectively) and use RW otherwise. This corresponds to using OLS if the Dickey-Fuller t-statistic τ̄ is less than the τ̄* given below.

Horizon (η)     0.001   0.01    0.05    0.1     0.2     0.3     0.4
Case II τ̄*      -2.13   -2.09   -1.99   -1.92   -1.87   -1.85   -1.85
Case III τ̄*     -2.86   -2.81   -2.67   -2.6    -2.57   -2.6    -2.66

Figure 5 - AR(q) Forecasting Thresholds, True Process is an AR(1). [Plot of c* against horizon (η) for the AR(1), AR(2) and AR(5) specifications.]

Figure 6 - AR(2) Forecasting Thresholds. [Plot of drift (c) against horizon (η) for ρ_2 = -.1, ρ_2 = 0 and ρ_2 = .1.]
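A final practical note, which is mine rather than the paper's: Strategy 2 requires c*(η) at the forecaster's actual horizon, which may fall between the tabulated values. Linear interpolation of Table 2a (or Table 3a for Case III), in the same spirit as the interpolation used to construct the thresholds, is a natural way to obtain it.

```python
import numpy as np

# "Actual" Case II thresholds from Table 2a
ETA = np.array([0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4])
C_STAR_CASE2 = np.array([-5.69, -5.31, -4.23, -3.59, -2.98, -2.81, -2.76])

def c_star_case2(eta):
    """Linearly interpolated forecasting threshold c*(eta) for Case II."""
    return float(np.interp(eta, ETA, C_STAR_CASE2))

print(c_star_case2(0.15))   # threshold for a horizon between the tabulated points
```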