The ARAR Error Model for Univariate Time Series and Distributed Lag Models

R. A. L. Carter, University of Western Ontario and University of Calgary
A. Zellner, University of Chicago

Abstract

We show that the use of prior information derived from former empirical findings and/or subject matter theory regarding the lag structure of the observable variables, together with an AR process for the error terms, can produce univariate and single equation models that are intuitively appealing, simple to implement and work well in practice.

Key words: time series analysis, model formulation.
JEL Classification: C11, C22.

1 Introduction

"None of the previous work should be construed as a demonstration of the inevitability of MA disturbances in econometric models. As Parzen's proverb[1] reminds us the disturbance term is essentially man-made, and it is up to man to decide if some of his creations are more reasonable than others." Nicholls, Pagan and Terrell (1975), p. 117.

Much theoretical and empirical work in economics features low order autoregressions in endogenous variables, often accompanied by exogenous variables. For example, Caplin and Leahy (1999) and Sargent (1987, Chapter IX) present AR(1) models. AR(2) models for cycles appear in Samuelson (1939a,b), Burns and Mitchell (1946), Goodwin (1947), Cooper and Johri (1999) and Sargent (1987, Chapter IX). Papers by Metzler (1941), Garcia-Ferrer et al. (1987), Geweke (1988), Zellner and Hong (1989), Zellner, Hong and Min (1991) and Zellner and Chen (2001) feature AR(3) models. Gandolfo (1996) provides many examples of models which can be expressed as autoregressions plus exogenous variables, including cobweb models, foreign trade and taxation multipliers, multiplier-accelerator models, market adjustment models and inventory models.

[1] "God made X (the data), man made all the rest (especially, the error term)," quoted in Nicholls et al. (1975). We add that in many instances humans, not God, construct the data.

Thus, economists building time series models often have quite strong prior beliefs, based on theoretical and applied work, about the lag structure for an observable variable. However, they usually have quite weak prior beliefs about the lag structure for the error in a model. The AR model for the error was popular until Box and Jenkins (1976) methods became dominant in univariate time series analysis and introduced the MA model without much theoretical justification. This produced complicated likelihood functions and estimation procedures and implied infinite AR processes for observed variables. We show here how the use of prior information about the lag structure of the observable variables and a return to the AR model for the error is intuitively appealing, yields finite AR processes for observed variables and simplifies inference procedures.

2 Univariate Models

A reasonable starting point for an empirical analysis is to specify that the random variable of interest has been generated by an autoregressive process that reflects prior beliefs about the lag structure. This gives

    φ(L)(Y_t − µ) = U_t,    (2.0.1)

where φ(L) is a polynomial of degree p in the lag operator L, µ is the origin from which Y_t is measured (the mean if Y_t is stationary) and U_t is a covariance-stationary error with zero mean. Researchers may have quite strong beliefs about the value of p and any restrictions on φ(L) necessary to make Y_t stationary. In such cases the parameters of interest are the coefficients of φ(L), the φ_i, or some function of them, such as the roots of φ(L).
However, it is rare for researchers to have such strong prior beliefs about the error process, and thus a variety of models for U_t may be entertained: see e.g. Fuller and Martin (1961), Zellner et al. (1965) and Zellner and Geisel (1970). One possible model for the error is

    U_t = ε_t,    (2.0.2)

where ε_t is white noise with zero mean and variance σ². But this rather restrictive model implies that a shock to the subject-matter portion of the model, φ(L)(Y_t − µ), has an impact in only the current period. Of course, if φ(L) is invertible the MA representation of Y_t,

    Y_t = µ + [1/φ(L)] ε_t,    (2.0.3)

has infinite length.

2.1 The ARMA model

A popular alternative model for U_t is the MA model of Box and Jenkins (1976),

    U_t = θ(L) ε_t,    (2.1.1)

where θ(L) is an invertible polynomial of degree q in L. This model was introduced "To achieve greater flexibility in fitting actual time series, . . ."[2] rather than from any explicit prior knowledge about the behavior of the error. But this model is only slightly less restrictive than the white noise model because it implies that the effect on φ(L)(Y_t − µ) of a shock ε_t dies out completely after q periods. Also, strong restrictions on the values of the autocorrelations of Y_t are needed for θ(L) to be invertible. Now the infinite MA representation

    Y_t = µ + [θ(L)/φ(L)] ε_t    (2.1.2)

allows somewhat richer behavior than does the white noise case. However, the AR representation,

    [φ(L)/θ(L)] (Y_t − µ) = ε_t,    (2.1.3)

is also infinite, which may be undesirable from a subject matter viewpoint and which complicates inference about the φ_i. Of course, an MA model for U_t is desirable on a priori grounds if the observations on Y_t were the result of temporal aggregation or if the model features unobserved states. However, in many cases an MA model for U_t arises from an examination of sample autocorrelations and is not justified by any prior theory. In these cases the model described in the next section is worthy of consideration.

[2] Box and Jenkins (1976), page 11.

2.2 The ARAR model

We propose an alternative, AR model for U_t which is parsimonious, allows rich time series behavior and simplifies inference procedures:

    ω(L) U_t = ε_t,    (2.2.1)

where ω(L) is a polynomial of degree r in L. Assuming ω(L) is invertible, the infinite MA representation of U_t is

    U_t = [1/ω(L)] ε_t.    (2.2.2)

Substituting (2.2.2) into (2.0.1) gives the "structural form" of the model, which we label[3] ARAR(p,r), as

    φ(L)(Y_t − µ) = U_t = [1/ω(L)] ε_t.    (2.2.3)

[3] Parzen (1982) introduces a model with a similar label. His ARARMA model consists of two filters applied sequentially. The first filter is a nonstationary AR with a stationary output which is input to a second ARMA filter with a white noise output. However, his model does not partition the second filter into a portion containing prior information about the dynamics of Y_t and a portion modeling the error dynamics.

In (2.2.3) the impact on φ(L)(Y_t − µ) of a shock ε_t may die out very slowly, rather than being cut off abruptly, so our model can produce rich behavior in U_t with values of r as small as two. Also, (2.2.2) is consistent with Wold's (1938) Decomposition Theorem, in contrast to MA models for U_t which impose a truncation[4] on the Wold representation of U_t. If Y_t is covariance-stationary, then the MA representations of all the above models for Y_t are infinite and, therefore, consistent with Wold's theorem. However, (2.2.2) has the advantage of being the simplest model that also imposes this consistency on U_t.

[4] Of course, it is possible for the Wold decomposition to yield a finite MA form in some cases. However, we regard a model which requires a finite MA form for every case to be less desirable.
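Numerically, the contrast between (2.2.2) and an MA error is easy to see: the MA(∞) weights implied by an AR error follow a simple recursion and taper off gradually, while an MA(q) error's weights vanish beyond lag q. A minimal sketch in Python (the coefficients are our own illustrative choices, not values from the paper):

```python
import numpy as np

def ma_weights_of_ar_error(omega, horizon):
    """MA(infinity) weights psi_j of U_t = [1/omega(L)] eps_t, where
    omega(L) = 1 - omega[0]L - omega[1]L^2 - ...; the weights obey the
    recursion psi_j = sum_i omega[i-1] * psi_{j-i} with psi_0 = 1."""
    psi = np.zeros(horizon + 1)
    psi[0] = 1.0
    for j in range(1, horizon + 1):
        psi[j] = sum(w * psi[j - i] for i, w in enumerate(omega, start=1) if j >= i)
    return psi

# An AR(2) error with illustrative coefficients: the weights taper gradually.
print(ma_weights_of_ar_error([0.9, -0.5], 10))
# An MA(q) error, by contrast, has weights identically zero beyond lag q.
```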
The "reduced form" of the model is, from (2.0.1) and (2.2.2),

    ω(L)φ(L)(Y_t − µ) = ε_t,    (2.2.4)

or

    α(L) Y_t = α_0 + ε_t,    (2.2.5)

where α(L) = ω(L)φ(L) is a polynomial in L of degree p + r with restricted coefficients and α_0 = ω(1)φ(1)µ. If φ(L) and ω(L) are invertible the MA representation of Y_t is

    Y_t = µ + [1/(ω(L)φ(L))] ε_t.    (2.2.6)

The restrictions that the ARAR model imposes on the coefficients of α(L) will affect the rate at which the impulse response coefficients in (2.2.6) decline, but that decline will be more rapid than in models with fractional differencing. In contrast to (2.1.3), both (2.0.1) and (2.2.3) are finite AR processes. Thus the ARAR model is simpler than the ARMA model and it should be assigned higher prior probability according to the "Simplicity Postulate" of Wrinch and Jeffreys (1921). Also, few scientific theories are framed as infinite AR processes, but many are in the form of finite difference or differential equations. Finally, the form of (2.2.4) makes it simpler to estimate than either (2.1.2) or (2.1.3).

To see the parsimony of (2.2.2) as compared to (2.1.1), consider an example in which ω(L) = 1 − ω_1 L − ω_2 L² = (1 − ξ_1 L)(1 − ξ_2 L) with ξ_1 = .7 and ξ_2 = −.5. Then the coefficient of ε_{t−5} in (2.2.2) is .111, so it would take an MA process with at least five parameters to approximate the inverse of the AR with two parameters. Next assume the roots are complex conjugate with ξ_1 = .7 + .5i and ξ_2 = .7 − .5i. Now the coefficient of ε_{t−5} is .242, so, again, an approximating MA would have to have at least five parameters, compared to two in ω(L). The only behavior that is allowed by the ARMA model but ruled out by the ARAR model is an infinite AR in Y_t, although if p and r are high (2.2.4) will be a very long AR. Seasonal effects are modeled by expressing φ(L) or ω(L) as products of nonseasonal and seasonal polynomials.

Because some prior information about φ(L) is assumed to be available, the polynomials φ(L) and ω(L) can be analyzed separately. Some of the coefficients of α(L) may be quite small in absolute value, even though no φ_i or ω_i is small. This can result in small and imprecise estimates of these α_i if the restrictions implied by the ARAR structure are ignored, tempting researchers to impose invalid zero restrictions on α(L). As we show below, direct estimation of φ and ω is easy, so there is no need to focus solely on (2.2.5).

Before considering inferences about φ(L) and ω(L) we note that, without further information, they are not identified. To see this, interchange φ(L) and ω(L) in (2.2.3): this would yield the same unrestricted AR(p + r) as (2.2.5), so the likelihood function based on (2.2.5) would be unchanged. Also, multiplying both sides of (2.2.3) by a nonzero scalar υ_0 would leave (2.2.5) unchanged but it would result in υ_0 φ(L) = υ_0 − υ_0 φ_1 L − . . . − υ_0 φ_p L^p. We avoid this identification problem by adopting the common assumption that the model has been normalized to have the first term in φ(L) equal to 1.0. Identification failure also arises if both sides of (2.2.3) are multiplied by an invertible lag polynomial υ(L) to give

    υ(L)φ(L)Y_t = υ(1)µ + [υ(L)/ω(L)] ε_t,    (2.2.7)

leaving (2.2.5) and the likelihood function unchanged. This is analogous to model multiplicity in ARMA models containing the products of seasonal and nonseasonal lag polynomials.
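Since α(L) is just the product ω(L)φ(L), its restricted coefficients can be generated by convolving the two coefficient vectors. A small sketch (again with illustrative coefficients of our own):

```python
import numpy as np

# phi(L) and omega(L) stored as [1, -c1, -c2, ...] in ascending powers of L;
# their product gives the restricted coefficients of alpha(L) in (2.2.5).
phi = np.array([1.0, -1.4, 0.74])     # illustrative phi(L), degree p = 2
omega = np.array([1.0, -0.2, -0.35])  # illustrative omega(L), degree r = 2
alpha = np.convolve(phi, omega)       # alpha(L) of degree p + r = 4
print(alpha)
```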
Following Box and Jenkins (1976), we assume that all common factor polynomials like υ(L) have been canceled out of the model. Even after normalization and common factor cancellations, an identification problem remains. Write φ(L) and ω(L) in terms of their roots as

    φ(L) = (1 − λ_1 L)(1 − λ_2 L) . . . (1 − λ_p L)    (2.2.8)

and

    ω(L) = (1 − ξ_1 L)(1 − ξ_2 L) . . . (1 − ξ_r L).    (2.2.9)

Thus

    α(L) = φ(L)ω(L) = (1 − λ_1 L) . . . (1 − λ_p L)(1 − ξ_1 L) . . . (1 − ξ_r L).    (2.2.10)

Now assume the λ_i and ξ_i are unknown. Then write

    α(L) = (1 − η_1 L)(1 − η_2 L) . . . (1 − η_{p+r} L)    (2.2.11)

and assume we know the values of the roots η_i, i = 1, 2, . . . , p + r. The identification problem lies in deciding which terms (1 − η_i L) are part of φ(L) and which are part of ω(L). The solution is to use the same prior information that was used to specify φ(L) to make this decision. In many cases φ(L) will be specified to be of degree two, giving a damped cycle with a period in some a priori most probable range. Thus the roots η_i belonging to φ(L) must be a complex conjugate pair with modulus less than one and a period lying in the most probable range. Any roots η_i which are real, or complex conjugate with a smaller period, must belong to ω(L).

In practice the values of the η_i are unknown. But once we specify p and r we can fit an AR(p + r) by ordinary least squares (OLS) and find the roots of the resulting α̂(L). They can be used to find initial values of the φ̂_i and ω̂_i for use in nonlinear least squares (NLS) estimation of ω(L)φ(L). This idea can even be used on a long AR model produced by someone else. Using their estimated AR coefficients we can obtain estimated roots, moduli and periods to help us judge whether their model is really the reduced form of an ARAR model. Alternatively, a priori plausible initial values of the φ_i and ω_i can be used in NLS to obtain estimates ω̂(L) and φ̂(L). Then their roots can be compared to those of α̂(L) from (2.2.5) as a check that the specification is correct.

After ruling out under identified models, the total number of free parameters in φ(L) and ω(L) may be equal to the number of free parameters in α(L). Such models are just identified so, under normality, ML estimates of the φ_i and ω_i can be derived from OLS estimates of the α_i. Alternatively, if NLS estimates of the φ_i and ω_i are used to derive estimates of the α_i, they should equal the OLS estimates. However, if the number of parameters in α(L) is greater than that in φ(L) and ω(L) (e.g. if some of the φ_i or ω_i are specified to be zero without reducing the value of p or r), the model is over identified, so it is not generally possible to solve for unique estimates of the φ_i and ω_i from OLS estimates of the α_i. In such cases it will be necessary to estimate the φ_i and ω_i directly by NLS.

An interesting simple case is obtained by assuming that p and r are both one, so that (2.2.4) becomes

    ω(L)φ(L)(Y_t − µ) = (1 − ωL)(1 − φL)(Y_t − µ)    (2.2.12)
                      = (1 − [ω + φ]L + ωφL²)Y_t − (1 − ω)(1 − φ)µ = ε_t,    (2.2.13)

from which

    Y_t = (1 − ω)(1 − φ)µ + (ω + φ)Y_{t−1} − φω Y_{t−2} + ε_t
        = α_0 + α_1 Y_{t−1} + α_2 Y_{t−2} + ε_t.    (2.2.14)

We believe this ARAR(1,1) model is a useful alternative to the popular ARMA(1,1) model with the same number of parameters. Imposing stationarity and invertibility on the ARMA(1,1) model results in very strong restrictions on the range of admissible values for the autocorrelations of Y_t at lags 1 and 2, ρ_1 and ρ_2: see Box and Jenkins (1976), Figure 3.10(b).
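A minimal sketch of this just-identified case: simulate the reduced form (2.2.14), fit the unrestricted AR(2) by OLS, and recover φ and ω as the inverse roots of α̂(L), using prior information (here, simply that φ is the larger root) to label them. All numerical values are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, omega, mu, sigma, n = 0.8, -0.4, 10.0, 1.0, 2000  # illustrative values

# Simulate the reduced form (2.2.14)
a0 = (1 - omega) * (1 - phi) * mu
a1, a2 = omega + phi, -omega * phi
y = np.full(n, mu)
for t in range(2, n):
    y[t] = a0 + a1 * y[t - 1] + a2 * y[t - 2] + sigma * rng.standard_normal()

# OLS on the unrestricted AR(2), eq. (2.2.5)
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
c, a1_hat, a2_hat = np.linalg.lstsq(X, y[2:], rcond=None)[0]

# phi and omega are the inverse roots of alpha(L), i.e. the solutions of
# z^2 - a1 z - a2 = 0; prior information labels them (phi = larger root here).
phi_hat, omega_hat = sorted(np.roots([1.0, -a1_hat, -a2_hat]), key=abs, reverse=True)
print(phi_hat, omega_hat)   # should be close to 0.8 and -0.4
```

The same root-labeling step is what resolves the identification problem discussed above when p + r is larger.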
The analogous restrictions on the ARAR(1,1) model result in restrictions on ρ_1 and ρ_2 which are much weaker. Assuming that −1 < φ < 1 and −1 < ω < 1 results in the feasible set of α_1 and α_2 values being a subset of that for an unrestricted AR(2) which excludes the subset associated with complex roots. This is shown in Figure 1, which should be compared with Box and Jenkins (1976), Figure 3.9. These restrictions also restrict the ρ_1, ρ_2 space to a subset of that for an unrestricted AR(2). This space is shown in Figure 2a, which should be compared with Box and Jenkins (1976) Figures 3.3(b) and, especially, 3.10(b), which is reproduced below as Figure 2b. The admissible parameter space shown in Figure 2a is considerably larger than that shown in Figure 2b, and negative values of ρ_2 are excluded. This last point should aid in model formulation, as sample autocorrelations which are negative at both lags one and two would be evidence against an ARAR(1,1) model being appropriate. We note that for many economic time series the first two sample autocorrelations are positive, so that the ARAR model is not ruled out.

Since both φ(L) and ω(L) have been normalized and there are no common factors, all that is needed for identification is some prior information on the roots of α(L), φ and ω. For example, we might identify φ as the (absolutely) larger root of α(L). Since this model is just identified, we could obtain estimates of φ and ω (plus µ) from estimates of the α_i by OLS, ML, MAD, traditional Bayes and the Bayesian Method of Moments (BMOM): see Zellner (1996, 1997), Zellner and Tobias (2001) and Green and Strawderman (1996). The stationarity restrictions imply that −2 < α_1 < 2 and −1 < α_2 < 1. Note that if, for example, φ > 0 and ω < 0, α_1 may be quite small in absolute value, but imposing α_1 = 0 would be a specification error. Alternatively, one could estimate φ and ω by NLS and use the results to form estimates of the α_i, which should be the same as the OLS estimates. Posterior densities for φ and ω can be obtained from the unrestricted posterior for α_1 and α_2 by simulation. If the stationarity restriction is to be imposed, draws which violate the restriction would be discarded.

Now assume a quarterly seasonal model for the error, ω(L) = 1 − ω_4 L⁴, with the nonseasonal, subject-matter structure φ(L) = 1 − φL, as before. This gives a restricted AR(5) for the reduced form,

    α(L) = (1 − φL − ω_4 L⁴ + φω_4 L⁵) = (1 − α_1 L − α_4 L⁴ − α_5 L⁵),    (2.2.15)

with only three parameters, in contrast to an unrestricted AR(5) with five. However, this model is over identified, so it would be necessary to obtain estimates of φ and ω_4 directly using NLS. To ensure that ω(L) is invertible, impose the uniform prior p(ω_4) = .5 over the range −1 < ω_4 < 1. To impose stationarity on Y_t, use the prior p(φ) ∝ (1 − φ)^{a−1}(1 + φ)^{b−1} for −1 < φ < 1: see Zellner (1971, p. 190). Finally, adopt a uniform prior on log(σ), so that p(σ) ∝ σ^{−1}. Then, given a T × 1 vector y of data and assuming the ε_t to be independent normal, σ can be integrated out analytically and the joint posterior density for φ and ω_4 can be analyzed using bivariate numerical integration; a sketch is given below.
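With σ integrated out, the joint posterior of (φ, ω_4) is proportional to the prior times the sum of squared residuals raised to the power −T/2, which can be evaluated on a grid. To keep the sketch short it conditions on µ at the sample mean (the text leaves the treatment of µ open), and the beta-prior parameters a and b are placeholders:

```python
import numpy as np

def joint_posterior_grid(y, a=2.0, b=2.0, n=81):
    """Posterior of (phi, omega4) for the seasonal ARAR of (2.2.15) with
    sigma integrated out under p(sigma) ~ 1/sigma.  Sketch only: mu is fixed
    at the sample mean, and a, b are placeholder prior parameters."""
    z = y - y.mean()
    phis = np.linspace(-0.99, 0.99, n)
    w4s = np.linspace(-0.99, 0.99, n)
    logpost = np.empty((n, n))
    for i, phi in enumerate(phis):
        for j, w4 in enumerate(w4s):
            # residuals of (1 - phi L)(1 - w4 L^4)(Y_t - mu) = eps_t
            e = z[5:] - phi * z[4:-1] - w4 * z[1:-4] + phi * w4 * z[:-5]
            logpost[i, j] = ((a - 1) * np.log1p(-phi) + (b - 1) * np.log1p(phi)
                             - 0.5 * e.size * np.log(e @ e))
    post = np.exp(logpost - logpost.max())
    return phis, w4s, post / post.sum()
```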
The first step in building a general ARAR model is to select a degree p for the polynomial φ(L) based on prior knowledge. Data plots, sample autocorrelations and partial autocorrelations may also be useful at this stage, e.g. to confirm the presence of unit roots or cycles which suggest p ≥ 2. A tentative AR(r) model for the error U_t should now be specified, with r believed large enough to make ε_t white noise. This will imply that the degree of α(L) is p + r. The suitability of r can be gauged from the autocorrelations of the residuals from OLS applied to (2.2.5). This procedure differs from the Box and Jenkins (1976) method, which bases the lag lengths primarily on the sizes of data autocorrelations and partial autocorrelations. Their procedure may be suitable if the researcher has no prior beliefs about φ(L), although it involves considerable pretesting that can have adverse effects upon subsequent inference; see Judge and Bock (1978). Of course, models initially based on features of a sample can in the future be rationalized by theory and confirmed with new samples.

In Bayesian analysis the choice of p and r should be guided by posterior odds ratios. If the odds analysis leads to several favored models, results regarding parameters and predictions can be averaged over the models, leading to forecasts which are often superior to non-combined forecasts. If the prior odds ratio is set to one, the posterior odds ratio becomes the Bayes factor, B_{1,2}, which can be calculated exactly for small models: see Monahan (1983). For large samples Schwarz (1978) provides the approximation

    B_{1,2} ≈ T^{q/2} e^{lr},    (2.2.16)

where the log-likelihood ratio lr = log[ℓ(Θ̂_1|y)/ℓ(Θ̂_2|y)], Θ̂_i is the maximum-likelihood estimate of Θ from model i, q = q_2 − q_1 and q_i is the number of parameters in model i.

We note that the regression equation commonly used to test for the presence of a unit root versus a linear trend can be written as

    φ(L)Y_t = µ_0 + µ_1 t + [ω(L)]^{−1} ε_t,    (2.2.17)

or as

    α(L)(Y_t − µ_0 − µ_1 t) = ε_t:    (2.2.18)

see Schotman and van Dijk (1991) and Zivot (1994). These are both examples of the ARAR models (2.2.3) and (2.2.4), with µ a linear function of the trend variable t and φ(L) of degree 2 or more. If the unit root is believed to be in φ(L), repeated draws can be made from the joint posterior of its coefficients and the roots calculated for each draw. The proportions of draws for which the roots are complex versus real, and with modulus below 1.0 versus 1.0 or more, are estimates of the posterior probabilities of these properties of φ(L): see Geweke (1988), Schotman and van Dijk (1991) and Zivot (1994).
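The approximation (2.2.16) is a one-liner; the worked call below uses the rounded log likelihoods and our reading of the parameter counts for the restricted versus unrestricted ARAR(2,4) of Section 2.3, so it reproduces the Bayes factor of 32.5 reported there only roughly:

```python
import math

def schwarz_bayes_factor(lnl1, q1, lnl2, q2, T):
    """(2.2.16): B12 ~ T**(q/2) * exp(lr), q = q2 - q1, lr the log
    likelihood ratio of model 1 to model 2."""
    return T ** ((q2 - q1) / 2) * math.exp(lnl1 - lnl2)

# Restricted (q1 = 5) vs unrestricted (q2 = 7) ARAR(2,4), T = 165.  The
# reported log likelihoods (-562, -560) are rounded, so this gives ~22
# rather than the 32.5 computed from the unrounded values.
print(schwarz_bayes_factor(-562.0, 5, -560.0, 7, 165))
```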
2.3 Empirical example: Housing starts

One of the leading indicators published by the U.S. Department of Commerce is the number of permits issued by local authorities for the building of new private housing units. Pankratz (1983) has described it as "an especially challenging series to model" with Box-Jenkins procedures.[5] This makes it an attractive series to use in illustrating the ARAR technique.

[5] Pankratz (1983), p. 369. We suspect that the absence of exogenous variables, such as real per capita income, the stock of housing and the real price of housing, from the univariate time series model accounts for much of the difficulty in modeling and, especially, forecasting this series.

First we formed a prior belief about the degree of φ(L). Since this series is thought to lead the business cycle, we believed it should have a cycle. Therefore, we specified[6] p = 2 and we chose a proper prior for the parameters of φ(L) which placed a modest amount of probability on the region corresponding to conjugate complex roots. Because the data are quarterly, we chose an AR(4) as our tentative model for U_t. This allows for imperfections in the seasonal adjustment and for the presence in the error of omitted variables which are seasonally unadjusted. Thus our structural model is

    (1 − φ_1 L − φ_2 L²)(Y_t − µ) = U_t,    (2.3.1)

with

    (1 − ω_1 L − ω_2 L² − ω_3 L³ − ω_4 L⁴)U_t = ε_t.    (2.3.2)

[6] In this example we are abstracting from the problem of analyzing multiple cycles, such as long cycles in construction activity.

From (2.3.1) and (2.3.2) we obtained the restricted AR(6) reduced form

    Y_t = µφ(1)ω(1) + (φ_1 + ω_1)Y_{t−1} + (φ_2 + ω_2 − ω_1 φ_1)Y_{t−2} + (ω_3 − ω_1 φ_2 − ω_2 φ_1)Y_{t−3}
          + (ω_4 − ω_2 φ_2 − ω_3 φ_1)Y_{t−4} − (ω_3 φ_2 + ω_4 φ_1)Y_{t−5} − ω_4 φ_2 Y_{t−6} + ε_t,    (2.3.3)

which is just identified and implies the unrestricted reduced form AR(6)

    Y_t = α_0 + α_1 Y_{t−1} + α_2 Y_{t−2} + α_3 Y_{t−3} + α_4 Y_{t−4} + α_5 Y_{t−5} + α_6 Y_{t−6} + ε_t.    (2.3.4)

Note that if φ(L) has a unit root then φ(1) = 0, and if ω(L) has a unit root then ω(1) = 0. Thus, given a strong prior belief that µ ≠ 0 or a large sample mean for Y_t, an estimate of α_0 close to 0 would lead us to question the stationarity of both Y_t and U_t.

Our sample of 191 observations was obtained from the U.S. Department of Commerce, Bureau of Economic Analysis, Survey of Current Business, October 1995 and January 1996. These data are seasonally adjusted index numbers (based on 1987) extending from January 1948 until October 1995. We converted them to quarterly form by averaging over the months of each quarter; they are plotted in Figure 3. Neither Figure 3 nor the sample autocorrelations and partial autocorrelations[7] in Table 1 display the pattern typical of a unit root, but they do show the cyclical pattern rather clearly. These results, and those which follow, used the observations from the third quarter of 1949 until the third quarter of 1990: 165 observations. The earlier observations provided lagged values, and the observations from the fourth quarter of 1990 to the third quarter of 1995 were used for post-sample forecasts.

[7] For all the calculations reported in Tables 1 to 4 we used TSP Version 4.3A for OS/2.

If our specification in (2.3.1) and (2.3.2) is adequate, OLS applied to (2.3.4) should yield serially uncorrelated residuals and the roots of α̂(L) should contain a conjugate complex pair with a modulus less than one and a period between 12 and 24 quarters. These are estimates of the roots of φ(L), giving a damped cycle with a period of three to six years. There may also be additional conjugate complex pairs of roots with moduli less than one but with shorter periods, which model the dynamics of U_t. The estimated lag polynomial, with standard errors in parentheses, is

    α̂(L) = 1 − 1.189L − .002388L² + .4343L³ − .01077L⁴ − .2700L⁵ + .2014L⁶    (2.3.5)
              (.07685)  (.1208)     (.1236)    (.1228)     (.1228)    (.07871)

The sample autocorrelations and partial autocorrelations for the OLS residuals are shown in the last two columns of Table 1: they indicate a lack of serial correlation. The roots of α̂(L), plus their moduli and periods, are given in Table 2. There are three complex conjugate pairs, all of which have moduli less than one. The pair in the first line have a period of about 23 quarters, which corresponds to our prior belief about the period of the business cycle. The other two pairs have shorter periods, which we attribute to dynamics of the error. Thus on the basis of these results it appears that our specification is adequate.
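The roots, moduli and periods reported in Table 2 below follow mechanically from the coefficients in (2.3.5); a short sketch:

```python
import numpy as np

# Inverse roots of alpha(L): solve z^6 * alpha(1/z) = 0 with the (2.3.5)
# coefficients in descending powers of z.
coefs = [1.0, -1.189, -0.002388, 0.4343, -0.01077, -0.2700, 0.2014]
lam = np.roots(coefs)
theta = np.abs(np.angle(lam))
period = np.where(theta > 1e-8, 2 * np.pi / theta, np.inf)  # in quarters
for root, m, p in zip(lam, np.abs(lam), period):
    print(f"{root:.3f}   modulus {m:.3f}   period {p:.2f}")
```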
Unit root and trend analysis was carried out using (2.2.18), assuming a normal likelihood, with α_0 = α(1)µ_0, β_1 = α(1)µ_1 and ρ = φ_1 + φ_2. The results are in Table 3, where the means of the posteriors are denoted by ˆ and the standard deviations by "Std Dev". P_D(ρ ≥ 1) is the posterior probability that there is a root of one or more when a diffuse prior is used, while P_J(ρ ≥ 1) is obtained[8] when Jeffreys' prior is used: see Phillips (1991). These results led us to infer that there is neither a unit root in Y_t nor a linear trend.[9]

[8] These values were calculated using the COINT routines for GAUSS which scale the trend value to range from 1 to 1/T.

[9] We drew the same inference from the results of augmented Dickey-Fuller (Said and Dickey (1984)), Phillips-Perron (1988) and weighted symmetric (Pantula et al. (1994)) tests.

If the parameters φ_i and ω_i are not of explicit interest and all that is wanted are forecasts of future values of Y_t, they can easily be obtained by both Bayesian and frequentist analysis of the unrestricted AR(6). Our primary interest here is to draw inferences about φ_1 and φ_2, with µ, ω_1, ω_2, ω_3, ω_4 and σ being nuisance parameters. For this purpose we used NLS, which also gives the means and standard deviations of large-sample normal approximations to the posteriors for the φ_i and ω_i, assuming diffuse priors and a normal likelihood. The results are in the second column of Table 4. The approximate posterior mode for σ is denoted by σ̂. Since this model is just identified, these estimates could have been derived from those given in (2.3.5). The posterior means of ω_2 and ω_3 are small compared to their standard deviations, so we imposed ω_2 ≡ ω_3 ≡ 0, which over identified the model. Thus NLS was used to obtain the results in the third column of Table 4. The approximation in (2.2.16) gave a Bayes factor of 32.5 in favor of the restricted model against the unrestricted model. The results in Table 4 can be used to find approximate posterior probabilities for various aspects of the dynamic behavior of Y_t.

Table 1: Housing Starts; Autocorrelations and Partial Autocorrelations
(standard errors in parentheses)

              Data                             AR(6) Residuals
Lag   Autocorr.        Partial         Autocorr.         Partial
 1    .911 (.0765)     .911 (.0765)    .00213 (.0778)    .00213 (.0778)
 2    .765 (.125)      −.376 (.0765)   −.0273 (.0779)    −.0273 (.0778)
 3    .581 (.150)      −.242 (.0765)   −.0238 (.0779)    −.0237 (.0778)
 4    .397 (.162)      −.0111 (.0765)  .0146 (.0780)     .0139 (.0778)
 5    .230 (.168)      −.0207 (.0765)  .0481 (.0780)     .0468 (.0778)
 6    .0751 (.144)     −.126 (.0765)   −.0628 (.0781)    −.0630 (.0778)
 7    −.0459 (.170)    .0382 (.0765)   .00562 (.0785)    .00920 (.0778)
 8    −.141 (.170)     −.0471 (.0765)  .0232 (.0785)     .0221 (.0778)
 9    −.210 (.171)     −.0548 (.0765)  .0322 (.0785)     .0283 (.0778)
10    −.267 (.172)     −.106 (.0765)   −.0387 (.0786)    −.0384 (.0778)

Ljung-Box (1978) portmanteau statistics at lags 5 and 10:
Q5    344                              19.3
Q10   370                              30.3

Data source: U.S. Department of Commerce, Survey of Current Business.
Table 2: Housing Starts; Roots of α̂(L) by OLS

Roots            Modulus    Period
.860 ± .242i     .893       23.0
−.624 ± .382i    .731       2.42
.358 ± .586i     .687       6.14

Table 3: Housing Starts; Posteriors for Unit Root and Trend

             No Trend    Linear Trend
α̂0           15.4        15.2
Std Dev      3.39        3.43
β̂1           —           −.00568
Std Dev      —           .0123
ρ̂            .836        .833
Std Dev      .0352       .0359
PD(ρ ≥ 1)    .000        .000
PJ(ρ ≥ 1)    .000        .000

Table 4: Housing Starts; Asymptotic Posterior Moments
(standard deviations in parentheses)

               Unrestricted    Restricted      Unrestricted    Restricted
               ARAR(2,4)       ARAR(2,4)       ARMA(2,2)       ARMA(2,2)
φ̂1             1.72 (.0727)    1.63 (.0628)    1.04 (.189)     1.22 (.0771)
φ̂2             −.798 (.0699)   −.722 (.0623)   −.217 (.176)    −.371 (.0741)
ω̂1             −.531 (.0985)   −.461 (.0805)   —               —
ω̂2             −.114 (.118)    —               —               —
ω̂3             −.206 (.110)    —               —               —
ω̂4             −.252 (.0882)   −.149 (.0743)   —               —
θ̂1             —               —               −.175 (.177)    —
θ̂2             —               —               −.398 (.0910)   −.329 (.0845)
σ̂              7.38            7.41            7.52            7.48
ln likelihood  −560            −562            −565            −565
Q5             .658            4.80            2.37            3.96
Q10            1.89            5.79            6.96            7.39

Table 5: Housing Starts; Approx. Posterior Moments, Functions of φ1 and φ2

                                 Unrestricted ARAR(2,4)           Restricted ARAR(2,4)
Dynamic        Method of
Property       Approx.     Mean    Std.Dev.  Skew.   Kurt.   Mean    Std.Dev.  Skew.   Kurt.
cycle          Normal      −.233   .0631     0.0     3.0     −.232   .0815     0.0     3.0
φ1² + 4φ2      Simulated   −.228   .0648     .138    3.12    −.227   .0810     .0600   2.91
modulus        Normal      .893    .0391     0.0     3.0     .850    .0366     0.0     3.0
√(−φ2)         Simulated   .893    .0392     −.120   3.03    .849    .0365     −.123   2.91
period         Normal      23.0    2.75      0.0     3.0     21.9    3.41      0.0     3.0
2π/arctan(ϑ)   Simulated   24.0    4.75      7.91    175     23.7    8.87      21.0    831

The roots of φ(L) are complex, leading to a cycle in Y_t, if φ_1² + 4φ_2 < 0. If a cycle is present, it will be damped if the modulus of the roots, √(−φ_2), is less than 1.0, and its period will be 2π/arctan(ϑ), where ϑ = √(−φ_1² − 4φ_2)/φ_1. One way to approximate the posterior probabilities of these properties is to take the first two terms in a Taylor series expansion of these functions about φ̂_1 and φ̂_2, which have asymptotically normal posterior distributions. The moments of these normal approximations are given in Table 5. An alternative procedure is to simulate the asymptotic joint posterior for the φ_i and ω_i and at each replication compute the relevant functions of φ_1 and φ_2. Estimates of posterior moments can be calculated from these simulated values, and the relative frequency curves are estimates of the posterior densities. An estimate of the posterior probability of the existence of a cycle is the proportion of the φ_1² + 4φ_2 values which are negative, and the proportion of those values for which √(−φ_2) < 1.0 is an estimate of the conditional posterior probability that the cycle is damped. Also, estimates of the conditional posterior probability for interesting ranges of the period can be calculated from the simulated values of 2π/arctan(ϑ).

We performed 5000 replications of these simulations for both the unrestricted and restricted ARAR(2,4) models. The relative frequency curves for φ_1² + 4φ_2 are shown in Figure 4 for the unrestricted model and Figure 7 for the restricted model. Their moments are in the top block of Table 5. Although this is a nonlinear function of φ_1 and φ_2, the two methods of approximation are very close to one another. The estimated posterior probability of the existence of a cycle is greater than .99 for both models. The relative frequency curves for √(−φ_2) are in Figures 5 and 8 and their moments are in the second block of Table 5. The estimate of the conditional posterior probability of the cycles being damped is 1.0 for both models. In this case too, the two methods of approximation gave very similar results. The relative frequency curves for 2π/arctan(ϑ) are in Figures 6 and 9 and their moments are in the bottom block of Table 5. Now the two methods of approximation were very different: the simulated values are positively skewed and leptokurtic. Still, our estimate of the conditional probability of the period being between 16 and 30 quarters is more than .90 for both models.
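The simulation approach is easy to sketch: draw (φ_1, φ_2) from the large-sample normal posterior, evaluate the three functions at each draw, and tabulate proportions. The mean vector and covariance matrix are inputs because the paper does not report the full NLS covariance matrix:

```python
import numpy as np

def cycle_properties(mean, cov, n=5000, seed=1):
    """Simulate the three functions of (phi1, phi2) from a bivariate normal
    posterior.  mean and cov come from the NLS output; the full covariance
    matrix is not reported in the paper, so none is hard-coded here."""
    rng = np.random.default_rng(seed)
    phi1, phi2 = rng.multivariate_normal(mean, cov, n).T
    disc = phi1**2 + 4 * phi2                    # a cycle exists iff disc < 0
    cyc = disc < 0
    modulus = np.sqrt(-phi2[cyc])                # damped iff modulus < 1
    # the period formula assumes phi1 > 0, as in both housing-starts models
    period = 2 * np.pi / np.arctan(np.sqrt(-disc[cyc]) / phi1[cyc])
    return {"P(cycle)": cyc.mean(),
            "P(damped | cycle)": (modulus < 1.0).mean(),
            "P(16 <= period <= 30 | cycle)": ((period >= 16) & (period <= 30)).mean()}
```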
While the ARAR model is best suited to obtaining inferences about the φ_i, it is also useful to compare its forecasting performance with that of a traditional ARMA model. An ARMA model might be suggested here by the conversion of the data from monthly to quarterly form by averaging. Application of Box-Jenkins model identification techniques to the results in Table 1 led to an ARMA(2,2) model. This is one of the models found by Pankratz[10] to be adequate and it retains the form of φ(L) suggested by our prior beliefs. The NLS results for this model are in the last two columns of Table 4. The restriction θ_1 ≡ 0, imposed to produce the results in the last column, was also imposed by Pankratz. Using a starting point based on the sample autocorrelations, the ARMA routine took 22 iterations to converge. In contrast, the ARAR routine started from 0.0 and took five iterations to converge. Also, as a vehicle for learning about φ_1 and φ_2 the restricted ARMA(2,2) is less successful than the restricted ARAR(2,4): the Q10 is larger for the restricted ARMA(2,2), the absolute ratios of posterior means to standard deviations are larger for the restricted ARAR model and the approximate Bayes factor is 1.38 in favor of the restricted ARAR. Also, the ARMA(2,2) results have quite different dynamic properties relative to those of the ARAR(2,4) models: the approximate posterior probability of cycles for the unrestricted ARMA(2,2) is only .276 while that for the restricted ARMA(2,2) is only .401.

[10] The other model which Pankratz found adequate was an ARMA(3,2) with φ_2 set to 0. For our data that model had slightly higher residual autocorrelation, was less parsimonious and had larger forecast errors than did the ARAR(2,4).

We obtained one-step-ahead forecasts from the restricted and unrestricted ARAR(2,4) models and from the restricted and unrestricted ARMA(2,2) for 20 quarters beyond the end of the sample. For the first forecast the coefficients were set at the sample period posterior means. Then for subsequent forecasts they were updated each period. Since data for this period had been left out of the sample used in estimation, we were able to calculate the forecast errors for each of the 20 quarters in the forecast period. Summary statistics for these forecast errors, as percentages of the actual future values of Y_t, are in Table 6. The main difference between the methods is the size of the mean percentage forecast errors, which were larger for the ARAR models. This made the RMSE for the unrestricted ARAR(2,4) larger than that for the unrestricted ARMA(2,2). However, the standard deviation of the restricted ARAR percentage forecast errors was smaller than for the restricted ARMA, so that in this case the ARAR has a smaller RMSE. These forecasts are rather imprecise because, as noted above, obvious exogenous variables have been omitted and no account has been taken of long cycles in building activity.

This example demonstrates several advantages of the ARAR model over the ARMA model. NLS estimation of the ARAR parameters took many fewer iterations than ARMA estimation.
There was more serial correlation in the ARMA residuals than in the ARAR residuals and the Bayes factor favored the ARAR model. The estimates of the ARAR parameters and the dynamic properties of the estimated ARAR model were in closer accord with our prior beliefs than were the ARMA results. Finally, the restricted ARAR model had percentage forecast errors with smaller RMSE than did the restricted ARMA model.

Table 6: Housing Starts; Percentage Forecast Error Summary Statistics

          Unrestricted   Restricted   Unrestricted   Restricted
          ARAR(2,4)      ARAR(2,4)    ARMA(2,2)      ARMA(2,2)
Mean      2.92           2.52         1.85           1.86
Std Dev   7.25           6.68         7.25           6.98
RMSE      7.82           7.14         7.49           7.22

Although a sample of size 165 may seem large enough to justify large-sample approximate posteriors, it is still useful to consider the calculation of exact posteriors for the φ_i and ω_i. We adapted the procedure in Zellner (1971), Chapter IV, to obtain exact posteriors for φ_1, φ_2, ω_1 and ω_4 from the restricted ARAR(2,4) model of column 3 in Table 4.[11] Let φ′ = [φ_0, φ_1, φ_2], where φ_0 = µφ(1), and ω′ = [ω_1, ω_4]. Our prior[12] on φ, ω and σ was

    p(φ, ω, σ) = p(φ|σ)p(ω|σ)p(σ).    (2.3.6)

An inverted gamma distribution was used for p(σ) with parameters s² = 1.0 and v = 1, giving a mode of .7071. We set p(ω|σ) ∝ 1.0 to reflect our lack of prior knowledge regarding the behavior of U_t. We chose a normal form for p(φ|σ) with E(φ|σ)′ = φ̄′ = [0, 1.0, −.5] and covariance matrix V(φ|σ) = σ²W^{−1}, with

    W^{−1} = [ 400    0    0
                 0   16   −8
                 0   −8   16 ].

This prior is centered in the region corresponding to complex roots for φ(L) but it is still rather uninformative with respect to φ_1 and φ_2. For example, conditional on σ = .7071, the prior probability of obtaining complex roots is only about .12.

[11] For these calculations the data were measured as deviations from the estimated sample mean, divided by the standard deviation of these deviations. This had no effect on φ̂_1, φ̂_2, ω̂_1 or ω̂_4 but it served to eliminate overflow and underflow errors in the numerical evaluations of the integrals discussed below.

[12] This prior, and the improper prior used in Section 3.3, are quite different to the smoothness priors introduced by Shiller (1973) because they assign prior probability to individual parameters, while the smoothness prior assigns prior probability only to the differences between parameters.

Let the vector of initial values of Y_t be y_0′ = [y_1, . . . , y_{p+r}] and the vector of the T remaining values be y′ = [y_{p+r+1}, . . . , y_{p+r+T}]. For this example p = 2, r = 4, T = 165 and all inferences are conditional on y_0. Define y(ω) as the T × 1 vector with elements y_t(ω) = ω(L)y_t, and z(ω) as the T × k matrix with rows z_t′(ω) = [ω(1), y_{t−1}(ω), y_{t−2}(ω)], where k = p + 1. Assume ε_t ∼ IN(0, σ²). Then the joint posterior, after completing the square on φ, can be written as

    p(φ, ω, σ | y, y_0) ∝ σ^{−(ṽ+k+1)} exp{ −[ṽ s̃²(ω) + (φ − φ̃(ω))′ Q(ω) (φ − φ̃(ω))] / 2σ² },

where ṽ = T + v,

    φ̃(ω) = [W + z′(ω)z(ω)]^{−1} [W φ̄ + z′(ω)y(ω)],
    ṽ s̃²(ω) = vs² + φ̄′W φ̄ + y′(ω)y(ω) − φ̃(ω)′[W + z′(ω)z(ω)]φ̃(ω)

and

    Q(ω) = W + z′(ω)z(ω).

The use of the proper prior p(φ|σ) in (2.3.6) ensures that Q(ω) will be nonsingular even if ω(1) = 0, thus avoiding the need to confine attention to models with µ ≡ 0 or to impose the restriction ω(1) ≠ 0: see Zellner (1971), page 89.
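The completing-the-square quantities are mechanical to compute for any given ω; a sketch for the restricted model with ω_2 = ω_3 = 0 (y, W, φ̄, s² and v are application inputs, and the notation follows the text):

```python
import numpy as np

def conditional_posterior(y, w1, w4, W, phibar, s2, v):
    """Completing-the-square quantities for the restricted ARAR(2,4) with
    omega2 = omega3 = 0: returns phi_tilde(omega), Q(omega) and
    vtilde * s2_tilde(omega) as defined in the text."""
    om1 = 1.0 - w1 - w4                                  # omega(1)
    g = y[4:] - w1 * y[3:-1] - w4 * y[:-4]               # y_t(omega) = omega(L)y_t
    yw = g[2:]                                           # y_t(omega) aligned with lags
    z = np.column_stack([np.full(yw.size, om1), g[1:-1], g[:-2]])
    Q = W + z.T @ z                                      # Q(omega) = W + z'z
    phit = np.linalg.solve(Q, W @ phibar + z.T @ yw)     # phi_tilde(omega)
    vts2 = v * s2 + phibar @ W @ phibar + yw @ yw - phit @ Q @ phit
    return phit, Q, vts2
```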
Integrating out σ² gives

    p(φ, ω | y, y_0) ∝ [s̃²(ω)]^{−(ṽ+k)/2} { ṽ + [φ − φ̃(ω)]′ H(ω) [φ − φ̃(ω)] }^{−(ṽ+k)/2},    (2.3.7)

where H(ω) = [s̃²(ω)]^{−1} Q(ω). Integrating (2.3.7) over φ gives the joint posterior density for ω_1 and ω_4,

    p(ω | y, y_0) ∝ [ṽ s̃²(ω)]^{−ṽ/2} |W + z′(ω)z(ω)|^{−1/2}.    (2.3.8)

However, the lack of identification mentioned above means that (2.3.8) could also be the posterior density for φ_1 and φ_2. This composite (scaled) posterior is shown in Figure 10. Identification was achieved by imposing our prior belief that φ(L) has conjugate complex roots leading to cyclical behavior in y_t. Thus we identified the higher hill in Figure 10, in the region leading to complex roots, as the joint posterior for φ_1 and φ_2, and the lower hill, in the region leading to real roots, as the joint posterior for ω_1 and ω_4.

Since (2.3.8) is difficult to integrate analytically, the marginal posteriors for ω_1, ω_4, φ_1 and φ_2 were obtained numerically. To obtain the marginal posterior for either ω_1 or ω_4 we numerically integrated[13] over the other ω_i. Since, conditional on ω, (2.3.7) is multivariate Student-t, conditional posterior densities of the φ_i are univariate Student-t which can be obtained analytically. Then the products of these conditional densities and the joint density for ω in (2.3.8) are the conditional densities of the φ_i, given ω. The marginal posterior densities of the φ_i were obtained by integrating these products over ω_1 and ω_4.[14]

[13] All the numerical integrations reported here were done by Gaussian quadrature using GAUSS 386 version 3.2.13.

[14] Gibbs sampling procedures could also have been employed to compute these integrals. Marginal densities for φ_1 or φ_2 can also be obtained by integrating the portion of (2.3.8) which we have identified as the joint posterior of φ_1 and φ_2 over the other φ_i.

Table 7: Housing Starts; Exact Posterior Moments, Restricted ARAR(2,4)

Coeff.      φ1           φ2          ω1         ω4
Mean        1.62         −.706       −.440      −.137
Std. Dev.   .0680        .0671       .0860      .0768
Skewness    −.0000186    −.000210    −.00176    −.00848

The marginal posteriors for ω_1 and ω_4 are plotted in Figures 11 and 12; those for φ_1 and φ_2 are in Figures 13 and 14. In each figure the exact density is graphed with a solid line while the large-sample normal approximate density is graphed with a dashed line. The exact posterior moments are given in Table 7. The exact and approximate posterior densities differ because they are based on different priors and because the sample is only moderately large. However, they are quite close in the case of φ_1 and φ_2, which are the parameters of most interest. To summarize, the exact posterior densities for the ARAR parameters were easily obtained and were close to the asymptotic normal posteriors derived from the NLS estimates. In the next section these techniques will be extended to single equation, distributed lag models.
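A sketch of that mixing step, reusing conditional_posterior() from the sketch above: evaluate (2.3.8) on a grid of (ω_1, ω_4), normalize in log space (cf. footnote [11] on overflow), and average the conditional Student-t densities of a chosen φ_i over the grid. Simple Riemann weights stand in for the paper's Gaussian quadrature:

```python
import numpy as np
from scipy import stats

def marginal_phi_density(phi_grid, i, y, W, phibar, s2, v, ngrid=40):
    """Marginal posterior of phi_i: average the conditional Student-t
    densities over an (omega1, omega4) grid weighted by (2.3.8)."""
    T = y.size - 6                     # usable observations after p + r lags
    vtil = T + v                       # degrees of freedom of the t densities
    ws = np.linspace(-0.95, 0.95, ngrid)
    logw, conds = [], []
    for w1 in ws:
        for w4 in ws:
            phit, Q, vts2 = conditional_posterior(y, w1, w4, W, phibar, s2, v)
            # log of (2.3.8), up to a constant: -(vtil/2)log(vts2) - log|Q|/2
            logw.append(-0.5 * vtil * np.log(vts2) - 0.5 * np.linalg.slogdet(Q)[1])
            scale = np.sqrt((vts2 / vtil) * np.linalg.inv(Q)[i, i])
            conds.append(stats.t.pdf(phi_grid, df=vtil, loc=phit[i], scale=scale))
    wts = np.exp(np.array(logw) - max(logw))   # log-space to avoid underflow
    wts /= wts.sum()
    return sum(w * c for w, c in zip(wts, conds))
```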
3 Distributed Lag Models

3.1 Early distributed lag models

Jorgenson (1966) considered the general distributed lag model

    Y_t = µ + λ(L)x_t + V_t,    (3.1.1)

where x_t is the value of an exogenous variable X_t, the lag polynomial λ(L) is infinitely long and the error was modeled as V_t = ε_t. Then λ(L) was approximated as δ(L)/φ(L), with δ(L) = δ_0 + δ_1 L + . . . + δ_m L^m and φ(L) as in (2.0.1), having no factors in common with δ(L). These assumptions lead to

    φ(L)Y_t = µ_0 + δ(L)x_t + φ(L)ε_t,    (3.1.2)

where µ_0 = φ(1)µ. The parameters of interest are the coefficients of the lag polynomials φ(L) and δ(L). As in the previous section, the first disadvantage of this model for V_t is that it confines the effect of a shock ε_t on Y_t − µ − λ(L)x_t to only the current period. Secondly, it implies an MA form for the error term in (3.1.2), leading to complicated inference procedures.[15]

[15] These complications are avoided if the error term φ(L)ε_t is white noise; see Zellner and Geisel (1970).

A more general model for V_t is the stationary ARMA

    π(L)V_t = θ(L)ε_t.    (3.1.3)

This leads to the Box and Jenkins (1976) transfer function model

    π(L)φ(L)Y_t = π(1)µ_0 + π(L)δ(L)x_t + φ(L)θ(L)ε_t.    (3.1.4)

Now it is even more difficult to draw inferences about the φ_i and δ_i because of the long, restricted MA structure of the error. Also, it is important not to over parameterize the pairs δ(L), φ(L) and π(L), θ(L), to avoid introducing common factors which destroy identification. The ARMAX model can be obtained either by imposing π(L) ≡ φ(L) on (3.1.3) or by adding a distributed lag in x_t to the univariate ARMA model (2.1.2), leading to

    φ(L)Y_t = µ_0 + δ(L)x_t + θ(L)ε_t.    (3.1.5)

This model is somewhat simpler to analyze than the transfer function model but it retains the inconvenient MA error structure.

3.2 The ARAR distributed lag model (ARDLAR)

Our general distributed lag model is the univariate ARAR model (2.0.1) extended by the addition of a distributed lag in x_t:

    φ(L)Y_t = µ_0 + δ(L)x_t + U_t.    (3.2.1)

We assume that the degrees of φ(L) and δ(L) can be specified a priori, that φ(L) is invertible and that it has no factors in common with δ(L). Then we can write the model as

    Y_t = µ_0/φ(1) + [δ(L)/φ(L)] x_t + [1/φ(L)] U_t
        = µ + λ(L)x_t + V_t.    (3.2.2)

This model retains the same infinite lag structure on x_t as Jorgenson's model (3.1.1), the transfer function model (3.1.4) and the ARMAX model (3.1.5). It also approximates this infinite lag structure by the ratio of two finite lag polynomials: λ(L) = δ(L)/φ(L). However, in contrast to these models, we specify the same stationary AR process for U_t as in the univariate case: ω(L)U_t = ε_t, with φ(L)V_t = U_t. Then (3.1.1) becomes the structural ARDLAR(p,m,r) model[16]

    Y_t = µ + [δ(L)/φ(L)] x_t + [1/(φ(L)ω(L))] ε_t.    (3.2.3)

[16] This produces the same results as the technique introduced by Fuller and Martin (1961).

We can rewrite (3.2.3) as the reduced form

    ω(L)φ(L)Y_t = α_0 + ω(L)δ(L)x_t + ε_t,    (3.2.4)

where α_0 = µω(1)φ(1). Our ARDLAR model is simpler than the transfer function model but it still produces very rich behavior in both Y_t and V_t because both λ(L) and 1/(ω(L)φ(L)) are infinite in length. Thus it retains the infinite lag structure on x_t and there is no need to assume that U_t is white noise to facilitate estimation. Our model is especially convenient because it allows seasonal behavior to be included in ω(L), which could be the product of seasonal and nonseasonal polynomials.

The ARDLAR model requires the same sort of prior information to identify the φ_i and ω_i as does the ARAR model. Write the reduced form (3.2.4) as

    α(L)Y_t = α_0 + β(L)x_t + ε_t,    (3.2.5)

where α(L) = ω(L)φ(L) and β(L) = ω(L)δ(L). In contrast to the univariate case, interchanging φ(L) and ω(L) in (3.2.4) will change the likelihood function except in the special case that φ(L) = ω(L). Similarly, interchanging φ(L) and δ(L) will change the likelihood function, unless φ(L) = δ(L), and so will interchanging ω(L) and δ(L), unless ω(L) = δ(L). Thus, so long as δ_0 ≠ 1.0, a sufficient condition for identification is that φ(L) and ω(L) be of different degrees.
Two other sources of identification failure are removed by assuming that φ(L) has its first term equal to 1.0 and that there are no common factors in φ(L) and δ(L). Now assume that α(L) and β(L) are known but φ(L), ω(L) and δ(L) are unknown, and write α(L) in terms of its roots, η_i. Given the values of the η_i, we would use prior information in the same way as with the ARAR model to decide which (1 − η_i L) are part of φ(L) and which are part of ω(L). Once ω(L) had been identified in this way it could be factored out of β(L) to leave δ(L). In practice, the roots η_i and the polynomial β(L) are unknown, but estimates of them may provide some guidance as to identification. In general, the numbers of free parameters in α(L) and β(L) are p + r and r + m + 1, respectively. Unrestricted OLS applied to (3.2.5) gives p + m + 2r + 1 point estimates, which exceeds the number of free parameters in φ(L), ω(L) and δ(L) by r: thus ARDLAR models are generally over identified.

Note that (3.2.5) is the autoregressive-distributed-lag (ARDL) model much favored by Hendry: see inter alia Hendry et al. (1986). However, the two models are built in different ways. The ARDLAR model begins with fairly precise prior beliefs about the dynamics represented by φ(L) and δ(L). It then explicitly introduces the common factor ω(L) as the model for the error U_t. Prior beliefs about ω(L) are often diffuse. The parameters of interest are the φ_i and δ_i, while the ω_i and σ are nuisance parameters. In contrast, the ARDL model begins with the unrestricted polynomials α(L) and β(L) and tries, through COMFAC analysis, to discover if they contain any common factors, like ω(L). There are no prior beliefs regarding φ(L) and δ(L). Any common factors that are discovered are then put into an AR model for the error. In this procedure the parameters of interest are initially the α_i and β_i, but if common factors are discovered interest presumably switches to the φ_i and δ_i. Since this procedure uses the same data over several rounds of testing, it is subject to the pre-test problems mentioned above.

Assuming identification, the common factor ω(L) in (3.2.4) imposes a restriction linking the parameters in α(L) and β(L). To exploit this restriction, first condition on the coefficients of ω(L) to obtain

    φ(L)[ω(L)Y_t] = δ(L)[ω(L)x_t] + ε_t.    (3.2.6)

Then condition on the elements of φ(L) and δ(L) to obtain

    ω(L)[φ(L)Y_t] = ω(L)[δ(L)x_t] + ε_t.    (3.2.7)

Equations (3.2.6) and (3.2.7) can be used to generate NLS estimates or to provide the basis for application of the Gibbs sampler.

An interesting simple example assumes that the degrees of δ(L) and φ(L) are both one, with |φ| < 1 and δ_1/δ_0 ≠ −φ, so that

    Y_t = µ + [(δ_0 + δ_1 L)/(1 − φL)] x_t + V_t.    (3.2.8)

If observations are taken quarterly without seasonal adjustment, an attractive AR model for U_t is the product of a nonseasonal AR(1) and a seasonal AR(1):

    ω(L)U_t = (1 − ω_1 L)(1 − ω_4 L⁴)U_t = ε_t,    (3.2.9)

with |ω_1| < 1 and |ω_4| < 1. Thus our model for U_t is a restricted AR(5) with two free parameters, and our model for V_t is a restricted AR(6) with three free parameters (φ, ω_1 and ω_4) plus σ². Since δ(L) ≠ δ_0 and the degree of ω(L) is different from that of φ(L), the parameters φ, ω_1 and ω_4 are identified, as are δ_0 and δ_1. The model for Y_t can be written as

    (1 − ω_1 L)(1 − ω_4 L⁴)(1 − φL)Y_t = α_0 + (1 − ω_1 L)(1 − ω_4 L⁴)(δ_0 + δ_1 L)x_t + ε_t    (3.2.10)

or as the unrestricted ARDL

    Y_t = α_0 + Σ_{j=1}^{6} α_j Y_{t−j} + Σ_{j=0}^{6} β_j x_{t−j} + ε_t.    (3.2.11)

The over identification of this model is easily seen from the restrictions

    α_0 = µφ(1)ω(1)           β_0 = δ_0
    α_1 = ω_1 + φ             β_1 = δ_1 − ω_1 δ_0
    α_2 = −ω_1 φ              β_2 = −ω_1 δ_1
    α_3 = 0                   β_3 = 0
    α_4 = ω_4                 β_4 = −ω_4 δ_0
    α_5 = −ω_4(ω_1 + φ)       β_5 = ω_4(ω_1 δ_0 − δ_1)
    α_6 = ω_1 ω_4 φ           β_6 = ω_1 ω_4 δ_1.    (3.2.12)

To take advantage of the common factors in (3.2.11), first condition on ω_1 and ω_4, then on φ, δ_0 and δ_1, to obtain equations analogous to (3.2.6) and (3.2.7); a sketch of the resulting zig-zag estimation is given below.
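As mentioned above, the two conditioning equations suggest a simple zig-zag estimator for the model (3.2.8): given ω, the first step is linear in (φ, δ_0, δ_1), and given those, the second step is linear in the ω_i. For clarity the sketch treats ω(L) as an unrestricted AR(r); imposing the seasonal product (3.2.9) would make the second step nonlinear in (ω_1, ω_4):

```python
import numpy as np

def zigzag(y, x, r=2, iters=50):
    """Alternating conditional least squares for (3.2.8) with an AR(r) error,
    built from (3.2.6)-(3.2.7).  Returns b = [const, phi, d0, d1] and the
    omega coefficients; omega(L) is left unrestricted."""
    om = np.zeros(r)
    b = np.zeros(4)
    for _ in range(iters):
        # Step 1, eq. (3.2.6): filter both series by omega(L), then OLS
        yf = y[r:] - sum(om[i] * y[r - 1 - i: len(y) - 1 - i] for i in range(r))
        xf = x[r:] - sum(om[i] * x[r - 1 - i: len(x) - 1 - i] for i in range(r))
        Z = np.column_stack([np.ones(len(yf) - 1), yf[:-1], xf[1:], xf[:-1]])
        b = np.linalg.lstsq(Z, yf[1:], rcond=None)[0]
        # Step 2, eq. (3.2.7): w_t = phi(L)Y_t - delta(L)x_t, then OLS on lags
        w = y[1:] - b[1] * y[:-1] - b[2] * x[1:] - b[3] * x[:-1]
        Zw = np.column_stack([np.ones(len(w) - r)] +
                             [w[r - 1 - i: len(w) - 1 - i] for i in range(r)])
        om = np.linalg.lstsq(Zw, w[r:], rcond=None)[0][1:]
    return b, om
```

Each pass solves two linear problems, so no derivatives are needed; full NLS on (3.2.10) would instead impose the product restrictions exactly.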
The first step in building a general ARDLAR model is to select p, the degree of φ(L), and m, the degree of δ(L). These selections should reflect prior beliefs, based on subject matter knowledge and previous research, about the presence of cycles or trends and about the pattern of the coefficients in the ratio λ(L) = δ(L)/φ(L). A tentative AR model of degree r for the error U_t should now be chosen. This will imply that α(L) is of degree p + r and β(L) is of degree m + r. The adequacy of the choice of r can be checked by examining the residuals from the OLS fit of (3.2.5). If the aim of the model is only to produce forecasts, one may wish to stop at this point. However, if the aim is to gain knowledge about the parameters of φ(L) and δ(L), the procedures discussed above and illustrated below should be followed. Error correction models, partial adjustment models and adaptive expectations models can all be written as ARDLAR models. A summary of these models is given in Table 14 below.

3.3 Empirical example: growth of real GDP

The AR(3) model has often been used to forecast real GDP or its rate of growth, as in Geweke (1988). However, Garcia-Ferrer et al. (1987), Zellner and Hong (1989), Hong (1989) and Zellner et al. (1991) have shown that AR(3) models give poor forecasts of turning points, which can be improved by converting them to distributed lag models including lagged exogenous variables as leading indicators. We show in this example how a further extension to an ARDLAR is useful.

Let Y_t be the first difference of the log of real GDP. We begin by building univariate AR and ARAR models for Y_t to further illustrate the ARAR technique and to provide a basis for comparison with the subsequent ARDLAR model. We then introduce two leading indicators and use them as the exogenous variables in an ARDLAR model. We follow earlier literature in specifying the first of our univariate models as an unrestricted AR(3). A priori we believe that Y_t follows a damped cycle, so the first step in building our ARAR model was to specify p = 2. Then setting r = 1 led to an α(L) of degree three, as in the unrestricted AR(3):

    (1 − ω_1 L)(1 − φ_1 L − φ_2 L²)Y_t = α_0 + ε_t.    (3.3.1)

Y_t was constructed from quarterly observations on U.S. GDP in millions of 1987 dollars, seasonally adjusted at annual rates, from the fourth quarter of 1949 to the fourth quarter of 1990: 165 observations. Observations from the first quarter of 1991 to the third quarter of 1995 were held back for use in calculating forecast errors. The results of trend and unit root analyses using three lags (the same as in (3.3.1)) appear in the top panel of Table 8, which uses the same notation as Table 3. We concluded from these results that Y_t is free of unit roots and trends.
Table 8: Unit Roots and Trends; Real GDP, M2 and Stock Returns

                        No Trend     Linear Trend
Rate of Growth of Real GDP
α̂0                      .04604       .006394
Std Dev                 .001045      .001776
β̂1                      —            −.003146
Std Dev                 —            .002529
ρ̂                       .3894        .36375
Std Dev                 .09417       .09600
PD(ρ ≥ 1)               .00000       .00000
PJ(ρ ≥ 1)               .00000       .00000
Rate of Growth of Real M2
α̂0                      .002594      .0032332
Std Dev                 .0008632     .001585
β̂1                      —            −.001253
Std Dev                 —            .002611
ρ̂                       .5922        .5910
Std Dev                 .06351       .06352
PD(ρ ≥ 1)               .00000       .00000
PJ(ρ ≥ 1)               .00000       .00000
Rate of Growth of Real Stock Returns
α̂0                      .006503      .01626
Std Dev                 .005782      .01163
β̂1                      —            −.01928
Std Dev                 —            .01996
ρ̂                       .1447        .1370
Std Dev                 .07708       .07726
PD(ρ ≥ 1)               .00000       .00000
PJ(ρ ≥ 1)               .00000       .00000

OLS results for the unrestricted AR(3) model are in the second column of Table 9. The roots of α̂(L), their moduli and periods are in the top panel of Table 10. They support our a priori belief in a damped cycle. Because the ARAR(2,1) model is just identified, estimates of its parameters could have been obtained from the OLS results or directly by NLS: see column three of Table 9. Of course, the ln likelihoods, residual diagnostics, estimated roots, moduli and periods are the same for the two methods. The value of α̂_3 obtained by OLS is small compared to its standard deviation, but the values of both φ̂_2 and ω̂_1 are large compared to their standard deviations. Thus setting α_3 to 0 would be a specification error. Table 11 uses the same notation as Table 5 to present posterior results for the dynamic properties of the estimated φ(L) from Table 9. The asymptotic posterior probability of the presence of a cycle is .72.

We now introduce two leading indicator variables which have been used in the past by Garcia-Ferrer et al. (1987), Zellner and Hong (1989), Hong (1989) and Zellner et al. (1991): the rate of growth of real M2 and the real rate of return on stocks. Monthly data on M2 in billions of 1987 dollars and on an index of the prices of 500 common stocks were taken from U.S. Department of Commerce, Bureau of Economic Analysis, Survey of Current Business, October 1995 and January 1996. To convert these monthly series to quarterly series we used the value for the third month in each quarter as the value for that quarter. The rate of growth of real M2, x_{1,t}, is the first difference of the log of M2, and the real rate of return on stocks, x_{2,t}, is the first difference of the log of common stock prices, which were deflated by the consumer price index. The results of trend and unit root analyses for x_{1,t} and x_{2,t} are in the second and third panels of Table 8. The lag length was chosen to minimize the BIC for an unrestricted AR model: it was a lag of one in both cases. Here too, all the posterior probabilities of stochastic nonstationarity were zero to eight significant digits.

To obtain an ARDLAR model we added distributed lags in x_{1,t} and x_{2,t} to (3.3.1). In doing so we specified the lag polynomials so as to: reflect the character of x_{1,t} and x_{2,t} as leading indicators; account for the fact that the reporting lag before data on the growth of M2 are publicly available is much longer than the reporting lag for real stock returns; and take advantage of the way in which monthly values of x_{1,t} and x_{2,t} were converted to quarterly values. This produced a generalization of the restricted reduced form (3.2.4), which we label ARDLAR(p, m_1, m_2, r):

    ω(L)φ(L)Y_t = α_0 + ω(L)δ_1(L)x_{1,t} + ω(L)δ_2(L)x_{2,t} + ε_t,    (3.3.2)

where ω(L) and φ(L) are specified as above, δ_1(L) = δ_{1,2} L² (so m_1 = 2 with δ_{1,0} = δ_{1,1} = 0) and δ_2(L) = δ_{2,1} L (so m_2 = 1 with δ_{2,0} = 0).
Thus, there is a delay of three months before x_{1,t} has an impact on Y_t and a delay of only one month before x_{2,t} has an impact. We can also write (3.3.2) as the partially restricted reduced form ARDL(3,3,2) model

    Y_t = α_0 + α_1 Y_{t−1} + α_2 Y_{t−2} + α_3 Y_{t−3} + β_{1,2} x_{1,t−2} + β_{1,3} x_{1,t−3}
          + β_{2,1} x_{2,t−1} + β_{2,2} x_{2,t−2} + ε_t,    (3.3.3)

where the restrictions, which over identify the model, are

    α_0 = µφ(1)ω(1)          β_{1,2} = δ_{1,2}
    α_1 = ω_1 + φ_1          β_{1,3} = −ω_1 δ_{1,2}
    α_2 = φ_2 − ω_1 φ_1      β_{2,1} = δ_{2,1}
    α_3 = −ω_1 φ_2           β_{2,2} = −ω_1 δ_{2,1}.

Table 9: GDP Growth; Asymptotic Posterior Moments
(standard deviations in parentheses)

        AR(3)              ARAR(2,1)          ARDLAR(2,2,1,1)    Restricted          By OLS on
        Eqn (2.2.5)        Eqn (3.3.1)        Eqn (3.3.2)        ARDL(3,3,2)         ARDL(3,3,2)
                                                                 Eqn (3.3.3)         Eqn (3.3.3)
α̂0      .004884 (.001103)  .004884 (.001103)  .004049 (.001056)  .004049 (.001056)   .004023 (.001056)
α̂1      .3403 (.07883)     —                  —                  .2091 (.07766)      .1993 (.07826)
α̂2      .1257 (.08267)     —                  —                  .1357 (.07206)      .1158 (.07693)
α̂3      −.08642 (.07841)   —                  —                  −.05417 (.06628)    −.01090 (.07346)
β̂1,2    —                  —                  —                  .1100 (.05283)      .1328 (.07869)
β̂1,3    —                  —                  —                  .04623 (.02255)     .003707 (.07698)
β̂2,1    —                  —                  —                  .04137 (.008879)    .03639 (.009529)
β̂2,2    —                  —                  —                  .01739 (.005981)    .02818 (.01047)
φ̂1      —                  .7676 (.1443)      .6295 (.1450)      —                   —
φ̂2      —                  −.2023 (.1279)     −.1289 (.1208)     —                   —
ω̂1      —                  −.4273 (.1364)     −.4203 (.1410)     —                   —
δ̂1,2    —                  —                  .1100 (.05283)     —                   —
δ̂2,1    —                  —                  .04137 (.008879)   —                   —
σ̂       .009761            .009761            .008921            .008921             .008919
ln(L)   531.7              531.7              547.3              547.3               548.7
Q5      .459               .459               .803               .803                .537
Q10     6.78               6.78               9.41               9.41                7.96

NLS results for this model are in the fourth column of Table 9. They show that the addition of the distributed lags in x_{1,t} and x_{2,t} was useful, because the posterior means δ̂_{1,2} and δ̂_{2,1} are large compared to their standard deviations and because the residual variance was reduced. Also, the approximate Bayes factor is 36100 in favor of the ARDLAR(2,2,1,1) over the ARAR(2,1). The fifth column of Table 9 shows the results for the reduced form ARDL(3,3,2) implied by the restrictions shown below (3.3.3), while those in the sixth column were obtained by unrestricted OLS on (3.3.3). A researcher who focussed on these OLS results might be tempted to set α_3 and β_{1,3} to zero. But the results in columns four and five of Table 9 suggest that these restrictions would be inappropriate. Also, the approximate Bayes factor in favor of the ARDLAR(2,2,1,1) model over the partially restricted ARDL(3,3,2) is 40.69. The estimated roots of α̂(L) implied by the ARDLAR(2,2,1,1) model, plus their modulus and period, are given in the third panel of Table 10. They agree closely with those from the AR(3) and ARAR(2,1) models but are much different from those of the partially restricted ARDL(3,3,2) model in the fourth panel.

As in the housing example, we approximated the posterior distributions of the dynamic properties of φ(L) from the ARAR(2,1) and ARDLAR(2,2,1,1) models in two ways: by a normal distribution using the first two terms in a Taylor series expansion of the asymptotic NLS distributions, or by simulating the nonlinear functions 5000 times. The moments of these two approximations are in Table 11 and their relative frequency curves are in Figures 15 to 20. From the simulations, the estimated posterior probability of there being a cycle is .73 for the ARAR(2,1) model and .62 for the ARDLAR(2,2,1,1) model. Conditional on there being a cycle, the estimated probability of it being damped was 1.0 for both models. As with the housing example, the two approximations were quite close to one another for the functions showing the existence of a cycle and its modulus. However, the simulations of the period were skewed right and leptokurtic, especially for the ARAR(2,1) model. For both models the estimated probability of the period being between six quarters and 30 quarters was greater than .96.
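The restrictions listed below (3.3.3) are easy to impose or check in code. A sketch that maps the structural ARDLAR(2,2,1,1) parameters to the ARDL(3,3,2) coefficients; plugging in the NLS estimates from the ARDLAR column of Table 9 reproduces the restricted ARDL column up to rounding:

```python
def ardlar_to_ardl(alpha0, phi1, phi2, w1, d12, d21):
    """Map ARDLAR(2,2,1,1) structural parameters to the partially restricted
    ARDL(3,3,2) coefficients via the restrictions listed above."""
    alphas = [alpha0, w1 + phi1, phi2 - w1 * phi1, -w1 * phi2]
    betas = {"beta_12": d12, "beta_13": -w1 * d12,
             "beta_21": d21, "beta_22": -w1 * d21}
    return alphas, betas

# NLS estimates from the ARDLAR column of Table 9; the output matches the
# restricted ARDL(3,3,2) column up to rounding.
print(ardlar_to_ardl(0.004049, 0.6295, -0.1289, -0.4203, 0.1100, 0.04137))
```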
As with the housing example, the two approximations were quite close to one another for the functions showing the existence of a cycle and its modulus. However, the simulations of the period were skewed right and leptokurtic, especially for the ARAR(2,1) model. For both models the estimated probability of the period being between six quarters and 30 quarters was greater than .96.

Table 10: GDP Growth; Roots of Estimated α̂(L)

                                     Roots            Modulus   Period
Unrestricted AR(3) by OLS            .3838 ± .2344i   .4497     11.46
                                     -.4273           .4273
α̂(L) from ARAR(2,1) by NLS          .3838 ± .2343i   .4497     11.46
                                     -.4273           .4273
α̂(L) from ARDLAR(2,2,1,1) by NLS    .3147 ± .1727i   .3590     12.52
                                     -.4203           .4203
α̂(L) from ARDL(3,3,2) by OLS        .4151            .4151
                                     -.3026           .3026
                                     .08681           .08681

Table 11: GDP Growth; Approx. Posterior Moments, Functions of φ1 and φ2

                                       ARAR(2,1)                      ARDLAR(2,2,1,1)
Dynamic         Method of
Property        Approx.     Mean    Std Dev  Skew.  Kurt.   Mean     Std Dev  Skew.  Kurt.
cycle           Normal      -.220   .334     0.0    3.0     -.119    .334     0.0    3.0
  φ1² + 4φ2     Simulated   -.199   .329     .138   2.95    -.0928   .336     .256   3.15
modulus         Normal      .450    .142     0.0    3.0     .359     .168     0.0    3.0
  √(−φ2)        Simulated   .500    .0925    .179   2.75    .438     .0922    .194   2.66
period          Normal      11.5    6.02     0.0    3.0     12.5     13.1     0.0    3.0
  2π/arctan(ϑ)  Simulated   13.2    15.7     16.5   362     12.8     13.2     8.59   103

Next we obtained forecasts of the level of GDP for 16 periods beyond the end of the sample used in estimation. For this purpose the ARAR(2,1) model was written in its restricted AR(3) reduced form and the ARDLAR(2,2,1,1) in its restricted ARDL(3,3,2) reduced form. Forecasts were then calculated from these two models and from the partially restricted ARDL(3,3,2) and converted to forecasts of the level of GDP. Summary statistics for the percentage forecast errors are shown in Table 12.

Table 12: GDP Growth; Percentage Forecast Error Summary Statistics

           ARAR(2,1)   ARDLAR(2,2,1,1)   Partially Restricted ARDL(3,3,2)
Mean       -.0434      -.180             -.170
Std Dev    1.65        1.74              1.74
RMSE       1.66        1.75              1.75

The forecasting performance of the ARAR model was the best of the three, in spite of its lack of exogenous variables. On RMSE grounds the ARDLAR and ARDL models were nearly equal.

To calculate exact posterior results for the ARDLAR(2,2,1,1) model, let x′t = [x1,t−2, x2,t−1] and γ′ = [µ0, φ1, φ2, δ1,2, δ2,1]. In contrast to the housing model, we used a uniform, diffuse prior for ω1, γ and log σ. Let s = max(p + r, m1 + r, m2 + r), and write the initial values of Yt and xt as y0′ = [y1, . . . , ys] and x0′ = [x′1, . . . , x′s]. Write the T remaining observations as y′ = [ys+1, . . . , ys+T] and x′ = [xs+1, . . . , xs+T]. In this example s = 3 and T = 165. All inferences were conditional on x0 and y0.

Define y(ω) as the T × 1 vector with elements yt(ω) = yt − ω1yt−1 and x(ω) as the T × 2 matrix with rows x′t − ω1x′t−1. Then let z(ω) be the T × k matrix with rows z′t(ω) = [ω(1), yt−1(ω), yt−2(ω), x′t(ω)], where k = 1 + p + 2 = 5. Assume εt ∼ IN(0, σ²). Then the joint posterior, after completing the square on γ, is

p(γ, ω, σ | y, z, y0, x0) ∝ σ^{−(T+2)} exp{ −[ν ŝ²(ω) + (γ − γ̂(ω))′ z′(ω)z(ω) (γ − γ̂(ω))] / 2σ² },

where γ̂(ω) = [z′(ω)z(ω)]⁻¹ z′(ω)y(ω), ν = T − k and ν ŝ²(ω) = y′(ω)y(ω) − γ̂(ω)′ z′(ω)z(ω) γ̂(ω). Integrating out σ² gives

p(γ, ω | y, z, y0, x0) ∝ [ŝ²(ω)]^{−(ν+k)/2} { ν + (γ − γ̂(ω))′ H(ω) (γ − γ̂(ω)) }^{−(ν+k)/2} , (3.3.4)

where H(ω) = [ŝ²(ω)]⁻¹ z′(ω)z(ω). Integrating (3.3.4) over γ gives the marginal posterior density for ω1,

p(ω1 | y, z, y0, x0) ∝ [ν ŝ²(ω)]^{−ν/2} |z′(ω)z(ω)|^{−1/2} . (3.3.5)
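Because (3.3.5) is a function of the scalar ω1 alone, the exact posterior computations reduce to one-dimensional numerical work. The sketch below evaluates the log of the (3.3.5) kernel on a grid and normalizes it numerically. It is an illustrative reconstruction under stated assumptions, not the authors' code: y_series and x_series are hypothetical names for the GDP growth series (with the s = 3 initial values at the front) and the aligned array whose rows are x′t = [x1,t−2, x2,t−1].

```python
import numpy as np

def log_post_omega(omega1, y, x):
    """Log kernel of (3.3.5): -(v/2)*ln[v*s^2(w)] - (1/2)*ln|z'(w)z(w)|."""
    T = len(y) - 3                          # s = 3 initial values in this example
    yt = y[3:] - omega1 * y[2:-1]           # y_t(w) = y_t - w1*y_{t-1}
    y1 = y[2:-1] - omega1 * y[1:-2]         # y_{t-1}(w)
    y2 = y[1:-2] - omega1 * y[:-3]          # y_{t-2}(w)
    xt = x[3:] - omega1 * x[2:-1]           # rows x_t' - w1*x_{t-1}'
    z = np.column_stack([np.full(T, 1.0 - omega1), y1, y2, xt])  # w(1) first
    k = z.shape[1]                          # k = 1 + p + 2 = 5
    v = T - k
    gamma_hat, *_ = np.linalg.lstsq(z, yt, rcond=None)
    vs2 = yt @ yt - gamma_hat @ (z.T @ z) @ gamma_hat   # v*s^2(w)
    _, logdet = np.linalg.slogdet(z.T @ z)
    return -0.5 * v * np.log(vs2) - 0.5 * logdet

# Grid evaluation and numerical normalization (series names hypothetical):
# grid = np.linspace(-0.95, 0.95, 381)
# logp = np.array([log_post_omega(w, y_series, x_series) for w in grid])
# dens = np.exp(logp - logp.max())
# dens /= dens.sum() * (grid[1] - grid[0])   # normalize to a proper density
```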
This density is plotted, together with the large-sample approximation, in Figure 21. Posterior moments for ω1 are in Table 13. Since a diffuse prior was used, the difference between the two densities is due entirely to the approximation error resulting from the small sample size.

Table 13: GDP Growth; ARDLAR(2,2,1,1) Model, Exact Posterior Moments

Coeff.   Mean      Std. Dev.   Skewness
φ1       .4678     .1842       -.4976
φ2       -.02723   .1190       -.02367
ω1       -.4203    .1898       .5911
δ1,2     .1472     .06145      .3166
δ2,1     .03836    .008749     -.02477

Conditional on ω1, (3.3.4) is a multivariate Student-t density, so the conditional posterior densities of the γi are univariate Student-t. To obtain the marginal posterior density for an individual γi we numerically integrated ω1 out of the product of the conditional Student-t density and the marginal density for ω1 given in (3.3.5). The resulting posterior densities are plotted in Figures 22 to 25 and their moments are in Table 13. Clearly, the asymptotic approximations are not nearly as close to the exact densities as they were in the housing example above.

4 Conclusions

We have shown in this paper that a useful model for the error in univariate and single equation distributed lag models is a finite, stationary AR. In the past researchers have often used a white noise or MA model for the error without much prior or other information to support their choice. The ARAR and ARDLAR models allow for very rich behavior of the error process and yet are usually easier to implement empirically than models with MA errors. Other appealing features of our ARAR model are its parsimony and the fact that all its components obey the Wold decomposition. We have shown theoretically and empirically how a researcher's prior beliefs about the autoregressive structure of the observable variable can be used to solve the identification problem inherent in the model. When there are restrictions on the parameters of the lag polynomials, or when exogenous variables are present, the models become overidentified. The extension of these ideas to dynamic multivariate time series models, simultaneous equations models and nonlinear models displaying chaos is the subject of our current research.

Table 14 collects further examples of models from the literature that fit the ARDLAR framework; a symbolic check of the first entry appears below the table.

Table 14: Examples of Other ARDLAR Models

Simple Error Correction Model: Banerjee et al (1993), Hendry et al (1986)
  φ(L) = 1 − φL, ω(L) = 1 − ωL, δ(L) = δ0 + δ1L
  ∆Yt = α0 + (ω + φ − 1)(Yt−1 − xt−1) − ωφ(Yt−2 − xt−2) + δ0∆xt + [ω + φ + δ1 + (1 − ω)δ0 − 1]xt−1 − ω(φ + δ1)xt−2 + εt

Partial Adjustment Model: Harberger (1960), Nerlove (1958), Pagan (1985)
  ∆Yt = µ∗ + φ∗(y∗t − Yt−1) + Ut, 0 ≤ φ∗ < 1; y∗t = ψ0 + ψ1xt
  φ(L) = 1 − φL, δ(L) = δ0, ω(L) = 1
  φ = 1 − φ∗, δ0 = φ∗ψ1, µ0 = µ∗ + φ∗ψ0
  (1 − φL)Yt = µ0 + δ0xt + εt

Adaptive Expectations Model: Zellner and Geisel (1970), Zellner (1971)
  Yt = µ + ψx∗t+1 + Vt, x∗t+1 = φx∗t + (1 − φ)xt, 0 ≤ φ ≤ 1
  δ0 = ψ(1 − φ), ω(L)(1 − φL)Vt = εt
  ω(L)(1 − φL)Yt = δ0ω(L)xt + εt

Finite Distributed Lag Model
  φ(L) ≡ 1.0
  ω(L)Yt = ω(L)δ(L)xt + εt
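To make the first entry of Table 14 concrete, here is a small symbolic check (assuming Python with sympy; not part of the original paper) that expanding the ARDLAR structural form ω(L)φ(L)Yt = α0 + ω(L)δ(L)xt + εt with φ(L) = 1 − φL, ω(L) = 1 − ωL and δ(L) = δ0 + δ1L reproduces the error correction form quoted in the table.

```python
import sympy as sp

w, f, d0, d1, a0 = sp.symbols('omega phi delta0 delta1 alpha0')
Y1, Y2, xt, x1, x2 = sp.symbols('Y1 Y2 xt x1 x2')  # Y_{t-1}, Y_{t-2}, x_t, x_{t-1}, x_{t-2}

# Delta Y_t from expanding (1 - w*L)(1 - f*L)Y_t = a0 + (1 - w*L)(d0 + d1*L)x_t + eps_t
dy_structural = (a0 + (w + f - 1)*Y1 - w*f*Y2
                 + d0*xt + (d1 - w*d0)*x1 - w*d1*x2)

# The error correction rearrangement of Table 14
dy_ecm = (a0 + (w + f - 1)*(Y1 - x1) - w*f*(Y2 - x2) + d0*(xt - x1)
          + (w + f + d1 + (1 - w)*d0 - 1)*x1 - w*(f + d1)*x2)

print(sp.expand(dy_structural - dy_ecm))   # prints 0: the two forms coincide
```

The expansion also shows why the bracketed coefficient on xt−1 is ω + φ + δ1 + (1 − ω)δ0 − 1: collecting terms leaves δ1 − ωδ0, exactly the xt−1 coefficient of the structural expansion.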
Acknowledgements

Carter acknowledges the generous hospitality of the Graduate School of Business, The University of Chicago during the conduct of this research. Zellner's research was financed in part by the National Science Foundation and by the H.G.B. Alexander Endowment Fund, Graduate School of Business, University of Chicago.

References

Banerjee, A., J. Dolado, J.W. Galbraith and D.F. Hendry (1993): Co-Integration, Error-Correction, and the Econometric Analysis of Non-Stationary Data, Oxford, Oxford University Press.
Box, G.E.P. and G.M. Jenkins (1976): Time Series Analysis: Forecasting and Control, 2nd ed., San Francisco, Holden-Day.
Burns, A.F. and W.C. Mitchell (1946): Measuring Business Cycles, New York, NBER.
Caplin, A. and J. Leahy (1999): "Durable Goods Cycles," NBER Working Paper 6987.
Cooper, R. and A. Johri (1999): "Learning by doing and aggregate fluctuations," NBER Working Paper 6898.
Fuller, W.A. and J.E. Martin (1961): "The effects of autocorrelated errors on the statistical estimation of distributed lag models," Journal of Farm Economics, 63, 71-82.
Gandolfo, G. (1996): Economic Dynamics, 3rd ed., Berlin, Springer-Verlag.
Garcia-Ferrer, A., R.A. Highfield, F. Palm and A. Zellner (1987): "Macroeconomic forecasting using pooled international data," Journal of Business and Economic Statistics, 5, 53-67.
Geweke, J. (1988): "The secular and cyclical behavior of real GDP in 19 OECD countries, 1957-1983," Journal of Business and Economic Statistics, 6, 479-486.
Goodwin, R.M. (1947): "Dynamical coupling with especial reference to markets having production lags," Econometrica, 15, 181-204.
Green, E. and W. Strawderman (1996): "A Bayesian growth and yield model for slash pine plantations," Journal of Applied Statistics, 23, 285-299.
Harberger, A.C. (1960): The Demand for Durable Goods, Chicago, University of Chicago Press.
Hendry, D.F., A.R. Pagan and J.D. Sargan (1986): "Dynamic specification," in Z. Griliches and M.D. Intriligator, eds., Handbook of Econometrics, 3, Amsterdam, North-Holland.
Hong, C. (1989): Forecasting Real Output Rates and Cyclical Properties of Models, A Bayesian Approach, PhD Thesis, Department of Economics, University of Chicago.
Jorgenson, D. (1966): "Rational distributed lag functions," Econometrica, 34, 135-139.
Judge, G.G. and M.E. Bock (1978): The Statistical Implications of Pre-Testing and Stein-Rule Estimators in Econometrics, Amsterdam, North-Holland.
Ljung, G.M. and G.E.P. Box (1978): "On a measure of lack of fit in time series models," Biometrika, 65, 297-303.
Metzler, L.A. (1941): "The nature and stability of inventory cycles," The Review of Economics and Statistics, 23, 113-129.
Monahan, J.F. (1983): "Fully Bayesian analysis of ARMA time series models," Journal of Econometrics, 21, 307-331.
Nerlove, M. (1958): The Dynamics of Supply: Estimation of Farmers' Response to Price, Baltimore, Johns Hopkins Press.
Nicholls, D.F., A.R. Pagan and R.D. Terrell (1975): "The estimation and use of models with moving average disturbance terms: a survey," International Economic Review, 16, 112-134.
Pagan, A. (1985): "Time series behavior and dynamic specification," Oxford Bulletin of Economics and Statistics, 47, 199-211.
Pankratz, A. (1983): Forecasting With Univariate Box-Jenkins Models, New York, Wiley.
Pantula, S.G., G. Gonzalez-Farias and W. Fuller (1994): "A comparison of unit-root test criteria," Journal of Business and Economic Statistics, 12, 449-459.
Parzen, E. (1982): "ARARMA models for time series analysis and forecasting," Journal of Forecasting, 1, 67-82.
Phillips, P.C.B. and P. Perron (1988): "Testing for a unit root in time series regression," Biometrika, 75, 335-346.
Phillips, P.C.B. (1991): "To criticize the critics: an objective Bayesian analysis of stochastic trends," Journal of Applied Econometrics, 6, 333-364.
Said, S.E. and D.A. Dickey (1984): "Testing for unit roots in autoregressive-moving average models of unknown order," Biometrika, 71, 599-607.
Samuelson, P.A. (1939a): "Interactions between the multiplier analysis and the principle of acceleration," Review of Economics and Statistics, 21, 75-78.
Samuelson, P.A. (1939b): "A synthesis of the principle of acceleration and the multiplier," Journal of Political Economy, 47, 786-797.
Sargent, T.J. (1987): Macroeconomic Theory, 2nd ed., San Diego, Academic Press.
Schotman, P.C. and H.K. Van Dijk (1991): "On Bayesian routes to unit roots," Journal of Applied Econometrics, 6, 387-401.
Schwarz, G. (1978): "Estimating the dimension of a model," Annals of Statistics, 6, 461-464.
Shiller, R.J. (1973): "A distributed lag estimator derived from smoothness priors," Econometrica, 41, 775-788.
Wold, H. (1938): A Study in the Analysis of Stationary Time Series, Uppsala, Almqvist and Wiksell.
Wrinch, D.H. and H. Jeffreys (1921): "On certain fundamental principles of scientific inquiry," Philosophical Magazine Series 6, 42, 369-390.
Zellner, A. (1971): An Introduction to Bayesian Inference in Econometrics, New York, John Wiley & Sons. Reprinted by Wiley in 1996.
Zellner, A. (1996): "Bayesian method of moments/instrumental variables (BMOM): analysis of mean and regression models," in J.C. Lee, A. Zellner and W.O. Johnson, eds., Modelling and Prediction Honoring Seymour Geisser, New York, Springer-Verlag; reprinted in Zellner, A. (1997): Bayesian Analysis in Econometrics and Statistics: The Zellner View and Papers, Cheltenham, U.K., Edward Elgar Publishers, 291-304.
Zellner, A. (1997): "The Bayesian method of moments (BMOM): theory and applications," in T. Fomby and R.C. Hill, eds., Advances in Econometrics, Vol. 12, 85-105.
Zellner, A. and B. Chen (2001): "Bayesian modeling of economies and data requirements," Macroeconomic Dynamics, 5, 673-700.
Zellner, A. and M.S. Geisel (1970): "Analysis of distributed lag models with applications to consumption function estimation," Econometrica, 38, 865-888.
Zellner, A. and C. Hong (1989): "Forecasting international growth rates using Bayesian shrinkage and other procedures," Journal of Econometrics, 40, 183-202.
Zellner, A., C. Hong and C.-K. Min (1991): "Forecasting turning points in international growth rates using Bayesian exponentially weighted autoregression, time-varying parameters and pooling techniques," Journal of Econometrics, 49, 275-304.
Zellner, A., D.S. Huang and L.C. Chau (1965): "Further analysis of the short-run consumption function with emphasis on the role of liquid assets," Econometrica, 33, 571-581.
Zellner, A. and J. Tobias (2001): "Further results on Bayesian method of moments analysis of the multiple regression model," International Economic Review, 42, 121-132.
Zivot, E. (1994): "A Bayesian analysis of the unit root hypothesis within an unobserved components model," Econometric Theory, 10, 552-578.