A Comparison of Autoregressive Distributed Lag and Dynamic OLS Cointegration Estimators in the Case of a Serially Correlated Cointegration Error

Nikitas Pittis∗
University of Piraeus

Ekaterini Panopoulou
University of Piraeus

March 17, 2004

Abstract

This paper deals with a family of parametric, single-equation cointegration estimators that arise in the context of the Autoregressive Distributed Lag (ADL) models. We particularly focus on a subclass of the ADL models, those that do not involve lagged values of the dependent variable, referred to as Augmented Static (AS) models. The general ADL and the restricted AS models give rise to the ADL and Dynamic OLS (DOLS) estimators, respectively. The relative performance of these estimators is assessed by means of Monte Carlo simulations in the context of a triangular Data Generation Process (DGP) where the cointegration error and the error that drives the regressor follow a VAR(1) process. The results suggest that ADL fares consistently better than DOLS, both in terms of estimation precision and reliability of statistical inferences. This is due to the fact that DOLS, as opposed to ADL, does not fully correct for the second-order asymptotic bias effects of cointegration, since a “truncation bias” always remains. As a result, the performance of DOLS approaches that of ADL as the number of lagged values of the first difference of the regressor in the AS model increases. Another set of Monte Carlo simulations suggests that the commonly used information criteria select the correct order of the ADL model quite frequently, thus making the employment of ADL over DOLS quite appealing and feasible.
Additional results suggest that ADL re-emerges as the optimal estimator within a wider class of asymptotically efficient estimators including, apart from DOLS, the semiparametric Fully Modified Least Squares (FMLS) estimator of Phillips and Hansen (1990, Review of Economic Studies, 57, 99-125), the non-linear parametric estimator (PL) of Phillips and Loretan (1991, Review of Economic Studies, 58, 407-436) and the system-based maximum likelihood estimator (JOH) of Johansen (1991, Econometrica, 59, 1551-1580). All the aforementioned results are robust to alternative models for the error term, such as Vector Autoregressions of higher order, or Vector Moving Average processes.

JEL classification: C12, C13, C22

Acknowledgements: We acknowledge financial support from the Greek Ministry of Education and the European Union under the “Hrakleitos” grant. We are grateful to Stéphane Grégoir, an anonymous referee and participants in the XXVIII Simposio de Analisis Economico, Universidad Pablo de Olavide, Sevilla, Spain, December 11-13, 2003 and the XXIX Conference on Stochastic Processes and their Applications, IMPA, Rio de Janeiro, Brazil, August 3-9, 2003 for helpful suggestions and comments. The usual disclaimer applies.

∗ Correspondence to: Nikitas Pittis, Department of Banking and Financial Management, University of Piraeus, 80 M. Karaoli and A. Dimitriou str., 18534 Piraeus, Greece. E-mail: npittis@unipi.gr

1 Introduction

The concept of cointegration has evolved into a fully developed statistical theory that covers regressions with integrated variables. Efficient estimators, either in a single-equation or in a system-of-equations framework, are now available with well-known asymptotic properties. An interesting aspect of cointegration is that single-equation methods are immune to the classical problem of the endogeneity of the regressor(s).
That is, the OLS estimator converges at rate $T$, where $T$ is the sample size, regardless of the correlation structure between the cointegration error and the regressor (see Stock 1987). However, “long-run correlation” and/or “endogeneity” problems are still encountered when statistical inference on the cointegration vector is conducted. In the presence of contemporaneous and/or temporal correlation between the cointegration error and the regressor, the asymptotic distribution of the OLS estimator does not belong to the Local Asymptotic Mixture of Normals (LAMN) family and depends on nuisance parameters (see Phillips 1988, Park and Phillips 1988, Sims, Stock and Watson 1990, Phillips and Loretan 1991).

Various single-equation estimation methods dealing with the second-order effects, either parametrically or non-parametrically, have been suggested in the literature (see, for example, Johansen 1988, 1991, Phillips and Hansen 1990, Stock and Watson 1993). The parametric methods attempt to estimate the long-run parameters in the context of a dynamic model, in which the regression error forms a martingale difference sequence with respect to a selected information set. The resulting models fall into the category of the Hendry-style Autoregressive Distributed Lag (ADL) models, which encompass the Error Correction models (ECM) as a special case (see Hendry et al. 1984, Banerjee et al. 1993, Pesaran and Shin 1999).

In empirical applications, however, the ADL class of models is rarely employed. Instead, applied researchers seem to favor a subclass of the ADL family, namely those models that do not involve lagged values of the dependent variable, say $y_t$. These models can be thought of as arising from the static equation of $y_t$ on $x_t$, augmented by current and past values of the first difference of the regressor.[1] We shall refer to these models as the Augmented Static (AS) models.
Estimation of the cointegration vector in the context of the AS models by means of least squares is asymptotically optimal and the resulting estimator is usually referred to as the Dynamic Ordinary Least Squares (DOLS) estimator (see Stock and Watson 1993). In other words, for optimal parametric inference we do not have to employ the full dynamic ADL model; instead the AS model suffices. This is due to the fact that the AS model is based on the projection of the cointegration error on the current and past values of the error that drives the regressor (say, set A); that is, it involves all the necessary parametric corrections for removing the second-order effects.[2] On the other hand, the ADL model is based on the projection of the cointegration error on the full information set (say, set B), that is, on set A plus the past values of the cointegration error.

[1] The discussion refers to the case that there are no feedbacks from the cointegration error to the error that drives the regressor. In the case that the cointegration error Granger-causes the regressor’s error, the generating mechanism for the latter is not fully estimated. In such a case, further augmentation of the ADL model by the leads of the regressor restores strong exogeneity and removes the second-order asymptotic bias (see Phillips and Loretan 1991, Saikkonen 1991, Stock and Watson 1993, Pesaran and Shin 1999).

[2] This is true under the assumption that the cointegration error does not Granger-cause the error that drives the regressor. We relax this assumption in the third section of the paper.

This in turn implies that the AS and ADL models differ in two respects. First, the error in the AS model, as opposed to the error in the ADL model, is in general serially correlated. This is not a major problem, provided that the long-run variance of the error in the AS model is consistently estimated (see Kramer 1986, Park and Phillips 1988). Second, and more importantly, in the cases that the cointegration error and the error that drives the regressor follow a Vector Autoregressive process of order $m$ (VAR($m$)), the projection of the cointegration error on set B is summarized in terms of a small number of variables. On the other hand, the projection of the cointegration error on set A results in an infinite weighted sum of current and past values of the error that drives the regressor. In practice, of course, this infinite sum is truncated at a specific lag, say $p$, so there is always a truncation remainder, which represents the second-order effects that have not been taken into account. Therefore, the ADL model, utilizing the exact projection of the cointegration error on set B, offers a better framework for estimating the cointegration vector than the AS model, which utilizes an approximate projection of the cointegration error on set A.

The preceding discussion implies that the relative performance of ADL against DOLS is likely to depend on the specific parametric model that generates the errors. For example, if the error generating mechanism is a Vector Moving Average (VMA) process, then the performance of the ADL estimator in finite samples is likely to be comparable to that of DOLS. A direct implication of the VMA assumption is that the memory of the cointegration error is designed to be extremely short. This, however, does not seem to be the case when actual data are used. In most macroeconomic applications, the equilibrium error seems to exhibit a rather long memory. In fact, sometimes it is difficult to distinguish between such a highly persistent error and a nonstationary one. In view of this, it is natural to compare ADL and DOLS within a framework that is capable of reproducing the observed behaviour of the cointegration error. Stock and Watson (1993) (SW, henceforth) specify a VAR(1) model for the errors, which does give rise to a highly persistent cointegration error.
Their designs, however, are such that the truncation bias of DOLS is zero, thus favoring the DOLS estimator against its competitors.[3]

In this paper we follow SW and employ a triangular Data Generation Process (DGP), assuming that the cointegration error and the error that drives the regressor follow a VAR(1) process with normal innovations. The purpose of this paper is to compare the performance of the ADL and DOLS estimators when the cointegration error exhibits various degrees of persistence. The parameter that controls the persistence of the cointegration error also controls the truncation bias of the DOLS estimator. The performance of the estimators under consideration is assessed via Monte Carlo simulations. The results confirm the superiority of the ADL estimator over DOLS for all possible scenarios on the persistence of the cointegration error and the Granger causality structure between the cointegration error and the error that drives the regressor. In fact, in most cases, the limiting performance of DOLS, as the number of lagged values of the first difference of the regressor in the AS model increases, seems to be that of ADL. These results strongly suggest the employment of the ADL estimator, provided that the correct order of the model is selected. In this respect, additional Monte Carlo simulations suggest that the commonly used information criteria are capable of delivering the correct order of ADL at a satisfactory frequency. Another set of simulations suggests that all the aforementioned results favoring the ADL estimator are robust to alternative error processes, such as VAR(2) or even VMA(1) processes.

The paper is organized as follows. Section 2 introduces the DGP and derives the ADL and AS models, as well as the conditions that render them equivalent. Section 3 reports the Monte Carlo results.
For completeness, we also report simulation evidence on the performance of some other commonly used estimators, such as the semiparametric Fully Modified Least Squares (FMLS) estimator of Phillips and Hansen (1990), the nonlinear-in-parameters estimator of Phillips and Loretan (1991), henceforth (PL), which utilizes the same dynamic structure as that of ADL, and the system-based estimator of Johansen (1991), henceforth (JOH). Within this broader set of alternative estimators, ADL re-emerges as the optimal estimator, closely followed by the PL estimator. Section 4 concludes the paper by briefly summarizing our main results.

[3] SW consider parameter settings such that the truncation effect is zero (cases A and B in pp. 795-799). However, these authors are not interested in comparing the DOLS estimator with the more general ADL estimator. Their concern lies in examining the performance of the DOLS estimator against that of some other commonly used estimators.

2 Models and Estimators

Let $z_t$ and $u_t$ be two bivariate processes, with $z_t = [y_t, x_t]^\top$ and $u_t = [u_{1t}, u_{2t}]^\top$. We further assume that $u_t$ is a VAR(1) process, driven by $e_t = [e_{1t}, e_{2t}]^\top$, and the generating mechanism for $y_t$ is given by the system

$$y_t = \theta x_t + u_{1t} \tag{1}$$

$$\Delta x_t = u_{2t} \tag{2}$$

and

$$\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} u_{1t-1} \\ u_{2t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}, \quad a_{21} = 0 \tag{3}$$

$$\begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix} \sim NIID \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right] \tag{4}$$

for $t = 1, 2, \ldots, T$. Both eigenvalues of the matrix $A = [a_{ij}]$, $i, j = 1, 2$, are assumed to be less than one in modulus, in order for $y_t$ and $x_t$ to be I(1) variables, and the cointegration error to be an I(0) process.
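As an illustration of the system (1)-(4), the following is a minimal Python sketch that simulates one sample from the triangular DGP. The function name `simulate_dgp`, the default parameter values, the burn-in length and the zero starting values are assumptions chosen for illustration only; they are not prescribed by the equations above.

```python
import numpy as np

def simulate_dgp(T=100, burn=50, theta=1.0,
                 A=((0.6, 0.5), (0.0, 0.0)),
                 Sigma=((1.0, 0.7), (0.7, 1.0)), seed=0):
    """Simulate (y_t, x_t, u_t) from the triangular system (1)-(4).

    All default values are illustrative assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    # Stability: both eigenvalues of A must be less than one in modulus.
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        raise ValueError("eigenvalues of A must be less than one in modulus")
    n = T + burn
    # Normal IID innovations e_t with covariance Sigma, eq. (4).
    e = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    u = np.zeros((n + 1, 2))              # u[0] is the zero starting value
    for t in range(1, n + 1):
        u[t] = A @ u[t - 1] + e[t - 1]    # VAR(1) errors, eq. (3)
    u = u[1:]
    x = np.cumsum(u[:, 1])                # Delta x_t = u_2t, eq. (2)
    y = theta * x + u[:, 0]               # y_t = theta x_t + u_1t, eq. (1)
    # Drop the burn-in observations.
    return y[burn:], x[burn:], u[burn:]
```

A call such as `y, x, u = simulate_dgp()` returns series of length `T`; by construction `y - theta * x` reproduces the cointegration error `u[:, 0]`.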
The long-run covariance matrix $\Omega$ and the one-sided covariance matrix $\Delta$, needed to define the asymptotic nuisance parameters, are given by equations (5) and (6), respectively:

$$\Omega = (I - A)^{-1} \Sigma (I - A^\top)^{-1} \tag{5}$$

$$\Delta = G (I - A^\top)^{-1} \tag{6}$$

where $\Sigma$ denotes the innovations covariance matrix of the VAR and $G$ is the unconditional covariance matrix of $u_t$, given by

$$\operatorname{vec} G = (I - A \otimes A)^{-1} \operatorname{vec} \Sigma \tag{7}$$

An early result by Stock (1987) shows that the OLS estimator of $\theta$ obtained from (1) is super-consistent, regardless of the presence of temporal and/or contemporaneous correlation between the regression error, $u_{1t}$, and the error that drives the regressor, $u_{2t}$. On the other hand, in general, the asymptotic distribution of the OLS estimator of $\theta$ falls outside the Local Asymptotic Mixture of Normals (LAMN) family and contains nuisance parameters. The reason for the presence of non-standard asymptotics is that, in the presence of contemporaneous and temporal correlation between the elements of $u_t$, two types of second-order asymptotic effects are present in the limiting distribution of the OLS estimator (see Phillips and Loretan 1991). The first is the nuisance parameter $\omega_{12}/\omega_{22}$, which describes the “long-run correlation” effect, due to non-diagonality of the long-run covariance matrix $\Omega = [\omega_{ij}]$, $i, j = 1, 2$. The second is the nuisance parameter $\delta_{21} = \sum_{k=0}^{\infty} E(u_{20} u_{1k})$, which describes the “endogeneity” effect. In the present case, where there are no feedbacks from the cointegration error to the error that drives the regressor ($a_{21} = 0$), both nuisance parameters have the same source, namely the contemporaneous correlation between $u_{1t}$ and $u_{2t}$ and the temporal correlation between $u_{2t-i}$, $i = 1, 2, \ldots$, and $u_{1t}$.

In order to remove the second-order effects parametrically, we must employ a new regression model whose error term is orthogonal to $u_{2t}$ and $u_{2t-i}$, $i = 1, 2, \ldots$.
This can be done by employing the conditional expectation of $u_{1t}$ either on the current and past values of $u_{2t}$ (set A) or on the current and past values of $u_{2t}$ plus the past values of $u_{1t}$ (set B). As mentioned in the introduction, the first and second conditioning information sets result in the AS and ADL models, respectively. Next, we show how the AS and ADL models are actually derived, starting from the latter.

2.1 The ADL estimator based on the ADL model.

The full system (1) and (2), with errors specified by (3)-(4), implies the following conditional density of $y_t$, for the most general case with $a_{21} \neq 0$:

$$D(y_t \mid x_t, z^{0}_{t-1}, \lambda_1) = N(\theta_1 x_t + c_1 y_{t-1} + c_2 x_{t-1} + c_3 x_{t-2}, \sigma_\nu^2) \tag{8}$$

where $\lambda_1 \equiv (\theta_1, c_1, c_2, c_3, \sigma_\nu^2)$ and

$$\theta_1 = \theta + \frac{\sigma_{12}}{\sigma_{22}} \tag{9}$$

$$c_1 = a_{11} - a_{21} \frac{\sigma_{12}}{\sigma_{22}} \tag{10}$$

$$c_2 = a_{12} - \frac{\sigma_{12}}{\sigma_{22}} (a_{22} + 1 - a_{21}\theta) - a_{11}\theta \tag{11}$$

$$c_3 = a_{22} \frac{\sigma_{12}}{\sigma_{22}} - a_{12} \tag{12}$$

$$\sigma_\nu^2 = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}} \tag{13}$$

This conditional model can be written as the ADL(q,r) regression, with orders (q,r) = (1,2):

$$y_t = \theta_1 x_t + c_1 y_{t-1} + c_2 x_{t-1} + c_3 x_{t-2} + \nu_t \tag{14}$$

The new error term, $\nu_t$, is now orthogonal to $u_{2t}, u_{t-1}, u_{t-2}, \ldots$ and its variance is equal to

$$\sigma_\nu^2 = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}} \tag{15}$$

In the context of the ADL(1,2) model the cointegration parameter $\theta$ is equal to the long-run multiplier of $y_t$ with respect to $x_t$, that is

$$\theta = \frac{\theta_1 + c_2 + c_3}{1 - c_1} \tag{16}$$

This is a relationship between the parameter of interest and the parameters of the conditional model alone, suggesting that it meets the first condition for $x_t$ to be weakly exogenous for $\theta$, in the sense of Engle et al. (1983).[4] This means that we can always estimate (14) by OLS and then use (16) to obtain an efficient estimate of $\theta$. However, additional computations are required to obtain the variance of this estimate (see Banerjee et al. 1993).

[4] The second condition for weak exogeneity requires the parameters of the conditional model and those of the marginal model to be variation-free (see Engle et al. 1983). In the present case, the marginal density of $x_t$ is given by

$$D(x_t \mid z^{0}_{t-1}, \lambda_2) = N(\phi_1 x_{t-1} + \phi_2 x_{t-2} + \phi_3 y_{t-1}, \sigma_{22}) \tag{17}$$

where $\lambda_2 \equiv (\phi_1, \phi_2, \phi_3, \sigma_{22})$ and

$$\phi_1 = 1 - a_{21}\theta + a_{22} \tag{18}$$

$$\phi_2 = -a_{22} \tag{19}$$

$$\phi_3 = a_{21} \tag{20}$$

The variation-free condition between $\lambda_1$ and $\lambda_2$ is achieved in the case that $a_{21} = 0$. This is because, in general, $\lambda_1$ and $\lambda_2$ are not variation-free, due to the following cross restriction between the elements of $\lambda_1$ and $\lambda_2$:

$$(\theta_1 + c_2 + c_3)\,\phi_3 = (1 - c_1)(1 - \phi_2 - \phi_1) \tag{21}$$

On the other hand, if $a_{21} = 0$, variation freeness is restored, $x_t$ becomes weakly exogenous for $\theta$ and OLS on (14) will give a (super) consistent and asymptotically mixed normal estimate of $\theta$.

A more convenient approach, proposed by Bewley (1979), transforms the model (14) in such a way that a point estimate of $\theta$ and its variance can be obtained directly. After some algebraic manipulation, model (14) can be equivalently written as:

$$y_t = \delta_0 \Delta y_t + \theta x_t + \lambda_0 \Delta x_t + \lambda_1 \Delta x_{t-1} + \eta_t \tag{22}$$

where

$$\delta_0 = -\frac{c_1}{1 - c_1}, \quad \lambda_0 = -\frac{c_2 + c_3}{1 - c_1}, \quad \lambda_1 = -\frac{c_3}{1 - c_1}, \quad \eta_t = \frac{1}{1 - c_1}\,\nu_t$$

Estimates of the coefficients and their standard errors can be obtained by using the Instrumental Variables (IV) estimator, with the original matrix of regressors being the instrumental variables (see Wickens and Breusch 1988). This means that the ADL estimator of $\theta$ is very easy to apply, since it involves only IV estimation techniques.

2.2 The DOLS estimator based on the AS model.

The ADL model, derived above, may be thought of as arising from projecting $u_{1t}$ on the full information set $B = (u_{2t}, u_{t-1}, u_{t-2}, \ldots)$, that is

$$E(u_{1t} \mid B) = \frac{\sigma_{12}}{\sigma_{22}} e_{2t} + a_{11} u_{1t-1} + a_{12} u_{2t-1} \tag{23}$$

As already mentioned, the second-order effects can be dealt with by projecting $u_{1t}$ on a subset of this set, namely $A = (u_{2t}, u_{2t-1}, u_{2t-2}, \ldots)$, $A \subset B$. The resulting conditional expectation involves an infinite sum,

$$E(u_{1t} \mid A) = \sum_{i=0}^{\infty} \beta_i u_{2t-i} \tag{24}$$

where the $\beta_i$ are functions of the parameters in (3)-(4).[5] This conditional expectation does not admit a parsimonious representation analogous to (23). On the other hand, it allows for direct substitution of this expression into (1), thus yielding the AS model

$$y_t = \theta x_t + \sum_{i=0}^{\infty} \beta_i \Delta x_{t-i} + \upsilon_t \tag{25}$$

where $\upsilon_t$ is, in general, a serially correlated error term. In particular, $\upsilon_t$ follows the AR(1) model

$$\upsilon_t = \gamma_2 \upsilon_{t-1} + \varepsilon_t \tag{26}$$

where $\gamma_2$ is the MA coefficient in the ARMA(2,1) representation of $u_{2t}$. Specifically, the univariate representation for $u_{2t}$ with $a_{21} = 0$ is

$$u_{2t} - (a_{11} + a_{22}) u_{2t-1} + a_{11} a_{22} u_{2t-2} = \xi_{2t} + \gamma_2 \xi_{2t-1} \tag{27}$$

where $\gamma_2$ solves

$$\sigma_{22} a_{11} \gamma_2^2 + \sigma_{22} (1 + a_{11}^2) \gamma_2 + a_{11} \sigma_{22} = 0 \tag{28}$$

[5] It is easy to show that $\beta_0 = \frac{\sigma_{12}}{\sigma_{22}}$, $\beta_1 = a_{11}\frac{\sigma_{12}}{\sigma_{22}} + a_{12}$, $\beta_2 = a_{11}^2 \frac{\sigma_{12}}{\sigma_{22}} + a_{11} a_{12} + a_{12} a_{22}$, ..., when $a_{21} = 0$.

The last three relationships suggest that the degree of serial correlation in the error of the AS model is controlled by $a_{11}$. This is because in the case of $a_{11} = 0$, the coefficient $\gamma_2$ in (26) is zero, thus yielding a serially uncorrelated error in the AS model.[6] The serial correlation of $\upsilon_t$ does not raise any serious problems in the estimation of $\theta$, provided that a consistent estimator of the long-run variance of $\upsilon_t$ is employed, such as the one proposed by Newey and West (1987). Alternatively, the application of Generalized Least Squares (GLS) on (25) ensures valid asymptotic inferences on $\theta$.[7]

In practice, however, the second term on the right-hand side of (25) has to be replaced by an approximation in which the infinite sum is truncated at $i = p$. The resulting AS(p) model accommodates a truncation remainder that is likely to increase the bias of the DOLS estimator of $\theta$. This bias grows with the parameter $a_{11}$, which mainly controls the persistence of the cointegration error.
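The mapping from the DGP parameters to the ADL(1,2) coefficients, and the size of the AS truncation remainder, can be illustrated numerically. The sketch below is a hedged illustration rather than the authors' code: the function names are hypothetical, and the weights $\beta_i$ are computed only for the special case $a_{21} = a_{22} = 0$, which is a simplifying assumption of this sketch. Summing the omitted weights is used here as one crude summary of the truncation remainder.

```python
import numpy as np

def adl12_coefficients(theta, A, Sigma):
    """Map DGP parameters to the ADL(1,2) coefficients in (9)-(12)."""
    (a11, a12), (a21, a22) = A
    s = Sigma[0][1] / Sigma[1][1]          # sigma_12 / sigma_22
    theta1 = theta + s
    c1 = a11 - a21 * s
    c2 = a12 - s * (a22 + 1.0 - a21 * theta) - a11 * theta
    c3 = a22 * s - a12
    return theta1, c1, c2, c3

def long_run_multiplier(theta1, c1, c2, c3):
    """Recover theta from the ADL(1,2) coefficients, eq. (16)."""
    return (theta1 + c2 + c3) / (1.0 - c1)

def as_weights(a11, a12, s12, s22, n_terms):
    """Weights beta_i in (24), ASSUMING a21 = a22 = 0.

    beta_0 = s12/s22 and beta_i = a11**(i-1) * (a11*s12/s22 + a12), i >= 1.
    """
    s = s12 / s22
    beta = np.empty(n_terms)
    beta[0] = s
    beta[1:] = (a11 * s + a12) * a11 ** np.arange(n_terms - 1)
    return beta

def truncation_remainder(a11, a12, s12, s22, p, n_terms=2000):
    """Sum of the weights omitted when (25) is truncated at i = p.

    The infinite sum is approximated by its first n_terms terms.
    """
    beta = as_weights(a11, a12, s12, s22, n_terms)
    return beta[p + 1:].sum()

if __name__ == "__main__":
    # Illustrative parameter values (assumptions of this sketch).
    for a11 in (0.3, 0.6, 0.9):
        rem = [truncation_remainder(a11, 0.5, 0.7, 1.0, p) for p in (1, 5, 20)]
        print(a11, np.round(rem, 4))
```

Under these assumptions the omitted weights decay geometrically at rate $a_{11}$, so a more persistent cointegration error leaves a larger remainder at any fixed truncation lag $p$.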
Increasing the truncation point reduces the DOLS bias, but increases its variance. Moreover, estimating (25) by OLS is not feasible if $p$ is too large compared to the sample size. Saikkonen (1991) specifies an upper bound for the rate at which $p$ is allowed to increase with the sample size $T$, given by the condition $p^3/T \to 0$. Nevertheless, this condition cannot be used to define the optimal value of $p$ for any given sample size. Finally, it is easy to show that when $a_{11} = 0$, the ADL model reduces to the AS model. In this case, the ADL(q,r) and AS(p) models implied by this specific DGP are the ADL(0,2) and AS(1) models, respectively.

[6] See also Stock and Watson (1993, p. 798) for a similar discussion on this issue, for the general case with $a_{21} \neq 0$.

[7] Note that in the case of a linear regression which involves an I(1) strictly exogenous regressor, the OLS is asymptotically equivalent to the GLS estimator (see Kramer 1986, Park and Phillips 1988).

3 Simulation Results

In this section, we attempt to quantify the cost of employing the AS(p) instead of the ADL(q,r) model for the estimation of $\theta$, by means of Monte Carlo simulations. The OLS and IV estimators applied to the AS(p) and ADL(1,2) models (22), respectively, are referred to as the DOLS(p) and ADL(1,2) estimators. The serial correlation effect on the DOLS(p) estimator is taken into account by means of the autocorrelation consistent covariance matrix estimator of Newey and West (1987). The bandwidth parameter is estimated non-parametrically, according to Newey and West (1994). Alternatively, we assume an AR(1) model for $\upsilon_t$ and employ the feasible generalized least squares estimator, referred to as the DGLS(p). The truncation parameter, $p$, takes values in the interval [1, 20], by steps of 1. As mentioned in the introduction, the comparison is extended to include some other commonly used estimators, such as the FMLS, the PL(s,l) and the JOH(z) estimators.[8] The mean bias, median bias and average root mean squared error (MSE) are used to assess the estimators. The associated t-tests are assessed by comparing the 2.5% ($t_{0.025}$) and the 97.5% ($t_{0.975}$) points in the empirical distributions of the relevant t-statistics with those from the standard N(0,1). Moreover, for nominal sizes of 5%, the empirical sizes of the t-tests for testing the hypothesis $\theta = 1$ are computed. We generate 2000 series of length 150, starting with $u_{10} = u_{20} = 0$, and then discard the initial 50 observations, thus generating a sample size of 100.

[8] The FMLS estimator is based on consistent estimation of the matrices $\Omega$ and $\Delta$, which in turn requires the selection of a kernel and the determination of the bandwidth. We employ the Quadratic Spectral kernel and determine the bandwidth by means of the Andrews (1991) data-dependent procedure. Moreover, the “prewhitened” version of FMLS (PW-FMLS), which filters the error vector $\tilde{u}_t$ prior to estimating $\Omega$ and $\Delta$, is also employed (see Christou and Pittis 2002 for a discussion on the performance of the various versions of the FMLS estimator). Regarding the PL(s,l) estimator, the orders s and l refer to the lags and leads of $\Delta x_t$, respectively. Finally, the order z in the JOH(z) estimator corresponds to the lag order of the Vector Autoregressive model on which this estimator is based.

Although many other parameter settings were run, we only report the results for the leading case {$a_{12} = 0.5$, $\sigma_{12} = 0.7$, $a_{21} = a_{22} = 0$, $\sigma_{11} = \sigma_{22} = 1$, $\theta = 1$ and $0 < a_{11} < 1$}, referred to as DGP1, because this summarizes the main differences between the ADL(1,2) and DOLS(p)/DGLS(p) estimators. In this case, the regressor $x_t$ is a random walk and weakly exogenous for $\theta$, in the context of the conditional model (14). The asymptotic nuisance parameters, $\omega_{12}/\omega_{22}$ and $\delta_{21}$, reduce to:

$$\frac{\omega_{12}}{\omega_{22}} = \delta_{21} = \frac{a_{12} + \sigma_{12}}{1 - a_{11}} \tag{29}$$

It is easy to show that when $a_{11} \to 1$, then

$$\frac{\omega_{12}}{\omega_{22}} = \delta_{21} \to \begin{cases} +\infty & \text{if } a_{12} + \sigma_{12} > 0 \\ -\infty & \text{if } a_{12} + \sigma_{12} < 0 \end{cases} \tag{30}$$

This means that the magnitude of the nuisance parameters increases with the persistence of the cointegration error, thus amplifying the truncation effect on the DOLS(p) and DGLS(p) estimators. The key parameter $a_{11}$ takes the values 0.3, 0.6 and 0.9. A near-to-unit root case is also examined by setting $a_{11} = 0.95$.[9]

[9] Given the values of $a_{12}$, $a_{21}$ and $a_{22}$ in this design, a value of $a_{11}$ as large as 0.95 still satisfies the eigenvalue stability condition for the VAR model of the errors.

First, we focus solely on comparing ADL(1,2) with DOLS(p) and DGLS(p). The results concerning the mean bias, median bias and MSE of these estimators are reported in Figures 1A-1D, 2A-2D and 3A-3D, respectively, and are summarized as follows:

(i) The mean (or median) bias for all the estimators, namely ADL(1,2), DOLS(p) and DGLS(p), increases with the degree of persistence of the cointegration error.

(ii) DOLS(p) and DGLS(p) perform far worse than ADL(1,2) in bias and MSE for small values of the truncation parameter, $p$. When $p$ increases, the DOLS(p) and DGLS(p) bias converges to that of ADL(1,2). However, the lag length necessary to reduce the bias of DOLS(p) and DGLS(p) towards the bias of ADL(1,2) increases with the persistence of the cointegration error. For example, when $a_{11}$ is equal to 0.3, 0.6 and 0.9, the number of lags necessary to bring the bias of DOLS(p) down to the level of ADL(1,2) is 4, 7 and 20, respectively. In the near-to-unit root case, $a_{11} = 0.95$, the performance of DOLS(20) and DGLS(20) in bias is still much worse than that of ADL(1,2).

(iii) For small values of $p$, DOLS(p) fares much better than DGLS(p). When $p$ becomes sufficiently large, DOLS(p) and DGLS(p) become equivalent in bias and MSE.

(iv) When $p$ increases, the rate of decrease of the bias of DOLS(p) and DGLS(p) is much higher than the rate at which the standard deviation of these estimators increases, for all the values of $a_{11}$ except $a_{11} = 0.3$. This explains why the MSE is a decreasing function of $p$ for all the values of $a_{11}$ except $a_{11} = 0.3$.

(v) When we increase the sample size to 300, the overall picture regarding the relative performance of the ADL(1,2) and DOLS(p)/DGLS(p) estimators remains the same.

Next, we compare the ADL(1,2) estimator, which so far has emerged as the best estimator, with the rest of the estimators under scrutiny. For the DGP under study, the optimal orders s, l, and z for the PL(s,l) and JOH(z) estimators are 1, 0 and 2, respectively.[10] The results are reported in Table 1 and summarized below:

(i) As expected, the performance of the PL(1,0) estimator is comparable to that of ADL(1,2), since both estimators utilize the same dynamic structure. The JOH(2) estimator also fares well, especially for the most persistent cases of $a_{11} = 0.9$ and $a_{11} = 0.95$.

(ii) The standard FMLS and, to a lesser extent, the prewhitened FMLS estimators underperform ADL(1,2), PL(1,0) and JOH(2) for all the values of $a_{11}$. For example, for $a_{11} = 0.6$ the bias of the FMLS, the PW-FMLS and the ADL(1,2) estimators is equal to 0.066, 0.0202 and 0.0017, respectively.

(iii) Comparing DOLS(p) with the PW-FMLS estimator yields ambiguous results. For $a_{11} = 0.3$ and $a_{11} = 0.6$, DOLS(p) dominates PW-FMLS in terms of bias for all but very small values of $p$. For $a_{11} = 0.95$, however, the opposite is true; the PW-FMLS estimator is less biased than DOLS(p) for all the values of $p$ that are less than or equal to 13.

We now turn to the problem of inference by examining the empirical distribution of the estimators’ t-statistics as well as the corresponding empirical sizes for testing the hypothesis $\theta = 1$. Table 2 reports the 2.5% ($t_{0.025}$) and 97.5% ($t_{0.975}$) points of the empirical distribution of the t-statistics for all the estimators under consideration and for the four values of $a_{11}$. Again we start the comparisons by focusing on the ADL(1,2), DOLS(p) and DGLS(p) estimators.
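The bias and MSE experiment for DGP1 described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: the ADL(1,2) estimate is obtained by OLS on (14) followed by the long-run multiplier (16) rather than by the IV form (22), no deterministic terms are included, DGLS(p) and the t-statistics are omitted, and all function names are hypothetical.

```python
import numpy as np

def simulate_dgp1(a11, T=100, burn=50, theta=1.0, a12=0.5, s12=0.7, rng=None):
    """DGP1: a21 = a22 = 0 and sigma_11 = sigma_22 = 1, so x_t is a random walk."""
    rng = np.random.default_rng() if rng is None else rng
    n = T + burn
    cov = np.array([[1.0, s12], [s12, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u2 = e[:, 1]                               # u_2t = e_2t when a21 = a22 = 0
    u1 = np.zeros(n)
    prev_u1, prev_u2 = 0.0, 0.0                # u_10 = u_20 = 0
    for t in range(n):
        u1[t] = a11 * prev_u1 + a12 * prev_u2 + e[t, 0]
        prev_u1, prev_u2 = u1[t], u2[t]
    x = np.cumsum(u2)
    y = theta * x + u1
    return y[burn:], x[burn:]                  # discard the initial observations

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def adl12_theta(y, x):
    """OLS on the ADL(1,2) model (14), then the long-run multiplier (16)."""
    X = np.column_stack([x[2:], y[1:-1], x[1:-1], x[:-2]])
    th1, c1, c2, c3 = ols(X, y[2:])
    return (th1 + c2 + c3) / (1.0 - c1)

def dols_theta(y, x, p):
    """OLS on the AS(p) model: y_t on x_t and Delta x_t, ..., Delta x_{t-p}."""
    dx = np.diff(x)                            # dx[k] = x[k+1] - x[k]
    t0 = p + 1                                 # first usable observation
    lags = [dx[t0 - 1 - i: len(dx) - i] for i in range(p + 1)]
    X = np.column_stack([x[t0:]] + lags)
    return ols(X, y[t0:])[0]

def monte_carlo(a11, reps=2000, T=100, p_values=(1, 4, 8), seed=0):
    """Mean bias, median bias and RMSE of the estimators of theta = 1."""
    rng = np.random.default_rng(seed)
    draws = {"ADL(1,2)": []}
    draws.update({f"DOLS({p})": [] for p in p_values})
    for _ in range(reps):
        y, x = simulate_dgp1(a11, T=T, rng=rng)
        draws["ADL(1,2)"].append(adl12_theta(y, x))
        for p in p_values:
            draws[f"DOLS({p})"].append(dols_theta(y, x, p))
    out = {}
    for name, vals in draws.items():
        err = np.asarray(vals) - 1.0
        out[name] = {"mean_bias": err.mean(),
                     "median_bias": np.median(err),
                     "rmse": np.sqrt(np.mean(err ** 2))}
    return out

if __name__ == "__main__":
    for a11 in (0.3, 0.6, 0.9):
        print(a11, monte_carlo(a11, reps=500))
```

The lag values in `p_values` and the number of replications in the usage example are arbitrary choices; the full grid $p = 1, \ldots, 20$ can be passed in the same way.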
The results in Table 2 suggest that the DOLS(p) and DGLS(p) t-statistics are not, in general, well approximated by a standard N(0,1), even when a sufficiently large value of $p$ is employed. On the other hand, the ADL(1,2) t-statistic is much better approximated by the standard N(0,1), especially when the persistence of the cointegration error is not particularly high. Moreover, the value of $p$ that minimizes the bias of DOLS(p) and DGLS(p) does not always coincide with the value of $p$ that minimizes the distributional divergence of the corresponding t-statistics from the standard N(0,1). For example, for a moderately persistent cointegration error, that is for $a_{11} = 0.6$, the bias of DOLS(p) reaches the level of the ADL(1,2) for p=7. For this value of $p$ the 2.5% and 97.5% points of the corresponding t-statistic distribution are -2.9 and 2.9, respectively. The situation deteriorates for higher values of $a_{11}$. For $a_{11} = 0.9$, the biases of both DOLS(p) and DGLS(p) are minimized for p=20, a value for which the $t_{0.025}$ and $t_{0.975}$ points are equal to -4.7 and 6.2, respectively, for DOLS(p), and -2.3 and 3.8, respectively, for DGLS(p). More dramatic effects occur when the cointegration error is nearly nonstationary, that is when $a_{11} = 0.95$. On the other hand, for $a_{11} = 0.9$, the $t_{0.025}$ and $t_{0.975}$ points for ADL(1,2) are -1.7 and 3.8, respectively, thus ensuring much more reliable inferences on $\theta$. These distributional characteristics of the t-statistics are reflected in the empirical sizes of the t-tests for testing the hypothesis $\theta = 1$. The results, reported in Figures 4A-4D, reveal large size distortions for both DOLS(p) and DGLS(p) in the following two cases: first, when $a_{11} = 0.6$ and the value of $p$ is relatively small; second, when the cointegration error is highly persistent, that is when $a_{11} = 0.9$ and, even worse, when $a_{11} = 0.95$. In the second case, the size distortions are present regardless of the value of $p$, and yield totally unreliable inferences.
For example, for $a_{11} = 0.9$ the empirical size of DOLS(p) ranges from 72 percent for p=1 to 43 percent for p=20. At the same time, the empirical size of ADL(1,2) is at the reasonable level of 15 percent. Increasing the sample size to 300 yields qualitatively similar results. For example, for $a_{11} = 0.9$, the empirical size of DOLS(p) is 67 percent for p=1 and reduces to 36 percent for p=20, while the size of the ADL(1,2) is at the level of 7 percent.

[10] For the prewhitened version (PW) of FMLS, a VAR(1) model is used as the filter for prewhitening residuals. That is, the VAR filter coincides with the true model for $u_t$, thus creating the best-case environment for the performance of the PW-FMLS estimator.

We now examine the issue of statistical inference in the context of the PL(1,0), the JOH(2), the FMLS and the PW-FMLS estimators. The $t_{0.025}$ and $t_{0.975}$ values for these estimators are also reported in Table 2, whereas the corresponding empirical sizes are reported in the last column of Table 1. The FMLS bias, reported above, is accompanied by size distortions which become more severe as the value of $a_{11}$ increases. For example, for $a_{11} = 0.9$, the empirical size of FMLS and PW-FMLS is 63 percent and 29 percent, respectively, whereas the corresponding size for the ADL(1,2) is as low as 15 percent. These distortions are due to the large divergence of the FMLS t-statistic from the standard normal, occurring when the persistence of the cointegration error is high. For example, for $a_{11} = 0.95$, the value of $t_{0.975}$ is 38.78 and 8.73 for FMLS and PW-FMLS, respectively. The empirical size of PL(1,0) is almost identical to that of ADL(1,2) for all the values of $a_{11}$. This, however, does not seem to be the case for the JOH-based t-test, which appears to be under-sized for low and moderate degrees of persistence of the cointegration error.
Specifically, for a11 = 0.3 and a11 = 0.6 the value of t0.025 is -0.952 and -1.211, respectively, resulting in empirical sizes that are substantially smaller than the nominal ones.

As far as alternative parametrizations are concerned, we run the following simulations:
(i) The second-order effects arise solely from the contemporaneous correlation between the innovations of the error, that is a12 = 0, σ12 = 0.7, a21 = a22 = 0, σ11 = σ22 = 1, θ = 1 and 0 < a11 < 1.
(ii) The error that drives the regressor Granger-causes the cointegration error, but the contemporaneous correlation between the two errors is zero, that is a12 = 0.5, σ12 = 0, a21 = a22 = 0, σ11 = σ22 = 1, θ = 1 and 0 < a11 < 1.
In both of these cases, the results are qualitatively similar to those of the leading case. DOLS(p) and DGLS(p) are generally beaten by ADL(1,2) in bias and MSE for all values of p under consideration. When p reaches a sufficiently large value, say p∗, the performance of DOLS(p∗) and DGLS(p∗) approaches that of ADL(1,2). As in the leading case, p∗ increases with the degree of persistence of the cointegration error. The problems of statistical inference on θ in both cases are very similar to those reported for the leading case.

Finally, we briefly discuss the case where the key parameter a11 is set equal to zero, that is a12 = 0.5, σ12 = 0.7, a21 = a22 = 0, σ11 = σ22 = 1, θ = 1 and a11 = 0. This is a case where the AS model utilizes an exact rather than an approximate projection of u1t on the current and past values of u2t, which in turn implies that the DOLS(1) estimator utilizes the correct model, whereas the ADL(1,2) estimator is based on a slightly overspecified model.11 The simulation results seem to confirm the theoretical predictions. The bias and standard deviation of DOLS(1) are slightly smaller than those of ADL(1,2).
Of course, the addition of more lags of ∆xt in the AS model increases the variability (and the MSE) of DOLS(p), but this is something that occurs in the case of an over-parametrized ADL(q,r) model as well.

11 In this case, υt is serially uncorrelated, which in turn implies that neither non-parametric nor GLS-type corrections are necessary.

3.1 Information Criteria

The analysis, so far, seems to favor the ADL(q,r) over the DOLS(p)/DGLS(p) estimation method for conducting inferences on θ. In fact, this estimator dominates, in some or all the aspects of statistical inference, not only the DOLS(p)/DGLS(p) estimator but also the rest of the estimators presently under study. Throughout the analysis, we assumed that the ADL(q,r) estimator utilizes the correct dynamic model, implied by the DGP (1) - (4), that is q=1, r=2. In such a case, the performance of the ADL(1,2) estimator may be thought of as the limiting performance of the DOLS(p) or DGLS(p) estimators. Does this clearly suggest that in empirical applications, researchers should always employ ADL(q,r) for estimating θ? The answer seems to be in the affirmative, conditional, however, on the ability of researchers to determine the correct dynamic model for each particular case, that is to select the correct values for q and r. A more realistic experiment for measuring the benefits from employing ADL(q,r) over DOLS(p)/DGLS(p) should incorporate the issue of selecting the lag orders (q,r) and p in the corresponding estimators.

To address this issue, we design the following experiment, in the context of the DGP1: We consider the family of ADL(q,r) estimators that arise from allowing q and r to take integer values in the interval [0,4], thus obtaining fourteen ADL(q,r) estimators. In this class, and for the specific DGP under study, ADL(0,0), ADL(0,1), ADL(1,0) and ADL(1,1) are underspecified, ADL(1,2) is correctly specified, and the rest are over-specified.
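The order-selection exercise for the ADL(q,r) family can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: it simulates a triangular DGP of the form yt = θxt + u1t, ∆xt = u2t with VAR(1) errors and a21 = a22 = 0, includes an intercept in every regression, scores candidates with textbook forms of the AIC, SIC and HQ criteria on a common estimation sample, and searches the full grid of q and r in [0,4]. The function names and these design choices are hypothetical and are not taken from the paper's own code.

```python
import numpy as np

def simulate_dgp(T, a11, a12=0.5, s12=0.7, theta=1.0, seed=0, burn=100):
    """Simulate y_t = theta*x_t + u1_t, dx_t = u2_t with VAR(1) errors (a21 = a22 = 0)."""
    rng = np.random.default_rng(seed)
    A = np.array([[a11, a12], [0.0, 0.0]])
    Sigma = np.array([[1.0, s12], [s12, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), Sigma, size=T + burn)
    u = np.zeros((T + burn, 2))
    for t in range(1, T + burn):
        u[t] = A @ u[t - 1] + e[t]
    u = u[burn:]                      # drop burn-in observations
    x = np.cumsum(u[:, 1])            # integrated regressor
    y = theta * x + u[:, 0]
    return y, x

def adl_design(y, x, q, r, maxlag):
    """Regressors of an ADL(q,r): intercept, x_t..x_{t-r}, y_{t-1}..y_{t-q}."""
    T = len(y)
    cols = [np.ones(T - maxlag)]
    cols += [x[maxlag - j:T - j] for j in range(r + 1)]
    cols += [y[maxlag - i:T - i] for i in range(1, q + 1)]
    return np.column_stack(cols), y[maxlag:]

def select_orders(y, x, max_order=4):
    """Return the (q, r) chosen by AIC, SIC and HQ on a common estimation sample."""
    n = len(y) - max_order
    best = {}
    for q in range(max_order + 1):
        for r in range(max_order + 1):
            X, yy = adl_design(y, x, q, r, max_order)
            beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
            resid = yy - X @ beta
            k = X.shape[1]                        # number of estimated coefficients
            log_s2 = np.log(resid @ resid / n)    # log residual variance
            crit = {
                "AIC": log_s2 + 2.0 * k / n,
                "SIC": log_s2 + k * np.log(n) / n,
                "HQ": log_s2 + 2.0 * k * np.log(np.log(n)) / n,
            }
            for name, value in crit.items():
                if name not in best or value < best[name][0]:
                    best[name] = (value, q, r)
    return {name: (q, r) for name, (_, q, r) in best.items()}

y, x = simulate_dgp(T=100, a11=0.6)
chosen = select_orders(y, x)
print(chosen)
```

Because the SIC and HQ penalties per parameter exceed the AIC penalty at this sample size, AIC never picks a smaller model than HQ, and HQ never picks a smaller model than SIC. Repeating the call over many seeds would give selection frequencies of the kind discussed in this section.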
We also consider twenty-one DOLS(p) estimators and another twenty-one DGLS(p) estimators, by allowing p to take integer values in the interval [0,20]. As far as the PL(s,l) and JOH(z) estimators are concerned, we allow s and z to take integer values in the interval [1,4].12 Since no leads of the regressor are required for this particular DGP, we set l equal to zero. To select the orders (q,r), p, s and z, we use the three most commonly used information criteria for model selection, namely the Akaike (1974), the Schwarz (1978) and the Hannan and Quinn (1979) criteria, denoted by AIC, SIC and HQ, respectively. In each replication, we select the orders of ADL(q,r), DOLS(p)/DGLS(p), PL(s,0) and JOH(z) by each of the three criteria and calculate the statistics defined in the previous section. The average values of the statistics concerning the estimation precision are reported in Table 3, whereas those on hypothesis testing are reported in Table 4. We also report the frequencies at which each criterion selects the orders (q,r) and p, in Figures 5A-5D, 6A-6D and 7A-7D for the ADL(q,r), DOLS(p) and DGLS(p) estimators, respectively. For brevity, we do not report the selection frequencies of the orders s and z in PL(s,0) and JOH(z), respectively, but we briefly discuss them in the text. We consider sample sizes of 100 and 300, but report the results only for the former.

First, we confine our discussion to the comparison between the ADL(q,r) and DOLS(p)/DGLS(p) estimators. The main results may be summarized as follows:
(i) Irrespective of the value of a11, the SIC and HQ criteria select the correct specification of the dynamic model, i.e. the ADL(1,2), in 85 percent of the cases, whereas the respective figure for the AIC is 60 percent. Increasing the sample size to 300 increases the frequency at which the correct ADL model is selected to 65 percent for AIC and to 95 percent for SIC and HQ.
(ii) In the context of the DOLS(p)/DGLS(p) estimators, SIC and HQ fail to select a sufficiently large p, especially for large values of a11. In the context of DOLS(p), the best performing criterion is by far AIC, which tends to point towards large values of p as the persistence of the cointegration error increases. The performance of AIC, however, is greatly reduced in the context of the DGLS(p) estimator, where AIC is still the best criterion but only by a slight margin over SIC and HQ.
(iii) The behavior of the information criteria has the following consequences: the mean bias, median bias and MSE are much lower in the context of ADL(q,r) than DOLS(p)/DGLS(p), especially as a11 tends to unity. For example, when a11 = 0.95, the average bias of the AIC-based ADL(q,r) is four and thirteen times lower than the bias of the AIC-based DOLS(p) and DGLS(p), respectively. The picture is similar as regards hypothesis testing. The distributions of the ADL(q,r), DOLS(p) and DGLS(p) t-statistics shift to the right as the persistence of the cointegration error increases. This is due to the fact that the nuisance parameters ω12/ω22 and δ21 tend to +∞ as a11 approaches unity. This shift is pronounced in the case of the “partly corrected” DOLS(p) and DGLS(p) estimators, thus yielding empirical sizes of 67 percent and 99 percent, respectively, for a11 = 0.95. For the same degree of persistence, the empirical size of the ADL(q,r) procedure is around 23 percent, regardless of the information criterion employed.

12 Obviously, the problem of selecting the correct lag order is not relevant for FMLS or PW-FMLS, due to their non-parametric nature. A comparable issue concerns the selection of the optimal bandwidth by means of optimality criteria, such as the ones suggested by Andrews (1991) or Newey and West (1994). We do not attempt to deal with this issue in detail, since it is clearly outside the scope of the paper.

(iv) Turning to the relative performance of the DOLS(p) versus DGLS(p) estimators, the superiority of the DOLS(p) estimator is evident in both estimation and hypothesis testing. This is due to the fact that all the criteria fail to select a sufficiently large p for the DGLS(p) estimator. When a11 = 0.9, the mean biases of the AIC-based DOLS(p) and DGLS(p) are 0.08 and 0.44, respectively, whereas for a11 = 0.95 the corresponding biases climb to 0.27 and 0.85, respectively.
(v) There is a simple reason why AIC is the best performing criterion in the context of DOLS(p), whereas it does worse than SIC and HQ in the context of the ADL(q,r) estimator: AIC is an asymptotically efficient criterion, that is, it selects the model that best fits the data without assuming that the correct model belongs to the set of candidate models. This is clearly the relevant situation for the class of the AS(p) models under consideration, since the correct model assumes p=∞. On the other hand, when the class of the ADL(q,r) models is considered, the correct model, ADL(1,2), belongs to the set of candidate models. In such a case, consistent selection criteria, such as SIC and HQ, work well in selecting the correct model for reasonable sample sizes.

Now we examine the performance of the PL(s,0), the JOH(z) and the PW-FMLS estimators. The main results are summarized below:
(i) The frequencies at which the criteria select the correct PL(1,0) model are almost equal to the corresponding ones for the ADL(1,2) model. As a result, the performance of PL(s,0) is comparable to that of ADL(q,r) as far as estimation precision and reliability of statistical inferences are concerned. Similar results are obtained for the JOH(z) estimator. Therefore, the ADL(q,r), PL(s,0) and JOH(z) estimators may be thought of as forming a class of parametric estimators, say Class A, with similar characteristics.
(ii) The PW-FMLS estimator, with the bandwidth parameter selected by the Andrews (1991) data-dependent procedure, and the DOLS(p) estimator, with p selected by any of the three criteria under study, seem to form a second class of estimators, say Class B. Any estimator of Class A seems to dominate any estimator of Class B in any aspect of statistical inference. Finally, the standard FMLS and the DGLS(p) estimators seem to form a third class, say Class C, consisting of the worst-performing estimators.

3.2 Further extensions

So far, regarding statistical inferences on θ, the Monte Carlo evidence strongly suggests the use of an estimator from Class A (in particular, ADL(q,r)) over estimators from Class B or, even more so, over estimators from Class C. Moreover, attention has focused on the case that the cointegration error and the first difference of the regressor are generated by a VAR(1) process, and the cointegration error does not Granger-cause the error that drives the regressor. The ADL(1,2) model is the correct model implied by this specific DGP and its order is successfully selected by the three most commonly used information criteria, especially the consistent ones, i.e. the SIC and HQ criteria. Next, we investigate the extent to which the relative performance of the estimators remains unchanged when alternative specifications of the error dynamics are considered. In particular, we extend our simulations to include the following cases:
(i) the cointegration error Granger-causes the error that drives the regressor, that is a21 ≠ 0;
(ii) the cointegration error and the error that drives the regressor follow a VAR(2) process;
(iii) the cointegration error and the error that drives the regressor follow a first-order vector moving average, VMA(1), process.

3.2.1 The cointegration error Granger-causes the error that drives the regressor.

In this set of simulations, the error vector, ut, is still a VAR(1) process, but the transition matrix A does not contain any zero elements, except for a22. In this case, as opposed to the ones analyzed in the previous section, there are feedbacks from the cointegration error to the error that drives the regressor. This, in turn, implies that further augmentation of the ADL(q,r) and AS(p) models by g leads of xt and t leads of ∆xt, respectively, is required for asymptotic optimality. The resulting estimators, referred to as ADL(q,r,g) and DOLS(p,t)/DGLS(p,t), aim at removing the second-order asymptotic bias effects that arise from contemporaneous and temporal correlation between the elements of ut (see Phillips and Loretan 1991, Saikkonen 1991, Stock and Watson 1993).13 In this respect, we consider the family of ADL(q,r,g) estimators that arise from allowing q, r and g to take integer values in the interval [0,4], thus obtaining fourteen ADL(q,r,g) estimators. We also consider nine DOLS(p,t) estimators and another nine DGLS(p,t) estimators, by allowing p and t to take integer values in the interval [0,4]. Finally, we consider fourteen PL(s,l) estimators by allowing s and l to take integer values in the intervals [1,4] and [0,4], respectively. To select the orders of these estimators, we use the three criteria mentioned above. The design of this set of simulations is the same as that described in the previous section. The DGP under consideration is the following: {a12 = 0.5, σ12 = 0.7, a21 = 0.5, a22 = 0, σ11 = σ22 = 1, θ = 1 and 0 < a11 < 1}. The parameter a11 is set equal to 0.3 and 0.7, in order for the eigenvalue stability condition to be satisfied. The average values of the statistics concerning the estimation precision and hypothesis testing are tabulated in Tables 5 and 6, respectively.
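The restriction of a11 to 0.3 and 0.7 in this design can be checked numerically. The following Python sketch, with hypothetical helper names `transition` and `is_stable`, computes the eigenvalues of the VAR(1) transition matrix with a12 = a21 = 0.5 and a22 = 0. Under these values a short calculation suggests that the largest eigenvalue, (a11 + sqrt(a11² + 1))/2, stays below one only for a11 < 0.75, which would explain why the values 0.9 and 0.95 used in the leading case are not available here.

```python
import numpy as np

def transition(a11, a12=0.5, a21=0.5, a22=0.0):
    """VAR(1) transition matrix A for the error vector u_t."""
    return np.array([[a11, a12], [a21, a22]])

def is_stable(A):
    """True when all eigenvalues of A lie strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

for a11 in (0.3, 0.7, 0.8, 0.9):
    largest = np.max(np.abs(np.linalg.eigvals(transition(a11))))
    print(a11, round(float(largest), 4), is_stable(transition(a11)))
```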
We also report the frequencies at which each criterion selects the orders of the ADL(q,r,g), DOLS(p,t) and DGLS(p,t) estimators in Figures 8A to 8B, 9A to 9B and 10A to 10B, respectively.

We first compare the ADL(q,r,g), DOLS(p,t) and DGLS(p,t) estimators. This set of simulations provides further evidence on the dominance of the ADL class of estimators over the DOLS/DGLS one, in terms of both estimation precision and reliability of statistical inference. However, the difference in the performance between ADL(q,r,g) and DOLS(p,t)/DGLS(p,t) is less prominent in this case than it was in the leading case, DGP1. This is due to the fact that when a21 ≠ 0, the “long-run correlation” parameter ω12/ω22 converges to a well defined limit as a11 → 1. On the other hand, when a21 = 0, the nuisance parameter ω12/ω22 tends to infinity as a11 → 1. As a consequence of the limiting behavior of ω12/ω22, the average biases of the ADL(q,r,g) estimators for a11 = 0.7 lie between 0.0008 and 0.0015 depending on the selection criterion, whereas the corresponding biases of the DOLS(p,t) and DGLS(p,t) estimators lie between 0.0009 and 0.0019, and 0.0014 and 0.0032, respectively. Similarly, statistical inferences on θ are much more reliable in the context of the ADL(q,r,g) estimator, as suggested by the ADL(q,r,g) empirical sizes, which hardly exceed 10 percent, irrespective of the selection criterion used and the value of a11. On the other hand, the empirical sizes for the DOLS(p,t) and DGLS(p,t) t-tests range from 18.6 percent to 21.9 percent and from 11.7 percent to 19.5 percent, respectively.

As far as the rest of the estimators are concerned, the discussion is confined solely to hypothesis testing, for reasons of space. The PL(s,l) t-statistic is distributed approximately as N(0,1), thus resulting in very reliable statistical inferences on θ.
On the other hand, the JOH(z)-based t-test is under-sized, especially for small values of a11, despite the fact that the information criteria select the correct lag order, z=2, at a frequency that ranges from 90 to 98 percent. Interestingly, the PW-FMLS procedure allows for statistical inferences of reasonable accuracy. In particular, the empirical size of the associated t-test is 6 percent and 10.6 percent for a11 = 0.3 and a11 = 0.7, respectively. This means that the performance of the PW-FMLS estimator is comparable to that of DGLS(p) and, for small values of a11, even to that of ADL(q,r,g). However, the relatively good properties of PW-FMLS are not shared by the standard FMLS, which remains the worst-performing estimator, producing empirical sizes of 10.9 percent and 49.6 percent for a11 = 0.3 and a11 = 0.7, respectively.

13 In a similar vein, the order, l, in the PL(s,l) estimator is assumed to be greater than zero.

3.2.2 VAR(2) errors

In this set of simulations we investigate the extent to which the ADL(q,r) estimator outperforms the DOLS/DGLS(p) one in the case that the errors are generated by a bivariate VAR(2) process. First, we obtain the ADL(q,r) model implied by this DGP, and second, we derive the conditions under which the ADL(q,r) model reduces to the AS(p) model. Specifically, we assume that the errors are generated by the following process:

\[
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} u_{1,t-1} \\ u_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
\begin{pmatrix} u_{1,t-2} \\ u_{2,t-2} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix},
\qquad
\begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix} \sim NIID\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right]
\tag{31}
\]

with a21 = 0, b21 = 0, that is, there is no Granger-causality running from the cointegration error to the error that drives the regressor. The VAR(2) structure of ut implies that the conditional expectation of u1t on the full information set can be summarized as follows:

\[
E(u_{1t} \mid u_{2t}, u_{t-1}, u_{t-2}, \ldots) = \frac{\sigma_{12}}{\sigma_{22}} e_{2t} + a_{11} u_{1,t-1} + a_{12} u_{2,t-1} + b_{11} u_{1,t-2} + b_{12} u_{2,t-2}
\tag{32}
\]

This conditional expectation gives rise to the following ADL(2,3) model

\[
y_t = \theta x_t + d_1 y_{t-1} + d_2 y_{t-2} + d_3 x_{t-1} + d_4 x_{t-2} + d_5 x_{t-3} + \nu_t
\tag{33}
\]

where

\[
d_1 = a_{11} - a_{21} \frac{\sigma_{12}}{\sigma_{22}}
\tag{34}
\]

\[
d_2 = b_{11} - b_{21} \frac{\sigma_{12}}{\sigma_{22}}
\tag{35}
\]

\[
d_3 = a_{12} - a_{11}\theta + (a_{21}\theta - a_{22} - 1) \frac{\sigma_{12}}{\sigma_{22}}
\tag{36}
\]

\[
d_4 = b_{12} - b_{11}\theta - a_{12} + (b_{21}\theta + a_{22} - b_{22}) \frac{\sigma_{12}}{\sigma_{22}}
\tag{37}
\]

\[
d_5 = b_{22} \frac{\sigma_{12}}{\sigma_{22}} - b_{12}
\tag{38}
\]

It is easy to show that the ADL(2,3) model reduces to the AS(2) model when a11 = b11 = 0. Similarly to the previous experiments, we consider the family of ADL(q,r) estimators that arise from allowing q and r to take integer values in the interval [0,4], thus obtaining fourteen ADL(q,r) estimators. Regarding the AS(p) model, we allow p to take integer values in the interval [0,20], thus obtaining twenty-one DOLS(p)/DGLS(p) estimators.14 To select the q, r and p orders, we employ the information criteria employed in our previous simulations. In each replication, we determine the orders of ADL(q,r) and DOLS(p)/DGLS(p) by each of the three criteria, and then we use the resulting estimators to calculate the statistics defined in the previous section.

14 For the DGLS(p) estimator, we assume an AR(2) model for υt.

The DGP under consideration is the following: {a12 = b12 = 0.5, σ12 = 0.7, a21 = a22 = b21 = b22 = 0, σ11 = σ22 = 1, θ = 1 and 0 ≤ a11 < 1, 0 ≤ b11 < 1}. The parameter a11 is set equal to 0, 0.3 and 0.6, while b11 takes the values of 0 and 0.3, in order for the eigenvalue stability condition to be satisfied.15 To conserve space, we do not report the results from these experiments, but we briefly discuss them below. All in all, the simulation results continue to provide strong evidence in favor of the ADL(q,r) models, similar to our leading case, DGP1. The SIC and HQ criteria select the correct order of the ADL(q,r) estimator in 90 percent of the cases, whereas the performance of AIC falls to the level of 60 percent.
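The eigenvalue stability condition invoked for this VAR(2) design can be checked numerically through the companion matrix of the process. The Python sketch below is only an illustration; the helper names `design` and `var2_is_stable` are hypothetical, and the parameter values a12 = b12 = 0.5 with a21 = a22 = b21 = b22 = 0 are those of the DGP stated above.

```python
import numpy as np

def design(a11, b11, a12=0.5, b12=0.5):
    """Coefficient matrices A and B of u_t = A u_{t-1} + B u_{t-2} + e_t."""
    A = np.array([[a11, a12], [0.0, 0.0]])
    B = np.array([[b11, b12], [0.0, 0.0]])
    return A, B

def var2_is_stable(A, B):
    """Stability check: all companion-matrix eigenvalues inside the unit circle."""
    n = A.shape[0]
    companion = np.block([[A, B], [np.eye(n), np.zeros((n, n))]])
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)

for a11, b11 in [(0.0, 0.0), (0.3, 0.3), (0.6, 0.3), (0.8, 0.3)]:
    print(a11, b11, var2_is_stable(*design(a11, b11)))
```

The most persistent combination used in the experiments, a11 = 0.6 and b11 = 0.3, passes the check, whereas a combination with a11 + b11 above one, such as a11 = 0.8 and b11 = 0.3, does not.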
For the DOLS(p)/DGLS(p) estimators, by contrast, the consistent criteria fail to select a sufficiently large p. AIC seems to be the only criterion that does better: in the case of a highly persistent cointegration error, that is when a11 + b11 = 0.9, it selects p=20 in 40 percent of the cases. As a result, the ADL(q,r) estimator has substantially lower mean and median biases than the DOLS(p) or DGLS(p) estimators. For example, when a11 = 0.6 and b11 = 0.3, the bias of the ADL(q,r) estimator ranges from 0.049 when HQ is used to 0.055 when AIC is used. For the same values of a11 and b11, the bias of DOLS(p) ranges from 0.145 when AIC is used to 0.175 when SIC is used. DGLS(p) is by far the worst estimator in both bias and MSE. For example, for a11 + b11 = 0.9 the bias and MSE of the AIC-based DGLS(p) are 0.588 and 0.517, respectively. The bias and MSE of DGLS(p) increase dramatically when SIC is used, reaching the values of 1.033 and 1.121, respectively.

The differences in the degree of biases and MSEs between the ADL(q,r) and DOLS(p)/DGLS(p) estimators are also reflected in the size performance of the corresponding test procedures. In particular, the DOLS(p)/DGLS(p) t-tests suffer from severe size distortions, especially in the cases of a highly persistent cointegration error. For example, for a11 + b11 = 0.9, the empirical size of DOLS(p) and DGLS(p), when AIC is used, is 58.8 and 75.5 percent, respectively. On the other hand, for the same degree of persistence and by means of the same information criterion, the empirical size of the ADL(q,r)-based t-test is only 15.6 percent.16

Turning to the rest of the estimators, the PL(s,l) t-statistic fares slightly better than the ADL(q,r) one, producing an empirical size of approximately 14 percent in the case of a highly persistent cointegration error.
The empirical distribution of the JOH(z)-based t-statistic is skewed to the right, producing empirical sizes considerably greater than those associated with the ADL(q,r) or PL(s,l) procedures. Nevertheless, the size distortions of JOH(z) are significantly smaller than those produced by DOLS(p) or DGLS(p). Finally, the behavior of the semiparametric estimators imitates that of the leading case, DGP1. In particular, the PW-FMLS, and especially the FMLS procedures fail to account for the second-order asymptotic bias effects, thus resulting in t-statistics whose empirical distributions are located away from zero. The more persistent the cointegration error is, the more pronounced these effects appear to be. For example, when a11 + b11 = 0.9, the mean value of the FMLS and PW-FMLS t-statistic is 4.6 and 2.3, respectively, producing empirical sizes of 69.7 percent and 50.6 percent, respectively.

15 Given the values of a12, a21, a22 and b12, b21, b22 in this design, the eigenvalue stability condition reduces to: a11 + b11 < 1.

16 The only case where the performance of the DOLS(p)/DGLS(p) estimators is comparable to that of the ADL(q,r) estimator is when a11 = b11 = 0. This is the case where the ADL(0,3) model reduces to the AS(2) model and the DOLS(2)/DGLS(2) estimators do not suffer from a truncation bias. The consistent criteria identify the correct order in the context of both specifications in more than 90 percent of the cases. As a result, the performance of the DOLS(p)/DGLS(p) estimators is almost equal to that of the ADL(q,r) estimator.

3.2.3 VMA(1) errors

In this set of simulations, we use a first-order bivariate moving average, VMA(1), process to generate the errors, u1t and u2t. The moving average assumption implies that the memory of the cointegration error is designed to be extremely short. Such a case rarely occurs in macroeconomic applications, where a highly persistent cointegration error is often detected.
Specifically,

\[
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\begin{pmatrix} e_{1,t-1} \\ e_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\tag{39}
\]

and

\[
\begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix} \sim NIID\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \right]
\tag{40}
\]

for t = 1, 2, ..., T. This DGP does not produce a finite-order ADL(q,r) model, as the VMA(1) process has a VAR(∞) representation. In this set of simulations, we consider the set of ADL(q,r) and DOLS(p)/DGLS(p) estimators employed in our previous simulations, where the orders q, r, p are selected by the AIC, SIC and HQ criteria. The parameter settings for the DGP under consideration are the following: {a12 = 0.5, σ12 = 0.7, a21 = a22 = 0, σ11 = σ22 = 1, θ = 1 and 0 < a11 < 1}. As in the case with the VAR(1) errors, the parameter a11 takes the values 0.3, 0.6, 0.9 and 0.95. The results (not reported) may be summarized as follows:
(i) Irrespective of the value of a11, the SIC and HQ criteria choose the DOLS(1)/DGLS(1) estimator in more than 90 percent of the cases. The respective figure for AIC is only 58 percent.
(ii) The order of the ADL(q,r) estimator, selected by the criteria, is an increasing function of the parameter a11. For example, when a11 = 0.3, the SIC and HQ criteria select the ADL(1,2) in more than 50 percent of the cases, whereas for a11 = 0.9 or 0.95, they select the ADL(2,4) model most frequently. On the other hand, when a11 = 0.9 or 0.95, the efficient AIC criterion selects almost evenly among the ADL(2,4), ADL(3,4) and ADL(4,4) estimators.
(iii) When a11 = 0.9 or 0.95, the AIC-based DOLS(p) estimator is consistently the best but only by a slight margin over the AIC-based ADL(q,r).
(iv) The distribution of the t-statistic of both ADL(q,r) and DOLS(p)/DGLS(p) estimators is properly centered around zero, while slightly negatively skewed and leptokurtic. On the other hand, the ADL(q,r)-based t-tests marginally outperform the DOLS(p)-based ones in minimizing size distortions.
For a nominal size of 5 percent, the empirical size of the SIC-based ADL(q,r) estimator is 9.7 percent when a11 = 0.9, whereas the respective figure for the SIC-based DOLS(p) estimator is 10.2 percent. Moreover, the size of the DGLS estimators is a decreasing function of the parameter a11, ranging from around 5 percent for a11 = 0 to 3 percent for a11 = 0.95.
(v) The behavior of the JOH(z) and PL(s,l) estimators is almost identical to that of ADL(q,r) in terms of bias, MSE and percent rejections of the null hypothesis. Interestingly, the semiparametric methods fare reasonably well in this case. The PW-FMLS estimator, in particular, seems to account fully for the second-order endogeneity effects, thus providing a reasonable alternative to parametric procedures in conducting statistical inferences on θ.

The overall picture suggests that when the errors are generated by a MA(1) process, the DOLS(p)/DGLS(p) and PW-FMLS estimators fare no worse than the ADL(q,r), PL(s,l) and JOH(z) estimators, in terms of estimation precision and reliability of statistical inferences.

4 Conclusions

The simulation experiments reported in this paper highlight the potential pitfalls of employing the DOLS(p)/DGLS(p) estimators or the class of FMLS estimators for the estimation of a cointegration vector in a single-equation framework. These pitfalls are easily addressed by using the ADL(q,r) or PL(s,l) estimators instead. The results of this paper are summarized as follows:
(i) In general, the performance of the ADL(q,r) (or PL(s,l)) estimators is superior to that of the DOLS(p)/DGLS(p) estimators. This is due to the fact that the latter estimators, as opposed to the former, suffer from truncation bias. A large value of p is usually required for the DOLS(p)/DGLS(p) bias to approach the levels of the ADL(q,r) bias.
However, the 2.5% and 97.5% points of the empirical distribution of the DOLS(p)/DGLS(p) t-statistics do not approach the corresponding points of the N(0,1), even for large values of p. As a consequence, the sizes of the tests based on the DOLS(p)/DGLS(p) estimators, as opposed to those based on the ADL(q,r) estimators, are far from their nominal size of 5 percent.
(ii) The truncation bias of the DOLS(p)/DGLS(p) estimators depends on the asymptotic long-run correlation and endogeneity nuisance parameters, both of which depend on the Granger causality structure of the errors in the model and the persistence of the cointegration error. As a result, the difference between the performances of ADL(q,r) and DOLS(p)/DGLS(p) increases with the persistence of the cointegration error. This effect is milder in the presence of Granger causality running from the cointegration error to the error that drives the regressor, because in this case the nuisance parameters do not explode as the persistence of the cointegration error increases.
(iii) The benefits from employing the ADL(q,r) estimators, instead of the DOLS(p)/DGLS(p) estimators, remain substantial when the orders (q,r) and p are selected via the usual order-selection criteria. The use of the consistent SIC and HQ criteria in the context of the ADL(q,r) model leads to selection of the correct order in more than 90 percent of the cases. On the other hand, these criteria are totally unable to move away from low orders in the context of the DOLS(p)/DGLS(p) estimation method, thus producing a very large truncation bias. The efficient AIC criterion is by far the best performing one in the context of the DOLS(p) estimator, since it selects a sufficiently large p in the cases that the truncation bias is likely to be large.
(iv) The simulation results provide strong evidence against the employment of the standard FMLS estimator.
In fact, this estimator is inferior even to DOLS(p)/DGLS(p) for most values of p and for all the DGPs under study. If the applied researcher insists on using FMLS, then at least he/she must utilize the “prewhitened” version of this estimator, in order to achieve performance comparable to that of DOLS(p).
(v) The above mentioned results mainly refer to the cases of autoregressive errors. When the errors follow a bivariate moving average process, where the persistence of the cointegration error is low and the truncation bias of the DOLS(p)/DGLS(p) estimators is negligible, the two methods under study are almost equivalent.

References

[1] Akaike, H. (1974), A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control, AC-19, 667-673.
[2] Andrews, D.W.K. (1991), Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation, Econometrica, 59, 817-858.
[3] Banerjee, A., Dolado, J.J., Galbraith, J.W. and D.F. Hendry (1993), Cointegration, Error Correction and the Econometric Analysis of Non-Stationary Data, Oxford, Oxford University Press.
[4] Bewley, R.A. (1979), The Direct Estimation of the Equilibrium Response in a Linear Model, Economics Letters, 3, 357-61.
[5] Christou, C. and N. Pittis (2002), Kernel and Bandwidth Selection, Prewhitening, and the Performance of the Fully Modified Least Squares Estimation Method, Econometric Theory, 18, 948-961.
[6] Engle, R.F., D.F. Hendry and J.F. Richard (1983), Exogeneity, Econometrica, 51, 277-304.
[7] Hannan, E.J. and Quinn, B.G. (1979), The Determination of the Order of an Autoregression, Journal of the Royal Statistical Society, B41, 190-195.
[8] Hendry, D.F., A.R. Pagan and J.D. Sargan (1984), Dynamic Specification, in Z. Griliches and M.D. Intriligator (eds.) Handbook of Econometrics, vol II, ch.18, 1023-1100.
[9] Johansen, S. (1988), Statistical Analysis of Cointegrating Vectors, Journal of Economic Dynamics and Control, 12, 231-254.
[10] Johansen, S. (1991), Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models, Econometrica, 59, 1551-1580.
[11] Kramer, W. (1986), Least-squares regression when the independent variable follows an ARIMA process, Journal of the American Statistical Association, 81, 150-154.
[12] Newey, W.K. and K.D. West (1987), A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix, Econometrica, 55, 703-708.
[13] Newey, W.K. and K.D. West (1994), Automatic lag selection in covariance matrix estimation, Review of Economic Studies, 61, 4, 631-653.
[14] Park, J.Y. and P.C.B. Phillips (1988), Statistical Inference in Regressions with Integrated Processes: Part 1, Econometric Theory, 4, 468-498.
[15] Pesaran, H.M. and Y. Shin (1999), An autoregressive distributed lag modelling approach to cointegration analysis, in S. Strom (ed.), Econometrics and Economic Theory in the Twentieth Century: The Ragnar Frisch Centennial Symposium, Cambridge University Press, Cambridge, UK.
[16] Phillips, P.C.B. (1988), Reflections on Econometric Methodology, Economic Record, 64, 344-359.
[17] Phillips, P.C.B. and B.E. Hansen (1990), Statistical Inference in Instrumental Regressions with I(1) processes, Review of Economic Studies, 57, 99-125.
[18] Phillips, P.C.B. and M. Loretan (1991), Estimating Long-run Economic Equilibria, Review of Economic Studies, 58, 407-436.
[19] Saikkonen, P. (1991), Asymptotically Efficient Estimation of the Cointegration Regressions, Econometric Theory, 7, 1, 1-27.
[20] Schwarz, G. (1978), Estimating the Dimension of a Model, Annals of Statistics, 6, 461-464.
[21] Sims, C.A., Stock, J.H. and M.W. Watson (1990), Inference in Linear Time Series with Some Unit Roots, Econometrica, 58, 113-144.
[22] Stock, J.H. (1987), Asymptotic Properties of Least Squares Estimators of Cointegrating Vectors, Econometrica, 55, 1035-1056.
[23] Stock, J.H. and M.W.
Watson (1993), A Simple Estimator of Cointegrating Vectors in Higher-order Integrated Systems, Econometrica, 61, 783-820.
[24] Wickens, M.R. and T.S. Breusch (1988), Dynamic Specification, the Long Run and the Estimation of Transformed Regression Models, Economic Journal, 98, (Conference 1988), 189-205.

Table 1

Panel A: a11=0.3
Estimator   Mean bias   Median bias      MSE       Size
ADL(1,2)       0.0012        0.0006   0.0012       5.75
PL(1,0)        0.0012        0.0006   0.0012       5.75
JOH(2)         0.0012        0.0059   0.0009       0.75
FMLS           0.0269        0.0179   0.0032      14.75
PW-FMLS        0.0099        0.0066   0.0017       8.15

Panel B: a11=0.6
Estimator   Mean bias   Median bias      MSE       Size
ADL(1,2)       0.0017        0.0016   0.0041       7.10
PL(1,0)        0.0018        0.0017   0.0041       7.05
JOH(2)         0.0077        0.0044   0.0062       1.55
FMLS           0.0660        0.0460   0.0140      26.25
PW-FMLS        0.0202        0.0142   0.0059      11.35

Panel C: a11=0.9
Estimator   Mean bias   Median bias      MSE       Size
ADL(1,2)       0.0372        0.0399   0.0882      15.10
PL(1,0)        0.0397        0.0447   0.0890      15.35
JOH(2)         0.0336        0.0244   0.1724      14.10
FMLS           0.3202        0.2972   0.1981      63.20
PW-FMLS        0.1409        0.1188   0.1457      29.30

Panel D: a11=0.95
Estimator   Mean bias   Median bias      MSE       Size
ADL(1,2)       0.1037        0.1574   0.3659      27.90
PL(1,0)        0.1132        0.1670   0.3580      27.80
JOH(2)         0.0965        0.1082   0.4100      30.05
FMLS           0.5170        0.5168   0.4711      74.60
PW-FMLS        0.3315        0.3199   0.7436      46.15

Table 2

Estimator       t0.025    t0.025    t0.025    t0.025    t0.975    t0.975    t0.975    t0.975
               a11=0.3   a11=0.6   a11=0.9  a11=0.95   a11=0.3   a11=0.6   a11=0.9  a11=0.95
OLS             -0.655    -0.765    -1.074    -1.131     4.199     4.975     9.983    15.363
ADL(1,2)        -2.020    -1.926    -1.676    -1.484     2.084     2.196     3.826     6.048
PL(1,0)         -2.020    -1.966    -1.738    -4.186     2.084     2.299     3.491     5.513
JOH(2)          -0.952    -1.211    -1.908    -2.483     1.571     1.658     3.555     5.798
FMLS            -1.541    -1.629    -2.108    -2.006     3.162     4.369    15.099    38.781
PW-FMLS         -1.855    -1.919    -2.313    -2.792     2.499     2.757     5.398     8.732
DOLS(1)         -1.936    -1.484    -1.387    -1.459     2.947     4.242     9.717    15.052
DOLS(2)         -2.271    -1.882    -1.613    -1.617     2.577     3.815     9.712    14.800
DOLS(3)         -2.361    -2.249    -1.738    -1.764     2.436     3.520     9.575    14.815
DOLS(4)         -2.357    -2.500    -2.040    -2.098     2.420     3.322     9.459    14.626
DOLS(5)         -2.340    -2.675    -2.234    -2.226     2.389     3.085     9.257    14.604
DOLS(6)         -2.454    -2.784    -2.449    -2.420     2.429     2.999     9.015    14.441
DOLS(7)         -2.513    -2.872    -2.759    -2.585     2.375     2.886     8.699    14.601
DOLS(8)         -2.533    -2.934    -2.867    -2.723     2.430     2.904     8.524    14.093
DOLS(9)         -2.513    -2.895    -3.012    -2.804     2.401     2.927     8.133    13.885
DOLS(10)        -2.514    -2.871    -3.068    -2.992     2.318     2.851     7.762    13.723
DOLS(11)        -2.497    -2.933    -3.289    -3.016     2.363     2.806     7.599    13.462
DOLS(12)        -2.527    -2.961    -3.465    -3.117     2.384     2.799     7.389    13.037
DOLS(13)        -2.527    -3.005    -3.662    -3.356     2.423     2.857     7.184    12.671
DOLS(14)        -2.574    -3.042    -3.745    -3.565     2.461     2.894     6.936    12.367
DOLS(15)        -2.583    -3.032    -3.959    -3.624     2.390     2.906     6.699    12.207
DOLS(16)        -2.563    -3.076    -4.124    -3.793     2.341     2.824     6.547    12.148
DOLS(17)        -2.507    -2.991    -4.315    -3.980     2.348     2.873     6.396    11.788
DOLS(18)        -2.533    -3.047    -4.535    -4.147     2.388     2.855     6.304    11.601
DOLS(19)        -2.454    -2.985    -4.646    -4.316     2.467     2.876     6.253    11.401
DOLS(20)        -2.469    -2.962    -4.724    -4.422     2.493     2.908     6.236    11.117
DGLS(1)         -1.467    -0.116     5.141     5.974     2.585     5.345    10.632    12.159
DGLS(2)         -1.899    -0.984     3.659     4.602     2.230     3.382     8.989    10.715
DGLS(3)         -2.049    -1.424     2.641     3.688     2.086     2.777     7.605     9.578
DGLS(4)         -2.083    -1.758     1.662     2.868     2.088     2.548     6.740     8.730
DGLS(5)         -2.073    -1.930     0.885     2.316     2.073     2.442     6.131     8.208
DGLS(6)         -2.098    -2.053     0.288     1.777     2.071     2.384     5.722     7.902
DGLS(7)         -2.115    -2.118    -0.255     1.159     2.051     2.293     5.395     7.605
DGLS(8)         -2.166    -2.184    -0.567     0.715     2.031     2.256     5.155     7.415
DGLS(9)         -2.143    -2.171    -0.856     0.494     2.045     2.173     4.931     7.172
DGLS(10)        -2.145    -2.193    -1.023     0.077     1.997     2.152     4.816     7.181
DGLS(11)        -2.161    -2.237    -1.241    -0.206     2.049     2.196     4.626     6.966
DGLS(12)        -2.156    -2.258    -1.366    -0.394     2.074     2.228     4.537     6.933
DGLS(13)        -2.140    -2.266    -1.481    -0.611     2.045     2.250     4.419     6.845
DGLS(14)        -2.158    -2.307    -1.656    -0.839     2.082     2.250     4.344     6.812
DGLS(15)        -2.171    -2.279    -1.850    -1.003     2.071     2.219     4.284     6.671
DGLS(16)        -2.166    -2.286    -1.909    -1.201     2.052     2.207     4.124     6.416
DGLS(17)        -2.128    -2.272    -2.050    -1.269     2.036     2.224     4.038     6.322
DGLS(18)        -2.186    -2.324    -2.167    -1.287     2.092     2.211     3.969     6.347
DGLS(19)        -2.131    -2.251    -2.245    -1.481     2.070     2.176     3.910     6.290
DGLS(20)        -2.121    -2.266    -2.324    -1.690     2.070     2.194     3.844     6.121

Table 3

                    Mean Bias                           Median Bias                         MSE
Estimator Criterion   a11=0.3  a11=0.6  a11=0.9 a11=0.95  a11=0.3  a11=0.6  a11=0.9 a11=0.95  a11=0.3  a11=0.6  a11=0.9 a11=0.95
ADL       AIC           0.001    0.004    0.042    0.062    0.001    0.003    0.046    0.173    0.001    0.004    0.101    0.211
ADL       SIC           0.002    0.005    0.048    0.067    0.001    0.003    0.048    0.172    0.001    0.004    0.094    0.199
ADL       HQ            0.001    0.004    0.045    0.062    0.001    0.003    0.046    0.174    0.001    0.004    0.098    0.187
DOLS      AIC           0.002    0.004    0.081    0.270    0.001    0.003    0.065    0.257    0.002    0.005    0.077    0.261
DOLS      SIC           0.006    0.015    0.113    0.289    0.004    0.011    0.097    0.274    0.001    0.005    0.080    0.276
DOLS      HQ            0.004    0.009    0.097    0.280    0.003    0.007    0.080    0.269    0.001    0.005    0.082    0.269
DGLS      AIC           0.003    0.013    0.440    0.852    0.001    0.009    0.389    0.967    0.002    0.005    0.369    0.882
DGLS      SIC           0.008    0.038    0.915    1.098    0.006    0.026    0.970    1.115    0.002    0.008    0.910    1.237
DGLS      HQ            0.006    0.026    0.726    1.039    0.004    0.018    0.816    1.083    0.001    0.006    0.672    1.141
PL        AIC           0.001    0.003    0.038    0.131    0.001    0.002    0.039    0.147    0.001    0.005    0.077    0.193
PL        SIC           0.001    0.003    0.037    0.132    0.001    0.002    0.041    0.153    0.001    0.005    0.077    0.192
PL        HQ            0.001    0.003    0.038    0.131    0.001    0.003    0.041    0.151    0.001    0.005    0.077    0.193
JOH       AIC           0.010    0.009    0.040    0.125    0.006    0.005    0.028    0.109    0.001    0.023    0.058    0.138
JOH       SIC           0.010    0.009    0.040    0.125    0.006    0.005    0.028    0.109    0.001    0.023    0.058    0.138
JOH       HQ            0.010    0.009    0.040    0.125    0.006    0.005    0.028    0.109    0.001    0.023    0.058    0.138

Table 4

                    t0.025                              t0.975                              Size
Estimator Criterion   a11=0.3  a11=0.6  a11=0.9 a11=0.95  a11=0.3  a11=0.6  a11=0.9 a11=0.95  a11=0.3  a11=0.6  a11=0.9 a11=0.95
ADL       AIC          -2.062   -2.029   -1.754   -1.523    2.316    2.505    4.022    6.150     7.05     8.00    14.35    23.40
ADL       SIC          -2.049   -1.970   -1.745   -1.505    2.185    2.312    4.017    6.106     6.65     7.30    14.05    23.10
ADL       HQ           -2.062   -1.970   -1.748   -1.514    2.214    2.331    4.022    6.087     6.95     7.45    14.05    23.25
DOLS      AIC          -2.608   -3.130   -4.803   -4.525    2.618    3.252    8.388   14.059    13.90    22.15    49.85    66.15
DOLS      SIC          -2.393   -2.894   -4.613   -4.345    2.753    3.470    8.773   14.412    11.85    21.10    53.20    66.70
DOLS      HQ           -2.445   -2.977   -4.779   -4.443    2.683    3.371    8.606   14.382    12.15    21.05    52.60    67.00
DGLS      AIC          -2.155   -2.303   -1.911   -0.290    2.229    2.577    9.289   11.510     7.90    11.35    62.70    88.50
DGLS      SIC          -1.977   -1.866    1.803    4.658    2.335    3.062   10.498   12.072     7.85    14.35    96.95    99.75
DGLS      HQ           -2.036   -2.072   -0.477    2.406    2.298    2.739   10.123   11.927     7.05    12.20    86.05    98.05
PL        AIC          -2.133   -2.053   -1.922   -2.256    2.299    2.538    3.713    5.109     7.60     8.55    14.15    22.65
PL        SIC          -2.143   -2.077   -1.945   -2.393    2.287    2.538    3.698    5.208     7.70     8.55    14.15    22.90
PL        HQ           -2.130   -2.036   -1.916   -2.337    2.273    2.504    3.713    5.208     7.65     8.40    14.00    22.80
JOH       AIC          -0.956   -1.196   -1.563   -1.657    1.553    1.652    3.276    5.233     1.05     2.10    10.35    21.80
JOH       SIC          -0.956   -1.196   -1.563   -1.657    1.553    1.652    3.276    5.233     1.05     2.10    10.35    21.80
JOH       HQ           -0.956   -1.196   -1.563   -1.657    1.553    1.652    3.276    5.233     1.05     2.10    10.35    21.80

Table 5

                    Mean Bias         Median Bias       MSE
Estimator Criterion   a11=0.3  a11=0.7  a11=0.3  a11=0.7  a11=0.3  a11=0.7
ADL       AIC          0.0003   0.0008   0.0003   0.0007   0.0001   0.0000
ADL       SIC          0.0001   0.0015   0.0000   0.0012   0.0001   0.0000
ADL       HQ           0.0004   0.0010   0.0003   0.0008   0.0001   0.0000
DOLS      AIC          0.0000   0.0009   0.0001   0.0008   0.0001   0.0000
DOLS      SIC         -0.0003   0.0019  -0.0002   0.0016   0.0001   0.0000
DOLS      HQ          -0.0001   0.0013   0.0000   0.0011   0.0001   0.0000
DGLS      AIC          0.0001   0.0014   0.0001   0.0011   0.0001   0.0000
DGLS      SIC         -0.0003   0.0032  -0.0002   0.0024   0.0001   0.0000
DGLS      HQ          -0.0001   0.0021   0.0000   0.0017   0.0001   0.0000
PL        AIC          0.0003  -0.0009   0.0002  -0.0006   0.0001   0.0000
PL        SIC          0.0003  -0.0009   0.0002  -0.0006   0.0001   0.0000
PL        HQ           0.0003  -0.0009   0.0002  -0.0006   0.0001   0.0000
JOH       AIC          0.0033   0.0008    0.002   0.0003   0.0001   0.0000
JOH       SIC          0.0033   0.0008    0.002   0.0003   0.0001   0.0000
JOH       HQ           0.0033   0.0008    0.002   0.0003   0.0001   0.0000
FMLS                   0.0049   0.0027   0.0031   0.0021   0.0002    0.001
PW-FMLS                0.0015  -0.0014   0.0010  -0.0001   0.0001   0.0002

Table 6

                    t0.025            t0.975            Size
Estimator Criterion   a11=0.3  a11=0.7  a11=0.3  a11=0.7  a11=0.3  a11=0.7
ADL       AIC          -2.040   -1.873    2.305    2.503     6.90     9.55
ADL       SIC          -2.016   -1.706    2.154    2.735     6.15    10.05
ADL       HQ           -2.009   -1.840    2.246    2.493     6.45     9.15
DOLS      AIC          -2.249   -2.387    2.300    3.286     8.90    18.60
DOLS      SIC          -2.177   -2.165    2.232    3.736     8.15    21.90
DOLS      HQ           -2.229   -2.307    2.278    3.467     8.65    19.75
DGLS      AIC          -2.118   -1.917    2.167    2.763     6.75    11.65
DGLS      SIC          -2.093   -1.615    2.073    3.572     6.25    19.45
DGLS      HQ           -2.099   -1.799    2.143    3.163     6.60    14.45
PL        AIC          -2.006   -2.274    2.058    2.282     5.65     8.35
PL        SIC          -2.006   -2.274    2.058    2.282     5.65     8.35
PL        HQ           -2.006   -2.274    2.058    2.282     5.65     8.35
JOH       AIC          -1.001   -1.454    1.663    1.809     1.05     2.55
JOH       SIC          -1.001   -1.454    1.663    1.809     1.05     2.55
JOH       HQ           -1.001   -1.454    1.663    1.809     1.05     2.55
FMLS                   -1.918   -7.411    2.739    6.766     10.9    49.55
PW-FMLS                -1.873   -2.998    2.188    2.218     6.00    10.60

Figures 1A-1D. Mean bias (T=100) of DOLS(p), DGLS(p) and ADL(1,2), plotted against p = 1, ..., 20. Panels: a11 = 0.3, 0.6, 0.9, 0.95.

Figures 2A-2D. Median bias (T=100) of DOLS(p), DGLS(p) and ADL(1,2), plotted against p = 1, ..., 20. Panels: a11 = 0.3, 0.6, 0.9, 0.95.

Figures 3A-3D. MSE (T=100) of DOLS(p), DGLS(p) and ADL(1,2), plotted against p = 1, ..., 20. Panels: a11 = 0.3, 0.6, 0.9, 0.95.
Figures 4A-4D. Empirical size (T=100) of DOLS(p), DGLS(p) and ADL(1,2), plotted against p = 1, ..., 20. Panels: a11 = 0.3, 0.6, 0.9, 0.95.

Figures 5A-5D. ADL model selection by AIC, SIC and HQ. Panels: a11 = 0.3, 0.6, 0.9, 0.95. Candidate models: ADL(1,1), ADL(1,2), ADL(2,2), ADL(1,3), ADL(2,3), ADL(3,3), ADL(1,4), ADL(2,4), ADL(3,4), ADL(4,4), ADL(0,0), ADL(0,1), ADL(1,0), ADL(2,1).

Figures 6A-6D. DOLS model selection by AIC, SIC and HQ. Panels: a11 = 0.3, 0.6, 0.9, 0.95. Candidate models: OLS, DOLS2, DOLS4, DOLS6, DOLS8, DOLS10, DOLS12, DOLS14, DOLS16, DOLS18, DOLS20.

Figures 7A-7D. DGLS model selection by AIC, SIC and HQ. Panels: a11 = 0.3, 0.6, 0.9, 0.95. Candidate models: OLS, DGLS2, DGLS4, DGLS6, DGLS8, DGLS10, DGLS12, DGLS14, DGLS16, DGLS18, DGLS20.

Figures 8A-8B. ADL model selection by AIC, SIC and HQ. Panels: a11 = 0.3, 0.7. Candidate models: ADL(1,1,1), ADL(1,2,1), ADL(1,2,2), ADL(2,2,2), ADL(1,3,3), ADL(2,3,3), ADL(3,3,3), ADL(1,4,4), ADL(2,4,4), ADL(3,4,4), ADL(4,4,4), ADL(1,1,0), ADL(1,2,0), ADL(4,4,0).

Figures 9A-9B. DOLS model selection by AIC, SIC and HQ. Panels: a11 = 0.3, 0.7. Candidate models: OLS, DOLS(1,1), DOLS(2,2), DOLS(3,3), DOLS(4,4), DOLS(1,0), DOLS(2,0), DOLS(3,0), DOLS(4,0).

Figures 10A-10B. DGLS model selection by AIC, SIC and HQ. Panels: a11 = 0.3, 0.7. Candidate models: OLS, DGLS(1,1), DGLS(2,2), DGLS(3,3), DGLS(4,4), DGLS(1,0), DGLS(2,0), DGLS(3,0), DGLS(4,0).