
RESIDUALS BASED TESTS FOR THE NULL OF NO COINTEGRATION: AN ANALYTICAL COMPARISON.

Elena Pesavento*

Department of Economics

Emory University

2 January, 2005

Address correspondence to:

Elena Pesavento

Department of Economics

Emory University

Atlanta, GA 30322; e-mail: epesave@emory.edu; phone: (404) 712 9297; fax: (404) 727 4639.


Abstract

This paper computes the asymptotic distribution of five residuals-based tests for the null of no cointegration under a local alternative when the tests are computed using both OLS and GLS detrended variables. The local asymptotic power of the tests is shown to be a function of Brownian Motion and Ornstein-Uhlenbeck processes, depending on a single nuisance parameter, which is determined by the correlation at frequency zero of the errors of the cointegration regression with the shocks to the right-hand variables. The tests are compared in terms of power in large and small samples. It is shown that, while no significant improvement can be achieved by using different unit root tests than the OLS detrended t-test originally proposed by Engle and Granger (1987), the power of GLS residuals tests can be higher than the power of system tests for some values of the nuisance parameter.

JEL classification: C32

Keywords: cointegration, residual tests, unit root, power.

I thank Robert DeJong, Hashem Dezhbakhsh, Graham Elliott, Clive Granger, Jeff Wooldridge, the participants at the 2001 Australasian Econometrics Society meeting, and two anonymous referees for valuable comments. All errors remain my responsibility.

1. INTRODUCTION

The purpose of this paper is to provide a careful analysis of residuals-based tests for the absence of cointegration. In their seminal paper on cointegration, Engle and Granger (1987) suggest a two-step analysis to test for cointegration: estimate the cointegrating regression by OLS in the first step and then test for a unit root in the residuals from the cointegrating regression in the second step. If the null of a unit root in the residuals is rejected, then there is evidence of cointegration. Because the cointegrating vector is estimated in the first stage, the asymptotic distributions of tests on the residuals differ from those in the unit root case with no estimated parameters; the distribution generally shifts to the left. Engle and Yoo (1987) compute the asymptotic distribution of the Dickey-Fuller (DF) and the Augmented Dickey-Fuller (ADF) tests under the null hypothesis that the series are independent random walks. Phillips and Ouliaris (1990) relax the independence assumption and compute the asymptotic null distribution of the Dickey-Fuller tests and other residuals-based tests.
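The two-step procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the paper: the function name `engle_granger_df_stat` is hypothetical, a constant is assumed in the first-step regression, and the second step uses a plain Dickey-Fuller regression with no lag augmentation. The resulting t-ratio has to be compared with critical values for residual-based tests (for example those in Phillips and Ouliaris, 1990), not with standard unit root critical values.

```python
import numpy as np


def engle_granger_df_stat(y, x):
    """Two-step sketch: OLS cointegrating regression, then a DF t-ratio on the residuals.

    Hypothetical helper for illustration; assumes a constant in step one and
    no augmentation lags in step two.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])

    # Step 1: estimate the cointegrating regression by OLS and keep the residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta

    # Step 2: regress the differenced residuals on the lagged residuals.
    du, u_lag = np.diff(u), u[:-1]
    alpha = (u_lag @ du) / (u_lag @ u_lag)
    resid = du - alpha * u_lag
    s2 = (resid @ resid) / (len(du) - 1)
    t_alpha = alpha / np.sqrt(s2 / (u_lag @ u_lag))
    return beta, t_alpha
```

A large negative value of the returned t-ratio is evidence against the null of no cointegration.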

The asymptotic distributions of residuals-based tests under the null are functions of Brownian Motions and are similar with respect to the cointegration vector and the variance covariance matrix of the innovations. Although Engle and Granger recommend using the ADF test, in practice, different unit root tests have been applied to test for no cointegration. As in the case of unit roots, we expect different tests to have different asymptotic distributions and to perform differently in terms of size and power. In particular, since statistics that perform better than the ADF test in unit root testing have been found, it is conceivable that the same tests will be superior when testing cointegration.

There is a vast literature that examines various integration tests in terms of their asymptotic power, size distortions, and performance in small samples.¹ Elliott et al. (1996), for example, compute the asymptotic power envelope for unit root tests and show that, although no feasible test will lie exactly on the power envelope, it is possible to find tests that are asymptotically close to the power envelope and also have very good small sample properties. They show that GLS (or quasi-difference) detrending leads to estimates of the mean and trend that are asymptotically more efficient than OLS estimates, and show that the power for the Dickey-Fuller test with GLS detrending is almost indistinguishable from the power envelope. Since the improvement in the test is mostly given by the different detrending procedure, the gain from using the DF-GLS test or the PT test of Elliott et al. (1996) is large in the case when either mean or trend are present. Because in a cointegration framework the unit root tests are applied to residuals from an OLS regression, which have by definition mean zero, there is no gain in applying the Elliott et al. (1996) tests to the GLS detrended residuals. At the same time, as Perron and Rodriguez (2001) show, there is a significant increase in power of residuals tests when the GLS detrending is applied to the variables in the cointegration regression before computing the residuals.

¹ See Stock (1994, 1999) and Phillips and Xiao (1998) for a comprehensive survey of tests for integration.

Little work has been done to analytically compare the power of residuals-based tests for cointegration. Pesavento (2004) shows that different tests for the null of no cointegration can have significantly different power functions. In fact, in this nonstandard environment, all the tests have non-normal asymptotic distributions and no uniformly most powerful test exists. It is therefore not clear which test is better when testing cointegration, and it is worth examining how different residual-based tests of cointegration behave. This paper analytically derives results for the power of residuals-based tests for the null hypothesis of no cointegration, and shows which features of the model are important for power.

Currently, there is no analytical solution for the asymptotic power of any residual tests for cointegration. There are a number of practical reasons for being interested in the asymptotic distribution of these tests under a local alternative. First, although the asymptotic local power for some of the tests for no cointegration has been computed before (Banerjee et al., 1986, Kremers et al., 1992, Johansen, 1995, Zivot, 2000), different tests have been mostly compared on the basis of Monte Carlo analysis for particular data generating processes (Haug, 1996, Gonzalo and Lee, 1998, Bewley and Yang, 1998, Boswijk and Frances, 1992, Kremers et al., 1992, and Ericsson and Mackinnon, 1999). The problem with Monte Carlo simulations is that the results of the experiments are dependent on the particular design and no general conclusions are available. In general, these experiments are unable to find any ranking of the tests or to find important parameters that would allow such a ranking. The analytical derivation of the asymptotic distribution under a local alternative allows the isolation of a single parameter that is important for power, helps in understanding the results of previous Monte Carlo analyses, and assists in designing a more informative set of experiments. A study of the performance of different tests in terms of power can determine if there is any gain in using a test other than the ADF test originally recommended by Engle and Granger, and can quantify the gains in computing the residuals tests with GLS detrended data as proposed by Perron and Rodriguez (2001). Second, classical tests for cointegration and no-cointegration can exhibit severe size distortions in small samples. One solution proposed in the recent literature is to apply bootstrap methods to such tests (Li and Maddala, 1997, Berkowitz and Kilian, 2000, Inoue and Kilian, 2002, and Chang, Park and Song, 2002 among others). Evaluation of bootstrap methods in a non-stationary environment shows that bootstrapping provides substantial improvements to the finite sample size of tests. Although controlling the size of a test is extremely important, it is also relevant to evaluate whether the refinement obtained with bootstrap algorithms comes at the cost of lower power. This paper presents a benchmark to evaluate the local power of bootstrap tests that addresses the relevant question of comparing the bootstrap power with the asymptotic power of classical tests (see also Swensen (2003)).

Since all tests are consistent, they all have power equal to one asymptotically and thus the asymptotic power for fixed alternatives cannot be used to rank the tests. I therefore derive the asymptotic distribution of these tests under a local alternative parameterization. The local asymptotic power of the tests is shown to be a function of Brownian Motion and Ornstein-Uhlenbeck processes, depending on a single nuisance parameter, which is determined by the correlation at frequency zero of the independent variables with the errors of the cointegration regression. Having an analytical formulation for the asymptotic distribution of the tests, I can approximate the asymptotic power directly. I show that, unlike for unit root tests, there is little gain in using different residual tests with OLS detrending while, as Perron and Rodriguez (2001) show, there is significant gain from using residuals tests with GLS detrending. The gain increases with the nuisance parameter. Interestingly, depending on the value of the nuisance parameter, residuals tests with GLS detrending can have power even higher than the power of system tests.

Section 2 gives a brief description of the different tests analyzed in this paper, the analytical asymptotic power functions are analyzed in section 3, and section 4 compares the local power of these tests in large samples. Small sample properties are presented in section 5. Section 6 concludes.

2. RESIDUALS BASED TESTS

Consider the model:

$$
\begin{aligned}
x_t &= \mu_x + \tau_x t + u_{1t}, \qquad \Delta u_{1t} = v_{1t} \\
y_t &= \mu_y + \tau_y t + x_t'\beta + u_t \\
u_t &= \rho u_{t-1} + v_{2t}
\end{aligned}
\tag{2.1}
$$

where $t = 1, \ldots, T$; $x_t$ is an $n_1 \times 1$ vector, $y_t$ is a scalar, and $\mu = [\mu_x' \;\; \mu_y]'$, $\tau = [\tau_x' \;\; \tau_y]'$ are deterministic terms. $v_t = [v_{1t}' \;\; v_{2t}]'$ are serially correlated errors with $\Phi(L) v_t = \varepsilon_t$, where $\Phi(L)$ is a polynomial of possibly infinite order with first element equal to the identity matrix (the usual normalization), and $|\Phi(z)| = 0$ has roots outside the unit circle. Furthermore, I assume that the researcher knows that the elements of $x_t$ individually have unit roots and are not cointegrated. The error term $\varepsilon_t = [\varepsilon_{1t}' \;\; \varepsilon_{2t}]'$ is defined implicitly by Assumption 1, which is valid under a wide range of different assumptions on the process (see Phillips and Solo (1992) or Wooldridge (1994)).

ASSUMPTION 1: $\{\varepsilon_t\}$ satisfies a FCLT, i.e. $T^{-1/2} \sum_{t=1}^{[T\cdot]} \varepsilon_t \Rightarrow \Sigma^{1/2} W(\cdot)$, where $W(\cdot)$ is a standard $(n_1 + 1) \times 1$ Brownian motion and $\Rightarrow$ denotes weak convergence.

Assumption 1 along with the assumptions on $\Phi(L)$ means that $T^{-1/2} \sum_{t=1}^{[T\cdot]} v_t \Rightarrow \Omega^{1/2} W(\cdot)$, where $\Omega = \Phi(1)^{-1} \Sigma \Phi(1)^{-1\prime}$ is the spectral density at frequency zero of $v_t$ scaled by $2\pi$, and $\Sigma$ is the variance covariance matrix of $\varepsilon_t$. Partition $\Omega$ and $\Phi(L)$ as

$$
\Omega = \begin{bmatrix} \Omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{bmatrix}
\quad \text{and} \quad
\Phi(L) = \begin{bmatrix} \Phi_{11}(L) & \Phi_{12}(L) \\ \Phi_{21}(L) & \Phi_{22}(L) \end{bmatrix},
$$

and define $R^2 = \bar\delta'\bar\delta$, where $\bar\delta = \Omega_{11}^{-1/2} \omega_{12} \omega_{22}^{-1/2}$ is a vector containing the bivariate zero frequency correlations of each element of $v_{1t}$ with $v_{2t}$. $R^2$ lies between zero and one, and represents the contribution of the right hand variables in the second equation of (2.1): it is zero when these variables are not correlated in the long run with the errors from the cointegration regression. Under the assumption that the elements of $x_t$ are not cointegrated among themselves, $\Omega_{11}$ is non-singular. Elliott, Jansson and Pesavento (2002) and Pesavento (2004) show that the model (2.1) is general and does not assume that the $x_t$ variables are weakly exogenous for the cointegration parameter $\beta$.
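As a small numerical illustration of the nuisance parameter, the sketch below computes $R^2$ from a given long-run covariance matrix, using the fact that the inner product of the zero frequency correlation vector with itself equals $\omega_{21}\Omega_{11}^{-1}\omega_{12}/\omega_{22}$. The function name `long_run_r2` and the convention that the last row and column of the matrix correspond to $v_{2t}$ are assumptions made for this example only; in practice $\Omega$ is unknown and would itself have to be estimated.

```python
import numpy as np


def long_run_r2(omega, n1):
    """R^2 implied by a long-run covariance matrix Omega (hypothetical helper).

    Assumes the first n1 rows/columns of `omega` belong to v_1t and the
    (n1+1)-th row/column belongs to v_2t.
    """
    omega = np.asarray(omega, dtype=float)
    omega_11 = omega[:n1, :n1]
    omega_12 = omega[:n1, n1]
    omega_22 = omega[n1, n1]
    # omega_21 * Omega_11^{-1} * omega_12 / omega_22
    return float(omega_12 @ np.linalg.solve(omega_11, omega_12) / omega_22)
```

With a scalar $x_t$, $R^2$ is simply the squared long-run correlation between $v_{1t}$ and $v_{2t}$, so a long-run correlation of 0.5 gives $R^2 = 0.25$.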

For the purpose of this paper, I consider the cases where (i) $\mu = 0$, $\tau = 0$, so no mean or trend are included; (ii) $\mu_x = 0$, $\tau = 0$, and a constant is included in the regressions; (iii) $\tau = 0$ or $\tau_x = 0$, and a constant and a time trend are included in the regressions. The first case corresponds to no deterministic terms. In the second case there is no drift or trend in $\Delta x_t$, but a constant in the cointegrating vector. Case three has $x_t$ with a unit root and drift, and a mean, or a mean and trend, in the cointegrating vector. For case three, I assume that a constant and a time trend are included in the estimated regressions to obtain tests that are similar under the null. In all cases the number of lags is approximated by a finite number, chosen with some criteria that can be data dependent.

When $\rho < 1$, the linear combination $y_t - \mu_y - \tau_y t - x_t'\beta$ is stationary, $y_t$ and $x_t$ are cointegrated and the system (2.1) contains $n_1$ unit roots; when $\rho = 1$ the two variables are not cointegrated and there are $n$ unit roots in the system.

Standard residual tests for cointegration are based on the OLS residuals from the cointegration regression:

$$ y_t = d_t + x_t'\beta + u_t \tag{2.2} $$

for various possible choices of a deterministic $d_t$. This is equivalent to first demeaning or detrending $x_t$ and $y_t$ according to the appropriate deterministic case, and then testing the residuals from the regression

$$ y_t^d = x_t^{d\prime}\beta + u_t \tag{2.3} $$

Since $\hat\beta_{OLS}$ is a consistent estimator for the true $\beta$, testing stationarity of the residuals is a good proxy for testing that the linear combination (2.3) is stationary. Rejection of a unit root in the residuals from (2.3) is an indication of cointegration between the two variables. As Perron and Rodriguez (2001) suggest, residual based tests constructed using GLS detrended variables in (2.3) may have higher power than standard residual tests.
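A sketch of GLS (quasi-difference) detrending of a single series is given below, following the usual Elliott et al. (1996) convention: the series and the deterministic terms are quasi-differenced with $\bar\alpha = 1 + \bar c / T$, the deterministic coefficients are estimated by OLS on the quasi-differenced data, and the fitted deterministic component is subtracted from the series in levels. The function name `gls_detrend` and its arguments are hypothetical; for a vector $x_t$ the same transformation would be applied to each column before running regression (2.3).

```python
import numpy as np


def gls_detrend(series, c_bar, trend=False):
    """Quasi-difference (GLS) detrending sketch for one series (hypothetical helper).

    c_bar is the quasi-difference parameter; trend=False removes a mean only,
    trend=True removes a mean and a linear trend.
    """
    z = np.asarray(series, dtype=float)
    T = len(z)
    a = 1.0 + c_bar / T

    # Deterministic regressors in levels: constant, and optionally a linear trend.
    cols = [np.ones(T)]
    if trend:
        cols.append(np.arange(1, T + 1, dtype=float))
    D = np.column_stack(cols)

    # Quasi-difference the data and the regressors; the first observation stays in levels.
    zq = np.concatenate([z[:1], z[1:] - a * z[:-1]])
    Dq = np.vstack([D[:1], D[1:] - a * D[:-1]])

    coef, *_ = np.linalg.lstsq(Dq, zq, rcond=None)
    # Subtract the estimated deterministic component from the series in levels.
    return z - D @ coef
```

Applying this function to $y_t$ and to each element of $x_t$, and then computing the residuals from (2.3), gives the GLS-detrended counterpart of the standard residual tests.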

As Stock (1999) shows, most unit root tests can be classified within one unique class. Following him and Phillips and Ouliaris (1990), I will consider the asymptotic properties of the following residual based tests:

1) The Augmented Dickey-Fuller $\hat t_\alpha$ and $\hat\alpha_T$ tests in the augmented regression with $p$ lags

$$ \Delta \hat u_t = \alpha \hat u_{t-1} + \sum_{i=1}^{p} \pi_i \Delta \hat u_{t-i} + \xi_t . $$

2) The Phillips (1987) $\hat Z_t$ and $\hat Z_\alpha$ tests from the regression $\hat u_t = \hat\rho \hat u_{t-1} + \hat k_t$, where

$$ \hat Z_\alpha = T(\hat\rho - 1) - \frac{1}{2}\left(s_{Tl}^2 - s_k^2\right)\left(T^{-2}\sum_{t=2}^{T} \hat u_{t-1}^2\right)^{-1}, $$

$$ s_k^2 = T^{-1}\sum_{t=1}^{T} \hat k_t^2, \qquad s_{Tl}^2 = T^{-1}\sum_{t=1}^{T} \hat k_t^2 + 2T^{-1}\sum_{s=1}^{l} w_{sl} \sum_{t=s+1}^{T} \hat k_t \hat k_{t-s}, $$

with $w_{sl} = 1 - s/(l+1)$. $s_{Tl}^2$ is the sum of covariances estimator of the spectral density at frequency zero of $k_t$.

3) The MSB test

$$ MSB = \left(\frac{T^{-2}\sum_{t=1}^{T} \hat u_t^2}{\hat\omega_{AR}}\right)^{1/2}, $$

where $\hat\omega_{AR}$ is a parametric estimator of the spectral density at frequency zero of $k_t$ that can be easily obtained from the augmented regression in 1).
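To make the construction of the Phillips-type coefficient statistic concrete, the following sketch computes $\hat\rho$, the residual variance $s_k^2$, the sum of covariances estimator $s_{Tl}^2$ with weights $w_{sl} = 1 - s/(l+1)$, and the resulting $\hat Z_\alpha$ from a vector of first-step residuals. The function name `phillips_z_alpha` is hypothetical, and the sums are normalized by the number of observations available in the first-order autoregression, which is an assumption of this sketch.

```python
import numpy as np


def phillips_z_alpha(u_hat, l):
    """Z_alpha sketch from first-step residuals with truncation lag l (hypothetical helper)."""
    u = np.asarray(u_hat, dtype=float)
    u_lag, u_cur = u[:-1], u[1:]
    T = len(u_cur)

    # First-order autoregression of the residuals (no constant).
    rho = (u_lag @ u_cur) / (u_lag @ u_lag)
    k = u_cur - rho * u_lag

    # Short-run variance and sum-of-covariances (Bartlett-weighted) long-run variance.
    s2_k = (k @ k) / T
    s2_Tl = s2_k
    for s in range(1, l + 1):
        w = 1.0 - s / (l + 1.0)
        s2_Tl += 2.0 * w * (k[s:] @ k[:-s]) / T

    z_alpha = T * (rho - 1.0) - 0.5 * (s2_Tl - s2_k) / ((u_lag @ u_lag) / T**2)
    return z_alpha, rho, s2_k, s2_Tl
```

With $l = 0$ the correction term vanishes and the statistic reduces to $T(\hat\rho - 1)$, which is a convenient sanity check.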

Tests that are computed using GLS detrended data will be denoted by the suffix GLS (for example, ADF-GLS and MSB-GLS). For each test, $\hat k_t = \hat u_t - \hat\rho \hat u_{t-1}$, where $\hat u_t$ are the residuals from the LS estimation of the cointegration regression (2.3), with $y_t^d = y_t$ and $x_t^d = x_t$ for case (i), and $y_t^d$, $x_t^d$ demeaned or demeaned and detrended variables for case (ii) and case (iii) respectively. Hansen (1992) shows that, if the variables have a drift, and the regression (2.2) is run only with a mean, the above tests will have different distributions, with critical values that depend on whether the drift is different from zero or not. Although Hansen (1992) shows that in such a case the tests have higher power, we may still want to estimate the regression with a trend when we are not sure if any of the elements of $x_t$ have a drift. This paper only considers the widely used case in which the regression is run with a trend when the variables have a drift.

The following conditions on the expansion rate of the truncation lag $p$ in the Augmented Dickey-Fuller and the truncation lag $l$ in the non-parametric estimation of the spectral density at frequency zero in the Phillips (1987a) $\hat Z_t$ and $\hat Z_\alpha$ tests are assumed throughout the paper:

ASSUMPTION 2: $T^{-1/3} p \to 0$ and $p \to \infty$ as $T \to \infty$.

ASSUMPTION 3: $l \to \infty$ as $T \to \infty$ and $l = o(T^{1/4})$.

The condition in Assumption 2 specifies an upper bound for the rate at which the value $p$ is allowed to tend to infinity with the sample size. Ng and Perron (1995) show that conventional model selection criteria like AIC and BIC yield $p = O_p(\log T)$, which satisfies Assumption 2. Often, a second condition is also assumed to impose a lower bound on $p$. The lower bound condition is only necessary to obtain $\sqrt{T}$ consistency of the parameters on the stationary variables, and it is sufficient but not necessary to prove the limiting distribution of the relevant test statistics (Ng and Perron, 1995, Lütkepohl and Saikkonen, 1999). Because I am only interested in the t-ratio statistics, Assumption 2 is necessary and sufficient to prove the asymptotic distribution of the tests.

3. ASYMPTOTIC POWER

Given that the traditional optimality theory cannot be applied to the case of tests for the absence of cointegration, there is, in general, no reason to expect one test to perform uniformly better. The literature comparing tests for cointegration uses Monte Carlo experiments with various possible combinations of values for the parameters of the data generating process. To understand the previous studies, we need to know how the nuisance parameters affect power. This can be accomplished by computing the analytical power of the tests. The knowledge of which nuisance parameter enters the local power will help us design Monte Carlo experiments that are informative in terms of the direction of the parameter space that we need to examine.

As Phillips and Ouliaris (1990) show, the residuals tests from the previous section are consistent tests, so they cannot be compared on the basis of their asymptotic power for fixed alternatives, which equals one in the limit. As is commonly done, we can look at a sequence of local alternatives. Model (2.1) naturally suggests the use of the parametrization $\rho = 1 + c/T$. When $c$ is equal to zero the errors $u_t$ are integrated of order one. For $c$ negative, the variables in equation (2.2) are cointegrated. Under this parametrization $u_t$ is a local to unity process and standard functional central limit theorems can be used to derive asymptotic distributions. Using this parameterization and the results of Phillips (1988), I evaluate and compare the power of the residuals tests for the absence of cointegration presented in the previous section. The derivation of the analytical power for both OLS and GLS residuals tests is one of the main contributions of this paper since, with the exception of the ADF test with OLS detrending, the analytical power of the residuals tests has never been computed before.
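The local-to-unity parametrization is easy to see in a simulation. The sketch below builds the error process $u_t = \rho u_{t-1} + v_{2t}$ with $\rho = 1 + c/T$ from a given sequence of shocks; the function name `local_to_unity_path` and the zero initial condition are assumptions of this example.

```python
import numpy as np


def local_to_unity_path(shocks, c):
    """Generate u_t = (1 + c/T) u_{t-1} + v_t with u_0 = 0 (hypothetical helper)."""
    v = np.asarray(shocks, dtype=float)
    T = len(v)
    rho = 1.0 + c / T  # the autoregressive root drifts towards one as T grows
    u = np.empty(T)
    prev = 0.0
    for t in range(T):
        prev = rho * prev + v[t]
        u[t] = prev
    return u
```

With $c = 0$ the output is the partial sum of the shocks (a pure unit root process), while negative values of $c$ produce a mean-reverting process whose root approaches one at rate $1/T$.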

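In Theorem 1 and for the rest of the paper I will use the following notation: $\delta' = \omega_{2.1}^{-1/2}\omega_{21}\Omega_{11}^{-1/2}$, with $\omega_{2.1} = \omega_{22} - \omega_{21}\Omega_{11}^{-1}\omega_{12}$, so that $\delta'\delta = \dfrac{R^2}{1 - R^2}$; $W' = [W_1' \;\; W_2]$ is a $n \times 1$ vector of standard independent Brownian motions partitioned conformably to $v_{1t}$ and $v_{2t}$, and $\bar W_1$ is a univariate standard Brownian motion independent of $W_2$.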

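THEOREM 1: When the model is generated according to (2.1), Assumptions 1 and 2 are valid, and the residuals are computed using OLS detrended variables, then, as $T \to \infty$:

(1) $$ \hat Z_t,\ \hat t_\alpha^{ADF} \Rightarrow c\,\frac{\left(\eta_c^{d\prime} A_c^d \eta_c^d\right)^{1/2}}{\left(\eta_c^{d\prime} D \eta_c^d\right)^{1/2}} + \frac{\eta_c^{d\prime}\left(\int W_c^d\, d\tilde W'\right)\eta_c^d}{\left(\eta_c^{d\prime} D \eta_c^d\right)^{1/2}\left(\eta_c^{d\prime} A_c^d \eta_c^d\right)^{1/2}} $$

(2) $$ \hat Z_\alpha,\ \hat\alpha_T \Rightarrow c + \frac{\eta_c^{d\prime}\left(\int W_c^d\, d\tilde W'\right)\eta_c^d}{\eta_c^{d\prime} A_c^d \eta_c^d} $$

(3) $$ MSB \Rightarrow \frac{\left(\eta_c^{d\prime} A_c^d \eta_c^d\right)^{1/2}}{\left(\eta_c^{d\prime} D \eta_c^d\right)^{1/2}} $$

where:

$$ \eta_c^{d\prime} = \left[\, -\left(\int J_{12c}^d W_1^{d\prime}\right)\left(\int W_1^d W_1^{d\prime}\right)^{-1} \quad 1 \,\right], \qquad W_c^d = \begin{bmatrix} W_1^d \\ J_{12c}^d \end{bmatrix}, \qquad A_c^d = \int W_c^d W_c^{d\prime}, $$

$$ \tilde W' = \left[\, W_1' \quad W_{12} \,\right], \qquad W_{12} = \sqrt{\frac{R^2}{1 - R^2}}\; \bar W_1 + W_2, \qquad D = \begin{bmatrix} I & \delta \\ \delta' & 1 + \delta'\delta \end{bmatrix}. $$

$\delta' = \omega_{2.1}^{-1/2}\omega_{21}\Omega_{11}^{-1/2}$, $R^2 = \bar\delta'\bar\delta$, $\bar W_1$ is a univariate standard Brownian motion, and $J_{12c}(\lambda)$ is an Ornstein-Uhlenbeck process such that

$$ J_{12c}(\lambda) = W_{12}(\lambda) + c\int_0^{\lambda} e^{(\lambda - s)c}\, W_{12}(s)\, ds, $$

and

(i) $W_1^d = W_1$ and $J_{12c}^d = J_{12c}$ if $\mu = 0$, and $\tau = 0$.

(ii) $W_1^d = W_1 - \int W_1$ and $J_{12c}^d = J_{12c} - \int J_{12c}$ if $\mu_x = 0$, $\tau = 0$, and a mean is included in the regression.

(iii) $W_1^d = W_1 - (4 - 6r)\int W_1 - (12r - 6)\int s W_1$ and $J_{12c}^d = J_{12c} - (4 - 6r)\int J_{12c} - (12r - 6)\int s J_{12c}$ if $\tau = 0$ or $\tau_x = 0$, and mean and trend are included in the regressions.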
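The building blocks of Theorem 1 can be approximated on a discrete grid. The sketch below is an illustration under stated assumptions rather than the code behind the paper's figures: `ou_from_bm` evaluates the Ornstein-Uhlenbeck functional of a Brownian path by a Riemann sum on the grid $r = 1/T, 2/T, \ldots, 1$, and `ols_detrend_limit` applies the three OLS detrending transformations of cases (i) to (iii), with integrals replaced by grid averages. Both function names and the `case` labels are hypothetical.

```python
import numpy as np


def ou_from_bm(w, c):
    """Riemann-sum approximation of J_c(r) = W(r) + c * int_0^r exp((r - s) c) W(s) ds."""
    w = np.asarray(w, dtype=float)
    T = len(w)
    grid = np.arange(1, T + 1) / T
    j = np.empty(T)
    for i in range(T):
        kernel = np.exp((grid[i] - grid[: i + 1]) * c)
        j[i] = w[i] + c * np.sum(kernel * w[: i + 1]) / T
    return j


def ols_detrend_limit(path, case):
    """Apply the OLS detrending of case 'none', 'mean', or 'trend' to a grid path."""
    p = np.asarray(path, dtype=float)
    T = len(p)
    r = np.arange(1, T + 1) / T
    if case == "none":
        return p.copy()
    int_p = p.mean()  # approximates the integral of the path over [0, 1]
    if case == "mean":
        return p - int_p
    int_sp = (r * p).mean()  # approximates the integral of s * path(s)
    return p - (4.0 - 6.0 * r) * int_p - (12.0 * r - 6.0) * int_sp
```

When $c = 0$ the Ornstein-Uhlenbeck path coincides with the underlying Brownian path, and the trend case removes any linear function of $r$ up to an approximation error of order $1/T$.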

Theorem 1 shows that the asymptotic distributions of the tests differ across tests; thus they have different asymptotic local power functions. Although the power is invariant to the cointegrating vector (i.e. the value of $\beta$), the variances of the error terms and the parameters of the serial correlation, it depends on a few parameters. The power depends on the alternative $c$, on the dimension of $x_t$ through the dimension of the Brownian Motion $W_1(\cdot)$, and, more importantly, on a single nuisance parameter $R^2$. When $c = 0$, $J_{12c}^d = W_{12}^d$ and the asymptotic distribution of the tests under the null is a function of standard Brownian motions and depends only on the dimension of $x_t$ (see also theorem 4.2 in Phillips and Ouliaris (1990)). Since $R^2$ enters the asymptotic distribution under the alternative, we expect the power of the tests to vary with $R^2$. This result is important as it clarifies some of the previous results found using Monte Carlo simulations (Haug (1993), (1996)) and it gives an indication of where to look if we are interested in improving and understanding the power of residual tests.

THEOREM 2: When the model is generated according to (2.1), Assumptions 1 and 2 are valid, and the residuals are estimated using GLS detrended variables, then, as $T \to \infty$, the tests are distributed as in Theorem 1 with $\eta_c^{d\prime}$, $A_c^d$, $\tilde W'$, $W_{12}$ and $D$ as defined in Theorem 1 and

(i) $W_1^d = W_1$ and $J_{12c}^d = J_{12c}$ if $\mu = 0$, and $\tau = 0$.

(ii) $W_1^d = W_1$ and $J_{12c}^d = J_{12c}$ if $\mu_x = 0$, $\tau = 0$, and a mean is included in the regression.

(iii) $W_1^d(r) = W_1(r) - r\left[\lambda W_1(1) + 3(1 - \lambda)\int s W_1(s)\, ds\right]$ and $J_{12c}^d(r) = J_{12c}(r) - r\left[\lambda J_{12c}(1) + 3(1 - \lambda)\int s J_{12c}(s)\, ds\right]$, where $\lambda = \dfrac{1 - \bar c}{1 - \bar c + \bar c^2/3}$, if $\tau = 0$ or $\tau_x = 0$, and mean and trend are included in the regressions.
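The GLS detrending weights in case (iii) of Theorem 2 can be evaluated directly. The sketch below computes $\lambda$ as a function of the quasi-difference parameter and applies the corresponding transformation to a path sampled on the grid $r = 1/T, \ldots, 1$, with the integral replaced by a grid average. The function names `gls_lambda` and `gls_detrend_limit` are hypothetical, and the formula follows the standard Elliott et al. (1996) form, which is assumed here to be the one intended in the theorem.

```python
import numpy as np


def gls_lambda(c_bar):
    """Weight lambda = (1 - c_bar) / (1 - c_bar + c_bar^2 / 3)."""
    return (1.0 - c_bar) / (1.0 - c_bar + c_bar**2 / 3.0)


def gls_detrend_limit(path, c_bar):
    """GLS-detrended limit path: P(r) - r * [lambda * P(1) + 3 (1 - lambda) * int s P(s) ds]."""
    p = np.asarray(path, dtype=float)
    T = len(p)
    r = np.arange(1, T + 1) / T
    lam = gls_lambda(c_bar)
    int_sp = (r * p).mean()  # grid approximation of the integral of s * P(s)
    return p - r * (lam * p[-1] + 3.0 * (1.0 - lam) * int_sp)
```

For the value $\bar c = -13.5$ used in the linear trend case, the weight is $14.5/75.25 \approx 0.193$, so most of the trend estimate comes from the integral term rather than from the end point of the path.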

The results of Theorem 2 generalize the results of Perron and Rodriguez (2001) to the local power. When $c = 0$, $J_{12c}^d = W_{12}^d$ and the asymptotic distributions of the tests in Theorem 2 coincide with the results in Perron and Rodriguez (2001). The analytical computation of the local asymptotic distribution of the tests shows that even with GLS detrending the tests depend, as before, only on the single nuisance parameter $R^2$.

Although the results of Theorems 1 and 2 have not been proved before, they are not surprising. In fact, $R^2$ represents the contribution of the right hand variables in equation (2.1), so it affects the asymptotic distribution of the LS estimate of $\beta$. At the same time, $R^2$ is the correlation at frequency zero between $v_{1t}$ and $v_{2t}$ in the cointegration regression and it enters the local power through the Ornstein-Uhlenbeck processes and the local alternative. When $c = 0$, $\beta$ is not identified, $J_{12c} = W_{12}$ and the asymptotic distribution of the tests under the null is a function of standard Brownian motions that depends only on the dimension of $x_t$ (see also Phillips and Ouliaris (1990), Stock (1999), and Perron and Rodriguez (2001)). These results are particularly relevant since no optimality theory exists for testing in this framework. In fact, while the power envelope for a known cointegration vector is computed in Elliott, Jansson and Pesavento (2004), no power envelope exists for the case in which $\beta$ is estimated.

Although in this paper I am only interested in comparing different residuals tests for the null of no cointegration, it should be pointed out that there are various other tests that one could use to test the null of no cointegration. Pesavento (2004) compares the local power of the ADF test applied to the residuals from the cointegrating regression with system tests like Johansen’s maximum eigenvalue test and three tests in the Error Correction Model: the t-test on the error correction term (which is equivalent to applying the Hansen (1995) unit root test to the known cointegration vector using $\Delta x_t$ as the stationary covariate), a feasible version of the t-test obtained by adding a redundant regressor (which is asymptotically equivalent to the EC t-test with $\beta$ pre-specified of Zivot (2000)) and the Wald test proposed by Boswijk (1994). Using a general and coherent framework, Pesavento (2004) shows that the asymptotic distribution of all the tests depends on the same single nuisance parameter $R^2$.

As is shown in the next section, the asymptotic power of all residuals based tests can be good when $R^2$ is small; it is decreasing in $R^2$ and it can be low for large values of $R^2$.

4. LARGE SAMPLE RESULTS

Previous Monte Carlo comparisons of cointegration tests have shown that different tests can perform differently depending on the particular design. Haug (1996), for example, compares 9 different tests for cointegration on the basis of power and size distortions induced by the presence of moving average components. Haug (1996) finds that, in general, single equation tests have smaller size distortions, but also lower power than system-based tests. The paper concludes by recommending the application of both sets of tests in empirical exercises. As Haug (1996) also points out, “A theory that gives the direction in which to experiment would be necessary but this theory is not available at the moment”. The local asymptotic distributions computed in the previous section tell us exactly which parameters are important for power, and give a precise indication of which direction we need to look in the Monte Carlo analysis. For each test statistic, the asymptotic distribution is a function of a unique nuisance parameter, the number of unit roots in the system, and the local alternative.² The power functions are computed as the probability that the tests are less than some critical value. Given the expression for the limit distribution of all the tests, the asymptotic local power can be approximated by simulating the distributions presented in the previous sections. This experiment considers a range of values for $c$ and $R^2 = 0, 0.3, 0.5$ which, in the case in which $x_t$ is univariate, correspond to correlation values of 0, 0.5, 0.7. Each Brownian Motion component of the asymptotic distribution is approximated by step functions using Gaussian random walks with T=1000 observations. 10,000 replications are used to compute the critical values and the rejection probabilities for each $c$ and $R^2$. For the GLS tests the quasi-difference parameter $\bar c$ is equal to -7 for the constant only case and to -13.5 for the linear trend case.³

² For example, the results of the previous section suggest that in the simulations in Haug (1996) the only relevant nuisance parameter for the power is η.

³ To compare our results to Perron and Rodriguez (2001) I use the values for $\bar c$ as initially suggested by Elliott, Rothenberg and Stock (1996). Since a power envelope for tests for no cointegration when the cointegration vector is estimated has not been computed yet, we cannot give the same justification for those values as Elliott, Rothenberg and Stock (1996). Another sensible choice would be $\bar c = 0$. Because Perron and Rodriguez (2001) show that, in general, the increase in power is larger for values of $\bar c$ different from zero, I will limit my choice to the values used by standard econometrics packages.
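The simulation design described above can be sketched for one statistic. The code below approximates the limiting distribution of the MSB test in Theorem 1 for case (i) with a scalar $x_t$, so that the univariate Brownian motion entering $W_{12}$ can be taken to be $W_1$ itself (an assumption that fixes the sign of $\delta$ as non-negative). The function names are hypothetical, and the defaults (T=200, 2,000 replications) are deliberately smaller than the T=1000 and 10,000 replications used in the paper, so the resulting numbers are rough.

```python
import numpy as np


def msb_limit_draws(c, r2, T=200, reps=2000, seed=0):
    """Draws from the MSB limit in Theorem 1, case (i), scalar x_t (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    delta = np.sqrt(r2 / (1.0 - r2))  # delta'delta = R^2 / (1 - R^2)
    dw1 = rng.standard_normal((reps, T)) / np.sqrt(T)
    dw2 = rng.standard_normal((reps, T)) / np.sqrt(T)
    w1 = np.cumsum(dw1, axis=1)
    dw12 = delta * dw1 + dw2  # increments of W_12 = delta * W_1 + W_2

    # Euler recursion for the Ornstein-Uhlenbeck process J_12c driven by W_12.
    j = np.empty((reps, T))
    prev = np.zeros(reps)
    for t in range(T):
        prev = (1.0 + c / T) * prev + dw12[:, t]
        j[:, t] = prev

    # eta' = [-b, 1] with b the projection coefficient of J_12c on W_1.
    b = (j * w1).sum(axis=1) / (w1 * w1).sum(axis=1)
    eta_a_eta = ((j - b[:, None] * w1) ** 2).mean(axis=1)
    eta_d_eta = b**2 - 2.0 * b * delta + 1.0 + delta**2
    return np.sqrt(eta_a_eta / eta_d_eta)


def msb_local_power(c, r2, level=0.05, **kwargs):
    """Rejection frequency at local alternative c, using the simulated null critical value."""
    crit = np.quantile(msb_limit_draws(0.0, r2, **kwargs), level)
    return float(np.mean(msb_limit_draws(c, r2, **kwargs) < crit))
```

Because the MSB test rejects for small values, the critical value is the lower quantile of the simulated null distribution; calling `msb_local_power` over a grid of $c$ and $R^2$ values traces out an approximate local power curve of the kind plotted in the figures.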

Figures 1-3 compare the asymptotic power of the tests in Theorem 1 and 2, for the case with no mean, mean only and mean and trend in the cointegration vector. For the no deterministic case, the power curves for the GLS detrended test are not reported as they will be identical to the corresponding OLS-detrended test.

When there is no mean in the cointegration and no constant is included in the OLS regression, all the tests perform similarly. The ADF $\hat\alpha_T$ test has marginally higher asymptotic power, especially when we are further away from the null. Generally, no significant increase in power results from using a different test than the most commonly used t-test. In the demeaned case, the ranking of the OLS-detrended residual tests is the same as that in the unit root case: MSB has slightly better power than $\hat\alpha_T$, which performs better than $\hat t_\alpha$. The difference, however, is not as big as in the unit root case. For example, Stock (1994) reports that for c=-10 the power of the ADF test when used to test for integration is 35% while the power of the MSB test is 55%. The difference in power between the same two tests when used to test for cointegration is only between 0.02% and 0.06% depending on the value of $R^2$. When a mean and a trend are included in the regression, there is no unique ranking between the tests anymore. The $\hat t_\alpha$ test performs slightly better in a close neighborhood of the null, while $\hat\alpha_T$ performs better for alternatives further away.

As also documented in Perron and Rodriguez (2001), tests computed with GLS-detrended variables perform uniformly better than tests computed with OLS-detrended variables. The ranking between the tests is the same as in the standard tests but the power can be up to twice as large. As in the unit root case, when a mean and trend are included all the tests have lower power, and the gain from GLS-detrending is not as large, although still impressive.

As Theorem 1 shows, the asymptotic power of the tests is different for different values of the nuisance parameter $R^2$. Figure 4 shows the asymptotic power of the ADF $\hat t_\alpha$ test for different values of $R^2$.⁴ As $R^2$ increases, the power of the ADF test shifts to the right. In fact the reduction in power for large values of $R^2$ can be significant. For example, for the demeaned case, when $c$ is –20, the probability of rejecting a false hypothesis is 30% lower when $R^2$ is 0.7 than when $R^2$ is zero. For the detrended case this difference is even larger. There are two reasons for the low power of the residual based tests in the presence of high values of $R^2$. First, these tests are based on a single equation in which the right hand variables are highly correlated (at frequency zero) with the error terms. Residual based tests for cointegration draw on a single equation, (2.2), which does not fully exploit this correlation. As $R^2$ increases, i.e. as the right hand variables in (2.2) are more correlated with the errors from the cointegration regression, the power of the tests decreases. Pesavento (2004) shows that other cointegration tests based on a full system approach can exploit this correlation and have power higher than residual based tests when $R^2$ is large. Second, the power is affected by how well we estimate the cointegration vector. Elliott, Jansson and Pesavento (2004) show that the absolute loss in power from not knowing the cointegration vector can be very large and it is also increasing with $R^2$. Better estimates (for example, leads and lags estimators) of the cointegration vector could then result in tests with better power.

⁴ Results for any of the other tests would be similar.

Figures 2 and 3 show that there is a significant gain in power from GLS-detrending the variables prior to constructing the residuals of the cointegration vector. It is interesting then to compare the gain from GLS-detrending with the gain from using a system test such as the ones analyzed in Pesavento (2004). Figure 5 compares the power of the ADF, ADF-GLS and the error correction test with redundant regressors (ECR hereafter) described in Pesavento (2004). This is of course not the only system test I could analyze, but Pesavento (2004) shows that in general the error correction test performs better than other system tests like the Johansen maximum eigenvalue test and the Wald test. Figure 5 shows that the ADF single equation test with GLS detrending has power that is considerably higher than the ECR test when $R^2$ is zero. As $R^2$ increases the system test performs better as it exploits the correlation in the errors, but the system test is better than ADF-GLS only for $R^2$ equal to 0.3 or higher. This is an interesting result since in a bivariate system $R^2$ equal to 0.3 corresponds to a correlation of about 0.5, which is a reasonable estimate for many empirical applications.

From a practical point of view, the results of Theorem 1 suggest that, at least asymptotically, only $R^2$ will affect the power of the tests. Failure to reject no cointegration in empirical applications could then be due to low power rather than to a true absence of cointegration. Unfortunately, finding a consistent estimator for $R^2$ when the cointegration vector is unknown is not a trivial question. Corollary A2 in the Appendix shows that $\beta$ cannot be consistently estimated: if $R^2 = 0$ and $c = 0$, then $J^d_{12c}(\lambda) \equiv W^d_2(\lambda)$ and the limiting distribution of $\hat\beta$ coincides with the usual distribution in the spurious regression as defined by Granger and Newbold (1974). In the original definition of spurious regression, there is no cointegration and indeed no relationship between the two sets of variables. As Phillips (1986) shows, the same results are valid in the more general case in which $R^2 \neq 0$. In this case, even though there is a relationship between the two sets of variables, this relationship is not consistently estimated and the asymptotic distribution of Corollary A2 is the same as in Phillips and Ouliaris (1990). In the local-to-unity framework of model (2.1), $\beta$ is also not consistently estimable under the (local) alternative. The study of how to construct an informative estimator for $R^2$ when the cointegration vector is unknown is left to future research.

To see whether the asymptotic distribution is a good approximation for practical purposes we need to look at the small sample properties of the tests; the power of the tests in small samples is studied in the next section.

5. SMALL SAMPLES PROPERTIES

Starting with the work of Schwert (1989), a vast literature has developed to study the small sample performance of unit root tests in terms of both size and power. In general, the findings confirm that finite sample size distortions for unit root tests vary widely across tests depending on the way the spectral density estimator is constructed and the lag length involved.⁵ Haug (1993, 1996) conducts Monte Carlo simulations to compare small sample distortions of unit root tests applied to residuals from a cointegration regression. He finds results similar to those for unit root tests, in that the tests present significant size distortions, especially in the presence of large negative moving average roots. The model derived in Section 2 allows us to compare the small sample power and size distortions in a more general setting than Haug’s (1993).

5 Stock (1994) lists many such studies.

Using the DGP (2.1) I randomly generate the errors from a bivariate Normal with mean zero and variance-covariance matrix

$$\Omega = \begin{bmatrix} \omega_1^2 & \omega_1\omega_2\delta \\ \omega_1\omega_2\delta & \omega_2^2 \end{bmatrix}.$$

For $R^2 = \delta^2$, I consider $R^2 = 0, 0.3, 0.5$ and $-c = 0, 5, 10, 15, 20$, which corresponds to

$$\rho = 1, 0.95, 0.9, 0.85, 0.80$$

for $T = 100$. The tests are all invariant to $\beta$ and the variance of the errors, so I can choose any reasonable number. Table 1 shows the rejection probabilities for the case in which there is no serial correlation in the error terms. Since no serial correlation is assumed, all the regressions are estimated without lags. For now, I am only interested in examining the quality of the local-to-unity asymptotic approximations, so Table 1 presents the size-adjusted power. Given that the regressions are estimated without lags and no serial correlation is present, the estimated spectral density in $\hat Z_t$ and $\hat Z_\alpha$ is only marginally different, so the results are presented only for the $\hat t_\alpha$, $\hat\alpha_T$ and MSB tests. $\hat t_\alpha$ does not perform as well as MSB in small samples but the difference is marginal.
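The design above can be sketched in Python as follows. The code assumes a bivariate version of the DGP in which $x_t$ is a random walk, $y_t = \beta x_t + u_t$ and $u_t = \rho u_{t-1} + v_{2t}$ with $\rho = 1 + c/T$; this form, the unit error variances and all function names are illustrative assumptions rather than a transcription of (2.1). Only the ADF statistic without lags on OLS-demeaned residuals is computed, and power is size-adjusted by using the empirical 5% quantile under $c = 0$ as the critical value.

```python
import numpy as np

def simulate_pair(T, c, delta, rng, beta=1.0):
    """One draw of (y, x) from the assumed bivariate DGP."""
    rho = 1.0 + c / T                         # e.g. c = -5, T = 100 gives rho = 0.95
    cov = np.array([[1.0, delta], [delta, 1.0]])
    v = rng.multivariate_normal(np.zeros(2), cov, size=T)
    x = np.cumsum(v[:, 0])                    # regressor: pure random walk
    u = np.empty(T)
    u[0] = v[0, 1]
    for t in range(1, T):
        u[t] = rho * u[t - 1] + v[t, 1]       # near-unit-root regression error
    return beta * x + u, x

def eg_df_tstat(y, x):
    """Dickey-Fuller t-ratio (no lags) on residuals of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    lag, diff = resid[:-1], np.diff(resid)
    alpha = (lag @ diff) / (lag @ lag)
    s2 = np.sum((diff - alpha * lag) ** 2) / (len(diff) - 1)
    return alpha / np.sqrt(s2 / (lag @ lag))

def size_adjusted_power(T, c, delta, reps, seed=0):
    """Rejection rate at the empirical 5% critical value obtained under c = 0."""
    rng = np.random.default_rng(seed)
    null = np.array([eg_df_tstat(*simulate_pair(T, 0.0, delta, rng)) for _ in range(reps)])
    alt = np.array([eg_df_tstat(*simulate_pair(T, c, delta, rng)) for _ in range(reps)])
    crit = np.quantile(null, 0.05)            # left-tail test: reject for small values
    return np.mean(alt < crit)
```

Calling, for instance, `size_adjusted_power(100, -20.0, np.sqrt(0.3), 10000)` mimics one cell of the design with $R^2 = 0.3$; because the DGP and the test are simplified here, the numbers need not match those in Table 1.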

Although none of the tests has very high power in a neighborhood of the null, the difference in performance is more evident for small values of $R^2$. When $R^2$ is large, the gain from using MSB over the $\hat t_\alpha$ test is less than 5% in terms of power. As in the asymptotic results, using GLS-detrended variables increases the probability of rejecting a false null. The increase in power is larger as $R^2$ increases.

The ranking of the tests in small samples is similar to what is predicted by the asymptotic theory. For samples as small as 100 observations, contrary to the asymptotic results, $\hat\alpha_T$-GLS performs marginally better than MSB-GLS, although the difference could be due to Monte Carlo error. As the sample size increases, the ranking of the tests is the same as what is predicted by the asymptotic results in Figures 1-3.

Since a test with good power but bad size may not be the best choice, it is important to evaluate the size properties of the tests. The size properties of the tests are not reported, as they are very similar to what has been previously reported in the literature. These results emphasize the well-known trade-off between size and power: although the $t_\alpha$ test has lower power than the other tests, its size distortions are less evident across different specifications for the error terms.

6. CONCLUSIONS

Over the past years, residual-based tests for cointegration have been widely used in testing for the null of no cointegration. Until now, evaluation of the tests in terms of their power or size distortions has been mainly done with Monte Carlo simulations. However, it is hard to understand how the tests behave and to design the correct experiment without knowing which parameters enter the asymptotic distribution of the tests under the alternative. This paper computes the analytical local power of five unit root tests applied to the residuals of a cointegration regression. The difference in power between the tests is significantly smaller than in the unit root case. The paper also shows that the power depends on a single nuisance parameter, which is determined by the correlation at frequency zero of the independent variables with the errors of the cointegration regression. I show that there is no particular gain in using unit root tests other than the ADF t-test originally suggested by Engle and Granger (1987), while a significant gain can be achieved by using GLS-detrended variables in the cointegration regression.

Simulations show that for large values of the nuisance parameter the power of residual-based tests is low, leaving room for improvement and future work. The power of GLS residual tests also decreases as the nuisance parameter increases, but it can sometimes be higher than the power of system tests.

REFERENCES:

Banerjee, A., J.J. Dolado, D.F. Hendry and G.W. Smith (1986): “Exploring Equilibrium Relationships in Econometrics through Static Models: Some Monte Carlo Evidence”, Oxford Bulletin of Economics and Statistics, 48, 253-277.

Berkowitz, J. and L. Kilian (2000): “Recent Developments in Bootstrapping Time Series”, Econometric Reviews, 19, 1-48.

Bewley, R. and M. Yang (1998): “On the Size and Power of System Tests for Cointegration”, The Review of Economics and Statistics, 80, 675-679.

Boswijk, H.P. (1994): “Testing for an Unstable Root in Conditional and Structural Error Correction Models”, Journal of Econometrics, 63, 37-60.

Boswijk, H.P. and P.H. Franses (1992): “Dynamic Specification and Cointegration”, Oxford Bulletin of Economics and Statistics, 54, 369-381.

Chan, N.H. and C.Z. Wei (1988): “Limiting Distributions of the Least Squares Estimates of Unstable Autoregressive Processes”, Annals of Statistics, 16, 367-401.

Chang, Y., J.Y. Park and K. Song (2002): “Bootstrapping Cointegrating Regressions”, Rice University, Department of Economics Working Paper #2002-04.

Elliott, G. and M. Jansson (2003): “Testing for Unit Roots with Stationary Covariates”, Journal of Econometrics, 115, 75-89.

Elliott, G., M. Jansson and E. Pesavento (2004): “Optimal Power for Testing Potential Cointegrating Vectors with Known Parameters for Nonstationarity”, forthcoming, Journal of Business and Economic Statistics.

Elliott, G., T.J. Rothenberg and J.H. Stock (1996): “Efficient Tests for an Autoregressive Unit Root”, Econometrica, 64, 813-836.

Engle, R.F. and C.W. Granger (1987): “Cointegration and Error Correction: Representation, Estimation and Testing”, Econometrica, 55, 251-276.

Engle, R.F. and B.S. Yoo (1987): “Forecasting and Testing in Co-integrated Systems”, Journal of Econometrics, 35, 143-159.

Ericsson, N.R. and J.G. MacKinnon (1999): “Distributions of Error Correction Tests for Cointegration”, International Finance Discussion Paper 655, Board of Governors of the Federal Reserve System.

Gonzalo, J. and T-H. Lee (1998): “Pitfalls in Testing for Long Run Relationships”, Journal of Econometrics, 86, 129-154.

Granger, C.W. and P. Newbold (1974): “Spurious Regressions in Econometrics”, Journal of Econometrics, 2, 111-120.

Hansen, B. (1992): “Efficient Estimation and Testing of Cointegrating Vectors in the Presence of Deterministic Trends”, Journal of Econometrics, 53, 87-121.

Hansen, B. (1995): “Rethinking the Univariate Approach to Unit Root Testing”, Econometric Theory, 11, 1148-1171.

Haug, A.A. (1993): “Residual Based Tests for Cointegration: A Monte Carlo Study of Size Distortions”, Economics Letters, 41, 345-351.

Haug, A.A. (1996): “Tests for Cointegration: A Monte Carlo Comparison”, Journal of Econometrics, 71, 89-115.

Inoue, A. and L. Kilian (2002): “Bootstrapping Autoregressive Processes with Possible Unit Roots”, Econometrica, 70, 377-391.

Johansen, S. (1995): Likelihood Based Inference in Cointegrated Autoregressive Models, Oxford: Oxford University Press.

Kremers, J.M. et al. (1992): “The Power of Cointegration Tests”, Oxford Bulletin of Economics and Statistics, 54, 325-348.

Li, H. and G.S. Maddala (1997): “Bootstrapping Cointegration Regressions”, Journal of Econometrics, 80, 297-318.

Lütkepohl, H. and P. Saikkonen (1999): “Order Selection in Testing for the Cointegrating Rank of VAR Process”, in R.F. Engle and H. White eds. Cointegration, Causality and Forecasting, Oxford: Oxford University Press, 168-199.

Ng, S. and P. Perron (1995): “Unit Root Tests in ARMA Models with Data-Dependent Methods for Selection of the Truncation Lag”, Journal of the American Statistical Association, 90, 268-281.

Park, J.Y. and P.C.B. Phillips (1988): “Statistical Inference in Regressions with Integrated Processes: Part 1”, Econometric Theory, 4, 468-497.

Pesavento, E. (2004): “Analytical Evaluation of the Power of Tests for the Absence of Cointegration”, Journal of Econometrics, 122, 349-384.

Perron, P. and G. Rodríguez (2001): “Residual Based Tests for Cointegration with GLS Detrended Data”, University of Ottawa, Department of Economics Working Paper #0004E.

Phillips, P.C.B. (1986): “Understanding Spurious Regression in Econometrics”, Journal of Econometrics, 33, 311-340.

Phillips, P.C.B. (1987a): “Towards a Unified Asymptotic Theory for Autoregression”, Biometrika, 74, 535-547.

Phillips, P.C.B. (1987b): “Time Series Regression with a Unit Root”, Econometrica, 55, 277-302.

Phillips, P.C.B. (1988): “Regression Theory for Near-Integrated Time Series”, Econometrica, 56, 1021-1043.

Phillips, P.C.B. and S. Ouliaris (1990): “Asymptotic Properties of Residuals Based Tests for Cointegration”, Econometrica, 58, 165-193.

Phillips, P.C.B. and V. Solo (1992): “Asymptotics for Linear Processes”, The Annals of Statistics, 20(2), 971-1001.

Phillips, P.C.B. and Z. Xiao (1998): “A Primer in Unit Root Testing”, Cowles Foundation for Research in Economics Working Paper.

Sargan, J.D. and A. Bhargava (1983): “Testing Residuals from Least Squares Regression for Being Generated by the Gaussian Random Walk”, Econometrica, 51, 153-174.

Schwert, G.W. (1989): “Tests for Unit Roots: A Monte Carlo Investigation”, Journal of Business and Economic Statistics, 7, 147-159.

Stock, J.H. (1994): “Unit Roots, Structural Breaks and Trends”, in D. McFadden and R.F. Engle eds. Handbook of Econometrics, vol. 4, Amsterdam: North Holland, 2739-2841.

Stock, J.H. (1999): “A Class of Tests for Integration and Cointegration”, in R.F. Engle and H. White eds. Cointegration, Causality and Forecasting, Oxford: Oxford University Press.

Swensen, A.R. (2003): “A Note on the Power of Bootstrap Unit Root Tests”, Econometric Theory, 19, 32-48.

Wooldridge, J. (1994): “Estimation and Inference for Dependent Processes”, in D. McFadden and R.F. Engle eds. Handbook of Econometrics, vol. 4, Amsterdam: North Holland, 2639-2738.

Zivot, E. (2000): “The Power of Single Equation Tests for Cointegration when the Cointegrating Vector is Prespecified”, Econometric Theory, 16, 407-439.

APPENDIX

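LEMMA A1: When the model is generated according to (2.1) with $T(\rho - 1) = c$, then, as $T \to \infty$:⁶

(1) $\Omega_{11}^{-1/2}\, T^{-2} \sum x_t^d x_t^{d\prime}\, \Omega_{11}^{-1/2} \Rightarrow \int W_1^d W_1^{d\prime}$

(2) $\Omega_{11}^{-1/2}\, \omega_{2.1}^{-1/2}\, T^{-2} \sum x_t^d u_t^d \Rightarrow \int W_1^d J_{12c}^d$

(3) $\omega_{2.1}^{-1}\, T^{-2} \sum (u_t^d)^2 \Rightarrow \int (J_{12c}^d)^2$

(4) $\omega_{2.1}^{-1/2}\, T^{-1} \sum u_{t-1}^d \left(\Sigma^{-1/2}\varepsilon_t\right)' \Rightarrow \int J_{12c}^d\, dW'$

(5) $\Omega_{11}^{-1/2}\, T^{-1} \sum x_{t-1}^d \left(\Sigma^{-1/2}\varepsilon_t\right)' \Rightarrow \int W_1^d\, dW'$

where $i = 1, 2$, the summation goes from 1 to $T$, $\Rightarrow$ denotes weak convergence, and the subscript $ii$ denotes the $ii$ element of a matrix. $W' = [W_1' \; W_2']$ is an $n \times 1$ vector of standard independent Brownian motions partitioned conformably to $v_{1t}$ and $v_{2t}$. $\bar W_1$ is a univariate standard Brownian motion independent of $W_2$. $J_{12c}$ is a scaled Ornstein-Uhlenbeck process such that

$$J_{12c}(\lambda) = W_{12}(\lambda) + c\int_0^{\lambda} e^{(\lambda - s)c}\, W_{12}(s)\, ds \quad \text{with} \quad W_{12} = \sqrt{\frac{R^2}{1 - R^2}}\;\bar W_1 + W_2 .$$

If the variables are detrended using OLS then

(i) if $\mu = 0$ and $\tau = 0$, $x_t^d = x_t$ and $W_1^d = W_1$, $J_{12c}^d = J_{12c}$;

(ii) if $\mu_x = 0$, $\tau = 0$ and a mean is included in the regression, $x_t^d = x_t - \bar x$ and $W_1^d = W_1 - \int W_1$, $J_{12c}^d = J_{12c} - \int J_{12c}$;

(iii) if $\tau = 0$ or $\tau_x = 0$, and mean and trend are included in the regressions, $x_t^d$ is $x_t$ detrended by OLS and

$$W_1^d = W_1 - (4 - 6\lambda)\int W_1 - (12\lambda - 6)\int s W_1, \qquad J_{12c}^d = J_{12c} - (4 - 6\lambda)\int J_{12c} - (12\lambda - 6)\int s J_{12c} .$$

If the variables are detrended using GLS then

(iv) if $\mu = 0$ and $\tau = 0$, $W_1^d = W_1$ and $J_{12c}^d = J_{12c}$;

(v) if $\mu_x = 0$, $\tau = 0$, and a mean is included in the regression, $W_1^d = W_1$ and $J_{12c}^d = J_{12c}$;

(vi) if $\tau = 0$ or $\tau_x = 0$, and mean and trend are included in the regressions,

$$W_1^d(\lambda) = W_1(\lambda) - \lambda\left[\bar\lambda\, W_1(1) + 3(1 - \bar\lambda)\int s W_1\right], \qquad J_{12c}^d(\lambda) = J_{12c}(\lambda) - \lambda\left[\bar\lambda\, J_{12c}(1) + 3(1 - \bar\lambda)\int s J_{12c}\right],$$

where $\bar\lambda = \dfrac{1 - \bar c}{1 - \bar c + \bar c^{2}/3}$.

6 I follow the usual convention and suppress the argument $(\lambda)$ of the Brownian motion terms. Unless specified otherwise, all integrals are between 0 and 1. Unless noted otherwise, $\sum$ denotes $\sum_{t=1}^{T}$.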

Proof of Lemma A1:

Assumption 1 along with the assumptions on $\Phi(L)$ implies $T^{-1/2}\sum_{s=1}^{[T\lambda]} v_s \Rightarrow \Omega^{1/2} W$ for $\lambda \in [0,1]$, where

$$\Omega^{1/2} = \begin{bmatrix} \Omega_{11}^{1/2} & 0 \\ \omega_{21}\Omega_{11}^{-1/2} & \omega_{2.1}^{1/2} \end{bmatrix}, \qquad \omega_{2.1} = \omega_{22} - \omega_{21}\Omega_{11}^{-1}\omega_{12},$$

and $W' = [W_1' \; W_2']$. Using this notation,

$$T^{-1/2}\sum_{s=1}^{[T\lambda]} v_{2s} \Rightarrow \omega_{21}\Omega_{11}^{-1/2} W_1 + \omega_{2.1}^{1/2} W_2 .$$

Define $\bar\delta' = \omega_{2.1}^{-1/2}\omega_{21}\Omega_{11}^{-1/2}$ so that $\bar\delta'\bar\delta = \dfrac{R^2}{1 - R^2}$; then

$$T^{-1/2}\sum_{s=1}^{[T\lambda]} \bar\delta'\,\Omega_{11}^{-1/2} v_{1s} \Rightarrow \bar\delta' W_1 = \sqrt{\frac{R^2}{1 - R^2}}\;\bar W_1,$$

where $\bar W_1$ is a univariate standard Brownian motion independent of $W_2$. From Phillips (1987a,b) and the multivariate FCLT,

$$\omega_{2.1}^{-1/2}\, T^{-1/2}\, u_{[T\lambda]} \Rightarrow \sqrt{\frac{R^2}{1 - R^2}}\;\bar W_{1c} + W_{2c} = J_{12c} .$$

Results (1)-(3) follow from the Continuous Mapping Theorem (CMT hereafter), while (4) and (5) follow directly from Chan and Wei (1988) or Phillips (1987a). When the variables are detrended using quasi-differences the results follow directly from the CMT and Elliott, Rothenberg and Stock (1996).
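To make the limiting objects in Lemma A1 more concrete, the following Python sketch approximates $W_{12}$ and $J_{12c}$ on a discrete grid. It relies on the fact that $J(\lambda) = W(\lambda) + c\int_0^{\lambda} e^{(\lambda - s)c} W(s)\,ds$ solves $dJ = cJ\,d\lambda + dW$, and applies a simple Euler step to that equation. The function name, the grid size and the Euler scheme are illustrative choices, not part of the paper.

```python
import numpy as np

def simulate_j12c(n_steps, c, r2, rng):
    """Approximate paths of W_12 and J_12c on n_steps grid points in (0, 1]."""
    scale = np.sqrt(r2 / (1.0 - r2))
    dw1 = rng.standard_normal(n_steps) / np.sqrt(n_steps)   # increments of the univariate W_1-bar
    dw2 = rng.standard_normal(n_steps) / np.sqrt(n_steps)   # increments of W_2
    dw12 = scale * dw1 + dw2                                 # increments of W_12
    w12 = np.cumsum(dw12)
    j = np.empty(n_steps)
    prev = 0.0
    for i in range(n_steps):
        # Euler step for dJ = c J dlambda + dW_12, with dlambda = 1 / n_steps.
        prev = (1.0 + c / n_steps) * prev + dw12[i]
        j[i] = prev
    return w12, j
```

With `c = 0` the two returned paths coincide, and the variance of $W_{12}(1)$ equals $1/(1 - R^2)$, which is one way to see how a larger $R^2$ changes the scale of the process driving the residuals.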

COROLLARY A2: When the model is generated according to (2.1) then, as $T \to \infty$,

(A1) $\left(\hat\beta_{ols} - \beta\right) \Rightarrow \omega_{2.1}^{1/2}\,\Omega_{11}^{-1/2}\left(\int W_1^d W_1^{d\prime}\right)^{-1}\left(\int W_1^d J_{12c}^d\right)$

where $\hat\beta_{ols}$ is the LS estimator in the cointegration regression (2.2), $W' = [W_1' \; W_2']$ is an $n \times 1$ vector of standard independent Brownian motions partitioned conformably to $v_{1t}$ and $v_{2t}$, $\omega_{2.1} = \omega_{22} - \omega_{21}\Omega_{11}^{-1}\omega_{12}$, $\delta = \Omega_{11}^{-1/2}\omega_{12}\,\omega_{22}^{-1/2}$ and $R^2 = \delta'\delta$. $J_{12c}$ is a scaled Ornstein-Uhlenbeck process such that

$$J_{12c}(\lambda) = W_{12}(\lambda) + c\int_0^{\lambda} e^{(\lambda - s)c}\, W_{12}(s)\, ds \quad \text{with} \quad W_{12} = \sqrt{\frac{R^2}{1 - R^2}}\;\bar W_1 + W_2 .$$

$W_1^d$ and $J_{12c}^d$ are as defined in Lemma A1.

Proof of Corollary A2:

$\beta$ can be estimated by regressing $y_t^d$ on $x_t^d$. If $z_{2t} = 0$ then $x_t^d = x_t$ and $y_t^d = y_t$; for case (ii) both variables are demeaned and for case (iii) both variables are demeaned and detrended. In the general case

$$\left(\hat\beta - \beta\right) = \left(T^{-2}\sum x_t^d x_t^{d\prime}\right)^{-1}\left(T^{-2}\sum x_t^d u_t^d\right).$$

From Lemma A1 and the CMT we have that

$$\left(\hat\beta - \beta\right) \Rightarrow \omega_{2.1}^{1/2}\,\Omega_{11}^{-1/2}\left(\int W_1^d W_1^{d\prime}\right)^{-1}\left(\int W_1^d J_{12c}^d\right).$$

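Before proving Theorems 1 and 2, I will introduce some auxiliary results.

COROLLARY A3: When the model is generated according to (2.1) with $T(\rho - 1) = c$, then, as $T \to \infty$:

(i) $\omega_{2.1}^{-1}\, T^{-2} \sum (\hat u_t^d)^2 \Rightarrow \eta_c^{d\prime} A_c^d\, \eta_c^d$

(ii) $\omega_{2.1}^{-1}\, T^{-1} \sum \hat u_{t-1}^d\, \Delta\hat u_t^d \Rightarrow c\, d(1)\, \eta_c^{d\prime} A_c^d\, \eta_c^d + d(1)\, \eta_c^{d\prime}\left(\int W_c^d\, d\tilde W'\right)\eta_c^d$

(iii) $s_\xi^2 \Rightarrow d(1)^2\, \omega_{2.1}\, \eta_c^{d\prime} D\, \eta_c^d$

(iv) $\hat\omega_{AR} \Rightarrow \omega_{2.1}\, \eta_c^{d\prime} D\, \eta_c^d$

where $\hat u_t$ are the residuals from the cointegration regression (2.2), $s_\xi^2 = T^{-1}\sum \hat\xi_t^2$ is the estimated variance of the residuals of the ADF regression,

$$\eta_c^{d\prime} = \left[\, -\left(\int W_1^d J_{12c}^d\right)'\left(\int W_1^d W_1^{d\prime}\right)^{-1} \quad 1 \,\right], \qquad A_c^d = \int W_c^d W_c^{d\prime} = \begin{bmatrix} \int W_1^d W_1^{d\prime} & \int W_1^d J_{12c}^d \\ \int J_{12c}^d W_1^{d\prime} & \int (J_{12c}^d)^2 \end{bmatrix},$$

$$W_c^d = \begin{bmatrix} W_1^d \\ J_{12c}^d \end{bmatrix}, \qquad \tilde W = \begin{bmatrix} W_1 \\ \sqrt{\dfrac{R^2}{1 - R^2}}\;\bar W_1 + W_2 \end{bmatrix} \qquad \text{and} \qquad D = \begin{bmatrix} I & \bar\delta \\ \bar\delta' & 1 + \bar\delta'\bar\delta \end{bmatrix}.$$

$W_1^d$ and $J_{12c}^d$ are as defined in Lemma A1.

Proof of Corollary A3: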

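(i) By OLS projections $\hat u_t = \hat u_t^d = u_t^d - (\hat\gamma - \gamma)' x_t^d$. By Lemma A1 and the CMT, $\omega_{2.1}^{-1/2}\, T^{-1/2}\, \hat u^d_{[T\lambda]} \Rightarrow \eta_c^{d\prime} W_c^d$ and

$$\omega_{2.1}^{-1}\, T^{-2} \sum_{t=k+2}^{T} (\hat u_t^d)^2 \Rightarrow \eta_c^{d\prime}\left(\int W_c^d W_c^{d\prime}\right)\eta_c^d = \eta_c^{d\prime} A_c^d\, \eta_c^d$$

when $k/T \to 0$, which is true under Assumption 2.

(ii) For a proof of (ii) see Pesavento (2004).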

(iii) Under Assumption 2, the estimates of the ADF regression are consistent.

( α ρ 1 ) and s

ξ

2 k

= T − 1

T

2

ξ ˆ

2 tk

= T − 1

T

∑ t k 2

ϖ t d 2 + o p

(1) ⇒ d (1) 2 η c d ' ΓΩΓ ' η c d = d (1) 2 ω η c d '

 δ

I

′ 1 +

δ 

η d c

(iv) Following the argument in (ii) and (iii) it is easy to show that k

ˆ t

= ∆ − ( ρ ˆ − 1 ) u ˆ t d

− 1

.

If assumption 1 and 3 are valid, the autoregressive estimate of the spectral density of k

ˆ t

is a consistent estimate of the spectral density at frequency zero of b v t d which, as shown in (ii), converges to η d c

'

ΓΩΓ ' η d c

= ω η d c

'

D η d c

.

Proof of Theorem 1:

(1a) The ADF test statistic is the usual t ratio test in the augmented regression

∆ u t

= α u t − 1

+

1 p π i

∆ u + ξ t

, or t

α

=

− 1

− 2

'

− 1

'

− 1

− 1 where, u ˆ

− 1

and ∆ u ˆ

− 1

are the vectors of observations u ˆ t d

− 1

and ∆ u ˆ t d

− 1

, U is the matrix of observation on the p-1 regressors ∆ u ˆ t d

− 1

∆ u ˆ t d

− 2

...

∆ u ˆ d t p

, M

U

= − ( ′ ) − 1

U ′ and s

ξ

2 = T − 1 Σ ξ t

2 . Under the condition for Corollary A1, A2, and Assumptions 1 and 2, t

α

=

ω

ω − 1

2.1

T − 1 Σ u ˆ t d

− 1

∆ u ˆ t d

− 1/ 2

2.1

s

ξ

ω − 1

2.1

T − 2 Σ u ˆ t d

− 1

2

1/ 2

+ o p

(1)

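By Corollary A3,

$$t_\alpha \Rightarrow \frac{c\, d(1)\, \eta_c^{d\prime} A_c^d\, \eta_c^d}{d(1)\left(\eta_c^{d\prime} D\, \eta_c^d\right)^{1/2}\left(\eta_c^{d\prime} A_c^d\, \eta_c^d\right)^{1/2}} + \frac{d(1)\, \eta_c^{d\prime}\left(\int W_c^d\, d\tilde W'\right)\eta_c^d}{\left(d(1)^2\, \eta_c^{d\prime} D\, \eta_c^d\right)^{1/2}\left(\eta_c^{d\prime} A_c^d\, \eta_c^d\right)^{1/2}} .$$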

(1b) Recall that k

ˆ t

= ∆ − ( ρ − 1 ) u ˆ t d

− 1

. As proved in Corollary A3, part (ii) u ˆ ( ρ − 1 ) u ˆ t d

− 1

+

( ) ( − 1 ) x t d

− 1

+ ˆ ′ t d .

Under assumption 2

2

1 c s 2

Tl

− s 2 k h

= T − 1

∑ l s = 1 w sl

∑ T t s 1 k k = T − 1

∑ l s = 1 w sl

∑ T t s 1 p

T − 1

∑ l s = 1 w sl

∑ T t s 1

'

ˆ ⇒ d c

Park and Phillips (1988) for a reference).

′ η c d where Γ

1

is the lim T − 1

∑ ∑

( t s

) ( See

Similarly, s 2

Tl

⇒ d c

Γ Γ Γ ′ d c

+ 2 c d

ΓΓ Γ ′ η d c

= η c d

ΓΩΓ ′ η d c

= η d c

D η d c

which is the spectral density at frequency zero of

ˆ ' t

and

T − 1 Σ u ˆ t d

− 1 u ˆ t d cT − 1 Σ ˆ ˆ d

− t − 1

+ ˆ

' − 1 Σ ˆ t d d

− 1 t

+ o p

(1) . (See also proof of Corollary A3, part

(ii)). Because v t d is a dependent process, following Park and Phillips (1988) we have that

ω − 1

2.1

T − 1 Σ u ˆ t d

− 1

∆ u ˆ t d ⇒ c η d c

' A d c

η η c d ' ∫ c

Noting that T ( ρ ˆ ) (

T − 2 Σ u ˆ t d

− 1

2

′ c c d

1

′ η c d .

T − 1 Σ u ˆ t d

− 1

∆ u ˆ t d

)

and substituting each piece in the expression for Z

ˆ t

, we obtain

Z ˆ t

⇒ c η d c

'

A c d η η d c

' ∫ c

′ c

+ ΓΓ Γ ′ η η ′ η c d

η c d

'

A c d η d c

 1/ 2

 

η c d

D η d c

 1/ 2

which simplifies to the expression in Theorem 1.

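(2a) We can write $\hat\alpha_T = T(\hat\rho - 1)\,\dfrac{\hat\omega_{AR}^{1/2}}{s_\xi}$.

$$T(\hat\rho - 1) = \frac{T^{-1}\sum \hat u_{t-1}^d\, \Delta\hat u_t^d + o_p(1)}{T^{-2}\sum (\hat u_{t-1}^d)^2 + o_p(1)} \Rightarrow \frac{c\, d(1)\, \eta_c^{d\prime} A_c^d\, \eta_c^d + d(1)\, \eta_c^{d\prime}\left(\int W_c^d\, d\tilde W'\right)\eta_c^d}{\eta_c^{d\prime} A_c^d\, \eta_c^d}$$

from Corollary A3. Adding the limits for the variances,

$$\hat\alpha_T \Rightarrow \frac{c\, d(1)\, \eta_c^{d\prime} A_c^d\, \eta_c^d}{d(1)\, \eta_c^{d\prime} A_c^d\, \eta_c^d} + \frac{d(1)\, \eta_c^{d\prime}\left(\int W_c^d\, d\tilde W'\right)\eta_c^d}{d(1)\, \eta_c^{d\prime} A_c^d\, \eta_c^d} .$$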

(2b) The proof for Z

ˆ

α follows from (1b):

Z ˆ

α

⇒ c η c d ' A c d η η c d ' ∫ c

′ η η

η d c

' A d c

η d c

1

η η

1

η c d η c d ' ∫ c

η d c

' A d c

η c d

′ η c d

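(3) $MSB = \left(\dfrac{T^{-2}\sum (\hat u_t^d)^2}{\hat\omega_{AR}}\right)^{1/2}$. Substituting all the terms from Corollary A3:

$$MSB \Rightarrow \frac{\omega_{2.1}^{1/2}\left(\eta_c^{d\prime} A_c^d\, \eta_c^d\right)^{1/2}}{\left(\omega_{2.1}\, \eta_c^{d\prime} D\, \eta_c^d\right)^{1/2}} = \frac{\left(\eta_c^{d\prime} A_c^d\, \eta_c^d\right)^{1/2}}{\left(\eta_c^{d\prime} D\, \eta_c^d\right)^{1/2}} .$$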
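As a small numerical companion to the MSB formula, the Python sketch below computes $MSB = (T^{-2}\sum \hat u_t^2 / \hat\omega)^{1/2}$ from a residual series. The helper `ar0_variance` is an assumption made for illustration: it uses the error variance of a Dickey-Fuller regression without lags as the estimate $\hat\omega$, which is only appropriate when there is no serial correlation, as in the small sample design of Section 5. Function names are hypothetical.

```python
import numpy as np

def msb_statistic(resid, omega_hat):
    """MSB statistic: sqrt( T^{-2} * sum(resid^2) / omega_hat )."""
    resid = np.asarray(resid, dtype=float)
    T = resid.shape[0]
    return np.sqrt(np.sum(resid ** 2) / (T ** 2 * omega_hat))

def ar0_variance(resid):
    """Zero-lag stand-in for the autoregressive spectral density estimate."""
    resid = np.asarray(resid, dtype=float)
    lag, diff = resid[:-1], np.diff(resid)
    alpha = (lag @ diff) / (lag @ lag)        # DF regression coefficient
    return np.mean((diff - alpha * lag) ** 2) # variance of the DF regression errors
```

The statistic is invariant to the scale of the residuals, and small values are evidence against the null of no cointegration, since residuals that are stationary accumulate much less variation than integrated ones.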

Proof of Theorem 2 :

The proof of Theorem 2 follows directly from the proof of Theorem 1, with different Brownian motions deriving from a different detrending procedure. For the case in which $c = 0$ see also Perron and Rodríguez (2001).

Figure 1: Large Sample Power, no constant case.

[Figure: three panels, titled "Large Sample Power for Rsquare = 0", "Local Power for delta =0.3" and "Large Sample Power for Rsquare = 0.5". Horizontal axis: c; vertical axis: power, from 0 to 1. Series: adf, rhohat, msb.]

Figure 2: Large Sample Power, Demeaned

[Figure: three panels, titled "Large Sample Power for Rsquare=0, Demeaned", "Large Sample Power for Rsquare=0.3, Demeaned" and "Large Sample Power for Rsquare=0.5, Demeaned". Horizontal axis: c; vertical axis: power, from 0 to 1. Series: adf, rhohat, msb, adf_gls, rhohat_gls, msb_gls.]

Figure 3: Large Sample Power, Demeaned and Detrended.

[Figure: three panels, titled "Large Sample Power for Rsquare=0, Demeaned and Detrended", "Large Sample Power for Rsquare=0.3, Demeaned and Detrended" and "Large Sample Power for Rsquare=0.5, Demeaned and Detrended". Horizontal axis: c; vertical axis: power, from 0 to 1. Series: adf, rhohat, msb, adf_gls, rhohat_gls, msb_gls.]

Figure 4: Asymptotic Power for ADF t-test for different values of Rsquare.

[Figure: two panels, both titled "Rejection rates for ADF test for different values of Rsquare", one for the Demeaned case and one for the Demeaned and Detrended case. Vertical axis: rejection rate, from 0 to 1. Series: Rsquare = 0, 0.1, 0.3, 0.5, 0.7, 0.9.]

Figure 5: Comparison with system tests, mean only case.

[Figure: three panels, titled "Large Sample Power for Rsquare=0, Demeaned", "Large Sample Power for Rsquare=0.3, Demeaned" and "Large Sample Power for Rsquare=0.5, Demeaned". Horizontal axis: c; vertical axis: power, from 0 to 1. Series: adf, adf_gls, ecr.]

Table 1: Small sample size-adjusted power, no constant, no serial correlation, T=100

                     -c:   0       5       10      15      20
R²    Test            ρ:   1       0.95    0.90    0.85    0.80

Case (ii)
0     ADF                  0.050   0.095   0.224   0.452   0.710
      α̂_T                  0.050   0.110   0.208   0.446   0.709
      MSB                  0.051   0.116   0.285   0.563   0.812
      ADF-GLS              0.050   0.139   0.356   0.638   0.849
      α̂_T-GLS              0.050   0.138   0.351   0.632   0.847
      MSB-GLS              0.050   0.132   0.335   0.611   0.831
0.3   ADF                  0.050   0.078   0.171   0.374   0.629
      α̂_T                  0.050   0.088   0.208   0.446   0.709
      MSB                  0.051   0.088   0.222   0.469   0.737
      ADF-GLS              0.050   0.119   0.296   0.564   0.802
      α̂_T-GLS              0.050   0.119   0.293   0.562   0.800
      MSB-GLS              0.050   0.116   0.278   0.541   0.782
0.5   ADF                  0.050   0.066   0.142   0.316   0.568
      α̂_T                  0.050   0.071   0.170   0.376   0.651
      MSB                  0.051   0.072   0.180   0.400   0.681
      ADF-GLS              0.050   0.098   0.254   0.510   0.762
      α̂_T-GLS              0.050   0.099   0.253   0.507   0.761
      MSB-GLS              0.050   0.096   0.240   0.488   0.741

Case (iii)
0     ADF                  0.050   0.076   0.157   0.351   0.590
      α̂_T                  0.050   0.079   0.174   0.362   0.595
      MSB                  0.050   0.082   0.178   0.365   0.609
      ADF-GLS              0.050   0.094   0.221   0.433   0.683
      α̂_T-GLS              0.050   0.093   0.219   0.427   0.679
      MSB-GLS              0.050   0.092   0.216   0.424   0.673
0.3   ADF                  0.050   0.063   0.115   0.228   0.414
      α̂_T                  0.050   0.062   0.129   0.255   0.461
      MSB                  0.050   0.063   0.131   0.262   0.484
      ADF-GLS              0.050   0.079   0.167   0.338   0.581
      α̂_T-GLS              0.050   0.079   0.164   0.332   0.575
      MSB-GLS              0.050   0.064   0.163   0.329   0.572
0.5   ADF                  0.050   0.052   0.084   0.169   0.331
      α̂_T                  0.050   0.050   0.091   0.188   0.370
      MSB                  0.050   0.051   0.092   0.193   0.384
      ADF-GLS              0.050   0.063   0.121   0.259   0.482
      α̂_T-GLS              0.050   0.063   0.119   0.256   0.478
      MSB-GLS              0.050   0.064   0.119   0.254   0.474

Note: The power is computed with T=100 and 10,000 replications.

Table 2: Small sample size-adjusted power, no constant, no serial correlation, T=300

                     -c:   0       5       10      15      20
R²    Test            ρ:   1       0.98    0.97    0.95    0.93

Case (ii)
0     ADF                  0.050   0.090   0.214   0.432   0.693
      α̂_T                  0.050   0.106   0.259   0.512   0.775
      MSB                  0.051   0.119   0.285   0.555   0.806
      ADF-GLS              0.050   0.124   0.328   0.614   0.847
      α̂_T-GLS              0.050   0.123   0.323   0.606   0.843
      MSB-GLS              0.051   0.122   0.322   0.600   0.839
0.3   ADF                  0.050   0.081   0.171   0.363   0.607
      α̂_T                  0.050   0.091   0.206   0.432   0.693
      MSB                  0.051   0.102   0.226   0.469   0.734
      ADF-GLS              0.050   0.107   0.281   0.554   0.798
      α̂_T-GLS              0.050   0.106   0.277   0.552   0.795
      MSB-GLS              0.051   0.109   0.280   0.548   0.794
0.5   ADF                  0.050   0.065   0.140   0.305   0.541
      α̂_T                  0.050   0.076   0.169   0.363   0.628
      MSB                  0.051   0.084   0.186   0.402   0.670
      ADF-GLS              0.050   0.091   0.247   0.508   0.767
      α̂_T-GLS              0.050   0.090   0.243   0.506   0.765
      MSB-GLS              0.051   0.089   0.245   0.503   0.761

Case (iii)
0     ADF                  0.050   0.079   0.149   0.296   0.505
      α̂_T                  0.050   0.084   0.172   0.337   0.566
      MSB                  0.051   0.085   0.178   0.352   0.585
      ADF-GLS              0.050   0.091   0.203   0.408   0.652
      α̂_T-GLS              0.050   0.089   0.199   0.403   0.649
      MSB-GLS              0.050   0.091   0.201   0.407   0.650
0.3   ADF                  0.050   0.066   0.114   0.220   0.396
      α̂_T                  0.050   0.069   0.130   0.251   0.446
      MSB                  0.051   0.071   0.133   0.263   0.463
      ADF-GLS              0.050   0.077   0.158   0.320   0.548
      α̂_T-GLS              0.050   0.075   0.156   0.313   0.543
      MSB-GLS              0.050   0.077   0.160   0.319   0.546
0.5   ADF                  0.050   0.056   0.084   0.163   0.305
      α̂_T                  0.050   0.055   0.095   0.185   0.351
      MSB                  0.051   0.055   0.099   0.194   0.363
      ADF-GLS              0.050   0.062   0.116   0.247   0.462
      α̂_T-GLS              0.050   0.060   0.115   0.244   0.457
      MSB-GLS              0.050   0.062   0.118   0.247   0.464

Note: The power is computed with T=300 and 10,000 replications.
