Tests for a Mean Shift with Good Size and Monotonic Power Mohitosh Kejriwaly Purdue University July 3, 2007 Abstract Commonly used LM tests for a shift in the mean of a dynamic time series su¤er from the important problem of non-monotonic power in …nite samples. Wald tests, on the other hand, can be seriously oversized with persistent errors. This paper proposes a new hybrid estimator of the long-run variance which is based on residuals computed under both the null and alternative hypotheses. It is shown that modi…ed Wald tests based on this estimator can bypass the non-monotonic power problem while maintaining adequate size properties. The proposed tests may be expected to su¤er some power loss compared to the Wald tests in the case of mildly persistent errors, where the latter are adequately sized. Our simulations show, however, that the power loss is not signi…cant. An empirical illustration is provided. 1 Introduction Testing for a shift in the mean of a series is an important aspect of the analysis of time series data. Bai (1994) considers a mean shift at an unknown date in a linear process where the unknown shift date is estimated by least squares. He establishes the consistency, rate of convergence and the asymptotic distribution of the estimated change point assuming that the magnitude of the shift is small.1 With respect to testing, Andrews (1993) derives the limit distribution of the Sup-type tests, which is based on the maximal value of some statistic over possible break dates within a pre-speci…ed set that excludes some proportion of the data I am indebted to Pierre Perron for continuous guidance and support. Any remaining errors are my own responsibility. y Krannert School of Management, Purdue University, 403 W.State Street, West Lafayette IN 47907 (mkejriwa@purdue.edu). 1 Bai (1997) generalizes these results to regression models involving stationary and/or trending regressors. 1 near the beginning or the end of the sample. Andrews and Ploberger (1994) develop tests in a local asymptotic framework that are optimal in the sense that they maximize a weighted average power criterion. More recently, Kim and Perron (2006) compare the asymptotic relative e¢ ciency of structural break tests using a framework based on the approximate Bahadur slope. Vogelsang (1999) discusses sources of non-monotonic power for a wide variety of tests for a shift in the mean of a dynamic time series.2 With serially correlated errors, the problem lies in the behavior of the estimate of the long-run variance under the alternative hypothesis of a shift in mean. In particular, if the error variance is estimated under the null (the LM tests), non-monotonic power can result. The extent of such non-monotonicity can be quite serious and power can even go to zero as the magnitude of the shift increases. Crainiceanu and Vogelsang (2007) show that data dependent bandwidths used in the estimation of the long-run variance (based on the misspeci…ed null model) are too large under the alternative hypothesis. This in turn leads to a decrease in power as the magnitude of change increases. A possible solution to this problem of non-monotonic power is to estimate the long-run variance under the alternative hypothesis (the Wald tests). However, as documented by Vogelsang (1999), this leads to size distortions in excess of 20% when the errors follow a …rst order autoregression with autoregressive coe¢ cient 0.7. Andreou (2007) proposes a nearstationarity condition which can solve the non-monotone power problem of the CUSUM test.3 This paper proposes a new hybrid estimator of the long-run variance which is based on residuals computed under both the null and alternative hypotheses. In particular, the estimate of the variance and bandwidth are based on the residuals calculated under the alternative hypothesis while the covariances are estimated under the null hypothesis. We show both theoretically and through Monte-Carlo experiments that modi…ed Wald tests based on this estimator can not only bypass the problem of non-monotonicity and maintain power comparable to the usual Wald tests but also retain adequate size properties. Our analysis also suggests that the source of size distortions associated with the usual Wald tests is the estimation of the covariances under the alternative hypothesis. The rest of the paper is organized as follows. Section 2 presents the model and derives 2 Perron (1991) suggests that this non-monotonic feature is likely to be common to most tests for structural change when lagged dependent variables are included as regressors. 3 Her results, however, are very sensitive to the value of a parameter which represents deviations of the autoregressive parameter from unity. Moreover, the results are primarily based on Monte-Carlo experiments with little theoretical justi…cation. 2 the main result of the paper. Section 3 o¤ers Monte-Carlo evidence to assess the adequacy of the proposed test in terms of …nite sample size and power. Section 4 presents an application of the proposed tests to the U.S. in‡ation rate. Section 5 concludes. 2 The Main Result We consider the simple mean shift model + DUtc + ut yt = c where DUtc = I(t > Tbc ); Tbc = [T (1) ]. We make the following assumption on the errors: Assumption A1: The errors fut g satisfy an invariance principle: T 1=2 [T r] X t=1 ut ) W (r) 2 where W (:) is a standard Brownian motion, 1 = limT !1 T P E( Tt=1 ut )2 and ) denotes weak convergence under the Skorohod topology.4 Let SSR0 denote the sum of squared residuals under the null hypothesis of stability and SSR ( ) denote the sum of squared residuals obtained using the break date Tb (= [T ]). Let ^ denote the break fraction which minimizes the sum of squared residuals, that is, ^ = arg min arbitrarily small positive number ; 2 =f : ; g. For a given break fraction 1 , the LM and Wald test statistics for a mean shift are then given by LM ( ) = e 2 SSR0 SSR( ) 2 e T X = T 1 u e2t + 2T t=1 SSR0 1 T 1 X j=1 SSR( ) ^ ( ) T X u^2t ( ) + 2T ^2( ) = T 1 W ald( ) = where u et u^t ( ) = yt = yt T (T Tb ) 1 PT w(j; m) e T X t=j+1 u et u et j 2 1 t=1 4 SSR( ); where, for some 2 t=1 yt (t P T 1 t=Tb +1 yt T 1 X w(j; m( ^ )) j=1 = 1; ::; T ); u^t ( ) = yt T X u^t ( )^ ut j ( ) t=j+1 Tb 1 PTb t=1 yt if t [T ] and otherwise. m e is the bandwidth estimated using residuals Such an invariance principle can be shown to hold under quite general conditions on the errors. See, for example, Qu and Perron (2007). 3 under the null hypothesis and m( ^ ) is the bandwidth estimated using residuals that are obtained assuming a break date Tb = [T ]. For example, when Andrews’ (1991) data dependent method is used, we have m e = c(e T )1=# where # depends on the kernel (e.g., # = 3 for the Bartlett kernel and # = 5 for the Quadratic Spectral) and e is, say, based on an AR(1) approximation: e = 4e2 =(1 e2 )2 with e the OLS estimate from a regression of u et on u et 1 . Similarly, m( ^ ) is based on ^ ( ) which is computed using residuals u^t ( ). We make the following assumption on the kernel function: Assumption A2: We assume that the kernel function w(j; m) satis…es the regularity P conditions stated in Andrews (1991), in particular the fact that Tj=11 jw(j; mj = O(m). 2 We propose the following hybrid estimator of ^ 2M =T 1 T X u^t ( ^ )2 + 2T 1 t=1 T 1 X : w(j; m( ^ ^ )) j=1 T X t=j+1 and the corresponding test statistic: W ( )= SSR0 u et u et j SSR( ) (2) (3) ^ 2M We consider the following three functionals: Z M ean-J = J( ) 2 Z 1 Exp-J = log exp J( ) 2 2 Sup-J = sup J( ) 2 where J = LM; W ald; W . In the following Proposition, we will show that tests based on the hybrid estimator (2) has power that is monotonic with respect to the magnitude of the break j j. Proposition 1 Assume that yt (t = 1; :::; T ) is generated by (1); where limit of the test statistic (3) is given by T 1+1=# p W ( )! ( c )2 (1 ) 2 (Op ( )) 6= 0. Then the 2 (4) We also have the following approximation: T 1+1=# W ( ) ( c )2 (1 (T 4 1=# 2 2 ) + Op (1)) (5) Proof: It is easy to show that T 1 1 SSR0 = T T X u2t + c c (1 2 ) + op (1) t=1 We also have T 1 SSR( ) = T 1 T X c u2t +( c 2 ) + op (1) t=1 We thus get Let j ( c )2 (1 T 1 (SSR0 SSR( )) = P =plimT !1 T 1 Tt=j+1 ut ut j . We can show that T 1 T X t=j+1 We thus have T u et u et 1=# 2 ^M =T p j ! j 1=# 2 c + + 2T (1 c 1=# 2 ) 2 T 1 X lim (1 T !1 ) 2 + op (1) j=T ) uniformly in j w(j; m( ^ ^ ))(1 j=T ) + op (1) (6) j=1 P P We have m( ^ ^ ) = c(^ T )1=# ; ^ = 4^2 =(1 ^)4 ; ^ = T 1 Tt=2 u^t ( ^ )^ ut 1 ( ^ )=T 1 Tt=2 u^t 1 ( ^ )2 . p Using the consistency of ^ for c (see Bai, 1994), we can show that ^ ! 1 = 0 . We thus get ^ = Op (1) and m( ^ ^ ) = Op (T 1=# ). This gives T 1 X w(j; m( ^ ^ ))(1 j=T ) = Op (T 1=# ) (7) j=1 Hence, we have from (6); T 1=# 2 ^M = Op ( 2 ) and the limit in (4) follows. The approximation (5) is also immediate from (6) and (7). Equation (4) shows that the proposed tests diverge at rate Op (T 1 spectral kernel (# = 5), the rate of divergence is Op (T 4=5 1=# ). With the quadratic ) while it is Op (T 2=3 ) with the Bartlett kernel (# = 3). Hence, using the Bartlett kernel might lead to some power loss due to the reduced rate of divergence. While the LM tests diverge at the same rate as the proposed tests, the Wald tests diverge at a faster rate (Op (T )) and hence are asymptotically more e¢ cient. However, as we show in the next section, the Wald tests can be seriously oversized with strong persistence in the errors. The approximation (5) shows that as the magnitude of the break j j increases, the limit of W ( ) increases so that power is monotonic. 5 Remark 1 Deng and Perron (2007) show that e2 = Op ( 4=#+2 T 1=# ). This implies that for the LM tests the denominator has a higher order than the numerator thus inducing a nonmonotonic power function. As pointed out by Kim and Perron (2006), this feature of the LM tests can also be observed in the case of general regressors Xt whose mean increases with the sample size. 3 Monte-Carlo Evidence In this section, we present Monte Carlo evidence to assess the adequacy of the proposed tests in …nite samples as well as compare its performance to the LM and Wald tests. We consider the following error processes: ut = ut 1 ut = et + et ; u0 = 0 (AR(1) Errors) et 1 (MA(1) Errors) The innovations et are generated as i:i:d: N (0; 1) random variables. We consider 4 values of the AR(1) parameter and 2 values of the MA(1) parameter: = 0; 0:2; 0:5; 0:7; = 0:5; 0:7. The sample sizes considered are T = 120 and T = 240. The level of trimming used is = :15. The long-run variance estimators are constructed using a quadratic spectral kernel with an AR(1) approximation to the bandwidth.5 All experiments are based on 1000 replications. Table 1 presents a comparison of the size ( = 0) of the proposed tests with the LM and Wald tests. Consider …rst the case of AR(1) errors. For mildly persistent errors ( = 0; 0:2) and T = 120, the Wald and W are slightly oversized although doubling the sample size results in an exact size closer to the nominal size. With strong persistence in the errors ( = 0:5; 0:7), the Wald tests su¤er from substantial size distortions. While the size improves when the sample size is doubled, the distortions continue to persist. For example, with = 0:7 and T = 240, the size of the exp-Wald test is nearly 14%. On the other hand, the modi…ed tests perform quite well with the size never exceeding 10%. The Sup-W test performs the best among the W -tests with …nite sample size very close to nominal size. The LM tests also perform adequately in terms of size. With MA(1) errors, all the tests are quite conservative irrespective of the sample size and the extent of persistence. However, as we will see in the power simulations, the conservative nature of the tests do not seem to have 5 We also tried an ARMA(1,1) approximation for the bandwidth. The results showed substantial size distortions with MA(1) errors. 6 Table 1: Size of LM, Wald and modi…ed Wald Tests (Nominal Size = 5%) AR(1) Errors =0 = 0:2 T = 120 T = 240 T = 120 = 0:5 = 0:7 T = 240 T = 120 T = 240 T = 120 T = 240 M -LM .057 .047 .057 .057 .068 .056 .061 .059 E-LM .052 .041 .052 .052 .056 .050 .034 .047 S-LM .030 .035 .030 .041 .028 .035 .008 .021 M -W ald .075 .051 .075 .066 .126 .088 .171 .110 E-W ald .076 .054 .076 .074 .160 .106 .222 .135 S-W ald .066 .051 .066 .073 .140 .091 .203 .125 M -W .075 .053 .075 .063 .091 .074 .091 .073 E-W .068 .052 .068 .063 .090 .062 .084 .064 S-W .056 .045 .056 .052 .053 .046 .045 .036 MA(1) Errors = 0:5 = 0:7 T = 120 T = 240 T = 120 T = 240 M -LM .004 .009 .000 .000 E-LM .004 .007 .000 .000 S-LM .002 .005 .000 .000 M -W ald .007 .011 .000 .001 E-W ald .011 .007 .000 .000 S-W ald .009 .005 .001 .001 M -W .006 .010 .000 .000 E-W .008 .009 .000 .000 S-W .006 .006 .001 .000 an adverse e¤ect on power. Next, we present a power comparison of the LM, Wald and W -tests. We consider a 7 break in the middle of the sample - c = 0:5:6 The power functions of the LM and modi…ed tests with AR(1) errors are plotted in Figures 1 and 2.7 The …gures clearly illustrate that the LM tests su¤er from the problem of non-monotonicity irrespective of the sample size and the degree of persistence. Among the LM tests, the Mean-test seems to perform the best while the Sup-test performs most poorly (though the Sup-test is quite conservative). On the contrary, all the W -tests exhibit monotonic power (except for the case = 0 which shows slight non-monotonicity). The ranking among the W -tests is less evident - all tests seem to perform similarly except that the Mean-test performs slightly better than the Exp and Sup-tests when = 0:7 and T = 120. Figures 3 and 4 present a power comparison of the Wald tests and the proposed modi…ed tests. The power functions of the modi…ed tests are only slightly lower than those of the Wald tests. Thus, the proposed tests o¤er an improvement in size over the Wald tests with little loss in power. Figure 5 presents a power comparison of the LM and modi…ed Wald tests with MA(1) errors. In this case, the decline in power for the LM tests occurs at larger values of as compared to the AR(1) case. Comparing the power functions of the Wald and the modi…ed tests we found that as in the AR(1) case, the modi…ed tests su¤er very little power loss compared to the Wald tests. Overall, the proposed modi…cation of the Wald test clearly outperforms all the LM tests in terms of power while still maintaining adequate size properties. Since the Wald tests are based on long-run variance estimates constructed using residuals under the alternative hypothesis only, our proposed tests may be expected to su¤er some power loss; our experiments show, however, that the loss is not signi…cant. 4 Empirical Application This section presents an application of the proposed test to the U.S. in‡ation rate. There is a substantial debate in the macroeconomics literature regarding the possible nonstationary behavior of the in‡ation rate. The nonstationarity of in‡ation has important economic implications. Conventional sticky price models such as Dornbusch (1976) and Taylor (1979) imply stationarity of the price level. Moreover, the nonstationarity of in‡ation a¤ects the construction and evaluation of monetary policy rules. Hall (1984) and Taylor (1985), among many others, have advocated that monetary policy should target either the level or growth of national income. If in‡ation is nonstationary, then so is nominal income growth unless Power experiments were also conducted for c = 0:3; 0:7. The results were qualitatively similar to = 0:5 and hence not reported. 7 All power functions presented are size-unadjusted. 6 c 8 in‡ation and real income growth are cointegrated. Benati and Kapetanios (2003) …nds evidence of breaks in in‡ation dynamics for 23 in‡ation series from 18 countries. Rapach and Wohar (2005) present evidence that breaks in the mean in‡ation rate are correlated with those in the real interest rate for 13 industrialized countries. Levin and Piger (2004) apply both classical and Bayesian econometric methods to test the presence of a break in the mean of in‡ation after 1984 for 12 industrial countries. They show that allowing for a break in intercept, the in‡ation measures generally exhibit low persistence. We apply the proposed tests to analyze whether there was a mean shift in the U.S. in‡ation rate as a result of Paul Volcker taking up Fed Chairmanship in 1979 and consequent changes in the Federal Reserve’s operating procedures. We use monthly data on the seasonally adjusted CPI in‡ation rate obtained from the FRED website. The sample period used is 1970:1-1987:7. The start date corresponds to Arthur F. Burns taking up Fed Chairmanship while the end date corresponds to the end of the Volcker-era. Figure 6 presents a plot of the in‡ation rate over the sample period. The plot suggests a change in the mean sometime around the late ’70s/early 80’s. Table 2 presents results of the W - tests as well as the LM tests. Table 2: Empirical Results for the U.S. In‡ation Rate (1970:1-1987:7) Test Value 5% Critical Value M ean-LM 2.13 2.88 Exp-LM 1.78 2.06 Sup-LM 6.85 8.85 M ean-W 2.81 2.88 Exp-W 2.61 2.06 Sup-W 9.05 8.85 The empirical results show that the LM tests cannot reject the null hypothesis of a constant mean in the in‡ation rate over the sample period considered. In contrast, all the W tests except M ean-W reject the null of stability at the 5% level. The estimated break date (that which minimizes the sum of squared residuals) is 1982:6 which falls within the period of Volcker disin‡ation (1979:10-1982:10). To ensure the absence of a unit root, we estimated an AR(p) model on the residuals where p was selected using the Bayesian Information Criterion (BIC). The sum of the AR coe¢ cients was estimated at 0.66. The AR(p) models were also estimated separately using residuals in each regime - the AR coe¢ cients summed to 0.72 in 9 the …rst regime and 0.36 in the second regime. These estimates fall within the range of values of the persistence parameter used in the simulation experiments. which shows the bene…ts of using the W -tests over the LM and Wald tests. This simple analysis demonstrates that while the LM tests may lack power in detecting shifts in the mean of a time series, the W tests are easily able to detect such shifts and hence should be used in practical applications. 5 Conclusion This paper proposes a solution to the size-power trade-o¤ between the LM and Wald tests by proposing a new estimator of the long-run variance of the errors which is based on estimated residuals under both the null and alternative hypotheses. Our results show that structural break tests modi…ed with this estimator are able to bypass both the problem of non-monotonicity of the LM tests as well as the size distortions of the Wald tests. The proposed tests may be expected to su¤er some power loss compared to the Wald tests in the case of mildly persistent errors, where the latter are adequately sized. Our simulations show, however, that the power loss is not signi…cant. Hence, we expect that the proposed tests will provide a useful addition to the existing battery of tests for structural change. References [1] Andreou, E. (2007): “Restoring Monotone Power in the CUSUM Test,” Manuscript, Department of Economics, University of Cyprus. [2] Andrews, D.W.K. (1991): “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,”.Econometrica, 59, 817-858. [3] Andrews, D.W.K. (1993): “Tests for Parameter Instability and Structural Change with Unknown Change Point,”Econometrica, 61, 821-856. [4] Andrews, D.W.K. and Ploberger, W. (1994): “Optimal Tests when a Nuisance Parameter is Present only under the Alternative,”Econometrica, 62, 1383-1414. [5] Bai, J. (1994): “Least Squares Estimation of a Shift in Linear Processes,” Journal of Time Series Analysis, 15, 453-472. [6] Bai, J. (1997): “Estimation of a Change Point in Multiple Regression Models,”Review of Economics and Statistics, 79, 551-63. [7] Benati, L. and Kapetanios, G. (2003): “Structural Breaks in In‡ation Dynamics,”Computing in Economics and Finance, No. 169. 10 [8] Crainiceanu, C. and Vogelsang, T.J. (2007): “Non-monotonic Power for Tests of Mean Shift in a Time Series,”Journal of Statistical Computation and Simulation, forthcoming. [9] Deng, A. and Perron, P. (2007): “A Non-local Perspective on the Power Properties of the CUSUM and CUSUM of Squares Tests for Structural Change,” Journal of Econometrics, forthcoming. [10] Dornbusch, R. (1976): “Expectations and Exchange Rate Dynamics,”Journal of Political Economy, 84, 1161-1176. [11] Hall, R. (1984): “Monetary Strategy with an Elastic Price Standard,” Price Stability and Public Policy, Federal Research Bank of Kansas City, 137-159. [12] Kim, D. and Perron, P. (2006): “Assessing the Relative Power of Structural Break Tests Using a Framework Based on the Approximate Bahadur Slope,” Manuscript, Department of Economics, Boston University. [13] Levin, A.T. and Piger, J.M. (2004): “Is In‡ation Persistence Intrinsic in Industrial Economies?,”European Central Bank Working Paper No. 334. [14] Perron, P. (1991): “A Test for Changes in a Polynomial Trend Function for a Dynamic Time Series,”Manuscript, Department of Economics, Princeton University. [15] Qu, Z. and Perron, P. (2007): “Estimating and Testing Structural Changes in Multivariate Regressions,”Econometrica, 75, 459-502. [16] Rapach, D.E. and Wohar, M.E. (2005): “Regime Changes in International Interest Rates: Are They a Monetary Phenomenon,” Journal of Money, Credit and Banking, 37, 887-906. [17] Taylor, J. (1979): “Staggered Wage Setting in a Macro Model,” American Economic Review, 69, 108-113. [18] Taylor, J. (1985): “What would Nominal GNP Targeting do to the Business Cycle,” Carnegie-Rochester Conference Series on Public Policy, 22, 61-84. [19] Vogelsang, T.J. (1999): “Sources of Nonmonotonic Power when Testing for a Shift in a Dynamic Time Series,”Journal of Econometrics, 88, 283-299. 11 Figure 1: Power Functions with AR(1) Errors: A Comparison of LM and W* Tests Figure 2: Power Functions with AR(1) Errors: A Comparison of LM and W* Tests Figure 3: Power Functions with AR(1) Errors: A Comparison of Wald and W* Tests Figure 4: Power Functions with AR(1) Errors: A Comparison of Wald and W* Tests Figure 5: Power Functions with MA(1) Errors: A Comparison of LM and W* Tests Figure 6: The U.S. In‡ation Rate - 1970:1-1987:7