ARTICLE IN PRESS
Journal of Econometrics 143 (2008) 227–262
www.elsevier.com/locate/jeconom

Detections of changes in return by a wavelet smoother with conditional heteroscedastic volatility

Gongmeng Chen (a), Yoon K. Choi (b), Yong Zhou (c, d)

(a) Department of Economics, School of Economics and Management, Shanghai Jiaotong University, Shanghai 200030, PR China
(b) Department of Finance, College of Business Administration, University of Central Florida, Orlando, FL 32816, USA
(c) Academy of Mathematics and Systems Science, Chinese Academy of Science, Beijing 100080, PR China
(d) Department of Statistics, Shanghai University of Finance and Economics, Shanghai 200433, PR China

Available online 17 October 2007

Abstract

In this paper, we propose two estimators, an integral estimator and a discretized estimator, for the wavelet coefficient of regression functions in nonparametric regression models with heteroscedastic variance. These estimators can be used to test for jumps in the regression function. The model allows for lagged dependent variables and other mixing regressors. The asymptotic distributions of the statistics are established, and the asymptotic critical values are obtained analytically from the asymptotic distribution. We also use the test to determine consistent estimators for the locations of change points. The jump sizes and locations of change points can be consistently estimated using wavelet coefficients, and the convergence rates of these estimators are derived. We perform Monte Carlo simulations to check the powers and sizes of the test statistics. Finally, we give practical examples in finance and economics, detecting changes in stock returns and short-term interest rates using the empirical wavelet method.

© 2007 Elsevier B.V. All rights reserved.

JEL classification: C12; C52

Keywords: Nonparametric regression; Wavelet coefficient; Change points; Kernel estimation; Local polynomial smoother; Conditional heteroscedastic variance; α-Mixing

1.
Introduction

One of the intensively studied models in finance is the one-factor diffusion model:

$$dr_t = \mu(r_t)\,dt + \sigma(r_t)\,dw_t, \tag{1.1}$$

where $w_t = w(t)$ is a standard Wiener process. The functions $\mu(\cdot)$ and $\sigma(\cdot)$ are, respectively, the drift (or instantaneous mean regression return) and diffusion (or instantaneous variance, volatility, risk) functions of the process $r_t$ of interest. We call (1.1) a drift-plus-diffusion model. In asset pricing, $\mu(\cdot)$ is the expected rate of return and $\sigma(\cdot)$ the price volatility. With $r_t$ representing the short-term rate, model (1.1) incorporates many short-rate models, such as the CIR model (Cox et al., 1985), the AL model (Anderson and Lund, 1997), and Aït-Sahalia's model (Aït-Sahalia 1996a, b).

Footnote: This paper has been supported in part by a grant from the Hong Kong Polytechnic University. Zhou's research was supported in part by National Natural Science Foundation of China (NSFC) Grants 10471140, the National Basic Research Program of China (973 Program) Grant 2007CB814902.
Corresponding author. Tel.: +86 10 62651335; fax: +86 10 62541689.
E-mail addresses: afgmchen@inet.polyu.edu.hk (G. Chen), ychoi@bus.ucf.edu (Y.K. Choi), yzhou@mail.amss.ac.cn (Y. Zhou).
0304-4076/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2007.10.001

It is of general interest in finance to identify the drift $\mu(\cdot)$ and volatility (risk) $\sigma(\cdot)$. As Aït-Sahalia (1996a, b) pointed out, one of the most important features of (1.1) for derivative security pricing is the specification of the return $\mu(\cdot)$ and the volatility $\sigma(\cdot)$. However, the continuous-time model faces a severe limitation in practical applications, such as short-term hedging strategies or the pricing of many derivatives, when jumps exist in the underlying sample data. Amin (1993) recognizes the sensitivity of option prices to the presence of jumps.
Merton (1976) considered Poisson jumps superimposed on a geometric Brownian motion in describing stock prices. However, Merton's model assumes that returns are independent and identically distributed, which contradicts the evidence of conditional heteroscedasticity in returns. Other authors have also allowed for jumps in continuous-time models (see Ahn and Thompson, 1988; Bates, 1991; Das and Foresi, 1996; Duffie et al., 2000; Aït-Sahalia et al., 2001, among others). One common feature of these models is that they assume parametric distributions whose parameters need to be estimated.

Recently, Aït-Sahalia (1996b) proposed a test to examine whether any standard one-factor model, of the kind often used to fit short-rate processes and asset pricing models, is parametric or nonparametric. Using 22 years of daily U.S. short-rate data, Aït-Sahalia (1996b) strongly rejected well-known one-factor parametric diffusion models (such as Vasicek's (1977) model, the CIR model, and Duffie–Kan's (1993) model). The strong rejection using Aït-Sahalia's test suggests that nonparametric models are very powerful and robust for differentiating among short-rate models. Furthermore, Aït-Sahalia (2002) proposed a method to distinguish between diffusion and non-diffusion processes, based on a discrete subsample of the continuous-time path. Following Karlin and McGregor (1959), he derived a necessary and sufficient condition on the transition densities of diffusions at the sampling interval of the observed data. He also showed that the S&P 500 index is consistent with a continuous-time diffusion with jumps.

In this paper, we use wavelet methods to propose estimators of the regression function in nonparametric regression models, together with test statistics for its jumps, taking dependent observations into account.
Unlike traditional smoothing methods based on a fixed spatial scale (e.g., Fourier series methods or fixed-bandwidth kernel methods), the wavelet method is a multi-resolution approach and has local adaptivity. Recently, Delgado and Hidalgo (2000) proposed estimators of the locations and sizes of structural breaks in general regression models, based on the kernel method. However, the kernel estimator is sensitive to the bound of the estimated function. In addition, the wavelet method differs from methods based on residual errors, such as CUSUM tests (Cumulative Sum of Residuals tests; see Krämer et al., 1988), which have mostly been studied in the special case where the observations are assumed to be a sequence of independent and identically distributed (i.i.d.) random variables. Some authors (Tran, 1999; Kao and Ross, 1995) have shown that the CUSUM test is not robust to departures from independence, and Monte Carlo results suggest that the performance of the standard CUSUM test is quite disappointing (see Kao and Ross, 1995; Tran, 1999).

The wavelet coefficient method has many merits. First, the empirical wavelet coefficients $U^{(T)}_{J_n}(k)$ and $W^{(T)}_{J_n}(k)$, to be defined below, are much more sensitive than the CUSUM test to changes in the conditional mean function. Thus the empirical wavelet coefficient method is more likely to detect changes in the regression. Second, in constructing estimators of change points in the conditional mean function, only the conditional mean function needs to be estimated using the wavelet method. The CUSUM method, however, needs the estimated residual errors from model (2.1), which requires a complicated estimation of the conditional heteroscedastic variance (see Krämer et al., 1988; Tran, 1999).
Third, the wavelet method makes it convenient to study changes in the conditional mean in more general models with a heteroscedastic volatility function; furthermore, the method can be easily extended to multivariate models (Wang, 1998). Finally, we can use the tests to determine the locations and sizes of jump points of the regression function, even in a general model with conditional heteroscedastic variance. In addition, the location estimators of the jump points are shown to attain the minimax convergence rate, which is the optimal rate for the estimation of change points, even if the observations are not a sequence of i.i.d. random variables.

There have been several developments of wavelet applications to financial time series. Several authors proposed jump-point detection procedures to estimate jumps in signals observed with noise using the wavelet procedure (see Mallat and Wang, 1992; Wang, 1995; Raimondo, 1998; Antoniadis and Gijbels, 1997). In all the regressions in the aforementioned studies, the models have been formulated by means of additive i.i.d. Gaussian noise. Wang (1995) employed the wavelet method to detect jumps in a continuous-time model with a constant volatility, and applied it to stock market return data. Raimondo (1998) extended the assumption of Gaussian noise. Wang (1999) proposed a wavelet decomposition to detect and estimate change points of a function for noisy data, observed from a transformation of the function. Li and Xie (1999) and Wong et al. (1999) also studied the estimation of jumps in threshold regressions and autoregressions using the wavelet method.

Recently, Drost et al. (1998) considered a similar problem in estimating and testing continuous-time models with jumps and conditional heteroscedasticity. Interestingly, their tests share with the wavelet method the desirable property of being applicable at any frequency.
However, their tests are based on the variance and kurtosis of the process, which are estimated using the quasi-maximum likelihood method, assuming that the process follows a normal distribution. Furthermore, they do not estimate the locations of any jumps.

Most recently, Wong et al. (1999) have shown that the wavelet coefficient has significantly large absolute values near the jump points across fine levels, while having relatively small absolute values as soon as the location shifts away from the jump points. Hence, the wavelet coefficient exhibits high peaks near the jump points. However, they have not derived the exact or asymptotic distribution of the statistics, and thus there are no critical values for the empirical wavelet coefficient with which to determine the significance level for any jump point. Moreover, their method depends strongly on a lower bound for the jump sizes, which is unknown in practice. We obtain the asymptotic distributions of empirical wavelet coefficients and test for multiple changes, estimating their locations and sizes. The critical values of our test statistics are calculated analytically.

The remainder of this paper is organized as follows. In Section 2, we discuss the empirical wavelet coefficients and propose an integral estimator and a discretized estimator for the wavelet coefficient of the regression function to detect the change points. We also derive the asymptotic distributions of the tests under the null hypothesis, which are extreme-value distributions, and construct estimators of the locations and jump sizes of change points. We further establish the consistency of the estimators of the locations and jump sizes. We obtain the convergence rates of the location estimators, which are the best convergence rates in a nonparametric framework, even when the observations are not a sequence of i.i.d. random variables (that is, the optimal minimax rate in the sense of Raimondo, 1998).
In Section 3, we conduct simulation experiments to assess the finite-sample properties of the tests and calculate the sizes and powers of the tests for different sample sizes. In Section 4, we analyze structural changes in financial data using the proposed wavelet method. Some proofs of the main results in Section 2 are very lengthy and are not included here; only outlines are provided in Appendices A and B.

2. Wavelet method

2.1. Models and hypotheses

We consider a discrete version of (1.1) with conditional heteroscedastic variance:

$$Y_t = T(X_t) + \sigma(X_t)\,\varepsilon_t, \tag{2.1}$$

where $T(x) = E(Y \mid X = x)$ and $\sigma^2(x) = \mathrm{var}(Y \mid X = x)$. Here $\{(Y_t, X_t);\ t = 1, 2, \ldots\}$ is a sequence of random vectors satisfying some mixing-dependent conditions, and $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ is a sequence of i.i.d. random variables. The usual assumption of independence regarding $\{(Y_t, X_t);\ t = 1, 2, \ldots\}$ is relaxed to allow for dependent observations in a time series, which is very important because sequences of observations in economics and finance are often dependent and highly persistent.

When $x_t$ is a fixed time variable, we consider the following model:

$$Y_t = T(x_t) + \sigma(x_t)\,\varepsilon_t. \tag{2.2}$$

In this model, we assume that $\{\varepsilon_t\}$ is a sequence of random variables and $\{x_t;\ t = 1, 2, \ldots, n\}$ forms a sequence of fixed designs such that $x_t \in [a, b]$ and

$$\int_{x_t}^{x_{t+1}} f(x)\,dx = 1/n \quad \text{for all } n;\ t = 1, 2, \ldots, n,$$

with a known probability density function $f(x)$.

A function $g(x)$ is said to have an $\alpha$ $(0 \le \alpha \le 1)$ sharp cusp at $x_0$ if there exists a positive constant $C$ such that, for small enough $h$,

$$|g(x_0 + h) - g(x_0)| \ge C|h|^{\alpha}.$$

When $\alpha = 0$, $g(x)$ has a jump at $x_0$. We say that $g(x)$ is smooth in the sense of being continuous and differentiable. In this paper, we assume that $T(x)$ is smooth except at its points of discontinuity.
Our interest is to test the hypothesis that there is no discontinuous point in the regression function $T(x)$ against the alternative hypothesis that there exists at least one discontinuous point in $T(x)$, $x \in [a, b]$, for some constants $|a| < \infty$ and $|b| < \infty$ (see footnote 1); that is,

$$H_0: T(x) = T_0(x) \quad \longleftrightarrow \quad H_1: T(x_0-) \ne T(x_0+) \ \text{for at least one } x_0 \in [a, b],$$

where $T_0(x)$ is a smooth function on $[a, b]$. In fact, when $T(x)$ has $p$ discontinuous points in $[a, b]$, $T(x)$ can be rewritten as

$$T(x) = C(x) + D(x),$$

where $D(x) = \sum_{l=1}^{p} d_l I_{[\tau_l, b]}(x)$ with $a < \tau_1 < \tau_2 < \cdots < \tau_p < b$, and $C(x)$ is twice continuously differentiable on $(a, b)$. This implies that $T(x)$ is a smooth function except at finitely many jump points; that is, when $d_l = 0$ for all $l$, $T(x)$ is very smooth. Let $d_i = T(\tau_i+) - T(\tau_i-)$ denote the size of the jump of the function $T(x)$ at the point $\tau_i$. Here $p$, $d_l$ and $\tau_l$, $l = 1, 2, \ldots, p$, are all unknown constants to be estimated.

2.2. Estimation of the wavelet coefficient

Let $\mathcal{F}_1$ and $\mathcal{F}_2$ be two $\sigma$-algebras. The measure of dependence between $\mathcal{F}_1$ and $\mathcal{F}_2$ is defined as

$$\alpha(\mathcal{F}_1, \mathcal{F}_2) = \sup\{|P(A \cap B) - P(A)P(B)|;\ A \in \mathcal{F}_1,\ B \in \mathcal{F}_2\}.$$

Suppose that $\{U_t, t \ge 1\}$ is a sequence of real-valued random variables. Let $\mathcal{F}_a^b = \sigma(U_i,\ a \le i \le b)$ be the $\sigma$-algebra generated by the indicated random variables, and write

$$\alpha(n) = \sup_{t \ge 1} \alpha(\mathcal{F}_1^t, \mathcal{F}_{n+t}^{\infty}).$$

The sequence $\{U_t, t \ge 1\}$ is said to be $\alpha$-mixing (or strong mixing) if $\alpha(n) \to 0$ as $n \to \infty$.

For model (2.1), assume that $E(\varepsilon_t \mid \mathcal{F}_t) = 0$, $\mathrm{Var}(\varepsilon_t \mid \mathcal{F}_t) = \sigma_0^2$, and $E|\varepsilon_t|^R < \infty$ for some $R > 2$. Occasionally, we assume that $\{\varepsilon_t\}$ is independent of $\mathcal{F}_t$, where $\mathcal{F}_t = \mathcal{F}(X_t, X_{t-1}, \ldots)$ is the information set up to time $t$. Masry and Tjøstheim (1995) have shown that, under some conditions, the sequence generated by model (2.1) is geometrically ergodic. Therefore, it follows from Bradley (1986) that the sequence generated by model (2.1) is stationary and $\alpha$-mixing, with a mixing coefficient that decreases at an exponential rate. A sufficient condition is given in Appendix A (see Remark A.1).
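To make model (2.1) and the decomposition $T(x) = C(x) + D(x)$ concrete, the following minimal sketch simulates one possible data-generating process. Everything specific in it is an illustrative assumption rather than part of the paper: the smooth part $C(x) = \sin(2\pi x)$, the volatility $\sigma(x) = 0.2 + 0.3x$, the standard normal errors, and a regressor obtained by mapping a stationary Gaussian AR(1) series (a simple dependent, geometrically mixing sequence) into $(a, b)$ with a logistic transform. The function names `jump_part` and `simulate` are hypothetical.

```python
import numpy as np

def jump_part(x, taus, ds, b):
    """Jump component D(x) = sum_l d_l * 1{tau_l <= x <= b}."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for tau, d in zip(taus, ds):
        out += d * ((x >= tau) & (x <= b))
    return out

def simulate(n, taus, ds, a=0.0, b=1.0, rho=0.5, seed=0):
    """Draw (X_t, Y_t), t = 1..n, from Y_t = T(X_t) + sigma(X_t) * eps_t.

    Illustrative choices (assumptions, not taken from the paper):
    C(x) = sin(2*pi*x), sigma(x) = 0.2 + 0.3*x, eps_t ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    # Dependent regressor: stationary Gaussian AR(1) with coefficient rho
    z = np.empty(n)
    z[0] = rng.standard_normal()
    for t in range(1, n):
        z[t] = rho * z[t - 1] + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()
    # Logistic transform maps the real line into the open interval (a, b)
    x = a + (b - a) / (1.0 + np.exp(-z))
    smooth = np.sin(2.0 * np.pi * x)      # C(x), twice continuously differentiable
    sigma = 0.2 + 0.3 * x                 # heteroscedastic volatility sigma(x)
    eps = rng.standard_normal(n)          # i.i.d. errors
    y = smooth + jump_part(x, taus, ds, b) + sigma * eps
    return x, y
```

For instance, `simulate(2000, taus=[0.6], ds=[1.5])` would give a sample whose conditional mean has a single jump of size 1.5 at 0.6.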
Before discussing the wavelet transformation of regression (2.1), we need to introduce some notation. Assume that $\{(Y_t, X_t);\ 1 \le t \le n\}$ is a realization of $(Y, X)$. Let $I_n(x_0) = \{i : 1 \le i \le n \text{ and } |X_i - x_0| \le h_n\}$, let $N_n(x_0) = \# I_n(x_0)$ denote the number of sample points in $I_n(x_0)$, and let $D_n = \{0, 1, \ldots, 2^{J_n} - 1\}$, where $J_n$ is a sequence with $J_n \to \infty$ as $n \to \infty$. Let

$$I(s, \delta_n) = \left\{ k : \left| a + \frac{k}{2^{J_n}}(b - a) - s \right| \le \delta_n \right\},$$

where $\delta_n = 2^{-J_n}$.

Footnote 1: The support of $X$ may be infinite when $X$ is a random design, but we may always consider the finite interval $[a, b]$ as the support of $X$. Without loss of generality, assume $X \in [a_1, a_2]$ with density function $f(x)$, where $a_1 = -\infty$ and $a_2 = +\infty$. We can transform $X$ by replacing the original $X$ with $1/\{1 + \exp\{X\}\}$, which does not have any effect on our proofs below. Obviously, the random variable $1/\{1 + \exp\{X\}\}$ lies in $[0, 1]$.

Throughout this paper we take a wavelet $\psi(x)$, $x \in \mathbb{R}$, with the following properties (using the notation of Wong et al., 1999).

(I) $\psi(x)$ is of bounded variation on $[-A, A]$, has compact support on $[-A, A]$ with $A > 1$, and $\psi(x) = 0$ for $x \in [-1, 1]$.

(II) $\psi(x)$ is a twice continuously differentiable function on $[-A, A]$. The wavelet function $\psi(x)$ satisfies the integral conditions

$$\int_{-A}^{A} \psi(x)\,dx = 0, \qquad \int_{-A}^{A} x\psi(x)\,dx = 0$$

and

$$\int_{1}^{A} \psi(x)\,dx \ne 0, \qquad \int_{1}^{A} x\psi(x)\,dx \ne 0.$$

(III) Furthermore, the wavelet function $\psi(x)$ has the following properties:

$$0 < \int_{y}^{A} \psi(x)\,dx < \int_{1}^{A} \psi(x)\,dx \quad \text{for all } 1 < y < A,$$

and

$$0 < -\int_{-A}^{-y} \psi(x)\,dx < -\int_{-A}^{-1} \psi(x)\,dx \quad \text{for all } 1 < y < A.$$
From the wavelet $\psi(x)$ and any scale function $\phi(x)$, we can obtain the orthogonal wavelet basis on $L^2[a, b]$,

$$\{\phi^{\mathrm{per}}_{l,k}(x),\ k \in I_l;\ \psi^{\mathrm{per}}_{J_n,k}(x),\ k \in D_n,\ J_n \ge l\},$$

where

$$\phi^{\mathrm{per}}_{l,k}(x) = \sum_{n} \frac{1}{\sqrt{b - a}}\, \phi_{l,k}\!\left(\frac{x - a}{b - a} + n\right), \tag{2.3}$$

$$\psi^{\mathrm{per}}_{J_n,k}(x) = \sum_{n} \frac{1}{\sqrt{b - a}}\, \psi_{J_n,k}\!\left(\frac{x - a}{b - a} + n\right), \tag{2.4}$$

with $\phi_{J_n,k}(x) = 2^{J_n/2}\phi(2^{J_n}x - k)$, $\psi_{J_n,k}(x) = 2^{J_n/2}\psi(2^{J_n}x - k)$, and $D_n = \{0, 1, 2, \ldots, 2^{J_n} - 1\}$. As the orthogonality properties are not required in this paper, without loss of generality we take

$$\phi^{\mathrm{per}}_{l,k}(x) = \frac{1}{\sqrt{b - a}}\, \phi_{l,k}\!\left(\frac{x - a}{b - a}\right), \qquad \psi^{\mathrm{per}}_{l,k}(x) = \frac{1}{\sqrt{b - a}}\, \psi_{l,k}\!\left(\frac{x - a}{b - a}\right),$$

and $D_n = \{k : k = [2^{J_n}z],\ 0 < z < 1\}$.

Now, the wavelet coefficient of the conditional mean function of model (2.1) is defined as

$$\beta^{(T)}_{J_n,k} = \int_a^b T(x)\,\psi^{\mathrm{per}}_{J_n,k}(x)\,dx, \tag{2.5}$$

where $\psi^{\mathrm{per}}_{J_n,k}(x)$ is defined by (2.4). Wong et al. (1999) proposed a simple empirical estimator of $\beta^{(T)}_{J_n,k}$, namely

$$V^{(T)}_{J_n}(k) = \frac{b - a}{N} \sum_{i=1}^{N} \psi^{\mathrm{per}}_{J_n,k}(w_i)\, \frac{1}{n_i} \sum_{l \in I_n(w_i)} Y_l, \tag{2.6}$$

where $N \to \infty$, the $w_i$ are the points that divide the interval $[a, b]$ into $N + 1$ sub-intervals, that is, $w_i = a + i(b - a)/N$, and $n_i = \# I_n(w_i)$.

Let $K(\cdot)$ be a probability density function with bounded support $[-c, c]$ for some constant $c > 0$. When the conditional regression function $T(x)$ is a smooth function, we have the following estimator:

$$T_n(x) = \sum_{i=1}^{n} K_{n,h_n}(X_i - x)\,Y_i \Big/ \sum_{i=1}^{n} K_{n,h_n}(X_i - x).$$

We can obtain two consistent estimators, using kernel estimation (Nadaraya, 1964) and local linear smoothers (Fan and Gijbels, 1996). The kernel estimator is $T_n(x)$ with $K_{n,h_n}(X_i - x) = K_h(X_i - x)$, while the local linear smoother is $T_n(x)$ with

$$K_{n,h_n}(X_i - x) = K_h(X_i - x) \sum_{j=1}^{n} K_h(X_j - x)(X_j - x)^2 - K_h(X_i - x)(X_i - x) \sum_{j=1}^{n} K_h(X_j - x)(X_j - x), \tag{2.7}$$

in which $K_h(\cdot) = K(\cdot/h_n)$.
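The two smoothers differ only in the weights $K_{n,h_n}(X_i - x)$, which the following minimal sketch makes explicit. It assumes an Epanechnikov kernel on $[-1, 1]$ (any bounded-support density would do), and the function names `epanechnikov` and `smoother` are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """A probability density with bounded support [-1, 1] (illustrative choice of K)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def smoother(x0, X, Y, h, local_linear=False, kernel=epanechnikov):
    """Evaluate T_n(x) = sum_i w_i(x) Y_i / sum_i w_i(x) at the points x0.

    local_linear=False: kernel (Nadaraya) weights w_i = K_h(X_i - x).
    local_linear=True:  local linear weights as in (2.7).
    Returns NaN where the weights sum to zero (no data near x).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    diff = X[None, :] - x0[:, None]          # entry (m, i) is X_i - x_m
    Kh = kernel(diff / h)                    # K_h(X_i - x) = K((X_i - x) / h_n)
    if local_linear:
        s1 = (Kh * diff).sum(axis=1, keepdims=True)        # sum_j K_h (X_j - x)
        s2 = (Kh * diff ** 2).sum(axis=1, keepdims=True)   # sum_j K_h (X_j - x)^2
        w = Kh * s2 - Kh * diff * s1
    else:
        w = Kh
    num = (w * Y[None, :]).sum(axis=1)
    den = w.sum(axis=1)
    safe = np.where(den != 0, den, 1.0)      # avoid dividing by zero
    return np.where(den != 0, num / safe, np.nan)
```

One property worth noting: with the weights (2.7), $\sum_i K_{n,h_n}(X_i - x)(X_i - x) = 0$, so the local linear smoother reproduces a linear regression function exactly, whereas the kernel estimator generally does not near the boundary of the design.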
In many applications, one often assumes that $K(\cdot)$ is a symmetric probability density with finite support $[-c, c]$, and that $\{h_n;\ n = 1, 2, \ldots\}$ is a sequence of bandwidths with $h_n \to 0$ and $nh_n \to \infty$. For simplicity, we only consider the kernel estimator.

When $T(x)$ has no jump, the bias between $T_n(x)$ and $T(x)$ converges to zero. Conversely, when the conditional mean function $T(x)$ has at least one jump, $T_n(x)$ is not a consistent estimator of $T(x)$ in the neighborhood of the change points. The wavelet transformation of $T(x)$ magnifies this bias for a suitable wavelet $\psi(x)$ and a fine scale $J_n$, provided that the wavelet has an appropriate number of vanishing moments, so that the wavelet coefficients of $T(x)$ have larger values than other disturbances. The reason is that a jump of $T(x)$ at a point $k_0$ makes the wavelet coefficient of $T(x)$ near $k_0$ large for a suitably fine scale $J_n$. Some authors (Wang, 1995; Daubechies, 1992, p. 300) have shown similar consequences. Therefore, the integral estimator of the theoretical wavelet coefficient is a good statistic for testing whether there are jumps in $T(x)$.

In fact, we may obtain two more general empirical estimators of $\beta^{(T)}_{J_n,k}$ based on $T_n(x)$. The first is an integral estimator of the theoretical wavelet coefficient $\beta^{(T)}_{J_n,k}$. The idea behind this estimator is simple and intuitive. It is defined by

$$U^{(T)}_{J_n}(k) = \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{i=1}^{n} K_{n,h_n}(X_i - x)\,Y_i}{\sum_{i=1}^{n} K_{n,h_n}(X_i - x)}\,dx. \tag{2.8}$$

One often prefers the discretized estimator to the integral estimator for computational reasons. The discretized estimator is defined by

$$W^{(T)}_{J_n}(k) = \frac{b - a}{N} \sum_{i=1}^{N} \psi^{\mathrm{per}}_{J_n,k}(w_i)\, \frac{\sum_{j=1}^{n} K_{n,h_n}(X_j - w_i)\,Y_j}{\sum_{j=1}^{n} K_{n,h_n}(X_j - w_i)}, \tag{2.9}$$

where $N$ and $w_i$ are the same as those in (2.6).
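As a rough illustration of how the discretized estimator (2.9) can be computed, the sketch below evaluates $W^{(T)}_{J_n}(k)$ on simulated data with one jump. Several ingredients are assumptions made only for this example. The function `psi_demo` is a hypothetical stand-in wavelet: it vanishes on $[-1, 1]$, is supported on $1 < |v| < A$ and integrates to zero, but it is odd and therefore does not satisfy $\int x\psi(x)\,dx = 0$ in (II), so part of the smooth trend leaks into the coefficients. The kernel is Epanechnikov, and $k$ is restricted to values whose wavelet support lies inside $[a, b]$ to avoid boundary effects. The data-generating process and all tuning constants are also illustrative.

```python
import numpy as np

A = 3.0  # assumed half-width of the wavelet support

def psi_demo(v):
    """Stand-in wavelet: odd, zero on [-1, 1], supported on 1 < |v| < A."""
    v = np.asarray(v, dtype=float)
    r = np.abs(v)
    bump = np.sin(np.pi * (r - 1.0) / (A - 1.0)) ** 2
    return np.where((r > 1.0) & (r < A), np.sign(v) * bump, 0.0)

def nw_smooth(grid, X, Y, h):
    """Kernel estimate T_n(w_i) on a grid (Epanechnikov kernel, bandwidth h)."""
    u = (X[None, :] - grid[:, None]) / h
    K = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    num, den = K @ Y, K.sum(axis=1)
    # Where no observation falls in the window, set the estimate to 0
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def discretized_coeffs(X, Y, J, h, N, a=0.0, b=1.0, psi=psi_demo):
    """W(k) = (b-a)/N * sum_i psi_per_{J,k}(w_i) T_n(w_i), following (2.9)."""
    w = a + (b - a) * np.arange(1, N + 1) / N        # w_i = a + i (b - a) / N
    Tn = nw_smooth(w, X, Y, h)
    m = int(np.ceil(A))
    ks = np.arange(m, 2 ** J - m + 1)                # interior k only (assumption)
    u = (w - a) / (b - a)
    # psi_per_{J,k}(w_i) = (b-a)^{-1/2} 2^{J/2} psi(2^J u_i - k)
    psi_per = 2 ** (J / 2) / np.sqrt(b - a) * psi(2 ** J * u[None, :] - ks[:, None])
    return ks, (b - a) / N * (psi_per @ Tn)

# Illustrative data: smooth trend plus one jump of size 1.5 at x = 0.6
rng = np.random.default_rng(1)
n, J = 4000, 6
X = rng.uniform(0.0, 1.0, n)
Y = np.sin(2 * np.pi * X) + 1.5 * (X >= 0.6) + (0.2 + 0.2 * X) * rng.standard_normal(n)

ks, W = discretized_coeffs(X, Y, J=J, h=0.01, N=1024)
k_hat = int(ks[np.argmax(np.abs(W))])
tau_hat = k_hat / 2 ** J          # a + k_hat (b - a) / 2^J with a = 0, b = 1
print(k_hat, tau_hat)
```

Since $\psi$ vanishes on $[-1, 1]$, the coefficient is essentially a rescaled difference between a weighted average of $T_n$ to the right of $a + k(b-a)/2^{J_n}$ and one to its left, which is why $|W^{(T)}_{J_n}(k)|$ peaks for $k$ near the jump.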
The simple empirical estimator (2.6) can be obtained from (2.9) by taking the kernel function

$$K(x) = \begin{cases} \tfrac{1}{2} & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1. \end{cases}$$

The simple empirical estimator (2.6) has some drawbacks, because the bandwidth $h_n$ selected in this estimator cannot reach the optimal value, that is, $h_{\mathrm{opt}} = Cn^{-1/5}$ for some constant $C$. As a consequence, estimator (2.6) has a larger mean integrated squared error (MISE) than the estimators (2.8) and (2.9), which can use the optimal bandwidth, and the resulting test accepts the null hypothesis too often.

Without loss of generality, in the following sections we only discuss the wavelet coefficient when $X$ is a random design. However, the results still hold, with small modifications, when $X$ is a fixed design.

2.3. Asymptotic results of test statistics

Let $W(x)$ be the standard Wiener process with $EW(t) = 0$ and $E(W(s)W(t)) = s \wedge t = \min(s, t)$, taking values in the space $D[0, 1]$ of all real-valued functions on the interval $[0, 1]$ that are right continuous and have left limits. We endow the space $D[0, 1]$ with the Skorohod metric (e.g., see Billingsley, 1968). Let $\{B(t),\ 0 \le t \le 1\}$ denote a standard Brownian bridge, $B(t) = W(t) - tW(1)$, with $EB(t) = 0$ and $E(B(s)B(t)) = t \wedge s - st$. To express our results, we define the following elements of $D[0, 1]$:

$$Y_{n0}(x) = \delta_n^{-1/2} \int_0^1 \psi\!\left(\frac{x - y}{\delta_n}\right) dW(y),$$

$$M_n(x) = \frac{f^{1/2}(a + x(b - a))\, U^{(T)}_{J_n}([2^{J_n}x])}{\sigma(a + x(b - a))} \quad \text{for } 0 \le x \le 1,$$

$$\widetilde{M}_n(x) = \frac{f^{1/2}(a + x(b - a))\, W^{(T)}_{J_n}([2^{J_n}x])}{\sigma(a + x(b - a))} \quad \text{for } 0 \le x \le 1,$$

where $\delta_n = 2^{-J_n}$ and $[\cdot]$ denotes the greatest integer less than or equal to its argument.

To derive the asymptotic distribution of the empirical wavelet coefficients, some assumptions about $J_n$, $h_n$ and $N$ are required. These assumptions mean that the three sequences $J_n$, $h_n$ and $N$ must satisfy certain rate conditions. But these assumptions are weak.
In particular, the rate conditions on these sequences are satisfied in many applications.

Assumption J(a). $\lim_{n \to \infty} 2^{2J_n}(\log n)^3/n = 0$, $\lim_{n \to \infty} (2^{5J_n}/n) = \infty$, and $\lim_{n \to \infty} 2^{J_n}h_n^2 \log n = 0$.

Assumption J(b). $\lim_{n \to \infty} n2^{J_n}/(Nh_n)^2 = 0$.

The following results play a role in detecting the jump points of the regression function.

Theorem 2.1. (a) Assume that conditions (A.1)–(A.5) in Appendix A and Assumption J(a) are satisfied. Then under the null hypothesis $H_0$ (that is, when there is no change in the regression function $T(x)$),

$$\left(\frac{n}{\sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |M_n(x)| \quad \text{and} \quad \sup_{0 \le x \le 1} |Y_{n0}(x)|$$

have the same asymptotic distribution.

(b) Assume that conditions (A.1)–(A.5) in Appendix A and Assumptions J(a)–J(b) are satisfied. Then under the null hypothesis,

$$\left(\frac{n}{\sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |\widetilde{M}_n(x)| \quad \text{and} \quad \sup_{0 \le x \le 1} |Y_{n0}(x)|$$

have the same asymptotic distribution, where $\widetilde{M}_n(x)$ is the version of $M_n(x)$ built from the discretized coefficient $W^{(T)}_{J_n}$ in place of $U^{(T)}_{J_n}$.

The following corollary gives approximate critical values for the tests under the null hypothesis that there is no jump point in the regression function $T(x)$.

Corollary 2.1. Assume that the conditions of Theorem 2.1(a) are satisfied. Then under the null hypothesis, we have

$$P\left\{ A(\delta_n) \left[ \left(\frac{n}{\kappa_2 \sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |M_n(x)| - a(\delta_n) \right] < z \right\} \to \exp(-2\exp(-z)).$$

Suppose that the conditions of Theorem 2.1(b) are satisfied. Then under the null hypothesis, we have

$$P\left\{ A(\delta_n) \left[ \left(\frac{n}{\kappa_2 \sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |\widetilde{M}_n(x)| - a(\delta_n) \right] < z \right\} \to \exp(-2\exp(-z)),$$

where $A(x) = |2\log x|^{1/2}$ and

$$a(x) = |2\log x|^{1/2} + |2\log x|^{-1/2} \log\left( \frac{\kappa_1^{1/2}}{2\pi \kappa_2^{1/2}} \right),$$

with $\kappa_1 = \int_{-A}^{A} (\psi'(x))^2\,dx$ and $\kappa_2 = \int_{-A}^{A} \psi^2(x)\,dx$.

It is of interest to know whether the test statistics are consistent under the alternative hypothesis $H_1$ that there are changes in the regression function. Hence, we further study the asymptotic properties of the statistics $U^{(T)}_{J_n,k}$ and $W^{(T)}_{J_n,k}$ under the alternative hypothesis in the following theorem.
From this theorem, we obtain the estimators of the jump sizes and locations of change points.

Theorem 2.2. Assume that conditions (A.1)–(A.5) of Appendix A are satisfied. Let $\tau_l$, $l = 1, 2, \ldots, p$, be the $p$ jump points of $T(x)$, and denote the corresponding jump sizes by $d_l$, $l = 1, 2, \ldots, p$.

(a) If Assumption J(a) is satisfied, then for $k \in I(\tau_l, 2^{-J_n}(b - a))$ we obtain

$$U^{(T)}_{J_n}(k) = 2^{-J_n/2}(b - a)^{1/2}\, d_l \int_1^A \psi(x)\,dx + O_P(n^{-1/2}), \tag{2.10}$$

where $a_n = O_P(b_n)$ denotes $\lim_{n \to \infty} a_n/b_n = C$ in probability for some constant $C$, and for $k \notin \bigcup_{l=1}^{p} I(\tau_l, 2^{-J_n/2}(b - a))$ we have

$$U^{(T)}_{J_n}(k) = O_P(n^{-1/2}). \tag{2.11}$$

(b) If Assumptions J(a) and J(b) are satisfied, then (2.10) and (2.11) hold for the discretized estimator $W^{(T)}_{J_n}(k)$ of the empirical wavelet coefficient.

From the theorems above we can show that our tests for jumps of the regression function are consistent. Hence, we derive the following important corollary.

Corollary 2.2. Suppose that the assumptions of Theorem 2.2 are satisfied. Under the alternative hypothesis $H_1$, the tests based on $n^{1/2}U^{(T)}_{J_n}(k)$ and $n^{1/2}W^{(T)}_{J_n}(k)$ are consistent; that is, $n^{1/2}\max_{k \in D_n} |U^{(T)}_{J_n}(k)| \to \infty$ and $n^{1/2}\max_{k \in D_n} |W^{(T)}_{J_n}(k)| \to \infty$ in probability as $n \to \infty$.

2.4. Estimation of jump size and change points

When we assume that $T(x)$ has only one change point $\tau_l$ in $[a, b]$, we propose the following estimators for the jump size $d_l$ at the change point:

$$\hat{d}^{(i)}_l = \frac{2^{J_n/2} \max_{k \in D_n} |U^{(T)}_{J_n}(k)|}{(b - a)^{1/2} \int_1^A \psi(x)\,dx}$$

or

$$\hat{d}^{(d)}_l = \frac{2^{J_n/2} \max_{k \in D_n} |W^{(T)}_{J_n}(k)|}{(b - a)^{1/2} \int_1^A \psi(x)\,dx}.$$

We can show that the asymptotic distributions of the estimators $\hat{d}^{(i)}_l$ and $\hat{d}^{(d)}_l$ are extreme-value distributions; that is, they have asymptotic distributions similar to those of Corollary 2.1. It is easy to prove the consistency of the two estimators $\hat{d}^{(i)}_l$ and $\hat{d}^{(d)}_l$.

Assumption J(c). $\lim_{n \to \infty} (2^{J_n}/n) = 0$; $\lim_{n \to \infty} (2^{5J_n}/n) = \infty$.

Theorem 2.3.
Assume that conditions (A.1)–(A.5) of Appendix A and Assumption J(c) are satisfied. Then under $H_1$,

$$|\hat{d}^{(i)}_l - d_l| = O_P((2^{-J_n}n)^{-1/2}).$$

Furthermore, assume that J(b) is satisfied. Then under $H_1$,

$$|\hat{d}^{(d)}_l - d_l| = O_P((2^{-J_n}n)^{-1/2}).$$

It should be noted that some quantities entering $M_n(x)$ and its discretized counterpart in Corollary 2.1, such as $f(\tau_l)$, $\sigma(\tau_l)$ and $\tau_l$, are unknown. Hence, we need to estimate them in order to use the results of Theorem 2.1. In the next section, we discuss the estimation of $f(\cdot)$ and $\sigma(\cdot)$ by nonparametric kernel methods and local polynomial smoothers. Here we only discuss two estimators of the change point $\tau_l$.

For the sake of simplicity, assume that $T(x)$ has only one change point $\tau_l$ in $[a, b]$. The results of Theorem 2.2 suggest a natural estimator for the change point $\tau_l$. Let

$$\hat{\tau}^{(i)}_l = a + \hat{k}^{(i)}_l (b - a)/2^{J_n}, \quad \text{where} \quad \hat{k}^{(i)}_l = \arg\max\{|U^{(T)}_{J_n}(k)|;\ k \in D_n\}.$$

Similarly, we can define another estimator for the change point $\tau_l$. Let

$$\hat{\tau}^{(d)}_l = a + \hat{k}^{(d)}_l (b - a)/2^{J_n}, \quad \text{where} \quad \hat{k}^{(d)}_l = \arg\max\{|W^{(T)}_{J_n}(k)|;\ k \in D_n\}.$$

It can be shown that $\hat{\tau}^{(i)}_l$ and $\hat{\tau}^{(d)}_l$ are consistent estimators of the change point. Furthermore, we can obtain the convergence rates of $\hat{\tau}^{(i)}_l$ and $\hat{\tau}^{(d)}_l$, which are the optimal minimax rates.

Theorem 2.4. Suppose that conditions (A.1)–(A.5) in Appendix A and Assumption J(c) are satisfied. Then

$$\hat{\tau}^{(i)}_l - \tau_l = O_P(2^{-J_n}).$$

In addition, if Assumption J(c) is satisfied, we have

$$\hat{\tau}^{(d)}_l - \tau_l = O_P(2^{-J_n}).$$

Remark 2.1. It should be noted that $J_n$ is not required to satisfy Assumption J(a) in Theorem 2.4, although that assumption is necessary for Theorems 2.1 and 2.2. In particular, based on the integral estimator of the empirical wavelet coefficient, $2^{-J_n}$ can be taken as $O_P(n^{-1}(\log n)^{\eta})$ for any $\eta > 0$. For the discretized case, although $N$ and $J_n$ are required to satisfy Assumption J(c), $N$ should be chosen large enough that $n/(Nh_n) \to \infty$ when $2^{J_n} = O(n(\log n)^{-\eta})$, as $n \to \infty$, for any $\eta > 0$.
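Given an array of empirical wavelet coefficients (either $U^{(T)}_{J_n}(k)$ or $W^{(T)}_{J_n}(k)$), the single-change-point estimators above reduce to an arg max followed by a rescaling. The sketch below is a minimal illustration; the function name `locate_single_jump` and the argument `psi_mass`, which stands for the value of $\int_1^A \psi(x)\,dx$ supplied by the user, are hypothetical.

```python
import numpy as np

def locate_single_jump(coeffs, ks, J, psi_mass, a=0.0, b=1.0):
    """Estimate the location and size of a single jump from wavelet coefficients.

    coeffs   : empirical wavelet coefficients, one per index in ks
    ks       : the indices k at which the coefficients were computed
    J        : resolution level J_n
    psi_mass : value of the integral of psi over (1, A), supplied by the user
    Returns (k_hat, tau_hat, d_hat).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    ks = np.asarray(ks)
    i = int(np.argmax(np.abs(coeffs)))                 # k_hat = arg max_k |coeff(k)|
    k_hat = int(ks[i])
    tau_hat = a + k_hat * (b - a) / 2 ** J             # tau_hat = a + k_hat (b - a) / 2^J
    # d_hat = 2^{J/2} max_k |coeff(k)| / ((b - a)^{1/2} * psi_mass)
    d_hat = 2 ** (J / 2) * abs(coeffs[i]) / (np.sqrt(b - a) * psi_mass)
    return k_hat, tau_hat, d_hat
```

Because the maximum is taken over absolute values, `d_hat` as written estimates the magnitude of the jump; the sign of the coefficient at `k_hat` indicates its direction.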
We find that the convergence rate of the change-point estimator obtained by the empirical wavelet method is $O_P(n^{-1})$. Carlstein et al. (1994), Raimondo (1998), and Wang (1995) have studied the estimation of the change point in the regression function $T(x)$ for fixed designs with nonparametric methods, under the i.i.d. assumption for the observations. If $T(x)$ has only one jump, at a point $\tau_l$ (and is otherwise Lipschitz continuous), then the minimax rate of the problem is known to be $O_P(n^{-1})$ (Korostelev, 1987). Therefore, by Theorem 2.4, the convergence rate reaches the best possible rate. Müller (1992) further proposed consistent kernel-type estimators and established their limit distributions. The corresponding rates of convergence are $O_P(n^{-\alpha})$ for $1/2 \le \alpha < 1$. Hence, the convergence rate of Theorem 2.4 is typically faster than $O_P(n^{-1/2})$.

2.5. Multiple change points

When the regression function $T(x)$ has $p$ (known or unknown) change points in the interval $[a, b]$, we can assume, without loss of generality, that $|d_{l+1}| > |d_l|$, $l = 1, 2, \ldots, p - 1$. Suppose that $T(x)$ is differentiable at all points of $(a, b)$ except at $\tau_l$, $l = 1, 2, \ldots, p$.

When we use the estimators $\hat{\tau}^{(i)}_l$ and $\hat{\tau}^{(d)}_l$ described above, we only obtain an estimate of the point with the largest jump size if $T(x)$ has at least two jump points. Therefore, the estimation of change points needs to be conducted sequentially. Since we assume that $|d_{l+1}| > |d_l|$, $l = 1, 2, \ldots, p - 1$, we can sequentially define estimators for the break points $\tau_l$, $l = 1, 2, \ldots, p$ (following Wang, 1995, and Delgado and Hidalgo, 2000), by

$$\tilde{\tau}^{(i)}_l = a + \tilde{k}^{(i)}_l (b - a)/2^{J_n}, \quad l = 1, 2, \ldots, p,$$

$$\tilde{\tau}^{(d)}_l = a + \tilde{k}^{(d)}_l (b - a)/2^{J_n}, \quad l = 1, 2, \ldots, p,$$

where

$$\tilde{k}^{(i)}_j = \arg\max_{k \in Q^{(j)}} |U^{(T)}_{J_n}(k)|, \quad j = 1, 2, \ldots, p,$$

$$\tilde{k}^{(d)}_j = \arg\max_{k \in Q^{(j)}} |W^{(T)}_{J_n}(k)|, \quad j = 1, 2, \ldots, p, \tag{2.12}$$

in which $Q^{(j)} = D_n \setminus \bigcup_{l=1}^{j-1} I(\tilde{\tau}_l, 2^{-J_n}A(b - a))$, and $\tilde{\tau}_l$ is one of $\tilde{\tau}^{(i)}_l$ and $\tilde{\tau}^{(d)}_l$, corresponding to $U^{(T)}_{J_n}(k)$ and $W^{(T)}_{J_n}(k)$, respectively. The estimators of the jump sizes at the change points $\tau_l$, $l = 1, 2, \ldots, p$, can be defined by

$$\tilde{d}^{(i)}_l = \frac{2^{J_n/2}\, U^{(T)}_{J_n}(\tilde{k}^{(i)}_l)}{(b - a)^{1/2} \int_1^A \psi(x)\,dx}$$

and

$$\tilde{d}^{(d)}_l = \frac{2^{J_n/2}\, W^{(T)}_{J_n}(\tilde{k}^{(d)}_l)}{(b - a)^{1/2} \int_1^A \psi(x)\,dx}.$$

When $p$ is unknown, Theorem 2.1(a) implies that $n^{1/2}|U^{(T)}_{J_n,k}| \le C_{1-\omega}$ with approximate probability $1 - \omega$ at those $x$ values at which $T(x)$ has no jump. So we take $C_{1-\omega}$ as a threshold and use it to determine which points are change points. If $\max_{k \in Q^{(j)}} |U^{(T)}_{J_n}(k)| > C_{1-\omega}$, then $\tilde{\tau}^{(i)}_l = a + \tilde{k}^{(i)}_l (b - a)/2^{J_n}$ is a change point with probability $1 - \omega$, where $\tilde{k}^{(i)}_l = \arg\max_{k \in Q^{(j)}} |U^{(T)}_{J_n}(k)|$. Similarly, we define $\tilde{\tau}^{(d)}_l$. Let $\hat{p}_j$ $(j = 1, 2)$ be the number of the maxima and $\hat{\tau}_1, \ldots, \hat{\tau}_{\hat{p}}$ be their locations, where $\hat{\tau}_l$ is one of $\tilde{\tau}^{(i)}_l$ (corresponding to $\hat{p} = \hat{p}_1$, based on the integral estimation of the wavelet coefficient) and $\tilde{\tau}^{(d)}_l$ (corresponding to $\hat{p} = \hat{p}_2$, based on the discretized estimation of the wavelet coefficient). As a result, we can sequentially define estimators of the jump sizes for the $p$ change points, and thus obtain the following results.

Theorem 2.5. Assume that conditions (A.1)–(A.5) in Appendix A are satisfied. Let $\tau_l$, $l = 1, 2, \ldots, p$, be the $p$ jump points of the function $T(x)$, and denote the corresponding jump sizes by $d_l$, $l = 1, 2, \ldots, p$.

(a) If Assumption J(c) is satisfied, then $P(\hat{p}_1 = p) \to 1$,

$$\tilde{d}^{(i)}_l - d_l = O_P((2^{-J_n}n)^{-1/2}), \quad l = 1, 2, \ldots, \hat{p},$$

$$\tilde{\tau}^{(i)}_l - \tau_l = O_P(2^{-J_n}), \quad l = 1, 2, \ldots, \hat{p}.$$

(b) If Assumptions J(c) and J(d) are satisfied, then $P(\hat{p}_2 = p) \to 1$,

$$\tilde{d}^{(d)}_l - d_l = O_P((2^{-J_n}n)^{-1/2}), \quad l = 1, 2, \ldots, \hat{p},$$

$$\tilde{\tau}^{(d)}_l - \tau_l = O_P(2^{-J_n}), \quad l = 1, 2, \ldots, \hat{p}.$$

2.6.
Estimation of the asymptotic variance

The estimation of the asymptotic variance in model (2.1) is complicated because it involves the density function $f(x)$ of the explanatory variable $X$ and the heteroscedastic variance function $\sigma(x)$. The estimator of the density has been established by nonparametric kernel techniques (see Nadaraya, 1964). The kernel estimator is defined by

\[ f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\left(\frac{X_i - x}{h_n}\right), \tag{2.13} \]

where $h_n$ is a window-width variable with $h_n \to 0$ and $n h_n \to \infty$, and $K(x)$ is a probability density function that may be different from that of (2.7). The regression function $T(x)$ can be estimated by the kernel estimator or the local linear smoother under the null hypothesis; that is,

\[ \hat T(x) = \frac{\sum_{i=1}^{n} K_h(X_i - x)\, Y_i}{\sum_{i=1}^{n} K_h(X_i - x)}, \tag{2.14} \]

where the kernel function $K(\cdot)$ is a probability density function, or

\[ \hat T_1(x) = \frac{\sum_{i=1}^{n} K_{n,h_n}(X_i - x)\, Y_i}{\sum_{i=1}^{n} K_{n,h_n}(X_i - x)}, \tag{2.15} \]

where the kernel function $K_{n,h_n}(\cdot)$ is the same as that of (2.7), but where $K(\cdot)$ may be different from that of (2.7).

The heteroscedastic variance estimator is slightly more complex. Estimation of the conditional variance and of the density is of common interest in a variety of statistical applications, such as in measuring volatility or risk, the return distribution function, and the distribution of the predictive error in finance (Anderson and Lund, 1997; Gallant and Tauchen, 1997). Under a general setup, which includes nonlinear time series models as a special case, Fan and Yao (1998) have proposed an efficient and adaptive method for estimating the conditional variance. They improved the estimator suggested by Härdle and Tsybakov (1997). Fan and Yao (1998) have proposed a better estimator by regarding the estimation of $\sigma^2(\cdot)$ as a nonparametric regression problem in view of the relation $E(r \mid X = x) = \sigma^2(x)$, where $r = (Y - T(X))^2$, assuming that $E(\varepsilon_i \mid X_i) = 0$ and $\mathrm{var}(\varepsilon_i \mid X_i) = 1$.
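The kernel density estimator (2.13), the kernel regression estimator (2.14), and the squared residuals $r_i = (Y_i - \hat T(X_i))^2$ that serve as the response in the residual-based variance regression can be sketched as follows. This is a minimal illustration only, not the authors' Fortran 77 implementation: the Gaussian kernel, the function names, and the bandwidth argument `h` are assumptions made for the example.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, xs, h, kernel=gaussian_kernel):
    """f_n(x) = (n h)^{-1} * sum_i K((X_i - x) / h), cf. (2.13)."""
    return sum(kernel((xi - x) / h) for xi in xs) / (len(xs) * h)


def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """T_hat(x) = sum_i K_h(X_i - x) Y_i / sum_i K_h(X_i - x), cf. (2.14)."""
    weights = [kernel((xi - x) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no kernel mass at x; increase the bandwidth h")
    return sum(w * y for w, y in zip(weights, ys)) / total


def squared_residuals(xs, ys, h):
    """r_i = (Y_i - T_hat(X_i))^2, the response of the variance regression."""
    return [(y - nadaraya_watson(x, xs, ys, h)) ** 2 for x, y in zip(xs, ys)]
```

Smoothing the values returned by `squared_residuals` against the $X_i$ then gives an estimate of $\sigma^2(x)$ in the spirit of the relation $E(r \mid X = x) = \sigma^2(x)$.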
Xia (1999) has proposed a robust estimator for the conditional variance by using the same idea as that of Fan and Yao (1998). Hence, we use a similar robust estimator for the volatility function $\sigma(x)$. For simplicity, we assume that $\varepsilon_j$ is independent of $(X_i;\ i \le j)$ in model (2.1), and that $\sigma(x)$ (or $\sigma^2(x)$) has a bounded and continuous derivative of second order on $(a, b)$. We can derive two estimators for the heteroscedastic variance in model (2.1). From model (2.1), we can write an alternative nonparametric volatility model as

\[ |Y_i - T(X_i)| = \sigma_{10}\,\sigma(X_i) + \sigma(X_i)\,(|\varepsilon_i| - \sigma_{10}) \tag{2.16} \]

and

\[ (Y_i - T(X_i))^2 = \sigma_{20}\,\sigma^2(X_i) + \sigma^2(X_i)\,(\varepsilon_i^2 - \sigma_{20}), \tag{2.17} \]

where $\sigma_{10} = E(|\varepsilon|)$ and $\sigma_{20} = E(\varepsilon^2)$. Hence the transformed models (2.16) and (2.17) have the same form as model (2.1). First, we consider the simple case where $T(x)$ has been estimated. With the expressions (2.16) and (2.17), $T(x)$ in the estimators of the conditional variance may be replaced by the estimated regression function. This leads to the following simple forms: the absolute deviation estimator $\hat\sigma_1(x)$,

\[ \hat\sigma_1(x) = \frac{\sum_{i=1}^{n} V_{n,b}(X_i - x)\, |Y_i - \hat T(X_i)|}{\sigma_{10} \sum_{i=1}^{n} V_{n,b}(X_i - x)}, \tag{2.18} \]

and the square estimator $\hat\sigma_2^2(x)$,

\[ \hat\sigma_2^2(x) = \frac{\sum_{i=1}^{n} V_{n,b}(X_i - x)\, (Y_i - \hat T(X_i))^2}{\sigma_{20} \sum_{i=1}^{n} V_{n,b}(X_i - x)}, \tag{2.19} \]

where

\[ V_{n,b}(X_i - x) = V_b(X_i - x) \sum_{j=1}^{n} V_b(X_j - x)(X_j - x)^2 - V_b(X_i - x)(X_i - x) \sum_{j=1}^{n} V_b(X_j - x)(X_j - x), \tag{2.20} \]

in which $V_b(\cdot) = V(\cdot/b)$, $V(\cdot)$ is a kernel function that may be different from the kernel functions in (2.7) and (2.13), $b = b_n$ is a sequence of bandwidths, and $\sigma_{10}$ and $\sigma_{20}$ are the same as those in (2.16) and (2.17). Without loss of generality, we assume that $\sigma_{10} = 1$ or $\sigma_{20} = 1$ for identification. It is easy to show that $\hat\sigma_1(x)$ is more robust than $\hat\sigma_2(x)$ when the distribution of $\varepsilon$ is fat-tailed or there exist outliers in the observations.
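The two estimators (2.18) and (2.19) share the local linear weights (2.20) and differ only in how the residuals enter. The sketch below is illustrative rather than the authors' code: the Gaussian choice for $V$, the function names, and passing the normalizing constants as the arguments `s10` and `s20` are assumptions of the example, and the residuals $Y_i - \hat T(X_i)$ are taken as given.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative kernel V."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear_weights(x, xs, b, kernel=gaussian_kernel):
    """Weights V_{n,b}(X_i - x) of (2.20) for every design point X_i."""
    v = [kernel((xi - x) / b) for xi in xs]
    s1 = sum(vj * (xj - x) for vj, xj in zip(v, xs))
    s2 = sum(vj * (xj - x) ** 2 for vj, xj in zip(v, xs))
    return [vi * s2 - vi * (xi - x) * s1 for vi, xi in zip(v, xs)]


def sigma1_hat(x, xs, residuals, b, s10):
    """Absolute deviation estimator (2.18); s10 stands for E|eps|."""
    w = local_linear_weights(x, xs, b)
    return sum(wi * abs(r) for wi, r in zip(w, residuals)) / (s10 * sum(w))


def sigma2_sq_hat(x, xs, residuals, b, s20):
    """Square estimator (2.19); s20 stands for E(eps^2)."""
    w = local_linear_weights(x, xs, b)
    return sum(wi * r * r for wi, r in zip(w, residuals)) / (s20 * sum(w))
```

For standard normal errors one would pass `s10 = math.sqrt(2 / math.pi)` and `s20 = 1.0`; a single large residual enters `sigma1_hat` linearly but `sigma2_sq_hat` quadratically, which is the source of the robustness difference noted above.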
Xia (1999) has shown that when the distribution of $\varepsilon$ has a kurtosis greater than 3, $\hat\sigma_1(x)$ tends to be more efficient than $\hat\sigma_2(x)$, and the larger the degree of the diffusion, the more efficient $\hat\sigma_1(x)$ is relative to $\hat\sigma_2(x)$. Mathematically, however, the estimator $\hat\sigma_2(x)$ is easier to handle. Hence, in our simulations we use the estimator $\hat\sigma_2(x)$ in comparison with $\hat\sigma_1(x)$. Using proofs similar to those of Theorems 5 and 6 in Masry (1996), and incorporating the main results of Masry and Tjøstheim (1995), we have

\[ \sup_{a \le x \le b} |\hat\sigma(x) - \sigma(x)| = O\left(\left(\frac{\log n}{n h_n}\right)^{1/2}\right) \quad \text{a.s.} \tag{2.21} \]

\[ \sup_{a \le x \le b} |\hat\sigma(x) - \sigma(x)| = O_P\left(\left(\frac{\log n}{n h_n}\right)^{1/2}\right), \tag{2.22} \]

where a.s. denotes convergence with probability 1, $O_P$ denotes the order in probability, and $\hat\sigma(x)$ is either $\hat\sigma_1(x)$ or $\hat\sigma_2(x)$. Hence, we can construct two tests that do not include any unknown quantities:

\[ \hat M_n(x) = \frac{\hat f^{1/2}(a + x(b-a))\, U^{(T)}_{J_n}([2^{J_n} x])}{\hat\sigma(a + x(b-a))}, \tag{2.23} \]

\[ \hat M_n^{*}(x) = \frac{\hat f^{1/2}(a + x(b-a))\, W^{(T)}_{J_n}([2^{J_n} x])}{\hat\sigma(a + x(b-a))}, \tag{2.24} \]

where $0 \le x \le 1$. Therefore, the following corollary follows immediately from Corollary 2.1, Lemma A.4, and (2.22).

Corollary 2.3. Assume that the conditions of Theorem 2.1(a) are satisfied. Then under the null hypothesis $H_0$, we have

\[ P\left\{ A(\delta_n) \left[ \left(\frac{n}{k_2 \sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |\hat M_n(x)| - a(\delta_n) \right] < z \right\} \to \exp(-2\exp(-z)). \]

Suppose that the conditions of Theorem 2.1(b) are satisfied. Then, under the null hypothesis, we have

\[ P\left\{ A(\delta_n) \left[ \left(\frac{n}{k_2 \sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |\hat M_n^{*}(x)| - a(\delta_n) \right] < z \right\} \to \exp(-2\exp(-z)), \]

where $A(x) = |2\log x|^{1/2}$ and

\[ a(x) = |2\log x|^{1/2} + |2\log x|^{-1/2} \log\left(\frac{k_1}{2\pi k_2}\right)^{1/2}, \]

where $k_1 = \int_{-A}^{A} (\psi'(x))^2\,dx$ and $k_2 = \int_{-A}^{A} \psi^2(x)\,dx$.

This corollary gives the asymptotic critical values to detect jump points of the regression function $T(x)$. Results similar to Corollary 2.4 for model (2.2) also hold with $\hat f(a + x(b-a))$ replaced by the known $f(a + x(b-a))$.

3.
Simulation

In this section, we carry out several Monte Carlo simulations to investigate the finite-sample properties of tests based on the empirical wavelet coefficients. All simulations and statistical computations were written in Fortran 77. First, we consider the case of the regression function with heteroscedastic conditional variance. The choice of the bandwidth is very important in the estimation of the regression function and in the estimators of the asymptotic variance in Theorem 2.1. To use a data-driven bandwidth, we show that a random bandwidth can be used. This can be achieved by proving that the results in Section 2 hold uniformly for $h_n \in [c_1 n^{-1/5}, c_2 n^{-1/5}]$, where $c_1 < c_2$. It is known that the power of the tests is sensitive to the choice of bandwidth. We shall only address the problem of bandwidth selection for the regression function, since the same principle applies to the estimation of the density function and the conditional variance function. Let

\[ \mathrm{MISE} = E \int_a^b [\hat T(x) - T(x)]^2\, w(x) f(x)\,dx, \]

where $w(x)$ is a given weight function, which we will later take to be 1, and

\[ \hat T(x) = \frac{\sum_{j=1}^{n} K_{n,h_n}(X_j - x)\, Y_j}{\sum_{j=1}^{n} K_{n,h_n}(X_j - x)}, \tag{3.1} \]

where $K_{n,h_n}(\cdot)$ satisfies (2.20). The ideal bandwidth in the sense of MISE for $\hat T(x)$ is

\[ h_0 = \left( \frac{k_2 \sigma^2 \int_0^1 w(x)\,dx}{\int_0^1 [T''(x)]^2\, w(x) f(x)\,dx} \right)^{1/5} n^{-1/5}, \tag{3.2} \]

provided that the null hypothesis holds and the conditional variance is constant. However, while $T(x)$ has a bounded and continuous second-order derivative on $(a, b)$, the conditional variance is not constant; that is, the conditional variance is $\sigma^2(x)$. Hence, (3.2) should be changed to

\[ h_0 = \left( \frac{k_2 \int_0^1 \sigma^2(x)\, w(x)\,dx}{\int_0^1 [T''(x)]^2\, w(x) f(x)\,dx} \right)^{1/5} n^{-1/5}. \]

Note that in the formula for the ideal bandwidth (3.2), $T''(x)$ and $\sigma^2$ are unknown. Thus, based on the properties available, we first use the cross-validation method to obtain $h_0'$ for the third-order local polynomial fitting.
Then, we make an adjustment to $h_0'$ in order to obtain a suitable bandwidth for the estimation of $T''(x)$ (see Fan and Gijbels, 1996). The bandwidth for the local third-order polynomial fitting by the cross-validation method is

\[ \hat h_0' = \arg\min_{h'} \sum_{i=1}^{n} (Y_i - \hat T_{0,h'}(X_i))^2\, w(X_i), \tag{3.3} \]

where $\hat T_{0,h'}$ is the kernel estimator (3.1), or the local linear smoother (Fan and Gijbels, 1996), using the data $\{(X_j, Y_j),\ j \ne i\}$. Then the bandwidth suitable for $\hat T''(x)$ is

\[ \hat h_n = \mathrm{adj}_{2,3}\, \hat h_0', \tag{3.4} \]

where $\mathrm{adj}_{2,3}$ is an adjusting constant, which depends only on the kernel function. For example, if the kernel function is a Gaussian (Epanechnikov) kernel, then $\mathrm{adj}_{2,3} = 0.8285$ $(0.7776)$ (see Fan and Gijbels, 1996). For the estimation of the constant conditional variance $\sigma^2$, we use the procedure of Gasser et al. (1986). A proposed estimator of $\sigma^2$ is

\[ \hat\sigma^2 = \frac{1}{n-2} \sum_{t=2}^{n-1} \frac{(a_t \tilde Y_{t-1} + b_t \tilde Y_{t+1} - \tilde Y_t)^2}{a_t^2 + b_t^2 + 1}, \tag{3.5} \]

where $\tilde Y_t$, $t = 1, 2, \ldots, n$, is the response paired with the order statistic $X_{[t]}$ of $\{X_t,\ t = 1, 2, \ldots, n\}$, and, when $X_{[t+1]} - X_{[t-1]} \ne 0$,

\[ a_t = \frac{X_{[t+1]} - X_{[t]}}{X_{[t+1]} - X_{[t-1]}}, \qquad b_t = \frac{X_{[t]} - X_{[t-1]}}{X_{[t+1]} - X_{[t-1]}}. \]

When $X_{[t+1]} - X_{[t-1]} = 0$, let $a_t = b_t = 1/2$. After obtaining estimators $\hat T''(x)$ and $\hat\sigma^2$ of $T''(x)$ and $\sigma^2$, we then have the "plug-in" bandwidth for $\hat T(x)$ as

\[ h_0 = \left( \frac{k_2 \hat\sigma^2 \int_0^1 w(x)\,dx}{n^{-1} \sum_{t=1}^{n} [\hat T''(X_t)]^2\, w(X_t)} \right)^{1/5} n^{-1/5}. \tag{3.6} \]

When the conditional variance is not constant, the "plug-in" bandwidth involves the nonparametric estimation of the conditional variance $\sigma^2(x)$, and choosing the bandwidth for $\hat T(x)$ becomes more complicated. We can use the estimators (2.18) and (2.19) for $\sigma(x)$ and $\sigma^2(x)$, respectively. Another bandwidth must be selected to estimate the unknown conditional variance function $\sigma^2(x)$. We obtain suitable bandwidths for the conditional variance function $\sigma^2(x)$ and the density function $f(x)$ by similar procedures.
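The difference-based estimator (3.5) needs no smoothing: each interior response is compared with the linear interpolation of its two neighbours in the ordered design. A minimal sketch is given below, assuming the standard form of the Gasser et al. (1986) pseudo-residuals; the function name is illustrative and the code is not taken from the paper.

```python
def gasser_variance(xs, ys):
    """Difference-based estimate of a constant error variance, cf. (3.5).

    The pairs (X_t, Y_t) are sorted by X, and each interior Y is compared
    with the linear interpolation of its neighbours (the pseudo-residual).
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    if n < 3:
        raise ValueError("at least three observations are required")
    total = 0.0
    for t in range(1, n - 1):
        x_prev, y_prev = pairs[t - 1]
        x_mid, y_mid = pairs[t]
        x_next, y_next = pairs[t + 1]
        span = x_next - x_prev
        if span != 0:
            a = (x_next - x_mid) / span
            b = (x_mid - x_prev) / span
        else:
            a = b = 0.5  # tie-handling rule stated after (3.5)
        pseudo = a * y_prev + b * y_next - y_mid
        total += pseudo * pseudo / (a * a + b * b + 1.0)
    return total / (n - 2)
```

For an exactly linear response the pseudo-residuals vanish and the estimate is zero, which is why the estimator is insensitive to a smooth trend in $T(x)$.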
The details have been omitted. In the following examples, we consider the Epanechnikov kernel

\[ K(x) = \frac{3}{4\sqrt{5}} \left(1 - \frac{x^2}{5}\right) I(x^2 \le 5) \]

for the estimator $\hat T(x)$, and the Gaussian kernel $K_0(x) = (2\pi)^{-1/2} \exp(-x^2/2)$ for the estimator $\hat T''(x)$ and the estimator of the conditional heteroscedastic variance.

Example 3.1. To assess the size and power of the tests, we consider the following model:

\[ Y_t = T(X_t) + \sigma(X_t)\,\varepsilon_t, \quad t = 1, 2, \ldots, \]

with conditional heteroscedastic variance $\sigma^2(x)$. The errors $\{\varepsilon_t,\ t = 1, 2, \ldots\}$ are a sequence of independently and identically distributed (i.i.d.) standard normal $N(0, 1)$ random variables; $\{X_t,\ t = 1, 2, \ldots\}$ are a sequence of i.i.d. uniform random variables on $[0, 1]$. We generate 500 pairs of samples of sizes $n = 128$ and $n = 256$ (these sample sizes are convenient for the resolution level $J_n$, since $2^7 = 128$ and $2^8 = 256$). Assume that the regression function $T(x)$ has the form

\[ T(x) = \begin{cases} 0.6x^2 & \text{if } x > \tau_0, \\ 0.6x^2 + d_0 & \text{if } x \le \tau_0, \end{cases} \tag{3.7} \]

and the conditional variance is $\sigma^2(x) = 0.4x^2$. To study the effect of the tests at different locations, we take $\tau_0$ at three different values, 0.3, 0.5, and 0.75, for the location, and $d_0$ at four different values, 0, 0.3, 0.5, and 1, for the jump size. The wavelet has been taken as

\[ \psi(x) = \begin{cases} 5(x-1)^4 & \text{if } 1 \le x \le 2, \\ \frac{20}{3}(x+1)^3 + 2(x+1)^2 & \text{if } -2 \le x \le -1, \\ 0 & \text{otherwise.} \end{cases} \]

We compare the absolute deviation estimator with the square estimator for the conditional heteroscedastic variance in different samples. The simulated results are listed in Tables 1 and 2. We find that the powers and sizes of the tests are very good. Tables 1 and 2 also show that the locations of the change points in model (3.7) are estimated very accurately.
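The data-generating process of Example 3.1 can be reproduced in a few lines. The sketch below is an illustration of the design only (the original study used Fortran 77); the function names and the seeding scheme are assumptions of the example.

```python
import math
import random


def regression_T(x, tau0, d0):
    """Regression function (3.7): 0.6 x^2, shifted by d0 for x <= tau0."""
    base = 0.6 * x * x
    return base + d0 if x <= tau0 else base


def sigma(x):
    """Conditional standard deviation with sigma^2(x) = 0.4 x^2."""
    return math.sqrt(0.4) * abs(x)


def simulate_example_31(n, tau0, d0, seed=0):
    """Draw (X_t, Y_t), t = 1..n, with X_t ~ U[0, 1] and eps_t ~ N(0, 1)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [regression_T(x, tau0, d0) + sigma(x) * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


# One replication of the design with n = 128, tau0 = 0.5, d0 = 0.5.
xs, ys = simulate_example_31(128, 0.5, 0.5, seed=1)
```

Setting `d0 = 0` gives data under the null hypothesis, so repeating the draw over many seeds yields the empirical size, while `d0 > 0` yields the empirical power.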
The estimates of the locations and jump sizes of change points obtained with the absolute deviation estimator for the conditional heteroscedastic variance appear to be better than those obtained with the square estimator. When the sample size is small ($n \le 128$), the test statistic $M_n(k)$ with the absolute deviation estimator for $\sigma^2(x)$ has slightly more power than that with the square estimator. But when the sample becomes large, it is not easy to determine which has more power. We also see that the estimator $\hat d_l^{(d)}$ overestimates the jump size when the sample is small ($n \le 128$). Again, from Tables 1 and 2, we find that the location of change points seems to have an effect on the power of the tests, and the bias in the estimation of the locations of change points increases as the real locations of change points approach each other.

Next, consider the case in which the error $\{\varepsilon_t,\ t = 1, 2, \ldots\}$ is a sequence of stationary random variables satisfying the ARMA(1,1) model

\[ \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t + \theta\,u_{t-1}, \]

where $\{u_t,\ t = 1, 2, \ldots\}$ is a sequence of i.i.d. random variables with the standard normal distribution $N(0, 1)$. We consider model (3.7) with ARMA(1,1) errors and calculate the power and size of the test $M_n(k)$. The results with the correlated errors are almost the same as those in Tables 1 and 2.
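The ARMA(1,1) recursion for the errors can be sketched as follows. The source does not report the values of $\rho$ and $\theta$ used in the simulations, so any values passed to these functions are hypothetical; the burn-in length and the function names are likewise assumptions of the example.

```python
import random


def arma11_errors(innovations, rho, theta, eps0=0.0, u0=0.0):
    """Apply eps_t = rho * eps_{t-1} + u_t + theta * u_{t-1} to given u_t."""
    errors = []
    eps_prev, u_prev = eps0, u0
    for u in innovations:
        eps = rho * eps_prev + u + theta * u_prev
        errors.append(eps)
        eps_prev, u_prev = eps, u
    return errors


def simulate_arma11(n, rho, theta, burn_in=200, seed=0):
    """Draw n ARMA(1,1) errors with N(0, 1) innovations after a burn-in."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n + burn_in)]
    return arma11_errors(u, rho, theta)[burn_in:]
```

Discarding the burn-in draws reduces the influence of the arbitrary starting values `eps0` and `u0`, so that the retained errors are close to a draw from the stationary distribution when $|\rho| < 1$.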
This implies that a test based on empirical wavelet coefficients may be used when the errors are correlated. We have omitted the details of this case (available from the authors on request).

Table 1
Sizes and powers for test $\hat M_n(k)$ for model (3.7), based on the square estimator $\hat\sigma_2^2(x)$ for the conditional heteroscedastic variance

                    Test                        Estimation
$\tau_0$  $d_0$     $\alpha=10\%$  $\alpha=5\%$  $\alpha=1\%$   $\hat\tau$  SE     $\hat d$  SE
n = 128
0.00                23.6%    6.8%     1%
0.30      0.30      85.8%    76.8%    65.8%    0.319   0.127   0.539   0.157
0.30      0.50      99.8%    98.0%    86.2%    0.301   0.048   0.854   0.185
0.30      1.00      100%     99.8%    99.8%    0.301   0.053   1.650   0.280
0.50      0.30      93.2%    83.4%    50.0%    0.503   0.119   0.613   0.188
0.50      0.50      97.6%    91.0%    57.2%    0.498   0.075   0.930   0.214
0.50      1.00      99.6%    98.8%    98.8%    0.502   0.025   1.734   0.281
0.75      0.30      74.2%    46.4%    26.8%    0.692   0.168   0.697   0.341
0.75      0.50      97.6%    91.0%    57.2%    0.733   0.113   1.060   0.311
0.75      1.00      98.6%    96.4%    96.2%    0.747   0.070   1.839   0.398
n = 256
0.00                10.5%    5.80%    1.2%
0.30      0.30      98.8%    96.8%    87.4%    0.297   0.037   0.383   0.036
0.30      0.50      100%     100%     93.4%    0.297   0.024   0.571   0.048
0.30      1.00      100%     100%     99.4%    0.306   0.023   1.067   0.088
0.50      0.30      100%     98.4%    89.4%    0.491   0.032   0.434   0.043
0.50      0.50      100%     100%     96.8%    0.493   0.029   0.627   0.058
0.50      1.00      100%     100%     100%     0.501   0.025   1.121   0.098
0.75      0.30      100%     99.6%    90.4%    0.732   0.080   0.664   0.100
0.75      0.50      100%     98.2%    92.0%    0.737   0.041   0.724   0.079
0.75      1.00      100%     100%     97.2%    0.746   0.021   1.211   0.111

Table 2
Sizes and powers for test $\hat M_n^{*}(k)$ for model (3.7), based on the absolute deviation estimator $\hat\sigma_1(x)$ for the conditional heteroscedastic variance

                    Test                        Estimation
$\tau_0$  $d_0$     $\alpha=10\%$  $\alpha=5\%$  $\alpha=1\%$   $\hat\tau$  SE     $\hat d$  SE
n = 128
0.00                18.4%    7.0%     1%
0.30      0.30      98.0%    92.4%    66.8%    0.311   0.108   0.410   0.114
0.30      0.50      100%     99.6%    98.8%    0.293   0.070   0.607   0.177
0.30      1.00      100%     99.8%    99.8%    0.301   0.034   0.998   0.158
0.50      0.30      98.4%    91.4%    60.4%    0.492   0.110   0.488   0.134
0.50      0.50      99.8%    99.6%    98.8%    0.495   0.064   0.665   0.222
0.50      1.00      99.8%    99.8%    99.8%    0.499   0.061   1.351   0.194
0.75      0.30      97.6%    89.6%    51.4%    0.762   0.047   0.714   0.135
0.75      0.50      99.0%    98.2%    92.8%    0.759   0.033   0.956   0.157
0.75      1.00      100%     99.8%    99.8%    0.756   0.035   1.582   0.239
n = 256
0.00      0.30      11.5%    7.00%    0.92%
0.30      0.30      98.2%    95.8%    93.4%    0.298   0.049   0.319   0.027
0.30      0.50      100%     100%     100%     0.301   0.026   0.571   0.048
0.30      1.00      100%     100%     100%     0.301   0.024   1.067   0.088
0.50      0.30      100%     96.4%    93.6%    0.498   0.042   0.372   0.033
0.50      0.50      100%     100%     96.8%    0.499   0.026   0.627   0.058
0.50      1.00      100%     100%     100%     0.500   0.026   1.121   0.098
0.75      0.30      100%     100%     94.2%    0.731   0.032   0.453   0.051
0.75      0.50      100%     98.2%    92.0%    0.737   0.041   0.724   0.079
0.75      1.00      100%     100%     97.2%    0.759   0.042   1.211   0.111

Example 3.2. Consider the following model of a time series. The data $\{(Y_t, X_t),\ t = 1, 2, \ldots\}$ are serially correlated. Assuming that $Y_t = X_{t+1}$, the model can be written as

\[ X_{t+1} = T(X_t) + \sigma(X_t)\,\varepsilon_t, \]

where $T(x)$ is the threshold function

\[ T(x) = \begin{cases} 1.2 - 0.6x & \text{if } x \le \tau_0, \\ 1.2 - 0.6x + d_0 & \text{if } x > \tau_0, \end{cases} \tag{3.8} \]

and $\sigma^2(x) = 0.4\sqrt{x} + 0.1$. It would be interesting to compare the current results with those in Example 3.1, obtained with i.i.d. observations. Assume that the initial value $X_0$ is drawn from a uniform distribution on $[0, 1]$, and that $\{\varepsilon_t,\ t = 1, 2, \ldots\}$ is the same as in Example 3.1.
The results for this example are very similar to those listed in Tables 1 and 2, which implies that the test method proposed in this paper is not affected when the observations form a serially correlated time series. The details are omitted.

Lastly, we consider a model with multiple change points. We calculate the percentage of rejections of the null hypothesis when the regression has two change points with jump sizes of $d_1 = a_1 + a_2$ and $d_2 = a_2$, respectively.

Example 3.3. Consider the following model with conditional heteroscedastic variance:

\[ Y_t = T(X_t) + \sigma(X_t)\,\varepsilon_t, \]

where $T(x)$ has two different change points at $\tau_1$ and $\tau_2$. The jump sizes for the two change points $\tau_1$ and $\tau_2$ (here $\tau_2 > \tau_1$) are $d_1 = a_1 + a_2$ and $d_2 = a_2$, respectively. When $a_1 = 0$, the two change points have the same jump size, and when $a_2 = 0$, $T(x)$ has only one change point. That is,

\[ T(x) = \begin{cases} 0.6x^2 & \text{if } x \le \tau_1, \\ 0.6x^2 + a_1 + a_2 & \text{if } \tau_1 < x \le \tau_2, \\ 0.6x^2 + a_1 & \text{if } x > \tau_2, \end{cases} \tag{3.9} \]

and $\sigma^2(x) = 0.4x^2 + 0.1$, in which $\{\varepsilon_t,\ t \ge 1\}$ is a sequence of i.i.d. normal random variables. In the simulations, $\tau_1$ and $\tau_2$ are set to 0.3 and 0.6, respectively. The jump-size parameter $a_1$ takes the three values 0.0, 0.2, and 0.5, and $a_2$ takes the three values 0.0, 0.3, and 0.5. For the sake of simplicity, the only sample size considered is $n = 256$. The simulated results for Example 3.3 are shown in Table 3. In Table 3, $\hat\tau_i$ is the estimator of $\tau_i$, $i = 1, 2$. Fig. 1(a) shows the scatter points and the true curve of the regression produced by model (3.9), and Fig. 1(b) is the plot of the series produced by $Y_t = T(X_t) + \sigma(X_t)\,\varepsilon_t$ with (3.9) and conditional heteroscedastic variance $\sigma^2(x) = 0.4x^2 + 0.1$. Figs. 1(c) and (d) plot all values of $\hat M_n^{*}(x)$ when the conditional heteroscedastic variance has been estimated by the absolute deviation estimator and the square estimator, respectively.
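To make the two signed jumps in (3.9) concrete, the following sketch evaluates the regression function on either side of each change point. It is only an illustration under the case assignment read from (3.9); the function names and the small offset `step` are assumptions of the example.

```python
def regression_T(x, a1, a2, tau1=0.3, tau2=0.6):
    """Regression function (3.9) with change points tau1 < tau2."""
    base = 0.6 * x * x
    if x <= tau1:
        return base
    if x <= tau2:
        return base + a1 + a2
    return base + a1


def jump_at(tau, a1, a2, step=1e-9):
    """Signed jump of T at tau: value just to the right minus value at tau."""
    return regression_T(tau + step, a1, a2) - regression_T(tau, a1, a2)


# With a1 = 0.2 and a2 = 0.3 the function rises by a1 + a2 at tau1 and
# falls by a2 at tau2, so the jump sizes are |d1| = 0.5 and |d2| = 0.3.
first_jump = jump_at(0.3, 0.2, 0.3)
second_jump = jump_at(0.6, 0.2, 0.3)
```

With $a_1 = 0$ the two jumps have equal magnitude and opposite sign, which is the configuration in which the sequential search cannot rank them.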
As with Example 3.1, the tests based on the empirical wavelet method for the change points are very powerful, and the estimates of the locations and jump sizes of change points are very accurate, except for those with the same jump size. The estimation of those points at which the regression has changed with the same jump size (that is, when $a_1 = 0$) is not accurate. This is reasonable because we cannot tell which point is the first jump when the two jump sizes are the same. From the results in Panels A and B, we observe that the conditional heteroscedastic variances can be estimated with virtually the same accuracy by either the absolute deviation estimator or the square estimator. The powers of the test with the absolute deviation estimator are slightly stronger than those with the square estimator.

Table 3
Powers for tests $\hat M_n(k)$ and $\hat M_n^{*}(k)$ for model (3.9) with multiple change points

                Test for $\tau_1$    Test for $\tau_2$    Estimation
$a_1$   $a_2$   10%      5%       10%      5%       $\hat\tau_1$  SE     $\hat\tau_2$  SE     $\hat d_1$  $\hat d_2$
Panel A: n = 256, absolute deviation estimator for $\sigma^2(x)$
0.00    0.0     27.0%    7.40%    6.4%     0.60%
0.00    0.3     99.0%    93.4%    92.8%    89.2%    0.334   0.094   0.316   0.115   0.369   0.300
0.00    0.5     100%     100%     98.8%    91.0%    0.396   0.145   0.369   0.133   0.489   0.411
0.20    0.0     99.4%    97.6%    –        –        0.313   0.088   –       –       0.269   –
0.20    0.3     100%     100%     98.4%    89.2%    0.303   0.042   0.598   0.136   0.554   0.164
0.20    0.5     100%     100%     100%     100%     0.310   0.145   0.508   0.134   0.601   0.393
0.50    0.0     100%     100%     –        –        0.301   0.262   –       –       0.559   –
0.50    0.3     100%     100%     98.4%    89.2%    0.305   0.024   0.617   0.170   0.854   0.219
0.50    0.5     100%     100%     100%     99.8%    0.327   0.078   0.570   0.091   1.002   0.510
Panel B: n = 256, square estimator for $\sigma^2(x)$
0.00    0.0     15.0%    3.40%    2.4%     0.20%
0.00    0.3     98.6%    91.0%    91.0%    82.6%    0.333   0.110   0.324   0.135   0.281   0.350
0.00    0.5     100%     100%     98.6%    85.0%    0.402   0.148   0.372   0.136   0.414   0.476
0.20    0.0     95.0%    86.0%    –        –        0.329   0.142   –       –       0.262   –
0.20    0.3     100%     100%     97.2%    77.0%    0.298   0.033   0.575   0.088   0.554   0.116
0.20    0.5     100%     99.4%    100%     87.4%    0.360   0.115   0.398   0.143   0.728   0.517
0.50    0.0     100%     99.8%    –        –        0.297   0.025   –       –       0.559   –
0.50    0.3     100%     100%     75.4%    47.4%    0.302   0.023   0.604   0.131   0.855   0.255
0.50    0.5     100%     100%     100%     99.4%    0.308   0.022   0.599   0.043   1.056   0.493

Fig. 1. This figure describes the curve of Example 3.3: (a) is the plot of the data generated from the model in Example 3.3, together with the true curve with two change points; (b) plots the curve of the observed data; (c) is the plot of the empirical wavelet coefficient test based on the square estimator for the conditional heteroscedastic variance; (d) is the plot of the empirical wavelet coefficient test based on the absolute deviation estimator for the conditional heteroscedastic variance.

4. Applications: short-term interest rates and stock prices

The procedures proposed in this paper suggest a variety of applications in finance and economics. In this section, as examples of possible applications, we apply the test based on the empirical wavelet method to short-term interest rate data and the stock closing prices of selected companies. We consider the nonparametric regression

\[ Y_t = \mu(x_t) + \sigma(x_t)\,\varepsilon_t, \]

where $\{\varepsilon_t\}$ is Gaussian white noise. Here $Y_t = Z_t - Z_{t-1}$, where $Z_t$ denotes the short rate (see Fan and Yao, 1998), while $Z_t$ can also be the logarithm of the stock price. In practice, the white noise model can be expressed as $Y_t = \mu(x_t) + \sigma(x_t)\,\varepsilon_t$, where $x_t = t/T$ $(t = 1, \ldots, T)$, implying that $\mu(x_t)$ is a continuous function of dates (see Wang, 1995).

Example 4.1 (Short-term interest rate). The default-free short-term interest rate is a key economic variable. It directly affects the short end of the term structure, and thus has implications for the pricing of the full range of fixed-income securities and derivatives. Alternatively, the short rate is an important input for business cycle analysis because of its impact on the cost of credit, and its sensitivity to monetary policy and inflationary expectations. This example deals with the yield of the three-month Treasury Bill in the U.S.
The data consist of 1,735 weekly observations from January 5, 1962 to March 31, 1995. The data are presented in Fig. 2(a). $Y_t$ ($Y_t = Z_t - Z_{t-1}$) is plotted against $Z_{t-1}$ in Fig. 2(b). From Fig. 2(b), $\{Y_t\}$ is approximately a stationary sequence. We choose the same wavelet $\psi(x)$ as in Example 3.1. The kernel function $K(x)$ in the estimation of the regression function $T(x)$ is taken as the uniform kernel; that is,

\[ K(x) = \begin{cases} \frac{1}{2} & \text{if } \|x\| \le 1, \\ 0 & \text{if } \|x\| > 1. \end{cases} \]

The kernel functions in the estimation of the conditional variance $\sigma(x)$ and the density function $f(x)$ are the Gaussian kernel and the Epanechnikov kernel, respectively.

Fig. 2. Results of Example 4.1: (a) is the plot of the logarithm of the original data; (b) plots the first difference of the logarithm of the original data against the logarithm of the original data; (c) is the plot of the empirical wavelet coefficient test based on the square variance estimator; (d) is the plot of the empirical wavelet coefficient test based on the absolute deviation estimator.

Table 4
Summary of properties of changes in returns on the three-month Treasury Bill for 1,735 weekly observations from January 5, 1962 to March 31, 1995

                Estimated changes   Dates               $\hat M_n$   P-value
$\hat\tau_1$    1,184               May 9, 1984         3.3947       <0.1
$\hat\tau_2$    1,358               July 10, 1987       4.5145       <0.05
$\hat\tau_3$    1,623               February 12, 1992   5.0843       <0.05
$\hat\tau_4$    1,710               October 14, 1994    5.7663       <0.01

Using the empirical wavelet method proposed in this paper, we observe that the test statistics exceed the critical values at several locations: for example, $k = 1{,}157$ (May 1982) at the 10% significance level, $k = 1{,}335$ (July 1987) and $k = 1{,}567$ (January 1992) at the 5% significance level, and $k = 1{,}766$ (December 1993) at the 1% significance level. The asymptotic critical values are 2.9435 at the 10% significance level, 3.6633 at the 5% significance level, and 5.2933 at the 1% significance level, based on Corollary 2.1.
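The three critical values quoted above coincide with the upper quantiles of the limiting distribution function $G(z) = \exp(-2\exp(-z))$ that appears in the corollaries of Section 2. Solving $G(z) = 1 - \alpha$ gives $z_\alpha = -\log\big(-\tfrac{1}{2}\log(1-\alpha)\big)$. The short check below is a sketch under the assumption that the reported values are exactly these quantiles; the function name is illustrative.

```python
import math


def critical_value(alpha):
    """Upper-alpha quantile of G(z) = exp(-2 exp(-z)), i.e. G(z) = 1 - alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -math.log(-0.5 * math.log(1.0 - alpha))


for level in (0.10, 0.05, 0.01):
    print(f"alpha = {level:.2f}: z = {critical_value(level):.4f}")
```

A normalized test statistic is then compared with `critical_value(alpha)`; for instance, the value 4.5145 in Table 4 exceeds the 5% critical value but not the 1% one, in line with the reported P-value.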
Thus, the potential change points are $k = 1{,}335$ (July 1987), $k = 1{,}567$ (January 1992), and $k = 1{,}766$ (December 1993) at the 5% significance level. At the 1% significance level, we find a change point at $k = 1{,}766$ (December 1993). This implies that there were significant local structural changes in the short-term interest rates at the corresponding points for the sample period (Table 4).

Example 4.2 (Stock closing prices). This example studies the possible changes and jumps in stock closing prices of IBM, Motorola Inc., etc., from January 2, 1991 to December 31, 1999. These companies are listed in Table 5.

Table 5
Companies studied to check changes in stock prices in Example 4.2

No.   Company name                          Abbreviation
1     Berry Petroleum Co.                   BPC
2     International Business Machines C     IBM
3     Merrill Lynch & Co. INC.              MER
4     Motorola INC.                         MOT
5     Northeast Utilities                   NU
6     Philips N. V.                         PHG
7     Royal Gold INC.                       RGLD

Table 6
Statistics of other companies to check changes in the stock closing prices in Example 4.2

No.   Company   Possible change points   Corresponding dates        P-values
1     BPC       1,501                    12/3/96                    <0.001
2     IBM       1,602                    4/30/97                    <0.001
3     MER       1,631                    6/11/97                    <0.001
4     MOT       457, 1,620               10/19/92, 5/27/97          <0.001
5     NU        1,367, 1,682             5/24/96, 8/22/96           <0.001
6     PHG       1,641                    6/25/97                    <0.001
7     RGLD      1,133, 1,554             4/8/96, 6/6/94, 2/20/97    <0.001

To induce approximate stationarity of the observed series, we consider the first difference of the logarithmically transformed stock closing prices of the companies. Then, we use the wavelet method to check which dates are the change points in the stock closing prices of these companies. After transforming the original data, no trend in $\{Y_t\}$ is discernible, and the sample autocorrelation function is not significantly different from the Kronecker delta function. The wavelet $\psi(x)$ taken here is the same as that in Example 3.1.
The kernel functions in the estimation of $T(x)$, $\sigma(x)$, and $f(x)$ are the same as the corresponding ones in Example 4.1. Our analysis using the tests $\hat M_n^{*}(k)$ and $\hat M_n(k)$ shows that the possible change points in the volatility of the IBM stock closing prices are $k_1 = 453$ and $k_2 = 1{,}619$, at any usual significance level (5%, 1%). The corresponding dates are October 13, 1992 and May 24, 1997, respectively. The statistics of the other companies are summarized in Table 6.

5. Summary and conclusions

Dynamic nonparametric models have been introduced to fit time series data (short-term interest rates and stock returns) under general assumptions. The theory of wavelet coefficients permits the decomposition of the estimated function into localized, oscillating components. Hence, the wavelet method is an ideal and powerful tool to study localized changes, such as jumps and sharp cusps in time series. Detection by wavelets is a multi-resolution (time–frequency) technique. The multi-resolution approach has local adaptivity and hence has advantages over existing smoothing methods based on a fixed spatial scale (even on a random scale), such as the Fourier series method and the kernel methods (Müller, 1992).

In this paper, we proposed test statistics based on the integral estimator and the discretized estimator of wavelet coefficients, which allow for dependent observations and serially correlated errors in dynamic nonparametric models. Furthermore, we derived the asymptotic distribution of our test statistics and established the consistency of the estimators of jump points. These estimators are also shown to converge at the rate $O_P(n^{-1})$ as $n$ goes to infinity, which is the minimax rate and the optimal rate of convergence in the sense of convergence in probability. Finally, we identified several significant jumps in the short rates and selected stock prices for the sample period.

The empirical wavelet coefficients can be extended to other nonparametric and parametric models, such as the threshold autoregression model, the threshold ARCH model, and multivariate additive models. Moreover, the wavelet procedure can be extended to the multivariate drift-plus-diffusion model. For example, we can treat additive models by a similar procedure, where the drift-plus-diffusion model may be of the form

\[ y_t = f_1(X_{1t}) + f_2(X_{2t}) + \cdots + f_p(X_{pt}) + \sigma(X_t)\,\varepsilon_t, \]

where $X_t = (X_{1t}, \ldots, X_{pt})^{\tau}$ is a vector of several factors, and $\tau$ denotes the transpose of the vector.

Acknowledgments

We owe the Editor Cheng Hsiao and anonymous referees many thanks for their guidance and suggestions for revising this paper.

Appendix A

The proofs are very long and complex; we thus provide outlines only. For convenience, let $k = [2^{J_n} z]$ for $z \in (0, 1)$. To obtain the desired results, the following assumptions will be convenient. Let $U$ be a non-empty open neighborhood of the origin of $R$ with $[a, b] \subset U$.

(A.1) The probability density of $X_1$ is bounded away from zero and infinity on some open subset $U$. That is, there exists some positive constant $M_1$ such that $M_1^{-1} \le f(x) \le M_1$ for $x \in U$.
(A.2) Let $f(y \mid x)$ be the conditional density function of $X_1$ given $X_l = x$; there exists some positive constant $M_2$ such that $f(x_1 \mid x) \le M_2$ for $x_1, x \in U$.
(A.3) Let $\{X_i,\ i = 1, 2, \ldots, n\}$ be a sequence of stationary and strongly mixing random vectors, with mixing coefficient $\alpha(\cdot)$ satisfying $\alpha(k) = O(k^{-r})$ for some large $r > 0$.
(A.4) $f(x)$ has a bounded second derivative, and $\sigma(x)$ and $C(x)$ are three times continuously differentiable on $U$.
(A.5) Let $\{\varepsilon_i,\ i = 1, 2, \ldots\}$ be a sequence of i.i.d. random variables such that, for each $i$, $\varepsilon_i$ is independent of $\{(X_j, Y_{j-1}),\ j \le i\}$. Meanwhile, $E|X|^R < \infty$ and $E|Y|^R < \infty$ for some $R > 2$.

Remark A.1. Assumptions (A.1) and (A.2) are necessary for the kernel estimation with dependent data.
(A.3) is to simplify proofs. It can be weakened to $\alpha(k) = O(k^{-\nu})$ for some $\nu > 0$. Many sufficient conditions have been proposed to ensure that $X_t$ is strictly stationary and geometrically ergodic. Auestad and Tjøstheim (1990) provided the following conditions:

\[ \lim_{|x| \to \infty} \frac{|T(x) - \beta x|}{|x|} = 0, \qquad \lim_{|x| \to \infty} \frac{\sigma(x)}{|x|} = 0, \]

where $|\beta| < 1$ and $\sigma \ge c > 0$. Assumption (A.4) is to meet the continuity requirement for kernel smoothing. (A.5) is also made for simplicity of proofs (see Fan and Yao, 1998). The existence of finite moments is sufficient. When $x$ is a design point, we only need Assumptions (A.1) and (A.4), and the sequence $\{X_i,\ i = 1, 2, \ldots, n\}$ should be changed to the sequence $\{\varepsilon_i,\ i = 1, 2, \ldots, n\}$ in Assumption (A.5).

In the proof of these theorems, we need the following results.

Lemma A.1. Suppose that conditions (A.1)–(A.2) are satisfied. Then for a sufficiently large value of $B$, as $n \to \infty$,

\[ P\left\{ \sup_{a \le x \le b} \left| (n h_n)^{-1} \sum_{j=1}^{n} K_h(X_j - x) - E f_n(x) \right| > B (n h_n)^{-1/2} \log^{1/2} n \right\} \to 0, \]

where $K_h(\cdot) = K(\cdot/h_n)$ and $f_n(x) = (1/(n h_n)) \sum_{j=1}^{n} K_h(X_j - x)$.

Proof. It follows from the results of Lemma 3.6 in Zhou and Liang (1999) and the continuity argument, or from arguments similar to those of Theorem 5 in Masry (1996). □

Lemma A.2. (a) Assume that $\psi(x)$ satisfies Assumptions (I), (II), and (III), and that $C(x)$ is a twice continuously differentiable function. Then, uniformly for $k$ in $D_n$,

\[ \int_a^b \psi^{per}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x)[C(X_j) - C(x)]}{\sum_{j=1}^{n} K_h(X_j - x)}\,dx = O_P(2^{-J_n/2} h_n c_n), \]

where $c_n = h_n^2 + (\log n)^{1/2}/(n h_n)^{1/2}$, and

\[ \int_a^b \psi^{per}_{J_n,k}(x)\, C(x)\,dx = O(2^{-5J_n/2}). \]

(b) Furthermore, for the discretized empirical wavelet coefficient, we have uniformly for $k$ in $D_n$,

\[ \frac{b-a}{N} \sum_{i=1}^{N} \psi^{per}_{J_n,k}(w_i)\, \frac{\sum_{j=1}^{n} K_h(X_j - w_i)[C(X_j) - C(w_i)]}{\sum_{j=1}^{n} K_h(X_j - w_i)} = O_P\left(\frac{2^{J_n/2}}{N}\right) + O_P(2^{-J_n/2} h_n c_n) \]

and

\[ \frac{b-a}{N} \sum_{i=1}^{N} \psi^{per}_{J_n,k}(w_i)\, C(w_i) = O\left(\frac{2^{J_n/2}}{N}\right) + O(2^{-5J_n/2}). \]

Proof.
It is easy to show the second formula in part (a) of Lemma A.2. Write $r_n(x) = (1/(n h_n)) \sum_{j=1}^{n} K_h(X_j - x)(C(X_j) - C(x))$ and $f_n(x) = (1/(n h_n)) \sum_{j=1}^{n} K_h(X_j - x)$. Then

\[ \int_a^b \psi^{per}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x)[C(X_j) - C(x)]}{\sum_{j=1}^{n} K_h(X_j - x)}\,dx = \int_a^b \psi^{per}_{J_n,k}(x)\, \frac{r_n(x)}{f_n(x)}\,dx = \int_a^b \psi^{per}_{J_n,k}(x)\, \frac{r_n(x)}{f(x)}\,dx + O_P(2^{-J_n/2} h_n c_n) \quad \text{a.s.}, \]

where $C(x)$ is a twice continuously differentiable function. By Taylor's expansion and Lemma B.2, we obtain

\[ r_n(x) = h_n^2 f'(x) C'(x) \int_{-c}^{c} x^2 K(x)\,dx + O(h_n c_n) \quad \text{a.s.}, \]

uniformly for $x \in [a, b]$, where $c_n = h_n^2 + (\log n)^{1/2}/(n h_n)^{1/2}$. Hence,

\[ \int_a^b \psi^{per}_{J_n,k}(x)\, \frac{r_n(x)}{f(x)}\,dx = O(2^{-J_n/2} h_n c_n) \quad \text{a.s.} \]

Similarly, we can prove part (b). Hence, the proof of Lemma A.2 is complete. □

Lemma A.3. Assume that $K(x)$ is a kernel function with finite support $[-c, c]$ and $h_n \to 0$ as $n \to \infty$. Let $\tau_l$, $l = 1, 2, \ldots, p$, be jump points of the function $T(x)$, and denote the corresponding jump sizes by $d_l$, $l = 1, 2, \ldots, p$. Then, for the integral estimator of the wavelet coefficient, we have

\[ \int_a^b \psi^{per}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x) \sum_{l=1}^{p} d_l I(\tau_l \le X_j \le b)}{\sum_{j=1}^{n} K_h(X_j - x)}\,dx = 2^{-J_n/2} (b-a)^{1/2} d_l \int_{-1}^{A} \psi(x)\,dx, \tag{A.1} \]

uniformly for $k$ in $I(\tau_l, 2^{-J_n}(b-a))$, and

\[ \int_a^b \psi^{per}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x) \sum_{l=1}^{p} d_l I(\tau_l \le X_j \le b)}{\sum_{j=1}^{n} K_h(X_j - x)}\,dx = 0, \tag{A.2} \]

uniformly for $k \notin \bigcup_{l=1}^{p} L_l(A)$.

For the discretized estimator of the wavelet coefficient, we have

\[ \frac{b-a}{N} \sum_{i=1}^{N} \psi^{per}_{J_n,k}(w_i)\, \frac{\sum_{j=1}^{n} K_h(X_j - w_i) \sum_{l=1}^{p} d_l I(\tau_l \le X_j \le b)}{\sum_{j=1}^{n} K_h(X_j - w_i)} = 2^{-J_n/2} (b-a)^{1/2} d_l \int_{-1}^{A} \psi(x)\,dx + O_P\left(\frac{2^{J_n/2}}{N}\right), \tag{A.3} \]

uniformly for $k$ in $I(\tau_l, 2^{-J_n}(b-a))$, and

\[ \frac{b-a}{N} \sum_{i=1}^{N} \psi^{per}_{J_n,k}(w_i)\, \frac{\sum_{j=1}^{n} K_h(X_j - w_i) \sum_{l=1}^{p} d_l I(\tau_l \le X_j \le b)}{\sum_{j=1}^{n} K_h(X_j - w_i)} = O_P\left(\frac{2^{J_n/2}}{N}\right), \tag{A.4} \]

uniformly for $k \notin \bigcup_{l=1}^{p} L_l(A)$, where

\[ L_l(A) = \left\{ k : \left| a + \frac{k(b-a)}{2^{J_n}} - \tau_l \right| < 2^{-J_n} A (b-a) \right\}, \]

in which $A$ is the support point.

Proof of Lemma A.3.
By Lemma B.1 in Appendix B, we have for large $n$,

\[ \frac{d_l \sum_{j=1}^{n} K_h(X_j - x) I(\tau_l \le X_j \le b)}{\sum_{j=1}^{n} K_h(X_j - x)} = d_l I(\tau_l \le x \le b). \tag{A.5} \]

Hence it follows that for $k \in I(\tau_l, 2^{-J_n}(b-a))$,

\[ \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x) \sum_{l=1}^{p} d_l I(\tau_l \le X_j \le b)}{\sum_{j=1}^{n} K_h(X_j - x)}\, dx = (\delta_n (b-a))^{1/2} \sum_{l=1}^{p} d_l \int_{((\tau_l - a)/(b-a) - \gamma_k)/\delta_n}^{(1-\gamma_k)/\delta_n} \psi(x)\, dx = (\delta_n (b-a))^{1/2} d_l \int_{1}^{A} \psi(x)\, dx, \]

where $\delta_n = 2^{-J_n}$ and $\gamma_k = 2^{-J_n} k$, in which we have used

\[ \frac{\delta_n^{-1}}{b-a} \Big| a + \frac{k}{2^{J_n}}(b-a) - \tau_l \Big| < 1 \]

for $k \in I(\tau_l, 2^{-J_n}(b-a))$, and the fact that for $i \ne l$,

\[ \frac{\delta_n^{-1}}{b-a} \Big( a + \frac{k}{2^{J_n}}(b-a) - \tau_i \Big) = \frac{\delta_n^{-1}}{b-a} \Big( a + \frac{k}{2^{J_n}}(b-a) - \tau_l \Big) + \frac{\delta_n^{-1}}{b-a} (\tau_l - \tau_i) \]

tends to infinity in absolute value as $n \to \infty$. This implies (A.1). For all $k \notin \bigcup_{l=1}^{p} L_l(A)$, we have

\[ \frac{1}{\delta_n} \Big| \frac{\tau_l - a}{b-a} - \gamma_k \Big| \to \infty \]

as $n \to \infty$. This implies (A.2). Similarly, we can then prove (A.3) and (A.4). This completes the proof of Lemma A.3. □

Lemma A.4. (a) Assume that the conditions (A.1)–(A.5) in Appendix A are satisfied. Then we have

\[ \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x) \sigma(X_j) \varepsilon_j}{\sum_{j=1}^{n} K_h(X_j - x)}\, dx = O_P(n^{-1/2}). \]

(b) Assume that the conditions (A.1)–(A.5) in Appendix A and Assumption J(c) are satisfied. Then

\[ \frac{b-a}{N} \sum_{l=1}^{N} \psi^{\mathrm{per}}_{J_n,k}(w_l)\, \frac{\sum_{j=1}^{n} K_h(X_j - w_l) \sigma(X_j) \varepsilon_j}{\sum_{j=1}^{n} K_h(X_j - w_l)} = O_P(n^{-1/2}). \]

Proof. This follows by logic similar to that of the proof of Lemma A.3. □

Theorem A.1. Assume that $\{X_t;\ t = 1, 2, \ldots\}$ is a sequence of strictly stationary random variables with mixing coefficient $\alpha(k) = O(k^{-r})$ for some constant $r > 0$, and has a continuously differentiable density function $f(x)$ with $0 < c \le f(x) \le C < \infty$ for $x \in U$. The conditional density functions $f(x_0 | x_l)$ ($l \ne 0$) are bounded. $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ is a sequence of independent and identically distributed random variables with $E\varepsilon_i = 0$ and $E\varepsilon_i^2 = \sigma_0^2$, and for each $t$, $\varepsilon_t$ is independent of $\{X_s;\ s \le t\}$. $K(\cdot)$ is a kernel function with finite support $[-c, c]$ and $\int K(x)\, dx = 1$, $\int x K(x)\, dx = 0$. $\sigma(\cdot)$ and $K(\cdot)$ are continuously differentiable functions. Let $h_n \to 0$, $nh_n \to \infty$ and $N \to \infty$ as $n \to \infty$.
Then there exists a sequence of the standard Wiener process $\{W(t);\ t > 0\}$ such that

\[ \sup_{0 \le t \le 1} \Big| \sum_{j=1}^{n} A_n(X_j, t) \varepsilon_j - \sqrt{n}\, \sigma_0 W_n(E A_n^2(X_j, t)) \Big| = O\Big( n^{1/4} (\log n)^{3/4} + \frac{1}{h_n} + \frac{n}{N h_n} \Big) \quad \text{a.s.} \]

Furthermore, we have

\[ \sup_{0 \le t \le 1} \Big| \sum_{j=1}^{n} A_n(X_j, t) \varepsilon_j - \sqrt{n}\, \sigma_0 W_n(\rho(t)) \Big| = O\Big( n^{1/4} (\log n)^{3/4} + n^{1/2} h_n (\log n)^{1/2} + \frac{n}{N h_n} + \frac{1}{h_n} \Big) \quad \text{a.s.}, \]

where

\[ A_n(X_j, t) = \frac{(b-a)^{1/2}}{N h_n} \sum_{l=1}^{[Nt]} \frac{K_h(X_j - w_l)}{f(w_l)}\, \sigma(X_j), \qquad \rho(t) = \int_0^t \frac{\sigma^2((b-a)x + a)}{f((b-a)x + a)}\, dx \]

for $0 \le t \le 1$, in which $w_l = a + (l/N)(b-a)$.

This result can be proved by a strong approximation of the empirical process. Its proof is very lengthy, and is thus placed in Appendix B.

Proof of Theorem 2.1. We only prove part (b) of Theorem 2.1. Similar arguments apply to part (a) of Theorem 2.1. Write

\[ W^{(i)}_{J_n,k} = \frac{b-a}{N} \sum_{l=1}^{N} \psi^{\mathrm{per}}_{J_n,k}(w_l)\, \frac{\sum_{j=1}^{n} K_h(X_j - w_l) \sigma(X_j) \varepsilon_j}{\sum_{j=1}^{n} K_h(X_j - w_l)}. \]

From Lemmas A.2 and A.3, it remains to prove that $W^{(i)}_{J_n,k}$ has the same asymptotic distribution as in Theorem 2.1, since $W^{(i)}_{J_n,k}$ and $W^{(T)}_{J_n,k}$ have the same asymptotic distribution functions. By a simple transformation from Lemma A.2, we have

\[ W^{(i)}_{J_n,k} = \frac{1}{n h_n} \sum_{j=1}^{n} G_{n,j}\, \sigma(X_j) \varepsilon_j, \tag{A.6} \]

where

\[ G_{n,j} = \frac{b-a}{N} \sum_{i=1}^{N} \frac{K_{i,j}\, \eta_n(w_i)}{f(w_i)}, \tag{A.7} \]

where $K_{i,j} = K_h(X_j - w_i)$ and $\eta_n(x) = \psi^{\mathrm{per}}_{J_n,k}(x)$, $q_i = (w_i - a)/(b-a) = i/N$, $w_i = a + q_i(b-a)$ and $\gamma_k = 2^{-J_n} k$, where $k = [2^{J_n} z]$ for $0 < z < 1$. Let $L_n(z)$ denote the term on the right-hand side of (A.6). Hence we have

\[ L_n(z) = \frac{b-a}{n N h_n} \sum_{j=1}^{n} \sum_{i=1}^{N} \frac{K_{i,j}\, \eta_n(w_i)}{f(w_i)}\, \sigma(X_j) \varepsilon_j = \frac{b-a}{n N h_n} \sum_{j=1}^{n} \sum_{i=1}^{N} \frac{K_{i,j}\, 2^{J_n/2} \psi(2^{J_n}(q_i - \gamma_k))\, \sigma(X_j) \varepsilon_j}{(b-a)^{1/2} f(w_i)}. \tag{A.8} \]

To use the results of the strong approximation for the partial sum of an independent sequence, we write

\[ a_{ij} = \frac{(b-a)^{1/2}}{N h_n f(w_i)}\, K_h(X_j - w_i)\, \sigma(X_j) \varepsilon_j, \]

and $b_i = \sum_{j=1}^{n} a_{ij}$. Let $\zeta_{n,0} = 0$ and $\zeta_{n,i} = \sum_{l=1}^{i} b_l$.
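Hence, with $a_{ij}$ as defined above, we obtain that

\[ \zeta_{n,i} = \sum_{l=1}^{i} \sum_{j=1}^{n} a_{lj} = \sum_{j=1}^{n} \Big[ \frac{(b-a)^{1/2}}{N h_n} \sum_{l=1}^{i} \frac{K_h(X_j - w_l)}{f(w_l)} \Big] \sigma(X_j) \varepsilon_j = \sum_{j=1}^{n} A_n(X_j, q_i) \varepsilon_j, \tag{A.9} \]

where

\[ A_n(X_j, q_i) = \frac{(b-a)^{1/2}}{N h_n} \sum_{l=1}^{i} \frac{K_h(X_j - w_l)}{f(w_l)}\, \sigma(X_j). \]

By simple calculation, we can obtain the mean of $A_n^2(X_j, q_i)$, which is, for large enough $n$,

\[ E A_n^2(X_j, q_i) = E\Big( \frac{(b-a)^{1/2}}{N h_n} \sum_{l=1}^{i} \frac{K_h(X_j - w_l)}{f(w_l)}\, \sigma(X_j) \Big)^2 = E\Big( \frac{1}{h_n (b-a)^{1/2}} \int_a^{w_i} \frac{K_h(X_j - x)}{f(x)}\, dx\, \sigma(X_j) + O\Big(\frac{1}{N}\Big) \Big)^2 = \int_0^{q_i} \frac{\sigma^2(x(b-a) + a)}{f(x(b-a) + a)}\, dx + O\Big( \Big(\frac{1}{N h_n}\Big)^2 \Big) + O(h_n^2). \tag{A.10} \]

Write

\[ \rho(t) = \int_0^t \frac{\sigma^2(x(b-a) + a)}{f(x(b-a) + a)}\, dx \]

for $0 \le t \le 1$; otherwise, $\rho(t) = 0$. Assuming that $E\varepsilon_j = 0$ and $E(\varepsilon_j^2) = \sigma_0^2$, by Theorem A.1, we have

\[ \sup_{0 \le t \le 1} \Big| \sum_{j=1}^{n} A_n(X_j, t) \varepsilon_j - \sqrt{n}\, \sigma_0 W_n(\rho(t)) \Big| = O\Big( n^{1/4} (\log n)^{3/4} + n^{1/2} h_n (\log n)^{1/2} + \frac{n}{N h_n} + \frac{1}{h_n} \Big) \quad \text{a.s.}, \tag{A.11} \]

where

\[ A_n(X_j, t) = \frac{(b-a)^{1/2}}{N h_n} \sum_{l=1}^{[Nt]} \frac{K_h(X_j - w_l)}{f(w_l)}\, \sigma(X_j), \]

in which $[a]$ denotes the integer part of a real number $a$. Obviously,

\[ \zeta_{n,i} = \sum_{j=1}^{n} A_n(X_j, q_i) \varepsilon_j. \]

Hence, from (A.8), we have

\[ L_n(z) = \frac{1}{n \delta_n^{1/2}} \sum_{i=1}^{N} \psi\Big( \frac{q_i - \gamma_k}{\delta_n} \Big) [\zeta_{n,i} - \zeta_{n,i-1}], \]

where $\delta_n = 2^{-J_n}$. Hence, by Abel's summation and $\zeta_{n,0} = 0$, we have

\[ L_n(z) = \frac{1}{n \delta_n^{1/2}} \sum_{i=1}^{N} (\zeta_{n,i} - \zeta_{n,i-1})\, \psi\Big( \frac{q_i - \gamma_k}{\delta_n} \Big) = \frac{1}{n \delta_n^{1/2}}\, \psi\Big( \frac{q_N - \gamma_k}{\delta_n} \Big) \zeta_{n,N} - \frac{1}{n \delta_n^{1/2}} \sum_{i=1}^{N-1} \Big[ \psi\Big( \frac{q_{i+1} - \gamma_k}{\delta_n} \Big) - \psi\Big( \frac{q_i - \gamma_k}{\delta_n} \Big) \Big] \zeta_{n,i}. \tag{A.12} \]

For the Wiener process, it is known that

\[ \sup_{0 \le x \le 1} |W(\rho(x + \delta_n)) - W(\rho(x))| = O((\delta_n |\log \delta_n|)^{1/2}) \quad \text{a.s.}, \]

where $\delta_n$ is any small number (cf. Csörgő and Révész, 1981). Using this property and the bounded variation of $\psi(x)$, we have

\[ \sum_{i=1}^{N-1} \Big[ \psi\Big( \frac{q_{i+1} - x}{\delta_n} \Big) - \psi\Big( \frac{q_i - x}{\delta_n} \Big) \Big] W(\rho(q_i)) = \int_0^1 W(\rho(t))\, d\psi\Big( \frac{t - x}{\delta_n} \Big) + O\Big( \Big( \frac{\log N}{N} \Big)^{1/2} \Big) \quad \text{a.s.}, \tag{A.13} \]

uniformly for $x \in [0, 1]$. It follows from (A.12) and (A.13) that

\[ L_n(z) = \frac{\sigma_0}{(n \delta_n)^{1/2}} \int_0^1 \psi\Big( \frac{x - \gamma_k}{\delta_n} \Big)\, dW(\rho(x)) + O\Big( \frac{\Theta_n}{\delta_n^{1/2}} + \Big( \frac{\log N}{N n \delta_n} \Big)^{1/2} \Big) \quad \text{a.s.}, \]

uniformly for $0 \le \gamma_k \le 1$, where

\[ \Theta_n = \Big( \frac{\log n}{n} \Big)^{3/4} + h_n \Big( \frac{\log n}{n} \Big)^{1/2} + \frac{1}{N h_n} + \frac{1}{n h_n}. \]

Let

\[ Y_n(\gamma_k) = \sigma_0 \delta_n^{-1/2} \int_0^1 \psi\Big( \frac{x - \gamma_k}{\delta_n} \Big)\, dW(\rho(x)). \]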
From the definition of the Wiener process, it is easy to show by calculating its moments that

\[ W(\rho(y)) := \int_0^y g(s)\, dW(s) \quad \text{for } 0 \le y \le 1, \tag{A.14} \]

where $:=$ means that the arguments on both of its sides have the same distribution, and for $0 < x < 1$,

\[ g^2(x) = \frac{(b-a)\, \sigma^2(x(b-a) + a)}{f(x(b-a) + a)}; \]

otherwise, $g^2(x) = 0$. Hence,

\[ Y_n(y) = \frac{\sigma_0}{\delta_n^{1/2}} \int_0^1 \psi\Big( \frac{x - y}{\delta_n} \Big) g(x)\, dW(x). \]

Let $Y_{n1}(x) = Y_n(x)/g(x)$ and

\[ Y_{n0}(y) = \sigma_0 \delta_n^{-1/2} \int_0^1 \psi\Big( \frac{x - y}{\delta_n} \Big)\, dW(x). \]

Following the method of Härdle (1989), we have

\[ \sup_{x \in (0,1)} |Y_{n0}(x) - Y_{n1}(x)| = O_p(\delta_n^{1/2}). \tag{A.15} \]

Hence, the proof of Theorem 2.1 is complete, since it follows from (A.8)–(A.15) and Lemmas A.2 and A.3 that $\sqrt{n} \sup_{0 \le x \le 1} |M_n(x)|$ and $\sup_{0 \le x \le 1} |Y_{n0}(x)|$ have the same asymptotic distribution functions. □

Proof of Corollary 2.1. It is straightforward from Theorem 2.1 and Lemma B.3 (see Bickel and Rosenblatt, 1973). □

Proof of Theorem 2.2. Note that $U^{(T)}_{J_n,k} = U^{(T)}_{J_n}(k)$ can be decomposed into two parts:

\[ U^{(T)}_{J_n,k} = U^{(i)}_{J_n,k} + U^{(ii)}_{J_n,k}, \tag{A.16} \]

where

\[ U^{(i)}_{J_n,k} = \int_a^b \eta_{J_n,k}(x)\, \frac{\sum_{l=1}^{n} K_h(X_l - x) \sigma(X_l) \varepsilon_l}{\sum_{l=1}^{n} K_h(X_l - x)}\, dx, \tag{A.17} \]

\[ U^{(ii)}_{J_n,k} = \int_a^b \eta_{J_n,k}(x)\, \frac{\sum_{l=1}^{n} K_h(X_l - x) T(X_l)}{\sum_{l=1}^{n} K_h(X_l - x)}\, dx, \tag{A.18} \]

with $\eta_{J_n,k}(x) = \psi^{\mathrm{per}}_{J_n,k}(x)$. From Lemmas A.2 and A.3, we have

\[ U^{(ii)}_{J_n,k} = 2^{-J_n/2} d_l (b-a)^{1/2} \int_{1}^{A} \psi(x)\, dx + O_P(2^{-5J_n/2} + 2^{-J_n/2} h_n c_n), \]

uniformly for $k \in I(\tau_l, 2^{-J_n} A(b-a))$, and

\[ U^{(ii)}_{J_n,k} = O_P(2^{-5J_n/2} + 2^{-J_n/2} h_n c_n) \]

for $k \notin \bigcup_{l=1}^{p} I(\tau_l, 2^{-J_n/2} A(b-a))$. From Lemma A.4, it follows that

\[ U^{(i)}_{J_n,k} = O_P(n^{-1/2}) \]

for all $k \in D_n$. This implies that Theorem 2.2 holds for the integral estimation of the wavelet coefficient. Similarly, we can prove Theorem 2.2 for the discretized estimation of the wavelet coefficient. □

Proof of Theorem 2.3. The proof is straightforward from Theorem 2.2. □

Proof of Theorem 2.4.
As $T(x)$ has a jump at $\tau_l$ and $T(x)$ is differentiable at all points except $\tau_l$, it follows from arguments similar to those in Lemmas A.2 and A.3 that

\[ \Big| \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x) T(X_j)}{\sum_{j=1}^{n} K_h(X_j - x)}\, dx \Big| \le C 2^{-3J_n/2}, \]

for all $k \notin I(\tau_l, 2^{-J_n} A(b-a))$, where $C$ is a generic constant whose value may change from line to line. Hence, by Lemma A.3 we have

\[ \Big| \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x) T(X_j)}{\sum_{j=1}^{n} K_h(X_j - x)}\, dx \Big| \ge C 2^{-J_n/2}, \]

for $k \in I(\tau_l, 2^{-J_n} A(b-a))$. From the same arguments as in Lemma A.2, we can easily show that

\[ \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x) \sigma(X_j) \varepsilon_j}{\sum_{j=1}^{n} K_h(X_j - x)}\, dx = O_P\Big( \Big( \frac{\delta_n}{n h_n} \Big)^{1/2} \Big), \]

for all $k = [2^{J_n} z] \in D_n$, $0 < z < 1$. In fact, Taylor's formula implies that

\[ \sum_{j=1}^{n} K_h(X_j - x) \sigma(X_j) \varepsilon_j = \sigma(x) \sum_{j=1}^{n} K_h(X_j - x) \varepsilon_j + \sigma'(x) \sum_{j=1}^{n} K_h(X_j - x)(X_j - x) \varepsilon_j + \frac{1}{2} \sum_{j=1}^{n} \sigma''(\xi_j) K_h(X_j - x)(X_j - x)^2 \varepsilon_j, \]

where $\xi_j$ lies between $X_j$ and $x$. By Lemma B.2 (see Proposition 1 and Lemma 1 of Xia, 1998), we have

\[ \sup_{x \in L} \Big| \sum_{j=1}^{n} K_h(X_j - x) \varepsilon_j \Big| = O_P((n h_n)^{1/2}), \]

\[ \sup_{x \in L} \Big| \sum_{j=1}^{n} K_h(X_j - x)(X_j - x) \varepsilon_j \Big| = O_P((n h_n)^{1/2} h_n), \]

\[ \sup_{x \in L} \Big| \sum_{j=1}^{n} K_h(X_j - x)(X_j - x)^2 \varepsilon_j \Big| = O_P((n h_n)^{1/2} h_n^2), \]

where $L = [a - \delta_0, b + \delta_0]$ for some $\delta_0 > 0$. Hence,

\[ \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^{n} K_h(X_j - x) \sigma(X_j) \varepsilon_j}{\sum_{j=1}^{n} K_h(X_j - x)}\, dx = O_P\Big( \Big( \frac{\delta_n}{n h_n} \Big)^{1/2} \Big). \]

These inequalities imply that

\[ \max\{ |U^{(T)}_{J_n,k}|;\ k \in I(\tau_l, 2^{-J_n} A(b-a)) \} \ge C \Big( \delta_n^{1/2} - \Big( \frac{\delta_n}{n h_n} \Big)^{1/2} \Big) \tag{A.19} \]

and

\[ \max\{ |U^{(T)}_{J_n,k}|;\ k \in D_n - I(\tau_l, 2^{-J_n} A(b-a)) \} \le C \Big( \delta_n^{3/2} + \Big( \frac{\delta_n}{n h_n} \Big)^{1/2} \Big). \tag{A.20} \]

As

\[ \frac{\delta_n^{1/2} - (n h_n \delta_n^{-1})^{-1/2}}{\delta_n^{3/2} + (n h_n \delta_n^{-1})^{-1/2}} \to \infty, \]

combining (A.19) with (A.20), we obtain that

\[ \max\{ |U^{(T)}_{J_n,k}|;\ k \in D_n \} = \max\{ |U^{(T)}_{J_n,k}|;\ k \in I(\tau_l, 2^{-J_n/2} A(b-a)) \}. \tag{A.21} \]

In fact, $U^{(T)}_{J_n,k}$ can be replaced by $W^{(T)}_{J_n,k}$ in (A.21). This implies that

\[ \hat k_0^{(i)} \in I(\tau_l, 2^{-J_n} A(b-a)), \tag{A.22} \]

where $\hat k_0^{(i)}$ is one of $\hat k_l^{(i)}$, $\hat k_l^{(d)}$, $\hat k_l^{(ni)}$ and $\hat k_l^{(nd)}$.
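To make (A.21) and (A.22) concrete, the following is a minimal numerical sketch, not the authors' implementation. It assumes a Haar wavelet on $[a, b] = [0, 1]$ in place of the paper's periodic wavelet, an Epanechnikov kernel, a midpoint design grid, and a hypothetical regression function with a single jump of size 2 at `tau = 0.4`; all function names and parameter values are illustrative. The index of the largest discretized empirical wavelet coefficient is mapped back to a location estimate, which should fall within $2^{-J}$ of the jump.

```python
import numpy as np

def epanechnikov(u):
    """Compactly supported kernel on [-1, 1] (an assumed choice of K)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def nw_smoother(x_obs, y_obs, grid, h):
    """Kernel-ratio (Nadaraya-Watson) estimate of T at each grid point."""
    w = epanechnikov((x_obs[None, :] - grid[:, None]) / h)
    return (w @ y_obs) / w.sum(axis=1)

def haar(u):
    """Haar mother wavelet supported on [0, 1) (assumed, not the paper's psi)."""
    first = np.where((u >= 0.0) & (u < 0.5), 1.0, 0.0)
    second = np.where((u >= 0.5) & (u < 1.0), 1.0, 0.0)
    return first - second

def wavelet_coeffs(t_hat, grid, J):
    """Discretized coefficients (1/N) * sum_i psi_{J,k}(w_i) * T_hat(w_i)."""
    k = np.arange(2**J)
    psi = 2.0 ** (J / 2) * haar(2**J * grid[None, :] - k[:, None])
    return psi @ t_hat / grid.size

rng = np.random.default_rng(0)
n, N, J, h, tau = 2000, 1024, 4, 0.02, 0.4

# Hypothetical data: smooth trend plus one jump of size 2 at tau, plus noise.
x_obs = rng.uniform(0.0, 1.0, n)
y_obs = x_obs + 2.0 * (x_obs >= tau) + 0.2 * rng.standard_normal(n)

grid = (np.arange(N) + 0.5) / N          # midpoint design points w_i
t_hat = nw_smoother(x_obs, y_obs, grid, h)
coeffs = wavelet_coeffs(t_hat, grid, J)

k_hat = int(np.argmax(np.abs(coeffs)))   # index of the largest coefficient
tau_hat = (k_hat + 0.5) / 2**J           # centre of the selected dyadic cell
```

In this setup the coefficient of the dyadic cell containing the jump is of order $2^{-J/2}$ times the jump size, whereas the smooth trend and the noise contribute much smaller terms, which is the separation that (A.19) and (A.20) formalize.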
Hence, we have

\[ |\hat\tau_l^{(i)} - \tau_l| = \Big| a + \frac{\hat k_l^{(i)}}{2^{J_n}} (b-a) - \tau_l \Big| < 2^{-J_n} A(b-a). \tag{A.23} \]

Similarly, we can prove that (A.23) also holds for the other estimators $\hat\tau_l^{(d)}$. This completes the proof of Theorem 2.4. □

Proof of Theorem 2.5. Let $\tau_l$, $l = 1, 2, \ldots, p$ denote jump points $\tau_l$, $l = 1, 2, \ldots, L$ of $T(x)$. Similarly, $\hat\tau_l$ denotes the corresponding estimators. Let $g(x) = f^{1/2}(x)\, \sigma(x)$ for $x \in [a, b]$. Using arguments similar to those for (A.21), we obtain that

\[ \max\{ g(a + k(b-a)\delta_n) |U^{(T)}_{J_n,k}|;\ k \in D_n \} = \max\{ g(a + k(b-a)\delta_n) |U^{(T)}_{J_n,k}|;\ k \in I(\tau_l, 2^{-J_n} A(b-a)) \}. \tag{A.24} \]

As

\[ g(a + k(b-a)\delta_n) |U^{(T)}_{J_n,k}| = g(\tau_l) |U^{(T)}_{J_n,k}| + \Delta_n(k), \]

where $\Delta_n(k) = (g(a + k(b-a)\delta_n) - g(\tau_l)) |U^{(T)}_{J_n,k}|$, it is easy to show that

\[ \max\{ |\Delta_n(k)|;\ k \in I(\tau_l, 2^{-J_n} A(b-a)) \} = O(2^{-J_n}) \max\{ |U^{(T)}_{J_n,k}|;\ k \in I(\tau_l, 2^{-J_n} A(b-a)) \}. \]

From (A.19), (A.20), and Lemma A.4, we have

\[ C \delta_n \Big( \delta_n^{1/2} - \Big( \frac{\delta_n}{n h_n} \Big)^{1/2} \Big) \le \max\{ |\Delta_n(k)|;\ k \in I(\tau_l, 2^{-J_n} A(b-a)) \} \le C \delta_n^{3/2}. \]

Thus, we have

\[ \max\{ g(a + k(b-a)/2^{J_n}) |U^{(T)}_{J_n,k}|;\ k \in I(\tau_l, 2^{-J_n} A(b-a)) \} = g(\tau_l) \max\{ |U^{(T)}_{J_n,k}|;\ k \in I(\tau_l, 2^{-J_n} A(b-a)) \} + O_P\Big( \delta_n^{3/2} - \Big( \frac{\delta_n^3}{n h_n} \Big)^{1/2} \Big). \]

In contrast, the arguments of Lemmas A.2–A.4 imply

\[ \max\Big\{ g(a + k(b-a)/2^{J_n}) |U^{(T)}_{J_n,k}|;\ k \notin \bigcup_{l=1}^{L} I(\tau_l, 2^{-J_n} A(b-a)) \Big\} = O_P\Big( \delta_n^{3/2} + \Big( \frac{\delta_n}{n h_n} \Big)^{1/2} \Big). \]

Note that

\[ \frac{\delta_n^{3/2} - ((n h_n)^{-1} \delta_n^3)^{1/2}}{\delta_n^{3/2} + ((n h_n)^{-1} \delta_n)^{1/2}} \to 0. \]

Hence, the estimation of locations of change points defined by $U^{(T)}_{J_n,k}$ ($W^{(T)}_{J_n,k}$) is equivalent to that defined by $M_n(k 2^{-J_n})$ ($M_n(k 2^{-J_n})$) in probability.

Thus, Theorem 2.1 implies that with a probability tending to $1 - w$, $|M_n(x)| \le C_{1-w}$ for those $x$ at which $T(x)$ has no jump or sharp cusp, where $C_{1-w}$ is the $1 - w$ quantile of the asymptotic distribution of Theorem 2.1. Using the same arguments as those in Theorems 2.4 and 3.2, we can easily show that with probability $1 - w$, $M_n(x)$ (or $|M_{nn}(x)|$) will exceed $C_{1-w}$ at only those $x$ values where

\[ \Big\{ x : \Big| a + \frac{[2^{J_n} x]}{2^{J_n}} (b-a) - \tau_l \Big| \le 2^{-J_n} A(b-a) \Big\}, \qquad l = 1, 2, \ldots, p. \]

From the definitions of $\hat\tau_l$, $l = 1, 2, \ldots, p$, we can show with probability $1 - w$ that $\hat p_j = p$ ($j = 1, 2$), or $\hat q_j = q$ ($j = 1, 2$). Finally, we can prove Theorem 2.5 by letting $n \to \infty$ and $w \to 0$. □

Appendix B

In this section, we prove (A.5) and Theorem A.1.

Lemma B.1. Suppose that $K(x)$ is a kernel function with finite support $[-c, c]$ and $h_n \to 0$ as $n \to \infty$. Let $\tau_l \in [a, b]$ and $K_h(X_i - x) = K((X_i - x)/h_n)$. Then, for large $n$,

\[ \frac{\sum_{j=1}^{n} K_h(X_j - x) I(\tau_l \le X_j \le b)}{\sum_{j=1}^{n} K_h(X_j - x)} = I(\tau_l \le x \le b). \]

Proof. It is easy to obtain that

\[ I(\tau_l \le X_j \le b) - I(\tau_l \le x \le b) = \begin{cases} 1 & \text{if } \tau_l \le X_j \le b,\ x < \tau_l \ \text{ or } \ \tau_l \le X_j \le b,\ x > b; \\ -1 & \text{if } \tau_l \le x \le b,\ X_j < \tau_l \ \text{ or } \ \tau_l \le x \le b,\ X_j > b; \\ 0 & \text{otherwise.} \end{cases} \]

Hence,

\[ I(\tau_l \le X_j \le b) - I(\tau_l \le x \le b) = I(\tau_l \le X_j \le b,\ x < \tau_l) + I(\tau_l \le X_j \le b,\ x > b) - I(\tau_l \le x \le b,\ X_j < \tau_l) - I(\tau_l \le x \le b,\ X_j > b). \]

Writing $Y_j = (X_j - x)/h_n$, we can easily show that

\[ \frac{\sum_{j=1}^{n} K_h(X_j - x) I(\tau_l \le X_j \le b,\ x < \tau_l)}{\sum_{j=1}^{n} K_h(X_j - x)} = \frac{\sum_{j=1}^{n} K_h(X_j - x) I(\tau_l - x \le X_j - x \le b - x,\ x < \tau_l)}{\sum_{j=1}^{n} K_h(X_j - x)} = \frac{\sum_{j=1}^{n} K(Y_j) I((\tau_l - x)/h_n \le Y_j \le (b - x)/h_n,\ x < \tau_l)}{\sum_{j=1}^{n} K(Y_j)}. \]

As $K(Y_j) = 0$ whenever $|Y_j| > c$, and for large enough $n$, $x < \tau_l$ implies $(\tau_l - x)/h_n \to \infty$, we obtain that

\[ I((\tau_l - x)/h_n \le Y_j \le (b - x)/h_n,\ x < \tau_l) \le I(Y_j > c). \]

Hence, for large $n$, we have

\[ \sum_{j=1}^{n} K(Y_j) I((\tau_l - x)/h_n \le Y_j \le (b - x)/h_n,\ x < \tau_l) = 0. \]

Hence,

\[ \frac{\sum_{j=1}^{n} K_h(X_j - x) I(\tau_l \le X_j \le b,\ x < \tau_l)}{\sum_{j=1}^{n} K_h(X_j - x)} = 0. \]

Similarly, we can show that

\[ \frac{\sum_{j=1}^{n} K_h(X_j - x) I(\tau_l \le X_j \le b,\ x > b)}{\sum_{j=1}^{n} K_h(X_j - x)} = 0, \]

\[ \frac{\sum_{j=1}^{n} K_h(X_j - x) I(\tau_l \le x \le b,\ X_j < \tau_l)}{\sum_{j=1}^{n} K_h(X_j - x)} = 0, \]

\[ \frac{\sum_{j=1}^{n} K_h(X_j - x) I(\tau_l \le x \le b,\ X_j > b)}{\sum_{j=1}^{n} K_h(X_j - x)} = 0. \]

This completes the proof of Lemma B.1. □

Proof of Theorem A.1. For the sake of convenience, let

\[ A_n(z, t) = \frac{(b-a)^{1/2}}{N h_n} \sum_{l=1}^{[Nt]} \frac{K_h(z - w_l)}{f(w_l)}\, \sigma(z) \]

and $L = [a - \delta_0, b + \delta_0] = [a_0, b_0]$ for some positive constant $\delta_0 > 0$. By the continuity of $K(\cdot)$ and $f(\cdot)$, we obtain

\[ \sup_{0 < t < 1,\ z \in L} |A_n(z, t)| \le M, \]

\[ \sup_{0 < t < 1}\ \sup_{|z_1 - z_2| \le M/n} |A_n(z_1, t) - A_n(z_2, t)| = O\Big( \frac{1}{n h_n} + \frac{1}{N h_n} \Big), \]

\[ \sup_{0 < t < 1,\ z \notin L} |A_n(z, t)| = 0, \]

for some large $M > 0$.
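As a numerical aside on Lemma B.1 above, the following minimal sketch checks the kernel-weighted indicator identity on simulated data. It assumes an Epanechnikov kernel with support $[-1, 1]$ (so $c = 1$), design points drawn uniformly on $[a, b] = [0, 1]$, and a hypothetical jump location `tau = 0.4`; none of these choices come from the paper. The check is restricted to evaluation points farther than $h_n$ from $\tau_l$, which is the regime covered by the compact-support argument in the proof.

```python
import numpy as np

def epanechnikov(u):
    """Kernel with compact support [-1, 1] (so c = 1 in the lemma's notation)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def weighted_indicator(x, x_obs, h, tau, b):
    """Kernel-weighted average of I(tau <= X_j <= b) around the point x."""
    w = epanechnikov((x_obs - x) / h)
    return np.sum(w * ((x_obs >= tau) & (x_obs <= b))) / np.sum(w)

rng = np.random.default_rng(1)
a, b, tau, h = 0.0, 1.0, 0.4, 0.05
x_obs = rng.uniform(a, b, 2000)

# Evaluate only at points more than h away from the jump location tau.
grid = np.linspace(a, b, 201)
grid = grid[np.abs(grid - tau) > h]

ratio = np.array([weighted_indicator(x, x_obs, h, tau, b) for x in grid])
target = ((grid >= tau) & (grid <= b)).astype(float)
max_gap = float(np.max(np.abs(ratio - target)))
```

Within distance $h_n$ of $\tau_l$ the ratio takes intermediate values between 0 and 1, so the identity is best read as holding for each fixed $x \ne \tau_l$ once $h_n$ is small enough.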
We have the following:

\[ \sum_{j=1}^{n} A_n(X_j, t) \varepsilon_j = \sum_{X_j \in L} A_n(X_j, t) \varepsilon_j + \sum_{X_j \notin L} A_n(X_j, t) \varepsilon_j = D_{n1} + D_{n2}. \]

Thus, we easily obtain

\[ \sup_{0 < t < 1} |D_{n2}| = 0 \quad \text{a.s.}, \]

where a.s. denotes almost sure convergence. Now we divide the interval $L$ into nonoverlapping intervals of equal length $L_1, L_2, \ldots, L_n$, with right extreme points $l_i$, $i = 1, 2, \ldots, n$ ($l_n = b_0$). Define $\tilde X_j = l_i I(X_j \in L_i)$, $i = 1, 2, \ldots, n$, and $\tilde X_j = b_0 + 1$ if $X_j \notin L_i$, $i = 1, 2, \ldots, n$. Thus we have

\[ |\tilde X_j - X_j| = O\Big( \frac{1}{n} \Big) \quad \text{a.s.} \]

if $X_j \in L_i$. Next, we prove that

\[ D_{n1} = \sum_{j=1}^{n} A_n(\tilde X_j, t) I(X_j \in L) \varepsilon_j + \sum_{j=1}^{n} [A_n(X_j, t) - A_n(\tilde X_j, t)] I(X_j \in L) \varepsilon_j = \sum_{j=1}^{n} A_n(\tilde X_j, t) I(X_j \in L) \varepsilon_j + O\Big( \frac{1}{h_n} + \frac{n}{N h_n} \Big), \]

uniformly for $0 < t < 1$.

Let $F_n(x) = \sum_{j=1}^{n} I(X_j < x) \varepsilon_j$ denote the hybrids of the empirical process and the partial sums process. For convenience, we denote $F(x_2) - F(x_1)$ by $F([x_1, x_2))$ and $F_n(x_2) - F_n(x_1)$ by $F_n([x_1, x_2))$. Horváth (2000) and Xia (1999) independently studied the properties of the stochastic process $F_n(x)$. Xia (1999), under more general conditions, has shown that

\[ \sup_{x} |F_n(x) - \sqrt{n}\, \sigma_0 W_n(F(x))| = O(n^{1/4} (\log n)^{3/4}) \quad \text{a.s.}, \tag{B.1} \]

where $W_n(t)$ for $0 < t < 1$ is a sequence of the standard Wiener process in $C(0, 1)$. Thus, by Abel's transformation, we have

\[ \begin{aligned} \sum_{j=1}^{n} A_n(\tilde X_j, t) I(X_j \in L) \varepsilon_j &= \sum_{i=1}^{n} \sum_{j=1}^{n} A_n(l_i, t) I(X_j \in L_i) \varepsilon_j = \sum_{i=1}^{n} A_n(l_i, t) \sum_{j=1}^{n} I(X_j \in L_i) \varepsilon_j = \sum_{i=1}^{n} A_n(l_i, t) F_n(L_i) \\ &= A_n(b_0, t) \sum_{i=1}^{n} F_n(L_i) - \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)] \sum_{j=1}^{i} F_n(L_j) \\ &= \sigma_0 \sqrt{n}\, A_n(b_0, t) W(F(L)) - \sigma_0 \sqrt{n} \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)] W(F([a_0, l_{i+1}))) \\ &\quad + A_n(b_0, t) [F_n(L) - \sigma_0 \sqrt{n}\, W(F(L))] - \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)] [F_n([a_0, l_{i+1})) - \sigma_0 \sqrt{n}\, W(F([a_0, l_{i+1})))] \\ &= D_{31} - D_{32} + D_{33} - D_{34}. \end{aligned} \tag{B.2} \]

By simple calculation, we can show that

\[ \sup_{0 < t < 1} \sum_{i=1}^{n-1} |A_n(l_{i+1}, t) - A_n(l_i, t)| \le M. \]

From (B.1), we have

\[ \sup_{0 < t < 1} |D_{33}| = O(n^{1/4} (\log n)^{3/4}) \quad \text{a.s.} \tag{B.3} \]

and

\[ \sup_{0 < t < 1} |D_{34}| \le \max_{1 \le i \le n} |F_n([a_0, l_{i+1})) - \sigma_0 \sqrt{n}\, W_n(F([a_0, l_{i+1})))| \cdot \sup_{0 < t < 1} \sum_{i=1}^{n-1} |A_n(l_{i+1}, t) - A_n(l_i, t)| = O(n^{1/4} (\log n)^{3/4}) \quad \text{a.s.} \tag{B.4} \]

By the limit results of the increment of the Wiener process in Csörgő and Révész (1981, p. 26), we have

\[ \sup_{|z_1 - z_2| \le M a_n;\ z_1, z_2 \in L} |W(z_1) - W(z_2)| = O(a_n^{1/2} (\log n)^{1/2}) \quad \text{a.s.}, \tag{B.5} \]

where $a_n \to 0$. Hence, using Abel's transformation and the mean value theorem of integration, we have

\[ \begin{aligned} &\Big| \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))) - A_n(b_0, t) W(F(L)) + \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)] W(F([a_0, l_{i+1}))) \Big| \\ &\quad = \Big| \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)] [W(F([a_0, l_{i+1}))) - W(F([a_0, l^*_{i+1})))] \Big| = O(n^{-1/2} (\log n)^{1/2}) \quad \text{a.s.}, \end{aligned} \tag{B.6} \]

uniformly for $t \in (0, 1)$, where $l^*_{i+1}$ lies between $l_i$ and $l_{i+1}$. This implies that

\[ D_{31} - D_{32} = \sqrt{n}\, \sigma_0 \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))) + O((\log n)^{1/2}) \quad \text{a.s.}, \]

uniformly for $t \in (0, 1)$. Hence, from (B.2)–(B.4) and (B.6) we have

\[ \sup_{0 < t < 1} \Big| \sum_{j=1}^{n} A_n(X_j, t) \varepsilon_j - \sqrt{n}\, \sigma_0 \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))) \Big| = O\Big( n^{1/4} (\log n)^{3/4} + \frac{1}{h_n} + \frac{n}{N h_n} \Big) \quad \text{a.s.} \tag{B.7} \]

Write

\[ \tilde Y_n(t) = \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))), \]

and $Y_n(t) = W(\rho_n(t))$, where $\rho_n(t) = E(\tilde Y_n(t))^2$. By simple calculation (cf. (A.10)), we have

\[ \rho_n(t) = \int_a^{w_t} \frac{\sigma^2(x)}{f(x)}\, dx + O\Big( \Big( \frac{1}{N h_n} \Big)^2 + h_n^2 \Big). \]

By the properties of the Wiener process, it is easy to show that

\[ \begin{aligned} E \tilde Y_n(s) \tilde Y_n(t) &= E\Big[ \int_{x \in L} A_n(x, s)\, dW(F([a_0, x))) \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))) \Big] \\ &= \int_{x \in L} \Big[ \int_{(x-a)/h_n}^{(x - w_s)/h_n} \frac{K(y)}{f(x - h_n y)}\, dy \Big] \Big[ \int_{(x-a)/h_n}^{(x - w_t)/h_n} \frac{K(z)}{f(x - h_n z)}\, dz \Big] \sigma^2(x) f(x)\, dx + O\Big( \frac{1}{(N h_n)^2} \Big) \\ &= \int_{\{x \in L\} \cap \{a \le x \le w_s\} \cap \{a \le x \le w_t\}} \frac{\sigma^2(x)}{f(x)}\, dx + O\Big( \Big( \frac{1}{N h_n} \Big)^2 + h_n^2 \Big) \\ &= \int_a^{w_s \wedge w_t} \frac{\sigma^2(z)}{f(z)}\, dz + O\Big( \Big( \frac{1}{N h_n} \Big)^2 + h_n^2 \Big) = E Y_n(s) Y_n(t), \end{aligned} \]

where $a \wedge b = \min(a, b)$, and the constant in $O(\cdot)$ is independent of $t$, $s$, and $n$. Hence, $\tilde Y_n(t)$ and $Y_n(t)$ have the same distributions for all $t \in L$.
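The covariance step above can be spelled out as follows. This is a sketch under two assumptions that the text does not state explicitly: $A_n(x, t)$ is treated as a deterministic integrand, and $\rho_n$ is taken to be nondecreasing in $t$. The first line is the Itô isometry for the time-changed Wiener process $W(F(\cdot))$, where $F$ has density $f$; the second uses $E W(u) W(v) = u \wedge v$.

```latex
\begin{align*}
E\,\tilde Y_n(s)\,\tilde Y_n(t)
  &= \int_{x \in L} A_n(x, s)\, A_n(x, t)\, f(x)\, dx, \\
E\,Y_n(s)\,Y_n(t)
  &= E\,W(\rho_n(s))\,W(\rho_n(t))
   = \rho_n(s) \wedge \rho_n(t)
   = \rho_n(s \wedge t).
\end{align*}
```

Both $\tilde Y_n$ and $Y_n$ are centered Gaussian processes, and such processes are determined by their covariance functions. Matching covariances therefore give matching finite-dimensional distributions; since the two covariances agree here only up to the $O(\cdot)$ remainder, the distributional agreement is to be understood up to that order.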
Thus,

\[ \sup_{0 < t < 1} \Big| \sum_{j=1}^{n} A_n(X_j, t) \varepsilon_j - \sqrt{n}\, \sigma_0 W(\rho_n(t)) \Big| = O\Big( n^{1/4} (\log n)^{3/4} + \frac{n}{N h_n} + \frac{1}{h_n} \Big) \quad \text{a.s.} \]

This implies the first result of Theorem A.1.

Again, using (B.5), we have

\[ \sup_{0 < t < 1} |W(\rho_n(t)) - W(\rho(t))| = O\Big( \frac{(\log n)^{1/2}}{N h_n} + h_n (\log n)^{1/2} \Big) \quad \text{a.s.} \]

Hence,

\[ \sup_{0 < t < 1} |\tilde Y_n(t) - W(\rho(t))| = O\Big( \frac{(\log n)^{1/2}}{N h_n} + h_n (\log n)^{1/2} \Big) \quad \text{a.s.} \]

From (B.7), we obtain that

\[ \sup_{0 < t < 1} \Big| \sum_{j=1}^{n} A_n(X_j, t) \varepsilon_j - \sqrt{n}\, \sigma_0 W(\rho(t)) \Big| = O\Big( n^{1/4} (\log n)^{3/4} + n^{1/2} h_n (\log n)^{1/2} + \frac{1}{h_n} + \frac{n}{N h_n} \Big) \quad \text{a.s.} \]

This completes the proof of Theorem A.1. □

To prove Lemmas A.2 and A.4–A.6, we need the following lemma:

Lemma B.2. Assume that $\{X_t;\ t = 1, 2, \ldots\}$ is a strictly stationary mixing sequence with coefficient $\alpha(k) = O(k^{-r})$ for some $r > 0$. The density function $f(x)$ of $X_t$ satisfies $0 < c \le f(x) \le C < \infty$ for some constants $c$ and $C$, and $f(x)$ has a bounded derivative. The conditional density functions $f(x_0 | x_l)$ ($l \ne 0$) are bounded. $K(\cdot)$ is a continuously differentiable kernel function with finite support $[-c, c]$, and $\int K(x)\, dx = 1$, $\int x K(x)\, dx = 0$. Let $h_n \to 0$, $nh_n \to \infty$ and $N \to \infty$ as $n \to \infty$. Then

(a) for any positive integer $i$, we have

\[ \sum_{t=1}^{n} K\Big( \frac{X_t - x}{h_n} \Big) (X_t - x)^i = n h_n^{i+1} \varphi_i f(x) + n h_n^{i+2} \varphi_{i+1} f'(x) + O(n h_n^{i+1} c_n) \quad \text{a.s.}, \]

uniformly for $x \in [a, b]$, where $c_n = h_n^2 + (\log n/(n h_n))^{1/2}$ and $\varphi_i = \int x^i K(x)\, dx$;

(b) for any $0 \le r \le 1$, we have

\[ \sum_{t=1}^{n} K\Big( \frac{X_t - x}{h_n} \Big) |X_t - x|^r = n h_n^{r+1} \eta_r f(x) + n h_n^{r+2} \eta_{r+1} f'(x) + O(n h_n^{r+1} c_n) \quad \text{a.s.}, \]

uniformly for $x \in [a, b]$, where $\eta_r = \int |x|^r K(x)\, dx$ and $\eta_{r+1} = \int x |x|^r K(x)\, dx$;

(c) in addition, assume that $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ satisfies the conditions of Theorem A.1; then

\[ \sum_{t=1}^{n} K\Big( \frac{X_t - x}{h_n} \Big) \varepsilon_t = \begin{cases} O((n h_n \log n)^{1/2}) & \text{a.s.}; \\ O_P((n h_n)^{1/2}); \end{cases} \]

and

\[ \sum_{t=1}^{n} K\Big( \frac{X_t - x}{h_n} \Big) \Big( \frac{X_t - x}{h_n} \Big) \varepsilon_t = \begin{cases} O((n h_n \log n)^{1/2}) & \text{a.s.}; \\ O_P((n h_n)^{1/2}); \end{cases} \]

uniformly for $x \in [a, b]$.

Proof. Following the proof of Proposition 1 in Masry (1996), we can easily prove this lemma.
In fact, parts (a) and (c) are also the results of Lemma 1 in Xia (1998). □

Lemma B.3 (Bickel and Rosenblatt, 1973). Let $\psi(\cdot)$ be a kernel function with bounded support $[-A, A]$ and $W(\cdot)$ be a standard Wiener process. Define

\[ V_n(t) = \delta_n^{-1/2} \int \psi\Big( \frac{x - t}{\delta_n} \Big)\, dW(x). \]

Suppose that $\delta_n \to 0$ and $n \delta_n^2 \to \infty$. Then

\[ P\Big\{ A(\delta_n) \Big[ \Big( \frac{n}{\kappa_2} \Big)^{1/2} \sup_{0 \le x \le 1} |V_n(x)| - a(\delta_n) \Big] < z \Big\} \to \exp(-2 \exp(-z)), \]

where $A(\cdot)$, $a(\cdot)$, $\kappa_1$, and $\kappa_2$ are the same as those of Corollary 2.1.

References

Ahn, C.M., Thompson, H.E., 1988. Jump-diffusion processes and the term-structure of interest rates. Journal of Finance 43, 155–174.
Aït-Sahalia, Y., 1996a. Nonparametric pricing of interest rate derivative securities. Econometrica 64, 527–560.
Aït-Sahalia, Y., 1996b. Testing continuous-time models of the spot interest rate. The Review of Financial Studies 9, 385–426.
Aït-Sahalia, Y., 2002. Telling from discrete data whether the underlying continuous-time model is a diffusion. Journal of Finance 61, 2075–2112.
Aït-Sahalia, Y., Wang, Y., Yared, F., 2001. Do option markets correctly price the probability of movement of the underlying asset. Journal of Econometrics 53, 499–547.
Amin, K., 1993. Jump diffusion option valuation in discrete time. Journal of Finance 48, 1833–1863.
Antoniadis, A., Gijbels, I., 1997. Detecting abrupt changes by wavelet methods. Discussion Paper 9716. Institute of Statistics, Louvain-la-Neuve.
Anderson, T.G., Lund, J., 1997. Estimating continuous time stochastic volatility models of the short term interest rate. Journal of Econometrics 77, 343–377.
Auestad, B., Tjøstheim, D., 1990. Identification of nonlinear time series: first order characterization and order determination. Biometrika 77, 669–687.
Bates, D., 1991. The crash of ’87: was it expected? The evidence from options markets. Journal of Finance 46, 1009–1044.
Bickel, P.J., Rosenblatt, M., 1973. On some global measures of the deviations of density function estimates. The Annals of Statistics 1, 1071–1095.
Billingsley, P., 1968.
Convergence of Probability Measures. Wiley, New York.
Bradley, R.C., 1986. Basic properties of strong mixing conditions. In: Eberlein, E., Taqqu, M.S. (Eds.), Dependence in Probability and Statistics: A Survey of Recent Results. Birkhäuser, Boston, pp. 165–192.
Carlstein, H., Müller, G., Siegmund, D., 1994. Change Points Problem. IMS, Hayward, CA.
Cox, J.C., Ingersoll, J.E., Ross, S.A., 1985. An intertemporal general equilibrium model of asset prices. Econometrica 53, 363–384.
Csörgő, M., Révész, P., 1981. Strong Approximation in Probability and Statistics. Academic Press, New York.
Das, S.R., Foresi, S., 1996. Exact solutions for bond and option prices with systematic jump risk. Review of Derivatives Research 1, 7–24.
Daubechies, I., 1992. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia.
Delgado, M., Hidalgo, J., 2000. Nonparametric inference on structural breaks. Journal of Econometrics 96, 113–144.
Drost, F., Nijman, T., Werker, B., 1998. Estimation and testing in models containing both jumps and conditional heteroscedasticity. Journal of Business & Economic Statistics 16, 237–243.
Duffie, D., Kan, R., 1993. A yield factor model of interest rates. Working Paper. Stanford University.
Duffie, D., Pan, J., Singleton, K.J., 2000. Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 1343–1376.
Fan, J., Gijbels, I., 1996. Local Polynomial Modeling and its Applications. Chapman & Hall, London.
Fan, J., Yao, Q., 1998. Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85, 645–660.
Gallent, A.R., Tauchen, G., 1997. Estimation of continuous time models for stock returns and interest rates. Macroeconomics Dynamics 1, 135–168.
Gasser, T., Sroka, L., Jennen-Steinmetz, C., 1986. Residual variance and residual pattern in nonlinear regression. Biometrika 73, 625–633.
Härdle, W., 1989. Asymptotic maximal deviation of M-smoothers.
Journal of Multivariate Analysis 29, 163–179.
Härdle, W., Tsybakov, A., 1997. Local polynomial estimators of the volatility function in nonparametric autoregression. Journal of Econometrics 81, 223–242.
Horváth, L., 2000. Approximations for hybrids of empirical and partial sums processes. Journal of Statistical Planning and Inference 88, 1–18.
Kao, C.R., Ross, S.L., 1995. A CUSUM test in the linear regression model with serially correlated disturbances. Econometric Review 14, 331–346.
Karlin, S., McGregor, J., 1959. Coincidence probabilities. Pacific Journal of Mathematics 9, 1141–1164.
Korostelev, A.P., 1987. Minimax estimation of a discontinuous signal. Theory of Probability and its Application 32, 727–730.
Krämer, W., Ploberger, W., Alt, R., 1988. Testing for structural change in dynamic regression models. Econometrica 56, 1355–1369.
Li, Y., Xie, Z., 1999. The wavelet identification of threshold and time delay of threshold autoregressive models. Statistica Sinica 9, 153–166.
Mallat, H.G., Wang, J.L., 1992. Singularity detection and processing with wavelets. IEEE Transactions on Information Theory 2, 617–643.
Masry, E., 1996. Multivariate local polynomial regression for time series: uniform and strong consistency and rates. Journal of Time Series Analysis 17, 571–599.
Masry, E., Tjøstheim, D., 1995. Nonparametric estimation and identification of nonlinear ARCH times series: strong convergence and asymptotic normality. Econometric Theory 11, 258–289.
Merton, R., 1976. Option pricing when returns are discontinuous. Journal of Financial Economics 3, 125–144.
Müller, T.G., 1992. Change points in nonparametric regression analysis. The Annals of Statistics 20, 737–761.
Nadaraya, E.A., 1964. On estimating regression. Theory of Probability and its Applications 9, 141–142.
Raimondo, M., 1998. Minimax estimation of sharp change points. The Annals of Statistics 26, 1379–1397.
Tran, K.C., 1999.
Testing for structural change in the dynamic adjustment model with autoregressive errors. Empirical Economics 24, 61–74.
Vasicek, O., 1977. An equilibrium characterization of the term structure. Journal of Financial Economics 5, 177–188.
Wang, Y., 1995. Jump and sharp cusp detection by wavelets. Biometrika 82, 385–397.
Wang, Y., 1998. Change curve estimation via wavelets. Journal of the American Statistical Association 93, 163–172.
Wang, Y., 1999. Change-points via wavelets for indirect data. Statistica Sinica 9, 103–117.
Wong, H., Ip, W., Li, Y., Xie, Z., 1999. Threshold variable selection by wavelets in open-loop threshold autoregressive models. Statistics and Probability Letters 42, 375–392.
Xia, Y., 1998. Bias-corrected confidence bands in nonparametric regression. Journal of Royal Statistical Society Series B 60, 797–811.
Xia, Y., 1999. On the estimation and testing of function-coefficient linear models. Statistica Sinica 9, 735–777.
Zhou, Y., Liang, H., 1999. Asymptotic normality for $L_1$-norm kernel estimator of conditional median under dependence. Journal of Multivariate Analysis 73, 136–154.