Journal of Econometrics 143 (2008) 227–262
www.elsevier.com/locate/jeconom
Detections of changes in return by a wavelet smoother with conditional heteroscedastic volatility☆
Gongmeng Chen (a), Yoon K. Choi (b), Yong Zhou (c, d, *)
(a) Department of Economics, School of Economics and Management, Shanghai Jiaotong University, Shanghai 200030, PR China
(b) Department of Finance, College of Business Administration, University of Central Florida, Orlando, FL 32816, USA
(c) Academy of Mathematics and Systems Science, Chinese Academy of Science, Beijing 100080, PR China
(d) Department of Statistics, Shanghai University of Finance and Economics, Shanghai 200433, PR China
Available online 17 October 2007
Abstract
In this paper, we propose two estimators, an integral estimator and a discretized estimator, for the wavelet coefficient of
regression functions in nonparametric regression models with heteroscedastic variance. These estimators can be used to
test the jumps of the regression function. The model allows for lagged-dependent variables and other mixing regressors.
The asymptotic distributions of the statistics are established, and the asymptotic critical values are analytically obtained
from the asymptotic distribution. We also use the test to determine consistent estimators for the locations of change points.
The jump sizes and locations of change points can be consistently estimated using wavelet coefficients, and the convergence
rates of these estimators are derived. We perform some Monte Carlo simulations to check the powers and sizes of the test
statistics. Finally, we give practical examples in finance and economics to detect changes in stock returns and short-term
interest rates using the empirical wavelet method.
© 2007 Elsevier B.V. All rights reserved.
JEL classification: C12; C52
Keywords: Nonparametric regression; Wavelet coefficient; Change points; Kernel estimation; Local polynomial smoother; Conditional heteroscedastic variance; α-Mixing
1. Introduction
One of the intensively studied models in finance is the one-factor diffusion model:
$$dr_t = \mu(r_t)\,dt + \sigma(r_t)\,dw_t, \tag{1.1}$$
where $w_t = w(t)$ is a standard Wiener process. The functions $\mu(\cdot)$ and $\sigma(\cdot)$ are the drift (or instantaneous mean regression return) and diffusion (or instantaneous variance, volatility, risk) functions of the process $r_t$ of interest, respectively. We call (1.1) a drift-plus-diffusion model. In asset pricing, $\mu(\cdot)$ is the expected rate of return and $\sigma(\cdot)$ the price volatility. With $r_t$ representing the short-term rate, model (1.1) incorporates many short-rate models, such as the CIR model (Cox et al., 1985), the AL model (Anderson and Lund, 1997), and Aït-Sahalia’s model (Aït-Sahalia 1996a, b). It is of general interest in finance to identify the drift $\mu(\cdot)$ and volatility (risk) $\sigma(\cdot)$. As Aït-Sahalia (1996a, b) pointed out, one of the most important features of (1.1) for derivative security pricing is the specification of the return $\mu(\cdot)$ and the volatility $\sigma(\cdot)$.
☆ This paper has been supported in part by a grant from the Hong Kong Polytechnic University. Zhou’s research was supported in part by National Natural Science Foundation of China (NSFC) Grants 10471140, the National Basic Research Program of China (973 Program) Grant 2007CB814902.
* Corresponding author. Tel.: +86 10 62651335; fax: +86 10 62541689.
E-mail addresses: afgmchen@inet.polyu.edu.hk (G. Chen), ychoi@bus.ucf.edu (Y.K. Choi), yzhou@mail.amss.ac.cn (Y. Zhou).
0304-4076/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2007.10.001
However, the continuous-time model faces a severe limitation in practical applications such as short-term
hedging strategies or the pricing of many derivatives, when jumps exist in underlying sample data. Amin (1993)
recognizes the sensitivity of option prices when jumps are present. Merton (1976) considered Poisson jumps
superimposed on a geometric Brownian motion in describing stock prices. However, Merton’s model assumes that
returns are independent and identically distributed, which contradicts evidence of conditional heteroscedasticity in
returns. Other authors also allowed for jumps in continuous-time models (see Ahn and Thompson, 1988; Bates,
1991; Das and Foresi, 1996; Duffie et al., 2000; Aït-Sahalia et al., 2001, among others).
One common feature of these recent models is that the assumed distributions with parameters need to be
estimated. Recently, Aït-Sahalia (1996b) proposed a test to examine whether any standard, one-factor model,
often used to fit short-rate processes and asset pricing models, is parametric or nonparametric. Using 22 years
of daily U.S. short-rate data, Aït-Sahalia (1996b) strongly rejected well-known one-factor parametric
diffusion models (such as Vasicek’s model (1977), the CIR model, and Duffie–Kan’s model, 1993). The strong
rejection using Aït-Sahalia’s test suggests that nonparametric models are very powerful and robust for
differentiating among short-rate models. Furthermore, Aït-Sahalia (2002) proposed a method to distinguish
between diffusion and non-diffusion processes, based on a discrete subsample of the continuous-time path.
Following Karlin and McGregor (1959), he derived a necessary and sufficient condition on the transition
densities of diffusions at the sampling interval of the observed data. He also showed that the S&P 500 index is
consistent with a continuous-time diffusion with jumps.
In this paper, we propose estimators, and their test statistics for jumps, of the regression function in
nonparametric regression models, taking into account dependent observations using wavelet methods. Unlike
traditional smoothing methods based on a fixed spatial case (e.g., Fourier series methods or fixed bandwidth
kernel methods), the wavelet method is a multi-resolution approach and has local adaptivity. Recently,
Delgado and Hidalgo (2000) proposed estimators of locations and sizes of structural breaks in general
regression models, based on the kernel method. However, the kernel estimator is sensitive to the bound of the
estimated function. In addition, the wavelet method is different from those based on residual errors: for
example, CUSUM tests (Cumulative Sum of Residuals test; see Krämer et al., 1988), mostly in the special case
where the observations are assumed to be a sequence of independent and identically distributed (i.i.d.) random
variables. Some authors (Tran, 1999; Kao and Ross, 1995) have shown that the CUSUM test is not robust
with respect to departure from independence. Monte Carlo results suggest that the performance of the
standard CUSUM test is quite disappointing (see Kao and Ross, 1995; Tran, 1999).
The wavelet coefficient method has many merits. First, empirical wavelet coefficients $U^{(T)}_{J_n}(k)$ and $W^{(T)}_{J_n}(k)$,
to be defined below, are much more sensitive than the CUSUM test, with respect to changes in the conditional
mean function. Thus the empirical wavelet coefficient method is more likely to detect the regression changes.
Second, in constructing estimators of change points in the conditional mean function, only the conditional
mean function needs to be estimated using the wavelet method. The CUSUM method, however, needs the
estimated residual errors from model (2.1), which requires a complicated estimation of conditional
heteroscedastic variance (see Krämer et al., 1988; Tran, 1999).
Third, the wavelet method makes it convenient to study the change in the conditional mean in more
generalized models with the heteroscedastic volatility function; furthermore, the method can be easily
extended to study multivariate models (Wang, 1998). Finally, we can use the tests to determine the location
and sizes of jump points of the regression function, even in a general model with the conditional
heteroscedastic variance. In addition, location estimators of the jump points have been shown to have the
minimax convergence rate, which is the optimal rate for the estimation of change points, even if the
observations are not a sequence of i.i.d. random variables.
There have been developments of wavelet applications with respect to financial time series. Several authors
proposed jump-point detection procedures to estimate jumps in signals observed with noise using the wavelet
procedure (see Mallat and Wang, 1992; Wang, 1995; Raimondo, 1998; Antoniadis and Gijbels, 1997). In all
the regressions in the aforementioned studies, the models have been formulated by means of additive i.i.d.
Gaussian noise. Wang (1995) employed the wavelet method to detect jumps in a continuous-time model with a
constant volatility, and applied it to stock market return data. Raimondo (1998) extended the assumption of
Gaussian noise. Wang (1999) proposed a wavelet decomposition to detect and estimate change points of a
function for noisy data, observed from a transformation of the function. Li and Xie (1999) and Wong et al.
(1999) also studied the estimation of jumps of threshold regression and autoregressions using the wavelet
method. Recently, Drost et al. (1998) considered a similar problem in estimating and testing continuous-time
models with jumps and conditional heteroscedasticity. Interestingly, their tests have the same ideal property as
the wavelet method, applicable at any frequency. However, their tests are based on the variance and kurtosis
of the process, which are estimated using the quasi-maximum likelihood method, assuming that the process
follows a normal distribution. Furthermore, they do not estimate the locations of any jumps.
Most recently, Wong et al. (1999) have shown that the wavelet coefficient has significantly large absolute
values near the jump points across fine levels, while having relatively small absolute values as soon as the
location shifts away from the jump points. Hence, the wavelet coefficient exhibits high peaks near the jump
points. However, they have not derived the distribution or asymptotic distribution of the statistics, and thus
there are no critical values for the empirical wavelet coefficient to determine the significance level for any jump
points. Moreover, their method strongly depends on a lower bound for the jump sizes, which is unknown in
practice. We obtain the asymptotic distributions of empirical wavelet coefficients and test for multiple changes
in their locations and sizes. The critical values of our test statistics are calculated analytically.
The remainder of this paper is organized as follows. In Section 2, we discuss the empirical wavelet coefficients
and propose an integral estimator and a discretized estimator for the wavelet coefficient of the regression function
to detect the change points. We also derive the asymptotic distributions of the tests, which are the extreme
distributions, under the null hypothesis, and construct the estimators of locations and jump sizes of change points.
We further establish the consistency of estimators of the locations and jump sizes. The convergence rates of
location estimators are obtained, which are the best convergence rates in a nonparametric frame, even when the
observations are not a sequence of i.i.d. random variables (that is, the optimal minimax rate in the sense of
Raimondo, 1998). In Section 3, we conduct simulation experiments to assess the finite sample properties of the tests
and calculate the sizes and powers of the tests for different sample sizes. In Section 4, we analyze the structural
changes of financial data using the proposed wavelet method. Some proofs of the main results in Section 2 are very
lengthy and not included here. Only the outlines are provided in Appendices A and B.
2. Wavelet method
2.1. Models and hypotheses
We consider a discrete version of (1.1) with the conditional heteroscedastic variance as follows.
$$Y_t = T(X_t) + \sigma(X_t)\,\varepsilon_t, \tag{2.1}$$
where $T(x) = E(Y \mid X = x)$ and $\sigma^2(x) = \mathrm{var}(Y \mid X = x)$; $\{(Y_t, X_t);\ t = 1, 2, \ldots\}$ is a sequence of random vectors satisfying some mixing-dependent conditions, and $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ is a sequence of i.i.d. random variables. Here, the usual assumption of independence regarding $\{(Y_t, X_t);\ t = 1, 2, \ldots\}$ is relaxed to allow for dependent observations in a time series, which is very important because the sequence of observations in economics and finance is often dependent and highly persistent. When $x_t$ is a fixed time variable, we consider the following model:
$$Y_t = T(x_t) + \sigma(x_t)\,\varepsilon_t. \tag{2.2}$$
In this model, we assume that $\{\varepsilon_t\}$ is a sequence of random variables and $\{x_t;\ t = 1, 2, \ldots, n\}$ forms a sequence of fixed designs such that $x_t \in [a, b]$ and
$$\int_{x_t}^{x_{t+1}} f(x)\,dx = 1/n \quad \text{for all } n;\ t = 1, 2, \ldots, n,$$
with a known probability density function, $f(x)$.
A function $g(x)$ is said to have an $\alpha$ ($0 \le \alpha \le 1$) sharp cusp at $x_0$ if there exists a positive constant $C$ such that for a small enough $h$,
$$|g(x_0 + h) - g(x_0)| \ge C|h|^{\alpha}.$$
When $\alpha = 0$, $g(x)$ has a jump at $x_0$. We assume that $g(x)$ is smooth in the sense of being continuous and differentiable. In this paper, we assume that $T(x)$ is smooth except at those discontinuous points.
Our interest is to test the hypothesis that there is no discontinuous point in the regression function $T(x)$, against an alternative hypothesis that there exists at least one discontinuous point in $T(x)$, $x \in [a, b]$ for some constants $|a| < \infty$ and $|b| < \infty$;¹ that is
$$H_0: T(x) = T_0(x) \quad\leftrightarrow\quad H_1: T(x_0-) \ne T(x_0+),$$
for at least one $x_0 \in [a, b]$, where $T_0(x)$ is a smooth function in $[a, b]$.
In fact, when $T(x)$ has $p$ discontinuous points in $[a, b]$, it implies that $T(x)$ can be re-written, for the sake of simplicity, as
$$T(x) = C(x) + D(x),$$
where $D(x) = \sum_{l=1}^{p} d_l I_{[\tau_l, b]}(x)$ with $a < \tau_1 < \tau_2 < \cdots < \tau_p < b$ and $C(x)$ is twice continuously differentiable on $(a, b)$. This implies that $T(x)$ is a smooth function except at finite jump points; that is, when $d_l = 0$, $T(x)$ is very smooth. Let $d_l = T(\tau_l+) - T(\tau_l-)$ denote the size of a jump of the function $T(x)$ at point $\tau_l$. Here $p$, $d_l$ and $\tau_l$, $l = 1, 2, \ldots, p$, are all unknown constants to be estimated.
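As an illustration of the decomposition $T(x) = C(x) + D(x)$, the following minimal Python sketch builds a regression function with jumps and draws observations from the fixed-design model (2.2). The smooth part `smooth_part`, the volatility function `sigma`, the Gaussian errors, and the uniform design on $[0, 1]$ are all assumptions made for the example; none of them is prescribed by the model.

```python
import math
import random


def smooth_part(x):
    # C(x): a hypothetical twice continuously differentiable component.
    return math.sin(2.0 * math.pi * x)


def jump_part(x, locations, sizes):
    # D(x) = sum over l of d_l * 1{x >= tau_l}.
    return sum(d for tau, d in zip(locations, sizes) if x >= tau)


def regression(x, locations, sizes):
    # T(x) = C(x) + D(x).
    return smooth_part(x) + jump_part(x, locations, sizes)


def simulate(n, locations, sizes, sigma=lambda x: 0.2 + 0.1 * x, seed=0):
    # Fixed uniform design on [0, 1] with heteroscedastic Gaussian noise
    # (both are assumptions of this sketch).
    rng = random.Random(seed)
    xs = [(t + 0.5) / n for t in range(n)]
    ys = [regression(x, locations, sizes) + sigma(x) * rng.gauss(0.0, 1.0)
          for x in xs]
    return xs, ys
```

For instance, `simulate(500, [0.3, 0.7], [1.0, -2.0])` would produce a sample whose mean function jumps up by 1 at 0.3 and down by 2 at 0.7.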
2.2. Estimation of the wavelet coefficient
Let $\mathcal{F}_1$ and $\mathcal{F}_2$ be two $\sigma$-algebras. The measure of dependence between $\mathcal{F}_1$ and $\mathcal{F}_2$ is defined as follows:
$$\alpha(\mathcal{F}_1, \mathcal{F}_2) = \sup\{|P(A \cap B) - P(A)P(B)|;\ A \in \mathcal{F}_1,\ B \in \mathcal{F}_2\}.$$
Suppose that $\{U_t;\ t \ge 1\}$ is a sequence of real-valued random variables. Let $\mathcal{F}_a^b = \sigma(U_i;\ a \le i \le b)$ be the $\sigma$-algebra generated by the indicated random variables. Then write
$$\alpha(n) = \sup_{t \ge 1} \alpha(\mathcal{F}_1^t, \mathcal{F}_{n+t}^{\infty}).$$
The sequence $\{U_t;\ t \ge 1\}$ is said to be $\alpha$-mixing (or strong mixing) if $\alpha(n) \to 0$ as $n \to \infty$.
For model (2.1), assume that $E(\varepsilon_t \mid \mathcal{F}_t) = 0$, $\mathrm{Var}(\varepsilon_t \mid \mathcal{F}_t) = \sigma_0^2$, and $E|\varepsilon_t|^R < \infty$ for some $R > 2$. Occasionally, we assume that $\{\varepsilon_t\}$ is independent of $\mathcal{F}_t$, where $\mathcal{F}_t = \mathcal{F}(X_t, X_{t-1}, \ldots)$ is the information set up to time $t$. Masry and Tjøstheim (1995) have shown that under some conditions, the sequence created by model (2.1) is geometrically ergodic. Therefore, it follows from Bradley (1986) that the sequence created by model (2.1) is stationary and $\alpha$-mixing, with the mixing coefficient of an exponentially decreasing rate. A sufficient condition is given in Appendix A (see Remark A.1).
Before discussing the wavelet transformation of regression (2.1), we need to introduce some notation. Assume that $\{(Y_t, X_t);\ 1 \le t \le n\}$ is a realization of $(Y, X)$. Suppose that
$$I_n(x_0) = \{i : 1 \le i \le n \text{ and } |X_i - x_0| \le h_n\},$$
and let $N_n(x_0) = \# I_n(x_0)$ denote the number of points (or the sample) in $I_n(x_0)$, $D_n = \{0, 1, \ldots, 2^{J_n} - 1\}$ where $J_n$ is often a sequence with $J_n \to \infty$ as $n \to \infty$. Let
$$I(s, \delta_n) = \left\{k : \left|a + \frac{k}{2^{J_n}}(b - a) - s\right| \le \delta_n\right\},$$
where $\delta_n = 2^{-J_n}$.
¹ The support of $X$ may be infinite when $X$ is a random design. But we may always consider the finite interval $[a, b]$ as the support of $X$. Without loss of generality, we assume $X \in [a_1, a_2]$ with the density function $f(x)$, where $a_1 = -\infty$ and $a_2 = +\infty$. We can take a transformation for $X$ by replacing the original $X$ by $1/\{1 + \exp\{X\}\}$, which does not have any effect on our proofs below. Obviously, the random variable $1/\{1 + \exp\{X\}\}$ is in $[0, 1]$.
We take a wavelet $\psi(x)$ throughout this paper, $x \in R$, with the following properties (using the notations in Wong et al., 1999).
(I) $\psi(x)$ is of bounded variation on $[-A, A]$ with a compact support on $[-A, A]$ with $A > 1$, and $\psi(x) = 0$, $x \in [-1, 1]$.
(II) $\psi(x)$ is a twice continuously differentiable function on $[-A, A]$. The wavelet function $\psi(x)$ satisfies the integral conditions
$$\int_{-A}^{A} \psi(x)\,dx = 0; \qquad \int_{-A}^{A} x\psi(x)\,dx = 0$$
and
$$\int_{1}^{A} \psi(x)\,dx \ne 0; \qquad \int_{1}^{A} x\psi(x)\,dx \ne 0.$$
(III) Furthermore, the wavelet function $\psi(x)$ has the following properties:
$$0 < \int_{y}^{A} \psi(x)\,dx < \int_{1}^{A} \psi(x)\,dx$$
for all $1 < y < A$ and
$$0 < -\int_{-A}^{-y} \psi(x)\,dx < -\int_{-A}^{-1} \psi(x)\,dx$$
for all $1 < y < A$.
From the wavelet $\psi(x)$ and any scale function $\phi(x)$, we can obtain the orthogonal wavelet basis on $L^2[a, b]$
$$\{\phi^{\mathrm{per}}_{l,k}(x),\ k \in I_l;\ \psi^{\mathrm{per}}_{J_n,k}(x),\ k \in D_n,\ J_n \ge l\},$$
where
$$\phi^{\mathrm{per}}_{l,k}(x) = \sum_{n} \frac{1}{\sqrt{b - a}}\,\phi_{l,k}\!\left(\frac{x - a}{b - a} + n\right), \tag{2.3}$$
$$\psi^{\mathrm{per}}_{J_n,k}(x) = \sum_{n} \frac{1}{\sqrt{b - a}}\,\psi_{J_n,k}\!\left(\frac{x - a}{b - a} + n\right), \tag{2.4}$$
with $\phi_{J_n,k}(x) = 2^{J_n/2}\phi(2^{J_n}x - k)$, $\psi_{J_n,k}(x) = 2^{J_n/2}\psi(2^{J_n}x - k)$, and $D_n = \{0, 1, 2, \ldots, 2^{J_n} - 1\}$.
As the orthogonal properties are not required in this paper, without loss of generality, we take
$$\phi^{\mathrm{per}}_{l,k}(x) = \frac{1}{\sqrt{b - a}}\,\phi_{l,k}\!\left(\frac{x - a}{b - a}\right), \qquad \psi^{\mathrm{per}}_{l,k}(x) = \frac{1}{\sqrt{b - a}}\,\psi_{l,k}\!\left(\frac{x - a}{b - a}\right),$$
and $D_n = \{k : k = [2^{J_n} z],\ 0 < z < 1\}$.
Now, the wavelet coefficient of the conditional mean function of model (2.1) is defined as
$$\beta^{(T)}_{J_n,k} = \int_a^b T(x)\,\psi^{\mathrm{per}}_{J_n,k}(x)\,dx, \tag{2.5}$$
where $\psi^{\mathrm{per}}_{J_n,k}(x)$ is defined by (2.4). Wong et al. (1999) proposed a simple empirical estimator of $\beta^{(T)}_{J_n,k}$, which is
$$V^{(T)}_{J_n}(k) = \frac{b - a}{N} \sum_{i=1}^{N} \psi^{\mathrm{per}}_{J_n,k}(w_i)\,\frac{1}{n_i} \sum_{l \in I_n(w_i)} Y_l, \tag{2.6}$$
where $N \to \infty$, $w_i$ are those points to divide the interval $[a, b]$ into $N + 1$ sub-intervals, that is $w_i = a + i(b - a)/N$ and $n_i = \# I_n(w_i)$.
Let $K(\cdot)$ be a probability density function with bounded support $[-c, c]$ for some constant $c > 0$. When the conditional regression function $T(x)$ is a smooth function, we have the following estimator:
$$T_n(x) = \sum_{i=1}^{n} K_{n,h_n}(X_i - x)\,Y_i \Big/ \sum_{i=1}^{n} K_{n,h_n}(X_i - x).$$
We can obtain two consistent estimators using the kernel estimation (Nadaraya, 1964) and local linear smoothers (Fan and Gijbels, 1996). The kernel estimator is $T_n(x)$ where
$$K_{n,h_n}(X_i - x) = K_h(X_i - x),$$
while the local linear smoother is $T_n(x)$ where
$$K_{n,h_n}(X_i - x) = K_h(X_i - x) \sum_{j=1}^{n} K_h(X_j - x)(X_j - x)^2 - K_h(X_i - x)(X_i - x) \sum_{j=1}^{n} K_h(X_j - x)(X_j - x), \tag{2.7}$$
in which $K_h(\cdot) = K(\cdot/h_n)$. In many applications, one often assumes that $K(\cdot)$ is a symmetric probability density with finite support $[-c, c]$, and $\{h_n;\ n = 1, 2, \ldots\}$ is a sequence of bandwidths with $h_n \to 0$ and $nh_n \to \infty$. For simplicity, we only consider the kernel estimator.
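The kernel estimator $T_n(x)$ with $K_{n,h_n}(X_i - x) = K_h(X_i - x)$ can be sketched in a few lines of Python. The Epanechnikov kernel used here is only an assumed example of a symmetric density with support $[-1, 1]$, and the function names are hypothetical.

```python
def epanechnikov(u):
    # A symmetric probability density supported on [-1, 1] (assumed choice).
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x, xs, ys, h, kernel=epanechnikov):
    # T_n(x) = sum_i K((X_i - x)/h) Y_i / sum_i K((X_i - x)/h).
    weights = [kernel((xi - x) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # No observation falls inside the kernel window around x.
        return float("nan")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

The estimate is a locally weighted average of the responses, so it reproduces a constant function exactly and smooths across a jump when the window straddles it.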
When $T(x)$ has no jump, the bias between $T_n(x)$ and $T(x)$ converges to zero. Conversely, when the conditional mean function $T(x)$ has at least one jump, $T_n(x)$ is not a consistent estimator of $T(x)$ in the neighborhood of the change points. The wavelet transformation of $T(x)$ magnifies this bias with a suitable wavelet $\psi(x)$ and a fine scale $J_n$, provided that the wavelet has an appropriate number of vanishing moments, so that the wavelet coefficients of $T(x)$ have larger values than other disturbances. This follows from the fact that a jump of $T(x)$ at point $k_0$ makes the wavelet coefficient of $T(x)$ near $k_0$ large for a suitably fine scale $J_n$. Some authors (Wang, 1995; Daubechies, 1992, p. 300) have shown similar results. Therefore, the integral estimator for the theoretical wavelet coefficient is a good statistic to test whether there are jumps in $T(x)$.
In fact, we may obtain two more generalized empirical estimators of $\beta^{(T)}_{J_n,k}$ based on $T_n(x)$. The first estimator is an integral estimator of the theoretic wavelet coefficient $\beta^{(T)}_{J_n,k}$. The idea of constructing this estimator is very simple and intuitive. This estimator is defined by
$$U^{(T)}_{J_n}(k) = \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\,\frac{\sum_{i=1}^{n} K_{n,h_n}(X_i - x)\,Y_i}{\sum_{i=1}^{n} K_{n,h_n}(X_i - x)}\,dx. \tag{2.8}$$
One often prefers the discretized estimator to the integral estimator because of computational problems. The discretized estimator is defined by
$$W^{(T)}_{J_n}(k) = \frac{b - a}{N} \sum_{i=1}^{N} \psi^{\mathrm{per}}_{J_n,k}(w_i)\,\frac{\sum_{j=1}^{n} K_{n,h_n}(X_j - w_i)\,Y_j}{\sum_{j=1}^{n} K_{n,h_n}(X_j - w_i)}, \tag{2.9}$$
where $N$ and $w_i$ are the same as those in (2.6). The simple empirical estimator (2.6) can be obtained from (2.9) by taking the kernel function
$$K(x) = \begin{cases} \tfrac{1}{2} & \text{if } |x| \le 1; \\ 0 & \text{if } |x| > 1. \end{cases}$$
The simple empirical estimator (2.6) has some drawbacks because the bandwidth $h_n$ selected in this estimator cannot reach the optimal value, that is, $h_{\mathrm{opt}} = Cn^{-1/5}$ for some constant $C$. As a result, the estimator (2.6) has a larger mean integrated squared error (MISE) than the estimators (2.8) and (2.9), which can use the optimal bandwidth, and the test based on (2.6) accepts the null hypothesis too often. Without loss of generality, we only discuss the wavelet coefficient when $X$ is a random design in the following sections. However, the results still hold when $X$ is a fixed design, with small modifications.
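The discretized estimator (2.9) can be sketched as follows. This is only an illustration under stated assumptions: the Haar function stands in for $\psi$ (it does not satisfy conditions (I)–(III) and is used purely because it is easy to write down), the box kernel is the one displayed above, and all function names and the grid size are hypothetical.

```python
import math


def haar(u):
    # Haar mother wavelet on [0, 1); an illustrative stand-in for psi only.
    if 0.0 <= u < 0.5:
        return 1.0
    if 0.5 <= u < 1.0:
        return -1.0
    return 0.0


def box_kernel(u):
    # K(x) = 1/2 on [-1, 1], zero elsewhere.
    return 0.5 if abs(u) <= 1.0 else 0.0


def kernel_smoother(x, xs, ys, h, kernel=box_kernel):
    weights = [kernel((xi - x) / h) for xi in xs]
    total = sum(weights)
    # Convention of this sketch: return 0 when the window is empty.
    return sum(w * y for w, y in zip(weights, ys)) / total if total > 0.0 else 0.0


def psi_per(x, level, k, a, b, wavelet=haar):
    # (b - a)^(-1/2) * 2^(J/2) * psi(2^J * (x - a)/(b - a) - k).
    u = (x - a) / (b - a)
    return 2.0 ** (level / 2.0) * wavelet(2.0 ** level * u - k) / math.sqrt(b - a)


def discretized_coefficient(k, level, xs, ys, h, a=0.0, b=1.0, grid_size=512):
    # W(k) = (b - a)/N * sum_i psi_per(w_i) * T_n(w_i), with w_i = a + i(b - a)/N.
    total = 0.0
    for i in range(1, grid_size + 1):
        w_i = a + i * (b - a) / grid_size
        weight = psi_per(w_i, level, k, a, b)
        if weight != 0.0:
            total += weight * kernel_smoother(w_i, xs, ys, h)
    return (b - a) * total / grid_size
```

On noiseless step data the coefficient whose support contains the jump dominates all others, which is the behavior the test statistic relies on.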
2.3. Asymptotic results of test statistics
Let $W(x)$ be the standard Wiener process with $EW(t) = 0$ and $E(W(s)W(t)) = s \wedge t = \min(s, t)$, in the probability space $D[0, 1]$ of all real-valued functions on the interval $[0, 1]$ that are right continuous and have left limits. We endow the space $D[0, 1]$ with the Skorohod metric (e.g., see Billingsley, 1968). Let $\{B(t);\ 0 \le t \le 1\}$ denote a standard Brownian bridge, $B(t) = W(t) - tW(1)$, with $EB(t) = 0$ and $E(B(s)B(t)) = t \wedge s - st$. We define some elements of $D[0, 1]$ to express our results as follows:
$$Y_{n0}(x) = \delta_n^{-1/2} \int_0^1 \psi\!\left(\frac{x - y}{\delta_n}\right) dW(y),$$
$$M_n(x) = \frac{f^{1/2}(a + x(b - a))\,U^{(T)}_{J_n}([2^{J_n}x])}{\sigma(a + x(b - a))} \quad \text{for } 0 \le x \le 1,$$
$$M_n^{*}(x) = \frac{f^{1/2}(a + x(b - a))\,W^{(T)}_{J_n}([2^{J_n}x])}{\sigma(a + x(b - a))} \quad \text{for } 0 \le x \le 1,$$
where $\delta_n = 2^{-J_n}$ and $[\cdot]$ denotes the largest integer less than or equal to its argument. $J_n$ is required to satisfy some conditions to derive the asymptotic distribution of empirical wavelet coefficients. To obtain our main results, some assumptions about $J_n$, $h_n$ and $N$ are required. These assumptions imply that the three sequences $J_n$, $h_n$ and $N$ need to satisfy some given convergence rate. But these assumptions are weak. In particular, the convergence rates for these constant sequences are always satisfied in many applications.
Assumption J(a). $\lim_{n\to\infty} 2^{2J_n}(\log n)^3/n = 0$, $\lim_{n\to\infty} (2^{5J_n}/n) = \infty$, and $\lim_{n\to\infty} 2^{J_n}h_n^2 \log n = 0$.
Assumption J(b). $\lim_{n\to\infty} n2^{J_n}/(Nh_n)^2 = 0$.
The following results play a role in detecting the jump points of the regression function:
Theorem 2.1. (a) Assume that conditions (A.1)–(A.5) in Appendix A, and Assumption J(a) are satisfied. Then under the null hypothesis $H_0$ (that is, when there is no change in the regression function $T(x)$),
$$\left(\frac{n}{\sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |M_n(x)| \quad \text{and} \quad \sup_{0 \le x \le 1} |Y_{n0}(x)|$$
have the same asymptotic distribution.
(b) Assume that conditions (A.1)–(A.5) in Appendix A, and Assumptions J(a)–J(b) are satisfied. Then under the null hypothesis,
$$\left(\frac{n}{\sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |M_n^{*}(x)| \quad \text{and} \quad \sup_{0 \le x \le 1} |Y_{n0}(x)|$$
have the same asymptotic distribution.
The following corollary gives approximate critical values for the tests under the null hypothesis, which states that there is no jump point in the regression $T(x)$.
Corollary 2.1. Assume that the conditions of Theorem 2.1(a) are satisfied. Then under the null hypothesis, we have
$$P\left\{A(\delta_n)\left[\left(\frac{n}{\kappa_2\sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |M_n(x)| - a(\delta_n)\right] < z\right\} \to \exp(-2\exp(-z)).$$
Suppose that the conditions of Theorem 2.1(b) are satisfied. Then under the null hypothesis, we have
$$P\left\{A(\delta_n)\left[\left(\frac{n}{\kappa_2\sigma_0^2}\right)^{1/2} \sup_{0 \le x \le 1} |M_n^{*}(x)| - a(\delta_n)\right] < z\right\} \to \exp(-2\exp(-z)),$$
where $A(x) = |2\log x|^{1/2}$ and
$$a(x) = |2\log x|^{1/2} + |2\log x|^{-1/2} \log\left(\frac{\kappa_1^{1/2}}{2\pi\kappa_2^{1/2}}\right),$$
where $\kappa_1 = \int_{-A}^{A} (\psi'(x))^2\,dx$ and $\kappa_2 = \int_{-A}^{A} \psi^2(x)\,dx$.
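The limit law in Corollary 2.1 translates into a rejection threshold once a level $\omega$ is fixed. The sketch below assumes the centering constant reads $a(x) = |2\log x|^{1/2} + |2\log x|^{-1/2}\log\{\kappa_1^{1/2}/(2\pi\kappa_2^{1/2})\}$ and solves $\exp(-2\exp(-z)) = 1 - \omega$ for $z$; the function names and the default level are hypothetical, and $\kappa_1$, $\kappa_2$ must be supplied for the wavelet actually used.

```python
import math


def gumbel_quantile(omega):
    # Solve exp(-2 * exp(-z)) = 1 - omega for z.
    return -math.log(-0.5 * math.log(1.0 - omega))


def critical_value(level, kappa1, kappa2, omega=0.05):
    # Threshold c such that, asymptotically under H0,
    # P{ (n / (kappa2 * sigma0^2))^(1/2) * sup|M_n(x)| > c } is about omega.
    delta = 2.0 ** (-level)                          # delta_n = 2^(-J_n)
    big_a = math.sqrt(abs(2.0 * math.log(delta)))    # A(delta_n)
    small_a = big_a + math.log(math.sqrt(kappa1 / kappa2) / (2.0 * math.pi)) / big_a
    return small_a + gumbel_quantile(omega) / big_a
```

Under these assumptions the null hypothesis would be rejected at level $\omega$ when the normalized supremum exceeds `critical_value(J_n, kappa1, kappa2, omega)`.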
It is interesting to see whether the test statistics are consistent under the alternative hypothesis $H_1$, that there are changes in the regression function. Hence, we further study the asymptotic properties of the statistics $U^{(T)}_{J_n}(k)$ and $W^{(T)}_{J_n}(k)$ under the alternative hypothesis in the following theorem. From this theorem, we obtain the estimators of the jump sizes and locations of change points.
Theorem 2.2. Assume that the conditions (A.1)–(A.5) of Appendix A are satisfied. Let $\tau_l$, $l = 1, 2, \ldots, p$, be $p$ jump points of $T(x)$, and the corresponding jump sizes be denoted $d_l$, $l = 1, 2, \ldots, p$.
(a) If Assumption J(a) is satisfied, then for $k \in I(\tau_l, 2^{-J_n}(b - a))$ we obtain that
$$U^{(T)}_{J_n}(k) = 2^{-J_n/2}(b - a)^{1/2}\,d_l \int_1^A \psi(x)\,dx + O_P(n^{-1/2}), \tag{2.10}$$
where $a_n = O_P(b_n)$ denotes $\lim_{n\to\infty} a_n/b_n = C$ in probability for some constant $C$, and for $k \notin \bigcup_{l=1}^{p} I(\tau_l, 2^{-J_n/2}(b - a))$, we have
$$U^{(T)}_{J_n}(k) = O_P(n^{-1/2}). \tag{2.11}$$
(b) If Assumptions J(a) and J(b) are satisfied, then (2.10) and (2.11) hold for the discretized estimator $W^{(T)}_{J_n}(k)$ of the empirical wavelet coefficient.
From the theorems above we can show that our tests for jumps of the regression function are consistent. Hence, we derive the following important corollary.
Corollary 2.2. Suppose that the assumptions of Theorem 2.2 are satisfied. Under the alternative hypothesis $H_1$, tests $n^{1/2}U^{(T)}_{J_n}(k)$ and $n^{1/2}W^{(T)}_{J_n}(k)$ are consistent in probability; that is, $n^{1/2}\max_{k \in D_n} |U^{(T)}_{J_n}(k)| \to \infty$ and $n^{1/2}\max_{k \in D_n} |W^{(T)}_{J_n}(k)| \to \infty$ as $n \to \infty$ in probability.
2.4. Estimation of jump size and change points
When we assume that $T(x)$ has only one change point $\tau_l$ in $[a, b]$, an estimator for the jump size $d_l$ of the change point is proposed as follows:
$$\hat{d}^{(i)}_l = \frac{2^{J_n/2} \max_{k \in D_n} |U^{(T)}_{J_n}(k)|}{(b - a)^{1/2} \int_1^A \psi(x)\,dx} \quad \text{or} \quad \hat{d}^{(d)}_l = \frac{2^{J_n/2} \max_{k \in D_n} |W^{(T)}_{J_n}(k)|}{(b - a)^{1/2} \int_1^A \psi(x)\,dx}.$$
We can show that asymptotic distributions of the estimators $\hat{d}^{(i)}_l$ and $\hat{d}^{(d)}_l$ are extreme distributions; that is, they have asymptotic distributions similar to those of Corollary 2.1. It is easy to prove the consistency of the two estimators $\hat{d}^{(i)}_l$ and $\hat{d}^{(d)}_l$.
Assumption J(c). $\lim_{n\to\infty} (2^{J_n}/n) = 0$; $\lim_{n\to\infty} (2^{5J_n}/n) = \infty$.
Theorem 2.3. Assume that the conditions (A.1)–(A.5) of Appendix A and Assumption J(c) are satisfied. Then under $H_1$,
$$|\hat{d}^{(i)}_l - d_l| = O_P((2^{-J_n}n)^{-1/2}).$$
Furthermore, assume that J(b) is satisfied. Then under $H_1$,
$$|\hat{d}^{(d)}_l - d_l| = O_P((2^{-J_n}n)^{-1/2}).$$
It should be noted that some values of $M_n(x)$ and $M_n^{*}(x)$ in Corollary 2.1, such as $f(\tau_l)$, $\sigma(\tau_l)$ and $\tau_l$, are unknown. Hence, we need to estimate them in order to use the results of Theorem 2.1. In the next section, we discuss the estimation of $f(\cdot)$ and $\sigma(\cdot)$ by nonparametric kernel methods and local polynomial smoothers. Here we only discuss two estimators of the change point $\tau_l$. For the sake of simplicity, assume that $T(x)$ has only one change point $\tau_l$ in $[a, b]$. From the results of Theorem 2.2, we can easily suggest an estimator for change point $\tau_l$. Let
$$\hat{\tau}^{(i)}_l = a + \hat{k}^{(i)}_l (b - a)/2^{J_n},$$
where
$$\hat{k}^{(i)}_l = \arg\max\{|U^{(T)}_{J_n}(k)|;\ k \in D_n\}.$$
Similarly, we can define another estimator for change point $\tau_l$. Let
$$\hat{\tau}^{(d)}_l = a + \hat{k}^{(d)}_l (b - a)/2^{J_n},$$
where
$$\hat{k}^{(d)}_l = \arg\max\{|W^{(T)}_{J_n}(k)|;\ k \in D_n\}.$$
It can be shown that $\hat{\tau}^{(i)}_l$ and $\hat{\tau}^{(d)}_l$ are the consistent estimators of change points. Furthermore, we can obtain the convergence rates for $\hat{\tau}^{(i)}_l$ and $\hat{\tau}^{(d)}_l$, which are the optimal minimax rates.
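Given a vector of empirical wavelet coefficients indexed by $k \in D_n$, the single change-point estimators above reduce to an arg max and a rescaling. The following sketch assumes the coefficients have already been computed (by either the integral or the discretized estimator) and that the value of $\int_1^A \psi(x)\,dx$ is supplied by the caller as `wavelet_tail_integral`; both function names are hypothetical.

```python
import math


def locate_change_point(coeffs, level, a=0.0, b=1.0):
    # k_hat = arg max_k |coeffs[k]|, tau_hat = a + k_hat * (b - a) / 2^J.
    k_hat = max(range(len(coeffs)), key=lambda k: abs(coeffs[k]))
    tau_hat = a + k_hat * (b - a) / 2 ** level
    return k_hat, tau_hat


def jump_size(coeffs, level, wavelet_tail_integral, a=0.0, b=1.0):
    # d_hat = 2^(J/2) * max_k |coeffs[k]| / ((b - a)^(1/2) * integral_1^A psi).
    peak = max(abs(c) for c in coeffs)
    return 2.0 ** (level / 2.0) * peak / (math.sqrt(b - a) * wavelet_tail_integral)
```

Because the estimator uses the maximum absolute coefficient, this sketch returns the magnitude of the jump scaled by the supplied integral, not its sign.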
Theorem 2.4. Suppose that the conditions (A.1)–(A.5) in Appendix A and Assumption J(c) are satisfied. Then
$$\hat{\tau}^{(i)}_l - \tau_l = O_P(2^{-J_n}).$$
In addition, if Assumption J(c) is satisfied, we have
$$\hat{\tau}^{(d)}_l - \tau_l = O_P(2^{-J_n}).$$
Remark 2.1. It should be noted that $J_n$ is not required to satisfy Assumption J(a) in Theorem 2.4, but it is necessary for Theorems 2.1 and 2.2. In particular, based on the integral estimator of the empirical wavelet coefficient, $2^{-J_n}$ can be taken as $O_P(n^{-1}(\log n)^{\eta})$ for any $\eta > 0$. For the discretized case, although $N$ and $J_n$ are required to satisfy Assumption J(c), $N$ should be chosen to be large enough such that $n/(Nh_n) \to \infty$ when $2^{J_n} = O(n(\log n)^{-\eta})$, as $n \to \infty$ for any $\eta > 0$.
We find that the convergence rate for the estimator of the change point by the empirical wavelet method is $O_P(n^{-1})$. Carlstein et al. (1994), Raimondo (1998), and Wang (1995) have studied estimation problems of the change point in the regression $T(x)$ for fixed designs with nonparametric methods under the i.i.d. assumption for observations. If $T(x)$ has only one jump at a point $\tau_l$ (and is otherwise Lipschitz continuous), then the minimax rate of the problem is known to be $O_P(n^{-1})$ (Korostelev, 1987). Therefore, from the results of Theorem 2.4, the convergence rate reaches the best possible rate. Müller (1992) further proposed consistent kernel type estimators and established limit distributions. The corresponding rates of convergence are $O_P(n^{-\alpha})$ for $1/2 \le \alpha < 1$. Hence, the convergence rate of Theorem 2.4 is typically faster than $O_P(n^{-1/2})$.
2.5. Multiple change points
When the regression function $T(x)$ has $p$ (known or unknown) change points in the interval $[a, b]$, we can assume, without loss of generality, that $|d_{l+1}| > |d_l|$, $l = 1, 2, \ldots, p - 1$. Suppose that $T(x)$ is differentiable at all points on $(a, b)$ except at $\tau_l$, $l = 1, 2, \ldots, p$.
When we use the estimators $\hat{\tau}^{(i)}_l$ and $\hat{\tau}^{(d)}_l$ for jump points as mentioned before, we only obtain an estimate of the point with the largest jump size if $T(x)$ has at least two jump points. Therefore, the estimation of change points needs to be conducted sequentially. Since we assume that $|d_{l+1}| > |d_l|$, $l = 1, 2, \ldots, p - 1$, we can sequentially suggest several estimators for the break points $\tau_l$, $l = 1, 2, \ldots, p$ (following Wang, 1995, and Delgado and Hidalgo, 2000), which are defined by
$$\tilde{\tau}^{(i)}_l = a + \tilde{k}^{(i)}_l (b - a)/2^{J_n}; \quad l = 1, 2, \ldots, p,$$
$$\tilde{\tau}^{(d)}_l = a + \tilde{k}^{(d)}_l (b - a)/2^{J_n}; \quad l = 1, 2, \ldots, p,$$
where
$$\tilde{k}^{(i)}_j = \arg\max_{k \in Q^{(j)}} |U^{(T)}_{J_n}(k)|, \qquad \tilde{k}^{(d)}_j = \arg\max_{k \in Q^{(j)}} |W^{(T)}_{J_n}(k)|; \quad j = 1, 2, \ldots, p, \tag{2.12}$$
in which $Q^{(j)} = D_n \setminus \bigcup_{l=1}^{j-1} I(\tilde{\tau}_l, 2^{-J_n}A(b - a))$, and $\tilde{\tau}_l$ is one of $\tilde{\tau}^{(i)}_l$ and $\tilde{\tau}^{(d)}_l$, corresponding to $U^{(T)}_{J_n}(k)$ and $W^{(T)}_{J_n}(k)$, respectively. The estimators of jump sizes of change points $\tau_l$, $l = 1, 2, \ldots, p$, can be defined by
$$\tilde{d}^{(i)}_l = \frac{2^{J_n/2}\,U^{(T)}_{J_n}(\tilde{k}^{(i)}_l)}{(b - a)^{1/2} \int_1^A \psi(x)\,dx} \quad \text{and} \quad \tilde{d}^{(d)}_l = \frac{2^{J_n/2}\,W^{(T)}_{J_n}(\tilde{k}^{(d)}_l)}{(b - a)^{1/2} \int_1^A \psi(x)\,dx}.$$
When $p$ is unknown, Theorem 2.1(a) implies that $n^{1/2}|U^{(T)}_{J_n}(k)| \le C_{1-\omega}$ with approximate probability $1 - \omega$ at those $x$ values at which $T(x)$ has no jump. So we take $C_{1-\omega}$ as a threshold and use these values to determine which point is a change point. In fact, if $\max_{k \in Q^{(l)}} |U^{(T)}_{J_n}(k)| > C_{1-\omega}$, then $\tilde{\tau}^{(i)}_l = a + \tilde{k}^{(i)}_l (b - a)/2^{J_n}$ is a change point with probability $1 - \omega$, where $\tilde{k}^{(i)}_l = \arg\max_{k \in Q^{(l)}} |U^{(T)}_{J_n}(k)|$. Similarly, we define $\tilde{\tau}^{(d)}_l$. Let $\hat{p}_j$ ($j = 1, 2$) be the number of the maxima and $\hat{\tau}_1, \ldots, \hat{\tau}_{\hat{p}}$ be their locations, where $\hat{\tau}_l$ is one of $\tilde{\tau}^{(i)}_l$ (corresponding to $\hat{p} = \hat{p}_1$, based on the integral estimation of the wavelet coefficient) and $\tilde{\tau}^{(d)}_l$ (corresponding to $\hat{p} = \hat{p}_2$, based on the discretized estimation of the wavelet coefficient). As a result, we can sequentially define estimators of jump sizes for $p$ change points, and thus obtain the following results:
Theorem 2.5. Assume that conditions (A.1)–(A.5) in Appendix A are satisfied. Let $\tau_l$, $l = 1, 2, \ldots, p$ be $p$ jump
points of function $T(x)$, and the corresponding jump size be denoted $d_l$, $l = 1, 2, \ldots, p$.
(a) If Assumption J(c) is satisfied, then
\[ P(\hat p_1 = p) \to 1, \]
\[ \tilde d_l^{(i)} - d_l = O_P\big((2^{-J_n} n)^{-1/2}\big), \qquad l = 1, 2, \ldots, \hat p, \]
\[ \tilde\tau_l^{(i)} - \tau_l = O_P(2^{-J_n}), \qquad l = 1, 2, \ldots, \hat p. \]
(b) If Assumptions J(c) and J(d) are satisfied, then
\[ P(\hat p_2 = p) \to 1, \]
\[ \tilde d_l^{(d)} - d_l = O_P\big((2^{-J_n} n)^{-1/2}\big), \qquad l = 1, 2, \ldots, \hat p, \]
\[ \tilde\tau_l^{(d)} - \tau_l = O_P(2^{-J_n}), \qquad l = 1, 2, \ldots, \hat p. \]
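The sequential search behind (2.12) and the threshold rule for unknown $p$ can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `halfwidth` exclusion window (standing in for the excluded neighbourhoods around earlier estimates), and the toy coefficient vector are all hypothetical, and the threshold is supplied by the user rather than computed from the asymptotic theory.

```python
import numpy as np

def sequential_change_points(coef, threshold, halfwidth, max_points=None):
    """Repeatedly take the argmax of |coef|, then exclude a window around it.

    coef       : 1-D array of empirical wavelet coefficients indexed by k
    threshold  : stop when the largest remaining |coef| does not exceed it
    halfwidth  : number of indices excluded on each side of a selected k
    max_points : optional cap on the number of change points (known p)
    """
    mag = np.abs(np.asarray(coef, dtype=float))
    available = np.ones(mag.size, dtype=bool)
    picks = []
    while available.any():
        if max_points is not None and len(picks) >= max_points:
            break
        masked = np.where(available, mag, -np.inf)
        k = int(np.argmax(masked))
        if masked[k] <= threshold:
            break
        picks.append(k)
        lo = max(0, k - halfwidth)
        hi = min(mag.size, k + halfwidth + 1)
        available[lo:hi] = False  # exclude the neighbourhood of this estimate
    return picks

def index_to_location(k, a, b, level):
    """Map an index k to a location a + k (b - a) / 2^level."""
    return a + k * (b - a) / 2**level

# Toy coefficients (hypothetical): two well separated peaks at k = 20 and k = 45.
coef = np.zeros(64)
coef[20], coef[21], coef[45] = 3.0, 2.5, -2.0
picks = sequential_change_points(coef, threshold=1.0, halfwidth=3)
locations = [index_to_location(k, 0.0, 1.0, 6) for k in picks]
```

The neighbour at `k = 21` is not reported as a separate change point because it falls inside the window excluded after the first pick.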
2.6. Estimation of the asymptotic variance
The estimation of the asymptotic variance in model (2.1) is complicated because it involves the density
function $f(x)$ of the explanatory variable $X$ and the heteroscedastic variance function $\sigma(x)$. The density
can be estimated by nonparametric kernel techniques (see Nadaraya, 1964). The kernel estimator is
defined by
\[ f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left( \frac{X_i - x}{h_n} \right), \tag{2.13} \]
where $h_n$ is a window-width variable with $h_n \to 0$ and $n h_n \to \infty$, and $K(x)$ is a probability density function
that may be different from that of (2.7).
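A minimal sketch of the kernel density estimator (2.13) is given below. The Gaussian kernel and the function names are assumptions made for illustration; the text only requires $K$ to be a probability density.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kernel_density(x_grid, sample, h, kernel=gaussian_kernel):
    """f_n(x) = (n h)^{-1} sum_i K((X_i - x) / h) at each point of x_grid."""
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (sample[None, :] - x_grid[:, None]) / h  # shape (grid, n)
    return kernel(u).sum(axis=1) / (sample.size * h)

# Illustration with uniform data on [0, 1] and a bandwidth of order n^{-1/5}.
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, 2000)
density_at_half = kernel_density(0.5, sample, h=2000 ** (-0.2))[0]
```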
The regression function $T(x)$ can be estimated by the kernel estimator or the local linear smoother under the
null hypothesis; that is,
\[ \hat T(x) = \frac{\sum_{i=1}^{n} K_h(X_i - x)\, Y_i}{\sum_{i=1}^{n} K_h(X_i - x)}, \tag{2.14} \]
where the kernel function $K(\cdot)$ is a probability density function, or
\[ \hat T_1(x) = \frac{\sum_{i=1}^{n} K_{n,h_n}(X_i - x)\, Y_i}{\sum_{i=1}^{n} K_{n,h_n}(X_i - x)}, \tag{2.15} \]
where the kernel function $K_{n,h_n}(\cdot)$ is the same as that of (2.7), but where $K(\cdot)$ may be different from that
of (2.7).
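The two smoothers can be sketched as below. The sketch assumes that the local linear weights take the form shown in (2.20), that is, $K_i [S_2 - (X_i - x) S_1]$ with $S_m = \sum_j K_j (X_j - x)^m$; the compactly supported Epanechnikov kernel and all function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def nadaraya_watson(x0, X, Y, h, kernel=epanechnikov):
    """Kernel (local constant) estimator in the form of (2.14)."""
    k = kernel((X - x0) / h)
    return np.sum(k * Y) / np.sum(k)

def local_linear(x0, X, Y, h, kernel=epanechnikov):
    """Local linear smoother written as a ratio of weighted sums, as in (2.15)."""
    d = X - x0
    k = kernel(d / h)
    s1 = np.sum(k * d)
    s2 = np.sum(k * d**2)
    w = k * (s2 - d * s1)  # local linear weights
    return np.sum(w * Y) / np.sum(w)

# A straight line is reproduced exactly by the local linear fit, even at the
# boundary x = 0, whereas the local constant fit is biased there.
X = np.linspace(0.0, 1.0, 101)
Y = 2.0 + 3.0 * X
ll_at_boundary = local_linear(0.0, X, Y, h=0.2)
nw_at_boundary = nadaraya_watson(0.0, X, Y, h=0.2)
```

Both functions divide by a sum of weights, so a bandwidth small enough to leave the window empty would produce a division by zero; a practical implementation would guard against that.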
The heteroscedastic variance estimator is slightly more complex. Estimation of the conditional variance and
of density is of common interest in a variety of statistical applications, such as in measuring volatility or risk,
the return distribution function, and the distribution of the predictive error in finance (Anderson and Lund,
1997; Gallant and Tauchen, 1997). Under a general setup, which includes nonlinear time series models as a
special case, Fan and Yao (1998) have proposed an efficient and adaptive method for estimating the
conditional variance. They improved the estimator suggested by Härdle and Tsybakov (1997) by regarding
the estimation of $\sigma^2(\cdot)$ as a nonparametric regression
problem in view of the relation $E(r \mid X = x) = \sigma^2(x)$, where $r = (Y - T(X))^2$, assuming that $E(\varepsilon_i \mid X_i) = 0$ and
$\mathrm{var}(\varepsilon_i \mid X_i) = 1$. Xia (1999) has proposed a robust estimator for the conditional variance by using the same idea
as that of Fan and Yao (1998).
Hence, we use a similar robust estimator for the volatility function $\sigma(x)$. For simplicity, we assume that $\varepsilon_j$ is
independent of $(X_i;\ i \le j)$ in model (2.1), and $\sigma(x)$ (or $\sigma^2(x)$) has a bounded and continuous derivative of second
order on $(a, b)$. We can derive the two estimators for the heteroscedastic variance in model (2.1). From model
(2.1), we can write an alternative nonparametric volatility model as
\[ |Y_i - T(X_i)| = \sigma_{10}\,\sigma(X_i) + \sigma(X_i)\,(|\varepsilon_i| - \sigma_{10}) \tag{2.16} \]
and
\[ (Y_i - T(X_i))^2 = \sigma_{20}\,\sigma^2(X_i) + \sigma^2(X_i)\,(\varepsilon_i^2 - \sigma_{20}), \tag{2.17} \]
where $\sigma_{10} = E(|\varepsilon|)$ and $\sigma_{20} = E(\varepsilon^2)$. Hence the transformed models (2.16) and (2.17) have the same
structure as model (2.1). First, we consider the simple case where $T(x)$ has been estimated. With the
expressions of (2.16) and (2.17), $T(x)$ in the estimators of the conditional variance may be replaced by the
estimated regression function. This leads to the following simple forms: the absolute deviation
estimator $\hat\sigma_1(x)$,
\[ \hat\sigma_1(x) = \frac{\sum_{i=1}^{n} V_{n,b}(X_i - x)\, |Y_i - \hat T(X_i)|}{\sigma_{10} \sum_{i=1}^{n} V_{n,b}(X_i - x)}, \tag{2.18} \]
and the square estimator $\hat\sigma_2^2(x)$,
\[ \hat\sigma_2^2(x) = \frac{\sum_{i=1}^{n} V_{n,b}(X_i - x)\, (Y_i - \hat T(X_i))^2}{\sigma_{20} \sum_{i=1}^{n} V_{n,b}(X_i - x)}, \tag{2.19} \]
where
\[ V_{n,b}(X_i - x) = V_b(X_i - x) \sum_{j=1}^{n} V_b(X_j - x)(X_j - x)^2 - V_b(X_i - x)(X_i - x) \sum_{j=1}^{n} V_b(X_j - x)(X_j - x), \tag{2.20} \]
where $V_b(\cdot) = V(\cdot/b)$, $V(\cdot)$ is a kernel function that may be different from those kernel functions in (2.7) and
(2.13), $b_n$ is a sequence of bandwidths, and $\sigma_{10}$ and $\sigma_{20}$ are the same as those in (2.16) and (2.17). Without loss
of generality, we assume that $\sigma_{10} = 1$ or $\sigma_{20} = 1$ for identification. It is easy to show that $\hat\sigma_1(x)$ is more robust
than $\hat\sigma_2(x)$ when the distribution of $\varepsilon$ is fat-tailed or there are outliers in the observations. Xia (1999) has
shown that when the distribution of $\varepsilon$ has a kurtosis greater than 3, $\hat\sigma_1(x)$ tends to be more efficient than $\hat\sigma_2(x)$,
and the larger the degree of the diffusion, the more efficient $\hat\sigma_1(x)$ is relative to $\hat\sigma_2(x)$. However, mathematically, it is
easier to deal with the estimator $\hat\sigma_2(x)$. Hence, in our simulations we use the estimator $\hat\sigma_2(x)$ in comparison with
$\hat\sigma_1(x)$.
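The two volatility estimators (2.18) and (2.19), with the local linear weights (2.20), can be sketched as follows. The residuals $Y_i - \hat T(X_i)$ are assumed to have been computed beforehand; the Gaussian kernel $V$, the function names, and the default normalisations $\sigma_{10} = \sigma_{20} = 1$ are illustrative assumptions (the text imposes only one of the two normalisations at a time).

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel V."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_linear_weights(x0, X, b, kernel=gaussian_kernel):
    """Weights V_{n,b}(X_i - x0) of (2.20)."""
    d = X - x0
    v = kernel(d / b)
    return v * np.sum(v * d**2) - v * d * np.sum(v * d)

def sigma_abs(x0, X, resid, b, sigma10=1.0, kernel=gaussian_kernel):
    """Absolute deviation estimator of sigma(x0), as in (2.18)."""
    w = local_linear_weights(x0, X, b, kernel)
    return np.sum(w * np.abs(resid)) / (sigma10 * np.sum(w))

def sigma2_square(x0, X, resid, b, sigma20=1.0, kernel=gaussian_kernel):
    """Square estimator of sigma^2(x0), as in (2.19)."""
    w = local_linear_weights(x0, X, b, kernel)
    return np.sum(w * resid**2) / (sigma20 * np.sum(w))

# Residuals of constant magnitude 0.5: both estimators recover it exactly.
X = np.linspace(0.0, 1.0, 50)
resid = 0.5 * (-1.0) ** np.arange(50)
s1 = sigma_abs(0.5, X, resid, b=0.1)
s2 = sigma2_square(0.5, X, resid, b=0.1)
```

For standard normal errors $E|\varepsilon| = \sqrt{2/\pi}$, so under the normalisation $\sigma_{20} = 1$ one would pass `sigma10=np.sqrt(2 / np.pi)` to `sigma_abs`.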
Using similar proofs to those of Theorems 5 and 6 in Masry (1996), and incorporating main results of
Masry and Tjøstheim (1995), we have
\[ \sup_{a \le x \le b} |\hat\sigma(x) - \sigma(x)| = O\!\left( \left( \frac{\log n}{n h_n} \right)^{1/2} \right) \quad \text{a.s.} \tag{2.21} \]
\[ \sup_{a \le x \le b} |\hat\sigma(x) - \sigma(x)| = O_P\!\left( \left( \frac{\log n}{n h_n} \right)^{1/2} \right), \tag{2.22} \]
where a.s. denotes convergence with probability 1, $O_P$ denotes order in probability, and $\hat\sigma(x)$ is either
$\hat\sigma_1(x)$ or $\hat\sigma_2(x)$. Hence, we can construct two tests that do not include any unknown quantities:
1=2
Jn
f^ ða þ xðb aÞÞU ðTÞ
J n ð½2 xÞ
^ n ðxÞ ¼
,
M
^ þ xðb aÞÞ
sða
^ ðxÞ ¼
M
n
1=2
Jn
f^ ða þ xðb aÞÞW ðTÞ
J n ð½2 xÞ
,
^ þ xðb aÞÞ
sða
(2.23)
(2.24)
where 0pxp1. Therefore, the following corollary follows immediately from Corollary 2.1, Lemma A.4, and
(2.22).
Corollary 2.3. Assume that the conditions of Theorem 2.1(a) are satisfied. Then under the null hypothesis H0 , we
have
(
)
n 1=2
^ n ðxÞj aðdn Þoz ! expð2 expðzÞÞ.
P Aðdn Þ
sup jM
k2 s20
0pxp1
Suppose that the conditions of Theorem 2.1(b) are satisfied. Then, under the null hypothesis, we have
(
)
n 1=2
^
sup jM n ðxÞj aðdn Þoz ! expð2 expðzÞÞ,
P Aðdn Þ
k2 s20
0pxp1
where AðxÞ ¼ j2 log xj1=2 and
1=2
1=2
aðxÞ ¼ j2 log xj
where k1 ¼
RA
0
2
A ðc ðxÞÞ
1=2
þ j2 log xj
dx and k2 ¼
log
RA
A
k1
!
1=2
,
2pk2
c2 ðxÞ dx.
This corollary gives the asymptotic critical values to detect jump points of the regression function $T(x)$.
Results similar to Corollary 2.4 for model (2.2) also hold when $\hat f(a + x(b - a))$ is replaced by the known
$f(a + x(b - a))$.
3. Simulation
In this section, we carry out several Monte Carlo simulations to investigate the finite sample properties of
tests based on the empirical wavelet coefficients. All simulations and statistical computations were written with
Fortran 77. First, we consider the case of the regression function with heteroscedastic conditional variance.
The choice of the bandwidth is very important in the estimation of the regression function and in the
estimators of the asymptotic variance in Theorem 2.1. To use the data-driven bandwidth, we show that the
random bandwidth can be used. This can be achieved by proving that the results in Section 2 hold uniformly
for $h_n \in [c_1 n^{-1/5}, c_2 n^{-1/5}]$, where $c_1 < c_2$. It is known that the power of the tests is sensitive to the choice of
bandwidths. We shall only address the problem of bandwidth selection in the regression function since the
same principle applies to the estimation of the density function and the conditional variance function. Let
\[ \mathrm{MISE} = E\left\{ \int_a^b [\hat T(x) - T(x)]^2\, w(x) f(x)\,dx \right\}, \]
where $w(x)$ is a given weight function which we will later take to be 1, and
\[ \hat T(x) = \frac{\sum_{j=1}^{n} K_{n,h_n}(X_j - x)\, Y_j}{\sum_{j=1}^{n} K_{n,h_n}(X_j - x)}, \tag{3.1} \]
where $K_{n,h_n}(\cdot)$ satisfies (2.20). The ideal bandwidth in the sense of MISE for $\hat T(x)$ is
\[ h_0 = \left( \frac{\kappa_2 \sigma^2 \int_0^1 w(x)\,dx}{\int_0^1 T''(x)\, w(x) f(x)\,dx} \right)^{1/5} n^{-1/5}, \tag{3.2} \]
provided that the null hypothesis holds and the conditional variance is constant. However, while $T(x)$ has a
bounded and continuous second-order derivative on $(a, b)$, the conditional variance is not constant; that is, the
conditional variance is $\sigma^2(x)$. Hence, (3.2) should be changed to
\[ h_0 = \left( \frac{\kappa_2 \int_0^1 \sigma^2(x)\, w(x)\,dx}{\int_0^1 T''(x)\, w(x) f(x)\,dx} \right)^{1/5} n^{-1/5}. \]
Note that in the formula of ideal bandwidth (3.2), $T''(x)$ and $\sigma^2$ are unknown. Thus, based on the properties
available, we first use the cross-validation method to obtain $h_0'$ for the third-order local polynomial fitting.
Then, we make an adjustment to $h_0'$ in order to obtain a suitable bandwidth for the estimation of $T''(x)$ (see
Fan and Gijbels, 1996). The bandwidth for the local third-order polynomial fitting by the cross-validation
method is
\[ \hat h_0' = \arg\min_{h'} \sum_{i=1}^{n} \big( Y_i - \hat T_{0,h'}(X_i) \big)^2\, w(X_i), \tag{3.3} \]
where $\hat T_{0,h'}$ is the kernel estimator (3.1), or local linear smoother (Fan and Gijbels, 1996), using the data
$\{(X_j, Y_j);\ j \ne i\}$. Then the bandwidth suitable for $\hat T''(x)$ is
\[ \hat h_n = \mathrm{adj}_{2,3}\, \hat h_0', \tag{3.4} \]
where $\mathrm{adj}_{2,3}$ is an adjusting constant, which depends only on the kernel function. For example, if the kernel
function is a Gaussian (Epanechnikov) kernel, then $\mathrm{adj}_{2,3} = 0.8285$ $(0.7776)$ (see Fan and Gijbels, 1996). For
the estimation of the constant conditional variance, $\sigma^2$, we use the procedure of Gasser et al. (1986). A
proposed estimator of $\sigma^2$ is
\[ \hat\sigma^2 = \frac{1}{n - 2} \sum_{t=2}^{n-1} \frac{\big( a_t \tilde Y_{t-1} + b_t \tilde Y_{t+1} - \tilde Y_t \big)^2}{a_t^2 + b_t^2 + 1}, \tag{3.5} \]
where $\{\tilde Y_t;\ t = 1, 2, \ldots, n\}$ corresponds to the order statistics $\{\tilde X_{[t]};\ t = 1, 2, \ldots, n\}$ of $\{X_t;\ t = 1, 2, \ldots, n\}$, and
when $X_{[t+1]} - X_{[t-1]} \ne 0$,
\[ a_t = (X_{[t+1]} - X_{[t]})/(X_{[t+1]} - X_{[t-1]}), \]
\[ b_t = (X_{[t]} - X_{[t-1]})/(X_{[t+1]} - X_{[t-1]}). \]
When $X_{[t+1]} - X_{[t-1]} = 0$, let $a_t = b_t = 1/2$. After obtaining estimators $\hat T''(x)$ and $\hat\sigma^2$ of $T''(x)$ and $\sigma^2$, we then
have the "plug-in" bandwidth for $\hat T(x)$
as
!1=5
R1
k2 s^ 2 0 wðxÞ dx
h0 ¼
n=5 .
(3.6)
P
4 n T^ 00 ðX t ÞwðX t Þ
t¼1
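The difference-based variance estimator of Gasser et al. (1986) used in (3.5) can be sketched as below. The sketch assumes the standard squared pseudo-residual form with the sum running over interior points of the ordered sample; the function name is hypothetical.

```python
import numpy as np

def gasser_variance(X, Y):
    """Difference-based estimate of a constant error variance.

    Pseudo-residuals compare each ordered response with the linear
    interpolation of its two neighbours; ties in X use a_t = b_t = 1/2.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    order = np.argsort(X, kind="stable")
    x, y = X[order], Y[order]
    span = x[2:] - x[:-2]
    safe = np.where(span != 0.0, span, 1.0)  # avoid 0/0 for tied design points
    a = np.where(span != 0.0, (x[2:] - x[1:-1]) / safe, 0.5)
    b = np.where(span != 0.0, (x[1:-1] - x[:-2]) / safe, 0.5)
    pseudo = a * y[:-2] + b * y[2:] - y[1:-1]
    return np.sum(pseudo**2 / (a**2 + b**2 + 1.0)) / (x.size - 2)

# Illustration: smooth mean plus noise with standard deviation 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 2000)
Y = np.sin(2.0 * np.pi * X) + 0.5 * rng.standard_normal(2000)
sigma2_hat = gasser_variance(X, Y)
```

Because the pseudo-residuals vanish for an exactly linear mean, the estimate needs no preliminary smoothing, which is why it is convenient as an input to a plug-in bandwidth.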
When the conditional variance is not a constant, the "plug-in" bandwidth involves the nonparametric
estimation of the conditional variance $\sigma^2(x)$. It becomes more complicated to choose the bandwidth in $\hat T(x)$.
We can use the estimators of (2.18) and (2.19) for $\sigma(x)$ and $\sigma^2(x)$, respectively. Another bandwidth must be
selected to estimate the unknown conditional variance function $\sigma^2(x)$. We obtain some suitable bandwidths for
the conditional variance function $\sigma^2(x)$ and density function $f(x)$ by similar procedures. The details have been
omitted.
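The leave-one-out cross-validation criterion of (3.3) can be sketched as follows. This is an illustration under stated assumptions: it uses a local linear smoother with a Gaussian kernel rather than the third-order local polynomial fit, searches over a user-supplied grid of candidate bandwidths, and omits the adjustment step (3.4); all names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def loo_local_linear(X, Y, h, kernel=gaussian_kernel):
    """Local linear fit at each X_i computed from the data with point i removed."""
    n = X.size
    fits = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        d = X[keep] - X[i]
        k = kernel(d / h)
        s1, s2 = np.sum(k * d), np.sum(k * d**2)
        w = k * (s2 - d * s1)
        fits[i] = np.sum(w * Y[keep]) / np.sum(w)
    return fits

def cv_bandwidth(X, Y, candidates, weight=None, kernel=gaussian_kernel):
    """Return the candidate minimising sum_i (Y_i - T_{-i,h}(X_i))^2 w(X_i)."""
    w = np.ones_like(X) if weight is None else weight(X)
    scores = [float(np.sum((Y - loo_local_linear(X, Y, h, kernel)) ** 2 * w))
              for h in candidates]
    return candidates[int(np.argmin(scores))], scores

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 200)
Y = np.sin(2.0 * np.pi * X) + 0.3 * rng.standard_normal(200)
candidates = [0.02, 0.05, 0.1, 0.3, 1.0]
h_cv, cv_scores = cv_bandwidth(X, Y, candidates)
```

A grid search keeps the sketch short; a continuous optimiser over $h'$ could replace it without changing the criterion.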
In the following examples, we consider the Epanechnikov kernel
\[ K(x) = \frac{3\,(1 - x^2/5)\, I(x^2 \le 5)}{4\sqrt{5}}, \]
for the estimator $\hat T(x)$, and the Gaussian kernel
\[ K_0(x) = (2\pi)^{-1/2} \exp(-x^2/2) \]
for the estimator $\hat T''(x)$ and the estimator of the conditional heteroscedastic variance.
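As a quick numerical check, the sketch below evaluates both kernels on a fine grid and approximates their total mass and second moment by a Riemann sum. It assumes the unit-variance scaling $1 - x^2/5$ for the Epanechnikov kernel, which is what the support $x^2 \le 5$ and the constant $4\sqrt{5}$ suggest; the function names are hypothetical.

```python
import numpy as np

def epanechnikov_unit_variance(x):
    """Epanechnikov kernel rescaled to have support [-sqrt(5), sqrt(5)]."""
    x = np.asarray(x, dtype=float)
    return 3.0 * (1.0 - x**2 / 5.0) * (x**2 <= 5.0) / (4.0 * np.sqrt(5.0))

def gaussian(x):
    """Standard normal density."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

grid = np.linspace(-10.0, 10.0, 200001)
dx = grid[1] - grid[0]
moments = {}
for name, kernel in (("epanechnikov", epanechnikov_unit_variance),
                     ("gaussian", gaussian)):
    values = kernel(grid)
    mass = float(values.sum() * dx)               # should be close to 1
    second = float((grid**2 * values).sum() * dx)  # should be close to 1
    moments[name] = (mass, second)
```

With this scaling both kernels integrate to one and have unit variance, so the same bandwidth has a comparable meaning for either kernel.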
Example 3.1. To assess the size and power of tests we consider the following model
\[ Y_t = T(X_t) + \sigma(X_t)\,\varepsilon_t, \qquad t = 1, 2, \ldots, \]
with conditional heteroscedastic variance ($\sigma^2(x)$). The errors $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ are a sequence of independently
and identically distributed (i.i.d.) random variables with the standard normal $N(0, 1)$; $\{X_t;\ t = 1, 2, \ldots\}$ are a
sequence of i.i.d. uniform random variables in $[0, 1]$. We generate 500 pairs of samples of sizes $n = 128$ and
$n = 256$ (these sample sizes are convenient for resolution level $J_n$, such as $2^7 = 128$ and $2^8 = 256$). Assume that
the regression function $T(x)$ has the form
\[ T(x) = \begin{cases} 0.6x^2 & \text{if } x > \tau_0, \\ 0.6x^2 + d_0 & \text{if } x \le \tau_0, \end{cases} \tag{3.7} \]
and the conditional variance $\sigma^2(x) = 0.4x^2$. To study the effect of tests at different locations, we take $\tau_0$ at
three different values: 0.3, 0.5, and 0.75, for the location and $d_0$ at four different values, 0, 0.3, 0.5, and 1, for
the jump size. The wavelet has been taken as
8
4
if 1pxp2;
>
< 5ðx 1Þ
3
2
20
cðxÞ ¼
if 2pxp 1;
3 ðx þ 1Þ þ 2ðx þ 1Þ
>
:
0
otherwise:
We compare the absolute deviation estimator with the square estimator for the conditional heteroscedastic
variance in different samples.
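The data-generating process of Example 3.1 can be sketched as below, with the regression function (3.7), $\sigma^2(x) = 0.4x^2$, uniform regressors and standard normal errors. The function names and the seed are illustrative; the test statistics themselves are not computed here.

```python
import numpy as np

def regression_37(x, tau0, d0):
    """T(x) of model (3.7): 0.6 x^2, shifted up by d0 for x <= tau0."""
    x = np.asarray(x, dtype=float)
    return 0.6 * x**2 + d0 * (x <= tau0)

def simulate_example_31(n, tau0, d0, rng):
    """Draw one sample (X_t, Y_t), t = 1..n, from the model of Example 3.1."""
    x = rng.uniform(0.0, 1.0, n)
    eps = rng.standard_normal(n)
    sigma = np.sqrt(0.4) * x  # sigma^2(x) = 0.4 x^2 with x >= 0
    y = regression_37(x, tau0, d0) + sigma * eps
    return x, y

rng = np.random.default_rng(2008)
x, y = simulate_example_31(n=256, tau0=0.5, d0=1.0, rng=rng)
```

Repeating the call 500 times for each combination of `tau0`, `d0` and `n` would give the replications on which sizes and powers are based.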
The simulated results are listed in Tables 1 and 2. We find that the powers and sizes of the tests are very
good. Tables 1 and 2 also show that the estimation of the location of change points is very accurate for
model (3.7). The estimation of the locations and jump sizes of change points with the
absolute deviation estimator for the conditional heteroscedastic variance appears to be better than that with the square
estimator. When the size of the sample is small ($n \le 128$), the test statistic $M_n(k)$ with the absolute deviation
estimator for $\sigma^2(x)$ has slightly more power than that with the square estimator. But when the sample
becomes large, it is not easy to determine which has more power. We also see that the estimator $\hat d_l^{(d)}$
overestimates the jump size when the sample is small (when $n \le 128$). Again, from Tables 1 and 2, we find that
the location of change points seems to have an effect on the power of the tests, and the bias in the estimation of
the locations of change points increases as the real locations of change points approximate each other.
Next, consider when the error $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ is a sequence of stationary random variables satisfying the
ARMA(1,1) model:
\[ \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t + \theta\,u_{t-1}, \]
where $\{u_t;\ t = 1, 2, \ldots\}$ is a sequence of i.i.d. random variables with standard distribution $N(0, 1)$.
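Serially correlated errors of this kind can be generated as in the sketch below. The coefficient values, the burn-in length and the function name are hypothetical, since the text does not report the values of $\rho$ and $\theta$ used in the simulations.

```python
import numpy as np

def simulate_arma11(n, rho, theta, rng, burn_in=200):
    """Simulate eps_t = rho * eps_{t-1} + u_t + theta * u_{t-1}, u_t ~ N(0, 1).

    The first burn_in values are discarded so that the returned series is
    close to the stationary distribution (requires |rho| < 1).
    """
    total = n + burn_in + 1
    u = rng.standard_normal(total)
    eps = np.zeros(total)
    for t in range(1, total):
        eps[t] = rho * eps[t - 1] + u[t] + theta * u[t - 1]
    return eps[-n:]

rng = np.random.default_rng(1)
errors = simulate_arma11(n=256, rho=0.5, theta=0.3, rng=rng)  # hypothetical values
```

Setting `rho = theta = 0` reduces the recursion to the i.i.d. case of the first experiment.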
We address model (3.7) following ARMA(1,1) errors and calculate the power and size of test $M_n(k)$. The
results with the correlated errors are almost the same as those in Tables 1 and 2. This implies that a test based
on empirical wavelet coefficients may be used in the case of the errors being correlated. We have omitted the
details of this case (available from the authors on request).

Table 1
Sizes and powers for test $\hat M_n(k)$ for model (3.7), based on square estimator, $\hat\sigma_2^2(x)$, for conditional heteroscedastic variance

| $n$ | $\tau_0$ | $d_0$ | Test, $\alpha = 10\%$ | Test, $\alpha = 5\%$ | Test, $\alpha = 1\%$ | Estimation, $\hat\tau$ | SE | Estimation, $\hat d$ | SE |
|---|---|---|---|---|---|---|---|---|---|
| 128 | 0.00 | | 23.6% | 6.8% | 1% | | | | |
| 128 | 0.30 | 0.30 | 85.8% | 76.8% | 65.8% | 0.319 | 0.127 | 0.539 | 0.157 |
| 128 | 0.30 | 0.50 | 99.8% | 98.0% | 86.2% | 0.301 | 0.048 | 0.854 | 0.185 |
| 128 | 0.30 | 1.00 | 100% | 99.8% | 99.8% | 0.301 | 0.053 | 1.650 | 0.280 |
| 128 | 0.50 | 0.30 | 93.2% | 83.4% | 50.0% | 0.503 | 0.119 | 0.613 | 0.188 |
| 128 | 0.50 | 0.50 | 97.6% | 91.0% | 57.2% | 0.498 | 0.075 | 0.930 | 0.214 |
| 128 | 0.50 | 1.00 | 99.6% | 98.8% | 98.8% | 0.502 | 0.025 | 1.734 | 0.281 |
| 128 | 0.75 | 0.30 | 74.2% | 46.4% | 26.8% | 0.692 | 0.168 | 0.697 | 0.341 |
| 128 | 0.75 | 0.50 | 97.6% | 91.0% | 57.2% | 0.733 | 0.113 | 1.060 | 0.311 |
| 128 | 0.75 | 1.00 | 98.6% | 96.4% | 96.2% | 0.747 | 0.070 | 1.839 | 0.398 |
| 256 | 0.00 | | 10.5% | 5.80% | 1.2% | | | | |
| 256 | 0.30 | 0.30 | 98.8% | 96.8% | 87.4% | 0.297 | 0.037 | 0.383 | 0.036 |
| 256 | 0.30 | 0.50 | 100% | 100% | 93.4% | 0.297 | 0.024 | 0.571 | 0.048 |
| 256 | 0.30 | 1.00 | 100% | 100% | 99.4% | 0.306 | 0.023 | 1.067 | 0.088 |
| 256 | 0.50 | 0.30 | 100% | 98.4% | 89.4% | 0.491 | 0.032 | 0.434 | 0.043 |
| 256 | 0.50 | 0.50 | 100% | 100% | 96.8% | 0.493 | 0.029 | 0.627 | 0.058 |
| 256 | 0.50 | 1.00 | 100% | 100% | 100% | 0.501 | 0.025 | 1.121 | 0.098 |
| 256 | 0.75 | 0.30 | 100% | 99.6% | 90.4% | 0.732 | 0.080 | 0.664 | 0.100 |
| 256 | 0.75 | 0.50 | 100% | 98.2% | 92.0% | 0.737 | 0.041 | 0.724 | 0.079 |
| 256 | 0.75 | 1.00 | 100% | 100% | 97.2% | 0.746 | 0.021 | 1.211 | 0.111 |

Table 2
Sizes and powers for the test statistic of (2.24) for model (3.7), based on absolute estimator $\hat\sigma_1(x)$, for conditional heteroscedastic variance

| $n$ | $\tau_0$ | $d_0$ | Test, $\alpha = 10\%$ | Test, 5% | Test, 1% | Estimation, $\hat\tau$ | SE | Estimation, $\hat d$ | SE |
|---|---|---|---|---|---|---|---|---|---|
| 128 | 0.00 | | 18.4% | 7.0% | 1% | | | | |
| 128 | 0.30 | 0.30 | 98.0% | 92.4% | 66.8% | 0.311 | 0.108 | 0.410 | 0.114 |
| 128 | 0.30 | 0.50 | 100% | 99.6% | 98.8% | 0.293 | 0.070 | 0.607 | 0.177 |
| 128 | 0.30 | 1.00 | 100% | 99.8% | 99.8% | 0.301 | 0.034 | 0.998 | 0.158 |
| 128 | 0.50 | 0.30 | 98.4% | 91.4% | 60.4% | 0.492 | 0.110 | 0.488 | 0.134 |
| 128 | 0.50 | 0.50 | 99.8% | 99.6% | 98.8% | 0.495 | 0.064 | 0.665 | 0.222 |
| 128 | 0.50 | 1.00 | 99.8% | 99.8% | 99.8% | 0.499 | 0.061 | 1.351 | 0.194 |
| 128 | 0.75 | 0.30 | 97.6% | 89.6% | 51.4% | 0.762 | 0.047 | 0.714 | 0.135 |
| 128 | 0.75 | 0.50 | 99.0% | 98.2% | 92.8% | 0.759 | 0.033 | 0.956 | 0.157 |
| 128 | 0.75 | 1.00 | 100% | 99.8% | 99.8% | 0.756 | 0.035 | 1.582 | 0.239 |
| 256 | 0.00 | 0.30 | 11.5% | 7.00% | 0.92% | | | | |
| 256 | 0.30 | 0.30 | 98.2% | 95.8% | 93.4% | 0.298 | 0.049 | 0.319 | 0.027 |
| 256 | 0.30 | 0.50 | 100% | 100% | 100% | 0.301 | 0.026 | 0.571 | 0.048 |
| 256 | 0.30 | 1.00 | 100% | 100% | 100% | 0.301 | 0.024 | 1.067 | 0.088 |
| 256 | 0.50 | 0.30 | 100% | 96.4% | 93.6% | 0.498 | 0.042 | 0.372 | 0.033 |
| 256 | 0.50 | 0.50 | 100% | 100% | 96.8% | 0.499 | 0.026 | 0.627 | 0.058 |
| 256 | 0.50 | 1.00 | 100% | 100% | 100% | 0.500 | 0.026 | 1.121 | 0.098 |
| 256 | 0.75 | 0.30 | 100% | 100% | 94.2% | 0.731 | 0.032 | 0.453 | 0.051 |
| 256 | 0.75 | 0.50 | 100% | 98.2% | 92.0% | 0.737 | 0.041 | 0.724 | 0.079 |
| 256 | 0.75 | 1.00 | 100% | 100% | 97.2% | 0.759 | 0.042 | 1.211 | 0.111 |
Example 3.2. Consider the following model of a time series. The data $\{(Y_t, X_t);\ t = 1, 2, \ldots\}$ are serially
correlated. Assuming that $Y_t = X_{t+1}$, the model can be written as
\[ X_{t+1} = T(X_t) + \sigma(X_t)\,\varepsilon_t, \]
where $T(x)$ is the threshold function:
\[ T(x) = \begin{cases} 1.2 - 0.6x & \text{if } x \le \tau_0, \\ 1.2 - 0.6x + d_0 & \text{if } x > \tau_0, \end{cases} \tag{3.8} \]
and $\sigma^2(x) = 0.4\sqrt{x} + 0.1$.
It would be interesting to compare the current results with those in Example 3.1, and with i.i.d.
observations. Assume that the initial value $X_0$ comes from a uniform variable in $[0, 1]$. $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ is the same
as in Example 3.1. The results for this example are very similar to those listed in Tables 1 and 2, which implies
that the test method proposed in this paper is not affected when the observations form a time series
with correlation. The details are omitted.
Lastly, we consider a model with multiple change points. We calculate the percentage of rejecting the null
hypothesis when the regression has two change points with jump sizes of $d_1 = a_1 + a_2$ and $d_2 = a_2$,
respectively.
Example 3.3. Consider the following model with conditional heteroscedastic variance
\[ Y_t = T(X_t) + \sigma(X_t)\,\varepsilon_t, \]
where $T(x)$ has two different change points at $\tau_1$ and $\tau_2$. The jump sizes for the two change points $\tau_1$ and $\tau_2$
(here $\tau_2 > \tau_1$) are $d_1 = a_1 + a_2$ and $d_2 = a_2$, respectively.
When $a_1 = 0$, the two change points have the same jump sizes, and when $a_2 = 0$, $T(x)$ has only one change
point. That is,
\[ T(x) = \begin{cases} 0.6x^2 & \text{if } x \le \tau_1, \\ 0.6x^2 + a_1 + a_2 & \text{if } \tau_1 < x \le \tau_2, \\ 0.6x^2 + a_1 & \text{if } x > \tau_2, \end{cases} \tag{3.9} \]
and $\sigma(x) = 0.4x^2 + 0.1$, in which $\{\varepsilon_t;\ t \ge 1\}$ is a sequence of i.i.d. normal random variables. In the simulations,
let $\tau_1$ and $\tau_2$ be equal to 0.3 and 0.6, respectively. The parameter $a_1$ in the jump size has been
taken as three values, 0.0, 0.2, and 0.5, and $a_2$ as three different values, 0.0, 0.3, and 0.5. For the sake of
simplicity, the size of the sample is only $n = 256$. The simulated results for Example 3.3 are shown in
Table 3. In Table 3, $\hat\tau_i$ is the estimator of $\tau_i$, $i = 1, 2$.
Fig. 1(a) describes the scatter points and the true curve of the regression produced by model (3.9), and
Fig. 1(b) is the plot of the series produced by $Y_t = T(X_t) + \sigma(X_t)\varepsilon_t$, with (3.9) and conditional heteroscedastic
variance $\sigma^2(x) = 0.4x^2 + 0.1$. Figs. 1(c) and (d) are the plots of all values of the test statistic of (2.24) in which the conditional
heteroscedastic variance has been estimated by the absolute deviation estimator and the square estimator, respectively.
As with Example 3.1, the tests based on the empirical wavelet method for the change points are very
powerful, and the estimations of locations and jump sizes of change points are very accurate, except for those
with the same jump size. The estimation of those points at which the regression has changed with the same
jump size (that is, when $a_1 = 0$) is not accurate. This is reasonable because we cannot tell which point is the
first jump when two jump sizes are the same. From the results in Panels A and B, we observe that the
conditional heteroscedastic variances can be estimated with virtually the same accuracy by either the absolute
deviation estimator or the square estimator. The powers of the test with the absolute deviation estimator are
slightly stronger than those with the square estimator.
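Table 3
Powers for the tests $\hat M_n(k)$ and the statistic of (2.24) for model (3.9) with multiple change points

Panel A: $n = 256$, absolute deviation estimator for $\sigma^2(x)$

| $a_1$ | $a_2$ | Test for $\tau_1$, 10% | Test for $\tau_1$, 5% | Test for $\tau_2$, 10% | Test for $\tau_2$, 5% | $\hat\tau_1$ | SE | $\hat\tau_2$ | SE | $\hat d_1$ | $\hat d_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.0 | 27.0% | 7.40% | 6.4% | 0.60% | | | | | | |
| 0.00 | 0.3 | 99.0% | 93.4% | 92.8% | 89.2% | 0.334 | 0.094 | 0.316 | 0.115 | 0.369 | 0.300 |
| 0.00 | 0.5 | 100% | 100% | 98.8% | 91.0% | 0.396 | 0.145 | 0.369 | 0.133 | 0.489 | 0.411 |
| 0.20 | 0.0 | 99.4% | 97.6% | – | – | 0.313 | 0.088 | – | – | 0.269 | – |
| 0.20 | 0.3 | 100% | 100% | 98.4% | 89.2% | 0.303 | 0.042 | 0.598 | 0.136 | 0.554 | 0.164 |
| 0.20 | 0.5 | 100% | 100% | 100% | 100% | 0.310 | 0.145 | 0.508 | 0.134 | 0.601 | 0.393 |
| 0.50 | 0.0 | 100% | 100% | – | – | 0.301 | 0.262 | – | – | 0.559 | – |
| 0.50 | 0.3 | 100% | 100% | 98.4% | 89.2% | 0.305 | 0.024 | 0.617 | 0.170 | 0.854 | 0.219 |
| 0.50 | 0.5 | 100% | 100% | 100% | 99.8% | 0.327 | 0.078 | 0.570 | 0.091 | 1.002 | 0.510 |

Panel B: $n = 256$, square estimator for $\sigma^2(x)$

| $a_1$ | $a_2$ | Test for $\tau_1$, 10% | Test for $\tau_1$, 5% | Test for $\tau_2$, 10% | Test for $\tau_2$, 5% | $\hat\tau_1$ | SE | $\hat\tau_2$ | SE | $\hat d_1$ | $\hat d_2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.0 | 15.0% | 3.40% | 2.4% | 0.20% | | | | | | |
| 0.00 | 0.3 | 98.6% | 91.0% | 91.0% | 82.6% | 0.333 | 0.110 | 0.324 | 0.135 | 0.281 | 0.350 |
| 0.00 | 0.5 | 100% | 100% | 98.6% | 85.0% | 0.402 | 0.148 | 0.372 | 0.136 | 0.414 | 0.476 |
| 0.20 | 0.0 | 95.0% | 86.0% | – | – | 0.329 | 0.142 | – | – | 0.262 | – |
| 0.20 | 0.3 | 100% | 100% | 97.2% | 77.0% | 0.298 | 0.033 | 0.575 | 0.088 | 0.554 | 0.116 |
| 0.20 | 0.5 | 100% | 99.4% | 100% | 87.4% | 0.360 | 0.115 | 0.398 | 0.143 | 0.728 | 0.517 |
| 0.50 | 0.0 | 100% | 99.8% | – | – | 0.297 | 0.025 | – | – | 0.559 | – |
| 0.50 | 0.3 | 100% | 100% | 75.4% | 47.4% | 0.302 | 0.023 | 0.604 | 0.131 | 0.855 | 0.255 |
| 0.50 | 0.5 | 100% | 100% | 100% | 99.4% | 0.308 | 0.022 | 0.599 | 0.043 | 1.056 | 0.493 |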
Fig. 1. This figure describes the curve of Example 3.3: (a) is the plot generating the data based on the model in Example 3.3, and the true
curve with two change points; (b) plots the curve of the observed data; (c) is the plot of the empirical wavelet coefficient test based on the
square estimator for the conditional heteroscedastic variance; (d) draws the plot of the empirical wavelet coefficient test based on the
absolute deviation estimator for the conditional heteroscedastic variance.
4. Applications: short-term interest rates and stock prices
The procedures proposed in this paper suggest a variety of applications in finance and economics. In this
section, as examples of possible applications, we apply the test based on the empirical wavelet method to
short-term interest data and the stock closing prices of selected companies. We consider the nonparametric
regression $Y_t = m(x_t) + \sigma(x_t)\varepsilon_t$, where $\{\varepsilon_t\}$ is Gaussian white noise and $Y_t = Z_t - Z_{t-1}$, where $Z_t$ denotes the
short rate (see Fan and Yao, 1998), while $Z_t$ can also be the logarithm of the stock price. In practice, the white
noise model can be expressed as $Y_t = m(x_t) + \sigma(x_t)\varepsilon_t$, where $x_t = t/T$ ($t = 1, \ldots, T$), implying that $m(x_t)$ is a
continuous function of dates (see Wang, 1995).
Example 4.1 (Short-term interest rate). The default-free short-term interest rate is a key economic variable. It
directly affects the short end of the term structure, and thus has implications for the pricing of the full range of
fixed-income securities and derivatives. Alternatively, the short rate is an important input for business cycle
analysis because of its impact on the cost of credit, and its sensitivity to monetary policy and inflationary
expectations.
This example deals with the yield of the three-month Treasury Bill in the U.S. The data consist of 1,735
weekly observations from January 5, 1962 to March 31, 1995. The data are presented in Fig. 2(a). $Y_t$
($Y_t = Z_t - Z_{t-1}$) is plotted against $Z_{t-1}$ in Fig. 2(b). From Fig. 2(b), $\{Y_t\}$ is approximately a stationary
sequence. We choose the same wavelet $\psi(x)$ as in Example 3.1. The kernel function $K(x)$, in the estimation of
the regression function $T(x)$, is taken as the uniform kernel; that is,
\[ K(x) = \begin{cases} 1/2 & \text{if } \|x\| \le 1, \\ 0 & \text{if } \|x\| > 1. \end{cases} \]
The kernel functions in the estimation of the conditional variance, $\sigma(x)$, and the density function, $f(x)$, are the
Gaussian kernel and the Epanechnikov kernel, respectively.
Fig. 2. Results of Example 4.1: (a) is the plot of the logarithm of the original data; (b) plots the first difference of the logarithm of the
original data against the logarithm of the original data; (c) is the plot of the empirical wavelet coefficient test based on the square variance
estimator; (d) is the plot of the empirical wavelet coefficient test based on the absolute deviation estimator.
Table 4
Summary of properties of changes in returns on the three-month Treasury Bill for 1,735 weekly observations from January 5, 1962 to
March 31, 1995

| | Estimated changes | Dates | Test statistic | P-value |
|---|---|---|---|---|
| $\hat\tau_1$ | 1,184 | May 9, 1984 | 3.3947 | <0.1 |
| $\hat\tau_2$ | 1,358 | July 10, 1987 | 4.5145 | <0.05 |
| $\hat\tau_3$ | 1,623 | February 12, 1992 | 5.0843 | <0.05 |
| $\hat\tau_4$ | 1,710 | October 14, 1994 | 5.7663 | <0.01 |
Using the empirical wavelet method proposed in this paper, we observe that the test statistics exceed the
critical values at several locations. For example, $k = 1{,}157$ (May 1982) at the 10% significance level, $k = 1{,}335$
(July 1987) and $k = 1{,}567$ (January 1992) at the 5% significance level, and $k = 1{,}766$ (December 1993) at the
1% significance level. The asymptotic critical values are 2.9435 at the 10% significance level, 3.6633 at the 5%
significance level, and 5.2933 at the 1% significance level, based on Corollary 2.1. Thus, the potential change
points are $k = 1{,}335$ (July 1987), $k = 1{,}567$ (January 1992), and $k = 1{,}766$ (December 1993), at the 5%
significance level. At the 1% significance level, we find a change point $k = 1{,}766$ (December 1993). This implies
that there were significant local structural changes in the short-term interest rates at the corresponding points
for the sample period (Table 4).
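The quoted critical values are what one obtains by inverting the limiting distribution $\exp(-2\exp(-z))$ that appears in the corollaries of Section 2, that is, by solving $\exp(-2\exp(-z)) = 1 - \alpha$ for $z$. A minimal check, with a hypothetical function name, is:

```python
import math

def limiting_critical_value(alpha):
    """Solve exp(-2 * exp(-z)) = 1 - alpha for z."""
    return -math.log(-0.5 * math.log(1.0 - alpha))

critical_values = {alpha: limiting_critical_value(alpha)
                   for alpha in (0.10, 0.05, 0.01)}
```

The values apply to the normalised statistic, that is, after the centring and scaling used in the corollaries.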
Example 4.2 (Stock closing prices). This example studies the possible changes and jumps in stock closing
prices of IBM, Motorola Inc., etc., from January 2, 1991 to December 31, 1999. These companies are listed in
Table 5.
Table 5
Companies studied to check changes in stock prices in Example 4.2

| No. | Company name | Abbreviation |
|---|---|---|
| 1 | Berry Petroleum Co. | BPC |
| 2 | International Business Machines C | IBM |
| 3 | Merrill Lynch & Co. INC. | MER |
| 4 | Motorola INC. | MOT |
| 5 | Northeast Utilities | NU |
| 6 | Philips N. V. | PHG |
| 7 | Royal Gold INC. | RGLD |
Table 6
Statistics of other companies to check changes in the stock closing prices in Example 4.2

| No. | Company | Possible change points | Corresponding dates | P-values |
|---|---|---|---|---|
| 1 | BPC | 1,501 | 12/3/96 | <0.001 |
| 2 | IBM | 1,602 | 4/30/97 | <0.001 |
| 3 | MER | 1,631 | 6/11/97 | <0.001 |
| 4 | MOT | 457, 1,620 | 10/19/92, 5/27/97 | <0.001 |
| 5 | NU | 1,367, 1,682 | 5/24/96, 8/22/96 | <0.001 |
| 6 | PHG | 1,641 | 6/25/97 | <0.001 |
| 7 | RGLD | 1,133, 1,554 | 4/8/96, 6/6/94, 2/20/97 | <0.001 |
To induce an approximate stationarity of the observed series, we consider the first difference of the
logarithmically transformed stock closing prices of the companies. Then, we use the wavelet method to check
which dates are the change points in the stock closing prices of these companies. After transforming the
original data, no trend in $\{Y_t\}$ is discernible, and the sample autocorrelation function is not significantly
different from the Kronecker delta function. The wavelet $\psi(x)$ taken here is the same as that in Example 3.1.
The kernel functions in the estimation of $T(x)$, $\sigma(x)$, and $f(x)$ are correspondingly similar to those of Example
4.1. Our analysis using the tests $\hat M_n(k)$ and its discretized counterpart from (2.24) shows that the possible change points in the volatility of the
data of IBM stock closing prices are $k_1 = 453$ and $k_2 = 1{,}619$, at any usual significance level (5%, 1%). The
corresponding dates are October 13, 1992 and May 24, 1997, respectively. The statistics of the other
companies are summarized in Table 6.
5. Summary and conclusions
The dynamic nonparametric models have been introduced to fit the time series data (short-term interest
rates and stock returns), under general assumptions. The theory of wavelet coefficients permits the
decomposition of the estimated function into localized, oscillating components. Hence, the wavelet method is
an ideal and powerful tool to study localized changes, such as jumps and sharp cusps in time series. Detection
by wavelets is a multi-resolution (time–frequency) technique. The multi-resolution approach has local
adaptivity and hence has advantages over existing smoothing methods based on a fixed spatial scale (even on a
random scale), such as in the Fourier series method and the kernel methods (Müller, 1992).
In this paper, we proposed test statistics, the integral estimator, and the discretized estimator of wavelet
coefficients, which allow for dependent observations and serially correlated errors in dynamic nonparametric
models. Furthermore, we derived the asymptotic distribution of our test statistics and established the
consistency of the estimators of jump points. These estimators are also shown to converge at the rate $O_P(n^{-1})$
as n goes to infinity, which is actually the minimax rate and optimal rate of convergence in the sense of
probability convergence. Finally, we identified several significant jumps in the short rates and selected stock
prices for the sample period.
The empirical wavelet coefficients can be extended to other nonparametric and parametric models, such as
the threshold autoregression model, the threshold ARCH model, and multivariate additive models. Moreover,
the wavelet procedure can be extended to the multivariate drift-plus-diffusion model. For example, we can
treat additive models by a similar procedure, where the drift-plus-diffusion model may
be of the form
\[ y_t = f_1(X_{1t}) + f_2(X_{2t}) + \cdots + f_p(X_{pt}) + \sigma(X_t)\,\varepsilon_t, \]
where $X_t = (X_{1t}, \ldots, X_{pt})^{\tau}$ is a vector of several factors, and $\tau$ denotes the transpose of the vector.
Acknowledgments
We owe the Editor Cheng Hsiao and anonymous referees many thanks for their guidance and suggestions
for revising this paper.
Appendix A
The proofs are very long and complex; we thus provide outlines only.
For convenience, let $k = [2^{J_n} z]$ for $z \in (0, 1)$. To obtain the desired results, the following assumptions will
be convenient. Let $U$ be a non-empty open neighborhood of the origin of $R$ and $[a, b] \subset U$.
(A.1) The probability density of $X_1$ is bounded away from zero and infinity on some open subset $U$. That is,
there exists some positive constant $M_1$ such that $M_1^{-1} \le f(x) \le M_1$ for $x \in U$.
(A.2) Let $f(y \mid x)$ be the conditional density function of $X_1$, given $X_l = x$, and there exists some positive
constant $M_2$ such that $f(x_1 \mid x) \le M_2$ for $x_1, x \in U$.
(A.3) Let $\{X_i;\ i = 1, 2, \ldots, n\}$ be a sequence of stationary and strongly mixing random vectors, with mixing
coefficient $\alpha(\cdot)$ satisfying $\alpha(k) = O(k^{-r})$ for some large $r > 0$.
(A.4) $f(x)$ has a bounded second derivative, and $\sigma(x)$ and $C(x)$ are three times continuously differentiable on $U$.
(A.5) Let $\{\varepsilon_i;\ i = 1, 2, \ldots\}$ be a sequence of i.i.d. random variables and for each $i$, $\varepsilon_i$ is independent of
$\{(X_j, Y_{j-1});\ j \le i\}$. Meanwhile, $E|X|^R < \infty$ and $E|Y|^R < \infty$ for some $R > 2$.
Remark A.1. Assumptions (A.1) and (A.2) are necessary for the kernel estimation with dependent data. (A.3)
is to simplify proofs. It can be weakened to $\alpha(k) = O(k^{-\nu})$ for some $\nu > 0$. Many sufficient conditions have been
proposed to ensure that $X_t$ is strictly stationary and geometrically ergodic. Auestad and Tjøstheim (1990)
provided the following conditions:
\[ \lim_{x \to \infty} \frac{|T(x) - bx|}{|x|} = 0, \qquad \lim_{x \to \infty} \frac{\sigma(x)}{|x|} = 0, \]
where $|b| < 1$, $\sigma \ge c > 0$. Assumption (A.4) is to meet the continuity requirement for kernel smoothing. (A.5) is
also made for simplicity of proofs (see Fan and Yao, 1998). The existence of finite moments is sufficient. When
$x$ is a design point, we only need Assumptions (A.1) and (A.4), and the sequence $\{X_i;\ i = 1, 2, \ldots, n\}$ should be
changed to the sequence $\{\varepsilon_i;\ i = 1, 2, \ldots, n\}$ in Assumption (A.5).
In the proof of these theorems, we need the following results.
Lemma A.1. Suppose that conditions (A.1)–(A.2) are satisfied. Then for a sufficiently large value of $B$, as
$n \to \infty$,
\[ P\left\{ \sup_{a \le x \le b} \left| (n h_n)^{-1} \sum_{j=1}^{n} K_h(X_j - x) - E f_n(x) \right| > B\,(n h_n)^{-1/2} \log^{1/2} n \right\} \to 0, \]
where $K_h(\cdot) = K(\cdot/h_n)$ and $f_n(x) = (1/(n h_n)) \sum_{j=1}^{n} K_h(X_j - x)$.
Proof. It follows from the results of Lemma 3.6 in Zhou and Liang (1999), and the continuity argument, or
from the similar arguments of Theorem 5 in Masry (1996). □
Lemma A.2. (a) Assume that $\psi(x)$ satisfies Assumptions (I), (II), and (III), and that $\Psi(x)$ is twice continuously differentiable. Then, uniformly for $k$ in $D_n$,
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x)[\Psi(X_j) - \Psi(x)]}{\sum_{j=1}^n K_h(X_j - x)}\, dx = O_P(2^{-J_n/2} h_n c_n),
\]
where $c_n = h_n^2 + (\log n)^{1/2}/(nh_n)^{1/2}$, and
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\Psi(x)\, dx = O(2^{-5J_n/2}).
\]
(b) Furthermore, for the discretized empirical wavelet coefficient, we have uniformly for $k$ in $D_n$,
\[
\frac{b-a}{N} \sum_{i=1}^N \psi^{\mathrm{per}}_{J_n,k}(w_i)\, \frac{\sum_{j=1}^n K_h(X_j - w_i)[\Psi(X_j) - \Psi(w_i)]}{\sum_{j=1}^n K_h(X_j - w_i)} = O_P\!\left(\frac{2^{J_n/2}}{N}\right) + O_P(2^{-J_n} h_n c_n)
\]
and
\[
\frac{b-a}{N} \sum_{i=1}^N \psi^{\mathrm{per}}_{J_n,k}(w_i)\Psi(w_i) = O\!\left(\frac{2^{J_n/2}}{N}\right) + O(2^{-5J_n/2}).
\]
Proof. It is easy to show the second formula in part (a) of Lemma A.2. Write $r_n(x) = (1/(nh_n)) \sum_{j=1}^n K_h(X_j - x)(\Psi(X_j) - \Psi(x))$ and $f_n(x) = (1/(nh_n)) \sum_{j=1}^n K_h(X_j - x)$. Then
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x)[\Psi(X_j) - \Psi(x)]}{\sum_{j=1}^n K_h(X_j - x)}\, dx
= \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{r_n(x)}{f_n(x)}\, dx
= \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{r_n(x)}{f(x)}\, dx + O_P(2^{-J_n/2} h_n c_n) \quad \text{a.s.},
\]
where $\Psi(x)$ is a twice continuously differentiable function. By Taylor's expansion and Lemma B.2, we obtain
\[
r_n(x) = h_n^2 f'(x)\Psi'(x) \int_{-c}^{c} x^2 K(x)\, dx + O(h_n c_n) \quad \text{a.s.},
\]
uniformly for $x \in [a, b]$, where $c_n = h_n^2 + (\log n)^{1/2}/(nh_n)^{1/2}$. Hence,
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{r_n(x)}{f(x)}\, dx = O(2^{-J_n/2} h_n c_n) \quad \text{a.s.}
\]
Similarly, we can prove part (b). Hence, the proof of Lemma A.2 is complete. □
Lemma A.3. Assume that $K(x)$ is a kernel function with finite support $[-c, c]$ and $h_n \to 0$ as $n \to \infty$. Let $t_l$, $l = 1, 2, \ldots, p$, be jump points of the function $T(x)$, and let the corresponding jump sizes be denoted by $d_l$, $l = 1, 2, \ldots, p$. Then, for the integral estimator of the wavelet coefficient, we have
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x) \sum_{l=1}^p d_l I(t_l \le X_j \le b)}{\sum_{j=1}^n K_h(X_j - x)}\, dx = 2^{-J_n/2} (b-a)^{1/2} d_l \int_{1}^{A} \psi(x)\, dx, \tag{A.1}
\]
uniformly for $k$ in $I(t_l, 2^{-J_n}(b-a))$, and
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x) \sum_{l=1}^p d_l I(t_l \le X_j \le b)}{\sum_{j=1}^n K_h(X_j - x)}\, dx = 0, \tag{A.2}
\]
uniformly for $k \notin \bigcup_{l=1}^p L_l(A)$.
For the discretized estimator of the wavelet coefficient, we have
\[
\frac{b-a}{N} \sum_{i=1}^N \psi^{\mathrm{per}}_{J_n,k}(w_i)\, \frac{\sum_{j=1}^n K_h(X_j - w_i) \sum_{l=1}^p d_l I(t_l \le X_j \le b)}{\sum_{j=1}^n K_h(X_j - w_i)} = 2^{-J_n/2} (b-a)^{1/2} d_l \int_{1}^{A} \psi(x)\, dx + O_P\!\left(\frac{2^{J_n/2}}{N}\right), \tag{A.3}
\]
uniformly for $k$ in $I(t_l, 2^{-J_n}(b-a))$, and
\[
\frac{b-a}{N} \sum_{i=1}^N \psi^{\mathrm{per}}_{J_n,k}(w_i)\, \frac{\sum_{j=1}^n K_h(X_j - w_i) \sum_{l=1}^p d_l I(t_l \le X_j \le b)}{\sum_{j=1}^n K_h(X_j - w_i)} = O_P\!\left(\frac{2^{J_n/2}}{N}\right), \tag{A.4}
\]
uniformly for $k \notin \bigcup_{l=1}^p L_l(A)$, where
\[
L_l(A) = \left\{ k : \left| a + \frac{k(b-a)}{2^{J_n}} - t_l \right| < 2^{-J_n} A (b-a) \right\},
\]
in which $A$ is the support point.
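Proof of Lemma A.3. By Lemma B.1 in Appendix B, we have for large $n$,
\[
\frac{d_l \sum_{j=1}^n K_h(X_j - x) I(t_l \le X_j \le b)}{\sum_{j=1}^n K_h(X_j - x)} = d_l I(t_l \le x \le b). \tag{A.5}
\]
Hence it follows that for $k \in I(t_l, 2^{-J_n}(b-a))$,
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x) \sum_{l=1}^p d_l I(t_l \le X_j \le b)}{\sum_{j=1}^n K_h(X_j - x)}\, dx
= (\delta_n (b-a))^{1/2} \sum_{l=1}^p d_l \int_{((t_l - a)/(b-a) - \gamma_k)/\delta_n}^{(1 - \gamma_k)/\delta_n} \psi(x)\, dx
= (\delta_n (b-a))^{1/2} d_l \int_{1}^{A} \psi(x)\, dx,
\]
where $\delta_n = 2^{-J_n}$ and $\gamma_k = 2^{-J_n} k$, in which we have used
\[
\left| \frac{\delta_n^{-1}}{b-a} \left( a + \frac{k}{2^{J_n}}(b-a) - t_l \right) \right| < 1
\]
for $k \in I(t_l, 2^{-J_n}(b-a))$, and the fact that for $i \ne l$,
\[
\frac{\delta_n^{-1}}{b-a} \left( a + \frac{k}{2^{J_n}}(b-a) - t_i \right) = \frac{\delta_n^{-1}}{b-a} \left( a + \frac{k}{2^{J_n}}(b-a) - t_l \right) + \frac{\delta_n^{-1}}{b-a} (t_l - t_i)
\]
tends to infinity in absolute value as $n \to \infty$. This implies (A.1). For all $k \notin \bigcup_{l=1}^p L_l(A)$, we have
\[
\delta_n^{-1} \left| \frac{t_l - a}{b-a} - \gamma_k \right| \to \infty
\]
as $n \to \infty$. This implies (A.2). Similarly, we can prove (A.3) and (A.4). This completes the proof of Lemma A.3. □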
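The behaviour described by Lemma A.3 (coefficients of order $2^{-J_n/2}$ near a jump and negligible elsewhere) can be seen in a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses the Haar wavelet without periodization, an Epanechnikov kernel, a uniform design on $[a, b] = [0, 1]$, and a single jump; the function names and all tuning values (`J`, `N`, `h`, the jump location 0.53) are hypothetical choices made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Kernel with finite support [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def haar(u):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    u = np.asarray(u, dtype=float)
    return (np.where((u >= 0.0) & (u < 0.5), 1.0, 0.0)
            - np.where((u >= 0.5) & (u < 1.0), 1.0, 0.0))

def nw_smoother(grid, x, y, h):
    """Kernel-weighted average sum_j K_h(X_j - w) Y_j / sum_j K_h(X_j - w)."""
    weights = epanechnikov((x[None, :] - grid[:, None]) / h)
    denom = weights.sum(axis=1)
    numer = weights @ y
    # Grid points with no observations in their window are set to zero.
    return np.divide(numer, denom, out=np.zeros_like(numer), where=denom > 0)

def discretized_coefficients(x, y, a, b, J, N, h):
    """(b - a)/N * sum_i psi_{J,k}(w_i) * smoother(w_i) for k = 0, ..., 2^J - 1."""
    grid = a + (b - a) * np.arange(1, N + 1) / N      # w_i = a + (i/N)(b - a)
    q = (grid - a) / (b - a)                          # q_i = i/N
    smooth = nw_smoother(grid, x, y, h)
    ks = np.arange(2**J)
    # psi_{J,k}(w_i) = (b - a)^{-1/2} 2^{J/2} psi(2^J q_i - k), unperiodized.
    psi = 2.0 ** (J / 2) * haar(2.0**J * q[None, :] - ks[:, None]) / np.sqrt(b - a)
    return (b - a) / N * (psi @ smooth)

rng = np.random.default_rng(0)
n, jump_at, jump_size, J = 2000, 0.53, 2.0, 4
x = rng.uniform(0.0, 1.0, n)
y = jump_size * (x >= jump_at) + 0.1 * rng.standard_normal(n)

coef = discretized_coefficients(x, y, a=0.0, b=1.0, J=J, N=1024, h=0.02)
k_hat = int(np.argmax(np.abs(coef)))
t_hat = 0.0 + k_hat / 2**J        # location estimate a + k (b - a) / 2^J
print(k_hat, t_hat, coef[k_hat])
```

In this setup the largest coefficient in absolute value sits at the index whose support contains the jump, and the implied location is within $2^{-J}$ of the true jump point, which mirrors the localization argument used later for Theorem 2.4.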
Lemma A.4. (a) Assume that the conditions (A.1)–(A.5) in Appendix A are satisfied. Then we have
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x)\sigma(X_j)\varepsilon_j}{\sum_{j=1}^n K_h(X_j - x)}\, dx = O_P(n^{-1/2}).
\]
(b) Assume that the conditions (A.1)–(A.5) in Appendix A and Assumption J(c) are satisfied. Then
\[
\frac{b-a}{N} \sum_{l=1}^N \psi^{\mathrm{per}}_{J_n,k}(w_l)\, \frac{\sum_{j=1}^n K_h(X_j - w_l)\sigma(X_j)\varepsilon_j}{\sum_{j=1}^n K_h(X_j - w_l)} = O_P(n^{-1/2}).
\]
Proof. This follows by arguments similar to those in the proof of Lemma A.3. □
Theorem A.1. Assume that $\{X_t;\ t = 1, 2, \ldots\}$ is a sequence of strictly stationary random variables with mixing coefficient $\alpha(k) = O(k^{-r})$ for some constant $r > 0$, and has a continuously differentiable density function $f(x)$ with $0 < c \le f(x) \le C < \infty$ for $x \in U$. The conditional density functions $f(x_0 \mid x_l)$ ($l \ne 0$) are bounded. $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ is a sequence of independent and identically distributed random variables with $E\varepsilon_i = 0$ and $E\varepsilon_i^2 = \sigma_0^2$, and for each $t$, $\varepsilon_t$ is independent of $\{X_s;\ s \le t\}$. $K(\cdot)$ is a kernel function with finite support $[-c, c]$, $\int K(x)\, dx = 1$ and $\int xK(x)\, dx = 0$. $\sigma(\cdot)$ and $K(\cdot)$ are continuously differentiable functions. Let $h_n \to 0$, $nh_n \to \infty$ and $N \to \infty$ as $n \to \infty$. Then there exists a sequence of standard Wiener processes $\{W_n(t);\ t > 0\}$ such that
\[
\sup_{0 \le t \le 1} \left| \sum_{j=1}^n A_n(X_j, t)\varepsilon_j - \sqrt{n}\,\sigma_0 W_n(E A_n^2(X_j, t)) \right| = O\!\left( n^{1/4} (\log n)^{3/4} + \frac{1}{h_n} + \frac{n}{Nh_n} \right) \quad \text{a.s.}
\]
Furthermore, we have
\[
\sup_{0 \le t \le 1} \left| \sum_{j=1}^n A_n(X_j, t)\varepsilon_j - \sqrt{n}\,\sigma_0 W_n(r(t)) \right| = O\!\left( n^{1/4} (\log n)^{3/4} + n^{1/2} h_n (\log n)^{1/2} + \frac{n}{Nh_n} + \frac{1}{h_n} \right) \quad \text{a.s.},
\]
where
\[
A_n(X_j, t) = \frac{(b-a)^{1/2}}{Nh_n} \sum_{l=1}^{[Nt]} \frac{K_h(X_j - w_l)}{f(w_l)}\, \sigma(X_j),
\qquad
r(t) = \int_0^t \frac{\sigma^2((b-a)x + a)}{f((b-a)x + a)}\, dx \quad \text{for } 0 \le t \le 1,
\]
in which $w_l = a + (l/N)(b-a)$.
This result can be proved by a strong approximation of the empirical process. Its proof is very lengthy, and
is thus placed in Appendix B.
Proof of Theorem 2.1. We only prove part (b) of Theorem 2.1. Similar arguments apply to part (a) of Theorem 2.1. Write
\[
W^{(i)}_{J_n,k} = \frac{b-a}{N} \sum_{l=1}^N \psi^{\mathrm{per}}_{J_n,k}(w_l)\, \frac{\sum_{j=1}^n K_h(X_j - w_l)\sigma(X_j)\varepsilon_j}{\sum_{j=1}^n K_h(X_j - w_l)}.
\]
From Lemmas A.2 and A.3, it remains to prove that $W^{(i)}_{J_n,k}$ has the same asymptotic distribution as in Theorem 2.1, since $W^{(i)}_{J_n,k}$ and $W^{(T)}_{J_n,k}$ have the same asymptotic distribution functions. By a simple transformation from Lemma A.2, we have
\[
W^{(i)}_{J_n,k} = \frac{1}{nh_n} \sum_{j=1}^n G_{n,j}\, \sigma(X_j)\varepsilon_j, \tag{A.6}
\]
where
\[
G_{n,j} = \frac{b-a}{N} \sum_{i=1}^N \frac{K_{i,j}\, \eta_n(w_i)}{f(w_i)}, \tag{A.7}
\]
where $K_{i,j} = K_h(X_j - w_i)$ and $\eta_n(x) = \psi^{\mathrm{per}}_{J_n,k}(x)$, $q_i = (w_i - a)/(b-a) = i/N$, $w_i = a + q_i(b-a)$ and $\gamma_k = 2^{-J_n} k$, where $k = [2^{J_n} z]$ for $0 < z < 1$. Let $L_n(z)$ denote the term on the right-hand side of (A.6). Hence we have
\[
L_n(z) = \frac{b-a}{nNh_n} \sum_{j=1}^n \sum_{i=1}^N \frac{K_{i,j}\, \eta_n(w_i)}{f(w_i)}\, \sigma(X_j)\varepsilon_j
= \frac{b-a}{nNh_n} \sum_{j=1}^n \sum_{i=1}^N \frac{K_{i,j}\, 2^{J_n/2} \psi(2^{J_n}(q_i - \gamma_k))\, \sigma(X_j)\varepsilon_j}{(b-a)^{1/2} f(w_i)}. \tag{A.8}
\]
To use the results of the strong approximation for the partial sum of an independent sequence, we write
\[
a_{ij} = \frac{(b-a)^{1/2}}{Nh_n}\, \frac{K_h(X_j - w_i)\sigma(X_j)\varepsilon_j}{f(w_i)},
\]
and $b_i = \sum_{j=1}^n a_{ij}$. Let $z_{n,0} = 0$ and $z_{n,i} = \sum_{l=1}^i b_l$. Hence we obtain that
\[
z_{n,i} = \sum_{l=1}^i \sum_{j=1}^n a_{lj}
= \sum_{j=1}^n \left[ \frac{(b-a)^{1/2}}{Nh_n} \sum_{l=1}^i \frac{K_h(X_j - w_l)}{f(w_l)} \right] \sigma(X_j)\varepsilon_j
= \sum_{j=1}^n A_n(X_j, q_i)\varepsilon_j, \tag{A.9}
\]
where
\[
A_n(X_j, q_i) = \frac{(b-a)^{1/2}}{Nh_n} \sum_{l=1}^i \frac{K_h(X_j - w_l)}{f(w_l)}\, \sigma(X_j).
\]
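By simple calculation, we can obtain the mean of $A_n^2(X_j, q_i)$, which is, for large enough $n$,
\[
\begin{aligned}
E A_n^2(X_j, q_i) &= E\left( \frac{(b-a)^{1/2}}{Nh_n} \sum_{l=1}^i \frac{K_h(X_j - w_l)}{f(w_l)}\, \sigma(X_j) \right)^2 \\
&= \frac{1}{h_n^2 (b-a)}\, E\left( \sigma(X_j) \int_a^{w_i} \frac{K_h(X_j - x)}{f(x)}\, dx \right)^2 + O\!\left( \frac{1}{N} \right) \\
&= \int_0^{q_i} \frac{\sigma^2(x(b-a) + a)}{f(x(b-a) + a)}\, dx + O\!\left( \left( \frac{1}{Nh_n} \right)^2 \right) + O(h_n^2).
\end{aligned} \tag{A.10}
\]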
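Write
\[
r(t) = \int_0^t \frac{\sigma^2(x(b-a) + a)}{f(x(b-a) + a)}\, dx \quad \text{for } 0 \le t \le 1;
\]
otherwise, $r(t) = 0$.
Assuming that $E\varepsilon_j = 0$ and $E(\varepsilon_j^2) = \sigma_0^2$, by Theorem A.1, we have
\[
\sup_{0 \le t \le 1} \left| \sum_{j=1}^n A_n(X_j, t)\varepsilon_j - \sqrt{n}\,\sigma_0 W_n(r(t)) \right| = O\!\left( n^{1/4} (\log n)^{3/4} + n^{1/2} h_n (\log n)^{1/2} + \frac{n}{Nh_n} + \frac{1}{h_n} \right) \quad \text{a.s.}, \tag{A.11}
\]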
where
\[
A_n(X_j, t) = \frac{(b-a)^{1/2}}{Nh_n} \sum_{l=1}^{[Nt]} \frac{K_h(X_j - w_l)}{f(w_l)}\, \sigma(X_j),
\]
in which $[a]$ denotes the integer part of the real number $a$. Obviously,
\[
z_{n,i} = \sum_{j=1}^n A_n(X_j, q_i)\varepsilon_j.
\]
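Hence, from (A.8), we have
\[
L_n(z) = \frac{1}{n\delta_n^{1/2}} \sum_{i=1}^N \psi\!\left( \frac{q_i - \gamma_k}{\delta_n} \right) [z_{n,i} - z_{n,i-1}],
\]
where $\delta_n = 2^{-J_n}$. Hence, by Abel's summation, we have
\[
\begin{aligned}
L_n(z) &= \frac{1}{n\delta_n^{1/2}} \sum_{i=1}^N (z_{n,i} - z_{n,i-1})\, \psi\!\left( \frac{q_i - \gamma_k}{\delta_n} \right) \\
&= \frac{1}{n\delta_n^{1/2}}\, \psi\!\left( \frac{q_N - \gamma_k}{\delta_n} \right) z_{n,N} - \frac{1}{n\delta_n^{1/2}} \sum_{i=1}^{N-1} \left[ \psi\!\left( \frac{q_{i+1} - \gamma_k}{\delta_n} \right) - \psi\!\left( \frac{q_i - \gamma_k}{\delta_n} \right) \right] z_{n,i}.
\end{aligned} \tag{A.12}
\]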
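The Abel summation step used for (A.12) is the identity $\sum_{i=1}^N c_i (z_i - z_{i-1}) = c_N z_N - \sum_{i=1}^{N-1} (c_{i+1} - c_i) z_i$ with $z_0 = 0$. A short numerical check is sketched below; the names `direct_sum` and `abel_sum` are illustrative only, with `c` standing in for the wavelet values and `z` for the partial sums.

```python
import numpy as np

def direct_sum(c, z):
    """sum_{i=1}^N c_i (z_i - z_{i-1}) with z_0 = 0."""
    c = np.asarray(c, dtype=float)
    z = np.asarray(z, dtype=float)
    increments = np.diff(np.concatenate(([0.0], z)))
    return float(np.sum(c * increments))

def abel_sum(c, z):
    """c_N z_N - sum_{i=1}^{N-1} (c_{i+1} - c_i) z_i (summation by parts)."""
    c = np.asarray(c, dtype=float)
    z = np.asarray(z, dtype=float)
    return float(c[-1] * z[-1] - np.sum((c[1:] - c[:-1]) * z[:-1]))

# Small hand-checkable case: both sides equal 1*1 + 2*2 + 3*3 = 14.
print(direct_sum([1, 2, 3], [1, 3, 6]), abel_sum([1, 2, 3], [1, 3, 6]))

# Random case: the two sides agree up to floating-point error.
rng = np.random.default_rng(0)
c = rng.standard_normal(50)
z = np.cumsum(rng.standard_normal(50))
print(abs(direct_sum(c, z) - abel_sum(c, z)))
```

Moving the differences from the partial sums $z_{n,i}$ onto the wavelet values is what allows the strong approximation of $z_{n,i}$ to be substituted term by term.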
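For the Wiener process, it is known that
\[
\sup_{0 \le x \le 1} |W(r(x + \delta_n)) - W(r(x))| = O((\delta_n \log \delta_n^{-1})^{1/2}) \quad \text{a.s.},
\]
where $\delta_n$ is any small number (cf. Csörgő and Révész, 1981). Using this property and the bounded variation of $K(x)$, we have
\[
\sum_{i=1}^{N-1} \left[ \psi\!\left( \frac{q_{i+1} - x}{\delta_n} \right) - \psi\!\left( \frac{q_i - x}{\delta_n} \right) \right] W(r(q_i)) = \int_0^1 W(r(t))\, d\Psi\!\left( \frac{t - x}{\delta_n} \right) + O\!\left( \left( \frac{\log N}{N} \right)^{1/2} \right) \quad \text{a.s.}, \tag{A.13}
\]
uniformly for $x \in [0, 1]$. It follows from (A.12) and (A.13) that
\[
L_n(z) = \frac{\sigma_0}{(n\delta_n)^{1/2}} \int_0^1 \psi\!\left( \frac{x - \gamma_k}{\delta_n} \right) dW(r(x)) + O\!\left( \frac{\Theta_n}{\delta_n^{1/2}} + \left( \frac{\log N}{Nn\delta_n} \right)^{1/2} \right) \quad \text{a.s.}
\]
uniformly for $0 \le \gamma_k \le 1$, where
\[
\Theta_n = \left( \frac{\log n}{n} \right)^{3/4} + \left( \frac{h_n^2 \log n}{n} \right)^{1/2} + \frac{1}{Nh_n} + \frac{1}{nh_n}.
\]
Let
\[
Y_n(\gamma_k) = \frac{\sigma_0}{\delta_n^{1/2}} \int_0^1 \psi\!\left( \frac{x - \gamma_k}{\delta_n} \right) dW(r(x)).
\]
From the definition of the Wiener process, it is easy to show by calculating its moments that
\[
W(r(y)) := \int_0^y g(s)\, dW(s) \quad \text{for } 0 \le y \le 1, \tag{A.14}
\]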
where $:=$ means that the expressions on both of its sides have the same distribution, and for $0 < x < 1$,
\[
g^2(x) = \frac{(b-a)\sigma^2(x(b-a) + a)}{f(x(b-a) + a)};
\]
otherwise, $g^2(x) = 0$. Hence,
\[
Y_n(y) = \frac{\sigma_0}{\delta_n^{1/2}} \int_0^1 \psi\!\left( \frac{x - y}{\delta_n} \right) g(x)\, dW(x).
\]
Let $Y_{n1}(x) = Y_n(x)/g(x)$ and
\[
Y_{n0}(y) = \sigma_0 \delta_n^{-1/2} \int_0^1 \Psi\!\left( \frac{x - y}{\delta_n} \right) dW(x).
\]
Following the method of Härdle (1989), we have
\[
\sup_{x \in (0,1)} |Y_{n0}(x) - Y_{n1}(x)| = O_p(\delta_n^{1/2}). \tag{A.15}
\]
Hence, the proof of Theorem 2.1 is complete, since it follows from (A.8)–(A.15) and Lemmas A.2 and A.3 that $\sqrt{n} \sup_{0 \le x \le 1} |M_n(x)|$ and $\sup_{0 \le x \le 1} |Y_{n0}(x)|$ have the same asymptotic distribution functions. □
Proof of Corollary 2.1. It is straightforward from Theorem 2.1 and Lemma B.3 (see Bickel and Rosenblatt,
1973). &
Proof of Theorem 2.2. Note that $U^{(T)}_{J_n,k} = U^{(T)}_{J_n}(k)$ can be decomposed into two parts:
\[
U^{(T)}_{J_n,k} = U^{(i)}_{J_n,k} + U^{(ii)}_{J_n,k}, \tag{A.16}
\]
where
\[
U^{(i)}_{J_n,k} = \int_a^b \eta_{J_n,k}(x)\, \frac{\sum_{l=1}^n K_h(X_l - x)\sigma(X_l)\varepsilon_l}{\sum_{l=1}^n K_h(X_l - x)}\, dx, \tag{A.17}
\]
\[
U^{(ii)}_{J_n,k} = \int_a^b \eta_{J_n,k}(x)\, \frac{\sum_{l=1}^n K_h(X_l - x)T(X_l)}{\sum_{l=1}^n K_h(X_l - x)}\, dx, \tag{A.18}
\]
with $\eta_{J_n,k}(x) = \psi^{\mathrm{per}}_{J_n,k}(x)$.
From Lemmas A.2 and A.3, we have
\[
U^{(ii)}_{J_n,k} = 2^{-J_n/2} d_l (b-a)^{1/2} \int_{1}^{A} \psi(x)\, dx + O_P(2^{-5J_n/2} + 2^{-J_n/2} h_n c_n),
\]
uniformly for $k \in I(t_l, 2^{-J_n} A(b-a))$, and
\[
U^{(ii)}_{J_n,k} = O_P(2^{-5J_n/2} + 2^{-J_n/2} h_n c_n)
\]
for $k \notin \bigcup_{l=1}^p I(t_l, 2^{-J_n} A(b-a))$. From Lemma A.4, it follows that
\[
U^{(i)}_{J_n,k} = O_P(n^{-1/2})
\]
for all $k \in D_n$. This implies that Theorem 2.2 holds for the integral estimation of the wavelet coefficient. Similarly, we can prove Theorem 2.2 for the discretized estimation of the wavelet coefficient. □
Proof of Theorem 2.3. The proof is straightforward from Theorem 2.2.
&
Proof of Theorem 2.4. As $T(x)$ has a jump at $t_l$ and $T(x)$ is differentiable at all points except $t_l$, it follows from arguments similar to those in Lemmas A.2 and A.3 that
\[
\left| \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x)T(X_j)}{\sum_{j=1}^n K_h(X_j - x)}\, dx \right| \le C 2^{-3J_n/2},
\]
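for all $k \notin I(t_l, 2^{-J_n} A(b-a))$, where $C$ is a generic constant whose value may change from line to line. Hence, by Lemma A.3 we have
\[
\left| \int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x)T(X_j)}{\sum_{j=1}^n K_h(X_j - x)}\, dx \right| \ge C 2^{-J_n/2},
\]
for $k \in I(t_l, 2^{-J_n} A(b-a))$.
From the same arguments as in Lemma A.2, we can easily show that
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x)\sigma(X_j)\varepsilon_j}{\sum_{j=1}^n K_h(X_j - x)}\, dx = O_P\!\left( \left( \frac{\delta_n}{nh_n} \right)^{1/2} \right),
\]
for all $k = [2^{J_n} z] \in D_n$, $0 < z < 1$. In fact, Taylor's formula implies that
\[
\sum_{j=1}^n K_h(X_j - x)\sigma(X_j)\varepsilon_j = \sigma(x) \sum_{j=1}^n K_h(X_j - x)\varepsilon_j + \sigma'(x) \sum_{j=1}^n K_h(X_j - x)(X_j - x)\varepsilon_j + \frac{1}{2} \sum_{j=1}^n \sigma''(\xi_j) K_h(X_j - x)(X_j - x)^2 \varepsilon_j,
\]
where $\xi_j$ lies between $X_j$ and $x$. By Lemma B.2 (see Proposition 1 and Lemma 1 of Xia, 1998), we have
\[
\sup_{x \in L} \left| \sum_{j=1}^n K_h(X_j - x)\varepsilon_j \right| = O_P((nh_n)^{1/2}),
\]
\[
\sup_{x \in L} \left| \sum_{j=1}^n K_h(X_j - x)(X_j - x)\varepsilon_j \right| = O_P((nh_n)^{1/2} h_n),
\]
\[
\sup_{x \in L} \left| \sum_{j=1}^n K_h(X_j - x)(X_j - x)^2 \varepsilon_j \right| = O_P((nh_n)^{1/2} h_n^2),
\]
where $L = [a - \delta_0, b + \delta_0]$ for some $\delta_0 > 0$. Hence,
\[
\int_a^b \psi^{\mathrm{per}}_{J_n,k}(x)\, \frac{\sum_{j=1}^n K_h(X_j - x)\sigma(X_j)\varepsilon_j}{\sum_{j=1}^n K_h(X_j - x)}\, dx = O_P\!\left( \left( \frac{\delta_n}{nh_n} \right)^{1/2} \right).
\]
These inequalities imply that
\[
\max\{ |U^{(T)}_{J_n,k}|;\ k \in I(t_l, 2^{-J_n} A(b-a)) \} \ge C \left( \delta_n^{1/2} - \left( \frac{\delta_n}{nh_n} \right)^{1/2} \right) \tag{A.19}
\]
and
\[
\max\{ |U^{(T)}_{J_n,k}|;\ k \in D_n \setminus I(t_l, 2^{-J_n} A(b-a)) \} \le C \left( \delta_n^{3/2} + \left( \frac{\delta_n}{nh_n} \right)^{1/2} \right). \tag{A.20}
\]
As
\[
\frac{\delta_n^{1/2} - (nh_n \delta_n^{-1})^{-1/2}}{\delta_n^{3/2} + (nh_n \delta_n^{-1})^{-1/2}} \to \infty,
\]
combining (A.19) with (A.20), we obtain that
\[
\max\{ |U^{(T)}_{J_n,k}|;\ k \in D_n \} = \max\{ |U^{(T)}_{J_n,k}|;\ k \in I(t_l, 2^{-J_n} A(b-a)) \}. \tag{A.21}
\]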
In fact, $U^{(T)}_{J_n,k}$ can be replaced by $W^{(T)}_{J_n,k}$ in (A.21). This implies that
\[
\hat{k}_0^{(i)} \in I(t_l, 2^{-J_n} A(b-a)), \tag{A.22}
\]
where $\hat{k}_0^{(i)}$ is one of $\hat{k}_l^{(i)}$, $\hat{k}_l^{(d)}$, $\hat{k}_l^{(ni)}$ and $\hat{k}_l^{(nd)}$. Hence, we have
\[
|\hat{t}_l^{(i)} - t_l| = \left| a + \frac{\hat{k}_l^{(i)}}{2^{J_n}}(b-a) - t_l \right| < 2^{-J_n} A(b-a). \tag{A.23}
\]
Similarly, we can prove that (A.23) also holds for the other estimators $\hat{t}_l^{(d)}$. This completes the proof of Theorem 2.4. □
Proof of Theorem 2.5. Let $t_l$, $l = 1, 2, \ldots, p$, denote the jump points of $T(x)$. Similarly, $\hat{t}_l$ denotes the corresponding estimators. Let
\[
g(x) = \frac{f^{1/2}(x)}{\sigma(x)} \quad \text{for } x \in [a, b].
\]
Using arguments similar to those for (A.21), we obtain that
\[
\max\{ g(a + k(b-a)\delta_n) |U^{(T)}_{J_n,k}|;\ k \in D_n \} = \max\{ g(a + k(b-a)\delta_n) |U^{(T)}_{J_n,k}|;\ k \in I(t_l, 2^{-J_n} A(b-a)) \}. \tag{A.24}
\]
As
\[
g(a + k(b-a)\delta_n) |U^{(T)}_{J_n,k}| = g(t_l) |U^{(T)}_{J_n,k}| + \Delta_n(k),
\]
where
\[
\Delta_n(k) = (g(a + k(b-a)\delta_n) - g(t_l)) |U^{(T)}_{J_n,k}|,
\]
it is easy to show that
\[
\max\{ |\Delta_n(k)|;\ k \in I(t_l, 2^{-J_n} A(b-a)) \} = O(2^{-J_n}) \max\{ |U^{(T)}_{J_n,k}|;\ k \in I(t_l, 2^{-J_n} A(b-a)) \}.
\]
From (A.19), (A.20), and Lemma A.4, we have
\[
C\delta_n \left( \delta_n^{1/2} - \left( \frac{\delta_n}{nh_n} \right)^{1/2} \right) \le \max\{ |\Delta_n(k)|;\ k \in I(t_l, 2^{-J_n} A(b-a)) \} \le C\delta_n^{3/2}.
\]
Thus, we have
\[
\max\{ g(a + k(b-a)/2^{J_n}) |U^{(T)}_{J_n,k}|;\ k \in I(t_l, 2^{-J_n} A(b-a)) \} = g(t_l) \max\{ |U^{(T)}_{J_n,k}|;\ k \in I(t_l, 2^{-J_n} A(b-a)) \} + O_P\!\left( \delta_n^{3/2} + \left( \frac{\delta_n^3}{nh_n} \right)^{1/2} \right).
\]
In contrast, the arguments of Lemmas A.2–A.4 imply
\[
\max\left\{ g(a + k(b-a)/2^{J_n}) |U^{(T)}_{J_n,k}|;\ k \notin \bigcup_{l=1}^{p} I(t_l, 2^{-J_n} A(b-a)) \right\} = O_P\!\left( \delta_n^{3/2} + \left( \frac{\delta_n}{nh_n} \right)^{1/2} \right).
\]
Note that
\[
\frac{\delta_n^{3/2} + ((nh_n)^{-1} \delta_n^3)^{1/2}}{\delta_n^{3/2} + ((nh_n)^{-1} \delta_n)^{1/2}} \to 0.
\]
Hence, the estimation of locations of change points defined by $U^{(T)}_{J_n,k}$ ($W^{(T)}_{J_n,k}$) is equivalent to that defined by $M_n(k2^{-J_n})$ ($M_n(k2^{-J_n})$) in probability.
Thus, Theorem 2.1 implies that with a probability tending to $1 - w$, $|M_n(x)| \le C_{1-w}$ for those $x$ at which $T(x)$ has no jump or sharp cusp, where $C_{1-w}$ is the $1 - w$ quantile of the asymptotic distribution of Theorem 2.1. Using the same arguments as those in Theorems 2.4 and 3.2, we can easily show that with probability $1 - w$, $|M_n(x)|$ (or $|M_{nn}(x)|$) will exceed $C_{1-w}$ at only those $x$ values in
\[
\left\{ x : \left| a + \frac{[2^{J_n} x]}{2^{J_n}}(b-a) - t_l \right| \le 2^{-J_n} A(b-a) \right\},
\]
$l = 1, 2, \ldots, p$. From the definitions of $\hat{t}_l$, $l = 1, 2, \ldots, p$, we can show with probability $1 - w$ that $\hat{p}_j = p$ ($j = 1, 2$), or $\hat{q}_j = q$ ($j = 1, 2$). Finally, we can prove Theorem 2.5 by letting $n \to \infty$ and $w \to 0$. □
Appendix B
In this section, we prove (A.5) and Theorem A.1.
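Lemma B.1. Suppose that $K(x)$ is a kernel function with finite support $[-c, c]$ and $h_n \to 0$ as $n \to \infty$. Let $t_l \in [a, b]$ and $K_h(X_j - x) = K((X_j - x)/h_n)$. Then, for large $n$,
\[
\frac{\sum_{j=1}^n K_h(X_j - x) I(t_l \le X_j \le b)}{\sum_{j=1}^n K_h(X_j - x)} = I(t_l \le x \le b).
\]
Proof. It is easy to obtain that
\[
I(t_l \le X_j \le b) - I(t_l \le x \le b) =
\begin{cases}
1 & \text{if } t_l \le X_j \le b,\ x < t_l \ \text{ or } \ t_l \le X_j \le b,\ x > b; \\
-1 & \text{if } t_l \le x \le b,\ X_j < t_l \ \text{ or } \ t_l \le x \le b,\ X_j > b; \\
0 & \text{otherwise.}
\end{cases}
\]
Hence,
\[
I(t_l \le X_j \le b) - I(t_l \le x \le b) = I(t_l \le X_j \le b,\ x < t_l) + I(t_l \le X_j \le b,\ x > b) - I(t_l \le x \le b,\ X_j < t_l) - I(t_l \le x \le b,\ X_j > b).
\]
Writing $Y_j = (X_j - x)/h_n$, we can easily show that
\[
\frac{\sum_{j=1}^n K_h(X_j - x) I(t_l \le X_j \le b,\ x < t_l)}{\sum_{j=1}^n K_h(X_j - x)}
= \frac{\sum_{j=1}^n K_h(X_j - x) I(t_l - x \le X_j - x \le b - x,\ x < t_l)}{\sum_{j=1}^n K_h(X_j - x)}
= \frac{\sum_{j=1}^n K(Y_j) I((t_l - x)/h_n \le Y_j \le (b - x)/h_n,\ x < t_l)}{\sum_{j=1}^n K(Y_j)}.
\]
As $K(Y_j) = 0$ whenever $|Y_j| > c$, and for large enough $n$, $x < t_l$ implies $(t_l - x)/h_n \to \infty$, we obtain that
\[
I((t_l - x)/h_n \le Y_j \le (b - x)/h_n,\ x < t_l) \le I(Y_j > c).
\]
Hence, for large $n$, we have
\[
\sum_{j=1}^n K(Y_j) I((t_l - x)/h_n \le Y_j \le (b - x)/h_n,\ x < t_l) = 0.
\]
Hence,
\[
\frac{\sum_{j=1}^n K_h(X_j - x) I(t_l \le X_j \le b,\ x < t_l)}{\sum_{j=1}^n K_h(X_j - x)} = 0.
\]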
Similarly, we can show that
\[
\frac{\sum_{j=1}^n K_h(X_j - x) I(t_l \le X_j \le b,\ x > b)}{\sum_{j=1}^n K_h(X_j - x)} = 0,
\]
\[
\frac{\sum_{j=1}^n K_h(X_j - x) I(t_l \le x \le b,\ X_j < t_l)}{\sum_{j=1}^n K_h(X_j - x)} = 0,
\]
\[
\frac{\sum_{j=1}^n K_h(X_j - x) I(t_l \le x \le b,\ X_j > b)}{\sum_{j=1}^n K_h(X_j - x)} = 0.
\]
This completes the proof of Lemma B.1. □
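The mechanism behind Lemma B.1 is that a compactly supported kernel only sees observations within $c h_n$ of $x$, so the weighted average of the indicator equals the indicator at $x$ once $x$ is farther than $c h_n$ from $t_l$ and $b$. A minimal sketch of this, assuming an Epanechnikov kernel (support $[-1, 1]$) and an evenly spaced stand-in for the sample; the function name `weighted_indicator` is hypothetical:

```python
import numpy as np

def epanechnikov(u):
    """Kernel with finite support [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def weighted_indicator(x, data, t_l, b, h):
    """sum_j K_h(X_j - x) I(t_l <= X_j <= b) / sum_j K_h(X_j - x)."""
    weights = epanechnikov((data - x) / h)
    total = weights.sum()
    if total == 0.0:
        return float("nan")   # no observation falls in the kernel window
    inside = (data >= t_l) & (data <= b)
    return float((weights * inside).sum() / total)

data = np.linspace(0.0, 1.0, 501)   # deterministic stand-in for X_1, ..., X_n
t_l, b, h = 0.5, 1.0, 0.05

left = weighted_indicator(0.3, data, t_l, b, h)    # window lies below t_l
right = weighted_indicator(0.7, data, t_l, b, h)   # window lies inside [t_l, b]
edge = weighted_indicator(0.5, data, t_l, b, h)    # window straddles t_l
print(left, right, edge)
```

At a point whose window straddles $t_l$ the ratio lies strictly between 0 and 1, which is why the lemma holds only for large $n$ (small $h_n$) at any fixed $x \ne t_l$.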
Proof of Theorem A.1. For the sake of convenience, let
\[
A_n(z, t) = \frac{(b-a)^{1/2}}{Nh_n} \sum_{l=1}^{[Nt]} \frac{K_h(z - w_l)}{f(w_l)}\, \sigma(z)
\]
and $L = [a - \delta_0, b + \delta_0] = [a_0, b_0]$ for some positive constant $\delta_0 > 0$. By the continuity of $K(\cdot)$ and $f(\cdot)$, we obtain
\[
\sup_{0 < t < 1,\ z \in L} |A_n(z, t)| \le M,
\]
\[
\sup_{0 < t < 1}\ \sup_{|z_1 - z_2| \le M/n} |A_n(z_1, t) - A_n(z_2, t)| = O\!\left( \frac{1}{nh_n} + \frac{1}{Nh_n} \right),
\]
\[
\sup_{0 < t < 1,\ z \notin L} |A_n(z, t)| = 0,
\]
for some large $M > 0$.
We have the following:
\[
\sum_{j=1}^n A_n(X_j, t)\varepsilon_j = \sum_{X_j \in L} A_n(X_j, t)\varepsilon_j + \sum_{X_j \notin L} A_n(X_j, t)\varepsilon_j = D_{n1} + D_{n2}.
\]
Thus, we easily obtain
\[
\sup_{0 < t < 1} |D_{n2}| = 0 \quad \text{a.s.},
\]
where a.s. denotes almost sure convergence. Now we divide the interval $L$ into nonoverlapping intervals of equal length $L_1, L_2, \ldots, L_n$, with right extreme points $l_i$, $i = 1, 2, \ldots, n$ ($l_n = b_0$). Define $\tilde{X}_j = l_i I(X_j \in L_i)$, $i = 1, 2, \ldots, n$, and $\tilde{X}_j = b_0 + 1$ if $X_j \notin L_i$, $i = 1, 2, \ldots, n$. Thus we have
\[
|\tilde{X}_j - X_j| = O\!\left( \frac{1}{n} \right) \quad \text{a.s.}
\]
if $X_j \in L_i$. Next, we prove that
\[
\begin{aligned}
D_{n1} &= \sum_{j=1}^n A_n(\tilde{X}_j, t) I(X_j \in L)\varepsilon_j + \sum_{j=1}^n [A_n(X_j, t) - A_n(\tilde{X}_j, t)] I(X_j \in L)\varepsilon_j \\
&= \sum_{j=1}^n A_n(\tilde{X}_j, t) I(X_j \in L)\varepsilon_j + O\!\left( \frac{1}{h_n} + \frac{n}{Nh_n} \right),
\end{aligned}
\]
uniformly for $0 < t < 1$.
Let $F_n(x) = \sum_{j=1}^n I(X_j < x)\varepsilon_j$ denote the hybrid of the empirical process and the partial-sum process. For convenience, we denote $F(x_2) - F(x_1)$ by $F([x_1, x_2))$ and $F_n(x_2) - F_n(x_1)$ by $F_n([x_1, x_2))$. Horváth (2000) and Xia (1999) independently studied the properties of the stochastic process $F_n(x)$. Xia (1999) in more general
conditions has shown that
\[
\sup_x |F_n(x) - \sqrt{n}\,\sigma_0 W_n(G(x))| = O(n^{1/4} (\log n)^{3/4}) \quad \text{a.s.}, \tag{B.1}
\]
where $W_n(t)$ for $0 < t < 1$ is a sequence of standard Wiener processes in $C(0, 1)$.
Thus, by Abel's transformation, we have
\[
\begin{aligned}
\sum_{j=1}^n A_n(\tilde{X}_j, t) I(X_j \in L)\varepsilon_j
&= \sum_{i=1}^n \sum_{j=1}^n A_n(l_i, t) I(X_j \in L_i)\varepsilon_j \\
&= \sum_{i=1}^n A_n(l_i, t) \sum_{j=1}^n I(X_j \in L_i)\varepsilon_j \\
&= \sum_{i=1}^n A_n(l_i, t) F_n(L_i) \\
&= A_n(b_0, t) \sum_{i=1}^n F_n(L_i) - \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)] \sum_{j=1}^i F_n(L_j) \\
&= \sigma_0 \sqrt{n}\, A_n(b_0, t) W(F(L)) - \sigma_0 \sqrt{n} \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)]\, W(F([a_0, l_{i+1}))) \\
&\quad + A_n(b_0, t) [F_n(L) - \sigma_0 \sqrt{n}\, W(F(L))] \\
&\quad - \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)] [F_n([a_0, l_{i+1})) - \sigma_0 \sqrt{n}\, W(F([a_0, l_{i+1})))] \\
&= D_{31} - D_{32} + D_{33} - D_{34}.
\end{aligned} \tag{B.2}
\]
By simple calculation, we can show that
\[
\sup_{0 < t < 1} \sum_{i=1}^{n-1} |A_n(l_{i+1}, t) - A_n(l_i, t)| \le M.
\]
From (B.1), we have
\[
\sup_{0 < t < 1} |D_{33}| = O(n^{1/4} (\log n)^{3/4}) \quad \text{a.s.} \tag{B.3}
\]
and
\[
\sup_{0 < t < 1} |D_{34}| \le \max_{1 \le i \le n} |F_n([a_0, l_{i+1})) - \sigma_0 \sqrt{n}\, W_n(F([a_0, l_{i+1})))| \sup_{0 < t < 1} \sum_{i=1}^{n-1} |A_n(l_{i+1}, t) - A_n(l_i, t)| = O(n^{1/4} (\log n)^{3/4}) \quad \text{a.s.} \tag{B.4}
\]
By the limit results for the increments of the Wiener process in Csörgő and Révész (1981, p. 26), we have
\[
\sup_{|z_1 - z_2| \le M a_n;\ z_1, z_2 \in L} |W(z_1) - W(z_2)| = O(a_n^{1/2} (\log n)^{1/2}) \quad \text{a.s.}, \tag{B.5}
\]
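where $a_n \to 0$. Hence, using Abel's transformation and the mean value theorem of integration, we have
\[
\begin{aligned}
&\left| \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))) - A_n(b_0, t) W(F(L)) + \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)]\, W(F([a_0, l_{i+1}))) \right| \\
&\quad = \left| \sum_{i=1}^{n-1} [A_n(l_{i+1}, t) - A_n(l_i, t)] [W(F([a_0, l_{i+1}))) - W(F([a_0, l^{*}_{i+1})))] \right| \\
&\quad = O(n^{-1/2} (\log n)^{1/2}) \quad \text{a.s.},
\end{aligned} \tag{B.6}
\]
uniformly for $t \in (0, 1)$, where $l^{*}_{i+1}$ lies between $l_i$ and $l_{i+1}$. This implies that
\[
D_{31} - D_{32} = \sqrt{n}\,\sigma_0 \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))) + O((\log n)^{1/2}) \quad \text{a.s.},
\]
uniformly for $t \in (0, 1)$. Hence, from (B.2)–(B.4) and (B.6) we have
\[
\sup_{0 < t < 1} \left| \sum_{j=1}^n A_n(X_j, t)\varepsilon_j - \sqrt{n}\,\sigma_0 \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))) \right| = O\!\left( n^{1/4} (\log n)^{3/4} + \frac{1}{h_n} + \frac{n}{Nh_n} \right) \quad \text{a.s.} \tag{B.7}
\]
Write
\[
\tilde{Y}_n(t) = \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))),
\]
and $Y_n(t) = W(r_n(t))$, where
\[
r_n(t) = E(\tilde{Y}_n(t))^2.
\]
By simple calculation (cf. (A.10)), we have
\[
r_n(t) = \int_a^{w_t} \frac{\sigma^2(x)}{f(x)}\, dx + O\!\left( \left( \frac{1}{Nh_n} \right)^2 + h_n^2 \right).
\]
By the properties of the Wiener process, it is easy to show that
\[
\begin{aligned}
E\tilde{Y}_n(s)\tilde{Y}_n(t) &= E\left[ \int_{x \in L} A_n(x, s)\, dW(F([a_0, x))) \int_{x \in L} A_n(x, t)\, dW(F([a_0, x))) \right] \\
&= \int_{x \in L} \left[ \int_{(x - w_s)/h_n}^{(x - a)/h_n} \frac{K(z)}{f(x - h_n z)}\, dz \int_{(x - w_t)/h_n}^{(x - a)/h_n} \frac{K(y)}{f(x - h_n y)}\, dy \right] \sigma^2(x) f(x)\, dx + O\!\left( \frac{1}{(Nh_n)^2} \right) \\
&= \int_{\{x \in L\} \cap \{a \le x \le w_s\} \cap \{a \le x \le w_t\}} \frac{\sigma^2(x)}{f(x)}\, dx + O\!\left( \left( \frac{1}{Nh_n} \right)^2 + h_n^2 \right) \\
&= \int_a^{w_s \wedge w_t} \frac{\sigma^2(z)}{f(z)}\, dz + O\!\left( \left( \frac{1}{Nh_n} \right)^2 + h_n^2 \right) \\
&= EY_n(s)Y_n(t),
\end{aligned}
\]
where $a \wedge b = \min(a, b)$, and the constant in $O(\cdot)$ is independent of $t$, $s$, and $n$. Hence, $\tilde{Y}_n(t)$ and $Y_n(t)$ have the same distributions for all $t \in L$. Thus,
\[
\sup_{0 < t < 1} \left| \sum_{j=1}^n A_n(X_j, t)\varepsilon_j - \sqrt{n}\,\sigma_0 W(r_n(t)) \right| = O\!\left( n^{1/4} (\log n)^{3/4} + \frac{n}{Nh_n} + \frac{1}{h_n} \right) \quad \text{a.s.}
\]
This implies the first result of Theorem A.1.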
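Again, using (B.5), we have
\[
\sup_{0 < t < 1} |W(r_n(t)) - W(r(t))| = O\!\left( \frac{(\log n)^{1/2}}{Nh_n} + h_n (\log n)^{1/2} \right) \quad \text{a.s.}
\]
Hence,
\[
\sup_{0 < t < 1} |\tilde{Y}_n(t) - W(r(t))| = O\!\left( \frac{(\log n)^{1/2}}{Nh_n} + h_n (\log n)^{1/2} \right) \quad \text{a.s.}
\]
From (B.7), we obtain that
\[
\sup_{0 < t < 1} \left| \sum_{j=1}^n A_n(X_j, t)\varepsilon_j - \sqrt{n}\,\sigma_0 W(r(t)) \right| = O\!\left( n^{1/4} (\log n)^{3/4} + n^{1/2} h_n (\log n)^{1/2} + \frac{1}{h_n} + \frac{n}{Nh_n} \right) \quad \text{a.s.}
\]
This completes the proof of Theorem A.1. □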
To prove Lemmas A.2 and A.4–A.6, we need the following lemma:
Lemma B.2. Assume that $\{X_t;\ t = 1, 2, \ldots\}$ is a strictly stationary mixing sequence with coefficient $\alpha(k) = O(k^{-r})$ for some $r > 0$. The density function $f(x)$ of $X_t$ satisfies $0 < c \le f(x) \le C < \infty$ for some constants $c$ and $C$, and $f(x)$ has a bounded derivative. The conditional density functions $f(x_0 \mid x_l)$ ($l \ne 0$) are bounded. $K(\cdot)$ is a continuously differentiable kernel function with finite support $[-c, c]$, $\int K(x)\, dx = 1$ and $\int xK(x)\, dx = 0$. Let $h_n \to 0$, $nh_n \to \infty$ and $N \to \infty$ as $n \to \infty$. Then
(a) for any positive integer $i$, we have
\[
\sum_{t=1}^n K\!\left( \frac{X_t - x}{h_n} \right) (X_t - x)^i = nh_n^{i+1} \phi_i f(x) + nh_n^{i+2} \phi_{i+1} f'(x) + O(nh_n^{i+1} c_n) \quad \text{a.s.},
\]
uniformly for $x \in [a, b]$, where $c_n = h_n^2 + (\log n/(nh_n))^{1/2}$ and
\[
\phi_i = \int x^i K(x)\, dx;
\]
(b) for any $0 \le r \le 1$, we have
\[
\sum_{t=1}^n K\!\left( \frac{X_t - x}{h_n} \right) |X_t - x|^r = nh_n^{r+1} \eta_r f(x) + nh_n^{r+2} \eta_{r+1} f'(x) + O(nh_n^{r+1} c_n) \quad \text{a.s.},
\]
uniformly for $x \in [a, b]$, where
\[
\eta_r = \int |x|^r K(x)\, dx, \qquad \eta_{r+1} = \int x|x|^r K(x)\, dx;
\]
(c) in addition, assume that $\{\varepsilon_t;\ t = 1, 2, \ldots\}$ satisfies the conditions of Theorem A.1; then
\[
\sum_{t=1}^n K\!\left( \frac{X_t - x}{h_n} \right) \varepsilon_t =
\begin{cases}
O((nh_n \log n)^{1/2}) & \text{a.s.}; \\
O_P((nh_n)^{1/2});
\end{cases}
\]
and
\[
\sum_{t=1}^n K\!\left( \frac{X_t - x}{h_n} \right) \frac{X_t - x}{h_n}\, \varepsilon_t =
\begin{cases}
O((nh_n \log n)^{1/2}) & \text{a.s.}; \\
O_P((nh_n)^{1/2});
\end{cases}
\]
uniformly for $x \in [a, b]$.
Proof. Following the proof of Proposition 1 in Masry (1996), we can easily prove this lemma. In fact, parts (a)
and (c) are also the results of Lemma 1 in Xia (1998). &
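Lemma B.3 (Bickel and Rosenblatt, 1973). Let $\psi(\cdot)$ be a kernel function with bounded support $[-A, A]$ and $W(\cdot)$ be a standard Wiener process. Define
\[
V_n(t) = \delta_n^{-1/2} \int \psi\!\left( \frac{x - t}{\delta_n} \right) dW(x).
\]
Suppose that $\delta_n \to 0$ and $n\delta_n^2 \to \infty$. Then
\[
P\left\{ A(\delta_n) \left[ \left( \frac{n}{\kappa_2} \right)^{1/2} \sup_{0 \le x \le 1} |V_n(x)| - a(\delta_n) \right] < z \right\} \to \exp(-2\exp(-z)),
\]
where $A(\cdot)$, $a(\cdot)$, $\kappa_1$, and $\kappa_2$ are the same as those of Corollary 2.1.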
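The limit law $\exp(-2\exp(-z))$ in Lemma B.3 can be inverted in closed form, which is how asymptotic critical values are obtained analytically. The sketch below only inverts this limiting distribution for the normalized statistic; mapping the result back to a threshold for the supremum itself requires $A(\cdot)$, $a(\cdot)$ and $\kappa_2$ from Corollary 2.1, which are not reproduced here. The function names are illustrative.

```python
import math

def limit_cdf(z):
    """Limiting distribution function exp(-2 exp(-z)) from Lemma B.3."""
    return math.exp(-2.0 * math.exp(-z))

def critical_value(level):
    """Solve exp(-2 exp(-z)) = 1 - level for z, with 0 < level < 1."""
    return -math.log(-0.5 * math.log(1.0 - level))

z_05 = critical_value(0.05)
print(z_05, limit_cdf(z_05))
```

For a 5% level this gives a normalized critical value of about 3.66.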
References
Ahn, C.M., Thompson, H.E., 1988. Jump-diffusion processes and the term-structure of interest rates. Journal of Finance 43, 155–174.
Aït-Sahalia, Y., 1996a. Nonparametric pricing of interest rate derivative securities. Econometrica 64, 527–560.
Aït-Sahalia, Y., 1996b. Testing continuous-time models of the spot interest rate. The Review of Financial Studies 9, 385–426.
Aït-Sahalia, Y., 2002. Telling from discrete data whether the underlying continuous-time model is a diffusion. Journal of Finance 61, 2075–2112.
Aït-Sahalia, Y., Wang, Y., Yared, F., 2001. Do option markets correctly price the probability of movement of the underlying asset. Journal of Econometrics 53, 499–547.
Amin, K., 1993. Jump diffusion option valuation in discrete time. Journal of Finance 48, 1833–1863.
Antoniadis, A., Gijbels, I., 1997. Detecting abrupt changes by wavelet methods. Discussion Paper 9716. Institute of Statistics, Louvain-la-Neuve.
Andersen, T.G., Lund, J., 1997. Estimating continuous time stochastic volatility models of the short term interest rate. Journal of Econometrics 77, 343–377.
Auestad, B., Tjøstheim, D., 1990. Identification of nonlinear time series: first order characterization and order determination. Biometrika
77, 669–687.
Bates, D., 1991. The crash of '87: was it expected? The evidence from options markets. Journal of Finance 46, 1009–1044.
Bickel, P.J., Rosenblatt, M., 1973. On some global measures of the deviations of density function estimates. The Annals of Statistics 1, 1071–1095.
Billingsley, P., 1968. Convergence of Probability Measures. Wiley, New York.
Bradley, R.C., 1986. Basic properties of strong mixing conditions. In: Eberlein, E., Taqqu, M.S. (Eds.), Dependence in Probability and Statistics: A Survey of Recent Results. Birkhäuser, Boston, pp. 165–192.
Carlstein, E., Müller, H.-G., Siegmund, D. (Eds.), 1994. Change-point Problems. IMS, Hayward, CA.
Cox, J.C., Ingersoll, J.E., Ross, S.A., 1985. An intertemporal general equilibrium model of asset prices. Econometrica 53, 363–384.
Csörgő, M., Révész, P., 1981. Strong Approximation in Probability and Statistics. Academic Press, New York.
Das, S.R., Foresi, S., 1996. Exact solutions for bond and option prices with systematic jump risk. Review of Derivatives Research 1, 7–24.
Daubechies, I., 1992. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia.
Delgado, M., Hidalgo, J., 2000. Nonparametric inference on structural breaks. Journal of Econometrics 96, 113–144.
Drost, F., Nijman, T., Werker, B., 1998. Estimation and testing in models containing both jumps and conditional heteroscedasticity.
Journal of Business & Economic Statistics 16, 237–243.
Duffie, D., Kan, R., 1993. A yield factor model of interest rates. Working Paper. Stanford University.
Duffie, D., Pan, J., Singleton, K.J., 2000. Transform analysis and asset pricing for affine jump-diffusions. Econometrica 68, 1343–1376.
Fan, J., Gijbels, I., 1996. Local Polynomial Modeling and its Applications. Chapman & Hall, London.
Fan, J., Yao, Q., 1998. Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85, 645–660.
Gallant, A.R., Tauchen, G., 1997. Estimation of continuous time models for stock returns and interest rates. Macroeconomic Dynamics 1, 135–168.
Gasser, T., Sroka, L., Jennen-Steinmetz, C., 1986. Residual variance and residual pattern in nonlinear regression. Biometrika 73, 625–633.
Härdle, W., 1989. Asymptotic maximal deviation of M-smoothers. Journal of Multivariate Analysis 29, 163–179.
Härdle, W., Tsybakov, A., 1997. Local polynomial estimators of the volatility function in nonparametric autoregression. Journal of
Econometrics 81, 223–242.
Horváth, L., 2000. Approximations for hybrids of empirical and partial sums processes. Journal of Statistical Planning and Inference 88,
1–18.
Kao, C.R., Ross, S.L., 1995. A CUSUM test in the linear regression model with serially correlated disturbances. Econometric Review 14,
331–346.
Karlin, S., McGregor, J., 1959. Coincidence probabilities. Pacific Journal of Mathematics 9, 1141–1164.
Korostelev, A.P., 1987. Minimax estimation of a discontinuous signal. Theory of Probability and its Application 32, 727–730.
Krämer, W., Ploberger, W., Alt, R., 1988. Testing for structural change in dynamic regression models. Econometrica 56, 1355–1369.
Li, Y., Xie, Z., 1999. The wavelet identification of threshold and time delay of threshold autoregressive models. Statistica Sinica 9,
153–166.
Mallat, S., Hwang, W.L., 1992. Singularity detection and processing with wavelets. IEEE Transactions on Information Theory 38, 617–643.
Masry, E., 1996. Multivariate local polynomial regression for time series: uniform and strong consistency and rates. Journal of Time Series
Analysis 17, 571–599.
Masry, E., Tjøstheim, D., 1995. Nonparametric estimation and identification of nonlinear ARCH times series: strong convergence and
asymptotic normality. Econometric Theory 11, 258–289.
Merton, R., 1976. Option pricing when returns are discontinuous. Journal of Financial Economics 3, 125–144.
Müller, T.G., 1992. Change points in nonparametric regression analysis. The Annals of Statistics 20, 737–761.
Nadaraya, E.A., 1964. On estimating regression. Theory of Probability and its Applications 9, 141–142.
Raimondo, M., 1998. Minimax estimation of sharp change points. The Annals of Statistics 26, 1379–1397.
Tran, K.C., 1999. Testing for structural change in the dynamic adjustment model with autoregressive errors. Empirical Economics 24,
61–74.
Vasicek, O., 1977. An equilibrium characterization of the term structure. Journal of Financial Economics 5, 177–188.
Wang, Y., 1995. Jump and sharp cusp detection by wavelets. Biometrika 82, 385–397.
Wang, Y., 1998. Change curve estimation via wavelets. Journal of the American Statistical Association 93, 163–172.
Wang, Y., 1999. Change-points via wavelets for indirect data. Statistica Sinica 9, 103–117.
Wong, H., Ip, W., Li, Y., Xie, Z., 1999. Threshold variable selection by wavelets in open-loop threshold autoregressive models. Statistics
and Probability Letters 42, 375–392.
Xia, Y., 1998. Bias-corrected confidence bands in nonparametric regression. Journal of Royal Statistical Society Series B 60, 797–811.
Xia, Y., 1999. On the estimation and testing of function-coefficient linear models. Statistica Sinica 9, 735–777.
Zhou, Y., Liang, H., 1999. Asymptotic normality for L1-norm kernel estimator of conditional median under dependence. Journal of Multivariate Analysis 73, 136–154.