A TEST FOR MODEL SPECIFICATION OF DIFFUSION PROCESSES

By Song Xi Chen, Jiti Gao and Cheng Yong Tang
Iowa State University, The University of Western Australia and Iowa State University
We propose a test for model specification of a parametric diffusion process based on a kernel estimation of the transitional density of the process. The empirical likelihood is used to formulate a statistic, for each kernel smoothing bandwidth, which is effectively a studentized $L_2$-distance between the kernel transitional density estimator and the parametric transitional density implied by the parametric process. To reduce the sensitivity of the test to the smoothing bandwidth choice, the final test statistic is constructed by combining the empirical likelihood statistics over a set of smoothing bandwidths. To better capture the finite sample distribution of the test statistic and the data dependence, the critical value of the test is obtained by a parametric bootstrap procedure. Properties of the test are evaluated asymptotically and numerically by simulation and by a real data example.
1. Introduction. Let $X_1, \cdots, X_{n+1}$ be $n+1$ equally spaced (with spacing $\Delta$ in time) observations of a diffusion process
(1.1) $$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t,$$
where $\mu(\cdot)$ and $\sigma^2(\cdot) > 0$ are, respectively, the drift and diffusion functions, and $B_t$ is the standard Brownian motion. Suppose a parametric specification of model (1.1) is
(1.2) $$dX_t = \mu(X_t; \theta)\,dt + \sigma(X_t; \theta)\,dB_t,$$
where $\theta$ is a parameter within a parameter space $\Theta \subset R^d$ for a positive integer $d$. The focus of this paper is on testing the validity of the parametric specification (1.2) based on a set of discretely observed data $\{X_{t\Delta}\}_{t=1}^{n+1}$.
∗ We thank Co-Editor Jianqing Fan, an Associate Editor and two referees for their valuable comments and suggestions which have improved the presentation of the paper. All three authors acknowledge generous support from a National University of Singapore Academic Research grant; Chen and Tang acknowledge support from a National Science Foundation grant (DMS-0604563), and Gao acknowledges support of an Australian Research Council Discovery Grant.
AMS 2000 subject classifications: Primary 62G05; secondary 62J02.
Keywords and phrases: Bootstrap, Diffusion process, Empirical likelihood, Goodness-of-fit test, Time series, Transitional density.
In a pioneering work that represents a breakthrough in financial econometrics, Aït-Sahalia [1] considered two approaches for testing the parametric specification (1.2). The first one was based on an $L_2$-distance between a kernel stationary density estimator and the parametric stationary density implied by model (1.2), with the critical value of the test obtained from the asymptotic normal distribution of the test statistic. The advantage of the test is that the parametric stationary density is easily derivable for almost all processes and performing the test is straightforward. There are several limitations with the test. One is, as pointed out by Aït-Sahalia [1], that a test that targets the stationary distribution is not conclusive, as different processes may share a common stationary distribution. Another is that it can take a long time for a process to produce a sample path that contains enough information for accurate estimation of the stationary distribution. These were confirmed by Pritsker [41], who reported noticeable discrepancy between the simulated and nominal sizes of the test under a set of Vasicek [47] diffusion processes. In the same paper, Aït-Sahalia considered another approach based on a certain discrepancy measure regarding the transitional distribution of the process derived from the Kolmogorov forward and backward equations. The key advantage of a test that targets the transitional density is that it is conclusive, as the transitional density fully specifies the dynamics of a diffusion process due to its Markovian property.
In this paper, we propose a test that is focused on the specification of the transitional density of a process. The basic building blocks used in constructing the test statistic are the kernel estimator of the transitional density function and the empirical likelihood (Owen [39]). We first formulate an integrated empirical likelihood ratio statistic for each smoothing bandwidth used in the kernel estimator, which is effectively an $L_2$-distance between the kernel transitional density estimator and the parametric transitional density implied by the process. The use of the empirical likelihood allows the $L_2$-distance to be standardized by its variation. We also implement a series of measures to make the test work more efficiently. These include properly smoothing the parametric transitional density so as to cancel out the bias induced by the kernel estimation, which avoids undersmoothing and simplifies the theoretical analysis. To make the test robust against the choice of smoothing bandwidth, the test statistic is formulated based on a set of bandwidths. Finally, a parametric bootstrap procedure is employed to obtain the critical value of the test, to better capture the finite sample distribution of the test statistic and the data dependence induced by the stochastic process.
A continuous-time diffusion process and a discrete-time series share some important features. They can both be Markovian and weakly dependent with mixing conditions. The test proposed in this paper draws on research on kernel based testing and estimation of discrete time models established in the last decade or so. Kernel-based tests have been shown to be effective in testing discrete time series models as demonstrated in Robinson [43], Fan and Li [22], Hjellvik, Yao and Tjøstheim [28], Li [36], Aït-Sahalia, Bickel and Stoker [4], Gozalo and Linton [25] and Chen, Härdle and Li [12]; see Hart [27] and Fan and Yao [17] for extended reviews and lists of references. For kernel estimation of diffusion processes, in addition to Aït-Sahalia [1], Jiang and Knight [33] proposed a semiparametric kernel estimator; Fan and Zhang [20] examined the effects of higher order stochastic expansions and proposed separate generalized likelihood ratio tests for the drift and diffusion functions; Bandi and Phillips [7] considered a two-stage kernel estimation without the strict stationarity assumption. See Cai and Hong [9] and Fan [15] for comprehensive reviews.
In an important development after Aït-Sahalia [1], Hong and Li [30] developed a test for diffusion processes via a conditional probability integral transformation. The test statistic is based on an $L_2$-distance between the kernel density estimator of the transformed data and the uniform density implied under the hypothesized model. Although the kernel estimator is employed, the transformation leads to asymptotically independent uniform random variables under the hypothesized model. Hence, the issue of modeling the data dependence induced by diffusion processes is avoided. In a recent important development, Aït-Sahalia, Fan and Peng [5] proposed a test for the transitional densities of diffusion and jump diffusion processes based on a generalized likelihood ratio statistic, which, like our current proposal, is able to fully test the diffusion process specification and has attractive power properties.
The paper is structured as follows. Section 2 outlines the hypotheses and
the kernel smoothing of transitional densities. The proposed EL test is given
in Section 3. Section 4 reports the main results of the test. Section 5 considers
computational issues. Results from simulation studies are reported in Section
6. A Federal fund rate data set is analyzed in Section 7. All technical details
are given in the appendix.
2. The hypotheses and kernel estimators. Let $\pi(x)$ be the stationary density and $p(y|x; \Delta)$ be the transitional density of $X_{(t+1)\Delta} = y$ given $X_{t\Delta} = x$ under model (1.1), respectively; and let $\pi_\theta(x)$ and $p_\theta(y|x, \Delta)$ be their parametric counterparts under model (1.2). To simplify notation, we suppress $\Delta$ in the notation of transitional densities and write $\{X_{t\Delta}\}$ as $\{X_t\}$ for the observed data. Let $\mathcal{X}$ be the state space of the process.
Although $\pi_\theta(x)$ has a closed form expression via the Kolmogorov forward equation
$$\pi_\theta(x) = \frac{\xi(\theta)}{\sigma^2(x, \theta)}\exp\left\{\int_{x_0}^{x}\frac{2\mu(t, \theta)}{\sigma^2(t, \theta)}\,dt\right\},$$
where $\xi(\theta)$ is a normalizing constant, $p_\theta(y|x)$ as defined by the Kolmogorov backward equation may not admit a closed form expression. However, this problem is overcome by the Edgeworth type approximations developed by Aït-Sahalia ([2], [3]). As the transitional density fully describes the dynamics of a diffusion process, the hypotheses we would like to test are
$$H_0: p(y|x) = p_{\theta_0}(y|x) \ \text{for some}\ \theta_0 \in \Theta\ \text{and all}\ (x, y) \in S \subset \mathcal{X}^2 \quad \text{versus}$$
$$H_1: p(y|x) \neq p_\theta(y|x) \ \text{for all}\ \theta \in \Theta\ \text{and some}\ (x, y) \in S \subset \mathcal{X}^2,$$
where $S$ is a compact set within $\mathcal{X}^2$ and can be chosen based on the kernel transitional density estimator given in (2.1) below; see also the demonstrations in the simulation and case studies in Sections 6 and 7. As we properly smooth the parametric density $p_\theta(y|x)$, the boundary bias associated with the kernel estimators (Fan and Gijbels [16] and Müller and Stadtmüller [38]) is avoided.
Let $K(\cdot)$ be a kernel function which is a symmetric probability density function, let $h$ be a smoothing bandwidth such that $h \to 0$ and $nh^2 \to \infty$ as $n \to \infty$, and let $K_h(\cdot) = h^{-1}K(\cdot/h)$. The kernel estimator of $p(y|x)$ is
(2.1) $$\hat{p}(y|x) = n^{-1}\sum_{t=1}^{n}K_h(x - X_t)K_h(y - X_{t+1})/\hat{\pi}(x),$$
where $\hat{\pi}(x) = (n+1)^{-1}\sum_{t=1}^{n+1}K_h(x - X_t)$ is the kernel estimator of the stationary density used in Aït-Sahalia [1]. The local polynomial estimator introduced by Fan, Yao and Tong [18] can also be employed without altering the main results of this paper. It is known (Hyndman and Yao [32]) that
$$E\{\hat{p}(y|x) - p(y|x)\} = \frac{1}{2}\sigma_K^2 h^2\left\{\frac{\partial^2 p(y|x)}{\partial x^2} + \frac{\partial^2 p(y|x)}{\partial y^2} + 2\frac{\pi'(x)}{\pi(x)}\frac{\partial p(y|x)}{\partial x}\right\} + o(h^2),$$
$$\mathrm{Var}\{\hat{p}(y|x)\} = \frac{R^2(K)p(y|x)}{nh^2\pi(x)}(1 + o(1)),$$
where $\sigma_K^2 = \int u^2K(u)\,du$ and $R(K) = \int K^2(u)\,du$. Here we use a single bandwidth $h$ to smooth the bivariate data $(X_t, X_{t+1})$. This is based on the consideration that both $X_t$ and $X_{t+1}$ are identically distributed and hence have the same scale, which allows one smoothing bandwidth to smooth both components. Nevertheless, the results in this paper can be generalized to the situation where two different bandwidths are employed.
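To make the estimator concrete, here is a minimal numerical sketch of (2.1), assuming a simulated path stored in an array `X`; the function names are ours and not part of the paper.

```python
import numpy as np

def biweight(u):
    """Biweight kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def transition_density_estimate(X, x, y, h):
    """Kernel estimator p_hat(y|x) of the transitional density, as in (2.1).

    X is an array of the n+1 equally spaced observations X_1, ..., X_{n+1};
    (x, y) is the evaluation point and h the single smoothing bandwidth.
    """
    Kh = lambda u: biweight(u / h) / h            # K_h(u) = h^{-1} K(u/h)
    Xt, Xt1 = X[:-1], X[1:]                       # pairs (X_t, X_{t+1}), t = 1, ..., n
    joint = np.mean(Kh(x - Xt) * Kh(y - Xt1))     # n^{-1} sum_t K_h(x - X_t) K_h(y - X_{t+1})
    pi_hat = np.mean(Kh(x - X))                   # stationary density estimate pi_hat(x)
    return joint / pi_hat
```

The same routine evaluated over a grid of $(x, y)$ points covering $S$ gives the nonparametric side of the comparison.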
Let $\tilde{\theta}$ be a consistent estimator of $\theta$ under model (1.2), for instance the maximum likelihood estimator under $H_0$, and let
(2.2) $$w_t(x) = K_h(x - X_t)\,\frac{s_{2h}(x) - s_{1h}(x)(x - X_t)}{s_{2h}(x)s_{0h}(x) - s_{1h}^2(x)}$$
be the local linear weight with $s_{rh}(x) = \sum_{s=1}^{n}K_h(x - X_s)(x - X_s)^r$ for $r = 0, 1$ and $2$. In order to cancel out the bias in $\hat{p}(y|x)$, we smooth $p_{\tilde{\theta}}(y|x)$ as
(2.3) $$\tilde{p}_{\tilde{\theta}}(y|x) = \frac{\sum_{t=1}^{n+1}K_h(x - X_t)\sum_{s=1}^{n+1}w_s(y)p_{\tilde{\theta}}(X_s|X_t)}{\sum_{t=1}^{n+1}K_h(x - X_t)}.$$
Here we apply the kernel smoothing twice: first, for each $X_t$, using the local linear weight to smooth $p_{\tilde{\theta}}(X_s|X_t)$, and then employing the standard kernel to smooth with respect to $X_t$. This is motivated by Härdle and Mammen [26]. It can be shown from the standard derivations in Fan and Gijbels [16] that, under $H_0$,
(2.4) $$E\{\hat{p}(y|x) - \tilde{p}_{\tilde{\theta}}(y|x)\} = o(h^2) \quad \text{and}$$
(2.5) $$\mathrm{Var}\{\hat{p}(y|x) - \tilde{p}_{\tilde{\theta}}(y|x)\} = \mathrm{Var}\{\hat{p}(y|x)\}\{1 + o(1)\}.$$
Hence the biases of $\hat{p}(y|x)$ and $\tilde{p}_{\tilde{\theta}}(y|x)$ cancel out in the leading order, while smoothing the parametric density does not affect the asymptotic variance.
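The double smoothing of (2.2)-(2.3) can be sketched in the same setting; a minimal sketch, in which `p_theta` stands for the fitted parametric transitional density (for example the Vasicek Gaussian transition) and `kernel` for the kernel function (for example the biweight above) — both assumed inputs.

```python
import numpy as np

def local_linear_weights(y, X, h, kernel):
    """Local linear weights w_s(y) of (2.2), evaluated at the point y."""
    Kh = kernel((y - X) / h) / h
    d = y - X
    s0, s1, s2 = np.sum(Kh), np.sum(Kh * d), np.sum(Kh * d**2)
    return Kh * (s2 - s1 * d) / (s2 * s0 - s1**2)

def smoothed_parametric_density(x, y, X, h, kernel, p_theta):
    """Doubly smoothed parametric transitional density p_tilde_theta(y|x) of (2.3).

    p_theta(y_new, x_old) returns the fitted parametric transitional density
    and is assumed to be vectorized in its first argument.
    For simplicity the sums below run over all n+1 observations.
    """
    w = local_linear_weights(y, X, h, kernel)                    # w_s(y)
    Khx = kernel((x - X) / h) / h                                # K_h(x - X_t)
    inner = np.array([np.sum(w * p_theta(X, Xt)) for Xt in X])   # sum_s w_s(y) p_theta(X_s | X_t)
    return np.sum(Khx * inner) / np.sum(Khx)
```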
3. Formulation of the test statistic. The test statistic is formulated by the empirical likelihood (EL) (Owen [39]). Despite being intrinsically nonparametric, the EL possesses two key properties of a parametric likelihood: the Wilks' theorem and the Bartlett correction. Qin and Lawless [42] established the EL for parameters defined by generalized estimating equations, which is the broadest framework for EL formulation so far; it was extended by Kitamura [34] to dependent observations. Chen and Cui [11] showed that the EL admits the Bartlett correction under this general framework. See also Hjort, McKeague and Van Keilegom [29] for extensions. The EL has been used for goodness-of-fit tests of various model structures. Fan and Zhang [21] proposed a sieve EL test for varying-coefficient regression models that extends the test of Fan and Zhang [20]; Tripathi and Kitamura [46] studied a test for conditional moment restrictions; Chen, Härdle and Li [12] proposed an EL test for time series regression models. See also Li [34] for survival data.
We now formulate the EL for the transitional density at a fixed $(x, y)$. For $t = 1, \cdots, n$, let $q_t(x, y)$ be non-negative weights allocated to $(X_t, X_{t+1})$.
The EL evaluated at $\tilde{p}_{\tilde{\theta}}(y|x)$ is
(3.1) $$L\{\tilde{p}_{\tilde{\theta}}(y|x)\} = \max\prod_{t=1}^{n}q_t(x, y)$$
subject to
(3.2) $$\sum_{t=1}^{n}q_t(x, y) = 1 \quad \text{and} \quad \sum_{t=1}^{n}q_t(x, y)K_h(x - X_t)K_h(y - X_{t+1}) = \tilde{p}_{\tilde{\theta}}(y|x)\hat{\pi}(x).$$
By introducing a Lagrange multiplier $\lambda(x, y)$, the optimal weights as solutions to (3.1) and (3.2) are
(3.3) $$q_t(x, y) = n^{-1}\{1 + \lambda(x, y)T_t(x, y)\}^{-1},$$
where $T_t(x, y) = K_h(x - X_t)K_h(y - X_{t+1}) - \tilde{p}_{\tilde{\theta}}(x, y)$ and $\lambda(x, y)$ is the root of
(3.4) $$\sum_{t=1}^{n}\frac{T_t(x, y)}{1 + \lambda(x, y)T_t(x, y)} = 0.$$
The overall maximum EL is achieved at $q_t(x, y) = n^{-1}$, which maximizes (3.1) without constraint (3.2). Hence, the log-EL ratio is
(3.5) $$\ell\{\tilde{p}_{\tilde{\theta}}(y|x)\} = -2\log\left[L\{\tilde{p}_{\tilde{\theta}}(y|x)\}n^n\right] = 2\sum_{t=1}^{n}\log\{1 + \lambda(x, y)T_t(x, y)\}.$$
It may be shown by similar derivations to those in Chen, Härdle and Li [12] that
(3.6) $$\sup_{(x,y)\in S}|\lambda(x, y)| = o_p\{(nh^2)^{-1/2}\log(n)\}.$$
Let $\bar{U}_1(x, y) = (nh^2)^{-1}\sum T_t(x, y)$ and $\bar{U}_2(x, y) = (nh^2)^{-1}\sum T_t^2(x, y)$. From (3.4) and (3.6), $\lambda(x, y) = \bar{U}_1(x, y)\bar{U}_2^{-1}(x, y) + O_p\{(nh^2)^{-1}\log^2(n)\}$ uniformly with respect to $(x, y) \in S$. This leads to
(3.7) $$\ell\{\tilde{p}_{\tilde{\theta}}(y|x)\} = nh^2\bar{U}_1^2(x, y)\bar{U}_2^{-1}(x, y) + O_p\{n^{-1/2}h^{-1/2}\log^3(n)\} = nh^2\frac{\{\hat{p}(y|x) - \tilde{p}_{\tilde{\theta}}(y|x)\}^2}{V(y|x)} + O_p\{h^2 + n^{-1/2}h^{-1/2}\log^3(n)\}$$
uniformly for $(x, y) \in S$, where $V(y|x) = R^2(K)p(y|x)\pi^{-1}(x)$. Hence, the EL ratio is a studentized local goodness-of-fit measure between $\hat{p}(y|x)$ and $\tilde{p}_{\tilde{\theta}}(y|x)$, as $\mathrm{Var}\{\hat{p}(y|x)\} = (nh^2)^{-1}V(y|x)$.
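A minimal sketch of the local EL computation at a fixed $(x, y)$: solve (3.4) for $\lambda(x, y)$ on the interval over which all weights stay positive and plug it into (3.5). Returning infinity when $\tilde{p}_{\tilde{\theta}}$ lies outside the convex hull of the kernel terms is our convention, not the paper's.

```python
import numpy as np
from scipy.optimize import brentq

def local_el_ratio(Tt):
    """Log-EL ratio (3.5) at a fixed (x, y), given the array of T_t(x, y) values."""
    if Tt.min() >= 0.0 or Tt.max() <= 0.0:
        return np.inf                                  # p_tilde outside the convex hull of the kernel terms
    # lambda(x, y) must keep 1 + lambda * T_t > 0 for every t
    lo = -1.0 / Tt.max() * (1.0 - 1e-9)
    hi = -1.0 / Tt.min() * (1.0 - 1e-9)
    g = lambda lam: np.sum(Tt / (1.0 + lam * Tt))      # estimating equation (3.4)
    lam = brentq(g, lo, hi)                            # unique root: g is decreasing in lambda
    return 2.0 * np.sum(np.log1p(lam * Tt))            # log-EL ratio (3.5)
```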
Integrating the EL ratio against a weight function $\omega(\cdot, \cdot)$ supported on $S$, the global goodness-of-fit measure based on a single bandwidth is
(3.8) $$N(h) = \int\int \ell\{\tilde{p}_{\tilde{\theta}}(y|x)\}\omega(x, y)\,dxdy.$$
To make the test less dependent on a single bandwidth $h$, we compute $N(h)$ over a bandwidth set $H = \{h_k\}_{k=1}^{J}$ where $h_k/h_{k+1} = a$ for some $a \in (0, 1)$. The choice of $H$ can be guided by the cross-validation method of Fan and Yim [19] or other bandwidth selection methods; see Section 5 for more discussion and demonstration. This formulation is motivated by Horowitz and Spokoiny [31], who considered achieving the optimal convergence rate for the distance between a null hypothesis and a series of local alternative hypotheses in testing regression models. A similar approach is advocated in Fan [14] for wavelets based testing. To attain the optimal convergence, Horowitz and Spokoiny [31] employed $J_n$ bandwidths where $J_n = O\{\log(n)\}$, which increases slowly to infinity. As we are concerned with testing against a fixed alternative only, it is adequate to have a finite number of bandwidths in $H$.
The final test statistic based on the bandwidth set $H$ is
(3.9) $$L_n = \max_{1\le k\le J}\frac{N(h_k) - 1}{\sqrt{2}h_k},$$
where the standardization reflects that $\mathrm{Var}\{N(h)\} = O(2h^2)$, as shown in the appendix.
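Combining the local EL ratios into (3.8) and (3.9) is then a Riemann sum per bandwidth followed by a maximum; a sketch under the uniform weight $\omega = |S|^{-1}I\{(x, y)\in S\}$, with hypothetical input names.

```python
import numpy as np

def combined_statistic(el_grid_by_h, bandwidths):
    """Final test statistic L_n of (3.9) from per-bandwidth local EL ratios.

    el_grid_by_h[h] is an array of local EL ratios on a regular grid covering S;
    with the uniform weight omega = 1{(x, y) in S}/|S|, the Riemann sum for
    N(h) in (3.8) reduces to the average of the local ratios over the grid.
    """
    stats = []
    for h in bandwidths:
        N_h = np.mean(el_grid_by_h[h])                  # approximates N(h) in (3.8)
        stats.append((N_h - 1.0) / (np.sqrt(2.0) * h))  # studentization of (3.9)
    return max(stats)
```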
4. Main results. Our theoretical results are based on the following assumptions.
Assumption 1. (i) The process $\{X_t\}$ is strictly stationary and $\alpha$-mixing with mixing coefficient $\alpha(t) = C_\alpha\alpha^t$, where
$$\alpha(t) = \sup\{|P(A\cap B) - P(A)P(B)| : A \in \Omega_1^s, B \in \Omega_{s+t}^\infty\}$$
for all $s, t \ge 1$, where $C_\alpha$ is a finite positive constant and $\Omega_i^j$ denotes the $\sigma$-field generated by $\{X_t : i \le t \le j\}$.
(ii) $K(\cdot)$ is a bounded symmetric probability density supported on $[-1, 1]$ and has bounded second derivative; and $\omega(x, y)$ is a bounded probability density supported on $S$.
(iii) For the bandwidth set $H$, $h_1 = c_1 n^{-\gamma_1}$ and $h_J = c_J n^{-\gamma_2}$, in which $\frac{1}{7} < \gamma_2 \le \gamma_1 < \frac{1}{4}$, $c_1$ and $c_J$ are constants satisfying $0 < c_1, c_J < \infty$, and $J$ is a positive integer not depending on $n$.
Assumption 2. (i) Each of the diffusion processes given in (1.1) and (1.2) admits a unique weak solution and possesses a transitional density with $p(y|x) = p(y|x, \Delta)$ for (1.1) and $p_\theta(y|x) = p_\theta(y|x, \Delta)$ for (1.2).
(ii) Let $p_{s_1, s_2, \cdots, s_l}(\cdot)$ be the joint probability density of $(X_{1+s_1}, \ldots, X_{1+s_l})$. Assume that each $p_{s_1, s_2, \cdots, s_l}(x)$ is three times differentiable in $x \in \mathcal{X}^l$ for $1 \le l \le 6$.
(iii) The parameter space $\Theta$ is an open subset of $R^d$ and $p_\theta(y|x)$ is three times differentiable in $\theta \in \Theta$. For every $\theta \in \Theta$, $\mu(x; \theta)$ and $\sigma^2(x; \theta)$, and $\mu(x)$ and $\sigma^2(x)$, are all three times continuously differentiable in $x \in \mathcal{X}$, and both $\sigma(x)$ and $\sigma(x; \theta)$ are positive for $x \in S$ and $\theta \in \Theta$.
Assumption 3. (i) $E\left[\frac{\partial p_\theta(X_{t+1}|X_t)}{\partial\theta}\left\{\frac{\partial p_\theta(X_{t+1}|X_t)}{\partial\theta}\right\}^\tau\right]$ is of full rank. Let $G(x, y)$ be a positive and integrable function with $E[\max_{1\le t\le n}G(X_t, X_{t+1})] < \infty$ uniformly in $n \ge 1$ such that $\sup_{\theta\in\Theta}|p_\theta(y|x)|^2 \le G(x, y)$ and $\sup_{\theta\in\Theta}\|\nabla_\theta^j p_\theta(y|x)\|^2 \le G(x, y)$ for all $(x, y) \in S$ and $j = 1, 2, 3$, where $\nabla_\theta p_\theta(\cdot|\cdot) = \frac{\partial p_\theta(\cdot|\cdot)}{\partial\theta}$, $\nabla_\theta^2 p_\theta(\cdot|\cdot) = \frac{\partial^2 p_\theta(\cdot|\cdot)}{(\partial\theta)^2}$ and $\nabla_\theta^3 p_\theta(\cdot|\cdot) = \frac{\partial^3 p_\theta(\cdot|\cdot)}{(\partial\theta)^3}$.
(ii) $p(y|x) > c_1 > 0$ for all $(x, y) \in S$ and the stationary density $\pi(x) > c_2 > 0$ for all $x \in S_x$, which is the projection of $S$ on $\mathcal{X}$.
Assumption 4. Under either $H_0$ or $H_1$, there is a $\theta^* \in \Theta$ and a sequence of positive constants $\{a_n\}$ that diverges to infinity such that, for any $\varepsilon > 0$ and some $C > 0$, $\lim_{n\to\infty}P\left(a_n\|\tilde{\theta} - \theta^*\| > C\right) < \varepsilon$ and $\sqrt{n}\,h\,a_n^{-1} = o(\sqrt{h})$ for any $h \in H$.
Assumption 1(i) imposes the strict stationarity and $\alpha$-mixing condition on $\{X_t\}$. Under certain conditions, such as Assumption A2 of Aït-Sahalia [1] and Conditions (A4) and (A5) of Genon-Catalot, Jeantheau and Larédo [24], Assumption 1(i) holds. Assumptions 1(ii) and (iii) are quite standard conditions imposed on the kernel and the bandwidth in kernel estimation. Assumption 2 is needed to ensure the existence and uniqueness of a solution and the transitional density function of the diffusion process. Such an assumption may be implied under Assumptions 1-3 of Aït-Sahalia [3], which also cover non-stationary cases. For the stationary case, Assumptions A0 and A1 of Aït-Sahalia [1] ensure the existence and uniqueness of a stationary solution of the diffusion process. Assumption 3 imposes additional conditions to ensure the smoothness of the transitional density and the identifiability of the parametric transitional density. The $\theta^*$ in Assumption 4 is the true parameter $\theta_0$ under $H_0$. When $H_1$ is true, $\theta^*$ can be regarded as a projection of the parameter estimator $\tilde{\theta}$ onto the null parameter space. Assumption 4 also requires that $a_n$, the rate of convergence of $\tilde{\theta}$ to $\theta^*$, is faster than $\sqrt{nh}$, the convergence rate of the kernel transitional density estimation. This is certainly satisfied when $\tilde{\theta}$ converges at the rate of $\sqrt{n}$, as attained by maximum likelihood estimation. Our use of the general convergence rate $a_n$ for the parameter estimation is to cover situations where the parameter estimator has a slower rate than $\sqrt{n}$, for instance when estimation is based on certain forms of discretization which require $\Delta \to 0$ in order to be consistent (Lo [37]).
Let $K^{(2)}(z, c) = \int K(u)K(z + cu)\,du$, a generalization of the convolution of $K$, let $\nu(t) = \int\{K^{(2)}(tu, t)\}^2\,du\int\{K^{(2)}(v, t)\}^2\,dv$, and let
$$\Sigma_J = \frac{2}{R^4(K)}\int\int\omega^2(x, y)\,dxdy\,\left(\nu(a^{i-j})\right)_{J\times J}$$
be a $J\times J$ matrix, where $a$ is the fixed factor used in the construction of $H$. Furthermore, let $1_J$ be a $J$-dimensional vector of ones and $\beta = \frac{1}{\sqrt{2}R(K)}\int\int\frac{p(x, y)}{\pi(y)}\omega(x, y)\,dxdy$.
THEOREM 1. Under Assumptions 1-4 and $H_0$, $L_n \stackrel{d}{\to} \max_{1\le k\le J}Z_k$ as $n \to \infty$, where $Z = (Z_1, \ldots, Z_J)^T \sim N(\beta 1_J, \Sigma_J)$.
Theorem 1 brings a little surprise in that the mean of $Z$ is non-zero. This is because, although the variance of $\tilde{p}_\theta(x, y)$ is of a smaller order than that of $\hat{p}(x, y)$, it contributes to the second order mean of $N(h)$, which emerges after dividing by $\sqrt{2}h$ in (3.9). However, this does not affect $L_n$ being a test statistic.
We are reluctant to formulate a test based on Theorem 1 as the convergence would be slow. Instead, we propose the following parametric bootstrap procedure to approximate $l_\alpha$, the $1 - \alpha$ quantile of $L_n$ for a nominal significance level $\alpha \in (0, 1)$:
Step 1: Generate an initial value $X_0^*$ from the estimated stationary density $\pi_{\tilde{\theta}}(\cdot)$. Then simulate a sample path $\{X_t^*\}_{t=1}^{n+1}$ at the same sampling interval $\Delta$ according to $dX_t = \mu(X_t; \tilde{\theta})dt + \sigma(X_t; \tilde{\theta})dB_t$.
Step 2: Let $\tilde{\theta}^*$ be the estimate of $\theta$ based on $\{X_t^*\}_{t=1}^{n+1}$. Compute the test statistic $L_n$ based on the resampled path and denote it by $L_n^*$.
Step 3: For a large positive integer $B$, repeat Steps 1 and 2 $B$ times and obtain, after ranking, $L_n^{1*} \le L_n^{2*} \le \cdots \le L_n^{B*}$.
Let $l_\alpha^*$ be the $1 - \alpha$ quantile of $L_n^*$ satisfying $P\left(L_n^* \ge l_\alpha^* \mid \{X_t\}_{t=1}^{n+1}\right) = \alpha$. A Monte Carlo approximation of $l_\alpha^*$ is $L_n^{[B(1-\alpha)]+1*}$. The proposed test rejects $H_0$ if $L_n \ge l_\alpha^*$.
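The three steps can be sketched as follows, with `simulate_path`, `fit_mle` and `test_statistic` as assumed user-supplied routines (for instance, exact transition samplers under the fitted Vasicek or CIR null, the maximum likelihood estimator, and the statistic of (3.9)).

```python
import numpy as np

def bootstrap_test(X, theta_hat, alpha, B, simulate_path, fit_mle, test_statistic):
    """Steps 1-3: parametric bootstrap approximation of the 1 - alpha quantile of L_n.

    simulate_path(theta, n): draws X_0* from pi_theta and returns a path of
        length n+1 at the same sampling interval Delta        (assumed input)
    fit_mle(path): re-estimates theta on the bootstrap path   (assumed input)
    test_statistic(path, theta): computes L_n                 (assumed input)
    """
    n = len(X) - 1
    Ln_star = np.empty(B)
    for b in range(B):
        path = simulate_path(theta_hat, n)                 # Step 1: resample under the fitted model
        theta_star = fit_mle(path)                         # Step 2: re-estimate theta
        Ln_star[b] = test_statistic(path, theta_star)      #         and recompute the statistic
    Ln_star.sort()                                         # Step 3: rank the B bootstrap statistics
    l_alpha = Ln_star[int(np.floor(B * (1.0 - alpha)))]    # the [B(1-alpha)]+1-th order statistic
    Ln = test_statistic(X, theta_hat)
    return Ln, l_alpha, Ln >= l_alpha                      # reject H_0 when L_n >= l_alpha*
```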
The following theorem is the bootstrap version of Theorem 1, establishing the convergence in distribution of the bootstrap version of the test statistic.
THEOREM 2. Under Assumptions 1-4, given $\{X_t\}_{t=1}^{n+1}$, the conditional distribution of $L_n^*$ converges to the distribution of $\max_{1\le k\le J}Z_k$ in probability as $n \to \infty$, where $(Z_1, \ldots, Z_J)^T \sim N(\beta 1_J, \Sigma_J)$.
The next theorem shows that the proposed EL test based on the bootstrap
calibration has correct size asymptotically under H0 and is consistent under
H1 .
THEOREM 3. Under Assumptions 1–4, limn→∞ P (Ln ≥ lα∗ ) = α under
H0 ; and limn→∞ P (Ln ≥ lα∗ ) = 1 under H1 .
5. Computation. The computation of the proposed EL test statistic $N(h)$ involves first computing the local EL ratio $\ell\{\tilde{p}_{\tilde{\theta}}(y|x)\}$ over a grid of $(x, y)$-points within the set $S \subset \mathcal{X}^2$. The number of grid points should be large enough to ensure a good approximation by the Riemann sum. On top of this is the bootstrap procedure, which replicates the above computation a large number of times. The most time-consuming component of the computation is the nonlinear optimization carried out when obtaining the local EL ratio $\ell\{\tilde{p}_{\tilde{\theta}}(y|x)\}$.
In the following, we consider using a simpler version of the EL, the least squares empirical likelihood (LSEL), to formulate the test statistic. The log LSEL ratio evaluated at $\tilde{p}_{\tilde{\theta}}(y|x)$ is
$$\ell_{ls}\{\tilde{p}_{\tilde{\theta}}(y|x)\} = \min\sum_{t=1}^{n}\{nq_t(x, y) - 1\}^2$$
subject to $\sum_{t=1}^{n}q_t(x, y) = 1$ and $\sum_{t=1}^{n}q_t(x, y)T_t(x, y) = 0$. Let $T(x, y) = \sum_{t=1}^{n}T_t(x, y)$ and $S(x, y) = \sum_{t=1}^{n}T_t^2(x, y)$. The LSEL is much easier to compute as there are closed-form solutions for the weights $q_t(x, y)$, and it hence avoids the expensive nonlinear optimization of the EL computation. According to Brown and Chen [8], the LSEL weights are $q_t(x, y) = n^{-1} + \{n^{-1}T(x, y) - T_t(x, y)\}S^{-1}(x, y)T(x, y)$ and
$$\ell_{ls}\{\tilde{p}_{\tilde{\theta}}(y|x)\} = S^{-1}(x, y)T^2(x, y),$$
which is readily computable. The LSEL counterpart to $N(h)$ is
$$N^{ls}(h) = \int\int\ell_{ls}\{\tilde{p}_{\tilde{\theta}}(y|x)\}\omega(x, y)\,dxdy$$
and the final test statistic $L_n$ becomes $\max_{h\in H}(\sqrt{2}h)^{-1}\{N^{ls}(h) - 1\}$. It can be shown from Brown and Chen [8] that $N^{ls}(h)$ and $N(h)$ are equivalent to the first order. Therefore, the first-order results in Theorems 1, 2 and 3 continue to hold for the LSEL formulation. One may just use this less expensive LSEL to carry out the testing. In fact, the least squares EL formulation was used in all the simulation studies reported in the next section.
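For completeness, the closed-form LSEL ratio at a fixed $(x, y)$ is a two-line computation; this is a sketch of the formula above, not of the authors' code.

```python
import numpy as np

def local_lsel_ratio(Tt):
    """Closed-form LSEL ratio ell_ls = T^2(x, y) / S(x, y) at a fixed (x, y),
    with T = sum_t T_t(x, y) and S = sum_t T_t(x, y)^2; no root-finding needed."""
    T = np.sum(Tt)
    S = np.sum(Tt**2)
    return T**2 / S
```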
One may use the leading order term in the expansion of the EL test statistic as given in (3.7) as the local test statistic. However, doing so would require estimation of secondary "parameters", such as $p(y|x)$ and $\pi(x)$. The use of the LSEL or the EL avoids this secondary estimation as they studentize automatically via their respective optimization procedures.
6. Simulation studies. We report results of simulation studies which were designed to evaluate the empirical performance of the proposed EL test. To gain information on its relative performance, Hong and Li's test was performed in the same simulation.
Throughout the paper, the biweight kernel $K(u) = \frac{15}{16}(1 - u^2)^2 I(|u| \le 1)$ was used in all the kernel estimation. In the simulation, we set $\Delta = \frac{1}{12}$, implying monthly observations, which coincides with that of the Federal fund rate data to be analyzed. We chose $n = 125$, $250$ or $500$, corresponding roughly to 10 to 40 years of data. The number of simulations was 500 and the number of bootstrap resample paths was $B = 250$.
6.1. Size Evaluation. Two simulation studies were carried out to evaluate
the size of the proposed test for both Vasicek and Cox, Ingersoll and Ross
(CIR) [13] diffusion models.
6.1.1 Vasicek Models. We first consider testing the Vasicek model
$$dX_t = \kappa(\alpha - X_t)dt + \sigma dB_t.$$
The vector of parameters $\theta = (\alpha, \kappa, \sigma^2)$ takes three sets of values which correspond to Model -2, Model 0 and Model 2 of Pritsker [41]. The baseline Model 0 assigns $\kappa_0 = 0.85837$, $\alpha_0 = 0.089102$ and $\sigma_0^2 = 0.0021854$, which matches the estimates of Aït-Sahalia [1] for an interest rate data set. Model -2 is obtained by quadrupling $\kappa_0$ and $\sigma_0^2$, and Model 2 by halving $\kappa_0$ and $\sigma_0^2$ twice, while keeping $\alpha_0$ unchanged. The three models have the same marginal distribution $N(\alpha_0, V_E)$, where $V_E = \frac{\sigma^2}{2\kappa} = 0.001226$. Despite the stationary distribution being the same, the models offer different levels of dependence as quantified by the mean-reverting parameter $\kappa$. From Model -2 to Model 2, the process becomes more dependent as $\kappa$ gets smaller.
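As an illustration of the simulation design, the Vasicek process can be sampled exactly through its Gaussian transition; a sketch with the Model 0 parameter values quoted above and the monthly spacing $\Delta = 1/12$ used in the paper.

```python
import numpy as np

def simulate_vasicek(kappa, alpha, sigma2, n, delta=1.0 / 12.0, rng=None):
    """Simulate n+1 observations of dX_t = kappa(alpha - X_t)dt + sigma dB_t
    using the exact Gaussian transition of the Vasicek process."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty(n + 1)
    X[0] = rng.normal(alpha, np.sqrt(sigma2 / (2.0 * kappa)))   # stationary N(alpha, sigma^2/(2 kappa))
    a = np.exp(-kappa * delta)
    cond_sd = np.sqrt(sigma2 * (1.0 - a**2) / (2.0 * kappa))    # conditional standard deviation
    for t in range(n):
        X[t + 1] = rng.normal(alpha + (X[t] - alpha) * a, cond_sd)
    return X

# Baseline Model 0: kappa_0 = 0.85837, alpha_0 = 0.089102, sigma_0^2 = 0.0021854
X = simulate_vasicek(0.85837, 0.089102, 0.0021854, n=250)
```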
The region $S$ was chosen based on the underlying transitional density so that the region attained more than 90% of the probability. This is consistent with our earlier recommendation to choose $S$ based on the kernel estimate of the transitional density. In particular, for Models -2, 0 and 2, it was chosen by rotating, respectively, $[0.035, 0.25]\times[-0.03, 0.03]$, $[0.03, 0.22]\times[-0.02, 0.02]$ and $[0.02, 0.22]\times[-0.009, 0.009]$ by 45 degrees anti-clockwise. The weight function is $\omega(x, y) = |S|^{-1}I\{(x, y)\in S\}$, where $|S|$ is the area of $S$.
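Membership in the rotated region $S$ can be checked by rotating a candidate point back by 45 degrees and testing it against the original rectangle; a sketch, assuming the rotation is about the origin (the paper does not spell out the rotation centre).

```python
import numpy as np

def in_rotated_rectangle(x, y, xlim, ylim, angle_deg=45.0):
    """Indicator of S: the rectangle xlim x ylim rotated anti-clockwise by angle_deg.

    For example, xlim = (0.03, 0.22) and ylim = (-0.02, 0.02) for Model 0.
    The uniform weight is then omega(x, y) = 1{(x, y) in S}/|S|, and |S| equals
    the area of the rectangle since rotation preserves area.
    """
    a = np.deg2rad(angle_deg)
    u = np.cos(a) * x + np.sin(a) * y       # rotate the point back by -angle_deg ...
    v = -np.sin(a) * x + np.cos(a) * y      # ... so that S becomes the axis-aligned rectangle
    return (xlim[0] <= u <= xlim[1]) and (ylim[0] <= v <= ylim[1])
```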
Both the cross-validation (CV) method and the reference to a bivariate normal distribution (the Scott rule, Scott [44]) were used to select the bandwidth set $H$. Table 1 reports the average bandwidths obtained by the two methods. We observed that, for each given $n$, regardless of which method was used, the chosen bandwidth became smaller as the model was shifted from Model -2 to Model 2. This indicates that both methods took into account the changing level of dependence induced by these models. We considered two approaches for choosing the bandwidth set. One was to choose six fixed bandwidths for each combination of model and sample size such that the average $h_{cv}$ reported in Table 1 was contained within the lower range of $H$. These fixed bandwidths are reported in Table 3 and, for the CIR models, in Table 5. The second approach was to select $h_{ref}$ given by the Scott rule for each sample and then choose the other $h$-values in the set by setting $a = 0.95$ so that $h_{ref}$ is the third smallest bandwidth in the set of six. This second approach could be regarded as data-driven as it varied from sample to sample.
The maximum likelihood estimator was used to estimate $\theta$ in each simulation and in each resample of the bootstrap. Table 2 summarizes the quality of the parameter estimation. It shows that the estimation of $\kappa$ is subject to severe bias when the mean-reversion is weak. The deterioration in the quality of the estimates, especially for $\kappa$ as the dependence became stronger, is alarming.
The sizes of the proposed test at the nominal 5% level using the two bandwidth set selection rules are reported in Table 3. It shows that the sizes of the proposed test under the two bandwidth set selections were quite close to the nominal level consistently for the sample sizes considered. For Model 2, which has the weakest mean-reversion, there was some size distortion when $n = 125$ for the fixed bandwidth selection. However, it was significantly alleviated when $n$ was increased. The message conveyed by Table 3 is that we do not need a large number of years of data in order to achieve a reasonable size for the test. Table 3 also reports the single-bandwidth tests based on $N(h)$ and the asymptotic normality conveyed by Theorem 1 with $J = 1$. The asymptotic test has severe size distortion, which highlights the need for the bootstrap procedure.
We then carried out simulations for the test of Hong and Li [30]. The Scott rule adopted by Hong and Li was used to obtain an initial bandwidth $h_{scott} = \hat{S}_z n^{-1/6}$, where $\hat{S}_z$ is the sample standard deviation of the transformed series. There was little difference in the average of $h_{scott}$ among the three Vasicek models in the simulation. We used the one corresponding to Vasicek -2. We then chose two equally spaced bandwidths below and two above the average $h_{scott}$. The nominal 5% test at each bandwidth was carried out with the lag value 1. For the sample sizes considered, the sizes of the test did not settle well at the nominal level, similar to what happened for the asymptotic test as reported in Table 3. We then applied the proposed parametric bootstrap procedure to Hong and Li's test. As shown in Table 4, the bootstrap largely improved the size of the test.
6.1.2 CIR Models. We then conducted simulations on the CIR process
(6.1) $$dX_t = \kappa(\alpha - X_t)dt + \sigma\sqrt{X_t}\,dB_t$$
to see if the pattern of results observed for the Vasicek models holds for the CIR models. The parameters were: $\kappa = 0.89218$, $\alpha = 0.09045$ and $\sigma^2 = 0.032742$ in the first model (CIR 0); $\kappa = 0.44609$, $\alpha = 0.09045$ and $\sigma^2 = 0.016371$ in model CIR 1; and $\kappa = 0.22305$, $\alpha = 0.09045$ and $\sigma^2 = 0.008186$ in model CIR 2. CIR 0 was the model used by Pritsker [41] for power evaluation. The region $S$ was chosen by rotating 45 degrees anti-clockwise $[0.015, 0.25]\times[-0.015, 0.015]$ for CIR 0, $[0.015, 0.25]\times[-0.012, 0.012]$ for CIR 1 and $[0.015, 0.25]\times[-0.008, 0.008]$ for CIR 2, respectively. All the regions have a coverage probability of at least 0.90.
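The CIR paths can likewise be simulated exactly through the noncentral chi-square transition of the process; a sketch with the CIR 0 parameters.

```python
import numpy as np

def simulate_cir(kappa, alpha, sigma2, n, delta=1.0 / 12.0, rng=None):
    """Simulate n+1 observations of dX_t = kappa(alpha - X_t)dt + sigma sqrt(X_t) dB_t
    in (6.1), using the exact noncentral chi-square transition of the CIR process."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty(n + 1)
    # stationary Gamma(2 kappa alpha / sigma^2, scale = sigma^2 / (2 kappa)) starting value
    X[0] = rng.gamma(2.0 * kappa * alpha / sigma2, sigma2 / (2.0 * kappa))
    c = sigma2 * (1.0 - np.exp(-kappa * delta)) / (4.0 * kappa)
    df = 4.0 * kappa * alpha / sigma2
    for t in range(n):
        nonc = X[t] * np.exp(-kappa * delta) / c
        X[t + 1] = c * rng.noncentral_chisquare(df, nonc)
    return X

# CIR 0 parameters: kappa = 0.89218, alpha = 0.09045, sigma^2 = 0.032742
X = simulate_cir(0.89218, 0.09045, 0.032742, n=250)
```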
Table 5 reports the sizes of the proposed test based on the two bandwidth sets as well as the single bandwidth-based tests that involve the bootstrap simulation. The bandwidth sets were chosen based on the same principle as outlined for the Vasicek models and are reported in the table. The parameter estimation under these CIR models has the same pattern of quality as for the Vasicek models reported in Table 2. We find that the proposed test continued to have reasonable size for the three CIR models despite the severe biases in the estimation of $\kappa$. The sizes of the single bandwidth-based tests as well as the overall test were quite respectable for $n = 125$. It is interesting to see that despite $\kappa$ still being poorly estimated for CIR 2, the severe size distortion observed earlier for Vasicek 2 with the fixed bandwidth set was not present. Hong and Li's test was also performed for the three CIR models. The performance was similar to that for the Vasicek models reported in Table 4 and hence is not reported here.
6.2. Power evaluation. To gain information on the power of the proposed test, we carried out a simulation to test for the Vasicek model while the real process was CIR 0, as in Pritsker's power evaluation of Aït-Sahalia's test. The region $S$ was obtained by rotating $[0.015, 0.25]\times[-0.015, 0.015]$ 45 degrees anti-clockwise. The average CV bandwidths based on 500 simulations were 0.0202 (standard error 0.0045) for $n = 125$, 0.016991 (0.00278) for $n = 250$ and 0.014651 (0.00203) for $n = 500$.
Table 6A reports the power of the EL test and the single bandwidth-based tests, including the fixed bandwidth sets used in the simulation. We find the tests had quite good power. As expected, the power increased as $n$ increased. One striking feature was that the power of the combined test tended to be larger than the maximum power of the single bandwidth-based tests, which indicates that it is worthwhile to formulate the test based on a set of bandwidths. Table 6B also reports the power of Hong and Li's test. It is found that while the bootstrap calibration improved the size of the test, it largely reduced the power. The power reduction was quite alarming. In some cases, the test had little power. Our simulation results on Hong and Li's test were similar to those reported in Aït-Sahalia, Fan and Peng [5].
7. Case studies. We apply the proposed test to the Federal fund rate data set between January 1963 and December 1998, which has $n = 432$ observations. Aït-Sahalia [2] used this data set to demonstrate the performance of the maximum likelihood estimation. We test five popular one-factor diffusion models which have been proposed to model interest rate dynamics. In addition to the Vasicek and CIR processes, we consider
(7.1) $$dX_t = X_t\{\kappa - (\sigma^2 - \kappa\alpha)X_t\}dt + \sigma X_t^{3/2}dB_t,$$
(7.2) $$dX_t = \kappa(\alpha - X_t)dt + \sigma X_t^{\rho}dB_t,$$
(7.3) $$dX_t = (\alpha_{-1}X_t^{-1} + \alpha_0 + \alpha_1 X_t + \alpha_2 X_t^2)dt + \sigma X_t^{3/2}dB_t.$$
They are, respectively, the inverse of the CIR process (ICIR) (7.1), the constant elasticity of volatility (CEV) model (7.2) and the nonlinear drift (NL) model (7.3) of Aït-Sahalia [1].
The data are displayed in Figure 1(a), which indicates a strong dependence as the points scatter within a narrow band around the 45-degree line. There was increased volatility when the rate was larger than 12%. The model-implied transitional densities under the above five diffusion models are displayed in the other panels of Figure 1 using the MLEs given in Aït-Sahalia [2], which were also used in the formulation of the proposed test statistic. Figure 1 shows that the densities implied by the inverse CIR, the CEV and the nonlinear drift models were similar to each other, and were quite different from those of the Vasicek and CIR models. The bandwidths prescribed by the Scott rule and the CV for the kernel estimation were respectively $h_{ref} = 0.007616$ and $h_{cv} = 0.00129$. Plotting the density surfaces indicated that a reasonable range for $h$ was from 0.007 to 0.02, which offered a range of smoothness from slight undersmoothing to slight oversmoothing. This led to a bandwidth set consisting of $J = 7$ bandwidths with $h_1 = 0.007$, $h_J = 0.020$ and $a = 0.8434$.
Kernel transitional density estimates and the smoothed model-implied transitional densities for the five models are plotted in Figure 2 for $h = 0.007$. By comparing Figure 2 with Figure 1, we notice the effect of kernel smoothing on these model-implied densities. In formulating the final test statistic $L_n$, we chose
(7.4) $$N(h) = \frac{1}{n}\sum_{t=1}^{n}\ell\{\tilde{p}_{\tilde{\theta}}(X_{t+1}|X_t)\}\omega_1(X_t, X_{t+1}),$$
where $\omega_1$ is a uniform weight over the region obtained by rotating $[0.005, 0.4]\times[-0.03, 0.03]$ 45 degrees anti-clockwise. The region contains all the data values of the pairs $(X_t, X_{t+1})$. As seen from (7.4), $N(h)$ is asymptotically equivalent to the statistic defined in (3.8) with $\omega(x, y) = p(x, y)\omega_1(x, y)$.
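Evaluated on the data pairs, (7.4) is a weighted sample average; a sketch, taking $\omega_1$ as the uniform density on the rotated region and treating the local EL ratio and the region indicator as assumed inputs.

```python
import numpy as np

def N_data_average(X, h, local_el_ratio, in_S1, area_S1):
    """Data-averaged statistic N(h) of (7.4), taking omega_1 = 1{S_1}/|S_1|.

    local_el_ratio(x, y, h) and in_S1(x, y) are assumed inputs: the local EL
    ratio of Section 3 and the indicator of the rotated region, respectively.
    """
    n = len(X) - 1
    total = sum(local_el_ratio(x, y, h) for x, y in zip(X[:-1], X[1:]) if in_S1(x, y))
    return total / (n * area_S1)
```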
The p-values of the proposed test as well as the tests based on a single bandwidth for each of the five diffusion models are reported in Table 7; they were obtained based on 500 bootstrap resamples. The table shows little empirical support for the Vasicek model and quite weak support for the CIR model. What was surprising was that there was some empirical support for the inverse CIR, the CEV and the nonlinear drift models. In particular, for the CEV and the nonlinear drift models, the p-values of the single bandwidth-based tests were all quite supportive even for small bandwidths. Indeed, by looking at Figure 2, we see quite noticeable agreement between the nonparametric kernel density estimates and the smoothed densities implied by the CEV and nonlinear drift models.
8. Conclusion. The proposed test shares some similar features with the test proposed in Hong and Li [30]. For instance, both are applicable to testing continuous-time and discrete-time Markov processes by focusing on the specification of the transitional density. An advantage of Hong and Li's test is its better handling of nonstationary processes. The proposed test is based on a direct comparison between the kernel estimate and the smoothed model-implied transitional density, whereas Hong and Li's test is an indirect comparison after the probability integral transformation. An advantage of the direct approach is its robustness against poor quality parameter estimation, which is often the case for weakly mean-reverting diffusion models. This is because both the shape and the orientation of the transitional density are much less affected by poor quality parameter estimation. Another aspect is that Hong and Li's test is based on asymptotic normality and can be under the influence of slow convergence despite the transformed series being asymptotically independent. Indeed, our simulation showed that it is necessary to implement the bootstrap procedure for Hong and Li's test. The last and most important aspect is that a test based on the conditional distribution transformation tends to reduce the power compared with a direct test based on the transitional density. This has been indicated by our simulation study as well as that of Aït-Sahalia, Fan and Peng [5]. Interested readers may consult Aït-Sahalia, Fan and Peng [5] for more insights.
Our proposed test is formulated for univariate diffusion processes. An extension to multivariate diffusion processes can be made by replacing the univariate kernel smoothing in estimating the transitional density with multivariate kernel transitional density estimation. There is no substantial difference in the formulation of the EL test statistic and the parametric bootstrap procedure. If the exact form of the transitional density is unknown, which is more likely for multivariate diffusion processes, the approximate transitional density expansion of Aït-Sahalia and Kimmel [6] is needed.
APPENDIX
As the Lagrange multiplier $\lambda(x, y)$ implicitly depends on $h$, we first need to extend the convergence rate for the single-$h$ based $\sup_{(x,y)\in S}\lambda(x, y)$ conveyed in (3.6) to be valid uniformly over $H$. To prove Theorem 1, we need the following lemmas.
LEMMA A.1. Under Assumptions 1-4, $\max_{h\in H}\sup_{(x,y)\in S}\lambda(x, y) = o_p\{n^{-1/4}\log(n)\}$.
PROOF. For any $\delta > 0$,
$$P\left(\max_{h\in H}\sup_{(x,y)\in S}h\lambda(x, y) \ge \delta n^{-1/2}\log(n)\right) \le \sum_{h\in H}P\left(\sup_{(x,y)\in S}h\lambda(x, y) \ge \delta n^{-1/2}\log(n)\right).$$
As the number of bandwidths in $H$ is finite, by checking the relevant derivations in Chen, Härdle and Li [12], it can be shown that
$$\sum_{h\in H}P\left(\sup_{(x,y)\in S}h\lambda(x, y) \ge \delta n^{-1/2}\log(n)\right) \to 0$$
as $n \to \infty$. This implies that $\max_{h\in H}\sup_{(x,y)\in S}h\lambda(x, y) = o_p\{n^{-1/2}\log(n)\}$. The lemma is then established by noting that $h_1$, the smallest bandwidth in $H$, is of order $n^{-\gamma_1}$ where $\gamma_1 \in (1/7, 1/4)$ as assumed in Assumption 1.
Before introducing the next lemma, we present some expansions for the EL test statistic $N(h)$. Let
$$\tilde{p}_\theta(x, y) = \tilde{p}_\theta(y|x)\hat{\pi}(x) \quad \text{and} \quad \tilde{p}(x, y) = n^{-1}\sum_{t=1}^{n+1}K_h(x - X_t)\sum_{s=1}^{n+1}w_s(y)p(X_s|X_t)$$
be the kernel smoothed versions of the parametric and nonparametric joint densities $p_\theta(x, y)$ and $p(x, y)$, respectively. Due to the relationship between transitional and joint densities,
(A.1)
$$N(h) = (nh^2)\int\int\frac{\{\hat{p}(x, y) - \tilde{p}_{\tilde{\theta}}(x, y)\}^2}{R^2(K)p(x, y)}\omega(x, y)\,dxdy + \tilde{O}_p\{h^2 + (nh^2)^{-1/2}\log^3(n)\}$$
$$= (nh^2)R^{-2}(K)\int\int\left[\frac{\{\hat{p}(x, y) - \tilde{p}(x, y)\}^2}{p(x, y)} + \frac{2\{\hat{p}(x, y) - \tilde{p}(x, y)\}\{\tilde{p}(x, y) - \tilde{p}_{\tilde{\theta}}(x, y)\}}{p(x, y)} + \frac{\{\tilde{p}(x, y) - \tilde{p}_{\tilde{\theta}}(x, y)\}^2}{p(x, y)}\right]\omega(x, y)\,dxdy + \tilde{O}_p\{h^2 + (nh^2)^{-1/2}\log^3(n)\}$$
$$=: N_1(h) + N_{2\tilde{\theta}}(h) + N_{3\tilde{\theta}}(h) + \tilde{O}_p\{h^2 + (nh^2)^{-1/2}\log^3(n)\}.$$
Here and throughout the proofs, $\tilde{o}(\delta_n)$ and $\tilde{O}(\delta_n)$ denote stochastic quantities which are respectively $o(\delta_n)$ and $O(\delta_n)$ uniformly over $S$ for a nonnegative sequence $\{\delta_n\}$.
Using Assumptions 3 and 4, we have $N_{l\tilde{\theta}}(h) = N_{l\theta^*}(h) + \tilde{o}_p(h)$, where $\theta^* = \theta_0$ under $H_0$ and $\theta^* = \theta_1$ under $H_1$. Thus,
$$N(h) = N_1(h) + N_{2\theta^*}(h) + N_{3\theta^*}(h) + \tilde{o}_p(h) + \tilde{O}_p\{(nh^2)^{-1/2}\log^3(n)\}.$$
We start with some lemmas on $\hat{p}(x, y)$, $\tilde{p}(x, y)$ and $\tilde{p}_\theta(x, y)$. Let $K^{(2)}$ be the convolution of $K$, $MK^{(2)}(t) = \int uK(u)K(t + u)\,du$, and let $p_3(x, y, z)$ be the joint density of $(X_t, X_{t+1}, X_{t+2})$.
LEMMA A.2. Under Assumptions 1-4,
$$\mathrm{Cov}\{\hat{p}(s_1, t_1), \hat{p}(s_2, t_2)\} = \frac{K^{(2)}\left(\frac{s_2-s_1}{h}\right)K^{(2)}\left(\frac{t_2-t_1}{h}\right)p(s_1, t_1)}{nh^2} - \frac{K^{(2)}\left(\frac{s_2-s_1}{h}\right)MK^{(2)}\left(\frac{t_2-t_1}{h}\right)\frac{\partial p(s_1,t_1)}{\partial y} + MK^{(2)}\left(\frac{s_2-s_1}{h}\right)K^{(2)}\left(\frac{t_2-t_1}{h}\right)\frac{\partial p(s_1,t_1)}{\partial x}}{nh} + \frac{p_3(s_1, t_1, t_2)K^{(2)}\left(\frac{s_2-t_1}{h}\right) + p_3(s_2, t_2, t_1)K^{(2)}\left(\frac{s_1-t_2}{h}\right)}{nh} + o\{(nh)^{-1}\}.$$
PROOF. Let $Z_t(s, t) = K_h(s - X_t)K_h(t - X_{t+1})$, so that $\hat{p}(s, t) = n^{-1}\sum_{t=1}^{n}Z_t(s, t)$ and
(A.2)
$$\mathrm{Cov}\{\hat{p}(s_1, t_1), \hat{p}(s_2, t_2)\} = n^{-1}\mathrm{Cov}\{Z_1(s_1, t_1), Z_1(s_2, t_2)\} + \frac{1 - n^{-1}}{n}\mathrm{Cov}\{Z_1(s_1, t_1), Z_2(s_2, t_2)\} + \frac{1 - n^{-1}}{n}\mathrm{Cov}\{Z_2(s_1, t_1), Z_1(s_2, t_2)\} + Q_n,$$
where
$$Q_n = n^{-1}\sum_{l=2}^{n-1}(1 - ln^{-1})\left[\mathrm{Cov}\{Z_1(s_1, t_1), Z_{l+1}(s_2, t_2)\} + \mathrm{Cov}\{Z_{l+1}(s_1, t_1), Z_1(s_2, t_2)\}\right].$$
Standard derivations show that
(A.3)
$$\mathrm{Cov}\{Z_1(s_1, t_1), Z_1(s_2, t_2)\} = \frac{1}{h^2}K^{(2)}\left(\frac{s_2-s_1}{h}\right)K^{(2)}\left(\frac{t_2-t_1}{h}\right)p(s_1, t_1) - \frac{1}{h}\left\{K^{(2)}\left(\frac{s_2-s_1}{h}\right)MK^{(2)}\left(\frac{t_2-t_1}{h}\right)\frac{\partial p(s_1,t_1)}{\partial y} + MK^{(2)}\left(\frac{s_2-s_1}{h}\right)K^{(2)}\left(\frac{t_2-t_1}{h}\right)\frac{\partial p(s_1,t_1)}{\partial x}\right\} + \tilde{O}(1) \quad \text{and}$$
$$\mathrm{Cov}\{Z_1(s_1, t_1), Z_2(s_2, t_2)\} = \frac{1}{h}p_3(s_1, t_1, t_2)K^{(2)}\left(\frac{s_2-t_1}{h}\right) + \tilde{O}(1).$$
Applying the Davydov inequality for $\alpha$-mixing sequences (Fan and Gijbels [16], p. 251), it can be shown that $Q_n = \tilde{o}\{(nh)^{-1}\}$. This together with (A.2)-(A.3) leads to the lemma.
LEMMA A.3. Suppose that Assumptions 1-4 hold. Let $\Delta_\theta(x, y) = \{p_\theta(y|x) - p(y|x)\}\pi(x)$. Then
(A.4) $$E\{\tilde{p}_\theta(x, y) - \hat{p}(x, y)\} = \Delta_\theta(x, y) + \frac{1}{2}h^2\sigma_K^2\left\{\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right\}\Delta_\theta(x, y) + \tilde{O}(h^3),$$
(A.5) $$E\{\tilde{p}_\theta(x, y) - \tilde{p}(x, y)\} = \Delta_\theta(x, y) + \frac{1}{2}h^2\sigma_K^2\left\{\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right\}\Delta_\theta(x, y) + \tilde{O}(h^3),$$
(A.6) $$\mathrm{Cov}\{\tilde{p}(s_1, t_1), \tilde{p}(s_2, t_2)\} = K^{(2)}\left(\frac{t_2 - t_1}{h}\right)\frac{p(s_1, t_1)p(s_2, t_1)}{nh\pi(t_2)} + \tilde{o}\{(nh)^{-1}\}.$$
PROOF. As $\tilde{p}_\theta(x, y) = n^{-1}\sum_{t=1}^{n+1}K_h(x - X_t)\sum_{s=1}^{n+1}w_s(y)p_\theta(X_s|X_t)$, it follows from the bias of the local linear regression (Fan and Gijbels [16]) and of kernel density estimators, together with the fact that the transitional density has uniformly bounded third derivatives by Assumption 3, that
$$E\{\tilde{p}_\theta(x, y)\} = p_\theta(y|x)\pi(x) + \frac{1}{2}h^2\sigma_K^2\left\{\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right\}p_\theta(y|x)\pi(x) + \tilde{O}(h^3).$$
Then employing the same kind of derivation on $E\{\hat{p}(x, y)\}$, we readily establish (A.4).
To derive (A.6), we note the following expansion for $\tilde{p}(x, y)$ based on the notion of the equivalent kernel for the local linear estimator (Fan and Gijbels [16]):
(A.7) $$\tilde{p}(x, y) = n^{-1}\sum_{t=1}^{n+1}K_h(x - X_t)\sum_{s=1}^{n+1}w_s(y)p(X_s|X_t) = n^{-1}\sum_{t=1}^{n+1}K_h(x - X_t)\sum_{s=1}^{n+1}\pi^{-1}(y)K_h(y - X_s)p(X_s|X_t)\{1 + \tilde{o}_p(h)\} = \pi(x)\pi^{-1}(y)\sum_{s=1}^{n+1}K_h(y - X_s)p(X_s|x)\{1 + \tilde{o}_p(h)\}.$$
Then derivations similar to those used in Lemma A.2 readily establish (A.6).
LEMMA A.4. Under Assumptions 1-4, we have
$$\mathrm{Cov}\{\hat{p}(s_1, t_1), \tilde{p}(s_2, t_2)\} = \frac{p(s_1, t_1)}{nh\pi(t_2)}K^{(2)}\left(\frac{t_2 - s_1}{h}\right)p(s_2, s_1) + \frac{p(s_1, t_1)}{nh\pi(t_2)}K^{(2)}\left(\frac{t_2 - t_1}{h}\right)p(s_2, t_1).$$
PROOF. From (A.7),
$$\mathrm{Cov}\{\hat{p}(s_1, t_1), \tilde{p}(s_2, t_2)\} = \frac{\pi(s_2)}{n\pi(t_2)}\Big[\mathrm{Cov}\{K_h(s_1 - X_1)K_h(t_1 - X_2), K_h(t_2 - X_1)p(X_1|s_2)\} + \mathrm{Cov}\{K_h(s_1 - X_1)K_h(t_1 - X_2), K_h(t_2 - X_2)p(X_2|s_2)\}\Big]\{1 + \tilde{o}(1)\}$$
$$= \frac{\pi(s_2)}{nh\pi(t_2)}\left\{K^{(2)}\left(\frac{t_2 - s_1}{h}\right)p(s_1|s_2)p(s_1, t_1) + K^{(2)}\left(\frac{t_2 - t_1}{h}\right)p(t_1|s_2)p(s_1, t_1)\right\}\{1 + \tilde{o}(1)\},$$
which leads to the statement of the lemma.
LEMMA A.5. If $H_0$ is true, then $N_{2\theta^*}(h) = N_{3\theta^*}(h) = 0$ for all $h \in H$.
PROOF. Under $H_0$, $p(y|x) = p_{\theta_0}(y|x)$ and $\theta^* = \theta_0$. Hence,
$$\tilde{p}(x, y) - \tilde{p}_{\theta^*}(x, y) = n^{-1}\sum_t K_h(x - X_t)\sum_s w_s(y)\{p(X_s|X_t) - p_{\theta_0}(X_s|X_t)\} = 0.$$
Let us now study the leading term $N_1(h)$. From (A.1), and suppressing the variables of integration,
$$N_1(h) = \frac{(nh^2)}{R^2(K)}\int\int\frac{\{\hat{p} - \tilde{p}\}^2}{p}\,\omega = \frac{(nh^2)}{R^2(K)}\int\int\left[\frac{\{\hat{p} - E\hat{p}\}^2}{p} + \frac{\{E\tilde{p} - \tilde{p}\}^2}{p} + \frac{\{E\hat{p} - E\tilde{p}\}^2}{p} + \frac{2\{\hat{p} - E\hat{p}\}\{E\hat{p} - E\tilde{p}\}}{p} + \frac{2\{\hat{p} - E\hat{p}\}\{E\tilde{p} - \tilde{p}\}}{p} + \frac{2\{E\hat{p} - E\tilde{p}\}\{E\tilde{p} - \tilde{p}\}}{p}\right]\omega =: \sum_{j=1}^{6}N_{1j}(h).$$
We are to show in the following lemmas that $N_{11}(h)$ dominates $N_1(h)$ and that the $N_{1j}(h)$ for $j \ge 2$ are all negligible, except $N_{12}(h)$, which contributes to the mean of $N_1(h)$ in the second order.
LEMMA A.6. Under Assumptions 1-4, uniformly with respect to $h \in H$,
(A.8) $$h^{-1}E\{N_{11}(h) - 1\} = o(1),$$
(A.9) $$\mathrm{Var}\{h^{-1}N_{11}(h)\} = \frac{2K^{(4)}(0)}{R^4(K)}\int\int\omega^2(x, y)\,dxdy + o(1),$$
(A.10) $$\mathrm{Cov}\{h_1^{-1}N_{11}(h_1), h_2^{-1}N_{11}(h_2)\} = \frac{2\nu(h_1/h_2)}{R^4(K)}\int\int\omega^2(x, y)\,dxdy + o(1).$$
PROOF. From Lemma A.1 and the fact that $MK^{(2)}(0) = 0$,
$$E\{N_{11}(h)\} = \int\int\frac{nh^2}{R^2(K)}\frac{\mathrm{Var}\{\hat{p}(x, y)\}}{p(x, y)}\omega(x, y)\,dxdy$$
(A.11) $$= \frac{1}{R^2(K)}\int\int\left[\{K^{(2)}(0)\}^2 + 2hK^{(2)}\left(\frac{y - x}{h}\right)\frac{p_3(x, y, y)}{p(x, y)}\right]\omega(x, y)\,dxdy\,\{1 + o(1)\} = 1 + O(h^2),$$
which leads to (A.8). To derive (A.9), let
$$\hat{Z}_n(s, t) = (nh^2)^{1/2}\frac{\hat{p}(s, t) - E\hat{p}(s, t)}{R(K)p^{1/2}(s, t)}.$$
It may be shown from the fact that $K$ is bounded and the other regularity conditions assumed that $E\{|\hat{Z}_n(s_1, t_1)|^{2+\epsilon}|\hat{Z}_n(s_2, t_2)|^{2+\epsilon}\} \le M$ for some positive $\epsilon$ and $M$, and hence $\{\hat{Z}_n(s, t)\}_{n\ge 1}$ and $\{\hat{Z}_n^2(s_1, t_1)\hat{Z}_n^2(s_2, t_2)\}_{n\ge 1}$ are uniformly integrable, respectively. Also,
$$\left(\hat{Z}_n(s_1, t_1), \hat{Z}_n(s_2, t_2)\right)^T \stackrel{d}{\to} (Z(s_1, t_1), Z(s_2, t_2))^T,$$
which is a bivariate normal process with mean zero and covariance
$$\Sigma = \begin{pmatrix} 1 & g\{(s_1, t_1), (s_2, t_2)\} \\ g\{(s_1, t_1), (s_2, t_2)\} & 1 \end{pmatrix},$$
where $g\{(s_1, t_1), (s_2, t_2)\} = K^{(2)}\left(\frac{s_2 - s_1}{h}\right)K^{(2)}\left(\frac{t_2 - t_1}{h}\right)\frac{p^{1/2}(s_1, t_1)}{R^2(K)p^{1/2}(s_2, t_2)}$. Hence, by ignoring smaller order terms,
$$\mathrm{Var}\{N_{11}(h)\} = \int\int\int\int\mathrm{Cov}\{\hat{Z}_n^2(s_1, t_1), \hat{Z}_n^2(s_2, t_2)\}\omega(s_1, t_1)\omega(s_2, t_2)\,ds_1dt_1ds_2dt_2$$
$$= \int\int\int\int\mathrm{Cov}\{Z^2(s_1, t_1), Z^2(s_2, t_2)\}\omega(s_1, t_1)\omega(s_2, t_2)\,ds_1dt_1ds_2dt_2$$
$$= 2\int\int\int\int\mathrm{Cov}^2\{Z(s_1, t_1), Z(s_2, t_2)\}\omega(s_1, t_1)\omega(s_2, t_2)\,ds_1dt_1ds_2dt_2$$
$$= \frac{2}{R^4(K)}\int\int\int\int\left\{K^{(2)}\left(\frac{s_2 - s_1}{h}\right)\right\}^2\left\{K^{(2)}\left(\frac{t_2 - t_1}{h}\right)\right\}^2\frac{p(s_1, t_1)}{R^2(K)p(s_2, t_2)}\omega(s_1, t_1)\omega(s_2, t_2)\,ds_1dt_1ds_2dt_2$$
(A.12) $$= \frac{2h^2K^{(4)}(0)}{R^4(K)}\int\int\omega^2(s, t)\pi^{-2}(t)\,dsdt.$$
In the third equation above, we use a fact regarding the fourth product moments of normal random variables. Combining (A.11) and (A.12), (A.8) and (A.9) are derived. It can be checked that this is valid uniformly for all $h \in H$.
The proof for (A.10) follows from that for (A.9).
LEMMA A.7. Under Assumptions 1-4, uniformly with respect to $h \in H$,
(A.13) $$h^{-1}N_{12}(h) = \int\int\frac{1}{R(K)}\frac{p(x, y)}{\pi(y)}\omega(x, y)\,dxdy + o_p(1),$$
(A.14) $$h^{-1}N_{1j}(h) = o_p(1) \quad \text{for } j \ge 3.$$
PROOF. From (A.6) and the fact that $K^{(2)}(0) = R(K)$,
(A.15) $$E\{N_{12}(h)\} = \frac{(nh^2)}{R^2(K)}\int\int\frac{\mathrm{Var}\{\tilde{p}\}}{p}\,\omega = h\int\int\frac{1}{R(K)}\frac{p(x, y)}{\pi(y)}\omega(x, y)\,dxdy\,\{1 + \tilde{o}(1)\}.$$
Let $\tilde{Z}_n(s, t) = (nh)^{1/2}\frac{\tilde{p}(s, t) - E\tilde{p}(s, t)}{R^{1/2}(K)p(s, t)}$. Using the same approach that derives (A.12) and from Lemma A.2,
(A.16) $$\mathrm{Var}\{N_{12}(h)\} = \frac{h^2}{R^2(K)}\int\int\int\int\mathrm{Cov}\{\tilde{Z}_n^2(s_1, t_1), \tilde{Z}_n^2(s_2, t_2)\}\omega(s_1, t_1)\omega(s_2, t_2)\,ds_1dt_1ds_2dt_2 = \frac{2h^2}{R^4(K)}\int\int\int\int\left\{K^{(2)}\left(\frac{t_2 - t_1}{h}\right)\right\}^2\pi^{-2}(t_2)\omega(s_1, t_1)\omega(s_2, t_2)\,ds_1dt_1ds_2dt_2 = \frac{2h^4K^{(4)}(0)}{R^4(K)}\int\int\omega^2(s, t)\pi^{-2}(t)\,dsdt$$
up to a factor $\{1 + \tilde{o}(1)\}$. Combining (A.15) and (A.16), (A.13) is derived. It can be checked that it is valid uniformly for all $h \in H$.
As $\{E\hat{p}(x, y) - E\tilde{p}(x, y)\}^2 = \tilde{O}(h^6)$ and $h = o(n^{-1/7})$ for all $h \in H$, we have $(nh^2)h^6 = o_p(h)$ uniformly for all $h \in H$ and hence (A.14) for $j = 3$.
Obviously $E\{N_{14}(h)\} = 0$. Using the same method as for $\mathrm{Var}\{N_{12}(h)\}$ and from Lemmas A.1 and A.2,
$$\mathrm{Var}\{N_{14}(h)\} = \frac{4(nh^2)^2}{R^4(K)}\int\int\int\int\frac{E\{\hat{p} - \tilde{p}\}(s_1, t_1)E\{\hat{p} - \tilde{p}\}(s_2, t_2)}{p(s_1, t_1)p(s_2, t_2)}\mathrm{Cov}\{\hat{p}(s_1, t_1), \hat{p}(s_2, t_2)\}\omega(s_1, t_1)\omega(s_2, t_2)\,ds_1dt_1ds_2dt_2.$$
As $E\{\hat{p}(s_1, t_1) - \tilde{p}(s_1, t_1)\}E\{\hat{p}(s_2, t_2) - \tilde{p}(s_2, t_2)\} = O(h^6)$ and the integral over the covariance produces an additional factor of $h^2$ in addition to $(nh^2)^{-1}$, we have $\mathrm{Var}\{N_{14}(h)\} = O\{(nh^2)h^6h^2\} = o(h^2)$ uniformly for all $h \in H$. This means that $N_{14}(h) = o_p(h)$ uniformly for all $h \in H$. Using exactly the same derivation, but employing Lemma A.3 instead of Lemma A.1, we have $N_{16}(h) = o_p(h)$ uniformly for $h \in H$.
It remains to study $N_{15}(h)$. From Lemma A.3,
$$E\{N_{15}(h)\} = -\frac{2(nh^2)}{R^2(K)}\int\int\frac{\mathrm{Cov}\{\hat{p}(x, y), \tilde{p}(x, y)\}}{p(x, y)}\omega(x, y)\,dxdy = -\frac{4h}{R^2(K)}\int\int K^{(2)}\left(\frac{y - x}{h}\right)\frac{p(x, y)}{\pi(y)}\omega(x, y)\,dxdy = -\frac{4h^2}{R^2(K)}\int K^{(2)}(u)\,du\int\pi(y)\omega(y, y)\,dy\,\{1 + \tilde{o}(1)\},$$
using Assumption 1(iv). It may be shown using the same method that derives (A.16) that $\mathrm{Var}\{N_{15}(h)\} = o(h^2)$ and hence $N_{15}(h) = o_p(h)$. The uniformity with respect to $h \in H$ can be checked.
Let $L(h) = \frac{1}{\sqrt{2}h}\{N(h) - 1\}$ and $\beta = \frac{1}{\sqrt{2}R(K)}\int\int\frac{p(x, y)}{\pi(y)}\omega(x, y)\,dxdy$. In view of Lemmas A.5, A.6 and A.7, we have under $H_0$, uniformly with respect to $h \in H$,
(A.17) $$L(h) = \frac{1}{\sqrt{2}h}\{N_{11}(h) - 1\} + \beta + o_p(1).$$
Define $L_1(h) = \frac{1}{\sqrt{2}h}\{N_{11}(h) - 1\}$.
LEMMA A.8. Under Assumptions 1-4 and $H_0$, as $n \to \infty$,
$$(L_1(h_1), \cdots, L_1(h_J))^T \stackrel{d}{\to} N_J(\beta 1_J, \Sigma_J).$$
PROOF. According to the Cramér-Wold device, it suffices to show
(A.18) $$\sum_{i=1}^{J}c_iL_1(h_i) \stackrel{d}{\to} N(c^\tau\beta 1_J, c^\tau\Sigma_J c)$$
for an arbitrary vector of constants $c = (c_1, \cdots, c_J)^\tau$. Without loss of generality, we will only prove the case of $J = 2$. To apply Lemma A.1 of Gao and King [23], we introduce the following notation. For $i = 1, 2$, define $d_i = \frac{c_i}{\sqrt{2}h_i}$ and $\xi_t = (X_t, X_{t+1})$,
$$\epsilon_{ti}(x, y) = K\left(\frac{x - X_t}{h_i}\right)K\left(\frac{y - X_{t+1}}{h_i}\right) - E\left[K\left(\frac{x - X_t}{h_i}\right)K\left(\frac{y - X_{t+1}}{h_i}\right)\right],$$
$$\phi_i(\xi_s, \xi_t) = \frac{1}{nh_i^2}\int\int\frac{\epsilon_{si}(x, y)\epsilon_{ti}(x, y)}{p(x, y)R^2(K)}\omega(x, y)\,dxdy,$$
$$\phi_{st} = \phi(\xi_s, \xi_t) = \sum_{i=1}^{2}d_i\phi_i(\xi_s, \xi_t) \quad \text{and} \quad L_1(h_1, h_2) = \sum_{t=2}^{T}\sum_{s=1}^{t-1}\phi_{st}.$$
It is noted that for any given $s, t \ge 1$ and fixed $x$ and $y$, $E[\phi(x, \xi_t)] = E[\phi(\xi_s, y)] = 0$. It suffices to verify
(A.19) $$\max\{M_n, N_n\}h_1^{-2} \to 0 \quad \text{as } n \to \infty,$$
where
2
1
1+δ
1
2(1+δ)
2
Mn = max n Mn1 , n Mn51
3
2
1
2(1+δ)
Nn = max n Mn21
3
2
2
1
2(1+δ)
, n Mn52
1
2(1+δ)
, n Mn22
3
2
1
2
2
1
2
, n Mn6
3
2
1
2(1+δ)
, n Mn3 , n Mn4
3
2
1
1+δ
, n Mn7
,
in which
Z
max E|φik ψjk |1+δ , |φik θjk |1+δ dP (ξi )dP (ξj , ξk ) ,
1≤i<j<k≤n
Z
=
max max E|φik φjk |2(1+δ) , |φik φjk |2(1+δ) dP (ξi )dP (ξj , ξk ) ,
1≤i<j<k≤n
Z
=
max max
|φik φjk |2(1+δ) dP (ξi , ξj )dP (ξk ),
1≤i<j<k≤n
Z
|φik φjk |2(1+δ) dP (ξi )dP (ξj )dP (ξk ) ,
Mn1
=
Mn21
Mn22
Mn3
=
max
max
1≤i<j<k≤n
E|φik φjk |2 ,
Z
max |φ1i φjk |2(1+δ) dP ,
max
P
1 < i, j, k ≤ 2n; i, j, k distinct
( Z
2(1+δ) )
=
max max E φik φjk φik φjk dP (ξi )
,
1≤i<j<k≤n
(Z Z
)
2(1+δ)
=
max max
dP (ξj )dP (ξk ) ,
φik φjk φik φjk dP (ξi )
1≤i<j<k≤n
Mn4
=
Mn51
Mn52
Mn6
=
max
1≤i<j<k≤n
Z
2
E φik φjk dP (ξi ) ,
Mn7 =
i
h
1+δ
max E |φij |
1≤i<j<n
where the maximization in Mn4 is over the probability measures P (ξ1 , ξi , ξj , ξk ),
P (ξ1 )P (ξi , ξj , ξk ), P (ξ1 )P (ξi1 )P (ξi2 , ξi3 ), and P (ξ1 )P (ξi )P (ξj )P (ξk ).
Without confusion, we replace $h_1$ by $h$ for simplicity. To verify the $M_n$ part of (A.19), we verify only
(A.20) $$\lim_{n\to\infty}n^2h^{-2}M_{n1}^{\frac{1}{1+\delta}} = 0.$$
Let $q(x, y) = \omega(x, y)p^{-1}(x, y)$ and
$$\psi_{ij} = \frac{1}{nh^2}\int K\left(\frac{x - X_i}{h}\right)K\left(\frac{y - X_{i+1}}{h}\right)K\left(\frac{x - X_j}{h}\right)K\left(\frac{y - X_{j+1}}{h}\right)q(x, y)\,dxdy$$
for $1 \le i < j < k \le n$. Direct calculation implies
ψik ψjk
y − Xi+1
x − Xk
K
×
h
h
y − Xk+1
u − Xj
v − Xj+1
u − Xk
K
q(x, y)K
K
K
h
h
h
h
v − Xk+1
q(u, v)dxdydudv = bijk + δijk
× K
h
= (nh2 )−2
Z
K
x − Xi
h
K
where δijk = ψik ψjk − bijk and
bijk = n−2 q(Xi , Xi+1 )q(Xj , Xj+1 ) K (2)
× K (2)
Xj − Xk
h
K (2)
Xi − Xk
h
Xi+1 − Xk+1
Xj+1 − Xk+1
K (2)
.
h
h
For any given 1 < ζ < 2 and n sufficiently large, we may show that
h
Mn11 = E |ψij ψik |ζ
h
i
i
h
≤ 2 E |bijk |ζ + E |δijk |ζ
=
×
i
h
i
= 2E |bijk |ζ (1 + o(1))
u − z ζ
x−z
K (2)
n
|q(x, y)q(u, v)| K (2)
h
h
ζ
(2) y − w
v − w K
K (2)
p(x, y, u, v, z, w)dxdydudvdzdw
h
h
−2ζ
Z Z Z
ζ
(A.21) = C1 n−2ζ h4
where p(x, y, u, v, z, w) denotes the joint density of (Xi , Xi+1 , Xj , Xj+1 , Xk , Xk+1 )
and C1 is a constant. Thus, as n → ∞
1
(A.22)
1+δ
= Cn2 h−1 n−2ζ h2
n2 h−2 Mn11
1/ζ
2
=h
(2−ζ)
ζ
→ 0.
Hence, (A.22) shows that (A.20) holds for the first part of Mn1 . The proof
for the second part of Mn1 follows similarly.
Proof of Theorem 1: From (A.17) and Lemma A.8, we have under $H_0$,
$$(L(h_1), \cdots, L(h_J)) \stackrel{d}{\to} N_J(\beta 1_J, \Sigma_J).$$
Let $Z = (Z_1, \cdots, Z_J)^T \sim N_J(\beta 1_J, \Sigma_J)$. By the mapping theorem, under $H_0$,
(A.23) $$L_n = \max_{h\in H}L(h) \stackrel{d}{\to} \max_{1\le k\le J}Z_k.$$
Hence the theorem is established.
Let $l_{0\alpha}$ be the upper-$\alpha$ quantile of $\max_{1\le i\le J}Z_i$. As the distribution of $N_J(\beta 1_J, \Sigma_J)$ is free of $n$, so is that of $\max_{1\le i\le J}Z_i$, and hence $l_{0\alpha}$ is a fixed quantity with respect to $n$.
The following lemmas are required for the proof of Theorem 3.
LEMMA A.9. Under Assumptions 1-4, for constants C and γ ∈ (1/3, 1/2),
ℓ{p̃(y|x) + C(nh2 )−γ } → ∞ in probability uniformly for (x, y) ∈ S.
PROOF. Let µ̃(x, y) = p̃(y|x)+C(nh2 )−γ and Qt,h (x, y) = wnw , t(x)Kh (y−
Xt+1 ). Recall from the EL formulation given in Section 3 that ℓ{µ̃(x, y)} =
P
2 nt=1 log[1 + λ(x, y){Qt,h (x, y) − µ̃(x, y)}] where, according to (3.4), λ(x, y)
satisfies
0 =
Note that
(A.24)
λ(x, y)
n
X
t=1
X
Qt,h (x, y) − µ̃(x, y)
.
1 + λ(x, y){Qt,h (x, y) − µ̃(x, y)}
n
X
{Qt,h (x, y) − µ̃(x, y)}2
=
{Qt,h (x, y) − µ̃(x, y)}.
1 + λ(x, y){Qt,h (x, y) − µ̃(x, y)} t=1
Let Sh2 (x, y) = n−1 nt=1 {Qt,h (x, y) − µ̃(x, y)}2 . From established results on
the kernel estimator for α-mixing sequences,
P
(A.25)
Sh2 (x, y) = h−2 R2 (K)p(y|x)p−1 (x) + Op {(nh2 )−2γ } + op (h−2 ).
Note that for a positive constant M0 and sufficiently large n sup(x,y)∈S |Qt,h (x, y)−
µ̃(x, y)| ≤ h−2 M0 with probability one. Hence, (A.24) implies that
|λ(x, y)Sh2 (x, y)| ≤ {1 + |λ(x, y)|h−2 M0 }|n−1
n
X
t=1
{Qt,h (x, y) − µ̃(x, y)}|.
This, along with (A.25) and the facts that
n−1
n
X
t=1
{Qt,h (x, y) − p̃(x, y)} = Õp {(nh2 )−1/2 log(n)}
and µ̃(x, y) − p̃(y|x) = C(nh2 )−γ , implies
(A.26)
λ(x, y) = Õp {h2 (nh2 )−γ }
uniformly with respect to (x, y) ∈ S. The rate of λ(x, y) established in (A.26)
allows us to carry out Taylor expansion in (A.24) and obtain
(A.27) λ(x, y) = Sh−2 (x, y)n−1
n
X
t=1
{Qt,h (x, y) − µ̃(x, y)} + Õp {h4 (nh2 )−2γ }.
At the same time, as Xt s are continuous random variable, by applying the
blocking technique and Davydov inequality, it can be shown that for any
η > 0 and (x, y) ∈ S,
P (min |Qt,h (x, y) − µ̃(x, y)| > η) → 0
which, using (A.24), implies
|λ(x, y)| ≥ Ch2 (nh2 )−γ {1 + õp (1)}.
(A.28)
From (A.27) and (A.28),
ℓ{µ̃(x, y)}
=
2
n
X
t=1
=
(A.29)
+
=
log[1 + λ(x, y){Qt,h (x, y) − µ̃(x, y)}]
2nλ(x, y)n−1
6
n
X
t=1
2 −3γ
{Qt,h (x, y) − µ̃(x, y)} − nλ2 (x, y)Sh2 (x, y)
Õp {nh (nh )
}
n
X
nλ(x, y)n−1
{Qt,h (x, y) − µ̃(x, y)} + Õp {nh6 (nh2 )−3γ }
t=1
=
=
nλ(x, y){C(nh2 )−γ + Õp {(nh2 )−1/2 log n}} + Õp {h4 (nh2 )1−3γ }
1
Õp {(nh2 )1−2γ } + Õp {(nh2 ) 2 −γ log n} + Õp {h4 (nh2 )1−3γ }.
As γ ∈ (1/3, 1/2), ℓ{µ̃(x, y)} → ∞ in probability as n → ∞. Since both
(A.26) and (A.29) are true uniformly with respect to (x, y) ∈ S, the divergence of ℓ{µ̃(x, y)} is uniform with respect to (x, y) ∈ S.
LEMMA A.10. Under Assumptions 1-4 and H1 , for any fixed real value
x, as n → ∞, P (Ln ≥ x) → 1.
PROOF. From the established theory on EL, ℓ{µ(x, y)} is a convex function of µ(x, y), the candidate for p(y|x). From the mean–value theorem and
the facts that both ℓ(·) and ω are non-negative and ℓ is continuous in (x, y),
N (h) ≥ Cδ ℓ{p̃θ̃ (y0 |x0 )}
for some (x0 , y0 ) ∈ S and Cδ > 0.
Let µ1 (x0 , y0 ) = p̃θ̃ (y0 |x0 ). By choosing C properly and for n large enough,
we make sure that µ2 (x0 , y0 ) = p̃(y0 |x0 )+C(nh2 )−γ falls within (µ1 (y0 |x0 ), p̂(y0 |x0 ))
or (p̂(y0 |x0 ), µ1 (y0 |x0 )) depending on the order between p̂(y0 |x0 ) and µ1 (x0 , y0 ).
Hence there exists an α ∈ (0, 1) such that µ2 (x0 , y0 ) = αp̂(y0 |x0 ) + (1 −
α)µ1 (y0 |x0 ). The convexity of ℓ(·) and the fact that ℓ{p̂(y0 |x0 )} = 0 lead to
ℓ{µ2 (x0 , y0 )} ≤ (1 − α)ℓ{µ1 (x0 , y0 )}.
Since Lemma A.9 implies that ℓ{µ2 (x0 , y0 )} → ∞ holds in probability
with µ2 (x0 , y0 ) = µ(x0 , y0 ), we have as n → ∞ ℓ{p̃θ̃ (y0 |x0 )} → ∞ in probability, which implies N (h) → ∞ in probability as n → ∞. This means
L(h) → ∞. The lemma is proved by noting P (Ln ≥ x) ≥ P {L(hi ) > x} for
any i ∈ {1, . . . , J}.
We now turn to the bootstrap EL test statistic $N^*(h)$, which is a version of $N(h)$ based on $\{X_t^*\}_{t=1}^{n+1}$ generated according to the parametric transitional density $p_{\tilde{\theta}}$. Let $\hat{p}^*(x, y)$ and $\tilde{p}_\theta^*(x, y)$ be the bootstrap versions of $\hat{p}(x, y)$ and $\tilde{p}(x, y)$ respectively, and let $\tilde{\theta}^*$ be the maximum likelihood estimate based on the bootstrap sample. Then, the following expansion similar to (A.1) is valid for $N^*(h)$:
$$N^*(h) = (nh^2)\int\int\frac{\{\hat{p}^*(x, y) - \tilde{p}^*_{\tilde{\theta}^*}(x, y)\}^2}{R^2(K)p_{\tilde{\theta}}(x, y)}\omega(x, y)\,dxdy + \tilde{o}_p(h) = N_1^*(h) + N_{2\tilde{\theta}^*}^*(h) + N_{3\tilde{\theta}^*}^*(h) + \tilde{o}_p(h),$$
where $N_j^*(h)$ for $j = 1, 2$ and $3$ are the bootstrap versions of $N_j(h)$, respectively. As the bootstrap resample is generated according to $p_{\tilde{\theta}}$, the same arguments which lead to Lemma A.5 mean that $N_2^*(h) = N_3^*(h) = 0$. Thus, $N^*(h) = N_1^*(h) + \tilde{o}_p(h)$, where
$$N_1^*(h) = (nh^2)\int\int\frac{\{\hat{p}^*(x, y) - \tilde{p}^*(x, y)\}^2}{R^2(K)p_{\tilde{\theta}}(x, y)}\omega(x, y)\,dxdy.$$
Lemmas similar to Lemmas A.6 and A.7 can be established to study the $N_{1j}^*(h)$, which are the bootstrap versions of the $N_{1j}(h)$, respectively.
Proof of Theorem 2: It can be shown, by taking the same route that establishes Lemma A.8, that as $n \to \infty$ and conditioning on $\{X_t\}_{t=1}^{n+1}$, the distribution of $(L^*(h_1), \cdots, L^*(h_J))$ converges to $N_J(\beta 1_J, \Sigma_J)$ in probability, which readily implies the conclusion of the theorem.
Let $l_\alpha^*$ be the upper-$\alpha$ conditional quantile of $L_n^* = \max_{h\in H}L^*(h)$ given $\{X_t\}_{t=1}^{n+1}$.
Proof of Theorem 3: Let $l_{0\alpha}$ be the upper-$\alpha$ quantile of $\max_{1\le i\le J}Z_i$. From Theorem 2, and due to the use of the parametric bootstrap that is faithful to $H_0$,
(A.30) $$l_\alpha^* = l_{0\alpha} + o_p(1)$$
under both $H_0$ and $H_1$. As $L_n \stackrel{d}{\to} \max_{1\le i\le J}Z_i$, by the Slutsky theorem,
$$P(L_n \ge l_\alpha^*) = P(L_n + o_p(1) \ge l_{0\alpha}) \to P\left(\max_{1\le i\le J}Z_i \ge l_{0\alpha}\right) = \alpha,$$
which completes the first part of Theorem 3. The second part of Theorem 3 is a direct consequence of Lemma A.10 and (A.30).
TABLE 1
Smoothing bandwidths in the simulation of the Three Vasicek Models

              Model -2                  Model 0                   Model 2
  n        hcv        href          hcv        href          hcv        href
 125     0.0319      0.0276       0.0181      0.0161       0.0095      0.0093
        (0.005)     (0.0017)     (0.003)     (0.0014)     (0.0017)    (0.0007)
 250     0.0261      0.0247       0.0163      0.0146       0.0090      0.0080
        (0.004)     (0.0015)     (0.002)     (0.0012)     (0.0014)    (0.001)
 500     0.0231      0.0220       0.0140      0.013        0.0080      0.0073
        (0.003)     (0.0013)     (0.0017)    (0.0007)     (0.001)     (0.0008)
TABLE 2
Relative Bias (RBIAS) and Variance (VAR) of the MLEs for the Vasicek Models.
Model -2         κ = 3.334             α = 0.0891             σ² = 0.093498
  n           RBIAS      VAR         RBIAS     VAR           RBIAS     VAR
  120         13.78%     1.134       0.088%    6.591e-5      1.004%    5.056e-5
  250         5.19%      0.5132      0.382%    3.477e-5      0.393%    2.456e-5
  500         3.987%     0.24714     0.057%    1.685e-5      0.1%      9.912e-6

Model 0          κ = 0.429             α = 0.0891             σ² = 0.033057
  120         54.33%     0.347       0.218%    2.612e-4      1%        1.035e-5
  250         21.18%     0.125       0.722%    1.396e-4      0.449%    4.911e-6
  500         12.31%     0.054       0.102%    6.669e-5      0.008%    2.016e-6

Model 2          κ = 0.215             α = 0.0891             σ² = 0.023375
  120         238.35%    0.240       3.792%    0.0103        0.898%    2.549e-6
  250         96.05%     0.056       1.274%    6.640e-4      0.485%    1.134e-6
  500         50.96%     0.022       0.122%    2.682e-4      0.001%    4.793e-7
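Table 2 concerns maximum likelihood estimation of (κ, α, σ²). For reference only (this is not the authors' implementation), the exact conditional MLE of a Vasicek model admits a closed form through its Gaussian AR(1) transition; the sketch below assumes the standard mean-reverting parameterization dX_t = κ(α − X_t)dt + σ dB_t with 0 < e^{−κ∆} < 1.

    import numpy as np

    def vasicek_mle(x, delta):
        """Closed-form conditional MLE for dX = kappa*(alpha - X) dt + sigma dB.

        Uses the exact transition: X_{t+delta} | X_t = x is Gaussian with mean
        alpha + (x - alpha)*exp(-kappa*delta) and variance
        sigma^2 * (1 - exp(-2*kappa*delta)) / (2*kappa), i.e. a Gaussian AR(1),
        whose conditional MLE coincides with least squares of X_{t+1} on X_t.
        """
        x = np.asarray(x, dtype=float)
        x0, x1 = x[:-1], x[1:]                                # lagged pairs
        b = np.cov(x0, x1, bias=True)[0, 1] / np.var(x0)      # slope = exp(-kappa*delta)
        a = x1.mean() - b * x0.mean()                         # intercept = alpha*(1 - b)
        v = np.mean((x1 - a - b * x0) ** 2)                   # conditional variance
        kappa = -np.log(b) / delta                            # requires 0 < b < 1
        alpha = a / (1.0 - b)
        sigma2 = 2.0 * kappa * v / (1.0 - b ** 2)
        return kappa, alpha, sigma2

The large relative biases in κ reported in Table 2 for the slowly mean-reverting Models 0 and 2 are a well-documented feature of such estimates.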
TABLE 3
Empirical sizes (in percentage) of the proposed EL test (the last two columns) and the single-bandwidth tests (middle columns) for the Vasicek models, together with the sizes of the single-bandwidth test based on asymptotic normality (in parentheses): α1 — size based on the fixed bandwidth set; α2 — size based on the data-driven bandwidths.
A: Model -2
n = 125   h:     0.030   0.032   0.034   0.036   0.0386   0.041      α1 = 4.4    α2 = 7.6
          Size:  9.4     8.2     5.2     4.6     3        2.4
                 (40.4)  (38.8)  (34.8)  (34.8)  (34.2)   (34.8)
n = 250   h:     0.022   0.023   0.024   0.026   0.0269   0.0284     α1 = 4.6    α2 = 6
          Size:  8       5.2     4.6     4.4     3.4      2.6
                 (34.4)  (28.6)  (24)    (21)    (17.4)   (16)
n = 500   h:     0.02    0.021   0.022   0.023   0.0245   0.0258     α1 = 5.4    α2 = 5.6
          Size:  6.2     5.8     5.4     5.2     5        5
                 (29.6)  (23.6)  (19)    (14.6)  (10.8)   (8.4)

B: Model 0
n = 125   h:     0.016   0.017   0.019   0.020   0.022    0.024      α1 = 4.2    α2 = 7
          Size:  5.8     6       6       4.2     4.4      3
                 (43)    (39)    (36.6)  (34.8)  (36.6)   (37)
n = 250   h:     0.014   0.015   0.017   0.018   0.02     0.022      α1 = 5.2    α2 = 5.8
          Size:  6       6.2     6.2     3.8     2.4      2.8
                 (31.6)  (27)    (20.6)  (20)    (17.8)   (17.8)
n = 500   h:     0.01    0.011   0.012   0.013   0.015    0.016      α1 = 5.4    α2 = 6.2
          Size:  6.8     4.4     5.2     6.4     5.6      4
                 (36.4)  (26.8)  (20.6)  (13)    (11.2)   (9.2)

C: Model 2
n = 125   h:     0.008   0.009   0.010   0.011   0.013    0.014      α1 = 12.6   α2 = 3.4
          Size:  12.6    11      10      14.6    14.4     13.6
                 (60)    (53.4)  (47.2)  (46)    (45)     (42.2)
n = 250   h:     0.006   0.007   0.008   0.009   0.01     0.011      α1 = 8.8    α2 = 4.2
          Size:  12.2    10      7.4     8.8     7        11
                 (39)    (35)    (31)    (30)    (31)     (33.2)
n = 500   h:     0.004   0.005   0.0054  0.0063  0.0074   0.0086     α1 = 7.2    α2 = 5.6
          Size:  8.2     8.4     8       8.6     7        9
                 (75.2)  (63)    (51.6)  (39.8)  (32.8)   (24.4)
TABLE 4
Asymptotic and Bootstrap (in parentheses) Sizes of Hong and Li’s Test for
Vasicek Models
n = 125      h:    0.08        0.1         0.12        0.14        0.16
Vasicek -2         11.8 (6.2)  3.8 (5.8)   1.2 (5.2)   0.6 (4.8)   0.6 (4.4)
Vasicek 0          13.2 (4.6)  4.2 (4.2)   1.2 (3.8)   0.8 (3.4)   0.8 (3.4)
Vasicek 2          11.8 (4.4)  4.2 (3.2)   2 (3.2)     1.4 (2.8)   1.6 (2.8)

n = 250      h:    0.07        0.09        0.11        0.13        0.15
Vasicek -2         15 (6.2)    6.2 (5.4)   2.2 (6)     1.6 (5.2)   1 (6.4)
Vasicek 0          13 (4.8)    5.8 (5.4)   3.4 (5.4)   1.8 (5.8)   2 (6.6)
Vasicek 2          15 (5.4)    7.4 (5.4)   3 (5.8)     1.6 (6.6)   1.4 (5.4)

n = 500      h:    0.06        0.08        0.1         0.12        0.14
Vasicek -2         15.6 (5.6)  7.6 (5.6)   2.8 (4.6)   2.4 (4.8)   1.2 (6.2)
Vasicek 0          19.4 (5.4)  8.4 (5.6)   3.4 (5.4)   2.6 (6.2)   1.6 (6.6)
Vasicek 2          17 (6.6)    9 (5.8)     4.6 (5.4)   3.2 (6.6)   2.4 (7.6)
TABLE 5
Empirical sizes (in percentage) of the proposed EL test (the last two columns) and the single-bandwidth tests (middle columns) for the CIR models: α1 — size based on the fixed bandwidth set; α2 — size based on the data-driven bandwidths.
A: CIR 0
n = 125   h:     0.022   0.025   0.029   0.033   0.038   0.044      α1 = 3.0    α2 = 6.6
          Size:  4.8     4.8     4.8     3.2     2.4     2
n = 250   h:     0.018   0.021   0.024   0.028   0.032   0.037      α1 = 5.0    α2 = 6
          Size:  5.2     5.6     5.2     5.2     4       3.8
n = 500   h:     0.016   0.018   0.021   0.024   0.027   0.031      α1 = 4.8    α2 = 5.6
          Size:  4.2     5.4     4.6     4.8     3.6     4.2

B: CIR 1
n = 125   h:     0.017   0.02    0.022   0.026   0.03    0.035      α1 = 3.8    α2 = 7.6
          Size:  5.4     3.6     4.0     4.2     3.2     2.6
n = 250   h:     0.014   0.016   0.018   0.021   0.024   0.028      α1 = 5.2    α2 = 5.6
          Size:  5.2     6.6     5.6     5.6     5.6     3.4
n = 500   h:     0.012   0.014   0.016   0.018   0.021   0.024      α1 = 5.2    α2 = 5.2
          Size:  5.4     3.8     4.4     4.4     4.6     4

C: CIR 2
n = 125   h:     0.012   0.014   0.016   0.018   0.021   0.024      α1 = 6.8    α2 = 8.4
          Size:  7.6     7.2     7.6     6.2     6.6     4.2
n = 250   h:     0.01    0.012   0.013   0.015   0.017   0.02       α1 = 6.2    α2 = 7.6
          Size:  5.6     6.4     5.8     6.8     6.2     5.4
n = 500   h:     0.008   0.009   0.011   0.012   0.014   0.016      α1 = 4      α2 = 6.6
          Size:  4       4.2     3.6     4       4.8     3.4
TABLE 6
A: Empirical power (in percentage) of the proposed EL test (last two columns) and the single-bandwidth tests: α1 — power based on the fixed bandwidth set; α2 — power based on the data-driven bandwidths.
Single Bandwidth Based Tests
n = 125   h:      0.0199    0.0219     0.0241      0.0265      0.0291      α1 = 79.8   α2 = 63.6
          Power:  80.4      74         67.2        66.4        65.2
n = 250   h:      0.0141    0.0158     0.0177      0.0199      0.0223      α1 = 88.6   α2 = 65.2
          Power:  87.6      81.2       76.4        74          72.8
n = 500   h:      0.0113    0.0126     0.0141      0.0157      0.0175      α1 = 96.8   α2 = 81.4
          Power:  90.8      84.8       82.8        84.4        80.8

B: Asymptotic and Bootstrap Power (in parentheses) of Hong and Li's Test.
n = 125   h:      0.08       0.1         0.12        0.14        0.16
          Power:  26 (4.6)   16.4 (3.8)  9.6 (2.8)   5.8 (3)     4.8 (3.6)
n = 250   h:      0.07       0.09        0.11        0.13        0.15
          Power:  41 (5.8)   29 (6)      18.8 (5.4)  14.6 (7.2)  10.4 (6.4)
n = 500   h:      0.06       0.08        0.1         0.12        0.14
          Power:  57.4 (6)   49 (5.4)    40.2 (5)    34 (6.2)    31.2 (7.2)
TABLE 7
P-Values (P-V) for the Federal Fund Rate Data

                          Single Bandwidth Based Tests
Model              h:  0.0083   0.010    0.012    0.014    0.017    0.020     Adaptive
Vasicek   T            23.64    16.81    9.64     3.52     -0.47    -1.18     29.71
(VSK)     l*_{0.05}    -2.80    -3.04    -3.45    -3.79    -3.64    -2.19     2.54
          P-V          0.00     0.00     0.00     0.00     0.00     0.016     0.00
CIR       T            0.32     -1.74    -2.19    -0.34    4.09     12.80     12.80
          l*_{0.05}    -3.66    -3.89    -3.52    -0.45    6.63     22.27     22.27
          P-V          0.00     0.00     0.014    0.044    0.09     0.142     0.142
ICIR      T            9.13     14.06    21.97    32.72    46.58    66.63     66.63
          l*_{0.05}    2.68     10.84    30.19    66.83    145.5    303.4     303.4
          P-V          0.018    0.044    0.078    0.146    0.216    0.294     0.294
CEV       T            8.16     12.86    20.56    31.11    44.76    64.65     64.65
          l*_{0.05}    14.78    14.78    37.98    88.75    185.3    434.7     434.77
          P-V          0.10     0.064    0.136    0.238    0.33     0.434     0.434
NL        T            9.21     14.22    22.45    33.69    48.15    69.10     69.10
          l*_{0.05}    9.77     27.47    66.06    128.5    281.2    557.5     557.52
          P-V          0.054    0.11     0.172    0.266    0.344    0.422     0.422
[Figure 1: six panels labelled Fed Fund Rates, Vasicek, CIR, Inverse-CIR, CEV and Non-Linear Drift, plotted against X_t and X_{t+1}.]
FIG. 1. The Federal Fund Rate data and five parametric transitional densities after rotating 45 degrees anti-clockwise.
[Figure 2: six panels labelled Nonparametric Kernel, Vasicek, CIR, Inverse-CIR, CEV and Non-Linear Drift, plotted against X_t and X_{t+1}.]
FIG. 2. Nonparametric kernel transitional density estimate and smoothed parametric transitional densities.
References.
[1] Aı̈t-Sahalia, Y. (1996). Testing continuous–time models of the spot interest rate. Review of Financial Studies 9 385–426.
[2] Aı̈t-Sahalia, Y. (1999). Transition densities for interest rate and other nonlinear diffusions. Journal of Finance 54 1361–1395.
[3] Aı̈t-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diffusions: a closed–form approximation approach. Econometrica 70 223–262.
[4] Aı̈t-Sahalia, Y., Bickel, P. and Stoker, T. (2001). Goodness-of-fit tests for regression
using kernel methods. Journal of Econometrics 105 363–412.
[5] Aı̈t-Sahalia, Y., Fan, J. and Peng, H. (2005). Nonparametric transition-based tests for diffusions. Working paper available from http://www.princeton.edu/˜yacine/research.htm.
[6] Aı̈t-Sahalia, Y. and Kimmel, R. (2002). Estimating affine multifactor term structure models using closed-form likelihood expansions. Working paper available from
http://www.princeton.edu/˜yacine/research.htm.
[7] Bandi, F. and Phillips, P. C. B. (2003). Fully nonparametric estimation of scalar
diffusion models. Econometrica 71 241–284.
[8] Brown, B. and Chen, S. X. (1998). Combined and least squares empirical likelihood.
Annals of the Institute of Statistical Mathematics 50 697–714.
[9] Cai, Z. and Hong, Y. (2003). Nonparametric methods in continuous-time finance: A
selective review, In Recent Advances and Trends in Nonparametric Statistics (M.G.
Akritas and D.M. Politis, eds.), 283–302.
[10] Chapman, D. A. and Pearson, N. D. (2000). Is the short rate drift actually nonlinear?
Journal of Finance 55 355–388.
[11] Chen, S. X. and Cui, H. J. (2007). On the second order properties of empirical
likelihood for generalized estimation equations. Journal of Econometrics, to appear.
[12] Chen, S. X., Härdle, W. and Li, M. (2003). An empirical likelihood goodness–of–fit
test for time series. Journal of the Royal Statistical Society Series B 65 663–678.
[13] Cox, J. C., Ingersoll, J. E. and Ross, S. A. (1985). A theory of term structure of
interest rates. Econometrica 53 385–407.
[14] Fan, J. (1996). Test of significance based on wavelet thresholding and Neyman’s
truncation. Journal of the American Statistical Association 434 674–688.
[15] Fan, J. (2005). A selective overview of nonparametric methods in financial econometrics. Statistical Science 20 317–357.
[16] Fan, J. and Gijbels, I. (1996) Local Polynomial Modeling and Its Applications. Chapman and Hall, London.
[17] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: nonparametric and parametric
methods. Springer-Verlag, New York.
[18] Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika 83 189–196.
[19] Fan, J. and Yim, T. H. (2004). A data-driven method for estimating conditional
densities. Biometrika 91 819–834.
[20] Fan, J. and Zhang, C. (2003). A re–examination of Stanton’s diffusion estimation
with applications to financial model validation. Journal of the American Statistical
Association 457 118–134.
[21] Fan, J. and Zhang, J. (2004). Sieved empirical likelihood ratio tests for nonparametric
functions. Annals of Statistics 29 153–193.
[22] Fan, Y. and Li, Q. (1996). Consistent model specification tests: omitted variables and
semiparametric functional forms. Econometrica 64 865–890.
[23] Gao, J. and King, M. L. (2005). Estimation and model specification testing in nonparametric and semiparametric regression models. Unpublished paper available at
www.maths.uwa.edu.au/˜jiti/jems.pdf.
[24] Genon-Catalot, V., Jeantheau, T. and Laredo, C. (2000). Stochastic volatility models
as hidden Markov models and statistical applications. Bernoulli 6 1051–1079.
[25] Gozalo, P. L. and Linton, O. (2001). Testing additivity in generalized nonparametric
regression models with estimated parameters. Journal of Econometrics 104 1–48.
[26] Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric
regression fits. Annals of Statistics 21 1926–1947.
[27] Hart, J. (1997). Nonparametric Smoothing and Lack-of-Fit Tests. Springer, New York.
[28] Hjellvik, V., Yao, Q. and Tjøstheim, D. (1998). Linearity testing using local polynomial approximation. Journal of Statistical Planning & Inference 68 295-321.
[29] Hjort, N. L., McKeague, I. W. and Van Keilegom, I. (2005). Extending the scope of
empirical likelihood. Under revision for Annals of Statistics.
[30] Hong, Y. and Li, H. (2005). Nonparametric specification testing for continuous–time
models with application to spot interest rates. Review of Financial Studies 18 37–84.
[31] Horowitz, J. L. and Spokoiny, V. G. (2001). An adaptive, rate–optimal test of a
parametric mean–regression model against a nonparametric alternative. Econometrica
69 599–632.
[32] Hyndman, R. J. and Yao, Q. (2002). Nonparametric estimation and symmetry tests
for conditional density functions. Journal of Nonparametric Statistics 14 259–278.
[33] Jiang, G. and Knight, J. (1997). A nonparametric approach to the estimation of diffusion processes with an application to a short–term interest rate model. Econometric
Theory 13 615–645.
[34] Kitamura, Y. (1997). Empirical likelihood methods with weakly dependent processes.
Ann. Statist. 25 2084–2102.
[35] Li, G. (2003). Nonparametric likelihood ratio goodness-of-fit tests for survival data.
Journal of Multivariate Analysis 86 166–182.
[36] Li, Q. (1999). Consistent model specification tests for time series econometric models.
Journal of Econometrics 92 101–147.
[37] Lo, A.W. (1988). Maximum likelihood estimation of generalized Ito processes with
discretely sampled data. Econometric Theory 4 231-247.
[38] Müller, H. G. and Stadtmüller, U. (1999). Multivariate boundary kernels and a continuous least squares principle. Journal of the Royal Statistical Society Series B 61
439–458.
[39] Owen, A. (1988). Empirical likelihood ratio confidence regions for a single functional.
Biometrika 75 237–249.
[40] Phillips, P. C. B. and Yu, J. (2005). Jackknifing bond option prices. Review of Financial Studies 18 707–742.
[41] Pritsker, M. (1998). Nonparametric density estimation and tests of continuous time
interest rate models. Review of Financial Studies 11 449-487.
[42] Qin, J. and J. Lawless (1994). Empirical likelihood and general estimating functions.
Annals of Statistics 22 300–325.
[43] Robinson, P. (1989). Hypothesis testing in semiparametric and nonparametric models
for econometric time series. Review of Economic Studies 56 511–534.
[44] Scott, D. W. (1992). Multivariate Density Estimation. John Wiley, New York.
[45] Stanton, R. (1997). A nonparametric model of term structure dynamics and the
market price of interest rate risk. Journal of Finance 52 1973–2002.
[46] Tripathi, G. and Kitamura, Y. (2003). Testing conditional moment restrictions. Annals of Statistics 31 2059-2095.
[47] Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of
Financial Economics 5 177–188.
Department of Statistics
Iowa State University
Ames, IA 50011-1210 USA
E-mail: songchen@iastate.edu
School of Mathematics and Statistics
The University of Western Australia
Crawley WA 6009, Australia
E-mail: jiti.gao@uwa.edu.au
Department of Statistics
Iowa State University
Ames, IA 50011-1210 USA
E-mail: yongtang@iastate.edu