SIMULATED LIKELIHOOD ESTIMATORS FOR DISCRETELY OBSERVED JUMP-DIFFUSIONS

By Kay Giesecke and Gustavo Schwenkler∗
Stanford University and Boston University
This paper develops an unbiased Monte Carlo approximation to
the transition density of a jump-diffusion process with state-dependent
drift, volatility, jump intensity, and jump magnitude. The approximation is used to construct a likelihood estimator of the parameters
of a jump-diffusion observed at fixed time intervals that need not be
short. The estimator is asymptotically unbiased for any sample size.
It has the same large-sample asymptotic properties as the true but
uncomputable likelihood estimator. Numerical results illustrate its
advantages.
1. Introduction. Continuous-time jump-diffusion processes are widely
used in a range of disciplines. This paper addresses the parameter inference
problem for a jump-diffusion observed at fixed time intervals that need not
be short. We develop an unbiased Monte Carlo approximation to the transition density of the process, and use it to construct likelihood estimators
of the parameters specifying the dynamics of the process. The results include asymptotic unbiasedness, consistency, and asymptotic normality as
the sample period grows. Our approach is motivated by (i) the fact that it
offers asymptotically unbiased and efficient estimation at any observation
frequency, and (ii) its computational advantages in calculating and maximizing the likelihood.
∗ Schwenkler is corresponding author. Schwenkler acknowledges support from a Mayfield Fellowship and a Lieberman Fellowship. We are grateful to Rama Cont, Darrell Duffie, Peter Glynn, Emmanuel Gobet, Marcel Rindisbacher, Olivier Scaillet, and the participants at the Bachelier World Congress, the BU Conference on Credit and Systemic Risk, the Conference on Computing in Economics and Finance, the European Meeting of the Econometric Society, the INFORMS Annual Meeting, the SIAM Financial Mathematics and Engineering Conference, and seminars at Boston University, Carnegie Mellon University, the Federal Reserve Board, the University of California at Berkeley, and the Worcester Polytechnic Institute for useful comments. We are also grateful to Francois Guay for excellent research assistance. An implementation in R of the methods developed in this paper can be downloaded at http://people.bu.edu/gas.

Keywords and phrases: Density estimator, Parameter estimator, Maximum likelihood, Exact simulation, Unbiased estimation, Jump-diffusions

More specifically, we consider a one-dimensional jump-diffusion process
whose drift, volatility, jump intensity, and jump magnitude are allowed to be
arbitrary parametric functions of the state. We develop unbiased simulation
estimators of the transition density of the process and its partial derivatives. Our approach can be extended to time-inhomogeneous jump-diffusions and certain multi-dimensional jump-diffusions. Volatility and measure transformation arguments are first used to represent the transition density as
a mixture of weighted Gaussian distributions, generalizing the results of
Dacunha-Castelle and Florens-Zmirou (1986) and Rogers (1985) for diffusions. A weight takes the form of a conditional probability that a certain
doubly-stochastic Poisson process has no jumps in a given interval. We develop an unbiased Monte Carlo approximation of that probability using an
exact sampling method, building on the schemes proposed by Beskos and
Roberts (2005), Chen and Huang (2013) and Giesecke and Smelov (2013)
for sampling the solutions of stochastic differential equations. The resulting
transition density estimator is unbiased and almost surely non-negative for
any argument of the density.[1] Its accuracy depends only on the number of
Monte Carlo replications used, making it appropriate for any time interval.
Moreover, the estimator can be evaluated at any value of the parameter
and arguments of the density function without re-simulation. This property
generates computational efficiency for the simulated likelihood problem. It
reduces the maximization of the simulated likelihood to a deterministic problem that can be solved using standard methods.
We analyze the asymptotic behavior of the estimator maximizing the simulated likelihood for a fixed observation frequency.[2] The estimator converges
almost surely to the true likelihood estimator as the number of Monte Carlo
replications grows, for a fixed sample period (an asymptotic unbiasedness
property). This ensures that the estimator inherits the consistency, asymptotic normality, and asymptotic efficiency of the true likelihood estimator if
the number of Monte Carlo replications grows at least as fast as the sample period. Our estimator does not suffer from the second-order bias generated by a conventional Monte Carlo approximation of the transition density,
which relies on a time-discretization of the process and non-parametric kernel estimation.[3] Our exact Monte Carlo approach eliminates the need to
discretize the process and perform kernel estimation. It facilitates asymptotically unbiased and efficient likelihood estimation.

[1] Given that we do not require any debiasing technique, our non-negativity result does not contradict the finding of Jacob and Thiery (2015).
[2] Bibby and Sørensen (1995), Florens-Zmirou (1989), and Gobet, Hoffmann and Reiß (2004) consider various diffusion estimators in a similar asymptotic regime.
[3] Detemple, Garcia and Rindisbacher (2006) analyze this bias in the diffusion case and propose a bias-corrected discretization scheme. For jump-diffusions with state-independent coefficients, Kristensen and Shin (2012) provide conditions under which the bias is zero.
Numerical results illustrate the accuracy and computational efficiency of
the density approximation as well as the performance of the simulated likelihood estimator. Our density estimator is found to outperform alternative
estimators of the transition density, both in terms of accuracy and computational efficiency. The error of our density estimator converges at the fastest
rate. Moreover, our density estimator entails the smallest computational cost
per observation when calculating the likelihood. The cost decreases as more
observations become available. Performing maximum likelihood estimation
for simulated monthly and quarterly data, we confirm that our simulated
likelihood estimator indeed behaves like the true maximum likelihood estimator, as predicted by our theoretical results. Our numerical results indicate
that the distribution of our simulated likelihood estimator for finite data
samples and a finite number of Monte Carlo replications is similar to that
of the true likelihood estimator. They also indicate that our estimator compares favorably to alternative simulation-based likelihood estimators when
a fixed computational budget is given.
Our results have several important applications. Andrieu, Doucet and
Holenstein (2010) show that an unbiased density estimator can be combined with a Markov Chain Monte Carlo method to perform exact Bayesian
estimation. Thus, our density estimator enables exact Bayesian estimation
of jump-diffusion models with general coefficients. Focusing on pure diffusion models, Sermaidis et al. (2013) take a step in this direction. Based on
the results of these authors, we conjecture that the beneficial properties of
our estimators carry over to the Bayesian case.
The transition density estimator can also be used to perform efficient
generalized method of moments (GMM) estimation of jump-diffusions. It
is well-known that the optimal instrument that yields maximum likelihood
efficiency for GMM estimators is a function of the underlying transition
density (see Feuerverger and McDunnough (1981) and Singleton (2001)).
Our unbiased density estimator can be used instead of the unknown true
density. This enables efficient GMM estimation for many jump-diffusions
that were previously intractable.[4]
Finally, the transition density approximation generates an unbiased estimator of an expectation of a given function of a jump-diffusion evaluated
at a fixed horizon. In financial applications, for example, the expectation
might represent the value of a derivative security. Prices at different model
parameter values can be computed without having to re-simulate the transition density, generating computational efficiency in econometric applications. Traditional discretization-based approaches to estimating the transition density (see, e.g., Platen and Bruti-Liberati (2010)) generate biased estimators of security prices. Exact sampling approaches to estimating the density (e.g., Giesecke and Smelov (2013)) generate unbiased estimators of prices but might be computationally burdensome.

[4] Approximate GMM estimators that achieve efficiency have recently been proposed by Carrasco et al. (2007), Chen, Peng and Yu (2013), and Jiang and Knight (2010).
We have implemented the transition density approximation and the likelihood estimator in R. The code can be downloaded at http://people.bu.
edu/gas. It can be easily customized to treat a given jump-diffusion.
1.1. Related literature. Prior research on the parametric inference problem for discretely-observed stochastic processes has focused mostly on diffusions. Of particular relevance to our work is the Monte Carlo likelihood
estimator for diffusions proposed by Beskos, Papaspiliopoulos and Roberts
(2009). They use the exact sampling method of Beskos, Papaspiliopoulos
and Roberts (2006) to approximate the likelihood for a discretely-observed
diffusion. In the absence of jumps, our estimator reduces to their estimator.
However, our approach requires weaker assumptions, so our estimator has a
broader scope even in the diffusion case. Moreover, our approach allows us
to optimize the computational efficiency of estimation.
Lo (1988) treats a jump-diffusion with state-independent Poisson jumps
by numerically solving the partial integro-differential equation governing the transition density. Kristensen and Shin (2012) analyze a nonparametric kernel estimator of the transition density of a jump-diffusion with state-independent coefficients.[5] Aït-Sahalia and Yu (2006) develop saddlepoint expansions of the transition densities of Markov processes, focusing on jump-diffusions with state-independent Poisson jumps and Lévy processes. Filipović, Mayerhofer and Schneider (2013) analyze polynomial expansions of the transition density of an affine jump-diffusion. Li (2013) studies a power series expansion of the transition density of a jump-diffusion with state-independent Poisson jumps. Yu (2007) provides a small-time expansion of the transition density of a jump-diffusion in a high-frequency observation regime, assuming a state-independent jump size. The associated estimator inherits the asymptotic efficiency of the theoretical likelihood estimator as the observation frequency grows large; see Chang and Chen (2011) for the diffusion case. Jiang and Knight (2002), Chacko and Viceira (2003), Duffie and Glynn (2004), and Duffie and Singleton (1993) develop generalized method of moments estimators for jump-diffusions and other time-homogeneous Markov processes. If an infinite number of moments is used, then these estimators inherit the asymptotic properties of the theoretical likelihood estimator. This, however, is infeasible in practice.

[5] The assumption that the distribution of the jump sizes is independent of t and θ in equation (1) of Kristensen and Shin (2012) effectively restricts their model to state-independent jump-diffusions.
Unlike the transition density approximations developed in the aforementioned papers, our unbiased Monte Carlo approximation of the transition
density applies to jump-diffusions with general state-dependent drift, diffusion, jump intensity, and jump size. Beyond mild regularity, no structure is
imposed on the coefficient functions. The approximation has a significantly
wider scope than existing estimators, including, in particular, models with
state-dependent jumps and non-affine formulations. The simulated likelihood estimator inherits the asymptotic efficiency of the theoretical likelihood estimator as both the sample period and the number of Monte Carlo
replications grow, for any observation frequency.
1.2. Structure of this paper. Section 2 formulates the inference problem.
Section 3 develops a representation of the transition density of a jumpdiffusion. Section 4 uses this representation to construct an unbiased Monte
Carlo estimator of the density and its partial derivatives. Section 5 discusses the implementation of the estimator. Section 6 analyzes the asymptotic behavior of the estimator maximizing the simulated likelihood. Section
7 presents numerical results. There are two technical appendices, one containing the proofs.
2. Inference problem. Fix a complete probability space (Ω, F, P) and
a right-continuous, complete information filtration (Ft )t≥0 . Let X be a
Markov jump-diffusion process valued in S ⊂ R that is governed by the
stochastic differential equation
(1)    dX_t = μ(X_t; θ) dt + σ(X_t; θ) dB_t + dJ_t,

where X_0 is a constant, μ : S × Θ → R is the drift function, σ : S × Θ → R_+ is the volatility function, B is a standard Brownian motion, and J_t = ∑_{n=1}^{N_t} Γ(X_{T_n−}, D_n; θ), where N is a non-explosive counting process with event stopping times (T_n)_{n≥1} and intensity λ_t = Λ(X_t; θ) for a function Λ : S × Θ → R_+. The function Γ : S × D × Θ → R governs the jump magnitudes of X, and (D_n)_{n≥1} is a sequence of i.i.d. mark variables with probability density function π on D ⊂ R. The drift, volatility, jump intensity, and jump size functions are specified by a parameter θ ∈ Θ to be estimated, where the parameter space Θ is a subset of Euclidean space. More specifically, X is a Markov process whose infinitesimal generator, for functions f with bounded and continuous first and second derivatives, is given by

μ(x; θ) f'(x) + (1/2) σ²(x; θ) f''(x) + Λ(x; θ) ∫_D ( f(x + Γ(x, u; θ)) − f(x) ) π(u) du.
We impose the following assumptions. First, the boundary of S is unattainable. Second, the parameter space Θ is a compact subset of R^r with nonempty interior for r ∈ N. Third, the SDE (1) admits a unique strong solution.
Sufficient conditions are given in (Protter, 2004, Theorem V.3.7). Finally,
X admits a transition density. Cass (2009) provides sufficient conditions.
Our goal is to estimate the parameter θ specifying the dynamics of X
given a sequence of values X = {X_{t_0}, . . . , X_{t_m}} of X observed at fixed times 0 = t_0 < · · · < t_m < ∞. For ease of exposition, we assume that t_i − t_{i−1} = ∆ for all i and some fixed ∆ > 0. The data X is a random variable valued in S^m and measurable with respect to B^m, where B is the Borel σ-algebra on S. The likelihood of the data is the Radon-Nikodym density of the law of X with respect to the Lebesgue measure on (S^m, B^m). Let p_t(x, ·; θ) be the Radon-Nikodym density of the law of X_t given X_0 = x with respect to the Lebesgue measure on (S, B), i.e., the transition density of X. Given the Markov property of X, the likelihood function L(θ) takes the form

L(θ) = ∏_{i=1}^{m} p_∆(X_{(i−1)∆}, X_{i∆}; θ).

A maximum likelihood estimator (MLE) θ̂_m satisfies θ̂_m ∈ arg max_{θ∈Θ} L(θ) almost surely. We only consider interior MLEs for which ∇L(θ)|_{θ=θ̂_m} = 0. Throughout, let ∇ and ∇² denote the gradient and the Hessian matrix operators, respectively. Also, assume that θ = (θ_1, . . . , θ_r). For any 1 ≤ i_1, . . . , i_n ≤ r, write ∂^n_{i_1,...,i_n} for the n-th partial derivative with respect to θ_{i_1}, . . . , θ_{i_n}.
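To make the computation concrete, the following minimal R sketch evaluates the log-likelihood for a generic transition density approximation; the function p_hat is a hypothetical stand-in for any approximation of p_∆, such as the unbiased estimator developed in Section 4, and is not part of the released implementation.

    # Log-likelihood of observations X[1], ..., X[m+1] under a generic
    # transition density approximation p_hat(v, w, theta).
    log_likelihood <- function(theta, X, p_hat) {
      m <- length(X) - 1
      ll <- 0
      for (i in 1:m) ll <- ll + log(p_hat(X[i], X[i + 1], theta))
      ll
    }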
Suppose the true data-generating parameter θ∗ ∈ int Θ. Appendix A provides sufficient conditions for consistency and asymptotic normality of a
MLE θ̂m as m → ∞. Unlike the well-known standard hypotheses (see, e.g.,
(Singleton, 2006, Chapter 3)), our conditions do not require knowledge of the
true data-generating parameter. Our conditions can be verified in practice
using an unbiased Monte Carlo approximation to p∆ developed in Section
3. On the other hand, they are somewhat stronger than the standard hypotheses because they need to hold globally in the parameter space.
Some assumptions were made for clarity in the exposition and can be
relaxed. We can extend to time-inhomogeneous Markov jump-diffusions, for which the coefficient functions may depend on state and time. It is straightforward to treat the case of observation interval lengths t_i − t_{i−1} that vary across i, the case of a random initial value X_0, and the case of mark variables with parameter-dependent density function π = π(·; θ). The analysis
can be extended to certain multi-dimensional jump-diffusions, namely those
that are reducible in the sense of Definition 1 in Aït-Sahalia (2008). Finally,
we can also extend to settings in which the boundary ∂S is attainable. The
results presented below hold until the first hitting time of ∂S.
3. Transition density. Using volatility and measure transformation
arguments, this section develops a weighted Gaussian mixture representation
of the transition density p∆ of X.
3.1. Change of variables. We begin by applying a change of variables to transform X into a unit-volatility process. Define the Lamperti transform F(w; θ) = ∫_{X_0}^{w} 1/σ(u; θ) du for w ∈ S and θ ∈ Θ. For every θ ∈ Θ, the mapping w ↦ F(w; θ) is well-defined given that σ(u; θ) > 0 for all u ∈ S. Set Y_t = F(X_t; θ), which takes values in the state space S_Y = F(S; θ). If σ(x; θ) is continuously differentiable in x, then Itô's formula implies that Y solves the SDE

dY_t = μ_Y(Y_t; θ) dt + dB_t + dJ^Y_t,    Y_0 = F(X_0; θ) = 0,

up to the exit time of S_Y. Since 0 < σ(u; θ) < ∞ for all u ∈ S and θ ∈ Θ, it follows that F is invertible with respect to w ∈ S. Let F^{−1}(y; θ) denote the inverse of F, such that F(F^{−1}(y; θ); θ) = y. The drift function of Y satisfies

μ_Y(y; θ) = μ(F^{−1}(y; θ); θ) / σ(F^{−1}(y; θ); θ) − (1/2) σ'(F^{−1}(y; θ); θ)

in the interior of S_Y. The Lamperti transform does not affect the jump intensity of N, but it alters the jump magnitudes of the state process. The process J^Y describing the jumps of Y is given by J^Y_t = ∑_{n=1}^{N_t} Γ_Y(Y_{T_n−}, D_n; θ) for the jump size function

Γ_Y(y, d; θ) = F( F^{−1}(y; θ) + Γ(F^{−1}(y; θ), d; θ); θ ) − y,    y ∈ S_Y.

The Lamperti transformation can be understood as a change of variables. If X has a transition density p_∆, then Y has a transition density, denoted p^Y_∆. We have p^Y_∆(F(v; θ), F(w; θ); θ) = p_∆(v, w; θ) σ(w; θ).
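For illustration, F and its inverse can be evaluated numerically for a generic volatility function; the following R sketch uses quadrature and root finding, with sigma a user-supplied placeholder and the search interval an assumption.

    # Lamperti transform F(w; theta) = integral from X0 to w of du / sigma(u; theta),
    # and its inverse, computed numerically for a positive volatility function.
    F_lamperti <- function(w, theta, X0, sigma) {
      s <- sign(w - X0)
      s * integrate(function(u) 1 / sigma(u, theta),
                    lower = min(X0, w), upper = max(X0, w))$value
    }
    F_inverse <- function(y, theta, X0, sigma, interval = c(-100, 100)) {
      uniroot(function(w) F_lamperti(w, theta, X0, sigma) - y, interval)$root
    }

    # Example with constant volatility sigma(u; theta) = theta: F(w) = (w - X0) / theta.
    sigma_const <- function(u, theta) theta
    F_lamperti(0.5, theta = 0.2, X0 = 0, sigma = sigma_const)  # returns 2.5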
3.2. Change of measure. To facilitate the computation of the transition density p^Y_∆, we change from P_θ to a measure Q_θ under which Y has constant drift ρ ∈ R and the jump counting process N is a Poisson process with some fixed rate ℓ > 0. Define the variable Z_∆(θ) = Z^D_∆(θ) Z^P_∆(θ), where

(2)    Z^D_∆(θ) = exp( −(1/2) ∫_0^∆ (μ_Y(Y_s; θ) − ρ)² ds − ∫_0^∆ (μ_Y(Y_s; θ) − ρ) dB_s )

(3)    Z^P_∆(θ) = exp( ∫_0^∆ ( Λ(F^{−1}(Y_s; θ); θ) − ℓ ) ds ) ∏_{n=1}^{N_∆} ℓ / Λ(F^{−1}(Y_{T_n−}; θ); θ)

for θ ∈ Θ. If E_θ[Z_∆(θ)] = 1, we can define an equivalent probability measure Q_θ on (Ω, F_∆) by Q_θ[A] = E_θ[Z_∆(θ) 1_A] for any A ∈ F_∆. If μ_Y(y; θ) is continuously differentiable in y, integration by parts implies that

(4)    Z_∆(θ) = exp( a(Y_0; θ) − a(Y_∆; θ) + ∫_0^∆ b(Y_s; θ) ds ) ∏_{n=1}^{N_∆} 1/c(Y_{T_n−}, D_n; θ),

where a, b : S_Y × Θ → R and c : S_Y × D × Θ → R are given by

a(y; θ) = ∫_0^y (μ_Y(u; θ) − ρ) du,

b(y; θ) = Λ(F^{−1}(y; θ); θ) − ℓ + ( μ_Y²(y; θ) − ρ² + μ_Y'(y; θ) ) / 2,

c(y, d; θ) = ( Λ(F^{−1}(y; θ); θ) / ℓ ) exp( −∫_y^{y+Γ_Y(y,d;θ)} (μ_Y(u; θ) − ρ) du ).

The theorems of Girsanov, Lévy, and Watanabe imply that, under Q_θ and on [0, ∆], W_t = B_t + ∫_0^t (μ_Y(Y_s; θ) − ρ) ds is a standard Q_θ-Brownian motion, N is a Poisson process with rate ℓ, the random variables (D_n)_{n≥1} are i.i.d. with density π, and Y is governed by the stochastic differential equation

(5)    dY_t = ρ dt + dW_t + dJ^Y_t.

Under Q_θ, the process Y is a jump-diffusion with state-independent Poisson jumps that arrive at rate ℓ. The size of the n-th jump of Y is a function of Y_{T_n−} and D_n, where the D_n's are i.i.d. variables with density π. Between jumps, Y follows a Brownian motion with drift ρ. Thus, Y is a strong Markov process under Q_θ.
3.3. Density representation. We exploit the volatility and measure transformations to represent the transition density p∆ as a mixture of weighted
Gaussian distributions.
Theorem 3.1. Fix ∆ > 0. Suppose the following assumptions hold.

(B1) For any θ ∈ Θ, the function u ↦ μ(u; θ) is continuously differentiable and the function u ↦ σ(u; θ) is twice continuously differentiable.
(B2) For any θ ∈ Θ, the expectation E_θ[Z_∆(θ)] = 1.

Let v, w be arbitrary points in S and x, y be arbitrary points in S_Y. Let Y^x be the solution of the SDE (5) on [0, ∆] with Y_0 = x. Then

(6)    p_∆(v, w; θ) = ( e^{a(F(w;θ);θ) − a(F(v;θ);θ)} / σ(w; θ) ) Ψ_∆(F(v; θ), F(w; θ); θ)

for any θ ∈ Θ, where

(7)    Ψ_∆(x, y; θ) = E^{Q_θ}[ exp( −∫_0^∆ b(Y^x_s; θ) ds ) ∏_{n=1}^{N_∆} c(Y^x_{T_n−}, D_n; θ)
           × ( 2π(∆ − T_{N_∆}) )^{−1/2} exp( −( y − ρ(∆ − T_{N_∆}) − Y^x_{T_{N_∆}} )² / ( 2(∆ − T_{N_∆}) ) ) ].
Assumption (B1) is standard. Assumption (B2) guarantees that the change
of measure is well-defined. Sufficient conditions for Assumption (B2) are
given by Blanchet and Ruf (2013), for example.
The representation (6) can be thought of as arising from Bayes' formula as the product of the conditional Q_θ-law of Y^x_∆ given (T_n, Y^x_{T_n})_{n≤N_∆} and the Q_θ-law of (T_n, Y^x_{T_n})_{n≤N_∆}. The former is represented by the density function

( 2π(∆ − T_{N_∆}) )^{−1/2} exp( −( y − ρ(∆ − T_{N_∆}) − Y^x_{T_{N_∆}} )² / ( 2(∆ − T_{N_∆}) ) ),

which is Gaussian because Y^x follows a Brownian motion with drift between jumps and ∆ is fixed. The expectation (7) integrates this density according to the Q_θ-law of (T_n, Y^x_{T_n})_{n≤N_∆}. The additional terms appearing in (6) take account of the changes of variable and measure.
Theorem 3.1 also applies in the diffusion case (Γ ≡ 0). In this case, (6)
provides a significant generalization of the diffusion density representations
of Dacunha-Castelle and Florens-Zmirou (1986) and Rogers (1985). Compared to these, we extend the state space by introducing the counting process
N, which allows us to employ the change of measure defined by the Radon-Nikodym densities (2)-(3) and parametrized by the Poisson rate ℓ and the drift ρ. As explained in Section 5, the ability to select ρ and ℓ facilitates the
construction of computationally efficient density estimators.
We exploit the representation (6) to develop conditions under which the
transition density is smooth with respect to the parameter θ. Smoothness is
often required for consistency and asymptotic normality of a MLE θ̂m ; see,
e.g., Appendix A.
Proposition 3.2. Suppose that the conditions of Theorem 3.1 hold. Furthermore, suppose that the following conditions also hold.

(B3) The functions (u, θ) ↦ Λ(u; θ) and (u, d, θ) ↦ Γ(u, d; θ) are n-times continuously differentiable in (u, d, θ) ∈ S × D × Θ. The function (u, θ) ↦ μ(u; θ) is (n + 1)-times continuously differentiable. The function (u, θ) ↦ σ(u; θ) is (n + 2)-times continuously differentiable in (u, θ) ∈ S × Θ.
(B4) The order of differentiation and Q_θ-expectation can be interchanged for Ψ_∆(x, y; θ) for the n-th partial derivative taken with respect to x, y, or θ. In other words, for q_{i_1}, . . . , q_{i_n} ∈ {θ_1, . . . , θ_r, x, y},

∂^n Ψ_∆(x, y; θ) / ( ∂q_{i_1} · · · ∂q_{i_n} ) = E^{Q_θ}[ ∂^n H(x, y; θ) / ( ∂q_{i_1} · · · ∂q_{i_n} ) ],

where H(x, y; θ) is the integrand of Ψ_∆ in (7).

Then θ ↦ p_∆(v, w; θ) is n-times continuously differentiable for any v, w ∈ S.
Condition (B4) is intentionally formulated loosely as there are many sufficient conditions that allow for the interchange of expectation and differentiation. For example, invoking the bounded convergence theorem, a sufficient condition is that the difference quotients of n-th order of H(x, y; θ)
are uniformly bounded. A necessary condition according to Section 7.2.2 of
Glasserman (2003) is that the difference quotients are uniformly integrable.
4. Transition density estimator. This section develops an unbiased
Monte Carlo estimator of the transition density p∆ based on the representation obtained in Theorem 3.1. The key step consists of estimating the
expectation (7). For values v, w ∈ R, time t > 0, parameter θ ∈ Θ, and a standard Q_θ-Brownian motion W, define the function f(v, w, t; θ) as

(8)    f(v, w, t; θ) = E^{Q_θ}[ exp( −∫_0^t b(v + ρu + W_u; θ) du ) | W_t = w − v − ρt ].
0
By iterated expectations, the strong Markov property, and the fact that Y x
follows a Brownian motion with drift between jumps under Qθ , we have
2
y−ρ(∆−TN )−YTx
"
∆
N∆
x
x
f (YTN , Y∆ , ∆ − TN∆ ; θ) −
2(∆−TN )
∆
p∆
Ψ∆ (x, y; θ) = EQ
e
θ
2π(∆
−
T
)
N
∆
(9)
#
N∆
Y
×
c(YTxn − , Dn ; θ)f (YTxn−1 , YTxn − , Tn − Tn−1 ; θ) .
n=1
In order to construct an unbiased Monte Carlo estimator of (9), we require
exact samples of several random quantities. Samples of N∆ and the jump
times (Tn )n≤N∆ of the Qθ -Poisson process N can be generated exactly (using
the order statistics property, for example). Under Q_θ, the variables Y^x_∆ and Y^x_{T_n−} are conditionally Gaussian, so samples can also be generated exactly.
Moreover, the marks (Dn )n≤N∆ can be sampled exactly from the Qθ -density
π by the inverse transform method, for example. The only non-trivial task
is the unbiased estimation of the expectation (8). We extend an approach
developed by Beskos and Roberts (2005) for the exact sampling of a diffusion.
To see how an unbiased estimator can be constructed, suppose the function b is positive. Then (8) is the conditional probability that a doubly-stochastic Poisson process with intensity b(v + ρs + W_s; θ) has no jumps in the interval [0, t], given (W_0, W_t). If b is also bounded, then this probability can be estimated without bias using a simple thinning scheme (Lewis and Shedler (1979)). Here, one generates the jump times τ_1 < · · · < τ_p of a dominating Poisson process on [0, t] with intensity max_w b(w; θ), and a skeleton W_{τ_1}, . . . , W_{τ_p} of a Brownian bridge starting from 0 at time 0 and ending at w − v − ρt at time t. An estimator of the desired no-jump probability is

(10)    ∏_{i=1}^{p} ( 1 − b(v + ρτ_i + W_{τ_i}; θ) / max_w b(w; θ) ),

which is the empirical probability of rejecting the jump times of the dominating Poisson process as jump times of a doubly-stochastic Poisson process with intensity b(v + ρs + W_s; θ), conditional on W_t = w − v − ρt.
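The following R sketch implements the estimator (10) in this bounded, positive case; b and the bound b_max are user-supplied placeholders, and the Brownian bridge is sampled sequentially at the Poisson times.

    # Unbiased estimate of the no-jump probability (8) by thinning, assuming
    # 0 < b(., theta) <= b_max. The bridge runs from 0 at time 0 to
    # z = w - v - rho*t at time t and is sampled one Poisson time at a time.
    no_jump_prob <- function(v, w, t, theta, rho, b, b_max) {
      p <- rpois(1, b_max * t)
      if (p == 0) return(1)
      tau <- sort(runif(p, 0, t))          # order statistics property
      z <- w - v - rho * t
      est <- 1; s <- 0; ws <- 0
      for (i in 1:p) {
        mu <- ws + (tau[i] - s) / (t - s) * (z - ws)   # bridge mean
        vr <- (tau[i] - s) * (t - tau[i]) / (t - s)    # bridge variance
        ws <- rnorm(1, mean = mu, sd = sqrt(vr)); s <- tau[i]
        est <- est * (1 - b(v + rho * tau[i] + ws, theta) / b_max)
      }
      est
    }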
This approach extends to the case where b is not necessarily bounded or positive; see Chen and Huang (2013) for the case ρ = 0. We partition the interval [0, t] into segments on which W is bounded. Let η = min{t > 0 : W_t ∉ [−L, L]} for some level L > 0 whose choice will be discussed below. If the function b is continuous, then

(11)    b(v + ρt + W_t; θ) − min_{|w|≤L+|ρ|η} b(v + w; θ)

is positive and bounded for t ∈ [0, η]. Consequently, we can use the estimator (10) locally up to time η if we replace the function b with (11). We iterate this localization argument to obtain an unbiased estimator of (8). Let (η_i : i = 1, . . . , I + 1) be i.i.d. samples of η for I = sup{i : η_1 + · · · + η_i ≤ t}, and set E_0 = 0, E_i = η_1 + · · · + η_i. In addition, let τ^i_1 < · · · < τ^i_{p_i} be a sample of the jump times of a dominating Poisson process on [0, η_i] with rate M_i(θ) given by

M_i(θ) = max_{|x|,|y|≤L+|ρ|η_i} ( b(v + ρE_{i−1} + W_{E_{i−1}} + x; θ) − b(v + ρE_{i−1} + W_{E_{i−1}} + y; θ) ).

Note that 0 ≤ M_i(θ) < ∞ for all y ∈ S_Y if b is continuous. Finally, define τ^i_0 = 0, w_{i,0} = W_{E_{i−1}}, and let w_{i,1}, . . . , w_{i,p_i} be a skeleton of the Brownian bridge starting at W_{E_{i−1}} at time E_{i−1} and finishing at W_{E_i} at time E_i. Letting m_i(θ) = min_{|w|≤L+|ρ|η_i} b(v + ρE_{i−1} + W_{E_{i−1}} + w; θ), an unbiased estimator of (8) is given by f̂(v, w, t; θ), defined as

(12)    f̂(v, w, t; θ) = ∏_{i=1}^{I+1} e^{−m_i(θ)E_i} ∏_{j=1}^{p_i} ( 1 − ( b(v + ρ(E_{i−1} + τ^i_j) + w_{i,j}; θ) − m_i(θ) ) / M_i(θ) ),

where E_{I+1} = t − ∑_{i=1}^{I} E_i. This estimator generates an unbiased estimator of the transition density of X.
Theorem 4.1. Fix the localization level L > 0, the Poisson rate ℓ > 0, and the drift ρ ∈ R. Set

(13)    p̂_∆(v, w; θ) = ( f̂(Y^x_{T_{N_∆}}, Y^x_∆, ∆ − T_{N_∆}; θ) / ( √(2π(∆ − T_{N_∆})) σ(w; θ) ) )
            × exp( a(y; θ) − a(x; θ) − ( y − ρ(∆ − T_{N_∆}) − Y^x_{T_{N_∆}} )² / ( 2(∆ − T_{N_∆}) ) )
            × ∏_{n=1}^{N_∆} c(Y^x_{T_n−}, D_n; θ) f̂(Y^x_{T_{n−1}}, Y^x_{T_n−}, T_n − T_{n−1}; θ)

for x = F(v; θ) and y = F(w; θ). Suppose the following conditions hold:

(B5) For any θ ∈ Θ, v ↦ Λ(v; θ) is continuous, v ↦ μ(v; θ) is continuously differentiable, and v ↦ σ(v; θ) is twice continuously differentiable.
(B6) At least one of the functions Λ, μ, σ, σ', or σ'' is not constant.

Then p̂_∆(v, w; θ) is an unbiased estimator of the transition density p_∆(v, w; θ) for any v, w ∈ S and θ ∈ Θ. That is, E^{Q_θ}[p̂_∆(v, w; θ)] = p_∆(v, w; θ).
A simulation estimator of the transition density p_∆(v, w; θ) of the jump-diffusion X is given by the average of independent Monte Carlo samples of
p̂∆ (v, w; θ) drawn from Qθ . Theorem 4.1 provides mild conditions on the
coefficient functions of X guaranteeing the unbiasedness of this estimator.
The practical implementation of the estimator will be discussed in Section
5, including the selection of the quantities L, ℓ, and ρ.
The density estimator (13) also applies to a diffusion, i.e., in the absence of jumps (Γ ≡ 0). In this case, and if Λ ≡ 0, ρ = 0, ℓ = 0, L = ∞, and inf_{w∈S_Y} b(w; θ) > −∞ and sup_{w∈S_Y} b(w; θ) − inf_{w∈S_Y} b(w; θ) < ∞ for all θ ∈ Θ, our density estimator reduces to the estimator of Beskos, Papaspiliopoulos and Roberts (2009). Our approach requires weaker assumptions, so our estimator has a broader scope even in the diffusion case. Moreover, our approach allows us to select the Poisson rate ℓ, the drift ρ, and the localization bound L, and this facilitates the construction of computationally efficient density estimators (see Section 5).
Often, one is interested in evaluating partial derivatives of p∆ (v, w; θ). For
example, many sufficient conditions for consistency and asymptotic normality of a MLE θ̂m are formulated in terms of partial derivatives of the density;
see Appendix A. Conveniently, under certain conditions, the density estimator (13) can be differentiated to obtain unbiased estimators of the partial
derivatives of the transition density.
Corollary 4.2. Suppose that the conditions of Proposition 3.2 and Theorem 4.1 hold. In addition, suppose that:

(B7) For any ξ > 0, the following functions are n-times continuously differentiable: (y, θ) ↦ min_{w∈[−ξ,ξ]} b(y + w; θ) and (y, θ) ↦ max_{w∈[−ξ,ξ]} b(y + w; θ).
(B8) The order of differentiation and Q_θ-expectation can be interchanged for p̂_∆(v, w; θ) for the n-th partial derivative taken with respect to θ; i.e., for i_1, . . . , i_n ∈ {1, . . . , r},

∂^n_{i_1,...,i_n} E^{Q_θ}[p̂_∆(v, w; θ)] = E^{Q_θ}[ ∂^n_{i_1,...,i_n} p̂_∆(v, w; θ) ].

Then the mapping θ ↦ p̂_∆(v, w; θ) is almost surely n-times continuously differentiable for any v, w ∈ S. In addition, any n-th partial derivative of p̂_∆(v, w; θ) with respect to θ is an unbiased estimator of the corresponding partial derivative of p_∆(v, w; θ). That is, for i_1, . . . , i_n ∈ {1, . . . , r},

E^{Q_θ}[ ∂^n_{i_1,...,i_n} p̂_∆(v, w; θ) ] = ∂^n_{i_1,...,i_n} p_∆(v, w; θ).
The assumptions of Proposition 3.2 together with Assumption (B7) imply
that the function f̂ in (12) is n-times continuously differentiable with respect
to all its arguments. This is a necessary condition for the differentiability
of the density estimator p̂∆ . Assumption (B8), again formulated loosely,
ensures the unbiasedness of a derivative estimator.
5. Implementation of density estimator. This section explains the
practical implementation of the transition density estimator (13). The algorithms stated below have been implemented in R and are available for
download at http://people.bu.edu/gas.
Fix L, ℓ > 0 and ρ ∈ R. To generate a sample of p̂_∆(v, w; θ), we require a vector R = (P, T, E, W, V, D) of variates with the following properties:

• P ∼ Poisson(ℓ∆) is a sample of the jump count N_∆ under Q_θ
• T = (T_n)_{1≤n≤P} is a sample of the jump times (T_n)_{1≤n≤N_∆} under Q_θ
• E = (E^k_n)_{1≤n≤P+1, k≥1} is a collection of i.i.d. samples of the exit time η
• W = (W^n_i)_{1≤n≤P+1, i≥1} is a collection of i.i.d. uniforms on {−L, L}
• V = (V^n_{i,j})_{1≤n≤P+1, i,j≥1} is a collection of i.i.d. standard uniforms
• D = (D_n)_{1≤n≤P} is a collection of i.i.d. samples from the density π

The variates P and D can be sampled exactly using the inverse transform method. The Poisson jump times T can be generated exactly as the order statistics of P uniforms on [0, ∆]. The collection E of exit times can be generated exactly using an acceptance-rejection scheme; see Section 4.1 of Chen and Huang (2013). This scheme uses gamma variates. The sampling of the remaining variates is trivial.
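A minimal R sketch of this step follows; sample_eta stands in for the acceptance-rejection exit-time sampler of Chen and Huang (2013), sample_D for the inverse-transform sampler of π, and the cap n_max on the stored collections is an implementation convenience. All three are our assumptions, not prescriptions of the paper.

    # Draw the basic random variates R = (P, T, E, W, V, D) of Section 5.
    draw_variates <- function(ell, Delta, L, sample_eta, sample_D, n_max = 50) {
      P <- rpois(1, ell * Delta)                     # jump count under Q_theta
      list(
        P = P,
        T = sort(runif(P, 0, Delta)),                # order statistics of P uniforms
        E = matrix(replicate((P + 1) * n_max, sample_eta(L)), nrow = P + 1),
        W = matrix(sample(c(-L, L), (P + 1) * n_max, replace = TRUE), nrow = P + 1),
        V = array(runif((P + 1) * n_max * n_max), dim = c(P + 1, n_max, n_max)),
        D = if (P > 0) replicate(P, sample_D()) else numeric(0)
      )
    }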
Algorithm 5.1 (Computation of Density Estimator). For given v, w ∈ S and θ ∈ Θ, do:

(i) Set Y^x_0 = x = F(v; θ) and y = F(w; θ).
(ii) For n = 1, . . . , P, do:
(a) Draw samples of Y^x_{T_n−} and Y^x_{T_n} under Q_θ according to (5). Compute the quantity f̂(Y^x_{T_{n−1}}, Y^x_{T_n−}, T_n − T_{n−1}; θ) according to (12).
(iii) Draw samples of Y^x_{T_P} and Y^x_∆ under Q_θ and compute f̂(Y^x_{T_P}, Y^x_∆, ∆ − T_P; θ).
(iv) Compute the density estimator p̂_∆(v, w; θ) as

( f̂(Y^x_{T_P}, Y^x_∆, ∆ − T_P; θ) / ( √(2π(∆ − T_P)) σ(w; θ) ) ) exp( a(F(w; θ); θ) − a(F(v; θ); θ) − ( y − ρ(∆ − T_P) − Y^x_{T_P} )² / ( 2(∆ − T_P) ) )
    × ∏_{n=1}^{P} c(Y^x_{T_n−}, D_n; θ) f̂(Y^x_{T_{n−1}}, Y^x_{T_n−}, T_n − T_{n−1}; θ).
Only Steps (ii)a and (iii) of Algorithm 5.1 are nontrivial. The following
algorithm details the implementation of these steps.
Algorithm 5.2 (Sampling Y^x_{T_n−} and Y^x_{T_n} and computing f̂(Y^x_{T_{n−1}}, Y^x_{T_n−}, T_n − T_{n−1}; θ)). Fix Y^x_{T_{n−1}}, T_{n−1}, T_n, and D_n. Let I = max{i ≥ 1 : E_1 + · · · + E_i ≤ T_n − T_{n−1}} and set w_{1,0} = Y^x_{T_{n−1}} and w_{i,0} = w_{i−1,0} + ρE_{i−1} + W_{i−1} for i = 2, . . . , I + 1. For i = 1, . . . , I + 1, do:

(i) Compute

m_i = min_{|w|≤L+|ρ|E_i} b(w_{i,0} + w; θ),
M_i = max_{|w|≤L+|ρ|E_i} b(w_{i,0} + w; θ) − m_i.

(ii) Draw samples of the jump times τ_{i,1}, . . . , τ_{i,p_i} of a dominating Poisson process with rate M_i. Set τ_{i,0} = 0 and τ_{i,j} = τ_{i,j−1} − (log V_{i,j})/M_i for j ≥ 1 while τ_{i,j} ≤ E_i. Set

p_i = max{j ≥ 0 : τ_{i,j} ≤ E_i} for i = 1, . . . , I, and
p_i = max{j ≥ 0 : τ_{i,j} ≤ T_n − T_{n−1} − ∑_{i=1}^{I} E_i} for i = I + 1.

(iii) Compute a skeleton w_{i,1}, . . . , w_{i,p_i} of a Brownian bridge reaching from 0 at time 0 to W_i at time E_i. For i = I + 1, compute the additional skeleton point w^− at time T_n − T_{n−1} − ∑_{i=1}^{I} E_i.

(iv) Compute the normalizing factor

e_i = exp(−m_i E_i) for i = 1, . . . , I, and
e_i = exp( −m_i ( T_n − T_{n−1} − ∑_{i=1}^{I} E_i ) ) for i = I + 1.

A sample of Y^x_{T_n−} is given by y^− = w_{I+1,0} + w^− + ρ( T_n − T_{n−1} − ∑_{i=1}^{I} E_i ). A sample of Y^x_{T_n} is given by y^+ = y^− + Γ_Y(y^−, D_n; θ). A sample of f̂(Y^x_{T_{n−1}}, Y^x_{T_n−}, T_n − T_{n−1}; θ) is given by

∏_{i=1}^{I+1} e_i ∏_{j=1}^{p_i} ( 1 − ( b(w_{i,0} + w_{i,j} + ρτ_{i,j}; θ) − m_i ) / M_i ).
The correctness of Algorithm 5.2 follows from Theorem 4.1 after noting
that an exact sample of the first jump time of a Poisson process with rate
Mi (θ) is given by − log(U )/Mi (θ), where U is standard uniform. This observation is used in Step (ii). The skeleton of a Brownian bridge required in
Step (iii) can be sampled exactly using the procedure outlined in Section 6.4
of Beskos, Papaspiliopoulos and Roberts (2009), which constructs a skeleton
in terms of three independent sequences of standard normals.
Note that the vector of basic random variates R used in the algorithms
above does not depend on the arguments (v, w; θ) of the density estimator
(13). Thus, a single sample of R suffices to generate samples of p̂∆ (v, w; θ)
for any v, w ∈ S and θ ∈ Θ. This property generates significant computational benefits for the maximization of the simulated likelihood based on the
estimator (13); see Section 6.
We discuss the optimal selection of the level L, the Poisson rate ℓ, and the drift ρ. While these parameters have no impact on the unbiasedness of the density estimator, they influence the variance and computational efficiency of the estimator. The Poisson rate ℓ governs the frequency of the jumps of X under Q_θ. If ℓ is small, then T_{N_∆} ≈ 0 with high Q_θ-probability and the estimator (13) approximates the density p_∆ as a weighted Gaussian density, ignoring the jumps of X. Thus, the estimator (13) has large variance in the tails. On the other hand, the computational effort required to evaluate the estimator increases with ℓ. This is because E^{Q_θ}[P] = ℓ∆, so that Step (ii)a of Algorithm 5.1 is repeated more frequently for large values of ℓ.
The level parameter L controls the number of iterations I +1 in Algorithm
5.2. Note that Q_θ[|Y^x_t| > L] → 0 as L → ∞ for any fixed 0 ≤ t ≤ ∆ because Y^x is non-explosive under Q_θ. Thus, the larger L, the smaller I, and the
fewer iterations of Algorithm 5.2 are needed to compute (12). On the other
hand, large values of L make Mi (θ) large, which increases pi in (12) for
all 1 ≤ i ≤ I + 1. This, in turn, increases both the variance of the thinning
estimator (12) and the computational effort required to evaluate it. Similarly,
large positive or negative values of ρ also make Mi (θ) large, increasing both
the variance of the density estimator and the computational effort necessary
to evaluate it.
We propose to choose the quantities ℓ, L, and ρ so as to optimally trade off computational effort and variance. We adopt the efficiency concept of Glynn and Whitt (1992) for simulation estimators, defining efficiency as the inverse of the product of the variance of the estimator and the work required to evaluate the estimator. Thus, we select ℓ, L, and ρ as the solution of the optimization problem

(14)    min_{ℓ,L>0, ρ∈R}  max_{v,w∈S, θ∈Θ}  E^{Q_θ}[ p̂_∆(v, w; θ)² ] × R(v, w; θ),

where R(v, w; θ) is the time required to compute the estimator for given v, w, θ, ℓ, L, and ρ. The solution of this optimization problem leads to a density estimator that is efficient across the state and the parameter spaces.[6]
The problem (14) is a non-linear optimization problem with constraints, which can be solved numerically using standard methods. A run of Algorithm 5.1 yields R(v, w; θ) for given v, w, θ, ℓ, L, and ρ. An unbiased estimator of the second moment E^{Q_θ}[p̂_∆(v, w; θ)²] can be evaluated using a variant of Algorithm 5.1.

[6] One could also choose "locally" optimal parameters by solving min_{ℓ,L>0, ρ∈R} E^{Q_θ}[p̂_∆(v, w; θ)²] R(v, w; θ) for each (v, w; θ). However, this may be computationally burdensome.
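A sketch of how (14) might be attacked in R is given below. The helper estimate_work_variance is hypothetical: it is assumed to return a Monte Carlo estimate of the worst-case product of second moment and run time over a user-chosen grid of (v, w, θ). The positivity constraints on ℓ and L are enforced by optimizing on the log scale.

    # Tune (ell, L, rho) by minimizing an estimate of the objective in (14).
    tune_estimator <- function(estimate_work_variance, init = c(10, 1, 0)) {
      obj <- function(par) {
        estimate_work_variance(ell = exp(par[1]), L = exp(par[2]), rho = par[3])
      }
      res <- optim(c(log(init[1]), log(init[2]), init[3]), obj,
                   method = "Nelder-Mead")
      list(ell = exp(res$par[1]), L = exp(res$par[2]), rho = res$par[3])
    }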
6. Simulated likelihood estimators. This section analyzes the asymptotic behavior of the simulated likelihood estimator of the parameter θ of the jump-diffusion process X. Let p̂^K_∆ be a transition density estimator based on K ∈ N Monte Carlo samples of (13). The simulated likelihood function of θ at the data X is given by L̂^K(θ) = ∏_{n=1}^{m} p̂^K_∆(X_{(n−1)∆}, X_{n∆}; θ); this is the Monte Carlo counterpart of L(θ) in Section 2. A simulated maximum likelihood estimator (SMLE) θ̂^K_m satisfies θ̂^K_m ∈ arg max_{θ∈Θ} L̂^K(θ) almost surely.
Conveniently, the maximization of the simulated likelihood is effectively a deterministic optimization problem. We draw K Monte Carlo samples of the basic random variate R to construct the density estimator p̂^K_∆(v, w; θ); see Section 5. Because p̂^K_∆(v, w; θ) is a deterministic function of (v, w, θ) given the samples of R, it can be evaluated at various data points (v, w) = (X_{(n−1)∆}, X_{n∆}) without re-simulation. Thus, given the samples of R and the data X, the likelihood L̂^K(θ) is a deterministic function of the parameter θ. There is no need to re-simulate the likelihood during the optimization, eliminating the need to deal with a simulation optimization problem.
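The following R sketch mirrors this structure; p_hat_given_R, an implementation of Algorithm 5.1 for a single draw of the basic variates, is assumed to be supplied by the user, and the constraints defining Θ are omitted for brevity.

    # Simulated maximum likelihood with common random variates: the K draws in
    # R_draws are fixed up front, so neg_log_lik is a deterministic function of
    # theta and can be handed to a standard optimizer.
    smle <- function(X, R_draws, p_hat_given_R, theta0) {
      neg_log_lik <- function(theta) {
        m <- length(X) - 1
        ll <- 0
        for (i in 1:m) {
          p_K <- mean(vapply(R_draws,
                             function(R) p_hat_given_R(X[i], X[i + 1], theta, R),
                             numeric(1)))
          ll <- ll + log(p_K)
        }
        -ll
      }
      optim(theta0, neg_log_lik, method = "Nelder-Mead")
    }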
We study the properties of a SMLE θ̂^K_m. We first establish that θ̂^K_m is asymptotically unbiased. A sufficient condition for this property is that lim_{K→∞} L̂^K(θ) = L(θ) almost surely and uniformly over the parameter space Θ. The strong law of large numbers and the continuous mapping theorem imply that the above convergence occurs almost surely in our setting, but they do not provide uniformity of the convergence. We use the strong law of large numbers for random elements in separable Banach spaces to prove uniform convergence, exploiting the compactness of the parameter space Θ and the fact that L̂^K takes values in a separable Banach space (see, e.g., Beskos, Papaspiliopoulos and Roberts (2009) and Straumann and Mikosch (2006)). Conveniently, asymptotic unbiasedness implies (strong) consistency of a SMLE if a MLE is (strongly) consistent.
Theorem 6.1. Suppose the conditions of Theorem 3.1 hold. Moreover, suppose that:

(C1) For any v, w ∈ S, E[ sup_{θ∈Θ} p̂_∆(v, w; θ) ] < ∞.

Then any SMLE θ̂^K_m is an asymptotically unbiased estimator of θ̂_m, i.e., θ̂^K_m → θ̂_m almost surely as K → ∞. If a MLE θ̂_m is (strongly) consistent, then a SMLE θ̂^K_m is also a (strongly) consistent estimator of the true parameter θ* if K → ∞ as m → ∞. In other words, θ̂^K_m → θ* in P_{θ*}-probability (almost surely) as m → ∞ and K → ∞.
Theorem 6.1 states that, for a given realization of the data X, a SMLE θ̂^K_m converges to a theoretical MLE as the number of Monte Carlo samples K → ∞. This implies that a SMLE inherits the consistency of a MLE if more Monte Carlo samples are used as more data becomes available (see Appendix A for sufficient conditions for consistency of a MLE). Condition (C1) is mild, and implies that the simulated likelihood is bounded in expectation.
How many Monte Carlo samples K of the density estimator (13) need to be generated for each additional observation of X? In general, the number of samples will influence the variance of the density estimator, and this will affect the asymptotic distribution of a SMLE. Standard Monte Carlo theory asserts that the error from approximating the true transition density p_∆ by the estimator p̂^K_∆ is of order O(K^{−1/2}). Thus, the Monte Carlo error arising from using p̂^K_∆ instead of p_∆ for a single observation vanishes as K → ∞. However, the aggregate Monte Carlo error associated with the simulated likelihood function L̂^K(θ) may explode as m → ∞ if K is not chosen optimally in accordance with m. The following theorem indicates the optimal choice of K.
Theorem 6.2. Suppose the conditions of Theorem 6.1 hold, and assume the conditions of Corollary 4.2 for differentiation up to second order. In addition, suppose the following conditions hold.

(C2) The mapping θ ↦ p_∆(v, w; θ) is three-times continuously differentiable for any v, w ∈ S.
(C3) There exists a deterministic matrix Σ_{θ*} of full rank such that the following limit holds in P_{θ*}-distribution as m → ∞: (1/√m) ∇ log L(θ*) → N(0, Σ_{θ*}).
(C4) The equality Σ_{θ*} = −lim_{m→∞} (1/m) ∇² log L(θ*) holds in P_{θ*}-probability for Σ_{θ*} from Condition (C3).
(C5) For any θ ∈ int Θ, sup_{v,w∈S} Var^{Q_θ}[ ∇( p̂_∆(v, w; θ) / p_∆(v, w; θ) ) ] < ∞.

If m/K → c ∈ [0, ∞) as m → ∞ and K → ∞, then a SMLE θ̂^K_m is asymptotically normal. That is, √m (θ̂^K_m − θ*) → N(0, Σ^{−1}_{θ*}) in P_{θ*}-distribution as m → ∞ and K → ∞. On the other hand, if m/K → ∞ as m → ∞ and K → ∞, then √K (θ̂^K_m − θ*) → 0 almost surely.
Conditions (C3) and (C4) imply asymptotic normality of a MLE θ̂_m with asymptotic variance-covariance matrix Σ^{−1}_{θ*}; i.e., √m (θ̂_m − θ*) → N(0, Σ^{−1}_{θ*}) as m → ∞. A SMLE inherits this asymptotic normality property if the density estimator satisfies (C5) and m/K converges to some finite constant. There is no loss of efficiency when estimating θ using the simulated likelihood rather than the theoretical likelihood. The Monte Carlo variance does not impact the asymptotic distribution of a SMLE. If the number of Monte Carlo samples K grows fast enough, then the Monte Carlo variance vanishes in the limit as m → ∞. This is guaranteed by the choice K = O(m).

Sufficient conditions for differentiability of the density p_∆ are given in Proposition 3.2. Sufficient conditions for (C3) and (C4) are given in Appendix A. Condition (C5) is mild but necessary. It implies that the variance of the simulated score function is finite. Sufficient conditions for (C5) are given in Table 2.
That our Monte Carlo approximation of the transition density does not affect the asymptotic distribution of the estimator is a consequence of the unbiasedness of our density approximation. To appreciate this feature, consider a conventional Monte Carlo approximation of the transition density, where one first approximates X on a discrete-time grid and then applies a nonparametric kernel to the Monte Carlo samples. In the special case of a diffusion that is approximated using an Euler scheme, Detemple, Garcia and Rindisbacher (2006) show that this approach distorts the asymptotic distribution of the likelihood estimator. More precisely, letting θ̂^Euler_m denote the estimator obtained from K_m i.i.d. Monte Carlo samples of the Euler approximation of X_∆ that are based on k_m discretization steps, Theorem 12 of Detemple, Garcia and Rindisbacher (2006) implies that if K_m → ∞ and k_m → ∞ as m → ∞, then √m (θ̂^Euler_m − θ*) → N(β, Σ^Euler) as m → ∞ in P_{θ*}-distribution, where either β ≠ 0 or Σ^Euler ≠ Σ^{−1}_{θ*}. In particular, Detemple, Garcia and Rindisbacher (2006) show that β = 0 and Σ^Euler = Σ^{−1}_{θ*} cannot hold simultaneously. Thus, this approach either generates size-distorted asymptotic standard errors or is inefficient.[7] Our exact Monte Carlo approach, in contrast, facilitates efficient parameter estimation and produces correct asymptotic standard errors at the same time. It eliminates the need to discretize X, and generates an estimator that has the same asymptotic distribution as a true MLE.

[7] Efficiency can be achieved if the number of Euler discretization steps k_m is chosen according to the square-root rule of Duffie and Glynn (1995), i.e., k_m = O(√K_m). Detemple, Garcia and Rindisbacher (2006) develop an improved discretization scheme for diffusions for which β = 0. For jump-diffusions with state-independent coefficients, Kristensen and Shin (2012) show that β = 0 can be achieved under conditions on the kernel.
7. Numerical results. This section illustrates the performance of the
density and simulated likelihood estimators. We consider two alternative
models. The first is the mean-reverting interest rate model of Das (2002).
We specify the jump-diffusion SDE (1) by choosing the following functions
for θ = (κ, X̄, σ, l_0, l_1, γ_1, γ_2) ∈ R_+ × R × R³_+ × R × R_+, x ∈ S = R, and d ∈ D = R: μ(x; θ) = κ(X̄ − x), σ(x; θ) = σ, Γ(x, d; θ) = γ_1 + γ_2 d, and Λ(x; θ) = l_0 + l_1 x. The jump-diffusion X has dynamics

(15)    dX_t = κ(X̄ − X_t) dt + σ dB_t + dJ_t,

where J_t = ∑_{n=1}^{N_t} (γ_1 + γ_2 D_n) and N is a counting process with state-dependent intensity λ_t = l_0 + l_1 X_t. The marks (D_n)_{n≥1} are i.i.d. standard normal. We choose the parameter space as Θ = [0.0001, 3] × [−1, 1] × [0.0001, 1] × [0.0001, 100] × [−10, 10] × [−0.1, 0.1] × [0.0001, 0.1]. The true parameter θ* is taken as (0.8542, 0.0330, 0.0173, 54.0500, 0.0000, 0.0004, 0.0058) ∈ int Θ, the value estimated by Das (2002) from daily data of the Fed Funds rate between 1988 and 1997.[8] We take X_0 = X̄ = 0.0330.

[8] Das (2002) assumed l_1 = 0.
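In R, the coefficient functions of this specification can be written directly in the form required by (1); the sketch below stores θ as a numeric vector in the order listed above and is ours, not the packaged code.

    # Coefficient functions of the Das (2002) model (15):
    # theta = (kappa, Xbar, sigma, l0, l1, gamma1, gamma2).
    mu_das     <- function(x, theta) theta[1] * (theta[2] - x)
    sigma_das  <- function(x, theta) theta[3]
    Lambda_das <- function(x, theta) theta[4] + theta[5] * x
    Gamma_das  <- function(x, d, theta) theta[6] + theta[7] * d   # d ~ N(0, 1)

    theta_star <- c(0.8542, 0.0330, 0.0173, 54.0500, 0.0000, 0.0004, 0.0058)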
The second model we consider is the double-exponential stock price model of Kou (2002). For this model, we set θ = (μ, σ, l_0, l_1, p, η_1, η_2) ∈ R × R_+ × R²_+ × [0, 1] × R²_+, x ∈ S = R_+, d ∈ D = R²_+ × [0, 1], μ(x; θ) = μx, σ(x; θ) = σx, Λ(x; θ) = l_0 + l_1 x, and

Γ(x, d; θ) = x(e^{d_1} − 1) if d_3 < p, and Γ(x, d; θ) = x(e^{−d_2} − 1) otherwise.

This results in a jump-diffusion X with dynamics

(16)    dX_t = μ X_{t−} dt + σ X_{t−} dB_t + X_{t−} dJ_t

with J_t = ∑_{n=1}^{N_t} (e^{U_n} − 1) for a counting process N with state-dependent intensity λ_t = l_0 + l_1 X_t. The random variable U_n has an asymmetric double-exponential distribution with U_n = D^{(1)}_n if D^{(3)}_n < p, and U_n = −D^{(2)}_n otherwise, for a mark variable D_n = (D^{(1)}_n, D^{(2)}_n, D^{(3)}_n) that satisfies D^{(1)}_n ∼ Exp(1/η_1), D^{(2)}_n ∼ Exp(1/η_2), and D^{(3)}_n ∼ Unif[0, 1]. We choose Θ = [−1, 1] × [0.0001, 1] × [0.0001, 100] × [−1, 1] × [0.001, 100]². The true parameter is θ* = (0.15, 0.20, 10, 0, 0.3, 1/0.02, 1/0.04) ∈ int Θ, which is consistent with the choice of Kou (2002). We take X_0 = 10.
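A short R sketch of the jump-size mechanism of Model (16), based on the distributional assumptions just stated (rexp is parametrized by its rate, so a rate of η_1 gives D^(1) mean 1/η_1):

    # Draw one relative jump size exp(U_n) - 1 of the Kou model (16).
    sample_kou_jump <- function(p, eta1, eta2) {
      d1 <- rexp(1, rate = eta1)       # D(1), exponential with mean 1/eta1
      d2 <- rexp(1, rate = eta2)       # D(2), exponential with mean 1/eta2
      d3 <- runif(1)                   # D(3) ~ Unif[0, 1]
      U  <- if (d3 < p) d1 else -d2    # asymmetric double-exponential U_n
      exp(U) - 1
    }

    # Example at the true parameter: sample_kou_jump(0.3, 1 / 0.02, 1 / 0.04)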
Model (15) is affine in the sense of Duffie, Pan and Singleton (2000), allowing us to compute the “true” transition density of X by Fourier inversion
of the characteristic function of X. For Model (16), only the process log X
is affine. We can nonetheless recover the transition density of log X in semi-analytical form via Fourier inversion, and then compute the density of X via
a change of variables. The density estimators derived from Fourier inversion
in these ways serve as benchmarks for Models (15) and (16).
We implement the Fourier inversion via numerical quadrature with 10³ discretization points in [−10³, 10³]. The characteristic functions of X in Model (15) and of log X in Model (16) are known in closed form if l_1 = 0. When l_1 ≠ 0, the characteristic functions solve sets of ordinary differential equations that need to be solved numerically. We use a Runge-Kutta method based on 50 time steps to numerically solve these differential equations.
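For concreteness, the inversion step can be sketched in R as follows; cf is a placeholder for the conditional characteristic function u ↦ E[e^{iuX_∆} | X_0 = v], obtained in closed form or from the ODEs mentioned above, and the quadrature mirrors the setup just described.

    # Benchmark transition density by Fourier inversion of a characteristic
    # function cf(u, v, Delta, theta), using a simple rectangle rule with
    # n_nodes points on [-u_max, u_max].
    density_fourier <- function(w, v, Delta, theta, cf,
                                n_nodes = 1000, u_max = 1000) {
      u  <- seq(-u_max, u_max, length.out = n_nodes)
      du <- u[2] - u[1]
      Re(sum(exp(-1i * u * w) * cf(u, v, Delta, theta)) * du) / (2 * pi)
    }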
The numerical results reported below are based on an implementation in R, running on a 2×8-core 2.6 GHz Intel Xeon E5-2670 server with 128 GB of memory at Boston University, under a Linux CentOS 6.4 operating system. All R codes are available for download at http://people.bu.edu/gas.
7.1. Transition density estimator. We begin by evaluating the accuracy of the unbiased density (UD) estimator. Figures 1, 2, and 3 show p̂^K_∆(X_0, w; θ*) for Models (15) and (16), ∆ ∈ {1/12, 1/4, 1/2}, and each of several K, along with pointwise empirical confidence bands obtained from 10³ bootstrap samples.[9] We compare the UD estimator with several alternatives:

• A Gaussian kernel estimator obtained from K samples of X_∆ generated with the exact method of Giesecke and Smelov (2013).
• A Gaussian kernel estimator obtained from K samples of X_∆ generated with the discretization method of Giesecke and Teng (2012). The number of discretization steps is taken as √K, as suggested by the results of Duffie and Glynn (1995).

[9] We find that the solutions of (14) yielding optimal configurations of the density estimator are ℓ = 58.18, L = 3.65, and ρ = 0.02 for Model (15) and ℓ = 21.92, L = 7.38, and ρ = 0.06 for Model (16). The optimizations were solved using the Nelder-Mead method.
The UD estimator oscillates around the true transition density, which is governed by the expectation Ψ_∆ in (7). The expectation Ψ_∆ can be viewed as a weighted sum of infinitely many normal densities with variance ∆ − T_{N_∆}. The UD estimator approximates this infinite sum by a finite sum. The oscillations of p̂^K_∆ correspond to the normal densities that are mixed by the UD estimator (13). The amplitude of the oscillations of p̂^K_∆ is large for small values of K.
[Figure 1. Estimator p̂^K_∆(X_0, w; θ*) for Model (15) with K ∈ {1000, 5000, 10000}, w ∈ [0, 0.1], and ∆ = 1/12, along with 90% bootstrap confidence bands, as well as the true transition density and kernel estimators obtained from Euler's method and exact simulation.]
As K increases, the amplitude of the oscillations decreases and
the confidence bands become tight. This confirms the unbiasedness result of
Theorem 4.1. If the number K of Monte Carlo samples is sufficiently large,
the UD estimator p̂^K_∆ accurately approximates the transition density over
the entire range of X. In contrast, both kernel density estimators are biased.
The biases are relatively large in the tails of the distribution; see Figure 2.
[Figure 2. Estimator p̂^K_∆(X_0, w; θ*) for Model (16) with K ∈ {1000, 5000, 10000}, w ∈ [0, 0.1], and ∆ = 1/12, along with 90% bootstrap confidence bands, as well as the true transition density and kernel estimators obtained from Euler's method and exact simulation. The y-axis is displayed in log scale.]

They are also large when the time ∆ between consecutive observations is large, as can be seen in Figure 3.
By construction, our density estimator respects the boundary of the state
space S. That is, our density estimator assigns no probability mass to values
outside of S. Figure 4 illustrates this property for Model (16), in which
X has the state space S = [0, ∞), which is bounded from below. The 90% confidence bands of
p̂_∆(X_0, w; θ*) are tight and close to zero for very small values of w regardless of the size of X_0.

[Figure 3. Estimator p̂^K_∆(X_0, ·; θ*) for Model (16) with K = 1000, X_0 = 10, and ∆ ∈ {1/4, 1/2}, along with 90% bootstrap confidence bands, as well as the true transition density and kernel estimators obtained from Euler's method and exact simulation. The y-axes of the right plots are displayed in log scale.]

Although not displayed in Figure 4, we know that the
Gaussian kernel estimator derived from exact samples of X∆ will also restrict
itself to S. If the samples are derived from Euler’s method, then the Gaussian
kernel estimator may not satisfy this property because Euler’s method does
not ensure that the approximate solution of the SDE (16) will stay within
the state space S.
Figure 4 also shows that our density estimator is always non-negative.
This property holds because our estimator is a mixture of Gaussian densities
and the weights are non-negative. Although not displayed, the kernel density
estimators are also always non-negative. However, Figure 4 shows that the
Fourier inverse estimator may become negative for extreme values of the
state space S. This occurs because the numerical inversion of the Fourier
transform may become numerically unstable in certain situations.
7.2. Computational efficiency. We run three experiments to assess the
computational efficiency of our UD estimator. We start by analyzing the convergence of the root-mean-squared error (RMSE) of the estimator p̂^K_∆(v, w; θ) at a random parameter θ and 120 randomly selected points (v, w). The bias of the RMSE is computed relative to the "true" density obtained by Fourier inversion. We take ∆ = 1/12 for Model (15) and ∆ = 1/4 for Model (16). Repeated calculations of the transition density at the points (v, w) = (X_{(n−1)∆}, X_{n∆}) are required for evaluating the likelihood. Thus, our analysis also indicates the efficiency of computing the likelihood given 10 years of monthly data for Model (15), and 30 years of quarterly data for Model (16). Figure 5 shows the RMSE of p̂^K_∆(v, w; θ) as a function of the time required to evaluate the estimator at a randomly selected θ ∈ Θ and 120 randomly selected pairs (v, w) ∈ [0, 0.1]². It also shows the RMSE for the alternative estimators discussed above. The UD estimator has the fastest error convergence rate. It also has the smallest RMSE when the time between observations is small or when the available computational budget is large, consistent with the asymptotic computational efficiency concept of Glynn and Whitt (1992).

[Figure 4. Estimator p̂^K_∆(X_0, ·; θ*) for Model (16) with K ∈ {1000, 5000, 10000}, X_0 ∈ {0.1, 0.01, 0.001, 0.0001}, and ∆ = 1/12, along with 90% bootstrap confidence bands, as well as the true transition density.]
Next, we study the computational effort required to estimate the likelihood for different sample sizes. To this end, we select a random parameter θ ∈ Θ and m random pairs (v, w) ∈ [0, 0.1]², and measure the average time it takes to evaluate p̂^K_∆(v, w; θ) across all m pairs (v, w) for ∆ = 1/12 and K = 10³.
[Figure 5 appears here: panel (a) Model (15), ∆ = 1/12; panel (b) Model (16), ∆ = 1/4. Both panels plot RMSE (log scale) against run time (in minutes, log scale) for the UD, Kernel (Exact), and Kernel (Euler) estimators at increasing values of K.]

Fig 5. Root-mean-squared error (RMSE) of different density estimators as a function of computation time. The RMSE is the square root of the average squared error of an estimator of p_∆(v, w; θ) over 120 randomly selected pairs (v, w) and a randomly selected parameter θ. For Model (15) with ∆ = 1/12, we randomly select (v, w) ∈ [0, 0.1]² and θ = (0.9924, 0.0186, 0.0345, 32.6581, 2.3996, 0.0006, 0.0039) ∈ Θ. For Model (16) with ∆ = 1/4, we randomly select (v, w) ∈ [5, 15]² and θ = (−0.1744, 0.2342, 28.4388, 0.5418, 0.9278, 80.8277, 69.6841) ∈ Θ.
Figure 6 shows the average run time as a function of m for Model (15). It also
compares our average run times to the average run times associated with
alternative density estimators. Our density estimator involves the smallest
per-observation cost when evaluating the likelihood for a given m. Further,
the per-observation cost decreases as m grows. Similar findings hold for
Model (16), and for alternative choices of ∆.
The UD estimator performs well in these first two experiments for two reasons. First, the samples of the basic random variates R used to compute the density estimator (see Section 5) need to be generated only once.
[Figure 6 appears here: per-observation run time (in seconds, log scale) against sample size m (log scale) for the UD, Kernel (Exact), Kernel (Euler), and Fourier inversion estimators.]

Fig 6. Per-observation run time required to compute the likelihood of Model (15) with different density estimators as a function of the sample size m. The per-observation run time is measured as the average time necessary to estimate the density p_∆(v, w; θ) at m randomly selected pairs (v, w) ∈ [0, 0.1]² for ∆ = 1/12 and θ = (0.9924, 0.0186, 0.0345, 32.6581, 2.3996, 0.0006, 0.0039) selected randomly from Θ. We take K = 1000 for the UD estimator. For the Kernel (Euler) estimator, we use 1000 Euler samples of X_∆ so as to achieve roughly the same RMSE (see Figure 5). Due to computational constraints, for the Kernel (Exact) estimator we use only 500 exact samples of X_∆.
They can be reused to compute p̂^K_∆(v, w; θ) for any v, w ∈ S and any θ ∈ Θ. This yields the small and decreasing per-observation cost of estimating the likelihood. Second, our density estimator is unbiased. The alternative estimators do not share both of these properties. Kernel density estimation introduces bias, which slows down the rate of convergence of the RMSE. Euler discretization introduces additional bias; as a result, the discretization-based density estimator has the slowest rate of convergence. For the other simulation-based estimator, which is based on exact samples of X_∆, the samples of the process must be regenerated for every pair (v, w) at which the transition density is estimated, which increases computational costs. Finally, the Fourier density estimator of Duffie, Pan and Singleton (2000) is essentially error-free. However, evaluating it requires numerically solving a system of ordinary differential equations for each pair (v, w), which also increases computational costs.
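In code, the reuse of the basic variates amounts to drawing them once and closing over them, as in the following schematic; `draw_basic_variates` and `density_given_R` are hypothetical stand-ins for the sampling step of Section 5 and the per-sample density evaluation.

```python
import numpy as np

def make_density_estimator(draw_basic_variates, density_given_R, K, seed=0):
    """Build a density estimator from K cached draws of the basic variates R.

    The draws are generated once and reused for every evaluation point
    (v, w) and parameter theta, which is what produces the small and
    decreasing per-observation cost of the UD estimator."""
    rng = np.random.default_rng(seed)
    R_samples = [draw_basic_variates(rng) for _ in range(K)]  # generated once

    def p_hat(v, w, theta):
        # Average the unbiased per-sample density values over the cached draws.
        return float(np.mean([density_given_R(v, w, theta, R) for R in R_samples]))

    return p_hat
```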
A Monte Carlo estimator of the density has the additional benefit that it can be computed in parallel using multiple processors.
Table 1
Run times of the different operations necessary to compute the density estimator p̂^K_∆(v, w; θ) for Model (15) at a random parameter θ ∈ Θ and 120 random (v, w) ∈ [0, 0.1]² using K_p processors.

                                                       Processors K_p
                                                      1       2       4       6
Generating K i.i.d. samples of R                   9.84    9.84    9.84    9.84
Computing one sample of p̂_∆ given R                0.10    0.10    0.10    0.10
Computing p̂^K_∆ given K samples of R (K = 1000)   98.86   53.64   31.93   27.19
Total run time in seconds                         108.70   63.48   41.77   37.03
Speed-up factor                                       —     1.71    2.60    2.94
If a density estimator requires K i.i.d. Monte Carlo samples and K_p processors are available, then each processor only needs to compute K/K_p Monte Carlo samples. We analyze the computational gains generated by computing our density estimator in parallel. For Model (15), we measure the time it takes to generate K = 1000 i.i.d. samples of the basic random variates R and compute p̂_∆(v, w; θ) in parallel at a random parameter θ ∈ Θ and 120 random pairs (v, w) ∈ [0, 0.1]² using K_p ∈ {1, 2, 4, 6} processors. Table 1 shows that parallelization significantly reduces the run time necessary to evaluate our density estimator. Increasing the number of processors from 1 to 4 reduces the total run time by a factor of 2.6. There are further ways to reduce the run time required to evaluate our UD estimator; graphics processing units (GPUs) may be used, for example.
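A minimal sketch of this splitting, assuming the per-sample density evaluation is available as a module-level (and therefore picklable) function `density_given_R`; each of the `num_workers` processes handles roughly K/K_p of the cached draws.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def _chunk_mean(args):
    """Average the per-sample density values over one chunk of cached draws."""
    density_given_R, R_chunk, v, w, theta = args
    return float(np.mean([density_given_R(v, w, theta, R) for R in R_chunk]))

def p_hat_parallel(density_given_R, R_samples, v, w, theta, num_workers=4):
    """Split the K cached draws across num_workers processes and combine
    the chunk means, mirroring the Table 1 experiment."""
    chunks = [R_samples[i::num_workers] for i in range(num_workers)]
    tasks = [(density_given_R, chunk, v, w, theta) for chunk in chunks if chunk]
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        means = list(pool.map(_chunk_mean, tasks))
    sizes = [len(t[1]) for t in tasks]
    # Weight each chunk mean by its chunk size to recover the overall mean.
    return float(np.average(means, weights=sizes))
```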
7.3. Simulated likelihood estimators. A Monte Carlo analysis illustrates
the properties of the simulated maximum likelihood estimators (SMLE).
We generate 100 samples of the data X = {X0 , X∆ , . . . , Xm∆ } from the
law Pθ∗ for m = 600 and ∆ = 1/12 for Model (15), and m = 400 and
∆ = 1/4 for Model (16) using the exact algorithm of Giesecke and Smelov
(2013). This corresponds to 50 years of monthly data for Model (15) and 100
years of quarterly data for Model (16). For each data sample, we compute
an SMLE θ̂^K_m by maximizing the simulated likelihood L̂^K, and an MLE θ̂_m by maximizing the likelihood obtained from the true transition density.
The Nelder-Mead method, initialized at θ∗ , is used to numerically solve the
optimization problems. In accordance with Theorem 6.2, we choose K =
10m for Model (15) and K = 15m for Model (16) in order to guarantee
asymptotic normality of the SMLE.
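As a sketch of this optimization step, assuming a simulated density `p_hat(v, w, theta)` built from cached draws as above, the negative simulated log-likelihood can be handed to scipy's Nelder-Mead routine; `theta_init` plays the role of the starting point θ∗.

```python
import numpy as np
from scipy.optimize import minimize

def neg_simulated_loglik(theta, data, p_hat):
    """Negative simulated log-likelihood of observations X_0, X_∆, ..., X_{m∆}."""
    dens = np.array([p_hat(data[n - 1], data[n], theta) for n in range(1, len(data))])
    if np.any(dens <= 0.0):
        return np.inf  # guard against numerically non-positive density values
    return -float(np.sum(np.log(dens)))

def smle(data, p_hat, theta_init):
    """Compute an SMLE by maximizing the simulated likelihood with Nelder-Mead."""
    result = minimize(neg_simulated_loglik, np.asarray(theta_init, dtype=float),
                      args=(data, p_hat), method="Nelder-Mead")
    return result.x
```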
We verify the conditions that imply consistency and asymptotic normality. We first check that the conditions of Appendix A for consistency and asymptotic normality of an MLE are satisfied.
Table 2
Sufficient conditions for consistency and asymptotic normality as stated in Theorems 6.1 and 6.2, and Appendix A. We choose K = 1000 and m = 600. Here, (p̂^(k)_∆)_{1≤k≤K} are i.i.d. samples of the density estimator p̂_∆, and (X_{n∆})_{0≤n≤m} is generated from P_θ.

Panel A
Condition (A2). Proxy: sup_{θ∈Θ, 1≤n≤m} p̂^K_∆(X_{(n−1)∆}, X_{n∆}; θ) < ∞. Satisfied by: Models (15) and (16).
Condition (A3). Proxy: sup_{θ∈Θ, 1≤i≤r, 1≤n≤m} (∂_i p̂^K_∆(X_{(n−1)∆}, X_{n∆}; θ))² < ∞. Satisfied by: Models (15) and (16).
Condition (A4). Proxy: sup_{θ∈Θ, 1≤i,j≤r, 1≤n≤m} ( ∂²_{i,j} p̂^K_∆(X_{(n−1)∆}, X_{n∆}; θ) / p̂^K_∆(X_{(n−1)∆}, X_{n∆}; θ) )² < ∞. Satisfied by: Models (15) and (16).
Condition (A6). Proxy: inf_{θ∈Θ} 1{Σ̂^(m)_θ is positive definite} = 1 for Σ̂^(m)_θ as defined in (21) in Appendix A. Satisfied by: Models (15) and (16).
Condition (C5). Proxies: sup_{θ∈Θ, v,w∈[0,0.1]²} [ (1/K) Σ_{k=1}^K (p̂^(k)_∆(v, w; θ))² ] / [ (1/K) Σ_{k=1}^K p̂^(k)_∆(v, w; θ) ]² < ∞ and sup_{θ∈Θ, 1≤i≤r, v,w∈[0,0.1]²} [ (1/K) Σ_{k=1}^K (∂_i p̂^(k)_∆(v, w; θ))² ] / [ (1/K) Σ_{k=1}^K ∂_i p̂^(k)_∆(v, w; θ) ]² < ∞. Satisfied by: Models (15) and (16).
We can verify these conditions using our density estimator p̂^K_∆. The analysis in Section 7.1 indicates that the density estimator is a finite mixture of Gaussian densities. As a result, our density estimator is three-times continuously differentiable, and Assumptions (A1) and (A5) of Appendix A are valid given that Θ is compact. Table 2 summarizes our procedure for evaluating the remaining conditions of Appendix A, and indicates that these conditions are satisfied. As a result, an MLE is consistent and asymptotically normal. Next, we verify the conditions of Theorems 6.1 and 6.2 implying consistency and asymptotic normality of an SMLE. Table 2 shows that Condition (C5) is satisfied. Condition (C1) is naturally satisfied because Θ is compact and p̂_∆ is continuous. Conditions (C2)-(C4) are also satisfied, as implied by the conditions of Appendix A. Thus, an SMLE is also consistent and asymptotically normal, with the same variance-covariance matrix as an MLE.
We test the asymptotic unbiasedness of an SMLE (Theorem 6.1). Table 3 compares the average deviation E[θ̂^K_m − θ̂_m] of an SMLE from an MLE to the average deviation (E[(θ̂_m − θ∗)²])^{1/2} of an MLE from the true parameter, for Model (15) with ∆ = 1/12 in Panel A, and Model (16) with ∆ = 1/4 in Panel B. The expectations are estimated by sample averages across all 100 data sets. The values in Table 3 show that the average "error" of an SMLE is small compared to the average error of an MLE. The null hypothesis that the error of an SMLE is equal to zero cannot be rejected for any model parameter over any horizon, based on the asymptotic distribution implied by Theorem 6.2. This verifies the asymptotic unbiasedness property.
Table 3
Average deviation of an SMLE from an MLE and average deviation of an MLE from the true parameter θ∗ over all 100 data samples for Models (15) and (16).

Panel A: Model (15), ∆ = 1/12
             m = 120, K = 1200                          m = 600, K = 6000
      E[θ̂^K_m − θ̂_m]  (E[(θ̂_m − θ∗)²])^{1/2}   E[θ̂^K_m − θ̂_m]  (E[(θ̂_m − θ∗)²])^{1/2}
k          0.0867            0.3627                  0.2418            0.4296
X̄          0.0093            0.1739                 −0.0506            0.1161
σ          0.0071            0.0173                  0.0088            0.0154
l0         0.1192            3.2255                  0.4691            2.5916
l1        −1.9035            3.7097                 −2.4162            4.1564
γ1         0.0000            0.0019                  0.0005            0.0008
γ2         0.0012            0.0027                  0.0003            0.0016

Panel B: Model (16), ∆ = 1/4
             m = 200, K = 3000                          m = 400, K = 6000
      E[θ̂^K_m − θ̂_m]  (E[(θ̂_m − θ∗)²])^{1/2}   E[θ̂^K_m − θ̂_m]  (E[(θ̂_m − θ∗)²])^{1/2}
µ          0.0934            0.3465                  0.1469            0.2987
σ          0.1259            0.1618                  0.2524            0.2293
l0        −0.4477           12.8842                 13.9345           14.2461
l1        −0.1977            0.8127                 −0.2395            0.8247
p          0.0623            0.3797                  0.2956            0.3907
η1         1.0535           14.1629                 −1.2553           18.1835
η2        −1.7469            9.4348                 −6.7299           12.1261
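The zero-error test behind Table 3 can be sketched as a simple t-test on the per-dataset deviations; this is an illustrative reconstruction, not a verbatim excerpt of our scripts.

```python
import numpy as np

def mean_zero_tstat(deviations):
    """t-statistic for H0: E[SMLE - MLE deviation] = 0, computed from the
    per-dataset deviations of one parameter across the 100 simulated samples."""
    d = np.asarray(deviations, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Under the normal approximation implied by Theorem 6.2, |t| < 1.96 means the
# null of zero SMLE error cannot be rejected at the 5% level.
```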
We analyze the finite-sample distribution of an SMLE. Tables 4 and 5 compare the mean and the standard deviation of the scaled errors √m(θ̂^K_m − θ∗) for an SMLE and √m(θ̂_m − θ∗) for an MLE across all 100 data sets for Models (15) and (16). They also indicate the theoretical asymptotic means and standard deviations of √m(θ̂^K_m − θ∗) and √m(θ̂_m − θ∗) in accordance with Theorem 6.2. For most parameters, the moments of the scaled error of an SMLE are similar to the corresponding moments of the scaled error of an MLE over all time horizons for both models. Based on the asymptotic standard errors indicated in the column "Asymp." of Table 4, we cannot reject the null hypothesis that the differences between the scaled error moments of an SMLE and an MLE are equal to zero for any value of m. This tells us that the finite-sample distribution of an SMLE is similar to the finite-sample distribution of an MLE.

Next, we consider the asymptotic distribution of an SMLE. According to Theorem 6.2, an SMLE and an MLE share the same asymptotic normal distribution with mean zero and variance-covariance matrix Σ^{-1}_{θ∗}. As a result, we expect the differences between the means and the standard deviations of √m(θ̂^K_m − θ∗) and √m(θ̂_m − θ∗) to decrease as m increases.
Table 4
Empirical means ("M") and standard deviations ("SD") of the scaled error √m(θ̂^K_m − θ∗) of an SMLE and the scaled error √m(θ̂_m − θ∗) of an MLE, over 100 independent data samples for Model (15) with ∆ = 1/12. The table also shows the differences ("Diff.") between the sample moments of the scaled errors of an SMLE and an MLE, as well as the average asymptotic moments ("Asymp.") of an SMLE in accordance with Theorem 6.2. The asymptotic mean is zero; the asymptotic standard deviations are given by the average of the square roots of the diagonal entries of Σ^{-1}_{θ∗} in Theorem 6.2 across all 100 data sets.

m = 120, K = 1200
                         SMLE      MLE      Diff.    Asymp.
√m(k̂ − k∗)        M     0.706   −0.243     0.950         0
                  SD     2.034    3.965    −1.931     4.614
√m(X̄̂ − X̄∗)        M     0.240    0.138     0.102         0
                  SD     0.409    1.900    −1.491     2.089
√m(σ̂ − σ∗)        M     0.017   −0.060     0.077         0
                  SD     0.047    0.179    −0.132     0.499
√m(l̂0 − l0∗)      M    20.252   18.946     1.306         0
                  SD    25.768   29.763    −3.996    50.588
√m(l̂1 − l1∗)      M    12.536   33.388   −20.852         0
                  SD    18.956   22.921    −3.965   110.867
√m(γ̂1 − γ1∗)      M     0.000   −0.001     0.000         0
                  SD     0.003    0.021    −0.017     0.161
√m(γ̂2 − γ2∗)      M    −0.002   −0.016     0.013         0
                  SD     0.006    0.025    −0.019     0.207

m = 600, K = 6000
                         SMLE      MLE      Diff.    Asymp.
√m(k̂ − k∗)        M    −3.266   −9.207     5.942         0
                  SD     4.753    5.026    −0.273     4.614
√m(X̄̂ − X̄∗)        M     0.276    1.507    −1.231         0
                  SD     2.734    2.376     0.358     2.089
√m(σ̂ − σ∗)        M     0.002   −0.219     0.221         0
                  SD     0.092    0.309    −0.217     0.499
√m(l̂0 − l0∗)      M    57.327   46.822    10.505         0
                  SD    66.527   42.760    23.767    50.588
√m(l̂1 − l1∗)      M    33.679   91.691   −58.011         0
                  SD    50.360   42.808     7.552   110.867
√m(γ̂1 − γ1∗)      M     0.001   −0.011     0.012         0
                  SD     0.024    0.016     0.008     0.161
√m(γ̂2 − γ2∗)      M    −0.007   −0.013     0.006         0
                  SD     0.009    0.037    −0.029     0.207
Table 5
Empirical means ("M") and standard deviations ("SD") of the scaled error √m(θ̂^K_m − θ∗) of an SMLE and the scaled error √m(θ̂_m − θ∗) of an MLE, over 100 independent data samples of Model (16) with ∆ = 1/4. When computing an SMLE, we set K = 6000 for m = 400, and K = 3000 for m = 200. The table also shows the analogous scaled errors of the parameter estimators derived from the Kernel (Euler) estimator ("Euler") as described in Section 7.1, and the average asymptotic moments ("Asymp.") of an SMLE in accordance with Theorem 6.2. The asymptotic mean is zero; the asymptotic standard deviations are given by the average of the square roots of the diagonal entries of Σ^{-1}_{θ∗} across all 100 data sets. For Kernel (Euler), we set K = 1400 for m = 200, and K = 2300 for m = 400.

Model (16), ∆ = 1/4
                 m = 200                          m = 400
            SMLE      MLE     Euler        SMLE      MLE     Euler    Asymp.
µ    M     1.469    0.148   −0.018       3.383    0.446    0.057         0
     SD    2.927    4.462    0.635       5.212    4.771    0.717    20.445
σ    M     1.381   −0.401   −0.058       4.738   −0.311   −0.039         0
     SD    2.403    1.001    0.242       5.078    0.843    0.227     6.264
l0   M    51.054   57.385    9.990     344.954   66.263    8.424         0
     SD  100.368  166.556   14.138     279.682  182.975   15.685   181.520
l1   M    −1.703    1.092    0.862      −4.696    0.094    0.601         0
     SD    2.350   11.421    1.503       5.806   15.617    2.075   216.612
p    M     1.049    0.167    0.197       5.335   −0.578    0.114         0
     SD    3.141    4.749    0.810       6.492    5.500    0.984    43.010
η1   M     6.588   −8.310   17.309     −86.325  −61.220   36.686         0
     SD   62.247  194.367   23.556     245.855  320.415   36.806   338.527
η2   M     2.827   27.533   24.319    −101.198   33.399   21.623         0
     SD   60.441  130.804   26.252     228.143  172.182   31.143   127.635
We find that these differences decrease in value or stay roughly unchanged for most parameters as m grows from 120 to 600 in Model (15) (Table 4) and from 200 to 400 in Model (16) (Table 5). The few large differences between these moments are due to large asymptotic standard deviations, as indicated in the columns "Asymp." Thus, the distribution of √m(θ̂^K_m − θ∗) gets closer to the distribution of √m(θ̂_m − θ∗) as m grows and more data becomes available. In support of Theorem 6.2, these findings confirm that the asymptotic distribution of an SMLE is the same as that of an MLE.
For Model (16) with ∆ = 1/4, Table 5 also shows the means and standard deviations of the scaled errors of the parameter estimators derived from the Kernel (Euler) estimator; see Section 7.1.¹⁰ This method is implemented such that the evaluation of the corresponding likelihood takes the same computational effort as the evaluation of the simulated likelihood derived from our density estimator. We see that the moments of the scaled parameter errors derived from the Kernel (Euler) estimator differ strongly from those of our SMLE and the true MLE. The standard deviations are too small compared to the theoretical asymptotic distribution of an MLE. These findings suggest that the parameter estimators derived from kernel estimation based on Euler discretization have distorted finite-sample and asymptotic distributions, consistent with the findings of Detemple, Garcia and Rindisbacher (2006). Furthermore, the findings indicate that, for the same computational budget, our simulated likelihood estimators are more accurate approximations of the true likelihood estimators than the parameter estimators derived from kernel estimation based on Euler discretization.

¹⁰ We do not carry out the analogous analysis using the Kernel (Exact) estimator from Section 7.1 because it would require resimulating samples of X_∆ for every evaluation of the density, which makes the numerical optimization of the corresponding likelihood approximation unstable and imprecise.
APPENDIX A: ASYMPTOTIC PROPERTIES OF MLES
This appendix discusses the large-sample asymptotic properties of an MLE θ̂_m satisfying ∇L(θ)|_{θ=θ̂_m} = 0 as m → ∞. Suppose the true data-generating parameter θ∗ ∈ int Θ. We begin by generalizing the conditions of Bar-Shalom (1971) for consistency of an MLE. Unlike the conditions of Bar-Shalom (1971), our conditions do not require knowledge of the true data-generating parameter. They can be verified in practice using the unbiased Monte Carlo approximation to p_∆ developed in Section 3. On the other hand, they are somewhat stronger than the conditions of Bar-Shalom (1971) because they need to hold globally in the parameter space.
Proposition A.1. Suppose the following conditions are valid.

(A1) The mapping θ ↦ p_∆(v, w; θ) is three-times differentiable for any v, w ∈ S.

(A2) For any θ ∈ Θ,

    P_θ[ sup_{n≥1} p_∆(X_{(n−1)∆}, X_{n∆}; θ) < ∞ ] = 1.

(A3) For any 1 ≤ i ≤ r and θ ∈ Θ◦,

    sup_{n≥1} E_θ[ (∂_i log p_∆(X_{(n−1)∆}, X_{n∆}; θ))² ] < ∞.

(A4) For any 1 ≤ i, j ≤ r and θ ∈ Θ◦,

    sup_{n≥1} E_θ[ ( ∂²_{i,j} p_∆(X_{(n−1)∆}, X_{n∆}; θ) / p_∆(X_{(n−1)∆}, X_{n∆}; θ) )² ] < ∞.

(A5) There exists a function H : S × S → R with sup_{v,w∈S} H(v, w) < ∞ such that, for any θ ∈ Θ◦, v, w ∈ S, and 1 ≤ i, j, k ≤ r,

    |∂³_{i,j,k} log p_∆(v, w; θ)| ≤ H(v, w).

Then any MLE θ̂_m satisfying the first-order condition ∇L(θ) = 0 is consistent.
Assuming that the conditions of Proposition A.1 hold, a Taylor expansion of the first-order condition results in

(17)    0 = ∇ log L(θ̂_m) = ∇ log L(θ∗) + ∇² log L(θ∗)(θ̂_m − θ∗) + o_P(1).

If

(18)    (1/√m) ∇ log L(θ∗) → N(0, Σ_{θ∗}) in P_{θ∗}-distribution, and

(19)    (1/m) ∇² log L(θ∗) → −Σ_{θ∗} in P_{θ∗}-probability,

for

(20)    Σ_θ = lim_{m→∞} (1/m) Σ_{n=1}^m ∇ log p_∆(X_{(n−1)∆}, X_{n∆}; θ)^⊤ ∇ log p_∆(X_{(n−1)∆}, X_{n∆}; θ)

a deterministic matrix of full rank, then we can rewrite (17) as follows:

√m(θ̂_m − θ∗) = Σ^{-1}_{θ∗} (1/√m) ∇ log L(θ∗) + o_P(1).

In this case, an MLE θ̂_m is asymptotically normal with asymptotic variance-covariance matrix Σ^{-1}_{θ∗}. The following proposition provides sufficient conditions for (18)-(20).
Proposition A.2. Suppose the conditions of Proposition A.1 hold. Then, for any θ ∈ Θ◦, the limiting matrix Σ_θ in (20) exists in P_θ-probability and is deterministic. Further, Conditions (18) and (19) hold. If, in addition, the following condition holds:

(A6) For any θ ∈ Θ◦, the matrix Σ_θ in (20) is positive definite,

then Σ_θ is of full rank.
We provide sufficient conditions for differentiability in Proposition 3.2. The exact verification of Assumption (A6) is hard given the limited amount of data available in practice. One commonly analyzes only the finite-sample counterpart

(21)    Σ̂^(m)_θ = (1/m) Σ_{n=1}^m ∇ log p_∆(X_{(n−1)∆}, X_{n∆}; θ)^⊤ ∇ log p_∆(X_{(n−1)∆}, X_{n∆}; θ),

which is a consistent estimator of Σ_θ (see Greene (2008)).
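A minimal sketch of this finite-sample check; `score` is a hypothetical function returning the gradient ∇ log p_∆(X_{(n−1)∆}, X_{n∆}; θ), for instance obtained by finite differences of the simulated density.

```python
import numpy as np

def sigma_hat(score, data, theta):
    """Outer-product-of-gradients estimator of Sigma_theta as in (21),
    together with the positive-definiteness check used as the proxy for
    Assumption (A6) in Table 2."""
    grads = np.array([score(data[n - 1], data[n], theta) for n in range(1, len(data))])
    S = grads.T @ grads / grads.shape[0]
    try:
        np.linalg.cholesky(S)   # succeeds iff S is positive definite
        is_pd = True
    except np.linalg.LinAlgError:
        is_pd = False
    return S, is_pd
```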
APPENDIX B: PROOFS
Proof of Theorem 3.1. Define x = F(v; θ) and y = F(w; θ). Assumption (B1) implies that the process Y and the functions a, b, c are well-defined. The transition densities satisfy

p_∆(v, w; θ) = (1/σ(w; θ)) p^Y_∆(x, y; θ).
Assumption (B2) implies that (Z_t(θ))_{0≤t≤∆} is a martingale, so that the change of measure is well-posed with density process

dQ_θ/dP_θ |_{F_t} = Z_t(θ)

for 0 < t ≤ ∆. Without loss of generality, assume that x = F(v; θ) = 0 and write Y instead of Y^x.
Let PY denote the (SY , σ(SY ), Pθ )-law of Y∆ , i.e., PY [A] = Pθ [Y∆ ∈ A] for
A ∈ B+ . With B+ we denote the Borel σ-algebra on R+ . Further, define L
as the Lebesgue measure on (SY , σ(SY )). Then PY is absolutely continuous
with respect to L with Radon-Nikodym density
dP_Y/dL = p^Y_∆(0, Y_∆; θ)

since Y_0 = 0. In addition, define Q_Y as the (S_Y, σ(S_Y), Q_θ)-law of Y_∆. Iterated expectations and the fact that P_θ and Q_θ are equivalent measures imply that

(22)    dP_Y/dQ_Y = E^Q_θ[ 1/Z_∆(θ) | Y_∆ ].
As a consequence, the law Q_Y is also absolutely continuous with respect to L, since

p^Y_∆(0, Y_∆; θ) = dP_Y/dL = (dP_Y/dQ_Y) (dQ_Y/dL) = E^Q_θ[ 1/Z_∆(θ) | Y_∆ ] dQ_Y/dL.

Hence, the transition density of Y under Q_θ exists, i.e.,

(23)    dQ_Y/dL = q_∆(0, Y_∆; θ).
It follows that

(24)    p^Y_∆(0, y; θ) = q_∆(0, y; θ) E^Q_θ[ 1/Z_∆ | Y_∆ = y ]
                = q_∆(0, y; θ) exp(a(y; θ) − a(0; θ)) E^Q_θ[ exp(−∫_0^∆ b(Y_s; θ) ds) Π_{n=1}^{N_∆} c(Y_{T_n−}, D_n; θ) | Y_∆ = y ].
We simplify the conditional expectation in (24) using an iterative argument. Write

Φ_t(θ) = exp(−∫_0^t b(Y_s; θ) ds) Π_{n=1}^{N_t} c(Y_{T_n−}, D_n; θ).
Note that Y_{T_n} = Y_{T_n−} + Γ_Y(Y_{T_n−}, D_n; θ) for n ≥ 1 under Q_θ. By the law of iterated expectations, the strong Markov property of Y, and since no jump occurs between times T_{N_∆} and ∆, we have

E^Q_θ[ Φ_∆(θ) | Y_∆ = y ]
    = E^Q_θ[ Φ_{T_{N_∆}} E^Q_θ[ exp(−∫_{T_{N_∆}}^∆ b(Y_s; θ) ds) | F_{T_{N_∆}−}, Y_∆ = y ] | Y_∆ = y ]
    = E^Q_θ[ Φ_{T_{N_∆}} E^Q_θ[ exp(−∫_{T_{N_∆}}^∆ b(Y_s; θ) ds) | Y_{T_{N_∆}}, Y_∆ = y ] | Y_∆ = y ]
    = E^Q_θ[ Φ_{T_{N_∆}} f(Y_{T_{N_∆}}, Y_∆, ∆ − T_{N_∆}; θ) | Y_∆ = y ],
where f is defined as

(25)    f(v, w, t; θ) = E^Q_θ[ exp(−∫_0^t b(v + ρs + W_s; θ) ds) | W_t = w − v − ρt ].
b(v+ρs+Ws ;θ)ds We iterate the above argument by conditioning on the σ-algebras FTn − and
σ{Ys : s ≥ Tn+1 } for n = N∆ − 1, . . . , 1, and conclude that
"
Q
EQ
θ [Φ∆ |Y∆ = y] = Eθ f (YTN∆ , Y∆ , ∆ − TN∆ ; θ)
×
N∆
Y
n=1
#
f (YTn−1 , YTn − , Tn − Tn−1 ; θ)c(YTn − , Dn ; θ) Y∆ = y .
The above expectation is taken with respect to the distribution of the random variable R̃_∆ = (N_∆, (T_1, Y_{T_1−}, D_1), . . . , (T_{N_∆}, Y_{T_{N_∆}−}, D_{N_∆})) conditional on Y_∆ = y. The random variable R̃_∆ contains the number of jumps of Y up to time ∆, the jump times of Y in [0, ∆], the realization of the random variable D_n at each jump in [0, ∆], as well as the corresponding values of Y immediately before each jump. Note that R̃_∆ contains the same information as the path R^∆ = (R_s : 0 ≤ s ≤ ∆), where R is the càdlàg process

R_s = ( N_s, Σ_{n=1}^{N_s} Y_{T_n−}, Σ_{n=1}^{N_s} D_n ).

This process is measurable relative to the Skorohod space of càdlàg functions mapping [0, ∆] onto N × R × R with the associated Skorohod σ-algebra.
Define Q_{Y,C} as the conditional Q_θ-law of Y_∆ on (S_Y, σ(S_Y)) given R^∆. Since Y follows a standard Brownian motion between jumps and the last jump of Y before time ∆ is known given R^∆, it follows that Q_{Y,C} is absolutely continuous with respect to the Lebesgue measure L with Gaussian density:

(26)    dQ_{Y,C}/dL = q^C_∆(y | R^∆; θ) = (1/√(2π(∆ − T_{N_∆}))) exp( −(y − Y_{T_{N_∆}})² / (2(∆ − T_{N_∆})) ).
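For intuition, (26) says that, conditional on the jump data, Y_∆ is Gaussian with mean equal to the level at the last jump time and variance equal to the remaining time; a direct numerical transcription:

```python
import numpy as np

def q_C(y, y_at_last_jump, delta, t_last_jump):
    """Conditional Gaussian density (26) of Y_delta given the jump path data:
    a Brownian motion restarted at the last jump level and run for the
    remaining time delta - T_{N_delta}."""
    var = delta - t_last_jump
    return np.exp(-((y - y_at_last_jump) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
```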
In addition, write Q_R for the Q_θ-law of R^∆ and Q_{R,C} for its conditional Q_θ-law given Y_∆, both defined on the corresponding Skorohod space. Bayes' rule implies that Q_{R,C} × Q_Y = Q_{Y,C} × Q_R. We reformulate this result and obtain that the conditional law of R^∆ is absolutely continuous with respect to its unconditional law:

dQ_{R,C}/dQ_R = q^C_∆(y | R^∆; θ) / q_∆(0, y; θ).

We can do this given that the laws Q_{Y,C} and Q_Y are absolutely continuous with respect to the Lebesgue measure L. We apply this insight and obtain

(27)    E^Q_θ[ Φ_∆ | Y_∆ = y ] = E^Q_θ[ (q^C_∆(y | R^∆; θ) / q_∆(0, y; θ)) Φ_∆ ].
Putting everything together, the claim follows after merging (24) and (27):

p^Y_∆(0, y; θ) = q_∆(0, y; θ) exp(a(y; θ) − a(0; θ)) E^Q_θ[ Φ_∆ | Y_∆ = y ]
            = exp(a(y; θ) − a(0; θ)) E^Q_θ[ q^C_∆(y | R^∆; θ) Φ_∆ ].
Proof of Proposition 3.2. To prove differentiability, note that the Q_θ-distribution of Y^x is driven by a standard Brownian motion W and a Poisson process N of rate ℓ such that

Y^x_t = x + ρt + W_t + Σ_{n=1}^{N_t} Γ_Y(Y_{T_n−}, D_n; θ)

for D_n ∼ π. Thus, Y^x_t is pathwise differentiable under Assumption (B3), as in Section 7.2 of Glasserman (2003). Assumption (B4) implies that the order of differentiation and integration can be interchanged for Ψ_∆. Furthermore, Assumption (B3) implies that the functions F, σ, Γ_Y, a, b, and c are n-times continuously differentiable.
Proof of Theorem 4.1. Assumptions (B5) and (B6) together imply
that the function y 7→ b(y; θ) is continuous but not constant. Thus, the
functions mi (θ) and Mi (θ) are well defined and finite. The claim follows
along the lines of Theorem 4.3 of Chen and Huang (2013) after accounting
for the possibly non-zero drift ρ.
Proof of Corollary 4.2. Assumptions (B3) and (B7) imply that f̂(Y^x_{t_1}, Y^x_{t_2}, t_2 − t_1; θ) is n-times continuously differentiable in θ ∈ int Θ and x ∈ int S_Y except if τ_{k,p_k} = η. Nevertheless, this event occurs with zero probability. Thus, the mapping θ ↦ p̂_∆(v, w; θ) is almost surely n-times continuously differentiable. For unbiasedness, Assumption (B8) and Theorem 4.1 imply that

E^Q_θ[ ∂ⁿ_{i_1,...,i_n} p̂_∆(v, w; θ) ] = ∂ⁿ_{i_1,...,i_n} E^Q_θ[ p̂_∆(v, w; θ) ] = ∂ⁿ_{i_1,...,i_n} p_∆(v, w; θ),

where p_∆(v, w; θ) is n-times continuously differentiable according to Proposition 3.2.
Proof of Theorem 6.1. Since L̂^K is the product of the simulated transition densities as in (18), it suffices to show that the simulated transition densities converge almost surely, and uniformly in the parameter space, to the true transition densities.

Fix v, w ∈ S and t > 0. Set x = F(v; θ) and y = F(w; θ). Define the function

(28)    θ ∈ Θ ↦ H(θ) = p̂^K_∆(v, w; θ) − p_∆(v, w; θ).
Corollary 4.2 implies that H is a continuous function on the parameter space Θ. Define (C_Θ, ‖·‖) as the space of continuous functions on Θ, equipped with the supremum norm. Since Θ is compact, (C_Θ, ‖·‖) is a separable Banach space. Now, H ∈ C_Θ and we can rewrite

H(θ) = (1/K) Σ_{k=1}^K h_k(θ)

for a sequence (h_k)_{k=1,...,K} of i.i.d. samples of p̂_∆(v, w; θ) − p_∆(v, w; θ). It follows that

E[‖h_k‖] ≤ sup_{θ∈Θ} p_∆(v, w; θ) + E[ sup_{θ∈Θ} p̂_∆(v, w; θ) ] < ∞
by Theorem 3.1 and Assumption (C1). Further, E[h_k(θ)] = 0 by Theorem 4.1. Thus, we are in a position to use the strong law of large numbers in separable Banach spaces (see, e.g., Beskos, Papaspiliopoulos and Roberts (2009), Theorem 2). It follows that

sup_{θ∈Θ} |p̂^K_∆(v, w; θ) − p_∆(v, w; θ)| → 0

almost surely as K → ∞. Consequently, also

sup_{θ∈Θ} |L̂^K(θ) − L(θ)| → 0

almost surely as K → ∞, and the asymptotic unbiasedness of θ̂^K_m follows.
For (strong) consistency, note that if K → ∞ as m → ∞, then also

lim_{m→∞} θ̂^K_m = lim_{m→∞} lim_{K→∞} θ̂^K_m = lim_{m→∞} θ̂_m = θ∗

in probability (almost surely for strong consistency), given that Θ is compact. This follows since the SMLE θ̂^K_m is asymptotically unbiased almost surely and the MLE θ̂_m is (strongly) consistent.
Proof of Theorem 6.2. A Taylor expansion, together with the smoothness of p̂^K_∆ implied by Corollary 4.2 and the consistency of θ̂^K_m as m, K → ∞ as in Theorem 6.1, leads to

(29)    √m(θ̂^K_m − θ∗) = [ −(1/m) ∇² log L̂^K(θ∗) ]^{-1} [ (1/√m)( ∇ log L̂^K(θ∗) − ∇ log L(θ∗) ) + (1/√m) ∇ log L(θ∗) + o_P(1) ].
Condition (C3) controls the convergence in distribution of the second summand on the right-hand side of (29). Thus, it remains to study the convergence of the first summand on the right-hand side of (29), as well as of the term −(1/m) ∇² log L̂^K(θ∗).
The strong law of large numbers, Theorem 4.1, and Assumption (B8) imply that, for any v, w ∈ S and θ ∈ int Θ,

∇² p̂^K_∆(v, w; θ) → E^Q_θ[ ∇² p̂_∆(v, w; θ) ] = ∇² E^Q_θ[ p̂_∆(v, w; θ) ] = ∇² p_∆(v, w; θ)

almost surely as K → ∞. It follows that

(30)    (1/m) ∇² log L̂^K(θ∗) → lim_{m→∞} (1/m) ∇² log L(θ∗) = −Σ_{θ∗}

in probability as m → ∞ and K → ∞ by Condition (C4).

Write G^K_m = (1/√m)[ ∇ log L̂^K(θ∗) − ∇ log L(θ∗) ] for the first summand on the right-hand side of (29). Now

G^K_m = (1/√m) Σ_{n=1}^m (1/h^K_n) (1/K) Σ_{k=1}^K g^k_n

for

g^k_n = ∇( p̂^k_∆(X_{(n−1)∆}, X_{n∆}; θ) / p_∆(X_{(n−1)∆}, X_{n∆}; θ) )|_{θ=θ∗},

h^K_n = (1/K) Σ_{k=1}^K p̂^k_∆(X_{(n−1)∆}, X_{n∆}; θ∗) / p_∆(X_{(n−1)∆}, X_{n∆}; θ∗),
and a sequence (p̂^k_∆)_{k=1,...,K} of i.i.d. samples of the unbiased density estimator (16). From the strong law of large numbers and Theorem 4.1, we conclude that

h^K_n → E^Q_θ[ h^1_n | X_{(n−1)∆}, X_{n∆} ] = 1

almost surely as K → ∞. Thus, it suffices to study the convergence of

Ḡ^K_m = (1/√m) Σ_{n=1}^m (1/K) Σ_{k=1}^K g^k_n.
First, consider the case m/K → 0. Chebyshev's inequality and the i.i.d. property of (p̂^k_∆)_{1≤k≤K} imply that, for any ε > 0,

Q_θ[ |Ḡ^K_m| > ε ] ≤ (1/(mK²ε²)) E^Q_θ[ ( Σ_{n=1}^m Σ_{k=1}^K g^k_n )² ]
    ≤ (1/(mKε²)) Σ_{n=1}^m E^Q_θ[ (1/K) Σ_{k=1}^K (g^k_n)² ] + (2/(mKε²)) Σ_{1≤n<l≤m} E^Q_θ[ (1/K) Σ_{k=1}^K |g^k_n g^k_l| ]
    ≤ (1/(Kε²)) max_{1≤n≤m} E^Q_θ[ (g^1_n)² ] + ((m−1)/(Kε²)) max_{1≤n<l≤m} E^Q_θ[ |g^1_n g^1_l| ].

Hölder's inequality and Assumption (C5) imply that Q_θ[|Ḡ^K_m| > ε] → 0 as m → ∞ and K → ∞ if m/K → 0. We conclude that Ḡ^K_m → 0 in P_θ-probability as m, K → ∞.
Next, consider the case m/K → c ∈ (0, ∞). Asymptotically, we have K = O(m) and

Ḡ^K_m = O_P( (c/m) Σ_{n=1}^m (1/√m) Σ_{k=1}^{m/c} g^k_n ).
Assumption (C5) and the i.i.d. property of (g^k_n)_{1≤k≤K} imply that the central limit theorem holds under Q_θ, so that

(1/√m) Σ_{k=1}^{m/c} g^k_n → N( (1/c) E^Q_θ[ g^1_n | X_{(n−1)∆}, X_{n∆} ], (1/c) Var^Q_θ[ g^1_n | X_{(n−1)∆}, X_{n∆} ] ).
Corollary 4.2 states that the partial derivatives of p̂^k_∆ are unbiased, which leads to

E^Q_θ[ g^k_n | X_{(n−1)∆}, X_{n∆} ] = 0.

Thus, (g^1_n)_{n≥1} are uncorrelated. We conclude from the strong law of large numbers that Ḡ^K_m → 0 almost surely.
Finally, consider the case m/K → ∞, so that K/m → 0. A Taylor expansion yields

√K(θ̂^K_m − θ∗) = [ −(1/m) ∇² log L̂^K(θ∗) ]^{-1} [ √(K/m) G^K_m + √(K/m) (1/√m) ∇ log L(θ∗) ] + o_P(1).

The second term on the right-hand side converges to zero almost surely as m → ∞ and K → ∞ according to Condition (C3). For the first term, note that

√(K/m) G^K_m = √(K/m) (1/√m) Σ_{n=1}^m (1/h^K_n) (1/K) Σ_{k=1}^K g^k_n → 0

almost surely according to the central limit theorem and the strong law of large numbers, as in the case m/K → c > 0. It follows that √K(θ̂^K_m − θ∗) → 0 almost surely given (30).
Proof of Proposition A.1. We show that the conditions of Bar-Shalom (1971) (BS) for consistency of an MLE θ̂_m obtained from dependent observations are valid in our setting. The generalization of the conditions of Bar-Shalom (1971) to a multivariate parameter θ is straightforward.

Assumption (A1) implies Condition (C1) of BS. Since 1 = ∫_S p_∆(v, w; θ) dw, it follows from Assumption (A2) that

0 = ∫_S ∇p_∆(v, w; θ) dw = E_θ[ ∇ log p_∆(v, X_∆; θ) ].

Thus, Condition (C2) of BS holds. Assumption (A3) yields Condition (C3) of BS. Note that ∇² log p_∆(X_{(n−1)∆}, X_{n∆}; θ) is equal to

∇²p_{∆,n,θ} / p_{∆,n,θ} − ∇ log p^⊤_{∆,n,θ} ∇ log p_{∆,n,θ},

where p_{∆,n,θ} = p_∆(X_{(n−1)∆}, X_{n∆}; θ). Assumption (A3) implies that 0 = ∫_S ∂²_{i,j} p_∆(v, w; θ) dw = E_θ[ ∂²_{i,j} p_∆(v, X_∆; θ)/p_∆(v, X_∆; θ) ]. Thus,

E_θ[ ∇² log p_{∆,n,θ} ] = −E_θ[ ∇ log p^⊤_{∆,n,θ} ∇ log p_{∆,n,θ} ]

and Condition (C4) of BS follows. Assumption (A5) yields Condition (C5) of BS. The fact that 0 = ∫_S ∇p_∆(v, w; θ) dw = E_θ[ ∇ log p_∆(v, X_∆; θ) ] implies that the sequence (∂_i log p_∆(X_{(n−1)∆}, X_{n∆}; θ))_{1≤n≤m} is pairwise uncorrelated, so that Condition (C6') of BS also holds. Finally, Assumptions (A3) and (A4) yield that the second moment of ∇² log p_∆(X_{(n−1)∆}, X_{n∆}; θ) is uniformly bounded for all n ≥ 1. Thus, Condition (C7) of BS also holds.
Proof of Proposition A.2. The bounded convergence theorem and Assumption (A3) imply that the matrix Σ_θ exists in P_θ-probability for all θ ∈ Θ◦. Further, Chebyshev's inequality together with Assumptions (A3) and (A4) implies that Σ_θ is deterministic under P_θ. Therefore, Condition (20) holds. If Assumption (A6) also holds, then Σ_θ is of full rank.

For Condition (19), note that (1/m) ∇² log L(θ) is equal to

(31)    (1/m) Σ_{n=1}^m ∇²p_{∆,n,θ}/p_{∆,n,θ} − (1/m) Σ_{n=1}^m ∇ log p^⊤_{∆,n,θ} ∇ log p_{∆,n,θ},

where p_{∆,n,θ} = p_∆(X_{(n−1)∆}, X_{n∆}; θ). Chebyshev's inequality yields

P_θ[ |(1/m) Σ_{n=1}^m ∂²_{i,j}p_{∆,n,θ}/p_{∆,n,θ}| > ε ] ≤ (1/(m²ε²)) E_θ[ ( Σ_{n=1}^m ∂²_{i,j}p_{∆,n,θ}/p_{∆,n,θ} )² ]
    ≤ (1/(mε²)) max_{1≤n≤m} E_θ[ (∂²_{i,j}p_{∆,n,θ}/p_{∆,n,θ})² ] + (2/(m²ε²)) Σ_{1≤n<l≤m} E_θ[ (∂²_{i,j}p_{∆,n,θ}/p_{∆,n,θ}) (∂²_{i,j}p_{∆,l,θ}/p_{∆,l,θ}) ].
Given that 0 = ∫_S ∂²_{i,j} p_∆(v, w; θ) dw = E_θ[ ∂²_{i,j} p_∆(v, X_∆; θ)/p_∆(v, X_∆; θ) ], Assumption (A4) implies that the first sum in (31) converges to zero in P_θ-probability. Assumption (A6) tells us that the second sum in (31) converges in P_θ-probability to Σ_θ.
For Condition (18), define M_m = (1/√m) ∇ log L(θ∗) = (1/√m) Σ_{n=1}^m ∇ log p_{∆,n,θ∗}. A multivariate version of the martingale central limit theorem of Brown (1971) implies that M_m converges in distribution to a normal random variable with mean 0 and variance-covariance matrix Σ_{θ∗} as m → ∞ if the Lindeberg condition holds, i.e., if

(1/m) Σ_{n=1}^m E_{θ∗}[ ∇ log p^⊤_{∆,n,θ∗} ∇ log p_{∆,n,θ∗} 1_{A_{n,m}} | X_{(n−1)∆} ] → 0

as m → ∞, where A_{n,m} = { |∇ log p_{∆,n,θ∗}| ≥ ε√m } for an arbitrary ε > 0. Note that the left-hand side of the Lindeberg condition is bounded for large values of m given Assumption (A3). Further, Assumption (A3) also yields

lim_{M→∞} sup_{n≥1} P_{θ∗}[ |∇ log p_∆(X_{(n−1)∆}, X_{n∆}; θ∗)| > M ] = 0.

Thus, sup_{1≤n≤m} 1_{A_{n,m}} → 0 almost surely as m → ∞ for all ε > 0. As a result, the Lindeberg condition is valid in our setting.
REFERENCES
Aït-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. The Annals of Statistics 36 906-937.
Aït-Sahalia, Y. and Yu, J. (2006). Saddlepoint approximations for continuous-time Markov processes. Journal of Econometrics 134 507-551.
Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte
Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72 269-342.
Bar-Shalom, Y. (1971). On the Asymptotic Properties of the Maximum-Likelihood Estimate Obtained from Dependent Observations. Journal of the Royal Statistical Society.
Series B (Methodological) 33 72-77.
Beskos, A., Papaspiliopoulos, O. and Roberts, G. (2006). Retrospective exact simulation of diffusion sample paths with applications. Bernoulli 12 1077-1098.
Beskos, A., Papaspiliopoulos, O. and Roberts, G. (2009). Monte Carlo maximum
likelihood estimation for discretely observed diffusion processes. Annals of Statistics 37
223-245.
Beskos, A. and Roberts, G. (2005). Exact Simulation of diffusions. Annals of Applied
Probability 15 2422-2444.
Bibby, B. M. and Sørensen, M. (1995). Martingale estimation functions for discretely
observed diffusion processes. Bernoulli 1 17–39.
Blanchet, J. and Ruf, J. (2013). A Weak Convergence Criterion for Constructing Changes of Measure. Working Paper.
Brown, B. M. (1971). Martingale Central Limit Theorems. The Annals of Mathematical
Statistics 42 59-66.
Carrasco, M., Chernov, M., Florens, J.-P. and Ghysels, E. (2007). Efficient estimation of general dynamic models with a continuum of moment conditions. Journal of
Econometrics 140 529-573.
Cass, T. (2009). Smooth densities for solutions to stochastic differential equations with jumps. Stochastic Processes and their Applications 119 1416-1435.
Chacko, G. and Viceira, L. M. (2003). Spectral GMM estimation of continuous-time
processes. Journal of Econometrics 116 259-292.
Chang, J. and Chen, S. X. (2011). On the approximate maximum likelihood estimation
for diffusion processes. The Annals of Statistics 39 2820-2851.
Chen, N. and Huang, Z. (2013). Localization and Exact Simulation of Brownian Motion Driven Stochastic Differential Equations. Mathematics of Operations Research 38 591-616.
Chen, S. X., Peng, L. and Yu, C. L. (2013). Parameter estimation and model testing
for Markov processes via conditional characteristic functions. Bernoulli 19 228–251.
Dacunha-Castelle, D. and Florens-Zmirou, D. (1986). Estimation of the coefficients
of a diffusion from discrete observations. Stochastics 19 263-284.
Das, S. R. (2002). The surprise element: jumps in interest rates. Journal of Econometrics
106 27-65.
Detemple, J., Garcia, R. and Rindisbacher, M. (2006). Asymptotic properties of
Monte Carlo estimators of diffusion processes. Journal of Econometrics 134 1-68.
Duffie, D. and Glynn, P. (1995). Efficient Monte Carlo Estimation of security prices.
Annals of Applied Probability 4 897–905.
Duffie, D. and Glynn, P. (2004). Estimation of Continuous-Time Markov Processes
Sampled at Random Time Intervals. Econometrica 72 1773-1808.
Duffie, D., Pan, J. and Singleton, K. (2000). Transform analysis and asset pricing for
affine jump-diffusions. Econometrica 68 1343–1376.
Duffie, D. and Singleton, K. J. (1993). Simulated Moments Estimation of Markov
Models of Asset Prices. Econometrica 61 929-952.
Feuerverger, A. and McDunnough, P. (1981). On the Efficiency of Empirical Characteristic Function Procedures. Journal of the Royal Statistical Society. Series B (Methodological) 43 20-27.
Filipović, D., Mayerhofer, E. and Schneider, P. (2013). Density approximations for
multivariate affine jump-diffusion processes. Journal of Econometrics 176 93-111.
Florens-Zmirou, D. (1989). Approximate discrete-time schemes for statistics of diffusion
processes. Statistics 20 547-557.
Giesecke, K. and Smelov, D. (2013). Exact Sampling of Jump-Diffusions. Operations
Research 61 894-907.
Giesecke, K. and Teng, G. (2012). Numerical Solution of Jump-Diffusion SDEs. Working Paper, Stanford University.
Glasserman, P. (2003). Monte Carlo Methods in Financial Engineering. Springer-Verlag,
New York.
Glynn, P. W. and Whitt, W. (1992). The Asymptotic Efficiency of Simulation Estimators. Operations Research 40 505-520.
Gobet, E., Hoffmann, M. and Reiß, M. (2004). Nonparametric Estimation of Scalar
Diffusions Based on Low Frequency Data. The Annals of Statistics 32 2223-2253.
Greene, W. H. (2008). Econometric Analysis. Prentice Hall.
Jacob, P. E. and Thiery, A. H. (2015). On nonnegative unbiased estimators. The Annals
of Statistics 43 769-784.
Jiang, G. J. and Knight, J. L. (2002). Estimation of Continuous-Time Processes via
the Empirical Characteristic Function. Journal of Business & Economic Statistics 20
198-212.
Jiang, G. J. and Knight, J. L. (2010). ECF estimation of Markov models where the
transition density is unknown. Econometrics Journal 13 245-270.
Kou, S. G. (2002). A Jump-Diffusion Model for Option Pricing. Management Science 48
1086-1101.
Kristensen, D. and Shin, Y. (2012). Estimation of dynamic models with nonparametric
simulated maximum likelihood. Journal of Econometrics 167 76-94.
Lewis, P. and Shedler, G. (1979). Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics Quarterly 26 403-413.
Li, C. (2013). Estimating Jump-Diffusions Using Closed-form Likelihood Expansions.
Working Paper.
Lo, A. W. (1988). Maximum Likelihood Estimation of Generalized Itô Processes with
Discretely Sampled Data. Econometric Theory 4 231-247.
Platen, E. and Bruti-Liberati, N. (2010). Numerical Solution of Stochastic Differential
Equations With Jumps in Finance. Springer.
Protter, P. (2004). Stochastic Integration and Differential Equations. Springer-Verlag,
New York.
Rogers, L. C. G. (1985). Smooth Transition Densities for One-Dimensional Diffusions.
Bulletin of the London Mathematical Society 17 157-161.
Sermaidis, G., Papaspiliopoulos, O., Roberts, G. O., Beskos, A. and Fearnhead, P.
(2013). Markov Chain Monte Carlo for Exact Inference for Diffusions. Scandinavian
Journal of Statistics 40 294-321.
Singleton, K. J. (2001). Estimation of affine asset pricing models using the empirical
characteristic function. Journal of Econometrics 102 111-141.
Singleton, K. J. (2006). Empirical Dynamic Asset Pricing. Princeton University Press.
Straumann, D. and Mikosch, T. (2006). Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrence equations approach. The
Annals of Statistics 34 2449-2495.
Yu, J. (2007). Closed-form likelihood approximation and estimation of jump-diffusions
with an application to the realignment risk of the Chinese Yuan. Journal of Econometrics 141 1245-1280.
Department of Management Science & Engineering
Stanford University
Stanford, CA 94305, USA
E-mail: giesecke@stanford.edu
Questrom School of Business
Boston University
Boston, MA 02215, USA
Phone: +1 (617) 358-6266
Web: http://people.bu.edu/gas/
E-mail: gas@bu.edu