Statistics & Decisions 22, 500–524 (2004)
© R. Oldenbourg Verlag, München 2004
On second order minimax estimation of
invariant density for ergodic diffusion
Arnak S. Dalalyan, Yury A. Kutoyants
Received: March 17, 2003; Accepted: March 3, 2004
Summary: There are many asymptotically first order efficient estimators in the problem of estimating the invariant density of an ergodic diffusion process nonparametrically. To distinguish between them, we consider the problem of asymptotically second order minimax estimation of this density based on a sample path observation up to the time $T$. This leads to two problems. The first one is to find a lower bound on the second order risk of any estimator. The second one is to construct an estimator which attains this lower bound. We carry out this program (bound + estimator) following Pinsker's approach. If the parameter set is a subset of the Sobolev ball of smoothness $k>1$ and radius $R>0$, the second order minimax risk is shown to behave as $-T^{-2k/(2k-1)}\hat\Pi(k,R)$ for large values of $T$. The constant $\hat\Pi(k,R)$ is given explicitly.
1 Introduction
The Model. In this paper we deal with a diffusion process $X^T=\{X_t,\ 0\le t\le T\}$ given by the stochastic differential equation

$$dX_t = S(X_t)\,dt + \sigma(X_t)\,dW_t,\qquad X_0=\xi,\quad 0\le t\le T, \tag{1.1}$$
where $W^T=\{W_t,\ 0\le t\le T\}$ is the standard one-dimensional Wiener process. We suppose that the trend coefficient $S(\cdot)$ and the diffusion coefficient $\sigma(\cdot)^2$ are such that equation (1.1) has a unique weak solution and the conditions
$$\int_0^x \exp\Big\{-2\int_0^y \frac{S(z)}{\sigma(z)^2}\,dz\Big\}\,dy \longrightarrow \pm\infty \quad\text{as}\quad x\to\pm\infty, \tag{1.2}$$

$$G(S) = \int_{-\infty}^{+\infty} \sigma(y)^{-2}\exp\Big\{2\int_0^y \frac{S(z)}{\sigma(z)^2}\,dz\Big\}\,dy < \infty \tag{1.3}$$
are fulfilled. We fix a diffusion coefficient $\sigma(\cdot)^2$ and denote by $\Sigma$ the class of all trend coefficients satisfying conditions (1.2) and (1.3). Under these conditions the process $X=\{X_t,\ t\ge0\}$ is positive recurrent with invariant density

$$f_S(x) = \frac{1}{G(S)\,\sigma(x)^2}\,\exp\Big\{2\int_0^x \frac{S(z)}{\sigma(z)^2}\,dz\Big\}. \tag{1.4}$$

AMS 1991 subject classifications. Primary: 62M05, 62G20; Secondary: 62G07, 60G35
Key words and phrases: ergodic diffusion, invariant density estimation, second order minimax, lower bound, Pinsker's constant
The aim of the present paper is to estimate the function $f_S(\cdot)$ nonparametrically, given a sample path $X^T=\{X_t,\ 0\le t\le T\}$ of the diffusion process $X$ defined by (1.1). We assume that $\sigma(\cdot)$ is a known, sufficiently smooth, positive function such that $\sigma+\sigma^{-1}$ increases at most polynomially. The trend coefficient is supposed to be unknown.
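Formula (1.4) can be evaluated numerically for any admissible pair $(S,\sigma)$. The following sketch is an illustration only (the quadrature grid and the Ornstein–Uhlenbeck drift $S(x)=-x$, $\sigma\equiv1$ are our own choices, not from the paper); it recovers the invariant density $e^{-x^2}/\sqrt\pi$ of the Ornstein–Uhlenbeck process:

```python
import math

def invariant_density(S, sigma, lo=-10.0, hi=10.0, n=4001):
    """Evaluate f_S of (1.4) on a uniform grid by trapezoidal quadrature."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    g = [2.0 * S(x) / sigma(x) ** 2 for x in xs]
    # prefix[i] ~ integral of 2S/sigma^2 from lo to xs[i] (trapezoidal rule)
    prefix = [0.0] * n
    for i in range(1, n):
        prefix[i] = prefix[i - 1] + 0.5 * (g[i - 1] + g[i]) * h
    i0 = min(range(n), key=lambda i: abs(xs[i]))   # grid point closest to 0
    unnorm = [math.exp(prefix[i] - prefix[i0]) / sigma(xs[i]) ** 2 for i in range(n)]
    G = sum(unnorm) * h - 0.5 * (unnorm[0] + unnorm[-1]) * h   # ~ G(S) of (1.3)
    f = [u / G for u in unnorm]
    return xs, f

# Ornstein-Uhlenbeck check: S(x) = -x, sigma = 1 gives f(x) = exp(-x^2)/sqrt(pi)
xs, f = invariant_density(lambda x: -x, lambda x: 1.0)
```

With this drift the value at the origin should be close to $1/\sqrt\pi\approx0.564$, and the computed density integrates to one.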
First order efficiency. During the last twenty years the problem of stationary density estimation for continuous time processes has been studied by many authors. We mention here Nguyen [25], Prakasa Rao [29], Castellana and Leadbetter [4], Leblanc [23], Kutoyants [21], Bosq and Davydov [3], Bosq [2]. It is proven by Castellana and Leadbetter [4] that for processes satisfying some strong mixing condition, this density can be estimated with the "parametric" rate $T^{-1/2}$. In most situations the above mentioned mixing condition is not easy to verify, since it involves the joint density of $X_0$ and $X_t$, which in general cannot be written explicitly.
In Kutoyants [21, 22] it is shown that for a one-dimensional ergodic diffusion process one can obtain the same rate of convergence without verifying that mixing condition. The nonparametric analogue of the Fisher information in this problem is shown to be equal to

$$I(S,x) = \bigg(4 f_S^2(x)\, E_S\Big[\frac{\chi_{\{\xi>x\}} - F_S(\xi)}{\sigma(\xi)\,f_S(\xi)}\Big]^2\bigg)^{-1}. \tag{1.5}$$
For a more detailed discussion of the Fisher information and its connection with bounding the minimax risk, the interested reader is referred to Koshevnik and Levit [20], where the i.i.d. case is presented. Here and in the sequel, $E_S$ stands for the expectation with respect to the probability measure induced by the process (1.1) on the space of continuous functions from $[0,T]$ to $\mathbb R$ equipped with the Borel $\sigma$-algebra defined by uniform convergence. Also, the random variable $\xi$ is supposed to follow the invariant law.
In fact, the existence of $\sqrt T$-consistent estimators in the case of ergodic diffusion can also be explained by the fact that, due to (1.4), the invariant density is a smooth functional of the trend coefficient. Since the trend coefficient can be estimated with the standard nonparametric rate of convergence, it is natural that a smooth functional of the trend is estimated with rate $T^{-1/2}$. An example illustrating the connection between the minimax estimation of a functional parameter and the second order minimax estimation of a smooth functional of this parameter can be found in Golubev and Levit [18].
Furthermore, it is shown in [21] that under mild regularity conditions the local-time estimator

$$f_T^\circ(x) = \frac{1}{T\sigma(x)^2}\int_0^T \operatorname{sgn}(x-X_t)\,dX_t + \frac{|X_T-x| - |X_0-x|}{T\sigma(x)^2},$$
kernel-type estimators $\bar f_{K,T}(x)$ and a wide class of unbiased estimators $\tilde f_T(x)$ are consistent, asymptotically normal and asymptotically (first-order) efficient. Moreover, as is proved in Kutoyants [22, Section 4.3], the corresponding lower bound on the minimax risk and the efficiency of the estimators hold true for the $L_2$-type risk as well. In particular, if the bandwidth is of order $T^{-1/2}$ or smaller, then the kernel estimator satisfies

$$T\int_{\mathbb R} E_S\big[\bar f_{K,T}(x) - f_S(x)\big]^2\,dx \xrightarrow[T\to\infty]{} R(S) = \int_{\mathbb R} I(S,x)^{-1}\,dx.$$
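The $T^{-1/2}$ phenomenon can be observed empirically: simulate the diffusion by an Euler scheme and average a kernel in time. This is a toy sketch under our own choices (Ornstein–Uhlenbeck drift, Gaussian kernel, fixed seed, step sizes); none of these numerical settings come from the paper:

```python
import math, random

def simulate_ou(T=200.0, dt=0.01, seed=1):
    """Euler-Maruyama for dX = -X dt + dW; invariant law is N(0, 1/2)."""
    random.seed(seed)
    n = int(T / dt)
    x, path = 0.0, []
    sq = math.sqrt(dt)
    for _ in range(n):
        x += -x * dt + sq * random.gauss(0.0, 1.0)
        path.append(x)
    return path

def kernel_density(path, x, h=0.2):
    """Kernel-type estimator: time average of a Gaussian kernel K_h(x - X_t)."""
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return sum(c * math.exp(-0.5 * ((x - y) / h) ** 2) / h for y in path) / len(path)

path = simulate_ou()
fhat0 = kernel_density(path, 0.0)   # true invariant density at 0 is 1/sqrt(pi) ~ 0.564
```

Even with a moderate horizon the time-averaged estimate lands near the true value; the stochastic error shrinks like $T^{-1/2}$.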
Second order minimax approach. The abundance of first order efficient estimators motivates our interest in developing the second order minimax approach. The main concepts of this approach can be presented as follows. Any of the above cited estimators attaining the lower bound $(TI(S,x))^{-1}$ is unbiased or has a bias that is asymptotically negligible with respect to the variance. Furthermore, the choice of the smoothing parameter (the bandwidth in the case of a kernel estimator) does not make the main part of the variance smaller. However, it is clear that oversmoothing (choosing a larger smoothing parameter) decreases the variance, but this effect is perceptible only in the second term of the asymptotic expansion of the variance. So the main idea of the second order minimax approach is to choose the smoothing parameter by minimizing the sum of the bias and the second term of the variance's asymptotic expansion.
More precisely, we suppose that the function $f_S(\cdot)$ is $k$-times differentiable and the trend coefficient $S(\cdot)$ belongs to some set $\Sigma_*$ defined below. The second order risk is then defined as follows:

$$R_T(\bar f_T, f_S) = \int_{\mathbb R} E_S\big(\bar f_T(x) - f_S(x)\big)^2\,dx - T^{-1}R(S),$$
where $\bar f_T(x)$ is an arbitrary estimator of the density. It is evident that for asymptotically efficient estimators $\bar f_T$ the quantity $TR_T(\bar f_T, f_S)$ tends to zero as $T\to\infty$. It can be shown that for some of these estimators there exists a non-degenerate limit of $T^{2k/(2k-1)}R_T(\bar f_T, f_S)$, while for the others this limit is equal to infinity. Therefore we can compare the performance of these estimators according to the limits of this quantity.
It is noteworthy that for the local time estimator the quantity $R_T(f_T^\circ, f_S)$ tends to zero with the speed $T^{-3/2}$ (this assertion can be easily derived from the martingale representation of the local time estimator). This means that if the smoothness order $k$ of $f_S$ exceeds $3/2$, we have $T^{2k/(2k-1)}R_T(f_T^\circ, f_S)\to 0$, and consequently the minimal limiting value of $T^{2k/(2k-1)}R_T(\bar f_T, f_S)$ is less than or equal to zero. We show in this paper that in the case of Sobolev regularity $k\ge 2$ this minimal value is strictly negative, which entails the second order inefficiency of the local time estimator. Note that the condition that the function $f_S$ belongs to a Sobolev ball of smoothness $k\ge 2$ means that the function $S$ is absolutely continuous, which is not a strong restriction.
As is typical for results describing the asymptotic behaviour of the minimax risk, we divide the problem into two parts. The first one is to find a constant $\hat\Pi$ such that the following inequality holds:

$$\liminf_{T\to\infty}\ \inf_{\bar f_T}\ \sup_{S(\cdot)\in\Sigma_*} T^{\frac{2k}{2k-1}}\,R_T(\bar f_T, f_S) \ge -\hat\Pi,$$

where the inf is taken over all estimators $\bar f_T(\cdot)$. This inequality gives us the lower bound for the risks of all estimators. The second part is to construct an estimator $\hat f_T(\cdot)$ which attains this bound, i.e., such that

$$\lim_{T\to\infty}\ \sup_{S(\cdot)\in\Sigma_*} T^{\frac{2k}{2k-1}}\,R_T(\hat f_T, f_S) = -\hat\Pi.$$
Such estimators will be called second order minimax over the parameter set $\Sigma_*$. We show in this paper that the optimal constant $\hat\Pi$ is given by the expression

$$\hat\Pi = \hat\Pi(k,R) = 2(2k-1)\left(\frac{4k}{\pi(k-1)(2k-1)}\right)^{\frac{2k}{2k-1}} R^{-\frac{1}{2k-1}}. \tag{1.6}$$
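The constant (1.6) is elementary to evaluate; the helper below is our own illustration (not from the paper):

```python
import math

def pinsker_constant(k, R):
    """Second order minimax constant Pi_hat(k, R) of (1.6)."""
    if k <= 1 or R <= 0:
        raise ValueError("need k > 1 and R > 0")
    base = 4.0 * k / (math.pi * (k - 1) * (2 * k - 1))
    return 2.0 * (2 * k - 1) * base ** (2 * k / (2 * k - 1)) * R ** (-1.0 / (2 * k - 1))
```

Since $\hat\Pi\propto R^{-1/(2k-1)}$, the guaranteed second order improvement $-T^{-2k/(2k-1)}\hat\Pi(k,R)$ is smaller in magnitude for larger Sobolev balls.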
The methodology. The approach adopted in this work is inspired by the paper of Golubev and Levit [17], who studied the second order minimax estimation of the distribution function (d.f.) in the i.i.d. case (see also Golubev and Levit [18] for the case of infinitely smooth d.f. estimation). In that paper the authors prove that Pinsker's phenomenon (see [28]), well known for classical problems of nonparametric curve estimation, appears in the problem of d.f. estimation when the second order minimax approach is considered. A list of articles concerning the Pinsker bound and related topics can be found in Nussbaum [26] and Efromovich [12]. For ergodic diffusion processes we have already obtained similar results in the problems of density derivative [8] and trend coefficient [9] estimation.
To obtain a lower bound, we construct a parametric family $\{S(\vartheta,\cdot),\ \vartheta\in\Theta_T\}$ which is almost entirely contained in the original nonparametric set $\Sigma_*$. This allows us to evaluate the global minimax risk by the Bayesian risk in a finite dimensional parameter estimation problem. Then we choose the worst prior distribution and the least favorable parametric family, and the risk of the latter problem is shown to be greater than $-T^{-2k/(2k-1)}\hat\Pi(k,R)$.
To construct a second order minimax estimator we use the Hilbert space structure of $L_2(\mathbb R, dx)$. Namely, we argue that the estimation of $f_S$ in $L_2$-norm is equivalent to the estimation in $l_2$-norm of the Fourier coefficients of $f_S$ in a suitably chosen orthonormal basis of $L_2(\mathbb R)$. Afterwards we proceed to a modification of the linear estimator of these coefficients which permits us to remove some boundary effects. Then we prove that Pinsker's filter leads to a second order minimax estimator.
Of course, the estimator proposed in this paper depends on the parameters k and
R describing the parameter set. In the case when these parameters are not known, one
should use an adaptive estimating procedure. Such a procedure can be provided using
the unbiased risk minimization (see for instance Cavalier et al. [5]), the Lepski method
(see Lepski and Spokoiny [24]) or blockwise adaptation (see Cavalier and Tsybakov [6]).
The exact theoretical study of an adaptive procedure lies beyond the scope of this paper
and will be done in forthcoming works. We believe that in our problem, combining the
techniques developed in the papers Golubev and Levit [19], Golubev and Härdle [16] and
Dalalyan [7], it is possible to construct an adaptive estimating procedure achieving the
minimax bound obtained in this paper over a wide range of Sobolev balls.
It is worth noting that if the precise value of $k$ is not known, but the function $S$ is supposed to be absolutely continuous (therefore $f_S'$ exists and is absolutely continuous), then the estimator proposed in Section 4 with $k=2$ will have a second order risk of order $T^{-4/3}$, which is significantly better than the second order risk of the local time estimator.
The parameter set. Let us now describe precisely the conditions imposed on the model. To simplify the computations we put $\sigma(x)\equiv 1$, so the diffusion process is given by

$$dX_t = S(X_t)\,dt + dW_t,\qquad X_0=\xi,\quad 0\le t\le T, \tag{1.7}$$

and we have to estimate the invariant density $f_S(x) = G(S)^{-1}\exp\big\{2\int_0^x S(v)\,dv\big\}$ based on the observations $X^T = \{X_t,\ 0\le t\le T\}$. Some comments on the possible generalizations to the case $\sigma\not\equiv 1$ are given at the end of Section 4. For technical reasons, we replace conditions (1.2), (1.3) by a stronger one, $S\in\Sigma_{\gamma_*}(A_*, C_*, \nu_*)$, where
$$\Sigma_{\gamma_*}(A_*, C_*, \nu_*) = \Big\{\, S\in\Sigma\ :\ \operatorname{sgn}(x)\,S(x) \le -\gamma_*\ \text{for}\ |x|>A_*,\quad \big|S(x)\big| \le C_*\big(1+|x|^{\nu_*}\big)\ \ \forall x\in\mathbb R \,\Big\}.$$
Here $\gamma_*$, $A_*$, $C_*$ and $\nu_*$ are some positive constants. It is easy to see that conditions (1.2), (1.3) are fulfilled for such functions $S(\cdot)$.
Fix some integer $k>1$. The function $S(\cdot)$ is supposed to be $(k-2)$-times differentiable with absolutely continuous $(k-2)$th derivative and to belong to the set

$$\Sigma(k,R) = \Big\{\, S(\cdot)\in\Sigma\ :\ \int_{\mathbb R}\big[f_S^{(k)}(x) - f_{S_*}^{(k)}(x)\big]^2\,dx \le R \,\Big\}, \tag{1.8}$$

where $R>0$ is some constant and $f_S^{(k)}(x)$ is the $k$-th derivative (in the distributional sense) of the function $f_S(x)$ w.r.t. $x$. The set $\Sigma(k,R)$ is a Sobolev ball of smoothness $k$ and radius $R$ centered at $f_{S_*}=f_*$. The choice of the center is not arbitrary: it should be smoother than the other functions of the class. For simplicity we consider $S_*(x)=-x$. In this case the corresponding diffusion is an Ornstein–Uhlenbeck process and the invariant density is infinitely differentiable. Finally, we define the parameter set $\Sigma_* = \Sigma_*(k,R) = \Sigma(k,R)\cap\Sigma_{\gamma_*}(A_*, C_*, \nu_*)$.
The paper is organised as follows. The next section contains some heuristic arguments helping to understand the result. We state the lower bound result in Section 3 and prove it up to some technical lemmas. Section 4 is devoted to the construction of the second order minimax estimator; it also contains the proof of its minimaxity. The proofs of the technical results are gathered in Section 5.
2 Heuristic explanation
As we mentioned in the introduction, the local time estimator $f_T^\circ(x)$ is asymptotically normal with mean $f_S(x)$ and variance $[TI(S,x)]^{-1}$. Since the local time is a sufficient statistic in the model of ergodic diffusion (see Kutoyants [22]), this model is weakly asymptotically equivalent to the heteroscedastic Gaussian experiment

$$Y_t = f_S(t) + \frac{2f_S(t)}{\sqrt T}\int_{\mathbb R}\frac{\chi_{\{u>t\}} - F_S(u)}{\sqrt{f_S(u)}}\,dB_u,$$
where $B=(B_u,\ u\in\mathbb R)$ is a Brownian motion. Let now $v(\cdot)$ be a test function (infinitely differentiable with compact support) and let $V(\cdot)$ be its primitive. A formal use of the integration by parts formula yields

$$\int_{\mathbb R} v(t)\,Y_t\,dt = \int_{\mathbb R} v(t)f_S(t)\,dt + \frac{2}{\sqrt T}\int_{\mathbb R}\int_{\mathbb R} v(t)f_S(t)\,\frac{\chi_{\{u>t\}} - F_S(u)}{\sqrt{f_S(u)}}\,dB_u\,dt$$

$$= -\int_{\mathbb R} V(t)f_S'(t)\,dt - \frac{2}{\sqrt T}\int_{\mathbb R} V(t)\sqrt{f_S(t)}\,dB_t - \frac{2}{\sqrt T}\int_{\mathbb R}\int_{\mathbb R} V(t)f_S'(t)\,\frac{\chi_{\{u>t\}} - F_S(u)}{\sqrt{f_S(u)}}\,dB_u\,dt.$$
The third term of this sum can be dropped in the asymptotic results since, in contrast with the second one, it is uniformly bounded in $L_2$ (see [8]) in the following sense:

$$\sup_{S\in\Sigma_*} E_S\bigg[\int_{\mathbb R} f_S'(t)\int_{\mathbb R}\frac{\chi_{\{u>t\}} - F_S(u)}{\sqrt{f_S(u)}}\,dB_u\,dt\bigg]^2 < \infty.$$

Therefore, if we drop the third term, we come to the Gaussian experiment

$$dY_t = f_S'(t)\,dt + 2\sqrt{f_S(t)\,T^{-1}}\,dB_t,\qquad t\in\mathbb R.$$
As noted by Golubev and Levit [18], second order minimax estimation of a function which is a regular functional (the invariant density in our case) of the unknown parameter is closely related to the problem of first order minimax estimation of the derivative of this function. On the other hand, it is known (Golubev [15]) that in the Gaussian shift experiment $dY_t = \theta(t)\,dt + \varepsilon\sqrt{I^{-1}(t)}\,dB_t$, the optimal constant depends on the Fisher information $I(t)$ only via the integral $\int I^{-1}(t)\,dt$. Due to the fact that $f_S$ is a probability density, the integral of the inverse of the Fisher information in our case is equal to 2 and does not depend on $S$.
This explains why we succeed in obtaining the second order minimax (up to constant) bound over a global nonparametric class of the unknown parameter. These arguments also show that the restriction to the $L_2$-norm in the risk definition is essential for deriving the optimal constants.
To conclude this heuristic reasoning, let us remark that the reason which made it possible to obtain the second order minimax constant in the problem of estimating the distribution function $F(\cdot)$ from i.i.d. observations was the same. Indeed, according to Nussbaum [27], the statistical problem of $F'(\cdot)$ estimation is locally asymptotically equivalent to recovering the function $F'(\cdot)$ from the Gaussian shift experiment $dY_t = F'(t)\,dt + \sqrt{n^{-1}F_0'(t)}\,dB_t$, where $n$ is the size of the sample and $F_0$ is the center of localisation.
3 Lower bound
The first result of this work concerns the asymptotically exact lower bound on the $L_2$-risks of all estimators. This bound turns out to be almost the same as in the problem of estimating the distribution function of a sequence of i.i.d. random variables (Golubev and Levit [17]). The main reason for this similarity is that these constants appear as the solutions of two very similar optimisation problems.
Theorem 3.1 Let $k>1$ be an integer. Then

$$\liminf_{T\to\infty}\ \inf_{\bar f_T}\ \sup_{S\in\Sigma_*} T^{\frac{2k}{2k-1}}\,R_T(\bar f_T, f_S) \ge -\hat\Pi(k,R), \tag{3.1}$$

where

$$\hat\Pi(k,R) = 2(2k-1)\left(\frac{4k}{\pi(k-1)(2k-1)}\right)^{\frac{2k}{2k-1}} R^{-\frac{1}{2k-1}}.$$
Proof: The main steps of the proof are close to those of Theorem 1 in [9]. The minimax risk over the nonparametric set $\Sigma$ is evaluated by the Bayesian risk over a parametric set of increasing dimension (constructed as in Golubev [15] and Dalalyan and Kutoyants [9]); then the van Trees inequality is applied.
Let us introduce a parametric family of diffusion processes

$$dX_t = S(\vartheta, X_t)\,dt + dW_t,\qquad X_0=\xi,\quad 0\le t\le T, \tag{3.2}$$
where $S(\vartheta,\cdot)$ is chosen in the following way. We fix an increasing interval $[-A,A]$ with $A = A_T = \gamma_*^{-1}\log(T+1)$ and a sequence of its sub-intervals $I_m = \big[a_m - AT^{-\beta},\ a_m + AT^{-\beta}\big]$, where $\beta = (2k-1)^{-1}$ and $a_m = 2mAT^{-\beta}$, $m = 0, \pm1, \pm2, \ldots, \pm M$. Here $M = M_T$ is the greatest integer such that the interval $I_M$ is entirely included in $[-A,A]$.
Let us denote

$$S(\vartheta, x) = S_*(x) + \sum_{|m|\le M}\sum_{|i|\le L}\sqrt{\frac{2A}{T^\beta f_*(a_m)}}\;\vartheta_{i,m}\,g_{i,m}(x),$$

where $\vartheta$ is the $(2M+1)\times(2L+1)$ matrix of coefficients $\vartheta_{i,m}$ and

$$g_{i,m}(x) = \sqrt{T^\beta/A}\;e_i\big(T^\beta A^{-1}(x-a_m)\big)\,U\big(A - |x-a_m|\,T^\beta\big).$$
Here $e_i(\cdot)$ is the trigonometric basis on $[-1,1]$, that is,

$$e_i(x) = \begin{cases} \sin(\pi i x), & \text{if } i>0,\\ 1/\sqrt2, & \text{if } i=0,\\ \cos(\pi i x), & \text{if } i<0, \end{cases}$$

and the function $U(x)$ is $(k+1)$-times differentiable, increasing, vanishing for $x\le 0$ and equal to one for $x\ge 1$. The integer $L = L_T$ will be chosen later.
The parameter $\vartheta\in\Theta_T$ is of increasing dimension and

$$|\vartheta_{l,m}| \le K\sqrt{\sigma_l(\varepsilon)},\qquad \sigma_l(\varepsilon) = \frac{1}{2AT^{1-\beta}}\left(\Big|\frac{\Lambda(1-\varepsilon)}{l}\Big|^{k} - 1\right)_+ \tag{3.3}$$

for $l\ne0$, and $\sigma_0 = T^{-\beta}$ for $l=0$. Above we used the standard notation $B_+ = \max(0,B)$; $\varepsilon$ is a positive number and

$$\Lambda = \Lambda_T = A\left(\frac{R(k-1)(2k-1)}{4k\,\pi^{2k-2}}\right)^{\frac{1}{2k-1}}.$$
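The variances (3.3) have the classical Pinsker "water-filling" shape. The hypothetical numerical values below ($T$, $k$, $R$, $\gamma_*$, $\varepsilon$ are our own picks, only to illustrate) show that $\sigma_l(\varepsilon)$ is positive, nonincreasing in $l$ and vanishes for $l\ge\Lambda(1-\varepsilon)$:

```python
import math

def sigma_sequence(T, k, R, gamma=1.0, eps=0.1):
    """Variances sigma_l(eps) of (3.3) for l = 1, 2, ... (sigma_0 = T^(-beta) is separate)."""
    beta = 1.0 / (2 * k - 1)
    A = math.log(T + 1.0) / gamma
    Lam = A * (R * (k - 1) * (2 * k - 1) / (4 * k * math.pi ** (2 * k - 2))) ** beta
    Lam_eps = Lam * (1 - eps)
    L = int(Lam)
    sig = [max((Lam_eps / l) ** k - 1.0, 0.0) / (2 * A * T ** (1 - beta))
           for l in range(1, max(L, 1) + 1)]
    return Lam, sig

Lam, sig = sigma_sequence(T=1e6, k=2, R=1.0)
```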
The number $L = L_T$ is now the integer part of $\Lambda$. For the moment the choice of the $\sigma_l$ and $\Lambda$ may seem obscure, but it will be clarified a little later.
Note that for $S(\cdot)\in\Sigma_{\gamma_*}(A_*, C_*, \nu_*)$ we have the estimate (see [9, Lemma 4 in Appendix])

$$\sup_{\vartheta\in\Theta_T}\int_{|x|>A} f_\vartheta^2(x)\,E_\vartheta\left(\frac{\chi_{\{\xi>x\}} - F_\vartheta(\xi)}{f_\vartheta(\xi)}\right)^2 dx \le C\,e^{-\gamma_* A},$$

which implies that the choice $A = \gamma_*^{-1}\log T$ makes this term asymptotically negligible.
Therefore it is sufficient to study the lower bound for the risk

$$\hat R_T(\bar f_T, f_S) = E_S\int_{-A}^{A}\big[\bar f_T(x) - f_S(x)\big]^2\,dx - \frac1T\int_{-A}^{A} I^{-1}(S,x)\,dx,$$

because $\big|R_T(\bar f_T, f_S) - \hat R_T(\bar f_T, f_S)\big| \le CT^{-2}$, and the term $\hat R_T$ will be shown to be of larger order than $T^{-2}$.
We consider now only the estimators belonging to the set $W_T = \big\{\bar f_T : \big|\hat R_T(\bar f_T, f_*)\big| < 1\big\}$. It is evident that in the proof of the lower bound one can drop all the estimators not belonging to $W_T$. Moreover, it is clear that for $T$ large enough this set is not empty (since there exist consistent estimators). Suppose that the parameter $\vartheta$ is a random matrix with a prior distribution $Q(d\vartheta)$ on the set $\Theta_T$. Then we have the obvious inequality

$$\sup_{\vartheta\in\Theta_T}\hat R_T(\bar f_T, f_\vartheta) \ge E_Q\hat R_T(\bar f_T, f_\vartheta) = \int_{\Theta_T}\hat R_T(\bar f_T, f_\vartheta)\,Q(d\vartheta),$$
where the expectation $E_Q$ is defined by the last equality. We can now write the following sequence of inequalities:

$$\inf_{\bar f_T\in W_T}\ \sup_{S\in\Sigma_*}\hat R_T(\bar f_T, f_S) \ge \inf_{\bar f_T\in W_T}\ \sup_{S\in\Sigma_*\cap\Theta_T}\hat R_T(\bar f_T, f_S)$$

$$\ge \inf_{\bar f_T\in W_T}\int_{\Theta_T}\hat R_T(\bar f_T, f_\vartheta)\,Q(d\vartheta) - \sup_{\bar f_T\in W_T}\int_{\Theta_T\setminus\Sigma_*}\hat R_T(\bar f_T, f_\vartheta)\,Q(d\vartheta)$$

$$\ge \hat R_T(Q) - \sup_{\bar f_T\in W_T}\int_{\Theta_T\setminus\Sigma_*}\hat R_T(\bar f_T, f_\vartheta)\,Q(d\vartheta). \tag{3.4}$$

Here $\hat R_T(Q)$ denotes the second order Bayesian risk with respect to the prior $Q$, that is,

$$\hat R_T(Q) = E_Q\big[\hat R_T(\hat f_T, f_\vartheta)\big].$$
The next step is to choose a prior distribution that maximises the second order Bayesian risk and, at the same time, is essentially concentrated on the set $\Sigma_*$, that is, the probability of the set $\Theta_T\setminus\Sigma_*$ is sufficiently small. This distribution $Q$ is usually called an asymptotically least favorable prior distribution. In our case it is defined in the following way:
let $\eta_{l,m}$, $l = 0, \pm1, \pm2, \ldots, \pm L$, $m = 0, \pm1, \ldots, \pm M$, be i.i.d. random variables with common density $p(\cdot)$ such that

$$|\eta_{l,m}| \le K,\qquad E[\eta_{l,m}] = 0,\qquad E[\eta_{l,m}^2] = 1,\qquad J = \int_{-K}^{K}\frac{\dot p(x)^2}{p(x)}\,dx = 1+\varepsilon,$$

where $\varepsilon\to0$ as $K\to\infty$ (for example, one can set $K = \varepsilon^{-1}$). Then $Q$ is the distribution of the random array $\vartheta_{l,m} = \sqrt{\sigma_l(\varepsilon)}\,\eta_{l,m}$; in other words, it is the product measure

$$Q(d\vartheta) = \prod_{|m|\le M}\prod_{|l|\le L}\frac{1}{\sqrt{\sigma_l(\varepsilon)}}\;p\bigg(\frac{\vartheta_{l,m}}{\sqrt{\sigma_l(\varepsilon)}}\bigg)\,d\vartheta_{l,m}. \tag{3.5}$$

Therefore the Fisher information of one component of this prior distribution $Q$ is $J_{l,m} = J_l = (1+\varepsilon)\,\sigma_l(\varepsilon)^{-1}$.
We shall now evaluate the Bayesian risk, which will give us the main part of the minimax risk. Remark firstly that the set of functions $e_{l,m}(x) = \sqrt{A^{-1}T^\beta}\,e_l\big(A^{-1}T^\beta(x-a_m)\big)$ forms an orthonormal basis of the Hilbert space $L_2[-A,A]$ and hence, using Parseval's identity, we can write
$$E_Q\hat R_T(\bar f_T, f_\vartheta) = E\int_{-A}^{A}\big[\bar f_T(x) - f_\vartheta(x)\big]^2\,dx - \frac1T\,E\int_{-A}^{A} I^{-1}(S,x)\,dx$$

$$\ge \sum_{|m|\le M}\sum_{|l|\le L}\Big[E\,|\psi_{l,m,T} - \psi_{l,m,\vartheta}|^2 - \frac4T\,E\,|\Psi_{l,m,\vartheta}(\xi)|^2\Big] - \sum_{|m|\le M}\sum_{|l|>L}\frac4T\,E\,|\Psi_{l,m,\vartheta}(\xi)|^2 =: A_{1T} + A_{2T}, \tag{3.6}$$
where $E$ is the expectation with respect to the probability measure $dP_\vartheta^{(T)}\times Q(d\vartheta)$, and $\psi_{l,m,T}$, $\psi_{l,m,\vartheta}$ and $\Psi_{l,m,\vartheta}(y)$ are the Fourier coefficients

$$\psi_{l,m,T} = \int_{I_m}\bar f_T(x)\,e_{l,m}(x)\,dx,\qquad \psi_{l,m,\vartheta} = \int_{I_m} f_\vartheta(x)\,e_{l,m}(x)\,dx,$$

$$\Psi_{l,m,\vartheta}(y) = \int_{I_m} f_\vartheta(x)\,\bigg(\frac{\chi_{\{y>x\}} - F_\vartheta(y)}{f_\vartheta(y)}\bigg)\,e_{l,m}(x)\,dx$$

of the corresponding functions.
From the other nonparametric problems related to this model (see [8, 9]) it turned out that the quantity

$$I_{l,m}(\vartheta) = E_\vartheta\bigg(\frac{\partial S(\vartheta,\xi)}{\partial\vartheta_{l,m}}\bigg)^2 = \frac{2A}{T^\beta f_*(a_m)}\,E_\vartheta\big[g^2_{l,m}(\xi)\big] < \infty$$

plays the role of the Fisher information concerning the parameter $\vartheta_{l,m}$ in the van Trees inequality (see Gill and Levit [14]). That is, the following inequality holds:

$$E\,|\psi_{l,m,T} - \psi_{l,m,\vartheta}|^2 \ge \frac{\Big(E_Q\,\frac{\partial\psi_{l,m,\vartheta}}{\partial\vartheta_{l,m}}\Big)^2}{T\,E_Q I_{l,m}(\vartheta) + J_{l,m}}.$$
From this inequality one can deduce that

$$E\,|\psi_{l,m,T} - \psi_{l,m,\vartheta}|^2 - \frac4T\,E\big[\Psi^2_{l,m,\vartheta}(\xi)\big] \ge -\frac{4J_l\,E\big[\Psi^2_{l,m,\vartheta}(\xi)\big]}{T\big(T\,E_Q I_{l,m}(\vartheta) + J_l\big)} + \mu_{l,m,T}, \tag{3.7}$$

where

$$\mu_{l,m,T} = \frac{\Big(E_Q\,\frac{\partial\psi_{l,m,\vartheta}}{\partial\vartheta_{l,m}}\Big)^2 - 4\,E_Q[I_{l,m}(\vartheta)]\,E\,|\Psi_{l,m,\vartheta}(\xi)|^2}{T\,E_Q[I_{l,m}(\vartheta)] + J_l}.$$
Due to Lemma 5.4, Section 5, the sum of the terms $\mu_{l,m,T}$ is less than a power of $A$ divided by $T^{1+2\beta}$, which is clearly $o_T(1)\,T^{-2k/(2k-1)}$. This implies that the main part of the convergence of the second order Bayesian risk is given by the first terms of inequality (3.7).
Now we shall evaluate the sum of the first terms in inequality (3.7). Since the function $g_{l,m}(\cdot)$ is very close in supremum norm to the normalised function $e_{l,m}(\cdot)$, we have

$$E_Q[I_{l,m}(\vartheta)] \ge 2AT^{-\beta}(1-\varepsilon).$$

Remark that the sum over $m$ of the first terms in inequality (3.7) corresponding to the case $l=0$ is smaller in order than $T^{-2k/(2k-1)}$, since the value $J_0$ is chosen significantly smaller than the other $J_l$'s. Further, using the result of Corollary 5.3, Section 5, together with Minkowski's inequality, we get
$$A_{1T} \ge -\frac4T\sum_{m=-M}^{M}\sum_{l=\pm1}^{\pm L}\frac{J_l\,E\,|\Psi_{l,m,\vartheta}(\xi)|^2}{T\,E_Q I_{l,m}(\vartheta) + J_l} + \frac{o_T(1)}{T^{\frac{2k}{2k-1}}}$$

$$\ge -\frac{16(1+\varepsilon)}{T}\sum_{m=-M}^{M}\sum_{l=1}^{L}\frac{J_l\,A^2 f_*(a_m)\,(T^{2\beta}\pi^2 l^2)^{-1}}{2T^{1-\beta}A(1-\varepsilon) + J_l} + \frac{o_T(1)}{T^{\frac{2k}{2k-1}}}$$

$$\ge -\frac{8(1+\varepsilon)^2 A}{(1-\varepsilon)\,T^{1+\beta}\pi^2}\sum_{l=1}^{L}\frac{1}{l^2\big(2T^{1-\beta}A\,\sigma_l(\varepsilon) + 1\big)} + \frac{o_T(1)}{T^{\frac{2k}{2k-1}}}.$$
Further, direct calculations (for more details, see [9]) yield the estimate

$$\int_{-A}^{A}\big[f^{(k)}(\vartheta,x) - f_*^{(k)}(x)\big]^2\,dx = 4(1+o_T(1))\int_{-A}^{A}\big[S^{(k-1)}(\vartheta,x) - S_*^{(k-1)}(x)\big]^2\,f_*^2(x)\,dx \tag{3.8}$$

$$= 4(1+o_T(1))\sum_{|m|\le M}\frac{2A}{T^\beta f_*(a_m)}\sum_{1\le|l|\le L}\vartheta_{l,m}^2\,f_*^2(a_m)\bigg(\frac{\pi l T^\beta}{A}\bigg)^{2(k-1)}. \tag{3.9}$$
If we now evaluate the mathematical expectation with respect to $Q$ of this expression, using the fact that the partial sums of the function $f_*(\cdot)$ tend to its integral (which equals 1), we obtain

$$E\int_{\mathbb R}\big[f^{(k)}(\vartheta,x) - f_*^{(k)}(x)\big]^2\,dx = 4(1+o_T(1))\sum_{|m|\le M}\frac{2Af_*(a_m)}{T^\beta}\sum_{1\le|l|\le L}\sigma_l(\varepsilon)\bigg(\frac{\pi l T^\beta}{A}\bigg)^{2(k-1)}$$

$$= 4(1+o_T(1))\sum_{1\le|l|\le L}\sigma_l(\varepsilon)\bigg(\frac{\pi l T^\beta}{A}\bigg)^{2(k-1)}. \tag{3.10}$$
We can now explain the choice (3.5) of the prior distribution. It is clear from (3.8) that the least favorable prior is the one for which the Fisher informations $J_l$ are minimal under the constraint (3.10) involving only the variances $\{\sigma_l(\varepsilon),\ l = \pm1,\ldots,\pm L\}$. It is well known that the minimum of the Fisher information of a random variable with zero mean and fixed variance $\sigma$ is attained by the Gaussian distribution, and this minimum is equal to $\sigma^{-1}$. We cannot use here directly the product of normal distributions as a prior distribution, since for Hoeffding's inequality we need bounded random variables. That is why we use an approximation of the Gaussian distribution and pay a factor $1+\varepsilon$ for the error of this approximation.
In order to determine the values of $\sigma_l$, we solved the optimisation problem derived from (3.9) and (3.10). Thus, we looked for the sequence $y = (y_l)$ minimising the functional

$$\Phi(y) = \sum_{l=1}^{L}\frac{1}{l^2\big(2T^{1-\beta}A\,y_l + 1\big)}$$

over the set $E(k,R) = \Big\{y\in\mathbb R_+^{\mathbb N} : 8\sum_{l=1}^{L} y_l\big(\frac{\pi l T^\beta}{A}\big)^{2(k-1)} \le R\Big\}$. This can be done directly using the method of Lagrange multipliers, and the result is given by $y_l^* = (2AT^{1-\beta})^{-1}\big(\big|\Lambda l^{-1}\big|^k - 1\big)_+$, which leads to the definition of $\sigma_l$ presented at the beginning of the proof.
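One can check numerically that the water-filling sequence $y^*$ is indeed the constrained minimiser of $\Phi$. The script below is a toy verification (all numerical values, the truncation level `Lmax` and the bisection on $\Lambda$ are our own choices): it makes the constraint active and compares $\Phi(y^*)$ with $\Phi$ at random feasible sequences.

```python
import math, random

# Toy parameters (our own choices, for illustration only)
k, R, T, A = 2, 10.0, 1000.0, 10.0
beta = 1.0 / (2 * k - 1)
a = 2 * A * T ** (1 - beta)                     # a = 2AT^(1-beta)
Lmax = 200
c = [8.0 * (math.pi * l * T ** beta / A) ** (2 * (k - 1)) for l in range(1, Lmax + 1)]

def y_star(lam):
    # Water-filling solution y_l = (2AT^(1-beta))^(-1) ((Lambda/l)^k - 1)_+
    return [max((lam / l) ** k - 1.0, 0.0) / a for l in range(1, Lmax + 1)]

budget = lambda y: sum(cl * yl for cl, yl in zip(c, y))
phi = lambda y: sum(1.0 / (l * l * (a * y[l - 1] + 1.0)) for l in range(1, Lmax + 1))

lo, hi = 1.0, 100.0                             # bisect Lambda so the constraint is active
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if budget(y_star(mid)) < R else (lo, mid)
ystar = y_star(0.5 * (lo + hi))

random.seed(0)
rivals = []
for _ in range(5):                              # random sequences on the constraint face
    u = [random.random() for _ in range(Lmax)]
    s = R / budget(u)
    rivals.append([s * ui for ui in u])
```

By convexity of $\Phi$ and linearity of the constraint, the Karush–Kuhn–Tucker point $y^*$ is a global minimiser, so $\Phi(y^*)$ never exceeds $\Phi$ at any feasible rival.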
Now we shall evaluate the sums on the right hand sides of (3.8) and (3.10). For the second one, setting $\Lambda_\varepsilon = \Lambda(1-\varepsilon)$, we have

$$\sum_{1\le|l|\le L}\sigma_l(\varepsilon)\bigg(\frac{\pi l T^\beta}{A}\bigg)^{2(k-1)} = \frac{1}{2AT^{1-\beta}}\sum_{1\le|l|\le L}\bigg(\bigg[\frac{\Lambda_\varepsilon}{l}\bigg]^k - 1\bigg)_+\bigg(\frac{\pi l T^\beta}{A}\bigg)^{2(k-1)}$$

$$= \frac{\pi^{2(k-1)}\Lambda_\varepsilon^{2k-1}}{2A^{2k-1}}\sum_{1\le|l|\le L}\bigg(\bigg[\frac{l}{\Lambda_\varepsilon}\bigg]^{k-2} - \bigg[\frac{l}{\Lambda_\varepsilon}\bigg]^{2k-2}\bigg)\frac{1}{\Lambda_\varepsilon}.$$
Since $\Lambda_\varepsilon$ tends to infinity as $T$ tends to infinity, the last sum is close to the integral over $[0,1]$ of the function $x^{k-2} - x^{2k-2}$. This yields

$$\sum_{1\le|l|\le L}\sigma_l(\varepsilon)\bigg(\frac{\pi l T^\beta}{A}\bigg)^{2(k-1)} = \frac{(1+o_T(1))\,\pi^{2(k-1)}\Lambda_\varepsilon^{2k-1}}{2A^{2k-1}}\int_0^1\big(x^{k-2} - x^{2k-2}\big)\,dx = \frac{(1+o_T(1))\,\pi^{2(k-1)}\Lambda_\varepsilon^{2k-1}}{2A^{2k-1}}\cdot\frac{k}{(k-1)(2k-1)} \le \frac{R(1-\varepsilon)}{8}.$$
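The elementary integral behind the last display can be verified exactly; this quick consistency check (ours, not part of the paper's argument) confirms $\int_0^1(x^{k-2}-x^{2k-2})\,dx = k/((k-1)(2k-1))$ for small $k$:

```python
from fractions import Fraction

def integral_closed_form(k):
    # antiderivatives: x^(k-1)/(k-1) and x^(2k-1)/(2k-1), evaluated on [0, 1]
    return Fraction(1, k - 1) - Fraction(1, 2 * k - 1)

checks = {k: integral_closed_form(k) == Fraction(k, (k - 1) * (2 * k - 1))
          for k in range(2, 10)}
```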
This upper estimate permits us to use Hoeffding's inequality to obtain an exponential bound for the $Q$-probability of the set $\Theta_T\setminus\Sigma_*$ (see [9] for details). Thus we get

$$Q(\Theta_T\setminus\Sigma_*) \le CT^{-2}. \tag{3.11}$$
Evaluating the sum of (3.8) in the same way, we get

$$A_{1T} \ge -\frac{8(1+\varepsilon)^2 A}{(1-\varepsilon)\,T^{1+\beta}\pi^2}\sum_{l=1}^{L}\frac{1}{l^2}\bigg(\frac{l}{\Lambda_\varepsilon}\bigg)^k = -\frac{8(1+\varepsilon)^2 A(1+o_T(1))}{(1-\varepsilon)^2\,T^{1+\beta}\pi^2\,\Lambda}\int_0^1 x^{k-2}\,dx \ge -\frac{8(1+\varepsilon)^3 A}{(1-\varepsilon)^2\,T^{1+\beta}\pi^2\,\Lambda(k-1)}.$$
Similarly, due to Corollary 5.3, Section 5,

$$A_{2T} = -\frac{8A(1+o_T(1))}{T^{1+\beta}\pi^2}\sum_{l=L+1}^{\infty}l^{-2} = -\frac{8A(1+o_T(1))}{\Lambda_\varepsilon\,T^{1+\beta}\pi^2}\int_1^{\infty}x^{-2}\,dx \ge -\frac{8A(1+\varepsilon)}{\Lambda_\varepsilon\,T^{1+\beta}\pi^2} \ge -\frac{8(1+\varepsilon)^3 A}{(1-\varepsilon)^2\,T^{1+\beta}\pi^2\,\Lambda}.$$
Putting together the last two estimates, we get the following lower bound for the Bayesian risk with prior distribution $Q$:

$$E_Q R_T(\bar f_T, f_\vartheta) \ge -\frac{8(1+\varepsilon)^3 A k}{(1-\varepsilon)^2\,T^{1+\beta}\pi^2\,\Lambda(k-1)} = -T^{-\frac{2k}{2k-1}}\,\hat\Pi(k,R)\,\frac{(1+\varepsilon)^3}{(1-\varepsilon)^2}.$$

This inequality, (3.4) and (3.11) imply the desired result, since $\bar f_T\in W_T$ and the real number $\varepsilon$ can be chosen as small as we want. □
4 Estimator
Our goal now is to construct an estimator attaining the lower bound obtained in the previous section. Note that in the problem of signal detection in Gaussian white noise there is a linear estimator which is second order minimax. The linear filter defining this estimator uses the so-called Pinsker weights.
The same assertion holds true for the problem of distribution function estimation from i.i.d. observations. In this case the second order minimax estimator is linear with respect to the (global) trigonometric basis on the interval $[-B,B]$, where $B$ tends to infinity as the number of observations tends to infinity.
It turns out that an analogous result holds true in the model of ergodic diffusion, but one has to be careful in the definition of the linear estimator. More precisely, the estimation of the function $f$ in $L_2$-norm is equivalent to estimating its Fourier transform in $L_2$. Therefore it would be natural to expect that the best linear estimator of the Fourier transform of $f$ (which leads to a kernel estimator) is asymptotically optimal over the set of all possible estimators. But this assertion turns out to be wrong.
A heuristic explanation of this fact is that when we consider linear estimators of the Fourier transform, we disperse the information concerning each point over the whole real line, and therefore the estimator does not take into account the local structure of the model with respect to the space parameter. That is why localised bases of functions fit our model better; in particular, it appears that a slightly modified linear estimator with respect to a localised trigonometric basis is second order asymptotically minimax.
To construct our estimator, we introduce the localised orthonormal basis $\{e_{l,m} : l,m\in\mathbb Z\}$ of $L_2(\mathbb R)$ defined in the following way:

$$e_{l,m}(x) = \frac{1}{\sqrt{2\delta_T}}\,\exp\bigg\{\frac{i\pi l\,(x - a_m + \delta_T)}{\delta_T}\bigg\}\,\chi_{\{x\in I_m\}}.$$

Above $i=\sqrt{-1}$ and $I_m$ is the interval of length $2\delta_T$ with center $a_m = 2m\delta_T$. We call
this basis localised since $\delta_T$ tends to zero as $T\to\infty$. The Fourier coefficients of the local time and the invariant density with respect to this basis are

$$\varphi^\circ_{T,l,m} = \int_{\mathbb R}\bar e_{l,m}(x)\,f_T^\circ(x)\,dx = \frac1T\int_0^T \bar e_{l,m}(X_t)\,dt,\qquad \varphi_{S,l,m} = \int_{\mathbb R}\bar e_{l,m}(x)\,f_S(x)\,dx = E_S\big[\bar e_{l,m}(X_t)\big],$$
where $\bar e_{l,m}$ stands for the complex conjugate of $e_{l,m}$. Of course, the estimation of the function $f_S(\cdot)$ in $L_2$ is equivalent to the estimation of the infinite matrix $\{\varphi_{S,l,m} : l,m\in\mathbb Z\}$. Since $\varphi^\circ_{T,l,m}$ is the empirical estimator of $\varphi_{S,l,m}$, the linear estimator of $\{\varphi_{S,l,m} : l,m\in\mathbb Z\}$ is defined as

$$\bar\varphi_{T,l,m} = \varphi_{T,l,m}\,\varphi^\circ_{T,l,m},\qquad l,m\in\mathbb Z, \tag{4.1}$$

where the $\varphi_{T,l,m}$ are some numbers between 0 and 1. This leads us to the ensuing definition of the linear estimator of the invariant density: $\bar f_T(x) = \sum_{l,m\in\mathbb Z}\varphi_{T,l,m}\,\varphi^\circ_{T,l,m}\,e_{l,m}(x)$.
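The orthonormality of the localised basis $\{e_{l,m}\}$ over one interval $I_m$ can be confirmed numerically; the grid size and the indices below are arbitrary choices of ours:

```python
import cmath

delta = 0.5                         # stands in for delta_T; arbitrary for the check
m = 1
a_m = 2 * m * delta

def e(l, x):
    """Basis function e_{l,m}(x) on I_m (the indicator is implicit: x stays in I_m)."""
    return cmath.exp(1j * cmath.pi * l * (x - a_m + delta) / delta) / (2 * delta) ** 0.5

def inner(l, lp, n=1024):
    # trapezoidal rule over I_m = [a_m - delta, a_m + delta];
    # exact for whole periods of the complex exponential
    h = 2 * delta / n
    s = 0.5 * (e(l, a_m - delta) * e(lp, a_m - delta).conjugate()
               + e(l, a_m + delta) * e(lp, a_m + delta).conjugate())
    for i in range(1, n):
        x = a_m - delta + i * h
        s += e(l, x) * e(lp, x).conjugate()
    return s * h
```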
In order to attain the lower bound of the previous section, this linear estimator should be modified; in some sense, we have to correct it at the boundaries. Let us discuss this in more detail. By assumption, the estimated function $f_S$ belongs to the Sobolev class $\Sigma(k,R)$. We would like to derive from this condition an ellipsoid-type inequality for the coefficients $\varphi_{S,l,m}$. Using the integration by parts formula, for $l\ne0$, we get
$$\varphi_{S,l,m} = \varphi_{f^{(k)},l,m}\bigg(\frac{\delta_T}{i\pi l}\bigg)^k - \frac{(-1)^l}{\sqrt{2\delta_T}}\sum_{j=0}^{k-1}\Big(f^{(j)}(a_m+\delta_T) - f^{(j)}(a_m-\delta_T)\Big)\bigg(\frac{\delta_T}{i\pi l}\bigg)^{j+1}.$$

This last sum will be denoted $q_{l,m}(f)$. Now the condition $\int_{\mathbb R}\big(f_S^{(k)}(x)\big)^2\,dx \le R$ can be rewritten as

$$\sum_{m\in\mathbb Z}\sum_{l\in\mathbb Z}\big|\varphi_{S,l,m} + q_{l,m}(f_S)\big|^2\bigg(\frac{\pi l}{\delta_T}\bigg)^{2k} \le R. \tag{4.2}$$
Recall that the regularity condition imposed on the estimated function determines the behaviour of the bias and does not influence the variance. That is why we want to modify the linear estimator defined by (4.1) so that the bias of the modified estimator is linear with respect to $\varphi_{S,l,m} + q_{l,m}(f_S)$ while its variance remains unchanged. It is easy to check that the sequence

$$\varphi^*_{T,l,m} = \varphi_{T,l,m}\big(\varphi^\circ_{T,l,m} + q_{l,m}(f_S)\big) - q_{l,m}(f_S),\qquad l,m\in\mathbb Z,$$

satisfies these conditions, but it is not an estimator, since it involves the unknown function $f_S$ via the term $q_{l,m}(f_S)$. To overcome this problem, we substitute this term by a rate optimal estimator $\hat q_{T,l,m}$. Thus, $\tilde\varphi_{T,l,m} = \varphi_{T,l,m}\big(\varphi^\circ_{T,l,m} + \hat q_{T,l,m}\big) - \hat q_{T,l,m}$, $l,m\in\mathbb Z$,
where
$$\hat q_{T,l,m} = \frac{(-1)^l}{\sqrt{2\delta_T}}\sum_{j=0}^{k-1}\big(\tilde f_T^{(j)}(a_m+\delta_T)-\tilde f_T^{(j)}(a_m-\delta_T)\big)\Big(\frac{\delta_T}{i\pi l}\Big)^{j+1}$$
$$= \frac{(-1)^l}{T\sqrt{2\delta_T}}\sum_{j=0}^{k-1}\Big(\frac{\delta_T}{i h_T\pi l}\Big)^{j+1}\int_0^T\bigg[Q^{(j)}\Big(\frac{a_m+\delta_T-X_t}{h_T}\Big) - Q^{(j)}\Big(\frac{a_m-\delta_T-X_t}{h_T}\Big)\bigg]\,dt$$
and $Q$ is a kernel of order $k$ with support $[-1,1]$. From now on, we set $h_T = T^{-1/(2k-1)}$ and $\delta_T = h_T^{1/2}$. Note that this trick with $q_{l,m}$ has already been exploited by Delattre and Hoffmann [10]; it is particularly useful in the case of a nonhomogeneous Fisher information.
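The effect of the correction can be seen on a toy Monte Carlo: if the empirical coefficient equals the true one plus centred noise, the corrected coefficient $\varphi(\varphi^\circ+q)-q$ has bias exactly $(\varphi-1)(\varphi_S+q)$ while its variance stays $\varphi^2\,\mathrm{Var}(\varphi^\circ)$. The sketch below is hypothetical; all numeric values are illustrative, not taken from the paper:

```python
import numpy as np

# Toy model: the empirical coefficient is the true one plus centred noise.
rng = np.random.default_rng(0)
phi_S, q, weight = 0.8, -0.3, 0.7            # illustrative values
noise = 0.1 * rng.standard_normal(1_000_000)
phi_emp = phi_S + noise                       # plays the role of phi°_{T,l,m}

# Corrected coefficient: weight * (phi° + q) - q
corrected = weight * (phi_emp + q) - q

bias_mc = corrected.mean() - phi_S            # Monte Carlo bias
bias_th = (weight - 1) * (phi_S + q)          # (phi_hat - 1)(phi_S + q)
assert abs(bias_mc - bias_th) < 1e-3          # bias is linear in phi_S + q
```

The variance of `corrected` is `weight**2` times that of `phi_emp`, i.e. the correction leaves the stochastic part of the coefficient untouched.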
We show in this section that there exists a modified linear estimator which is second order minimax. The weights $\varphi_{T,l,m}$ of this estimator are defined by
$$\hat\varphi_{T,l,m} = \Big(1 - \big|l/\hat\nu\big|^{k+\mu_T}\Big)_+,$$
where $\mu_T = 1/\sqrt{\log T}$ tends to zero as $T\to\infty$ and
$$\hat\nu = \hat\nu_T = \delta_T\bigg(\frac{8k\pi^{2(k-1)}}{TR(k-1)(2k-1)}\bigg)^{-\frac{1}{2k-1}}.$$
We denote by $\hat f_T(\cdot)$ the estimator of the invariant density defined via these weights. Note that $\hat\nu_T$ tends to infinity as $T$ tends to infinity.
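These definitions are easy to inspect numerically. The hypothetical sketch below (the values of $k$, $R$ and $T$ are illustrative) checks that the weights are a genuine filter in $[0,1]$, that $\hat\nu_T$ grows with $T$, and that $\hat\nu_T^{\mu_T}\to\infty$, a convergence used later in the proof:

```python
import numpy as np

def nu_hat(T, k, R):
    """Cutoff nu_hat = delta_T * (8 k pi^(2(k-1)) / (T R (k-1)(2k-1)))^(-1/(2k-1)),
    with h_T = T^(-1/(2k-1)) and delta_T = sqrt(h_T), as in the text."""
    h = T ** (-1.0 / (2 * k - 1))
    delta = np.sqrt(h)
    B = 8 * k * np.pi ** (2 * (k - 1)) / (T * R * (k - 1) * (2 * k - 1))
    return delta * B ** (-1.0 / (2 * k - 1))

def weights(l, T, k, R):
    """Pinsker-type weights (1 - |l/nu_hat|^(k + mu_T))_+ with mu_T = 1/sqrt(log T)."""
    mu = 1.0 / np.sqrt(np.log(T))
    return np.maximum(0.0, 1.0 - np.abs(l / nu_hat(T, k, R)) ** (k + mu))

k, R = 2, 1.0
l = np.arange(-1000, 1001)
for T in (1e2, 1e4, 1e6):
    w = weights(l, T, k, R)
    assert np.all((0.0 <= w) & (w <= 1.0))                            # a filter
assert nu_hat(1e6, k, R) > nu_hat(1e4, k, R) > nu_hat(1e2, k, R)      # nu_hat grows
mu = lambda T: 1.0 / np.sqrt(np.log(T))
assert nu_hat(1e6, k, R) ** mu(1e6) > nu_hat(1e2, k, R) ** mu(1e2)    # nu^mu grows
```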
Theorem 4.1 Let $k>1$. Then the estimator $\hat f_T$ is second order asymptotically minimax, that is,
$$\lim_{T\to\infty}\,\sup_{S(\cdot)\in\Sigma_*} T^{\frac{2k}{2k-1}}\,\mathcal R_T(\hat f_T, f_S) = -\hat\Pi(k,R). \qquad (4.3)$$
Proof: Let $S(\cdot)\in\Sigma_*(k,R)$. Recall that the normalised first order minimax risk asymptotically equals
$$\mathcal R(S) = 4\int_{\mathbb R} f_S(x)^2\, E_S\bigg[\Big(\frac{\chi_{\{\xi>x\}}-F_S(\xi)}{f_S(\xi)}\Big)^2\bigg]\,dx = \sum_{l,m\in\mathbb Z} E_S\big[\Psi^2_{S,l,m}(\xi)\big],$$
where we applied the Parseval identity to the square integrable function $g_S(\cdot,x)$ having the following Fourier coefficients:
$$\Psi_{S,l,m}(x) = 2\int_{\mathbb R} e_{l,m}(y)\,f_S(y)\,\frac{\chi_{\{x>y\}}-F_S(x)}{f_S(x)}\,dy.$$
Using this decomposition, one can rewrite the second order risk function as
$$\mathcal R_T(\hat f_T,f_S) = E_S\bigg[\sum_{l,m\in\mathbb Z}\Big(\big|\tilde\varphi_{T,l,m}-\varphi_{S,l,m}\big|^2 - T^{-1}E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big)\bigg]$$
$$= \sum_{l,m\in\mathbb Z} E_S\bigg[\big|\hat\varphi_{T,l,m}\varphi^\circ_{T,l,m}-\varphi_{S,l,m}+(\hat\varphi_{T,l,m}-1)\,\hat q_{T,l,m}\big|^2 - \frac{\Psi^2_{S,l,m}(\xi)}{T}\bigg]$$
$$= \sum_{l,m\in\mathbb Z} E_S\bigg[\Big|\hat\varphi_{T,l,m}\big(\varphi^\circ_{T,l,m}+q_{l,m}(f_S)\big) - \big(\varphi_{S,l,m}+q_{l,m}(f_S)\big) + (\hat\varphi_{T,l,m}-1)\big(\hat q_{T,l,m}-q_{l,m}(f_S)\big)\Big|^2 - \frac{1}{T}\,E_S\big[\Psi^2_{S,l,m}(\xi)\big]\bigg].$$
On the one hand, we shall show below that the term
$$\sum_{l,m\in\mathbb Z} E_S\Big|\hat\varphi_{T,l,m}\big(\varphi^\circ_{T,l,m}+q_{l,m}(f_S)\big) - \big(\varphi_{S,l,m}+q_{l,m}(f_S)\big)\Big|^2$$
is of order $T^{-1}$. On the other hand, the second inequality of Lemma 5.5 implies that $\sum_{l,m\in\mathbb Z} E_S\big|(\hat\varphi_{T,l,m}-1)\big(\hat q_{T,l,m}-q_{l,m}(f_S)\big)\big|^2$ is of order $T^{-(2k+1/2)/(2k-1)}$. So a simple application of the Hölder inequality yields
$$\mathcal R_T(\hat f_T,f_S) \le \sum_{l,m\in\mathbb Z}\Big( E_S\Big|\hat\varphi_{T,l,m}\big(\varphi^\circ_{T,l,m}+q_{l,m}(f_S)\big) - \big(\varphi_{S,l,m}+q_{l,m}(f_S)\big)\Big|^2 - T^{-1}E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big) + CT^{-(2k+1/4)/(2k-1)}.$$
The stationarity of the process $X$ implies that the mathematical expectation of $\varphi^\circ_{T,l,m}$ equals $\varphi_{S,l,m}$; therefore
$$\mathcal R_T(\hat f_T,f_S) \le \sum_{l,m\in\mathbb Z}\Big(|\hat\varphi_{T,l,m}-1|^2\,\big|\varphi_{S,l,m}+q_{l,m}(f_S)\big|^2 + |\hat\varphi_{T,l,m}|^2\,\mathrm{Var}_S\big[\varphi^\circ_{T,l,m}\big] - T^{-1}E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big) + CT^{-(2k+1/4)/(2k-1)}.$$
The first term of this sum is in the appropriate form, and we show below that it is of order $T^{-\frac{2k}{2k-1}}$. To determine the behaviour of the second and third terms, we need several technical results.
Lemma 4.2 The following relations hold:
$$\sum_{l,m\in\mathbb Z}\Big(|\hat\varphi_{T,l,m}|^2\,\mathrm{Var}_S\big[\varphi^\circ_{T,l,m}\big] - T^{-1}E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big) \le T^{-2k/(2k-1)}\,o_T(1), \qquad (4.4)$$
$$\sum_{l,m\in\mathbb Z}\big(1-|\hat\varphi_{T,l,m}|^2\big)\,E_S\big[\Psi^2_{S,l,m}(\xi)\big] \ge \big(1+o_T(1)\big)\sum_{m\in\mathbb Z,\,l\neq0}\frac{8\,\delta_T^2\,f_S(a_m)\big(1-|\hat\varphi_{T,l,m}|^2\big)}{\pi^2 l^2}. \qquad (4.5)$$
The proof of this lemma is quite technical and relies essentially on the arguments used in the proofs of Lemmas 5.2-5.4 in Section 5; that is why it is omitted here.
Using this lemma, for any linear filter $\hat\varphi = \{\hat\varphi_{T,l,m}\}$ such that $|\hat\varphi_{T,l,m}|\le 1$, the second order risk can be evaluated as follows:
$$\mathcal R_T(\hat f_T,f_S) \le \sum_{l,m\in\mathbb Z,\,l\neq0}\Big(|\hat\varphi_{T,l,m}-1|^2\,\big|\varphi_{S,l,m}+q_{l,m}(f_S)\big|^2 - \frac{8\,\delta_T^2\,f_S(a_m)}{T\pi^2l^2}\big(1-|\hat\varphi_{T,l,m}|^2\big)\Big) + CT^{-\frac{2k+1/4}{2k-1}}.$$
Recall that the function $f_S(\cdot)$ belongs to $\Sigma_*(k,R)$. Acting as in (4.2), we get
$$\big(\pi\delta_T^{-1}\big)^{2k}\sum_{l,m\in\mathbb Z} l^{2k}\,\big|\varphi_{S,l,m}+q_{l,m}(f_S)-\varphi_{*,l,m}-q_{l,m}(f_*)\big|^2 \le R. \qquad (4.6)$$
Since the central function $f_*$ is smoother than $f_S$, we have
$$\sum_{l,m\in\mathbb Z} l^{2k+2\mu_T}\,\big|\varphi_{*,l,m}+q_{l,m}(f_*)\big|^2 < C < +\infty,$$
where the constant $C$ does not depend on $T$. If we use this inequality and the convergence $\hat\nu_T^{\mu_T}\to\infty$, we obtain
$$\sum_{l,m\in\mathbb Z}\big|1-\hat\varphi_{T,l,m}\big|^2\,\big|\varphi_{S,l,m}+q_{l,m}(f_S)\big|^2 \le \hat\nu_T^{-2k}\sum_{l,m\in\mathbb Z} l^{2k}\,\big|\varphi_{S,l,m}+q_{l,m}(f_S)-\varphi_{*,l,m}-q_{l,m}(f_*)\big|^2\,\big(1+o_T(1)\big) \le \big(\hat\nu_T\delta_T^{-1}\pi\big)^{-2k} R\,\big(1+o_T(1)\big).$$
Since the coefficients $\hat\varphi_{T,l,m}$ do not depend on $m$, we shall omit the index $m$. We denote by $\mathbb Z_*$ the set of all nonzero integers. Using the fact that the integral of $f_S(\cdot)$ equals one and the convergence of the partial sums, we get
$$\mathcal R_T(\hat f_T,f_S) \le \frac{R\,\delta_T^{2k}}{(\pi\hat\nu)^{2k}}\big(1+o_T(1)\big) - \sum_{l\in\mathbb Z_*}\frac{4\,\delta_T\big(1-|\hat\varphi_{T,l}|^2\big)}{\pi^2 T l^2}.$$
It is easy to check that
$$\sum_{l\in\mathbb Z_*}\frac{1-|\hat\varphi_{T,l}|^2}{l^2} \sim \sum_{|l|\le\hat\nu}\frac{1}{l^2}\bigg(2\Big|\frac{l}{\hat\nu}\Big|^{k} - \Big|\frac{l}{\hat\nu}\Big|^{2k}\bigg) + \sum_{|l|>\hat\nu}\frac{1}{l^2} \sim \frac{2}{\hat\nu}\int_0^1\big(2x^{k-2}-x^{2k-2}\big)\,dx + \frac{2}{\hat\nu}\int_1^\infty x^{-2}\,dx = \frac{2}{\hat\nu}\Big(\frac{2}{k-1}-\frac{1}{2k-1}+1\Big) = \frac{4k^2}{\hat\nu(k-1)(2k-1)}.$$
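This Riemann-sum approximation is easy to check numerically. A hypothetical sketch (the values of $k$ and $\hat\nu$ are illustrative, and $\mu_T$ is set to 0 for simplicity):

```python
import numpy as np

k, nu = 2, 500.0
l = np.arange(1, 200_000)
phi = np.maximum(0.0, 1.0 - (l / nu) ** k)         # weights for l >= 1 (mu_T -> 0)
s = 2.0 * np.sum((1.0 - phi**2) / l**2)             # factor 2 accounts for +-l
theory = 4 * k**2 / (nu * (k - 1) * (2 * k - 1))    # = 16/(3 nu) for k = 2
assert abs(s / theory - 1.0) < 0.02                 # within 2% for nu = 500
```

The agreement improves as $\hat\nu$ grows, in line with the $a_T\sim b_T$ statement below the display.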
Above, $a_T\sim b_T$ means $a_T = b_T\big(1+o_T(1)\big)$. We are now able to explain the choice of $\hat\nu$: it is simply the value minimising the function
$$G(\nu^{-1}) = \frac{R\,\delta_T^{2k}\,\nu^{-2k}}{\pi^{2k}} - \frac{16k^2\,\delta_T\,\nu^{-1}}{\pi^2 T(k-1)(2k-1)}.$$
The substitution of $\hat\nu_T$ by its value leads to the desired bound for the second order risk of our estimator:
$$\mathcal R_T(\hat f_T,f_S) \le -\big(1+o_T(1)\big)\,T^{-\frac{2k}{2k-1}}\,\hat\Pi(k,R).$$
This completes the proof of the theorem. □
Remark 4.3 We supposed in all statements and proofs that the diffusion coefficient σ is identically 1, but this condition can be relaxed. There are two possible approaches for obtaining optimal constants in this case:
• Either one considers the weighted (second order) $L_2$-risk
$$E_S\int_{\mathbb R}\big(\bar f_T(x)-f_S(x)\big)^2\sigma^2(x)\,dx - \frac{4}{T}\int_{\mathbb R} f_S^2(x)\,\sigma^2(x)\,E_S\bigg(\frac{\chi_{\{\xi>x\}}-F_S(\xi)}{\sigma(\xi)f_S(\xi)}\bigg)^2 dx$$
and defines the weighted Sobolev ball $\big\{\int_{\mathbb R}\big(f_S^{(k)}-f_*^{(k)}\big)^2\sigma^2 \le R\big\}$ as the parameter space. Then the optimal constant remains unchanged.
• Or one keeps the risk definition unchanged but accepts to work in the local minimax setting, that is, when the estimated function belongs to a narrowing vicinity of an unknown (but fixed) function $f_*$. In that case the optimal constant depends on the central function $f_*$ and has the following form:
$$\hat\Pi(f_*,k,R) = 2(2k-1)\bigg(\int_{\mathbb R}\frac{f_*(x)}{\sigma^2(x)}\,dx\bigg)^{\frac{2k}{2k-1}}\bigg(\frac{4k}{\pi(k-1)(2k-1)}\bigg)^{\frac{2k}{2k-1}} R^{\frac{1}{2k-1}}.$$
Remark 4.4 The optimal constant $\hat\Pi(k,R)$ obtained in this work contains a factor 2, absent from the second order optimal constant in the problem of distribution function estimation from i.i.d. observations (cf. Golubev and Levit [17]); the second order term of the risk is therefore twice as small. In some sense, this reveals that the estimation problem in our setting is easier than in the setting of [17]. This phenomenon has a simple explanation.
In the definition of the parameter set, we required that the function $S$ satisfy the inequality $\mathrm{sgn}(x)S(x) \le -\gamma_*$ for $x\notin[-A,A]$. This condition automatically excludes density functions $f_S$ possessing heavy tails; moreover, it implies that these densities decrease exponentially fast at infinity. In particular, densities close to those of uniform distributions over a large interval do not belong to our nonparametric class. But exactly these densities are the most difficult to estimate, because observations distributed according to such laws are quite dispersed over the whole real line and contain very little information about the local variation of the underlying density function.
Remark 4.5 We estimated the infinite matrix $\varphi_S = \{\varphi_{S,l,m}\}_{l,m\in\mathbb Z}$ of the Fourier coefficients of $f_S$ by the infinite matrix $\tilde\varphi_T = \{\tilde\varphi_{T,l,m}\}_{l,m\in\mathbb Z}$. In reality, this last matrix contains only a finite number of nonzero elements. Indeed, for large values of $m$, the intervals $I_m$ contain no observations, and the corresponding coefficients $\tilde\varphi_{T,l,m}$ are therefore equal to zero. So we have used here a kind of soft thresholding procedure.
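Before turning to the technical lemmas, the basic object of Section 4, the time-average kernel estimator $\tilde f_T$ (the case $j=0$), can be illustrated end to end on a simulated ergodic diffusion. The hypothetical sketch below uses an Ornstein-Uhlenbeck process $dX_t = -X_t\,dt + dW_t$, whose invariant density is that of $N(0,1/2)$; the Euler scheme, kernel, bandwidth and horizon are illustrative choices, not the tuning of the paper:

```python
import numpy as np

# Euler scheme for dX = -X dt + dW; the invariant law is N(0, 1/2).
rng = np.random.default_rng(1)
T, dt = 2000.0, 0.01
n = int(T / dt)
x = np.empty(n)
x[0] = 0.0
dW = np.sqrt(dt) * rng.standard_normal(n - 1)
for t in range(n - 1):
    x[t + 1] = x[t] - x[t] * dt + dW[t]

# An order-2 kernel on [-1, 1]: integral 1, first and second moments zero.
def Q(u):
    return np.where(np.abs(u) <= 1.0, (9.0 - 15.0 * u**2) / 8.0, 0.0)

h = 0.3
a = 0.0
# Time-average kernel estimator: (1/(T h)) * integral of Q((X_t - a)/h) dt
f_tilde = Q((x - a) / h).mean() / h

f_true = 1.0 / np.sqrt(np.pi)   # N(0, 1/2) density at 0
assert abs(f_tilde - f_true) < 0.15
```

With a long horizon the time average concentrates around the invariant density, which is the first order phenomenon behind all the estimators of this paper.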
5 Proofs of technical lemmas
Lemma 5.1 In the notation of Section 3, for any integer $l$ different from 0, we have
$$E\big[\Psi^2_{l,m,\vartheta}(\xi)\,\chi_{\{\xi\notin I_m\}}\big] \le \frac{CA^3}{l^2T^{3\beta}}.$$
Proof: We denote $\delta = A/T^\beta$ and split the indicator into two parts:
$$E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi\notin I_m\}}\big] = E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi>a_m+\delta\}}\big] + E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi<a_m-\delta\}}\big].$$
The first term can be evaluated as follows:
$$E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi>a_m+\delta\}}\big] = 4\,E\bigg[\Big(\frac{1-F_\vartheta(\xi)}{f_\vartheta(\xi)}\Big)^2\chi_{\{\xi>a_m+\delta\}}\bigg]\bigg(\int_{I_m} f_\vartheta(x)\,e_{l,m}(x)\,dx\bigg)^2 \le C\,E_Q\bigg(\int_{I_m} f_\vartheta(x)\,e_{l,m}(x)\,dx\bigg)^2,$$
since one can prove, exactly as in [9], that $f_\vartheta(x) = f_*(x)\big(1+o_T(1)\big)$ and
$$\int_{a_m+\delta}^\infty\Big(\frac{1-F_*(y)}{f_*(y)}\Big)^2 f_*(y)\,dy < CA\,f_*^{-1}(a_m+\delta).$$
The first multiplier will be evaluated using the integration by parts formula:
$$\int_{I_m} f_\vartheta(x)\,e_{l,m}(x)\,dx = \frac{\delta}{\pi l}\bigg[\big(f_\vartheta(a_m+\delta)-f_\vartheta(a_m-\delta)\big)\,e_{-l,m}(a_m-\delta) - \int_{I_m} f'_\vartheta(x)\,e_{-l,m}(x)\,dx\bigg].$$
Using the facts that $f_\vartheta(x)$ is differentiable with derivative $2S_\vartheta(x)f_\vartheta(x)$ and that $S_\vartheta(\cdot)$ is bounded by $2A$ on the interval $[-A,A]$ for any $\vartheta\in\Theta_T$, we get
$$\bigg|\int_{I_m} f_\vartheta(x)\,e_{l,m}(x)\,dx\bigg| \le \frac{CA\,\delta^{3/2}}{l^2}\,f_*(a_m).$$
Combining these estimates, we obtain the desired result for the first term. The term involving the indicator of the event $\xi<a_m-\delta$ can be evaluated analogously. This completes the proof of the lemma. □
Lemma 5.2 If $l\neq0$ and $y\in I_m$, then the following representation holds:
$$\Psi_{l,m,\vartheta}(y) = \frac{A\big(1+o_T(1)\big)}{\pi l T^\beta}\Big(e_{-l,m}(y) - e_{-l,m}(a_m-AT^{-\beta})\Big) - \frac{A\,\Phi_{l,m,\vartheta}(y)}{\pi l T^\beta},$$
where the function $\Phi_{l,m,\vartheta}$ is defined as follows:
$$\Phi_{l,m,\vartheta}(y) = \int_{I_m} f'_\vartheta(x)\,\frac{\chi_{\{y>x\}}-F_\vartheta(y)}{f_\vartheta(y)}\,e_{-l,m}(x)\,dx.$$
Proof: The proof of this lemma is quite close to the proof of the preceding one. Using the integration by parts formula, one gets
$$\Psi_{l,m,\vartheta}(y) = \frac{\delta}{\pi l\,f_\vartheta(y)}\Big(f_\vartheta(y)\,e_{-l,m}(y) - f_\vartheta(a_m-\delta)\,e_{-l,m}(a_m-\delta)\Big) - \frac{\delta\,F_\vartheta(y)}{\pi l\,f_\vartheta(y)}\Big(f_\vartheta(a_m+\delta)-f_\vartheta(a_m-\delta)\Big)e_{-l,m}(a_m-\delta) - \frac{\delta}{\pi l}\int_{I_m} f'_\vartheta(x)\,\frac{\chi_{\{y>x\}}-F_\vartheta(y)}{f_\vartheta(y)}\,e_{-l,m}(x)\,dx.$$
To finish the proof, it remains to remark that on the interval $I_m$ one has $|f_\vartheta(y)-f_\vartheta(a_m-\delta)| \le C\delta\,|f'_\vartheta(y)| \le CA\delta\,|f_\vartheta(y)|$, since $f'_\vartheta(x) = 2S_\vartheta(x)f_\vartheta(x)$ and $S_\vartheta(x)$ is bounded by $1+|x|$. □
Corollary 5.3 For any $l$ different from 0, define the number $c_l$ to be equal to 1 if $l<0$ and to 3 if $l>0$. Then we have
$$E\big[\Psi^2_{l,m,\vartheta}(\xi)\big] \le \frac{A^2 f_*(a_m)\big(1+o_T(1)\big)}{\pi^2l^2T^{2\beta}}\Big(\sqrt{c_l} + \sqrt{E\big[\Phi^2_{l,m,\vartheta}(\xi)\big]}\Big)^2 + \frac{CA^3}{l^2T^{3\beta}},$$
and (see [9, Appendix] for the proof)
$$\sum_{l,m\in\mathbb Z} E\big[\Phi^2_{l,m,\vartheta}(\xi)\big] \le C.$$
Lemma 5.4 The residual terms $\mu_{l,m,T}$ defined in (3.7) satisfy $|\mu_{l,m,T}| \le C\delta_T^3T^{-1}$, where $C$ is a constant.
Proof: The proof of this lemma is rather technical. We give below the main ideas of the proof and leave the technical details to the reader. First of all, due to Lemma 5.1, it is enough to prove this inequality for
$$\bar\mu_{l,m,T} = \frac{E_Q\Big[\big(\frac{\partial\psi_{l,m,\vartheta}}{\partial\vartheta_{l,m}}\big)^2\Big] - 4\,E_Q\big[I_{l,m}(\vartheta)\big]\,E\big[\Psi^2_{l,m,\vartheta}(\xi)\,\chi_{\{\xi\in I_m\}}\big]}{T\,E_Q\big[I_{l,m}(\vartheta)\big] + J_l}.$$
Next, it can be shown (see Kutoyants [22]) that
$$\frac{\partial\psi_{l,m,\vartheta}}{\partial\vartheta_{l,m}} = 2\sqrt{\frac{2A}{f_*(a_m)T^\beta}}\int_{I_m} g_{l,m}(x)\,\Psi_{l,m,\vartheta}(x)\,f_\vartheta(x)\,dx. \qquad (5.1)$$
Since all the functions under the integral are differentiable with bounded derivatives and the length of the interval $I_m$ is $2\delta_T$, we have
$$\frac{\partial\psi_{l,m,\vartheta}}{\partial\vartheta_{l,m}} = \sqrt{\frac{32\,\delta_T^3}{f_*(a_m)}}\,g_{l,m}(a_m)\,\Psi_{l,m,\vartheta}(a_m)\,f_\vartheta(a_m) + \delta_T^{5/2}\,O_T(1).$$
Using the same arguments, one can check that
$$I_{l,m}(\vartheta)\,E_\vartheta\big[\Psi^2_{l,m,\vartheta}(\xi)\,\chi_{\{\xi\in I_m\}}\big] = \frac{2\delta_T}{f_*(a_m)}\,E_\vartheta\big[g^2_{l,m}(\xi)\big]\,E_\vartheta\big[\Psi^2_{l,m,\vartheta}(\xi)\,\chi_{\{\xi\in I_m\}}\big] = \frac{8\,\delta_T^3}{f_*(a_m)}\,g^2_{l,m}(a_m)\,\Psi^2_{l,m,\vartheta}(a_m)\,f_\vartheta(a_m)^2 + O_T(\delta_T^4). \qquad (5.2)$$
It is now clear that the difference of the terms in (5.1) and (5.2) is of order $\delta_T^4$. Since the denominator of $\bar\mu_{l,m,T}$ is larger than $CT\delta_T$, we get the desired inequality. □
Lemma 5.5 In the notation of Section 4, we have
$$E_S\big(\tilde f^{(j)}_T(a) - f^{(j)}_S(a)\big)^2 \le C\,h_T^{2k-2j}\,(1+|a|)^{\nu_*}\,f_S(a), \qquad (5.3)$$
$$E_S\big(\hat q_{T,l,m} - q_{l,m}(f_S)\big)^2 \le C\,h_T^{2k+1/2}\int_{I_m}(1+|a|)^{\nu_*} f_S(a)\,da\;\sum_{j=0}^{k-1}\Big(\frac{\hat\nu_T}{l}\Big)^{2j+2}\hat\nu_T^{-1}. \qquad (5.4)$$
Proof: The proof of this lemma is quite standard: it is based on a bias-variance decomposition; the bias is then estimated in the usual way (as in the i.i.d. case), and the variance is bounded via the Itô formula, exactly as in Kutoyants [21].
The bias-variance decomposition yields
$$E_S\big(\tilde f^{(j)}_T(a)-f^{(j)}_S(a)\big)^2 \le 2\big(E_S[\tilde f^{(j)}_T(a)]-f^{(j)}_S(a)\big)^2 + 2\,E_S\big(\tilde f^{(j)}_T(a)-E_S[\tilde f^{(j)}_T(a)]\big)^2.$$
In view of the stationarity of $X$ and the equality $\int Q(u)\,du = 1$, we get
$$E_S[\tilde f^{(j)}_T(a)] - f^{(j)}_S(a) = \int_{\mathbb R} Q(u)\big[f^{(j)}_S(a+uh_T)-f^{(j)}_S(a)\big]\,du.$$
We now apply the Taylor formula with the remainder term in integral form and the fact that $Q$ is a kernel of order $k$ (i.e. $\int u^lQ(u)\,du = 0$ for any integer $l\in[1,k]$):
$$E_S[\tilde f^{(j)}_T(a)] - f^{(j)}_S(a) = \frac{1}{(k-j-1)!}\int_{\mathbb R} Q(u)\int_0^{uh_T} y^{\,k-j-1}\,f^{(k)}_S(a+uh_T-y)\,dy\,du.$$
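The moment conditions defining a kernel of order $k$ can be exhibited concretely for small $k$: for $k=2$, one even polynomial choice supported on $[-1,1]$ is $Q(u) = (9-15u^2)/8$ (an illustrative construction; the paper does not specify a particular kernel). A sketch verifying the moments numerically:

```python
import numpy as np

def Q(u):
    """Even polynomial kernel of order 2 on [-1, 1]: the integral of Q is 1
    and the moments in u, u^2 (and u^3, by symmetry) vanish."""
    return np.where(np.abs(u) <= 1.0, (9.0 - 15.0 * u**2) / 8.0, 0.0)

def trapezoid(vals, u):
    """Composite trapezoidal rule on a uniform grid."""
    du = u[1] - u[0]
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * du

u = np.linspace(-1.0, 1.0, 200_001)
assert abs(trapezoid(Q(u), u) - 1.0) < 1e-7              # integral of Q is 1
for l in (1, 2, 3):
    assert abs(trapezoid(u**l * Q(u), u)) < 1e-7         # vanishing moments
assert abs(trapezoid(u**4 * Q(u), u) + 3.0 / 35.0) < 1e-7  # first nonzero moment
```

The first nonvanishing moment is what drives the size of the Taylor remainder in the bias bound.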
Since $Q(u)=0$ for $|u|>1$, we obtain
$$\big|E_S[\tilde f^{(j)}_T(a)] - f^{(j)}_S(a)\big| \le \frac{h_T^{k-j-1}}{(k-j-1)!}\int_{-h_T}^{h_T}\big|f^{(k)}_S(a+y)\big|\,dy.$$
The Cauchy–Schwarz inequality yields
$$\big|E_S[\tilde f^{(j)}_T(a)] - f^{(j)}_S(a)\big|^2 \le \frac{h_T^{2k-2j-1}}{[(k-j-1)!]^2}\int_{-h_T}^{h_T}\big[f^{(k)}_S(a+y)\big]^2\,dy. \qquad (5.5)$$
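The resulting bias decay can be observed numerically. A hypothetical sketch for $j=0$, using the even order-2 kernel $Q(u)=(9-15u^2)/8$ on $[-1,1]$ and the target $f(x)=e^{-x^2}$ (both illustrative choices); for this even kernel the third moment also vanishes, so the bias scales like $h^4$:

```python
import numpy as np

def Q(u):
    return np.where(np.abs(u) <= 1.0, (9.0 - 15.0 * u**2) / 8.0, 0.0)

f = lambda x: np.exp(-x**2)

def bias(h, n=200_001):
    # Bias of the kernel smoother at a = 0: integral of Q(u) (f(uh) - f(0)) du,
    # computed by the composite trapezoidal rule.
    u = np.linspace(-1.0, 1.0, n)
    du = u[1] - u[0]
    vals = Q(u) * (f(u * h) - f(0.0))
    return (vals.sum() - 0.5 * (vals[0] + vals[-1])) * du

r = bias(0.2) / bias(0.1)
assert 14.0 < r < 18.0    # halving h divides the bias by roughly 2**4 = 16
```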
We turn to the evaluation of the variance. We have
$$\tilde f^{(j)}_T(a) - E_S[\tilde f^{(j)}_T(a)] = \frac{1}{Th_T^{j+1}}\int_0^T\bigg(Q^{(j)}\Big(\frac{X_t-a}{h_T}\Big) - E_S\Big[Q^{(j)}\Big(\frac{X_t-a}{h_T}\Big)\Big]\bigg)\,dt.$$
Let us introduce the functions $g(u)$ and $H(u)$ as follows:
$$g(u) = Q^{(j)}\Big(\frac{u-a}{h_T}\Big) - E_S\Big[Q^{(j)}\Big(\frac{X_0-a}{h_T}\Big)\Big], \qquad H(u) = \int_0^u\frac{2}{f_S(y)}\int_{-\infty}^y g(v)\,f_S(v)\,dv\,dy.$$
It can be easily checked that the Itô formula implies
$$\int_0^T g(X_t)\,dt = H(X_T) - H(X_0) - \int_0^T H'(X_t)\,dW_t,$$
and therefore, due to stationarity,
$$E_S\bigg(\int_0^T g(X_t)\,dt\bigg)^2 \le 6\,E_S\big[H^2(X_0)\big] + 3T\,E_S\big[\big(H'(X_0)\big)^2\big].$$
By integrating by parts, we get
$$H'(u) = 2\int_{-\infty}^u g(y)\,dy - \frac{2}{f_S(u)}\int_{-\infty}^u\int_{-\infty}^y g(v)\,dv\;f'_S(y)\,dy$$
$$= 2\int_{-\infty}^u Q^{(j)}\Big(\frac{y-a}{h_T}\Big)\,dy - \frac{2}{f_S(u)}\int_{\mathbb R}\int_{-\infty}^y Q^{(j)}\Big(\frac{v-a}{h_T}\Big)\,dv\;f'_S(y)\,\big(\chi_{\{y<u\}}-F_S(u)\big)\,dy$$
$$= 2h_T\,Q^{(j-1)}\Big(\frac{u-a}{h_T}\Big) - \frac{2h_T}{f_S(u)}\int_{\mathbb R} Q^{(j-1)}\Big(\frac{y-a}{h_T}\Big)\,f'_S(y)\,\big(\chi_{\{y<u\}}-F_S(u)\big)\,dy. \qquad (5.6)$$
Using the inequality
$$\frac{\chi_{\{y<u\}}-F_S(u)}{f_S(u)} \le C + \frac{C\,\chi_{\{u\in[0,y]\}}}{f_S(u)} \le C + \frac{C\,\chi_{\{u\in[0,y]\}}}{\sqrt{f_S(u)f_S(y)}},$$
the equality $f'_S(x) = 2S(x)f_S(x)$ and the fact that $S(\cdot)$ increases at most polynomially, one can prove that the second term in (5.6) is asymptotically negligible with respect to the first one. Therefore,
$$E_S\big[\big(H'(X_0)\big)^2\big] \le 4\big(1+o_T(1)\big)\,E_S\bigg[\Big(h_T\,Q^{(j-1)}\Big(\frac{X_0-a}{h_T}\Big)\Big)^2\bigg] \le 4\big(1+o_T(1)\big)\,h_T^3\int_{-1}^1\big[Q^{(j-1)}(u)\big]^2 f_S(a+uh_T)\,du \le C\,h_T^3\int_{a-h_T}^{a+h_T} f_S(x)\,dx.$$
It is evident that the term involving the expectation of $H^2(X_0)$ is asymptotically much smaller than the term $T\,E_S[H'(X_0)^2]$. Consequently,
$$E_S\big(\tilde f^{(j)}_T(a)-f^{(j)}_S(a)\big)^2 \le C\,h_T^{2k-2j-1}\int_{a-h_T}^{a+h_T}\big[f^{(k)}_S(y)\big]^2\,dy + \frac{C}{Th_T^{2j-1}}\int_{a-h_T}^{a+h_T} f_S(x)\,dx \le C\,h_T^{2k-2j-1}\int_{a-h_T}^{a+h_T} f_S(x)\,dx \le C\,h_T^{2k-2j}\,(1+|a|)^{\nu_*}\,f_S(a).$$
Therefore, using the equality
$$\hat q_{T,l,m} - q_{l,m}(f_S) = \frac{(-1)^l}{\sqrt{2\delta_T}}\sum_{j=0}^{k-1}\Big(\frac{\delta_T}{i\pi l}\Big)^{j+1}\int_{I_m}\big(\tilde f^{(j+1)}_T(a) - f^{(j+1)}_S(a)\big)\,da$$
and the Cauchy–Schwarz inequality, we get
$$E_S\big|\hat q_{T,l,m}-q_{l,m}(f_S)\big|^2 \le \frac{k}{2\delta_T}\sum_{j=0}^{k-1}\Big(\frac{\delta_T}{\pi l}\Big)^{2j+2} E_S\bigg(\int_{I_m}\big(\tilde f^{(j+1)}_T(a)-f^{(j+1)}_S(a)\big)\,da\bigg)^2$$
$$\le \frac{C}{\delta_T}\sum_{j=0}^{k-1}\Big(\frac{\delta_T}{l}\Big)^{2j+2} h_T^{2k-2j}\,\big(1+|a_m-\delta_T|\big)^{\nu_*}\,f_S(a_m-\delta_T)$$
$$\le C\,h_T^{2k+1}\,\delta_T^{-1}\int_{I_m}(1+|a|)^{\nu_*} f_S(a)\,da\;\sum_{j=0}^{k-1}\Big(\frac{\hat\nu_T}{l}\Big)^{2j+2}\hat\nu_T^{-1}.$$
In the last inequality we have used the fact that the localisation step $\delta_T$ equals the square root of $h_T$ and that $\hat\nu_T$ is of order $\delta_T/h_T$. □
References
[1] G. Banon. Nonparametric identification for diffusion processes. SIAM J. Control
and Optim., 16, 3:380–395, 1978.
[2] D. Bosq. Nonparametric Statistics for Stochastic Processes. Lecture Notes in
Statistics, 110, Springer, New York, 1998.
[3] D. Bosq and Yu. Davydov. Local time and density estimation in continuous time.
Math. Methods Statist., 8:22–45, 1999.
[4] J. V. Castellana and M. R. Leadbetter. On smoothed density estimation for stationary processes. Stoch. Proc. Appl., 21:179–193, 1986.
[5] L. Cavalier, G. K. Golubev, D. Picard and A. B. Tsybakov. Oracle inequalities for inverse problems. Ann. Statist., 30:843–874, 2002.
[6] L. Cavalier and A. B. Tsybakov. Penalized blockwise Stein's method, monotone oracles and sharp adaptive estimation. Math. Methods Statist., 10, 3:247–282, 2001.
[7] A. S. Dalalyan. Sharp adaptive estimation of the trend coefficient for ergodic diffusion. Prépublication no. 02–1, Laboratoire Statistique et Processus, Univ. du Maine.
[8] A. S. Dalalyan and Yu. A. Kutoyants. Asymptotically efficient estimation of the
derivative of the invariant density. Statist. Infer. Stochast. Processes, Vol. 6, 1:89–
109, 2003.
[9] A. S. Dalalyan and Yu. A. Kutoyants. Asymptotically efficient trend coefficient
estimation for ergodic diffusions. Math. Methods Statist., Vol. 11, 4:402–427, 2002.
[10] S. Delattre and M. Hoffmann. The Pinsker bound in mixed Gaussian white noise.
Math. Methods Statist, Vol. 10, 3:283–316, 2001.
[11] R. Durrett. Stochastic Calculus: a practical introduction. CRC Press, Boca Raton,
1996.
[12] S. Yu. Efromovich. Nonparametric Curve Estimation. Springer, New York, 1998.
[13] I. I. Gikhman and A. V. Skorokhod. Introduction to Theory of Random Processes.
W.B. Saunders, Philadelphia, 1969.
[14] R. D. Gill and B. Ya. Levit. Application of the van Trees inequality: a Bayesian
Cramer–Rao bound. Bernoulli, 1:59–79, 1995.
[15] G. K. Golubev. LAN in problems of non-parametric estimation of functions and
lower bounds for quadratic risks. Theory Probab. Appl., 36, 1:152–157, 1991.
[16] G. K. Golubev and W. Härdle. On adaptive smoothing in partial linear models.
Math. Methods Statist., Vol. 11, 1:98–117, 2002.
[17] G. K. Golubev and B. Ya. Levit. On the second order minimax estimation of distribution functions. Math. Methods Statist., Vol. 5, 1:1–31, 1996.
[18] G. K. Golubev and B. Ya. Levit. Asymptotically efficient estimation for analytic
functions. Math. Methods Statist., Vol. 5, 357–368, 1996.
[19] G. K. Golubev and B. Ya. Levit. Distribution function estimation: adaptive smoothing. Math. Methods Statist., Vol. 5, 383–403, 1996.
[20] Yu. A. Koshevnik and B. Ya. Levit. On a non-parametric analogue of the information matrix. Theory Probab. Appl., Vol. 21, 738–753, 1976.
[21] Yu. A. Kutoyants. Efficient density estimation for ergodic diffusion. Statistical
Inference for Stochastic Processes. Vol. 1, 2:131–155, 1998.
[22] Yu. A. Kutoyants. Statistical Inference for Ergodic Diffusion Processes. Springer
Series in Statistics, New York, 2003.
[23] F. Leblanc. Density estimation for a class of continuous time processes. Math. Methods Statist., Vol. 6, 2:171–199, 1997.
[24] O. V. Lepski and V. G. Spokoiny. Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist., 25:2512–2546, 1997.
[25] H. T. Nguyen. Density estimation in continuous-time Markov processes. Ann. Statist., 7:341–348, 1979.
[26] M. Nussbaum. Minimax risk: Pinsker’s bound. In Encyclopedia of Statistical
Sciences, Update Volume 3, 451–460 (S.Kotz, Ed.), Wiley, New York, 1999.
[27] M. Nussbaum. Asymptotic equivalence of density estimation and white noise. Ann.
Statist., 24:2399–2430, 1996.
[28] M. S. Pinsker. Optimal filtering of square integrable signals in Gaussian white noise.
Problems Inform. Transmission, 16:120–133, 1980.
[29] B. L. S. Prakasa Rao. Nonparametric density estimation for stochastic processes
from sampled data. Publ. ISUP, XXXV:51–84, 1990.
Arnak S. Dalalyan
Laboratoire de Probabilités
Université Paris 6, Boı̂te courrier 188
75252 Paris cedex 05, France
dalalyan@ccr.jussieu.fr
Yu.A. Kutoyants
Laboratoire de Statistique et Processus
Université du Maine, Av. O. Messiaen
72085 Le Mans cedex 9, France
kutoyants@univ-lemans.fr