Statistics & Decisions 22, 500–524 (2004). © R. Oldenbourg Verlag, München 2004

On second order minimax estimation of invariant density for ergodic diffusion

Arnak S. Dalalyan, Yury A. Kutoyants

Received: March 17, 2003; Accepted: March 3, 2004

Summary: There are many asymptotically first order efficient estimators in the problem of estimating the invariant density of an ergodic diffusion process nonparametrically. To distinguish between them, we consider the problem of asymptotically second order minimax estimation of this density based on a sample path observed up to time $T$. This leads to two problems. The first is to find a lower bound on the second order risk of any estimator. The second is to construct an estimator which attains this lower bound. We carry out this program (bound + estimator) following Pinsker's approach. If the parameter set is a subset of the Sobolev ball of smoothness $k>1$ and radius $R>0$, the second order minimax risk is shown to behave as $-T^{-2k/(2k-1)}\hat\Pi(k,R)$ for large values of $T$. The constant $\hat\Pi(k,R)$ is given explicitly.

1 Introduction

The model. In this paper we deal with a diffusion process $X^T=\{X_t,\ 0\le t\le T\}$ given by the stochastic differential equation
$$dX_t = S(X_t)\,dt + \sigma(X_t)\,dW_t,\qquad X_0=\xi,\quad 0\le t\le T, \tag{1.1}$$
where $W^T=\{W_t,\ 0\le t\le T\}$ is a standard one-dimensional Wiener process. We suppose that the trend coefficient $S(\cdot)$ and the diffusion coefficient $\sigma(\cdot)^2$ are such that equation (1.1) has a unique weak solution and the conditions
$$\int_0^x \exp\Bigl\{-2\int_0^y \frac{S(z)}{\sigma(z)^2}\,dz\Bigr\}\,dy \longrightarrow \pm\infty \quad\text{as } x\to\pm\infty, \tag{1.2}$$
$$G(S) = \int_{-\infty}^{\infty} \sigma(y)^{-2}\exp\Bigl\{2\int_0^y \frac{S(z)}{\sigma(z)^2}\,dz\Bigr\}\,dy < \infty \tag{1.3}$$
are fulfilled. We fix a diffusion coefficient $\sigma(\cdot)^2$ and denote by $\Sigma$ the class of all trend coefficients satisfying conditions (1.2) and (1.3). By these conditions the process

AMS 1991 subject classifications.
Primary: 62M05, 62G20; Secondary: 62G07, 60G35. Key words and phrases: ergodic diffusion, invariant density estimation, second order minimax, lower bound, Pinsker's constant.

$X=\{X_t,\ t\ge 0\}$ is positive recurrent with invariant density
$$f_S(x) = \frac{1}{G(S)\,\sigma(x)^2}\exp\Bigl\{2\int_0^x \frac{S(z)}{\sigma(z)^2}\,dz\Bigr\}. \tag{1.4}$$

The aim of the present paper is to estimate the function $f_S(\cdot)$ nonparametrically, given a sample path $X^T=\{X_t,\ 0\le t\le T\}$ of the diffusion process $X$ defined by (1.1). We assume that $\sigma(\cdot)$ is a known, sufficiently smooth, positive function such that $\sigma+\sigma^{-1}$ increases at most polynomially. The trend coefficient is supposed to be unknown.

First order efficiency. During the last twenty years the problem of stationary density estimation for continuous time processes has been studied by many authors; we mention here Nguyen [25], Prakasa Rao [29], Castellana and Leadbetter [4], Leblanc [23], Kutoyants [21], Bosq and Davydov [3], Bosq [2]. It is proven by Castellana and Leadbetter [4] that for processes satisfying a certain strong mixing condition this density can be estimated with the "parametric" rate $T^{-1/2}$. In most situations the above mentioned mixing condition is not easy to verify, since it involves the joint density of $X_0$ and $X_t$, which in general cannot be written explicitly. In Kutoyants [21, 22] it is shown that for a one dimensional ergodic diffusion process one can obtain the same rate of convergence without verifying that mixing condition. The nonparametric analogue of the Fisher information in this problem is shown to be
$$I(S,x) = \biggl(4 f_S(x)^2\, E_S\Bigl[\Bigl(\frac{\chi_{\{\xi>x\}} - F_S(\xi)}{\sigma(\xi)\, f_S(\xi)}\Bigr)^2\Bigr]\biggr)^{-1}. \tag{1.5}$$
For a more detailed discussion of the Fisher information and its connection with bounding the minimax risk, the interested reader is referred to Koshevnik and Levit [20], where the i.i.d. case is presented.
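Formula (1.4) can be sanity-checked numerically. The sketch below is an illustration only, not part of the paper's argument: it takes the Ornstein–Uhlenbeck trend $S(x)=-x$ with $\sigma\equiv 1$ (the centre $S_*$ used later in the paper), computes $G(S)$ and $f_S$ by quadrature, and compares with the closed form $e^{-x^2}/\sqrt\pi$, the $N(0,1/2)$ density.

```python
import math
import numpy as np

# Centre case used later in the paper: S(x) = -x (Ornstein-Uhlenbeck), sigma = 1.
# Then (1.4) gives f_S(x) = exp(-x^2)/sqrt(pi), the N(0, 1/2) density.
S = lambda v: -v

xs = np.linspace(-8.0, 8.0, 4001)

def inner(x):
    # 2 * int_0^x S(v) dv, computed by quadrature (exact here since S is linear)
    v = np.linspace(0.0, x, 401)
    return 2.0 * np.trapz(S(v), v)

unnorm = np.exp([inner(x) for x in xs])   # unnormalised density exp(-x^2)
G = np.trapz(unnorm, xs)                  # G(S); equals sqrt(pi) in this case
f = unnorm / G

assert abs(G - math.sqrt(math.pi)) < 1e-3
assert np.max(np.abs(f - np.exp(-xs**2) / math.sqrt(math.pi))) < 1e-3
```

The truncation of the real line to $[-8,8]$ is harmless here because the Gaussian tails are negligible at that range.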
Here and in the sequel, $E_S$ stands for the expectation with respect to the probability measure induced by the process (1.1) on the space of continuous functions from $[0,T]$ to $\mathbb R$ equipped with the Borel $\sigma$-algebra of uniform convergence. The random variable $\xi$ is supposed to follow the invariant law.

In fact, the existence of $\sqrt T$-consistent estimators in the case of ergodic diffusion can also be explained by the fact that, due to (1.4), the invariant density is a smooth functional of the trend coefficient. Since the trend coefficient can be estimated with the standard nonparametric rate of convergence, it is natural that a smooth functional of the trend is estimated with rate $T^{-1/2}$. An example illustrating the connection between the minimax estimation of a functional parameter and the second order minimax estimation of a smooth functional of this parameter can be found in Golubev and Levit [18].

Furthermore, it is shown in [21] that under mild regularity conditions the local-time estimator
$$f_T^\circ(x) = \frac{1}{T\sigma(x)^2}\int_0^T \operatorname{sgn}(x-X_t)\,dX_t + \frac{|X_T-x| - |X_0-x|}{T\sigma(x)^2},$$
kernel-type estimators $\bar f_{K,T}(x)$ and a wide class of unbiased estimators $\tilde f_T(x)$ are consistent, asymptotically normal and asymptotically (first order) efficient. Moreover, as is proved in Kutoyants [22, Section 4.3], the corresponding lower bound on the minimax risk and the efficiency of the estimators hold true for the $L_2$-type risk as well. In particular, if the bandwidth is of order $T^{-1/2}$ or smaller, then the kernel estimator satisfies
$$T\, E_S \int_{\mathbb R} \bigl[\bar f_{K,T}(x) - f_S(x)\bigr]^2\,dx \xrightarrow[T\to\infty]{} R(S) = \int_{\mathbb R} I(S,x)^{-1}\,dx.$$

Second order minimax approach. The abundance of first order efficient estimators motivates our interest in developing the second order minimax approach. The main concepts of this approach can be presented as follows.
Any of the above cited estimators attaining the lower bound $(T I(S,x))^{-1}$ is unbiased or has a bias that is asymptotically negligible with respect to the variance. Furthermore, the choice of the smoothing parameter (the bandwidth in the case of the kernel estimator) does not make the main part of the variance smaller. However, oversmoothing (choosing a larger smoothing parameter) does decrease the variance, but this effect is perceptible only in the second term of the asymptotic expansion of the variance. So the main idea of the second order minimax approach is to choose the smoothing parameter by minimizing the sum of the bias and the second term of the variance's asymptotic expansion.

More precisely, we suppose that the function $f_S(\cdot)$ is $k$ times differentiable and the trend coefficient $S(\cdot)$ belongs to some set $\Sigma_*$ defined below. The second order risk is then defined as follows:
$$R_T(\bar f_T, f_S) = E_S\int_{\mathbb R} \bigl(\bar f_T(x) - f_S(x)\bigr)^2\,dx - T^{-1}R(S),$$
where $\bar f_T(x)$ is an arbitrary estimator of the density. It is evident that for asymptotically efficient estimators $\bar f_T$ the quantity $T R_T(\bar f_T, f_S)$ tends to zero as $T\to\infty$. It can be shown that for some of these estimators there exists a non-degenerate limit of $T^{\frac{2k}{2k-1}} R_T(\bar f_T, f_S)$, while for the others this limit is infinite. Therefore we can compare the performance of these estimators according to the limits of this quantity.

It is noteworthy that for the local time estimator the quantity $R_T(f_T^\circ, f_S)$ tends to zero at the speed $T^{-3/2}$ (this assertion can easily be derived from the martingale representation of the local time estimator). It means that if the smoothness order $k$ of $f_S$ exceeds $3/2$, we have $T^{2k/(2k-1)} R_T(f_T^\circ, f_S)\to 0$, and consequently the minimal limiting value of $T^{2k/(2k-1)} R_T(\bar f_T, f_S)$ is less than or equal to zero.
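To make the local time estimator concrete, here is a small simulation (illustrative only; the time horizon, step size and tolerance are ad hoc choices, and the test is stochastic with a fixed seed). It runs an Euler scheme for the Ornstein–Uhlenbeck diffusion $dX_t=-X_t\,dt+dW_t$, whose invariant density is $e^{-x^2}/\sqrt\pi$, and evaluates $f_T^\circ$ at a few points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Euler scheme for dX = -X dt + dW (sigma = 1); invariant density exp(-x^2)/sqrt(pi).
T, n = 1000.0, 1_000_000
dt = T / n
X = np.empty(n + 1)
X[0] = 0.0
noise = rng.normal(scale=np.sqrt(dt), size=n)
for i in range(n):
    X[i + 1] = X[i] * (1.0 - dt) + noise[i]

def local_time_estimator(x):
    # (1/T) [ int_0^T sgn(x - X_t) dX_t + |X_T - x| - |X_0 - x| ], with sigma = 1
    ito = np.sum(np.sign(x - X[:-1]) * np.diff(X))
    return (ito + abs(X[-1] - x) - abs(X[0] - x)) / T

true_density = lambda x: np.exp(-x**2) / np.sqrt(np.pi)
for x in (0.0, 1.0):
    assert abs(local_time_estimator(x) - true_density(x)) < 0.1
```

The Itô integral is approximated by its left-point Riemann sum; by Tanaka's formula the combination above recovers the (normalised) local time at $x$, which converges to the invariant density.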
We show in this paper that in the case of Sobolev regularity $k\ge 2$ this minimal value is strictly negative, which entails the second order inefficiency of the local time estimator. Note that the condition that $f_S$ belongs to a Sobolev ball of smoothness $k\ge 2$ means that the function $S$ is absolutely continuous, which is not a strong restriction.

As is typical for results describing the asymptotic behaviour of the minimax risk, we divide the problem into two parts. The first one is to find a constant $\hat\Pi$ such that the inequality
$$\liminf_{T\to\infty}\ \inf_{\bar f_T}\ \sup_{S(\cdot)\in\Sigma_*} T^{\frac{2k}{2k-1}} R_T(\bar f_T, f_S) \ge -\hat\Pi$$
holds, where the $\inf$ is taken over all estimators $\bar f_T(\cdot)$. This inequality gives us the lower bound for the risks of all estimators. The second part is to construct an estimator $\hat f_T(\cdot)$ which attains this bound, i.e., such that
$$\lim_{T\to\infty}\ \sup_{S(\cdot)\in\Sigma_*} T^{\frac{2k}{2k-1}} R_T(\hat f_T, f_S) = -\hat\Pi.$$
Such estimators will be called second order minimax over the parameter set $\Sigma_*$. We show in this paper that the optimal constant $\hat\Pi$ is given by
$$\hat\Pi = \hat\Pi(k,R) = 2(2k-1)\Bigl(\frac{4k}{\pi(k-1)(2k-1)}\Bigr)^{\frac{2k}{2k-1}}\, R^{-\frac{1}{2k-1}}. \tag{1.6}$$

The methodology. The approach adopted in this work is inspired by the paper of Golubev and Levit [17], who studied second order minimax estimation of the distribution function (d.f.) in the i.i.d. case (see also Golubev and Levit [18] for the case of infinitely smooth d.f. estimation). In that paper the authors prove that Pinsker's phenomenon (see [28]), well known for classical problems of nonparametric curve estimation, appears in the problem of d.f. estimation when the second order minimax approach is considered. A list of articles concerning the Pinsker bound and related topics can be found in Nussbaum [26] and Efromovich [12]. For ergodic diffusion processes we have already obtained similar results in the problems of density derivative [8] and trend coefficient [9] estimation.
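For reference, the constant (1.6) is easy to tabulate. The snippet below (a convenience, not part of the paper) implements it and checks the qualitative behaviour: a larger Sobolev radius $R$ (a rougher class) leaves less room for second order improvement, so $\hat\Pi$ decreases in $R$.

```python
import math

def second_order_constant(k: int, R: float) -> float:
    """Pinsker-type constant Pi-hat(k, R) from (1.6)."""
    e = 2 * k / (2 * k - 1)
    base = 4 * k / (math.pi * (k - 1) * (2 * k - 1))
    return 2 * (2 * k - 1) * base ** e * R ** (-1 / (2 * k - 1))

assert second_order_constant(2, 1.0) > 0
# bigger ball => smaller constant => smaller guaranteed second order gain
assert second_order_constant(2, 10.0) < second_order_constant(2, 1.0)
```

The rate itself, $T^{-2k/(2k-1)}$, approaches the parametric second order rate $T^{-1}$... only as $k\to\infty$; for fixed $k$ it is always slower.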
To obtain a lower bound, we construct a parametric family $\{S(\vartheta,\cdot),\ \vartheta\in\Theta_T\}$ which is almost entirely contained in the original nonparametric set $\Sigma_*$. This allows us to bound the global minimax risk from below by the Bayesian risk in a finite dimensional parameter estimation problem. We then choose the least favorable prior distribution and parametric family, and the risk of the latter problem is shown to be asymptotically bounded from below by $-T^{-2k/(2k-1)}\hat\Pi(k,R)$.

To construct a second order minimax estimator we use the Hilbert space structure of $L_2(\mathbb R, dx)$. Namely, we argue that estimation of $f_S$ in $L_2$-norm is equivalent to estimation in $\ell_2$-norm of the Fourier coefficients of $f_S$ in a suitably chosen orthonormal basis of $L_2(\mathbb R)$. We then modify the linear estimator of these coefficients so as to remove certain boundary effects, and prove that Pinsker's filter leads to a second order minimax estimator.

Of course, the estimator proposed in this paper depends on the parameters $k$ and $R$ describing the parameter set. In the case when these parameters are not known, one should use an adaptive estimation procedure. Such a procedure can be obtained using unbiased risk minimization (see for instance Cavalier et al. [5]), the Lepski method (see Lepski and Spokoiny [24]) or blockwise adaptation (see Cavalier and Tsybakov [6]). The exact theoretical study of an adaptive procedure lies beyond the scope of this paper and will be done in forthcoming work. We believe that in our problem, by combining the techniques developed in Golubev and Levit [19], Golubev and Härdle [16] and Dalalyan [7], it is possible to construct an adaptive procedure achieving the minimax bound obtained in this paper over a wide range of Sobolev balls.
It is worth noting that if the precise value of $k$ is not known, but the function $S$ is supposed to be absolutely continuous (so that $f_S'$ exists and is absolutely continuous), then the estimator proposed in Section 4 with $k=2$ will have a second order risk of order $T^{-4/3}$, which is significantly better than the second order risk of the local time estimator.

The parameter set. Let us now describe exactly the conditions imposed on the model. To simplify the computations we put $\sigma(x)\equiv 1$, so the diffusion process is given by
$$dX_t = S(X_t)\,dt + dW_t,\qquad X_0,\quad 0\le t\le T, \tag{1.7}$$
and we have to estimate the invariant density $f_S(x) = G(S)^{-1}\exp\bigl\{2\int_0^x S(v)\,dv\bigr\}$ based on the observations $X^T = \{X_t,\ 0\le t\le T\}$. Some comments on possible generalizations to the case $\sigma\not\equiv 1$ are given at the end of Section 4.

For technical reasons we replace conditions (1.2), (1.3) by the stronger condition $S\in\Sigma_{\gamma_*}(A_*, C_*, \nu_*)$, where
$$\Sigma_{\gamma_*}(A_*, C_*, \nu_*) = \Bigl\{ S\in\Sigma :\ \operatorname{sgn}(x)\,S(x) \le -\gamma_*\ \text{for } |x|>A_*,\qquad |S(x)| \le C_*(1+|x|^{\nu_*}),\ \forall x\in\mathbb R \Bigr\}.$$
Here $\gamma_*, A_*, C_*$ and $\nu_*$ are positive constants. It is easy to see that for such functions $S(\cdot)$ conditions (1.2), (1.3) are fulfilled.

Fix some integer $k>1$. The function $S(\cdot)$ is supposed to be $(k-2)$ times differentiable with absolutely continuous $(k-2)$th derivative and to belong to the set
$$\Sigma(k,R) = \Bigl\{ S(\cdot)\in\Sigma :\ \int_{\mathbb R}\bigl[f_S^{(k)}(x) - f_{S_*}^{(k)}(x)\bigr]^2\,dx \le R \Bigr\}, \tag{1.8}$$
where $R>0$ is some constant and $f_S^{(k)}(x)$ is the $k$th derivative (in the distributional sense) of the function $f_S(x)$ with respect to $x$. The set $\Sigma(k,R)$ is a Sobolev ball of smoothness $k$ and radius $R$ centered at $f_{S_*}=f_*$. The choice of the center is not arbitrary: it should be smoother than the other functions of the class. For simplicity we consider $S_*(x)=-x$. In this case the corresponding diffusion is an Ornstein–Uhlenbeck process and the invariant density is infinitely differentiable. Finally we define the parameter set $\Sigma_* = \Sigma_*(k,R) = \Sigma(k,R)\cap\Sigma_{\gamma_*}(A_*, C_*, \nu_*)$.
The paper is organised as follows. The next section contains some heuristic arguments helping to understand the result. In Section 3 we state the lower bound result and prove it up to some technical lemmas. Section 4 is devoted to the construction of the second order minimax estimator; it also contains the proof of its minimaxity. The proofs of the technical results are gathered in Section 5.

2 Heuristic explanation

As we mentioned in the introduction, the local time estimator $f_T^\circ(x)$ is asymptotically normal with mean $f_S(x)$ and variance $[T I(S,x)]^{-1}$. Since the local time is a sufficient statistic in the model of ergodic diffusion (see Kutoyants [22]), the latter is weakly asymptotically equivalent to the heteroscedastic Gaussian experiment
$$Y_t = f_S(t) + \frac{2 f_S(t)}{\sqrt T}\int_{\mathbb R} \frac{\chi_{\{u>t\}} - F_S(u)}{\sqrt{f_S(u)}}\,dB_u,$$
where $B=(B_u,\ u\in\mathbb R)$ is a Brownian motion. Let now $v(\cdot)$ be a test function (infinitely differentiable with compact support) and $V(\cdot)$ be its primitive. A formal use of the integration by parts formula yields
$$\begin{aligned}
\int_{\mathbb R} v(t)Y_t\,dt &= \int_{\mathbb R} v(t)f_S(t)\,dt + \frac{2}{\sqrt T}\int_{\mathbb R}\int_{\mathbb R} v(t)f_S(t)\,\frac{\chi_{\{u>t\}} - F_S(u)}{\sqrt{f_S(u)}}\,dB_u\,dt\\
&= -\int_{\mathbb R} V(t)f_S'(t)\,dt - \frac{2}{\sqrt T}\int_{\mathbb R} V(t)\sqrt{f_S(t)}\,dB_t - \frac{2}{\sqrt T}\int_{\mathbb R}\int_{\mathbb R} V(t)f_S'(t)\,\frac{\chi_{\{u>t\}} - F_S(u)}{\sqrt{f_S(u)}}\,dB_u\,dt.
\end{aligned}$$
The third term of this sum can be dropped in the asymptotic results since, in contrast with the second one, it is uniformly bounded in $L_2$ (see [8]) in the following sense:
$$\sup_{S\in\Sigma_*} E_S\int_{\mathbb R}\biggl[\,\int_{\mathbb R} f_S'(t)\,\frac{\chi_{\{u>t\}} - F_S(u)}{\sqrt{f_S(u)}}\,dB_u\biggr]^2\,dt < \infty.$$
Therefore, dropping the third term, we come to the Gaussian experiment
$$dY_t = f_S'(t)\,dt + 2\sqrt{f_S(t)\,T^{-1}}\,dB_t,\qquad t\in\mathbb R.$$
As noted by Golubev and Levit [18], second order minimax estimation of a function which is a regular functional (the invariant density in our case) of the unknown parameter is closely related to the problem of first order minimax estimation of the derivative of this function.
On the other hand, it is known (Golubev [15]) that in the Gaussian shift experiment $dY_t = \theta(t)\,dt + \varepsilon\sqrt{I^{-1}(t)}\,dB_t$ the optimal constant depends on the Fisher information $I(t)$ only via the integral $\int_{\mathbb R} I^{-1}(t)\,dt$. Due to the fact that $f_S$ is a probability density, the integral of the inverse of the Fisher information in our case is equal to $2$ and does not depend on $S$. This explains why we succeed in obtaining the second order minimax bound (up to a constant) over a global nonparametric class of the unknown parameter. These arguments also show that the restriction to the $L_2$-norm in the risk definition is essential for deriving the optimal constants.

To conclude this heuristic reasoning, let us remark that the same mechanism made it possible to obtain the second order minimax constant in the problem of estimating the distribution function $F(\cdot)$ from i.i.d. observations. Indeed, according to Nussbaum [27], the statistical problem of $F'(\cdot)$ estimation is locally asymptotically equivalent to recovering the function $F'(\cdot)$ from the Gaussian shift experiment $dY_t = F'(t)\,dt + n^{-1/2}\sqrt{F_0'(t)}\,dB_t$, where $n$ is the size of the sample and $F_0$ is the center of localisation.

3 Lower bound

The first result of this work concerns the asymptotically exact lower bound on the $L_2$-risks of all estimators. This bound turns out to be almost the same as in the problem of distribution function estimation from a sequence of i.i.d. random variables (Golubev and Levit [17]). The main reason for this similarity is that these constants appear as solutions of two very similar optimisation problems.

Theorem 3.1 Let $k>1$ be an integer. Then
$$\liminf_{T\to\infty}\ \inf_{\bar f_T}\ \sup_{S\in\Sigma_*} T^{\frac{2k}{2k-1}} R_T(\bar f_T, f_S) \ge -\hat\Pi(k,R), \tag{3.1}$$
where
$$\hat\Pi(k,R) = 2(2k-1)\Bigl(\frac{4k}{\pi(k-1)(2k-1)}\Bigr)^{\frac{2k}{2k-1}}\, R^{-\frac{1}{2k-1}}.$$

Proof: The main steps of the proof are close to those of Theorem 1 in [9].
The minimax risk over the nonparametric set $\Sigma$ is bounded from below by the Bayesian risk over a parametric set of increasing dimension (constructed as in Golubev [15] and Dalalyan and Kutoyants [9]); then the van Trees inequality is applied.

Let us introduce a parametric family of diffusion processes
$$dX_t = S(\vartheta, X_t)\,dt + dW_t,\qquad X_0,\quad 0\le t\le T, \tag{3.2}$$
where $S(\vartheta,\cdot)$ is chosen in the following way. We fix an increasing interval $[-A,A]$ with $A = A_T = \gamma_*^{-1}\log(T+1)$ and a sequence of its sub-intervals $I_m = [a_m - AT^{-\beta},\ a_m + AT^{-\beta}]$, where $\beta = (2k-1)^{-1}$ and $a_m = 2mAT^{-\beta}$, $m = 0, \pm1, \pm2, \dots, \pm M$. Here $M = M_T$ is the greatest integer such that the interval $I_M$ is entirely included in $[-A,A]$. Let us set
$$S(\vartheta, x) = S_*(x) + \sum_{|m|\le M}\sum_{|i|\le L} \vartheta_{i,m}\,\sqrt{\frac{2A}{T^\beta f_*(a_m)}}\; g_{i,m}(x),$$
where $\vartheta$ is the $(2M+1)\times(2L+1)$ matrix of coefficients $\vartheta_{i,m}$ and
$$g_{i,m}(x) = \sqrt{T^\beta/A}\; e_i\bigl(T^\beta A^{-1}(x-a_m)\bigr)\, U\bigl(A - |x-a_m|T^\beta\bigr).$$
Here $e_i(\cdot)$ is the trigonometric basis on $[-1,1]$, that is,
$$e_i(x) = \begin{cases} \sin(\pi i x), & i>0,\\ 1/\sqrt2, & i=0,\\ \cos(\pi i x), & i<0, \end{cases}$$
and the function $U(x)$ is $(k+1)$ times differentiable, increasing, vanishing for $x\le 0$ and equal to one for $x\ge 1$. The integer $L = L_T$ will be chosen later. The parameter $\vartheta\in\Theta_T$ is of increasing dimension and
$$|\vartheta_{l,m}| \le K\sqrt{\sigma_l(\varepsilon)},\qquad \sigma_l(\varepsilon) = \frac{1}{2AT^{1-\beta}}\Bigl(\Bigl|\frac{\Lambda(1-\varepsilon)}{l}\Bigr|^k - 1\Bigr)_+ \tag{3.3}$$
for $l\ne 0$, and $\sigma_0 = T^{-\beta}$ for $l=0$. Above we used the standard notation $B_+ = \max(0,B)$; $\varepsilon$ is a positive number and
$$\Lambda = \Lambda_T = A\Bigl(\frac{R(k-1)(2k-1)}{4k\pi^{2k-2}}\Bigr)^{\frac{1}{2k-1}}.$$
The number $L = L_T$ is now the integer part of $\Lambda$. For the moment the choice of the $\sigma_l$ and $\Lambda$ may seem unmotivated, but it will be clarified a little later. Note that for $S(\cdot)\in\Sigma_{\gamma_*}(A_*,C_*,\nu_*)$ we have the estimate (see [9, Lemma 4 in the Appendix])
$$\sup_{\vartheta\in\Theta_T}\int_{|x|>A} f_\vartheta(x)^2\, E_\vartheta\Bigl(\frac{\chi_{\{\xi>x\}} - F_\vartheta(\xi)}{f_\vartheta(\xi)}\Bigr)^2\,dx \le C\,e^{-\gamma_* A},$$
which implies that the choice $A = \gamma_*^{-1}\log(T+1)$ makes this term asymptotically negligible.
Therefore it is sufficient to study the lower bound for the risk
$$\hat R_T(\bar f_T, f_S) = E_S\int_{-A}^{A}\bigl[\bar f_T(x) - f_S(x)\bigr]^2\,dx - \frac{1}{T}\int_{-A}^{A} I^{-1}(S,x)\,dx,$$
because $\bigl|R_T(\bar f_T, f_S) - \hat R_T(\bar f_T, f_S)\bigr| \le CT^{-2}$, while $\hat R_T$ will be shown to be of order $T^{-2k/(2k-1)}$, which dominates $T^{-2}$.

We consider now only the estimators belonging to the set $W_T = \{\bar f_T : \hat R_T(\bar f_T, f_*) < 1\}$. It is evident that in the proof of the lower bound one can drop all estimators not belonging to $W_T$. Moreover, it is clear that for $T$ large enough this set is not empty (since there exist consistent estimators).

Suppose that the parameter $\vartheta$ is a random matrix with a prior distribution $Q(d\vartheta)$ on the set $\Theta_T$. Then we have the obvious inequality
$$\sup_{\vartheta\in\Theta_T}\hat R_T(\bar f_T, f_\vartheta) \ge E_Q \hat R_T(\bar f_T, f_\vartheta) = \int_{\Theta_T} \hat R_T(\bar f_T, f_\vartheta)\,Q(d\vartheta),$$
where the expectation $E_Q$ is defined by the last equality. We can now write the following chain of inequalities:
$$\begin{aligned}
\inf_{\bar f_T\in W_T}\ \sup_{S\in\Sigma_*}\hat R_T(\bar f_T, f_S) &\ge \inf_{\bar f_T\in W_T}\ \sup_{S\in\Sigma_*\cap\Theta_T}\hat R_T(\bar f_T, f_S)\\
&\ge \inf_{\bar f_T\in W_T}\int_{\Theta_T}\hat R_T(\bar f_T, f_\vartheta)\,Q(d\vartheta) - \sup_{\bar f_T\in W_T}\int_{\Theta_T\setminus\Sigma_*}\hat R_T(\bar f_T, f_\vartheta)\,Q(d\vartheta)\\
&\ge \hat R_T(Q) - \sup_{\bar f_T\in W_T}\int_{\Theta_T\setminus\Sigma_*}\hat R_T(\bar f_T, f_\vartheta)\,Q(d\vartheta). \tag{3.4}
\end{aligned}$$
Here $\hat R_T(Q)$ denotes the second order Bayesian risk with respect to the prior $Q$, that is, $\hat R_T(Q) = E_Q\bigl[\hat R_T(\hat f_T, f_\vartheta)\bigr]$, where $\hat f_T$ is the Bayesian estimator.

The next step is to choose a prior distribution that maximises the second order Bayesian risk and, at the same time, is essentially concentrated on the set $\Sigma_*$, that is, the probability of the set $\Theta_T\setminus\Sigma_*$ is sufficiently small. Such a distribution $Q$ is usually called an asymptotically least favorable prior. In our case it is defined in the following way: let $\eta_{l,m}$, $l = 0,\pm1,\dots,\pm L$, $m = 0,\pm1,\dots,\pm M$, be i.i.d. random variables with common density $p(\cdot)$ such that
$$|\eta_{l,m}| \le K,\qquad E[\eta_{l,m}] = 0,\qquad E[\eta_{l,m}^2] = 1,\qquad J = \int_{-K}^{K}\frac{\dot p(x)^2}{p(x)}\,dx = 1+\varepsilon,$$
where $\varepsilon\to 0$ as $K\to\infty$ (for example, one can set $K = \varepsilon^{-1}$).
Then $Q$ is the distribution of the random array $\vartheta_{l,m} = \sqrt{\sigma_l(\varepsilon)}\,\eta_{l,m}$; in other words, it is the product measure
$$Q(d\vartheta) = \prod_{|m|\le M}\prod_{|l|\le L}\frac{1}{\sqrt{\sigma_l(\varepsilon)}}\; p\Bigl(\frac{\vartheta_{l,m}}{\sqrt{\sigma_l(\varepsilon)}}\Bigr)\, d\vartheta_{l,m}. \tag{3.5}$$
Therefore the Fisher information of one component of this prior distribution $Q$ is $J_{l,m} = J_l = (1+\varepsilon)\,\sigma_l(\varepsilon)^{-1}$.

We shall now evaluate the Bayesian risk, which will give us the main part of the minimax risk. Remark first that the functions $e_{l,m}(x) = \sqrt{A^{-1}T^\beta}\; e_l\bigl(A^{-1}T^\beta(x-a_m)\bigr)$ form an orthonormal basis of the Hilbert space $L_2[-A,A]$ and hence, using Parseval's identity, we can write
$$\begin{aligned}
E_Q\hat R_T(\bar f_T, f_\vartheta) &= E\Bigl[\int_{-A}^{A}\bigl[\bar f_T(x) - f_\vartheta(x)\bigr]^2\,dx - \frac1T\int_{-A}^{A} I^{-1}(\vartheta, x)\,dx\Bigr]\\
&\ge E\sum_{|m|\le M}\sum_{|l|\le L}\Bigl[|\psi_{l,m,T} - \psi_{l,m,\vartheta}|^2 - \frac4T\,|\Psi_{l,m,\vartheta}(\xi)|^2\Bigr] - \sum_{|m|\le M}\sum_{|l|>L}\frac4T\, E\,|\Psi_{l,m,\vartheta}(\xi)|^2 := A_{1T} + A_{2T}, \tag{3.6}
\end{aligned}$$
where $E$ is the expectation with respect to the probability measure $dP^{(T)}_\vartheta\times Q(d\vartheta)$ and $\psi_{l,m,T}$, $\psi_{l,m,\vartheta}$ and $\Psi_{l,m,\vartheta}(y)$ are the Fourier coefficients
$$\psi_{l,m,T} = \int_{I_m}\bar f_T(x)\,e_{l,m}(x)\,dx,\qquad \psi_{l,m,\vartheta} = \int_{I_m} f_\vartheta(x)\,e_{l,m}(x)\,dx,$$
$$\Psi_{l,m,\vartheta}(y) = \int_{I_m} f_\vartheta(x)\,\frac{\chi_{\{y>x\}} - F_\vartheta(y)}{f_\vartheta(y)}\; e_{l,m}(x)\,dx$$
of the corresponding functions. From other nonparametric problems related to this model (see [8, 9]) it is known that the quantity
$$I_{l,m}(\vartheta) = E_\vartheta\Bigl(\frac{\partial S(\vartheta,\xi)}{\partial\vartheta_{l,m}}\Bigr)^2 = \frac{2A}{T^\beta f_*(a_m)}\, E_\vartheta\bigl[g_{l,m}^2(\xi)\bigr] < \infty$$
plays the role of the Fisher information concerning the parameter $\vartheta_{l,m}$ in the van Trees inequality (see Gill and Levit [14]). That is, the following inequality holds:
$$E\,|\psi_{l,m,T} - \psi_{l,m,\vartheta}|^2 \ge \frac{\bigl(E_Q\,\partial\psi_{l,m,\vartheta}/\partial\vartheta_{l,m}\bigr)^2}{T\,E_Q I_{l,m}(\vartheta) + J_{l,m}}.$$
From this inequality one can deduce that
$$E\,|\psi_{l,m,T} - \psi_{l,m,\vartheta}|^2 - \frac4T\,E\bigl[\Psi^2_{l,m,\vartheta}(\xi)\bigr] \ge -\frac{4 J_l\, E\bigl[\Psi^2_{l,m,\vartheta}(\xi)\bigr]}{T\bigl(T\,E_Q I_{l,m}(\vartheta) + J_l\bigr)} + \mu_{l,m,T}, \tag{3.7}$$
where
$$\mu_{l,m,T} = \frac{\bigl(E_Q\,\partial\psi_{l,m,\vartheta}/\partial\vartheta_{l,m}\bigr)^2 - 4\,E_Q\bigl[I_{l,m}(\vartheta)\bigr]\, E\,|\Psi_{l,m,\vartheta}(\xi)|^2}{T\,E_Q\bigl[I_{l,m}(\vartheta)\bigr] + J_l}.$$
Due to Lemma 5.4, Section 5, the sum of the terms $\mu_{l,m,T}$ is bounded by a power of $A$ divided by $T^{1+2\beta}$, which is clearly $o_T(1)\,T^{-\frac{2k}{2k-1}}$.
This implies that the main part of the second order Bayesian risk is given by the first terms on the right hand side of inequality (3.7).

Let us now evaluate the sum of these first terms. Since the function $g_{l,m}(\cdot)$ is very close in supremum norm to the normalised function $e_{l,m}(\cdot)$, we have $E_Q\bigl[I_{l,m}(\vartheta)\bigr] \ge 2AT^{-\beta}(1-\varepsilon)$. Remark that the sum over $m$ of the first terms of inequality (3.7) corresponding to the case $l=0$ is of smaller order than $T^{-\frac{2k}{2k-1}}$, since the value $J_0$ is chosen significantly smaller than the other $J_l$'s. Further, using the result of Corollary 5.3, Section 5, together with the Minkowski inequality, we get
$$\begin{aligned}
A_{1T} &\ge -\frac4T\sum_{m=-M}^{M}\sum_{l=\pm1}^{\pm L}\frac{J_l\, E\,|\Psi_{l,m,\vartheta}(\xi)|^2}{T\,E_Q I_{l,m}(\vartheta) + J_l} + \frac{o_T(1)}{T^{\frac{2k}{2k-1}}}\\
&= -\frac{16(1+\varepsilon)}{T}\sum_{m=-M}^{M}\sum_{l=1}^{L}\frac{J_l\, A^2 f_*(a_m)\,(T^{2\beta}\pi^2 l^2)^{-1}}{2T^{1-\beta}A(1-\varepsilon) + J_l} + \frac{o_T(1)}{T^{\frac{2k}{2k-1}}}\\
&\ge -\frac{8(1+\varepsilon)^2 A}{(1-\varepsilon)\,T^{1+\beta}\pi^2}\sum_{l=1}^{L}\frac{1}{l^2\bigl(2T^{1-\beta}A\,\sigma_l(\varepsilon) + 1\bigr)} + \frac{o_T(1)}{T^{\frac{2k}{2k-1}}}. \tag{3.8}
\end{aligned}$$
Further, direct calculations (for more details see [9]) yield the estimate
$$\int_{-A}^{A}\bigl[f^{(k)}(\vartheta,x) - f_*^{(k)}(x)\bigr]^2\,dx = 4(1+o_T(1))\int_{-A}^{A}\bigl[S^{(k-1)}(\vartheta,x) - S_*^{(k-1)}(x)\bigr]^2 f_*^2(x)\,dx = 4(1+o_T(1))\sum_{|m|\le M}\frac{2A}{T^\beta f_*(a_m)}\sum_{1\le|l|\le L}\vartheta_{l,m}^2\, f_*^2(a_m)\Bigl(\frac{\pi l T^\beta}{A}\Bigr)^{2(k-1)}. \tag{3.9}$$
If we now take the expectation of this expression with respect to $Q$, using the fact that the Riemann sums of the function $f_*(\cdot)$ tend to its integral (which equals 1), we obtain
$$E\int_{\mathbb R}\bigl[f^{(k)}(\vartheta,x) - f_*^{(k)}(x)\bigr]^2\,dx = 4(1+o_T(1))\sum_{|m|\le M}\frac{2A f_*(a_m)}{T^\beta}\sum_{1\le|l|\le L}\Bigl(\frac{\pi l T^\beta}{A}\Bigr)^{2(k-1)}\sigma_l(\varepsilon) = 4(1+o_T(1))\sum_{1\le|l|\le L}\Bigl(\frac{\pi l T^\beta}{A}\Bigr)^{2(k-1)}\sigma_l(\varepsilon). \tag{3.10}$$
We can now explain the choice (3.5) of the prior distribution. It is clear from (3.8) that the least favorable prior is the one for which the Fisher informations $J_l$ are minimal under the constraint (3.10) involving only the variances $\{\sigma_l(\varepsilon),\ l=\pm1,\dots,\pm L\}$.
It is well known that the minimum of the Fisher information of a random variable with zero mean and fixed variance $\sigma$ is attained by the Gaussian distribution, and this minimum equals $\sigma^{-1}$. We cannot use the product of normal distributions directly as a prior distribution, since Hoeffding's inequality requires bounded random variables. That is why we use an approximation of the Gaussian distribution and pay a factor $1+\varepsilon$ for the error of this approximation.

In order to determine the values of $\sigma_l$, we solved the optimisation problem derived from (3.8) and (3.10). Thus, we looked for the sequence $y = (y_l)$ minimising the functional
$$\Phi(y) = \sum_{l=1}^{L}\frac{1}{l^2\bigl(2T^{1-\beta}A\, y_l + 1\bigr)}$$
over the set $\mathcal E(k,R) = \bigl\{ y\in\mathbb R_+^{\mathbb N} :\ 8\sum_{l=1}^{L} y_l\bigl(\tfrac{\pi l T^\beta}{A}\bigr)^{2(k-1)} \le R \bigr\}$. This can be done directly using the method of Lagrange multipliers, and the result is $y_l^*$ equal to $(2AT^{1-\beta})^{-1}\bigl(|\Lambda l^{-1}|^k - 1\bigr)_+$, which leads to the definition of $\sigma_l$ presented at the beginning of the proof.

Now we shall evaluate the sums on the right hand sides of (3.8) and (3.10). For the second one, setting $\Lambda_\varepsilon = \Lambda(1-\varepsilon)$, we have
$$\sum_{l=1}^{L}\Bigl(\frac{\pi l T^\beta}{A}\Bigr)^{2(k-1)}\sigma_l(\varepsilon) = \frac{1}{2AT^{1-\beta}}\sum_{l=1}^{L}\Bigl(\Bigl[\frac{\Lambda_\varepsilon}{l}\Bigr]^k - 1\Bigr)_+\Bigl(\frac{\pi l T^\beta}{A}\Bigr)^{2(k-1)} = \frac{\pi^{2(k-1)}\Lambda_\varepsilon^{2k-1}}{2A^{2k-1}}\sum_{l=1}^{L}\frac{1}{\Lambda_\varepsilon}\Bigl(\Bigl[\frac{l}{\Lambda_\varepsilon}\Bigr]^{k-2} - \Bigl[\frac{l}{\Lambda_\varepsilon}\Bigr]^{2k-2}\Bigr).$$
Since $\Lambda_\varepsilon$ tends to infinity as $T\to\infty$, the last sum is close to the integral over $[0,1]$ of the function $x^{k-2} - x^{2k-2}$. This yields
$$\sum_{l=1}^{L}\Bigl(\frac{\pi l T^\beta}{A}\Bigr)^{2(k-1)}\sigma_l(\varepsilon) = \frac{(1+o_T(1))\,\pi^{2(k-1)}\Lambda_\varepsilon^{2k-1}}{2A^{2k-1}}\int_0^1 (x^{k-2} - x^{2k-2})\,dx = \frac{(1+o_T(1))\,\pi^{2(k-1)}\Lambda_\varepsilon^{2k-1}\, k}{2A^{2k-1}(k-1)(2k-1)} \le \frac{R(1-\varepsilon)}{8}.$$
This upper estimate permits us to use Hoeffding's inequality to obtain an exponential bound for the $Q$-probability of the set $\Theta_T\setminus\Sigma_*$ (see [9] for details). Thus we get
$$Q(\Theta_T\setminus\Sigma_*) \le CT^{-2}. \tag{3.11}$$
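The Lagrange multiplier computation behind the choice of $\sigma_l$ above can be checked numerically. The sketch below uses toy stand-ins (a constant `c` for $2T^{1-\beta}A$ and weights $l^{2(k-1)}$ for $(\pi lT^\beta/A)^{2(k-1)}$ up to a constant; all numerical values are hypothetical). It tunes $\Lambda$ so that the smoothness budget is active and then checks that the water-filling profile $y_l^* = c^{-1}\bigl(|\Lambda/l|^k - 1\bigr)_+$ does at least as well as randomly perturbed feasible profiles. Since the objective is convex and the constraint linear, the KKT point is the global optimum.

```python
import numpy as np

# Toy stand-ins (hypothetical values, only to make the optimisation concrete)
k, R, c, L = 2, 1.0, 50.0, 200
ls = np.arange(1, L + 1)
weights = ls.astype(float) ** (2 * (k - 1))

def y_of_lambda(lam):
    # water-filling profile y_l = c^{-1} ((lam/l)^k - 1)_+
    return np.maximum((lam / ls) ** k - 1.0, 0.0) / c

def budget(y):
    return 8.0 * np.sum(weights * y)

def phi(y):
    return np.sum(1.0 / (ls ** 2 * (c * y + 1.0)))

# bisection on lam so that the budget constraint is active: budget(y) = R
lo, hi = 0.0, 1e6
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if budget(y_of_lambda(mid)) < R else (lo, mid)
lam = 0.5 * (lo + hi)
y_star = y_of_lambda(lam)

# any other feasible profile should give a value of Phi at least as large
rng = np.random.default_rng(0)
for _ in range(100):
    y = np.maximum(y_star + 0.5 * rng.normal(size=L) * (y_star.max() + 1e-3), 0.0)
    b = budget(y)
    if b > R:
        y = y * (R / b)        # rescale back into the constraint set
    assert phi(y_star) <= phi(y) + 1e-9
```

This mirrors the text: the profile that spends the budget as $(|\Lambda/l|^k-1)_+$ across frequencies is the extremal one.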
Evaluating the sum in (3.8) in the same way, we get
$$A_{1T} \ge -\frac{8(1+\varepsilon)^2 A}{(1-\varepsilon)\,T^{1+\beta}\pi^2}\sum_{l=1}^{L}\frac{1}{l^2}\Bigl(\frac{l}{\Lambda_\varepsilon}\Bigr)^k = -\frac{8(1+\varepsilon)^2 A(1+o_T(1))}{(1-\varepsilon)^2\, T^{1+\beta}\pi^2\,\Lambda}\int_0^1 x^{k-2}\,dx \ge -\frac{8(1+\varepsilon)^3 A}{(1-\varepsilon)^2\, T^{1+\beta}\pi^2\,\Lambda(k-1)}.$$
Similarly, due to Corollary 5.3, Section 5,
$$A_{2T} = -\frac{8A(1+o_T(1))}{T^{1+\beta}\pi^2}\sum_{l=L+1}^{\infty}l^{-2} = -\frac{8A(1+o_T(1))}{T^{1+\beta}\pi^2\,\Lambda_\varepsilon}\int_1^{\infty} x^{-2}\,dx \ge -\frac{8A(1+\varepsilon)}{\Lambda_\varepsilon\, T^{1+\beta}\pi^2} \ge -\frac{8(1+\varepsilon)^3 A}{(1-\varepsilon)^2\, T^{1+\beta}\pi^2\,\Lambda}.$$
Putting together the last two estimates, we get the following lower bound for the Bayesian risk with prior distribution $Q$:
$$E_Q R_T(\bar f_T, f_\vartheta) \ge -\frac{8(1+\varepsilon)^3 A\, k}{(1-\varepsilon)^2\, T^{1+\beta}\pi^2\,\Lambda(k-1)} = -T^{-\frac{2k}{2k-1}}\,\hat\Pi(k,R)\,\frac{(1+\varepsilon)^3}{(1-\varepsilon)^2}.$$
This inequality, (3.4) and (3.11) imply the desired result, since $\bar f_T\in W_T$ and the number $\varepsilon$ can be chosen as small as we wish. □

4 Estimator

Our goal now is to construct an estimator attaining the lower bound obtained in the previous section. Note that in the problem of signal detection in Gaussian white noise there is a linear estimator which is second order minimax; the linear filter defining this estimator uses the so-called Pinsker weights. The same assertion holds true for the problem of distribution function estimation from i.i.d. observations: in this case the second order minimax estimator is linear with respect to the (global) trigonometric basis on the interval $[-B,B]$, where $B$ tends to infinity as the number of observations tends to infinity.

It turns out that an analogous result holds true in the model of ergodic diffusion, but one has to be careful in the definition of the linear estimator. More precisely, the estimation of the function $f$ in $L_2$-norm is equivalent to estimating in $L_2$ its Fourier transform. Therefore it would be natural to expect that the best linear estimator of the Fourier transform of $f$ (which leads to a kernel estimator) is asymptotically optimal over the set of all possible estimators. But this assertion turns out to be wrong.
A heuristic explanation of this fact is that when we consider linear estimators of the Fourier transform, we disperse the information concerning each point over the whole real line; the estimator therefore does not take into account the local structure of the model with respect to the space variable. That is why localised bases of functions fit our model better; in particular, it turns out that a slightly modified linear estimator with respect to a localised trigonometric basis is second order asymptotically minimax.

To construct our estimator, we introduce the localised orthonormal basis $\{e_{l,m} : l,m\in\mathbb Z\}$ of $L_2(\mathbb R)$ defined by
$$e_{l,m}(x) = \frac{1}{\sqrt{2\delta_T}}\exp\Bigl\{\frac{i\pi l\,(x - a_m + \delta_T)}{\delta_T}\Bigr\}\,\chi_{\{x\in I_m\}}.$$
Above $i = \sqrt{-1}$ and $I_m$ is the interval of length $2\delta_T$ with center $a_m = 2m\delta_T$. We call this basis localised since $\delta_T$ tends to zero as $T\to\infty$. The Fourier coefficients of the local time and of the invariant density with respect to this basis are
$$\varphi^\circ_{T,l,m} = \int_{\mathbb R}\bar e_{l,m}(x)\,f_T^\circ(x)\,dx = \frac1T\int_0^T \bar e_{l,m}(X_t)\,dt,\qquad \varphi_{S,l,m} = \int_{\mathbb R}\bar e_{l,m}(x)\,f_S(x)\,dx = E_S\bigl[\bar e_{l,m}(X_t)\bigr],$$
where $\bar e_{l,m}$ stands for the complex conjugate of $e_{l,m}$. Of course, the estimation of the function $f_S(\cdot)$ in $L_2$ is equivalent to the estimation of the infinite matrix $\{\varphi_{S,l,m} : l,m\in\mathbb Z\}$. Since $\varphi^\circ_{T,l,m}$ is the empirical estimator of $\varphi_{S,l,m}$, the linear estimator of $\{\varphi_{S,l,m} : l,m\in\mathbb Z\}$ is defined as
$$\bar\varphi_{T,l,m} = \varphi_{T,l,m}\,\varphi^\circ_{T,l,m},\qquad l,m\in\mathbb Z, \tag{4.1}$$
where the $\varphi_{T,l,m}$ are some numbers between 0 and 1. This leads us to the following definition of the linear estimator of the invariant density: $\bar f_T(x) = \sum_{l,m\in\mathbb Z}\varphi_{T,l,m}\,\varphi^\circ_{T,l,m}\,e_{l,m}(x)$.

In order to attain the lower bound of the previous section, this linear estimator has to be modified; in some sense, we have to correct it at the boundaries. Let us discuss this in more detail. By assumption the estimated function $f_S$ belongs to the Sobolev class $\Sigma(k,R)$. We would like to derive from this condition an ellipsoid type inequality for the coefficients $\varphi_{S,l,m}$.
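The orthonormality of this localised basis is straightforward to verify numerically. In the sketch below (with an arbitrary illustrative value of $\delta_T$) the inner products on a single interval $I_m$ reproduce the Kronecker delta; for different $m$ the inner products vanish trivially because the supports are disjoint.

```python
import numpy as np

delta = 0.3                      # plays the role of delta_T (hypothetical value)
m = 2
a_m = 2 * m * delta              # centre of I_m
x = np.linspace(a_m - delta, a_m + delta, 20001)

def e(l):
    # e_{l,m} restricted to its support I_m
    return np.exp(1j * np.pi * l * (x - a_m + delta) / delta) / np.sqrt(2 * delta)

for l1 in range(-3, 4):
    for l2 in range(-3, 4):
        ip = np.trapz(e(l1) * np.conj(e(l2)), x)
        assert abs(ip - (1.0 if l1 == l2 else 0.0)) < 1e-6
```

The shift by $\delta_T$ in the exponent makes the phase start at zero on the left endpoint of $I_m$; orthogonality only uses the fact that $e^{i\pi n u}$ integrates to zero over a period.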
Using the integration by parts formula, for $l\ne 0$ we get
$$\varphi_{S,l,m} = \varphi_{f^{(k)},l,m}\Bigl(\frac{\delta_T}{i\pi l}\Bigr)^k - \frac{(-1)^l}{\sqrt{2\delta_T}}\sum_{j=0}^{k-1}\bigl(f^{(j)}(a_m+\delta_T) - f^{(j)}(a_m-\delta_T)\bigr)\Bigl(\frac{\delta_T}{i\pi l}\Bigr)^{j+1}.$$
The last sum will be denoted $q_{l,m}(f)$. Now the condition $\int_{\mathbb R}\bigl(f_S^{(k)}(x)\bigr)^2\,dx \le R$ can be rewritten as
$$\sum_{m\in\mathbb Z}\sum_{l\in\mathbb Z}\Bigl(\frac{\pi l}{\delta_T}\Bigr)^{2k}\bigl|\varphi_{S,l,m} + q_{l,m}(f_S)\bigr|^2 \le R. \tag{4.2}$$
Recall that the regularity condition imposed on the estimated function determines the behaviour of the bias and does not influence the variance. That is why we want to modify the linear estimator defined by (4.1) so that the bias of the modified estimator is linear with respect to $\varphi_{S,l,m} + q_{l,m}(f_S)$ while its variance remains unchanged. It is easy to check that the sequence
$$\varphi^*_{T,l,m} = \varphi_{T,l,m}\bigl(\varphi^\circ_{T,l,m} + q_{l,m}(f_S)\bigr) - q_{l,m}(f_S),\qquad l,m\in\mathbb Z,$$
satisfies these conditions, but it is not an estimator, since it involves the unknown function $f_S$ via the term $q_{l,m}(f_S)$. To overcome this problem we replace this term by a rate optimal estimator $\hat q_{T,l,m}$. Thus,
$$\tilde\varphi_{T,l,m} = \varphi_{T,l,m}\bigl(\varphi^\circ_{T,l,m} + \hat q_{T,l,m}\bigr) - \hat q_{T,l,m},\qquad l,m\in\mathbb Z,$$
where
$$\hat q_{T,l,m} = \frac{(-1)^l}{\sqrt{2\delta_T}}\sum_{j=0}^{k-1}\bigl(\tilde f^{(j)}_T(a_m+\delta_T) - \tilde f^{(j)}_T(a_m-\delta_T)\bigr)\Bigl(\frac{\delta_T}{i\pi l}\Bigr)^{j+1} = \frac{(-1)^l}{T\sqrt{2\delta_T}}\sum_{j=0}^{k-1}\Bigl(\frac{\delta_T}{i h_T\pi l}\Bigr)^{j+1}\int_0^T\Bigl[Q^{(j)}\Bigl(\frac{a_m+\delta_T-X_t}{h_T}\Bigr) - Q^{(j)}\Bigl(\frac{a_m-\delta_T-X_t}{h_T}\Bigr)\Bigr]dt$$
and $Q$ is a kernel of order $k$ with support $[-1,1]$. From now on we set $h_T = T^{-1/(2k-1)}$ and $\delta_T = h_T^{1/2}$. Note that this trick with $q_{l,m}$ has already been exploited by Delattre and Hoffmann [10]; it is particularly useful in the case of nonhomogeneous Fisher information.

We show in this section that there exists a modified linear estimator which is second order minimax. The weights $\varphi_{T,l,m}$ of this estimator are defined by $\hat\varphi_{T,l,m} = \bigl(1 - |l/\hat\nu|^{k+\mu_T}\bigr)_+$, where $\mu_T = 1/\sqrt{\log T}$ tends to zero as $T\to\infty$ and
$$\hat\nu = \hat\nu_T = \delta_T\Bigl(\frac{8k\pi^{2(k-1)}}{T R(k-1)(2k-1)}\Bigr)^{-\frac{1}{2k-1}}.$$
We denote by $\hat f_T(\cdot)$ the estimator of the invariant density defined via these weights.
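The filter $\hat\varphi_{T,l,m}$ is a standard Pinsker-type shrinkage sequence. The snippet below (with hypothetical values of $T$, $k$, $R$, and reading $\mu_T = 1/\sqrt{\log T}$ as above) just confirms the qualitative shape: values in $[0,1]$, equal to one at $l=0$, and vanishing for $|l|\ge\hat\nu$.

```python
import numpy as np

# hypothetical choices of T, k, R, only to visualise the filter
T, k, R = 1e6, 2, 1.0
h = T ** (-1 / (2 * k - 1))
delta = h ** 0.5
mu = 1 / np.sqrt(np.log(T))
nu = delta * (8 * k * np.pi ** (2 * (k - 1)) / (T * R * (k - 1) * (2 * k - 1))) ** (-1 / (2 * k - 1))

l = np.arange(-500, 501)
w = np.maximum(1 - np.abs(l / nu) ** (k + mu), 0.0)   # Pinsker-type weights

assert np.all((0.0 <= w) & (w <= 1.0))
assert w[500] == 1.0                         # l = 0: low frequencies kept intact
assert np.all(w[np.abs(l) >= nu] == 0.0)     # frequencies above nu discarded
```

Note that the weights depend on $l$ only, not on $m$, which is used at the end of the proof of Theorem 4.1.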
Note that $\hat\nu_T$ tends to infinity as $T$ tends to infinity.

Theorem 4.1 Let $k>1$. Then the estimator $\hat f_T$ is second order asymptotically minimax, that is,

$$ \lim_{T\to\infty}\sup_{S(\cdot)\in\Sigma_*}T^{\frac{2k}{2k-1}}\,\mathcal R_T(\hat f_T,f_S)=-\hat\Pi(k,R). \qquad(4.3) $$

Proof: Let $S(\cdot)\in\Sigma_*(k,R)$. Recall that the normalised first order minimax risk asymptotically equals

$$ \mathcal R(S)=4\int_{\mathbb R}f_S^2(x)\,\mathbf E_S\Big[\Big(\frac{\chi_{\{\xi>x\}}-F_S(\xi)}{f_S(\xi)}\Big)^2\Big]dx=\sum_{l,m\in\mathbb Z}\mathbf E_S\big[\Psi^2_{S,l,m}(\xi)\big], $$

where we applied the Parseval identity to the square integrable function $g_S(\cdot,x)$ having the following Fourier coefficients:

$$ \Psi_{S,l,m}(x)=2\int_{\mathbb R}\frac{\chi_{\{x>y\}}-F_S(x)}{f_S(x)}\,\bar e_{l,m}(y)f_S(y)\,dy. $$

Using this decomposition, one can rewrite the second order risk function as

$$ \mathcal R_T(\hat f_T,f_S)=\sum_{l,m\in\mathbb Z}\Big(\mathbf E_S\big|\tilde\varphi_{T,l,m}-\varphi_{S,l,m}\big|^2-T^{-1}\mathbf E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big) $$
$$ =\sum_{l,m\in\mathbb Z}\Big(\mathbf E_S\big|\hat\varphi_{T,l,m}\varphi^\circ_{T,l,m}-\varphi_{S,l,m}+(\hat\varphi_{T,l,m}-1)\hat q_{T,l,m}\big|^2-T^{-1}\mathbf E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big) $$
$$ =\sum_{l,m\in\mathbb Z}\Big(\mathbf E_S\big|\hat\varphi_{T,l,m}\big(\varphi^\circ_{T,l,m}+q_{l,m}(f_S)\big)-\big(\varphi_{S,l,m}+q_{l,m}(f_S)\big)+(\hat\varphi_{T,l,m}-1)\big(\hat q_{T,l,m}-q_{l,m}(f_S)\big)\big|^2-T^{-1}\mathbf E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big). $$

On the one hand, we shall show below that the term

$$ \sum_{l,m\in\mathbb Z}\mathbf E_S\big|\hat\varphi_{T,l,m}\big(\varphi^\circ_{T,l,m}+q_{l,m}(f_S)\big)-\big(\varphi_{S,l,m}+q_{l,m}(f_S)\big)\big|^2 $$

is of order $T^{-1}$. On the other hand, the second inequality of Lemma 5.5 implies that $\sum_{l,m\in\mathbb Z}\mathbf E_S\big|(\hat\varphi_{T,l,m}-1)(\hat q_{T,l,m}-q_{l,m}(f_S))\big|^2$ is of order $T^{-(2k+1/2)/(2k-1)}$. So a simple application of the Hölder inequality yields

$$ \mathcal R_T(\hat f_T,f_S)\le\sum_{l,m\in\mathbb Z}\Big(\mathbf E_S\big|\hat\varphi_{T,l,m}\big(\varphi^\circ_{T,l,m}+q_{l,m}(f_S)\big)-\big(\varphi_{S,l,m}+q_{l,m}(f_S)\big)\big|^2-T^{-1}\mathbf E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big)+CT^{-\frac{2k+1/4}{2k-1}}. $$

The stationarity of the process $X$ implies that the mathematical expectation of $\varphi^\circ_{T,l,m}$ is equal to $\varphi_{S,l,m}$; therefore

$$ \mathcal R_T(\hat f_T,f_S)\le\sum_{l,m\in\mathbb Z}\Big(|\hat\varphi_{T,l,m}-1|^2\big|\varphi_{S,l,m}+q_{l,m}(f_S)\big|^2+|\hat\varphi_{T,l,m}|^2\operatorname{Var}_S[\varphi^\circ_{T,l,m}]-T^{-1}\mathbf E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big)+CT^{-\frac{2k+1/4}{2k-1}}. $$

The first term of this sum is in the appropriate form and we show below that it is of order $T^{-\frac{2k}{2k-1}}$.
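The regrouping of $\hat\varphi_{T,l,m}\varphi^\circ_{T,l,m}-\varphi_{S,l,m}+(\hat\varphi_{T,l,m}-1)\hat q_{T,l,m}$ around $q_{l,m}(f_S)$ used in the proof is a purely algebraic identity, valid for arbitrary complex numbers. A quick numerical sanity check with random stand-ins for the coefficients (the variable names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random complex stand-ins: w ~ weight phi-hat, po ~ empirical coefficient phi-circ,
# ps ~ true coefficient phi_S, qh ~ estimated correction q-hat, q ~ true correction q.
w, po, ps, qh, q = (rng.standard_normal(5) + 1j * rng.standard_normal(5))

# Regrouped form: w*(po + q) - (ps + q) + (w - 1)*(qh - q)
lhs = w * (po + q) - (ps + q) + (w - 1) * (qh - q)
# Original form: phi-tilde minus phi_S, i.e. w*po + (w - 1)*qh - ps
rhs = w * po + (w - 1) * qh - ps
```

Expanding `lhs`, the terms `w*q`, `-q` and `+q`, `-w*q` cancel pairwise, leaving exactly `rhs`.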
To determine the behaviour of the second and third terms, we need several technical results.

Lemma 4.2 The following relations hold:

$$ \sum_{l,m\in\mathbb Z}\Big(|\hat\varphi_{T,l,m}|^2\operatorname{Var}_S[\varphi^\circ_{T,l,m}]-T^{-1}\mathbf E_S\big[\Psi^2_{S,l,m}(\xi)\big]\Big)\le T^{-2k/(2k-1)}\,o_T(1), \qquad(4.4) $$

$$ \sum_{l,m\in\mathbb Z}\big(1-|\hat\varphi_{T,l,m}|^2\big)\mathbf E_S\big[\Psi^2_{S,l,m}(\xi)\big]\ge\big(1+o_T(1)\big)\sum_{l,m\in\mathbb Z}\frac{8\delta_T^2f_S(a_m)}{\pi^2l^2}\big(1-|\hat\varphi_{T,l,m}|^2\big). \qquad(4.5) $$

The proof of this lemma is quite technical and relies essentially on the arguments used in the proofs of Lemmas 5.2–5.4, Section 5. That is why it is omitted here.

Using this lemma, for any linear filter $\hat\varphi=\{\hat\varphi_{T,l,m}\}$ such that $|\hat\varphi_{T,l,m}|\le1$, the second order risk can be evaluated as follows:

$$ \mathcal R_T(\hat f_T,f_S)\le\sum_{l,m\in\mathbb Z,\,l\neq0}\Big(|\hat\varphi_{T,l,m}-1|^2\big|\varphi_{S,l,m}+q_{l,m}(f_S)\big|^2-\frac{8\delta_T^2f_S(a_m)}{T\pi^2l^2}\big(1-|\hat\varphi_{T,l,m}|^2\big)\Big)+CT^{-\frac{2k+1/4}{2k-1}}. $$

Recall that the function $f_S(\cdot)$ belongs to $\Sigma_*(k,R)$. Acting as in (4.2), we get

$$ \Big(\frac{\pi}{\delta_T}\Big)^{2k}\sum_{l,m\in\mathbb Z}l^{2k}\big|\varphi_{S,l,m}+q_{l,m}(f_S)-\varphi_{*,l,m}-q_{l,m}(f_*)\big|^2\le R. \qquad(4.6) $$

Since the central function $f_*$ is smoother than $f_S$, we have

$$ \sum_{l,m\in\mathbb Z}l^{2k+2\mu_T}\big|\varphi_{*,l,m}+q_{l,m}(f_*)\big|^2<C<+\infty, $$

where the constant $C$ does not depend on $T$. If we use this inequality and the convergence $\hat\nu_T^{\mu_T}\to\infty$, we obtain

$$ \sum_{l,m\in\mathbb Z}\big|1-\hat\varphi_{T,l,m}\big|^2\big|\varphi_{S,l,m}+q_{l,m}(f_S)\big|^2\le\hat\nu_T^{-2k}\sum_{l,m\in\mathbb Z}l^{2k}\big|\varphi_{S,l,m}+q_{l,m}(f_S)-\varphi_{*,l,m}-q_{l,m}(f_*)\big|^2\,(1+o_T(1))\le\big(\hat\nu_T\delta_T^{-1}\pi\big)^{-2k}R\,(1+o_T(1)). $$

Since the coefficients $\hat\varphi_{T,l,m}$ do not depend on $m$, we shall omit the index $m$. We denote by $\mathbb Z_*$ the set of all nonzero integers. Using the fact that the integral of $f_S(\cdot)$ is equal to one, together with the convergence of the partial sums, we get

$$ \mathcal R_T(\hat f_T,f_S)\le\big(1+o_T(1)\big)\Big(\frac{R\,\delta_T^{2k}}{(\pi\hat\nu)^{2k}}-\sum_{l\in\mathbb Z_*}\frac{4\delta_T\big(1-|\hat\varphi_{T,l}|^2\big)}{\pi^2Tl^2}\Big). $$

It is easy to check that

$$ \sum_{l\in\mathbb Z_*}\frac{1-|\hat\varphi_{T,l}|^2}{l^2}\sim\sum_{|l|\le\hat\nu}\frac1{l^2}\Big(2\Big|\frac{l}{\hat\nu}\Big|^{k}-\Big|\frac{l}{\hat\nu}\Big|^{2k}\Big)+\sum_{|l|>\hat\nu}\frac1{l^2}\sim\frac2{\hat\nu}\int_0^1(2x^{k-2}-x^{2k-2})\,dx+\frac2{\hat\nu}\int_1^\infty x^{-2}\,dx=\frac2{\hat\nu}\Big(\frac2{k-1}-\frac1{2k-1}+1\Big)=\frac{4k^2}{\hat\nu(k-1)(2k-1)}. $$

Above, $a_T\sim b_T$ means $a_T=b_T\big(1+o_T(1)\big)$.
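The closed form $4k^2/(\hat\nu(k-1)(2k-1))$ derived above for $\sum_{l\neq0}(1-|\hat\varphi_{T,l}|^2)/l^2$ can be checked numerically for a large value of $\hat\nu$. In the sketch below $\mu_T$ is set to its limiting value $0$, and the values of $k$ and $\hat\nu$ are illustrative.

```python
import numpy as np

def weighted_sum(nu, k):
    """Sum over l != 0 of (1 - w_l^2)/l^2 with w_l = (1 - |l/nu|^k)_+, truncated far beyond nu."""
    l = np.arange(1, int(50 * nu))
    w = np.maximum(0.0, 1.0 - (l / nu) ** k)
    return 2 * np.sum((1.0 - w ** 2) / l ** 2)   # factor 2 accounts for l and -l

k, nu = 3.0, 500.0
approx = 4 * k ** 2 / (nu * (k - 1) * (2 * k - 1))
```

For $k=3$, $\hat\nu=500$ the Riemann-sum and tail-truncation errors are of relative order $1/\hat\nu$, so the two quantities agree to about one percent.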
We are now able to explain the choice of $\hat\nu$: it is simply the value minimising the function

$$ G(\nu^{-1})=\frac{R\,\delta_T^{2k}\,\nu^{-2k}}{\pi^{2k}}-\frac{16k^2\delta_T\,\nu^{-1}}{\pi^2T(k-1)(2k-1)}. $$

The substitution of $\hat\nu_T$ by its value leads to the desired bound for the second order risk of our estimator:

$$ \mathcal R_T(\hat f_T,f_S)\le-\big(1+o_T(1)\big)\,T^{-\frac{2k}{2k-1}}\,\hat\Pi(k,R). $$

This completes the proof of the theorem. □

Remark 4.3 We supposed in all statements and proofs that the diffusion coefficient $\sigma$ is identically 1, but this condition can be relaxed. There are two possible approaches for obtaining optimal constants in this case:

• Either one considers the weighted (second order) $L_2$-risk

$$ \mathbf E_S\int_{\mathbb R}\big(\bar f_T(x)-f_S(x)\big)^2\sigma^2(x)\,dx-\frac4T\int_{\mathbb R}f_S^2(x)\sigma^2(x)\,\mathbf E_S\Big(\frac{\chi_{\{\xi>x\}}-F_S(\xi)}{\sigma(\xi)f_S(\xi)}\Big)^2dx $$

and defines the weighted Sobolev ball $\big\{\int_{\mathbb R}(f_S^{(k)}-f_*^{(k)})^2\sigma^2\le R\big\}$ as the parameter space. Then the optimal constant remains unchanged.

• Or one keeps the definition of the risk but works in the local minimax setting, that is, the estimated function belongs to a narrowing vicinity of an unknown (but fixed) function $f_*$. In that case the optimal constant depends on the central function $f_*$ and has the following form:

$$ \hat\Pi(f_*,k,R)=2(2k-1)\Big(\int_{\mathbb R}\frac{f_*(x)}{\sigma^2(x)}\,dx\Big)^{\frac{2k}{2k-1}}\Big(\frac{4k}{\pi(k-1)(2k-1)}\Big)^{\frac{2k}{2k-1}}R^{-\frac1{2k-1}}. $$

Remark 4.4 The optimal constant $\hat\Pi(k,R)$ obtained in this work contains a factor 2 which is absent from the second order optimal constant in the problem of distribution function estimation from i.i.d. observations (cf. Golubev and Levit [17]); the second order risk in our setting is therefore twice as small. In some sense, this reveals that the estimation problem in our setting is easier than that of [17]. This phenomenon has a simple explanation. In the definition of the parameter set, we required that the function $S$ satisfy the inequality $\operatorname{sgn}(x)S(x)\le-\gamma_*$ for $x\notin[-A,A]$.
This condition automatically excludes invariant densities $f_S$ with heavy tails; moreover, it implies that these densities decrease exponentially fast at infinity. In particular, densities close to those of uniform distributions over a large interval do not belong to our nonparametric class. But exactly these densities are the most difficult to estimate, because observations distributed according to such laws are widely dispersed over the whole real line and contain very little information about the local variation of the underlying density function.

Remark 4.5 We estimated the infinite matrix $\varphi_S=\{\varphi_{S,l,m}\}_{l,m\in\mathbb Z}$ of the Fourier coefficients of $f_S$ by the infinite matrix $\tilde\varphi_T=\{\tilde\varphi_{T,l,m}\}_{l,m\in\mathbb Z}$. In reality, this last matrix contains only a finite number of nonzero elements. Indeed, for large values of $m$, the intervals $I_m$ contain no observation and the corresponding coefficients $\tilde\varphi_{T,l,m}$ are therefore equal to zero. So we have in fact used a kind of soft thresholding procedure.

5 Proofs of technical lemmas

Lemma 5.1 In the notation of Section 3, for any integer $l$ different from 0, we have

$$ \mathbf E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi\notin I_m\}}\big]\le\frac{CA^3}{l^2T^{3\beta}}. $$

Proof: We denote $\delta=A/T^\beta$ and divide the indicator into two parts:

$$ \mathbf E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi\notin I_m\}}\big]=\mathbf E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi>a_m+\delta\}}\big]+\mathbf E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi<a_m-\delta\}}\big]. $$

The first term can be evaluated as follows:

$$ \mathbf E\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi>a_m+\delta\}}\big]=4\Big|\int_{I_m}f_\vartheta(x)e_{l,m}(x)\,dx\Big|^2\,\mathbf E\Big[\Big(\frac{1-F_\vartheta(\xi)}{f_\vartheta(\xi)}\Big)^2\chi_{\{\xi>a_m+\delta\}}\Big]\le C\,\mathbf E_Q\Big|\int_{I_m}f_\vartheta(x)e_{l,m}(x)\,dx\Big|^2, $$

since one can prove, exactly as in [9], that $f_\vartheta(x)=f_*(x)\big(1+o_T(1)\big)$ and

$$ \int_{a_m+\delta}^\infty\Big(\frac{1-F_*(y)}{f_*(y)}\Big)^2f_*(y)\,dy<CAf_*^{-1}(a_m+\delta). $$

The first multiplier will be evaluated using the integration by parts formula:

$$ \int_{I_m}f_\vartheta(x)e_{l,m}(x)\,dx=\frac{\delta}{\pi l}\Big[\big(f_\vartheta(a_m+\delta)-f_\vartheta(a_m-\delta)\big)e_{-l,m}(a_m-\delta)-\int_{I_m}f'_\vartheta(x)\,e_{-l,m}(x)\,dx\Big]. $$

Using the fact that $f_\vartheta(x)$ is differentiable with derivative $2S_\vartheta(x)f_\vartheta(x)$, and that $S_\vartheta(\cdot)$ is bounded by $2A$ on the interval $[-A,A]$ for any $\vartheta\in\Theta_T$, we get

$$ \Big|\int_{I_m}f_\vartheta(x)e_{l,m}(x)\,dx\Big|\le\frac{CA\delta^{3/2}}{l^2}\,f_*(a_m). $$

Combining these estimates we obtain the desired result for the first term. The term involving the indicator of the event $\xi<a_m-\delta$ can be evaluated analogously. This completes the proof of the lemma. □

Lemma 5.2 If $l\neq0$ and $y\in I_m$, then the following representation holds:

$$ \Psi_{l,m,\vartheta}(y)=\frac{A\big(1+o_T(1)\big)}{\pi lT^\beta}\big(e_{-l,m}(y)-e_{-l,m}(a_m-AT^{-\beta})\big)+\frac{A\,\Phi_{l,m,\vartheta}(y)}{\pi lT^\beta}, $$

where the sequence $\Phi_{l,m,\vartheta}$ is defined as follows:

$$ \Phi_{l,m,\vartheta}(y)=\int_{I_m}f'_\vartheta(x)\,\frac{\chi_{\{y>x\}}-F_\vartheta(y)}{f_\vartheta(y)}\,e_{-l,m}(x)\,dx. $$

Proof: The proof of this lemma is quite close to the proof of the preceding one. Using the integration by parts formula one gets

$$ \Psi_{l,m,\vartheta}(y)=\frac{\delta}{\pi lf_\vartheta(y)}\big(f_\vartheta(y)e_{-l,m}(y)-f_\vartheta(a_m-\delta)e_{-l,m}(a_m-\delta)\big)-\frac{\delta F_\vartheta(y)}{\pi lf_\vartheta(y)}\big(f_\vartheta(a_m+\delta)-f_\vartheta(a_m-\delta)\big)e_{-l,m}(a_m-\delta)-\frac{\delta}{\pi l}\int_{I_m}f'_\vartheta(x)\,\frac{\chi_{\{y>x\}}-F_\vartheta(y)}{f_\vartheta(y)}\,e_{-l,m}(x)\,dx. $$

To finish the proof, it remains to remark that on the interval $I_m$ one has $|f_\vartheta(y)-f_\vartheta(a_m-\delta)|\le C\delta\,|f'_\vartheta(y)|\le CA\delta\,f_\vartheta(y)$, since $f'_\vartheta(x)=2S_\vartheta(x)f_\vartheta(x)$ and $S_\vartheta(x)$ is bounded by $1+|x|$. □

Corollary 5.3 For any $l$ different from 0, define the number $c_l$ to be equal to 1 if $l<0$ and to 3 if $l>0$. Then we have

$$ \mathbf E\big[\Psi^2_{l,m,\vartheta}(\xi)\big]\le\frac{A^2f_*(a_m)\big(1+o_T(1)\big)}{\pi^2l^2T^{2\beta}}\Big(\sqrt{c_l}+\sqrt{\mathbf E\big[\Phi^2_{l,m,\vartheta}(\xi)\big]}\Big)^2+\frac{CA^3}{l^2T^{3\beta}}, $$

and (see [9, Appendix] for the proof)

$$ \sum_{l,m\in\mathbb Z}\mathbf E\big[\Phi^2_{l,m,\vartheta}(\xi)\big]\le C. $$

Lemma 5.4 The residual terms $\mu_{l,m,T}$ defined in (3.7) satisfy $|\mu_{l,m,T}|\le C\delta_T^3T^{-1}$, where $C$ is a constant.

Proof: The proof of this lemma is rather technical.
We give below the main ideas of the proof and leave the technical details to the reader. First of all, due to Lemma 5.1, it is enough to prove this inequality for

$$ \bar\mu_{l,m,T}=\frac{\mathbf E_Q\big(\frac{\partial\psi_{l,m,\vartheta}}{\partial\vartheta_{l,m}}\big)^2-4\,\mathbf E_Q\big[I_{l,m}(\vartheta)\,\mathbf E_\vartheta\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi\in I_m\}}\big]\big]}{T\,\mathbf E_Q[I_{l,m}(\vartheta)]+J_l}. $$

Next, it can be shown (see Kutoyants [22]) that

$$ \frac{\partial\psi_{l,m,\vartheta}}{\partial\vartheta_{l,m}}=2\sqrt{\frac{2A}{f_*(a_m)T^\beta}}\int_{I_m}g_{l,m}(x)\Psi_{l,m,\vartheta}(x)f_\vartheta(x)\,dx. \qquad(5.1) $$

Since all the functions under the integral are differentiable with bounded derivatives and the length of the interval $I_m$ is $2\delta_T$, we have

$$ \frac{\partial\psi_{l,m,\vartheta}}{\partial\vartheta_{l,m}}=\sqrt{\frac{32\delta_T^3}{f_*(a_m)}}\,g_{l,m}(a_m)\Psi_{l,m,\vartheta}(a_m)f_\vartheta(a_m)+\delta_T^{5/2}O_T(1). $$

Using the same arguments, one can check that

$$ I_{l,m}(\vartheta)\,\mathbf E_\vartheta\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi\in I_m\}}\big]=\frac{2\delta_T}{f_*(a_m)}\,\mathbf E_\vartheta\big[g^2_{l,m}(\xi)\big]\,\mathbf E_\vartheta\big[\Psi^2_{l,m,\vartheta}(\xi)\chi_{\{\xi\in I_m\}}\big]=\frac{8\delta_T^3}{f_*(a_m)}\,g^2_{l,m}(a_m)\Psi^2_{l,m,\vartheta}(a_m)f_\vartheta(a_m)^2+O_T(\delta_T^4). \qquad(5.2) $$

It is now clear that the difference of the terms (5.1) and (5.2) is of order $\delta_T^4$. Since the denominator of $\bar\mu_{l,m,T}$ is larger than $CT\delta_T$, we get the desired inequality. □

Lemma 5.5 In the notation of Section 4, we have

$$ \mathbf E_S\big(\tilde f^{(j)}_T(a)-f^{(j)}_S(a)\big)^2\le Ch_T^{2k-2j}(1+|a|)^{\nu_*}f_S(a), \qquad(5.3) $$
$$ \mathbf E_S\big(\hat q_{T,l,m}-q_{l,m}(f_S)\big)^2\le Ch_T^{2k+1/2}\int_{I_m}(1+|a|)^{\nu_*}f_S(a)\,da\,\sum_{j=0}^{k-1}\Big(\frac{\hat\nu_T}{l}\Big)^{2j+2}\hat\nu_T^{-1}. \qquad(5.4) $$

Proof: The proof of this lemma is quite standard: it is based on a bias–variance decomposition, the bias is then estimated in the usual way (as in the i.i.d. case) and the variance is bounded via the Itô formula, exactly as in Kutoyants [21]. The bias–variance decomposition yields

$$ \mathbf E_S\big(\tilde f^{(j)}_T(a)-f^{(j)}_S(a)\big)^2\le2\big(\mathbf E_S[\tilde f^{(j)}_T(a)]-f^{(j)}_S(a)\big)^2+2\,\mathbf E_S\big(\tilde f^{(j)}_T(a)-\mathbf E_S[\tilde f^{(j)}_T(a)]\big)^2. $$

In view of the stationarity of $X$ and the equality $\int_{\mathbb R}Q(u)\,du=1$, we get

$$ \mathbf E_S[\tilde f^{(j)}_T(a)]-f^{(j)}_S(a)=\int_{\mathbb R}Q(u)\big[f^{(j)}_S(a+uh_T)-f^{(j)}_S(a)\big]\,du. $$

We now apply the Taylor formula with the remainder term in integral form and the fact that $Q$ is a kernel of order $k$ (i.e. $\int_{\mathbb R}u^lQ(u)\,du=0$ for any integer $l\in[1,k]$):

$$ \mathbf E_S[\tilde f^{(j)}_T(a)]-f^{(j)}_S(a)=\frac1{(k-j-1)!}\int_{\mathbb R}Q(u)\int_0^{uh_T}y^{k-j-1}f^{(k)}_S(a+uh_T-y)\,dy\,du. $$

Since $Q(u)=0$ if $|u|>1$, we obtain

$$ \big|\mathbf E_S[\tilde f^{(j)}_T(a)]-f^{(j)}_S(a)\big|\le\frac{h_T^{k-j-1}}{(k-j-1)!}\int_{-h_T}^{h_T}\big|f^{(k)}_S(a+y)\big|\,dy. $$

The Cauchy–Schwarz inequality yields

$$ \big|\mathbf E_S[\tilde f^{(j)}_T(a)]-f^{(j)}_S(a)\big|^2\le\frac{h_T^{2k-2j-1}}{[(k-j-1)!]^2}\int_{-h_T}^{h_T}\big[f^{(k)}_S(a+y)\big]^2\,dy. \qquad(5.5) $$

We turn to the evaluation of the variance. We have

$$ \tilde f^{(j)}_T(a)-\mathbf E_S[\tilde f^{(j)}_T(a)]=\frac1{Th_T^{j+1}}\int_0^T\Big(Q^{(j)}\Big(\frac{X_t-a}{h_T}\Big)-\mathbf E_S\Big[Q^{(j)}\Big(\frac{X_t-a}{h_T}\Big)\Big]\Big)\,dt. $$

Let us introduce the functions $g(u)$ and $H(u)$ as follows:

$$ g(u)=Q^{(j)}\Big(\frac{u-a}{h_T}\Big)-\mathbf E_S\Big[Q^{(j)}\Big(\frac{X_0-a}{h_T}\Big)\Big],\qquad H(u)=\int_{-\infty}^u\frac{2}{f_S(y)}\int_{-\infty}^yg(v)f_S(v)\,dv\,dy. $$

It can be easily checked that the Itô formula implies

$$ \int_0^Tg(X_t)\,dt=H(X_T)-H(X_0)-\int_0^TH'(X_t)\,dW_t, $$

and therefore, due to stationarity,

$$ \mathbf E_S\Big(\int_0^Tg(X_t)\,dt\Big)^2\le6\,\mathbf E_S\big[H^2(X_0)\big]+3T\,\mathbf E_S\big[\big(H'(X_0)\big)^2\big]. $$

By integrating by parts, we get

$$ H'(u)=2\int_{-\infty}^ug(y)\,dy-\frac{2}{f_S(u)}\int_{-\infty}^u\Big(\int_{-\infty}^yg(v)\,dv\Big)f'_S(y)\,dy $$
$$ =2\int_{-\infty}^uQ^{(j)}\Big(\frac{y-a}{h_T}\Big)dy-\frac{2}{f_S(u)}\int_{\mathbb R}\Big(\int_{-\infty}^yQ^{(j)}\Big(\frac{v-a}{h_T}\Big)dv\Big)f'_S(y)\big(\chi_{\{y<u\}}-F_S(u)\big)\,dy $$
$$ =2h_TQ^{(j-1)}\Big(\frac{u-a}{h_T}\Big)-\frac{2h_T}{f_S(u)}\int_{\mathbb R}Q^{(j-1)}\Big(\frac{y-a}{h_T}\Big)f'_S(y)\big(\chi_{\{y<u\}}-F_S(u)\big)\,dy. \qquad(5.6) $$

Using the inequality

$$ \frac{\big|\chi_{\{y<u\}}-F_S(u)\big|}{f_S(u)}\le C+\frac{C\chi_{\{u\in[0,y]\}}}{f_S(u)}\le C+\frac{C\chi_{\{u\in[0,y]\}}}{\sqrt{f_S(u)f_S(y)}}, $$

the equality $f'_S(x)=2S(x)f_S(x)$ and the fact that $S(\cdot)$ increases at most polynomially, one can prove that the second term in (5.6) is asymptotically negligible with respect to the first one. Therefore,

$$ \mathbf E_S\big[\big(H'(X_0)\big)^2\big]\le4\big(1+o_T(1)\big)\,\mathbf E_S\Big[\Big(h_TQ^{(j-1)}\Big(\frac{X_0-a}{h_T}\Big)\Big)^2\Big]\le4\big(1+o_T(1)\big)\,h_T^3\int_{-1}^1\big[Q^{(j-1)}(u)\big]^2f_S(a+uh_T)\,du\le Ch_T^2\int_{a-h_T}^{a+h_T}f_S(x)\,dx. $$

It is evident that the term involving the expectation of $H^2(X_0)$ is asymptotically much smaller than the term $T\,\mathbf E_S[(H'(X_0))^2]$.
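The proof uses a kernel $Q$ of order $k$ with support $[-1,1]$, i.e. $\int Q=1$ and $\int u^lQ(u)\,du=0$ for $l=1,\dots,k$. The paper does not specify $Q$; one standard construction (a Legendre projection kernel, which satisfies the moment conditions but does not vanish at $\pm1$) is sketched below.

```python
import numpy as np
from numpy.polynomial import legendre as L

def order_k_kernel_coeffs(k):
    """Legendre coefficients of Q(u) = sum_j phi_j(0) phi_j(u), phi_j orthonormal on [-1,1].

    Since phi_j = sqrt((2j+1)/2) P_j, the Legendre-basis coefficient of Q is (2j+1)/2 * P_j(0);
    the reproducing property of the projection makes integral(u^l Q) = 0^l for l = 0, ..., k.
    """
    c = np.zeros(k + 1)
    for j in range(k + 1):
        Pj0 = L.legval(0.0, np.eye(k + 1)[j])    # P_j(0)
        c[j] = (2 * j + 1) / 2.0 * Pj0
    return c                                      # evaluate Q via L.legval(u, c)

def moment(c, l):
    """Exact value of the integral over [-1,1] of u^l Q(u) du, via polynomial integration."""
    ul = L.poly2leg(np.eye(l + 1)[l])             # u^l expressed in the Legendre basis
    anti = L.legint(L.legmul(c, ul))              # antiderivative of u^l * Q(u)
    return L.legval(1.0, anti) - L.legval(-1.0, anti)
```

For $k=3$ the kernel reduces to $Q(u)=\tfrac12-\tfrac54P_2(u)$: its zeroth moment is 1, moments 1 through 3 vanish, and the fourth moment does not, so the order is exactly 3.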
Consequently,

$$ \mathbf E_S\big(\tilde f^{(j)}_T(a)-f^{(j)}_S(a)\big)^2\le Ch_T^{2k-2j-1}\int_{a-h_T}^{a+h_T}\big[f^{(k)}_S(y)\big]^2dy+\frac{C}{Th_T^{2j}}\int_{a-h_T}^{a+h_T}f_S(x)\,dx\le Ch_T^{2k-2j-1}(1+|a|)^{\nu_*}\int_{a-h_T}^{a+h_T}f_S(x)\,dx\le Ch_T^{2k-2j}(1+|a|)^{\nu_*}f_S(a). $$

Therefore, using the equality

$$ \hat q_{T,l,m}-q_{l,m}(f_S)=\frac{(-1)^l}{\sqrt{2\delta_T}}\sum_{j=0}^{k-1}\Big(\frac{\delta_T}{i\pi l}\Big)^{j+1}\int_{I_m}\big(\tilde f^{(j+1)}_T(a)-f^{(j+1)}_S(a)\big)\,da $$

and the Cauchy–Schwarz inequality, we get

$$ \mathbf E_S\big|\hat q_{T,l,m}-q_{l,m}(f_S)\big|^2\le\frac{k}{2\delta_T}\sum_{j=0}^{k-1}\Big(\frac{\delta_T}{\pi l}\Big)^{2j+2}\mathbf E_S\Big(\int_{I_m}\big(\tilde f^{(j+1)}_T(a)-f^{(j+1)}_S(a)\big)\,da\Big)^2 $$
$$ \le\frac{C}{\delta_T}\sum_{j=0}^{k-1}\Big(\frac{\delta_T}{l}\Big)^{2j+2}h_T^{2k-2j}\,\big(1+|a_m-\delta_T|\big)^{\nu_*}f_S(a_m-\delta_T)\le Ch_T^{2k+1}\delta_T^{-1}\int_{I_m}(1+|a|)^{\nu_*}f_S(a)\,da\,\sum_{j=0}^{k-1}\Big(\frac{\hat\nu_T}{l}\Big)^{2j+2}\hat\nu_T^{-1}. $$

In the last inequality we have used the fact that the localisation step $\delta_T$ is equal to the square root of $h_T$ and that $\hat\nu_T$ is of order $\delta_T/h_T$. □

References

[1] G. Banon. Nonparametric identification for diffusion processes. SIAM J. Control and Optim., 16, 3:380–395, 1978.
[2] D. Bosq. Nonparametric Statistics for Stochastic Processes. Lecture Notes in Statistics, 110, Springer, New York, 1998.
[3] D. Bosq and Yu. Davydov. Local time and density estimation in continuous time. Math. Methods Statist., 8:22–45, 1999.
[4] J. V. Castellana and M. R. Leadbetter. On smoothed density estimation for stationary processes. Stoch. Proc. Appl., 21:179–193, 1986.
[5] L. Cavalier, G. K. Golubev, D. Picard and A. B. Tsybakov. Oracle inequalities for inverse problems. Ann. Statist., 30:843–874, 2002.
[6] L. Cavalier and A. B. Tsybakov. Penalized blockwise Stein's method, monotone oracles and sharp adaptive estimation. Math. Methods Statist., 10, 3:247–282, 2001.
[7] A. S. Dalalyan. Sharp adaptive estimation of the trend coefficient for ergodic diffusion. Prépublication no. 02-1, Laboratoire Statistique et Processus, Univ. du Maine.
[8] A. S. Dalalyan and Yu. A. Kutoyants. Asymptotically efficient estimation of the derivative of the invariant density. Statist. Infer. Stochast. Processes, Vol. 6, 1:89–109, 2003.
[9] A. S. Dalalyan and Yu. A.
Kutoyants. Asymptotically efficient trend coefficient estimation for ergodic diffusions. Math. Methods Statist., Vol. 11, 4:402–427, 2002.
[10] S. Delattre and M. Hoffmann. The Pinsker bound in mixed Gaussian white noise. Math. Methods Statist., Vol. 10, 3:283–316, 2001.
[11] R. Durrett. Stochastic Calculus: A Practical Introduction. CRC Press, Boca Raton, 1996.
[12] S. Yu. Efromovich. Nonparametric Curve Estimation. Springer, New York, 1998.
[13] I. I. Gikhman and A. V. Skorokhod. Introduction to the Theory of Random Processes. W. B. Saunders, Philadelphia, 1969.
[14] R. D. Gill and B. Ya. Levit. Applications of the van Trees inequality: a Bayesian Cramér–Rao bound. Bernoulli, 1:59–79, 1995.
[15] G. K. Golubev. LAN in problems of non-parametric estimation of functions and lower bounds for quadratic risks. Theory Probab. Appl., 36, 1:152–157, 1991.
[16] G. K. Golubev and W. Härdle. On adaptive smoothing in partial linear models. Math. Methods Statist., Vol. 11, 1:98–117, 2002.
[17] G. K. Golubev and B. Ya. Levit. On the second order minimax estimation of distribution functions. Math. Methods Statist., Vol. 5, 1:1–31, 1996.
[18] G. K. Golubev and B. Ya. Levit. Asymptotically efficient estimation for analytic functions. Math. Methods Statist., Vol. 5, 357–368, 1996.
[19] G. K. Golubev and B. Ya. Levit. Distribution function estimation: adaptive smoothing. Math. Methods Statist., Vol. 5, 383–403, 1996.
[20] Yu. A. Koshevnik and B. Ya. Levit. On a non-parametric analogue of the information matrix. Theory Probab. Appl., Vol. 21, 738–753, 1976.
[21] Yu. A. Kutoyants. Efficient density estimation for ergodic diffusion. Statistical Inference for Stochastic Processes, Vol. 1, 2:131–155, 1998.
[22] Yu. A. Kutoyants. Statistical Inference for Ergodic Diffusion Processes. Springer Series in Statistics, New York, 2003.
[23] F. Leblanc. Density estimation for a class of continuous time processes. Math. Methods Statist., Vol.
6, 2:171–199, 1997.
[24] O. V. Lepski and V. G. Spokoiny. Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist., 25:2512–2546, 1997.
[25] H. T. Nguyen. Density estimation in continuous-time Markov processes. Ann. Statist., 7:341–348, 1979.
[26] M. Nussbaum. Minimax risk: Pinsker's bound. In Encyclopedia of Statistical Sciences, Update Volume 3, 451–460 (S. Kotz, Ed.), Wiley, New York, 1999.
[27] M. Nussbaum. Asymptotic equivalence of density estimation and white noise. Ann. Statist., 24:2399–2430, 1996.
[28] M. S. Pinsker. Optimal filtering of square integrable signals in Gaussian white noise. Problems Inform. Transmission, 16:120–133, 1980.
[29] B. L. S. Prakasa Rao. Nonparametric density estimation for stochastic processes from sampled data. Publ. ISUP, XXXV:51–84, 1990.

Arnak S. Dalalyan
Laboratoire de Probabilités
Université Paris 6, Boîte courrier 188
75252 Paris cedex 05, France
dalalyan@ccr.jussieu.fr

Yu. A. Kutoyants
Laboratoire de Statistique et Processus
Université du Maine, Av. O. Messiaen
72085 Le Mans cedex 9, France
kutoyants@univ-lemans.fr