Asymptotic normality of plug-in level set estimates

advertisement
Asymptotic normality of plug-in level set estimates
(expanded version)
David M. Mason∗ and Wolfgang Polonik†
University of Delaware and University of California, Davis
October 20, 2008
Abstract
We establish the asymptotic normality of the G-measure of the symmetric difference
between the level set and a plug-in-type estimator of it formed by replacing the
density in the definition of the level set by a kernel density estimator. Our proof
will highlight the efficacy of Poissonization methods in the treatment of large sample
theory problems of this kind.
Short title: Level set estimation
AMS 2000 Subject Classifications: 60F05, 60F15, 62E20, 62G07.
Keywords: Central limit theorem, kernel density estimator, level set estimation
1
Introduction
Let f be a Lebesgue density on IRd , d ≥ 1. Define the level set of f at level c ≥ 0 as
C (c) = {x : f (x) ≥ c} .
In this paper we are concerned with the estimation of C(c) for a given level c. Such level
sets play a crucial role in various scientific fields, and their estimation has received significant recent interest in the fields of statistics and machine learning/pattern recognition
(see below for more details). Theoretical research on this topic is mainly concerned with
rates of convergence of level set estimators. While such results are interesting, they show
only limited potential to be useful in practical applications. The available results do
not permit statistical inference or making quantitative statements about the contour sets
themselves. The contribution of this paper constitutes a significant step forward in this
direction, since we establish the asymptotic normality of a class of level set estimators
∗
†
Research partially supported by NSF Grant DMS–0503908
Research partially supported by NSF Grant DMS–0706971
1
Cn (c) formed by replacing f by a kernel density estimator fn in the definition of C (c),
in a sense that we shall soon make precise.
Here is our setup. Let X1 , X2 , . . . be i.i.d. with distribution function F and density f ,
and consider the kernel density estimator of f based on X1 , . . . , Xn , n ≥ 1,
n
x − Xi
1 X
K
fn (x) =
, x ∈ IRd ,
1/d
nhn i=1
hn
where K is a kernel and hn > 0 is a smoothing parameter. Consider the plug-in estimator
Cn (c) = {x : fn (x) ≥ c} .
Let G be a positive measure dominated by Lebesgue measure λ. Our interest is to establish
the asymptotic normality of
dG (Cn (c), C(c) ) := G(Cn (c) ∆ C(c))
Z
=
| I{fn (x) ≥ c } − I{ f (x) ≥ c} | dG(x),
(1.1)
IRd
where A∆B = (A \ B) ∪ (B \ A) denotes the set-theoretic symmetric difference of two
sets. Of particular interest is G being the Lebesgue measure λ, as well as G = H with H
denoting the measure having Lebesgue density |f (x) − c|. The latter corresponds to the
so-called excess-risk which is used frequently in the classification literature, i.e.
Z
dH (Cn (c), C(c) ) =
|f (x) − c| dx.
(1.2)
Cn (c) ∆ C(c)
It is well-known that under mild conditions dλ (Cn (c), C(c)) → 0 in probability as n → ∞,
and also rates of convergence have already been derived (cf. Baillo et al. (2000, 2001),
Cuevas et al. (2000), Baillo (2003), Baillo and Cuevas (2004)). Even more is known.
Cadre (2006) derived assumptions under which for some µG > 0 we have
p
n hn dG (Cn (c), C(c)) → µG in probability as n → ∞.
(1.3)
However, asymptotic normality of dG (Cn (c), C(c)) has not yet been considered.
Our main result says that under suitable regularity conditions there exist a normalizing
2
sequence {an,G } and a constant 0 < σG
< ∞ such that
an,G {dG (Cn (c), C(c) ) − EdG (Cn (c), C(c) )} →d σG Z, as n → ∞,
(1.4)
where Z denotes a standard normal random variable. In the important special cases of
G = λ the Lebesgue measure, and G = H we shall see that under suitable regularity
conditions
14
n
, and
(1.5)
an,λ =
hn
1
an,H = n3 hn 4 ,
(1.6)
2
respectively.
In the next subsection we shall discuss further related work and relevant literature. In
Section 2 we formulate our main result, provide some heuristics for its validity, discuss a
possible statistical application and then present the proof of our result. We end Section
2
2 with an example and some proposals to estimate the limiting variance σG
. A number
of the technical details are relegated to the Appendix.
1.1
Related work and literature
Before we present our results in detail, we shall extend our overview of the literature
on level set estimation to include regression level set estimation (with classification as a
special case) as well as density level set estimation.
Observe that there exists a close connection between level set estimation and binary
classification. The optimal (Bayes) classifier corresponds to a level set Cψ (0) = {x :
ψ(x) ≥ 0 } of ψ = p f − (1 − p) g, where f and g denote the Lebesgue densities of two
underlying class distributions F and G and p ∈ [0, 1] defines the prior probability for
f . If an observation X falls into {x : ψ(x) ≥ 0 } then it is classified by the optimal
classifier as coming from F , otherwise as coming from distribution G. Hall and Kang
(2005) derive large sample results for this optimal classifier that are very closely related
to Cadre’s result (1.3). In fact, if Err(C) denotes the probability of a misclassification
of a binary classifier given by a set C, then Hall and Kong derive rates of convergence
results for the quantity Err(Ĉ(0)) − Err(Cψ (0) where Ĉ is the plug-in classifier given by
Ĉ(0) = {x : p fn (x) − (1 − p) gn (x) ≥ 0} with fn and gn denoting the kernel estimators
for f and g, respectively. It turns out that
Z
Err(Ĉ(0)) − Err(Cψ (0) =
|ψ(x)| dx.
Ĉ(0) ∆ Cψ (0)
The latter quantity is of exactly the form (1.2). The only difference is, that the function
ψ is not a probability density, but a (weighted) difference of two probability densities.
Similarly, the plug-in estimate is a weighted difference of kernel estimates. Though the
results presented here do not directly apply to this situation, the methodology used to
prove them can be adapted to it in a more or less straightforward manner.
Hartigan (1975) introduced a notion of clustering via maximally connected components
of density level sets. For more on this approach to clustering see Stuetzle (2003), and for
an interesting application of this clustering approach to astronomical sky surveys refer to
Jang (2006). Klemelä (2004, 2006a,b) applies a similar point of view to develop methods
for visualizing multivariate density estimates. Goldenshluger and Zeevi (2004) use level
set estimation in the context of the Hough transform, which is a well-known computer
vision algorithm. Certain problems in flow cytometry involve the statistical problem of
estimating a level set for a difference of two probability densities (Roederer and Hardy,
2001; see also Wand, 2005). Further relevant applications include detection of minefields
3
based on arial observations, the analysis of seismic data, as well as certain issues in image
segmentation; see Huo and Lu (2004) and references therein. Another application of level
set estimation is anomaly detection or novelty detection. For instance, Theiler and Cai
(2003) describe how level set estimation and anomaly detection go along in the context of
multispectral image analysis, where anomalous locations (pixels) correspond to unusual
spectral signatures in these images. Further areas of anomaly detection include intrusion
detection (e.g. Fan et al. 2001, Yeung and Chow, 2002), anomalous jet engine vibrations
(e.g. Nairac et al. 1997, Desforges et al. 1998, King et al. 2002) or medical imaging e.g.
(Gerig et al 2001, Prastawa et al. 2003) and EEG-based seizure analysis (Gardner et al.
2006). For a recent review of this area see Markou and Singh (2003).
The above list of applications of level set estimation clearly motivates the need to understand the statistical properties of level set estimators. For this reason there has been lot of
recent investigation into this area. Relevant published work (not yet mentioned above) include Hartigan (1987), Polonik (1995), Cavalier (1997), Tsybakov (1997), Walther (1997),
Baillo et al. (2000, 2001), Cuevas et al. (2000), Baillo (2003), Bailllo and Cuevas (2004),
Tsybakov (2004), Steinwart et al. (2004, 2005), Gayraud and Rousseau (2005), Willett
and Novak (2005, 2006), Cuevas et al. (2006), Scott and Davenport (2006), Scott and
Novak (2006), Vert and Vert (2006), and Rigollet and Vert (2008).
Finally we mention a problem closely related to that of level set estimation. This is the
problem of the estimation of the support of a density, when the support is assumed to be
bounded. It turns out that the methods of estimation and the techniques used to study
the asymptotic properties of the estimator are very similar to those of level set estimation.
Refer especially to Biau et al. (2008) and the references therein.
2
Main result
The rates of convergence in our main result depend on a regularity parameter 1/γg that
describes the behavior of the slope of g at the boundary set β (c) = {x ∈ IRd : f (x) = c }
(see assumption (G) below). In the important special case of G = λ the slope of g is
zero, and this implies 1/γg = 0 (or γg = ∞). For G = H our assumptions imply that the
slope of g close to the boundary is bounded away from zero and infinity which says that
1/γg = 1.
Here is our main result. The indicated assumptions are quite technical to state and therefore for the sake of convenience they are formulated in Subsection 2.2 below. In particular,
the integer k ≥ 1 that appears in the statement of our theorem is defined in (B.ii).
Theorem. Under assumptions (D.i)−(D.ii), (K.i)−(K.iii), (G), (H) and (B.i)−(B.ii),
we have as n → ∞ that
an,G {dG (Cn (c), C(c)) − EdG (Cn (c), C(c)) } →d σG Z,
4
(2.1)
where Z denotes a standard normal random variable, and
γ1
n 41 p
g
.
n hn
an,G =
hn
(2.2)
2
The constant 0 < σG
< ∞ is defined as in (2.57) in the case d ≥ 2 and k = 1; as in
(2.61) in the case d ≥ 2 and k ≥ 2; and as in (2.62) in the case d = 1 and k ≥ 2. (The
case d = 1 and k = 1 cannot occur under our assumptions. )
Remark 1. Write
δn (c) = an,G dG (Cn (c), C(c)) − EdG (Cn (c), C(c)) .
A slight extension of the proof of our theorem shows that if c1 , . . . , cm , m ≥ 1, are distinct
positive numbers, each of which satisfies the assumptions of the theorem, then
(δn (c1 ) , . . . , δn (cm )) →d (σ1 Z1 , . . . , σm Zm ) ,
where Z1 , . . . , Zm are independent standard normal random variables and σ1 , . . . , σm are
as defined in the proof of the theorem.
2
Remark 2. In Subsection 2.7 we provide an example when the variance σG
does have
a closed form convenient for calculation. Since such a closed form cannot be given in
2
general, Subsection 2.7 also discusses some methods to estimate σG
from the data.
2.1
Heuristics
Before we continue with our exposition, we shall provide some heuristics to indicate why
1
an = hnn 4 is the correct normalizing factor in (1.5), i.e. we consider the case G = λ,
or γg = ∞. This should help the reader to understand why our theorem is true. It is
well-known that under certain regularity conditions we have
p
n hn {fn (x) − f (x)} = OP (1), as n → ∞.
Therefore the boundary of the set
can be expected to fluctuate in a band B with a
Cn (c)
width (roughly) of the order OP √n1hn around the boundary set β (c) = {x : f (x) = c }.
For notational simplicity we shall write β = β (c). Partitioning B by N = O √n h1n hn =
O √ 1 3 regions Rk , k = 1, . . . , N, of Lebesgue measure λ (Rk ) = hn , we can approxin hn
mate dλ (Cn (c), C(c)) as
dλ (Cn (c), C(c)) ≈
N Z
X
k=1
Rk
| I{fn (x) ≥ c } − I{ f (x) ≥ c} | dx =:
1
. Writing
Here we use the fact that the band B has width √nh
n
Z
Yn,k =
Λn (x) dx
Rk
5
N
X
k=1
Yn,k .
with
Λn (x) = | I{fn (x) ≥ c } − I{ f (x) ≥ c} |,
we see that
Var(Yn,k ) =
Z
Rk
Z
Rk
(2.3)
cov(Λn (x), Λn (y)) dx dy = O λ(Rk ))2 = O(h2n ),
where the O-terms turns out to be exact. Further, due to the nature of the kernel density
estimator the variables Yn,k can be assumed to behave asymptotically like independent
variables, since we can choose the regions Rk to be disjoint. Hence, the variance of
1/2
, which motivates the
dλ (Cn (c), C(c)) can be expected to be of the order N h2n = hnn
1
normalizing factor an = hnn 4 .
2.2
A connection to Lp-rates of convergence of kernel density
estimates
The following discussion of Lp -rates, p ≥ 1, of convergence of kernel density estimates
implicitly provides another heuristic for our our result.
Consider the case G = Hp−1, where Hp−1 denotes the measure with Radon-Nikodym
derivative hp−1 (x) = |f (x) − c|p−1 with p ≥ 1. Note that H2 = H with H from above.
Then we have the identity
Z ∞
Z
1
Hp−1(Cn (c) ∆ C(c)) dc =
|fn (x) − f (x)|p dx, p ≥ 1.
(2.4)
d
p
0
IR
The proof is straightforward (see Detail 1 in the Appendix). The case p = 1 gives the
geometrically intuitive relation
Z ∞
Z ∞Z
Z
λ(Cn (c) ∆ C(c)) dc =
dx dc =
|fn (x) − f (x)| dx.
0
0
Cn (c)∆ C(c)
IRd
Assuming f to be bounded, we split up the vertical axis into successive intervals ∆(k),
k = 1, . . . , N of length ≈ √n1hn with midpoints ck . Approximate the integral (2.4) by
Z
Z ∞
1
p
|fn (x) − f (x)| dx =
Hp−1 (Cn (c)∆ C(c)) dc
p IRd
0
N Z
X
Hp−1 (Cn (c)∆ C(c)) dc
≈
k=1
∆(k)
N
1 X
≈√
Hp−1 (Cn (ck )∆ C(ck )).
n hn k=1
√
Utilizing the 1/ nhn -rate of fn (x) we see that the last sum consists of (roughly) independent random variables. Assuming further that the variance of each (or of most) of
6
−(p−1)
n −1/2
nh
these random variables is of the same order a−2
=
(to obtain this,
n
n,p
hn
apply our theorem with γg = 1/(p − 1)) we obtain that the variance of the sum is of the
order
!p
a−2
1
n
√
.
=
1− 1
nhn
nh p
n
In other words, the normalizing factor of the Lp -norm of the kernel density estimator
p −1
1− 1 p
in IRd can be expected to be (nhn p 2 = nhn 2 hn 2 . In the case p = 2 this gives the
−1/2
1/2
normalizing factor nhn hn
= nhn , and this coincides with the results from Rosenblatt
(1975). In the special case d = 2 these rates can also be found in Horvath (1991).
2.3
Possible application to online testing
Suppose that when a certain industrial process is working properly it produces items,
which may be considered as i.i.d. IRd random variables X1 , X2 , . . . with a known density function f . On the basis of a sample size n taken from time to time from the
production we can measure the deviation of the sample X1 , X2 , . . . , Xn from the desired
distribution by looking at the discrepancy between λ (Cn (c) ∆ C (c)) and its expected
value Eλ (Cn (c) ∆ C (c)). The value c may be chosen so that
Z
f (x) dx = α,
P {X ∈ C (c)} =
{f (x)≥c}
some typical values being α = .90, .95 and .99. We may decide to shut down the process
and look for production errors if
σλ−1
n
hn
1/4
|λ (Cn (c) ∆ C (c)) − Eλ (Cn (c) ∆ C (c))| > 1.96.
(2.5)
Otherwise as long as the estimated level set Cn (c) does not deviate too much from the
target level set C (c) in which fraction α of the data should lie if the process is functioning
properly, we do not disrupt production. Our central limit theorem tells us that for large
enough sample sizes n the probability of the event in (2.5) would be around .05, should,
in fact, the process be working as it should. Thus using this decision rule, we make a type
I error with roughly probability .05 if we decide to stop production, when it is actually
working fine. Sometimes one might want to replace the Cn (c) in the first λ (Cn (c) ∆ C (c))
in (2.5) by Cn (cn ), where
Z
fn (x) dx = α.
{fn (x)≥cn }
A mechanical engineering application where this approach seems to be of some value is
described in Desforges et al. (1998). This application considers gearbox fault data, the
collection of which is described in that paper. In fact, two classes of data were collected,
corresponding to two states: a gear in good condition, and a gear in bad condition,
7
respectively. Desforges et al. indicate a data analysis approach based on kernel density
estimation to recognize the faulty condition. The idea is to calculate a kernel density
estimator gm based on the data X1 , . . . , Xm from the gear in good condition, and then
this estimator is evaluated at the data Y1 , . . . , Yn that are sampled under a bad gear
condition. Desforges et al. then examine
Pn the level sets of gm in which the faulty data
1
lie. One of their ideas is to use n i=1 gm (Yi ), to detect the faulty condition. Their
methodology is ad hoc in nature and no statistical inference procedure is proposed.
Our test procedure could be applied as follows by using f = gm (i.e. we are conditioning
on X1 , . . . , Xm ). Set C(c) := {x : gm (x) ≥ c, } for an appropriate value of c, and find
the corresponding set Cn (c) based on Y1 , . . . , Yn . Then check whether (2.5) holds. If
yes, then we can conclude with at significance level .05 that the Y1 , . . . , Yn stem from
a different distribution than the X1 , . . . , Xm . Observe that in this set-up we calculate
Eλ(Cn (c) ∆ C (c)) as well as σλ2 by using gm as the underlying distribution. (In practice
we may have to estimate these two quantities. See Subsection 2.7.) How this approach
would work in practice is the subject of a separate paper.
2.4
Assumptions and notation
Assumptions on the density f.
(D.i) f is in C 2 (IRd ) and its partial derivatives of order 1 and 2 are bounded;
(D.ii) inf x∈IRd f (x) < c < supx∈IRd f (x) .
Notice that (D.i) implies the existence of positive constants M and A such that
sup f (x) ≤ M < ∞
(2.6)
x
and
2
d
d
∂ f (x) 1 XX
=: A < ∞.
sup
2 i=1 j=1 x∈IRd ∂xi ∂xj (2.7)
(Condition (D.i) implies that f is uniformly continuous on IRd from which (2.6) follows.)
Assumptions on K.
(K.i) K is a probability density having support contained in the closed ball of radius
1/2 centered at zero and is bounded by a constant κ.
Pd R
(K.ii)
i=1 IRd ti K (t) dt = 0;
Observe that (K.i) implies that
Z
IRd
|t|2 |K (t)| dt = κ1 < ∞.
8
(2.8)
Assumptions on the boundary β = {x : f (x) = c} for d ≥ 2.
(B.i) For all (y1 , . . . , yd ) ∈ β,
′
′
f (y) = f (y1 , . . . , yd ) =
Define
Id =
The d − 1 sphere
∂f (y1 , . . . , yd)
∂f (y1 , . . . , yd )
,...,
∂y1
∂yd
6= 0.
[0, 2π)
,d = 2
[0, π]d−2 × [0, 2π) , d > 2.
S d−1 = x ∈ IRd : |x| = 1 ,
can be parameterized by (e.g. see Lang, 1997)
x (θ) = (x1 (θ) , . . . , xd (θ)) , θ ∈ Id ,
where
x1 (θ) = cos (θ1 ) ,
x2 (θ) = sin (θ1 ) cos (θ2 ) ,
x3 (θ) = sin (θ1 ) sin (θ2 ) cos (θ3 ) ,
..
.
xd−1 (θ) = sin (θ1 ) · · · sin (θd−2 ) cos (θd−1 ) ,
xd (θ) = sin (θ1 ) · · · sin (θd−2 ) sin (θd−1 ) .
(B.ii) We assume that the boundary β can be written as
β=
k
[
j=1
βj , with inf {|x − y| : x ∈ βj , y ∈ βl } > 0 if j 6= l,
that is a function (depending on j) of the above parameterization x (θ) of S d−1 , which is
1–1 on Jd , the interior of Id , with
∂y (θ)
∂y1 (θ)
∂yd (θ)
=
,...,
6= 0, θ ∈ Jd .
∂θi
∂θi
∂θi
We further assume that for each j = 1, . . . , k and i = 1, . . . , d, the function
continuous and uniformly bounded on Jd .
Assumptions on the boundary β for d = 1.
(B.i)
′
inf f (zi ) =: ρ0 > 0.
1≤i≤k
9
∂y(θ)
∂θi
is
(B.ii) β = {z1 , . . . , zk }, k ≥ 2.
(Condition (B.i) and f ∈ C 2 (IR) imply that the case k = 1 cannot occur when d = 1.)
Assumptions on G.
(G) The measure G has a bounded continuous Radon-Nikodym derivative g w.r.t.
Lebesgue measure λ. There exists a constant 0 < γg ≤ ∞ such that the following holds.
In the case d ≥ 2 there exists a function g (1) (·, ·) bounded on Id × S d−1 such that for each
j = 1, . . . , k, for some cj ≥ 0,
g(y (θ) + a z)
(1)
= o(1), as a ց 0,
−
c
g
(θ,
z)
sup sup j
a1/γg
|z|=1 θ∈Id
with 0 < sup|z|=1 supθ∈Id |g (1) (θ, z)| < ∞, where y (θ) is the parametrization pertaining to
βj , with at least one of the cj strictly positive.
In the case d = 1 there exists a function g (1) (·) with 0 < |g (1) (zj )| < ∞, j = 1, . . . , k such
that for each j = 1, . . . , k for some cj ≥ 0,
g(zj + a z)
(1)
= o(1), as a ց 0,
−
c
g
(z
)
sup j
j
a1/γg
|z|=1
with at least one of the cj strictly positive. By convention, in the above statement
1
∞
= 0.
Assumptions on hn .
As n → ∞,
q
1+2/d
(H) nhn
→ γ, with 0 ≤ γ < ∞ and nhn / log n → ∞, where γ = 0 in the case d = 1.
Discussion of the assumptions and some implications
Discussion of assumption (G). Measures G of particular interest that satisfy assumption
(G) are given by g(x) = |f (x) − c|p with p ≥ 0, and also by g(x) = f (x). The latter of
course leads to the F -measure of the symmetric distance. The former has connections to
the Lp -norm of the kernel density estimator (see discussion in Subsection 2.2). As pointed
out above in the Introduction, the choice p = 1 is closely connected to the excess risk
from the classification literature. The choice p = 0 yields the Lebesgue measure of the
symmetric difference.
Assumptions (B.i) and (D.i) imply that (G) holds for |f (x) − c|p with 1/γg = p. For the
choice g = f we have 1/γg = 0 (notice that by (D.ii) we have c > 0).
Discussion of smoothness assumptions on f . Our smoothness assumptions on f imply
that f has a γ-exponent with γ = 1 at the level c, i.e. we have
F {x ∈ IRd : |f (x) − c| ≤ ǫ} ≤ C ǫ.
10
(This fact follows from Lebesgue-Bosicovich theorem; e.g. see Cadre, 2006). This type
of assumption is common in the literature of level set estimation. It was used first by
Polonik (1995) in the context of density level set estimation.
Implications of (B) in the case d ≥ 2. In the following we shall record some conventions
and implications of assumption (B), which are needed in the proof of our theorem. Using
the notation introduced in assumption (B), we define ∂y(θ)
for points on the boundary of
∂θi
Id to be the limit taken from points in Jd . In this way, we see that each vector
continuous and bounded on the closure I d of Id .
∂y(θ)
∂θi
is
Notice in the case d ≥ 2 that for each j = 1, . . . , k and i = 1, . . . , d − 1,
df (y (θ))
∂f (y (θ)) ∂y1 (θ)
∂f (y (θ)) ∂yd (θ)
=
+···+
= 0,
dθi
∂y1
∂θi
∂yd
∂θi
(2.9)
where y (θ) is the parameterization pertaining to βj . This implies that the unit vector
u (θ) = (u1 (θ) , . . . , ud (θ)) :=
f ′ (y (θ))
|f ′ (y (θ))|
(2.10)
is normal to the tangent space of βj at y (θ).
From assumption (B.ii) we infer that β is compact, which when combined with (B.i) says
that
′
inf
f
(y
,
.
.
.
,
y
)
(2.11)
1
d =: ρ0 > 0.
(y1 ,...,yd )∈β
In turn, assumptions (D.i), (B.i) and (B.ii), when combined with (2.11), imply that for
each 1 ≤ i ≤ d − 1, the vector
∂ud (θ)
∂u1 (θ)
∂u (θ)
=
,...,
∂θi
∂θi
∂θi
is uniformly bounded on Id .
Consider for each j = 1, . . . , k, with y (θ) being the parameterization pertaining to βj ,
the absolute value of the determinant,
∂y(θ) ∂θ1 · · (2.12)
det =: ι (θ) .
· ∂y(θ) ∂θ
d−1 u (θ) We can infer from (B.ii) that we have
sup ι (θ) < ∞.
θ∈Id
11
(2.13)
2.5
Proof of the Theorem in the case d ≥ 2
We shall only present a detailed proof for the case k = 1. However, at the end we shall
describe how the proof of the general k ≥ 2 case goes. Thus for ease of notation we shall
drop the subscript j in the above assumptions. Also we shall assume c1 = 1 in assumption
(G).
We shall first show that with a suitably defined sequence of centerings bn , we have
p
(n/hn )1/4 ( n hn )1/γg {dG (Cn (c), C(c)) − bn } →d σZ
(2.14)
2
for some σ 2 > 0. (For the sake of notational convenience, we write in the proof σ 2 = σG
.)
From this result we shall infer that our central limit theorem (2.1) holds. The asymptotic
variance σ 2 will be defined in the course of the proof. It finally appears in (2.57) below.
Theorem 1 of Einmahl and Mason (2005) implies that when hn satisfies (H) and f is
bounded that for some constant γ1 > 0
s
nhn
sup |fn (x) − Efn (x)| ≤ γ1 , a.s.
(2.15)
lim sup
log n x∈IRd
n→∞
It is shown in (A.2) of Detail 2 in the Appendix, that under the assumptions (D), (K)
and (H) for some γ2 > 0,
p
sup nhn sup |Efn (x) − f (x)| ≤ γ2 .
(2.16)
n≥2
Set with ς >
√
x∈IRd
2 ∨ γ1 ,
En =
√
ς log n
x : |f (x) − c| ≤ √
.
nhn
(2.17)
We see by (1.1), (2.15) and (2.16) that with probability 1 for all large enough n
G (Cn (c) ∆ C (c)) =
Z
En
|I {fn (x) ≥ c} − I {f (x) ≥ c}| g(x) dx
=: Ln (c) .
(2.18)
It turns out that rather than considering the truncated quantity Ln (c) directly, it is more
convenient to first study a Poissonized version of Ln (c) formed by replacing fn (x) by
Nn
x − Xi
1 X
K
,
πn (x) =
1/d
nhn i=1
hn
where Nn is a mean n Poisson random variable independent of X1 , X2 , . . . (When Nn = 0
we set πn (x) = 0.) Notice that
Eπn (x) = Efn (x) .
12
We shall make repeated use of the fact following from the assumption that K has support
contained in the closed ball of radius 1/2 centered at zero, that πn (x) and πn (y) are
1/d
independent whenever |x − y| > hn .
Here is the Poissonized version of Ln (c) that we shall treat first. Define
Z
Πn (c) =
|I {πn (x) ≥ c} − I {f (x) ≥ c}| g(x) dx.
(2.19)
En
Our goal is to infer a central limit theorem for Ln (c) and thus for G (Cn (c) ∆ C (c)) from
a central limit theorem for Πn (c) .
Set
∆n (x) = |I {πn (x) ≥ c} − I {f (x) ≥ c}| .
(2.20)
√
The first item on this agenda is to verify that (n/hn )1/4 ( n hn )1/γg is the correct sequence
of norming constants. To do this we must analyze the exact asymptotic behavior of the
variance of Πn (c) . We see that
Z
Var (Πn (c)) = Var
Z Z
=
En
Let
Yn (x) =
"
X
j≤N1
(1)
K
x − Xj
1/d
hn
∆n (x) dG(x)
En
cov (∆n (x) , ∆n (y)) dG(x) dG(y).
En
− EK
x−X
1/d
hn
# ,s
EK 2
x−X
1/d
hn
(n)
and Yn (x), . . . , Yn (x) be i.i.d. Yn (x).
Clearly
=d
!
πn (x) − Eπn (x) πn (y) − Eπn (y)
p
, p
Var (πn (x))
Var (πn (x))
!
Pn
Pn
(i)
(i)
i=1 Yn (y)
i=1 Yn (x)
√
√
=: (π n (x) , πn (y)) .
,
n
n
Set
R
√
1/d
√
c
−
K(y)f
x
−
yh
dy
nh
d
n
n
IR
nhn (c − Efn (x))
r
=
.
cn (x) = r
x−X
x−X
1
1
2
2
EK
EK
1/d
1/d
hn
hn
hn
hn
13
Since K has support contained in the closed ball of radius 1/2 around zero, which implies
1/d
that ∆n (x) and ∆n (y) are independent whenever |x − y| > hn , we have
Z
∆n (x) dG(x)
Var
En
Z Z
I(|x − y| ≤ h1/d
=
n ) cov (∆n (x) , ∆n (y)) dG(x) dG(y),
En
En
where now we write
)
(
√
nh
(c
−
f
(x))
n
∆n (x) = I {π n (x) ≥ cn (x)} − I 0 ≥ 1/2 .
1
EK 2 x−X
1/d
hn
hn
1/d
The change of variables y = x + thn , t ∈ B, with
B = {t : |t| ≤ 1} ,
(2.21)
gives
Var
Z
∆n (x) dx = hn
Z
En
where
En
Z
gn (x, t) dt dx,
(2.22)
B
1/d
gn (x, t) = IEn (x)IEn (x + th1/d
n ) cov ∆n (x) , ∆n x + thn
× g(x) g(x + t h1/d
n ).
For ease of notation let an = an,G =
lim
a2n
Z
Var
Z
2
= lim an hn
n→∞
n
hn
41 √
nhn
∆n (x) dG(x)
En
Z
γ1
g
. We intend to prove that
gn (x, t) dt dx
Z
1
1
+
gn (x, t) dt dx
= lim ( nhn ) 2 γg
n→∞
En B
Z
Z
1
1
+
= lim lim ( nhn ) 2 γg
gn (x, t) dt dx =: σ 2 < ∞,
n→∞
En
τ →∞ n→∞
where
Dn (τ ) :=
ZB
Dn (τ )
(2.23)
(2.24)
B
s u (θ)
, θ ∈ Id , |s| ≤ τ
z : z = y (θ) + √
nhn
The set Dn (τ ) forms a band around the surface β of thickness
.
√2τ .
nhn
Recall the definition of B in (2.21). Since β is a closed submanifold of IRd without
boundary the Tubular Neighborhood Theorem (see Theorem 11.4 on page 93 of Bredon
14
(1993)) says that for all δ > 0 sufficiently small for each x ∈ β + δB there is an unique
θ ∈ Id and |s| ≤ δ such that x = y (θ) + s u (θ). This, in turn, implies that for all δ > 0
sufficiently small
{y (θ) + su (θ) : θ ∈ Id and |s| ≤ δ} = β + δB.
In particular, we see by using (H) that for all large enough n
τ
B, where B = {z : |z| ≤ 1} .
Dn (τ ) = β + √
nhn
(2.25)
Moreover, it says that x = y (θ) + su (θ), θ ∈ Id and |s| ≤ δ is a well–defined parameterization of β + δB, and it validates the change of variables in the integrals below.
We now turn to the proof of (2.24). Let
ρn x, x + th1/d
= Cov π n (x) , πn x + th1/d
n
n
h i
x−X
x−X
h−1
E
K
+
t
K
1d
1d
n
hn
hn
=r
.
x−X
x−X
−1
2
−1
2
hn EK
+t
hn EK
h1d
h1d
n
n
s u(θ)
It is routine to show that for each θ ∈ Id , |s| ≤ τ , x = y (θ) + √
and t ∈ B we have as
nhn
n → ∞ that
s u (θ)
s u (θ)
1/d
1/d
, y (θ) + √
→ ρ(t),
+ thn
ρn (x, x + thn ) = ρn y (θ) + √
nhn
nhn
where
ρ(t) :=
R
IRd
K (u) K (u + t) du
R
.
K 2 (u) du
IRd
(See Detail 4 in Appendix.) Notice that ρ(t) = ρ(−t). One can then infer by the central
limit theorem that for each θ ∈ Id , |s| ≤ τ and t ∈ B,
π n (x) , π n x + th1/d
n
s u (θ)
s u (θ)
1/d
= π n y (θ) + √
, πn y (θ) + √
+ thn
nhn
nhn
p
(2.26)
→d Z1 , ρ(t)Z1 + 1 − ρ2 (t)Z2 ,
where Z1 and Z2 are independent standard normal random variables.
We also get by using our assumptions and straightforward Taylor expansions that for
u
|s| ≤ τ , u = s u (θ), x = y (θ) + √nh
and θ ∈ Id
n
√nh c − Ef y (θ) + √ u
n
n
u
nhn
r
=
cn (x) = cn y (θ) + √
u
nhn
y(θ)+ √nh −X
1
n
2
EK
1/d
hn
hn
15
→ −p
n→∞
and similarly since
q
f ′ (y (θ)) · u
s |f ′ (y (θ))|
=− √
=: c(s, θ, 0)
c kKk2
f (y (θ)) kKk2
1+2/d
→ γ,
nhn
cn x + th1/d
n
We also have
(2.27)
f ′ (y (θ)) · (u + t)
→ − p
n→∞
f (y (θ)) kKk2
s |f ′ (y (θ))| γf ′ (y (θ)) · t
− √
=: c(s, θ, γt).
= − √
c kKk2
c kKk2
√
nhn (c − f (x))
r
→ c(s, θ, 0)
n→∞
x−X
1
2
EK
1/d
hn
hn
and
√
1/d
nhn c − f x + thn
r
→ c(s, θ, γt).
n→∞
x−X
1
2
EK
1/d
hn
hn
(See Detail 5 in the Appendix.) Hence by (2.26) and (G) for y (θ) ∈ β,
u
(n hn )1/γg gn (x, t) = (n hn )1/γg gn y(θ) + √
,t
nhn
u u
1/d
√
√
= IEn y (θ) +
IEn y (θ) +
+ thn
nh
nh
n n
u
u
1/d
, ∆n y (θ) + √
+ thn
× cov ∆n y (θ) + √
nhn
nhn
u
u
1/d
1/γg
g y(θ) + √
+ t hn
× (nhn )
g y(θ) + √
nhn
nhn
→ cov |I {Z1 ≥ c(s, θ, 0)} − I {0 ≥ c(s, θ, 0)}| ,
n→∞
n
o
p
2
I ρ(t)Z1 + 1 − ρ (t)Z2 ≥ c(s, θ, γt) − I {0 ≥ c(s, θ, γt)}
su(θ) + γt 1
1
(1)
(1)
γg
γg
× |s| g (θ, u(θ)) |su(θ) + γt| g
θ,
|su(θ) + γt|
=: Γ (θ, s, t) .
(2.28)
Using the change of variables
sud (θ)
su1 (θ)
, . . . , xd = yd (θ) + √
,
x1 = y1 (θ) + √
nhn
nhn
16
(2.29)
we get
Z
Dn (τ )
Z
gn (x, t) dt dx =
B
Z
τ
−τ
where
su (θ)
, t | Jn (θ, s)| dt dθ ds,
gn y (θ) + √
nhn
B
Z Z
Id
|Jn (θ, s)| = det ∂y(θ)
∂θ1
∂y(θ)
∂θd−1
+
√1
nhn
·
·
·
∂u(θ)
∂θ1
∂u(θ)
1
+ √nh
n ∂θd−1
u(θ)
√
nhn
Clearly, with ι (θ) as in (2.12),
.
(2.30)
p
nhn |Jn (θ, s)| → ι (θ) .
(2.31)
√
Under our assumptions we have nhn |Jn (θ, s)| is uniformly bounded in n ≥ 1 and
1
(θ, s) ∈ Id × [−τ, τ ]. Also by using (G) we see that for all n large enough (nhn ) γg gn is
1
√
bounded on Id × B. Thus since (nhn ) γg gn and nhn |Jn | are eventually bounded on the
appropriate domains, and (2.28) and (2.31) hold, we get by the dominated convergence
theorem and (G) that
Z
Z
1
+ 1
2 γg
gn (x, t) dt dx
(nhn )
Dn (τ ) B
Z τZ Z
1
1
su
(θ)
+
, t | Jn (θ, s)| dt dθ ds
gn y (θ) + √
= (nhn ) 2 γg
nhn
−τ Id B
Z τZ Z
Γ (θ, s, t) ι (θ) dt dθ ds, as n → ∞.
(2.32)
→
−τ
Id
B
We claim that as τ → ∞ we have
Z τZ Z
Γ (θ, s, t) ι (θ) dt dθ ds
−τ Id B
Z ∞Z Z
Γ (θ, s, t) ι (θ) dt dθ ds =: σ 2 < ∞
→
−∞
and
lim lim sup (nhn )
τ →∞
(2.33)
B
Id
1
+ 1
2 γg
n→∞
Z
C (τ )∩E
Dn
n
Z
gn (x, t) dt dx = 0,
(2.34)
B
which in light of (2.32) implies that the limit in (2.24) is equal to σ 2 as defined in (2.33).
First we show (2.33). Consider
+
Γ (τ ) :=
Z
τ
0
Z Z
Id
Γ (θ, s, t) ι (θ) dt dθ ds.
B
17
We shall show existence and finiteness of the limit limτ →∞ Γ+ (τ ). Similar arguments apply
to
Z Z Z
0
lim Γ− (τ ) := lim
τ →∞
τ →∞
−τ
Observe that when s ≥ 0,
Id
B
Γ (θ, s, t) ι (θ) dt dθ ds < ∞.
|I {Z1 ≥ c (s, θ, 0)} − I {0 ≥ c (s, θ, 0)}| = I {Z1 < c (s, θ, 0)}
and with Φ denoting the cdf of a standard normal distribution we write
o
E I Z1 < c (s, θ, 0) = Φ c (s, θ, 0) .
Hence by taking into account (2.28), the assumed finiteness of sup sup g (1) (θ, z), and using
|z|=1
the elementary inequality
θ
|cov (X, Y ) | ≤ 2E |X| , whenever |Y | ≤ 1,
we get for all s ≥ 0 and some c1 > 0 that
1
1
1
γg
γg
γg
|s| + γ
Φ c (s, θ, 0) .
|Γ(θ, s, t)| ≤ c1 |s|
(2.35)
The lower bound (2.11) implies the existence of a constant c̃ > 0 such that
s |f ′ (y(θ))|
≤ Φ(−c̃ s).
Φ c (s, θ, 0) = Φ − √
c kKk2
Together with (2.35) and (2.13) it follows that for some c > 0 we have
Z τ
1
1
1
+
γg
γg
γg
|s| + γ
( 1 − Φ(c̃ s) ) ds < ∞.
lim Γ (τ ) ≤ c lim
|s|
τ →∞
τ →∞
0
Similarly,
+
+
|Γ (∞) − Γ (τ )| ≤ c
Z
∞
τ
1
|s| γg
This validates claim (2.33).
1
1
|s| γg + γ γg
(1 − Φ(c̃ s) ) ds → 0, as τ → ∞.
Next we turn to the proof of (2.34). Recall the definition of gn (x, t) in (2.23). Notice that
for all n large enough, we have
Z
Z
1
+ 1
2 γg
gn (x, t) dt dx
(nhn )
C (τ )∩E
B
Dn
n
Z
Z
p
cov ∆n (x) , ∆n x + th1/d
≤ nhn
n
C (τ )∩E
Dn
n
B
1
≤
p
nhn
Z
C (τ )∩E
Dn
n
× (nhn ) γg g(x) g(x + th1/d
n ) dt dx
Z 1/2
Var (∆n (x))
(2.36)
B
1
× (nhn ) γg g(x) g(x + th1/d
n ) dt dx.
18
(2.37)
1/d
1/d
The last inequality uses the fact that ∆n (x + thn )) ≤ 1 and thus Var(∆n (x + thn )) ≤ 1.
Applying the inequality
|I {a ≥ b} − I {0 ≥ b}| ≤ I {|a| ≥ |b|} ,
(2.38)
we obtain that
Var (∆n (x)) + Var (|I {πn (x) ≥ c} − I {f (x) ≥ c}|) ,
≤ E (I {|πn (x) − f (x)| ≥ |c − f (x)|})2 = P {|πn (x) − f (x)| ≥ |c − f (x)|} .
Thus we get that
p
nhn
≤
Z
C (τ )∩E
Dn
n
p
nhn
Z
Z p
1
Var(∆n (x))(nhn ) γg g(x) g(x + th1/d
n ) dt dx
B
C (τ )∩E
Dn
n
Z p
B
P {|πn (x) − f (x)| ≥ |c − f (x)|}
1
× (nhn ) γg g(x) g(x + th1/d
n ) dt dx.
(2.39)
We must bound the probability inside the integral. For this purpose we need a lemma.
Lemma 2.1 Let Y, Y1 , Y2, . . . be i.i.d. with mean µ and bounded by 0 < M < ∞. Independent of Y1 , Y2 , . . . let Nn be a Poisson random variable with mean n. For any v ≥
2 (e30)2 EY 2 and with d = e30M we have for all λ > 0,
(N
)
n
X
λ2 /2
.
(2.40)
P
Yi − nµ ≥ λ ≤ exp −
nv + dλ
i=1
Proof. Let N be a Poisson random variable with mean 1 independent of Y1 , Y2, . . . and
let
N
X
ω=
Yi .
i=1
Clearly if ω1 , . . . , ωn are i.i.d. ω, then
Nn
X
i=1
Yi − nµ =d
n
X
i=1
(ωi − µ) .
Our aim is to use Bernstein’s inequality to prove (2.40). Notice that for any integer r ≥ 2,
r
N
X
(2.41)
Yi − µ .
E |ω − µ|r = E i=1
At this point we need the following fact, which is Lemma 2.3 of Giné, Mason and Zaitsev
(2003).
19
Fact 1. If, for each n ≥ 1, ζ, ζ1, ζ2 , . . . , ζn , . . . , are independent identically distributed
random variables, ζ0 = 0, and η is a Poisson random variable with mean γ > 0 and
independent of the variables {ζi }∞
i=1 , then, for every p ≥ 2,
p η
p
X
h
i
p/2
15p
ζi − γEζ ≤
E
max γEζ 2
, γE|ζ|p .
(2.42)
log p
i=0
Applying inequality (2.42) to (2.41) gives for r ≥ 2
r r
N
X
h
i
r/2
15r
r
max EY 2
, E|Y |r .
Yi − µ ≤
E |ω − µ| = E log r
i=1
Now
max
h
EY 2
r/2
i
h
i
r/2−1
, E|Y |r ≤ max EY 2 EY 2
, EY 2 M r−2
≤ EY 2 M r−2 .
Moreover, since log 2 ≥ 1/2, we get
E |ω − µ|r ≤ (30r)r EY 2 M r−2 .
By Stirling’s formula (see page 864 of Shorack and Wellner (1986))
r r ≤ er r!.
Thus
E |ω − µ|r ≤ (e30r)r EY 2 M r−2 ≤
v
2 (e30)2 EY 2
r! (e30M)r−2 ≤ r! dr−2,
2
2
where v ≥ 2 (e30)2 EY 2 and d = e30M. Thus by Bernstein’s inequality (see page 855 of
Shorack and Wellner (1986)) we get (2.40). ⊔
⊓
i
Here is how Lemma 2.1 is used. Let Yi = K x−X
. Since by assumption both K and f
1/d
hn
are bounded, and K has support contained in the closed ball of radius 1/2 around zero,
we obtain that for some D0 > 0 and all n ≥ 1,
2
x−X
≤ D0 hn .
sup E K
1/d
hn
x∈IRd
√
Consider z ≥ a/ nhn for some a > 0. With this choice, and since by (A.1) of Detail 2 in
Appendix, supx |Efn (x) − f (x)| ≤ A1 h2/d ≤ 2 √anhn for n large enough by using (H), we
have
P {πn (x) − f (x) ≥ z} = P {πn (x) − Efn (x) ≥ z − (Efn (x) − f (x))}
n
1 a o
≤ P πn (x) − Efn (x) ≥ z − √
2 nhn
n
zo
≤ P πn (x) − Efn (x) ≥
2
20
√
for z ≥ a/ nhn and n large enough. We get then from inequality (2.40) that for n ≥ 1,
all z > 0 that for some constants D1 and D2
Nn
n
nX
x − X x − X zo
nhn z o
i
P πn (x) − Efn (x) ≥
=P
K
−
nEK
≥
1/d
1/d
2
2
hn
hn
i=1
!
(nhn )2 z 2
nhn z 2
≤ exp −
= exp −
.
D1 nhn + D2 nhn z
D1 + D2 z
√
We see that for some a > 0 for all z ≥ a/ nhn and n large enough,
p
nhn z 2
≥ nhn z.
D1 + D2 z
√
Observe that for 0 ≤ z ≤ a/ nhn ,
p
exp (a) exp − nhn z ≥ exp (a) exp (−a) = 1 ≥ P {πn (x) − f (x) ≥ z} .
Therefore by setting A = exp (a) we get for all large enough n ≥ 1, z > 0 and x,
p
P {πn (x) − f (x) ≥ z} ≤ A exp − nhn z .
In the same way, for all large enough n ≥ 1, z > 0 and x,
p
P {πn (x) − f (x) ≤ −z} ≤ A exp − nhn z .
Notice these inequalities imply that for all large enough n ≥ 1, z > 0 and x,
√nh |c − f (x)| p
√
n
P {|πn (x) − f (x)| ≥ |c − f (x)|} ≤ A exp −
.
2
(2.43)
Returning to the proof of (2.34), from (2.36), (2.37), (2.39) and (2.43) we get that for all
large enough n ≥ 1,
Z
Z
1
+ γ1
2
g
gn (x, t) dt dx
(nhn )
C (τ )∩E
Dn
n
≤
p
√ Z
nhn A
C (τ )∩E
Dn
n
which equals
where
ϕn (x) =
B
Z
e−
√
nhn |c−f (x)|
2
B
Z
p
λ(B) nhn
√
1
(nhn ) γg g(x) g(x + th1/d
n )dt dx,
ϕn (x) dx,
C (τ )∩E
Dn
n
√
1
nhn |c − f (x)|
(nhn ) γg g(x) g(x + th1/d
A exp −
n ).
2
21
√
Our assumptions imply that for some 0 < η < 1 for all 1 ≤ |s| ≤ ς log n and n large
√
nhn su (θ) ≥ η |s| .
c − f y (θ) + √
2 nhn (See Detail 3 in Appendix.) We get using the change of variables (2.29) that for all τ > 1,
Z
Z
Z
su (θ)
ϕn (x) dx =
| Jn (θ, s)| dθ ds,
ϕn y (θ) + √
√
C (τ )∩E
nhn
τ ≤|s|≤ς log n Id
Dn
n
which by our assumptions (refer to the remarks after equation (2.31) and assumption (G))
is for some constant C > 0, for all large enough τ and n
√
su(θ) !
Z
Z
√
c
−
f
y
(θ)
+
nh
n
2
C
nhn
γg
dθ ds
≤√
|s|
exp
−
2
nhn τ ≤|s|≤ς √log n Id
Z
Z
C
η |s|
≤√
exp −
dθ ds .
2
nhn τ ≤|s|≤ς √log n Id
Thus
Z
C (τ )∩E
Dn
n
4π d−1 C exp − ητ2
√
.
ϕn (x) dx ≤
η nhn
B
Z
(2.44)
Therefore after inserting all of the above bounds we get that
(nhn )
1
+ 1
2 γg
Z
C (τ )∩E
Dn
n
4 π d−1 C exp − ητ2
gn (x, t) dx dt ≤
,
η
B
Z
and hence we readily conclude that (2.34) holds.
Putting everything together we get that as n → ∞,
a2n Var (Πn (c)) → σ 2 ,
(2.45)
with σ 2 defined as in (2.33). For future use, we point out that we can infer by (2.24),
(2.34) and (2.45) that for all ε > 0 there exist a τ0 and an n0 ≥ 1 such that for all τ ≥ τ0
and n ≥ n0
2
σn (τ ) − σ 2 < ε,
(2.46)
where
σn2 (τ ) = Var
n
hn
1/4 p
nhn
γ1 Z
g
!
∆n (x) g (x) dx .
Dn (τ )
(2.47)
Our next goal is to de-Poissonize by applying the following version of a theorem in Beirlant
and Mason (1995).
22
Lemma 2.2 Let N1,n and N2,n be independent Poisson random variables with N1,n being
Poisson(nβn ) and N2,n being Poisson(n (1 − βn )) where βn ∈ (0, 1). Denote Nn = N1,n +
N2,n and set
N1,n − nβn
N2,n − n (1 − βn )
√
√
Un =
and Vn =
.
n
n
Let {Sn }∞
n=1 be a sequence of random variables such that
(i) for each n ≥ 1, the random vector (Sn , Un ) is independent of Vn ,
(ii) for some σ 2 < ∞,
Sn →d σZ, as n → ∞,
(iii) βn → 0, as n → ∞.
Then, for all x,
P {Sn ≤ x | Nn = n} → P {σZ ≤ x} .
The proof follows along the same lines as Lemma 2.4 in Beirlant and Mason (1995).
Consult Detail 6 of the Appendix for a proof.
We shall now use this de-Poissonization lemma to complete the proof of our theorem.
Recall the definitions of Ln (c) and Πn (c) in (2.18) and (2.19), respectively. Noting that
Dn (τ ) ⊂ En for all large enough n ≥ 1, we see that
an (Ln (c) − EΠn (c))
Z
= an
{| I{fn (x) ≥ c } − I{ f (x) ≥ c}| − E∆n (x)} g(x) dx
Dn (τ )
Z
{| I{fn (x) ≥ c } − I{ f (x) ≥ c}| − E∆n (x)} g(x) dx
+ an
=:
Dn (τ )C ∩En
Tn (τ ) + Rn (τ ) .
We can control the Rn (τ ) piece of this sum using the inequality, which follows from
Lemma 2.3 below,
E (Rn (τ ))2
Z
2
≤ 2an Var
|I {πn (x) ≥ c} − I {f (x) ≥ c}| g(x) dx
Dn (τ )C ∩En
Z
Z
21 + γ1
g
gn (x, t) dt dx,
= 2 nhn
C (τ )∩E
Dn
n
(2.48)
B
which goes to zero as n → ∞ and τ → ∞ as we proved in (2.34).
The needed inequality is a special case of the following result in Giné, Mason and Zaitsev
(2003). We say that a set D is a (commutative) semigroup if it has a commutative and
associative operation, in our case sum, with a zero element. If D is equipped with a
σ-algebra D for which the sum, + : (D × D, D ⊗ D) 7→ (D, D), is measurable, then we
say the (D, D) is a measurable semigroup.
23
Lemma 2.3 Let (D, D) be a measurable semigroup; let Y0 = 0 ∈ D and let Yi , i ∈ N, be
independent identically distributed D-valued random variables; for any given n ∈ N, let
η be a Poisson random variable with mean n independent of the sequence {Yi }; and let
B ∈ D be such that P {Y1 ∈ B} ≤ 1/2. If G : D 7→ R is non–negative and D-measurable,
then
!
!
η
n
X
X
EG
I(Yi ∈ B)Yi ≤ 2EG
I(Yi ∈ B)Yi .
(2.49)
i=0
i=0
Next we consider Tn (τ ). Observe that
(Sn (τ ) |Nn = n) =d
Tn (τ )
,
σn (τ )
(2.50)
where as above Nn denotes a Poisson random variable with mean n,
R
an Dn (τ ) {∆n (x) − E∆n (x)} g(x) dx
Sn (τ ) =
,
σn (τ )
and σn2 (τ ) is defined as in (2.47). We shall apply Lemma 2.2 to Sn (τ ) with
N1,n =
Nn
X
√
1 Xi ∈ Dn τ + nhn ,
i=1
N2,n
Nn
X
√
1 Xi ∈
/ Dn τ + nhn
=
i=1
and
√
βn = P Xi ∈ Dn τ + nhn .
We first need to verify that as n → ∞
R
an Dn (τ ) {∆n (x) − E∆n (x)} g(x) dx
→d Z.
Sn (τ ) =
σn (τ )
To show this we require the following special case of Theorem 1 of Shergin (1990).
Fact 2. (Shergin (1990)) Let {Xi,n : i ∈ Zd } denote a triangular array of mean zero
m-dependent random fields, and let J n ⊂ Zd be such that
P
(i) Var
X
→ 1, as n → ∞, and
i,n
i∈Jn
P
(ii) for some 2 < s < 3, i∈Jn E|Xi,n |s → 0, as n → ∞.
Then
X
i∈Jn
Xi,n →d Z,
where Z is a standard normal random variable.
24
We use Shergin’s result as follows. Under our regularity conditions, for each τ > 0 there
exist positive constants d1 , . . . , d5 such that for all large enough n,
d1
|Dn (τ )| ≤ √
;
nhn
(2.51)
d2 ≤ σn (τ ) ≤ d3 .
(2.52)
Clearly (2.52) follows from (2.46), and (2.51) is shown in Detail 7 in the Appendix. There
it is also shown that for each such integer n ≥ 1 there exists a partition {Ri , i ∈ Jn ⊂ Zd }
of Dn (τ ) such that for each i ∈ Jn
|Ri | ≤ d4 hn ,
where
Define
Xi,n =
an
R
(2.53)
d5
.
|Jn | =: mn ≤ p
nh3n
Ri
(2.54)
{∆n (x) − E∆n (x)} g(x) dx
σn (τ )
, i ∈ Jn .
It follows from Detail 7 in the Appendix that Xi,n can be extended to a 1−dependent
random field on Zd .
Notice that by (G) there exists a constant A > 0 such that for all x ∈ Dn (τ ),
1
p
−γ
g .
nhn
|g(x)| ≤ A
Recalling that an = an,G =
n
hn
14 √
nhn
γ1
g
we thus obtain for all for i ∈ Jn ,
√
an 2A |Ri | A nhn
|Xi,n | ≤
σn (τ )
2d4 Ahn
≤
d2
Therefore,
X
i∈Jn
5
2
E|Xi,n | ≤ mn
n
hn
1/4
=
1/4
2d4
nh3n
d2
− γ1
g
1/4
2Ad4
nh3n
.
d2
52
≤ d5
2d4
d2
25
This bound when combined with (H) implies that as n → ∞,
X
5
E|Xi,n | 2 → 0,
i∈Jn
25
nh3n
1/8
.
which by the Shergin fact (with s = 5/2) yields
X
Xi,n →d Z.
Sn (τ ) =
i∈Jn
Thus, using (2.50) and βn = P {Xi ∈ Dn (τ +
√
nhn )} → 0, Lemma 2.2 implies that
Tn (τ )
→d Z.
σn (τ )
(2.55)
Putting everything together we get from (2.48) that
lim lim sup E (Rn (τ ))2 = 0
τ →∞
n→∞
and from (2.46) that
lim lim sup σn2 (τ ) − σ 2 = 0,
τ →∞
n→∞
which in combination with (2.55) implies that
an (Ln (c) − EΠn (c)) →d σZ,
where
2
σ =
Z
∞
−∞
Z Z
Id
(2.56)
Γ (θ, s, t) ι (θ) dtdθds,
(2.57)
B
with Γ(θ, s, t) as defined in (2.28). Since by Lemma 2.3
E (an (Ln (c) − EΠn (c)))2 ≤ 2 Var (an Πn (c))
and
we can conclude that
and thus
Var (an Πn (c)) → σ 2 < ∞,
an (ELn (c) − EΠn (c)) → 0
an (Ln (c) − ELn (c)) →d σZ.
This gives that
Z
an G (Cn (c) ∆ C (c)) −
En
E |I {fn (x) ≥ c} − I {f (x) ≥ c}| dG(x)
→d σZ,
which is (2.14). In light of (2.58) and keeping mind that
Z
EG (Cn (c) ∆ C (c)) =
E |I {fn (x) ≥ c} − I {f (x) ≥ c}| g(x) dx,
IRd
26
(2.58)
we see that to complete the proof of (2.1) it remains to show that
Z
an E
|I {fn (x) ≥ c} − I {f (x) ≥ c}| g(x) dx → 0.
(2.59)
c
En
We shall begin by bounding
E |I {fn (x) ≥ c} − I {f (x) ≥ c}| , x ∈ Enc .
Applying inequality (2.38) with a = fn (x) − f (x) and b = c − f (x) we have for x ∈ Enc ,
E |I {fn (x) ≥ c} − I {f (x) ≥ c}|
≤ EI {|fn (x) − f (x)| ≥ |c − f (x)|}
= P {|fn (x) − f (x)| ≥ |c − f (x)|}
≤ P {|fn (x) − Efn (x)| ≥ |c − f (x)| − |f (x) − Efn (x)|} ,
which by recalling the definition of En in (2.17) is
)
(
ς (log n)1/2
− |f (x) − Efn (x)| .
≤ P |fn (x) − Efn (x)| ≥
(nhn )1/2
This last bound is, in turn, by (A.1)
(
≤P
ς (log n)1/2
− A1 h2/d
|fn (x) − Efn (x)| ≥
n
1/2
(nhn )
)
.
Thus for all large enough n uniformly in x ∈ Enc ,
E |I {fn (x) ≥ c} − I {f (x) ≥ c}|
q



1+2/d 1/d
1/2


hn
A1 nhn
ς (log n) 

√
1
−
≤ P |fn (x) − Efn (x)| ≥
.


(nhn )1/2
ς
log n
which by (H) is for all large enough n uniformly in x ∈ Enc ,
)
(
ς (log n)1/2
=: pn (x) .
≤ P |fn (x) − Efn (x)| ≥
2(nhn )1/2
We shall bound pn (x) using Bernstein’s inequality on the i.i.d. sum
n 1 X
x − Xi
x − Xi
fn (x) − Efn (x) =
K
− EK
,
1/d
1/d
nhn i=1
hn
hn
27
Notice that for each i = 1, . . . , n,
Z
x−y
1
x − Xi
1
2
K
Var
K
≤
f (y) dy
1/d
1/d
nhn
(nhn )2 IRd
hn
hn
1
= 2
n hn
and by (K.i),
Z
2
IRd
K (u) f x −
h1/d
n u
kKk22 M
du ≤
n2 hn
K x − Xi − EK x − Xi ≤ 2κ .
nhn
1/d
1/d
hn
hn
Therefore by Bernstein’s inequality (i.e. page 855 of Shorack and Wellner (1986)),


ς 2 log n
− 4nhn

pn (x) ≤ 2 exp 
2
1/2
kKk2 M
κ
2 ς(log n)
+
nhn
3 2(nhn )1/2 nhn
1
nhn
=

2
n
− ς log
4
2 exp 
n)1/2
kKk22 M + κς(log
3(nhn )1/2

.
√
Hence by (H) and keeping in mind that ς > 2 in (2.17), we get for some constant a > 0
that for all large enough n, uniformly in x ∈ Enc , we have the bound
pn (x) ≤ 2 exp (−ςa log n) .
(2.60)
We shall show below that λ (Cn (c) ∆ C (c)) ≤ m < ∞ for some 0 < m < ∞. Assuming
this to be true, we have the following (similar lines of arguments are used in Rigollet and
Vert, 2008)
Z
E
|I {fn (x) ≥ c} − I {f (x) ≥ c}| g(x) dx
c
En
Z
= E
|I {fn (x) ≥ c} − I {f (x) ≥ c}| g(x) dx
c ∩(C (c)∆ C(c))
En
n
Z
≤
sup E
|I {fn (x) ≥ c} − I {f (x) ≥ c} g(x)|dx
c ∩A
A:λ(A)≤m
En
Z
E |I {fn (x) ≥ c} − I {f (x) ≥ c} | g(x) dx
≤
sup
A:λ(A)≤m
c ∩A
En
≤ m sup g(x) sup E |I {fn (x) ≥ c} − I {f (x) ≥ c} |
x
c
x∈En
≤ m sup g(x) sup pn (x) .
x
c
x∈En
28
With c0 = m supx g(x) and (2.60) this gives the bound
Z
an E
|I {fn (x) ≥ c} − I {f (x) ≥ c}| g(x) dx
c
En
≤ 2 c0 an exp (−ςa log n) .
Clearly by (H), we see that for large enough ς > 0
an exp (−ςa log n) → 0
and thus (2.59) follows. It remains to verify that there exists 0 < m < ∞ with
λ (Cn (c) ∆ C (c)) ≤ m.
Notice that
1≥
Z
and
1≥
Cn (c)
Z
C(c)
fn (x) dx ≥ c λ (Cn (c))
f (x) dx ≥ c λ (C (c)) .
Thus
λ (Cn (c) ∆ C (c)) ≤ 2/c =: m.
We see now that the proof of the theorem in the case k = 1 and d ≥ 2 is complete.
The proof for the case k ≥ 2 goes through by an obvious extension of the argument used
in the case k = 1. On account of (B.ii) we can write for large enough n
Z
∆n (x) dG(x) =
En
n Z
X
j=1
∆n (x) dG(x),
Ej,n
where the sets Ej,n , j = 1, . . . , k, are disjoint and constructed from the βj just as En was
formed from the boundary set β in the proof for the case k = 1. Therefore by reason
of the Poissonization, the summands are independent. Hence the asymptotic normality
readily follows as before, where the limiting variance in (2.1) becomes
σ2 =
k
X
i=1
where each σj2 is formed just like (2.57).
29
c2j σj2 ,
(2.61)
2.6
Proof of the theorem in the case d = 1
The case d = 1 follows along very similar ideas as presented above in the case d ≥ 2 and is
in fact somewhat simpler than the case d ≥ 2. We therefore skip all the details and only
point out that by assumption (B.ii) the boundary set β = {x ∈ IR : f (x) = c } consists
of k points zi , i = 1, . . . , k. Therefore, the integral over θ in the definition of σ 2 in (2.57)
has to be replaced by a sum, leading to
2
σ :=
k
X
(1)
g (zi )
i=1
where
2
Z
∞
−∞
Z
1
−1
2
Γ (i, s, t) |s| γg dt ds,
(2.62)
n
n
sf ′ (zi ) o
sf ′ (zi ) o
Γ (i, s, t) = cov I Z1 ≥ − √
− I 0 ≥ −√
,
c kKk2
c kKk2
n
n
p
s f ′ (zi ) o
s f ′ (zi ) o
− I 0 ≥ −√
.
I ρ(t)Z1 + 1 − ρ2 (t)Z2 ≥ − √
c kKk2
c kKk2
We can drop the absolute value sign on f ′ (zi ) in our definition of Γ (i, s, t) for i = 1, . . . , k
and thus σ 2 , since ρ(t) = ρ(−t).
2.7
Remarks on the variance and its estimation
2
Clearly the variance σG
that appears in our Theorem does not have a nice closed form
2
and in many situations is not feasible to calculate. Therefore in applications σG
will
very likely have to be estimated either by simulation or from the data itself. In the
latter case, an obvious suggestion is to apply the bootstrap and another is to use the
jackknife. A separate investigation is required to verify that these methods work in this
setup. (Similarly we may also need to estimate EG(Cn (c) ∆ C (c)).)
2
Here is a quick and dirty way to estimate σG
. Let X1 , . . . , Xn be i.i.d. f . Choose a
sequence of integers 1 ≤ mn ≤ n, such that mn → ∞ and n/mn → ∞. Set ςn = [n/mn ]
and take a random sample of the data X1 , . . . , Xn of size mn ςn and then randomly divide
this sample into ςn disjoint samples of size mn . Let
ξi = dG (C(i)
mn (c)∆ C(c)) for i = 1, . . . , ςn ,
(i)
2
, the sample
where Cmn (c) is formed from sample i. We propose as our estimator of σG
1/4
variance of hmmn
ξi , i = 1, . . . , ςn ,
n
mn
hmn
1/2 X
ςn
i=1
ξi − ξ
2
/ (ςn − 1) .
Under suitable regularity conditions it is routine to show that this is a consistent estimator
2
of σG
, again the details are beyond the scope of this paper.
30
2
The variance σG
under a bivariate normal model. In order to obtain a better understanding about the expression of the variance we consider it in the following simple bivariate
normal example. Assume
2
x + y2
1
, (x, y) ∈ IR2 .
exp −
f (x, y) =
2π
2
A special case of our Theorem says that whenever
nh2n → 0 and nhn / log n → ∞,
(here γ = 0), then
n
hn
1/4
{λ(Cn (c)∆ C(c)) − Eλ(Cn (c)∆C(c)) } →d σλ Z.
(2.63)
(2.64)
We shall calculate σλ in this case. We get that
f ′ (x, y) = − (x, y) f (x, y) .
Notice for any 0 < c <
Setting
1
,
2π
β = (x, y) : x2 + y 2 = −2 log (c2π) .
r (c) =
p
−2 log (c2π),
we see that β is the circle with center 0 and radius r (c). Choosing the obvious differmorphism,
y (θ) = (r (c) cos θ, r (c) sin θ) , for θ ∈ [0, 2π] ,
we get that for θ ∈ [0, 2π] ,
u (θ) = (− cos θ, − sin θ) , y ′ (θ) = (r (c) sin θ, −r (c) cos θ) ,
and
r (c) sin θ −r (c) cos θ
ι (θ) = det − cos θ
− sin θ
Here g = 1 and we are assuming γ = 0. We get that
= r (c) .
√
s |f ′ (y (θ))|
sr (c) c
c (s, θ, γt) = c (s, θ, 0) = − √
=−
.
c kKk2
kKk2
Thus
√ √ c
c sr
(c)
sr
(c)
−I 0≥−
Γ (θ, s, t) =cov I Z1 ≥ −
,
kKk2
kKk2
√ √ p
c
c sr
(c)
sr
(c)
I ρ(t)Z1 + 1 − ρ2 (t)Z2 ≥ −
−
I
0
≥
−
.
kKk2
kKk2
31
This gives
σλ2
= r (c)
Set
Z
∞
−∞
2π
Z
0
Z
Γ (θ, s, t) dt dθ ds.
B
Υ (θ, u, t) =cov |I {Z1 ≥ −u} − I {0 ≥ −u}| ,
n
o
p
2
I ρ(t)Z1 + 1 − ρ (t)Z2 ≥ −u − I {0 ≥ −u} .
We see then by the change of variables u =
σλ2
kKk
= √ 2
c
Z
∞
−∞
Z
0
2π
√
sr(c) c
kKk2
Z
that
Υ (θ, u, t) dt dθ du.
B
For comparison, Theorem 2.1 of Cadre (2006) says that if
nhn / (log n)16 → ∞ and nh3n (log n)2 → 0
then
p
nhn λ(Cn (c) ∆ C(c)) →P kKk2
r
2c
π
Z
β
dH
= 2 kKk2
k∇f k
(2.65)
r
2π
.
c
(2.66)
The measure dH denotes the Hausdorff measure on β. In this case H (β) is the circumference of β.
1/4
√
1/4 1/4
= (nh2n ) hn → 0, (2.64) and (2.66) imply that whenObserve that since nhn hnn
ever (2.63) and (2.65) hold, we get
r
p
2π
.
nhn Eλ(Cn (c)∆C(c)) → 2 kKk2
c
√
(Notice that the choice hn = 1/ n log n satisfies both (2.63) and (2.65).)
APPENDIX
In this appendix we supply a number of technical details that are needed in our proofs.
Detail 1. The proof of the identity in (2.4) is as follows:
32
Z
∞
0
=
|f (x) − c| p−1 dx dc
Cn (c)∆ C(c)
Z
IRd
=
Z
Z
∞
0
I{(x, c) : f (x) < c ≤ fn (x)}
Z
{x:f (x)≤fn (x)}
+
Z
p−1
dc dx
+ I{ (x, c) : fn (x) < c ≤ f (x)} f (x) − c
∞
0
Z
p−1
I{c : f (x) < c ≤ fn (x)} f (x) − c
dc dx
{x:fn (x)≤f (x)}
=
Z
{x:f (x)≤fn (x)}
Z
fn (x)
f (x)
+
Z
∞
0
p−1
dc dx
I{c : fn (x) < c ≤ f (x)} f (x) − c
f (x) − c p−1 dc dx
Z
{x:fn (x)≤f (x)}
=
Z
{x:f (x)≤fn (x)}
Z
fn (x)−f (x)
c p−1 dc dx
Z
{x:fn (x)≤f (x)}
=
{x:f (x)≤fn (x)}
1
=
p
Z
IRd
fn (x)
f (x) − c p−1 dc dx
0
+
Z
f (x)
Z
Z
f (x)−fn (x)
c p−1 dc dx
0
p
1
fn (x) − f (x) dx +
p
Z
{x:fn (x)≤f (x)}
p
1
f (x) − fn (x) dx
p
fn (x) − f (x) p dx.
Detail 2. Notice that (K.i), (K.ii), (2.7) and (2.8) imply after a change of variables and
1/d
an application of Taylor’s formula for f (x + hn v) − f (x) that for some constant A1 > 0,
sup h−2/d
sup |Efn (x) − f (x)| ≤ A1 .
n
n≥2
Thus we see by (H) that there exists a γ2 > 0 such that
p
p
1/d
sup nhn sup |Efn (x) − f (x)| ≤ A1 nhn h2/d
n ≤ γ2 hn .
n≥2
(A.1)
x∈IRd
x∈IRd
33
(A.2)
Detail 3. By (2.7) for all x, v ∈ IRd
d
X
∂f (x) vi 2
≤ 2A |v| .
f (x + v) − f (x) −
∂xi (A.3)
i=1
√
Thus we see that for 1 ≤ |s| ≤ ς log n, with n large enough,
p
p
su
(θ)
su (θ) nhn c − f y (θ) + √
nhn f (y (θ)) − f y (θ) + √
=
nhn nhn 2As2
− γ2 h1/d
≥ |sf ′ (y (θ)) · u (θ)| − √
n
nhn
2As2
− γ2 h1/d
≥ |s| inf |f ′ (y (θ))| − √
n ,
θ∈Id
nhn
√
2Aς log n
≥ |s| ρ0 − √
− γ2 h1/d
n ,
nhn
√
n
→ 0, we readily
where for the last inequality we use (2.11). Since (H) implies 2Aς√nhlog
n
see that there
√ is an 0 < η < 1 such that for all large enough n we have for all θ ∈ Id and
1 ≤ |s| ≤ ς log n,
√
su (θ) nhn c − f y (θ) + √
≥ η |s| .
(A.4)
2 nhn Detail 4. Let
ρn x, x + th1/d
= Cov π n (x) , πn x + th1/d
n
n
i
h x−X
x−X
+
t
K
h−1
E
K
n
h1d
h1d
n
n
=r
.
x−X
x−X
−1
2
−1
2
hn EK
+t
hn EK
h1d
h1d
n
Choose any θ ∈ Id , |s| ≤ τ , x = y (θ) +
1/d
y = x − uhn
s u(θ)
√
.
nhn
n
We see that with the change of variable
,
ρn (x, x +
th1/d
n )
s u (θ)
s u (θ)
1/d
= ρn y (θ) + √
, y (θ) + √
+ thn
nhn
nhn
R
K(u)K (u + t) f (yn (s, θ, u)) du
,
R
2
2
K (u)f (yn (s, θ, u)) du IRd K (u + t)f (yn (s, θ, u))du
IRd
= qR
IRd
1/d
s u(θ)
where yn (s, θ, u) = y (θ) + √
− uhn , which by the bounded convergence theorem
nhn
converges as n → ∞ to
R
R
d K (u) K (u + t) du
d K (u) K (u + t) f (y (θ)) du
IR R
= IR R
=: ρ(t).
2
K (u) f (y (θ)) du
K 2 (u) du
IRd
IRd
34
Detail 5. We shall show that for |s| ≤ τ , u = s u (θ), x = y (θ) +
√
nhn (c − Efn (x))
cn (x) = r
1
2 x−X
EK
1/d
hn
√u
nhn
and θ ∈ Id that
hn
= cn y (θ) + √
→ −p
n→∞
u
nhn
√
=
u
nhn c − Efn y (θ) + √nh
n
r
y(θ)+ √ u −X nhn
1
EK 2
1/d
hn
hn
s |f ′ (y (θ))|
f ′ (y (θ)) · u
=− √
.
c kKk2
f (y (θ)) kKk2
First the argument in Detail 4 implies that
v
!
u
u
u1
−
X
y (θ) + √nh
√
n
t EK 2
→
c kKk2 .
1/d
n→∞
hn
hn
Next by (A.1) and (H)
u
nhn c − Efn y (θ) + √
=
nhn
p
u
nhn c − f y (θ) + √
+ o (1)
nhn
p
and clearly by (A.3)
p
u
nhn c − f y (θ) + √
→ −s |f ′ (y (θ))| .
n→∞
nhn
Detail 6. Here we provide the proof of Lemma 2.2. Consider the characteristic function
φn (t, u) := E exp(itSn + iuNn )
=
∞
X
eiuk E (exp(itSn )|Nn = k) P (Nn = k).
k=0
¿From this we see by Fourier inversion that the conditional characteristic function of Sn ,
given Nn = n, is
Z π
1
e−iun φn (t, u)du.
ψn (t) := E (exp(itSn )|Nn = n) =
2πP (Nn = n) −π
Applying Stirling’s formula, we obtain as n → ∞
2πP (Nn = n) = 2πe−n nn−1 /(n − 1)! ∼ (2π/n)1/2 ,
35
(A.5)
√
which, after changing variables from u to v/ n and using assumption (i), gives
Z π√n
E exp(itSn + ivUn )E exp(ivVn )dv.
ψn (t) = (2π)−1/2 (1 + o(1))
√
−π n
We shall deduce our proof from this expression for the conditional characteristic function
ψn (t), after we have collected some facts about the asymptotic behavior of the components
in ψn (t).
First, by assumption (iii) we have Un →P 0, which in combination with (ii) implies
E exp(itSn + ivUn ) → φ(t),
(A.6)
where
φ(t) = exp −σ 2 t2 /2 .
Next, by arguing as in the proof of Theorem 3 on pages 490-91 of Feller (1966) we see
that as n → ∞
Z π √n
Z
2
|E exp(ivVn ) − exp(−v /2)|dv +
exp(−v 2 /2)dv → 0,
(A.7)
√
√
−π n
which implies that
Z π√n
√
−π n
|v|>π n
|E exp(itSn + iuUn ) E exp(ivVn ) − exp(−v 2 /2) |dv → 0.
Here is the argument that shows (A.7). Now
it
it
φn (t) := E exp(itVn ) = exp nβn exp √
−1− √
.
n
n
Next using the elementary inequality
3
2
exp(ix) − 1 − ix + x ≤ |x|
2
6
gives
2
|φn (t)| ≤ exp −
3
βn |t|
βn t
+ √
2
6 n
√
which for all large enough n and |t| ≤ π n is
2
t
.
≤ exp −
3
!
Now since φn (t) → exp (−t2 /2) for all t, we infer by setting
√ gn (t) = φn (t) 1 |t| ≤ π n
36
,
(A.8)
and applying the Lebesgue dominated convergence theorem that
Z
gn (t) − exp −t2 /2 dt → 0, as n → ∞,
IR
which yields (A.7). To complete the proof of the lemma, we get by putting (A.6) and
(A.8) together with the Lebesgue dominated convergence theorem that
Z
1
ψn (t) → √
φ(t) exp(−v 2 /2)dv = exp(−σ 2 t2 /2).
2π IR
⊔
⊓
Detail 7. First we show (2.51). For any measurable set C ⊂ Id define
[
τ
B .
y (θ) + √
Dn (τ, C) =
nh
n
θ∈C
We see that, with |A| = λ (A) for a measurable subset A of IRd ,
Z
|Dn (τ, C)| =
dx,
Dn (τ,C)
which by the change of variables (2.29) is equal to
Z τZ
| Jn (θ, s)| dθ ds,
−τ
C
where Jn (θ, s) is defined as in (2.30). Since uniformly in (θ, s) ∈ Id × [−τ, τ ],
p
nhn |Jn (θ, s)| → ι (θ)
and there exists a constant 0 < b/2 < ∞ such that
0 ≤ ι (θ) ≤ b/2,
we see that for all large enough n ≥ 1 for all (θ, s) ∈ Id × [−τ, τ ],
0 ≤ |Jn (θ, s)| ≤ b.
This implies that,
p
nhn |Dn (τ, C)| → 2τ
and for all large enough n ≥ 1,
0≤
p
Z
ι (θ) dθ
C
nhn |Dn (τ, C)| < 2τ b |C| .
37
Since
Dn (τ ) = Dn (τ, Id ) ,
we get, in particular, that for all large enough n ≥ 1,
4τ bπ d−1
2τ b |Id |
= √
.
0 ≤ |Dn (τ )| ≤ √
nhn
nhn
(A.9)
This establishes (2.51).
Next, we construct the partition asserted after (2.54). Consider the regular grid given by
Ai = (xi1 , xi1 +1 ] × · · · × (xid , xid +1 ],
1/d
where i =(i1 , . . . , id ), i1 , . . . , id ∈ Z and xi = i hn
for i ∈ Z. Define
Ri = Ai ∩ Dn (τ ).
With Jn = {i : Ai ∩ Dn (τ ) 6= ∅ } we see that
{Ri : i ∈ Jn }
constitutes a partition of Dn (τ ) with
|Ri | ≤ hn .
Further we have
[
i∈Jn
S
Ri ⊂
[
i∈Jn
The inequality | i∈Jn Ai | ≤ |Dn (τ +
and for n large enough we have
Ai ⊂ Dn (τ +
√
√
d h1/d
n ).
1/d
d hn )| implies via (A.9) that with mn := |Jn |
5τ bπ d−1
,
mn hn ≤ √
nhn
giving
5τ bπ
mn ≤ p
,
nh3n
which is (2.54). Observing the fact that if B1 , . . . , Bk are disjoint sets such that for
Observing the fact that if B_1, ..., B_k are disjoint sets such that for 1 ≤ i ≠ j ≤ k
\[
\inf\left\{ |x - y| : x \in B_i,\; y \in B_j \right\} > h_n^{1/d}, \tag{A.10}
\]
then
\[
\int_{B_i} \Delta_n(x)\, g(x)\, dx, \quad i = 1, \ldots, k, \quad \text{are independent}, \tag{A.11}
\]
we can easily infer that
\[
X_{i,n} := \left( \frac{n}{h_n} \right)^{1/4} \frac{\sqrt{nh_n} \int_{R_i} \left\{ \Delta_n(x) - E\Delta_n(x) \right\} g(x)\, dx}{\gamma_1(g)\, \sigma_n(\tau)}
\]
constitutes a 1-dependent random field on Z^d.
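The independence mechanism behind (A.10)-(A.11) — a compactly supported kernel applied to a Poisson sample makes integrals over well-separated sets functions of disjoint, hence independent, pieces of the point process — can be seen in a small simulation. The sketch below is a d = 1 stand-in with hypothetical choices (Uniform(0,1) data, uniform kernel, g ≡ 1) and a plain Poissonized kernel estimator in place of ∆_n.

```python
# Simulation sketch of (A.10)-(A.11) in d = 1: integrals of a Poissonized
# kernel estimator over blocks separated by more than the kernel support are
# (empirically) uncorrelated, while touching blocks are not.  All model
# choices here are hypothetical stand-ins for those in the proof.
import numpy as np

rng = np.random.default_rng(1)
n, h, reps, grid = 500, 0.05, 2000, 100

def block_integral(xs, a, c):
    # integral over [a, c] of f_n(x) = (nh)^{-1} sum_i K((x - X_i)/h),
    # with K the uniform kernel on [-1/2, 1/2]
    t = np.linspace(a, c, grid)
    fn = (np.abs(t[:, None] - xs[None, :]) <= h / 2).sum(axis=1) / (n * h)
    return fn.mean() * (c - a)

far, near = [], []
for _ in range(reps):
    xs = rng.uniform(0.0, 1.0, rng.poisson(n))   # Poissonized sample size
    i1 = block_integral(xs, 0.1, 0.2)
    far.append((i1, block_integral(xs, 0.4, 0.5)))   # gap 0.2 >> h
    near.append((i1, block_integral(xs, 0.2, 0.3)))  # touching blocks
far, near = np.array(far), np.array(near)
print(np.corrcoef(far.T)[0, 1])   # ~ 0: separated blocks are independent
print(np.corrcoef(near.T)[0, 1])  # visibly positive: shared kernel mass
```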
Acknowledgement The authors would like to thank Gerard Biau for pointing out to
them the Tubular Neighborhood Theorem, and also the Associate Editor and the referee
for useful suggestions and a careful reading of the manuscript.
References
Baillo, A., Cuevas, A. and Justel, A. (2000). Set estimation and nonparametric detection. Canad. J. Statist. 28, 765-782.
Baillo, A., Cuesta-Albertos, J.A. and Cuevas, A. (2001). Convergence rates in nonparametric estimation of level sets. Stat. Probab. Lett. 53, 27-35.
Baillo, A. (2003). Total error in a plug-in estimator of level sets. Stat. Probab. Lett. 65, 411-417.
Baillo, A. and Cuevas, A. (2006). Image estimators based on marked bins. Statistics 40(4), 277-288.
Beirlant, J. and Mason, D. M. (1995). On the asymptotic normality of Lp-norms of empirical functionals. Math. Methods Statist. 4, 1-19.
Biau, G., Cadre, B. and Pelletier, B. (2008). Exact rates in density support estimation. J. Multivariate Anal. In press.
Bredon, G. E. (1993). Topology and Geometry. Graduate Texts in Mathematics 139. Springer-Verlag, New York.
Cadre, B. (2006). Kernel estimation of density level sets. J. Multivariate Anal. 97, 999-1023.
Cavalier, L. (1997). Nonparametric estimation of regression level sets. Statistics 29, 131-160.
Cuevas, A., Febrero, M. and Fraiman, R. (2000). Estimating the number of clusters. Canad. J. Statist. 28, 367-382.
Cuevas, A., Gonzalez-Manteiga, W. and Rodriguez-Casal, A. (2006). Plug-in estimation of general level sets. Australian and New Zealand J. Statist. 48(1), 7-19.
Desforges, M.J., Jacob, P.J. and Cooper, J.E. (1998). Application of probability density estimation to the detection of abnormal conditions in engineering. In: Proceedings of the Institution of Mechanical Engineers 212, 687-703.
Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist. 33, 1380-1403.
Fan, W., Miller, M., Stolfo, S.J., Lee, W. and Chan, P.K. (2001). Using artificial anomalies to detect unknown and known network intrusions. In: IEEE International Conference on Data Mining (ICDM'01), IEEE Computer Society, 123-130.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications, Volume II. Wiley, New York.
Gardner, A.B., Krieger, A.M., Vachtsevanos, G. and Litt, B. (2006). One-class novelty detection for seizure analysis from intracranial EEG. J. Machine Learning Research 7, 1025-1044.
Gayraud, G. and Rousseau, J. (2005). Rates of convergence for a Bayesian level set estimation. Scand. J. Statist. 32(4), 639-660.
Gerig, G., Jomier, M. and Chakos, M. (2001). VALMET: a new validation tool for assessing and improving 3D object segmentation. In: Niessen, W. and Viergever, M. (eds.), Medical Image Computing and Computer-Assisted Intervention MICCAI 2001, LNCS 2208, Springer, New York, 516-523.
Giné, E., Mason, D. M. and Zaitsev, A. (2003). The L1-norm density estimator process. Ann. Probab. 31, 719-768.
Goldenshluger, A. and Zeevi, A. (2004). The Hough transform estimator. Ann. Statist. 32(5), 1908-1932.
Hall, P. and Kang, K.-H. (2005). Bandwidth choice for nonparametric classification. Ann. Statist. 33, 284-306.
Hartigan, J.A. (1975). Clustering Algorithms. Wiley, New York.
Hartigan, J.A. (1987). Estimation of a convex density contour in two dimensions. J. Amer. Statist. Assoc. 82, 267-270.
Horvath, L. (1991). On Lp-norms of multivariate density estimators. Ann. Statist. 19(4), 1933-1949.
Huo, X. and Lu, J.-C. (2004). A network flow approach in finding maximum likelihood estimate of high concentration regions. Comput. Statist. Data Anal. 46, 33-56.
Jang, W. (2006). Nonparametric density estimation and clustering in astronomical sky survey. Comput. Statist. Data Anal. 50, 760-774.
Johnson, W. B., Schechtman, G. and Zinn, J. (1985). Best constants in moment inequalities for linear combinations of independent and exchangeable random variables. Ann. Probab. 13, 234-253.
King, S.P., King, D.M., Anuzis, P., Astley, K., Tarassenko, L., Hayton, P. and Utete, S. (2002). The use of novelty detection techniques for monitoring high-integrity plant. In: Proceedings of the 2002 International Conference on Control Applications, Vol. 1, Cancun, Mexico, 221-226.
Klemelä, J. (2004). Visualization of multivariate density estimates with level set trees. J. Comput. Graph. Statist. 13(3), 599-620.
Klemelä, J. (2006a). Visualization of multivariate density estimates with shape trees. J. Comput. Graph. Statist. 15(2), 372-397.
Klemelä, J. (2006b). Visualization of scales of multivariate density estimates. Unpublished manuscript.
Lang, S. (1997). Undergraduate Analysis. Second edition. Springer, New York.
Markou, M. and Singh, S. (2003). Novelty detection: a review - part 1: statistical approaches. Signal Processing 83, 2481-2497.
Molchanov, I.S. (1998). A limit theorem for solutions of inequalities. Scand. J. Statist. 25, 235-242.
Nairac, A., Townsend, N., Carr, R., King, S., Cowley, L. and Tarassenko, L. (1997). A system for the analysis of jet engine vibration data. Integrated Comput.-Aided Eng. 6, 53-65.
Polonik, W. (1995). Measuring mass concentration and estimating density contour clusters: an excess mass approach. Ann. Statist. 23, 855-881.
Prastawa, M., Bullitt, E., Ho, S. and Gerig, G. (2003). Robust estimation for brain tumor segmentation. In: MICCAI Proceedings, LNCS 2879, 530-537.
Rigollet, P. and Vert, R. (2008). Fast rates for plug-in estimators of density level sets. Available at arxiv.org/pdf/math/0611473
Roederer, M. and Hardy, R.R. (2001). Frequency difference gating: a multivariate method for identifying subsets that differ between samples. Cytometry 45, 56-64.
Rosenblatt, M. (1975). The quadratic measure of deviation of two-dimensional density estimates and a test of independence. Ann. Statist. 3(1), 1-14.
Scott, C.D. and Davenport, M. (2006). Regression level set estimation via cost-sensitive classification. To appear in IEEE Trans. Inform. Theory.
Scott, C.D. and Nowak, R.D. (2006). Learning minimum volume sets. J. Machine Learning Research 7, 665-704.
Shergin, V. V. (1990). The central limit theorem for finitely dependent random variables. In: Proc. 5th Vilnius Conf. Probability and Mathematical Statistics, Vol. II, Grigelionis, B., Prohorov, Y.V., Sazonov, V.V. and Statulevicius, V. (eds.), 424-431.
Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley, New York.
Steinwart, I., Hush, D. and Scovel, C. (2004). Density level detection is classification. Technical Report, Los Alamos National Laboratory.
Steinwart, I., Hush, D. and Scovel, C. (2005). A classification framework for anomaly detection. J. Machine Learning Research 6, 211-232.
Stuetzle, W. (2003). Estimating the cluster tree of a density by analyzing the minimum spanning tree of a sample. J. Classification 20(5), 25-47.
Theiler, J. and Cai, D.M. (2003). Resampling approach for anomaly detection in multispectral images. In: Proceedings of the SPIE 5093, 230-240.
Tsybakov, A.B. (1997). On nonparametric estimation of density level sets. Ann. Statist. 25, 948-969.
Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32, 135-166.
Vert, R. and Vert, J.-P. (2006). Consistency and convergence rates of one-class SVM and related algorithms. J. Machine Learning Research 7, 817-854.
Walther, G. (1997). Granulometric smoothing. Ann. Statist. 25, 2273-2299.
Wand, M. (2005). Statistical methods for flow cytometric data. Presentation; available at http://www.uow.edu.au/~mwand/talks.html
Willett, R.M. and Nowak, R.D. (2005). Level set estimation in medical imaging. To appear in Proceedings of the IEEE Statistical Signal Processing Workshop, June 17-20, 2005, Bordeaux, France.
Willett, R.M. and Nowak, R.D. (2006). Minimax optimal level set estimation. Submitted to IEEE Trans. Image Proc. Available at http://www.ee.duke.edu/~willett/
Yeung, D.W. and Chow, C. (2002). Parzen window network intrusion detectors. In: Proceedings of the 16th International Conference on Pattern Recognition, Quebec, Canada, Vol. 4, 385-388.
Statistics Program
University of Delaware
206 Townsend Hall
Newark, DE 19717
USA
davidm@udel.edu
Department of Statistics
University of California, Davis
One Shields Avenue
Davis, CA 95616-8705
USA
wpolonik@ucdavis.edu