A Locally Adaptive Transformation Method of
Boundary Correction in Kernel Density Estimation
R.J. Karunamuni^a and T. Alberts^b
^a Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1
^b Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
Abstract
Kernel smoothing methods are widely used in many research areas in statistics.
However, kernel estimators suffer from boundary effects when the support of the function to be estimated has finite endpoints. Boundary effects seriously affect the overall
performance of the estimator. In this article, we propose a new method of boundary
correction for univariate kernel density estimation. Our technique is based on a data
transformation that depends on the point of estimation. The proposed method possesses desirable properties such as local adaptivity and non-negativity. Furthermore,
unlike many other transformation methods available, the proposed estimator is easy
to implement. In a Monte Carlo study, the accuracy of the proposed estimator is numerically analyzed and compared with the existing methods of boundary correction.
We find that it performs well for most shapes of densities. The theory behind the
new methodology, along with the bias and variance of the proposed estimator, is
presented. Results of a data analysis are also given.
Keywords: density estimation; mean squared error; kernel estimation; transformation.
Short Title: Kernel Density Estimation
AMS Subject Classifications: Primary: 62G07; Secondary: 62G20
Corresponding Author: R.J.Karunamuni@ualberta.ca
1 Introduction
Nonparametric kernel density estimation is now popular and widely used, with great success, in
statistical applications. Kernel density estimates are commonly used to display the shape of
a data set without relying on a parametric model, and to reveal features such as skewness,
multimodality, and dispersion. Early results on kernel density estimation are due to
Rosenblatt (1956) and Parzen (1962). Since then, much research has been done in the area;
see, e.g., the monographs of Silverman (1986), and Wand and Jones (1995).
Let f denote a probability density function with support [0, ∞) and consider nonparametric estimation of f based on a random sample X_1, ..., X_n from f. The traditional
kernel estimator of f is given by
\[
f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x - X_i}{h}\Big), \qquad (1.1)
\]
where K is some chosen unimodal density function, symmetric about zero with support
traditionally on [−1, 1], and h is the bandwidth (h → 0 as n → ∞). The basic properties of
fn are well-known, and under some smoothness assumptions these include, for x ≥ 0,
\[
E f_n(x) = f(x) \int_{-1}^{c} K(t)\,dt - h f^{(1)}(x) \int_{-1}^{c} t K(t)\,dt + \frac{h^2}{2} f^{(2)}(x) \int_{-1}^{c} t^2 K(t)\,dt + o(h^2),
\]
\[
\operatorname{Var} f_n(x) = \frac{f(x)}{nh} \int_{-1}^{c} K^2(t)\,dt + o\Big(\frac{1}{nh}\Big),
\]
where c = min(x/h, 1). For x ≥ h, the so-called interior points, the bias of f_n(x) is of
order O(h²), whereas at boundary points, i.e. for x ∈ [0, h), f_n is not even consistent. In
nonparametric curve estimation problems this phenomenon is referred to as the “boundary
effects” of (1.1). To remove these boundary effects a variety of methods have been developed
in the literature. Some well-known methods are summarized below (a small numerical illustration of the boundary effect follows the list):
• The reflection method (Cline and Hart, 1991; Schuster, 1985; Silverman, 1986).
• The boundary kernel method (Gasser and Müller, 1979; Gasser et al., 1985; Jones,
1993; Müller, 1991; Zhang and Karunamuni, 2000).
• The transformation method (Marron and Ruppert, 1994; Ruppert and Cline, 1994;
Wand et al., 1991).
• The pseudo-data method (Cowling and Hall, 1996).
• The local linear method (Cheng et al., 1997; Cheng, 1997; Zhang and Karunamuni,
1998).
• Other methods (Zhang et al., 1999; Hall and Park, 2002).
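For readers who wish to see the boundary effect numerically, the following small sketch (ours, not part of any of the methods above; it assumes the Epanechnikov kernel and an exponential sample) evaluates (1.1) near the endpoint:

```python
# Illustration of the boundary effect in the traditional estimator (1.1).
# Assumptions: Epanechnikov kernel; sample from f(x) = 2e^{-2x} on [0, inf).
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=0.5, size=100_000)            # f(0) = 2

def K(t):
    # Epanechnikov kernel, supported on [-1, 1]
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def f_n(x, X, h):
    # traditional kernel estimator (1.1), vectorized over the points x
    return K((x[:, None] - X[None, :]) / h).mean(axis=1) / h

h = 0.1
print(f_n(np.array([0.0, 0.05, 0.2]), X, h))
# At x = 0 about half the kernel mass falls outside [0, inf), so the estimate
# comes out near f(0)/2 = 1 rather than f(0) = 2, while at the interior
# point x = 2h the estimator is consistent.
```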
The reflection method is specifically designed for the case f^{(1)}(0) = 0, where f^{(1)} denotes
the first derivative of f. The boundary kernel method is more general than the reflection
method in the sense that it can adapt to any shape of density. However, a drawback of this
method is that the estimates might be negative near the endpoints, especially when f(0) ≈ 0.
To correct this deficiency of boundary kernel methods some remedies have been proposed; see
Jones (1993), Jones and Foster (1996), Gasser and Müller (1979) and Zhang and Karunamuni
(1998). The local linear method is a special case of the boundary kernel method that is
thought of by some as a simple, hard-to-beat default approach, partly because of the “optimal”
theoretical properties (Cheng et al., 1997) of the boundary kernel implicit in local linear
fitting. The pseudo-data method of Cowling and Hall (1996) generates some extra data X_{(−i)}'s
using what they call the “three-point rule”, which are then combined with the original data
X_i's to form a kernel-type estimator. Marron and Ruppert's (1994) transformation method
consists of a three-step process. First, a transformation g is selected from a parametric
family so that the density of Y_i = g(X_i) has a first derivative that is approximately equal
to 0 at the boundaries of its support. Next, a kernel estimator with reflection is applied to
the Yi ’s. Finally, this estimator is converted by the change-of-variables formula to obtain an
estimate of f . Among other methods, two very promising recent ones are due to Zhang et
al. (1999) and Hall and Park (2002). The former method is a combination of the methods
of pseudo-data, transformation and reflection; whereas the latter method is based on what
they call an “empirical translation correction”.
The boundary kernel and related methods usually have low bias but the price for that is an
increase in variance. It has been observed that approaches involving only kernel modifications
without regard to f or data, such as the boundary kernel method, are always associated
with larger variance. Furthermore, the corresponding estimates tend to take negative values
near the boundary points. On the other hand, transformation-based boundary correction
estimates are non-negative and generally have low variance, possibly owing to that non-negativity.
Researchers have gradually come to recognize that this non-negativity property is important
in practical applications and that approaches producing non-negative estimators are well
worth exploring. In this article, we develop a new transformation approach for boundary
correction. Our method consists of a straightforward transformation of data. It is easy to
implement in practice compared to other existing transformation methods (compare with
Marron and Ruppert, 1994). A distinct feature of our method is that it is locally adaptive;
that is, the transformation depends on the point of estimation. Other desirable properties
of the proposed estimator are: (i) it is non-negative everywhere, and (ii) it reduces to the
traditional kernel estimator (1.1) at the interior points. In simulations, we compared the
proposed estimator with other existing methods of boundary correction and observed that
it performs quite well for most shapes of densities.
Section 2 contains the methodology and the main results. Sections 3 and 4 present simulation studies and a data analysis, respectively. Final comments are given in Section 5.
2 Locally Adaptive Transformation Estimator

2.1 Methodology
For convenience, we shall assume that the unknown probability density function f has support [0, ∞), and consider estimation of f based on a random sample X1 , . . . , Xn from f . Our
transformation idea is based on transforming the original data X1 , . . . , Xn to g(X1 ), . . . , g(Xn ),
where g is a non-negative, continuous and monotonically increasing function from [0, ∞) to
[0, ∞). Based on the transformed data, we now define for x = ch, c ≥ 0,
\[
\tilde f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x - g(X_i)}{h}\Big) \Big/ \int_{-1}^{c} K(t)\,dt, \qquad (2.1)
\]
where h is the bandwidth (again h → 0 as n → ∞) and K is a non-negative symmetric
kernel function with support on [−1, 1], satisfying ∫K(t)dt = 1, ∫tK(t)dt = 0, and
0 ≠ ∫t²K(t)dt < ∞. We can now state the following lemma, which exhibits the bias and variance of (2.1) under certain conditions on g and f. The proof is given in the Appendix.
Lemma 2.1. Let f̃_n be defined by (2.1). Assume that f^{(2)} and g^{(2)} exist and are continuous
on [0, ∞), where f^{(i)} and g^{(i)} denote the ith derivative of f and g, respectively, with
f^{(0)} = f, g^{(0)} = g. Further assume that g(0) = 0 and g^{(1)}(0) = 1. Then for x = ch, 0 ≤ c ≤ 1, we have
\[
\begin{aligned}
E \tilde f_n(x) - f(x) ={}& \frac{-h}{\int_{-1}^{c} K(t)\,dt} \left\{ f(0)\, g^{(2)}(0) \int_{-1}^{c} (c - t) K(t)\,dt + f^{(1)}(0) \int_{-1}^{c} t K(t)\,dt \right\} \\
&+ \frac{h^2}{2 \int_{-1}^{c} K(t)\,dt} \left\{ -f^{(2)}(0)\, c^2 \int_{-1}^{c} K(t)\,dt + \int_{-1}^{c} (t - c)^2 K(t)\,dt \right. \\
&\qquad \left. \times \big[ f^{(2)}(0) - f(0)\, g^{(3)}(0) - 3 g^{(2)}(0) \big( f^{(1)}(0) - f(0)\, g^{(2)}(0) \big) \big] \right\} + o(h^2), \qquad (2.2)
\end{aligned}
\]
and
\[
\operatorname{Var} \tilde f_n(x) = \frac{f(0)}{nh \big( \int_{-1}^{c} K(t)\,dt \big)^2} \int_{-1}^{c} K^2(t)\,dt + o\Big(\frac{1}{nh}\Big)
= \frac{f(x)}{nh \big( \int_{-1}^{c} K(t)\,dt \big)^2} \int_{-1}^{c} K^2(t)\,dt + o\Big(\frac{1}{nh}\Big). \qquad (2.3)
\]
The last equality follows since f(0) = f(x) − chf^{(1)}(x) + (ch)²f^{(2)}(x)/2 + o(h²) for x = ch
(see (A.3)). Note that the leading term of the variance of f̃_n is not affected by the
transformation g. When c = 1, Var f̃_n(x) = f(x)(nh)^{-1} ∫_{-1}^{1} K²(t)dt + o((nh)^{-1}), which is exactly
the expansion of the interior variance of the traditional estimator (1.1).
We shall use our transformation g so that the first order term in the bias expansion (2.2)
is zero. Assuming that f(0) > 0, it is enough to let
\[
g^{(2)}(0) = -f^{(1)}(0) \int_{-1}^{c} t K(t)\,dt \Big/ \Big( f(0) \int_{-1}^{c} (c - t) K(t)\,dt \Big). \qquad (2.4)
\]
Note that the right-hand side of (2.4) depends on c; that is, the transformation g depends
on the point of estimation inside the boundary region [0, h). We call this property local
adaptivity. In order to display this local dependence we shall denote g by g_c, 0 ≤ c ≤ 1,
in what follows. Combining (2.4) with the additional assumptions in Lemma 2.1, g_c should
now satisfy the following three conditions for each c, 0 ≤ c ≤ 1:
\[
\begin{aligned}
&\text{(i) } g_c : [0, \infty) \to [0, \infty),\ g_c \text{ is continuous, monotonically increasing, and } g_c^{(i)} \text{ exists, } i = 1, 2, 3; \\
&\text{(ii) } g_c(0) = 0,\ g_c^{(1)}(0) = 1; \\
&\text{(iii) } g_c^{(2)}(0) = -f^{(1)}(0) \int_{-1}^{c} t K(t)\,dt \Big/ \Big( f(0) \int_{-1}^{c} (c - t) K(t)\,dt \Big).
\end{aligned} \qquad (2.5)
\]
Functions satisfying conditions (2.5) can be easily constructed, with the simplest candidates
being polynomials. In order to satisfy property (iii) of (2.5) and maintain g_c as an increasing
function we require at least a cubic (for the case f^{(1)}(0) < 0). So for 0 ≤ c ≤ 1 we use the
transformation
\[
g_c(y) = y + \frac{d}{2}\, l_c\, y^2 + d^2 l_c^2\, y^3, \qquad (2.6)
\]
where
\[
l_c = -\int_{-1}^{c} t K(t)\,dt \Big/ \int_{-1}^{c} (c - t) K(t)\,dt \qquad (2.7)
\]
and
\[
d = f^{(1)}(0) / f(0). \qquad (2.8)
\]
When c = 1, g_c(y) = y. This means that f̃_n(x) defined by (2.1) with g_c of (2.6) reduces to the
usual kernel estimator f_n(x) defined by (1.1) at interior points; i.e., for x ≥ h, f̃_n(x) coincides
with f_n(x). Higher-order polynomials could be used to obtain the same properties, but they
are increasingly complicated to deal with and offer very little marginal advantage.
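For concreteness, here is a small numerical sketch (ours; it assumes the Epanechnikov kernel, for which the integrals in (2.7) have simple closed forms) of l_c and of the transformation g_c:

```python
# l_c of (2.7) and g_c of (2.6) for the Epanechnikov kernel K(t) = (3/4)(1 - t^2).
# Closed forms: int_{-1}^{c} K  = (3/4)(c - c^3/3 + 2/3),
#               int_{-1}^{c} tK = (3/4)(c^2/2 - c^4/4 - 1/4).
import numpy as np

def l_c(c):
    int_K = 0.75 * (c - c**3 / 3 + 2 / 3)
    int_tK = 0.75 * (c**2 / 2 - c**4 / 4 - 1 / 4)
    return -int_tK / (c * int_K - int_tK)      # since (c - t)K = cK - tK

def g_c(y, c, d):
    lc = l_c(c)
    return y + 0.5 * d * lc * y**2 + (d * lc) ** 2 * y**3   # equation (2.6)

print(l_c(0.0), l_c(0.5), l_c(1.0))            # 1.0, 0.2, 0.0
# l_1 = 0, so g_1(y) = y: interior points are left untransformed.
y = np.linspace(0.0, 2.0, 201)
print(np.all(np.diff(g_c(y, 0.0, -2.0)) > 0))  # the cubic keeps g_c increasing
```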
For g_c defined by (2.6), the bias term of (2.2) takes the form
\[
E \tilde f_n(x) - f(x) = \frac{h^2}{2 \int_{-1}^{c} K(t)\,dt} \left\{ f^{(2)}(0) \int_{-1}^{c} (t^2 - 2tc) K(t)\,dt + 6\, \frac{(f^{(1)}(0))^2}{f(0)} \int_{-1}^{c} (t - c)^2 K(t)\,dt\, (l_c^2 + l_c) \right\} + o(h^2), \qquad (2.9)
\]
for x = ch, 0 ≤ c ≤ 1. Note that when c = 1,
\[
E \tilde f_n(x) - f(x) = \frac{h^2}{2}\, f^{(2)}(x) \int_{-1}^{1} t^2 K(t)\,dt + o(h^2),
\]
which is exactly the expression of the interior bias of the traditional kernel estimator
(1.1), since f^{(2)}(x) = f^{(2)}(0) + o(1) for x = ch (see (A.5)).
2.2 Estimation of g_c
In order to implement the transformation (2.6) in practice, one must replace d = f^{(1)}(0)/f(0)
with a pilot estimate. This requires the use of another density estimation method, such
as a semiparametric, kernel, or nearest-neighbour method. Our experience is that the proposed
estimator of f given below is insensitive to the fine details of the pilot estimate of d, and
therefore any convenient estimate can be employed. A natural estimate would be a fixed
kernel estimate with bandwidth chosen optimally at x = 0 as in Zhang et al. (1999). Other
possible estimators of d are proposed in Choi and Hall (1999) and Park et al. (2003).
Following Zhang et al. (1999), in this paper we use the kernel-type pilot estimate of d given
by
\[
\hat d = \big( \log f_n^{*}(h_1) - \log f_n^{*}(0) \big) / h_1, \qquad (2.10)
\]
where
\[
f_n^{*}(h_1) = \frac{1}{nh_1} \sum_{i=1}^{n} K\Big(\frac{h_1 - X_i}{h_1}\Big) + \frac{1}{n^2} \qquad (2.11)
\]
and
\[
f_n^{*}(0) = \max\left\{ \frac{1}{nh_0} \sum_{i=1}^{n} K_{(0)}\Big(\frac{-X_i}{h_0}\Big),\ \frac{1}{n^2} \right\}, \qquad (2.12)
\]
where h_1 = o(h) with h as in (2.1), K is a usual symmetric kernel function as in (1.1), K_{(0)}
is a so-called endpoint order-two kernel satisfying
\[
\int_{-1}^{0} K_{(0)}(t)\,dt = 1, \qquad \int_{-1}^{0} t K_{(0)}(t)\,dt = 0, \qquad 0 < \int_{-1}^{0} t^2 K_{(0)}(t)\,dt < \infty,
\]
and h_0 = b(0) h_1 with b(0) given by
\[
b(0) = \left\{ \frac{\big( \int_{-1}^{1} t^2 K(t)\,dt \big)^2 \big( \int_{-1}^{0} K_{(0)}^2(t)\,dt \big)}{\big( \int_{-1}^{0} t^2 K_{(0)}(t)\,dt \big)^2 \big( \int_{-1}^{1} K^2(t)\,dt \big)} \right\}^{1/5}. \qquad (2.13)
\]
We now define
\[
\hat g_c(y) = y + \frac{\hat d}{2}\, l_c\, y^2 + \hat d^2 l_c^2\, y^3, \qquad (2.14)
\]
for 0 ≤ c ≤ 1, as our estimator of g_c(y) defined by (2.6), where d̂ is given by (2.10). Note
that ĝ_c and d̂ depend on n, but this dependence is suppressed for notational convenience.
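A hedged sketch of the pilot estimate (2.10)-(2.12) follows (ours; it assumes the Epanechnikov kernel for K and takes K_{(0)} to be the boundary kernel (3.5) of Section 3 evaluated at c = 0, which is one valid endpoint order-two kernel, and for which b(0) of (2.13) evaluates to 2):

```python
# Pilot estimate d-hat of d = f'(0)/f(0) via (2.10)-(2.12); a sketch only.
import numpy as np

def K(t):                                       # Epanechnikov
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def K0(t):                                      # endpoint order-two kernel
    return np.where((t >= -1) & (t <= 0), 12.0 * (1 + t) * (t + 0.5), 0.0)

def d_hat(X, h1):
    n = len(X)
    h0 = 2.0 * h1                               # h0 = b(0) h1; b(0) = 2 here
    f_h1 = K((h1 - X) / h1).sum() / (n * h1) + 1.0 / n**2    # (2.11)
    f_0 = max(K0(-X / h0).sum() / (n * h0), 1.0 / n**2)      # (2.12)
    return (np.log(f_h1) - np.log(f_0)) / h1                 # (2.10)

rng = np.random.default_rng(1)
X = rng.exponential(scale=0.5, size=200)        # f(x) = 2e^{-2x}, true d = -2
# pilot bandwidth h1 = h n^{-1/20} as in Section 3, with h roughly the
# paper's value for this density
print(d_hat(X, h1=0.342 * 200 ** (-1 / 20)))
```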
It has been argued in the literature (see, e.g., Park et al., 2003) that one may use a
bandwidth h_1 different from h for the pilot estimator of d, and that h_1 should tend to zero
at the same rate as h or faster in order to inherit the full advantages of the proposed density
estimator. This adjustment also improves the asymptotic theoretical results of our proposed
density estimator f̂_n(x) given below at (2.15), and should act as the basis for the choice of
h_1 in (2.10) and (2.11).
2.3 The Proposed Estimator
Our proposed estimator of f(x) is defined as, for x = ch, c ≥ 0,
\[
\hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) \Big/ \int_{-1}^{c} K(t)\,dt, \qquad (2.15)
\]
where ĝ_c is given by (2.14), K is as given in (2.1), and h is the bandwidth used in (2.1).
For x ≥ h, f̂_n(x) reduces to the traditional kernel estimator f_n(x) given by (1.1). Thus
f̂_n(x) is a natural boundary continuation of the usual kernel estimator (1.1). An important
feature of the estimator (2.15) is that it is based on the locally adaptive transformation ĝ_c
given by (2.14), which transforms the data depending on the point of estimation. Furthermore, it is important to remark here that the adaptive kernel estimator (2.15) is non-negative
(provided K is non-negative), a property shared with reflection-based estimators (see Jones
and Foster, 1996; Zhang et al., 1999) and transformation-based estimators (see Marron and
Ruppert, 1994; Zhang et al., 1999; Hall and Park, 2002), but not with most boundary kernel
approaches.
The next theorem establishes the bias and variance of (2.15).

Theorem 2.1. Let f̂_n(x) be defined by (2.15) with a kernel function K as given in (2.1)
and a bandwidth h = O(n^{-1/5}). Suppose h_1 in (2.10) is of the form h_1 = O(n^{-1/4}). Further
assume that K^{(1)} exists and is continuous on [−1, 1], and that f(0) > 0 and f^{(2)} exists and
is continuous in a neighbourhood of 0. Then for x = ch, 0 ≤ c ≤ 1, we have
\[
E \hat f_n(x) - f(x) = \frac{h^2}{2 \int_{-1}^{c} K(t)\,dt} \left\{ f^{(2)}(0) \int_{-1}^{c} (t^2 - 2tc) K(t)\,dt + 6\, \frac{(f^{(1)}(0))^2}{f(0)} \int_{-1}^{c} (t - c)^2 K(t)\,dt\, (l_c^2 + l_c) \right\} + o(h^2), \qquad (2.16)
\]
where l_c is given by (2.7), and
\[
\operatorname{Var} \hat f_n(x) = \frac{f(0)}{nh \big( \int_{-1}^{c} K(t)\,dt \big)^2} \int_{-1}^{c} K^2(t)\,dt + o\Big(\frac{1}{nh}\Big). \qquad (2.17)
\]
Remark 2.1

As one of the referees correctly pointed out, the choice of bandwidth is very important
for the good performance of any kernel estimator. The optimal bandwidth for (2.15) can
be derived as follows. From (2.16) and (2.17), the leading terms of the asymptotic MSE of
f̂_n(x) at x = ch, 0 ≤ c ≤ 1, are
\[
\frac{h^4}{4} \Big( \int_{-1}^{c} K(t)\,dt \Big)^{-2} \left\{ f^{(2)}(0) \int_{-1}^{c} (t^2 - 2tc) K(t)\,dt + 6 (f(0))^{-1} (f^{(1)}(0))^2 \int_{-1}^{c} (t - c)^2 K(t)\,dt\, (l_c^2 + l_c) \right\}^2 + \frac{f(0)}{nh} \Big( \int_{-1}^{c} K(t)\,dt \Big)^{-2} \int_{-1}^{c} K^2(t)\,dt.
\]
Minimizing the preceding expression with respect to h yields
\[
h_c = n^{-1/5} \left\{ \Big[ 4 f(0) \int_{-1}^{c} K^2(t)\,dt \Big] \Big[ f^{(2)}(0) \int_{-1}^{c} (t^2 - 2tc) K(t)\,dt + 6 (f(0))^{-1} (f^{(1)}(0))^2 \int_{-1}^{c} (t - c)^2 K(t)\,dt\, (l_c^2 + l_c) \Big]^{-2} \right\}^{1/5}, \qquad (2.18)
\]
which is the local optimal bandwidth at x = ch, 0 ≤ c ≤ 1, for the transformation estimator
(2.15). Expression (2.18) is quite similar to that of the “adaptive variable location” kernel
estimator given by Park et al. (2003). In order to see this more clearly we find from
their paper that the leading terms of the asymptotic MSE of their adaptive location kernel
estimator at x = ch, 0 ≤ c ≤ 1, are
\[
\frac{h^4}{4} \Big( \int_{-c}^{1} K(t)\,dt \Big)^{-2} \left\{ f^{(2)}(x) \Big[ \int_{-c}^{1} t^2 K(t)\,dt + 2 \rho(c) \int_{-c}^{1} t K^{(1)}(t)\,dt \Big] + (f(x))^{-1} (f^{(1)}(x))^2 \rho(c)^2 \int_{-c}^{1} K^{(2)}(t)\,dt \right\}^2 + \frac{f(x)}{nh} \Big( \int_{-c}^{1} K(t)\,dt \Big)^{-2} \int_{-c}^{1} K^2(t)\,dt,
\]
where ρ(c) = (K(c))^{-1} ∫_{-c}^{1} tK(t)dt. Minimizing the preceding expression with respect to h yields
\[
h_c^{*} = n^{-1/5} \left\{ \Big[ 4 f(x) \int_{-c}^{1} K^2(t)\,dt \Big] \Big[ f^{(2)}(x) \Big( \int_{-c}^{1} t^2 K(t)\,dt + 2 \rho(c) \int_{-c}^{1} t K^{(1)}(t)\,dt \Big) + (f(x))^{-1} (f^{(1)}(x))^2 \rho(c)^2 \int_{-c}^{1} K^{(2)}(t)\,dt \Big]^{-2} \right\}^{1/5}, \qquad (2.19)
\]
which is the local optimal bandwidth at x = ch, 0 ≤ c ≤ 1, for Park et al.'s (2003) adaptive
estimator. It is easy to see the similarity between (2.18) and (2.19). In applications neither
expression can be employed directly, however, due to their dependence on the unknown
functional quantities f(x), f^{(1)}(x) and f^{(2)}(x). An alternative approach is to apply the
“bandwidth variation function” technique that is frequently implemented in boundary kernel
methods.
“Adaptive bandwidth” or “variable window width” kernel density estimators are generally
of the form
\[
\tilde f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} \frac{1}{\alpha(X_i)}\, K\Big(\frac{x - X_i}{h\,\alpha(X_i)}\Big), \qquad (2.20)
\]
where α is some non-negative function; see, e.g. Abramson (1982), Hall et al. (1995) and
Terrell and Scott (1992), among others. It is easy to show that the estimator (2.20) also
exhibits boundary effects when the unknown density f is supported on [0, ∞) or [0, 1] and a
kernel K with support on [−1, 1] is used. To the best of our knowledge, no boundary-adjusted
adaptive bandwidth or variable window width kernel estimators exist in the literature.
A transformation technique based on such an estimator would be
\[
\bar f_n(x) = \frac{1}{nh \int_{-1}^{c} K(t)\,dt} \sum_{i=1}^{n} \frac{1}{\alpha(g(X_i))}\, K\Big(\frac{x - g(X_i)}{h\,\alpha(g(X_i))}\Big), \qquad (2.21)
\]
where g is a suitable transformation, like (2.6), that removes the boundary effects. However,
the details of estimators of the form (2.21) are beyond the scope of the present paper.
Remark 2.2

Smoothing methods based on local linear and local polynomial fitting are now
popular. In the density estimation context the local polynomial method has been implemented in Cheng et al. (1997), Cheng (1997) and Zhang and Karunamuni (1998), among
others. For example, Zhang and Karunamuni (1998) have studied the following local polynomial density estimator (a very similar estimator is given in Cheng (1997) and Cheng et
al. (1997)):
\[
f_n^{*}(x) = \frac{1}{nh} \sum_{i=1}^{n} K_o^{*}\Big(\frac{x - X_i}{h}\Big), \qquad (2.22)
\]
where K_o^*(t) = e_o^T S^{-1} (1, t, ..., t^p)^T K(t), S = (s_{j,l}) is a (p+1) × (p+1) matrix with
s_{j,l} = ∫ t^{j+l} K(t)dt, 0 ≤ j, l ≤ p, and e_o = (0, 1, ..., 0)^T is a (p+1) × 1 vector whose second
element is 1 and the others are zero. Then the bias and variance of (2.22) at x = ch, 0 ≤ c ≤ 1, are
\[
\operatorname{Bias}(f_n^{*}(x)) = \frac{1}{(p+1)!}\, (b(c) h)^{p+1} f^{(p+1)}(x) \int_{-1}^{c/b(c)} t^{p+1} K_{o, c/b(c)}^{*}(t)\,dt
\]
and
\[
\operatorname{Var}(f_n^{*}(x)) = \frac{f(x)}{nh\, b(c)} \int_{-1}^{c/b(c)} \big( K_{o, c/b(c)}^{*}(t) \big)^2\,dt,
\]
where b : [0, 1] → R⁺ with b(1) = 1 is a bandwidth variation function and K*_{o,c} is called an
“equivalent boundary kernel”; see Zhang and Karunamuni (1998) for more details.
The asymptotic MSE of f*_n(x) is typically less than that of f̂_n(x) given by (2.15) in a
weak minimax sense; see Cheng et al. (1997). However, local polynomial estimators are
based on the so-called equivalent boundary kernels K*_{o,c}, which are not always positive. As a
result, local polynomial estimators may take negative values, especially where the unknown
density is very small. At the crucial point x = 0, if f(0) ≈ 0 then f*_n(0) may be negative;
see, e.g., Figure 1. This unattractive feature of local polynomial density estimators has been
criticized by many authors; see, e.g., Zhang et al. (1999) and Hall and Park (2002). On the
other hand, the proposed estimator f̂_n(x) given by (2.15) is always non-negative whenever
the kernel K is non-negative. Thus, as far as applications are concerned, we recommend the
use of f̂_n(x) over f*_n(x), even though the latter possesses some good theoretical properties.
3 Simulations and Discussion
To test the effectiveness of our estimator we simulated its performance against that of other
well-known methods. Among these were some relatively new techniques, such as a recent
estimator due to Hall and Park (2002), which is very similar to our own, and the transformation and reflection based method given by Zhang et al. (1999). We also compared against
some more classical estimators, namely the boundary kernel and its close counterpart, the
local linear fitting method, Jones and Foster's (1996) nonnegative adaptation estimator, and
Cowling and Hall's (1996) pseudo-data method.
The first of the competing estimators is due to Hall and Park (2002) (hereafter H&P). It
is defined as, for x = ch, 0 ≤ c ≤ 1,
\[
\hat f_{HP}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x - X_i + \hat\alpha(x)}{h}\Big) \Big/ \int_{-1}^{c} K(t)\,dt, \qquad (3.1)
\]
where α̂(x) is a correction term given by
\[
\hat\alpha(x) = h^2\, \frac{\tilde f'(x)}{\bar f(x)}\, \rho\Big(\frac{x}{h}\Big),
\]
and
\[
\bar f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x - X_i}{h}\Big) \Big/ \int_{-1}^{c} K(t)\,dt, \qquad (3.2)
\]
which is referred to as the cut-and-normalized kernel estimator, f̃′(x) is an estimate of
f^{(1)}(x), and ρ(u) = K(u)^{-1} ∫_{v ≤ u} v K(v)dv. To estimate f^{(1)}(x), we used the endpoint kernel
of order (1,2) (see Zhang and Karunamuni, 1998) given by
\[
K_{1,c}(t) = \frac{12 \big( 2c(1 + t) - 3t^2 - 4t - 1 \big)}{(c + 1)^4}\, I_{[-1, c]}(t),
\]
and the corresponding kernel estimator
\[
\tilde f'(x) = \frac{1}{nh_c^2} \sum_{i=1}^{n} K_{1, c/b(c)}\Big(\frac{x - X_i}{h_c}\Big),
\]
where h_c = b(c)h with the bandwidth variation function b(c) = 2^{1/5}(1 − c) + c for 0 ≤ c ≤ 1.
Note that Hall and Park's estimator is a special case of the estimator (2.1), but with
ĝ_c(y) = y − α̂(ch), where x = ch for 0 ≤ c ≤ 1. This ĝ_c, however, does not necessarily satisfy
the conditions of (2.5).
The method of Zhang et al. (1999) (hereafter Z,K&J) applies a transformation to the
data and then reflects it across the left endpoint of the support of the density. The resulting
kernel estimator is of the form
\[
\hat f_{ZKJ}(x) = \frac{1}{nh} \sum_{i=1}^{n} \left\{ K\Big(\frac{x - X_i}{h}\Big) + K\Big(\frac{x + g_n(X_i)}{h}\Big) \right\}, \qquad (3.3)
\]
where
\[
g_n(x) = x + d_n x^2 + A d_n^2 x^3.
\]
Note that this g_n is very similar to our own, the main difference being that it does not depend
on the point of estimation. The only requirement on A is that 3A > 1, and in practice we used
the recommended value of A = .55. Here d_n is again an estimate of d = f^{(1)}(0)/f(0), and
the methodology used is the same as that given by (2.10) to (2.13); a brief sketch of this
estimator is given below.
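The following sketch is ours, not the authors' code; it assumes the Epanechnikov kernel, with the pilot estimate d_n supplied by the caller:

```python
# The Z,K&J transformation-reflection estimator (3.3); a sketch.
import numpy as np

def K(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def f_zkj(x, X, h, dn, A=0.55):
    gn = X + dn * X**2 + A * dn**2 * X**3          # transformed data g_n(X_i)
    # smooth the original data plus the transformed data reflected across 0
    return (K((x - X) / h) + K((x + gn) / h)).mean() / h
```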
The boundary kernel with bandwidth variation is defined (see Zhang and Karunamuni,
1998) as
\[
\hat f_B(x) = \frac{1}{nh_c} \sum_{i=1}^{n} K_{(c/b(c))}\Big(\frac{x - X_i}{h_c}\Big), \qquad (3.4)
\]
with c = min(x/h, 1). On the boundary the bandwidth variation function h_c = b(c)h is
employed, here with b(c) = 2 − c. We used the boundary kernel
\[
K_{(c)}(t) = \frac{12}{(1 + c)^4}\, (1 + t) \left[ (1 - 2c)\,t + \frac{3c^2 - 2c + 1}{2} \right] I_{[-1, c]}(t). \qquad (3.5)
\]
Zhang and Karunamuni have shown that this kernel is optimal in the sense of minimizing the
MSE in the class of all kernels of order (0,2) with exactly one change of sign in their support.
A sketch of this estimator follows.
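This sketch is ours; note that at c = 1 the kernel (3.5) reduces to the Epanechnikov kernel, so the estimator matches (1.1) in the interior:

```python
# Boundary kernel estimator (3.4) with kernel (3.5) and b(c) = 2 - c; a sketch.
import numpy as np

def K_c(t, c):
    # boundary kernel (3.5); at c = 1 it equals the Epanechnikov kernel
    val = (12.0 / (1 + c) ** 4) * (1 + t) * ((1 - 2 * c) * t
                                             + (3 * c**2 - 2 * c + 1) / 2)
    return np.where((t >= -1) & (t <= c), val, 0.0)

def f_B(x, X, h):
    c = min(x / h, 1.0)
    hc = (2 - c) * h                              # bandwidth variation
    return K_c((x - X) / hc, c / (2 - c)).mean() / hc

rng = np.random.default_rng(3)
X = rng.gamma(3.0, size=200)     # density 1: f(x) = (x^2/2)e^{-x}, f(0) = 0
print(f_B(0.0, X, 0.83))         # can come out negative when f(0) is near 0
```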
A simple modification of the boundary kernel gives us the local linear fitting method
(LL). We used the kernel
\[
K_{(c)}(t) = \frac{12 (1 - t^2)}{(1 + c)^4 (3c^2 - 18c + 19)} \left\{ 8 - 16c + 24c^2 - 12c^3 + t (15 - 30c + 15c^2) \right\} I_{[-1, c]}(t) \qquad (3.6)
\]
in (3.4) and call the resulting estimator f̂_LL(x). In this case the bandwidth variation function
is b(c) = 1.86174 − .86174c.
The Jones and Foster (1996) nonnegative adaptation estimator (hereafter J&F) is defined
as
\[
\hat f_{JF}(x) = \bar f(x) \exp\left\{ \frac{\hat f(x)}{\bar f(x)} - 1 \right\}, \qquad (3.7)
\]
where f̄(x) is the cut-and-normalized kernel estimator of (3.2). This version of the J&F
estimator takes f̂ to be some boundary kernel estimate of f; here we use f̂(x) = f̂_B(x) as
defined by (3.4) with the kernel (3.5).
Cowling and Hall's (1996) pseudo-data estimator generates data beyond the left endpoint
of the support of the density. Their estimator (hereafter C&H) is defined as
\[
\hat f_{CH}(x) = \frac{1}{nh} \left[ \sum_{i=1}^{n} K\Big(\frac{x - X_i}{h}\Big) + \sum_{i=1}^{m} K\Big(\frac{x - X_{(-i)}}{h}\Big) \right], \qquad (3.8)
\]
where m is an integer of larger order than nh but smaller order than n; we chose m = n^{9/10}.
Further, X_{(−i)} is given by their three-point rule
\[
X_{(-i)} = -5 X_{(i/3)} - 4 X_{(2i/3)} + \frac{10}{3}\, X_{(i)},
\]
where, for real t > 0, X_{(t)} linearly interpolates among the values 0, X_{(1)}, ..., X_{(n)}, with X_{(i)}
the ith order statistic. A sketch of the rule is given below.
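This sketch is ours; X_{(t)} is implemented by linear interpolation of the order statistics, with X_{(0)} = 0:

```python
# Cowling and Hall's "three-point rule" pseudo-data; a sketch.
import numpy as np

def pseudo_data(X, m):
    Xs = np.concatenate(([0.0], np.sort(X)))      # X_(0) = 0, X_(1), ..., X_(n)
    n = len(X)
    def X_at(t):                                  # X_(t) for real 0 <= t <= n
        return np.interp(t, np.arange(n + 1), Xs)
    i = np.arange(1, m + 1)
    return -5 * X_at(i / 3) - 4 * X_at(2 * i / 3) + (10 / 3) * X_at(i)

rng = np.random.default_rng(4)
X = rng.exponential(scale=0.5, size=200)
print(pseudo_data(X, m=int(200 ** 0.9))[:5])
# m = n^{9/10}; the pseudo-points extend to the left of the endpoint 0
```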
Throughout our simulations, we used the Epanechnikov kernel K(t) = (3/4)(1 − t²) I_{[−1,1]}(t)
whenever a kernel of order two was required. Note that outside of the boundary, the majority
of the estimators presented reduce to the regular kernel method given by (1.1) with the
Epanechnikov kernel. The only exception is Cowling and Hall's, which is a close approximation to
the kernel estimator away from the origin.
Each estimator was tested over various shapes of density curves, but we have collapsed
our results into four specific densities which are representative of the different possible combinations of the behaviour of f at 0. Density 1 handles the case f(0) = 0, and Densities 2,
3, and 4 satisfy f(0) > 0 with f^{(1)}(0) = 0, f^{(1)}(0) > 0 and f^{(1)}(0) < 0, respectively.
In all simulations a sample size of n = 200 was used. The bandwidth was taken to be
the optimal global bandwidth of the regular kernel estimator (1.1), given by the formula
\[
h = \left\{ \frac{\int_{-1}^{1} K(t)^2\,dt}{\big[ \int_{-1}^{1} t^2 K(t)\,dt \big]^2 \int [f^{(2)}(x)]^2\,dx} \right\}^{1/5} n^{-1/5}, \qquad (3.9)
\]
(see Silverman, 1986, Ch. 3). This particular choice was made so that the different estimators
can be compared without regard to bandwidth effects. In our estimation of d we used the
bandwidth h_1 = h n^{-1/20}. A quick check of (3.9) for Density 4 follows.
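As a quick arithmetic check of (3.9) (ours), for Density 4 one has f^{(2)}(x) = 8e^{-2x}, so ∫[f^{(2)}]² dx = 16, while for the Epanechnikov kernel ∫K² = 3/5 and ∫t²K = 1/5:

```python
# Check of the optimal global bandwidth (3.9) for density 4, f(x) = 2e^{-2x}.
n = 200
h = ((3 / 5) / ((1 / 5) ** 2 * 16)) ** 0.2 * n ** (-0.2)
print(h)    # 0.3421..., matching the value h = .342128 used in Table 1
```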
For each density we have calculated the bias, variance and MSE of the estimators at zero,
along with the MISE over the region [0, h). The results are all averaged over 1000 trials and
are presented in Table 1. We have also graphed ten typical realizations of each estimator,
along with the regular kernel estimator, over the boundary region and part of the interior.
These are presented in Figures 1 through 4.
Table 1 and Figures 1 - 4 about here.
In Density 1, with f (0) = 0, our estimator behaves quite well at zero. It is only slightly
behind Hall and Park’s in terms of MSE at zero. In terms of MISE the boundary kernel
puts in the best showing, followed closely by the LL method and J&F. It should be noted
though that the boundary kernel and LL method are usually negative near zero. Despite its
good showing at zero, the H&P method comes in fourth in MISE, as it slightly underestimates
the density over [0, h). In fifth and sixth place are the Z,K&J and C&H methods. In last
place in the MISE column is our estimator, which, from Figure 1, is evidently the result of
severely underestimating f to the right of zero. We believe that error in the estimate d̂ of d
is the reason behind this. When f(0) = 0, d̂ may be inaccurate and introduce error into our
method. Future work may be done on constructing better estimates of d in this situation.
For Density 2, with f (1) (0) = 0 and f (0) > 0, our estimator is the clear winner in
terms of MSE at zero and the MISE over the boundary. This is also obvious from Figure
2. It is somewhat surprising that Hall and Park’s estimator is much weaker for this density,
especially given that both it and our estimator are special cases of the same general form
(2.1).
Density 3 proves to be difficult for all of the estimators. This is most evident from Figure
3, in which we see a large amount of variability. In terms of MISE all the estimators perform
similarly except for H&P, which does slightly worse than the rest, and C&H, which does
poorly over the whole boundary. At zero, our estimator is the best overall but only by a slim
margin this time. In terms of MISE it is the LL and J&F methods that are the winners, but
again only by a small amount.
Our estimator has its poorest showing on the exponential shape of Density 4. At zero our
estimator produces an extremely high MSE value relative to the others, and even in terms
of MISE it is handily beaten (except by C&H). From Figure 4 we see the underlying reason
is the inability of our estimator to accurately capture the steep growth of the exponential
density. The Z,K&J estimator is quite accurate at zero and over the entire boundary, and
the boundary kernel, LL method, J&F and H&P estimators do not lag far behind. Even
Cowling and Hall's estimator does well at zero, but pays the price for it in
MISE over the rest of the boundary region.
4 Data Analysis
We tested our proposed estimator on two datasets. The first consists of 35 measurements of
the average December precipitation in Des Moines, Iowa from 1961 to 1995. The bandwidth
h = 0.425 was produced by cross-validation (see Bowman and Azzalini, 1997). Figure 5
shows our proposed estimator (solid line) along with Hall and Park's (dashed line). Over
most of the boundary region they are similar, but around zero they disagree on the shape of
the density. Hall and Park's estimator suggests a local minimum to the right of zero, but the
histogram and the rug show otherwise. A larger value for h would probably diminish this
behaviour. We have observed that the boundary kernel, LL method and J&F estimators
tend to linearly approximate the density over the boundary region, with little regard for
curvature.
Figure 5 about here.
The second dataset comprises 365 wind speed measurements taken daily in 1993 at
2 AM in Eindhoven, the Netherlands (source: Royal Netherlands Meteorological Institute).
Intuition suggests that the underlying density f should be similar to Density 3, with
f(0) > 0 and f^{(1)}(0) > 0, so that there is substantial mass at zero but the mode is slightly
to the right. We used the bandwidth h = 2.5, which we chose subjectively; values suggested
by cross-validation typically seem to undersmooth this data. We have observed that for
this dataset the shape of the estimators is fairly insensitive to the choice of bandwidth,
provided that it is large enough, and so we believe our choice is justified. Figure 6 shows
our proposed estimator (solid line) and the LL method (dashed line) applied to the wind
data. Both estimators capture the hypothesized shape of the density and are similar over
the boundary region. However, our estimator clearly produces a smoother curve.
Figure 6 about here.
5 Concluding Remarks
Marron and Ruppert’s (1994) excellent work on kernel density estimation was among the
first to introduce a transformation technique in boundary correction (also see Wand et al.,
1991). Their sophisticated transformation methodology, however, has been labelled “complicated” by some researchers; see, e.g., Jones and Foster (1996). In this paper, we presented an
easy-to-implement, general transformation method given by (2.1). This class of estimators
possesses certain desirable properties, such as being everywhere non-negative and having
unit integral asymptotically, and can be tailored specifically to combat the boundary effects
found in the traditional kernel estimator. The proposed estimator defined by (2.15) and
the boundary corrected estimator of Hall and Park (2002) are special cases of the preceding
general technique. Beyond this, both estimators are locally adaptive in the sense that the
amount of transformation depends upon the point of estimation, and, even further, both
transformations are estimated from the data itself. We believe the latter point is especially
important, as it allows for all-important reductions in bias while, at the same time, maintaining low variance. The transformation that we have chosen at (2.14) is a convenient and
computationally easy one, but it is by no means the result of an exhaustive search of functions
satisfying the conditions (2.5). In fact, we conjecture that it may be possible to construct
different transformations satisfying these conditions but with much improved performance,
in the sense that they will be more robust to the shape of the density.
Acknowledgements: This research was supported by a grant from the Natural Sciences
and Engineering Research Council of Canada. We wish to thank the Editor, the Associate
Editors and the referees for very constructive and useful comments that led to a much
improved presentation. We also thank Drs. B.U. Park and S. Zhang for some personal
correspondence.
A Appendix: Proofs
Proof of Lemma 2.1: For x = ch, 0 ≤ c ≤ 1, we have from (2.1)
\[
E \tilde f_n(x) \int_{-1}^{c} K(t)\,dt = \frac{1}{h}\, E\, K\Big(\frac{x - g(X_i)}{h}\Big)
= \frac{1}{h} \int_{0}^{\infty} K\Big(\frac{x - g(y)}{h}\Big) f(y)\,dy
= \int_{-1}^{c} K(t)\, \frac{f(g^{-1}((c - t)h))}{g^{(1)}(g^{-1}((c - t)h))}\,dt. \qquad (A.1)
\]
Letting q(t) = g^{-1}((c − t)h), we have
\[
q^{(1)}(t) = \frac{-h}{g^{(1)}(q(t))} \qquad \text{and} \qquad q^{(2)}(t) = \frac{-h^2\, g^{(2)}(q(t))}{[g^{(1)}(q(t))]^3},
\]
so a Taylor expansion of order 2 of the function f(q(·))/g^{(1)}(q(·)) at t = c gives
\[
\begin{aligned}
\frac{f(q(t))}{g^{(1)}(q(t))} ={}& \frac{f(q(c))}{g^{(1)}(q(c))} - (t - c)\, h\, \frac{f^{(1)}(q(c))\, g^{(1)}(q(c)) - f(q(c))\, g^{(2)}(q(c))}{[g^{(1)}(q(c))]^3} \\
&+ \frac{h^2 (t - c)^2}{2} \left\{ \frac{f^{(2)}(q(c))\, g^{(1)}(q(c)) - f(q(c))\, g^{(3)}(q(c))}{[g^{(1)}(q(c))]^4} - 3 g^{(2)}(q(c))\, \frac{f^{(1)}(q(c))\, g^{(1)}(q(c)) - f(q(c))\, g^{(2)}(q(c))}{[g^{(1)}(q(c))]^5} \right\} + o(h^2) \\
={}& f(0) - (t - c)\, h \big[ f^{(1)}(0) - f(0)\, g^{(2)}(0) \big] \\
&+ \frac{h^2}{2} (t - c)^2 \big[ f^{(2)}(0) - f(0)\, g^{(3)}(0) - 3 g^{(2)}(0) \big( f^{(1)}(0) - f(0)\, g^{(2)}(0) \big) \big] + o(h^2), \qquad (A.2)
\end{aligned}
\]
with the last equality following from q(c) = g^{-1}(0) = 0 and g^{(1)}(q(c)) = g^{(1)}(0) = 1. Also,
by the existence and continuity of f^{(2)}(·) near 0, for x = ch we have
\[
f(0) = f(x) - ch f^{(1)}(x) + \frac{(ch)^2}{2}\, f^{(2)}(x) + o(h^2), \qquad (A.3)
\]
\[
f^{(1)}(x) = f^{(1)}(0) + ch f^{(2)}(0) + o(h), \qquad (A.4)
\]
and
\[
f^{(2)}(x) = f^{(2)}(0) + o(1). \qquad (A.5)
\]
Now from (A.1) to (A.5), we obtain for x = ch, 0 ≤ c ≤ 1,
\[
\begin{aligned}
E \tilde f_n(x) \int_{-1}^{c} K(t)\,dt ={}& \int_{-1}^{c} K(t)\,dt \left( f(x) - ch f^{(1)}(x) + \frac{(ch)^2}{2} f^{(2)}(x) + o(h^2) \right) \\
&- h \int_{-1}^{c} (t - c) K(t)\,dt\, \big( f^{(1)}(0) - f(0)\, g^{(2)}(0) \big) \\
&+ \frac{h^2}{2} \int_{-1}^{c} (t - c)^2 K(t)\,dt\, \big\{ f^{(2)}(0) - f(0)\, g^{(3)}(0) - 3 g^{(2)}(0) \big( f^{(1)}(0) - f(0)\, g^{(2)}(0) \big) \big\} \\
&- (ch)^2 f^{(2)}(0) \int_{-1}^{c} K(t)\,dt + o(h^2) \\
={}& f(x) \int_{-1}^{c} K(t)\,dt - h \left\{ \big( f^{(1)}(0) - f(0)\, g^{(2)}(0) \big) \int_{-1}^{c} t K(t)\,dt + f(0)\, g^{(2)}(0) \int_{-1}^{c} c K(t)\,dt \right\} \\
&+ \frac{h^2}{2} \left\{ \Big( \int_{-1}^{c} (t - c)^2 K(t)\,dt \Big) \big[ f^{(2)}(0) - f(0)\, g^{(3)}(0) - 3 g^{(2)}(0) \big( f^{(1)}(0) - f(0)\, g^{(2)}(0) \big) \big] \right. \\
&\qquad \left. - 2 c^2 f^{(2)}(0) \int_{-1}^{c} K(t)\,dt \right\} + o(h^2). \qquad (A.6)
\end{aligned}
\]
The proof of (2.2) is now completed by dividing both sides of (A.6) by ∫_{-1}^{c} K(t)dt.
In order to prove (2.3) we first note that from (2.1),
\[
\begin{aligned}
\operatorname{Var} \tilde f_n(x) &= \Big( \int_{-1}^{c} K(t)\,dt \Big)^{-2} \operatorname{Var} \left\{ \frac{1}{nh} \sum_{i=1}^{n} K\Big(\frac{x - g(X_i)}{h}\Big) \right\} \\
&= \Big( \int_{-1}^{c} K(t)\,dt \Big)^{-2} \frac{1}{nh^2} \left\{ E\, K^2\Big(\frac{x - g(X_i)}{h}\Big) - \Big( E\, K\Big(\frac{x - g(X_i)}{h}\Big) \Big)^2 \right\} \\
&= \Big( \int_{-1}^{c} K(t)\,dt \Big)^{-2} (I_1 + I_2), \qquad (A.7)
\end{aligned}
\]
where
\[
I_1 = \frac{1}{nh^2}\, E\, K^2\Big(\frac{x - g(X_i)}{h}\Big)
= \frac{1}{nh} \int_{-1}^{c} K^2(t)\, \frac{f(g^{-1}((c - t)h))}{g^{(1)}(g^{-1}((c - t)h))}\,dt
= \frac{f(0)}{nh} \int_{-1}^{c} K^2(t)\,dt + o\Big(\frac{1}{nh}\Big), \qquad (A.8)
\]
from (A.2). Similarly,
\[
I_2 = -\frac{1}{nh^2} \Big( E\, K\Big(\frac{x - g(X_i)}{h}\Big) \Big)^2
= -\frac{1}{nh^2} \left( h \int_{-1}^{c} K(t)\, \frac{f(g^{-1}((c - t)h))}{g^{(1)}(g^{-1}((c - t)h))}\,dt \right)^2
= o\Big(\frac{1}{nh}\Big), \qquad (A.9)
\]
again from (A.2). By combining (A.7) to (A.9) we now complete the proof of (2.3).
Lemma A.1. Let d̂ be defined by (2.10) with h = O(n^{-1/5}) and h_1 = O(n^{-1/4}). Suppose
that f(x) > 0 for x = 0, h and that f^{(2)} is continuous in a neighbourhood of 0. Then
\[
E\Big[ \big| \hat d - d \big|^3 \,\Big|\, X_i = x_i,\, X_j = x_j \Big] = O(h_1^3)
\]
for any integers i, j with 1 ≤ i, j ≤ n, where d is given by (2.8).

Proof. Similar to the proof of Lemma A.2 of Zhang, Karunamuni and Jones (1999).
Proof of Theorem 2.1: For x = ch, 0 ≤ c ≤ 1, we write
\[
E \hat f_n(x) - f(x) = I_3 + I_4, \qquad (A.10)
\]
where I_3 = E f̂_n(x) − E f̃_n(x) and I_4 = E f̃_n(x) − f(x), with f̃_n(x) given by (2.1). From
Lemma 2.1, we obtain |I_4| = o(h²) when g is chosen to satisfy (2.4). By an application of
Taylor's expansion of order 1 on K, we see that I_3 satisfies
\[
\begin{aligned}
|I_3| &\le \frac{\big( \int_{-1}^{c} K(t)\,dt \big)^{-1}}{nh} \sum_{i=1}^{n} E \left| K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) - K\Big(\frac{x - g_c(X_i)}{h}\Big) \right| \\
&= \frac{\big( \int_{-1}^{c} K(t)\,dt \big)^{-1}}{nh} \sum_{i=1}^{n} E \left| \frac{g_c(X_i) - \hat g_c(X_i)}{h}\, K^{(1)}\Big(\frac{x - g_c(X_i) + \varepsilon \big( g_c(X_i) - \hat g_c(X_i) \big)}{h}\Big) \right|, \qquad (A.11)
\end{aligned}
\]
where 0 < ε < 1 is a constant. Note that for any constants d and l_c, we have, for any y ≥ 0,
\[
g_c(y) = y + \frac{d}{2}\, l_c\, y^2 + (d l_c)^2 y^3 = y \left[ \Big( d l_c y + \frac{1}{4} \Big)^2 + \frac{15}{16} \right] \ge \frac{15}{16}\, y.
\]
Thus g_c(y) ≥ h for y ≥ (16/15)h. Therefore, εĝ_c(X_i) + (1 − ε)g_c(X_i) ≥ h for X_i ≥ (16/15)h.
Since K^{(1)} vanishes outside [−1, 1], from (A.11) we have
\[
\begin{aligned}
|I_3| &\le \frac{\big( \int_{-1}^{c} K(t)\,dt \big)^{-1}}{nh} \sum_{i=1}^{n} E \left\{ \left| \frac{g_c(X_i) - \hat g_c(X_i)}{h} \right| \left| K^{(1)}\Big(\frac{x - \varepsilon \hat g_c(X_i) - (1 - \varepsilon) g_c(X_i)}{h}\Big) \right| I[0 \le X_i \le \rho h] \right\} \\
&\le \frac{C}{nh^2} \sum_{i=1}^{n} E \left| \hat g_c(X_i) - g_c(X_i) \right| I[0 \le X_i \le \rho h], \qquad (A.12)
\end{aligned}
\]
where ρ = 16/15 and C = sup_{|t| ≤ 1} |K^{(1)}(t)|. Now by the definitions of g_c and ĝ_c (see (2.6)
and (2.14)) we obtain
\[
\begin{aligned}
E \left| \hat g_c(X_i) - g_c(X_i) \right| I[0 \le X_i \le \rho h]
&= E \left| \frac{l_c}{2} (\hat d - d) X_i^2 + l_c^2 (\hat d^2 - d^2) X_i^3 \right| I[0 \le X_i \le \rho h] \\
&\le \frac{|l_c|}{2}\, \rho^2 h^2\, E\, |\hat d - d|\, I[0 \le X_i \le \rho h] + l_c^2 (\rho h)^3\, E\, |\hat d^2 - d^2|\, I[0 \le X_i \le \rho h]. \qquad (A.13)
\end{aligned}
\]
The Cauchy–Schwarz inequality and Lemma A.1 yield that E[|d̂ − d|^k | X_i = x_i] = O(h_1^k)
for 1 ≤ k ≤ 3. From this we have
\[
\begin{aligned}
E\, |\hat d - d|\, I[0 \le X_i \le \rho h]
&= E \big\{ E[\,|\hat d - d|\, I[0 \le X_i \le \rho h] \,\big|\, X_i = x_i\,] \big\}
= E \big\{ I[0 \le X_i \le \rho h]\, E[\,|\hat d - d| \,\big|\, X_i = x_i\,] \big\} \\
&\le O(h_1)\, E\, I[0 \le X_i \le \rho h] = o(h^2), \qquad (A.14)
\end{aligned}
\]
where the last equality follows from the fact that lim_{n→∞} h^{-1} E I[0 ≤ X_i ≤ ρh] = ρf(0).
From the same expression we also obtain that
\[
\begin{aligned}
E\, |\hat d^2 - d^2|\, I[0 \le X_i \le \rho h]
&= E\, |\hat d - d|\, |\hat d + d|\, I[0 \le X_i \le \rho h]
= E\, |\hat d - d|\, |\hat d - d + 2d|\, I[0 \le X_i \le \rho h] \\
&\le E\, |\hat d - d|^2\, I[0 \le X_i \le \rho h] + 2 |d|\, E\, |\hat d - d|\, I[0 \le X_i \le \rho h] = o(h^2). \qquad (A.15)
\end{aligned}
\]
By combining (A.12) to (A.15), we now have |I3 | = o(h2 ). The proof is now completed by
the preceding result, the fact that |I4 | = o(h2 ), and (A.10).
We now prove (2.17). First write
\[
\begin{aligned}
\Big( \int_{-1}^{c} K(t)\,dt \Big)^2 \operatorname{Var} \hat f_n(x)
&= \frac{1}{(nh)^2} \operatorname{Var} \left\{ \sum_{i=1}^{n} K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) \right\} \\
&= \frac{1}{(nh)^2} \operatorname{Var} \left\{ \sum_{i=1}^{n} \left[ K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) - K\Big(\frac{x - g_c(X_i)}{h}\Big) \right] + \sum_{i=1}^{n} K\Big(\frac{x - g_c(X_i)}{h}\Big) \right\} \\
&= I_5 + I_6 + I_7, \qquad (A.16)
\end{aligned}
\]
where g_c is given by (2.6),
\[
I_5 = \frac{1}{(nh)^2} \operatorname{Var} \left\{ \sum_{i=1}^{n} \left[ K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) - K\Big(\frac{x - g_c(X_i)}{h}\Big) \right] \right\}, \qquad (A.17)
\]
\[
I_6 = \frac{1}{(nh)^2} \operatorname{Var} \left\{ \sum_{i=1}^{n} K\Big(\frac{x - g_c(X_i)}{h}\Big) \right\}, \qquad (A.18)
\]
and
\[
I_7 = \frac{2}{(nh)^2} \operatorname{Cov}\left( \sum_{i=1}^{n} \left[ K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) - K\Big(\frac{x - g_c(X_i)}{h}\Big) \right],\ \sum_{i=1}^{n} K\Big(\frac{x - g_c(X_i)}{h}\Big) \right). \qquad (A.19)
\]
From Lemma 2.1, we have
\[
I_6 = \frac{f(0)}{nh} \int_{-1}^{c} K^2(t)\,dt + o\Big(\frac{1}{nh}\Big). \qquad (A.20)
\]
Now consider I_5. Note that
\[
\begin{aligned}
I_5 &\le \frac{1}{(nh)^2}\, E \left\{ \sum_{i=1}^{n} \left[ K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) - K\Big(\frac{x - g_c(X_i)}{h}\Big) \right] \right\}^2 \\
&= \frac{1}{(nh)^2} \sum_{i=1}^{n} E \left[ K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) - K\Big(\frac{x - g_c(X_i)}{h}\Big) \right]^2 \\
&\quad + \frac{2}{(nh)^2} \sum_{1 \le i < j \le n} E \left[ K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) - K\Big(\frac{x - g_c(X_i)}{h}\Big) \right] \left[ K\Big(\frac{x - \hat g_c(X_j)}{h}\Big) - K\Big(\frac{x - g_c(X_j)}{h}\Big) \right] \\
&= I_8 + I_9. \qquad (A.21)
\end{aligned}
\]
By an application of Taylor's expansion of order 1 on K, we obtain
\[
\begin{aligned}
I_8 &= \frac{1}{(nh)^2} \sum_{i=1}^{n} E \left[ K\Big(\frac{x - \hat g_c(X_i)}{h}\Big) - K\Big(\frac{x - g_c(X_i)}{h}\Big) \right]^2 \\
&= \frac{1}{(nh)^2} \sum_{i=1}^{n} E \left[ \frac{g_c(X_i) - \hat g_c(X_i)}{h}\, K^{(1)}\Big(\frac{x - g_c(X_i) + \varepsilon \big( g_c(X_i) - \hat g_c(X_i) \big)}{h}\Big) \right]^2 \\
&\le \frac{1}{n^2 h^4} \sum_{i=1}^{n} E \left[ \big( g_c(X_i) - \hat g_c(X_i) \big)\, K^{(1)}\Big(\frac{x - g_c(X_i) + \varepsilon \big( g_c(X_i) - \hat g_c(X_i) \big)}{h}\Big) \right]^2 \\
&\le \frac{C}{n^2 h^4} \sum_{i=1}^{n} E \big( \hat g_c(X_i) - g_c(X_i) \big)^2\, I[0 \le X_i \le \rho h], \qquad (A.22)
\end{aligned}
\]
using an argument similar to (A.12) above, where 0 < ε < 1, ρ = 16/15 and C > 0 are all
constants independent of n. Similar to (A.13) we can write
\[
\begin{aligned}
E \big( \hat g_c(X_i) - g_c(X_i) \big)^2\, I[0 \le X_i \le \rho h]
&= E \left[ \frac{l_c}{2} (\hat d - d) X_i^2 + l_c^2 (\hat d^2 - d^2) X_i^3 \right]^2 I[0 \le X_i \le \rho h] \\
&\le \frac{l_c^2}{2} (\rho h)^4\, E (\hat d - d)^2\, I[0 \le X_i \le \rho h] + 2 l_c^4 (\rho h)^6\, E (\hat d^2 - d^2)^2\, I[0 \le X_i \le \rho h] \\
&\le O(h^4 h_1^2 h) + O(h^6 h_1^2 h) = o(h^7). \qquad (A.23)
\end{aligned}
\]
Now combining (A.22) and (A.23), we obtain I_8 = o(n^{-1} h^3) = o((nh)^{-1}).
An argument similar to (A.22) yields that
\[
|I_9| \le \frac{C}{n^2 h^4} \sum_{1 \le i < j \le n} E\, |\hat g_c(X_i) - g_c(X_i)|\, |\hat g_c(X_j) - g_c(X_j)|\, I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h], \qquad (A.24)
\]
where ρ = 16/15, and C > 0 is a constant independent of n. As in (A.23), we obtain using
Lemma A.1,
\[
\begin{aligned}
&E\, |\hat g_c(X_i) - g_c(X_i)|\, |\hat g_c(X_j) - g_c(X_j)|\, I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] \\
&\quad = E \left| \frac{l_c}{2} (\hat d - d) X_i^2 + l_c^2 (\hat d^2 - d^2) X_i^3 \right| \left| \frac{l_c}{2} (\hat d - d) X_j^2 + l_c^2 (\hat d^2 - d^2) X_j^3 \right| I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] \\
&\quad \le C_1\, E \big\{ h^2 |\hat d - d| + h^3 |\hat d^2 - d^2| \big\}^2 I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] \\
&\quad \le 2 C_1 h^4 \big\{ E\, |\hat d - d|^2\, I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] + E\, |\hat d^2 - d^2|^2\, I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] \big\} \\
&\quad \le C_2 h^4 h_1^2\, E\, I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] = O(h^4 h_1^2 h^2) = o(h^8), \qquad (A.25)
\end{aligned}
\]
where C_1, C_2 are positive constants independent of n. Now combine (A.24) and (A.25) to
conclude that I_9 = o(h⁴) = o((nh)^{-1}). Similarly, it is easy to show that I_7 = o((nh)^{-1}) using
the covariance inequality. This completes the proof of (2.17).
References
[1] Abramson, I. (1982). On bandwidth variation in kernel estimates - A square root law.
The Annals of Statistics, 10, 1217 - 1223.
[2] Bowman, A.W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: the kernel approach with S-plus illustrations, Oxford University Press.
[3] Cheng, M.Y. (1997). Boundary-aware estimators of integrated density derivative. Journal of the Royal Statistical Society Ser. B, 59, 191-203.
[4] Cheng, M.Y., Fan, J. and Marron, J.S. (1997). On Automatic Boundary Corrections.
The Annals of Statistics, 25, 1691-1708.
[5] Cline, D.B.H and Hart, J.D. (1991). Kernel Estimation of Densities of Discontinuous
Derivatives. Statistics, 22, 69-84.
[6] Choi, E. and Hall, P. (1999). Data sharpening as a prelude to density estimation.
Biometrika, 86, 941-947.
[7] Cowling, A. and Hall, P. (1996). On Pseudodata Methods for Removing Boundary
Effects in Kernel Density Estimation. Journal of the Royal Statistical Society Ser. B,
58, 551-563.
[8] Gasser, T. and Müller, H.G. (1979). Kernel Estimation of Regression Functions. In
Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics 757, eds.
T. Gasser and M. Rosenblatt, Heidelberg: Springer-Verlag, pp. 23-68.
[9] Gasser, T., Müller, H.G. and Mammitzsch, V. (1985). Kernels for Nonparametric Curve
Estimation. Journal of the Royal Statistical Society Ser. B., 47, 238-252.
[10] Hall, P., Hu, T.C., and Marron, J.S. (1995). Improved variable window kernel estimates
of probability densities. The Annals of Statistics, 23, 1-10.
[11] Hall, P. and Park, B.U. (2002). New methods for bias correction at endpoints and
boundaries. The Annals of Statistics, 30, 1460-1479.
[12] Jones, M.C. (1993). Simple Boundary Correction for Kernel Density Estimation. Statistics and Computing, 3, 135-146.
[13] Jones, M.C. and Foster, P.J. (1996). A Simple Nonnegative Boundary Correction
Method for Kernel Density Estimation. Statistica Sinica, 6, 1005-1013.
[14] Marron, J.S. and Ruppert, D. (1994). Transformations to Reduce Boundary Bias in
Kernel Density Estimation. Journal of the Royal Statistical Society Ser. B, 56, 653-671.
[15] Müller, H.G. (1991). Smooth Optimum Kernel Estimators Near Endpoints. Biometrika,
78, 521-530.
[16] Parzen, E. (1962). On estimation of a probability density function and mode. Annals of
Mathematical Statistics, 33, 1065-1076.
[17] Park, B.U., Jeong, S.O., Jones, M.C. and Kang, K.H. (2003). Adaptive Variable Location Kernel Density Estimators with Good Performance at Boundaries. Nonparametric
Statistics, 15, 61-75.
[18] Rosenblatt, M. (1956). Remarks on Some Nonparametric Estimates of a Density Function. Annals of Mathematical Statistics, 27, 832-837.
[19] Royal Netherlands Meteorological Institute (2001). Available wind speed data.
http://www.knmi.nl/samenw/hydra/meta_data/time_series.htm, s370.asc, June 23, 2003.
[20] Schuster, E.F. (1985). Incorporating Support Constraints Into Nonparametric Estimators of Densities. Communications in Statistics, Part A - Theory and Methods, 14,
1123-1136.
[21] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, London:
Chapman and Hall.
[22] Terrell, G.R. and Scott, D.W. (1992). Variable kernel density estimation. The Annals
of Statistics, 20, 1236 - 1265.
[23] Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing, London: Chapman and Hall.
[24] Wand, M.P., Marron, J.S. and Ruppert, D. (1991). Transformations in Density Estimation (with discussion). Journal of the American Statistical Association, 86, 343-361.
[25] Zhang, S. and Karunamuni, R.J. (1998). On Kernel Density Estimation Near Endpoints.
Journal of Statistical Planning and Inference, 70, 301-316.
[26] Zhang, S. and Karunamuni, R.J. (2000). On Nonparametric Density Estimation at the
Boundary. Nonparametric Statistics, 12, 197-221.
[27] Zhang, S., Karunamuni, R.J. and Jones, M.C. (1999). An Improved Estimator of the
Density Function at the Boundary. Journal of the American Statistical Association, 94,
1231-1241.
Table 1: Bias, variance and MSE of the indicated estimators at 0, and their MISE over [0, h).

Density 1: f(x) = (x^2/2)e^{-x}, x >= 0, with n = 200, h = .832553
                      Bias        Var        MSE        MISE
  New Method          .001726    .000026    .000029    .000821
  H&P method          .001482    .000023    .000025    .000375
  Z,K&J method        .021782    .000078    .000553    .000426
  Boundary Kernel    -.017091    .000345    .000637    .000234
  LL method          -.018519    .000431    .000774    .000271
  J&F method          .011745    .000086    .000223    .000268
  C&H method          .026570    .000082    .000788    .000501

Density 2: f(x) = sqrt(2/pi) e^{-x^2/2}, x >= 0, with n = 200, h = .707481
                      Bias        Var        MSE        MISE
  New Method         -.014177    .005493    .005694    .001994
  H&P method          .011316    .015845    .015973    .002879
  Z,K&J method        .082133    .011484    .018230    .003481
  Boundary Kernel     .052554    .009875    .012637    .002736
  LL method           .054271    .009811    .012757    .002616
  J&F method          .059196    .010808    .014312    .002915
  C&H method          .015713    .015374    .015621    .005709

Density 3: f(x) = (x^2 + 2x + 1/2)e^{-2x}, x >= 0, with n = 200, h = .467044
                      Bias        Var        MSE        MISE
  New Method          .049289    .017806    .020235    .003527
  H&P method          .031373    .024642    .025626    .003719
  Z,K&J method        .098353    .014214    .023887    .003230
  Boundary Kernel     .090833    .013616    .021867    .003202
  LL method           .093012    .013934    .022586    .003172
  J&F method          .092976    .013373    .022017    .003178
  C&H method          .091860    .027231    .035669    .005177

Density 4: f(x) = 2e^{-2x}, x >= 0, with n = 200, h = .342128
                      Bias        Var        MSE        MISE
  New Method         -.417383    .012278    .186487    .011051
  H&P method         -.163295    .048959    .075624    .006007
  Z,K&J method       -.111601    .025951    .038406    .005838
  Boundary Kernel    -.122779    .044610    .059684    .006358
  LL method          -.124788    .043427    .058999    .005948
  J&F method         -.087819    .051707    .059419    .006785
  C&H method         -.117587    .036915    .050741    .025894
Figure 1: Ten typical estimates of density 1, f(x) = (x^2/2)e^{-x}, x >= 0, with the optimal global bandwidth h = .832553. (Eight panels: New Method, Kernel, Boundary, LL, H&P, J&F, Z,K&J, C&H.)

Figure 2: Ten typical estimates of density 2, f(x) = sqrt(2/pi) e^{-x^2/2}, x >= 0, with the optimal global bandwidth h = .707481. (Same panel layout as Figure 1.)

Figure 3: Ten typical estimates of density 3, f(x) = (1/2)(2x^2 + 4x + 1)e^{-2x}, x >= 0, with the optimal global bandwidth h = .467044. (Same panel layout as Figure 1.)

Figure 4: Ten typical estimates of density 4, f(x) = 2e^{-2x}, x >= 0, with the optimal global bandwidth h = .342128. (Same panel layout as Figure 1.)

Figure 5: Density estimates for 35 measurements of average December precipitation in Des Moines, Iowa from 1961-1995, shown on the rug, with h = 0.425. The solid line is our proposed estimator (without BVF) and the dashed line is Hall and Park's.

Figure 6: Density estimates for 365 measurements of wind speed at Eindhoven, the Netherlands, taken at 2 AM every day during 1993. The values are plotted on the bottom. The solid line is our proposed estimator (without BVF) and the dashed line is the LL estimator, both with bandwidth h = 2.5.