A Locally Adaptive Transformation Method of Boundary Correction in Kernel Density Estimation

R.J. Karunamuni^a and T. Alberts^b

^a Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1
^b Courant Institute of Mathematical Sciences, New York University, New York, NY, USA 10012

Abstract

Kernel smoothing methods are widely used in many research areas in statistics. However, kernel estimators suffer from boundary effects when the support of the function to be estimated has finite endpoints. Boundary effects seriously affect the overall performance of the estimator. In this article, we propose a new method of boundary correction for univariate kernel density estimation. Our technique is based on a data transformation that depends on the point of estimation. The proposed method possesses desirable properties such as local adaptivity and non-negativity. Furthermore, unlike many other transformation methods available, the proposed estimator is easy to implement. In a Monte Carlo study, the accuracy of the proposed estimator is numerically analyzed and compared with the existing methods of boundary correction. We find that it performs well for most shapes of densities. The theory behind the new methodology, along with the bias and variance of the proposed estimator, is presented. Results of a data analysis are also given.

Keywords: density estimation; mean squared error; kernel estimation; transformation.

Short Title: Kernel Density Estimation

AMS Subject Classifications: Primary: 62G07; Secondary: 62G20

Corresponding Author: R.J.Karunamuni@ualberta.ca

1 Introduction

Nonparametric kernel density estimation is now popular and in wide use with great success in statistical applications. Kernel density estimates are commonly used to display the shape of a data set without relying on a parametric model, and they readily expose features such as skewness, multimodality and dispersion. Early results on kernel density estimation are due to Rosenblatt (1956) and Parzen (1962). Since then, much research has been done in the area; see, e.g., the monographs of Silverman (1986) and Wand and Jones (1995).

Let f denote a probability density function with support [0, ∞) and consider nonparametric estimation of f based on a random sample X_1, ..., X_n from f. The traditional kernel estimator of f is given by

$$ f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), \qquad (1.1) $$

where K is some chosen unimodal density function, symmetric about zero with support traditionally on [−1, 1], and h is the bandwidth (h → 0 as n → ∞). The basic properties of f_n are well known; under some smoothness assumptions they include, for x ≥ 0,

$$ E f_n(x) = f(x) \int_{-1}^{c} K(t)\,dt - h f^{(1)}(x) \int_{-1}^{c} t K(t)\,dt + \frac{h^2}{2} f^{(2)}(x) \int_{-1}^{c} t^2 K(t)\,dt + o(h^2), $$

$$ \operatorname{Var} f_n(x) = \frac{f(x)}{nh} \int_{-1}^{c} K^2(t)\,dt + o\!\left(\frac{1}{nh}\right), $$

where c = min(x/h, 1). For x ≥ h, the so-called interior points, the bias of f_n(x) is of order O(h^2), whereas at boundary points, i.e. for x ∈ [0, h), f_n is not even consistent. In nonparametric curve estimation problems this phenomenon is referred to as the "boundary effects" of (1.1).
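The boundary effect is easy to reproduce numerically. The following minimal sketch (not part of the original simulations; it uses (1.1) with the Epanechnikov kernel introduced formally in Section 3) shows that for Exp(1) data the traditional estimator settles near f(0)/2 at the origin:

```python
# Minimal sketch: the traditional estimator (1.1) roughly halves the density
# at the endpoint x = 0 when f is supported on [0, infinity).
import numpy as np

rng = np.random.default_rng(0)

def epanechnikov(t):
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def f_n(x, data, h):
    """Traditional kernel estimator (1.1) at the points in x."""
    return epanechnikov((x - data[:, None]) / h).mean(axis=0) / h

n, h = 200, 0.3
x = np.array([0.0, 2.0 * h])   # a boundary point and an interior point
# average over replications to approximate E f_n(x) for Exp(1) data
est = np.mean([f_n(x, rng.exponential(size=n), h) for _ in range(500)], axis=0)
print(est)   # ~0.5 at x = 0 (true f(0) = 1); close to f(2h) in the interior
```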
To remove these boundary effects, a variety of methods have been developed in the literature. Some well-known methods are summarized below:

• The reflection method (Cline and Hart, 1991; Schuster, 1985; Silverman, 1986).

• The boundary kernel method (Gasser and Müller, 1979; Gasser et al., 1985; Jones, 1993; Müller, 1991; Zhang and Karunamuni, 2000).

• The transformation method (Marron and Ruppert, 1994; Ruppert and Cline, 1994; Wand et al., 1991).

• The pseudo-data method (Cowling and Hall, 1996).

• The local linear method (Cheng et al., 1997; Cheng, 1997; Zhang and Karunamuni, 1998).

• Other methods (Zhang et al., 1999; Hall and Park, 2002).

The reflection method is specifically designed for the case f^{(1)}(0) = 0, where f^{(1)} denotes the first derivative of f. The boundary kernel method is more general than the reflection method in the sense that it can adapt to any shape of density. However, a drawback of this method is that the estimates might be negative near the endpoints, especially when f(0) ≈ 0. To correct this deficiency of boundary kernel methods some remedies have been proposed; see Jones (1993), Jones and Foster (1996), Gasser and Müller (1979) and Zhang and Karunamuni (1998). The local linear method is a special case of the boundary kernel method that is thought of by some as a simple, hard-to-beat default approach, partly because of the "optimal" theoretical properties (Cheng et al., 1997) of the boundary kernel implicit in local linear fitting. The pseudo-data method of Cowling and Hall (1996) generates some extra data X_{(-i)} using what they call the "three-point rule"; the pseudo-data are then combined with the original data X_i to form a kernel-type estimator. Marron and Ruppert's (1994) transformation method consists of a three-step process. First, a transformation g is selected from a parametric family so that the density of Y_i = g(X_i) has a first derivative that is approximately equal to 0 at the boundaries of its support. Next, a kernel estimator with reflection is applied to the Y_i. Finally, this estimator is converted by the change-of-variables formula to obtain an estimate of f.
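As a rough illustration of this three-step recipe (and not of Marron and Ruppert's actual data-driven selection rule for g), the following sketch transforms the data through a fixed, hypothetical monotone g, applies a reflection estimator to the transformed sample, and converts back via f(x) = f_Y(g(x)) g^{(1)}(x):

```python
# Sketch of the transform / reflect / back-transform recipe; the quadratic
# g below is a hypothetical stand-in for a data-driven parametric choice.
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def reflected_kde(y, sample, h):
    """Reflection estimator at 0: average of K((y-Y_i)/h) + K((y+Y_i)/h)."""
    u = epanechnikov((y - sample[:, None]) / h)
    v = epanechnikov((y + sample[:, None]) / h)
    return (u + v).mean(axis=0) / h

g = lambda x: x + 0.4 * x**2     # hypothetical monotone transformation
g1 = lambda x: 1.0 + 0.8 * x     # its derivative g^(1)

rng = np.random.default_rng(1)
data = rng.exponential(size=200)
x = np.linspace(0.0, 1.0, 5)
f_est = reflected_kde(g(x), g(data), h=0.3) * g1(x)   # change of variables
print(f_est)
```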
Among other methods, two very promising recent ones are due to Zhang et al. (1999) and Hall and Park (2002). The former method is a combination of the methods of pseudo-data, transformation and reflection, whereas the latter method is based on what they call an "empirical translation correction".

The boundary kernel and related methods usually have low bias, but the price for that is an increase in variance. It has been observed that approaches involving only kernel modifications, without regard to f or the data, such as the boundary kernel method, are always associated with larger variance. Furthermore, the corresponding estimates tend to take negative values near the boundary points. On the other hand, transformation-based boundary correction estimates are non-negative and generally have low variance, possibly due to non-negativity. Researchers have gradually come to recognize that this non-negativity property is important in practical applications and that approaches producing non-negative estimators are well worth exploring.

In this article, we develop a new transformation approach for boundary correction. Our method consists of a straightforward transformation of data. It is easy to implement in practice compared to other existing transformation methods (compare with Marron and Ruppert, 1994). A distinct feature of our method is that it is locally adaptive; that is, the transformation depends on the point of estimation. Other desirable properties of the proposed estimator are: (i) it is non-negative everywhere, and (ii) it reduces to the traditional kernel estimator (1.1) at the interior points.

In simulations, we compared the proposed estimator with other existing methods of boundary correction and observed that it performs quite well for most shapes of densities. Section 2 contains the methodology and the main results. Sections 3 and 4 present simulation studies and a data analysis, respectively. Final comments are given in Section 5.

2 Locally Adaptive Transformation Estimator

2.1 Methodology

For convenience, we shall assume that the unknown probability density function f has support [0, ∞), and consider estimation of f based on a random sample X_1, ..., X_n from f. Our transformation idea is based on transforming the original data X_1, ..., X_n to g(X_1), ..., g(X_n), where g is a non-negative, continuous and monotonically increasing function from [0, ∞) to [0, ∞). Based on the transformed data, we now define, for x = ch, c ≥ 0,

$$ \tilde f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - g(X_i)}{h}\right) \Big/ \int_{-1}^{c} K(t)\,dt, \qquad (2.1) $$

where h is the bandwidth (again h → 0 as n → ∞) and K is a non-negative symmetric kernel function with support on [−1, 1], satisfying ∫K(t)dt = 1, ∫tK(t)dt = 0 and 0 ≠ ∫t²K(t)dt < ∞. We can now state the following lemma, which exhibits the bias and variance of (2.1) under certain conditions on g and f. The proof is given in the Appendix.

Lemma 2.1. Let f̃_n be defined by (2.1). Assume that f^{(2)} and g^{(2)} exist and are continuous on [0, ∞), where f^{(i)} and g^{(i)} denote the ith derivatives of f and g, respectively, with f^{(0)} = f, g^{(0)} = g. Further assume that g(0) = 0 and g^{(1)}(0) = 1. Then for x = ch, 0 ≤ c ≤ 1, we have

$$ E\tilde f_n(x) - f(x) = \frac{-h}{\int_{-1}^{c} K(t)\,dt} \left\{ f(0)g^{(2)}(0) \int_{-1}^{c} (c - t)K(t)\,dt + f^{(1)}(0) \int_{-1}^{c} tK(t)\,dt \right\} $$
$$ \qquad + \frac{h^2}{2\int_{-1}^{c} K(t)\,dt} \left\{ -f^{(2)}(0)c^2 \int_{-1}^{c} K(t)\,dt + \int_{-1}^{c} (t - c)^2 K(t)\,dt \left[ f^{(2)}(0) - f(0)g^{(3)}(0) - 3g^{(2)}(0)\big(f^{(1)}(0) - f(0)g^{(2)}(0)\big) \right] \right\} + o(h^2), \qquad (2.2) $$

and

$$ \operatorname{Var} \tilde f_n(x) = \frac{f(0)}{nh\left(\int_{-1}^{c} K(t)\,dt\right)^2} \int_{-1}^{c} K^2(t)\,dt + o\!\left(\frac{1}{nh}\right) = \frac{f(x)}{nh\left(\int_{-1}^{c} K(t)\,dt\right)^2} \int_{-1}^{c} K^2(t)\,dt + o\!\left(\frac{1}{nh}\right). \qquad (2.3) $$

The last equality follows since f(0) = f(x) − chf^{(1)}(x) + (ch)²f^{(2)}(x)/2 + o(h²) for x = ch (see (A.3)). Note that the leading term of the variance of f̃_n is not affected by the transformation g. When c = 1, Var f̃_n(x) = f(x)(nh)^{-1} ∫_{-1}^{1} K²(t)dt + o((nh)^{-1}), which is exactly the expansion of the interior variance of the traditional estimator (1.1).

We shall choose our transformation g so that the first-order term in the bias expansion (2.2) is zero. Assuming that f(0) > 0, it is enough to let

$$ g^{(2)}(0) = -f^{(1)}(0) \int_{-1}^{c} tK(t)\,dt \Big/ f(0) \int_{-1}^{c} (c - t)K(t)\,dt. \qquad (2.4) $$

Note that the right-hand side of (2.4) depends on c; that is, the transformation g depends on the point of estimation inside the boundary region [0, h). We call this property local adaptivity. In order to display this local dependence we shall denote g by g_c, 0 ≤ c ≤ 1, in what follows. Combining (2.4) with the additional assumptions in Lemma 2.1, g_c should now satisfy the following three conditions for each c, 0 ≤ c ≤ 1:

(i) g_c : [0, ∞) → [0, ∞), g_c is continuous and monotonically increasing, and g_c^{(i)} exists, i = 1, 2, 3;

(ii) g_c(0) = 0, g_c^{(1)}(0) = 1;

(iii) $ g_c^{(2)}(0) = -f^{(1)}(0) \int_{-1}^{c} tK(t)\,dt \big/ f(0) \int_{-1}^{c} (c - t)K(t)\,dt. $   (2.5)

Functions satisfying conditions (2.5) can be easily constructed, with the simplest candidates being polynomials. In order to satisfy property (iii) of (2.5) and maintain g_c as an increasing function we require at least a cubic (for the case f^{(1)}(0) < 0). So for 0 ≤ c ≤ 1 we use the transformation

$$ g_c(y) = y + \frac{d}{2}\, l_c\, y^2 + d^2 l_c^2\, y^3, \qquad (2.6) $$

where

$$ l_c = -\int_{-1}^{c} tK(t)\,dt \Big/ \int_{-1}^{c} (c - t)K(t)\,dt \qquad (2.7) $$

and

$$ d = f^{(1)}(0) \big/ f(0). \qquad (2.8) $$

When c = 1, g_c(y) = y. This means that f̃_n(x) defined by (2.1) with g_c of (2.6) reduces to the usual kernel estimator f_n(x) defined by (1.1) at interior points; i.e., for x ≥ h, f̃_n(x) coincides with f_n(x).
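For a concrete kernel, the constant l_c of (2.7) is a one-dimensional integral ratio and is trivial to tabulate. A small sketch for the Epanechnikov kernel (introduced formally in Section 3; plain Riemann sums stand in for exact integration, and d = f^{(1)}(0)/f(0) is treated as known here):

```python
# Sketch: the local constant l_c of (2.7) and the cubic transformation (2.6)
# for the Epanechnikov kernel; d = f'(0)/f(0) is treated as known here.
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def l_c(c, m=100_000):
    t = np.linspace(-1.0, c, m)
    k = epanechnikov(t)
    # the common grid spacing cancels in the ratio of the two integrals
    return -np.sum(t * k) / np.sum((c - t) * k)

def g_c(y, c, d):
    lc = l_c(c)
    return y + 0.5 * d * lc * y**2 + (d * lc)**2 * y**3   # equation (2.6)

print(l_c(1.0))                 # ~0: at interior points g_1(y) = y
print(l_c(0.0))                 # largest adjustment at the endpoint x = 0
print(g_c(0.2, c=0.5, d=-1.0))  # transformed data point for x = h/2
```

For the Epanechnikov kernel these evaluate to l_1 = 0 and l_0 = 1, so the transformation is strongest at the endpoint and vanishes smoothly in the interior.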
Higher-order polynomials could be used to obtain the same properties, but they are increasingly more complicated to deal with and offer very little marginal advantage. For g_c defined by (2.6), the bias term of (2.2) takes the form

$$ E\tilde f_n(x) - f(x) = \frac{h^2}{2\int_{-1}^{c} K(t)\,dt} \left\{ f^{(2)}(0) \int_{-1}^{c} (t^2 - 2tc)K(t)\,dt + 6\,\frac{(f^{(1)}(0))^2}{f(0)} \int_{-1}^{c} (t - c)^2 K(t)\,dt\,(l_c^2 + l_c) \right\} + o(h^2), \qquad (2.9) $$

for x = ch, 0 ≤ c ≤ 1. Note that when c = 1,

$$ E\tilde f_n(x) - f(x) = \frac{h^2}{2} f^{(2)}(x) \int_{-1}^{1} t^2 K(t)\,dt + o(h^2), $$

which is exactly the same expression as the interior bias of the traditional kernel estimator (1.1), since f^{(2)}(x) = f^{(2)}(0) + o(1) for x = ch (see (A.5)).

2.2 Estimation of g_c

In order to implement the transformation (2.6) in practice, one must replace d = f^{(1)}(0)/f(0) with a pilot estimate. This requires the use of another density estimation method, such as a semiparametric, kernel or nearest-neighbour method. Our experience is that the proposed estimator of f given below is insensitive to the fine details of the pilot estimate of d, and therefore any convenient estimate can be employed. A natural estimate would be a fixed kernel estimate with bandwidth chosen optimally at x = 0, as in Zhang et al. (1999). Other possible estimators of d are proposed in Choi and Hall (1999) and Park et al. (2003). Following Zhang et al. (1999), in this paper we use the kernel-type pilot estimate of d given by

$$ \hat d = \big(\log f_n^*(h_1) - \log f_n^*(0)\big) \big/ h_1, \qquad (2.10) $$

where

$$ f_n^*(h_1) = \frac{1}{nh_1} \sum_{i=1}^{n} K\!\left(\frac{h_1 - X_i}{h_1}\right) + \frac{1}{n^2} \qquad (2.11) $$

and

$$ f_n^*(0) = \max\left\{ \frac{1}{nh_0} \sum_{i=1}^{n} K_{(0)}\!\left(\frac{-X_i}{h_0}\right),\ \frac{1}{n^2} \right\}, \qquad (2.12) $$

where h_1 = o(h) with h as in (2.1), K is a usual symmetric kernel function as in (1.1), K_{(0)} is a so-called endpoint order-two kernel satisfying

$$ \int_{-1}^{0} K_{(0)}(t)\,dt = 1, \qquad \int_{-1}^{0} tK_{(0)}(t)\,dt = 0, \qquad 0 < \int_{-1}^{0} t^2 K_{(0)}(t)\,dt < \infty, $$

and h_0 = b(0)h_1 with b(0) given by

$$ b(0) = \left\{ \frac{\left(\int_{-1}^{1} t^2 K(t)\,dt\right)^2 \left(\int_{-1}^{0} K_{(0)}^2(t)\,dt\right)}{\left(\int_{-1}^{0} t^2 K_{(0)}(t)\,dt\right)^2 \left(\int_{-1}^{1} K^2(t)\,dt\right)} \right\}. \qquad (2.13) $$

We now define, for 0 ≤ c ≤ 1,

$$ \hat g_c(y) = y + \frac{\hat d}{2}\, l_c\, y^2 + \hat d^2 l_c^2\, y^3 \qquad (2.14) $$

as our estimator of g_c(y) defined by (2.6), where d̂ is given by (2.10). Note that ĝ_c and d̂ depend on n, but this dependence is suppressed for notational convenience.

It has been argued in the literature (see, e.g., Park et al., 2003) that one may use a bandwidth h_1 different from h for the pilot estimator of d, and that h_1 should tend to zero at the same rate as h or faster in order to inherit the full advantages of the proposed density estimator. This adjustment also improves the asymptotic theoretical results of our proposed density estimator f̂_n(x) given below at (2.15), and should act as the basis for the choice of h_1 in (2.10) and (2.11).

2.3 The Proposed Estimator

Our proposed estimator of f(x) is defined as, for x = ch, c ≥ 0,

$$ \hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) \Big/ \int_{-1}^{c} K(t)\,dt, \qquad (2.15) $$

where ĝ_c is given by (2.14), K is as given in (2.1), and h is the bandwidth used in (2.1). For x ≥ h, f̂_n(x) reduces to the traditional kernel estimator f_n(x) given by (1.1). Thus f̂_n(x) is a natural boundary continuation of the usual kernel estimator (1.1).
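Putting the pieces together, a compact sketch of (2.15) looks as follows; here the value d_hat stands in for the pilot estimate d̂ of (2.10), whose endpoint-kernel construction is not reproduced:

```python
# Sketch of the proposed estimator (2.15); d_hat plays the role of the pilot
# estimate (2.10), which is omitted here for brevity.
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def f_hat(x, data, h, d_hat, m=20_000):
    """Locally adaptive transformation estimator (2.15) at a single x >= 0."""
    c = min(x / h, 1.0)
    t = np.linspace(-1.0, c, m)
    k = epanechnikov(t)
    lc = -np.sum(t * k) / np.sum((c - t) * k)             # l_c of (2.7)
    g = data + 0.5 * d_hat * lc * data**2 + (d_hat * lc)**2 * data**3  # (2.14)
    norm = np.sum(k) * (t[1] - t[0])                      # int_{-1}^{c} K(t) dt
    return epanechnikov((x - g) / h).mean() / (h * norm)

rng = np.random.default_rng(2)
data = rng.exponential(size=200)
h, d_hat = 0.3, -1.0   # for Exp(1), d = f'(0)/f(0) = -1 (estimated in practice)
print([round(f_hat(x, data, h, d_hat), 3) for x in (0.0, 0.5 * h, h, 2.0 * h)])
```

At x ≥ h one has c = 1 and l_c ≈ 0, so the sketch reproduces the ordinary estimator (1.1), in line with the boundary-continuation property noted above.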
An important feature of the estimator (2.15) is that it is based on the locally adaptive transformation ĝ_c given by (2.14): ĝ_c transforms the data differently depending on the point of estimation. Furthermore, it is important to remark here that the adaptive kernel estimator (2.15) is non-negative (provided K is non-negative), a property shared with reflection-based estimators (see Jones and Foster, 1996; Zhang et al., 1999) and transformation-based estimators (see Marron and Ruppert, 1994; Zhang et al., 1999; Hall and Park, 2002), but not with most boundary kernel approaches. The next theorem establishes the bias and variance of (2.15).

Theorem 2.1. Let f̂_n(x) be defined by (2.15) with a kernel function K as given in (2.1) and a bandwidth h = O(n^{-1/5}). Suppose h_1 in (2.10) is of the form h_1 = O(n^{-1/4}). Further assume that K^{(1)} exists and is continuous on [−1, 1], and that f(0) > 0 and f^{(2)} exists and is continuous in a neighbourhood of 0. Then for x = ch, 0 ≤ c ≤ 1, we have

$$ E\hat f_n(x) - f(x) = \frac{h^2}{2\int_{-1}^{c} K(t)\,dt} \left\{ f^{(2)}(0) \int_{-1}^{c} (t^2 - 2tc)K(t)\,dt + 6\,\frac{(f^{(1)}(0))^2}{f(0)} \int_{-1}^{c} (t - c)^2 K(t)\,dt\,(l_c^2 + l_c) \right\} + o(h^2), \qquad (2.16) $$

where l_c is given by (2.7), and

$$ \operatorname{Var} \hat f_n(x) = \frac{f(0)}{nh\left(\int_{-1}^{c} K(t)\,dt\right)^2} \int_{-1}^{c} K^2(t)\,dt + o\!\left(\frac{1}{nh}\right). \qquad (2.17) $$

Remark 2.1. As one of the referees correctly pointed out, the choice of bandwidth is very important for the good performance of any kernel estimator. The optimal bandwidth for (2.15) can be derived as follows. From (2.16) and (2.17), the leading terms of the asymptotic MSE of f̂_n(x) at x = ch, 0 ≤ c ≤ 1, are

$$ \frac{h^4}{4}\left(\int_{-1}^{c} K(t)\,dt\right)^{-2} \left\{ f^{(2)}(0)\int_{-1}^{c}(t^2 - 2tc)K(t)\,dt + 6(f(0))^{-1}(f^{(1)}(0))^2 \int_{-1}^{c}(t - c)^2 K(t)\,dt\,(l_c^2 + l_c) \right\}^2 + (nh)^{-1} f(0)\left(\int_{-1}^{c} K(t)\,dt\right)^{-2} \int_{-1}^{c} K^2(t)\,dt. $$

Minimizing the preceding expression with respect to h yields

$$ h_c = n^{-1/5} \left\{ \left[ 4f(0)\int_{-1}^{c} K^2(t)\,dt \right]\left[ f^{(2)}(0)\int_{-1}^{c}(t^2 - 2tc)K(t)\,dt + 6(f(0))^{-1}(f^{(1)}(0))^2 \int_{-1}^{c}(t - c)^2 K(t)\,dt\,(l_c^2 + l_c) \right]^{-2} \right\}^{1/5}, \qquad (2.18) $$

which is the local optimal bandwidth at x = ch, 0 ≤ c ≤ 1, for the transformation estimator (2.15). Expression (2.18) is quite similar to that of the "adaptive variable location" kernel estimator of Park et al. (2003). To see this more clearly, we find from their paper that the leading terms of the asymptotic MSE of their adaptive location kernel estimator at x = ch, 0 ≤ c ≤ 1, are

$$ \frac{h^4}{4}\left(\int_{-c}^{1} K(t)\,dt\right)^{-2} \left\{ f^{(2)}(x)\left[\int_{-c}^{1} t^2 K(t)\,dt + 2\rho(c)\int_{-c}^{1} tK^{(1)}(t)\,dt\right] + (f(x))^{-1}(f^{(1)}(x))^2 \rho(c)^2 \int_{-c}^{1} K^{(2)}(t)\,dt \right\}^2 + (nh)^{-1} f(x)\left(\int_{-c}^{1} K(t)\,dt\right)^{-2} \int_{-c}^{1} K^2(t)\,dt, $$

where ρ(c) = (K(c))^{-1} ∫_{-c}^{1} tK(t)dt. Minimizing the preceding expression with respect to h yields

$$ h_c^* = n^{-1/5} \left\{ \left[ 4f(x)\int_{-c}^{1} K^2(t)\,dt \right]\left[ f^{(2)}(x)\left(\int_{-c}^{1} t^2 K(t)\,dt + 2\rho(c)\int_{-c}^{1} tK^{(1)}(t)\,dt\right) + (f(x))^{-1}(f^{(1)}(x))^2 \rho(c)^2 \int_{-c}^{1} K^{(2)}(t)\,dt \right]^{-2} \right\}^{1/5}, \qquad (2.19) $$

which is the local optimal bandwidth at x = ch, 0 ≤ c ≤ 1, for Park et al.'s (2003) adaptive estimator. It is easy to see the similarity between (2.18) and (2.19). In applications, however, neither expression can be employed directly, due to their dependence on the unknown functional quantities f(x), f^{(1)}(x) and f^{(2)}(x). An alternative approach is to apply the "bandwidth variation function" technique that is frequently implemented in boundary kernel methods.

"Adaptive bandwidth" or "variable window width" kernel density estimators are generally of the form

$$ \tilde f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} \frac{1}{\alpha(X_i)} K\!\left(\frac{x - X_i}{h\alpha(X_i)}\right), \qquad (2.20) $$

where α is some non-negative function; see, e.g., Abramson (1982), Hall et al. (1995) and Terrell and Scott (1992), among others.
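A sketch of (2.20) under Abramson's (1982) square-root law follows; the geometric-mean normalization of the pilot is one common choice among several, not a prescription from this paper:

```python
# Sketch of the variable window width estimator (2.20) with Abramson's
# square-root law: alpha(X_i) ~ (pilot density at X_i)^(-1/2).
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def fixed_kde(x, data, h):
    return epanechnikov((x - data[:, None]) / h).mean(axis=0) / h

def variable_kde(x, data, h):
    pilot = fixed_kde(data, data, h)          # pilot estimate at the data
    alpha = (pilot / np.exp(np.mean(np.log(pilot))))**(-0.5)
    u = (x - data[:, None]) / (h * alpha[:, None])
    return (epanechnikov(u) / alpha[:, None]).mean(axis=0) / h

rng = np.random.default_rng(3)
data = rng.exponential(size=200)
print(variable_kde(np.array([0.5, 1.0, 2.0]), data, h=0.4))
```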
It is easy to show that the estimator (2.20) also exhibits boundary effects when the unknown density f is supported on [0, ∞) or [0, 1] and a kernel K with support on [−1, 1] is used. To the best of our knowledge, no boundary-adjusted adaptive bandwidth or variable window width kernel estimators exist in the literature. A transformation technique based on such an estimator would be

$$ \bar f_n(x) = \frac{1}{nh \int_{-1}^{c} K(t)\,dt} \sum_{i=1}^{n} \frac{1}{\alpha(g(X_i))} K\!\left(\frac{x - g(X_i)}{h\alpha(g(X_i))}\right), \qquad (2.21) $$

where g is a suitable transformation, like (2.6), that removes the boundary effects. However, the details of estimators of the form (2.21) are beyond the scope of the present paper.

Remark 2.2. Smoothing methods based on local linear and local polynomial fitting are now popular. In the density estimation context the local polynomial method has been implemented in Cheng et al. (1997), Cheng (1997) and Zhang and Karunamuni (1998), among others. For example, Zhang and Karunamuni (1998) have studied the following local polynomial density estimator (a very similar estimator is given in Cheng (1997) and Cheng et al. (1997)):

$$ f_n^*(x) = \frac{1}{nh} \sum_{i=1}^{n} K_o^*\!\left(\frac{x - X_i}{h}\right), \qquad (2.22) $$

where K_o^*(t) = e_o^T S^{-1} (1, t, ..., t^p)^T K(t), S = (s_{j,l}) is a p × p matrix with s_{j,l} = ∫ t^{j+l} K(t)dt, 0 ≤ j, l ≤ p, and e_o = (0, 1, ..., 0)^T is a p × 1 vector whose second element is 1 and whose other elements are zero. Then the bias and variance of (2.22) at x = ch, 0 ≤ c ≤ 1, are

$$ \operatorname{Bias}(f_n^*(x)) = \frac{1}{(p+1)!}\,(b(c)h)^{p+1} f^{(p+1)}(x) \int_{-1}^{c/b(c)} t^{p+1} K^*_{o,c/b(c)}(t)\,dt $$

and

$$ \operatorname{Var}(f_n^*(x)) = \frac{1}{nhb(c)}\, f(x) \int_{-1}^{c/b(c)} \big(K^*_{o,c/b(c)}(t)\big)^2\,dt, $$

where b : [0, 1] → R⁺ with b(1) = 1 is a bandwidth variation function and K^*_{o,c} is called an "equivalent boundary kernel"; see Zhang and Karunamuni (1998) for more details.

The asymptotic MSE of f_n^*(x) is typically less than that of f̂_n(x) given by (2.15) in a weak minimax sense; see Cheng et al. (1997). However, local polynomial estimators are based on the so-called equivalent boundary kernels K^*_{o,c}, which are not always positive. As a result, local polynomial estimators may take negative values, especially where the unknown density is very small. At the crucial point x = 0, if f(0) ≈ 0 then f_n^*(0) may be negative; see, e.g., Figure 1. This unattractive feature of local polynomial density estimators has been criticized by many authors; see, e.g., Zhang et al. (1999) and Hall and Park (2002). On the other hand, the proposed estimator f̂_n(x) given by (2.15) is always non-negative whenever the kernel K is non-negative. Thus, as far as applications are concerned, we recommend the use of f̂_n(x) over f_n^*(x), even though the latter possesses some good theoretical properties.

3 Simulations and Discussion

To test the effectiveness of our estimator we simulated its performance against that of other well-known methods. Among these were some relatively new techniques, such as a recent estimator due to Hall and Park (2002), which is very similar to our own, and the transformation and reflection based method of Zhang et al. (1999). We also compared against some more classical estimators, namely the boundary kernel and its close counterpart the local linear fitting method, Jones and Foster's (1996) nonnegative adaptation estimator, and Cowling and Hall's (1996) pseudo-data method.

The first of the competing estimators is due to Hall and Park (2002) (hereafter H&P). It is defined as, for x = ch, 0 ≤ c ≤ 1,

$$ \hat f_{HP}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i + \hat\alpha(x)}{h}\right) \Big/ \int_{-1}^{c} K(t)\,dt, \qquad (3.1) $$
where α̂(x) is a correction term given by

$$ \hat\alpha(x) = h^2\, \frac{\tilde f'(x)}{\bar f(x)}\, \rho\!\left(\frac{x}{h}\right), $$

and

$$ \bar f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) \Big/ \int_{-1}^{c} K(t)\,dt, \qquad (3.2) $$

which is referred to as the cut-and-normalized kernel estimator, f̃'(x) is an estimate of f^{(1)}(x), and ρ(u) = K(u)^{-1} ∫_{v≤u} vK(v)dv. To estimate f^{(1)}(x) we used the endpoint kernel of order (1, 2) (see Zhang and Karunamuni, 1998) given by

$$ K_{1,c}(t) = \frac{12\big(2c(1 + t) - 3t^2 - 4t - 1\big)}{(c + 1)^4}\, I_{[-1,c]}(t), $$

and the corresponding kernel estimator

$$ \tilde f'(x) = \frac{1}{nh_c^2} \sum_{i=1}^{n} K_{1,c/b(c)}\!\left(\frac{x - X_i}{h_c}\right), $$

where h_c = b(c)h with the bandwidth variation function b(c) = 2^{1/5}(1 − c) + c for 0 ≤ c ≤ 1. Note that Hall and Park's estimator is a special case of the estimator (2.1), but with ĝ_c(y) = y − α̂(ch), where x = ch for 0 ≤ c ≤ 1. This ĝ_c, however, does not necessarily satisfy the conditions of (2.5).

The method of Zhang et al. (1999) (hereafter Z,K&J) applies a transformation to the data and then reflects it across the left endpoint of the support of the density. The resulting kernel estimator is of the form

$$ \hat f_{ZKJ}(x) = \frac{1}{nh} \sum_{i=1}^{n} \left\{ K\!\left(\frac{x - X_i}{h}\right) + K\!\left(\frac{x + g_n(X_i)}{h}\right) \right\}, \qquad (3.3) $$

where g_n(x) = x + d_n x² + A d_n² x³. Note that this g_n is very similar to our own, the main difference being that it does not depend on the point of estimation. The only requirement on A is that 3A > 1, and in practice we used the recommended value of A = .55. Here d_n is again an estimate of d = f^{(1)}(0)/f(0), and the methodology used is the same as that given by (2.10) to (2.13).

The boundary kernel estimator with bandwidth variation is defined (see Zhang and Karunamuni, 1998) as

$$ \hat f_B(x) = \frac{1}{nh_c} \sum_{i=1}^{n} K_{(c/b(c))}\!\left(\frac{x - X_i}{h_c}\right), \qquad (3.4) $$

with c = min(x/h, 1). On the boundary the bandwidth variation h_c = b(c)h is employed, here with b(c) = 2 − c. We used the boundary kernel

$$ K_{(c)}(t) = \frac{12}{(1 + c)^4}\,(1 + t)\left[(1 - 2c)t + \frac{3c^2 - 2c + 1}{2}\right] I_{[-1,c]}(t). \qquad (3.5) $$

Zhang and Karunamuni (1998) have shown that this kernel is optimal in the sense of minimizing the MSE in the class of all kernels of order (0, 2) with exactly one change of sign in their support.

A simple modification of the boundary kernel gives us the local linear fitting method (LL). We used the kernel

$$ K_{(c)}(t) = \frac{12(1 - t^2)\left\{8 - 16c + 24c^2 - 12c^3 + t(15 - 30c + 15c^2)\right\}}{(1 + c)^4 (3c^2 - 18c + 19)}\, I_{[-1,c]}(t) \qquad (3.6) $$

in (3.4), and call the resulting estimator f̂_LL(x). In this case the bandwidth variation function is b(c) = 1.86174 − .86174c.

The Jones and Foster (1996) nonnegative adaptation estimator (hereafter J&F) is defined as

$$ \hat f_{JF}(x) = \bar f(x)\, \exp\left\{ \frac{\hat f(x)}{\bar f(x)} - 1 \right\}, \qquad (3.7) $$

where f̄(x) is the cut-and-normalized kernel estimator of (3.2). This version of the J&F estimator takes f̂ to be some boundary kernel estimate of f; here we use f̂(x) = f̂_B(x) as defined by (3.4) and (3.5).

Cowling and Hall's (1996) pseudo-data estimator generates data beyond the left endpoint of the support of the density. Their estimator (hereafter C&H) is defined as

$$ \hat f_{CH}(x) = \frac{1}{nh}\left[ \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) + \sum_{i=1}^{m} K\!\left(\frac{x - X_{(-i)}}{h}\right) \right], \qquad (3.8) $$

where m is an integer of larger order than nh but of smaller order than n; we chose m = n^{9/10}. Further, X_{(-i)} is given by their three-point rule

$$ X_{(-i)} = -5X_{(i/3)} - 4X_{(2i/3)} + \frac{10}{3} X_{(i)}, $$

where, for real t > 0, X_{(t)} linearly interpolates among the values 0, X_{(1)}, ..., X_{(n)}, and X_{(i)} represents the ith order statistic. Throughout our simulations we used the Epanechnikov kernel K(t) = (3/4)(1 − t²) I_{[−1,1]}(t) whenever a kernel of order two was required.
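For reference, a sketch of the boundary kernel estimator (3.4) with the kernel (3.5) and bandwidth variation b(c) = 2 − c follows; as discussed in Section 1, such estimates can dip negative near the origin:

```python
# Sketch of the boundary kernel estimator (3.4) with kernel (3.5) and
# bandwidth variation b(c) = 2 - c.
import numpy as np

def K_c(t, c):
    """Boundary kernel (3.5); equals the Epanechnikov kernel when c = 1."""
    body = 12.0 / (1 + c)**4 * (1 + t) * ((1 - 2*c) * t + (3*c**2 - 2*c + 1) / 2)
    return np.where((t >= -1.0) & (t <= c), body, 0.0)

def f_B(x, data, h):
    """Boundary kernel estimator at a single point x >= 0."""
    c = min(x / h, 1.0)
    hc = (2.0 - c) * h                       # bandwidth variation b(c) = 2 - c
    return K_c((x - data) / hc, c / (2.0 - c)).mean() / hc

rng = np.random.default_rng(4)
data = rng.exponential(size=200)
h = 0.3
print([round(f_B(x, data, h), 3) for x in (0.0, 0.5 * h, h, 2.0 * h)])
```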
Note that outside the boundary, the majority of the estimators presented reduce to the regular kernel method (1.1) with the Epanechnikov kernel. The only exception is Cowling and Hall's, which is a close approximation to the kernel estimator away from the origin.

Each estimator was tested over various shapes of density curves, but we have collapsed our results into four specific densities which are representative of the different possible combinations of the behaviour of f at 0. Density 1 handles the case f(0) = 0, while Densities 2, 3 and 4 satisfy f(0) > 0 with f^{(1)}(0) = 0, f^{(1)}(0) > 0 and f^{(1)}(0) < 0, respectively. In all simulations a sample size of n = 200 was used. The bandwidth was taken to be the optimal global bandwidth of the regular kernel estimator (1.1), given by the formula

$$ h = \left\{ \frac{\int_{-1}^{1} K(t)^2\,dt}{\left[\int_{-1}^{1} t^2 K(t)\,dt\right]^2 \int \left[f^{(2)}(x)\right]^2\,dx} \right\}^{1/5} n^{-1/5} \qquad (3.9) $$

(see Silverman, 1986, Ch. 3). This particular choice was made so that it is possible to compare the different estimators without regard to bandwidth effects. In our estimation of d we used the bandwidth h_1 = hn^{-1/20}.
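As a check on the tabulated bandwidths, (3.9) can be evaluated in closed form for these densities. For the Epanechnikov kernel, ∫K² = 3/5 and ∫t²K = 1/5, and for Density 4, f(x) = 2e^{−2x}, one has ∫(f^{(2)}(x))²dx = 16:

```python
# Check of (3.9) for Density 4 with the Epanechnikov kernel:
# int K^2 = 3/5, (int t^2 K)^2 = 1/25, int (f'')^2 dx = 16 for f = 2exp(-2x).
n = 200
h = (0.6 / ((1.0 / 25.0) * 16.0))**0.2 * n**(-0.2)
print(h)   # ~0.3421, matching h = .342128 used for Density 4 in Table 1
```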
For each density we have calculated the bias, variance and MSE of the estimators at zero, along with the MISE over the region [0, h). The results are averaged over 1000 trials and are presented in Table 1. We have also graphed ten typical realizations of each estimator, along with the regular kernel estimator, over the boundary region and part of the interior. These are presented in Figures 1 through 4.

Table 1 and Figures 1 - 4 about here.

In Density 1, with f(0) = 0, our estimator behaves quite well at zero. It is only slightly behind Hall and Park's in terms of MSE at zero. In terms of MISE the boundary kernel puts in the best showing, followed closely by the LL method and J&F. It should be noted, though, that the boundary kernel and LL method are usually negative near zero. Despite its good showing at zero, the H&P method comes in fourth in MISE as it slightly underestimates the density over [0, h). In fifth and sixth place are the Z,K&J and C&H methods. In last place in the MISE column is our estimator, which, from Figure 1, is evidently the result of severely underestimating f to the right of zero. We believe that error in the estimate d̂ of d is the reason behind this. When f(0) = 0, d̂ may be inaccurate and introduce error into our method. Future work may be done on creating better estimates of d in this situation.

For Density 2, with f^{(1)}(0) = 0 and f(0) > 0, our estimator is the clear winner in terms of MSE at zero and MISE over the boundary. This is also obvious from Figure 2. It is somewhat surprising that Hall and Park's estimator is much weaker for this density, especially given that both it and our estimator are special cases of the same general form (2.1).

Density 3 proves to be difficult for all of the estimators. This is most evident from Figure 3, in which we see a large amount of variability. In terms of MISE all the estimators perform similarly, except for H&P, which does slightly worse than the rest, and C&H, which does poorly over the whole boundary. At zero, our estimator is the best overall, but only by a slim margin this time. In terms of MISE it is the LL and J&F methods that are the winners, but again only by a small amount.

Our estimator has its poorest showing on the exponential shape of Density 4. At zero our estimator produces an extremely high MSE value relative to the others, and even in terms of MISE it is handily beaten (except by C&H). From Figure 4 we see the underlying reason: the inability of our estimator to accurately capture the steep growth of the exponential density. The Z,K&J estimator is quite accurate at zero and over the entire boundary, and the boundary kernel, LL method, J&F and H&P estimators do not lag far behind. Even Cowling and Hall's estimator does well at zero, but pays the price for it in MISE over the rest of the boundary region.

4 Data Analysis

We tested our proposed estimator on two datasets. The first consists of 35 measurements of the average December precipitation in Des Moines, Iowa from 1961 to 1995. The bandwidth h = 0.425 was produced by cross-validation (see Bowman and Azzalini, 1997). Figure 5 shows our proposed estimator (solid line) along with Hall and Park's (dashed line). Over most of the boundary region they are similar, but around zero they disagree on the shape of the density. Hall and Park's estimator suggests a local minimum to the right of zero, but the histogram and the rug show otherwise. A larger value of h would probably diminish this behaviour. We have observed that the boundary kernel, LL method and J&F estimators tend to approximate the density linearly over the boundary region, with little regard for curvature.

Figure 5 about here.

The second dataset is comprised of 365 wind speed measurements taken daily in 1993 at 2 AM in Eindhoven, the Netherlands (source: Royal Netherlands Meteorological Institute). Intuition suggests that the underlying density f should be similar to Density 3, with f(0) > 0 and f^{(1)}(0) > 0, so that there is substantial mass at zero but the mode is slightly to the right. We used the bandwidth h = 2.5, which we chose subjectively. Values suggested by cross-validation typically seem to undersmooth this data, hence the subjective choice. We have observed that for this dataset the shape of the estimators is fairly insensitive to the choice of bandwidth, provided that it is large enough, and so we believe our choice is justified. Figure 6 shows our proposed estimator (solid line) and the LL method (dashed line) applied to the wind data. Both estimators capture the hypothesized shape of the density and are similar over the boundary region. However, our estimator clearly produces a smoother curve.

Figure 6 about here.

5 Concluding Remarks

Marron and Ruppert's (1994) excellent work on kernel density estimation was among the first to introduce a transformation technique for boundary correction (also see Wand et al., 1991). Their sophisticated transformation methodology, however, has been labelled "complicated" by some researchers; see, e.g., Jones and Foster (1996). In this paper, we presented an easy-to-implement, general transformation method given by (2.1). This class of estimators possesses certain desirable properties, such as being everywhere non-negative and having
unit integral asymptotically, and can be tailored specifically to combat the boundary effects found in the traditional kernel estimator. The proposed estimator defined by (2.15) and the boundary-corrected estimator of Hall and Park (2002) are special cases of the preceding general technique. Beyond this, both estimators are locally adaptive in the sense that the amount of transformation depends upon the point of estimation, and, further, both transformations are estimated from the data itself. We believe the latter point is especially important, as it allows for substantial reductions in bias while at the same time maintaining low variance. The transformation that we have chosen at (2.14) is convenient and computationally easy, but it is by no means the result of an exhaustive search of functions satisfying the conditions (2.5). In fact, we conjecture that it may be possible to construct different transformations satisfying these conditions but with much improved performance, in the sense that they will be more robust to the shape of the density.

Acknowledgements: This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada. We wish to thank the Editor, the Associate Editors and the referees for very constructive and useful comments that led to a much improved presentation. We also thank Drs. B.U. Park and S. Zhang for some personal correspondence.

A Appendix: Proofs

Proof of Lemma 2.1: For x = ch, 0 ≤ c ≤ 1, we have from (2.1)

$$ E\tilde f_n(x) \int_{-1}^{c} K(t)\,dt = \frac{1}{h}\, E\,K\!\left(\frac{x - g(X_i)}{h}\right) = \frac{1}{h}\int_0^\infty K\!\left(\frac{x - g(y)}{h}\right) f(y)\,dy = \int_{-1}^{c} K(t)\, \frac{f(g^{-1}((c - t)h))}{g^{(1)}(g^{-1}((c - t)h))}\,dt. \qquad (A.1) $$

Letting q(t) = g^{-1}((c − t)h), we have

$$ q^{(1)}(t) = \frac{-h}{g^{(1)}(q(t))}, \qquad q^{(2)}(t) = \frac{-h^2\, g^{(2)}(q(t))}{[g^{(1)}(q(t))]^3}, $$

so that a Taylor expansion of order 2 of the function f(q(·))/g^{(1)}(q(·)) at t = c, together with q(c) = g^{-1}(0) = 0 and g^{(1)}(q(c)) = g^{(1)}(0) = 1, gives

$$ \frac{f(q(t))}{g^{(1)}(q(t))} = f(0) - (t - c)h\big[f^{(1)}(0) - f(0)g^{(2)}(0)\big] + \frac{h^2(t - c)^2}{2}\big[f^{(2)}(0) - f(0)g^{(3)}(0) - 3g^{(2)}(0)\big(f^{(1)}(0) - f(0)g^{(2)}(0)\big)\big] + o(h^2). \qquad (A.2) $$

Also, by the existence and continuity of f^{(2)}(·) near 0, for x = ch we have

$$ f(0) = f(x) - chf^{(1)}(x) + \frac{(ch)^2}{2} f^{(2)}(x) + o(h^2), \qquad (A.3) $$

$$ f^{(1)}(x) = f^{(1)}(0) + chf^{(2)}(0) + o(h), \qquad (A.4) $$

and

$$ f^{(2)}(x) = f^{(2)}(0) + o(1). \qquad (A.5) $$

Now from (A.1) to (A.5), we obtain for x = ch, 0 ≤ c ≤ 1,

$$ E\tilde f_n(x) \int_{-1}^{c} K(t)\,dt = \int_{-1}^{c} K(t)\,dt \left( f(x) - chf^{(1)}(x) + \frac{(ch)^2}{2} f^{(2)}(x) + o(h^2) \right) - h \int_{-1}^{c} (t - c)K(t)\,dt\, \big(f^{(1)}(0) - f(0)g^{(2)}(0)\big) $$
$$ \qquad + \frac{h^2}{2} \int_{-1}^{c} (t - c)^2 K(t)\,dt\, \big\{ f^{(2)}(0) - f(0)g^{(3)}(0) - 3g^{(2)}(0)\big(f^{(1)}(0) - f(0)g^{(2)}(0)\big) \big\} $$
$$ = f(x) \int_{-1}^{c} K(t)\,dt - h \left\{ \big(f^{(1)}(0) - f(0)g^{(2)}(0)\big) \int_{-1}^{c} tK(t)\,dt + f(0)g^{(2)}(0)\, c \int_{-1}^{c} K(t)\,dt \right\} $$
$$ \qquad + \frac{h^2}{2} \left\{ \left(\int_{-1}^{c} (t - c)^2 K(t)\,dt\right)\big[f^{(2)}(0) - f(0)g^{(3)}(0) - 3g^{(2)}(0)\big(f^{(1)}(0) - f(0)g^{(2)}(0)\big)\big] - c^2 f^{(2)}(0) \int_{-1}^{c} K(t)\,dt \right\} + o(h^2). \qquad (A.6) $$

The proof of (2.2) is now completed by dividing both sides of (A.6) by ∫_{-1}^{c} K(t)dt.

In order to prove (2.3), first note that from (2.1),

$$ \operatorname{Var} \tilde f_n(x) = \left(\int_{-1}^{c} K(t)\,dt\right)^{-2} \operatorname{Var}\left\{ \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - g(X_i)}{h}\right) \right\} = \left(\int_{-1}^{c} K(t)\,dt\right)^{-2} \frac{1}{nh^2}\left\{ E\,K^2\!\left(\frac{x - g(X_i)}{h}\right) - \left(E\,K\!\left(\frac{x - g(X_i)}{h}\right)\right)^2 \right\} = \left(\int_{-1}^{c} K(t)\,dt\right)^{-2} (I_1 + I_2), \qquad (A.7) $$

where

$$ I_1 = \frac{1}{nh^2}\, E\,K^2\!\left(\frac{x - g(X_i)}{h}\right) = \frac{1}{nh} \int_{-1}^{c} K^2(t)\, \frac{f(g^{-1}((c - t)h))}{g^{(1)}(g^{-1}((c - t)h))}\,dt = \frac{f(0)}{nh} \int_{-1}^{c} K^2(t)\,dt + o\!\left(\frac{1}{nh}\right), \qquad (A.8) $$

from (A.2). Similarly,

$$ I_2 = -\frac{1}{nh^2}\left( E\,K\!\left(\frac{x - g(X_i)}{h}\right)\right)^2 = -\frac{1}{nh^2}\left( h \int_{-1}^{c} K(t)\, \frac{f(g^{-1}((c - t)h))}{g^{(1)}(g^{-1}((c - t)h))}\,dt \right)^2 = o\!\left(\frac{1}{nh}\right), \qquad (A.9) $$

again from (A.2).
By combining (A.7) to (A.9) we complete the proof of (2.3).

Lemma A.1. Let d̂ be defined by (2.10) with h = O(n^{-1/5}) and h_1 = O(n^{-1/4}). Suppose that f(x) > 0 for x = 0, h and that f^{(2)} is continuous in a neighbourhood of 0. Then

$$ E\left[\, \big|\hat d - d\big|^3 \,\Big|\, X_i = x_i,\, X_j = x_j \right] = O(h_1^3) $$

for any integers i, j, 1 ≤ i, j ≤ n, where d is given by (2.8).

Proof. Similar to the proof of Lemma A.2 of Zhang, Karunamuni and Jones (1999).

Proof of Theorem 2.1: For x = ch, 0 ≤ c ≤ 1, we write

$$ E\hat f_n(x) - f(x) = I_3 + I_4, \qquad (A.10) $$

where I_3 = E f̂_n(x) − E f̃_n(x) and I_4 = E f̃_n(x) − f(x), with f̃_n(x) given by (2.1). From Lemma 2.1, we obtain |I_4| = o(h²) when g is chosen to satisfy (2.4). By an application of Taylor's expansion of order 1 on K, we see that I_3 satisfies

$$ |I_3| \le \frac{\left(\int_{-1}^{c} K(t)\,dt\right)^{-1}}{nh} \sum_{i=1}^{n} E\left| K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) - K\!\left(\frac{x - g_c(X_i)}{h}\right) \right| $$
$$ = \frac{\left(\int_{-1}^{c} K(t)\,dt\right)^{-1}}{nh} \sum_{i=1}^{n} E\left| \frac{g_c(X_i) - \hat g_c(X_i)}{h}\, K^{(1)}\!\left(\frac{x - g_c(X_i) + \varepsilon\big(g_c(X_i) - \hat g_c(X_i)\big)}{h}\right) \right|, \qquad (A.11) $$

where 0 < ε < 1 is a constant. Note that for any constants d and l_c we have, for any y ≥ 0,

$$ g_c(y) = y + \frac{d}{2} l_c y^2 + (dl_c)^2 y^3 = y\left[\left(dl_c y + \frac{1}{4}\right)^2 + \frac{15}{16}\right] \ge \frac{15}{16}\, y. $$

Thus g_c(y) ≥ h for y ≥ (16/15)h. Therefore, εĝ_c(X_i) + (1 − ε)g_c(X_i) ≥ h for X_i ≥ (16/15)h. Since K vanishes outside [−1, 1], from (A.11) we have

$$ |I_3| \le \frac{\left(\int_{-1}^{c} K(t)\,dt\right)^{-1}}{nh} \sum_{i=1}^{n} E\left\{ \left|\frac{g_c(X_i) - \hat g_c(X_i)}{h}\right| \left| K^{(1)}\!\left(\frac{x - \varepsilon\hat g_c(X_i) - (1 - \varepsilon)g_c(X_i)}{h}\right)\right| I[0 \le X_i \le \rho h] \right\} \le \frac{C}{nh^2} \sum_{i=1}^{n} E\,\big|\hat g_c(X_i) - g_c(X_i)\big|\, I[0 \le X_i \le \rho h], \qquad (A.12) $$

where ρ = 16/15 and C = sup_{|t|≤1} |K^{(1)}(t)|. Now by the definitions of g_c and ĝ_c (see (2.6) and (2.14)) we obtain

$$ E\,\big|\hat g_c(X_i) - g_c(X_i)\big|\, I[0 \le X_i \le \rho h] = E\left| \frac{l_c}{2}(\hat d - d)X_i^2 + l_c^2(\hat d^2 - d^2)X_i^3 \right| I[0 \le X_i \le \rho h] $$
$$ \le \frac{|l_c|}{2}(\rho h)^2\, E\,|\hat d - d|\, I[0 \le X_i \le \rho h] + l_c^2 (\rho h)^3\, E\,|\hat d^2 - d^2|\, I[0 \le X_i \le \rho h]. \qquad (A.13) $$

The Cauchy–Schwarz inequality and Lemma A.1 yield that E[|d̂ − d|^k | X_i = x_i] = O(h_1^k) for 1 ≤ k ≤ 3. From this we have

$$ E\,|\hat d - d|\, I[0 \le X_i \le \rho h] = E\left\{ I[0 \le X_i \le \rho h]\; E\big[|\hat d - d| \,\big|\, X_i\big] \right\} \le O(h_1)\, E\, I[0 \le X_i \le \rho h] = o(h^2), \qquad (A.14) $$

where the last equality follows from the fact that lim_{n→∞} h^{-1} E I[0 ≤ X_i ≤ ρh] = ρf(0) and h_1 = o(h). From the same expression we also obtain that

$$ E\,|\hat d^2 - d^2|\, I[0 \le X_i \le \rho h] = E\,|\hat d - d||\hat d + d|\, I[0 \le X_i \le \rho h] = E\,|\hat d - d||\hat d - d + 2d|\, I[0 \le X_i \le \rho h] $$
$$ \le E\,|\hat d - d|^2 I[0 \le X_i \le \rho h] + 2|d|\, E\,|\hat d - d|\, I[0 \le X_i \le \rho h] = o(h^2). \qquad (A.15) $$

By combining (A.12) to (A.15), we now have |I_3| = o(h²). The proof is now completed by the preceding result, the fact that |I_4| = o(h²), and (A.10).

We now prove (2.17). First write

$$ \left(\int_{-1}^{c} K(t)\,dt\right)^2 \operatorname{Var} \hat f_n(x) = \frac{1}{(nh)^2} \operatorname{Var}\left\{ \sum_{i=1}^{n} K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) \right\} $$
$$ = \frac{1}{(nh)^2} \operatorname{Var}\left\{ \sum_{i=1}^{n}\left[ K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) - K\!\left(\frac{x - g_c(X_i)}{h}\right)\right] + \sum_{i=1}^{n} K\!\left(\frac{x - g_c(X_i)}{h}\right) \right\} = I_5 + I_6 + I_7, \qquad (A.16) $$

where g_c is given by (2.6),

$$ I_5 = \frac{1}{(nh)^2} \operatorname{Var}\left\{ \sum_{i=1}^{n}\left[ K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) - K\!\left(\frac{x - g_c(X_i)}{h}\right)\right] \right\}, \qquad (A.17) $$

$$ I_6 = \frac{1}{(nh)^2} \operatorname{Var}\left\{ \sum_{i=1}^{n} K\!\left(\frac{x - g_c(X_i)}{h}\right) \right\}, \qquad (A.18) $$

and

$$ I_7 = \frac{2}{(nh)^2} \operatorname{Cov}\left( \sum_{i=1}^{n}\left[ K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) - K\!\left(\frac{x - g_c(X_i)}{h}\right)\right],\ \sum_{i=1}^{n} K\!\left(\frac{x - g_c(X_i)}{h}\right) \right). \qquad (A.19) $$

From Lemma 2.1, we have

$$ I_6 = \frac{f(0)}{nh} \int_{-1}^{c} K^2(t)\,dt + o\!\left(\frac{1}{nh}\right). \qquad (A.20) $$

Now consider I_5. Note that

$$ I_5 \le \frac{1}{(nh)^2}\, E\left\{ \sum_{i=1}^{n}\left[ K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) - K\!\left(\frac{x - g_c(X_i)}{h}\right)\right] \right\}^2 $$
$$ = \frac{1}{(nh)^2} \sum_{i=1}^{n} E\left[ K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) - K\!\left(\frac{x - g_c(X_i)}{h}\right)\right]^2 $$
$$ \qquad + \frac{2}{(nh)^2} \sum_{1\le i<j\le n} E\left[ K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) - K\!\left(\frac{x - g_c(X_i)}{h}\right)\right]\left[ K\!\left(\frac{x - \hat g_c(X_j)}{h}\right) - K\!\left(\frac{x - g_c(X_j)}{h}\right)\right] = I_8 + I_9. \qquad (A.21) $$
By an application of Taylor's expansion of order 1 on K, we obtain

$$ I_8 = \frac{1}{(nh)^2} \sum_{i=1}^{n} E\left[ K\!\left(\frac{x - \hat g_c(X_i)}{h}\right) - K\!\left(\frac{x - g_c(X_i)}{h}\right)\right]^2 = \frac{1}{(nh)^2} \sum_{i=1}^{n} E\left[ \frac{g_c(X_i) - \hat g_c(X_i)}{h}\, K^{(1)}\!\left(\frac{x - g_c(X_i) + \varepsilon(g_c(X_i) - \hat g_c(X_i))}{h}\right)\right]^2 $$
$$ \le \frac{1}{n^2h^4} \sum_{i=1}^{n} E\left[ (g_c(X_i) - \hat g_c(X_i))\, K^{(1)}\!\left(\frac{x - g_c(X_i) + \varepsilon(g_c(X_i) - \hat g_c(X_i))}{h}\right)\right]^2 \le \frac{C}{n^2h^4} \sum_{i=1}^{n} E\big(\hat g_c(X_i) - g_c(X_i)\big)^2 I[0 \le X_i \le \rho h], \qquad (A.22) $$

using an argument similar to (A.12) above, where 0 < ε < 1, ρ = 16/15 and C > 0 are all constants independent of n. Similarly to (A.13) we can write

$$ E\big(\hat g_c(X_i) - g_c(X_i)\big)^2 I[0 \le X_i \le \rho h] = E\left[ \frac{l_c}{2}(\hat d - d)X_i^2 + l_c^2(\hat d^2 - d^2)X_i^3 \right]^2 I[0 \le X_i \le \rho h] $$
$$ \le \frac{l_c^2}{2}(\rho h)^4\, E(\hat d - d)^2 I[0 \le X_i \le \rho h] + 2l_c^4(\rho h)^6\, E(\hat d^2 - d^2)^2 I[0 \le X_i \le \rho h] \le O(h^4h_1^2h) + O(h^6h_1^2h) = o(h^7). \qquad (A.23) $$

Now combining (A.21) and (A.23), we obtain I_8 = o(n^{-1}h³) = o((nh)^{-1}). An argument similar to (A.22) yields

$$ |I_9| \le \frac{C}{n^2h^4} \sum_{1\le i<j\le n} E\,\big|\hat g_c(X_i) - g_c(X_i)\big|\,\big|\hat g_c(X_j) - g_c(X_j)\big|\, I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h], \qquad (A.24) $$

where ρ = 16/15 and C > 0 is a constant independent of n. As in (A.23), we obtain using Lemma A.1,

$$ E\,\big|\hat g_c(X_i) - g_c(X_i)\big|\,\big|\hat g_c(X_j) - g_c(X_j)\big|\, I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] $$
$$ = E\left|\frac{l_c}{2}(\hat d - d)X_i^2 + l_c^2(\hat d^2 - d^2)X_i^3\right| \left|\frac{l_c}{2}(\hat d - d)X_j^2 + l_c^2(\hat d^2 - d^2)X_j^3\right| I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] $$
$$ \le C_1\, E\left\{ h^2|\hat d - d| + h^3|\hat d^2 - d^2| \right\}^2 I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] $$
$$ \le 2C_1 h^4\left\{ E\,|\hat d - d|^2 I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] + E\,|\hat d^2 - d^2|^2 I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] \right\} $$
$$ \le C_2 h^4 h_1^2\, E\, I[0 \le X_i \le \rho h,\ 0 \le X_j \le \rho h] = O(h^4h_1^2h^2) = o(h^8), \qquad (A.25) $$

where C_1, C_2 are positive constants independent of n. Now combine (A.24) and (A.25) to conclude that I_9 = o(h⁴) = o((nh)^{-1}). Similarly, it is easy to show that I_7 = o((nh)^{-1}) using the covariance inequality. This completes the proof of (2.17).

References

[1] Abramson, I. (1982). On bandwidth variation in kernel estimates - a square root law. The Annals of Statistics, 10, 1217-1223.

[2] Bowman, A.W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford: Oxford University Press.

[3] Cheng, M.Y. (1997). Boundary-aware estimators of integrated density derivative. Journal of the Royal Statistical Society Ser. B, 59, 191-203.

[4] Cheng, M.Y., Fan, J. and Marron, J.S. (1997). On automatic boundary corrections. The Annals of Statistics, 25, 1691-1708.

[5] Cline, D.B.H. and Hart, J.D. (1991). Kernel estimation of densities with discontinuous derivatives. Statistics, 22, 69-84.

[6] Choi, E. and Hall, P. (1999). Data sharpening as a prelude to density estimation. Biometrika, 86, 941-947.

[7] Cowling, A. and Hall, P. (1996). On pseudodata methods for removing boundary effects in kernel density estimation. Journal of the Royal Statistical Society Ser. B, 58, 551-563.

[8] Gasser, T. and Müller, H.G. (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics 757, eds. T. Gasser and M. Rosenblatt, Heidelberg: Springer-Verlag, pp. 23-68.

[9] Gasser, T., Müller, H.G. and Mammitzsch, V. (1985). Kernels for nonparametric curve estimation. Journal of the Royal Statistical Society Ser. B, 47, 238-252.

[10] Hall, P., Hu, T.C. and Marron, J.S. (1995). Improved variable window kernel estimates of probability densities. The Annals of Statistics, 23, 1-10.

[11] Hall, P. and Park, B.U. (2002). New methods for bias correction at endpoints and boundaries. The Annals of Statistics, 30, 1460-1479.

[12] Jones, M.C.
(1993). Simple boundary correction for kernel density estimation. Statistics and Computing, 3, 135-146.

[13] Jones, M.C. and Foster, P.J. (1996). A simple nonnegative boundary correction method for kernel density estimation. Statistica Sinica, 6, 1005-1013.

[14] Marron, J.S. and Ruppert, D. (1994). Transformations to reduce boundary bias in kernel density estimation. Journal of the Royal Statistical Society Ser. B, 56, 653-671.

[15] Müller, H.G. (1991). Smooth optimum kernel estimators near endpoints. Biometrika, 78, 521-530.

[16] Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33, 1065-1076.

[17] Park, B.U., Jeong, S.O., Jones, M.C. and Kang, K.H. (2003). Adaptive variable location kernel density estimators with good performance at boundaries. Nonparametric Statistics, 15, 61-75.

[18] Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, 27, 832-837.

[19] Royal Netherlands Meteorological Institute (2001). Available wind speed data, http://www.knmi.nl/samenw/hydra/meta_data/time_series.htm, s370.asc, accessed June 23, 2003.

[20] Schuster, E.F. (1985). Incorporating support constraints into nonparametric estimators of densities. Communications in Statistics, Part A - Theory and Methods, 14, 1123-1136.

[21] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.

[22] Terrell, G.R. and Scott, D.W. (1992). Variable kernel density estimation. The Annals of Statistics, 20, 1236-1265.

[23] Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. London: Chapman and Hall.

[24] Wand, M.P., Marron, J.S. and Ruppert, D. (1991). Transformations in density estimation (with discussion). Journal of the American Statistical Association, 86, 343-361.

[25] Zhang, S. and Karunamuni, R.J. (1998). On kernel density estimation near endpoints. Journal of Statistical Planning and Inference, 70, 301-316.

[26] Zhang, S. and Karunamuni, R.J. (2000). On nonparametric density estimation at the boundary. Nonparametric Statistics, 12, 197-221.

[27] Zhang, S., Karunamuni, R.J. and Jones, M.C. (1999). An improved estimator of the density function at the boundary. Journal of the American Statistical Association, 94, 1231-1241.

Table 1: Bias, variance and MSE of the indicated estimators at 0, and their MISE over [0, h).
Density 1: f(x) = (x²/2)e^{−x}, x ≥ 0, with n = 200, h = .832553

                   Bias      Var       MSE       MISE
New Method        .001726   .000026   .000029   .000821
H&P method        .001482   .000023   .000025   .000375
Z,K&J method      .021782   .000078   .000553   .000426
Boundary Kernel  -.017091   .000345   .000637   .000234
LL method        -.018519   .000431   .000774   .000271
J&F method        .011745   .000086   .000223   .000268
C&H method        .026570   .000082   .000788   .000501

Density 2: f(x) = (2/π)^{1/2} e^{−x²/2}, x ≥ 0, with n = 200, h = .707481

                   Bias      Var       MSE       MISE
New Method       -.014177   .005493   .005694   .001994
H&P method        .011316   .015845   .015973   .002879
Z,K&J method      .082133   .011484   .018230   .003481
Boundary Kernel   .052554   .009875   .012637   .002736
LL method         .054271   .009811   .012757   .002616
J&F method        .059196   .010808   .014312   .002915
C&H method        .015713   .015374   .015621   .005709

Density 3: f(x) = (x² + 2x + 1/2)e^{−2x}, x ≥ 0, with n = 200, h = .467044

                   Bias      Var       MSE       MISE
New Method        .049289   .017806   .020235   .003527
H&P method        .031373   .024642   .025626   .003719
Z,K&J method      .098353   .014214   .023887   .003230
Boundary Kernel   .090833   .013616   .021867   .003202
LL method         .093012   .013934   .022586   .003172
J&F method        .092976   .013373   .022017   .003178
C&H method        .091860   .027231   .035669   .005177

Density 4: f(x) = 2e^{−2x}, x ≥ 0, with n = 200, h = .342128

                   Bias      Var       MSE       MISE
New Method       -.417383   .012278   .186487   .011051
H&P method       -.163295   .048959   .075624   .006007
Z,K&J method     -.111601   .025951   .038406   .005838
Boundary Kernel  -.122779   .044610   .059684   .006358
LL method        -.124788   .043427   .058999   .005948
J&F method       -.087819   .051707   .059419   .006785
C&H method       -.117587   .036915   .050741   .025894

[Figures 1-4 appear here in the original; each consists of eight panels (New Method, Kernel, H&P, Z,K&J, Boundary, LL, J&F, C&H). Only the captions are reproduced below.]

Figure 1: Ten typical estimates of density 1, f(x) = (x²/2)e^{−x}, x ≥ 0, with the optimal global bandwidth h = .832553.

Figure 2: Ten typical estimates of density 2, f(x) = (2/π)^{1/2} e^{−x²/2}, with the optimal global bandwidth h = .707481.

Figure 3: Ten typical estimates of density 3, f(x) = (1/2)(2x² + 4x + 1)e^{−2x}, with the optimal global bandwidth h = .467044.

Figure 4: Ten typical estimates of density 4, f(x) = 2e^{−2x}, with the optimal global bandwidth h = .342128.
Figure 5: Density estimates for 35 measurements of average December precipitation in Des Moines, Iowa from 1961-1995, shown on the rug, with h = 0.425. The solid line is our proposed estimator (without BVF) and the dashed line is Hall and Park's.

Figure 6: Density estimates for 365 measurements of wind speed at Eindhoven, the Netherlands, taken at 2 AM every day during 1993. The values are plotted on the bottom. The solid line is our proposed estimator (without BVF) and the dashed line is the LL estimator, both with bandwidth h = 2.5.