Bandwidth selectors performance through SiZer Map

Martínez-Miranda, M.D.¹, Raya-Miranda, R.¹, González-Manteiga, W.² and González-Carmona, A.¹

¹ Department of Statistics and R.O., University of Granada, Spain. mmiranda@ugr.es, rraya@ugr.es, andresgc@ugr.es
² Department of Statistics and R.O., University of Santiago de Compostela, Spain. wenceslao@usc.es

Summary. In this paper, we extend the graphical tool SiZer Map to additive models estimated by backfitting, marginal integration and efficient mixed methods, in order to evaluate the behaviour of several bandwidth selectors. The strategy consists of visualizing their position inside the colour space defined by the maps. A simulation study considering different bivariate additive regression models illustrates and describes a convenient use of SiZer to provide meaningful comparisons among bandwidth selectors.

Key words: Additive Model, Backfitting, Marginal Integration, SiZer Map, Smoothing Parameter

1 Introduction

Additive models in nonparametric regression ([PE01]) are formulated by considering the following regression model:

$$m(x) = \alpha + \sum_{d=1}^{D} m_d(x_d), \tag{1}$$

where $\alpha$ is a constant term, $x = (x_1, \ldots, x_D)^T$ is the $D$-dimensional vector of predictor variables and $\{m_d, d = 1, \ldots, D\}$ denotes a set of unknown univariate smooth functions with $E[m_d(X_d)] = 0$.

Different methods have been proposed to estimate additive models, most notably the backfitting algorithm ([BHT89]) and the marginal integration method ([LN95]). The paper [SLH99] gives a comparative study of both methods, showing that, in general, neither can be considered definitively superior to the other. Other, efficient estimation methods for additive models combine backfitting and marginal integration. In this sense, [L97] and [KLH99] propose two-step estimators, i.e., estimators that use starting estimates provided by marginal integration within the backfitting algorithm.
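To make model (1) and the backfitting idea more concrete, the following is a minimal Python sketch, not the implementation used in this paper. It assumes local linear smoother matrices with a Gaussian kernel and fixed bandwidths; the function names `local_linear_matrix` and `backfit`, the bandwidth values and the toy data are illustrative assumptions.

```python
import numpy as np

def local_linear_matrix(x, h):
    """n x n local linear smoother matrix S (Gaussian kernel); fitted values are S @ y."""
    diff = x[None, :] - x[:, None]            # diff[i, j] = x_j - x_i
    w = np.exp(-0.5 * (diff / h) ** 2)        # kernel weights around each x_i
    s1 = (w * diff).sum(axis=1, keepdims=True)
    s2 = (w * diff ** 2).sum(axis=1, keepdims=True)
    num = w * (s2 - diff * s1)                # unnormalised local linear weights
    return num / num.sum(axis=1, keepdims=True)

def backfit(X, y, bandwidths, n_iter=50, tol=1e-8):
    """Backfitting for an additive model: cycle over components, smoothing partial residuals."""
    n, D = X.shape
    alpha = y.mean()                          # estimate of the constant term
    S = [local_linear_matrix(X[:, d], bandwidths[d]) for d in range(D)]
    m = np.zeros((n, D))                      # component estimates at the data points
    for _ in range(n_iter):
        m_old = m.copy()
        for d in range(D):
            partial = y - alpha - m.sum(axis=1) + m[:, d]
            m[:, d] = S[d] @ partial
            m[:, d] -= m[:, d].mean()         # empirical version of E[m_d(X_d)] = 0
        if np.max(np.abs(m - m_old)) < tol:
            break
    return alpha, m

# Toy data (assumed for illustration only): two additive components plus noise.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, size=(n, 2))
y = 1.0 + np.sin(2 * np.pi * X[:, 0]) + 4 * (X[:, 1] - 0.5) ** 2 + rng.normal(0, 0.2, n)

alpha_hat, m_hat = backfit(X, y, bandwidths=[0.1, 0.1])
```

Here `m_hat[:, d]` holds the estimated d-th component evaluated at the observed covariate values; the centering step inside the loop is one common way to impose the identifiability condition.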
SiZer Map is a graphical tool for exploring the features of the data that underlie the curve estimation problem, and it can be used to evaluate the behaviour of several bandwidth selectors by visualizing their position inside the colour space defined by the map.

2 Notation and preliminary definitions

Let us assume that

$$Y_i = \alpha + \sum_{d=1}^{D} m_d(X_{di}) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{2}$$

where $\{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, with $X_i = (X_{1i}, \ldots, X_{Di})^T$, is a set of independent and identically distributed random variables. The residuals, $\varepsilon_i$, are independent and identically distributed with mean 0 and variance $\sigma^2(X_i)$.

Under the additive regression model (2), denote by $S_d$ the $n \times n$ smoother matrix with respect to the $d$-th covariate. The nonparametric estimates of the component functions, $m_d$, can be obtained by solving the system of normal equations

$$\begin{pmatrix} I & S_1 & \cdots & S_1 \\ S_2 & I & \cdots & S_2 \\ \vdots & \vdots & \ddots & \vdots \\ S_D & S_D & \cdots & I \end{pmatrix} \begin{pmatrix} m_1 \\ m_2 \\ \vdots \\ m_D \end{pmatrix} = \begin{pmatrix} S_1 \\ S_2 \\ \vdots \\ S_D \end{pmatrix} Y. \tag{3}$$

The backfitting algorithm ([BHT89]) provides an iterative solution of (3). [O00] derived explicit expressions for the estimators when the matrices $S_d$ are based on linear smoothers; they can be written as $\hat{m}_d = W_d Y$, $d = 1, \ldots, D$, with $W_d$ being a matrix of weights (see [O00] for more details).

From a different perspective, the marginal integration method estimates each component $m_d(\cdot)$ by integrating a pilot multivariate smoother of $m$ with respect to a $(D-1)$-dimensional probability measure. The so-called empirical marginal integration estimator is based on the $D$-dimensional Nadaraya-Watson estimator, $\tilde{m}(x)$. Briefly, define the partition $X_i \equiv (X_{di}, X_{-d,i})$ (here $X_{-d,i}$ denotes the $(D-1)$-dimensional vector obtained by removing $X_{di}$ from $X_i$); the empirical marginal integration estimator of the $d$-th component, $m_d(\cdot)$, is then computed as

$$\hat{\gamma}_d(x_d) = n^{-1} \sum_{j=1}^{n} \tilde{m}(x_d, X_{-d,j}).$$

The additive reconstruction is then

$$\hat{m}_{IM}(x) = \sum_{d=1}^{D} \hat{\gamma}_d(x_d).$$

Recently, [KLH99] proposed an efficient oracle estimator, defined by inserting the empirical marginal integration estimator described above into a backfitting algorithm but taking one step only. The method first constructs the responses

$$Y_{di}^{2\text{-step}} = Y_i - \sum_{j \neq d}^{D} \hat{\gamma}_j(X_{ji}, h_0), \tag{4}$$

with $h_0$ being a pilot scalar bandwidth, and afterwards applies the univariate local polynomial smoother to the pairs $\{(X_{di}, Y_{di}^{2\text{-step}}), i = 1, \ldots, n\}$ in order to estimate the $d$-th component by

$$\hat{m}_d^{2\text{-step}}(x_d) = \sum_{i=1}^{n} w_i^{LP}(x_d, h_d)\, Y_{di}^{2\text{-step}}. \tag{5}$$

Here, $h_d$ ($d = 1, \ldots, D$) denotes the scalar bandwidth for each component, and the weights $w_i^{LP}$ are those associated with the univariate local polynomial smoother of degree $p_d$ (more details of the method can be found in [KLH99]).

3 SiZer Map

SiZer Map ([CM99]) is a graphical tool first introduced in a univariate context and recently generalized to the bidimensional case ([GMC02]). SiZer allows us to explore the features of the data that underlie the curve estimation problem. By using different colours (or different tones of grey in black and white versions), we can show where the target curve increases or decreases significantly at different smoothing levels. In this paper, we use the SiZer Map to evaluate the behaviour of several bandwidth selectors by visualizing their position inside the colour space defined by the map.

SiZer is used to determine the significance of features of a target curve, such as peaks and valleys, by considering a family of smoothers, $\{\hat{m}(x; h) : h \in [h_{\min}, h_{\max}]\}$. The procedure involves the construction of confidence intervals for the derivative, $m'(x; h)$, over the space defined by the location, $x$, and the smoothing parameter, $h$.

The extension of SiZer Map to a multidimensional context presents serious difficulties, even with regard to the graphical representation of the maps.
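The univariate procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the original SiZer implementation: it uses a local linear smoother with a Gaussian kernel, a pointwise normal quantile rather than a simultaneous one, and a variance estimate built from squared pilot residuals. The names `ll_weights` and `sizer_map` and the toy data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def ll_weights(x0, x, h):
    """Local linear weights at x0: row 0 estimates the level, row 1 the first derivative."""
    u = x - x0
    w = np.exp(-0.5 * (u / h) ** 2)               # Gaussian kernel weights
    Xd = np.column_stack([np.ones_like(u), u])    # local design matrix
    XtW = (Xd * w[:, None]).T                     # X^T W, shape 2 x n
    return np.linalg.solve(XtW @ Xd, XtW)         # (X^T W X)^{-1} X^T W

def sizer_map(x, y, grid, bandwidths, level=0.05):
    """Return an array with +1 (significant increase), -1 (decrease) or 0 per (h, x0)."""
    q = NormalDist().inv_cdf(1 - level / 2)       # pointwise normal quantile (assumption)
    out = np.zeros((len(bandwidths), len(grid)), dtype=int)
    for a, h in enumerate(bandwidths):
        # Pilot fit at the data points, used only to estimate the residuals.
        fitted = np.array([ll_weights(xi, x, h)[0] @ y for xi in x])
        resid2 = (y - fitted) ** 2
        for b, x0 in enumerate(grid):
            slope_w = ll_weights(x0, x, h)[1]
            deriv = slope_w @ y                   # estimate of m'(x0; h)
            se = np.sqrt(np.sum(slope_w ** 2 * resid2))
            if deriv - q * se > 0:
                out[a, b] = 1
            elif deriv + q * se < 0:
                out[a, b] = -1
    return out

# Toy data (assumed): a clearly increasing trend with small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = 2.0 * x + rng.normal(0.0, 0.1, 200)
grid = np.linspace(0.1, 0.9, 9)
bandwidths = [0.1, 0.3, 1.0]
smap = sizer_map(x, y, grid, bandwidths)
```

Each row of `smap` corresponds to one smoothing level and each column to one location, so the array can be rendered as a colour image with log10(h) on the vertical axis.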
One simple solution is to consider additive models, because these allow us to consider the effect of each covariate separately. Following this idea, an immediate extension to an additive model is to construct as many SiZer Maps as there are covariates. Accordingly, [RMG02] develops the expressions needed to create the SiZer Map for an additive model estimated using the backfitting algorithm. Based on this work, we present the expressions needed to construct the SiZer Map for an additive model estimated using the efficient method ([KLH99]).

Consider a family of efficient estimators for an additive model such as the one considered in this paper, $\{\hat{m}(x; h_1, \ldots, h_D) : h_d \in [h_{d,\min}, h_{d,\max}], d = 1, \ldots, D\}$, and define confidence intervals for the derivative of the component functions, $\hat{m}'_d(x_d; h_d)$, $d = 1, \ldots, D$. The $d$-th map shows the features of the component $m_d$ by means of different colours, in a similar way to that adopted in the univariate context.

The confidence intervals for the derivative of the $d$-th component are written as

$$\hat{m}'_d(x_d; h_d) \pm q \sqrt{\widehat{\mathrm{Var}}\left(\hat{m}'_d(x_d; h_d)\right)},$$

where the quantile, $q$, is calculated from normal approximations.

The expressions for the derivatives of the components and the variances of these derivatives are derived as follows. Let $s^{(r)}_{d,x_d}$, $d = 1, \ldots, D$, be a local polynomial smoother of the $r$-th derivative at $x_d$, where

$$s^{(r)T}_{d,x_d} = r!\, e^T_{r+1} \left(X^T_{x_d} W_{x_d} X_{x_d}\right)^{-1} X^T_{x_d} W_{x_d},$$

with $e_{r+1}$ a $(p_d + 1) \times 1$ vector of zeros with a 1 in the $(r+1)$-th position. Let $S_d^{(r)}$ be the matrix whose $i$-th row is $\left[S_d^{(r)}\right]_i = s^{(r)T}_{d,X_{di}}$.

The vector of local polynomial regression smoothers of the derivative function $m_d^{(r)}$ with respect to $x_d$ can be defined as

$$\hat{m}_d^{(r)} = S_d^{(r)} \left( Y - \sum_{j \neq d}^{D} \hat{\gamma}_j(X_j, h_0) \right) = S_d^{(r)} Y_d^{2\text{-step}}, \tag{6}$$

where $Y_d^{2\text{-step}}$ is defined in (4).

Then, the estimator of the first derivative of the component $m_d$ can be written as $\hat{m}'_d(x_d) = W^*_{d,x_d} Y$, where

$$W^*_{d,x_d} = S_d^{(1)} \left( 1 - \sum_{j \neq d}^{D} w_j^{IM}(x_d, h_d) \right).$$

Therefore

$$\mathrm{Var}\left(\hat{m}'_d(x_d)\right) = W^{*T}_{d,x_d}\, \Sigma\, W^*_{d,x_d},$$

where $\Sigma = \mathrm{diag}(\varepsilon_i^2)$.

The residuals, $\varepsilon_i$, are estimated by $\hat{\varepsilon}_i = Y_i - \hat{m}(X_i)$, for any pilot estimator $\hat{m}$. Substituting these estimates into the matrix $\Sigma$, the previous variance can be estimated as follows:

$$\widehat{\mathrm{Var}}\left(\hat{m}'_d(x_d)\right) = W^{*T}_{d,x_d}\, \hat{\Sigma}\, W^*_{d,x_d}.$$

4 Evaluating bandwidth selectors using SiZer Map

The choice of the smoothing parameter (or parameters) is a crucial technical problem in nonparametric and semiparametric regression because of its direct effect on the performance of the smoothers. Our objective is to evaluate the behaviour of several bandwidth selectors, including local and global selectors, for the two methods of estimation. The novelty and most important feature of this comparison is that it allows us to understand how bandwidth selectors work across the data. This meaningful comparison is made possible by the SiZer tool, which has been extended here to the additive nonparametric regression context.

Among the bandwidth selectors for additive models, we have evaluated global bandwidth selectors based on cross-validation, which is available for backfitting, marginal integration and efficient mixed estimators (presented in Section 2), or on plug-in techniques developed for any of them. The global nature of these selectors leads to smoothing parameter values that are constant over the whole estimation interval.

Local bandwidth selectors for nonparametric smoothers provide notable improvements in the estimation of surfaces by adapting better to the underlying features of the data. Under additivity, local versions of cross-validation have previously been proposed, as well as bootstrap selectors such as the one introduced by [MRGG05], which showed good performance with all additive smoothers.
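As a rough illustration of what a global cross-validation selector does, the following Python sketch chooses a single bandwidth for a univariate local linear smoother by leave-one-out cross-validation. It is a simplified univariate analogue and not the additive-model selectors evaluated in this paper; the function names `local_linear_matrix`, `loo_cv_score` and `cv_bandwidth`, the bandwidth grid and the toy data are assumptions made for the example.

```python
import numpy as np

def local_linear_matrix(x, h):
    """n x n local linear smoother matrix S (Gaussian kernel); fitted values are S @ y."""
    diff = x[None, :] - x[:, None]            # diff[i, j] = x_j - x_i
    w = np.exp(-0.5 * (diff / h) ** 2)
    s1 = (w * diff).sum(axis=1, keepdims=True)
    s2 = (w * diff ** 2).sum(axis=1, keepdims=True)
    num = w * (s2 - diff * s1)
    return num / num.sum(axis=1, keepdims=True)

def loo_cv_score(x, y, h):
    """Leave-one-out CV score via the linear-smoother shortcut (y_i - yhat_i) / (1 - S_ii)."""
    S = local_linear_matrix(x, h)
    loo_resid = (y - S @ y) / (1.0 - np.diag(S))
    return np.mean(loo_resid ** 2)

def cv_bandwidth(x, y, grid):
    """Return the bandwidth in `grid` that minimises the CV score, plus all scores."""
    scores = np.array([loo_cv_score(x, y, h) for h in grid])
    return grid[int(np.argmin(scores))], scores

# Toy data (assumed): an oscillating signal with noise of standard deviation 0.5.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(5 * np.pi * x) + rng.normal(0.0, 0.5, 200)

grid = np.logspace(np.log10(0.03), 0.0, 20)   # candidate global bandwidths
h_cv, cv_scores = cv_bandwidth(x, y, grid)
```

The selected `h_cv` is a single value applied over the whole range of `x`, which is exactly the limitation that local selectors try to overcome by letting the bandwidth vary with location.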
For backfitting smoothers, we have considered two global bandwidth selectors, a plug-in selector and another based on cross-validation (both proposed in [OR98]), and two local selectors, the bootstrap selector proposed by [MRGG05] and a local cross-validation selector (an extension of that proposed by [V91] in the unidimensional case), together with the theoretical optimum (the minimizer of the MSE criterion). For the efficient estimator obtained by the mixed method, we have evaluated the local bootstrap selector and the local optimum.

In this spirit, we have carried out a simulation study considering three bivariate additive regression models. To keep the document short, we present only the results for one of the models considered, which is given by

$$m_1(x_1) = 1 - 6x_1 + 36x_1^2 - 53x_1^3 + 22x_1^5 \quad \text{and} \quad m_2(x_2) = \sin(5\pi x_2).$$

The explanatory variables were generated from independent normal distributions with mean 0.5 and variance 1/9. The residuals were generated from a normal distribution with mean zero and constant variance 0.25. With these definitions we generated 100 samples of size $n = 100$. The local linear smoother involved in the estimations was calculated with a Gaussian kernel, $K(x) = (2\pi)^{-1/2} \exp(-x^2/2)$.

Figures 1 and 2 show the SiZer maps for the backfitting estimators ([OR98]) and the smoothers derived from the mixed estimation method presented above ([KLH99]).

[Figure 1: for each component, a Family Plot (top) and a SiZer Plot (bottom); the SiZer Plots show log10(h1) against x and log10(h2) against z.]

Fig. 1. SiZer Map for the simulated data with components estimated by Backfitting.

[Figure 2: for each component, a Family Plot (top) and a SiZer Plot (bottom); the SiZer Plots show log10(h1) against x and log10(h2) against z.]

Fig. 2. SiZer Map for the simulated data and estimated components using the Efficient method.

Each figure consists of two types of graphs, the so-called Family Plot and the SiZer map, both constructed for each component in the model. The Family Plot allows us to compare different choices of smoothing level for estimating the components (represented by the blue curves shown), and it also includes estimates obtained with various specific smoothing parameters. Specifically, the black curve is associated with the local bootstrap selector; the blue one represents the optimal local theoretical bandwidth; the red curve is the estimate with a local cross-validation bandwidth; green shows the global plug-in selector; and yellow is used to plot estimates with a global cross-validation bandwidth.

SiZer maps, for each additive component, represent the range of the given covariate on the horizontal axis, and the smoothing levels are displayed on the vertical axis. Each panel shows curves associated with the evaluated bandwidths, both local and global: the solid black line represents the plug-in selector and the dashed one shows the global cross-validation bandwidth. The solid white curve is the local bootstrap bandwidth, the white dotted curve is associated with the optimal local theoretical bandwidth, and the white dashed curve represents a local cross-validation bandwidth.

References

[BHT89] Buja, A., Hastie, T.J., Tibshirani, R.: Linear Smoothers and Additive Models (with discussion). Ann. Statist. 17, 453–555 (1989)
[CM99] Chaudhuri, P., Marron, J.S.: SiZer for Exploration of Structures in Curves. J. Amer. Statist. Assoc. 94, 807–823 (1999)
[GMC02] Godtliebsen, F., Marron, J.S., Chaudhuri, P.: Significance in scale space for bivariate density estimation. J. Comput. Graphical Statist. 11, 1–22 (2002)
[GMP04] González-Manteiga, W., Martínez-Miranda, M.D., Pérez-González, A.: The Choice of Smoothing Parameter in Nonparametric Regression through Wild Bootstrap. Comput. Statist. Data Anal. 47, 487–515 (2004)
[HT90] Hastie, T.J., Tibshirani, R.: Generalized additive models. Chapman & Hall (1990)
[KO04] Kauermann, G., Opsomer, J.D.: Generalized cross-validation for bandwidth selection of backfitting estimates in generalized additive models. J. Comput. Graphical Statist. 13, 66–89 (2004)
[KLH99] Kim, W., Linton, O.B., Hengartner, N.W.: A computationally efficient oracle estimator for additive nonparametric regression with bootstrap confidence intervals. J. Comput. Graphical Statist. 8, 278–297 (1999)
[L97] Linton, O.B.: Efficient estimation of additive nonparametric regression models. Biometrika 84, 469–473 (1997)
[LN95] Linton, O.B., Nielsen, J.P.: A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82, 93–100 (1995)
[MRGG05] Martínez-Miranda, M.D., Raya-Miranda, R., González-Manteiga, W., González-Carmona, A.: SiZer Map for evaluating a bootstrap local bandwidth selector in nonparametric additive models. Technical Report 05-01. Universidad de Santiago de Compostela (2005)
[O00] Opsomer, J.D.: Asymptotic properties of backfitting estimators. J. Multivariate Anal. 73, 166–179 (2000)
[OR98] Opsomer, J.D., Ruppert, D.: A Fully Automated Bandwidth Selection Method for Fitting Additive Models. J. Amer. Statist. Assoc. 93, 605–619 (1998)
[RMG02] Raya-Miranda, R., Martínez-Miranda, M.D., González-Carmona, A.: Exploring the structure of regression surfaces by using SiZer Map for additive models. Proceedings in Computational Statistics. XV COMPSTAT. Statistics Netherlands (2002)
[SLH99] Sperlich, S., Linton, O.B., Härdle, W.: Integration and Backfitting Methods in Additive Models - Finite Sample Properties and Comparison. Test 8, 419–458 (1999)
[V91] Vieu, P.: Nonparametric regression: Optimal local bandwidth choice. J. Roy. Statist. Soc. 53, 453–464 (1991)