Asymptotic properties in a semi-functional partial linear regression model

Germán Aneiros-Pérez¹ and Philippe Vieu²

¹ Departamento de Matemáticas, Facultad de Informática, Universidade da Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain. ganeiros@udc.es
² Laboratoire de Statistique et Probabilités, Université Paul Sabatier, Toulouse 3, 118 route de Narbonne, 31062 Toulouse Cedex, France. vieu@cict.fr

Summary. A new regression model is introduced in order to combine the advantages of semi-linear modelling (see [AGV04]) with those of the recent advances in nonparametric statistics for functional data (see [FV06]). This leads us to the following so-called Semi-Functional Partial Linear Regression model:

$$Y = r(X_1, \dots, X_p, T) + \varepsilon = \sum_{j=1}^{p} X_j \beta_j + m(T) + \varepsilon, \tag{1}$$

where $X_j$ and $T$ are real and functional explanatory variables, respectively. Estimates for the vector of parameters $\beta$ and the function $m$ in (1) are presented and some asymptotic results are given. Specifically, we obtain:

(i) $\sqrt{n}\,(\hat\beta_h - \beta) \longrightarrow N(0, \sigma_\varepsilon^2 B^{-1})$,
(ii) $\limsup_{n\to\infty} \left(n/(2\log\log n)\right)^{1/2} |\hat\beta_{hj} - \beta_j| = (\sigma_\varepsilon^2 b^{jj})^{1/2}$ a.s.,
(iii) $\sup_{t\in C} |\hat m_h(t) - m(t)| = O(h^\alpha) + O\left(\sqrt{\log n/(n\phi(h))}\right)$ a.s.

Finally, a real data example illustrates the usefulness of the model.

1 Introduction

Since the introductory work by [EGRW86], the partial linear model has been widely studied (see [S88], [C88], [S96], [AGV04], and references therein) and its interest has been emphasized in many fields of applied statistics. The aim of such a model is to allow some of the explanatory variables to act in a free nonparametric manner, while the others are controlled by means of a parametric (linear) relation. Until now, this kind of model has only been investigated in the situation where all the explanatory variables take real values. See [HLG00] for a monograph on these models.
On the other hand, functional data are attracting the attention of many researchers, because of their growing importance in the many fields of applied statistics in which the observations are curves. The reader can find the state of the art in parametric modelling for functional data in [RS02] and [RS05], and in nonparametric modelling in [FV06]. In the setting of functional regression problems, the most recent advances are those by [CFS03] (concerning parametric linear models) and those by [FV06] (concerning nonparametric approaches).

The aim of our note is to combine the flexibility of partial linear modelling with the recent methodology for the nonparametric treatment of functional data. This leads us to the following so-called Semi-Functional Partial Linear Regression model (SFPLR model)

$$Y = r(X_1, \dots, X_p, T) + \varepsilon = \sum_{j=1}^{p} X_j \beta_j + m(T) + \varepsilon, \tag{2}$$

where $X_j$ ($j = 1, \dots, p$) are real explanatory variables, $T$ is another explanatory variable but of functional nature, $\varepsilon$ is a random error satisfying $E(\varepsilon \mid X_1, \dots, X_p, T) = 0$, $\beta = (\beta_1, \dots, \beta_p)^T$ is a vector of unknown real parameters and $m$ is an unknown smooth real function. According to [FV06], the most interesting spaces to model functional variables are semi-metric spaces. In other words, we consider that $T$ is valued in some abstract semi-metric space $H$ and we denote by $d(\cdot, \cdot)$ the associated semi-metric. In the rest of the paper all the topological notions used derive from the topology $\mathcal{T}_d$ associated with this semi-metric.

In this note, we study the SFPLR model. More precisely, in Section 2 we construct estimates based on a sample of independent and identically distributed vectors, while in Section 3 we derive their first asymptotic properties. Finally, Section 4 illustrates how our general methodology applies to a spectrometric data set.
2 The model and the estimates

Assume that we have a sample of $n$ independent and identically distributed vectors valued in $\mathbb{R}^{p+1} \times C$ ($C \subset H$). These vectors will be denoted from now on by $\{(Y_i, X_{i1}, \dots, X_{ip}, T_i)\}_{i=1}^{n}$. The SFPLR model can be rewritten by assuming that we have

$$Y_i = \sum_{j=1}^{p} X_{ij} \beta_j + m(T_i) + \varepsilon_i \quad (i = 1, \dots, n), \tag{3}$$

where

$$E(\varepsilon_i \mid X_{i1}, \dots, X_{ip}, T_i) = 0 \quad (i = 1, \dots, n). \tag{4}$$

For technical reasons, we will assume that

$$C \text{ is some given compact subset of } H \text{ such that } C \subset \bigcup_{k=1}^{\tau_n} B(z_k, l_n), \tag{5}$$

where $\tau_n l_n^{\gamma} = C$ ($\gamma$ and $C$ denote real positive constants), and $\tau_n \to \infty$ and $l_n \to 0$ as $n \to \infty$ (we have denoted $B(t, h) = \{t' \in H;\ d(t', t) < h\}$). The compactness of $C$ is a usual condition in the setting of nonfunctional partial linear models (see, e.g., [C88], [BZ97], and [L00]), while the conditions on $\tau_n$ and $l_n$ are usual ones in the setting of functional nonparametric models (see [FV06], Section 9.7).

We estimate the vector of parameters $\beta$ and the function $m$ in (2) by means of

$$\hat\beta_h = \left(\tilde X_h^T \tilde X_h\right)^{-1} \tilde X_h^T \tilde Y_h \tag{6}$$

and

$$\hat m_h(t) = \sum_{i=1}^{n} w_{n,h}(t, T_i)\left(Y_i - X_i^T \hat\beta_h\right), \tag{7}$$

respectively. In these estimators, $h$ is a smoothing parameter of the kind that appears in any setting of nonparametric estimation. Furthermore, we have denoted $X = (X_1, \dots, X_n)^T$ with $X_i = (X_{i1}, \dots, X_{ip})^T$, $Y = (Y_1, \dots, Y_n)^T$ and, for any $(n \times q)$-matrix $A$ ($q \ge 1$), $\tilde A_h = (I - W_h) A$, where $W_h = (w_{n,h}(T_i, T_j))_{i,j}$, with $w_{n,h}(\cdot, \cdot)$ being a weight function that can take different forms. Concretely, in this paper we will focus on the weights

$$w_{n,h}(t, T_i) = \frac{K(d(t, T_i)/h)}{\sum_{j=1}^{n} K(d(t, T_j)/h)}, \tag{8}$$

where $K$ is a function from $[0, \infty)$ into $[0, \infty)$. These weights, used in [FV06] for a purely nonparametric regression model, are a functional version of the Nadaraya-Watson type weights.
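The estimators (6)-(8) can be sketched numerically as follows. This is only an illustration under stated assumptions, not the implementation used by the authors: the curves are assumed to be observed on a common equispaced grid, the semi-metric is taken to be a Riemann-sum approximation of the $L^2$ distance, the kernel is the triangle kernel $K(u) = 1 - u$ on $[0, 1]$ (one possible choice compatible with the kernel restrictions of this section), and the normal equations in (6) are solved by least squares. All function names and the simulated data are hypothetical.

```python
import numpy as np

def l2_semimetric(curves_a, curves_b, dt):
    """Pairwise L2 distances between discretized curves (one curve per row).
    Assumption: common equispaced grid with step dt (Riemann-sum integral)."""
    diff = curves_a[:, None, :] - curves_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2) * dt)

def triangle_kernel(u):
    """K(u) = 1 - u on [0, 1], 0 elsewhere (illustrative kernel choice)."""
    return np.where((u >= 0) & (u <= 1), 1.0 - u, 0.0)

def nw_weights(dist, h):
    """Functional Nadaraya-Watson weights (8); rows index evaluation points."""
    k = triangle_kernel(dist / h)
    denom = k.sum(axis=1, keepdims=True)
    if np.any(denom == 0):
        raise ValueError("no sample curve within distance h of some point")
    return k / denom

def fit_beta(Y, X, T, h, dt):
    """Estimator (6): least squares of (I - W_h)Y on (I - W_h)X."""
    W = nw_weights(l2_semimetric(T, T, dt), h)
    residual_maker = np.eye(len(Y)) - W
    X_tilde = residual_maker @ X
    Y_tilde = residual_maker @ Y
    beta_hat, *_ = np.linalg.lstsq(X_tilde, Y_tilde, rcond=None)
    return beta_hat

def predict_m(T_new, Y, X, T, beta_hat, h, dt):
    """Estimator (7): kernel smoothing of the partial residuals Y - X beta_hat."""
    W_new = nw_weights(l2_semimetric(T_new, T, dt), h)
    return W_new @ (Y - X @ beta_hat)

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 200
grid = np.linspace(0.0, 1.0, 50)
dt = grid[1] - grid[0]
a = rng.uniform(0.0, 1.0, n)                          # amplitude of each curve
T = a[:, None] * np.sin(2 * np.pi * grid)[None, :]    # functional covariate
X = rng.normal(size=(n, 2))                           # real covariates
beta_true = np.array([1.0, -2.0])
Y = X @ beta_true + a ** 2 + 0.05 * rng.normal(size=n)

h = 0.1
beta_hat = fit_beta(Y, X, T, h, dt)
m_hat = predict_m(T, Y, X, T, beta_hat, h, dt)
print(beta_hat)
```

In this simulated design the nonparametric component is $m(T) = a^2$, so `beta_hat` can be compared with `beta_true` and `m_hat` with `a ** 2`. The bandwidth `h = 0.1` is fixed by hand here; in practice it would be selected from the data.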
As is usual in nonfunctional partial linear regression models, the conditions linked with the estimation of the nonparametric component $m$ are exactly the same as those used in purely nonparametric regression models. We therefore need the same set of assumptions as those originally proposed in [FV06]. This concerns the kernel function $K$, which is assumed to satisfy the following usual restrictions:

$$K \text{ has support } [0, 1], \text{ is Lipschitz continuous on } [0, \infty), \text{ and } \exists\, \theta \text{ such that } \forall u \in [0, 1],\ -K'(u) > \theta > 0, \tag{9}$$

as well as the probability distribution of the infinite-dimensional process $T$, which is assumed to be such that there exist a positive-valued function $\phi$ on $(0, \infty)$ and positive constants $\alpha_0$, $\alpha_1$ and $\alpha_2$ such that

$$\int_0^1 \phi(hs)\, ds > \alpha_0 \phi(h) \quad \text{and} \quad \alpha_1 \phi(h) \le P(T \in B(t, h)) \le \alpha_2 \phi(h), \quad \forall\, t \in C,\ h > 0. \tag{10}$$

The reader will find in [FV06] a discussion of the links between these assumptions, the semi-metric $d$ and the small ball concentration properties of $T$, as well as a discussion of how the small ball probability condition (10) can be interpreted in the finite dimensional setting in terms of standard conditions on the density of $T$. The conditions on the smoothing parameter $h > 0$ are standard and will be stated in the theorems below.

While these conditions are those needed to deal with the functional nonparametric component of the model, there is naturally a second set of conditions linked with the linear part of the model. For that, let us introduce the following notation:

$$g_j(t) = E(X_{ij} \mid T_i = t), \quad \eta_{ij} = X_{ij} - E(X_{ij} \mid T_i) \quad \text{and} \quad \eta_i = (\eta_{i1}, \dots, \eta_{ip})^T.$$

Observe that the expressions of our estimators (6) and (7) contain estimators of $g_1, \dots, g_p$. So, in addition to the usual smoothness conditions on $m$, we need similar ones on the $g_j$.
More precisely, we assume that all the operators to be estimated are smooth, in the sense that for some $C < \infty$ and some $\alpha > 0$ we have

$$\forall (u, v) \in C \times C,\ \forall f \in \{m, g_1, \dots, g_p\}, \quad |f(u) - f(v)| \le C\, d(u, v)^{\alpha}. \tag{11}$$

Furthermore, we need the following assumptions:

$$E|\varepsilon_1|^r + E|\eta_{11}|^r + \cdots + E|\eta_{1p}|^r < \infty, \text{ where } r \ge 3, \tag{12}$$

$$\sigma_\varepsilon^2 = Var(\varepsilon) > 0 \text{ and } B = E\left(\eta_1 \eta_1^T\right) \text{ is a positive definite matrix}, \tag{13}$$

and

$$\eta_i \text{ is independent of } \varepsilon_i \quad (i = 1, \dots, n). \tag{14}$$

Observe that assumptions (11)-(14) are not unduly restrictive, and they are quite usual in the setting of nonfunctional partial linear models (see [RB90], [G95], and [L00], among others).

3 Asymptotic behaviour

We are now in a position to give our asymptotic results. Theorem 1 studies the asymptotic behaviour of the estimate of the parametric component of the model, while Theorem 2 concerns the nonparametric one.

Theorem 1. Under assumptions (3)-(5) and (8)-(14), if in addition $n h^{4\alpha} \to 0$ as $n \to \infty$ and $\phi(h) \ge n^{(2/r)+b-1}/(\log n)^2$ for $n$ large enough and some constant $b > 0$ satisfying $(2/r) + b > 1/2$ (where $r \ge 3$ was defined in assumption (12)), then

(i) $\sqrt{n}\,(\hat\beta_h - \beta) \longrightarrow N(0, \sigma_\varepsilon^2 B^{-1})$,
(ii) $\limsup_{n\to\infty} \left(n/(2\log\log n)\right)^{1/2} |\hat\beta_{hj} - \beta_j| = (\sigma_\varepsilon^2 b^{jj})^{1/2}$ a.s.,

where $b^{jj} = (B^{-1})_{jj}$.

This theorem extends previous results established in the nonfunctional setting (see [C88] and [G95], among others) to the case where the explanatory variable $T$ is of functional nature. We see that the dimension of $T$ does not change the rate of convergence of $\hat\beta_h$, but it modifies the conditions on the smoothing parameter $h$.

Theorem 2. Under the assumptions of Theorem 1, we have that

$$\sup_{t \in C} |\hat m_h(t) - m(t)| = O(h^{\alpha}) + O\left(\sqrt{\log n/(n\phi(h))}\right) \text{ a.s.}$$

This result can be seen as an extension, in several directions, of the existing literature. First, it extends the results available for purely nonparametric functional models (see [FV06], and references therein).
The rates of convergence are similar, showing that (as was previously the case in nonfunctional partial linear models) the existence of a linear component does not change the rates of convergence of the nonparametric component.

At this point it is worth noting that, even if the main novelty of our methodology is to consider functional situations (that is, situations where the space $H$ is of infinite dimension), all our results apply directly to the special case where $H = \mathbb{R}^q$. To fix ideas, let us just mention that if $T$ takes values in $C \subset \mathbb{R}^q$ and if $T$ has a strictly positive density (on its support) with respect to Lebesgue measure, then we can take $\phi(h) \sim h^q$. This means that, if we particularize our results to the finite dimensional case, the rates of convergence of the nonparametric component reduce to the classical ones of order $\sqrt{\log n/(n h^q)}$.

4 A real example

In this section, we present an application of the SFPLR model to spectrometric curves. The aim of this application is not to carry out a full case study but to show the interest of the three ideas in our model, that is: (i) a functional nonparametric part, (ii) additional information from real explanatory variables and (iii) linearity of the effect of the real explanatory variables.

Each food sample contains finely chopped pure meat with different fat, protein and moisture (water) contents. For each food sample, the functional data consist of a 100 channel spectrum of absorbances recorded on a Tecator Infratec Food and Feed Analyzer working in the wavelength range 850–1050 nm by the Near Infrared Transmission (NIT) principle. The fat, protein and moisture contents, measured in percent, are determined by analytic chemistry. The aim is to find the relationship between the percentage of fat content $Y$, the corresponding percentages of protein content $X_1$ and moisture content $X_2$, and the spectrometric curve $T$.
More details on the data can be found in [FV06].

We had $n = 215$ independent observations $\{(Y_i, X_{i1}, X_{i2}, T_i)\}_{i=1}^{n}$ of $(Y, X_1, X_2, T)$. This sample was divided into two data sets: the training sample $I = \{1, \dots, 165\}$ was used to select some parameters of the estimates, and the testing sample $J = \{166, \dots, 215\}$ was used to assess the quality of prediction.

We have used various models to predict the fat content of a meat sample on the basis of its protein and/or moisture contents and/or its NIT absorbance spectrum. Concerning the functional features of the models, and according to [FV06], we used the semi-metrics

$$d_s(\cdot, \cdot) = \| \cdot - \cdot \|_s, \quad s = 0, 1, 2, 3, \quad \text{where } \|f\|_s = \left(\int \left(f^{(s)}(t)\right)^2 dt\right)^{1/2},$$

and we used k-nearest neighbours type bandwidths. Both parameters $s$ and $k$ were selected by cross-validation over the training sample $I$. Furthermore, for the linear and additive features of the models (non- and semi-functional), OLS estimators and a backfitting algorithm (see [HT90]) were used, respectively. The criterion used on the test sample $J$ to compare the skill of the different models was the following mean quadratic error of prediction:

$$\frac{1}{\mathrm{Card}\, J} \sum_{j \in J} \left(Y_j - \hat Y_j\right)^2 \Big/ Var_J(Y). \tag{15}$$

The different models used and the corresponding values of this criterion are shown in Table 1 below. The first interesting thing to note is the strong linear relationship between the fat content and the protein and moisture contents: the corresponding linear model gives similar information (in terms of the mean error of prediction) to that of the SFPLR model containing the moisture content and the spectrometric curve. More interestingly, if one combines these two models (by means of an SFPLR model with protein and moisture as real explanatory variables and the spectrometric curve as functional one), then the mean error of prediction is reduced by about 50%. The rest of the studied models are clearly worse than the three models just mentioned.
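The two numerical ingredients of this comparison, the derivative-based semi-metric $d_s$ and the criterion (15), can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: the curves are sampled on an equispaced grid, derivatives are approximated by repeated finite differences (the text does not state how they were computed), and $Var_J(Y)$ is taken to be the empirical variance with divisor $\mathrm{Card}\, J$. The function names are hypothetical.

```python
import numpy as np

def derivative_semimetric(f, g, s, dt):
    """Approximate d_s(f, g) = ( integral of ((f - g)^(s)(t))^2 dt )^(1/2).
    Assumption: equispaced grid with step dt; finite-difference derivatives."""
    diff = np.asarray(f, dtype=float) - np.asarray(g, dtype=float)
    for _ in range(s):
        diff = np.gradient(diff, dt)      # one numerical differentiation
    return np.sqrt(np.sum(diff ** 2) * dt)

def relative_mse(y_true, y_pred):
    """Criterion (15): mean squared prediction error over the test sample,
    divided by the empirical variance of the test responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

# Two curves that differ only by a vertical shift.
grid = np.linspace(0.0, 1.0, 101)
dt = grid[1] - grid[0]
f = np.sin(grid)
g = f + 3.0
print(derivative_semimetric(f, g, 0, dt))   # positive: d_0 sees the shift
print(derivative_semimetric(f, g, 1, dt))   # numerically zero: d_1 ignores it
```

The example shows why $d_s$ with $s \ge 1$ is a semi-metric rather than a metric: two distinct curves that differ by a constant are at distance zero. As for the criterion, predicting every test response by the test-sample mean gives the value 1, so the values reported in Table 1 can be read as fractions of the test-sample variance left unexplained.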
In conclusion, this spectrometric application has shown the interest of the three points of the model (see (i)-(iii) at the start of this section), these data being characterized by their functional nonparametric structure and the linear effect of exogenous variables. The SFPLR model is a competitive one for such data.

Table 1. Models and mean value of the criterion error for the test sample

Model                                                                      Optimal semi-metric   Mean error of prediction

Linear models
  $Y = \alpha_1 + X_1\beta_1 + \varepsilon_1$                                                        0.2296
  $Y = \alpha_2 + X_2\beta_2 + \varepsilon_2$                                                        0.0232
  $Y = \alpha_3 + X_1\beta_{3,1} + X_2\beta_{3,2} + \varepsilon_3$                                   0.0111

Nonparametric models
  $Y = m_1(X_1) + \varepsilon_4$                                                                     0.3761
  $Y = m_2(X_2) + \varepsilon_5$                                                                     0.0256

Additive model
  $Y = \mu + m_3(X_1) + m_4(X_2) + \varepsilon_6$                                                    0.0317

Functional model
  $Y = m_5(T) + \varepsilon_7$                                             2                     0.0233

Semi-Functional Partial Linear models
  $Y = X_1\beta_4 + m_6(T) + \varepsilon_8$                                2                     0.0223
  $Y = X_2\beta_5 + m_7(T) + \varepsilon_9$                                1                     0.0114
  $Y = X_1\beta_{6,1} + X_2\beta_{6,2} + m_8(T) + \varepsilon_{10}$        1                     0.0052

Additive Semi-Functional models
  $Y = \mu + m_9(X_1) + m_{10}(T) + \varepsilon_{11}$                      2                     0.0242
  $Y = \mu + m_{11}(X_2) + m_{12}(T) + \varepsilon_{12}$                   2                     0.0368
  $Y = \mu + m_{13}(X_1) + m_{14}(X_2) + m_{15}(T) + \varepsilon_{13}$     2                     0.0395

Acknowledgements. Research of the first author was supported in part by MEC Grant (EU ERDF support included) MTM2005-00429. Philippe Vieu wishes to thank all the participants of the working group STAPH on Functional Statistics at the University Paul Sabatier of Toulouse for stimulating and continuous helpful comments. The activities of this group are available on http://www.lsp.ups-tlse.fr/Fp/Ferraty/staph.html.

References

[AGV04] Aneiros-Pérez, G., González-Manteiga, W., Vieu, P.: Estimation and testing in a partial linear regression model under long-memory dependence. Bernoulli, 10, 49–78 (2004)
[BZ97] Bhattacharya, P.K., Zhao, P-L.: Semiparametric inference in a partial linear model. Ann. Statist., 25, 244–262 (1997)
[CFS03] Cardot, H., Ferraty, F., Sarda, P.: Spline estimators for the functional linear model. Statist. Sinica, 13, 571–591 (2003)
[C88] Chen, H.: Convergence rates for parametric components in a partly linear model. Ann. Statist., 16, 136–146 (1988)
[EGRW86] Engle, R., Granger, C., Rice, J., Weiss, A.: Nonparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc., 81, 310–320 (1986)
[FV06] Ferraty, F., Vieu, P.: Nonparametric functional data analysis. Springer, New York (in print)
[G95] Gao, J.T.: The laws of the iterated logarithm of some estimates in partly linear models. Statist. Probab. Lett., 25, 153–162 (1995)
[HLG00] Härdle, W., Liang, H., Gao, J.: Partially linear models. Physica-Verlag (2000)
[HT90] Hastie, T.J., Tibshirani, R.J.: Generalized Additive Models. Chapman & Hall, New York (1990)
[L00] Liang, H.: Asymptotic normality of parametric part in partially linear models with measurement error in the nonparametric part. J. Statist. Plann. Inference, 86, 51–62 (2000)
[RS02] Ramsay, J., Silverman, B.: Applied functional data analysis. Methods and case studies. Springer-Verlag (2002)
[RS05] Ramsay, J., Silverman, B.: Functional data analysis. Springer-Verlag (2005)
[RB90] Ritov, Y., Bickel, P.J.: Achieving information bounds in non and semiparametric models. Ann. Statist., 18, 925–938 (1990)
[S96] Schick, A.: Root-n consistent estimation in partly linear regression models. Statist. Probab. Lett., 28, 353–358 (1996)
[S88] Speckman, P.: Kernel smoothing in partial linear models. J. Roy. Statist. Soc. Ser. B, 50, 413–436 (1988)