Hindawi Publishing Corporation
Journal of Probability and Statistics
Volume 2012, Article ID 131085, 20 pages
doi:10.1155/2012/131085

Research Article

Least Absolute Deviation Estimate for Functional Coefficient Partially Linear Regression Models

Yanqin Feng,1 Guoxin Zuo,2 and Li Liu1
1 School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
2 School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China

Correspondence should be addressed to Yanqin Feng, yanqf2008@yahoo.com.cn

Received 5 August 2012; Revised 10 October 2012; Accepted 1 November 2012

Academic Editor: Artur Lemonte

Copyright © 2012 Yanqin Feng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The functional coefficient partially linear regression model is a useful generalization of the nonparametric model, the partial linear model, and the varying coefficient model. In this paper, the local linear technique and the L1 method are employed to estimate all the functions in the functional coefficient partially linear regression model. The asymptotic properties of the proposed estimators are studied. Simulation studies are conducted to show the validity of the estimation procedure.

1. Introduction

In this paper, we are concerned with the functional coefficient partially linear regression (FCPLR) model, that is,

$$Y = a_0(X) + \sum_{j=1}^{p} a_j(U) Z_j + \varepsilon, \tag{1.1}$$

where $X$ and $U$ are random explanatory variables, $Z = (Z_1, \ldots, Z_p)^T$ is a random vector, and $a_j(\cdot)$ is a measurable function from $R$ to $R$ for $j = 0, \ldots, p$. We call $a_0(\cdot)$ the intercept function and $a_j(\cdot)$, $j = 1, \ldots, p$, the coefficient functions. As usual, $\varepsilon$ denotes the error, with zero mean and fixed variance. The FCPLR model, first introduced by Wong et al. [1], is a generalization of the nonparametric model, the partial linear model, and the varying coefficient model.
Zhu et al. [2] studied a similar functional coefficient model, a functional mixed model, using a new Bayesian method. Model (1.1) reduces to a varying coefficient regression model if the intercept function $a_0(\cdot)$ is constant, and to a partially linear regression model when the coefficient functions $a_j(\cdot)$, $j = 1, \ldots, p$, are constants. Many researchers, for example, Aneiros-Pérez and Vieu [3], have contributed to the study of this kind of model. When $X$ and $U$ in model (1.1) coincide, the model becomes the semiparametric varying coefficient model discussed by Ahmad et al. [4]. Since the FCPLR model combines the nonparametric and functional coefficient regression models, its flexibility makes it attractive in various regression problems [1].

Statistical inference for the FCPLR model mainly concerns the estimation of the intercept function $a_0(\cdot)$ and the coefficient functions $a_j(\cdot)$, $j = 1, \ldots, p$. To estimate the unknown functions in nonparametric/semiparametric regression models, many statistical inference methods have been proposed over the past decades, such as the kernel estimation method [5–7], spline smoothing [8, 9], and the two-step estimation method [10, 11]. Wong et al. [1] employed the local linear regression method and the integrated method to obtain initial estimates of all functions in the FCPLR model. All the papers mentioned above used the least-squares technique to obtain the estimators of the unknown coefficient functions. The least-squares (L2) estimators, of course, have some good properties, especially in the case of normal random errors. It is well known, however, that the least-squares method performs poorly when the random errors have a heavy-tailed distribution, because it is highly sensitive to extreme values and outliers. This motivates us to find more robust estimation methods for model (1.1) in place of the aforementioned inference methods.
Local linear approximation is a good method for nonparametric regression problems [12], and the L1 method, based on least absolute deviations, overcomes the sensitivity to outliers. As noted in Wang and Scott [13] and Fan and Gijbels [12], among the many robust estimation methods, the L1 method based on local least absolute deviations behaves quite well. In this paper, we adopt the L1 method, combined with the local linear technique and the integrated method, to estimate all the unknown functions in model (1.1). Furthermore, the estimation problem can be reduced to a linear programming problem, so numerical solutions can be obtained quickly with available software (e.g., Matlab is very useful for this kind of problem). The main difficulty encountered in the proof of the asymptotic normalities is that the L1 estimates have no closed form. This paper establishes the asymptotic normality of the L1 estimators through a method completely different from those based on the L2 method, and the simulation results show that the L1 method is indeed robust.

The rest of this paper is organized as follows. In Section 2, we describe the estimation method and the associated bandwidth selection procedure. Section 3 gives the asymptotic theory of the estimators. Simulation studies are conducted in Section 4. A real application is given in Section 5. Section 6 gives the proofs of the main results.

2. Least Absolute Deviation Estimate

This section gives the main idea of the proposed estimation method: local linear polynomials are used to approximate the nonparametric function and the functional coefficients, and the least absolute deviation technique is used to find the best approximation. Bandwidth selection is also discussed in this section. Throughout this paper, we suppose that $\{(Y_i, X_i, U_i, Z_i)\}_{i=1}^{n}$ is an i.i.d. sample from model (1.1) and assume the following conditions.
Assumptions

(1) $\Omega = E(ZZ^T \mid X = x_0, U = u_0)$ is a positive definite matrix, and $E(Z \mid X = x_0, U = u_0) = \omega$.
(2) The bandwidth $h$ satisfies $h \sim n^{-1/6}$.
(3) The random error $\varepsilon$, with zero mean and zero median, is independent of $Z$ conditional on $(X, U)$. The conditional probability density $g(\cdot \mid x, u)$ of $\varepsilon$ given $X$ and $U$ is continuous in a neighborhood of the point 0, and $g(0 \mid x, u) > 0$. Moreover, $\varepsilon_1, \ldots, \varepsilon_n$ are independent and identically distributed.
(4) The density functions $f(x, u)$, $f_1(x)$, $f_2(u)$ of $(X, U)$, $X$, and $U$ are continuous in neighborhoods of $(x_0, u_0)$, $x_0$, and $u_0$, respectively, and $f(x_0, u_0) > 0$.
(5) All functions $a_j(u)$, $j = 1, \ldots, p$, and $a_0(x)$ are twice continuously differentiable in neighborhoods of $u_0$ and $x_0$, respectively.
(6) The kernel functions $K_k(\cdot)$, $k = 1, 2$, are bounded, nonnegative, and compactly supported.
(7) $\max_{1 \le i \le n} \|Z_i\| = o_p(n^{1/3})$.

To simplify typesetting, we introduce the symbols

$$\mu_i(k) = \int v^i K_k(v)\,dv, \qquad \gamma_i(k) = \int v^i K_k^2(v)\,dv, \qquad k = 1, 2, \; i = 0, 1, 2, \ldots. \tag{2.1}$$

2.1. Local Linear Estimate Based on Least Absolute Deviation

The main idea is to approximate the functional coefficients $a_j(\cdot)$ by linear functions for $j = 0, \ldots, p$; that is, $a_0(x)$ can be approximated by

$$a_0(x) \approx a_0(x_0) + a_0'(x_0)(x - x_0) \tag{2.2}$$

for $x$ in a neighborhood of $x_0$ within the closed support of $X$, and $a_j(u)$ by

$$a_j(u) \approx a_j(u_0) + a_j'(u_0)(u - u_0), \qquad j = 1, \ldots, p, \tag{2.3}$$

for $u$ in a neighborhood of $u_0$ within the closed support of $U$. For simplicity, denote $(a_0(x_0), a_0'(x_0))$ by $(a_0, b_0)$ and $(a_j(u_0), a_j'(u_0))$ by $(a_j, b_j)$ for $j = 1, \ldots, p$. The local linear least absolute deviation estimate (L1 estimate) of the unknown parameters $a_0$, $b_0$, $a := (a_1, \ldots, a_p)^T$, and $b := (b_1, \ldots, b_p)^T$, denoted respectively by $\hat a_0, \hat b_0, \hat a, \hat b$, is the optimal solution of the following minimization problem:

$$\min \sum_{i=1}^{n} \big| Y_i - a_0 - b_0 (X_i - x_0) - a^T Z_i - b^T (U_i - u_0) Z_i \big|\, K_{1,h}(X_i - x_0)\, K_{2,h}(U_i - u_0), \tag{2.4}$$

where $K_{k,h}(\cdot) = K_k(\cdot/h)$, $k = 1, 2$, are the given kernel functions and $h$ is the chosen bandwidth.
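Criterion (2.4) is a weighted least absolute deviation problem. As a concrete illustration (a sketch, not the authors' code; the helper name `local_lad` and the toy data are our assumptions), such a problem can be solved as a linear program with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

def local_lad(y, design, w):
    """Solve min sum_i w_i |y_i - design_i . theta| as a linear program.

    Decision variables are (theta, e_plus, e_minus), with constraints
    design @ theta + e_plus - e_minus = y and e_plus, e_minus >= 0,
    mirroring the slack-variable reformulation used in the text.
    """
    n, q = design.shape
    c = np.concatenate([np.zeros(q), w, w])        # cost only on the slacks
    A_eq = np.hstack([design, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * q + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:q]

# Toy check: a straight line with one gross outlier; LAD recovers the line.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
y = 2.0 + 3.0 * x
y[0] += 50.0                                       # single gross outlier
design = np.column_stack([np.ones_like(x), x])
theta = local_lad(y, design, np.ones_like(y))
```

Because 49 of the 50 points lie exactly on the line, the LAD optimum passes through them and ignores the outlier, which is exactly the robustness exploited in this paper.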
The optimization problem is equivalent to the following linear programming problem:

$$\begin{aligned} \min \; & \sum_{i=1}^{n} \big( e_i^+ + e_i^- \big) K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0) \\ \text{s.t.} \; & a_0 + b_0 (X_i - x_0) + a^T Z_i + b^T (U_i - u_0) Z_i + e_i^+ - e_i^- = Y_i, \\ & e_i^+ \ge 0, \quad e_i^- \ge 0, \qquad i = 1, \ldots, n. \end{aligned} \tag{2.5}$$

There are many algorithms available for finding the optimal solution of problem (2.5); for example, the feasible direction method can be used directly [14], and the numerical solution of (2.5) can be computed quickly by a series of Matlab functions.

By the integrated method [1], the estimator of the intercept function $a_0(x)$ is defined by

$$\hat a_0(x_0) = \frac{1}{n} \sum_{k=1}^{n} \hat a_0(x_0, U_k), \tag{2.6}$$

and the estimators of the coefficient functions $a_j(u_0)$, $j = 1, \ldots, p$, are defined by

$$\hat a_j(u_0) = \frac{1}{n} \sum_{k=1}^{n} \hat a_j(X_k, u_0), \qquad j = 1, \ldots, p. \tag{2.7}$$

Our main task is to establish the asymptotic distributions of the estimators $\hat a_0(x_0)$ and $\hat a_j(u_0)$ for $j = 1, \ldots, p$.

2.2. Selection of Bandwidth

It is well known that the choice of the bandwidth strongly influences the adequacy of the estimators. We use an automatic bandwidth choice procedure in this paper, the absolute cross-validation (ACV) method; the ACV bandwidth $h_{ACV}$ is defined as

$$h_{ACV} = \arg\min_h ACV(h) = \arg\min_h \frac{1}{n} \sum_{k=1}^{n} \Bigg| Y_k - \hat a_0^{(-k)}(X_k) - \sum_{j=1}^{p} \hat a_j^{(-k)}(U_k) Z_{kj} \Bigg|, \tag{2.8}$$

where $\hat a_0^{(-k)}(X_k)$ and $\hat a_j^{(-k)}(U_k)$, $j = 1, \ldots, p$, are constructed from the $n - 1$ observations obtained by leaving out the $k$th observation $(Y_k, X_k, U_k, Z_k)$. According to Wang and Scott [13], this bandwidth is better than the cross-validation (CV) bandwidth $h_{CV}$, which was suggested by Rice and Silverman [15] and is often used in curve regression, as in Hoover et al. [6] and Wu et al. [7].

3. Asymptotic Theory

This section gives the asymptotic distribution theory of the estimators. Using Taylor's expansion, for $|X_i - x_0| \le Ch$ and $|U_i - u_0| \le Ch$, we have

$$a_0(X_i) = a_0(x_0) + a_0'(x_0)(X_i - x_0) + \tfrac{1}{2} a_0''(\xi_i)(X_i - x_0)^2,$$
$$a_j(U_i) = a_j(u_0) + a_j'(u_0)(U_i - u_0) + \tfrac{1}{2} a_j''(\eta_{ij})(U_i - u_0)^2, \tag{3.1}$$

for $j = 1, \ldots, p$, where $\xi_i$ lies between $x_0$ and $X_i$, and $\eta_{ij}$ lies between $U_i$ and $u_0$. Let $a(u_0) = (a_1(u_0), \ldots, a_p(u_0))^T$, $b(u_0) = (a_1'(u_0), \ldots, a_p'(u_0))^T$, and $c(u_0) = (a_1''(u_0), \ldots, a_p''(u_0))^T$. Then we have

$$(\hat a_0, \hat b_0, \hat a^T, \hat b^T) = \arg\min \sum_{i=1}^{n} \big| Y_i - a_0 - b_0(X_i - x_0) - a^T Z_i - b^T (U_i - u_0) Z_i \big|\, K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)$$

$$= \arg\min \sum_{i=1}^{n} \Bigg\{ \bigg| \frac{1}{\sqrt{nh^2}} \Big[ \sqrt{nh^2}\,(a_0 - a_0(x_0)) + \sqrt{nh^2}\,(b_0 - a_0'(x_0))(X_i - x_0) + \sqrt{nh^2}\,(a - a(u_0))^T Z_i + \sqrt{nh^2}\,(b - b(u_0))^T (U_i - u_0) Z_i \Big] - \Big[ \frac{1}{2} a_0''(\xi_i)(X_i - x_0)^2 + \frac{1}{2} \sum_{j=1}^{p} a_j''(\eta_{ij})(U_i - u_0)^2 Z_{ij} + \varepsilon_i \Big] \bigg| - \bigg| \frac{1}{2} a_0''(\xi_i)(X_i - x_0)^2 + \frac{1}{2} \sum_{j=1}^{p} a_j''(\eta_{ij})(U_i - u_0)^2 Z_{ij} + \varepsilon_i \bigg| \Bigg\} K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0). \tag{3.2}$$

The aim of this paper is to study the asymptotic behavior of $\sqrt{nh}\,(\hat a_0 - a_0(x_0))$ and $\sqrt{nh}\,(\hat a - a(u_0))$. For technical reasons, we first introduce the new variables $\alpha_0 = \sqrt{nh^2}\,(a_0 - a_0(x_0))$, $\alpha = \sqrt{nh^2}\,(a - a(u_0))$, $\beta_0 = \sqrt{nh^2}\,(b_0 - a_0'(x_0))h$, and $\beta = \sqrt{nh^2}\,(b - b(u_0))h$ [16], and form the equivalent problem

$$(\hat\alpha_0, \hat\beta_0, \hat\alpha^T, \hat\beta^T) = \arg\min \sum_{i=1}^{n} \Bigg\{ \bigg| \frac{1}{\sqrt{nh^2}} \Big( \alpha_0 + \beta_0 \frac{X_i - x_0}{h} + \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \Big) - \Big( \frac{1}{2} a_0''(\xi_i)(X_i - x_0)^2 + \frac{1}{2} \sum_{j=1}^{p} a_j''(\eta_{ij})(U_i - u_0)^2 Z_{ij} + \varepsilon_i \Big) \bigg| - \bigg| \frac{1}{2} a_0''(\xi_i)(X_i - x_0)^2 + \frac{1}{2} \sum_{j=1}^{p} a_j''(\eta_{ij})(U_i - u_0)^2 Z_{ij} + \varepsilon_i \bigg| \Bigg\} K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0). \tag{3.3}$$

Let $S_n$ denote the objective function in (3.3), and set $F_n = E S_n$ and

$$R_n = S_n - F_n + \sum_{i=1}^{n} \frac{1}{\sqrt{nh^2}} \Big( \alpha_0 + \beta_0 \frac{X_i - x_0}{h} + \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \Big) K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i) =: S_n - F_n + \sum_{i=1}^{n} L_{ni}, \tag{3.4}$$

where $\mathrm{sgn}(\cdot)$ is the sign function. Since the L1 estimators have no closed form, we first give the limit of the function $F_n$, which is critical for obtaining the asymptotic properties of the estimators.

Theorem 3.1. Suppose Assumptions (1)–(7) hold and $n \to \infty$. Then, for any fixed $(\alpha_0, \beta_0, \alpha, \beta)$, $F_n$ converges to $F(\alpha_0, \beta_0, \alpha, \beta)$, which is defined as

$$F(\alpha_0, \beta_0, \alpha, \beta) = g(0 \mid x_0, u_0) f(x_0, u_0) \Big[ \alpha_0^2 + 2\alpha_0\beta_0\mu_1(1) + \beta_0^2\mu_2(1) + \alpha^T\Omega\alpha + 2\mu_1(2)\,\alpha^T\Omega\beta + \mu_2(2)\,\beta^T\Omega\beta + 2\alpha_0\big(\alpha^T\omega + \mu_1(2)\,\beta^T\omega\big) + 2\beta_0\mu_1(1)\big(\alpha^T\omega + \mu_1(2)\,\beta^T\omega\big) - a_0''(x_0)\big(\alpha_0\mu_2(1) + \beta_0\mu_3(1) + \mu_2(1)\,\alpha^T\omega + \mu_2(1)\mu_1(2)\,\beta^T\omega\big) - c^T(u_0)\big(\alpha_0\mu_2(2)\,\omega + \beta_0\mu_1(1)\mu_2(2)\,\omega + \mu_2(2)\,\Omega\alpha + \mu_3(2)\,\Omega\beta\big) \Big]. \tag{3.5}$$

Remark 3.2.
If the kernel functions $K_1(\cdot)$, $K_2(\cdot)$ are symmetric about zero and Lipschitz continuous, the limit $F(\alpha_0, \beta_0, \alpha, \beta)$ simplifies to

$$F(\alpha_0, \beta_0, \alpha, \beta) = g(0 \mid x_0, u_0) f(x_0, u_0) \Big[ \alpha_0^2 + \mu_2(1)\beta_0^2 + \alpha^T\Omega\alpha + \mu_2(2)\,\beta^T\Omega\beta + 2\alpha_0\,\alpha^T\omega - \mu_2(1)\,a_0''(x_0)\big(\alpha_0 + \alpha^T\omega\big) - \mu_2(2)\,c^T(u_0)\big(\alpha_0\,\omega + \Omega\alpha\big) \Big]. \tag{3.6}$$

Now we are in a position to state the asymptotic properties of the estimators.

Theorem 3.3. Suppose Assumptions (1)–(7) hold and $n \to \infty$. Then

$$\sqrt{nh}\left( \hat a_0(x_0) - a_0(x_0) - \frac{h^2\big(\mu_2^2(1) - \mu_3(1)\mu_1(1)\big)\, a_0''(x_0)}{2\big(\mu_2(1) - \mu_1^2(1)\big)} \right) \xrightarrow{d} N\big(0, \sigma_{\alpha_0}^2\big), \tag{3.7}$$

$$\sqrt{nh}\left( \hat a(u_0) - a(u_0) - \frac{h^2\big(\mu_2^2(2) - \mu_1(2)\mu_3(2)\big)\, c(u_0)}{2\big(\mu_2(2) - \mu_1^2(2)\big)} \right) \xrightarrow{d} N_p\big(0, \Sigma_\alpha\big), \tag{3.8}$$

where

$$\sigma_{\alpha_0}^2 = \frac{\big(1 - \omega^T\Omega^{-1}\omega\big)\gamma_0(1)\, r_2^2(x_0)}{4} + \frac{\mu_1^2(1)\big(\mu_1^2(1)\gamma_0(1) - 2\mu_1(1)\gamma_1(1) + \gamma_2(1)\big)\, r_2^2(x_0)}{4\, g^2(0 \mid x_0, u_0) f^2(x_0, u_0) \big(\mu_1^2(1) - \mu_2(1)\big)^2} - \frac{\big(1 - \omega^T\Omega^{-1}\omega\big)\mu_1(1)\big(\mu_1(1)\gamma_0(1) - \gamma_1(1)\big)\, r_2^2(x_0)}{2\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(1) - \mu_2(1)\big)},$$

$$\Sigma_\alpha = \frac{\big(\Omega - \omega\omega^T\big)^{-1}\gamma_0(2)\, r_1^2(u_0)}{4} + \frac{\Omega^{-1}\mu_1^2(2)\big(\mu_1^2(2)\gamma_0(2) - 2\mu_1(2)\gamma_1(2) + \gamma_2(2)\big)\, r_1^2(u_0)}{4\, g^2(0 \mid x_0, u_0) f^2(x_0, u_0) \big(\mu_1^2(2) - \mu_2(2)\big)^2} - \frac{\mu_1(2)\big(\mu_1(2)\gamma_0(2) - \gamma_1(2)\big)\, r_1^2(u_0)}{2\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(2) - \mu_2(2)\big)}, \tag{3.9}$$

with $r_1^2(u_0) = \int f_1^2(x) f(x, u_0)\,dx$ and $r_2^2(x_0) = \int f_2^2(u) f(x_0, u)\,du$.

Remark 3.4. If the kernel functions $K_1(\cdot)$, $K_2(\cdot)$ are symmetric about zero and Lipschitz continuous, the results of Theorem 3.3 simplify to

$$\sqrt{nh}\left( \hat a_0(x_0) - a_0(x_0) - \frac{h^2\mu_2(1)\, a_0''(x_0)}{2} \right) \xrightarrow{d} N\left(0, \frac{\big(1 - \omega^T\Omega^{-1}\omega\big)\gamma_0(1)\, r_2^2(x_0)}{4}\right),$$

$$\sqrt{nh}\left( \hat a(u_0) - a(u_0) - \frac{h^2\mu_2(2)\, c(u_0)}{2} \right) \xrightarrow{d} N_p\left(0, \frac{\big(\Omega - \omega\omega^T\big)^{-1}\gamma_0(2)\, r_1^2(u_0)}{4}\right). \tag{3.10}$$

Remark 3.5. Here we have considered the estimation method and the asymptotic distributions for the case where the two bandwidths are the same. It is important to note that similar asymptotic theory can be obtained when the two bandwidths differ but are of the same order.

Remark 3.6. Suppose we consider different bandwidths $h_1$, $h_2$ for the kernel functions $K_1$ and $K_2$, and suppose Assumptions (2) and (7) are replaced by $h_1^5 h_2 = o(1/n)$ and $h_1 h_2^5 = o(1/n)$ (for example, $h_1 \sim n^{-1/6}$, $h_2 \sim n^{-1/5}$) and $\max \|Z_i\| = o_p\big(\min\{h_2^{-2}, \sqrt{n h_1 h_2}\}\big)$, respectively.
Similar results can then be obtained, except that all the second-order derivative terms in the results disappear.

Remark 3.7. This paper restricts the study to a one-dimensional variable $X$. The ideas used here can be adapted to a higher-dimensional variable $X$; for example, consider a $d$-dimensional variable $X$ with the same bandwidths. Similar asymptotic distribution results for $\sqrt{nh^d}\,(\hat a(x_0) - a(x_0))$ and $\sqrt{nh}\,(\hat a(u_0) - a(u_0))$ can be obtained under the assumptions with Assumptions (2) and (7) replaced by $h \sim n^{-1/(5+d)}$ and $\max \|Z_i\| = o_p\big(n^{2/(5+d)}\big)$, respectively.

4. Simulations

In this section, we carry out simulations to illustrate the performance of the L1 method and to compare it with the L2 method. All the following simulations are conducted with sample size $n = 100$. The following example is considered:

$$Y = a_0(X) + a_1(U) Z_1 + a_2(U) Z_2 + \varepsilon, \tag{4.1}$$

where $a_0(x) = x + 4\exp(-8x^2)$, $a_1(u) = 2\sin(1.5\pi u) + u$, $a_2(u) = 4u(1 - u)$, $(X, U) \sim U(-0.5, 0.5)^2$, $(Z_1, Z_2)$ are normally distributed with correlation coefficient $1/\sqrt{2}$ and standard normal marginal distributions, $\varepsilon \sim N(0, 0.2^2)$, and $\varepsilon$, $(X, U)$, and $(Z_1, Z_2)$ are mutually independent.

In each simulation, the L1 estimators of $a_0(x)$, $a_1(u)$, $a_2(u)$ were computed by solving the minimization problem (2.5) and using the integrated method described in (2.6) and (2.7). We use the Epanechnikov kernel, $K(u) = (3/4)(1 - u^2) I_{\{|u| < 1\}}$, for each $K_l(\cdot)$, $l = 1, 2$. All bandwidths in a model are selected by the method proposed in Section 2.

To evaluate the asymptotic results given in Theorem 3.3, quantile-quantile plots of the estimators are constructed. Figure 1 presents the quantile-quantile plots for $\hat a_0(0)$, $\hat a_1(0)$, and $\hat a_2(0)$ with sample size $n = 100$ and 100 replications; these plots reveal that the asymptotic approximation is reasonable. Figure 2 displays the true curves of $a_0(x)$, $a_1(u)$, and $a_2(u)$ together with their estimated curves for sample size $n = 100$ and one replication.
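For concreteness, the simulation design in (4.1) can be generated as follows. This is a sketch under the function forms read from the text ($a_0(x) = x + 4\exp(-8x^2)$, $a_1(u) = 2\sin(1.5\pi u) + u$, $a_2(u) = 4u(1-u)$); the helper name `simulate_fcplr` is ours, not the authors':

```python
import numpy as np

def simulate_fcplr(n=100, seed=0):
    """Draw one sample from model (4.1) of Section 4 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-0.5, 0.5, n)
    u = rng.uniform(-0.5, 0.5, n)
    # (Z1, Z2): standard normal marginals with correlation 1/sqrt(2)
    rho = 1.0 / np.sqrt(2.0)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    eps = rng.normal(0.0, 0.2, n)
    a0 = x + 4.0 * np.exp(-8.0 * x**2)
    a1 = 2.0 * np.sin(1.5 * np.pi * u) + u
    a2 = 4.0 * u * (1.0 - u)
    y = a0 + a1 * z[:, 0] + a2 * z[:, 1] + eps
    return y, x, u, z
```

Replicating this draw and applying the estimators of Section 2 reproduces the kind of Monte Carlo study summarized in Figures 1 and 2.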
We can see from the figure that the L1 estimates perform well. To illustrate the robustness of the L1 method, Figure 3 displays the estimated curves in the presence of four outliers, namely $(-0.4538, -0.4402, -10)$, $(0.13377, 0.1393, 20)$, $(0.2703, 0.3085, 20)$, and $(0.4174, 0.4398, 18)$ for the triple $(x, u, y)$. From Figure 3, we can see that the L1 estimate performs well even in the presence of four large singular values of $Y$. The fact that the outliers have little influence on the L1 estimates, as displayed in Figure 3, shows that it is a robust method.

By solving the minimization problem

$$\min \sum_{i=1}^{n} \big( Y_i - a_0 - b_0 (X_i - x_0) - a^T Z_i - b^T (U_i - u_0) Z_i \big)^2 K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0), \tag{4.2}$$

we can similarly obtain the L2 estimators of the functions $a_0(x)$, $a_1(u)$, $a_2(u)$ through equations (2.6) and (2.7). To compare the L1 method with the L2 method, we estimated the function $a_0(x)$ by the L2 method and display the fitted curves with and without outliers for sample size $n = 100$ and 1000 replications in Figure 4. We can see that the L2 estimate does not perform well under data sparsity and singularity: three small outlying data points make the curve estimated by the L2 method deviate significantly from the true curve.

[Figure 1: Normal QQ-plots for $\hat a_0(0)$, $\hat a_1(0)$, and $\hat a_2(0)$ ($n = 100$); panels (a)–(c) plot sample quantiles against standard normal quantiles.]

Combining Figures 2, 3, and 4, we conclude that the L1 method performs better than the L2 method and that the L1 method is robust.
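The robustness contrast between the two criteria can be reproduced in miniature with local constant fits: a kernel-weighted median (L1) against a kernel-weighted mean (L2) at a single point. This is a simplified sketch of the idea, not the paper's local linear procedure; `local_fit` and the contamination scheme are illustrative assumptions:

```python
import numpy as np

def local_fit(x0, x, y, h):
    """Kernel-weighted local-constant fits at x0: L1 (weighted median) vs L2 (weighted mean)."""
    w = np.maximum(0.75 * (1.0 - ((x - x0) / h) ** 2), 0.0)  # Epanechnikov weights
    keep = w > 0
    ws, ys = w[keep], y[keep]
    l2 = np.sum(ws * ys) / np.sum(ws)                        # weighted mean (L2)
    order = np.argsort(ys)
    cw = np.cumsum(ws[order]) / np.sum(ws)
    l1 = ys[order][np.searchsorted(cw, 0.5)]                 # weighted median (L1)
    return l1, l2

rng = np.random.default_rng(1)
x = rng.uniform(-0.5, 0.5, 200)
y = x + 4.0 * np.exp(-8.0 * x**2) + rng.normal(0.0, 0.2, 200)
near = np.argsort(np.abs(x))[:5]
y[near] += 30.0                                              # five gross outliers near x = 0
l1, l2 = local_fit(0.0, x, y, h=0.15)
```

With the true value $a_0(0) = 4$, the weighted median stays close to the truth while the weighted mean is pulled up by the contaminated points, mirroring the behavior seen in Figures 3 and 4.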
Finally, to further compare the L1 estimation method with the L2 estimation method, we also assess their performance via the weighted average squared error (WASE), defined as

$$\mathrm{WASE} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{\big(\hat a_0(x_i) - a_0(x_i)\big)^2}{\mathrm{range}^2(a_0)} + \frac{\big(\hat a_1(u_i) - a_1(u_i)\big)^2}{\mathrm{range}^2(a_1)} + \frac{\big(\hat a_2(u_i) - a_2(u_i)\big)^2}{\mathrm{range}^2(a_2)} \right), \tag{4.3}$$

where $\mathrm{range}(a_k)$, $k = 0, 1, 2$, are the ranges of the functions $a_0(x)$, $a_1(u)$, and $a_2(u)$. The weights are introduced to account for the different scales of the functions. We conducted 200 replications with sample size $n = 100$. With the bandwidths and the Epanechnikov kernels used in the simulations, the mean and standard deviation of the WASE are 0.1201 and 0.0183 for the L1 method, and 1.6613 and 0.8936 for the L2 method. We can see that the L1 method outperforms the L2 method.

[Figure 2: Simulation results for the example. Dotted curves are the L1 estimators; solid curves are the true curves of $a_0(x)$, $a_1(u)$, and $a_2(u)$.]

[Figure 3: Simulation results for the example with outliers. Dotted curves are the L1 estimators; solid curves are the true curves.]

[Figure 4: Simulation results for $a_0(x)$. Dotted curves are the L2 estimators; solid curves are the true curves. (N) plot without outliers; (Y) plot with three outliers $(x, y) = (-0.4538, -5)$, $(0.1948, 8)$, and $(0.4649, 5)$.]

5. Application

A real data set is analyzed by the proposed L1 method in this section. The classic gas furnace data was studied recently by Wong et al. [1]. The data set consists of 296 samples $(I_t, O_t)$, $t = 1, \ldots, 296$, measured at a fixed interval of 9 seconds, where $I_t$ represents the input gas rate in cubic feet per minute and $O_t$ represents the concentration of carbon dioxide in the gas out of the furnace. Following the procedure of Wong et al. [1], the original data are transformed as $x_t = (I_t + 2.716)/5.55$ and $y_t = (O_t - 45.6)/14.9$ for $t = 1, \ldots, 296$, so that both the $x$'s and the $y$'s lie in the interval $(0, 1)$, and the model

$$y_t = a_0(x_{t-3}) + \sum_{j=1}^{4} a_j(y_{t-3})\, y_{t-j} + \varepsilon_t \tag{5.1}$$

is used to fit the data. The first 250 samples are used to establish the model, and the remaining 46 samples are used for prediction. In the proposed method, the Epanechnikov kernel is used, and for simplicity all the bandwidths $h_j$, $j = 0, \ldots, 4$, are selected as 0.14 via cross-validation. The mean absolute error (MAE) and the mean squared error (MSE) for fitting and forecasting are as follows. Fitting: MAE = 0.3189, MSE = 0.1584; forecasting: MAE = 0.3705, MSE = 0.2127. Since the model is chosen based on the L2 errors, the results are not as good as those in Wong et al. [1]. Moreover, the data set does not contain obvious outliers, so the advantage of the L1 estimation method is not apparent. Compared to the results shown in Wong et al. [1], the difference between the fitting MAE/MSE and the forecasting MAE/MSE is small; the reason is that L1 methods are employed in our approach. The fitted values $\{\hat O_t\}_{t=5}^{250}$ and the predictive values $\{\hat O_t\}_{t=251}^{296}$ are shown in Figure 5. These results indicate that the estimates are reasonable.

[Figure 5: (a) Fitted values of $O_t$ ($t = 5, \ldots, 250$) for the gas data; (b) predictive values of $O_t$ ($t = 251, \ldots, 296$) for the gas data. Dotted lines are fitted or predicted values; solid lines are observed CO2 concentrations.]

6. Proofs of the Main Results

Before proving the main results, we first give the following useful lemma.

Lemma 6.1. Suppose Assumptions (1)–(7) hold. Then, for any fixed $(\alpha_0, \beta_0, \alpha, \beta)$, $R_n$ converges to 0 in probability; that is, $R_n = o_p(1)$ as $n \to \infty$.

Proof of Lemma 6.1. Let

$$d_n = \max_{1 \le i \le n} \Bigg\{ \frac{1}{\sqrt{nh^2}} \bigg| \alpha_0 + \beta_0 \frac{X_i - x_0}{h} + \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \bigg| + \bigg| \frac{1}{2} a_0''(\xi_i)(X_i - x_0)^2 + \frac{1}{2} \sum_{j=1}^{p} a_j''(\eta_{ij})(U_i - u_0)^2 Z_{ij} \bigg| \Bigg\}. \tag{6.1}$$

Noting that $|X_i - x_0| \le Ch$ and $|U_i - u_0| \le Ch$ and using Assumptions (2), (5), and (7), for any fixed $(\alpha_0, \beta_0, \alpha, \beta)$ we have

$$d_n = o_p(1) \quad \text{as } n \to \infty. \tag{6.2}$$

Let

$$T_{ni} = L_{ni} + \Bigg\{ \bigg| \frac{1}{\sqrt{nh^2}} \Big( \alpha_0 + \beta_0 \frac{X_i - x_0}{h} + \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \Big) - \Big( \frac{1}{2} a_0''(\xi_i)(X_i - x_0)^2 + \frac{1}{2} \sum_{j=1}^{p} a_j''(\eta_{ij})(U_i - u_0)^2 Z_{ij} + \varepsilon_i \Big) \bigg| - \bigg| \frac{1}{2} a_0''(\xi_i)(X_i - x_0)^2 + \frac{1}{2} \sum_{j=1}^{p} a_j''(\eta_{ij})(U_i - u_0)^2 Z_{ij} + \varepsilon_i \bigg| \Bigg\} K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0). \tag{6.3}$$

It is easily seen that $T_{ni} = 0$ if $|\varepsilon_i| \ge d_n$. Hence we have

$$|T_{ni}| \le 2 \bigg| \alpha_0 + \beta_0 \frac{X_i - x_0}{h} + \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \bigg| \frac{1}{\sqrt{nh^2}}\, K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\, I_{\{|\varepsilon_i| < d_n\}}, \tag{6.4}$$

where $I(\cdot)$ is the indicator function. For any $\epsilon > 0$ and $\delta > 0$, we have

$$P\{|R_n| > \epsilon\} \le P\{d_n \ge \delta\} + P\big( I_{\{d_n < \delta\}} |R_n| > \epsilon \big). \tag{6.5}$$

Since $E\,\mathrm{sgn}(\varepsilon_i) = 0$, we have $E L_{ni} = 0$, and hence

$$E\big( I_{\{d_n < \delta\}} R_n^2 \big) = E\Big[ I_{\{d_n < \delta\}} \big( S_n - F_n + L_n \big)^2 \Big] = E\Bigg[ I_{\{d_n < \delta\}} \Big( \sum_{i=1}^{n} (T_{ni} - E T_{ni}) \Big)^2 \Bigg], \tag{6.6}$$

where $L_n = \sum_{i=1}^{n} L_{ni}$. Combining (6.2) and (6.4), we have

$$\lim_{n\to\infty} E\big( I_{\{d_n < \delta\}} R_n^2 \big) = \lim_{n\to\infty} E\Bigg\{ I_{\{d_n < \delta\}} \Bigg[ \sum_{i=1}^{n} E\Big( (T_{ni} - E T_{ni})^2 \mid X, U, Z \Big) + \sum_{i \ne j} E\Big( (T_{ni} - E T_{ni})(T_{nj} - E T_{nj}) \mid X, U, Z \Big) \Bigg] \Bigg\}$$

$$\le 8 \lim_{n\to\infty} E\Bigg[ \Big( \alpha_0 + \beta_0 \frac{X_i - x_0}{h} + \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \Big)^2 \frac{1}{h^2}\, K_{1,h}^2(X_i - x_0) K_{2,h}^2(U_i - u_0)\, P\{|\varepsilon_i| \le \delta \mid X, U\} \Bigg]$$

$$\le 16\,\delta\, g(0 \mid x_0, u_0) f(x_0, u_0) \Big[ \gamma_0(2)\big( \alpha_0^2\gamma_0(1) + 2\alpha_0\beta_0\gamma_1(1) + \beta_0^2\gamma_2(1) \big) + \gamma_0(1)\gamma_0(2)\,\alpha^T\Omega\alpha + 2\gamma_2(1)\,\alpha^T\Omega\beta + \gamma_2(2)\,\beta^T\Omega\beta \Big] + o(1) \longrightarrow 0 \tag{6.7}$$

as $\delta \to 0$. Combining (6.2), (6.5), and the fact that $E\big( I_{\{d_n < \delta\}} R_n^2 \big) \to 0$ as $\delta \to 0$ and $n \to \infty$, the desired conclusion follows from Chebyshev's inequality. This completes the proof of Lemma 6.1.

Proof of Theorem 3.1.
Set

$$A(X_i, U_i, Z_i) = \frac{1}{\sqrt{nh^2}} \Big( \alpha_0 + \beta_0 \frac{X_i - x_0}{h} + \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \Big),$$
$$B(X_i, U_i, Z_i) = \frac{1}{2} a_0''(\xi_i)(X_i - x_0)^2 + \frac{1}{2} \sum_{j=1}^{p} a_j''(\eta_{ij})(U_i - u_0)^2 Z_{ij}, \tag{6.8}$$

and abbreviate $A = A(X_1, U_1, Z_1)$ and $B = B(X_1, U_1, Z_1)$ below. By Lemma 6.1, under Assumptions (1)–(7), $A(X_i, U_i, Z_i)$ and $B(X_i, U_i, Z_i)$ converge to zero as $n \to \infty$. Start from the equality

$$F_n = I_{\{d_n < \delta\}} F_n + I_{\{d_n \ge \delta\}} F_n. \tag{6.9}$$

We first derive the limit of $F_n$. For fixed $(\alpha_0, \beta_0, \alpha, \beta)$, we have

$$I_{\{d_n < \delta\}} F_n = I_{\{d_n < \delta\}} \sum_{i=1}^{n} E\Big[ \big( |A(X_i, U_i, Z_i) - B(X_i, U_i, Z_i) - \varepsilon_i| - |B(X_i, U_i, Z_i) + \varepsilon_i| \big) K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0) \Big]$$

$$= I_{\{d_n < \delta\}}\, n E\Big[ E\big( |A - B - \varepsilon_1| - |B + \varepsilon_1| \mid X, U, Z \big) K_{1,h}(X_1 - x_0) K_{2,h}(U_1 - u_0) \Big]$$

$$= I_{\{d_n < \delta\}}\, n E\Bigg[ \Bigg( \int_{A - B < w < \delta} \big( w - (A - B) \big)\, g(w \mid X_1, U_1)\,dw + \int_{-\delta < w < A - B} \big( (A - B) - w \big)\, g(w \mid X_1, U_1)\,dw - \int_{-B < w < \delta} (w + B)\, g(w \mid X_1, U_1)\,dw + \int_{-\delta < w < -B} (w + B)\, g(w \mid X_1, U_1)\,dw - \int_{w > \delta} A\, g(w \mid X_1, U_1)\,dw + \int_{w < -\delta} A\, g(w \mid X_1, U_1)\,dw \Bigg) K_{1,h}(X_1 - x_0) K_{2,h}(U_1 - u_0) \Bigg]. \tag{6.10}$$

By the Integral Mean Value Theorem (see the Appendix), we have

$$I_{\{d_n < \delta\}} F_n = n I_{\{d_n < \delta\}} E\Bigg\{ \Bigg[ g(\tau_1 \mid X_1, U_1) \Big( \frac{\delta^2 + (B - A)^2}{2} + (B - A)\delta \Big) + g(\tau_2 \mid X_1, U_1) \Big( \frac{\delta^2 + (A - B)^2}{2} + (A - B)\delta \Big) + A\big( G(\delta \mid X_1, U_1) + G(-\delta \mid X_1, U_1) - 1 \big) - g(\tau_3 \mid X_1, U_1) \Big( \frac{\delta^2 + B^2}{2} + B\delta \Big) - g(\tau_4 \mid X_1, U_1) \Big( \frac{\delta^2 + B^2}{2} - B\delta \Big) \Bigg] K_{1,h}(X_1 - x_0) K_{2,h}(U_1 - u_0) \Bigg\}, \tag{6.11}$$

where $G(\cdot \mid \cdot)$ is the conditional probability distribution function of $\varepsilon$, and $\tau_1, \tau_2, \tau_3, \tau_4$ converge to zero as $\delta \to 0$ and $n \to \infty$.
Then, by Assumptions (1)–(5), for any small enough $\delta > 0$, we have

$$I_{\{d_n < \delta\}} F_n = I_{\{d_n < \delta\}}\, n E\Big[ \big( g(0 \mid X_1, U_1) + o(1) \big) \big( A^2 - 2AB \big) K_{1,h}(X_1 - x_0) K_{2,h}(U_1 - u_0) \Big]$$

$$= I_{\{d_n < \delta\}} \Big( g(0 \mid x_0, u_0) f(x_0, u_0) \Big[ \alpha_0^2 + 2\alpha_0\beta_0\mu_1(1) + \beta_0^2\mu_2(1) + \alpha^T\Omega\alpha + 2\mu_1(2)\,\alpha^T\Omega\beta + \mu_2(2)\,\beta^T\Omega\beta + 2\alpha_0\big(\alpha^T\omega + \mu_1(2)\,\beta^T\omega\big) + 2\beta_0\mu_1(1)\big(\alpha^T\omega + \mu_1(2)\,\beta^T\omega\big) - a_0''(x_0)\big(\alpha_0\mu_2(1) + \beta_0\mu_3(1) + \mu_2(1)\,\alpha^T\omega + \mu_2(1)\mu_1(2)\,\beta^T\omega\big) - c^T(u_0)\big(\alpha_0\mu_2(2)\,\omega + \beta_0\mu_1(1)\mu_2(2)\,\omega + \mu_2(2)\,\Omega\alpha + \mu_3(2)\,\Omega\beta\big) \Big] + o(1) \Big)$$

$$= I_{\{d_n < \delta\}} \big( F(\alpha_0, \beta_0, \alpha, \beta) + o(1) \big),$$

so that

$$F_n = F(\alpha_0, \beta_0, \alpha, \beta) + o(1) - \big( F(\alpha_0, \beta_0, \alpha, \beta) + o(1) - F_n \big) I_{\{d_n \ge \delta\}}. \tag{6.12}$$

Noting that, for any fixed $(\alpha_0, \beta_0, \alpha, \beta)$, $d_n \to 0$ as $n \to \infty$, we obtain the desired conclusion. This completes the proof of Theorem 3.1.

Proof of Theorem 3.3. Since the proofs of (3.7) and (3.8) are quite similar, we only give the proof of (3.8). Start from the equality

$$L_n = \sum_{i=1}^{n} \frac{1}{\sqrt{nh^2}} \Big( \alpha_0 + \beta_0 \frac{X_i - x_0}{h} + \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \Big) K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i). \tag{6.13}$$

By Lemma 6.1 and Theorem 3.1, for fixed $(\alpha_0, \beta_0, \alpha, \beta)$, we have

$$S_n = F(\alpha_0, \beta_0, \alpha, \beta) - L_n + R_n + o(1) = F(\alpha_0, \beta_0, \alpha, \beta) - L_n + R_n, \tag{6.14}$$

hence

$$S_n + L_n = F(\alpha_0, \beta_0, \alpha, \beta) + R_n. \tag{6.15}$$

Note that

$$E L_n^2 = \sum_{i=1}^{n} E L_{ni}^2 \le 2n E\Bigg[ \Big( \alpha_0 + \beta_0 \frac{X_i - x_0}{h} \Big)^2 \frac{1}{nh^2} K_{1,h}^2(X_i - x_0) K_{2,h}^2(U_i - u_0) + \Big( \alpha^T Z_i + \beta^T \frac{U_i - u_0}{h} Z_i \Big)^2 \frac{1}{nh^2} K_{1,h}^2(X_i - x_0) K_{2,h}^2(U_i - u_0) \Bigg]$$

$$\le 2 \Big[ \gamma_0(2)\big( \alpha_0^2\gamma_0(1) + 2\alpha_0\beta_0\gamma_1(1) + 2\beta_0^2\gamma_2(1) \big) + \gamma_0(1)\,\alpha^T\Omega\alpha + 2\gamma_1(1)\,\alpha^T\Omega\beta + \gamma_2(2)\,\beta^T\Omega\beta \Big] \big( 1 + o(1) \big). \tag{6.16}$$

We thus obtain that $L_n$ is bounded in probability for any fixed $(\alpha_0, \beta_0, \alpha, \beta)$. Therefore, for any fixed $(\alpha_0, \beta_0, \alpha, \beta)$, the random convex function $S_n + L_n$ converges to $F(\alpha_0, \beta_0, \alpha, \beta)$. By the convexity lemma [17], we can deduce that, for any compact set $\mathcal{K}$,

$$\sup_{(\alpha_0, \beta_0, \alpha^T, \beta^T) \in \mathcal{K}} |R_n| = o_p(1) \tag{6.17}$$

as $\delta \to 0$. By the proof of Theorem 2 in Wang [18], the "limit" here is not only the limit of a sequence of random variables but also the limit of a sequence of stochastic processes, and the minimizer of $S_n$ converges to the minimizer of $F(\alpha_0, \beta_0, \alpha, \beta) - L_n$.
By the convexity of the function $F(\alpha_0, \beta_0, \alpha, \beta)$, we have

$$\hat\beta_0 = \frac{\sum_{i=1}^{n} \big(\mu_1(1) - \frac{X_i - x_0}{h}\big) \frac{1}{\sqrt{nh^2}} K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i)}{2\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(1) - \mu_2(1)\big)} + \frac{\big(\mu_2(1)\mu_1(1) - \mu_3(1)\big)\, a_0''(x_0)}{2\big(\mu_1^2(1) - \mu_2(1)\big)},$$

$$\hat\beta = \frac{\Omega^{-1} \sum_{i=1}^{n} \big(\mu_1(2) - \frac{U_i - u_0}{h}\big) Z_i \frac{1}{\sqrt{nh^2}} K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i)}{2\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(2) - \mu_2(2)\big)} + \frac{\big(\mu_2(2)\mu_1(2) - \mu_3(2)\big)\, c(u_0)}{2\big(\mu_1^2(2) - \mu_2(2)\big)},$$

$$\hat\alpha = \frac{\big(\mu_2^2(2) - \mu_1(2)\mu_3(2)\big)\, c(u_0)}{2\big(\mu_2(2) - \mu_1^2(2)\big)} + \frac{1}{2\sqrt{nh^2}} \big(\Omega - \omega\omega^T\big)^{-1} \sum_{i=1}^{n} (Z_i - \omega) K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i) - \frac{\mu_1(2)\,\Omega^{-1} \sum_{i=1}^{n} \big(\mu_1(2) - \frac{U_i - u_0}{h}\big) Z_i \frac{1}{\sqrt{nh^2}} K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i)}{2\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(2) - \mu_2(2)\big)},$$

$$\hat\alpha_0 = \frac{\big(\mu_2^2(1) - \mu_3(1)\mu_1(1)\big)\, a_0''(x_0)}{2\big(\mu_2(1) - \mu_1^2(1)\big)} + \frac{1}{2\sqrt{nh^2}} \sum_{i=1}^{n} \big(1 - \omega^T\Omega^{-1} Z_i\big) K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i) - \frac{\mu_1(1) \sum_{i=1}^{n} \big(\mu_1(1) - \frac{X_i - x_0}{h}\big) \frac{1}{\sqrt{nh^2}} K_{1,h}(X_i - x_0) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i)}{2\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(1) - \mu_2(1)\big)}. \tag{6.18}$$

Thus we obtain

$$\frac{\hat\alpha(u_0)}{\sqrt h} = \frac{1}{n\sqrt h} \sum_{k=1}^{n} \hat\alpha(X_k, u_0) = \frac{1}{2nh\sqrt{nh}} \sum_{k=1}^{n} \sum_{i=1}^{n} \big(\Omega - \omega\omega^T\big)^{-1} (Z_i - \omega) K_{1,h}(X_i - X_k) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i) - \frac{\mu_1(2)\,\Omega^{-1} \sum_{k=1}^{n} \sum_{i=1}^{n} \big(\mu_1(2) - \frac{U_i - u_0}{h}\big) Z_i K_{1,h}(X_i - X_k) K_{2,h}(U_i - u_0)\,\mathrm{sgn}(\varepsilon_i)}{2nh\sqrt{nh}\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(2) - \mu_2(2)\big)} + \frac{\big(\mu_2^2(2) - \mu_1(2)\mu_3(2)\big)\, c(u_0)}{2\sqrt h\big(\mu_2(2) - \mu_1^2(2)\big)} =: A_1 + \frac{\big(\mu_2^2(2) - \mu_1(2)\mu_3(2)\big)\, c(u_0)}{2\sqrt h\big(\mu_2(2) - \mu_1^2(2)\big)}. \tag{6.19}$$

By interchanging the summation signs and noting that

$$\frac{1}{n} \sum_{k=1}^{n} K_{1,h}(X_i - X_k) = h f_1(X_i) \big( 1 + o_p(1) \big), \tag{6.20}$$

$A_1$ can be rewritten as

$$A_1 = \frac{1}{2\sqrt{nh}} \sum_{i=1}^{n} \big(\Omega - \omega\omega^T\big)^{-1} (Z_i - \omega) K_{2,h}(U_i - u_0) f_1(X_i)\,\mathrm{sgn}(\varepsilon_i) \big( 1 + o_p(1) \big) - \frac{\mu_1(2)\,\Omega^{-1} \sum_{i=1}^{n} \big(\mu_1(2) - \frac{U_i - u_0}{h}\big) Z_i K_{2,h}(U_i - u_0) f_1(X_i)\,\mathrm{sgn}(\varepsilon_i)}{2\sqrt{nh}\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(2) - \mu_2(2)\big)} \big( 1 + o_p(1) \big). \tag{6.21}$$

For all $t \in R^p$, by some straightforward computations, we get that the mean and variance of $t^T A_1$ are

$$E\big( t^T A_1 \big) = 0,$$
$$E\big( t^T A_1 \big)^2 = t^T \Bigg[ \frac{\big(\Omega - \omega\omega^T\big)^{-1} \gamma_0(2)\, r_1^2(u_0)}{4} + \frac{\Omega^{-1} \mu_1^2(2)\big(\mu_1^2(2)\gamma_0(2) - 2\mu_1(2)\gamma_1(2) + \gamma_2(2)\big)\, r_1^2(u_0)}{4\, g^2(0 \mid x_0, u_0) f^2(x_0, u_0) \big(\mu_1^2(2) - \mu_2(2)\big)^2} - \frac{\mu_1(2)\big(\mu_1(2)\gamma_0(2) - \gamma_1(2)\big)\, r_1^2(u_0)}{2\, g(0 \mid x_0, u_0) f(x_0, u_0) \big(\mu_1^2(2) - \mu_2(2)\big)} \Bigg] t\, \big( 1 + o_p(1) \big). \tag{6.22}$$

By Assumptions (1)–(7), and using the method of the proof of Theorem 1 in Tang and Wang [19], we can easily check that the Lindeberg-Feller condition holds for $t^T A_1$.
So the Cramér–Wold theorem tells us that (3.8) follows. This completes the proof of Theorem 3.3.

Appendix

Integral Mean Value Theorem

If $f$ and $g$ are continuous functions defined on $[a, b]$, and $g(x) \ge 0$ or $g(x) \le 0$ on $[a, b]$, then there exists a number $\xi \in [a, b]$ such that

$$\int_a^b f(x) g(x)\,dx = f(\xi) \int_a^b g(x)\,dx. \tag{A.1}$$

Acknowledgments

The authors gratefully acknowledge the helpful recommendations from the referees that led to an improved revision of an earlier paper.

References

[1] H. Wong, R. Zhang, W.-c. Ip, and G. Li, "Functional-coefficient partially linear regression model," Journal of Multivariate Analysis, vol. 99, no. 2, pp. 278–305, 2008.
[2] H. Zhu, P. J. Brown, and J. S. Morris, "Robust, adaptive functional regression in functional mixed model framework," Journal of the American Statistical Association, vol. 106, no. 495, pp. 1167–1179, 2011.
[3] G. Aneiros-Pérez and P. Vieu, "Nonparametric time series prediction: a semi-functional partial linear modeling," Journal of Multivariate Analysis, vol. 99, no. 5, pp. 834–857, 2008.
[4] I. Ahmad, S. Leelahanon, and Q. Li, "Efficient estimation of a semiparametric partially linear varying coefficient model," The Annals of Statistics, vol. 33, no. 1, pp. 258–283, 2005.
[5] W. S. Cleveland, E. Grosse, and W. M. Shyu, "Local regression models," in Statistical Models in S, J. M. Chambers and T. J. Hastie, Eds., pp. 309–376, Wadsworth and Brooks, Pacific Grove, Calif, USA, 1992.
[6] D. R. Hoover, J. A. Rice, C. O. Wu, and L.-P. Yang, "Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data," Biometrika, vol. 85, no. 4, pp. 809–822, 1998.
[7] C. O. Wu, C.-T. Chiang, and D. R. Hoover, "Asymptotic confidence regions for kernel smoothing of a varying-coefficient model with longitudinal data," Journal of the American Statistical Association, vol. 93, no. 444, pp. 1388–1402, 1998.
[8] T. Hastie and R. Tibshirani, "Varying-coefficient models," Journal of the Royal Statistical Society Series B, vol. 55, no. 4, pp. 757–796, 1993, with discussion and a reply by the authors.
[9] C.-T. Chiang, J. A. Rice, and C. O. Wu, "Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables," Journal of the American Statistical Association, vol. 96, no. 454, pp. 605–619, 2001.
[10] J. Fan and W. Zhang, "Statistical estimation in varying coefficient models," The Annals of Statistics, vol. 27, no. 5, pp. 1491–1518, 1999.
[11] J. Fan and J.-T. Zhang, "Two-step estimation of functional linear models with applications to longitudinal data," Journal of the Royal Statistical Society Series B, vol. 62, no. 2, pp. 303–322, 2000.
[12] J. Fan and I. Gijbels, Local Polynomial Modelling and Its Applications, vol. 66 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1996.
[13] F. T. Wang and D. W. Scott, "The L1 method for robust nonparametric regression," Journal of the American Statistical Association, vol. 89, no. 425, pp. 65–76, 1994.
[14] M. S. Bazaraa and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, John Wiley & Sons, New York, NY, USA, 1979.
[15] J. A. Rice and B. W. Silverman, "Estimating the mean and covariance structure nonparametrically when the data are curves," Journal of the Royal Statistical Society Series B, vol. 53, no. 1, pp. 233–243, 1991.
[16] B. L. S. Prakasa Rao, Asymptotic Theory of Statistical Inference, John Wiley & Sons, New York, NY, USA, 1987.
[17] D. Pollard, "Asymptotics for least absolute deviation regression estimators," Econometric Theory, vol. 7, no. 2, pp. 186–199, 1991.
[18] J. D. Wang, "Asymptotic normality of L1-estimators in nonlinear regression," Journal of Multivariate Analysis, vol. 54, no. 2, pp. 227–238, 1995.
[19] Q. Tang and J. Wang, "L1-estimation for varying coefficient models," Statistics, vol. 39, no. 5, pp. 389–404, 2005.