Hindawi Publishing Corporation
Abstract and Applied Analysis, Volume 2013, Article ID 139318, 7 pages
http://dx.doi.org/10.1155/2013/139318

Research Article
Regularized Least Square Regression with Unbounded and Dependent Sampling

Xiaorong Chu and Hongwei Sun
School of Mathematical Science, University of Jinan, Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan 250022, China

Correspondence should be addressed to Hongwei Sun; ss_sunhw@ujn.edu.cn

Received 29 October 2012; Revised 22 March 2013; Accepted 22 March 2013
Academic Editor: Changbum Chun

Copyright © 2013 X. Chu and H. Sun. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract. This paper focuses on the least square regression problem for α-mixing and φ-mixing processes. The standard boundedness assumption on the output data is abandoned, and the learning algorithm is implemented with samples drawn from a dependent sampling process under a more general condition on the output data. Capacity-independent error bounds and learning rates are deduced by means of the integral operator technique.

1. Introduction and Main Results

The aim of this paper is to study the least square regularized regression learning algorithm. The main novelty here is the unboundedness and the dependence of the sampling process. Let X be a compact metric space (usually a subset of R^n) and Y = R. Suppose that ρ is a probability distribution defined on Z = X × Y. In regression learning, one wants to learn or approximate the regression function f_ρ: X → Y given by

  f_ρ(x) = E(y | x) = ∫_Y y dρ(y | x),  x ∈ X,   (1)

where ρ(y | x) is the conditional distribution of y given x. Since ρ is unknown, f_ρ cannot be computed directly. Instead we learn a good approximation of f_ρ from a set of observations z = {(x_i, y_i)}_{i=1}^m ∈ Z^m drawn according to ρ.

The learning algorithm studied here is based on a Mercer kernel K: X × X → R, that is, a continuous, symmetric, and positive semidefinite function. The RKHS H_K associated with the Mercer kernel K is the completion of span{K_x = K(·, x): x ∈ X} with the inner product satisfying ⟨K(x, ·), K(x′, ·)⟩_K = K(x, x′). The learning algorithm is a regularization scheme in H_K given by

  f_{z,λ} = arg min_{f ∈ H_K} { (1/m) ∑_{i=1}^m (f(x_i) − y_i)² + λ‖f‖_K² },   (2)

where λ > 0 is a regularization parameter.
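By the standard representer-theorem reduction (equivalently, the analytic expression (17) in Section 2 below), the minimizer of (2) can be written as f_{z,λ} = ∑_{j=1}^m c_j K_{x_j} with c = (K + mλI)^{−1} y, where K = [K(x_i, x_j)] is the m × m kernel matrix. The following sketch is not taken from the paper; it is a minimal NumPy illustration of how (2) is solved in practice, with the Gaussian kernel, the function names (gaussian_kernel, krr_fit, krr_predict), and all parameter values chosen purely for concreteness.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian Mercer kernel matrix K(x, x') = exp(-|x - x'|^2 / (2 sigma^2))."""
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-d2 / (2 * sigma**2))

def krr_fit(X, y, lam, sigma=1.0):
    """Solve (2): with f = sum_j c_j K_{x_j}, the objective becomes
    (1/m)|Kc - y|^2 + lam * c^T K c, and a minimizer satisfies (K + m*lam*I) c = y."""
    m = len(y)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + m * lam * np.eye(m), y)

def krr_predict(X_train, c, X_new, sigma=1.0):
    """Evaluate f_{z,lam}(x) = sum_j c_j K(x, x_j) at new inputs."""
    return gaussian_kernel(X_new, X_train, sigma) @ c
```

The regularization parameter λ is the quantity that the learning rates below prescribe as a function of the sample size m.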
Error analysis for learning algorithm (2) has been studied extensively [1–4], mostly for independent samples. In recent years several studies have relaxed the independence restriction and turned to learning with dependent sampling [5–8]. In [8] the learning performance of regularized least square regression was studied for mixing sequences, and the result for this setting was refined by an operator monotone inequality in [7].

For a stationary real-valued sequence {z_i}_{i≥1}, the σ-algebra generated by the random variables z_j, z_{j+1}, ..., z_k is denoted by M_j^k. The uniformly mixing condition (or φ-mixing condition) and the strongly mixing condition (or α-mixing condition) are defined as follows.

Definition 1 (φ-mixing). The l-th φ-mixing coefficient of the sequence is defined as

  φ_l = sup_{k≥1} sup_{A ∈ M_1^k, B ∈ M_{k+l}^∞} |P(A | B) − P(A)|.   (3)

The process {z_i}_{i≥1} is said to satisfy a uniformly mixing condition (or φ-mixing condition) if φ_l → 0 as l → ∞.

Definition 2 (α-mixing). The l-th α-mixing coefficient of the sequence {z_i}_{i≥1} is defined as

  α_l = sup_{k≥1} sup_{A ∈ M_1^k, B ∈ M_{k+l}^∞} |P(A ∩ B) − P(A)P(B)|.   (4)

The process {z_i}_{i≥1} is said to satisfy a strongly mixing condition (or α-mixing condition) if α_l → 0 as l → ∞.

Since P(A ∩ B) = P(A | B)P(B), the α-mixing condition is weaker than the φ-mixing condition. Many random processes satisfy the strongly mixing condition, for example, stationary Markov processes which are uniformly pure nondeterministic, stationary Gaussian sequences with a continuous spectral density that is bounded away from 0, certain ARMA processes, and some aperiodic, Harris-recurrent Markov processes; see [5, 9] and the references therein.

In this paper we follow [7, 8] and consider α-mixing and φ-mixing processes, estimate the error bounds, and derive the learning rates of algorithm (2), where the output data satisfy the following unbounded condition.

Unbounded Hypothesis. There exist two constants M > 0 and p ≥ 2 such that

  E|y|^p ≤ M.   (5)

The error analysis for algorithm (2) is usually carried out under the standard assumption that |y| ≤ M almost surely for some constant M > 0. This standard assumption was abandoned in [10–14]. In [10] the authors introduced the condition

  ∫_Y ( exp{|y − f_H(x)|/M} − |y − f_H(x)|/M − 1 ) dρ(y | x) ≤ Σ²/(2M²)   (6)

for almost every x ∈ X and some constants M, Σ > 0, where f_H is the orthogonal projection of f_ρ onto the closure of H_K in L²_{ρ_X}(X). In [11–13] the error analysis was conducted in another setting satisfying the following moment hypothesis: there exist constants M̃ > 0 and C̃ > 0 such that ∫_Y |y|^ℓ dρ(y | x) ≤ C̃ ℓ! M̃^ℓ for all ℓ ∈ N, x ∈ X. Notice that, up to different constants, the moment hypothesis and (6) are equivalent when f_H ∈ L^∞(X) [13]. Our unbounded hypothesis is a natural generalization of the moment hypothesis. An example for which the unbounded hypothesis (5) is satisfied but the moment hypothesis fails was given in [15], which studies half supervised coefficient regularization with indefinite kernels and unbounded sampling under the condition ∫_Z y² dρ ≤ M̃² for some constant M̃ > 0.

Since E(f_ρ) = min_f E(f), where the generalization error is E(f) = ∫_Z (f(x) − y)² dρ, the goodness of the approximation of f_ρ by f_{z,λ} is usually measured by the excess generalization error E(f_{z,λ}) − E(f_ρ) = ‖f_{z,λ} − f_ρ‖²_{ρ_X}. Denoting

  κ := sup_{x∈X} √(K(x, x)) < ∞,   (7)

the reproducing property in the RKHS H_K yields ‖f‖_∞ ≤ κ‖f‖_K for any f ∈ H_K. Thus the distance between f_{z,λ} and f_ρ in H_K can also be used to measure this approximation when f_ρ ∈ H_K.

The noise-free limit of algorithm (2) takes the form

  f_λ := arg min_{f ∈ H_K} { ‖f − f_ρ‖²_{ρ_X} + λ‖f‖_K² },   (8)

so the error analysis can be divided into two parts: the difference between f_{z,λ} and f_λ is called the sample error, and the distance between f_λ and f_ρ is called the approximation error. We bound the error in L²_{ρ_X}(X) and in H_K, respectively. Estimating the sample error is more difficult because f_{z,λ} changes with the sample z and cannot be treated as a fixed function.
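For intuition about the sampling model (an illustration added here, not part of the paper's analysis), the sketch below simulates one dependent, unbounded sample of the kind covered by the α-mixing setting: the inputs follow a stationary Gaussian AR(1) chain, an instance of the strongly mixing examples listed above, and the outputs are y = sin(3x) plus Student-t noise, so y is unbounded while E|y|^p < ∞ for every p below the degrees of freedom, as required by the unbounded hypothesis (5). The function simulate, the target sin(3x), and all numerical values are invented for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(m, a=0.6, df=5.0):
    """One dependent, unbounded sample of size m (illustrative choices throughout):
    inputs follow a stationary Gaussian AR(1) chain (strongly mixing), and
    y = sin(3x) + Student-t noise, so |y| is unbounded but E|y|^p < inf for p < df."""
    x = np.empty(m)
    x[0] = rng.standard_normal()                 # start in the stationary law N(0, 1)
    for i in range(1, m):
        x[i] = a * x[i - 1] + np.sqrt(1 - a**2) * rng.standard_normal()
    y = np.sin(3 * x) + 0.3 * rng.standard_t(df, size=m)
    return x, y

x, y = simulate(5000)
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]          # dependence between consecutive inputs
moment3 = np.mean(np.abs(y) ** 3)                # empirical check of E|y|^p for p = 3 < df
print(f"lag-1 autocorrelation of inputs: {lag1:.2f}")
print(f"empirical E|y|^3: {moment3:.2f}")
```

A sample of this kind can be fed directly to the estimator sketched after (2); the theorems below describe how fast such an estimator converges when λ is chosen appropriately.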
The approximation error does not depend on the samples and has been studied in the literature [2, 3, 7, 16, 17]. The next two sections are mainly devoted to estimating the sample error for the more general sampling processes. Our main results can be stated as follows.

Theorem 3. Suppose that the unbounded hypothesis holds, L_K^{−r} f_ρ ∈ L²_{ρ_X}(X) for some r > 0, and the φ-mixing coefficients satisfy a polynomial decay, that is, φ_l ≤ a l^{−t} for some a > 0 and t > 0. Then, for any 0 < δ < 1, one has with confidence 1 − δ,

  ‖f_{z,λ} − f_ρ‖_{ρ_X} = O( m^{−θ min{t/2, 1}} (log m)^{3/4} ),   (9)

where θ is given by

  θ = 3r/(4(r+1))   if 0 < r < 1/2,
  θ = r/(2r+1)      if 1/2 ≤ r < 1,
  θ = 1/3           if r ≥ 1.   (10)

Moreover, when r > 1/2, one has with confidence 1 − δ,

  ‖f_{z,λ} − f_ρ‖_K = O( m^{−θ′ min{t/2, 1}} (log m)^{1/2} ),   (11)

where θ′ is given by

  θ′ = (2r−1)/(2(2r+1))   if 1/2 < r < 3/2,
  θ′ = 1/4                if r ≥ 3/2.   (12)

Theorem 3 establishes the asymptotic convergence of algorithm (2) when the samples satisfy a uniformly mixing condition. Our second main result considers the algorithm with an α-mixing process.

Theorem 4. Suppose that the unbounded hypothesis holds with p > 2, L_K^{−r} f_ρ ∈ L²_{ρ_X}(X) for some r > 0, and the α-mixing coefficients satisfy a polynomial decay, that is, α_l ≤ a l^{−t} for some a > 0 and t > 0. Then, for any 0 < δ < 1, one has with confidence 1 − δ,

  ‖f_{z,λ} − f_ρ‖_{ρ_X} = O( m^{−θ min{(p−2)t/p, 1}} (log m)^{1/2} ),   (13)

where θ is given by

  θ = pr/(2(2r+p−1))      if 0 < r < 1/2, 0 < t < p/(p−2);
  θ = 3pr/(2(4r+3p−2))    if 0 < r < 1/2, t ≥ p/(p−2);
  θ = r/(2r+1)            if 1/2 ≤ r < 1;
  θ = 1/3                 if r ≥ 1.   (14)

Moreover, when r > 1/2, with confidence 1 − δ,

  ‖f_{z,λ} − f_ρ‖_K = O( m^{−θ′ min{(p−2)t/p, 1}} (log m)^{1/2} ),   (15)

where θ′ is given by

  θ′ = (2r−1)/(4r+2)   if 1/2 < r < 3/2,
  θ′ = 1/4             if r ≥ 3/2.   (16)

The proofs of these two theorems are given in Sections 2, 3, and 4; notice that the log term can be dropped when t ≠ 2. Our error analysis reveals some interesting phenomena for learning with unbounded and dependent sampling.

(i) A smoother target function f_ρ (i.e., a larger r) yields better learning rates. Stronger dependence between samples (i.e., a smaller t) means that the samples carry less information and hence leads to worse rates.

(ii) The learning rates improve as the dependence between samples becomes weaker and r becomes larger, but they stop improving beyond certain values of t and r. This phenomenon is called the saturation effect and was discussed in [18–20]. In our setting there are two saturations: saturation with respect to the smoothness of f_ρ, mainly related to the approximation error, and saturation with respect to the dependence between samples. An interesting phenomenon revealed here is that when the α-mixing coefficients satisfy α_l = O(l^{−t}), l ∈ N, for some t > 0, the saturation for the dependence between samples occurs at t = p/(p−2) for p > 2, which depends on the unboundedness parameter p.

(iii) For the φ-mixing process, the learning rates do not involve the unboundedness parameter p, since E(y − f_ρ(x))² is bounded by Ey² < ∞. For the α-mixing process, however, to derive the learning rate we have to estimate E|y − f_λ(x)|^p with p > 2.

(iv) Under the α-mixing condition, when t > p/(p−2) and r ≥ 1/2, the influence of the unbounded condition becomes weak. Recall that the learning rate derived in [8] is O(m^{−r/(1+2r)}) for 1/2 ≤ r ≤ 1, t ≥ 1. Hence, when t is large enough, our learning rate for unbounded samples is as sharp as that for uniformly bounded sampling.
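To illustrate how the rates read in a specific case (an example added for orientation; it simply instantiates the statements above): if f_ρ lies in the range of L_K, so that r = 1, and the φ-mixing coefficients decay at least as fast as l^{−t} with t > 2, then min{t/2, 1} = 1 and Theorem 3 gives, with confidence 1 − δ,

  ‖f_{z,λ} − f_ρ‖_{ρ_X} = O(m^{−1/3}),

the log factor being dropped since t ≠ 2; the proof in Section 4 attains this with the choice λ = m^{−1/3}. Under the same smoothness, Theorem 4 for an α-mixing process with p > 2 and t ≥ p/(p−2) gives the same exponent 1/3 (up to the logarithmic factor), so for weak enough dependence the unboundedness costs nothing in the exponent, as remark (iv) points out.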
2. Sampling Satisfying φ-Mixing Condition

In this section we apply the integral operator technique of [7] to handle the sample error under the φ-mixing condition. In contrast to the uniformly bounded case, the learning performance of the unbounded sampling is not measured directly. Instead, expectations are estimated first, and the bound for the sample error then follows from the Markov inequality.

To this end, define the sampling operator S_x: H_K → R^m by S_x(f) = (f(x_i))_{i=1}^m, where x = {x_1, ..., x_m} is the set of input data. Its adjoint is S_x^T c = ∑_{i=1}^m c_i K_{x_i} for c ∈ R^m. The analytic expressions of the optimization solutions f_{z,λ} and f_λ were given in [3]:

  f_{z,λ} = ( (1/m) S_x^T S_x + λI )^{−1} (1/m) S_x^T y,
  f_λ = (L_K + λI)^{−1} L_K f_ρ,   (17)

where L_K: L²_{ρ_X}(X) → L²_{ρ_X}(X) is the integral operator defined by

  L_K f(x) = ∫_X K(x, t) f(t) dρ_X(t),  for any x ∈ X.   (18)

For a random variable ξ with values in a Hilbert space H and 0 ≤ u ≤ +∞, denote the u-th moment by ‖ξ‖_u = (E‖ξ‖_H^u)^{1/u} if 1 ≤ u < ∞ and ‖ξ‖_∞ = sup‖ξ‖_H. Lemma 5 is due to Billingsley [21].

Lemma 5. Let ξ and η be random variables with values in a separable Hilbert space H, measurable with respect to the σ-fields J and D, and having finite p-th and q-th moments, respectively, where p, q ≥ 1 with p^{−1} + q^{−1} = 1. Then

  |E⟨ξ, η⟩ − ⟨Eξ, Eη⟩| ≤ 2 φ(J, D)^{1/p} ‖ξ‖_p ‖η‖_q.   (19)

Lemma 6. For a φ-mixing sequence {x_i}, one has

  E‖L_K − (1/m) S_x^T S_x‖² ≤ (κ⁴/m) (1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2}).   (20)

Proof. By the definition of the sampling operator,

  L_K − (1/m) S_x^T S_x = L_K − (1/m) ∑_{i=1}^m K_{x_i} ⊗ K_{x_i}.   (21)

Let A(x) = K_x ⊗ K_x; then A(x) is an HS(H_K)-valued random variable defined on X. Note that EA(x) = L_K ∈ HS(H_K), ‖L_K‖_{HS} ≤ κ², and ‖A(x)‖_{HS} ≤ κ². We have

  E‖L_K − (1/m) S_x^T S_x‖² ≤ E‖ EA − (1/m) ∑_{i=1}^m A(x_i) ‖²_{HS}
   = (1/m)‖A‖²_2 + (1/m²) ∑_{i≠j} E⟨A(x_i), A(x_j)⟩_{HS} − ‖L_K‖²_{HS}.   (22)

By Lemma 5 with p = q = 2, for i ≠ j,

  E⟨A(x_i), A(x_j)⟩_{HS} ≤ ⟨EA(x_i), EA(x_j)⟩_{HS} + 2 φ_{|i−j|}^{1/2} ‖A‖²_2 ≤ ‖L_K‖²_{HS} + 2 κ⁴ φ_{|i−j|}^{1/2}.   (23)

The desired estimate is obtained by plugging (23) into (22).
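The operator (1/m) S_x^T S_x appearing in Lemma 6 is simply the empirical counterpart of L_K: applied to a function f and evaluated at a point s it gives (1/m) ∑_i K(s, x_i) f(x_i), a sample average of the integral in (18). The sketch below (an added numerical illustration, not from the paper) compares this sample average with a fine quadrature of L_K f(s) when ρ_X is the uniform measure on [0, 1]; the kernel, the test function f, the Markov-chain input sequence, and all sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

K = lambda s, t: np.exp(-(s - t) ** 2 / (2 * 0.3 ** 2))   # illustrative Gaussian kernel
f = lambda t: np.sin(2 * np.pi * t)                        # an arbitrary test function

# Dependent inputs: random walk on the circle [0, 1), whose stationary law is uniform,
# standing in for a mixing input sequence with rho_X = Uniform[0, 1].
m = 2000
x = np.empty(m)
x[0] = rng.uniform()
for i in range(1, m):
    x[i] = (x[i - 1] + 0.25 * rng.uniform(-1, 1)) % 1.0

grid = (np.arange(10000) + 0.5) / 10000                    # fine quadrature grid for L_K
for s in (0.2, 0.5, 0.8):
    LKf = np.mean(K(s, grid) * f(grid))                    # L_K f(s) = int K(s, t) f(t) dt
    emp = np.mean(K(s, x) * f(x))                          # (1/m) S_x^T S_x applied to f, at s
    print(f"s = {s:.1f}: quadrature value {LKf:.4f} vs empirical value {emp:.4f}")
```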
Proposition 7. Suppose that the unbounded hypothesis holds with some p ≥ 2, the sample sequence {(x_i, y_i)}_{i=1}^m satisfies a φ-mixing condition, and L_K^{−r} f_ρ ∈ L²_{ρ_X}(X) with r > 0. Then one has

  E‖f_{z,λ} − f_λ‖_{ρ_X} ≤ C ( λ^{−1/2} m^{−1/2} + λ^{−1} m^{−3/4} (1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2})^{1/4} ) √(1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2}),   (24)

where C is a constant depending only on κ and M.

Proof. By [7, Theorem 3.1], we have

  E‖f_{z,λ} − f_λ‖_{ρ_X} ≤ ( λ^{−1/2} + λ^{−1} ( E‖L_K − (1/m) S_x^T S_x‖² )^{1/4} ) √( E‖ (1/m) ∑_{i=1}^m ξ(z_i) − L_K(f_ρ − f_λ) ‖²_K ),   (25)

where ξ(z) = (y − f_λ(x)) K_x is a random variable with values in H_K and Eξ = L_K(f_ρ − f_λ). A computation similar to that in Lemma 6 gives

  E‖ (1/m) ∑_{i=1}^m ξ(z_i) − L_K(f_ρ − f_λ) ‖²_K ≤ (1/m) (1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2}) ‖ξ‖²_2.   (26)

It suffices to estimate ‖ξ‖_2. By the Hölder inequality,

  Ey² ≤ (E|y|^p)^{2/p} ≤ M^{2/p},
  ‖f_ρ‖²_{ρ_X} = ∫_X f_ρ²(x) dρ_X = ∫_X ( ∫_Y y dρ(y|x) )² dρ_X ≤ ∫_Z y² dρ ≤ M^{2/p}.   (27)

Thus E(y − f_ρ(x))² = Ey² − ‖f_ρ‖²_{ρ_X} ≤ M^{2/p} and

  E(f_λ(x) − f_ρ(x))² = ‖λ(λI + L_K)^{−1} f_ρ‖²_{ρ_X} ≤ ‖f_ρ‖²_{ρ_X} ≤ M^{2/p},   (28)

which implies

  ‖ξ‖²_2 = E( (y − f_λ(x))² K(x, x) ) ≤ κ² E(y − f_λ(x))²
   = κ² ( E(y − f_ρ(x))² + E(f_ρ(x) − f_λ(x))² ) ≤ 2 κ² M^{2/p}.   (29)

Plugging (29) into (26), there holds

  E‖ (1/m) ∑_{i=1}^m ξ(z_i) − L_K(f_ρ − f_λ) ‖²_K ≤ 2 M^{2/p} κ² m^{−1} (1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2}).   (30)

Combining (25), (22), and (30) and taking the constant C = √2 κ(κ + 1) M^{1/p}, we complete the proof.

The following proposition bounds the difference between f_{z,λ} and f_λ in H_K for the φ-mixing process.

Proposition 8. Under the assumptions of Proposition 7, there holds

  E‖f_{z,λ} − f_λ‖_K ≤ √2 M^{1/p} κ λ^{−1} m^{−1/2} √(1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2}).   (31)

Proof. The representations of f_{z,λ} and f_λ imply that

  E‖f_{z,λ} − f_λ‖_K = E‖ ( (1/m) S_x^T S_x + λI )^{−1} ( (1/m) ∑_{i=1}^m ξ(z_i) − L_K(f_ρ − f_λ) ) ‖_K
   ≤ λ^{−1} √( E‖ (1/m) ∑_{i=1}^m ξ(z_i) − L_K(f_ρ − f_λ) ‖²_K ).   (32)

Then the desired bound follows from (30) and (32).
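A remark on how these expectation bounds are used (added here for completeness; the text above only points to the Markov inequality): for a nonnegative random variable T and 0 < δ < 1, Markov's inequality gives

  P( T ≥ E T / δ ) ≤ δ,

so T ≤ δ^{−1} E T with confidence 1 − δ. Applied with T = ‖f_{z,λ} − f_λ‖_{ρ_X} or T = ‖f_{z,λ} − f_λ‖_K, each expectation bound in Propositions 7 and 8 (and in Propositions 9 and 10 of the next section) turns into a confidence bound at the cost of a factor δ^{−1}; this is the origin of the δ^{−1} terms in (46) and (50) below.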
3. Samples Satisfying α-Mixing Condition

We now turn to bounding the sample error when the sampling process satisfies the strongly mixing condition and the unbounded hypothesis holds. In Section 2 the key point was to estimate ‖ξ‖_2 in the absence of uniform boundedness. For sampling satisfying the α-mixing condition, we have to deal with ‖ξ‖_p for some p > 2.

Proposition 9. Suppose that the unbounded hypothesis holds with some p > 2, the sample sequence {(x_i, y_i)}_{i=1}^m satisfies an α-mixing condition, and L_K^{−r} f_ρ ∈ L²_{ρ_X}(X) with r > 0. Then one gets

  E‖f_{z,λ} − f_λ‖_{ρ_X} ≤ C̃ λ^{min{(p−2)(2r−1)/2p, 0}} √(1 + ∑_{l=1}^{m−1} α_l^{(p−2)/p}) × ( λ^{−1/2} m^{−1/2} + λ^{−1} m^{−3/4} (1 + ∑_{l=1}^{m−1} α_l)^{1/4} ),   (33)

where C̃ is a constant depending only on κ, M, and ‖L_K^{−min{r,1/2}} f_ρ‖_{ρ_X}.

Proof. For the strongly mixing process, by [8, Lemma 5.1],

  E‖L_K − (1/m) S_x^T S_x‖² ≤ (κ⁴/m) (1 + 30 ∑_{l=1}^{m−1} α_l).   (34)

Taking δ = p − 2 in [8, Lemma 4.2], we have

  E‖ (1/m) ∑_{i=1}^m ξ(z_i) − L_K(f_ρ − f_λ) ‖²_K ≤ (1/m) ‖ξ‖²_2 + (30/m) ∑_{l=1}^{m−1} α_l^{(p−2)/p} ‖ξ‖²_p.   (35)

The estimate of ‖ξ‖_2 was obtained in Section 2, so we mainly devote ourselves to estimating ‖ξ‖_p. For this we need the following bound on f_λ ([3, Lemma 3] or [8, Lemma 4.3]):

  |f_λ(x)| ≤ κ ‖f_λ‖_K ≤ κ C_1 λ^{min{(2r−1)/2, 0}},   (36)

where C_1 = ‖L_K^{−min{r,1/2}} f_ρ‖_{ρ_X}. Observe also that ‖f_λ‖²_{ρ_X} ≤ Ey² ≤ M^{2/p}. Hence

  (E|f_λ(x)|^p)^{2/p} ≤ ( ‖f_λ‖_∞^{p−2} ‖f_λ‖²_{ρ_X} )^{2/p} ≤ ( (κ C_1)^{p−2} λ^{(p−2) min{(2r−1)/2, 0}} M^{2/p} )^{2/p}
   ≤ (C_1² κ² + 1) M^{4/p²} λ^{min{(p−2)(2r−1)/p, 0}}.   (37)

Now we can deduce that

  ‖ξ‖²_p = ( E((y − f_λ(x))² K(x, x))^{p/2} )^{2/p} ≤ κ² (E|y − f_λ(x)|^p)^{2/p}
   ≤ 4 κ² (E max{|y|^p, |f_λ(x)|^p})^{2/p} ≤ 4 κ² ( (E|y|^p)^{2/p} + (E|f_λ(x)|^p)^{2/p} )
   ≤ 4 κ² ( M^{2/p} + M^{4/p²} (C_1² κ² + 1) ) λ^{min{(p−2)(2r−1)/p, 0}}.   (38)

Plugging this estimate into (35) yields

  E‖ (1/m) ∑_{i=1}^m ξ(z_i) − L_K(f_ρ − f_λ) ‖²_K ≤ C_2 m^{−1} λ^{min{(p−2)(2r−1)/p, 0}} (1 + ∑_{l=1}^{m−1} α_l^{(p−2)/p}),   (39)

where C_2 is a constant depending only on κ, M, and ‖L_K^{−min{r,1/2}} f_ρ‖_{ρ_X}. Then combining (34) and (39) with (25), we complete the proof.

For the α-mixing process we have the following proposition bounding the sample error in H_K; its proof follows directly from inequality (32).

Proposition 10. Under the assumptions of Proposition 9, one has

  E‖f_{z,λ} − f_λ‖_K ≤ C_3 λ^{min{(p−2)(2r−1)/2p, 0} − 1} m^{−1/2} √(1 + ∑_{l=1}^{m−1} α_l^{(p−2)/p}),   (40)

where C_3 = √C_2.

4. Error Bounds and Learning Rates

In this section we derive the learning rates, that is, the convergence rates of ‖f_{z,λ} − f_ρ‖_{ρ_X} and ‖f_{z,λ} − f_ρ‖_K as m → ∞, by choosing the regularization parameter λ according to m. The following approximation error bound is needed.

Proposition 11. Suppose that L_K^{−r} f_ρ ∈ L²_{ρ_X}(X) for some r > 0. Then

  ‖f_λ − f_ρ‖_{ρ_X} ≤ λ^{min{r, 1}} ‖L_K^{−min{r, 1}} f_ρ‖_{ρ_X}.   (41)

Moreover, when r ≥ 1/2, that is, f_ρ ∈ H_K, there holds

  ‖f_λ − f_ρ‖_K ≤ λ^{min{r−1/2, 1}} ‖L_K^{−min{r, 3/2}} f_ρ‖_{ρ_X}.   (42)

The first conclusion in Proposition 11 was proved in [20], and the second one can be proved in the same way. To derive the learning rates, we need to balance the approximation error and the sample error. For this purpose, the following simple facts are necessary:

  ∑_{l=1}^{m−1} l^{−s} ≤ m^{1−s}/(1−s)   if 0 < s < 1,
  ∑_{l=1}^{m−1} l^{−s} ≤ log m           if s = 1,
  ∑_{l=1}^{m−1} l^{−s} ≤ 1/(s−1)         if s > 1.   (43)
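Before the formal proofs, here is the balancing computation in one representative case (an added illustration; the proofs below carry it out for all cases). Take Theorem 3 with 0 < r < 1/2 and 0 < t < 2. By (43), 1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2} = O(m^{1−t/2}), so Proposition 7 yields E‖f_{z,λ} − f_λ‖_{ρ_X} = O(λ^{−1/2} m^{−t/4} + λ^{−1} m^{−3t/8}), while Proposition 11 bounds the approximation error by λ^r ‖L_K^{−r} f_ρ‖_{ρ_X}. Equating the dominant terms,

  λ^r = λ^{−1} m^{−3t/8}  ⟺  λ = m^{−3t/(8(r+1))},

gives the rate O(m^{−3tr/(8(r+1))}), which is (9) with θ = 3r/(4(r+1)) and min{t/2, 1} = t/2, and this is exactly the choice of λ made in Case 1 of the proof below.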
Proof of Theorem 3. The estimate of the learning rate in the L²_{ρ_X}(X) norm is divided into two cases.

Case 1. For 0 < t < 2, by (43) and φ_l ≤ a l^{−t}, there is

  1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2} ≤ 1 + 4√a ∑_{l=1}^{m−1} l^{−t/2} ≤ (1 + 8√a/(2−t)) m^{1−t/2}.   (44)

Thus Proposition 7 yields

  E‖f_{z,λ} − f_λ‖_{ρ_X} ≤ 2C (1 + 16√a/(2−t)) ( λ^{−1/2} m^{−t/4} + λ^{−1} m^{−3t/8} ).   (45)

By Proposition 11 and the Markov inequality, with confidence 1 − δ there holds

  ‖f_{z,λ} − f_ρ‖_{ρ_X} ≤ O( λ^{min{r,1}} + δ^{−1} ( λ^{−1/2} m^{−t/4} + λ^{−1} m^{−3t/8} ) ).   (46)

For 0 < r < 1/2, taking λ = m^{−3t/(8(r+1))}, we deduce the learning rate O(m^{−3tr/(8(r+1))}). When 1/2 ≤ r < 1, taking λ = m^{−t/(2(2r+1))}, the learning rate O(m^{−rt/(2(2r+1))}) can be derived. When r ≥ 1, the desired convergence rate is obtained by taking λ = m^{−t/6}.

Case 2. For t ≥ 2, with confidence 1 − δ there holds

  ‖f_{z,λ} − f_ρ‖_{ρ_X} = O( λ^{min{r,1}} + δ^{−1} ( λ^{−1/2} m^{−1/2} + λ^{−1} m^{−3/4} ) (log m)^{3/4} ).   (47)

For 0 < r < 1/2, taking λ = m^{−3/(4(r+1))}, the learning rate O(m^{−3r/(4(r+1))} (log m)^{3/4}) can be derived, and for 1/2 ≤ r < 1, taking λ = m^{−1/(2r+1)}, we deduce the learning rate O(m^{−r/(2r+1)} (log m)^{3/4}). When r ≥ 1, the desired convergence rate is obtained by taking λ = m^{−1/3}.

Next, for bounding the generalization error in H_K, Proposition 8 combined with Proposition 11 tells us that, with confidence 1 − δ,

  ‖f_{z,λ} − f_ρ‖_K ≤ O( λ^{min{r−1/2, 1}} + δ^{−1} λ^{−1} m^{−1/2} √(1 + 4 ∑_{l=1}^{m−1} φ_l^{1/2}) ).   (48)

The rest of the proof is analogous to the estimate of ‖f_{z,λ} − f_ρ‖_{ρ_X} above.

Proof of Theorem 4. For 0 < t < 1, by (43) and α_l ≤ a l^{−t}, there is

  1 + ∑_{l=1}^{m−1} α_l^{(p−2)/p} ≤ 1 + a^{(p−2)/p} ∑_{l=1}^{m−1} l^{−(p−2)t/p} ≤ (1 + a^{(p−2)/p} p/(p − (p−2)t)) m^{1−(p−2)t/p},
  1 + ∑_{l=1}^{m−1} α_l ≤ 1 + a ∑_{l=1}^{m−1} l^{−t} ≤ (1 + a/(1−t)) m^{1−t}.   (49)

By Propositions 9 and 11 and the Markov inequality, with confidence 1 − δ there holds

  ‖f_{z,λ} − f_ρ‖_{ρ_X} = O( λ^{min{r,1}} + δ^{−1} λ^{min{(p−2)(2r−1)/2p, 0} − 1/2} × ( m^{−(p−2)t/2p} + λ^{−1/2} m^{−(p−2)t/2p − t/4} ) ).   (50)

For 0 < r < 1/2, taking λ = m^{−(p−2)t/(2(2r+p−1))}, we deduce the learning rate O(m^{−(p−2)tr/(2(2r+p−1))}). When 1/2 ≤ r < 1, taking λ = m^{−(p−2)t/(p(2r+1))}, the learning rate O(m^{−(p−2)rt/(p(2r+1))}) can be derived. When r ≥ 1, the desired convergence rate is obtained by taking λ = m^{−(p−2)t/(3p)}. The rest of the analysis is similar; we omit it here.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (no. 11071276).

References

[1] T. Evgeniou, M. Pontil, and T. Poggio, "Regularization networks and support vector machines," Advances in Computational Mathematics, vol. 13, no. 1, pp. 1–50, 2000.
[2] S. Smale and D.-X. Zhou, "Shannon sampling. II. Connections to learning theory," Applied and Computational Harmonic Analysis, vol. 19, no. 3, pp. 285–302, 2005.
[3] S. Smale and D.-X. Zhou, "Learning theory estimates via integral operators and their approximations," Constructive Approximation, vol. 26, no. 2, pp. 153–172, 2007.
[4] Q. Wu, Y. Ying, and D.-X. Zhou, "Learning rates of least-square regularized regression," Foundations of Computational Mathematics, vol. 6, no. 2, pp. 171–192, 2006.
[5] D. S. Modha and E. Masry, "Minimum complexity regression estimation with weakly dependent observations," IEEE Transactions on Information Theory, vol. 42, no. 6, pp. 2133–2145, 1996.
[6] S. Smale and D.-X. Zhou, "Online learning with Markov sampling," Analysis and Applications, vol. 7, no. 1, pp. 87–113, 2009.
[7] H. Sun and Q. Wu, "A note on application of integral operator in learning theory," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 416–421, 2009.
[8] H. Sun and Q. Wu, "Regularized least square regression with dependent samples," Advances in Computational Mathematics, vol. 32, no. 2, pp. 175–189, 2010.
[9] K. B. Athreya and S. G. Pantula, "Mixing properties of Harris chains and autoregressive processes," Journal of Applied Probability, vol. 23, no. 4, pp. 880–892, 1986.
[10] A. Caponnetto and E. De Vito, "Optimal rates for the regularized least-squares algorithm," Foundations of Computational Mathematics, vol. 7, no. 3, pp. 331–368, 2007.
[11] Z.-C. Guo and D.-X. Zhou, "Concentration estimates for learning with unbounded sampling," Advances in Computational Mathematics, vol. 38, no. 1, pp. 207–223, 2013.
[12] S.-G. Lv and Y.-L. Feng, "Integral operator approach to learning theory with unbounded sampling," Complex Analysis and Operator Theory, vol. 6, no. 3, pp. 533–548, 2012.
[13] C. Wang and D.-X. Zhou, "Optimal learning rates for least squares regularized regression with unbounded sampling," Journal of Complexity, vol. 27, no. 1, pp. 55–67, 2011.
[14] C. Wang and Z. C. Guo, "ERM learning with unbounded sampling," Acta Mathematica Sinica, vol. 28, no. 1, pp. 97–104, 2012.
[15] X. R. Chu and H. W. Sun, "Half supervised coefficient regularization for regression learning with unbounded sampling," International Journal of Computer Mathematics, 2013.
[16] S. Smale and D.-X. Zhou, "Shannon sampling and function reconstruction from point values," Bulletin of the American Mathematical Society, vol. 41, no. 3, pp. 279–305, 2004.
[17] H. Sun and Q. Wu, "Application of integral operator for regularized least-square regression," Mathematical and Computer Modelling, vol. 49, no. 1-2, pp. 276–285, 2009.
[18] F. Bauer, S. Pereverzev, and L. Rosasco, "On regularization algorithms in learning theory," Journal of Complexity, vol. 23, no. 1, pp. 52–72, 2007.
[19] L. Lo Gerfo, L. Rosasco, F. Odone, E. De Vito, and A. Verri, "Spectral algorithms for supervised learning," Neural Computation, vol. 20, no. 7, pp. 1873–1897, 2008.
[20] H. Sun and Q. Wu, "Least square regression with indefinite kernels and coefficient regularization," Applied and Computational Harmonic Analysis, vol. 30, no. 1, pp. 96–109, 2011.
[21] P. Billingsley, Convergence of Probability Measures, John Wiley & Sons, New York, NY, USA, 1968.