Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 2010, Article ID 249507, 18 pages doi:10.1155/2010/249507 Research Article The Convergence Rate for a K-Functional in Learning Theory Bao-Huai Sheng1 and Dao-Hong Xiang2 1 2 Department of Mathematics, Shaoxing University, Shaoxing, Zhejiang 312000, China Department of Mathematics, Zhejiang Normal University, Jinhua, Zhejiang 321004, China Correspondence should be addressed to Bao-Huai Sheng, bhsheng@usx.edu.cn Received 11 November 2009; Accepted 21 February 2010 Academic Editor: Yuming Xing Copyright q 2010 B.-H. Sheng and D.-H. Xiang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. It is known that in the field of learning theory based on reproducing kernel Hilbert spaces the upper bounds estimate for a K-functional is needed. In the present paper, the upper bounds for the K-functional on the unit sphere are estimated with spherical harmonics approximation. The results show that convergence rate of the K-functional depends upon the smoothness of both the approximated function and the reproducing kernels. 1. Introduction It is known that the goal of learning theory is to approximate a function or some function features from data samples. Let X be a compact subset of n-dimensional Euclidean spaces Rn , Y ⊂ R. Then, learning theory is to find a function f related the input x ∈ X to the output y ∈ Y see 1–3. The function fx is determined by a probability distribution ρx, y ρy | xρX x on Z : X × Y, where ρX x is the marginal distribution on X and ρy | x is the condition probability of y for a given x. Generally, the distribution ρx, y is known only through a set of sample z : {zi }m i1 independently drawn according to ρx, y. Given a sample z, the regression {xi , yi }m i1 problem based on Support Vector Machine SVM learning is to find a function fz : X → Y such that fz x is a good estimate of y when a new input x is provided. The binary classification problem based on SVM learning is to find a function ψz : X → {−1, 1} which divides X into two parts. Here ψz is often induced by a real-valued function fz with the form of ψz x sgnfz x where sgnfz x 1 if fz x ≥ 0, otherwise, −1. The functions fz are often generated from the following Tikhonov regularization scheme see, e.g., 4– 9 associated with a reproducing kernel Hilbert space RKHS HK defined below and a 2 Journal of Inequalities and Applications sample z: fz : fz,λ,HK arg min f∈HK m 2 1 Vp yi fxi λf HK m i1 1.1 , p where λ > 0 is a positive constant called the regularization parameter and Vp t 1 − t max{1 − t, 0}p p ≥ 1 called p-norm SVM loss. In addition, the Tikhonov regularization scheme involving offset b see, e.g., 4, 10, 11 can be presented below with a similar way to 1.1 fz : fz, λ, HK arg min ff ∗ b, f ∗ ∈HK , b∈R m 2 1 Vp yi fxi λf ∗ HK m i1 . 1.2 We are in a position to define reproducing kernel Hilbert space. A function K : X×X → R is called a Mercer kernel if it is continuous, symmetric, and positive semidefinite, that is, for any finite set of distinct points {x1 , x2 , . . . , xl } ⊂ X, the matrix Kxi , xj li,j1 is positive semidefinite. The reproducing kernel Hilbert space RKHS HK see 12 associated with the Mercer kernel K is defined to be the closure of the linear span of the set of functions {Kx : Kx, · : x ∈ X} with the inner product ·, ·HK ·, ·K satisfying Kx , Ky K Kx, y and the reproducing property Kx , gK gx, ∀x ∈ X, g ∈ HK . 1.3 If gx i ci Kx, xi , then g 2K i,j ci cj Kxi , xj . Denote CX as the space of continuous function on X with the norm · ∞ . Let κ : K ∞ . Then the reproducing property tells that g ≤ κg , ∞ K ∀g ∈ HK . 1.4 It is easy to see that HK is a subset of CX. We say that Kx, y is a universal kernel if for any compact subset X HK is dense in CX see 13, Page 2652. Let ∧ ⊂ X be a given discrete set of finite points. Then, we may define an RKHS H∧K , · H∧K by the linear span of the set of functions {Kx : Kx, · : x ∈ ∧}. Then, it is easy to see that H∧K ⊂ HK and for any f ∈ H∧K there holds f H∧K f K . Vp Define Ef : Z Vp yfxdρx, y and fρ : arg min Ef, where the minimum is taken over all measurable functions. Then, to estimate the explicit learning rate, one needs to estimate the regularization errors see, e.g., 4, 7, 9, 14 Dλ : inf f∈HK Dλ : inf ∗ V 2 p E f − E fρ λf HK , V 2 p E f − E fρ λf ∗ HK . ff ∗ b, f ∈HK , b∈R 1.5 1.6 Journal of Inequalities and Applications 3 The convergence rate of 1.5 is controlled by the K-functional see, e.g., 9 2 K f, λ HK inf f − g p,dρX λg K , g∈HK λ > 0, f ∈ Lp dρX , 1.7 and 1.6 is controlled by another K-functional see, e.g., 4 K f, λ HK 2 f − g λg ∗ K , p,dρX inf g∈HK , gg ∗ b, g ∗ ∈HK , b∈R 1.8 where λ > 0, f ∈ Lp dρX {fx : f p,dρX < ∞} with f p, dρX ⎧ 1/p ⎪ ⎪ fxp dρX x ⎪ , 1 ≤ p < ∞, ⎪ ⎪ ⎨ X ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ess supfx, 1.9 p ∞. x∈X We notice that, on one hand, the K-functionals 1.7 and 1.8 are the modifications of the K-functional of interpolation theory see 15 since the interpolation relation 1.4. On the other hand, they are different from the usual K-functionals see e.g., 16–30 since the term g 2K . However, they have some similar point. For example, if Kx, y is a universal kernel, HK is dense in Lp dρX see e.g., 31. Moreover, some classical function spaces such as the polynomial spaces see 2, 32 and even some Sobolev spaces may be regarded as RKHS see e.g., 33. In learning theory we often require Kf, tHK Otβ and Kf, tHK Otβ for some β > 0 see e.g., 1, 7, 14. Many results on this topic have been achieved. With the weighted Durrmeyer operators 8, 9 showed the decay by taking Kx, y to be the algebraic polynomials kernels on 0, 1 × 0, 1 or on the simplex in R2 . However, in general case, the convergence of K-functional 1.8 should also be considered since the offset often has influences on the solution of the learning algorithms see e.g., 6, 11. Hence, the purpose of this paper is twofold. One is to provide the convergence rates of 1.7 and 1.8 when Kx, y is a general Mercer kernel on the unit sphere Sq and 1 ≤ p ≤ ∞. The other is how to construct functions of the type of fx β0 g ∗ x β0 m βi Kx, xi , x ∈ X, 1.10 i1 to obtain the convergence rate of 1.8. The translation networks constructed in 34–37 have the form of 1.10 and the zonal networks constructed in 38, 39 have the form of 1.10 with β0 0. So the methods used by these references may be used here to estimate the convergence rates of 1.7 and 1.8 if one can bound the term g ∗ 2K . In the present paper, we shall give the convergence rate of 1.7 and 1.8 for a general kernel defined on the unit sphere Sn {x ∈ Rn1 : x Rn1 1} and ρx, y ρy | xμn x with μn x being the usual Lebesgue measure on Sn . If there is a distortion between ρSn x 4 Journal of Inequalities and Applications and μn x, the convergence rate of 1.7-1.8 in the general case may be obtained according to the way used by 1, 8. The rest of this paper is organized as follows. In Section 2, we shall restate some notations on spherical harmonics and present the main results. Some useful lemmas dealing with the approximation order for the de la Vallée means of the spherical harmonics, the Gauss integral formula, the Marcinkiewicz-Zygmund with respect to the scattered data obtained by G. Brown and F. Dai and a result on the zonal networks approximation provided by H. N. Mhaskar will be given in Section 3. A kind of weighted norm estimate for the Mercer kernel matrices on the unit sphere will be given in Lemma 3.8. Our main results are proved in the last section. Throughout the paper, we shall write A OB if there exists a constant C > 0 such that A ≤ CB. We write A ∼ B if A OB and B OA. 2. Notations and Results To state the results of this paper, we need some notations and results on spherical harmonics. 2.1. Notations For integers l ≥ 0, q ≥ 1, the class of all one variable algebraic polynomials of degree ≤ l defined on −1, 1 is denoted by Pl , the class of all spherical harmonics of degree l will be q denoted by Hl , and the class of all spherical harmonics of degree l ≤ n will be denoted by q q n . The dimension of Hl is given by see 40, Page 65 q dl q and that of Πn is 10, Theorem 2: n q dimHl ⎧ ⎛ ⎞ ⎪ l q − 1 ⎪ 2l q − 1 ⎪ ⎝ ⎠, ⎪ ⎪ ⎪ ⎨ lq−1 q−1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩1, l ≥ 1, 2.1 l 0, q l0 dl . One has the following well-known addition formula see 41, Page q q dl d q1 Yl,k xYl,k y l pl xy , ωq k1 q1 where pl l 0, 1, . . . , x, y ∈ Sq , 2.2 x is the degree-l generalized Legendre polynomial. The Legendre polynomials q1 are normalized so that pl 1 −1 q1 pl 1 1 and satisfy the orthogonality relations q1 xpk xWq xdx ωq q δl,k , ωq−1 dl q/2−1 Wq x 1 − x2 . 2.3 Journal of Inequalities and Applications 5 p Define Lp Sq and LWα,β by taking ρX to be the usual volume element of Sq and the Jacobi weights functions Wα,β x 1 − xα 1 xβ , α > −1, β > −1, respectively. For any φ ∈ L1Wq we have the following relation see 42, Page 312: Sq φ xy dμq x ωq−1 1 −1 2.4 φxWq xdx. q The orthogonal projections Yk f, x of a function f ∈ L1 Sq on Hk are defined by see e.g., 43 q dk Yk f, x ωq q1 Sq pk xy f y dμq y , 2.5 x ∈ Sq , where xy denotes the inner product of x and y. 2.2. Main Results Let φ ∈ L1Wq satisfy al φ ωq−1 1 −1 q1 φxpl xWq xdx > 0 and q ∞ dl q1 K φ, x, y K φ, xy al φ p xy , ωq l l0 ∞ l0 q al φdl < ∞. Define al φ > 0, x, y ∈ Sq . 2.6 Then, by 44, Chapter 17 we know that Kφ, x, y is positive semidefinite on Sq and the right q1 of 2.6 is convergence absolutely and uniformly since |pl x| ≤ 1. Therefore, Kφ, x, y is a Mercer kernel on Sq . By 13, Theorem 10 we know that Kφ, x, y is a universal kernel on Sq . We suppose that there is a constant Cp > 0 depending only on p such for any y ∈ Sq Kφ, ·, y q ≤ Cp φ , p,S p,Wq p 1 ≤ p ≤ ∞, φ ∈ LWq . 2.7 Given a finite set ∧, we denote by ∧ the cardinality of ∧. For r > 0 and a ≥ 1 we say that a finite subset ∧ ⊂ Sq is an r, a-covering of Sq if Sq ⊂ Bω, r, ω∈∧ max∧ ∩ Bω, r ≤ a, ω∈∧ 2.8 where Bω, r {x ∈ Sq : dω, x ≤ r} with dω, x : arc cosω, x being the geodesic distance between x and ω. 6 Journal of Inequalities and Applications Let S ≥ 1 be an integer, a {an }∞ n0 a sequence of real numbers. Define forward difference operators by Δan Δ1 an : an1 − an , Δr : ΔΔr−1 an , r 2, 3, . . . , Δ0 an : an , |a|S : |a|∗S sup v 1r |Δr av |, 0≤r≤S, v≥0 ∞ S 2.9 v 1r−1 |Δr av |. r1 v0 We say a finite subset ∧ ⊂ Sq is a subset of interpolatory type if for any real numbers q {yω }ω∈∧ there is a p ∈ Πn such that pω yω , ω ∈ ∧. This kind of subsets may be found from 45, 46. Let BS be the set of all sequence a for which |a|S < ∞ and B∗S the set of all sequence a for which |a|S |a|∗S < ∞. Let β > 0 be a real number, f ∈ Lp Sq . Then, we say ∂β f ∈ Lp Sq if there is a function p ϕ ∈ L Sq such that ϕx ∼ ∞ kβ Yk f, x , x ∈ Sq . 2.10 k0 We now give the results of this paper. Theorem 2.1. If there is a constant γ0 > 0 depending only on q such that ∧ is a subset of interpolatory type and a δ/62m , a-covering of Sq satisfying 0 < δ < γ0 /a with a ≥ 1 and m being a given positive integer. S > q is an integer. β > 0 is a real number such that there is γ ≥ γ 2q − 1 and ∞ ∞ β −β −1 ∗ ∧ β − γ > q, φ ∈ L∞ Wq satisfies {l 1 al φ}l0 ∈ BS and {l 1 al φ}l0 ∈ BS . HKφ is the p q reproducing kernel space reproduced by ∧ and the kernel 2.6. ∂γ f ∈ L S . Then there is a constant ∗ ∗ x with gm,φ x ∈ H∧Kφ Cp,q > 0 depending only on p and q and a function gm,φ x C0 gm,φ and C0 a constant such that f − gm,φ q O p,S ∗ 2 gm,φ H∧Kφ 1 , 2.11 O 22m2q−1 . 2.12 2mγ The functions φ satisfying the conditions of Theorem 2.1 may be found in 39, Page 357. Corollary 2.2. Under the conditions of Theorem 2.1. If ∂γ f ∈ Lp Sq , then K f, 1 22mγ O mγ . 2 H∧Kφ 1 2.13 Corollary 2.2 shows that the convergence rate of the K-functional 1.8 is controlled by the smoothness of both the reproducing kernels and the approximated function f. Journal of Inequalities and Applications 7 Theorem 2.3. If there is a constant γ > 0 depending only on q such that ∧ is a subset of interpolatory type and a δ/62m , a-covering of Sq satisfying 0 < δ < γ/a with a ≥ 1 and m being a given positive integer. H∧Kφ is the reproducing kernel space reproducing by ∧ and the kernel 2.6 with φ ∈ β β −1 ∗ L∞ Wq satisfying {l 1 al φ} ∈ BS and {1/l 1 al φ} ∈ BS . Then, for α ≥ q1/p − 1/2 β and ∂β f ∈ Lp Sq there holds 1 K f, αm 2 H∧ O Kφ 1 2mβ , 2.14 where a maxa, 0. 3. Some Lemmas To prove Theorems 2.1 and 2.3, we need some lemmas. The first one is about the Gauss integral formula and Marcinkiewicz inequalities. Lemma 3.1 see 47–50. There exist constants γ > 0 depending only on q such that for any positive integer n and any δ/n, a-covering ∧ of Sq satisfying 0 < δ < γ/a, there exists a set of real numbers λω ∼ n−q , ω ∈ ∧ such that Sq q f y dμq y λω fω 3.1 ω∈∧ q for any f ∈ Π3n , and for f ∈ Πn f q ∼ p,S ⎧ 1/p ⎪ p ⎪ ⎪ ⎪ λω fω , ⎪ ⎪ ⎨ ω∈∧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩maxfω, ω∈∧ 0 < p < ∞, 3.2 p ∞, q where ∧ ∼ nq ∼ dimΠn , the constants of equivalence depending only on q, a, δ, and p when p is small. Here one employs the slight abuse of notation that 00 1. The second lemma we shall use is the Nikolskii inequality for the spherical harmonics. q Lemma 3.2 see 38, 45, 49, 51, 52. If 0 < p < r ≤ ∞, p ∈ Πn , then one has the following Nikolskii inequality: p q ≤ C q p q ≤ C q nq1/p−1/r p q , p,S r,S p,S 3.3 where the constant Cq depends only on q. We now restate the general approximation frame of the Cesàro means and de la Vallée Poussin means provided by Dai and Ditzian see 53. 8 Journal of Inequalities and Applications Lemma 3.3. Let μ be a positive measure on X. {Hk }∞ k0 is a sequence of finite-dimensional spaces satisfying the following: I Hk ⊂ Lp dμ. k. II Hk is orthogonal to Hl (in Lp dμ) when l / ∞ p III span k0 Hk is dense in L dμ for all 1 ≤ p ≤ ∞. IV H0 is the collection of the constants. The Cesàro means Cδ of fx is given by N 1 δ σN AδN−k Pk f, x , f, x δ AN k0 AδN Γk δ 1 Γk 1Γδ 1 3.4 for N 1, 2, . . ., where d k f y ϕk,i xϕk,i y dμ y , Pk f, x X x ∈ X, 3.5 l1 k and {ϕk,i x}di1 is an orthogonal base of Hk in L2 dμ. One sets,for a given α > 0, ∂α fx ∼ ∞ α p p α k1 k Pk f, x and ∂α f ∈ L dμ if there exists ϕ ∈ L dμ such that Pk ϕ, x k Pk f, x. ∞ Let η be defined as ηu ∈ C , ηu 1 for u ∈ 0, 1/2 and ηu 0 for u > 1 and is a nonegative and nonincrease function. ρN f, x are the de la Vallée Poussin means defined as ∞ ρN f, x η k0 k Pk f, x , 2N x ∈ X, f ∈ L1 dμ . 3.6 δ Then, ρN f, x fx, f ∈ k≤N Hk . If for some δ > 0, σN f p, dμ ≤ C f p, dμ , 1 ≤ p ≤ ∞, then, ρN f p, dμ ≤ C f p, dμ and ρN f − f O p, dμ ∂α f p, dμ , Nα ∂α f ∈ Lp dμ . 3.7 Lemma 3.3 makes the following Lemma 3.4. Lemma 3.4. Let η be the function defined as in Lemma 3.3. Define two kinds of operators, respectively, by ∞ η VN f, x k0 k Yk f, x , 2N x ∈ Sq , f ∈ L1 Sq , q ∞ dl q1 k al f η p x, f, x 2N ωq l k0 ∗ VN 3.8 x ∈ −1, 1, f ∈ L1Wq . Journal of Inequalities and Applications 9 q Then, VN f, x fx for any f ∈ ΠN and VN∗ f, x fx for any f ∈ PN . Moreover, VN f − f q O p,S ∂α f q p,S Nα ∂α f ∈ Lp Sq , , ⎞ ⎛ ∂α f ∗ p,Wq V f − f ⎠, O⎝ N p,Wq Nα p ∂α f ∈ LWq , 3.9 3.10 p where for f ∈ LWq one defines ∂α fx ∼ q ∞ d q1 kα ak f l pl x ωq k1 x ∈ −1, 1. 3.11 Proof. By 54, Lemma 2.2 we know VN f p,Sq ≤ C f p,Sq for some C > 0. Hence, 3.9 holds δ f p,Wq ≤ C f p,Wq for δ > q/2−1/2. Hence, 3.10 by 3.7. By 19, Theorem A we know σN holds by 3.7. Let ∧ ⊂ Sq be a finite set. Then we call ∧ an M-Z quadrature measure of order n if 3.1 q and 3.2 hold for f ∈ Πn . By this definition one knows the finite set ∧ in Lemma 3.1 is an M-Z quadrature measure of order n. Define an operator as ∞ dq l q1 l σy ∧, η, f, x η λω fωpl ωx, y ω q ω∈∧ l0 x ∈ Sq . 3.12 Then, we have the following results. Lemma 3.5 see 39. For a given integer n ≥ 1, let ∧ be an M-Z quadrature measure of order 62n , φ ∈ L1Wq , S > q an integer, 1 ≤ p ≤ ∞, β > q/p , where p satisfies 1/p 1/p 1 which satisfies p 1 if p ∞ and p ∞ if p 1. ηt defined in Lemma 3.3 is a nonnegative and non-increasing ∗ φ}∞ function. Let φ ∈ L1Wq satisfy {l 1−β a−1 l0 ∈ BS . Then, for f ∈ Hγ,p , 0 < γ ≤ β, where Hγ,p l p q consists of f ∈ L S for which the derivative of order γ; that is, ∂γf, belongs to Lp Sq . Then, there is an operator Dφ : Hγ,p → LP Sq such that i (see [39, Proposition 3.1, (b)]). Dφ f p,Sq ≤ c ∂γf p,Sq for f ∈ Hβ,p fx Sq φ xy Dφ f y dμq y , ! fl, k al φ D φ fl, k, where fl, k Sq fxYl,k xdμq x. x ∈ Sq , 3.13 3.14 10 Journal of Inequalities and Applications ii (see [39, Theorem 3.1]). Moreover, if one adds an assumption that {l 1β al φ}∞ l0 ∈ BS , then, there are constants C > 0 and C1 > 0 such that λω Dφ σ2n ∧, η, f, ωφ·ω f· − ω∈∧ ≤ C2−nγ ∂γf p,Sq 3.15 p,Sq and for 0 ≤ γ ≤ β ∂γ f − ∂γ σn ∧, η, f q ≤ C1 nγ−β ∂β f q . p,S p,S 3.16 Lemma 3.6 see e.g., 29, Page 230. Let f ∈ L2 Sq . Then, f 2,Sq 1/2 ∞ 2 Yk f . q 3.17 2,S k0 Following Lemma 3.7 deals with the orthogonality of the Legendre polynomials pl xy. q1 Lemma 3.7. For the generalized Legendre polynomials pl q dl ωq one has q1 Sq q1 q1 uy pl xudμq u pk xy , pk x, y ∈ Sq . 3.18 Proof. It may be obtained by 2.2. ∞ q Lemma 3.8. Let φ ∈ L∞ Wq satisfy 2.7 for p ∞ and ∂q φ ∈ LWq . ∧ ⊂ S is a finite set satisfying the conditions of Theorem 2.1. Then, there is a constant Cφ > 0 depending only on φ such that 2 λω λω K φ, ω , ω 1/2 3.19 ≤ Cφ < ∞. ω,ω ∈∧ ∧ λω K∧ ω , ω λω ω,ω ∈∧ , where K∧ ω , ω λl × l0 2 1/2 q q1 2 dl /ωq pl ω ω with λl ≥ 0 and l∧ {a aω ω∈∧ : aω < ∞}. Then, √ λ Proof. Define a matrix by K∧ ω∈∧ ω,ω ∈∧ 2 λω λω K∧ ω , ω 1/2 √ λ √ a K∧ a λ K∧ 2 sup . l∧ a a a/ 0, a∈l2 ∧ 3.20 Journal of Inequalities and Applications 11 By the Parseval equality we have √ λ a K∧ a aω λω K∧ ω , ω aω λω ⎞ q dl q1 λω ⎝ λl × pl ω ω ⎠aω λω ωq l0 ω,ω ∈∧ aω ω,ω ∈∧ ⎛ ∧ q 2 dl ∧ λl aω λω Yl,α ω ω∈∧ l0 α0 3.21 2 q dl ∧ ≤ max λl aω λω Yl,α ω Yl,α x dμq x 0≤l≤∧ Sq l0 α0 ω∈∧ 2 q ∧ dl q1 aω λω dμq x. max λl p xω l ω 0≤l≤∧ q Sq ω∈∧ l0 q Let L∧ x ∈ Πn satisfy L∧ ω aω / λω , ω ∈ ∧. Then, by 3.1 √ λ a K∧ a aω ω,ω ∈∧ λω K∧ ω , ω aω λω 2 q ∧ dl q1 dμq x ≤ max λl L ωp xωdμ ω ∧ q l 0≤l≤∧ Sq l0 ωq Sq max λl |L∧ x|2 dμq x 0≤l≤∧ max λl 0≤l≤∧ 3.22 Sq λω |L∧ x|2 max λl a a. ω∈∧ 0≤l≤∧ √ λ Hence, a K∧ a ≤ max0≤l≤∧ λl a a. On the other hand, since Kφ, x, · ∞,Sq ≤ C φ ∞,Wq , x ∈ Sq , we have for any x ∈ Sq that ∗ ∗ ∗ φ φ , x, y K φ − V∧ φ , x, y ≤ Cφ − V∧ K φ, x, y − K V∧ ∞,Wq . 3.23 It follows for x, y ∈ Sq that ∗ ∗ K V∧ φ φ , x, y − Cφ − V∧ ∞,Wq ∗ ∗ ≤ K φ, x, y ≤ K V∧ φ φ , x, y Cφ − V∧ ∞,Wq . 3.24 12 Journal of Inequalities and Applications √ λ Define K∧ φ λω Kφ, ω , ω λω ω,ω ∈∧ . Then, 3.24, 3.10, the Cauchy inequality, and the fact ∧ ∼ nq make √ √ λ λ ∗ ∗ V∧ φ a − a K∧ φ a O ∧ φ − V∧ φ a K∧ ∞,Wq × a a O1a a. 3.25 It follows that √ √ √ λ λ λ ∗ ∗ V∧ V∧ φ a ≤ a K∧ φ a a K∧ φ a − a K∧ φ a √ λ a K∧ ≤ max η 0≤l≤∧ l a a O1a a. 2∧ 3.26 Equation 3.2 thus holds. 4. Proof of the Main Results We now show Theorems 2.1 and 2.3, respectively. Proof of Theorem 2.1. Lemma 4.3 in 39 gave the following results. Let 1 ≤ p ≤ ∞, γ > q/p , S > q be an integer, and a sequence of real numbers such p γ a : {l 1γ al } ∈ BS . Then, there exists φ ∈ LWq such that al φ ∈ al , l 0, 1, 2, . . . . ∗ Since 1 lβ 1 lγ 1 lβ−γ and β − γ > q, we have a φ∗ ∈ L∞ Wq such that al φ 1 lγ al φ. Hence, ∂γ φ ∈ L∞ Wq and q ∞ dl q1 K V2∗m1 φ , x, y al V2∗m1 φ p xy , ωq l l0 4.1 x, y ∈ Sq , and for 0 ≤ k ≤ 2m1 there holds for x, y ∈ Sq that q dk ωq q q q1 dk q1 d q1 K V2∗m1 φ , x, u pl uy dμq u ak V2∗m1 φ pk xy ak φ k pk xy . ωq ωq Sq 4.2 It follows for x, y ∈ Sq that q1 pk xy Sq q1 K V2∗m1 φ , x, u pl uy dμq u ak φ 4.3 On the other hand, since 2 V2m f, x Y0 V2m f , x Yk V2m f , x , m1 k1 x ∈ Sq , 4.4 Journal of Inequalities and Applications 13 where for 1 ≤ k ≤ 2m1 , we have by 4.3 Yk V2m f , x q dk ωq ak φ q1 K V2∗m1 φ , x, u pl uy dμq u × V2m f, y dμq y . 4.5 Sq Hence, above equation and 3.1-3.2 makes V2m f, x Y0 V2m f , x q dk q1 K V2∗m1 φ , x, u pk uy dμq u Sq Sq k1 ωq ak φ × V2m f, y dμq y C0 λω Cω,φ K V2∗m1 φ , x, ω , m1 2 4.6 ω∈∧ where C0 Y0 V2m f, x, Cω,φ m1 2 k1 q dk /ωq Yk V2m f, ω/ak φ. Define ∗ gm,φ x C0 gm,φ x C0 λω Cω,φ K φ, x, ω , x ∈ Sq . ω∈∧ 4.7 ∗ Then, we know gm,φ x ∈ R H∧Kφ and by 3.9 gm,φ − f q ≤ gm,φ − V2m f q V2m f − f q p,S p,S p,S O 1 2mγ gm,φ f − V2m fp,Sq , 4.8 where gm,φ x − V2m f, x ≤ λω Cω,φ K V ∗m1 φ − φ, x, ω . 2 ω∈∧ 4.9 It follows by 3.9 that gm,φ − V2m f q ≤ λω Cω,φ KV ∗m1 φ − φ, ·, ω q 2 p,S p,S ω∈∧ O 1 2mγ 4.10 λω Cω,φ . ω∈∧ On the other hand, by the definition of V2m f and 3.14 we have for 1 ≤ k ≤ 2m1 that Yk V2m f , ω η k 2m1 ak φ Yk Dφ f, ω , ω ∈ Sq , 4.11 14 Journal of Inequalities and Applications where Dφ denotes the operator Dφ of Lemma 3.5 for φxy Kφ, xy. Hence, Cω,φ η m1 Yk Dφ f, ω , ω 2 k1 q m1 2 q dk k 4.12 ω ∈ Sq . Equation 3.2 and the definition of ηt make m1 q 2 dk k λω Cω,φ ≤ λω η m1 Yk Dφ f, ω k1 ωq 2 ω∈∧ ω∈∧ ≤ Cq m1 2 q dk k1 Sq q1 x| ≤ 1 make |Yk Dφ f, ω| ≤ . 4.14 The Hölder inequality, the i of Lemma 3.5, and the fact that |pl q q Cq dk Dφ f p,Sq ≤ Cq dk ∂γ f p,Sq . Therefore, m1 2 q 2 λω Cω,φ ≤ Cq d ∂γ f ∗ x Take gm,φ p,Sq k ω∈∧ 4.13 Yk Dφ f, ω dμq ω. k1 λω Cω,φ Kφ, x, ω, then ω∈∧ ∗ 2 gm,φ H∧Kφ λω λω Cω,φ Cω ,φ K φ, ω , ω ω,ω ∈∧ ≤ 2 λω Cω,φ 2 λω λω K φ, ω , ω 4.15 1/2 . ω,ω ∈∧ ω∈∧ Equations 3.2, 3.17, 3.16, and the Cauchy inequality make m1 q 2 2 d Yk Vm f , ω 2 k λω Cω,φ ≤ C ω a φ q k ω∈∧ k1 2,Sq ≤C m1 2 k1 q 2 dk ∂γ f p,Sq 2 4.16 . Let Γx be the Gamma function. Then, it is well known that Γx/Γx a ∼ x−a , x → ∞. Therefore, q dl 2l q − 1 lq−1 lq−1 q−1 2l q − 1 Γ l q − 1 2l q − 1 q−2 ∼ l ∼ lq−1 . Γ q Γl 1 Γ q 4.17 Journal of Inequalities and Applications 15 Hence, m1 2 q 2 dk m1 2 O k1 l 2q−1 O 2m12q−1 , m −→ ∞. 4.18 , 4.19 l1 Equations 4.14 and 4.4 make gm,φ − V2m f q O p,S 1 2mγ −2q1 and hence f − gm,φ q O p,S 1 2mγ 1 2mγ −2q1 4.20 . Since γ ≥ 2q − 1 γ, we have 2.11 by 4.20. Equation 2.12 follows by 4.3, 4.4, and 3.19. Proof of Corollary 2.2. By 2.11-2.12 one has K f, 1 22mγ HKφ 1 ∗ 2 gm,φ ≤ f − gm,φ p,Sq O 1 2mγ 22mγ 1 22mγ HKφ ×O 2 2m2q−1 O 4.21 1 2mγ . Proof of Theorem 2.3. Take the place of φxω in Lemma 3.5 with Kφ, xω, denote still by Dφ the operator Dφ in Lemma 3.5 with φxy Kφ, xy and gm,φ x λω Dφ σ2m ∧, η, f, ω K φ, xω , x ∈ Sq , 4.22 ω∈∧ ∧ then, gm,φ ∈ HKφ and by 3.15 f − gm,φ p,Sq O1/2mβ . In this case, gm,φ 2 ∧ H Kφ λω λω Dφ σ2m ∧, η, f, ω Dφ σ2m ∧, η, f, ω K φ, ωω ω,ω ∈∧ ≤ 2 λω Dφ σ2m ∧, η, f, ω ω∈∧ 2 λω λω K φ, ωω ω,ω ∈∧ 4.23 1/2 . 16 Journal of Inequalities and Applications Since σ2m ∧, η, x is a spherical harmonics of order ≤ 2m , we know by i of Lemma 3.5 that Dφ σ2m ∧, η, f, x are also spherical harmonics of order ≤ 2m . Then, 3.2, i of Lemma 3.5, 3.3, and 3.16 make 2 λω Dφ σ2m ∧, η, f, ω ≤ C Sq ω∈∧ Dφ σ2m ∧, η, f, ω2 dμq ω 2 ≤ C∂β σ2m ∧, η, f2,Sq 2 ∂β σ2m ∧, η, fp,Sq 4.24 2mq1/p−1/2 ≤ C2 2 2 ≤ C22mq1/p−1/2 ∂β f − ∂β σ2m ∧, η, f p,Sq C∂β f p,Sq . Hence, 3.19 and above equation make gm,φ 2H ∧ follows by 3.15. Kφ ≤ C2mq1/p−1/2 ∂β f 2p,Sq . Equation 2.14 Acknowledgments This work is supported by the National NSF 10871226 of China. The authors thank the reviewers for giving very valuable suggestions. References 1 F. Cucker and S. Smale, “On the mathematical foundations of learning,” Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002. 2 F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2007. 3 V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, New York, NY, USA, 1998. 4 D. R. Chen, Q. Wu, Y. M. Ying, and D. X. Zhou, “Support vector machine soft margin classifier: error analysis,” Journal of Machine Learning and Research, vol. 5, pp. 1143–1175, 2004. 5 T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks and support vector machines,” Advances in Computational Mathematics, vol. 13, no. 1, pp. 1–50, 2000. 6 Y. Li, Y. Liu, and J. Zhu, “Quantile regression in reproducing kernel Hilbert spaces,” Journal of the American Statistical Association, vol. 102, no. 477, pp. 255–268, 2007. 7 H. Tong, D.-R. Chen, and L. Peng, “Analysis of support vector machines regression,” Foundations of Computational Mathematics, vol. 9, no. 2, pp. 243–257, 2009. 8 H. Tong, D.-R. Chen, and L. Peng, “Learning rates for regularized classifiers using multivariate polynomial kernels,” Journal of Complexity, vol. 24, no. 5-6, pp. 619–631, 2008. 9 D.-X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,” Advances in Computational Mathematics, vol. 25, no. 1–3, pp. 323–344, 2006. 10 D. Chen and D.-H. Xiang, “The consistency of multicategory support vector machines,” Advances in Computational Mathematics, vol. 24, no. 1–4, pp. 155–169, 2006. 11 E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, and A. Verri, “Some properties of regularized kernel methods,” Journal of Machine Learning Research, vol. 5, pp. 1363–1390, 2004. 12 N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950. 13 C. A. Micchelli, Y. Xu, and H. Zhang, “Universal kernels,” Journal of Machine Learning Research, vol. 7, pp. 2651–2667, 2006. Journal of Inequalities and Applications 17 14 Q. Wu, Y. Ying, and D.-X. Zhou, “Multi-kernel regularized classifiers,” Journal of Complexity, vol. 23, no. 1, pp. 108–134, 2007. 15 J. Bergh and J. Löfström, Interpolation Spaces, Springer, New York, NY, USA, 1976. 16 H. Berens and G. G. Lorentz, “Inverse theorems for Bernstein polynomials,” Indiana University Mathematics Journal, vol. 21, no. 8, pp. 693–708, 1972. 17 H. Berens and L. Q. Li, “The Peetre K-moduli and best approximation on the sphere,” Acta Mathematica Sinica, vol. 38, no. 5, pp. 589–599, 1995. 18 H. Berens and Y. Xu, “K-moduli, moduli of smoothness, and Bernstein polynomials on a simplex,” Indagationes Mathematicae, vol. 2, no. 4, pp. 411–421, 1991. 19 W. Chen and Z. Ditzian, “Best approximation and K-functionals,” Acta Mathematica Hungarica, vol. 75, no. 3, pp. 165–208, 1997. 20 W. Chen and Z. Ditzian, “Best polynomial and Durrmeyer approximation in Lp S,” Indagationes Mathematicae, vol. 2, no. 4, pp. 437–452, 1991. 21 F. Dai and Z. Ditzian, “Jackson inequality for Banach spaces on the sphere,” Acta Mathematica Hungarica, vol. 118, no. 1-2, pp. 171–195, 2008. 22 Z. Ditzian and X. Zhou, “Optimal approximation class for multivariate Bernstein operators,” Pacific Journal of Mathematics, vol. 158, no. 1, pp. 93–120, 1993. 23 Z. Ditzian and K. Runovskii, “Averages and K-functionals related to the Laplacian,” Journal of Approximation Theory, vol. 97, no. 1, pp. 113–139, 1999. 24 Z. Ditzian, “A measure of smoothness related to the Laplacian,” Transactions of the American Mathematical Society, vol. 326, no. 1, pp. 407–422, 1991. 25 Z. Ditzian and V. Totik, Moduli of Smoothness, vol. 9 of Springer Series in Computational Mathematics, Springer, New York, NY, USA, 1987. 26 Z. Ditzian, “Approximation on Banach spaces of functions on the sphere,” Journal of Approximation Theory, vol. 140, no. 1, pp. 31–45, 2006. 27 Z. Ditzian, “Fractional derivatives and best approximation,” Acta Mathematica Hungarica, vol. 81, no. 4, pp. 323–348, 1998. 28 L. L. Schumaker, Spline Functions: Basic Theory, John Wiley & Sons, New York, NY, USA, 1981, Pure and Applied Mathematics. 29 K. Y. Wang and L. Q. Li, Harmonic Analysis and Approximation on the Unit Sphere, Science Press, Beijing, China, 2000. 30 Y. Xu, “Approximation by means of h-harmonic polynomials on the unit sphere,” Advances in Computational Mathematics, vol. 21, no. 1-2, pp. 37–58, 2004. 31 S. Smale and D.-X. Zhou, “Estimating the approximation error in learning theory,” Analysis and Applications, vol. 1, no. 1, pp. 17–41, 2003. 32 B. Sheng, “Estimates of the norm of the Mercer kernel matrices with discrete orthogonal transforms,” Acta Mathematica Hungarica, vol. 122, no. 4, pp. 339–355, 2009. 33 S. Loustau, “Aggregation of SVM classifiers using Sobolev spaces,” Journal of Machine Learning Research, vol. 9, pp. 1559–1582, 2008. 34 H. N. Mhaskar and C. A. Micchelli, “Degree of approximation by neural and translation networks with a single hidden layer,” Advances in Applied Mathematics, vol. 16, no. 2, pp. 151–183, 1995. 35 B. H. Sheng, “Approximation of periodic functions by spherical translation networks,” Acta Mathematica Sinica. Chinese Series, vol. 50, no. 1, pp. 55–62, 2007. 36 B. Sheng, “On the degree of approximation by spherical translations,” Acta Mathematicae Applicatae Sinica. English Series, vol. 22, no. 4, pp. 671–680, 2006. 37 B. Sheng, J. Wang, and S. Zhou, “A way of constructing spherical zonal translation network operators with linear bounded operators,” Taiwanese Journal of Mathematics, vol. 12, no. 1, pp. 77–92, 2008. 38 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Approximation properties of zonal function networks using scattered data on the sphere,” Advances in Computational Mathematics, vol. 11, no. 2-3, pp. 121–137, 1999. 39 H. N. Mhaskar, “Weighted quadrature formulas and approximation by zonal function networks on the sphere,” Journal of Complexity, vol. 22, no. 3, pp. 348–370, 2006. 40 H. Groemer, Geometric Applications of Fourier Series and Spherical Harmonics, vol. 61 of Encyclopedia of Mathematics and Its Applications, Cambridge University Press, Cambridge, Mass, USA, 1996. 41 C. Müller, Spherical Harmonics, vol. 17 of Lecture Notes in Mathematics, Springer, Berlin, Germany, 1966. 42 S. Z. Lu and K. Y. Wang, Bochner-Riesz Means, Beijing Normal University Press, Beijing, China, 1988. 43 Y. Wang and F. Cao, “The direct and converse inequalities for jackson-type operators on spherical cap,” Journal of Inequalities and Applications, vol. 2009, Article ID 205298, 16 pages, 2009. 18 Journal of Inequalities and Applications 44 H. Wendland, Scattered Data Approximation, vol. 17 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2005. 45 H. N. Mhaskar, F. J. Narcowich, N. Sivakumar, and J. D. Ward, “Approximation with interpolatory constraints,” Proceedings of the American Mathematical Society, vol. 130, no. 5, pp. 1355–1364, 2002. 46 F. J. Narcowich and J. D. Ward, “Scattered data interpolation on spheres: error estimates and locally supported basis functions,” SIAM Journal on Mathematical Analysis, vol. 33, no. 6, pp. 1393–1410, 2002. 47 G. Brown and F. Dai, “Approximation of smooth functions on compact two-point homogeneous spaces,” Journal of Functional Analysis, vol. 220, no. 2, pp. 401–423, 2005. 48 G. Brown, D. Feng, and S. Y. Sheng, “Kolmogorov width of classes of smooth functions on the sphere Sd−1 ,” Journal of Complexity, vol. 18, no. 4, pp. 1001–1023, 2002. 49 F. Dai, “Multivariate polynomial inequalities with respect to doubling weights and A∞ weights,” Journal of Functional Analysis, vol. 235, no. 1, pp. 137–170, 2006. 50 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Spherical Marcinkiewicz-Zygmund inequalities and positive quadrature,” Mathematics of Computation, vol. 70, no. 235, pp. 1113–1130, 2001. 51 E. Belinsky, F. Dai, and Z. Ditzian, “Multivariate approximating averages,” Journal of Approximation Theory, vol. 125, no. 1, pp. 85–105, 2003. 52 A. I. Kamzolov, “Approximation of functions on the sphere Sn ,” Serdica, vol. 10, no. 1, pp. 3–10, 1984. 53 F. Dai and Z. Ditzian, “Cesàro summability and Marchaud inequality,” Constructive Approximation, vol. 25, no. 1, pp. 73–88, 2007. 54 F. Dai, “Some equivalence theorems with K-functionals,” Journal of Approximation Theory, vol. 121, no. 1, pp. 143–157, 2003.