Hindawi Publishing Corporation
Journal of Inequalities and Applications
Volume 2010, Article ID 249507, 18 pages
doi:10.1155/2010/249507
Research Article
The Convergence Rate for a K-Functional in
Learning Theory
Bao-Huai Sheng (1) and Dao-Hong Xiang (2)
(1) Department of Mathematics, Shaoxing University, Shaoxing, Zhejiang 312000, China
(2) Department of Mathematics, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
Correspondence should be addressed to Bao-Huai Sheng, bhsheng@usx.edu.cn
Received 11 November 2009; Accepted 21 February 2010
Academic Editor: Yuming Xing
Copyright © 2010 B.-H. Sheng and D.-H. Xiang. This is an open access article distributed under
the Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
In learning theory based on reproducing kernel Hilbert spaces, upper bounds for a K-functional are needed. In the present paper, upper bounds for the K-functional on the unit sphere are estimated by means of spherical harmonic approximation. The results show that the convergence rate of the K-functional depends on the smoothness of both the approximated function and the reproducing kernel.
1. Introduction
It is known that the goal of learning theory is to approximate a function, or some features of a function, from data samples.
Let $X$ be a compact subset of the $n$-dimensional Euclidean space $\mathbb{R}^n$ and $Y \subset \mathbb{R}$. Then, learning theory seeks a function $f$ relating the input $x \in X$ to the output $y \in Y$ (see [1–3]). The function $f(x)$ is determined by a probability distribution $\rho(x, y) = \rho(y \mid x)\rho_X(x)$ on $Z := X \times Y$, where $\rho_X(x)$ is the marginal distribution on $X$ and $\rho(y \mid x)$ is the conditional probability of $y$ for a given $x$.
Generally, the distribution $\rho(x, y)$ is known only through a set of samples $z := \{z_i\}_{i=1}^m = \{(x_i, y_i)\}_{i=1}^m$ independently drawn according to $\rho(x, y)$. Given a sample $z$, the regression problem based on Support Vector Machine (SVM) learning is to find a function $f_z : X \to Y$ such that $f_z(x)$ is a good estimate of $y$ when a new input $x$ is provided. The binary classification problem based on SVM learning is to find a function $\psi_z : X \to \{-1, 1\}$ which divides $X$ into two parts. Here $\psi_z$ is often induced by a real-valued function $f_z$ in the form $\psi_z(x) = \operatorname{sgn}(f_z(x))$, where $\operatorname{sgn}(f_z(x)) = 1$ if $f_z(x) \ge 0$ and $-1$ otherwise. The functions $f_z$ are often generated from the following Tikhonov regularization scheme (see, e.g., [4–9]) associated with a reproducing kernel Hilbert space (RKHS) $H_K$ (defined below) and a sample $z$:
$$f_z := f_{z,\lambda,H_K} = \arg\min_{f \in H_K} \left\{ \frac{1}{m} \sum_{i=1}^{m} V_p\bigl(y_i f(x_i)\bigr) + \lambda \|f\|_{H_K}^2 \right\}, \tag{1.1}$$
where $\lambda > 0$ is a positive constant called the regularization parameter and $V_p(t) = (1 - t)_+^p = \max\{1 - t, 0\}^p$ ($p \ge 1$) is called the $p$-norm SVM loss.
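The scheme (1.1) can be made concrete with a short sketch. The following Python fragment is only an illustration under stated assumptions: the function names, the Gaussian kernel, and the toy data are hypothetical and do not come from the paper. It evaluates the $p$-norm SVM loss and the regularized empirical objective for a candidate $f = \sum_j c_j K(\cdot, x_j)$, using $\|f\|_K^2 = c^T G c$ with $G$ the kernel matrix on the sample.

```python
import numpy as np

def svm_p_loss(t, p=1.0):
    """p-norm SVM loss V_p(t) = max(1 - t, 0)**p for p >= 1."""
    return np.maximum(1.0 - np.asarray(t, dtype=float), 0.0) ** p

def regularized_objective(c, gram, y, lam, p=1.0):
    """Empirical objective of scheme (1.1) for f = sum_j c_j K(., x_j).

    gram[i, j] = K(x_i, x_j); the RKHS penalty is ||f||_K^2 = c^T gram c.
    """
    f_vals = gram @ c                       # f(x_i) at the sample points
    data_term = svm_p_loss(y * f_vals, p).mean()
    return data_term + lam * float(c @ gram @ c)

# Toy data (hypothetical): four points on the line with a Gaussian kernel.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])
gram = np.exp(-(x[:, None] - x[None, :]) ** 2)

# For f = 0 every margin y_i f(x_i) is 0, so each loss term equals 1.
value_at_zero = regularized_objective(np.zeros(4), gram, y, lam=0.1, p=2.0)
```

Minimizing this objective over the coefficient vector `c` is a finite-dimensional convex problem; the sketch only evaluates it and does not attempt to solve it.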
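In addition, the Tikhonov regularization scheme involving an offset $b$ (see, e.g., [4, 10, 11]) can be presented in a way similar to (1.1):
$$f_z := f_{z,\lambda,H_K} = \arg\min_{f = f^* + b,\ f^* \in H_K,\ b \in \mathbb{R}} \left\{ \frac{1}{m} \sum_{i=1}^{m} V_p\bigl(y_i f(x_i)\bigr) + \lambda \|f^*\|_{H_K}^2 \right\}. \tag{1.2}$$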
We are now in a position to define the reproducing kernel Hilbert space. A function $K : X \times X \to \mathbb{R}$ is called a Mercer kernel if it is continuous, symmetric, and positive semidefinite, that is, for any finite set of distinct points $\{x_1, x_2, \ldots, x_l\} \subset X$, the matrix $(K(x_i, x_j))_{i,j=1}^{l}$ is positive semidefinite.
The reproducing kernel Hilbert space (RKHS) $H_K$ (see [12]) associated with the Mercer kernel $K$ is defined to be the closure of the linear span of the set of functions $\{K_x := K(x, \cdot) : x \in X\}$ with the inner product $\langle \cdot, \cdot \rangle_{H_K} = \langle \cdot, \cdot \rangle_K$ satisfying $\langle K_x, K_y \rangle_K = K(x, y)$ and the reproducing property
$$\langle K_x, g \rangle_K = g(x), \quad \forall x \in X,\ g \in H_K. \tag{1.3}$$
If $g(x) = \sum_i c_i K(x, x_i)$, then $\|g\|_K^2 = \sum_{i,j} c_i c_j K(x_i, x_j)$. Denote by $C(X)$ the space of continuous functions on $X$ with the norm $\|\cdot\|_\infty$. Let $\kappa := \sqrt{\|K\|_\infty}$. Then the reproducing property tells that
$$\|g\|_\infty \le \kappa \|g\|_K, \quad \forall g \in H_K. \tag{1.4}$$
It is easy to see that $H_K$ is a subset of $C(X)$. We say that $K(x, y)$ is a universal kernel if for any compact subset $X$, $H_K$ is dense in $C(X)$ (see [13, Page 2652]).
Let $\wedge \subset X$ be a given discrete set of finitely many points. Then, we may define an RKHS $(H_K^{\wedge}, \|\cdot\|_{H_K^{\wedge}})$ by the linear span of the set of functions $\{K_x := K(x, \cdot) : x \in \wedge\}$. Then, it is easy to see that $H_K^{\wedge} \subset H_K$ and for any $f \in H_K^{\wedge}$ there holds $\|f\|_{H_K^{\wedge}} = \|f\|_K$.
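The norm formula $\|g\|_K^2 = \sum_{i,j} c_i c_j K(x_i, x_j)$ and the bound (1.4) can be checked numerically. The sketch below is a hedged illustration: the Gaussian kernel, the random centers, and all names are assumptions made for the example, and $\kappa$ is taken to be $\sup_x \sqrt{K(x, x)}$, which equals 1 for this kernel.

```python
import numpy as np

def gaussian_kernel(a, b, width=1.0):
    """Gaussian kernel matrix with entries exp(-|a_i - b_j|^2 / (2 width^2))."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(6, 2))    # hypothetical points x_i
coef = rng.normal(size=6)                        # coefficients c_i of g

gram = gaussian_kernel(centers, centers)
rkhs_norm = float(np.sqrt(coef @ gram @ coef))   # ||g||_K

kappa = 1.0                                      # sup_x sqrt(K(x, x)) here
grid = rng.uniform(-1.0, 1.0, size=(2000, 2))    # sample points for the sup norm
sup_norm_estimate = float(np.abs(gaussian_kernel(grid, centers) @ coef).max())
```

Since $|g(x)| = |\langle K_x, g \rangle_K| \le \sqrt{K(x, x)}\,\|g\|_K$ by the Cauchy-Schwarz inequality, the sampled supremum can never exceed `kappa * rkhs_norm`.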
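Define $E(f) := \int_Z V_p(y f(x))\, d\rho(x, y)$ and $f_\rho^{V_p} := \arg\min E(f)$, where the minimum is taken over all measurable functions. Then, to estimate the explicit learning rate, one needs to estimate the regularization errors (see, e.g., [4, 7, 9, 14])
$$D(\lambda) := \inf_{f \in H_K} \left\{ E(f) - E\bigl(f_\rho^{V_p}\bigr) + \lambda \|f\|_{H_K}^2 \right\}, \tag{1.5}$$
$$D(\lambda) := \inf_{f = f^* + b,\ f^* \in H_K,\ b \in \mathbb{R}} \left\{ E(f) - E\bigl(f_\rho^{V_p}\bigr) + \lambda \|f^*\|_{H_K}^2 \right\}. \tag{1.6}$$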
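The convergence rate of (1.5) is controlled by the K-functional (see, e.g., [9])
$$K(f, \lambda)_{H_K} = \inf_{g \in H_K} \left\{ \|f - g\|_{p, d\rho_X} + \lambda \|g\|_K^2 \right\}, \quad \lambda > 0,\ f \in L^p(d\rho_X), \tag{1.7}$$
and (1.6) is controlled by another K-functional (see, e.g., [4])
$$K(f, \lambda)_{H_K} = \inf_{g = g^* + b,\ g^* \in H_K,\ b \in \mathbb{R}} \left\{ \|f - g\|_{p, d\rho_X} + \lambda \|g^*\|_K^2 \right\}, \tag{1.8}$$
where $\lambda > 0$, $f \in L^p(d\rho_X) = \{f(x) : \|f\|_{p, d\rho_X} < \infty\}$ with
$$\|f\|_{p, d\rho_X} = \begin{cases} \left( \displaystyle\int_X |f(x)|^p \, d\rho_X(x) \right)^{1/p}, & 1 \le p < \infty, \\[2mm] \operatorname*{ess\,sup}_{x \in X} |f(x)|, & p = \infty. \end{cases} \tag{1.9}$$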
We notice that, on the one hand, the K-functionals (1.7) and (1.8) are modifications of the K-functional of interpolation theory (see [15]) because of the relation (1.4). On the other hand, they differ from the usual K-functionals (see, e.g., [16–30]) because of the term $\|g\|_K^2$. However, they share some properties. For example, if $K(x, y)$ is a universal kernel, then $H_K$ is dense in $L^p(d\rho_X)$ (see, e.g., [31]). Moreover, some classical function spaces such as the polynomial spaces (see [2, 32]) and even some Sobolev spaces may be regarded as RKHSs (see, e.g., [33]).
In learning theory we often require $K(f, t)_{H_K} = O(t^\beta)$ for both of the K-functionals (1.7) and (1.8) and some $\beta > 0$ (see, e.g., [1, 7, 14]). Many results on this topic have been achieved. Using the weighted Durrmeyer operators, [8, 9] showed this decay by taking $K(x, y)$ to be algebraic polynomial kernels on $[0, 1] \times [0, 1]$ or on the simplex in $\mathbb{R}^2$.
However, in the general case, the convergence of the K-functional (1.8) should also be considered, since the offset often influences the solution of the learning algorithm (see, e.g., [6, 11]). Hence, the purpose of this paper is twofold. One is to provide the convergence rates of (1.7) and (1.8) when $K(x, y)$ is a general Mercer kernel on the unit sphere $S^q$ and $1 \le p \le \infty$. The other is to show how to construct functions of the type
$$f(x) = \beta_0 + g^*(x) = \beta_0 + \sum_{i=1}^{m} \beta_i K(x, x_i), \quad x \in X, \tag{1.10}$$
to obtain the convergence rate of (1.8). The translation networks constructed in [34–37] have the form (1.10), and the zonal networks constructed in [38, 39] have the form (1.10) with $\beta_0 = 0$. So the methods used in these references may be used here to estimate the convergence rates of (1.7) and (1.8) if one can bound the term $\|g^*\|_K^2$.
In the present paper, we shall give the convergence rates of (1.7) and (1.8) for a general kernel defined on the unit sphere $S^n = \{x \in \mathbb{R}^{n+1} : \|x\|_{\mathbb{R}^{n+1}} = 1\}$ and $\rho(x, y) = \rho(y \mid x)\mu_n(x)$, with $\mu_n(x)$ being the usual Lebesgue measure on $S^n$. If there is a distortion between $\rho_{S^n}(x)$ and $\mu_n(x)$, the convergence rates of (1.7)-(1.8) in the general case may be obtained in the way used in [1, 8].
The rest of this paper is organized as follows. In Section 2, we restate some notations on spherical harmonics and present the main results. Section 3 collects some useful lemmas: the approximation order of the de la Vallée Poussin means of spherical harmonics, the Gauss integral formula, the Marcinkiewicz-Zygmund inequality with respect to scattered data obtained by G. Brown and F. Dai, and a result on zonal network approximation provided by H. N. Mhaskar. A weighted norm estimate for the Mercer kernel matrices on the unit sphere is given in Lemma 3.8. Our main results are proved in the last section.
Throughout the paper, we write $A = O(B)$ if there exists a constant $C > 0$ such that $A \le CB$. We write $A \sim B$ if $A = O(B)$ and $B = O(A)$.
2. Notations and Results
To state the results of this paper, we need some notations and results on spherical harmonics.
2.1. Notations
For integers $l \ge 0$, $q \ge 1$, the class of all one-variable algebraic polynomials of degree $\le l$ defined on $[-1, 1]$ is denoted by $P_l$, the class of all spherical harmonics of degree $l$ will be denoted by $H_l^q$, and the class of all spherical harmonics of degree $l \le n$ will be denoted by $\Pi_n^q$. The dimension of $H_l^q$ is given by (see [40, Page 65])
$$d_l^q := \dim H_l^q = \begin{cases} \dfrac{2l + q - 1}{l + q - 1} \dbinom{l + q - 1}{q - 1}, & l \ge 1, \\[3mm] 1, & l = 0, \end{cases} \tag{2.1}$$
and that of $\Pi_n^q$ is $\sum_{l=0}^{n} d_l^q$. One has the following well-known addition formula (see [41, Page 10, Theorem 2]):
$$\sum_{k=1}^{d_l^q} Y_{l,k}(x) Y_{l,k}(y) = \frac{d_l^q}{\omega_q}\, p_l^{q+1}(xy), \quad l = 0, 1, \ldots,\ x, y \in S^q, \tag{2.2}$$
where $p_l^{q+1}(x)$ is the degree-$l$ generalized Legendre polynomial. The Legendre polynomials are normalized so that $p_l^{q+1}(1) = 1$ and satisfy the orthogonality relations
$$\int_{-1}^{1} p_l^{q+1}(x)\, p_k^{q+1}(x)\, W_q(x)\, dx = \frac{\omega_q}{\omega_{q-1} d_l^q}\, \delta_{l,k}, \quad W_q(x) = (1 - x^2)^{q/2 - 1}. \tag{2.3}$$
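The dimension formula (2.1) and the orthogonality relation (2.3) are easy to check numerically in the familiar case $q = 2$. The sketch below is an illustration only: it assumes that $\omega_q$ denotes the surface area of $S^q$ (so $\omega_2 = 4\pi$, $\omega_1 = 2\pi$) and that $p_l^{3}$ is the classical Legendre polynomial, which gives $d_l^2 = 2l + 1$ and the right-hand side $2/(2l+1)$ in (2.3). The helper names are hypothetical.

```python
from math import comb
import numpy as np

def harmonic_dim(l, q):
    """Dimension d_l^q of the degree-l spherical harmonics on S^q, formula (2.1)."""
    if l == 0:
        return 1
    # The quotient is always an integer, so integer division is exact.
    return (2 * l + q - 1) * comb(l + q - 1, q - 1) // (l + q - 1)

def polynomial_space_dim(n, q):
    """Dimension of Pi_n^q, the sum of d_l^q for l = 0, ..., n."""
    return sum(harmonic_dim(l, q) for l in range(n + 1))

# Gauss-Legendre rule with 12 nodes: exact for polynomials of degree <= 23.
nodes, weights = np.polynomial.legendre.leggauss(12)

def legendre_inner(l, k):
    """Integral of P_l * P_k over [-1, 1] (W_2 = 1 when q = 2)."""
    pl = np.polynomial.legendre.Legendre.basis(l)(nodes)
    pk = np.polynomial.legendre.Legendre.basis(k)(nodes)
    return float(np.sum(weights * pl * pk))
```

For example, `harmonic_dim(l, 2)` returns `2 * l + 1` and `polynomial_space_dim(n, 2)` returns `(n + 1) ** 2`, the usual counts on the two-dimensional sphere.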
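Define $L^p(S^q)$ and $L^p_{W_{\alpha,\beta}}$ by taking $\rho_X$ to be the usual volume element of $S^q$ and the Jacobi weight functions $W_{\alpha,\beta}(x) = (1 - x)^\alpha (1 + x)^\beta$, $\alpha > -1$, $\beta > -1$, respectively. For any $\phi \in L^1_{W_q}$ we have the following relation (see [42, Page 312]):
$$\int_{S^q} \phi(xy)\, d\mu_q(x) = \omega_{q-1} \int_{-1}^{1} \phi(x) W_q(x)\, dx. \tag{2.4}$$
The orthogonal projections $Y_k(f, x)$ of a function $f \in L^1(S^q)$ on $H_k^q$ are defined by (see, e.g., [43])
$$Y_k(f, x) = \frac{d_k^q}{\omega_q} \int_{S^q} p_k^{q+1}(xy) f(y)\, d\mu_q(y), \quad x \in S^q, \tag{2.5}$$
where $xy$ denotes the inner product of $x$ and $y$.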
2.2. Main Results
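Let $\phi \in L^1_{W_q}$ satisfy $a_l(\phi) = \omega_{q-1} \int_{-1}^{1} \phi(x) p_l^{q+1}(x) W_q(x)\, dx > 0$ and $\sum_{l=0}^{\infty} a_l(\phi) d_l^q < \infty$. Define
$$K(\phi, x, y) = K(\phi, xy) = \sum_{l=0}^{\infty} a_l(\phi) \frac{d_l^q}{\omega_q}\, p_l^{q+1}(xy), \quad a_l(\phi) > 0,\ x, y \in S^q. \tag{2.6}$$
Then, by [44, Chapter 17] we know that $K(\phi, x, y)$ is positive semidefinite on $S^q$, and the right-hand side of (2.6) converges absolutely and uniformly since $|p_l^{q+1}(x)| \le 1$. Therefore, $K(\phi, x, y)$ is a Mercer kernel on $S^q$. By [13, Theorem 10] we know that $K(\phi, x, y)$ is a universal kernel on $S^q$. We suppose that there is a constant $C_p > 0$ depending only on $p$ such that for any $y \in S^q$
$$\|K(\phi, \cdot, y)\|_{p, S^q} \le C_p \|\phi\|_{p, W_q}, \quad 1 \le p \le \infty,\ \phi \in L^p_{W_q}. \tag{2.7}$$
Given a finite set $\wedge$, we denote by $|\wedge|$ the cardinality of $\wedge$. For $r > 0$ and $a \ge 1$ we say that a finite subset $\wedge \subset S^q$ is an $(r, a)$-covering of $S^q$ if
$$S^q \subset \bigcup_{\omega \in \wedge} B(\omega, r), \quad \max_{\omega \in \wedge} |\wedge \cap B(\omega, r)| \le a, \tag{2.8}$$
where $B(\omega, r) = \{x \in S^q : d(\omega, x) \le r\}$ with $d(\omega, x) := \arccos(\omega x)$ being the geodesic distance between $x$ and $\omega$.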
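A truncated version of the zonal kernel (2.6) can be assembled directly from its Legendre expansion. The sketch below is a hedged illustration for $q = 2$ only, where $d_l^2 = 2l + 1$, $\omega_2 = 4\pi$ (assuming $\omega_q$ is the surface area of $S^q$), and $p_l^{3}$ is the classical Legendre polynomial. The coefficient choice $a_l = (l+1)^{-4}$, the truncation at degree 30, and the function name are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import legendre

def zonal_kernel_s2(points_a, points_b, coeffs):
    """Truncated kernel K(x, y) = sum_l a_l (2l+1)/(4 pi) P_l(x . y) on S^2."""
    t = np.clip(points_a @ points_b.T, -1.0, 1.0)    # inner products x . y
    degrees = np.arange(len(coeffs))
    series = np.asarray(coeffs) * (2 * degrees + 1) / (4.0 * np.pi)
    return legendre.legval(t, series)                # sum_l series[l] * P_l(t)

rng = np.random.default_rng(1)
pts = rng.normal(size=(40, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)    # random points on S^2

coeffs = (np.arange(31) + 1.0) ** -4.0               # hypothetical a_l > 0
gram = zonal_kernel_s2(pts, pts, coeffs)
min_eig = float(np.linalg.eigvalsh(gram).min())
diag_value = float(np.sum(coeffs * (2 * np.arange(31) + 1)) / (4.0 * np.pi))
```

Because every $a_l$ is positive, the kernel matrix is positive semidefinite, and since $|P_l| \le 1$ its entries are bounded by the diagonal value $\sum_l a_l (2l+1)/(4\pi)$.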
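Let $S \ge 1$ be an integer and $a = \{a_n\}_{n=0}^{\infty}$ a sequence of real numbers. Define the forward difference operators by $\Delta a_n = \Delta^1 a_n := a_{n+1} - a_n$, $\Delta^r a_n := \Delta(\Delta^{r-1} a_n)$, $r = 2, 3, \ldots$, $\Delta^0 a_n := a_n$, and set
$$|a|_S := \sup_{0 \le r \le S,\ v \ge 0} (v + 1)^r |\Delta^r a_v|, \qquad |a|_S^* := \sum_{r=1}^{S} \sum_{v=0}^{\infty} (v + 1)^{r-1} |\Delta^r a_v|. \tag{2.9}$$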
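The difference operators and the two quantities in (2.9) can be evaluated for a finite section of a sequence. The sketch below is only an illustration: it works with a truncated sequence (so the supremum and the infinite sum are replaced by their finite counterparts), and the function names are hypothetical.

```python
import numpy as np

def forward_difference(a, r):
    """r-th forward difference, with (Delta a)_n = a_{n+1} - a_n."""
    a = np.asarray(a, dtype=float)
    return np.diff(a, n=r) if r > 0 else a

def seminorm_S(a, S):
    """Truncated |a|_S = max over 0 <= r <= S and v of (v+1)^r |Delta^r a_v|."""
    best = 0.0
    for r in range(S + 1):
        d = forward_difference(a, r)
        v = np.arange(len(d))
        best = max(best, float(np.max((v + 1.0) ** r * np.abs(d))))
    return best

def seminorm_S_star(a, S):
    """Truncated |a|_S^* = sum over 1 <= r <= S and v of (v+1)^(r-1) |Delta^r a_v|."""
    total = 0.0
    for r in range(1, S + 1):
        d = forward_difference(a, r)
        v = np.arange(len(d))
        total += float(np.sum((v + 1.0) ** (r - 1) * np.abs(d)))
    return total

squares = [0.0, 1.0, 4.0, 9.0, 16.0]   # a_n = n^2; needs more than S entries
```

For the sequence of squares the second difference is constant and the third vanishes, which makes the weighted sums easy to verify by hand.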
We say a finite subset $\wedge \subset S^q$ is a subset of interpolatory type if for any real numbers $\{y_\omega\}_{\omega \in \wedge}$ there is a $p \in \Pi_n^q$ such that $p(\omega) = y_\omega$, $\omega \in \wedge$. Subsets of this kind may be found in [45, 46].
Let $B_S$ be the set of all sequences $a$ for which $|a|_S < \infty$ and $B_S^*$ the set of all sequences $a$ for which $|a|_S + |a|_S^* < \infty$.
Let $\beta > 0$ be a real number and $f \in L^p(S^q)$. Then, we say $\partial^\beta f \in L^p(S^q)$ if there is a function $\varphi \in L^p(S^q)$ such that
$$\varphi(x) \sim \sum_{k=0}^{\infty} k^\beta\, Y_k(f, x), \quad x \in S^q. \tag{2.10}$$
We now give the results of this paper.
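Theorem 2.1. Suppose that there is a constant $\gamma_0 > 0$ depending only on $q$ such that $\wedge$ is a subset of interpolatory type and a $(\delta/(6 \cdot 2^m), a)$-covering of $S^q$ satisfying $0 < \delta < \gamma_0/a$, with $a \ge 1$ and $m$ a given positive integer. Let $S > q$ be an integer and $\beta > 0$ a real number such that there is a $\gamma'$ with $\gamma' \ge \gamma + 2q - 1$ and $\beta - \gamma' > q$. Let $\phi \in L^\infty_{W_q}$ satisfy $\{(l + 1)^\beta a_l(\phi)\}_{l=0}^{\infty} \in B_S$ and $\{(l + 1)^{-\beta} a_l(\phi)^{-1}\}_{l=0}^{\infty} \in B_S^*$, let $H^{\wedge}_{K_\phi}$ be the reproducing kernel space generated by $\wedge$ and the kernel (2.6), and let $\partial^\gamma f \in L^p(S^q)$. Then there are a constant $C_{p,q} > 0$ depending only on $p$ and $q$ and a function $g_{m,\phi}(x) = C_0 + g^*_{m,\phi}(x)$, with $g^*_{m,\phi}(x) \in H^{\wedge}_{K_\phi}$ and $C_0$ a constant, such that
$$\|f - g_{m,\phi}\|_{p, S^q} = O\left(\frac{1}{2^{m\gamma}}\right), \tag{2.11}$$
$$\|g^*_{m,\phi}\|^2_{H^{\wedge}_{K_\phi}} = O\left(2^{2m(2q-1)}\right). \tag{2.12}$$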
The functions $\phi$ satisfying the conditions of Theorem 2.1 may be found in [39, Page 357].
Corollary 2.2. Under the conditions of Theorem 2.1, if $\partial^\gamma f \in L^p(S^q)$, then
$$K\left(f, \frac{1}{2^{2m\gamma'}}\right)_{H^{\wedge}_{K_\phi}} = O\left(\frac{1}{2^{m\gamma}}\right). \tag{2.13}$$
Corollary 2.2 shows that the convergence rate of the K-functional (1.8) is controlled by the smoothness of both the reproducing kernel and the approximated function $f$.
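Theorem 2.3. Suppose that there is a constant $\gamma > 0$ depending only on $q$ such that $\wedge$ is a subset of interpolatory type and a $(\delta/(6 \cdot 2^m), a)$-covering of $S^q$ satisfying $0 < \delta < \gamma/a$, with $a \ge 1$ and $m$ a given positive integer. Let $H^{\wedge}_{K_\phi}$ be the reproducing kernel space generated by $\wedge$ and the kernel (2.6) with $\phi \in L^\infty_{W_q}$ satisfying $\{(l + 1)^\beta a_l(\phi)\} \in B_S$ and $\{(l + 1)^{-\beta} a_l(\phi)^{-1}\} \in B_S^*$. Then, for $\alpha \ge q(1/p - 1/2)_+ + \beta$ and $\partial^\beta f \in L^p(S^q)$ there holds
$$K\left(f, \frac{1}{2^{\alpha m}}\right)_{H^{\wedge}_{K_\phi}} = O\left(\frac{1}{2^{m\beta}}\right), \tag{2.14}$$
where $a_+ = \max\{a, 0\}$.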
3. Some Lemmas
To prove Theorems 2.1 and 2.3, we need some lemmas. The first one is about the Gauss
integral formula and Marcinkiewicz inequalities.
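Lemma 3.1 (see [47–50]). There exists a constant $\gamma > 0$ depending only on $q$ such that for any positive integer $n$ and any $(\delta/n, a)$-covering $\wedge$ of $S^q$ satisfying $0 < \delta < \gamma/a$, there exists a set of real numbers $\lambda_\omega \sim n^{-q}$, $\omega \in \wedge$, such that
$$\int_{S^q} f(y)\, d\mu_q(y) = \sum_{\omega \in \wedge} \lambda_\omega f(\omega) \tag{3.1}$$
for any $f \in \Pi^q_{3n}$, and for $f \in \Pi^q_n$
$$\|f\|_{p, S^q} \sim \begin{cases} \left( \displaystyle\sum_{\omega \in \wedge} \lambda_\omega |f(\omega)|^p \right)^{1/p}, & 0 < p < \infty, \\[3mm] \displaystyle\max_{\omega \in \wedge} |f(\omega)|, & p = \infty, \end{cases} \tag{3.2}$$
where $|\wedge| \sim n^q \sim \dim \Pi^q_n$, the constants of equivalence depending only on $q$, $a$, $\delta$, and $p$ when $p$ is small. Here one employs the slight abuse of notation that $0^0 = 1$.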
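Formula (3.1) asserts the existence of a positive-weight quadrature rule that is exact on spherical polynomials. The sketch below does not reproduce the scattered-data construction of [47–50]; as a swapped-in and much simpler stand-in, it builds the classical tensor-product rule on $S^2$ (Gauss-Legendre nodes in $\cos\theta$, equispaced nodes in the azimuth), which also has positive weights and is exact up to a prescribed degree. The function name and the degree used are assumptions made for the example.

```python
import numpy as np

def product_quadrature_s2(degree):
    """Positive-weight rule on S^2, exact for polynomials of degree <= `degree`."""
    n = degree // 2 + 1                        # Gauss-Legendre: exact to 2n - 1
    t, w = np.polynomial.legendre.leggauss(n)  # nodes in t = cos(theta)
    m = degree + 1                             # equispaced azimuth angles
    phi = 2.0 * np.pi * np.arange(m) / m
    T, PHI = np.meshgrid(t, phi, indexing="ij")
    s = np.sqrt(1.0 - T ** 2)
    nodes = np.stack([s * np.cos(PHI), s * np.sin(PHI), T], axis=-1).reshape(-1, 3)
    weights = np.repeat(w, m) * (2.0 * np.pi / m)
    return nodes, weights

nodes, weights = product_quadrature_s2(6)
x, y, z = nodes.T
area = float(weights.sum())                         # integral of 1 over S^2
moment_x2z2 = float(np.sum(weights * x**2 * z**2))  # integral of x^2 z^2
discrete_l2_sq = float(np.sum(weights * z**2))      # sum of lambda_w |f(w)|^2, f = z
```

With $f(x, y, z) = z$, the last quantity equals $\|f\|_{2,S^2}^2 = 4\pi/3$ exactly, which is the $p = 2$ instance of the equivalence (3.2) in the special case where the rule integrates $|f|^2$ exactly.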
The second lemma we shall use is the Nikolskii inequality for spherical harmonics.
Lemma 3.2 (see [38, 45, 49, 51, 52]). If $0 < p < r \le \infty$ and $P \in \Pi^q_n$, then one has the following Nikolskii inequality:
$$\|P\|_{p, S^q} \le C_q \|P\|_{r, S^q} \le C_q\, n^{q(1/p - 1/r)} \|P\|_{p, S^q}, \tag{3.3}$$
where the constant $C_q$ depends only on $q$.
We now restate the general approximation frame of the Cesàro means and de la Vallée
Poussin means provided by Dai and Ditzian see 53.
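Lemma 3.3. Let $\mu$ be a positive measure on $X$, and let $\{H_k\}_{k=0}^{\infty}$ be a sequence of finite-dimensional spaces satisfying the following:
(I) $H_k \subset L^p(d\mu)$.
(II) $H_k$ is orthogonal to $H_l$ (in $L^2(d\mu)$) when $l \ne k$.
(III) $\operatorname{span} \bigcup_{k=0}^{\infty} H_k$ is dense in $L^p(d\mu)$ for all $1 \le p \le \infty$.
(IV) $H_0$ is the collection of the constants.
The Cesàro means $C_\delta$ of $f(x)$ are given by
$$\sigma_N^\delta(f, x) = \frac{1}{A_N^\delta} \sum_{k=0}^{N} A_{N-k}^\delta P_k(f, x), \quad A_k^\delta = \frac{\Gamma(k + \delta + 1)}{\Gamma(k + 1)\Gamma(\delta + 1)}, \tag{3.4}$$
for $N = 1, 2, \ldots$, where
$$P_k(f, x) = \int_X f(y) \sum_{i=1}^{d_k} \varphi_{k,i}(x) \varphi_{k,i}(y)\, d\mu(y), \quad x \in X, \tag{3.5}$$
and $\{\varphi_{k,i}(x)\}_{i=1}^{d_k}$ is an orthonormal basis of $H_k$ in $L^2(d\mu)$. One sets, for a given $\alpha > 0$, $\partial^\alpha f(x) \sim \sum_{k=1}^{\infty} k^\alpha P_k(f, x)$, and $\partial^\alpha f \in L^p(d\mu)$ if there exists $\varphi \in L^p(d\mu)$ such that $P_k(\varphi, x) = k^\alpha P_k(f, x)$.
Let $\eta \in C^\infty$ be a nonnegative and nonincreasing function with $\eta(u) = 1$ for $u \in [0, 1/2]$ and $\eta(u) = 0$ for $u > 1$. The de la Vallée Poussin means $\rho_N(f, x)$ are defined as
$$\rho_N(f, x) = \sum_{k=0}^{\infty} \eta\left(\frac{k}{2N}\right) P_k(f, x), \quad x \in X,\ f \in L^1(d\mu). \tag{3.6}$$
Then, $\rho_N(f, x) = f(x)$ for $f \in \operatorname{span} \bigcup_{k \le N} H_k$. If for some $\delta > 0$, $\|\sigma_N^\delta f\|_{p, d\mu} \le C \|f\|_{p, d\mu}$, $1 \le p \le \infty$, then $\|\rho_N f\|_{p, d\mu} \le C \|f\|_{p, d\mu}$ and
$$\|\rho_N f - f\|_{p, d\mu} = O\left(\frac{\|\partial^\alpha f\|_{p, d\mu}}{N^\alpha}\right), \quad \partial^\alpha f \in L^p(d\mu). \tag{3.7}$$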
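The reproduction property of the de la Vallée Poussin means in Lemma 3.3 comes entirely from the multipliers $\eta(k/(2N))$, which equal 1 for $k \le N$ and vanish for $k \ge 2N$. The sketch below constructs one admissible cutoff and lists these multipliers. It is a hedged illustration: the particular smooth bump used for $\eta$ is a standard choice made here for the example and is not taken from the paper, and all names are hypothetical.

```python
import numpy as np

def _bump(t):
    """exp(-1/t) for t > 0 and 0 otherwise (smooth on the real line)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = np.exp(-1.0 / t[pos])
    return out

def eta(u):
    """Smooth nonincreasing cutoff: 1 on [0, 1/2], 0 on [1, infinity)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    num = _bump(1.0 - u)
    # The two bumps never vanish simultaneously, so the denominator is positive.
    return num / (num + _bump(u - 0.5))

def vallee_poussin_multipliers(N, k_max):
    """Multipliers eta(k / (2N)) applied to the k-th projection, k = 0..k_max."""
    k = np.arange(k_max + 1)
    return eta(k / (2.0 * N))

mult = vallee_poussin_multipliers(8, 24)
```

Components of degree at most $N = 8$ pass through unchanged, components of degree at least $2N = 16$ are removed, and the degrees in between are damped monotonically.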
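Lemma 3.3 yields the following Lemma 3.4.
Lemma 3.4. Let $\eta$ be the function defined in Lemma 3.3. Define two kinds of operators, respectively, by
$$V_N(f, x) = \sum_{k=0}^{\infty} \eta\left(\frac{k}{2N}\right) Y_k(f, x), \quad x \in S^q,\ f \in L^1(S^q),$$
$$V_N^*(f, x) = \sum_{k=0}^{\infty} \eta\left(\frac{k}{2N}\right) a_k(f) \frac{d_k^q}{\omega_q}\, p_k^{q+1}(x), \quad x \in [-1, 1],\ f \in L^1_{W_q}. \tag{3.8}$$
Then, $V_N(f, x) = f(x)$ for any $f \in \Pi^q_N$ and $V_N^*(f, x) = f(x)$ for any $f \in P_N$. Moreover,
$$\|V_N f - f\|_{p, S^q} = O\left(\frac{\|\partial^\alpha f\|_{p, S^q}}{N^\alpha}\right), \quad \partial^\alpha f \in L^p(S^q), \tag{3.9}$$
$$\|V_N^* f - f\|_{p, W_q} = O\left(\frac{\|\partial^\alpha f\|_{p, W_q}}{N^\alpha}\right), \quad \partial^\alpha f \in L^p_{W_q}, \tag{3.10}$$
where for $f \in L^p_{W_q}$ one defines
$$\partial^\alpha f(x) \sim \sum_{k=1}^{\infty} k^\alpha a_k(f) \frac{d_k^q}{\omega_q}\, p_k^{q+1}(x), \quad x \in [-1, 1]. \tag{3.11}$$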
Proof. By [54, Lemma 2.2] we know $\|V_N f\|_{p, S^q} \le C \|f\|_{p, S^q}$ for some $C > 0$. Hence, (3.9) holds by (3.7). By [19, Theorem A] we know $\|\sigma_N^\delta f\|_{p, W_q} \le C \|f\|_{p, W_q}$ for $\delta > q/2 - 1/2$. Hence, (3.10) holds by (3.7).
Let $\wedge \subset S^q$ be a finite set. Then we call $\wedge$ an M-Z quadrature measure of order $n$ if (3.1) and (3.2) hold for $f \in \Pi^q_n$. By this definition one knows that the finite set $\wedge$ in Lemma 3.1 is an M-Z quadrature measure of order $n$.
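Define an operator as
$$\sigma_y(\wedge, \eta, f, x) = \sum_{l=0}^{\infty} \eta\left(\frac{l}{y}\right) \frac{d_l^q}{\omega_q} \sum_{\omega \in \wedge} \lambda_\omega f(\omega)\, p_l^{q+1}(\omega x), \quad x \in S^q. \tag{3.12}$$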
Then, we have the following results.
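Lemma 3.5 (see [39]). For a given integer $n \ge 1$, let $\wedge$ be an M-Z quadrature measure of order $6 \cdot 2^n$, $\phi \in L^1_{W_q}$, $S > q$ an integer, $1 \le p \le \infty$, and $\beta > q/p'$, where $p'$ satisfies $1/p + 1/p' = 1$ (with $p' = 1$ if $p = \infty$ and $p' = \infty$ if $p = 1$). Let $\eta(t)$ be the nonnegative and nonincreasing function defined in Lemma 3.3, and let $\phi \in L^1_{W_q}$ satisfy $\{(l + 1)^{-\beta} a_l^{-1}(\phi)\}_{l=0}^{\infty} \in B_S^*$. For $0 < \gamma \le \beta$, let $H_{\gamma,p}$ consist of those $f \in L^p(S^q)$ for which the derivative of order $\gamma$, that is, $\partial^\gamma f$, belongs to $L^p(S^q)$. Then, there is an operator $D_\phi : H_{\gamma,p} \to L^p(S^q)$ such that the following hold.
(i) (see [39, Proposition 3.1, (b)]). $\|D_\phi f\|_{p, S^q} \le c \|\partial^\gamma f\|_{p, S^q}$ for $f \in H_{\beta,p}$,
$$f(x) = \int_{S^q} \phi(xy)\, D_\phi f(y)\, d\mu_q(y), \quad x \in S^q, \tag{3.13}$$
$$\hat{f}(l, k) = a_l(\phi)\, \widehat{D_\phi f}(l, k), \tag{3.14}$$
where $\hat{f}(l, k) = \int_{S^q} f(x) Y_{l,k}(x)\, d\mu_q(x)$.
(ii) (see [39, Theorem 3.1]). Moreover, if one adds the assumption that $\{(l + 1)^\beta a_l(\phi)\}_{l=0}^{\infty} \in B_S$, then there are constants $C > 0$ and $C_1 > 0$ such that
$$\left\| f(\cdot) - \sum_{\omega \in \wedge} \lambda_\omega D_\phi\bigl(\sigma_{2^n}(\wedge, \eta, f), \omega\bigr)\, \phi(\cdot\, \omega) \right\|_{p, S^q} \le C\, 2^{-n\gamma} \|\partial^\gamma f\|_{p, S^q} \tag{3.15}$$
and for $0 \le \gamma \le \beta$
$$\|\partial^\gamma f - \partial^\gamma \sigma_n(\wedge, \eta, f)\|_{p, S^q} \le C_1 n^{\gamma - \beta} \|\partial^\beta f\|_{p, S^q}. \tag{3.16}$$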
Lemma 3.6 (see, e.g., [29, Page 230]). Let $f \in L^2(S^q)$. Then,
$$\|f\|_{2, S^q} = \left( \sum_{k=0}^{\infty} \|Y_k(f)\|^2_{2, S^q} \right)^{1/2}. \tag{3.17}$$
The following Lemma 3.7 deals with the orthogonality of the Legendre polynomials $p_l^{q+1}(xy)$.
Lemma 3.7. For the generalized Legendre polynomials $p_l^{q+1}$ one has
$$\frac{d_l^q}{\omega_q} \int_{S^q} p_k^{q+1}(uy)\, p_l^{q+1}(xu)\, d\mu_q(u) = \delta_{l,k}\, p_k^{q+1}(xy), \quad x, y \in S^q. \tag{3.18}$$
Proof. It may be obtained by (2.2).
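Lemma 3.8. Let $\phi \in L^\infty_{W_q}$ satisfy (2.7) for $p = \infty$ and $\partial^q \phi \in L^\infty_{W_q}$. Let $\wedge \subset S^q$ be a finite set satisfying the conditions of Theorem 2.1. Then, there is a constant $C_\phi > 0$ depending only on $\phi$ such that
$$\left( \sum_{\omega, \omega' \in \wedge} \lambda_\omega \lambda_{\omega'} \bigl|K(\phi, \omega', \omega)\bigr|^2 \right)^{1/2} \le C_\phi < \infty. \tag{3.19}$$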
∧
λω K∧ ω , ω λω ω,ω ∈∧ , where K∧ ω , ω λl ×
l0
2 1/2
q
q1
2
dl /ωq pl ω ω with λl ≥ 0 and l∧ {a aω ω∈∧ : aω < ∞}. Then,
√
λ
Proof. Define a matrix by K∧
ω∈∧
ω,ω ∈∧
2
λω λω K∧ ω , ω 1/2
√
λ
√ a K∧ a
λ K∧ 2 sup
.
l∧
a a
a/
0, a∈l2
∧
3.20
By the Parseval equality we have
√
λ
a K∧
a
aω
λω K∧ ω , ω aω λω
⎞
q
dl q1 λω ⎝ λl ×
pl ω ω ⎠aω λω
ωq
l0
ω,ω ∈∧
aω
ω,ω ∈∧
⎛
∧
q 2
dl ∧
λl aω λω Yl,α ω
ω∈∧
l0 α0
3.21
2
q
dl ∧ ≤ max λl
aω λω Yl,α ω Yl,α x dμq x
0≤l≤∧
Sq l0 α0 ω∈∧
2
q
∧
dl q1
aω λω
dμq x.
max λl
p
xω
l
ω
0≤l≤∧
q
Sq ω∈∧
l0
q
Let L∧ x ∈ Πn satisfy L∧ ω aω / λω , ω ∈ ∧. Then, by 3.1
√
λ
a K∧
a
aω
ω,ω ∈∧
λω K∧ ω , ω aω λω
2
q
∧ dl
q1
dμq x
≤ max λl
L
ωp
xωdμ
ω
∧
q
l
0≤l≤∧
Sq l0 ωq Sq
max λl
|L∧ x|2 dμq x
0≤l≤∧
max λl
0≤l≤∧
3.22
Sq
λω |L∧ x|2 max λl a a.
ω∈∧
0≤l≤∧
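Hence, $a^T K_\wedge^{\sqrt{\lambda}} a \le \max_{0 \le l \le |\wedge|} \lambda_l\, a^T a$. On the other hand, since $\|K(\phi, x, \cdot)\|_{\infty, S^q} \le C \|\phi\|_{\infty, W_q}$, $x \in S^q$, we have for any $x \in S^q$ that
$$\bigl|K(\phi, x, y) - K(V^*_{|\wedge|}(\phi), x, y)\bigr| = \bigl|K(\phi - V^*_{|\wedge|}(\phi), x, y)\bigr| \le C \|\phi - V^*_{|\wedge|}(\phi)\|_{\infty, W_q}. \tag{3.23}$$
It follows for $x, y \in S^q$ that
$$K(V^*_{|\wedge|}(\phi), x, y) - C \|\phi - V^*_{|\wedge|}(\phi)\|_{\infty, W_q} \le K(\phi, x, y) \le K(V^*_{|\wedge|}(\phi), x, y) + C \|\phi - V^*_{|\wedge|}(\phi)\|_{\infty, W_q}. \tag{3.24}$$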
√
λ
Define K∧ φ λω Kφ, ω , ω λω ω,ω ∈∧ . Then, 3.24, 3.10, the Cauchy inequality,
and the fact ∧ ∼ nq make
√
√ λ λ
∗
∗
V∧
φ a − a K∧
φ a O ∧ φ − V∧
φ a K∧
∞,Wq
× a a O1a a. 3.25
It follows that
√ √
√ λ
λ λ
∗
∗
V∧
V∧
φ a ≤ a K∧
φ a a K∧
φ a − a K∧
φ a
√
λ a K∧
≤ max η
0≤l≤∧
l
a a O1a a.
2∧
3.26
Equation 3.2 thus holds.
4. Proof of the Main Results
We now show Theorems 2.1 and 2.3, respectively.
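Proof of Theorem 2.1. Lemma 4.3 in [39] gave the following result.
Let $1 \le p \le \infty$, $\gamma > q/p'$, $S > q$ be an integer, and $\{a_l\}$ a sequence of real numbers such that $a^\gamma := \{(l + 1)^\gamma a_l\} \in B_S$. Then, there exists $\phi \in L^p_{W_q}$ such that $a_l(\phi) = a_l$, $l = 0, 1, 2, \ldots$.
Since $(1 + l)^\beta = (1 + l)^{\gamma'} (1 + l)^{\beta - \gamma'}$ and $\beta - \gamma' > q$, we have a $\phi^* \in L^\infty_{W_q}$ such that $a_l(\phi^*) = (1 + l)^{\gamma'} a_l(\phi)$. Hence, $\partial^{\gamma'} \phi \in L^\infty_{W_q}$ and
$$K\bigl(V^*_{2^{m+1}}(\phi), x, y\bigr) = \sum_{l=0}^{\infty} a_l\bigl(V^*_{2^{m+1}}(\phi)\bigr) \frac{d_l^q}{\omega_q}\, p_l^{q+1}(xy), \quad x, y \in S^q, \tag{4.1}$$
and for $0 \le k \le 2^{m+1}$ there holds for $x, y \in S^q$ that
$$\frac{d_k^q}{\omega_q} \int_{S^q} K\bigl(V^*_{2^{m+1}}(\phi), x, u\bigr)\, p_k^{q+1}(uy)\, d\mu_q(u) = a_k\bigl(V^*_{2^{m+1}}(\phi)\bigr) \frac{d_k^q}{\omega_q}\, p_k^{q+1}(xy) = a_k(\phi) \frac{d_k^q}{\omega_q}\, p_k^{q+1}(xy). \tag{4.2}$$
It follows for $x, y \in S^q$ that
$$p_k^{q+1}(xy) = \frac{1}{a_k(\phi)} \int_{S^q} K\bigl(V^*_{2^{m+1}}(\phi), x, u\bigr)\, p_k^{q+1}(uy)\, d\mu_q(u). \tag{4.3}$$
On the other hand, since
$$V_{2^m}(f, x) = Y_0\bigl(V_{2^m}(f), x\bigr) + \sum_{k=1}^{2^{m+1}} Y_k\bigl(V_{2^m}(f), x\bigr), \quad x \in S^q, \tag{4.4}$$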
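where for $1 \le k \le 2^{m+1}$, we have by (4.3)
$$Y_k\bigl(V_{2^m}(f), x\bigr) = \frac{d_k^q}{\omega_q a_k(\phi)} \int_{S^q} \left( \int_{S^q} K\bigl(V^*_{2^{m+1}}(\phi), x, u\bigr)\, p_k^{q+1}(uy)\, d\mu_q(u) \right) V_{2^m}(f, y)\, d\mu_q(y). \tag{4.5}$$
Hence, the above equation and (3.1)-(3.2) yield
$$V_{2^m}(f, x) = Y_0\bigl(V_{2^m}(f), x\bigr) + \sum_{k=1}^{2^{m+1}} \frac{d_k^q}{\omega_q a_k(\phi)} \int_{S^q} \left( \int_{S^q} K\bigl(V^*_{2^{m+1}}(\phi), x, u\bigr)\, p_k^{q+1}(uy)\, d\mu_q(u) \right) V_{2^m}(f, y)\, d\mu_q(y) = C_0 + \sum_{\omega \in \wedge} \lambda_\omega C_{\omega,\phi}\, K\bigl(V^*_{2^{m+1}}(\phi), x, \omega\bigr), \tag{4.6}$$
where $C_0 = Y_0(V_{2^m}(f), x)$ and $C_{\omega,\phi} = \sum_{k=1}^{2^{m+1}} (d_k^q/\omega_q)\, Y_k(V_{2^m}(f), \omega)/a_k(\phi)$. Define
$$g_{m,\phi}(x) = C_0 + g^*_{m,\phi}(x) = C_0 + \sum_{\omega \in \wedge} \lambda_\omega C_{\omega,\phi}\, K(\phi, x, \omega), \quad x \in S^q. \tag{4.7}$$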
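Then $g_{m,\phi} = C_0 + g^*_{m,\phi}$ with $C_0 \in \mathbb{R}$ and $g^*_{m,\phi} \in H^{\wedge}_{K_\phi}$, and by (3.9)
$$\|g_{m,\phi} - f\|_{p, S^q} \le \|g_{m,\phi} - V_{2^m} f\|_{p, S^q} + \|V_{2^m} f - f\|_{p, S^q} = O\left(\frac{1}{2^{m\gamma}}\right) + \|g_{m,\phi} - V_{2^m} f\|_{p, S^q}, \tag{4.8}$$
where
$$\bigl|g_{m,\phi}(x) - V_{2^m}(f, x)\bigr| \le \sum_{\omega \in \wedge} \lambda_\omega |C_{\omega,\phi}|\, \bigl|K\bigl(V^*_{2^{m+1}}(\phi) - \phi, x, \omega\bigr)\bigr|. \tag{4.9}$$
It follows by (3.9) that
$$\|g_{m,\phi} - V_{2^m} f\|_{p, S^q} \le \sum_{\omega \in \wedge} \lambda_\omega |C_{\omega,\phi}|\, \bigl\|K\bigl(V^*_{2^{m+1}}(\phi) - \phi, \cdot, \omega\bigr)\bigr\|_{p, S^q} = O\left(\frac{1}{2^{m\gamma'}}\right) \sum_{\omega \in \wedge} \lambda_\omega |C_{\omega,\phi}|. \tag{4.10}$$
On the other hand, by the definition of $V_{2^m} f$ and (3.14) we have for $1 \le k \le 2^{m+1}$ that
$$Y_k\bigl(V_{2^m}(f), \omega\bigr) = \eta\left(\frac{k}{2^{m+1}}\right) a_k(\phi)\, Y_k(D_\phi f, \omega), \quad \omega \in S^q, \tag{4.11}$$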
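where $D_\phi$ denotes the operator $D_\phi$ of Lemma 3.5 for $\phi(xy) = K(\phi, xy)$. Hence,
$$C_{\omega,\phi} = \sum_{k=1}^{2^{m+1}} \frac{d_k^q}{\omega_q}\, \eta\left(\frac{k}{2^{m+1}}\right) Y_k(D_\phi f, \omega), \quad \omega \in S^q. \tag{4.12}$$
Equation (3.2) and the definition of $\eta(t)$ make
$$\sum_{\omega \in \wedge} \lambda_\omega |C_{\omega,\phi}| \le \sum_{\omega \in \wedge} \lambda_\omega \left| \sum_{k=1}^{2^{m+1}} \frac{d_k^q}{\omega_q}\, \eta\left(\frac{k}{2^{m+1}}\right) Y_k(D_\phi f, \omega) \right| \le C_q \sum_{k=1}^{2^{m+1}} d_k^q \int_{S^q} \bigl|Y_k(D_\phi f, \omega)\bigr|\, d\mu_q(\omega). \tag{4.13}$$
The Hölder inequality, (i) of Lemma 3.5, and the fact that $|p_l^{q+1}(x)| \le 1$ make $|Y_k(D_\phi f, \omega)| \le C_q d_k^q \|D_\phi f\|_{p, S^q} \le C_q d_k^q \|\partial^\gamma f\|_{p, S^q}$. Therefore,
$$\sum_{\omega \in \wedge} \lambda_\omega |C_{\omega,\phi}| \le C_q \sum_{k=1}^{2^{m+1}} \bigl(d_k^q\bigr)^2 \|\partial^\gamma f\|_{p, S^q}. \tag{4.14}$$
Take $g^*_{m,\phi}(x) = \sum_{\omega \in \wedge} \lambda_\omega C_{\omega,\phi} K(\phi, x, \omega)$; then
$$\|g^*_{m,\phi}\|^2_{H^{\wedge}_{K_\phi}} = \sum_{\omega, \omega' \in \wedge} \lambda_\omega \lambda_{\omega'} C_{\omega,\phi} C_{\omega',\phi}\, K(\phi, \omega', \omega) \le \left( \sum_{\omega \in \wedge} \lambda_\omega |C_{\omega,\phi}|^2 \right) \left( \sum_{\omega, \omega' \in \wedge} \lambda_\omega \lambda_{\omega'} \bigl|K(\phi, \omega', \omega)\bigr|^2 \right)^{1/2}. \tag{4.15}$$
Equations (3.2), (3.17), (3.16), and the Cauchy inequality make
$$\sum_{\omega \in \wedge} \lambda_\omega |C_{\omega,\phi}|^2 \le C \left\| \sum_{k=1}^{2^{m+1}} \frac{d_k^q\, Y_k\bigl(V_{2^m}(f)\bigr)}{\omega_q a_k(\phi)} \right\|^2_{2, S^q} \le C \sum_{k=1}^{2^{m+1}} \bigl(d_k^q\bigr)^2 \|\partial^\gamma f\|^2_{p, S^q}. \tag{4.16}$$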
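Let $\Gamma(x)$ be the Gamma function. Then, it is well known that $\Gamma(x)/\Gamma(x + a) \sim x^{-a}$, $x \to \infty$. Therefore,
$$d_l^q = \frac{2l + q - 1}{l + q - 1} \binom{l + q - 1}{q - 1} = \frac{(2l + q - 1)\, \Gamma(l + q - 1)}{\Gamma(q)\, \Gamma(l + 1)} \sim \frac{2l + q - 1}{\Gamma(q)}\, l^{q-2} \sim l^{q-1}. \tag{4.17}$$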
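Hence,
$$\sum_{k=1}^{2^{m+1}} \bigl(d_k^q\bigr)^2 = O\left( \sum_{l=1}^{2^{m+1}} l^{2(q-1)} \right) = O\left(2^{(m+1)(2q-1)}\right), \quad m \to \infty. \tag{4.18}$$
Equations (4.14) and (4.4) make
$$\|g_{m,\phi} - V_{2^m} f\|_{p, S^q} = O\left(\frac{1}{2^{m(\gamma' - 2q + 1)}}\right), \tag{4.19}$$
and hence
$$\|f - g_{m,\phi}\|_{p, S^q} = O\left(\frac{1}{2^{m\gamma}}\right) + O\left(\frac{1}{2^{m(\gamma' - 2q + 1)}}\right). \tag{4.20}$$
Since $\gamma' \ge 2q - 1 + \gamma$, we have (2.11) by (4.20). Equation (2.12) follows by (4.3), (4.4), and (3.19).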
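The growth estimates (4.17) and (4.18) used in the proof above can be checked numerically for small $q$. The sketch below is only an illustration with hypothetical helper names: it recomputes $d_k^q$ from (2.1) with exact integer arithmetic and compares $\sum_{k=1}^{N} (d_k^q)^2$ with $N^{2q-1}$ for $N = 2^{m+1}$.

```python
from math import comb

def harmonic_dim(l, q):
    """Dimension d_l^q from formula (2.1), computed with exact integers."""
    if l == 0:
        return 1
    return (2 * l + q - 1) * comb(l + q - 1, q - 1) // (l + q - 1)

def squared_dim_sum(n, q):
    """Sum of (d_k^q)^2 for k = 1, ..., n, the quantity bounded in (4.18)."""
    return sum(harmonic_dim(k, q) ** 2 for k in range(1, n + 1))

# Ratio of the sum to 2^((m+1)(2q-1)); it should stay bounded as m grows.
ratios = {
    q: [squared_dim_sum(2 ** (m + 1), q) / 2 ** ((m + 1) * (2 * q - 1))
        for m in range(1, 8)]
    for q in (2, 3)
}
```

For $q = 2$ the ratio tends to $4/3$, and for $q = 3$ it tends to $1/5$, in line with $d_k^q \sim k^{q-1}$ up to a constant factor.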
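Proof of Corollary 2.2. By (2.11)-(2.12) one has
$$K\left(f, \frac{1}{2^{2m\gamma'}}\right)_{H_{K_\phi}} \le \|f - g_{m,\phi}\|_{p, S^q} + \frac{1}{2^{2m\gamma'}} \|g^*_{m,\phi}\|^2_{H_{K_\phi}} = O\left(\frac{1}{2^{m\gamma}}\right) + \frac{1}{2^{2m\gamma'}} \times O\left(2^{2m(2q-1)}\right) = O\left(\frac{1}{2^{m\gamma}}\right). \tag{4.21}$$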
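Proof of Theorem 2.3. Replace $\phi(x\omega)$ in Lemma 3.5 with $K(\phi, x\omega)$, denote still by $D_\phi$ the operator $D_\phi$ in Lemma 3.5 with $\phi(xy) = K(\phi, xy)$, and let
$$g_{m,\phi}(x) = \sum_{\omega \in \wedge} \lambda_\omega D_\phi\bigl(\sigma_{2^m}(\wedge, \eta, f), \omega\bigr)\, K(\phi, x\omega), \quad x \in S^q. \tag{4.22}$$
Then $g_{m,\phi} \in H^{\wedge}_{K_\phi}$ and by (3.15) $\|f - g_{m,\phi}\|_{p, S^q} = O(1/2^{m\beta})$. In this case,
$$\|g_{m,\phi}\|^2_{H^{\wedge}_{K_\phi}} = \sum_{\omega, \omega' \in \wedge} \lambda_\omega \lambda_{\omega'} D_\phi\bigl(\sigma_{2^m}(\wedge, \eta, f), \omega\bigr) D_\phi\bigl(\sigma_{2^m}(\wedge, \eta, f), \omega'\bigr) K(\phi, \omega\omega') \le \left( \sum_{\omega \in \wedge} \lambda_\omega \bigl|D_\phi\bigl(\sigma_{2^m}(\wedge, \eta, f), \omega\bigr)\bigr|^2 \right) \left( \sum_{\omega, \omega' \in \wedge} \lambda_\omega \lambda_{\omega'} \bigl|K(\phi, \omega\omega')\bigr|^2 \right)^{1/2}. \tag{4.23}$$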
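Since $\sigma_{2^m}(\wedge, \eta, f, x)$ is a spherical harmonic of order $\le 2^m$, we know by (i) of Lemma 3.5 that $D_\phi(\sigma_{2^m}(\wedge, \eta, f), x)$ is also a spherical harmonic of order $\le 2^m$. Then, (3.2), (i) of Lemma 3.5, (3.3), and (3.16) make
$$\sum_{\omega \in \wedge} \lambda_\omega \bigl|D_\phi\bigl(\sigma_{2^m}(\wedge, \eta, f), \omega\bigr)\bigr|^2 \le C \int_{S^q} \bigl|D_\phi\bigl(\sigma_{2^m}(\wedge, \eta, f), \omega\bigr)\bigr|^2 d\mu_q(\omega) \le C \|\partial^\beta \sigma_{2^m}(\wedge, \eta, f)\|^2_{2, S^q}$$
$$\le C\, 2^{2mq(1/p - 1/2)} \|\partial^\beta \sigma_{2^m}(\wedge, \eta, f)\|^2_{p, S^q} \le C\, 2^{2mq(1/p - 1/2)} \left( \|\partial^\beta f - \partial^\beta \sigma_{2^m}(\wedge, \eta, f)\|^2_{p, S^q} + C \|\partial^\beta f\|^2_{p, S^q} \right). \tag{4.24}$$
Hence, (3.19) and the above equation make $\|g_{m,\phi}\|^2_{H^{\wedge}_{K_\phi}} \le C\, 2^{mq(1/p - 1/2)} \|\partial^\beta f\|^2_{p, S^q}$. Equation (2.14) follows by (3.15).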
Acknowledgments
This work is supported by the National NSF 10871226 of China. The authors thank the
reviewers for giving very valuable suggestions.
References
1 F. Cucker and S. Smale, “On the mathematical foundations of learning,” Bulletin of the American
Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002.
2 F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge
Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge,
Mass, USA, 2007.
3 V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing,
Communications, and Control, John Wiley & Sons, New York, NY, USA, 1998.
4 D. R. Chen, Q. Wu, Y. M. Ying, and D. X. Zhou, “Support vector machine soft margin classifier: error
analysis,” Journal of Machine Learning Research, vol. 5, pp. 1143–1175, 2004.
5 T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks and support vector machines,”
Advances in Computational Mathematics, vol. 13, no. 1, pp. 1–50, 2000.
6 Y. Li, Y. Liu, and J. Zhu, “Quantile regression in reproducing kernel Hilbert spaces,” Journal of the
American Statistical Association, vol. 102, no. 477, pp. 255–268, 2007.
7 H. Tong, D.-R. Chen, and L. Peng, “Analysis of support vector machines regression,” Foundations of
Computational Mathematics, vol. 9, no. 2, pp. 243–257, 2009.
8 H. Tong, D.-R. Chen, and L. Peng, “Learning rates for regularized classifiers using multivariate
polynomial kernels,” Journal of Complexity, vol. 24, no. 5-6, pp. 619–631, 2008.
9 D.-X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,” Advances in
Computational Mathematics, vol. 25, no. 1–3, pp. 323–344, 2006.
10 D. Chen and D.-H. Xiang, “The consistency of multicategory support vector machines,” Advances in
Computational Mathematics, vol. 24, no. 1–4, pp. 155–169, 2006.
11 E. De Vito, L. Rosasco, A. Caponnetto, M. Piana, and A. Verri, “Some properties of regularized kernel
methods,” Journal of Machine Learning Research, vol. 5, pp. 1363–1390, 2004.
12 N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol.
68, pp. 337–404, 1950.
13 C. A. Micchelli, Y. Xu, and H. Zhang, “Universal kernels,” Journal of Machine Learning Research, vol. 7,
pp. 2651–2667, 2006.
14 Q. Wu, Y. Ying, and D.-X. Zhou, “Multi-kernel regularized classifiers,” Journal of Complexity, vol. 23,
no. 1, pp. 108–134, 2007.
15 J. Bergh and J. Löfström, Interpolation Spaces, Springer, New York, NY, USA, 1976.
16 H. Berens and G. G. Lorentz, “Inverse theorems for Bernstein polynomials,” Indiana University
Mathematics Journal, vol. 21, no. 8, pp. 693–708, 1972.
17 H. Berens and L. Q. Li, “The Peetre K-moduli and best approximation on the sphere,” Acta
Mathematica Sinica, vol. 38, no. 5, pp. 589–599, 1995.
18 H. Berens and Y. Xu, “K-moduli, moduli of smoothness, and Bernstein polynomials on a simplex,”
Indagationes Mathematicae, vol. 2, no. 4, pp. 411–421, 1991.
19 W. Chen and Z. Ditzian, “Best approximation and K-functionals,” Acta Mathematica Hungarica, vol.
75, no. 3, pp. 165–208, 1997.
20 W. Chen and Z. Ditzian, “Best polynomial and Durrmeyer approximation in Lp S,” Indagationes
Mathematicae, vol. 2, no. 4, pp. 437–452, 1991.
21 F. Dai and Z. Ditzian, “Jackson inequality for Banach spaces on the sphere,” Acta Mathematica
Hungarica, vol. 118, no. 1-2, pp. 171–195, 2008.
22 Z. Ditzian and X. Zhou, “Optimal approximation class for multivariate Bernstein operators,” Pacific
Journal of Mathematics, vol. 158, no. 1, pp. 93–120, 1993.
23 Z. Ditzian and K. Runovskii, “Averages and K-functionals related to the Laplacian,” Journal of
Approximation Theory, vol. 97, no. 1, pp. 113–139, 1999.
24 Z. Ditzian, “A measure of smoothness related to the Laplacian,” Transactions of the American
Mathematical Society, vol. 326, no. 1, pp. 407–422, 1991.
25 Z. Ditzian and V. Totik, Moduli of Smoothness, vol. 9 of Springer Series in Computational Mathematics,
Springer, New York, NY, USA, 1987.
26 Z. Ditzian, “Approximation on Banach spaces of functions on the sphere,” Journal of Approximation
Theory, vol. 140, no. 1, pp. 31–45, 2006.
27 Z. Ditzian, “Fractional derivatives and best approximation,” Acta Mathematica Hungarica, vol. 81, no.
4, pp. 323–348, 1998.
28 L. L. Schumaker, Spline Functions: Basic Theory, John Wiley & Sons, New York, NY, USA, 1981, Pure
and Applied Mathematics.
29 K. Y. Wang and L. Q. Li, Harmonic Analysis and Approximation on the Unit Sphere, Science Press, Beijing,
China, 2000.
30 Y. Xu, “Approximation by means of h-harmonic polynomials on the unit sphere,” Advances in
Computational Mathematics, vol. 21, no. 1-2, pp. 37–58, 2004.
31 S. Smale and D.-X. Zhou, “Estimating the approximation error in learning theory,” Analysis and
Applications, vol. 1, no. 1, pp. 17–41, 2003.
32 B. Sheng, “Estimates of the norm of the Mercer kernel matrices with discrete orthogonal transforms,”
Acta Mathematica Hungarica, vol. 122, no. 4, pp. 339–355, 2009.
33 S. Loustau, “Aggregation of SVM classifiers using Sobolev spaces,” Journal of Machine Learning
Research, vol. 9, pp. 1559–1582, 2008.
34 H. N. Mhaskar and C. A. Micchelli, “Degree of approximation by neural and translation networks
with a single hidden layer,” Advances in Applied Mathematics, vol. 16, no. 2, pp. 151–183, 1995.
35 B. H. Sheng, “Approximation of periodic functions by spherical translation networks,” Acta
Mathematica Sinica. Chinese Series, vol. 50, no. 1, pp. 55–62, 2007.
36 B. Sheng, “On the degree of approximation by spherical translations,” Acta Mathematicae Applicatae
Sinica. English Series, vol. 22, no. 4, pp. 671–680, 2006.
37 B. Sheng, J. Wang, and S. Zhou, “A way of constructing spherical zonal translation network operators
with linear bounded operators,” Taiwanese Journal of Mathematics, vol. 12, no. 1, pp. 77–92, 2008.
38 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Approximation properties of zonal function
networks using scattered data on the sphere,” Advances in Computational Mathematics, vol. 11, no.
2-3, pp. 121–137, 1999.
39 H. N. Mhaskar, “Weighted quadrature formulas and approximation by zonal function networks on
the sphere,” Journal of Complexity, vol. 22, no. 3, pp. 348–370, 2006.
40 H. Groemer, Geometric Applications of Fourier Series and Spherical Harmonics, vol. 61 of Encyclopedia of
Mathematics and Its Applications, Cambridge University Press, Cambridge, Mass, USA, 1996.
41 C. Müller, Spherical Harmonics, vol. 17 of Lecture Notes in Mathematics, Springer, Berlin, Germany, 1966.
42 S. Z. Lu and K. Y. Wang, Bochner-Riesz Means, Beijing Normal University Press, Beijing, China, 1988.
43 Y. Wang and F. Cao, “The direct and converse inequalities for jackson-type operators on spherical
cap,” Journal of Inequalities and Applications, vol. 2009, Article ID 205298, 16 pages, 2009.
44 H. Wendland, Scattered Data Approximation, vol. 17 of Cambridge Monographs on Applied and
Computational Mathematics, Cambridge University Press, Cambridge, Mass, USA, 2005.
45 H. N. Mhaskar, F. J. Narcowich, N. Sivakumar, and J. D. Ward, “Approximation with interpolatory
constraints,” Proceedings of the American Mathematical Society, vol. 130, no. 5, pp. 1355–1364, 2002.
46 F. J. Narcowich and J. D. Ward, “Scattered data interpolation on spheres: error estimates and locally
supported basis functions,” SIAM Journal on Mathematical Analysis, vol. 33, no. 6, pp. 1393–1410, 2002.
47 G. Brown and F. Dai, “Approximation of smooth functions on compact two-point homogeneous
spaces,” Journal of Functional Analysis, vol. 220, no. 2, pp. 401–423, 2005.
48 G. Brown, D. Feng, and S. Y. Sheng, “Kolmogorov width of classes of smooth functions on the sphere
Sd−1 ,” Journal of Complexity, vol. 18, no. 4, pp. 1001–1023, 2002.
49 F. Dai, “Multivariate polynomial inequalities with respect to doubling weights and A∞ weights,”
Journal of Functional Analysis, vol. 235, no. 1, pp. 137–170, 2006.
50 H. N. Mhaskar, F. J. Narcowich, and J. D. Ward, “Spherical Marcinkiewicz-Zygmund inequalities and
positive quadrature,” Mathematics of Computation, vol. 70, no. 235, pp. 1113–1130, 2001.
51 E. Belinsky, F. Dai, and Z. Ditzian, “Multivariate approximating averages,” Journal of Approximation
Theory, vol. 125, no. 1, pp. 85–105, 2003.
52 A. I. Kamzolov, “Approximation of functions on the sphere Sn ,” Serdica, vol. 10, no. 1, pp. 3–10, 1984.
53 F. Dai and Z. Ditzian, “Cesàro summability and Marchaud inequality,” Constructive Approximation,
vol. 25, no. 1, pp. 73–88, 2007.
54 F. Dai, “Some equivalence theorems with K-functionals,” Journal of Approximation Theory, vol. 121, no.
1, pp. 143–157, 2003.