On the Power of Function Values for the Approximation
Problem in Various Settings
Erich Novak and Henryk Woźniakowski
June 2, 2011
Abstract
This is an expository paper on approximating functions from general Hilbert or Banach
spaces in the worst case, average case and randomized settings with error measured in the Lp
sense. We define the power function as the ratio between the best rate of convergence of
algorithms that use function values over the best rate of convergence of algorithms that use
arbitrary linear functionals for a worst possible Hilbert or Banach space for which the problem
of approximating functions is well defined. Obviously, the power function takes values at most
one. If these values are one or close to one then the power of function values is the same or
almost the same as the power of arbitrary linear functionals. We summarize and supply a few
new estimates on the power function. We also indicate eight open problems related to the power
function since this function has not yet been studied in many cases. We believe that the open
problems will be of interest to a general audience of mathematicians.
MSC: 41A25, 41A46, 65Y20
Keywords: degree of approximation, widths and sampling numbers, complexity of numerical
algorithms
1 Introduction
2 Worst case setting
2.1 Double Hilbert Case
2.2 Single Hilbert Case
2.3 Banach Case
3 Randomized setting
3.1 Double Hilbert Case
3.2 Other Cases
4 Average case setting with a Gaussian measure
References

Surveys in Approximation Theory
Volume 6, 2011. pp. 1–23.
© 2011 Surveys in Approximation Theory.
ISSN 1555-578X
All rights of reproduction in any form reserved.
1 Introduction
This is an expository paper on the problem of approximating functions from general Hilbert or
Banach spaces, which has been thoroughly studied in many books and papers. This problem has
many variants depending on how we measure the error of such approximations (algorithms). A
popular choice is to take the norm of an Lp space and all values of p ∈ [1, ∞] have been considered.
Furthermore, the error of algorithms can be defined in the worst case, average case or randomized
setting. For the worst and average case settings, we consider deterministic algorithms. The worst
case error is defined as the maximal error over the unit ball of a given space whereas the average
case error is defined as the average error over the whole space with respect to a given measure. The
usual choice is a zero mean Gaussian measure. For the randomized setting we consider randomized
algorithms and the error is defined as the maximal expected error over the unit ball of a given
space. Here, the expected error is given with respect to a probability distribution of randomized
elements.
We approximate functions f by algorithms that use information about f given by finitely many
functionals of f . Information is called linear if we can choose arbitrary linear functionals, and it
is called standard if only function values may be used. Clearly, linear information is at least as
powerful as standard information. For many applications, only standard information is available.
But even in this case, it is a good idea to study linear information and learn how difficult the
function approximation problem is. For example, if we can prove that even for linear information the
problem is too difficult then, obviously, the same also holds for standard information. On the other
hand, positive results for linear information need not hold for standard information.
The main question addressed in this expository paper is the study of the power of standard
information or equivalently the power of function values. We want to know how much we lose if
function values are used instead of linear information. Or more optimistically, we ask when the
power of standard information is the same or nearly the same as the power of linear information.
Such questions have been addressed in a number of papers and we will refer to them in the course
of this paper. This has usually been done for specific spaces, and only a few papers have addressed these
questions for some classes of spaces.
Our approach is a little more general and we want to verify the power of function values/standard
information for all Hilbert or Banach spaces for which the problem of function approximation is
well defined. More precisely, we define the power function (see Footnote 1)
\[ \ell^{\mathrm{sett}-x} : (0, \infty) \times [1, \infty] \to [0, 1]. \]
Here sett ∈ {wor, ran, avg} denotes the setting we use for the error definition. Hence, wor stands
for the worst case setting, ran for the randomized setting, and avg for the average case setting. The
second superscript x ∈ {H, B} tells us if we consider only Hilbert spaces (x = H) or if we allow all
Banach spaces (x = B).
Footnote 1. We needed to find a good one-letter name for the power function. Since in English and in Polish this would
indicate the letter “p” which is already used as the parameter of the $L_p$ space, we turn to German and use the word
“Leistung”. That is why the letter $\ell$ denotes the power function.

We now explain the meaning of the value $\ell^{\mathrm{sett}-x}(r, p)$.
The first argument $r$ means that the $n$th minimal error (formally defined in Definition 1) behaves
like $n^{-r}$ if we use linear information. Since $r > 0$, we consider Hilbert or Banach spaces which
admit convergence, and furthermore they admit a polynomial rate of convergence of the minimal
errors. The second argument $p$ denotes the use of the norm of $L_p$. The value $\ell^{\mathrm{sett}-x}(r, p)$ is defined
as $r^{-1}$ times the best rate of convergence we obtain using only function values for a worst possible
choice of a Hilbert or Banach space. That is why $\ell^{\mathrm{sett}-x}(r, p) \le 1$, and the larger $\ell^{\mathrm{sett}-x}(r, p)$ the
better. Hence, if we have
\[ \ell^{\mathrm{sett}-x}(r, p) = 1 \]
then the power of standard information is the same as the power of linear information. We will
see later that this does happen in some cases. Then standard information yields the same rate of
convergence as linear information for the embeddings $I : F \to L_p$ for all Hilbert (if $x = H$) or
Banach (if $x = B$) spaces, without the need for a case-by-case study for each $F$. This holds in the
randomized setting for Hilbert spaces with $p = 2$, see Theorem 6, and in the average case setting
for Banach spaces equipped with zero mean Gaussian measures and $p = 2$, see Theorem 8. It is
open if $\ell^{\mathrm{sett}-x}(r, p) = 1$ may happen in the worst case setting, see Open Problem 1.
On the other hand, if we have
\[ \ell^{\mathrm{sett}-x}(r, p) = 0 \]
then the power of standard information is zero as compared to the power of linear information.
Finally, if we have
\[ \ell^{\mathrm{sett}-x}(r, p) \in (0, 1) \]
then we know qualitatively how much we may lose by using function values.
The concept of the power function seems to be new. For many values of $p$, especially when
$p \ne 2$, this function has not yet been studied. This is especially the case for the randomized and
average case settings. That is why we indicate eight open problems related to the power function
with the hope that many mathematicians will be interested in solving them and advancing our
knowledge about the power of function values.
In this paper, we tried to summarize and supply a few new estimates on the power function.
We now briefly indicate a few results presented in the paper.
In the worst case setting for the Hilbert case and $p = 2$, we conclude from [6, 8] that
\[
\begin{aligned}
\ell^{\mathrm{wor}-H}(r, 2) &= 0 && \text{for all } r \in (0, \tfrac12], \\
\ell^{\mathrm{wor}-H}(r, 2) &\in \left[ \frac{2r}{2r+1},\, 1 \right] && \text{for all } r \in (\tfrac12, \infty).
\end{aligned}
\]
Hence, the power of function values is zero for $r \le 1/2$, and almost the same as the power of linear
information for large $r$. One of the main open problems is to verify whether $\ell^{\mathrm{wor}-H}(r, 2) = 1$ for all
$r > 1/2$.
Staying with the worst case and Hilbert spaces but with $p \ne 2$, we conclude from [18] that
\[ \ell^{\mathrm{wor}-H}(r, p) = 0 \quad \text{for all } r \in \left( 0, \min(\tfrac1p, \tfrac12) \right]. \]
For $r > \min(1/p, 1/2)$, we do not know anything about the values of $\ell^{\mathrm{wor}-H}(r, p)$ except the case
$p = \infty$ for which we know from [12] that
\[ \ell^{\mathrm{wor}-H/B}(r, \infty) \ge 1 - \frac{1}{r}. \]
By H/B we mean that we obtain this result for both Hilbert and Banach spaces. Again for large r,
the power of standard information is almost the same as the power of linear information.
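For the worst case and the Banach case, we have
\[
\begin{aligned}
\ell^{\mathrm{wor}-B}(r, p) &= 0 && \text{for all } r \in (0, 1] \text{ and } p \in [1, 2], \\
\ell^{\mathrm{wor}-B}(r, p) &= 0 && \text{for all } r \in (0, \tfrac12 + \tfrac1p] \text{ and } p \in (2, \infty), \\
\ell^{\mathrm{wor}-B}(r, p) &\le 1 - \frac{1}{r}\left( 1 - \frac{1}{p} \right) && \text{for all } r > 1 \text{ and } p \in [1, 2], \\
\ell^{\mathrm{wor}-B}(r, p) &\le 1 - \frac{1}{2r} && \text{for all } r > 1 \text{ and } p \in [2, \infty), \\
1 - \frac{1}{r} \le \ell^{\mathrm{wor}-B}(r, \infty) &\le 1 - \frac{1}{2r} && \text{for all } r > 1,
\end{aligned}
\]
see Theorem 5 in Section 2.3.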
Even though we do not know much about the power function in this case, we can conclude that
the Hilbert and Banach cases are different since
\[ \ell^{\mathrm{wor}-B}(r, 2) < \ell^{\mathrm{wor}-H}(r, 2) \quad \text{for all } r \in (\tfrac12, \infty). \]
Surprisingly enough, for the randomized setting with the Hilbert case and for the average case
setting with the Hilbert or Banach case we have complete knowledge about the power function for
p = 2 due to [28] and [5]. More precisely, we know that
\[ \ell^{\mathrm{ran}-H}(r, 2) = \ell^{\mathrm{avg}-H/B}(r, 2) = 1 \quad \text{for all } r > 0. \]
More estimates of the power function can be found in the subsequent sections.
2 Worst case setting
Let $F$ be a Hilbert or Banach space of functions, defined on a set $\Omega$, such that the linear functionals
$f \mapsto f(x)$ are continuous for all $x \in \Omega$. We assume that $F \subset L_p$ and that the embedding $I : F \to L_p$
is continuous (see Footnote 2), where $I(f) = f$. We write $H$ instead of $F$ if $F$ is a Hilbert space.
Let $(c_n)$ be a sequence of nonnegative numbers. Assume first that $(c_n)$ converges to zero. We
define its (polynomial) rate of convergence $r(c_n)$ by
\[ r(c_n) = \sup\{\, \beta \ge 0 \mid \lim_{n \to \infty} c_n n^{\beta} = 0 \,\}. \]
If $(c_n)$ is not convergent to zero, we set $r(c_n) = 0$. Then $r(c_n)$ is well defined for all nonnegative
sequences $(c_n)$. For example, the rate of convergence of $n^{-\alpha}$ is $\max(0, \alpha)$.
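As a rough numerical illustration of this definition (a sketch only; the helper name `empirical_rate` and the sample index n = 10^6 are our own assumptions, not part of the paper), one may use the finite-n proxy -ln(c_n)/ln(n), clipped at zero. For c_n = n^{-α} this reproduces max(0, α). Logarithmic factors perturb the proxy by a term that vanishes only slowly as n grows, so the proxy is not the rate itself.

```python
import math

def empirical_rate(c, n=10**6):
    """Finite-n proxy for r(c_n) = sup{beta >= 0 : lim c_n n^beta = 0}.

    Assumption: c(n) > 0 and behaves like a power of n up to slowly
    varying factors, so -ln c(n) / ln n is near the true rate for large n.
    """
    value = c(n)
    if value <= 0:
        raise ValueError("c(n) must be positive")
    return max(0.0, -math.log(value) / math.log(n))

# c_n = n^{-3/2}: the proxy equals the rate 3/2 up to rounding.
rate_decay = empirical_rate(lambda n: n ** -1.5)
# c_n = n^{1/2} does not converge to zero: the rate is 0 by convention.
rate_growth = empirical_rate(lambda n: n ** 0.5)
# c_n = n^{-3/2} [ln(n+1)]^2 has rate 3/2, but the proxy at a finite n
# is visibly smaller because of the logarithmic factor.
rate_log = empirical_rate(lambda n: n ** -1.5 * math.log(n + 1) ** 2)
```

The third call shows why the rate is an asymptotic notion: at any fixed n the logarithmic factor still shifts the proxy.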
We approximate functions from $F$ using finitely many arbitrary linear functionals $L \in F^*$ or
function values $f(x)$ for some $x \in \Omega$. We define the error of such approximations by taking the
worst case setting with respect to the $L_p$ norm. The norm of $L_p$ is denoted by $\|\cdot\|_p$.
We define two classes $\Lambda^{\mathrm{all}}$ and $\Lambda^{\mathrm{std}}$ of information evaluations. We have $\Lambda^{\mathrm{std}} \subseteq \Lambda^{\mathrm{all}} = F^*$
and $\Lambda^{\mathrm{std}}$ consists of linear functionals of the form $L_x(f) = f(x)$ for all $f \in F$, where $x \in \Omega$. We
approximate functions from $F$ by algorithms $A_n : F \to L_p$ given by
\[ A_n(f) = \varphi_n(L_1(f), L_2(f), \dots, L_n(f)), \]
where $n$ is a nonnegative integer, $\varphi_n : \mathbb{R}^n \to L_p$ is an arbitrary mapping, and $L_j \in \Lambda$, where
$\Lambda \in \{\Lambda^{\mathrm{all}}, \Lambda^{\mathrm{std}}\}$. The choice of $L_j$ can be adaptive, that is, $L_j(\cdot) = L_j(\cdot\,; L_1(f), L_2(f), \dots, L_{j-1}(f))$
may depend on the already computed values $L_1(f), L_2(f), \dots, L_{j-1}(f)$. For $n = 0$, $A_n(f)$ equals
some fixed element of the space $L_p$. More details can be found in, e.g., [14, 21].

Footnote 2. We do not specify $\Omega$ or the underlying measure of $L_p$ since they can be arbitrary.
Hence, we consider algorithms that use n linear functionals either from the class Λstd or from
the class Λall . We define the minimal errors as follows.
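Definition 1. For $n = 0$ and $n \in \mathbb{N} := \{1, 2, \dots\}$, let
\[ e_n^{\mathrm{all-wor}}(F, L_p) := \inf_{A_n \text{ with } L_j \in \Lambda^{\mathrm{all}}} \ \sup_{\|f\|_F \le 1} \|f - A_n(f)\|_p \]
and
\[ e_n^{\mathrm{std-wor}}(F, L_p) := \inf_{A_n \text{ with } L_j \in \Lambda^{\mathrm{std}}} \ \sup_{\|f\|_F \le 1} \|f - A_n(f)\|_p. \]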
For $n = 0$, it is easy to see that the best algorithm is $A_0(f) = 0$ and we obtain
\[ e_0^{\mathrm{all-wor}}(F, L_p) = e_0^{\mathrm{std-wor}}(F, L_p) = \sup_{\|f\|_F \le 1} \|f\|_p = \sup_{\|f\|_F \le 1} \|I(f)\|_p = \|I\|. \]
This is the initial error that can be achieved without computing any linear functional on the
functions $f$. Clearly,
\[ e_n^{\mathrm{all-wor}}(F, L_p) \le e_n^{\mathrm{std-wor}}(F, L_p) \quad \text{for all } n \in \mathbb{N}. \]
The sequences $\bigl(e_n^{\mathrm{all-wor}}(F, L_p)\bigr)$ and $\bigl(e_n^{\mathrm{std-wor}}(F, L_p)\bigr)$ are both non-increasing but not necessarily
convergent to zero.
We want to compare the rates of convergence
\[ r^{\mathrm{all-wor}}(F, L_p) := r\bigl(e_n^{\mathrm{all-wor}}(F, L_p)\bigr) \quad \text{and} \quad r^{\mathrm{std-wor}}(F, L_p) := r\bigl(e_n^{\mathrm{std-wor}}(F, L_p)\bigr). \]
In particular, we would like to know if it is possible that the sequence $\bigl(e_n^{\mathrm{all-wor}}(F, L_p)\bigr)$ converges
to zero much faster than the sequence $\bigl(e_n^{\mathrm{std-wor}}(F, L_p)\bigr)$. In many cases it is much easier to analyze
the sequence $\bigl(e_n^{\mathrm{all-wor}}(F, L_p)\bigr)_{n \in \mathbb{N}}$. It is then natural to ask what can be said about the sequence
$\bigl(e_n^{\mathrm{std-wor}}(F, L_p)\bigr)_{n \in \mathbb{N}}$.
The main question addressed in this paper is to find or estimate the power function defined as
$\ell^{\mathrm{wor}-x} : (0, \infty) \times [1, \infty] \to [0, 1]$ by
\[ \ell^{\mathrm{wor}-x}(r, p) := \inf_{F \,:\, r^{\mathrm{all-wor}}(F, L_p) = r} \frac{r^{\mathrm{std-wor}}(F, L_p)}{r}, \]
where $x \in \{H, B\}$ indicates that the infimum is taken over all Hilbert spaces ($x = H$) or over
all Banach spaces ($x = B$) continuously embedded in $L_p$ for which function values are continuous
linear functionals and the rate of convergence is $r$ when we use arbitrary linear functionals.
It is easy to show, and it will be shown later, that the set of spaces $F$ for which $r^{\mathrm{all-wor}}(F, L_p) = r$
is not empty and therefore $\ell^{\mathrm{wor}-x}$ is well defined. Obviously, $\ell^{\mathrm{wor}-x}(r, p) \in [0, 1]$, as already
claimed. The power function $\ell^{\mathrm{wor}-x}$ measures the ratio between the best rates of convergence of
approximations based on function values over those based on arbitrary linear functionals for a worst
possible Hilbert or Banach space.
We briefly comment on why we take the infimum over $F$ in the definition of the power function.
For some specific spaces $F$, standard information is as powerful as linear information (see Footnote 3). But this is
a property of $F$, not an indication of the power of standard information. By taking the infimum
with respect to $F$, we concentrate on the power of standard information as compared to the power
of linear information.
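Suppose now that we take the minimal $n = n^{\mathrm{wor-all/std}}(\varepsilon, F, L_p)$ for which the minimal worst
case error is at most $\varepsilon$ or $\varepsilon \|I\|$. Assume for simplicity that
\[ e_n^{\mathrm{all-wor}}(F, L_p) = n^{-r} \quad \text{and} \quad e_n^{\mathrm{std-wor}}(F, L_p) = n^{-\alpha} \]
for some positive $\alpha = r^{\mathrm{std-wor}}(F, L_p) \le r$. Then
\[ n^{\mathrm{wor-all}}(\varepsilon, F, L_p) = \left\lceil \varepsilon^{-1/r} \right\rceil \quad \text{and} \quad n^{\mathrm{wor-std}}(\varepsilon, F, L_p) = \left\lceil \varepsilon^{-1/\alpha} \right\rceil. \]
Clearly,
\[ \lim_{\varepsilon \to 0} \frac{\ln n^{\mathrm{wor-all}}(\varepsilon, F, L_p)}{\ln n^{\mathrm{wor-std}}(\varepsilon, F, L_p)} = \frac{\alpha}{r} \ge \ell^{\mathrm{wor}-x}(r, p). \]
Hence, if $\ell^{\mathrm{wor}-x}(r, p) = 1$ then function values are as powerful as arbitrary linear functionals.
On the other hand, the smaller $\ell^{\mathrm{wor}-x}(r, p)$ the less powerful are function values as compared to
arbitrary linear functionals. If $\ell^{\mathrm{wor}-x}(r, p) = 0$ then the polynomial behavior of $n^{\mathrm{all}}(\varepsilon, F, L_p)$ in
$\varepsilon^{-1}$ can be drastically changed for $n^{\mathrm{std}}(\varepsilon, F, L_p)$.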
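The limit of the logarithmic ratio can be checked numerically under the simplifying assumption made above, namely that the minimal errors are exactly n^{-r} and n^{-α}. The following is a minimal sketch; the function names `info_complexity` and `log_ratio` and the sample values r = 2, α = 1 are illustrative assumptions, not taken from the paper.

```python
import math

def info_complexity(eps, rate):
    """ceil(eps^(-1/rate)): the smallest n with n^(-rate) <= eps,
    up to floating-point rounding in the power."""
    return math.ceil(eps ** (-1.0 / rate))

def log_ratio(eps, r, alpha):
    """ln n_all(eps) / ln n_std(eps) for errors n^-r and n^-alpha."""
    n_all = info_complexity(eps, r)
    n_std = info_complexity(eps, alpha)
    return math.log(n_all) / math.log(n_std)

# Hypothetical rates: r = 2 for linear information, alpha = 1 for
# function values. The ratio should be close to alpha / r = 0.5.
ratio = log_ratio(1e-12, r=2.0, alpha=1.0)
```

For small ε the computed ratio is close to α/r, which is the quantity that the power function bounds from below.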
Remark 1. It is well known that, in some cases, we can restrict ourselves only to linear algorithms.
This holds when $p = \infty$ or when $F$ is a Hilbert space. Then the corresponding infima for the minimal
worst case errors are attained by
\[ A_n(f) = \sum_{j=1}^{n} L_j(f)\, h_j \]
for some $L_j \in \Lambda \in \{\Lambda^{\mathrm{std}}, \Lambda^{\mathrm{all}}\}$ and $h_j \in L_p$. Much more about the existence of linear optimal error
algorithms can be found in, e.g., [14].
2.1 Double Hilbert Case
In this subsection, we consider the approximation problem defined over a Hilbert space with the
error measured also in the Hilbert space L2 . That is why the name of this subsection is the double
Hilbert case. Approximation in the L2 norm for Hilbert spaces has been studied in many papers.
For our problem the most relevant papers are [6], [8] and [27].
Footnote 3. This holds with $r = \infty$ for all finite dimensional spaces $F$. This also holds for some infinite dimensional Hilbert
spaces $F$. For example, take $F$ as the space of piecewise constant functions over, say, $I_j := [1/(j + 1), 1/j)$ for
$j = 1, 2, \dots$. The inner product of $F$ is chosen such that the functions $e_j$ equal to 1 over $I_j$ are orthonormal. Then
the algorithm $A_n(f) = \sum_{j=1}^{n} \langle f, e_j \rangle_F\, e_j$ minimizes the worst case error for all $L_p$ with $p \in [1, \infty)$. The error is
$[n(n + 1)]^{-1/p}$. Since $\langle f, e_j \rangle_F = f(1/(j + 1))$, we may say that this algorithm uses standard information. Therefore
$r^{\mathrm{all-wor}}(F, L_p) = r^{\mathrm{std-wor}}(F, L_p) = 2/p$.

Assume that $H$ is a Hilbert space of functions defined on a set $\Omega$. Since we assume that function
values are continuous, this means that $H$ is a reproducing kernel Hilbert space, $H = H(K)$, where
$K$ is defined on $\Omega \times \Omega$. Let $L_2 = L_2(\Omega, \mu)$ be the space of $\mu$-square integrable functions with a
measure $\mu$ on $\Omega$. Since the embedding $I : H(K) \to L_2(\Omega, \mu)$ is continuous, we have
\[ \int_{\Omega} |f(t)|^2 \, d\mu(t) < \infty \quad \text{for all } f \in H(K). \]
In particular, we can take $f = K(\cdot, t)$ for arbitrary $t \in \Omega$, since such a function $f$ belongs to $H(K)$.
Therefore $W = I^* I : H(K) \to H(K)$, where $I^*$ is defined by $\langle g, I(f) \rangle_{L_2(\Omega, \mu)} = \langle I^*(g), f \rangle_{H(K)}$ for
all $f \in H(K)$ and $g \in L_2(\Omega, \mu)$, is given by
\[ \bigl(W(f)\bigr)(x) = \int_{\Omega} K(x, t)\, f(t) \, d\mu(t) \quad \text{for all } f \in H(K). \]
The operator $W$ is self-adjoint and positive semi-definite. It is well known that
\[ \lim_{n} e_n^{\mathrm{wor-all}}(H, L_2) = 0 \]
if and only if $W$ is compact, see, e.g., [14, Section 4.2.3]. Unfortunately, in general, $W$ need
not be compact and therefore $e_n^{\mathrm{wor-all}}(H, L_2)$ does not have to go to zero. In fact, the sequence
$\bigl(e_n^{\mathrm{wor-all}}(H, L_2)\bigr)$ can be an arbitrary non-increasing sequence as the following example shows.
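Example 1 (Arbitrary Sequence $e_n^{\mathrm{wor-all}}(H, L_2)$).
Let $(\alpha_n)_{n \in \mathbb{N}}$ be an arbitrary non-increasing sequence of nonnegative numbers. Define $k^*$ as
the number of positive $\alpha_n$. If all $\alpha_n$ are positive, we formally set $k^* = \infty$. If $k^*$ is finite let
$\mathbb{N}_{k^*} = \{1, 2, \dots, k^*\}$, otherwise let $\mathbb{N}_{k^*} = \mathbb{N}$.
For $k \in \mathbb{N}_{k^*}$, take arbitrary disjoint intervals $I_k$ of positive Lebesgue measure $|I_k|$ such that
$\bigcup_{k \in \mathbb{N}_{k^*}} I_k = [0, 1]$, and define the functions $e_k : [0, 1] \to \mathbb{R}$ by
\[ e_k = \frac{\alpha_k}{\sqrt{|I_k|}} \, 1_{I_k}, \]
where $1_{I_k}$ is the indicator function of $I_k$. That is, $e_k(x) = \alpha_k / \sqrt{|I_k|}$ for $x \in I_k$ and $e_k(x) = 0$ for
$x \notin I_k$. With this normalization, $\|e_k\|_2 = \alpha_k$, as used in the rest of the example.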
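Define the Hilbert space $H = \mathrm{span}\{e_k \mid k \in \mathbb{N}_{k^*}\}$ equipped with the inner product such that
$\langle e_k, e_j \rangle_H = \delta_{k,j}$ for all $k, j \in \mathbb{N}_{k^*}$. This means that $H$ is the space of piecewise constant functions
$f : [0, 1] \to \mathbb{R}$ such that
\[ f = \sum_{k=1}^{k^*} a_k e_k \quad \text{with} \quad a_k = \langle f, e_k \rangle_H \quad \text{and} \quad \|f\|_H = \left( \sum_{k=1}^{k^*} a_k^2 \right)^{1/2} < \infty. \]
The Hilbert space $H$ has the reproducing kernel
\[ K(x, y) = \sum_{k=1}^{k^*} e_k(x)\, e_k(y) \quad \text{for all } x, y \in [0, 1]. \]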
Indeed, first of all note that $K$ is well defined since for all $x$ and $y$ the last series has at most one
nonzero term. Then $\langle K(\cdot, y_i), K(\cdot, y_j) \rangle_H = K(y_i, y_j)$, and
\[ 0 \le \left\| \sum_{j=1}^{m} a_j K(\cdot, y_j) \right\|_H^2 = \sum_{i,j=1}^{m} a_i a_j K(y_i, y_j). \]
This shows that the matrix $(K(y_i, y_j))_{i,j=1,2,\dots,m}$ is symmetric and positive semi-definite for all $m$
and $y_j$. Clearly,
\[ \langle f, K(\cdot, y) \rangle_H = \sum_{k=1}^{k^*} a_k e_k(y) = f(y), \]
and this completes the proof of the fact that $K$ is the reproducing kernel of $H$.
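Let $L_2 = L_2([0, 1])$ be the usual space of square Lebesgue integrable functions. Note that
\[ \|e_k\|_2 = \frac{\alpha_k}{\sqrt{|I_k|}} \left( \int_{I_k} dt \right)^{1/2} = \alpha_k. \]
Therefore, for any $f \in H$, we have
\[ \|I(f)\|_2 = \|f\|_2 = \left( \sum_{k=1}^{k^*} a_k^2 \alpha_k^2 \right)^{1/2} \le \alpha_1 \|f\|_H. \]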
The last bound is sharp, and therefore $\|I\| = \alpha_1$ showing that $H$ is continuously embedded in $L_2$.
The operator $W$ takes now the form
\[ W(f) = \sum_{k=1}^{k^*} \langle f, e_k \rangle_2 \, e_k. \]
Note that $W(e_k) = \|e_k\|_2^2 \, e_k = \alpha_k^2 e_k$. This means that $(\alpha_k^2, e_k)$ are the eigenpairs of $W$ and
\[ W(f) = \sum_{k=1}^{k^*} \alpha_k^2 \, \langle f, e_k \rangle_H \, e_k. \]
It is well known that
\[ e_n^{\mathrm{wor-all}}(H, L_2) = \alpha_{n+1} \quad \text{for all } n \in \mathbb{N}, \]
see, e.g., [14, Section 4.2.3]. This proves that the behavior of $e_n^{\mathrm{wor-all}}(H, L_2)$ can be arbitrary and,
in general, we do not have convergence of $e_n^{\mathrm{wor-all}}(H, L_2)$ to zero. Clearly, $W$ is compact if and
only if $\lim_n \alpha_n = 0$.
In addition, this example also shows that for a given $\beta \ge 0$ we can define a sequence $\alpha_k$ such
that $r^{\mathrm{all-wor}}(H, L_2) = \beta$. Indeed, it is enough to take $\alpha_k = k^{-\beta}$.
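A finite version of Example 1 can be checked numerically. The sketch below assumes a finite k*, takes e_k = α_k/√|I_k| on I_k (so that ‖e_k‖_2 = α_k, as computed in the example), and uses the fact that W is diagonal in the basis (e_k) with entries ‖e_k‖_2². The function name `minimal_errors`, the value β = 0.75, and the interval lengths are illustrative assumptions of ours.

```python
import math

def minimal_errors(alphas, lengths):
    """Square roots of the eigenvalues of W, largest first.

    Assumes e_k = alpha_k / sqrt(|I_k|) on I_k and zero elsewhere, with
    the e_k orthonormal in H; W is then diagonal with entries ||e_k||_2^2.
    """
    if not math.isclose(sum(lengths), 1.0):
        raise ValueError("interval lengths must sum to 1")
    eigenvalues = []
    for alpha, length in zip(alphas, lengths):
        height = alpha / math.sqrt(length)            # value of e_k on I_k
        eigenvalues.append(height * height * length)  # ||e_k||_2^2
    eigenvalues.sort(reverse=True)
    return [math.sqrt(lam) for lam in eigenvalues]

beta = 0.75                                   # hypothetical target rate
k_star = 5
alphas = [k ** -beta for k in range(1, k_star + 1)]
lengths = [0.4, 0.3, 0.15, 0.1, 0.05]         # arbitrary, summing to 1
errors = minimal_errors(alphas, lengths)      # errors[n] is alpha_{n+1}
```

The interval lengths drop out of the result, which matches the observation that only the sequence (α_k) determines the minimal errors.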
We discuss the power function $\ell^{\mathrm{wor}-H}$. We now assume that $r^{\mathrm{all-wor}}(H, L_2) = r > 0$. In
particular, we assume that the operator $W$ is compact. Then $W$ has eigenpairs $(\lambda_j, \eta_j)$,
\[ W(\eta_j) = \lambda_j \eta_j \quad \text{for all } j = 1, 2, \dots, \]
with $\langle \eta_j, \eta_k \rangle_H = \delta_{j,k}$. Without loss of generality, we can order the eigenvalues $\lambda_j$ such that $\lambda_1 \ge
\lambda_2 \ge \cdots$. For all $f \in H$, we have
\[ \langle f, \eta_k \rangle_2 = \langle I(f), I(\eta_k) \rangle_2 = \langle f, W(\eta_k) \rangle_H = \lambda_k \langle f, \eta_k \rangle_H. \]
In particular, letting $f = \eta_j$, we conclude that the functions $\eta_j$ are also orthogonal in the space $L_2$.
As above, it is well known that
\[ e_n^{\mathrm{wor-all}}(H, L_2) = \sqrt{\lambda_{n+1}} \quad \text{for all } n \in \mathbb{N}. \]
If $\bigl(e_n^{\mathrm{wor-all}}(H, L_2)\bigr)$ is convergent to zero then the same also holds for function values, i.e.,
$\bigl(e_n^{\mathrm{wor-std}}(H, L_2)\bigr)$ is also convergent to zero. Indeed, we can reason as in Section 10.4 of [14] that
all linear functionals can be approximated with an arbitrarily small error when we use function
values, and then it is enough to remember that the error $\sqrt{\lambda_{n+1}}$ is achieved by a linear algorithm
that uses the $n$ linear functionals $\langle f, \eta_j \rangle_{H(K)}$.
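We have
\[ \mathrm{trace}(W) := \sum_{j=1}^{\infty} \lambda_j = \int_{\Omega} K(x, x) \, d\mu(x) = \sum_{n=0}^{\infty} \left[ e_n^{\mathrm{wor-all}}(H, L_2) \right]^2 \]
and this is finite if $r^{\mathrm{all-wor}}(H, L_2) > \tfrac12$. If $r^{\mathrm{all-wor}}(H, L_2) = \tfrac12$ then $\sum_{n=0}^{\infty} \left[ e_n^{\mathrm{wor-all}}(H, L_2) \right]^2$ may be
finite or infinite, and if $r^{\mathrm{all-wor}}(H, L_2) < \tfrac12$ then $\sum_{n=0}^{\infty} \left[ e_n^{\mathrm{wor-all}}(H, L_2) \right]^2$ is infinite.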
The result from [8] states that $r^{\mathrm{all-wor}}(H, L_2) = r > \tfrac12$ implies
\[ r^{\mathrm{std-wor}}(H, L_2) \ge r - \frac{r}{2r + 1} = \frac{2r^2}{2r + 1}. \]
The case $\sum_{n=0}^{\infty} \left[ e_n^{\mathrm{wor-all}}(H, L_2) \right]^2 = \infty$ was studied in [6]. It was shown that for any $r \in [0, \tfrac12]$
there is a Hilbert space $H$ such that
\[ r^{\mathrm{all-wor}}(H, L_2) = r \quad \text{and} \quad r^{\mathrm{std-wor}}(H, L_2) = 0. \]
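These results give us the following bounds on the power function $\ell^{\mathrm{wor}-H}(\cdot, 2)$.
Theorem 1 ([6, 8]).
\[
\begin{aligned}
\ell^{\mathrm{wor}-H}(r, 2) &= 0 && \text{for all } r \in (0, \tfrac12], \\
\ell^{\mathrm{wor}-H}(r, 2) &\in \left[ \frac{2r}{2r+1},\, 1 \right] && \text{for all } r \in (\tfrac12, \infty).
\end{aligned}
\]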
Although we do not know the power function $\ell^{\mathrm{wor}-H}(\cdot, 2)$ exactly, we know that there is a jump
at $r = \tfrac12$ since $\ell^{\mathrm{wor}-H}(r, 2) \ge 1/2$ for all $r > 1/2$. Note also that for large $r$, the values of $\ell^{\mathrm{wor}-H}(r, 2)$
are close to 1. This means that the power of function values for $r \in (0, \tfrac12]$ is zero, and is almost
optimal for large $r$.
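To make the jump at r = 1/2 concrete, the bounds of Theorem 1 can be tabulated. This is only a sketch of the known bounds, not of the unknown exact values for r > 1/2; the function name `power_bounds_hilbert_l2` and the sample values of r are our own choices.

```python
def power_bounds_hilbert_l2(r):
    """Known (lower, upper) bounds on the power function for p = 2 in the
    Hilbert case, as stated in Theorem 1: exactly 0 on (0, 1/2], and
    inside [2r/(2r+1), 1] on (1/2, infinity)."""
    if r <= 0:
        raise ValueError("r must be positive")
    if r <= 0.5:
        return (0.0, 0.0)
    return (2.0 * r / (2.0 * r + 1.0), 1.0)

# Sample values on both sides of the jump at r = 1/2.
bounds = {r: power_bounds_hilbert_l2(r) for r in (0.25, 0.5, 0.51, 1.0, 10.0)}
```

Just above r = 1/2 the lower bound already exceeds 1/2, and it tends to 1 as r grows.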
The problem of finding the exact values of $\ell^{\mathrm{wor}-H}(r, 2)$ for $r > \tfrac12$ is one of the main open
problems in the worst case setting. We know that many people, including the two of us, spent a lot
of time trying to solve this problem but so far in vain. That is why we propose an open problem
with the hope that it will soon be solved by the reader.
Open Problem 1. Suppose that $r > \tfrac12$. Is it true that
\[ \ell^{\mathrm{wor}-H}(r, 2) = 1 \,? \]
If not, what are the values of $\ell^{\mathrm{wor}-H}(r, 2)$?
The rate of convergence does not distinguish between sequences that differ by a power of
the logarithm of $n$. Indeed, for $c_n = n^{-r}$ and $b_n = n^{-r} [\ln(n + 1)]^{\beta}$ for a positive $r$ and an arbitrary $\beta$,
we have $r(c_n) = r(b_n) = r$ independent of $\beta$. Obviously, for some standard spaces, we would like to
know not only the rate but also the power of the logarithm. We discuss this point in the next example,
where we use the notation
\[ c_n \asymp b_n \]
which means that there exist positive numbers $a_1$ and $a_2$ such that $a_1 \le c_n / b_n \le a_2$ for large $n$.
Example 2 (Sobolev spaces, $p = 2$).
a) For the standard Sobolev spaces $W_2^s([0, 1]^d)$ with an arbitrary $s > 0$, which measures the
total smoothness of functions, it is well known that
\[ e_n^{\mathrm{all-wor}}(W_2^s([0, 1]^d), L_2) \asymp n^{-s/d}. \]
Of course, in general, function values are not well defined in $W_2^s([0, 1]^d)$. We must assume the
embedding condition $2s > d$ and then function values are well defined and they are continuous
linear functionals. Furthermore, it is known that
\[ e_n^{\mathrm{all-wor}}(W_2^s([0, 1]^d), L_2) \asymp e_n^{\mathrm{std-wor}}(W_2^s([0, 1]^d), L_2) \asymp n^{-s/d}, \]
see, e.g., [14] for a survey of such results.
b) For the Sobolev spaces $W_2^{r,\mathrm{mix}}([0, 1]^d)$ with $r > 0$, which measures the smoothness of functions
with respect to each variable, it is known that
\[ e_n^{\mathrm{all-wor}}(W_2^{r,\mathrm{mix}}([0, 1]^d), L_2) \asymp n^{-r} (\log n)^{(d-1)r}, \]
see, e.g., [1, 11, 17, 19, 21, 26], where this result can be found in various generalities.
For function values, we must assume that $r > 1/2$, and then the best upper bound is
\[ e_n^{\mathrm{std-wor}}(W_2^{r,\mathrm{mix}}([0, 1]^d), L_2) = O\left( n^{-r} (\log n)^{(d-1)(r+1/2)} \right), \]
see [17, 19, 22].
It is not known whether this extra power $(d - 1)/2$ of logarithms is needed. It would be very
interesting to verify whether
\[ e_n^{\mathrm{all-wor}}(W_2^{r,\mathrm{mix}}([0, 1]^d), L_2) \asymp e_n^{\mathrm{std-wor}}(W_2^{r,\mathrm{mix}}([0, 1]^d), L_2) \]
holds also for this example.
The examples in [6] use very irregular sequences $\bigl(e_n^{\mathrm{all-wor}}(H, L_2)\bigr)$ and hence do not exclude a
positive answer to the question in the next open problem.
Open Problem 2. Assume that $e_n^{\mathrm{all-wor}}(H, L_2) \asymp n^{-r} [\ln(n + 1)]^{\beta}$ with arbitrary $r > 0$ and $\beta \in \mathbb{R}$.
Is it true that this implies
\[ e_n^{\mathrm{std-wor}}(H, L_2) \asymp e_n^{\mathrm{all-wor}}(H, L_2) \,? \]
2.2 Single Hilbert Case
In this short subsection, we mostly consider the approximation problem defined over a Hilbert
space with the error measured in the non-Hilbert space $L_p$ for $p \ne 2$. That is why the name of this
subsection is the single Hilbert case.
We report on a recent result of Tandetzky [18] who considered the approximation problem for
arbitrary $p \in [1, \infty)$. He proved that for any $r \in (0, \min(\tfrac1p, \tfrac12)]$ there exists a Hilbert space $H$
continuously embedded in $L_p = L_p([0, 1])$ such that
\[ r^{\mathrm{all-wor}}(H, L_p) = r \quad \text{and} \quad r^{\mathrm{std-wor}}(H, L_p) = 0. \]
This result obviously implies that the power function is zero over $(0, \min(\tfrac1p, \tfrac12)]$. It seems to us that
no example is known in the literature for a Hilbert space for which $e_n^{\mathrm{all-wor}}(H, L_p)$ tends to zero faster
than the sequence $e_n^{\mathrm{std-wor}}(H, L_p)$ with the additional assumption that $r^{\mathrm{all-wor}}(H, L_p) > \min(\tfrac1p, \tfrac12)$.
This implies that we do not know the behavior of the power function over $(\min(\tfrac1p, \tfrac12), \infty)$. We
summarize our partial knowledge of the power function in the following theorem.
Theorem 2 ([18]). Let $p \ne 2$. Then
\[ \ell^{\mathrm{wor}-H}(r, p) = 0 \quad \text{for all } r \in (0, \min(\tfrac1p, \tfrac12)]. \]
Only for the case $p = \infty$ do we know a little more about the behavior of the power function.
In this case the rates are related as explained in the following theorem.
Theorem 3 ([12]). Let $F$ be a Hilbert or a Banach space. Then
\[ e_n^{\mathrm{std-wor}}(F, L_\infty) \le (1 + n)\, e_n^{\mathrm{all-wor}}(F, L_\infty) \quad \text{for all } n \in \mathbb{N}. \qquad (1) \]
This inequality follows from Proposition 1.2.5, page 16, in [12], where it is stated for the
Kolmogorov widths and also applies to the linear or Gelfand widths.
The inequality (1) cannot be improved even if we assume that $F$ is a Hilbert space. This follows
from the following example.
Example 3. Take $F = H = \mathbb{R}^{n+1}$. That is, $f \in H$ is now defined on $\{1, 2, \dots, n + 1\}$ and can
be identified with $f = [f_1, f_2, \dots, f_{n+1}]$, where $f_i = f(i)$. The space $H$ is equipped with the inner
product
\[ \langle f, g \rangle_H = \left[ \sum_{i=1}^{n+1} f_i \right] \left[ \sum_{i=1}^{n+1} g_i \right] + \varepsilon \sum_{i=1}^{n+1} f_i g_i \quad \text{for all } f, g \in H. \]
The unit ball of $H$ is thus
\[ B = \left\{ f \in \mathbb{R}^{n+1} \ \middle|\ \left[ \sum_{i=1}^{n+1} f_i \right]^2 + \varepsilon \sum_{i=1}^{n+1} f_i^2 \le 1 \right\}. \]
Then for $\varepsilon \to 0$, we obtain
\[ e_n^{\mathrm{std-wor}}(F, L_\infty) \ge 1. \]
Indeed, knowing $f(x_i)$ for $i = 1, 2, \dots, n$, with $x_i \in \{1, 2, \dots, n + 1\}$, we take $f$ such that $f(x_i) = 0$.
Since we have at most $n$ conditions on $n + 1$ components of $f$, at least one component of $f$
from the unit ball is free and can be taken as $\pm 1/\sqrt{1 + \varepsilon}$. This proves that the worst case error of
any algorithm is at least $1/\sqrt{1 + \varepsilon}$, which in the limit as $\varepsilon$ goes to zero is 1.
Consider the information
\[ N(f) = [f_1 - f_2, f_2 - f_3, \dots, f_n - f_{n+1}] \quad \text{for all } f \in H. \]
It is known that the minimal error of all algorithms that use $N$ is the supremum of $\|f\|_\infty$ for $f \in B$
and $N(f) = 0$. Observe that $N(f) = 0$ implies that $f = [c, c, \dots, c]$. Next, $f \in B$ implies that
$(n+1)^2 c^2 + \varepsilon (n+1) c^2 \le 1$, that is,
\[ c^2 \le \frac{1}{(n + 1)^2 \left( 1 + \varepsilon/(n + 1) \right)}. \]
Hence, again for $\varepsilon \to 0$, we obtain $e_n^{\mathrm{all-wor}}(F, L_\infty) \le 1/(n + 1)$.
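The two bounds of Example 3 can be verified numerically for a concrete n and a small ε. The sketch below is illustrative only: the helper names `h_norm` and `sup_norm_on_unit_sphere` and the values n = 4, ε = 10^{-6} are assumptions of ours. It scales a coordinate vector and the constant vector to H-norm one and compares their maximum norms.

```python
import math

def h_norm(f, eps):
    """Norm induced by <f, g>_H = (sum f_i)(sum g_i) + eps * sum f_i g_i."""
    total = sum(f)
    return math.sqrt(total * total + eps * sum(x * x for x in f))

def sup_norm_on_unit_sphere(direction, eps):
    """Maximum norm of direction after scaling it to H-norm one."""
    scale = 1.0 / h_norm(direction, eps)
    return max(abs(x) for x in direction) * scale

n, eps = 4, 1e-6
spike = [0.0] * n + [1.0]     # vanishes at the n sampled points 1..n
constant = [1.0] * (n + 1)    # kernel of N(f) = [f_1 - f_2, ...]

std_lower = sup_norm_on_unit_sphere(spike, eps)     # 1 / sqrt(1 + eps)
all_upper = sup_norm_on_unit_sphere(constant, eps)  # at most 1 / (n + 1)
ratio = std_lower / all_upper                       # close to n + 1
```

As ε tends to zero the ratio approaches n + 1, which is the factor appearing in inequality (1).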
Let $r^{\mathrm{all-wor}}(F, L_\infty) = r > 1$. Then the inequality (1) implies that
\[ r^{\mathrm{std-wor}}(F, L_\infty) \ge r - 1. \]
Thus, Theorem 3 implies the following behavior of the power function for $p = \infty$.
Theorem 4.
\[ \ell^{\mathrm{wor}-H/B}(r, \infty) \in \left[ \frac{r - 1}{r},\, 1 \right] \quad \text{for all } r > 1. \]
Hence, for both $p = 2$ and $p = \infty$, we see that for large $r$, the power function is almost one.
We want to guess the behavior of the power function for $r > \min(\tfrac1p, \tfrac12)$. It can be helpful to see
the actual rates of convergence for some standard spaces. In particular, for $p = \infty$, the rates are
known for Sobolev spaces.
Example 4 (Sobolev spaces, $p = \infty$).
a) For the Sobolev spaces $W_2^s([0, 1]^d)$ and an arbitrary $s$ for which $2s > d$, it is well known that
\[ e_n^{\mathrm{all-wor}}(W_2^s([0, 1]^d), L_\infty) \asymp e_n^{\mathrm{std-wor}}(W_2^s([0, 1]^d), L_\infty) \asymp n^{-s/d+1/2}, \]
see, e.g., [14].
b) For the Sobolev spaces $W_2^{s,\mathrm{mix}}([0, 1]^d)$ with $s > 1/2$, it is known that
\[ e_n^{\mathrm{all-wor}}(W_2^{s,\mathrm{mix}}([0, 1]^d), L_\infty) \asymp e_n^{\mathrm{std-wor}}(W_2^{s,\mathrm{mix}}([0, 1]^d), L_\infty) \asymp n^{-s+1/2} (\log n)^{(d-1)s}, \]
see [20].
Hence, at least for the standard Sobolev spaces the rates are the same even up to logarithmic
factors. This again suggests that the power function can be just one for all $r \in (\min(\tfrac1p, \tfrac12), \infty)$.
This is the next open problem.
Open Problem 3. Verify whether it is true that for all $p \in [1, \infty]$ we have
\[
\ell^{\mathrm{wor}-H}(r, p) =
\begin{cases}
0 & \text{for all } r \in \left( 0, \min(\tfrac1p, \tfrac12) \right], \\
1 & \text{for all } r \in \left( \min(\tfrac1p, \tfrac12), \infty \right).
\end{cases}
\]
We end this section with a remark on the rates of convergence for different $p$.
Remark 2. It is interesting to compare the sequences
\[ e_n^{\mathrm{all-wor}}(H, L_p) \quad \text{and/or} \quad e_n^{\mathrm{std-wor}}(H, L_p) \]
for the same $H$ but different $p$. The following example shows that, in general, there exists no
relation between these sequences. Some relations do exist as shown in [7] but under some additional
assumptions about $H$. The following example shows that some assumptions on $H$ are indeed needed,
otherwise everything can happen.
Take $L_2 = L_2([0, 1])$, $L_\infty = L_\infty([0, 1])$ and assume that $[0, 1]$ is the disjoint union of intervals
$I_k$ of positive length $\lambda_k$ such that $\sum_{k=1}^{\infty} \lambda_k = 1$. Assume also that
\[ \lambda_1 \ge \lambda_2 \ge \cdots \]
and put $e_k = 1_{I_k}$. We define a Hilbert space $H$ by its unit ball
\[ B = \left\{ \sum_{k=1}^{\infty} \alpha_k e_k \ \middle|\ \sum_{k=1}^{\infty} \frac{\alpha_k^2}{\gamma_k^2} \le 1 \right\}, \]
where
\[ \gamma_1 \ge \gamma_2 \ge \cdots > 0 \quad \text{with} \quad \lim_{k \to \infty} \gamma_k = 0. \]
Hence for $f = \sum_{k=1}^{\infty} \alpha_k e_k \in H$, we obtain
\[ \|f\|_H^2 = \sum_{k=1}^{\infty} \frac{\alpha_k^2}{\gamma_k^2}, \qquad \|f\|_2^2 = \sum_{k=1}^{\infty} \alpha_k^2 \lambda_k, \qquad \text{and} \qquad \|f\|_\infty = \sup_k |\alpha_k|. \]
From this, we easily conclude that the optimal approximation for $L_2$ as well as for $L_\infty$ is given by
\[ f = \sum_{k=1}^{\infty} \alpha_k e_k \ \mapsto\ \sum_{k=1}^{n} \alpha_k e_k. \]
Note that
\[ \alpha_k = \gamma_k^2 \, \langle f, e_k \rangle_H = f(x_k), \]
where $x_k \in I_k$. This means that the optimal error algorithm for function values and linear functionals is the same, and therefore
\[ e_n^{\mathrm{all-wor}}(H, L_p) = e_n^{\mathrm{std-wor}}(H, L_p) \quad \text{for } p \in \{2, \infty\}. \]
However,
\[ e_n^{\mathrm{all-wor}}(H, L_\infty) = \gamma_{n+1} \quad \text{and} \quad e_n^{\mathrm{all-wor}}(H, L_2) = \gamma_{n+1} \sqrt{\lambda_{n+1}}. \]
Since $\{\gamma_n\}$ and $\{\lambda_n\}$ are not related, it is easy to get an example with
\[ r^{\mathrm{all-wor}}(H, L_\infty) = 0 \quad \text{but} \quad r^{\mathrm{all-wor}}(H, L_2) = \infty. \]
Hence, in general, the difference between the minimal rates for $L_2$ and $L_\infty$ approximation can be
extreme.
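One concrete way to see such an extreme gap is sketched below, under hypothetical choices that are ours and not the paper's: γ_k = 1/ln(k + 2), which decreases to zero slower than any power, and λ_k = 2^{-k}, which sums to one and decays faster than any power. The computation works with logarithms of the errors to avoid floating-point underflow and uses the finite-n proxy -ln(e_n)/ln(n) for the rate.

```python
import math

def log_gamma(k):
    # Hypothetical choice gamma_k = 1 / ln(k + 2); returns ln(gamma_k).
    return -math.log(math.log(k + 2))

def log_lambda(k):
    # Hypothetical choice lambda_k = 2^(-k); returns ln(lambda_k).
    return -k * math.log(2.0)

def rate_proxy(log_error, n):
    """Finite-n proxy -ln(e_n) / ln(n) for the rate, clipped at zero."""
    return max(0.0, -log_error / math.log(n))

n = 1000
# ln of e_n for L_infinity: gamma_{n+1}
log_err_inf = log_gamma(n + 1)
# ln of e_n for L_2: gamma_{n+1} * sqrt(lambda_{n+1})
log_err_l2 = log_gamma(n + 1) + 0.5 * log_lambda(n + 1)

proxy_inf = rate_proxy(log_err_inf, n)   # small, and decreasing in n
proxy_l2 = rate_proxy(log_err_l2, n)     # large, and increasing in n
```

As n grows the first proxy tends to zero and the second grows without bound, consistent with rates 0 and ∞ for these particular sequences.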
2.3 Banach Case
In this subsection, we study the approximation problem defined over a Banach space that is continuously embedded in Lp . As always, we assume that function evaluations are continuous functionals.
We establish some bounds on the power functions by recalling known results for Sobolev spaces.
Example 5 (Sobolev spaces, $1 \le p < \infty$).
For the Sobolev space $W_p^s([0, 1]^d)$ for an arbitrary $s > 0$, it is known that
\[ e_n^{\mathrm{all-wor}}(W_p^s([0, 1]^d), L_p) \asymp n^{-s/d}. \]
Function values are well defined in $W_p^s([0, 1]^d)$ only if the embedding condition $s/d > 1/p$ or $s = d$
and $p = 1$ holds. However, we may use the approach suggested in [2] that allows us to consider the
case without this embedding condition. Namely, we limit ourselves only to continuous functions by
taking
\[ F = W_p^s([0, 1]^d) \cap C([0, 1]^d) \]
with norm
\[ \|f\|_F = \|f\|_{W_p^s([0,1]^d)} + \|f\|_{C([0,1]^d)}. \]
Here, $C([0, 1]^d)$ is the space of continuous functions equipped with the max norm. Then $F$ is a
Banach space for which function values are well defined and function values are continuous linear
functionals on this space. Then for $s/d \le 1/p$ and $s/d < 1$ in the case $p = 1$, respectively, it was
shown in [2] that
\[ e_n^{\mathrm{std-wor}}(F, L_p) \asymp 1. \]
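The last example implies that
\[
\begin{aligned}
\ell^{\mathrm{wor}-B}(r, p) &= 0 && \text{for all } r \in (0, 1/p] \text{ and } 1 < p < \infty, \\
\ell^{\mathrm{wor}-B}(r, 1) &= 0 && \text{for all } r \in (0, 1).
\end{aligned}
\]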
We now show that ℓ wor−B (r, p) = 0 over larger domains of r for a given p by recalling other
results for Sobolev spaces.
Example 6 (Sobolev space $W_1^s([0, 1]^d)$, $1 \le p < \infty$).
Consider the approximation problem for the Sobolev space $W_1^s([0, 1]^d)$ with error measured in
$L_p = L_p([0, 1]^d)$. This problem is well defined and convergent for the class $\Lambda^{\mathrm{all}}$ if we assume that
$s/d > 1 - 1/p$.
For $p \in [1, 2]$, we have
\[ e_n^{\mathrm{all-wor}}(W_1^s([0, 1]^d), L_p) \asymp n^{-s/d}, \]
whereas for $p \in [2, \infty)$, we have
\[ e_n^{\mathrm{all-wor}}(W_1^s([0, 1]^d), L_p) \asymp n^{-s/d+1/2-1/p}, \]
see, e.g., [24]. The last relation also holds for $p = \infty$ as will be needed later.
The same results are also valid for the space $F = W_1^s([0, 1]^d) \cap C([0, 1]^d)$ with the norm
\[ \|f\|_F = \|f\|_{W_1^s([0,1]^d)} + \|f\|_{C([0,1]^d)}. \]
For the space $F$, we can consider function values for all $s/d > 1 - 1/p$. For $s/d \le 1$, we have
\[ e_n^{\mathrm{std-wor}}(F, L_p) \asymp 1. \]
Let $p \in [1, 2]$. The previous example implies that
\[ \ell^{\mathrm{wor-B}}(r, p) = 0 \quad \text{for all } r \in \left(1 - \tfrac{1}{p},\, 1\right]. \]
For $p \in (1, 2]$, we showed before that $\ell^{\mathrm{wor-B}}(r, p) = 0$ for all $r \in (0, 1/p]$. Since $(0, 1/p] \cup (1 - 1/p, 1] = (0, 1]$, we obtain
\[ \ell^{\mathrm{wor-B}}(r, p) = 0 \quad \text{for all } r \in (0, 1] \text{ and } p \in [1, 2]. \]
Let $p \in [2, \infty)$. The previous example implies that
\[ \ell^{\mathrm{wor-B}}(r, p) = 0 \quad \text{for all } r \in \left(\tfrac{1}{2},\, \tfrac{1}{2} + \tfrac{1}{p}\right]. \]
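The last interval can be read off from Example 6 in one step. The following worked line is only a sketch of that step; the shorthand $t = s/d$ is ours, and we use that the rate of a sequence of order $n^{-\beta}$ is $\beta$.

```latex
% t = s/d ranges over (1 - 1/p, 1]; on this range function values do not converge on F.
\[
  r = r^{\mathrm{all-wor}}(F, L_p) = t - \tfrac{1}{2} + \tfrac{1}{p},
  \qquad
  t \in \left(1 - \tfrac{1}{p},\, 1\right]
  \;\Longrightarrow\;
  r \in \left(\tfrac{1}{2},\, \tfrac{1}{2} + \tfrac{1}{p}\right],
  \qquad
  r^{\mathrm{std-wor}}(F, L_p) = 0 .
\]
```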
Now we show that $\ell^{\mathrm{wor-B}}(r, p) = 0$ also for $p \in [2, \infty)$ and $r \in (0, \tfrac{1}{2}]$. We enlarge the space $F = W_1^s([0,1]) \cap C([0,1])$ with the norm
\[ \|f\|_F = \|f\|_{W_1^s([0,1])} + \|f\|_{C([0,1])} \]
(for $d = 1$) even more by adding functions from a Hölder class $C^\alpha$, where $0 < \alpha \le 1/2$. Hence we take the space
\[ \widetilde{F} = F + C^\alpha \]
with the norm
\[ \|f\|_{\widetilde{F}} := \inf\{\, \|g\|_F + \|h\|_{C^\alpha} \mid f = g + h,\ g \in F,\ h \in C^\alpha \,\}. \]
Since the unit ball of $\widetilde{F}$ is larger than that of $F$, we still have $e_n^{\mathrm{std-wor}}(\widetilde{F}, L_p) \asymp 1$ for $s \le 1$. It is well known that $e_n^{\mathrm{all-wor}}(C^\alpha([0,1]), L_p) \asymp n^{-\alpha}$, and the same holds for $\widetilde{F}$ if $\alpha \le s - 1/2 + 1/p$. Hence for $p \in [2, \infty)$, we obtain
\[ \ell^{\mathrm{wor-B}}(r, p) = 0 \quad \text{for all } r \in \left(0,\, \tfrac{1}{2} + \tfrac{1}{p}\right]. \]
We learnt some properties of the power function by using known results for Sobolev spaces $W_{p_1}^s([0,1]^d)$ in the case $s/d \le 1/p_1$ so that function values did not even supply convergence. Since we needed to assume that $s/d > 1/p_1 - 1/p$, the case $p = \infty$ could not be covered.
We now recall some results for Sobolev spaces when the embedding condition is satisfied and
when there is a difference in the convergence rates between function values and arbitrary linear
functionals.
Example 7 (Sobolev space $W_1^s([0,1]^d)$, $1 \le p \le \infty$).
Consider the approximation problem for the Sobolev space $W_1^s([0,1]^d)$ with error measured in $L_p$. We now assume that $s/d \ge 1$. Then function values are well defined and are continuous linear functionals. Furthermore,
\[ e_n^{\mathrm{std-wor}}(W_1^s([0,1]^d), L_p) \asymp n^{-s/d+1-1/p}, \]
see, e.g., the survey of such results in Section 4.2.4 of [14] or [23, 24].
The last two examples imply the following estimates of the power function. For all $r > 1$ and $p \in [1, 2]$, we have
\[ \ell^{\mathrm{wor-B}}(r, p) \le 1 - \frac{1}{r}\left(1 - \frac{1}{p}\right), \]
and for all $r > 1$ and $p \in [2, \infty]$, we have
\[ \ell^{\mathrm{wor-B}}(r, p) \le 1 - \frac{1}{2r}. \]
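For the reader's convenience, we sketch how these two estimates follow from Examples 6 and 7. The shorthand $t = s/d \ge 1$ is ours; since the power function is an infimum over spaces, the ratio of the two rates for $W_1^s([0,1]^d)$ gives an upper bound.

```latex
% Case p in [1,2]: the rate for arbitrary functionals is r = t.
\[
  r^{\mathrm{all-wor}} = t = r, \qquad
  r^{\mathrm{std-wor}} = t - 1 + \tfrac{1}{p} = r - \left(1 - \tfrac{1}{p}\right), \qquad
  \frac{r^{\mathrm{std-wor}}}{r} = 1 - \frac{1}{r}\left(1 - \frac{1}{p}\right).
\]
% Case p in [2,infinity]: the rate for arbitrary functionals is r = t - 1/2 + 1/p.
\[
  r^{\mathrm{all-wor}} = t - \tfrac{1}{2} + \tfrac{1}{p} = r, \qquad
  r^{\mathrm{std-wor}} = t - 1 + \tfrac{1}{p} = r - \tfrac{1}{2}, \qquad
  \frac{r^{\mathrm{std-wor}}}{r} = 1 - \frac{1}{2r}.
\]
```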
We summarize the properties of the power function established in this section in the following
theorem. The only case where we have a positive lower bound is the case p = ∞, see Theorem 4.
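Theorem 5.
\[
\begin{aligned}
\ell^{\mathrm{wor-B}}(r, p) &= 0 && \text{for all } r \in (0, 1] \text{ and } p \in [1, 2],\\
\ell^{\mathrm{wor-B}}(r, p) &= 0 && \text{for all } r \in \left(0, \tfrac{1}{2} + \tfrac{1}{p}\right] \text{ and } p \in (2, \infty),\\
\ell^{\mathrm{wor-B}}(r, p) &\le 1 - \frac{1}{r}\left(1 - \frac{1}{p}\right) && \text{for all } r > 1 \text{ and } p \in [1, 2],\\
\ell^{\mathrm{wor-B}}(r, p) &\le 1 - \frac{1}{2r} && \text{for all } r > 1 \text{ and } p \in [2, \infty),\\
1 - \frac{1}{r} \le \ell^{\mathrm{wor-B}}(r, \infty) &\le 1 - \frac{1}{2r} && \text{for all } r > 1.
\end{aligned}
\]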
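The case distinction in Theorem 5 can be made concrete with a small helper. The following Python sketch is only an illustration: the function name `wor_b_upper_bound` is hypothetical, and where the theorem gives no information we return the trivial bound 1, which is our convention and relies only on the fact that the power function takes values in [0, 1].

```python
import math


def wor_b_upper_bound(r: float, p: float) -> float:
    """Upper bound on the worst case Banach power function read off from Theorem 5.

    Returns the trivial bound 1.0 for pairs (r, p) not covered by the theorem.
    """
    if r <= 0 or p < 1:
        raise ValueError("need r > 0 and p >= 1")
    inv_p = 0.0 if math.isinf(p) else 1.0 / p
    if p <= 2:
        if r <= 1:
            return 0.0                       # first line of the theorem
        return 1.0 - (1.0 - inv_p) / r       # third line
    if not math.isinf(p) and r <= 0.5 + inv_p:
        return 0.0                           # second line, p in (2, infinity)
    if r > 1:
        return 1.0 - 1.0 / (2.0 * r)         # fourth and fifth lines
    return 1.0                               # not covered: trivial bound
```

For instance, `wor_b_upper_bound(2.0, 2.0)` evaluates the bound $1 - 1/(2r)$ at $r = 2$, and the two formulas for $r > 1$ agree at $p = 2$.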
It is interesting to note that although we do not know the exact values of the power functions
in the Hilbert and Banach cases, we can check that they are different at least for p = 2. Indeed,
from Theorems 1 and 5, we have
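\[
\begin{aligned}
\ell^{\mathrm{wor-B}}(r, 2) &= \ell^{\mathrm{wor-H}}(r, 2) && \text{for all } r \in \left(0, \tfrac{1}{2}\right],\\
\ell^{\mathrm{wor-B}}(r, 2) &= 0 < \frac{1}{2} \le \ell^{\mathrm{wor-H}}(r, 2) && \text{for all } r \in \left(\tfrac{1}{2}, 1\right],\\
\ell^{\mathrm{wor-B}}(r, 2) &\le 1 - \frac{1}{2r} < \frac{2r}{2r+1} \le \ell^{\mathrm{wor-H}}(r, 2) && \text{for all } r \in (1, \infty).
\end{aligned}
\]
This shows that at least for $p = 2$ the power of function values for the Hilbert case is larger than for the Banach case for all $r > \tfrac{1}{2}$.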
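The strict inequality between the Banach upper bound $1 - 1/(2r)$ and the Hilbert lower bound $2r/(2r+1)$ used for $r > 1$ is elementary. A short verification, given here only as a worked step:

```latex
\[
  1 - \frac{1}{2r} = \frac{2r - 1}{2r} < \frac{2r}{2r + 1}
  \;\Longleftrightarrow\;
  (2r - 1)(2r + 1) < (2r)^2
  \;\Longleftrightarrow\;
  4r^2 - 1 < 4r^2 ,
\]
% which holds for every r > 0. Moreover, 2r/(2r+1) > 1/2 exactly when r > 1/2.
```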
Obviously, it would be desirable to find the exact values of the power function $\ell^{\mathrm{wor-B}}(r, p)$ for all $r \in (0, \infty)$ and $p \in [1, \infty]$. However, this could be a very difficult problem. As a possibly less difficult problem, we would like to check the following property of the power function.
Open Problem 4. For p ∈ [1, ∞], find the supremum a∗ (p) of a for which
ℓ wor−B (r, p) = 0 for all r ∈ (0, a].
We only know that a∗ (p) ≥ 1 for all p ∈ [1, ∞).
We already indicated that the power functions for the Hilbert and Banach cases are different
for p = 2. It would be of interest to check if this holds for all p.
Open Problem 5. Find all $p \in [1, \infty]$ for which
\[ \ell^{\mathrm{wor-B}}(\cdot, p) \ne \ell^{\mathrm{wor-H}}(\cdot, p). \]
Similar to Example 3, we present an example of a Banach space $F$ where the ratio
\[ \frac{e_n^{\mathrm{std-wor}}(F, L_p)}{e_n^{\mathrm{all-wor}}(F, L_p)} \]
is large for $p > 1$ and a fixed $n$.
Example 8. Take $F = \ell_1^{n+1}$, i.e., $F = \mathbb{R}^{n+1}$ with the $\ell_1$ norm. Then we obtain
\[ e_n^{\mathrm{std-wor}}(F, L_p) = (n + 1)^{1-1/p}\, e_n^{\mathrm{all-wor}}(F, L_p), \tag{2} \]
since $e_n^{\mathrm{std-wor}}(F, L_p) = 1$ and $e_n^{\mathrm{all-wor}}(F, L_p) = (n + 1)^{1/p-1}$. The upper bound in the last statement follows again with the information $N(x) = (x_2 - x_1, x_3 - x_2, \dots, x_{n+1} - x_n)$, while the lower bound follows from the fact that the unit ball of $\ell_1^{n+1}$ contains an $\ell_p^{n+1}$ ball of radius $(n + 1)^{1/p-1}$.
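The inclusion of balls used for the lower bound is equivalent to the norm inequality $\|x\|_1 \le (n+1)^{1-1/p} \|x\|_p$ on $\mathbb{R}^{n+1}$, with equality for constant vectors. The following Python sketch checks this numerically for finite $p \ge 1$; the helper names are hypothetical and the random test is only a sanity check, not a proof.

```python
import random


def lp_norm(x, p):
    """The l_p norm of a finite sequence x, for finite p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def l1_vs_lp_constant(m, p):
    """Smallest c with ||x||_1 <= c * ||x||_p on R^m, namely m^(1 - 1/p)."""
    return m ** (1.0 - 1.0 / p)


def max_observed_ratio(m, p, trials=1000, seed=0):
    """Largest ratio ||x||_1 / ||x||_p seen over random vectors x in R^m."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        best = max(best, lp_norm(x, 1) / lp_norm(x, p))
    return best
```

With $m = n + 1$, the reciprocal of `l1_vs_lp_constant(m, p)` is the radius $(n+1)^{1/p-1}$ of the $\ell_p^{n+1}$ ball contained in the unit ball of $\ell_1^{n+1}$.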
Again this ratio $(n + 1)^{1-1/p}$ as in (2) can be obtained with a Hilbert space and actually we can take the same spaces as in Example 3, i.e., we define in $H = \mathbb{R}^{n+1}$ the scalar product
\[ \langle f, g \rangle_H = \left[\sum_{i=1}^{n+1} f_i\right] \left[\sum_{i=1}^{n+1} g_i\right] + \varepsilon \sum_{i=1}^{n+1} f_i g_i \quad \text{for all } f, g \in H, \]
and consider the limit where $\varepsilon > 0$ tends to zero.
We end this section with another open problem.
Open Problem 6. Find the supremum of $e_n^{\mathrm{std-wor}}(F, L_p) / e_n^{\mathrm{all-wor}}(F, L_p)$ over all Banach and/or Hilbert spaces. So far, we know that
\[ \sup_F \frac{e_n^{\mathrm{std-wor}}(F, L_p)}{e_n^{\mathrm{all-wor}}(F, L_p)} \ge (n + 1)^{1-1/p}, \tag{3} \]
and equality holds if $p = \infty$.
3 Randomized setting
We approximate the embedding operator I : F → Lp in the randomized setting. We now briefly
define this setting. The reader may find more on this subject, e.g., in [14, 15, 21].
We approximate I by algorithms An that use n values of linear functionals on the average and
each linear functional is chosen randomly with respect to a probability distribution.
More precisely, the algorithm $A_n$ is of the following form
\[ A_n(f, \omega) = \varphi_{n,\omega}\bigl(L_{1,\omega_1}(f), L_{2,\omega_2}(f), \dots, L_{n(\omega),\omega_{n(\omega)}}(f)\bigr), \tag{4} \]
and the number $n(\omega)$ of functionals can also be random. Here $\omega = [\omega_1, \omega_2, \dots]$, and the linear functionals $L_{j,\omega_j}$ are random functionals distributed according to a probability distribution on elements $\omega_j$ which may depend on $j$ as well as on the values already computed, i.e., on $L_{i,\omega_i}(f)$ for $i = 1, 2, \dots, j - 1$. The mapping $\varphi_{n,\omega} : \mathbb{R}^{n(\omega)} \to L_p$ is a random mapping, and
\[ \mathbb{E}_\omega\, n(\omega) \le n. \]
We also allow adaptive choices of the functionals Lj,ωj . That is, Lj,ωj may depend on the already
selected functionals and the values L1,ω1 (f ), L2,ω2 (f ), . . . , Lj−1,ωj−1 (f ).
Without loss of generality, we assume that $A_n(f, \cdot)$ is measurable, and define the randomized error of $A_n$ as
\[ e^{\mathrm{ran}}(A_n) = \sup_{\|f\|_F \le 1} \left( \mathbb{E}_\omega \|I(f) - A_n(f, \omega)\|_p^2 \right)^{1/2}. \]
Again, we compare such algorithms with algorithms that are based on function values, i.e., each $L_{j,\omega_j}$ is now of the form $L_{j,\omega_j}(f) = f(t_{j,\omega_j})$ and
\[ A_n(f, \omega) = \varphi_{n,\omega}\bigl(f(t_{1,\omega_1}), f(t_{2,\omega_2}), \dots, f(t_{n(\omega),\omega_{n(\omega)}})\bigr). \tag{5} \]
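To make the form (5) concrete, here is a toy randomized algorithm in Python. It is only a sketch under our own assumptions and not an algorithm from the cited literature: the domain is $[0, 1]$, the sample points are independent and uniformly distributed, $n(\omega) = n$ is fixed, and the mapping $\varphi_{n,\omega}$ returns Monte Carlo estimates of the first $m$ coefficients of $f$ in the cosine system, which is orthonormal in $L_2([0, 1])$. All function names are hypothetical.

```python
import math
import random


def cosine_basis(k, t):
    """Orthonormal cosine system on [0, 1]: e_0 = 1, e_k = sqrt(2) cos(2 pi k t)."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(2.0 * math.pi * k * t)


def randomized_coefficients(f, m, n, rng):
    """Estimate <f, e_k> for k < m from n function values at random points."""
    points = [rng.random() for _ in range(n)]   # random sample points t_j
    values = [f(t) for t in points]             # the only information used about f
    return [
        sum(v * cosine_basis(k, t) for v, t in zip(values, points)) / n
        for k in range(m)
    ]


def approximation(coeffs):
    """The output A_n(f, omega) as a function on [0, 1]."""
    return lambda t: sum(c * cosine_basis(k, t) for k, c in enumerate(coeffs))
```

Each coefficient estimate is unbiased because the sample points are uniform on $[0, 1]$; the randomized error of such a method then depends on the variance of the estimates and on the part of $f$ outside the span of the first $m$ basis functions.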
Hence, we consider algorithms that use n linear functionals either from the class Λstd or the
class Λall . We define the minimal errors as follows.
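Definition 2. For $n \in \mathbb{N}_0$, let
\[ e_n^{\mathrm{all-ran}}(F, L_p) = \inf\bigl\{\, e^{\mathrm{ran}}(A_n) \mid L_j \in \Lambda^{\mathrm{all}} \text{ and } A_n \text{ as in (4)} \,\bigr\} \]
and
\[ e_n^{\mathrm{std-ran}}(F, L_p) = \inf\bigl\{\, e^{\mathrm{ran}}(A_n) \mid L_j \in \Lambda^{\mathrm{std}} \text{ and } A_n \text{ as in (5)} \,\bigr\}. \]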
As in the worst case setting, for $n = 0$ it is easy to see that the best algorithm is $A_0 = 0$, and we obtain
\[ e_0^{\mathrm{all-ran}}(F, L_p) = e_0^{\mathrm{std-ran}}(F, L_p) = \sup_{\|f\|_F \le 1} \|f\|_p = \sup_{\|f\|_F \le 1} \|I(f)\|_p = \|I\|. \]
This is the initial error that can be achieved without computing any linear functional on the
functions f . Clearly,
\[ e_n^{\mathrm{all-ran}}(F, L_p) \le e_n^{\mathrm{std-ran}}(F, L_p) \quad \text{for all } n \in \mathbb{N}. \]
The sequences $e_n^{\mathrm{all-ran}}(F, L_p)$ and $e_n^{\mathrm{std-ran}}(F, L_p)$ are both non-increasing but not necessarily convergent to zero.
As in the worst case setting, we want to compare the rates of convergence
\[ r^{\mathrm{all-ran}}(F, L_p) = r\bigl(e_n^{\mathrm{all-ran}}(F, L_p)\bigr) \quad \text{and} \quad r^{\mathrm{std-ran}}(F, L_p) = r\bigl(e_n^{\mathrm{std-ran}}(F, L_p)\bigr). \]
In particular, we would like to know if it is possible that the sequence $\{e_n^{\mathrm{all-ran}}(F, L_p)\}$ converges much faster than the sequence $\{e_n^{\mathrm{std-ran}}(F, L_p)\}$. The main question addressed in this section is to find or estimate the power function defined as $\ell^{\mathrm{ran-}x} : (0, \infty) \times [1, \infty] \to [0, 1]$ by
\[ \ell^{\mathrm{ran-}x}(r, p) := \inf_{F \,:\, r^{\mathrm{all-ran}}(F, L_p) = r} \frac{r^{\mathrm{std-ran}}(F, L_p)}{r}, \]
where x ∈ {H, B} indicates that the infimum is taken over all Hilbert spaces (x = H) or over all
Banach spaces (x = B) continuously embedded in Lp and the rate of convergence is r when we use
arbitrary linear functionals. In the randomized setting, we do not need to assume that function
values are continuous linear functionals.
3.1 Double Hilbert Case
In this subsection, we consider the approximation problem defined over a Hilbert space with the
error measured also in the Hilbert space L2 . It may be surprising but the results in the double
Hilbert case are complete due to [28], and there is no need to discuss different cases depending on
the values of r.
Theorem 6 ([28]). Let I : H → L2 (Ω) be a continuous embedding from a Hilbert space H into
L2 (Ω). Then
r all−ran (H, L2 ) = r std−ran (H, L2 ).
Therefore
\[ \ell^{\mathrm{ran-H}}(r, 2) = 1 \quad \text{for all } r > 0. \]
We add that it was known before, see [13, 25], that also
r all−ran (H, L2 ) = r all−wor (H, L2 ).
This means that the power of function values in the randomized setting is the same as the power of
arbitrary linear functionals in the worst case setting, which in turn is the same as in the randomized
setting.
3.2 Other Cases
For $p > 2$, we know examples from the literature where the rate $r^{\mathrm{all-ran}}(H, L_p)$ is larger than the rate $r^{\mathrm{std-ran}}(H, L_p)$. Namely take $I : W_2^r([0, 1]) \to L_p([0, 1])$. Then with $\Lambda^{\mathrm{all}}$ one can achieve the order $n^{-r}$ (with additional log terms in the case $p = \infty$, but the order is still $r$), see [10]. For $\Lambda^{\mathrm{std}}$ the optimal order is $n^{-r+1/2-1/p}$, see [2]. The authors of [2, 10] studied the case of integer $r$, but the results can be extended via interpolation to all $r > 1$. Therefore, we obtain
\[ \ell^{\mathrm{ran-H}}(r, p) \le \frac{r - 1/2 + 1/p}{r} \quad \text{if } r \ge 1 \text{ and } p > 2. \]
We summarize these estimates of the power function in the following theorem.
Theorem 7. Let $p > 2$. Then
\[ \ell^{\mathrm{ran-H}}(r, p) \le 1 - \frac{1/2 - 1/p}{r} \quad \text{for all } r \ge 1. \]
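The bound of Theorem 7 is the ratio of the two rates for the Sobolev embedding above, namely $r - 1/2 + 1/p$ for function values and $r$ for arbitrary linear functionals. A minimal Python sketch, with a hypothetical function name, that evaluates it:

```python
import math


def ran_h_upper_bound(r: float, p: float) -> float:
    """Upper bound 1 - (1/2 - 1/p)/r of Theorem 7, for r >= 1 and p > 2."""
    if r < 1 or p <= 2:
        raise ValueError("Theorem 7 requires r >= 1 and p > 2")
    inv_p = 0.0 if math.isinf(p) else 1.0 / p
    return 1.0 - (0.5 - inv_p) / r
```

For $p = \infty$ and $r = 1$ the bound equals $1/2$, which reflects the gap of $1/2$ between the two rates for $L_\infty$ approximation.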
Sobolev embeddings in the randomized setting were studied by several authors, including [2, 3,
4, 10, 12, 21, 25]. For our purpose, the most important papers are [2, 10] and the paper [3] for the
interpolation argument.
For the embedding I : W2r ([0, 1]) → L∞ ([0, 1]) the rate is improved by 1/2 if we switch from the
class Λstd to the class Λall . This gap of 1/2 is the largest possible under some additional conditions,
see [7, 9]. Let us add in passing that the same gap of 1/2 appears for Λall between the worst case
and the randomized setting.
The Hilbert case for p ∈ [1, 2) as well as the Banach case for all p ∈ [1, ∞] have not yet been
studied. We pose this as an open problem.
Open Problem 7. Study the power function in the randomized setting for the Hilbert case with
p ∈ [1, 2) and for the Banach case for all p ∈ [1, ∞]. In particular, determine the supremum a∗ (p)
of a for which
ℓ ran−H/B (r, p) = 0 for all r ∈ (0, a].
4 Average case setting with a Gaussian measure
In the average case setting, we assume that the embedding $I : F \to L_p(\Omega)$ is continuous and that function evaluations are continuous functionals on $F$. As far as we know, only the case $p = 2$ was studied and we report the known results from [5] for this case.
We assume that F is a separable Hilbert/Banach space equipped with a zero mean Gaussian
measure µ. As in the worst case setting, we consider deterministic algorithms, and due to general
results, see [21], it is enough to compare linear algorithms
\[ A_n(f) = \sum_{k=1}^{n} L_k(f)\, g_k \quad \text{and} \quad A_n(f) = \sum_{k=1}^{n} f(x_k)\, g_k, \]
where $g_k \in L_2(\Omega)$. The average case error of an algorithm is defined by
\[ e^{\mathrm{avg}}(A) := \left( \int_F \|f - A(f)\|_p^2 \, d\mu(f) \right)^{1/p}. \]
As in the other settings, we define the minimal $n$th average case errors $e_n^{\mathrm{all-avg}}(F, L_p)$, $e_n^{\mathrm{std-avg}}(F, L_p)$ and the power function $\ell^{\mathrm{avg-H/B}}$. That is, for
\[ r^{\mathrm{all/std-avg}}(F, L_p) = r\bigl(e_n^{\mathrm{all/std-avg}}(F, L_p)\bigr) \]
we have
\[ \ell^{\mathrm{avg-}x}(r, p) := \inf_{F \,:\, r^{\mathrm{all-avg}}(F, L_p) = r} \frac{r^{\mathrm{std-avg}}(F, L_p)}{r}. \]
As always, $x \in \{H, B\}$ and we take the infimum over separable Hilbert ($x = H$) or Banach ($x = B$) spaces equipped with zero mean Gaussian measures that are continuously embedded in $L_p$, for which function values are continuous linear functionals, and for which the rate of convergence is $r$ when arbitrary linear functionals are used.
As already mentioned, results are known only for p = 2. Then the cases of the Hilbert and
Banach spaces are the same due to the presence of Gaussian measures. This follows from the fact
that even if F is a separable Banach space then the minimal errors for the class Λall depend on the
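Gaussian measure $\nu = \mu I^{-1}$ given by
\[ \nu(M) = \mu\left(\{ f \in F \mid I(f) \in M \}\right) \]
for a Borel set $M$ of $L_2$. The measure $\nu$ is also a zero mean Gaussian measure whose covariance operator $C_\nu : L_2 \to L_2$ is given by
\[ \langle C_\nu f_1, f_2 \rangle_{L_2} = \int_{L_2} \langle f, f_1 \rangle_{L_2} \langle f, f_2 \rangle_{L_2} \, d\nu(f) \quad \text{for all } f_1, f_2 \in L_2. \]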
The operator Cν is self adjoint, positive semi-definite, compact and has a finite trace. That is, its
ordered eigenvalues λj have a finite sum. It is known that
\[ e_n^{\mathrm{all-avg}}(F, L_2) = \left( \sum_{j=n+1}^{\infty} \lambda_j \right)^{1/2}. \]
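The tail-sum formula is easy to evaluate once the eigenvalues are known. The Python sketch below is an illustration under the assumption that a finite list of eigenvalues stands in for the full sequence, i.e., the remaining eigenvalues are treated as zero; the function name is hypothetical.

```python
import math


def min_avg_error_all(eigenvalues, n):
    """Square root of the sum of all but the n largest eigenvalues.

    A finite list is used in place of the infinite sequence, so eigenvalues
    beyond the list are implicitly assumed to be zero.
    """
    ordered = sorted(eigenvalues, reverse=True)   # lambda_1 >= lambda_2 >= ...
    return math.sqrt(sum(ordered[n:]))
```

For example, with eigenvalues $1, 1/4, 1/16$ and $n = 1$ the value is $\sqrt{1/4 + 1/16}$, and for $n = 0$ it is the square root of the trace.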
As in the randomized setting for the double Hilbert space, the results on the power function
are complete and there is no need to discuss different cases of r.
Theorem 8 ([5]). Let I : F → L2 (Ω) be a continuous embedding from a separable Banach space F
equipped with a zero mean Gaussian measure µ into L2 (Ω). Then
r all−avg (F, L2 ) = r std−avg (F, L2 ).
Therefore
\[ \ell^{\mathrm{avg-H/B}}(r, 2) = 1 \quad \text{for all } r > 0. \]
Of course it would be interesting to study the power function for other values of p. This is
posed as our last open problem.
Open Problem 8. Study the power function in the average case setting for $p \ne 2$. In particular, verify whether a similar result as Theorem 8 holds.
Acknowledgment
We appreciate comments on this paper from Stefan Heinrich, Anargyros Papageorgiou, Joseph
F. Traub, Grzegorz W. Wasilkowski and two anonymous referees. We especially thank Stefan
Heinrich for pointing out some errors in our previous manuscript.
E.N. was partially supported by the DFG-Priority Program 1324. H.W. was partially supported
by the National Science Foundation.
References
[1] E. M. Galeev, Linear widths of Hölder-Nikolskii classes of periodic functions of several variables,
Math. Notes 59, 133–146, 1996.
[2] S. Heinrich, Randomized approximation of Sobolev embeddings, in: Monte Carlo and Quasi-Monte Carlo Methods 2006, A. Keller, S. Heinrich, H. Niederreiter (eds.), 445–459, Springer,
Berlin, 2008.
[3] S. Heinrich, Randomized approximation of Sobolev embeddings II, J. Complexity 25, 455–472,
2009.
[4] S. Heinrich, Randomized approximation of Sobolev embeddings III, J. Complexity 25, 473–507,
2009.
[5] F. Hickernell, G. W. Wasilkowski and H. Woźniakowski, Tractability of linear multivariate
problems in the average case setting, in: Monte Carlo and Quasi-Monte Carlo Methods 2006,
A. Keller, S. Heinrich, H. Niederreiter (eds.), 461–494, Springer, Berlin, 2008.
[6] A. Hinrichs, E. Novak and J. Vybíral, Linear information versus function evaluations for L2-approximation, J. Approx. Th. 153, 97–107, 2008.
[7] F. Y. Kuo, G. W. Wasilkowski and H. Woźniakowski, Multivariate L∞ approximation in the
worst case setting over reproducing kernel Hilbert spaces, J. Approx. Th. 152, 135–160, 2008.
[8] F. Y. Kuo, G. W. Wasilkowski and H. Woźniakowski, On the power of standard information
for multivariate approximation in the worst case setting, J. Approx. Th. 158, 97–125, 2009.
[9] F. Y. Kuo, G. W. Wasilkowski and H. Woźniakowski, On the power of standard information
for L∞ approximation in the randomized case setting, BIT Numer. Math. 49, 543–564, 2009.
[10] P. Mathé, Random approximation of Sobolev embeddings, J. Complexity 7, 261–281, 1991.
[11] C. A. Micchelli and G. Wahba, Design problems for optimal surface interpolation, in Approximation Theory and Applications, Z. Ziegler ed., pp. 329–347, Academic Press, New York,
1981.
[12] E. Novak, Deterministic and Stochastic Error Bounds in Numerical Analysis, LNiM 1349,
Springer-Verlag, Berlin, 1988.
[13] E. Novak, Optimal linear randomized methods for linear operators in Hilbert spaces, J. Complexity 8, 22–36, 1992.
[14] E. Novak and H. Woźniakowski, Tractability of Multivariate Problems, Volume I: Linear Information, European Math. Soc., Zürich, 2008.
[15] E. Novak and H. Woźniakowski, Tractability of Multivariate Problems, Volume II: Standard
Information for Functionals, EMS, Zürich, 2010, to appear.
[16] A. Pietsch, Operator Ideals, North Holland, 1980.
[17] W. Sickel and T. Ullrich, Spline interpolation on sparse grids, to appear in Applicable Analysis.
[18] R. Tandetzky, Approximation of functions from a Hilbert space using function values or general
linear information, in progress, 2010.
[19] V. N. Temlyakov, Approximation of Periodic Functions, Nova Science, New York, 1993.
[20] V. N. Temlyakov, On approximate recovery of functions with bounded mixed derivative. J.
Complexity 9, 41–59, 1993.
[21] J. F. Traub, G. W. Wasilkowski and H. Woźniakowski, Information-Based Complexity, Academic Press, 1988.
[22] H. Triebel, Bases in Function Spaces, Sampling, Discrepancy, Numerical Integration, EMS
Publ. House, Zürich, 2010.
[23] J. Vybíral, Sampling numbers and function spaces, J. Complexity 23, 773–792, 2008.
[24] J. Vybíral, Widths of embeddings in function spaces, J. Complexity 24, 545–570, 2008.
[25] G. W. Wasilkowski, Randomization for continuous problems, J. Complexity 5, 195–218, 1989.
[26] H. Woźniakowski, Tractability and strong tractability of multivariate tensor product problems,
J. of Computing and Information 4, 1–19, 1994.
[27] G. W. Wasilkowski and H. Woźniakowski, On the power of standard information for weighted
approximation, Found. Comput. Math. 1, 417–434, 2001.
[28] G. W. Wasilkowski and H. Woźniakowski, The power of standard information for multivariate
approximation in the randomized case setting, Math. Comp. 76, 965–988, 2007.
Erich Novak
Mathematisches Institut, Universität Jena
Ernst-Abbe-Platz 2, 07740 Jena, Germany
novak@mathematik.uni-jena.de
http://users.minet.uni-jena/∼novak
Henryk Woźniakowski
Department of Computer Science, Columbia University
New York, NY 10027, USA, and
Institute of Applied Mathematics, University of Warsaw
ul. Banacha 2, 02-097 Warszawa, Poland
henryk@cs.columbia.edu
http://www.cs.columbia.edu/∼henryk