High-dimensional integration: The quasi-Monte Carlo way ∗† Josef Dick

advertisement
Acta Numerica (2013), pp. 133–288
doi:10.1017/S0962492913000044
c Cambridge University Press, 2013
Printed in the United Kingdom
High-dimensional integration:
The quasi-Monte Carlo way∗†
Josef Dick
University of New South Wales,
Sydney, NSW 2052, Australia
E-mail: josef.dick@unsw.edu.au
Frances Y. Kuo
University of New South Wales,
Sydney, NSW 2052, Australia
E-mail: f.kuo@unsw.edu.au
Ian H. Sloan
University of New South Wales,
Sydney, NSW 2052, Australia
E-mail: i.sloan@unsw.edu.au
and
King Fahd University of Petroleum and Minerals,
Dhahran 34463, Saudi Arabia
This paper is a contemporary review of QMC (‘quasi-Monte Carlo’) methods,
that is, equal-weight rules for the approximate evaluation of high-dimensional
integrals over the unit cube [0, 1]s , where s may be large, or even infinite. After a general introduction, the paper surveys recent developments in lattice
methods, digital nets, and related themes. Among those recent developments
are methods of construction of both lattices and digital nets, to yield QMC
rules that have a prescribed rate of convergence for sufficiently smooth functions, and ideally also guaranteed slow growth (or no growth) of the worst-case
error as s increases. A crucial role is played by parameters called ‘weights’,
since a careful use of the weight parameters is needed to ensure that the
worst-case errors in an appropriately weighted function space are bounded,
or grow only slowly, as the dimension s increases. Important tools for the
analysis are weighted function spaces, reproducing kernel Hilbert spaces, and
discrepancy, all of which are discussed with an appropriate level of detail.
∗
†
Colour online for monochrome figures available at journals.cambridge.org/anu.
The URLs cited in this work were correct at the time of going to press, but the publisher
and the authors make no undertaking that the citations remain live or are accurate or
appropriate.
134
J. Dick, F. Y. Kuo and I. H. Sloan
CONTENTS
1 Introduction
2 Introducing quasi-Monte Carlo methods
3 Error, discrepancy, and reproducing kernel
4 Weighted spaces and tractability
5 Lattice rules
6 Digital nets and sequences
7 Infinite-dimensional integration
8 Concluding remarks
References
134
137
164
177
191
228
263
274
275
1. Introduction
Numerical integration in more than one dimension has for a century been a
topic that presents many challenges, as witnessed by the classical monograph
by Stroud (1971). Among the challenges are those posed by the rich choice
of multi-dimensional geometries, whether the cube, ball, sphere, simplex, or
something more elaborate. Another challenge is the common occurrence of
singularities of the integrand at special points or sub-manifolds of the region.
But of all the challenges the one that stands above all others is the difficulty
of doing anything effective when the number of integration variables (the
‘dimension’) passes beyond the number of fingers on two hands. Certainly
the simple strategy of using a ‘product’ of one-dimensional rules becomes
infeasible, since even a mere 10-fold product of say two-point Gauss rules
requires 210 points. Yet these days problems arise, and are tackled, in which
the number of dimensions is in the hundreds, or even the thousands or tens
of thousands.
The aim of this survey is to describe and analyse effective methods for
numerical integration, with a special emphasis on a high number of dimensions. The price that has to be paid is that we consider only the simplest
geometry (namely the unit cube in an arbitrary number s of dimensions),
somewhat smooth functions, and only equal-weight integration rules.
Thus the task is to approximate the integral
f (x) dx,
(1.1)
Is (f ) =
[0,1]s
where s is greater than 1 and possibly large, for some integrable function f ,
by an n-point integration rule of the form
1
f (ti ),
Qn,s (f ) =
n
n−1
i=0
(1.2)
High-dimensional integration: The quasi-Monte Carlo way
135
thus a rule that uses the prescribed sample points t0 , . . . , tn−1 ∈ [0, 1]s . Of
course, we shall have much to say about the choice of these sample points. A
rule of this equal-weight form is called a quasi-Monte Carlo (or QMC) rule.
The name quasi-Monte Carlo comes from a certain analogy with the
Monte Carlo (MC) method: in its simplest form the MC approximation
to the integral (1.1) takes exactly the same form as (1.2), but with one
crucial difference, that the sample points are chosen randomly (and independently), from a uniform distribution on the cube [0, 1]s . The MC method
(Hammersley and Handscomb 1964, Lemieux 2009, Woźniakowski 2013) has
some very attractive features, most notably that there exists a probabilistic
(root-mean-square) error estimate of the form
σ(f )
√ ,
n
where σ 2 (f ) is the variance of f ,
σ 2 (f ) := Is (f 2 ) − (Is (f ))2 ,
which is finite whenever f is square-integrable. Moreover, σ(f ) can be
estimated by the same MC rule
√ used to estimate Is (f ). But for all the virtues
of the MC method, its O(1/ n) rate of convergence is often considered to
be impossibly slow, especially when f is smooth. For example, one needs
four times as many points to reduce the error by a factor of two, or 100
times as many points to reduce the error by an order of magnitude.
The QMC methods were born in the 1950s and 1960s from the√
(successful)
desire to achieve faster convergence than the MC rate of O(1/ n). In the
next sections we shall meet QMC methods that depend on constructing
points sets with small ‘discrepancy’. These methods were indeed successful
in improving upon the MC convergence rate, achieving a convergence rate
of order O((log n)s /n) or better if sufficient smoothness of f is assumed.
In particular, we will meet the so-called ‘lattice rules’, which for certain
periodic functions (periodic with respect to each component of x) could
achieve convergence rates of O((log n)s /n2 ) or faster, and so-called ‘higherorder digital nets’, which could achieve convergence rates of O((log n)2s /n2 )
or faster even for non-periodic functions.
In the early days of QMC research, the emphasis was mainly on the rate
of convergence as n increases, and less attention was paid to what happens as the dimensionality s increases; those early QMC researchers surely
never envisaged QMC methods being applied to integrands in hundreds of
dimensions. For lattice methods there was much interest in the 1960s in
periodization strategies (for converting non-periodic integrands into periodic ones), which can then achieve fast convergence for appropriate QMC
rules, but regrettably those periodization strategies often turn easy highdimensional problems into ones that are impossibly hard. In any case, the
136
J. Dick, F. Y. Kuo and I. H. Sloan
rate of convergence is only part of the story: the size of the implied constant
is also important.
In the last two decades there has been great research interest in developing
QMC methods and theory that can cope with arbitrarily high dimensions, if
the circumstances are right. (Note that we do not say that all high-dimensional problems can be successfully tackled by QMC methods. Rather,
the interest is in recognizing and analysing mathematically the particular
features that make some high-dimensional problems manageable.)
Applications have also played an important role in the development of
QMC methods suitable for high-dimensional problems. In financial mathematics, the numerical experiments of Paskov and Traub (1995), which
used low-discrepancy QMC methods in 360 dimensions to value parcels of
mortgage-backed obligations, were successful to a degree that caused universal surprise. This led to many theoretical developments, as researchers
struggled to understand how such high-dimensionality could be handled successfully (for example Caflisch, Morokoff and Owen 1997, Wang and Fang
2003). Option pricing problems have spurred many developments (for example Acworth, Broadie and Glasserman 1998, L’Ecuyer 2004, Giles, Kuo,
Sloan and Waterhouse 2008, Wang and Sloan 2011, Griebel, Kuo and Sloan
2013). A challenging class of problems in multivariate statistics led Kuo
et al. (2008a) to a new understanding of the importance of proper handling of the transformation process from Rs into [0, 1]s for problems over
unbounded regions. The problem of evaluating high-dimensional expected
values arising from partial differential equations with random coefficients,
typified by the flow of a liquid (oil or water) through a porous material,
with the permeability treated as a random field, is the newest driver of innovation (Graham et al. 2011, Kuo, Schwab and Sloan 2012). However, in
this survey the focus will be on the theoretical innovations rather than on
the applications themselves.
The structure of the survey is as follows. In Section 2 we introduce QMC
methods, and develop basic definitions and concepts without much theory.
In Section 3 we begin to develop key tools (such as reproducing kernel
Hilbert spaces) needed for error analysis of QMC methods. In Section 4
we introduce the modern theory of weighted function spaces, in which certain parameters (‘weights’) are introduced to describe the circumstance that
some variables (or groups of variables) are more important than others. In
Sections 5 and 6 we introduce more sophisticated material on lattice rules
and digital nets. Then in Section 7 we introduce a setting which is currently
attracting great interest, namely that of numerical integration in an infinite
number of dimensions. Finally in Section 8 we give concluding remarks. At
the end of each section we provide notes with references to further related
material.
High-dimensional integration: The quasi-Monte Carlo way
137
2. Introducing quasi-Monte Carlo methods
2.1. Multivariate numerical integration
Recall from (1.1) that our goal is to approximate a multiple integral over
the s-dimensional unit cube [0, 1]s ,
1
1
Is (f ) =
f (x) dx :=
···
f (x1 , . . . , xs ) dx1 · · · dxs ,
[0,1]s
0
0
where s is large, possibly in the hundreds or thousands. In practice most
integrals over bounded or unbounded regions can be transformed into this
desired form with a suitable change of variables. The transformation plays
an important role as it determines the behaviour of the transformed integrand f . We shall defer the choice and strategy of transformation to be
discussed later in Section 2.11.
For our analysis in this survey, we will generally assume that f is continuous and has a certain level of smoothness, as will be made precise in the
next section.
In the classical theory of numerical integration (Davis and Rabinowitz
1984), an integral in one dimension can be approximated by a quadrature
rule, as follows:
1
n−1
f (x) dx ≈
wi f (ti ),
0
i=0
where t0 , . . . , tn−1 ∈ [0, 1] are the quadrature
n−1 points, and w0 , . . . , wn−1 ∈ R
wi = 1. Well-known examples
are the quadrature weights satisfying i=0
include the (left) rectangle rule, which uses equally spaced points ti = i/n
and equal weights wi = 1/n and has the error f (ξ)/(2n) for some ξ ∈
(0, 1); the trapezoidal rule (which has second-order convergence); Simpson’s
rule (with fourth-order convergence); and the Gauss rules (which integrate
polynomials of degree 2n − 1 exactly).
For integration in s dimensions with s ≥ 2, we have cubature rules (Stroud
1971),
n−1
f (x) dx ≈
wi f (ti ),
[0,1]s
i=0
where t0 , . . . , tn−1 ∈
are the cubature points, ti = (ti,1 , . . . , ti,s ), and
n−1
wi = 1. Explicit
w0 , . . . , wn−1 ∈ R are the cubature weights satisfying i=0
cubature rules are available in low dimensions (Cools 1997). An obvious
way to approximate a high-dimensional integral is to use a product rule,
1
1
m−1
m−1
···
f (x1 , . . . , xs ) dx1 · · · dxs ≈
···
ui1 · · · uis f (vi1 , . . . , vis ),
[0, 1]s
0
0
i1 =0
is =0
138
J. Dick, F. Y. Kuo and I. H. Sloan
in effect applying a one-dimensional quadrature rule m−1
i=0 ui f (vi ) to each
of the s integrals. The cubature points ti are given by the s-fold product of
the quadrature points vi , and the cubature weights wi are the corresponding
products of the quadrature weights ui . The total number of points is n =
ms , which is enormous when s is large, while (since f could be a function
of only one component, such as f (x1 , . . . , xs ) = x21 ) the error is only the sth
root of the error of the one-dimensional quadrature rule: for example, the
product rectangle rule has error of order O(n−1/s ).
As the title suggests, this survey will focus on quasi-Monte Carlo methods
for high-dimensional integration. This section is devoted to introducing the
basic principles behind these methods. We begin in the next subsection with
a brief introduction to the related and widely used classical Monte Carlo
method.
Another competing technique for high-dimensional integration is based on
sparse grid methods (Bungartz and Griebel 2004), but we will not discuss
these methods in this survey.
2.2. Monte Carlo method
The classical Monte Carlo (MC ) method is an equal-weight cubature rule
of the form
n−1
1
f (ti ),
Qn,s (f ) =
n
i=0
where t0 , . . . , tn−1 are i.i.d. (independent and identically distributed) uniform random samples from [0, 1]s .
Theorem 2.1 (Monte Carlo root-mean-square error). For all squareintegrable functions f , we have
σ(f )
E[|Is (f ) − Qn,s (f )|2 ] = √ ,
n
where the expectation is taken with respect to the uniform random samples
t0 , . . . , tn−1 , and
σ 2 (f ) := Is (f 2 ) − (Is (f ))2
is the variance of f .
Although this result is well known, we give a proof since it will serve as
a model for later arguments.
Proof.
We have
E[|Qn,s (f ) − Is (f )|2 ] = E[(Qn,s (f ))2 ] − 2 E[Qn,s (f )]Is (f ) + (Is (f ))2 ,
High-dimensional integration: The quasi-Monte Carlo way
where
E[Qn,s (f )] =
=
···
[0,1]s
1
n
[0,1]s
n−1
i=0
139
n−1
1
f (ti ) dt0 . . . dtn−1
n
i=0
f (ti ) dti
[0,1]s
= Is (f )
and
E[(Qn,s (f )) ] =
=
[0,1]s
···
[0,1]s
=
[0,1]s
=
···
[0,1]s
[0,1]s
2
···
[0,1]s
1
f (ti )
n
i=0
n−1
2
dt0 . . . dtn−1
n−1 n−1
1 f (ti )f (tk ) dt0 . . . dtn−1
n2
1
n2
i=0 k=0
n−1
2
i=0
n−1 n−1
1 f (ti ) + 2
f (ti )f (tk ) dt0 . . . dtn−1
n
i=0 k=0
k=i
n−1 n−1 n−1 1 1 2
f
(t
)
dt
+
f
(t
)
dt
f (tk ) dtk
i
i
i
i
n2
n2
[0,1]s
[0,1]s
[0,1]s
i=0
i=0 k=0
k=i
1
n−1
Is (f 2 ) +
(Is (f ))2 .
n
n
Hence
=
E[|Qn,s (f ) − Is (f )|2 ] =
Is (f 2 ) − (Is (f ))2
,
n
as claimed.
Treating Qn,s (f ) as a random variable, we see from the above proof that
its mean is
E[Qn,s (f )] = Is (f )
(that is to say, the MC method is unbiased), and its variance is
Var[Qn,s (f )] = E[|Qn,s (f ) − Is (f )|2 ] =
σ 2 (f )
.
n
By the central limit theorem, if 0 < σ(f ) < ∞ then
c
1
σ(f )
2
=√
lim P |Is (f ) − Qn,s (f )| ≤ c √
e−x /2 dx.
n→∞
n
2π −c
In other words, we have a ‘probabilistic’ error bound with a convergence rate
140
J. Dick, F. Y. Kuo and I. H. Sloan
of O(n−1/2 ). The convergence rate is independent of the dimension. Comparing this with, for example, the product rectangle rule whose convergence
rate is O(n−1/s ), we see that the MC method has the faster convergence
when s ≥ 3 even for smooth functions.
Another advantage of the MC method is that it is easy to obtain a practical error estimation. Indeed, an unbiased estimator for Var[Qn,s (f )] is
given by
n−1
n−1
1
1
2
2
2
(f (ti ) − Qn,s (f )) =
f (ti ) − n[Qn,s (f )] .
n(n − 1)
n(n − 1)
i=0
i=0
(2.1)
The square root of this quantity provides an estimate for the root-meansquare MC error.
Although widely applicable, MC methods suffer from the slow O(n−1/2 )
convergence rate. Variance reduction techniques (e.g., importance sampling, stratified sampling, correlated sampling) can be used to improve the
efficiency of MC, but in practice MC methods often remain distressingly
slow. Bakhvalov (1959) proved that the O(n−1/2 ) rate of convergence cannot be improved for general square-integrable or continuous functions f .
For functions with more smoothness, this slow convergence rate is the main
motivation for the switch to quasi-Monte Carlo methods.
2.3. Quasi-Monte Carlo methods
Quasi-Monte Carlo (QMC ) methods are equal-weight cubature rules of the
form
n−1
1
f (ti ),
Qn,s (f ) =
n
i=0
just like MC methods, but now the points t0 , . . . , tn−1 ∈ [0, 1]s are chosen
deterministically to be better than random, in the sense that the deterministic nature of QMC leads to guaranteed error bounds, and that the convergence rate may be faster than the MC rate of O(n−1/2 ) for sufficiently
smooth functions.
There are two types of QMC methods.
• The ‘open’ type: this uses the first n points of an infinite sequence.
Thus to increase n one only needs to evaluate the integrand at the
additional cubature points.
• The ‘closed’ type: this uses a finite point set which depends on n. Thus
a new value of n means a completely new set of cubature points.
(Here and in the following, a point set is understood to be a multiset, that
is, we count points according to their multiplicity.)
141
High-dimensional integration: The quasi-Monte Carlo way
The following definition is needed before we provide examples of QMC
methods.
Definition 2.2 (radical inverse function). For integers i ≥ 0 and b ≥
2, we define the radical inverse function φb (i) as follows:
if
i=
∞
ia ba−1 ,
where
ia ∈ {0, 1, . . . , b − 1},
then
φb (i) :=
a=1
∞
ia
.
ba
a=1
In other words, if i = (· · · i2 i1 )b denotes the base b representation of i, then
φb (i) := (0.i1 i2 · · · )b .
Example 2.3 (van der Corput sequence). The van der Corput sequence in base b is the one-dimensional sequence
φb (0), φb (1), φb (2), . . . .
For example, take b = 2. First we write down the natural numbers 0, 1, 2, . . .
in base 2
0, 12 , 102 , 112 , 1002 , 1012 , 1102 , . . . .
Then we apply the radical inverse function φ2 to each number, to obtain
the sequence
0, 0.12 , 0.012 , 0.112 , 0.0012 , 0.1012 , 0.0112 , . . . ,
which in decimal form is the sequence
0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, . . . .
Example 2.4 (Halton sequence). Let p1 , p2 , . . . , ps be the first s prime
numbers. The Halton sequence t0 , t1 , . . . in s dimensions is given by
ti = φp1 (i), φp2 (i), . . . , φps (i) , i = 0, 1, . . . ,
that is, the jth components of points in the Halton sequence form the van
der Corput sequence in base pj , where pj is the jth prime. The Halton
sequence leads to an ‘open’ QMC method. We have explicitly
t0
t1
t2
t3
= (0, 0, 0, . . . , 0),
= (0.12 , 0.13 , 0.15 , . . . , 0.1ps ),
= (0.012 , 0.23 , 0.25 , . . . , 0.2ps ),
= (0.112 , 0.013 , 0.35 , . . . , 0.3ps )
..
.
The Halton sequence satisfies the error bound
|Is (f ) − Qn,s (f )| ≤ Cs
(log n)s
V (f ),
n
for all n ≥ 2,
142
J. Dick, F. Y. Kuo and I. H. Sloan
where Cs depends only on s, and V (f ) is the variation of f in the sense of
Hardy and Krause; see for instance Kuipers and Niederreiter (1974, p. 147,
Definition 5.2). For now we simply observe that, although the convergence
rate appears to beat the MC rate of O(n−1/2 ), the MC rate is independent
of s, whereas for fixed s, the function (log n)s /n increases with increasing n
for all n < exp(s).
Example 2.5 (Hammersley point set). Let p1 , p2 , . . . , ps−1 be the first
s − 1 prime numbers. The Hammersley point set {t0 , t1 , . . . , tn−1 } with n
points in s dimensions is given by
i
, φp1 (i), φp2 (i), . . . , φps−1 (i) , i = 0, 1, . . . , n − 1.
ti =
n
The Hammersley point set leads to a ‘closed’ QMC method. We have explicitly
t0 = (0, 0, . . . , 0),
t1 = n1 , 0.12 , 0.13 , . . . , 0.1ps−1 ,
t2 = n2 , 0.012 , 0.23 , . . . , 0.2ps−1
..
.
tn−1 = n−1
n ,... .
The QMC method based on the Hammersley point set satisfies the error
bound
(log n)s−1
V (f ), for all n ≥ 2,
|Is (f ) − Qn,s (f )| ≤ Cs
n
where Cs depends only on s. Note that there is one less power of log n
compared to the error bound for the Halton sequence. Typically the error
bounds for QMC methods based on ‘closed’ point sets are better than those
based on ‘open’ sequences.
Example 2.6 (Kronecker sequence). Let 1, α1 , α2 , . . . , αs ∈ R be linearly independent over Q. The Kronecker sequence t0 , t1 , . . . in s dimensions
is given by
ti = {iα1 }, {iα2 }, . . . , {iαs } , i = 0, 1, . . . ,
where the braces indicate that we take the fractional part of a real number,
that is,
{x} := x − x.
(2.2)
√
For example, a good choice of parameters is αj = pj where pj is the
jth prime number. The error bound for the Kronecker sequence takes the
same form as that of the Halton sequence, but with a different multiplying
constant, again depending on s.
High-dimensional integration: The quasi-Monte Carlo way
143
Two main families of QMC methods are currently under investigation:
• digital nets (‘closed’) and digital sequences (‘open’),
• lattice rules (‘closed’ and ‘open’).
We will introduce these two families in the next few subsections.
2.4. Lattice rules
A lattice in Rs is a discrete subset of Rs which is closed under addition and
subtraction. An integration lattice in Rs is a lattice which contains Zs as
a subset. A lattice rule is an equal-weight cubature rule whose cubature
points are those points of an integration lattice that lie in the half-open
unit cube [0, 1)s .
Every lattice point set includes the origin 0. The projection of the lattice
points onto each axis gives equally spaced points. In a sense the integral in
each dimension is approximated by a rectangle rule (or a trapezoidal rule
if the integrand is periodic). Every lattice rule can be written as a multiple
sum involving one or more generating vectors. The minimal number of
generating vectors required to generate a lattice rule is known as the rank
of the rule. Besides rank-one lattice rules involving just one generating
vector, there exist lattice rules having rank up to s. Lattice rules were
introduced by Korobov (1959). They were originally designed for periodic
integrands.
Definition 2.7 (rank-one lattice rule). An n-point rank-one lattice rule
in s dimensions, also known as the method of good lattice points, is a QMC
method with cubature points
iz
, i = 0, 1, . . . , n − 1,
(2.3)
ti =
n
where z ∈ Zs , known as the generating vector, is an s-dimensional integer
vector having no factor in common with n, and the braces around the vector
indicate, as in (2.2), that we take the fractional part of each component in
the vector.
Example 2.8 (Fibonacci lattice). Let z = (1, Fk ) and n = Fk+1 , where
Fk and Fk+1 are consecutive Fibonacci numbers. Then the resulting twodimensional lattice point set is called a Fibonacci lattice; see Figure 2.1 for
two examples. Fibonacci lattices in two dimensions have a certain optimality
property, but there is no obvious generalization to higher dimensions that
retains the optimality property.
Since we are only interested in the fractional part of iz/n, the components
of z can be restricted to the set
Zn := {0, 1, 2, . . . , n − 1}.
144
J. Dick, F. Y. Kuo and I. H. Sloan
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Figure 2.1. The Fibonacci lattices with 5 points and 55 points. The
corresponding generating vectors are (1, 3) and (1, 34), respectively.
Moreover, since we would prefer that every one-dimensional projection of
the lattice rule has n distinct values, the components of z can be further
restricted to the set
Un := {z ∈ Z : 1 ≤ z ≤ n − 1 and gcd(z, n) = 1}.
The number of elements in the set Un is ϕ(n) := |Un |, the Euler totient
function. If n = pa11 pa22 · · · pakk is the prime factorization of n, then
ϕ(n) = (pa11 − p1a1 −1 )(pa22 − p2a2 −1 ) · · · (pakk − pkak −1 ).
Asymptotically, ϕ(n) grows at a rate close to n: 1/ϕ(n) = O((log log n)/n).
For simplicity, we often assume that n is prime and thus ϕ(n) = n − 1.
This implies that:
• there are n − 1 choices for each component of z,
• there are (n − 1)s choices for the generating vector z.
For large n and s, an exhaustive search to find a generating vector that
minimizes some desired error criterion is practically impossible. We now
present two approaches for constructing lattice generating vectors in high
dimensions.
Example 2.9 (Korobov construction).
1 ≤ a ≤ n − 1 and gcd(a, n) = 1, we define
Given an integer a satisfying
z = z(a) := (1, a, a2 , . . . , as−1 ) mod n.
There are (at most) n − 1 choices for the Korobov parameter a, leading to
(at most) n − 1 choices for the generating vector z. Thus it is feasible in
practice to search through the (at most) n − 1 choices and take the one that
minimizes the desired error criterion.
High-dimensional integration: The quasi-Monte Carlo way
145
Example 2.10 (component-by-component (CBC) construction).
Given n, we construct a generating vector z = (z1 , z2 , . . .) as follows.
(1) Set z1 = 1.
(2) With z1 held fixed, choose z2 from Un to minimize a desired error
criterion in 2 dimensions.
(3) With z1 , z2 held fixed, choose z3 from Un to minimize a desired error
criterion in 3 dimensions.
(4) With z1 , z2 , z3 held fixed, choose z4 from Un to minimize a desired error
criterion in 4 dimensions.
(5) . . .
The generating vector obtained by the CBC construction is extensible in s.
In Section 5 we will consider two important aspects of the CBC construction:
error analysis and computational cost. We will also discuss how to obtain
extensible lattice sequences that are extensible in both s and n.
2.5. (t, m, s)-nets and (t, s)-sequences
Nets and sequences provide another method for obtaining well-distributed
point sets in the unit cube [0, 1)s that are useful for QMC integration. The
concept of (t, m, s)-net is based on subdividing the unit cube into intervals
and placing points in the cube such that each interval of a certain size and
shape contains the ‘correct’ number of points.
Definition 2.11 ((t, m, s)-net). Let t ≥ 0, m ≥ 1, s ≥ 1, and b ≥ 2 be
integers with t ≤ m. A (t, m, s)-net in base b is a point set P consisting of
bm points in [0, 1)s such that every elementary interval of the form
s aj aj + 1
(2.4)
,
bdj bdj
j=1
with each dj ≥ 0, 0 ≤ aj < bdj , and d1 + d2 + · · · + ds = m − t, contains
exactly bt points of P .
An elementary interval (2.4) has volume b−(d1 +d2 +···+ds ) = bt−m , which
is precisely the proportion of the points from P that lie in this elementary
interval. One would expect such a property to hold if the point set is uniformly distributed. Figure 2.2 provides an illustration of a two-dimensional
net with 16 points.
Note that any point set consisting of bm points in [0, 1)s is trivially an
(m, m, s)-net in base b, indicating that the concept of (t, m, s)-net is only
useful when t < m. For fixed b and m, the (t, m, s)-net condition gets
stronger as t gets smaller. Hence the aim is to find (t, m, s)-nets in base
146
J. Dick, F. Y. Kuo and I. H. Sloan
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Figure 2.2. Illustration of a (0, 4, 2)-net in base 2: every elementary
interval of volume 1/16 contains exactly one of the 16 points. A point
that lies on the dividing line counts toward the interval above or to
the right.
b where t is small. A (t, m, s)-net is said to be a strict (t, m, s)-net if it
is not a (t − 1, m, s)-net. The ‘t-value’ is often referred to as the quality
parameter. It is also not useful to choose the base b too large, noting that in
the extreme case in which b = n we have m = 1, so that even when t = 0 the
only elementary intervals are the ‘boxes’ of side length 1 and thickness 1/n,
in which case even the set of points k(1/n, 1/n, . . . , 1/n) for k = 0, . . . , n − 1
lying on the main diagonal is a (0, m, s)-net. The relevant quantity for fixed
n is the strength k = m − t of the (t, m, s)-net. The aim is then to find nets
with large strength k.
There is an analogous concept for infinite sequences.
Definition 2.12 ((t, s)-sequences). Let t ≥ 0 and s ≥ 1 be integers. A
(t, s)-sequence in base b is a sequence of points S = (t0 , t1 , . . .) in [0, 1)s such
that for any integers m > t and ≥ 0, every block of bm points
tbm , . . . , t(+1)bm −1
in the sequence S forms a (t, m, s)-net in base b.
The van der Corput sequence in base b is a (0, 1)-sequence in base b.
Higher-dimensional examples include the Sobol sequence (Sobol 1967),
which is a (t, s)-sequence in base 2, where t is a non-decreasing function of s;
the Faure sequence (Faure 1982); the Niederreiter sequence (Niederreiter
1987); and the Niederreiter–Xing sequence (Niederreiter and Xing 1995).
High-dimensional integration: The quasi-Monte Carlo way
147
More details about these sequences will be given in Section 2.7. These sequences are also commonly referred to as ‘low-discrepancy sequences’ due to
their high uniformity. We will explain the term ‘discrepancy’ in Section 3.
2.6. Digital construction scheme
We now introduce a method for the construction of (t, m, s)-nets and (t, s)sequences. This method is based on linear algebra over finite fields.
Let b be prime and let Zb := {0, 1, . . . , b−1} denote the equivalence classes
of integers modulo b. Using addition + and multiplication ∗ of elements in
Zb modulo the prime b, we obtain a finite field of order b. In the following
we identify the element k in the finite field Zb with the integer 0 ≤ k < b.
Let C1 , . . . , Cs be m-by-m matrices with entries in Zb . Then ti,j , the jth
component of the ith point in P = {t0 , . . . , tbm −1 }, can be constructed as
follows.
(1) Write i in its base b representation:
i = (im · · · i2 i1 )b = i1 + i2 b + · · · + im bm−1 .
(2) Compute


 
y1
i1
 y2 
 i2 
 
 
 ..  = Cj  .. ,
 . 
. 
ym
im
where all additions and multiplications are done in the finite field Zb ,
i.e., modulo b.
(3) Set
y 1 y2
ym
+ 2 + ··· + m.
b
b
b
The resulting point set P = {t0 , . . . , tbm −1 } is called a digital net over Zb
and the matrices C1 , . . . , Cs are called the generating matrices of the digital
net. The following definition connects (t, m, s)-nets in base b with digital
nets.
ti,j = (0.y1 y2 · · · ym )b =
Definition 2.13 (digital (t, m, s)-net). Let b be prime, t ≥ 0, m ≥ 1,
and s ≥ 1 be integers. If a digital net P constructed over Zb is a (t, m, s)-net
in base b, then P is called a digital (t, m, s)-net over Zb .
The following lemma provides the connection between the generating matrices of a digital net and the (t, m, s)-net property.
Lemma 2.14 (t-value for a digital net). A digital net over Zb with
generating matrices C1 , . . . , Cs is a digital (t, m, s)-net over Zb if and only if,
148
J. Dick, F. Y. Kuo and I. H. Sloan
for all d1 , . . . , ds ≥ 0 with d1 + · · · + ds = m − t, the set of vectors
c1,1 , c1,2 , . . . , c1,d1 , c2,1 , c2,2 , . . . , c2,d2 , . . . , cs,1 , cs,2 , . . . , cs,ds
is linearly independent over Zb , where cj, denotes the th row vector of the
matrix Cj .
Proof.
Let d1 , . . . , ds ≥ 0 such that d1 + · · · + ds = m − t, let
s aj aj + 1
J=
,
bdj bdj
j=1
be an elementary interval, and let
uj,d
uj,1 uj,2
aj
+ 2 + ··· + d j .
=
d
j
b
b
b
bj
Thus, for given d1 , . . . , ds , we can uniquely associate the vector
u = (u1,1 , . . . , u1,d1 , . . . , us,1 , . . . , us,ds )
to the elementary interval J. Let {t0 , t1 , . . . , tbm −1 } be a digital net with
. Let ti = (ti,1 , ti,2 , . . . , ti,s ) with
generating matrices C1 , . . . , Cs ∈ Zm×m
b
−1
−2
−m
ti,j = τi,j,1 b + τi,j,2 b + · · · + τi,j,m b . Then ti ∈ J if and only if
uj, = τi,j, for 1 ≤ ≤ dj and 1 ≤ j ≤ s. Let Cj = (c
j,1 , . . . , cj,m ) , that is,
cj, is the th row vector of Cj . Let A = (c
1,1 , . . . , c1,d1 , . . . , cs,1 , . . . , cs,ds ) .
Then the number of points of the digital net which lie in J is given by the
number of solutions (i1 , . . . , im ) of the equation over Zb :
 
i1
 i2 
 
A  ..  = u.
(2.5)
. 
im
For each vector u the number of solutions of (2.5) is bm−t if and only if the
rows of A are linearly independent. Thus the result follows.
Constructions of good generating matrices for digital nets yielding digital
(t, m, s)-nets with small t-value will be discussed in Section 2.7. In the
following we give a two-dimensional example of digital (0, m, 2)-nets over
Zb for arbitrary prime b.
Example 2.15.
Let b be a prime and let

1 0 ··· ···
0 1
0 ···

 .. . .
.
.. ...
C1 =  .
.

0 · · · 0
1
0 ··· ··· 0
m ≥ 1 be an integer. Let

0
0

..  ∈ Zm×m
.
b

0
1
High-dimensional integration: The quasi-Monte Carlo way
and

0 ···
0 · · ·


C2 =  ... . . .

0 1
1 0
···
0
.
..
0
···
0
1
..
.
···
···
149

1
0

..  ∈ Zm×m .
.
b

0
0
Then Lemma 2.14 implies that C1 and C2 are generating matrices of a
digital (0, m, 2)-net over Zb .
The concept of digital nets can be extended to infinite sequences S =
(t0 , t1 , . . .) in [0, 1)s . To do so, we need infinite matrices C1 , . . . , Cs ∈ ZN×N ,
with Cj = (cj,k, )k,∈N and cj,k, ∈ Zb , such that for every there is a k
such that cj,k, = 0 for all k > k . Then ti,j , the jth component of the ith
point in the sequence S is constructed in the following way.
(1) Write i in its base b representation:
i = (· · · i2 i1 )b = i1 + i2 b + · · · .
(Note that only finitely many digits ik are different from 0.)
(2) Compute
 
 
y1
i1
 y2 
i2 
  = Cj  ,
..
..
.
.
where all additions and multiplications are done in the finite field Zb .
(3) Set
y 1 y2
+ 2 + ··· .
b
b
The resulting sequence S = (t0 , t1 , . . .) is called a digital sequence over Zb
and the matrices C1 , . . . , Cs are called the generating matrices of the digital
sequence. The following definition connects (t, s)-sequences in base b with
digital sequences.
ti,j = (0.y1 y2 · · · )b =
Definition 2.16 (digital (t, s)-sequences). Let b be prime, and let t ≥ 0
and s ≥ 1 be integers. If a digital sequence S constructed over Zb is a (t, s)sequence in base b, then S is called a digital (t, s)-sequence over Zb .
Constructions of good generating matrices for digital sequences yielding
digital (t, s)-sequences with small t-value will be discussed in Section 2.7.
In the following we give a one-dimensional example of a (0, 1)-sequence over
Zb for arbitrary prime b.
Example 2.17. Let b be a prime. Let C ∈ ZbN×N be the matrix with
ones along the main diagonal and zeros everywhere else (i.e., the identity
150
J. Dick, F. Y. Kuo and I. H. Sloan
matrix). Then C generates a digital (0, 1)-sequence over Zb . In fact, C
generates the van der Corput sequence in base b.
2.7. Constructions of digital nets and sequences
In this subsection we discuss explicit constructions of digital nets and sequences with small t value.
In the previous subsection we introduced (digital) (t, m, s)-nets and (digital) (t, s)-sequences and we gave explicit constructions for small dimensions.
The first constructions in arbitrary dimensions are due to Sobol (1967) and
Faure (1982), before the general concept of (digital) (t, s)-sequence was introduced in Niederreiter (1987). In the following we discuss these explicit
constructions in detail.
We use the following notation. Let Zb [x] denote the set of polynomials
over the finite field Zb . The polynomial p(x) = a0 + a1 x + · · · + am xm with
am = 0 is said to have degree deg(p) = m. For p = 0 we set deg(p) = −∞.
A polynomial p ∈ Zb [x] is irreducible if it cannot be written as a product of
two non-constant polynomials, that is, there are no q, q ∈ Zb [x] such that
p(x) = q(x)q (x) with deg(q), deg(q ) > 0. A polynomial p(x) = a0 + a1 x +
· · ·+am xm ∈ Zb [x] of degree m ≥ 1 is primitive if and only if am = 1, a0 = 0
and the smallest integer k such that there is a polynomial q ∈ Zb [x] with
p(x)q(x) = xk −1 ∈ Zb [x] is k = bm −1. It can be shown that every primitive
polynomial is irreducible. Further, if p is primitive, then the polynomial x is
a primitive element in Zb [x]/p, that is, {xk mod p : k = 0, 1, . . . , bm − 2} =
(Zb [x]/p) \ {0}.
Example 2.18 (Sobol sequence). Sobol (1967) was the first to introduce a construction of (t, s)-sequences over Z2 , which are now referred to as
Sobol sequences. These are digital sequences over the finite field Z2 .
(1) Let p1 , . . . , ps ∈ Z2 [x] be distinct primitive polynomials ordered according to their degree, and let
pj (x) = xej + a1,j xej −1 + a2,j xej −2 + · · · + aej −1,j x + 1,
for 1 ≤ j ≤ s,
where aj,k ∈ Zb . Note that ej denotes the degree of the polynomial pj .
(2) Choose odd natural numbers 1 ≤ m1,j , . . . , mej ,j such that mk,j < 2k
for 1 ≤ k ≤ ej , and for all k > ej define mk,j recursively by
mk,j = 2a1,j mk−1,j ⊕· · ·⊕2ej −1 aej −1,j mk−ej +1,j ⊕2ej mk−ej ,j ⊕mk−ej ,j ,
where ⊕ is the bit-by-bit exclusive-or operator.
(3) The so-called direction numbers are defined by
mk,j
vk,j := k , for k ≥ 1.
2
High-dimensional integration: The quasi-Monte Carlo way
151
(4) Then for i ∈ N0 with dyadic expansion i = i0 + 2i1 + · · · + 2r−1 ir−1 we
define
ti,j = i0 v1,j ⊕ i1 v2,j ⊕ · · · ⊕ ir−1 vr,j .
The Sobol sequence is then the sequence of points t0 , t1 , . . . , where
ti = (ti,1 , . . . , ti,s ).
Sobol sequences are digital (t, s)-sequences with
t=
s
(ej − 1).
j=1
This result holds for all choices of direction numbers, and therefore the
direction numbers do not influence the overall quality of Sobol sequences.
Note, however, that the t-value for a Sobol net of bm points for different m
can be smaller than the t-value of the full sequence, and hence the choice
of direction numbers can affect the quality of a Sobol net.
Example 2.19 (Faure sequence). Faure (1982) introduced a construction of (0, s)-sequences over prime fields Zb with s ≤ b, which are now called
Faure sequences. The generating matrices C1 , . . . , Cs are given by
Cj = (P )j−1
(mod b)
for 1 ≤ j ≤ s,
where P is the Pascal matrix given by
0
0
0

...
0
1
2
1
1
1
. . .
 0

,
12
22
P =
 2
. . .
1
2
 0

..
..
.. . .
.
.
.
.
k where we set = 0 for > k. A Faure sequence is a digital (0, s)-sequence.
Example 2.20 (Niederreiter sequence). Niederreiter (1987) introduced
the general concept of digital (t, s)-sequences, which includes both the Sobol
and Faure constructions as special cases. We describe a special case of this
sequence in the following.
Let Zb = {0, 1, . . . , b − 1} denote the finite field of prime order b with
addition and multiplication modulo b. Let Zb [x] denote the set of polynomials with coefficients in Zb . Addition and multiplication of polynomials
in Zb is defined by using arithmetic in Zb (i.e., modulo b). This can also
be extended to division of polynomials in Zb , which gives rise to the set of
formal series Zb ((x−1 )), which are series of the form
∞
=w
a x−
152
J. Dick, F. Y. Kuo and I. H. Sloan
for some integer w and coefficients a ∈ Zb . Note that this set contains
the set of polynomials Zb [x]. This set now permits addition, subtraction,
multiplication and division (except dividing by 0 of course) of formal series
where the coefficients are added, subtracted, multiplied and divided modulo b. Thus Zb ((x−1 )) is a field, which is called the field of formal Laurent
series. The construction of the generating matrices is based on arithmetic
in this field.
(1) Let p1 , p2 , . . . , ps ∈ Zb [x] be distinct monic irreducible polynomials over
Zb (monic means that the coefficient of the monomial in pj with the
highest degree is 1). Let ej = deg(pj ) for 1 ≤ j ≤ s. The best results
are obtained by choosing the degree of the polynomials p1 , p2 , . . . , ps
as small as possible.
(2) For integers 1 ≤ j ≤ s, u ≥ 1 and 0 ≤ k < ej , consider the expansions
∞
xej −k−1 (j)
=
a (u, k, )x−−1
pj (x)u
(2.6)
=0
over the field of formal Laurent series Zb ((x−1 )).
(3) Then we define the entries in the matrix Cj = (cj,i, )i,∈N in the following way:
cj,i, = a(j) (Q + 1, k, ) ∈ Zb
for 1 ≤ j ≤ s, i ≥ 1, ≥ 0,
where i − 1 = Qej + k with integers Q = Q(i, j) and k = k(i, j), with
k satisfying 0 ≤ k < ej .
We briefly describe how to compute the coefficients a(j) (u, k, ) recursively. Let u, j be fixed and let v = a(j) (u, ej − 1, ) for all ≥ 0. Then
a(j) (u, k, ) = v+ej −1−k for 0 ≤ k < ej . Let pj (x)u = xuej − buej −1 xuej −1 −
· · ·−b0 , where we assume without loss of generality that pj (x) is monic (i.e.,
the coefficient of xej is 1). Then (2.6) can be written as
1 = (v0 x−1 + v1 x−2 + · · · )(xuej − buej −1 xuej −1 − · · · − b0 ).
By comparing coefficients we obtain v0 = · · · = viej −2 = 0, viej −1 = 1 and
vuej + = buej −1 vuej +−1 + · · · + b0 v ∈ Zb ,
for all ≥ 0.
The Niederreiter sequence is a digital (t, s)-sequence with
t=
s
(ej − 1).
j=1
The Faure sequence can be obtained as a special case of the Niederreiter
sequence, which corresponds to the case where the base b is a prime number
High-dimensional integration: The quasi-Monte Carlo way
153
such that b ≥ s and pj (x) = x − j + 1 for 1 ≤ j ≤ s. The Sobol sequence is
obtained by choosing p1 (x) = x and p2 , p3 , . . . , ps as primitive polynomials.
This yields a Sobol sequence with a special choice of direction numbers. The
Niederreiter sequence can also be generalized, where xej −k−1 is replaced
by a polynomial of degree ej − k − 1. The choice of polynomials then
corresponds to choosing different direction numbers. This way one can
obtain the generating matrices for a Sobol sequence. On the other hand, an
algorithm similar to the one introduced above for Sobol sequences can also
be obtained for Niederreiter sequences, which yields a fast implementation
of Niederreiter sequences.
A practical implementation of digital nets and sequences can use a Gray
code ordering to improve efficiency.
2.8. Polynomial lattice rule construction
In this subsection we discuss polynomial lattice rules, a concept analogous
to lattice rules, but based on linear algebra over finite fields; they form a
special class of digital nets.
Let b be prime, let p be a polynomial of degree m with coefficients in Zb ,
and let q1 , . . . , qs be polynomials of degree at most m − 1 with coefficients
in Zb . Then Cj , the jth generating matrix, can be chosen as follows.
(1) Let u1 , u2 , . . . ∈ Zb be such that
qj (x)
u1 u2
=
+ 2 + ··· .
p(x)
x
x
For given qj (x) and p(x), the values of u1 , u2 , . . . can be obtained by
equating coefficients in qj (x) = (u1 /x + u2 /x2 + · · · )p(x), noting that
all additions and multiplications are to be done in Zb .
(2) Set

u1
 u2


Cj =  u3
 ..
 .
u2
u3
.
..
.
..
u3
.
..
.
..
.
..
···
.
..
.
..
.
..
um um+1 um+2 · · ·

um
um+1 

um+2 
.
 ∈ Zm×m
b
.. 

.
u2m−1
The digital net with generating matrices C1 , . . . , Cs as defined above is called
a polynomial lattice point set, and a QMC rule using a polynomial lattice
point set is called a polynomial lattice rule. The polynomial p is referred to
as the modulus, and the vector of polynomials (q1 , . . . , qs ) is referred to as
the generating vector.
There is also a faster method for generating the cubature points of a
polynomial lattice rule which does not use the generating matrices. For
154
J. Dick, F. Y. Kuo and I. H. Sloan
the following we need to introduce some definitions and some notation. As
before, let Zb [x] denote the set of polynomials with coefficients in Zb and
let Zb ((x−1 )) denote the set of formal Laurent series
∞
a x− ,
=w
where a ∈ Zb . For m ∈ N let υm be the map from Zb ((x−1 )) to the interval
[0, 1) defined by
∞
m
−
=
t x
t b− .
υm
=w
=max(1,w)
We frequently associate a non-negative integer k, with b-adic expansion
k = κ0 +κ1 b+· · ·+κa ba , with the polynomial k(x) = κ0 +κ1 x+· · ·+κa xa ∈
Zb [x] and vice versa. Further, for arbitrary k = (k1 , . . . , ks ) ∈ Zb [x]s and
q = (q1 , . . . , qs ) ∈ Zb [x]s , we define the ‘inner product’
k·q =
s
ki qi ∈ Zb [x],
i=1
and we write q ≡ 0 (mod p) if p divides q in Zb [x].
With these definitions we can give the following equivalent but simpler
form of the construction of P(q, p).
Theorem 2.21. Let b be a prime and let m, s ∈ N. For p ∈ Zb [x] with
deg(p) = m and q = (q1 , . . . , qs ) ∈ Zb [x]s , the polynomial lattice point set
P(q, p) is the point set consisting of the bm points
i(x)q1 (x)
i(x)qs (x)
, . . . , υm
∈ [0, 1)s ,
t i = υm
p(x)
p(x)
for i ∈ Zb [x] with deg(i) < m.
The task of choosing good generating matrices is now replaced by the
task of choosing a good generating vector (q1 , . . . , qs ). The total number of
2
possible choices of generating matrices is reduced from bm s to bms .
Example 2.22. In this example we show the polynomial lattice rule analogue of Fibonacci lattice rules; they are constructed via the continued fraction expansion of the ratio q2 (x)/p(x). Let b be prime and let m ≥ 1 be an
integer. Set q1 (x) = 1. Let p(x) be a polynomial over Zb of degree m and
let q2 (x) be the polynomial over Zb defined by
1
q2 (x)
=
p(x)
1+x+
1+x+
1
.
1
1+x+
..
.
155
High-dimensional integration: The quasi-Monte Carlo way
The corresponding polynomial lattice point set is a digital (0, m, 2)-net over
Zb . For instance, let b = 2. Then for m = 2 we obtain
q2 (x)
1
=
p(x)
1+x+
1
1+x
=
1+x
1+x
=
,
(1 + x)2 + 1
x2
and thus p(x) = x2 and q2 (x) = 1 + x. For m = 3 we obtain
1
q2 (x)
=
p(x)
1+x+
1
1
1+x+ 1+x
=
x2
x2
=
,
(1 + x)x2 + (1 + x)
x3 + x2 + x + 1
and thus p(x) = x3 + x2 + x + 1 and q2 (x) = x2 .
2.9. Randomization and error estimation: shifted lattice rules
We recall that the mean-square MC error can be estimated in practice using
(2.1). In comparison, a fully deterministic QMC method, although having
a faster rate of convergence, is biased and lacks a practical error estimate.
‘Randomized’ QMC methods combine the best of both worlds; their advantages are as follows.
• Randomization yields an unbiased estimator.
• Randomization provides a practical error estimate.
• Randomized QMC methods enjoy faster rates of convergence than the
MC method for smooth functions.
• Some randomization technique (see ‘scrambling’ below) can further
improve the QMC rate of convergence by an additional O(n−1/2 ).
Here we discuss the simplest form of randomization called shifting, and
we explain how an error estimate can be obtained in practice. We begin
with a remark that shifting preserves the lattice structure, and therefore this
randomization technique typically goes with lattice rules, to yield so-called
shifted lattice rules. Having said that, shifting can be used together with
any QMC method to obtain a practical error estimate (however, it need not
preserve the original structure of the QMC point set).
The idea behind shifting is to move all the points in the same direction
by the same amount. If any point falls outside the unit cube then it is
‘wrapped’ back into the cube from the opposite side. More precisely, given
a vector ∆ ∈ [0, 1]s , known as the shift, the ∆-shift of the QMC points
t0 , . . . , tn−1 yields points
{ti + ∆},
i = 0, 1, . . . , n − 1,
(2.7)
where, as in (2.2), the braces indicate taking the fractional parts. Figure 2.3
illustrates how shifting is done.
156
J. Dick, F. Y. Kuo and I. H. Sloan
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
(a)
(b)
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
(c)
Figure 2.3. Applying a (0.1, 0.3)-shift to a 64-point lattice rule in
two dimensions: (a) original lattice rule, (b) moving all points by
(0.1, 0.3), (c) wrapping the points back inside the unit cube.
For a random shift ∆ ∈ [0, 1]s , the shifted QMC points {ti + ∆}, i =
0, 1, . . . , n − 1 are correlated. Therefore we cannot estimate the variance of
the shifted QMC rule using the sample variance as in (2.1). Instead, we
need to use a number of independent random shifts as follows.
(1) We generate q independent random shifts ∆0 , ∆1 , . . . , ∆q−1 from the
uniform distribution on [0, 1]s .
(0)
(1)
(2) For a given QMC rule, we form the approximations Qn,s (f ), Qn,s (f ),
(q−1)
. . . , Qn,s (f ), where
1
f ({ti + ∆k }),
n
n−1
Q(k)
n,s (f ) =
k = 0, 1, . . . , q − 1,
i=0
is the approximation of the integral using a ∆k -shift of the original
QMC rule.
(3) We take the average
1 (k)
Qn,s (f )
Q̄n,s,q (f ) =
q
q−1
k=0
as our final approximation to the integral.
(4) An unbiased estimate for the mean-square error of Q̄n,s,q (f ) is given
by
q−1
1
2
(Q(k)
n,s (f ) − Q̄n,s,q (f )) .
q(q − 1)
k=0
Typically we take n in the thousands or more while keeping q small, say
around 10–50. To obtain a fair comparison between the MC method and
High-dimensional integration: The quasi-Monte Carlo way
157
a randomized QMC method (e.g., for the two methods to have the same
number of function evaluations), we should take nMC = q · nQMC .
Scrambling is a popular but more complicated randomization method.
The idea is to randomly and recursively permute the points between elementary intervals so that the digital net structure is preserved. More details
will be given in the next subsection. For now we just introduce a related
randomization technique called digital shift, which is similar to shifting. We
just need to replace (2.7) by
ti ⊕ ∆,
i = 0, 1, . . . , n − 1,
where ⊕ denotes the digit-wise addition operator in base b defined as follows.
If x, σ ∈ [0, 1) have base b representations
x = (0.x1 x2 · · · )b
and
σ = (0.σ1 σ2 · · · )b ,
then y = x ⊕ σ = (0.y1 y2 · · · )b , where
yi = (xi + σi ) mod b.
The procedure for estimating the error is the same as for shifting.
For later use we also define digit-wise subtraction y = xσ = (0.y1 y2 · · ·)b ,
which is defined by yi = (xi − σi ) mod b.
2.10. Scrambled nets and sequences
Randomization methods are designed to introduce randomness into the
point sets but at the same time should preferably keep the relevant structure intact. In the case of (t, m, s)-nets this means that the net should still
be a (t, m, s)-net after the randomization (with probability 1). This can be
achieved by the following algorithm introduced by Owen (1997a).
We first introduce Owen’s scrambling algorithm, which is most easily
described for some generic point x ∈ [0, 1)s , with x = (x1 , . . . , xs ) and
xj = xj,1 b−1 + xj,2 b−2 + · · · . The scrambled point will be denoted by
y ∈ [0, 1)s , where y = (y1 , . . . , ys ) and yj = yj,1 b−1 + yj,2 b−2 + · · · . The
point y is obtained by applying permutations to each digit of each coordinate of x. The permutation applied to xj, depends on xj,k for 1 ≤ k < .
Specifically, yj,1 = πj (xj,1 ), yj,2 = πj,xj,1 (xj,2 ), yj,3 = πj,xj,1 ,xj,2 (xj,3 ), and in
general
yj,k = πj,xj,1 ,...,xj,k−1 (xj,k ),
(2.8)
where πj,xj,1 ,...,xj,k−1 is a random permutation of {0, . . . , b − 1}. We assume
that permutations with different indices are chosen mutually independent
from each other and that each permutation is chosen with the same probability. In this case the scrambled point y is uniformly distributed in [0, 1)s .
We show this fact in Section 6.
158
J. Dick, F. Y. Kuo and I. H. Sloan
We now introduce some convenient notation to describe Owen’s scrambling. For 1 ≤ j ≤ s let
Πj = {πj,xj,1 ,...xj,k−1 : k ∈ N, xj,1 , . . . , xj,k−1 ∈ {0, . . . , b − 1}}
(where for k = 1 we set πj,xj,1 ,...xj,k−1 = πj ) be a given set of permutations,
and let Π = (Π1 , . . . , Πs ). Then, when applying Owen’s scrambling using
these permutations to some point x ∈ [0, 1)s , we write y = Π(x), where
y is the point obtained by applying Owen’s scrambling to x using the permutations Π1 , . . . , Πs . For x ∈ [0, 1) we drop the subscript j and just write
y = Π(x).
To scramble a (t, m, s)-net or a (t, s)-sequence, we choose the permutations in Π for each coordinate independently from each other and then
apply these permutations to each point of the (t, m, s)-net or (t, s)-sequence
as explained above. For instance, for a given (t, m, s)-net t0 , t1 , . . . , tbm −1 ,
the Owen-scrambled (t, m, s)-net is given by
Π(t0 ), Π(t1 ), . . . , Π(tbm −1 ).
In practice, the number of permutations one has to choose is large, but
there are simplifications of the randomization procedure which greatly reduce this number. This yields a fast scrambling algorithm. More details are
given in Section 6.
In Section 6 we show that scrambling also yields an unbiased estimate of
the integral, that is,
n−1
1
f (Π(ti )) =
f (x) dx.
E
n
[0,1]s
i=0
In the same way as for random (digital) shifts, one can also obtain an
unbiased estimate of the mean-square error for scrambling. It is sufficient
to choose only one set of random permutations Π. Let
Qn1 ,n2 ,s (f ) =
n
2 −1
1
f (Π(ti )).
n2 − n1
i=n1
Let 1 ≤ m < m. The mean-square error of the cubature rule applied to f
can now be estimated by
bm
−1
1
bm (bm
− 1)
2
Qibm−m ,(i+1)bm−m ,s (f ) − Q0,bm ,s (f ) ,
i=0
where m can be chosen such that, say, bm ≈ 30.
High-dimensional integration: The quasi-Monte Carlo way
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
159
Figure 2.4. Owen’s scrambling in base 2: (a) original; (b) swap left and
right halves; (c) swap 3rd and 4th vertical quarters; (d) swap 3rd and 4th,
7th and last vertical eighths; (e) swap 3rd and 4th, 7th and 8th, 9th and
10th, 15th and last sixteenths; (f) swap 1st and 2nd horizontal quarters;
(g) swap 1st and 2nd, 5th and 6th, 7th and last horizontal eighths; (h) swap
3rd and 4th, 7th and 8th, 9th and 10th, 15th and last horizontal sixteenths.
160
J. Dick, F. Y. Kuo and I. H. Sloan
2.11. Transformation to the unit cube
In this subsection we discuss some simple strategies for transforming an
integral over Rs to an integral over [0, 1]s . We begin with the univariate
case.
Consider an integral
∞
g(y) φ(y) dy,
−∞
where φ : R → R is some univariate
probability density function, that is,
∞
φ(y) ≥ 0 for all y ∈ R and −∞ φ(y) dy = 1. Let Φ : R → [0, 1] denote the
y
cumulative distribution function of φ, that is, Φ(y) := −∞ φ(t) dt, and let
Φ−1 : [0, 1] → R denote the inverse of the cumulative distribution function.
Then we can use the substitution (or change of variables)
x = Φ(y) ⇐⇒ y = Φ−1 (x),
to obtain
∞
1
g(y) φ(y) dy =
−∞
−1
g(Φ
1
(x)) dx =
0
f (x) dx,
0
with the transformed integrand defined by f := g ◦ Φ−1 .
Note that this is not the only way to obtain a transformed integrand.
Indeed, we can divide and multiply the original integrand by any other
probability density function φ̃, and then map to [0, 1] using its inverse cumulative distribution function Φ̃−1 :
∞
∞
g(y) φ(y)
g(y) φ(y) dy =
φ̃(y) dy
φ̃(y)
−∞
−∞
1
1
∞
−1
g̃(y) φ̃(y) dy =
g̃(Φ̃ (x)) dx =
f˜(x) dx,
=
−∞
0
0
where g̃(y) := g(y) φ(y)/φ̃(y), giving a different transformed integrand f˜ :=
g̃ ◦ Φ̃−1 . Ideally we would like to choose a density function φ̃ that leads
to a ‘nice’ integrand in the unit cube. This is related to the concept of
importance sampling for MC methods.
The MC approximation for the integral over R is
∞
n−1
1
g(y) φ(y) dy ≈
g(τi ),
n
−∞
i=0
where the MC points τ0 , . . . , τn−1 are randomly sampled following the distribution of φ. Depending on the choice of φ, there may exist an algorithm
to sample from the distribution directly, or it might be necessary to generate random samples from the uniform distribution on [0, 1]and then map
161
High-dimensional integration: The quasi-Monte Carlo way
them back to R using Φ−1 . The latter approach is consistent with the MC
method over [0, 1],
∞
1
g(y) φ(y) dy =
−∞
g(Φ−1 (x)) dx ≈
0
1
g(Φ−1 (ti )),
n
n−1
i=0
where t0 , . . . , tn−1 are uniform random samples from [0, 1]. If we use a
different density function φ̃ to sample the points from, then we evaluate the
points for a different function g̃:
∞
∞
g(y) φ(y) dy =
−∞
−∞
1
g̃(τi ).
n
n−1
g̃(y) φ̃(y) dy ≈
i=0
The idea behind importance sampling is to choose a sampling density φ̃ to
minimize the variance of the resulting function g̃.
This transformation strategy can be generalized to s dimensions as follows. If we have a product of univariate densities, then we can apply the
mapping Φ−1 component-wise,
y = Φ−1 (x) := (Φ−1 (x1 ), . . . , Φ−1 (xs )),
to obtain
g(y)
Rs
s
−1
φ(yj ) dy =
g(Φ
(x)) dx =
[0,1]s
j=1
f (x) dx.
[0,1]s
Note that we can always multiply and divide any given integrand by such
a product provided that φ(y) > 0 for all y ∈ R:
h(y) dy =
Rs
Rs
s
h(y)
φ(yj ) dy.
j=1 φ(yj )
s
j=1
Thus, this strategy always works in principle, but the resulting transformed
integrand might not be ‘nice’.
Many integrals from practical models involve the multivariate normal
density. If the multivariate normal density is the dominating part of the
entire integrand, then a good strategy is to factorize the covariance matrix
Σ, that is, to find an s × s matrix A such that
Σ = AAT ,
(2.9)
and then use the substitutions (treating all vectors as column vectors)
y = Az
followed by
z = Φ−1 (x),
162
J. Dick, F. Y. Kuo and I. H. Sloan
to obtain
exp(− 12 y T Σ−1 y)
exp(− 1 z T z)
dy =
g(y) g(Az) 2
dz
(2π)s det(Σ)
(2π)s
Rs
Rs
s
exp(− 12 zj2 )
√
dz
=
g(Az)
2π
Rs
j=1
=
g(AΦ−1 (x)) dx =
f (x) dx.
[0,1]s
[0,1]s
The factorization (2.9) is not unique. Two obvious choices are the Cholesky
factorization with lower triangular matrix A, and the principal components
factorization, which is given by
λ1 η 1 ; · · · ; λs η s ,
A=
where (λj , η j )sj=1 denotes the set of eigenpairs of Σ, with ordered eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λs and unit-length column eigenvectors η 1 , . . . , η s .
Other factorizations are possible: for example, in finance applications the
covariance matrix arising from the time-discretized Brownian paths can also
be factorized using the ‘Brownian bridge’ technique. In general, factorization of Σ in very high dimensions can be very costly, and the problem can
be poorly conditioned.
In some applications the multivariate normal density is not the dominating part of the entire integrand. In those cases, other transformation steps
(such as recentering and rescaling) might be required to capture the main
feature of the integrand: see, for example, Kuo et al. (2008a).
2.12. Notes
Halton (1960) introduced what is now called the Halton sequence. Discrepancy bounds for Halton sequences where the constant decreases with the
dimension have been shown by Atanassov (2004b). Kronecker sequences
were studied by Niederreiter (1978). Lattice rules were introduced by Korobov (1959) and have also been discovered by Hlawka (1962). For the early
history of QMC point sets and related concepts see Niederreiter (1978),
Kuipers and Niederreiter (1974), Niederreiter (1992a) and Sloan and Joe
(1994). More recent results are covered by Dick and Pillichshammer (2010).
For an efficient implementation of Sobol sequences see Antonov and
Saleev (1979) and Bratley and Fox (1988), and for questions concerning
the choice of primitive polynomials and direction numbers, see, for example, Joe and Kuo (2008). The implementation of Faure sequences has been
discussed in Atanassov (2004a) and Fox (1986). Implementation of the
Niederreiter sequence has been discussed in Bratley, Fox and Niederreiter
(1992). For digital sequences with improved t-value see Niederreiter and
High-dimensional integration: The quasi-Monte Carlo way
163
Xing (1995, 1996a, 1996b) and Xing and Niederreiter (1995). For an implementation of such sequences see Pirsic (2002).
It has been observed that the quality of QMC point sets usually decreases
as the dimension increases. This has led to the suggestion to study ‘mixed’
point sets, that is, the first few coordinates are QMC point sets with the
remaining coordinates filled up by, say, random numbers. This fits in with
the structure of integrands whose importance is concentrated in the first few
coordinates. The idea of combining the advantages of QMC methods and
MC methods was first proposed by Spanier, who applied mixed QMC–MC
sequences to particle transport problems. Probabilistic discrepancy bounds
for such ‘mixed sequences’ have been provided in Ökten (1996), Ökten, Tuffin and Burago (2006), Gnewuch (2009), Gnewuch and Roşca (2009) and
Aistleitner and Hofer (2012); in these papers it is assumed that the Monte
Carlo components of the points consist of idealized random numbers. There
have also been studies of mixed point sets whose remaining coordinates are
filled up by deterministic pseudo-random numbers or by the components of
other deterministic point sets. Deterministic discrepancy bounds for such
point sets can be found in Niederreiter (2009, 2010a, 2010b), Niederreiter
and Winterhof (2011), Hofer and Kritzer (2011) and Niederreiter (2012).
Further mixed sequences using Halton, Kronecker and digital sequences
have been studied by Hofer and Larcher (2010, 2012) and Hofer, Kritzer,
Larcher and Pillichshammer (2009). So-called (t, e, s)-sequences have been
introduced by Tezuka (2013). This concept captures the dependence on
the dimension of digital nets more precisely and leads to improvements of
discrepancy bounds in terms of their dependence on the dimension. Digital sequences in which the generating matrices have only a finite number
of non-zero elements in each row have been studied, for instance, by Hofer
and Pirsic (2011) and Hofer and Niederreiter (2013). The latter paper
constructs digital (t, s)-sequences with finite-row generating matrices and
asymptotically optimal t-values (in the sense of fixed base and dimension
s going to ∞). Such sequences with finite-row generating matrices may be
advantageous in implementations.
More information on transformations from Rs to the unit cube [0, 1]s ,
variance reduction techniques, and other information for the practical use
of MC and QMC, can be found in Lemieux (2009). The monographs by
Devroye (1986) and Hörmann, Leydold and Derflinger (2010) are primarily
concerned with transformations. Integration of functions with singularities
has been discussed by Owen (2006). Kuo et al. (2008a) considered transformation strategies for applying QMC methods to maximum likelihood
integrals from statistics. Kuo, Sloan, Wasilkowski and Waterhouse (2010a)
analysed lattice rules for unbounded integrands arising from the transformation. A construction of digital nets constructed in Rs has been studied
by Dick (2011b).
164
J. Dick, F. Y. Kuo and I. H. Sloan
Information relevant to statistical applications can be found in Fang and
Wang (1994). Control variates for QMC have been discussed by Hickernell,
Lemieux and Owen (2005). QMC methods have been applied to computer
graphics by Keller (2006, 2013), to transport problems by Spanier, and to
experimental designs by Hickernell (1999).
An introduction to lattice rules can be found in Sloan and Joe (1994);
see also Niederreiter (1992a, Chapter 5). Dick and Pillichshammer (2010)
provide a comprehensive introduction to (digital) (t, m, s)-nets, (digital)
(t, s)-sequences and related point sets and sequences. A survey of QMC
methods and their randomization strategies is given by Hickernell and Hong
(2002).
Sparse grids were discovered independently by Smolyak (1963) and Zenger
(1991). See also Wasilkowski and Woźniakowski (1995) and the review article by Bungartz and Griebel (2004).
3. Error, discrepancy, and reproducing kernel
In this section we present the necessary ingredients for the error analysis of
QMC methods. To provide an accessible entry point for readers who are
new to QMC, we begin by studying QMC in its simplest setting, namely,
equal-weight quadrature rules for integration over the unit interval [0, 1].
We will meet the useful notion of reproducing kernel Hilbert spaces, and
meet specific examples of function spaces that have proved useful for designing and analysing QMC methods. All of the spaces we deal with in this and
the next section are spaces of non-periodic functions. We follow contemporary practice in referring to such spaces as ‘Sobolev’ spaces, to distinguish
them from ‘Korobov’ spaces of periodic functions, to be considered later in
Section 5.8.
3.1. Error analysis in one dimension
We consider real-valued functions f defined on the interval [0, 1]. As is
common in numerical analysis, we require that the functions have some
smoothness and that the fundamental theorem of calculus holds, so that
1
1
f (y) dy = f (1) −
f (y)1[0,y] (x) dy,
(3.1)
f (x) = f (1) −
x
0
where the indicator function 1[0,y] is defined by
1 if x ∈ [0, y],
1[0,y] (x) =
0 if x ∈
/ [0, y].
High-dimensional integration: The quasi-Monte Carlo way
165
We consider QMC rules of the form
1
f (ti ),
n
n−1
Q(f ) =
i=0
and study their integration error given by
1
n−1
1
errorn (f ; Q) =
f (x) dx −
f (ti ).
n
0
i=0
Upon substituting (3.1) in the equation above and changing the order of
integration, we obtain
1 1
n−1
1
−
1[0,y] (x) dx +
1[0,y] (ti ) f (y) dy,
errorn (f ; Q) =
n
0
0
i=0
or
1
errorn (f ; Q) =
∆P (y)f (y) dy,
(3.2)
0
where P := {t0 , . . . , tn−1 } ⊂ [0, 1] denotes the set of quadrature points, and
∆P (y) is the local discrepancy of the point set P , defined by
1
n−1
1
∆P (y) :=
1[0,y] (ti ) −
1[0,y] (x) dx, y ∈ [0, 1].
(3.3)
n
0
i=0
Note that the local discrepancy function ∆P is just the integration error for
the indicator function 1[0,y] .
Thus from (3.2) the integration error can be viewed as the L2 inner product of the local discrepancy function ∆P and the derivative f of the integrand.
Using Hölder’s inequality we obtain from (3.2)
1
1/p 1
1/q
p
q
|∆P (y)| dy
|f (y)| dy
|errorn (f ; Q)| ≤
0
0
= ∆P Lp f Lq ,
1 1
+ = 1,
p q
(3.4)
1
where gLp = ( 0 |g(y)|p dy)1/p , and we make the usual modification if
p = ∞. If p = ∞ and q = 1, this is a special case of Koksma’s inequality
(Koksma’s inequality holds when one replaces f L1 with the variation
of f ). The quantity ∆P Lp is referred to as the discrepancy of the point
set P , and is closely related to the so-called worst-case error, which we shall
introduce formally in Section 3.3.
Several conclusions can be drawn from this elementary inequality. First,
the upper bound (3.4) is best possible: indeed, equality holds if ∆p (y)f (y) ≥
0 and |f (y)|q is a multiple of |∆P (y)|p . Second, to minimize the integration
166
J. Dick, F. Y. Kuo and I. H. Sloan
error for all functions for which f Lq < ∞, one should choose quadrature
points P for which the discrepancy ∆P Lp is, in some sense, small. Third,
it is enough to study the integration error of the indicator functions 1[0,y]
for y ∈ [0, 1], since the local discrepancy function ∆P evaluated at y is the
integration error of the indicator function 1[0,y] .
The above analysis of the integration error rests entirely on the integral
representation of the function f given by
1
f (y)1[0,y] (x) dy.
(3.5)
f (x) = f (1) −
0
For two functions f, g that permit such a representation, we can define an
inner product
1
f (y)g (y) dy,
(3.6)
f, g := f (1)g(1) +
0
and a corresponding norm
f =
|f (1)|2
1
+
|f (y)|2 dy.
0
Then (3.4) holds with p = q = 2 for all functions in the function space
H := {f : [0, 1] → R : f is absolutely continuous and f < ∞}.
3.2. Reproducing kernel Hilbert spaces
We have seen above that the integration error for functions f ∈ H has the
integral representation (3.2) involving the local discrepancy function. This
rests in turn on the representation of f given by (3.5). We now express
(3.5) in the language of reproducing kernels. The idea is to find a function
K : [0, 1] × [0, 1] → R such that
f, K(·, x) = f (x)
for all f ∈ H and all x ∈ [0, 1].
For this to hold we must have, from (3.6) and (3.5),
1
1
∂K
(x, y) dy = f (1) −
f (y)
f (y)1[0,y] (x) dy,
f (1)K(1, x) +
∂y
0
0
which is clearly satisfied for all f ∈ H if
∂K
K(1, x) = 1 and
(x, y) = −1[0,y] (x)
∂y
This can be achieved by taking
K(x, y) = 2 − max(x, y).
for all x, y ∈ [0, 1].
(3.7)
The function of two variables K(x, y) given by (3.7) is our first example of
a reproducing kernel, and H is our first example of a reproducing kernel
Hilbert space.
High-dimensional integration: The quasi-Monte Carlo way
167
We now formally define a reproducing kernel Hilbert space of real-valued
functions defined on [0, 1]s .
Definition 3.1 (reproducing kernel Hilbert space).
The Hilbert
space H(K) with inner product ·, ·H is a reproducing kernel Hilbert space
(RKHS) with kernel K : [0, 1]s × [0, 1]s → R if:
• K(·, x) ∈ H for all x ∈ [0, 1]s ,
• (the reproducing property) f (x) = f, K(·, x)H for all f ∈ H and all
x ∈ [0, 1]s .
For a given RKHS there is a uniquely defined reproducing kernel satisfying
the above two properties. In fact, every Hilbert space where point evaluation
is a bounded linear functional has a reproducing kernel. This follows from
the Riesz representation theorem. Reproducing kernels have the additional
properties that:
• (symmetry) K(x, y) = K(y, x) for all x, y ∈ [0, 1]s ,
M
• (positive semidefiniteness) M
i=1
k=1 ai ak K(ti , tk ) ≥ 0 for all M ≥ 1
and all a1 , . . . , aM ∈ R and t1 , . . . , tM ∈ [0, 1]s .
Indeed, any function K : [0, 1]s ×[0, 1]s → R which is symmetric and positive
semidefinite is a reproducing kernel to which there corresponds a uniquely
defined inner product and RKHS. A comprehensive theory on reproducing
kernels can be found in Aronszajn (1950).
The thing that makes an RKHS so useful in the problem of numerical
integration is that, as we shall demonstrate in Section 3.3, there exists a
closed-form expression for the worst-case error. That expression for the
worst-case error is expressed in terms of the kernel of the RKHS, and so
is particularly useful when there exists a simple closed expression for the
kernel K.
We remark that it is not always easy to find the explicit kernel of an
RKHS with a given inner product. The particular example above, where
the kernel is not only known but is of very simple piecewise linear form,
has played an important part, as we shall see, in the recent development of
QMC methods. We shall meet other reproducing kernels in Section 3.4.
We now show how to use a reproducing kernel for a space of functions of
a single variable as a building block for a multivariate RKHS. First we need
the following definition.
Definition 3.2 (tensor product Hilbert space). A Hilbert space Hs of
functions on [0, 1]s is a tensor product of Hilbert spaces H1,1 , H1,2 , . . . , H1,s
of functions on [0, 1], written as
Hs = H1,1 ⊗ H1,2 ⊗ · · · ⊗ H1,s ,
168
J. Dick, F. Y. Kuo and I. H. Sloan
if it is the completion of the span of products sj=1 fj (xj ), where fj ∈ H1,j ,
under the norm in Hs . The Hs norm of the simple product sj=1 fj (xj ) is
just the product of the norms fj H1,j .
Tensor product spaces provide a popular setting for studying QMC methods, for the very good reason that the reproducing kernel for a tensor product
of reproducing kernel Hilbert spaces is just the product of the kernels.
Example 3.3 (a key tensor product RKHS). Take each one-dimensional RKHS H1,j to be the space of single-variable absolutely continuous
functions with reproducing kernel (3.7). Then the tensor product space Hs
is the s-variable RKHS with reproducing kernel
Ks (x, y) =
s
(2 − max(xj , yj )).
j=1
The inner product in this space is
∂ |u|
∂ |u|
f (xu ; 1)
g(xu ; 1) dxu ,
f, gHs =
∂xu
[0,1]|u| ∂xu
(3.8)
u⊆{1:s}
and the norm is
f Hs =
u⊆{1:s}
1/2
|u|
2
∂
f (xu ; 1) dxu
.
[0,1]|u| ∂xu
Here {1 : s} is a shorthand notation for {1, 2, . . . , s}, and the sum is over
all subsets u ⊆ {1 : s}, including the empty set, while for x ∈ [0, 1]s the
symbol xu denotes the set of components xj of x with j ∈ u, and (xu ; 1)
indicates that the components of x for j ∈
/ u are replaced by 1. The partial
derivative ∂ |u| /∂xu denotes the mixed first partial derivative with respect
to the components xu . Which functions lie in this space? The answer is
all real-valued functions on [0, 1]s that have square-integrable mixed first
derivatives and that are expressible in the form
f (x) = f, K(·, x)Hs
|u|
=
(−1)
u⊆{1:s}
[0,1]|u|
∂ |u|
f (y u ; 1) 1[0,yu ] (x) dy u ,
∂y u
x ∈ [0, 1]s .
We refer to this space as an anchored Sobolev space with the anchor point 1.
3.3. Worst-case error in an RKHS
We have foreshadowed in Section 3.1 that the discrepancy of a point set is
closely related to the worst-case error. We now give the formal definition.
High-dimensional integration: The quasi-Monte Carlo way
169
Definition 3.4 (worst-case error). The worst-case error of a QMC rule
Qn,s using the point set P ⊂ [0, 1]s in a normed space H (not necessarily a
Hilbert space) is
en,s (P ; H) := sup |Is (f ) − Qn,s (f )|,
f H ≤1
that is, it is the worst error attained by Qn,s for f in the unit ball of H.
The initial error is
e0,s (H) := sup |Is (f )|,
f H ≤1
which is the error obtained with the QMC rule replaced by zero.
Due to linearity, for any function f ∈ H we have
|Is (f ) − Qn,s (f )| ≤ en,s (P ; H) f H ,
which takes the same form as the inequality in (3.4). Later in Section 3.5
we shall see other inequalities of the same form.
Worst-case errors are in general hard to compute, except for the case
of an RKHS. In every RKHS of functions defined on the unit cube (not
just a tensor product space), the following theorem gives a formula for the
worst-case error in terms of the reproducing kernel.
Theorem 3.5 (formula for the squared worst-case error).
[0, 1]s × [0, 1]s → R be a reproducing kernel that satisfies
K(x, y) dx dy < ∞.
[0,1]s
Let K :
[0,1]s
The squared worst-case error and initial error for a QMC rule in an RKHS
Hs (K) with reproducing kernel K satisfy
e2n,s (P ; Hs (K)) =
[0,1]s
+
1
n2
n−1 [0,1]s
K(x, y) dx dy −
2
n
i=0
n−1
n−1
[0,1]s
K(ti , tk ),
K(ti , y) dy
(3.9)
i=0 k=0
and
e20,s (Hs (K))
=
[0,1]s
[0,1]s
K(x, y) dx dy.
(3.10)
Proof. We give here the proof for s = 1. The proof for general s can be
easily obtained by replacing the unit interval [0, 1] in the argument below
by the unit cube [0, 1]s .
170
J. Dick, F. Y. Kuo and I. H. Sloan
For an f ∈ H we write the reproducing property f (x) = f, K(·, x)H
with successive arguments x = t1 , t2 , · · · and average the results to obtain,
for the quadrature sum,
n−1
n−1
n−1
1
1
1
f (ti ) =
f, K(·, ti )H = f,
K(·, ti )
.
n
n
n
i=0
i=0
i=0
In a similar way we find
1
1
f (x) dx =
f, K(·, x)H dx = f,
0
0
H
1
K(·, x) dx
0
,
H
provided that integration
1 from 0 to 1 is a bounded linear functional in H,
or equivalently that 0 K(·, x) dx ∈ H. In turn this requires
!2
! 1
1 1
!
!
"
#
!
!
K(·,
x)
dx
=
dx dy
K(·,
x),
K(·,
y)
!
!
H
0
H
0
0
1 1
=
0
K(x, y) dx dy < ∞,
0
which will hold for all the kernels we shall consider. Note that in the last
step we used the reproducing property. By subtraction, the integration
error is
1
n−1
n−1
1
1
1
f (x) dx −
f (ti ) = f,
K(·, x) dx −
K(·, ti )
= f, ξH ,
n
n
0
0
i=0
where
i=0
1
ξ(y) :=
1
K(y, ti ),
n
H
n−1
K(x, y) dx −
0
y ∈ [0, 1].
(3.11)
i=0
We call the function ξ the representer of the integration error. Thus
en,1 (P ; H) = sup |f, ξH | = ξH ,
f H ≤1
since the supremum is attained by the choice f = ξ/ξ ∈ H. Therefore
e2n,1 (P ; H) = ξ, ξH
1
n−1
n−1
1
1
=
K(·, x) dx −
K(x·, ti ),
K(·, x) dx −
K(x, ti )
n
n
0
0
i=0
i=0
H
1 1
1
n−1
n−1
n−1
2
1
=
K(x, y) dx dy −
K(x, ti ) dx + 2
K(ti , tk ),
n
n
0
0
0
1
i=0
i=0 k=0
where we again used the reproducing property of the kernel.
171
High-dimensional integration: The quasi-Monte Carlo way
For the particular Hilbert space with reproducing kernel K(x, y) = 2 −
max(x, y), it is easily seen that
1
0
3 − x2
K(x, y) dy =
2
1 1
and
0
0
4
K(x, y) dx dy = ,
3
so that the worst-case error is
e2n,1 (P ; H)
n−1
n−1 n−1
1 4 2 3 − t2i
+ 2
= −
(2 − max(ti , tk )).
3 n
2
n
i=0
i=0 k=0
For future purposes we note that the derivative of the error representer ξ is
ξ = ∆P , the local discrepancy function.
For the corresponding tensor product RKHS the worst-case error is
e2n,s (P ; Hs )
(3.12)
s
n−1 s n−1 n−1 s
1 4
2 3 − t2i,j
+ 2
=
−
2 − max(ti,j , tk,j ) ,
3
n
2
n
i=0 j=1
i=0 k=0 j=1
where ti,j denotes the jth component of the ith cubature point ti . This is
sometimes called the squared L2 discrepancy of the point set P , for reasons
that will soon become clear.
It proves to be very useful (and surprisingly easy) to compute the average
of the squared worst-case error over all cubature points,
2
···
e2n,s (t0 , . . . , tn−1 ; Hs (K)) dt0 · · · dtn−1 .
En,s (Hs (K)) :=
[0,1]s
[0,1]s
We refer to the quantity En,s (K) as the QMC mean of the worst-case error.
(Why this is a useful quantity will be explained after the theorem.)
Theorem 3.6 (QMC mean). Let K : [0, 1]s × [0, 1]s → R be a reproducing kernel that satisfies
K(x, x) dx < ∞.
[0,1]s
The QMC mean of the worst-case error in a reproducing kernel Hilbert
space Hs (K) of functions on [0, 1]s and with kernel K satisfies
1
2
K(x, x) dx −
K(x, y) dx dy . (3.13)
En,s (Hs (K)) =
n [0,1]s
[0,1]s [0,1]s
172
J. Dick, F. Y. Kuo and I. H. Sloan
Proof.
Using Theorem 3.5 we have
2
(Hs (K))
En,s
···
=
[0,1]s
[0,1]s
[0,1]s
1
+ 2
n
n−1
i=0
n−1 [0,1]s
K(x, y) dx dy −
2
n
i=0
1
K(ti , ti ) + 2
n
n−1
n−1
[0,1]s
K(ti , y) dy
K(ti , tk ) dt0 · · · dtn−1 ,
i=0 k=0
k=i
noting that in the double sum we separated the diagonal and off-diagonal
terms as in the proof of Theorem 2.1. The first term inside the parentheses
is independent of t0 , . . . , tn−1 , thus these variables can all be integrated
out to give the result 1. In the second term, after interchanging the outer
integrations and the sum, all variables but ti can be integrated out. The
situation is similar for the third term, while for the fourth term all but ti
and tk can be integrated out. We are left with
n−1 2
=
K(x, y) dx dy −
K(ti , y) dti dy
n
s [0,1]s
[0,1]s [0,1]s
i=0 [0,1]
n−1 n−1 n−1 1 1 + 2
K(ti , ti ) dti + 2
K(ti , tk ) dti dtk
n
n
[0,1]s
[0,1]s [0,1]s
2
En,s
(K)
i=0
=
[0,1]s
[0,1]s
K(x, y) dx dy − 2
i=0 k=0
k=i
[0,1]s
[0,1]s
K(x, y) dx dy
n−1
K(x, x) dx +
K(x, y) dx dy
n
[0,1]s
[0,1]s [0,1]s
1
1
=
K(x, x) dx −
K(x, y) dx dy.
n [0,1]s
n [0,1]s [0,1]s
1
+
n
The importance of Theorem 3.6 is that it gives us an immediate existence
result for QMC integration: by applying the principle that there is always
at least one choice as good as the average, we conclude that there exists a
QMC point set P for which
2
(Hs (K)).
e2n,s (P ; Hs (K)) ≤ En,s
Thus, provided that both integrals on the right-hand side in Theorem 3.6
are finite (as they are in all the cases we will consider), there exists a QMC
point set with worst-case error of order O(n−1/2 ), that is, the same rate of
convergence as Monte Carlo. Later we will not be satisfied with this result,
for two reasons: first, we want not just the Monte Carlo rate of convergence,
High-dimensional integration: The quasi-Monte Carlo way
173
but rather a rate of convergence close to O(n−1 ), or even better; second, we
want not just existence, but also a method of constructing the points.
3.4. Other univariate reproducing kernels
Note that (3.1) can be viewed as a Taylor series with integral remainder,
developed at 1. As such it is clear that we could choose an arbitrary anchor
point c ∈ [0, 1] instead of 1. Or we can consider function spaces whose
members have more than one derivative. Or we can depart from a Taylor
series representation of f to obtain other examples of reproducing kernels.
All of the following examples can be used as building blocks for multivariate
Sobolev spaces.
Example 3.7 (anchored reproducing kernel of smoothness α).
If the (α − 1)th derivative of f is absolutely continuous for some natural
number α, then the Taylor series of a univariate function f anchored at 0 is
1
α−1
f (r) (0)
(x − y)α−1
+
xr +
dy,
f (α) (y)
f (x) =
r!
(α
−
1)!
0
r=0
where
f (r)
denotes the rth derivative of f and
(x − y)α−1 if x > y,
α−1
(x − y)+ =
0
if x ≤ y.
For functions f, g permitting such a Taylor series presentation, we can define
the inner product
1
α−1
f (r) (0)g (r) (0) +
f (α) (y)g (α) (y) dy.
f, g =
r=0
0
The set of functions whose norm corresponding to this inner product is finite
is another RKHS, with kernel given by
α−1
xr y r 1 (x − z)α−1
(y − z)α−1
+
+
+
dz.
Kα (x, y) =
r! r!
(α
−
1)!
(α
−
1)!
0
r=0
We call the corresponding RKHS the anchored Sobolev space of smoothness
α with anchor 0.
Example 3.8 (unanchored reproducing kernel of smoothness α).
For a univariate function f belonging to the same smoothness class as in
the preceding example, consider the representation given by
1
α
Br (x) 1 (r)
Bα (|x − y|) (α)
α
f (y) dy.
f (y) dy − (−1)
f (x) =
r!
α!
0
0
r=0
174
J. Dick, F. Y. Kuo and I. H. Sloan
Here, Br denotes the Bernoulli polynomial of degree r: see Chapter 24 of
the Digital Library of Mathematical Functions (2012). In this case the inner
product is
1
1
α−1
1
(r)
(r)
f (y) dy
g (y) dy +
f (α) (y)g (α) (y) dy,
f, g =
r=0
0
0
0
and the reproducing kernel is
Kα (x, y) =
α
Br (x) Br (y)
r=0
r!
r!
− (−1)α
B2α (|x − y|)
.
(2α)!
The corresponding RKHS is called the unanchored Sobolev space of smoothness α.
3.5. Geometric discrepancy
We now return to the local discrepancy function
1
n−1
1
1[0,x] (ti ) −
1[0,x] (y) dy,
∆P (x) =
n
0
i=0
which we encountered already in (3.3), and which we also obtained as the
derivative ξ of the representer ξ, where ξ is given by (3.11) with K as in
(3.7).
The local discrepancy function has a geometric interpretation: since
n−1
1[0,x] (ti )
i=0
is the number of points of P lying in the interval [0, x], therefore
1
1[0,x] (ti )
n
n−1
i=0
is the proportion of points of P lying in the interval [0, x]; and the local
discrepancy is the departure of this proportion from the ‘ideal’ proportion,
which is the length of the interval. By taking the Lp norm of ∆P we obtain
the Lp discrepancy of the point set P , given by
1
1/p
p
∆P p :=
|∆P (x)| dx
0
for 1 ≤ p < ∞, with the obvious modifications for p = ∞. The Lp discrepancy can therefore be understood as a measure for how uniformly the
point set P is distributed, that is, it measures the discrepancy between the
empirical distribution of the point set P and the uniform distribution.
High-dimensional integration: The quasi-Monte Carlo way
175
We consider now a generalization to dimensions s > 1. In one dimension,
the local discrepancy function can be derived as the derivative of the representer ξ based on the reproducing kernel K(x,
s y) = 2 − max(x, y). As before
we consider the product kernel Ks (x, y) = j=1 K(xj , yj ). The representer
of the error is given by
n−1
1
Ks (x, y) dy −
Ks (x, ti )
ξ(x) =
n
[0,1]s
i=0
s n−1 s
3 − x2j
1 −
=
(2 − max(xj , ti,j )).
2
n
j=1
i=0 j=1
The mixed first partial derivatives are then given by
n−1
∂ |u| ξ
1
(xu ; 1) = (−1)|u|
xj −
1[0u ,xu ] (ti,u ) ,
∂xu
n
j∈u
where
1[0u ,xu ] (ti,u ) =
i=0
1[0,xj ] (ti,j ).
j∈u
We define the local discrepancy function ∆P in s dimensions by
∆P (x) :=
1
1[0,x] (ti ) −
xj .
n
n−1
s
i=0
j=1
Then
∂ |u| ξ
(xu ; 1) = (−1)|u|+1 ∆P (xu ; 1).
∂xu
Analogously to the one-dimensional case, we can show that
n−1
1
∂ |u| f
|u|
f (ti ) −
f (x) dx =
(−1)
(xu ; 1)∆P (xu ; 1) dxu ,
n
[0,1]s
[0,1]|u| ∂xu
u⊆{1:s}
i=0
where the u = ∅ term is actually zero since ∆P (1) = 0. This equality is
called by different people the Hlawka identity or the Zaremba identity.
By applying Hölder’s inequality for integrals and sums, we then obtain
the following inequality.
Theorem 3.9 (Koksma–Hlawka inequality). We have
n−1
1 ≤ ∆P p,p f q,q ,
f
(t
)
−
f
(x)
dx
i
n
[0,1]s
i=0
(3.14)
176
J. Dick, F. Y. Kuo and I. H. Sloan
where 1 ≤ p, p , q, q ≤ ∞, p1 + 1q = 1, p1 + q1 = 1, and
p /p 1/p
p
∆P (xu ; 1) dxu
∆P p,p =
u⊆{1:s}
and
f q,q =
u⊆{1:s}
[0,1]|u|
|u|
q
q /q 1/q
∂ f
(xu ; 1) dxu
,
[0,1]|u| ∂xu
with the obvious modifications if one or more of p, p , q, q are infinite.
In the theorem the norm f q,q can obviously be replaced by the corresponding seminorm |f |q,q , defined in exactly the same way but with the
u = ∅ term omitted. (This term can be omitted because the QMC rule is
exact for constants.)
The inequality (3.14) says that the integration error is bounded by the
product of the norm f q,q or seminorm |f |q,q of the integrand and the
discrepancy ∆P p,p of the cubature points. It is often called a Koksma–
Hlawka inequality, especially for the case p = p = ∞. If the integrand
is given, and cannot be changed, the inequality motivates the search for
cubature points with small discrepancy. It also connects QMC with the field
of discrepancy theory, since it shows that point sets with low discrepancy
are attractive to use as QMC integration points.
The classical Koksma–Hlawka inequality states that
n−1
1 ≤ Dn∗ (P )V (f ),
f
(t
)
−
f
(x)
dx
(3.15)
i
n
[0,1]s
i=0
where
Dn∗ (P )
is the star discrepancy
Dn∗ (P ) := sup |∆P (x)|,
(3.16)
x∈[0,1]s
and V (f ) is the variation of f in the sense of Hardy and Krause. If f has
continuous mixed first partial derivatives, the variation V (f ) coincides with
|u|
∂ f
dxu .
(x
;
1)
|f |1,1 =
u
[0,1]|u| ∂xu
∅=u⊆{1:s}
3.6. Notes
The classical reference for reproducing kernels is Aronszajn (1950). See
also Thomas-Agnan (1996) and Wahba (1990). In the context of QMC,
reproducing kernels were introduced by Hickernell (1996a); see also Hickernell (1998a). For another elementary introduction to RKHSs in the context of QMC integration see Dick and Pillichshammer (2010, Chapter 2).
High-dimensional integration: The quasi-Monte Carlo way
177
Theorem 3.5 can be found in Hickernell (1998a). Mercer’s theorem implies
that each reproducing kernel has an expansion in terms of its eigenfunctions
K(x, y) =
∞
λr ψr (x)ψr (y),
r=0
where λr > 0 are the eigenvalues and ψr are the eigenfunctions of the
integral operator on [0, 1]s with kernel K. For many reproducing kernels
considered in QMC theory these eigenfunctions are explicitly known: see
Dick, Nuyens and Pillichshammer (2013), Wasilkowski and Woźniakowski
(1999) and Werschulz and Woźniakowski (2009).
The one-dimensional form of (3.15) is due to Koksma (1942/43) and the
higher-dimensional result is due to Hlawka (1961).
Geometric discrepancy was studied long before QMC integration. The
classical reference to uniform distribution modulo 1 is Weyl (1916). Further
monographs dealing with discrepancy theory and various aspects of it are
due to Kuipers and Niederreiter (1974), Niederreiter (1992a), Drmota and
Tichy (1997), Matoušek (1999), Chazelle (2000) and Dick and Pillichshammer (2010).
Novak and Woźniakowski (2009) studied whether general L2 discrepancies
(in Euclidean spaces) are related to the worst-case error of multivariate integration on an RKHS. In particular, they provided a concrete formula for the
corresponding reproducing kernel for arbitrary L2 discrepancies. Gnewuch
(2012a) extended these results to weighted L2 discrepancies and weighted
RKHS. Here the underlying domain can be more general as in Novak and
Woźniakowski (2009). In particular, Gnewuch (2012a) covers the relation
between infinite-dimensional integration on weighted Hilbert spaces and the
corresponding weighted (limiting) L2 discrepancies.
4. Weighted spaces and tractability
4.1. Meeting the high-dimensional challenge
High-dimensional problems, as noted already in the Introduction, can be
very hard. Yet some problems have caused surprise in the opposite direction by turning out to be easier than might have been expected. Especially
influential in this respect were certain 360-dimensional calculations carried
out in the mid-1990s by Paskov and Traub (1995) at Columbia University.
That paper described a problem coming directly from Wall Street, on the
valuation of a class of financial derivatives known as mortgage-backed obligations. While the details of the model or the calculation are not important
here, broadly the problem was to evaluate a parcel of mortgages held by a
bank, where each month borrowers make individual choices as to whether or
not to repay the mortgage, and are assumed to make this choice according
178
J. Dick, F. Y. Kuo and I. H. Sloan
to some probability distribution. And of course the value of a mortgage
held by the bank is affected by whether or not the loan is repaid. Because there are 360 months in the 30-year period of the loan, the problem
is to evaluate a 360-dimensional expected value. The standard approach
to such a problem is to model the process month by month by a Monte
Carlo method. Paskov and Traub (1995), in contrast, treated the problem
as one of 360-dimensional integration. To the surprise of most observers,
they were able to obtain satisfactory results with a QMC method, at much
smaller computational cost than with a Monte Carlo method.
Following the success of those experiments, the idea began to emerge
that perhaps problems such as the one described in the previous paragraph
are in some hidden sense not really as difficult as might be expected from
their nominal high-dimensionality. In this spirit Caflisch, Morokoff and
Owen (1997) defined two notions of ‘effective dimension’ (namely ‘truncation dimension’ and ‘superposition dimension’), with the idea that either
or both might be much smaller than the nominal dimension. Sloan and
Woźniakowski (1998) sought to build this notion into the mathematics, by
introducing function spaces containing ‘weights’. It is fair to say that in
some form or another ‘weighted spaces’ have become a standard part of
the QMC framework. We will define such spaces formally in the next subsection, but first we describe the underlying idea, and indicate why it has
become so popular.
Sloan and Woźniakowski speculated that perhaps the reason that QMC
calculations such as those in Paskov and Traub (1995) were successful is
that some coordinate directions are more important than others, in the
sense of being in some way more difficult. Assuming then that the coordinate directions are ordered in order of difficulty, they sought to quantify
this decreasing importance by associating with each coordinate direction a
positive number γj , where
γ1 ≥ γ2 ≥ γ3 · · · > 0.
By introducing a function space incorporating these weights, they were able
to obtain a rather precise result, namely that the worst-case error (defined
already in Definition 3.4) is bounded independently of dimension if and
only if
∞
γj < ∞.
j=0
This means that a classical choice of weights (with all weights equal) certainly fails to have worst-case errors bounded independently of the dimension. So too, though only marginally, do weights γj = 1/j. On the other
hand weights of the form γj = 1/j a lead to uniformly bounded worst-case
errors if (and only if) a > 1.
High-dimensional integration: The quasi-Monte Carlo way
179
The weights described in the preceding paragraph are now called ‘product
weights’, for reasons that will soon become clear. Since that time many other
forms of weights have been studied (we shall meet ‘general weights’, ‘finiteorder’ weights, ‘order-dependent’ weights, and the most recent addition,
‘POD’ (for ‘product and order-dependent’) weights. The driving motivation
for this flowering of possibilities has been the desire to describe in a more
precise way the influence of particular combinations of the variables. At the
(unrealistic) extreme, the greatest freedom attaches to ‘general weights’,
which allow a different weight γu for each of the 2s subsets u ⊆ {1 : s}.
For all these choices of weights the question of ‘tractability’ arises: loosely,
we ask under what conditions on the weights are the worst-case errors uniformly bounded in dimension; or if they are not bounded, then at worst grow
only slowly with dimension. We shall discuss tractability in Section 4.5.
4.2. Weighted reproducing kernel Hilbert spaces: anchored Sobolev spaces
In this subsection we first introduce a weighted RKHS in its simplest product
form. But we then quickly develop weighted spaces of a more general form.
Arguing as in Example 3.3, we build our first weighted s-variable space out
of single-variable weighted spaces with simple kernels. Our one-dimensional
space H1,γ is almost the same as before (in particular, it is always the space
of absolutely continuous functions f defined on [0, 1] whose first derivatives are square-integrable), except that we now associate with the space a
weight γ > 0, and in addition we allow the ‘anchor’ value to be any number c ∈ [0, 1], instead of just 1 as before. Our building block is now the
one-dimensional kernel
K1,γ (x, y) = 1 + γ η(x, y),
where


 min(x, y) − c
η(x, y) =
c − max(x, y)


0
if x, y > c,
if x, y < c,
otherwise.
(4.1)
If c = 1 and γ = 1 it is easily seen that this reduces to the reproducing
kernel (3.7). For general c and γ it is easy to verify, using no more than the
fundamental theorem of calculus, that K1,γ (x, y) is the reproducing kernel
associated with the inner product
1 1 f (x) g (x) dx,
f, g1,γ = f (c)g(c) +
γ 0
which is a generalization of (3.6). The corresponding norm is
1/2
f 1,γ = f, f 1,γ .
180
J. Dick, F. Y. Kuo and I. H. Sloan
The s-dimensional tensor product RKHS corresponding to the above onedimensional space is
Hs,γ := H1,γ1 ⊗ H1,γ2 ⊗ · · · ⊗ H1,γs ,
if we allow a different weight γj for each component xj . It has the reproducing kernel
s
(1 + γj η(xj , yj )).
Ks,γ (x, y) =
j=1
The identity
s
(1 + aj ) =
j=1
aj = 1 +
u⊆{1:s} j∈u
aj
(4.2)
∅=u⊆{1:s} j∈u
can now be used to rewrite the product as a sum over all subsets of {1 : s},
γu
η(xj , yj ),
(4.3)
Ks,γ (x, y) =
u⊆{1:s}
j∈u
where for this product weight case we have
γu =
γj ,
j∈u
with the convention that the empty product has the value 1. The reason for
the ‘product’ tag for these weights now becomes clear: the weight associated
with the subset u of the variables is the product of the weights for the
individual components. One implication of this is that the scaling of product
weights is crucially important. If, for example, all the weights γj are halved
then the weights for the subset u = {1, 3} with two elements are reduced by
a factor of 4, and those for the subset u = {2, 4, 5} by a factor of 8.
Because of the limited flexibility of product weights, other choices of
weights have been considered. In principle the formula (4.3) allows a different weight γu to be chosen for each subset u of the variables, but the
cost of even one evaluation of the reproducing kernel would then be prohibitive when s is large. This is the general weight case (Dick, Sloan, Wang
and Woźniakowski 2006), where we take γ∅ ≡ 1. There is much interest
(as well as some unease (Sloan 2007)) in finite-order weights (Sloan, Wang
and Woźniakowski 2004, Wasilkowski and Woźniakowski 2004), in which
γu = 0 for all |u| greater than some number q ∗ . Order-dependent weights,
introduced by Dick et al. (2006), take the form
γu = Γ|u|
for some non-negative numbers Γ1 , Γ2 , . . . . The most recent addition to the
menagerie of weights are the POD weights (i.e., product and order-dependent
weights), in which two sequences γ1 , γ2 , . . . and Γ1 , Γ2 , . . . of non-negative
High-dimensional integration: The quasi-Monte Carlo way
numbers are defined, and we take
γu = Γ|u|
γj .
181
(4.4)
j∈u
Weights of the POD form were found to arise naturally in a recent study of
PDEs with random coefficients (Kuo et al. 2012).
For general weights γu the inner product in our anchored space is
∂ |u|
∂ |u|
γu−1
f (xu ; c)
g(xu ; c) dxu ,
f, gs,γ =
∂xu
[0,1]|u| ∂xu
u⊆{1:s}
where the notation follows that in (3.8), and the term labelled by u is to be
omitted for u = ∅.
The worst-case errors for spaces with reproducing kernel of the form (4.3)
and η given by (4.1) can be found from (3.9). To make use of that expression
one needs the single and double integrals of the kernel. For completeness we
give the necessary integrals here, as well as the ‘diagonal’ integral needed
for the QMC mean (3.13):
Ks,γ (x, y) dy =
γu
α(xj ) + β ,
[0,1]s
[0,1]s
[0,1]s
u⊆{1:s}
Ks,γ (x, y) dx dy =
u⊆{1:s}
[0,1]s
Ks,γ (x, x) dx =
u⊆{1:s}
j∈u
γu β |u| ,
(4.5)
|u|
γu β + 16 ,
where
α(x) := max(x, c) −
x2
2
−
c2
2
−
1
3
β := c2 − c + 13 .
and
(4.6)
When the weights are of the product form, these three integrals simplify
(by using the identity (4.2)) to
s
1 + γj α(xj ) + β ,
s
j=1
j=1
1 + γj β ,
s
1 + γj β + 16 ,
j=1
respectively.
For example, with product weights and c = 1, we have from (3.9) and
(3.10) that the worst-case error and initial error are given by
s n−1 s γj
γj
2 2
2
1+
1 + (1 − ti,j )
−
en,s (P ; H(Ks,γ )) =
3
n
2
j=1
+
i=0 j=1
1
n2
n−1
s
n−1
1 + γj 1 − max(ti,j , tk,j ) ,
i=0 k=0 j=1
182
J. Dick, F. Y. Kuo and I. H. Sloan
which reduces to (3.12) if γj = 1 for all j, and
s γj
2
1+
e0,s (H(Ks,γ )) =
.
3
j=1
Moreover, we have from (3.13) that the QMC mean satisfies
s s γj
γj
1 2
1+
1+
−
.
En,s (H(Ks,γ )) =
n
2
3
j=1
j=1
We remark that it is also possible to use a different anchor value cj for
each coordinate direction xj , thus leading to different ηj , αj , βj for each
index j. All results can be trivially generalized to that case.
4.3. Unanchored Sobolev spaces
The reproducing kernels considered so far in this section have been of the socalled ‘anchored’ variety, in which there is a special number c ∈ [0, 1], called
the ‘anchor’, at which the components of x that are not active (i.e., that
are not in the active set u) are fixed. However, there are other RKHSs with
interesting properties (Sloan and Woźniakowski 2002). In particular, the
so-called ‘unanchored’ spaces, in which the inactive variables are integrated
over rather than fixed, have some advantages. We give the necessary formulas here; all can be checked by similar arguments to those given previously
for the anchored spaces (or see the reference above).
In the one-dimensional weighted case the inner product is now (cf. Example 3.8 for α = 1)
1
1
1 1 f (x) dx
g(x) dx +
f (x) g (x) dx,
f, g1,γ =
γ 0
0
0
and the corresponding norm is
1/2
f 1,γ = f, f 1,γ .
The corresponding reproducing kernel is, as before,
K1,γ (x, y) = 1 + γ η(x, y),
but now η is given by
η(x, y) = 12 B2 (|x − y|) + x − 12 y − 12 ,
(4.7)
where B2 (x) := x2 − x + 16 is the Bernoulli polynomial of degree 2, which
has the useful property
1
1
B2 (|x − y|) dy =
B2 (y) dy = 0.
0
0
High-dimensional integration: The quasi-Monte Carlo way
183
The corresponding kernel for the s-dimensional general weight case is
again given by (4.3), but now with η given by (4.7). The inner product for
the s-dimensional general weight unanchored space is
∂ |u|
−1
γu
f (x) dx−u
f, gs,γ =
[0,1]|u|
[0,1]s−|u| ∂xu
u⊆{1:s}
∂ |u|
g(x) dx−u dxu ,
×
[0,1]s−|u| ∂xu
where x−u stands for all the components of the s-dimensional vector x that
are not included in xu .
The range of choices for the weights is exactly the same as for the anchored
case.
The worst-case error is again given by (3.9). The necessary single and
double integrals of the kernel, along with the diagonal integral needed for
the QMC mean, now take the simpler forms
Ks,γ (x, y) dy = 1,
[0,1]s
Ks,γ (x, y) dx dy = 1,
[0,1]s [0,1]s
|u|
Ks,γ (x, x) dx =
γu 16 ,
[0,1]s
u⊆{1:s}
and in the case of product weights the third integral simplifies to
s γj
.
1+
6
j=1
To avoid having to treat the anchored and unanchored spaces separately,
we can write η(x, y) from (4.1) and (4.7) in the common form
η(x, y) = 12 B2 (|x − y|) + (x − 12 )(y − 12 ) + α(x) + α(y) + β,
(4.8)
where α(·) and β are defined by (4.6) for the anchored case, and α ≡ 0 and
β = 0 for the unanchored case.
4.4. Why weighted spaces are interesting
We have defined different kinds of weighted spaces for functions defined on
the s-dimensional cube, but we have not so far said why weighted spaces
are interesting. The answer is that the worst-case errors in any of our
weighted reproducing kernel Hilbert spaces are bounded independently of the
dimension s if (and only if ) the weights decay in a suitable way.
For product weights, the following result, first obtained by Sloan and
Woźniakowski (1998) for the anchored case c = 1, is particularly striking.
184
J. Dick, F. Y. Kuo and I. H. Sloan
Theorem 4.1. For product weights, and for the weighted anchored or
unanchored Sobolev space of Sections 4.2 and 4.3, there exist point sets
Pn ⊂ [0, 1]s for n = 1, 2, . . . such that the worst-case error en,s (Pn ; H(Ks,γ ))
is bounded independently of s if and only if
∞
γj < ∞.
(4.9)
j=1
There are two parts to this theorem: one part is an existence result,
stating that ‘good’ point sets exist if (4.9) holds, without saying how to find
them; and the other part is a negative result, saying that if the assumption
(4.9) does not hold then, no matter how the points are chosen, the worstcase error is unbounded as s → ∞. We shall address each result separately
in the following two lemmas.
Lemma 4.2. For product weights, and for the weighted anchored or unanchored Sobolev space of Sections 4.2 and 4.3, there exist point sets Pn ⊂
[0, 1]s for n = 1, 2, . . . such that the worst-case error en,s (Pn ; H(Kγ,s )) satisfies
∞ s
1
1
γj ≤ √ exp B
γj ,
en,s (Pn ; H(Ks,γ )) ≤ √ exp B
n
n
j=1
j=1
(c2 −c+1/2)/2
for the anchored space with anchor value c ∈ [0, 1],
where B =
and B = 1/12 for the unanchored space. (In particular, B = 1/4 for the
anchor c = 1 and B = 1/8 for the midpoint anchor c = 1/2.)
Thus the worst-case error is bounded independently of s, and has the
Monte Carlo rate of convergence, if the infinite sum in (4.9) converges.
Proof. We give the proof for the anchored space with c = 1. The other
cases follow similarly. It follows from the averaging argument (there is
always at least one choice as good as the average) that there exists a QMC
point set Pn for which the worst-case error is less than or equal to the QMC
mean given by (3.13). We produce an upper bound on the latter by omitting
the negative term, and then use the third equation in (4.5) to evaluate the
integral, obtaining
s γj
1
2
2
1+
en,s (Pn ; H(Ks,γ )) ≤ En,s (H(Ks,γ )) ≤
n
2
j=1
s
s
γj
1
1
1
≤ exp
log 1 +
γj ,
= exp
n
2
n
2
j=1
j=1
where in the last step we used the property that log(1 + x) ≤ x for all x > 0.
The rest of the claim in the lemma follows immediately.
High-dimensional integration: The quasi-Monte Carlo way
185
The ‘only if’ part of Theorem 4.1 for the anchored case comes from the
following result established by Sloan and Woźniakowski (1998). The argument there is relatively elementary, starting from (3.9) with the off-diagonal
terms of the last term dropped (which is justifiable by the non-negativity
assumption).
Lemma 4.3. If the reproducing kernel Ks of some Hilbert space H(Ks )
is non-negative, then for any point set Pn ⊂ [0, 1]s we have
e2n,s (Pn ; H(Ks )) ≥ e20,s (H(Ks ))(1 − n κ2s ),
where
κs :=
sup x∈[0,1]s
and
1
,
Ks (x, x) as s
as (x)
as (x) :=
[0,1]s
Ks (x, y) dy.
Hence en,s (Pn , H(Ks )) ≤ εe0,s (H(Ks )) can happen only if
1 − ε2
.
κ2s
n≥
This lemma applies to the anchored space of Section 4.2 because the
kernel Ks,γ is in this case manifestly positive. The proof of the second part
of Theorem 4.1 then follows in this case by showing that κs → 0 if the
sum in (4.9) diverges. For the unanchored case a different proof is needed,
because the kernel Ks,γ in that case is not necessarily positive. The result
for that case is proved in Sloan and Woźniakowski (2002, Theorem 1), but in
any case is superseded by the stronger result stated in Theorem 4.5 below.
In the case of our original anchored but unweighted Sobolev space discussed in Example 3.3, it can be shown (Sloan and Woźniakowski 1998) that
κs ≈ 1.055−s , thus in this case n must be exponentially large to reduce the
initial error by some fixed percentage, say 50%.
The upper bound on the worst-case error in Lemma 4.2 is independent
of s under the condition (4.9), but the apparent rate of convergence in that
lemma is just the Monte Carlo rate of O(n−1/2 ). Fortunately, as we shall
see in Sections 5 and 6, a convergence rate arbitrarily close to O(n−1 ) can
be attained if the (product) weights satisfy the stronger condition
∞
j=1
1/2
γj
< ∞.
186
J. Dick, F. Y. Kuo and I. H. Sloan
4.5. Tractability of multivariate integration
In this subsection we frame the earlier discussion in terms of the valuable
notion of tractability. Loosely speaking, the tractability of a problem relates
to the question of how quickly the difficulty of a problem increases as the
dimension increases. Note that it relates to the difficulty of the problem,
not to the cost of any particular algorithm.
The standard reference in this area is the recent three-volume work by
Novak and Woźniakowski (2008, 2010, 2012). In the present brief discussion
we limit ourselves to a very small part of the subject, touching only parts
that are relevant to our concerns with multivariate integration over the unit
cube.
Thus our problem is that of evaluating the s-dimensional integral Is (f )
defined in (1.1). We will assume that f belongs to a Banach space Fs of
real-valued functions defined on [0, 1]s . The space Fs could be one of our
reproducing kernel spaces Hs , but could also (as in the next subsection)
be a non-Hilbert space. The setting is that we are allowed to approximate
Is (f ) by any deterministic algorithm An,s (f ) that uses at most n function
values of Fs . (In the language of information-based complexity (Traub,
Wasilkowski and Woźniakowski 1988), the algorithm is restricted to ‘standard information’.) Thus An,s could be a QMC integration rule, but could
also be an integration rule that uses unequal weights, including even negative weights, and An,s is even allowed to be a nonlinear combination of
f (t0 ), . . . , f (tn−1 ).
For a given s ≥ 1 and a given ε ∈ (0, 1] we define the nth minimal number
for this problem by
n(ε, s) := min n ≥ 1 : inf sup |Is (f ) − An,s (f )| ≤ ε ,
An,s f Fs ≤1
where the infimum is over all algorithms An,s that use no more than n
function values of f (and no other information about f ).
Definition 4.4 (tractability). The integration problem is said to be polynomially tractable if there is some C > 0 and some p, q > 0 and q ≥ 0 such
that
p
1
sq
n(ε, s) ≤ C
ε
for all s ∈ N and ε ∈ (0, 1]. It is strongly polynomially tractable if this
inequality holds with q = 0, that is, if
p
1
.
n(ε, s) ≤ C
ε
High-dimensional integration: The quasi-Monte Carlo way
187
Many other variants of tractability are defined in the cited works, but
these two will suffice for our present purposes.
The following theorem mimics Theorem 4.1 in the preceding subsection,
but there is an important difference: whereas for the ‘only if’ part of the theorem we previously allowed only QMC integration rules, here all algorithms
An,s are allowed, making the statement considerably stronger.
Theorem 4.5 (strong polynomial tractability). For product weights,
and for the weighted anchored or unanchored Sobolev space of Sections 4.2
and 4.3, the integration problem is strongly polynomially tractable if and
only if
∞
γj < ∞.
j=1
A necessary and sufficient condition for polynomial tractability is as follows.
Theorem 4.6 (polynomial tractability). For product weights, and for
the weighted anchored or unanchored Sobolev space of Sections 4.2 and 4.3,
the integration problem is polynomially tractable if and only if
s
j=1 γj
< ∞.
lim sup
s→∞ log(s + 1)
The preceding theorem may give the impression that non-trivial weights
are needed in order to have (polynomial) tractability, but the remarkable
result discussed in the next subsection shows that, while this is true for the
Hilbert spaces we have met so far, it is not true for the classical L1 version
of the anchored spaces.
But the Hilbert spaces retain one great advantage, namely that the worstcase error for a given QMC rule is computable, something that turns out (as
we shall see in the next two sections) to be helpful in construction. The star
discrepancy (whether weighted or not), on the other hand, is notoriously
difficult to compute: see Gnewuch, Srivastav and Winzen (2009).
In this subsection we have so far restricted ourselves to product weights.
Tractability results for general weights were first considered by Sloan et al.
(2004). For example, for general weights γu and the unanchored Sobolev
space it is shown that the integration problem is strongly polynomially
tractable if
|u|
γu 16
< ∞,
|u|<∞
where now the sum is over all finite subsets of the natural numbers. For
other tractability results for general weights and variants of the weights, we
refer the reader to Novak and Woźniakowski (2010).
188
J. Dick, F. Y. Kuo and I. H. Sloan
4.6. Tractability of star discrepancy
A surprising result established in Heinrich, Novak, Wasilkowski and Woźniakowski (2001) is that there exists a point set P in [0, 1]s consisting of n points
such that its star discrepancy, given by (3.16), satisfies
(
s
∗
,
(4.10)
Dn (P ) ≤ C
n
for some constant C > 0 which is independent of n and s. Aistleitner (2011)
showed that C can be chosen as 10, for instance. A lower bound by Hinrichs
(2004) states that the infimum of Dn∗ (P ) over all point sets P is bounded by
s
∗
inf
,
Dn (P ) ≥ min c0 , c
n
P ⊂[0,1]s ,|P |=n
where the constant c > 0 is independent of s and n and
c0 =
1
,
32e2
as stated in Gnewuch and Roşca (2009).
In the following we prove a slightly weaker version of (4.10). The proof is
based on Hoeffding’s inequality, a special case of which we describe in the
following. Let X0 , X1 , . . . , Xn−1 be independent and identically distributed
random variables with mean µ and Xi ∈ [−1, 1] almost surely for 0 ≤ i < n.
Let
1
Xi .
n
n−1
X=
i=0
Then
P(|X − µ| ≥ δ) ≤ 2 e−δ
2 n/2
,
for all δ ≥ 0.
We need an additional lemma.
Lemma 4.7. Let P = {t0 , t1 , . . . , tn−1 } ⊂ [0, 1]s be a point set consisting
of n elements. Let δ > 0 and m = s/δ. Let Γm be the equidistant grid on
[0, 1]s with mesh size 1/m (and therefore cardinality (m + 1)s ). Then
Dn∗ (P ) ≤ max |∆P (x)| + δ.
x∈Γm
Proof.
that
Let η > 0. Then there is a vector x∗ = (x∗1 , . . . , x∗s ) ∈ [0, 1]s such
Dn∗ (P ) ≤ |∆P (x∗ )| + η.
High-dimensional integration: The quasi-Monte Carlo way
189
Let y, z ∈ Γm be such that yj ≤ x∗j ≤ zj and zj − yj = m−1 . Then we have
s
r−1 s
r
s
s
s
zj −
yj =
zj
yj −
zj
yj
j=1
r=1
j=1
=
s
r=1
j=r
s
j=r+1
j=1
zj
r−1
j=r+1
yj (zr − yr ) ≤
j=1
j=1
s
≤ δ,
m
where the empty product is set to 1. Thus we have
s
j=1
zj −
1
1
1[0,z) (ti ) − δ ≤
x∗j −
1[0,x∗ ) (ti )
n
n
n−1
s
n−1
i=0
j=1
i=0
≤
s
1
1[0,y) (ti ) + δ,
n
n−1
yj −
j=1
i=0
which implies the result.
We are now ready to prove the following slightly weaker version of (4.10),
which is Theorem 1 of Heinrich et al. (2001).
Theorem 4.8. For every n, s ∈ N there exists a point set P consisting of
n points in [0, 1]s , whose star discrepancy satisfies
√ ) √ *
2
s n
2
∗
Dn (P ) ≤ √
+ 1 + log 2.
s log √
n
4 log 2
Proof. Let t0 , t1 , . . . , tn−1 be independently and uniformly distributed random variables in [0, 1]s . Let P = {t0 , t1 , . . . , tn−1 }. Then for each x ∈ [0, 1]s
∆ti (x) = 1[0,x) (ti ) − x1 · · · xs
is a random variable with mean 0 and |∆ti (x)| ≤ 1 for 0 ≤ i < n. Further,
we have
n−1
1
∆ti (x).
∆P (x) =
n
i=0
Thus we can use Hoeffding’s inequality, which implies that
+
,
P Dn∗ (P )) ≤ 2δ ≥P max |∆P (x)| ≤ δ
x∈Γm
≥1 − 2(m + 1)s e−δ
2 n/2
.
We now choose the parameters such that 1 − 2(m + 1)s e−δ
equivalently
δ2n
< 0.
log 2 + s log(m + 1) −
2
2 n/2
> 0, or
190
J. Dick, F. Y. Kuo and I. H. Sloan
Since m = s/δ, this holds for all δ > δ0 = δ0 (n, s), where δ0 satisfies
δ02 = 2n−1 (s log(s/δ0 + 1) + log 2).
This implies that
1
≤
δ0
and therefore
δ02
≤ 2n
−1
(
n
,
4 log 2
) √ *
s n
+ 1 + log 2 .
s log √
4 log 2
Thus for any δ > δ0 there exist points t0 , t1 , . . . , tn−1 such that
Dn∗ ({t0 , t1 , . . . , tn−1 }) ≤ 2δ.
Hence there is a point set P = {t0 , t1 , . . . , tn−1 } such that Dn∗ (P ) ≤ 2δ0 .
4.7. Notes
Weighted spaces were first introduced by Sloan and Woźniakowski (1998),
and subsequently generalized by Sloan and Woźniakowski (2001, 2002), Dick
et al. (2006) and Sloan et al. (2004). A non-technical introduction to the
weighted space setting and lattice rules is given in Kuo and Sloan (2005).
For the standard reference on tractability see the recent three-volume work
by Novak and Woźniakowski (2008, 2010, 2012).
The ‘only if’ parts of Theorems 4.5 and 4.6 were proved by Novak and
Woźniakowski (2001) for the anchored Sobolev space, and also for the more
general class of ‘decomposable’ kernels. For the unanchored Sobolev space
the second parts of the theorems were proved by Sloan and Woźniakowski
(2002), using results from Hickernell and Woźniakowski (2001) and in turn
Hickernell and Woźniakowski (2000). See also Wasilkowski and Woźniakowski (2004).
A classic paper on numerical integration and discrepancy is by Hickernell (1998a), who defined discrepancies which include lower-dimensional
projections and introduced the reproducing kernel machinery for numerical
integration. Sufficient conditions on the weights for which Sobol , Halton,
and Niederreiter sequences achieve strong tractability have been studied
by Wang (2002, 2003). Tractability of so-called ‘finite-order weights’ has
been shown by Sloan et al. (2004). Strong tractability of scrambled digital
nets and sequences has been studied by Yue and Hickernell (2005, 2006).
Tractability questions for the weighted star discrepancy have been considered by Hinrichs, Pillichshammer and Schmid (2008). A strategy for
choosing the weights in finance applications has been considered in Wang
and Sloan (2006, 2007). The concepts of ‘effective dimension’, ‘truncation
High-dimensional integration: The quasi-Monte Carlo way
191
dimension’ and ‘superposition dimension’ have been introduced in Caflisch
et al. (1997). The effective dimension of problems arising from financial
applications have been studied by Wang and Fang (2003) and Wang and
Sloan (2005) in the context of QMC.
The equidistant grid Γm used in Lemma 4.7 is a special δ-cover or (essentially) bracketing cover. The notion of bracketing is well established in empirical process theory (see van der Vart and Wellner 2009, for example), and
to achieve better results than in Theorem 4.8 one needs better bracketing
covers than Γm . In fact, the results from Aistleitner (2011), together with
the results on probabilistic discrepancy estimates from Gnewuch and Roşca
(2009), Aistleitner and Hofer (2012) and Gnewuch (2009), as well as theoretical arguments from Doerr, Gnewuch and Wahlström (2009) and Doerr,
Gnewuch, Kritzer and Pillichshammer (2008) for constructing small discrepancy samples, all rely on the constructive bracketing covers and the induced
upper bounds on bracketing numbers from Gnewuch (2008). Aistleitner
(2011) and Aistleitner and Hofer (2012) also use Bernstein’s inequality and
the technique of dyadic chaining.
Attempts have been made to find constructions of point sets whose star
discrepancy grows only polynomially with the dimension: see for instance
Doerr, Gnewuch and Wahlström (2010) and Doerr et al. (2008). An implementation and numerical experiments have been reported in Doerr et al.
(2009). However, the construction cost of these algorithms depends exponentially on the dimension. An essential problem in these algorithms is
the estimation of the star discrepancy of a given point set, which is itself
intractable as shown by Gnewuch et al. (2009). Algorithms for estimating
the star discrepancy have been further investigated by Gnewuch, Wahlström
and Winzen (2012).
Let n(ε, d) be the smallest number necessary to reduce the d-dimensional
L2 discrepancy ∆P 2 by a factor of ε. The exponent of discrepancy is then
the smallest number p such that n(ε, d) ≤ Cε−p for all d ≥ 1. It has been
shown that 1.0669 ≤ p ≤ 1.41274 for the classical L2 discrepancy, where the
lower bound is by Matoušek (1998a) and the upper bound by Wasilkowski
and Woźniakowski (2010).
5. Lattice rules
In Section 2 we gave a brief introduction to lattice rules. We defined rankone lattice rules and explained how randomly shifted lattice rules can be
used for practical error estimation. We also introduced the component-bycomponent (CBC) construction for obtaining good lattice rule generating
vectors. Here in this section we provide the theory behind the CBC construction, including error analysis of randomly shifted lattice rules and the
fast implementation of the CBC algorithm. We will also discuss extensible
192
J. Dick, F. Y. Kuo and I. H. Sloan
lattice sequences, which turn lattice rules from ‘closed’ point sets to ‘open’
sequences, making them more flexible for practical applications.
For most of this section, we will consider the weighted anchored or unanchored spaces of Section 4, which contain integrands with square-integrable
mixed first derivatives. The lattice rules constructed for these function
spaces can achieve close to O(n−1 ) convergence rate, with implied constants
that can be independent of the dimension s if the weights of the function
space satisfy a certain condition. We will also discuss the use of the baker’s
transformation to obtain close to O(n−2 ) convergence rate when the integrands have square-integrable mixed second derivatives, again with implied
constants that can be independent of s. We will also outline other strategies
for obtaining even higher orders of convergence using lattice rules, but now
the error bounds known so far have exponential growth in s.
5.1. A brief background on the classical theory of lattice rules
Lattice rules were originally designed for periodic functions. It is customary
to assume that the integrand f has an absolutely convergent Fourier series
(5.1)
f-(h) e2πih·x , i2 = −1,
f (x) =
h∈Zs
with Fourier coefficients
f-(h) =
f (x) e−2πih·x dx,
[0,1]s
where h · x = h1 x1 + · · · + hs xs denotes the usual vector dot product. It
follows from the absolute (and hence uniform) convergence of (5.1) that f is
necessarily continuous and also one-periodic with respect to each variable,
that is, f (x)|xj =0 = f (x)|xj =1 for all j = 1, . . . , s.
Theorem 5.1 (the lattice rule error). Let Qn,s denote a lattice rule
(not necessarily rank-one) and let L denote the associated integration lattice. If f has an absolutely convergent Fourier series (5.1), then
f-(h),
Qn,s (f ) − Is (f ) =
h∈L⊥ \{0}
where L⊥ := {h ∈ Zs : h · x ∈ Z for all x ∈ L} is the dual lattice associated
with L.
We will prove this result for rank-one lattice rules (see (2.3)), that is, for
QMC rules of the form
n−1 1
iz
.
f
Qn,s (f ) =
n
n
i=0
High-dimensional integration: The quasi-Monte Carlo way
193
In this case, L⊥ = {h ∈ Zs : h · z ≡ 0 (mod n)}. A proof for general lattice
rules is given in Sloan and Joe (1994).
Theorem 5.2 (rank-one lattice rule error). Let Qn,s denote a rankone lattice rule with generating vector z. If f has an absolutely convergent
Fourier series (5.1), then
f-(h).
Qn,s (f ) − Is (f ) =
h∈Zs \{0}
h·z≡0 (mod n)
Proof.
Using the Fourier series (5.1), we can write
kz
f (x) dx
−
n
[0,1]s
k=0
n−1
1
2πikh·z/n
e
e2πih·x dx.
−
f-(h)
f-(h)
=
n
s
[0,1]
s
s
1
f
Qn,s (f ) − Is (f ) =
n
n−1
h∈Z
h∈Z
k=0
The result then follows from the elementary properties
1 if h = 0,
2πih·x
e
dx =
s
0 otherwise,
[0,1]
and
1 2πikh·z/n
e
=
n
n−1
k=0
1
0
if h · z ≡ 0 (mod n),
otherwise.
(5.2)
The property (5.2) is sometimes referred to as the ‘character property’ of
lattice rules with respect to the exponential functions. It is one of the key
properties we need for the error analysis of lattice rules.
In the classical theory of lattice rules, one considers a class Eα (c) of
functions whose Fourier coefficients satisfy, for α > 1 and c > 0,
|f-(h)| ≤
c
,
(h1 · · · hs )α
with h := max(1, |h|).
It follows that
|Qn,s (f ) − Is (f )| ≤ c
h∈Zs \{0}
h·z≡0 (mod n)
1
,
(h1 · · · hs )α
for f ∈ Eα (c).
194
J. Dick, F. Y. Kuo and I. H. Sloan
This leads to the definition of a quality measure called Pα ,
1
Pα (z, n) :=
, α > 1,
(h1 · · · hs )α
s
(5.3)
h∈Z \{0}
h·z≡0 (mod n)
and the goal is to choose z so that Pα (z, n) is as small as possible. By using
the character property (5.2) (or by recognizing that Pα is the integration
error for a function f with Fourier coefficients f-(h) = 1/(h1 · · · hs )α ), one
can express Pα as
n−1 s
e2πikhzj /n
1 1+
(5.4)
Pα (z, n) = −1 +
n
|h|α
k=0 j=1
h∈Z\{0}
s
s
e2πikhzj /n
1 n−1
1 = −1 +
1 + 2ζ(α) +
1+
,
n
n
|h|α
j=1
where
k=1 j=1
∞
1
,
ζ(x) :=
hx
h∈Z\{0}
x > 1,
(5.5)
h=1
is the Riemann zeta function. For practical computation, α is often taken
to be an even integer. This is because the Bernoulli polynomial of degree α,
with α an even integer, has the Fourier series
α
(−1) 2 +1 α!
Bα (x) =
(2π)α
h∈Z\{0}
e2πihx
,
hα
for x ∈ [0, 1],
so that in this case we can write
α
e2πikhzj /n
kzj
(−1) 2 +1 (2π)α
B
.
=
α
|h|α
α!
n
(5.6)
(5.7)
h∈Z\{0}
This allows Pα in (5.4) to be computed in O(n s) operations. One may
therefore use the Korobov construction in Example 2.9 to search for a lattice
generating vector z, using the criterion that Pα be as small as possible.
It is well known that the rate of decay of the Fourier coefficients of a
function is related to the smoothness of the function. For instance, if α > 1
is an integer and all partial derivatives
∂ q1 +···+qs f
,
∂xq11 · · · ∂xqss
0 ≤ q1 , . . . , qs ≤ α,
exist and are continuous on [0, 1]s , then there exists c > 0 for which f ∈
Eα (c). Thus the class Eα (c) is essentially a class of functions with smoothness determined by α, therefore the parameter α is called the smoothness
High-dimensional integration: The quasi-Monte Carlo way
195
parameter. It is known that a convergence rate of
Pα (z, n) = O(n−α (log n)α s )
can be achieved. There are also other related quality measures for lattice
rules (Niederreiter 1992a, Sloan and Joe 1994).
We shall not discuss the classical theory further, because in the classical
error analysis all error bounds grow exponentially with dimension. In the
following, we will carry out error analysis in the weighted function space
setting of Section 4.
5.2. Shift-averaged worst-case error
In this subsection we discuss a general strategy for the error analysis of randomly shifted QMC rules, as a preparation for our later analysis of randomly
shifted lattice rules.
For any QMC point set P = {t0 , . . . , tn−1 } and any shift ∆ ∈ [0, 1]s , let
P + ∆ = {{ti + ∆} : i = 0, 1, . . . , n − 1}
denote the shifted QMC point set, and let Qn,s (∆; f ) denote the corresponding shifted QMC rule. Then, for any integrand f belonging to some
normed space H, it follows from the definition of the worst-case error (see
Definition 3.4) that
|Is (f ) − Qn,s (∆, f )| ≤ en,s (P + ∆; H) f H .
We deduce a bound for the root-mean-square error
E |Is (f ) − Qn,s (∆, f )|2 ≤ esh
n,s (P ; H) f H ,
where the expectation E is taken over the random shift ∆ which is uniformly
distributed over [0, 1]s , and where the quantity
esh
n,s (P ; H) :=
[0,1]s
e2n,s (P + ∆; H) d∆
(5.8)
is referred to as the shift-averaged worst-case error.
The shift-averaged worst-case error (5.8) will be used as our quality measure for randomly shifted QMC rules. For any given point set P , the averaging argument guarantees the existence of at least one shift ∆ for which
en,s (P + ∆; H) ≤ esh
n,s (P ; H).
The following theorem gives an explicit formula for the shift-averaged
worst-case error when the function space is an RKHS.
196
J. Dick, F. Y. Kuo and I. H. Sloan
Theorem 5.3 (formula for the shift-averaged worst-case error).
The shift-averaged worst-case error (5.8) for a QMC point set P in an RKHS
Hs (K) satisfies
2
[esh
n,s (P ; Hs (K))] = −
where
[0,1]s
[0,1]s
K(x, y) dx dy +
n−1 n−1
1 sh
K (ti , tk ),
n2
i=0 k=0
(5.9)
K sh (x, y) :=
[0,1]s
K({x + ∆}, {y + ∆}) d∆
for all x, y ∈ [0, 1]s . (5.10)
Proof. Using the definition (5.8) and applying the formula (3.9) for the
worst-case error en,s (P + ∆; Hs (K)), we obtain
2
[esh
n,s (P ; Hs (K))]
n−1 2
=
K(x, y) dx dy −
K({ti + ∆}, y) d∆ dy
n
s [0,1]s
[0,1]s [0,1]s
[0,1]
i=0
n−1
n−1
1
+ 2
K({ti + ∆}, {tk + ∆}) d∆.
n
[0,1]s
i=0 k=0
With a change of variables x = {ti + ∆}, the double integrals in the second
term turn into the double integral in the first term. The result then follows
from the definition (5.10).
The function K sh defined by (5.10) is actually a reproducing kernel, with
the shift-invariant property
K sh (x, y) = K sh ({x + ∆}, {y + ∆})
for all x, y, ∆ ∈ [0, 1]s ,
or equivalently
K sh (x, y) = K sh ({x − y}, 0)
for all x, y ∈ [0, 1]s .
Furthermore, it can be verified that the shift-averaged worst-case error
esh
n,s (P ; Hs (K)) is precisely the worst-case error of the QMC point set P
in the RKHS with K sh as the reproducing kernel, that is,
e2n,s (P + ∆; Hs (K)) d∆ = e2n,s (P ; Hs (K sh )).
[0,1]s
In words, the average over all possible shifts of the squared worst-case error
for a shifted QMC rule in a Hilbert space with reproducing kernel K is
equal to the squared worst-case error of the original unshifted QMC rule
in a Hilbert space with reproducing kernel K sh .This important connection
High-dimensional integration: The quasi-Monte Carlo way
197
provides a powerful tool for analysing randomly shifted QMC rules, because
it is often easier to work with a shift-invariant kernel. We refer to the kernel
K sh as the shift-invariant kernel associated with K.
Further, we note that for the reproducing kernels we consider below, the
reproducing kernel K sh has the same smoothness properties as K. As the
smoothness properties of the reproducing kernel determine the convergence
rate of the worst-case error, one obtains the same convergence rate of the
worst-case error for Hs (K sh ) as for Hs (K).
5.3. Randomly shifted lattice rules in weighted Sobolev spaces
We are now ready to analyse randomly shifted lattice rules in the weighted
anchored or unanchored Sobolev space of Section 4. First we derive the
associated shift-invariant kernel needed for the shift-averaged worst-case
error.
Lemma 5.4. Let H(Ks,γ ) be the weighted anchored or unanchored Sobolev
space of Section 4. The shift-invariant kernel associated with Ks,γ is
sh
(x, y) =
γu
(5.11)
B2 (|xj − yj |) + β ,
Ks,γ
u⊆{1:s}
j∈u
where β = c2 − c + 1/3 for the anchored variant with anchor c, and β = 0
for the unanchored variant.
Proof.
From the definitions (5.10) and (4.3), we have
1
sh
(x, y) =
γu
η({xj + ∆j }, {yj + ∆j }) d∆j ,
Ks,γ
u⊆{1:s}
j∈u
0
where the function η can be written in the common form (4.8) for both the
anchored and unanchored spaces. Using the symmetry B2 (x) = B2 (1 − x)
1
and 0 α(x) dx = 0, we obtain
1
η({x + ∆}, {y + ∆}) d∆
0
1
1
= 2 B2 (|x − y|) +
{x + ∆} − 12 {y + ∆} − 12 d∆ + β.
0
Therefore it remains to show that
1
{x + ∆} − 12 {y + ∆} − 12 d∆ = 12 B2 (|x − y|).
I(x, y) :=
0
Since I(x, y) = I(y, x), without loss of generality we assume that x ≤ y.
There are three possible arrangements of the values of x + ∆ and y + ∆
198
J. Dick, F. Y. Kuo and I. H. Sloan
for x, y, ∆ ∈ [0, 1):
x + ∆ ≤ y + ∆ < 1 ⇒ ∆ < 1 − y,
x + ∆ < 1 ≤ y + ∆ ⇒ 1 − y ≤ ∆ < 1 − x,
1 ≤ x + ∆ ≤ y + ∆ ⇒ 1 − x ≤ ∆.
Thus we have
1−y I(x, y) =
0
x+∆−
1−x 1
2
y+∆−
x+∆−
+
1−y
1 1
2
1
2
d∆
y + ∆ − 1 − 12 d∆
y + ∆ − 1 − 12 d∆
1−x
1
1
= 12 |x − y|2 − |x − y| + 16 .
= 2 (y − x)2 − 12 (y − x) + 12
+
x+∆−1−
1
2
This completes the proof.
For simplicity, let
sh
esh
n,s (z) := en,s (P ; H(Ks,γ ))
denote the shift-averaged worst-case error for a rank-one lattice rule with
point set P and generating vector z in our weighted anchored or unanchored
2
space. The following lemma gives an explicit expression for [esh
n,s (z)] .
Lemma 5.5 (shift-averaged worst-case error for lattice rules). The
shift-averaged worst-case error for a rank-one lattice rule in the weighted
anchored or unanchored Sobolev space satisfies
n−1 .
kz
1
j
2
γu
B2
+ β − β |u| , (5.12)
[esh
n,s (z)] =
n
n
∅=u⊆{1:s}
k=0 j∈u
where β = c2 − c + 1/3 for the anchored variant with anchor c, and β = 0 for
the unanchored variant. For product weights, the expression simplifies to
.
s
s 1 n−1
kzj
sh
2
1 + γ j B2
[en,s (z)] = −
+ β . (5.13)
1 + γj β +
n
n
j=1
k=0 j=1
Proof. Substituting ti = {iz/n} into (5.9) and using (4.5) and (5.11), we
obtain
n−1 n−1
(i − k)zj .
1 sh
2
|u|
+β .
B2
γu β + 2
γu
[en,s (z)] = −
n
n
u⊆{1:s}
i=0 k=0 u⊆{1:s}
j∈u
As i and k range from 0 to n − 1, the values of (i − k) mod n are just
0, . . . , n − 1 in some order, with each value occurring n times. Thus we can
High-dimensional integration: The quasi-Monte Carlo way
199
2
reduce the double sum in [esh
n,s (z)] to a single sum. Grouping the sum over
u and then cancelling the u = ∅ term yields the formula for general weights
above. The formula for product weights follows from the identity (4.2).
5.4. Component-by-component construction
Recall that the components of the generating vector z can be restricted to
the set
Un = {z ∈ Z : 1 ≤ z ≤ n − 1 and gcd(z, n) = 1},
whose cardinality is given by the Euler totient function
ϕ(n) := |Un | = |{z ∈ Z : 1 ≤ z ≤ n − 1 and gcd(z, n) = 1}|.
(5.14)
When n is prime ϕ(n) takes its largest value n−1, hence there are altogether
up to (n − 1)s possible choices for z, far too many to allow an exhaustive
search for the best one.
The CBC construction below, already foreshadowed in Example 2.10,
provides a feasible way to obtain good lattice generating vectors. The first
component z1 is arbitrarily set to 1, since all choices in one dimension lead
to the same rectangle rule.
Algorithm 5.6 (CBC construction). Given: n, smax , and weights γu .
(1) Set z1 = 1.
2
(2) For s = 2, 3, . . . , smax , choose zs in Un to minimize [esh
n,s (z1 , . . . , zs )] .
With general weights γu , the cost of the CBC algorithm is prohibitively
expensive, thus when discussing implementation we will always assume there
is some special structure, such as product weights, order-dependent weights,
finite-order weights, or POD weights. We will discuss the implementation
of CBC in later subsections.
In this subsection we focus on error bounds resulting from the CBC construction. The CBC error bounds are proved by induction: under the induction hypothesis that the desired bound holds in s − 1 dimensions, we
show by the averaging argument that the algorithm picks a next component
zs , for which the bound holds in s dimensions.
To illustrate the general argument, we first prove that in the simplest
case of product weights and prime n, a generating vector constructed by
the CBC algorithm beats the QMC mean (see (3.13) together with (4.5))
s s
1
1
2
1 + γj β +
−
(Ks,γ ) =
(1 + γj β) ,
En,s
n
6
j=1
j=1
which yields a convergence rate of O(n−1/2 ). We will prove a better, if more
complicated, result in Theorem 5.8 below.
200
J. Dick, F. Y. Kuo and I. H. Sloan
Theorem 5.7 (CBC error bound beats QMC mean). Consider the
weighted anchored or unanchored Sobolev space with product weights γ1 ≥
γ2 ≥ · · · > 0, and suppose that n is a prime number satisfying
γ1
,
(5.15)
n≥
6(1 + γ1 β)
where β = c2 − c + 1/3 for the anchored variant with anchor c, and β = 0
for the unanchored variant. The generating vector z ∈ Usn constructed by
the CBC algorithm, minimizing the squared shift-averaged worst-case error
2
[esh
n,s (z)] in the weighted anchored or unanchored Sobolev space in each
step, satisfies
2
2
[esh
n,s (z)] < En,s (Ks,γ ).
(5.16)
Proof. For the inductive step we need to show, under the assumption
2
2
sh
2
2
[esh
n,s−1 (z1 , . . . , zs−1 )] < En,s−1 (Ks−1,γ ), that [en,s (z)] < En,s (Ks,γ ) when
sh
2
zs ∈ Un is chosen to minimize [en,s (z)] .
To proceed, we rewrite (5.13) as
2
[esh
n,s (z)]
s−1
sh
1
γs 2
1 + γj β +
= 1 + γs β [en,s−1 (z1 , . . . , zs−1 )] +
6n
6
+
γs
n
n−1
B2
j=1
s−1
..
kzj
kzs
+β
,
1 + γ j B2
n
n
j=1
k=1
where we separated out the k = 0 term and used B2 (0) = 1/6. Next we
average over all possible choices of zs , forming
µ(z1 , . . . , zs−1 ) :=
n−1
1 sh
[en,s (z)]2 .
n−1
zs =1
Since B2 (x) = (1/2π 2 ) h∈Z\{0} e2πihx /h2 for x ∈ [0, 1], for any integer k
from 1 to n − 1 we can write
n−1
n−1
2π 2 kzs
1 e2πihkz/n
B2
=
n−1
n
n−1
h2
zs =1
=
h∈Z\{0}
h≡0 (mod n)
=
h∈Z\{0}
h≡0 (mod n)
1
1
−
2
h
n−1
1
1
−
h2 n − 1
z=1 h∈Z\{0}
h∈Z\{0}
h≡0 (mod n)
h∈Z\{0}
1
h2
1
−
h2
h∈Z\{0}
h≡0 (mod n)
1
h2
201
High-dimensional integration: The quasi-Monte Carlo way
=
n
n−1
h∈Z\{0}
h≡0 (mod n)
1
1
−
2
h
n−1
h∈Z\{0}
1
h2
2π 2
2π 2
2π 2 n
−
=
−
,
6n2 (n − 1) 6(n − 1)
6n
2
2
where we used ∞
h=1 1/h = π /6. Thus we conclude for any integer 1 ≤
k ≤ n − 1 that
n−1
kzs
1
1 =− .
(5.17)
B2
n−1
n
6n
=
zs =1
Combining the expressions yields
2
µ(z1 , . . . , zs−1 ) = 1 + γs β [esh
n,s−1 (z1 , . . . , zs−1 )]
.
s−1
n−1 s−1
kzj
γs 1
γs − 2
+β .
1 + γj β +
1 + γ j B2
+
6n
6
6n
n
j=1
k=1 j=1
Rewriting the third expression in terms of e2n,s−1 (z1 , . . . , zs−1 )
collecting similar expressions, we obtain
γs
2
[esh
µ(z1 , . . . , zs−1 ) = 1 + γs β −
n,s−1 (z1 , . . . , zs−1 )]
6n
s−1
s−1
γs 1 γs γs
+
1 + γj β +
−
1 + γj β +
6n
6
6n n
j=1
j=1
and then
1
6
.
The condition (5.15) ensures that 1 + γs β − γs /(6n) ≥ 0. Applying the
induction hypothesis and dropping a negative term, we finally arrive at
2 (K
µ(z1 , . . . , zs−1 ) < En,s
s,γ ).
2
Since µ(z1 , . . . , zs−1 ) is the average of [esh
n,s (z)] over all zs ∈ Un , the
2
choice of zs that minimizes [esh
n,s (z)] must satisfy
2
2
[esh
n,s (z)] ≤ µ(z1 , . . . , zs−1 ) < En,s (Ks,γ ).
To complete the proof by induction we note that the error bound (5.16)
is easily satisfied for s = 1: indeed we may adapt (5.17) to obtain
n−1
k
γ1 1 n − 1
γ1
γ1 γ1
sh
2
=
−
= 2 ≤
.
B2
[en,1 (1)] =
n
n
n 6
6n
6n
6n
k=0
This completes the proof.
We now show, through a more complicated averaging argument in the
induction proof, that the CBC algorithm yields a convergence rate arbitrarily close to O(n−1 ). The result holds for general n (not necessarily prime),
202
J. Dick, F. Y. Kuo and I. H. Sloan
and holds for the unanchored space with general weights (including product
weights), but holds for the anchored space only with product weights. For
the anchored space with general non-product weights, the same error bound
holds, but a modification to the search criterion in the CBC construction
is needed. We therefore state the results for the unanchored and anchored
spaces separately.
In the following theorems, the choice of zs at each step is independent of
the parameter λ, and the error bound holds for all values of λ ∈ (1/2, 1].
The optimal convergence rate close to O(n−1 ) is obtained with λ → 1/2,
but note that λ = 1/2 is not permitted because ζ(2λ) → ∞ as λ → 1/2.
The implied constant in the big-O bound can be independent of s under an
appropriate condition on the weights. For example, in the case of product
weights, a sufficient condition to obtain close to O(n−1 ) convergence with
1/2
the implied constant independently of s is ∞
< ∞.
j=1 γj
Theorem 5.8 (optimal CBC error bound: the unanchored case).
The generating vector z ∈ Usn constructed by the CBC algorithm, minimiz2
ing the squared shift-averaged worst-case error [esh
n,s (z)] for the weighted
unanchored Sobolev space in each step, satisfies
|u| 1/λ
2ζ(2λ)
1
2
γuλ
,
(5.18)
[esh
n,s (z)] ≤
ϕ(n)
(2π 2 )λ
∅=u⊆{1:s}
for all λ ∈ (1/2, 1], where ζ(·) is the Riemann zeta function (5.5), and ϕ(n)
is the Euler totient function (5.14).
Proof. We prove this by induction on s. The base step s = 1 is straightforward to verify for all λ ∈ (1/2, 1]. Assume now that we have chosen the
first s − 1 components z1 , . . . , zs−1 and that (5.18) holds with s replaced by
s − 1. We separate the terms in (5.12) (remembering that β = 0 for this
unanchored space), depending on whether or not the element s is included
in the set u, to obtain the recursive expression
2
sh
2
[esh
n,s (z1 , . . . , zs−1 , zs )] = [en,s−1 (z1 , . . . , zs−1 )] + θ(zs ),
with (suppressing the dependence of θ on z1 , . . . , zs−1 )
n−1
kzj
1 γu
B2
θ(zs ) :=
n
n
k=0 j∈u
s∈u⊆{1:s}
n−1
γu
e2πikhu ·z u /n
1
=
2
(2π 2 )|u| n
j∈u hj
|u|
s∈u⊆{1:s}
k=0 hu ∈(Z\{0})
(5.19)
203
High-dimensional integration: The quasi-Monte Carlo way
=
s∈u⊆{1:s}
=
s∈u⊆{1:s}
γu
(2π 2 )|u|
γu
(2π 2 )|u|
hu ∈(Z\{0})|u|
hu ·z u ≡0 (mod n)
hs ∈Z\{0}
1
2
j∈u hj
1
h2s
hu\{s} ∈(Z\{0})|u|−1
hu\{s} ·z u\{s} ≡−hs zs (mod n)
1
2
j∈u\{s} hj
,
where we made use of the Fourier expansion of B2 and the character property (5.2).
If zs∗ denotes the value chosen by the CBC algorithm in dimension s, then
(since the minimum is always smaller than or equal to the average) we have
for all λ ∈ (0, 1] that
[θ(zs∗ )]λ ≤
1 1 γuλ
[θ(zs )]λ ≤
ϕ(n)
ϕ(n)
(2π 2 )|u|λ
zs ∈Un
zs ∈Un s∈u⊆{1:s}
1
1
×
,
2λ
|hs |2λ
j∈u\{s} |hj |
hs ∈Z\{0}
hu\{s} ∈(Z\{0})|u|−1
hu\{s} ·z u\{s} ≡−hs zs (mod n)
where we used the inequality (sometimes referred to as Jensen’s inequality)
λ ak
≤
aλk , ak ≥ 0, λ ∈ (0, 1].
(5.20)
k
k
Next we separate the terms depending on whether or not hs is a multiple
of n, to obtain
λ
2ζ(2λ)
γ
1
u
·
[θ(zs∗ )]λ ≤
2λ
n2λ
(2π 2 )|u|λ
j∈u\{s} |hj |
hu\{s} ∈(Z\{0})|u|−1
hu\{s} ·z u\{s} ≡0 (mod n)
s∈u⊆{1:s}
n−1
γuλ
1 +
ϕ(n)
(2π 2 )|u|λ
zs ∈Un c=1 s∈u⊆{1:s}
1
×
|hs |2λ
hs ∈Z\{0}
hs ≡−czs−1 (mod n)
∈(Z\{0})|u|−1
hu\{s}
hu\{s} ·z u\{s} ≡c (mod n)
1
j∈u\{s} |hj |
2λ
,
where zs−1 denotes the multiplicative inverse of zs in Un , that is, zs zs−1 ≡
1 (modn). For fixed c satisfying 1 ≤ c ≤ n − 1, we have {czs−1 mod n : zs ∈
Un } = {cz mod n : z ∈ Un }. Let g = gcd(c, n). Then gcd(c/g, n/g) = 1,
204
J. Dick, F. Y. Kuo and I. H. Sloan
and
zs ∈Un
hs ∈Z\{0}
hs ≡−czs−1 (mod n)
=
z∈Un m∈Z
=g
−2λ
1
=
|hs |2λ
z∈Un
hs ∈Z\{0}
hs ≡−cz (mod n)
1
|hs |2λ
1
1
−2λ
=
g
|mn − cz|2λ
|m(n/g) − (c/g)z|2λ
z∈Un m∈Z
z∈Un
h∈Z\{0}
h≡−(c/g)z (mod n/g)
n/g−1
1
−2λ
≤
g
g
|h|2λ
a=1
= g 1−2λ · 2ζ(2λ) 1 − (n/g)−2λ ≤ 2ζ(2λ),
h∈Z\{0}
h≡a (mod n/g)
1
|h|2λ
where the last step holds because g ≥ 1 and λ > 1/2. (The condition
λ > 1/2 is needed to ensure that ζ(2λ) < ∞.) Hence
2ζ(2λ)
γuλ
1
∗ λ
·
[θ(zs )] ≤
2λ
n2λ
(2π 2 )|u|λ
j∈u\{s} |hj |
s∈u⊆{1:s}
1
+
ϕ(n)
1
≤
ϕ(n)
s∈u⊆{1:s}
s∈u⊆{1:s}
γuλ
· 2ζ(2λ)
(2π 2 )|u|λ
γuλ
2ζ(2λ)
(2π 2 )λ
|u|
hu\{s} ∈(Z\{0})|u|−1
hu\{s} ·z u\{s} ≡0 (mod n)
∈(Z\{0})|u|−1
hu\{s}
hu\{s} ·z u\{s} ≡0 (mod n)
1
j∈u\{s} |hj |
2λ
.
Now we use (5.19), the induction hypothesis and another use of (5.20) to
obtain (5.18).
Theorem 5.9 (optimal CBC error bound: the anchored case). A
generating vector z ∈ Usn can be constructed by a (modified) CBC algorithm
such that the squared shift-averaged worst-case error esh
n,s (z) in the weighted
anchored Sobolev space with anchor c satisfies
|u| 1/λ
2ζ(2λ)
1
2
γuλ
+ βλ
[esh
n,s (z)] ≤
ϕ(n)
(2π 2 )λ
∅=u⊆{1:s}
for all λ ∈ (1/2, 1], where β = c2 − c + 1/3, ζ(·) is the Riemann zeta function
(5.5), and ϕ(n) is the Euler totient function (5.14). For product weights,
2
the algorithm minimizes [esh
n,s (z)] step by step as usual. For general nonproduct weights, the algorithm minimizes an auxiliary quantity e/2n,s,d (z)
High-dimensional integration: The quasi-Monte Carlo way
depending on s,
e/2n,s,d (z)
:=
γ
/s,v
∅=v⊆{1:d}
1 B2
n
n−1
k=0 j∈v
kzj
n
205
,
(5.21)
step by step for each d = 2, 3, . . . , s, with auxiliary weights defined by
γu β |u|−|v| , v ⊆ {1 : s}.
(5.22)
γ
/s,v :=
v⊆u⊆{1:s}
Note that the latter algorithm is not extensible in s.
Proof. In the previous proof we made use of the Fourier expansion of B2 ,
which has no constant term. To allow us to use essentially the same argument here, we start from (5.12), adapting the identity (4.2) and swapping
the order of sums, to obtain
n−1
kzj 1 sh
2
|u|−|v|
−
γu
β
B2
γu β |u|
[en,s (z)] =
n
n
v⊆u
k=0 u⊆{1:s}
=
n−1
1 n
γu β |u|−|v|
k=0 v⊆{1:s} v⊆u⊆{1:s}
=
=
1
n
n−1
k=0 v⊆{1:s}
∅=v⊆{1:s}
γ
/s,v
γ
/s,v
B2
j∈v
B2
j∈v
1
n
u⊆{1:s}
j∈v
n−1
kzj
n
B2
k=0 j∈v
kzj
n
−
γu β |u|
u⊆{1:s}
−γ
/s,∅
kzj
n
,
(5.23)
where we introduced the auxiliary weights defined in (5.22).
This last expression takes the same form as the squared shift-averaged
worst-case error in the unanchored space. However, there is a vital difference
that the auxiliary weights γ
/s,v depend on the dimension s. Consequently, a
recursive formula of the form (5.19) does not hold.
Case 1. Consider first the case of product weights γu = j∈u γj . It follows
from the definition (5.22) that, for s ∈
/ v, we have
γu β |u|−|v| +
γu γs β |u|−|v|+1 = (1 + γs β)/
γs−1,v .
γ
/s,v =
v⊆u⊆{1:s−1}
v⊆u⊆{1:s−1}
Thus, by separating the terms depending on whether or not the element s
is included in the set v, we obtain
2
sh
2
[esh
n,s (z)] = (1 + γs β)[en,s−1 (z1 , . . . , zs−1 )] + θ(zs ),
206
J. Dick, F. Y. Kuo and I. H. Sloan
where
θ(zs ) :=
γ
/s,v
1 B2
n
n−1
k=0 j∈v
s∈v⊆{1:s}
kzj
n
.
The induction hypothesis is now
2
[esh
n,s−1 (z1 , . . . , zs−1 )]
≤
1
ϕ(n)
λ
γ
/s−1,v
∅=v⊆{1:s−1}
2ζ(2λ)
(2π 2 )λ
|v| 1/λ
for all λ ∈ (1/2, 1]. For the quantity θ(zs ) we follow the averaging argument
in the previous proof, to obtain
1
≤
ϕ(n)
[θ(zs∗ )]λ
λ
γ
/s,v
s∈v⊆{1:s}
2ζ(2λ)
(2π 2 )λ
|v|
.
Combining the estimates, we obtain
1
2
[esh
n,s (z)] ≤ (1 + γs β)
ϕ(n)
+
1
ϕ(n)
λ
γ
/s−1,v
∅=v⊆{1:s−1}
λ
γ
/s,v
s∈v⊆{1:s}
2ζ(2λ)
(2π 2 )λ
2ζ(2λ)
(2π 2 )λ
|v| 1/λ
|v| 1/λ
.
Applying (5.20) then yields
2
[esh
n,s (z)] ≤
1
ϕ(n)
λ
γ
/s,v
∅=v⊆{1:s}
2ζ(2λ)
(2π 2 )λ
|v| 1/λ
.
(5.24)
Finally we express the result in terms of the original weights. Using (5.22)
and (5.20), we have
λ
γ
/s,v
∅=v⊆{1:s}
≤
2ζ(2λ)
(2π 2 )λ
|v|
γuλ β (|u|−|v|)λ
v⊆{1:s} v⊆u⊆{1:s}
=
u⊆{1:s}
γuλ
v⊆u
β (|u|−|v|)λ
2ζ(2λ)
(2π 2 )λ
2ζ(2λ)
(2π 2 )λ
|v|
|v|
−
−
γuλ β |u|λ
u⊆{1:s}
u⊆{1:s}
γuλ β |u|λ
High-dimensional integration: The quasi-Monte Carlo way
=
γuλ
u⊆{1:s}
≤
2ζ(2λ)
+ βλ
(2π 2 )λ
γuλ
∅=u⊆{1:s}
|u|
2ζ(2λ)
+ βλ
(2π 2 )λ
−
|u|
207
γuλ β |u|λ
u⊆{1:s}
.
(5.25)
That the bound holds for s = 1 follows as before, completing the proof for
the case of product weights.
Remark. For the case of product weights, the use of auxiliary weights is
only needed in the proof and has no bearing on the actual CBC construction. It is also possible to prove the result by working directly with the
simplified worst-case error expression for product weights (see (5.13)) without introducing auxiliary weights. We have chosen to prove the result this
way to provide valuable insights into the more complicated case of general
non-product weights.
Case 2. Now we consider the case of general non-product weights. Due to
2
the lack of structure in general weights, we are unable to relate [esh
n,s (z)]
2
to [esh
n,s−1 (z1 , . . . , zs−1 )] in a meaningful way. We therefore switch to a
modified CBC construction based on the auxiliary quantity (5.21). Note
that the auxiliary weights depend only on the final dimension s and not on
the induction index d in (5.21). This allows us to write
/ d ),
e/2n,s,d (z1 , . . . , zd ) = e/2n,s,d−1 (z1 , . . . , zd−1 ) + θ(z
where
/ d ) :=
θ(z
γ
/s,v
d∈v⊆{1:d}
1 B2
n
n−1
k=0 j∈v
kzj
n
.
Note that
e/2n,s,s (z) = e2n,s (z),
but
e/2n,s,d (z1 , . . . , zd ) = e2n,d (z1 , . . . , zd )
for d < s.
Hence the CBC construction based on the auxiliary quantity is not the same
as the original CBC construction. We prove by induction that the CBC
construction based on the auxiliary quantity yields, for each d = 1, 2, . . . , s,
1/λ
2ζ(2λ) |v|
1
2
λ
γ
/s,v
e/n,s,d (z1 , . . . , zd ) ≤
ϕ(n)
(2π 2 )λ
∅=v⊆{1:d}
for all λ ∈ (1/2, 1]. Since the auxiliary weights do not change with the
induction index d, the proof is essentially the same as the previous proof for
the unanchored space. At the final step d = s we recover the error bound
(5.24), which can be expressed in terms of the original weights following
(5.25). This completes the proof.
208
J. Dick, F. Y. Kuo and I. H. Sloan
The following theorem illustrates how we can apply the theory of this
section to a given integration problem.
Theorem 5.10 (CBC integration error). Suppose that a given integrand f belongs to either our weighted anchored or unanchored Sobolev
space. A lattice rule generating vector z ∈ Usn can be constructed by a
CBC algorithm such that, for all λ ∈ (1/2, 1],
E |Is (f ) − Qn,s (·; f )|2 ≤
1
ϕ(n)
γuλ
∅=u⊆{1:s}
2ζ(2λ)
+β λ
(2π 2 )λ
|u| 1/(2λ)
f s,γ ,
where the expectation is taken with respect to the random shift which is
uniformly distributed over [0, 1]s , β = c2 − c + 1/3 for the anchored variant
with anchor c and β = 0 for the unanchored variant, ζ(·) is the Riemann
zeta function (5.5), and ϕ(n) is the Euler totient function (5.14).
We conclude this subsection with the remark that a different analysis can
be used for the anchored Sobolev space if, roughly speaking, the auxiliary
weights (5.22) are dominated by the original weights, that is, if
γu β |u|−|v| ≤ Cγv
|u|<∞
v⊆u⊂N
for all finite subsets v ⊂ N, where C > 0 is independent of v. A similar
result holds if the weights are of the following special POD form:
γj ,
γu = ((|u| + )!)a
j∈u
where is a non-negative integer and a is a positive real number. We explain
the alternative analysis for this case below.
Recall that for the anchored space the CBC algorithm must work with
auxiliary weights (5.22) which do not preserve the POD structure of the
original weights. We can write
((|u| + )!)a
γj
γ
/s,v =
v⊆u⊆{1:s}
= ((|v| + )!)
a
j∈u
γj
j∈v
= ((|v| + )!)
a
j∈v
γj
v⊆u⊆{1:s}
w⊆{1:s}\v
(|u| + )!
(|v| + )!
a γj
j∈u\v
a
(|v| + |w| + )!
(|v| + )!
j∈w
γj .
209
High-dimensional integration: The quasi-Monte Carlo way
Using the simple inequality
p+q p+q
(p + q)!
p+q
= 2p+q ,
=
≤
k
p! q!
p
k=0
we obtain
γ
/s,v ≤ ((|v| + )!)
a
j∈v
≤ ((|v| + )!)a
γj
a γj
|w|! 2|v|+|w|+
w⊆{1:s}\v
w⊆{1:s}
j∈v
where
/
γ
/v := ((|v| + )!)a (2a γj ),
and
j∈w
cs,γ := 2a
(|w|!)a
w⊆{1:s}
j∈v
Thus from (5.23) we can write
2
[esh
n,s (z)] ≤ cs,γ
j∈w
/
(2a γj ) 2a
(|w|!)a
(2a γj ) = cs,γ γ
/v ,
∅=v⊆{1:s}
/
γ
/v
1 B2
n
n−1
k=0 j∈v
kzj
n
(2a γj ).
j∈w
.
(5.26)
Hence, we can use the expression on the right-hand side of (5.26) (without
the factor cs,γ ) as the search criterion for the CBC construction. The benefit
/
is that the new weights γ
/v are also of the POD form, and they do not depend
on the dimension s. This means that a fast implementation is possible (see
Section 5.6) and that the algorithm is extensible in s. The resulting error
bound would be slightly worse (each factor in the product part of the weights
1/2
would be scaled by 2a , and there is an additional factor of cs,γ in the overall
error bound in Theorem 5.10), but the convergence rate and the dependence
of the implied constant on the dimension s remain the same as before.
5.5. Fast CBC construction for product weights
In this subsection we explain how to implement the CBC construction efficiently for product weights. We begin by presenting a naive implementation
of Algorithm 5.6 as a pseudocode (Pseudocode 1, overleaf). Then we discuss
the techniques to speed up the computation.
The reader should keep in mind that the general approach described in
this subsection applies to all shift-invariant kernels: we can replace B2 (x)
by any generic function whose integral over [0, 1] is 0.
Recall that
Zn := {z ∈ Z : 0 ≤ z ≤ n − 1},
Un := {z ∈ Zn : gcd(z, n) = 1},
210
J. Dick, F. Y. Kuo and I. H. Sloan
Pseudocode 1 (CBC: naive implementation)
for s from 1 to smax do
for all zs ∈ Un do
.
s
n−1 s kzj
1 1 + γ j B2
+β
(1 + γj β) +
e2s (zs ) = −
n
n
j=1
k=0 j=1
end for
zs = argminz∈Un e2s (z)
end for
with ϕ(n) := |Un |. In the pseudocode we used the simplified notation
2
e2s (zs ) := [esh
n,s (z1 , . . . , zs )]
to highlight the fact that zs is the only component to be determined at
step s; all previous components z1 , . . . , zs−1 have already been fixed. To
simplify the exposition, we included z1 in the search as well, even though
all choices in the first dimension are equivalent.
The computational cost of this naive implementation is O(n2 s2max ) operations. There are a number of ways to reduce this cost. Firstly, due to the
symmetry B2 (x) = B2 (1 − x), we have e2s (zs ) = e2s (n − zs ), thus we only
need to search through half of the elements in Un . Secondly, at step s we
can write
(5.27)
e2s (zs ) = (1 + γs β) e2s−1
0 s−1
.5
n−1
kzj
kzs
γs +β
1 + γ j B2
B2
.
+
n
n
n
j=1
k=0
23
41
23
4
1
Ωn (zs ,k)
ps−1 (k)
The n values ps−1 (k) do not depend on zs , and can be stored during the
search. This reduces the construction cost to O(n2 smax ) operations, at the
expense of O(n) storage.
The values of (5.27) for all zs ∈ Un can be expressed as a ϕ(n) × 1 vector
e2s = [e2s (zs )]zs ∈Un ,
which can be computed in terms of a matrix–vector product, with the ϕ(n)×
n matrix
.
kz mod n
(5.28)
Ωn = [Ωn (z, k)]z∈Un = B2
z∈Un ,
n
k∈Zn
k∈Zn
High-dimensional integration: The quasi-Monte Carlo way
Pseudocode 2 (CBC: matrix–vector form)
p0 = 1
e20 = 0
for s from 1 to smax do
γs
Ωn ps−1
e2s = (1 + γs β)e2s−1 +
n
2
zs = argminz∈Un es (z)
e2s = e2s (zs )
ps = 1 + γs (Ωn (zs , :) + β) .∗ ps−1
end for
211
compute
select
set
update
and the n × 1 vector
ps−1 = [ps−1 (k)]k∈Zn .
This yields our second pseudocode. Here Ωn (zs , :) means taking a particular
row of the matrix, while .∗ means element-wise vector multiplication. Also,
1 = (1, . . . , 1) and β = (β, . . . , β) are vectors with the appropriate length.
At the update stage, the vector ps−1 can be overwritten by the vector ps .
Thus the memory requirement is of order O(n).
The trick now is to order the indices z ∈ Un and k ∈ Zn in (5.28) in a
clever way in order to allow fast matrix–vector multiplication. Below we
explain how this is achieved for the simpler case when n is prime.
When n is prime and using the Rader factorization, the matrix Ωn can be
reordered such that it has a circulant submatrix C n of size (n − 1) × (n − 1).
This is achieved by taking the indices in the order
z = gi
and
k = (g −1 )i
for 0 ≤ i, i ≤ n − 2,
where g is a primitive root of n (i.e., g is a generator for the cyclic group
Un ) and g −1 denotes its multiplicative inverse. We denote this particular
g
g
ordering of Ωn by Ωn . Thus, for n prime, the element Ωn (i, i ) in the
g
(i + 1)th row and (i + 1)th column of Ωn is given by
Ωg
n (i, i )
=
Ωn (g i , (g −1 )i )
if 0 ≤ i, i ≤ n − 2,
0
if i = n − 1,
where g i , (g −1 )i ∈ Un for 0 ≤ i, i ≤ n − 2. A graphical illustration of the
effect of reordering the indices is given in Figure 5.1. We present a concrete
illustration of the reordering in the next example.
J. Dick, F. Y. Kuo and I. H. Sloan
(a)
53
1
212
(b)
Figure 5.1. (Image by Dirk Nuyens.) The structure of the matrix
Ω53 for the fast CBC construction of lattice rules. (a) Original
ordering of indices, and (b) after a reordering of indices.
Example 5.11. Take n = 11.
natural ordering of the indices z

  0 1 2

 
  0 2 4
 
  0 3 6
 
  0 4 8
 
 1  0 5 10

Ω11 = B2  
 11 
0 6 1
 
 
0 7 3
 
 

  0 8 5
 
0 9 7



0 10 9
Then U11 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. The
∈ U11 and k ∈ Z11 gives the 10 × 11 matrix


3 4 5 6 7 8 9 10 


6 8 10 1 3 5 7 9 


9 1 4 7 10 2 5 8 


1 5 9 2 6 10 3 7 


4 9 3 8 2 7 1 6 

.
7 2 8 3 9 4 10 5 


10 6 2 9 5 1 8 4 


2 10 7 4 1 9 6 3 


5 3 1 10 8 6 4 2 

8 7 6 5 4 3 2 1 
The generator for the group U11 can be taken to be g = 2. The ordering for
the z indices is then
[g i : 0 ≤ i ≤ 9] = [1, 2, 4, 8, 5, 10, 9, 7, 3, 6].
For the ordering of the indices k = 0 we take the powers of g −1 = 6,
[(g −1 )i : 0 ≤ i ≤ 9] = [1, 6, 3, 7, 9, 10, 5, 8, 4, 2],
High-dimensional integration: The quasi-Monte Carlo way
which, leaving aside the index 1, is the reverse order of the
reordering, together with a column of zeros for k = 0, gives

  1 6 3 7 9 10 5 8 4

 
  2 1 6 3 7 9 10 5 8
 
  4 2 1 6 3 7 9 10 5
 
  8 4 2 1 6 3 7 9 10
 
1  5 8 4 2 1 6 3 7 9

2
Ω11 = B2  
 11 
10 5 8 4 2 1 6 3 7
 
 
9 10 5 8 4 2 1 6 3
 
 
7 9 10 5 8 4 2 1 6
 
 

3 7 9 10 5 8 4 2 1



6 3 7 9 10 5 8 4 2
23
1
C 11
213
z indices. This
the matrix


2 0 


4 0 


8 0 


5 0 


10 0 

,
9 0 


7 0 


3 0 


6 0 

1 0 
4
where the 10 × 10 submatrix C 11 is circulant.
We can halve the number of rows and the number of columns of the
circulant submatrix C 11 due to the symmetry B2 (x) = B2 (1 − x), to obtain
the reduced form (denoted by a ∼ instead of =)


2
Ω11
 

1

  2
1 
 
∼ B2   4
 11 
  8


5

1
6
1
2
4
8
3
6
1
2
4
23
7
3
6
1
2
11
C
9
7
3
6
1
4
!
!
!
!
!
!
!
!
!
!
!
0
0
0
0
0






 ,





2
where we remember that the full matrix Ω11 can be obtained by doubling
/ 11 vertically and horizontally up to the
the 5 × 5 circulant submatrix C
double bar. This concludes our simple example.
If we express the reordering of indices induced by g as permutations ΠTg
and Πg−1 on the rows and columns of the matrix Ωn , then we obtain
g −1 y = Ωn ps−1 ⇐⇒ ΠTg y = (ΠTg Ωn Πg−1 ) (ΠTg−1 ps−1 ) = Ωg
n ps−1 .
Since the initialization of p0 = 1 is unaffected by any permutation, we only
need to work with the permuted vector throughout. We just need to make
sure that the correct index zs is chosen at the select stage.
214
J. Dick, F. Y. Kuo and I. H. Sloan
A matrix–vector multiplication with a circulant matrix C m of size m×m,
with first column c, can be obtained by
C m x = F −1
m diag(F m c) F m x
= IFFT(FFT(c).∗ FFT(x)),
where
F m = [exp(−2πi p q/m)]0≤p,q≤m−1
is the Fourier matrix of size m × m, and
1
[exp(2πi p q/m)]0≤p,q≤m−1 .
m
Using FFT and IFFT, this matrix–vector multiplication takes O(m log m)
operations instead of the usual O(m2 ), and uses O(m) memory.
Using permutation together with FFT, the cost for the CBC construction
can be reduced to O(n log n smax ) operations, at the expense of O(n) storage.
This approach is called the fast CBC construction, and is due to Nuyens
and Cools (2006a, 2006b).
For composite n, the general idea is that the complete matrix Ωn can be
partitioned in blocks which have a circulant or block-circulant structure.
We summarize the fast CBC algorithm in the pseudocode below. We
stress again that the key point is to use FFT to compute the matrix–vector
g
product Ωn ps−1 , taking care to select the correct zs under the corresponding permuted set of indices.
F −1
m =
Pseudocode 3 (Fast CBC: matrix–vector form with permuted matrix)
p0 = 1
e20 = 0
for s from 1 to smax do
γs g
g
Ω p
compute – use FFT
e2s = (1 + γs β)e2s−1 +
n n s−1
zs = argminz∈Un e2s (z)
select – pick the correct index
set
e2s = e2s (zs )
g
update
ps = 1 + γs (Ωn (zs , :) + β) .∗ ps−1
end for
5.6. Fast CBC construction for POD weights
Here we explain how the fast CBC construction can be extended to POD
(product and order-dependent) weights (see (4.4)), covering order-dependent weights as a special case. For completely general weights γu the cost
2
for evaluating [esh
n,s (z)] in (5.12) is prohibitively expensive, whereas for
High-dimensional integration: The quasi-Monte Carlo way
215
2
POD weights their special structure enables us to compute [esh
n,s (z)] using
a recursive argument starting from dimension one.
We restrict ourselves to the unanchored Sobolev space. (Recall that the
CBC algorithm for the anchored Sobolev space minimizes an auxiliary quantity that depends on the auxiliary weights (5.22). The POD form of the
original weights is not preserved under the auxiliary weights, thus the strategy to be described here is not applicable to the anchored space setting; see
also the remark at the end of Section 5.4 for cases where fast CBC for POD
weights can be used.) However, the approach can be generalized to other
shift-invariant kernels if B2 is replaced by another function that integrates
to zero.
With POD weights γu = Γ|u| j∈u| γj , we can write the shift-averaged
worst-case error at step s (using again the simplified notation of the previous
subsection) as
n−1
kzj
1 2
γ j B2
Γ|u|
es (zs ) =
n
n
k=0 ∅=u⊆{1:s}
1
=
n
n−1
s
j∈u
k=0 =1 u⊆{1:s}
|u|=
kzj
γ j B2
Γ
.
n
(5.29)
j∈u
23
1
4
ps, (k)
We split the sum over u into two depending on whether s ∈ u, to obtain
0
n−1
s
kzj
1
2
es (zs ) =
γ j B2
Γ
n
n
k=0 =1 u⊆{1:s−1}
|u|=
j∈u
23
1
Γ
+
γ s B2
Γ−1
1
4
ps−1, (k)
kzs
n
23
u⊆{1:s−1}
|u|=−1
41
Ωn (zs ,k)
=
e2s−1
5
kzj
γ j B2
Γ−1
n
j∈u
23
s
n−1
Γ
γs +
Ωn (zs , k)
ps−1,−1 (k) .
n
Γ−1
k=0
4
ps−1,−1 (k)
(5.30)
=1
The computation of (5.30) for all zs ∈ Un can be expressed in matrix–
vector form as
s
Γ
γs
2
2
p
,
es = es−1 1 + Ωn
n
Γ−1 s−1,−1
=1
216
J. Dick, F. Y. Kuo and I. H. Sloan
with the update formula
ps, = ps−1, +
Γ
γs Ωn (zs , :).∗ ps−1,−1
Γ−1
for = 1, . . . , s,
starting from
ps,0 = 1 and
ps, = 0 for all s ≥ 1 and > s.
We need to store the vectors ps, for = 1, . . . , s, each of length n. These
s vectors correspond to the s possible values of |u| for ∅ = u ⊆ {1 : s}. These
vectors can be overwritten for dimension s + 1. Thus we require O(n smax )
storage overall. The update cost for dimension s is O(n s) operations.
If we permute the rows and columns of the matrix Ωn and use FFT to
carry out the matrix–vector multiplication as in the case of product weights,
then the search cost for each dimension is O(n log n) operations. The overall
cost, combining search and update in all dimensions up to smax , is
O(n log n smax + n s2max )
operations.
If the weights are of finite order q, that is, Γ = 0 for all > q, then the
update cost for every dimension is capped at O(n q) operations, with O(n q)
memory requirement, and the overall cost combining search and update in
all dimensions up to smax is reduced to
O(n log n smax + n q smax )
operations.
The reader may question why we did not leave out the factor Γ from
the definition of ps, (k) in (5.29). Indeed, the expression (5.30) and the
update formula would have been simpler if we had defined ps, (k) that way.
The problem is that for certain POD weights we may have Γ growing very
quickly with increasing and γj decaying with increasing j: for example,
j −2.1 .
γu = (|u|!)4/3
j∈u
For weights of this kind, the alternative approach would be numerically
unstable. The approach we outlined here works with the ratio Γ /Γ−1 ,
thus avoiding the potential problem of overflow when working with Γ .
5.7. Extensible lattice sequences
The CBC construction yields a lattice rule which is extensible in dimension,
but not extensible in the number of points. Recall that the formula for
obtaining the ith point of an n-point lattice rule with generating vector z is
i
z .
(5.31)
ti =
n
217
High-dimensional integration: The quasi-Monte Carlo way
In an extensible lattice sequence we take z to be a vector of b-adic numbers
∞
r
r=0 ar b , where ar ∈ {0, 1, . . . , b − 1}, and allow n to increase. In an
extensible lattice sequence with base b ≥ 2, the formula (5.31) is changed to
ti = {φb (i) z},
(5.32)
where φb (·) is the radical inverse function in base b (or the Gray code variant
in which the successive indices differ only in one b-ary digit). The formula
(5.32) does not require you to know n in advance, and so in practice you can
add more points to your lattice rule approximation until you are satisfied
with the error.
When n = bm for any m ≥ 1, the formulas (5.31) and (5.32) produce the
same set of points, only the ordering of the points is different. Therefore,
to obtain the points of an extensible lattice rule only at exact powers of the
base, one can avoid the radical inverse function and still use the formula
(5.31). For example, if we take b = 2 and denote by Lm the lattice point
set with 2m points, then we can illustrate the lattice sequence as follows:
t0 = 0, t1 = 12 , t2 , t3 , t4 , t5 , t6 , t7 , t8 , t9 , t10 , t11 , t12 , t13 , t14 , t15 , . . . .
1 23 4 1 23 4 1 23 4 1 23 4 1
23
4
1
1
1
1
L0
23
L1
∆L1
4
∆L2
23
L2
∆L3
∆L4
4
23
4
L3
23
4
L4
The number of points in Lm doubles as we move along the sequence, and
the set of additional points ∆Lm = Lm \ Lm−1 is generated by {iz/2m }
with i running through all the odd numbers up to 2m − 1. In Table 5.1 we
show how the points are ordered under radical inverse and the Gray code
variant.
Empirical searches for extensible lattice sequences were considered in
Hickernell, Hong, L’Ecuyer and Lemieux (2000). Then it was proved by
Hickernell and Niederreiter (2003) that good generating vectors exist for
extensible lattice sequences, but the proof was non-constructive.
Cools, Kuo and Nuyens (2006) constructed embedded lattice rules that are
good for n in a large practical range bm1 ≤ n ≤ bm2 . They used a modified
CBC algorithm to construct a generating vector z, minimizing as much as
possible the search criterion
Xm1 ,m2 ,s (z) :=
max
m1 ≤m≤m2
esh
bm ,s (z)
(m) )
esh
bm ,s (z
,
(m) ) denotes the shifted-averaged worst-case error for the genwhere esh
bm ,s (z
erating vector obtained by the original CBC construction for bm points,
218
J. Dick, F. Y. Kuo and I. H. Sloan
Table 5.1. Ordering of lattice points.
i
Natural order
Radical inverse
Gray code variant
L0
0
0
0
0
∆L1
1
1/2 = 0.5
(0.1)2 = 0.5
(0.1)2 = 0.5
∆L2
2
3
1/4 = 0.25
3/4 = 0.75
(0.01)2 = 0.25
(0.11)2 = 0.75
(0.11)2 = 0.75
(0.01)2 = 0.25
∆L3
4
5
6
7
1/8 = 0.125
3/8 = 0.375
5/8 = 0.625
7/8 = 0.875
(0.001)2
(0.101)2
(0.011)2
(0.111)2
= 0.125
= 0.625
= 0.375
= 0.875
(0.011)2
(0.111)2
(0.101)2
(0.001)2
= 0.375
= 0.875
= 0.625
= 0.125
∆L4
8
9
10
11
12
13
14
15
1/16 = 0.0625
3/16 = 0.1875
5/16 = 0.3125
7/16 = 0.4375
9/16 = 0.5625
11/16 = 0.6875
13/16 = 0.8125
15/16 = 0.9375
(0.0001)2
(0.1001)2
(0.0101)2
(0.1101)2
(0.0011)2
(0.1011)2
(0.0111)2
(0.1111)2
= 0.0625
= 0.5625
= 0.3125
= 0.8125
= 0.1875
= 0.6875
= 0.4375
= 0.9375
(0.0011)2
(0.1011)2
(0.1111)2
(0.0111)2
(0.0101)2
(0.1101)2
(0.1001)2
(0.0001)2
= 0.1875
= 0.6875
= 0.9375
= 0.4375
= 0.3125
= 0.8125
= 0.5625
= 0.0625
which is used as the reference error to provide a good normalization. The
components of z can take values up to bm2 − 1. Due to many structures in
the embedding, the computational cost for the modified CBC algorithm is
only a factor of O(m2 ) more than the cost for constructing a lattice rule with
a fixed number of points. Figure 5.2 illustrates the embedding structure.
Numerically, it is found in Cools et al. (2006) for 210 ≤ n ≤ 220 , s up
to 360, and a number of different choices of product weights and orderdependent weights, that the quantity Xm1 ,m2 ,s (z) resulting from the modified CBC construction is at most 1.6, indicating that the embedded lattice
rules achieve the near-optimal rate of convergence of lattice rules with a
fixed number of points, but with an implied constant up to 1.6 times larger.
Dick, Pillichshammer and Waterhouse (2008) introduced a sieve algorithm
to obtain extensible lattice sequences. The general principle is to begin
with a number of generating vectors that are good for some value of n, drop
those vectors that are not good for a larger value of n, and then repeat
this process for larger and larger values of n. Some parameters need to be
specified beforehand to ensure that the sieve process leaves a non-empty set
of generating vectors. Note that this sieve algorithm yields lattice sequences
219
High-dimensional integration: The quasi-Monte Carlo way
(2) grouping on divisors
32
64
128
8
16
4
2
1
32
64
128
8
(4) symmetric reduction after application
of B2 kernel function
16
4
2
1
(1) natural ordering of the indices
(3) generator ordering of the indices
Figure 5.2. (Image by Dirk Nuyens.) The structure of the matrix Ω128 for
the fast CBC construction of lattice rules, illustrating the reordering of
indices and the reduction due to symmetry. The matrices Ωn for n = 2m
with m < 7 are embedded in Ω128 .
that are extensible in n but the algorithm is very expensive to implement.
Dick et al. (2008) also introduced a cheaper version called the CBC sieve
algorithm, which yields embedded lattice rules that work for a finite set of
n and are extensible in s. This theory can also be modified to provide the
theoretical justification for the algorithm from Cools et al. (2006).
5.8. Lattice rules in weighted Korobov spaces
In the previous subsections we have presented the lattice rule results for
weighted anchored or unanchored variants of Sobolev spaces containing
functions with square-integrable mixed first derivatives. Analogous results
were originally obtained in a different function space setting, the weighted
Korobov spaces of periodic functions.
We now briefly review the essential ingredients and results for weighted
Korobov spaces, with the motive of understanding how one might achieve
better than order-one convergence rate for non-periodic integrands. In the
Korobov space setting, there is a smoothness parameter α > 1/2. (The
parameter α as used here differs by a factor of two from the usage in Sloan
and Woźniakowski 2001.) The Korobov space can be viewed as an L2 version
of the Korobov class Eα (c) mentioned in Section 5.1. It again contains oneperiodic functions with absolutely convergent Fourier series (5.1) but with
220
J. Dick, F. Y. Kuo and I. H. Sloan
the inner product now given by
f, gs,α,γ =
j∈u(h) |hj |
γu(h)
h∈Zs
2α
f-(h) g-(h),
with u(h) := {j ∈ {1 : s} : hj = 0}. The reproducing kernel is
e2πihj (xj −yj )
γu
Ks,α,γ (x, y) =
|hj |2α
u⊆{1:s}
=
γu(h)
h∈Zs
=
h∈Zs
j∈u
hj ∈Z\{0}
e2πihj (xj −yj )
|hj |2α
j∈u(h)
γu(h)
e2πih·(x−y) .
2α
|h
|
j
j∈u(h)
From this last expression it is easy to verify the reproducing property.
The parameter α moderates the rate of decay of the Fourier coefficients,
and is related to the smoothness of the functions. To illustrate this point, we
note that when α is an integer, the norm in one dimension can be written as
1
2
1
(α) 2
1
2
f (t) dt +
f (t) dt.
f 1,α,γ =
(2π)2α γ 0
0
Thus α is precisely the number of available square-integrable derivatives.
When α is an integer, we can use the property (5.7) to rewrite the kernel
as
(2π)2α
B2α |xj − yj | ,
γu
(5.33)
Ks,α,γ (x, y) =
(−1)α+1 (2α)!
u⊆{1:s}
j∈u
where B2α (·) is the Bernoulli polynomial of degree 2α. Our earlier results
in the Sobolev space setting were originally obtained by observing that
the associated shift-invariant kernel (see (5.11)) is in fact the kernel for a
Korobov space with α = 1 but with some redefined weights.
It is straightforward to deduce, using Theorem 3.5, that the worst-case
error for a rank-one lattice rule in a weighted Korobov space satisfies
1
n
n−1
e2n,s (z) =
k=0 ∅=u⊆{1:s}
γu
j∈u h∈Z\{0}
For product weights, the expression simplifies to
n−1
s
1
1 + γj
e2n,s (z) = −1 +
n
k=0 j=1
h∈Z\{0}
e2πihkzj /n
.
|h|2α
e2πihkzj /n
,
|h|2α
High-dimensional integration: The quasi-Monte Carlo way
221
which is precisely the formula for Pα when all γj = 1 (see (5.4)), but with
α replaced by 2α.
The following theorem summarizes the error bound for lattice rules constructed by the CBC algorithm in weighted Korobov space. We remark
that the result holds for unshifted lattice rules, but since the reproducing
kernel is shift-invariant, the result applies also to shifted lattice rules. The
best convergence rate is obtained by taking λ → 1/(2α), which yields close
to O(n−α ) convergence.
Theorem 5.12 (optimal CBC error bound in Korobov spaces).
The generating vector z constructed by the CBC algorithm in Korobov
spaces, minimizing e2n,s (z) in each step, satisfies
1/λ
1
2
λ
|u|
γu (2ζ(2αλ))
en,s (z) ≤
ϕ(n)
∅=u⊆{1:s}
for all λ ∈ (1/(2α), 1], where ζ(·) is the Riemann zeta function (5.5) and
ϕ(n) is the Euler totient function (5.14).
Proof.
Using the character property (5.2), we can write
1
γu
.
e2n,s (z) =
2α
j∈u |hj |
∅=u⊆{1:s}
hu ∈(Z\{0})|u|
hu ·z u ≡0 (mod n)
The rest of the proof follows the proof of Theorem 5.8, which is essentially
the case α = 1, but with weights scaled by (2π)|u| .
The results for Korobov spaces are not often usable in practice because
integrands are typically not fully periodic. However, the analysis in Korobov
spaces provides the fundamental framework for the lattice rule analysis in
Sobolev spaces. There we achieved order-one convergence via shifting: the
shift-averaged worst-case error in Sobolev spaces of smoothness one is related to the Korobov space with α = 1. Our aim now is to achieve higher
convergence rates for non-periodic functions by linking with the results in
Korobov space with α > 1.
5.9. The baker’s transformation
Extending Example 3.8 to s dimensions and introducing weights as in Section 4, we obtain the reproducing kernel for the weighted and unanchored
Sobolev space of smoothness α:
α
B2α (|xj − yj |) Br (xj ) Br (yj )
+
. (5.34)
γu
Ks,α,γ (x, y) =
(−1)α+1 (2α)!
(r!)2
u⊆{1:s}
j∈u
r=1
222
J. Dick, F. Y. Kuo and I. H. Sloan
Comparing with (5.33), we see that this kernel has additional terms involving the lower-degree Bernoulli polynomials B1 , . . . , Bα . We already know
that in the case of smoothness one (α = 1) the associated shift-invariant
kernel does not involve the lower-degree polynomial B1 ; in short, shifting
removes B1 . In this subsection we explain how shifting followed by folding
removes B2 .
The baker’s transformation makes use of the function
2t
if t ≤ 12 ,
φ(t) := 1 − |2t − 1| =
2 − 2t if t > 12 .
It is called the baker’s transformation since it emulates how a baker stretches
and folds bread dough. In one dimension, if we apply the baker’s transformation to the n points of a left-rectangle rule 0, 1/n, 2/n, . . . , (n − 1)/n,
where n is even, then we obtain the points
n−2 n−4
2
2 4
,
,..., ,
0, , , . . . , 1,
n n
n
n
n
which are the points for a trapezoidal rule. A rectangle rule has quadrature error O(n−1 ), but a trapezoidal rule has quadrature error O(n−2 ) if
the integrand is sufficiently smooth. This is the underlying motivation for
considering the baker’s transformation. Its application in the context of
QMC is due to Hickernell (2002). In the following, we refer to the process
of applying the baker’s transformation as folding and the resulting point set
as the folded point set.
A shifted and then folded QMC rule takes the form
Qφn,s (∆; f )
n−1
=
1 <
=
f φ ti + ∆
.
n
i=0
Note that this can also be viewed as applying a shifted lattice rule to a
transformed integrand g = f ◦ φ. However, this interpretation is not helpful
because the transformed integrand g does not have sufficient smoothness to
get higher-order convergence.
As in Section 5.2, for a given point set P we let φ(P + ∆) denote the
shifted and then folded QMC point set. Then, for a given shift ∆ we have
|Is (f ) − Qφn,s (∆; f )| ≤ en,s (φ(P + ∆); H) f H ,
and for a randomly shifted and folded rule we have
E|Is (f ) − Qφn,s (∆; f )|2 ≤ esh,φ
n,s (P ; H) f H ,
High-dimensional integration: The quasi-Monte Carlo way
where
223
esh,φ
n,s (P ; H) :=
[0,1]s
e2n,s (φ(P + ∆); H) d∆.
(5.35)
Theorem 5.13 (formula for the shift-averaged and folded worstcase error). The shift-averaged and folded worst-case error (5.35) for a
QMC point set P in an RKHS Hs (K) satisfies
n−1 n−1
1 sh,φ
sh,φ
2
[en,s (P ; Hs (K))] = −
K(x, y) dx dy + 2
K
(ti , tk ),
n
[0,1]s [0,1]s
i=0 k=0
where
K
sh,φ
(x, y) :=
K(φ{x + ∆}, φ{y + ∆}) d∆
[0,1]s
for all x, y ∈ [0, 1]s .
(5.36)
Proof. Using the definition (5.35) and applying the formula (3.9) for the
worst-case error en,s (φ(P + ∆); Hs (K)), we obtain
2
[esh,φ
n,s (P ; Hs (K))]
n−1 2
=
K(x, y) dx dy −
K(φ({ti + ∆}), y) d∆ dy
n
s [0,1]s
[0,1]s [0,1]s
[0,1]
i=0
n−1
n−1
1
+ 2
K(φ({ti + ∆}), φ({tk + ∆})) d∆.
n
[0,1]s
i=0 k=0
With a change of variables
w = {ti + ∆}, the double integral in the sec
ond term becomes [0,1]s [0,1]s K(φ(w), y) dw dy, which is easily seen to
equal [0,1]s [0,1]s K(x, y) dx dy. The result then follows from the definition
(5.36).
We now demonstrate the effect of shifting followed by folding for the case
α = 2. To obtain the shifted and folded kernel K sh,φ associated with (5.34),
we need to evaluate the following integrals:
1
B1 (φ({x + ∆})) B1 (φ({y + ∆})) d∆,
I1 (x, y) :=
0
1
B2 (φ({x + ∆})) B2 (φ({y + ∆})) d∆,
I2 (x, y) :=
0
1
B4 (|φ({x + ∆}) − φ({y + ∆})) d∆.
J4 (x, y) :=
0
By using the Fourier series expansion (5.6) of the Bernoulli polynomials,
224
J. Dick, F. Y. Kuo and I. H. Sloan
it can be shown (Hickernell 2002) that
=
4
4 <
I1 (x, y) = − B4 ({x − y}) + B4 x − y − 12 ,
3
3
=
4
4 <
I2 (x, y) = − B4 ({x − y}) − B4 x − y − 12 ,
3
3
=
32
8 <
J4 (x, y) = B4 ({x − y}) − B4 x − y − 12
3
3
=
128
128 <
B6 ({x − y}) −
B6 x − y − 12 .
+
15
15
Since the lowest-degree Bernoulli polynomial in the above expressions is B4 ,
we conclude that the shifted and folded kernel is related to the kernel of the
Korobov space with smoothness parameter α = 2. This indicates that we
get a convergence rate close to O(n−2 ).
In addition, it can be shown that the CBC construction yields lattice rules
achieving a convergence rate close to O(n−2 ), and that the implementation
can be done in a fast way for product or POD weights. We omit the details.
It is possible to generalize the baker’s transformation to obtain even higher
rates of convergence, but there is a seemingly inevitable dependence on s,
prohibiting the strategy for practical use in high dimensions.
5.10. Periodization
From the results for the Korobov space, we know that lattice rules can
yield higher rates of convergence for periodic integrands. This brings us to
consider strategies for transforming a non-periodic integrand to a periodic
integrand.
We now briefly explain one periodization strategy. Let ω : [0, 1] → R be
a smooth function, with
1
ω(y) dy = 1.
ω(0) = ω(1) = 0, ω(y) > 0 for all y ∈ (0, 1), and
0
Define
y
ω(u) du.
ψ(y) :=
0
Then ψ is an increasing function satisfying
ψ (y) = ω(y),
ψ(0) = 0 and
ψ(1) = 1.
The periodization transformation is essentially a change of variables x =
ψ(y) = (ψ(y1 ), . . . , ψ(ys )) so that
f (x) dx =
F (y) dy = Is (F ),
Is (f ) =
[0,1]s
[0,1]s
High-dimensional integration: The quasi-Monte Carlo way
with
F (y) := f (ψ(y))
s
225
ω(yj ).
j=1
The function F is periodic since ω vanishes at the end points.
Popular choices of ω include the family of polynomial transformations
of Korobov (1963) and the family of trigonometric transformations of Sidi
(1993); see also Laurie (1996). For example, the former family is given by
2α α
ω(y) = (2α + 1)
y (1 − y)α
α
for a positive integer α. In particular, we have ω (r) (0) = ω (r) (1) for all r ≤
α − 1. With this choice of ω, and if f is sufficiently smooth, the transformed
integrand F belongs to the Korobov space with smoothness parameter α,
and the lattice rule convergence rate is therefore close to O(n−α ).
The problem with the periodization strategy is that the norm of the
transformed integrand F can be exponentially large in s, thus is generally
only feasible for small to moderate s.
As we explained in the previous subsection, the baker’s transformation
can also be viewed as a transformation of the integrand. The baker’s transformation makes the end points of the function take the same value by reflection. The periodization strategy, on the other hand, makes the function
value and perhaps also some derivatives vanish at the end points.
5.11. Notes
Lattice rules date back to the works of number theorists Korobov (1959)
and Hlawka (1962). There is an extensive literature up to the late 1990s,
which we refer to as the classical results; these are reviewed in detail in the
books by Niederreiter (1992a) and Sloan and Joe (1994); see also Hickernell
(1998b). The modern consideration of lattice rules can be said to have begun
with Sloan and Woźniakowski (2001), who proved the existence of good
lattice rules for periodic integrands in weighted Korobov spaces, and via
the shift-invariant kernel, they showed that there exist good shifted lattice
rules for non-periodic integrands in weighted Sobolev space of smoothness
one. Although the results were non-constructive, they demonstrated that
lattice rules have a role to play for truly high-dimensional integration as
well as for non-periodic integrands.
The CBC construction was first considered by Korobov (1959) (see also
Korobov 1963), who proved error bounds for Pα of order n−α+δ for any
δ > 0 and for arbitrary n. The modern analysis of CBC construction
for the lattice rule generating vector began with Sloan and Reztsov (2002)
for the classical (unweighted) criterion Pα in the periodic setting. Sloan,
226
J. Dick, F. Y. Kuo and I. H. Sloan
Kuo and Joe (2002a) then introduced the CBC algorithm to construct a
deterministic shift alongside the generating vector for integration in the
non-periodic weighted Sobolev spaces. Subsequently, the CBC algorithm
for constructing randomly shifted lattice rules in weighted Sobolev spaces
(see Algorithm 5.6) was introduced in Sloan, Kuo and Joe (2002b). In
these earlier works only O(n−1/2 ) convergence rate was proved. The fact
that the CBC algorithm achieves the (optimal) convergence rate close to
O(n−α ) in weighted Korobov and close to O(n−1 ) in weighted Sobolev spaces
with smoothness one was later proved by Kuo (2003). Extensions of these
results from prime n to composite n were given in Kuo and Joe (2002)
with convergence rate of O(n−1/2 ), and Dick (2004) with convergence rate
close to O(n−1 ). Other considerations include the CBC construction of
intermediate-rank lattice rules (Kuo and Joe 2003), and a modified CBC
algorithm making use of the prime factorization of n for the purpose of
speeding up the construction (Dick and Kuo 2004a, 2004b). All of the above
results were originally proved in the setting of product weights. Results for
general weights in Korobov spaces were obtained in Dick et al. (2006), and
the corresponding results for weighted Sobolev spaces were obtained in Sloan
et al. (2004), again via the shift-invariant kernel.
The fast CBC construction was first introduced for prime n in Nuyens
and Cools (2006a) and later extended to composite n in Nuyens and Cools
(2006b). The special case of n being a power of 2 is discussed in Cools
et al. (2006). Fast CBC for POD weights was first considered in Kuo,
Schwab and Sloan (2011). Some good generating vectors for lattice rules can
be downloaded from http://www.maths.unsw.edu.au/∼fkuo/lattice/. Implementation of fast CBC constructions for various forms of weights can be
obtained from http://people.cs.kuleuven.be/∼dirk.nuyens/fast-cbc/.
Extensible lattice sequences were considered in Maize (1980), Korobov
(1982), and Hickernell et al. (2000), but the first theoretical result proving
that good extensible lattice sequences exist was proved by Hickernell and
Niederreiter (2003). Cools et al. (2006) introduced a fast CBC construction for embedded lattice rules, while Dick et al. (2008) analysed a sieve
algorithm and also provided the theoretical justification for the algorithm
in Cools et al. (2006). An algorithm was introduced in Niederreiter and
Pillichshammer (2009) which constructs the lattice rule digit by digit. Hickernell, Kritzer, Kuo and Nuyens (2012) investigated strategies for applying
a lattice sequence point by point.
Joe (2004) introduced a CBC construction of lattice rules based on an
error criterion called R, which is part of an upper bound to the classical
star discrepancy. This was later extended to the weighted setting for product weights in Joe (2006), and to composite n in Sinescu and Joe (2008).
General weights were considered by Sinescu and Joe (2007) for prime n, and
by Sinescu and L’Ecuyer (2011) for composite n. The analysis for general
High-dimensional integration: The quasi-Monte Carlo way
227
weights presented difficulties. (Recall that the CBC construction for the
anchored space with general non-product weights must work with an auxiliary quantity. There is a similar difficulty here.) Thus, instead of working
with weighted-R, the last two papers considered a related quantity which
/ and which forms part of a different upwe shall refer to as weighted-R,
per bound to the weighted star discrepancy. This argument holds under a
restrictive monotonicity assumption on the weights that
γv ≥ γu
whenever
v ⊆ u.
(5.37)
/ does not require a redefinition of
Switching from weighted-R to weighted-R
weights. Thus any structure in the weights, such as the POD form, would
be preserved. Note, however, that the given POD weights may not satisfy
the monotonicity requirement (5.37) needed to allow for this switch from
/ (We remark that imposing condition (5.37) does
weighted-R to weighted-R.
not appear to eliminate the need to work with the auxiliary quantity in the
anchored space with general non-product weights.)
The baker’s transformation (which is called ‘tent transformation’ in ergodic theory) for shifted lattice rules was first introduced by Hickernell
(2002) to gain an extra order of convergence for non-periodic integrands,
that is, from nearly O(n−1 ) to nearly O(n−2 ). The recent work by Dick
et al. (2013) generalizes this strategy to obtain even higher rates of convergence. There is, however, an inevitable dimension dependence. This is
similar to the issue with periodization in high dimensions, which was discussed in Kuo, Sloan and Woźniakowski (2007). An interesting discovery
by Dick et al. (2013) is that the baker’s transformation can be applied to
an unshifted lattice rule, to achieve close to O(n−1 ) convergence for nonperiodic integrands, without the need to use random shifts. This is different
from most of the results discussed in this section.
Randomly shifted lattice rules for integrands over unbounded domains
(including Rs ) were considered in Kuo, Wasilkowski and Waterhouse (2006b)
and Kuo et al. (2010a). The strategy is to transform the integrals to the
unit cube by a change of variables, using the inverse of the cumulative
distribution function of some univariate probability density function. The
transformed integrands over the unit cube typically do not belong to the
standard Sobolev space setting, and this is why new theory is needed. It is
proved that good lattice rules can be constructed by a CBC algorithm in a
similar way to the standard theory in Sobolev spaces.
Another branch of lattice rule analysis focuses on the trigonometric degree
and similar quantities; see for example Cools and Lyness (2001), Lyness and
Sørevik (2006), Cools and Nuyens (2008) and the references therein. Recent
constructions in terms of trigonometric degrees include the works by Cools,
Kuo and Nuyens (2010), Achtsis and Nuyens (2012) and Kämmerer, Kunis
and Potts (2012).
228
J. Dick, F. Y. Kuo and I. H. Sloan
Other search criteria for lattice rules were reviewed in L’Ecuyer and
Lemieux (2000), Lemieux and L’Ecuyer (2001) and L’Ecuyer and Munger
(2012), while L’Ecuyer, Munger and Tuffin (2010) considered the distribution of the integration error with respect to the random shift.
Lattice rules with exponential convergence rates in (modified) weighted
Korobov spaces were considered in Dick, Larcher, Pillichshammer and Woźniakowski (2011).
Approximation of functions by lattice rules was considered in Li and Hickernell (2003) and Zeng, Leung and Hickernell (2006). CBC constructions of
lattice rules for the approximation of functions were analysed in Kuo, Sloan
and Woźniakowski (2006a, 2008b) and Kuo, Wasilkowski and Woźniakowski
(2009).
The choice of weights in relation to the application of lattice rules were
considered in Wang and Sloan (2006), Dick (2012) and Kuo et al. (2012).
The character property (5.2) is based on the following general algebraic
structure: The set ([0, 1)s , ⊕) with addition modulo 1 given by x ⊕ y :=
{x + y} forms an abelian group, and the set of lattice points Plat = {tk =
{kz/n} : 0 ≤ k < n} is a finite subgroup. The set K of functions χh :
[0, 1)s → {z ∈ C : |z| = 1}, h ∈ Z, given by
χh (x) = e2πih·x ,
constitute a group homomorphism of the group ([0, 1)s , ⊕) called characters.
They themselves form a group via the multiplication χh χk = χh+k which
is isomorphic to the group (Zs , +). The character-theoretic dual lattice of
the lattice point set Plat is given by the set of characters
L⊥ = {χh : h ∈ Zs , χh ({kz/n}) = 1 for all 0 ≤ k < n},
which is a subgroup of the set of characters. This can be used to define
⊥ = K/L⊥ , which can be viewed as the dual group of
the quotient group Plat
characters of a lattice point set Plat . Since the characters of ([0, 1)s , ⊕) are
the basis functions of Fourier series, it is natural to study lattice rules for
numerical integration of Fourier series. We will use an analogous algebraic
structure in the next section.
6. Digital nets and sequences
In the previous section we discussed the modern theory of lattice rules.
Many of the ideas there apply also to polynomial lattice rules, or more
generally, to nets and sequences. For lattice rules we have seen that Fourier
series play an important role. For polynomial lattice rules and nets this role
is played by Walsh series, which we introduce in the following subsection.
Throughout this section we denote by N0 the set of non-negative integers
and N the set of positive integers.
High-dimensional integration: The quasi-Monte Carlo way
229
6.1. A brief discussion of numerical integration of Walsh series
Walsh (1923) introduced a system of functions denoted by {walk : k ∈ N0 },
which is in some way similar to the trigonometric function system {e2πikx :
k ∈ Z} that is connected to the well-known Fourier theory.
Definition 6.1 (Walsh functions in one dimension). For b ≥ 2 we
denote by ωb := e2πi/b the primitive bth root of unity. Let k ∈ N0 with badic expansion k = κ0 + κ1 b + · · · + κr−1 br−1 . The kth b-adic Walsh function
walk : R → C, periodic with period one, is defined by
κ ξ1 +κ1 ξ2 +···+κr−1 ξr
walk (x) := ωb 0
,
for x ∈ [0, 1) with b-adic expansion x = ξ1 b−1 + ξ2 b−2 + · · · (unique in the
sense that infinitely many of the digits ξi must be different from b − 1).
In the literature the function system defined above is often called the
generalized Walsh function system and only in the case b = 2 one speaks of
Walsh functions.
One of the main differences between Walsh functions and the trigonometric functions is that Walsh functions are only piecewise continuous. This is
clear, since Walsh functions are step functions as we now show.
Let k ∈ N with b-adic expansion k = κ0 + κ1 b + · · · + κr−1 br−1 . Let
J = [a/br , (a + 1)/br ), with integers r ≥ 1 and 0 ≤ a < br , be a so-called
elementary b-adic interval of order r. Let a have b-adic expansion of the
form a = α0 + α1 b + · · · + αr−1 br−1 . Then any x ∈ J has b-adic expansion
x = αr−1 b−1 + αr−2 b−2 + · · · + α0 b−r + ξr+1 b−(r+1) + · · ·
for some digits 0 ≤ ξi ≤ b − 1 for i ≥ r + 1, and hence
κ αr−1 +κ1 αr−2 +···+κr−1 α0
walk (x) = ωb 0
= walk (a/br ).
We summarize this result in the following proposition.
Proposition 6.2 (Walsh functions are step functions). Let k ∈ N
satisfy br−1 ≤ k < br for some r ∈ N. Then the kth Walsh function walk
is constant on elementary b-adic intervals of order r of the form [a/br , (a +
1)/br ) with value walk (a/br ). Further, wal0 ≡ 1.
Now we generalize the definition of Walsh functions to higher dimensions.
Definition 6.3 (Walsh functions in s dimensions). For dimension s ≥
2, and k1 , . . . , ks ∈ N0 we define the s-dimensional b-adic Walsh function
walk1 ,...,ks : Rs → C by
walk1 ,...,ks (x1 , . . . , xs ) :=
s
j=1
walkj (xj ).
230
J. Dick, F. Y. Kuo and I. H. Sloan
For vectors k = (k1 , . . . , ks ) ∈ Ns0 and x = (x1 , . . . , xs ) ∈ [0, 1)s we write,
with some abuse of notation,
walk (x) := walk1 ,...,ks (x1 , . . . , xs ).
The system {walk : k ∈ Ns0 } is called the s-dimensional b-adic Walsh function system.
Since any s-dimensional Walsh function is a product of one-dimensional
Walsh functions, it is clear that s-dimensional Walsh functions are step
functions too.
Let P = {t0 , t1 , . . . , tbm −1 } be a digital (t, m, s)-net constructed over Zb
as introduced in Section 2.6. The set P is an additive group whose dual
group of characters is given by the Walsh functions in the same base b, that
is, we have the following character property:
bm −1
1 if C1k1 + · · · + Csks = 0,
1 (6.1)
wal
(t
)
=
i
k
bm
0 otherwise,
i=0
where for a vector k ∈ N0 with b-adic expansion k = κ0 + κ1 b + · · · +
κm−1 bm−1 we write k = (κ0 , κ1 , . . . , κm−1 ) . This leads to a similar result
to Theorems 5.1 and 5.2 for lattice rules.
Theorem 6.4 (integration error for digital net). Assume we are given
a Walsh series
f-(k)walk (x),
f (x) =
k∈Ns0
where the Walsh coefficients f-(k) are given by
f (x)walk (x) dx.
f (k) =
[0,1]s
The integration error of approximating the integral of f using a QMC rule
Qbm ,s based on a digital net with bm points is given by
f-(k),
Qbm ,s (f ) − Is (f ) =
k∈D\{0}
where
D := {k ∈ Ns0 : C1 k1 + · · · + Csks = 0}.
(6.2)
is the dual net associated with the digital net.
Proof.
We have
b −1
b −1
1 1 f
(t
)
−
f
(x)dx
=
walk (ti ) − f-(0),
f
(k)
i
bm
bm
[0,1]s
s
m
i=0
m
k∈N0
i=0
The result then follows immediately from the character property (6.1).
High-dimensional integration: The quasi-Monte Carlo way
231
We consider now the case where the Walsh coefficients f-(k) decay with a
certain rate. For k ∈ N0 let
0 if k = 0,
µ1 (k) :=
(6.3)
a if k = κ0 + · · · + κa−1 ba−1 , κa−1 = 0.
(Later in Section 6.6 we generalize the function µ1 to µα .) For vectors
k = (k1 , . . . , ks ) ∈ Ns0 let
µ1 (k) := µ1 (k1 ) + · · · + µ1 (ks ).
Let
<
=
Eϑ,s (c) := f : [0, 1]s → R : |f-(k)| ≤ c b−ϑµ1 (k) for all k ∈ Ns0 ,
for some ϑ > 1 and c > 0. Then we obtain
b−ϑµ1 (k)
|Qbm ,s (f ) − Is (f )| ≤ c
for all f ∈ Eϑ,s (c).
k∈D\{0}
The quantity
Tϑ :=
b−ϑµ1 (k) ,
ϑ > 1,
(6.4)
k∈D\{0}
depends only on the function class Eϑ,s (1) and the digital net and can be
understood as the ‘worst-case error’ for this function class. Thus, for the
function class Eϑ,s (1) one aims at finding digital nets for which Tϑ is small.
Thus the quantity Tϑ plays a similar role to the quantity Pα for lattice rules
(see (5.3)).
We briefly describe the connection between the quantity Tϑ and the quality parameter t of the digital net. Lemma 2.14 implies that if k ∈ D \ {0},
then µ1 (k) > m − t. If the digital net is a strict (t, m, s)-net, then there
exists a k ∈ D \{0} such that µ1 (k) = m−t+1. Thus, the largest summand
in (6.4) is of the form b−ϑ(m−t+1) . On the other hand it can also be shown
that the quantity Tϑ is dominated by its largest term, that is, there is a
constant c̃ > 0 such that
(m − t)s−1
.
bϑ(m−t+1)
Thus digital nets for which m − t is large yield a small value of Tϑ and are
therefore useful for numerical integration of Walsh series.
Tϑ ≤ c̃
6.2. Digitally shift-averaged worst-case error
In this subsection we discuss the general strategy for the error analysis of
randomly digitally shifted QMC rules not specific to digital nets (similar to
Section 5.2), and then focus on digital nets in the next subsection.
232
J. Dick, F. Y. Kuo and I. H. Sloan
For any QMC point set P = {t0 , . . . , tn−1 } and any digital shift ∆ ∈
[0, 1]s , let
P ⊕ ∆ = {ti ⊕ ∆ : i = 0, 1, . . . , n − 1}
denote the digitally shifted QMC point set, and let Qn,s (⊕∆; f ) denote
the corresponding digitally shifted QMC rule, where ⊕ denotes digit-wise
addition modulo b already defined in Section 2.9. Then, for any integrand
f belonging to some normed space H, it follows from the definition of the
worst-case error that
|Is (f ) − Qn,s (⊕∆, f )| ≤ en,s (P ⊕ ∆; H) f H .
We consider the root-mean-square error
E |Is (f ) − Qn,s (⊕∆, f )|2 ≤ edsh
n,s (P ; H) f H ,
where the expectation E is taken over the random digital shift ∆ which is
uniformly distributed over [0, 1]s , and where the quantity
dsh
e2n,s (P ⊕ ∆; H) d∆
(6.5)
en,s (P ; H) :=
[0,1]s
is referred to as the digitally shift-averaged worst-case error.
The digitally shift-averaged worst-case error (5.8) will be used as our
quality measure for randomly digitally shifted QMC rules. The averaging
argument guarantees the existence of at least one digital shift ∆ for which
en,s (P ⊕ ∆; H) ≤ edsh
n,s (P ; H).
Analogously to Theorem 5.3, the following theorem gives an explicit formula for the digitally shift-averaged worst-case error when the function
space is an RKHS.
Theorem 6.5 (formula for the digitally shift-averaged worst-case
error). The digitally shift-averaged worst-case error (6.5) for a QMC point
set P in an RKHS Hs (K) satisfies
n−1 n−1
1 dsh
dsh
2
K(x, y) dx dy + 2
K (ti , tk ),
[en,s (P ; Hs (K))] = −
n
[0,1]s [0,1]s
i=0 k=0
where
K
dsh
(x, y) :=
[0,1]s
K(x ⊕ ∆, y ⊕ ∆) d∆
for all x, y ∈ [0, 1]s .
(6.6)
The proof follows along the same lines as the proof of Theorem 5.3.
The function K dsh defined by (6.6) is actually a reproducing kernel with
the digitally shift-invariant property
K dsh (x, y) = K dsh (x ⊕ ∆, y ⊕ ∆)
for all x, y, ∆ ∈ [0, 1]s ,
High-dimensional integration: The quasi-Monte Carlo way
233
or equivalently,
K dsh (x, y) = K dsh (x y, 0)
for all x, y ∈ [0, 1]s ,
where denotes digit-wise subtraction modulo b already defined in Section 2.9. As in Section 5.2, it can be verified that the digitally shift-averaged
worst-case error edsh
n,s (P ; Hs (K)) is precisely the worst-case error of the QMC
point set P in the RKHS with K dsh as the reproducing kernel, that is,
e2n,s (P ⊕ ∆; Hs (K)) d∆ = e2n,s (P ; Hs (K dsh )).
[0,1]s
This important connection provides a powerful tool for analysing randomly
digitally shifted QMC rules. We refer to the kernel K dsh as the digital
shift-invariant kernel associated with K.
6.3. Randomly digitally shifted digital nets in weighted Sobolev spaces
The digital shift-invariant kernel K dsh in the previous subsection has Walsh
series expansion
dsh (k)wal (x y)
K
K dsh (x, y) =
k
k∈Ns0
dsh (k) ≥ 0. If K is the unanchored Sobolev space,
for some coefficients K
then it can be shown that the Walsh coefficients of the digital shift-invariant
kernel satisfy
s
−2µ1 (k)
dsh
υb (kj ),
|K (k)| = b
j=1
for some function υb (k) (see (6.9) below). This leads to an expression of the
digitally shift-averaged worst-case error for the unanchored Sobolev space
using a QMC rule based on a digital net.
Theorem 6.6 (digitally shift-averaged worst-case error for digital
nets). The digitally shift-averaged worst-case error for a digital net P in
the weighted unanchored Sobolev space H(Ks,γ ) satisfies
2
γu
b−2µ1 (ku )
υb (kj ),
(6.7)
[edsh
bm ,s (P ; H(Ks,γ ))] =
∅=u⊆{1:s}
where
Du∗ =
ku ∈ N|u| :
ku ∈Du∗
j∈u
>
Cjkj ≡ 0 ,
(6.8)
1
1
1
,
υb (k) :=
−
2 sin2 (κa−1 π/b) 3
(6.9)
j∈u
and
234
J. Dick, F. Y. Kuo and I. H. Sloan
where k = κa−1 ba−1 + κa−2 ba−2 + · · · + κ0 with κa−1 ∈ {1, 2, . . . , b − 1}. In
particular, if b = 2 we have κa−1 = 1 and υb (k) is a constant independent
of k.
For the weighted anchored Sobolev space H(Ks,γ ) with anchor c we have
2
−2µ1 (kv )
(P
;
H(K
))]
=
γ
/
b
υb (kj ),
(6.10)
[edsh
m
s,γ
s,v
b ,s
kv ∈Dv∗
∅=v⊆{1:s}
j∈v
where υb (k) is defined as in (6.9) and γ
/s,v are the auxiliary weights defined
in (5.22).
Note that (6.7) and (6.10) are essentially weighted versions of the quantity
Tϑ in (6.4), with some additional factors υb (k) and a change of weights for
the anchored Sobolev space.
The proof of Theorem 6.6 is somewhat intricate and was shown by Dick
and Pillichshammer (2005) for the unanchored Sobolev space and by Dick,
Kuo, Pillichshammer and Sloan (2005) for the anchored Sobolev space. Instead of giving the proof, we provide an example to show how the decay rate
of the Walsh coefficients is connected to the smoothness of the integrand.
Example 6.7. To illustrate how to prove a bound on the decay rate of
the Walsh coefficients we consider the simplest possible example. Let f :
[0, 1] → R be absolutely continuous with
1
f (t)1[x,1] (t) dt.
f (x) = f (1) −
0
In this example let b = 2, and let k = 2a−1 + κa−2 2a−2 + · · · + κ0 ∈ N,
for some a ≥ 1. Then walk (x) ∈ {−1, 1} for all x ∈ [0, 1). Note that walk
changes sign on intervals of the form [2−a+1 , (+1)2−a+1 ) for 0 ≤ < 2a−1 ,
i.e., it has the opposite sign on the interval [2−a+1 , 2−a+1 +2−a ) compared
to the interval [2−a+1 + 2−a , ( + 1)2−a+1 ). Thus
2a−1
1
−1 2−a+1 +2−a
f (x)walk (x) dx ≤
|f (x) − f (x + 2−a )| dx
|f-(k)| = −a+1
0
≤
2
=0
1−2−a
|f (x) − f (x + 2−a )| dx
0
1
≤
0
−a
1−2−a
|f (t)|
0
1
≤2
|f (t)| dt < k
0
Thus for any k ∈ N0 we have
|f-(k)| ≤
1[x,x+2−a ) (t) dx dt
1
f H ,
max(1, k)
−1
1
0
|f (t)| dt
2
1/2
.
High-dimensional integration: The quasi-Monte Carlo way
235
where · H denotes the norm in the unanchored Sobolev space. This can
be generalized to arbitrary base b and dimensions s > 1.
In particular, if K : [0, 1]2 → R is the reproducing kernel for the unanchored Sobolev space, then K has Walsh coefficients K(k,
) which satisfy
|K(k,
)| ≤ Cb−µ1 (k)−µ1 () .
For the corresponding digital shift-invariant kernel defined by (6.6), it can
dsh (k) = K(k,
k). We therefore obtain
be shown that K
dsh (k)| = |K(k,
k)| ≤ Cb−2µ1 (k) .
|K
6.4. CBC construction of polynomial lattice rules
Analogous to lattice rules, no optimal explicit construction of polynomial
lattice rules is known beyond dimension 2. On the other hand, a CBC
approach is also possible for polynomial lattice rules. An essential ingredient
of the CBC algorithm is an easily computable quality criterion. We now
show that the digitally shift-averaged worst-case error (6.7) can be computed
efficiently.
Lemma 6.8. The digitally shift-averaged worst-case error for a digital
net P = {t0 , . . . , tbm −1 } in the weighted unanchored Sobolev space H(Ks,γ )
satisfies
b −1
1 = m
b
m
2
[edsh
bm ,s (P ; H(Ks,γ ))]
γu
i=0 ∅=u⊆{1:s}
with
φb (t) :=
6−1 −
6−1
τa0 (b−τa0 )
ba0 +1
φb (ti,j ),
(6.11)
j∈u
if 0 < t < 1,
if t = 0,
where, for t ∈ (0, 1) with b-adic expansion t = τ1 b−1 + τ2 b−2 + · · · (unique
in the sense that infinitely many τr are different from b − 1), a0 = −logb t
is the smallest positive integer such that τ1 = τ2 = · · · = τa0 −1 = 0 and
τa0 = 0.
Proof.
We can write from (6.7) that
2
[edsh
bm ,s (P ; H(Ks,γ )))]
=
−m
γu b
∅=u⊆{1:s}
−m
=b
m −1
b
m −1
b
b−2µ1 (ku )
i=0 ku ∈N|u|
i=0 ∅=u⊆{1:s}
γu
∞
j∈u k=1
υb (kj )walkj (ti,j )
j∈u
b−2µ1 (k) υb (k)walk (ti,j ).
(6.12)
236
J. Dick, F. Y. Kuo and I. H. Sloan
To simplify the proof we only consider the case b = 2 in the following.
Then υb (k) = 1/3 for all k ∈ N. Let ω2 = e2πi/2 = −1. Then we have
a −1
2
walk (t) =
−2µ1 (k)
2
1
κ
ω2 r−1
τr
r=1 κr−1 =0
k=2a−1
Thus
∞
a−1
 a−1

2
=
− 2a−1


0
walk (t) =
∞
2
a=1
k=1
=
−2a
a
0 −1
1
κ
ω2 a−1
τa
κa−1 =1
if τ1 = τ2 = · · · = τa = 0,
if τ1 = τ2 = · · · = τa−1 = 0, τa = 0,
otherwise.
a −1
2
walk (t)
k=2a−1
2−2a 2a−1 − 2−2a0 2a0 −1 = 2−1 − 2−a0 (1 + 2−1 ).
a=1
For t = 0 we have
∞
2−2µ1 (k) walk (0) =
k=1
∞
a=1
1
2−2a 2a−1 = .
2
The result now follows from (6.12).
The expression (6.11) can be easily evaluated by a computer. Thus we
can use the following CBC algorithm to construct polynomial lattice rules.
In the following we write
2
dsh
2
dsh
2
[edsh
bm ,s (P ; H(Ks,γ ))] = [ebm ,s (q)] = [ebm ,s (q1 , . . . , qs )]
if the digital net is a polynomial lattice rule with generating vector q =
(q1 , . . . , qs ) ∈ (G∗b,m )s , where G∗b,m = {q ∈ Zb [x] \ {0} : deg(q) < m}.
Algorithm 6.9 (CBC construction). Given: m, smax , and weights γu .
(1) Set q1 = 1.
2
(2) For s = 2, 3, . . . , smax , choose qs in G∗b,m to minimize [edsh
bm ,s (q1 , . . . , qs )] .
The next theorem states that the CBC algorithm yields a convergence
rate close to O(b−m ).
Theorem 6.10 (optimal CBC error bound). The generating vector q
constructed by the CBC algorithm, minimizing the shift-averaged worst-case
error for the unanchored Sobolev space edsh
bm ,s (q) in each step, satisfies
1/λ
1
2
[edsh
γuλ (ρb (λ))|u|
bm ,s (q)] ≤
bm − 1
∅=u⊆{1:s}
High-dimensional integration: The quasi-Monte Carlo way
for any 1/2 < λ ≤ 1, where

1


 3λ (22λ − 2)
ρb (λ) :=

(b − 1)(4b2 − 9)λ


54λ (b2λ − b)
237
for b = 2,
for b > 2.
Proof. Since the algorithm constructs the polynomial lattice rule inductively with respect to the dimension, the proof also uses induction. The argument follows along the same lines as for the lattice rule case. To simplify
the proof we only consider b = 2 in the following. In this case υ2 (k) = 1/3
for all k ∈ N in (6.7).
First note that we have for 1/2 < λ ≤ 1 that
∞
2−2λµ1 (k) =
∞
2−2λa 2a−1 =
a=1
k=1
1
−2
22λ
and
∞
k=1
2m |k
−2λµ1 (k)
2
=
∞
−2λµ1 (bm )
2
=
=1
∞
2−2λm 2−2λµ1 () = 2−2λm
=1
1
.
−2
22λ
For a polynomial lattice rule with generating vector (q1 , . . . , qs ) and modulus p, the set Du∗ in (6.8) is given by
>
Du∗ = ku ∈ N|u| :
trm (kj )qj ≡ 0 (mod p) ,
j∈u
where for kj = κj,0 + κj,1 2 + · · · ∈ N0 we set trm (kj ) = κj,0 + κj,1 x + · · · +
κj,m−1 xm−1 .
In dimension one we have
2
−1
[edsh
2m ,s (q1 )] = γ{1} 3
∞
2−2µ1 (k)
k=1
2m |k
= 2−2m γ{1} 3−1
∞
2−2µ1 () = 3−1 2−2m−1 γ{1} .
=1
Thus the result holds for dimension one.
Suppose, for some 1 ≤ d < s, we have (q1 , . . . , qd ) ∈ (G∗2,m )d and
2
[edsh
2m ,d (q1 , . . . , qd )] ≤
1
m
2 −1
∅=u⊆{1:d}
γuλ
1
3λ (22λ − 2)
|u| 1/λ
,
(6.13)
238
J. Dick, F. Y. Kuo and I. H. Sloan
for all 1/2 < λ ≤ 1. Now we consider (q1 , . . . , qd , qd+1 ). We have
2
[edsh
2m ,d+1 (q1 , . . . , qd , qd+1 )]
=
γu 3−|u|
2−2µ1 (ku ) +
∅=u⊆{1:d}
ku ∈Du∗
γu 3−|u|
ku ∈Du∗
{d+1}⊆u⊆{1:d+1}
2
= [edsh
2m ,d (q1 , . . . , qd )] + θ(qd+1 ),
(6.14)
where
θ(qd+1 ) =
2−2µ1 (ku )
γu
{d+1}⊆u⊆{1:d+1}
b−2µ1 (ku ) .
ku ∈Du∗
According to the algorithm, qd+1 is chosen such that the mean-square
2
worst-case error [edsh
2m ,d+1 (q1 , . . . , qd , qd+1 )] is minimized. Since the only
dependency on qd+1 is in θ(qd+1 ), we have θ(qd+1 ) ≤ θ(q) for all q ∈ G∗2,m ,
which implies that for any 1/2 < λ ≤ 1 we have θ(qd+1 )λ ≤ θ(q)λ for all
q ∈ G∗2,m . This leads to
1/λ
1
θ(q)λ
.
θ(qd+1 ) ≤
2m − 1
∗
q∈G2,m
We obtain a bound on θ(qd+1 ) through this last inequality.
For λ satisfying 1/2 < λ ≤ 1 it follows from Jensen’s inequality that
γuλ 3−λ|u|
2−2λµ1 (ku ) .
θ(q)λ ≤
{d+1}⊆u⊆{1:d+1}
ku ∈Du∗
The condition ku ∈ Du∗ is equivalent to the equation
trm (k1 )q1 + · · · + trm (kd )qd ≡ −trm (kd+1 )q
(mod p).
If kd+1 is a multiple of 2m , then trm (kd+1 ) = 0 and the corresponding term in
the sum is independent of q. If kd+1 is not a multiple of 2m , then trm (kd+1 )
can be any polynomial in G∗2,m . Moreover, since q = 0 and p is irreducible,
trm (kd+1 )q is never a multiple of p.
By averaging over all q ∈ G∗2,m , with the above discussion in mind, we
obtain
1
θ(q)λ
2m − 1
∗
q∈G2,m
≤
∞
2−2λµ1 (kd+1 )
kd+1 =1
2m |kd+1
+
∅=u⊆{1:d}
∞
2−2λµ1 (kd+1 )
2m − 1
kd+1 =1
2m kd+1
λ
γu∪{d+1}
3−λ(|u|+1)
∅=u⊆{1:d}
2−2λµ1 (ku )
ku ∈N|u|
trm (ku )·q u ≡0 (mod p)
λ
γu∪{d+1}
3−λ(|u|+1)
2−2λµ1 (ku )
ku ∈N|u|
trm (ku )·q u ≡0 (mod p)
High-dimensional integration: The quasi-Monte Carlo way
≤ 2−2λm
+
≤
=
1
3λ(|u|+1) (22λ
− 2)
∅=u⊆{1:d}
1 − 2−2λm
1
2m − 1 3λ(|u|+1) (22λ − 2)
1
1
2m − 1 3λ(|u|+1) (22λ − 2)
1
2m − 1
{d+1}⊆u⊆{1:d+1}
Thus we conclude that
1
θ(qd+1 ) ≤
m
2 −1
λ
γu∪{d+1}
∅=u⊆{1:d}
1
3λ (22λ − 2)
2−2λµ1 (ku )
ku ∈N|u|
trm (ku )·q u ≡0 (mod p)
λ
γu∪{d+1}
∅=u⊆{1:d}
γuλ
2−2λµ1 (ku )
ku ∈N|u|
trm (ku )·q u ≡0 (mod p)
λ
γu∪{d+1}
239
|u|
ku ∈N|u|
.
2−2λµ1 (ku )
γuλ
{d+1}⊆u⊆{1:d+1}
1
|u| 1/λ
3λ (22λ − 2)
which, together with (6.13) and (6.14), yields
2λ
2λ
[edsh
= [edsh
+ θ(qd+1 )λ
2m ,d+1 (q1 , . . . , qd , qd+1 )]
bm ,d (q1 , . . . , qd )]
|u|
1
1
λ
≤ m
γu
.
2 −1
3λ (22λ − 2)
∅=u⊆{1:d+1}
Hence the result follows by induction.
The choice of qd at each step is independent of λ and therefore the error
bound holds for all values of λ ∈ (1/2, 1]. The optimal convergence rate
close to O(b−m ) is obtained with λ → 1/2.
6.5. Fast CBC construction of polynomial lattice rules
In the previous subsection we showed how to construct, for a given modulus
p, a generating vector q which yields a polynomial lattice rule that achieves
a small integration error using a CBC algorithm. We have seen in Section 5.5 that FFT can be used to reduce the construction cost of the CBC
construction of lattice rules. We now show that the same technique can be
used for polynomial lattice rules by outlining the key differences from lattice
rules. For product weights, it is possible to construct, for a given polynomial p with deg(p) = m, an s-dimensional generating vector q in O(s m bm )
operations, compared to O(s2 b2m ) operations for a naive implementation
of the CBC algorithm.
Let b be a prime. Throughout this subsection we consider the polynomial
p ∈ Zb [x] with deg(p) = m to be irreducible. We restrict ourselves to
240
J. Dick, F. Y. Kuo and I. H. Sloan
product weights γu = j∈u γj , for a sequence of positive real numbers γj .
The extension to POD weights can be done as in the lattice rule case: see
Section 5.6.
Using Algorithm 6.9 we construct, component by component, a generating
vector q = (q1 , . . . , qs ) ∈ (G∗b,m )s such that for all 1 ≤ d ≤ s the quantity
2
e2d (qd ) := [edsh
bm ,d (q1 , . . . , qd−1 , qd )] is minimized with respect to qd for fixed
q1 , . . . , qd−1 . Assume that q1 , . . . , qd−1 are already constructed. Then we
have to find q ∈ G∗b,m which minimizes
d−1
j=1 (1 + γj φb (0))
2
dsh
2
ed (q) = [ebm ,d−1 (q1 , . . . , qd−1 )] + γd φb (0)
bm
m
b −1
q(x)i(x)
γd ,
+ m
ηd−1 (i)φb υm
b
p(x)
i=1
where we used Theorem 2.21, separated out the i = 0 term and introduced
the column vector η d−1 = ηd−1 (1), . . . , ηd−1 (bm − 1) , where
ηd−1 (i) =
d−1
(1 + γj φb (ti,j )),
i = 1, . . . , bm − 1.
j=1
Note that the vector η d−1 remains fixed for fixed d regardless of the choice
m
of polynomial q. We set η 0 = (1, . . . , 1) ∈ Rb −1 . The vector η d−1 can be
computed as part of the CBC algorithm with the update formula ηd−1 (i) =
ηd−2 (i)(1 + γd−1 φb (ti,d−1 )).
For short we write ω := φb ◦ υm from now on. We define the (bm − 1) ×
m
(b − 1) matrix
q(x)i(x)
Ωp := ω
.
q=1,...,bm −1
p(x)
m
i=1,...,b −1
Further, let e2d = (e2d (1), . . . , e2d (bm − 1)) be the column vector collecting
the quantities e2d (q) and let 1 be the (bm − 1)-dimensional column vector
whose components are all 1. Then we have
d
γd
j=1 (1 + γj φb (0))
1 + m Ωp η d−1 .
e2d = e2d−1 +
m
b
b
2
As we are only interested in which q minimizes ed (q), but not the value of
e2d (q) itself, we only need to compute Ωp η d−1 .
The entries of the matrix Ωp are of the form ω q(x)i(x)
, where the product
p(x)
of the non-zero polynomials q and i has to be evaluated in the field Zb [x]/(p),
and thus modulo p. Since the multiplicative group of every finite field is
cyclic, we can find a primitive element g which generates all elements of
the multiplicative group (Zb [x]/(p))∗ (= (Zb [x]/(p)) \ {0}). That is, there
High-dimensional integration: The quasi-Monte Carlo way
241
is a g ∈ (Zb [x]/(p))∗ such that (Zb [x]/(p))∗ = {g 0 , g 1 , g 2 , . . . , g b −1 }. Thus,
we can write the product of any non-zero polynomials q, i ∈ Zb [x]/(p) as a
power of the polynomial g. Nuyens and Cools (2006a, 2006b) then suggest
permuting the rows of Ωp by the positive powers of the primitive polynomial
g and the columns by the negative powers of the same primitive polynomial.
This procedure is often called Rader transform, since it goes back to an idea
of Rader (1968).
We now describe the Rader transform in detail. Let g(x) be a primitive
element in (Zb [x]/(p))∗ . We define a (bm − 1) × (bm − 1) matrix Π(g) =
(πk, (g))1≤k,<bm , where
1 if k(x) ≡ g(x) (mod p(x)),
πk, (g) =
0 otherwise.
m
Here k(x) denotes the polynomial which is associated with the integer k.
Since g is a primitive element it follows that each row and each column of
Π(g) has exactly one entry, which is 1, and the remaining entries are 0.
Further, Π(g)Π(g) = I, the identity matrix. In fact, the matrix Π(g) is a
permutation matrix. That is, for any (bm − 1) × (bm − 1) matrix C, Π(g)C
just changes the order of the rows of C and CΠ(g) only changes the order
of the columns of C.
Let C = (ck, )1≤k,<bm and
C = (Π(g)) Ωp Π(g −1 ).
Then
ck, =
m −1
b
u,v=1
u(x)v(x)
g(x)k g(x)−
−1
πv, (g ) = ω
.
πu,k (g)ω
p(x)
p(x)
g(x)r
.
cr = ω
p(x)
Let
Observe that cr = cr for all r, r ∈ Z with r ≡ r (mod bm − 1), since
m
g(x)b −1 = 1. Then we have ck, = ck− , and therefore C is circulant (cf.
Section 5.5).
Note that the circulant matrix C is fully determined by its first column
c = (c0 , c1 , c2 , . . . , cbm −2 ) . Such matrices have a similarity transform which
n−1
be
has the Fourier matrix as its eigenvectors. For n ∈ N, let Fn := (fk, )k,=0
−k
2πi/n
. Note
the Fourier matrix of order n given by fk, = ωn , where ωn = e
that Fn is symmetric and n1 Fn F n = I, the identity matrix. Furthermore, let
diag(a1 , . . . , an ) be the n × n diagonal matrix A = (Ai,j )ni,j=1 with Ai,i = ai
for 1 ≤ i ≤ n and Ai,j = 0 for i = j. Then we have (cf. Section 5.5)
Ωp =
bm
1
Π(g)F bm −1 DFbm −1 Π(g −1 ) ,
−1
242
J. Dick, F. Y. Kuo and I. H. Sloan
where Π(g), Π(g −1 ) are permutation matrices, Fbm −1 is a Fourier matrix, F bm −1 its complex conjugate, and D is the diagonal matrix D =
diag(F bm −1 c).
m
For any vector x = (x1 , . . . , xbm −1 ) ∈ (Cb −1 ) , the matrix–vector multiplications Π(g)x, Dx, and Π(g −1 ) x can be done in O(bm ) operations (in
fact, in the implementation of the algorithm the permutations using Π(g)
and Π(g −1 ) can be avoided altogether: see Section 5.5). Further, Fbm −1 x
and F bm −1 x can be computed in O(mbm ) operations using the fast Fourier
transform.
Therefore a fast CBC algorithm requires only O(s m bm ) operations by
using O(bm ) memory space. This is a significant speed-up compared to a
straightforward implementation. Only through this reduction of the construction cost does the CBC algorithm become applicable for the generation
of polynomial lattice point sets with reasonably large cardinality.
6.6. Integration of smooth functions
It is well understood that the convergence rate for numerical integration
depends on the smoothness of the integrand. In the following we consider
the weighted unanchored Sobolev space of smoothness α > 1, H(Ks,γ,α ),
already encountered in Section 5.9; see also Example 3.8 for the one-dimensional case. The reproducing kernel is given by (5.34), and the inner product is
γu−1
f, gs,γ,α =
u⊆{1:s}
v⊆u r v ∈{1:α−1}|v|
∂ |rv |1 +α|u\v| f
dxv dx{1:s}\u
∂x(0,r,α)
[0,1]|u\v|
[0,1]|{1:s}\u| [0,1]|v|
∂ |rv |1 +α|u\v| g
dxv dx{1:s}\u dxu\v ,
×
∂x(0,r,α)
[0,1]|{1:s}\u| [0,1]|v|
where |r v |1 := j∈v |rj |, and
∂ |rv |1 +α|u\v| f
∂x(0,r,α)
stands for the partial derivatives of f of order rj for j ∈ v, of order α for
j ∈ u \ v and order 0 otherwise, where r v = (rj )j∈v . Let the corresponding
norm be denoted by f s,γ,α = f, f s,γ,α .
For functions in H(Ks,γ,α ) it is known that a convergence rate of order n−α+δ for any δ > 0 can be achieved with a suitable algorithm. In
the previous subsections we have used the decay of the Walsh coefficients
for functions in H(Ks,γ,1 ) to show a convergence rate of order n−1+δ . To
generalize this approach to higher order we need to investigate how the
High-dimensional integration: The quasi-Monte Carlo way
243
Walsh coefficients of smooth functions decay. Since the proof is involved we
describe the main idea, leaving out technical details.
To simplify the notation we restrict ourselves to base b = 2. Let k ∈ N
be given by k = 2a1 −1 + 2a2 −1 + · · · + 2aν −1 for a1 > a2 > · · · > aν > 0. We
define for α ≥ 2 an integer
a1 + a2 + · · · + amin{α,ν} if k ∈ N,
µα (k) :=
0
if k = 0.
This generalizes the definition of µ1 (k) given by (6.3). Note that the Walsh
functions in base b = 2 only take on the values 1 or −1. Thus in the following
we write walk instead of walk .
x
Let f ∈ H(Ks,γ,α ) and Jk (x) := 0 walk (t) dt. Then
1
1
1
f (x)walk (x) dx = f (x)Jk (x) x=0 −
f (x)Jk (x) dx
f-(k) =
0
0
1
f (x)Jk (x) dx,
(6.15)
=−
0
1
walk (x) dx = 0.
We would now like to relate the Walsh coefficient f-(k) to some Walsh
coefficient of f . This is done by using the Walsh series expansion of Jk .
Let k := k − 2a1 −1 , and hence 0 ≤ k < 2a1 −1 . Then it can be shown that
∞
Jk (x) = 2−a1 −1 walk (x) −
2−c wal2a1 +c−1 +k (x) .
as
0
c=1
Substituting this into (6.15), we obtain approximately
1
−a1
f (x)walk (x) dx = −2−a1 f- (k ).
f (k) ≈ −2
0
In fact we obtain an infinite sum on the right-hand side, but the main term is
the first one: the remaining terms can be bounded by a suitable expression.
The above step can be repeated. To do so, let k := k − 2a2 −1 , k :=
k − 2a3 −1 and so on, where k (ν) = 0 and k (τ ) = 0 for τ ≥ ν. We can repeat
the last step τ times until either f (τ ) is no longer differentiable, or k (τ ) = 0,
that is, we can repeat it min(α, ν) times. Hence
f-(k) ≈ 2−a1 f- (k (1) ) ≈ 2−a1 −a2 f? (k (2) )
..
.
≈ 2−a1 −···−amin(α,ν) f-(min(α,ν)) (k (min(α,ν)) ).
244
J. Dick, F. Y. Kuo and I. H. Sloan
Taking the absolute value and using some estimation we obtain
|f-(k)| 2−a1 −···−amin(α,ν) |f-(min(α,ν)) (k (min(α,ν)) )|
−µα (k)
≤2
|f-(min(α,ν)) (k (min(α,ν)) )| ≤ 2−µα (k)
f (min(α,ν)) (x) dx,
1
0
where we used
-(min(α,ν))
|f
(k
(min(α,ν))
1
(min(α,ν))
)| = f
(x)walk(min(α,ν)) (x) dx
0
1
(min(α,ν)) f
(x)|walk(min(α,ν)) (x)| dx
≤
0
1
(min(α,ν)) f
(x) dx.
=
0
Thus if f is α-times differentiable, we obtain
|f-(k)| Cf 2−µα (k) ,
where Cf > 0 is a constant which depends only on f and α. By some
modification of the above approach it can be shown that the constant Cf
can be replaced by a constant which depends only on α (but not on f ) and
the norm of f , that is, we have
|f-(k)| ≤ Cα f 1,(1),α 2−µα (k) .
The same holds for dimension s > 1 and integer base b > 2, where the
constant additionally depends on s and b, that is,
|f-(k)| ≤ Cα,b,s f s,1,α b−µα (k) ,
where µα (k) = µα (k1 ) + · · · + µα (ks ) for k = (k1 , . . . , ks ) and 1 is the set of
weights which are 1 for each projection. For some values of b, this constant
Cα,b,s goes to 0 exponentially as s increases.
We can now begin to investigate numerical integration of functions in
H(Ks,γ,α ) using digital nets. From Theorem 6.4 we conclude that the integration error for a digital net P = {t0 , . . . , tbm −1 } satisfies
m −1
b
1
f (x) dx − m
f (ti ) ≤ Cα,b,s f s,1,α
b−µα (k) ,
b
s
[0,1]
i=0
k∈D\{0}
where D is the dual net defined by (6.2).
The last inequality separates the contribution of the function from the
contribution of the QMC rule, that is, f s,1,α depends only on the function
f but not on the digital net, whereas k∈D\{0} b−µα (k) depends only on the
generating matrices of the digital net and not on the function itself (only
on the smoothness of f , i.e., it is the same for all functions that have
High-dimensional integration: The quasi-Monte Carlo way
245
smoothness α). Therefore,
when considering the integration error we can
now focus on the term k∈D\{0} b−µα (k) .
It is possible to also introduce weights in the approach above. A precise
result for the unanchored weighted Sobolev space H(Ks,γ,α ) for α > 1 is
the following.
Theorem 6.11. The square worst-case error for multivariate integration
in the weighted unanchored Sobolev space H(Ks,γ,α ) of smoothness α > 1
using a QMC rule based on a digital net P can be bounded by
2
γu (Dα,b )|u|
b−µα (ku ) ,
[e(P, H(Ks,γ,α ))]2 ≤
ku ∈Du∗
∅=u⊆{1:s}
where Du∗ is defined by (6.8) and where
α 2
(τ −2)+
1
1
−τ
(2 sin(π/b))
b−(τ −ν)
1+ +
Dα,b = max
1≤ν≤α
b
b(b
+
1)
τ =ν
2α−2
1
1
−2α −2(α−ν)
.
+2 1+ +
(2 sin(π/b))
b
b b(b + 1)
Baldeaux and Dick (2009) investigated the value of the constant Dα,b and
showed that for b = 2 and 3 and 2 ≤ α ≤ 8 the constant Dα,b < 1.
Theorem 6.11 holds for any digital net, that is, any point set obtained via
the digital construction scheme as described in Section 2.6, including the
higher-order digital nets to be introduced in the next subsection.
6.7. Higher-order digital nets
The aim is now to find digital nets, that is, generating matrices C1 , . . . , Cs ∈
such that
Zw×m
b
b−µα (k) = O(n−α (log n)αs ),
k∈D\{0}
where the number of cubature points is n = bm . Note that we now use generating matrices with w rows, and we usually choose w ≈ αm as explained
below.
Roughly speaking, the sum k∈D\{0} b−µα (k) is dominated by its largest
term. To find this largest term, define
µ∗α (C1 , . . . , Cs ) =
min µα (k).
k∈D\{0}
The dependence on the generating matrices C1 , . . . , Cs on the right-hand
side of the above
D = D(C1 , . . . , Cs ). The
equation is via the dual net
∗
largest term in k∈D\{0} b−µα (k) is then b−µα (C1 ,...,Cs ) .
246
J. Dick, F. Y. Kuo and I. H. Sloan
In order to achieve a convergence
of almost n−α = b−αm we must have
−µ
that the largest term in k∈D\{0} b α (k) is also of this order, that is, we
must have µ∗α (C1 , . . . , Cs ) ≈ αm (or say µ∗α (C1 , . . . , Cs ) > αm − t for some
constant t independent of m). That this condition is also sufficient is quite
technical and was shown in Dick (2008, Lemma 5.2).
which
We can use some analogy to find matrices C1 , . . . , Cs ∈ Zw×m
b
∗
achieve µα (C1 , . . . , Cs ) ≈ αm. Let Cj = (cj,1 , . . . , cj,w ) , that is, cj, ∈ Zm
b
is the th row of Cj . Then the matrices C1 , . . . , Cs generate a classical
digital (t, m, s)-net if, for all i1 , . . . , is ≥ 0 with i1 + · · · + is ≤ m − t, the
vectors
c1,1 , . . . , c1,i1 , . . . , cs,1 , . . . , cs,is
are linearly independent over Zb .
Now assume C1 , . . . , Cs generate a classical digital (t, m, s)-net and that
we are given a k ∈ Ns0 \ {0} with µ1 (k) ≤ m − t. Let ij = µ1 (kj ) for
j = 1, . . . , s, then C1k1 + · · · + Csks is a linear combination of the vectors
c1,1 , . . . , c1,i1 , . . . , cs,1 , . . . , cs,is . As k = 0 and i1 + · · · + is ≤ m − t, which
implies that c1,1 , . . . , c1,i1 , . . . , cs,1 , . . . , cs,is are linearly independent, it fol/ D. This shows that
lows that C1k1 + · · · + Csks = 0 ∈ Zm
b . Thus k ∈
if C1 , . . . , Cs generate a classical digital (t, m, s)-net and k ∈ D \ {0}, then
µ1 (k) > m − t. This is precisely the type of result described above which
we also want to have for α > 1.
We now want to generalize this linear independence condition to α > 1,
that is, we want to have that if k ∈ Ns0 \ {0} with µα (k) ≤ αm − t, then
the generating matrices should have linearly independent rows such that
aj,1 −1 +
C1k1 + · · · + Csks = 0 ∈ Zm
b . Let k = (k1 , . . . , ks ), where kj = κj,1 b
aj,νj −1
, with aj,1 > · · · > aj,νj > 0 and 0 < κj,1 , . . . , κj,νj < b. First
· · ·+κj,νj b
note that if w < αm − t, then k = (bw , 0, . . . , 0) ∈ D, but µα (k) = w + 1 ≤
αm − t. In order to avoid this problem we may choose w = αm. Hence
we may now assume that aj,1 ≤ w = αm for j = 1, . . . , s, as otherwise
µα (k) > αm already and no independence condition on the generating
matrices is required in this case.
Now C1k1 + · · · + Csks is a linear combination of the rows
c1,a1,1 , . . . , c1,a1,ν1 , . . . , cs,as,1 , . . . , cs,as,νs .
Thus, if these rows are linearly independent, then
C1k1 + · · · + Csks = 0 ∈ Zm
b ,
and therefore k ∈
/ D.
are such that for all choices of aj,1 >
Therefore, if C1 , . . . , Cs ∈ Zαm×m
b
· · · > aj,νj > 0 for j = 1, . . . , s, with
a1,1 + · · · + a1,min(α,ν1 ) + · · · + as,1 + · · · + as,min(α,νs ) ≤ αm − t,
High-dimensional integration: The quasi-Monte Carlo way
247
the rows
c1,a1,1 , . . . , c1,a1,ν1 , . . . , cs,as,1 , . . . , cs,as,νs
are linearly independent, then k ∈ D \ {0} implies that µα (k) > αm − t.
(Note that we also include the case where some νj = 0, in which case we
just set aj,1 + · · · + aj,min(α,νj ) = 0.)
We can now formally define such digital nets for which the generating
matrices satisfy such a property. The following definition is a special case
of Dick (2008, Definition 4.3).
Definition 6.12 (digital (t, α, m, s)-nets). Let m, α ≥ 1, and 0 ≤ t ≤
αm be natural numbers. Let Zb be the finite field of prime order b and let
with Cj = (c
c
C1 , . . . , Cs ∈ Zαm×m
j,1 , . . . , j,αm ) . If for all 0 < aj,νj < · · · <
b
aj,1 , where 0 ≤ νj for all j = 1, . . . , s, with
s min(ν
j ,α)
j=1
aj, ≤ αm − t,
=1
the vectors
c1,a1,ν1 , . . . , c1,a1,1 , . . . , cs,as,νs , . . . , cs,as,1
are linearly independent over Zb , then the digital net with generating matrices C1 , . . . , Cs is called a digital (t, α, m, s)-net over Zb .
The need for a more general definition in Dick (2008) arises as we assume
therein that the smoothness α of the integrand is not known, so one cannot
choose w = αm in this case.
We have the following result.
Theorem 6.13 (worst-case error bound for digital (t, α, m, s)-nets).
Let α ≥ 2 be an integer, and let b ≥ 2 be a prime number. Then the worstcase error of multivariate integration in the weighted unanchored Sobolev
space H(Ks,γ,α ) of smoothness α > 1, using a QMC rule with a digital
(t, α, m, s)-net P over Zb as cubature points, is bounded by
1
2
γu (D|u|,α,b
)2 (αm − t + α + 2)2|u|α ,
e(P, H(Ks,γ,α )) ≤ b−(αm−t)
∅=u⊆{1:s}
= (Dα,b )
where D|u|,α,b
|u|
2
b|u|α (b−1 + (1 − b1/α−1 )−|u|α ).
We conclude from this theorem that a digital (t, α, m, s)-net used as cubature points in a QMC rule will yield a convergence of the integration error
of order n−α (log n)αs for integrands with f s,γ,α < ∞. The remaining
question now is: Do digital (t, α, m, s)-nets exist for all given α, s ≥ 1 and
some fixed t (which may depend on α and s but not on m) for all m ∈ N?
An affirmative answer to this question will be given in the next subsection.
248
J. Dick, F. Y. Kuo and I. H. Sloan
6.8. Construction of higher-order digital nets
We now present explicit constructions of digital (t, α, m, s)-nets. Central to
this method is the digit interlacing function.
Definition 6.14 (digit interlacing function).
function with interlacing factor α ∈ N is given by
The digit interlacing
Dα : [0, 1)α → [0, 1)
∞ α
ξr,a b−r−(a−1)α ,
(x1 , . . . , xα ) →
a=1 r=1
b−1 +ξ
where xr = ξr,1
r,2
for vectors by setting
b−2 +· · ·
for 1 ≤ r ≤ α. We also define this function
Dα : [0, 1)αs → [0, 1)s
(x1 , . . . , xαs ) → (Dα (x1 , . . . , xα ), . . . , Dα (x(s−1)α+1 , . . . , xsα )).
Example 6.15 (construction of higher-order digital nets using point
sets). Let t0 , t1 , . . . , tbm −1 ∈ [0, 1)sα be a digital (t , m, sα)-net in base b.
Then the point set
Dα (t0 ), Dα (t1 ), . . . , Dα (tbm −1 )
is a digital (t, α, m, s)-net in base b, where t is given by Theorem 6.17.
Higher-order digital sequences can be constructed in an analogous way.
To analyse the properties of higher-order digital nets, it is more convenient
to describe the above construction using the generating matrices rather than
the point set.
Example 6.16 (construction of higher-order digital nets using generating matrices). Let C1 , . . . , Csα be the generating matrices of a digital
c
cj, are
(t , m, sα)-net. Let Cj = (c
j,1 , . . . , j,m ) for j = 1, . . . , sα; that is, (α)
the row vectors of Cj . Now let the matrix Cj be made of the first rows of
the matrices C(j−1)α+1 , . . . , Cjα , then the second rows of C(j−1)α+1 , . . . , Cjα ,
(α)
and so on. The matrix Cj
is then an αm × m matrix, that is,
(α)
Cj
= (cj,1 , . . . , cj,αm ) ,
(α)
(α)
where
(α)
cj, = cu,v ,
with = (v − j)α + u, 1 ≤ v ≤ m, and (j − 1)α < u ≤ jα for = 1, . . . , αm
(α)
(α)
and j = 1, . . . , s. Then the matrices C1 , . . . , Cs are the generating matrices of a digital (t, α, m, s)-net over Zb , where t is given by Theorem 6.17.
High-dimensional integration: The quasi-Monte Carlo way
249
As we will see later, the choice of the underlying (t , m, sα)-net has a
direct impact on the bound on the t-value of the digital (t, α, m, s)-net.
To give the idea why this construction works we may consider the case
s = 1. Let α > 1. To simplify the notation we drop the j (which denotes
the coordinate) from the notation for a moment. Let C (α) be constructed
from a classical digital (t , m, α)-net with generating matrices C1 , . . . , Cα
as described above. Let αm ≥ a1 > a2 > · · · > aν ≥ 1. Then we need
(α)
(α)
to consider the row vectors ca1 , . . . , caν . Now by the construction above,
(α)
the vector ca1 may stem from any of the generating matrices C1 , . . . , Cα .
(α)
Without loss of generality assume that ca1 stems from C1 , i.e., it is the
(α)
i1 th row of C1 , where i1 = a1 /α. Next consider ca2 . This row vector may
(α)
again stem from any of the matrices C1 , . . . , Cα . If ca2 also stems from
C1 , then a2 /α < i1 . If not, we may assume without loss of generality
that it stems from C2 . Indeed, it will be the i2 th row of C2 , where i2 =
a2 /α. We continue in this fashion and define numbers i3 , i4 , . . . , i , where
1 ≤ ≤ α. Further, we set i+1 = · · · = iα = 0. Then by the (t , m, α)-net
(α)
(α)
property of C1 , . . . , Cα , it follows that ca1 , . . . , caν are linearly independent
provided that i1 + · · · + iα ≤ m − t . Hence, if we choose t such that
a1 + · · · + amin(α,ν) ≤ αm − t implies that i1 + · · · + iα ≤ m − t for all
admissible choices of a1 , . . . , aν , then the digital (t, α, m, 1)-net property of
C (α) follows.
Note that i1 = a1 /α and i ≤ a /α for = 2, . . . , α. Thus
i1 + · · · + iα ≤ a1 /α + · · · + aα /α ≤ (a1 + · · · + aα + α(α − 1))/α
a1 + · · · + aα
+ α − 1 ≤ m − t/α + α − 1.
=
α
Thus, if we choose t such that m − t/α + α − 1 ≤ m − t , then the result
follows. Simple algebra then shows that
t = αt + α(α − 1)
will suffice.
A more general and improved result is given in the following.
Theorem 6.17. Let α ≥ 1 be a natural number and let C1 , . . . , Csα be
the generating matrices of a digital (t , m, sα)-net over the finite field Zb of
(α)
(α)
prime order b. Let C1 , . . . , Cs be defined as in Example 6.16. Then the
(α)
(α)
matrices C1 , . . . , Cs are the generating matrices of a digital (t, α, m, s)net over Zb with
A
@
s(α − 1)
.
t = α min m, t +
2
250
J. Dick, F. Y. Kuo and I. H. Sloan
This shows that digital (t, α, m, s)-nets can be explicitly constructed for
all α, m, s ≥ 1 with t bounded independently of m. Indeed, the dependence
of t on α and s is also known from Dick and Baldeaux (2009): namely, there
are constants c, C > 0 independent of α and s such that cα2 s ≤ t ≤ Cα2 s.
Using Theorem 6.17 we therefore obtain explicit constructions of higherorder quasi-Monte Carlo rules which satisfy Theorem 6.13.
6.9. Geometric properties of higher-order digital nets
Figures 6.1–6.4 illustrate the properties of a digital (2, 2, 4, 2)-net in base 2.
Figure 6.2 shows a partition of the square for which each union of the shaded
rectangles contains exactly two points. Figures 6.3 and 6.4 show that also
other partitions of the unit square are possible where each union of shaded
rectangles contains the ‘fair’ number of points. Many other partitions of the
square are possible where the point set always contains the fair number of
points in each union of rectangles (see Dick and Baldeaux 2009), but there
are too many to show all of them here. Even in the simple case considered
here there are 12 partitions possible, for each of which the point set is fair:
this is quite remarkable since the point set itself has only 16 points (we
exclude all those partitions for which the fairness would follow already from
some other partition, otherwise there would be 34 of them). In the classical
case we have 4 such partitions, all of which are shown in Figure 6.1. (The
partitions from the classical case are included in the generalized case; so out
of the 12 partitions 4 are shown in Figure 6.1, one is shown in Figure 6.2,
one is shown in Figure 6.4 and one is indicated in Figure 6.3.)
The subsets of [0, 1)s which form a partition, each having a fair number
of points, are of the form
J(aν , dν )
b−1
s
B
dj,n dj,1
dj,n
dj,1
1
+ · · · + αm ,
+ · · · + αm + αm ,
=
b
b
b
b
b
j=1
dj, =0
∈{1,...,αm}\{aj,1 ,...,aj,νj }
s ν j
where b ≥ 2 is the base and where
j=1
=1 aj, ≤ αm − t. For j =
1, . . . , s we again assume 1 ≤ aj,νj < · · · < aj,1 ≤ αm for νj > 0 and
{aj,1 , . . . , aj,νj } = ∅ for νj = 0. Further, we also use the following notation:
ν = (ν1 , . . . , νs ),
|ν |1 =
s
νj ,
j=1
aν = (a1,1 , . . . , a1,ν1 , . . . , as,1 , . . . , as,νs ),
<
=|ν |
dν ∈ 0, . . . , b − 1 1 ,
dν = (d1,i1,1 , . . . , d1,i1,ν1 , . . . , ds,is,1 , . . . , ds,is,νs ),
251
High-dimensional integration: The quasi-Monte Carlo way
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Figure 6.1. Higher-order Sobol points: these 16 points are from a Sobol
point set in dimension four with interlacing factor two (which yields a
two-dimensional point set).
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Figure 6.2. Higher-order Sobol points: the shaded area in each square
contains exactly two points.
where the components aj, and dj, , = 1, . . . , νj , do not appear in the
vectors aν and dν for νj = 0.
Figures 6.1–6.4 give only a few examples of unions of intervals for which
each subset of the partition contains the right number of points. As the
subsets J(aν , dν ) for fixed ν and aν (with dν running through all possibilities) form a partition of [0, 1)s , it is clear that the right number of points
in J(aν , dν ) has to be bm Vol(J(aν , dν )). For example, the digital net in
Figure 6.2 has 16 points and the partition consists of 8 different subsets
J(aν , dν ), hence each J(aν , dν ) contains exactly 16/8 = 2 points. (In general, the volume of J(aν , dν ) is given by b−|ν |1 : see Dick and Baldeaux
2009.)
252
J. Dick, F. Y. Kuo and I. H. Sloan
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Figure 6.3. Higher-order Sobol points: the
shaded area contains exactly two points.
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Figure 6.4. Higher-order Sobol points: the
shaded area contains exactly half of the points.
6.10. Scrambled digital nets and sequences
We now discuss properties of Owen’s scrambling algorithm, which was introduced in Section 2.10. First we show that a point x scrambled using
Owen’s algorithm is uniformly distributed in [0, 1)s .
Proposition 6.18. Let x ∈ [0, 1)s and let Π be a uniformly and i.i.d. set
of permutations. Then Π(x) is uniformly distributed in [0, 1)s , that is, for
any Lebesgue-measurable set G ⊆ [0, 1)s , the probability that Π(x) ∈ G,
denoted by P[Π(x) ∈ G], satisfies P[Π(x) ∈ G] = λs (G), where λs denotes
the s-dimensional Lebesgue measure.
Proof. We follow Owen (1995, Proof of Proposition 2) in our exposition.
We use the notation from above and set y := Π(x). Consider the case s = 1
first and let
a a+1
E = , b
b
be an elementary interval where ≥ 0 and 0 ≤ a < b . A technical problem
which can arise in the proof below is when y is of the form
y1 = y1,1 b−1 + · · · + y1, b− + (b − 1)b−−1 + (b − 1)b−−1 + · · · ,
since then we have y1 = y1,1 b−1 + · · · + (y1, + 1)b− . We show that this only
happens with probability 0.
High-dimensional integration: The quasi-Monte Carlo way
253
The probability that there are u ≥ j0 ≥ 1 such that y1,j0 = y1,j0 +1 =
· · · = y1,u = b − 1 is given by ((b − 1)!)−(u−j0 ) . Hence the probability that
y1,j0 = y1,j0 +1 = · · · = b − 1, that is, all digits of y1 are b − 1 from some
index j0 onwards, is 0.
Let ab− = a1 b−1 + a2 b−2 + · · · + a b− . Then y1 ∈ E if and only if
y1 = a1 , y2 = a2 , . . . , y = a . Using (2.8), this is equivalent to
π1,x1,1 ,...,x1,k−1 (x1,k ) = ak ,
for 1 ≤ k ≤ .
(6.16)
For each 1 ≤ k ≤ , the probability that (6.16) holds is b−1 . Hence the
probability that y1 ∈ E is b− . The result therefore holds for all elementary
intervals of [0, 1).
We now extend the result to the general case. First notice that the result
also holds for all subintervals [ub− , vb− ), where ≥ 0 and 0 ≤ u < v ≤ b− .
The endpoints of these intervals are dense in [0, 1). A corollary of Chung
(1974, p. 28) extends the result P[y1 ∈ B] = λ1 (B) to all Borel-measurable
subsets B ⊆ [0, 1). The equality P[y1 ∈ B] = λ1 (B) extends to Lebesguemeasurable sets B since subsets of sets of measure zero have probability
zero of containing y1 .
Consider now s > 1. Let B1 , . . . , Bs be measurable subsets of [0, 1).
Because the components y1 , . . . , ys of y are independent, it follows that
P[yi ∈ Bi , 1 ≤ i ≤ s] =
s
λ1 (Bi ).
(6.17)
i=1
Finally, λs is the unique measure on [0, 1)s which satisfies (6.17).
Consider a (t, m, s)-net in base b consisting of points t0 , . . . , tbm −1 , where
ti = (ti,1 , . . . , ti,s ) and ti,j = ti,j,1 b−1 + ti,j,2 b−2 + · · · . We shall denote the
scrambled points by y 0 , . . . , y bm −1 , where y i = (yi,1 , . . . , yi,s ) and yi,j =
yi,j,1 b−1 + yi,j,2 b−2 + · · · . Specifically, the scrambled points are given by
yi,j,k = πj,ti,j,1 ,...,ti,j,k−1 (ti,j,k ), for 0 ≤ i < bm , 1 ≤ j ≤ s, and k ≥ 1.
Similarly, if (t0 , t1 , . . . ) is a (t, s)-sequence, then the scrambled sequence
will be denoted by (y 0 , y 1 , . . . ), where, using the same notation as above,
again yi,j,k = πj,ti,j,1 ,...,ti,j,k−1 (ti,j,k ), for all i ≥ 0, 1 ≤ j ≤ s, and k ≥ 1.
n−1
We consider now the expected value of n1 i=0
f (y i ). For any measurable
function f we have
5
0 n−1
n−1
1
1
f (y i ) =
E[f (y i )] =
f (y)dy,
E
n
n
[0,1]s
i=0
i=0
since each point y i is uniformly distributed in [0, 1)s and hence
E[f (y i )] =
f (y)dy for 0 ≤ i < n.
[0,1]s
254
J. Dick, F. Y. Kuo and I. H. Sloan
In other words, this means that a scrambled point set (note that the above
applies even if the underlying point set t0 , . . . , tn−1 is not a digital net) used
in a QMC rule yields an unbiased estimator.
The second important property of the randomization algorithm we require
is that the (t, m, s)-net structure of the points t0 , . . . , tbm −1 is retained after
applying the scrambling algorithm. For some technical reason this does not
quite hold, but it holds with probability one, which is still sufficient. The
following proposition was first shown by Owen (1995).
Proposition 6.19. If t0 , . . . , tbm −1 form a (t, m, s)-net in base b, then
y 0 , . . . , y bm −1 is a (t, m, s)-net in base b with probability one. If t0 , t1 , . . .
are obtained from a (t, s)-sequence, then the scrambled points y 0 , y 1 , . . .
form a (t, s)-sequence with probability one.
Proof. The probability that, for some 0 ≤ i < bm , 1 ≤ j ≤ s, and ∈ N, all
yi,j,k = b − 1 for all k ≥ is 0. Equivalently, with probability one, infinitely
many digits in the b-adic expansion of yi,j are different from b−1. Therefore,
the probability that yi,j has infinitely many digits in the b-adic expansion
of yi,j are different from b − 1 for all 0 ≤ i < bm and 1 ≤ j ≤ s is 0, since
the union of a finite number of zero probability events has probability zero.
Hence this holds for each component of each point of a (t, m, s)-net.
For a (t, s)-sequence the same applies, since a countable union of probability zero events itself has probability zero. Hence this also holds for each
component of each point of a (t, s)-sequence.
Therefore we may, in the following, assume that infinitely many digits in
the b-adic expansion of yi,j differ from b − 1 for all i ∈ N0 and 1 ≤ j ≤ s.
Assume we are given an elementary interval
J=
s
[aj b−dj , (aj + 1)b−dj ),
j=1
where 0 ≤ aj < bdj , dj ∈ N0 , and d1 + · · · + ds ≤ m − t. Let aj b−dj =
aj,1 b−1 + aj,2 b−2 + · · · + aj,dj b−dj .
Then y i ∈ J if and only if yi,j,k = aj,k for all 1 ≤ k ≤ dj and all
−1
1 ≤ j ≤ s. Further, yi,j,k = aj,k if and only if xi,j,k = πj,t
(aj,k ).
i,j,1 ,...,ti,j,k−1
−1
(aj,k ). Then y i ∈ J if and only if
Let aj,k = πj,t
i,j,1 ,...,ti,j,k−1
ti ∈ J =
s
[aj b−dj , (aj + 1)b−dj ),
j=1
where aj b−dj = aj,1 b−1 + · · · + aj,dj b−dj . As the points t0 , . . . , tbm −1 form
a (t, m, s)-net, it follows that there are exactly bm−t points of this net in
J and hence there are exactly bm−t points of y 0 , . . . , y bm −1 in J. Thus
y 0 , . . . , y bm −1 form a (t, m, s)-net with probability one.
255
High-dimensional integration: The quasi-Monte Carlo way
For a (t, s)-sequence t0 , t1 , . . . for all k ∈ N0 and m ≥ t the point
set consisting of tkbm , . . . , tkbm +bm −1 forms a (t, m, s)-net which is again
a (t, m, s)-net after scrambling with probability one. Since the union of
countably many zero probability events has probability zero, the result for
(t, s)-sequences follows as well.
6.11. Error analysis of scrambled digital nets
In this subsection we study upper bounds on the variance of scrambled
digital nets. To explain the result for scrambled digital nets, we introduce
the so-called nested ANOVA decomposition.
Theorem 6.20 (nested ANOVA decomposition). Consider a function
f ∈ L2 ([0, 1]) with Walsh series expansion
f (x) ∼
∞
f-(k)walk (x),
(6.18)
k=0
where
f-(k) =
Define β0 :=
1
0
1
f (x)walk (x) dx.
0
f (y) dy, and for ≥ 1,
β (x) :=
−1
b
f-(k)walk (x)
and
σ2 (f ) := Var[β ].
k=b−1
Then we have the nested ANOVA decomposition of f
Var[f ] =
∞
σ2 (f ).
=1
Notice that we do not necessarily have equality in (6.18), since the function f is only assumed to be in L2 ([0, 1]) (and hence may for example be
changed arbitrarily on a set of measure zero without changing f-(k) for any
k ≥ 0).
Proof.
−1
b
k=0
Consider the b -term approximation of f given by
f-(k)walk (x) =
−1 b
1
k=0
0
1
f (y)walk (x y) dy =
f (y)D (x y) dy,
0
where D is the Walsh–Dirichlet kernel given by
−1
b
b if z ∈ [0, b− ),
walk (z) =
D (z) =
0 otherwise.
k=0
256
J. Dick, F. Y. Kuo and I. H. Sloan
Hence we have
−1
b
f-(k)walk (x) = b
k=0
f (y) dy,
yb =xb where the integration is over all y such that yb = xb , that is, the first
digits of x and y coincide. Therefore, for ≥ 1 we have
f (y) dy − b−1
f (y) dy.
β (x) = b
yb =xb yb−1 =xb−1 (Owen 1997b defined this function using Haar wavelets.) Notice that β is
constant on intervals of the form [ub− , (u + 1)b− ), so β (x) = β (b xb− ).
Then, because of the orthogonality of the Walsh functions we obtain
σ2 (f )
and also
1
=
|β (x)| dx =
2
0
−1
b
|f-(k)|2
k=b−1
1
β (x)β (x)dx = 0,
for = .
0
Since f ∈ L2 ([0, 1]) and the Walsh function system is complete, we can
use Plancherel’s identity to obtain
1
∞
∞
|f (y) − E(f )|2 dy =
|f-(k)|2 =
σ2 (f ).
Var[f ] =
0
k=1
=1
Therefore we have obtained a decomposition of the variance of f in terms
of the variances of β .
∞
- 2
Notice that ∞
k=1 |f (k)| = Var
k=0 f (k)walk . Hence, as a by-product,
we obtain that for any f ∈ L2 ([0, 1]), the variance of f and the variance of
its Walsh series coincide, that is,
5
0∞
f-(k)walk .
Var[f ] = Var
k=0
For functions f : [0, 1]s → R the nested ANOVA decomposition can be
applied component-wise. Let f ∈ L2 ([0, 1]s ) have the following Walsh series
expansion:
f-(k)walk (x) =: S(x, f ).
f (x) ∼
k∈Ns0
High-dimensional integration: The quasi-Monte Carlo way
257
Let = (1 , . . . , s ) ∈ Ns0 and L = {k = (k1 , . . . , ks ) ∈ Ns0 : bj −1 ≤ kj <
bj for 1 ≤ j ≤ s}. Then let
β (x) =
f-(k)walk (x)
k∈L
and
σ2 (f ) := Var[β ] =
[0,1]s
|β (x)|2 dx =
|f-(k)|2 .
k∈L
Ns0
\ {0} let
For = (1 , . . . , s ) ∈
n−1 s b
1 1
1 1 −1
,
−
G = 2
−1
N
b − 1 b j ti,j =b j ti ,j b − 1 b j ti,j =b j ti ,j i,i =0 j=1
where ti = (ti,1 , . . . , ti,s ) for 0 ≤ i ≤ n − 1 and where 1A=B is 1 if A = B
and 0 otherwise.
If k ∈ L , then
n−1
1 E walk (y i y i ) = G ,
2
n i,i =0
since the coordinates are randomized independently from each other. Owen
(1997b) called the numbers Γ := nG gain coefficients, since, as we see
below, they determine how much one gains compared to the classical Monte
Carlo algorithm.
We now estimate the variance of the estimator
- n,s (f ) = 1
f (y i )
Q
n
n−1
(6.19)
i=0
for integrands f ∈ L2 ([0, 1]s ) when the points y 0 , . . . , y n−1 are obtained
by applying Owen’s scrambling to a (digital) (t, m, s)-net over Zb . (To
- n,s rather
emphasize that (6.19) is a random variable, we use the notation Q
than Qn,s .)
For the following results see Owen (1997a, 1997b, 1998).
Theorem 6.21 (variance of the integral estimator using scrambled
- n,s (f ) be given by (6.19). Let the point
nets). Let f ∈ L2 ([0, 1]s ) and Q
s
set {y 0 , . . . , y n−1 } ⊆ [0, 1) be obtained by applying Owen’s scrambling
algorithm to the point set {t0 , . . . , tn−1 } ⊆ [0, 1)s . Then the variance of the
- n,s (f ) is given by
estimator Q
- n,s (f )] = 1
Γ σ2 (f ).
Var[Q
n
s
∈N0 \{0}
258
J. Dick, F. Y. Kuo and I. H. Sloan
Theorem 6.22 (a bound on the variance in terms of gain coeffi- n,s (f ) be given by (6.19). Let the points
cients). Let f ∈ L2 ([0, 1]s ) and Q
{t0 , . . . , tbm −1 } be a digital (t, m, s)-net over Zb . Then
s −m+t b + 1
σ2 (f ).
Var[Qn,s (f )] ≤ b
b−1
s
∈N0
||1 >m−t
2
- n,s (f )] = 1 s
For MC one obtains a variance Var[Q
∈N0 \{0} σ (f ). Hence
n
the gain of scrambled digital nets lies in the fact that we only sum over σ2 (f )
for which ||1 > m−t, although one incurs a penalty factor of bt ((b+1)/(b−
1))s using scrambled digital (t, m, s)-nets. Notice that the gain coefficients
Γ are 0 for ∈ Ns0 with ||1 ≤ m − t and Γ = bt ((b + 1)/(b − 1))s for
∈ Ns0 with ||1 > m − t.
- n,s (f )] for a scrambled (0, m, s)-net is
Theorem 6.22 shows that Var[Q
always at most by a factor of ((b + 1)/(b − 1))s larger than the variance
for a plain MC algorithm (see also Owen (1997a, Theorem 1), who shows
that the gain coefficients are always bounded by e in this case). Further,
for scrambled (t, m, s)-nets we have
s m
t b+1
σ2 (f ) → 0 as m → ∞.
b Var[Qn,s (f )] = b
b−1
s
∈N0
||1 >m−t
- n,s (f )]
Hence, scrambled (t, m, s)-nets outperform MC with respect to Var[Q
2
asymptotically, as for MC we have nVar[Qn,s (f )] = ∈Ns \{0} σ (f ) for all
0
n ∈ N.
If the integrand f has smoothness 0 < α ≤ 1, then one can show that
|σ (f )| ≤ Cf b−αµ1 () . Scrambled (t, m, s)-nets can take advantage now,
since Γ = 0 for all ∈ Ns0 with µ1 () ≤ m − t and Γ is bounded otherwise.
Thus, asymptotically one obtains an improved rate of convergence in this
case. If the integrand is only in L2 ([0, 1]s ), then one still gets the MC rate
of convergence of n−1/2 .
Definition 6.23 (generalized Vitali variation). We define the generalized variation in the sense of Vitali of order 0 < α ≤ 1 by
1/2
∆(f, J) 2
(s)
Vol(J)
,
Vα (f ) = sup
Vol(J)α P
J∈P
where the supremum is extended over all partitions P of [0, 1]s into subintervals and Vol(J) denotes the volume of the subinterval J.
High-dimensional integration: The quasi-Monte Carlo way
259
For α = 1 and if the partial derivatives of f are continuous on [0, 1]s , we
also have the formula
2 1/2
∂sf
(s)
(x) dx
.
V1 (f ) =
[0,1]s ∂x1 · · · ∂xs
Indeed we have
∂sf
∂sf
(x)dx = Vol(J)
(ζ J )
|∆(f, J)| = ∂x1 · · · ∂xs
J ∂x1 · · · ∂xs
for some ζ J ∈ J, which follows by applying the mean value theorem to the
inequality
∂sf
∂sf
−1 (x) ≤ Vol(J) (x)dx
min
x∈J ∂x1 · · · ∂xs
J ∂x1 · · · ∂xs
s
∂ f
(x).
≤ max
x∈J ∂x1 · · · ∂xs
Therefore we have
2
sf
∆(f, J) 2 ∂
=
Vol(J)
Vol(J)
(ζ J ) ,
Vol(J)
∂x1 · · · ∂xs
J∈P
J∈P
2
sf
dx, and thus
which is just a Riemann sum for the integral [0,1]s ∂x1∂···∂x
s
the equality follows.
So far we have not taken into account projections to lower-dimensional
faces.
Definition 6.24 (generalized Hardy and Krause variation). For ∅ =
(|u|)
u ⊆ {1 : s}, let Vα (fu ; u) be the generalized Vitali variation with coefficient 0 < α ≤ 1 of the |u|-dimensional function
f (x)dx{1:s}\u .
fu (xu ) =
For u = ∅ we have f∅ =
Then
[0,1]s−|u|
(|∅|)
[0,1]s
Vα (f ) =
f (x)dx{1:s} and we define Vα
2
Vα(|u|) (fu ; u)
(f∅ ; ∅) = |f∅ |.
1/2
u⊆{1:s}
is called the generalized Hardy and Krause variation of f on [0, 1]s .
A function f for which Vα (f ) < ∞ is said to be of finite variation of
order α.
Theorem 6.25 (a bound on the variance in terms of variation).
Let f : [0, 1]s → R have bounded variation Vα (f ) < ∞ of order 0 < α ≤ 1.
260
J. Dick, F. Y. Kuo and I. H. Sloan
- n,s (f )] using a randomly scrambled
Then the variance of the estimator Var[Q
digital (t, m, s)-net over Zb is bounded by
(2α−1)+ s b2s m − t + s
2
−(1+2α)(m−t) (b − 1)
,
Var[Qn,s (f )] ≤ Vα (f )b
s−1
b2α (b − 1)s
where again (2α − 1)+ = max(2α − 1, 0).
6.12. Simplifications of the scrambling scheme: affine matrix scrambling
The scrambling algorithm as introduced by Owen requires one to generate
and store all the necessary permutations used for the scrambling scheme.
Since the number of permutations needed is very large, this is not practical,
and therefore simplifications of this algorithm have been found for which
the main result still holds.
One version of such a simplified scrambling is the following: Let xj =
xj,1 b−1 + xj,2 b−2 + · · · denote the base b expansion of the jth coordinate
of x = (x1 , . . . , xs ) ∈ [0, 1)s . Let y ∈ [0, 1)s denote the point which is
obtained after scrambling x. Assume that the jth coordinate has base b
expansion yj = yj,1 b−1 + yj,2 b−2 + · · · . To obtain a simplified scrambling,
choose mj,k, , σj,k ∈ Zb for 1 ≤ < k and mj,k,k ∈ Zb \ {0} for 1 ≤ j ≤ s
independently and uniformly distributed over their ranges. Then we set
yj,k =
k
mj,k, xj, + σj,k
(mod b)
for 1 ≤ j ≤ s.
=1
This can also be written in matrix form. Set Mj = (mj,k, )k, , where mj,k, =
0 for > k and σj = (σj,k )
k . Note that Mj is a lower triangular matrix.
Further, let xj = (xj,1 , xj,2 , . . .) and yj = (yj,1 , yj,2 , . . .) . Then we have
yj = Mxj + σ .
This method reduces the number of permutations significantly while still
preserving the main properties of Owen’s scrambling. This scrambling
scheme is called affine matrix scrambling. The results for Owen’s scrambling presented above still hold for the affine matrix scrambling.
6.13. Higher-order scrambling
We have seen that scrambling can improve the convergence rate of the variance to n−3/2+δ , for any δ > 0, for instance for functions whose generalized
variation is bounded. Since there are higher-order nets which yield deterministic QMC rules with convergence rates of n−α+δ for arbitrary δ > 0
for functions with smoothness α ≥ 1, the question also arises of how to
generalize Owen’s scrambling and its simplifications to achieve higher-order
convergence. We briefly describe this in the following.
High-dimensional integration: The quasi-Monte Carlo way
261
Let t0 , . . . , tbm −1 ∈ [0, 1)ds be a randomly scrambled digital (t, m, ds) net
over the finite field Zb of prime order b (here scrambling can mean Owen’s
original scrambling or any of the simplifications for which the same results
hold as for Owen’s scrambling). Then one simply uses the sample points
y i = Dα (xi ) ∈ [0, 1)s
for 0 ≤ i < bm ,
where Dα is the digit interlacing function of order α ≥ 1. The integral is
then estimated using
b −1
1 f (y n ).
Qn,s (f ) = m
b
m
n=0
This method again yields an unbiased estimator, that is,
- n,s (f )) =
f (x)dx.
E(Q
[0,1]s
If the integrand has square-integrable partial mixed derivatives up to order
- n,s (f ) satisfies
α ≥ 1 in each variable, then the variance of Q
- n,s (f )] = O(n−2 min(d,α)−1+δ )
Var[Q
for any δ > 0, where n = bm is the number of sample points.
6.14. Notes
Section 6.1 is taken partly from Dick and Pillichshammer (2010). Sections 6.4 and 6.5 are adapted from Dick and Pillichshammer (2010, Chapter 10). Section 6.6 is based on Dick and Baldeaux (2009). Sections 6.7
and 6.8 are based on Dick (2009b). Sections 6.10 and 6.11 are based on
Dick and Pillichshammer (2010, Chapter13).
The monographs by Niederreiter (1992a) and by Dick and Pillichshammer (2010) give a comprehensive introduction to (digital) nets and (digital)
sequences, discrepancy theory and QMC rules based on such point sets and
sequences. More references can be found therein.
Walsh functions were introduced by Walsh (1923); see also Chrestenson
(1955) and Fine (1949). For information on Walsh functions in the context
of QMC integration, see Dick and Pillichshammer (2010).
Numerical integration of Walsh series using QMC rules based on digital nets was studied, for instance, by Larcher and Traunfellner (1994) and
Larcher, Schmid and Wolf (1994, 1996b), whereas numerical integration in
Haar wavelet spaces was studied by Sobol (1969) and Heinrich, Hickernell
and Yue (2004). Numerical integration in anchored and unanchored Sobolev
spaces using QMC rules based on digital nets was first studied by Dick and
Pillichshammer (2005).
262
J. Dick, F. Y. Kuo and I. H. Sloan
Polynomial lattice rules have been introduced by Niederreiter (1992b).
Shift nets are a subclass of digital nets which were introduced by Schmid
(1996) and further studied by Pillichshammer (2002). The so-called Salzburg tables provide a table of digital nets found by computer search with
small t-value. These computations are described by Hansen, Mullen and
Niederreiter (1993), Larcher, Lauß, Niederreiter and Schmid (1996a) and
Schmid (2000). An improvement of the method for computing the t-value of
digital nets has been studied in Pirsic and Schmid (2001). A table with many
of the currently best known t-values for (t, m, s)-nets and (t, s)-sequences
for many values of m and s can be found at http://mint.sbg.ac.at/.
The CBC construction of polynomial lattice rules was introduced by
Dick et al. (2005) for anchored and unanchored Sobolev spaces and by
Dick, Kritzer, Leobacher and Pillichshammer (2007a) for the weighted stardiscrepancy. The case where the modulus is not necessarily irreducible was
considered by Kritzer and Pillichshammer (2007).
The fast CBC algorithm was introduced by Nuyens and Cools (2006a)
using FFT methods for the construction of classical lattice point sets. Due
to the similarities between ordinary and polynomial lattice point sets it
turned out that their methods can also be carried over to the polynomial
case: see Nuyens and Cools (2006b). Implementations of the fast algorithm
using Matlab can be found in Nuyens and Cools (2006c).
Cyclic nets are another subclass of digital nets of size bm which were
introduced by Niederreiter (2004). Hyperplane nets are a subclass of digital nets of size bms which were introduced by Pirsic, Dick, Pillichshammer
(2006). The discrepancy of cyclic and hyperplane nets was considered in
Pillichshammer and Pirsic (2009). Extensible polynomial lattice rules were
studied by Niederreiter (2003) and a construction algorithm was introduced
by Dick (2007b). Extensible hyperplane nets were studied by Pirsic and Pillichshammer (2011). Constructions of (digital) nets and (digital) sequences
based on existing constructions are called propagation rules: see Dick and
Pillichshammer (2010, Chapter 9) for a summary.
The basic construction principle of higher-order digital nets and sequences
appeared first in Dick (2007a) and was slightly modified in Dick (2008). A
bound on the decay of the Walsh coefficients has been studied in more
detail in Dick (2009a). Theorem 6.17 is a special case of Dick (2008, Theorem 4.11), with an improvement for some cases from Dick and Baldeaux
(2009) (a proof of this result can be found in Dick and Baldeaux 2009,
Dick 2007a and Dick and Kritzer 2010). The CBC construction of higherorder polynomial lattice rules was studied by Baldeaux, Dick, Greslehner
and Pillichshammer (2011) and Baldeaux et al. (2012). A CBC construction of higher-order scrambled polynomial lattice rules was studied by Goda
and Dick (2013). Propagation rules for higher-order digital nets were studied by Dick and Kritzer (2010). Tractability of higher-order polynomial
High-dimensional integration: The quasi-Monte Carlo way
263
lattice rules was studied by Dick and Pillichshammer (2007) and the existence of higher-order polynomial lattice rules with small t-value was studied by Dick, Kritzer, Pillichshammer and Schmid (2007b). Recently Matsumoto, Saito and Matoba (2013) and Matsumoto and Yoshiki (2013) constructed QMC rules which achieve higher-order convergence for function
classes of very high smoothness.
Scrambling was introduced by Owen (1995) and further studied in Owen
(1997a, 1997b). Scrambled Niederreiter and Xing sequences were studied
by Owen (1998). The mean-square discrepancy of scrambled nets was studied by Hickernell and Yue (2000), whereas integration and approximation
using scrambled nets were studied by Yue and Hickernell (2001). The gain
coefficients of digital nets were further investigated by Yue and Hickernell
(2002). Strong tractability of scrambled Niederreiter sequences was studied
by Yue and Hickernell (2005), whereas strong tractability in Banach spaces
was studied by Yue and Hickernell (2006). Of particular interest in the context of scrambled nets is also the result by Loh (2003), who shows that a
- n,s (f ) for which the cubature
central limit theorem holds for the estimate Q
points are based on a scrambled (0, m, s)-net. This allows one to obtain an
- n,s (f ).
approximate confidence interval from the variance estimates of Q
Simplifications of Owen’s scrambling algorithm were studied by Hickernell
(1996b), Matoušek (1998b) and Tezuka and Faure (2003). In particular, the
affine matrix scrambling is from Matoušek (1998b) and is implemented in
Matlab. Higher-order scrambling has been studied in Dick (2011a). A
construction of randomly scrambled polynomial lattice rules which have a
better dependence on the dimension has been studied in Baldeaux and Dick
(2011) for functions of smoothness at most 1.
7. Infinite-dimensional integration
In this section we consider briefly numerical integration for problems with
an infinite number of dimensions. We do not try to be definitive, because
the subject is developing and changing rapidly. Rather, we introduce some
themes likely to be of continuing importance. We concentrate here on ‘anchored’ function spaces, because anchoring is a natural idea for functions
that depend on an infinite number of variables. The section concludes with
a brief discussion of an application that is driving much recent interest in
infinite-dimensional problems.
7.1. The infinite-dimensional problem
The problem is to integrate real-valued functions f that have a countably
infinite number of variables,
f (x1 , x2 , . . .),
264
J. Dick, F. Y. Kuo and I. H. Sloan
with each component lying on the unit interval, xj ∈ [0, 1], j ∈ N. Equivalently we write f (x) with
∞
N
[0, 1].
x = (xj )j∈N ∈ [0, 1] :=
j=1
To avoid formal problems associated with functions of an infinite number
of variables, we choose an ‘anchor’ c ∈ [0, 1], and in our algorithms only
allow function evaluation when at most a finite number of components of
x have values different from the anchor value c. Often c = 0 is the most
natural choice for the anchor, but in some applications it is better to take the
midpoint c = 1/2 as anchor. The components xj that have the value c are
considered to be ‘inactive variables’ for the particular function evaluation,
in contrast to the ‘active variables’ xj = c.
If the set of active variables is xu , we write the value of f as f (xu ; c).
In this section we shall assume that f (xu ; c) is a continuous function of xu
on [0, 1]|u| for every finite subset u of the natural numbers. The infinitedimensional integral may now be defined by
f (x{1:s} ; c) dx{1:s} .
(7.1)
I∞ (f ) := lim
s→∞ [0,1]s
In Section 7.4 we shall introduce a reproducing kernel Hilbert space setting
that ensures that this limit exists, and ensures also that f (x) is well defined
for all x ∈ [0, 1]N . For the present, however, we concentrate on algorithms
and costs.
Our algorithms for approximating this infinite-dimensional integral all
have the form
n−1
(i)
wi f (tui ; c),
(7.2)
Q(f ) =
i=0
where wi ∈ R, ui is a finite subset of N, and tui ∈ [0, 1]|ui | . Three kinds of
algorithms have been studied in the literature, which we introduce in the
following subsection.
(i)
7.2. Three kinds of algorithms
In this subsection we describe single-level (or fixed-dimension), multi-level
and changing-dimension algorithms.
Example 7.1 (single-level algorithm). The single-level (SL) or fixeddimension algorithm approximates the infinite-dimensional integral by an
s-dimensional QMC rule,
n−1
1
(i)
f (t{1:s} ; c),
(7.3)
QSL (f ) :=
n
i=0
in which the active variables are the leading variables x1 , x2 , . . . , xs .
High-dimensional integration: The quasi-Monte Carlo way
265
Example 7.2 (multi-level algorithm). A different kind of algorithm
comes from adopting a multi-level (ML) approach (Heinrich 1998, Giles
2008). We define an infinite sequence of dimensions
0 = s0 < s1 < s2 < · · · < ∞,
and write the function as a collapsing sum,
f (x) =
∞
f (x{1:s } ; c) − f (x{1:s−1 } ; c) + f (c),
=1
and similarly write the integral (7.1) also as a collapsing sum,
I∞ (f ) =
∞
Is (f ) − Is−1 (f ) + f (c),
=1
with I0 (f ) = f (c), and
Is (f ) :=
[0,1]s
f (x{1:s } ; c) dx{1:s } .
We then approximate the collapsing sum up to level L > 0 by the cubature
formula
L
ML
Qn , (ψ (f ) − ψ−1 (f )) + f (c),
(7.4)
Q (f ) :=
=1
where ψ (f ) := f (x{1:s } ; c), and
Qn , (ψ (f ) − ψ−1 (f )) =
n −1+
,
1 (i)
(i)
f (t{1:s } ; c) − f (t{1:s−1 } ; c) ,
n
i=0
and where
(i)
t{1:s }
∈
[0, 1]s ,
and
(i)
t{1:s−1 }
contains the first s−1 components
(i)
of t{1:s } . For n = 0 we assume that Qn , (ψ (f ) − ψ−1 (f )) = 0. The idea
of an ML scheme is that the successive terms in the collapsing sum for the
integral should be smaller and smaller, and hence can be well approximated
by a smaller number of points n as increases, leaving only relatively lowdimensional integrals needing a large number of points. We assumed in (7.4)
that the cubature rules Qn , are QMC rules, but other choices are clearly
possible.
Roughly speaking, ML algorithms are of particular advantage if for a
particular f the important subsets u involve only a small number of leading
variables (that is, in the language of Caflisch et al. 1997, if the ‘truncation
dimension’ is small).
Example 7.3 (changing-dimension algorithm). A third kind of algorithm is the changing-dimension (CD) algorithm (Kuo, Sloan, Wasilkowski
and Woźniakowski 2010c). In this approach a different cubature rule is
266
J. Dick, F. Y. Kuo and I. H. Sloan
applied to each term fu of the so-called ‘anchored’ decomposition of f . The
anchored decomposition has the form
fu ,
(7.5)
f=
|u|<∞
where for a finite set u the function fu depends only on the set xu of active
variables xj , j ∈ u, and vanishes if any of those active variables has the
value c. The anchored decomposition has appeared under many different
guises: see for example Sobol (1969), Li, Schoendorf, Ho and Rabitz (2004)
and Griebel (2006). The terms in the anchored decomposition may be
defined recursively by
fv (xv ),
(7.6)
fu (xu ) = f (xu ; c) −
v⊂u
where the sum is over the strict subsets of u. An explicit formula for fu in
terms of f is given by
(−1)|u|−|v| f (xv ; c),
(7.7)
fu (xu ) =
v⊆u
where now the sum is over all subsets of u. The formula is established,
for example, in Theorem 1 of Kuo, Sloan, Wasilkowski and Woźniakowski
(2010b).
The CD algorithms then take the form
Qnu ,u (fu ),
(7.8)
QCD (f ) :=
|u|<∞
where, for a finite subset u,
Qnu ,u (fu ) =
nu −1
1 (i)
fu (tu ),
nu
i=0
with tu ∈ [0, 1]|u| , and we use the convention that Qnu ,u (fu ) vanishes if
nu = 0. Thus in the CD algorithm a different cubature rule is applied
to each component of the anchored decomposition. We assumed for simplicity that the cubature rule Qnu ,u is a QMC rule, but other choices are
clearly possible. Roughly speaking, this algorithm has a significant advantage if for a particular f the important subsets u all have small cardinality
(that is, in the language of Caflisch et al. 1997, if the ‘superposition dimension’ is small).
(i)
7.3. Cost models and the choice of algorithms
In earlier sections we implicitly assumed that the cost of a single function evaluation is always the same, since in assessing cost we only counted
the number of function evaluations. Now that the number of variables is
High-dimensional integration: The quasi-Monte Carlo way
267
unbounded, it seems reasonable to allow the cost of evaluating f (xu ; c) to
depend on the set u. We consider two different cost models.
• Model A: the cost depends on the highest index of the active variables
costA (u) = $(max u).
• Model B: the cost depends on the number of active variables
costB (u) = $(|u|).
Here the ‘dollar’ function $ : N0 → R+ is assumed to be positive and nondecreasing. For example, $(k) = [max(1, k)]σ for some σ > 0. We note
that Model A is a simplification of a model introduced in Creutzig, Dereich,
Müller-Gronbach and Ritter (2009) and Model B was introduced in Kuo
et al. (2010c).
Whereas in previous sections we assumed that the cost of an algorithm is
simply the number of function evaluations, we now assume that the cost of
the algorithm given by (7.2) is
n−1
costX (ui ), X ∈ {A, B}.
(7.9)
costX (Q) :=
i=0
The aim now is to approximate the integral with accuracy ε > 0 by an algorithm with small cost (rather than a small number of function evaluations).
The costs of the SL and ML algorithms under both cost models are,
respectively,
L
SL
ML
n $(s ) + $(s−1 ) ,
costX (Q ) = n $(s) and costX (Q ) =
=1
while the cost of the CD algorithm under each cost model is
nu
$(max v) ≤
nu $(max u) 2|u| ,
costA (QCD ) =
v⊆u
|u|<∞
costB (Q
CD
)=
|u|<∞
nu
v⊆u
|u|<∞
$(|v|) ≤
nu $(|u|) 2|u| ,
|u|<∞
where the sum over v appears due to the need to compute fu using (7.7).
One cost model may be more appropriate than the other, depending on
the integrand for the application at hand. The choice of algorithm should
therefore also depend on the cost model for the given application.
7.4. A reproducing kernel Hilbert space setting
Now we restrict the function class further. Just as in the finite-dimensional
case, we restrict our considerations to functions that lie in a certain reproducing Hilbert space (RKHS), because it is then possible to compute
worst-case errors, and to study tractability of the integration problem.
268
J. Dick, F. Y. Kuo and I. H. Sloan
Let η : [0, 1]2 → R be a reproducing kernel on the interval [0, 1] such that
there exists a c ∈ [0, 1] with η(c, c) = 0. Examples of reproducing kernels
satisfying this condition are min(x, y) (see (4.1) with c = 0) or 1 − max(x, y)
(see (4.1) with c = 1). Let the RKHS be denoted by H(η), the inner product
by ·, · and the corresponding norm by · . Assume that
M := sup |η(x, x)| < ∞.
x∈[0,1]
Then, since
|η(x, y)| = |η(·, x), η(·, y)| ≤ η(·, x)η(·, y) = η(x, x)η(y, y) (7.10)
for all x, y ∈ [0, 1], we have |η(x, c)| ≤ M η(c, c) = 0.Thus
η(x, c) = 0,
for all x ∈ [0, 1],
(7.11)
and we have
f (c) = f, η(·, c) = f, 0 = 0,
for all f ∈ H(η).
Thus all functions f ∈ H(η) vanish at c, in particular, the only constant
function in H(η) is f ≡ 0. This implies that the space of constant functions
H(1), defined by the kernel 1, is orthogonal to H(η). Moreover, the kernel
1 + η defines an RKHS which is the direct sum of constant functions and
functions in H(η). This implies that the space of constant functions H(1),
defined by the kernel 1, is orthogonal to H(η) in the space H(1 + η).
To define the infinite-dimensional RKHS, for each set u ⊂ N with |u| < ∞
let γu be a non-negative real number, and for points x, y ∈ [0, 1]N let
η(xj , yj ) and Kγ (x, y) :=
γu Ku (xu , y u ), (7.12)
Ku (xu , y u ) :=
|u|<∞
j∈u
where again the empty product is defined to be 1. In the following we
assume that
γu M |u| < ∞.
(7.13)
|u|<∞
Condition (7.13) ensures that the series expression for Kγ (x, y) converges
absolutely and uniformly, since from (7.10) we have |η(x, y)| ≤ M . The
kernel Ku defines an RKHS H(Ku ) on [0, 1]|u| , and the kernel Kγ defines an
RKHS H(Kγ ) on [0, 1]N . The inner products in H(Kγ ) and H(Ku ) will be
denoted by ·, ·γ and ·, ·u respectively, and the corresponding norms by
· γ and · u .
Note that a condition significantly weaker than (7.13) would be sufficient to define infinite-dimensional integration, but to avoid some technicalities we shall be content with (7.13). In particular, our assumptions
imply that function evaluation is continuous at every point x ∈ [0, 1]N . The
High-dimensional integration: The quasi-Monte Carlo way
269
idea for weakening the conditions is to impose conditions such that infinitedimensional integration and function evaluation at finite-dimensional projections are continuous linear functionals. See the work by Gnewuch, Mayer
and Ritter (2013) for even weaker conditions.
A function f ∈ H(Kγ ) can be decomposed into a sum of functions fu ∈
H(Ku ) by setting
(7.14)
fu (xu ) := f, γu Ku (·, xu )γ ,
which implies
fu (xu ) =
f, γu Ku (·, xu )γ = f,
γu Ku (·, xu )
|u|<∞
|u|<∞
|u|<∞
= f (x).
γ
(7.15)
f
is
orthogonal
in
H(K
),
since
H(η)
The decomposition f =
γ
|u|<∞ u
is orthogonal to the space of constant functions and for u = v either
there exists j ∈ u, j ∈
/ v or there exists j ∈ v, j ∈
/ u, implying that
Ku (·, xu ), Kv (·, y v )γ = 0 and hence by (7.14) we have fu , Kv (·, y v )γ = 0.
By orthogonality, the norm in H(Kγ ) is related to the norm of the projections by the formula
fu 2γ =
γu−1 fu 2u .
f 2γ =
|u|<∞
|u|<∞
We now study integration of functions defined on H(Kγ ). Let
1
Ku (y u , xu ) dxu =
η(yj , xj ) dxj ,
hu (y u ) :=
[0,1]u|
j∈u
and
h(y) :=
0
γu hu (y u ).
|u|<∞
From (7.13) we conclude that this series converges absolutely and uniformly.
The functions hu are orthogonal in H(Kγ ) (since hu ∈ H(Ku )), from which
it follows that
γu2 hu 2γ =
γu2 γu−1 hu 2u
(7.16)
h2γ =
|u|<∞
=
γu2 γu−1
|u|<∞
=
|u|<∞
γu
j∈u
|u|<∞
[0,1]|u|
1 1
[0,1]|u|
Ku (·, x), Ku (·, y)u dx dy
η(xj , yj ) dxj dyj <
0
0
γu M |u| < ∞,
|u|<∞
and thus h ∈ H(Kγ ). Note that this decomposition of h differs from the
270
J. Dick, F. Y. Kuo and I. H. Sloan
decomposition (7.15) (which holds for all functions in H(Kγ )) by a scaling
factor γu for each term hu .
For f ∈ H(Kγ ) it follows from (7.1) and (7.12) that
f (x) dx = lim
f (x{1:s} ; c) dx
s→∞ [0,1]s
[0,1]N
f,
= lim
γu Ku (xu , ·) dx
s→∞ [0,1]s
γ
u⊆{1:s}
γu hu
= lim f,
s→∞
γ
u⊆{1:s}
= f, hγ .
(7.17)
Since I∞ = hγ < ∞, it follows that I∞ is a bounded linear functional
on H(Kγ ).
7.5. Error analysis in RKHS
The analysis of the worst-case error is based on the orthogonal decomposition in the space H(Kγ ). The representer (see (3.11)) of the integration
error is given by
Kγ (y, x) dy − Q(Kγ (·, x)),
ξ(x) =
[0,1]N
where, in general, Q is a cubature algorithm of the form (7.2) which is
applied to the first variable of Kγ (later we will consider only the SL, ML,
CD algorithms). As explained in Section 3.3, the worst-case integration
error is given by
e(Q; H(Kγ )) = ξγ .
(In this subsection we use the notation e(Q; H(Kγ )) instead of e(P ; H(Kγ ))
as in Section 3.3, since we now allow unequal cubature weights in the algorithm.) Analogously to (7.16), we can obtain an orthogonal decomposition
of the worst-case error
γu2 ξu 2γ
e2 (Q; H(Kγ )) = ξ2γ =
=
|u|<∞
γu ξu 2u =
|u|<∞
γu e2 (Qu ; H(Ku )),
|u|<∞
where e(Qu ; H(Ku )) is the worst-case error in the space H(Ku ) using the
cubature rule
n−1
(i)
wi fu (tui ∩u ; c).
Qu (f ) =
i=0
High-dimensional integration: The quasi-Monte Carlo way
271
The relationship between f and its projection fu is given by (7.6) and (7.7).
Note that if for u = ∅ there does not exist at least one 0 ≤ i < n such that
u ⊆ ui , then Qu (f ) = 0 since fu (tv ; c) = 0 for v ⊆ u and v = u. This orthogonal decomposition of the worst-case error is an essential tool in analysing
the integration error. We consider now the orthogonal decompositions of
the worst-case error for the SL, ML and CD algorithms in more detail.
For the SL algorithm we obtain
e2 (QSL ; H(Kγ )) =
γu e2 (Qu ; H(Ku )) +
u⊆{1:s}
|u|<∞
u⊆{1:s}
= e2 (QSL ; H(Ks,γ )) +
γu e2 (0; H(Ku ))
γu m|u| ,
|u|<∞
u⊆{1:s}
where e2 (0; H(Ku )) is the squared initial error (see (3.10))
|u|
2
e (0; H(Ku )) = m ,
1 1
m :=
0
η(x, y) dx dy ≤ M,
0
and e2 (QSL ; H(Ks,γ )) is the squared worst-case error of the cubature rule
for the finite-dimensional weighted space H(Ks,γ ) with reproducing kernel
Ks,γ (x, y) =
γu Ku (xu , y u ),
x, y ∈ [0, 1]s .
u⊆{1:s}
We can then apply any estimate for e2 (QSL ; H(Ks,γ )) based on lattice rules
or digital nets from the previous sections. We can also use general cubature
rules other than QMC rules.
The ML algorithm approximates each level using a cubature rule with
n points. To bound the ML cubature error, note that from (7.15) we have
f (x{1:s } ; c) − f (x{1:s−1 } ; c) =
fu (xu ) −
u⊆{1:s }
fu (xu ).
u⊆{1:s−1 }
Thus the error for the ML algorithm satisfies
e2 (QML ; H(Kγ ))
=
L
=1 u⊆{1:s }
u{1:s−1 }
γu e2 (QML
u ; H(Ku )) +
|u|<∞
u⊆{1:sL }
γu e2 (0; H(Ku ))
272
J. Dick, F. Y. Kuo and I. H. Sloan
=
L
e2 (Qn , ; H(Ks ,γ − Ks−1 ,γ )) +
L
γu m|u|
|u|<∞
u⊆{1:sL }
=1
≤
e2 (Qn , ; H(Ks ,γ )) +
γu m|u| .
|u|<∞
u⊆{1:sL }
=1
We then apply previous results for e2 (Qn , ; H(Ks ,γ − Ks−1 ,γ )) or simply
e2 (Qn , ; H(Ks ,γ )).
For the CD algorithm we have
e2 (QCD ; H(Kγ )) =
γu e2 (Qnu ,u ; H(Ku ))
|u|<∞
=
γu e2 (Qnu ,u ; H(Ku )) +
|u|<∞
nu >0
γu m|u| .
|u|<∞
nu =0
Again, we then apply known estimates for e2 (Qnu ,u ; H(Ku )).
The idea now is to choose the number of cubature points (n for SL, n
for ML, nu for CD) and the dimensions or sets of variables (s for SL, s for
ML, active sets u for CD), such that the error is small for a given cost, or
conversely, the cost is small for a given required level of accuracy. One needs
to balance the error obtained by truncating the infinite-dimensional integral against the integration error. This leads to a constrained optimization
problem which in some cases can be solved using the technique of Lagrange
multipliers. For a given application, one would also need to determine suitable weights γu so that the integrand not only belongs to the RKHS setting
but also has small norm · γ .
7.6. An infinite-dimensional application
Do infinite-dimensional problems arise in practice? A class of infinite-dimensional problems arises from partial differential equations with random fields
as coefficients, with one problem being the flow of a liquid (oil or water)
through a porous medium (such as rock). For a given pressure gradient, the
rate of flow at a particular location depends on the local ‘permeability’ of
the medium, which can vary rapidly from point to point. Engineers studying
the overall flow properties then have two choices: either to model the true
permeability as a rapidly varying function over the region, or (as is often
done in practice) to model the permeability as a ‘random field’ whose general
characteristics match those of the physical problem. A random field over
a two- or three-dimensional region may require a countably infinite number of independent random variables for its complete description, hence
High-dimensional integration: The quasi-Monte Carlo way
273
the infinite-dimensionality of the problem. Such problems are presently
tackled by a variety of methods, beginning with Norbert Wiener’s ‘polynomial chaos’, and continuing to ‘generalized polynomial chaos’, ‘stochastic
Galerkin’, and ‘stochastic collocation’ methods. In all those methods the
probabilistic aspect of the problem is parametrized by a continuous variable
x with a large (possibly infinite) number of components. The solution is approximated by truncation to a manageable number of components of x, and
an approximate solution depending on both the truncated x and the physical variables is then sought in a finite-dimensional function space. All such
methods face great challenges when the effective dimension is large (where
the effective dimension may be loosely defined as the number of components
of x needed to obtain a reliable approximation). For hard problems of this
type MC or QMC methods may be the engineer’s only choice (Ghanem and
Spanos 1991).
In a recent paper (Kuo et al. 2012) a QMC method (randomly shifted
lattice rule) was applied to a simple (but infinite-dimensional) model of a
partial differential equation with a random coefficient, using the methodology of Section 5, after truncation to a finite dimension s. An interesting
feature of this work was the appearance for the first time of POD (product
and order-dependent) weights (see (4.4)), obtained by minimizing a certain
upper bound on the error of some functional (for example, the overall effective permeability) of the solution. An ML version of the same problem
was studied in Kuo, Schwab and Sloan (2013), using an ML scheme more
complicated than that above, in that a different spatial discretization was
employed at each level .
7.7. Notes
Early studies of numerical integration for infinite-dimensional problems (and
ML methods) include the following: Wasilkowski and Woźniakowski (1995)
considered Feynman–Kac path integration, Wasilkowski and Woźniakowski
(1996) considered path integration, Heinrich (1998) studied ML in the context of integral equations, Heinrich and Sindambiwe (1999) studied ML in
the context of parametric integration, Hickernell and Wang (2002) studied
QMC algorithms for infinite-dimensional integration. Giles (2008) considered ML MC schemes for stochastic partial differential equations.
Creutzig et al. (2009) introduced the variable subspace sampling model
(which generalizes cost model A) and obtained optimality results for ML
algorithms for this model, which apply in particular to infinite-dimensional
integration of stochastic differential equations. Infinite-dimensional integration using randomized MC algorithms in a hierarchy of finite-dimensional subspaces was studied by Hickernell, Müller-Gronbach, Niu and Ritter (2010). Niu, Hickernell, Müller-Gronbach and Ritter (2011) studied
274
J. Dick, F. Y. Kuo and I. H. Sloan
deterministic ML algorithms. Niu and Hickernell (2009) studied MC simulation of infinite-dimensional stochastic integrals arising in financial applications. The CD algorithm combined with QMC methods was introduced in Kuo et al. (2010c). The above results for ML and CD algorithms
were improved by Gnewuch (2012b). Plaskota and Wasilkowski (2011) considered the CD algorithm combined with Smolyak algorithms and studied
tractability in the worst-case and randomized setting. Higher-order convergence using higher-order polynomial lattice rules in infinite-dimensional
anchored spaces was studied by Dick and Gnewuch (2013). Baldeaux (2012)
and Baldeaux and Gnewuch (2013) studied randomly scrambled polynomial
lattice rules for infinite-dimensional integration, the latter in unanchored
Sobolev spaces. Lower bounds were considered by Gnewuch (2013). A
probabilistic ML algorithm which yields an unbiased estimator was introduced by Rhee and Glynn (2012).
The mathematics of random fields is well described in the book by Adler
(1981). For engineering applications of the MC method to problems of
porous flow: see for instance Ghanem and Spanos (1991). The ML MC
approach has recently been developed for elliptic problems with random
input data in Barth, Schwab and Zollinger (2011), Charrier, Scheichl and
Teckentrup (2011), Cliffe, Giles, Scheichl and Teckentrup (2011), Schwab
and Gittelson (2011) and Teckentrup, Scheichl, Giles and Ullmann (2012).
Application of QMC methods to these PDE problems has been considered
in Graham et al. (2011), Kuo et al. (2012, 2013) and Graham et al. (2013).
8. Concluding remarks
In this survey we have tried to capture key concepts and methods in the
rapidly developing field of high-dimensional numerical integration. We have
concentrated on equal-weight rules, not because they will always be the best
rules for a particular problem (indeed, in low dimensions we know that this
is certainly not the case), but because they can be used in practice for very
high-dimensional problems. We have not covered sparse grid methods, reviewed in a previous Acta Numerica article by Bungartz and Griebel (2004),
which also provide a well-understood approach to integration in moderately
high dimensions, and which are especially attractive for the related topic of
approximation of functions.
In this rapidly developing subject it is almost certain that future research
will give a different emphasis to the material covered in this survey, but we
think it likely that topics such as reproducing kernels, weighted function
spaces, component-by-component constructions, and discrepancy in all its
forms, will continue to play starring roles in the science of high-dimensional
numerical integration.
High-dimensional integration: The quasi-Monte Carlo way
275
Acknowledgements
We gratefully acknowledge the the support of the Australian Research Council. J. Dick is supported by an Australian Research Council QEII Fellowship.
We have received valuable comments and corrections from Johann Brauchart, Michael Gnewuch, Mario Hefter, Hernan Leovey, Sebastian Mayer,
Harald Niederreiter, Dirk Nuyens, Steffen Omland, Art Owen, Klaus Ritter,
Grzegorz Wasilkowski, Markus Weimar, and Henryk Woźniakowski.
REFERENCES
N. Achtsis and D. Nuyens (2012), A component-by-component construction for the
trigonometric degree. In Monte Carlo and Quasi-Monte Carlo Methods 2010
(L. Plaskota and H. Woźniakowski, eds), Springer, pp. 235–253.
P. Acworth, M. Broadie and P. Glasserman (1998), A comparison of some Monte
Carlo and quasi-Monte Carlo techniques for option pricing. In Monte Carlo
and Quasi-Monte Carlo Methods 1996 (P. Hellekalek, G. Larcher, H. Niederreiter and P. Zinterhof, eds), Springer, pp. 1–18.
R. J. Adler (1981), The Geometry of Random Fields, Wiley.
C. Aistleitner (2011), ‘Covering numbers, dyadic chaining and discrepancy’,
J. Complexity 27, 531–540.
C. Aistleitner and M. Hofer (2012), ‘Probabilistic error bounds for the discrepancy
of mixed sequences’, Monte Carlo Methods Appl. 18, 181–200.
I. A. Antonov and V. M. Saleev (1979), ‘An effective method for the computation
of λPτ -sequences’, Ž. Vyčisl. Mat. i Mat. Fiz. 19, 243–245, 271.
N. Aronszajn (1950), ‘Theory of reproducing kernels’, Trans. Amer. Math. Soc.
68, 337–404.
E. I. Atanassov (2004a), Efficient CPU-specific algorithm for generating the generalized Faure sequences. In Large-Scale Scientific Computing, Vol. 2907 of
Lecture Notes in Computer Science, Springer, pp. 121–127.
E. I. Atanassov (2004b), ‘On the discrepancy of Halton sequences’, Math. Balkanica
18, 15–32.
N. S. Bakhvalov (1959), ‘On approximate computation of integrals’ (in Russian),
Vestnik MGU, Ser. Math. Mech. Astron. Phys. Chem. 4, 3–18.
J. Baldeaux (2012), Scrambled polynomial lattice rules for infinite-dimensional integration. In Monte Carlo and Quasi-Monte Carlo Methods 2010 (L. Plaskota
and H. Woźniakowski, eds), Springer, pp. 255–263.
J. Baldeaux and J. Dick (2009), ‘QMC rules of arbitrary high order: Reproducing
kernel Hilbert space approach’, Constr. Approx. 30, 495–527.
J. Baldeaux and J. Dick (2011), ‘A construction of polynomial lattice rules with
small gain coefficients’, Numer. Math. 119, 271–297.
J. Baldeaux and M. Gnewuch (2013), Optimal randomized multilevel algorithms
for infinite-dimensional integration on function spaces with ANOVA-type decomposition. Submitted.
J. Baldeaux, J. Dick, J. Greslehner and F. Pillichshammer (2011), ‘Construction
algorithms for higher order polynomial lattice rules’, J. Complexity 27, 281–
299.
276
J. Dick, F. Y. Kuo and I. H. Sloan
J. Baldeaux, J. Dick, G. Leobacher, D. Nuyens and F. Pillichshammer (2012), ‘Efficient calculation of the worst-case error and (fast) component-by-component
construction of higher order polynomial lattice rules’, Numer. Algorithms 59,
403–431.
A. Barth, C. Schwab and N. Zollinger (2011), ‘Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients’, Numer. Math.
119, 123–161.
P. Bratley and B. L. Fox (1988), ‘Algorithm 659: Implementing Sobol s quasirandom sequence generator’, ACM Trans. Math. Softw. 14, 88–100.
P. Bratley, B. L. Fox, and H. Niederreiter (1992), ‘Implementation and tests of
low-discrepancy sequences’, ACM Trans. Model. Comput. Simul. 2, 195–213.
H. Bungartz and M. Griebel (2004), Sparse grids. In Acta Numerica, Vol. 13,
Cambridge University Press, pp. 147–269.
R. E. Caflisch, W. Morokoff and A. Owen (1997), ‘Valuation of mortgage backed
securities using Brownian bridges to reduce effective dimension’, J. Comput.
Finance 1, 27–46.
J. Charrier, R. Scheichl and A. L. Teckentrup (2011), Finite element error analysis
of elliptic PDEs with random coefficients and its application to multilevel
Monte Carlo methods. Preprint 2/11, Bath Institute For Complex Systems.
B. Chazelle (2000), The Discrepancy Method: Randomness and Complexity, Cambridge University Press.
H. E. Chrestenson (1955), ‘A class of generalized Walsh functions’, Pacific J. Math.
5, 17–31.
K. L. Chung (1974), A Course in Probability Theory, second edition, Vol. 21 of
Probability and Mathematical Statistics, Academic Press.
K. A. Cliffe, M. B. Giles, R. Scheichl and A. L. Teckentrup (2011), ‘Multilevel
Monte Carlo methods and applications to elliptic PDEs with random coefficients’, Comput. Vis. Sci. 14, 3–15.
R. Cools (1997), Constructing cubature formulae: The science behind the art. In
Acta Numerica, Vol. 6, Cambridge University Press, pp. 1–54.
R. Cools and J. N. Lyness (2001), ‘Three- and four-dimensional K-optimal lattice
rules of moderate trigonometric degree’, Math. Comp. 70, 1549–1567.
R. Cools and D. Nuyens (2008), A Belgian view on lattice rules. In Monte Carlo and
Quasi-Monte Carlo Methods 2006 (A. Keller, S. Heinrich and H. Niederreiter,
eds), Springer, pp. 3–21.
R. Cools, F. Y. Kuo and D. Nuyens (2006), ‘Constructing embedded lattice rules
for multivariate integration’, SIAM J. Sci. Comput. 28, 2162–2188.
R. Cools, F. Y. Kuo and D. Nuyens (2010), ‘Constructing lattice rules based on
weighted degree of exactness and worst case error’, Computing 87, 63–89.
J. Creutzig, S. Dereich, T. Müller-Gronbach and K. Ritter (2009), ‘Infinitedimensional quadrature and approximation of functions’, Found. Comp.
Math. 9, 391–429.
P. J. Davis and P. Rabinowitz (1984), Methods of Numerical Integration, second
edition, Academic.
L. Devroye (1986), Nonuniform Random Variate Generation, Springer.
J. Dick (2004), ‘On the convergence rate of the component-by-component construction of good lattice rules’, J. Complexity 20, 493–522.
High-dimensional integration: The quasi-Monte Carlo way
277
J. Dick (2007a), ‘Explicit constructions of quasi-Monte Carlo rules for the numerical
integration of high-dimensional periodic functions’, SIAM J. Numer. Anal.
45, 2141–2176.
J. Dick (2007b), ‘The construction of extensible polynomial lattice rules with small
weighted star discrepancy’, Math. Comp. 76, 2077–2085.
J. Dick (2008), ‘Walsh spaces containing smooth functions and quasi-Monte Carlo
rules of arbitrary high order’, SIAM J. Numer. Anal. 46, 1519–1553.
J. Dick (2009a), ‘The decay of the Walsh coefficients of smooth functions’, Bull.
Aust. Math. Soc. 80, 430–453.
J. Dick (2009b), On Quasi-Monte Carlo rules achieving higher order convergence.
In Monte Carlo and Quasi-Monte Carlo Methods 2008 (P. L’Ecuyer and
A. B. Owen, eds), Springer, pp. 73–96.
J. Dick (2011a), ‘Higher order scrambled digital nets achieve the optimal rate of the
root mean square error for smooth integrands’, Ann. Statist. 39, 1372–1398.
J. Dick (2011b), ‘Quasi-Monte Carlo numerical integration on Rs : Digital nets and
worst-case error’, SIAM J. Numer. Anal. 49, 1661–1691.
J. Dick (2012), ‘Random weights, robust lattice rules and the geometry of the cbcrc
algorithm’, Numer. Math. 122, 443–467.
J. Dick and J. Baldeaux (2009), Equidistribution properties of generalized nets
and sequences. In Monte Carlo and Quasi-Monte Carlo Methods 2008
(P. L’Ecuyer and A. B. Owen, eds), Springer, pp. 305–322.
J. Dick and M. Gnewuch (2013), Infinite-dimensional integration in weighted
Hilbert spaces: Anchored decompositions, optimal deterministic algorithms,
and higher order convergence. Submitted.
J. Dick and P. Kritzer (2010), ‘Duality theory and propagation rules for generalized
digital nets’, Math. Comp. 79, 993–1017.
J. Dick and F. Y. Kuo (2004a), ‘Reducing the construction cost of the componentby-component construction of good lattice rules’, Math. Comp. 73, 1967–
1988.
J. Dick and F. Y. Kuo (2004b), Constructing good lattice rules with millions of
points. In Monte Carlo and Quasi-Monte Carlo Methods 2002 (H. Niederreiter, ed.), Springer, pp. 181–197.
J. Dick and F. Pillichshammer (2005), ‘Multivariate integration in weighted Hilbert
spaces based on Walsh functions and weighted Sobolev spaces’, J. Complexity
21, 149–195.
J. Dick and F. Pillichshammer (2007), ‘Strong tractability of multivariate integration of arbitrary high order using digitally shifted polynomial lattice rules’,
J. Complexity 23, 436–453.
J. Dick and F. Pillichshammer (2010), Digital Nets and Sequences, Cambridge
University Press.
J. Dick, P. Kritzer, G. Leobacher and F. Pillichshammer (2007a), ‘Constructions
of general polynomial lattice rules based on the weighted star discrepancy’,
Finite Fields Appl. 13, 1045–1070.
J. Dick, P. Kritzer, F. Pillichshammer and W. C. Schmid (2007b), ‘On the existence
of higher order polynomial lattices based on a generalized figure of merit’,
J. Complexity 23, 581–593.
278
J. Dick, F. Y. Kuo and I. H. Sloan
J. Dick, F .Y. Kuo, F. Pillichshammer and I. H. Sloan (2005), ‘Construction algorithms for polynomial lattice rules for multivariate integration’, Math. Comp.
74, 1895–1921.
J. Dick, G. Larcher, F. Pillichshammer and H. Woźniakowski (2011), ‘Exponential
convergence and tractability of multivariate integration for Korobov spaces’,
Math. Comp. 80, 905–930.
J. Dick, D. Nuyens and F. Pillichshammer (2013), Lattice rules for nonperiodic
smooth integrands. Submitted.
J. Dick, F. Pillichshammer, and B. J. Waterhouse (2008), ‘The construction of
good extensible rank-1 lattices’, Math. Comp. 77, 2345–2374.
J. Dick, I. H. Sloan, X. Wang and H. Woźniakowski (2006), ‘Good lattice rules in
weighted Korobov spaces with general weights’, Numer. Math. 103, 63–97.
Digital Library of Mathematical Functions, National Institute of Standards and
Technology.
B. Doerr, M. Gnewuch, P. Kritzer and F. Pillichshammer (2008), ‘Component-bycomponent construction of low-discrepancy point sets of small size’, Monte
Carlo Methods Appl. 14, 129–149.
B. Doerr, M. Gnewuch and M. Wahlström (2009), Implementation of a componentby-component algorithm to generate small low-discrepancy samples. In Monte
Carlo and Quasi-Monte Carlo Methods 2008 (P. L’Ecuyer and A. B. Owen,
eds), Springer, pp. 323–338.
B. Doerr, M. Gnewuch and M. Wahlström (2010), ‘Algorithmic construction of lowdiscrepancy point sets via dependent randomized rounding’, J. Complexity
26, 490–507.
M. Drmota and R. F. Tichy (1997), Sequences, Discrepancies and Applications,
Vol. 1651 of Lecture Notes in Mathematics, Springer.
K. T. Fang and Y. Wang (1994), Number-Theoretic Methods in Statistics, Chapman
& Hall.
H. Faure (1982), ‘Discrépance de suites associées à un système de numération (en
dimension s)’, Acta Arith. 41, 337–351.
N. J. Fine (1949), ‘On the Walsh functions’, Trans. Amer. Math. Soc. 65, 372–414.
B. L. Fox (1986), ‘Algorithm 647: Implementation and relative efficiency of quasirandom sequence generators’, ACM Trans. Math. Softw. 12, 362–376.
R. G. Ghanem and P. D. Spanos (1991), Stochastic Finite Elements: A Spectral
Approach, Springer.
M. B. Giles (2008), ‘Multilevel Monte Carlo path simulation’, Oper. Res. 56, 607–
617.
M. Giles, F. Y. Kuo, I. H. Sloan and B. J. Waterhouse (2008), ‘Quasi-Monte Carlo
for finance applications’, ANZIAM J. 50 (CTAC2008), C308–C323.
M. Gnewuch (2008), ‘Bracketing numbers for axis-parallel boxes and applications
to geometric discrepancy’, J. Complexity 24, 154–172.
M. Gnewuch (2009), ‘On probabilistic results for the discrepancy of a hybrid-Monte
Carlo sequence’, J. Complexity 23, 312–317.
M. Gnewuch (2012a), ‘Weighted geometric discrepancies and numerical integration
on reproducing kernel Hilbert spaces’, J. Complexity 28, 2–17.
M. Gnewuch (2012b), ‘Infinite-dimensional integration on weighted Hilbert spaces’,
Math. Comp. 280, 2175–2205.
High-dimensional integration: The quasi-Monte Carlo way
279
M. Gnewuch (2013), Lower error bounds for randomized multilevel and changing
dimension algorithms. In Monte Carlo and Quasi-Monte Carlo Methods 2012
(J. Dick, F. Y. Kuo, G. W. Peters and I. H. Sloan, eds), Springer, to appear.
M. Gnewuch and A. V. Roşca (2009), ‘On G-discrepancy and mixed Monte Carlo
and quasi-Monte Carlo sequences’, Acta Univ. Apulensis Math. Inform. 18,
97–110.
M. Gnewuch, S. Mayer and K. Ritter (2013), On weighted Hilbert spaces and
integration of functions of infinitely many variables. Submitted.
M. Gnewuch, A. Srivastav and C. Winzen (2009), ‘Finding optimal volume subintervals with k-points and calculating the star discrepancy are NP-hard problems’, J. Complexity 25, 115–127.
M. Gnewuch, M. Wahlström and C. Winzen (2012), ‘A new randomized algorithm
to approximate the star discrepancy based on threshold accepting’, SIAM J.
Numer. Anal. 50, 781–807.
T. Goda and J. Dick (2013), Construction of interlaced scrambled polynomial lattice rules of arbitrary high order. Submitted.
I. G. Graham, F. Y. Kuo, J. Nichols, R. Scheichl, C. Schwab and I. H. Sloan (2013),
Quasi-Monte Carlo finite element methods for elliptic PDEs with log-normal
random coefficients. In preparation.
I. G. Graham, F. Y. Kuo, D. Nuyens, R. Scheichl and I. H. Sloan (2011), ‘QuasiMonte Carlo methods for elliptic PDEs with random coefficients and applications’, J. Comput. Phys. 230, 3668–3694.
M. Griebel (2006), Sparse grids and related approximation schemes for higher
dimensional problems. In Foundations of Computational Mathematics, Santander, 2005, Cambridge University Press, pp. 106–161.
M. Griebel, F. Y. Kuo and I. H. Sloan (2013), ‘The smoothing effect of integration
in Rd and the ANOVA decomposition’, Math. Comp. 82, 383–400.
J. H. Halton (1960), ‘On the efficiency of certain quasi-random sequences of points
in evaluating multi-dimensional integrals’, Numer. Math. 2, 84–90.
J. M. Hammersley and D. C. Handscomb (1964), Monte Carlo Methods, Methuen.
T. Hansen, G. L. Mullen and H. Niederreiter (1993), ‘Good parameters for a class
of node sets in quasi-Monte Carlo integration’, Math. Comp. 61, 225–234.
S. Heinrich (1998), ‘Monte Carlo complexity of global solution of integral equations’, J. Complexity 14, 151–175.
S. Heinrich and E. Sindambiwe (1999), ‘Monte Carlo complexity of parametric
integration’, J. Complexity 15, 317–341.
S. Heinrich, F. Hickernell and R. X. Yue (2004), ‘Optimal quadrature for Haar
wavelet spaces’, Math. Comp. 73, 259–277.
S. Heinrich, E. Novak, G. W. Wasilkowski, H. Woźniakowski (2001), ‘The inverse
of the star-discrepancy depends linearly on the dimension’, Acta Arith. 96,
279–302.
F. J. Hickernell (1996a), ‘Quadrature error bounds with applications to lattice
rules’, SIAM J. Numer. Anal. 33, 1995–2016. Erratum: ‘Quadrature error
bounds with applications to lattice rules’, SIAM J. Numer. Anal. 34, 853–
866 (1997).
F. J. Hickernell (1996b), ‘The mean square discrepancy of randomized nets’, ACM
Trans. Modeling Comput. Simul. 6, 274–296.
280
J. Dick, F. Y. Kuo and I. H. Sloan
F. J. Hickernell (1998a), ‘A generalized discrepancy and quadrature error bound’,
Math. Comp. 67, 299–322.
F. J. Hickernell (1998b), Lattice rules: How well do they measure up? In Random
and Quasi-Random Point Sets (P. Hellekalek and G. Larcher, eds), Springer,
pp. 109–166.
F. J. Hickernell (1999), ‘Goodness-of-fit statistics, discrepancies and robust designs’, Statist. Probab. Lett. 44, 73–78.
F. J. Hickernell (2002), Obtaining O(N −2+ ) convergence for lattice quadrature
rules. In Monte Carlo and Quasi-Monte Carlo Methods 2000 (K. T. Fang, F.
J. Hickernell and H. Niederreiter, eds), Springer, pp. 274–289.
F. J. Hickernell and H. S. Hong (2002), Quasi-Monte Carlo methods and their
randomisations. In Applied Probability: AMS/IP Studies in Advanced Mathematics, Vol. 26 (R. Chan, Y.-K. Kwok, D. Yao and Q. Zhang, eds), AMS,
pp. 59–77.
F. J. Hickernell and H. Niederreiter (2003), ‘The existence of good extensible rank-1
lattices’, J. Complexity 19, 286–300.
F. J. Hickernell and X. Wang (2002), ‘The error bounds and tractability of quasiMonte Carlo algorithms in infinite dimension’, Math. Comp. 71, 1641–1661.
F. J. Hickernell and H. Woźniakowski (2000), ‘Integration and approximation in
arbitrary dimensions’, Adv. Comput. Math. 12, 25–58.
F. J. Hickernell and H. Woźniakowski (2001), ‘Tractability of multivariate integration for periodic functions’, J. Complexity 17, 660–682.
F. J. Hickernell and R. X. Yue (2000), ‘The mean square discrepancy of scrambled
(t, s)-sequences’, SIAM J. Numer. Anal. 38, 1089–1112.
F. J. Hickernell, H. S. Hong, P. L’Ecuyer, and C. Lemieux (2000), ‘Extensible
lattice sequences for quasi-Monte Carlo quadrature’, SIAM J. Sci. Comput.
22, 1117–1138.
F. J. Hickernell, P. Kritzer, F. Y. Kuo and D. Nuyens (2012), ‘Weighted compound
integration rules with higher order convergence for all N ’, Numer. Algorithms
59, 161–183.
F. J. Hickernell, C. Lemieux and A. B. Owen (2005), ‘Control variates for quasiMonte Carlo’, with comments by Pierre L’Ecuyer and Xiao-Li Meng and a
rejoinder by the authors. Statist. Sci. 20, 1–31.
F. Hickernell, T. Müller-Gronbach, B. Niu and K. Ritter (2010), ‘Monte Carlo
algorithms for infinite-dimensional integration on RN ’, J. Complexity 26, 229–
254.
A. Hinrichs (2004), ‘Covering numbers, Vapnik–Červonenkis classes and bounds
for the star-discrepancy’, J. Complexity 20, 477–483.
A. Hinrichs, F. Pillichshammer and W. C. Schmid (2008), ‘Tractability properties
of the weighted star discrepancy’, J. Complexity 24, 134–143.
E. Hlawka (1961), ‘Funktionen von beschränkter Variation in der Theorie der
Gleichverteilung’, Ann. Mat. Pura Appl. 54, 325–333.
E. Hlawka (1962), ‘Zur angenäherten Berechnung mehrfacher Integrale’, Monatsh.
Math. 66, 140–151.
R. Hofer and P. Kritzer (2011), ‘On hybrid sequences built from Niederreiter–
Halton sequences and Kronecker sequences’, Bull. Aust. Math. Soc. 84, 238–
254.
High-dimensional integration: The quasi-Monte Carlo way
281
R. Hofer and G. Larcher (2010), ‘On existence and discrepancy of certain digital
Niederreiter–Halton sequences’, Acta Arith. 141, 369–394.
R. Hofer and G. Larcher (2012), ‘Metrical results on the discrepancy of Halton–
Kronecker sequences’, Math. Z. 271, 1–11.
R. Hofer and H. Niederreiter (2013), ‘A construction of (t, s)-sequences with finiterow generating matrices using global function fields’, Finite Fields Appl. 21,
97–110.
R. Hofer and G. Pirsic (2011), ‘An explicit construction of finite-row digital (0, s)sequences’, Unif. Distrib. Theory 6, 13–30.
R. Hofer, P. Kritzer, G. Larcher and F. Pillichshammer (2009), ‘Distribution
properties of generalized van der Corput–Halton sequences and their subsequences’, Int. J. Number Theory 5, 719–746.
W. Hörmann, J. Leydold and G. Derflinger (2010), Automatic Nonuniform Random
Variate Generation, Springer.
S. Joe (2004), Component by component construction of rank-1 lattice rules having
O(n−1 (ln(n))d ) star discrepancy. In Monte Carlo and Quasi-Monte Carlo
Methods 2002 (H. Niederreiter, ed.), Springer, pp. 293–298.
S. Joe (2006), Construction of good rank-1 lattice rules based on the weighted
star discrepancy. In Monte Carlo and Quasi-Monte Carlo Methods 2004
(H. Niederreiter and D. Talay, eds), Springer, pp. 181–196.
S. Joe and F. Y. Kuo (2008), ‘Constructing Sobol sequences with better twodimensional projection’, SIAM J. Sci. Comput. 30, 2635–2654.
L. Kämmerer, S. Kunis and D. Potts (2012), ‘Interpolation lattices for hyperbolic
cross trigonometric polynomials’, J. Complexity 28, 76–92.
A. Keller (2006), Myths of computer graphics. In Monte Carlo and Quasi-Monte
Carlo Methods 2004 (H. Niederreiter and D. Talay, eds), Springer, pp. 217–
243.
A. Keller (2013), Quasi-Monte Carlo image synthesis in a nutshell. In Monte Carlo
and Quasi-Monte Carlo Methods 2012 (J. Dick, F. Y. Kuo, G. W. Peters and
I. H. Sloan, eds), Springer, to appear.
J. F. Koksma (1942/43), ‘Een algemeene stelling uit de theorie der gelijkmatige
verdeeling modulo 1’, Mathematica B (Zutphen) 11, 7–11.
N. M. Korobov (1959), ‘The approximate computation of multiple integrals’ (in
Russian), Dokl. Akad. Nauk SSSR 124, 1207–1210.
N. M. Korobov (1963), Number-Theoretic Methods in Approximate Analysis, Fizmatgiz.
N. M. Korobov (1982), ‘On the calculation of optimal coefficients’ (in Russian),
Dokl. Akad. Nauk SSSR 267, 289–292.
P. Kritzer and F. Pillichshammer (2007), ‘Constructions of general polynomial
lattices for multivariate integration’, Bull. Austral. Math. Soc. 76, 93–110.
L. Kuipers and H. Niederreiter (1974), Uniform Distribution of Sequences, Wiley.
F. Y. Kuo (2003), ‘Component-by-component constructions achieve the optimal
rate of convergence for multivariate integration in weighted Korobov and
Sobolev spaces’, J. Complexity 19, 301–320.
F. Y. Kuo and S. Joe (2002), ‘Component-by-component construction of good
lattice rules with a composite number of points’, J. Complexity 18, 943–976.
282
J. Dick, F. Y. Kuo and I. H. Sloan
F. Y. Kuo and S. Joe (2003), ‘Component-by-component construction of good
intermediate-rank lattice rules’, SIAM J. Numer. Anal. 41, 1465–1486.
F. Y. Kuo and I. H. Sloan (2005), ‘Lifting the curse of dimensionality’, Not. Amer.
Math. Soc. 52, 1320–1328.
F. Y. Kuo, W. T. M. Dunsmuir, I. H. Sloan, M. P. Wand, R. S. Womersley
(2008a), ‘Quasi-Monte Carlo for highly structured generalised response models’, Method. Comput. Appl. Probab. 10, 239–275.
K. Y. Kuo, C. Schwab and I. H. Sloan (2011), ‘Quasi-Monte Carlo methods for
high-dimensional integration: The standard (weighted Hilbert space) setting
and beyond’, ANZIAM J. 53, 1–37.
F. Y. Kuo, C. Schwab and I. H. Sloan (2012), Quasi-Monte Carlo finite element
methods for a class of elliptic partial differential equations with random coefficients, SIAM J. Numer. Anal. 50, 3351–3374.
F. Y. Kuo, C. Schwab and I. H. Sloan (2013), Multi-level quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with
random coefficients. Submitted.
F. Y. Kuo, I. H. Sloan, G. W. Wasilkowski, and B. J. Waterhouse (2010a), ’Randomly shifted lattice rules with the optimal rate of convergence for unbounded
integrands’, J. Complexity 26, 135–160.
F. Y. Kuo, I. H. Sloan, G. W. Wasilkowski and H. Woźniakowski (2010b), ‘On
decompositions of multivariate functions’, Math. Comp. 79, 953–966.
F. Y. Kuo, I. H. Sloan, G. W. Wasilkowski and H. Woźniakowski (2010c), ‘Liberating the dimension’, J. Complexity 26, 422–454.
F. Y. Kuo, I. H. Sloan, and H. Woźniakowski (2006a), Lattice rules for multivariate
approximation in the worst case setting. In Monte Carlo and Quasi-Monte
Carlo Methods 2004 (H. Niederreiter and D. Talay, eds), Springer, pp. 289–
330.
F. Y. Kuo, I. H. Sloan and H. Woźniakowski (2007), ‘Periodization strategy may
fail in high dimensions’, Numer. Algorithms 46, 369–391.
F. Y. Kuo, I. H. Sloan, and H. Woźniakowski (2008b), ‘Lattice rule algorithms for
multivariate approximation in the average case setting’, J. Complexity 24,
283–323.
F. Y. Kuo, G. W. Wasilkowski, and B. J. Waterhouse (2006b), ‘Randomly shifted
lattice rules for unbounded integrals’, J. Complexity 22, 630–651.
F. Y. Kuo, G. W. Wasilkowski, and H. Woźniakowski (2009), ‘Lattice algorithms
for multivariate L∞ approximation in the worst case setting’, Constr. Approx.
30, 475–493.
G. Larcher and C. Traunfellner (1994), ‘On the numerical integration of Walsh
series by number-theoretic methods’, Math. Comp. 63, 277–291.
G. Larcher, A. Lauß, H. Niederreiter and W. C. Schmid (1996a), ‘Optimal polynomials for (t, m, s)-nets and numerical integration of multivariate Walsh series’,
SIAM J. Numer. Anal. 33, 2239–2253.
G. Larcher, W. C. Schmid and R. Wolf (1994), ‘Representation of functions as
Walsh series to different bases and an application to the numerical integration
of high-dimensional Walsh series’, Math. Comp. 63, 701–716.
High-dimensional integration: The quasi-Monte Carlo way
283
G. Larcher, W. C. Schmid and R. Wolf (1996b), ‘Quasi-Monte Carlo methods
for the numerical integration of multivariate Walsh series: Monte Carlo and
quasi-Monte Carlo methods’, Math. Comput. Modelling 23, 55–67.
D. Laurie (1996), ‘Periodizing transformations for numerical integration’, J. Comput. Appl. Math. 66, 337–344.
P. L’Ecuyer (2004), Quasi-Monte Carlo methods in finance. In Proc. 2004 Winter
Simulation Conference (R. G. Ingalls, M. D. Rossetti, J. S. Smith and B. A.
Peters, eds), IEEE Computer Society Press, pp. 1645–1655.
P. L’Ecuyer and C. Lemieux (2000), ‘Variance reduction via lattice rules’, Management Sci. 46, 1214–1235.
P. L’Ecuyer and D. Munger (2012), On figures of merit for randomly shifted lattice
rules. In Monte Carlo and Quasi-Monte Carlo Methods 2010 (L. Plaskota
and H. Woźniakowski, eds), Springer, pp. 133–159.
P. L’Ecuyer, D. Munger and B. Tuffin (2010), ’On the distribution of integration
error by randomly-shifted lattice rules’. Electron. J. Stat. 4, 950–993.
C. Lemieux (2009), Monte Carlo and Quasi-Monte Carlo Sampling, Springer.
C. Lemieux and P. L’Ecuyer (2001), ‘On selection criteria for lattice rules and other
quasi-Monte Carlo point sets’, Math. Comput. Simulation 55, 139–148.
D. Li and F. J. Hickernell (2003), Trigonometric spectral collocation methods on
lattices. In Recent Advances in Scientific Computing and Partial Differential
Equations (S. Y. Cheng, C.-W. Shu and T. Tang, eds), Vol. 330 of AMS
Series in Contemporary Mathematics, AMS, pp. 121–132.
G. Li, J. Schoendorf, T.-S, Ho and H. Rabitz (2004), ‘Multicut-HDMR with an
application to an ionospheric model’, J. Comput. Chem. 25, 1149–1156.
W. L. Loh (2003), ‘On the asymptotic distribution of scrambled net quadrature’,
Ann. Statist. 31, 1282–1324.
J. Lyness and T. Sørevik (2006), ‘Five-dimensional k-optimal lattice rules’, Math.
Comp. 75, 1467–1480.
E. Maize (1980), Contributions to the theory of error reduction in quasi-Monte
Carlo methods. PhD thesis, Claremont Graduate School.
J. Matoušek (1998a), ‘The exponent of discrepancy is at least 1.0669’, J. Complexity
14, 448–453.
J. Matoušek (1998b), ‘On the L2 discrepancy for anchored boxes’, J. Complexity
14, 527–556.
J. Matoušek (1999), Geometric Discrepancy: An Illustrated Guide, Vol. 18 of Algorithms and Combinatorics, Springer.
M. Matsumoto and T. Yoshiki (2013), Existence of higher order convergent quasiMonte Carlo rules via Walsh figure of merit. In Monte Carlo and Quasi-Monte
Carlo Methods 2012 (J. Dick, F. Y. Kuo, G. W. Peters and I. H. Sloan, eds),
Springer, to appear.
M. Matsumoto, M. Saito, K. Matoba (2013), ‘A computable figure of merit for
quasi-Monte Carlo point sets’, Math. Comp., to appear.
H. Niederreiter (1978), ‘Quasi-Monte Carlo methods and pseudo-random numbers’,
Bull. Amer. Math. Soc. 84, 957–1041.
H. Niederreiter (1987), ‘Point sets and sequences with small discrepancy’, Monatsh.
Math. 104, 273–337.
284
J. Dick, F. Y. Kuo and I. H. Sloan
H. Niederreiter (1988), ‘Low-discrepancy and low-dispersion sequences’, J. Number
Theory 30, 51–70.
H. Niederreiter (1992a), Random Number Generation and Quasi-Monte Carlo
Methods, SIAM.
H. Niederreiter (1992b), ‘Low-discrepancy point sets obtained by digital constructions over finite fields’, Czechoslovak Math. J. 42, 143–166.
H. Niederreiter (2003), ‘The existence of good extensible polynomial lattice rules’,
Monatsh. Math. 139, 295–307.
H. Niederreiter (2004), Digital nets and coding theory. In Coding, Cryptography and
Combinatorics (K. Q. Feng, H. Niederreiter and C. P. Xing, eds), Birkhäuser,
pp. 247–257.
H. Niederreiter (2009), ‘On the discrepancy of some hybrid sequences’, Acta Arith.
138, 373–398.
H. Niederreiter (2010a), ‘A discrepancy bound for hybrid sequences involving digital explicit inversive pseudorandom numbers’, Unif. Distrib. Theory 5, 53–63.
H. Niederreiter (2010b), ‘Further discrepancy bounds and an Erdős–Turán–Koksma
inequality for hybrid sequences’, Monatsh. Math. 161, 193–222.
H. Niederreiter (2012), ‘Improved discrepancy bounds for hybrid sequences involving Halton sequences’, Acta Arith. 155, 71–84.
H. Niederreiter and F. Pillichshammer (2009), ‘Construction algorithms for good
extensible lattice rules, Constr. Approx. 30, 361–393.
H. Niederreiter and A. Winterhof (2011), ‘Discrepancy bounds for hybrid sequences
involving digital explicit inversive pseudorandom numbers’, Unif. Distrib.
Theory 6, 33–56.
H. Niederreiter and C. P. Xing (1995), ‘Low-discrepancy sequences obtained from
algebraic function fields over finite fields’, Acta Arith. 72, 281–298.
H. Niederreiter and C. P. Xing (1996a), ‘Low-discrepancy sequences and global
function fields with many rational places’, Finite Fields Appl. 2, 241–273.
H. Niederreiter and C. P. Xing (1996b), Quasirandom points and global function
fields. In Finite Fields and Applications Vol. 233 of London Mathematical
Society Lecture Note Series, Cambridge University Press, pp. 269–296.
B. Niu and F. J. Hickernell (2009), Monte Carlo simulation of stochastic integrals
when the cost of function evaluation is dimension dependent. In Monte Carlo
and Quasi-Monte Carlo Methods 2008 (P. L’Ecuyer and A. B. Owen, eds),
Springer, pp. 545–560.
B. Niu, F. J. Hickernell, T. Müller-Gronbach and K. Ritter (2011), ‘Deterministic
multi-level algorithms for infinite-dimensional integration on RN ’, J. Complexity 27, 331–351.
E. Novak and H. Woźniakowski (2001), ‘Intractability results for integration and
discrepancy’, J. Complexity, 17, 388–441.
E. Novak and H. Woźniakowski (2008), Tractability of Multivariate Problems,
Vol. I: Linear Information, EMS.
E. Novak and H. Woźniakowski (2009), L2 discrepancy and multivariate integration. In Analytic Number Theory: Essays in Honour of Klaus Roth (W. W. L.
Chen, W. T. Gowers, H. Halberstam, W. M. Schmidt and R. C. Vaughan,
eds), Cambridge University Press, pp. 359–388.
High-dimensional integration: The quasi-Monte Carlo way
285
E. Novak and H. Woźniakowski (2010), Tractability of Multivariate Problems,
Vol. II: Standard Information for Functionals, EMS.
E. Novak and H. Woźniakowski (2012), Tractability of Multivariate Problems,
Vol. III: Standard Information for Operators, EMS.
D. Nuyens and R. Cools (2006a), ‘Fast algorithms for component-by-component
construction of rank-1 lattice rules in shift-invariant reproducing kernel
Hilbert spaces’, Math. Comp. 75, 903–920.
D. Nuyens and R. Cools (2006b), ‘Fast component-by-component construction of
rank-1 lattice rules with a non-prime number of points’, J. Complexity 22,
4–28.
D. Nuyens and R. Cools (2006c), Fast component-by-component construction, a
reprise for different kernels. In Monte Carlo and Quasi-Monte Carlo Methods
2004 (H. Niederreiter and D. Talay, eds), Springer, pp. 373–387.
G. Ökten (1996), ‘A probabilistic result on the discrepancy of a hybrid Monte Carlo
sequence and applications’, Monte Carlo Methods Appl. 2, 255–270.
G. Ökten, B. Tuffin and V. Burago (2006), ‘A central limit theorem and improved
error bounds for a hybrid-Monte Carlo sequence with applications in computational finance’, J. Complexity 22, 435–458.
A. B. Owen (1995), Randomly permuted (t, m, s)-nets and (t, s)-sequences. In
Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (Las
Vegas, NV, 1994), Vol. 106 of Lecture Notes in Statistics, Springer, pp. 299–
317.
A. B. Owen (1997a), ‘Scrambled net variance for integrals of smooth functions’,
Ann. Statist. 25, 1541–1562.
A. B. Owen (1997b), ‘Monte Carlo variance of scrambled net quadrature’, SIAM
J. Numer. Anal. 34, 1884–1910.
A. B. Owen (1998), ‘Scrambling Sobol and Niederreiter–Xing points’, J. Complexity 14, 466–489.
A. B. Owen (2006), Quasi-Monte Carlo for integrands with point singularities at
unknown locations. In Monte Carlo and Quasi-Monte Carlo Methods 2004
(H. Niederreiter and D. Talay, eds), Springer, pp. 403–417.
S. H. Paskov and J. Traub (1995), ‘Faster evaluation of financial derivatives’,
J. Portfolio Management 22, 113–120.
F. Pillichshammer (2002), ‘Bounds for the quality parameter of digital shift nets
over Z2 ’, Finite Fields Appl. 8, 444–454.
F. Pillichshammer and G. Pirsic (2009), Discrepancy of hyperplane nets and cyclic
nets. In Monte Carlo and Quasi-Monte Carlo Methods 2008 (P. L’Ecuyer and
A. B. Owen, eds), Springer, pp. 573–587.
G. Pirsic (2002), A software implementation of Niederreiter–Xing sequences. In
Monte Carlo and Quasi-Monte Carlo Methods 2000 (K. T. Fang, F. J. Hickernell and H. Niederreiter, eds), Springer, pp. 434–445.
G. Pirsic and F. Pillichshammer (2011), ‘Extensible hyperplane nets’, Finite Fields
Appl. 17, 407–423.
G. Pirsic and W. C. Schmid (2001), ‘Calculation of the quality parameter of digital
nets and application to their construction’, J. Complexity 17, 827–839.
286
J. Dick, F. Y. Kuo and I. H. Sloan
G. Pirsic, J. Dick and F. Pillichshammer (2006), ‘Cyclic digital nets, hyperplane
nets, and multivariate integration in Sobolev spaces’, SIAM J. Numer. Anal.
44, 385–411.
L. Plaskota and G. Wasilkowski (2011), ‘Tractability of infinite-dimensional integration in the worst case and randomized setting’, J. Complexity 27, 505–518.
L. Plaskota, G. Wasilkowski and H. Woźniakowski (2000), ‘A new algorithm and
worst case complexity for Feynman–Kac path integration’, J. Comput. Phys.
164, 335–353.
C. M. Rader (1968), ‘Discrete Fourier transforms when the number of data samples
is prime’, Proc. IEEE 5, 1107–1108.
C.-H. Rhee and P. W. Glynn (2012), A new approach to unbiased estimation for
SDE’s. In Proc. 2012 Winter Simulation Conference (C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose and A. M. Uhrmacher, eds).
W. C. Schmid (1996), Shift-nets: A new class of binary digital (t, m, s)-nets.
In Monte Carlo and Quasi-Monte Carlo Methods 1996 (H. Niederreiter,
P. Hellekalek, G. Larcher and P. Zinterhof, eds), Vol. 127 of Lecture Notes in
Statistics, Springer, pp. 369–381.
W. C. Schmid (2000), Improvements and extensions of the ‘Salzburg Tables’ by using irreducible polynomials. In Monte Carlo and Quasi-Monte Carlo Methods
1998 (H. Niederreiter and J. Spanier, eds), Springer, pp. 436–447.
C. Schwab and C. J. Gittelson (2011), Sparse tensor discretizations of highdimensional parametric and stochastic PDEs. In Acta Numerica, Vol. 20,
Cambridge University Press, pp. 291–467.
A. Sidi (1993), A new variable transformation for numerical integration. In Numerical Integration IV: Oberwolfach, 1992 (H. Brass and G. Hämmerlin, eds),
Birkhäuser, pp. 359–373.
V. Sinescu and S. Joe (2007), ‘Good lattice rules based on the general weighted
star discrepancy’, Math. Comp. 76, 989–1004.
V. Sinescu and S. Joe (2008), Good lattice rules with a composite number of points
based on the product weighted star discrepancy. In Monte Carlo and QuasiMonte Carlo Methods 2006 (A. Keller, S. Heinrich and H. Niederreiter, eds),
Springer, pp. 645–658.
V. Sinescu and P. L’Ecuyer (2011), ‘Existence and construction of shifted lattice
rules with an arbitrary number of points and bounded weighted star discrepancy for general decreasing weights’, J. Complexity 27, 449–465.
I. H. Sloan (2007), ‘Finite order integration weights can be dangerous’, Comput.
Meth. Appl. Math. 7, 239–254.
I. H. Sloan and S. Joe (1994), Lattice Methods for Multiple Integration, Oxford
University Press.
I. H. Sloan and A, V. Reztsov (2002), ‘Component-by-component construction of
good lattice rules’, Math. Comp. 71, 263–273.
I. H. Sloan and H. Woźniakowski (1998), ‘When are quasi-Monte Carlo algorithms
efficient for high-dimensional integrals?’, J. Complexity 14, 1–33.
I. H. Sloan and H. Woźniakowski (2001), ‘Tractability of multivariate integration
for weighted Korobov classes’, J. Complexity 17, 697–721.
High-dimensional integration: The quasi-Monte Carlo way
287
I. H. Sloan and H. Woźniakowski (2002), ‘Tractability of integration in non-periodic
and periodic weighted tensor product Hilbert spaces’, J. Complexity 18, 479–
499.
I. H. Sloan, F. Y. Kuo and S. Joe (2002a) ‘On the step-by-step construction of quasiMonte Carlo integration rules that achieve strong tractability error bounds
in weighted Sobolev spaces’, Math. Comp. 71, 1609–1640.
I. H. Sloan, F. Y. Kuo and S. Joe (2002b) ‘Constructing randomly shifted lattice
rules in weighted Sobolev spaces’, SIAM J. Numer. Anal. 40, 1650–1665.
I. H. Sloan, X. Wang and H. Woźniakowski (2004), ‘Finite-order weights imply
tractability of multivariate integration’, J. Complexity 20, 46–74.
I. M. Sobol (1967), ‘Distribution of points in a cube and approximate evaluation
of integrals’ (in Russian), Ž. Vyčisl. Mat. i Mat. Fiz. 7, 784–802.
I. M. Sobol (1969), Multidimensional Quadrature Formulas and Haar Functions
(in Russian), Nauka.
S. Smolyak (1963), ‘Quadrature and interpolation formulas for tensor products of
certain classes of functions’, Soviet Math. Dokl. 4, 240–243. Russian original
in Dokl. Akad. Nauk SSSR 148 (1963), 1042–1045.
A. H. Stroud (1971), Approximate Calculation of Multiple Integrals, Prentice-Hall.
A. L. Teckentrup, R. Scheichl, M. B. Giles and E. Ullmann (2012), Further analysis
of multilevel Monte Carlo methods for elliptic PDEs with random coefficient.
Technical Report, University of Bath.
S. Tezuka (2013), ‘On the discrepancy of generalized Niederreiter sequences,
J. Complexity, to appear.
S. Tezuka and H. Faure (2003), ‘I-binomial scrambling of digital nets and sequences’, J. Complexity 19, 744–757.
C. Thomas-Agnan (1996), ‘Computing a family of reproducing kernels for statistical applications’, Numer. Algorithms 13, 21–32.
J. F. Traub, G. W. Wasilkowski and H. Woźniakowski (2008), Information-Based
Complexity, Academic Press.
A. W. van der Vart and J. A. Wellner (2009), Weak Convergence and Empirical
Processes, Springer Series in Statistics, Springer.
G. Wahba (1990), Spline Models for Observational Data, Vol. 59 of CBMS-NSF
Regional Conference Series in Applied Mathematics, SIAM.
J. L. Walsh (1923), ‘A closed set of normal orthogonal functions’, Amer. J. Math.
45, 5–24.
X. Wang (2002), ‘A constructive approach to strong tractability using quasi-Monte
Carlo algorithms’, J. Complexity 18, 683–701.
X. Wang (2003), ‘Strong tractability of multivariate integration using quasi-Monte
Carlo algorithms’, Math. Comp. 72, 823–838.
X. Wang and K.-T. Fang (2003), ‘Effective dimension and quasi-Monte Carlo integration’, J. Complexity 19, 101–124.
X. Wang and I. H. Sloan (2005), ‘Why are high-dimensional finance problems often
of low effective dimension?’, SIAM J. Sci. Comput. 27, 159–183.
X. Wang and I. H. Sloan (2006), ‘Efficient weighted lattice rules with applications
to finance’, SIAM J. Sci. Comput. 28, 728–750.
X. Wang and I. H. Sloan (2007), ‘Brownian bridge and principal component analysis: Towards removing the curse of dimensionality’, IMA J. Numer. Anal.
27, 631–654.
288
J. Dick, F. Y. Kuo and I. H. Sloan
X. Wang and I. H. Sloan (2011), ‘Quasi-Monte Carlo methods in financial engineering: An equivalence principle and dimension reduction’, Oper. Res. 59,
80–95.
G. W. Wasilkowski and H. Woźniakowski (1995), ‘Explicit cost bounds of algorithms for multivariate tensor product problems’, J. Complexity 11, 1–56.
G. W. Wasilkowski and H. Woźniakowski (1996), ‘On tractability of path integration’, J. Math.Phys. 37, 2071–2088.
G. W. Wasilkowski and H. Woźniakowski (1999), ‘Weighted tensor product algorithms for linear multivariate problems’, J. Complexity 15, 402–447.
G. W. Wasilkowski and H. Woźniakowski (2004), ‘Finite-order weights imply
tractability of linear multivariate problems’, J. Approx. Theory 130, 57–77.
G. W. Wasilkowski and H. Woźniakowski (2010), ‘On the exponent of discrepancy’,
Math. Comp. 79, 983–992.
A. Werschulz and H. Woźniakowski (2009), ‘Tractability of multivariate approximation over a weighted unanchored Sobolev space’, Constr. Approx. 30, 395–
421.
H. Weyl (1916), ‘Über die Gleichverteilung von Zahlen mod. Eins’, Math. Ann. 77,
313–352.
H. Woźniakowski (2013), Monte Carlo integration. In Encyclopedia of Numerical
Analysis.
C. P. Xing and H. Niederreiter (1995), ‘A construction of low-discrepancy sequences
using global function fields’, Acta Arith. 73, 87–102.
R. X. Yue and F. J. Hickernell (2001), ‘Integration and approximation based on
scramble sampling in arbitrary dimensions’, J. Complexity 17, 881–897.
R. X. Yue and F. J. Hickernell (2002), ‘The discrepancy and gain coefficients of
scrambled digital nets’, J. Complexity 18, 135–151.
R. X. Yue and F. J. Hickernell (2005), ‘Strong tractability of integration using
scrambled Niederreiter points’, Math. Comp. 74, 1871–1893.
R. X. Yue and F. J. Hickernell (2006), ‘Strong tractability of quasi-Monte Carlo
quadrature using nets for certain Banach spaces’, SIAM J. Numer. Anal. 44,
2559–2583.
X. Y. Zeng, K. T. Leung, and F. J. Hickernell (2006), Error analysis of splines
for periodic problems using lattice designs. In Monte Carlo and Quasi-Monte
Carlo Methods 2004 (H. Niederreiter and D. Talay, eds), Springer, pp. 501–
514.
C. Zenger (1991), Sparse grids. In Parallel Algorithms for Partial Differential Equations (W. Hackbusch, ed.), Vol. 31 of Notes on Numerical Fluid Mechanics,
Vieweg.
Download