On optimal spatial subsample size for variance estimation

Daniel J. Nordman and Soumendra N. Lahiri
Dept. of Statistics, Iowa State University, Ames, IA 50011
Abstract
In this paper, we consider the problem of determining the optimal block size for a spatial subsampling
method for spatial processes observed on regular grids. We derive expansions for the mean square error
of the subsampling variance estimator, which yield an expression for the theoretical optimal block size.
The theoretical optimal block size is shown to depend in an intricate way on the geometry of the spatial
sampling region as well as on the characteristics of the underlying random field. Final expressions for
the optimal block size make use of some nontrivial estimates of lattice point counts in shifts of convex
sets. The expressions for the optimal block size are computed for sampling regions of a number of
commonly-encountered shapes.
AMS(2000) Subject Classification: Primary 62G09; Secondary 62M40, 60G60
Key Words: Block bootstrap, block size, lattice point count, mixing, nonparametric variance estimation,
random fields, spatial statistics, subsampling method
Research partially supported by NSF grants DMS 98-16630 and DMS 0072571 and by the Deutsche
Forschungsgemeinschaft (SFB 475).
1 Introduction
We examine the problem of choosing subsample sizes to maximize the performance of subsampling
methods for variance estimation. The data at hand are viewed as realizations of a stationary, weakly
dependent, spatial lattice process. We consider the common scenario of sampling from regularly spaced sites (e.g., indexed by the integer lattice Zd) lying within some region Rn embedded in IRd. Such lattice data appear often in time series, agricultural field trials, and remote sensing and image analysis (medical and satellite image processing).
For variance estimation via subsampling, the basic idea is to construct several “scaled-down” copies
of the sampling region Rn (subsamples) that fit inside Rn , evaluate the analog of θ̂n on each of these
subregions, and then compute a properly normalized sample variance from the resulting values. The
Rn -sampling scheme is essentially recreated at the level of the subregions. Two subsampling designs are
most typical: subregions can be maximally overlapping (OL) or devised to be non-overlapping (NOL).
The accuracy (e.g. variance and bias) of subsample-based estimators depends crucially on the choice
of subsample size.
To place our work into perspective, we briefly outline previous research in variance estimation with
subsamples and theoretical size considerations. The concept of variance estimation through subsampling
originated from analysis of weakly dependent time processes. Let θ̂n be an estimator of a parameter
of interest θ based on {Z(1), . . . , Z(n)} from a stationary temporal process {Z(i)}i≥1 . To obtain
subsamples from a stationary stretch Z(1), ..., Z(n) in time, Carlstein (1986) first proposed the use of
NOL blocks of length m ≤ n:
{Z(1 + (i − 1)m), . . . , Z(im)},   for i = 1, . . . , ⌊n/m⌋,

while the sequence of subseries

{Z(i), . . . , Z(i + m − 1)},   for i = 1, . . . , n − m + 1,
provides OL subsamples of length m (cf. Künsch, 1989; Politis and Romano, 1993b). Here, ⌊x⌋ denotes
the integer part of a real number x. In each respective collection, evaluations of an analog statistic θ̂i
are made for each subseries and a normalized sample variance is calculated to estimate the parameter
nVar(θ̂n ):
Sn² = J⁻¹ ∑_{i=1}^{J} m(θ̂i − θ̃)²,   θ̃ = J⁻¹ ∑_{i=1}^{J} θ̂i,
where J = ⌊n/m⌋ (J = n − m + 1) for the NOL (OL) subsample-based estimator. Carlstein (1986)
and Fukuchi (1999) established the L2 consistency of the NOL and OL estimators, respectively, for
the variance of a general (not necessarily linear) statistic. Politis and Romano (1993b) determined
asymptotic orders of the variance O(m/n) and bias O (1/m) of the subsample variance estimators for
linear statistics. For mixing time series, they found that a subsample size m proportional to n1/3 is
optimal in the sense of minimizing the Mean Square Error (MSE) of variance estimation, concurring
also with optimal block order for the moving block bootstrap variance estimator [Hall et al. (1995),
Lahiri (1996)].
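For concreteness, a minimal sketch of both estimators in the simplest case θ̂n = Z̄n (the function name and interface below are ours, not from the literature cited above):

```python
import numpy as np

def subsample_var(z, m, overlap=True):
    """Subsample estimator of n*Var(theta_hat) for theta_hat = sample mean.
    z: observed stretch of a stationary, weakly dependent series; m: subsample
    length. overlap=True gives the OL estimator, False the NOL (Carlstein) one."""
    n = len(z)
    starts = range(n - m + 1) if overlap else range(0, (n // m) * m, m)
    theta = np.array([z[s:s + m].mean() for s in starts])  # subseries analogs of theta_hat
    return m * np.mean((theta - theta.mean()) ** 2)        # normalized sample variance S_n^2
```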
Cressie (1991, p. 492) conjectured the recipe for extending Carlstein’s variance estimator to the
general spatial setting, obtaining subsamples by tiling the sample region Rn with disjoint “congruent” subregions. Politis and Romano (1993a, 1994) have shown the consistency of subsample-based variance estimators for rectangular sampling/subsampling regions in IRd when the sampling sites are observed on Zd ∩ ∏_{i=1}^d [1, ni] and integer translates of ∏_{i=1}^d [1, mi] yield the subsamples. Garcia-Soidan and
Hall (1997) and Possolo (1991) proposed similar estimators under an identical sampling scenario. For
linear statistics, Politis and Romano (1993a) determined that a subsampling scaling choice ∏_{i=1}^d mi = C{∏_{i=1}^d ni}^{d/(d+2)}, for some unknown C, minimizes the order of a variance estimator's asymptotic
MSE. Sherman and Carlstein (1994) and Sherman (1996) proved the MSE-consistency of NOL and OL
subsample estimators, respectively, for the variance of general statistics in IR2 . Their work allowed for a
more flexible sampling scheme: the “inside” of a simple closed curve defines a set D ⊂ [−1, 1]2 , Z2 ∩ nD
(using a scaled-up copy of D) constitutes the set of sampling sites, and translates of mD within nD form
subsamples. Sherman (1996) minimized a bound on the asymptotic order of the OL estimator’s MSE
to argue that the best size choice for OL subsamples involves m = O(n1/2 ) (coinciding with the above
findings of Politis and Romano (1993a) for rectangular regions in IR2 ). Politis and Sherman (2001) have
developed consistent subsampling methods for variance estimation with marked point process data [cf.
Politis et al. (1999), Chapter 6].
Few theoretical and numerical recommendations for choosing subsamples have been offered in the
spatial setting, especially with the intent of variance estimation. As suggested in the literature, an
explicit theoretical determination of optimal subsample “scaling” (size) requires calculation of an order
and associated proportionality constant for a given sampling region Rn . Even for the few sampling
situations where the order of optimal subsample size has been established, the exact adjustments to
these orders are unknown and, quoting Politis and Romano (1993a), “important (and difficult) in
practice.” Beyond the time series case with the univariate sample mean, the influence of the geometry
and dimension of Rn , as well as the structure of θ̂n , on precise subsample selection has not been explored.
We attempt here to advance some ideas on the best size choice, both theoretically and empirically, for
subsamples.
We work under the “smooth function” model of Hall (1992), where the statistic of interest θ̂n can
be represented as a function of sample means. We formulate a framework for sampling in IRd where
the sampling region Rn (say) is obtained by “inflating” a prototype set in the unit cube in IRd and the
subsampling regions are given by suitable translates of a scaled down copy of the sampling region Rn .
We consider both a non-overlapping version and a (maximal) overlapping version of the subsampling
method. For each method, we derive expansions for the variance and the bias of the corresponding
subsample estimator of Var(θ̂n ). The asymptotic variance of the spatial subsample estimator for the
OL version turns out to be smaller than that of the NOL version by a constant factor K1 (say) which
depends solely on the geometry of the sampling region Rn . In the time series case, Meketon and
Schmeiser (1984), Künsch (1989), Hall et al. (1995) and Lahiri (1996) have shown in different degrees
of generality that the asymptotic variance under the OL subsampling scheme, compared to the NOL one, is K1 = 2/3 times smaller. Results of this paper show that for rectangular sampling regions Rn in d-dimensional space, the factor K1 is given by (2/3)^d. We list the factor K1 for sampling regions of some common shapes in Table 1.
Table 1: Examples of K1 for several shapes of the sampling region Rn ⊂ IRd.

  Shape of Rn              K1
  Rectangle in IRd         (2/3)^d
  Sphere in IR3            17π/315
  Circle in IR2            π/4 − 4/(3π)
  Right triangle in IR2    1/5
In contrast, the bias parts of both the OL and NOL subsample variance estimators are (usually)
asymptotically equivalent and depend on the covariance structure of the random field as well as on the
geometry of the sampling region Rn . Since the bias term is typically of the same order as the number
of lattice points lying near a subsample’s boundary, determination of the leading bias term involves
some nontrivial estimates of the lattice point counts over translated subregions. Counting lattice points
in scaled-up sets is a hard problem and has received a lot of attention in Analytic Number Theory
and in Combinatorics. Even for the case of the plane (i.e., d = 2), the counting results available in
the literature are directly applicable to our problem only for a very restricted class of subregions (those having the so-called “smoothly winding border”) [cf. van der Corput (1920), Huxley (1993, 1996)]. Here
explicit expressions for the bias terms are derived for a more general class of sampling regions using
some new estimates on the discrepancy between the number of lattice points and the volume of the
shifted subregions in the plane and in three dimensional Euclidean space. In particular, our results are
applicable to sampling regions that do not necessarily have “smoothly winding borders”.
Minimizing the combined expansions for the bias and the variance parts, we derive explicit expressions for the theoretical optimal block size for sampling regions of different shapes. To briefly describe
the result for a few common shapes, suppose the sampling region Rn is obtained by inflating a given set R0 ⊂ (−1/2, 1/2]^d by a scaling constant λn as Rn = λn R0 and that the subsamples are formed by considering the translates of sRn = sλn R0. Then, the theoretically optimal choice of the block size for the OL version is of the form

sλn^opt = { λn^d B0² / (d K0 τ⁴) }^{1/(d+2)} (1 + o(1))   as n → ∞
for some constants B0 and K0 (coming from the bias and the variance terms, respectively), where τ² is a population parameter that does not depend on the shape of the sampling region Rn (see Theorem 5.1 for details). The following table lists the constants B0 and K0 for some shapes of Rn. It follows from Table 2 that, unlike the time series case, in higher dimensions the optimal block size critically depends on the shape of the spatial sampling region Rn.

Table 2: Examples of B0, K0 for some sampling regions Rn. Cross and triangle shapes appear in Figure 1, Section 6; see Section 6 for further details. Autocovariances σ(·) and the Euclidean, l1, and l∞ norms ‖·‖, ‖·‖₁, ‖·‖∞ are described in Section 2.3.

  Rn                      K0                B0
  Sphere in IR3           34/105            (3/2) ∑_{k∈Z³} ‖k‖ σ(k)
  Cross in IR2            (4/9)(191/192)    (4/3) ∑_{k∈Z²} ‖k‖₁ σ(k)
  Right triangle in IR2   2/5               2 ∑_{k=(k1,k2)′∈Z², sign k1 = sign k2} ‖k‖₁ σ(k) + 2 ∑_{k∈Z², sign k1 ≠ sign k2} ‖k‖∞ σ(k)

It simplifies only slightly for the NOL subsampling
scheme as the constant K0 is unnecessary for computing optimal NOL subsamples, but the bias constant
B0 is often the same for both estimators from each version of subsampling. These expressions may be
readily used to obtain ‘plug-in’ estimates of the theoretical optimal block lengths for use in practice.
The rest of the paper is organized as follows. In Section 2, we describe the spatial subsampling
method and state the Assumptions used in the paper. In Sections 3 and 4, we respectively derive
expansions for the variance and the bias parts of the subsampling estimators. Theoretical optimal
block lengths are derived in Section 5. The results are illustrated with some common examples in
Section 6. Section 7 addresses data-based subsample size selection. Proofs of variance and bias results
are separated into Sections 8 and 9, respectively.
2 Variance estimators via subsampling
In Section 2.1, we frame the sampling design and the structure of the sampling region. Two methods of
subsampling are presented in Section 2.2 along with corresponding nonparametric variance estimators.
Assumptions and Conditions used in the paper are given in Section 2.3.
2.1 The sampling structure
To describe the sampling scheme used, we first assume all potential sampling sites are located on a
translate of the rectangular integer lattice in IRd . For a fixed (chosen) vector t ∈ [−1/2, 1/2)d, we
identify the t-translated integer lattice as Zd ≡ t + Zd . Let {Z(s) | s ∈ Zd } be a stationary weakly
dependent random field (hereafter r.f.) taking values in IRp. (We use bold font as a standard to denote vectors in the sampling space IRd and normal font for vectors in IRp, including Z(·).) We suppose
that the process Z(·) is observed at sampling sites lying within the sampling region Rn ⊂ IRd . Namely,
the collection of available observations is

{Z(s) : s ∈ Rn ∩ Zd}.
To obtain the results in the paper, we assume that the sampling region Rn becomes unbounded
as the sample size increases. This will provide a commonly used “increasing domain” framework for
studying asymptotics with spatial lattice data [cf. Cressie (1991)]. We next specify the structure of the
regions Rn and employ a formulation similar to that of Lahiri (1999a,b).
Let R0 be a Borel subset of (−1/2, 1/2]^d containing an open neighborhood of the origin such that, for any sequence of positive real numbers an → 0, the number of cubes of the scaled lattice an·Zd which intersect the closures of R0 and R0^c is O((1/an)^{d−1}) as n → ∞. Let ∆n be a sequence of d × d diagonal matrices, with positive diagonal elements λ1^(n), . . . , λd^(n), such that each λi^(n) → ∞ as n → ∞. We assume that the sampling region Rn is obtained by “inflating” the template set R0 by the directional scaling factors ∆n; that is,

Rn = ∆n R0.
Because the origin is assumed to lie in R0 , the sampling region Rn grows outward in all directions as n
increases. Furthermore, if the scaling factors are all equal (λ1^(n) = · · · = λd^(n)), the shape of Rn remains the same for different values of n.
The formulation given above allows the sampling region Rn to have a large variety of fairly irregular
shapes with the boundary condition on R0 imposed to avoid pathological cases. Some common examples
of such regions are convex subsets of IRd , such as spheres, ellipsoids, polyhedrons, as well as certain
non-convex subsets with irregular boundaries, such as star-shaped regions. Sherman and Carlstein
(1994) and Sherman (1996) consider a similar class of such regions in the plane (i.e. d = 2) where the
boundaries of the sets R0 are delineated by simple rectifiable curves with finite lengths. The border
requirements on R0 ensure that the number of observations near the boundary of Rn is negligible
compared to the totality of data values. This assumption is crucial to the derivation of results to follow.
In addition, Perera (1997) demonstrates that the border geometry of R0 can significantly influence the
asymptotic distribution of sample means taken with observations from “expanding” sets, like Rn .
2.2 Subsampling designs and variance estimators
We suppose that the relevant statistic, whose variance we wish to estimate, can be represented as a
function of sample means. Let θ̂n = H(Z̄Nn ) be an estimator of the population parameter of interest
θ = H(µ), where H : IRp → IR is a smooth function, EZ(t) = µ ∈ IRp is the mean of the stationary r.f. Z(·), and Z̄Nn is the sample mean of the Nn observations within Rn. We can write

Z̄Nn = Nn⁻¹ ∑_{s∈Zd∩Rn} Z(s).   (2.1)
This parameter and estimator formulation is what Hall (1992) calls the “smooth function” model and
it has been used in other scenarios, such as with the moving block bootstrap (MBB) and empirical
likelihood, for studying approximately linear functions of a sample mean [Lahiri (1996), DiCiccio et
al. (1991)]. By considering suitable functions of the Z(s)’s, one can represent a wide range of estimators under the present framework. In particular, these include means, products and ratios of means,
autocorrelation estimators, sample lag cross-correlation estimators [Politis and Romano (1993b)], and
Yule-Walker estimates for autoregressive processes [cf. Guyon (1995)].
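As one concrete instance of the framework (our illustration): for a real-valued field Y(·), the lag-h autocovariance θ = Cov(Y(s), Y(s + h)) fits the model by taking the bivariate process Z(s) = (Y(s), Y(s)Y(s + h))′ and the smooth function H(u1, u2) = u2 − u1², so that θ̂n = H(Z̄Nn) is the usual moment estimator.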
The quantity which we seek to estimate nonparametrically is the variance of the normalized statistic √Nn θ̂n, say, τn² = Nn E(θ̂n − Eθ̂n)². In our problem, this goal is equivalent to consistently estimating the limiting variance τ² = lim_{n→∞} τn².
2.2.1 Overlapping subsamples
Variance estimation with OL subsampling regions has been presented previously in the literature, though
in more narrow sampling situations. Sherman (1996) considered an OL subsample-based estimator for
sampling regions in IR2 ; Politis and Romano (1994) extended a similar estimator for rectangular regions
in IRd with faces parallel to the coordinate axes (i.e. R0 = (−1/2, 1/2]d); and a host of authors, in a
variety of contexts, have examined OL subsample estimators applied to time series data [cf. Song and
Schmeiser (1988), Politis and Romano (1993a), Fukuchi (1999)].
We first consider creating a smaller version of Rn , which will serve as a template for the OL subsampling regions. To this end, let s∆n be a d × d diagonal matrix with positive diagonal elements,
{sλ1^(n), . . . , sλd^(n)}, such that sλi^(n)/λi^(n) → 0 and sλi^(n) → ∞, as n → ∞, for each i = 1, . . . , d. (The matrix ∆n represents the determining scaling factors for Rn and s∆n shall be factors used to define the subsamples.) We make the “prototype” subsampling region,

sRn = s∆n R0,   (2.2)
and identify a subset of Zd , say JOL , corresponding to all integer translates of sRn lying within Rn .
That is,
JOL = {i ∈ Zd : i + sRn ⊂ Rn}.

The desired OL subsampling regions are precisely the translates of sRn given by Ri,n ≡ i + sRn,
i ∈ JOL . Note that the origin belongs to JOL and, clearly, some of these subregions overlap.
Let sNn = |Zd ∩ sRn| be the number of sampling sites in sRn and let |JOL| denote the number of available subsampling regions. (The number of sampling sites within each OL subsampling region is the same, namely for any i ∈ JOL, sNn = |Zd ∩ Ri,n|.) For each i ∈ JOL, compute θ̂i,n = H(Zi,n), where

Zi,n = sNn⁻¹ ∑_{s∈Zd∩Ri,n} Z(s)
denotes the sample mean of observations within the subregion. We then have the OL subsample variance estimator of τn² as

τ̂²n,OL = |JOL|⁻¹ ∑_{i∈JOL} sNn (θ̂i,n − θ̃n)²,   θ̃n = |JOL|⁻¹ ∑_{i∈JOL} θ̂i,n.
2.2.2 Non-overlapping subsamples
Carlstein (1986) first proposed a variance estimator involving NOL subsamples for time processes.
Politis and Romano (1993a) and Sherman and Carlstein (1994) demonstrated the consistency of variance
estimation, via NOL subsampling, for certain rectangular regions in IRd and some sampling regions in
IR2 , respectively. Here we adopt a formulation similar to those of Sherman and Carlstein (1994) and
Lahiri (1999a).
The sampling region Rn is first divided into disjoint “cubes”. Let s∆n be the previously described
d × d diagonal matrix from (2.2), which will determine the “window width” of the partitioning cubes.
Let

JNOL = {i ∈ Zd : s∆n(i + (−1/2, 1/2]^d) ⊂ Rn}
represent the set of all “inflated” subcubes that lie inside Rn . Denote its cardinality as |JN OL |. For
each i ∈ JN OL , define the subsampling region R̃i,n = s∆n (i + R0 ) by inscribing the translate of s∆n R0
such that the origin is mapped onto the midpoint of the cube s∆n (i + (−1/2, 1/2]d). This provides a
collection of NOL subsampling regions, which are smaller versions of the original sampling region Rn ,
that lie inside Rn .
For each i ∈ JNOL, the function H(·) is evaluated at the sample mean, say Z̃i,n, of the corresponding subsampling region R̃i,n to obtain θ̂i,n = H(Z̃i,n). The NOL subsample estimator of τn² is again an appropriately scaled sample variance:

τ̂²n,NOL = |JNOL|⁻¹ ∑_{i∈JNOL} sNi,n (θ̂i,n − θ̃n)²,   θ̃n = |JNOL|⁻¹ ∑_{i∈JNOL} θ̂i,n,
where sNi,n = |Zd ∩ R̃i,n| denotes the number of sampling sites within a given NOL subsample. We note that sNi,n may differ between NOL subsamples, but all such subsamples will have exactly sNi,n = sNn sites available if the diagonal elements of s∆n are integers.
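As a sketch of the two designs in the simplest concrete setting (data on a full rectangular grid, R0 = (−1/2, 1/2]², H the identity; the function below is our illustration, not a prescribed implementation):

```python
import numpy as np

def spatial_subsample_var(z, m, overlap=True):
    """OL/NOL subsample estimator of tau_n^2 = N_n * Var(theta_hat) for the
    sample mean of gridded data z (an n1 x n2 array), with square subregions
    of side m. Illustrative sketch only."""
    n1, n2 = z.shape
    step = 1 if overlap else m   # OL: all integer translates; NOL: disjoint cubes
    theta = np.array([z[i:i + m, j:j + m].mean()
                      for i in range(0, n1 - m + 1, step)
                      for j in range(0, n2 - m + 1, step)])
    return m * m * np.mean((theta - theta.mean()) ** 2)  # sN_n times sample variance
```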
2.3 Assumptions
For stating the assumptions, we need to introduce some notation. For a vector x = (x1, ..., xd)′ ∈ IRd, let ‖x‖ and ‖x‖₁ = ∑_{i=1}^d |xi| denote the usual Euclidean and l1 norms of x, respectively. Denote the l∞ norm as ‖x‖∞ = max_{1≤k≤d} |xk|. Define dis(E1, E2) = inf{‖x − y‖∞ : x ∈ E1, y ∈ E2} for two sets E1, E2 ⊂ IRd. We shall use the notation | · | also in two other cases: for a countable set B, |B| would denote the cardinality of the set B; for an uncountable set A ⊂ IRd, |A| would refer to the volume (i.e., the IRd Lebesgue measure) of A.
Let FZ(T) = σ⟨Z(s) : s ∈ T⟩ be the σ-field generated by the variables {Z(s) : s ∈ T}, T ⊂ Zd. For T1, T2 ⊂ Zd, write

α̃(T1, T2) = sup{|P(A ∩ B) − P(A)P(B)| : A ∈ FZ(T1), B ∈ FZ(T2)}.

Then, the strong mixing coefficient for the r.f. Z(·) is defined as

α(k, l) = sup{α̃(T1, T2) : Ti ⊂ Zd, |Ti| ≤ l, i = 1, 2; dis(T1, T2) ≥ k}.   (2.3)
Note that the supremum in the definition of α(k, l) is taken over sets T1 , T2 which are bounded. For
d > 1, this is important. A r.f. on the (rectangular) lattice Zd with d ≥ 2 that satisfies a strong mixing
condition of the form
lim_{k→∞} sup{α̃(T1, T2) : T1, T2 ⊂ Zd, dis(T1, T2) ≥ k} = 0   (2.4)
with supremum taken over possibly unbounded sets necessarily belongs to the more restricted class
of ρ-mixing r.f.'s [cf. Bradley (1989)]. Politis and Romano (1993a) used moment inequalities based on the mixing condition in (2.4) to determine the orders of the bias and variance of τ̂²n,OL, τ̂²n,NOL for rectangular sampling regions.
For proving the subsequent theorems, the assumptions below are needed along with two conditions
stated as functions of a positive argument r ∈ Z+ = {0, 1, 2, . . .}. In the following, det(∆) represents
the determinant of a square matrix ∆. For α = (α1, ..., αp)′ ∈ (Z+)p, let Dα denote the αth order partial differential operator ∂^{α1+···+αp}/∂x1^{α1}···∂xp^{αp} and ∇ = (∂H(µ)/∂x1, . . . , ∂H(µ)/∂xp)′ be the vector of first order partial derivatives of H at µ. Limits in order symbols are taken letting n tend to infinity.
Assumptions:
(A.1) There exists a d × d diagonal matrix ∆0, det(∆0) > 0, such that (1/sλ1^(n)) s∆n → ∆0.

(A.2) For the scaling factors of the sampling and subsampling regions:

∑_{i=1}^d 1/sλi^(n) + ∑_{i=1}^d sλi^(n)/λi^(n) + [det(s∆n)]^{(d+1)/d}/det(∆n) = o(1),   max_{1≤i≤d} λi^(n) = O(min_{1≤i≤d} λi^(n)).
(A.3) There exist nonnegative functions α1(·) and g(·) such that lim_{k→∞} α1(k) = 0, lim_{l→∞} g(l) = ∞ and the strong-mixing coefficient α(k, l) from (2.3) satisfies the inequality α(k, l) ≤ α1(k)g(l), k > 0, l > 0.

(A.4) sup{α̃(T1, T2) : T1, T2 ⊂ Zd, |T1| = 1, dis(T1, T2) ≥ k} = o(k^{−d}).

(A.5) τ² > 0, where τ² = ∑_{k∈Zd} σ(k), σ(k) = Cov(∇′Z(t), ∇′Z(t + k)).
Conditions:

Dr: H : IRp → IR is r-times continuously differentiable and, for some a ∈ Z+ and real C > 0,

max{|Dν H(x)| : ‖ν‖₁ = r} ≤ C(1 + ‖x‖^a),   x ∈ IRp.

Mr: For some 0 < δ ≤ 1, 0 < κ < (2r − 1 − 1/d)(2r + δ)/δ, and C > 0,

E‖Z(t)‖^{2r+δ} < ∞,   ∑_{x=1}^∞ x^{(2r−1)d−1} α1(x)^{δ/(2r+δ)} < ∞,   g(x) ≤ Cx^κ.
Some comments about the assumptions and the conditions are in order.
Assumption A.5 implies a positive, finite asymptotic variance τ² for the standardized estimator √Nn θ̂n. We would like τ² ∈ (0, ∞) for a purposeful variance estimation procedure.
In Assumption A.3, we formulate a conventional bound on the mixing coefficient α(k, l) from (2.3)
that is applicable to many r.f.s and resembles the mixing assumption of Lahiri (1999a,b). For r.f.s
satisfying Assumption A.3, the “distance” component of the bound, α1 (·), often decreases at an exponential rate while the function of “set size,” g(·), increases at a polynomial rate [cf. Guyon (1995)].
Examples of r.f.s that meet the requirements of A.3 and Mr include Gaussian fields with analytic spectral densities, certain linear fields with a moving average or autoregressive (AR) representation (like
m-dependent fields), separable AR(1)×AR(1) lattice processes suggested by Martin (1990) for modeling
in IR2 , many Gibbs and Markov fields, and important time series models [cf. Doukhan (1994)]. Condition Mr combined with A.3 also provides useful moment bounds for normed sums of observations (see
Lemma 8.2).
Assumption A.4 permits the CLT in Bolthausen (1982) to be applied to sums of Z(·) on sets of
increasing domain, in conjunction with the boundary condition on R0 , Assumption A.3, and Condition
Mr . This version of the CLT (Stein’s method) is derived from α-mixing conditions which ensure
asymptotic independence between a single point and observations in arbitrary sets of increasing distance
[cf. Perera (1997)].
Assumptions A.1 and A.2 set additional guidelines for how sampling and subsampling design parameters, ∆n and s∆n, may be chosen. The assumptions provide a flexible framework for handling “increasing domains” of many shapes. For d = 1, A.1-A.2 are equivalent to the requirements of Lahiri (1999c)
who provides variance and bias expansions for the MBB variance estimator with weakly dependent time
processes.
3 Variance expansions

We now give expansions for the asymptotic variance of the OL/NOL subsample variance estimators τ̂²n,OL and τ̂²n,NOL of τn² = Nn Var(θ̂n).
Theorem 3.1 Suppose that Assumptions A.1 - A.5 and Conditions D2 and M5+2a hold, then

(a) Var(τ̂²n,OL) = K0 · (det(s∆n)/det(∆n)) · 2τ⁴ (1 + o(1)),

(b) Var(τ̂²n,NOL) = (1/|R0|) · (det(s∆n)/det(∆n)) · 2τ⁴ (1 + o(1)),

where K0 = (1/|R0|) ∫_{IRd} (|(x + R0) ∩ R0|²/|R0|²) dx is an integral with respect to the IRd Lebesgue measure.
The constant K0 appearing in the variance expansion of the estimator τ̂²n,OL is a property of the shape of the sampling template R0 but not of its exact embedding in the space IRd or even the scale of the set. Namely, K0 is invariant to invertible affine transformations applied to R0 and hence can be computed from either R0 or Rn = ∆n R0. Values of K0 for some template shapes are given in Table 3 and Section 6.
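The integral defining K0 can also be checked numerically. A minimal Monte Carlo sketch for a disk template (our own code; by Table 3, a disk has K0 = 1 − 16/(3π²) ≈ 0.4596, the circular case of the ellipse entry):

```python
import numpy as np

def k0_disk(r=0.5, n=200_000, rng=np.random.default_rng(1)):
    """Monte Carlo evaluation of K0 = (1/|R0|) Int |(x+R0) cap R0|^2 / |R0|^2 dx
    for a disk R0 of radius r, using the closed-form "lens" area of two
    overlapping disks. Illustrative sketch only."""
    area = np.pi * r ** 2                          # |R0|
    x = rng.uniform(-2 * r, 2 * r, size=(n, 2))    # overlap vanishes for ||x|| >= 2r
    t = np.hypot(x[:, 0], x[:, 1])
    lens = np.where(t < 2 * r,
                    2 * r**2 * np.arccos(np.minimum(t / (2 * r), 1.0))
                    - 0.5 * t * np.sqrt(np.maximum(4 * r**2 - t**2, 0.0)),
                    0.0)                           # |(x + R0) cap R0|
    box_vol = (4 * r) ** 2                         # sampling box for the outer integral
    return box_vol * np.mean(lens ** 2) / area ** 3

print(k0_disk())   # approx 0.4596 = 1 - 16/(3*pi^2)
```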
Table 3: Examples of K0 from Theorem 3.1 for several shapes of R0 ⊂ IRd. *The trapezoid has a 90° interior angle and parallel sides b2 ≥ b1; c = (b2/b1 + 1)^{−2} [1 + 2(b2/b1 − 1)/(b2/b1 + 1)].

  R0 Shape         K0
  IRd Rectangle    (2/3)^d
  IR3 Ellipsoid    34/105
  IR3 Cylinder     (2/3)(1 − 16/(3π²))
  IR2 Ellipse      1 − 16/(3π²)
  IR2 Trapezoid*   (2/5)(1 + 4c/9)
A stationary time sequence Z1, . . . , Zn can be obtained within our sampling formulation by choosing R0 = (−1/2, 1/2] and λ1^(n) = n on the untranslated (t = 0) integer lattice Zd = Z. In this special sampling case, an application of Theorem 3.1 yields

Var(τ̂²n,OL) = 2/3 · Var(τ̂²n,NOL),   Var(τ̂²n,NOL) = (sλ1^(n)/λ1^(n)) · [2τ⁴](1 + o(1)),

a result which is well-known for “nearly” linear functions θ̂n of a time series sample mean [cf. Künsch (1989)].
Theorem 3.1 implies that, under the “smooth” function model, the asymptotic variance of the OL subsample-based variance estimator is always strictly less than the NOL version because

K1 = lim_{n→∞} Var(τ̂²n,OL)/Var(τ̂²n,NOL) = K0 |R0| < 1.   (3.1)
If both estimators have the same bias (which is often the case), (3.1) implies that variance estimation
with OL subsamples is more efficient than the NOL subsample alternative owing to a smaller asymptotic
MSE.
Unlike K0, K1 does depend on the volume |R0|, which in turn is constrained by the R0-template's geometry. That is, K1 is ultimately bounded, through |R0| in (3.1), by the amount of space that an object of R0's shape can possibly occupy within (−1/2, 1/2]^d; or by how much volume can be filled by a given geometrical body (e.g. circle) compared to a cube. The constants K1 in Table 1 are computed with templates of prescribed shape and largest possible volume in (−1/2, 1/2]^d. These values most accurately reflect the influence of R0's (or Rn's) geometry on the relative performance (i.e., variance, efficiency) of the estimators τ̂²n,OL and τ̂²n,NOL.
To conclude this section, we remark that both subsample-based variance estimators can be shown to
be (MSE) consistent under Theorem 3.1 conditions, allowing for more general spatial sampling regions,
in both shape and dimension, than previously considered. Inference on the parameter θ can be made through the limiting standard normal distribution of √Nn (θ̂n − θ)/τ̂n for τ̂n = τ̂n,OL or τ̂n,NOL.
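For instance, a normal-theory confidence interval built from either variance estimate might be sketched as (function name ours):

```python
import numpy as np
from scipy.stats import norm

def subsample_ci(theta_hat, tau2_hat, N_n, level=0.95):
    """Interval for theta from the limiting normal of sqrt(N_n)(theta_hat - theta)/tau_hat,
    where tau2_hat is an OL or NOL subsample estimate of tau_n^2. Sketch only."""
    z = norm.ppf(0.5 + level / 2)
    half = z * np.sqrt(tau2_hat / N_n)
    return theta_hat - half, theta_hat + half
```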
4 Bias expansions
We now try to capture and precisely describe the leading order terms in the asymptotic bias of each
subsample-based variance estimator, similar to the variance determinations from the previous section.
We first establish and note the order of the dominant component in the bias expansions of τ̂²n,OL and τ̂²n,NOL, which is the subject of the following lemma.
Lemma 4.1 With Assumptions A.1 - A.5, suppose that Conditions D2 and M2+a hold for d ≥ 2; and D3 and M3+a for d = 1. Then, the subsample estimators of τn² = Nn Var(θ̂n) have expectations:

E(τ̂²n,OL) = τn² + O(1/sλ1^(n))   and   E(τ̂²n,NOL) = τn² + O(1/sλ1^(n)).
The lemma shows that, under the smooth function model, the asymptotic bias of each estimator is O(1/sλ1^(n)) for all dimensions of sampling. Politis and Romano (1993a, p. 323) and Sherman (1996) showed this same order for the bias of τ̂²n,OL with sampling regions based on rectangles R0 = (−1/2, 1/2]^d or simple closed curves in IR2, respectively. Lemma 4.1 extends these results to a broader class of sampling regions. However, we would like to precisely identify the O(1/sλ1^(n)) bias component of τ̂²n,OL or τ̂²n,NOL to later obtain theoretically correct proportionality constants associated with optimal subsample scaling.
To achieve some measure of success in determining the exact bias of the subsampling estimators, we reformulate the subsampling design slightly so that sλn ≡ sλ1^(n) = · · · = sλd^(n). That is, a common scaling factor in all directions is now used to define the subsampling regions (as in Sherman and Carlstein (1994); Sherman (1996)). This constraint will allow us to deal with the counting issues at the heart of the bias expansion.
Adopting a common scaling factor sλn for the subsamples also is sensible for a few other reasons at
this stage:
• “Unconstrained” optimum values of s∆n cannot always be found by minimizing the asymptotic MSE of τ̂²n,OL or τ̂²n,NOL, even for variance estimation of some desirable statistics on geometrically “simple” sampling and subsampling regions. Consider estimating the variance of a real-valued sample mean over a rectangular sampling region in IRd based on R0 = (−1/2, 1/2]^d, with observations on Zd = Zd. If Assumptions A.1-A.5 and Condition M1 hold, the leading term in the bias expansion can be shown to be:

Bias of τ̂²n,OL = −( ∑_{i=1}^d Li/sλi^(n) ) (1 + o(1));   Li = ∑_{k=(k1,...,kd)′∈Zd} |ki| Cov(Z(0), Z(k)).

In using the parenthetical sum above to expand the MSE of τ̂²n,OL, one finds that the resulting MSE cannot be minimized over the permissible, positive range of s∆n if the signs of the Li values are unequal. That is, for d > 1, the subsample estimator MSE cannot always be globally minimized to obtain optimal subsample factors s∆n by considering just the leading order bias terms. An effort to determine and incorporate (into the asymptotic MSE) second or third order bias components quickly becomes intractable, even with rectangular regions.
• The diagonal components of s∆n are asymptotically scalar multiples of each other by Assumption
A.1. If so desired, a template choice for R0 could be used to scale the expansion of the subsampling
regions in each direction.
In the continued discussion, we assume

sRn = sλn R0.   (4.1)
We frame the components necessary for determining the biases of the spatial subsample variance estimators in the next theorem. Let Cn (k) ≡ |Zd ∩ sRn ∩ (k + sRn )| denote the number of pairs of
observations in the subsampling region sRn separated by a translate k ∈ Zd .
Theorem 4.1 Suppose that d ≥ 2, sRn = sλn R0 and Assumptions A.1 - A.5, Conditions D3 and M3+a hold. If, in addition, sλn ∈ Z+ for NOL subsamples and

lim_{n→∞} (sNn − Cn(k))/(sλn)^{d−1} = C(k)   (4.2)

exists for all k ∈ Zd, then

E(τ̂n²) − τn² = (−1/(sλn|R0|)) ( ∑_{k∈Zd} C(k)σ(k) ) (1 + o(1))

where σ(k) = Cov(∇′Z(t), ∇′Z(t + k)) and where τ̂n² is either τ̂²n,OL or τ̂²n,NOL.
Note that the numerator on the left side of (4.2) is the number of integer grid points that lie in the subregion sRn but not in the translate k + sRn. Hence, computing the bias above actually requires counting the number of lattice points inside intersections like sRn ∩ (k + sRn), which is difficult in general. To handle the problem, one may attempt to estimate the count Cn(k) with the corresponding Lebesgue volume, |sRn ∩ (k + sRn)|, and then quantify the resulting approximation error. The determination of volumes or areas may not be easy either, but is hopefully more manageable. For example, if R0 is a circle, the area of sλn R0 can be readily computed, but the number of Z2 lattice points inside sλn R0 is not so simple and was in fact a famous consideration of Gauss [cf. Krätzel (1988, p. 141)].
We first note that the boundary condition on R0 provides a general (trivial) bound on the discrepancy between the count Cn(k) and the volume |sRn ∩ (k + sRn)|: O(sλn^{d−1}). However, the size of the numerator in (4.2) is also O(sλn^{d−1}), corresponding to the order of Zd lattice points “near” the boundary of sRn. Consequently, a standard O(sλn^{d−1}) bound on the volume-count approximation error is too large to immediately justify the exchange of volumes |sRn|, |sRn ∩ (k + sRn)| for counts sNn, Cn(k) in (4.2).
Bounds on the difference between lattice point counts and volumes have received much attention in analytic number theory, which we briefly mention. Research has classically focused on sets outlined by “smooth” simple closed curves in the plane IR2 and on one question in particular [Huxley (1996)]: When a curve with interior area A is ‘blown up’ by a factor b, how large is the difference between the number of Z2 integer points inside the new curve and the area b²A? For convex sets with a smoothly winding border, van der Corput's (1920) answer to the posed question above is O(b^{46/69+ε}), while the best answer is O(b^{46/73+ε}) for curves with sufficiently differentiable radius of curvature [Huxley (1993, 1996)]. These types of bounds, however, are invalid for many convex polygonal templates R0 in IR2 such as triangles, trapezoids, etc., where often the difference between the number of Z2 integer points in sRn = sλn R0 and its area is of exact order O(sλn) (set also by the boundary condition on R0 or the perimeter length of sRn). The problem above, as considered by number theorists, does not directly address counts for intersections between an expanding region and its vector translates, e.g. sRn ∩ (k + sRn).
To eventually compute “closed-form” bias expansions for τ̂²n,OL, we use approximation techniques for subtracted lattice point counts. For each k ∈ Zd, we

1. replace the numerator of (4.2) with the difference of corresponding Lebesgue volumes;

2. show the following error term is of sufficiently small order o(sλn^{d−1}):

{sNn − Cn(k)} − {sλn^d |R0| − |sRn ∩ (k + sRn)|} = {sNn − sλn^d |R0|} − {Cn(k) − |sRn ∩ (k + sRn)|}.
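The cancellation in step 2 can be observed numerically. A sketch for a disk template (our own code and function name; the exact "lens" area of two equal disks supplies the volume difference):

```python
import numpy as np

def count_vs_volume(lam, k=(1, 0), r=0.5):
    """Compare sN_n - C_n(k) with |sR_n| - |sR_n cap (k + sR_n)| for a disk
    template R0 of radius r scaled by lam. Illustrative sketch only."""
    R = lam * r
    g = np.arange(-np.ceil(R) - 1, np.ceil(R) + 2)
    xx, yy = np.meshgrid(g, g)                           # integer lattice points
    in_R = xx**2 + yy**2 <= R**2                         # points of sR_n
    in_shift = (xx - k[0])**2 + (yy - k[1])**2 <= R**2   # points of k + sR_n
    sN, Cn = in_R.sum(), (in_R & in_shift).sum()
    t = np.hypot(*k)                                     # distance between centers
    lens = 2 * R**2 * np.arccos(t / (2 * R)) - 0.5 * t * np.sqrt(4 * R**2 - t**2)
    return sN - Cn, np.pi * R**2 - lens                  # count diff, volume diff

for lam in (20, 80, 320):      # both differences grow like lam^{d-1} = lam;
    print(lam, *count_vs_volume(lam))   # their gap stays of smaller order
```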
We do approximate the number of lattice points in sRn and sRn ∩ (k + sRn) by set volumes, though the Lebesgue volume may not adequately capture the lattice point count in either set. However, the difference between the approximation errors sNn − sλn^d |R0| and Cn(k) − |sRn ∩ (k + sRn)| can be shown to be asymptotically small enough, for some templates R0, to justify replacing counts with volumes in (4.2) (see Lemma 9.4). That is, these two volume-count estimation errors can cancel to a sufficient extent when subtracted. The above approach becomes slightly more complicated for NOL subsamples, R̃i,n = s∆n(i + R0), which may vary in the number of sampling sites sNi,n. In this case, errors incurred by approximating counts |Zd ∩ R̃i,n ∩ (k + R̃i,n)| with volumes |R̃i,n ∩ (k + R̃i,n)| are shown to be asymptotically negligible, uniformly in i ∈ JNOL.
In the following theorem, we use this technique to give bias expansions for a large class of sampling
regions in IRd , d ≤ 3, which are “nearly” convex. That is, the sampling region Rn differs from a convex
set possibly only at its boundary, but sampling sites on the border may be arbitrarily included or
excluded from Rn .
Some notation is additionally required. For α = (α1, ..., αp)′ ∈ (Z+)p, x ∈ IRp, write x^α = ∏_{i=1}^p xi^{αi}, α! = ∏_{i=1}^p (αi!), and cα = DαH(µ)/α!. Let Z∞ denote a random vector with a normal N(0, Σ∞) distribution on IRp, where Σ∞ is the limiting covariance matrix of the scaled sample mean √Nn (Z̄Nn − µ) from (2.1). Let B°, B̄ denote the interior and closure of B ⊂ IRd, respectively.
Theorem 4.2 Suppose sRn = sλn R0 and there exists a convex set B such that B° ⊂ R0 ⊂ B̄. With Assumptions A.2 - A.5, assume Conditions D5−d and M5−d+a hold for d ∈ {1, 2, 3}. Then,

C(k) = V(k) ≡ lim_{n→∞} (|sRn| − |sRn ∩ (k + sRn)|)/(sλn)^{d−1},   k ∈ Zd,

whenever V(k) exists, and the biases E(τ̂²n,OL) − τn², E(τ̂²n,NOL) − τn² are equal to:

for d = 1,   −(1/(sλn|R0|)) ( ∑_{k∈Z} |k|σ(k) + C∞ ) (1 + o(1));

for d = 2 or 3,   −( ∑_{k∈Zd} [(|sRn| − |sRn ∩ (k + sRn)|)/|sRn|] σ(k) ) (1 + o(1))

or   −(1/(sλn|R0|)) ( ∑_{k∈Zd} V(k)σ(k) ) (1 + o(1)), provided each V(k) exists;

where σ(k) = Cov(∇′Z(t), ∇′Z(t + k)) and

C∞ = Var( ∑_{‖α‖₁=2} (cα/α!) Z∞^α ) + 2 ∑_{‖α‖₁=1, ‖β‖₁=3} cα (cβ/β!) E(Z∞^α Z∞^β)
   + 2 ∑_{k1,k2∈Z} ∑_{‖α‖₁=1, ‖β‖₁=1, ‖γ‖₁=1} (cα c_{β+γ}/(β + γ)!) E[(Z(t) − µ)^α (Z(t + k1) − µ)^β (Z(t + k2) − µ)^γ].
Remark 1: If Condition Dx holds with C = 0 for x ∈ {2, 3, 4}, then Condition Mx−1 is sufficient in
Theorem 4.2.
Remark 2: For each k ∈ Zd, the numerator in V(k) is of order O(sλn^{d−1}) by the R0-boundary condition
which holds for convex templates. We may then expand the bias of the estimators through the limiting,
scaled volume differences V (k). For d = 1, with samples and subsamples based on intervals, it can be
easily seen that V (k) = |k| which appears in Theorem 4.2.
The function H(·) needs to be increasingly “smoother” to determine the bias component of τ̂²n,OL or τ̂²n,NOL in lower dimensional spaces d = 1 or 2. For a real-valued time series sample mean θ̂n = Z̄n, the well-known bias of the subsample variance estimators follows from Theorem 4.2 under our sampling framework R0 = (−1/2, 1/2], Zd = Z:

−sλn⁻¹ ∑_{k∈Z} |k| Cov(∇′Z(0), ∇′Z(k))   (4.3)

with ∇ = 1. In general though, terms in the Taylor expansion of θ̂i,n (around µ) up to fourth order can contribute to the bias of τ̂²n,OL and τ̂²n,NOL when d = 1. In contrast, the asymptotic bias of the time series MBB variance estimator with “smooth” model statistics is very different from its subsample-based counterpart. The MBB variance estimator's bias is given by (4.3), determined only by the linear component from the Taylor expansion of θ̂i,n [cf. Lahiri (1996)].
5 Asymptotically optimal subsample sizes
In the following, we consider “size” selection for the subsampling regions to maximize the large-sample
accuracy of the subsample variance estimators. For reasons discussed in Section 4, we examine a
theoretically optimal scaling choice sλn for subregions in (4.1).
5.1 Theoretical optimal subsample sizes
Generally speaking, there is a trade-off in the effect of subsample size on the bias and variance of τ̂²n,OL or τ̂²n,NOL. For example, increasing sλn reduces the bias but increases the variance of the estimators.
The best value of sλn optimizes the over-all performance of a subsample variance estimator by balancing
the contributions from both the estimator’s variance and bias. An optimal sλn choice can be found by
minimizing the asymptotic order of a variance estimator’s MSE under a given OL or NOL sampling
scheme.
Theorem 4.1 implies that the bias of the estimators τ̂²n,OL and τ̂²n,NOL is of exact order O(1/sλn). For a broad class of sampling regions Rn, the leading order bias component can be determined explicitly with Theorem 4.2. We bring these variance and bias expansions together to obtain an optimal subsample scaling factor sλn.
Theorem 5.1 Let sRn = sλn R0. With Assumptions A.2 - A.5, assume Conditions D2 and M5+2a hold if d ≥ 2; and Conditions D3 and M7+2a if d = 1. If B0|R0| ≡ ∑_{k∈Zd} C(k)σ(k) + I{d=1} C∞ ≠ 0, then

sλ^opt_{n,OL} = ( det(∆n)(B0)² / (dK0τ⁴) )^{1/(d+2)} (1 + o(1))   and   sλ^opt_{n,NOL} = ( det(∆n)|R0|(B0)² / (dτ⁴) )^{1/(d+2)} (1 + o(1)).
Remark 3: If Condition Dx holds with C = 0 for x ∈ {2, 3}, then Condition M2x−1 is sufficient.
Remark 4: Theorem 5.1 suggests that optimally scaled OL subsamples should be larger than the NOL ones by a scalar: (K1)^{−1/(d+2)} > 1, where K1 = K0|R0| is the limiting ratio of variances from (3.1).
It is well-known in the time series case that the OL subsampling scheme produces an asymptotically more efficient variance estimator than its NOL counterpart. We can now quantify the relative efficiency of the two subsampling procedures in d-dimensional sampling space. With each variance estimator respectively optimized using (4.1), τ̂²n,OL is more efficient than τ̂²n,NOL and the asymptotic relative efficiency of τ̂²n,OL to τ̂²n,NOL depends solely on the geometry of R0:

AREd = lim_{n→∞} E(τ̂²n,OL − τn²)² / E(τ̂²n,NOL − τn²)² = (K1)^{2/(d+2)} < 1.
Possolo (1991), Politis and Romano (1993a, 1994), Hall and Jing (1996), and Garcia-Soidan and Hall (1997) have examined subsampling with rectangular regions based essentially on R0 = (−1/2, 1/2]^d. Using the geometrical characteristic K1 = (2/3)^d for rectangles, we can now examine the effect of the sampling dimension on the AREd of τ̂²n,OL to τ̂²n,NOL for these sampling regions. Although the AREd decreases as the dimension d increases, we find the relative improvement of τ̂²n,OL over τ̂²n,NOL is ultimately limited and the AREd has a lower bound of 4/9 for all IRd-rectangular regions (as d → ∞, (K1)^{2/(d+2)} = (2/3)^{2d/(d+2)} ↓ (2/3)² = 4/9).
We conclude this section by addressing a question raised by a referee on subsample shape selection. Although not widely considered in the literature, it is possible to formulate subsample variance
estimators using subsamples of a freely chosen shape, rather than scaled-down copies of Rn . Nordman
and Lahiri (2003) discuss comparing estimators, based on subsamples of different shapes, in terms of
the asymptotic MSE of variance estimation (e.g., to select subsamples as scaled down Rn -copies vs.
rectangles). This involves finding variance and bias expansions for OL, NOL estimators to determine
optimal scaling for subsamples of an arbitrary shape. Modified versions of Theorems 3.1-5.1 are given.
Such performance comparisons depend heavily on the subsample geometry (which solely determines
the bias of OL, NOL subsample estimators) as well as the r.f. covariances σ(k), k ∈ Zd. For some limited sampling regions, it may be possible to select a good subsample shape on geometrical considerations alone. See Nordman and Lahiri (2003) for further details and examples.
6 Examples
We now provide some examples of the important quantities K0, K1, B0 associated with the optimal scaling sλn^opt for some common sampling region templates, determined from Theorems 3.1 and 4.2. For subsamples from (4.1), the theoretically best sλn^opt can also be formulated in terms of |Rn| = det(∆n)|R0| (sampling region volume), K1, and B0.
6.1 Examples in IR2
example 1. Rectangular regions in IR2 (potentially rotated): If R0 = {((l1 cos θ, l2 sin θ)′x, (−l1 sin θ, l2 cos θ)′x)′ : x ∈ (−1/2, 1/2]²} for θ ∈ [0, π], 0 < l1, l2, then

K0 = 4/9,   B0 = ∑_{k=(k1,k2)′∈Z²} ( |k1 cos θ − k2 sin θ|/l1 + |k1 sin θ + k2 cos θ|/l2 ) σ(k).

The characteristics K1, B0 for determining optimal subsamples based on two rectangular templates are further described in Table 4.
example 2. If R0 is a circle of radius r ≤ 1/2 centered at the origin, then K0 appears in Table 3 and B0 = 2/(rπ) ∑_{k∈Z²} ‖k‖ σ(k).
example 3. For any triangle, K0 = 2/5. Two examples are provided in Tables 2 and 4.
Figure 1. Examples of templates R0 ⊂ (−1/2, 1/2]² are outlined by solid lines; panels (i)-(v) show a diamond, a right triangle, a second triangle, a parallelogram, and a cross (cf. Table 4). Cross-shaped sampling regions Rn described in Table 2 are based on R0 in (v). [Line drawing not reproduced.]
example 4. If R0 is a regular hexagon, centered at the origin and with side length l ≤ 1/2, then

K0 = 37/81,   B0 = (2√3/(9l)) ∑_{k∈Z²} ( |k2| + max{√3|k1|, |k2|} ) σ(k).
Table 4: Examples of several shapes of R0 ⊂ IR2 and associated K1, B0 for sλn^opt.

  R0                               K1                   B0
  (−1/2, 1/2]²                     4/9                  ∑_{k∈Z²} ‖k‖₁ σ(k)
  Circle of radius 1/2 at origin   π/4 − 4/(3π)         (4/π) ∑_{k∈Z²} ‖k‖ σ(k)
  “Diamond” in Figure 1(i)         2/9                  2 ∑_{k∈Z²} ‖k‖∞ σ(k)
  Right triangle in Figure 1(ii)   1/5                  Table 2
  Triangle in Figure 1(iii)        1/5                  ∑_{k∈Z², |k1|≤|k2|/2} 2|k2| σ(k) + ∑_{k∈Z², |k1|>|k2|/2} (|k2| + 2|k1|) σ(k)
  Parallelogram in Figure 1(iv)    2/9 + (√5 − 1)/375   (4/√5) ∑_{k∈Z²} ( |k1 − 2k2|/5 + |k2| ) σ(k)

example 5. For any parallelogram in IR2 with interior angle γ and adjacent sides of ratio b ≥ 1, K0 = 4/9 + 2/15 · b⁻² |cos γ|(1 − |cos γ|). In particular, if a parallelogram R0 is formed by two vectors (0, l1)′, (l2 cos γ, l2 sin γ)′ extended from a point x ∈ (−1/2, 1/2]², then

B0 = ∑_{k∈Z²} ( |k1·|cos γ| − k2·|sin γ|| / (|sin γ| · max{l1, l2}) + |k2|/min{l1, l2} ) σ(k),   γ ∈ (0, π), l1, l2 > 0.
For further bias term B0 calculation tools with more general (non-convex) sampling regions and
templates R0 (represented as the union of two approximately convex sets), see Nordman (2002).
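As an illustration of evaluating such constants, the square-template entry B0 = ∑_{k∈Z²} ‖k‖₁ σ(k) from Table 4 can be approximated by truncating the lag sum; the covariance model below (a separable AR(1)×AR(1)-type decay) is an assumed example, not prescribed by the paper:

```python
import numpy as np

def b0_square(sigma, K=50):
    """Truncated evaluation of B0 = sum_{k in Z^2} ||k||_1 sigma(k) for the
    template R0 = (-1/2, 1/2]^2, keeping lags |k1|, |k2| <= K. Sketch only;
    in practice sigma(k) would be replaced by estimated autocovariances."""
    ks = np.arange(-K, K + 1)
    k1, k2 = np.meshgrid(ks, ks)
    return np.sum((np.abs(k1) + np.abs(k2)) * sigma(k1, k2))

rho = 0.4   # assumed model: sigma(k) = rho^(|k1|+|k2|)
print(b0_square(lambda k1, k2: rho ** (np.abs(k1) + np.abs(k2))))
```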
6.2 Examples in IRd, d ≥ 3
example 6. For any sphere, K0 is given in Table 3. The properties B0, K1 of the sphere described in Tables 1 and 2 correspond to the template sphere R0 of radius 1/2 (and maximal volume in (−1/2, 1/2]³).
example 7. The K0 value for any IR3 cylinder appears in Table 3. If R0 is a cylinder with circular base (parallel to the x-y plane) of radius r and height h, then

B0 = ∑_{k=(k1,k2,k3)′∈Z³} ( |k3|/h + 2√(k1² + k2²)/(πr) ) σ(k).
The results of Theorem 4.2 for determining the bias B0 also seem plausible for convex sampling regions in IRd , d ≥ 4, but require further study of lattice point counting techniques in higher dimensions.
However, bias expansions of the OL and NOL subsample variance estimators are relatively straightforward for an important class of rectangular sampling regions based on the prototype R0 = (−1/2, 1/2]d,
which can then be used in optimal subsample scaling. These hypercubes have “faces” parallel to the
coordinate axes which simplifies the task of counting sampling sites, or lattice points, within such regions. We give precise bias expansions in the following theorem, while allowing for potentially missing
sampling sites at the borders, or faces, of the sampling region Rn .
Theorem 6.1 Let (−1/2, 1/2)^d ⊂ Λℓ⁻¹R0 ⊂ [−1/2, 1/2]^d, d ≥ 3, for a d × d diagonal matrix Λℓ with entries 0 < ℓi ≤ 1, i = 1, ..., d. Suppose sRn = sλn R0 and Assumptions A.2 - A.5, Conditions D2 and M2+a hold. Then, the biases E(τ̂²n,OL) − τn², E(τ̂²n,NOL) − τn² are equal to −sλn⁻¹B0 (1 + o(1)) where

B0 = ∑_{i=1}^d ∑_{k∈Zd} (|ki|/ℓi) σ(k),   σ(k) = Cov(∇′Z(t), ∇′Z(t + k)).
example 8. For rectangular sampling regions Rn = ∆n(−1/2, 1/2]^d, optimal subsamples may be chosen with

sλ^opt_{n,NOL} = ( (|Rn|/(dτ⁴)) ( ∑_{k∈Zd} ‖k‖₁ σ(k) )² )^{1/(d+2)} (1 + o(1))   or   sλ^opt_{n,OL} = sλ^opt_{n,NOL} · ((3/2)^d)^{1/(d+2)}

(common directional scaling) using the template R0 = (−1/2, 1/2]^d.
7 Empirical subsample size determination
In this section we consider data-based estimation of the theoretical optimal subsample sizes. In particular, we propose methods for estimating the scaling factor sλn^opt for subsamples in (4.1), which are
applicable to both OL and NOL subsampling schemes. Inference on “best” subsample scaling closely
resembles the problem of empirically gauging the theoretically optimal block size with the MBB variance estimator, which has been much considered for time series. We first frame the techniques used in
this special setting, which can be modified for data-based subsample selection.
Two distinct techniques have emerged for estimating the (time series) MBB block size. One approach involves “plug-in” type estimators of optimal block length [cf. Bühlmann and Künsch (1999)].
For subsample inference on nVar(Z̄n ), Carlstein (1986) (with AR(1) models) and Politis et al. (1999)
have described “plug-in” estimators for subsample length. The second MBB block selection method,
suggested by Hall et al. (1995) and Hall and Jing (1996), uses subsampling to create a data-based
version of the MSE of the MBB estimator (as a function of block size) which then is minimized to
obtain an estimate of optimal block length. With spatial data, Garcia-Soidan and Hall (1997) extended
this empirical MSE selection procedure with subsample-based distribution estimators. For variance
estimation of a univariate time series sample mean, Lèger et al. (1992) proposed a subsample length
estimate based on asymptotic MSE considerations; namely, selecting a size by simultaneously solving (sλn^opt)³ = (3/2)(B̂/τ̂²n,OL)² n for sλn^opt and the associated estimator τ̂²n,OL, where B̂ estimates the parenthetical quantity in (4.3).
A “plug-in” procedure for inference on the theoretical, optimal subsample size involves constructing, and subsequently substituting, (consistent) estimates of unknown population parameters in sλn^opt from Theorem 5.1. The limiting variance τ² appears in the formulation of sλn^opt and could be approximated with a subsample variance estimate based on some preliminary size choice. The value K0 can be determined from the available sampling region Rn, but selection of a template R0 is also required in the “plug-in” approach. The best solution may be to pick a sampling template R0 to be the largest set of the form Λ⁻¹Rn, for a positive diagonal matrix Λ, that fits within (−1/2, 1/2]^d; this choice appears to reduce the magnitude of the bias of τ̂²n,OL, τ̂²n,NOL, which contributes heavily to the MSE of an estimator.
In a spirit similar to the “plug-in” approach, one could empirically select sλn^opt as in Lèger et al. (1992) and Politis and Romano (1993b). After evaluating K0 and an estimate B̂0 of the leading bias component B0, compute τ̂²n,OL (or analogously τ̂²n,NOL) for a series of sλn values and simultaneously solve the asymptotic MSE-based formulation sλn^{d+2} = det(∆n)(B̂0)²/{dK0(τ̂²n,OL)²} for sλn and an associated τ̂²n,OL.
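A minimal sketch of the resulting plug-in rule from Theorem 5.1 (our own function; B0_hat and tau2_hat denote the preliminary estimates discussed above):

```python
def slambda_opt(det_Dn, B0_hat, tau2_hat, d=2, K0=None, vol_R0=None, overlap=True):
    """Plug-in versions of the Theorem 5.1 optimal scaling factors:
    OL:  (det(Delta_n) B0^2 / (d K0 tau^4))^{1/(d+2)},
    NOL: (det(Delta_n) |R0| B0^2 / (d tau^4))^{1/(d+2)}.  Sketch only."""
    tau4 = tau2_hat ** 2
    if overlap:
        return (det_Dn * B0_hat ** 2 / (d * K0 * tau4)) ** (1 / (d + 2))
    return (det_Dn * vol_R0 * B0_hat ** 2 / (d * tau4)) ** (1 / (d + 2))
```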
8 Proofs for variance expansions
For the proofs, we use C to denote generic positive constants that do not depend on n or any Zd integers
(or Zd lattice points). The real number r, appearing in some proofs, always assumes the value stated
under Condition Mr with respect to the lemma or theorem under consideration. Unless otherwise
specified, limits in order symbols are taken letting n tend to infinity.
In the following, we denote the indicator function as I{·} (i.e. I{·} ∈ {0, 1} and I{A} = 1 if and only if an event A holds). For two sequences {rn} and {tn} of positive real numbers, we write rn ∼ tn if rn/tn → 1 as n → ∞. We write λn^max and sλn^max for the largest diagonal entries of ∆n and s∆n, respectively, while sλ_min^(n) ≥ 1 will denote the smallest diagonal entry of s∆n.
We require a few lemmas for the proofs.
Lemma 8.1 Suppose T1, T2 ⊂ Zd ≡ t + Zd are bounded. Let p, q > 0 where 1/p + 1/q < 1. If X1, X2 are random variables, with Xi measurable with respect to FZ(Ti), i = 1, 2, then,

|Cov(X1, X2)| ≤ 8 (E|X1|^p)^{1/p} (E|X2|^q)^{1/q} [α(dis(T1, T2); max_{i=1,2} |Ti|)]^{1−1/p−1/q},

provided expectations are finite and dis(T1, T2) > 0.
Proof. Follows from Theorem 3, Doukhan (1994, p. 9).

Lemma 8.2 Let r ∈ Z+. Under Assumption A.3 and Condition Mr, for 1 ≤ m ≤ 2r and any T ⊂ Zd ≡ t + Zd,

E‖ ∑_{s∈T} (Z(s) − µ) ‖^m ≤ C(α)|T|^{m/2};

C(α) is a constant that depends only on the coefficients α(k, l), l ≤ 2r, and E‖Z(t)‖^{2r+δ}.
Proof: This follows from Theorem 1, Doukhan (1994, p. 26-31) and Jensen's inequality.

We next determine the asymptotic sizes of important sets relevant to the sampling or subsampling designs.
Lemma 8.3 Under Assumptions A.1-A.2, the number of sampling sites within
(a) the sampling region Rn: Nn = |Rn ∩ Zd| ∼ |R0| · det(∆n);
(b) an OL subsample, Ri,n, i ∈ JOL: sNn ∼ |R0| · det(s∆n);
(c) a NOL subsample, R̃i,n, i ∈ JNOL: sNi,n ∼ |R0| · det(s∆n).
The number of
(d) OL subsamples within Rn: |JOL| ∼ |R0| · det(∆n);
(e) NOL subsamples within Rn: |JNOL| ∼ |R0| · det(∆n) · det(s∆n)⁻¹.
(f) sampling sites near the border of a subsample, Ri,n or R̃i,n, is less than

sup_{i∈Zd} |{j ∈ Zd : Tj ∩ Ri,n ≠ ∅, Tj ∩ Ri,n^c ≠ ∅ for Tj = j + [−2, 2]^d}| ≤ C(sλn^max)^{d−1}.

Proof: Results follow from the boundary condition on R0; see Nordman (2002) for more details.
We require the next lemma for counting the number of subsampling regions which are separated by
an appropriately “small” integer translate; we shall apply this lemma in the proof of Theorem 3.1. For
k = (k1, . . . , kd)′ ∈ Zd, define the following sets,

Jn(k) = |{i ∈ JOL : i + k + s∆n R0 ⊂ Rn}|,   En = {k ∈ Zd : |kj| ≤ sλj^(n), j = 1, . . . , d}.
Lemma 8.4 Under Assumption A.2,

max_{k∈En} | 1 − Jn(k)/|JOL| | = o(1).

Proof: For k ∈ En, write the set Jn*(k) and bound its cardinality:
Jn*(k) = |{i ∈ JOL : (i + k + s∆n R0) ∩ ∆n R0^c ≠ ∅}|
   ≤ |{i ∈ Zd : Ti ∩ ∆n R0^c ≠ ∅, Ti ∩ ∆n R0 ≠ ∅; Ti = i + sλn^max [−2, 2]^d}| ≤ C sλn^max (λn^max)^{d−1},

by the boundary condition on R0. We have then that for all k ∈ En,

|JOL| ≥ Jn(k) = |JOL| − Jn*(k) ≥ |JOL| − C sλn^max (λn^max)^{d−1}.
By Assumption A.2 and the growth rate of |JOL| from Lemma 8.3, the proof is complete.

We now provide a theorem which captures the main contribution to the asymptotic variance expansion of the OL subsample variance estimator τ̂²n,OL from Theorem 3.1.
Theorem 8.1 For i ∈ Zd, let Yi,n = ∇′(Zi,n − µ). Under the Assumptions and Conditions of Theorem 3.1,

sNn ∑_{k∈En} Cov(Y0,n², Yk,n²) = K0 · [2τ⁴](1 + o(1)),

where the constant K0 is defined in Theorem 3.1.
Proof: We give only a sketch of the important features; for more details, see Nordman (2002). For a set T ⊂ IRd, define the function Σ(·) as

Σ(T) = ∑_{s∈Zd∩T} ∇′(Z(s) − µ).
With the set intersection R^(I)_{k,n} = sRn ∩ (k + sRn), k ∈ Zd, write functions:

H1n(k) = Σ(Rk,n \ R^(I)_{k,n}),   H2n(k) = Σ(R0,n \ R^(I)_{k,n}),   H3n(k) = Σ(R^(I)_{k,n}).

These represent, respectively, sums over sites in: Rk,n but not R0,n = sRn; R0,n but not Rk,n; and both R0,n and Rk,n. Then define hn(·) : Zd → IR as
hn(k) = E[H1n²(k)]E[H2n²(k)] + E[H1n²(k)]E[H3n²(k)] + E[H2n²(k)]E[H3n²(k)] + E[H3n⁴(k)] − (sNn)⁴[E(Y0,n²)]².
We will make use of the following proposition.
Proposition 8.1 Under the Assumptions and Conditions of Theorem 3.1,

max_{k∈En} | (sNn)² Cov(Y0,n², Yk,n²) − (sNn)⁻² hn(k) | = o(1).
The proof of Proposition 8.1 can be found in Nordman (2002) and involves cutting out Zd lattice points near the borders of R0,n and Rk,n, (say) B0,n and Bk,n with

ℓn = ⌊(sλ_min^(n))^e⌋,   Bj,n = {i ∈ Zd : i ∈ Rj,n, (i + ℓn(−1, 1]^d) ∩ Rj,n^c ≠ ∅},   j ∈ Zd,

where e = (κδ/{(2r + δ)(2r − 1 − 1/d)} + 1)/2 < 1 from Condition Mr. Here ℓn → ∞, ℓn = o(sλ_min^(n)) is chosen so that the remaining observations in R0,n, Rk,n, R^(I)_{k,n} are nearly independent upon removing B0,n, Bk,n points and, using the R0-boundary condition, the set cardinalities |B0,n|, |Bk,n| ≤ Cℓn(sλn^max)^{d−1} are of smaller order than sNn (namely, these sets are asymptotically negligible in size).
By Proposition 8.1 and |En| = O(sNn), we have

| sNn ∑_{k∈En} Cov(Y0,n², Yk,n²) − (sNn)⁻³ ∑_{k∈En} hn(k) | = o(1).   (8.1)

Consequently, we need only focus on (sNn)⁻³ ∑_{k∈En} hn(k) to complete the proof of Theorem 8.1.
For measurability reasons, we create a set defined in terms of the IRd Lebesgue measure:

E+ ≡ (0, 1) ∩ { ε < det(∆0)|R0|/2 : |{x ∈ IRd : |(x + ∆0R0) ∩ ∆0R0| = ε or det(∆0)|R0| − ε}| = 0 }.

Note the set {0 < ε < min{1, (det(∆0)|R0|)/2} : ε ∉ E+} is at most countable [cf. Billingsley (1986), Theorem 10.4]. For ε ∈ E+, define a new set as a function of ε and n:

R̃ε,n = {k ∈ Zd : |R^(I)_{k,n}| > ε(sλ1^(n))^d, |sRn \ R^(I)_{k,n}| > ε(sλ1^(n))^d}.

Here R̃ε,n ⊂ En because k ∉ En implies R^(I)_{k,n} = ∅.

We now further simplify (sNn)⁻³ ∑_{k∈En} hn(k) using the following proposition involving R̃ε,n.
Proposition 8.2 There exist N ∈ Z+ and a function b(·) : E+ → (0, ∞) such that b(ε) ↓ 0 as ε ↓ 0 and

(sNn)⁻³ | ∑_{k∈En} hn(k) − ∑_{k∈R̃ε,n} hn(k) | ≤ C ( ε + (sλ1^(n))⁻¹ + [b(ε)]^d ),   (8.2)

where C > 0 does not depend on ε ∈ E+ or n ≥ N.
The proof of Proposition 8.2 is tedious and given in Nordman (2002). The argument involves bounding
the sum of hn (·) over two separate sets in En : those integers in En that are either “too large” or “too
small” in magnitude to be included in R̃ǫ,n .
To finish the proof, our approach (for an arbitrary ǫ ∈ E^+) will be to write (sN_n)^{−3} Σ_{k∈R̃_{ǫ,n}} h_n(k) as the integral, with respect to Lebesgue measure, of a step function, say f_{ǫ,n}(x); then show lim_{n→∞} f_{ǫ,n}(x) exists almost everywhere (a.e.) on IR^d; and apply the Lebesgue Dominated Convergence Theorem (LDCT). By letting ǫ ↓ 0, we will obtain the limit of sN_n Σ_{k∈E_n} Cov(Y_{0,n}^2, Y_{k,n}^2).
Fix ǫ ∈ E^+. With counting arguments based on the boundary condition of R_0 and the definition of R̃_{ǫ,n}, it holds that for some N_ǫ ∈ Z^+ and all k ∈ R̃_{ǫ,n}: |R_{k,n}^(I) ∩ Z^d| ≥ 1 and sN_n − |R_{k,n}^(I) ∩ Z^d| ≥ 1 when n ≥ N_ǫ. We can rewrite (sN_n)^{−2} h_n(k), k ∈ R̃_{ǫ,n}, in the well-defined form (for n ≥ N_ǫ):
  h_n(k)/(sN_n)^2 = E[ H_{1n}^2(k)/(sN_n − |R_{k,n}^(I) ∩ Z^d|) ] E[ H_{2n}^2(k)/(sN_n − |R_{k,n}^(I) ∩ Z^d|) ] (1 − |R_{k,n}^(I) ∩ Z^d|/sN_n)^2
    + Σ_{j=1}^2 E[ H_{jn}^2(k)/(sN_n − |R_{k,n}^(I) ∩ Z^d|) ] E[ H_{3n}^2(k)/|R_{k,n}^(I) ∩ Z^d| ] (1 − |R_{k,n}^(I) ∩ Z^d|/sN_n)(|R_{k,n}^(I) ∩ Z^d|/sN_n)
    + E[ H_{3n}^4(k)/|R_{k,n}^(I) ∩ Z^d|^2 ] (|R_{k,n}^(I) ∩ Z^d|/sN_n)^2 − [sN_n E(Y_{0,n}^2)]^2.
For x = (x_1, . . . , x_d)′ ∈ IR^d, write ⌊x⌋ = (⌊x_1⌋, . . . , ⌊x_d⌋)′ ∈ Z^d. Let f_{ǫ,n}(x) : IR^d → IR be the step function defined as

  f_{ǫ,n}(x) = (sN_n)^{−2} I{⌊sλ_1^(n) x⌋ ∈ R̃_{ǫ,n}} h_n(⌊sλ_1^(n) x⌋).

We have then that (with the same fixed ǫ ∈ E^+):

  (1/sN_n) Σ_{k∈R̃_{ǫ,n}} (sN_n)^{−2} h_n(k) = ( (sλ_1^(n))^d/sN_n ) ∫_{IR^d} f_{ǫ,n}(x) dx.   (8.3)
We focus on showing:

  lim_{n→∞} f_{ǫ,n}(x) = f_ǫ(x) ≡ I{x ∈ R̃_ǫ} 2τ^4 ( |(x + ∆_0 R_0) ∩ ∆_0 R_0| / (det(∆_0)|R_0|) )^2   a.e. x ∈ IR^d,   (8.4)

with R̃_ǫ = {x ∈ IR^d : |(x + ∆_0 R_0) ∩ ∆_0 R_0| > ǫ, |∆_0 R_0 \ (x + ∆_0 R_0)| > ǫ}, a Borel measurable set.
Write x_n = ⌊sλ_1^(n) x⌋, x ∈ IR^d, to ease the notation. To establish (8.4), we begin by showing convergence of indicator functions:

  I{x_n ∈ R̃_{ǫ,n}} → I{x ∈ R̃_ǫ}   a.e. x ∈ IR^d.   (8.5)

Define the sets A_n(x) = (sλ_1^(n))^{−1}{(x_n + sR_n) ∩ sR_n}, Ã_n(x) = {(sλ_1^(n))^{−1} sR_n} \ A_n(x) as a function of x ∈ IR^d. The LDCT can be applied to show: for each x ∈ IR^d, |A_n(x)| → |(x + ∆_0 R_0) ∩ ∆_0 R_0| and |Ã_n(x)| → |∆_0 R_0 \ (x + ∆_0 R_0)|. Thus, if x ∈ R̃_ǫ, then

  |A_n(x)| → |(x + ∆_0 R_0) ∩ ∆_0 R_0| > ǫ,   |Ã_n(x)| → |∆_0 R_0 \ (x + ∆_0 R_0)| > ǫ,   (8.6)

implying further that 1 = I{x_n ∈ R̃_{ǫ,n}} → I{x ∈ R̃_ǫ} = 1. Now consider R̃_ǫ^c. If x ∉ R̃_ǫ is such that |(x + ∆_0 R_0) ∩ ∆_0 R_0| < ǫ (or |∆_0 R_0 \ (x + ∆_0 R_0)| < ǫ), then |A_n(x)| < ǫ (or |Ã_n(x)| < ǫ) eventually for large n and 0 = I{x_n ∈ R̃_{ǫ,n}} → I{x ∈ R̃_ǫ} = 0 in this case. Finally, ǫ ∈ E^+ implies that the last possible subset of R̃_ǫ^c has Lebesgue measure zero; namely, |{x ∈ R̃_ǫ^c : |(x + ∆_0 R_0) ∩ ∆_0 R_0| = ǫ or |∆_0 R_0 \ (x + ∆_0 R_0)| = ǫ}| = 0. We have now proven (8.5).
We next establish a limit for (sN_n)^{−2} h_n(x_n), x ∈ R̃_ǫ. We wish to show:

  |R_{x_n,n}^(I) ∩ Z^d| / sN_n → |(x + ∆_0 R_0) ∩ ∆_0 R_0| / (det(∆_0)|R_0|),   x ∈ R̃_ǫ.   (8.7)

Using the bound | |R_{x_n,n}^(I)| − |R_{x_n,n}^(I) ∩ Z^d| | ≤ C(sλ_n^max)^{d−1} from the R_0-boundary condition and noting the limit in (8.6), we find: (sλ_1^(n))^{−d} |R_{x_n,n}^(I) ∩ Z^d| → |(x + ∆_0 R_0) ∩ ∆_0 R_0|, x ∈ R̃_ǫ. By this and (sλ_1^(n))^d/sN_n → (det(∆_0)|R_0|)^{−1}, (8.7) follows.
We can also establish: for each x ∈ R̃_ǫ, j = 1 or 2,

  E[ H_{jn}^2(x_n)/(sN_n − |R_{x_n,n}^(I) ∩ Z^d|) ] → E([∇′Z_∞]^2),   E[ H_{3n}^{2j}(x_n)/|R_{x_n,n}^(I) ∩ Z^d|^j ] → E([∇′Z_∞]^{2j}),   sN_n E(Y_{0,n}^2) → E([∇′Z_∞]^2),   (8.8)

where ∇′Z_∞ is a normal N(0, τ^2) random variable and so it follows E([∇′Z_∞]^{2j}) = (2j − 1)τ^{2j}, j = 1, 2. The limits in (8.8) follow essentially from the Central Limit Theorem (CLT) of Bolthausen (1982), after verifying that the CLT can be applied; see Nordman (2002) for more details.
Putting (8.5), (8.7), and (8.8) together, we have shown the (a.e.) convergence of the univariate functions f_{ǫ,n}(x) as in (8.4). For k ∈ E_n and n ≥ N_ǫ, Lemma 8.2 ensures: (sN_n)^{−2}|h_n(k)| ≤ C, implying that for x ∈ IR^d: |f_{ǫ,n}(x)| ≤ C I{x ∈ [−c, c]^d} for some c > 0 by Assumption A.1. With this uniform bound on f_{ǫ,n}(·) and the limits in (8.4), we can apply the LDCT to get

  lim_{n→∞} ∫_{IR^d} f_{ǫ,n}(x) dx = ∫_{IR^d} f_ǫ(x) dx,   ǫ ∈ E^+.   (8.9)
Let {ǫ_m}_{m=1}^∞ ⊂ E^+ where ǫ_m ↓ 0. Then, R̃_{ǫ_m} ⊂ ∆_0[−1, 1]^d and lim_{m→∞} I{x ∈ R̃_{ǫ_m}} → I{x ∈ R̃_0} for x ≠ 0 ∈ IR^d, with R̃_0 = {x ∈ IR^d : 0 < |(x + ∆_0 R_0) ∩ ∆_0 R_0| < det(∆_0)|R_0|}. Hence, by the LDCT,

  lim_{m→∞} ∫_{IR^d} f_{ǫ_m}(x) dx = ∫_{IR^d} f_0(x) dx,   f_0(x) ≡ I{x ∈ R̃_0} [2τ^4] ( |(x + ∆_0 R_0) ∩ ∆_0 R_0| / (det(∆_0)|R_0|) )^2.   (8.10)
From (8.1)-(8.3), (8.9)-(8.10) and (sλ_1^(n))^d/sN_n → (det(∆_0)|R_0|)^{−1}, we have that:

  lim_{n→∞} | sN_n Σ_{k∈E_n} Cov(Y_{0,n}^2, Y_{k,n}^2) − (det(∆_0)|R_0|)^{−1} ∫_{IR^d} f_0(x) dx |
    ≤ lim | sN_n Σ_{k∈E_n} Cov(Y_{0,n}^2, Y_{k,n}^2) − ((sλ_1^(n))^d/sN_n) ∫_{IR^d} f_{ǫ_m,n}(x) dx | + (det(∆_0)|R_0|)^{−1} ∫_{IR^d} |f_{ǫ_m}(x) − f_0(x)| dx
      + lim | ((sλ_1^(n))^d/sN_n) ∫_{IR^d} f_{ǫ_m,n}(x) dx − (det(∆_0)|R_0|)^{−1} ∫_{IR^d} f_{ǫ_m}(x) dx |
    ≤ C ( ǫ_m + [b(ǫ_m)]^d + (det(∆_0)|R_0|)^{−1} ∫_{IR^d} |f_{ǫ_m}(x) − f_0(x)| dx ) → 0 as ǫ_m ↓ 0.
Finally,

  (det(∆_0)|R_0|)^{−1} ∫_{IR^d} f_0(x) dx = (2τ^4/|R_0|) ∫_{IR^d} ( |(y + R_0) ∩ R_0|^2/|R_0|^2 ) dy,

using a change of variables y = ∆_0^{−1}x. This completes the proof of Theorem 8.1.

For clarity of exposition, we will prove Theorem 3.1, parts (a) and (b), separately for the OL and NOL subsample variance estimators.
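As a sanity check on this last limit (our own verification, not part of the paper), the overlap integral ∫ |(y + R_0) ∩ R_0|^2/|R_0|^2 dy can be evaluated numerically; for the unit cube R_0 = (−1/2, 1/2]^d it factorizes across coordinates and equals (2/3)^d.

```python
import numpy as np

# For the unit cube R0 = (-1/2, 1/2]^d the overlap volume factorizes:
# |(y + R0) ∩ R0| = prod_j max(1 - |y_j|, 0), so the integral of its
# square over IR^d equals (2/3)^d.  (Our check; not from the paper.)
def overlap_sq_integral(d, grid=20001):
    y = np.linspace(-1.0, 1.0, grid)        # overlap vanishes for |y_j| >= 1
    f = (1.0 - np.abs(y)) ** 2
    one_dim = f.sum() * (y[1] - y[0])       # Riemann sum for the 1-D factor
    return one_dim ** d

for d in (1, 2, 3):
    print(d, overlap_sq_integral(d), (2.0 / 3.0) ** d)
```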
8.1 Proof of Theorem 3.1(a)
For i ∈ J_OL, we use a Taylor expansion of H(·) (around µ) to rewrite the statistic θ̂_{i,n} = H(Z_{i,n}):

  θ̂_{i,n} = H(µ) + Σ_{‖α‖_1=1} c_α (Z_{i,n} − µ)^α + 2 Σ_{‖α‖_1=2} ((Z_{i,n} − µ)^α/α!) ∫_0^1 (1 − ω) D^α H(µ + ω(Z_{i,n} − µ)) dω
          ≡ H(µ) + Y_{i,n} + Q_{i,n}.   (8.11)
We also have: θ̃_n = H(µ) + |J_OL|^{−1} Σ_{i∈J_OL} Y_{i,n} + |J_OL|^{−1} Σ_{i∈J_OL} Q_{i,n} ≡ H(µ) + Ȳ_n + Q̄_n. Then,

  τ̂_{n,OL}^2 = sN_n ( (1/|J_OL|) Σ_{i∈J_OL} Y_{i,n}^2 + (1/|J_OL|) Σ_{i∈J_OL} Q_{i,n}^2 + (2/|J_OL|) Σ_{i∈J_OL} Y_{i,n}Q_{i,n} − Ȳ_n^2 − Q̄_n^2 − 2(Ȳ_n)(Q̄_n) ).
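In practice the estimator τ̂_{n,OL}^2 is simple to compute. The following Python sketch is our illustration only (it assumes the simplest case, H the identity so that θ̂ is the sample mean, and a square R_0 in d = 2; it is not the paper's code): it evaluates the means over all OL m x m blocks and rescales their sample variance by sN_n = m^2.

```python
import numpy as np

def tau2_OL(Z, m):
    """OL subsample variance estimator for the mean: sN_n times the
    sample variance of all overlapping m x m block means of Z (d = 2)."""
    n1, n2 = Z.shape
    # cumulative-sum trick: all overlapping m x m block sums at once
    S = np.zeros((n1 + 1, n2 + 1))
    S[1:, 1:] = Z.cumsum(0).cumsum(1)
    block_sums = S[m:, m:] - S[:-m, m:] - S[m:, :-m] + S[:-m, :-m]
    means = block_sums / m**2            # subsample copies of the statistic
    return m**2 * means.var()            # sN_n = m^2 sites per block

rng = np.random.default_rng(0)
Z = rng.standard_normal((60, 60))        # white noise: tau^2 = 1
print(tau2_OL(Z, 6))                     # should be near 1
```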
We establish Theorem 3.1(a) in two parts by showing

  (a) Var( (sN_n/|J_OL|) Σ_{i∈J_OL} Y_{i,n}^2 ) = K_0 · (det(s∆_n)/det(∆_n)) · [2τ^4](1 + o(1)),
  (b) Var(τ̂_{n,OL}^2) − Var( (sN_n/|J_OL|) Σ_{i∈J_OL} Y_{i,n}^2 ) = o( det(s∆_n)/det(∆_n) ).   (8.12)
We will begin with proving (8.12)(a) above. For k ∈ Z^d, let σ_n(k) = Cov(Y_{0,n}^2, Y_{k,n}^2). We write

  Var( (sN_n/|J_OL|) Σ_{i∈J_OL} Y_{i,n}^2 ) = ((sN_n)^2/|J_OL|^2) Σ_{k∈E_n} J_n(k)σ_n(k) + ((sN_n)^2/|J_OL|^2) Σ_{k∈Z^d\E_n} J_n(k)σ_n(k) ≡ W_{1n} + W_{2n}.
By stationarity and Lemma 8.2, we bound: |σ_n(k)| ≤ E(Y_{0,n}^4) ≤ C(sN_n)^{−2}, k ∈ Z^d. Using this covariance bound, Lemmas 8.3-8.4, and |E_n| ≤ 3^d det(s∆_n):

  | ((sN_n)^2/|J_OL|) Σ_{k∈E_n} σ_n(k) − W_{1n} | ≤ C · (|E_n|/|J_OL|) · max_{k∈E_n} | 1 − J_n(k)/|J_OL| | = o( det(s∆_n)/det(∆_n) ).   (8.13)
Then applying Theorem 8.1 and Lemma 8.3,

  ((sN_n)^2/|J_OL|) Σ_{k∈E_n} σ_n(k) = K_0 · (det(s∆_n)/det(∆_n)) · [2τ^4](1 + o(1)).   (8.14)
By (8.13)-(8.14), we need only show that W_{2n} = o( det(s∆_n)/det(∆_n) ) to finish the proof of (8.12)(a). For i ∈ Z^d, denote a set of lattice points within a translated rectangular region:

  F_{i,n} = ( i + Π_{j=1}^d ( −⌈sλ_j^(n)⌉/2, ⌈sλ_j^(n)⌉/2 ] ) ∩ Z^d,

where ⌈·⌉ represents the "ceiling" function. Note that for k = (k_1, . . . , k_d)′ ∈ Z^d \ E_n, there exists j ∈ {1, . . . , d} such that |k_j| > sλ_j^(n), implying dis(R_{0,n} ∩ Z^d, R_{k,n} ∩ Z^d) ≥ dis(F_{0,n}, F_{k,n}) ≥ 1. Hence, sequentially using Lemmas 8.1-8.2, we may bound the covariances σ_n(k), k ∈ Z^d \ E_n, with the mixing coefficient α(·, ·):

  |σ_n(k)| ≤ 8 [ E(Y_{0,n}^{2(2r+δ)/r}) ]^{2r/(2r+δ)} α( dis(R_{0,n} ∩ Z^d, R_{k,n} ∩ Z^d), sN_n )^{δ/(2r+δ)}
           ≤ C(sN_n)^{−2} α( dis(F_{0,n}, F_{k,n}), sN_n )^{δ/(2r+δ)}.
From the above bound and J_n(k)/|J_OL| ≤ 1, k ∈ Z^d, we have

  |W_{2n}| ≤ C|J_OL|^{−1} Σ_{x=1}^∞ Σ_{j=1}^d C_{(x,j,n)} α(x, sN_n)^{δ/(2r+δ)},   (8.15)

  C_{(x,j,n)} = |{ i ∈ Z^d : dis(F_{0,n}, F_{i,n}) = x = inf{|v_j − w_j| : v ∈ F_{0,n}, w ∈ F_{i,n}} }|.

The function C_{(x,j,n)} counts the number of translated rectangles F_{i,n} that lie a distance of x ∈ Z^+ from the rectangle F_{0,n}, where this distance is realized in the jth coordinate direction for j = 1, . . . , d. For i ∈ Z^d, x ≥ 1 and j ∈ {1, . . . , d}, if dis(F_{0,n}, F_{i,n}) = x = inf{|v_j − w_j| : v ∈ F_{0,n}, w ∈ F_{i,n}}, then |i_j| = ⌈sλ_j^(n)⌉ + x − 1 with the remaining components of i, namely i_m for m ∈ {1, . . . , d} \ {j}, constrained by |i_m| ≤ sλ_m^(n) + x.
We use this observation to further bound the right hand side of (8.15) by:

  C|J_OL|^{−1} Σ_{x=1}^∞ Σ_{j=1}^d Π_{m=1,m≠j}^d 3(sλ_m^(n) + x) α(x, sN_n)^{δ/(2r+δ)}
    ≤ C · (det(s∆_n)/|J_OL|) Σ_{j=1}^d (sλ_min^(n))^{−j} [ Σ_{x=1}^{ℓ_n} x^{j−1} + Σ_{x=ℓ_n+1}^∞ x^{j−1} [α_1(x) g(sN_n)]^{δ/(2r+δ)} ]
    ≤ C · (det(s∆_n)/|J_OL|) [ dℓ_n/sλ_min^(n) + ( ℓ_n^{(1/e) dκδ/(2r+δ)} / ℓ_n^{2rd−d} ) Σ_{x=ℓ_n+1}^∞ x^{2rd−d−1} α_1(x)^{δ/(2r+δ)} ] = o( det(s∆_n)/det(∆_n) ),

using Assumptions A.1, A.3, Condition M_r, and ℓ_n = o(sλ_min^(n)). This completes the proof of (8.12)(a).
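The convergence of the weighted mixing series in the last step is easy to examine numerically. A minimal sketch (ours; the geometric decay α_1(x) = ρ^x and all parameter values are hypothetical choices, not the paper's) shows how quickly the tail sums die out:

```python
# Partial sums of  sum_{x>=1} x^(d-1) * alpha_1(x)^(delta/(2r+delta)),
# the series controlling |W_2n|, under hypothetical geometric mixing
# alpha_1(x) = rho**x and hypothetical d, r, delta.
rho, d, r, delta = 0.5, 2, 2, 1.0
expo = delta / (2 * r + delta)
total = sum(x ** (d - 1) * (rho ** x) ** expo for x in range(1, 500))
tail = sum(x ** (d - 1) * (rho ** x) ** expo for x in range(50, 500))
print(total, tail)   # the tail beyond x = 50 is already a tiny fraction
```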
To establish (8.12)(b), first note that

  | Var(τ̂_{n,OL}^2) − Var( (sN_n/|J_OL|) Σ_{i∈J_OL} Y_{i,n}^2 ) | ≤ 4 Σ_{j=1}^5 A_{jn}^{1/2} [ Σ_{j=1}^5 A_{jn}^{1/2} + Var( (sN_n/|J_OL|) Σ_{i∈J_OL} Y_{i,n}^2 )^{1/2} ],

where A_{1n} = Var(sN_n Ȳ_n^2), A_{2n} = Var(|J_OL|^{−1} sN_n Σ_{i∈J_OL} Q_{i,n}^2), A_{3n} = Var(sN_n Q̄_n^2), A_{4n} = Var(|J_OL|^{−1} sN_n Σ_{i∈J_OL} Y_{i,n}Q_{i,n}), A_{5n} = Var(sN_n Ȳ_n Q̄_n).

By (8.12)(a), it suffices to show that A_{jn} = o( det(s∆_n)/det(∆_n) ) for each j = 1, . . . , 5. We handle only two terms for illustration: A_{1n}, A_{4n}.
Consider A_{1n}. For s ∈ R_n ∩ Z^d, let ω(s) = [2^d det(s∆_n)]^{−1} |{i ∈ J_OL : s ∈ i + s∆_n R_0}| so that 0 ≤ ω(s) ≤ 1. By Condition M_r and Theorem 3, Doukhan (1994, p. 31) (similar to Lemma 8.2),

  A_{1n} ≤ E(Ȳ_n^4) = ( (2^d det(s∆_n))^4/(|J_OL| sN_n)^4 ) E[ ( Σ_{s∈R_n∩Z^d} ω(s)∇′(Z(s) − µ) )^4 ] ≤ C · (N_n)^2 (det(s∆_n))^4/(|J_OL| sN_n)^4.   (8.16)

Then A_{1n} = o( det(s∆_n)/det(∆_n) ) follows from Lemma 8.3.
To handle A_{4n}, write σ_{1n}(k) = Cov(Y_{0,n}Q_{0,n}, Y_{k,n}Q_{k,n}), k ∈ Z^d. Then,

  A_{4n} = ((sN_n)^2/|J_OL|^2) Σ_{k∈Z^d} J_n(k)σ_{1n}(k) ≤ ((sN_n)^2/|J_OL|) Σ_{k∈E_n} |σ_{1n}(k)| + ((sN_n)^2/|J_OL|) Σ_{k∈Z^d\E_n} |σ_{1n}(k)| ≡ A_{4n}(E_n) + A_{4n}(E_n^c).

For k ∈ E_n, note |σ_{1n}(k)| ≤ C(sN_n)^{−3} using |Y_{0,n}Q_{0,n}| ≤ C‖Z_{0,n} − µ‖^3 (1 + ‖Z_{0,n} − µ‖^a) (from Condition D) with Lemmas 8.1-8.2. From this bound, Lemma 8.3, and |E_n| ≤ 3^d det(s∆_n), we find A_{4n}(E_n) = o( det(s∆_n)/det(∆_n) ). We next bound the covariances σ_{1n}(k), k ∈ Z^d \ E_n,

  |σ_{1n}(k)| ≤ 8 [ E(|Y_{0,n}Q_{0,n}|^{(2r+δ)/r}) ]^{2r/(2r+δ)} α( dis(R_{0,n} ∩ Z^d, R_{k,n} ∩ Z^d), sN_n )^{δ/(2r+δ)} ≤ C(sN_n)^{−3} α( dis(F_{0,n}, F_{k,n}), sN_n )^{δ/(2r+δ)}

by the stationarity of the random field Z(·) and Lemmas 8.1-8.2. Using this inequality and repeating the same steps used to majorize 'W_{2n}' from the proof of (8.12)(a) (see (8.15)), we have A_{4n}(E_n^c) = o( det(s∆_n)/det(∆_n) ). The proof of Theorem 3.1(a) is finished.
8.2 Proof of Theorem 3.1(b)

To simplify the counting arguments, we assume here that s∆_n ∈ Z^+, implying sN_{i,n} = sN_n, i ∈ Z^d. The more general case, in which the NOL subregions may differ in the number of sampling sites, is treated in Nordman (2002).

For each NOL subregion R̃_{i,n}, we denote the corresponding sample mean Z̃_{i,n} = (sN_{i,n})^{−1} Σ_{s∈R̃_{i,n}∩Z^d} Z(s). The subsample evaluations of the statistic of interest, θ̂_{i,n}, i ∈ J_NOL, can be expressed through a Taylor expansion of H(·) around µ, substituting Z̃_{i,n} for Z_{i,n} in (8.11): θ̂_{i,n} = H(Z̃_{i,n}) = H(µ) + Ỹ_{i,n} + Q̃_{i,n}.
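A matching toy computation for the NOL design (again our illustration only, under the same sample-mean assumption H = identity as the OL sketch earlier; not the paper's code) partitions the grid into disjoint blocks:

```python
import numpy as np

def tau2_NOL(Z, m):
    """NOL subsample variance estimator for the mean: sN_n times the
    sample variance of the means over disjoint m x m blocks of Z (d = 2)."""
    n1, n2 = (Z.shape[0] // m) * m, (Z.shape[1] // m) * m
    blocks = Z[:n1, :n2].reshape(n1 // m, m, n2 // m, m)
    means = blocks.mean(axis=(1, 3))     # one mean per disjoint block
    return m**2 * means.var()

rng = np.random.default_rng(1)
Z = rng.standard_normal((60, 60))
print(tau2_NOL(Z, 6))                    # near tau^2 = 1 for white noise
```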
We will complete the proof of Theorem 3.1(b) in two parts by showing

  (a) Var( (sN_n/|J_NOL|) Σ_{i∈J_NOL} Ỹ_{i,n}^2 ) = ( det(s∆_n)/(det(∆_n)|R_0|) ) · [2τ^4](1 + o(1)),
  (b) Var(τ̂_{n,NOL}^2) − Var( (sN_n/|J_NOL|) Σ_{i∈J_NOL} Ỹ_{i,n}^2 ) = o( det(s∆_n)/det(∆_n) ).   (8.17)
We will begin with showing (8.17)(a). For k ∈ Z^d, let J̃_n(k) = |{i ∈ J_NOL : i + k ∈ J_NOL}| and σ̃_n(k) = Cov(Ỹ_{0,n}^2, Ỹ_{k,n}^2). Then we may express the variance:

  Var( (sN_n/|J_NOL|) Σ_{i∈J_NOL} Ỹ_{i,n}^2 ) = ((sN_n)^2/|J_NOL|^2) [ Σ_{k∈Z^d, 0<‖k‖_∞≤1} J̃_n(k)σ̃_n(k) + Σ_{k∈Z^d, ‖k‖_∞>1} J̃_n(k)σ̃_n(k) + |J_NOL| σ̃_n(0) ]
    ≡ U_{1n} + U_{2n} + |J_NOL|^{−1}(sN_n)^2 σ̃_n(0).   (8.18)
We first prove U_{2n} = o(|J_NOL|^{−1}), noting that det(s∆_n)/det(∆_n) = O(|J_NOL|^{−1}) from Lemma 8.3. When k = (k_1, . . . , k_d)′ ∈ Z^d, ‖k‖_∞ > 1, then for some 1 ≤ m_k ≤ d:

  dis(R̃_{0,n} ∩ Z^d, R̃_{k,n} ∩ Z^d) ≥ dis( s∆_n[−1/2, 1/2]^d, s∆_n(k + [−1/2, 1/2]^d) ) = max_{1≤j≤d} (|k_j| − 1) sλ_j^(n) ≡ (|k_{m_k}| − 1) sλ_{m_k}^(n).

If j ∈ {1, . . . , d}, j ≠ m_k, we have |k_j| ≤ (sλ_j^(n))^{−1}(|k_{m_k}| − 1) sλ_{m_k}^(n) + 1. Note also if k ∈ Z^d, ‖k‖_∞ > 1, then |σ̃_n(k)| ≤ C(sN_n)^{−2} α( (|k_{m_k}| − 1) sλ_{m_k}^(n), sN_n )^{δ/(2r+δ)} by Lemmas 8.1-8.2. Hence, we have
  |U_{2n}| ≤ (C/|J_NOL|) Σ_{x=1}^∞ Σ_{j=1}^d (sλ_j^(n))^{−(d−1)} Π_{i=1,i≠j}^d (sλ_i^(n) x + sλ_j^(n)) α(sλ_min^(n) x, sN_n)^{δ/(2r+δ)}
    ≤ (C/|J_NOL|) (sλ_n^max/sλ_min^(n))^{d−1} Σ_{x=1}^∞ x^{d−1} α(sλ_min^(n) x, sN_n)^{δ/(2r+δ)}
    ≤ (C/|J_NOL|) ( {det(s∆_n)}^{κδ/(2r+δ)}/(sλ_min^(n))^{2rd−d−1} ) Σ_{x=1}^∞ (sλ_min^(n) x)^{2rd−d−1} α_1(sλ_min^(n) x)^{δ/(2r+δ)} = o( 1/|J_NOL| ),

by Assumptions A.1, A.3 and Condition M_r.
We now show that U_{1n} = o(|J_NOL|^{−1}). For k ∈ Z^d, 0 < ‖k‖_∞ ≤ 1, define the trimmed set:

  T_{k,n}^j = {x ∈ IR^d : (1/2) sλ_j^(n) < x_j ≤ (1/2) sλ_j^(n) + ℓ_n}   if k_j = 1,
            {x ∈ IR^d : −(1/2) sλ_j^(n) − ℓ_n < x_j ≤ −(1/2) sλ_j^(n)}   if k_j = −1,
            ∅   if k_j = 0,

for each coordinate direction j = 1, . . . , d. Let T_{k,n} = ∪_{j=1}^d T_{k,n}^j. We decompose the sum: sN_n Ỹ_{k,n} = Σ(R̃_{k,n} \ T_{k,n}) + Σ(R̃_{k,n} ∩ T_{k,n}) ≡ S̃_{k,n} + S̃_{k,n}^*. Then, U_{1n} = o(|J_NOL|^{−1}) follows from (1)-(4) below:
(1) |E(Ỹ_{0,n}^2 S̃_{k,n} S̃_{k,n}^*)| ≤ [ E(Ỹ_{0,n}^6) E(|S̃_{k,n}|^3) E(|S̃_{k,n}^*|^3) ]^{1/3} = o(1), using Lemma 8.2 and

  |R̃_{k,n} ∩ T_{k,n} ∩ Z^d| ≤ Σ_{j=1}^d |R̃_{k,n} ∩ T_{k,n}^j ∩ Z^d| ≤ ℓ_n det(s∆_n) Σ_{j=1}^d (sλ_j^(n))^{−1}.

(2) Likewise, E(Ỹ_{0,n}^2 S̃_{k,n}^{*2}) ≤ [ E(Ỹ_{0,n}^4) E(S̃_{k,n}^{*4}) ]^{1/2} = o(1).

(3) | sN_n E(Ỹ_{k,n}^2) − (sN_n)^{−1} E(S̃_{k,n}^2) | ≤ 4(sN_n)^{−1} max{ [E(S̃_{k,n}^2) E(S̃_{k,n}^{*2})]^{1/2}, E(S̃_{k,n}^{*2}) } = o(1).

(4) |Cov(Ỹ_{0,n}^2, S̃_{k,n}^2)| ≤ C α(ℓ_n, sN_n)^{δ/(2r+δ)} = o(1) by applying Lemmas 8.1-8.2, Assumption A.3, and Condition M_r with dis(R̃_{k,n} ∩ Z^d \ T_{k,n}, sR_n ∩ Z^d) ≥ ℓ_n.
Since σ̃_n(0) = Var(Ỹ_{0,n}^2), the remaining quantity in (8.18) can be expressed as

  ((sN_n)^2/|J_NOL|) σ̃_n(0) = (1/|J_NOL|) Var([∇′Z_∞]^2)(1 + o(1)) = ( det(s∆_n)/(det(∆_n)|R_0|) ) [2τ^4](1 + o(1))

by applying the CLT (as in (8.8)) and Lemma 8.3. We have now established (8.17)(a).

We omit the proof of (8.17)(b), which resembles the one establishing (8.12)(b) and incorporates arguments used to bound U_{1n}, U_{2n}; Nordman (2002) provides more details.

9 Proofs for bias expansions
We will use the following lemma concerning τ_n^2 = N_n Var(θ̂_n) to prove the theorems pertaining to bias expansions of τ̂_{n,OL}^2 and τ̂_{n,NOL}^2.

Lemma 9.1 Under the Assumptions and Conditions of Theorem 3.1,

  τ_n^2 = τ^2 + O( [det(∆_n)]^{−1/2} ).
Proof: By a Taylor expansion around µ: θ̂_n = H(Z̄_{N_n}) = H(µ) + Ȳ_{N_n} + Q̄_{N_n} (substituting Z̄_{N_n} for Z_{i,n} in (8.11)) and so N_n Var(θ̂_n) = N_n Var(Ȳ_{N_n} + Q̄_{N_n}). For k ∈ Z^d, let N_n(k) = |{i ∈ R_n ∩ Z^d : i + k ∈ R_n}|. It holds that N_n(k) ≤ N_n and

  N_n ≤ N_n(k) + |{i ∈ Z^d : T_i ∩ R_n ≠ ∅, T_i ∩ R_n^c ≠ ∅; T_i = i + ‖k‖_∞[−1, 1]^d}| ≤ N_n(k) + C‖k‖_∞^d (λ_n^max)^{d−1},   (9.1)

by the boundary condition on R_0. Also, by Lemma 8.1 and stationarity, for each k ≠ 0 ∈ Z^d:

  |σ(k)| ≤ C α_1(‖k‖_∞)^{δ/(2r+δ)}.   (9.2)

Using |{k ∈ Z^d : ‖k‖_∞ = x}| ≤ Cx^{d−1}, x ≥ 1, the covariances are absolutely summable over Z^d:

  Σ_{k∈Z^d} |σ(k)| ≤ |σ(0)| + C Σ_{x=1}^∞ x^{d−1} α_1(x)^{δ/(2r+δ)} < ∞.   (9.3)

From (9.1)-(9.3), we find

  N_n Var(Ȳ_{N_n}) = (1/N_n) Σ_{k∈Z^d} N_n(k)σ(k) = τ^2 + I_n;   (9.4)

  |I_n| ≤ (1/N_n) Σ_{k∈Z^d} |N_n − N_n(k)| · |σ(k)| ≤ C · ( (λ_n^max)^{d−1}/N_n ) Σ_{x=1}^∞ x^{2d−1} α_1(x)^{δ/(2r+δ)} = O( [det(∆_n)]^{−1/d} ).   (9.5)

By Condition D and Lemma 8.2, it follows that N_n Var(Q̄_{N_n}) = O([det(∆_n)]^{−1}). Finally, with bounds on the variance of Ȳ_{N_n} and Q̄_{N_n}, we apply the Cauchy-Schwarz inequality to the covariance: N_n |Cov(Ȳ_{N_n}, Q̄_{N_n})| = O([det(∆_n)]^{−1/2}), setting the order on the difference |N_n Var(θ̂_n) − τ^2|.
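The exact identity in (9.4) — the normalized variance of the sample mean as a lattice-count-weighted covariance sum — can be verified directly in small cases. A one-dimensional check (ours; the AR(1)-type covariance is a hypothetical example, and N_n(k) = n − |k| for an interval):

```python
import numpy as np

# Verify  N_n Var(mean) = (1/N_n) sum_k N_n(k) sigma(k)  exactly for a
# stationary 1-D process with covariance sigma(k) = rho^|k| on {1,...,n}.
n, rho = 50, 0.5
sigma = lambda k: rho ** abs(k)
lhs = np.sum([[sigma(i - j) for j in range(n)] for i in range(n)]) / n
rhs = sum((n - abs(k)) * sigma(k) for k in range(-(n - 1), n)) / n
print(lhs, rhs)   # identical; both approach tau^2 = sum_k sigma(k) as n grows
```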
We give a few lemmas which help compute the bias of the estimators τ̂_{n,OL}^2 and τ̂_{n,NOL}^2.

Lemma 9.2 Let Ỹ_{i,n} = (sN_{i,n})^{−1} Σ_{s∈Z^d∩R̃_{i,n}} ∇′(Z(s) − µ), i ∈ Z^d. Suppose Assumptions A.1 - A.5 and Conditions D_2 and M_{2+a} hold with d ≥ 2. Then,

  E(τ̂_{n,OL}^2) − sN_{0,n} E(Ỹ_{0,n}^2),
  E(τ̂_{n,NOL}^2) − |J_NOL|^{−1} Σ_{i∈J_NOL} sN_{i,n} E(Ỹ_{i,n}^2)   =  O( [det(s∆_n)]^{−1/2} ) + o( [det(s∆_n)]^{−1/d} ).
Proof: We consider here only E(τ̂_{n,OL}^2). For integer s∆_n, the arguments for E(τ̂_{n,NOL}^2) are essentially the same; more details are provided in Nordman (2002).

By stationarity and an algebraic expansion:

  E(τ̂_{n,OL}^2) = sN_n [ E(Y_{0,n}^2) + E(Q_{0,n}^2) + 2E(Y_{0,n}Q_{0,n}) − E(Ȳ_n^2) − E(Q̄_n^2) − 2E(Ȳ_n Q̄_n) ].

With the moment arguments based on Lemma 8.2 and Condition D_r, we have

  sN_n E(Y_{0,n}^2) ≤ C,   sN_n E(Ȳ_n^2) ≤ C sN_n (N_n)^{−1},   sN_n E(Q_{0,n}^2), sN_n E(Q̄_n^2) ≤ C(sN_n)^{−1},   (9.6)

where the bound on sN_n E(Ȳ_n^2) follows from (8.16). By Hölder's inequality and Assumption A.2:

  E(τ̂_{n,OL}^2) = sN_n E(Y_{0,n}^2) + O( (sN_n)^{−1/2} ) + O( sN_n (N_n)^{−1} ).

Note that E(Y_{0,n}^2) = E(Ỹ_{0,n}^2), sN_n = sN_{0,n}. Hence, applying Lemma 8.3 and Assumption A.2, we establish Lemma 9.2 for τ̂_{n,OL}^2.
The next lemma provides a small refinement to Lemma 9.2 made possible when the function H(·) is smoother. We shall make use of this lemma in bias expansions of τ̂_{n,OL}^2 and τ̂_{n,NOL}^2 in lower sampling dimensions, namely d = 1 or 2.
Lemma 9.3 Assume d = 1 or 2. In addition to Assumptions A.1 - A.5, suppose that Conditions D_3 and M_{3+a} hold. Then,

  E(τ̂_{n,OL}^2) − sN_{0,n} E(Ỹ_{0,n}^2),
  E(τ̂_{n,NOL}^2) − |J_NOL|^{−1} Σ_{i∈J_NOL} sN_{i,n} E(Ỹ_{i,n}^2)   =  { O( [det(s∆_n)]^{−1} ) if d = 1;  o( [det(s∆_n)]^{−1/2} ) if d = 2.
Proof: We again consider only τ̂_{n,OL}^2. For i ∈ J_OL, we use a third-order Taylor expansion of each subsample statistic around µ: θ̂_{i,n} = H(µ) + Y_{i,n} + Q_{i,n} + C_{i,n}, where Y_{i,n} = ∇′(Z_{i,n} − µ);

  Q_{i,n} = Σ_{‖α‖_1=2} (c_α/α!) (Z_{i,n} − µ)^α;   C_{i,n} = 3 Σ_{‖α‖_1=3} ((Z_{i,n} − µ)^α/α!) ∫_0^1 (1 − ω)^2 D^α H(µ + ω(Z_{i,n} − µ)) dω.

Here C_{i,n} denotes the remainder term in the Taylor expansion and Q_{i,n} is defined a little differently. Write the sample means for the Taylor terms Ȳ_n, Q̄_n as before, and C̄_n = |J_OL|^{−1} Σ_{i∈J_OL} C_{i,n}. The moment inequalities in (9.6) are still valid and, by Lemma 8.2 and Condition D, we can produce the bounds: sN_n E(C_{0,n}^2), sN_n E(C̄_n^2) ≤ C(sN_n)^{−2}. By Hölder's inequality and the scaling conditions from Assumptions A.1-A.2, we then have

  E(τ̂_{n,OL}^2) = sN_n [ E(Y_{0,n}^2) + 2E(Y_{0,n}Q_{0,n}) ] + { O( [det(s∆_n)]^{−1} ) if d = 1;  o( [det(s∆_n)]^{−1/2} ) if d = 2.

Since sN_n E(Y_{0,n}^2) = sN_{0,n} E(Ỹ_{0,n}^2), Lemma 9.3 for τ̂_{n,OL}^2 will follow by showing

  sN_n E(Y_{0,n}Q_{0,n}) = sN_n Σ_{i,j,k=1}^p c_i a_{j,k} E[ (Z_{i,0,n} − µ)(Z_{j,0,n} − µ)(Z_{k,0,n} − µ) ] = O( [det(s∆_n)]^{−1} ),   (9.7)

where Z_{0,n} = (Z_{1,0,n}, . . . , Z_{p,0,n})′ ∈ IR^p is a vector of coordinate sample means; c_i = ∂H(µ)/∂x_i; a_{j,k} = (1/2) ∂^2 H(µ)/∂x_j ∂x_k.
Denote the observation Z(s) = (Z_1(s), . . . , Z_p(s))′ ∈ IR^p, s ∈ Z^d. Fix i, j, k ∈ {1, . . . , p} and WLOG assume µ = 0. Then, sN_n E(Z_{i,0,n} Z_{j,0,n} Z_{k,0,n}) = (sN_n)^{−1} E(Z_i(t)Z_j(t)Z_k(t)) + L_{1n}^{ijk} + L_{2n}^{ijk} where

  L_{1n}^{ijk} = (sN_n)^{−2} Σ_{u,v,w∈Z^d∩sR_n, u≠v≠w} E[ Z_i(u)Z_j(v)Z_k(w) ],
  L_{2n}^{ijk} = (sN_n)^{−2} Σ_{u,v∈Z^d∩sR_n, u≠v} E[ Z_i(u)Z_j(u)Z_k(v) + Z_i(u)Z_j(v)Z_k(u) + Z_i(v)Z_j(u)Z_k(u) ].

By Lemma 8.1, Assumption A.3, and Condition M_r,

  |L_{2n}^{ijk}| ≤ (C/sN_n) Σ_{x=1}^∞ x^{d−1} α(x, 1)^{δ/(2r+δ)} = O( [det(s∆_n)]^{−1} ),

similar to (9.3). For y_1, y_2, y_3 ∈ IR^d, define dis_3({y_1, y_2, y_3}) = max_{1≤i≤3} dis({y_i}, {y_1, y_2, y_3} \ {y_i}). If x ≥ 1 ∈ Z^+, then |{(y_1, y_2) ∈ (Z^d)^2 : dis_3({y_1, y_2, 0}) = x}| ≤ Cx^{2d−1} from Theorem 4.1, Lahiri (1999a). Thus,

  |L_{1n}^{ijk}| ≤ (C/sN_n) Σ_{x=1}^∞ x^{2d−1} α(x, 2)^{δ/(2r+δ)} = O( [det(s∆_n)]^{−1} ).

This establishes (9.7), completing the proof of Lemma 9.3 for τ̂_{n,OL}^2.
We use the next lemma in the proof of Theorem 4.2. It allows us to approximate lattice point counts with Lebesgue volumes, in IR^2 or IR^3, to a sufficient degree of accuracy.

Lemma 9.4 Let d = 2, 3 and R_0 ⊂ (−1/2, 1/2]^d such that B° ⊂ R_0 ⊂ B for a convex set B. Let {b_n}_{n=1}^∞ be a sequence of positive real numbers such that b_n → ∞. If k ∈ Z^d, then there exist N_k ∈ Z^+ and C_d > 0 such that for n ≥ N_k, i ∈ Z^d,

  | ( |b_n R_0| − |Z^d ∩ b_n(i + R_0)| ) − ( |b_n R_0 ∩ (k + b_n R_0)| − |Z^d ∩ b_n(i + R_0) ∩ (k + b_n(i + R_0))| ) | ≤ { C_2 ‖k‖_∞^2 if d = 2;  C_3 ‖k‖_∞^4 b_n^{5/3} + ξ_{k,n} b_n^2 if d = 3 },

where {ξ_{k,n}}_{n=1}^∞ ⊂ IR is a nonnegative sequence (possibly dependent on k) such that ξ_{k,n} → 0.

Proof: Provided in Nordman and Lahiri (2002).

To establish Lemma 4.1, we require some additional notation. For i, k ∈ Z^d, let sN_{i,n}(k) = |Z^d ∩ R̃_{i,n} ∩ (k + R̃_{i,n})| denote the number of sampling sites (lattice points) in the intersection of a NOL subregion with its k-translate. Note sN_{i,n}(k) is a subsample version of N_n(k) from (9.1).
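Lemma 9.4 concerns how closely lattice point counts track Lebesgue volumes for inflated convex sets, the theme of the van der Corput and Huxley references. The boundary-order discrepancy is easy to observe numerically; the sketch below (ours, purely illustrative) counts lattice points in an inflated disc in d = 2:

```python
import math

def disc_count(b):
    """Number of integer lattice points in the disc of radius b/2
    (the inflation b*R0 of a disc R0 of radius 1/2)."""
    r = b / 2.0
    m = int(math.floor(r))
    return sum(2 * int(math.floor(math.sqrt(r * r - x * x))) + 1
               for x in range(-m, m + 1))

for b in (50, 100, 200, 400):
    area = math.pi * (b / 2.0) ** 2
    print(b, disc_count(b) - area)   # discrepancy of boundary order (~b),
                                     # far smaller than the area (~b^2)
```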
Proof of Lemma 4.1: We start with the bounds:

  sup_{i∈Z^d} |sN_n − sN_{i,n}| ≤ C(sλ_n^max)^{d−1},   (9.8)

  |sN_{i,n} − sN_{i,n}(k)| ≤ |{j ∈ Z^d : T_j ∩ sR_n ≠ ∅, T_j ∩ sR_n^c ≠ ∅; T_j = j + ‖k‖_∞[−2, 2]^d}| ≤ C‖k‖_∞^d (sλ_n^max)^{d−1},   (9.9)

by the boundary condition on R_0 (cf. Lemma 8.3) and inf_{j∈Z^d} ‖s∆_n i − j‖_∞ ≤ 1/2.
Modify (9.4) by replacing N_n, N_n(k), Ȳ_{N_n} with sN_{i,n}, sN_{i,n}(k), Ỹ_{i,n} = ∇′(Z̃_{i,n} − µ) (i.e., use a NOL subsample version in place of the sample one); and replace N_n, ∆_n, λ_n^max with the subsample analogs sN_{i,n}, s∆_n, sλ_n^max in (9.5). We then find using (9.3): for each i ∈ Z^d,

  sN_{i,n} E(Ỹ_{i,n}^2) − τ^2 = (1/sN_{i,n}) Σ_{k∈Z^d} (sN_{i,n}(k) − sN_{i,n}) σ(k) ≡ sI_{i,n},   (9.10)
  sup_{i∈Z^d} |sI_{i,n}| ≤ sup_{i∈Z^d} (1/sN_{i,n}) Σ_{k∈Z^d} |sN_{i,n}(k) − sN_{i,n}| · |σ(k)| ≤ C · ( (sλ_n^max)^{d−1}/(sN_n − C(sλ_n^max)^{d−1}) ) Σ_{x=1}^∞ x^{2d−1} α_1(x)^{δ/(2r+δ)} = O( [det(s∆_n)]^{−1/d} ),   (9.11)

from (9.8)-(9.9) and Assumption A.1. Now applying Lemma 9.1 and Assumption A.2 with Lemma 9.2 for d ≥ 2 or Lemma 9.3 for d = 1, Lemma 4.1 follows.

Proof of Theorem 4.1: Here sN_{i,n} = sN_n, sN_{i,n}(k) = C_n(k), E(Ỹ_{i,n}) = E(Ỹ_{0,n}) for each i, k ∈ Z^d (since sλ_n ∈ Z^+) and det(s∆_n) = sλ_n^d. Applying Lemma 9.2 for d ≥ 3 and Lemma 9.3 for d = 2, Lemma 9.1, Assumption A.2, and (9.11):

  E(τ̂_n^2) − τ_n^2 = (sλ_n|R_0|)^{−1} Σ_{k∈Z^d} g_n(k) + o(sλ_n^{−1}),   g_n(k) ≡ ( (sN_n − C_n(k))/sN_n ) · ( sλ_n^d |R_0|/sλ_n^{d−1} ) · σ(k).

From (9.11) and Lemma 8.3, it follows that Σ_{k∈Z^d} |g_n(k)| ≤ C, n ∈ Z^+, and that g_n(k) → C(k)σ(k) for k ∈ Z^d. By the LDCT, the proof of Theorem 4.1 is complete.

To establish Theorem 4.2, we require some additional notation. For i, k ∈ Z^d, denote the difference between two Lebesgue volume-for-lattice-point-count approximations as:
  D_{i,n}(k) = ( |R̃_{i,n}| − sN_{i,n} ) − ( |R̃_{i,n} ∩ (k + R̃_{i,n})| − sN_{i,n}(k) ) = ( |sR_n| − sN_{i,n} ) − ( |sR_n ∩ (k + sR_n)| − sN_{i,n}(k) ).
Proof of Theorem 4.2: We handle here the cases d = 2 or 3. Details on the proof for d = 1 are given in Nordman (2002). We note first that if V(k) exists for each k ∈ Z^d, then Lemma 9.4 implies C(k) = V(k).

Consider τ̂_{n,NOL}^2. Applying Lemma 9.2 for d = 3, and Lemma 9.3 for d = 2, with (9.8), (9.10), and (9.11) gives

  E(τ̂_{n,NOL}^2) − τ_n^2 = |J_NOL|^{−1} Σ_{i∈J_NOL} (sN_{i,n}/|sR_n|) sI_{i,n} + o(sλ_n^{−1}).

Then, using (9.3), we can arrange terms to write

  |J_NOL|^{−1} Σ_{i∈J_NOL} (sN_{i,n}/|sR_n|) sI_{i,n} = Ψ_n + Σ_{k∈Z^d} G_n(k)/(sλ_n|R_0|);   G_n(k) = Σ_{i∈J_NOL} D_{i,n}(k) · σ(k)/( sλ_n^{d−1} |J_NOL| ),

for Ψ_n = − Σ_{k∈Z^d} |sR_n|^{−1}( |sR_n| − |sR_n ∩ (k + sR_n)| ) σ(k). Since R_0 is convex, the boundary condition is valid and it holds that for all i, k ∈ Z^d:

  | sN_{i,n}(k) − |sR_n ∩ (k + sR_n)| | ≤ C(sλ_n^max)^{d−1},   | |sR_n| − |sR_n ∩ (k + sR_n)| | ≤ C‖k‖_∞^d (sλ_n^max)^{d−1},   (9.12)

from Lemma 8.3 and (9.9). Then (9.3), Lemma 9.4, and (9.12) give: Σ_{k∈Z^d} |G_n(k)| ≤ C, n ∈ Z^+; G_n(k) → 0 for k ∈ Z^d; and sλ_n Ψ_n = O(1). By the LDCT, we establish

  Σ_{k∈Z^d} G_n(k)/(sλ_n|R_0|) = o(sλ_n^{−1}),   E(τ̂_{n,NOL}^2) − τ_n^2 = Ψ_n(1 + o(1)),

representing the formulation of Theorem 4.2 in terms of Ψ_n. If V(k) exists for each k ∈ Z^d, then (9.3) and (9.12) imply that we can use the LDCT again to produce

  Ψ_n = ( −(sλ_n|R_0|)^{−1} Σ_{k∈Z^d} V(k)σ(k) ) (1 + o(1)).   (9.13)
The proof of Theorem 4.2 for τ̂_{n,NOL}^2 is now finished.

Consider τ̂_{n,OL}^2. We can repeat the same steps as above to find

  E(τ̂_{n,OL}^2) − τ_n^2 = Ψ_n + Σ_{k∈Z^d} G_n^*(k)/(sλ_n|R_0|) + o(sλ_n^{−1}),   G_n^*(k) = D_{0,n}(k) · σ(k)/sλ_n^{d−1}.

The same arguments for G_n apply to G_n^*, and (9.13) remains valid when each V(k) exists, k ∈ Z^d, establishing Theorem 4.2 for τ̂_{n,OL}^2. Note as well that if V(k) exists for each k ∈ Z^d, then Lemma 9.4 and Lemma 4.1 also imply the second formulation of the bias in Theorem 4.2.

Proof of Theorem 5.1: Follows from Theorems 3.1 and 4.1 and simple calculus arguments involving minimization of a smooth function of a real variable.
Proof of Theorem 6.1: We develop some tools to facilitate the counting arguments required for expanding the biases of τ̂_{n,OL}^2 and τ̂_{n,NOL}^2. We first define a count for the number of Z^d lattice points lying in a "face" of a d-dimensional rectangle. For a rectangle T, where Π_{j=1}^d (c_j, c̃_j) ⊂ T ⊂ Π_{j=1}^d [c_j, c̃_j], c_j, c̃_j ∈ IR, define the "border" point set: B{T} = ∪_{j=1}^d { s = (s_1, . . . , s_d)′ ∈ Z^d ∩ T : s_j ∈ {c_j, c̃_j} }. With k ∈ Z^d, it can be shown that there exists N_k ∈ Z^+ such that: for n ≥ N_k,

  | |B{sλ_n(i + R_0)}| − |B{sλ_n(i + R_0) ∩ (k + sλ_n(i + R_0))}| | ≤ 32d(d + 1)‖k‖_∞ sλ_n^{d−2},   i ∈ Z^d.   (9.14)

See Nordman (2002) for more details.
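The border count B{T} can be checked by brute force for small boxes: for a d-dimensional box of side b it is of order b^{d−1}, so differences of such counts, as in (9.14), can gain a further factor of sλ_n^{−1}. A small enumeration (ours; the box sides are hypothetical):

```python
import itertools

def border_count(lo, hi):
    """|B{T}| for the box T = prod_j [lo_j, hi_j]: lattice points of T
    having some coordinate equal to a face value lo_j or hi_j."""
    pts = itertools.product(*[range(int(a), int(b) + 1)
                              for a, b in zip(lo, hi)])
    return sum(any(s_j in (a, b) for s_j, a, b in zip(s, lo, hi))
               for s in pts)

b = 10                                       # hypothetical box side, d = 3
print(border_count((0, 0, 0), (b, b, b)))    # = (b+1)^3 - (b-1)^3, order b^2
```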
Fix k ≠ 0 ∈ Z^d and assume min_{1≤j≤d} sλ_n ℓ_j > 3 + ‖k‖_∞. We then write, for i ∈ Z^d: sN_{i,n} = P_{5,i,n} − P_{4,i,n} + P_{3,i,n}, sN_{i,n}(k) = P_{6,i,n} − P_{7,i,n} + P_{8,i,n}, and

  |D_{i,n}(k)| = | Σ_{j=1}^4 (P_{2j−1,i,n} − P_{2j,i,n}) |;

  P_{1,i,n} = Π_{j=1}^d sλ_n ℓ_j,   P_{2,i,n} = Π_{j=1}^d (sλ_n ℓ_j − |k_j|),   P_{3,i,n} = |B{sλ_n(i + R_0)}|,   P_{4,i,n} = |B{sλ_n(i + R_0)}|,
  P_{5,i,n} = Π_{j=1}^d |sλ_n(i_j + ℓ_j[−1/2, 1/2]) − t_j ∩ Z|,   P_{6,i,n} = Π_{j=1}^d ( |sλ_n(i_j + ℓ_j[−1/2, 1/2]) − t_j ∩ Z| − |k_j| ),
  P_{7,i,n} = |B{sλ_n(i + R_0) ∩ (k + sλ_n(i + R_0))}|,   P_{8,i,n} = |B{sλ_n(i + R_0) ∩ (k + sλ_n(i + R_0))}|.

We show now that, for k ≠ 0, there exist C > 0, N_k ∈ Z^+, such that: for n ≥ N_k,

  |D_{i,n}(k)| = | Σ_{j=1}^4 (P_{2j−1,i,n} − P_{2j,i,n}) | ≤ C‖k‖_∞^{d−1} sλ_n^{d−2},   i ∈ Z^d.   (9.15)

By (9.14), for large n ≥ N_k, we have that 0 ≤ (P_{3,i,n} − P_{8,i,n}) ≤ (P_{4,i,n} − P_{7,i,n}) ≤ C‖k‖_∞ sλ_n^{d−2}. Note that: | |sλ_n(i + ℓ_j[−1/2, 1/2]) − t_j ∩ Z| − sλ_n ℓ_j | ≤ 3 for i ∈ Z, j ∈ {1, . . . , d}. Hence,

  |(P_{1,i,n} − P_{5,i,n}) − (P_{2,i,n} − P_{6,i,n})| ≤ Σ_{A⊂{1,...,d}, 1≤|A|<d} ‖k‖_∞^{|A|} | Π_{j∈A} sλ_n ℓ_j − Π_{j∈A} |sλ_n(i_j + ℓ_j[−1/2, 1/2]) − t_j ∩ Z| | ≤ C‖k‖_∞^{d−1} sλ_n^{d−2},

which holds uniformly in i ∈ Z^d. With these bounds, we have (9.15).

Applying (9.15) in place of Lemma 9.4, the same proof for Theorem 4.2 establishes Theorem 6.1.

References
Billingsley, P. (1986). Probability and Measure, 2nd Edition. John Wiley & Sons, New York.
Bolthausen, E. (1982). On the central limit theorem for stationary mixing random fields. Ann. Probab.
10, 1047-1050.
Bradley, R. C. (1989). A caution on mixing conditions for random fields. Statist. Probab. Letters 8,
489-491.
Bühlmann, P. and Künsch, H. R. (1999). Block length selection in the bootstrap for time series.
Comput. Stat. Data Anal. 31, 295-310.
Carlstein, E. (1986). The use of subseries methods for estimating the variance of a general statistic
from a stationary time series. Ann. Statist. 14, 1171-1179.
Corput, J. G. van der (1920). Über Gitterpunkte in der Ebene. Math. Ann. 81, 1-20.
Cressie, N. (1991). Statistics for Spatial Data, 1st Edition. John Wiley & Sons, New York.
DiCiccio, T., Hall, P., and Romano, J. P. (1991). Empirical likelihood is Bartlett-correctable. Ann.
Statist. 19, 1053-1061.
Doukhan, P. (1994). Mixing: properties and examples. Lecture Notes in Statistics 85. Springer-Verlag,
New York.
Fukuchi, J.-I. (1999). Subsampling and model selection in time series analysis. Biometrika 86, 591-604.
Garcia-Soidan, P. H. and Hall, P. (1997). In sample reuse methods for spatial data. Biometrics 53,
273-281.
Guyon, X. (1995). Random Fields on a Network. Springer-Verlag, New York.
Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
Hall, P., Horowitz, J. L., and Jing, B.-Y. (1995). On blocking rules for the bootstrap with dependent
data. Biometrika 82, 561-574.
Hall, P. and Jing, B.-Y. (1996). On sample reuse methods for dependent data. J. R. Stat. Soc. Ser.
B 58, 727-737.
Huxley, M. N. (1993). Exponential sums and lattice points II. Proc. London Math. Soc. 66, 279-301.
Huxley, M. N. (1996). Area, Lattice Points, and Exponential Sums. Oxford University Press, New
York.
Krätzel, E. (1988). Lattice Points. Deutscher Verlag Wiss., Berlin.
Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Ann.
Statist. 17, 1217-1241.
Lahiri, S. N. (1996). On empirical choice of the optimal block size for block bootstrap methods.
Preprint. Department of Statistics, Iowa State University, Ames, IA.
Lahiri, S. N. (1999a). Asymptotic distribution of the empirical spatial cumulative distribution function
predictor and prediction bands based on a subsampling method. Probab. Theory Related Fields
114, 55-84.
Lahiri, S. N. (1999b). Central limit theorems for weighted sums under some spatial sampling schemes.
Preprint. Department of Statistics, Iowa State University, Ames, IA.
Lahiri, S. N. (1999c). Theoretical comparisons of block bootstrap methods. Ann. Statist. 27, 386-404.
Léger, C., Politis, D. N., and Romano, J. (1992). Bootstrap technology and applications. Technometrics 34, 378-399.
Martin, R. J. (1990). The use of time-series models and methods in the analysis of agricultural field
trials. Comm. Statist.-Theory Methods 19, 55-81.
Meketon, M. S. and Schmeiser, B. (1984). Overlapping batch means: something for nothing? Proc.
Winter Simulation Conf., 227-230, S. Sheppard, U. Pooch, and D. Pegden (editors). IEEE,
Piscataway, New Jersey.
Nordman, D. J. (2002). On optimal spatial subsample size for variance estimation. Dissertation paper.
Department of Statistics, Iowa State University, Ames, IA.
Nordman, D. J. and Lahiri, S. N. (2002). On the approximation of differenced lattice point counts
with application to statistical bias expansions. Dissertation paper/technical report. Department of Statistics, Iowa State University, Ames, IA.
Perera, G. (1997). Geometry of Zd and the central limit theorem for weakly dependent random fields.
J. Theoret. Probab. 10, 581-603.
Politis, D. N. and Romano, J. P. (1993a). Nonparametric resampling for homogeneous strong mixing
random fields. J. Multivariate Anal. 47, 301-328.
Politis, D. N. and Romano, J. P. (1993b). On the sample variance of linear statistics derived from
mixing sequences. Stochastic Process. Appl. 45, 155-167.
Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based on subsamples under
minimal assumptions. Ann. Statist. 22, 2031-2050.
Politis, D. N. and Sherman, M. (2001). Moment estimation for statistics from marked point processes.
J. R. Stat. Soc. Ser. B 63, 261-275.
Politis, D. N., Romano, J. P., and Wolf, M. (1999). Subsampling. Springer-Verlag, New York.
Possolo, A. (1991). Subsampling a random field. IMS Lecture Notes 20, 286-294.
Sherman, M. (1996). Variance estimation for statistics computed from spatial lattice data. J. R. Stat.
Soc. Ser. B 58, 509-523.
Sherman, M. and Carlstein, E. (1994). Nonparametric estimation of the moments of a general
statistic computed from spatial data. J. Amer. Statist. Assoc. 89, 496-500.
Song, W. M. T. and Schmeiser, B. (1988). Estimating standard errors: empirical behavior of asymptotic MSE-optimal batch sizes. In Computer Science and Statistics: Proc. 20th Symp. Interface, 575-580, E. J. Wegman, D. T. Gantz, and J. J. Miller (editors). American Statistical Association.