Optimal block size for variance estimation by a spatial
block bootstrap method
running title: bootstrap block size
Daniel J. Nordman, Soumendra N. Lahiri
Iowa State University, Ames, USA
Brooke L. Fridley
Mayo Clinic, USA
Abstract
This paper considers the block selection problem for a block bootstrap variance estimator
applied to spatial data on a regular grid. We develop precise formulae for the optimal block
sizes that minimize the mean squared error of the bootstrap variance estimator. We then
describe practical methods for estimating these spatial block sizes and prove the consistency
of a block selection method by Hall, Horowitz and Jing (1995), originally introduced for
time series. The spatial block bootstrap method is illustrated through data examples and
its performance is investigated through several simulation studies.
AMS(2000) Subject Classification: Primary 62G09; Secondary 62M30
Key Words: block bootstrap, empirical block choice, stationary random fields.
1 Introduction
In recent years, different versions of block bootstrap methods have been proposed for spatial
data. As in the time series case, the accuracy of a block bootstrap estimator of a population
parameter critically depends on the block size employed. Although this is a very important
problem, there seems to be little information available in the literature about the (theoretical)
optimal block sizes for estimating a given parameter with a spatial block bootstrap. In this
paper, we investigate the problem of determining optimal block sizes for variance estimation
by a spatial block bootstrap method.
For time series data, expressions for the theoretical optimal block sizes with different block bootstrap methods are known (Künsch, 1989, Hall et al., 1995, Lahiri, 1999). In the time series case, the optimal block sizes depend on the blocking mechanism (i.e., overlapping/non-overlapping blocks) and the covariance structure of the process. In comparison, the optimal
block sizes in the spatial case are determined by the blocking mechanism, the covariance
structure of the spatial process, and the dimension of the spatial sampling region. For
a large class of “smooth function model” spatial statistics, the main results of this paper
give expansions for the mean squared error (MSE) of block bootstrap variance estimators
based on overlapping and non-overlapping versions of the spatial block bootstrap method.
Using these MSE expansions, we give explicit expressions for the optimal bootstrap block
sizes. It turns out that, due to the mechanics of the bootstrap, the optimal blocks in
the spatial case are a natural extension of the time series case under identical conditions
on the smooth function model. This result is surprising when compared with spatial subsampling variance estimators, as studied by Politis and Romano (1994a), Sherman and Carlstein (1994) and Sherman (1996). With subsampling, the formulae for optimal blocks may differ substantially between the spatial and temporal settings and may also require much smoother functions for time series data than for spatial data (Nordman and Lahiri, 2004).
We then develop data-based methods for estimating the optimal block size and study
their performance through simulation studies. In particular, we contribute the first proof of
consistency (to our knowledge) for a general block estimation technique of Hall et al. (1995),
which has been applied by Hall and Jing (1996), Politis and Sherman (2001) and Nordman
and Lahiri (2004) among others; the result applies to both time series and spatial block
bootstraps. We also illustrate the proposed methodology with two data examples, which
investigate spatial patterns in longleaf pine trees (Cressie, 1993) and a cancer mortality map
of the United States (Sherman and Carlstein, 1994).
The rest of the paper is organized as follows. We conclude this section with a brief
literature review. In Section 2, we describe the spatial sampling framework, the variance
estimation problem, and the spatial block bootstrap methods for variance estimation. In
Section 3, we state the assumptions and the main results on optimal block sizes. Empirical
methods for selecting the optimal block sizes are discussed in Section 4 while Section 5
summarizes a simulation study of the bootstrap estimators with empirical block choices.
Section 6 describes data examples of the bootstrap method. Proofs of all technical results
are given in the Appendix.
Block bootstrap methods for time series data and spatial data have been put forward
by Hall (1985), Künsch (1989), Liu and Singh (1992), Politis and Romano (1993, 1994b),
Politis et al. (1999) and Zhu and Lahiri (2007), among others. In contrast to the spatial setting, considerable research has focused on optimal block sizes for time series block bootstrap methods, including work by Künsch (1989), Hall et al. (1995), Lahiri (1996), Bühlmann and Künsch (1999), Lahiri et al. (2006), and Politis and White (2004). Nordman and
Lahiri (2004) derive expressions for optimal block sizes for a class of spatial subsampling
methods. See Lahiri (2003a, Chapter 12) for more on bootstrap methods for spatial data.
2 Block bootstrap variance estimators
2.1 Variance estimation problem with spatial data
We assume that the available spatial data are collected from a spatial sampling region $R_n \subset \mathbb{R}^d$ as follows. For a fixed vector $t \in [-1/2, 1/2)^d$, identify the $t$-translated integer lattice as $\mathbb{Z}^d_t \equiv t + \mathbb{Z}^d$. Suppose that a $p$-dimensional, stationary, weakly dependent spatial process $\{Z(s) : s \in \mathbb{Z}^d_t\}$ is observed at those locations $S_n \equiv \{s_1, \ldots, s_{N_n}\}$ of the lattice $\mathbb{Z}^d_t$ that lie inside $R_n$; i.e., the data are $Z_n = \{Z(s) : s \in S_n\}$ for $S_n = R_n \cap \mathbb{Z}^d_t$. For simplicity, we let $N \equiv N_n$ denote the spatial sample size, or the number of sites in $R_n$.
To allow a wide variety of sampling region shapes, we suppose that $R_n$ is obtained by inflating a prototype set $R_0$ by a scaling constant $\lambda_n$:
\[
R_n = \lambda_n R_0, \tag{1}
\]
where $R_0 \subset (-1/2, 1/2]^d$ is a Borel subset containing an open neighborhood of the origin and $\{\lambda_n\}_{n \ge 1}$ is a positive sequence of scaling factors such that $\lambda_n \uparrow \infty$ as $n \to \infty$.
Because R0 contains the origin, the shape of the sampling region Rn is preserved for different values of n as the region grows in an “increasing domain asymptotic” framework as
termed by Cressie (1993). Sherman and Carlstein (1994), Sherman (1996) and Nordman and
Lahiri (2004) use a comparable sampling framework. Observations indexed by $\mathbb{Z}^d_t \equiv t + \mathbb{Z}^d$, rather than the integer lattice $\mathbb{Z}^d$, entail that the "center" of $R_n$ at the origin need not be a sampling site.
We study the spatial block bootstrap method of variance estimation for a large class
of statistics that are functions of sample means. Let $H : \mathbb{R}^p \to \mathbb{R}$ be a smooth function and let $\bar Z_N = N^{-1}\sum_{i=1}^{N} Z(s_i)$ denote the sample mean over the $N$ sites within $R_n$. Suppose that a relevant statistic $\hat\theta_n$ can be represented as a function of the sample mean, $\hat\theta_n = H(\bar Z_N)$, and estimates the population parameter $\theta = H(\mu)$ involving the mean $EZ(t) = \mu \in \mathbb{R}^p$ of the random field. Hall (1992) refers to this parameter/estimator framework as the
“smooth function” model which permits a wide range of spatial estimators with suitable
transformations of the Z(s)’s, including means, differences and ratios of sample moments
(e.g., spatial variograms), and other spatial test statistics.
Using a spatial block bootstrap, we wish to estimate the variance of the scaled statistic $\sqrt{N}\,\hat\theta_n$, say $\sigma_n^2 = N\,\mathrm{Var}(\hat\theta_n)$. We next describe two bootstrap variance estimators of $\sigma_n^2$.
2.2 Block bootstrap methods
The spatial bootstrap aims to generate bootstrap renditions of spatial data through the same
recipe used in the moving block bootstrap (MBB) for time series (Künsch, 1989, Liu and
Singh, 1992). Recall that the MBB produces a bootstrap reconstruction of a length n time
series through a sequence of block resampling steps as follows.
1. Partition the time series into $k = \lfloor n/b \rfloor$ consecutive observational blocks of length $b$.
2. Develop some collection of length b observational blocks from the original time series.
Two possibilities for block collections are either the (non-overlapping) blocks in Step 1 or the set of all $n - b + 1$ possible (potentially overlapping) blocks of length $b$.
3. For each data block in Step 1, create a bootstrap rendition of this block by independently resampling a length b block of observations from the collection in Step 2.
4. Paste the k resampled blocks together to form a bootstrap time series of length kb.
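For concreteness, the following is a minimal sketch of this recipe for a time series sample mean, written in Python with numpy; the function name and defaults are our illustration, not code from the cited works.

```python
# A minimal sketch (our illustration) of the MBB variance estimator for a
# time series sample mean, following Steps 1-4 above.
import numpy as np

def mbb_variance(x, b, n_boot=1000, overlapping=True, seed=None):
    """Bootstrap estimate of n1 * Var(sample mean) with block length b."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = n // b                            # Step 1: floor(n/b) blocks in the partition
    if overlapping:                       # Step 2: candidate block start points
        starts = np.arange(n - b + 1)     # all n - b + 1 overlapping blocks
    else:
        starts = b * np.arange(k)         # the k non-overlapping blocks
    n1 = k * b                            # bootstrap series length (edge values dropped)
    means = np.empty(n_boot)
    for m in range(n_boot):
        picks = rng.choice(starts, size=k)                # Step 3: resample k blocks
        xs = np.concatenate([x[s:s + b] for s in picks])  # Step 4: paste together
        means[m] = xs.mean()
    return n1 * means.var()               # Monte Carlo estimate of n1 * Var*(mean*)
```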
To re-create spatial data, the spatial block bootstrap involves the same essential steps as the
MBB, namely dividing the sampling region Rn into spatial data blocks, creating a bootstrap
version of each data block through block resampling, and concatenating these resampled
blocks together into a bootstrap spatial sample. Details are described next, where the time
series bootstrap is modified to accommodate the spatial data structure.
Let $\{b_n\}_{n \ge 1}$ be a sequence of positive integers defining the $d$-dimensional spatial blocks as $B_n(i) \equiv i + b_n U$, $i \in \mathbb{Z}^d$, using the unit cube $U = (0, 1]^d$. We suppose $b_n^{-1} + b_n/\lambda_n \to 0$ as $n \to \infty$ to keep the blocks $B_n(\cdot)$ small relative to the size of the sampling region $R_n$ in (1).
To implement the block bootstrap, the sampling region Rn is first divided into disjoint
cubes or blocks of spatial observations. To this end, let Kn = {k ∈ Zd : Bn (bn k) ⊂ Rn }
represent the index set of all complete cubes Bn (bn k) = bn (k+U) lying inside Rn ; Figure 1(a)
illustrates a partition of Rn by cubes. For any A ⊂ Rd , let Zn (A) = {Z(s) : s ∈ A ∩ Sn }
denote the set of all observations corresponding to the sampling sites lying in the set A. We
define a bootstrap version of Zn (Rn ) by putting together bootstrap replicates of the process
$Z(\cdot)$ on each block in the partition of $R_n$, given by
\[
R_n(k) \equiv R_n \cap B_n(b_n k), \qquad k \in K_n. \tag{2}
\]
That is, we consider one block subregion Rn (k), k ∈ Kn , at a time and create a bootstrap
rendition of the data Zn (Rn (k)) in that block by resampling from a suitable collection of
blocks in Rn ; piecing resampled blocks together builds a bootstrap version of the process
Z(·) on the entire region Rn . Similar to the MBB, we formulate two block bootstrap variance
estimators based on two different sources of blocks for resampling: rectangular subregions
Bn (·) within Rn that are overlapping (OL) or non-overlapping (NOL). We describe the OL
version in detail; the NOL version is similar.
2.2.1 Block bootstrap variance estimator
Define an integer index set In = {i ∈ Zd : Bn (i) ⊂ Rn } for those integer-translated blocks
bn U lying completely inside Rn ; see Figure 1(b). For each k ∈ Kn , we resample one block
at random from the collection of OL blocks {Bn (i) : i ∈ In }, independently from other
resampled blocks, to define a version of the process {Z(s) : s ∈ Rn (k) ∩ Zd } observed on
$R_n(k)$. To make this precise, let $\{i_{k,OL} : k \in K_n\}$ be a collection of iid random variables with common distribution
\[
P(i_{k,OL} = j) = \frac{1}{|I_n|}, \qquad j \in I_n, \tag{3}
\]
using $|A|$ to denote the number of elements in a finite set $A \subset \mathbb{R}^d$. For each $k \in K_n$, the OL block bootstrap version of the data $Z_n(R_n(k))$ is defined by $Z^*_{n,OL}(R_n(k)) = Z_n(B_n(i_{k,OL}))$. We concatenate the resampled block observations for each $k \in K_n$ into a single spatial bootstrap sample $\{Z^*_{n,OL}(R_n(k)) : k \in K_n\}$ defined at sampling sites $S_n \cap \{R_n(k) : k \in K_n\}$, which has $N_1 \equiv N_{1n} = |K_n| \cdot b_n^d$ observations; see Figure 1(d). Here $b_n^d = |b_n U \cap \mathbb{Z}^d_t|$ represents the number of sampling sites in a block $B_n(i)$, $i \in I_n$, or a subregion $R_n(k)$, $k \in K_n$. Let $\bar Z^*_{n,OL}$ be the average of the $N_1$ resampled values. We define the OL block bootstrap version of $\hat\theta_n$ as $\hat\theta^*_{n,OL} = H(\bar Z^*_{n,OL})$ and give the corresponding variance estimator of $\sigma_n^2$ as
\[
\hat\sigma^2_{n,OL}(b_n) = N_1 \mathrm{Var}_*(\hat\theta^*_{n,OL}),
\]
where $\mathrm{Var}_*$ denotes the conditional variance given the data.
Hence, this spatial block resampling scheme is an extension of the OL MBB for time series.
In applying the MBB to a size n time series, the bootstrap time stretch may have marginally
shorter length n1 = kb because the MBB resamples complete data blocks and ignores some
boundary time values in the reconstruction. Analogously, with $\{Z^*_{n,OL}(R_n(k)) : k \in K_n\}$ we create a bootstrap rendition of those $R_n$-observations belonging to some "complete" block $R_n(k)$, $k \in K_n$, in the partition of $R_n$; see Figure 1(d). As in the time series case, some
boundary observations of Rn may not be used in the bootstrap reconstruction and this
occurs commonly with other spatial block methods like subsampling (Sherman, 1996).
Closed-form expressions for $\hat\sigma^2_{n,OL}(b_n)$ are obtainable for some spatial statistics $\hat\theta_n$ (see (7)), while in other cases we may evaluate $\hat\sigma^2_{n,OL}(b_n)$ by Monte Carlo simulation as follows. Let $M$ be a large positive integer, denoting the number of bootstrap replicates. For each $\ell = 1, \ldots, M$, generate a set $\{{}_\ell i_{k,OL} : k \in K_n\}$ of iid random variables according to (3) to obtain the $\ell$th bootstrap sample replicate $\{Z_n(B_n({}_\ell i_{k,OL})) : k \in K_n\}$, whose sample mean evaluated in the function $H$ yields the $\ell$th bootstrap replicate ${}_\ell\hat\theta^*_{n,OL}$ of $\hat\theta^*_{n,OL}$. The Monte Carlo approximation to the OL block bootstrap variance estimator is then given by
\[
\hat\sigma^{2,MC}_{n,OL}(b_n) = \frac{N_1}{M}\sum_{\ell=1}^{M}\Bigl({}_\ell\hat\theta^*_{n,OL} - E^{MC}_*\hat\theta^*_{n,OL}\Bigr)^2, \qquad E^{MC}_*\hat\theta^*_{n,OL} = \frac{1}{M}\sum_{\ell=1}^{M}{}_\ell\hat\theta^*_{n,OL}.
\]
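As an illustration, a minimal Monte Carlo sketch of $\hat\sigma^{2,MC}_{n,OL}(b_n)$ for a fully observed rectangular grid (our simplification of a general region $R_n$) might look as follows; all names and defaults are illustrative.

```python
# A minimal Monte Carlo sketch of the OL spatial block bootstrap variance
# estimator on a fully observed n1 x n2 grid; H defaults to the identity,
# matching a sample-mean statistic.
import numpy as np
from itertools import product

def spatial_ol_boot_var(Z, b, H=lambda m: m, M=500, seed=None):
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    n1, n2 = Z.shape
    k1, k2 = n1 // b, n2 // b                    # K_n: cells of the partition
    corners = list(product(range(n1 - b + 1),    # I_n: corners of all complete
                           range(n2 - b + 1)))   # overlapping b x b blocks
    N1 = k1 * k2 * b * b                         # resampled sample size
    stats = np.empty(M)
    for m in range(M):
        total = 0.0
        for _ in range(k1 * k2):                 # one OL block per partition cell
            i, j = corners[rng.integers(len(corners))]
            total += Z[i:i + b, j:j + b].sum()
        stats[m] = H(total / N1)                 # bootstrap statistic H(Z-bar*)
    return N1 * stats.var()                      # Monte Carlo sigma-hat^2_{n,OL}(b)
```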
For the NOL block bootstrap method, we consider resampling strictly from the NOL collection of blocks $\{R_n(k) : k \in K_n\}$; see Figure 1(c). Let $\{i_{k,NOL} : k \in K_n\}$ denote iid random variables with common distribution $P(i_{k,NOL} = j) = 1/|K_n|$, $j \in K_n$. The NOL bootstrap version of $Z_n(R_n(k))$, $k \in K_n$, is then $Z^*_{n,NOL}(R_n(k)) = Z_n(R_n(i_{k,NOL}))$. As before, we combine the resampled NOL block observations into a size-$N_1$ bootstrap sample $\{Z^*_{n,NOL}(R_n(k)) : k \in K_n\}$ with an average denoted by $\bar Z^*_{n,NOL}$. The NOL block bootstrap variance estimator of $\sigma_n^2$ is given by $\hat\sigma^2_{n,NOL}(b_n) = N_1 \mathrm{Var}_*(\hat\theta^*_{n,NOL})$, based on the NOL block bootstrap version $\hat\theta^*_{n,NOL} = H(\bar Z^*_{n,NOL})$ of $\hat\theta_n$.
3 Main results
3.1 Assumptions
To describe the results on the spatial block bootstrap, we require some notation and assumptions. For a vector $x = (x_1, \ldots, x_d)' \in \mathbb{R}^d$, let $\|x\|$, $\|x\|_1 = \sum_{i=1}^{d}|x_i|$ and $\|x\|_\infty = \max_{1 \le i \le d}|x_i|$ denote the Euclidean, $l^1$ and $l^\infty$ norms of $x$, respectively. Define the distance between two sets $E_1, E_2 \subset \mathbb{R}^d$ as $\mathrm{dis}(E_1, E_2) = \inf\{\|x - y\|_\infty : x \in E_1, y \in E_2\}$. For an uncountable set $A \subset \mathbb{R}^d$, $\mathrm{vol}(A)$ will refer to the volume (i.e., the Lebesgue measure on $\mathbb{R}^d$) of $A$.
The assumptions below, resembling those in Nordman and Lahiri (2004), include a mixing/moment condition (Assumption $4_r$) stated as a function of a positive argument $r \in \mathbb{Z}_+ = \{0, 1, 2, \ldots\}$. For $\nu = (\nu_1, \ldots, \nu_p)' \in \mathbb{Z}_+^p$, let $D^\nu$ denote the $\nu$th order partial differential operator $\partial^{\nu_1 + \cdots + \nu_p}/\partial x_1^{\nu_1}\cdots\partial x_p^{\nu_p}$, and let $\nabla = (\partial H(\mu)/\partial x_1, \ldots, \partial H(\mu)/\partial x_p)'$ be the vector of first-order partial derivatives of $H$ at $\mu$. Let $\mathcal{F}_Z(T)$ denote the $\sigma$-field generated by the variables $\{Z(s) : s \in T\}$, $T \subset \mathbb{Z}^d_t$, and define the strong mixing coefficient of the random field $Z(\cdot)$ as $\alpha(k, l) = \sup\{\tilde\alpha(T_1, T_2) : T_i \subset \mathbb{Z}^d_t, |T_i| \le l, i = 1, 2;\ \mathrm{dis}(T_1, T_2) \ge k\}$, where $\tilde\alpha(T_1, T_2) = \sup\{|P(A \cap B) - P(A)P(B)| : A \in \mathcal{F}_Z(T_1), B \in \mathcal{F}_Z(T_2)\}$.
Assumptions:
(1.) As $n \to \infty$, $b_n^{-1} + b_n^{(d+1)/d}/\lambda_n \to 0$ for $\lambda_n$ in (1). For any positive sequence $a_n \to 0$, the number of cubes $a_n(i + [0, 1)^d)$, $i \in \mathbb{Z}^d$, intersecting both $R_0$ and $\mathbb{R}^d \setminus R_0$ is $O(a_n^{-(d-1)})$.

(2.) $H : \mathbb{R}^p \to \mathbb{R}$ is twice continuously differentiable and, for some $a \in \mathbb{Z}_+$ and real $C \ge 0$, it holds that $\max\{|D^\nu H(x)| : \|\nu\|_1 = 2\} \le C(1 + \|x\|^a)$, $x \in \mathbb{R}^p$.

(3.) $\sigma^2 = \sum_{k \in \mathbb{Z}^d} \varrho(k) \in (0, \infty)$, where $\varrho(k) = \mathrm{Cov}\bigl(\nabla' Z(t), \nabla' Z(t + k)\bigr)$.

($4_r$.) There exist nonnegative functions $\alpha_1(\cdot)$, $g(\cdot)$ such that $\alpha(k, l) \le \alpha_1(k)g(l)$. For some $0 < \delta \le 1$, $0 < \kappa < (2r - 1 - 1/d)(2r + \delta)/\delta$ and $C > 0$, it holds that $E\|Z(t)\|^{2r+\delta} < \infty$, $\sum_{m=1}^{\infty} m^{(2r-1)d-1}\alpha_1(m)^{\delta/(2r+\delta)} < \infty$, and $g(x) \le Cx^\kappa$, $x \in [1, \infty)$.
Growth rates for the blocks $b_n$ and sampling region $R_n = \lambda_n R_0$ in Assumption 1 are the spatial analog of the scaling used with the MBB for time series ($d = 1$) (Lahiri, 1996). The condition on $R_0$ is satisfied by most regions of practical interest and holds in the plane ($d = 2$), for example, if the boundary $\partial R_0$ of $R_0$ is delineated by a simple curve of finite length (Sherman, 1996). The $R_0$-condition implies that the effect of data points lying near the boundary of $R_n$ is negligible compared to the totality of data points, and that the volume of $R_n$ determines the number of data points and blocks (see Lemma 1(c) in the Appendix). Conditions on
the smooth model function H in Assumption 2 are standard in this context. Assumption 3
implies that a positive limiting variance σ 2 = limn→∞ σn2 exists. Assumption 4r describes
a bound on the strong mixing coefficient satisfying certain growth conditions. Bounds of
this type are known to apply for many weakly dependent random fields and time series; see
Doukhan (1994) and Guyon (1995). This assumption also allows moment bounds for sample
means of the process Z(·) as well as a central limit theorem (Lahiri, 2003b).
3.2 Bias and variance expansions
To find optimal block sizes, we first provide the asymptotic bias and variance of the spatial block bootstrap estimators $\hat\sigma^2_{n,OL}(b_n)$ and $\hat\sigma^2_{n,NOL}(b_n)$.
Theorem 1 (i) Suppose that Assumptions 1-3 and $4_{2+2a}$ hold with "a" as specified under Assumption 2 and $B_0 \ne 0$ in (4). Then, the bias of $\hat\sigma_n^2(b_n)$ is
\[
E\bigl[\hat\sigma_n^2(b_n)\bigr] - \sigma_n^2 = -\frac{B_0}{b_n}\bigl(1 + o(1)\bigr), \qquad B_0 = \sum_{k \in \mathbb{Z}^d} \|k\|_1\,\varrho(k) \in \mathbb{R}, \tag{4}
\]
where $\hat\sigma_n^2(b_n)$ represents either $\hat\sigma^2_{n,OL}(b_n)$ or $\hat\sigma^2_{n,NOL}(b_n)$.
(ii) If Assumptions 1-3 and $4_{5+4a}$ hold with "a" as specified under Assumption 2, then
\[
\mathrm{Var}\bigl[\hat\sigma^2_{n,OL}(b_n)\bigr] = \Bigl(\frac{2}{3}\Bigr)^d\,\mathrm{Var}\bigl[\hat\sigma^2_{n,NOL}(b_n)\bigr]\bigl(1 + o(1)\bigr), \qquad \mathrm{Var}\bigl[\hat\sigma^2_{n,NOL}(b_n)\bigr] = \frac{2\sigma^4 b_n^d}{\mathrm{vol}(R_n)}\bigl(1 + o(1)\bigr).
\]
Because more OL blocks are generally available than NOL ones, the variance of the OL block estimator turns out to be $(2/3)^d$ times that of the NOL block estimator. For $d = 1$, a size $n$ time series sample is obtained from (1) by setting $R_0 = (-1/2, 1/2]$ and $\lambda_n = n$ on the untranslated lattice $\mathbb{Z}_t = \mathbb{Z}$; applying this in Theorem 1 gives the well-known bias/variance of the MBB variance estimator for time series (Künsch, 1989, Hall et al., 1995, Lahiri, 1996).
We also note some important differences between the spatial bootstrap and spatial subsampling as an alternative for variance estimation with subregions of dependent data (Sherman and Carlstein, 1994, Sherman, 1996). While the block bootstrap aims to recreate a "size $R_n$" sampling region, subsampling considers scaled-down copies $b_n R_0$ of $R_n = \lambda_n R_0$ that can be treated as repeated, smaller versions of $R_n$ on which to evaluate a spatial statistic $\hat\theta_n$ (Politis et al., 1999, Section 3.8); the subsampling variance estimator of $\mathrm{Var}(\hat\theta_n)$ is the sample variance of the subsample evaluations of $\hat\theta_n$. Philosophical differences in the bootstrap/subsampling mechanics translate into large differences in their bias and variance expansions. For example, if $\bar Z_{0,n}$ denotes the sample mean over the $b_n^d$ observations in the block $b_n U$ for the bootstrap, or over the $|b_n R_0 \cap \mathbb{Z}^d|$ observations in $b_n R_0$ for subsampling, the expected values $E\hat\sigma_n^2$ of the bootstrap/subsampling variance estimators of $\sigma_n^2 = N\,\mathrm{Var}(\hat\theta_n)$ are
\[
\text{bootstrap:}\ \ b_n^d\,\mathrm{Var}(\nabla'\bar Z_{0,n}) + o(N^{-1/2}), \qquad \text{subsampling:}\ \ |b_n R_0 \cap \mathbb{Z}^d|\,\mathrm{Var}\{H(\bar Z_{0,n})\}\,(1 + o(1)).
\]
That is, for any dimension $d$ of sampling, the bootstrap bias (i.e., $E\hat\sigma_n^2 - \sigma_n^2$) is determined by the first (linear) term $\nabla'\bar Z_{0,n}$ in a Taylor expansion of $H(\bar Z_{0,n})$ around $\mu$. This is not generally true for subsampling, which can require up to four terms in the Taylor expansion of $H(\bar Z_{0,n})$ to pinpoint the main $O(b_n^{-1})$ bias component, especially in lower dimensions $d = 1, 2$; see Nordman and Lahiri (2004). Under appropriate conditions, subsampling variance estimators have the same variance and bias orders (i.e., $b_n^d/\lambda_n^d$, $b_n^{-1}$) as the bootstrap, but the proportionality constants for these orders depend greatly on the geometry of $R_n$ and are not as simple as the bootstrap expressions in Theorem 1. This can make plug-in estimation of optimal block sizes (see Section 4) generally more difficult for subsampling than for the bootstrap.
3.3 Theoretical optimal block size
The bias and variance expansions from Theorem 1 yield an asymptotic MSE, $E[\hat\sigma_n^2(b_n) - \sigma_n^2]^2$, which the theoretically best block size $b_n$ minimizes in order to optimize the overall performance of a block bootstrap variance estimator. Explicit expressions for the optimal OL and NOL block sizes for block bootstrap variance estimation are given in the following. We find that, in large samples, OL blocks should be larger than NOL ones by a factor owing to the differing variances in Theorem 1.
Theorem 2 Under the assumptions of Theorem 1(ii), the optimal block sizes $b^{opt}_{n,OL}$ and $b^{opt}_{n,NOL}$ for $\hat\sigma^2_{n,OL}(\cdot)$ and $\hat\sigma^2_{n,NOL}(\cdot)$ are given by
\[
b^{opt}_{n,NOL} = \Bigl(\frac{B_0^2 \cdot \mathrm{vol}(R_n)}{d\,\sigma^4}\Bigr)^{1/(d+2)}\bigl(1 + o(1)\bigr), \qquad b^{opt}_{n,OL} = b^{opt}_{n,NOL}\Bigl(\frac{3}{2}\Bigr)^{d/(d+2)}\bigl(1 + o(1)\bigr). \tag{5}
\]
Remark: The volume $\mathrm{vol}(R_n)$ may be replaced by the sample size $N$ in Theorems 1-2. In terms of the spatial sample size, the optimal block size for bootstrap variance estimation is $O(N^{1/(d+2)})$. For the time series case $d = 1$, setting the time series length $n$ for $\mathrm{vol}(R_n)$ in Theorem 2 yields the optimal block expression obtained by other authors with $\sigma^2 = \sum_{k=-\infty}^{\infty}\varrho(k)$, $B_0 = \sum_{k=-\infty}^{\infty}|k|\varrho(k)$ (Künsch, 1989, Hall et al., 1995); see Section 3.2 for defining $R_n$ with time series. Hence, there is continuity in the optimal blocks between the spatial and time series bootstrap settings. In the spatial case, the key difference is that optimal blocks depend on the dimension $d$ of sampling as well as the volume of the sampling region $R_n$ (or, relatedly, the spatial sample size) as determined by the geometry of $R_n$.

In subsequent sections, we examine practical methods for estimating the optimal block size for the spatial bootstrap.
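For reference, the Theorem 2 expressions can be evaluated numerically as below, given inputs $B_0$, $\sigma^2$, $\mathrm{vol}(R_n)$ and $d$; in practice these inputs must be estimated, which is the subject of Section 4. The function is our sketch, not part of the paper's numerical work.

```python
# A direct numeric reading of (5); B0, sigma2 = sigma^2, vol_Rn and d are
# supplied (estimated) quantities.
def optimal_block_sizes(B0, sigma2, vol_Rn, d):
    b_nol = (B0**2 * vol_Rn / (d * sigma2**2)) ** (1.0 / (d + 2))  # NOL optimum
    b_ol = b_nol * (3.0 / 2.0) ** (d / (d + 2.0))                  # OL optimum
    return b_ol, b_nol
```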
4 Empirical choice of the optimal block size
We describe two general approaches for estimating the optimal block size for the bootstrap
estimators. These are spatial versions of block selection procedures for time series and involve
either cross-validation (Hall et al., 1995) or "plug-in" estimators. Let $\hat\sigma_n^2(\cdot)$ denote either the OL or NOL block-based variance estimator in the following.
Hall et al. (1995) proposed a method for choosing a block length with the time series MBB, in which a type of subsampling is used to estimate the MSE incurred by the bootstrap at various block sizes. To describe a spatial version of their procedure, let $m \equiv m_n$ be a positive integer sequence satisfying $m^{-1} + m/\lambda_n \to 0$ and define a set $M_m = \{i \in \mathbb{Z}^d : i + mU \subset R_n\}$ of all OL cubes $mU$ (of side length $m$) within $R_n = \lambda_n R_0$. Consider the rectangular subsamples of the data: $Z_n(i + mU)$, $i \in M_m$. (The subsampling regions here correspond to cubes lying inside $R_n$, not the scaled-down copies of $R_n$ more typically associated with subsampling (Sherman, 1996).) Let $\hat\sigma^2_{i,m}(b)$, $i \in M_m$, denote the block bootstrap variance estimator applied to each subsample $Z_n(i + mU)$ by resampling blocks of size $b$. A subsampling estimator of $\mathrm{MSE}(\hat\sigma^2_{i,m}(b))$, the mean squared error of the bootstrap variance estimator on a rectangular subsample, is then given by
\[
\widehat{\mathrm{MSE}}_m(b) = \frac{1}{|M_m|}\sum_{i \in M_m}\Bigl[\hat\sigma^2_{i,m}(b) - \hat\sigma_n^2(b_n^*)\Bigr]^2, \tag{6}
\]
where $b_n^*$ is a pilot block size. For some $\epsilon > 0$, let $J_{m,\epsilon} = \{b \in \mathbb{Z}_+ : m^{d/(d+2)-\epsilon} \le b \le m^{d/(d+2)+\epsilon}\}$ and define the minimizer of (6) as $\hat b_m^0 = \arg\min\bigl\{\widehat{\mathrm{MSE}}_m(b) : b \in J_{m,\epsilon}\bigr\}$, where $\hat b_m^0$ estimates the optimal block size for a sampling region of size $\mathrm{vol}(mU) = m^d$. By Theorem 3, we may re-scale $\hat b_m^0$ to obtain a block size estimator for the original sampling region $R_n$ by
\[
\hat b_n^{HHJ} = \hat b_m^0 \cdot \bigl[\mathrm{vol}(R_n)/m^d\bigr]^{1/(d+2)}.
\]
We refer to this as the HHJ method.
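A schematic sketch of the HHJ rule for $d = 2$ on a fully observed grid is given below. It assumes a user-supplied bootstrap variance routine `boot_var(Z, b)` (e.g., the Monte Carlo sketch in Section 2.2.1) and, purely for speed, thins the OL subsample collection $M_m$; both simplifications are ours, not part of Hall et al. (1995).

```python
# A schematic sketch of HHJ block selection for d = 2; candidates plays the
# role of J_{m,eps}, and the subsample set M_m is thinned via `stride`.
import numpy as np

def hhj_block_size(Z, m, b_pilot, boot_var, candidates, stride=None, d=2):
    n1, n2 = Z.shape
    stride = stride or max(1, m // 2)
    target = boot_var(Z, b_pilot)              # sigma-hat^2_n(b*_n) on the full data
    best_b, best_mse = None, np.inf
    for b in candidates:                       # b ranging over J_{m,eps}
        errs = [(boot_var(Z[i:i + m, j:j + m], b) - target) ** 2
                for i in range(0, n1 - m + 1, stride)
                for j in range(0, n2 - m + 1, stride)]
        mse = float(np.mean(errs))             # MSE-hat_m(b) as in (6)
        if mse < best_mse:
            best_b, best_mse = b, mse
    # rescale the size-m^d optimum to the full region, vol(R_n) ~ n1 * n2
    return best_b * ((n1 * n2) / m**d) ** (1.0 / (d + 2))
```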
A second, computationally simpler estimator of the best block size is given by a nonparametric plug-in (NPI) method, which directly substitutes estimates of population quantities into the theoretical block expressions from (5). For the time series MBB, plug-in choices of block size have been suggested in various forms by Bühlmann and Künsch (1999), Politis and White (2004), and Lahiri et al. (2006). Our NPI approach for the spatial block size involves two integer sequences $b^{NPI}_{n,1}$ and $b^{NPI}_{n,2}$ of block sizes satisfying $1/b^{NPI}_{n,i} + b^{NPI}_{n,i}/\lambda_n^{d/(d+i)} \to 0$ as $n \to \infty$, for $i = 1, 2$. Following Lahiri et al. (2006), we use the difference of two bootstrap variance estimators to estimate the bias component $B_0$ from (5) by $\hat B_0 = 2 b^{NPI}_{n,2}\cdot[\hat\sigma_n^2(b^{NPI}_{n,2}) - \hat\sigma_n^2(2 b^{NPI}_{n,2})]$. The expectation result in Theorem 1 suggests $\hat B_0$ as an appropriate estimator of $B_0$. Using the other block size $b^{NPI}_{n,1}$, we estimate the spread component $\sigma^2$ in (5) with $\hat\sigma^2 = \hat\sigma_n^2(b^{NPI}_{n,1})$. Substituting $\hat B_0$ and $\hat\sigma^2$ into the theoretical expression from Theorem 2 gives an NPI estimator $\hat b_n^{NPI}$ of the optimal block size for $\hat\sigma_n^2(\cdot)$.
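The NPI rule is short enough to state directly in code. The following sketch assumes a routine `boot_var(b)` returning $\hat\sigma_n^2(b)$ on the full data and returns the OL block estimate; the rounding and floor choices are illustrative, not prescribed by the paper.

```python
# A sketch of the NPI rule for the OL estimator: estimate B0 from two
# bootstrap variance estimates, estimate sigma^2 at a pilot block, and plug
# both into the Theorem 2 expression.
def npi_ol_block_size(boot_var, vol_Rn, d, C1=1.0, C2=1.0):
    b1 = max(2, round(C1 * vol_Rn ** (1.0 / (d + 2))))    # b^NPI_{n,1}
    b2 = max(2, round(C2 * vol_Rn ** (1.0 / (d + 4))))    # b^NPI_{n,2}
    B0_hat = 2 * b2 * (boot_var(b2) - boot_var(2 * b2))   # bias constant B0
    sigma2_hat = boot_var(b1)                             # spread sigma^2
    b_nol = (B0_hat**2 * vol_Rn / (d * sigma2_hat**2)) ** (1.0 / (d + 2))
    return b_nol * (3.0 / 2.0) ** (d / (d + 2.0))         # OL adjustment
```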
Both the HHJ and NPI block selection methods are consistent under mild conditions. In previous work, Hall et al. (1995) did not establish the consistency of their block method with the time series MBB, and the method's formal consistency has remained an open problem. We provide the first consistency result for the HHJ method (to our knowledge), which is valid for both spatial and time series data. In the following, let $b_n^{opt}$ denote the theoretical optimal block size for $\hat\sigma_n^2(\cdot)$, and suppose $\hat\sigma_n^2(\cdot)$ is used to compute $\hat b_n^{HHJ}$ or $\hat b_n^{NPI}$.
Theorem 3 Assume $m^{2d+\epsilon(d+2)}/\mathrm{vol}(R_n) = o(1)$ for $\epsilon$ defining $J_{m,\epsilon}$, and $b_n^* = C^*\mathrm{vol}(R_n)^{1/(d+2)}$ for some $C^* > 0$. In addition to Assumptions 1-3 and $B_0 \ne 0$, suppose Assumption $4_r$ holds using $r = 5 + 4a$ for $\hat b_n^{NPI}$ or $r = 15 + 12a$ for $\hat b_n^{HHJ}$ (with $a$ as specified under Assumption 2). Then, as $n \to \infty$, $\hat b_n^{NPI}/b_n^{opt} \xrightarrow{p} 1$ and $\hat b_n^{HHJ}/b_n^{opt} \xrightarrow{p} 1$.
The HHJ block estimator $\hat b_n^{HHJ}$ depends on the subsampling tuning parameter $m$ and a pilot block size $b_n^* = C^*\mathrm{vol}(R_n)^{1/(d+2)}$, $C^* > 0$ (e.g., $C^* = 1, 2$), of the optimal order from Theorem 2. As in the time series case, the optimal value of $m$ is unknown in the spatial setting. However, to reduce the effect of the tuning parameter $b_n^*$, we may follow the iteration proposal of Hall et al. (1995): upon obtaining an estimate $\hat b_n^{HHJ}$, this value may be set as the pilot block $b_n^*$ in a second round of the algorithm, and this iterative process may be repeated until the block estimate converges. This is the approach we apply in Section 5.
For the NPI estimator $\hat b_n^{NPI}$ requiring smoothing parameters $b^{NPI}_{n,i}$, $i = 1, 2$, it can be established that the optimal order of the block size $b^{NPI}_{n,2}$ for estimating the bias $B_0$ is $\mathrm{vol}(R_n)^{1/(d+4)}$, while the optimal order of the block size $b^{NPI}_{n,1}$ is $\mathrm{vol}(R_n)^{1/(d+2)}$ from Theorem 2. Hence, plausible tuning parameters have the form $b^{NPI}_{n,i} = C_i\mathrm{vol}(R_n)^{1/(d+2i)}$ for $C_i > 0$, $i = 1, 2$.
5 Simulation Study
In this section, we summarize numerical studies of the performance of spatial block bootstrap
variance estimators as well as empirical methods for choosing the block size.
5.1 Block Bootstrap MSE and Optimal Blocks
We first conducted a simulation study to compare the OL and NOL block bootstrap variance estimators of $\sigma_n^2 = N\,\mathrm{Var}(\hat\theta_n)$, as well as a spatial subsampling estimator, where $\hat\theta_n = \bar Z_N$ is the sample mean over a circular sampling region $R_n$. Two regions of different sizes were considered: $R_n := \{x \in \mathbb{R}^2 : \|x\| \le 9\}$ and $\{x \in \mathbb{R}^2 : \|x\| \le 20\}$. We used the circulant embedding method of Chan and Wood (1997) to generate real-valued, mean-zero Gaussian random fields on $\mathbb{Z}^2_t = \mathbb{Z}^2$ (i.e., $t = 0$) with an exponential covariance structure:
\[
\text{Model}(\beta_1, \beta_2):\ \ \varrho(k) = \exp\bigl[-\beta_1|k_1| - \beta_2|k_2|\bigr], \qquad k = (k_1, k_2)' \in \mathbb{Z}^2,\ \beta_1, \beta_2 > 0.
\]
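For small grids, such a field can also be simulated by direct Cholesky factorization of the covariance matrix, as sketched below; the paper's simulations instead use the much faster circulant embedding algorithm of Chan and Wood (1997), which we do not reproduce here.

```python
# A small-grid sketch of simulating a mean-zero Gaussian field with the
# exponential covariogram Model(beta1, beta2); O((n1*n2)^3), so small grids only.
import numpy as np

def simulate_exp_field(n1, n2, beta1, beta2, seed=None):
    rng = np.random.default_rng(seed)
    sites = [(i, j) for i in range(n1) for j in range(n2)]
    cov = np.array([[np.exp(-beta1 * abs(a - c) - beta2 * abs(b - d))
                     for (c, d) in sites] for (a, b) in sites])
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(sites)))  # jitter for safety
    return (L @ rng.standard_normal(len(sites))).reshape(n1, n2)
```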
We considered the values $(\beta_1, \beta_2) = (1, 1)$ and $(0.5, 0.3)$ to obtain covariograms exhibiting various rates of decay. Because the smooth model function is linear here (i.e., $H(x) = x$), the bootstrap estimator $\hat\sigma^2_{n,OL}(b_n)$ of $\sigma_n^2$ has a closed-form expression as a scaled sample variance of block means,
\[
\hat\sigma^2_{n,OL}(b_n) = b_n^d\,|I_n|^{-1}\sum_{i \in I_n}\bigl(\bar Z_{i,n} - \bar Z_n\bigr)^2, \qquad \bar Z_n = |I_n|^{-1}\sum_{i \in I_n}\bar Z_{i,n}, \tag{7}
\]
where $\bar Z_{i,n}$ denotes the sample mean of the $b_n^2$ ($d = 2$) observations in an OL block $B_n(i)$, $i \in I_n$; $\hat\sigma^2_{n,NOL}(b_n)$ is given by replacing $I_n$, $\bar Z_{i,n}$ with the NOL analogs $K_n$, $\bar Z_{b_n i,n}$.
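In code, (7) for $d = 2$ amounts to a scaled sample variance of the overlapping block means; a direct (unoptimized) sketch for a fully observed grid:

```python
# A direct sketch of the closed-form estimator (7) for the sample mean on a
# fully observed n1 x n2 grid (d = 2), so b^d = b * b.
import numpy as np

def ol_boot_var_mean(Z, b):
    Z = np.asarray(Z, dtype=float)
    n1, n2 = Z.shape
    block_means = np.array([Z[i:i + b, j:j + b].mean()   # Z-bar_{i,n}, i in I_n
                            for i in range(n1 - b + 1)
                            for j in range(n2 - b + 1)])
    return b * b * ((block_means - block_means.mean()) ** 2).mean()
```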
For comparison, we also evaluated the OL subsampling variance estimator of the sample mean $\hat\theta_n = \bar Z_N$ with subsamples given as translates of $b_n R_0$, based on the template $R_0 = \{x \in \mathbb{R}^2 : \|x\| \le 1/2\}$ for $R_n$. Here the OL subsampling estimator of $\sigma_n^2$ has a form similar to $\hat\sigma^2_{n,OL}$ but employs OL subcircles rather than cubes (i.e., redefine $I_n$, $b_n^d$ in (7) by using $b_n R_0$ instead of $b_n U$). That is, in the case of the sample mean, the only difference between the OL block bootstrap and subsampling variance estimators is the shape of the subregion used; the bootstrap employs rectangular blocks.
Figure 2 gives the normalized MSEs for each variance estimator over various block sizes
with optimal block sizes also denoted. We observe the following from the results:
1. The OL bootstrap estimator generally performed as well as (and often better than) the NOL version at any fixed block size considered.
2. At their optimal blocks, the OL block bootstrap performed slightly better than OL
subsampling in most cases. Table 1 provides a decomposition of the bias and variance
of both estimators at optimal block scaling. In the one instance where subsampling
outperformed the bootstrap (i.e., Model(0.5,0.3) on the larger $R_n$), more OL circular subsamples of the optimal size were available than optimal OL blocks (665 subcircles compared to 632 blocks).
3. The MSEs for the OL bootstrap exhibited more curvature than subsampling as a
function of block size, which may suggest a possible advantage for the bootstrap in
estimating/locating optimal block sizes. This topic is beyond the scope of the current
paper but may be important for future investigation.
5.2 Block Selection Methods
Using the same regions $R_n$ and covariance models, we next studied empirical block selection for the OL block bootstrap by investigating possible values for the constants $(C_1, C_2)$ and $(C^*, m)$ used to define the tuning parameters $b^{NPI}_{n,i} = C_i\mathrm{vol}(R_n)^{1/(d+2i)}$, $i = 1, 2$, and $b_n^* = C^*\mathrm{vol}(R_n)^{1/(d+2)}$ in the NPI and HHJ methods, respectively. With time series, numerical studies for the MBB suggest that constants near one perform adequately for plug-in block choices (Lahiri et al., 2006). Hence, we considered combinations of scalars $C_1, C_2, C^* \in [1/2, 2]$ in increments of $1/4$ and values $m \in \{C\,\mathrm{vol}(R_n)^{1/6} : C = 2, 3, 4\}$ to produce an estimator $\hat b_{n,OL}$ of the optimal OL block size $b^{opt}_{n,OL}$ with the NPI or HHJ approach. (The $m$-values in the HHJ method were chosen under Theorem 3 with $\epsilon = 1/2$ and $J_{m,\epsilon} = [1, m] \cap \mathbb{Z}$; the resulting values for $m$ were typically around $m = 5, 10, 15$.) To initially assess a block estimator, the MSE-criterion
\[
E\bigl\{\hat\sigma^2_{n,OL}(\hat b_{n,OL}) - \hat\sigma^2_{n,OL}(b^{opt}_{n,OL})\bigr\}^2/\sigma_n^4 \tag{8}
\]
was evaluated based on 10,000 simulations for each sampling region $R_n$, dependence structure, and $(C_1, C_2)$ or $(C^*, m)$ combination. The criterion assesses the discrepancy between bootstrap variance estimators based on a block estimate and the optimal OL block choice from Figure 2.
The simulation results (not included here for brevity) indicated that tuning parameter values $C_1, C_2 \le 1$ were generally best for the NPI method, especially with $C_2 = 1/2$. For the HHJ method, a constant $C^* \in [1, 2]$ performed adequately, as observed in Politis and Sherman (2001), while values around $m = 10, 15$ performed best on the larger sampling region (especially under the strongest spatial dependence with Model(0.5,0.3), where larger subsample blocks may be expected). Figure 3 illustrates dot plots of the MSE-criterion (8) to give a rough comparison of the NPI and HHJ methods over the mentioned tuning parameter ranges where the methods often performed well (i.e., $C_1, C_2 \in \{0.5, 0.75, 1\}$, $C^* \in \{1, 1.5, 2\}$). Figure 3 indicates that the NPI and HHJ methods often performed comparably in terms of (8), with no one method emerging as clearly superior. The NPI method often performed well and produced the lowest MSE realizations, but the HHJ method exhibited more stability (less variability) across a larger number of tuning parameters. In this sense, the HHJ method appeared more robust to tuning parameter selection, which may be due to the HHJ iteration steps that minimize the effect of $C^*$ in $b_n^*$ (described at the end of Section 4). At the same time, the HHJ method is computationally more intensive than the NPI method, which can be prohibitive in some applications (see the data example of Section 6.2).
Table 2 displays the frequency distribution of OL block estimates based on both selection methods. Both the NPI and HHJ approaches performed fairly well in identifying the optimal OL block size, and the two methods appear comparable. On the larger sampling regions, the NPI and HHJ selections seemed to exhibit more variability, but empirical block estimates that deviate slightly from the optimal block appear less serious for these regions (since the bootstrap MSEs in Figure 2 are similar around the optimal block size).
6 Data examples
6.1 Data Example 1: Longleaf Pines
We illustrate the spatial bootstrap method under the smooth function model with longleaf pine data from Cressie (1993). Figure 8.4 of Cressie (1993) provides counts of the number of pines in a $32 \times 32$ grid of quadrats observed in a forest region of the Wade Tract, Georgia, where each quadrat corresponds to a $(0, 6.25] \times [0, 6.25)$ m$^2$ section partitioning a $(0, 200] \times [0, 200)$ m$^2$ study region. Analysis by Sherman (1996) suggests that the spatial counts are stationary. We wish to test whether the locations of pines exhibit complete spatial randomness or a tendency to cluster, through the clustering index $\mathrm{ICS} \equiv s_n^2/\bar x_n - 1$ of David and Moore (1954), involving the sample mean $\bar x_n$ and variance $s_n^2$ of the $N = 1024$ quadrat counts (defining $s_n^2$ here with divisor $N$). A positive ICS value implies the trees tend to cluster, while we should expect the statistic to be zero when the tree locations are completely spatially random; see Cressie (1993) for details.
To judge whether the observed $\mathrm{ICS} = 1.164$ for the longleaf pine counts is significantly greater than zero, a test statistic requires an estimate of the variance $\sigma_n^2 = N\,\mathrm{Var}(\mathrm{ICS})$ that accounts for possible spatial dependence between the counts. The block bootstrap is applicable under the smooth function model. To see this, let $H : \mathbb{R}^2 \to \mathbb{R}$ with $H(x, y) = x^{-1}y - x - 1$ and define $\bar Z_N$ as the bivariate sample mean of $Z(s) = (X(s), X^2(s))'$ of the tree counts $X(s)$ over quadrat sites $s \in R_n \cap \mathbb{Z}^2$, $R_n = (0, 32]^2$. Then the ICS index equals $\hat\theta_n = H(\bar Z_N)$.
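A small sketch of this representation, with the quadrat counts supplied as an array (hypothetical input; the computation follows $H(x, y) = x^{-1}y - x - 1$ above):

```python
# The ICS index in the smooth function form: with Z(s) = (X(s), X(s)^2)' and
# H(x, y) = y/x - x - 1, H(Z-bar_N) equals s^2_n/x-bar_n - 1 (divisor N).
import numpy as np

def ics(counts):
    x = np.asarray(counts, dtype=float).ravel()
    xbar, x2bar = x.mean(), (x ** 2).mean()   # bivariate sample mean Z-bar_N
    return x2bar / xbar - xbar - 1.0          # = (variance / mean) - 1
```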
Due to the nonlinearity of $\hat\theta_n$, we evaluated the OL block estimator $\hat\sigma^2_{n,OL}$ of $\sigma_n^2 = N\,\mathrm{Var}(\hat\theta_n)$ by $M = 3000$ Monte Carlo simulations. Both the NPI and HHJ methods tended to agree in their block selections. The NPI method selected an OL block $\hat b^{NPI}_{n,OL} = 3$ using $C_2 = 0.5$ and $C_1 \in \{0.5, 1, 2\}$ in the two pilot blocks $\hat b^{NPI}_{n,i} = C_i N^{1/(d+2i)}$, $i = 1, 2$; changing to $C_2 = 2$ with the same $C_1$ range produced $\hat b^{NPI}_{n,OL} = 3$ or $5$. Using $b_n^* = \hat b^{NPI}_{n,1}$ (i.e., $C^* = C_1 \in \{0.5, 1, 2\}$) in the HHJ method led to block estimates $\hat b^{HHJ}_{n,OL} = 3$ for $m \in \{5, 20\}$, $\hat b^{HHJ}_{n,OL} = 5$ for $m = 10$, and $\hat b^{HHJ}_{n,OL} = 6$ for $m = 15$. That is, both NPI and HHJ appeared fairly stable over one tuning parameter $C_1 = C^*$ in this example and changed slightly over the second tuning parameter ($C_2$ for NPI or $m$ for HHJ); note that we may anticipate HHJ to be robust to $C^*$ due to the iteration steps in the method (see Section 4). The resulting estimates $\hat\sigma^2_{n,OL}(3) = 141.58$, $\hat\sigma^2_{n,OL}(5) = 126.60$ and $\hat\sigma^2_{n,OL}(6) = 116.53$ produced estimated standard errors $(\hat\sigma^2_{n,OL}/N)^{1/2}$ of ICS as 0.372, 0.352 and 0.337, which indicate significant evidence of clustering based on a normal approximation for the ICS index. Other findings in Chapter 8 of Cressie (1993) involving Poisson goodness-of-fit tests or distance methods support this conclusion, as does the subsampling analysis by Sherman (1996).
6.2 Data Example 2: U.S. Cancer Mortality Map
Sherman and Carlstein (1994) provide a binary mortality map of the United States that indicates "high" or "low" mortality rates from liver and gallbladder cancer in white males during 1950-1959. The map consists of a spatial region $R_n$ containing 2298 sites on the integer $\mathbb{Z}^2$ grid, of which 2012 have all four nearest neighbors available; see Sherman and Carlstein (1994) for details. For a given site $s \in \mathbb{Z}^2$, we code the data $Z(s) = 0$ or $1$ to indicate a low or high mortality rate. (Note that Sherman and Carlstein (1994) used the coding $-1/1$ rather than $0/1$.)
To investigate whether incidences of high cancer mortality exhibit clustering, Sherman and Carlstein (1994) proposed fitting an autologistic model to the data as follows. Suppose the data were generated by an autologistic model (Besag, 1974):
\[
P\bigl[Z(s) = z \mid Z(i), i \ne s\bigr] = \frac{\exp\bigl[z(\alpha + \beta N(s))\bigr]}{1 + \exp\bigl[\alpha + \beta N(s)\bigr]}, \qquad z = 0, 1, \tag{9}
\]
where $N(s) = \sum_{\|h - s\| = 1} Z(h)$ denotes the sum of indicators over the four nearest neighbors of site $s$. Positive values of the parameter $\beta$ suggest a tendency toward clustering, while $\beta = 0$ implies no clustering among sites. Maximum pseudolikelihood estimation (MPLE) with the $N = 2012$ pairs $(Z(s), N(s))$ yielded $\hat\beta = 0.419$ and $\hat\alpha = -1.485$. To test whether the estimate $\hat\beta$ is significantly greater than zero, we require an estimate of $N\,\mathrm{Var}(\hat\beta)$ that accounts for spatial dependence, because standard errors for MPLE statistics such as $\hat\beta$ generally have no closed form.
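Since the conditional distribution in (9) is logistic in $N(s)$, MPLE amounts to a logistic regression of $Z(s)$ on $N(s)$. The sketch below uses statsmodels and, as a simplification of the irregular map used in the paper, restricts to interior sites of a rectangular grid:

```python
# A sketch of MPLE for the autologistic model (9): a logistic regression of
# Z(s) on the neighbor sum N(s); only interior grid sites are used here.
import numpy as np
import statsmodels.api as sm

def autologistic_mple(Z):
    Z = np.asarray(Z, dtype=float)            # 0/1 mortality codes on a grid
    nbhd = (Z[:-2, 1:-1] + Z[2:, 1:-1] +      # N(s): sum over the four
            Z[1:-1, :-2] + Z[1:-1, 2:])       # nearest neighbors of s
    y = Z[1:-1, 1:-1].ravel()
    X = sm.add_constant(nbhd.ravel())
    fit = sm.Logit(y, X).fit(disp=0)
    return fit.params                         # (alpha-hat, beta-hat)
```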
We evaluated the OL block bootstrap estimator of $N\,\mathrm{Var}(\hat\beta)$ based on $M = 800$ Monte Carlo simulations. For this variance estimation problem, an NPI estimate of the OL block size is relatively fast and easy to compute, while the HHJ method is computationally prohibitive (requiring many iterations of MPLE to be applied in the subsampling steps). Following the tuning parameters from Section 5.2, we set $C_1 = C_2 = 0.5$ for $\hat b^{NPI}_{n,i} = C_i N^{1/(d+2i)}$, $i = 1, 2$, in the NPI method, which selected an OL block $\hat b^{NPI}_{n,OL} = 4$ with associated variance estimate $\hat\sigma^2_{n,OL}(4) = 12.84$. The corresponding estimated standard error $(\hat\sigma^2_{n,OL}(4)/N)^{1/2}$ of $\hat\beta$ is 0.0799, from which we conclude significant evidence of clustering. This agrees with other analyses performed by Sherman and Carlstein (1994), accounting for their parametrization. Similar NPI blocks $\hat b^{NPI}_{n,OL} \in \{2, 4, 5\}$ with standard error estimates around 0.08 also followed from other tuning combinations $C_1, C_2 \in \{0.5, 0.75, 1\}$ in this example.
Because the MPLE statistic $\hat\beta$ does not readily fit into the smooth function model framework (Section 2.1), we further examined the bootstrap variance estimator $\hat\sigma^2_{n,OL}$ and block selection in a small, but computationally involved, numerical study. We simulated data from the autologistic model (9) with $\alpha = -1.49$, $\beta = 0.42$ on a region of the same shape as the U.S. mortality map. (For a single simulated data set, we generated independent Bernoulli($p = 0.1$) variables on the grid and applied the Gibbs sampler with a burn-in of 10,000 iterations.) The mean $E\hat\beta$ and variance $N\,\mathrm{Var}(\hat\beta)$ of the MPLE statistic were found to be 0.419 and 11.835 based on 10,000 simulated data sets. Table 3 displays the MSEs associated with the OL block estimator $\hat\sigma^2_{n,OL}$ (based on $M = 800$ Monte Carlo simulations) for various blocks $b_n$. The table also shows the distribution of OL block size estimates using the NPI method with $C_1 = C_2 = 0.5$. The median and mode of the block estimates equal the optimal block length $b^{opt}_{n,OL} = 3$, and simulations with other choices of $C_1, C_2 \in [1/2, 2]$ led to similar results.
7 Appendix: Proofs of Main Results
To save space, we consider only the OL block bootstrap estimator $\hat\sigma^2_{n,OL}(b_n)$ in detail; proofs for the NOL block version follow in an analogous manner. For $i \in I_n$ and $k \in K_n$, let $U_i$ and $U_k^*$ respectively denote the sample averages of the $b_n^d$ observations in $Z_n(B_n(i))$ and $Z^*_{n,OL}(R_n(k))$. Define $\hat\mu_n = |I_n|^{-1}\sum_{i \in I_n} U_i$, a centered version $\hat\mu_{n,cen} = \hat\mu_n - \mu$, $\hat\nabla = (\partial H(\hat\mu_n)/\partial x_1, \ldots, \partial H(\hat\mu_n)/\partial x_p)'$, $W_{i,cen} = \hat\nabla'(U_i - \mu)$ and $W_k^* = \hat\nabla' U_k^*$. For $\nu = (\nu_1, \ldots, \nu_p)' \in (\mathbb{Z}_+)^p$, $x \in \mathbb{R}^p$, write $x^\nu = \prod_{i=1}^{p} x_i^{\nu_i}$, $\nu! = \prod_{i=1}^{p}(\nu_i!)$, $c_\nu = D^\nu H(\mu)/\nu!$ and $\hat c_\nu = D^\nu H(\hat\mu_n)/\nu!$. We assume that the $\mathbb{R}^d$ zero vector $0 \in K_n$. Let $C$ denote a generic positive constant that does not depend on $n$, the block sizes, or any $\mathbb{Z}^d$, $\mathbb{Z}^d_t$ points. Appearances of "$a$, $r$" refer to the $a$, $r$-values from Assumptions 2 and $4_r$ for the theorem under consideration. Limits in order symbols are taken letting $n$ tend to infinity.
We require some results in Lemma 1, where parts (a)-(b) follow from Doukhan (1994, pp. 9, 26) and Jensen's inequality, and part (c) holds by the $R_0$-boundary condition in Assumption 1.

Lemma 1 (a) Suppose $T_1, T_2 \subset \mathbb{Z}^d_t$ are bounded with $\mathrm{dis}(T_1, T_2) > 0$. Let $p, q > 0$ where $1/p + 1/q < 1$. If the random variable $X_i$ is measurable with respect to $\mathcal{F}_Z(T_i)$, $i = 1, 2$, then
\[
|\mathrm{Cov}(X_1, X_2)| \le 8(E|X_1|^p)^{1/p}(E|X_2|^q)^{1/q}\,\alpha\bigl(\mathrm{dis}(T_1, T_2); \max_{i=1,2}|T_i|\bigr)^{1 - 1/p - 1/q}.
\]
(b) Under Assumption $4_r$, $r \in \mathbb{Z}_+$, it holds that for $1 \le m \le 2r$ and any $T \subset \mathbb{Z}^d_t$, $E\|\sum_{s \in T}(Z(s) - \mu)\|^m \le C|T|^{m/2}$ and $E\|\hat\mu_n - \mu\|^m \le CN^{-m/2}$.
(c) Under Assumption 1, $N/\mathrm{vol}(R_n) \to 1$, $|K_n|/(b_n^{-d}\mathrm{vol}(R_n)) \to 1$, $|I_n|/\mathrm{vol}(R_n) \to 1$.
To prove Theorem 1, we write $\hat\sigma^2_{n,OL}(b_n) = \sum_{i=1}^{2}\mathrm{Var}_*(E^*_{in}) + 2\mathrm{Cov}_*(E^*_{1n}, E^*_{2n})$, where $E^*_{1n} = \sum_{\|\nu\|_1 = 1}\hat c_\nu(\bar Z^*_{n,OL} - \hat\mu_n)^\nu$ and $E^*_{2n} = 2\sum_{\|\nu\|_1 = 2}(\nu!)^{-1}(\bar Z^*_{n,OL} - \hat\mu_n)^\nu\int_0^1(1 - \omega)\,D^\nu H\bigl(\hat\mu_n + \omega(\bar Z^*_{n,OL} - \hat\mu_n)\bigr)\,d\omega$, using a Taylor expansion of $\hat\theta^*_{n,OL} = H(\bar Z^*_{n,OL})$ around $\hat\mu_n$.
Proof of Theorem 1(i). By Assumption 1 and Lemma 1(c), it suffices to show that
\[
E\bigl(N_1\mathrm{Var}_*(E^*_{1n})\bigr) - \sigma_n^2 = -\frac{B_0}{b_n}\bigl(1 + o(1)\bigr), \qquad E\bigl(N_1\mathrm{Var}_*(E^*_{2n})\bigr) = O(N^{-1}), \tag{10}
\]
from which $N_1 E|\mathrm{Cov}_*(E^*_{1n}, E^*_{2n})| = O(N^{-1/2}) = o(b_n^{-1})$ follows by Hölder's inequality. Beginning with $\mathrm{Var}_*(E^*_{1n})$ and writing $E^*_{1n} = |K_n|^{-1}\sum_{k \in K_n} W_k^*$, it follows from the construction of $Z^*_{n,OL}(R_n(k))$, $k \in K_n$, by the distribution (3) and $N_1 = b_n^d|K_n|$ that
\[
N_1\mathrm{Var}_*(E^*_{1n}) = N_1\frac{\mathrm{Var}_*(W_0^*)}{|K_n|} = b_n^d\Bigl(\frac{1}{|I_n|}\sum_{i \in I_n} W_{i,cen}^2 - \bigl(\hat\nabla'\hat\mu_{n,cen}\bigr)^2\Bigr), \tag{11}
\]
where $\hat\nabla'\hat\mu_{n,cen} = |I_n|^{-1}\sum_{i \in I_n} W_{i,cen}$. Expand each component $\hat c_\nu$, $\|\nu\|_1 = 1$, of $\hat\nabla$ around $\mu$ and write $W_{i,cen} = S_{1i} + S_{2i}$, $i \in I_n$, where $S_{1i} = \nabla'(U_i - \mu)$ and $S_{2i}$ is defined by the difference. By Assumptions 2 and $4_r$, Lemma 1(b) and Hölder's inequality, we find $b_n^d E|S_{1i}S_{2i}|,\ b_n^d ES_{2i}^2 \le CN^{-1/2}$, $i \in I_n$, and $b_n^d E(\hat\nabla'\hat\mu_{n,cen})^2 \le Cb_n^d N^{-1} = o(b_n^{-1})$ from $\|\hat\nabla\| \le C(1 + \|\hat\mu_{n,cen}\| + \|\hat\mu_{n,cen}\|^{1+a})$. Hence, since $\sigma_n^2 - \sigma^2 = O(N^{-1/2})$, (10) for $\mathrm{Var}_*(E^*_{1n})$ will follow from (11) by showing
\[
b_n^d ES_{1i}^2 - \sigma^2 = -\frac{1}{b_n}\sum_{k \in \mathbb{Z}^d}\frac{b_n^d - C_n(k)}{b_n^{d-1}}\,\varrho(k) = -\frac{B_0}{b_n}\bigl(1 + o(1)\bigr), \tag{12}
\]
using the lattice point count $C_n(k) = |\mathbb{Z}^d_t \cap b_n U \cap (k + b_n U)|$, $k \in \mathbb{Z}^d$. The first equality in (12) holds from Nordman and Lahiri (2004, Lemma 9.1); the second follows by the Lebesgue dominated convergence theorem since $0 \le b_n^d - C_n(k) \le C\|k\|_\infty^d b_n^{d-1}$, $b_n^{-(d-1)}(b_n^d - C_n(k)) \to \|k\|_1$ for $k \in \mathbb{Z}^d$, and $\sum_{k \in \mathbb{Z}^d}\|k\|_\infty^d|\varrho(k)| < \infty$. Turning to $\mathrm{Var}_*(E^*_{2n})$ in (10), we bound $E_*(E^*_{2n})^2 \le C(1 + \|\hat\mu_n\|^{2a})(E_*\|\bar Z^*_{n,OL} - \hat\mu_n\|^4 + E_*\|\bar Z^*_{n,OL} - \hat\mu_n\|^{4+2a})$. We next note $\bar Z^*_{n,OL} = |K_n|^{-1}\sum_{k \in K_n} U_k^*$ and, conditional on the data, $\{U_k^*\}_{k \in K_n}$ are iid with $E_* U_0^* = \hat\mu_n$, $E_*\|U_0^*\|^{4+2a} < \infty$, so that, for $m = 2$ or $2 + a$,
\[
E_*\|\bar Z^*_{n,OL} - \hat\mu_n\|^{2m} \le C|K_n|^{-m}E_*\|U_0^* - \mu\|^{2m} \le C\,\frac{|K_n|^{-m}}{|I_n|}\sum_{i \in I_n}\|U_i - \mu\|^{2m}. \tag{13}
\]
By Hölder's inequality with Lemma 1, we find $N_1 E\mathrm{Var}_*(E^*_{2n}) \le N_1 E[E_*(E^*_{2n})^2] \le CN^{-1}$. ¤
Proof of Theorem 1(ii). Beginning with (11), write $|I_n|^{-1}\sum_{i \in I_n} W_{i,cen}^2 \equiv T_{1n} + 2T_{2n} + T_{3n}$ as the sum of separate averages over the terms $S_{1i}^2$, $S_{1i}S_{2i}$, $S_{2i}^2$, $i \in I_n$, respectively. From Lemma 1(c) and Theorem 9.1 of Nordman and Lahiri (2004), it holds that $\mathrm{Var}(b_n^d T_{1n}) = 2\sigma^4(2/3)^d\,b_n^d/\mathrm{vol}(R_n)\,(1 + o(1))$. To establish the expression for $\mathrm{Var}[\hat\sigma^2_{n,OL}(b_n)]$, it now suffices to prove that
\[
\mathrm{Var}\bigl(N_1\mathrm{Var}_*(E^*_{1n})\bigr) = \mathrm{Var}(b_n^d T_{1n})\bigl(1 + o(1)\bigr), \qquad \mathrm{Var}\bigl(N_1\mathrm{Var}_*(E^*_{2n})\bigr) = O(N^{-2}), \tag{14}
\]
since (10) and (14) entail $\mathrm{Var}[N_1\mathrm{Cov}_*(E^*_{1n}, E^*_{2n})] = O(N^{-1})$. By (13) and the bound on $E_*(E^*_{2n})^2$ preceding (13), we may apply Lemma 1(b) to bound $\mathrm{Var}[N_1\mathrm{Var}_*(E^*_{2n})]$ in (14) by $N_1^2 E[E_*(E^*_{2n})^2]^2 \le CN^{-2}$. Then, considering $\mathrm{Var}(N_1\mathrm{Var}_*(E^*_{1n}))$ in (14), we bound $\mathrm{Var}[b_n^d(\hat\nabla'\hat\mu_{n,cen})^2] \le b_n^{2d}E[\hat\nabla'\hat\mu_{n,cen}]^4 = O(b_n^{2d}/\mathrm{vol}(R_n)^2)$, so that it becomes sufficient from (11) to show $\mathrm{Var}(b_n^d T_{in}) = o(b_n^d/\mathrm{vol}(R_n))$, $i = 2, 3$. We consider $\mathrm{Var}(b_n^d T_{3n})$; the proof for $\mathrm{Var}(b_n^d T_{2n})$ is similar.

For each $k \in \mathbb{Z}^d$, define a covariance function $\varrho_n(k) = \mathrm{Cov}(S_{20}^2, S_{2k}^2)$ and a distance $\mathrm{dis}_n(k) = \mathrm{dis}(\mathbb{Z}^d_t \cap B_n(0), \mathbb{Z}^d_t \cap B_n(k))$. We bound $\mathrm{Var}(b_n^d T_{3n}) = |I_n|^{-2}b_n^{2d}\sum_{k \in \mathbb{Z}^d}|\{i \in I_n : i + k \in I_n\}|\cdot\varrho_n(k) \le V_{1n} + V_{2n}$ with the sums $V_{1n} = |I_n|^{-1}b_n^{2d}\sum_{k \in A_n}|\varrho_n(k)|$, $V_{2n} = |I_n|^{-1}b_n^{2d}\sum_{k \in \mathbb{Z}^d\setminus A_n}|\varrho_n(k)|$, where $A_n = \{k \in \mathbb{Z}^d : \|k\|_\infty \le b_n\}$. By Lemma 1(b), $|\varrho_n(k)| \le E(S_{20}^4) \le CN^{-2}b_n^{-2d}$, $k \in A_n$, holds so that $V_{1n} = o(b_n^d/\mathrm{vol}(R_n))$. For $k \in \mathbb{Z}^d\setminus A_n$, it holds that $\mathrm{dis}_n(k) \ge 1$ and $|\varrho_n(k)| \le CN^{-2}b_n^{-2d}\min\{1, \alpha_1[\mathrm{dis}_n(k)]^{\delta/(2r+\delta)}b_n^{(2r-1)d-1}\}$ by Lemma 1(a) and Assumption $4_r$. For $\ell \ge 1 \in \mathbb{Z}_+$, $|\{k \in \mathbb{Z}^d : \mathrm{dis}_n(k) = \ell\}| \le C(b_n + \ell)^{d-1}$, so we may consider two sums of $|\varrho_n(k)|$, $k \in \mathbb{Z}^d\setminus A_n$, over the distances $\ell \equiv \mathrm{dis}_n(k) \le b_n$ or $> b_n$:
\[
V_{2n} \le \frac{C\,2^{d-1}}{N^2|I_n|}\Bigl(b_n^{(2r-1)d-1}\sum_{\ell=1}^{b_n} b_n^{d-1}\,\alpha_1(\ell)^{\delta/(2r+\delta)} + \sum_{\ell=b_n+1}^{\infty}\ell^{d-1}\Bigl(\frac{\ell}{b_n}\Bigr)^{2d(r-1)}\alpha_1(\ell)^{\delta/(2r+\delta)}\Bigr) = o\bigl(b_n^d/\mathrm{vol}(R_n)\bigr),
\]
substituting the additional term $(\ell/b_n)^{2d(r-1)} \ge 1$ in the second sum. Thus, $\mathrm{Var}(b_n^d T_{3n}) = o(b_n^d/\mathrm{vol}(R_n))$ follows, establishing the claim in (14) for $\mathrm{Var}_*(E^*_{1n})$. ¤
Proof of Theorem 2. This follows from Theorem 1 by minimizing the resulting asymptotic MSE with calculus. ¤
Proof of Theorem 3. By Theorem 1 and its proof, $\hat\sigma^2$ and $\hat B_0$ are MSE-consistent for $\sigma^2$ and $B_0 \ne 0$, respectively, so the consistency of $\hat b_n^{NPI}$ then follows.

To establish the consistency of $\hat b_n^{HHJ}$, let $\Delta_{m,\eta}(b) = P\bigl(\bigl|\widehat{\mathrm{MSE}}_m(b)/\mathrm{MSE}_m(b) - 1\bigr| > \eta\bigr)$ (with $m \equiv m_n$) for $\eta > 0$. It suffices to prove
\[
\max_{b \in J_{m,\epsilon}}\Delta_{m,\eta}(b) \le C\Bigl(\frac{m}{\lambda_n}\Bigr)^{d/(d+2)}, \qquad \max_{b \in J_{m,\epsilon}}\Bigl|\Bigl(\frac{b^d C_0}{m^d} + \frac{B_0^2}{b^2}\Bigr)^{-1}\mathrm{MSE}_m(b) - 1\Bigr| = o(1), \tag{15}
\]
where $\mathrm{MSE}_m(b) = E[\hat\sigma^2_{0,m}(b) - \sigma^2]^2$ represents the MSE of a size-$b$ block variance estimator on a sampling region $mU$ (i.e., spatial observations $Z_n(mU)$), with $C_0/[2\sigma^4] = 1$ or $(2/3)^d$ for the NOL or OL block cases. To see this, note $\hat b_m^0$ and $b_m^{opt}$ minimize $\widehat{\mathrm{MSE}}_m(\cdot)$ and $\mathrm{MSE}_m(\cdot)$, respectively, where $b_m^{opt} = [2m^d B_0^2/(C_0 d)]^{1/(d+2)}(1 + o(1))$ is the optimal block size for a rectangular region $mU$ by Theorem 2. Then, for $b = \hat b_m^0$ or $b_m^{opt}$, it follows that $\widehat{\mathrm{MSE}}_m(b)/\mathrm{MSE}_m(b) \xrightarrow{p} 1$ from $\Delta_{m,\eta}(b) \le \sum_{j \in J_{m,\epsilon}}\Delta_{m,\eta}(j) = o(1)$, using (15), $|J_{m,\epsilon}| = O(m^{d/(d+2)+\epsilon})$ and $m^\epsilon(m^2/\lambda_n)^{d/(d+2)} = o(1)$. By this and (15), one may argue that $\hat b_m^0/b_m^{opt} \xrightarrow{p} 1$. Consistency of $\hat b_n^{HHJ} = \hat b_m^0\cdot[\mathrm{vol}(R_n)/m^d]^{1/(d+2)}$ for $b_n^{opt}$ then follows.

The second result in (15) for $\mathrm{MSE}_m(b)$, $b \in J_{m,\epsilon}$, follows from the proof of Theorem 1 and using $\mathrm{vol}(mU) = m^d$; note $\sigma^2$ may be used in Theorem 1(i) (in place of $\sigma_n^2$). For large $n$, note that $E[\hat\sigma_n^2(b_n^*) - \sigma^2]^2 \le C\lambda_n^{-2d/(d+2)}$ (since $b_n^*$ is of order $\lambda_n^{d/(d+2)}$) by applying Theorem 1, and that $\mathrm{MSE}_m(b) \ge Cm^{-2d/(d+2)}$ (the optimal MSE order) for $b \in J_{m,\epsilon}$. Define $\overline{\mathrm{MSE}}_m(b)$ by replacing $\hat\sigma_n^2(b_n^*)$ with $\sigma^2$ in the definition of $\widehat{\mathrm{MSE}}_m(b)$ from (6). By adding/subtracting $\sigma^2$ inside $\widehat{\mathrm{MSE}}_m(b)$ and applying the Markov inequality, it holds that $\Delta_{m,\eta}(b) \le C[P_1(b) + P_2(b) + P_3(b)]$ for any $b \in J_{m,\epsilon}$, where $P_1(b) = \{\mathrm{Var}[\overline{\mathrm{MSE}}_m(b)]/\mathrm{MSE}_m^2(b)\}^{1/2}$, $P_2(b) = E[\hat\sigma_n^2(b_n^*) - \sigma^2]^2/\mathrm{MSE}_m(b) \le C(m/\lambda_n)^{2d/(d+2)}$ and, by Hölder's inequality, $P_3(b) = \sum_{i \in M_m} E|\hat\sigma_n^2(b_n^*) - \sigma^2|\,|\hat\sigma^2_{i,m}(b) - \sigma^2|/(\mathrm{MSE}_m(b)|M_m|) \le CP_2(b)^{1/2} = O((m/\lambda_n)^{d/(d+2)})$. To handle $P_1(b)$, arguments as in the proof of Theorem 1(ii) yield $\mathrm{Var}[\overline{\mathrm{MSE}}_m(b)] \le C\lambda_n^{-d}m^d E[\hat\sigma^2_{0,m}(b) - \sigma^2]^4$, $b \in J_{m,\epsilon}$. The bias/variance arguments from the proof of Theorem 1 give $E[\hat\sigma^2_{0,m}(b) - \sigma^2]^4/\mathrm{MSE}_m^2(b) \le C$ for any $b \in J_{m,\epsilon}$ when $n$ is large, using the second part of (15). Hence, $P_1(b) \le C(m/\lambda_n)^{d/2}$ holds and the bound on $\max_{b \in J_{m,\epsilon}}\Delta_{m,\eta}(b)$ in (15) follows. ¤
References
Besag, J. (1974) Spatial interaction and the statistical analysis of lattice systems (with
discussion). J. R. Stat. Soc. Ser. B 36, 192-236.
Bühlmann, P. and Künsch, H. R. (1999) Block length selection in the bootstrap for time
series. Comput. Stat. Data Anal. 31, 295-310.
Chan G. and Wood A. T. A. (1997) An algorithm for simulating stationary Gaussian
random fields. Applied Statistics 46, 171-181.
Cressie, N. (1993) Statistics for Spatial Data, 2nd Edition. John Wiley & Sons, New York.
David, F. N. and Moore, P. G. (1954) Notes on contagious distributions in plant populations. Ann. Bot. 18, 47-53.
Doukhan, P. (1994) Mixing: properties and examples. Lecture Notes in Statistics 85.
Springer-Verlag, New York.
Guyon, X. (1995) Random Fields on a Network. Springer-Verlag, New York.
Hall, P. (1985) Resampling a coverage pattern. Stochastic Processes and Their Applications
20, 231-246.
Hall, P. (1992) The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
Hall, P. and Jing, B.-Y. (1996) On sample reuse methods for dependent data. J. R. Stat.
Soc. Ser. B 58, 727-737.
Hall, P., Horowitz, J. L., and Jing, B.-Y. (1995) On blocking rules for the bootstrap
with dependent data. Biometrika 82, 561-574.
Künsch, H. R. (1989) The jackknife and the bootstrap for general stationary observations.
Ann. Statist. 17, 1217-1241.
Lahiri, S. N. (1996) Empirical choice of the optimal block length for block bootstrap methods. Preprint, Department of Statistics, Iowa State University, Ames, IA.
Lahiri, S. N. (1999) Theoretical comparisons of block bootstrap methods. Ann. Statist. 27,
386-404.
Lahiri, S. N. (2003a) Resampling Methods for Dependent Data. Springer, New York.
Lahiri, S. N. (2003b) Central limit theorems for weighted sums of a spatial process under
a class of stochastic and fixed designs. Sankhya: Series A 65, 356-388.
Lahiri, S. N., Furukawa, K. and Lee, Y-D. (2006) A nonparametric plug-in rule for
selecting optimal block lengths for block bootstrap methods. Stat. Methodol. (in
press)
Liu, R.Y. and Singh, K. (1992) Moving blocks jackknife and bootstrap capture weak dependence. In Exploring the Limits of Bootstrap, 225-248, R. LePage and L. Billard
(editors). John Wiley & Sons, New York.
Nordman, D. J. and Lahiri, S. N. (2004) On optimal spatial subsample size for variance
estimation. Ann. Statist. 32, 1981-2027.
Politis, D. N. and Romano, J. P. (1993) Nonparametric resampling for homogeneous
strong mixing random fields. J. Multivariate Anal. 47, 301-328.
Politis, D. N. and Romano, J. P. (1994a) Large sample confidence regions based on
subsamples under minimal assumptions. Ann. Statist. 22, 2031-2050.
Politis, D. N. and Romano, J. P. (1994b) The stationary bootstrap. Journal of the American Statistical Association 89, 1303-1313.
Politis, D. N. and Sherman, M. (2001) Moment estimation for statistics from marked
point processes. J. R. Stat. Soc. Ser. B 63, 261-275.
Politis, D. N., Paparoditis, E. and Romano, J. P. (1999). Resampling marked point
processes. In Multivariate Analysis, Design of Experiments, and Survey Sampling: a
Tribute to J. N. Srivastava (Ed. S. Ghosh). Marcel Dekker, New York, 163-185.
Politis, D. N., Romano, J. P., and Wolf, M. (1999) Subsampling. Springer, New York.
Politis, D. N. and White, H. (2004) Automatic block-length selection for the dependent
bootstrap. Econometric Reviews 23 53-70.
Sherman, M. (1996) Variance estimation for statistics computed from spatial lattice data.
J. R. Stat. Soc. Ser. B 58, 509-523.
Sherman, M. and Carlstein, E. (1994) Nonparametric estimation of the moments of a
general statistic computed from spatial data. J. Amer. Statist. Assoc. 89, 496-500.
Zhu, J. and Lahiri, S.N. (2007) Weak convergence of blockwise bootstrapped empirical
processes for stationary random fields with statistical applications. Stat. Inference
Stoch. Process. 10, 107-145.
Daniel J. Nordman, Soumendra N. Lahiri
Department of Statistics
Iowa State University
Ames, IA 50010 USA
dnordman@iastate.edu, snlahiri@iastate.edu

Brooke L. Fridley
Division of Biostatistics, Mayo Clinic
200 First Street SW
Rochester, MN 55905 USA
Fridley.Brooke@mayo.edu
Figure 1: The blocking mechanism for the spatial block bootstrap method. (a) Partition of a
hexagonal sampling region Rn by subregions Rn (k), k ∈ Kn from (2); (b) Set of OL complete
blocks; (c) Set of NOL complete blocks; (d) Bootstrap version of the spatial process Z(·)
obtained by resampling and concatenating blocks from (b) or (c) in positions of complete
blocks from (a).
Table 1: Normalized bias $E\hat\sigma_n^2/\sigma_n^2 - 1$ and variance $\mathrm{Var}(\hat\sigma_n^2)/\sigma_n^4$ for OL block bootstrap and OL subsampling estimators of $\sigma_n^2 = N\,\mathrm{Var}(\bar Z_N)$ when using the optimal block scaling from Figure 2. Estimates based on 100,000 simulations.

                       Rn = {x : ||x|| <= 9}                    Rn = {x : ||x|| <= 20}
                 Model(1,1)       Model(0.5,0.3)        Model(1,1)       Model(0.5,0.3)
                 Bias     Var     Bias     Var          Bias     Var     Bias     Var
Bootstrap       -0.3895  0.0478  -0.6958  0.0283       -0.2629  0.0263  -0.4814  0.0543
Subsampling     -0.3855  0.0671  -0.7003  0.0328       -0.2679  0.0283  -0.4754  0.0536
[Figure 2 panels: normalized MSE versus block size b for Model(1,1) and Model(0.5,0.3) on circular sampling regions of radius 9 and 20; panel legends report the MSE-optimal block sizes for the OL/NOL block bootstrap and subsampling estimators.]
Figure 2: Estimates of the normalized MSE, $E(\hat\sigma_n^2/\sigma_n^2 - 1)^2$, for the OL/NOL spatial block bootstrap variance estimators $\hat\sigma_n^2(b_n)$ of $\sigma_n^2 = N\,\mathrm{Var}(\bar Z_N)$ over various block sizes $b_n$, as well as for the OL subsampling variance estimator. Each estimated MSE is based on 100,000 simulations. Values in $(\cdot)$ denote the optimal block sizes with minimal MSE.
Figure 3: Dot plots of the MSE-criterion (8) based on 9 combinations of tuning parameter coefficients for each block selection method, circular sampling region $R_n$ and covariance model. The NPI method uses $C_1, C_2 \in \{0.5, 0.75, 1\}$; the HHJ method uses $C^* \in \{1, 1.5, 2\}$, $m \in \{C\,\mathrm{vol}(R_n)^{1/6} : C = 2, 3, 4\}$.
Table 2: Frequency distribution of empirically chosen OL block bootstrap scaling with the NPI and HHJ methods (based on 1000 simulations). Along with $C_2 = 0.5$, NPI1 and NPI2 use $C_1 = 0.5$ and $0.75$, respectively. With $C^* = 1$, the HHJ $\equiv$ HHJ$_m$ method uses $m = 5$, $10$ and $15$, respectively. True optimal block sizes $b^{opt}_{n,OL}$ are determined from Figure 2.
[Table 2 body: frequency counts of the block estimates $\hat b_{n,OL}$ over candidate sizes 2-12 for each combination of region radius (9, 20), covariance model, and method (NPI1, NPI2, HHJ$_m$), with the true optimal size $b^{opt}_{n,OL}$ in the final column.]

Table 3: Normalized MSEs $E(\hat\sigma^2_{n,OL}/\sigma_n^2 - 1)^2$ for the OL block bootstrap estimator $\hat\sigma^2_{n,OL} \equiv \hat\sigma^2_{n,OL}(b_n)$ of the MPLE variance $\sigma_n^2 = N\,\mathrm{Var}(\hat\beta_n)$, based on 1000 simulations for each block $b_n$. Also included is the distribution of OL block size estimates $\hat b^{NPI}_{n,OL}$ using the NPI method with $C_1 = C_2 = 0.5$ (i.e., the percentage of times that $\hat b^{NPI}_{n,OL} = b_n$, based on an additional 500 simulations).

block size bn                 1       2       3       4       5       6       7
MSE x 10^2                  1.654   0.939   0.896*  1.006   1.326   1.879   2.450
b-hat^NPI_{n,OL} = bn (%)    9.0    16.4    24.8    23.8    16.0    8.2     1.8