On optimal spatial subsample size for variance estimation

Daniel J. Nordman and Soumendra N. Lahiri
Dept. of Statistics, Iowa State University, Ames, IA 50011

Abstract

In this paper, we consider the problem of determining the optimal block size for a spatial subsampling method for spatial processes observed on regular grids. We derive expansions for the mean square error of the subsampling variance estimator, which yields an expression for the theoretical optimal block size. The theoretical optimal block size is shown to depend in an intricate way on the geometry of the spatial sampling region as well as on the characteristics of the underlying random field. Final expressions for the optimal block size make use of some nontrivial estimates of lattice point counts in shifts of convex sets. The expressions for the optimal block size are computed for sampling regions of a number of commonly-encountered shapes.

AMS (2000) Subject Classification: Primary 62G09; Secondary 62M40, 60G60

Key Words: Block bootstrap, block size, lattice point count, mixing, nonparametric variance estimation, random fields, spatial statistics, subsampling method

Research partially supported by NSF grants DMS 98-16630 and DMS 0072571 and by the Deutsche Forschungsgemeinschaft (SFB 475).

1 Introduction

We examine the problem of choosing subsample sizes to maximize the performance of subsampling methods for variance estimation. The data at hand are viewed as realizations of a stationary, weakly dependent, spatial lattice process. We consider the common scenario of sampling from regularly spaced sites (e.g., indexed by the integer lattice Z^d) lying within some region R_n embedded in IR^d. Such lattice data appear often in time series, agricultural field trials, and remote sensing and image analysis (medical and satellite image processing). For variance estimation via subsampling, the basic idea is to construct several "scaled-down" copies of the sampling region R_n (subsamples) that fit inside R_n, evaluate the analog of θ̂_n on each of these subregions, and then compute a properly normalized sample variance from the resulting values. The R_n-sampling scheme is essentially recreated at the level of the subregions. Two subsampling designs are most typical: subregions can be maximally overlapping (OL) or devised to be non-overlapping (NOL). The accuracy (e.g., variance and bias) of subsample-based estimators depends crucially on the choice of subsample size.

To place our work into perspective, we briefly outline previous research in variance estimation with subsamples and theoretical size considerations. The concept of variance estimation through subsampling originated from the analysis of weakly dependent time processes. Let θ̂_n be an estimator of a parameter of interest θ based on {Z(1), ..., Z(n)} from a stationary temporal process {Z(i)}_{i≥1}. To obtain subsamples from a stationary stretch Z(1), ..., Z(n) in time, Carlstein (1986) first proposed the use of NOL blocks of length m ≤ n:
$$\big\{ Z(1+(i-1)m), \ldots, Z(im) \big\}, \qquad i = 1, \ldots, \lfloor n/m \rfloor,$$
while the sequence of subseries
$$\big\{ Z(i), \ldots, Z(i+m-1) \big\}, \qquad i = 1, \ldots, n-m+1,$$
provides OL subsamples of length m (cf. Künsch, 1989; Politis and Romano, 1993b). Here, ⌊x⌋ denotes the integer part of a real number x.
In each respective collection, an analog statistic θ̂_i is evaluated on each subseries and a normalized sample variance is calculated to estimate the parameter n Var(θ̂_n):
$$S_n^2 = \frac{1}{J}\sum_{i=1}^{J} m\,\big(\hat\theta_i - \tilde\theta\big)^2, \qquad \tilde\theta = \frac{1}{J}\sum_{i=1}^{J} \hat\theta_i,$$
where J = ⌊n/m⌋ (J = n − m + 1) for the NOL (OL) subsample-based estimator. Carlstein (1986) and Fukuchi (1999) established the L² consistency of the NOL and OL estimators, respectively, for the variance of a general (not necessarily linear) statistic. Politis and Romano (1993b) determined asymptotic orders of the variance O(m/n) and bias O(1/m) of the subsample variance estimators for linear statistics. For mixing time series, they found that a subsample size m proportional to n^{1/3} is optimal in the sense of minimizing the mean square error (MSE) of variance estimation, concurring also with the optimal block order for the moving block bootstrap variance estimator [Hall et al. (1995), Lahiri (1996)].

Cressie (1991, p. 492) conjectured the recipe for extending Carlstein's variance estimator to the general spatial setting, obtaining subsamples by tiling the sample region R_n with disjoint "congruent" subregions. Politis and Romano (1993a, 1994) have shown the consistency of subsample-based variance estimators for rectangular sampling/subsampling regions in IR^d when the sampling sites are observed on $\mathbb{Z}^d \cap \prod_{i=1}^{d}[1, n_i]$ and integer translates of $\prod_{i=1}^{d}[1, m_i]$ yield the subsamples. Garcia-Soidan and Hall (1997) and Possolo (1991) proposed similar estimators under an identical sampling scenario. For linear statistics, Politis and Romano (1993a) determined that a subsampling scaling choice $\prod_{i=1}^{d} m_i = C\{\prod_{i=1}^{d} n_i\}^{d/(d+2)}$, for some unknown C, minimizes the order of a variance estimator's asymptotic MSE. Sherman and Carlstein (1994) and Sherman (1996) proved the MSE-consistency of NOL and OL subsample estimators, respectively, for the variance of general statistics in IR². Their work allowed for a more flexible sampling scheme: the "inside" of a simple closed curve defines a set D ⊂ [−1, 1]², Z² ∩ nD (using a scaled-up copy of D) constitutes the set of sampling sites, and translates of mD within nD form subsamples. Sherman (1996) minimized a bound on the asymptotic order of the OL estimator's MSE to argue that the best size choice for OL subsamples involves m = O(n^{1/2}) (coinciding with the above findings of Politis and Romano (1993a) for rectangular regions in IR²). Politis and Sherman (2001) have developed consistent subsampling methods for variance estimation with marked point process data [cf. Politis et al. (1999), Chapter 6].

Few theoretical and numerical recommendations for choosing subsamples have been offered in the spatial setting, especially with the intent of variance estimation. As suggested in the literature, an explicit theoretical determination of optimal subsample "scaling" (size) requires calculation of an order and an associated proportionality constant for a given sampling region R_n. Even for the few sampling situations where the order of the optimal subsample size has been established, the exact adjustments to these orders are unknown and, quoting Politis and Romano (1993a), "important (and difficult) in practice." Beyond the time series case with the univariate sample mean, the influence of the geometry and dimension of R_n, as well as the structure of θ̂_n, on precise subsample selection has not been explored. We attempt here to advance some ideas on the best size choice, both theoretically and empirically, for subsamples.
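As a concrete illustration of the time-series estimators just described, the following minimal sketch (our code, not from the paper) computes S_n² from OL or NOL blocks with θ̂ the sample mean; the AR(1) example and all names are illustrative assumptions.

```python
import numpy as np

def subsample_variance(z, m, overlapping=True):
    """Subsample estimator S_n^2 of n*Var(theta_hat) for theta_hat = sample mean.

    z : 1-d array of length n; m : subsample (block) length.
    Uses OL blocks z[i:i+m] or Carlstein's NOL blocks, as in Section 1.
    """
    n = len(z)
    if overlapping:
        starts = range(n - m + 1)                 # J = n - m + 1 blocks
    else:
        starts = range(0, (n // m) * m, m)        # J = floor(n/m) blocks
    theta = np.array([z[s:s + m].mean() for s in starts])
    return m * np.mean((theta - theta.mean()) ** 2)

# Example: AR(1) series; compare with the true long-run variance 1/(1-rho)^2.
rng = np.random.default_rng(0)
n, rho = 10_000, 0.5
e = rng.standard_normal(n)
z = np.empty(n)
z[0] = e[0]
for t in range(1, n):
    z[t] = rho * z[t - 1] + e[t]
m = int(n ** (1 / 3))                             # the optimal O(n^{1/3}) order
print(subsample_variance(z, m, overlapping=True),
      subsample_variance(z, m, overlapping=False),
      1 / (1 - rho) ** 2)
```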
We work under the "smooth function" model of Hall (1992), where the statistic of interest θ̂_n can be represented as a function of sample means. We formulate a framework for sampling in IR^d where the sampling region R_n (say) is obtained by "inflating" a prototype set in the unit cube in IR^d, and the subsampling regions are given by suitable translates of a scaled-down copy of the sampling region R_n. We consider both a non-overlapping version and a (maximal) overlapping version of the subsampling method. For each method, we derive expansions for the variance and the bias of the corresponding subsample estimator of Var(θ̂_n). The asymptotic variance of the spatial subsample estimator for the OL version turns out to be smaller than that of the NOL version by a constant factor K₁ (say) which depends solely on the geometry of the sampling region R_n. In the time series case, Meketon and Schmeiser (1984), Künsch (1989), Hall et al. (1995) and Lahiri (1996) have shown in different degrees of generality that the asymptotic variance under the OL subsampling scheme, compared to the NOL one, is K₁ = 2/3 times smaller. Results of this paper show that for rectangular sampling regions R_n in d-dimensional space, the factor K₁ is given by (2/3)^d. We list the factor K₁ for sampling regions of some common shapes in Table 1.

Table 1: Examples of K₁ for several shapes of the sampling region R_n ⊂ IR^d.

    Shape of R_n             K₁
    Rectangle in IR^d        (2/3)^d
    Sphere in IR³            17π/315
    Circle in IR²            π/4 − 4/(3π)
    Right triangle in IR²    1/5

In contrast, the bias parts of both the OL and NOL subsample variance estimators are (usually) asymptotically equivalent and depend on the covariance structure of the random field as well as on the geometry of the sampling region R_n. Since the bias term is typically of the same order as the number of lattice points lying near a subsample's boundary, determination of the leading bias term involves some nontrivial estimates of the lattice point counts over translated subregions. Counting lattice points in scaled-up sets is a hard problem and has received a lot of attention in analytic number theory and in combinatorics. Even for the case of the plane (i.e., d = 2), the counting results available in the literature are directly applicable to our problem only for a very restricted class of subregions (those having the so-called "smoothly winding border") [cf. van der Corput (1920), Huxley (1993, 1996)]. Here explicit expressions for the bias terms are derived for a more general class of sampling regions using some new estimates on the discrepancy between the number of lattice points and the volume of the shifted subregions in the plane and in three-dimensional Euclidean space. In particular, our results are applicable to sampling regions that do not necessarily have "smoothly winding borders". Minimizing the combined expansions for the bias and the variance parts, we derive explicit expressions for the theoretical optimal block size for sampling regions of different shapes. To briefly describe the result for a few common shapes, suppose the sampling region R_n is obtained by inflating a given set R₀ ⊂ (−1/2, 1/2]^d by a scaling constant λ_n as R_n = λ_n R₀, and that the subsamples are formed by considering the translates of ${}_sR_n = {}_s\lambda_n R_0$.
Then the theoretically optimal choice of the block size for the OL version is of the form
$${}_s\lambda_n^{opt} = \left( \frac{\lambda_n^d B_0^2}{d\, K_0\, \tau^4} \right)^{1/(d+2)} (1 + o(1))$$
as n → ∞, for some constants B₀ and K₀ (coming from the bias and the variance terms, respectively), where τ² is a population parameter that does not depend on the shape of the sampling region R_n (see Theorem 5.1 for details). Table 2 lists the constants B₀ and K₀ for some shapes of R_n. It follows from Table 2 that, unlike the time series case, in higher dimensions the optimal block size critically depends on the shape of the spatial sampling region R_n. The situation simplifies only slightly for the NOL subsampling scheme: the constant K₀ is unnecessary for computing optimal NOL subsamples, but the bias constant B₀ is often the same for the estimators from both versions of subsampling. These expressions may be readily used to obtain 'plug-in' estimates of the theoretical optimal block lengths for use in practice.

Table 2: Examples of B₀, K₀ for some sampling regions R_n. Cross and triangle shapes appear in Figure 1, Section 6; see Section 6 for further details. Autocovariances σ(·) and Euclidean, l₁, and l∞ norms ‖·‖, ‖·‖₁, ‖·‖∞ are described in Section 2.3.

    R_n                      K₀              B₀
    Sphere in IR³            34/105          $\frac{3}{2} \sum_{k\in\mathbb{Z}^3} \|k\|\,\sigma(k)$
    Cross in IR²             4/9 · 191/192   $\frac{4}{3} \sum_{k\in\mathbb{Z}^2} \|k\|_1\,\sigma(k)$
    Right triangle in IR²    2/5             $2 \sum_{k=(k_1,k_2)'\in\mathbb{Z}^2,\ \mathrm{sign}\,k_1 = \mathrm{sign}\,k_2} \|k\|_1\,\sigma(k)\ +\ 2 \sum_{k\in\mathbb{Z}^2,\ \mathrm{sign}\,k_1 \neq \mathrm{sign}\,k_2} \|k\|_\infty\,\sigma(k)$

The rest of the paper is organized as follows. In Section 2, we describe the spatial subsampling method and state the assumptions used in the paper. In Sections 3 and 4, we respectively derive expansions for the variance and the bias parts of the subsampling estimators. Theoretical optimal block lengths are derived in Section 5. The results are illustrated with some common examples in Section 6. Section 7 addresses data-based subsample size selection. Proofs of variance and bias results are separated into Sections 8 and 9, respectively.

2 Variance estimators via subsampling

In Section 2.1, we frame the sampling design and the structure of the sampling region. Two methods of subsampling are presented in Section 2.2 along with the corresponding nonparametric variance estimators. Assumptions and Conditions used in the paper are given in Section 2.3.

2.1 The sampling structure

To describe the sampling scheme used, we first assume all potential sampling sites are located on a translate of the rectangular integer lattice in IR^d. For a fixed (chosen) vector t ∈ [−1/2, 1/2)^d, we identify the t-translated integer lattice as Z^d ≡ t + Z^d. Let {Z(s) | s ∈ Z^d} be a stationary weakly dependent random field (hereafter r.f.) taking values in IR^p. (We use bold font as a standard to denote vectors in the sampling space IR^d and normal font for vectors in IR^p, including Z(·).) We suppose that the process Z(·) is observed at sampling sites lying within the sampling region R_n ⊂ IR^d. Namely, the collection of available observations is {Z(s) : s ∈ R_n ∩ Z^d}. To obtain the results in the paper, we assume that the sampling region R_n becomes unbounded as the sample size increases. This provides a commonly used "increasing domain" framework for studying asymptotics with spatial lattice data [cf. Cressie (1991)]. We next specify the structure of the regions R_n and employ a formulation similar to that of Lahiri (1999a,b).
Let R₀ be a Borel subset of (−1/2, 1/2]^d containing an open neighborhood of the origin, such that for any sequence of positive real numbers a_n → 0, the number of cubes of the scaled lattice a_n Z^d that intersect both the closure of R₀ and the closure of R₀ᶜ is O((a_n^{-1})^{d-1}) as n → ∞. Let Δ_n be a sequence of d × d diagonal matrices, with positive diagonal elements λ₁^{(n)}, ..., λ_d^{(n)}, such that each λ_i^{(n)} → ∞ as n → ∞. We assume that the sampling region R_n is obtained by "inflating" the template set R₀ by the directional scaling factors Δ_n; that is, R_n = Δ_n R₀. Because the origin is assumed to lie in R₀, the sampling region R_n grows outward in all directions as n increases. Furthermore, if the scaling factors are all equal (λ₁^{(n)} = ··· = λ_d^{(n)}), the shape of R_n remains the same for different values of n.

The formulation given above allows the sampling region R_n to have a large variety of fairly irregular shapes, with the boundary condition on R₀ imposed to avoid pathological cases. Some common examples of such regions are convex subsets of IR^d, such as spheres, ellipsoids, and polyhedrons, as well as certain non-convex subsets with irregular boundaries, such as star-shaped regions. Sherman and Carlstein (1994) and Sherman (1996) consider a similar class of such regions in the plane (i.e., d = 2), where the boundaries of the sets R₀ are delineated by simple rectifiable curves of finite length. The border requirements on R₀ ensure that the number of observations near the boundary of R_n is negligible compared to the totality of data values. This assumption is crucial to the derivation of the results to follow. In addition, Perera (1997) demonstrates that the border geometry of R₀ can significantly influence the asymptotic distribution of sample means taken with observations from "expanding" sets, like R_n.

2.2 Subsampling designs and variance estimators

We suppose that the relevant statistic, whose variance we wish to estimate, can be represented as a function of sample means. Let θ̂_n = H(Z̄_{N_n}) be an estimator of the population parameter of interest θ = H(µ), where H : IR^p → IR is a smooth function, EZ(t) = µ ∈ IR^p is the mean of the stationary r.f. Z(·), and Z̄_{N_n} is the sample mean of the N_n observations within R_n. We can write
$$\bar{Z}_{N_n} = N_n^{-1} \sum_{s \in \mathbb{Z}^d \cap R_n} Z(s). \tag{2.1}$$
This parameter and estimator formulation is what Hall (1992) calls the "smooth function" model, and it has been used in other scenarios, such as with the moving block bootstrap (MBB) and empirical likelihood, for studying approximately linear functions of a sample mean [Lahiri (1996), DiCiccio et al. (1991)]. By considering suitable functions of the Z(s)'s, one can represent a wide range of estimators under the present framework. In particular, these include means, products and ratios of means, autocorrelation estimators, sample lag cross-correlation estimators [Politis and Romano (1993b)], and Yule-Walker estimates for autoregressive processes [cf. Guyon (1995)]. The quantity which we seek to estimate nonparametrically is the variance of the normalized statistic $\sqrt{N_n}\,\hat\theta_n$, say, $\tau_n^2 = N_n E(\hat\theta_n - E\hat\theta_n)^2$. In our problem, this goal is equivalent to consistently estimating the limiting variance τ² = lim_{n→∞} τ_n².
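To make the "smooth function" model concrete, here is a minimal sketch (our construction, not the paper's): the sample variance of a real field X arises by taking p = 2, Z(s) = (X(s), X(s)²)′ and H(z₁, z₂) = z₂ − z₁².

```python
import numpy as np

def H(zbar):
    """Smooth function H(z1, z2) = z2 - z1^2, so that H(mean of Z) is the
    sample variance of X when Z(s) = (X(s), X(s)^2)."""
    z1, z2 = zbar
    return z2 - z1 ** 2

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 40))          # X(s) on a 40 x 40 grid R_n
Z = np.stack([X, X ** 2], axis=-1)         # Z(s) = (X(s), X(s)^2) in R^2
theta_hat = H(Z.reshape(-1, 2).mean(axis=0))
print(theta_hat)                           # close to Var X(s) = 1
```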
2.2.1 Overlapping subsamples

Variance estimation with OL subsampling regions has been presented previously in the literature, though in more narrow sampling situations. Sherman (1996) considered an OL subsample-based estimator for sampling regions in IR²; Politis and Romano (1994) extended a similar estimator for rectangular regions in IR^d with faces parallel to the coordinate axes (i.e., R₀ = (−1/2, 1/2]^d); and a host of authors, in a variety of contexts, have examined OL subsample estimators applied to time series data [cf. Song and Schmeiser (1988), Politis and Romano (1993a), Fukuchi (1999)].

We first consider creating a smaller version of R_n, which will serve as a template for the OL subsampling regions. To this end, let ${}_s\Delta_n$ be a d × d diagonal matrix with positive diagonal elements ${}_s\lambda_1^{(n)}, \ldots, {}_s\lambda_d^{(n)}$, such that ${}_s\lambda_i^{(n)}/\lambda_i^{(n)} \to 0$ and ${}_s\lambda_i^{(n)} \to \infty$ as n → ∞, for each i = 1, ..., d. (The matrix Δ_n represents the determining scaling factors for R_n, and ${}_s\Delta_n$ shall be the factors used to define the subsamples.) We make the "prototype" subsampling region
$${}_sR_n = {}_s\Delta_n R_0, \tag{2.2}$$
and identify a subset of Z^d, say J_OL, corresponding to all integer translates of ${}_sR_n$ lying within R_n. That is, $J_{OL} = \{ i \in \mathbb{Z}^d : i + {}_sR_n \subset R_n \}$. The desired OL subsampling regions are precisely the translates of ${}_sR_n$ given by $R_{i,n} \equiv i + {}_sR_n$, i ∈ J_OL. Note that the origin belongs to J_OL and, clearly, some of these subregions overlap. Let ${}_sN_n = |\mathbb{Z}^d \cap {}_sR_n|$ be the number of sampling sites in ${}_sR_n$ and let |J_OL| denote the number of available subsampling regions. (The number of sampling sites within each OL subsampling region is the same, namely for any i ∈ J_OL, ${}_sN_n = |\mathbb{Z}^d \cap R_{i,n}|$.) For each i ∈ J_OL, compute θ̂_{i,n} = H(Z̄_{i,n}), where
$$\bar{Z}_{i,n} = {}_sN_n^{-1} \sum_{s \in \mathbb{Z}^d \cap R_{i,n}} Z(s)$$
denotes the sample mean of the observations within the subregion. We then have the OL subsample variance estimator of τ_n²:
$$\hat\tau_{n,OL}^2 = |J_{OL}|^{-1} \sum_{i \in J_{OL}} {}_sN_n \big( \hat\theta_{i,n} - \tilde\theta_n \big)^2, \qquad \tilde\theta_n = |J_{OL}|^{-1} \sum_{i \in J_{OL}} \hat\theta_{i,n}.$$

2.2.2 Non-overlapping subsamples

Carlstein (1986) first proposed a variance estimator involving NOL subsamples for time processes. Politis and Romano (1993a) and Sherman and Carlstein (1994) demonstrated the consistency of variance estimation, via NOL subsampling, for certain rectangular regions in IR^d and some sampling regions in IR², respectively. Here we adopt a formulation similar to those of Sherman and Carlstein (1994) and Lahiri (1999a). The sampling region R_n is first divided into disjoint "cubes". Let ${}_s\Delta_n$ be the previously described d × d diagonal matrix from (2.2), which will determine the "window width" of the partitioning cubes. Let $J_{NOL} = \{ i \in \mathbb{Z}^d : {}_s\Delta_n (i + (-1/2, 1/2]^d) \subset R_n \}$ represent the set of all "inflated" subcubes that lie inside R_n, and denote its cardinality by |J_NOL|. For each i ∈ J_NOL, define the subsampling region $\tilde R_{i,n} = {}_s\Delta_n (i + R_0)$ by inscribing the translate of ${}_s\Delta_n R_0$ such that the origin is mapped onto the midpoint of the cube ${}_s\Delta_n (i + (-1/2, 1/2]^d)$. This provides a collection of NOL subsampling regions, which are smaller versions of the original sampling region R_n, that lie inside R_n. For each i ∈ J_NOL, the function H(·) is evaluated at the sample mean, say Z̃_{i,n}, of the corresponding subsampling region R̃_{i,n} to obtain θ̂_{i,n} = H(Z̃_{i,n}). The NOL subsample estimator of τ_n² is again an appropriately scaled sample variance:
$$\hat\tau_{n,NOL}^2 = |J_{NOL}|^{-1} \sum_{i \in J_{NOL}} {}_sN_{i,n} \big( \hat\theta_{i,n} - \tilde\theta_n \big)^2, \qquad \tilde\theta_n = |J_{NOL}|^{-1} \sum_{i \in J_{NOL}} \hat\theta_{i,n},$$
where ${}_sN_{i,n} = |\mathbb{Z}^d \cap \tilde R_{i,n}|$ denotes the number of sampling sites within a given NOL subsample. We note that ${}_sN_{i,n}$ may differ between NOL subsamples, but all such subsamples will have exactly ${}_sN_{i,n} = {}_sN_n$ sites available if the diagonal elements of ${}_s\Delta_n$ are integers.
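The following sketch (ours) implements the OL and NOL estimators of Section 2.2 in the simplest special case: θ̂ is the sample mean and the template is rectangular, R₀ = (−1/2, 1/2]², so the subregions i + ${}_sR_n$ are m × m blocks inside an n₁ × n₂ grid.

```python
import numpy as np

def spatial_subsample_var(field, m, overlapping=True):
    """OL/NOL subsample estimator of tau_n^2 = N_n * Var(theta_hat) for the
    mean of a field on an n1 x n2 grid. Rectangular template, so subregions
    are m x m blocks; step 1 gives J_OL, step m the disjoint tiling J_NOL."""
    n1, n2 = field.shape
    step = 1 if overlapping else m
    means = np.array([field[i:i + m, j:j + m].mean()
                      for i in range(0, n1 - m + 1, step)
                      for j in range(0, n2 - m + 1, step)])
    return m * m * np.mean((means - means.mean()) ** 2)

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 60))                        # iid field: tau^2 = 1
print(spatial_subsample_var(X, m=8, overlapping=True),   # OL version
      spatial_subsample_var(X, m=8, overlapping=False))  # NOL version
```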
2.3 Assumptions

For stating the assumptions, we need to introduce some notation. For a vector x = (x₁, ..., x_d)′ ∈ IR^d, let ‖x‖ and ‖x‖₁ = Σ_{i=1}^d |x_i| denote the usual Euclidean and l₁ norms of x, respectively, and denote the l∞ norm by ‖x‖∞ = max_{1≤k≤d} |x_k|. Define dis(E₁, E₂) = inf{‖x − y‖∞ : x ∈ E₁, y ∈ E₂} for two sets E₁, E₂ ⊂ IR^d. We shall use the notation |·| also in two other cases: for a countable set B, |B| denotes the cardinality of the set B; for an uncountable set A ⊂ IR^d, |A| refers to the volume (i.e., the IR^d Lebesgue measure) of A. Let F_Z(T) = σ⟨Z(s) : s ∈ T⟩ be the σ-field generated by the variables {Z(s) : s ∈ T}, T ⊂ Z^d. For T₁, T₂ ⊂ Z^d, write α̃(T₁, T₂) = sup{|P(A ∩ B) − P(A)P(B)| : A ∈ F_Z(T₁), B ∈ F_Z(T₂)}. Then, the strong mixing coefficient for the r.f. Z(·) is defined as
$$\alpha(k, l) = \sup\{ \tilde\alpha(T_1, T_2) : T_i \subset \mathbb{Z}^d,\ |T_i| \le l,\ i = 1, 2;\ \mathrm{dis}(T_1, T_2) \ge k \}. \tag{2.3}$$
Note that the supremum in the definition of α(k, l) is taken over sets T₁, T₂ which are bounded. For d > 1, this is important. A r.f. on the (rectangular) lattice Z^d with d ≥ 2 that satisfies a strong mixing condition of the form
$$\lim_{k\to\infty} \sup\{ \tilde\alpha(T_1, T_2) : T_1, T_2 \subset \mathbb{Z}^d,\ \mathrm{dis}(T_1, T_2) \ge k \} = 0 \tag{2.4}$$
with the supremum taken over possibly unbounded sets necessarily belongs to the more restricted class of ρ-mixing r.f.'s [cf. Bradley (1989)]. Politis and Romano (1993a) used moment inequalities based on the mixing condition in (2.4) to determine the orders of the bias and variance of $\hat\tau^2_{n,OL}$, $\hat\tau^2_{n,NOL}$ for rectangular sampling regions.

For proving the subsequent theorems, the assumptions below are needed, along with two conditions stated as functions of a positive argument r ∈ Z₊ = {0, 1, 2, ...}. In the following, det(Δ) represents the determinant of a square matrix Δ. For α = (α₁, ..., α_p)′ ∈ (Z₊)^p, let D^α denote the αth order partial differential operator $\partial^{\alpha_1 + \cdots + \alpha_p}/\partial x_1^{\alpha_1} \cdots \partial x_p^{\alpha_p}$, and let ∇ = (∂H(µ)/∂x₁, ..., ∂H(µ)/∂x_p)′ be the vector of first-order partial derivatives of H at µ. Limits in order symbols are taken letting n tend to infinity.

Assumptions:

(A.1) There exists a d × d diagonal matrix Δ₀, det(Δ₀) > 0, such that $\frac{1}{{}_s\lambda_1^{(n)}}\, {}_s\Delta_n \to \Delta_0$.

(A.2) For the scaling factors of the sampling and subsampling regions:
$$\sum_{i=1}^{d} \frac{{}_s\lambda_i^{(n)}}{\lambda_i^{(n)}} + \sum_{i=1}^{d} \frac{1}{{}_s\lambda_i^{(n)}} + \frac{[\det({}_s\Delta_n)]^{(d+1)/d}}{\det(\Delta_n)} = o(1), \qquad \max_{1\le i\le d} \lambda_i^{(n)} = O\Big( \min_{1\le i\le d} \lambda_i^{(n)} \Big).$$

(A.3) There exist nonnegative functions α₁(·) and g(·) such that lim_{k→∞} α₁(k) = 0, lim_{l→∞} g(l) = ∞, and the strong-mixing coefficient α(k, l) from (2.3) satisfies the inequality α(k, l) ≤ α₁(k)g(l), k > 0, l > 0.

(A.4) sup{α̃(T₁, T₂) : T₁, T₂ ⊂ Z^d, |T₁| = 1, dis(T₁, T₂) ≥ k} = o(k^{−d}).

(A.5) τ² > 0, where $\tau^2 = \sum_{k\in\mathbb{Z}^d} \sigma(k)$, $\sigma(k) = \mathrm{Cov}\big( \nabla' Z(t),\ \nabla' Z(t+k) \big)$.

Conditions:

D_r: H : IR^p → IR is r-times continuously differentiable and, for some a ∈ Z₊ and real C > 0,
$$\max\{ |D^\nu H(x)| : \|\nu\|_1 = r \} \le C(1 + \|x\|^a), \qquad x \in \mathbb{R}^p.$$

M_r: For some 0 < δ ≤ 1, 0 < κ < (2r − 1 − 1/d)(2r + δ)/δ, and C > 0,
$$E\|Z(t)\|^{2r+\delta} < \infty, \qquad \sum_{x=1}^{\infty} x^{(2r-1)d-1}\, \alpha_1(x)^{\delta/(2r+\delta)} < \infty, \qquad g(x) \le C x^{\kappa}.$$

Some comments about the assumptions and the conditions are in order. Assumption A.5 implies a positive, finite asymptotic variance τ² for the standardized estimator $\sqrt{N_n}\,\hat\theta_n$; we would like τ² ∈ (0, ∞) for a purposeful variance estimation procedure. In Assumption A.3, we formulate a conventional bound on the mixing coefficient α(k, l) from (2.3) that is applicable to many r.f.s and resembles the mixing assumption of Lahiri (1999a,b).
For r.f.s satisfying Assumption A.3, the "distance" component of the bound, α₁(·), often decreases at an exponential rate, while the function of "set size," g(·), increases at a polynomial rate [cf. Guyon (1995)]. Examples of r.f.s that meet the requirements of A.3 and M_r include Gaussian fields with analytic spectral densities, certain linear fields with a moving average or autoregressive (AR) representation (like m-dependent fields), separable AR(1)×AR(1) lattice processes suggested by Martin (1990) for modeling in IR², many Gibbs and Markov fields, and important time series models [cf. Doukhan (1994)]. Condition M_r combined with A.3 also provides useful moment bounds for normed sums of observations (see Lemma 8.2). Assumption A.4 permits the CLT in Bolthausen (1982) to be applied to sums of Z(·) on sets of increasing domain, in conjunction with the boundary condition on R₀, Assumption A.3, and Condition M_r. This version of the CLT (Stein's method) is derived from α-mixing conditions which ensure asymptotic independence between a single point and observations in arbitrary sets of increasing distance [cf. Perera (1997)]. Assumptions A.1 and A.2 set additional guidelines for how the sampling and subsampling design parameters, Δ_n and ${}_s\Delta_n$, may be chosen. The assumptions provide a flexible framework for handling "increasing domains" of many shapes. For d = 1, A.1-A.2 are equivalent to the requirements of Lahiri (1999c), who provides variance and bias expansions for the MBB variance estimator with weakly dependent time processes.

3 Variance expansions

We now give expansions for the asymptotic variance of the OL/NOL subsample variance estimators $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$ of τ_n² = N_n Var(θ̂_n).

Theorem 3.1 Suppose that Assumptions A.1-A.5 and Conditions D₂ and M_{5+2a} hold. Then
$$\text{(a)}\quad \mathrm{Var}(\hat\tau^2_{n,OL}) = K_0 \cdot \frac{\det({}_s\Delta_n)}{\det(\Delta_n)}\, 2\tau^4\, (1 + o(1)),$$
$$\text{(b)}\quad \mathrm{Var}(\hat\tau^2_{n,NOL}) = \frac{1}{|R_0|} \cdot \frac{\det({}_s\Delta_n)}{\det(\Delta_n)}\, 2\tau^4\, (1 + o(1)),$$
where $K_0 = \frac{1}{|R_0|} \int_{\mathbb{R}^d} \frac{|(x + R_0) \cap R_0|^2}{|R_0|^2}\, dx$ is an integral with respect to the IR^d Lebesgue measure.

The constant K₀ appearing in the variance expansion of the estimator $\hat\tau^2_{n,OL}$ is a property of the shape of the sampling template R₀, but not of its exact embedding in the space IR^d or even the scale of the set. Namely, K₀ is invariant to invertible affine transformations applied to R₀ and hence can be computed from either R₀ or R_n = Δ_n R₀. Values of K₀ for some template shapes are given in Table 3 and Section 6.

Table 3: Examples of K₀ from Theorem 3.1 for several shapes of R₀ ⊂ IR^d. *The trapezoid has a 90° interior angle and parallel sides b₂ ≥ b₁; c = (b₂/b₁ + 1)^{−2}[1 + 2(b₂/b₁ − 1)/(b₂/b₁ + 1)].

    R₀ shape             K₀
    Rectangle in IR^d    (2/3)^d
    Ellipsoid in IR³     34/105
    Cylinder in IR³      2/3 · (1 − 16/(3π²))
    Ellipse in IR²       1 − 16/(3π²)
    Trapezoid* in IR²    2/5 · (1 + 4c/9)

A stationary time sequence Z₁, ..., Z_n can be obtained within our sampling formulation by choosing R₀ = (−1/2, 1/2] and λ₁^{(n)} = n on the untranslated integer lattice Z = Z. In this special sampling case, an application of Theorem 3.1 yields
$$\mathrm{Var}(\hat\tau^2_{n,OL}) = \tfrac{2}{3}\, \mathrm{Var}(\hat\tau^2_{n,NOL}), \qquad \mathrm{Var}(\hat\tau^2_{n,NOL}) = \big( {}_s\lambda_1^{(n)}/\lambda_1^{(n)} \big)\, [2\tau^4](1 + o(1)),$$
a result which is well-known for "nearly" linear functions θ̂_n of a time series sample mean [cf. Künsch (1989)].
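The integral defining K₀ in Theorem 3.1 is straightforward to approximate numerically for a given template. The sketch below is our code (membership tests and sample sizes are illustrative choices); it roughly reproduces (2/3)² for the unit square and 1 − 16/(3π²) for a disc of radius 1/2, as in Table 3.

```python
import numpy as np

def K0_monte_carlo(in_R0, d, n_shift=5_000, n_pts=4_000, seed=3):
    """Monte Carlo value of K0 = (1/|R0|) * int |(x+R0) meet R0|^2 / |R0|^2 dx
    for a template R0 in (-1/2,1/2]^d given by a vectorized membership test."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(-0.5, 0.5, size=(n_pts, d))
    keep = in_R0(p)
    p = p[keep]                               # points uniform on R0
    vol_R0 = keep.mean()                      # |R0|, fraction of the unit cube
    x = rng.uniform(-1.0, 1.0, size=(n_shift, d))   # overlap is 0 outside [-1,1]^d
    total = 0.0
    for xi in x:
        overlap = vol_R0 * in_R0(p - xi).mean()     # |(x + R0) meet R0|
        total += (overlap / vol_R0) ** 2
    return (2.0 ** d) * total / n_shift / vol_R0    # 2^d = shift-cube volume

square = lambda p: np.all(np.abs(p) <= 0.5, axis=-1)
disc = lambda p: np.sum(p ** 2, axis=-1) <= 0.25
print(K0_monte_carlo(square, d=2), (2 / 3) ** 2)            # ~ 4/9
print(K0_monte_carlo(disc, d=2), 1 - 16 / (3 * np.pi ** 2)) # ~ 0.4596
```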
Theorem 3.1 implies that, under the "smooth function" model, the asymptotic variance of the OL subsample-based variance estimator is always strictly less than that of the NOL version, because
$$K_1 = \lim_{n\to\infty} \frac{\mathrm{Var}(\hat\tau^2_{n,OL})}{\mathrm{Var}(\hat\tau^2_{n,NOL})} = K_0 |R_0| < 1. \tag{3.1}$$
If both estimators have the same bias (which is often the case), (3.1) implies that variance estimation with OL subsamples is more efficient than the NOL subsample alternative, owing to a smaller asymptotic MSE. Unlike K₀, K₁ does depend on the volume |R₀|, which in turn is constrained by the R₀-template's geometry. That is, K₁ is ultimately bounded, through |R₀| in (3.1), by the amount of space that an object of R₀'s shape can possibly occupy within (−1/2, 1/2]^d, or by how much volume can be filled by a given geometrical body (e.g., a circle) compared to a cube. The constants K₁ in Table 1 are computed with templates of prescribed shape and largest possible volume in (−1/2, 1/2]^d. These values most accurately reflect the influence of R₀'s (or R_n's) geometry on the relative performance (i.e., variance, efficiency) of the estimators $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$.

To conclude this section, we remark that both subsample-based variance estimators can be shown to be (MSE) consistent under the conditions of Theorem 3.1, allowing for more general spatial sampling regions, in both shape and dimension, than previously considered. Inference on the parameter θ can be made through the limiting standard normal distribution of $\sqrt{N_n}(\hat\theta_n - \theta)/\hat\tau_n$ for τ̂_n = τ̂_{n,OL} or τ̂_{n,NOL}.

4 Bias expansions

We now try to capture and precisely describe the leading-order terms in the asymptotic bias of each subsample-based variance estimator, similar to the variance determinations of the previous section. We first establish the order of the dominant component in the bias expansions of $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$, which is the subject of the following lemma.

Lemma 4.1 With Assumptions A.1-A.5, suppose that Conditions D₂ and M_{2+a} hold for d ≥ 2, and D₃ and M_{3+a} for d = 1. Then, the subsample estimators of τ_n² = N_n Var(θ̂_n) have expectations:
$$E(\hat\tau^2_{n,OL}) = \tau_n^2 + O\big(1/{}_s\lambda_1^{(n)}\big) \qquad \text{and} \qquad E(\hat\tau^2_{n,NOL}) = \tau_n^2 + O\big(1/{}_s\lambda_1^{(n)}\big).$$

The lemma shows that, under the smooth function model, the asymptotic bias of each estimator is O(1/${}_s\lambda_1^{(n)}$) for all dimensions of sampling. Politis and Romano (1993a, p. 323) and Sherman (1996) showed this same size for the bias of $\hat\tau^2_{n,OL}$ with sampling regions based on rectangles R₀ = (−1/2, 1/2]^d or simple closed curves in IR², respectively. Lemma 4.1 extends these results to a broader class of sampling regions. However, we would like to precisely identify the O(1/${}_s\lambda_1^{(n)}$) bias component of $\hat\tau^2_{n,OL}$ or $\hat\tau^2_{n,NOL}$ to later obtain theoretically correct proportionality constants associated with optimal subsample scaling.

To achieve some measure of success in determining the exact bias of the subsampling estimators, we reformulate the subsampling design slightly so that ${}_s\lambda_n \equiv {}_s\lambda_1^{(n)} = \cdots = {}_s\lambda_d^{(n)}$. That is, a common scaling factor in all directions is now used to define the subsampling regions (as in Sherman and Carlstein (1994) and Sherman (1996)). This constraint will allow us to deal with the counting issues at the heart of the bias expansion. Adopting a common scaling factor ${}_s\lambda_n$ for the subsamples is also sensible for a few other reasons at this stage:

• "Unconstrained" optimum values of ${}_s\Delta_n$ cannot always be found by minimizing the asymptotic MSE of $\hat\tau^2_{n,OL}$ or $\hat\tau^2_{n,NOL}$, even for variance estimation of some desirable statistics on geometrically "simple" sampling and subsampling regions.
Consider estimating the variance of a real-valued sample mean over a rectangular sampling region in IR^d based on R₀ = (−1/2, 1/2]^d, with observations on Z^d = Z^d. If Assumptions A.1-A.5 and Condition M₁ hold, the leading term in the bias expansion can be shown to be:
$$\text{Bias of } \hat\tau^2_{n,OL} = \Big( -\sum_{i=1}^{d} \frac{L_i}{{}_s\lambda_i^{(n)}} \Big)(1 + o(1)); \qquad L_i = \sum_{k=(k_1,\ldots,k_d)'\in\mathbb{Z}^d} |k_i|\, \mathrm{Cov}\big(Z(0), Z(k)\big).$$
In using the parenthetical sum above to expand the MSE of $\hat\tau^2_{n,OL}$, one finds that the resulting MSE cannot be minimized over the permissible, positive range of ${}_s\Delta_n$ if the signs of the L_i values are unequal. That is, for d > 1, the subsample estimator MSE cannot always be globally minimized to obtain optimal subsample factors ${}_s\Delta_n$ by considering just the leading-order bias terms. An effort to determine and incorporate (into the asymptotic MSE) second- or third-order bias components quickly becomes intractable, even with rectangular regions.

• The diagonal components of ${}_s\Delta_n$ are asymptotically scalar multiples of each other by Assumption A.1. If so desired, a template choice for R₀ could be used to scale the expansion of the subsampling regions in each direction.

In the continued discussion, we assume
$${}_sR_n = {}_s\lambda_n R_0. \tag{4.1}$$

We frame the components necessary for determining the biases of the spatial subsample variance estimators in the next theorem. Let $C_n(k) \equiv |\mathbb{Z}^d \cap {}_sR_n \cap (k + {}_sR_n)|$ denote the number of pairs of observations in the subsampling region ${}_sR_n$ separated by a translate k ∈ Z^d.

Theorem 4.1 Suppose that d ≥ 2, ${}_sR_n = {}_s\lambda_n R_0$, and Assumptions A.1-A.5, Conditions D₃ and M_{3+a} hold. If, in addition, ${}_s\lambda_n \in \mathbb{Z}_+$ for NOL subsamples and
$$\lim_{n\to\infty} \frac{{}_sN_n - C_n(k)}{({}_s\lambda_n)^{d-1}} = C(k) \tag{4.2}$$
exists for all k ∈ Z^d, then
$$E(\hat\tau_n^2) - \tau_n^2 = \frac{-1}{{}_s\lambda_n |R_0|} \Big( \sum_{k\in\mathbb{Z}^d} C(k)\sigma(k) \Big)(1 + o(1)),$$
where $\sigma(k) = \mathrm{Cov}\big(\nabla' Z(t), \nabla' Z(t+k)\big)$ and where τ̂_n² is either $\hat\tau^2_{n,OL}$ or $\hat\tau^2_{n,NOL}$.

Note that the numerator on the left side of (4.2) is the number of integer grid points that lie in the subregion ${}_sR_n$ but not in the translate $k + {}_sR_n$. Hence, computing the bias above actually requires counting the number of lattice points inside intersections like ${}_sR_n \cap (k + {}_sR_n)$, which is difficult in general. To handle the problem, one may attempt to estimate the count C_n(k) with the corresponding Lebesgue volume, $|{}_sR_n \cap (k + {}_sR_n)|$, and then quantify the resulting approximation error. The determination of volumes or areas may not be easy either, but it is hopefully more manageable. For example, if R₀ is a circle, the area of ${}_s\lambda_n R_0$ can be readily computed, but the number of Z² integer points inside ${}_s\lambda_n R_0$ is not so simple and was in fact a famous consideration of Gauss [cf. Krätzel (1988, p. 141)].
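The count-versus-volume discrepancy at issue in (4.2) can be seen directly in Gauss's circle problem. The short sketch below (our code, purely for illustration) tallies Z² points in a disc of radius λ against its area πλ²; the error stays well below the trivial O(λ) boundary bound.

```python
import numpy as np

# Number of Z^2 points in the disc of radius lam versus its area pi*lam^2.
for lam in [10, 100, 1000]:
    r = np.arange(-lam, lam + 1)
    X, Y = np.meshgrid(r, r)
    count = np.count_nonzero(X ** 2 + Y ** 2 <= lam ** 2)
    area = np.pi * lam ** 2
    print(f"lam={lam}: count={count}, count-area={count - area:+.1f}, "
          f"trivial O(lam) bound ~ {lam}")
```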
We first note that the boundary condition on R₀ provides a general (trivial) bound on the discrepancy between the count C_n(k) and the volume $|{}_sR_n \cap (k + {}_sR_n)|$: namely, O(${}_s\lambda_n^{d-1}$). However, the size of the numerator in (4.2) is also O(${}_s\lambda_n^{d-1}$), corresponding to the order of Z^d lattice points "near" the boundary of ${}_sR_n$. Consequently, a standard O(${}_s\lambda_n^{d-1}$) bound on the volume-count approximation error is too large to immediately justify the exchange of the volumes $|{}_sR_n|$, $|{}_sR_n \cap (k + {}_sR_n)|$ for the counts ${}_sN_n$, C_n(k) in (4.2). Bounds on the difference between lattice point counts and volumes have received much attention in analytic number theory, which we briefly mention.

Research has classically focused on sets outlined by "smooth" simple closed curves in the plane IR² and on one question in particular [Huxley (1996)]: when a curve with interior area A is 'blown up' by a factor b, how large is the difference between the number of Z² integer points inside the new curve and the area b²A? For convex sets with a smoothly winding border, van der Corput's (1920) answer to the posed question above is O(b^{46/69+ε}), while the best answer is O(b^{46/73+ε}) for curves with a sufficiently differentiable radius of curvature [Huxley (1993, 1996)]. These types of bounds, however, are invalid for many convex polygonal templates R₀ in IR², such as triangles, trapezoids, etc., where often the difference between the number of Z² integer points in ${}_sR_n = {}_s\lambda_n R_0$ and its area is of exact order O(${}_s\lambda_n$) (set also by the boundary condition on R₀, or the perimeter length of ${}_sR_n$). The problem above, as considered by number theorists, also does not directly address counts for intersections between an expanding region and its vector translates, e.g., ${}_sR_n \cap (k + {}_sR_n)$.

To eventually compute "closed-form" bias expansions for $\hat\tau^2_{n,OL}$, we use approximation techniques for subtracted lattice point counts. For each k ∈ Z^d, we

1. replace the numerator of (4.2) with the difference of the corresponding Lebesgue volumes;

2. show the following error term is of sufficiently small order o(${}_s\lambda_n^{d-1}$):
$$\big( {}_sN_n - C_n(k) \big) - \big( {}_s\lambda_n^d |R_0| - |{}_sR_n \cap (k + {}_sR_n)| \big) = \big( {}_sN_n - {}_s\lambda_n^d |R_0| \big) - \big( C_n(k) - |{}_sR_n \cap (k + {}_sR_n)| \big).$$

We do approximate the number of lattice points in ${}_sR_n$ and ${}_sR_n \cap (k + {}_sR_n)$ by set volumes, though the Lebesgue volume may not adequately capture the lattice point count in either set. However, the difference between the approximation errors ${}_sN_n - {}_s\lambda_n^d |R_0|$ and $C_n(k) - |{}_sR_n \cap (k + {}_sR_n)|$ can be shown to be asymptotically small enough, for some templates R₀, to justify replacing counts with volumes in (4.2) (see Lemma 9.4). That is, these two volume-count estimation errors can cancel to a sufficient extent when subtracted. The above approach becomes slightly more complicated for NOL subsamples, $\tilde R_{i,n} = {}_s\Delta_n(i + R_0)$, which may vary in the number of sampling sites ${}_sN_{i,n}$. In this case, errors incurred by approximating the counts $|\mathbb{Z}^d \cap \tilde R_{i,n} \cap (k + \tilde R_{i,n})|$ with the volumes $|\tilde R_{i,n} \cap (k + \tilde R_{i,n})|$ are shown to be asymptotically negligible, uniformly in i ∈ J_NOL.

In the following theorem, we use this technique to give bias expansions for a large class of sampling regions in IR^d, d ≤ 3, which are "nearly" convex. That is, the sampling region R_n differs from a convex set possibly only at its boundary, but sampling sites on the border may be arbitrarily included or excluded from R_n.

Some notation is additionally required. For α = (α₁, ..., α_p)′ ∈ (Z₊)^p and x ∈ IR^p, write $x^\alpha = \prod_{i=1}^{p} x_i^{\alpha_i}$, $\alpha! = \prod_{i=1}^{p} (\alpha_i!)$, and $c_\alpha = D^\alpha H(\mu)/\alpha!$. Let Z∞ denote a random vector with a normal N(0, Σ∞) distribution on IR^p, where Σ∞ is the limiting covariance matrix of the scaled sample mean $\sqrt{N_n}(\bar Z_{N_n} - \mu)$ from (2.1). Let B°, B̄ denote the interior and closure of B ⊂ IR^d, respectively.

Theorem 4.2 Suppose ${}_sR_n = {}_s\lambda_n R_0$ and there exists a convex set B such that B° ⊂ R₀ ⊂ B̄. With Assumptions A.2-A.5, assume Conditions D_{5−d} and M_{5−d+a} hold for d ∈ {1, 2, 3}. Then,
$$C(k) = V(k) \equiv \lim_{n\to\infty} \frac{|{}_sR_n| - |{}_sR_n \cap (k + {}_sR_n)|}{({}_s\lambda_n)^{d-1}}, \qquad k \in \mathbb{Z}^d,$$
whenever V(k) exists, and the biases $E(\hat\tau^2_{n,OL}) - \tau_n^2$, $E(\hat\tau^2_{n,NOL}) - \tau_n^2$ are equal to:
for d = 1,
$$\frac{-1}{{}_s\lambda_n |R_0|} \Big( \sum_{k\in\mathbb{Z}} |k|\sigma(k) + C_\infty \Big)(1 + o(1));$$
for d = 2 or 3,
$$-\sum_{k\in\mathbb{Z}^d} \frac{|{}_sR_n| - |{}_sR_n \cap (k + {}_sR_n)|}{|{}_sR_n|}\, \sigma(k)\,(1 + o(1)) \quad\text{or}\quad \frac{-1}{{}_s\lambda_n |R_0|} \Big( \sum_{k\in\mathbb{Z}^d} V(k)\sigma(k) \Big)(1 + o(1)), \text{ provided each } V(k) \text{ exists;}$$
where $\sigma(k) = \mathrm{Cov}\big(\nabla' Z(t), \nabla' Z(t+k)\big)$ and
$$C_\infty = \mathrm{Var}\Big( \sum_{\|\alpha\|_1=2} \frac{c_\alpha}{\alpha!} Z_\infty^\alpha \Big) + 2E\Big( \sum_{\|\alpha\|_1=1,\ \|\beta\|_1=3} \frac{c_\alpha c_\beta}{\beta!} Z_\infty^\alpha Z_\infty^\beta \Big) + 2\sum_{k_1,k_2\in\mathbb{Z}}\ \sum_{\|\alpha\|_1=1,\,\|\beta\|_1=1,\,\|\gamma\|_1=1} \frac{c_\alpha\, c_{(\beta+\gamma)}}{(\beta+\gamma)!}\, E\Big( \big(Z(t)-\mu\big)^\alpha \big(Z(t+k_1)-\mu\big)^\beta \big(Z(t+k_2)-\mu\big)^\gamma \Big).$$

Remark 1: If Condition D_x holds with C = 0 for x ∈ {2, 3, 4}, then Condition M_{x−1} is sufficient in Theorem 4.2.

Remark 2: For each k ∈ Z^d, the numerator in V(k) is of order O(${}_s\lambda_n^{d-1}$) by the R₀-boundary condition, which holds for convex templates. We may then expand the bias of the estimators through the limiting, scaled volume differences V(k). For d = 1, with samples and subsamples based on intervals, it can easily be seen that V(k) = |k|, which appears in Theorem 4.2.

The function H(·) needs to be increasingly "smoother" to determine the bias component of $\hat\tau^2_{n,OL}$ or $\hat\tau^2_{n,NOL}$ in lower-dimensional spaces d = 1 or 2. For a real-valued time series sample mean θ̂_n = Z̄_n, the well-known bias of the subsample variance estimators follows from Theorem 4.2 under our sampling framework R₀ = (−1/2, 1/2], Z = Z:
$$\frac{-1}{{}_s\lambda_n} \sum_{k\in\mathbb{Z}} |k|\, \mathrm{Cov}\big( \nabla' Z(0), \nabla' Z(k) \big) \tag{4.3}$$
with ∇ = 1. In general, though, terms in the Taylor expansion of θ̂_{i,n} (around µ) up to fourth order can contribute to the bias of $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$ when d = 1. In contrast, the asymptotic bias of the time series MBB variance estimator with "smooth" model statistics is very different from its subsample-based counterpart: the MBB variance estimator's bias is given by (4.3), determined only by the linear component of the Taylor expansion of θ̂_{i,n} [cf. Lahiri (1996)].

5 Asymptotically optimal subsample sizes

In the following, we consider "size" selection for the subsampling regions to maximize the large-sample accuracy of the subsample variance estimators. For the reasons discussed in Section 4, we examine a theoretically optimal scaling choice ${}_s\lambda_n$ for subregions in (4.1).

5.1 Theoretical optimal subsample sizes

Generally speaking, there is a trade-off in the effect of subsample size on the bias and variance of $\hat\tau^2_{n,OL}$ or $\hat\tau^2_{n,NOL}$. For example, increasing ${}_s\lambda_n$ reduces the bias but increases the variance of the estimators. The best value of ${}_s\lambda_n$ optimizes the over-all performance of a subsample variance estimator by balancing the contributions from both the estimator's variance and bias. An optimal ${}_s\lambda_n$ choice can be found by minimizing the asymptotic order of a variance estimator's MSE under a given OL or NOL sampling scheme.

Theorem 4.1 implies that the bias of the estimators $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$ is of exact order O(1/${}_s\lambda_n$). For a broad class of sampling regions R_n, the leading-order bias component can be determined explicitly with Theorem 4.2. We bring these variance and bias expansions together to obtain an optimal subsample scaling factor ${}_s\lambda_n$.

Theorem 5.1 Let ${}_sR_n = {}_s\lambda_n R_0$. With Assumptions A.2-A.5, assume Conditions D₂ and M_{5+2a} hold if d ≥ 2, and Conditions D₃ and M_{7+2a} if d = 1. If $B_0 |R_0| \equiv \sum_{k\in\mathbb{Z}^d} C(k)\sigma(k) + I_{\{d=1\}} C_\infty \neq 0$, then
$${}_s\lambda_{n,OL}^{opt} = \left( \frac{\det(\Delta_n)(B_0)^2}{d\, K_0\, \tau^4} \right)^{1/(d+2)} (1 + o(1)) \quad\text{and}\quad {}_s\lambda_{n,NOL}^{opt} = \left( \frac{\det(\Delta_n)|R_0|(B_0)^2}{d\, \tau^4} \right)^{1/(d+2)} (1 + o(1)).$$

Remark 3: If Condition D_x holds with C = 0 for x ∈ {2, 3}, then Condition M_{2x−1} is sufficient.

Remark 4: Theorem 5.1 suggests that optimally scaled OL subsamples should be larger than the NOL ones by the scalar (K₁)^{−1/(d+2)} > 1, where K₁ = K₀|R₀| is the limiting ratio of variances from (3.1).
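The Theorem 5.1 formulas are simple to evaluate once their ingredients are in hand; the sketch below (ours) does so, with all numeric inputs hypothetical stand-ins for the plug-in estimates discussed in Section 7.

```python
def optimal_scaling(det_Delta_n, B0, K0, vol_R0, tau2, d):
    """Theorem 5.1 optimal subsample scaling factors:
    OL:  (det(Delta_n) * B0^2 / (d * K0 * tau^4))^(1/(d+2)),
    NOL: (det(Delta_n) * |R0| * B0^2 / (d * tau^4))^(1/(d+2))."""
    tau4 = tau2 ** 2
    ol = (det_Delta_n * B0 ** 2 / (d * K0 * tau4)) ** (1 / (d + 2))
    nol = (det_Delta_n * vol_R0 * B0 ** 2 / (d * tau4)) ** (1 / (d + 2))
    return ol, nol

# Hypothetical inputs for a 100 x 100 rectangular region (d = 2, K0 = 4/9,
# |R0| = 1); in practice B0 and tau2 are replaced by data-based estimates.
print(optimal_scaling(det_Delta_n=100 * 100, B0=2.0, K0=4 / 9,
                      vol_R0=1.0, tau2=1.5, d=2))
```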
It is well known in the time series case that the OL subsampling scheme produces an asymptotically more efficient variance estimator than its NOL counterpart. We can now quantify the relative efficiency of the two subsampling procedures in d-dimensional sampling space. With each variance estimator respectively optimized using (4.1), $\hat\tau^2_{n,OL}$ is more efficient than $\hat\tau^2_{n,NOL}$, and the asymptotic relative efficiency of $\hat\tau^2_{n,NOL}$ to $\hat\tau^2_{n,OL}$ depends solely on the geometry of R₀:
$$ARE_d = \lim_{n\to\infty} \frac{E(\hat\tau^2_{n,OL} - \tau_n^2)^2}{E(\hat\tau^2_{n,NOL} - \tau_n^2)^2} = (K_1)^{2/(d+2)} < 1.$$
Possolo (1991), Politis and Romano (1993a, 1994), Hall and Jing (1996), and Garcia-Soidan and Hall (1997) have examined subsampling with rectangular regions based essentially on R₀ = (−1/2, 1/2]^d. Using the geometrical characteristic K₁ = (2/3)^d for rectangles, we can now examine the effect of the sampling dimension on the ARE_d of $\hat\tau^2_{n,NOL}$ to $\hat\tau^2_{n,OL}$ for these sampling regions. Although the ARE_d decreases as the dimension d increases, we find the relative improvement of $\hat\tau^2_{n,OL}$ over $\hat\tau^2_{n,NOL}$ is ultimately limited, and the ARE_d has a lower bound of 4/9 for all IR^d-rectangular regions.

We conclude this section by addressing a question raised by a referee on subsample shape selection. Although not widely considered in the literature, it is possible to formulate subsample variance estimators using subsamples of a freely chosen shape, rather than scaled-down copies of R_n. Nordman and Lahiri (2003) discuss comparing estimators, based on subsamples of different shapes, in terms of the asymptotic MSE of variance estimation (e.g., to select subsamples as scaled-down R_n-copies vs. rectangles). This involves finding variance and bias expansions for the OL, NOL estimators to determine optimal scaling for subsamples of an arbitrary shape; modified versions of the theorems in Sections 3-5 are given there. Such performance comparisons depend heavily on the subsample geometry (which solely determines the bias of the OL, NOL subsample estimators) as well as the r.f. covariances σ(k), k ∈ Z^d. For some limited sampling regions, it may be possible to select a good subsample shape on geometrical considerations alone. See Nordman and Lahiri (2003) for further details and examples.

6 Examples

We now provide some examples of the important quantities K₀, K₁, B₀ associated with the optimal scaling ${}_s\lambda_n^{opt}$ for some common sampling region templates, determined from Theorems 3.1 and 4.2. For subsamples from (4.1), the theoretically best ${}_s\lambda_n^{opt}$ can also be formulated in terms of |R_n| = det(Δ_n)|R₀| (sampling region volume), K₁, and B₀.

6.1 Examples in IR²

example 1. Rectangular regions in IR² (potentially rotated): if
$$R_0 = \big\{ \big( (l_1\cos\theta,\ l_2\sin\theta)\,x,\ (-l_1\sin\theta,\ l_2\cos\theta)\,x \big)' : x \in (-1/2, 1/2]^2 \big\}$$
for θ ∈ [0, π], 0 < l₁, l₂, then
$$K_0 = \frac{4}{9}, \qquad B_0 = \sum_{k=(k_1,k_2)'\in\mathbb{Z}^2} \Big( \frac{|k_1\cos\theta - k_2\sin\theta|}{l_1} + \frac{|k_1\sin\theta + k_2\cos\theta|}{l_2} \Big)\sigma(k).$$
The characteristics K₁, B₀ for determining optimal subsamples based on two rectangular templates are further described in Table 4.

example 2. If R₀ is a circle of radius r ≤ 1/2 centered at the origin, then K₀ appears in Table 3 and $B_0 = \frac{2}{r\pi} \sum_{k\in\mathbb{Z}^2} \|k\|\,\sigma(k)$.

example 3. For any triangle, K₀ = 2/5. Two examples are provided in Tables 2 and 4.

Figure 1: Examples of templates R₀ ⊂ (−1/2, 1/2]² outlined by solid lines, in panels (i)-(v): (i) a "diamond", (ii) a right triangle, (iii) a triangle, (iv) a parallelogram, (v) a cross. Cross-shaped sampling regions R_n described in Table 2 are based on the R₀ in (v). [Figure not reproduced here.]

example 4.
If R₀ is a regular hexagon, centered at the origin and with side length l ≤ 1/2, then
$$K_0 = \frac{37}{81}, \qquad B_0 = \frac{2}{3\sqrt{3}\,l} \sum_{k\in\mathbb{Z}^2} \big( |k_2| + \max\{\sqrt{3}|k_1|,\ |k_2|\} \big)\sigma(k).$$

example 5. For any parallelogram in IR² with interior angle γ and adjacent sides of ratio b ≥ 1, K₀ = 4/9 + (2/15)·b^{−2}|cos γ|(1 − |cos γ|). In particular, if a parallelogram R₀ is formed by two vectors (0, l₁)′, (l₂cos γ, l₂sin γ)′ extended from a point x ∈ (−1/2, 1/2]², then
$$B_0 = \sum_{k\in\mathbb{Z}^2} \bigg( \frac{\big|\, k_1\,|\cos\gamma| - k_2\,|\sin\gamma| \,\big|}{|\sin\gamma|\,\max\{l_1, l_2\}} + \frac{|k_2|}{\min\{l_1, l_2\}} \bigg)\sigma(k), \qquad \gamma \in (0, \pi),\ l_1, l_2 > 0.$$

Table 4: Examples of several shapes of R₀ ⊂ IR² and the associated K₁, B₀ for ${}_s\lambda_n^{opt}$.

    R₀                                K₁                     B₀
    (−1/2, 1/2]²                      4/9                    $\sum_{k\in\mathbb{Z}^2} \|k\|_1\,\sigma(k)$
    Circle of radius 1/2 at origin    π/4 − 4/(3π)           $\frac{4}{\pi} \sum_{k\in\mathbb{Z}^2} \|k\|\,\sigma(k)$
    "Diamond" in Figure 1(i)          2/9                    $2 \sum_{k\in\mathbb{Z}^2} \|k\|_\infty\,\sigma(k)$
    Right triangle in Figure 1(ii)    1/5                    Table 2
    Triangle in Figure 1(iii)         1/5                    $\sum_{k\in\mathbb{Z}^2,\ |k_1|\le|k_2|/2} 2|k_2|\,\sigma(k) + \sum_{k\in\mathbb{Z}^2,\ |k_1|>|k_2|/2} \big(|k_2| + 2|k_1|\big)\sigma(k)$
    Parallelogram in Figure 1(iv)     2/9 + (√5 − 1)/375     $\frac{4}{\sqrt{5}} \sum_{k\in\mathbb{Z}^2} \big( |k_1 - 2k_2|/5 + |k_2| \big)\sigma(k)$

For further bias-term B₀ calculation tools for more general (non-convex) sampling regions and templates R₀ (represented as the union of two approximately convex sets), see Nordman (2002).

6.2 Examples in IR^d, d ≥ 3

example 6. For any sphere, K₀ is given in Table 3. The properties B₀, K₁ of the sphere described in Tables 1 and 2 correspond to the template sphere R₀ of radius 1/2 (and maximal volume in (−1/2, 1/2]³).

example 7. The K₀ value for any IR³ cylinder appears in Table 3. If R₀ is a cylinder with circular base (parallel to the x-y plane) of radius r and height h, then
$$B_0 = \sum_{k=(k_1,k_2,k_3)'\in\mathbb{Z}^3} \bigg( \frac{|k_3|}{h} + \frac{2\sqrt{k_1^2 + k_2^2}}{\pi r} \bigg)\sigma(k).$$

The results of Theorem 4.2 for determining the bias B₀ also seem plausible for convex sampling regions in IR^d, d ≥ 4, but require further study of lattice point counting techniques in higher dimensions. However, bias expansions of the OL and NOL subsample variance estimators are relatively straightforward for an important class of rectangular sampling regions based on the prototype R₀ = (−1/2, 1/2]^d, which can then be used in optimal subsample scaling. These hypercubes have "faces" parallel to the coordinate axes, which simplifies the task of counting the sampling sites, or lattice points, within such regions. We give precise bias expansions in the following theorem, while allowing for potentially missing sampling sites at the borders, or faces, of the sampling region R_n.

Theorem 6.1 Let $(-1/2, 1/2)^d \subset \Lambda_\ell^{-1} R_0 \subset [-1/2, 1/2]^d$, d ≥ 3, for a d × d diagonal matrix Λ_ℓ with entries 0 < ℓ_i ≤ 1, i = 1, ..., d. Suppose ${}_sR_n = {}_s\lambda_n R_0$ and Assumptions A.2-A.5, Conditions D₂ and M_{2+a} hold. Then, the biases $E(\hat\tau^2_{n,OL}) - \tau_n^2$, $E(\hat\tau^2_{n,NOL}) - \tau_n^2$ are equal to $-{}_s\lambda_n^{-1} B_0 (1 + o(1))$, where
$$B_0 = \sum_{i=1}^{d} \sum_{k\in\mathbb{Z}^d} \frac{|k_i|}{\ell_i}\, \sigma(k), \qquad \sigma(k) = \mathrm{Cov}\big( \nabla' Z(t), \nabla' Z(t+k) \big).$$

example 8. For rectangular sampling regions R_n = Δ_n(−1/2, 1/2]^d, optimal subsamples may be chosen with
$${}_s\lambda_{n,NOL}^{opt} = \left( \frac{|R_n|}{d\,\tau^4} \Big( \sum_{k\in\mathbb{Z}^d} \|k\|_1\, \sigma(k) \Big)^2 \right)^{1/(d+2)} (1 + o(1)) \quad\text{or}\quad {}_s\lambda_{n,OL}^{opt} = {}_s\lambda_{n,NOL}^{opt} \left( \frac{3}{2} \right)^{d/(d+2)}$$
(common directional scaling) using the template R₀ = (−1/2, 1/2]^d.

7 Empirical subsample size determination

In this section we consider data-based estimation of the theoretical optimal subsample sizes. In particular, we propose methods for estimating the scaling factor ${}_s\lambda_n^{opt}$ for subsamples in (4.1), which are applicable to both OL and NOL subsampling schemes.
Inference on "best" subsample scaling closely resembles the problem of empirically gauging the theoretically optimal block size for the MBB variance estimator, which has been much considered for time series. We first frame the techniques used in this special setting, which can be modified for data-based subsample selection. Two distinct techniques have emerged for estimating the (time series) MBB block size. One approach involves "plug-in" type estimators of the optimal block length [cf. Bühlmann and Künsch (1999)]. For subsample inference on n Var(Z̄_n), Carlstein (1986) (with AR(1) models) and Politis et al. (1999) have described "plug-in" estimators of subsample length. The second MBB block selection method, suggested by Hall et al. (1995) and Hall and Jing (1996), uses subsampling to create a data-based version of the MSE of the MBB estimator (as a function of block size), which then is minimized to obtain an estimate of the optimal block length. With spatial data, Garcia-Soidan and Hall (1997) extended this empirical MSE selection procedure with subsample-based distribution estimators. For variance estimation of a univariate time series sample mean, Léger et al. (1992) proposed a subsample length estimate based on asymptotic MSE considerations; namely, selecting a size by simultaneously solving $({}_s\lambda_n^{opt})^3 = (3/2)\big(\hat B/\hat\tau^2_{n,OL}\big)^2\, n$ for ${}_s\lambda_n^{opt}$ and the associated estimator $\hat\tau^2_{n,OL}$, where B̂ estimates the parenthetical quantity in (4.3).

A "plug-in" procedure for inference on the theoretical optimal subsample size involves constructing, and subsequently substituting, (consistent) estimates of the unknown population parameters in ${}_s\lambda_n^{opt}$ from Theorem 5.1. The limiting variance τ² appears in the formulation of ${}_s\lambda_n^{opt}$ and could be approximated with a subsample variance estimate based on some preliminary size choice. The value K₀ can be determined from the available sampling region R_n, but selection of a template R₀ is also required in the "plug-in" approach. The best solution may be to pick a sampling template R₀ as the largest set of the form Λ^{−1}R_n, for a positive diagonal matrix Λ, that fits within (−1/2, 1/2]^d; this choice appears to reduce the magnitude of the bias of $\hat\tau^2_{n,OL}$, $\hat\tau^2_{n,NOL}$, which contributes heavily to the MSE of an estimator. In a spirit similar to the "plug-in" approach, one could empirically select ${}_s\lambda_n^{opt}$ as in Léger et al. (1992) and Politis and Romano (1993b). After evaluating K₀ and an estimate B̂₀ of the leading bias component B₀, compute $\hat\tau^2_{n,OL}$ (or analogously $\hat\tau^2_{n,NOL}$) for a series of ${}_s\lambda_n$ values and simultaneously solve the asymptotic MSE-based formulation ${}_s\lambda_n^{d+2} = \det(\Delta_n)(\hat B_0)^2/\{d K_0 (\hat\tau^2_{n,OL})^2\}$ for ${}_s\lambda_n$ and an associated $\hat\tau^2_{n,OL}$.
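A fixed-point sketch of the Léger et al. (1992)-style simultaneous solution for a univariate time series (d = 1, K₀ = 2/3) appears below. The code is ours and purely illustrative: the iteration scheme, stopping rule, and the use of the true AR(1) value of B (passed in to keep the sketch short, where a data-based B̂ would be used in practice) are all assumptions.

```python
import numpy as np

def solve_block_size(z, B_hat, n_iter=20):
    """Alternate between the OL estimate tau2(m) and the Section 7 equation
    m^3 = (3/2) * (B_hat / tau2)^2 * n until the block size m stabilizes."""
    n = len(z)
    m = max(2, int(n ** (1 / 3)))                    # crude initial size
    tau2 = np.var(z)
    for _ in range(n_iter):
        theta = np.array([z[i:i + m].mean() for i in range(n - m + 1)])
        tau2 = m * np.mean((theta - theta.mean()) ** 2)
        m_new = int(round(((3 / 2) * (B_hat / tau2) ** 2 * n) ** (1 / 3)))
        m_new = min(max(m_new, 2), n // 2)
        if m_new == m:
            break
        m = m_new
    return m, tau2

rng = np.random.default_rng(4)
n, rho = 5000, 0.5
e = rng.standard_normal(n)
z = np.empty(n); z[0] = e[0]
for t in range(1, n):
    z[t] = rho * z[t - 1] + e[t]
# For AR(1), sum_k |k| Cov(Z(0), Z(k)) = 2 * sum_{k>=1} k rho^k / (1 - rho^2).
B_true = 2 * sum(k * rho ** k / (1 - rho ** 2) for k in range(1, 200))
print(solve_block_size(z, B_true))
```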
8 Proofs for variance expansions

For the proofs, we use C to denote generic positive constants that do not depend on n or any Z^d integers (or Z^d lattice points). The real number r, appearing in some proofs, always assumes the value stated under Condition M_r with respect to the lemma or theorem under consideration. Unless otherwise specified, limits in order symbols are taken letting n tend to infinity. In the following, we denote the indicator function as I{·} (i.e., I{·} ∈ {0, 1} and I{A} = 1 if and only if an event A holds). For two sequences {r_n} and {t_n} of positive real numbers, we write r_n ∼ t_n if r_n/t_n → 1 as n → ∞. We write $\lambda_n^{\max}$ and ${}_s\lambda_n^{\max}$ for the largest diagonal entries of Δ_n and ${}_s\Delta_n$, respectively, while ${}_s\lambda_{\min}^{(n)} \ge 1$ will denote the smallest diagonal entry of ${}_s\Delta_n$.

We require a few lemmas for the proofs.

Lemma 8.1 Suppose T₁, T₂ ⊂ Z^d ≡ t + Z^d are bounded. Let p, q > 0 where 1/p + 1/q < 1. If X₁, X₂ are random variables, with X_i measurable with respect to F_Z(T_i), i = 1, 2, then
$$|\mathrm{Cov}(X_1, X_2)| \le 8\, (E|X_1|^p)^{1/p} (E|X_2|^q)^{1/q}\, \alpha\Big( \mathrm{dis}(T_1, T_2);\ \max_{i=1,2} |T_i| \Big)^{1 - 1/p - 1/q},$$
provided the expectations are finite and dis(T₁, T₂) > 0.

Proof. Follows from Theorem 3, Doukhan (1994, p. 9).

Lemma 8.2 Let r ∈ Z₊. Under Assumption A.3 and Condition M_r, for 1 ≤ m ≤ 2r and any T ⊂ Z^d ≡ t + Z^d,
$$E\Big\| \sum_{s\in T} \big( Z(s) - \mu \big) \Big\|^m \le C(\alpha)\, |T|^{m/2};$$
C(α) is a constant that depends only on the coefficients α(k, l), l ≤ 2r, and E‖Z(t)‖^{2r+δ}.

Proof: This follows from Theorem 1, Doukhan (1994, p. 26-31) and Jensen's inequality.

We next determine the asymptotic sizes of important sets relevant to the sampling or subsampling designs.

Lemma 8.3 Under Assumptions A.1-A.2, the number of sampling sites within
(a) the sampling region R_n: N_n = |R_n ∩ Z^d| ∼ |R₀| · det(Δ_n);
(b) an OL subsample R_{i,n}, i ∈ J_OL: ${}_sN_n$ ∼ |R₀| · det(${}_s\Delta_n$);
(c) a NOL subsample R̃_{i,n}, i ∈ J_NOL: ${}_sN_{i,n}$ ∼ |R₀| · det(${}_s\Delta_n$).
The number of
(d) OL subsamples within R_n: |J_OL| ∼ |R₀| · det(Δ_n);
(e) NOL subsamples within R_n: |J_NOL| ∼ |R₀| · det(Δ_n) · det(${}_s\Delta_n$)^{−1};
(f) sampling sites near the border of a subsample, R_{i,n} or R̃_{i,n}, satisfies
$$\sup_{i\in\mathbb{Z}^d} \big| \{ j \in \mathbb{Z}^d : T_j \cap R_{i,n} \neq \emptyset,\ T_j \cap R_{i,n}^c \neq \emptyset \ \text{for}\ T_j = j + [-2, 2]^d \} \big| \le C\, ({}_s\lambda_n^{\max})^{d-1}.$$

Proof: The results follow from the boundary condition on R₀; see Nordman (2002) for more details.

We require the next lemma for counting the number of subsampling regions which are separated by an appropriately "small" integer translate; we shall apply this lemma in the proof of Theorem 3.1. For k = (k₁, ..., k_d)′ ∈ Z^d, define the sets
$$J_n(k) = \big| \{ i \in J_{OL} : i + k + {}_s\Delta_n R_0 \subset R_n \} \big|, \qquad E_n = \{ k \in \mathbb{Z}^d : |k_j| \le {}_s\lambda_j^{(n)},\ j = 1, \ldots, d \}.$$

Lemma 8.4 Under Assumption A.2,
$$\max_{k\in E_n} \Big| 1 - \frac{J_n(k)}{|J_{OL}|} \Big| = o(1).$$

Proof: For k ∈ E_n, write the set J_n*(k) and bound its cardinality:
$$J_n^*(k) = \big| \{ i \in J_{OL} : (i + k + {}_s\Delta_n R_0) \cap \Delta_n R_0^c \neq \emptyset \} \big| \le \big| \{ i \in \mathbb{Z}^d : T_i \cap \Delta_n R_0^c \neq \emptyset,\ T_i \cap \Delta_n R_0 \neq \emptyset;\ T_i = i + {}_s\lambda_n^{\max}[-2, 2]^d \} \big| \le C\, {}_s\lambda_n^{\max} (\lambda_n^{\max})^{d-1},$$
by the boundary condition on R₀. We then have, for all k ∈ E_n,
$$|J_{OL}| \ge J_n(k) \ge |J_{OL}| - J_n^*(k) \ge |J_{OL}| - C\, {}_s\lambda_n^{\max} (\lambda_n^{\max})^{d-1}.$$
By Assumption A.2 and the growth rate of |J_OL| from Lemma 8.3, the proof is complete.

We now provide a theorem which captures the main contribution to the asymptotic variance expansion of the OL subsample variance estimator $\hat\tau^2_{n,OL}$ from Theorem 3.1.

Theorem 8.1 For i ∈ Z^d, let $Y_{i,n} = \nabla'(\bar Z_{i,n} - \mu)$. Under the Assumptions and Conditions of Theorem 3.1,
$${}_sN_n \sum_{k\in E_n} \mathrm{Cov}\big( Y_{0,n}^2, Y_{k,n}^2 \big) = K_0 \cdot [2\tau^4](1 + o(1)),$$
where the constant K₀ is defined in Theorem 3.1.

Proof: We give only a sketch of the important features; for more details, see Nordman (2002). For a set T ⊂ IR^d, define the function Σ(·) as
$$\Sigma(T) = \sum_{s\in\mathbb{Z}^d\cap T} \nabla' \big( Z(s) - \mu \big).$$
With the set intersection $R_{k,n}^{(I)} = {}_sR_n \cap (k + {}_sR_n)$, k ∈ Z^d, write the functions
$$H_{1n}(k) = \Sigma\big(R_{k,n} \setminus R_{k,n}^{(I)}\big), \qquad H_{2n}(k) = \Sigma\big(R_{0,n} \setminus R_{k,n}^{(I)}\big), \qquad H_{3n}(k) = \Sigma\big(R_{k,n}^{(I)}\big).$$
These represent, respectively, sums over sites in: R_{k,n} but not R_{0,n} = ${}_sR_n$; R_{0,n} but not R_{k,n}; and both R_{0,n} and R_{k,n}. Then define h_n(·) : Z^d → IR as
$$h_n(k) = E[H_{1n}^2(k)]E[H_{2n}^2(k)] + E[H_{1n}^2(k)]E[H_{3n}^2(k)] + E[H_{2n}^2(k)]E[H_{3n}^2(k)] + E[H_{3n}^4(k)] - ({}_sN_n)^4 \big[E(Y_{0,n}^2)\big]^2.$$
We will make use of the following proposition.
Proposition 8.1 Under the Assumptions and Conditions of Theorem 3.1,
$$\max_{k\in E_n} \Big| ({}_sN_n)^2\, \mathrm{Cov}\big( Y_{0,n}^2, Y_{k,n}^2 \big) - ({}_sN_n)^{-2} h_n(k) \Big| = o(1).$$

The proof of Proposition 8.1 can be found in Nordman (2002) and involves cutting out the Z^d lattice points near the borders of R_{0,n} and R_{k,n}, say B_{0,n} and B_{k,n}, with $\ell_n = \lfloor ({}_s\lambda_{\min}^{(n)})^e \rfloor$,
$$B_{j,n} = \big\{ i \in \mathbb{Z}^d : i \in R_{j,n},\ (i + \ell_n(-1, 1]^d) \cap R_{j,n}^c \neq \emptyset \big\}, \qquad j \in \mathbb{Z}^d,$$
where e = (κδ/{(2r + δ)(2r − 1 − 1/d)} + 1)/2 < 1 from Condition M_r. Here ℓ_n → ∞, ℓ_n = o(${}_s\lambda_{\min}^{(n)}$) is chosen so that the remaining observations in R_{0,n}, R_{k,n}, $R_{k,n}^{(I)}$ are nearly independent upon removing the B_{0,n}, B_{k,n} points and, using the R₀-boundary condition, the set cardinalities $|B_{0,n}|, |B_{k,n}| \le C \ell_n ({}_s\lambda_n^{\max})^{d-1}$ are of smaller order than ${}_sN_n$ (namely, these sets are asymptotically negligible in size).

By Proposition 8.1 and |E_n| = O(${}_sN_n$), we have
$$\Big| {}_sN_n \sum_{k\in E_n} \mathrm{Cov}\big( Y_{0,n}^2, Y_{k,n}^2 \big) - ({}_sN_n)^{-3} \sum_{k\in E_n} h_n(k) \Big| = o(1). \tag{8.1}$$
Consequently, we need only focus on $({}_sN_n)^{-3} \sum_{k\in E_n} h_n(k)$ to complete the proof of Theorem 8.1.

For measurability reasons, we create a set defined in terms of the IR^d Lebesgue measure:
$$E^+ \equiv \Big\{ \epsilon \in (0, 1),\ \epsilon < \frac{\det(\Delta_0)|R_0|}{2} :\ \big| \{ x \in \mathbb{R}^d : |(x + \Delta_0 R_0) \cap \Delta_0 R_0| = \epsilon \ \text{or}\ \det(\Delta_0)|R_0| - \epsilon \} \big| = 0 \Big\}.$$
Note that the set {0 < ε < min{1, (det(Δ₀)|R₀|)/2} : ε ∉ E⁺} is at most countable [cf. Billingsley (1986), Theorem 10.4]. For ε ∈ E⁺, define a new set as a function of ε and n:
$$\tilde R_{\epsilon,n} = \big\{ k \in \mathbb{Z}^d : |R_{k,n}^{(I)}| > \epsilon ({}_s\lambda_1^{(n)})^d,\ |{}_sR_n \setminus R_{k,n}^{(I)}| > \epsilon ({}_s\lambda_1^{(n)})^d \big\}.$$
Here $\tilde R_{\epsilon,n} \subset E_n$ because k ∉ E_n implies $R_{k,n}^{(I)} = \emptyset$.

We now further simplify $({}_sN_n)^{-3} \sum_{k\in E_n} h_n(k)$ using the following proposition involving $\tilde R_{\epsilon,n}$.

Proposition 8.2 There exist N ∈ Z₊ and a function b(·) : E⁺ → (0, ∞) such that b(ε) ↓ 0 as ε ↓ 0 and
$$({}_sN_n)^{-3} \Big| \sum_{k\in E_n} h_n(k) - \sum_{k\in\tilde R_{\epsilon,n}} h_n(k) \Big| \le C\Big( \epsilon + ({}_s\lambda_1^{(n)})^{-1} + [b(\epsilon)]^d \Big), \tag{8.2}$$
where C > 0 does not depend on ε ∈ E⁺ or n ≥ N.

The proof of Proposition 8.2 is tedious and given in Nordman (2002). The argument involves bounding the sum of h_n(·) over two separate sets in E_n: those integers in E_n that are either "too large" or "too small" in magnitude to be included in $\tilde R_{\epsilon,n}$.

To finish the proof, our approach (for an arbitrary ε ∈ E⁺) will be to write $({}_sN_n)^{-3} \sum_{k\in\tilde R_{\epsilon,n}} h_n(k)$ as the integral, with respect to the Lebesgue measure, of a step function, say f_{ε,n}(x); then show that lim_{n→∞} f_{ε,n}(x) exists almost everywhere (a.e.) on IR^d; and apply the Lebesgue Dominated Convergence Theorem (LDCT). By letting ε ↓ 0, we will obtain the limit of ${}_sN_n \sum_{k\in E_n} \mathrm{Cov}(Y_{0,n}^2, Y_{k,n}^2)$.

Fix ε ∈ E⁺. With counting arguments based on the boundary condition of R₀ and the definition of $\tilde R_{\epsilon,n}$, it holds that, for some N_ε ∈ Z₊ and all k ∈ $\tilde R_{\epsilon,n}$: $|R_{k,n}^{(I)} \cap \mathbb{Z}^d| \ge 1$ and ${}_sN_n - |R_{k,n}^{(I)} \cap \mathbb{Z}^d| \ge 1$ when n ≥ N_ε. We can rewrite $({}_sN_n)^{-2} h_n(k)$, k ∈ $\tilde R_{\epsilon,n}$, in the well-defined form (for n ≥ N_ε):
$$\frac{h_n(k)}{({}_sN_n)^2} = E\bigg[ \frac{H_{1n}^2(k)}{{}_sN_n - |R_{k,n}^{(I)}\cap\mathbb{Z}^d|} \bigg] E\bigg[ \frac{H_{2n}^2(k)}{{}_sN_n - |R_{k,n}^{(I)}\cap\mathbb{Z}^d|} \bigg] \bigg( 1 - \frac{|R_{k,n}^{(I)}\cap\mathbb{Z}^d|}{{}_sN_n} \bigg)^2$$
$$\qquad + \sum_{j=1}^{2} E\bigg[ \frac{H_{jn}^2(k)}{{}_sN_n - |R_{k,n}^{(I)}\cap\mathbb{Z}^d|} \bigg] E\bigg[ \frac{H_{3n}^2(k)}{|R_{k,n}^{(I)}\cap\mathbb{Z}^d|} \bigg] \frac{|R_{k,n}^{(I)}\cap\mathbb{Z}^d|}{{}_sN_n} \bigg( 1 - \frac{|R_{k,n}^{(I)}\cap\mathbb{Z}^d|}{{}_sN_n} \bigg)$$
$$\qquad + E\bigg[ \frac{H_{3n}^4(k)}{|R_{k,n}^{(I)}\cap\mathbb{Z}^d|^2} \bigg] \bigg( \frac{|R_{k,n}^{(I)}\cap\mathbb{Z}^d|}{{}_sN_n} \bigg)^2 - \big[ {}_sN_n\, E(Y_{0,n}^2) \big]^2.$$

For x = (x₁, ..., x_d)′ ∈ IR^d, write ⌊x⌋ = (⌊x₁⌋, ..., ⌊x_d⌋)′ ∈ Z^d. Let f_{ε,n}(x) : IR^d → IR be the step function defined as
$$f_{\epsilon,n}(x) = ({}_sN_n)^{-2}\, I_{\{\lfloor {}_s\lambda_1^{(n)} x \rfloor \in \tilde R_{\epsilon,n}\}}\, h_n\big( \lfloor {}_s\lambda_1^{(n)} x \rfloor \big).$$
We then have (with the same fixed ε ∈ E⁺):
$$\frac{1}{{}_sN_n} \sum_{k\in\tilde R_{\epsilon,n}} ({}_sN_n)^{-2} h_n(k) = \frac{({}_s\lambda_1^{(n)})^d}{{}_sN_n} \int_{\mathbb{R}^d} f_{\epsilon,n}(x)\, dx. \tag{8.3}$$
We focus on showing
\[
\lim_{n\to\infty}f_{\epsilon,n}(x) = f_\epsilon(x) \equiv \mathbb{I}\{x\in\tilde R_\epsilon\}\,2\tau^4\Big(\frac{|(x+\Delta_0R_0)\cap\Delta_0R_0|}{\det(\Delta_0)|R_0|}\Big)^2 \quad \text{a.e. } x \in \mathbb{R}^d, \tag{8.4}
\]
with $\tilde R_\epsilon = \{x \in \mathbb{R}^d : |(x+\Delta_0R_0)\cap\Delta_0R_0| > \epsilon,\ |\Delta_0R_0\setminus(x+\Delta_0R_0)| > \epsilon\}$, a Borel measurable set. Write $x_n = \lfloor {}_s\lambda_1^{(n)}x\rfloor$, $x \in \mathbb{R}^d$, to ease the notation. To establish (8.4), we begin by showing convergence of the indicator functions:
\[
\mathbb{I}\{x_n \in \tilde R_{\epsilon,n}\} \to \mathbb{I}\{x \in \tilde R_\epsilon\} \quad \text{a.e. } x \in \mathbb{R}^d. \tag{8.5}
\]
Define the sets $A_n(x) = ({}_s\lambda_1^{(n)})^{-1}\{(x_n + {}_sR_n)\cap{}_sR_n\}$ and $\tilde A_n(x) = \{({}_s\lambda_1^{(n)})^{-1}\,{}_sR_n\}\setminus A_n(x)$ as functions of $x \in \mathbb{R}^d$. The LDCT can be applied to show: for each $x \in \mathbb{R}^d$, $|A_n(x)| \to |(x+\Delta_0R_0)\cap\Delta_0R_0|$ and $|\tilde A_n(x)| \to |\Delta_0R_0\setminus(x+\Delta_0R_0)|$. Thus, if $x \in \tilde R_\epsilon$, then
\[
|A_n(x)| \to |(x+\Delta_0R_0)\cap\Delta_0R_0| > \epsilon, \qquad |\tilde A_n(x)| \to |\Delta_0R_0\setminus(x+\Delta_0R_0)| > \epsilon, \tag{8.6}
\]
implying further that $1 = \mathbb{I}\{x_n\in\tilde R_{\epsilon,n}\} \to \mathbb{I}\{x\in\tilde R_\epsilon\} = 1$. Now consider $\tilde R_\epsilon^c$. If $x \notin \tilde R_\epsilon$ is such that $|(x+\Delta_0R_0)\cap\Delta_0R_0| < \epsilon$ (or $|\Delta_0R_0\setminus(x+\Delta_0R_0)| < \epsilon$), then $|A_n(x)| < \epsilon$ (or $|\tilde A_n(x)| < \epsilon$) eventually for large $n$, and $0 = \mathbb{I}\{x_n\in\tilde R_{\epsilon,n}\} \to \mathbb{I}\{x\in\tilde R_\epsilon\} = 0$ in this case. Finally, $\epsilon \in E^+$ implies that the last possible subset of $\tilde R_\epsilon^c$ has Lebesgue measure zero; namely,
\[
\big|\{x \in \tilde R_\epsilon^c : |(x+\Delta_0R_0)\cap\Delta_0R_0| = \epsilon \text{ or } |\Delta_0R_0\setminus(x+\Delta_0R_0)| = \epsilon\}\big| = 0.
\]
We have now proven (8.5).

We next establish a limit for $({}_sN_n)^{-2}h_n(x_n)$, $x \in \tilde R_\epsilon$. We wish to show
\[
\frac{|R^{(I)}_{x_n,n}\cap\mathbb{Z}^d|}{{}_sN_n} \to \frac{|(x+\Delta_0R_0)\cap\Delta_0R_0|}{\det(\Delta_0)|R_0|}, \qquad x \in \tilde R_\epsilon. \tag{8.7}
\]
Using the bound $\big|\,|R^{(I)}_{x_n,n}| - |R^{(I)}_{x_n,n}\cap\mathbb{Z}^d|\,\big| \le C({}_s\lambda_n^{\max})^{d-1}$ from the $R_0$-boundary condition and noting the limit in (8.6), we find $({}_s\lambda_1^{(n)})^{-d}|R^{(I)}_{x_n,n}\cap\mathbb{Z}^d| \to |(x+\Delta_0R_0)\cap\Delta_0R_0|$, $x \in \tilde R_\epsilon$. By this and $({}_s\lambda_1^{(n)})^d/{}_sN_n \to (\det(\Delta_0)|R_0|)^{-1}$, (8.7) follows. We can also establish: for each $x \in \tilde R_\epsilon$ and $j = 1$ or $2$,
\[
E\Big[\frac{H_{jn}^2(x_n)}{{}_sN_n - |R^{(I)}_{x_n,n}\cap\mathbb{Z}^d|}\Big] \to E\big([\nabla'Z_\infty]^2\big), \qquad E\Big[\frac{H_{3n}^{2j}(x_n)}{|R^{(I)}_{x_n,n}\cap\mathbb{Z}^d|^{\,j}}\Big] \to E\big([\nabla'Z_\infty]^{2j}\big), \qquad {}_sN_nE(Y_{0,n}^2) \to E\big([\nabla'Z_\infty]^2\big), \tag{8.8}
\]
where $\nabla'Z_\infty$ is a normal $N(0,\tau^2)$ random variable, so that $E([\nabla'Z_\infty]^{2j}) = (2j-1)\tau^{2j}$, $j = 1, 2$. The limits in (8.8) follow essentially from the Central Limit Theorem (CLT) of Bolthausen (1982), after verifying that the CLT can be applied; see Nordman (2002) for more details. Putting (8.5), (8.7), and (8.8) together, we have shown the (a.e.) convergence of the functions $f_{\epsilon,n}(x)$ as in (8.4).

For $k \in E_n$ and $n \ge N_\epsilon$, Lemma 8.2 ensures $({}_sN_n)^{-2}|h_n(k)| \le C$, implying that for $x \in \mathbb{R}^d$: $|f_{\epsilon,n}(x)| \le C\,\mathbb{I}\{x\in[-c,c]^d\}$ for some $c > 0$, by Assumption A.1. With this uniform bound on $f_{\epsilon,n}(\cdot)$ and the limits in (8.4), we can apply the LDCT to get
\[
\lim_{n\to\infty}\int_{\mathbb{R}^d}f_{\epsilon,n}(x)\,dx = \int_{\mathbb{R}^d}f_\epsilon(x)\,dx, \qquad \epsilon \in E^+. \tag{8.9}
\]
Let $\{\epsilon_m\}_{m=1}^\infty \subset E^+$ with $\epsilon_m \downarrow 0$. Then $\tilde R_{\epsilon_m} \subset \Delta_0[-1,1]^d$ and $\lim_{m\to\infty}\mathbb{I}\{x\in\tilde R_{\epsilon_m}\} = \mathbb{I}\{x\in\tilde R_0\}$ for $x \neq 0 \in \mathbb{R}^d$, with $\tilde R_0 = \{x \in \mathbb{R}^d : 0 < |(x+\Delta_0R_0)\cap\Delta_0R_0| < \det(\Delta_0)|R_0|\}$. Hence, by the LDCT,
\[
\lim_{m\to\infty}\int_{\mathbb{R}^d}f_{\epsilon_m}(x)\,dx = \int_{\mathbb{R}^d}f_0(x)\,dx, \qquad f_0(x) \equiv \mathbb{I}\{x\in\tilde R_0\}\,[2\tau^4]\Big(\frac{|(x+\Delta_0R_0)\cap\Delta_0R_0|}{\det(\Delta_0)|R_0|}\Big)^2. \tag{8.10}
\]
From (8.1)-(8.3), (8.9)-(8.10), and $({}_s\lambda_1^{(n)})^d/{}_sN_n \to (\det(\Delta_0)|R_0|)^{-1}$, we have
\[
\Big|\lim_{n\to\infty}{}_sN_n\sum_{k\in E_n}\mathrm{Cov}(Y_{0,n}^2, Y_{k,n}^2) - \frac{1}{\det(\Delta_0)|R_0|}\int_{\mathbb{R}^d}f_0(x)\,dx\Big|
\]
\[
\le \lim_{n\to\infty}\Big|{}_sN_n\sum_{k\in E_n}\mathrm{Cov}(Y_{0,n}^2, Y_{k,n}^2) - \frac{({}_s\lambda_1^{(n)})^d}{{}_sN_n}\int_{\mathbb{R}^d}f_{\epsilon_m,n}(x)\,dx\Big| + \frac{1}{\det(\Delta_0)|R_0|}\int_{\mathbb{R}^d}\big|f_{\epsilon_m}(x) - f_0(x)\big|\,dx
\]
\[
\qquad + \lim_{n\to\infty}\Big|\frac{({}_s\lambda_1^{(n)})^d}{{}_sN_n}\int_{\mathbb{R}^d}f_{\epsilon_m,n}(x)\,dx - \frac{1}{\det(\Delta_0)|R_0|}\int_{\mathbb{R}^d}f_{\epsilon_m}(x)\,dx\Big|
\]
\[
\le C\big(\epsilon_m + [b(\epsilon_m)]^d\big) + \frac{1}{\det(\Delta_0)|R_0|}\int_{\mathbb{R}^d}\big|f_{\epsilon_m}(x) - f_0(x)\big|\,dx \to 0 \quad \text{as } \epsilon_m \downarrow 0.
\]
Finally,
\[
\frac{1}{\det(\Delta_0)|R_0|}\int_{\mathbb{R}^d}f_0(x)\,dx = \frac{2\tau^4}{|R_0|}\int_{\mathbb{R}^d}\frac{|(y+R_0)\cap R_0|^2}{|R_0|^2}\,dy,
\]
using the change of variables $y = \Delta_0^{-1}x$. This completes the proof of Theorem 8.1.
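Comparing the final display with the statement of Theorem 8.1 identifies
\[
K_0 = \frac{1}{|R_0|}\int_{\mathbb{R}^d}\frac{|(y+R_0)\cap R_0|^2}{|R_0|^2}\,dy.
\]
As a quick worked example (ours, for illustration): in $d = 1$ with $R_0 = (-1/2, 1/2]$, we have $|(y+R_0)\cap R_0| = (1-|y|)_+$ and $|R_0| = 1$, so
\[
K_0 = \int_{-1}^{1}(1-|y|)^2\,dy = \frac{2}{3},
\]
and the leading OL variance constant $K_0\cdot 2\tau^4 = (4/3)\tau^4$, in line with classical results on overlapping batch means in time series (cf. Meketon and Schmeiser, 1984).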
For clarity of exposition, we will prove Theorem 3.1, parts (a) and (b), separately for the OL and NOL subsample variance estimators.

8.1 Proof of Theorem 3.1(a)

For $i \in J_{OL}$, we use a Taylor expansion of $H(\cdot)$ around $\mu$ to rewrite the statistic $\hat\theta_{i,n} = H(\bar Z_{i,n})$:
\[
\hat\theta_{i,n} = H(\mu) + \sum_{\|\alpha\|_1=1}c_\alpha(\bar Z_{i,n}-\mu)^\alpha + 2\sum_{\|\alpha\|_1=2}\frac{(\bar Z_{i,n}-\mu)^\alpha}{\alpha!}\int_0^1(1-\omega)D^\alpha H\big(\mu + \omega(\bar Z_{i,n}-\mu)\big)\,d\omega \equiv H(\mu) + Y_{i,n} + Q_{i,n}. \tag{8.11}
\]
We also have
\[
\tilde\theta_n = H(\mu) + |J_{OL}|^{-1}\sum_{i\in J_{OL}}Y_{i,n} + |J_{OL}|^{-1}\sum_{i\in J_{OL}}Q_{i,n} \equiv H(\mu) + \bar Y_n + \bar Q_n.
\]
Then,
\[
\hat\tau^2_{n,OL} = {}_sN_n\Big[\frac{1}{|J_{OL}|}\sum_{i\in J_{OL}}Y_{i,n}^2 + \frac{1}{|J_{OL}|}\sum_{i\in J_{OL}}Q_{i,n}^2 + \frac{2}{|J_{OL}|}\sum_{i\in J_{OL}}Y_{i,n}Q_{i,n} - \bar Y_n^2 - \bar Q_n^2 - 2\bar Y_n\bar Q_n\Big].
\]
We establish Theorem 3.1(a) in two parts by showing
\[
\text{(a)}\ \ \mathrm{Var}\Big(\frac{{}_sN_n}{|J_{OL}|}\sum_{i\in J_{OL}}Y_{i,n}^2\Big) = K_0\cdot\frac{\det({}_s\Delta_n)}{\det(\Delta_n)}\cdot[2\tau^4](1+o(1)), \qquad \text{(b)}\ \ \mathrm{Var}(\hat\tau^2_{n,OL}) - \mathrm{Var}\Big(\frac{{}_sN_n}{|J_{OL}|}\sum_{i\in J_{OL}}Y_{i,n}^2\Big) = o\Big(\frac{\det({}_s\Delta_n)}{\det(\Delta_n)}\Big). \tag{8.12}
\]
We begin by proving (8.12)(a). For $k \in \mathbb{Z}^d$, let $\sigma_n(k) = \mathrm{Cov}(Y_{0,n}^2, Y_{k,n}^2)$. We write
\[
\mathrm{Var}\Big(\frac{{}_sN_n}{|J_{OL}|}\sum_{i\in J_{OL}}Y_{i,n}^2\Big) = \frac{({}_sN_n)^2}{|J_{OL}|^2}\sum_{k\in E_n}J_n(k)\sigma_n(k) + \frac{({}_sN_n)^2}{|J_{OL}|^2}\sum_{k\in\mathbb{Z}^d\setminus E_n}J_n(k)\sigma_n(k) \equiv W_{1n} + W_{2n}.
\]
By stationarity and Lemma 8.2, we bound $|\sigma_n(k)| \le E(Y_{0,n}^4) \le C({}_sN_n)^{-2}$, $k \in \mathbb{Z}^d$. Using this covariance bound, Lemmas 8.3-8.4, and $|E_n| \le 3^d\det({}_s\Delta_n)$:
\[
\Big|W_{1n} - \frac{({}_sN_n)^2}{|J_{OL}|}\sum_{k\in E_n}\sigma_n(k)\Big| \le C\cdot\frac{|E_n|}{|J_{OL}|}\cdot\max_{k\in E_n}\Big|1 - \frac{J_n(k)}{|J_{OL}|}\Big| = o\Big(\frac{\det({}_s\Delta_n)}{\det(\Delta_n)}\Big). \tag{8.13}
\]
Then, applying Theorem 8.1 and Lemma 8.3,
\[
\frac{({}_sN_n)^2}{|J_{OL}|}\sum_{k\in E_n}\sigma_n(k) = K_0\cdot\frac{\det({}_s\Delta_n)}{\det(\Delta_n)}\cdot[2\tau^4](1+o(1)). \tag{8.14}
\]
By (8.13)-(8.14), we need only show $W_{2n} = o(\det({}_s\Delta_n)/\det(\Delta_n))$ to finish the proof of (8.12)(a). For $i \in \mathbb{Z}^d$, denote a set of lattice points within a translated rectangular region:
\[
F_{i,n} = \Big(i + \prod_{j=1}^d\big(-\lceil {}_s\lambda_j^{(n)}\rceil/2,\ \lceil {}_s\lambda_j^{(n)}\rceil/2\big]\Big)\cap\mathbb{Z}^d,
\]
where $\lceil\cdot\rceil$ represents the ceiling function. Note that for $k = (k_1,\ldots,k_d)' \in \mathbb{Z}^d\setminus E_n$ there exists $j \in \{1,\ldots,d\}$ such that $|k_j| > {}_s\lambda_j^{(n)}$, implying $\mathrm{dis}(R_{0,n}\cap\mathbb{Z}^d, R_{k,n}\cap\mathbb{Z}^d) \ge \mathrm{dis}(F_{0,n}, F_{k,n}) \ge 1$. Hence, sequentially using Lemmas 8.1-8.2, we may bound the covariances $\sigma_n(k)$, $k \in \mathbb{Z}^d\setminus E_n$, with the mixing coefficient $\alpha(\cdot,\cdot)$:
\[
|\sigma_n(k)| \le 8\big[E\big(Y_{0,n}^{2(2r+\delta)/r}\big)\big]^{2r/(2r+\delta)}\alpha\big(\mathrm{dis}(R_{0,n}\cap\mathbb{Z}^d, R_{k,n}\cap\mathbb{Z}^d),\ {}_sN_n\big)^{\delta/(2r+\delta)} \le C({}_sN_n)^{-2}\alpha\big(\mathrm{dis}(F_{0,n}, F_{k,n}),\ {}_sN_n\big)^{\delta/(2r+\delta)}.
\]
From the above bound and $J_n(k)/|J_{OL}| \le 1$, $k \in \mathbb{Z}^d$, we have
\[
|W_{2n}| \le C|J_{OL}|^{-1}\sum_{x=1}^{\infty}\sum_{j=1}^{d}C_{(x,j,n)}\,\alpha(x, {}_sN_n)^{\delta/(2r+\delta)}, \tag{8.15}
\]
\[
C_{(x,j,n)} = \Big|\big\{i \in \mathbb{Z}^d : \mathrm{dis}(F_{0,n}, F_{i,n}) = x = \inf\{|v_j - w_j| : v \in F_{0,n},\ w \in F_{i,n}\}\big\}\Big|.
\]
The function $C_{(x,j,n)}$ counts the number of translated rectangles $F_{i,n}$ that lie at distance $x \in \mathbb{Z}_+$ from the rectangle $F_{0,n}$, where this distance is realized in the $j$th coordinate direction, $j = 1,\ldots,d$. For $i \in \mathbb{Z}^d$, $x \ge 1$, and $j \in \{1,\ldots,d\}$, if $\mathrm{dis}(F_{0,n}, F_{i,n}) = x = \inf\{|v_j - w_j| : v \in F_{0,n}, w \in F_{i,n}\}$, then $|i_j| = \lceil {}_s\lambda_j^{(n)}\rceil + x - 1$, with the remaining components $i_m$, $m \in \{1,\ldots,d\}\setminus\{j\}$, constrained by $|i_m| \le {}_s\lambda_m^{(n)} + x$. We use this observation to further bound the right-hand side of (8.15) by
\[
C|J_{OL}|^{-1}\sum_{x=1}^{\infty}\sum_{j=1}^{d}\ \prod_{m=1, m\neq j}^{d}3\big({}_s\lambda_m^{(n)} + x\big)\,\alpha(x, {}_sN_n)^{\delta/(2r+\delta)}
\le C\,\frac{\det({}_s\Delta_n)}{|J_{OL}|}\sum_{j=1}^{d}({}_s\lambda_{\min}^{(n)})^{-j}\Big[\sum_{x=1}^{\ell_n}x^{j-1}\big[\alpha_1(x)g({}_sN_n)\big]^{\delta/(2r+\delta)} + \sum_{x=\ell_n+1}^{\infty}x^{j-1}\alpha_1(x)^{\delta/(2r+\delta)}\Big]
\]
\[
\le C\,\frac{\det({}_s\Delta_n)}{|J_{OL}|}\Big[\frac{d\,\ell_n}{{}_s\lambda_{\min}^{(n)}}\cdot\ell_n^{d\kappa\delta/\{e(2r+\delta)\}}\,\ell_n^{-(2rd-d)} + \sum_{x=\ell_n+1}^{\infty}x^{2rd-d-1}\alpha_1(x)^{\delta/(2r+\delta)}\Big] = o\Big(\frac{\det({}_s\Delta_n)}{\det(\Delta_n)}\Big),
\]
using Assumptions A.1, A.3, Condition $M_r$, and $\ell_n = o({}_s\lambda_{\min}^{(n)})$. This completes the proof of (8.12)(a).
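As a small numerical illustration of (8.12)(a) (a minimal sketch of ours, not part of the proofs, assuming $d = 1$, $\hat\theta_n$ the sample mean, and an MA(1) field, so that $\det({}_s\Delta_n)/\det(\Delta_n) = m/n$ and, by the worked example following the proof of Theorem 8.1, $K_0 = 2/3$):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

def ol_variance_estimator(z, m):
    # OL subsample estimator of n*Var(mean): (1/J) * sum_i m*(theta_i - theta~)^2,
    # where theta_i = mean of the i-th length-m block and J = n - m + 1.
    c = np.concatenate(([0.0], np.cumsum(z)))
    block_means = (c[m:] - c[:-m]) / m
    return m * block_means.var()

n, m, reps = 4000, 40, 500
est = np.empty(reps)
for r in range(reps):
    e = rng.standard_normal(n + 1)
    z = e[1:] + 0.5 * e[:-1]      # MA(1) field; tau^2 = (1 + 0.5)^2 = 2.25
    est[r] = ol_variance_estimator(z, m)

print("mean estimate :", est.mean(), "(target tau^2 = 2.25)")
print("empirical Var :", est.var())
print("prediction    :", (2/3) * 2 * 2.25**2 * m / n)   # K0 * 2*tau^4 * m/n
\end{verbatim}

The empirical variance of the subsample estimates should track the prediction $K_0\cdot2\tau^4\cdot m/n$, up to the bias and higher-order terms treated in Section 9.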
To establish (8.12)(b), first note that
\[
\Big|\mathrm{Var}(\hat\tau^2_{n,OL}) - \mathrm{Var}\Big(\frac{{}_sN_n}{|J_{OL}|}\sum_{i\in J_{OL}}Y_{i,n}^2\Big)\Big| \le 4\Big(\sum_{j=1}^{5}A_{jn}^{1/2}\Big)\Big(\sum_{j=1}^{5}A_{jn}^{1/2} + \mathrm{Var}\Big(\frac{{}_sN_n}{|J_{OL}|}\sum_{i\in J_{OL}}Y_{i,n}^2\Big)^{1/2}\Big),
\]
where $A_{1n} = \mathrm{Var}({}_sN_n\bar Y_n^2)$, $A_{2n} = \mathrm{Var}(|J_{OL}|^{-1}{}_sN_n\sum_{i\in J_{OL}}Q_{i,n}^2)$, $A_{3n} = \mathrm{Var}({}_sN_n\bar Q_n^2)$, $A_{4n} = \mathrm{Var}(|J_{OL}|^{-1}{}_sN_n\sum_{i\in J_{OL}}Y_{i,n}Q_{i,n})$, and $A_{5n} = \mathrm{Var}({}_sN_n\bar Y_n\bar Q_n)$. By (8.12)(a), it suffices to show that $A_{jn} = o(\det({}_s\Delta_n)/\det(\Delta_n))$ for each $j = 1,\ldots,5$. We handle only two terms for illustration: $A_{1n}$ and $A_{4n}$.

Consider $A_{1n}$. For $s \in R_n\cap\mathbb{Z}^d$, let $\omega(s) = [2^d\det({}_s\Delta_n)]^{-1}|\{i \in J_{OL} : s \in i + {}_s\Delta_nR_0\}|$, so that $0 \le \omega(s) \le 1$. By Condition $M_r$ and Theorem 3 of Doukhan (1994, p. 31) (similar to Lemma 8.2),
\[
A_{1n} \le ({}_sN_n)^2E(\bar Y_n^4) = \frac{(2^d\det({}_s\Delta_n))^4}{|J_{OL}|^4({}_sN_n)^2}\,E\Big[\sum_{s\in R_n\cap\mathbb{Z}^d}\omega(s)\nabla'\big(Z(s)-\mu\big)\Big]^4 \le C\cdot\frac{(\det({}_s\Delta_n))^4(N_n)^2}{|J_{OL}|^4({}_sN_n)^2}. \tag{8.16}
\]
Then $A_{1n} = o(\det({}_s\Delta_n)/\det(\Delta_n))$ follows from Lemma 8.3.

To handle $A_{4n}$, write $\sigma_{1n}(k) = \mathrm{Cov}(Y_{0,n}Q_{0,n}, Y_{k,n}Q_{k,n})$, $k \in \mathbb{Z}^d$. Then,
\[
A_{4n} = \frac{({}_sN_n)^2}{|J_{OL}|^2}\sum_{k\in\mathbb{Z}^d}J_n(k)\sigma_{1n}(k) \le \frac{({}_sN_n)^2}{|J_{OL}|}\sum_{k\in E_n}|\sigma_{1n}(k)| + \frac{({}_sN_n)^2}{|J_{OL}|}\sum_{k\in\mathbb{Z}^d\setminus E_n}|\sigma_{1n}(k)| \equiv A_{4n}(E_n) + A_{4n}(E_n^c).
\]
For $k \in E_n$, note $|\sigma_{1n}(k)| \le C({}_sN_n)^{-3}$, using $|Y_{0,n}Q_{0,n}| \le C\|\bar Z_{0,n}-\mu\|^3(1 + \|\bar Z_{0,n}-\mu\|^a)$ (from Condition D) with Lemmas 8.1-8.2. From this bound, Lemma 8.3, and $|E_n| \le 3^d\det({}_s\Delta_n)$, we find $A_{4n}(E_n) = o(\det({}_s\Delta_n)/\det(\Delta_n))$. We next bound the covariances $\sigma_{1n}(k)$, $k \in \mathbb{Z}^d\setminus E_n$:
\[
|\sigma_{1n}(k)| \le 8\big[E\big(|Y_{0,n}Q_{0,n}|^{(2r+\delta)/r}\big)\big]^{2r/(2r+\delta)}\alpha\big(\mathrm{dis}(R_{0,n}\cap\mathbb{Z}^d, R_{k,n}\cap\mathbb{Z}^d),\ {}_sN_n\big)^{\delta/(2r+\delta)} \le C({}_sN_n)^{-3}\alpha\big(\mathrm{dis}(F_{0,n}, F_{k,n}),\ {}_sN_n\big)^{\delta/(2r+\delta)},
\]
by the stationarity of the random field $Z(\cdot)$ and Lemmas 8.1-8.2. Using this inequality and repeating the same steps used to majorize $W_{2n}$ in the proof of (8.12)(a) (see (8.15)), we have $A_{4n}(E_n^c) = o(\det({}_s\Delta_n)/\det(\Delta_n))$. The proof of Theorem 3.1(a) is finished.

8.2 Proof of Theorem 3.1(b)

To simplify the counting arguments, we assume here that ${}_s\Delta_n$ has integer entries, implying ${}_sN_{i,n} = {}_sN_n$, $i \in \mathbb{Z}^d$. The more general case, in which the NOL subregions may differ in the number of sampling sites, is treated in Nordman (2002). For each NOL subregion $\tilde R_{i,n}$, denote the corresponding sample mean $\tilde Z_{i,n} = ({}_sN_{i,n})^{-1}\sum_{s\in\tilde R_{i,n}\cap\mathbb{Z}^d}Z(s)$. The subsample evaluations of the statistic of interest, $\hat\theta_{i,n}$, $i \in J_{NOL}$, can be expressed through a Taylor expansion of $H(\cdot)$ around $\mu$, substituting $\tilde Z_{i,n}$ for $\bar Z_{i,n}$ in (8.11): $\hat\theta_{i,n} = H(\tilde Z_{i,n}) = H(\mu) + \tilde Y_{i,n} + \tilde Q_{i,n}$. We will complete the proof of Theorem 3.1(b) in two parts by showing
\[
\text{(a)}\ \ \mathrm{Var}\Big(\frac{{}_sN_n}{|J_{NOL}|}\sum_{i\in J_{NOL}}\tilde Y_{i,n}^2\Big) = \frac{\det({}_s\Delta_n)}{\det(\Delta_n)|R_0|}\cdot[2\tau^4](1+o(1)), \qquad \text{(b)}\ \ \mathrm{Var}(\hat\tau^2_{n,NOL}) - \mathrm{Var}\Big(\frac{{}_sN_n}{|J_{NOL}|}\sum_{i\in J_{NOL}}\tilde Y_{i,n}^2\Big) = o\Big(\frac{\det({}_s\Delta_n)}{\det(\Delta_n)}\Big). \tag{8.17}
\]
We begin with (8.17)(a). For $k \in \mathbb{Z}^d$, let $\tilde J_n(k) = |\{i \in J_{NOL} : i + k \in J_{NOL}\}|$ and $\tilde\sigma_n(k) = \mathrm{Cov}(\tilde Y_{0,n}^2, \tilde Y_{k,n}^2)$. Then we may express the variance:
\[
\mathrm{Var}\Big(\frac{{}_sN_n}{|J_{NOL}|}\sum_{i\in J_{NOL}}\tilde Y_{i,n}^2\Big) = \frac{({}_sN_n)^2}{|J_{NOL}|^2}\Big[\sum_{k\in\mathbb{Z}^d,\,0<\|k\|_\infty\le1}\tilde J_n(k)\tilde\sigma_n(k) + \sum_{k\in\mathbb{Z}^d,\,\|k\|_\infty>1}\tilde J_n(k)\tilde\sigma_n(k) + |J_{NOL}|\tilde\sigma_n(0)\Big] \equiv U_{1n} + U_{2n} + |J_{NOL}|^{-1}({}_sN_n)^2\tilde\sigma_n(0). \tag{8.18}
\]
We first prove $U_{2n} = o(|J_{NOL}|^{-1})$, noting that $\det({}_s\Delta_n)/\det(\Delta_n) = O(|J_{NOL}|^{-1})$ from Lemma 8.3. When $k = (k_1,\ldots,k_d)' \in \mathbb{Z}^d$ with $\|k\|_\infty > 1$, then for some $1 \le m_k \le d$:
\[
\mathrm{dis}\big(\tilde R_{0,n}\cap\mathbb{Z}^d,\ \tilde R_{k,n}\cap\mathbb{Z}^d\big) \ge \mathrm{dis}\big({}_s\Delta_n[-1/2,1/2]^d,\ {}_s\Delta_n(k + [-1/2,1/2]^d)\big) = \max_{1\le j\le d}(|k_j|-1)\,{}_s\lambda_j^{(n)} \equiv (|k_{m_k}|-1)\,{}_s\lambda_{m_k}^{(n)}.
\]
If $j \in \{1,\ldots,d\}$, $j \neq m_k$, we have $|k_j| \le ({}_s\lambda_j^{(n)})^{-1}(|k_{m_k}|-1)\,{}_s\lambda_{m_k}^{(n)} + 1$. Note also that if $k \in \mathbb{Z}^d$, $\|k\|_\infty > 1$, then $|\tilde\sigma_n(k)| \le C({}_sN_n)^{-2}\alpha\big((|k_{m_k}|-1)\,{}_s\lambda_{m_k}^{(n)},\ {}_sN_n\big)^{\delta/(2r+\delta)}$ by Lemmas 8.1-8.2.
Hence, we have
\[
|U_{2n}| \le \frac{C}{|J_{NOL}|}\sum_{x=1}^{\infty}\sum_{j=1}^{d}({}_s\lambda_j^{(n)})^{-(d-1)}\prod_{i=1, i\neq j}^{d}\big({}_s\lambda_i^{(n)}x + {}_s\lambda_j^{(n)}\big)\,\alpha\big({}_s\lambda_{\min}^{(n)}x,\ {}_sN_n\big)^{\delta/(2r+\delta)}
\le \frac{C}{|J_{NOL}|}\Big(\frac{{}_s\lambda_n^{\max}}{{}_s\lambda_{\min}^{(n)}}\Big)^{d-1}\sum_{x=1}^{\infty}x^{d-1}\,\alpha\big({}_s\lambda_{\min}^{(n)}x,\ {}_sN_n\big)^{\delta/(2r+\delta)}
\]
\[
\le \frac{C}{|J_{NOL}|}\,\frac{\{\det({}_s\Delta_n)\}^{\kappa\delta/(2r+\delta)}}{({}_s\lambda_{\min}^{(n)})^{2rd-d-1}}\sum_{x=1}^{\infty}\big({}_s\lambda_{\min}^{(n)}x\big)^{2rd-d-1}\alpha_1\big({}_s\lambda_{\min}^{(n)}x\big)^{\delta/(2r+\delta)} = o\Big(\frac{1}{|J_{NOL}|}\Big),
\]
by Assumptions A.1, A.3, and Condition $M_r$.

We now show that $U_{1n} = o(|J_{NOL}|^{-1})$. For $k \in \mathbb{Z}^d$, $0 < \|k\|_\infty \le 1$, define the trimmed set in each coordinate direction $j = 1,\ldots,d$:
\[
T_{k,n}^j = \begin{cases}\{x \in \mathbb{R}^d : \tfrac12\,{}_s\lambda_j^{(n)} < x_j \le \tfrac12\,{}_s\lambda_j^{(n)} + \ell_n\} & \text{if } k_j = 1,\\ \{x \in \mathbb{R}^d : -\tfrac12\,{}_s\lambda_j^{(n)} - \ell_n < x_j \le -\tfrac12\,{}_s\lambda_j^{(n)}\} & \text{if } k_j = -1,\\ \emptyset & \text{if } k_j = 0,\end{cases}
\]
and let $T_{k,n} = \cup_{j=1}^dT_{k,n}^j$. We decompose the sum ${}_sN_n\tilde Y_{k,n} = \Sigma(\tilde R_{k,n}\setminus T_{k,n}) + \Sigma(\tilde R_{k,n}\cap T_{k,n}) \equiv \tilde S_{k,n} + \tilde S_{k,n}^*$. Then $U_{1n} = o(|J_{NOL}|^{-1})$ follows from (1)-(4) below:

(1) $|E(\tilde Y_{0,n}^2\tilde S_{k,n}\tilde S_{k,n}^*)| \le \big[E(\tilde Y_{0,n}^6)E(|\tilde S_{k,n}|^3)E(|\tilde S_{k,n}^*|^3)\big]^{1/3} = o(1)$, using Lemma 8.2 and
\[
|\tilde R_{k,n}\cap T_{k,n}\cap\mathbb{Z}^d| \le \sum_{j=1}^d|\tilde R_{k,n}\cap T_{k,n}^j\cap\mathbb{Z}^d| \le \ell_n\det({}_s\Delta_n)\sum_{j=1}^d({}_s\lambda_j^{(n)})^{-1}.
\]
(2) Likewise, $E(\tilde Y_{0,n}^2\tilde S_{k,n}^{*2}) \le [E(\tilde Y_{0,n}^4)E(\tilde S_{k,n}^{*4})]^{1/2} = o(1)$.
(3) $|{}_sN_nE(\tilde Y_{k,n}^2) - ({}_sN_n)^{-1}E(\tilde S_{k,n}^2)| \le 4({}_sN_n)^{-1}\max\big\{[E(\tilde S_{k,n}^2)E(\tilde S_{k,n}^{*2})]^{1/2},\ E(\tilde S_{k,n}^{*2})\big\} = o(1)$.
(4) $|\mathrm{Cov}(\tilde Y_{0,n}^2, \tilde S_{k,n}^2)| \le C\,\alpha(\ell_n, {}_sN_n)^{\delta/(2r+\delta)} = o(1)$, by applying Lemmas 8.1-8.2, Assumption A.3, and Condition $M_r$ with $\mathrm{dis}\big((\tilde R_{k,n}\setminus T_{k,n})\cap\mathbb{Z}^d,\ {}_sR_n\cap\mathbb{Z}^d\big) \ge \ell_n$.

Since $\tilde\sigma_n(0) = \mathrm{Var}(\tilde Y_{0,n}^2)$, the remaining quantity in (8.18) can be expressed as
\[
\frac{({}_sN_n)^2}{|J_{NOL}|}\tilde\sigma_n(0) = \frac{1}{|J_{NOL}|}\mathrm{Var}\big([\nabla'Z_\infty]^2\big)(1+o(1)) = \frac{\det({}_s\Delta_n)}{\det(\Delta_n)|R_0|}[2\tau^4](1+o(1)),
\]
by applying the CLT (as in (8.8)) and Lemma 8.3, and recalling that $\mathrm{Var}([\nabla'Z_\infty]^2) = 2\tau^4$ for a $N(0,\tau^2)$ variable. We have now established (8.17)(a). We omit the proof of (8.17)(b), which resembles the one establishing (8.12)(b) and incorporates the arguments used to bound $U_{1n}$, $U_{2n}$; Nordman (2002) provides more details.

9 Proofs for bias expansions

We will use the following lemma concerning $\tau_n^2 = N_n\mathrm{Var}(\hat\theta_n)$ to prove the theorems pertaining to bias expansions of $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$.

Lemma 9.1 Under the Assumptions and Conditions of Theorem 3.1, $\tau_n^2 = \tau^2 + O\big([\det(\Delta_n)]^{-1/2}\big)$.

Proof: By a Taylor expansion around $\mu$: $\hat\theta_n = H(\bar Z_{N_n}) = H(\mu) + \bar Y_{N_n} + \bar Q_{N_n}$ (substituting $\bar Z_{N_n}$ for $\bar Z_{i,n}$ in (8.11)), and so $N_n\mathrm{Var}(\hat\theta_n) = N_n\mathrm{Var}(\bar Y_{N_n} + \bar Q_{N_n})$. For $k \in \mathbb{Z}^d$, let $N_n(k) = |\{i \in R_n\cap\mathbb{Z}^d : i + k \in R_n\}|$. It holds that $N_n(k) \le N_n$ and
\[
N_n \le N_n(k) + \big|\{i \in \mathbb{Z}^d : T_i\cap R_n \neq \emptyset,\ T_i\cap R_n^c \neq \emptyset;\ T_i = i + \|k\|_\infty[-1,1]^d\}\big| \le N_n(k) + C\|k\|_\infty^d(\lambda_n^{\max})^{d-1}, \tag{9.1}
\]
by the boundary condition on $R_0$. Also, by Lemma 8.1 and stationarity, for each $k \neq 0 \in \mathbb{Z}^d$:
\[
|\sigma(k)| \le C\,\alpha_1(\|k\|_\infty)^{\delta/(2r+\delta)}. \tag{9.2}
\]
Using $|\{k \in \mathbb{Z}^d : \|k\|_\infty = x\}| \le Cx^{d-1}$, $x \ge 1$, the covariances are absolutely summable over $\mathbb{Z}^d$:
\[
\sum_{k\in\mathbb{Z}^d}|\sigma(k)| \le |\sigma(0)| + C\sum_{x=1}^{\infty}x^{d-1}\alpha_1(x)^{\delta/(2r+\delta)} < \infty. \tag{9.3}
\]
From (9.1)-(9.3), we find
\[
N_n\mathrm{Var}(\bar Y_{N_n}) = \frac{1}{N_n}\sum_{k\in\mathbb{Z}^d}N_n(k)\sigma(k) = \tau^2 + I_n; \tag{9.4}
\]
\[
|I_n| \le \frac{1}{N_n}\sum_{k\in\mathbb{Z}^d}|N_n - N_n(k)|\cdot|\sigma(k)| \le C\cdot\frac{(\lambda_n^{\max})^{d-1}}{N_n}\sum_{x=1}^{\infty}x^{2d-1}\alpha_1(x)^{\delta/(2r+\delta)} = O\big([\det(\Delta_n)]^{-1/d}\big). \tag{9.5}
\]
By Condition D and Lemma 8.2, it follows that $N_n\mathrm{Var}(\bar Q_{N_n}) = O([\det(\Delta_n)]^{-1})$. Finally, with these bounds on the variances of $\bar Y_{N_n}$ and $\bar Q_{N_n}$, we apply the Cauchy-Schwarz inequality to the covariance: $N_n|\mathrm{Cov}(\bar Y_{N_n}, \bar Q_{N_n})| = O([\det(\Delta_n)]^{-1/2})$, setting the order of the difference $|N_n\mathrm{Var}(\hat\theta_n) - \tau^2|$.
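To see the boundary term $I_n$ concretely (an illustration we add, for the linear case): in $d = 1$ with $R_n = (0, n]$ and $H$ the identity (so $\bar Q_{N_n} \equiv 0$), $N_n = n$ and $N_n(k) = (n - |k|)_+$, and (9.4) reads
\[
\tau_n^2 = \sum_{|k|<n}\Big(1 - \frac{|k|}{n}\Big)\sigma(k) = \tau^2 - \frac{1}{n}\sum_{|k|<n}|k|\sigma(k) - \sum_{|k|\ge n}\sigma(k) = \tau^2 + O(n^{-1}),
\]
with $\det(\Delta_n) = n$, so that $I_n$ attains exactly the order $O([\det(\Delta_n)]^{-1/d})$ appearing in (9.5).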
We give a few lemmas which help compute the biases of the estimators $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$.

Lemma 9.2 Let $\tilde Y_{i,n} = ({}_sN_{i,n})^{-1}\sum_{s\in\mathbb{Z}^d\cap\tilde R_{i,n}}\nabla'(Z(s)-\mu)$, $i \in \mathbb{Z}^d$. Suppose Assumptions A.1-A.5 and Conditions $D_2$ and $M_{2+a}$ hold with $d \ge 2$. Then,
\[
E(\hat\tau^2_{n,OL}) - {}_sN_{0,n}E(\tilde Y_{0,n}^2),\quad E(\hat\tau^2_{n,NOL}) - |J_{NOL}|^{-1}\sum_{i\in J_{NOL}}{}_sN_{i,n}E(\tilde Y_{i,n}^2)\ =\ O\big([\det({}_s\Delta_n)]^{-1/2}\big) + o\big([\det({}_s\Delta_n)]^{-1/d}\big).
\]

Proof: We consider here only $E(\hat\tau^2_{n,OL})$. For integer ${}_s\Delta_n$, the arguments for $E(\hat\tau^2_{n,NOL})$ are essentially the same; more details are provided in Nordman (2002). By stationarity and an algebraic expansion,
\[
E(\hat\tau^2_{n,OL}) = {}_sN_n\big[E(Y_{0,n}^2) + E(Q_{0,n}^2) + 2E(Y_{0,n}Q_{0,n}) - E(\bar Y_n^2) - E(\bar Q_n^2) - 2E(\bar Y_n\bar Q_n)\big].
\]
With moment arguments based on Lemma 8.2 and Condition $D_r$, we have
\[
{}_sN_nE(Y_{0,n}^2) \le C, \qquad {}_sN_nE(\bar Y_n^2) \le C\,{}_sN_n(N_n)^{-1}, \qquad {}_sN_nE(Q_{0,n}^2),\ {}_sN_nE(\bar Q_n^2) \le C({}_sN_n)^{-1}, \tag{9.6}
\]
where the bound on ${}_sN_nE(\bar Y_n^2)$ follows from (8.16). By Hölder's inequality and Assumption A.2,
\[
E(\hat\tau^2_{n,OL}) = {}_sN_nE(Y_{0,n}^2) + O\big(({}_sN_n)^{-1/2}\big) + O\big({}_sN_n(N_n)^{-1}\big).
\]
Note that $E(Y_{0,n}^2) = E(\tilde Y_{0,n}^2)$ and ${}_sN_n = {}_sN_{0,n}$. Hence, applying Lemma 8.3 and Assumption A.2, we establish Lemma 9.2 for $\hat\tau^2_{n,OL}$.

The next lemma provides a small refinement of Lemma 9.2, made possible when the function $H(\cdot)$ is smoother. We shall make use of this lemma in bias expansions of $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$ in lower sampling dimensions, namely $d = 1$ or $2$.

Lemma 9.3 Assume $d = 1$ or $2$. In addition to Assumptions A.1-A.5, suppose that Conditions $D_3$ and $M_{3+a}$ hold. Then,
\[
E(\hat\tau^2_{n,OL}) - {}_sN_{0,n}E(\tilde Y_{0,n}^2),\quad E(\hat\tau^2_{n,NOL}) - |J_{NOL}|^{-1}\sum_{i\in J_{NOL}}{}_sN_{i,n}E(\tilde Y_{i,n}^2)\ =\ \begin{cases}O\big([\det({}_s\Delta_n)]^{-1}\big) & \text{if } d = 1,\\ o\big([\det({}_s\Delta_n)]^{-1/2}\big) & \text{if } d = 2.\end{cases}
\]

Proof: We again consider only $\hat\tau^2_{n,OL}$. For $i \in J_{OL}$, we use a third-order Taylor expansion of each subsample statistic around $\mu$: $\hat\theta_{i,n} = H(\mu) + Y_{i,n} + Q_{i,n} + C_{i,n}$, where $Y_{i,n} = \nabla'(\bar Z_{i,n}-\mu)$,
\[
Q_{i,n} = \sum_{\|\alpha\|_1=2}\frac{c_\alpha}{\alpha!}(\bar Z_{i,n}-\mu)^\alpha; \qquad C_{i,n} = 3\sum_{\|\alpha\|_1=3}\frac{(\bar Z_{i,n}-\mu)^\alpha}{\alpha!}\int_0^1(1-\omega)^2D^\alpha H\big(\mu + \omega(\bar Z_{i,n}-\mu)\big)\,d\omega.
\]
Here $C_{i,n}$ denotes the remainder term in the Taylor expansion, and $Q_{i,n}$ is defined a little differently than in (8.11). Write the sample means of the Taylor terms, $\bar Y_n$ and $\bar Q_n$, as before, and $\bar C_n = |J_{OL}|^{-1}\sum_{i\in J_{OL}}C_{i,n}$. The moment inequalities in (9.6) are still valid and, by Lemma 8.2 and Condition D, we can produce the bounds ${}_sN_nE(C_{0,n}^2),\ {}_sN_nE(\bar C_n^2) \le C({}_sN_n)^{-2}$. By Hölder's inequality and the scaling conditions from Assumptions A.1-A.2, we then have
\[
E(\hat\tau^2_{n,OL}) = {}_sN_n\big[E(Y_{0,n}^2) + 2E(Y_{0,n}Q_{0,n})\big] + \begin{cases}O\big([\det({}_s\Delta_n)]^{-1}\big) & \text{if } d = 1,\\ o\big([\det({}_s\Delta_n)]^{-1/2}\big) & \text{if } d = 2.\end{cases}
\]
Since ${}_sN_nE(Y_{0,n}^2) = {}_sN_{0,n}E(\tilde Y_{0,n}^2)$, Lemma 9.3 for $\hat\tau^2_{n,OL}$ will follow by showing
\[
{}_sN_nE(Y_{0,n}Q_{0,n}) = {}_sN_n\sum_{i,j,k=1}^{p}c_ia_{j,k}E\big[(\bar Z_{i,0,n}-\mu_i)(\bar Z_{j,0,n}-\mu_j)(\bar Z_{k,0,n}-\mu_k)\big] = O\big([\det({}_s\Delta_n)]^{-1}\big), \tag{9.7}
\]
where $\bar Z_{0,n} = (\bar Z_{1,0,n},\ldots,\bar Z_{p,0,n})' \in \mathbb{R}^p$ is the vector of coordinate sample means, $c_i = \partial H(\mu)/\partial x_i$, and $a_{j,k} = \tfrac12\,\partial^2H(\mu)/\partial x_j\partial x_k$. Denote the observations $Z(s) = (Z_1(s),\ldots,Z_p(s))' \in \mathbb{R}^p$, $s \in \mathbb{Z}^d$.

Fix $i, j, k \in \{1,\ldots,p\}$ and assume $\mu = 0$ without loss of generality. Then ${}_sN_nE(\bar Z_{i,0,n}\bar Z_{j,0,n}\bar Z_{k,0,n}) = ({}_sN_n)^{-1}E(Z_i(t)Z_j(t)Z_k(t)) + L_{1n}^{ijk} + L_{2n}^{ijk}$, where
\[
L_{1n}^{ijk} = ({}_sN_n)^{-2}\sum_{u,v,w\in\mathbb{Z}^d\cap{}_sR_n,\ u\neq v\neq w}E\big[Z_i(u)Z_j(v)Z_k(w)\big],
\]
\[
L_{2n}^{ijk} = ({}_sN_n)^{-2}\sum_{u,v\in\mathbb{Z}^d\cap{}_sR_n,\ u\neq v}E\big[Z_i(u)Z_j(u)Z_k(v) + Z_i(u)Z_j(v)Z_k(u) + Z_i(v)Z_j(u)Z_k(u)\big].
\]
By Lemma 8.1, Assumption A.3, and Condition $M_r$,
\[
|L_{2n}^{ijk}| \le \frac{C}{{}_sN_n}\sum_{x=1}^{\infty}x^{d-1}\alpha(x,1)^{\delta/(2r+\delta)} = O\big([\det({}_s\Delta_n)]^{-1}\big),
\]
similar to (9.3). For $y_1, y_2, y_3 \in \mathbb{R}^d$, define $\mathrm{dis}_3(\{y_1,y_2,y_3\}) = \max_{1\le i\le3}\mathrm{dis}(\{y_i\},\{y_1,y_2,y_3\}\setminus\{y_i\})$. If $x \ge 1 \in \mathbb{Z}_+$, then $|\{(y_1,y_2) \in (\mathbb{Z}^d)^2 : \mathrm{dis}_3(\{y_1,y_2,0\}) = x\}| \le Cx^{2d-1}$ from Theorem 4.1 of Lahiri (1999a). Thus,
\[
|L_{1n}^{ijk}| \le \frac{C}{{}_sN_n}\sum_{x=1}^{\infty}x^{2d-1}\alpha(x,2)^{\delta/(2r+\delta)} = O\big([\det({}_s\Delta_n)]^{-1}\big).
\]
This establishes (9.7), completing the proof of Lemma 9.3 for $\hat\tau^2_{n,OL}$.
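For instance (our illustration), when $p = 1$, (9.7) is a third-moment bound: with $c_1 = H'(\mu)$ and $a_{1,1} = H''(\mu)/2$,
\[
{}_sN_nE(Y_{0,n}Q_{0,n}) = c_1a_{1,1}\,{}_sN_n\,E(\bar Z_{0,n}-\mu)^3 = c_1a_{1,1}\Big[({}_sN_n)^{-1}E\big(Z(t)-\mu\big)^3 + L_{1n}^{111} + L_{2n}^{111}\Big] = O\big(({}_sN_n)^{-1}\big) = O\big([\det({}_s\Delta_n)]^{-1}\big),
\]
since ${}_sN_n \sim |R_0|\det({}_s\Delta_n)$ by Lemma 8.3: it is the subsample mean's third moment, not its variance, that controls this bias contribution.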
We use the next lemma in the proof of Theorem 4.2. It allows us to approximate lattice point counts with Lebesgue volumes, in $\mathbb{R}^2$ or $\mathbb{R}^3$, to a sufficient degree of accuracy.

Lemma 9.4 Let $d = 2, 3$ and let $R_0 \subset (-1/2,1/2]^d$ be such that $B^\circ \subset R_0 \subset B$ for a convex set $B$. Let $\{b_n\}_{n=1}^\infty$ be a sequence of positive real numbers with $b_n \to \infty$. If $k \in \mathbb{Z}^d$, then there exist $N_k \in \mathbb{Z}_+$ and $C_d > 0$ such that for $n \ge N_k$, $i \in \mathbb{Z}^d$,
\[
\Big|\big(|b_nR_0| - |\mathbb{Z}^d\cap b_n(i+R_0)|\big) - \big(|b_nR_0\cap(k+b_nR_0)| - |\mathbb{Z}^d\cap b_n(i+R_0)\cap(k+b_n(i+R_0))|\big)\Big| \le \begin{cases}C_2\|k\|_\infty^2\,b_n^{2/3} & \text{if } d = 2,\\ C_3\|k\|_\infty^4\,b_n^{5/3} + \xi_{k,n}b_n^2 & \text{if } d = 3,\end{cases}
\]
where $\{\xi_{k,n}\}_{n=1}^\infty \subset \mathbb{R}$ is a nonnegative sequence (possibly dependent on $k$) such that $\xi_{k,n} \to 0$.

Proof: Provided in Nordman and Lahiri (2002).

To establish Lemma 4.1, we require some additional notation. For $i, k \in \mathbb{Z}^d$, let ${}_sN_{i,n}(k) = |\mathbb{Z}^d\cap\tilde R_{i,n}\cap(k + \tilde R_{i,n})|$ denote the number of sampling sites (lattice points) in the intersection of a NOL subregion with its $k$-translate. Note that ${}_sN_{i,n}(k)$ is a subsample version of $N_n(k)$ from (9.1).

Proof of Lemma 4.1: We start with the bounds
\[
\sup_{i\in\mathbb{Z}^d}|{}_sN_n - {}_sN_{i,n}| \le C({}_s\lambda_n^{\max})^{d-1}, \tag{9.8}
\]
\[
|{}_sN_{i,n} - {}_sN_{i,n}(k)| \le \big|\{j \in \mathbb{Z}^d : T_j\cap{}_sR_n \neq \emptyset,\ T_j\cap{}_sR_n^c \neq \emptyset;\ T_j = j + \|k\|_\infty[-2,2]^d\}\big| \le C\|k\|_\infty^d({}_s\lambda_n^{\max})^{d-1}, \tag{9.9}
\]
by the boundary condition on $R_0$ (cf. Lemma 8.3) and $\inf_{j\in\mathbb{Z}^d}\|{}_s\Delta_ni - j\|_\infty \le 1/2$. Modify (9.4) by replacing $N_n$, $N_n(k)$, $\bar Y_{N_n}$ with ${}_sN_{i,n}$, ${}_sN_{i,n}(k)$, $\tilde Y_{i,n} = \nabla'(\tilde Z_{i,n}-\mu)$ (i.e., use a NOL subsample version in place of the sample one); and replace $N_n$, $\Delta_n$, $\lambda_n^{\max}$ with the subsample analogs ${}_sN_{i,n}$, ${}_s\Delta_n$, ${}_s\lambda_n^{\max}$ in (9.5). We then find, using (9.3): for each $i \in \mathbb{Z}^d$,
\[
{}_sN_{i,n}E(\tilde Y_{i,n}^2) - \tau^2 = \frac{1}{{}_sN_{i,n}}\sum_{k\in\mathbb{Z}^d}\big({}_sN_{i,n}(k) - {}_sN_{i,n}\big)\sigma(k) \equiv {}_sI_{i,n}, \tag{9.10}
\]
\[
\sup_{i\in\mathbb{Z}^d}|{}_sI_{i,n}| \le \sup_{i\in\mathbb{Z}^d}\frac{1}{{}_sN_{i,n}}\sum_{k\in\mathbb{Z}^d}|{}_sN_{i,n}(k) - {}_sN_{i,n}|\cdot|\sigma(k)| \le C\cdot\frac{({}_s\lambda_n^{\max})^{d-1}}{{}_sN_n - C({}_s\lambda_n^{\max})^{d-1}}\sum_{x=1}^{\infty}x^{2d-1}\alpha_1(x)^{\delta/(2r+\delta)} = O\big([\det({}_s\Delta_n)]^{-1/d}\big), \tag{9.11}
\]
from (9.8)-(9.9) and Assumption A.1. Now applying Lemma 9.1 and Assumption A.2, together with Lemma 9.2 for $d \ge 2$ or Lemma 9.3 for $d = 1$, Lemma 4.1 follows.

Proof of Theorem 4.1: Here ${}_sN_{i,n} = {}_sN_n$, ${}_sN_{i,n}(k) = C_n(k)$, and $E(\tilde Y_{i,n}^2) = E(\tilde Y_{0,n}^2)$ for each $i, k \in \mathbb{Z}^d$ (since ${}_s\lambda_n \in \mathbb{Z}_+$), and $\det({}_s\Delta_n) = {}_s\lambda_n^d$. Applying Lemma 9.2 for $d \ge 3$ and Lemma 9.3 for $d = 2$, together with Lemma 9.1, Assumption A.2, and (9.11):
\[
E(\hat\tau_n^2) - \tau_n^2 = -\frac{1}{{}_s\lambda_n|R_0|}\sum_{k\in\mathbb{Z}^d}g_n(k) + o({}_s\lambda_n^{-1}), \qquad g_n(k) \equiv \frac{{}_sN_n - C_n(k)}{{}_s\lambda_n^{d-1}}\cdot\frac{{}_s\lambda_n^d|R_0|}{{}_sN_n}\cdot\sigma(k).
\]
From (9.11) and Lemma 8.3, it follows that $\sum_{k\in\mathbb{Z}^d}|g_n(k)| \le C$, $n \in \mathbb{Z}_+$, and that $g_n(k) \to C(k)\sigma(k)$ for each $k \in \mathbb{Z}^d$. By the LDCT, the proof of Theorem 4.1 is complete.
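A small numerical check of the count-versus-volume differencing in Lemma 9.4 (our illustration, not from the lemma's proof; here $R_0$ is taken to be the disk of diameter 1 in $d = 2$, $i = 0$ and $k = (1,0)'$, and the $O(b_n^{2/3})$ rate is the classical circle-problem order, cf. van der Corput, 1920):

\begin{verbatim}
import numpy as np

def discrepancies(b, k):
    # Volume-minus-count errors for the radius-b/2 disk at 0 and for its
    # intersection ("lens") with the disk translated by k.
    r = b / 2.0
    kx, ky = k
    m = int(np.ceil(r + max(abs(kx), abs(ky)))) + 1
    xs, ys = np.meshgrid(np.arange(-m, m + 1), np.arange(-m, m + 1))
    in0 = xs**2 + ys**2 <= r**2
    ink = (xs - kx)**2 + (ys - ky)**2 <= r**2
    t = float(np.hypot(kx, ky))
    area_disk = np.pi * r**2
    area_lens = 2*r**2*np.arccos(t/(2*r)) - (t/2)*np.sqrt(4*r**2 - t**2)
    return area_disk - in0.sum(), area_lens - (in0 & ink).sum()

for b in [50, 100, 200, 400, 800]:
    e_disk, e_lens = discrepancies(b, (1, 0))
    # Lemma 9.4 (d = 2) bounds the differenced error by a multiple of b^(2/3),
    # far below the O(b) size of each individual discrepancy.
    print(b, e_disk - e_lens, b**(2/3))
\end{verbatim}

The individual discrepancies fluctuate on the order of the boundary length $O(b)$, while their difference, which is what enters $D_{i,n}(k)$ below, remains much smaller.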
To establish Theorem 4.2, we require some additional notation. For $i, k \in \mathbb{Z}^d$, denote the difference between two Lebesgue volume-for-lattice-point-count approximations as
\[
D_{i,n}(k) = \big(|\tilde R_{i,n}| - {}_sN_{i,n}\big) - \big(|\tilde R_{i,n}\cap(k+\tilde R_{i,n})| - {}_sN_{i,n}(k)\big) = \big(|{}_sR_n| - {}_sN_{i,n}\big) - \big(|{}_sR_n\cap(k+{}_sR_n)| - {}_sN_{i,n}(k)\big).
\]

Proof of Theorem 4.2: We handle here the cases $d = 2$ and $3$. Details on the proof for $d = 1$ are given in Nordman (2002). We note first that if $V(k)$ exists for each $k \in \mathbb{Z}^d$, then Lemma 9.4 implies $C(k) = V(k)$.

Consider $\hat\tau^2_{n,NOL}$. Applying Lemma 9.2 for $d = 3$ and Lemma 9.3 for $d = 2$, with (9.8), (9.10), and (9.11), gives
\[
E(\hat\tau^2_{n,NOL}) - \tau_n^2 = |J_{NOL}|^{-1}\sum_{i\in J_{NOL}}\frac{{}_sN_{i,n}}{|{}_sR_n|}\,{}_sI_{i,n} + o({}_s\lambda_n^{-1}).
\]
Then, using (9.3), we can arrange terms to write
\[
|J_{NOL}|^{-1}\sum_{i\in J_{NOL}}\frac{{}_sN_{i,n}}{|{}_sR_n|}\,{}_sI_{i,n} = \Psi_n + \sum_{k\in\mathbb{Z}^d}\frac{G_n(k)}{{}_s\lambda_n|R_0|}, \qquad G_n(k) = \Big(\frac{1}{|J_{NOL}|}\sum_{i\in J_{NOL}}\frac{D_{i,n}(k)}{{}_s\lambda_n^{d-1}}\Big)\sigma(k),
\]
for
\[
\Psi_n = -\frac{1}{|{}_sR_n|}\sum_{k\in\mathbb{Z}^d}\big(|{}_sR_n| - |{}_sR_n\cap(k+{}_sR_n)|\big)\sigma(k).
\]
Since $R_0$ is convex, the boundary condition is valid and it holds for all $i, k \in \mathbb{Z}^d$ that
\[
\big|{}_sN_{i,n}(k) - |{}_sR_n\cap(k+{}_sR_n)|\big| \le C({}_s\lambda_n^{\max})^{d-1}, \qquad |{}_sR_n| - |{}_sR_n\cap(k+{}_sR_n)| \le C\|k\|_\infty^d({}_s\lambda_n^{\max})^{d-1}, \tag{9.12}
\]
from Lemma 8.3 and (9.9). Then (9.3), Lemma 9.4, and (9.12) give: $\sum_{k\in\mathbb{Z}^d}|G_n(k)| \le C$, $n \in \mathbb{Z}_+$; $G_n(k) \to 0$ for $k \in \mathbb{Z}^d$; and ${}_s\lambda_n\Psi_n = O(1)$. By the LDCT, we establish
\[
\sum_{k\in\mathbb{Z}^d}\frac{G_n(k)}{{}_s\lambda_n|R_0|} = o({}_s\lambda_n^{-1}), \qquad E(\hat\tau^2_{n,NOL}) - \tau_n^2 = \Psi_n(1 + o(1)),
\]
representing the formulation of Theorem 4.2 in terms of $\Psi_n$. If $V(k)$ exists for each $k \in \mathbb{Z}^d$, then (9.3) and (9.12) imply that we can use the LDCT again to produce
\[
\Psi_n = -\frac{1}{{}_s\lambda_n|R_0|}\Big(\sum_{k\in\mathbb{Z}^d}V(k)\sigma(k)\Big)(1 + o(1)). \tag{9.13}
\]
The proof of Theorem 4.2 for $\hat\tau^2_{n,NOL}$ is now finished.

Consider $\hat\tau^2_{n,OL}$. We can repeat the same steps as above to find
\[
E(\hat\tau^2_{n,OL}) - \tau_n^2 = \Psi_n + \sum_{k\in\mathbb{Z}^d}\frac{G_n^*(k)}{{}_s\lambda_n|R_0|} + o({}_s\lambda_n^{-1}), \qquad G_n^*(k) = \frac{D_{0,n}(k)}{{}_s\lambda_n^{d-1}}\cdot\sigma(k).
\]
The same arguments for $G_n$ apply to $G_n^*$, and (9.13) remains valid when each $V(k)$ exists, $k \in \mathbb{Z}^d$, establishing Theorem 4.2 for $\hat\tau^2_{n,OL}$. Note as well that if $V(k)$ exists for each $k \in \mathbb{Z}^d$, then Lemma 9.4 and Lemma 4.1 also imply the second formulation of the bias in Theorem 4.2.

Proof of Theorem 5.1: Follows from Theorems 3.1 and 4.1 and elementary calculus arguments involving the minimization of a smooth function of a real variable.

Proof of Theorem 6.1: We develop some tools to facilitate the counting arguments required for expanding the biases of $\hat\tau^2_{n,OL}$ and $\hat\tau^2_{n,NOL}$. We first define a count for the number of $\mathbb{Z}^d$ lattice points lying in a "face" of a $d$-dimensional rectangle. For a rectangle $T$ with $\prod_{j=1}^d(c_j,\tilde c_j) \subset T \subset \prod_{j=1}^d[c_j,\tilde c_j]$, $c_j, \tilde c_j \in \mathbb{R}$, define the border point set $B\{T\} = \cup_{j=1}^d\{s = (s_1,\ldots,s_d)' \in \mathbb{Z}^d\cap T : s_j \in \{c_j, \tilde c_j\}\}$. With $k \in \mathbb{Z}^d$, it can be shown that there exists $N_k \in \mathbb{Z}_+$ such that for $n \ge N_k$,
\[
\big|B\{{}_s\lambda_n(i+R_0)\}\big| - \big|B\{{}_s\lambda_n(i+R_0)\cap(k+{}_s\lambda_n(i+R_0))\}\big| \le 32d(d+1)\|k\|_\infty\,{}_s\lambda_n^{d-2}, \qquad i \in \mathbb{Z}^d. \tag{9.14}
\]
See Nordman (2002) for more details. Fix $k \neq 0 \in \mathbb{Z}^d$ and assume $\min_{1\le j\le d}{}_s\lambda_n\ell_j > 3 + \|k\|_\infty$. We then write, for $i \in \mathbb{Z}^d$: ${}_sN_{i,n} = P_{5,i,n} - P_{4,i,n} + P_{3,i,n}$, ${}_sN_{i,n}(k) = P_{6,i,n} - P_{7,i,n} + P_{8,i,n}$, and $|D_{i,n}(k)| = \big|\sum_{j=1}^4(P_{2j-1,i,n} - P_{2j,i,n})\big|$, where
\[
P_{1,i,n} = \prod_{j=1}^d{}_s\lambda_n\ell_j, \qquad P_{2,i,n} = \prod_{j=1}^d\big({}_s\lambda_n\ell_j - |k_j|\big), \qquad P_{3,i,n} = \big|B\{{}_s\lambda_n(i+R_0)\}\big|, \qquad P_{4,i,n} = \big|B\{{}_s\lambda_n(i+R_0)\}\big|,
\]
\[
P_{5,i,n} = \prod_{j=1}^d\big|\big({}_s\lambda_n(i_j + \ell_j[-1/2,1/2]) - t_j\big)\cap\mathbb{Z}\big|, \qquad P_{6,i,n} = \prod_{j=1}^d\Big(\big|\big({}_s\lambda_n(i_j + \ell_j[-1/2,1/2]) - t_j\big)\cap\mathbb{Z}\big| - |k_j|\Big),
\]
\[
P_{7,i,n} = \big|B\{{}_s\lambda_n(i+R_0)\cap(k+{}_s\lambda_n(i+R_0))\}\big|, \qquad P_{8,i,n} = \big|B\{{}_s\lambda_n(i+R_0)\cap(k+{}_s\lambda_n(i+R_0))\}\big|.
\]
We now show that, for $k \neq 0$, there exist $C > 0$ and $N_k \in \mathbb{Z}_+$ such that for $n \ge N_k$,
\[
|D_{i,n}(k)| = \Big|\sum_{j=1}^4(P_{2j-1,i,n} - P_{2j,i,n})\Big| \le C\|k\|_\infty^{d-1}\,{}_s\lambda_n^{d-2}, \qquad i \in \mathbb{Z}^d. \tag{9.15}
\]
By (9.14), for large $n \ge N_k$ we have $0 \le (P_{3,i,n} - P_{8,i,n}) \le (P_{4,i,n} - P_{7,i,n}) \le C\|k\|_\infty\,{}_s\lambda_n^{d-2}$. Note that $\big|\,|({}_s\lambda_n(i_j + \ell_j[-1/2,1/2]) - t_j)\cap\mathbb{Z}| - {}_s\lambda_n\ell_j\,\big| \le 3$ for $i_j \in \mathbb{Z}$, $j \in \{1,\ldots,d\}$. Hence,
\[
\big|(P_{1,i,n} - P_{5,i,n}) - (P_{2,i,n} - P_{6,i,n})\big| \le \sum_{A\subset\{1,\ldots,d\},\,1\le|A|<d}\|k\|_\infty^{d-|A|}\Big|\prod_{j\in A}{}_s\lambda_n\ell_j - \prod_{j\in A}\big|\big({}_s\lambda_n(i_j + \ell_j[-1/2,1/2]) - t_j\big)\cap\mathbb{Z}\big|\Big| \le C\|k\|_\infty^{d-1}\,{}_s\lambda_n^{d-2},
\]
which holds uniformly in $i \in \mathbb{Z}^d$. With these bounds, we have (9.15). Applying (9.15) in place of Lemma 9.4, the same proof as for Theorem 4.2 establishes Theorem 6.1.

References

Billingsley, P. (1986). Probability and Measure, 2nd Edition. John Wiley & Sons, New York.

Bolthausen, E. (1982). On the central limit theorem for stationary mixing random fields. Ann. Probab. 10, 1047-1050.

Bradley, R. C. (1989). A caution on mixing conditions for random fields. Statist. Probab. Letters 8, 489-491.
Bühlmann, P. and Künsch, H. R. (1999). Block length selection in the bootstrap for time series. Comput. Stat. Data Anal. 31, 295-310.

Carlstein, E. (1986). The use of subseries methods for estimating the variance of a general statistic from a stationary time series. Ann. Statist. 14, 1171-1179.

Corput, J. G. van der (1920). Über Gitterpunkte in der Ebene. Math. Ann. 81, 1-20.

Cressie, N. (1991). Statistics for Spatial Data, 1st Edition. John Wiley & Sons, New York.

DiCiccio, T., Hall, P., and Romano, J. P. (1991). Empirical likelihood is Bartlett-correctable. Ann. Statist. 19, 1053-1061.

Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics 85. Springer-Verlag, New York.

Fukuchi, J.-I. (1999). Subsampling and model selection in time series analysis. Biometrika 86, 591-604.

Garcia-Soidan, P. H. and Hall, P. (1997). On sample reuse methods for spatial data. Biometrics 53, 273-281.

Guyon, X. (1995). Random Fields on a Network. Springer-Verlag, New York.

Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.

Hall, P., Horowitz, J. L., and Jing, B.-Y. (1995). On blocking rules for the bootstrap with dependent data. Biometrika 82, 561-574.

Hall, P. and Jing, B.-Y. (1996). On sample reuse methods for dependent data. J. R. Stat. Soc. Ser. B 58, 727-737.

Huxley, M. N. (1993). Exponential sums and lattice points II. Proc. London Math. Soc. 66, 279-301.

Huxley, M. N. (1996). Area, Lattice Points, and Exponential Sums. Oxford University Press, New York.

Krätzel, E. (1988). Lattice Points. Deutscher Verlag Wiss., Berlin.

Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17, 1217-1241.

Lahiri, S. N. (1996). On empirical choice of the optimal block size for block bootstrap methods. Preprint. Department of Statistics, Iowa State University, Ames, IA.

Lahiri, S. N. (1999a). Asymptotic distribution of the empirical spatial cumulative distribution function predictor and prediction bands based on a subsampling method. Probab. Theory Related Fields 114, 55-84.

Lahiri, S. N. (1999b). Central limit theorems for weighted sums under some spatial sampling schemes. Preprint. Department of Statistics, Iowa State University, Ames, IA.

Lahiri, S. N. (1999c). Theoretical comparisons of block bootstrap methods. Ann. Statist. 27, 386-404.

Léger, C., Politis, D. N., and Romano, J. P. (1992). Bootstrap technology and applications. Technometrics 34, 378-399.

Martin, R. J. (1990). The use of time-series models and methods in the analysis of agricultural field trials. Comm. Statist.-Theory Methods 19, 55-81.

Meketon, M. S. and Schmeiser, B. (1984). Overlapping batch means: something for nothing? Proc. Winter Simulation Conf., 227-230, S. Sheppard, U. Pooch, and D. Pegden (editors). IEEE, Piscataway, New Jersey.

Nordman, D. J. (2002). On optimal spatial subsample size for variance estimation. Dissertation paper. Department of Statistics, Iowa State University, Ames, IA.

Nordman, D. J. and Lahiri, S. N. (2002). On the approximation of differenced lattice point counts with application to statistical bias expansions. Dissertation paper/technical report. Department of Statistics, Iowa State University, Ames, IA.

Perera, G. (1997). Geometry of $\mathbb{Z}^d$ and the central limit theorem for weakly dependent random fields. J. Theoret. Probab. 10, 581-603.

Politis, D. N. and Romano, J. P. (1993a). Nonparametric resampling for homogeneous strong mixing random fields. J. Multivariate Anal. 47, 301-328.
Politis, D. N. and Romano, J. P. (1993b). On the sample variance of linear statistics derived from mixing sequences. Stochastic Process. Appl. 45, 155-167.

Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 22, 2031-2050.

Politis, D. N. and Sherman, M. (2001). Moment estimation for statistics from marked point processes. J. R. Stat. Soc. Ser. B 63, 261-275.

Politis, D. N., Romano, J. P., and Wolf, M. (1999). Subsampling. Springer-Verlag, New York.

Possolo, A. (1991). Subsampling a random field. IMS Lecture Notes 20, 286-294.

Sherman, M. (1996). Variance estimation for statistics computed from spatial lattice data. J. R. Stat. Soc. Ser. B 58, 509-523.

Sherman, M. and Carlstein, E. (1994). Nonparametric estimation of the moments of a general statistic computed from spatial data. J. Amer. Statist. Assoc. 89, 496-500.

Song, W. M. T. and Schmeiser, B. (1988). Estimating standard errors: empirical behavior of asymptotic MSE-optimal batch sizes. In Computer Science and Statistics: Proc. 20th Symp. Interface, 575-580, E. J. Wegman, D. T. Gantz, and J. J. Miller (editors). American Statistical Association.