Point and Interval Estimation of Variogram Models using Spatial Empirical Likelihood

Running title: EL variogram inference

Daniel J. Nordman, Petruţa C. Caragea
Department of Statistics, Iowa State University, Ames, IA 50011

Abstract

We present a spatial blockwise empirical likelihood method for estimating variogram model parameters in the analysis of spatial data on a grid. The method produces point estimators that require no spatial variance estimates to compute, unlike least squares methods for variogram fitting, but are as efficient as the best least squares estimator in large samples. Our approach also produces confidence regions for the variogram, without requiring knowledge of the full joint distribution of the spatial data. Additionally, the empirical likelihood formulation extends to spatial regression problems and allows simultaneous inference on both spatial trend and variogram parameters. The asymptotic behavior of the estimator is examined analytically, while its behavior in finite samples is investigated through simulation studies.

1 Introduction

Describing spatial dependence with variograms is one of the most common approaches used in practice, and estimation of the parameters in theoretical variogram models is needed to achieve prediction at unobserved locations. One popular method for variogram estimation is so-called “least squares variogram fitting,” proposed originally in the geostatistical literature (cf. Cressie, 1993) and further examined by Cressie (1985), Zhang et al. (1995), and Lee and Lahiri (2002), among others. This approach estimates variogram parameters by minimizing a weighted distance between a nonparametric variogram estimator (e.g., the sample variogram) and a parametric variogram model. The choice of the weighting criterion largely determines the performance of the resulting least squares estimator (LSE). For example, the generalized least squares (GLS) estimator is known to be statistically optimal within the class of LSEs (Lahiri et al., 2002), but its calculation requires an asymptotic covariance matrix for the variogram estimator, and this can make it computationally intractable. As an alternative, Lee and Lahiri (2002) recently proposed a subsampling generalized least squares (SLS) estimator that replaces the asymptotic covariance matrix needed in the GLS criterion with a nonparametric estimate.

Although much attention has centered on developing practical LSEs for variogram parameters (see Section 2), there is less information available on assessing the precision of the resulting estimates. For example, confidence bands for the variogram model could be helpful in quantifying the uncertainty in estimation. However, the unknown and possibly complex distribution of the data generating process complicates the setting of confidence regions. Parametric alternatives to LSEs, such as maximum likelihood (or REML) estimation, may assess uncertainty in parameter estimates by assuming the data follow a Gaussian distribution (Cressie, 1993, p. 92). However, a main advantage of LSE methods is that they make minimal distributional assumptions and are computationally less demanding than methods that make full distributional assumptions.

This article proposes a new method of variogram estimation that is based on a spatial empirical likelihood (EL) computed from sets of spatial blocks or sub-regions.
A good deal of recent work has focused on extending the original EL methods of Owen (1988, 1990, 2001) from applications appropriate for independent data to problems in time series. Kitamura (1997) proposed a “blockwise” EL method for weakly dependent time series. With similar blocking techniques, Bravo (2005) considered time series regressions and Zhang (2006) adapted EL for negatively associated series. Monti (1997) and Nordman and Lahiri (2006) proposed periodogram-based EL methods for dealing with short- and long-memory processes, respectively. Additionally, research in econometrics has focused on EL for testing moment restrictions (Kitamura, Tripathi and Ahn, 2004; Newey and Smith, 2004). While our spatial EL shares some features with these methods for dependent data in one dimension, the demonstration of asymptotic properties in the spatial setting requires more than simply “folding” a one-dimensional process onto a set of higher dimensions.

The EL method proposed here allows valid likelihood inference to be made about variogram model parameters without requiring an estimate of the covariance matrix for the joint distribution of the underlying process, which is of practical importance. This is a consequence of the “internal studentization” known to exist for EL methods if suitably formulated for particular problems (e.g., Hall and La Scala, 1990; Kitamura, 1997). This EL method results in estimators of the variogram parameters that are asymptotically normal and as efficient as the optimal LSE based on the sample variogram. The method also extends in a straightforward manner to the problem of simultaneous estimation of parameters in variogram models and in linear models for large-scale spatial structure or trend, thus offering a solution to the problem of variogram estimation in applications of universal kriging (Cressie, 1993, Section 3.4.3).

In Section 2, we provide background on LSEs and variogram fitting. In Section 3, we describe the spatial EL method for variogram inference. The main distributional results of the paper are presented in Section 4 and the EL method is investigated through a simulation study in Section 5. An extension of the EL method to a spatial regression model is given in Section 6 and illustrated with an example. Section 7 provides some final remarks. Additional material, including proofs of the main results, is provided in a supplementary on-line Appendix.

2 Least squares estimation of variogram models

Suppose that {Z(s) : s ∈ R^d} is a real-valued, intrinsically stationary random field, whereby E{Z(s) − Z(s + h)} = 0 and 2γ(h) ≡ Var{Z(s) − Z(s + h)} = E{Z(s) − Z(s + h)}², for all s, h ∈ R^d. The function 2γ(h) denotes the variogram of the spatial process. In least squares variogram fitting, we assume that the true variogram of Z(·) belongs to a parametric family {2γ(·; θ) : θ ∈ Θ}. The goal is to estimate θ ∈ Θ ⊂ R^p based on the available data {Z(s1), . . . , Z(sn)}, collected from sites {s1, . . . , sn} located within a spatial sampling region. Here, we will assume that {s1, . . . , sn} lie on a regular lattice in R^d. A least squares method begins with a nonparametric estimator 2γ̂n(h) of the process variogram 2γ(h), such as the sample variogram (Matheron, 1962),

2γ̂n(h) = Σ_{(i,j)∈Nn(h)} {Z(si) − Z(sj)}² / |Nn(h)|,  h ∈ R^d,  (1)

where Nn(h) = {(i, j) : 1 ≤ i, j ≤ n, si − sj = h} and we take |A| to denote the size of a finite set A. Throughout the remainder, we suppose that a least squares method is defined using (1).
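On a complete rectangular grid, computing (1) amounts to averaging squared differences between lag-shifted copies of the data. The following minimal sketch (in Python; our own illustration, not code from the paper) assumes d = 2, a fully observed grid and a small set of integer lags; the helper name sample_variogram is ours.

```python
import numpy as np

def sample_variogram(z, lags):
    """Matheron's estimator (1): for each integer lag h = (h1, h2), average
    {Z(s) - Z(s + h)}^2 over all site pairs s, s + h inside the grid."""
    n1, n2 = z.shape
    out = []
    for h1, h2 in lags:
        a = z[max(0, -h1):n1 - max(0, h1), max(0, -h2):n2 - max(0, h2)]
        b = z[max(0, h1):n1 + min(0, h1), max(0, h2):n2 + min(0, h2)]
        out.append(np.mean((a - b) ** 2))    # 2 * gamma_hat(h)
    return np.array(out)

# Sanity check: for white noise, 2 * gamma_hat(h) should be near 2 * Var(Z) at any lag.
rng = np.random.default_rng(0)
print(sample_variogram(rng.normal(size=(50, 50)), [(1, 0), (0, 1), (1, 1)]))
```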
An LSE of the variogram parameter θ is obtained by minimizing a weighted distance between the variogram model 2γ(h; θ) and the estimator 2γ̂n(h) over a collection of r ≥ p fixed lags h ∈ {h1, . . . , hr} ⊂ R^d. Namely, for an r × r positive definite weight matrix V(θ), the LSE of θ with respect to V(θ) is given by

θ̂n,V ≡ arg min{Qn,V(θ) : θ ∈ Θ},  with  Qn,V(θ) ≡ gn(θ)′ V(θ) gn(θ),  (2)

where gn(θ) is an r × 1 vector with i-th element 2γ̂n(hi) − 2γ(hi; θ). The selection of V(θ) determines the type of the LSE θ̂n,V. For example, two common choices for V(θ) are the identity matrix and a diagonal matrix with entries |Nn(hi)|/{2γ(hi; θ)}², producing the ordinary least squares (OLS) estimator and an approximation to Cressie’s weighted least squares (WLS) estimator (Cressie, 1993, p. 96), respectively. Under general conditions, Lahiri et al. (2002) have shown the generalized least squares (GLS) estimator to be asymptotically efficient among all LSEs of θ, which corresponds to selecting V(θ) = Σ(θ)^{−1}, where Σ(θ) is the asymptotic covariance matrix of gn(θ) under θ. While statistically optimal in the LSE class, the computation of the GLS estimator requires minimizing a complex function (2) of the covariance matrix Σ(θ) at each θ. As an alternative, Lee and Lahiri (2002) proposed replacing the covariance matrix Σ(θ0) at the true parameter value θ0 with a nonparametric estimator Σ̂ based on a subsampling method for dependent data described by Politis and Romano (1994) and Sherman (1996), among others. Their subsampling least squares (SLS) estimator θ̂n,SLS = θ̂n,Σ̂⁻¹ is then obtained by setting V(θ) = Σ̂^{−1}, which is free of θ, thereby making (2) simpler to compute for SLS than for GLS. However, Lee and Lahiri (2002) do not address quantifying the uncertainty in estimates of the variogram 2γ(h; θ̂n,SLS), which we consider with the EL method described next.

3 Spatial EL method for variogram estimation

3.1 Spatial sampling design

To describe the spatial EL method, we shall adopt a spatial sampling framework that allows a spatial sampling region to expand as the sample size increases. Suppose the process Z(·) is observed at n sites {s1, . . . , sn} located on a regular grid within a spatial sampling region Rn ⊂ R^d, d ≥ 1. For some increasing sequence {λn}_{n≥1} of positive scaling factors, we suppose that the sampling region Rn is obtained by inflating a prototype set R0 by λn. That is, Rn = λn R0, where the template set R0 ⊂ (−1/2, 1/2]^d contains a neighborhood around the origin. This formulation of Rn permits a wide variety of sampling region shapes where the shape of Rn is preserved as the sampling region grows. Similar sampling schemes have been considered by Politis and Romano (1994), Sherman (1996), and Lee and Lahiri (2002) for spatial subsampling, and we follow these authors in assuming that the process Z(·) is observed at locations on the integer grid Z^d lying inside Rn; that is, the available sampling sites are {s1, . . . , sn} = Rn ∩ Z^d.

3.2 Variogram estimating equations

The spatial EL method for variogram estimation employs a moment condition that links the intrinsically stationary spatial process Z(·) to the variogram parameter value θ ∈ Θ ⊂ R^p. To accomplish this, we select r ≥ p fixed lags {hi}_{i=1}^r ⊂ Z^d, define an r × 1 vector function Γ(θ), θ ∈ Θ, with i-th component equal to 2γ(hi; θ), and construct vectors Y(·) from the spatial process Z(·) as

Y(s) ≡ ({Z(s) − Z(s + h1)}², . . . , {Z(s) − Z(s + hr)}²)′,  s ∈ R^d.  (3)
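To make (3) concrete, the vectors Y(s) can be assembled by stacking squared differences of lag-shifted copies of the gridded data. The sketch below is a hypothetical helper (names ours, not the authors’), assuming d = 2, non-negative integer lags and a fully observed rectangular region; the elementwise average of the resulting array estimates EY(s), the quantity that the moment condition stated next ties to Γ(θ0).

```python
import numpy as np

def build_Y(z, lags):
    """Form Y(s) in (3) for every site s whose shifts s + h_1, ..., s + h_r all
    remain on the grid (the region R_{n,Y}); returns an (m1, m2, r) array."""
    n1, n2 = z.shape
    m1 = n1 - max(h1 for h1, _ in lags)    # assumes non-negative integer lags
    m2 = n2 - max(h2 for _, h2 in lags)
    Y = np.empty((m1, m2, len(lags)))
    for k, (h1, h2) in enumerate(lags):
        Y[:, :, k] = (z[:m1, :m2] - z[h1:h1 + m1, h2:h2 + m2]) ** 2
    return Y

rng = np.random.default_rng(1)
Y = build_Y(rng.normal(size=(30, 30)), [(1, 0), (0, 1), (1, 1)])
print(Y.shape)               # (29, 29, 3): one r-vector Y(s) per retained site
print(Y.mean(axis=(0, 1)))   # elementwise mean estimates E Y(s)
```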
When the Z(·)-process variogram 2γ(h; θ0), h ∈ R^d, belongs to the model class, we have the moment equation

EY(s) = Γ(θ0) ∈ R^r,  s ∈ R^d.  (4)

For inference on θ ∈ Θ, we next create an EL function based on (3) and the moment assumption in (4). Other EL functions (and resulting estimators) are possible for θ by changing the estimating functions and moment conditions. For example, by re-defining Y(·) with absolute differences (rather than squared ones), the right hand side of (4) would be (2/π)^{1/2} Γ^{1/2}(θ0) for Gaussian processes. However, we shall focus our spatial EL development on (3) and (4).

3.3 Blockwise EL function and maximum EL estimator

Consider values {Y(s) : s ∈ Rn,Y ∩ Z^d} as defined in (3), corresponding to a sampling region Rn,Y = {s ∈ Rn : s + h1, . . . , s + hr ∈ Rn}. The EL function for θ involves creating a likelihood function based on blocks of Y(·). The data blocking is a device used to retain the spatial dependence structure in the EL likelihood by keeping neighboring observations together. Similar blocking techniques have been key to formulating other nonparametric likelihoods for dependent data, like the block bootstrap or subsampling (Künsch, 1989; Politis and Romano, 1994). While the block bootstrap assigns probabilities to data blocks by resampling, the spatial EL method creates a likelihood by assigning probabilities to blocks under a moment constraint, and block sample means of Y(·) observations are used to summarize the information in each block regarding the moment condition (4).

Let {bn}_{n≥1} be a sequence of positive integers that will define the EL block scaling and let In = {i ∈ Z^d : B_{bn}(i) ⊂ Rn,Y} denote the index set of all d-dimensional rectangles B_{bn}(i) ≡ i + bn(−1/2, 1/2]^d, i ∈ Z^d, lying inside Rn,Y. This provides a collection of blocks {B_{bn}(i) : i ∈ In}. Figure 1 provides an illustration of the sampling region and the blocking mechanism. To keep the blocks small relative to the size of the sampling region Rn = λn R0 (or Rn,Y), we suppose that bn → ∞ grows at a slower rate than λn and require bn²/λn → 0 or, equivalently,

bn^{−1} + (bn^d)²/n → 0  (5)

as the spatial sample size n → ∞; see the on-line Appendix for more details. Each block B_{bn}(i), i ∈ In, contains |B_{bn}(i) ∩ Z^d| = bn^d observations of the vector process Y(·) with a block sample mean Ȳi = bn^{−d} Σ_{s∈B_{bn}(i)∩Z^d} Y(s). By (5), the squared number of observations in a block must be of smaller order than the overall sample size n, which is a generalization of EL block conditions from time series (d = 1; Kitamura, 1997).

Figure 1: (a) Sampling region Rn for the Z(·)-process with site locations denoted by •; (b) Sampling region Rn,Y for Y(·) from (3) based on r = 2 lags: h1 = (0, −1)′, h2 = (2, 0)′; (c) Overlapping blocks.
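The block means Ȳi of Section 3.3 are simply sliding-window averages of the Y(s) array. Below is a minimal sketch (hypothetical, reusing the build_Y helper sketched earlier and assuming d = 2); by (5), bn should be of smaller order than n^{1/(2d)} (n^{1/4} when d = 2), so that blocks stay small relative to the region.

```python
import numpy as np

def block_means(Y, b):
    """Block means Ybar_i of Section 3.3: average Y(s) over each b x b block
    B_b(i) completely contained in the Y(.)-region; returns an (N_I, r) array."""
    m1, m2, r = Y.shape
    rows = []
    for i in range(m1 - b + 1):              # top-left corners of admissible blocks
        for j in range(m2 - b + 1):
            rows.append(Y[i:i + b, j:j + b, :].mean(axis=(0, 1)))
    return np.vstack(rows)

# Example: with b_n = 4 on a 29 x 29 Y(.)-region there are N_I = 26 * 26 = 676 blocks.
# Ybar = block_means(Y, b=4)
```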
We assess the plausibility of a value θ ∈ Θ using the profile EL function given by

Ln(θ) = NI^{NI} · sup{ Π_{i∈In} pi : pi ≥ 0, Σ_{i∈In} pi = 1, Σ_{i∈In} pi Ȳi = Γ(θ) },  (6)

where NI = |In| denotes the number of blocks. A multinomial likelihood is created from probabilities assigned to each block mean Ȳi, under an “expectation-Γ(θ)” linear constraint due to (4), and the largest possible product of these probabilities determines the EL function Ln(θ) for θ ∈ Θ. Note that the maximal value of Ln(θ) is 1, which occurs if each pi = 1/NI, and we define Ln(θ) = −∞ if the set in (6) is empty. If Γ(θ) is interior to the convex hull of {Ȳi : i ∈ In}, then the EL function for θ achieves a maximum at probabilities p_{θ,i} = NI^{−1}{1 + tθ′(Ȳi − Γ(θ))}^{−1}, i ∈ In, and (6) becomes

Ln(θ) = Π_{i∈In} {1 + tθ′(Ȳi − Γ(θ))}^{−1},  (7)

where tθ ∈ R^r satisfies Σ_{i∈In} p_{θ,i}(Ȳi − Γ(θ)) = 0_r. Owen (1990) and Qin and Lawless (1994) provide these and further computational details with EL methods. In particular, Owen (1990, Section 3) details the computation of the profile EL function for inference on the mean µ of independent data. It is important to note that the same maximization routine applies to (7) by substituting the block means Ȳi and Γ(θ) in place of individual independent observations and µ. That is, the spatial EL function requires only forming blocks and supplying these to a well-known EL function for independent data.

The maximizer of the EL function Ln(θ) is a maximum empirical likelihood estimator (MELE) of θ, denoted by θ̂n. Next, we consider large sample properties of this MELE as well as EL confidence regions formulated with the log-EL function given by

ℓn(θ) = −2 bn^{−d} log Ln(θ),  θ ∈ Θ.  (8)

The factor bn^{−d} is a required adjustment due to overlapping EL blocks and represents the spatial analog of an EL block adjustment for time series (Kitamura, 1997).

4 Distributional results for the EL method

For describing the distributional properties of the EL method, we require some assumptions on the spatial process and the variogram model, referred to as Assumptions A1 to A6. We defer technical details on these assumptions to Section 8.1 (on-line Appendix). Briefly, Assumptions A1-A3 describe spatial mixing and moment conditions so that the spatial EL method will be valid for a broad class of weakly dependent spatial processes. Assumptions A4-A5 are smoothness and identifiability conditions on the variogram model which may be checked in practice. All of these assumptions are similar to those of Lee and Lahiri (2002), so that the SLS and EL methods are valid under the same general conditions. Assumption A6 entails the EL block scaling (5).
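Before turning to the asymptotic results, the computation behind (6)-(8) can be summarized. For a fixed θ, the multiplier tθ in (7) solves the same dual problem as Owen’s EL for a mean, with the block means Ȳi in place of independent observations; the following minimal sketch (our own, with hypothetical helper names, not the authors’ code) finds tθ by damped Newton steps on the concave dual and returns log Ln(θ) and ℓn(θ). When Γ(θ) is not interior to the convex hull of the block means, no interior solution exists and the sketch falls back to −∞, matching the convention stated after (6).

```python
import numpy as np

def el_log_lik(Ybar, gamma_theta, iters=100, tol=1e-8):
    """log L_n(theta) in (7): -sum_i log(1 + t'(Ybar_i - Gamma(theta))), where t
    maximizes the concave dual G(t) = sum_i log(1 + t'(Ybar_i - Gamma(theta)))."""
    d = Ybar - gamma_theta                    # rows: Ybar_i - Gamma(theta)
    t = np.zeros(d.shape[1])
    for _ in range(iters):
        w = 1.0 + d @ t                       # must stay strictly positive
        grad = (d / w[:, None]).sum(axis=0)   # gradient of G at t
        if np.linalg.norm(grad) < tol:
            return -np.sum(np.log(w))         # converged: log L_n(theta)
        hess = -(d / w[:, None]).T @ (d / w[:, None])  # Hessian of G (neg. definite)
        step = np.linalg.solve(hess, -grad)   # Newton direction for maximizing G
        s = 1.0                               # backtrack to keep all 1 + t'd_i > 0
        while np.any(1.0 + d @ (t + s * step) <= 1e-10):
            s *= 0.5
            if s < 1e-12:
                return -np.inf                # no interior solution
        t = t + s * step
    return -np.inf                            # treat non-convergence as infeasible

def log_el(Ybar, gamma_theta, b, dim=2):
    """ell_n(theta) in (8): -2 * b_n^{-d} * log L_n(theta)."""
    return -2.0 * b ** (-dim) * el_log_lik(Ybar, gamma_theta)
```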
4.1 Distribution of the maximum empirical likelihood estimator

The main result of this section establishes the existence, consistency, and asymptotic normality of the MELE of θ as the global maximizer of Ln(θ) over Θ. We note that the theorem does not require some of the stronger conditions often associated with global MELE results, such as compactness of the parameter space Θ or second-order derivatives of the estimating functions (cf. Qin and Lawless, 1994, for iid data; Kitamura, 1997, for time series). Let θ0 ∈ Θ ⊂ R^p denote the unique parameter value satisfying (4) and let Σ(θ0) denote the asymptotic covariance matrix of the sample variogram (1) over the lags {hi}_{i=1}^r, as described in Section 2.

Theorem 1 Suppose Assumptions A1-A6 and (4) hold. Then, as n → ∞,
(i) P(a global maximum θ̂n exists on Θ) → 1 and θ̂n →_p θ0.
(ii) n^{1/2}(θ̂n − θ0) →_d N(0_p, ∆(θ0)), with ∆(θ0) = {D(θ0)′ Σ(θ0)^{−1} D(θ0)}^{−1}.

Out of all LSEs based on the sample variogram, the GLS estimator has the smallest asymptotic variance because of its LSE-optimal limiting covariance matrix, given by ∆(θ0) (Lahiri et al., 2002). By Theorem 1, the MELE here has the same asymptotic efficiency as the best LSE in this class.

We may also draw some connections between the MELE and the SLS estimator of Lee and Lahiri (2002). Recall that the SLS estimator minimizes (2) with a matrix V(θ) = Σ̂^{−1} involving a subsampling estimator Σ̂ of Σ(θ0) at the true parameter value θ0. As the sample size n → ∞, we may expand the log-EL ratio (8) at “EL-plausible” values of θ (e.g., satisfying ℓn(θ) ≤ r for some r > 0) as

ℓn(θ) = n gn(θ)′ Σ̂_EL^{−1} gn(θ) {1 + o_p(1)},  Σ̂_EL ≡ (bn^d/NI) Σ_{i∈In} {Ȳi − Γ(θ0)}{Ȳi − Γ(θ0)}′,

where gn(θ) ∈ R^r has i-th component 2γ̂n(hi) − 2γ(hi; θ) based on the sample variogram (1) and Σ̂_EL denotes a subsampling estimator of the asymptotic covariance matrix Σ(θ0) of gn(θ) under θ0. Because θ̂n minimizes ℓn(θ), the MELE asymptotically minimizes a quadratic form (2) using a weight matrix V(θ) = Σ̂_EL^{−1}. The EL version Σ̂_EL resembles the subsampling estimator Σ̂ of Lee and Lahiri in structure but, unlike SLS, the EL method involves no direct variance estimation.

4.2 Variogram confidence regions

As suggested earlier, the EL method allows an assessment of the uncertainty in estimating θ through EL confidence regions based on the MELE θ̂n. Using the log-EL function (8), Theorem 2 concerns the log-EL ratio statistic

rn(θ) ≡ ℓn(θ) − ℓn(θ̂n) = −2 bn^{−d} log{Ln(θ)/Ln(θ̂n)},  θ ∈ Θ,

which is shown to have a chi-squared limit at θ = θ0 for calibrating confidence regions. Relatedly, the spatial EL approach can also accommodate confidence regions for parameter subsets after profiling out the nuisance parameters (cf. Qin and Lawless, 1994, for the iid data case). Suppose θ = (θ1′, θ2′)′, where θ1 represents a q × 1 parameter vector of interest and θ2 denotes a (p − q) × 1 nuisance vector. For fixed θ1, suppose that θ̂2^{θ1} maximizes the EL function Ln(θ1, θ2) with respect to θ2 and define ℓn(θ1) ≡ −2 bn^{−d} log Ln(θ1, θ̂2^{θ1}). In the following, let χ²_ν denote a chi-squared random variable with ν degrees of freedom and lower-α quantile denoted by χ²_{ν;α}.

Theorem 2 Under the assumptions of Theorem 1, as n → ∞,
(i) rn(θ0) = ℓn(θ0) − ℓn(θ̂n) →_d χ²_p if H0 : θ = θ0 holds.
(ii) rn(θ10) = ℓn(θ10) − ℓn(θ̂1n) →_d χ²_q if H0 : θ1 = θ10 ∈ R^q holds, where θ̂n = (θ̂1n′, θ̂2n′)′.
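In practice, the MELE in Theorem 1 and the ratio rn(θ) in Theorem 2 are obtained by numerically minimizing ℓn(θ) over Θ. A minimal sketch (assuming the log_el helper from the earlier sketch and a user-supplied model function Γ(θ); the names and the choice of optimizer are ours):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def mele(Ybar, gamma_model, theta_start, b, dim=2):
    """Minimize ell_n(theta); gamma_model(theta) returns the r-vector Gamma(theta)."""
    obj = lambda theta: log_el(Ybar, gamma_model(theta), b, dim)
    fit = minimize(obj, theta_start, method="Nelder-Mead")  # derivative-free search
    return fit.x, fit.fun                                   # theta_hat, ell_n(theta_hat)

# Log-EL ratio and chi-squared calibration (Theorem 2): a value theta0 lies in the
# approximate 90% EL confidence region when
#   log_el(Ybar, gamma_model(theta0), b) - ell_hat <= chi2.ppf(0.90, df=p).
```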
We may set an approximate 100(1 − α)% EL confidence region for θ as CR(1 − α) ≡ {θ ∈ Θ : rn(θ) ≤ χ²_{p;1−α}}; analogous regions apply to a parameter subset θ1 when profiling. Confidence regions for θ may also be turned into simultaneous confidence bands {γ(h; θ) : θ ∈ CR(1 − α)} for the entire variogram model, under the interpretation that the bands contain the unknown variogram γ(h; θ0) at a lag h ∈ R^d if and only if the confidence region for θ contains θ0.

A Bartlett correction for the spatial EL method is a potential tool to enhance the coverage accuracy of EL confidence regions. This correction is a property often associated with EL methods, which essentially involves a scalar adjustment to the log-EL ratio rn(θ0) to align its expected value E rn(θ0) with the mean of a chi-squared distribution and improve the chi-squared approximation. With independent data, a Bartlett correction was first established by DiCiccio et al. (1991) for mean parameters and has been extended by others to more EL scenarios; see Chen and Cui (2006a, 2006b), Kitamura (1997) and Monti (1997). A formal justification of a Bartlett correction in the spatial setting is difficult and requires machinery outside of the scope of this paper. At the same time, the effect of a correction may still be of interest and we propose here an algorithm for a practical Bartlett/mean correction factor by using a spatial block bootstrap.

Bartlett Factor Algorithm: Pick an integer M ≥ 1. For i = 1, . . . , M, independently generate a block bootstrap rendition, say Yn*i, of the original (vectorized) spatial data Yn = {Y(s) : s ∈ Rn,Y ∩ Z^d} and compute rn*i(θ̂n) = ℓn*i(θ̂n) − ℓn*i(θ̂n*i) (using θ̂n as a consistent estimate of θ0 ∈ R^p), where ℓn*i and θ̂n*i are the log-EL function and MELE analogs based on Yn*i. Calculate r̄n* = M^{−1} Σ_{i=1}^M rn*i(θ̂n) to estimate E rn(θ0) and set a Bartlett corrected confidence region as {θ : (p/r̄n*) rn(θ) ≤ χ²_{p;1−α}}. If θ = (θ1′, θ2′)′ with interest on θ1 ∈ R^q (treating θ2 ∈ R^{p−q} as a nuisance parameter), we use the profile version rn(θ1) = ℓn(θ1) − ℓn(θ̂1n) for a Bartlett corrected confidence region {θ1 : (q/r̄n*) rn(θ1) ≤ χ²_{q;1−α}} based on r̄n* = M^{−1} Σ_{i=1}^M rn*i(θ̂1n).

We examine the efficacy of the Bartlett correction through a numerical study in Section 5 to follow. The block bootstrap method for generating spatial data that we use is described in detail by Lahiri (2003a, Section 12.3.1), and requires an integer block length ζn as input.

5 Numerical study

This section examines the finite sample performance of our EL method for variogram inference through simulation. The behavior of EL estimators is influenced by a combination of factors including the sample size, the strength of spatial dependence and the choice of lags and block size. To examine these factors and the interactions among them, we consider an exponential variogram in R² parameterized as

2γ(h; θ) = 2[θ1 + s(1 − exp{−‖h‖/θ2})],  h ≠ 0 ∈ R²,  (9)

with nugget and range parameters θ = (θ1, θ2) ∈ Θ = (0, ∞)² and fixed sill equal to s = 1. That is, in studying the performance factors mentioned above and comparing results obtained through several estimation techniques, we focus estimation on the nugget and range parameters, as they most directly influence small- and large-scale spatial variation; in practice, the sill parameter would be estimated as well, and Section 5.4 provides some simulation evidence for that case.
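For reference, the exponential model (9) with fixed sill s = 1 is easy to code as the Γ(θ) ingredient used by the estimation routines sketched earlier (a minimal sketch; the function name is ours):

```python
import numpy as np

def exp_variogram(theta, lags, sill=1.0):
    """2 * gamma(h; theta) in (9); theta = (nugget theta1, range theta2)."""
    h_norm = np.array([np.hypot(h1, h2) for h1, h2 in lags])
    return 2.0 * (theta[0] + sill * (1.0 - np.exp(-h_norm / theta[1])))

print(exp_variogram((0.5, 4.0), [(1, 0), (0, 1), (1, 1)]))  # Gamma(theta) at three lags
```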
Using the two-parameter model, we generated real-valued, mean-zero (intrinsically) stationary Gaussian variables on an integer grid within sampling regions Rn = λn(−1/2, 1/2]² ⊂ R² of size λn × λn for λn = 10, 30, 50, through the circulant embedding method of Chan and Wood (1997). Because the dependence strength can greatly impact the performance of least squares methods, we present results for three range parameter values θ2 = 1, 4 or 8, selected to represent relatively weak, moderate and strong dependence, with the nugget value θ1 = 0.5. For increasing λn, we generated a total of 10000, 5000 or 3000 data sets, respectively; these simulation sizes were larger than those needed to produce Monte Carlo standard errors less than 1% of the actual parameter values.

In Section 5.1, we first examine EL confidence regions for quantifying the precision of point estimators by comparing EL coverage probabilities against those obtained through the SLS method. In Section 5.2, we then compare the mean squared errors of EL point estimators against those of other LSEs. The simulation results to follow are based on lag {hi}_{i=1}^r choices for variogram inference (which may differ by process dependence strength and sample size) using a rule of thumb described in Section 5.3. Section 5.4 considers EL estimation in a three-parameter variogram model.

5.1 Assessment of confidence regions

While Lee and Lahiri (2002) do not explicitly consider interval estimation, we may propose SLS confidence regions for variogram parameters based on the SLS quadratic form in (2). We define an approximate 100(1 − α)% SLS-based confidence region for θ = (θ1, . . . , θp) as

{θ ∈ Θ : n[Q_{n,Σ̂⁻¹}(θ) − Q_{n,Σ̂⁻¹}(θ̂n,SLS)] ≤ χ²_{p;1−α}},  (10)

where the chi-squared calibration follows naturally from the distributional properties of the SLS covariance and point estimators Σ̂ and θ̂n,SLS given by Lee and Lahiri (2002). A similar chi-squared approximation is generally not valid for other least squares approaches, such as OLS or WLS.

As both EL and SLS involve data blocks, Figure 2 displays EL and SLS coverage probabilities for 90% confidence regions for the model parameters in (9) on a sampling region with λn = 50 against various block sizes. The coverage probabilities for EL are seen to be generally closer to nominal than those of SLS, which tended to exhibit extreme under-coverage, especially under strong dependence. This behavior held often in our simulations over a wide range of block sizes. In Figure 2, the best block sizes for coverage accuracy increase with the dependence strength and, as expected, the coverage probabilities often deteriorate with an increase in the range.

Figure 2: Coverage probabilities for 90% confidence regions for (θ1, θ2) plotted against block sizes for a 50 × 50 sampling region, where θ1 = 0.5. Panels: (a) θ2 = 1; (b) θ2 = 4; (c) θ2 = 8. In each panel, the top curve is for EL and the lower (dashed) curve for SLS, based on 3000 simulations.

We repeated the previous analysis for smaller sampling regions λn = 30 and λn = 10. Coverage probabilities typically decreased for both EL and SLS on smaller regions, with EL continuing to have coverages closer to the nominal level; see Appendix Figure 1 (in the on-line Appendix) for the 30 × 30 region results.
For example, the coverage results for θ2 = 4 and λn = 30 appeared similar to those of θ2 = 8 and λn = 50. The most drastic reduction, to lattice size λn = 10, produced the lowest coverage probabilities over a larger variety of block sizes, especially for the larger range parameters θ2 = 4, 8, which are difficult to capture on a small data set. In Figure 3, we show the coverage results for the weak dependence case θ2 = 1 on a 10 × 10 lattice. Although both methods have substantially poorer coverage than on the larger sampling region, SLS exhibits under-coverage to a greater degree than EL.

On smaller sampling regions in particular, it is possible to enhance the coverage accuracy of EL confidence regions through the Bartlett correction described in Section 4.2. We applied the Bartlett correction algorithm of Section 4.2 using M = 200 block bootstrap renditions with block sizes ζn = bn + 1 in the bootstrap procedure of Lahiri (2003a, Section 12.1.3).

Figure 3: Coverage probabilities for 90% confidence regions for (θ1, θ2) plotted against block sizes for a 10 × 10 sampling region, where θ1 = 0.5, θ2 = 1. Methods include SLS, (uncorrected) EL, Bartlett corrected EL and bootstrap quantile calibrated EL.

Figure 3 illustrates that this correction dramatically improves EL coverage probabilities over a range of block sizes on the 10 × 10 region. For comparison, we also include in Figure 3 the coverages for EL regions based on a bootstrap quantile calibration of the log-EL ratio rn(θ0) from Theorem 2. While the Bartlett procedure uses bootstrap replicates for a mean correction, the bootstrap quantile procedure uses sample quantiles from these bootstrap replicates to calibrate this log-EL ratio (rather than a chi-squared calibration). In Figure 3, the Bartlett correction appears to produce better coverage probabilities than the bootstrap quantile approximation. Intuitively, mean estimation in the Bartlett procedure may be an easier task with a smaller number of bootstrap renditions (e.g., M = 200) than approximating extreme quantiles in the log-EL ratio’s distribution; Chen and Cui (2006b) observed similar results with Bartlett corrected EL intervals for independent data.

Table 1 displays coverage probabilities for some other smaller-sized regions with block sizes chosen to have slightly smaller order than n^{1/4} from (5). The Bartlett corrected EL regions seem to exhibit good coverage accuracy. In Table 1, the pattern again emerges that the uncorrected EL regions have better coverages than SLS, but coverage probabilities typically decrease with an increase in dependence (range).

Table 1: Coverage probabilities for 90% confidence regions for (θ1 = 0.5, θ2) using SLS, EL, Bartlett corrected EL (ELBc) and bootstrap quantile calibrated EL (ELboot) with several sampling regions and ranges θ2 and a block size bn = n^{1/5} (based on 1000 simulations).

          Rn = 10 × 10                  Rn = 20 × 20                  Rn = 30 × 30
θ2     SLS    EL  ELBc ELboot       SLS    EL  ELBc ELboot       SLS    EL  ELBc ELboot
 1    67.4  77.6  88.6   80.8      85.4  93.4  96.0   86.2      83.5  92.1  96.4   87.1
 4    59.4  74.8  84.8   78.6      58.4  76.8  91.8   82.4      63.0  78.8  92.4   83.6
 8    50.4  70.6  84.6   75.6      53.2  75.8  93.0   84.0      52.2  70.2  93.4   81.0

5.2 Assessment of point estimation

Using the same simulation design as in Section 5.1, we now compare EL and SLS along with the WLS and OLS estimators (from Section 2) in terms of Relative Root Mean Squared Error (RRMSE) in point estimation.
RRMSE is defined here as the root mean squared error of an estimator expressed as a percentage of the true parameter value, which allows meaningful comparisons across different sampling conditions. Figure 4 presents the RRMSE results for all four estimators of the range parameter θ2 when λn = 50 as a function of block size; note that the WLS and OLS estimators do not involve data blocks, so their RRMSEs are horizontal lines. The four methods perform very similarly under weak dependence (θ2 = 1), with RRMSEs just below 30%. As the dependence/range increases, differences among the four methods emerge, and EL and SLS appear to have better performance than OLS and WLS (which are not asymptotically efficient least squares methods).

Figure 4: Relative Root Mean Squared Error (%) for the estimate of the range on a 50 × 50 sampling region against block size (based on 3000 simulations). Panels: (a) θ2 = 1; (b) θ2 = 4; (c) θ2 = 8; methods EL, SLS, WLS and OLS.

The best block sizes for EL point estimation appear to be slightly smaller than those for optimal coverage accuracy in Figure 2. Again, small blocks seem adequate for the weakest dependence (smallest range); for medium and large ranges, EL point estimation worsens when blocks become too large (i.e., more than n^{1/4} ≈ 7). The corresponding RRMSE results for estimation of the nugget θ1 on the 50 × 50 region appear in Appendix Figure 2 and are typically smaller than for range estimation. A closer analysis of the bias/spread decomposition of the RRMSEs (results not shown) indicates that the variability of all estimators is much larger than their bias. While the EL estimator often has somewhat less bias, the SLS estimator generally exhibits smaller variance, to the extent that the SLS method is often slightly better in terms of RRMSE. However, the EL point estimator seems RRMSE-comparable to the SLS estimator and, as Figures 2 and 3 suggest, the SLS point estimator often lies far enough from the true variogram parameters that the SLS confidence region (10) exhibits extreme under-coverage.

As the size of the sampling region decreases, the RRMSE performance of all methods deteriorates. For a 30 × 30 sampling region, the RRMSEs for the range varied from just below 40% for weak dependence to above 50% for the strong dependence case. For estimation of larger range values, SLS and EL appear to be slightly better than WLS and OLS at smaller block sizes (see Appendix Figure 3); all methods appear comparable for nugget estimation in this case (Appendix Figure 4). Appendix Figure 5 illustrates the RRMSE values on a 10 × 10 region with θ2 = 1, which are around 60% for both nugget and range estimators with all methods; the EL and SLS point estimators particularly worsen when block sizes become large. Because fewer blocks are available, this behavior appears more accentuated on a 10 × 10 region than for larger sampling regions. Additionally, a chance exists for nugget estimates to be zero (i.e., driven to the boundary) on small sampling regions, and all LSE methods produced such estimates on the same 7% of the 10000 simulations for the 10 × 10 region with range θ2 = 1 (zero nugget estimates did not appear on the larger regions in the simulations).
The SLS and EL methods produced more zero nugget estimates for larger blocks (as high as 12% at the maximal possible block size of bn = 7), which also supports their deterioration in RRMSE as blocks increase on the small region.

5.3 A discussion of the choice of lags

The choice of lags is important for least squares inference, and we propose a strategy for lag selection based on reviewing a large number of situations. Throughout the preceding two sections, we used the same collection of lags {hi}_{i=1}^r ⊂ Z² for all estimation methods, but we allowed these lags to vary by the model parameters in (9) (e.g., dependence strength) and the sample size, as described in the following.

Asymptotically, the variance of the EL estimator in Theorem 1 will not increase (and may in fact decrease) by adding more lags to set the moment condition (4); see Qin and Lawless (1994, Corollary 1) for a similar result on adding EL estimating functions with independent data. In this sense, it is then desirable to have a reasonable number of lags. However, in practical settings, the data may not support a large lag number and the addition of too many lags can force the EL moment condition (4) to become too restrictive for inference, as well as hinder estimation for other least squares methods. Additionally, longer lags are helpful for capturing large-scale spatial dependence (e.g., range), but overly long lags can also limit the number of data blocks that are available to both EL and SLS methods. Thus, a balance in the number and length of lags must be sought, and we suggest a rough rule of thumb for accomplishing this balance.

To capture small-scale variogram behavior (e.g., the nugget parameter), the lags should contain enough local information, so we begin with a few very short lags in both the horizontal and vertical directions of the lattice, such as h1 = (1, 0)′, h2 = (0, 1)′ or additionally h3 = (2, 0)′, h4 = (0, 2)′. However, for large values of the range parameter, the use of only short lags produced poor point estimates for all four methods from Section 5.2, as expected. We found that the most efficient way of augmenting the lags (in both number and length) to increase performance, while also retaining as many data blocks as possible, was to include “diagonal” terms (such as h5 = (1, 1)′, h6 = (2, 2)′ and so on) until the longest lag reached roughly 80-100% of the actual value of the range. For example, when θ2 = 1 the last “diagonal” term used was (1, 1)′, for θ2 = 4 it was (4, 4)′, while for θ2 = 8 it was (6, 6)′. This approach allows incorporation of larger distances with fewer lags.

Following this empirical rule in Sections 5.1 and 5.2, we used lags h1 = (1, 0)′, h2 = (0, 1)′ and h5 = (1, 1)′ for λn = 10, 30, 50 with range θ2 = 1. For θ2 = 4, we added h3 = (0, 2)′, h4 = (2, 0)′, h6 = (2, 2)′, h7 = (3, 3)′ and h8 = (4, 4)′ for the cases of λn = 30 and 50. It proved difficult to select lags for satisfactory point estimators with any method on a small region λn = 10. For the largest range θ2 = 8, we used h1 through h8 on the 30 × 30 region and added two more diagonal terms, h9 = (5, 5)′ and h10 = (6, 6)′, on the 50 × 50 region. Not including h9 and h10 on the 30 × 30 region with θ2 = 8 allowed for more blocks and appeared to produce slightly better point estimators for the SLS and EL methods, which illustrates the complex interaction between the choice of lags, dependence strength and the sample size.
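One rough way to encode this rule of thumb is sketched below (our own approximate encoding, not the authors’ procedure; it takes a pilot range estimate and does not account for the sampling region size, which the discussion above shows also matters):

```python
def rule_of_thumb_lags(pilot_range):
    """Short horizontal/vertical lags plus diagonal lags reaching roughly
    80-100% of a pilot estimate of the range (approximate encoding of Section 5.3)."""
    lags = [(1, 0), (0, 1)]                    # short lags for small-scale behavior
    if pilot_range >= 4:
        lags += [(0, 2), (2, 0)]               # a few longer axial lags
    max_diag = max(1, int(round(0.9 * pilot_range)))
    lags += [(k, k) for k in range(1, max_diag + 1)]  # diagonals up to ~80-100% of range
    return lags

print(rule_of_thumb_lags(1))   # [(1, 0), (0, 1), (1, 1)], as used for theta2 = 1
print(rule_of_thumb_lags(4))   # adds (0, 2), (2, 0) and diagonals up to (4, 4)
```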
In general, for larger sampling regions, increasing the lags according to the empirical rule led to reasonable point estimators of the range with both EL and SLS methods and did not create any substantial losses in performance for nugget estimation (an easier task compared to range estimation). In practice, one can obtain a pilot estimate of the magnitude of the range using an empirical estimator of the variogram, such as (1), for guidance in selecting the diagonal elements of h.

5.4 Additional practical and computational issues

In this section, we briefly consider two previously unaddressed issues through simulation: estimation of a three-parameter variogram model including a sill parameter, and the separate issue of least squares estimates falling on parameter space boundaries.

So far in Section 5, we have considered a simpler model in (9) than would be needed in application, by fixing the sill rather than estimating it. This approach was motivated by our goals to understand the rather large number of factors that may impact estimation procedures and to compare EL with three other estimation methods. In practice, however, the sill parameter would be estimated and we address this issue here. Results from a small simulation study are presented in Figures 5 and 6 and Appendix Figure 6. We generated 2000 data sets from a Gaussian process on a 50 × 50 lattice with a nugget of 0.5, a sill of 1 and ranges of 1, 4 or 8. We used only the EL technique to estimate the three parameters and report the 10th, 50th and 90th percentiles of (scaled) parameter estimates in Figure 5 (using the same rule of thumb from Section 5.3 to choose the lag matrix). We scaled the parameter estimates by dividing them by the true parameter values; thus the size of these percentiles indicates the multiplicative difference between the estimates and the true parameters.

Figure 5 indicates that the EL method performs favorably: the median estimates for all parameters are around 1 (indicated in the figure) and stable as a function of the block size. However, performance deteriorates as the sample size is decreased, and Figure 6 provides the same percentiles for the scaled EL estimates on a smaller 20 × 20 sampling region (based on 5000 simulations from a similar Gaussian process). From Figure 6 we see that, for a moderate range value (θ2 = 4), the median estimates perform well and are stable over block sizes. However, compared to the 50 × 50 region, the 90th percentiles for range and sill estimates now assume greater magnitudes, while the 10th percentiles for nugget estimates appear less stable as a function of block size (Appendix Figure 6). On the other hand, when the range parameter is set to 10, the median range estimates are no longer stable as a function of block size and, with this strengthening of spatial dependence, the lower percentiles for the nugget estimates fall further to zero (Appendix Figure 6) while the upper percentiles for range estimates increase greatly in magnitude (Figure 6). Sill estimation remains stable in the sense that medians are on target, though the upper percentiles do increase as well with the greater range parameter. On the 20 × 20 region, setting the range to 10 reaches a point where it becomes difficult to balance estimation needs in terms of adequate block sizes (number of blocks) and the rule-of-thumb choice of the lag matrix.

With these simulations, we mention some numerical issues related to LSEs falling on the boundary of the parameter space.
This situation occurs more often with small sample sizes, and the chance of it vanishes as the sample size increases. In particular, nugget estimates can become zero and range estimates may diverge to 0 or +∞; the latter case can happen especially for large ranges on small regions.

Figure 5: Percentiles (10th, 50th and 90th) for (scaled) EL estimates of the range and sill on a 50 × 50 region against block size for three cases: θ2 = 1, 4 or 8 (based on 3000 simulations); a horizontal line running fully between graph margins indicates 1. Panels: (a) Range; (b) Sill.

Figure 6: Percentiles (10th, 50th and 90th) for (scaled) EL estimates of the range and sill on a 20 × 20 region against block size for two cases: θ2 = 4 or 10 (based on 5000 simulations); a horizontal line running fully between graph margins indicates 1. Panels: (a) Range; (b) Sill.

Section 5.2 described the occurrence of zero-valued nugget estimates on a 10 × 10 region for all LSE methods involved in the two-parameter variogram model (with range parameter 1). In the three-parameter simulations here, no zero-valued nugget estimates occurred on the 50 × 50 region, but 1-5% of EL nugget estimates were zero on the 20 × 20 region with range 4 (2-6% for range 8); this varying percentage is due to block differences and typically increased with block size. In the reported two-parameter simulations, as well as in the three-parameter study here on a 50 × 50 region, the maximal range estimate for any LSE typically was between 10-300 times larger than the size of the true range, with larger magnitudes associated with estimating large ranges on small regions. But with the ranges chosen for the 20 × 20 region here, a fraction of EL range estimates did explode in size, being at least 1000 times larger than the true ranges (0-1% with range 4 or 2-5% with range 10, varying with block size); some of these corresponded to range estimates of +∞. Sill estimation can also become more unstable as ranges increase on small regions, but to a lesser extent, and sill estimates were never unbounded in our simulations. These observations support our belief that the decision to use EL for estimating spatial parameters should start with a careful analysis of the empirical variogram to understand the strength of spatial dependence, in conjunction with an assessment of the size (and extent) of the available data and the choice of a lag matrix.

6 Spatial regression model

In this section, we consider an extension of the EL method to a spatial regression model

Z(s) = X(s)′β + ε(s),  s ∈ R^d,

where X(s) is a q × 1 vector of non-random regressors, β is a vector of regression parameters and ε(s) is a strictly stationary random process with variogram 2γε(·; θ), θ ∈ Θ ⊂ R^p. In this framework, a common approach to variogram fitting involves separately estimating the trend parameter β with some β̂n (e.g., using OLS regression), followed by a step of variogram estimation based on the available residuals ε̂(s) = Z(s) − X(s)′β̂n; this approach is similar to that adopted in the data analysis of Lee and Lahiri (2002). In contrast, the EL method extends to inference on both trend β and variogram θ parameters simultaneously through a joint EL function.
Simultaneous estimation of the regression parameters and the variogram is attractive for reducing bias in variogram estimation, which can result from basing such estimation on residuals (Cressie, 1993). In addition, uncertainty in the estimation of both regression and variogram parameters should be more correctly quantified through simultaneous estimation than through two-step procedures, even if these are iteratively applied.

For a blockwise EL function for (β, θ), we use the same block collection {B_{bn}(i) : i ∈ In} developed in Section 3.3 but with different estimating functions to jointly treat both trend and variogram parameters. For β ∈ R^q and s ∈ R^d, let Zβ(s) = Z(s) − X(s)′β and define Yβ(s), s ∈ R^d, as in (3) replacing Z(·) with Zβ(·). Then, the process Wβ(s) = {X(s)Zβ(s), Yβ(s)}′ satisfies

E Wβ(s) = {0_q, Γε(θ0)}′ ∈ R^q × R^r,  s ∈ R^d,  (11)

at the true parameters (β0, θ0), where Γε(θ) is an r × 1 vector with i-th component 2γε(hi; θ). For each i ∈ In, we let W̄β,i denote the sample mean of the Wβ(·) observations in block B_{bn}(i). The joint EL function for (β, θ) is

Ln(β, θ) = NI^{NI} · sup{ Π_{i∈In} pi : pi ≥ 0, Σ_{i∈In} pi = 1, Σ_{i∈In} pi W̄β,i = {0_q, Γε(θ)}′ },

with log-EL function ℓn(β, θ) = −2 bn^{−d} log Ln(β, θ).

The main result here gives the distribution of the joint MELEs β̂n, θ̂n for β, θ, as maximizers of Ln(β, θ), under the spatial regression model. For weakly dependent time series, Bravo (2005) studied a blockwise EL for regression with random regressors X(·). In contrast, our initial result involves non-random regressors, which introduces non-stationary forms of dependence through the EL block means. However, the spatial EL method is shown to remain valid for the spatial regression model and again requires no variance estimation. In the following, let An = Σ_{i=1}^n X(si)X(si)′ and define the sums SX, SY of X(s)Zβ0(s), Yβ0(s) over the available sites s ∈ {s1, . . . , sn}; define a parameter set Un = {(β, θ) : ‖An^{1/2}(β − β0)‖² + ‖n^{1/2}(θ − θ0)‖² ≤ n^{2κ}} for some 0 < κ ≤ 1/12, and formulate the following:

Assumption S: In addition to A3 through A6, there exists δ > 0 such that τ1 > 5d(6 + δ)/δ, 0 < τ2 ≤ (τ1 − d)/d and E{Z(0) − Z(hi)}^{12+2δ} < ∞ for i = 1, . . . , r.

Theorem 3 Suppose that Assumption S holds for ε(·) ≡ Z(·) and 2γ(·; ·) ≡ 2γε(·; ·) with E|ε(0)|^{6+δ} < ∞; max{‖An^{−1/2}X(si)‖ : 1 ≤ i ≤ n} = O(n^{−1/2}); the limit in (12) below exists and is positive definite; and P{Ln(β, θ) > 0 for (β, θ) ∈ Un} → 1. Then, as n → ∞,
(i) P(global maximizers β̂n, θ̂n exist) → 1 and (β̂n, θ̂n) →_p (β0, θ0).
(ii) ℓn(β0, θ0) − ℓn(β̂n, θ̂n) →_d χ²_{q+p} and

{n^{1/2}(θ̂n − θ0)′, An^{1/2}(β̂n − β0)′}′ →_d N( {0_p′, 0_q′}′, [∆ε, ∆ε Dε′ Σε^{−1} Bε; Bε′ Σε^{−1} Dε ∆ε, Qε] )

(written in 2 × 2 block form), with Dε ≡ ∂Γε(θ0)/∂θ, ∆ε ≡ {Dε′ Σε^{−1} Dε}^{−1}, Qε ≡ Aε − Bε′(Σε^{−1} − Σε^{−1} Dε ∆ε Dε′ Σε^{−1})Bε, and

lim_{n→∞} Var{(An^{−1/2} SX)′, (n^{−1/2} SY)′}′ ≡ [Aε, Bε′; Bε, Σε]  (12)

in the same block form. Under the conditions of Theorem 3, the limiting covariance matrix of θ̂n matches that of the optimal LSE based on the sample variogram of the underlying process ε(·) (Lahiri et al., 2002), making this MELE of the variogram parameter θ asymptotically as efficient under spatial regression. For the regression parameter, the MELE β̂n can be asymptotically more efficient than the standard OLS estimator β̂n,OLS of β. To see this, note that Aε represents the limiting variance of An^{1/2}(β̂n,OLS − β0) for the OLS estimator and Aε − Qε is nonnegative definite from Theorem 3. Hence, the simultaneous EL approach can improve upon OLS inference for β.
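As in the variogram-only case, the joint EL reuses the blocking machinery; only the estimating functions change. Below is a minimal sketch of forming the Wβ(s) of (11) and their block means (hypothetical helper names, reusing the build_Y and block_means sketches from earlier, with the non-random regressors stored as an array):

```python
import numpy as np

def build_W(z, x, beta, lags):
    """z: (n1, n2) responses; x: (n1, n2, q) regressors X(s); beta: (q,) trend.
    Returns an (m1, m2, q + r) array stacking X(s) * Z_beta(s) on top of Y_beta(s),
    restricted to the Y(.)-region as in Section 3.3."""
    z_beta = z - x @ beta                            # residual process Z_beta(s)
    Y_beta = build_Y(z_beta, lags)                   # (m1, m2, r) squared differences
    m1, m2, _ = Y_beta.shape
    XZ = x[:m1, :m2, :] * z_beta[:m1, :m2, None]     # (m1, m2, q): X(s) * Z_beta(s)
    return np.concatenate([XZ, Y_beta], axis=2)

# Block means W_bar_{beta,i} = block_means(build_W(z, x, beta, lags), b) then play
# the role of Ybar_i, with {0_q, Gamma_eps(theta)}' replacing Gamma(theta) in the EL.
```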
In addition, the chi-squared distribution of the log-EL ratio for (β, θ) can again be used for confidence region estimation, and profile versions for β or θ alone are also possible as in Theorem 2.

Remark 1: Regarding the assumptions of Theorem 3, a growth condition on max ‖An^{−1/2}X(si)‖ is required for a central limit theorem with weighted sums like SX (cf. Lahiri, 2003b), and the moment/mixing conditions match those in Lee and Lahiri (2002). The probability assumption that Ln(β, θ) can be positively computed in a neighborhood Un of (β0, θ0) is similar to conditions in other EL regression contexts (cf. Owen, 1991; Bravo, 2005).

Remark 2: The same EL construction is also valid when the regressors are stochastic. Namely, if {X(s), ε(s)} is strictly stationary and (11) holds, the conclusions of Theorem 3 remain valid under appropriate mixing/moment conditions on {X(s), ε(s)}.

Remark 3: When the random process Z(·) is strictly stationary, Theorem 1 and Theorem 2 remain valid under the conditions of Assumption S.

6.1 An example

In this section we briefly illustrate the EL spatial regression method with a simulated data set on a small 15 × 15 sampling region Rn. Real-valued data were simulated as

Z(s) = β0 + β1 s1 + β2 s2 + ε(s),  s = (s1, s2) ∈ Z²,

using a simple linear trend in the coordinates at each location s ∈ Rn ∩ Z² with (β0, β1, β2) = (1, 0.5, 0.75) and continuous errors ε(s) ≡ ε1(s) + ε2(s) consisting of a mean-zero stationary Gaussian process ε1(·) perturbed by a collection ε2(·) of independently distributed chi-squared variables √0.3 (χ²_1 − 1). The resulting error process has an isotropic Gaussian variogram

2γε(h; θ) = 2θ0 + 2θ1[1 − exp{−3(‖h‖/θ2)²}],  h ≠ 0 ∈ R²,

involving nugget, sill, and range parameters θ = (θ0, θ1, θ2) = (0.6, 1, 4), respectively.

For comparison, we describe two analysis approaches. One common practice in geostatistical applications is to separately estimate the large-scale structure (i.e., β0, β1, β2), by OLS regression for example, followed by estimation of the variogram (i.e., 2γε(·; θ)) based on the resulting residuals. This is simply a two-step procedure. Based on the OLS regression residuals, we applied the EL method described in Section 4.2 to obtain 90% confidence bands for the variogram model, presented in Figure 7. This two-stage detrending analysis has the disadvantage of not accurately reflecting the uncertainty in variogram fitting due to regression estimation, possibly losing some regression precision and introducing bias into variogram estimation (see Cressie, 1993). We next applied the EL procedure from Section 6 for simultaneous analysis of the regression and variogram structures, and Figure 7 also presents the confidence bands resulting from this approach. The lags for EL were chosen by the empirical rule described in Section 5.3. For this example, the range is θ2 = 4 and we used the lag matrix h given by h1 = (1, 0)′, h2 = (0, 1)′, h3 = (0, 2)′, h4 = (2, 0)′, h5 = (1, 1)′, h6 = (2, 2)′, h7 = (3, 3)′ and h8 = (4, 4)′ with a block size bn = 3. Both estimation approaches produce comparable variogram point estimates, but the confidence bands are wider when accounting for uncertainty in the regression parameters. The simultaneous analysis also provides confidence intervals for the large-scale (regression) parameters and, in this example, the 90% EL confidence intervals for (β0, β1, β2) are (1.61, 4.37), (0.15, 0.55) and (0.50, 0.81), respectively.
The width of the EL confidence bands increases with distance, which is not surprising, but does indicate that variogram values at shorter distances are more precisely estimated than values at longer distances. This is pleasing from the viewpoint of spatial prediction, since typical kriging predictions depend primarily on variogram values at shorter distances. But also note that the width of the confidence band does increase substantially before the sill is reached; the width of the confidence bands in Figure 7 at a distance of 4 units is about half again greater than at a distance of 2 units. Such information could prove useful in applications for which a kriging neighborhood is selected (e.g., Cressie, 1993, pp. 134, 158), although a full discussion is beyond the scope of this article. A closer analysis of this figure reveals that the upper EL confidence band limits are more extreme than the lower limits, which agrees with the percentile behavior in Figures 5 and 6. In this example, the width of the confidence bands reflects the sample size. For larger samples, the bands become narrower and more informative over all distances, and the simultaneous and detrended bands become closer.

Figure 7: Two-stage and simultaneous analysis EL confidence bands, represented by dashed lines. The true variogram used to generate these data is represented by the solid line.

7 Summary and concluding remarks

This article presents a new method for estimation of variogram model parameters for intrinsically stationary processes using a blockwise spatial empirical likelihood (EL) approach. The proposed method has the advantage that it does not require knowledge about the full joint distribution of the spatial data and involves no covariance matrix estimation or inversion. This makes the EL method computationally more attractive than generalized least squares (GLS) or parametric approaches.

The internal studentization feature of the spatial EL method can be convenient in other ways. When the variogram inference problem changes (e.g., inference on both spatial trend and variogram parameters), the same essential construction of an EL function applies, and any new needs in spatial studentization are handled automatically within the mechanics of the EL function, which can be calibrated for confidence regions. Hall and La Scala (1990, p. 110) have noted similar EL properties with independent data in comparing EL to the bootstrap.

Under mild conditions, EL variogram estimators are asymptotically normal and as efficient as the LSE-optimal GLS or subsampling least squares (SLS) estimators. Numerical studies suggest that EL and SLS point estimators have comparable mean squared errors in large samples, but coverage probabilities for EL confidence regions are typically closer to the nominal levels than for SLS regions. In terms of computational speed, both SLS and the spatial EL were similar in our simulation studies and were slightly more demanding than weighted or ordinary least squares. The spatial EL method also extends to spatial regression problems, allowing simultaneous inference on both regression and variogram parameters. Numerical studies also indicate that a Bartlett correction may improve the coverage probabilities of spatial EL confidence regions.
The mechanics to rigorously establish the Bartlett correction for the spatial setting are not yet fully developed and should be addressed in future work, along with data-driven methods for spatial block selection. Additionally, improvements in the performance of the EL method should be possible through data tapering. For time series data, Paparoditis and Politis (2001) have shown that tapered data blocks produce better block bootstrap variance estimators. This notion can be applied to build a spatial EL function that replaces spatial block sample averages with tapered spatial block averages. As for the block bootstrap, improving the variance estimation mechanism internal to the spatial EL in this way should enhance the performance of the method in general.

Other possibilities for future research with the spatial EL method include model testing. While we focused on estimation in this manuscript, the statistic ℓn(θ̂n) based on our EL estimator θ̂n ∈ R^p can also be used to test whether the variogram moment conditions (4) hold, as a means of variogram model checking. If r > p lags are used, then ℓn(θ̂n) will have an asymptotic chi-squared distribution with r − p degrees of freedom under Theorem 1 for testing whether (4) holds for some parameter θ0.

Finally, we comment on potential uses of EL confidence regions and bands to assess the uncertainty associated with variogram estimation. This feature may be useful in practice when one needs to detect changes in spatial structure between different regions or different time intervals, since variogram confidence bands have the potential to facilitate meaningful comparisons. Connections exist between parameter estimation of variograms, large-scale regression models, and spatial prediction (Cressie, 1993). Investigation of these connections, particularly in actual applications, has been hampered by the inability to easily quantify uncertainty in variogram estimation. For example, it is well accepted that the typical estimates of prediction error in applications of kriging are underestimates of the true prediction error (e.g., Cressie, 1993, pp. 111, 127) because uncertainty in variogram estimation is not taken into account. Although EL estimation of uncertainty does not lend itself easily to theoretical analysis, uncertainty in variogram estimation is easily computed with the spatial EL approach for specific applications. This holds promise for obtaining practical expressions for the total uncertainty in kriging predictions for specific applied problems.

References

Bravo, F. (2002). Testing linear restrictions in linear models with empirical likelihood. The Econometrics Journal Online, 5, 104-130.

Bravo, F. (2005). Blockwise empirical entropy tests for time series regressions. J. Time Ser. Anal., 26, 185-210.

Chan, G. and Wood, A. T. A. (1997). An algorithm for simulating stationary Gaussian random fields. Applied Statistics, 46, 171-181.

Chen, S. X. (1993). On the coverage accuracy of empirical likelihood regions for linear regression models. Ann. Inst. Statist. Math., 45, 621-637.

Chen, S. X. and Cui, H.-J. (2006a). On Bartlett correction of empirical likelihood in the presence of nuisance parameters. Biometrika, 93, 215-220.

Chen, S. X. and Cui, H.-J. (2006b). On the second order properties of empirical likelihood with moment restrictions. Technical report, Dept. of Statistics, Iowa State University.

Cressie, N. (1985). Fitting variogram models by weighted least squares. J. Int. Ass. Math. Geol., 17, 693-702.

Cressie, N. (1993). Statistics for Spatial Data, 2nd Edition. John Wiley & Sons, New York.
References

Bravo, F. (2002) Testing linear restrictions in linear models with empirical likelihood. The Econometrics Journal Online, 5, 104-130.
Bravo, F. (2005) Blockwise empirical entropy tests for time series regressions. J. Time Ser. Anal., 26, 185-210.
Chan, G. and Wood, A. T. A. (1997) An algorithm for simulating stationary Gaussian random fields. Applied Statistics, 46, 171-181.
Chen, S. X. (1993) On the coverage accuracy of empirical likelihood regions for linear regression models. Ann. Inst. Statist. Math., 45, 621-637.
Chen, S. X. and Cui, H.-J. (2006a) On Bartlett correction of empirical likelihood in the presence of nuisance parameters. Biometrika, 93, 215-220.
Chen, S. X. and Cui, H.-J. (2006b) On the second order properties of empirical likelihood with moment restrictions. Technical report, Department of Statistics, Iowa State University.
Cressie, N. (1985) Fitting variogram models by weighted least squares. J. Int. Ass. Math. Geol., 17, 693-702.
Cressie, N. (1993) Statistics for Spatial Data, 2nd Edition. John Wiley & Sons, New York.
DiCiccio, T., Hall, P. and Romano, J. P. (1991) Empirical likelihood is Bartlett-correctable. Ann. Statist., 19, 1053-1061.
Doukhan, P. (1994) Mixing: Properties and Examples. Lecture Notes in Statistics 85. Springer-Verlag, New York.
Hall, P. and La Scala, B. (1990) Methodology and algorithms of empirical likelihood. Internat. Statist. Rev., 58, 109-127.
Kitamura, Y. (1997) Empirical likelihood methods with weakly dependent processes. Ann. Statist., 25, 2084-2102.
Kitamura, Y., Tripathi, G. and Ahn, H. (2004) Empirical likelihood-based inference in conditional moment restriction models. Econometrica, 72, 1667-1714.
Künsch, H. R. (1989) The jackknife and bootstrap for general stationary observations. Ann. Statist., 17, 1217-1261.
Lahiri, S. N. (2003a) Resampling Methods for Dependent Data. Springer, New York.
Lahiri, S. N. (2003b) Central limit theorems for weighted sums of a spatial process under a class of stochastic and fixed designs. Sankhyā, Series A, 65, 356-388.
Lahiri, S. N., Lee, Y. and Cressie, N. (2002) On asymptotic distribution and asymptotic efficiency of least squares estimators of spatial variogram parameters. J. Statist. Plann. Inference, 103, 65-85.
Lee, Y. D. and Lahiri, S. N. (2002) Least squares variogram fitting by spatial subsampling. J. R. Stat. Soc. Ser. B, 64, 837-854.
Matheron, G. (1962) Traité de géostatistique appliquée, Tome I. Mém. Bur. Rech. Géol. Min., 14.
Monti, A. C. (1997) Empirical likelihood confidence regions in time series models. Biometrika, 84, 395-405.
Newey, W. K. and Smith, R. J. (2004) Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica, 72, 219-255.
Nordman, D. J. and Lahiri, S. N. (2006) A frequency domain empirical likelihood for short- and long-range dependence. Ann. Statist., 34, 3019-3050.
Owen, A. B. (1988) Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237-249.
Owen, A. B. (1990) Empirical likelihood confidence regions. Ann. Statist., 18, 90-120.
Owen, A. B. (1991) Empirical likelihood for linear models. Ann. Statist., 19, 1725-1747.
Owen, A. B. (2001) Empirical Likelihood. Chapman & Hall, London.
Paparoditis, E. and Politis, D. N. (2001) Tapered block bootstrap. Biometrika, 88, 1105-1119.
Politis, D. N. and Romano, J. P. (1994) Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist., 22, 2031-2050.
Qin, J. and Lawless, J. (1994) Empirical likelihood and general estimating equations. Ann. Statist., 22, 300-325.
Sherman, M. (1996) Variance estimation for statistics computed from spatial lattice data. J. R. Stat. Soc. Ser. B, 58, 509-523.
Zhang, J. (2006) Empirical likelihood for NA series. Statist. Probab. Lett., 76, 153-160.
Zhang, X., van Eijkeren, J. and Heemink, A. (1995) On the weighted least-squares method for fitting a semivariogram model. Comput. Geosci., 21, 605-608.

Supplementary Material: Point and interval estimation of variogram models using spatial empirical likelihood

Daniel J. Nordman, Petruţa C. Caragea
Department of Statistics, Iowa State University, Ames, IA 50011
The supplementary material that follows consists of an Appendix providing proofs of the main results, as well as some additional figures (appearing on the last pages).

8 Appendix: Proofs of Main Results

Section 8.1 describes the assumptions used to establish the main distributional results on spatial EL variogram inference, and proofs are presented in Section 8.2. To simplify the presentation, a very general blockwise EL argument and result are formulated in Section 8.3, to which we refer repeatedly; Theorem 4 there allows the different EL scenarios to be treated in a unified manner.

8.1 Assumptions

Limits in order symbols are taken as n → ∞ and, for two positive sequences, we write sn ∼ tn if sn/tn → 1. For a vector x = (x1, ..., xd)′ ∈ Rd, let ‖x‖ and ‖x‖∞ = max_{1≤i≤d}|xi| denote the Euclidean and l∞ norms of x, respectively. Define the distance between two sets E1, E2 ⊂ Rd as dis(E1, E2) = inf{‖x − y‖∞ : x ∈ E1, y ∈ E2}. Recall that the process {Z(s) : s ∈ Rd} is assumed to be real-valued and intrinsically stationary. Let F(T) denote the σ-field generated by the random variables {Z(s) : s ∈ T}, T ⊂ Rd. For T1, T2 ⊂ Rd, write α̃(T1, T2) = sup{|P(A ∩ B) − P(A)P(B)| : A ∈ F(T1), B ∈ F(T2)} and define the strong mixing coefficient of the process Z(·) as

α(v, w) = sup{α̃(T1, T2) : Ti ⊂ Rd, |Ti| ≤ w, i = 1, 2; dis(T1, T2) ≥ v},   v, w > 0.  (13)

Spatial observations Z(·) are taken at sites on the integer lattice Zd within a spatial sampling region Rn = λnR0 ⊂ Rd. Recall that the EL method involves block sample means Ȳi = ∑_{s∈B_{b_n}(i)∩Z^d} Y(s)/b_n^d of the distances Y(·) ∈ Rr as in (3) (i.e., formed from the lags {h_i}, i = 1, ..., r), computed over a collection of blocks {B_{b_n}(i) : i ∈ In}; each block contains b_n^d observations of Y(·), and there are N_I = |In| blocks (a small computational sketch of these block means appears below, following the statement of the assumptions). In the following, let θ0 ∈ Θ ⊂ Rp denote the unique parameter value satisfying the moment condition (4), and define the normalized block means Ỹi = b_n^{d/2}(Ȳi − EȲi), i ∈ In. We make the following assumptions, which are similar to those used by Lee and Lahiri (2002).

A1. N_I^{−1/2} ∑_{i∈In}(Ȳi − EȲi) →d Z, for a normal vector Z ∼ N{0r, Σ(θ0)} with positive definite Σ(θ0).

A2. ‖N_I^{−1} ∑_{i∈In} EỸiỸi′ − Σ(θ0)‖ = o(1); max{E‖Ỹi‖^{4+δ0} : i ∈ In} = O(1) for some δ0 > 0; and |N_I^{−1} ∑_{i∈In} P(Ỹi ≤ y) − P(Z ≤ y)| = o(1) for each y ∈ Rr.

A3. There exist τ1, τ2 > 0 with τ1 ≥ dτ2 such that α(v, w) ≤ Cv^{−τ1}w^{τ2} for all v, w ≥ 1.

A4. For any ε > 0, there exists δε > 0 with inf{‖Γ(θ) − Γ(θ0)‖ : ‖θ − θ0‖ ≥ ε, θ ∈ Θ} > δε.

A5. In a neighborhood of θ0, Γ(θ) is continuously differentiable and D(θ0) ≡ ∂Γ(θ0)/∂θ has full column rank p.

A6. As n → ∞, b_n^{−1} + b_n^2/λn = o(1) and, for any positive real sequence an → 0, the number of cubes of anZd which intersect both the closure of R0 and the closure of Rd \ R0 is O(a_n^{−(d−1)}).

Assumption A1 implies that the unbiased block-based estimator ∑_{i∈In} Ȳi/N_I of the variogram 2γ(h) at the lags h ∈ {h1, ..., hr} is asymptotically normal. This blockwise estimator involves a weighted average of the vector Y(·) observations and so represents a variation on Matheron's (1962) method-of-moments estimator (1); hence, A1 essentially requires that the sample variogram have a normal limit. Assumptions A2 and A3 closely match the mixing/moment assumptions of Lee and Lahiri (2002). The probability condition in A2 is used only to ensure that the EL ratio (6) is positive at θ0. Assumptions A4 and A5 are smoothness and identifiability conditions on the variogram model and may be checked in practice.
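As a purely illustrative aid, the following minimal sketch shows one way the block sample means Ȳi could be formed on a complete planar grid (d = 2). It is not the authors' code: the function names, the grid shape, the use of all maximally overlapping complete blocks, and the particular form taken for Y(s) (squared increments {Z(s) − Z(s + h_j)}^2 at each lag, our reading of (3)) are assumptions made for the sketch.

```python
# Hypothetical sketch (not the authors' code) of block sample means Ybar_i on a
# complete n1 x n2 grid with d = 2.  Y(s) is taken to be the vector of squared
# increments {Z(s) - Z(s + h_j)}^2 over the chosen lags, and the block collection
# used here consists of all complete b x b blocks (one choice consistent with the
# paper's N_I being of the same order as n).
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view   # requires numpy >= 1.20

def lag_vectors(Z, lags):
    """Y[s1, s2, j] = {Z(s) - Z(s + h_j)}^2 at sites s where every s + h_j exists."""
    n1, n2 = Z.shape
    m1 = n1 - max(h[0] for h in lags)          # lags assumed to have nonnegative parts
    m2 = n2 - max(h[1] for h in lags)
    Y = np.empty((m1, m2, len(lags)))
    for j, (h1, h2) in enumerate(lags):
        Y[:, :, j] = (Z[:m1, :m2] - Z[h1:h1 + m1, h2:h2 + m2]) ** 2
    return Y

def block_means(Y, b):
    """Average Y over every complete b x b spatial block; returns an (N_I, r) array."""
    windows = sliding_window_view(Y, (b, b), axis=(0, 1))  # (m1-b+1, m2-b+1, r, b, b)
    return windows.mean(axis=(-2, -1)).reshape(-1, Y.shape[2])

# Example: lags h1 = (1, 0), h2 = (0, 1) on a simulated 30 x 30 grid, blocks of side 5.
# rng = np.random.default_rng(0)
# Ybar = block_means(lag_vectors(rng.normal(size=(30, 30)), [(1, 0), (0, 1)]), b=5)
```

Each row of the returned array plays the role of one Ȳi in the EL construction; the exact index set In of blocks used in the paper is the one defined in Section 3.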
Assumption A6 entails a spatial generalization of the EL block scaling conditions used for time series (d = 1) by Kitamura (1997). The condition on the template R0 implies that the total number of Z(·)-sampling sites in Rn = λnR0 (located at Zd ∩ Rn) is of larger order, O(λ_n^d), than the number, O(λ_n^{d−1}), of sites near the boundary of Rn, allowing us to avoid pathological sampling regions in the same manner as the spatial subsampling of Lee and Lahiri (2002). The R0-boundary condition also implies that the spatial sample size n = |Rn ∩ Zd| ∼ vol(Rn) and that the number of size-b_n^d EL blocks satisfies N_I = |In| ∼ vol(Rn), where vol(·) denotes volume and vol(Rn) = λ_n^d vol(R0); see Lahiri (2003a, Chapter 12.2) for details. We use the fact that n, N_I and vol(Rn) are asymptotically equivalent in the arguments that follow.

8.2 Proofs of the Main Results

To prove Theorems 1 and 2, we apply the general blockwise result in Theorem 4 (Section 8.3) together with Lemma 1 below, recalling the EL function Ln(·) from (6) for variogram inference.

Lemma 1. Let δ ≡ 4^{−1}δ0/(4 + δ0) from A2 and Ỹi = b_n^{d/2}{Ȳi − Γ(θ0)}, i ∈ In. Under the moment condition (4) and A1 through A6:
(i) Σ̂_{θ0} ≡ N_I^{−1} ∑_{i∈In} ỸiỸi′ →p Σ(θ0);
(ii) max_{i∈In} ‖Ỹi‖ = op(b_n^{−d/2} N_I^{1/2−δ});
(iii) P(Ln(θ) > 0 for all θ ∈ Θn) → 1, where Θn = {θ ∈ Θ : ‖θ − θ0‖ ≤ N_I^{−1/2+δ}}.

Proof. Note that, under (4), EȲi = Γ(θ0). With A2/A3, part (i) follows from ‖EΣ̂_{θ0} − Σ(θ0)‖ = o(1) and, by Lemma 1 of Lee and Lahiri (2002), Var(Σ̂_{θ0}) = o(1). Part (ii) follows from E(max_{i∈In} ‖Ỹi‖) ≤ (∑_{i∈In} E‖Ỹi‖^{4+δ0})^{1/(4+δ0)} = O(N_I^{1/(4+δ0)}) by A2, together with b_n^{d/2}/N_I^{1/4} = o(1) by A6. For part (iii), define an empirical distribution

Fn(s, θ) ≡ N_I^{−1} ∑_{i∈In} I(b_n^{d/2} s′{Ȳi − Γ(θ)} < 0),   s ∈ Rr, θ ∈ Θn,

where I(·) denotes the indicator function, and let Z denote a N{0r, Σ(θ0)} random vector. Using A2/A3, as well as sup_{θ∈Θn} b_n^{d/2}‖Γ(θ) − Γ(θ0)‖ = o(1) by A5/A6, it may be shown that sup_{θ∈Θn} sup_{s∈Rr, ‖s‖=1} |Fn(s, θ) − P(s′Z < 0)| →p 0. (The main step is to show that, for each fixed y ∈ Rr, F̃n(y) ≡ N_I^{−1} ∑_{i∈In} I(b_n^{d/2}{Ȳi − Γ(θ0)} ≤ y) →p P(Z ≤ y); this holds using |EF̃n(y) − P(Z ≤ y)| = o(1) by the probability condition in A2, along with Var(F̃n(y)) = o(1) by subsampling arguments based on mixing, as in Politis and Romano (1994).) By this and the fact that inf_{s∈Rr, ‖s‖=1} P(s′Z < 0) > C holds for some C > 0 because Σ(θ0) is positive definite (Lemma 2, Owen, 1990), we have

P( inf_{θ∈Θn} inf_{s∈Rr, ‖s‖=1} Fn(s, θ) > C ) → 1.  (14)

Fix θ ∈ Θn. Then Ln(θ) > 0 holds if 0r ∈ Rr is interior to the convex hull of {Ȳi − Γ(θ) : i ∈ In}; see Section 3.3. If 0r is not interior, then by the supporting/separating hyperplane theorem there exists s ∈ Rr with ‖s‖ = 1 such that s′{Ȳi − Γ(θ)} ≥ 0 for all i ∈ In, implying Fn(s, θ) = 0. Hence, on the event in (14), no such s exists for any θ ∈ Θn, so that P(Ln(θ) > 0 for all θ ∈ Θn) is at least as great as the probability in (14), proving Lemma 1(iii).
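To relate Lemma 1(i) to computation, the following continues the hypothetical block-means sketch given in Section 8.1 (the function name, the d = 2 default and the numpy dependency are ours): it forms the internally studentizing matrix Σ̂_θ = N_I^{−1} ∑_{i∈In} ỸiỸi′ with Ỹi = b_n^{d/2}{Ȳi − Γ(θ)}, where Gamma_theta stands for a user-supplied vector of model variogram values at the chosen lags.

```python
# Hypothetical continuation of the earlier sketch: the block-based studentizing
# matrix of Lemma 1(i),
#     Sigma_hat(theta) = N_I^{-1} * sum_i Ytilde_i Ytilde_i',
# with Ytilde_i = b^{d/2} * (Ybar_i - Gamma(theta)) and d = 2 for a planar grid.
import numpy as np

def studentizing_matrix(Ybar, Gamma_theta, b, d=2):
    """Ybar: (N_I, r) block means; Gamma_theta: length-r model variogram at the lags."""
    Ytilde = (float(b) ** (d / 2.0)) * (Ybar - np.asarray(Gamma_theta))
    return Ytilde.T @ Ytilde / Ybar.shape[0]        # average of the outer products

# At theta_0, Lemma 1(i) says this matrix converges in probability to Sigma(theta_0);
# this is the internal studentization that removes the need for a separate
# covariance-matrix estimate in the EL method.
```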
Proof of Theorems 1 and 2. We apply the general argument from Section 8.3 with ϑ ≡ θ, Θ̃ ≡ Θ, M_{i,θ} ≡ N_I^{−1/2}{Ȳi − Γ(θ)}, Cn ≡ N_I^{1/2} Ip, Σ ≡ Σ(θ0), D ≡ −D(θ0), and Θ̃n ≡ Θn defined with δ ≡ 4^{−1}δ0/(4 + δ0) as in Lemma 1. In this set-up, we verify that the conditions of Theorem 4 hold. Note that M_{i,θ} − M_{i,θ0} = N_I^{−1/2}{Γ(θ0) − Γ(θ)} and ∂M_{i,θ}/∂θ = −N_I^{−1/2}∂Γ(θ)/∂θ. From this, condition B1 follows from A1 and (4); B2 from Lemma 1(i); B3 from A5; B4 from Lemma 1(iii); B5 from A1/A5; B6 from Lemma 1(ii) and A6; and B7/B8 from A5.

Now, assuming that the additional condition in Theorem 4(v) holds, Theorem 1 and Theorem 2(i) follow directly from Theorem 4; Theorem 2(ii) holds as well by modifying arguments in Qin and Lawless (1994, Corollary 5). To verify the condition in Theorem 4(v), we define an EL ratio L̃n(µ) for the Z(·)-process variogram parameter µ ≡ {2γ(h1), ..., 2γ(hr)}′ ∈ Rr by replacing "Γ(θ)" with "µ" in (6). Then, letting µθ ≡ Γ(θ), θ ∈ Θ, we have L̃n(µθ) = Ln(θ). Note that µ̂n ≡ N_I^{−1} ∑_{i∈In} Ȳi is the maximizer of L̃n(µ), with L̃n(µ̂n) = 1. Fix α ∈ (0, 1). The µ-confidence set In,α ≡ {µ ∈ Rr : L̃n(µ) ≥ exp(−b_n^d χ²_{r;α}/2)} is convex in Rr (Theorem 2.2, Hall and La Scala, 1990) and hence connected. By Theorem 4(ii), ℓn(θ0) = −2b_n^{−d} log L̃n(µθ0) →d χ²_r, and it can generally be shown that −2b_n^{−d} log L̃n(µθ0 + N_I^{−1/2}Σ^{1/2}s) →d χ²_r(‖s‖²) holds for s ∈ Rr, as in Owen (1990, Corollary 1), where ‖s‖² denotes a non-centrality parameter (this follows from (16) here). By this and the fact that µ̂n ∈ In,α with N_I^{1/2}(µ̂n − µθ0) →d N(0r, Σ) by A1, it must hold that sup_{µ∈In,α} N_I^{1/2}‖µ − µθ0‖ = Op(1) by the connectedness of In,α. (The idea is that, for large C > 0, a ball of radius N_I^{−1/2}C around µθ0 will contain µ̂n with arbitrarily high probability, while points on the boundary of this ball, say Bd(N_I^{−1/2}C), will belong to In,α with arbitrarily low probability; if sup_{µ∈In,α} N_I^{1/2}‖µ − µθ0‖ > C > N_I^{1/2}‖µ̂n − µθ0‖, then the connectedness of In,α implies that "In,α ∩ Bd(N_I^{−1/2}C) is non-empty" (else In,α could be divided between two open sets), but this quoted event has low probability when C is large.) It then follows that sup{N_I^{1/2}‖θ − θ0‖ : θ ∈ Θ, µθ ∈ In,α} = Op(1) by A4/A5, so that the condition in Theorem 4(v) follows.

Proof of Theorem 3. We sketch the proof, applying Theorem 4 as follows. Let Im denote the m × m identity matrix, m ≥ 1. With the notation of Section 6, we define the quantities used in the argument of Section 8.3: ϑ ≡ (β, θ) and Θ̃ ≡ Rq × Θ ⊂ Rq+p; the (q + r) × 1 vector M_{i,ϑ} ≡ Cn^{−1}{W̄i − (0q, Γ(θ))}; Σ ≡ the (q + r) × (q + r) matrix in (12); Θ̃n ≡ Un and δ ≡ κ; and the (q + r) × (q + r) and (q + r) × (q + p) block-diagonal matrices

Cn ≡ diag(A_n^{1/2}, N_I^{1/2} Ir),    D ≡ −diag(Iq, D).

In this framework, we verify that the conditions of Theorem 4 hold under the assumptions of Theorem 3. Condition B1 holds by assumption. Under the mixing/moment assumptions and (12), B2 follows from Lahiri (2003b, Theorem 4.3), while B3 follows from checking ‖EΣ̂_{ϑ0} − Σ‖ = o(1) here and showing Var(Σ̂_{ϑ0}) = o(1) with straightforward modifications of Lee and Lahiri (2002, Lemma 1). Conditions B4 through B8 may be checked using standard moment bounds from Doukhan (1994, Theorem 1.2.3) for weighted sums of random variables.

Now Theorem 3 will follow from Theorem 4 upon verifying the condition in Theorem 4(v) for ϑ = (β, θ). With "µ, µθ" defined as in the proof of Theorems 1 and 2 above, define L̃n(β, µ) by replacing "Γ(θ)" with "µ" in the definition of Ln(ϑ) ≡ Ln(β, θ), so that L̃n(β, µθ) = Ln(ϑ). Fix α ∈ (0, 1). It may be shown that In,α ≡ {(β, µ) ∈ Rq+r : L̃n(β, µ) ≥ exp(−b_n^d χ²_{q+r;α}/2)} is connected in Rq+r; namely, {β ∈ Rq : sup_{µ∈Rr} L̃n(β, µ) ≥ exp(−b_n^d χ²_{q+r;α}/2)} is connected while, for fixed β, {µ ∈ Rr : (β, µ) ∈ In,α} is convex. Then, by Theorem 4(ii) and arguments as in Owen (1990, Corollary 1), we have sup_{(β,µ)∈In,α} ‖Cn{(β, µ) − (β0, µθ0)}‖ = Op(1) by the connectedness of In,α. Consequently, sup{‖Cn(ϑ − ϑ0)‖ : ϑ = (β, θ) ∈ Rq × Θ, (β, µθ) ∈ In,α} = Op(1) by A4/A5, implying that the condition in Theorem 4(v) holds.
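Before stating the general blockwise argument, we note how the blockwise EL ratio appearing throughout these proofs could be evaluated numerically. The sketch below uses the dual (Lagrange-multiplier) form employed in Section 8.3, with weights p_i = [N_I{1 + t′M_{i,ϑ}}]^{−1} and t solving ∑_i M_{i,ϑ}/(1 + t′M_{i,ϑ}) = 0r; the function name, the choice of optimizer and the tolerances are our own and are not the authors' implementation.

```python
# Hypothetical numerical sketch of the blockwise EL statistic via its dual form;
# the optimizer and tolerances are illustrative choices only.
import numpy as np
from scipy.optimize import minimize

def block_el_statistic(M, block_volume):
    """Return l_n = -2 * b_n^{-d} * log L_n for block estimating values M.

    M: (N_I, r) array whose rows are the block quantities M_{i,theta}
       (for variogram inference, Ybar_i - Gamma(theta); a common rescaling of the
       rows does not change the statistic);
    block_volume: b_n^d, the number of observations per block.
    Assumes 0_r is interior to the convex hull of the rows of M, so that L_n > 0."""
    n_blocks, r = M.shape

    def dual(t):
        w = 1.0 + M @ t
        if np.any(w <= 1.0 / n_blocks):      # outside the admissible region
            return np.inf
        return -np.sum(np.log(w))            # convex in t; its minimum equals log L_n

    res = minimize(dual, x0=np.zeros(r), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-10, "maxiter": 20000})
    return -2.0 * res.fun / block_volume     # l_n(theta) is nonnegative

# Point estimation minimizes this statistic over theta; a confidence region collects
# the theta values for which the statistic falls below a chi-squared quantile.
```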
8.3 A General Blockwise EL Argument

Suppose M_{i,ϑ} : In × Θ̃ → Rr represents an estimating function defined on B_{b_n}(i), i ∈ In, ϑ ∈ Θ̃ ⊂ Rp, with corresponding EL function

Ln(ϑ) = N_I^{N_I} sup{ ∏_{i∈In} pi : pi ≥ 0, ∑_{i∈In} pi = 1, ∑_{i∈In} pi M_{i,ϑ} = 0r }

and ℓn(ϑ) = −2b_n^{−d} log Ln(ϑ). For some δ ∈ (0, 1/2), ϑ0 ∈ Θ̃ and an invertible p × p scaling matrix Cn, define Θ̃n = {ϑ ∈ Θ̃ : ‖Cn(ϑ − ϑ0)‖ ≤ N_I^δ}. For ϑ ∈ Θ̃n, let Mϑ = ∑_{i∈In} M_{i,ϑ}, Σ̂_ϑ = b_n^d ∑_{i∈In} M_{i,ϑ}M_{i,ϑ}′ and Ωϑ = max{1, ‖Cn(ϑ − ϑ0)‖}, and suppose that ∂M_{i,ϑ}/∂ϑ is continuous on Θ̃n, i ∈ In. Define two functions of (ϑ, t) on Θ̃n × Rr as

Q1n(ϑ, t) = ∑_{i∈In} M_{i,ϑ}/(1 + t′M_{i,ϑ}),    Q2n(ϑ, t) = b_n^{−d} ∑_{i∈In} (∂M_{i,ϑ}/∂ϑ)′ t/(1 + t′M_{i,ϑ}).  (15)

Define ϑ̂*n ≡ arg max_{ϑ∈Θ̃n} Ln(ϑ) and the global EL maximizer ϑ̂n ≡ arg max_{ϑ∈Θ̃} Ln(ϑ). Let χ²_v denote a chi-squared random variable with v degrees of freedom, whose lower α quantile is denoted χ²_{v;α}.

Theorem 4. Suppose that
(B1) Mϑ0 →d N(0r, Σ) with r × r positive definite matrix Σ;
(B2) Σ̂_{ϑ0} →p Σ;
(B3) sup_{ϑ∈Θ̃n} ∑_{i∈In} ‖M_{i,ϑ0} − M_{i,ϑ}‖ = Op(N_I^δ);
(B4) P(Ln(ϑ) > 0 for all ϑ ∈ Θ̃n) → 1;
(B5) sup_{ϑ∈Θ̃n} ‖Mϑ‖/Ωϑ = Op(1);
(B6) sup_{ϑ∈Θ̃n} max_{i∈In} b_n^d N_I^δ ‖M_{i,ϑ}‖ = op(1);
(B7) for an r × p matrix D of rank p, sup_{ϑ∈Θ̃n} ‖D − ∑_{i∈In} (∂M_{i,ϑ}/∂ϑ)Cn^{−1}‖ = op(1);
(B8) sup_{ϑ∈Θ̃n} ∑_{i∈In} ‖(∂M_{i,ϑ}/∂ϑ)Cn^{−1}‖ = Op(1).
Then, (i) P(ϑ̂*n exists) → 1; (ii) ℓn(ϑ0) →d χ²_r; (iii) Cn(ϑ̂*n − ϑ0) →d N{0p, (D′Σ^{−1}D)^{−1}}; (iv) ℓn(ϑ0) − ℓn(ϑ̂*n) →d χ²_p; and (v) if, in addition, sup{‖Cn(ϑ − ϑ0)‖ : ϑ ∈ Θ̃, ℓn(ϑ) ≤ χ²_{r;α}} = op(N_I^δ) for some fixed α ∈ (0, 1), then P(ϑ̂*n = ϑ̂n) → 1.

Proof. By B4, we may write Ln(ϑ)^{−1} = ∏_{i∈In}(1 + γ_{i,ϑ}) > 0 for ϑ ∈ Θ̃n, where γ_{i,ϑ} = t′_ϑ M_{i,ϑ} > −1 and tϑ solves Q1n(ϑ, tϑ) = 0r (see Section 3.3 for details). Note that B3/B6 imply sup_{ϑ∈Θ̃n} ‖Σ̂_ϑ − Σ̂_{ϑ0}‖ = op(1), so that sup_{ϑ∈Θ̃n} ‖Σ̂_ϑ − Σ‖ = op(1) by B2. The resulting positive definiteness of Σ̂_ϑ and Q1n(ϑ, tϑ) = 0r entail that tϑ is a continuously differentiable function of ϑ on Θ̃n, by the implicit function theorem, and ℓn(ϑ) is as well; see Qin and Lawless (1994, p. 304). Hence, ℓn(ϑ) attains a minimum (equivalently, Ln(ϑ) a maximum) on Θ̃n, establishing Theorem 4(i).

Following Owen (1990, p. 101), write tϑ = ‖tϑ‖uϑ ∈ Rr with ‖uϑ‖ = 1 for ϑ ∈ Θ̃n, and expand 0 = u′ϑ Q1n(ϑ, tϑ) = u′ϑ Mϑ − ∑_{i∈In} u′ϑ M_{i,ϑ}M′_{i,ϑ} tϑ/{1 + t′ϑ M_{i,ϑ}} to find

0 ≥ (‖tϑ‖/(b_n^d Ωϑ)) · u′ϑ Σ̂_ϑ uϑ/(1 + Zn) − ‖Mϑ‖/Ωϑ,   ϑ ∈ Θ̃n,

where Zn ≡ sup_{ϑ∈Θ̃n} max_{i∈In} ‖tϑ‖‖M_{i,ϑ}‖. Using sup_{ϑ∈Θ̃n} ‖Σ̂_ϑ − Σ‖ = op(1) with B5/B6, this inequality yields sup_{ϑ∈Θ̃n} ‖tϑ‖/Ωϑ = Op(b_n^d) and Zn = op(1). For any ϑ ∈ Θ̃n, we may algebraically solve 0r = Q1n(ϑ, tϑ) for tϑ = b_n^d Σ̂_ϑ^{−1} Mϑ + φϑ, where ‖φϑ‖ ≤ Zn ‖tϑ‖ ‖Σ̂_ϑ^{−1}‖ ‖Σ̂_ϑ‖/(1 − Zn), so that sup_{ϑ∈Θ̃n} ‖φϑ‖/Ωϑ = op(b_n^d). Applying Taylor's expansion gives log(1 + γ_{i,ϑ}) = γ_{i,ϑ} − γ²_{i,ϑ}/2 + ∆_{i,ϑ} for each i ∈ In, so that

b_n^d ℓn(ϑ) = b_n^d M′ϑ Σ̂_ϑ^{−1} Mϑ − b_n^{−d} φ′ϑ Σ̂_ϑ φϑ + 2 ∑_{i∈In} ∆_{i,ϑ},

with b_n^{−d} ∑_{i∈In} |∆_{i,ϑ}| ≤ b_n^{−2d} ‖tϑ‖² Zn ‖Σ̂_ϑ‖/(1 − Zn)³; this, together with the bounds on ‖tϑ‖ and ‖φϑ‖, yields

sup_{ϑ∈Θ̃n} |ℓn(ϑ) − M′ϑ Σ̂_ϑ^{−1} Mϑ|/Ω²ϑ = op(1).  (16)

For ϑ = ϑ0 we have Ωϑ0 = 1, so that (16) and B1/B2 yield Theorem 4(ii).

By (16), sup_{ϑ∈Θ̃n} ‖Σ̂_ϑ − Σ‖ = op(1) and B5/B7, a Taylor expansion gives

sup_{ϑ∈Θ̃n} |ℓn(ϑ) − W′ϑ Σ^{−1} Wϑ|/Ω²ϑ = op(1),  (17)

where Wϑ = Mϑ0 + DCn(ϑ − ϑ0). By (17), ℓn(ϑ) ≥ σN_I^{2δ}/2 holds uniformly over ϑ ∈ Bd(Θ̃n) ≡ {ϑ ∈ Θ̃n : ‖Cn(ϑ − ϑ0)‖ = N_I^δ} when n is large, where σ denotes the smallest eigenvalue of D′Σ^{−1}D, while ℓn(ϑ0) = Op(1) holds by Theorem 4(ii).
Hence, by the differentiability of ℓn(ϑ), this function's minimum ϑ̂*n over Θ̃n must lie in Θ̃n \ Bd(Θ̃n) and satisfy 0r = Q1n(ϑ̂*n, t_{ϑ̂*n}) and 0p = ∂ℓn(ϑ̂*n)/∂ϑ = 2Q2n(ϑ̂*n, t_{ϑ̂*n}). As before, from 0r = Q1n(ϑ̂*n, t_{ϑ̂*n}) we deduce b_n^{−d} t_{ϑ̂*n} = Σ^{−1} W_{ϑ̂*n} + op(δn), for δn = ‖t_{ϑ̂*n}/b_n^d‖ + ‖Cn(ϑ̂*n − ϑ0)‖, using B5/B7, while 0p = (C′n)^{−1} Q2n(ϑ̂*n, t_{ϑ̂*n}) implies 0p = D′ b_n^{−d} t_{ϑ̂*n} {1 + op(1)} by Zn = op(1) and B8. In matrix form, we may write

[ t_{ϑ̂*n}/b_n^d  ]     [ Σ    −D ]^{−1} [ Mϑ0 + op(δn) ]
[ Cn(ϑ̂*n − ϑ0)  ]  =  [ −D′   0  ]       [ op(δn)       ].

By B1, δn = Op(1) then follows from Mϑ0 = Op(1), and we also have

Cn(ϑ̂*n − ϑ0) = −(D′Σ^{−1}D)^{−1} D′Σ^{−1} Mϑ0 + op(1) →d N{0p, (D′Σ^{−1}D)^{−1}},  (18)

establishing Theorem 4(iii). To prove Theorem 4(iv), it follows from (17), (18) and B1 that ℓn(ϑ0) − ℓn(ϑ̂*n) = (Σ^{−1/2}Mϑ0)′ P_{Σ^{−1/2}D} (Σ^{−1/2}Mϑ0) + op(1) →d χ²_p, where P_{Σ^{−1/2}D} denotes the orthogonal projection matrix onto the column space of Σ^{−1/2}D, which has rank p. To establish Theorem 4(v), recall that ℓn(ϑ) = −2b_n^{−d} log Ln(ϑ), ϑ ∈ Θ̃, with ℓn(ϑ) ≡ ∞ whenever Ln(ϑ) > 0 fails. For large n, two events hold with probability arbitrarily close to 1: "a maximum ϑ̂*n of Ln(ϑ) on Θ̃n exists," by Theorem 4(i), and "{ϑ ∈ Θ̃ : Ln(ϑ) ≥ exp(−b_n^d χ²_{r;α}/2)} ⊂ Θ̃n," by the condition in Theorem 4(v). Together, these two events imply ϑ̂*n = ϑ̂n.

[Appendix Figure 1: Coverage probabilities for 90% confidence regions (EL and SLS methods) for (θ1, θ2), plotted against block size for a 30 × 30 sampling region with θ1 = 0.5 (based on 5000 simulations). Panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8; axes: Coverage Probabilities versus Block size.]

[Appendix Figure 2: Relative Root Mean Squared Error (%) for estimation of the nugget θ1 = 0.5 on a 50 × 50 region over several values of the range θ2 (based on 3000 simulations). Panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8; methods: EL, SLS, WLS, OLS; axes: Relative Root MSE (%) versus Block size.]

[Appendix Figure 3: Relative Root Mean Squared Error (%) for estimation of the range parameter θ2 on a 30 × 30 region with nugget θ1 = 0.5 (based on 5000 simulations). Panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8; methods: EL, SLS, WLS, OLS; axes: Relative Root MSE (%) versus Block size.]

[Appendix Figure 4: Relative Root Mean Squared Error (%) for estimation of the nugget θ1 = 0.5 on a 30 × 30 region over several values of the range θ2 (based on 5000 simulations). Panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8; methods: EL, SLS, WLS, OLS; axes: Relative Root MSE (%) versus Block size.]

[Appendix Figure 5: Relative Root Mean Squared Error (%) for estimation of the nugget θ1 = 0.5 and range θ2 = 1 on a 10 × 10 sampling region (based on 10000 simulations). Panels: (a) Nugget, (b) Range; methods: EL, SLS, WLS, OLS; axes: Relative Root MSE (%) versus Block size.]
[Appendix Figure 6: Percentiles (10th, 50th and 90th) of the (scaled) EL estimates of the nugget on (a) a 50 × 50 sampling region and (b) a 20 × 20 region, plotted against block size; a horizontal line running fully between the graph margins indicates 1. Axes: Percentiles versus Block size; curves correspond to several range values (Range = 1, 4, 8, 10 across the two panels).]