A blockwise empirical likelihood for spatial lattice data

Short title: Spatial empirical likelihood

Daniel J. Nordman
Dept. of Statistics, Iowa State University, Ames, IA 50011
dnordman@iastate.edu

Abstract

This article considers an empirical likelihood method for data located on a spatial grid. The method allows inference on spatial parameters, such as means and variograms, without knowledge of the underlying spatial dependence structure. Log-likelihood ratios are shown to have chi-square limits under spatial dependence for calibrating tests and confidence regions, and maximum empirical likelihood estimators permit parameter estimation and testing of spatial moment conditions. A practical Bartlett correction is proposed to improve the coverage accuracy of confidence regions. The spatial empirical likelihood method is investigated through a simulation study and illustrated with a data example.

Key Words: Data blocking, discrete index random fields, estimating equations

1 Introduction

Empirical likelihood (EL), introduced by Owen (1988, 1990), is a statistical method allowing likelihood-based inference without requiring a fully specified parametric model for the data. For independent data, versions of EL are known to share many qualities associated with parametric likelihood, such as limiting chi-square distributions for log-likelihood ratios; see Owen (1988) for means, Hall and La Scala (1990) for smooth functions of means, and Qin and Lawless (1994) for parameters satisfying moment restrictions. More recently, attention has focused on formulating EL for dependent time series.
For weakly dependent time series, Kitamura (1997) proposed a general EL method based on data blocking techniques, and related “blockwise” versions of EL have been developed for other time series inference: Lin and Zhang (2001) for blockwise Euclidean EL; Chuang and Chan (2002) for autoregressive models; Chen, Härdle and Li (2003) for goodness-of-fit tests; Bravo (2005) for time series regressions; Zhang (2006) for negatively associated series. In econometrics, much research has considered EL for testing moment restrictions and comparisons between EL and generalized method of moments estimators; see, for example, Kitamura, Tripathi and Ahn (2004) and Newey and Smith (2004). Monti (1997) and Nordman and Lahiri (2006) have considered periodogram-based EL inference for short- and long-memory time series, respectively. In contrast to time series, the potential application of EL for spatially dependent data has received little consideration. The aim of this paper is to propose an EL method for spatial lattice data and demonstrate that it has some important inference properties in the spatial setting. The method has nonparametric and semiparametric uses and is valid for many spatial processes under weak conditions; this can be appealing when there is uncertainty about an appropriate parametric model. Spatial EL provides a general framework for inference on many spatial parameters through a likelihood function based on estimating equations. Applying the EL method to different spatial problems requires only adjusting the estimating functions that describe the inference scenario. In addition, the spatial EL method does not require variance estimation steps to set confidence regions or conduct tests. This feature of spatial EL is particularly important because standard errors can be difficult to obtain for many spatial statistics under an unknown spatial dependence structure. 
Current nonparametric methods for spatial data, such as spatial subsampling and the spatial block bootstrap, often require direct estimation of the variance of spatial statistics under data dependence (see Sherman and Carlstein (1994), Politis, Romano and Wolf (1999), Lahiri (2003a), and references therein).

An example of a situation where spatial EL provides an attractive approach is illustrated in Figure 1(a), which presents a map of high and low cancer mortality rates for the United States. High and low mortality are defined as in Sherman and Carlstein (1994), who fit an autologistic model to assess evidence of clustering among high mortality cases. To estimate the autologistic parameter that describes clustering, these authors employed maximum pseudo-likelihood (Besag (1975)) followed by a spatial subsampling step. In particular, subsampling was used to obtain a standard error for the pseudo-likelihood estimate in order to set a confidence interval for the autologistic parameter through a normal approximation. This example is revisited in Section 6, where the spatial EL method produces a confidence interval for the clustering model parameter automatically, and no separate determination of standard error is required. Intervals from the spatial EL approach indicate spatial clustering, but suggest the evidence for clustering is not as strong as reported by Sherman and Carlstein (1994).

Figure 1: (a) Cancer mortality map, where • and ◦ respectively denote a high, Zs = 1, or low, Zs = 0, mortality rate at site s ∈ Rn ∩ Z² of the sampling region Rn. (b) Sampling region Rn,Y for vectors Ys, s ∈ Rn,Y ∩ Z², where Ys consists of Zs and its four nearest neighbors Zh, ‖s − h‖ = 1; at each site s ∈ Rn,Y ∩ Z², the indicated value denotes the sum Ss of the four neighboring indicators Zh of Zs, where values in dark (light) font denote Zs = 1 (Zs = 0) at site s. [The plotted grids of values in both panels are omitted here.]
In what follows, a spatial blockwise EL method is developed, based on spatial estimating equations combined with either maximally overlapping or non-overlapping blocks of spatial observations. Data blocking is used as a device to accommodate unknown spatial dependence, similar to the time series blockwise EL of Kitamura (1997). For a broad class of spatial parameters, the spatial EL method yields log-ratios that are asymptotically chi-square, allowing the formulation of tests and confidence regions without knowledge of the data dependence structure. Our EL results include distributions for EL point estimators of spatial parameters as well (i.e., so-called maximum EL estimators). Based on recent results in Chen and Cui (2006, 2007) for independent data, a procedure for a practical Bartlett correction for the spatial EL method is proposed and investigated using simulation. The Bartlett correction makes an adjustment to the log EL ratio that improves coverage accuracy.

The rest of the paper is organized as follows. In Section 2, we describe the spatial sampling and estimating function frameworks, with some examples provided for illustration. We also construct the spatial blockwise EL. The main distributional results for the spatial EL method are presented in Section 3. Section 4 outlines an empirical Bartlett correction. The proposed methodology is assessed through a numerical study in Section 5, and illustrated with the cancer mortality map of the United States in Section 6. Section 7 provides a discussion of EL block selection. Assumptions and detailed proofs for the main results are deferred to an Appendix, available in the online supplement to this manuscript.

2 Spatial empirical likelihood method

To set the stage for development of spatial EL, recall the formulation of EL using a sample Y1, . . . , Yn of independent, identically distributed (iid) data (e.g., Owen (1990), Qin and Lawless (1994)).
First, a parameter of interest θ ∈ Rp is linked to each observation by creating a function Gθ(Yi) of both, using a vector of r ≥ p estimating functions Gθ(·). The estimating functions are chosen so that, at the true parameter value θ = θ0, we have an expectation condition E{Gθ0(Yi)} = 0r that identifies θ0. With such estimating functions in place, an EL function for θ can be constructed by maximizing a product of n probabilities placed on Gθ(Y1), . . . , Gθ(Yn) under a linear “expectation” constraint. The resulting EL function for θ has important uses; the function can be maximized to obtain point estimators of θ, or chi-square calibrated to set confidence regions.

An EL for spatial lattice data is similarly based on estimating functions that satisfy a moment condition, but requires modifications to handle spatial dependence. First, we need a spatial sampling region Rn ⊂ Rd, d ≥ 1, on which a spatial process {Zs : s ∈ Zd} is observed on a grid; here d denotes the dimension of sampling. Then we develop estimating functions involving a spatial parameter θ of interest and the spatial Zs-observations. To provide more generality in the spatial setting, we consider functions Gθ(Ys), s ∈ Zd, that connect θ to vectors of spatial observations Ys = (Zs+h1, . . . , Zs+hm)′ based on some selection of fixed spatial lags h1, . . . , hm ∈ Zd; these Ys-observations have their own sampling region Rn,Y based on the region Rn for the observed spatial process {Zs : s ∈ Zd}. These formulations are made precise in Section 2.1, which also provides some examples. A spatial EL function for θ is then constructed using an estimating function Gθ(·) along with spatial blocks of Ys-observations, instead of individual observations, as described in Section 2.2. For clarity throughout the sequel, a bold font denotes a vector in Rd, e.g., s, h, i ∈ Rd.
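Before adding the spatial ingredients, the iid construction above can be made concrete. The following is a minimal Python sketch (not from the paper; the function name and bisection tolerances are ours) of the log EL ratio for a scalar mean, i.e., Gθ(Yi) = Yi − θ with r = p = 1:

```python
import math

def el_log_ratio(data, theta):
    """Log empirical likelihood ratio log R(theta) for a scalar mean:
    maximize prod(n * p_i) subject to sum(p_i) = 1 and
    sum(p_i * (x_i - theta)) = 0 (Owen (1990)).
    The dual reduces to a 1-D search for a multiplier t solving
    sum((x_i - theta) / (1 + t*(x_i - theta))) = 0."""
    g = [x - theta for x in data]
    lo, hi = min(g), max(g)
    if not (lo < 0.0 < hi):       # 0 outside convex hull: constraint infeasible
        return float("-inf")
    # t must keep every 1 + t*g_i > 0, i.e. t in (-1/hi, -1/lo)
    t_lo, t_hi = -1.0 / hi + 1e-12, -1.0 / lo - 1e-12

    def score(t):                 # strictly decreasing in t
        return sum(gi / (1.0 + t * gi) for gi in g)

    for _ in range(200):          # bisection on the monotone dual score
        t = 0.5 * (t_lo + t_hi)
        if score(t) > 0.0:
            t_lo = t
        else:
            t_hi = t
    t = 0.5 * (t_lo + t_hi)
    # p_i = 1 / (n * (1 + t*g_i)), so log R = -sum log(1 + t*g_i) <= 0
    return -sum(math.log(1.0 + t * gi) for gi in g)
```

At θ equal to the sample mean, the multiplier is t = 0 and the log-ratio attains its maximum of zero; it decreases as θ moves away, and becomes −∞ once θ leaves the convex hull of the data, where the constrained maximization is infeasible.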
2.1 Spatial estimating equations

To describe the spatial EL method, we adopt a sampling framework that allows a spatial sampling region Rn ⊂ Rd to grow as the sample size n increases. Using a subset R0 ⊂ (−1/2, 1/2]d containing an open neighborhood and an increasing positive sequence {λn} of scaling factors, suppose the sampling region Rn is obtained by inflating the “template” set R0 by the constant λn: Rn = λn R0. This formulation permits a wide variety of shapes for the sampling region Rn, whose shape is preserved as the sampling region grows. For spatial subsampling, Sherman and Carlstein (1994), Sherman (1996), and Nordman and Lahiri (2004) use a comparable sampling structure. We assume that a real-valued, strictly stationary process {Zs : s ∈ Zd} is observed at regular locations on the grid Zd inside Rn. Hence, the available data are {Zs : s ∈ Rn ∩ Zd} observed at n sites {s1, . . . , sn} = Rn ∩ Zd, with n as the sample size of the observed Zs.

To describe a finite dimensional parameter θ ∈ Θ ⊂ Rp of the spatial process {Zs : s ∈ Zd} with estimating functions, we collect observations from Rn into vectors. For a positive integer m, we form an m-dimensional vector Ys = (Zs+h1, Zs+h2, . . . , Zs+hm)′, s ∈ Rn,Y ∩ Zd, where h1, h2, . . . , hm ∈ Zd are selected lag vectors, and Rn,Y = {s ∈ Rn : s + h1, . . . , s + hm ∈ Rn} denotes the sampling region for the process {Ys : s ∈ Zd}, containing nY ≡ |Rn,Y ∩ Zd| observations. Here and throughout the sequel, |A| represents the size of a finite set A. As in the iid data formulation of EL (Qin and Lawless (1994)), suppose information about θ ∈ Θ ⊂ Rp exists through r ≥ p estimating functions linking θ to a vector form Ys, s ∈ Zd, of the spatial process {Zs : s ∈ Zd}. With arguments y = (y1, . . . , ym)′ ∈ Rm and θ ∈ Θ, define Gθ(y) = (g1,θ(y), . . . , gr,θ(y))′ : Rm × Rp → Rr as a vector of r estimating functions satisfying

E{Gθ0(Ys)} = 0r ∈ Rr,  s ∈ Zd,   (1)

at the true and unique parameter value θ0. When r > p, the above functions are said to be “overidentifying” for θ. In Section 2.2, we build an EL function for a spatial parameter θ via the moment condition in (1). With appropriate choices of vectors Ys and estimating functions Gθ(·), EL inference is possible for a large class of spatial parameters, as is illustrated in the following examples.

Example 1. (Poisson counts). Consider a pattern of events in a spatial region that may exhibit spatial randomness (e.g., tree locations in a forest). It is common to partition the region into rectangular plots on a grid, and the number of events occurring in each plot (or quadrat) is considered as a lattice observation Zs, s ∈ Zd (e.g., counts of trees in a quadrat), where each count Zs follows a Poisson distribution with mean E(Zs) = θ when the events exhibit complete spatial randomness (Cressie (1993), Chapter 8.2). For EL inference, we set Ys = Zs, s ∈ Zd, and use estimating functions Gθ(Ys) = (Zs − θ, Zs² − θ² − θ)′, based on Poisson moments, so that (1) holds with p = m = 1, r = 2. Using EL results in Section 3, it is possible to estimate the mean count θ or, more importantly, test if the Poisson assumption (1) holds, without nonparametric variance estimation as used in some previous applications (Sherman (1996)).

Example 2. (Variogram inference). Estimation of the variogram 2γ(hi) ≡ Var(Zs − Zs+hi) = E{(Zs − Zs+hi)²} of the process {Zs : s ∈ Zd} at given lags h1, . . . , hp ∈ Zd is an important problem. Least squares variogram fitting is commonly proposed in the geostatistical literature; see Lee and Lahiri (2002) and references therein. For EL inference on the variogram θ = (2γ(h1), . . . , 2γ(hp))′ ∈ Rp, we define a vector function Gθ(Ys) = (g1,θ(Ys), . . . , gp,θ(Ys))′ of the (p + 1)-dimensional process Ys = (Zs, Zs+h1, . . . , Zs+hp)′, where gi,θ(Ys) = (Zs − Zs+hi)² − 2γ(hi). This selection fulfills (1) with r = p, m = p + 1.

Example 3. (Pseudo-likelihood inference). Markov random fields provide an important class of models for spatial lattice data. They allow the conditional distribution of an observation Zs, s ∈ Zd, to be written through a neighborhood structure as

fθ(z | {Zh : h ≠ s}) = Pθ(Zs = z | {Zh : h ∈ Ns}) if Zs is discrete, or the conditional density fθ(z | {Zh : h ∈ Ns}) if Zs is continuous,  z ∈ R,   (2)

where Ns ⊂ Zd denotes a neighborhood of Zs (Cressie (1993), Chapter 6). Besag (1974) developed models based on conditional distributions from one-parameter exponential families in (2) and estimated them through maximum pseudo-likelihood (Besag (1975)), where the pseudo-likelihood estimator θ̂n^PL of θ ∈ Θ ⊂ Rp solves the score-based system

Σ_{s ∈ Rn ∩ Zd} ∂ log fθ(Zs | {Zh : h ∈ Ns})/∂θ = 0p ∈ Rp.

Confidence regions for θ based on a normal approximation for θ̂n^PL often require estimating the variance Var(θ̂n^PL) of the pseudo-likelihood estimator, a difficult task in general. This issue is relevant when fitting (2) with pseudo-likelihood to examine clustering in the mortality map described in the Introduction. However, the EL method may be generally applied for pseudo-likelihood inference, with the advantage that a confidence region for a parameter θ characterizing (2) can be set by simply calibrating an EL function. This will be illustrated for the mortality map example in Section 6. To set up EL inference for a conditional distribution (2), suppose the neighborhoods Ns, s ∈ Zd, have a constant structure, such as the “four-nearest neighbor” structure Ns ≡ {s ± e ∈ Z2 : e = (0, 1)′, (1, 0)′} when d = 2. For describing θ ∈ Θ ⊂ Rp, we choose r = p score functions Gθ(Ys) = ∂ log fθ(Zs | {Zh : h ∈ Ns})/∂θ involving a vector Ys = (Zs, Zs+h1, . . . , Zs+h|Ns|)′, hi ∈ Ns − s, formed by Zs and its |Ns| neighbors, s ∈ Zd.
For Markov random fields based on exponential-family models (2), these functions entail the moment condition (1) for θ.

2.2 Spatial blockwise empirical likelihood construction

Suppose a spatial parameter θ ∈ Θ ⊂ Rp is identified through a vector process Ys, s ∈ Zd, and estimating functions Gθ(·) satisfying (1). Construction of the spatial EL function for θ requires spatial blocks of observed vectors Ys, s ∈ Rn,Y ∩ Zd. We consider two possible sources of rectangular blocks within Rn,Y, namely, maximally overlapping (OL) and non-overlapping (NOL) blocks. Such blocking schemes are common with other block-based spatial resampling methods, such as the spatial block bootstrap and spatial subsampling (Lahiri (2003a)). Let {bn}n≥1 be a sequence of positive integers and define general d-dimensional blocks as Bbn(i) ≡ i + bn U, i ∈ Zd, using the cube U = (−1/2, 1/2]d. To keep the blocks small relative to the sampling region Rn,Y, we suppose bn grows at a slower rate than the sample size nY, and require that

bn^{−1} + bn^{2d}/nY = o(1)   (3)

as n → ∞. We elaborate on this block condition in Section 7. The integer index set I_{bn}^{OL} = {i ∈ Zd : Bbn(i) ⊂ Rn,Y} identifies all integer-translated cubes bn U lying completely inside the sampling region Rn,Y for the Ys-observations. From this, the collection of maximally OL blocks is given by {Bbn(i) : i ∈ I_{bn}^{OL}}; see Figure 2(c). For NOL blocks, the region Rn,Y is divided into disjoint cubes of Ys-observations. Letting I_{bn}^{NOL} = {bn k : k ∈ Zd, Bbn(bn k) ⊂ Rn,Y} ⊂ Zd represent the index set of all NOL cubes Bbn(bn k) = bn(k + U) lying completely inside Rn,Y, the NOL block collection is then {Bbn(i) : i ∈ I_{bn}^{NOL}}; see Figure 2(b).

Figure 2: (a) Sampling region Rn,Y; (b) NOL complete blocks; (c) OL blocks; (d) Bootstrap region R*n,Y formed by the complete blocks in (b). (Bootstrap samples on R*n,Y are found by resampling data blocks from (c) and concatenating these into block positions in (d).)
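As an illustration of the two block collections, the following Python sketch (hypothetical helper names; for simplicity, blocks in a rectangular region with d = 2 are indexed by a corner site rather than by the centers i used above) enumerates the OL and NOL blocks:

```python
def ol_block_corners(dims, b):
    """Corner sites of all maximally overlapping b x b blocks lying
    completely inside a rectangular grid with the given dimensions."""
    n1, n2 = dims
    return [(i, j) for i in range(n1 - b + 1) for j in range(n2 - b + 1)]

def nol_block_corners(dims, b):
    """Corner sites of the disjoint (non-overlapping) b x b blocks;
    incomplete blocks at the boundary are discarded."""
    n1, n2 = dims
    return [(i, j) for i in range(0, n1 - b + 1, b)
                   for j in range(0, n2 - b + 1, b)]
```

For a 6 x 6 region with b = 2, there are 25 OL blocks but only 9 NOL blocks; the NOL collection is a subset of the OL one.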
In the following, we let In generically denote either chosen index set I_{bn}^{OL} or I_{bn}^{NOL}, and denote the number of blocks as NI = |In|. Using the estimating functions Gθ in (1), we compute a sample mean

Mθ,i = bn^{−d} Σ_{s ∈ Bbn(i) ∩ Zd} Gθ(Ys),  i ∈ In,

for each block Bbn(i) in the collection, which provides |Bbn(i) ∩ Zd| = bn^d observations of Gθ(Ys), s ∈ Zd. The EL function Ln(θ) and EL ratio Rn(θ) for θ ∈ Θ are then given by

Ln(θ) = sup{ Π_{i ∈ In} pi : pi ≥ 0, Σ_{i ∈ In} pi = 1, Σ_{i ∈ In} pi Mθ,i = 0r },  Rn(θ) = Ln(θ)/(1/NI)^{NI}.   (4)

The EL function for θ ∈ Θ involves maximizing a multinomial likelihood created from probabilities assigned to each block sample mean, under an expectation-based linear constraint. Without the expectation constraint in Ln(θ), the product has a maximum when each pi = 1/NI, yielding the EL ratio in (4). If 0r ∈ Rr is interior to the convex hull of {Mθ,i : i ∈ In}, then Ln(θ) represents a positive, constrained maximum and (4) may be written as

Ln(θ) = Π_{i ∈ In} pθ,i,  Rn(θ) = Π_{i ∈ In} (1 + tθ′ Mθ,i)^{−1},  pθ,i = {NI(1 + tθ′ Mθ,i)}^{−1} ∈ (0, 1),   (5)

where tθ solves Σ_{i ∈ In} Mθ,i/(1 + tθ′ Mθ,i) = 0r. We define Ln(θ) = −∞ when the set in (4) is empty. See Owen (1990) and Qin and Lawless (1994) for these computational details on EL.
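Putting the pieces together for the simplest case Gθ(Ys) = Ys − θ (scalar mean, r = p = 1, d = 2), the block means Mθ,i and the product form in (5) can be computed as follows. This is an illustrative sketch under our own naming, with the scalar multiplier tθ found by bisection on the dual equation:

```python
import math

def block_means(field, b, theta):
    """M_{theta,i}: average of G_theta(Y_s) = Y_s - theta over each
    maximally overlapping b x b block of a 2-d array `field`."""
    n1, n2 = len(field), len(field[0])
    means = []
    for i in range(n1 - b + 1):
        for j in range(n2 - b + 1):
            s = sum(field[i + u][j + v] - theta
                    for u in range(b) for v in range(b))
            means.append(s / float(b * b))
    return means

def log_el_ratio_from_blocks(m):
    """log R_n(theta) from block means m, via the dual of (5):
    find t with sum(m_i / (1 + t*m_i)) = 0, then
    log R_n(theta) = -sum log(1 + t*m_i)."""
    lo, hi = min(m), max(m)
    if not (lo < 0.0 < hi):       # 0 outside the convex hull of the M_{theta,i}
        return float("-inf")
    t_lo, t_hi = -1.0 / hi + 1e-12, -1.0 / lo - 1e-12
    for _ in range(200):          # the dual score is monotone in t
        t = 0.5 * (t_lo + t_hi)
        if sum(mi / (1.0 + t * mi) for mi in m) > 0.0:
            t_lo = t
        else:
            t_hi = t
    t = 0.5 * (t_lo + t_hi)
    return -sum(math.log(1.0 + t * mi) for mi in m)
```

The log-ratio is zero exactly when the block means average to zero (i.e., θ equals the overall sample mean of the block means) and is −∞ whenever the convex-hull condition fails, matching the convention Ln(θ) = −∞ above.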
In the next section, we consider the distribution of the log EL ratio given by

ℓn(θ) = −2Bn log Rn(θ),  Bn = nY/(bn^d NI).   (6)

The factor Bn is a block adjustment to ensure chi-square limits for (6), and represents the spatial analog of the block correction used for the time series blockwise EL (Kitamura (1997)).

3 Main results

Distributional results for the spatial EL are established under a set of assumptions referred to as “Assumptions 1-4” in the sequel. We defer technical details on these assumptions to Section 8.1 of the online Appendix. In brief, Assumption 1 provides a condition equivalent to the block growth rate (3). Assumptions 2-4 describe spatial mixing and moment conditions which allow the spatial EL method to be valid for a large class of spatial processes exhibiting weak spatial dependence. All of the EL results to follow apply equally to EL functions Rn(θ), ℓn(θ) constructed of either OL or NOL blocks (i.e., In = I_{bn}^{OL} or I_{bn}^{NOL}).

3.1 Smooth function model

We first establish the distribution of spatial blockwise EL ratios for inference on “smooth function” parameters, as in Hall and La Scala (1990) for iid data and Kitamura (1997) for mixing time series. Suppose θ = E{G(Ys)} ∈ Θ ⊂ Rp represents the mean of a function G : Rm → Rp applied to an m-dimensional vector Ys, s ∈ Zd. EL inference on a more general parameter θH = H(θ) ∈ Ru may be considered using a smooth function H : Rp → Ru of θ. This “smooth function” model permits a wide range of spatial parameters θH, including ratios or differences of means θ. For example, θ = (E(Ys), E(Ys²), E(Ys Ys+h))′ ∈ R3 and H(x1, x2, x3) = (x3 − x1²)/(x2 − x1²) : R3 → R yield a spatial autocorrelation θH = H(θ) at lag h ∈ Zd. For smooth model inference, we first define an EL ratio Rn(θ) for θ using functions Gθ(Ys) = G(Ys) − θ, s ∈ Zd, in (5), which satisfy (1) with the same number of parameters and estimating functions, r = p.
An EL ratio and log-ratio for a parameter θH are then defined as

Rn(θH) ≡ sup_{θ ∈ Θ : H(θ) = θH} Rn(θ),  ℓn(θH) ≡ −2Bn log Rn(θH).

Theorem 1 provides a nonparametric recasting of Wilks’ theorem for spatial data, useful for calibrating confidence regions and tests of spatial “smooth model” parameters based on a chi-square approximation. In the following, χ²_ν denotes a chi-square variable with ν degrees of freedom with a lower α-quantile given by χ²_{ν;α}, and →d denotes distributional convergence.

Theorem 1 (Smooth functions of means) Suppose In = I_{bn}^{OL} or I_{bn}^{NOL}; E{G(Ys)} = θ ∈ Rp; Assumptions 1-4 hold with r = p estimating functions Gθ(Ys) = G(Ys) − θ, s ∈ Zd; H : Rp → Ru is continuously differentiable in a neighborhood of θ0 and θ0H = H(θ0). Then, ℓn(θ0H) →d χ²_ν as n → ∞, where ν denotes the rank of the u × p matrix ∂H(θ)/∂θ|θ=θ0.

See Hall and La Scala (1990) for properties of EL confidence regions for smooth model parameters.

3.2 Maximum empirical likelihood point estimation

We refer to the maximum of Rn(θ) from (5) as the maximum empirical likelihood estimator (MELE) and denote it by θ̂n. Using general estimating equations, Qin and Lawless (1994) and Kitamura (1997) considered the distribution of the MELE with independent data and mixing time series, respectively. With spatial data, we show the MELE has properties resembling those available in other EL frameworks. We first consider establishing the existence, consistency and asymptotic normality of a sequence of maxima of the EL ratio Rn(θ) from (5), along the lines of the classical arguments of Cramér (1946). The conditions are mild and have the advantage that they are typically easy to verify. Let ‖·‖ denote the Euclidean norm in the following.

Theorem 2 (General estimating equations) Assume In = I_{bn}^{OL} or I_{bn}^{NOL}, and that Assumptions 1-4 and (1) hold.
In addition, suppose in a neighborhood of θ0, ∂Gθ(·)/∂θ and ∂²Gθ(·)/∂θ∂θ′ are continuous in θ; ‖∂Gθ(·)/∂θ‖ and ‖∂²Gθ(·)/∂θ∂θ′‖ are bounded by a nonnegative, real-valued J(·) with E{J³(Ys)} < ∞; and Dθ0 ≡ E{∂Gθ(Ys)/∂θ|θ=θ0} has full column rank p. Then, as n → ∞, P(Rn(θ) is continuously differentiable on ‖θ − θ0‖ ≤ nY^{−5/12}) → 1; there exists a sequence {θ̂n} such that P(Rn(θ̂n) = max_{‖θ−θ0‖ ≤ nY^{−5/12}} Rn(θ) and ‖θ̂n − θ0‖ < nY^{−5/12}) → 1; and

nY^{1/2} ((θ̂n − θ0)′, (bn^{−d} tθ̂n)′)′ →d N((0p′, 0r′)′, diag(Vθ0, Uθ0)),

where Vθ0 = (Dθ0′ Σθ0^{−1} Dθ0)^{−1} and Uθ0 = Σθ0^{−1} − Σθ0^{−1} Dθ0 Vθ0 Dθ0′ Σθ0^{−1}.

Remark. For an iid sample of size n, Qin and Lawless (1994) established a related result for a ball ‖θ − θ0‖ ≤ n^{−1/3}. We could replace nY^{−5/12} with nY^{−1/3}, to allow a larger ball in Theorem 2, by strengthening moment assumptions (i.e., E(‖Gθ(Ys)‖^{12+δ}) < ∞ in Assumption 1 of Section 8.1 of the online Appendix). However, regardless of the ball radius nY^{−1/3} or nY^{−5/12}, the maximizer of the EL function on each ball satisfies ‖θ̂n − θ0‖ = Op(nY^{−1/2}), and thereby the maximizers on the different balls must ultimately be equal.

Theorem 2 establishes the existence of a local maximizer of the spatial EL function. When the likelihood Rn(θ) has a single maximum with probability approaching 1, by the concavity of Rn(θ) for example, then the sequence {θ̂n} corresponds to a global MELE. Under stronger conditions, as in Kitamura (1997), a global maximum on Θ can be shown to satisfy ‖θ̂n − θ0‖ = Op(nY^{−1/2}), thereby coinciding with the sequence in Theorem 2. However, the conditions of Theorem 2 are often sufficient, for many estimating functions, to ensure that the sequence {θ̂n} in Theorem 2 corresponds to global maximizers without more restrictive assumptions, such as compactness of the parameter space Θ.
For example, this is true with estimating functions of the common form Gθ(Ys) = G(Ys) − γ(θ) for some G : Rm → Rr and differentiable γ : Θ → Rr with ‖γ(θ) − γ(θ0)‖ increasing in ‖θ − θ0‖; see Example 1 of Section 2.1 for illustration.

3.3 Empirical likelihood tests of hypotheses

As in the EL frameworks of Qin and Lawless (1994) and Kitamura (1997), the spatial EL method allows test statistics based on θ̂n for both spatial parameter and moment hypotheses. The distribution of the log EL ratio rn(θ) ≡ ℓn(θ) − ℓn(θ̂n) at θ = θ0 is useful for simple hypothesis tests or for calibrating approximate 100(1 − α)% EL confidence regions for θ as {θ ∈ Θ : rn(θ) ≤ χ²_{p;1−α}}. For testing the null hypothesis that the moment condition (1) holds for the estimating functions, the log-ratio statistic ℓn(θ̂n) may be applied. Theorem 3 provides the limiting chi-square distributions of these EL log-ratio statistics.

In Theorem 3, we show additionally that profile spatial EL ratio statistics can be developed to conduct tests and set confidence regions in the presence of nuisance parameters; see Qin and Lawless (1994) for the iid data case. Let θ = (θ1′, θ2′)′, where θ1 denotes the q × 1 parameter of interest and θ2 denotes a (p − q) × 1 nuisance vector. For fixed θ1, suppose that θ̂2^{(θ1)} maximizes the EL function Rn(θ1, θ2) with respect to θ2, and define the profile log EL ratio ℓn(θ1) ≡ −2Bn log Rn(θ1, θ̂2^{(θ1)}) for θ1.

Theorem 3 Under the assumptions of Theorem 2 with the sequence {θ̂n}, as n → ∞,
(i) rn(θ0) = ℓn(θ0) − ℓn(θ̂n) →d χ²_p and ℓn(θ̂n) →d χ²_{r−p}.
(ii) If H0 : θ1 = θ10 holds, then rn(θ10) = ℓn(θ10) − ℓn(θ̂1n) →d χ²_q, where θ̂n = (θ̂1n′, θ̂2n′)′.

We examine the performance of the spatial EL in subsequent sections. EL inference for spatial parameters under constraints is also possible, as considered by Qin and Lawless (1995) and Kitamura (1997) for iid and time series data; see Section 8.4 of the online Appendix for this.
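Given a routine that evaluates rn(θ) over a grid of candidate parameter values, the confidence region {θ ∈ Θ : rn(θ) ≤ χ²_{p;1−α}} can be read off by inverting the test. A small sketch for a scalar parameter (p = 1 at the 90% level; the helper name is ours, and the chi-square quantile is hardcoded rather than computed):

```python
# chi-square lower 90% quantile with 1 degree of freedom
CHI2_1_090 = 2.7055

def el_confidence_interval(theta_grid, r_values, chi2_quantile=CHI2_1_090):
    """Invert the EL test: keep grid points with r_n(theta) <= quantile
    and report the smallest/largest as approximate interval endpoints."""
    inside = [th for th, r in zip(theta_grid, r_values)
              if r <= chi2_quantile]
    if not inside:
        return None       # no grid point accepted; refine the grid
    return (min(inside), max(inside))
```

Since rn(θ) is minimized at the MELE θ̂n, the accepted grid points typically form a contiguous interval around θ̂n; the grid spacing controls the resolution of the reported endpoints.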
4 A Bartlett correction procedure

A Bartlett correction is often an important property for EL methods. This involves making a mean adjustment to the EL log-ratio in order to improve the limiting chi-square approximation, and to enhance the coverage accuracy of EL confidence regions. For EL with independent data, a Bartlett correction has been established by DiCiccio, Hall and Romano (1991) for smooth functions of means, and by Chen and Cui (2006, 2007) under general estimating equations and nuisance parameters; see Chen and Cui (2007) for additional references with iid data. With weakly dependent time series, Kitamura (1997) and Monti (1997) considered Bartlett corrections for blockwise EL with mean parameters and a periodogram-type EL, respectively.

While a formal justification of a Bartlett correction in the spatial setting is difficult, a practical Bartlett correction for the spatial EL may be proposed using a spatial block bootstrap. Let rn(θ) = ℓn(θ) − ℓn(θ̂n), θ ∈ Θ ⊂ Rp, denote the log EL ratio from Section 3.3 based on the MELE θ̂n and (6). By Theorems 2-3, we have rn(θ0) →d χ²_p and θ̂n is consistent for θ0, so that a bootstrap Bartlett correction factor may be calculated as follows. Pick some large M ∈ N. For i = 1, . . . , M, independently generate a block bootstrap rendition, say Yn*i, of the original vectorized spatial data Yn ≡ {Ys : s ∈ Rn,Y ∩ Zd} and compute rn*i(θ̂n) = ℓn*i(θ̂n) − ℓn*i(θ̂n*i), where ℓn*i and θ̂n*i are the log EL ratio and MELE analogs based on Yn*i. We then compute r̄n* = M^{−1} Σ_{i=1}^{M} rn*i(θ̂n) to estimate E{rn(θ0)} and set a Bartlett-corrected 100(1 − α)% confidence region as {θ : (p/r̄n*) rn(θ) ≤ χ²_{p;1−α}}.
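The correction factor itself is simple arithmetic once the bootstrap replicates are available. The sketch below is our own simplified variant (a NOL-block resampling scheme on a rectangular grid with hypothetical helper names, not the paper's appendix algorithm): one bootstrap rendition of the field, plus the factor p/r̄n* applied to the region above:

```python
import random

def block_bootstrap_field(field, b, rng):
    """One bootstrap copy of a 2-d array: tile it with b x b blocks
    drawn with replacement from the observed non-overlapping blocks
    (a simple variant of the spatial block bootstrap)."""
    n1, n2 = len(field), len(field[0])
    k1, k2 = n1 // b, n2 // b               # complete blocks per axis
    corners = [(i * b, j * b) for i in range(k1) for j in range(k2)]
    out = [[0.0] * (k2 * b) for _ in range(k1 * b)]
    for bi in range(k1):
        for bj in range(k2):
            ci, cj = rng.choice(corners)     # resample a block position
            for u in range(b):
                for v in range(b):
                    out[bi * b + u][bj * b + v] = field[ci + u][cj + v]
    return out

def bartlett_factor(r_boot, p):
    """Scale factor p / rbar from M bootstrap log EL ratios r_boot; the
    corrected region is {theta : factor * r_n(theta) <= chi2_{p;1-a}}."""
    rbar = sum(r_boot) / float(len(r_boot))
    return p / rbar
```

In practice each bootstrap field would be fed through the full EL pipeline (block means, MELE, log-ratio) to produce one replicate rn*i before averaging.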
If θ = (θ_1′, θ_2′)′ with interest on θ_1 ∈ R^q, treating θ_2 ∈ R^{p−q} as a nuisance parameter as in Theorem 3, we take a Bartlett-corrected confidence region for θ_1 as {θ_1 : (q/r̄_n^*) r_n(θ_1) ≤ χ²_{q;1−α}} with respect to r_n(θ_1) = ℓ_n(θ_1) − ℓ_n(θ̂_{1n}) and r̄_n^* based on r_n^{*i}(θ̂_{1n}) = ℓ_n^{*i}(θ̂_{1n}) − ℓ_n^{*i}(θ̂_{1n}^{*i}). Under the smooth function model in Theorem 1, the same algorithm applies for the EL ratio r_n(θ_{0H}) ≡ ℓ_n(θ_{0H}), with ℓ_n(θ̂_n) = 0 in this case.

As an alternative to the Bartlett correction, another option would be to calibrate confidence regions for the log-EL ratio r_n(θ) using sample quantiles from the M bootstrap replicates r_n^{*i}(θ̂_n). The Bartlett correction involves estimating the mean of r_n(θ_0) at the true parameter θ_0, while the bootstrap calibration aims to approximate extreme quantiles of the distribution of r_n(θ_0). Intuitively, mean estimation is a more robust task and may require fewer bootstrap replicates M for adequate estimates; simulation studies with independent data in Chen and Cui (2007) appear to suggest this as well. For this reason, we concentrate our numerical studies in Section 5 on the Bartlett correction.

For completeness, we describe a spatial block bootstrap method for generating a bootstrap version of Y_n on R_{n,Y} in Section 8.5 of the online Appendix. The bootstrap involves spatial blocks determined by a block scaling factor b_{n,bt}, satisfying b_{n,bt}^{-1} + b_{n,bt}^d/n_Y = o(1). The bootstrap scaling b_{n,bt} may differ from the EL block scaling b_n and might be expected to be larger than b_n in many cases.

5 Numerical study

We conducted a simulation study to compare OL and NOL versions of the blockwise EL method, and to examine the Bartlett correction algorithm for inference on the mean E(Z_s) = θ of a real-valued spatial process Z_s, s ∈ Z², on the integer grid.
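Before presenting the study, we note that the spatial block bootstrap behind the Bartlett correction can be sketched in a bare-bones form for a complete rectangular region; this is an illustrative assumption-laden sketch only, as the Section 8.5 method handles general sampling regions and block placement more carefully:

```python
import numpy as np

def spatial_block_bootstrap(Z, b, rng=None):
    """One bootstrap rendition of lattice data Z (an n1 x n2 array) built
    by tiling the region with independently resampled b x b blocks drawn
    uniformly from positions fully inside Z."""
    rng = np.random.default_rng(rng)
    n1, n2 = Z.shape
    out = np.empty_like(Z)
    for i in range(0, n1, b):
        for j in range(0, n2, b):
            si = rng.integers(0, n1 - b + 1)          # random block origin
            sj = rng.integers(0, n2 - b + 1)
            h, w = min(b, n1 - i), min(b, n2 - j)     # trim at the region edge
            out[i:i + h, j:j + w] = Z[si:si + h, sj:sj + w]
    return out
```

Within-block spatial dependence is preserved exactly, while dependence across tiled blocks is broken, which is why the bootstrap block scaling b_{n,bt} must grow with the region.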
Sampling regions R_n = λ_n(−1/2, 1/2]² ⊂ R² of different sizes were considered with λ_n = 10, 20, 30; a fourth region was taken as R_n = (−5, 5] × (−15, 15]. We used the circulant embedding method of Chan and Wood (1997) to generate real-valued mean-zero Gaussian random fields on Z² with an exponential or Gaussian covariance structure:

Cov(Z_s, Z_{s+h}) = exp[−β_1|h_1| − β_2|h_2|]   (model E(β_1, β_2)),
Cov(Z_s, Z_{s+h}) = exp[−β_1|h_1|² − β_2|h_2|²]   (model G(β_1, β_2)),   h = (h_1, h_2)′ ∈ Z²,

with values (β_1, β_2) = (0.8, 0.8) and (0.4, 0.2). Using Y_s = Z_s and G_θ(Y_s) = Z_s − θ in (1), we calculated approximate two-sided 90% EL intervals for θ as {θ : r_n(θ) ≤ χ²_{1;0.9}} using OL/NOL blocks of length b_n = Cn^{1/5}, C = 1, 2, where n = |R_n ∩ Z²| and r_n(θ) = ℓ_n(θ); note ℓ_n(θ̂_n) = 0 here for the mean and n_Y = n. This order of the EL block factor was intuitively chosen to be smaller than the optimal order O(n^{1/(d+2)}) known for spatial subsampling variance estimation when d = 2 (Sherman (1996)); EL block scaling b_n is discussed further in Section 7. Using the algorithm from Section 4, Bartlett-corrected EL intervals were also computed using M = 1000 Monte Carlo approximations and bootstrap block sizes b_{n,bt} = n^{1/4}, n^{1/3}. Additionally, for comparison to EL intervals, normal approximation intervals for θ were taken as Z̄_n ± 1.645 S_n using the sample mean Z̄_n over R_n and a spatial subsampling variance estimator S_n² of Var(Z̄_n) based on a plug-in estimate of its optimal block size, with pilot block sizes n^{1/(2+2i)}, i = 1, 2; see Nordman and Lahiri (2004).

Table 1 provides summaries of the coverage accuracies and interval lengths for the EL method based on 1000 simulation runs for each sampling region and covariance structure; Table 2 provides the same for the subsampling-based intervals. The Bartlett correction appears to provide large improvements in the EL intervals across a variety of dependence structures. From the results, we make the following observations:

1.
The coverage accuracies of the intervals often improved, and the interval lengths decreased, as the strength of the underlying spatial dependence decreased and the size of the sampling region increased.

2. Coverage probabilities of uncorrected EL and normal approximation intervals were similar, and often far below the nominal level. This agrees with other simulation results for EL with independent data, in which uncorrected EL intervals often appeared too narrow (DiCiccio, Hall and Romano (1991), Chen and Cui (2007)).

3. Bartlett-corrected EL intervals based on NOL and OL blocks were generally competitive and had coverage accuracies that were much closer to the nominal level than uncorrected intervals. The NOL block version typically performed better with shorter blocks b_n.

Table 1: Coverage probabilities for approximate two-sided 90% EL confidence intervals for the process mean, with expected interval lengths, based on OL/NOL blocks of length b_n; below, UC, BC3, BC4 denote uncorrected and Bartlett-corrected intervals based on bootstrap blocks b_{n,bt} = n^{1/3}, n^{1/4}, respectively, and n_1 × n_2 denotes the size of the sampling region with n = n_1 n_2.
                         b_n = n^{1/5}                          b_n = 2n^{1/5}
                   NOL                 OL                 NOL                 OL
              UC   BC3   BC4     UC   BC3   BC4      UC   BC3   BC4     UC   BC3   BC4
E(0.4,0.2)
10×10       39.6  68.1  76.6   38.7  60.9  65.7    47.9  68.8  68.8   42.4  76.5  74.4
            0.57  1.11  1.54   0.57  0.96  1.12    0.63  1.08  1.08   0.58  1.31  1.28
20×20       41.8  69.3  77.2   42.0  61.4  73.0    39.6  75.1  62.3   51.0  79.3  74.2
            0.35  0.65  0.90   0.36  0.54  0.72    0.42  0.98  0.74   0.51  0.91  0.82
30×30       50.9  72.6  77.6   50.0  68.0  79.5    55.2  95.7  95.4   58.8  78.6  77.1
            0.30  0.48  0.55   0.30  0.44  0.59    0.40  1.21  1.25   0.41  0.63  0.63
10×30       42.3  75.6  80.2   40.4  58.3  69.9    48.0  80.3  80.3   50.1  80.0  90.5
            0.40  0.91  1.28   0.40  0.61  0.83    0.49  1.21  1.21   0.53  1.02  1.47
E(0.8,0.8)
10×10       64.6  88.8  84.9   63.8  81.5  78.7    62.8  83.6  83.6   55.2  91.3  89.0
            0.49  0.98  1.06   0.50  0.76  0.76    0.47  0.81  0.81   0.46  1.06  1.00
20×20       69.4  86.9  83.6   71.4  83.6  85.4    53.2  82.7  71.6   70.8  86.3  81.1
            0.28  0.42  0.49   0.28  0.37  0.41    0.28  0.62  0.48   0.32  0.52  0.46
30×30       74.9  87.8  85.4   73.6  85.6  88.1    73.7  98.4  98.0   76.7  89.3  87.9
            0.21  0.28  0.27   0.21  0.27  0.29    0.22  0.61  0.60   0.24  0.33  0.31
10×30       71.9  92.2  87.8   70.5  82.3  84.5    62.7  91.3  91.0   70.5  89.2  92.4
            0.32  0.56  0.63   0.32  0.43  0.47    0.33  0.81  0.81   0.36  0.61  0.82
G(0.4,0.2)
10×10       64.3  89.7  90.4   62.6  86.0  83.6    62.5  80.1  80.1   56.1  87.9  86.4
            0.63  1.27  1.55   0.64  1.08  1.09    0.60  1.02  1.02   0.58  1.30  1.26
20×20       66.7  89.0  86.1   67.6  83.4  85.0    54.3  86.1  77.5   70.1  91.9  85.8
            0.35  0.57  0.65   0.35  0.49  0.55    0.34  0.80  0.63   0.41  0.70  0.60
30×30       76.3  90.4  87.5   76.6  87.1  89.6    71.4  98.6  98.3   74.9  89.8  86.5
            0.27  0.38  0.36   0.27  0.35  0.39    0.29  0.81  0.78   0.30  0.43  0.40
10×30       70.6  92.9  87.6   69.1  83.6  85.2    59.8  91.3  91.0   69.0  92.4  94.9
            0.40  0.77  0.85   0.40  0.57  0.63    0.42  1.04  1.04   0.46  0.84  1.15

Table 2: Coverage probabilities and expected interval lengths for approximate two-sided 90% confidence intervals for the process mean based on a normal approximation with a subsampling variance estimator. Sampling region R_n sizes noted by n_1 × n_2.
                          E(β_1, β_2)                        G(β_1, β_2)
(β_1, β_2)     10×10   20×20   30×30   10×30      10×10   20×20   30×30   10×30
(0.4,0.2)       44.2    50.9    64.0    53.9       70.4    71.9    81.4    76.0
                0.59    0.40    0.38    0.50       0.64    0.36    0.28    0.45
(0.8,0.8)       69.0    77.9    79.7    76.7       80.4    81.9    86.0    82.5
                0.50    0.30    0.22    0.34       0.49    0.27    0.19    0.31

4. Under spatial dependence E(0.4,0.2), the Bartlett-corrected EL intervals were most sensitive to the EL and bootstrap block sizes. In this case, larger blocks seemed preferable to capture the stronger dependence structure.

Repeating the simulation with M = 500 or 250 bootstrap renditions did not change the results significantly, suggesting an adequate Bartlett correction may also be possible with fewer spatial bootstrap replicates.

6 Data example: cancer mortality map

The spatial EL method was applied to the cancer mortality map shown in Figure 1(a), constructed using mortality rates from liver and gallbladder cancer in white males during 1950-1959. Sherman and Carlstein (1994) considered these data for applying subsampling. We use their division of high and low mortality rates for illustration purposes, recognizing that the map's binary nature discards useful information relevant to the underlying scientific problem. The sampling region R_n in Figure 1(a) contains 2298 sites on a portion of the integer grid (0, 66] × (0, 58] ∩ Z². For a given site s ∈ Z², we code Z_s = 0 or 1 to indicate a low or high mortality rate, and let S_s = Σ_{h∈N_s} Z_h denote the sum of indicators Z_h over the four nearest neighbors N_s = {h ∈ Z² : ‖s − h‖ = 1} of site s.
To test whether incidences of high cancer mortality exhibit clumping, Sherman and Carlstein proposed examining the spatial dependence parameter β of an autologistic model of the type introduced by Besag (1974). That is, suppose the binary process Z_s, s ∈ Z², was generated by the conditional model, with parameters θ = (α, β)′, written as

f_θ(z | {Z_h : h ≠ s}) = P_θ(Z_s = z | {Z_h : h ∈ N_s}) = exp[z(α + βS_s)] / (1 + exp[α + βS_s]),   z = 0, 1.   (7)

Positive values of β suggest a tendency for clustering, while β = 0 implies no clustering among sites.

[Figure 3: Spatial log-EL ratio r_n(β) for β, and a Bartlett-corrected version r_n(β)/r̄_n^* for various block lengths b_n. Horizontal lines indicate the chi-square quantile χ²_{1;0.95}, and approximate 95% confidence intervals for β appear in brackets; MELEs β̂_n, α̂_n are given for each b_n.]
Sherman and Carlstein set a normal-theory confidence interval for β based on the pseudo-likelihood estimate β̂_n^{PL} and a spatial subsampling variance estimate for Var(β̂_n^{PL}). The spatial EL may be applied to investigate evidence of clumping without a variance estimation step. For this, we use pseudo-likelihood-type estimating functions as described in Example 3 of Section 2.1. For θ = (α, β)′ in (7), we consider the vector process Y_s of dimension m = 5, formed by Z_s and its four nearest neighbors Z_h, h ∈ N_s, along with r = p = 2 estimating functions G_θ(Y_s) = ∂ log f_θ(Z_s | {Z_h : h ∈ N_s})/∂θ based on (7). Figure 1(b) shows the sampling region R_{5,n} of these Y_s-observations. Treating α as a nuisance parameter, we obtain a profile log-EL ratio r_n(β) = ℓ_n(β) − ℓ_n(β̂_n) for each β value, where ℓ_n(β) = ℓ_n(β, α̂_n^{(β)}), α̂_n^{(β)} = argmax_α R_n(β, α), and β̂_n is the MELE for β. For various block choices b_n, we computed the MELEs θ̂_n = (α̂_n, β̂_n)′ and, by Theorem 3, calibrated approximate 95% confidence intervals for β based on a χ²_1 distribution for r_n(β).

Figure 3 shows the log-EL ratio r_n(β), MELEs, and corresponding approximate 95% confidence intervals for β with and without Bartlett corrections for each block size used. The Bartlett correction factor r̄_n^* was computed based on M = 1000 bootstrap renditions of the R_{5,n} data and a block factor b_{n,bt} = 6. As in the simulation study of Section 5, Bartlett-corrected EL intervals for β are notably wider than their uncorrected counterparts. EL intervals for β suggest clustering, but these are shifted much closer to zero compared to Sherman and Carlstein's subsampling-based 95% confidence interval (0.2185, 0.6183) (after re-parameterization there). In comparison, the EL method gives a slightly moderated interpretation of clustering. Of additional note, the behavior of the EL intervals in Figure 3 also suggests a visual way of selecting a block size for the EL method; this is described in the next section.
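For concreteness, the nearest-neighbour sums S_s and the pseudo-likelihood score G_θ(Y_s) from (7) admit a short sketch; this is illustrative only, with boundary sites and the block construction ignored:

```python
import numpy as np

def neighbour_sums(Z):
    """Four-nearest-neighbour sums S_s for interior sites of a 0/1 lattice
    Z; boundary sites are left at zero in this sketch."""
    S = np.zeros(Z.shape)
    S[1:-1, 1:-1] = (Z[:-2, 1:-1] + Z[2:, 1:-1] +
                     Z[1:-1, :-2] + Z[1:-1, 2:])
    return S

def autologistic_score(z, S, alpha, beta):
    """Pseudo-likelihood estimating function G_theta(Y_s): the gradient in
    theta = (alpha, beta) of log f_theta(z | neighbours) from model (7).
    For the logistic form this reduces to (z - p) * (1, S), with
    p = 1 / (1 + exp(-(alpha + beta * S)))."""
    p = 1.0 / (1.0 + np.exp(-(alpha + beta * S)))
    return np.array([z - p, (z - p) * S])
```

Evaluating this score over the sites of R_{5,n} yields exactly the r = 2 estimating functions fed into the blockwise EL ratio for (α, β).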
7 Spatial empirical likelihood block scaling

The spatial EL proposed in this article involves a block condition (3), stating that the spatial sample size n_Y for R_{n,Y} must be of larger order than the squared number b_n^{2d} of observations in a spatial block. This appears also to be necessary for the results presented previously. To see why, note that, from Theorem 2, the exact order of the EL Lagrange multiplier t_{θ̂_n} is O_p(b_n^d/n_Y^{1/2}), which is also the order of t_{θ_0} at the true parameter θ_0. Under the EL moment condition (1), we expect t_{θ_0} to converge to zero in probability (requiring b_n^d/n_Y^{1/2} → 0) as the sample size increases, so that the EL block probabilities p_{θ_0,i} from (5) become close to the probabilities 1/N_I maximizing the EL function. Hence, (3) may represent the weakest possible requirement on the blocks.

Potential EL block scaling in R^d can involve b_n = Cn_Y^κ, for some C > 0 and 0 < κ < 1/(2d), although the best EL block orders for coverage accuracy are presently unknown for any d. With some time series block resampling methods, MSE-optimal blocks for distribution estimation are usually smaller than optimal blocks for variance estimation (Lahiri (2003a)). This motivated the choice b_n = Cn_Y^{1/5} in the simulation study of Section 5, so as to be smaller than the optimal block order O(n_Y^{1/4}) known for subsampling variance estimation when d = 2 (Sherman (1996)). This order choice of κ = 1/5 is also a compromise between the optimal block orders κ ∈ [1/6, 1/4] for some R²-subsampling distribution estimators studied by Garcia-Soidan and Hall (1997).

In practice, EL block sizes might be chosen by the "minimum volatility" method, described by Politis, Romano and Wolf (1999) (Section 9.3.2) for time series subsampling. The method is heuristic and based on the idea that, while some block sizes b_n may be too large or small, we might expect to find a range of b_n-values yielding approximately correct inference.
In this range, confidence regions should be stable as a function of the block size. Hence, by creating EL confidence regions over a range of block sizes, an appropriate block size could be chosen by visual inspection. For illustration, we consider the EL confidence intervals in Figure 3 from the mortality map example. The apparent stability of these intervals over b_n = 6, 8, 10 seems to indicate that these block choices are reasonable for applying the EL method.

Acknowledgments

The author wishes to thank an associate editor and two referees for constructive comments that improved an earlier version of the paper, as well as Mark Kaiser for helpful discussions.

References

Aitchison, J. and Silvey, S. D. (1958). Maximum-likelihood estimation of parameters subject to restraints. Ann. Math. Statist. 29, 813-828.

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. R. Stat. Soc. B 36, 192-236.

Besag, J. (1975). Statistical analysis of non-lattice data. The Statistician 24, 179-195.

Bravo, F. (2005). Blockwise empirical entropy tests for time series regressions. J. Time Ser. Anal. 26, 185-210.

Chan, G. and Wood, A. T. A. (1997). An algorithm for simulating stationary Gaussian random fields. Applied Statistics 46, 171-181.

Chen, S. X. and Cui, H.-J. (2006). On Bartlett correction of empirical likelihood in the presence of nuisance parameters. Biometrika 93, 215-220.

Chen, S. X. and Cui, H.-J. (2007). On the second order properties of empirical likelihood with moment restrictions. Journal of Econometrics (to appear).

Chen, S. X., Härdle, W. and Li, M. (2003). An empirical likelihood goodness-of-fit test for time series. J. R. Stat. Soc. B 65, 663-678.

Chuang, C. and Chan, N. H. (2002). Empirical likelihood for autoregressive models, with applications to unstable time series. Statist. Sinica 12, 387-407.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press, N.J.

Cressie, N. (1993).
Statistics for Spatial Data, 2nd Edition. John Wiley & Sons, New York.

DiCiccio, T., Hall, P. and Romano, J. P. (1991). Empirical likelihood is Bartlett-correctable. Ann. Statist. 19, 1053-1061.

Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics 85. Springer, New York.

Garcia-Soidan, P. H. and Hall, P. (1997). In sample reuse methods for spatial data. Biometrics 53, 273-281.

Hall, P. and La Scala, B. (1990). Methodology and algorithms of empirical likelihood. Internat. Statist. Rev. 58, 109-127.

Kitamura, Y. (1997). Empirical likelihood methods with weakly dependent processes. Ann. Statist. 25, 2084-2102.

Kitamura, Y., Tripathi, G. and Ahn, H. (2004). Empirical likelihood-based inference in conditional moment restriction models. Econometrica 72, 1667-1714.

Lahiri, S. N. (2003a). Resampling Methods for Dependent Data. Springer, New York.

Lahiri, S. N. (2003b). Central limit theorems for weighted sums of a spatial process under a class of stochastic and fixed designs. Sankhya 65, 356-388.

Lee, Y. D. and Lahiri, S. N. (2002). Least squares variogram fitting by spatial subsampling. J. R. Stat. Soc. B 64, 837-854.

Lin, L. and Zhang, R. (2001). Blockwise empirical Euclidean likelihood for weakly dependent processes. Statist. Probab. Lett. 53, 143-152.

Monti, A. C. (1997). Empirical likelihood confidence regions in time series models. Biometrika 84, 395-405.

Newey, W. K. and Smith, R. J. (2004). Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 72, 219-255.

Nordman, D. J. and Lahiri, S. N. (2004). On optimal spatial subsample size for variance estimation. Ann. Statist. 32, 1981-2027.

Nordman, D. J. and Lahiri, S. N. (2006). A frequency domain empirical likelihood for short- and long-range dependence. Ann. Statist. 34, 3019-3050.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237-249.

Owen, A. B. (1990).
Empirical likelihood confidence regions. Ann. Statist. 18, 90-120.

Politis, D. N., Romano, J. P. and Wolf, M. (1999). Subsampling. Springer, New York.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22, 300-325.

Qin, J. and Lawless, J. (1995). Estimating equations, empirical likelihood and constraints on parameters. Canad. J. Statist. 23, 145-159.

Sherman, M. (1996). Variance estimation for statistics computed from spatial lattice data. J. R. Stat. Soc. B 58, 509-523.

Sherman, M. and Carlstein, E. (1994). Nonparametric estimation of the moments of a general statistic computed from spatial data. J. Amer. Statist. Assoc. 89, 496-500.

Zhang, J. (2006). Empirical likelihood for NA series. Statist. Probab. Lett. 76, 153-160.

8 Appendix

Section 8.1 details the spatial mixing and moment conditions used to establish the main results of the manuscript. Section 8.2 provides some technical lemmas to facilitate the proofs of the main results, which are presented in Section 8.3. In Section 8.4, we describe a further result on EL inference under parameter constraints. Section 8.5 describes the spatial bootstrap method used to implement the spatial EL Bartlett correction from Section 4 of the manuscript.

8.1 Assumptions

To establish the main results on the spatial EL, we require assumptions on the spatial process and the potential vector G_θ of estimating functions. Recall that we may collect observations from the real-valued, strictly stationary spatial process {Z_s : s ∈ Z^d} into m-dimensional vectors Y_s = (Z_{s+h_1}, Z_{s+h_2}, . . . , Z_{s+h_m})′, s ∈ Z^d, using fixed lag vectors h_1, h_2, . . . , h_m ∈ Z^d for a positive integer m ≥ 1.
Recall that R_n = λ_n R_0 ⊂ R^d denotes the sampling region for the process {Z_s : s ∈ Z^d} and R_{n,Y} is the sampling region of the observed Y_s, s ∈ Z^d, where n = |R_n ∩ Z^d| and n_Y = |R_{n,Y} ∩ Z^d| are the sample sizes for each region.

We first outline some notation. For A ⊂ R^d, denote the Lebesgue volume of an uncountable set A as vol(A) and the cardinality of a countable set A as |A|. Limits in order symbols are taken letting n → ∞ and, for two positive sequences, we write s_n ∼ t_n if s_n/t_n → 1. For a vector x = (x_1, . . . , x_d)′ ∈ R^d, let ‖x‖ and ‖x‖_∞ = max_{1≤i≤d} |x_i| denote the Euclidean and l_∞ norms of x, respectively. Define the distance between two sets E_1, E_2 ⊂ R^d as dis(E_1, E_2) = inf{‖x − y‖_∞ : x ∈ E_1, y ∈ E_2}. Let F_Y(T) denote the σ-field generated by the random vectors {Y_s : s ∈ T}, T ⊂ Z^d, and define the strong mixing coefficient for the strictly stationary random field {Y_s : s ∈ Z^d} as

α_Y(v, w) = sup{α̃_Y(T_1, T_2) : T_i ⊂ Z^d, |T_i| ≤ w, i = 1, 2; dis(T_1, T_2) ≥ v},   v, w > 0,   (8)

where α̃_Y(T_1, T_2) = sup{|P(A ∩ B) − P(A)P(B)| : A ∈ F_Y(T_1), B ∈ F_Y(T_2)}. In the following assumptions, let θ_0 denote the unique parameter value which satisfies (1). Throughout the sequel, we use C to denote a generic positive constant that does not depend on n or any Z^d points and may vary from instance to instance.

Assumptions

1. As n → ∞, b_n^{-1} + (b_n^2/λ_n)^d = o(1) and, for any positive real sequence a_n → 0, the number of cubes of a_n Z^d which intersect the closures of R_0 and R^d \ R_0 is O(a_n^{-(d-1)}).

2. There exist nonnegative functions α_1(·) and q(·) such that α_1(v) → 0 as v → ∞ and α_Y(v, w) ≤ α_1(v)q(w), v, w > 0. The non-decreasing function q(·) is bounded for the time series case d = 1, but may be unbounded, q(w) → ∞ as w → ∞, for d ≥ 2.

3. For some 0 < δ ≤ 1, 0 < κ < (5d − 1)(6 + δ)/(dδ) and C > 0, it holds that E{‖G_{θ_0}(Y_s)‖^{6+δ}} < ∞, Σ_{v=1}^∞ v^{5d−1} α_1(v)^{δ/(6+δ)} < ∞ and q(w) ≤ Cw^κ, w ≥ 1.

4. The r × r matrix Σ_{θ_0} =
Σ_{h∈Z^d} Cov{G_{θ_0}(Y_s), G_{θ_0}(Y_{s+h})} is positive definite.

To avoid pathological sampling regions, the boundary condition on R_0 in Assumption 1 implies that the number of Z^d lattice points near the boundary of R_n = λ_n R_0 is of smaller order, O(λ_n^{d−1}), than the volume of the sampling region R_n. Lahiri (2003a, p. 283) also describes this boundary condition, which is satisfied for most practical sampling regions. As a consequence, the number n of Z_s-sampling sites (i.e., Z^d points) contained in R_n, as well as the number n_Y of Y_s-sampling sites in R_{n,Y}, is asymptotically equivalent to the volume of R_n:

n = |R_n ∩ Z^d| ∼ vol(R_n) = λ_n^d vol(R_0),   n_Y = |R_{n,Y} ∩ Z^d| ∼ λ_n^d vol(R_0).

The growth rate of the spatial block factor b_n in Assumption 1 represents a spatial extension of scaling conditions used for the blockwise EL for time series (d = 1) in Kitamura (1997); this is equivalent to the block condition (3). Additionally, the boundary condition on R_0 allows the number of blocks to be quantified under different EL blocking schemes; see Lemma 2(i) of Section 8.2 for illustration.

Assumption 2 describes a mild bound on the mixing coefficient from (8), with growth rates set in Assumption 3. These mixing assumptions permit moment bounds and a central limit theorem to be applied to sample means of the form Ḡ_n = Σ_{s∈R_{n,Y}∩Z^d} G_{θ_0}(Y_s)/n_Y (Lahiri, 2003b); Lemma 1 in Section 8.2 illustrates such moment bounds. The conditions on the mixing coefficient (8) in Assumptions 2-3 apply to many weakly dependent random fields, including certain linear fields with a moving average representation, Gaussian fields with analytic spectral densities, Markov random fields, as well as various time series; see Doukhan (1994). For d > 1, we allow (8) to become unbounded in w, which is important in the spatial case to avoid a more restrictive form of mixing; see Lahiri (2003a, p. 295). Assumption 4 implies that the limiting variance Σ_{θ_0} = lim_{n→∞} n_Y Var(Ḡ_n) is positive definite.
8.2 Preliminary results for main proofs

Lemma 1 gives moment bounds based on Doukhan (1994, p. 9, 26), while Lemma 2 provides some important distributional results for proving the main EL results. In particular, parts (ii) and (iii) of Lemma 2 entail that, at the true parameter value θ_0, spatial block sample means M_{θ_0,i}, i ∈ I_n, from the EL construction (4) can be combined to produce normally distributed averages or consistent variance estimators. Parts (iv)-(vi) of this lemma are used to prove that, in a neighborhood of θ_0, the EL ratio R_n(θ) from (4) can be finitely computed and also that a sequence θ̂_n of maximizers of R_n(θ) (i.e., the maximum EL estimator) must exist in probability. Lemma 3 establishes the distribution of the spatial log-EL ratio at the true parameter value θ_0. Proofs of Lemmas 2 and 3 appear subsequently.

Lemma 1. (i) Suppose a random variable X_i is measurable with respect to F_Y(T_i) for bounded T_i ⊂ Z^d, i = 1, 2, and let s, t > 0, 1/s + 1/t < 1. If dis(T_1, T_2) > 0 and the expectations are finite, then |Cov(X_1, X_2)| ≤ 8{E(|X_1|^s)}^{1/s}{E(|X_2|^t)}^{1/t} α_Y(dis(T_1, T_2); max_{i=1,2} |T_i|)^{1−1/s−1/t}.
(ii) Under Assumptions 2-3, for any real 1 ≤ k ≤ 6 and T ⊂ Z^d, it holds that E{‖Σ_{s∈T} G̃_{θ_0}(Y_s)‖^k} ≤ C|T|^{k/2}, where G̃_{θ_0}(Y_s) = G_{θ_0}(Y_s) − E{G_{θ_0}(Y_s)}.

Lemma 2. Let I_n = I_{b_n}^{OL} or I_{b_n}^{NOL} and N_I = |I_n|. Under Assumptions 1-4,
(i) |I_{b_n}^{OL}| ∼ vol(R_{n,Y}), n_Y ∼ vol(R_{n,Y}), |I_{b_n}^{NOL}| ∼ vol(R_{n,Y})/b_n^d and vol(R_{n,Y}) ∼ vol(R_n) = λ_n^d vol(R_0);
(ii) n_Y^{1/2} M̄_{θ_0} →_d N(0_r, Σ_{θ_0}), where M̄_{θ_0} ≡ Σ_{i∈I_n} M_{θ_0,i}/N_I;
(iii) Σ̂_{θ_0} ≡ b_n^d Σ_{i∈I_n} M_{θ_0,i} M_{θ_0,i}′/N_I →_p Σ_{θ_0}, with Σ_{θ_0} from Assumption 4;
(iv) P(R_n(θ_0) > 0) → 1;
(v) max_{i∈I_n} ‖M_{θ_0,i}‖ = O_p(b_n^{−d} n_Y^{5/12});
(vi) P(inf_{v∈R^r, ‖v‖=1} N_I^{-1} Σ_{i∈I_n} b_n^{d/2} v′M_{θ_0,i} I(v′M_{θ_0,i} > 0) > C) → 1 for some C > 0, letting I(·) denote the indicator function.

Lemma 3. Under Assumptions 1-4 and I_n = I_{b_n}^{OL} or I_{b_n}^{NOL}, it holds in (6) that ℓ_n(θ_0) →_d χ²_r.

Proof of Lemma 2.
Assumption 1 yields part(i) of the lemma. We shall sketch the proof for vol(Rn,Y ) and the number |IbOL | of OL blocks; the remaining cases follow similarly and n more details on counting results can be found in Nordman and Lahiri (2004). For a positive integer j, define Jn (j) = {i ∈ Zd : (i + j[−1, 1]d ) ∩ Rn 6= ∅, (i + j[−1, 1]d ) ∩ (Rd \ Rn ) 6= ∅}, where again Rn = λn R0 , and note that for an = j/λn ¯© ª¯ |Jn (j)| ≤ (2j + 1)d ¯ i ∈ an Zd : cube i + an [−1, 1]d intersects both R0 and Rd \ R0 ¯ = (2j + 1)d O(a−(d−1) ) = O(jλd−1 n n ) (9) by the R0 -boundary condition in Assumption 1. The bound in (9) also holds if we replace a fixed integer j by the sequence of block factors bn (i.e., replace j, Jn (j) with bn , Jn (bn )). Recall that Rn,Y = {s ∈ Rn : s + h1 , . . . , s + hm ∈ Rn } ⊂ Rd is defined with respect to m d fixed lags {hi }m i=1 ⊂ Z . Let h = max1≤i≤m khi k∞ and note that vol(Rn ) − vol(Rn \ R∗n,Y ) = vol(R∗n,Y ) ≤ vol(Rn,Y ) ≤ vol(Rn ) 4 where R∗n,Y = {s ∈ Rn : s + h[−1, 1]d ⊂ Rn }. Then, for fixed h by (9), we find vol(Rn \ d R∗n,Y ) ≤ (2h)d |Jn (h)| = O(λd−1 n ) so that vol(Rn,Y ) ∼ vol(Rn ) = λn vol(R0 ) follows. Likewise, ¯ ¯ n = |Zd ∩ Rn | ∼ vol(Rn ) holds from ¯n − vol(Rn )¯ ≤ 2d |Jn (1)| and then |IbOL | ∼ vol(Rn ) n follows from n − |Jn (bn )| ≤ |IbOL | ≤ n and |Jn (bn )| = O(bn λd−1 ) = o(vol(Rn )). n To prove parts of Lemma 2(ii) and (iii), we treat only the OL block case In = IbOL ; n the NOL case follows similarly and we shall describe the modifications required for handling P NOL blocks. Defining the overall sample mean Ḡn ≡ n−1 Y s∈Rn,Y ∩Zd Gθ0 (Ys ), it holds that d 1/2 nY Ḡn −→ N (0r , Σθ0 ) under Assumptions 1-3 by applying the spatial central limit theorem result in Theorem 4.2 of Lahiri (2003b). 
Now define a scaled difference between Ḡn and the average of block sample means M̄θ0 as X −1 An ≡ Ḡn − n−1 Y NI M̄θ0 = nY ws Gθ0 (Ys ), s∈Rn,Y ∩Zd where the last representation uses weights ws ∈ [0, 1] for each s ∈ Rn,Y ∩ Zd where d OL ws = 1 − b−d n × “# of OL blocks among {Bbn (i) ≡ i + bn (−1/2, 1/2] : i ∈ Ibn } containing s”. Because ws = 0 if s + bn [−1, 1]d ⊂ Rn,Y , it holds that |{s ∈ Rn,Y ∩ Zd : ws 6= 0}| ≤ |Jn (bn )| ≤ Cbn λd−1 from (9) and Rn,Y ⊂ Rn . Consequently, letting 0 ∈ Zd denote the zero vector, we n have d nY E(A2n ) ≤ n−1 Y |{s ∈ Rn,Y ∩ Z : ws 6= 0}| X kCov{Gθ0 (Y0 ), Gθ0 (Yh )}k h∈Zd −1 ≤ Cbn λd−1 n nY = O(bn /λn ) = o(1) follows from Lemma 2(i) along with X kCov(Gθ0 (Y0 ), Gθ0 (Yh ))k ≤ C ∞ X αY (v; 1)δ/(6+δ) |{h ∈ Zd : khk∞ = v}| < ∞, (10) v=1 h∈Zd ,h6=0 which holds by Lemma 1 with Assumptions 2-3 and |{h ∈ Zd : khk∞ = v}| ≤ 2(2v + 1)d−1 , 1/2 p v ≥ 1. Hence, in the OL block case, nY An −→ 0 and part(ii) follows from the normal limit 1/2 of nY Ḡn along with Slutsky’s theorem and n−1 Y NI → 1 for OL blocks by Lemma 2(i). (In the P −1 d NOL block case, we define a difference An ≡ Ḡn − n−1 Y bn NI M̄θ0 = nY s∈Rn,Y ∩Zd ws Gθ0 (Ys ), where weight ws = 1 if site s ∈ Rn,Y ∩ Zd belongs to some NOL block in the collection 5 p 1/2 {Bbn (i) : i ∈ IbNnOL } and ws = 0 otherwise. Then, nY An −→ 0 holds for NOL blocks by the d same argument and part(ii) then follows by Slutsky’s theorem along with n−1 Y bn NI → 1 for NOL blocks by Lemma 2(i).) We next establish Lemma 2(iii) for OL blocks In = IbOL . Writing h = (h1 , . . . , hd )0 ∈ Zd , n note that by the Dominated Convergence Theorem and (10) we have that b θ0 ) = bd E(Mθ0 ,0 M 0 ) = b−d Var E(Σ n θ0 ,0 n X Gθ0 (Ys ) s∈Bbn (0)∩Zd = X b−d n Cov(Gθ0 (Y0 ), Gθ0 (Yh )) d Y (bn − |hi |) → Σθ0 , i=1 khk∞ ≤bn for expectation over the cube Bbn (0) = bn (−1/2, 1/2]d . Hence, for part(iii) it suffices to show b θ0 v2 ) = o(1) for any vi ∈ Rr , kvi k = 1, i = 1, 2. 
Fix v1 , v2 and expand the variance Var(v10 Σ b θ0 v2 ) = N −2 b2d Var(v10 Σ n I X |{i ∈ In : i + h ∈ In }|Cov{(v10 Mθ0 ,0 Mθ0 0 ,0 v2 ), (v10 Mθ0 ,h Mθ0 0 ,h v2 )} h∈Zd ≡ A1n + A2n by considering two sums of covariances at displacements h ∈ Zd with khk∞ ≤ bn (i.e., A1n ) or khk∞ > bn (i.e., A2n ). Then, applying the Cauchy-Schwartz inequality with Lemma 1(ii) and Assumption 3, we have for h ∈ Zd |Cov{(v10 Mθ0 ,0 Mθ0 0 ,0 v2 ), (v10 Mθ0 ,h Mθ0 0 ,h v2 )}| ≤ Var(v10 Mθ0 ,0 Mθ0 0 ,0 v2 ) ≤ E(kMθ0 ,0 k4 ) ≤ Cb−2d n , so that |A1n | ≤ CNI−1 |{h ∈ Zd : khk∞ ≤ bn }| = O(bdn /λdn ) = o(1) by Lemma 2(i) for OL blocks. For h ∈ Zd with khk∞ > bn , it holds that dis[Bbn (0), Bbn (h)] ≥ 1 so that by Assumption 3 and Lemma 1(i) (i.e., taking s = t = 3/(6 + δ) there for δ in Assumption 3), we may bound the covariance |Cov{(v10 Mθ0 ,0 Mθ0 0 ,0 v2 ), (v10 Mθ0 ,h Mθ0 0 ,h v2 )}| by the quan¢δ/(6+δ) ¡ where the moment satisfies tity C{E(kMθ0 ,0 k(12+2δ)/3 )}6/(6+δ) αY dis[Bbn (0), Bbn (h)], bdn by Lemma 1(ii). By Lemma 2(i) and Assumptions 2-3, {E(kMθ0 ,0 k(12+2δ)/3 )}6/(6+δ) ≤ Cb−2d n 6 we then bound |A2n | ≤ ≤ b2d n NI C NI C ≤ NI X ¡ ¢δ/(6+δ) {E(kMθ0 ,0 k(12+2δ)/3 )}6/(6+δ) αY dis[Bbn (0), Bbn (h)], bdn h∈Zd ,khk∞ >bn ∞ X k(k + bn )d−1 αY (k, bdn )δ/(6+δ) k=1 bn X dκδ/(6+δ) d−1 k(k + bn ) k=1 d d+1 + Cλ−d ≤ Cλ−d n bn n bn Cbn + NI ∞ X µ ¶4d−1 ∞ X k k d α1 (k)δ/(6+δ) bn k=b +1 n k 5d−1 α1 (k)δ/(6+δ) = o(1), k=bn +1 using |{h ∈ Zd : dis[Bbn (0), Bbn (h)] = k}| ≤ Ck(k +bn )d−1 , k ≥ 1, in the second inequality and substituting (k/bn )4d−1 ≥ 1 in the second sum of the third inequality. So part(iii) follows for b θ0 v2 ) = OL blocks. (We note that, in the case of NOL blocks, the above argument that Var(v10 Σ o(1) must be slightly modified. 
When In = IbNnOL and NI = |IbNnOL |, then b θ0 v) = Var(v 0 Σ X b2d n |{i ∈ In : i + bn h ∈ In }|Cov{(v10 Mθ0 ,0 Mθ0 0 ,0 v2 ), (v10 Mθ0 ,bn h Mθ0 0 ,bn h v2 )} NI d h∈Z ≡ A1n + A2n −1 0 0 where A1n = NI−1 b2d n Var(v1 Mθ0 ,bn h Mθ0 ,bn h v2 ) = O(NI ) = o(1) corresponds to the covariance sum at lag h = 0 and A2n = o(1) represents the sum of covariance terms over non-zero lags khk > 0.) In proving the remaining parts of Lemma 2, we need not make a distinction between OL or NOL blocks. To show part(iv) of Lemma 2, we will assume part(vi) holds. We argue that a contradiction arises by supposing that the event in probability statement of part(vi) holds and the zero vector 0r ∈ Rr is not interior to the convex hull of {Mθ0 ,i : i ∈ In }. If 0r is not interior, then by supporting/separating hyperplane theorem there exists some v ∈ Rr , kvk = 1 where v 0 Mθ0 ,i ≤ v 0 0r = 0 holds for all i ∈ In ; however, this contradicts the event in the probability statement of part(vi), which implies that v 0 Mθ0 ,i > 0 holds for some i ∈ In . Therefore, whenever the event in part(vi) holds, then 0r must be interior to the convex hull of {Mθ0 ,i : i ∈ In }, which implies Rn (θ0 ) > 0 by (5). Hence, part(vi) implies part(iv) of the lemma. 7 To show part(v), note E(max kMθ0 ,i k) ≤ E i∈In à X kMθ0 ,i k6 !1/6 i∈In −5/12 d bn by Lemma 1(ii) so that nY ( = X )1/6 1/6 E(kMθ0 ,i k6 ) ≤ Cbn−d/2 NI i∈In −1/4 d/2 bn ) maxi∈In kMθ0 ,i k = Op (nY −d/4 d/2 bn ) = Op (λn = op (1) by Assumption 1, Lemma 2(i) and NI ≤ nY . Finally, to establish part(vi), we employ an empirical distribution of block means F̂ (v) = P d/2 NI−1 i∈In I(bn Mθ0 ,i ≤ v), v ∈ Rr . For fixed v ∈ Rd , it holds that |F̂n (v)−P (Z ≤ v)| = op (1) where Z denotes a normal N (0r , Σθ0 ) random vector. 
This can be shown using $E\{\hat F_n(v)\} = P(b_n^{d/2}M_{\theta_0,0}\le v)\to P(Z\le v)$ under Assumptions 1-3, by applying a central limit theorem for the block sample mean $b_n^{d/2}M_{\theta_0,0}$ (Theorem 4.2, Lahiri, 2003b) and verifying $\mathrm{Var}\{\hat F_n(v)\} = o(1)$ similarly to the proof of Lemma 2(iii). Consequently, $\sup_{v\in\mathbb{R}^r}|\hat F_n(v)-P(Z\le v)| = o_p(1)$ holds by Polya's theorem and, from this and part (iii), one can prove convergence of the following absolute "half-space" moments of $\hat F_n(\cdot)$:
$$\sup_{v\in\mathbb{R}^r,\|v\| = 1}\Big|N_I^{-1}\sum_{i\in I_n}b_n^{d/2}|v'M_{\theta_0,i}|-E|v'Z|\Big| = o_p(1).$$
Using this along with $b_n^{d/2}\bar M_{\theta_0}\xrightarrow{P}0_r$ by part (ii), where $\bar M_{\theta_0} = N_I^{-1}\sum_{i\in I_n}M_{\theta_0,i}$, we have
$$\sup_{v\in\mathbb{R}^r,\|v\| = 1}\Big|N_I^{-1}\sum_{i\in I_n}b_n^{d/2}v'M_{\theta_0,i}I(v'M_{\theta_0,i}>0)-2^{-1}E|v'Z|\Big| = o_p(1)$$
because $v'M_{\theta_0,i}I(v'M_{\theta_0,i}>0) = (|v'M_{\theta_0,i}|+v'M_{\theta_0,i})/2$ for $i\in I_n$, $v\in\mathbb{R}^r$. Now part (vi) follows using the fact that $\inf_{v\in\mathbb{R}^r,\|v\| = 1}E|v'Z|\ge C$ holds for some $C>0$, since $\mathrm{Var}(Z) = \Sigma_{\theta_0}$ is positive definite by Assumption 4. $\Box$

Proof of Lemma 3. By Lemma 2(iv), a positive $R_n(\theta_0)$ exists in probability and can be written, from (5), as $R_n(\theta_0) = \prod_{i\in I_n}(1+\gamma_{\theta_0,i})^{-1}$ with $\gamma_{\theta_0,i} = t_{\theta_0}'M_{\theta_0,i}<1$, where $t_{\theta_0}\in\mathbb{R}^r$ satisfies $Q_{1n}(\theta_0,t_{\theta_0}) = 0_r$ in (15). By Lemma 2, it holds that $Z_{\theta_0}\equiv\max_{i\in I_n}\|M_{\theta_0,i}\| = o_p(b_n^{-d}n_Y^{1/2})$. We now modify an argument from Owen (1990, p. 101) by writing $t_{\theta_0} = \|t_{\theta_0}\|u_{\theta_0}$ with $u_{\theta_0}\in\mathbb{R}^r$, $\|u_{\theta_0}\| = 1$, and then expanding $Q_{1n}(\theta_0,t_{\theta_0}) = 0_r$ to find
$$\frac{n_Y^{1/2}b_n^{-d}\|t_{\theta_0}\|\,u_{\theta_0}'\hat\Sigma_{\theta_0}u_{\theta_0}}{1+(n_Y^{-1/2}b_n^{d}Z_{\theta_0})(n_Y^{1/2}b_n^{-d}\|t_{\theta_0}\|)}\le\frac{n_Y^{1/2}\|t_{\theta_0}\|}{N_I}\sum_{i\in I_n}\frac{u_{\theta_0}'M_{\theta_0,i}M_{\theta_0,i}'u_{\theta_0}}{1+\gamma_{\theta_0,i}} = n_Y^{1/2}u_{\theta_0}'\bar M_{\theta_0}-n_Y^{1/2}u_{\theta_0}'Q_{1n}(\theta_0,t_{\theta_0})\le n_Y^{1/2}\|\bar M_{\theta_0}\|\tag{11}$$
where the inequality follows upon replacing each $\gamma_{\theta_0,i}$ with $Z_{\theta_0}\|t_{\theta_0}\|$ and $u_{\theta_0}'\bar M_{\theta_0}$ with $\|\bar M_{\theta_0}\|$ and
using the definitions of $\bar M_{\theta_0}$, $\hat\Sigma_{\theta_0}$ from Lemma 2. Then combining the facts that $n_Y^{-1/2}b_n^{d}Z_{\theta_0} = o_p(1)$, that $n_Y^{1/2}\|\bar M_{\theta_0}\| = O_p(1)$ by Lemma 2(ii), and that $P(u_{\theta_0}'\hat\Sigma_{\theta_0}u_{\theta_0}>C)\to1$ for some $C>0$ by Lemma 2(iii) and Assumption 4, we deduce $\|t_{\theta_0}\| = O_p(b_n^{d}n_Y^{-1/2})$ from (11). From this, we also have $\max_{i\in I_n}|\gamma_{\theta_0,i}|\le\|t_{\theta_0}\|Z_{\theta_0} = o_p(1)$.

As $\hat\Sigma_{\theta_0}$ is positive definite in probability, we may algebraically solve $Q_{1n}(\theta_0,t_{\theta_0}) = 0_r$ for $t_{\theta_0} = b_n^{d}\hat\Sigma_{\theta_0}^{-1}\bar M_{\theta_0}+\phi_{\theta_0}$, where
$$\|\phi_{\theta_0}\|\le\frac{Z_{\theta_0}\|t_{\theta_0}\|^2\,\|\hat\Sigma_{\theta_0}^{-1}\|\,\|\hat\Sigma_{\theta_0}\|}{1-\|t_{\theta_0}\|Z_{\theta_0}} = o_p(b_n^{d}n_Y^{-1/2}).\tag{12}$$
Applying a Taylor expansion gives $\log(1+\gamma_{\theta_0,i}) = \gamma_{\theta_0,i}-\gamma_{\theta_0,i}^2/2+\Delta_i$ for each $i\in I_n$, so that
$$\ell_n(\theta_0) = 2B_n\sum_{i\in I_n}\log(1+\gamma_{\theta_0,i}) = n_Y\big(\bar M_{\theta_0}'\hat\Sigma_{\theta_0}^{-1}\bar M_{\theta_0}-b_n^{-2d}\phi_{\theta_0}'\hat\Sigma_{\theta_0}\phi_{\theta_0}\big)+2B_n\sum_{i\in I_n}\Delta_i,\tag{13}$$
where $B_n = n_Y/(b_n^{d}N_I)$. By Lemma 2(ii)-(iii), $n_Y\bar M_{\theta_0}'\hat\Sigma_{\theta_0}^{-1}\bar M_{\theta_0}\xrightarrow{d}\chi_r^2$, and it also holds that $b_n^{-2d}n_Y\,\phi_{\theta_0}'\hat\Sigma_{\theta_0}\phi_{\theta_0} = o_p(1)$ from (12). Finally, we may bound
$$2B_n\sum_{i\in I_n}|\Delta_i|\le\frac{b_n^{-2d}n_Y\,2Z_{\theta_0}\|t_{\theta_0}\|^{3}\,\|\hat\Sigma_{\theta_0}\|}{(1-Z_{\theta_0}\|t_{\theta_0}\|)^{2}} = o_p(1).\tag{14}$$
Lemma 3 then follows by Slutsky's theorem. $\Box$

8.3 Proofs of the main results

Proof of Theorem 1. In the case that $H(\theta) = \theta$ is the identity mapping, the result follows immediately from Lemma 3. From this, Theorem 1 follows for a general smooth $H(\cdot)$ as in the proof of Theorem 2.1 of Hall and La Scala (1990). $\Box$

Proof of Theorem 2. Set $\Theta_n = \{\theta\in\Theta : \|\theta-\theta_0\|\le n_Y^{-5/12}\}$, $\partial\Theta_n = \{\theta\in\Theta : \|\theta-\theta_0\| = n_Y^{-5/12}\}$, and define $\bar M_\theta = \sum_{i\in I_n}M_{\theta,i}/N_I$, $\hat\Sigma_\theta = b_n^{d}\sum_{i\in I_n}M_{\theta,i}M_{\theta,i}'/N_I$, $\theta\in\Theta_n$, and functions
$$Q_{1n}(\theta,t) = \frac{1}{N_I}\sum_{i\in I_n}\frac{M_{\theta,i}}{1+t'M_{\theta,i}},\qquad Q_{2n}(\theta,t) = \frac{b_n^{-d}}{N_I}\sum_{i\in I_n}\frac{(\partial M_{\theta,i}/\partial\theta)'t}{1+t'M_{\theta,i}}\tag{15}$$
on $\Theta\times\mathbb{R}^r$. For $j = 1,3$, set $J_{n,j} = \sum_{s\in R_{n,Y}\cap\mathbb{Z}^d}J^{j}(Y_s)/n_Y$, noting $J_{n,j} = O_p(1)$ by $EJ^3(Y_s)<\infty$; again $J(\cdot)$ is assumed to be nonnegative. To establish Theorem 2, we proceed in three steps to show that, with arbitrarily large probability as $n\to\infty$, the following hold: Step 1.
the log EL ratio $\ell_n(\theta)$ exists finitely on $\Theta_n$ and is continuously differentiable, so that a minimizer $\hat\theta_n$ of $\ell_n(\theta)$ on $\Theta_n$ exists (i.e., $\hat\theta_n$ is a maximizer of $R_n(\theta)$); Step 2. $\hat\theta_n\notin\partial\Theta_n$ and $\partial\ell_n(\theta)/\partial\theta = 0_p$ at $\theta = \hat\theta_n$; Step 3. $\hat\theta_n$ has the normal limit stated in Theorem 2.

Step 1. Note that
$$\sup_{\theta\in\Theta_n}\ \sup_{v\in\mathbb{R}^r,\|v\| = 1}\Big|N_I^{-1}\sum_{i\in I_n}\big(v'M_{\theta,i}I(v'M_{\theta,i}>0)-v'M_{\theta_0,i}I(v'M_{\theta_0,i}>0)\big)\Big|\le\sup_{\theta\in\Theta_n}\sum_{i\in I_n}\frac{\|M_{\theta,i}-M_{\theta_0,i}\|}{N_I},$$
which is bounded by $CJ_{n,1}\sup_{\theta\in\Theta_n}\|\theta-\theta_0\| = O_p(n_Y^{-5/12}) = o_p(b_n^{-d/2})$. From this and Lemma 2(vi), it holds that, for some $C>0$,
$$P\Big(\inf_{\|v\| = 1,\,\theta\in\Theta_n}b_n^{d/2}\sum_{i\in I_n}v'M_{\theta,i}I(v'M_{\theta,i}>0)/N_I>C\Big)\to1.$$
As in the proof of Lemma 2(iv), when the event in the above probability statement holds, then for any $\theta\in\Theta_n$ we may write $R_n(\theta) = \prod_{i\in I_n}(1+\gamma_{\theta,i})^{-1}>0$, where $\gamma_{\theta,i} = t_\theta'M_{\theta,i}$ and $Q_{1n}(\theta,t_\theta) = 0_r$. Let $\Omega_\theta = \max\{n_Y^{-1/2},\|\theta-\theta_0\|\}$, $\theta\in\Theta_n$. Expanding both $\bar M_\theta$ and $\hat\Sigma_\theta$ around $\theta_0$, we find
$$\sup_{\theta\in\Theta_n}\|\bar M_\theta\|/\Omega_\theta\le n_Y^{1/2}\|\bar M_{\theta_0}\|+CJ_{n,1}\sup_{\theta\in\Theta_n}\Omega_\theta^{-1}\|\theta-\theta_0\| = O_p(1),\tag{16}$$
$$\sup_{\theta\in\Theta_n}\|\hat\Sigma_\theta-\Sigma_{\theta_0}\|\le\sup_{\theta\in\Theta_n}\|\hat\Sigma_\theta-\hat\Sigma_{\theta_0}\|+\|\hat\Sigma_{\theta_0}-\Sigma_{\theta_0}\| = o_p(1),$$
by applying Lemma 2(ii)-(iii) above along with $\Omega_\theta^{-1}\le n_Y^{1/2}$ and
$$\sup_{\theta\in\Theta_n}\|\hat\Sigma_\theta-\hat\Sigma_{\theta_0}\|\le\sup_{\theta\in\Theta_n}\frac{b_n^{d}}{N_I}\sum_{i\in I_n}\|M_{\theta_0,i}\|\,\|M_{\theta_0,i}-M_{\theta,i}\|\,(1+\|M_{\theta_0,i}-M_{\theta,i}\|)\equiv A_n,$$
$$E(A_n)\le Cn_Y^{-5/12}b_n^{d}\{E[J(Y_0)^3]\}^{2/3}\big\{E(\|M_{\theta_0,0}\|^3)+[E(\|M_{\theta_0,0}\|^3)]^2\big\}^{1/3}\le Cn_Y^{-5/12}b_n^{d/2} = o(1),$$
which follows from Holder's inequality, $n_Y\sim\mathrm{vol}(R_0)\lambda_n^{d}$ by Lemma 2(i), and using Lemma 1(ii) in the last line. Hence, by the positive definiteness of $\Sigma_{\theta_0}$ in Assumption 4, $\hat\Sigma_\theta^{-1}$ exists uniformly in $\theta\in\Theta_n$. Also, the positive definiteness of $\hat\Sigma_\theta$ by (16) implies, for each fixed $\theta\in\Theta_n$, that $\partial Q_{1n}(\theta,t)/\partial t$ is negative definite for $t\in\{t\in\mathbb{R}^r : 1+t'M_{\theta,i}\ge1/N_I,\ i\in I_n\}$, so that, by the implicit function theorem applied to $Q_{1n}(\theta,t_\theta) = 0_r$, $t_\theta$ is a continuously differentiable function of $\theta$ on $\Theta_n$ and the function $\ell_n(\theta) = -2B_n\log R_n(\theta)$ is as well (e.g., Qin and Lawless, 1994, p. 304-305).
Hence, with large probability as $n\to\infty$, the minimizer of $\ell_n(\theta)$ exists on $\Theta_n$.

Step 2. Let $Z_\theta\equiv\max_{i\in I_n}\|M_{\theta,i}\|$, $\theta\in\Theta_n$. Using $b_n^{2}/\lambda_n = o(1)$ by Assumption 1, $\sup_{\theta\in\Theta_n}\Omega_\theta\le n_Y^{-5/12}$, and Lemma 2 [parts (i) and (v)], we may expand the block means $M_{\theta,i}$, $i\in I_n$, around $\theta_0$ to find
$$\sup_{\theta\in\Theta_n}\Omega_\theta b_n^{d}Z_\theta\le n_Y^{-5/12}b_n^{d}\Big(\max_{i\in I_n}\|M_{\theta_0,i}\|+\sup_{\theta\in\Theta_n}C\|\theta-\theta_0\|(n_YJ_{n,3})^{1/3}\Big)\le o_p(1)+O_p(b_n^{d}n_Y^{-1/2}) = o_p(1).\tag{17}$$
Now using (16) and (17) and that $Q_{1n}(\theta,t_\theta) = 0_r$ for $\theta\in\Theta_n$, we can repeat the same essential argument as in (11) (i.e., replace $\theta_0$, $n_Y^{1/2}$ there with $\theta$, $\Omega_\theta^{-1}$) to find
$$0\ge\frac{\Omega_\theta^{-1}b_n^{-d}\|t_\theta\|\,u_\theta'\hat\Sigma_\theta u_\theta}{1+(\Omega_\theta b_n^{d}Z_\theta)(\Omega_\theta^{-1}b_n^{-d}\|t_\theta\|)}-\Omega_\theta^{-1}\|\bar M_\theta\|\qquad\big(\text{with }t_\theta = \|t_\theta\|u_\theta,\ \|u_\theta\| = 1\big)$$
and then show $\sup_{\theta\in\Theta_n}\Omega_\theta^{-1}b_n^{-d}\|t_\theta\| = O_p(1)$. From this (and analogous to (12) from the proof of Lemma 3), we expand $Q_{1n}(\theta,t_\theta) = 0_r$ to yield $t_\theta = b_n^{d}\hat\Sigma_\theta^{-1}\bar M_\theta+\phi_\theta$ for $\theta\in\Theta_n$, where $\sup_{\theta\in\Theta_n}\Omega_\theta^{-1}b_n^{-d}\|\phi_\theta\| = o_p(1)$. Using now these orders of $\|\phi_\theta\|$, $\|t_\theta\|$ and $Z_\theta$ with arguments as in (13) and (14), we may then expand $\ell_n(\theta)$ uniformly in $\theta\in\Theta_n$ as
$$\sup_{\theta\in\Theta_n}n_Y^{-1}\Omega_\theta^{-2}\big|\ell_n(\theta)-n_Y\bar M_\theta'\hat\Sigma_\theta^{-1}\bar M_\theta\big|\le\sup_{\theta\in\Theta_n}\Omega_\theta^{-2}b_n^{-2d}\Big(\phi_\theta'\hat\Sigma_\theta\phi_\theta+\frac{2Z_\theta\|t_\theta\|^{3}\|\hat\Sigma_\theta\|}{(1-Z_\theta\|t_\theta\|)^{2}}\Big) = o_p(1)$$
and then, using (16),
$$\sup_{\theta\in\Theta_n}n_Y^{-1}\Omega_\theta^{-2}\big|\ell_n(\theta)-n_Y\bar M_\theta'\hat\Sigma_{\theta_0}^{-1}\bar M_\theta\big| = o_p(1)$$
follows. For each $\theta\in\Theta_n$, we may write $\bar M_\theta = \bar M_{\theta_0}+\bar D_{\theta_0}(\theta-\theta_0)+E_\theta$ for $\bar D_{\theta_0} = N_I^{-1}\sum_{i\in I_n}\partial M_{\theta_0,i}/\partial\theta$ and a remainder $E_\theta$ satisfying $\sup_{\theta\in\Theta_n}\|E_\theta\|\le C\|\theta-\theta_0\|^2J_{n,1}$. Note that $\bar D_{\theta_0}\xrightarrow{p}D_{\theta_0}\equiv E\,\partial G_{\theta_0}(Y_t)/\partial\theta$ because $E\bar D_{\theta_0} = D_{\theta_0}$ and, as in (10),
$$\mathrm{Var}(\bar D_{\theta_0})\le Cn_Y^{-1}\sum_{h\in\mathbb{Z}^d}\|\mathrm{Cov}\{\partial G_{\theta_0}(Y_0)/\partial\theta,\,\partial G_{\theta_0}(Y_h)/\partial\theta\}\|\le Cn_Y^{-1}$$
by Lemma 1 and Assumptions 2-3. Hence, we have
$$\sup_{\theta\in\Theta_n}\Omega_\theta^{-1}\big\|\bar M_\theta-[\bar M_{\theta_0}+D_{\theta_0}(\theta-\theta_0)]\big\| = o_p(1)\tag{18}$$
and so it now follows that
$$\sup_{\theta\in\Theta_n}n_Y^{-1}\Omega_\theta^{-2}\Big|\ell_n(\theta)-n_Y\big[\bar M_{\theta_0}+D_{\theta_0}(\theta-\theta_0)\big]'\Sigma_{\theta_0}^{-1}\big[\bar M_{\theta_0}+D_{\theta_0}(\theta-\theta_0)\big]\Big| = o_p(1).\tag{19}$$
For $\theta = \theta_0+n_Y^{-5/12}v_\theta\in\partial\Theta_n$, $\|v_\theta\| = 1$, we have $\Omega_\theta = n_Y^{-5/12}$, so that from (19) we find that $\ell_n(\theta)\ge\sigma n_Y^{1/6}/2$ holds uniformly in $\theta\in\partial\Theta_n$ when $n$ is large, where $\sigma$ denotes the smallest eigenvalue of $D_{\theta_0}'\Sigma_{\theta_0}^{-1}D_{\theta_0}$. At the same time, by Lemma 3, we have $\ell_n(\theta_0) = O_p(1)$ (i.e., $n_Y^{-1}\Omega_{\theta_0}^{-2} = 1$ in (19)). Hence, with probability approaching 1, the minimizer $\hat\theta_n$ of $\ell_n(\theta)$ on $\Theta_n$ cannot be an element of $\partial\Theta_n$. Hence, $\hat\theta_n$ must satisfy $\hat\theta_n\in\Theta_n\setminus\partial\Theta_n$ and $0_r = Q_{1n}(\hat\theta_n,t_{\hat\theta_n})$, in addition to $0_p = (2n_Y)^{-1}\partial\ell_n(\theta)/\partial\theta|_{\theta = \hat\theta_n} = Q_{2n}(\hat\theta_n,t_{\hat\theta_n})$ by the differentiability of $\ell_n(\theta)$.

Step 3. From the argument in Step 2, we may solve $Q_{1n}(\hat\theta_n,t_{\hat\theta_n}) = 0_r$ for $t_{\hat\theta_n} = b_n^{d}\hat\Sigma_{\hat\theta_n}^{-1}\bar M_{\hat\theta_n}+\phi_{\hat\theta_n}$, or
$$b_n^{-d}t_{\hat\theta_n} = \hat\Sigma_{\hat\theta_n}^{-1}\bar M_{\hat\theta_n}+b_n^{-d}\phi_{\hat\theta_n} = \Sigma_{\theta_0}^{-1}\big[\bar M_{\theta_0}+D_{\theta_0}(\hat\theta_n-\theta_0)\big]+o_p(\Omega_{\hat\theta_n})\tag{20}$$
by $\Omega_{\hat\theta_n}^{-1}b_n^{-d}\|\phi_{\hat\theta_n}\| = o_p(1)$, (16) and (18). Recalling also $\bar D_{\theta_0} = N_I^{-1}\sum_{i\in I_n}\partial M_{\theta_0,i}/\partial\theta\xrightarrow{p}D_{\theta_0}$ from Step 2, along with $\|\bar D_{\theta_0}-N_I^{-1}\sum_{i\in I_n}\partial M_{\hat\theta_n,i}/\partial\theta\| = O_p(\|\hat\theta_n-\theta_0\|)$ and $\max_{i\in I_n}|t_{\hat\theta_n}'M_{\hat\theta_n,i}|\le\|t_{\hat\theta_n}\|Z_{\hat\theta_n} = o_p(1)$ (where again $Z_{\hat\theta_n} = \max_{i\in I_n}\|M_{\hat\theta_n,i}\|$), we find from $Q_{2n}(\hat\theta_n,t_{\hat\theta_n}) = 0_p$ that
$$0_p = \frac{b_n^{-d}}{N_I}\sum_{i\in I_n}\frac{(\partial M_{\hat\theta_n,i}/\partial\theta)'t_{\hat\theta_n}}{1+t_{\hat\theta_n}'M_{\hat\theta_n,i}} = D_{\theta_0}'\,b_n^{-d}t_{\hat\theta_n}+o_p(\|b_n^{-d}t_{\hat\theta_n}\|).\tag{21}$$
Now letting $\delta_n = \|b_n^{-d}t_{\hat\theta_n}\|+\Omega_{\hat\theta_n}$, from (20) and (21) we may write
$$\begin{pmatrix}\Sigma_{\theta_0}&-D_{\theta_0}\\D_{\theta_0}'&0\end{pmatrix}\begin{pmatrix}b_n^{-d}t_{\hat\theta_n}\\\hat\theta_n-\theta_0\end{pmatrix} = \begin{pmatrix}\bar M_{\theta_0}+o_p(\delta_n)\\o_p(\delta_n)\end{pmatrix},\qquad\begin{pmatrix}\Sigma_{\theta_0}&-D_{\theta_0}\\D_{\theta_0}'&0\end{pmatrix}^{-1} = \begin{pmatrix}U_{\theta_0}&\Sigma_{\theta_0}^{-1}D_{\theta_0}V_{\theta_0}\\-V_{\theta_0}D_{\theta_0}'\Sigma_{\theta_0}^{-1}&V_{\theta_0}\end{pmatrix}.$$
By Lemma 2(ii), $n_Y^{1/2}\bar M_{\theta_0}\xrightarrow{d}N(0_r,\Sigma_{\theta_0})$ holds, so it follows that $n_Y^{1/2}\delta_n = O_p(1)$ and the limiting distribution of $\hat\theta_n$ is given by
$$n_Y^{1/2}\begin{pmatrix}b_n^{-d}t_{\hat\theta_n}\\\hat\theta_n-\theta_0\end{pmatrix} = \begin{pmatrix}U_{\theta_0}\\-V_{\theta_0}D_{\theta_0}'\Sigma_{\theta_0}^{-1}\end{pmatrix}n_Y^{1/2}\bar M_{\theta_0}+o_p(1)\xrightarrow{d}N\bigg(\begin{pmatrix}0_r\\0_p\end{pmatrix},\begin{pmatrix}U_{\theta_0}&0\\0&V_{\theta_0}\end{pmatrix}\bigg).\tag{22}$$
The proof of Theorem 2 is complete. $\Box$

Proof of Theorem 3. Let $P_X = X(X'X)^{-1}X'$ denote the projection matrix for a given matrix $X$ of full column rank, and let $I_{r\times r}$ denote the $r\times r$ identity matrix.
Using (19) along with $\|\hat\theta_n-\theta_0\| = O_p(n_Y^{-1/2})$ by (22) and $n_Y^{-1}\Omega_{\theta_0}^{-2} = 1$ in (19), we write
$$\ell_n(\hat\theta_n) = n_Y(\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0})'\big(I_{r\times r}-P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}\big)(\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0})+o_p(1),$$
$$\ell_n(\theta_0) = n_Y(\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0})'(\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0})+o_p(1).$$
The chi-square limit distributions in Theorem 3(i) now follow by Lemma 2(ii), as $P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}$ and $I_{r\times r}-P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}$ are orthogonal idempotent matrices with ranks $p$ and $r-p$, respectively. With Theorem 3(i) in place, Theorem 3(ii) follows from modifying arguments in Qin and Lawless (1994, Corollary 5) within the proof of Theorem 2. $\Box$

8.4 Spatial empirical likelihood under parameter constraints

As a continuation of Section 3.3, here we briefly consider constrained maximum EL estimation of spatial parameters. Qin and Lawless (1995) introduced constrained EL inference for independent samples, and Kitamura (1997) developed a blockwise version of constrained EL for weakly dependent time series. For spatial data, we may also consider blockwise EL estimation subject to a system of parameter constraints on a spatial parameter $\theta\in\Theta\subset\mathbb{R}^p$:
$$\psi(\theta) = 0_q\in\mathbb{R}^q,$$
where $q<p$ and $\Psi(\theta) = \partial\psi(\theta)/\partial\theta$ is of full row rank $q$. By maximizing the EL function in (5) under the above restrictions on $\theta$, we find a constrained MELE $\hat\theta_n^{\psi}$.

Corollary 1 Suppose the conditions of Theorem 2 hold and, in a neighborhood of $\theta_0$, $\psi(\theta)$ is continuously differentiable, $\|\partial^2\psi(\theta)/\partial\theta\partial\theta'\|$ is bounded, and $\Psi(\theta_0)$ has rank $q$. If $H_0 : \psi(\theta_0) = 0_q$ holds, then $r_n(\hat\theta_n^{\psi}) = \ell_n(\hat\theta_n^{\psi})-\ell_n(\hat\theta_n)\xrightarrow{d}\chi_q^2$ and $\ell_n(\theta_0)-\ell_n(\hat\theta_n^{\psi})\xrightarrow{d}\chi_{p-q}^2$ as $n\to\infty$.

We can then sequentially test $H_0 : \psi(\theta_0) = 0_q$ with the log-likelihood ratio statistic $\ell_n(\hat\theta_n^{\psi})-\ell_n(\hat\theta_n)$ and, if failing to reject $H_0$, make an approximate $100(1-\alpha)\%$ confidence region for constrained $\theta$ values: $\{\theta : \psi(\theta) = 0_q,\ \ell_n(\theta)-\ell_n(\hat\theta_n^{\psi})\le\chi_{p-q,1-\alpha}^2\}$.

Proof of Corollary 1.
We sketch the proof, which requires modifications to the proof of Theorem 2 as well as arguments from Qin and Lawless (1995) (for the iid data case); we shall employ the notation used in the proof of Theorem 2 and write the functions $\psi(\theta)$, $\Psi(\theta)$ as $\psi_\theta$, $\Psi_\theta$ in the following. To establish the existence of $\hat\theta_n^{\psi}$, let $Q_{1n}^*(\theta,t,\nu) = Q_{1n}(\theta,t)$, $Q_{2n}^*(\theta,t,\nu) = Q_{2n}(\theta,t)+\Psi_\theta'\nu$, and $Q_{3n}^*(\theta,t,\nu) = \psi_\theta$, and define $U_n = \{(\theta,t,\nu)\in\mathbb{R}^p\times\mathbb{R}^r\times\mathbb{R}^q : \theta\in\Theta_n,\ \|t/b_n^d\|+\|\nu\|\le n_Y^{-5/12}\}$.

Step 1. It can first be shown that the system of equations
$$Q_{1n}^*(\theta,t,\nu) = 0_r,\qquad Q_{2n}^*(\theta,t,\nu) = 0_p,\qquad Q_{3n}^*(\theta,t,\nu) = 0_q\tag{23}$$
has a solution $(\theta_n^*,t_n^*,\nu_n^*)\in U_n$. Uniformly in $\theta\in\Theta_n$, it holds that $b_n^{-d}\partial t_\theta/\partial\theta = \Sigma_{\theta_0}^{-1}D_{\theta_0}+o_p(1)$ (by differentiating $Q_{1n}(\theta,t_\theta) = 0_r$ with respect to $\theta$) and that $(2n_Y)^{-1}\partial\ell_n(\theta)/\partial\theta = V_{\theta_0}^{-1}(\theta-\theta_0)+T_\theta$, where $T_\theta$ is continuous in $\theta$ and $\sup_{\theta\in\Theta_n}\|T_\theta\| = o_p(n_Y^{-5/12})$ (by expanding $(2n_Y)^{-1}\partial\ell_n(\theta)/\partial\theta = Q_{2n}(\theta,t_\theta)$ around $\theta_0$). For $\theta\in\Theta_n$, define $\psi_\theta-\Psi_{\theta_0}(\theta-\theta_0) = \|\theta-\theta_0\|^2k(\theta)$, where $k(\theta)$ is continuous and bounded, and write a function $\eta(\theta)$ as
$$\eta(\theta) = \frac{1}{2n_Y}\frac{\partial\ell_n(\theta)}{\partial\theta}+\Psi_\theta'(\Psi_{\theta_0}V_{\theta_0}\Psi_\theta')^{-1}\Big(\|\theta-\theta_0\|^2k(\theta)-\Psi_{\theta_0}V_{\theta_0}\Big[\frac{1}{2n_Y}\frac{\partial\ell_n(\theta)}{\partial\theta}-V_{\theta_0}^{-1}(\theta-\theta_0)\Big]\Big).\tag{24}$$
It can be shown that $\eta(\theta) = V_{\theta_0}^{-1}(\theta-\theta_0)+\tilde T_\theta$, where $\tilde T_\theta$ is continuous in $\theta$ and $\sup_{\theta\in\Theta_n}\|\tilde T_\theta\| = o_p(n_Y^{-5/12})$, which implies that there exists $\hat\theta_n^*\in\Theta_n\setminus\partial\Theta_n$ such that $-\eta(\hat\theta_n^*) = 0_p$. This root $\hat\theta_n^*$ of $\eta(\theta)$ inside $\Theta_n\setminus\partial\Theta_n$ is deduced from Lemma 2 of Aitchison and Silvey (1958); that result entails that because, for large $n$, $-\sigma_1^{-1}\eta(\theta)$ maps $\Theta_n$ into $\{(\theta-\theta_0) : \theta\in\Theta_n\}$ and $(\theta-\theta_0)'\{-\sigma_1^{-1}\eta(\theta)\}<-\sigma_0/(2\sigma_1)$ holds for $\theta\in\partial\Theta_n$ (i.e., $(\theta-\theta_0)'\{-\sigma_1^{-1}\eta(\theta)\}$ is negative for $\|\theta-\theta_0\| = n_Y^{-5/12}$), where $\sigma_1$ and $\sigma_0>0$ respectively denote the largest and smallest eigenvalues of $V_{\theta_0}^{-1}$, it must follow that $-\sigma_1^{-1}\eta(\hat\theta_n^*) = 0_p$ for some $\|\hat\theta_n^*-\theta_0\|<n_Y^{-5/12}$ by Brouwer's fixed point theorem.
From this root, we have that $0_q = \Psi_{\theta_0}V_{\theta_0}\eta(\hat\theta_n^*) = \|\hat\theta_n^*-\theta_0\|^2k(\hat\theta_n^*)+\Psi_{\theta_0}(\hat\theta_n^*-\theta_0) = \psi_{\hat\theta_n^*}$ from (24), as well as
$$\frac{1}{2n_Y}\frac{\partial\ell_n(\hat\theta_n^*)}{\partial\theta} = \Psi_{\hat\theta_n^*}'(\Psi_{\theta_0}V_{\theta_0}\Psi_{\hat\theta_n^*}')^{-1}\Psi_{\theta_0}V_{\theta_0}\,\frac{1}{2n_Y}\frac{\partial\ell_n(\hat\theta_n^*)}{\partial\theta}.\tag{25}$$
This yields that $\hat\theta_n^*$, the EL Lagrange multiplier $t_{\hat\theta_n^*}$ for $\hat\theta_n^*$ defined by $Q_{1n}(\hat\theta_n^*,t_{\hat\theta_n^*}) = 0_r$, and $\nu_n^* = -(\Psi_{\theta_0}V_{\theta_0}\Psi_{\hat\theta_n^*}')^{-1}\Psi_{\theta_0}V_{\theta_0}(2n_Y)^{-1}\partial\ell_n(\hat\theta_n^*)/\partial\theta$ satisfy (23) jointly.

Step 2. We now show that any solution of (23) in $U_n$, say $(\tilde\theta,\tilde t,\tilde\nu)$, must minimize $\ell_n(\theta)$ on $\Theta_n$ subject to the condition $\psi_\theta = 0_q$. To see this, note that if $\theta\in\Theta_n$ with $\psi_\theta = 0_q$, then we may make a Taylor expansion around $\tilde\theta$:
$$\frac{1}{2n_Y}\big[\ell_n(\theta)-\ell_n(\tilde\theta)\big] = \frac{1}{2n_Y}\frac{\partial\ell_n(\tilde\theta)}{\partial\theta'}(\theta-\tilde\theta)+\frac{1}{4n_Y}(\theta-\tilde\theta)'\frac{\partial^2\ell_n(\theta^*)}{\partial\theta\partial\theta'}(\theta-\tilde\theta),$$
with $\theta^*$ between $\theta$ and $\tilde\theta$. Since $\tilde\theta$ satisfies (23), it follows from some algebra that $\tilde\theta$ also satisfies (25) after substituting $\tilde\theta$ for $\hat\theta_n^*$. Using $0_q = \psi_\theta-\psi_{\tilde\theta} = \Psi_{\tilde\theta}(\theta-\tilde\theta)+o(\|\theta-\tilde\theta\|^2)$, we find $(2n_Y)^{-1}\partial\ell_n(\tilde\theta)/\partial\theta'\,(\theta-\tilde\theta) = o_p(\|\theta-\tilde\theta\|^2)$ for $\tilde\theta$ fulfilling (25); it may also be shown that $(2n_Y)^{-1}\partial^2\ell_n(\theta^*)/\partial\theta\partial\theta' = V_{\theta_0}^{-1}+o_p(1)$ (by expanding $(2n_Y)^{-1}\partial\ell_n(\theta)/\partial\theta = Q_{2n}(\theta,t_\theta)$ around $\theta_0$). Hence, $\ell_n(\theta)-\ell_n(\tilde\theta)\ge\{\sigma_0/2+o_p(1)\}\,n_Y\|\theta-\tilde\theta\|^2$, where the $o_p(1)$ term is uniform for $\theta\in\Theta_n$, $\psi_\theta = 0_q$.

Step 3. By the first two steps, we have therefore established that there exists a consistent MELE $\hat\theta_n^{\psi}$ of $\theta_0$, given by $\hat\theta_n^{\psi} = \hat\theta_n^*\in\Theta_n\setminus\partial\Theta_n$, that satisfies the condition $\psi(\hat\theta_n^{\psi}) = 0_q$; we may denote $t_{\hat\theta_n^{\psi}} = t_{\hat\theta_n^*}$ and $\nu_n^{\psi} = \nu_n^*$. We now show
$$n_Y^{1/2}\begin{pmatrix}\hat\theta_n^{\psi}-\theta_0\\\nu_n^{\psi}\end{pmatrix}\xrightarrow{d}N\bigg(0_{p+q},\begin{pmatrix}P_{\theta_0}&0\\0&R_{\theta_0}\end{pmatrix}\bigg),\qquad P_{\theta_0} = V_{\theta_0}\big(I_{p\times p}-\Psi_{\theta_0}'R_{\theta_0}\Psi_{\theta_0}V_{\theta_0}\big),\qquad R_{\theta_0} = \big(\Psi_{\theta_0}V_{\theta_0}\Psi_{\theta_0}'\big)^{-1}.\tag{26}$$
Expanding $Q_{in}^*(\theta,t,\nu)$ at $(\theta_0,0_r,0_q)$ and using that $(\hat\theta_n^{\psi},t_{\hat\theta_n^{\psi}},\nu_n^{\psi})$ satisfies (23), we have
$$\Sigma_n^*\begin{pmatrix}t_{\hat\theta_n^{\psi}}/b_n^{d}\\\hat\theta_n^{\psi}-\theta_0\\\nu_n^{\psi}\end{pmatrix} = \begin{pmatrix}-Q_{1n}(\theta_0,0_r)+o_p(\delta_n^*)\\o_p(\delta_n^*)\\o_p(\delta_n^*)\end{pmatrix},\qquad\Sigma_n^* = \begin{pmatrix}b_n^{d}\,\partial Q_{1n}(\theta_0,0_r)/\partial t&\partial Q_{1n}(\theta_0,0_r)/\partial\theta&0\\b_n^{d}\,\partial Q_{2n}(\theta_0,0_r)/\partial t&0&\Psi_{\theta_0}'\\0&\Psi_{\theta_0}&0\end{pmatrix},$$
where $Q_{1n}(\theta_0,0_r) = \bar M_{\theta_0}$, $b_n^{d}\,\partial Q_{1n}(\theta_0,0_r)/\partial t = -\hat\Sigma_{\theta_0}$, $\partial Q_{1n}(\theta_0,0_r)/\partial\theta = \bar D_{\theta_0} = [b_n^{d}\,\partial Q_{2n}(\theta_0,0_r)/\partial t]'$, and $\delta_n^* = \|\hat\theta_n^{\psi}-\theta_0\|+\|t_{\hat\theta_n^{\psi}}/b_n^{d}\|+\|\nu_n^{\psi}\|$. Using Lemma 2(iii) and $\bar D_{\theta_0}\xrightarrow{p}D_{\theta_0}$ from the proof of Theorem 2, we have
$$\Sigma_n^*\xrightarrow{p}\begin{pmatrix}-\Sigma_{\theta_0}&D_{\theta_0}&0\\D_{\theta_0}'&0&\Psi_{\theta_0}'\\0&\Psi_{\theta_0}&0\end{pmatrix}\equiv\begin{pmatrix}C_{11}&C_{12}\\C_{21}&C_{22}\end{pmatrix}\equiv\tilde C,\qquad C_{11} = -\Sigma_{\theta_0},\quad C_{12} = \begin{bmatrix}D_{\theta_0}&0\end{bmatrix},\quad C_{21} = C_{12}',\quad C_{22} = \begin{pmatrix}0&\Psi_{\theta_0}'\\\Psi_{\theta_0}&0\end{pmatrix}.$$
Note that $\det(\tilde C) = \det(C_{11})\det(Q_c) = \det(-\Sigma_{\theta_0})\det(V_{\theta_0}^{-1})\det(-R_{\theta_0}^{-1})\ne0$ for $Q_c = C_{22}-C_{21}C_{11}^{-1}C_{12}$, and
$$\tilde C^{-1} = \begin{pmatrix}-\Sigma_{\theta_0}^{-1}+\Sigma_{\theta_0}^{-1}C_{12}Q_c^{-1}C_{21}\Sigma_{\theta_0}^{-1}&\Sigma_{\theta_0}^{-1}C_{12}Q_c^{-1}\\Q_c^{-1}C_{21}\Sigma_{\theta_0}^{-1}&Q_c^{-1}\end{pmatrix},\qquad Q_c^{-1} = \begin{pmatrix}P_{\theta_0}&V_{\theta_0}\Psi_{\theta_0}'R_{\theta_0}\\R_{\theta_0}\Psi_{\theta_0}V_{\theta_0}&-R_{\theta_0}\end{pmatrix}.$$
Since, by Lemma 2(ii), $n_Y^{1/2}Q_{1n}(\theta_0,0_r) = n_Y^{1/2}\bar M_{\theta_0}\xrightarrow{d}N(0_r,\Sigma_{\theta_0})$, it follows that $\delta_n^* = O_p(n_Y^{-1/2})$. Then,
$$n_Y^{1/2}\begin{pmatrix}\hat\theta_n^{\psi}-\theta_0\\\nu_n^{\psi}\end{pmatrix} = -n_Y^{1/2}Q_c^{-1}C_{21}\Sigma_{\theta_0}^{-1}Q_{1n}(\theta_0,0_r)+o_p(1)\xrightarrow{d}N\bigg(0_{p+q},\begin{pmatrix}P_{\theta_0}&0\\0&R_{\theta_0}\end{pmatrix}\bigg).$$

Step 4. As in the proof of Theorem 2, we can then expand by (19)
$$\ell_n(\hat\theta_n^{\psi}) = n_Y\big(\bar M_{\theta_0}+D_{\theta_0}(\hat\theta_n^{\psi}-\theta_0)\big)'\Sigma_{\theta_0}^{-1}\big(\bar M_{\theta_0}+D_{\theta_0}(\hat\theta_n^{\psi}-\theta_0)\big)+o_p(1)$$
$$= n_YQ_{1n}(\theta_0,0_r)'\big(I_{r\times r}-D_{\theta_0}P_{\theta_0}D_{\theta_0}'\Sigma_{\theta_0}^{-1}\big)'\Sigma_{\theta_0}^{-1}\big(I_{r\times r}-D_{\theta_0}P_{\theta_0}D_{\theta_0}'\Sigma_{\theta_0}^{-1}\big)\bar M_{\theta_0}+o_p(1)$$
$$= \big[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\big]'\Big[I_{r\times r}-\big(P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}-P_{H_{\theta_0}}\big)\Big]\big[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\big]+o_p(1),$$
where $H_{\theta_0} = \Sigma_{\theta_0}^{-1/2}D_{\theta_0}(D_{\theta_0}'\Sigma_{\theta_0}^{-1}D_{\theta_0})^{-1}\Psi_{\theta_0}'$. Then,
$$\ell_n(\hat\theta_n^{\psi})-\ell_n(\hat\theta_n) = \big[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\big]'P_{H_{\theta_0}}\big[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\big]+o_p(1),$$
$$\ell_n(\theta_0)-\ell_n(\hat\theta_n^{\psi}) = \big[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\big]'\big(P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}-P_{H_{\theta_0}}\big)\big[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\big]+o_p(1).$$
Note now that $n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}Q_{1n}(\theta_0,0_r) = n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\xrightarrow{d}N(0_r,I_{r\times r})$ by Lemma 2(ii), and that $P_{H_{\theta_0}}$ and $P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}-P_{H_{\theta_0}}$ are idempotent matrices with
$$\mathrm{rank}\big(P_{H_{\theta_0}}\big) = \mathrm{rank}\big(H_{\theta_0}\big) = \mathrm{rank}(\Psi_{\theta_0}) = q;\qquad\mathrm{rank}\big(P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}-P_{H_{\theta_0}}\big) = p-\mathrm{trace}\big[P_{H_{\theta_0}}\big] = p-\mathrm{rank}\big[P_{H_{\theta_0}}\big] = p-q.$$
For $\mathrm{rank}(P_{H_{\theta_0}}) = q$ above, we used $\mathrm{rank}(H_{\theta_0})\le\mathrm{rank}(\Psi_{\theta_0})$ and $\mathrm{rank}(\Psi_{\theta_0}) = \mathrm{rank}(D_{\theta_0}'\Sigma_{\theta_0}^{-1/2}H_{\theta_0})\le\mathrm{rank}(H_{\theta_0})$. Corollary 1 now follows. $\Box$

8.5 Spatial block bootstrap algorithm

Here we outline a spatial block bootstrap method for generating a bootstrap version $\mathcal{Y}_n^*$ of the original vectorized spatial data $\mathcal{Y}_n = \{Y_s : s\in R_{n,Y}\cap\mathbb{Z}^d\}$ on $R_{n,Y}\subset\mathbb{R}^d$. Bootstrap replicates $\mathcal{Y}_n^*$ of spatial data, on a bootstrap sampling region $R_{n,Y}^*$, are used to formulate the empirical Bartlett correction for the spatial EL method as described in Section 4. Let $\mathcal{Y}_n(A) = \{Y_s : s\in A\cap\mathbb{Z}^d\}$ denote the observed spatial data at $\mathbb{Z}^d$ points lying inside a set $A\subset R_{n,Y}$. The block bootstrap requires a block scaling factor, denoted by $b_{n,bt}$, satisfying $b_{n,bt}^{-1}+b_{n,bt}^{d}/n_Y = o(1)$. Suppose this bootstrap block scaling is used to make the blocks of size $b_{n,bt}(-1/2,1/2]^d$ in $R_{n,Y}$ appearing in Figure 2(b)-(c). As a first step, we divide the sampling region $R_{n,Y}$ into NOL blocks of size $b_{n,bt}(-1/2,1/2]^d$ that fall entirely inside $R_{n,Y}$, as depicted in Figure 2(b). In the notation of Section 2.2, $\{B_{b_{n,bt}}(i) : i\in\mathcal{I}_{b_{n,bt}}^{NOL}\}$ represents a collection of $b_{n,bt}$-scaled NOL "complete blocks" partitioning $R_{n,Y}$. These complete NOL blocks inside $R_{n,Y}$, when taken together, form a bootstrap sampling region $R_{n,Y}^*\equiv\bigcup\{B_{b_{n,bt}}(i) : i\in\mathcal{I}_{b_{n,bt}}^{NOL}\}$, as shown in Figure 2(d) based on the complete NOL blocks in Figure 2(b). In place of the original data $\mathcal{Y}_n$ observed on $R_{n,Y}$, we aim to create a bootstrap sample $\mathcal{Y}_n^*$ on $R_{n,Y}^*$.
Each block $B_{b_{n,bt}}(i) = i+b_{n,bt}(-1/2,1/2]^d$, $i\in\mathcal{I}_{b_{n,bt}}^{NOL}$, that constitutes a part of $R_{n,Y}^*$ also corresponds to a piece of $R_{n,Y}$, where we originally observed the data $\mathcal{Y}_n(B_{b_{n,bt}}(i))$, $B_{b_{n,bt}}(i)\subset R_{n,Y}$. For a fixed $i\in\mathcal{I}_{b_{n,bt}}^{NOL}$, we then create a bootstrap rendition $\mathcal{Y}_n^*(B_{b_{n,bt}}(i))$ of $\mathcal{Y}_n(B_{b_{n,bt}}(i))$ by independently resampling some size-$b_{n,bt}(-1/2,1/2]^d$ block of $Y_s$-observations from the region $R_{n,Y}$ (as in Figure 2(c)) and pasting this observational block into the position of $B_{b_{n,bt}}(i)$ within $R_{n,Y}^*$. To make the resampling scheme precise, for each $i\in\mathcal{I}_{b_{n,bt}}^{NOL}$, we define the bootstrap version as $\mathcal{Y}_n^*(B_{b_{n,bt}}(i))\equiv\mathcal{Y}_n(B_{b_{n,bt}}(i^*))$, where $i^*\in\mathbb{Z}^d$ is a random vector selected uniformly from the collection of OL block indices given by $\mathcal{I}_{b_{n,bt}}^{OL}$ in the notation of Section 2.2; that is, we resample from all OL $b_{n,bt}$-scaled blocks within $R_{n,Y}$ (as depicted in Figure 2(c)) to produce a spatial block of observations $\mathcal{Y}_n^*(B_{b_{n,bt}}(i))$. We then concatenate the resampled block observations for each $i\in\mathcal{I}_{b_{n,bt}}^{NOL}$ into a single spatial bootstrap sample $\mathcal{Y}_n^* = \{\mathcal{Y}_n^*(B_{b_{n,bt}}(i)) : i\in\mathcal{I}_{b_{n,bt}}^{NOL}\}$ on $R_{n,Y}^*$ with $n_Y^* = |\mathcal{I}_{b_{n,bt}}^{NOL}|\cdot b_{n,bt}^{d}$ sampling sites at $R_{n,Y}^*\cap\mathbb{Z}^d$. In Section 4, the bootstrap EL version $\ell_n^*$ may be computed as in (6) after replacing $\mathcal{Y}_n$, $R_{n,Y}$, $n_Y$ with $\mathcal{Y}_n^*$, $R_{n,Y}^*$, $n_Y^*$. See Chapter 12.3 of Lahiri (2003a) for more details on the spatial block bootstrap.
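To fix ideas, the resampling scheme above can be sketched in a few lines for the special case $d = 2$ with a rectangular sampling region observed on an $n_1\times n_2$ integer grid. This is a minimal illustrative sketch, not code from the paper: the function name `spatial_block_bootstrap` and its arguments are our own labels, and `b` plays the role of the bootstrap block scaling $b_{n,bt}$.

```python
import numpy as np

def spatial_block_bootstrap(Y, b, rng=None):
    """One spatial block bootstrap replicate of a 2-d lattice sample Y.

    Y : 2-d array of observations on a rectangular integer grid (d = 2 case).
    b : bootstrap block side length (the role of b_{n,bt} in the text).

    Returns Y_star on the bootstrap region R*_{n,Y} formed by the complete
    non-overlapping (NOL) b x b blocks falling inside the grid.
    """
    rng = np.random.default_rng(rng)
    n1, n2 = Y.shape
    k1, k2 = n1 // b, n2 // b          # numbers of complete NOL blocks per axis
    Y_star = np.empty((k1 * b, k2 * b), dtype=Y.dtype)
    # every overlapping (OL) b x b block inside the region is eligible
    max1, max2 = n1 - b, n2 - b
    for i in range(k1):
        for j in range(k2):
            # draw an OL block index i* uniformly and paste its data
            # into the position of the (i, j)-th NOL block of R*_{n,Y}
            s1 = int(rng.integers(0, max1 + 1))
            s2 = int(rng.integers(0, max2 + 1))
            Y_star[i*b:(i+1)*b, j*b:(j+1)*b] = Y[s1:s1+b, s2:s2+b]
    return Y_star
```

For the empirical Bartlett correction of Section 4, one would repeat this resampling step many times, recompute the blockwise EL ratio $\ell_n^*$ on each replicate as in (6), and use the Monte Carlo average of the replicates to adjust the chi-square calibration; only the resampling step is sketched here.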