A blockwise empirical likelihood for spatial lattice data
Short title: Spatial empirical likelihood
Daniel J. Nordman
Dept. of Statistics
Iowa State University
Ames, IA 50011
dnordman@iastate.edu
Abstract
This article considers an empirical likelihood method for data located on a spatial grid. The
method allows inference on spatial parameters, such as means and variograms, without knowledge of the underlying spatial dependence structure. Log-likelihood ratios are shown to have
chi-square limits under spatial dependence for calibrating tests and confidence regions, and
maximum empirical likelihood estimators permit parameter estimation and testing of spatial
moment conditions. A practical Bartlett correction is proposed to improve the coverage accuracy of confidence regions. The spatial empirical likelihood method is investigated through
a simulation study and illustrated with a data example.
Key Words: Data blocking, discrete index random fields, estimating equations
1 Introduction
Empirical likelihood (EL), introduced by Owen (1988, 1990), is a statistical method allowing
likelihood-based inference without requiring a fully specified parametric model for the data.
For independent data, versions of EL are known to share many qualities associated with
parametric likelihood, such as limiting chi-square distributions for log-likelihood ratios; see
Owen (1988) for means, Hall and La Scala (1990) for smooth mean functions and Qin and
Lawless (1994) for parameters satisfying moment restrictions. More recently, attention has
focused on formulating EL for dependent time series. For weakly dependent time series,
Kitamura (1997) proposed a general EL method based on data blocking techniques, and
related “blockwise” versions of EL have been developed for other time series inference: Lin
and Zhang (2001) for blockwise Euclidean EL; Chuang and Chan (2002) for autoregressive
models; Chen, Härdle and Li (2003) for goodness-of-fit tests; Bravo (2005) for time series
regressions; Zhang (2006) for negatively associated series. In econometrics, much research has
considered EL for testing moment restrictions and comparisons between EL and generalized
method of moments estimators; see, for example, Kitamura, Tripathi and Ahn (2004) and
Newey and Smith (2004). Monti (1997) and Nordman and Lahiri (2006) have considered
periodogram-based EL inference for short- and long-memory time series, respectively.
In contrast to time series, the potential application of EL for spatially dependent data has
received little consideration. The aim of this paper is to propose an EL method for spatial
lattice data and demonstrate that it has some important inference properties in the spatial
setting. The method has nonparametric and semiparametric uses and is valid for many spatial
processes under weak conditions; this can be appealing when there is uncertainty about an
appropriate parametric model. Spatial EL provides a general framework for inference on many
spatial parameters through a likelihood function based on estimating equations. Applying the
EL method to different spatial problems requires only adjusting the estimating functions that
describe the inference scenario. In addition, the spatial EL method does not require variance
estimation steps to set confidence regions or conduct tests. This feature of spatial EL is
particularly important because standard errors can be difficult to obtain for many spatial
statistics under an unknown spatial dependence structure. Current nonparametric methods
for spatial data, such as spatial subsampling and the spatial block bootstrap, often require
direct estimation of the variance of spatial statistics under data dependence (see Sherman and
Carlstein (1994), Politis, Romano and Wolf (1999), Lahiri (2003a), and references therein).
An example of a situation where spatial EL provides an attractive approach is illustrated
in Figure 1(a), which presents a map of high and low cancer mortality rates for the United
States. High and low mortality are defined as in Sherman and Carlstein (1994), who fit an
Figure 1: (a) Cancer mortality map, where • and ◦ respectively denote a high (Zs = 1) or low (Zs = 0) mortality rate at site s ∈ Rn ∩ Z² of the sampling region Rn. (b) Sampling region R5,n for vectors Ys, s ∈ R5,n ∩ Z², where Ys consists of Zs and its four nearest neighbors Zh, ||s − h|| = 1; at each site s ∈ R5,n ∩ Z², the indicated value denotes the sum Ss of the four neighboring indicators Zh of Zs, and values in dark (light) font denote Zs = 1 (Zs = 0) at site s.
autologistic model to assess evidence of clustering among high mortality cases. To estimate the
autologistic parameter that describes clustering, these authors employed maximum pseudolikelihood (Besag (1975)) followed by a spatial subsampling step. In particular, subsampling
was used to obtain a standard error for the pseudo-likelihood estimate in order to set a confidence interval for the autologistic parameter through a normal approximation. This example
is revisited in Section 6, where the spatial EL method produces a confidence interval for the
clustering model parameter automatically, and no separate determination of standard error is
required. Intervals from the spatial EL approach indicate spatial clustering, but suggest the
evidence for clustering is not as strong as reported by Sherman and Carlstein (1994).
In what follows, a spatial blockwise EL method is developed, based on spatial estimating
equations combined with either maximally overlapping or non-overlapping blocks of spatial
observations. Data blocking is used as a device to accommodate unknown spatial dependence,
similar to the time series blockwise EL of Kitamura (1997). For a broad class of spatial parameters, the spatial EL method yields log-ratios that are asymptotically chi-square, allowing
the formulation of tests and confidence regions without knowledge of the data dependence
structure. Our EL results include distributions for EL point estimators of spatial parameters as well (i.e., so-called maximum EL estimators). Based on recent results in Chen and
Cui (2006, 2007) for independent data, a procedure for a practical Bartlett correction for the
spatial EL method is proposed and investigated using simulation. The Bartlett correction
makes an adjustment to the log EL ratio that improves coverage accuracy.
The rest of the paper is organized as follows. In Section 2, we describe the spatial sampling
and estimating function frameworks, with some examples provided for illustration. We also
construct the spatial blockwise EL. The main distributional results for the spatial EL method
are presented in Section 3. Section 4 outlines an empirical Bartlett correction. The proposed
methodology is assessed through a numerical study in Section 5, and illustrated with the
cancer mortality map of the United States in Section 6. Section 7 provides a discussion of
EL block selection. Assumptions and detailed proofs for the main results are deferred to an
Appendix, available in the online supplement to this manuscript.
2 Spatial empirical likelihood method
To set the stage for development of spatial EL, recall the formulation of EL using a sample
Y1 , . . . , Yn of independent, identically distributed (iid) data (e.g., Owen (1990), Qin and Lawless (1994)). First, a parameter of interest θ ∈ Rp is linked to each observation by creating a
function Gθ (Yi ) of both, using a vector of r ≥ p estimating functions Gθ (·). The estimating
functions are chosen so that, at the true parameter value θ = θ0 , we have an expectation
condition E{Gθ0 (Yi )} = 0r that identifies θ0 . With such estimating functions in place, an
EL function for θ can be constructed by maximizing a product of n probabilities placed on
Gθ (Y1 ), . . . , Gθ (Yn ) under a linear “expectation” constraint. The resulting EL function for θ
has important uses; the function can be maximized for point estimators for θ, or chi-square
calibrated to set confidence regions.
An EL for spatial lattice data is similarly based on estimating functions that satisfy a
moment condition, but requires modifications to handle spatial dependence. First, we need a
spatial sampling region Rn ⊂ Rd , d ≥ 1, on which a spatial process {Zs : s ∈ Zd } is observed
on a grid; here d denotes the dimension of sampling. Then we develop estimating functions
involving a spatial parameter θ of interest and the spatial Zs -observations. To provide more
generality in the spatial setting, we consider functions Gθ (Ys ), s ∈ Zd , that connect θ to vectors
of spatial observations Ys = (Zs+h1 , . . . , Zs+hm )0 based on some selection of fixed spatial lags
h1 , . . . , hm ∈ Zd ; these Ys -observations have their own sampling region Rn,Y based on the
region Rn for the observed spatial process {Zs : s ∈ Zd }. These formulations are made
precise in Section 2.1, which also provides some examples. A spatial EL function for θ is then
constructed using an estimating function Gθ (·) along with spatial blocks of Ys -observations,
instead of using individual observations, as described in Section 2.2.
For clarity throughout the sequel, a bold font denotes a vector in Rd , e.g., s, h, i ∈ Rd .
2.1 Spatial estimating equations
To describe the spatial EL method, we adopt a sampling framework that allows a spatial
sampling region Rn ⊂ Rd to grow as the sample size n increases. Using a subset R0 ⊂
(−1/2, 1/2]d containing an open neighborhood and an increasing positive sequence {λn } of
scaling factors, suppose the sampling region Rn is obtained by inflating the “template” set
R0 by the constant λn: Rn = λn R0. This formulation permits a wide variety of shapes for
the sampling region Rn, and this shape is preserved as the region grows. For spatial
subsampling, Sherman and Carlstein (1994), Sherman (1996), and Nordman and Lahiri (2004)
use a comparable sampling structure. We assume that a real-valued, strictly stationary process
{Zs : s ∈ Zd } is observed at regular locations on the grid Zd inside Rn . Hence, the available
data are {Zs : s ∈ Rn ∩ Zd } observed at n sites {s1 , . . . , sn } = Rn ∩ Zd , with n as the sample
size of the observed Zs .
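For concreteness, a minimal sketch (Python/NumPy, with illustrative square and L-shaped templates as assumptions) of enumerating the sampling sites Rn ∩ Zd of an inflated region Rn = λnR0:

```python
import numpy as np

def lattice_sites(in_template, lam, d=2):
    """Enumerate the sites of R_n intersected with Z^d, where R_n = lam * R_0 and
    in_template(x) tests membership of x in the template R_0 inside (-1/2, 1/2]^d."""
    lo, hi = int(np.floor(-lam / 2.0)), int(np.ceil(lam / 2.0))
    axes = [np.arange(lo, hi + 1)] * d
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    keep = np.array([in_template(s / lam) for s in grid])  # s in lam*R_0 iff s/lam in R_0
    return grid[keep]

# Illustrative templates (assumptions, not prescribed by the framework):
def square(x):                      # R_0 = (-1/2, 1/2]^2
    return bool(np.all((x > -0.5) & (x <= 0.5)))

def ell(x):                         # an L-shaped template inside (-1/2, 1/2]^2
    return square(x) and not (x[0] > 0 and x[1] > 0)

print(len(lattice_sites(square, lam=20.0)))  # 20 x 20 = 400 sites
print(len(lattice_sites(ell, lam=20.0)))     # roughly three quarters of that
```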
To describe a finite dimensional parameter θ ∈ Θ ⊂ Rp of the spatial process {Zs : s ∈ Zd }
with estimating functions, we collect observations from Rn into vectors. For a positive integer
m, we form an m-dimensional vector Ys = (Zs+h1 , Zs+h2 , . . . , Zs+hm )0 , s ∈ Rn,Y ∩ Zd , where
h1 , h2 , . . . , hm ∈ Zd are selected lag vectors, and Rn,Y = {s ∈ Rn : s + h1 , . . . , s + hm ∈ Rn }
denotes the sampling region for the process {Ys : s ∈ Zd } containing nY ≡ |Rn,Y ∩ Zd |
observations. Here and throughout the sequel, |A| represents the size of a finite set A.
As in the iid data formulation of EL (Qin and Lawless (1994)), suppose information about
θ ∈ Θ ⊂ Rp exists through r ≥ p estimating functions linking θ to a vector form Ys , s ∈ Zd of
the spatial process {Zs : s ∈ Zd }. With arguments y = (y1 , . . . , ym )0 ∈ Rm and θ ∈ Θ, define
Gθ (y) = (g1,θ (y), . . . , gr,θ (y))0 : Rm × Rp → Rr as a vector of r estimating functions satisfying
E{Gθ0(Ys)} = 0r ∈ Rr,    s ∈ Zd,    (1)
at the true and unique parameter value θ0 . When r > p, the above functions are said to be
“overidentifying” for θ. In Section 2.2, we build an EL function for a spatial parameter θ via
the moment condition in (1).
With appropriate choices of vectors Ys and estimating functions Gθ (·), EL inference is
possible for a large class of spatial parameters, as is illustrated in the following examples.
Example 1. (Poisson counts). Consider a pattern of events in a spatial region that may
exhibit spatial randomness (e.g., tree locations in a forest). It is common to partition the
region into rectangular plots on a grid, and the number of events occurring in each plot (or
quadrat) is considered as a lattice observation Zs , s ∈ Zd (e.g., counts of trees in a quadrat),
6
where each count Zs follows a Poisson distribution with mean E(Zs ) = θ when the events
exhibit complete spatial randomness (Cressie (1993), Chapter 8.2). For EL inference, we set
Ys = Zs, s ∈ Zd, and use estimating functions Gθ(Ys) = (Zs − θ, Zs² − θ² − θ)', based on the
Poisson moments E(Zs) = θ and E(Zs²) = θ + θ², so that (1) holds with p = m = 1, r = 2. Using EL results in Section 3, it
is possible to estimate the mean count θ, or more importantly test if the Poisson assumption
(1) holds, without nonparametric variance estimation as used in some previous applications
(Sherman (1996)).
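A minimal sketch of these two estimating functions (Python/NumPy; the simulated counts and names are only illustrative):

```python
import numpy as np

def G_poisson(counts, theta):
    """Example 1 estimating functions: E(Z_s) = theta and E(Z_s^2) = theta + theta^2
    under complete spatial randomness, so both columns have expectation zero at theta_0."""
    z = np.asarray(counts, dtype=float).ravel()
    return np.stack([z - theta, z**2 - theta**2 - theta], axis=-1)  # shape (n, 2): p = 1, r = 2

rng = np.random.default_rng(0)
quadrat_counts = rng.poisson(3.0, size=(30, 30))           # simulated counts on a 30 x 30 grid
print(G_poisson(quadrat_counts, theta=3.0).mean(axis=0))   # both coordinates near zero at the true mean
```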
Example 2. (Variogram inference). Estimation of the variogram 2γ(hi ) ≡ Var(Zs −
Zs+hi ) = E{(Zs − Zs+hi )2 } of the process {Zs : s ∈ Zd } at given lags h1 , . . . , hp ∈ Zd is
an important problem. Least squares variogram fitting is commonly proposed in the geostatistical literature; see Lee and Lahiri (2002) and references therein. For EL inference
on the variogram θ = (2γ(h1 ), . . . , 2γ(hp ))0 ∈ Rp , we define a vector function Gθ (Ys ) =
(g1,θ (Ys ), . . . , gp,θ (Ys ))0 of the (p + 1)-dimensional process Ys = (Zs , Zs+h1 , . . . , Zs+hp )0 , where
gi,θ (Ys ) = (Zs − Zs+hi )2 − 2γ(hi ). This selection fulfills (1) with r = p, m = p + 1.
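Similarly, the sketch below (assuming a fully observed rectangular grid, which makes Rn,Y easy to delimit) evaluates gi,θ(Ys) = (Zs − Zs+hi)² − 2γ(hi) at every eligible site:

```python
import numpy as np

def variogram_G(field, lags, two_gamma):
    """Per-site estimating functions g_{i,theta}(Y_s) = (Z_s - Z_{s+h_i})^2 - 2*gamma(h_i),
    evaluated at every site s whose shifted neighbours s + h_i all lie in the observed
    rectangle (these sites play the role of R_{n,Y} here)."""
    n1, n2 = field.shape
    hi1 = max(0, *(h[0] for h in lags))
    lo1 = min(0, *(h[0] for h in lags))
    hi2 = max(0, *(h[1] for h in lags))
    lo2 = min(0, *(h[1] for h in lags))
    base = field[-lo1:n1 - hi1, -lo2:n2 - hi2]                              # Z_s over R_{n,Y}
    cols = []
    for (h1, h2), tg in zip(lags, two_gamma):
        shifted = field[-lo1 + h1:n1 - hi1 + h1, -lo2 + h2:n2 - hi2 + h2]   # Z_{s+h}
        cols.append(((base - shifted) ** 2 - tg).ravel())
    return np.stack(cols, axis=-1)                                          # r = p columns

rng = np.random.default_rng(1)
Z = rng.normal(size=(40, 40))                                 # white noise: 2*gamma(h) = 2
G = variogram_G(Z, lags=[(1, 0), (0, 1)], two_gamma=[2.0, 2.0])
print(G.shape, G.mean(axis=0))
```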
Example 3. (Pseudo-likelihood inference). Markov random fields provide an important class of models for spatial lattice data. They allow the conditional distribution of an
observation Zs, s ∈ Zd, to be written through a neighborhood structure as

fθ(z | {Zh : h ≠ s}) = Pθ(Zs = z | {Zh : h ∈ Ns})          if Zs is discrete,
                       density fθ(z | {Zh : h ∈ Ns})       if Zs is continuous,      z ∈ R,    (2)
where Ns ⊂ Zd denotes a neighborhood of Zs (Cressie (1993), Chapter 6). Besag (1974)
developed models based on conditional distributions from one-parameter exponential families
in (2) and estimated them through maximum pseudo-likelihood (Besag (1975)), where the
pseudo-likelihood estimator θ̂_n^{PL} of θ ∈ Θ ⊂ Rp solves the score-based system

Σ_{s∈Rn∩Zd} ∂ log fθ(Zs | {Zh : h ∈ Ns}) / ∂θ = 0p ∈ Rp.

Confidence regions for θ based on a normal approximation for θ̂_n^{PL} often require estimating
the variance Var(θ̂_n^{PL}) of the pseudo-likelihood estimator, a difficult task in general. This
issue is relevant when fitting (2) with pseudo-likelihood to examine clustering in the mortality
map described in the Introduction. However, the EL method may be generally applied for
pseudo-likelihood inference with the advantage that a confidence region for a parameter θ
characterizing (2) can be set by simply calibrating an EL function. This will be illustrated
for the mortality map example in Section 6.
To set up EL inference for a conditional distribution (2), suppose the neighborhoods Ns ,
s ∈ Zd , have a constant structure, such as “four-nearest neighbor” Ns ≡ {s ± e ∈ Z2 : e =
(0, 1)0 , (1, 0)0 } when d = 2. For describing θ ∈ Θ ⊂ Rp , we choose r = p score-functions
Gθ (Ys ) = ∂ log fθ (Zs | {Zh : h ∈ Ns })/∂θ involving a vector Ys = (Zs , Zs+h1 , . . . , Zs+h|Ns | )0 ,
hi ∈ Ns − s, formed by Zs and its |Ns | neighbors, s ∈ Zd . For Markov random fields based on
exponential-family models (2), these functions entail the moment condition (1) for θ.
2.2 Spatial blockwise empirical likelihood construction
Suppose a spatial parameter θ ∈ Θ ⊂ Rp is identified through a vector process Ys , s ∈ Zd ,
and estimating functions Gθ (·) satisfying (1). Construction of the spatial EL function for θ
requires spatial blocks of observed vectors Ys , s ∈ Rn,Y ∩ Zd . We consider two possible sources
of rectangular blocks within Rn,Y , namely, maximally overlapping (OL) and non-overlapping
(NOL) blocks. Such blocking schemes are common with other block-based spatial resampling
methods, such as the spatial block bootstrap and spatial subsampling (Lahiri (2003a)).
Let {bn }n≥1 be a sequence of positive integers and define general d-dimensional blocks as
Bbn (i) ≡ i + bn U, i ∈ Zd , using the cube U = (−1/2, 1/2]d . To keep the blocks small relative
to the sampling region Rn,Y , we suppose bn grows at a slower rate than the sample size nY ,
and require that
b_n^{-1} + b_n^{2d}/nY = o(1)    (3)
as n → ∞. We elaborate on this block condition in Section 7. The integer index set I_{bn}^{OL} = {i ∈ Zd : Bbn(i) ⊂ Rn,Y} identifies all integer-translated cubes bnU lying completely inside the sampling region Rn,Y for the Ys-observations. From this, the collection of maximally OL blocks is given by {Bbn(i) : i ∈ I_{bn}^{OL}}; see Figure 2(c). For NOL blocks, the region Rn,Y is divided into disjoint cubes of Ys-observations. Letting I_{bn}^{NOL} = {bnk : k ∈ Zd, Bbn(bnk) ⊂ Rn,Y} ⊂ Zd represent the index set of all NOL cubes Bbn(bnk) = bn(k + U) lying completely inside Rn,Y, the NOL block collection is then {Bbn(i) : i ∈ I_{bn}^{NOL}}; see Figure 2(b).
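As shown in the short sketch below (assuming a rectangular Rn,Y of n1 × n2 sites, and labelling each block by a lattice corner rather than its centre, which is an equivalent re-indexing of Bbn(i)), the OL and NOL collections differ only in the stride at which blocks are placed:

```python
def block_corners(n1, n2, b, overlapping=True):
    """Lower-left corners of all b x b blocks lying completely inside a rectangular
    n1 x n2 grid of Y-observations; stride 1 gives the OL collection, stride b the
    NOL collection (a corner-based relabelling of I^{OL}_{b_n} and I^{NOL}_{b_n})."""
    step = 1 if overlapping else b
    return [(i, j) for i in range(0, n1 - b + 1, step)
                   for j in range(0, n2 - b + 1, step)]

print(len(block_corners(12, 12, 3, overlapping=True)))   # (12 - 3 + 1)^2 = 100 OL blocks
print(len(block_corners(12, 12, 3, overlapping=False)))  # (12 / 3)^2    =  16 NOL blocks
```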
Figure 2: (a) Sampling region Rn,Y; (b) NOL complete blocks; (c) OL blocks; (d) Bootstrap region R*_{n,Y} formed by the complete blocks in (b). (Bootstrap samples on R*_{n,Y} are found by resampling data blocks from (c) and concatenating these into block positions in (d).)
In the following, we let In generically denote either chosen index set I_{bn}^{OL} or I_{bn}^{NOL} and denote the number of blocks as NI = |In|. Using estimating functions Gθ in (1), we compute a sample mean M_{θ,i} = Σ_{s∈Bbn(i)∩Zd} Gθ(Ys)/b_n^d, i ∈ In, for each block Bbn(i) in the collection, which provides |Bbn(i) ∩ Zd| = b_n^d observations of Gθ(Ys), s ∈ Zd. The EL function Ln(θ) and EL ratio Rn(θ) for θ ∈ Θ are then given by
Ln(θ) = sup{ Π_{i∈In} pi : pi ≥ 0, Σ_{i∈In} pi = 1, Σ_{i∈In} pi M_{θ,i} = 0r },      Rn(θ) = Ln(θ) / (1/NI)^{NI}.      (4)
The EL function for θ ∈ Θ involves maximizing a multinomial likelihood created from probabilities assigned to each block sample mean, under an expectation-based linear constraint. Without the expectation constraint in Ln (θ), the product has a maximum when each pi = 1/NI ,
yielding the EL ratio in (4). If 0r ∈ Rr is interior to the convex hull of {Mθ,i : i ∈ In }, then
Ln (θ) represents a positive, constrained maximum and (4) may be written as
Ln(θ) = Π_{i∈In} p_{θ,i},      Rn(θ) = Π_{i∈In} (1 + t_θ'M_{θ,i})^{-1},      p_{θ,i} = {NI(1 + t_θ'M_{θ,i})}^{-1} ∈ (0, 1),      (5)

where t_θ solves Σ_{i∈In} M_{θ,i}/(1 + t_θ'M_{θ,i}) = 0r. We define Ln(θ) = −∞ when the set in (4) is
empty. See Owen (1990) and Qin and Lawless (1994) for these computational details on EL.
In the next section, we consider the distribution of the log EL ratio given by
ℓn(θ) = −2 Bn log Rn(θ),      Bn = nY / (b_n^d NI).      (6)
The factor Bn is a block adjustment to ensure chi-square limits for (6), and represents the
spatial analog of the block correction used for the time series blockwise EL (Kitamura (1997)).
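For computation, evaluating (4)-(6) at a fixed θ reduces to a low-dimensional concave problem in the multiplier tθ of (5). The following minimal sketch (Python/NumPy) takes the NI × r matrix of block means Mθ,i and the adjustment Bn and returns ℓn(θ); the "pseudo-logarithm" safeguard on the dual criterion is an added implementation choice, not part of the construction above. Minimizing ℓn(θ) over θ numerically yields the maximum EL estimator discussed in Section 3.2.

```python
import numpy as np

def _log_star(z, eps):
    """log(z) for z >= eps, extended quadratically below eps so the dual
    criterion is defined for every multiplier t (Owen's pseudo-logarithm)."""
    out = np.empty_like(z)
    ok = z >= eps
    out[ok] = np.log(z[ok])
    zb = z[~ok]
    out[~ok] = np.log(eps) - 1.5 + 2.0 * zb / eps - 0.5 * (zb / eps) ** 2
    return out

def el_log_ratio(M, Bn, iters=50, tol=1e-10):
    """Blockwise log EL ratio l_n(theta) = -2 * Bn * log R_n(theta), given the
    N_I x r matrix M of block means M_{theta,i}.  The multiplier t is found by
    Newton steps on the concave dual; its stationary point solves
    sum_i M_i / (1 + t'M_i) = 0_r, as in (5)."""
    NI, r = M.shape
    eps = 1.0 / NI
    t = np.zeros(r)
    for _ in range(iters):
        z = 1.0 + M @ t
        zc = np.maximum(z, eps)
        d1 = np.where(z >= eps, 1.0 / zc, 2.0 / eps - z / eps**2)    # (log*)'(z)
        d2 = np.where(z >= eps, -1.0 / zc**2, -1.0 / eps**2)         # (log*)''(z)
        grad = M.T @ d1
        hess = (M * d2[:, None]).T @ M                               # negative definite
        step = np.linalg.solve(hess, -grad)
        t = t + step
        if np.linalg.norm(step) < tol:
            break
    return 2.0 * Bn * np.sum(_log_star(1.0 + M @ t, eps))

# Illustrative use for a scalar mean on a rectangular grid with OL blocks (d = 2):
rng = np.random.default_rng(2)
Z = rng.normal(loc=1.0, size=(25, 25))
b, theta = 3, 1.0
G = Z - theta                                        # G_theta(Y_s) = Z_s - theta
M = np.array([G[i:i + b, j:j + b].mean()             # OL block means M_{theta,i}
              for i in range(Z.shape[0] - b + 1)
              for j in range(Z.shape[1] - b + 1)]).reshape(-1, 1)
Bn = Z.size / (b**2 * M.shape[0])                    # Bn = n_Y / (b_n^d * N_I), eq. (6)
print(el_log_ratio(M, Bn))                           # approximately chi-square(1) at the true mean
```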
3 Main results
Distributional results for the spatial EL are established under a set of assumptions referred
to as “Assumptions 1-4” in the sequel. We defer technical details on these assumptions to
Section 8.1 of the online Appendix. In brief, Assumption 1 provides a condition equivalent to
the block growth rate (3). Assumptions 2-4 describe spatial mixing and moment conditions
which allow the spatial EL method to be valid for a large class of spatial processes exhibiting
weak spatial dependence. All of the EL results to follow apply equally to EL functions Rn(θ), ℓn(θ) constructed of either OL or NOL blocks (i.e., In = I_{bn}^{OL} or I_{bn}^{NOL}).
3.1 Smooth function model
We first establish the distribution of spatial blockwise EL ratios for inference on “smooth
function” parameters, as in Hall and La Scala (1990) for iid data and Kitamura (1997) for
mixing time series. Suppose θ = E{G(Ys )} ∈ Θ ⊂ Rp represents the mean of a function
G : Rm → Rp applied to an m-dimensional vector Ys , s ∈ Zd . EL inference on a more general
parameter θH = H(θ) ∈ Ru may be considered using a smooth function H : Rp → Ru of
θ. This “smooth function” model permits a wide range of spatial parameters θH , including
ratios or differences of means θ. For example, θ = {E(Ys), E(Ys²), E(Ys Ys+h)}' ∈ R³ and
H(x1, x2, x3) = (x3 − x1²)/(x2 − x1²) : R³ → R yield a spatial autocorrelation θH = H(θ) at lag
h ∈ Zd . For smooth model inference, we first define an EL ratio Rn (θ) for θ using functions
Gθ (Ys ) = G(Ys ) − θ, s ∈ Zd in (5), which satisfy (1) with the same number of parameters and
estimating functions r = p. An EL ratio and log-ratio for a parameter θH are then defined as
Rn(θH) ≡ sup_{θ∈Θ: H(θ)=θH} Rn(θ),      ℓn(θH) ≡ −2 Bn log Rn(θH).
Theorem 1 provides a nonparametric recasting of Wilks' theorem for spatial data, useful for calibrating confidence regions and tests of spatial "smooth model" parameters based on a chi-square approximation. In the following, χ²_ν denotes a chi-square variable with ν degrees of freedom with a lower α-quantile given by χ²_{ν;α}, and →_d denotes distributional convergence.

Theorem 1 (Smooth functions of means) Suppose In = I_{bn}^{OL} or I_{bn}^{NOL}; E{G(Ys)} = θ ∈ Rp; Assumptions 1-4 hold with r = p estimating functions Gθ(Ys) = G(Ys) − θ, s ∈ Zd; H : Rp → Ru is continuously differentiable in a neighborhood of θ0 and θ_{0H} = H(θ0). Then, ℓn(θ_{0H}) →_d χ²_ν as n → ∞, where ν denotes the rank of the u × p matrix ∂H(θ)/∂θ|_{θ=θ0}.
See Hall and La Scala (1990) for properties of EL confidence regions for smooth model parameters.
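As a small illustration of the autocorrelation example above (a sketch assuming a rectangular grid and a lag h with nonnegative components; profiling Rn(θ) over {θ : H(θ) = θH} to obtain Rn(θH) is omitted):

```python
import numpy as np

def G_autocorr(field, h=(1, 0)):
    """Per-site values of G(Y_s) = (Z_s, Z_s^2, Z_s * Z_{s+h})' over the sites whose
    lag-h neighbour is observed; theta = E{G(Y_s)} is the mean of these rows."""
    base = field[:field.shape[0] - h[0], :field.shape[1] - h[1]]
    shifted = field[h[0]:, h[1]:]
    return np.stack([base.ravel(), base.ravel() ** 2, (base * shifted).ravel()], axis=-1)

def H(x):
    """Smooth map giving the lag-h autocorrelation theta_H = H(theta)."""
    return (x[2] - x[0] ** 2) / (x[1] - x[0] ** 2)

rng = np.random.default_rng(3)
Z = rng.normal(size=(50, 50))
print(H(G_autocorr(Z, (1, 0)).mean(axis=0)))   # near 0 for white noise
```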
3.2 Maximum empirical likelihood point estimation
We refer to the maximum of Rn (θ) from (5) as the maximum empirical likelihood estimator
(MELE) and denote it by θ̂n . Using general estimating equations, Qin and Lawless (1994) and
Kitamura (1997) considered the distribution of the MELE with independent data and mixing
time series, respectively. With spatial data, we show the MELE has properties resembling
those available in other EL frameworks.
We first consider establishing the existence, consistency and asymptotic normality of a
sequence of maxima of the EL ratio Rn (θ) from (5), along the lines of the classical arguments
of Cramér (1946). The conditions are mild and have the advantage that they are typically
easy to verify. Let || · || denote the Euclidean norm in the following.
Theorem 2 (General estimating equations) Assume In = I_{bn}^{OL} or I_{bn}^{NOL}, Assumptions 1-4 and (1) hold. In addition, suppose in a neighborhood of θ0, ∂Gθ(·)/∂θ, ∂²Gθ(·)/∂θ∂θ' are continuous in θ and ||∂Gθ(·)/∂θ||, ||∂²Gθ(·)/∂θ∂θ'|| are bounded by a nonnegative, real-valued J(·) with E{J³(Ys)} < ∞; and D_{θ0} ≡ E{∂Gθ(Ys)/∂θ|_{θ=θ0}} has full column rank p. Then, as n → ∞, P(Rn(θ) is continuously differentiable on ||θ − θ0|| ≤ n_Y^{-5/12}) → 1; there exists a sequence {θ̂n} such that P(Rn(θ̂n) = max_{||θ−θ0||≤n_Y^{-5/12}} Rn(θ) and ||θ̂n − θ0|| < n_Y^{-5/12}) → 1; and

n_Y^{1/2} ( (θ̂n − θ0)' , (b_n^{-d} t_{θ̂n})' )'  →_d  N( (0_p', 0_r')' , diag(V_{θ0}, U_{θ0}) ),

where V_{θ0} = (D_{θ0}' Σ_{θ0}^{-1} D_{θ0})^{-1}, U_{θ0} = Σ_{θ0}^{-1} − Σ_{θ0}^{-1} D_{θ0} V_{θ0} D_{θ0}' Σ_{θ0}^{-1}.
Remark. For an iid sample of size n, Qin and Lawless (1994) established a related result for a ball ||θ − θ0|| ≤ n^{-1/3}. We could replace n_Y^{-5/12} with n_Y^{-1/3}, to allow a larger ball in Theorem 2, by strengthening moment assumptions (i.e., E(||Gθ(Ys)||^{12+δ}) < ∞ in Assumption 1 of Section 8.1 of the online Appendix). However, regardless of the ball radius n_Y^{-5/12} or n_Y^{-1/3}, the maximizer of the EL function on each ball satisfies ||θ̂n − θ0|| = Op(n_Y^{-1/2}), and thereby maximizers on the different balls must ultimately be equal.
Theorem 2 establishes the existence of a local maximizer of the spatial EL function. When
the likelihood Rn (θ) has a single maximum with probability approaching 1, by the concavity
of Rn (θ) for example, then the sequence {θ̂n } corresponds to a global MELE. Under stronger
conditions, as in Kitamura (1997), a global maximum on Θ can be shown to satisfy ||θ̂n − θ0|| = Op(n_Y^{-1/2}), thereby coinciding with the sequence in Theorem 2. However, Theorem 2
conditions are often sufficient, for many estimating functions, to ensure that the sequence {θ̂n }
in Theorem 2 corresponds to global maximizers without more restrictive assumptions, such as
compactness of the parameter space Θ. For example, this is true with estimating functions of
the common form Gθ (Ys ) = G(Ys ) − γ(θ) for some G : Rm → Rr and differentiable γ : Θ → Rr
with ||γ(θ) − γ(θ0)|| increasing in ||θ − θ0||; see Example 1 of Section 2.1 for illustration.
3.3 Empirical likelihood tests of hypotheses
As in the EL frameworks of Qin and Lawless (1994) and Kitamura (1997), the spatial EL
method allows test statistics based on θ̂n for both spatial parameter and moment hypotheses.
The distribution of the log-EL ratio rn(θ) ≡ ℓn(θ) − ℓn(θ̂n) at θ = θ0 is useful for simple hypothesis tests or for calibrating approximate 100(1 − α)% EL confidence regions for θ as {θ ∈ Θ : rn(θ) ≤ χ²_{p;1−α}}. For testing the null hypothesis that the moment condition (1) holds for the estimating functions, the log-ratio statistic ℓn(θ̂n) may be applied. Theorem 3 provides
the limiting chi-square distributions of these EL log-ratio statistics.
In Theorem 3, we show additionally that the profile spatial EL ratio statistics can be developed to conduct tests and set confidence regions in the presence of nuisance parameters; see Qin and Lawless (1994) for the iid data case. Let θ = (θ1', θ2')', where θ1 denotes the q × 1 parameter of interest and θ2 denotes a (p − q) × 1 nuisance vector. For fixed θ1, suppose that θ̂2^{(θ1)} maximizes the EL function Rn(θ1, θ2) with respect to θ2 and define the profile log-EL ratio ℓn(θ1) ≡ −2 Bn log Rn(θ1, θ̂2^{(θ1)}) for θ1.

Theorem 3 Under the assumptions of Theorem 2 with the sequence {θ̂n}, as n → ∞,
(i) rn(θ0) = ℓn(θ0) − ℓn(θ̂n) →_d χ²_p and ℓn(θ̂n) →_d χ²_{r−p}.
(ii) If H0 : θ1 = θ_{10} holds, then rn(θ_{10}) = ℓn(θ_{10}) − ℓn(θ̂_{1n}) →_d χ²_q, where θ̂n = (θ̂_{1n}, θ̂_{2n})'.
We examine the performance of the spatial EL in subsequent sections. EL inference for spatial
parameters under constraints is also possible, as considered by Qin and Lawless (1995) and
Kitamura (1997) for iid and time series data; see Section 8.4 of the online Appendix for this.
4 A Bartlett correction procedure
A Bartlett correction is often an important property for EL methods. This involves making a
mean adjustment to the EL log-ratio in order to improve the limiting chi-square approximation, and to enhance the coverage accuracy of EL confidence regions. For EL with independent data, a Bartlett correction has been established by DiCiccio, Hall and Romano (1991)
for smooth function means, and by Chen and Cui (2006, 2007) under general estimating equations and nuisance parameters; see Chen and Cui (2007) for additional references with iid data.
With weakly dependent time series, Kitamura (1997) and Monti (1997) considered Bartlett
corrections for blockwise EL with mean parameters and a periodogram-type EL, respectively.
While a formal justification of a Bartlett correction in the spatial setting is difficult, a practical Bartlett correction for the spatial EL may be proposed using a spatial block bootstrap.
Let rn(θ) = ℓn(θ) − ℓn(θ̂n), θ ∈ Θ ⊂ Rp, denote the log EL ratio from Section 3.3 based on the MELE θ̂n and (6). By Theorems 2-3, we have rn(θ0) →_d χ²_p and θ̂n is consistent for θ0, so that a bootstrap Bartlett correction factor may be calculated as follows. Pick some large M ∈ N. For i = 1, . . . , M, independently generate a block bootstrap rendition, say Y_n^{*i}, of the original vectorized spatial data Y_n ≡ {Ys : s ∈ Rn,Y ∩ Zd} and compute r_n^{*i}(θ̂n) = ℓ_n^{*i}(θ̂n) − ℓ_n^{*i}(θ̂_n^{*i}), where ℓ_n^{*i} and θ̂_n^{*i} are the log EL ratio and MELE analogs based on Y_n^{*i}. We then compute r̄_n^* = M^{-1} Σ_{i=1}^M r_n^{*i}(θ̂n) to estimate E{rn(θ0)} and set a Bartlett-corrected 100(1 − α)% confidence region as {θ : (p/r̄_n^*) rn(θ) ≤ χ²_{p,1−α}}. If θ = (θ1', θ2')' with interest on θ1 ∈ Rq, treating θ2 ∈ R^{p−q} as a nuisance parameter as in Theorem 3, we take a Bartlett-corrected confidence region for θ1 as {θ1 : (q/r̄_n^*) rn(θ1) ≤ χ²_{q,1−α}} with respect to rn(θ1) = ℓn(θ1) − ℓn(θ̂_{1n}) and r̄_n^* based on r_n^{*i}(θ̂_{1n}) = ℓ_n^{*i}(θ̂_{1n}) − ℓ_n^{*i}(θ̂_{1n}^{*i}). Under the smooth function model in Theorem 1, the same algorithm applies for the EL ratio rn(θ_{0H}) ≡ ℓn(θ_{0H}), with ℓn(θ̂n) = 0 in this case.
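The steps above can be organized as in the following generic sketch, where the log EL ratio and the spatial block bootstrap of Section 8.5 are supplied as callables (an implementation assumption) and the chi-square quantile comes from SciPy; the default M = 1000 matches the settings used in Sections 5-6.

```python
import numpy as np
from scipy.stats import chi2

def bootstrap_bartlett(r_n, theta_hat, bootstrap_data, p, M=1000, alpha=0.05, rng=None):
    """Bootstrap Bartlett correction of Section 4 (generic sketch).

    r_n(theta, data)     : log EL ratio r_n(theta) = l_n(theta) - l_n(MELE(data)) from `data`
    theta_hat            : MELE computed from the original data
    bootstrap_data(rng)  : one block-bootstrap rendition Y_n^{*i} of the vectorized data
    p                    : dimension of theta (chi-square degrees of freedom)

    Returns r_bar, estimating E{r_n(theta_0)}, and a membership test for the
    Bartlett-corrected 100(1 - alpha)% region {theta : (p / r_bar) r_n(theta) <= chi2_{p,1-alpha}}.
    """
    rng = np.random.default_rng() if rng is None else rng
    r_bar = np.mean([r_n(theta_hat, bootstrap_data(rng)) for _ in range(M)])
    cutoff = chi2.ppf(1.0 - alpha, df=p)

    def in_region(theta, data):
        return (p / r_bar) * r_n(theta, data) <= cutoff

    return r_bar, in_region
```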
As an alternative to the Bartlett correction, another option would be to calibrate confidence regions for the log EL ratio rn(θ) using sample quantiles from the M bootstrap replicates r_n^{*i}(θ̂n). The Bartlett correction involves estimating the mean of rn(θ0) at the true
parameter θ0 while the bootstrap calibration aims to approximate extreme quantiles of the
distribution of rn (θ0 ). Intuitively, mean estimation is a more robust task and may possibly
require fewer bootstrap replicates M for adequate estimates. Simulation studies with independent data in Chen and Cui (2007) appear to suggest this as well. For this reason, we
concentrate our numerical studies in Section 5 on the Bartlett correction.
For completeness, we describe a spatial block bootstrap method for generating a bootstrap version of Y_n on Rn,Y in Section 8.5 of the online Appendix. The bootstrap involves spatial blocks determined by a block scaling factor b_{n,bt}, satisfying b_{n,bt}^{-1} + b_{n,bt}^d/nY = o(1). The bootstrap scaling b_{n,bt} may differ from the EL block scaling bn and might be expected to be larger than bn in many cases.
5 Numerical study
We conducted a simulation study to compare OL and NOL versions of the blockwise EL
method, and to examine the Bartlett correction algorithm for inference on the mean E(Zs ) = θ
of a real-valued spatial process Zs , s ∈ Z2 , on the integer grid. Sampling regions Rn =
λn (−1/2, 1/2]2 ⊂ R2 of different sizes were considered with λn = 10, 20, 30; a fourth region
was taken as Rn = (−5, 5] × (−15, 15]. We used the circulant embedding method of Chan
and Wood (1997) to generate real-valued mean-zero Gaussian random fields on Z2 with an
Exponential or Gaussian covariance structure: for h = (h1, h2)' ∈ Z²,

Cov(Zs, Zs+h) = exp[−β1|h1| − β2|h2|]      (model E(β1, β2)),
Cov(Zs, Zs+h) = exp[−β1|h1|² − β2|h2|²]    (model G(β1, β2)),
with values (β1 , β2 ) = (0.8, 0.8) and (0.4, 0.2). Using Ys = Zs and Gθ (Ys ) = Zs − θ in (1), we
calculated approximate two-sided 90% EL intervals for θ as {θ : rn(θ) ≤ χ²_{1;0.9}} using OL/NOL blocks of length bn = C n^{1/5}, C = 1, 2, where n = |Rn ∩ Z²| and rn(θ) = ℓn(θ); note ℓn(θ̂n) = 0 here for the mean and nY = n. This order of the EL block factor was intuitively chosen to be smaller than the optimal order O(n^{1/(d+2)}) known for spatial subsampling variance estimation when d = 2 (Sherman (1996)); EL block scaling bn is discussed further in Section 7. Using the algorithm from Section 4, Bartlett-corrected EL intervals were also computed using M = 1000 Monte Carlo approximations and bootstrap block sizes b_{n,bt} = n^{1/4}, n^{1/3}. Additionally, for comparison to EL intervals, normal approximation intervals for θ were taken as Z̄n ± 1.645 Sn, using the sample mean Z̄n over Rn and a spatial subsampling variance estimator S_n² of Var(Z̄n) based on a plug-in estimate of its optimal block size, with pilot block sizes n^{1/(2+2i)}, i = 1, 2; see Nordman and Lahiri (2004). Table 1 provides summaries of the coverage accuracies and
interval lengths for the EL method based on 1000 simulation runs for each sampling region
and covariance structure; Table 2 provides the same for the subsampling-based intervals. The
Bartlett correction appears to provide large improvements in the EL intervals across a variety
of dependence structures. From the results, we make the following observations:
1. The coverage accuracies of the intervals often improved, and the interval lengths decreased, as the strength of underlying spatial dependence decreased and the size of the
sampling region increased.
2. Coverage probabilities of uncorrected EL and normal approximation intervals were similar, and often far below the nominal level. This agrees with other simulation results
for EL with independent data, in which uncorrected EL intervals often appeared too
narrow (DiCiccio, Hall and Romano (1991), Chen and Cui (2005b)).
3. Bartlett-corrected EL intervals based on NOL and OL blocks were generally competitive
and had coverage accuracies that were much closer to the nominal level than uncorrected
intervals. The NOL block version typically performed better with shorter blocks bn .
Table 1: Coverage probabilities for approximate two-sided 90% EL confidence intervals for the process mean (upper entry of each pair), with expected interval lengths (lower entry), based on OL/NOL blocks of length bn. UC, BC3 and BC4 denote uncorrected and Bartlett-corrected intervals based on bootstrap blocks bn,bt = n^{1/3} and n^{1/4}, respectively, and n1 × n2 denotes the size of the sampling region with n = n1 n2.

                         bn = n^{1/5}                                bn = 2n^{1/5}
                 NOL                  OL                     NOL                  OL
              UC   BC3   BC4      UC   BC3   BC4         UC   BC3   BC4      UC   BC3   BC4
E(0.4,0.2)
 10×10      39.6  68.1  76.6    38.7  60.9  65.7       47.9  68.8  68.8    42.4  76.5  74.4
            0.57  1.11  1.54    0.57  0.96  1.12       0.63  1.08  1.08    0.58  1.31  1.28
 20×20      41.8  69.3  77.2    42.0  61.4  73.0       39.6  75.1  62.3    51.0  79.3  74.2
            0.35  0.65  0.90    0.36  0.54  0.72       0.42  0.98  0.74    0.51  0.91  0.82
 30×30      50.9  72.6  77.6    50.0  68.0  79.5       55.2  95.7  95.4    58.8  78.6  77.1
            0.30  0.48  0.55    0.30  0.44  0.59       0.40  1.21  1.25    0.41  0.63  0.63
 10×30      42.3  75.6  80.2    40.4  58.3  69.9       48.0  80.3  80.3    50.1  80.0  90.5
            0.40  0.91  1.28    0.40  0.61  0.83       0.49  1.21  1.21    0.53  1.02  1.47
E(0.8,0.8)
 10×10      64.6  88.8  84.9    63.8  81.5  78.7       62.8  83.6  83.6    55.2  91.3  89.0
            0.49  0.98  1.06    0.50  0.76  0.76       0.47  0.81  0.81    0.46  1.06  1.00
 20×20      69.4  86.9  83.6    71.4  83.6  85.4       53.2  82.7  71.6    70.8  86.3  81.1
            0.28  0.42  0.49    0.28  0.37  0.41       0.28  0.62  0.48    0.32  0.52  0.46
 30×30      74.9  87.8  85.4    73.6  85.6  88.1       73.7  98.4  98.0    76.7  89.3  87.9
            0.21  0.28  0.27    0.21  0.27  0.29       0.22  0.61  0.60    0.24  0.33  0.31
 10×30      71.9  92.2  87.8    70.5  82.3  84.5       62.7  91.3  91.0    70.5  89.2  92.4
            0.32  0.56  0.63    0.32  0.43  0.47       0.33  0.81  0.81    0.36  0.61  0.82
G(0.4,0.2)
 10×10      64.3  89.7  90.4    62.6  86.0  83.6       62.5  80.1  80.1    56.1  87.9  86.4
            0.63  1.27  1.55    0.64  1.08  1.09       0.60  1.02  1.02    0.58  1.30  1.26
 20×20      66.7  89.0  86.1    67.6  83.4  85.0       54.3  86.1  77.5    70.1  91.9  85.8
            0.35  0.57  0.65    0.35  0.49  0.55       0.34  0.80  0.63    0.41  0.70  0.60
 30×30      76.3  90.4  87.5    76.6  87.1  89.6       71.4  98.6  98.3    74.9  89.8  86.5
            0.27  0.38  0.36    0.27  0.35  0.39       0.29  0.81  0.78    0.30  0.43  0.40
 10×30      70.6  92.9  87.6    69.1  83.6  85.2       59.8  91.3  91.0    69.0  92.4  94.9
            0.40  0.77  0.85    0.40  0.57  0.63       0.42  1.04  1.04    0.46  0.84  1.15
Table 2: Coverage probabilities (upper entry) and expected interval lengths (lower entry) for approximate two-sided 90% confidence intervals for the process mean based on a normal approximation with a subsampling variance estimator. Sampling region Rn sizes noted by n1 × n2.

                          E(β1, β2)                            G(β1, β2)
(β1, β2)      10×10   20×20   30×30   10×30       10×10   20×20   30×30   10×30
(0.4,0.2)      44.2    50.9    64.0    53.9        70.4    71.9    81.4    76.0
               0.59    0.40    0.38    0.50        0.64    0.36    0.28    0.45
(0.8,0.8)      69.0    77.9    79.7    76.7        80.4    81.9    86.0    82.5
               0.50    0.30    0.22    0.34        0.49    0.27    0.19    0.31
4. Under spatial dependence E(0.4,0.2), the Bartlett-corrected EL intervals were most sensitive to the EL and bootstrap block sizes. In this case, larger blocks seemed preferable
to capture the stronger dependence structure.
Repeating the simulation with M = 500 or 250 bootstrap renditions did not change the
results significantly, suggesting an adequate Bartlett correction may also be possible with
fewer spatial bootstrap replicates.
6 Data example: cancer mortality map
The spatial EL method was applied to the cancer mortality map shown in Figure 1(a), constructed using mortality rates from liver and gallbladder cancer in white males during 1950-1959. Sherman and Carlstein (1994) considered these data for applying subsampling. We use
their division of high and low mortality rates for illustration purposes, recognizing that the
map’s binary nature discards useful information relevant to the underlying scientific problem.
The sampling region Rn in Figure 1(a) contains 2298 sites on a portion of the integer grid
(0, 66] × (0, 58] ∩ Z². For a given site s ∈ Z², we code Zs = 0 or 1 to indicate a low or high mortality rate, and let Ss = Σ_{h∈Ns} Zh denote the sum of indicators Zh over the four nearest neighbors Ns = {h ∈ Z² : ||s − h|| = 1} of site s.
To test whether incidences of high cancer mortality exhibit clumping, Sherman and Carlstein proposed examining the spatial dependence parameter β of an autologistic model of
Figure 3: Spatial log-EL ratio rn(β) for β, and a Bartlett-corrected version rn(β)/r̄_n^* for various block lengths bn. Horizontal lines indicate the chi-square quantile χ²_{1;0.95}, and approximate 95% confidence intervals for β appear in brackets; MELEs β̂n, α̂n are given for each bn.
the type introduced by Besag (1974). That is, suppose the binary process Zs , s ∈ Z2 was
generated by the conditional model, with parameters θ = (α, β)', written as

fθ(z | {Zh : h ≠ s}) = Pθ(Zs = z | {Zh : h ∈ Ns}) = exp[z(α + βSs)] / (1 + exp[α + βSs]),    z = 0, 1.    (7)
Positive values of β suggest a tendency for clustering, while β = 0 implies no clustering
among sites. Sherman and Carlstein set a normal-theory confidence interval for β based on
the pseudo-likelihood estimate β̂_n^{PL} and a spatial subsampling variance estimate for Var(β̂_n^{PL}).
The spatial EL may be applied to investigate evidence of clumping without a variance
estimation step. For this, we use pseudo-likelihood-type estimating functions as described
in Example 3 of Section 2.1. For θ = (α, β)0 in (7), we consider the vector process Ys of
dimension m = 5, formed by Zs and its four nearest neighbors Zh , h ∈ Ns , along with
r = p = 2 estimating functions Gθ (Ys ) = ∂ log fθ (Zs | {Zh : h ∈ Ns })/∂θ based on (7).
Figure 1(b) shows the sampling region R5,n of these Ys-observations. Treating α as a nuisance parameter, we obtain a profile log-EL ratio rn(β) = ℓn(β) − ℓn(β̂n) for each β value, where ℓn(β) = ℓn(β, α̂_n^{(β)}), α̂_n^{(β)} = arg max_α Rn(β, α), and β̂n is the MELE for β. For various block choices bn, we computed the MELEs θ̂n = (α̂n, β̂n)' and, by Theorem 3, calibrated approximate 95% confidence intervals for β based on a χ²_1 distribution for rn(β). Figure 3 shows the log-EL ratio rn(β), MELEs, and corresponding approximate 95% confidence intervals for β with and without Bartlett corrections for each block size used. The Bartlett correction factor r̄_n^* was computed based on M = 1000 bootstrap renditions of the data on R5,n and a block factor b_{n,bt} = 6.
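For reference, the two score-based estimating functions of model (7) have the explicit form ∂ log fθ/∂α = Zs − ps and ∂ log fθ/∂β = Ss(Zs − ps), with ps = exp(α + βSs)/{1 + exp(α + βSs)}. The sketch below (a minimal illustration using a synthetic 0/1 map as a stand-in for the mortality data) evaluates these scores over interior sites, which play the role of R5,n here; block means of these rows, fed through the EL construction of Section 2.2 and profiled over α, give rn(β) in the manner described above.

```python
import numpy as np

def autologistic_scores(Z, alpha, beta):
    """G_theta(Y_s) = d log f_theta(Z_s | neighbours)/d theta for model (7),
    evaluated at every interior site of a rectangular 0/1 array Z."""
    S = (Z[:-2, 1:-1] + Z[2:, 1:-1] + Z[1:-1, :-2] + Z[1:-1, 2:]).astype(float)  # S_s
    z = Z[1:-1, 1:-1].astype(float)                                              # Z_s
    p = 1.0 / (1.0 + np.exp(-(alpha + beta * S)))        # P_theta(Z_s = 1 | neighbours)
    resid = z - p
    return np.stack([resid.ravel(), (S * resid).ravel()], axis=-1)   # (score_alpha, score_beta)

# Illustrative check: for an independent 0/1 field with P(Z_s = 1) = 0.4, the average
# score is near zero at beta = 0 and alpha = logit(0.4).
rng = np.random.default_rng(4)
Zmap = (rng.random((58, 66)) < 0.4).astype(int)
print(autologistic_scores(Zmap, alpha=np.log(0.4 / 0.6), beta=0.0).mean(axis=0))
```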
As in the simulation study of Section 5, Bartlett-corrected EL intervals for β are notably
wider than their uncorrected counterparts. EL intervals for β suggest clustering but these
are shifted much closer to zero compared to Sherman and Carlstein’s subsampling-based 95%
confidence interval (0.2185, 0.6183) (after re-parameterization there). In comparison, the EL
method gives a slightly moderated interpretation of clustering. Of additional note, the behavior of EL intervals in Figure 3 also suggests a visual way of selecting a block size for the EL
method; this is described in the next section.
7 Spatial empirical likelihood block scaling
The spatial EL proposed in this article involves a block condition (3), stating that the spatial sample size nY for Rn,Y must be of larger order than the squared number of observations b_n^{2d} in a spatial block. This appears also to be necessary for the results presented previously. To see why, note that, from Theorem 2, the exact order of the EL Lagrange multiplier t_{θ̂n} is Op(b_n^d / n_Y^{1/2}), which is also the order of t_{θ0} at the true parameter θ0. Under the EL moment condition (1), we expect t_{θ0} to converge to zero in probability (requiring b_n^d / n_Y^{1/2} → 0) as the sample size increases, so that the EL block probabilities p_{θ0,i} from (5) become close to the probabilities 1/NI maximizing the EL function. Hence, (3) may represent the weakest possible requirement on the blocks.
Potential EL block scaling in Rd can involve bn = C n_Y^κ, for some C > 0 and 0 < κ < 1/(2d), although the best EL block orders for coverage accuracy are presently unknown for any d. With some time series block resampling methods, MSE-optimal blocks for distribution estimation are usually smaller than optimal blocks for variance estimation (Lahiri (2003a)). This motivated the choice bn = C n_Y^{1/5} in the simulation study of Section 5 so as to be smaller than the optimal block order O(n_Y^{1/4}) known for subsampling variance estimation when d = 2 (Sherman (1996)). This order choice of κ = 1/5 is also a compromise between the optimal block orders κ ∈ [1/4, 1/6] for some R²-subsampling distribution estimators studied by Garcia-Soidan and Hall (1997).
In practice, EL block sizes might be chosen by the “minimum volatility” method, described
by Politis, Romano and Wolf (1999) (Section 9.3.2) for time series subsampling. The method
is heuristic and based on the idea that, while some block sizes bn may be too large or small,
we might expect to find a range of bn -values yielding approximately correct inference. In this
range, confidence regions should be stable as a function of the block size. Hence, by creating
EL confidence regions over a range of block sizes, an appropriate block size could be chosen
by visual inspection. For illustration, we consider the EL confidence intervals in Figure 3 from
the mortality map example. The apparent stability of these intervals over bn = 6, 8, 10 seems
to indicate that these block choices are reasonable for applying the EL method.
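One rough formalization of this visual rule is sketched below, with the interval-for-block-size computation supplied as a callable and the endpoint-variability criterion as one simple, assumed choice of "stability" measure:

```python
import numpy as np

def min_volatility_block(interval_for_b, b_grid, window=1):
    """Heuristic block choice: compute the EL interval for each candidate block size
    and return the size whose endpoints vary least over a window of neighbouring sizes."""
    ends = np.array([interval_for_b(b) for b in b_grid])   # rows: (lower, upper) per b
    vol = np.full(len(b_grid), np.inf)
    for k in range(window, len(b_grid) - window):
        vol[k] = ends[k - window:k + window + 1].std(axis=0).sum()
    return b_grid[int(np.argmin(vol))]

# e.g. with a user-supplied interval routine el_interval_for_beta(b) (hypothetical):
# min_volatility_block(el_interval_for_beta, [2, 4, 6, 8, 10, 12])
```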
Acknowledgments
The author wishes to thank an associate editor and two referees for constructive comments
that improved an earlier version of the paper, as well as Mark Kaiser for helpful discussions.
References
Aitchison, J. and Silvey, S. D. (1953). Maximum-likelihood estimation of parameters subject
to restraints. Ann. Math. Statist. 29, 813-828.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. R. Stat. Soc. B 36, 192-236.
Besag, J. (1975). Statistical analysis of non-lattice data. The Statistician 24, 179-195.
Bravo, F. (2005). Blockwise empirical entropy tests for time series regressions. J. Time Ser.
Anal. 26, 185-210.
Chuang, C. and Chan, N. H. (2002). Empirical likelihood for autoregressive models, with
applications to unstable time series. Statist. Sinica 12, 387-407.
Chan, G. and Wood, A. T. A. (1997). An algorithm for simulating stationary Gaussian random
fields. Applied Statistics 46, 171-181.
Chen, S. X. and Cui, H.-J. (2006). On Bartlett correction of empirical likelihood in the presence
of nuisance parameters. Biometrika 93, 215-220.
Chen, S. X. and Cui, H.-J. (2007). On the second order properties of empirical likelihood with
moment restrictions. Journal of Econometrics (to appear)
Chen, S. X., Härdle, W. and Li, M. (2003). An empirical likelihood goodness-of-fit test for
time series. J. R. Stat. Soc. B 65, 663-678.
Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press, N.J.
Cressie, N. (1993). Statistics for Spatial Data, 2nd Edition. John Wiley & Sons, New York.
DiCiccio, T., Hall, P., and Romano, J. P. (1991). Empirical likelihood is Bartlett-correctable.
Ann. Statist. 19, 1053-1061.
Doukhan, P. (1994). Mixing: properties and examples. Lecture Notes in Statistics 85. Springer,
New York.
Garcia-Soidan, P. H. and Hall, P. (1997). In sample reuse methods for spatial data. Biometrics
53, 273-281.
Hall, P. and La Scala, B. (1990). Methodology and algorithms of empirical likelihood. Internat.
Statist. Rev. 58, 109-127.
Kitamura, Y. (1997). Empirical likelihood methods with weakly dependent processes. Ann.
Statist. 25, 2084-2102.
Kitamura, Y., Tripathi, G. and Ahn, H. (2004). Empirical likelihood-based inference in conditional moment restriction models. Econometrica 72, 1667-1714.
Lahiri, S. N. (2003a). Resampling Methods for Dependent Data. Springer, New York.
Lahiri, S. N. (2003b). Central limit theorems for weighted sums of a spatial process under a
class of stochastic and fixed designs. Sankhya 65, 356-388.
Lin, L. and Zhang, R. (2001). Blockwise empirical Euclidean likelihood for weakly dependent
processes. Statist. Probab. Lett. 53, 143-152.
Lee, Y. D. and Lahiri, S. N. (2002). Least squares variogram fitting by spatial subsampling.
J. R. Stat. Soc. B 64, 837-854.
Monti, A. C. (1997). Empirical likelihood confidence regions in time series models. Biometrika
84, 395-405.
Newey, W. K. and Smith, R. J. (2004). Higher order properties of GMM and generalized
empirical likelihood estimators. Econometrica 72, 219-255.
Nordman, D. J. and Lahiri, S. N. (2004). On optimal spatial subsample size for variance
estimation. Ann. Statist. 32, 1981-2027.
Nordman, D. J. and Lahiri, S. N. (2006). A frequency domain empirical likelihood for short- and long-range dependence. Ann. Statist. 34, 3019-3050.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional.
Biometrika 75, 237-249.
Owen, A. B. (1990). Empirical likelihood confidence regions. Ann. Statist. 18, 90-120.
Politis, D. N., Romano, J. P., and Wolf, M. (1999). Subsampling. Springer, New York.
Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann.
Statist. 22, 300-325.
Qin, J. and Lawless, J. (1995). Estimating equations, empirical likelihood and constraints on
parameters. Canad. J. Statist. 23, 145-159.
Sherman, M. (1996). Variance estimation for statistics computed from spatial lattice data. J.
R. Stat. Soc. B 58, 509-523.
Sherman, M. and Carlstein, E. (1994). Nonparametric estimation of the moments of a general
statistic computed from spatial data. J. Amer. Statist. Assoc. 89, 496-500.
Zhang, J. (2006). Empirical likelihood for NA series. Statist. Probab. Lett. 76, 153-160.
8 Appendix
Section 8.1 details the spatial mixing and moment conditions used to establish the main results
of the manuscript. Section 8.2 provides some technical lemmas to facilitate the proofs of the
main results, which are presented in Section 8.3. In Section 8.4, we describe a further result on
EL inference under parameter constraints. Section 8.5 describes the spatial bootstrap method
used to implement the spatial EL Bartlett correction from Section 4 of the manuscript.
8.1 Assumptions
To establish the main results on the spatial EL, we require assumptions on the spatial process
and the potential vector Gθ of estimating functions. Recall that we may collect observations
from the real-valued, strictly stationary spatial process {Zs : s ∈ Zd } into m-dimensional
vectors Ys = (Zs+h1 , Zs+h2 , . . . , Zs+hm )0 , s ∈ Zd , using fixed lag vectors h1 , h2 , . . . , hm ∈ Zd for
a positive integer m ≥ 1. Recall Rn = λn R0 ⊂ Rd denotes the sampling region for the process
{Zs : s ∈ Zd } and Rn,Y is the sampling region of the observed Ys , s ∈ Zd , where n = |Rn ∩ Zd |
and nY = |Rn,Y ∩ Zd | are the sample sizes for each region. We first outline some notation.
For A ⊂ Rd, denote the Lebesgue volume of an uncountable set A as vol(A) and the
cardinality of a countable set A as |A|. Limits in order symbols are taken letting n → ∞ and,
for two positive sequences, we write sn ∼ tn if sn /tn → 1. For a vector x = (x1 , ..., xd )0 ∈ Rd ,
let ||x|| and ||x||_∞ = max_{1≤i≤d} |xi| denote the Euclidean and l∞ norms of x, respectively. Define the distance between two sets E1, E2 ⊂ Rd as dis(E1, E2) = inf{||x − y||_∞ : x ∈ E1, y ∈ E2}.
Let FY (T ) denote the σ-field generated by the random vectors {Ys : s ∈ T }, T ⊂ Zd , and
define the strong mixing coefficient for the strictly stationary random field {Ys : s ∈ Zd } as
αY(v, w) = sup{α̃_Y(T1, T2) : Ti ⊂ Zd, |Ti| ≤ w, i = 1, 2; dis(T1, T2) ≥ v},    v, w > 0,    (8)

where α̃_Y(T1, T2) = sup{|P(A ∩ B) − P(A)P(B)| : A ∈ FY(T1), B ∈ FY(T2)}. In the following
assumptions, let θ0 denote the unique parameter value which satisfies (1).
Throughout the sequel, we use C to denote a generic positive constant that does not depend on n or any Zd points and may vary from instance to instance.
Assumptions

1. As n → ∞, b_n^{-1} + (b_n²/λn)^d = o(1) and, for any positive real sequence an → 0, the number of cubes of anZd which intersect the closures of R0 and Rd \ R0 is O(a_n^{-(d-1)}).

2. There exist nonnegative functions α1(·) and q(·) such that α1(v) → 0 as v → ∞ and αY(v, w) ≤ α1(v)q(w), v, w > 0. The non-decreasing function q(·) is bounded for the time series case d = 1, but may be unbounded q(w) → ∞ as w → ∞ for d ≥ 2.

3. For some 0 < δ ≤ 1, 0 < κ < (5d − 1)(6 + δ)/(dδ) and C > 0, it holds that E{||G_{θ0}(Ys)||^{6+δ}} < ∞, Σ_{v=1}^∞ v^{5d−1} α1(v)^{δ/(6+δ)} < ∞, and q(w) ≤ Cw^κ, w ≥ 1.

4. The r × r matrix Σ_{θ0} = Σ_{h∈Zd} Cov{G_{θ0}(Ys), G_{θ0}(Ys+h)} is positive definite.
To avoid pathological sampling regions, a boundary condition on R0 in Assumption 1 implies that the number of Zd lattice points near the boundary of Rn = λnR0 is of smaller order O(λ_n^{d−1}) than the volume of the sampling region Rn. Lahiri (2003a, p. 283) also describes this boundary condition, which is satisfied for most practical sampling regions. As a consequence, the number n of Zs-sampling sites (i.e., Zd points) contained in Rn, as well as the number nY of Ys-sampling sites in Rn,Y, is asymptotically equivalent to the volume of Rn:

n = |Rn ∩ Zd| ∼ vol(Rn) = λ_n^d vol(R0),      nY = |Rn,Y ∩ Zd| ∼ λ_n^d vol(R0).
The growth rate of the spatial block factor bn in Assumption 1 represents a spatial extension
of scaling conditions used for the blockwise EL for time series d = 1 in Kitamura (1997); this
is equivalent to the block condition (3). Additionally, the boundary condition on R0 allows
the number of blocks to be quantified under different EL blocking schemes; see Lemma 2(i)
of the following Section 8.3 for illustration.
Assumption 2 describes a mild bound on the mixing coefficient from (8) with growth rates
set in Assumption 3. These mixing assumptions permit moment bounds and a central limit
theorem to be applied to sample means of the form Ḡn = Σ_{s∈Rn,Y∩Zd} G_{θ0}(Ys)/nY (Lahiri,
2003b); Lemma 1 in Section 8.3 illustrates such moment bounds. The conditions on the mixing coefficient (8) in Assumptions 2-3 apply to many weakly dependent random fields including
certain linear fields with a moving average representation, Gaussian fields with analytic spectral densities, Markov random fields as well as various time series; see Doukhan (1994). For
d > 1, we allow (8) to become unbounded in w, which is important in the spatial case to avoid
a more restrictive form of mixing; see Lahiri (2003a, p. 295). Assumption 4 implies that the
limiting variance Σθ0 = limn→∞ nY Var(Ḡn ) is positive definite.
8.2 Preliminary results for main proofs
Lemma 1 gives moment bounds based on Doukhan (1994, p. 9, 26) while Lemma 2 provides
some important distributional results for proving the main EL results. In particular, parts (ii)
and (iii) of Lemma 2 entail that, at the true parameter value θ0 , spatial block sample means
Mθ0 ,i , i ∈ In , from the EL construction (4) can be combined to produce normally distributed
averages or consistent variance estimators. Parts (iv)-(vi) of this lemma are used to prove
that, in a neighborhood of θ0 , the EL ratio Rn (θ) from (4) is finite and positive and also that a sequence θ̂n of maximizers of Rn (θ) (i.e., the maximum EL estimator) must exist
in probability. Lemma 3 establishes the distribution of the spatial log-EL ratio at the true
parameter value θ0 . Proofs of Lemmas 2 and 3 appear subsequently.
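For illustration only, the following minimal sketch (not code from the paper) shows how the OL block sample means and the quantities $\bar M_\theta$ and $\hat\Sigma_\theta$ appearing in Lemma 2 might be computed on a rectangular grid, assuming $d = 2$, $r = 1$, a simple mean parameter with estimating function $G_\theta(Y_{\mathbf{s}}) = Y_{\mathbf{s}} - \theta$, and block statistics taken as simple averages of $G_\theta(Y_{\mathbf{s}})$ over each block; the function names and toy data are illustrative assumptions.

```python
# Minimal sketch: overlapping-block means and the variance estimator of Lemma 2,
# assuming d = 2, r = 1, and G_theta(Y_s) = Y_s - theta on a rectangular grid.
import numpy as np

def ol_block_means(G, b):
    """Means of G over all overlapping b x b blocks (the block sample means M_{theta,i})."""
    n1, n2 = G.shape
    means = []
    for i in range(n1 - b + 1):
        for j in range(n2 - b + 1):
            means.append(G[i:i + b, j:j + b].mean())
    return np.array(means)

rng = np.random.default_rng(0)
Y = rng.normal(size=(40, 40))            # toy lattice data on a 40 x 40 grid
theta = Y.mean()                          # plug-in parameter value
b = 5                                     # block scaling factor b_n
M = ol_block_means(Y - theta, b)          # block sample means M_{theta,i}, i in I_n
M_bar = M.mean()                          # \bar{M}_theta (normal limit after scaling, Lemma 2(ii))
Sigma_hat = (b ** 2) * np.mean(M ** 2)    # b_n^d times average of M_{theta,i}^2 (Lemma 2(iii))
print(M_bar, Sigma_hat)
```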
Lemma 1 (i) Suppose a random variable $X_i$ is measurable with respect to $\mathcal{F}_Y(T_i)$ for bounded $T_i\subset\mathbb{Z}^d$, $i = 1, 2$, and let $s, t > 0$, $1/s + 1/t < 1$. If $\mathrm{dis}(T_1, T_2) > 0$ and expectations are finite, then
$$|\mathrm{Cov}(X_1, X_2)| \le 8\{E(|X_1|^s)\}^{1/s}\{E(|X_2|^t)\}^{1/t}\,\alpha_Y\bigl(\mathrm{dis}(T_1, T_2);\ \max_{i=1,2}|T_i|\bigr)^{1-1/s-1/t}.$$
(ii) Under Assumptions 2-3, for any real $1\le k\le 6$ and $T\subset\mathbb{Z}^d$ it holds that $E\{\|\sum_{\mathbf{s}\in T}\tilde G_{\theta_0}(Y_{\mathbf{s}})\|^k\} \le C|T|^{k/2}$, where $\tilde G_{\theta_0}(Y_{\mathbf{s}}) = G_{\theta_0}(Y_{\mathbf{s}}) - E\{G_{\theta_0}(Y_{\mathbf{s}})\}$.
Lemma 2 Let $I_n = I^{OL}_{b_n}$ or $I^{NOL}_{b_n}$ and $N_I = |I_n|$. Under Assumptions 1-4,

(i) $|I^{OL}_{b_n}| \sim \mathrm{vol}(R_{n,Y})$, $n_Y \sim \mathrm{vol}(R_{n,Y})$, $|I^{NOL}_{b_n}| \sim \mathrm{vol}(R_{n,Y})/b_n^d$ and $\mathrm{vol}(R_{n,Y}) \sim \mathrm{vol}(R_n) = \lambda_n^d\,\mathrm{vol}(R_0)$;

(ii) $n_Y^{1/2}\bar M_{\theta_0}\xrightarrow{d}N(0_r, \Sigma_{\theta_0})$, where $\bar M_{\theta_0} \equiv \sum_{i\in I_n}M_{\theta_0,i}/N_I$;

(iii) $\hat\Sigma_{\theta_0} \equiv b_n^d\sum_{i\in I_n}M_{\theta_0,i}M'_{\theta_0,i}/N_I\xrightarrow{p}\Sigma_{\theta_0}$, with $\Sigma_{\theta_0}$ from Assumption 4;

(iv) $P(R_n(\theta_0) > 0) \to 1$;

(v) $\max_{i\in I_n}\|M_{\theta_0,i}\| = O_p\bigl(b_n^{-d}n_Y^{5/12}\bigr)$;

(vi) $P\bigl(\inf_{v\in\mathbb{R}^r,\|v\|=1}N_I^{-1}\sum_{i\in I_n}b_n^{d/2}v'M_{\theta_0,i}I(v'M_{\theta_0,i} > 0) > C\bigr)\to 1$ for some $C > 0$, letting $I(\cdot)$ denote the indicator function.

Lemma 3 Under Assumptions 1-4 and $I_n = I^{OL}_{b_n}$ or $I^{NOL}_{b_n}$, it holds in (6) that $\ell_n(\theta_0)\xrightarrow{d}\chi^2_r$.
Proof of Lemma 2. Assumption 1 yields part (i) of the lemma. We shall sketch the proof for $\mathrm{vol}(R_{n,Y})$ and the number $|I^{OL}_{b_n}|$ of OL blocks; the remaining cases follow similarly and more details on counting results can be found in Nordman and Lahiri (2004). For a positive integer $j$, define
$$J_n(j) = \{i\in\mathbb{Z}^d : (i+j[-1,1]^d)\cap R_n\neq\emptyset,\ (i+j[-1,1]^d)\cap(\mathbb{R}^d\setminus R_n)\neq\emptyset\},$$
where again $R_n = \lambda_nR_0$, and note that for $a_n = j/\lambda_n$
$$|J_n(j)| \le (2j+1)^d\,\bigl|\{i\in a_n\mathbb{Z}^d : \text{the cube } i+a_n[-1,1]^d \text{ intersects both } R_0 \text{ and } \mathbb{R}^d\setminus R_0\}\bigr| = (2j+1)^dO\bigl(a_n^{-(d-1)}\bigr) = O\bigl(j\lambda_n^{d-1}\bigr) \tag{9}$$
by the $R_0$-boundary condition in Assumption 1. The bound in (9) also holds if we replace a fixed integer $j$ by the sequence of block factors $b_n$ (i.e., replace $j$, $J_n(j)$ with $b_n$, $J_n(b_n)$).

Recall that $R_{n,Y} = \{\mathbf{s}\in R_n : \mathbf{s}+h_1,\ldots,\mathbf{s}+h_m\in R_n\}\subset\mathbb{R}^d$ is defined with respect to $m$ fixed lags $\{h_i\}_{i=1}^m\subset\mathbb{Z}^d$. Let $h = \max_{1\le i\le m}\|h_i\|_\infty$ and note that
$$\mathrm{vol}(R_n)-\mathrm{vol}(R_n\setminus R^*_{n,Y}) = \mathrm{vol}(R^*_{n,Y}) \le \mathrm{vol}(R_{n,Y}) \le \mathrm{vol}(R_n),$$
where $R^*_{n,Y} = \{\mathbf{s}\in R_n : \mathbf{s}+h[-1,1]^d\subset R_n\}$. Then, for fixed $h$, by (9) we find $\mathrm{vol}(R_n\setminus R^*_{n,Y}) \le (2h)^d|J_n(h)| = O(\lambda_n^{d-1})$ so that $\mathrm{vol}(R_{n,Y})\sim\mathrm{vol}(R_n) = \lambda_n^d\,\mathrm{vol}(R_0)$ follows. Likewise, $n = |\mathbb{Z}^d\cap R_n|\sim\mathrm{vol}(R_n)$ holds from $|n-\mathrm{vol}(R_n)|\le 2^d|J_n(1)|$, and then $|I^{OL}_{b_n}|\sim\mathrm{vol}(R_n)$ follows from $n-|J_n(b_n)|\le|I^{OL}_{b_n}|\le n$ and $|J_n(b_n)| = O(b_n\lambda_n^{d-1}) = o(\mathrm{vol}(R_n))$.
To prove parts of Lemma 2(ii) and (iii), we treat only the OL block case $I_n = I^{OL}_{b_n}$; the NOL case follows similarly and we shall describe the modifications required for handling NOL blocks. Defining the overall sample mean $\bar G_n \equiv n_Y^{-1}\sum_{\mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d}G_{\theta_0}(Y_{\mathbf{s}})$, it holds that $n_Y^{1/2}\bar G_n\xrightarrow{d}N(0_r, \Sigma_{\theta_0})$ under Assumptions 1-3 by applying the spatial central limit theorem result in Theorem 4.2 of Lahiri (2003b). Now define a scaled difference between $\bar G_n$ and the average of block sample means $\bar M_{\theta_0}$ as
$$A_n \equiv \bar G_n - n_Y^{-1}N_I\bar M_{\theta_0} = n_Y^{-1}\sum_{\mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d}w_{\mathbf{s}}G_{\theta_0}(Y_{\mathbf{s}}),$$
where the last representation uses weights $w_{\mathbf{s}}\in[0,1]$ for each $\mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d$, given by $w_{\mathbf{s}} = 1 - b_n^{-d}\,\#\{i\in I^{OL}_{b_n} : \mathbf{s}\in B_{b_n}(i)\}$ with $B_{b_n}(i) \equiv i + b_n(-1/2,1/2]^d$. Because $w_{\mathbf{s}} = 0$ if $\mathbf{s}+b_n[-1,1]^d\subset R_{n,Y}$, it holds that $|\{\mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d : w_{\mathbf{s}}\neq 0\}| \le |J_n(b_n)| \le Cb_n\lambda_n^{d-1}$ from (9) and $R_{n,Y}\subset R_n$. Consequently, letting $0\in\mathbb{Z}^d$ denote the zero vector, we have
$$n_YE(A_n^2) \le n_Y^{-1}\,|\{\mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d : w_{\mathbf{s}}\neq 0\}|\sum_{h\in\mathbb{Z}^d}\|\mathrm{Cov}\{G_{\theta_0}(Y_0), G_{\theta_0}(Y_h)\}\| \le Cb_n\lambda_n^{d-1}n_Y^{-1} = O(b_n/\lambda_n) = o(1),$$
where the final bound follows from Lemma 2(i) along with
$$\sum_{h\in\mathbb{Z}^d, h\neq 0}\|\mathrm{Cov}(G_{\theta_0}(Y_0), G_{\theta_0}(Y_h))\| \le C\sum_{v=1}^{\infty}\alpha_Y(v;1)^{\delta/(6+\delta)}\,|\{h\in\mathbb{Z}^d : \|h\|_\infty = v\}| < \infty, \tag{10}$$
which holds by Lemma 1 with Assumptions 2-3 and $|\{h\in\mathbb{Z}^d : \|h\|_\infty = v\}| \le 2d(2v+1)^{d-1}$, $v\ge 1$. Hence, in the OL block case, $n_Y^{1/2}A_n\xrightarrow{p}0$ and part (ii) follows from the normal limit of $n_Y^{1/2}\bar G_n$ along with Slutsky's theorem and $n_Y^{-1}N_I\to 1$ for OL blocks by Lemma 2(i). (In the NOL block case, we define a difference $A_n \equiv \bar G_n - n_Y^{-1}b_n^dN_I\bar M_{\theta_0} = n_Y^{-1}\sum_{\mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d}w_{\mathbf{s}}G_{\theta_0}(Y_{\mathbf{s}})$, where the weight $w_{\mathbf{s}} = 1$ if the site $\mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d$ does not belong to any NOL block in the collection $\{B_{b_n}(i) : i\in I^{NOL}_{b_n}\}$ and $w_{\mathbf{s}} = 0$ otherwise. Then, $n_Y^{1/2}A_n\xrightarrow{p}0$ holds for NOL blocks by the same argument and part (ii) then follows by Slutsky's theorem along with $n_Y^{-1}b_n^dN_I\to 1$ for NOL blocks by Lemma 2(i).)
We next establish Lemma 2(iii) for OL blocks $I_n = I^{OL}_{b_n}$. Writing $h = (h_1,\ldots,h_d)'\in\mathbb{Z}^d$, note that by the Dominated Convergence Theorem and (10) we have that
$$E(\hat\Sigma_{\theta_0}) = b_n^dE(M_{\theta_0,0}M'_{\theta_0,0}) = b_n^{-d}\,\mathrm{Var}\Bigl(\sum_{\mathbf{s}\in B_{b_n}(0)\cap\mathbb{Z}^d}G_{\theta_0}(Y_{\mathbf{s}})\Bigr) = b_n^{-d}\sum_{\|h\|_\infty\le b_n}\mathrm{Cov}\bigl(G_{\theta_0}(Y_0), G_{\theta_0}(Y_h)\bigr)\prod_{i=1}^{d}(b_n-|h_i|)\ \to\ \Sigma_{\theta_0},$$
for expectation over the cube $B_{b_n}(0) = b_n(-1/2,1/2]^d$. Hence, for part (iii) it suffices to show $\mathrm{Var}(v_1'\hat\Sigma_{\theta_0}v_2) = o(1)$ for any $v_i\in\mathbb{R}^r$, $\|v_i\| = 1$, $i = 1, 2$. Fix $v_1, v_2$ and expand the variance
$$\mathrm{Var}(v_1'\hat\Sigma_{\theta_0}v_2) = N_I^{-2}b_n^{2d}\sum_{h\in\mathbb{Z}^d}|\{i\in I_n : i+h\in I_n\}|\,\mathrm{Cov}\{(v_1'M_{\theta_0,0}M'_{\theta_0,0}v_2),\ (v_1'M_{\theta_0,h}M'_{\theta_0,h}v_2)\} \equiv A_{1n} + A_{2n}$$
by considering two sums of covariances at displacements $h\in\mathbb{Z}^d$ with $\|h\|_\infty\le b_n$ (i.e., $A_{1n}$) or $\|h\|_\infty > b_n$ (i.e., $A_{2n}$). Then, applying the Cauchy-Schwarz inequality with Lemma 1(ii) and Assumption 3, we have for $h\in\mathbb{Z}^d$
$$|\mathrm{Cov}\{(v_1'M_{\theta_0,0}M'_{\theta_0,0}v_2),\ (v_1'M_{\theta_0,h}M'_{\theta_0,h}v_2)\}| \le \mathrm{Var}(v_1'M_{\theta_0,0}M'_{\theta_0,0}v_2) \le E(\|M_{\theta_0,0}\|^4) \le Cb_n^{-2d},$$
so that $|A_{1n}| \le CN_I^{-1}|\{h\in\mathbb{Z}^d : \|h\|_\infty\le b_n\}| = O(b_n^d/\lambda_n^d) = o(1)$ by Lemma 2(i) for OL blocks. For $h\in\mathbb{Z}^d$ with $\|h\|_\infty > b_n$, it holds that $\mathrm{dis}[B_{b_n}(0), B_{b_n}(h)]\ge 1$, so that by Assumption 3 and Lemma 1(i) (i.e., taking $s = t = (6+\delta)/3$ there for $\delta$ in Assumption 3), we may bound the covariance $|\mathrm{Cov}\{(v_1'M_{\theta_0,0}M'_{\theta_0,0}v_2),\ (v_1'M_{\theta_0,h}M'_{\theta_0,h}v_2)\}|$ by the quantity $C\{E(\|M_{\theta_0,0}\|^{(12+2\delta)/3})\}^{6/(6+\delta)}\,\alpha_Y\bigl(\mathrm{dis}[B_{b_n}(0), B_{b_n}(h)],\ b_n^d\bigr)^{\delta/(6+\delta)}$, where the moment satisfies $\{E(\|M_{\theta_0,0}\|^{(12+2\delta)/3})\}^{6/(6+\delta)} \le Cb_n^{-2d}$ by Lemma 1(ii). By Lemma 2(i) and Assumptions 2-3, we then bound
$$\begin{aligned}|A_{2n}| &\le \frac{b_n^{2d}}{N_I}\sum_{h\in\mathbb{Z}^d,\ \|h\|_\infty>b_n}\{E(\|M_{\theta_0,0}\|^{(12+2\delta)/3})\}^{6/(6+\delta)}\,\alpha_Y\bigl(\mathrm{dis}[B_{b_n}(0), B_{b_n}(h)],\ b_n^d\bigr)^{\delta/(6+\delta)}\\
&\le \frac{C}{N_I}\sum_{k=1}^{\infty}k(k+b_n)^{d-1}\,\alpha_Y(k, b_n^d)^{\delta/(6+\delta)}\\
&\le \frac{C}{N_I}\sum_{k=1}^{b_n}k(k+b_n)^{d-1} + \frac{Cb_n^{d\kappa\delta/(6+\delta)}}{N_I}\sum_{k=b_n+1}^{\infty}\Bigl(\frac{k}{b_n}\Bigr)^{4d-1}k^d\,\alpha_1(k)^{\delta/(6+\delta)}\\
&\le C\lambda_n^{-d}b_n^{d+1} + C\lambda_n^{-d}b_n^{d}\sum_{k=b_n+1}^{\infty}k^{5d-1}\,\alpha_1(k)^{\delta/(6+\delta)} = o(1),\end{aligned}$$
using $|\{h\in\mathbb{Z}^d : \mathrm{dis}[B_{b_n}(0), B_{b_n}(h)] = k\}| \le Ck(k+b_n)^{d-1}$, $k\ge 1$, in the second inequality and substituting $(k/b_n)^{4d-1}\ge 1$ in the second sum of the third inequality. So part (iii) follows for OL blocks. (We note that, in the case of NOL blocks, the above argument that $\mathrm{Var}(v_1'\hat\Sigma_{\theta_0}v_2) = o(1)$ must be slightly modified. When $I_n = I^{NOL}_{b_n}$ and $N_I = |I^{NOL}_{b_n}|$, then
$$\mathrm{Var}(v_1'\hat\Sigma_{\theta_0}v_2) = N_I^{-2}b_n^{2d}\sum_{h\in\mathbb{Z}^d}|\{i\in I_n : i+b_nh\in I_n\}|\,\mathrm{Cov}\{(v_1'M_{\theta_0,0}M'_{\theta_0,0}v_2),\ (v_1'M_{\theta_0,b_nh}M'_{\theta_0,b_nh}v_2)\} \equiv A_{1n} + A_{2n},$$
where $A_{1n} = N_I^{-1}b_n^{2d}\mathrm{Var}(v_1'M_{\theta_0,0}M'_{\theta_0,0}v_2) = O(N_I^{-1}) = o(1)$ corresponds to the covariance sum at lag $h = 0$ and $A_{2n} = o(1)$ represents the sum of covariance terms over non-zero lags $\|h\| > 0$.)
In proving the remaining parts of Lemma 2, we need not make a distinction between OL
or NOL blocks. To show part(iv) of Lemma 2, we will assume part(vi) holds. We argue
that a contradiction arises by supposing that the event in the probability statement of part (vi) holds and the zero vector $0_r\in\mathbb{R}^r$ is not interior to the convex hull of $\{M_{\theta_0,i} : i\in I_n\}$. If $0_r$ is not interior, then by a supporting/separating hyperplane theorem there exists some $v\in\mathbb{R}^r$, $\|v\| = 1$, where $v'M_{\theta_0,i}\le v'0_r = 0$ holds for all $i\in I_n$; however, this contradicts the event in the probability statement of part (vi), which implies that $v'M_{\theta_0,i} > 0$ holds for some $i\in I_n$. Therefore, whenever the event in part (vi) holds, then $0_r$ must be interior to the convex hull of $\{M_{\theta_0,i} : i\in I_n\}$, which implies $R_n(\theta_0) > 0$ by (5). Hence, part (vi) implies part (iv) of the lemma.
To show part (v), note
$$E\bigl(\max_{i\in I_n}\|M_{\theta_0,i}\|\bigr) \le E\Bigl\{\Bigl(\sum_{i\in I_n}\|M_{\theta_0,i}\|^6\Bigr)^{1/6}\Bigr\} \le \Bigl\{\sum_{i\in I_n}E\bigl(\|M_{\theta_0,i}\|^6\bigr)\Bigr\}^{1/6} \le Cb_n^{-d/2}N_I^{1/6}$$
by Lemma 1(ii), so that $n_Y^{-5/12}b_n^d\max_{i\in I_n}\|M_{\theta_0,i}\| = O_p(n_Y^{-1/4}b_n^{d/2}) = O_p(\lambda_n^{-d/4}b_n^{d/2}) = o_p(1)$ by Assumption 1, Lemma 2(i) and $N_I\le n_Y$.
Finally, to establish part (vi), we employ an empirical distribution of block means $\hat F_n(v) = N_I^{-1}\sum_{i\in I_n}I(b_n^{d/2}M_{\theta_0,i}\le v)$, $v\in\mathbb{R}^r$. For fixed $v\in\mathbb{R}^r$, it holds that $|\hat F_n(v)-P(Z\le v)| = o_p(1)$, where $Z$ denotes a normal $N(0_r, \Sigma_{\theta_0})$ random vector. This can be shown using $E\{\hat F_n(v)\} = P(b_n^{d/2}M_{\theta_0,0}\le v)\to P(Z\le v)$ under Assumptions 1-3 by applying a central limit theorem for the block sample mean $b_n^{d/2}M_{\theta_0,0}$ (Theorem 4.2, Lahiri, 2003b) and verifying $\mathrm{Var}\{\hat F_n(v)\} = o(1)$ similarly to the proof of Lemma 2(iii). Consequently, $\sup_{v\in\mathbb{R}^r}|\hat F_n(v)-P(Z\le v)| = o_p(1)$ holds by Polya's theorem and, from this and part (iii), one can prove convergence of the following absolute "half-space" moments of $\hat F_n(\cdot)$:
$$\sup_{v\in\mathbb{R}^r,\|v\|=1}\Bigl|N_I^{-1}\sum_{i\in I_n}b_n^{d/2}|v'M_{\theta_0,i}| - E|v'Z|\Bigr| = o_p(1).$$
Using this along with $b_n^{d/2}\bar M_{\theta_0}\xrightarrow{p}0_r$ by part (ii), where $\bar M_{\theta_0} = N_I^{-1}\sum_{i\in I_n}M_{\theta_0,i}$, we have
$$\sup_{v\in\mathbb{R}^r,\|v\|=1}\Bigl|N_I^{-1}\sum_{i\in I_n}b_n^{d/2}v'M_{\theta_0,i}I(v'M_{\theta_0,i}>0) - 2^{-1}E|v'Z|\Bigr| = o_p(1)$$
because $v'M_{\theta_0,i}I(v'M_{\theta_0,i}>0) = (|v'M_{\theta_0,i}|+v'M_{\theta_0,i})/2$ for $i\in I_n$, $v\in\mathbb{R}^r$. Now part (vi) follows using the fact that $\inf_{v\in\mathbb{R}^r,\|v\|=1}E|v'Z|\ge C$ holds for some $C > 0$ since $\mathrm{Var}(Z) = \Sigma_{\theta_0}$ is positive definite by Assumption 4. ¤
Proof of Lemma 3. By Lemma 2(iv), a positive $R_n(\theta_0)$ exists in probability and can be written, from (5), as $R_n(\theta_0) = \prod_{i\in I_n}(1+\gamma_{\theta_0,i})^{-1}$ with $\gamma_{\theta_0,i} = t'_{\theta_0}M_{\theta_0,i} < 1$, where $t_{\theta_0}\in\mathbb{R}^r$ satisfies $Q_{1n}(\theta_0, t_{\theta_0}) = 0_r$ in (15). By Lemma 2, it holds that $Z_{\theta_0}\equiv\max_{i\in I_n}\|M_{\theta_0,i}\| = o_p(b_n^{-d}n_Y^{1/2})$. We now modify an argument from Owen (1990, p. 101) by writing $t_{\theta_0} = \|t_{\theta_0}\|u_{\theta_0}$ with $u_{\theta_0}\in\mathbb{R}^r$, $\|u_{\theta_0}\| = 1$, and then expanding $Q_{1n}(\theta_0, t_{\theta_0}) = 0_r$ to find
$$0 = -n_Y^{1/2}u'_{\theta_0}Q_{1n}(\theta_0, t_{\theta_0}) = \frac{n_Y^{1/2}\|t_{\theta_0}\|}{N_I}\sum_{i\in I_n}\frac{u'_{\theta_0}M_{\theta_0,i}M'_{\theta_0,i}u_{\theta_0}}{1+\gamma_{\theta_0,i}} - n_Y^{1/2}u'_{\theta_0}\bar M_{\theta_0} \ge \frac{n_Y^{1/2}b_n^{-d}\|t_{\theta_0}\|u'_{\theta_0}\hat\Sigma_{\theta_0}u_{\theta_0}}{1+(n_Y^{-1/2}b_n^dZ_{\theta_0})(n_Y^{1/2}b_n^{-d}\|t_{\theta_0}\|)} - n_Y^{1/2}\|\bar M_{\theta_0}\|, \tag{11}$$
where the inequality follows upon replacing each $\gamma_{\theta_0,i}$ with $Z_{\theta_0}\|t_{\theta_0}\|$ and $u'_{\theta_0}\bar M_{\theta_0}$ with $\|\bar M_{\theta_0}\|$ and using the definitions of $\bar M_{\theta_0}$, $\hat\Sigma_{\theta_0}$ from Lemma 2. Then combining the facts that $n_Y^{-1/2}b_n^dZ_{\theta_0} = o_p(1)$, that $n_Y^{1/2}\|\bar M_{\theta_0}\| = O_p(1)$ by Lemma 2(ii), and that $P(u'_{\theta_0}\hat\Sigma_{\theta_0}u_{\theta_0} > C)\to 1$ for some $C > 0$ by Lemma 2(iii) and Assumption 4, we deduce $\|t_{\theta_0}\| = O_p(b_n^dn_Y^{-1/2})$ from (11). From this, we also have $\max_{i\in I_n}|\gamma_{\theta_0,i}| \le \|t_{\theta_0}\|Z_{\theta_0} = o_p(1)$.

As $\hat\Sigma_{\theta_0}$ is positive definite in probability, we may algebraically solve $Q_{1n}(\theta_0, t_{\theta_0}) = 0_r$ for $t_{\theta_0} = b_n^d\hat\Sigma_{\theta_0}^{-1}\bar M_{\theta_0}+\phi_{\theta_0}$, where
$$\|\phi_{\theta_0}\| \le \frac{Z_{\theta_0}\|t_{\theta_0}\|^2\,\|\hat\Sigma_{\theta_0}^{-1}\|\,\|\hat\Sigma_{\theta_0}\|}{1-\|t_{\theta_0}\|Z_{\theta_0}} = o_p\bigl(b_n^dn_Y^{-1/2}\bigr). \tag{12}$$
Applying a Taylor expansion gives $\log(1+\gamma_{\theta_0,i}) = \gamma_{\theta_0,i}-\gamma_{\theta_0,i}^2/2+\Delta_i$ for each $i\in I_n$, so that
$$\ell_n(\theta_0) = 2B_n\sum_{i\in I_n}\log(1+\gamma_{\theta_0,i}) = n_Y\bigl(\bar M'_{\theta_0}\hat\Sigma_{\theta_0}^{-1}\bar M_{\theta_0} - b_n^{-2d}\phi'_{\theta_0}\hat\Sigma_{\theta_0}\phi_{\theta_0}\bigr) + 2B_n\sum_{i\in I_n}\Delta_i, \tag{13}$$
where $B_n = n_Y/(b_n^dN_I)$. By Lemma 2(ii)-(iii), $n_Y\bar M'_{\theta_0}\hat\Sigma_{\theta_0}^{-1}\bar M_{\theta_0}\xrightarrow{d}\chi^2_r$ and it also holds that $b_n^{-2d}n_Y\phi'_{\theta_0}\hat\Sigma_{\theta_0}\phi_{\theta_0} = o_p(1)$ from (12). Finally, we may bound
$$2B_n\sum_{i\in I_n}|\Delta_i| \le \frac{b_n^{-2d}n_Y\,2Z_{\theta_0}\|t_{\theta_0}\|^3\|\hat\Sigma_{\theta_0}\|}{(1-Z_{\theta_0}\|t_{\theta_0}\|)^2} = o_p(1). \tag{14}$$
Lemma 3 then follows by Slutsky's Theorem. ¤
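As a computational aside, the quantities in this proof also indicate how the log-EL ratio can be evaluated in practice. The following is a minimal sketch (not the paper's code), assuming the block means $M_{\theta,i}$ are supplied as an $(N_I, r)$ array and using a plain Newton iteration with a crude step-halving safeguard (an implementation detail assumed here): it solves $Q_{1n}(\theta, t_\theta) = 0_r$ for the Lagrange multiplier and returns $\ell_n(\theta) = 2B_n\sum_{i\in I_n}\log(1+t'_\theta M_{\theta,i})$ with $B_n = n_Y/(b_n^dN_I)$.

```python
# Minimal sketch of the inner EL computation at a fixed theta.
import numpy as np

def el_log_ratio(M, n_Y, b_d, n_iter=100, tol=1e-10):
    """M: (N_I, r) array of block means M_{theta,i}; b_d = b_n^d; returns l_n(theta)."""
    N_I, r = M.shape
    B_n = n_Y / (b_d * N_I)
    t = np.zeros(r)
    for _ in range(n_iter):
        w = 1.0 + M @ t
        if np.any(w <= 1.0 / N_I):              # keep the EL weights positive (feasibility)
            t *= 0.5
            continue
        g = (M / w[:, None]).mean(axis=0)        # Q_1n(theta, t)
        if np.linalg.norm(g) < tol:
            break
        J = -(M / w[:, None] ** 2).T @ M / N_I   # Jacobian of Q_1n in t
        t = t - np.linalg.solve(J, g)            # Newton step
    return 2.0 * B_n * np.sum(np.log(1.0 + M @ t))
```

In this sketch, $\ell_n(\theta)$ could then be profiled over $\theta$ with a generic numerical optimizer to obtain the maximum EL estimator $\hat\theta_n$, and compared with $\chi^2$ quantiles for calibration.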
8.3 Proofs of the main results
Proof of Theorem 1. In the case that H(θ) = θ is the identity mapping, the result follows
immediately from Lemma 3. From this, Theorem 1 follows for a general smooth H(·) as in
the proof of Theorem 2.1 of Hall and La Scala (1990). ¤
Proof of Theorem 2. Set $\Theta_n = \{\theta\in\Theta : \|\theta-\theta_0\|\le n_Y^{-5/12}\}$, $\partial\Theta_n = \{\theta\in\Theta : \|\theta-\theta_0\| = n_Y^{-5/12}\}$ and define $\bar M_\theta = \sum_{i\in I_n}M_{\theta,i}/N_I$, $\hat\Sigma_\theta = b_n^d\sum_{i\in I_n}M_{\theta,i}M'_{\theta,i}/N_I$, $\theta\in\Theta_n$, and functions
$$Q_{1n}(\theta, t) = \frac{1}{N_I}\sum_{i\in I_n}\frac{M_{\theta,i}}{1+t'M_{\theta,i}}, \qquad Q_{2n}(\theta, t) = \frac{b_n^{-d}}{N_I}\sum_{i\in I_n}\frac{(\partial M_{\theta,i}/\partial\theta)'t}{1+t'M_{\theta,i}}, \tag{15}$$
on $\Theta\times\mathbb{R}^r$. For $i = 1, 3$, set $J_{n,i} = \sum_{\mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d}J^i(Y_{\mathbf{s}})/n_Y$, noting $J_{n,i} = O_p(1)$ by $EJ^3(Y_{\mathbf{s}}) < \infty$; again $J(\cdot)$ is assumed to be nonnegative. To establish Theorem 2, we proceed in three steps to show that, with arbitrarily large probability as $n\to\infty$, the following hold: Step 1. the log EL ratio $\ell_n(\theta)$ exists finitely on $\Theta_n$ and is continuously differentiable, and hence a sequence of minimizers $\hat\theta_n$ of $\ell_n(\theta)$ exists on $\Theta_n$ (i.e., $\hat\theta_n$ is a maximizer of $R_n(\theta)$); Step 2. $\hat\theta_n\notin\partial\Theta_n$ and $\partial\ell_n(\theta)/\partial\theta = 0_p$ at $\theta = \hat\theta_n$; Step 3. $\hat\theta_n$ has the normal limit stated in Theorem 2.
Step 1. Note that
$$\sup_{v\in\mathbb{R}^r,\|v\|=1,\ \theta\in\Theta_n}\Bigl|N_I^{-1}\sum_{i\in I_n}\bigl(v'M_{\theta,i}I(v'M_{\theta,i}>0) - v'M_{\theta_0,i}I(v'M_{\theta_0,i}>0)\bigr)\Bigr| \le \sup_{\theta\in\Theta_n}\sum_{i\in I_n}\frac{\|M_{\theta,i}-M_{\theta_0,i}\|}{N_I},$$
which is bounded by $CJ_{n,1}\sup_{\theta\in\Theta_n}\|\theta-\theta_0\| = O_p(n_Y^{-5/12}) = o_p(b_n^{-d/2})$. From this and Lemma 2(vi), it holds that, for some $C > 0$,
$$P\Bigl(\inf_{\|v\|=1,\ \theta\in\Theta_n}b_n^{d/2}\sum_{i\in I_n}v'M_{\theta,i}I(v'M_{\theta,i}>0)/N_I > C\Bigr)\to 1.$$
As in the proof of Lemma 2(iv), when the event in the above probability statement holds, then for any $\theta\in\Theta_n$ we may write $R_n(\theta) = \prod_{i\in I_n}(1+\gamma_{\theta,i})^{-1} > 0$, where $\gamma_{\theta,i} = t'_\theta M_{\theta,i}$ and $Q_{1n}(\theta, t_\theta) = 0_r$.
Let $\Omega_\theta = \max\{n_Y^{-1/2}, \|\theta-\theta_0\|\}$, $\theta\in\Theta_n$. Expanding both $\bar M_\theta$ and $\hat\Sigma_\theta$ around $\theta_0$, we find
$$\sup_{\theta\in\Theta_n}\|\bar M_\theta\|/\Omega_\theta \le n_Y^{1/2}\|\bar M_{\theta_0}\| + CJ_{n,1}\sup_{\theta\in\Theta_n}\Omega_\theta^{-1}\|\theta-\theta_0\| = O_p(1), \qquad \sup_{\theta\in\Theta_n}\|\hat\Sigma_\theta-\Sigma_{\theta_0}\| \le \sup_{\theta\in\Theta_n}\|\hat\Sigma_\theta-\hat\Sigma_{\theta_0}\| + \|\hat\Sigma_{\theta_0}-\Sigma_{\theta_0}\| = o_p(1), \tag{16}$$
by applying Lemma 2(ii)-(iii) above along with $\Omega_\theta^{-1}\le n_Y^{1/2}$ and
$$\sup_{\theta\in\Theta_n}\|\hat\Sigma_\theta-\hat\Sigma_{\theta_0}\| \le \sup_{\theta\in\Theta_n}\frac{b_n^d}{N_I}\sum_{i\in I_n}\|M_{\theta_0,i}\|\,\|M_{\theta_0,i}-M_{\theta,i}\|\bigl(1+\|M_{\theta_0,i}-M_{\theta,i}\|\bigr) \equiv A_n,$$
$$E(A_n) \le Cn_Y^{-5/12}b_n^d\bigl\{E[J(Y_0)^3]\bigr\}^{2/3}\bigl\{E(\|M_{\theta_0,0}\|^3)+[E(\|M_{\theta_0,0}\|^3)]^2\bigr\}^{1/3} \le Cn_Y^{-5/12}b_n^{d/2} = o(1),$$
which follows from Hölder's inequality, $n_Y\sim\mathrm{vol}(R_0)\lambda_n^d$ by Lemma 2(i), and using Lemma 1(ii) in the last line. Hence, by the positive definiteness of $\Sigma_{\theta_0}$ in Assumption 4, $\hat\Sigma_\theta^{-1}$ exists uniformly in $\theta\in\Theta_n$. Also, the positive definiteness of $\hat\Sigma_\theta$ by (16) implies, for each fixed $\theta\in\Theta_n$, that $\partial Q_{1n}(\theta, t)/\partial t$ is negative definite for $t\in\{t\in\mathbb{R}^r : 1+t'M_{\theta,i}\ge 1/N_I,\ i\in I_n\}$, so that, by the implicit function theorem using $Q_{1n}(\theta, t_\theta) = 0_r$, $t_\theta$ is a continuously differentiable function of $\theta$ on $\Theta_n$ and the function $\ell_n(\theta) = -2B_n\log R_n(\theta)$ is as well (e.g., Qin and Lawless, 1994, p. 304-305). Hence, with large probability as $n\to\infty$, the minimizer of $\ell_n(\theta)$ exists on $\Theta_n$.
Step 2. Let $Z_\theta\equiv\max_{i\in I_n}\|M_{\theta,i}\|$, $\theta\in\Theta_n$. Using $b_n^2/\lambda_n = o(1)$ by Assumption 1, $\sup_{\theta\in\Theta_n}\Omega_\theta\le n_Y^{-5/12}$, and Lemma 2 [parts (i) and (v)], we may expand the block means $M_{\theta,i}$, $i\in I_n$, around $\theta_0$ to find
$$\sup_{\theta\in\Theta_n}\Omega_\theta b_n^dZ_\theta \le b_n^dn_Y^{-5/12}\Bigl(\max_{i\in I_n}\|M_{\theta_0,i}\| + \sup_{\theta\in\Theta_n}C\|\theta-\theta_0\|(n_YJ_{n,3})^{1/3}\Bigr) \le o_p(1) + O_p\bigl(b_n^dn_Y^{-1/2}\bigr) = o_p(1). \tag{17}$$
Now using (16) and (17) and that $Q_{1n}(\theta, t_\theta) = 0_r$ for $\theta\in\Theta_n$, we can repeat the same essential argument in (11) (i.e., replace $\theta_0$, $n_Y^{1/2}$ there with $\theta$, $\Omega_\theta^{-1}$) to find
$$0 \ge \frac{\Omega_\theta^{-1}b_n^{-d}\|t_\theta\|u'_\theta\hat\Sigma_\theta u_\theta}{1+(\Omega_\theta b_n^dZ_\theta)(\Omega_\theta^{-1}b_n^{-d}\|t_\theta\|)} - \Omega_\theta^{-1}\|\bar M_\theta\|, \qquad\text{with } t_\theta = \|t_\theta\|u_\theta,\ \|u_\theta\| = 1,$$
and then show $\sup_{\theta\in\Theta_n}\Omega_\theta^{-1}b_n^{-d}\|t_\theta\| = O_p(1)$. From this (and analogous to (12) from the proof of Lemma 3), we expand $Q_{1n}(\theta, t_\theta) = 0_r$ to yield $t_\theta = b_n^d\hat\Sigma_\theta^{-1}\bar M_\theta+\phi_\theta$ for $\theta\in\Theta_n$, where $\sup_{\theta\in\Theta_n}\Omega_\theta^{-1}b_n^{-d}\|\phi_\theta\| = o_p(1)$. Using now these orders of $\|\phi_\theta\|$, $\|t_\theta\|$ and $Z_\theta$ with arguments as in (13) and (14), we may then expand $\ell_n(\theta)$ uniformly in $\theta\in\Theta_n$ as
$$\sup_{\theta\in\Theta_n}n_Y^{-1}\Omega_\theta^{-2}\bigl|\ell_n(\theta)-n_Y\bar M'_\theta\hat\Sigma_\theta^{-1}\bar M_\theta\bigr| \le O_p\Bigl(\sup_{\theta\in\Theta_n}\Omega_\theta^{-2}b_n^{-2d}\Bigl[\phi'_\theta\hat\Sigma_\theta\phi_\theta + \frac{2Z_\theta\|t_\theta\|^3\|\hat\Sigma_\theta\|}{(1-Z_\theta\|t_\theta\|)^2}\Bigr]\Bigr) = o_p(1)$$
and then, using (16),
$$\sup_{\theta\in\Theta_n}n_Y^{-1}\Omega_\theta^{-2}\bigl|\ell_n(\theta)-n_Y\bar M'_\theta\hat\Sigma_{\theta_0}^{-1}\bar M_\theta\bigr| = o_p(1)$$
follows. For each $\theta\in\Theta_n$, we may write $\bar M_\theta = \bar M_{\theta_0}+\bar D_{\theta_0}(\theta-\theta_0)+E_\theta$ for $\bar D_{\theta_0} = N_I^{-1}\sum_{i\in I_n}\partial M_{\theta_0,i}/\partial\theta$ and a remainder $E_\theta$ satisfying $\sup_{\theta\in\Theta_n}\|E_\theta\| \le C\|\theta-\theta_0\|^2J_{n,1}$. Note that $\bar D_{\theta_0}\xrightarrow{p}D_{\theta_0}\equiv E\,\partial G_{\theta_0}(Y_{\mathbf{t}})/\partial\theta$ because $E\bar D_{\theta_0} = D_{\theta_0}$ and, as in (10),
$$\mathrm{Var}(\bar D_{\theta_0}) \le Cn_Y^{-1}\sum_{h\in\mathbb{Z}^d}\|\mathrm{Cov}\{\partial G_{\theta_0}(Y_0)/\partial\theta,\ \partial G_{\theta_0}(Y_h)/\partial\theta\}\| \le Cn_Y^{-1}$$
by Lemma 1 and Assumptions 2-3. Hence, we have
$$\sup_{\theta\in\Theta_n}\bigl\|\bar M_\theta-[\bar M_{\theta_0}+D_{\theta_0}(\theta-\theta_0)]\bigr\| = o_p(\Omega_\theta) \tag{18}$$
and so it now follows that
$$\sup_{\theta\in\Theta_n}n_Y^{-1}\Omega_\theta^{-2}\Bigl|\ell_n(\theta)-n_Y\bigl[\bar M_{\theta_0}+D_{\theta_0}(\theta-\theta_0)\bigr]'\Sigma_{\theta_0}^{-1}\bigl[\bar M_{\theta_0}+D_{\theta_0}(\theta-\theta_0)\bigr]\Bigr| = o_p(1). \tag{19}$$
For $\theta = v_\theta n_Y^{-5/12}+\theta_0\in\partial\Theta_n$, $\|v_\theta\| = 1$, we have $\Omega_\theta = n_Y^{-5/12}$, so that from (19) we find that $\ell_n(\theta)\ge\sigma n_Y^{1/6}/2$ holds uniformly in $\theta\in\partial\Theta_n$ when $n$ is large, where $\sigma$ denotes the smallest eigenvalue of $D'_{\theta_0}\Sigma_{\theta_0}^{-1}D_{\theta_0}$. At the same time, by Lemma 3, we have $\ell_n(\theta_0) = O_p(1)$ (i.e., $n_Y^{-1}\Omega_{\theta_0}^{-2} = 1$ in (19)). Hence, with probability approaching 1, the minimizer $\hat\theta_n$ of $\ell_n(\theta)$ on $\Theta_n$ cannot be an element of $\partial\Theta_n$. Hence, $\hat\theta_n$ must satisfy $\hat\theta_n\in\Theta_n\setminus\partial\Theta_n$ and $0_r = Q_{1n}(\hat\theta_n, t_{\hat\theta_n})$, in addition to
$$0_p = (2n_Y)^{-1}\partial\ell_n(\theta)/\partial\theta\big|_{\theta=\hat\theta_n} = Q_{2n}(\hat\theta_n, t_{\hat\theta_n}),$$
by the differentiability of $\ell_n(\theta)$.
Step 3. From the argument in Step 2, we may solve $Q_{1n}(\hat\theta_n, t_{\hat\theta_n}) = 0_r$ for $t_{\hat\theta_n} = b_n^d\hat\Sigma_{\hat\theta_n}^{-1}\bar M_{\hat\theta_n}+\phi_{\hat\theta_n}$, or
$$b_n^{-d}t_{\hat\theta_n} = \hat\Sigma_{\hat\theta_n}^{-1}\bar M_{\hat\theta_n}+b_n^{-d}\phi_{\hat\theta_n} = \Sigma_{\theta_0}^{-1}\bigl[\bar M_{\theta_0}+D_{\theta_0}(\hat\theta_n-\theta_0)\bigr] + o_p(\Omega_{\hat\theta_n}) \tag{20}$$
by $\Omega_{\hat\theta_n}^{-1}b_n^{-d}\|\phi_{\hat\theta_n}\| = o_p(1)$, (16) and (18). Recalling also $\bar D_{\theta_0} = N_I^{-1}\sum_{i\in I_n}\partial M_{\theta_0,i}/\partial\theta\xrightarrow{p}D_{\theta_0}$ from Step 2, along with $\|\bar D_{\theta_0}-N_I^{-1}\sum_{i\in I_n}\partial M_{\hat\theta_n,i}/\partial\theta\| = O_p(\|\hat\theta_n-\theta_0\|)$ and $\max_{i\in I_n}|t'_{\hat\theta_n}M_{\hat\theta_n,i}| \le \|t_{\hat\theta_n}\|Z_{\hat\theta_n} = o_p(1)$ (where again $Z_{\hat\theta_n} = \max_{i\in I_n}\|M_{\hat\theta_n,i}\|$), we find from $Q_{2n}(\hat\theta_n, t_{\hat\theta_n}) = 0_p$ that
$$0_p = \frac{b_n^{-d}}{N_I}\sum_{i\in I_n}\frac{(\partial M_{\hat\theta_n,i}/\partial\theta)'t_{\hat\theta_n}}{1+t'_{\hat\theta_n}M_{\hat\theta_n,i}} = D'_{\theta_0}b_n^{-d}t_{\hat\theta_n} + o_p\bigl(\|b_n^{-d}t_{\hat\theta_n}\|\bigr). \tag{21}$$
Now letting $\delta_n = \|b_n^{-d}t_{\hat\theta_n}\|+\Omega_{\hat\theta_n}$, from (20) and (21) we may write
$$\begin{pmatrix}\Sigma_{\theta_0} & -D_{\theta_0}\\ D'_{\theta_0} & 0\end{pmatrix}\begin{pmatrix}b_n^{-d}t_{\hat\theta_n}\\ \hat\theta_n-\theta_0\end{pmatrix} = \begin{pmatrix}\bar M_{\theta_0}+o_p(\delta_n)\\ o_p(\delta_n)\end{pmatrix}, \qquad \begin{pmatrix}\Sigma_{\theta_0} & -D_{\theta_0}\\ D'_{\theta_0} & 0\end{pmatrix}^{-1} = \begin{pmatrix}U_{\theta_0} & \Sigma_{\theta_0}^{-1}D_{\theta_0}V_{\theta_0}\\ -V_{\theta_0}D'_{\theta_0}\Sigma_{\theta_0}^{-1} & V_{\theta_0}\end{pmatrix}.$$
By Lemma 2(ii), $n_Y^{1/2}\bar M_{\theta_0}\xrightarrow{d}N(0,\Sigma_{\theta_0})$ holds, so it follows that $n_Y^{1/2}\delta_n = O_p(1)$ and the limiting distribution of $\hat\theta_n$ is given by
$$n_Y^{1/2}\begin{pmatrix}b_n^{-d}t_{\hat\theta_n}\\ \hat\theta_n-\theta_0\end{pmatrix} = \begin{pmatrix}U_{\theta_0}\\ -V_{\theta_0}D'_{\theta_0}\Sigma_{\theta_0}^{-1}\end{pmatrix}n_Y^{1/2}\bar M_{\theta_0}+o_p(1)\ \xrightarrow{d}\ N\Bigl(\begin{pmatrix}0_r\\ 0_p\end{pmatrix},\ \begin{pmatrix}U_{\theta_0} & 0\\ 0 & V_{\theta_0}\end{pmatrix}\Bigr). \tag{22}$$
The proof of Theorem 2 is complete. ¤
Proof of Theorem 3. Let $P_X = X(X'X)^{-1}X'$ denote the projection matrix for a given matrix $X$ of full column rank and let $I_{r\times r}$ denote the $r\times r$ identity matrix. Using (19) along with $\|\hat\theta_n-\theta_0\| = O_p(n_Y^{-1/2})$ by (22) and $n_Y^{-1}\Omega_{\theta_0}^{-2} = 1$ in (19), we write
$$\ell_n(\hat\theta_n) = n_Y\bigl(\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr)'\bigl(I_{r\times r}-P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}\bigr)\bigl(\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr)+o_p(1), \qquad \ell_n(\theta_0) = n_Y\bigl(\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr)'\bigl(\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr)+o_p(1).$$
The chi-square limit distributions in Theorem 3(i) now follow by Lemma 2(ii), as $P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}$ and $I_{r\times r}-P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}$ are orthogonal idempotent matrices with ranks $p$ and $r-p$, respectively. With Theorem 3(i) in place, Theorem 3(ii) follows from modifying arguments in Qin and Lawless (1994, Corollary 5) in the proof of Theorem 2. ¤
8.4 Spatial empirical likelihood under parameter constraints
As a continuation of Section 3.3, here we briefly consider constrained maximum EL estimation of spatial parameters. Qin and Lawless (1995) introduced constrained EL inference for
independent samples and Kitamura (1997) developed a blockwise version of constrained EL
for weakly dependent time series. For spatial data, we may also consider blockwise EL estimation subject to a system of parameter constraints on a spatial parameter θ ∈ Θ ⊂ Rp :
ψ(θ) = 0q ∈ Rq where q < p and Ψ(θ) = ∂ψ(θ)/∂θ is of full row rank q. By maximizing the
EL function in (5) under the above restrictions on θ, we find a constrained MELE θ̂nψ .
Corollary 1 Suppose the conditions of Theorem 2 hold and, in a neighborhood of $\theta_0$, $\psi(\theta)$ is continuously differentiable, $\|\partial^2\psi(\theta)/\partial\theta\partial\theta'\|$ is bounded, and $\Psi(\theta_0)$ has rank $q$. If $H_0 : \psi(\theta_0) = 0_q$ holds, then $r_n(\hat\theta_n^\psi) = \ell_n(\hat\theta_n^\psi)-\ell_n(\hat\theta_n)\xrightarrow{d}\chi^2_q$ and $\ell_n(\theta_0)-\ell_n(\hat\theta_n^\psi)\xrightarrow{d}\chi^2_{p-q}$ as $n\to\infty$.

We can then sequentially test $H_0 : \psi(\theta_0) = 0_q$ with the log-likelihood ratio statistic $\ell_n(\hat\theta_n^\psi)-\ell_n(\hat\theta_n)$ and, if $H_0$ is not rejected, make an approximate $100(1-\alpha)\%$ confidence region for constrained $\theta$ values as $\{\theta : \psi(\theta) = 0_q,\ \ell_n(\theta)-\ell_n(\hat\theta_n^\psi)\le\chi^2_{p-q,1-\alpha}\}$.
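As an illustration, the following minimal sketch shows how this sequential procedure might be organized, assuming a callable `l_n` returning the spatial log-EL ratio at a given $\theta$ and precomputed estimators `theta_hat` (the MELE) and `theta_hat_psi` (the constrained MELE); the helper name and interface are hypothetical, not from the paper.

```python
# Minimal sketch of the chi-square calibrated test and constrained confidence region.
from scipy.stats import chi2

def sequential_constraint_test(l_n, theta_hat, theta_hat_psi, q, p, alpha=0.05):
    """Test H0: psi(theta_0) = 0_q, then return a membership check for the constrained region."""
    r_n = l_n(theta_hat_psi) - l_n(theta_hat)      # log-EL ratio statistic for H0
    reject = r_n > chi2.ppf(1 - alpha, df=q)       # compare with the chi^2_q quantile
    if reject:
        return reject, None
    cutoff = chi2.ppf(1 - alpha, df=p - q)         # approximate 100(1-alpha)% region
    in_region = lambda theta: l_n(theta) - l_n(theta_hat_psi) <= cutoff
    return reject, in_region
```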
Proof of Corollary 1. We sketch the proof, which requires modifications to the proof of Theorem 2 as well as arguments from Qin and Lawless (1995) (for the iid data case); we shall employ notation used in the proof of Theorem 2. Write the functions $\psi(\theta)$, $\Psi(\theta)$ as $\psi_\theta$, $\Psi_\theta$ in the following. To establish the existence of $\hat\theta_n^\psi$, let $Q^*_{1n}(\theta,t,\nu) = Q_{1n}(\theta,t)$, $Q^*_{2n}(\theta,t,\nu) = Q_{2n}(\theta,t)+\Psi'_\theta\nu$, and $Q^*_{3n}(\theta,t,\nu) = \psi_\theta$, and define $U_n = \{(\theta,t,\nu)\in\mathbb{R}^p\times\mathbb{R}^r\times\mathbb{R}^q : \theta\in\Theta_n,\ \|t/b_n^d\|+\|\nu\|\le n_Y^{-5/12}\}$.

Step 1. It can first be shown that the system of equations
$$Q^*_{1n}(\theta,t,\nu) = 0_r, \qquad Q^*_{2n}(\theta,t,\nu) = 0_p, \qquad Q^*_{3n}(\theta,t,\nu) = 0_q \tag{23}$$
has a solution $(\theta_n^*, t_n^*, \nu_n^*)\in U_n$. Uniformly in $\theta\in\Theta_n$, it holds that $b_n^{-d}\partial t_\theta/\partial\theta = \Sigma_{\theta_0}^{-1}D_{\theta_0}+o_p(1)$ (by differentiating $Q^*_{1n}(\theta, t_\theta) = 0_r$ with respect to $\theta$) and that $(2n_Y)^{-1}\partial\ell_n(\theta)/\partial\theta = V_{\theta_0}^{-1}(\theta-\theta_0)+T_\theta$, where $T_\theta$ is continuous in $\theta$ and $\sup_{\theta\in\Theta_n}\|T_\theta\| = o_p(n_Y^{-5/12})$ (by expanding $(2n_Y)^{-1}\partial\ell_n(\theta)/\partial\theta = Q_{2n}(\theta, t_\theta)$ around $\theta_0$). For $\theta\in\Theta_n$, define $\psi_\theta-\Psi_{\theta_0}(\theta-\theta_0) = \|\theta-\theta_0\|^2k(\theta)$, where $k(\theta)$ is continuous and bounded, and write a function $\eta(\theta)$ as
$$\eta(\theta) = \frac{1}{2n_Y}\frac{\partial\ell_n(\theta)}{\partial\theta} + \Psi'_\theta\bigl(\Psi_{\theta_0}V_{\theta_0}\Psi'_\theta\bigr)^{-1}\Bigl(\|\theta-\theta_0\|^2k(\theta) - \Psi_{\theta_0}V_{\theta_0}\Bigl[\frac{1}{2n_Y}\frac{\partial\ell_n(\theta)}{\partial\theta} - V_{\theta_0}^{-1}(\theta-\theta_0)\Bigr]\Bigr). \tag{24}$$
It can be shown that $\eta(\theta) = V_{\theta_0}^{-1}(\theta-\theta_0)+\tilde T_\theta$, where $\tilde T_\theta$ is continuous in $\theta$ and $\sup_{\theta\in\Theta_n}\|\tilde T_\theta\| = o_p(n_Y^{-5/12})$, which implies that there exists $\hat\theta_n^*\in\Theta_n\setminus\partial\Theta_n$ such that $-\eta(\hat\theta_n^*) = 0_p$. This root $\hat\theta_n^*$ of $\eta(\theta)$ inside $\Theta_n\setminus\partial\Theta_n$ is deduced from Lemma 2 of Aitchison and Silvey (1958); this result entails that because, for large $n$, $-\sigma_1^{-1}\eta(\theta)$ maps $\Theta_n$ into $\{(\theta-\theta_0) : \theta\in\Theta_n\}$ and $(\theta-\theta_0)'\{-\sigma_1^{-1}\eta(\theta)\} < -\sigma_0/(2\sigma_1)$ holds for $\theta\in\partial\Theta_n$ (i.e., $(\theta-\theta_0)'\{-\sigma_1^{-1}\eta(\theta)\}$ is negative for $\|\theta-\theta_0\| = n_Y^{-5/12}$), where $\sigma_1$ and $\sigma_0 > 0$ respectively denote the largest and smallest eigenvalues of $V_{\theta_0}^{-1}$, it must follow that $-\sigma_1^{-1}\eta(\hat\theta_n^*) = 0$ for some $\|\hat\theta_n^*-\theta_0\| < n_Y^{-5/12}$ by Brouwer's fixed point theorem. From this root, we have that $0_q = \Psi_{\theta_0}V_{\theta_0}\eta(\hat\theta_n^*) = \|\hat\theta_n^*-\theta_0\|^2k(\hat\theta_n^*)+\Psi_{\theta_0}(\hat\theta_n^*-\theta_0) = \psi_{\hat\theta_n^*}$ from (24), as well as
$$\frac{1}{2n_Y}\frac{\partial\ell_n(\hat\theta_n^*)}{\partial\theta} = \Psi'_{\hat\theta_n^*}\bigl(\Psi_{\theta_0}V_{\theta_0}\Psi'_{\hat\theta_n^*}\bigr)^{-1}\Psi_{\theta_0}V_{\theta_0}\frac{1}{2n_Y}\frac{\partial\ell_n(\hat\theta_n^*)}{\partial\theta}. \tag{25}$$
This yields that $\hat\theta_n^*$, the EL Lagrange multiplier $t_{\hat\theta_n^*}$ for $\hat\theta_n^*$ defined by $Q_{1n}(\hat\theta_n^*, t_{\hat\theta_n^*}) = 0_r$, and $\nu_n^* = -(\Psi_{\theta_0}V_{\theta_0}\Psi'_{\hat\theta_n^*})^{-1}\Psi_{\theta_0}V_{\theta_0}(2n_Y)^{-1}\partial\ell_n(\hat\theta_n^*)/\partial\theta$ satisfy (23) jointly.
Step 2. We now show that any solution of (23) in $U_n$, say $(\tilde\theta, \tilde t, \tilde\nu)$, must minimize $\ell_n(\theta)$ on $\Theta_n$ subject to the condition $\psi_\theta = 0_q$. To see this, note that if $\theta\in\Theta_n$ with $\psi_\theta = 0_q$, then we make a Taylor expansion around $\tilde\theta$:
$$\frac{1}{2n_Y}\bigl[\ell_n(\theta)-\ell_n(\tilde\theta)\bigr] = \frac{1}{2n_Y}\frac{\partial\ell_n(\tilde\theta)}{\partial\theta'}(\theta-\tilde\theta) + \frac{1}{4n_Y}(\theta-\tilde\theta)'\frac{\partial^2\ell_n(\theta^*)}{\partial\theta\partial\theta'}(\theta-\tilde\theta), \qquad \theta^*\text{ between }\theta,\ \tilde\theta.$$
Since $\tilde\theta$ satisfies (23), it follows from some algebra that $\tilde\theta$ also satisfies (25) after substituting $\tilde\theta$ for $\hat\theta_n^*$. Using $0_q = \psi_\theta-\psi_{\tilde\theta} = \Psi_{\tilde\theta}(\theta-\tilde\theta)+o(\|\theta-\tilde\theta\|^2)$, we find $(2n_Y)^{-1}\partial\ell_n(\tilde\theta)/\partial\theta'\,(\theta-\tilde\theta) = o_p(\|\theta-\tilde\theta\|^2)$ for $\tilde\theta$ fulfilling (25); it may also be shown that $(2n_Y)^{-1}\partial^2\ell_n(\theta^*)/\partial\theta\partial\theta' = V_{\theta_0}^{-1}+o_p(1)$ (by expanding $(2n_Y)^{-1}\partial\ell_n(\theta)/\partial\theta = Q_{2n}(\theta, t_\theta)$ around $\theta_0$). Hence, $\ell_n(\theta)-\ell_n(\tilde\theta)\ge\{\sigma_0/2+o_p(1)\}n_Y\|\theta-\tilde\theta\|^2$, where the $o_p(1)$ term is uniform for $\theta\in\Theta_n$, $\psi_\theta = 0$.
Step 3. By the first two steps, we have therefore established that there exists a consistent MELE $\hat\theta_n^\psi$ of $\theta_0$, given by $\hat\theta_n^\psi = \hat\theta_n^*\in\Theta_n\setminus\partial\Theta_n$, that satisfies the condition $\psi(\hat\theta_n^\psi) = 0$; we may denote $t_{\hat\theta_n^\psi} = t_{\hat\theta_n^*}$ and $\nu_n^\psi = \nu_n^*$. We now show
$$n_Y^{1/2}\begin{pmatrix}\hat\theta_n^\psi-\theta_0\\ \nu_n^\psi\end{pmatrix}\xrightarrow{d}N\Bigl(0_{p+q},\ \begin{pmatrix}P_{\theta_0} & 0\\ 0 & R_{\theta_0}\end{pmatrix}\Bigr), \qquad P_{\theta_0} = V_{\theta_0}\bigl(I_{p\times p}-\Psi'_{\theta_0}R_{\theta_0}\Psi_{\theta_0}V_{\theta_0}\bigr), \qquad R_{\theta_0} = \bigl(\Psi_{\theta_0}V_{\theta_0}\Psi'_{\theta_0}\bigr)^{-1}. \tag{26}$$
Expanding $Q^*_{in}(\theta,t,\nu)$ at $(\theta_0, 0, 0)$ and using that $(\hat\theta_n^\psi, t_{\hat\theta_n^\psi}, \nu_n^\psi)$ satisfies (23), we have
$$\begin{pmatrix}-Q_{1n}(\theta_0,0_r)+o_p(\delta_n^*)\\ o_p(\delta_n^*)\\ o_p(\delta_n^*)\end{pmatrix} = \Sigma_n^*\begin{pmatrix}t_{\hat\theta_n^\psi}/b_n^d\\ \hat\theta_n^\psi-\theta_0\\ \nu_n^\psi\end{pmatrix}, \qquad \Sigma_n^* = \begin{pmatrix}\partial Q_{1n}(\theta_0,0_r)/\partial t & \partial Q_{1n}(\theta_0,0_r)/\partial\theta & 0\\ \partial Q_{2n}(\theta_0,0_r)/\partial t & 0 & \Psi'_{\theta_0}\\ 0 & \Psi_{\theta_0} & 0\end{pmatrix},$$
where $Q_{1n}(\theta_0,0_r) = \bar M_{\theta_0}$, $b_n^d\,\partial Q_{1n}(\theta_0,0_r)/\partial t = -\hat\Sigma_{\theta_0}$, $\partial Q_{1n}(\theta_0,0_r)/\partial\theta = \bar D_{\theta_0} = [b_n^d\,\partial Q_{2n}(\theta_0,0_r)/\partial t]'$ and $\delta_n^* = \|\hat\theta_n^\psi-\theta_0\|+\|t_{\hat\theta_n^\psi}/b_n^d\|+\|\nu_n^\psi\|$. Using Lemma 2(iii) and $\bar D_{\theta_0}\xrightarrow{p}D_{\theta_0}$ from the proof of Theorem 2, we have
$$\Sigma_n^*\xrightarrow{p}\begin{pmatrix}-\Sigma_{\theta_0} & D_{\theta_0} & 0\\ D'_{\theta_0} & 0 & \Psi'_{\theta_0}\\ 0 & \Psi_{\theta_0} & 0\end{pmatrix} \equiv \begin{pmatrix}C_{11} & C_{12}\\ C_{21} & C_{22}\end{pmatrix} \equiv \tilde C, \qquad C_{11} = -\Sigma_{\theta_0},\quad C_{12} = \bigl[\,D_{\theta_0}\ \ 0\,\bigr],\quad C_{21} = C'_{12},\quad C_{22} = \begin{pmatrix}0 & \Psi'_{\theta_0}\\ \Psi_{\theta_0} & 0\end{pmatrix}.$$
Note that $\det(\tilde C) = \det(C_{11})\det(Q_c) = \det(-\Sigma_{\theta_0})\det(V_{\theta_0}^{-1})\det(-R_{\theta_0}^{-1})\neq 0$, for $Q_c = C_{22}-C_{21}C_{11}^{-1}C_{12}$, and
$$\tilde C^{-1} = \begin{pmatrix}-\Sigma_{\theta_0}^{-1}+\Sigma_{\theta_0}^{-1}C_{12}Q_c^{-1}C_{21}\Sigma_{\theta_0}^{-1} & \Sigma_{\theta_0}^{-1}C_{12}Q_c^{-1}\\ Q_c^{-1}C_{21}\Sigma_{\theta_0}^{-1} & Q_c^{-1}\end{pmatrix}, \qquad Q_c^{-1} = \begin{pmatrix}P_{\theta_0} & V_{\theta_0}\Psi'_{\theta_0}R_{\theta_0}\\ R_{\theta_0}\Psi_{\theta_0}V_{\theta_0} & -R_{\theta_0}\end{pmatrix}.$$
Since, by Lemma 2(ii), $n_Y^{1/2}Q_{1n}(\theta_0,0_r) = n_Y^{1/2}\bar M_{\theta_0}\xrightarrow{d}N(0_r,\Sigma_{\theta_0})$, it follows that $\delta_n^* = O_p(n_Y^{-1/2})$. Then,
$$n_Y^{1/2}\begin{pmatrix}\hat\theta_n^\psi-\theta_0\\ \nu_n^\psi\end{pmatrix} = -n_Y^{1/2}Q_c^{-1}C_{21}\Sigma_{\theta_0}^{-1}Q_{1n}(\theta_0,0_r)+o_p(1)\ \xrightarrow{d}\ N\Bigl(0_{p+q},\ \begin{pmatrix}P_{\theta_0} & 0\\ 0 & R_{\theta_0}\end{pmatrix}\Bigr).$$
Step 4. As in the proof of Theorem 2, we can then expand by (19)
$$\begin{aligned}\ell_n(\hat\theta_n^\psi) &= n_Y\bigl(\bar M_{\theta_0}+D_{\theta_0}(\hat\theta_n^\psi-\theta_0)\bigr)'\Sigma_{\theta_0}^{-1}\bigl(\bar M_{\theta_0}+D_{\theta_0}(\hat\theta_n^\psi-\theta_0)\bigr)+o_p(1)\\
&= n_YQ_{1n}(\theta_0,0_r)'\bigl(I_{r\times r}-D_{\theta_0}P_{\theta_0}D'_{\theta_0}\Sigma_{\theta_0}^{-1}\bigr)'\Sigma_{\theta_0}^{-1}\bigl(I_{r\times r}-D_{\theta_0}P_{\theta_0}D'_{\theta_0}\Sigma_{\theta_0}^{-1}\bigr)\bar M_{\theta_0}+o_p(1)\\
&= \bigl[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr]'\bigl[I_{r\times r}-\bigl(P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}-P_{H_{\theta_0}}\bigr)\bigr]\bigl[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr]+o_p(1),\end{aligned}$$
where $H_{\theta_0} = \Sigma_{\theta_0}^{-1/2}D_{\theta_0}(D'_{\theta_0}\Sigma_{\theta_0}^{-1}D_{\theta_0})^{-1}\Psi'_{\theta_0}$. Then,
$$\ell_n(\hat\theta_n^\psi)-\ell_n(\hat\theta_n) = \bigl[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr]'P_{H_{\theta_0}}\bigl[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr]+o_p(1),$$
$$\ell_n(\theta_0)-\ell_n(\hat\theta_n^\psi) = \bigl[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr]'\bigl(P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}-P_{H_{\theta_0}}\bigr)\bigl[n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\bigr]+o_p(1).$$
Note now that $n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}Q_{1n}(\theta_0,0_r) = n_Y^{1/2}\Sigma_{\theta_0}^{-1/2}\bar M_{\theta_0}\xrightarrow{d}N(0, I_{r\times r})$ by Lemma 2(ii), and $P_{H_{\theta_0}}$ and $P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}-P_{H_{\theta_0}}$ are idempotent matrices with
$$\mathrm{rank}\bigl(P_{H_{\theta_0}}\bigr) = \mathrm{rank}(H_{\theta_0}) = \mathrm{rank}(\Psi_{\theta_0}) = q, \qquad \mathrm{rank}\bigl(P_{\Sigma_{\theta_0}^{-1/2}D_{\theta_0}}-P_{H_{\theta_0}}\bigr) = p-\mathrm{trace}\bigl(P_{H_{\theta_0}}\bigr) = p-\mathrm{rank}\bigl(P_{H_{\theta_0}}\bigr) = p-q.$$
For $\mathrm{rank}(P_{H_{\theta_0}}) = q$ above, we used $\mathrm{rank}(H_{\theta_0})\le\mathrm{rank}(\Psi_{\theta_0})$ and $\mathrm{rank}(\Psi_{\theta_0}) = \mathrm{rank}(D'_{\theta_0}\Sigma_{\theta_0}^{-1/2}H_{\theta_0})\le\mathrm{rank}(H_{\theta_0})$. Corollary 1 now follows. ¤
8.5 Spatial block bootstrap algorithm
Here we outline a spatial block bootstrap method for generating a bootstrap version $Y_n^*$ of the original vectorized spatial data $Y_n = \{Y_{\mathbf{s}} : \mathbf{s}\in R_{n,Y}\cap\mathbb{Z}^d\}$ on $R_{n,Y}\subset\mathbb{R}^d$. Bootstrap replicates $Y_n^*$ of spatial data, on a bootstrap sampling region $R^*_{n,Y}$, are used to formulate the empirical Bartlett correction for the spatial EL method as described in Section 4.

Let $Y_n(A) = \{Y_{\mathbf{s}} : \mathbf{s}\in A\cap\mathbb{Z}^d\}$ denote the observed spatial data at $\mathbb{Z}^d$ points lying inside a set $A\subset R_{n,Y}$. The block bootstrap requires a block scaling factor, denoted by $b_{n,bt}$, satisfying $b_{n,bt}^{-1}+b_{n,bt}^d/n_Y = o(1)$. Suppose this bootstrap block scaling is used to make the blocks of size $b_{n,bt}(-1/2,1/2]^d$ in $R_{n,Y}$ appearing in Figure 2(b)-(c). As a first step, we divide the sampling region $R_{n,Y}$ into NOL blocks of size $b_{n,bt}(-1/2,1/2]^d$ that fall entirely inside $R_{n,Y}$, as depicted in Figure 2(b). In the notation of Section 2.2, $\{B_{b_{n,bt}}(i) : i\in I^{NOL}_{b_{n,bt}}\}$ represents a collection of $b_{n,bt}$-scaled NOL "complete blocks" partitioning $R_{n,Y}$. These complete NOL blocks inside $R_{n,Y}$, when taken together, form a bootstrap sampling region $R^*_{n,Y}$ as $R^*_{n,Y}\equiv\cup\{B_{b_{n,bt}}(i) : i\in I^{NOL}_{b_{n,bt}}\}$, as shown in Figure 2(d) based on the complete NOL blocks in Figure 2(b). In place of the original data $Y_n$ observed on $R_{n,Y}$, we aim to create a bootstrap sample $Y_n^*$ on $R^*_{n,Y}$. Each block $B_{b_{n,bt}}(i) = i+b_{n,bt}(-1/2,1/2]^d$, $i\in I^{NOL}_{b_{n,bt}}$, that constitutes a part of $R^*_{n,Y}$ also corresponds to a piece of $R_{n,Y}$, where we originally observed the data $Y_n(B_{b_{n,bt}}(i))$, $B_{b_{n,bt}}(i)\subset R_{n,Y}$. For a fixed $i\in I^{NOL}_{b_{n,bt}}$, we then create a bootstrap rendition $Y_n^*(B_{b_{n,bt}}(i))$ of $Y_n(B_{b_{n,bt}}(i))$ by independently resampling some size-$b_{n,bt}(-1/2,1/2]^d$ block of $Y_{\mathbf{s}}$-observations from the region $R_{n,Y}$ (as in Figure 2(c)) and pasting this observational block into the position of $B_{b_{n,bt}}(i)$ within $R^*_{n,Y}$. To make the resampling scheme precise, for each $i\in I^{NOL}_{b_{n,bt}}$, we define the bootstrap version as $Y_n^*(B_{b_{n,bt}}(i))\equiv Y_n(B_{b_{n,bt}}(i^*))$, where $i^*\in\mathbb{Z}^d$ is a random vector selected uniformly from the collection of OL block indices given by $I^{OL}_{b_{n,bt}}$ in the notation of Section 2.2; that is, we resample from all OL $b_{n,bt}$-scaled blocks within $R_{n,Y}$ (as depicted in Figure 2(c)) to produce a spatial block of observations $Y_n^*(B_{b_{n,bt}}(i))$. We then concatenate the resampled block observations for each $i\in I^{NOL}_{b_{n,bt}}$ into a single spatial bootstrap sample $Y_n^* = \{Y_n^*(B_{b_{n,bt}}(i)) : i\in I^{NOL}_{b_{n,bt}}\}$ on $R^*_{n,Y}$, with $n_Y^* = |I^{NOL}_{b_{n,bt}}|\cdot b_{n,bt}^d$ sampling sites at $R^*_{n,Y}\cap\mathbb{Z}^d$. In Section 4, the bootstrap EL version $\ell_n^*$ may be computed as in (6) after replacing $Y_n$, $R_{n,Y}$, $n_Y$ with $Y_n^*$, $R^*_{n,Y}$, $n_Y^*$. See Chapter 12.3 of Lahiri (2003a) for more details on the spatial block bootstrap.
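The following is a minimal sketch of the resampling scheme just described, assuming $d = 2$, a rectangular sampling region stored as a 2-d array, and an integer bootstrap block scale; the function and variable names are illustrative rather than taken from the paper.

```python
# Minimal sketch of the spatial block bootstrap: fill the complete NOL blocks of the
# bootstrap region with uniformly resampled OL blocks from the observed region.
import numpy as np

def spatial_block_bootstrap(Y, b_bt, rng):
    """Return a bootstrap sample Y* on the region formed by the complete NOL blocks of Y."""
    n1, n2 = Y.shape
    k1, k2 = n1 // b_bt, n2 // b_bt            # number of complete NOL blocks per axis
    Y_star = np.empty((k1 * b_bt, k2 * b_bt))  # bootstrap sampling region R*_{n,Y}
    max_i, max_j = n1 - b_bt + 1, n2 - b_bt + 1  # all OL blocks are resampling candidates
    for a in range(k1):
        for b in range(k2):
            i = rng.integers(max_i)            # uniformly selected OL block index i*
            j = rng.integers(max_j)
            Y_star[a*b_bt:(a+1)*b_bt, b*b_bt:(b+1)*b_bt] = Y[i:i+b_bt, j:j+b_bt]
    return Y_star

rng = np.random.default_rng(1)
Y = rng.normal(size=(40, 40))                  # toy observed data on R_{n,Y}
Y_star = spatial_block_bootstrap(Y, b_bt=5, rng=rng)
```

Each bootstrap sample $Y_n^*$ produced this way can then be used to compute $\ell_n^*$ as above, and repeating the resampling yields the Monte Carlo approximation underlying the empirical Bartlett correction of Section 4.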