Point and Interval Estimation of Variogram Models
using Spatial Empirical Likelihood
running title: EL variogram inference
Daniel J. Nordman, Petruţa C. Caragea
Department of Statistics
Iowa State University
Ames, IA 50011
Abstract
We present a spatial blockwise empirical likelihood method for estimating variogram
model parameters in the analysis of spatial data on a grid. The method produces
point estimators that require no spatial variance estimates to compute, unlike least
squares methods for variogram fitting, but are as efficient as the best least squares
estimator in large samples. Our approach also produces confidence regions for the
variogram, without requiring knowledge of the full joint distribution of the spatial
data. Additionally, the empirical likelihood formulation extends to spatial regression
problems and allows simultaneous inference on both spatial trend and variogram
parameters. The asymptotic behavior of the estimator is examined analytically, while
its behavior in finite samples is investigated through simulation studies.
1 Introduction
Describing spatial dependence with variograms is one of the most common methods used in practice, and estimation of parameters in theoretical variogram models
is needed to achieve prediction at unobserved locations. One popular method for
variogram estimation is so-called “least squares variogram fitting,” proposed originally in the geostatistical literature (cf. Cressie, 1993) and further examined by
Cressie (1985), Zhang et al. (1995), and Lee and Lahiri (2002), among others. This
approach estimates variogram parameters by minimizing a weighted distance between
a nonparametric variogram estimator (e.g., sample variogram) and a parametric variogram model. The choice of the weighting criterion largely determines the performance of the resulting least squares estimator (LSE). For example, the generalized
least squares (GLS) estimator is known to be statistically optimal within the class
of LSEs (Lahiri et al., 2002), but its calculation requires an asymptotic covariance
matrix for the variogram estimator, and this can make it computationally intractable.
As an alternative, Lee and Lahiri (2002) recently proposed a subsampling generalized
least squares (SLS) estimator that replaces the asymptotic covariance matrix needed
in the GLS criterion with a nonparametric estimate.
Although much attention has centered on developing practical LSEs for variogram
parameters (see Section 2), there is less information available on assessing the precision of the resulting estimates. For example, confidence bands for the variogram
model could be helpful in quantifying the uncertainty in estimation. However, the
unknown and possibly complex distribution of the data-generating process complicates the setting of confidence regions. Parametric alternatives to LSE for estimating
variogram parameters, such as maximum likelihood (or REML), may assess uncertainty in parameter estimates by assuming the data follow a Gaussian distribution
(Cressie, 1993, p. 92). However, a main advantage of LSE methods is that these
make minimal distributional assumptions and are computationally less demanding
than methods that make full distributional assumptions.
This article proposes a new method of variogram estimation that is based on a
spatial empirical likelihood (EL) computed from sets of spatial blocks or sub-regions.
A good deal of recent work has been focused on extending the original EL methods of
Owen (1988, 1990, 2001) from applications appropriate for independent data to deal
with problems in time series. Kitamura (1997) proposed a “blockwise” EL method for
weakly dependent time series. With similar blocking techniques, Bravo (2005) considered time series regressions and Zhang (2006) adapted EL for negatively associated
series. Monti (1997) and Nordman and Lahiri (2006) proposed periodogram-based
EL methods for dealing with short- and long-memory processes, respectively. Additionally, research in econometrics has focused on EL for testing moment restrictions
(Kitamura, Tripathi and Ahn, 2004; Newey and Smith, 2004). While our spatial
EL shares some features with these methods for dependent data in one dimension,
the demonstration of asymptotic properties in the spatial setting requires more than
simply “folding” a one-dimensional process onto a set of higher dimensions.
The EL method proposed here allows valid likelihood inference to be made about
variogram model parameters without requiring an estimate of the covariance matrix
for the joint distribution of the underlying process, which is of practical importance.
This is a consequence of the “internal studentization” known to exist for EL methods
if suitably formulated for particular problems (e.g. Hall and La Scala, 1990; Kitamura, 1997). This EL method results in estimators of the variogram parameters that
are asymptotically normal and as efficient as the optimal LSE based on the sample
variogram. The method also extends in a straightforward manner to the problem
of simultaneous estimation of parameters in variogram models and linear models for
large-scale spatial structure or trend, thus offering a solution to the problem of variogram estimation in application of universal kriging (Cressie, 1993, 3.4.3).
In Section 2, we provide background on LSEs and variogram fitting. In Section 3,
we describe the spatial EL method for variogram inference. The main distributional
results of the paper are presented in Section 4 and the EL method is investigated
through a simulation study in Section 5. An extension of the EL method to a spatial
regression model is given in Section 6 and illustrated with an example. Section 7
provides some final remarks. Additional material, including proofs of the main results,
is provided in a supplementary on-line Appendix.
2 Least squares estimation of variogram models
Suppose that {Z(s) : s ∈ Rd } is a real-valued, intrinsically stationary random field,
whereby E{Z(s) − Z(s + h)} = 0 and 2γ(h) ≡ Var{Z(s) − Z(s + h)} = E{Z(s) −
Z(s + h)}2 , for all s, h ∈ Rd . The function 2γ(h) denotes the variogram of the
spatial process. In least squares variogram fitting, we assume that the true variogram
of Z(·) belongs to a parametric family {2γ(·; θ) : θ ∈ Θ}. The goal is to estimate
θ ∈ Θ ⊂ Rp based on the available data {Z(s1 ), . . . , Z(sn )}, collected from sites
{s1 , . . . , sn } located within a spatial sampling region. Here, we will assume that
{s1 , . . . , sn } lie on a regular lattice in Rd .
A least squares method begins with a nonparametric estimator 2γ̂n (h) of the
process variogram 2γ(h), such as the sample variogram (Matheron, 1962),
$$
2\hat{\gamma}_n(h) = \sum_{(i,j) \in N_n(h)} \{Z(s_i) - Z(s_j)\}^2 \big/ |N_n(h)|, \qquad h \in \mathbb{R}^d, \tag{1}
$$
with
Nn (h) = {(i, j) : i, j ∈ [1, n], si − sj = h}, where we take |A| to denote the size of
a finite set A. Throughout the remainder, we suppose that a least squares method
is defined using (1). A LSE of the variogram parameter θ is obtained by minimizing
a weighted distance between the variogram model 2γ(h; θ) and the estimator 2γ̂n (h)
over a collection of r ≥ p fixed lags h ∈ {h1, . . . , hr} ⊂ Rd. Namely, for an r × r positive definite weight matrix V(θ), the LSE of θ with respect to V(θ) is given by
$$
\hat{\theta}_{n,V} \equiv \arg\min\{Q_{n,V}(\theta) : \theta \in \Theta\}, \qquad \text{with } Q_{n,V}(\theta) \equiv g_n(\theta)'\, V(\theta)\, g_n(\theta), \tag{2}
$$
where gn(θ) is an r × 1 vector with i-th element 2γ̂n(hi) − 2γ(hi; θ). The selection of V(θ) determines the type of the LSE θ̂n,V. For example, two common choices for V(θ) are the identity matrix and a diagonal matrix with entries |Nn(hi)|/{2γ(hi; θ)}², producing the ordinary least squares (OLS) estimator and an approximation to Cressie's weighted least squares (WLS) estimator (Cressie, 1993, p. 96), respectively.
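To make the least squares recipe concrete, the following is a minimal sketch (not the authors' code) of computing the sample variogram (1) on a fully observed rectangular grid and minimizing the criterion (2) with an identity weight matrix, i.e., the OLS fit; the array layout, helper names, and the use of a derivative-free optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def sample_variogram(z, lags):
    """z: 2-d array of gridded observations; lags: list of integer (dx, dy) lags.
    Returns the sample variogram 2*gamma_hat(h) of (1) at each lag."""
    vals = []
    for dx, dy in lags:
        a = z[max(dx, 0):z.shape[0] + min(dx, 0), max(dy, 0):z.shape[1] + min(dy, 0)]
        b = z[max(-dx, 0):z.shape[0] + min(-dx, 0), max(-dy, 0):z.shape[1] + min(-dy, 0)]
        vals.append(np.mean((a - b) ** 2))   # average squared increment over N_n(h)
    return np.array(vals)

def ols_variogram_fit(z, lags, model, theta0):
    """OLS fit: minimize Q_{n,V}(theta) of (2) with V(theta) = identity, where
    model(lags, theta) returns the parametric variogram 2*gamma(h; theta) at the lags."""
    ghat = sample_variogram(z, lags)
    qn = lambda theta: np.sum((ghat - model(lags, theta)) ** 2)
    return minimize(qn, theta0, method="Nelder-Mead").x
```

Other weight choices (WLS, GLS, SLS) only change the quadratic form inside `qn`.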
Under general conditions, Lahiri et al. (2002) have shown the generalized least
squares (GLS) estimator to be asymptotically efficient among all LSEs of θ, which
corresponds to selecting V(θ) = Σ(θ)−1 where Σ(θ) is the asymptotic covariance
matrix of gn (θ) under θ. While statistically optimal in the LSE class, the computation
of the GLS estimator requires minimizing a complex function (2) of the covariance
matrix Σ(θ) at each θ. As an alternative, Lee and Lahiri (2002) proposed replacing
the covariance matrix Σ(θ0 ) at the true parameter value θ0 with a nonparametric
estimator Σ̂ based on a subsampling method for dependent data described by Politis and Romano (1994) and Sherman (1996), among others. Their subsampling least squares (SLS) estimator θ̂n,SLS = θ̂n,Σ̂^{-1} is then obtained by setting V(θ) = Σ̂^{-1}, which is free of θ, thereby making (2) simpler to compute for SLS than GLS. However,
Lee and Lahiri (2002) do not address quantifying the uncertainty in estimates of the
variogram 2γ(h; θ̂n,SLS ), which we consider with the EL method described next.
3 Spatial EL method for variogram estimation

3.1 Spatial sampling design
To describe the spatial EL method, we shall adopt a spatial sampling framework that
allows a spatial sampling region to expand as the sample size increases. Suppose
the process Z(·) is observed at n sites {s1 , . . . , sn } located on a regular grid within
a spatial sampling region Rn ⊂ Rd , d ≥ 1. For some increasing sequence {λn }n≥1
of positive scaling factors, we suppose that the sampling region Rn is obtained by
inflating a prototype set R0 by λn . That is, Rn = λn R0 , where the template set
R0 ⊂ (−1/2, 1/2]d contains a neighborhood around the origin. This formulation of Rn
permits a wide variety of sampling region shapes where the shape of Rn is preserved
as the sampling region grows. Similar sampling schemes have been considered by
Politis and Romano (1994), Sherman (1996), and Lee and Lahiri (2002) for spatial
subsampling, and we follow these authors in assuming that the process Z(·) is observed
at locations on the integer grid Zd lying inside Rn ; that is, the available sampling
sites are {s1 , . . . , sn } = Rn ∩ Zd .
3.2 Variogram estimating equations
The spatial EL method for variogram estimation employs a moment condition that
links the intrinsically stationary spatial process Z(·) to the variogram parameter value
θ ∈ Θ ⊂ Rp. To accomplish this, we select r ≥ p fixed lags {h1, . . . , hr} ⊂ Zd, define an r × 1 vector function Γ(θ), θ ∈ Θ, with i-th component equal to 2γ(hi; θ), and construct vectors Y(·) from the spatial process Z(·) as
$$
Y(s) \equiv \big(\{Z(s) - Z(s+h_1)\}^2, \ldots, \{Z(s) - Z(s+h_r)\}^2\big)', \qquad s \in \mathbb{R}^d. \tag{3}
$$
When the Z(·)-process variogram 2γ(h; θ0 ), h ∈ Rd , belongs to the model class, we
have the moment equation
$$
E\,Y(s) = \Gamma(\theta_0) \in \mathbb{R}^r, \qquad s \in \mathbb{R}^d. \tag{4}
$$
For inference on θ ∈ Θ, we next create an EL function based on (3) and the moment
assumption in (4). Other EL functions (and resulting estimators) can be possible
for θ by changing the estimating functions and moment conditions. For example, by
re-defining Y(·) with absolute differences (rather than squared), the right-hand side of (4) would be (2/π)^{1/2} Γ^{1/2}(θ0) for Gaussian processes. However, we shall focus our
spatial EL development on (3) and (4).
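As an illustration of (3), the sketch below builds the vectors Y(s) of squared increments at the chosen lags from data on a rectangular grid, keeping only the sites s for which every shifted location s + hi is observed (the region Rn,Y); the function name and grid layout are assumptions.

```python
import numpy as np

def build_Y(z, lags):
    """Rows are the vectors Y(s) of (3) for all grid sites s whose shifted locations
    s + h_i stay inside the observed rectangle for every lag h_i (the region R_{n,Y})."""
    nx, ny = z.shape
    x0 = max(0, *(-dx for dx, _ in lags)); x1 = min(nx, *(nx - dx for dx, _ in lags))
    y0 = max(0, *(-dy for _, dy in lags)); y1 = min(ny, *(ny - dy for _, dy in lags))
    base = z[x0:x1, y0:y1]
    cols = [(base - z[x0 + dx:x1 + dx, y0 + dy:y1 + dy]) ** 2 for dx, dy in lags]
    return np.stack([c.ravel() for c in cols], axis=1)   # shape (#sites in R_{n,Y}, r)
```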
3.3 Blockwise EL function and maximum EL estimator
Consider values {Y(s) : s ∈ Rn,Y ∩ Zd } as defined in (3), corresponding to a sampling region Rn,Y = {s ∈ Rn : s + h1 , . . . , s + hr ∈ Rn }. The EL function for θ
involves creating a likelihood function based on blocks of Y(·). The data blocking
is a device used to retain the spatial dependence structure in the EL likelihood by
keeping neighboring observations together. Similar blocking techniques have been
key to formulating other nonparametric likelihoods for dependent data, like the block
bootstrap or subsampling (Künsch, 1989; Politis and Romano, 1994). While the block
bootstrap assigns probabilities to data blocks by resampling, the spatial EL method
creates a likelihood by assigning probabilities to blocks under a moment constraint; block sample means of the Y(·) observations are used to summarize the information in each block regarding the moment condition (4).
Let {bn }n≥1 be a sequence of positive integers that will define the EL block scaling
and let In = {i ∈ Zd : Bbn (i) ⊂ Rn,Y } denote the index set of all d-dimensional
rectangles Bbn (i) ≡ i + bn (−1/2, 1/2]d , i ∈ Zd , lying inside Rn,Y . This provides a
collection of blocks as {Bbn (i) : i ∈ In }. Figure 1 provides an illustration of the
sampling region and the blocking mechanism. To keep the blocks small relative to
the size of the sampling region Rn = λn R0 (or Rn,Y ), we suppose that bn → ∞ grows
at a slower rate than λn and require b_n^2/λn → 0 or, equivalently,
$$
b_n^{-1} + (b_n^d)^2/n \to 0 \tag{5}
$$
as the spatial sample size n → ∞; see the on-line Appendix for more details. Each
block Bbn(i), i ∈ In, contains |Bbn(i) ∩ Zd| = b_n^d observations of the vector process Y(·), with block sample mean
$$
\bar{Y}_i = b_n^{-d} \sum_{s \in B_{b_n}(i) \cap \mathbb{Z}^d} Y(s).
$$
By (5), the squared number of
observations in a block must be of smaller order than the overall sample size n, which
is a generalization of EL block conditions from time series d = 1 (Kitamura, 1997).
Figure 1: (a) Sampling region Rn for the Z(·)-process with site locations denoted by •; (b) sampling region Rn,Y for Y(·) from (3) based on r = 2 lags: h1 = (0, −1)′, h2 = (2, 0)′; (c) overlapping blocks. [Figure omitted: schematic grids of sampling sites for panels (a)-(c).]
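A simple way to form the block sample means Ȳi of Section 3.3 is sketched below, assuming the Y-vectors of (3) have been arranged on the Rn,Y grid (e.g., by reshaping the output of a routine like `build_Y` above); every b_n × b_n square that fits inside the grid contributes one overlapping block.

```python
import numpy as np

def block_means(Y_grid, b):
    """Y_grid: array of shape (nx, ny, r) holding Y(s) over the R_{n,Y} grid.
    Returns the N_I x r array of overlapping block means Ybar_i for b x b blocks."""
    nx, ny, r = Y_grid.shape
    means = []
    for i in range(nx - b + 1):
        for j in range(ny - b + 1):
            means.append(Y_grid[i:i + b, j:j + b, :].mean(axis=(0, 1)))
    return np.array(means)
```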
We assess the plausibility of a value θ ∈ Θ using the profile EL function given by
$$
L_n(\theta) = N_I^{N_I} \cdot \sup\Big\{ \prod_{i \in I_n} p_i \,:\, p_i \ge 0,\ \sum_{i \in I_n} p_i = 1,\ \sum_{i \in I_n} p_i \bar{Y}_i = \Gamma(\theta) \Big\}, \tag{6}
$$
where NI = |In | denotes the number of blocks. A multinomial likelihood is created from probabilities assigned to each block mean Ȳi , under an “expectation-Γ(θ)”
linear constraint due to (4), and the largest possible product of these probabilities
determines the EL function Ln (θ) for θ ∈ Θ. Note that the maximal value of Ln (θ)
is 1, which occurs if each pi = 1/NI , and we define Ln (θ) = −∞ if the set in (6) is
empty.
If Γ(θ) is interior to the convex hull of {Ȳi : i ∈ In}, then the EL function for θ achieves a maximum at probabilities p_{θ,i} = N_I^{-1}{1 + t_θ′(Ȳi − Γ(θ))}^{-1}, i ∈ In, and (6) becomes
$$
L_n(\theta) = \prod_{i \in I_n} \big\{1 + t_\theta'(\bar{Y}_i - \Gamma(\theta))\big\}^{-1}, \tag{7}
$$
where t_θ ∈ Rr satisfies Σ_{i∈In} (Ȳi − Γ(θ)) p_{θ,i} = 0_r. Owen (1990) and Qin and Lawless (1994) provide these and further computational details with EL methods. In
particular, Owen (1990, Section 3) details the computation of the profile EL function
for inference on the mean µ of independent data. It is important to note that the
same maximization routine applies to (7) by substituting block means Ȳi and Γ(θ) in
place of individual independent observations and µ. That is, the spatial EL function
requires only forming blocks and supplying these into a well-known EL function for
independent data.
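The sketch below spells out that computation for the block means: it solves for the multiplier t_θ in (7) by Newton's method on the convex dual, using Owen's standard "log-star" continuation of the logarithm so the objective stays finite, and returns −log ∏_i {N_I p_{θ,i}}. The routine is illustrative, not the authors' implementation.

```python
import numpy as np

def neg_log_el_ratio(Ybar, gamma_theta, n_iter=100):
    """Solve for t_theta in (7) and return -log prod_i {N_I * p_{theta,i}}
    = sum_i log(1 + t_theta'(Ybar_i - Gamma(theta)))."""
    z = Ybar - gamma_theta                      # N_I x r centered block means
    N, r = z.shape
    eps = 1.0 / N
    t = np.zeros(r)
    for _ in range(n_iter):                     # Newton steps on the convex dual problem
        arg = 1.0 + z @ t
        safe = np.where(arg >= eps, arg, eps)
        d1 = np.where(arg >= eps, 1.0 / safe, 2.0 / eps - arg / eps ** 2)   # (log*)'
        d2 = np.where(arg >= eps, 1.0 / safe ** 2, 1.0 / eps ** 2)          # -(log*)''
        grad = -(z * d1[:, None]).sum(axis=0)
        hess = (z[:, None, :] * z[:, :, None] * d2[:, None, None]).sum(axis=0)
        step = np.linalg.solve(hess + 1e-12 * np.eye(r), -grad)
        t += step
        if np.abs(step).max() < 1e-10:
            break
    arg = 1.0 + z @ t
    safe = np.where(arg >= eps, arg, eps)
    log_star = np.where(arg >= eps, np.log(safe),
                        np.log(eps) - 1.5 + 2.0 * arg / eps - 0.5 * (arg / eps) ** 2)
    return log_star.sum()
```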
The maximizer of the EL function Ln (θ) is a maximum empirical likelihood estimator (MELE) of θ, denoted by θ̂n . Next, we consider large sample properties of this
MELE as well as EL confidence regions formulated with the log-EL function given by
$$
\ell_n(\theta) = -2 b_n^{-d} \log L_n(\theta), \qquad \theta \in \Theta. \tag{8}
$$
The factor b_n^{-d} is a required adjustment due to overlapping EL blocks and represents
the spatial analog of an EL block adjustment with time series (Kitamura, 1997).
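Putting the pieces together, a hedged sketch of the log-EL function (8) and of a maximum EL estimator obtained by direct numerical minimization is shown below; `Ybar` is the array of block means, `Gamma` a user-supplied function returning Γ(θ) at the chosen lags, and the optimizer choice is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

def log_el(theta, Ybar, Gamma, b, d=2):
    """ell_n(theta) = -2 b_n^{-d} log L_n(theta); since log L_n(theta) equals minus the
    value returned by neg_log_el_ratio, the signs combine to a factor of +2."""
    return 2.0 * b ** (-d) * neg_log_el_ratio(Ybar, Gamma(theta))

def mele(Ybar, Gamma, theta0, b, d=2):
    """Maximum EL estimator theta_hat: the minimizer of ell_n(theta)."""
    res = minimize(lambda th: log_el(th, Ybar, Gamma, b, d), theta0, method="Nelder-Mead")
    return res.x
```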
4 Distributional results for the EL method
For describing the distributional properties of the EL method, we require some
assumptions on the spatial process and the variogram model, referred to as Assumptions “A.1 to A.6.” We defer technical details on these assumptions to Section 8.1
(on-line Appendix). Briefly, Assumptions A.1-A.3 describe spatial mixing and moment conditions so that the spatial EL method will be valid for a broad class of
weakly dependent spatial processes. Assumptions A.4-A.5 are smoothness and identifiability conditions on the variogram model which may be checked in practice. All
of these assumptions are similar to those of Lee and Lahiri (2002) so that the SLS
and EL methods are valid under the same general conditions. Assumption A6 entails
the EL block scaling (5).
4.1 Distribution of maximum empirical likelihood estimator
The main result of this section establishes the existence, consistency, and asymptotic
normality of the MELE of θ as the global maximizer of Ln (θ) over Θ. We note
that the theorem does not require some of the stronger conditions often associated
with global MELE results such as compactness of the parameter space Θ or second-order derivatives of the estimating functions (cf. Qin and Lawless, 1994, for iid data; Kitamura, 1997, for time series). Let θ0 ∈ Θ ⊂ Rp denote the unique parameter value satisfying (4) and Σ(θ0) denote the asymptotic covariance matrix for the sample variogram (1) over lags h1, . . . , hr, as described in Section 2.
Theorem 1 Suppose Assumptions A1-A6 and (4) hold. Then, as n → ∞,

(i) P(a global maximum θ̂n exists on Θ) → 1 and θ̂n →_p θ0.

(ii) n^{1/2}(θ̂n − θ0) →_d N(0p, ∆(θ0)), with ∆(θ0) = {D(θ0)′Σ(θ0)^{-1}D(θ0)}^{-1}, where D(θ0) ≡ ∂Γ(θ0)/∂θ.
Out of all LSEs based on the sample variogram, the GLS estimator has the smallest asymptotic variance, with LSE-optimal limiting covariance matrix given by ∆(θ0) (Lahiri et al., 2002). By Theorem 1, the MELE here has the same asymptotic
efficiency as the best LSE in this class.
We may also draw some connections between the MELE and the SLS estimator
of Lee and Lahiri (2002). Recall that the SLS estimator minimizes (2) with a matrix
V(θ) = Σ̂^{-1} involving a subsampling estimator Σ̂ of Σ(θ0) at the true parameter value θ0. As the sample size n → ∞, we may expand the log-EL ratio (8) at “EL-plausible” values of θ (e.g., satisfying ℓn(θ) ≤ r for some r > 0) as
$$
\ell_n(\theta) = n\, g_n(\theta)'\, \hat{\Sigma}_{EL}^{-1}\, g_n(\theta)\,\{1 + o_p(1)\}, \qquad
\hat{\Sigma}_{EL} \equiv \frac{b_n^d}{N_I} \sum_{i \in I_n} \{\bar{Y}_i - \Gamma(\theta_0)\}\{\bar{Y}_i - \Gamma(\theta_0)\}',
$$
where gn(θ) ∈ Rr has i-th component 2γ̂n(hi) − 2γ(hi; θ) based on the sample variogram (1) and Σ̂_EL denotes a subsampling estimator of the asymptotic covariance matrix Σ(θ0) of gn(θ) under θ0. Because θ̂n minimizes ℓn(θ), the MELE asymptotically minimizes a quadratic form (2) using a weight matrix V(θ) = Σ̂_EL^{-1}. The EL version Σ̂_EL resembles the subsampling estimator Σ̂ of Lee and Lahiri in structure but, unlike SLS, the EL method involves no direct variance estimation.
4.2 Variogram confidence regions
As suggested earlier, the EL method allows an assessment of the uncertainty in estimating θ through EL confidence regions based on the MELE θ̂n . Using the log-EL
function (8), Theorem 2 concerns the log-EL ratio statistic
$$
r_n(\theta) \equiv \ell_n(\theta) - \ell_n(\hat{\theta}_n) = -2 b_n^{-d} \log\{L_n(\theta)/L_n(\hat{\theta}_n)\}, \qquad \theta \in \Theta,
$$
which is shown to have a chi-squared limit at θ = θ0 for calibrating confidence regions.
Relatedly, the spatial EL approach can also accommodate confidence regions for parameter subsets after profiling out the nuisance parameters (cf. Qin and Lawless,
1994, for iid data case). Suppose θ = (θ1′, θ2′)′, where θ1 represents a q × 1 parameter vector of interest and θ2 denotes a (p − q) × 1 nuisance vector. For fixed θ1, suppose that θ̂2^{θ1} maximizes the EL function Ln(θ1, θ2) with respect to θ2 and define
$$
\ell_n(\theta_1) \equiv -2 b_n^{-d} \log L_n(\theta_1, \hat{\theta}_2^{\theta_1}).
$$
In the following, let χ²_ν denote a chi-squared random variable with ν degrees of freedom and lower-α quantile denoted by χ²_{ν;α}.
Theorem 2 Under the assumptions of Theorem 1, as n → ∞,

(i) rn(θ0) = ℓn(θ0) − ℓn(θ̂n) →_d χ²_p if H0 : θ = θ0 holds.

(ii) rn(θ10) = ℓn(θ10) − ℓn(θ̂1n) →_d χ²_q if H0 : θ1 = θ10 ∈ Rq holds, where θ̂n = (θ̂1n′, θ̂2n′)′.
We may set an approximate 100(1 − α)% EL confidence region for θ as CR(1 − α) ≡ {θ ∈ Θ : rn(θ) ≤ χ²_{p;1−α}}; analogous regions apply to a parameter subset θ1 when profiling. Confidence regions for θ may also be turned into simultaneous
confidence bands {γ(h; θ) : θ ∈ CR(1 − α)} for the entire variogram model, under
the interpretation that the bands contain the unknown variogram γ(h; θ0 ) at a lag
h ∈ Rd if and only if the confidence region for θ contains θ0 .
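For illustration, a crude grid-search version of these confidence sets and bands might look as follows; the helpers from the earlier sketches are assumed, and approximating ℓn(θ̂n) by the minimum over the grid is a simplification.

```python
import numpy as np
from scipy.stats import chi2

def el_confidence_region(theta_grid, Ybar, Gamma, b, d=2, alpha=0.10):
    """Return theta values with r_n(theta) = ell_n(theta) - ell_n(theta_hat) <= chi^2_{p,1-alpha};
    ell_n(theta_hat) is approximated by the smallest ell_n value found on the grid."""
    ell = np.array([log_el(th, Ybar, Gamma, b, d) for th in theta_grid])
    rn = ell - ell.min()
    p = len(theta_grid[0])
    keep = rn <= chi2.ppf(1.0 - alpha, df=p)
    return [th for th, k in zip(theta_grid, keep) if k]

def variogram_band(cr_thetas, model, hs):
    """Pointwise envelope of the variogram model over the confidence region, giving a
    simultaneous band {2*gamma(h; theta) : theta in CR(1 - alpha)} at the lags hs."""
    vals = np.array([model(hs, th) for th in cr_thetas])
    return vals.min(axis=0), vals.max(axis=0)
```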
A Bartlett correction for the spatial EL method is a potential tool to enhance
the coverage accuracy of EL confidence regions. This correction is a property often
associated with EL methods, which essentially involves a scalar adjustment to the
EL log-ratio rn (θ0 ) to align its expected value Ern (θ0 ) with a chi-squared’s mean and
improve the chi-squared approximation. With independent data, a Bartlett correction was first established by DiCiccio et al. (1991) for mean parameters and has been
extended by others to more EL scenarios; see Chen and Cui (2006a, 2006b); Kitamura (1997) and Monti (1997). A formal justification of a Bartlett correction in the
spatial setting is difficult and requires machinery outside of the scope of this paper.
At the same time, the effect of a correction may still be of interest and we propose
here an algorithm for a practical Bartlett/mean correction factor by using a spatial
block bootstrap.
Bartlett Factor Algorithm: Pick an integer M ≥ 1. For i = 1, . . . , M, independently generate a block bootstrap rendition, say Yn^{*i}, of the original (vectorized) spatial data Yn = {Y(s) : s ∈ Rn,Y ∩ Zd} and compute rn^{*i}(θ̂n) = ℓn^{*i}(θ̂n) − ℓn^{*i}(θ̂n^{*i}) (using θ̂n as a consistent estimate of θ0 ∈ Rp), where ℓn^{*i} and θ̂n^{*i} are the log-EL ratio and MELE analogs based on Yn^{*i}. Calculate r̄n^* = M^{-1} Σ_{i=1}^M rn^{*i}(θ̂n) to estimate E rn(θ0) and set a Bartlett corrected confidence region as {θ : (p/r̄n^*) rn(θ) ≤ χ²_{p,1−α}}. If θ = (θ1′, θ2′)′ with interest on θ1 ∈ Rq (treating θ2 ∈ Rp−q as a nuisance parameter), we use the profile version rn(θ1) = ℓn(θ1) − ℓn(θ̂1n) for a Bartlett corrected confidence region {θ1 : (q/r̄n^*) rn(θ1) ≤ χ²_{q,1−α}} based on r̄n^* = M^{-1} Σ_{i=1}^M rn^{*i}(θ̂1n).
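A sketch of this algorithm is given below. The spatial block bootstrap generator is only a named placeholder for the resampling scheme of Lahiri (2003a) and is not implemented here; the remaining helpers are the hypothetical ones from the earlier sketches.

```python
import numpy as np

def bartlett_factor(Y_grid, Gamma, theta_hat, b, d=2, M=200, zeta=None):
    """Estimate E r_n(theta_0) by a spatial block bootstrap and return the scale p / rbar
    used in the corrected region {theta : (p/rbar) r_n(theta) <= chi^2_{p,1-alpha}}.
    `spatial_block_bootstrap` is a placeholder (assumption) for Lahiri's (2003a) scheme."""
    zeta = b + 1 if zeta is None else zeta              # bootstrap block length used in Section 5
    r_star = []
    for _ in range(M):
        Yg_star = spatial_block_bootstrap(Y_grid, zeta) # bootstrap copy of the Y-field (not shown)
        Ybar_star = block_means(Yg_star, b)
        th_star = mele(Ybar_star, Gamma, theta_hat, b, d)   # bootstrap MELE
        r_star.append(log_el(theta_hat, Ybar_star, Gamma, b, d)
                      - log_el(th_star, Ybar_star, Gamma, b, d))
    return len(theta_hat) / np.mean(r_star)
```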
We examine the efficacy of the Bartlett correction through a numerical study in
Section 5 to follow. The block bootstrap method for generating spatial data that we
use is described in detail by Lahiri (2003a, Section 12.3.1), and requires an integer
block length ζn as input.
5 Numerical study
This section examines the finite sample performance of our EL method for variogram
inference through simulation. The behavior of EL estimators is influenced by a combination of factors including the sample size, the strength of spatial dependence and
the choice of lags and block size. To examine these factors and the interactions among
them, we consider an exponential variogram in R2 parameterized as
$$
2\gamma(h; \theta) = 2\big[\theta_1 + s\big(1 - \exp\{-\|h\|/\theta_2\}\big)\big], \qquad h \neq 0 \in \mathbb{R}^2, \tag{9}
$$
with nugget and range parameters θ = (θ1 , θ2 ) ∈ Θ = (0, ∞)2 and fixed sill equal to
s = 1. That is, in studying the performance factors mentioned above and comparing
results obtained through several estimation techniques, we focus estimation on the
nugget and range parameters, as they most directly influence small and large-scale
spatial variation; in practice, the sill parameter would be estimated as well and Section 5.4 provides some simulation evidence for that case. Using a two parameter
model, we generated real-valued, mean-zero (intrinsically) stationary Gaussian variables on an integer grid within sampling regions Rn = λn (−1/2, 1/2]2 ⊂ R2 of size
λn × λn for λn = 10, 30, 50, through the circulant embedding method of Chan and
Wood (1997). Because the dependence strength can greatly impact performance of
least squares methods, we present results for three range parameter values θ2 = 1,
4 or 8, selected to represent relatively weak, moderate and strong dependence with
the nugget value θ1 = 0.5. For increasing λn , we generated a total of 10000, 5000 or
3000 data sets, respectively; these simulation sizes were larger than those needed to
produce Monte Carlo standard errors less than 1% of actual parameter values.
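For reference, a small sketch of the model (9) written as a function compatible with the earlier fitting and EL sketches, with θ = (nugget, range) and the sill fixed at s = 1; the function name is an assumption.

```python
import numpy as np

def exp_variogram(lags, theta, s=1.0):
    """2*gamma(h; theta) = 2*[theta_1 + s*(1 - exp(-||h||/theta_2))], as in (9)."""
    nugget, rng = theta
    h = np.array([np.hypot(dx, dy) for dx, dy in lags])
    return 2.0 * (nugget + s * (1.0 - np.exp(-h / rng)))
```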
In Section 5.1, we first examine EL confidence regions for quantifying the precision
of point estimators by comparing EL coverage probabilities against those obtained
through the SLS method. In Section 5.2, we then compare the mean squared errors
of EL point estimators against those of other LSEs. The simulation results to follow
are based on lag choices {h1, . . . , hr} for variogram inference (which may differ by process
strength and sample size) using a rule of thumb described in Section 5.3. Section 5.4
considers EL estimation in a three-parameter variogram model.
5.1 Assessment of confidence regions
While Lee and Lahiri (2002) do not explicitly consider interval estimation, we may
propose SLS confidence regions for variogram parameters based on the SLS quadratic
form in (2). We define an approximate 100(1 − α)% SLS-based confidence region for
θ = (θ1 , . . . , θp ) as
$$
\Big\{\theta \in \Theta : n\big[Q_{n,\hat{\Sigma}^{-1}}(\theta) - Q_{n,\hat{\Sigma}^{-1}}(\hat{\theta}_{n,SLS})\big] \le \chi^2_{p,1-\alpha}\Big\}, \tag{10}
$$
where the chi-squared calibration follows naturally from the distributional properties of the SLS covariance and point estimators Σ̂ and θ̂n,SLS given by Lee and
Lahiri (2002). A similar chi-squared approximation is generally not valid for other
least squares approaches, such as OLS or WLS.
As both EL and SLS involve data blocks, Figure 2 displays EL and SLS coverage
probabilities for 90% confidence regions for model parameters in (9) on a sampling
region with λn = 50 against various block sizes. The coverage probabilities for EL
are seen to be generally closer to nominal than those of SLS, which tended to exhibit
extreme under-coverage, especially under strong dependence. This behavior held
often in our simulations over a wide range of block sizes. In Figure 2, the best block
sizes for coverage accuracy increase with the dependence strength and, as expected,
the coverage probabilities often deteriorate with an increase in the range.

Figure 2: Coverage probabilities for 90% confidence regions for (θ1, θ2) plotted against block sizes for a 50 × 50 sampling region, where θ1 = 0.5. In each panel, the top curve is for EL, the lower (dashed) curve for SLS, based on 3000 simulations. [Panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8; coverage probability versus block size.]
We repeated the previous analysis for smaller sampling regions λn = 30 and
λn = 10. Coverage probabilities typically decreased for both EL and SLS on smaller
regions, with EL continuing to have coverages closer to the nominal level; see Appendix Figure 1 (in the on-line Appendix) for the 30 × 30 region results. For example,
the coverage results for θ2 = 4 and λn = 30 appeared similar to those of θ2 = 8 and
λn = 50. The most drastic reduction in the lattice size λn = 10 produced the lowest
coverage probabilities over a larger variety of block sizes, especially for larger range
parameters θ2 = 4, 8 which are difficult to capture on a small data set. In Figure
3, we show the coverage results for the weak dependence case θ2 = 1 on a 10 × 10
lattice. Although both methods have substantially poorer coverage than on the larger
sampling region, SLS exhibits under-coverage to a greater degree than EL.
On smaller sampling regions in particular, it is possible to enhance the coverage
accuracy of EL confidence regions through the Bartlett correction described in Section 4.2. We applied the Bartlett correction algorithm of Section 4.2 using M = 200
block bootstrap renditions with block sizes ζn = bn + 1 in the bootstrap procedure of Lahiri (2003a, Section 12.1.3).

Figure 3: Coverage probabilities for 90% confidence regions for (θ1, θ2) plotted against block sizes for a 10 × 10 sampling region, where θ1 = 0.5, θ2 = 1. Methods include SLS, (uncorrected) EL, Bartlett corrected EL and bootstrap quantile calibrated EL.

Figure 3 illustrates that this correction dramatically improves EL coverage probabilities over a range of block sizes on the 10 × 10
region. For comparison, we also include in Figure 3 the coverages for EL regions
based on a bootstrap quantile calibration of the log-EL ratio rn (θ0 ) from Theorem 2.
While the Bartlett procedure uses bootstrap replicates for a mean correction, the
bootstrap quantile procedure uses sample quantiles from these bootstrap replicates
to calibrate this log-EL ratio (rather than a chi-squared calibration). In Figure 3,
the Bartlett correction appears to produce better coverage probabilities than the
bootstrap quantile approximation. Intuitively, mean estimation in the Bartlett procedure may be an easier task with a smaller number of bootstrap renditions (e.g.,
M = 200) than approximating extreme quantiles in the log-EL ratio’s distribution;
Chen and Cui (2006b) observed similar results with Bartlett corrected EL intervals for independent data.

Table 1: Coverage probabilities for 90% confidence regions for (θ1 = 0.5, θ2) using SLS, EL, Bartlett corrected EL (Bc) and bootstrap quantile calibrated EL with several sampling regions and ranges θ2 and a block size bn = n^{1/5} (based on 1000 simulations).

Rn        θ2    SLS    EL     ELBc   ELboot
10 × 10    1    67.4   77.6   88.6   80.8
           4    59.4   74.8   84.8   78.6
           8    50.4   70.6   84.6   75.6
20 × 20    1    85.4   93.4   96.0   86.2
           4    58.4   76.8   91.8   82.4
           8    53.2   75.8   93.0   84.0
30 × 30    1    83.5   92.1   96.4   87.1
           4    63.0   78.8   92.4   83.6
           8    52.2   70.2   93.4   81.0
Table 1 displays coverage probabilities for some other smaller-sized regions with
block sizes chosen to have slightly smaller order than n^{1/4} from (5). The Bartlett
corrected EL regions seem to exhibit good coverage accuracy. In Table 1, the pattern
again emerges that the uncorrected EL regions have better coverages than SLS, but
coverage probabilities typically decrease with an increase in dependence (range).
5.2 Assessment of point estimation
Using the same simulation design as in Section 5.1, we now compare EL and SLS
along with WLS and OLS estimators (from Section 2) in terms of Relative Root
Mean Squared Error (RRMSE) in point estimation. RRMSE is defined here as the root mean squared error of an estimator expressed as a percentage of the true parameter value, which allows meaningful comparisons across different sampling conditions.
Figure 4 presents the RRMSE results for all four estimators of the range parameter
θ2 when λn = 50 as a function of block size; note WLS and OLS estimators do
not involve data blocks so their RRMSEs are horizontal lines. The four methods
perform very similarly under weak dependence (θ2 = 1), with RRMSEs just below
30%. As the dependence/range increases, differences among the four methods emerge
and EL and SLS appear to have better performances than OLS and WLS (which are not asymptotically efficient least squares methods).

Figure 4: Relative Root Mean Squared Error (%) for the estimate of the range on a 50 × 50 sampling region against block size (based on 3000 simulations). [Panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8; methods: EL, SLS, WLS, OLS.]

The best block sizes for
EL point estimation appear to be slightly smaller than those for optimal coverage
accuracy in Figure 2. Again small blocks seem adequate for the weakest dependence
(smallest range); for medium and large ranges, EL point estimation worsens when
blocks become too large (i.e., more than n^{1/4} ≈ 7). The corresponding RRMSE
results for estimation of the nugget θ1 on the 50 × 50 region appear in Appendix
Figure 2 and are typically smaller than for range estimation.
A closer analysis of the bias/spread decomposition of the RRMSEs (results not
shown) indicates that the variability of all estimators is much larger than their bias.
While the EL estimator often has somewhat less bias, the SLS estimator generally
exhibits smaller variance to the extent that the SLS method is often slightly better
in terms of RRMSE. However, the EL point estimator seems RRMSE-comparable to
the SLS estimator and, as Figures 2 and 3 suggest, the SLS point estimator often lies
far enough from the true variogram parameters that the SLS confidence region (10)
exhibits extreme under-coverage.
As the size of the sampling region decreases, the RRMSE performance of all
methods deteriorates. For a 30 × 30 sampling region, the RRMSEs for the range varied from just below 40% for weak dependence to above 50% for the strong dependence
case. For estimation of larger range values, SLS and EL appear to be slightly better
than WLS and OLS at smaller block sizes (see Appendix Figure 3); all methods
appear comparable for nugget estimation in this case (Appendix Figure 4). Appendix
Figure 5 illustrates the RRMSE values on a 10 × 10 region with θ2 = 1, which are around 60% for both nugget and range estimators with all methods; EL and SLS point estimators particularly worsen when block sizes become large. Because fewer blocks are available, this behavior appears more accentuated on a 10 × 10 region than for
larger sampling regions. Additionally, a chance exists for nugget estimates to be
zero (i.e., driven to the boundary) on small sampling regions and all LSE methods
produced such estimates on the same 7% of the 10000 simulations for the 10 × 10
region with range θ2 = 1 (zero nugget estimates did not appear on the larger regions
in the simulations). The SLS and EL methods produced more zero nugget estimates
for larger blocks (as high as 12% at the maximal possible block size of b = 7), which
also supports their deterioration in RRMSE as blocks increase on the small region.
5.3 A discussion of the choice of lags
The choice of lags is important for least squares inference and we propose a strategy
for lag selection based on reviewing a large number of situations. Throughout the
preceding two sections, we used the same collection of lags {h1, . . . , hr} ⊂ Z2 for all
estimation methods, but we allowed these lags to vary by model parameters (9) (e.g.,
dependence strength) and sample size as described in the following.
Asymptotically, the variance of the EL estimator in Theorem 1 will not increase
(and may in fact decrease) by adding more lags to set the moment condition (4);
see Qin and Lawless (1994, Corollary 1) for a similar result on adding EL estimating
functions with independent data. In this sense, it is then desirable to have a reasonable
number of lags. However, in practical settings, the data may not support a large lag
number and the addition of too many lags can force the EL moment condition (4)
to become too restrictive for inference as well as hinder estimation for other least
squares methods. Additionally, longer lags are helpful for capturing large-scale spatial
dependence (e.g., range) but overly long lags can also limit the number of data blocks
that are available to both EL and SLS methods. Thus, a balance in the number and
length of lags must be sought, and we suggest a rough rule of thumb for accomplishing
this balance.
To capture small scale variogram behavior (e.g., nugget parameter), the lags
should contain enough local information and we begin with a few very short lags
in both the horizontal and vertical directions of the lattice, such as h1 = (1, 0)′, h2 = (0, 1)′ or additionally h3 = (2, 0)′, h4 = (0, 2)′. However, for large values of
the range parameter, the use of only short lags produced poor point estimates for all
four methods from Section 5.2, as expected. We found that the most efficient way
of augmenting the lags (in both number and length) to increase performance, while
also retaining as many data blocks as possible, was to include “diagonal” terms (such
as h5 = (1, 1)′, h6 = (2, 2)′ and so on) until the diagonal terms reach roughly 80-100% of the actual value of the range. For example, when θ2 = 1 the last “diagonal” term used was (1, 1)′, for θ2 = 4 it was (4, 4)′, while for θ2 = 8 it was (6, 6)′. This approach
allows incorporation of larger distances with fewer lags. Following this empirical rule
in Sections 5.1 and 5.2, we used lags h1 = (1, 0)′, h2 = (0, 1)′ and h5 = (1, 1)′ for λn = 10, 30, 50 with range θ2 = 1. For θ2 = 4, we added h3 = (0, 2)′, h4 = (2, 0)′, h6 = (2, 2)′, h7 = (3, 3)′ and h8 = (4, 4)′ for the cases of λn = 30 and 50. It proved
difficult to select lags for satisfactory point estimators with any method on a small
region λn = 10. For the largest range θ2 = 8, we used h1 through h8 on the 30 × 30
region and added two more diagonal terms, h9 = (5, 5)′ and h10 = (6, 6)′ on the
50 × 50 region. Not including h9 and h10 on the 30 × 30 region with θ2 = 8 allowed
for more blocks and appeared to produce slightly better point estimators for SLS and
EL methods, which illustrates the complex interaction between the choice of lags,
dependence strength and the sample size. In general, for larger sampling regions,
increasing the lags according to the empirical rule led to reasonable point estimators
of range with both EL and SLS methods and did not create any substantial losses in
performance in nugget estimation (an easier task compared to range estimation). In
practice, one can obtain a pilot estimate of the magnitude of the range using an empirical estimator of the variogram, such as (1), for guidance in selecting the diagonal
elements of h.
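One rough coding of this rule of thumb is sketched below; the cutoff fraction and the threshold for adding the longer axial lags are assumptions chosen so that pilot ranges of 1 and 4 reproduce the lag sets quoted above (a smaller fraction, about 0.75, reproduces the set used for range 8).

```python
def rule_of_thumb_lags(pilot_range, frac=1.0):
    """Short axial lags plus diagonal lags (k, k) for k up to roughly frac * pilot_range;
    frac = 1 reproduces the sets quoted for ranges 1 and 4, frac ~ 0.75 that for range 8."""
    lags = [(1, 0), (0, 1)]
    if pilot_range > 1:              # longer axial lags only for larger pilot ranges (assumption)
        lags += [(0, 2), (2, 0)]
    k = 1
    while k <= frac * pilot_range:   # add "diagonal" terms (1,1), (2,2), ...
        lags.append((k, k))
        k += 1
    return lags
```

In practice the `pilot_range` argument would come from inspecting an empirical variogram such as (1).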
5.4 Additional practical and computational issues
In this section, we briefly consider two previously unaddressed issues through simulation: estimation of a three parameter variogram model including a sill parameter and
the separate issue of least squares estimates falling on parameter space boundaries.
So far in Section 5, we have considered a simpler model (4) than would be needed
in application, by fixing the sill rather than estimating it. This approach was motivated by our goals to understand the rather large number of factors that may impact
estimation procedures and to compare EL with three other estimation methods. In
practice however, the sill parameter would be estimated and we address this issue
here. Results from a small simulation study are presented in Figures 5 and 6 and Appendix Figure 6. We generated 2000 data sets from a Gaussian process on a 50 × 50
lattice with a nugget of 0.5, a sill of 1 and ranges of 1, 4 or 8. We used only the
EL technique to estimate the three parameters and report the 10th, 50th and 90th
percentiles of (scaled) parameter estimates in Figure 5 (using the same rule of thumb
from Section 5.3 to choose the lag matrix). We scaled the parameter estimates by dividing them by the true parameter values; thus the size of these percentiles indicates the multiplicative difference between the estimates and the true parameters.
Figure 5 indicates that the EL method performs favorably: the median estimates
for all parameters are around 1 (indicated in the figure) and stable as function of
the block size. However, performance deteriorates as the sample size is decreased
and Figure 6 provides the same percentiles for the scaled EL estimates on a smaller
20 × 20 sampling region (based on 5000 simulations from a similar Gaussian process).
From Figure 6 we see that, for a moderate range value (θ2 = 4), the median estimates
perform well and are stable over block sizes. However, compared to the 50×50 region,
the 90th percentiles for range and sill estimates now assume greater magnitudes while
the 10th percentiles for nugget estimates appear less stable as a function of block size
(Appendix Figure 6). On the other hand, when the range parameter is set to 10, the
median range estimates are no longer stable as a function of block sizes and, with this
strengthening of spatial dependence, the lower percentiles for the nugget estimates
fall further to zero (Appendix Figure 6) while upper percentiles for range estimates
increase greatly in magnitude (Figure 6). Sill estimation remains stable in the sense
that medians are on target, though the upper percentiles do increase as well with the
greater range parameter. On the 20 × 20 region, setting the range to 10 induces a
point where balancing estimation needs becomes difficult in terms of adequate block
sizes (number of blocks) and the rule-of-thumb choice of the lag matrix.
With these simulations, we mention some numerical issues related to LSEs falling
on the boundary of the parameter space. This situation occurs more often with small sample sizes, and the chance of it vanishes as the sample size increases. In particular, nugget estimates can become zero and range estimates may collapse to 0 or diverge to +∞; the latter
case can happen especially for large ranges on small regions. Section 5.2 described the
occurrence of zero-valued nugget estimates on a 10 × 10 region for all LSE methods involved in the two parameter variogram model (with range parameter 1).

Figure 5: Percentiles (10th, 50th and 90th) for (scaled) EL estimates of the range and sill on a 50 × 50 region against block size for three cases: θ2 = 1, 4 or 8 (based on 3000 simulations); a horizontal line running fully between graph margins indicates 1. [Panels: (a) Range, (b) Sill.]

Figure 6: Percentiles (10th, 50th and 90th) for (scaled) EL estimates of the range and sill on a 20 × 20 region against block size for two cases: θ2 = 4 or 10 (based on 5000 simulations); a horizontal line running fully between graph margins indicates 1. [Panels: (a) Range, (b) Sill.]

In the
three-parameter simulations here, no zero-valued nugget estimates occurred on the
50 × 50 region but 1-5% of EL nugget estimates were zero on the 20 × 20 region
with range 4 (2-6% for range 8); this varying percentage is due to block differences
and typically increased with block size. In the reported two-parameter simulations
as well as in the three-parameter study here on a 50 × 50 region, the maximal range
estimate for any LSE typically was between 10-300 times larger than the size of the
true range, with larger magnitudes associated with estimating large ranges on small
regions. But with the ranges chosen for the 20×20 region here, a fraction of EL range
estimates did explode in size, being at least 1000 times larger than the true ranges
(0-1% with range 4 or 2-5% with range 10, varying with block size); some of these
estimates corresponded to range estimates of +∞. Sill estimation also can become
more unstable as ranges increase on small regions, but to a lesser extent and never
unbounded in our simulations.
These observations support our belief that the decision to use EL for estimating
spatial parameters should start with a careful analysis of the empirical variogram to
understand the strength of spatial dependence in conjunction with an assessment of
the size (and extent) of the available data and the choice of a lag matrix.
6 Spatial regression model
In this section, we consider an extension of the EL method to a spatial regression
model
$$
Z(s) = X(s)'\beta + \varepsilon(s), \qquad s \in \mathbb{R}^d,
$$
where X(s) is a q × 1 vector of non-random regressors, β is a vector of regression
parameters and ε(s) is a strictly stationary random process with variogram 2γε (·; θ),
θ ∈ Θ ⊂ Rp . In this framework, a common approach to variogram fitting involves
separately estimating the trend parameter β with some β̂n (e.g., using OLS regression) followed by a step of variogram estimation based on the available residuals
ε̂(s) = Z(s) − X(s)′β̂n; this approach is similar to that adopted in the data analysis of
Lee and Lahiri (2002). In contrast, the EL method extends to inference on both trend
β and variogram θ parameters simultaneously through a joint EL function. Simultaneous estimation of the regression parameters and the variogram is attractive for
reducing bias in variogram estimation, which can result from basing such estimation
on residuals (Cressie, 1993). In addition, uncertainty in estimation of both regression
and variogram parameters should be more correctly quantified through simultaneous
estimation than through two-step procedures, even if these are iteratively applied.
For a blockwise EL function for (β, θ), we use the same block collection {Bbn (i) :
i ∈ In } developed in Section 3.3 but with different estimating functions to jointly
treat both trend and variogram parameters. For β ∈ Rq and s ∈ Rd , let Zβ (s) =
Z(s) − X(s)′β and define Yβ(s), s ∈ Rd, as in (3) replacing Z(·) with Zβ(·). Then, the process Wβ(s) = {X(s)Zβ(s), Yβ(s)}′ satisfies
$$
E\,W_\beta(s) = \{0_q, \Gamma_\varepsilon(\theta_0)\}' \in \mathbb{R}^q \times \mathbb{R}^r, \qquad s \in \mathbb{R}^d, \tag{11}
$$
at the true parameters (β0, θ0), where Γε(θ) is an r × 1 vector with i-th component 2γε(hi; θ). For each i ∈ In, we let W̄β,i denote the sample mean of the Wβ(·) observations in block Bbn(i). The joint EL function for (β, θ) is
$$
L_n(\beta, \theta) = N_I^{N_I} \cdot \sup\Big\{ \prod_{i \in I_n} p_i \,:\, p_i \ge 0,\ \sum_{i \in I_n} p_i = 1,\ \sum_{i \in I_n} p_i \bar{W}_{\beta,i} = \{0_q, \Gamma_\varepsilon(\theta)\}' \Big\},
$$
with log-EL function ℓn(β, θ) = −2 b_n^{-d} log Ln(β, θ).
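A sketch of the joint estimating vectors Wβ(s) on a grid is given below; `X_grid` holding the q regressors at each site and the reuse of the earlier hypothetical `build_Y` helper are assumptions. Block means of these rows can then be passed to the same EL routine as before with target vector {0_q, Γε(θ)}′.

```python
import numpy as np

def build_W(z, X_grid, beta, lags):
    """Rows are W_beta(s) = (X(s) Z_beta(s), Y_beta(s)) over the sites of R_{n,Y}:
    q regression components followed by the r squared residual increments of (3)."""
    z_beta = z - X_grid @ beta                    # residual field Z_beta(s) = Z(s) - X(s)'beta
    nx, ny = z.shape
    x0 = max(0, *(-dx for dx, _ in lags)); x1 = min(nx, *(nx - dx for dx, _ in lags))
    y0 = max(0, *(-dy for _, dy in lags)); y1 = min(ny, *(ny - dy for _, dy in lags))
    reg = (X_grid[x0:x1, y0:y1, :] * z_beta[x0:x1, y0:y1, None]).reshape(-1, X_grid.shape[2])
    Yb = build_Y(z_beta, lags)                    # same sites, same (row-major) ordering
    return np.hstack([reg, Yb])
```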
The main result here gives the distribution of these joint MELEs β̂n , θ̂n for β, θ,
as maximizers of Ln (β, θ), under the spatial regression model. For weakly dependent
time series, Bravo (2005) studied a blockwise EL for regression with random regressors
X(·). In contrast, our initial result involves non-random regressors, which introduces
non-stationary forms of dependence through the EL block means. However, the
spatial EL method is shown to remain valid for the spatial regression model and
again requires no variance estimation. In the following, let A_n = Σ_{i=1}^n X(si)X(si)′ and define sums SX, SY of X(s)Zβ0(s), Yβ0(s) over available sites s ∈ {s1, . . . , sn}; define a parameter set Un = {(β, θ) : ‖A_n^{1/2}(β − β0)‖² + ‖n^{1/2}(θ − θ0)‖² ≤ n^{2κ}} for some 0 < κ ≤ 1/12 and formulate the following:

Assumption S: In addition to A3 through A6, there exists δ > 0 such that τ1 > 5d(6 + δ)/δ, 0 < τ2 ≤ (τ1 − d)/d and E{Z(0) − Z(hi)}^{12+2δ} < ∞ for i = 1, . . . , r.
Theorem 3 Suppose that Assumption S holds for ε(·) ≡ Z(·) and 2γ(·; ·) ≡ 2γε(·; ·) with E|ε(0)|^{6+δ} < ∞; max{‖A_n^{-1/2}X(si)‖ : 1 ≤ i ≤ n} = O(n^{-1/2}); (12) exists and is positive definite; and P{Ln(β, θ) > 0 for (β, θ) ∈ Un} → 1. Then, as n → ∞,

(i.) P(global maximizers β̂n, θ̂n exist) → 1 and (β̂n, θ̂n) →_p (β0, θ0).

(ii.) ℓn(β0, θ0) − ℓn(β̂n, θ̂n) →_d χ²_{q+p} and
$$
\begin{pmatrix} n^{1/2}(\hat{\theta}_n - \theta_0) \\ A_n^{1/2}(\hat{\beta}_n - \beta_0) \end{pmatrix}
\xrightarrow{d}
N\!\left( \begin{pmatrix} 0_p \\ 0_q \end{pmatrix},
\begin{pmatrix} \Delta_\varepsilon & \Delta_\varepsilon D_\varepsilon' \Sigma_\varepsilon^{-1} B_\varepsilon' \\ B_\varepsilon \Sigma_\varepsilon^{-1} D_\varepsilon \Delta_\varepsilon & Q_\varepsilon \end{pmatrix} \right),
$$
with Dε ≡ ∂Γε(θ0)/∂θ, ∆ε ≡ {Dε′ Σε^{-1} Dε}^{-1}, Qε ≡ Aε − Bε(Σε^{-1} − Σε^{-1} Dε ∆ε Dε′ Σε^{-1})Bε′, and
$$
\lim_{n \to \infty} \operatorname{Var}\begin{pmatrix} A_n^{-1/2} S_X \\ n^{-1/2} S_Y \end{pmatrix}
\equiv
\begin{pmatrix} A_\varepsilon & B_\varepsilon \\ B_\varepsilon' & \Sigma_\varepsilon \end{pmatrix}. \tag{12}
$$
Under the conditions of Theorem 3, the limiting covariance matrix of θ̂n matches
that of the optimal LSE based on the sample variogram of the underlying process ε(·)
(Lahiri et al., 2002), making this MELE of the variogram parameter θ asymptotically
as efficient under spatial regression. For the regression parameter, the MELE β̂n
can be asymptotically more efficient than the standard OLS estimator β̂n,OLS of β.
To see this, note that Aε represents the limiting variance of A_n^{1/2}(β̂n,OLS − β0) for the
OLS estimator and Aε − Qε is nonnegative definite from Theorem 3. Hence, the
simultaneous EL approach can improve upon OLS inference for β. In addition, the
chi-squared distribution of the log-EL ratio for (β, θ) can again be used for confidence
region estimation and profile versions for β or θ alone are also possible as in Theorem 2.
Remark 1: Regarding Theorem 3 assumptions, a growth condition on max ‖A_n^{-1/2}X(si)‖ is required for a central limit theorem with weighted sums like A_n^{-1/2}S_X (cf. Lahiri,
2003b) and the moment/mixing conditions match those in Lee and Lahiri (2002). The
probability assumption that Ln (β, θ) can be positively computed in a neighborhood
Un of (β0 , θ0 ) is similar to conditions in other EL regression contexts (cf. Owen, 1991;
Bravo, 2005).
Remark 2: The same EL construction is also valid when the regressors are stochastic. Namely, if {X(s), ε(s)} is strictly stationary and (11) holds, the conclusions of Theorem 3 remain valid under appropriate mixing/moment conditions on {X(s), ε(s)}.
Remark 3: When the random process Z(·) is strictly stationary, Theorem 1 and
Theorem 2 remain valid under Assumption S conditions.
6.1 An example
In this section we briefly illustrate the EL spatial regression method with a simulated
data set on a small 15 × 15 sampling region Rn . Real-valued data were simulated as
$$
Z(s) = \beta_0 + \beta_1 s_1 + \beta_2 s_2 + \varepsilon(s), \qquad s = (s_1, s_2) \in \mathbb{Z}^2,
$$
using a simple linear trend in the coordinates at each location s ∈ Rn ∩ Z2 with
(β0 , β1 , β2 ) = (1, 0.5, 0.75) and continuous errors ε(s) ≡ ε1 (s) + ε2 (s) consisting of
a mean-zero stationary Gaussian process ε1(·) perturbed by a collection ε2(·) of independently distributed chi-squared variables √0.3 (χ²_1 − 1). The resulting error
process has an isotropic Gaussian variogram
$$
2\gamma_\varepsilon(h; \theta) = 2\theta_0 + 2\theta_1\big[1 - \exp\{-3(\|h\|/\theta_2)^2\}\big], \qquad h \neq 0 \in \mathbb{R}^2,
$$
involving nugget, sill, and range parameters θ = (θ0 , θ1 , θ2 ) = (0.6, 1, 4), respectively.
For comparison, we describe two analysis approaches. One common practice in
geostatistical applications is to separately estimate the large scale structure (i.e.,
β0 , β1 , β2 ) by OLS regression, for example, followed by estimation of the variogram
(i.e., 2γε (·; θ)) based on the resulting residuals. This is simply a two-step procedure. Based on the OLS regression residuals, we applied the EL method described
in Section 4.2 to obtain 90% confidence bands for the variogram model, presented in
Figure 7. This two stage detrending analysis has the disadvantage of not accurately
reflecting the uncertainty in variogram fitting due to regression estimation, possibly
losing some regression precision and introducing bias into variogram estimation (see
Cressie, 1993). We next applied the EL procedure from Section 6 for simultaneous
analysis of regression and variogram scale structures and Figure 7 also presents the
confidence bands resulting from this approach. The lags for EL were chosen by the
empirical rule described in Section 5.3. For this example, the range is θ2 = 4 and
we used the matrix h given by h1 = (1, 0)′, h2 = (0, 1)′, h3 = (0, 2)′, h4 = (2, 0)′, h5 = (1, 1)′, h6 = (2, 2)′, h7 = (3, 3)′ and h8 = (4, 4)′ with a block size bn = 3.
Both estimation approaches produce comparable variogram point estimates but
the confidence bands are wider when accounting for uncertainty in the regression
parameters. The simultaneous analysis also provides confidence intervals for the
large-scale (regression) parameters and, in this example, 90% EL confidence intervals
for (β0 , β1 , β2 ) are (1.61,4.37), (0.15,0.55) and (0.50,0.81), respectively.
Figure 7: Two stage and simultaneous analysis EL confidence bands represented by dashed lines. True variogram used to generate these data is represented by the solid line. [Variogram versus distance; legend: True Variog, Bands_Detr, Bands_Sim.]

The width of the EL confidence bands increases with distance, which is not surprising, but does indicate that variogram values at shorter distances are more precisely
estimated than values at longer distances. This is pleasing from the viewpoint of
spatial prediction, since typical kriging predictions depend primarily on variogram values at shorter distances. But also note that the width of the confidence band does increase substantially before the sill is reached; the width of the confidence bands in Figure 7 at a distance of 4 units is about half again greater than at a distance
of 2 units. Such information could prove useful in applications for which a kriging
neighborhood is selected (e.g. Cressie, 1993, p.134, 158) although a full discussion is
beyond the scope of this article. A closer analysis of this figure reveals that the upper
EL confidence band limits are more extreme than the lower limits, which agrees with
percentile behavior in Figures 5 and 6. In this example, the width of the confidence
bands reflects the sample size. For larger samples, the bands become narrower and
more informative over all distances and simultaneous/detrended bands become closer.
7 Summary and concluding remarks
This article presents a new method for estimation of variogram model parameters for
intrinsically stationary processes using a blockwise spatial empirical likelihood (EL)
approach. The proposed method has the advantage that it does not require knowledge about the full joint distribution of the spatial data and involves no covariance
matrix estimation or inversion. This makes the EL method computationally more
attractive than generalized least squares (GLS) or parametric approaches. The internal studentization feature of the spatial EL method can be convenient in other ways.
When the variogram inference problem changes (e.g., inference on both spatial trend
and variogram parameters), the same essential construction of an EL function applies
and any new needs in spatial studentization are handled automatically within the
mechanics of the EL function, which can be calibrated for confidence regions. Hall and
La Scala (1990, p. 110) have noted similar EL properties with independent data in
comparing EL to the bootstrap. Under mild conditions, EL variogram estimators are
asymptotically normal and as efficient as the LSE-optimal GLS or subsampling least
squares (SLS) estimators. Numerical studies suggest that EL and SLS point estimators have comparable mean squared errors in large samples, but coverage probabilities
for EL confidence regions are typically closer to the nominal levels than for SLS regions. In terms of computational speed, both SLS and the spatial EL were similar in
our simulation studies and were slightly more demanding than weighted or ordinary
least squares. The spatial EL method also extends to spatial regression problems,
allowing simultaneous inference on both regression and variogram parameters.
Numerical studies also indicate that a Bartlett correction may improve the coverage probabilities of spatial EL confidence regions. The mechanics to rigorously
establish the Bartlett correction for the spatial setting are not yet fully developed
and should be addressed in future work along with data driven methods for spatial
block selection. Additionally, improvements in the performance of the EL method
should be possible through data tapering. For time series data, Paparoditis and
Politis (2001) have shown that tapered data blocks produce better block bootstrap
variance estimators. This notion can be applied to build a spatial EL function that
replaces spatial block sample averages with tapered spatial block averages. As with the block bootstrap, this should improve the variance estimation mechanism internal to the spatial EL and enhance the performance of the method in general.
Other possibilities for future research with the spatial EL method include model
testing. While we focused on estimation in this manuscript, the statistic ℓn(θ̂n) based on our EL estimator θ̂n ∈ Rp can also be used to test whether the variogram moment conditions (4) hold as a means of variogram model checking. If there are r > p lags used, then ℓn(θ̂n) will have an asymptotic chi-squared distribution with r − p degrees
of freedom under Theorem 1 to test if (4) holds for some parameter θ0 .
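In code, such a model check is a one-liner given the earlier hypothetical helpers:

```python
from scipy.stats import chi2

def variogram_model_pvalue(Ybar, Gamma, theta_hat, b, r, p, d=2):
    """Overidentification check: compare ell_n(theta_hat) to a chi-squared with r - p df."""
    stat = log_el(theta_hat, Ybar, Gamma, b, d)
    return 1.0 - chi2.cdf(stat, df=r - p)
```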
Finally, we comment on potential uses of EL confidence regions and bands to
assess the uncertainty associated with variogram estimation. This feature may be
useful in practice when one needs to detect changes in spatial structure between
different regions or different time intervals, since variogram confidence bands have
the potential to facilitate meaningful comparisons.
Connections exist between parameter estimation of variograms, large-scale regression models, and spatial prediction (Cressie 1993). Investigation of these connections,
particularly in actual applications, has been hampered by the inability to easily quantify uncertainty in variogram estimation. For example, it is well accepted that the
typical estimates of prediction error in applications of kriging are underestimates of
the true prediction error (e.g., Cressie 1993, pp. 111, 127) because uncertainty in variogram estimation is not taken into account. Although EL estimation of uncertainty does not lend itself easily to theoretical analysis, uncertainty in variogram estimation is easily computed with the spatial EL approach in specific applications. This holds promise for obtaining practical expressions for the total uncertainty in kriging predictions in applied problems.
References
Bravo, F. (2002) Testing linear restrictions in linear models with empirical likelihood. The Econometrics Journal Online, 5, 104-130.
Bravo, F. (2005) Blockwise empirical entropy tests for time series regressions. J.
Time Ser. Anal., 26, 185-210.
Chan, G. and Wood, A. T. A. (1997) An algorithm for simulating stationary Gaussian
random fields. Applied Statistics, 46, 171-181.
Chen, S. X. (1993) On the coverage accuracy of empirical likelihood regions for linear
regression models. Ann. Inst. Statist. Math. 45, 621-637.
Chen, S. X. and Cui, H.-J. (2006a) On Bartlett correction of empirical likelihood in
the presence of nuisance parameters. Biometrika, 93, 215-220.
Chen, S. X. and Cui, H.-J. (2006b) On the second order properties of empirical
likelihood with moment restrictions. Technical report. Dept. of Statistics. Iowa
State University.
Cressie, N. (1985) Fitting variogram models by weighted least squares. J. Int. Ass.
Math. Geol., 17, 693-702.
Cressie, N. (1993) Statistics for Spatial Data, 2nd Edition. John Wiley & Sons, New
York.
DiCiccio, T., Hall, P. and Romano, J. P. (1991). Empirical likelihood is Bartlett-correctable. Ann. Statist., 19, 1053-1061.
Doukhan, P. (1994) Mixing: properties and examples. Lecture Notes in Statistics
85. Springer-Verlag, New York.
Hall, P. and La Scala, B. (1990). Methodology and algorithms of empirical likelihood. Internat. Statist. Rev., 58, 109-127.
Kitamura, Y. (1997). Empirical likelihood methods with weakly dependent processes. Ann. Statist., 25, 2084-2102.
Kitamura, Y., Tripathi, G. and Ahn, H. (2004). Empirical likelihood-based inference in conditional moment restriction models. Econometrica, 72, 1667-1714.
Künsch, H. R. (1989). The jackknife and bootstrap for general stationary observations. Ann. Statist., 17, 1217-1261.
Lahiri, S. N. (2003a) Resampling Methods for Dependent Data. Springer, New York.
Lahiri, S. N. (2003b) Central limit theorems for weighted sums of a spatial process
under a class of stochastic and fixed designs. Sankhya: Series A, 65, 356-388.
Lahiri, S.N., Lee, Y. and Cressie, N. (2002) On asymptotic distribution and
asymptotic efficiency of least squares estimators of spatial variogram parameters. J. Statist. Planng Inf., 103, 65-85.
Lee, Y. D. and Lahiri, S. N. (2002) Least squares variogram fitting by spatial
subsampling. J. R. Stat. Soc. Ser. B, 64, 837-854.
Matheron, G. (1962) Traité de geostatistique appliquée, tome I. Mem. Bur. Rec.
Geol. Min., 14.
Monti, A. C. (1997). Empirical likelihood confidence regions in time series models.
Biometrika, 84, 395-405.
Newey, W. K. and Smith, R. J. (2004). Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica, 72, 219-255.
Nordman, D. J. and Lahiri, S. N. (2006). A frequency domain empirical likelihood for short- and long-range dependence. Ann. Statist., 34, 3019-3050.
Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75, 237-249.
Owen, A. B. (1990). Empirical likelihood confidence regions. Ann. Statist., 18, 90-120.
Owen, A. B. (1991). Empirical likelihood for linear models. Ann. Statist., 19, 1725-1747.
Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall, London.
Paparoditis, E. and Politis, D. N. (2001). Tapered block bootstrap. Biometrika, 88, 1105-1119.
Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based
on subsamples under minimal assumptions. Ann. Statist., 22, 2031-2050.
Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist., 22, 300-325.
Sherman, M. (1996) Variance estimation for statistics computed from spatial lattice
data. J. R. Stat. Soc. Ser. B, 58, 509-523.
Zhang, J. (2006). Empirical likelihood for NA series. Statist. Probab. Lett., 76, 153-160.
Zhang, X., van Eijkeren, J. and Heemink, A. (1995). On the weighted least-squares method for fitting a semivariogram model. Comput. Geosci., 21, 605-608.
Supplementary Material:
Point and interval estimation of variogram models
using spatial empirical likelihood
Daniel J. Nordman, Petruţa C. Caragea
Department of Statistics
Iowa State University
Ames, IA 50011
Supplementary material to follow consists of an Appendix providing proofs of the
main results as well as some additional figures (appearing on the last pages).
8 Appendix: Proofs of Main Results
Section 8.1 describes the assumptions used to establish the main distributional results on spatial EL variogram inference, and the proofs are presented in Section 8.2. To simplify the presentation, a very general blockwise EL argument and result are formulated in Section 8.3, to which we will refer; Theorem 4 there allows different EL scenarios to be treated in a unified manner.
8.1 Assumptions
Limits in order symbols are taken letting n → ∞ and, for two positive sequences, we
write sn ∼ tn if sn /tn → 1.
For a vector x = (x_1, ..., x_d)′ ∈ R^d, let ‖x‖ and ‖x‖_∞ = max_{1≤i≤d} |x_i| denote the Euclidean and l_∞ norms of x, respectively. Define the distance between two sets E_1, E_2 ⊂ R^d as dis(E_1, E_2) = inf{‖x − y‖_∞ : x ∈ E_1, y ∈ E_2}. Recall the process {Z(s) : s ∈ R^d} is assumed to be real-valued and intrinsically stationary. Let F(T) denote the σ-field generated by the random vectors {Z(s) : s ∈ T}, T ⊂ R^d. For T_1, T_2 ⊂ R^d, write α̃(T_1, T_2) = sup{|P(A ∩ B) − P(A)P(B)| : A ∈ F(T_1), B ∈ F(T_2)} and define the strong mixing coefficient for the process Z(·) as

α(v, w) = sup{α̃(T_1, T_2) : T_i ⊂ R^d, |T_i| ≤ w, i = 1, 2; dis(T_1, T_2) ≥ v},   v, w > 0.   (13)
Spatial observations Z(·) are observed at sites on the integer lattice Z^d within a spatial sampling region R_n = λ_n R_0 ⊂ R^d. Recall the EL method involves block sample means Ȳ_i = Σ_{s ∈ B_{b_n}(i) ∩ Z^d} Y(s)/b_n^d of distances Y(·) ∈ R^r as in (3) (i.e., formed by lags {h_i}_{i=1}^r) over a collection of blocks {B_{b_n}(i) : i ∈ I_n}; each block contains b_n^d observations of Y(·) and there are N_I = |I_n| blocks. In the following, let θ_0 ∈ Θ ⊂ R^p denote the unique parameter value satisfying the moment condition (4) and define normalized block means Ỹ_i = b_n^{d/2}(Ȳ_i − EȲ_i), i ∈ I_n. We make the following assumptions, which are similar to those used by Lee and Lahiri (2002).

A1. N_I^{-1/2} Σ_{i ∈ I_n} (Ȳ_i − EȲ_i) →_d Z, for a normal vector Z ∼ N{0_r, Σ(θ_0)} with positive definite Σ(θ_0).

A2. ‖N_I^{-1} Σ_{i ∈ I_n} E Ỹ_i Ỹ_i′ − Σ(θ_0)‖ = o(1); max{E‖Ỹ_i‖^{4+δ_0} : i ∈ I_n} = O(1) for some δ_0 > 0; and |N_I^{-1} Σ_{i ∈ I_n} P(Ỹ_i ≤ y) − P(Z ≤ y)| = o(1) for each y ∈ R^r.

A3. There exist τ_1, τ_2 > 0 with τ_1 ≥ dτ_2 such that α(v, w) ≤ C v^{-τ_1} w^{τ_2} for all v, w ≥ 1.

A4. For any ε > 0, there exists δ_ε > 0 with inf{‖Γ(θ) − Γ(θ_0)‖ : ‖θ − θ_0‖ ≥ ε, θ ∈ Θ} > δ_ε.

A5. In a neighborhood of θ_0, Γ(θ) is continuously differentiable and D(θ_0) ≡ ∂Γ(θ_0)/∂θ has full column rank p.

A6. As n → ∞, b_n^{-1} + b_n^2/λ_n = o(1) and, for any positive real sequence a_n → 0, the number of cubes of a_n Z^d which intersect both the closures of R_0 and R^d \ R_0 is O(a_n^{-(d-1)}).
Assumption A1 implies that an unbiased block-based estimator Σ_{i ∈ I_n} Ȳ_i/N_I of the variogram 2γ(h) at lags h ∈ {h_1, ..., h_r} is asymptotically normal. This blockwise estimator involves a weighted average of vector Y(·) observations and so represents a variation on Matheron's (1962) method-of-moments estimator (1); hence, A1 essentially implies that the sample variogram has a normal limit. Assumptions A2 and A3 closely match the mixing/moment assumptions of Lee and Lahiri (2002). The probability condition in A2 is used only to ensure that the EL ratio (6) is positive at θ_0. Assumptions A4 and A5 are smoothness and identifiability conditions on the variogram model, and may be checked in practice. Assumption A6 entails a spatial generalization of the EL block scaling conditions used for time series (d = 1) by Kitamura (1997). The condition on the template R_0 implies that the total number of Z(·)-sampling sites in R_n = λ_n R_0 (located at Z^d ∩ R_n) is of larger order O(λ_n^d) than the number O(λ_n^{d-1}) of sites near the boundary of R_n, allowing us to avoid pathological regions in the same manner as with the spatial subsampling of Lee and Lahiri (2002). The R_0-boundary condition also implies that the spatial sample size n = |R_n ∩ Z^d| ∼ vol(R_n) and the number of size-b_n^d EL blocks N_I = |I_n| ∼ vol(R_n), where vol(·) denotes volume and vol(R_n) = λ_n^d vol(R_0); see Lahiri (2003a, Chapter 12.2) for details. We will use the asymptotic equivalence of n, N_I, and vol(R_n) in the arguments to follow.
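For concreteness, the following minimal sketch (our own, for d = 2 only, with nonnegative integer lags and disjoint blocks chosen for simplicity) forms block means Ȳ_i from gridded observations; it assumes that Y(s) collects the squared increments (Z(s + h) − Z(s))² at the chosen lags, in line with the method-of-moments variogram discussed above.

    import numpy as np

    def block_means(Z, lags, b):
        # Z    : 2-d array of observations Z(s) on a regular grid.
        # lags : list of nonnegative integer lag vectors, e.g. [(1, 0), (0, 1), (1, 1)].
        # b    : block side length b_n, so each block holds b^2 vectors Y(s).
        n1, n2 = Z.shape
        hmax = max(max(h) for h in lags)
        m1, m2 = n1 - hmax, n2 - hmax           # sites at which every lag is available
        Y = np.stack([(Z[h[0]:h[0] + m1, h[1]:h[1] + m2] - Z[:m1, :m2]) ** 2
                      for h in lags], axis=-1)  # array of Y(s), shape (m1, m2, r)
        means = []
        for i in range(0, m1 - b + 1, b):       # disjoint b x b blocks
            for j in range(0, m2 - b + 1, b):
                means.append(Y[i:i + b, j:j + b].mean(axis=(0, 1)))
        return np.asarray(means)                # rows are the block means Ybar_i

The rows returned here play the role of the Ȳ_i in A1 and A2; the block collections allowed by the theory need not be disjoint, so this is only an illustration.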
8.2 Proofs of the Main Results
To prove Theorems 1 and 2, we apply the general blockwise result in Theorem 4 (Section 8.3) with Lemma 1 below, recalling the EL function Ln (·) from (6) for variogram
inference.
Lemma 1. Let δ ≡ 4^{-1} δ_0/(4 + δ_0) from A2 and Ỹ_i = b_n^{d/2}{Ȳ_i − Γ(θ_0)}, i ∈ I_n. Under the moment condition (4) and A1 through A6: (i) Σ̂_{θ_0} ≡ N_I^{-1} Σ_{i ∈ I_n} Ỹ_i Ỹ_i′ →_p Σ(θ_0); (ii) max_{i ∈ I_n} ‖Ỹ_i‖ = o_p(b_n^{-d/2} N_I^{1/2−δ}); and (iii) P(L_n(θ) > 0 for θ ∈ Θ_n) → 1 for Θ_n = {θ ∈ Θ : ‖θ − θ_0‖ ≤ N_I^{-1/2+δ}}.
Proof. Note that under (4), EȲ_i = Γ(θ_0). With A2/A3, part (i) follows from ‖E Σ̂_{θ_0} − Σ(θ_0)‖ = o(1) and, by Lemma 1 of Lee and Lahiri (2002), Var(Σ̂_{θ_0}) = o(1). Part (ii) follows from E(max_{i ∈ I_n} ‖Ỹ_i‖) ≤ (Σ_{i ∈ I_n} E‖Ỹ_i‖^{4+δ_0})^{1/(4+δ_0)} ≤ C N_I^{1/(4+δ_0)} by A2 and b_n^{d/2}/N_I^{1/4} = o(1) by A6. For part (iii), define an empirical distribution F_n(s, θ) ≡ N_I^{-1} Σ_{i ∈ I_n} I(b_n^{d/2} s′{Ȳ_i − Γ(θ)} < 0), s ∈ R^r, θ ∈ Θ_n, where I(·) denotes the indicator function, and let Z denote a N{0_r, Σ(θ_0)} vector. Using A2/A3 as well as sup_{θ ∈ Θ_n} b_n^{d/2} ‖Γ(θ) − Γ(θ_0)‖ = o(1) by A5/A6, it may be shown that sup_{θ ∈ Θ_n} sup_{s ∈ R^r, ‖s‖=1} |F_n(s, θ) − P(s′Z < 0)| →_p 0. (The main step is to show that, for each fixed y ∈ R^r, F̃_n(y) ≡ N_I^{-1} Σ_{i ∈ I_n} I(b_n^{d/2}{Ȳ_i − Γ(θ_0)} ≤ y) →_p P(Z ≤ y) holds, using |E F̃_n(y) − P(Z ≤ y)| = o(1) by the probability condition in A2 along with Var(F̃_n(y)) = o(1) by subsampling arguments based on mixing as in Politis and Romano (1994).) By this and the fact that inf_{s ∈ R^r, ‖s‖=1} P(s′Z < 0) > C holds for some C > 0 because Σ(θ_0) is positive definite (Lemma 2, Owen, 1990), we have

P( inf_{θ ∈ Θ_n} inf_{s ∈ R^r, ‖s‖=1} F_n(s, θ) > C ) → 1.   (14)

Fix θ ∈ Θ_n. Then L_n(θ) > 0 holds if 0_r ∈ R^r is interior to the convex hull of {Ȳ_i − Γ(θ) : i ∈ I_n}; see Section 3.3. If 0_r is not interior, there exists s ∈ R^r, ‖s‖ = 1, such that s′{Ȳ_i − Γ(θ)} ≥ 0 for all i ∈ I_n by the supporting/separating hyperplane theorem, implying F_n(s, θ) = 0. Hence P(L_n(θ) > 0 for θ ∈ Θ_n) must be at least as great as the probability in (14), proving Lemma 1(iii).

Proof of Theorems 1–2. We apply the general argument from Section 8.3 with
ϑ ≡ θ, Θ̃ ≡ Θ, M_{i,θ} ≡ N_I^{-1/2}{Ȳ_i − Γ(θ)}, C_n ≡ N_I^{1/2} I_p, Σ ≡ Σ(θ_0), D ≡ −D(θ_0), and Θ̃_n ≡ Θ_n defined with δ ≡ 4^{-1} δ_0/(4 + δ_0) from Lemma 1. In this set-up, we verify that the conditions of Theorem 4 hold. Note that M_{i,θ_0} − M_{i,θ} = N_I^{-1/2}{Γ(θ) − Γ(θ_0)} and ∂M_{i,θ}/∂θ = −N_I^{-1/2} ∂Γ(θ)/∂θ. From this, Condition B1 follows from A1 and (4); B2 from Lemma 1(i); B3 from A5; B4 from Lemma 1(iii); B5 from A1/A5; B6 from Lemma 1(ii) and A6; and B7/B8 from A5. Now assuming the additional condition in Theorem 4(v) holds, Theorem 1 and Theorem 2(i) follow directly from Theorem 4; Theorem 2(ii) holds as well by modifying arguments in Qin and Lawless (1994, Corollary 5).

To verify the condition in Theorem 4(v), we define an EL ratio L̃_n(µ) for the Z(·)-process variogram parameter µ ≡ {2γ(h_1), ..., 2γ(h_r)}′ ∈ R^r by replacing “Γ(θ)” with “µ” in (6). Then, letting µ_θ ≡ Γ(θ), θ ∈ Θ, we have L̃_n(µ_θ) = L_n(θ). Note that µ̂_n ≡ N_I^{-1} Σ_{i ∈ I_n} Ȳ_i is the maximizer of L̃_n(µ), with L̃_n(µ̂_n) = 1. Fix α ∈ (0, 1). The µ-confidence set I_{n,α} ≡ {µ ∈ R^r : L̃_n(µ) ≥ exp(−b_n^d χ²_{r;α}/2)} is convex in R^r (Theorem 2.2, Hall and La Scala, 1990) and hence connected. By Theorem 4(i), ℓ_n(θ_0) = −2 b_n^{-d} log L̃_n(µ_{θ_0}) →_d χ²_r, and it can generally be shown that −2 b_n^{-d} log L̃_n(µ_{θ_0} + N_I^{-1/2} Σ^{1/2} s) →_d χ²_r(‖s‖²) holds for s ∈ R^r as in Owen (1990, Corollary 1), where ‖s‖² denotes a non-centrality parameter (this follows from (16) here). By this and the fact that µ̂_n ∈ I_{n,α} with N_I^{1/2}(µ̂_n − µ_{θ_0}) →_d N(0_r, Σ) by A1, it must hold that sup_{µ ∈ I_{n,α}} N_I^{1/2} ‖µ − µ_{θ_0}‖ = O_p(1) by the connectedness of I_{n,α}. (The idea is that, for large C > 0, a ball of radius N_I^{-1/2} C around µ_{θ_0} will contain µ̂_n with arbitrarily high probability, while points on this ball’s boundary, say Bd(N_I^{-1/2} C), will belong to I_{n,α} with arbitrarily low probability; if sup_{µ ∈ I_{n,α}} N_I^{1/2} ‖µ − µ_{θ_0}‖ > C > N_I^{1/2} ‖µ̂_n − µ_{θ_0}‖, then connectedness of I_{n,α} implies that I_{n,α} ∩ Bd(N_I^{-1/2} C) is non-empty (else I_{n,α} could be divided among two open sets), but this event has low probability when C is large.) It then follows that sup{N_I^{1/2} ‖θ − θ_0‖ : θ ∈ Θ, µ_θ ∈ I_{n,α}} = O_p(1) by A4/A5, so that the Theorem 4(v) condition follows.
Proof of Theorem 3. We sketch the proof, applying Theorem 4 as follows. Let I_m denote the m × m identity matrix, m ≥ 1. With notation from Section 6, we define terms to be used in the argument from Section 8.3: ϑ ≡ (β, θ), Θ̃ ≡ R^q × Θ ⊂ R^{q+p}; the (q + r) × 1 vector M_{i,ϑ} ≡ C_n^{-1}{W̄_i − (0_q, Γ(θ))}; Σ ≡ the (q + r) × (q + r) matrix in (12); Θ̃_n ≡ U_n and δ ≡ κ; and the (q + r) × (q + r) and (q + r) × (q + p) block-diagonal matrices

C_n ≡ diag(A_n^{1/2}, N_I^{1/2} I_r),   D ≡ −diag(I_q, D(θ_0)).

In this framework, we may verify the conditions of Theorem 4 under the Theorem 3 assumptions. Condition B1 holds by assumption. Under the mixing/moment assumptions and (12), B2 follows from Lahiri (2003b, Theorem 4.3), while B3 follows from checking ‖E Σ̂_{ϑ_0} − Σ‖ = o(1) here and showing Var(Σ̂_{ϑ_0}) = o(1) with straightforward modifications to Lee and Lahiri (2002, Lemma 1). Conditions B4 through B8 may be checked using standard moment bounds from Doukhan (1994, Theorem 1.2.3) for weighted sums of random variables.

Now Theorem 3 will follow from Theorem 4 upon verifying the condition in Theorem 4(v) for ϑ = (β, θ). With “µ, µ_θ” defined as in the proof of Theorem 1 above, define L̃_n(β, µ) by replacing “Γ(θ)” with “µ” in the definition of L_n(ϑ) ≡ L_n(β, θ), so that L̃_n(β, µ_θ) = L_n(ϑ). Fix α ∈ (0, 1). It may be shown that I_{n,α} ≡ {(β, µ) ∈ R^{q+r} : L̃_n(β, µ) ≥ exp(−b_n^d χ²_{q+r;α}/2)} is connected in R^{q+r}; namely, {β ∈ R^q : sup_{µ ∈ R^r} L̃_n(β, µ) ≥ exp(−b_n^d χ²_{q+r;α}/2)} is connected while, for fixed β, {µ ∈ R^r : (β, µ) ∈ I_{n,α}} is convex. Then, by Theorem 4(i) and arguments as in Owen (1990, Corollary 1), we have sup_{(β,µ) ∈ I_{n,α}} ‖C_n{(β, µ) − (β_0, µ_{θ_0})}‖ = O_p(1) by the connectedness of I_{n,α}. Consequently, sup{‖C_n(ϑ − ϑ_0)‖ : ϑ = (β, θ) ∈ R^q × Θ, (β, µ_θ) ∈ I_{n,α}} = O_p(1) by A4/A5, implying that the Theorem 4(v) condition holds.
8.3 A General Blockwise EL Argument
Suppose M_{i,ϑ} : I_n × Θ̃ → R^r represents an estimating function defined on B_{b_n}(i), i ∈ I_n, ϑ ∈ Θ̃ ⊂ R^p, with corresponding EL function

L_n(ϑ) = N_I^{N_I} sup{ Π_{i ∈ I_n} p_i : p_i ≥ 0, Σ_{i ∈ I_n} p_i = 1, Σ_{i ∈ I_n} p_i M_{i,ϑ} = 0_r }

and ℓ_n(ϑ) = −2 b_n^{-d} log L_n(ϑ). For some δ ∈ (0, 1/2), ϑ_0 ∈ Θ̃, and an invertible p × p scaling matrix C_n, define Θ̃_n = {ϑ ∈ Θ̃ : ‖C_n(ϑ − ϑ_0)‖ ≤ N_I^δ}. For ϑ ∈ Θ̃_n, let M_ϑ = Σ_{i ∈ I_n} M_{i,ϑ}, Σ̂_ϑ = b_n^d Σ_{i ∈ I_n} M_{i,ϑ} M_{i,ϑ}′, Ω_ϑ = max{1, ‖C_n(ϑ − ϑ_0)‖}, and suppose ∂M_{i,ϑ}/∂ϑ is continuous on Θ̃_n, i ∈ I_n. Define two functions of (ϑ, t) on Θ̃_n × R^r as

Q_{1n}(ϑ, t) = Σ_{i ∈ I_n} M_{i,ϑ}/(1 + t′M_{i,ϑ}),   Q_{2n}(ϑ, t) = b_n^{-d} Σ_{i ∈ I_n} (∂M_{i,ϑ}/∂ϑ)′ t/(1 + t′M_{i,ϑ}).   (15)

Define ϑ̂*_n ≡ arg max_{ϑ ∈ Θ̃_n} L_n(ϑ) and the global EL maximizer ϑ̂_n ≡ arg max_{ϑ ∈ Θ̃} L_n(ϑ). Let χ²_v denote a chi-squared random variable with v degrees of freedom and lower α quantile χ²_{v;α}.
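For later reference, we also record the routine Lagrange-multiplier form of the inner maximization in L_n(ϑ) (a standard EL calculation, cf. Owen, 2001, and Qin and Lawless, 1994; it is stated here only for the reader's convenience): when 0_r lies in the interior of the convex hull of {M_{i,ϑ} : i ∈ I_n}, the maximizing weights are

p_i = 1/[N_I{1 + t_ϑ′M_{i,ϑ}}],   with t_ϑ ∈ R^r solving Σ_{i ∈ I_n} M_{i,ϑ}/(1 + t_ϑ′M_{i,ϑ}) = 0_r,

so that L_n(ϑ) = Π_{i ∈ I_n} {1 + t_ϑ′M_{i,ϑ}}^{-1} and ℓ_n(ϑ) = 2 b_n^{-d} Σ_{i ∈ I_n} log{1 + t_ϑ′M_{i,ϑ}}; the defining equation for t_ϑ is exactly Q_{1n}(ϑ, t_ϑ) = 0_r.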
Theorem 4. Suppose (B1) M_{ϑ_0} →_d N(0_r, Σ) with r × r positive definite matrix Σ; (B2) Σ̂_{ϑ_0} →_p Σ; (B3) sup_{ϑ ∈ Θ̃_n} Σ_{i ∈ I_n} ‖M_{i,ϑ_0} − M_{i,ϑ}‖ = O_p(N_I^δ); (B4) P(L_n(ϑ) > 0 on ϑ ∈ Θ̃_n) → 1; (B5) sup_{ϑ ∈ Θ̃_n} ‖M_ϑ‖/Ω_ϑ = O_p(1); (B6) sup_{ϑ ∈ Θ̃_n} max_{i ∈ I_n} b_n^d N_I^δ ‖M_{i,ϑ}‖ = o_p(1); (B7) for an r × p matrix D of rank p, sup_{ϑ ∈ Θ̃_n} ‖D − Σ_{i ∈ I_n} (∂M_{i,ϑ}/∂ϑ) C_n^{-1}‖ = o_p(1); and (B8) sup_{ϑ ∈ Θ̃_n} Σ_{i ∈ I_n} ‖(∂M_{i,ϑ}/∂ϑ) C_n^{-1}‖ = O_p(1).

Then, (i) P(ϑ̂*_n exists) → 1; (ii) ℓ_n(ϑ_0) →_d χ²_r; (iii) C_n(ϑ̂*_n − ϑ_0) →_d N{0_p, (D′Σ^{-1}D)^{-1}}; (iv) ℓ_n(ϑ_0) − ℓ_n(ϑ̂*_n) →_d χ²_p; and (v) if, in addition, sup{‖C_n(ϑ − ϑ_0)‖ : ϑ ∈ Θ̃, ℓ_n(ϑ) ≤ χ²_{r;α}} = o_p(N_I^δ) for some fixed α ∈ (0, 1), then P(ϑ̂*_n = ϑ̂_n) → 1.
Proof. By B4, we may write L_n(ϑ) = Π_{i ∈ I_n} (1 + γ_{i,ϑ})^{-1} > 0 for ϑ ∈ Θ̃_n, where γ_{i,ϑ} = t_ϑ′M_{i,ϑ} satisfies 1 + γ_{i,ϑ} > 0 and Q_{1n}(ϑ, t_ϑ) = 0_r (see Section 3.3 for details). Note that B3/B6 imply sup_{ϑ ∈ Θ̃_n} ‖Σ̂_ϑ − Σ̂_{ϑ_0}‖ = o_p(1), so that sup_{ϑ ∈ Θ̃_n} ‖Σ̂_ϑ − Σ‖ = o_p(1) by B2. The resulting positive definiteness of Σ̂_ϑ and Q_{1n}(ϑ, t_ϑ) = 0_r entail that t_ϑ is a continuously differentiable function of ϑ on Θ̃_n by the implicit function theorem, and ℓ_n(ϑ) is as well; see Qin and Lawless (1994, p. 304). Hence, ℓ_n(ϑ) attains a minimum (or L_n(ϑ) a maximum) on Θ̃_n, establishing Theorem 4(i).

Following Owen (1990, p. 101), write t_ϑ = ‖t_ϑ‖ u_ϑ ∈ R^r, ‖u_ϑ‖ = 1, for ϑ ∈ Θ̃_n and expand 0 = u_ϑ′ Q_{1n}(ϑ, t_ϑ) = u_ϑ′ M_ϑ − Σ_{i ∈ I_n} u_ϑ′ M_{i,ϑ} M_{i,ϑ}′ t_ϑ/{1 + t_ϑ′M_{i,ϑ}} to find

0 ≥ (‖t_ϑ‖/(b_n^d Ω_ϑ)) · u_ϑ′ Σ̂_ϑ u_ϑ/(1 + Z_n) − ‖M_ϑ‖/Ω_ϑ,   ϑ ∈ Θ̃_n,

where Z_n ≡ sup_{ϑ ∈ Θ̃_n} max_{i ∈ I_n} ‖t_ϑ‖ ‖M_{i,ϑ}‖. Using sup_{ϑ ∈ Θ̃_n} ‖Σ̂_ϑ − Σ‖ = o_p(1) with B5/B6, this inequality yields sup_{ϑ ∈ Θ̃_n} ‖t_ϑ‖/Ω_ϑ = O_p(b_n^d) and Z_n = o_p(1). For any ϑ ∈ Θ̃_n, we algebraically solve 0_r = Q_{1n}(ϑ, t_ϑ) for t_ϑ = b_n^d Σ̂_ϑ^{-1} M_ϑ + φ_ϑ, where ‖φ_ϑ‖ ≤ Z_n ‖t_ϑ‖ ‖Σ̂_ϑ^{-1}‖ ‖Σ̂_ϑ‖/(1 − Z_n), so that sup_{ϑ ∈ Θ̃_n} ‖φ_ϑ‖/Ω_ϑ = o_p(b_n^d). Applying Taylor's expansion gives log(1 + γ_{i,ϑ}) = γ_{i,ϑ} − γ_{i,ϑ}²/2 + Δ_{i,ϑ} for each i ∈ I_n, so that

b_n^d ℓ_n(ϑ) = b_n^d M_ϑ′ Σ̂_ϑ^{-1} M_ϑ − b_n^{-d} φ_ϑ′ Σ̂_ϑ φ_ϑ + 2 Σ_{i ∈ I_n} Δ_{i,ϑ},

with b_n^{-d} Σ_{i ∈ I_n} |Δ_{i,ϑ}| ≤ b_n^{-2d} ‖t_ϑ‖² Z_n ‖Σ̂_ϑ‖/(1 − Z_n)³; this and the bounds on ‖t_ϑ‖, ‖φ_ϑ‖ yield

sup_{ϑ ∈ Θ̃_n} |ℓ_n(ϑ) − M_ϑ′ Σ̂_ϑ^{-1} M_ϑ|/Ω_ϑ² = o_p(1).   (16)

For ϑ = ϑ_0, we have Ω_{ϑ_0} = 1, so that (16) and B1/B2 yield Theorem 4(ii).
By (16), sup_{ϑ ∈ Θ̃_n} ‖Σ̂_ϑ − Σ‖ = o_p(1) and B5/B7, a Taylor expansion gives

sup_{ϑ ∈ Θ̃_n} |ℓ_n(ϑ) − W_ϑ′ Σ^{-1} W_ϑ|/Ω_ϑ² = o_p(1),   (17)

where W_ϑ = M_{ϑ_0} + D C_n(ϑ − ϑ_0). By (17), ℓ_n(ϑ) ≥ σ N_I^{2δ}/2 holds uniformly for ϑ ∈ Bd(Θ̃_n) ≡ {ϑ ∈ Θ̃_n : ‖C_n(ϑ − ϑ_0)‖ = N_I^δ} when n is large, where σ denotes the smallest eigenvalue of D′Σ^{-1}D, while ℓ_n(ϑ_0) = O_p(1) holds by Theorem 4(ii). Hence, by the differentiability of ℓ_n(ϑ), this function's minimum ϑ̂*_n on Θ̃_n must lie in Θ̃_n \ Bd(Θ̃_n) and satisfy 0_r = Q_{1n}(ϑ̂*_n, t_{ϑ̂*_n}) and 0_p = ∂ℓ_n(ϑ̂*_n)/∂ϑ = 2 Q_{2n}(ϑ̂*_n, t_{ϑ̂*_n}).

As before, from 0_r = Q_{1n}(ϑ̂*_n, t_{ϑ̂*_n}) we deduce b_n^{-d} t_{ϑ̂*_n} = Σ^{-1} W_{ϑ̂*_n} + o_p(δ_n) for δ_n = ‖t_{ϑ̂*_n}/b_n^d‖ + ‖C_n(ϑ̂*_n − ϑ_0)‖ using B5/B7, while 0_p = (C_n′)^{-1} Q_{2n}(ϑ̂*_n, t_{ϑ̂*_n}) implies 0_p = D′ b_n^{-d} t_{ϑ̂*_n} (1 + o_p(1)) by Z_n = o_p(1) and B8. In matrix form, we may write

( t_{ϑ̂*_n}/b_n^d    )     ( Σ    −D )^{-1} ( M_{ϑ_0} + o_p(δ_n) )
( C_n(ϑ̂*_n − ϑ_0) )  =  ( −D′   0  )       ( o_p(δ_n)           ).

By B1, δ_n = O_p(1) then follows from M_{ϑ_0} = O_p(1), and we also have

C_n(ϑ̂*_n − ϑ_0) = (D′Σ^{-1}D)^{-1} D′Σ^{-1} M_{ϑ_0} + o_p(1) →_d N{0_p, (D′Σ^{-1}D)^{-1}},   (18)

establishing Theorem 4(iii). To prove Theorem 4(iv), it follows from (17), (18), and B1 that ℓ_n(ϑ_0) − ℓ_n(ϑ̂*_n) = (Σ^{-1/2} M_{ϑ_0})′ P_{Σ^{-1/2}D} (Σ^{-1/2} M_{ϑ_0}) + o_p(1) →_d χ²_p, where P_{Σ^{-1/2}D} denotes the orthogonal projection matrix onto the column space of Σ^{-1/2}D, which has rank p.

To establish Theorem 4(v), recall ℓ_n(ϑ) = −2 b_n^{-d} log L_n(ϑ), ϑ ∈ Θ̃, with ℓ_n(ϑ) ≡ ∞ if it is not true that L_n(ϑ) > 0. For large n, two events hold with probability arbitrarily close to 1: “a maximum ϑ̂*_n of L_n(ϑ) on Θ̃_n exists” by Theorem 4(i) and “{ϑ ∈ Θ̃ : L_n(ϑ) ≥ exp(−b_n^d χ²_{r;α}/2)} ⊂ Θ̃_n” by the condition in Theorem 4(v). These two events together imply ϑ̂*_n = ϑ̂_n.
Appendix Figure 1: Coverage probabilities for 90% confidence regions (EL and SLS
methods) for (θ1 , θ2 ) plotted against block sizes for a 30 × 30 sampling region, where
θ1 = 0.5 (based on 5000 simulations).
[Three panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8. Each panel plots coverage probabilities against block size for the EL and SLS methods.]
Appendix Figure 2: Relative Root Mean Squared Error (%) for nugget θ1 = 0.5 estimation on a 50 × 50 region over several values of range θ2 (based on 3000 simulations).
[Three panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8. Each panel plots relative root MSE (%) against block size for the EL, SLS, WLS, and OLS estimators.]
Appendix Figure 3: Relative Root Mean Squared Error (%) for the range θ2 parameter estimation on a 30 × 30 region with nugget θ1 = 0.5 (based on 5000 simulations).
[Three panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8. Each panel plots relative root MSE (%) against block size for the EL, SLS, WLS, and OLS estimators.]
Appendix Figure 4: Relative Root Mean Squared Error (%) for nugget θ1 = 0.5 estimation on a 30 × 30 region over several values of range θ2 (based on 5000 simulations).
[Three panels: (a) θ2 = 1, (b) θ2 = 4, (c) θ2 = 8. Each panel plots relative root MSE (%) against block size for the EL, SLS, WLS, and OLS estimators.]
Appendix Figure 5: Relative Root Mean Squared Error (%) for nugget θ1 = 0.5 and
range θ2 = 1 estimation on a 10 × 10 sampling region (based on 10000 simulations).
[Two panels: (a) Nugget and (b) Range. Each panel plots relative root MSE (%) against block size for the EL, SLS, WLS, and OLS estimators.]
Appendix Figure 6: Percentiles (10th, 50th and 90th) for (scaled) EL estimates of
the nugget on a (a) 50 × 50 sampling region and (b) 20 × 20 region, against block
size; a horizontal line running fully between graph margins indicates 1.
[Two panels: (a) 50 × 50 region and (b) 20 × 20 region. Each panel plots the 10th, 50th, and 90th percentiles of the scaled EL nugget estimates against block size for several range values.]