Non-Gaussian Spatiotemporal Modelling through Scale Mixing

Thaís C.O. da Fonseca and Mark F.J. Steel∗
Department of Statistics, University of Warwick, U.K.
Abstract
The aim of this work is to construct non-Gaussian and nonseparable covariance functions
for processes that vary continuously in space and time. Stochastic modelling of phenomena
over space and time is important in many areas of application. But choice of an appropriate
model can be difficult as one must take care to use valid covariance structures. We start from a
general and flexible way of constructing valid nonseparable covariance functions derived through
mixing over separable Gaussian covariance functions. We then generalize the resulting models by
allowing for individual outliers as well as regions with larger variances. We induce this through
scale mixing with separate positive-valued processes. Smooth mixing processes are applied to
the underlying correlated Gaussian processes in space and in time, thus leading to regions in
space and time of increased spread. We also apply a separate uncorrelated mixing process to
the nugget effect to generate individual outliers. We consider posterior and predictive Bayesian
inference with these models. We implement this through a Markov chain Monte Carlo sampler
and apply our modelling approach to temperature data in the Basque country.
Key words: Bayesian Inference; Flexible tails; Mixtures; Nonseparability; Outliers; Temperature data.
1 Introduction
The motivation of this work is to develop and study non-Gaussian models for processes that
vary continuously in space and time. This is a problem of interest in many fields of science such
as geology, hydrology and meteorology. Consider the problem of modelling a phenomenon of
interest over space and time as a random process
{Z(s, t); (s, t) ∈ D × T },    (1)

∗ Thaís Fonseca acknowledges financial support from the Center for Research in Statistical Methodology (CRiSM) and we thank Blanca Palacios for providing us with the temperature data.
CRiSM Paper No. 09-33, www.warwick.ac.uk/go/crism
where (s, t) ∈ D × T , D ⊆ ℝ^d , T ⊆ ℝ are space-time coordinates that vary continuously
in D × T . We usually observe a realization of this process at locations si , i = 1, . . . , I and
time points tj , j = 1, . . . , J. The usual assumption for the finite dimensional distributions
implied by this process is that, for these spatiotemporal coordinates, the random vector Z =
(Z(s1 , t1 ), . . . , Z(sI , t1 ), . . . , Z(s1 , tJ ), . . . , Z(sI , tJ ))′ has a multivariate normal distribution with covariance matrix Σ with elements Σkk′ = Cov(Zk , Zk′ ), k, k′ = 1, 2, . . . , N = IJ. This class
is mathematically very convenient, but Gaussianity is a restrictive assumption and the data may well present non-Gaussian characteristics. For instance, if there are aberrant observations in the data set it would be useful to consider heavy-tailed distributions in order to accommodate these observations. In recent years several models describing departures from Gaussianity for spatial processes have been presented in the literature. Cressie and Hawkins (1980) discuss robust
estimation of the variogram when the distribution has heavier tails than the normal in spatial
models. De Oliveira et al. (1997) used nonlinear transformations of random fields in order
to accommodate moderate departures from Gaussianity. For instance, their proposal includes
the Gaussian and the lognormal models as sampling distributions. Palacios and Steel (2006)
proposed a geostatistical model that accommodates non-Gaussian tail behaviour in space. Their
proposal has the Gaussian model as a limiting case. The proposed class of processes is based
on scale mixing a Gaussian process which allows for modelling regions with larger observational
variance. Here we consider similar mixing in the nugget effect component allowing for individual
outliers. In addition, the ideas in Palacios and Steel (2006) are here extended to processes in
space and time, while avoiding the restrictive assumption of separability between space and
time.
In the context of larger observational variance, Damian et al. (2001) considered temporally independent samples from a spatiotemporal process Z(s, t) = √ν(s) Y1 (s) + Y2 (s, t), where
Y1 (s) is a spatial process and Y2 (s, t) accounts for the nugget effect. Their model addresses
the problem of anisotropy through deformation of the spatial coordinates and uses Bayesian
semi-parametric modelling of the deformation function. The general model potentially also
accounts for different variances in space but they adopted the simplifying assumption of constant
variances ν(s) = ν, ∀s. Damian et al. (2003) consider the complete model that incorporates
spatial heterogeneity by modelling ν(s) as latent variables with a log-Gaussian distribution.
To deal with heterogeneity in time, Stein (2009) proposed a model that can account for
occasional bursts of increased variability in time. This was done by considering the transformed
spatiotemporal process divided by a function of time which was estimated by computing the
sample standard deviation at each time point and then smoothed by cubic splines. Notice that
this approach does not allow for predictions in time.
The model we propose here is able to capture heterogeneous variability both in time and
space, as well as outliers in space through a mixed nugget effect. In addition, the covariance
structure is nonseparable between space and time. We present an application to maximum
temperature data in the Spanish Basque Country, in which the model massively outperforms
Gaussian modelling, both in terms of within-sample data support and out-of-sample predictive
fit. Moreover, the model easily allows for prediction in space and in time, since we can also
predict the mixing processes. We use a Bayesian inferential framework with mildly informative
priors. The flexibility of the proposed model does not substantially complicate posterior and
predictive inference since conditional on the mixing processes the finite dimensional distributions
are all Gaussian distributions.
2 Spatiotemporal modelling
Building adequate models for processes observed over space and time is not an easy task. Many
features have to be considered, like stationarity, separability, isotropy and Gaussianity. Adequate specification of the sampling distribution plays an important role in this context since
misspecification can lead to poor forecasts or interpolations in space and time. In particular,
Gaussian models will not perform well if the data are contaminated by outliers or if there are
regions in space or time with larger observational variance. For this reason, we propose a general
model able to capture individual outliers as well as regions with different variance. We use the
idea of scale mixing in order to construct processes that imply finite dimensional distributions with heavier tails than the normal distribution.
We consider nonseparable models in space and time generated as proposed in Fonseca and
Steel (2008). This construction takes advantage of the models proposed for spatial and temporal
processes separately and combines them by using a continuous mixture of separable covariance
functions. Let (U, V ) be a bivariate nonnegative random vector with distribution µ(u, v) and
independent of {Z1 (s); s ∈ D} and {Z2 (t); t ∈ T } which are purely spatial and temporal random
processes, respectively, taken to be independent. Define the process
Z(s, t) = Z1 (s; U ) Z2 (t; V ),    (2)
where Z1 (s; u) is a purely spatial random process for every u ∈ ℝ+ with covariance function C1 (s; u) = σ1 exp{−γ1 (s)u}, which is a stationary covariance for s ∈ D and every u ∈ ℝ+ and a measurable function of u ∈ ℝ+ for every s ∈ D. Z2 (t; v) is a purely temporal random process for every v ∈ ℝ+ with covariance function C2 (t; v) = σ2 exp{−γ2 (t)v}, which is a stationary covariance for t ∈ T and every v ∈ ℝ+ and a measurable function of v ∈ ℝ+ for every t ∈ T .
γ1 (s) is a purely spatial variogram on D and γ2 (t) is a purely temporal variogram on T . Then
the corresponding covariance function of Z(s, t) is a convex combination of separable covariance
functions. It is valid (see Ma, 2002, 2003) and generally nonseparable, and is given by
C(s, t) = ∫ C1 (s; u) C2 (t; v) dµ(u, v).    (3)
In particular, if we define U = X0 + X1 and V = X0 + X2 where X0 , X1 and X2 are independent nonnegative random variables with finite moment generating functions M0 , M1 and M2 ,
respectively, then the resulting covariance function is given by
C(s, t) = σ² M0 (−γ1 (s) − γ2 (t)) M1 (−γ1 (s)) M2 (−γ2 (t)),    (4)
where σ 2 = σ1 σ2 . For some interesting classes generated by this approach see Fonseca and Steel
(2008). We now consider a more general process {Z̃(s, t); (s, t) ∈ D × T } defined by
Z̃(s, t) = Z̃1 (s; U ) Z̃2 (t; V ),    (5)
where
Z̃1 (s; U ) = √(1 − τ²) Z1 (s; U )/√λ1 (s) + τ ε(s)/√h(s),    (6)
where {λ1 (s); s ∈ D} is a positive-valued mixing process which is independent of ε(s) and Z1 (s; u). {ε(s); s ∈ D} denotes an uncorrelated Gaussian process with zero mean and unit variance which introduces a nugget effect parameterised by τ . {h(s); s ∈ D} is an uncorrelated process in ℝ+ with distribution Ph . The mixing process λ1 (s) is spatially correlated and allows
for regions in space with larger variance while the process h(s) can create traditional outliers,
i.e. observations with unusually large nugget effects. We also want to allow heterogeneous
observational variances in time, so we consider the following process in time
Z̃2 (t; V ) = Z2 (t; V )/√λ2 (t),    (7)
where {λ2 (t); t ∈ T } is a positive mixing process which is independent of Z2 (t; v). The covariance
function for the process {Z̃(s, t); (s, t) ∈ D × T } is given by
C̃(s, t) = Cov(Z̃(s0 , t0 ), Z̃(s0 + s, t0 + t)) = E_{U,V} {C̃1 (s; U ) C̃2 (t; V )},    (8)
where s0 , s0 + s ∈ D and t0 , t0 + t ∈ T , C̃1 (s; u) = Cov(Z̃1 (s0 ; u), Z̃1 (s0 + s; u)) and C̃2 (t; v) = Cov(Z̃2 (t0 ; v), Z̃2 (t0 + t; v)). Throughout, we assume independence between λ1 (s), h(s) and λ2 (t).
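To make the construction concrete, the following sketch (with purely illustrative parameter values) evaluates the covariance (4) when X0 , X1 and X2 are all Gamma distributed, so that each Mk is a Cauchy-type MGF Mk (x) = (1 − x)^{−ηk}; setting η0 = 0 removes the shared component X0 and yields a separable covariance.

```python
import numpy as np

def cauchy_mgf(x, eta):
    # MGF of a Ga(eta, 1) variable at x <= 0: M(x) = (1 - x)^(-eta)
    return (1.0 - x) ** (-eta)

def nonsep_cov(s, t, sigma2=1.0, a=1.0, b=1.0, alpha=1.0, beta=1.0,
               eta0=1.0, eta1=1.0, eta2=1.0):
    # Covariance (4): C(s, t) = sigma^2 M0(-g1 - g2) M1(-g1) M2(-g2),
    # with variograms g1(s) = ||s/a||^alpha and g2(t) = |t/b|^beta
    g1 = np.linalg.norm(np.asarray(s, dtype=float) / a) ** alpha
    g2 = abs(t / b) ** beta
    return (sigma2 * cauchy_mgf(-g1 - g2, eta0)
            * cauchy_mgf(-g1, eta1) * cauchy_mgf(-g2, eta2))

# With eta0 > 0 the covariance does not factorize into space and time parts:
# C(s, t) C(0, 0) != C(s, 0) C(0, t); setting eta0 = 0 restores separability.
```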
3 Scale mixing in space
In this section we consider scale mixing in the space dimension. This will account for individual outliers (through the process h(s)) and regions in space with larger observational variance
(through the process λ1 (s)). The latter is quite common e.g. in meteorological applications
where outliers are often associated with severe weather events such as tornados and hurricanes.
Lu et al. (2007) pointed out that these events do not usually happen in a single location but
cover an extended region.
Initially, we consider the case where λ2 (t) = 1, ∀t ∈ T . The results in Palacios and Steel
(2006) are directly applicable to the purely spatial process Z̃1 (s; u). The mixing process λ1 (s)
needs to be correlated to induce mean square continuity in the process Z̃1 (s; u) for τ = 0, that is,
we need to correlate the mixing variables so that locations that are close together will have very
similar values of λ1 (s). The simplest way to do this is to consider a common mixing variable
λ1 (s) = λ ∼ Pλ . Then we have that
C̃1 (s; u) = (1 − τ²) E[λ⁻¹] C1 (s; u) + τ² E[h⁻¹] I(s=0) ,    (9)

where E[λ⁻¹] = E[λ1 (s)⁻¹] and E[h⁻¹] = E[h(s)⁻¹], ∀s, with h ∼ Ph . Solving the integral in (8) and assuming throughout that σ² = σ2 and σ1 = 1 we obtain
C̃(s, t) = σ² M0 (−γ1 (s) − γ2 (t)) M̃1 (−γ1 (s)) M2 (−γ2 (t)),    (10)

where M̃1 (−γ1 ) = (1 − τ²)E[λ⁻¹]M1 (−γ1 (s)) + τ² E[h⁻¹]I(s=0) . Therefore the correlation structure for s ≠ 0 is given by
ρ̃(s, t) = C̃(s, t)/C̃(0, 0) = {E[λ⁻¹]/(E[λ⁻¹] + w² E[h⁻¹])} M0 (−γ1 (s) − γ2 (t)) M1 (−γ1 (s)) M2 (−γ2 (t)),    (11)
where w2 = τ 2 /(1 − τ 2 ). When τ 2 = 0 (no nugget effect) the mixing does not affect the
correlation structure, that is, ρ̃(s, t) = ρ(s, t), where ρ(s, t) = C(s, t)/C(0, 0) is the correlation
function of {Z(s, t), (s, t) ∈ D × T }. For instance, if we take Pλ to be Ga(ν/2, ν/2) then
the unconditional distribution of Z̃ = (Z̃(s1 , t1 ), . . . , Z̃(sI , tJ ))′ is IJ-variate Student-t with ν degrees of freedom. Roislien and Omre (2006) present some characteristics of Student-t random fields.
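The common-mixing case is easy to illustrate: dividing mean-zero Gaussian draws by the square root of a single λ ∼ Ga(ν/2, ν/2) draw yields multivariate Student-t observations with ν degrees of freedom. A minimal sketch (the covariance matrix below is an arbitrary stand-in, not a fitted model):

```python
import numpy as np

def student_t_field(Sigma, nu, n_draws, rng):
    # Common scale mixing: Z-tilde = Z / sqrt(lam), lam ~ Ga(nu/2, rate nu/2),
    # Z ~ N(0, Sigma); each draw is multivariate Student-t with nu d.f.
    d = Sigma.shape[0]
    lam = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n_draws)
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=n_draws)
    return z / np.sqrt(lam)[:, None]

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])   # illustrative spatial covariance
draws = student_t_field(Sigma, nu=5.0, n_draws=20000, rng=rng)
# Heavier tails than Gaussian: the sample kurtosis of each margin exceeds 3
kurt = np.mean(draws**4, axis=0) / np.mean(draws**2, axis=0) ** 2
```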
But setting λ1 (s) = λ ∼ Pλ is an extreme situation and we would like to have individual mixing variables that account for spatial heterogeneity. Thus, we consider a process λ1 (s)^{−1/2} that is mean square continuous, which implies that Z̃1 (s; u) is also mean square continuous. This means that we need to satisfy the condition E[(λ1 (si )^{−1/2} − λ1 (si′ )^{−1/2})²] → 0 as si → si′ .
The latter is satisfied for the log-Gaussian process proposed in Palacios and Steel (2006) where
{ln(λ1 (s)); s ∈ D} is a Gaussian process with mean −ν/2 and covariance structure νC1∗ (·), with
ν > 0 and C1∗ (·) a valid correlation function. This implies a lognormal distribution for each
λ1 (si ) with mean one and Var[λ1 (si )] = exp(ν) − 1, so that the marginal distribution becomes
more spread out as ν increases. For large ν the distribution also becomes more right-skewed
with the mode shifting towards zero, allowing for substantial variance inflation for some spatial
regions. The spatial covariance function is then given by
C̃1 (s; u) = (1 − τ²) exp{(ν/4)[C1∗ (s) − 1] + ν} C1 (s; u) + τ² E[h⁻¹] I(s=0) ,    (12)
and the resulting covariance function for the spatiotemporal process obtained by solving (8) is given by (10) with M̃1 (−γ1 ) = (1 − τ²) exp{(ν/4)[C1∗ (s) − 1] + ν} M1 (−γ1 ) + τ² E[h⁻¹]I(s=0) . Therefore the correlation structure for s ≠ 0 is given by
ρ̃(s, t) = [exp{(ν/4)[C1∗ (s) − 1] + ν}/(exp{ν} + w² E[h⁻¹])] M0 (−γ1 (s) − γ2 (t)) M1 (−γ1 (s)) M2 (−γ2 (t)).    (13)
Throughout, we will use C1∗ (s) = M1 (−γ1 (s)). If, in addition, we take h(s) = 1 we have the
same model presented in Palacios and Steel (2006) for the space dimension. As commented in
Palacios and Steel (2006), we could use a different correlation function C1∗ (.) for the mixing
process but then we would need to estimate the parameters in C1∗ (.) and this might not be easy
on the basis of typically available data.
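A sketch of the log-Gaussian mixing surface: ln λ1 (s) is drawn from a Gaussian process with mean −ν/2 and covariance νC1∗ (·), so each λ1 (si ) has mean one. The exponential correlation used below is only an illustrative choice for C1∗ , and the locations are randomly generated.

```python
import numpy as np

def log_gaussian_mixing(locs, nu, corr_fun, rng):
    # ln(lambda1) ~ N(-(nu/2) 1, nu C*): each lambda1(s_i) is lognormal with
    # mean one and variance exp(nu) - 1, and nearby sites get similar values
    d = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    C_star = corr_fun(d)
    w = rng.multivariate_normal(-0.5 * nu * np.ones(len(locs)), nu * C_star)
    return np.exp(w)

rng = np.random.default_rng(1)
locs = rng.uniform(0.0, 10.0, size=(50, 2))    # 50 random sites in a square
lam1 = log_gaussian_mixing(locs, nu=0.5, corr_fun=lambda d: np.exp(-d), rng=rng)
```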
The smoothness properties presented in Palacios and Steel (2006) and Fonseca and Steel
(2008) extend to this framework. In the following, f (q) (x) will denote the q th derivative of a
function f (x) with respect to x.
Proposition 3.1 In the case without nugget effect and with C1∗ (s) = M1 (−γ1 (s)), the purely spatial process {Z̃(s, t0 ), s ∈ D} at a fixed time point t0 ∈ T is m times mean square differentiable if and only if M1^(2m)(r) and γ1^(2m)(s) exist and are finite at 0.

3.1 Process h(s)
We define the process {h(s); s ∈ D} as an uncorrelated mixing process that allows for larger
nugget effects. This accommodates traditional outlying observations. The process is uncorrelated with ε(s), Z1 (s; U ) and Z2 (t; V ).
Aberrant observations are common in time series analysis and may also be encountered in processes observed in space. Therefore it is essential to consider this possibility when modelling phenomena over space and time. It is important to understand the effect of outliers on the estimation of the parameters in the correlation structure, as this will directly affect the predictions.
We consider the detection of outliers jointly in the estimation procedure. The variables
hi = h(si ), i = 1, . . . , I are considered latent variables and their posterior distribution provides
an indication of outlying observations. If the marginal posterior distribution of hi has a lot of
mass close to 0, this indicates inflation of the scale τ 2 and therefore an outlying observation.
We consider the following i.i.d. assumptions for hi , i = 1, . . . , I where νh > 0:
1. ln(hi ) ∼ N(−νh /2, νh ), that is, E[hi ] = 1 and Var(hi ) = exp(νh ) − 1. If νh is close to 0
then the distribution of hi is very tight around 1.
2. hi ∼ Ga(1/νh , 1/νh ), where Ga(a, b) denotes the Gamma distribution with density function f (x) = (b^a /Γ(a)) x^{a−1} exp(−bx), so that E[hi ] = 1 and Var(hi ) = νh . If νh is close to 0 then again the distribution of hi is very tight around 1.
In order to evaluate the tail behaviour of the finite dimensional distributions of the proposed process we consider the kurtosis, which is given by E[Z̃ij⁴]/E²[Z̃ij²], where Z̃ij = Z̃(si , tj ), i = 1, . . . , I and j = 1, . . . , J. The kurtosis of the marginal finite dimensional distributions implied
by the process defined in (5)-(6) in combination with a log-Gaussian λ1 (s) is given by
kurt[Z̃ij ] = 3{exp(3ν) + 2w² exp(ν)E[hi⁻¹] + w⁴ E[hi⁻²]} / {exp(2ν) + 2w² exp(ν)E[hi⁻¹] + w⁴ E²[hi⁻¹]}.    (14)
Notice that when τ² = 0, that is, if there is no nugget effect, then the kurtosis is given by 3 exp(ν) as in Palacios and Steel (2006). In the case of ln(hi ) ∼ N(−νh /2, νh ), E[hi⁻¹] = exp(νh ) and E[hi⁻²] = exp(3νh ), for any νh > 0. If instead hi ∼ Ga(1/νh , 1/νh ) then E[hi⁻¹] = 1/(1 − νh ) and E[hi⁻²] = 1/{(1 − νh )(1 − 2νh )}. The latter case requires that νh < 0.5. Figure 1 shows the implied kurtosis for several values of w² for both models when ν = 0.5. The Gamma distribution for Ph looks less flexible since it gives kurtosis very close to 3 exp(ν) for almost all values of νh . We need to go very close to νh = 0.5 to get larger values for the kurtosis coefficient. Without mixing through λ1 (s), the kurtosis is an increasing function of νh . For instance, for the case of ln(hi ) ∼ N(−νh /2, νh ) we have that kurt[Z̃ij ] = 3{1 + 2w² exp(νh ) + w⁴ exp(3νh )}/{1 + 2w² exp(νh ) + w⁴ exp(2νh )}, which is an increasing function of νh . On the other hand, for ν > 0 the kurtosis is not monotonic in νh , as illustrated in Figure 1. Notice that without mixing in the nugget effect the kurtosis is an increasing function of ν, and without either mixing we obtain kurt[Z̃ij ] = 3 (Gaussian case).
Figure 1: Kurtosis for different values of w² and ν = 0.5. Panel (a): hi Lognormal; panel (b): hi Gamma.
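The kurtosis expression (14) and its limiting cases can be checked numerically; below, the lognormal moments E[h⁻¹] = exp(νh ) and E[h⁻²] = exp(3νh ) are plugged in (a sketch with illustrative values, not the paper's figure code).

```python
import math

def lognormal_moments(nu_h):
    # For ln(h) ~ N(-nu_h/2, nu_h): E[h^-1] = exp(nu_h), E[h^-2] = exp(3 nu_h)
    return math.exp(nu_h), math.exp(3.0 * nu_h)

def kurtosis(nu, w2, Eh1, Eh2):
    # Equation (14), with Eh1 = E[h^-1], Eh2 = E[h^-2], w2 = tau^2/(1 - tau^2)
    num = 3.0 * (math.exp(3 * nu) + 2 * w2 * math.exp(nu) * Eh1 + w2**2 * Eh2)
    den = math.exp(2 * nu) + 2 * w2 * math.exp(nu) * Eh1 + w2**2 * Eh1**2
    return num / den

# Limiting cases: no mixing at all gives the Gaussian value 3, and
# no nugget mixing (h = 1, so Eh1 = Eh2 = 1) with w2 = 0 gives 3 exp(nu)
assert abs(kurtosis(0.0, 0.0, 1.0, 1.0) - 3.0) < 1e-12
assert abs(kurtosis(0.5, 0.0, 1.0, 1.0) - 3.0 * math.exp(0.5)) < 1e-12
```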
3.2 Parameterisation
In what follows we make particular choices for the variogram functions γ1 (s), γ2 (t) and variables
X0 , X1 and X2 introduced in Section 2. Consider γ1 (s) = ||s/a||^α , γ2 (t) = |t/b|^β , a, b > 0, α ∈ (0, 2] and β ∈ (0, 2]. Let X0 ∼ Ga(η0 , 1), X2 ∼ Ga(η2 , 1), which results in Cauchy covariance functions, that is, M0 (x) = (1 − x)^{−η0} and M2 (x) = (1 − x)^{−η2} , η0 , η2 > 0. For X1
we consider the following distributions
1. X1 ∼ Ga(η1 , 1) resulting in the Cauchy function M1 (x) = (1 − x)^{−η1} , η1 > 0;
2. X1 ∼ InvGa(η1 , 1) resulting in the Matérn function M1 (x) = [2(−x)^{η1 /2} /(2^{η1 −1} Γ(η1 ))] K_{η1}(2√(−x)), η1 > 0. See Stein (1999) for details of this class of covariance functions;
3. X1 ∼ GIG(η1 , δ, δ) resulting in the Generalized Matérn function M1 (x) = (1 − x/δ)^{−η1 /2} K_{η1}(2δ√(1 − x/δ))/K_{η1}(2δ), η1 ∈ ℝ, δ > 0. See Shkarofsky (1968) for details of this class of covariance functions.
Here InvGa(η1 , 1) denotes the Inverse Gamma distribution with density function f (x) = (1/Γ(η1 )) x^{−η1 −1} exp(−1/x) and GIG(η1 , δ, δ) denotes the Generalized Inverse Gaussian distribution with density function f (x) = (1/(2K_{η1}(2δ))) x^{η1 −1} exp{−(δx + δ/x)}. We use the correlation between the variables U and V as an indication of interaction between space and time components. This correlation is given by
c = η0 /√{(η0 + V1 )(η0 + η2 )},    (15)
where V1 = Var(X1 ). Thus, 0 ≤ c ≤ 1 could be used as a measure of space-time interaction,
with c = 0 indicating separability and c = 1 meaning high dependence between space and time.
Notice that in the case X1 ∼ InvGa(η1 , 1) the variance of X1 does not exist (unless η1 > 2 is
imposed through the prior) and the dependence between space and time is then measured by
c̃ = η0 /√{(η0 + Ṽ1 (η1 ))(η0 + η2 )},    (16)

where Ṽ1 (η1 ) = (Q(0.75; η1 ) − Q(0.25; η1 ))² and Q(x; η1 ) is the quantile of X1 corresponding to 100x%.
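The interaction measure (15) is straightforward to compute; the sketch below covers the Gamma case for X1 , where Var(X1 ) = η1 (illustrative parameter values only).

```python
import math

def interaction_c(eta0, eta2, V1):
    # Equation (15): c = eta0 / sqrt((eta0 + V1)(eta0 + eta2)), with
    # V1 = Var(X1); for X1 ~ Ga(eta1, 1) this is simply V1 = eta1
    return eta0 / math.sqrt((eta0 + V1) * (eta0 + eta2))

# eta0 = 0 removes the shared component X0: c = 0 (separable covariance)
assert interaction_c(0.0, 1.0, 1.0) == 0.0
# A dominant shared component pushes c towards 1 (strong interaction)
assert interaction_c(100.0, 1.0, 1.0) > 0.98
```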
3.3 Inference
Suppose we observe realizations z̃ij of Z̃ij ≡ Z̃(si , tj ) at locations si , i = 1, . . . , I and time points tj , j = 1, . . . , J. Defining λ1 = (λ1 (s1 ), . . . , λ1 (sI )), h = (h(s1 ), . . . , h(sI )) and µ = (µ(s1 , t1 ), . . . , µ(sI , tJ ))′ a location function, the likelihood function is given by

L1 (θ, λ1 , h; z̃) = fN (z̃ | µ, Σ),    (17)
where θ = (η0 , η1 , η2 , α, β, a, b, σ², τ²) or θ = (δ, η0 , η1 , η2 , α, β, a, b, σ², τ²) depending on which distribution we choose for X1 , fN (· | µ, Σ) denotes the multivariate Gaussian density function with mean µ and covariance matrix Σ with elements Σkk′ = Cov[(Z̃)k , (Z̃)k′ ], where

Cov(Z̃ij , Z̃i′j′ ) = σ² M0 (−γ1 − γ2 ) {(1 − τ²) M1 (−γ1 )/√(λ1i λ1i′ ) + τ² I(si = si′ )/√(hi hi′ )} M2 (−γ2 ),

i, i′ = 1, . . . , I, j, j′ = 1, . . . , J, γ1 = γ1 (si − si′ ), γ2 = γ2 (tj − tj′ ), λ1i = λ1 (si ) and (Z̃)k is the k-th element of the IJ-dimensional vector Z̃.
Note that Gaussian behaviour is only assumed given the mixing variables λ1 and h. Integrating out with respect to these mixing variables leads to non-Gaussian distributions. We augment
with the latent variables λ1 and h in order to identify possible regions with larger observational
variance and/or traditional outliers. The vector ln(λ1 ) is multivariate normally distributed with
mean −ν/2 and covariance matrix with elements νM1 (−γ1 (si − si0 )). And ln(hi ), i = 1, . . . , I
is either normal with mean −νh /2 and variance νh or hi ∼ Ga(1/νh , 1/νh ).
We use stochastic simulation via MCMC to obtain an approximation of the posterior distribution of (θ, λ1 , h). We obtain samples from the target distribution p(θ, λ1 , h|z) by successive
generations from the full conditional distributions. More specifically, we adopt a hybrid Gibbs
sampler scheme with Metropolis-Hastings steps. We use random walk proposals to generate
values of λ1 and h. We also consider groups in space in order to block the sampler. For a more
elaborate algorithm see Palacios and Steel (2006).
Model comparison is conducted on the basis of Bayes factors. These are computed from the
MCMC output using methods to approximate the marginal predictive density of z. In previous
simulation studies (Fonseca and Steel, 2008) we noticed that the estimator p4 of Newton and
Raftery (1994) (with their d as small as 0.01), the optimal bridge sampling approach of Meng
and Wong (1996), and the shifted Gamma estimator proposed by Raftery et al. (2007) (with
values of their λ1 close to one) give essentially the same results, especially the last two.
3.4 Prediction and interpolation
Gaussian model
Suppose we are interested in predicting Zp at a location sp at time point tp , where sp is
not necessarily included within the sampling design. Under a Bayesian approach, the prediction
of Zp is based on the posterior predictive distribution P (Zp |Zo ) where Zo are the available
observations of the process Z(s, t). We have that
P (Zp |Zo ) = ∫ P (Zp |Zo , θ)P (θ|Zo ) dθ.    (18)
Under the Gaussian model, (Zo , Zp |θ) has a multivariate Gaussian distribution. Using the
properties of conditional distributions in the Gaussian family we obtain the mean and the
variance for Zp |Zo , θ given by
µ̄ = µp + Σpo Σoo⁻¹ (zo − µo )    (19)

and

Σ̄ = Σpp − Σpo Σoo⁻¹ Σop .    (20)
Suppose we have θ(1) , . . . , θ(M ) generated from the posterior distribution θ|Zo (by the
MCMC sampler). Then we approximate the predictive distribution of Zp by averaging over
P (Zp |Zo , θ(k) ), for k = 1, . . . , M .
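The conditioning formulas (19)-(20) can be sketched for a single posterior draw of θ; the small covariance matrices below are illustrative placeholders, not fitted quantities.

```python
import numpy as np

def gaussian_predict(mu_o, mu_p, S_oo, S_po, S_pp, z_o):
    # Conditional mean (19) and covariance (20) of Z_p given Z_o and theta
    A = S_po @ np.linalg.inv(S_oo)            # Sigma_po Sigma_oo^{-1}
    mu_bar = mu_p + A @ (z_o - mu_o)
    S_bar = S_pp - A @ S_po.T
    return mu_bar, S_bar

# Toy example: two observed sites, one prediction site
S_oo = np.array([[1.0, 0.3], [0.3, 1.0]])
S_po = np.array([[0.5, 0.2]])
S_pp = np.array([[1.0]])
mu_bar, S_bar = gaussian_predict(np.zeros(2), np.zeros(1), S_oo, S_po, S_pp,
                                 np.array([1.0, -0.5]))
```

Conditioning always shrinks the predictive variance below the prior variance, which gives a quick sanity check on any implementation.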
Non-Gaussian model
Suppose we have samples from the distribution of θ, λ1o |Zo . We need to sample λ1p |θ, λ1o , Zo
and then use the Gaussian sampler just described to sample Zp |Zo , θ, λ1p , λ1o . Notice that
p(λ1p |θ, λ1o , Zo ) ≡ p(λ1p |θ, λ1o ), where
(ln λ1o , ln λ1p )′ | θ ∼ N (−(ν/2)1, νM1 ),    (21)
with M1ij = M1 (−γ1 (si − sj )). Thus, given λ1o and θ we can easily generate values of λ1p .
We deal analogously with λ2 (t), discussed in the next section. Taking into account h(s) is trivial due to the independence of the hi ’s.
Predictive model comparison
In order to check the predictive accuracy of each model we use a predictive scoring rule.
Scoring rules provide summaries for the evaluation of probabilistic forecasts by comparing the
predictive distribution with the actual value observed for the process. For more details about
scoring rules see Gneiting and Raftery (2007). In particular, we use the log predictive score
(LPS) based on the predictive distribution p (which can be multivariate) and on the observed
value z,
LPS(p; z) = − ln(p(z)).    (22)
The smaller LPS is, the better the model does in forecasting Zp .
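For a Gaussian predictive distribution the score (22) has a closed form; a minimal univariate sketch (forecast means, variance and observation are illustrative):

```python
import math

def lps_gaussian(mean, var, z):
    # LPS(p; z) = -ln p(z) for a univariate Gaussian predictive density
    return 0.5 * math.log(2.0 * math.pi * var) + 0.5 * (z - mean) ** 2 / var

# A forecast centred on the realized value scores better (lower LPS)
assert lps_gaussian(25.0, 4.0, 25.0) < lps_gaussian(20.0, 4.0, 25.0)
```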
4 Scale mixing in time
We also want to consider the case where λ2 (t) 6= 1. Let {λ2 (t); t ∈ T } be a mixing process in
time. Notice that the process {Y (s, t); (s, t) ∈ D × T } where
Y (s, t) = λ2 (t)^{1/2} Z̃(s, t)    (23)
is exactly the process proposed in Section 3. We derive properties and conduct inference by using
this fact. We will not consider the case with temporal nugget effect but the model presented
here can be easily extended to allow for a nugget effect in time as done in the spatial dimension.
Defining λ2 = (λ2 (t1 ), . . . , λ2 (tJ )) and using (17) and (23) the resulting likelihood function
is given by
L(θ, λ1 , h, λ2 ; z̃) = L1 (θ, λ1 , h; Λ2 z̃) ∏_{j=1}^{J} λ2j^{I/2} ,    (24)

where Λ2 = diag((λ21^{1/2} , . . . , λ2J^{1/2} )′ ⊗ 1I ), λ2j = λ2 (tj ) and 1I is a vector of ones of size I.
We use in the time dimension the same kind of mixing process used in space, i.e. {ln(λ2 (t)) :
t ∈ T } is a Gaussian process with mean −ν2 /2 and covariance structure ν2 C2∗ (·), with ν2 > 0
and C2∗ (·) a valid correlation function. The temporal covariance function is given by
C̃2 (t; v) = exp{(ν2 /4)[C2∗ (t) − 1] + ν2 } C2 (t; v).    (25)
And the resulting covariance function for the spatiotemporal process obtained by solving (8) is
given by
C̃(s, t) = σ² M0 (−γ1 (s) − γ2 (t)) M̃1 (−γ1 (s)) M̃2 (−γ2 (t)),    (26)
where M̃1 (−γ1 (s)) = (1 − τ²) exp{(ν/4)[C1∗ (s) − 1] + ν} M1 (−γ1 (s)) + τ² E[h(s)⁻¹]I(s=0) and M̃2 (−γ2 (t)) = exp{(ν2 /4)[C2∗ (t) − 1] + ν2 } M2 (−γ2 (t)). Therefore, the correlation structure for s ≠ 0 is given by

ρ̃(s, t) = [exp{(ν/4)[C1∗ (s) − 1] + ν + (ν2 /4)[C2∗ (t) − 1] + ν2 }/((exp{ν} + w² E[h⁻¹]) exp{ν2 })] M0 (−γ1 (s) − γ2 (t)) M1 (−γ1 (s)) M2 (−γ2 (t)),    (27)
where we set C1∗ (s) = M1 (−γ1 (s)) and C2∗ (t) = M2 (−γ2 (t)).
This scale mixing in time will capture periods in time with larger observational variance,
which can be seen as a way to address the issue of volatility clustering, which is quite a common
occurrence in e.g. financial time series data. This aspect of our model is reminiscent of a
stochastic volatility model used in this literature. Smoothness properties, such as the one in
Proposition 3.1, are easily derived for the temporal process. Thus, if C2∗ (t) = M2 (−γ2 (t)), the purely temporal process {Z̃(s0 , t), t ∈ T } at a fixed location s0 ∈ D is m times mean square differentiable if and only if M2^(2m)(r) and γ2^(2m)(t) exist and are finite at 0.
To summarize, the full model, with scale mixing through λ1 , λ2 and h will be able to accommodate smooth spatial heterogeneity in the variance (through λ1 ), gradual temporal changes in
the variance (through λ2 ) and spatial outliers (through the fat-tailed distribution of the nugget
effect, induced by mixing with h).
5 Empirical Results
Throughout, the prior distributions for the hyperparameters ν, νh and ν2 are Ga(1, 5), assigning a large probability mass to values close to zero, which indicate simpler cases (without scale mixing,
i.e. no spatial or temporal heterogeneity or fat tails). The prior distribution for the remaining
parameters is mildly informative as proposed and discussed in Fonseca and Steel (2008).
5.1 Simulation results
We have analysed a substantial number of generated datasets, with and without perturbations
with respect to Gaussianity. This has illustrated that the priors are reasonable and the inference
methods are reliable and efficient. In addition, we can successfully identify outliers and regions
of increased variance in time and space. In fact, we are able to separate these effects even when
they are all present in the data simultaneously. The use of Bayes factors leads to sensible model
choices, in line with the way the data were generated.
5.2 Application to temperature data
We now present an application to the maximum temperatures recorded daily in July of 2006
(J = 31) in 70 locations within the Spanish Basque country. We consider I = 67 of these
locations for estimation of the parameters and we leave out 3 locations for predictive comparison.
Figure 2 (a)-(b) present the boxplots for the maximum temperature (in degrees centigrade) over
space and time, respectively. Notice in Figure 2 (c) and (d) that the empirical variance over
time and space is far from constant, which suggests that a simple Gaussian model with constant
variance might be unsuitable. In order to model nonstationarities in the mean of the process we
considered a mean function that depends on the spatial coordinates and on time. In addition, the
region considered is quite mountainous with altitudes ranging from 0 to 1188 meters, therefore
the altitude (x) was included in the mean function as a covariate. The resulting mean function
is given by
µ(s, t) = δ0 + δ1 s1 + δ2 s2 + δ3 x + δ4 t + δ5 t²    (28)
The covariance model considered here is nonseparable as presented in (4) allowing for interactions between space and time. The model is parameterized as in Subsection 3.2 and the
chosen covariance in space is the Cauchy type. The parameters estimated are the trend coefficients (δ0 , . . . , δ5 ), the covariance parameters (η0 , α, β, a, b, σ 2 , τ 2 ) and the mixing parameters
(ν1 , ν2 , νh ). We also generate the auxiliary variables (λ1 , λ2 , h) in our MCMC algorithm as described in Subsection 3.3 in order to identify regions in space and time with larger observational
variance. The parameters η1 and η2 are set to 1. See Fonseca and Steel (2008) for a discussion of
parameterisation in this class of covariance models. For the mixing on the nugget effect we only
consider lognormally distributed hi , as this seems the most flexible option (see the discussion
at the end of Subsection 3.1).
In order to calculate the likelihood function we need to invert a matrix with dimension
2077 × 2077. Since this is computationally very demanding we approximate the likelihood by using conditional distributions as described in Stein et al. (2004). In summary, we consider a partition of Z into subvectors Z1 , ..., Z31 where Zj = (Z(s1 , tj ), . . . , Z(s67 , tj ))′ and we define
Z(j) = (Zj−L+1 , ..., Zj ). Then, taking φ = (θ, λ1 , λ2 , h), we use
p(z|φ) ≈ p(z1 |φ) ∏_{j=2}^{31} p(zj |z(j−1) , φ).    (29)
Since we have a natural time ordering of Z, this means the distribution of Zj will only depend
on the observations in space for the previous L time points. In this application we used L = 5
to make the MCMC feasible. We checked that this approximation is quite accurate for these
data.
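This conditional (Vecchia-type) approximation can be sketched generically with Gaussian conditioning. The code below assumes a zero-mean process and a pre-built covariance matrix ordered by time block (the small matrix used in the example is an arbitrary positive definite stand-in, not the fitted model); with L equal to the number of previous blocks it recovers the exact likelihood.

```python
import numpy as np

def mvn_logpdf(x, mu, S):
    # Log density of N(mu, S) evaluated at x
    _, logdet = np.linalg.slogdet(S)
    r = x - mu
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet
                   + r @ np.linalg.solve(S, r))

def blocked_loglik(z_blocks, Sigma, L):
    # Approximation (29): log p(z) ~ log p(z_1) + sum_j log p(z_j | previous L
    # blocks); each conditional follows from Gaussian conditioning (19)-(20)
    I = len(z_blocks[0])
    total = mvn_logpdf(z_blocks[0], np.zeros(I), Sigma[:I, :I])
    for j in range(1, len(z_blocks)):
        lo = max(0, j - L) * I                       # first conditioning index
        cond = np.concatenate(z_blocks[max(0, j - L):j])
        S_oo = Sigma[lo:j * I, lo:j * I]
        S_po = Sigma[j * I:(j + 1) * I, lo:j * I]
        A = S_po @ np.linalg.inv(S_oo)
        mu_bar = A @ cond
        S_bar = Sigma[j * I:(j + 1) * I, j * I:(j + 1) * I] - A @ S_po.T
        total += mvn_logpdf(z_blocks[j], mu_bar, S_bar)
    return total
```

The cost per block is cubic in LI rather than in IJ, which is what makes the MCMC feasible for the 2077-dimensional vector in this application.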
We estimated the parameters for the following models: the Gaussian model, the non-Gaussian model with λ1 only, the non-Gaussian model with h, the non-Gaussian model with h and λ1 , the non-Gaussian model with λ2 only, the non-Gaussian model with λ2 and h, the non-Gaussian model with λ1 and λ2 and finally the non-Gaussian model with h, λ1 and λ2 . Notice that all the models considered here have a nugget effect in space parameterized by τ² > 0
that accounts for measurement errors and small-scale variation. The estimated Bayes Factors
presented in Table 1 indicate the non-Gaussian models are much more adequate for this dataset
than the Gaussian model. The complete model that includes λ1 , λ2 and h is by far the best
one according to both estimators. As expected on the basis of the large variations in empirical
variance over time and space (see Figure 2), the models that include both λ1 and λ2 perform
well.
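For intuition on how such marginal-likelihood estimators work, here is a minimal sketch of the basic Newton–Raftery harmonic-mean identity. The shifted-gamma and p4 estimators reported in Table 1 are stabilised refinements of it (see Raftery et al., 2007); this raw version is notoriously unstable in practice and is shown for illustration only.

```python
import numpy as np

def log_marginal_harmonic(loglik_draws):
    # Newton-Raftery harmonic-mean identity: 1/p(z) is the posterior
    # expectation of 1/p(z | theta), so with posterior draws theta_m,
    #   log p(z) ~= -log( (1/M) sum_m exp(-loglik_m) ),
    # evaluated with a log-sum-exp shift for numerical stability.
    ll = np.asarray(loglik_draws, dtype=float)
    a = -ll
    amax = a.max()
    return -(amax + np.log(np.mean(np.exp(a - amax))))

def log_bayes_factor(loglik_model, loglik_gaussian):
    # Log Bayes factor in favour of a non-Gaussian model versus the
    # Gaussian model, as a difference of estimated log marginal likelihoods.
    return log_marginal_harmonic(loglik_model) - log_marginal_harmonic(loglik_gaussian)
```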
Table 2 presents some posterior summaries for the parameters of interest. The models with
λ1 tend to give rather different results for the smoothness parameter in space α, the range in
space a and the nugget effect τ 2 (not reported). In particular, models with λ1 tend to suggest
rougher processes with smaller values of α than the models without λ1 . This may be related to
the use of the same covariance structure for the processes λ1 (s) and Z1 (s; u). Thus, it may be
the case that the process λ1 (s) is rougher than the process Z1 (s; u). But estimation of a different
covariance structure for λ1 would probably be too much to ask from the data. Furthermore, h
seems to capture some of this roughness as the models with λ1 and also h have larger estimates
of α.
Inference on the separability measure c in (15) is relatively unaffected by the model choice
and indicates that the data are fairly close to separable. Notice that the process is very rough
in time with very small values of β. For β < 1 the process is not even mean square continuous
in time.
The posterior distributions of the parameters of the mixing distributions, which drive the
tail behaviour, are depicted in Figure 3. It is clear that the posteriors are very different from
the priors (indicated by dashed lines), and point very strongly towards non-Gaussian behaviour.
Remember that the Gaussian model corresponds to the limiting case where all ν parameters
tend towards zero. The posteriors clearly suggest that all three forms of scale mixing in the
model are supported by the data, especially for the smooth spatial process. Of course, this is
in line with the Bayes factors discussed earlier.
Inference on the coefficients in the mean function (28) shows that altitude is an important covariate, with a similar effect for all the models (in particular, it indicates a drop in mean level of about 0.8 degrees centigrade per 100 metres of altitude).
In the most complete model, the variance of Z̃(si, tj) is

Var[Z̃(si, tj)] ≡ σ²ij = σ²[(1 − τ²)/λ1i + τ²/hi]/λ2j,
from which we can deduce the variance structures over stations and time points. In particular, if we marginalise over space by assigning the spatial mixing variables an “average” constant value, which we can take to be the prior mean, i.e. λ1i = hi = 1, then we can trace the temporal evolution of the variance as σ²/λ2j. Similarly, if we assume λ2j = 1, we can decompose the spatial variance into a part for the correlated process, σ²(1 − τ²)/λ1i, and a nugget part, σ²τ²/hi.
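The decomposition above can be written as a one-line helper (a sketch; the argument names are ours, with tau2 denoting the nugget proportion τ² and lam1_i, h_i, lam2_j the mixing variables):

```python
import numpy as np

def variance_components(sigma2, tau2, lam1_i, h_i, lam2_j):
    # sigma2_ij = sigma^2 [ (1 - tau^2)/lambda_1i + tau^2/h_i ] / lambda_2j,
    # split into the smooth-process part and the nugget part.
    smooth = sigma2 * (1.0 - tau2) / (lam1_i * lam2_j)
    nugget = sigma2 * tau2 / (h_i * lam2_j)
    return smooth, nugget
```

Setting all mixing variables to their prior mean of 1 recovers the single Gaussian variance σ²; values of λ1i, hi or λ2j below 1 inflate the corresponding component.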
Figure 5 gives the posterior boxplots for the variance evolution over time and space for the full
model, whereas Figure 4 does the same for the model with only h and λ2 (here the constant
contribution to the variance of the spatially smooth process is not depicted). Notice the clear
separation of the effects showing that stations 17, 18 and 66 (both panels (b)) are outliers
whereas some regions in time have larger observational variance (panels (a)). The three stations identified as outliers are all located in the north but quite far from each other, indicating that this is not a case of a single region in space with larger variance. The temporal variance pattern in
both models is roughly in line with the empirical variances over time shown in Figure 2 (d).
The component containing λ1 (Figure 5 (c)) does indicate appreciable differences in variances
over spatial regions, in line with the Bayes factor in favour of this model and the posterior
inference on ν. The Gaussian model is, of course, not as flexible, assuming just one global
variance with posterior 95% credible interval given by (8.64, 17.03). Figure 6 presents the total posterior variance σ²ij for all time points at stations 1 (small nugget effect) and 17 (large nugget
effect) using the complete model, while horizontal lines indicate the posterior credible interval
for the variance for the Gaussian model. This clearly illustrates the inadequacy of the Gaussian
model in adapting to the temporal changes in the variance. For instance, for station 1 and time
points 5 to 10 it is clear that the Gaussian model overestimates the variance and for time point
31 it is clear that the variance should be larger than the one estimated by the Gaussian model.
This underestimation is even more pronounced for location 17 where the nugget effect is more
important, and even time period 22 seems problematic then.
In view of the lack of smoothness of the process in time, we do not present any predictions
in time, but we will conduct interpolation to unobserved sites. In our case, we left some stations
out of the estimation sample in order to compare the predictions with the actually observed
maximum temperature. Table 3 shows the estimated value of the LPS as in (22) using the Gaussian and non-Gaussian models for the three stations left out of the estimation step. The non-Gaussian models that include λ2 predict better than the Gaussian model; in particular, the extreme events in the tails are predicted much more adequately. This can be seen in
Figure 7 which plots the predictions (medians and 95% credible intervals) obtained for various
models against actual observations at out-of-sample stations (labelled 1∗ to 3∗ ) for all 31 time
periods. Station 1∗ is very close to other stations and even the Gaussian model gives reasonable
predictions. The non-Gaussian models provide interval widths that are similar to the Gaussian
model on average but the intervals are smaller for many points in time and larger in the tails
(e.g. j = 31) where the posterior prediction interval for the Gaussian model misses the observed
value for the process at stations 2∗ and 3∗ . This suggests that scale mixing in time is essential
in this application in order to produce good interpolations in space, especially for extreme
temperatures.
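To make the scoring concrete, the sketch below computes a log predictive score from posterior predictive draws using a moment-matched normal density per held-out observation. The paper's LPS in (22) uses the model's actual predictive density, so this is only an illustrative stand-in; lower scores mean better predictions.

```python
import numpy as np

def log_predictive_score(pred_draws, y_obs):
    # LPS sketch: score = -sum_k log p_hat(y_k), with p_hat a normal
    # density whose mean and variance are matched to the posterior
    # predictive draws for held-out observation k.
    pred = np.asarray(pred_draws, dtype=float)   # shape (n_draws, n_obs)
    y = np.asarray(y_obs, dtype=float)
    mu = pred.mean(axis=0)
    var = pred.var(axis=0, ddof=1)
    logpdf = -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y - mu) ** 2 / var
    return -np.sum(logpdf)
```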
In order to verify how the Gaussian and non-Gaussian models would predict in other regions
of the spatial domain we repeated the estimation and prediction steps for 5 different partitions of
the data, selecting at random three testing locations and leaving the remaining 67 for estimation.
The results obtained for the Gaussian model and the 2 best non-Gaussian models are presented
in Table 4. The model with λ2 and h has the smallest average log predictive score, followed by
the model with only λ2 .
In summary, whereas Bayes factors favour the model with all three mixing mechanisms, the
model with smoothly varying heterogeneous variance in time and a fat-tailed nugget effect is
the one that does best in out-of-sample predictions.
                          h     λ1    λ1 & h   λ2    λ2 & h   λ1 & λ2   λ1, h & λ2
Shifted Gamma estimator   172   148   345      138   279      417       547
Raftery estimator p4      115   116   227      116   203      302       327

Table 1: Log Bayes factor in favour of the model in the column versus the Gaussian model, using the Shifted-Gamma (λ = 0.98) and Newton-Raftery (d = 0.01) estimators for the marginal likelihood.
[Figure 2 appears here, with four panels: (a) maximum temperature over space (by station), (b) maximum temperature over time (by day), (c) empirical variance for each location (station index), (d) empirical variance for each time point (time index).]
Figure 2: Data summaries.
             a              b             c             α             β
Gaussian     (0.99, 2.04)   (3.50, 8.13)  (0.00, 0.15)  (1.31, 1.83)  (0.33, 0.50)
h            (0.99, 1.98)   (3.29, 8.22)  (0.01, 0.13)  (1.39, 1.74)  (0.31, 0.48)
λ1           (4.17, 24.08)  (3.49, 7.94)  (0.01, 0.13)  (0.58, 0.94)  (0.38, 0.55)
λ1 & h       (1.44, 3.15)   (3.30, 7.70)  (0.01, 0.12)  (1.20, 1.55)  (0.36, 0.53)
λ2           (0.95, 2.01)   (4.28, 9.74)  (0.01, 0.10)  (1.35, 1.84)  (0.32, 0.46)
λ2 & h       (1.10, 2.33)   (3.33, 7.88)  (0.01, 0.13)  (1.29, 1.68)  (0.30, 0.45)
λ1 & λ2      (7.28, 21.20)  (0.78, 4.41)  (0.01, 0.15)  (0.75, 1.00)  (0.29, 0.52)
λ1, h & λ2   (6.36, 18.14)  (0.39, 2.58)  (0.01, 0.20)  (0.89, 1.11)  (0.31, 0.68)

Table 2: 95% posterior credible intervals for some parameters of interest.
[Figure 3 appears here, with three density panels: (a) ν, (b) νh, (c) ν2.]
Figure 3: Prior and posterior densities of the parameters of the mixing distributions for the model with h, λ1 and λ2. Prior densities are given by dashed lines and posterior densities are indicated by drawn lines.

[Figure 4 appears here, with two boxplot panels: (a) σ²/λ2j, j = 1, . . . , 31, and (b) σ²τ²/hi, i = 1, . . . , 67.]
Figure 4: Posterior boxplots of the variance structure over time and space corresponding to the model with h and λ2, where σ²ij = σ²[(1 − τ²) + τ²/hi]/λ2j. The left panel describes the temporal evolution and the right panel the spatial nugget effect.
[Figure 5 appears here, with three boxplot panels: (a) σ²/λ2j, j = 1, . . . , 31, (b) σ²τ²/hi, i = 1, . . . , 67, and (c) σ²(1 − τ²)/λ1i, i = 1, . . . , 67.]
Figure 5: Posterior boxplots of the variance structure over time and space for the model with h, λ1 and λ2, where σ²ij = σ²[(1 − τ²)/λ1i + τ²/hi]/λ2j. The left panel describes the temporal evolution, the middle panel the spatial nugget effect and the right panel the smooth spatial process.

[Figure 6 appears here, with boxplot panels for stations 1 and 17 over time points j = 1, . . . , 31.]
Figure 6: Posterior boxplots of the total variance for two stations at each time point corresponding to the model with h, λ1 and λ2, where σ²ij = σ²[(1 − τ²)/λ1i + τ²/hi]/λ2j. The left panel is for station 1 (small nugget effect) and the right panel for station 17 (large nugget effect). Horizontal lines indicate the 95% credible interval for the variance of the Gaussian model.
[Figure 7 appears here: nine panels of predicted versus observed maximum temperature at out-of-sample stations 1*, 2* and 3* (columns), for the Gaussian model (top row), the model with λ2 (middle row) and the model with λ2 & h (bottom row).]
Figure 7: Posterior predictive median of the maximum temperature versus the observed maximum temperatures (points) at out-of-sample stations for times j = 1, . . . , 31. The dashed lines are the 95% credible predictive intervals, and the solid line indicates y = x.
model   Gaussian   h        λ1       λ1 & h   λ2      λ2 & h   λ1 & λ2   λ1, h & λ2
LPS     97.25      112.56   107.43   117.20   76.73   77.60    96.35     90.30

Table 3: Log predictive score (LPS) for the predicted maximum temperature at the out-of-sample stations.
model   Gaussian   λ2      λ2 & h
LPS     88.70      86.64   80.59

Table 4: Average log predictive score (LPS) for the predicted maximum temperature at the out-of-sample stations for 5 partitions of the data set.
6 Conclusions and future work
We present a non-Gaussian spatiotemporal model that is able to capture departures from Gaussianity in terms of outlier contamination and regions in space or time with larger observational
variance. The proposed finite dimensional distributions have heavier tails than the normal distribution and have the normal as a limiting case. The general model includes correlated mixing
in the spatiotemporal process (both in time and space) and in the nugget effect. This model is
quite flexible, combining nonseparability and non-Gaussian behaviour, and performed well on simulated data. Its usefulness was also illustrated in an application to Spanish temperature data, where simultaneous mixing in the nugget effect and in time seems an essential feature. Prediction is
straightforward, using the fact that we are scale mixing Gaussian processes, and efficient MCMC
algorithms for posterior and predictive inference also immediately allow for the identification of
outliers and regions in time and space with inflated variances.
As a topic of future research, it might be interesting to explore the effect of using C1∗ (s) =
M1 (−γ1 (s)) and C2∗ (t) = M2 (−γ2 (t)) in the model for posterior and predictive inference, and to
investigate ways of separately modelling correlation structures for the mixing variables and the
observables. In the simulated examples we examined here, it did not seem to be an important
restriction but this might be the case for some applications.
Appendix A
Proof of Proposition 3.1
The Gaussian-Log-Gaussian process {Z̃(s, t); (s, t) ∈ D × T} as defined in (5) and (10) with no nugget effect can be rewritten as

Z̃(s, t) = λ1(s)^{−1/2} Z1(s; U)Z2(t; V) = λ1(s)^{−1/2} Z(s, t).
Then for a fixed time point t0 we have

C̃(s, 0) = σ²C(s, 0)f(s),

where f(s) = exp{(ν/4)[M1(−γ1(s)) − 1] + ν} and C(s, 0) = σ²M0(−γ1(s))M1(−γ1(s)). Therefore,

\[
\tilde{C}^{(2m)}(s, 0) = \sum_{i=0}^{2m} \binom{2m}{i} C^{(i)}(s, 0)\, f^{(2m-i)}(s).
\]
By Faà di Bruno's formula, termwise differentiation of C(s, 0) results in

\[
C^{(2m)}(s, 0) = \sum_{A} \frac{(2m)!}{k_1! k_2! \cdots k_{2m}!}\, y^{(k)}(-\gamma_1(s)) \prod_{k_j \neq 0} \left( \frac{-\gamma_1^{(j)}(s)}{j!} \right)^{k_j}
= \left\{ y^{(1)}(-\gamma_1(s))[\gamma_1^{(2m)}(s)] + \ldots + y^{(2m)}(-\gamma_1(s))[\gamma_1^{(1)}(s)]^{2m} \right\} \tag{30}
\]

where A = {k1, k2, . . . , k2m : k1 + 2k2 + . . . + 2mk2m = 2m}, k = k1 + k2 + . . . + k2m, ki ≥ 0, i = 1, 2, . . . , 2m, and \( y^{(k)}(x) = \sum_{i=0}^{k} \binom{k}{i} M_0^{(k-i)}(x) M_1^{(i)}(x) \).
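As a concreteness check (our addition, using only the chain rule), the m = 1 case of (30) reduces to the ordinary second-order chain rule; here the signs are written out explicitly, whereas the display in (30) absorbs them into the terms:

```latex
% m = 1: A = {(k_1, k_2) : k_1 + 2 k_2 = 2} = {(2, 0), (0, 1)}, so Fa\`a di
% Bruno's formula gives the second-order chain rule for C(s,0) = y(-\gamma_1(s)):
\[
  C^{(2)}(s, 0)
    = y^{(2)}\bigl(-\gamma_1(s)\bigr)\bigl[\gamma_1^{(1)}(s)\bigr]^{2}
      - y^{(1)}\bigl(-\gamma_1(s)\bigr)\,\gamma_1^{(2)}(s).
\]
```

Finiteness of C^(2)(s, 0) at the origin thus hinges on γ1^(2)(s) and, through y^(2), on M1^(2)(r) as s, r → 0, matching the general statement below.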
In the expression (30), the highest order derivative of γ1(s) is 2m, obtained when k2m = 1 and k1 = . . . = k2m−1 = 0. Thus, in this case, the behaviour of C^(2m)(s, 0) as s → 0 depends only on the local behaviour of γ1^(2m)(s) and M1^(1)(r) as s, r → 0. And the highest order derivative of M1(−γ1(s)) is 2m, obtained when i = 2m. Thus, the behaviour of C^(2m)(s, 0) as s → 0 depends only on the local behaviour of M1^(2m)(r) and γ1^(1)(s) as s, r → 0. Notice that M0^(k)(0) = E[X0^k] always exists since X0 ∼ Gamma(λ0, 1) in our model.

This proves that C^(2m)(s, 0) exists and is finite at 0 if and only if M1^(2m)(r) and γ1^(2m)(s) exist and are finite at 0. Also, f^(2m)(s) = (ν/4)f(s){M1^(2m)(−γ1(s)) + g(ν, s)} exists and is finite under the same conditions (see Palacios and Steel, 2006). The result follows.
References
Cressie, N. and Hawkins, D. M. (1980). “Robust Estimation of the Variogram: I.” Mathematical
Geology, 12, 115–125.
Damian, D., Sampson, P. D., and Guttorp, P. (2001). “Bayesian estimation of semi-parametric
non-stationary spatial covariance structures.” Environmetrics, 12, 161–178.
— (2003). “Variance modeling for nonstationary spatial processes with temporal replications.”
Journal of Geophysical Research, 108, 1–12.
De Oliveira, V., Kedem, B., and Short, D. A. (1997). “Bayesian Prediction of Transformed
Gaussian Random Fields.” Journal of the American Statistical Association, 92, 1422–1433.
Fonseca, T. C. O. and Steel, M. (2008). “A New Class of Nonseparable Spatiotemporal Models.”
CRiSM Working Paper 08-13, University of Warwick.
Gneiting, T. and Raftery, A. E. (2007). “Strictly proper scoring rules, prediction and estimation.”
Journal of the American Statistical Association, 102, 477, 360–378.
Lu, C., Kou, Y., Zhao, J., and Chen, L. (2007). “Detecting and tracking regional outliers in meteorological data.” Information Sciences, 177, 1609–1632.
Ma, C. (2002). “Spatio-Temporal Covariance Functions Generated by Mixtures.” Mathematical
Geology, 34, 8, 965–975.
— (2003). “Spatio-Temporal Stationary Covariance Models.” Journal of Multivariate Analysis, 86, 97–107.
Meng, X. L. and Wong, W. H. (1996). “Simulating Ratios of Normalizing Constants via a Simple
Identity: A Theoretical Exploration.” Statistica Sinica, 6, 831–860.
Newton, M. A. and Raftery, A. E. (1994). “Approximate Bayesian Inference With the Weighted Likelihood Bootstrap.” Journal of the Royal Statistical Society Series B, 56, 3–48.
Palacios, M. B. and Steel, M. F. J. (2006). “Non-Gaussian Bayesian Geostatistical Modeling.”
Journal of the American Statistical Association, 101, 474, 604–618.
Raftery, A. E., Newton, M. A., Satagopan, J. M., and Krivitsky, P. N. (2007). “Estimating the Integrated Likelihood via Posterior Simulation Using the Harmonic Mean Identity.” In Bayesian Statistics 8, 371–416. Oxford University Press.
Roislien, J. and Omre, H. (2006). “T-distributed Random Fields: A Parametric Model for
Heavy-tailed Well-log Data.” Mathematical Geology, 38, 7, 821–849.
Shkarofsky, I. P. (1968). “Generalized Turbulence Space-Correlation and Wave-Number Spectrum-Function Pairs.” Canadian Journal of Physics, 46, 2133–2153.
Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York.
— (2009). “Spatial interpolation of high-frequency monitoring data.” Annals of Applied Statistics, 272–291.
Stein, M. L., Chi, Z., and Welty, L. J. (2004). “Approximating Likelihoods for Large Spatial Data Sets.” Journal of the Royal Statistical Society Series B, 66, 275–296.