Spatial Modelling of Count Data:
A Case Study in Modelling Breeding Bird Survey Data on Large Spatial Domains
Christopher K. Wikle
University of Missouri-Columbia
0.1 Introduction
The North American Breeding Bird Survey (BBS) is conducted each breeding season by volunteer observers (e.g., Robbins et al. 1986). The observers
count the number of various species of birds along specified routes. The collected data are used for several purposes, including the study of the range of
bird species, and the variation of the range and abundance over time (e.g.,
Link and Sauer, 1998). Such studies usually require spatial maps of relative
abundance. Traditional methods for producing such maps are somewhat ad
hoc (e.g., inverse distance methods) and do not always account for the special discrete, positive nature of the count data (e.g., Sauer et al. 1995). In
addition, corresponding prediction uncertainties for maps produced in this
fashion are not typically available. Providing such uncertainties is critical
as the prediction maps are often used as “data” in other studies and for
the design of auxiliary sampling plans.
We consider the BBS modeling problem from a hierarchical perspective,
modeling the count data as Poisson, conditional on a spatially-varying intensity process. The intensities are then assumed to follow a log-normal
distribution with fixed effects and with spatial and non-spatial random
effects. Model-based geostatistical methods for generalized linear mixed
models (GLMMs) of this type have been available since the seminal work
of Diggle et al. (1998). However, implementation is problematic when there
are large data sets and prediction is desired over large domains. We show
that by utilizing spectral representations of the spatial random effects process, Bayesian spatial prediction can easily be carried out on very large
data sets over extensive prediction domains.
The BBS sampling unit is a roadside route 39.2 km in length. Over each
route, an observer makes 50 stops, at which birds are counted by sight and
sound for a period of 3 minutes. Over 4000 routes have been included in
the North American survey, but not all routes are available each year. As
might be expected due to the subjectivity involved in counting birds by
sight and sound, and the relative experience and expertise of the volunteer
observers, there is substantial observer error in the BBS survey (e.g., Sauer
et al. 1994).
In this study, we are concerned with the relative abundance of the House
Finch (Carpodacus mexicanus). Figure 0.1 shows the location of the sampling route midpoints and observed counts over the continental United
States (U.S.) for the 1999 House Finch BBS. The size of the circle radius
is proportional to the number of birds observed at each site. This figure
suggests that the House Finch is more prevalent in the Eastern and Western U.S. than in the interior. Indeed, this species is native to the Western
U.S. and Mexico. The Eastern population is a result of a 1940 release of
caged birds in New York. The birds were being sold illegally in New York
City as “Hollywood Finches” and were supposedly released by dealers in
an attempt to avoid prosecution. Within three years there were reports of
the birds breeding in the New York area. Because the birds are prolific
breeders and their juveniles disperse over long distances, the House Finch
quickly expanded to the west (Elliott and Arbib, 1953). Simultaneously, as
the human population on the west coast expanded eastward (and correspondingly changed the environment), the House Finch expanded eastward
as well. By the late 1990s, the two populations had met in the Central Plains
of North America.
Figure 0.1 Observation locations for the 1999 BBS of House Finch (Carpodacus mexicanus). Radius and color are proportional to the observed counts.
From Figure 0.1 it is clear that there are many regions of the U.S. that
were not sampled in the 1999 House Finch BBS. Our interest here is to
predict abundance over a relatively dense network of spatial locations, every
quarter degree of latitude and longitude. The network of prediction grid
locations includes 228 points in the longitudinal and 84 in the latitudinal
direction, for a total of 19,152 prediction grid locations.
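The grid arithmetic above can be reproduced with a short sketch. The spacing and the grid dimensions come from the text; the south-west corner (`LON0`, `LAT0`) is a hypothetical placeholder, since the source does not state the grid bounds.

```python
import numpy as np

RES = 0.25                  # quarter-degree spacing (from the text)
N_LON, N_LAT = 228, 84      # grid dimensions (from the text)
LON0, LAT0 = -125.0, 29.0   # hypothetical south-west corner, NOT from the source

lon = LON0 + RES * np.arange(N_LON)
lat = LAT0 + RES * np.arange(N_LAT)

# Rectangular lattice; each array has shape (N_LAT, N_LON).
lon_grid, lat_grid = np.meshgrid(lon, lat)
n = lon_grid.size           # total number of prediction locations
```

Keeping the locations on a regular rectangular lattice is what later allows FFT-based operations to be used on the grid.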
0.2 The Poisson Random Effects Model
Consider the model for the count process y(x) given a spatially varying
mean process λ(x):
\[ y(x) \mid \lambda(x) \sim \mathrm{Poisson}(\lambda(x)). \tag{0.1} \]
The log of the spatial mean process is given by:
\[ \log(\lambda(x)) = \mu + z(x) + \eta(x), \tag{0.2} \]
where µ is a deterministic mean component, z(x) is a spatially correlated
random component, and η(x) is an uncorrelated spatial random component. In general, the fixed component µ might be related to spatially-varying covariates (such as habitat) and could include “regression” terms.
We will consider the simple constant mean formulation in this application.
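To make the two-stage structure of (0.1) and (0.2) concrete, the following is a minimal simulation sketch. Everything numerical in it is an assumption chosen for illustration: one-dimensional locations, an exponential covariance for z(x), and the values of the mean and variances. None of these are taken from the BBS analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0.0, 10.0, size=n))    # 1-D locations, for simplicity

mu = 0.7                                        # constant mean (illustrative value)

# Spatially correlated component z(x): zero-mean Gaussian field with an
# exponential covariance (an assumed choice for this sketch).
dist = np.abs(x[:, None] - x[None, :])
cov_z = np.exp(-dist / 2.0)
z = rng.multivariate_normal(np.zeros(n), cov_z)

# Uncorrelated component eta(x).
eta = rng.normal(0.0, 0.5, size=n)

log_lam = mu + z + eta                          # equation (0.2)
y = rng.poisson(np.exp(log_lam))                # equation (0.1)
```

Counts simulated this way are non-negative integers whose local average varies smoothly with z(x), while eta(x) adds site-specific scatter.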
The correlated process, z(x), is necessary in this application because we
have substantial prior belief that the counts at “nearby” routes are correlated. From a scientific point of view, this is likely due (at least in part)
to the fact that the birds are attracted to specific habitats, and we know
that habitat is correlated in space. Typically, one can view the z-process as
accounting for the effects of “unknown” covariates, since it induces spatial
structure in the λ-process, and thus the observed counts. In that sense,
maps of the z-process may be interesting and lead to greater understanding as to the preferred habitat of the modeled bird species (e.g., Royle
et al., 2001). The random component η(x) accounts for observer effects.
A major concern in the analysis of BBS data is the known observer bias,
as discussed previously. Because the observers produce counts on different
routes, we can typically assume that these effects are independent in
space.
The above discussion suggests that we might model z(x) as a Gaussian
random field with zero mean and covariance given by c_θ(x, x′), where θ
represents parameters (possibly vector-valued) of the covariance function
c. In addition, we assume η(x) ∼ N(0, σ_η^2), where cov(η(x), η(x′)) = 0 if
x ≠ x′.
As presented, the Poisson spatial model follows the framework for generalized geostatistical prediction formulated in Diggle et al. (1998). An example of this approach applied to the BBS problem can be found in Royle et
al. (2001). However, implementation in that case was concerned with relatively small data sets and over limited geographical regions. The Gaussian
random field-based Bayesian hierarchical approach becomes increasingly
difficult to implement as the dimensionality of the data and number of prediction locations increases. Consequently, such an approach is not feasible
at the continental scale and high resolution that we require in the present
application. However, as outlined in Royle and Wikle (2001), one can still
use the Bayesian GLMM methodology in these high-dimensional settings
if one makes use of spectral representations. This approach is summarized
in the next section.
0.2.1 Spectral Formulation
Let {x_i}, i = 1, …, m, be the set of data locations at which counts y(x_i) were observed. Further, let {x_j}, j = 1, …, n, be the set of prediction locations, which may,
but need not, include some or all of the m data locations. We now rewrite
the mean-process model (0.2):
\[ \log(\lambda(x_i)) = \mu + k_i' z_n + \eta(x_i), \tag{0.3} \]
where z_n is an n × 1 vector representation of the z-process at the prediction
locations, and the vector k_i relates the log-mean process at observation
location x_i to one or more elements of the z-process at prediction locations
(e.g., Wikle et al. 1998; Wikle et al. 2001). We then assume:
\[ z_n = \Psi\alpha + \epsilon, \tag{0.4} \]
where Ψ is an n × p matrix, fixed and known, α is a p × 1 vector of coefficients
with α ∼ N(0, Σ_α), and ε ∼ N(0, σ_ε^2 I). We let Ψ consist of spectral basis
functions [ψ_{j,k}], j = 1, …, n, k = 1, …, p, that are orthogonal. That is, if ψ_k ≡ [ψ_{1,k}, …, ψ_{n,k}]′,
then ψ_k′ψ_j = 0 if k ≠ j and 1 otherwise. In this case, we say that α are
spectral coefficients. From a hierarchical perspective, we can write:
\[ z_n \mid \alpha, \sigma_\epsilon^2 \sim N(\Psi\alpha, \sigma_\epsilon^2 I) \tag{0.5} \]
and
\[ \alpha \mid \Sigma_\alpha \sim N(0, \Sigma_\alpha). \tag{0.6} \]
In general, the covariance function for the α-process depends on some parameters θ; we denote this covariance by Σ_α(θ). The modeling motivation
for the hierarchy is apparent if we note that the random z-process can be
written z_n ∼ N(0, Σ_z(θ) + σ_ε^2 I), where σ_ε^2 accounts for the “nugget effect”
due to small scale variability. Given (0.4), the covariance function for the
z-process can be written Σ_z(θ) = ΨΣ_α(θ)Ψ′.
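The marginal distribution quoted above follows in one step from (0.4) to (0.6). The step is written out here as a sketch, assuming that α and ε are independent (an assumption implicit in the hierarchy but not stated explicitly):

```latex
\begin{aligned}
\mathrm{E}(z_n) &= \Psi\,\mathrm{E}(\alpha) + \mathrm{E}(\epsilon) = 0, \\
\mathrm{var}(z_n) &= \Psi\,\mathrm{var}(\alpha)\,\Psi' + \mathrm{var}(\epsilon)
  = \Psi\Sigma_\alpha(\theta)\Psi' + \sigma_\epsilon^2 I
  = \Sigma_z(\theta) + \sigma_\epsilon^2 I .
\end{aligned}
```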
In principle, any set of orthogonal spectral basis functions could be used
for Ψ. For example, one could use the leading variance modes of the covariance matrix Σ_z. Such modes are just the eigenvectors that diagonalize
the spatial covariance matrix and thus are just principal components. These
spatial principal components are known as Empirical Orthogonal Functions
(EOFs) in the geostatistical literature (e.g., Obled and Creutin, 1986; Wikle
and Cressie, 1999). Such a formulation is advantageous because it allows
for non-stationary spatial correlation and dimension reduction (p << n).
Another possibility would be to use Fourier basis functions in Ψ. This could
apply if the prediction locations were defined in continuous space or on a
grid. However, as we will demonstrate, if we choose a grid implementation,
one need not actually form the matrix Ψ, which would be problematic for
grid sizes of order 10^5 as we consider here. That is, the operation Ψα is actually an inverse Fourier transform operation on the vector α. On a discrete
lattice, one can use Fast Fourier Transform (FFT) procedures to efficiently
implement this transform without having to make or store the matrix of
basis functions. In this case, p = n. If the z-process is stationary, the use of
Fourier basis functions suggests that the matrix Σ_α(θ) is diagonal (asymptotically). For situations where it is more appropriate to assume that the
process is nonstationary and the prediction locations can be thought of
as a discrete grid, one could consider a wavelet basis function for Ψ. In
this case, the operation Ψα is just an inverse discrete wavelet transform
of α; again, Ψ need not be constructed directly. Depending on the class
of wavelets chosen, the matrix Σ_α(θ) may be diagonal (asymptotically) or
nearly so.
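The claim that Ψα can be computed without forming Ψ can be checked numerically on a tiny example. The sketch below assumes the unitary complex Fourier basis in one dimension, so that the transpose Ψ′ is read as a conjugate transpose; the coefficient values are arbitrary and purely illustrative.

```python
import numpy as np

n = 8  # tiny grid so the dense basis matrix can be formed for comparison
rng = np.random.default_rng(0)

# Dense unitary inverse-DFT basis: Psi[j, k] = exp(2*pi*i*j*k/n) / sqrt(n).
idx = np.arange(n)
Psi = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

# Arbitrary complex spectral coefficients (illustration only).
alpha = rng.normal(size=n) + 1j * rng.normal(size=n)

dense = Psi @ alpha                        # O(n^2) work and storage
fast = np.fft.ifft(alpha, norm="ortho")    # O(n log n), Psi never formed
same_result = np.allclose(dense, fast)

# The basis vectors are orthonormal: Psi^H Psi = I.
columns_orthonormal = np.allclose(Psi.conj().T @ Psi, np.eye(n))

# The adjoint operation Psi^H z is the forward FFT with the same scaling.
z = rng.normal(size=n)
adjoint_matches = np.allclose(Psi.conj().T @ z, np.fft.fft(z, norm="ortho"))
```

On a two-dimensional lattice the same idea applies with `np.fft.ifft2` and `np.fft.fft2`; only the coefficient array, never an n × n matrix, has to be stored.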
In the hierarchical implementation, the parameterization of Σ_α(θ) is
especially critical. For example, with wavelet basis functions, we might
assume a fractional scaling behavior in the variance of the different wavelet
scales. This is particularly useful when the process is known to exhibit such
behavior, such as turbulence examples in atmospheric science (e.g., Wikle
et al. 2001). Alternatively, we might assume a common stationary class for
the z-process, such as the Matérn class of covariance functions,
\[ c(d_{ij}) = \phi\,(\theta_1 d_{ij})^{\theta_2} K_{\theta_2}(\theta_1 d_{ij}), \quad \phi > 0,\ \theta_1 > 0,\ \theta_2 > 0, \tag{0.7} \]
where d_ij is the distance between two spatial locations, K_θ2 is the modified
Bessel function, θ_2 is related to the degree of smoothness of the spatial
process, θ_1 is related to the correlation range, and φ is proportional to the
variance of the process (e.g., Stein 1999, p. 48). The corresponding spatial
spectral density function at frequency ω is
\[ f(\omega; \theta_1, \theta_2, \phi, g) = \frac{2^{\theta_2 - 1}\,\phi\,\Gamma(\theta_2 + g/2)\,\theta_1^{2\theta_2}}{\pi^{g/2}\,(\theta_1^2 + \omega^2)^{\theta_2 + g/2}}, \tag{0.8} \]
where g is the dimension of the spatial process (e.g., Stein 1999, p. 49).
Thus, if one chooses Fourier basis functions for Ψ and assumes the Matérn
class, then Σ_α(θ) should be diagonal (asymptotically) with diagonal elements corresponding to f given by (0.8). If θ and φ are not known, one must specify
prior distributions for them at the next level of the model hierarchy.
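As a small check on (0.8), the spectral density can be coded directly and compared with its closed form in the exponential case. The function name and the parameter values below are illustrative assumptions; for θ_2 = 1/2 and g = 1, (0.8) reduces to φθ_1 / (√(2π)(θ_1² + ω²)).

```python
import math

def matern_spectral_density(omega, theta1, theta2, phi=1.0, g=2):
    """Evaluate (0.8) at a scalar frequency magnitude omega."""
    numerator = (2.0 ** (theta2 - 1.0) * phi
                 * math.gamma(theta2 + g / 2.0) * theta1 ** (2.0 * theta2))
    denominator = (math.pi ** (g / 2.0)
                   * (theta1 ** 2 + omega ** 2) ** (theta2 + g / 2.0))
    return numerator / denominator

# Exponential case (theta2 = 1/2) in one dimension, evaluated at omega = 0.
theta1 = 2.0
f0 = matern_spectral_density(0.0, theta1, 0.5, phi=1.0, g=1)
closed_form = 1.0 / (math.sqrt(2.0 * math.pi) * theta1)

# The density decays with frequency, so low frequencies carry most variance.
f1 = matern_spectral_density(1.0, theta1, 0.5, phi=1.0, g=1)
```

Evaluating this function at the frequencies associated with each Fourier basis function gives candidate diagonal elements for Σ_α(θ).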
0.2.2 Model Implementation and Prediction
The hierarchical Poisson model with a spectral spatial component is summarized as follows. The joint likelihood for all observations y (an m × 1
vector) is
\[ [y \mid \lambda] = \prod_{i=1}^{m} \mathrm{Poisson}(\lambda(x_i)), \tag{0.9} \]
where λ is an m × 1 vector, corresponding to the locations of the vector y.
The joint prior distribution for log(λ) is:
\[ [\log(\lambda) \mid \mu, \gamma, z_n, \sigma_\eta^2] = N(\mu\mathbf{1} + \gamma K z_n, \sigma_\eta^2 I), \tag{0.10} \]
where 1 is an m × 1 vector of ones, log(λ) is the m × 1 vector with elements
log(λ(x_i)), K is an m × n matrix with rows k_i′, and γ is a scaling coefficient
(introduced for computational reasons as discussed below). Then, let
\[ [z_n \mid \alpha, \sigma_e^2] = N(\Psi\alpha, \sigma_e^2 I), \tag{0.11} \]
and allow the spectral coefficients to have distribution
\[ [\alpha \mid R_\alpha(\theta_1)] = N(0, R_\alpha(\theta_1)), \tag{0.12} \]
where R_α(θ_1) is a diagonal matrix. For the BBS illustration presented here,
we let θ_2 = 1/2 in (0.7) (i.e., we assume the covariance model is exponential) but assume the dependence parameter θ_1 is random. Note that as a
consequence of including the γ parameter in (0.10) we are able to specify the conditional covariance of α as the diagonalization of a correlation
matrix rather than a covariance matrix (see discussion below). Finally, to
complete the model hierarchy, we assume the remaining parameters are
independent and specify the following prior distributions:
\[ \mu \sim N(\mu_0, \sigma_\mu^2), \qquad \sigma_\eta^2 \sim IG(q_\eta, r_\eta), \qquad \sigma_e^2 \sim IG(q_e, r_e), \tag{0.13} \]
\[ \gamma \sim U[0, b], \qquad \theta_1 \sim U[u_1, u_2], \tag{0.14} \]
where IG( ) refers to an inverse gamma distribution, and U[ ] a uniform
distribution. For the BBS House Finch data we select q_η = 0.5, q_e = 1,
r_η = 2, r_e = 10, µ_0 = 0, σ_µ^2 = 10, b = 100, u_1 = 1, and u_2 = 100 (note that
our parameterization of the exponential is r(d) ∝ exp(−θ_1 d), where d is
the distance). These hyperparameters correspond to rather vague proper
priors.
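One way the diagonal of R_α(θ_1) in (0.12) might be built for the exponential model on a rectangular grid is sketched below. It evaluates (0.8) with θ_2 = 1/2 and g = 2 at the FFT frequencies, up to a constant. The function name, the scaling to an average value of one (so that each element of Ψα has unit marginal variance under a unitary Fourier basis), and the value of `theta1` are assumptions made for illustration; the source does not spell out these details.

```python
import numpy as np

def exponential_spectral_variances(n_lat, n_lon, spacing, theta1):
    """Diagonal of a hypothetical R_alpha(theta1) on an n_lat x n_lon grid."""
    # Angular frequencies laid out in the same order as np.fft.fft2 output.
    w_lat = 2.0 * np.pi * np.fft.fftfreq(n_lat, d=spacing)
    w_lon = 2.0 * np.pi * np.fft.fftfreq(n_lon, d=spacing)
    w2 = w_lat[:, None] ** 2 + w_lon[None, :] ** 2
    # (0.8) with theta2 = 1/2 and g = 2, dropping constant factors.
    f = (theta1 ** 2 + w2) ** (-1.5)
    # Assumed normalization: average spectral variance equal to one.
    return f / f.mean()

# Grid dimensions and spacing from the text; theta1 is an arbitrary value.
r = exponential_spectral_variances(84, 228, 0.25, theta1=2.0)
```

Because θ_1 enters only through this elementwise formula, a Metropolis-Hastings proposal for θ_1 requires recomputing a vector of n numbers rather than refactorizing an n × n covariance matrix.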
The alternative to specifying γ in (0.10) is to let the conditional covariance of α be σ_α^2 R_α(θ_1). However, as is often the case for Bayesian spatial
models that are deep in the model hierarchy (and thus, relatively far from
the data), the MCMC implementation has difficulty converging because
of the tradeoff between the spatial process variance, σ_α^2, and the dependence parameter, θ_1. By allowing the z-process to have unit variance, as in
the above formulation, we need not estimate σ_α^2 (which is 1 in this case).
The variance in the spatial process is then achieved through γ. In situations where the implied assumption of homogeneous variance is unrealistic,
a more complicated reparameterization would be required. Note that the
γ parameterization also affects the interpretation of the variance of the
z-process (i.e., σ_e^2 = σ_ε^2/γ^2).
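The relation in parentheses can be checked by comparing the two parameterizations term by term. This is a sketch that assumes the unscaled process of (0.4) is recovered as γ times the scaled process of (0.11); the tilde notation is introduced here only for the derivation and does not appear elsewhere:

```latex
\begin{aligned}
\tilde z_n &= \Psi\tilde\alpha + e, \qquad
  \tilde\alpha \sim N(0, R_\alpha(\theta_1)), \quad e \sim N(0, \sigma_e^2 I), \\
\gamma\tilde z_n &= \Psi(\gamma\tilde\alpha) + \gamma e, \\
\mathrm{var}(\gamma\tilde\alpha) &= \gamma^2 R_\alpha(\theta_1)
  \quad\Rightarrow\quad \sigma_\alpha^2 = \gamma^2, \\
\mathrm{var}(\gamma e) &= \gamma^2\sigma_e^2 I = \sigma_\epsilon^2 I
  \quad\Rightarrow\quad \sigma_e^2 = \sigma_\epsilon^2/\gamma^2 .
\end{aligned}
```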
Our goal is the estimation of the joint posterior distribution,
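\[
\begin{aligned}
& [\log(\lambda), z_n, \alpha, \theta_1, \gamma, \sigma_\eta^2, \sigma_e^2, \mu \mid y] \\
& \quad \propto [y \mid \log(\lambda)]\,[\log(\lambda) \mid \mu, \gamma, z_n, \sigma_\eta^2]\,[z_n \mid \alpha, \sigma_e^2] \\
& \qquad \times [\alpha \mid \theta_1]\,[\theta_1]\,[\gamma]\,[\mu]\,[\sigma_\eta^2]\,[\sigma_e^2].
\end{aligned}
\]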
Although this distribution cannot be analyzed directly, we are able to use
MCMC approaches as suggested by Diggle et al. (1998) to draw samples
from this posterior and appropriate marginals. In particular, we utilized a
Gibbs sampler with Metropolis-Hastings sampling of log(λ) and θ1 (e.g.,
see Royle et al. 2001). Perhaps more importantly, we would like estimates
from the posterior distribution of λ_n, the λ-process at prediction grid locations. The key difficulty in the traditional (non-spectral) geostatistical
formulation is the dimensionality of the full-conditional update for the z-process given all other parameters. As we show below, this is no longer a
serious problem if we make use of the spectral representation.
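The Metropolis-Hastings step for a single element of log(λ) can be sketched as follows. The source does not specify the proposal distribution, so the Gaussian random walk, the step size, and the numerical values used here are assumptions for illustration only. The target combines the Poisson likelihood (0.9) with the normal prior (0.10), where `prior_mean` stands for the corresponding element of µ1 + γKz_n.

```python
import math
import random

def log_target(ell, y, prior_mean, sigma2_eta):
    """Log full-conditional of ell = log(lambda_i), up to an additive constant."""
    poisson_part = y * ell - math.exp(ell)                  # Poisson(y | exp(ell))
    normal_part = -0.5 * (ell - prior_mean) ** 2 / sigma2_eta
    return poisson_part + normal_part

def mh_update(ell, y, prior_mean, sigma2_eta, step, rng):
    """One random-walk Metropolis step; returns the (possibly unchanged) state."""
    proposal = ell + rng.gauss(0.0, step)
    log_ratio = (log_target(proposal, y, prior_mean, sigma2_eta)
                 - log_target(ell, y, prior_mean, sigma2_eta))
    # Accept with probability min(1, exp(log_ratio)).
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return ell

rng = random.Random(1)
ell = 0.0
draws = []
for _ in range(5000):
    ell = mh_update(ell, y=12, prior_mean=1.0, sigma2_eta=0.8, step=0.3, rng=rng)
    draws.append(ell)

kept = draws[1000:]                       # discard an initial burn-in
posterior_mean = sum(kept) / len(kept)
```

With an observed count of 12 and a prior mean of 1 on the log scale, the sampled values settle between the prior mean and log(12), which illustrates how the data and the spatial prior are balanced.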
Selected Full-Conditional Distributions
As mentioned above, for the most part the full-conditional distributions
follow those outlined generally in Diggle et al. (1998) and specifically, those
in Royle et al. (2001). However, the spectral representation allows simpler
forms for the z_n and α full-conditionals.
The full-conditional distribution for z_n can be shown to be:
\[ z_n \mid \cdot \sim N(S_z^{-1} a_z, S_z^{-1}), \tag{0.15} \]
where S_z = I/σ_e^2 + K′Kγ^2/σ_η^2 and a_z = Ψα/σ_e^2 + K′(log(λ) − µ1)γ/σ_η^2.
In our case, K is an incidence matrix (a matrix of ones and zeros) such
that each observation is only associated with one prediction grid location
(a reasonable assumption at the resolution presented here). Thus, K′K
can be shown to be a diagonal matrix with 1’s and 0’s along the diagonal.
Although the matrix S_z is very high-dimensional (order 10^5 × 10^5), it is
diagonal and trivial to invert. In addition, Ψα can be calculated by the
inverse FFT function (a fast operation) and the elements of z_n are updated as simple univariate normal distributions. In practice, we update these simultaneously
in a matrix language implementation.
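The elementwise update described above can be sketched in a few lines. The function name, the representation of K by an index vector `obs_cell`, and all numerical values are assumptions for illustration; `psi_alpha` stands in for the result of the inverse FFT of α. A dense solve is included only to confirm, on a tiny problem, that the diagonal shortcut reproduces the mean in (0.15).

```python
import numpy as np

def sample_z(psi_alpha, log_lam, obs_cell, mu, gamma, sigma2_e, sigma2_eta, rng):
    """Draw z_n from (0.15) when K is an incidence matrix.

    obs_cell[i] is the grid index of observation i, so K'K is diagonal and
    K'v sums the entries of v that fall in each grid cell.
    """
    n = psi_alpha.shape[0]
    ktk_diag = np.bincount(obs_cell, minlength=n).astype(float)
    kt_resid = np.bincount(obs_cell, weights=log_lam - mu, minlength=n)
    s_diag = 1.0 / sigma2_e + ktk_diag * gamma ** 2 / sigma2_eta   # diagonal of S_z
    a = psi_alpha / sigma2_e + kt_resid * gamma / sigma2_eta       # a_z
    mean = a / s_diag
    sd = 1.0 / np.sqrt(s_diag)
    return mean + sd * rng.standard_normal(n), mean, s_diag

rng = np.random.default_rng(3)
n, mu, gamma, sigma2_e, sigma2_eta = 6, 0.7, 1.4, 0.25, 0.8
psi_alpha = rng.standard_normal(n)      # stands in for the inverse FFT of alpha
obs_cell = np.array([1, 4])             # two observations in distinct grid cells
log_lam = np.array([2.0, 0.5])
z, mean, s_diag = sample_z(psi_alpha, log_lam, obs_cell,
                           mu, gamma, sigma2_e, sigma2_eta, rng)

# Dense check, feasible only because n is tiny here.
K = np.zeros((obs_cell.size, n))
K[np.arange(obs_cell.size), obs_cell] = 1.0
S = np.eye(n) / sigma2_e + K.T @ K * gamma ** 2 / sigma2_eta
a_dense = psi_alpha / sigma2_e + K.T @ (log_lam - mu) * gamma / sigma2_eta
dense_mean = np.linalg.solve(S, a_dense)
```

Grid cells without an observation are simply drawn around the corresponding element of Ψα with variance σ_e^2, which is why the cost of the update grows only linearly with the number of prediction locations.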
Similarly,
\[ \alpha \mid \cdot \sim N(S_\alpha^{-1} a_\alpha, S_\alpha^{-1}), \tag{0.16} \]
where S_α = Ψ′Ψ/σ_e^2 + R_α(θ_1)^{-1} and a_α = Ψ′z_n/σ_e^2. At first glance,
this appears problematic due to the Ψ′Ψ and R_α(θ_1)^{-1} terms in the full-conditional variance. However, since the spectral operators are orthogonal,
Ψ′Ψ = I and the matrix R_α(θ_1)^{-1} is diagonal as discussed previously.
Furthermore, Ψ′z_n is just the FFT operation on z_n and is very fast. Thus,
\[ \alpha \mid \cdot \sim N\big( (I/\sigma_e^2 + R_\alpha(\theta_1)^{-1})^{-1}\,\Psi' z_n/\sigma_e^2,\ (I/\sigma_e^2 + R_\alpha(\theta_1)^{-1})^{-1} \big) \tag{0.17} \]
and can be sampled as individual univariate normals, or easily in a block
update.
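A sketch of the update in (0.17) is given below. To keep every quantity real-valued, it swaps in the orthonormal discrete Hartley transform, a real Fourier-type transform that can be computed from a single FFT, in place of the complex Fourier basis. This substitution, the one-dimensional setting, the function names, and the numerical values are all assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def hartley(x):
    """Orthonormal discrete Hartley transform of a real 1-D array.

    The implied matrix is real, symmetric and orthogonal, so the same
    function applies both Psi and its transpose.
    """
    f = np.fft.fft(x)
    return (f.real - f.imag) / np.sqrt(x.shape[0])

def sample_alpha(z, r_diag, sigma2_e, rng):
    """Draw alpha from (0.17) with R_alpha = diag(r_diag)."""
    post_var = 1.0 / (1.0 / sigma2_e + 1.0 / r_diag)     # elementwise inverse
    post_mean = post_var * hartley(z) / sigma2_e          # Psi' z via one FFT
    draw = post_mean + np.sqrt(post_var) * rng.standard_normal(z.shape[0])
    return draw, post_mean

rng = np.random.default_rng(0)
n = 16
z = rng.standard_normal(n)
r_diag = 1.0 / (1.0 + np.arange(n))       # made-up prior variances
alpha, post_mean = sample_alpha(z, r_diag, sigma2_e=0.25, rng=rng)

# Dense check on this small problem: form Psi explicitly and solve (0.16).
Psi = np.column_stack([hartley(e) for e in np.eye(n)])
S = Psi.T @ Psi / 0.25 + np.diag(1.0 / r_diag)
dense_mean = np.linalg.solve(S, Psi.T @ z / 0.25)
```

The dense matrices appear only in the check; the update itself uses one transform and elementwise arithmetic, which is the property that makes the block update cheap.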
Prediction
To obtain predictions of λ_n, the λ-process at the prediction grid locations,
we sample from
\[ [\log(\lambda_n^{(t)}) \mid z_n^{(t)}, \gamma^{(t)}, \mu^{(t)}, \sigma_\eta^{2\,(t)}] = N(\mu^{(t)}\mathbf{1} + \gamma^{(t)} z_n^{(t)}, \sigma_\eta^{2\,(t)} I), \tag{0.18} \]
where 1 is n × 1 and µ^(t), γ^(t), z_n^(t), σ_η^2(t) are the t-th samples from the
MCMC simulation. We obtain λ_n^(t) by simply exponentiating these samples.
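The prediction step in (0.18) amounts to one vectorized draw per retained iteration. In the sketch below, the function name is hypothetical and the "posterior draws" are synthetic stand-ins generated only so that the code runs; in practice they would be the stored MCMC output.

```python
import numpy as np

def predict_lambda(mu_draws, gamma_draws, z_draws, sigma2_eta_draws, rng):
    """Return one draw of lambda_n per MCMC iteration, following (0.18)."""
    n_iter, n = z_draws.shape
    mean = mu_draws[:, None] + gamma_draws[:, None] * z_draws
    noise = np.sqrt(sigma2_eta_draws)[:, None] * rng.standard_normal((n_iter, n))
    return np.exp(mean + noise)             # exponentiate log(lambda_n)

rng = np.random.default_rng(42)
n_iter, n = 500, 12
# Synthetic stand-ins for posterior samples (illustration only).
mu_draws = rng.normal(0.7, 0.1, size=n_iter)
gamma_draws = rng.uniform(1.2, 1.6, size=n_iter)
sigma2_eta_draws = rng.uniform(0.6, 1.0, size=n_iter)
z_draws = rng.standard_normal((n_iter, n))

lam = predict_lambda(mu_draws, gamma_draws, z_draws, sigma2_eta_draws, rng)
post_mean = lam.mean(axis=0)                # posterior mean at each grid point
post_sd = lam.std(axis=0, ddof=1)           # posterior standard deviation
```

Summaries such as `post_mean` and `post_sd` are the kind of quantities mapped over the prediction grid.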
Implementation
The MCMC simulation must be run long enough to achieve precise estimation of model parameters and predictions. For the BBS House Finch data,
the MCMC simulation was run for 200,000 iterations after a 50,000-iteration burn-in
period. For the sake of comparison, the algorithm took approximately 0.5 seconds per iteration with a MATLAB implementation on a 500 MHz Pentium
III processor running Linux. Considering there are nearly 20,000 prediction
locations and relatively strong spatial structure, this is quite fast. We examined many shorter runs to establish burn-in time and to evaluate model
sensitivity to the fixed parameters and starting values. The model does not
seem overly sensitive to these parameters.
0.3 Results
The posterior mean and posterior standard deviation for the scalar parameters are shown in Table 0.1.
Table 0.1 Posterior mean and standard deviation of univariate model parameters.
Parameter    Posterior Mean    Posterior Standard Deviation
µ            0.74              0.105
γ            1.41              0.138
σ_η^2        0.84              0.100
σ_e^2        0.23              0.064
θ_1          14.78             4.605
Figure 0.2 shows the posterior mean for the gridded z-process. We note
the agreement with the data shown in Figure 0.1. One might examine this
map to identify possible habitat covariates that are represented by the
spatial random field. One possibility in this case might be elevation and
population, both of which are thought to be associated with the prevalence
of the House Finch.
We note that the prediction grid extends beyond the continental United
States. Clearly, estimates over ocean regions are meaningless with regard
to House Finch data. These estimates are a result of the large-scale Fourier
coefficients in the model. Fortunately, the map of posterior standard deviations for this process, shown in Figure 0.3, indicates that the estimates in these regions
with no data are highly suspect. This is also true of the northern plains
region, which has few observations. Of course, having the prediction grid
extend over the ocean is not ideal in this case, but the FFT-based algorithm requires rectangular grids. We could control for the land-sea effect
by having an indicator covariate or possibly, a regime-specific model. Such
modifications would be simple to implement in the hierarchical Bayesian
framework presented here. However, simulation studies have shown that
these are not necessary and, if desired, one could simply mask the water
portions of the map for presentation.
Figure 0.2 Posterior mean of z_n for the 1999 BBS House Finch data.
Figure 0.3 Posterior standard deviation of z_n for the 1999 BBS House Finch data.
Finally, Figure 0.4 and Figure 0.5 show the posterior mean and standard
deviation of the λ-process on the prediction grid. These plots show clearly
that the posterior standard deviation is proportional to the predicted mean,
as expected with Poisson count data. In addition, the standard errors are
also high in data-sparse regions, as we expect.
0.4 Conclusion
In summary, we have demonstrated how Bayesian geostatistical-based GLMM
Poisson spatial models can be implemented in
problems with very large numbers of prediction locations. By utilizing relatively simple spectral transforms and their associated orthogonality and decorrelation properties, we are able to implement the modeling approach very efficiently
in general MCMC algorithms.
Figure 0.4 Posterior mean of gridded λ_n for the 1999 BBS House Finch data.
Figure 0.5 Posterior standard deviation of λ_n for the 1999 BBS House Finch data.
Acknowledgement
This research has been supported by a grant from the U.S. Environmental
Protection Agency’s Science to Achieve Results (STAR) program, Assistance Agreement No. R827257-01-0. The author would like to thank Andy
Royle for providing the BBS data and for helpful discussions.
References
Diggle, P.J., J.A. Tawn, and R.A. Moyeed. 1998. Model-based geostatistics
(with discussion). Applied Statistics 47:299-350.
Elliott, J.J., and R.S. Arbib. 1953. Origin and status of the house finch in
the eastern United States. Auk 70:31-37.
Link, W.A., and J.R. Sauer. 1998. Estimating population change from
count data: application to the North American Breeding Bird Survey.
Ecological Applications 8:258-268.
Obled, C., and J.D. Creutin. 1986. Some developments in the use of empirical orthogonal functions for mapping meteorological fields. J. Climate
and Applied Meteorology 25:1189-1204.
Robbins, C.S., D.A. Bystrak, and P.H. Geissler. 1986. The Breeding Bird
Survey: its first fifteen years, 1965-1979. USDOI, Fish and Wildlife Service Resource Publication 157. Washington, D.C.
Royle, J.A., W.A. Link, and J.R. Sauer. 2001. Statistical mapping of count
survey data. In Predicting Species Occurrences: Issues of Scale and Accuracy, (Scott, J. M., P. J. Heglund, M. Morrison, M. Raphael, J. Haufler,
B. Wall, editors). Island Press. Covelo, CA. (to appear)
Royle, J.A., and C.K. Wikle. 2001. Large-scale spatial modeling of breeding
bird survey data. Under review.
Sauer, J.R., B.G. Peterjohn, and W.A. Link. 1994. Observer differences in
the North American Breeding Bird Survey. Auk 111:50-62.
Sauer, J.R., G.W. Pendleton, and S. Orsillo. 1995. Mapping of bird distributions from point count surveys. Pages 151-160 in C.J. Ralph, J.R.
Sauer, and S. Droege, eds. Monitoring Bird Populations by Point Counts,
USDA Forest Service, Pacific Southwest Research Station, General Technical Report PSW-GTR-149.
Stein, M. 1999. Interpolation of Spatial Data: Some Theory for Kriging.
Springer-Verlag: New York.
Wikle, C.K., L.M. Berliner, and N. Cressie. 1998. Hierarchical Bayesian
space-time models. Journal of Environmental and Ecological Statistics
5:117-154.
Wikle, C.K., and N. Cressie. 1999. A dimension reduction approach to space-time Kalman filtering. Biometrika 86:815-829.
Wikle, C.K., R.F. Milliff, D. Nychka, and L.M. Berliner. 2001. Spatiotemporal hierarchical Bayesian modeling: Tropical ocean surface winds. Journal of the American Statistical Association 96:382-397.