This file was created by scanning the printed publication.
Errors identified by the software have been corrected;
however, some errors may remain.
Integrating Spatial Statistics With GIS and
Remote Sensing in Designing Multiresource
Inventories 1
Robin M. Reich 2
Vanessa A. Bravo 3
Abstract-In order to design an integrated multiresource inventory and monitoring system that evaluates the status and trends of natural resources (forest, rangeland, agriculture, wildlife, hydrology, soils, etc.), baseline data for comparison are needed. These systems are generally complex, and it may not be wise to select just one or two variables for monitoring purposes. Also, analyzing these variables independently of one another may lead to incorrect conclusions because of their interdependencies. One approach is to model the spatial relationships that exist between key variables. This information can then be used, for example, to identify forest habitats that are either conducive or detrimental to the presence of ecologically important plant and/or animal species. Techniques commonly used in describing spatial relationships between two or more variables include regression analysis and a variety of spatial and geostatistical procedures such as kriging and cokriging. Spatially explicit models can be used to monitor the efficiency of certain components of proposed management plans, as well as to provide a general prediction of how key indicator variables are changing in time and space. Such models also provide greater insight into changes in the landscape, on both the macro- and micro-scale, and more importantly, into the consequential impact these changes have on selected species. Theoretical and technical aspects of this approach are briefly described in this paper.
Spatial Modeling
An important problem facing natural resource managers
is the integration of several types of data when modeling
the spatial dynamics of an individual population. There are
two aspects to the problem: first, the integration of data
from different sources at a fine enough resolution, and
second, modeling the spatial dynamics of an individual
population.
The first aspect, data integration, has been researched extensively during the last decade. The most widely accepted procedure for integrating spatial data is the use of geographic information systems (GIS). GIS allow for the collection, storage, and analysis of objects and phenomena where geographic location is an important characteristic of, or critical to, the analysis (Aronoff 1991). GIS have been used for a variety of purposes, including the identification of suitable wildlife habitat, the scheduling of timber harvests, and the modeling of biodiversity and population dynamics (Lui et al. 1995). The integration of remotely sensed data and geographic information systems is becoming an extremely powerful tool for producing maps of ecosystem resources and has become vital to resource managers in making decisions and establishing policy (Aronoff 1991). The main obstacle in the development of a descriptive GIS model is the coarse-grained resolution of raster data.
1 Paper presented at the North American Science Symposium: Toward a Unified Framework for Inventorying and Monitoring Forest Ecosystem Resources, Guadalajara, Mexico, November 1-6, 1998.
2 Robin M. Reich is professor, Department of Forest Sciences, Colorado State University, Fort Collins, Colorado, 80521 USA.
3 Vanessa A. Bravo is researcher, Quantitative Spatial Analysis Company, Fort Collins, Colorado USA 80525.
Spatial Predictive Models
The ability to model the small scale variability in stand characteristics requires the generation of full-coverage maps depicting stand characteristics measured in the field. While remotely sensed data have been shown to provide reliable information for macro-scale ecological monitoring, they fall short in providing the precision required by more refined ecosystem resource models (Gown et al. 1994). Spatial statistics and geostatistics provide a means of developing spatial models that can be used to correlate remotely sensed imagery with field measurements. If a satellite image is geographically referenced to a base map, one can overlay the locations of field plots on the image to obtain the pixel intensities associated with each of the field plots. Thus, for each sample plot we have field data describing stand characteristics and seven intensities, one for each of the seven Landsat Thematic Mapper (TM) bands (Fig. 1A) (Aronoff 1991). If the field data are spatially correlated with the intensities of the remotely sensed image, it is possible to develop a model describing this spatial continuity (Cliff and Ord 1981). It is also possible to include geographical variables, such as elevation, slope, aspect, and precipitation, that are thought to influence the large scale spatial variability of the environmental property and that are available in the form of a complete coverage of the study area. The functional form of this model is defined as:

$$\phi_0 = \sum_{i+j \le p} \beta_{ij}\, x_{10}^{\,i}\, x_{20}^{\,j} + \sum_{k=1}^{q} \gamma_k\, y_{k0} + \eta_0 \qquad (1)$$

where $\beta_{ij}$ are the regression coefficients associated with the trend surface component of the model, $\gamma_k$ are the regression coefficients associated with the $q$ auxiliary variables, $y_{k0}$, available as a coverage in the GIS database, and $\eta_0$ is the error term, which may or may not be spatially correlated with its neighbors (Kallas 1997; Metzger 1997).
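As a rough illustration of Eq. 1, the sketch below enumerates the trend surface terms with i + j <= p and evaluates the deterministic part of the model at a single location. It assumes that the two x variables are the plot coordinates, which the text does not state explicitly. The function names, coefficients, and auxiliary values are hypothetical and are not taken from the study.

```python
def trend_exponents(p):
    """Return all exponent pairs (i, j) with i + j <= p, as in the first sum of Eq. 1."""
    return [(i, j) for i in range(p + 1) for j in range(p + 1 - i)]


def large_scale_prediction(x1, x2, beta, gamma, y):
    """Evaluate the deterministic part of Eq. 1 at one location.

    beta  : dict mapping (i, j) -> trend surface coefficient beta_ij
    gamma : list of coefficients for the q auxiliary variables
    y     : list of the q auxiliary values (e.g. band intensities) at the location
    """
    trend = sum(b * (x1 ** i) * (x2 ** j) for (i, j), b in beta.items())
    auxiliary = sum(g * yk for g, yk in zip(gamma, y))
    return trend + auxiliary


# Hypothetical first-order trend surface (p = 1) with two auxiliary variables.
beta = {(0, 0): 10.0, (1, 0): 0.5, (0, 1): -0.25}
gamma = [0.1, 2.0]
phi_hat = large_scale_prediction(x1=4.0, x2=8.0, beta=beta, gamma=gamma, y=[30.0, 0.5])
```

In practice the coefficients would be estimated by regression on the field plots; only the evaluation step is shown here.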
Once a spatial or temporal dependency is established for a given variable, this information can be used to interpolate values for points not measured (Robertson 1987). In most sample surveys, supplemental information is collected in addition to the variable of interest (e.g., average stand diameter, percent crown cover, food availability, etc.). If these variables are spatially correlated with the variable of interest, this information can be used to improve estimates (Isaaks and Srivastava 1989). The use of auxiliary information in spatial prediction is referred to as cokriging. The usefulness of auxiliary information is enhanced by the fact that the variable of interest is generally undersampled (Isaaks and Srivastava 1989).
USDA Forest Service Proceedings RMRS-P-12. 1999
Figure 1.-Gray scale maps of an 890 ha experimental forest northeast of Gainesville, Florida depicting: (A) digital numbers for band 5 of a Landsat image; (B) estimated basal area (m²/ha) at a 10 m resolution for a selected portion of the forest (R² = 0.77); (C) standard error of prediction of basal area (m²/ha) for a selected portion of the forest. The centers of the black circles are the approximate locations of the sample plots used in developing the spatial model for basal area (Metzger 1997).
To account for spatial autocorrelation in the residuals of the model developed to describe large scale spatial variability, we propose to model the small scale spatial variability (i.e., spatial noise) using the cokriging model:

$$\eta_0 = \sum_{r=1}^{n} w_r \eta_r + \sum_{t=1}^{m}\sum_{r=1}^{n} v_{tr} u_{tr} + \varepsilon_0 \qquad (2)$$

subject to the linear constraints:

$$\sum_{r=1}^{n} w_r = 1, \qquad \sum_{t=1}^{m}\sum_{r=1}^{n} v_{tr} = 0 \qquad (3)$$

where $w_r$ are the kriging weights associated with the $n$ nearest residuals, $\eta_r$; $v_{tr}$ are the cokriging weights associated with the $m$ auxiliary variables, $u_{tr}$, that are spatially correlated with the residuals; and $\varepsilon_0$ is the error term, which we assume to be spatially independent and normally distributed with mean 0 and variance $\sigma^2$. One of the appealing features of cokriging is that the auxiliary information does not have to be collected at the same data points as the variable of interest. This allows us to combine remote sensing and field data to provide a full coverage map with a higher resolution than would have been possible using either remote sensing or field data alone. In essence, remote sensing images provide information on large scale spatial variability, while field data provide information on small scale spatial variability.
Prior to fitting the cokriging model, the residuals of the model describing the large scale spatial variability are analyzed for anisotropy (i.e., spatial autocorrelation that changes with direction). The residuals are also evaluated for the presence of spatial cross-correlation (Bonham et al. 1995; Czaplewski and Reich 1993; Reich et al. 1994) with the independent variables included in the large scale model, or with variables for which only data associated with field plot locations are available, that is, variables for which complete coverage is not available in the GIS database. If no spatial cross-correlation is detected, the residuals can be modeled using ordinary kriging; otherwise, the residuals are modeled using cokriging.
Spatial dependency of the residuals can be modeled using the Gaussian semi-variogram

$$\gamma(h;\theta) = \begin{cases} 0, & h = 0 \\ c_0 + c_1\left(1 - \exp\left(-3\|h\|^2/a^2\right)\right), & h \neq 0 \end{cases} \qquad (4)$$

or some other appropriate model (spherical, exponential, linear, etc.), where $\theta = (c_0, c_1, a)$ is a vector of parameters subject to the constraints $c_0 \ge 0$, $c_1 \ge 0$, and $a \ge 0$. In modeling the cross-correlations between the residuals and independent variables, the constraints for the model are relaxed to allow the parameters $c_0$ and $c_1$ to take on negative values.
The parameters of the semi-variogram model are estimated by minimizing:

$$\sum_{j=1}^{k} n_j \left\{ 2\gamma^{*}(h(j)) - 2\gamma(h(j);\theta) \right\}^2 \qquad (5)$$

where $2\gamma^{*}(\cdot)$ is the sample variogram/cross-variogram obtained at $k$ lags $(h(1), \ldots, h(k))$, $n_j$ is the number of observations contributing to the estimator at each lag, and $2\gamma(h;\theta)$ is the corresponding model variogram with parameter vector $\theta = (c_0, c_1, a)$.
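The fitting criterion in Eq. 5 can be sketched in a few lines. The example below assumes the usual Gaussian form of Eq. 4, with squared distance in the exponent, and evaluates the weighted least squares criterion for a candidate parameter vector. The lag distances, pair counts, and parameter values are hypothetical. A nonlinear optimizer, not shown here, would be used to search for the minimizing parameters.

```python
import math


def gaussian_semivariogram(h, c0, c1, a):
    """Gaussian semi-variogram (Eq. 4): nugget c0, partial sill c1, range a."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-3.0 * h ** 2 / a ** 2))


def wls_objective(theta, lags, sample_variogram, counts):
    """Weighted least squares criterion (Eq. 5).

    lags             : lag distances h(1), ..., h(k)
    sample_variogram : sample variogram values 2*gamma*(h(j)) at those lags
    counts           : number of pairs n_j contributing at each lag
    """
    c0, c1, a = theta
    total = 0.0
    for h, two_gamma_star, n_j in zip(lags, sample_variogram, counts):
        two_gamma_model = 2.0 * gaussian_semivariogram(h, c0, c1, a)
        total += n_j * (two_gamma_star - two_gamma_model) ** 2
    return total


# Hypothetical lags and pair counts; the "sample" values are generated from
# a known parameter vector so that the criterion is zero at that vector.
true_theta = (0.1, 0.9, 300.0)
lags = [50.0, 100.0, 200.0, 400.0]
counts = [120, 210, 340, 280]
sample = [2.0 * gaussian_semivariogram(h, *true_theta) for h in lags]

at_truth = wls_objective(true_theta, lags, sample, counts)
off_truth = wls_objective((0.2, 0.7, 250.0), lags, sample, counts)
```

Weighting each lag by its pair count gives more influence to lags estimated from more observations.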
Prior to fitting the variogram and cross-variogram models, the residuals and independent variables are rescaled by dividing each variable by its maximum value (Carr and McCallister 1985). The predicted surface of scaled residuals obtained using kriging/cokriging is then rescaled back to its original units by multiplying the surface by the maximum observed residual. The rescaled surface of predicted residuals is then added to the predicted surface describing large scale spatial variability to create the final surface with the desired scale (Fig. 1B):

$$\phi_0 = \sum_{i+j \le p} \beta_{ij}\, x_{10}^{\,i}\, x_{20}^{\,j} + \sum_{k=1}^{q} \gamma_k\, y_{k0} + \sum_{r=1}^{n} w_r \eta_r + \sum_{t=1}^{m}\sum_{r=1}^{n} v_{tr} u_{tr} + \varepsilon_0 \qquad (6)$$

The kriging/cokriging surfaces can be cross-validated to assess the amount of variability in the prediction error of the kriging/cokriging system. Cross-validation involves deleting one observation from the data set and predicting the deleted observation using the remaining observations in the data set. This process is repeated for all observations in the data set. Residuals are computed as the observed minus predicted values and analyzed using standard techniques employed in regression analysis to evaluate the underlying assumptions of the model.
Overall performance of the final model (large scale model + kriged/cokriged residuals) can be evaluated by computing an R² value similar to that used in regression analysis (Kallas 1997):

$$R^2 = 1 - \frac{\sum_{j=1}^{n} \{\phi(s_j) - \hat{\phi}(s_j)\}^2}{\sum_{j=1}^{n} \{\phi(s_j) - \bar{\phi}\}^2} \qquad (7)$$

where $\phi(s_j)$ is the observed value at sample location $s_j$, $\hat{\phi}(s_j)$ is the corresponding predicted value, and $\bar{\phi}$ is the mean of the observed values.
In addition, response surfaces of predicted standard errors (Fig. 1C) for the final model can be computed using the following variance formula (Isaaks and Srivastava 1989):

$$\begin{aligned}
\mathrm{Var}(\varepsilon_0) ={}& \sum_{i=1}^{n}\sum_{j=1}^{n} w_i w_j\, \mathrm{Cov}(\eta_i \eta_j) + \sum_{i=1}^{m}\sum_{j=1}^{m} v_i v_j\, \mathrm{Cov}(\mu_i \mu_j) + 2\sum_{i=1}^{n}\sum_{j=1}^{m} w_i v_j\, \mathrm{Cov}(\eta_i \mu_j) \\
&- 2\sum_{i=1}^{n} w_i\, \mathrm{Cov}(\eta_i \eta_0) - 2\sum_{j=1}^{m} v_j\, \mathrm{Cov}(\mu_j \eta_0) + \mathrm{Cov}(\eta_0 \eta_0)
\end{aligned} \qquad (8)$$

where $\mathrm{Cov}(\eta_i \eta_j)$ is the autocovariance between the estimated environmental property at locations $i$ and $j$, $\mathrm{Cov}(\mu_i \mu_j)$ is the autocovariance between the auxiliary variables at locations $i$ and $j$, and $\mathrm{Cov}(\eta_i \mu_j)$ is the cross-covariance between the estimated environmental property at location $i$ and the auxiliary variable at location $j$.
Spatial Integration
The ability to spatially model field data allows one to integrate the data over any specified geographical region (e.g., stand, management unit, watershed, region, etc.) to obtain a point estimate and associated standard error of prediction. This is accomplished by integrating the three-dimensional response surface representing the variable of interest over the area of interest and dividing by the area. Since the spatially modeled response surfaces can be represented as a grid in ARC/INFO, any specified region will contain a finite number (n) of grid cells of uniform size (e.g., 10 m x 10 m). Our point estimate of a resource in some bounded region A is obtained by summing the point estimates associated with each cell, $\hat{\phi}_i$, and dividing by the number of cells in the bounded region:

$$\hat{\Phi} = \frac{1}{n}\sum_{i=1}^{n} \hat{\phi}_i, \qquad \hat{\phi}_i \in A \qquad (9)$$

The estimated variance is given by

$$V(\hat{\Phi}) = \frac{1}{n^2}\sum_{i=1}^{n} V(\varepsilon_i) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{\substack{j=1 \\ i \ne j}}^{n} \rho_{ij}(h)\, V(\varepsilon_i)^{1/2}\, V(\varepsilon_j)^{1/2} \qquad (10)$$

where $V(\varepsilon_i)$ is the estimated variance associated with cell $i$ (Eq. 8), and $\rho_{ij}(h)$ is the spatial correlation between cells $i$ and $j$, which are separated by distance $h$. The spatial correlation is estimated using the appropriate variogram function (Eq. 4) associated with the variable of interest. For example, if we apply Eqs. 9 and 10 to the small polygon (24.64 ha) located in the center of Figures 1B and 1C, we obtain a basal area estimate of 13.4 m²/ha with a bound on the error of estimation of 3.7 m²/ha at the 67% level of confidence.
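The regional estimate and its variance (Eqs. 9 and 10) reduce to sums over grid cells, which the following sketch makes concrete. The cell values, variances, and cell centers are hypothetical. The Gaussian correlation function and its range are assumptions for illustration only, since in the paper the correlation is derived from the fitted variogram.

```python
import math


def regional_estimate(cell_values):
    """Eq. 9: mean of the cell-level predictions inside the region."""
    return sum(cell_values) / len(cell_values)


def regional_variance(cell_variances, coords, correlation):
    """Eq. 10: variance of the regional mean, allowing for correlated cells.

    cell_variances : prediction variance V(e_i) for each cell
    coords         : (x, y) center of each cell, in the same order
    correlation    : function mapping a separation distance h to rho(h)
    """
    n = len(cell_variances)
    total = sum(cell_variances)
    for i in range(n):
        for j in range(n):
            if i != j:
                h = math.dist(coords[i], coords[j])
                total += correlation(h) * math.sqrt(cell_variances[i] * cell_variances[j])
    return total / n ** 2


def gaussian_correlation(h, a=300.0):
    """Assumed correlation function with a hypothetical range a (in meters)."""
    return math.exp(-3.0 * h ** 2 / a ** 2)


# Four hypothetical 10 m x 10 m cells with predicted basal area and variances.
values = [12.0, 14.0, 13.0, 15.0]
variances = [4.0, 4.0, 4.0, 4.0]
centers = [(5.0, 5.0), (15.0, 5.0), (5.0, 15.0), (15.0, 15.0)]

estimate = regional_estimate(values)
variance = regional_variance(variances, centers, gaussian_correlation)
independent = regional_variance(variances, centers, lambda h: 0.0)
```

With strongly correlated neighboring cells, the variance of the regional mean stays close to the single-cell variance, whereas treating the cells as independent would divide it by n. This is why the correlation term in Eq. 10 matters for small regions.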
Point Process Models
The second aspect, modeling the spatial dynamics of an
individual population, is a more recent development, especially with the increase in computing power which
makes it easier to perform intricate computations needed
to explore complex spatial patterns. One class of spatial
models that has received considerable attention in recent
years is the Gibbsian interaction model, which is often
referred to as Markov random fields (Ripley 1990; Cressie
1991). These models encompass conditional spatial
autoregression and a wide class of models for interacting
point patterns. The term Gibbsian interaction comes from
statistical mechanics, where such models have been used for
nearly a century to describe the behavior of gases (Ripley
1990; Cressie 1991). In most applications, interactions between events are assumed to be pairwise.
Examples of spatial stochastic models that take into
consideration the interaction among events include work
on sequential packing models of non-overlapping discs
(Matern 1960; Bartlett 1974; Diggle et al. 1976), Poisson
cluster models (Matern 1960; Diggle 1979), and Strauss-type and hard-core models (Strauss 1975; Kelly and Ripley
1976; Gates and Westcott 1980). While most of this work
has been theoretical, the increase in computing power has
contributed to progress in estimating the parameters of
these models using theoretical approximations to the likelihood function or computer simulations. Approximate maximum pseudo-likelihood procedures provide reasonable parameter estimates and are somewhat easier to apply than approximate maximum likelihood procedures (Ripley 1990). Nonparametric estimators of pairwise-interaction point processes for similar problems have also been developed (Diggle et al. 1987).
In developing these models it is assumed that we have
very specific information on the location of every individual within the population. This information may be
obtained from intensive monitoring research sites aimed
at studying very specific components of the environment.
For example, one might be interested in studying the spatial relationship of the northern goshawk, or selected plants, with their habitat. The plants and/or animals would be located in the field, georeferenced, and important variables thought to influence their presence measured. This information can then be used to model the spatial interaction of individual species (e.g., threatened and endangered plants and animals) with themselves, other species, and their environment using procedures developed by Reich et al. (1997).
Suppose one has a mapped spatial pattern of points in
a finite planar region. In the case of the northern goshawk, it is easy to identify potential habitat using environmental variables such as elevation, slope, and aspect along
with existing forest cover type maps. Even though suitable
habitat may be identified this does not mean that the species
will be present at that location. In habitats where the
goshawk is present, a pattern where individuals are rather
equally spaced from one another would be expected. Such a
pattern is called "regular".
One way to model this spatial interaction is to consider a
function of distances (rij) between individual sites of activity.
In such instances, it is customary to assume that the equilibrium system is statistically characterized by a Gibbs distribution of total potential energy (Cressie 1991):
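
$$U_N(X) = \sum_{i<j}^{N} \Psi(r_{ij}) \qquad (11)$$

where the points $X$ can be regarded as being distributed according to a Gibbs canonical distribution:

$$f(X) = \exp[-U_N(X)] / Z(\Psi; N) \qquad (12)$$

where $Z(\cdot)$ is a normalizing constant. For a single species population, a positive potential energy represents a repulsion between individuals, while a negative potential energy represents an attraction between individuals (Fig. 2). This model can be expanded to include more than one species:

$$U_N(X) = \sum_{i<j}^{N_1} \Psi_1(r_{ij}) + \sum_{i<j}^{N_2} \Psi_2(r_{ij}) + \sum_{i}^{N_1}\sum_{j}^{N_2} \Psi_{12}(r_{ij}) \qquad (13)$$

where $\Psi_1(r_{ij})$ and $\Psi_2(r_{ij})$ describe interactions between individuals of a given species and $\Psi_{12}(r_{ij})$ describes interactions between the two species. The approximate log likelihood of the pairwise potential (Eq. 11) is given by

$$\log L(\theta \mid X) = -\sum_{i<j} \Psi_\theta(r_{ij}) - \frac{N(N-1)}{2} \log\left(1 - \frac{a(\theta)}{V}\right) \qquad (14)$$

where $a(\theta)$ is the second cluster integral and $V$ is the area of the study region, and which is easily maximized using nonlinear optimization procedures. To use this relationship one needs to be able to mathematically describe the interaction potentials of a spatial point pattern. Three parameterized potential functions proposed by Ogata and Tanemura (1981, 1985) can be evaluated to describe the interactions observed in the distribution of active and inactive nest sites:

$$\text{PF1:}\quad \Psi_\theta(r) = -\log\left[1 + (\alpha r - 1)e^{-\beta r^2}\right], \qquad \theta = (\alpha, \beta),\ \alpha \ge 0,\ \beta > 0 \qquad (15)$$

$$\text{PF2:}\quad \Psi_\theta(r) = -\log\left[1 + (\alpha - 1)e^{-\beta r^2}\right], \qquad \theta = (\alpha, \beta),\ \alpha \ge 0,\ \beta > 0 \qquad (16)$$

$$\text{PF3:}\quad \Psi_\theta(r) = \beta(\sigma/r)^{12} - \alpha(\sigma/r)^{6}, \qquad \theta = (\alpha, \beta, \sigma),\ \beta > 0 \qquad (17)$$

The second cluster integrals, $a(\theta)$, for the three potential functions are given by:

$$\text{PF1:}\quad a(\alpha, \beta) = (\pi/\beta)\left(1 - \alpha\sqrt{\pi/\beta}\,/\,2\right) \qquad (18)$$

$$\text{PF2:}\quad a(\alpha, \beta) = \pi(1 - \alpha)/\beta \qquad (19)$$

$$\text{PF3:}\quad a(\alpha, \beta, \sigma) = -\frac{\pi \beta^{1/6} \sigma^2}{6} \sum_{k=0}^{\infty} \frac{1}{k!}\, \Gamma\!\left(\frac{6k - 2}{12}\right) \alpha^k \beta^{-k/2} \qquad (20)$$

All three models are capable of modeling both repulsive and attractive forces.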
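To make the potential functions concrete, the sketch below codes PF1 to PF3 (Eqs. 15 to 17) and compares the closed-form second cluster integrals for PF1 and PF2 (Eqs. 18 and 19) with a direct numerical evaluation. The numerical check assumes the standard planar definition of the second cluster integral, a = integral of 2*pi*r*(1 - exp(-psi(r))) dr, which the text does not write out. All parameter values are hypothetical, and the series in Eq. 20 is not implemented.

```python
import math


def pf1(r, alpha, beta):
    """PF1 (Eq. 15): -log[1 + (alpha*r - 1) * exp(-beta*r^2)]."""
    return -math.log(1.0 + (alpha * r - 1.0) * math.exp(-beta * r ** 2))


def pf2(r, alpha, beta):
    """PF2 (Eq. 16): -log[1 + (alpha - 1) * exp(-beta*r^2)]."""
    return -math.log(1.0 + (alpha - 1.0) * math.exp(-beta * r ** 2))


def pf3(r, alpha, beta, sigma):
    """PF3 (Eq. 17): beta*(sigma/r)^12 - alpha*(sigma/r)^6."""
    return beta * (sigma / r) ** 12 - alpha * (sigma / r) ** 6


def cluster_integral_pf1(alpha, beta):
    """Closed form of Eq. 18."""
    return (math.pi / beta) * (1.0 - alpha * math.sqrt(math.pi / beta) / 2.0)


def cluster_integral_pf2(alpha, beta):
    """Closed form of Eq. 19."""
    return math.pi * (1.0 - alpha) / beta


def cluster_integral_numeric(potential, r_max=10.0, steps=20000):
    """Midpoint-rule approximation of the assumed planar cluster integral."""
    dr = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * dr
        total += 2.0 * math.pi * r * (1.0 - math.exp(-potential(r))) * dr
    return total


def psi1(r):
    return pf1(r, alpha=0.5, beta=1.0)


def psi2(r):
    return pf2(r, alpha=0.0, beta=1.0)


a1_numeric = cluster_integral_numeric(psi1)
a2_numeric = cluster_integral_numeric(psi2)
a1_closed = cluster_integral_pf1(0.5, 1.0)
a2_closed = cluster_integral_pf2(0.0, 1.0)

# PF3 is repulsive (positive) at short range and attractive (negative) farther out.
short_range = pf3(0.9, alpha=1.0, beta=1.0, sigma=1.0)
long_range = pf3(1.2, alpha=1.0, beta=1.0, sigma=1.0)
```

The sign change in PF3 corresponds to the interpretation given for Fig. 2: positive potential indicates repulsion and negative potential indicates attraction.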
The pairwise potential models PF1-3 are fit to point data using a nonlinear least squares procedure to maximize the log likelihood (Eq. 14). The Akaike Information Criterion (AIC) (Akaike 1977) is used to select the model that minimizes the value of AIC among the three possible models. A model with a smaller AIC is considered to be a better fit. In the case of point patterns with two categories (e.g., active vs. inactive nests), AIC is computed for each of the three components in Eq. 13 ($\mathrm{AIC}_{11}$, $\mathrm{AIC}_{22}$, $\mathrm{AIC}_{12}$), and the best model for each component is selected independently. This is because the maximization of the approximate log likelihood with respect to the parameters is equivalent to the independent maximization of the individual components (Ogata and Tanemura 1985).
Figure 2.-Pairwise potential model (PF3) describing the spatial interaction of northern goshawk territories on the Kaibab National Forest in northern Arizona. The northern goshawk is territorial with a minimum distance of 1 km between territories; territories are spatially independent at approximately 2.1 km (Reynolds and Joy 1995, personal communication). Axes: pairwise potential versus distance (km).
As mentioned previously, just because an area is deemed
suitable for the presence of a particular species does not
mean that the species will be present. Within a given
habitat, the spatial distribution of active sites is influenced
by small scale spatial variability, such as differences in the
abundance of a food supply, plant competition, distances to
openings in the canopy, stocking levels, species compositions, etc. To include this small scale spatial variability in
the model one can redefine the total potential energy as
follows:

[Eq. 21: the summation terms are not legible in the scanned source.]   (21)

where $\Phi_1(r_j)$ and $\Phi_2(r_j)$ are measures of small scale spatial interaction. This model would allow us to describe the territoriality of the northern goshawk and its interaction with its immediate habitat. To model this component, the probability of habitat suitability can be defined at each of the grid points superimposed over the study area. This probability is computed as the ratio of the estimated density of nests, $\lambda(s)$, at spatial location $s$, to the maximum intensity observed in the study area. The potential energy associated with a given micro-habitat can be defined as

$$\Phi(r) = -\log\left(\frac{\lambda(s)}{\max \lambda(s)}\right) = f(\text{stand characteristics}) \qquad (22)$$
which can be regressed on individual stand characteristics
available in the GIS database. Large positive values would
indicate unsuitable habitats while small values would indicate suitable habitats (Reich et al. 1997).
Concluding Remarks
Such a model can be updated yearly with current information that quantifies the progress of the species in question. The information can range from the very specific, such as how the spatial location of the species is changing over time, to more general questions relating to the effects of food supply availability, natural forest succession, and silvicultural treatment. Information derived from the model could also be used to facilitate the efforts of field investigators studying the ecology of the selected species. This model, when combined with information on population dynamics and demographics and linked to a forest successional model, could provide land managers with valuable insight in developing management plans to guide the recovery efforts of a species. Such a model could be used to address species viability and minimum area requirements.
This is a unique approach to modeling the spatial distribution of threatened and endangered species, such as the northern goshawk, whose existence is related to past land management activities. Spatially explicit models can be used to monitor the efficiency of certain components of the recovery plan as well as to provide a general prediction of how the population is changing in time and space. In this pilot study, the modeling approach suggested above is worthwhile in developing an ecosystem maintenance and preservation program because it provides greater insight into changes in the landscape, on both the macro- and micro-scale, and more importantly, into the consequential impact these changes have on selected species.
References
H. Akaike, "On entropy maximization principle," In Applications of Statistics, P. R. Krishnaiah (ed.), pp. 27-41. North-Holland, Amsterdam, 1977.
S. Aronoff, Geographic Information Systems: A Management Perspective. WDL Publications, Ottawa, Ontario, 1991.
M. S. Bartlett, "The statistical analysis of spatial patterns," Advances Appl. Prob., Vol. 6, pp. 336-358, 1974.
C. D. Bonham, R. M. Reich, and K. K. Leader, "Spatial cross-correlation of Bouteloua gracilis with site factors," Grasslands Science, Vol. 41, pp. 196-201, 1995.
J. R. Carr, and P. G. McCallister, "An application of cokriging for estimation of tripartite earthquake response spectra," Math. Geology, Vol. 17, pp. 527-545, 1985.
R. L. Czaplewski, and R. M. Reich, Expected value and variance of Moran's bivariate spatial autocorrelation statistic under permutation, Research Paper RM-309. U.S. Department of Agriculture, Rocky Mountain Experimental Range Station, Fort Collins, CO, 1993.
A. Cliff, and J. K. Ord, Spatial processes, models and applications. Pion, Ltd., London, 1981.
N. Cressie, Statistics for spatial data. John Wiley & Sons, New York, 1991.
P. J. Diggle, J. Besag, and J. T. Gleaves, "Statistical analysis of spatial point patterns by means of distance methods," Biometrics, Vol. 32, pp. 659-667, 1976.
P. J. Diggle, "On parametric estimation and goodness-of-fit testing for spatial point patterns," Biometrics, Vol. 35, pp. 87-101, 1979.
P. J. Diggle, D. J. Gates, and A. Stibbard, "A nonparametric estimator for pair-wise interaction point processes," Biometrika, Vol. 74, pp. 763-770, 1987.
D. J. Gates, and W. Westcott, "Further bounds for the distribution of minimum interpolation distance on a sphere," Biometrika, Vol. 67, pp. 446-469, 1980.
S. N. Gown, R. H. Waring, D. G. Dye, and J. Yang, "Ecological remote sensing at OTTER: Satellite Macroscale Observation," Ecological Applications, Vol. 4, pp. 322-343, 1994.
E. H. Isaaks, and R. M. Srivastava, An introduction to applied geostatistics. Oxford University Press, New York, 1989.
M. Kallas, Hazard rating of Armillaria root rot on the Black Hills National Forest, M.S. Thesis, Department of Forest Sciences, Colorado State University, Fort Collins, CO 80523, 1997.
F. P. Kelly, and B. D. Ripley, "A note on Strauss's model for clustering," Biometrika, Vol. 63, pp. 357-360, 1976.
J. Lui, J. B. Dunning, Jr., and H. R. Pulliam, "Potential effects of a forest management plan on Bachman's Sparrows (Aimophila aestivalis): Linking a Spatially Explicit Model with GIS," Conservation Biology, Vol. 9, pp. 62-75, 1995.
B. Matern, "Spatial variation," Medelanden from Statens Skogsforsknings Institut., Vol. 49, No. 5, pp. 1-44, 1960.
K. Metzger, Modeling small-scale spatial variability in stand structure using remote sensing and field data. M.S. Thesis, Department of Forest Sciences, Colorado State University, Fort Collins, CO 80523, 1997.
Y. Ogata, and M. Tanemura, "Estimation of interactive potentials of spatial point patterns through the maximum likelihood procedure," Ann. Instit. Statist. Math. Part B, Vol. 33, pp. 315-338, 1981.
Y. Ogata, and M. Tanemura, "Estimation of interactive potentials of marked spatial point patterns through the maximum likelihood method," Biometrics, Vol. 41, pp. 421-433, 1985.
R. M. Reich, C. D. Bonham, and K. Metzger, "Modeling small-scale spatial interaction of shortgrass prairie species," Ecological Modelling, Vol. 101, pp. 163-174, 1997.
R. M. Reich, R. L. Czaplewski, and W. A. Bechtold, "Spatial cross-correlation in growth of undisturbed natural shortleaf pine stands in northern Georgia," J. Environ. and Ecol. Stat., Vol. 1, pp. 201-217, 1994.
R. Reynolds, and S. Joy, Personal Communication, USDA Forest Service, Rocky Mountain Forest and Range Experiment Station, 218 West Prospect, Fort Collins, CO 80523, 1997.
B. D. Ripley, "Gibbsian interaction models," In D. A. Griffith (editor), Spatial Statistics: Past, Present, and Future, Institute of Mathematical Geography, Syracuse University, New York, pp. 3-25, 1990.
G. P. Robertson, "Geostatistics in ecology: Interpolating with known variance," Ecology, Vol. 68, pp. 744-748, 1987.
D. J. Strauss, "A model for clustering," Biometrika, Vol. 62, pp. 467-475, 1975.