This file was created by scanning the printed publication.
Errors identified by the software have been corrected;
however, some errors may remain.
Understanding the Spatial
Distribution of Tree Species in
Pennsylvania
Rachel Riemann Hershey¹
Abstract.--Current, accessible information on the distribution of tree
species would aid in the understanding and management of ecosystems.
However, such detailed information on forest composition is only
available from ground inventory. Geostatistical techniques are used here
to create an interpolated dataset, a 'map' of individual species
distribution, from known sample information. In a previous study, we
found that indicator kriging and sequential gaussian conditional
simulation (sgCS) were promising tools for estimating sugar maple
distribution from the USDA National Forest Inventory and Analysis
(FIA) data. The techniques provided an estimate of species occurrence
and a measure of uncertainty associated with that estimate, while
retaining much of the local variability present in the sample data. In this
study, these techniques are applied to 9 additional species in
Pennsylvania. Four output datasets are available for each species--the
probability of species occurrence, an estimate of its relative abundance,
and a plus and minus level of uncertainty associated with that estimate.
The datasets, used in conjunction with one another, provide the user with
considerable flexibility in setting up the balance of errors of omission
and commission that best suit the analysis under consideration.
Similarities and differences between the species are identified and
discussed as to their possible effect on the final estimates. Examples of
how the datasets can be used are also presented. Indicator kriging and
sgCS, used in conjunction with FIA sample data, provide a relatively
straightforward technique to describe species occurrence and relative
density across a state.
INTRODUCTION
Data describing forest composition--so desired as a basic data source for many
aspects of ecosystem analyses, models, and management--are generally
unavailable and/or are stored in fixed forest-type categories. But forest
communities are often not well characterized by the discrete categories imposed
by forest cover type divisions. Inherent in each forest type category is an entire
continuum (usually multi-dimensional) of different species and their relative
importance.
¹ Northeastern Forest Experiment Station, USDA Forest Service, 5 Radnor Corporate Center, Radnor, PA 19087-4585.
In a previous study, we compared the geostatistical techniques available to
interpolate FIA sample data to create a 'map' of tree species distribution.² The
tools of ordinary kriging, multigaussian kriging, indicator kriging, and sequential
gaussian conditional simulation (sgCS) were used to estimate the occurrence and
distribution of sugar maple in Pennsylvania. After considering the phenomenon
being examined, the sample data being used, and the kind(s) of output desired in
this study, we decided that indicator kriging and sgCS were the best
interpolation tools for:
a) providing an estimate of sugar maple occurrence,
b) providing an estimate of sugar maple 'importance' in terms of %ba/acre,
c) providing a measure of uncertainty associated with the two estimates,
d) maintaining local variability,
e) maintaining the characteristics of the original sample data, and
f) handling sample data with highly skewed distributions.
a) An estimate of each species' presence or absence was provided by indicator
kriging. An indicator transform divides the data into two classes--either above or
below a designated cutoff value; in this study the cutoff was 0%ba/acre, indicating presence or
absence. Indicator kriging calculates for each cell an estimate of the probability
that it falls above or below the cutoff value. The output dataset thus indicated the
probability that sugar maple occurred at each location.
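The indicator transform described in (a) can be illustrated with a minimal sketch. The plot values and the function name are hypothetical, and only the transform is shown, not the kriging step that follows it.

```python
def indicator_transform(values, cutoff=0.0):
    """Return 1 where a value lies above the cutoff, else 0."""
    return [1 if v > cutoff else 0 for v in values]

# Hypothetical %ba/acre values for one species on six plots.
pct_ba = [0.0, 12.5, 0.0, 3.1, 47.0, 0.0]

presence = indicator_transform(pct_ba)      # cutoff of 0%ba/acre
frequency = sum(presence) / len(presence)   # share of plots where the species occurs
print(presence, frequency)
```

Indicator kriging then interpolates these 0/1 values, so that each cell receives a value between 0 and 1 that is read as a probability of occurrence.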
b) The second piece of information desired was an estimate of the relative
amount of sugar maple at that location--i.e., whether the species represented a
minor, moderate, or a major component of the total ba/acre on the plot at that
location. Sequential gaussian conditional simulation determines multiple
estimates for each cell. All are equally probable, yet alternative, realizations
of the data determined from multiple simulation runs. From this set of estimates,
an entire distribution can be built for each cell, representing the range of possible
values. A summary statistic such as the mean or median of this distribution can
then be chosen and used as the modeled 'estimate' of %ba/acre for that cell.³
c) A level of uncertainty is always associated with any estimate. Knowing how
much uncertainty exists will help the user identify whether that uncertainty is
acceptable for a specific task and how the data can be used. Correspondingly,
knowing how much uncertainty exists will help the producer identify areas in
which additional sampling would most improve the estimates. For estimates of
species' presence/absence, indicator kriging provides this information in terms of
a probability. For estimates of %ba/acre, summary statistics such as standard
deviation or inter-quartile range were calculated from the distribution of simulated
values to describe the variation associated with the %ba/acre estimate for each
cell.
² Hershey, R. Riemann, M.A. Ramirez, and D.A. Drake. Using geostatistical techniques to estimate the distribution and
relative density of individual tree species in Pennsylvania. Unpublished report on file at USDA Forest Service,
Northeastern Forest Experiment Station, Forest Inventory and Analysis Unit, Radnor, PA.
³ The methods are described in Rossi et al. 1993 and Isaaks and Srivastava 1989; the analysis was performed using
GSLIB routines (Deutsch and Journel 1992) with some additional routines written by R.E. Rossi.
d) Tree species in Pennsylvania exhibit a high level of local variation as a result
of natural environmental factors and land use histories. At the intensity of
sampling present in the FIA sample data, much of this local variation cannot be
modeled and effectively predicted in the interpolation process, but instead appears
as variation that is unexplained by neighboring plots. However, such local
variability is an important characteristic of the distribution of a species. Thus, we
did not want this local variability to become hidden behind a regional average of
the resource, but to remain as apparent and accessible to the user as possible in the
final estimated dataset(s). Sequential gaussian conditional simulation was the
most effective of the interpolation methods at maintaining local variation.
e) One feature of a well-designed sampling scheme is that it is sensitive to and
can report, with an acceptable level of error, the characteristics of the phenomena
of interest. In this ideal situation, the characteristics of the sample data represent
reasonably well those characteristics of the phenomena itself. Every estimation
technique honors and maintains different aspects of the original data. The specific
goals of the interpolation task at hand will determine the priorities, but in general
the more characteristics of the sample data that are preserved in the estimated
dataset, the more desirable the dataset. Sequential gaussian conditional simulation
again did the best job of maintaining both the univariate and bivariate
characteristics of the sample data.
f) As is true with many plant and animal populations, tree species have
population distributions that are distinctly skewed toward younger individuals--more
small trees than large mature ones. In addition, Pennsylvania, like most of
the northeastern states, contains primarily mixed forests. Individual species rarely
occur in pure stands. The 10 species examined in both studies included 8 of the
most common by volume in Pennsylvania, and yet more than 50% of the time,
when a species occurred on a plot it occurred as only a minor component (here
defined as making up less than 20% of the total ba/acre on that plot). Both factors
are combined in the %ba/acre 'relative importance' value, resulting in a highly
skewed frequency distribution. Such extreme characteristics in the sample data
can cause difficulties and biases when used with some of the interpolation
methods that depend on assumptions about the normality of the distribution of the
sample data (Isaaks and Srivastava 1989). One particular advantage of indicator
kriging is that it makes no assumptions about the distribution of the data.
Sequential gaussian conditional simulation, on the other hand, does assume that
the data are normally distributed and stationary, and must be used more carefully.
The sgCS routine used here, from Deutsch and Journel (1992), performs a 1-1,
invertible normal-score transform on the data before running the simulation. In
addition, however, the data also should be checked for binormality and a decision
made as to whether to assume multivariate normality before the results of
conditional simulation are accepted (Rossi et al. 1993).
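A one-to-one, invertible normal-score transform of the kind mentioned in (f) can be sketched as follows. This is a rank-based illustration under stated assumptions, not the GSLIB routine itself: the plotting position (rank + 0.5)/n and the breaking of ties by input order are choices made for this sketch, and the data values are hypothetical.

```python
from statistics import NormalDist

def normal_score_transform(values):
    """Map each value to a standard-normal quantile according to its rank.

    Ties are broken by input order so the mapping stays one-to-one;
    that tie rule is an assumption of this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort keeps input order for ties
    scores = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n              # plotting position strictly inside (0, 1)
        scores[i] = NormalDist().inv_cdf(p)
    return scores

def back_transform(score, values):
    """Return the data value whose rank corresponds to a normal score."""
    n = len(values)
    p = NormalDist().cdf(score)
    rank = min(n - 1, max(0, round(p * n - 0.5)))
    return sorted(values)[rank]

# Hypothetical, strongly skewed %ba/acre values.
pct_ba = [0.0, 0.0, 0.0, 2.0, 5.0, 40.0]
scores = normal_score_transform(pct_ba)
restored = [back_transform(s, pct_ba) for s in scores]
```

The scores are symmetric about zero regardless of how skewed the input is, which is why the transformed data suit a method that assumes normality. The transform guarantees only univariate normality, so the binormality check described in the text is still required.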
The data exploration techniques used also proved invaluable to understanding
the spatial characteristics of the species/variable being examined. Techniques
included univariate analysis, variograms and other spatial dependence analyses,
and calculating local statistics. The resulting information was critical not only in
determining what interpolation methods were most suitable and for checking the
sample data for errors, but also for understanding the characteristics of the sample
data and thus the phenomena being investigated. Geostatistical techniques
offered ways to explore, organize, and summarize spatial patterns in the data that
can provide clues to the variation and spatial behavior of the individual species
under investigation.
Applying the techniques to 10 species in Pennsylvania
Because of the promising results from using geostatistical techniques for
estimating the distribution of sugar maple from FIA data in the previous study, the
same geostatistical methods were applied to nine additional species: red oak,
white oak, chestnut oak, black oak, hemlock, red maple, beech, white pine, and
yellow birch. This list includes 8 of the top 10 most abundant species in
Pennsylvania by volume, and two species (yellow birch and white pine) that are
much less common (Alerich 1993).
Tree species distribution is affected by many factors, including both
environmental conditions and direct human influence through harvesting and
other land use histories. As a result of being differentially affected by all of these
factors, each species will exhibit different patterns and scales of spatial
distribution. Some of these factors occur at scales much smaller than the sampling
intensity of the FIA data, and some occur over larger areas, representing broad-scale
variation in the species distribution. In the previous study, it was found that
a substantial amount of variation in sugar maple distribution was resolved at the
sampling scale used for the FIA plots. This spatial dependence could, therefore,
be modeled and used to support estimates of species occurrence and relative
'importance' (%ba/acre). The goal of this study is to examine to what extent this
is true for the other species. More specifically, the objectives of this study are to determine:
a) whether spatial dependence is exhibited at this intensity/scale of sampling,
b) the resulting spatial distribution for each species, and how that compares to
our current understanding,
c) how the species differ from one another in terms of spatial dependence and
distribution, and how that affects our ability to estimate them, and
d) how to use the resulting estimated datasets.
DATA
The sample data were collected by the Northeastern Forest Experiment Station's
Forest Inventory and Analysis (FIA) unit. Basal area--the summed cross-sectional
area at breast height--is calculated for all live trees 1.0 inches DBH or larger on
the plot (Hansen et al. 1992). The data were for individual tree species, by basal
area (ba) per acre as a proportion of the total basal area (%ba/acre). The data
were accessed from individual tree records in the USFS Eastwide tree-level
database and summarized as %ba/acre for each species by plot. In Pennsylvania,
there were a total of 5,100 plots. Nonforested plots and those with total ba/acre
equal to zero (due to missing data) were removed--leaving only 2,905 plots.
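The summary of tree records into %ba/acre by species and plot can be sketched as below. The record layout and the trees-per-acre expansion factor are assumptions made for illustration and are not the Eastwide database format. The constant 0.005454 converts DBH in inches to basal area in square feet.

```python
from collections import defaultdict

BA_CONST = 0.005454  # square feet of basal area per (inch of DBH) squared

def percent_ba_by_species(tree_records):
    """Summarize tree records as %ba/acre per species on each plot.

    Each record is (plot_id, species, dbh_inches, trees_per_acre); this
    layout is hypothetical.
    """
    ba = defaultdict(lambda: defaultdict(float))
    for plot_id, species, dbh, tpa in tree_records:
        if dbh >= 1.0:                        # trees 1.0 inch DBH or larger
            ba[plot_id][species] += BA_CONST * dbh ** 2 * tpa
    result = {}
    for plot_id, by_species in ba.items():
        total = sum(by_species.values())
        if total > 0:                         # plots with zero total ba/acre are dropped
            result[plot_id] = {sp: 100.0 * v / total for sp, v in by_species.items()}
    return result

records = [
    (1, "sugar maple", 10.0, 6.0),
    (1, "beech", 10.0, 2.0),
    (1, "hemlock", 0.5, 75.0),   # below the 1.0-inch threshold, ignored
]
pct = percent_ba_by_species(records)
```

Because the value is a ratio of basal areas on the same plot, it describes relative importance rather than absolute stocking.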
METHODS
Each species was examined entirely independently. As in the previous study,
the data for each species were organized, summarized, and explored using
univariate statistics, measures of spatial dependence (variogram, covariance, and
correlogram), and spatial distribution of local statistics across the state. All
species were similar in many of their basic characteristics to each other and to the
previously investigated species, sugar maple. Each exhibited extremely skewed
distributions, with more than 50% of the plots containing less than 1%ba/acre in
every species except red maple and red oak. A variogram was calculated for both
the raw sample data and for a 1-1, invertible normal-score transform of the sample
data, using a lag distance of 500 m and no directional component (anisotropy). In
every instance, the variogram of the normal-scored data exhibited considerably
more spatial dependence and structure than that of the raw data (Figure 1),
revealing spatial characteristics that were hidden by the strong univariate
characteristics of the data. As sgCS uses normal-scored data, it was the model
fitted to the normal-scored variogram that was used in the conditional simulation.
An indicator variogram also was calculated and modeled for use in the indicator
kriging. In general, there was far less structure and less of the variation explained
in the indicator variogram (32 to 57%) than in the normal-scored regular
variogram (35% for red maple and 64 to 97% for the other species) (Table 1). To
assess how areas of 'local'
variability in the sample data changed across the state, the mean and standard
deviation were calculated for each of the 23,400 cells of 3,000 x 3,000 m, using a 15 x
15 km area as the window defining the size of the 'local' area. All species
exhibited a proportional effect, with areas of high mean corresponding with areas
of high local standard deviation, indicating a lack of stationarity. Using normal-scored
data seemed to largely eliminate this situation.
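An omnidirectional experimental variogram of the kind used in this exploration can be computed as in the following sketch. It uses the standard estimator, half the mean squared difference of all pairs in each distance class. The plot coordinates and values are hypothetical, and the function is not the GSLIB routine used in the study.

```python
from math import hypot

def experimental_variogram(points, lag, n_lags):
    """Omnidirectional semivariogram: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    points is a list of (x, y, z). Pairs are binned by separation distance
    into n_lags classes of width lag; empty classes return None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            k = int(hypot(xi - xj, yi - yj) // lag)   # index of the distance class
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical plots on a line, 1,000 m apart, with a steady trend in z.
plots = [(0.0, 0.0, 1.0), (1000.0, 0.0, 2.0), (2000.0, 0.0, 3.0), (3000.0, 0.0, 4.0)]
gamma = experimental_variogram(plots, lag=1000.0, n_lags=4)
```

Applying the same function to indicator values or to normal scores gives the indicator and normal-scored variograms, respectively.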
Table 1. The percent variation explained by the spatial dependence in the variograms.
Species         Indicator variogram    Normal-scores variogram
Beech           46                     76
Black oak
Chestnut oak
Hemlock
Red maple
Red oak
Sugar maple
White oak
White pine
Yellow birch
Indicator kriging and sgCS were run for each species, using models derived
from the appropriate variograms. The estimation parameters of cell size (3,000 m),
search radius (10,000 m), and minimum:maximum number of points used (1:16)
were taken directly from the results of the previous study. In sgCS, 50
simulations were run for each species.
Figure 1. Spatial dependence as demonstrated by the variogram of the raw data (left side)
and variogram of the normal-scored data (right side) for white pine. The horizontal axis of
each panel is distance (m), from 0 to 100,000.
RESULTS
With the exception of red maple, all species examined demonstrated substantial
spatial dependence in the variogram of normal-scored data, with 64 to 97% of the
variation explained by the visible structure and capable of being modeled. In
some species, much of that spatial dependence was contained in a very long-range
trend of about 100,000 m. White oak, black oak, chestnut oak, and beech all fell
into this category. The rest of the species appeared to split the bulk of the
explained spatial dependence over 2 ranges. For red oak, this was both a short- and
medium-range pattern (12,000 and 40,000 m); sugar maple a very short- and a
long-range pattern (2,100 and 60,000 m); and yellow birch a medium- and long-range
pattern (19,000 and 80,000 m) (Figure 2). The spatial dependence exhibited
in the indicator variograms was much less, ranging from only 32 to 57% of the
variation explained. This is not ideal for interpolation and suggests the necessity
for further refinement.
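A variogram model that splits the explained spatial dependence over two ranges can be sketched as a nugget plus two nested structures. The spherical shape, the sill values, and the definition of percent variation explained as the structured share of the total sill are all assumptions of this sketch; the text does not state them. Only the two ranges echo the red oak pattern reported above.

```python
def spherical(h, sill, rng):
    """Spherical structure: rises from 0 and reaches its sill at distance rng."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)

def nested_model(h, nugget, structures):
    """Nugget plus a sum of spherical structures given as (sill, range) pairs."""
    if h == 0:
        return 0.0
    return nugget + sum(spherical(h, s, a) for s, a in structures)

def percent_explained(nugget, structures):
    """Structured share of the total sill, in percent (assumed definition)."""
    structured = sum(s for s, _ in structures)
    return 100.0 * structured / (nugget + structured)

# Hypothetical parameters on normal-scored data (total sill near 1).
nugget = 0.3
structures = [(0.3, 12000.0), (0.4, 40000.0)]   # (sill, range in m)

explained = percent_explained(nugget, structures)
at_long_distance = nested_model(100000.0, nugget, structures)
```

Under this definition, a larger nugget relative to the total sill means less of the variation is explained by modeled spatial structure.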
Figure 2a-d. Variograms from the normal-scored data for four of the species. Panel (a)
is beech; each panel plots the sample data and the model against distance (m), from 0 to
100,000.
The results create several output datasets. Figure 3a-d shows the results for
beech, using 4 datasets to represent the species. Part (a) shows the estimated
probability of beech occurrence, as calculated from indicator kriging. Part (b) is
the median value of the 50 sgCS simulations, representing the chosen estimate of
beech relative 'importance' in %ba/acre. The uncertainty associated with that
estimate (here chosen to be a percentile range capturing approximately 2/3 of the
distribution) is described in (c) and (d). Part (c) expresses the minus variation, or
that distance in %ba/acre values between the median and the bottom of that range
(the 17th percentile), and part (d) expresses the plus (+) variation (83rd-50th
percentiles). In every instance, the + variation is much greater.
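The per-cell summary described above (median, minus variation, and plus variation) can be sketched as follows. The simulated values are hypothetical, and the linear-interpolation rule for percentiles is an assumption, since the text does not say how percentiles were computed.

```python
def percentile(sorted_vals, q):
    """Percentile by linear interpolation between order statistics (q in 0-100)."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def summarize_cell(realizations):
    """Median estimate with minus (median - 17th) and plus (83rd - median) variation."""
    vals = sorted(realizations)
    median = percentile(vals, 50)
    return {
        "median": median,
        "minus": median - percentile(vals, 17),
        "plus": percentile(vals, 83) - median,
    }

# Hypothetical simulated %ba/acre values for one cell (the study used 50 per cell).
cell = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0, 5.0, 9.0, 14.0, 30.0, 42.0]
summary = summarize_cell(cell)
```

With a right-skewed set of realizations such as this one, the plus variation exceeds the minus variation, matching the asymmetry noted in the text.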
Figure 3. Four estimated datasets describing the distribution of beech in Pennsylvania: a) the
estimated probability of occurrence using indicator kriging, b) the median values from 50
sequential gaussian conditional simulations, c) the minus variation (median-17th percentile).
and d) the plus variation (83rd percentile-median) about the median estimate.
DISCUSSION
The probability that a species occurs is a unique and useful dataset. Any
probability can be used as the cutoff to create a species presence/absence map
depending on the objectives of the task at hand. If, for example, a particular
insect is known to live in chestnut oak forests and the objective is to limit the
search to only those areas where there is a high probability of finding suitable
conditions, we might set the cutoff at the probability level of ≥ 0.8. If, however,
we are most interested in not missing any areas where the insect occurred, we
might set the probability level for forest much lower, say ≥ 0.4.
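The effect of the two cutoffs can be seen in a minimal sketch that thresholds a small grid of probabilities. The grid values and the function name are hypothetical.

```python
def presence_map(prob_grid, cutoff):
    """Mark a cell as present (1) when its estimated probability meets the cutoff."""
    return [[1 if p >= cutoff else 0 for p in row] for row in prob_grid]

# Hypothetical indicator-kriging probabilities for a 2 x 3 block of cells.
prob = [[0.95, 0.62, 0.41],
        [0.85, 0.30, 0.10]]

strict = presence_map(prob, 0.8)    # fewer errors of commission, more of omission
lenient = presence_map(prob, 0.4)   # fewer errors of omission, more of commission

cells_strict = sum(map(sum, strict))
cells_lenient = sum(map(sum, lenient))
```

Lowering the cutoff can only add cells to the presence map, so the choice of cutoff directly trades one kind of error for the other.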
Figure 4. Chestnut oak occurrence as a
major (>40%ba/acre), moderate (20-40%),
and minor component (<20%) of the total
ba/acre. Derived from sgCS estimates using
the 75th percentile.
Figure 5. The uncertainty associated with
the estimates used in Figure 4, in classes of
acceptable (≤ ±10%ba/acre), moderate (±10-40%),
and unacceptable (> ±40%). Derived
from the sgCS estimates.
The distribution of densities at which a species occurs also can be of
considerable interest. For example, Figure 4 is a plot produced from sgCS
estimates illustrating where chestnut oak occurs as a minor, moderate, and major
component. The three categories have been defined as 1 to 20%ba/acre, 20 to
40%ba/acre, and >40%ba/acre, respectively. The %ba/acre estimate in Figure 4 can be
associated with a corresponding uncertainty dataset (Figure 5) in which the
uncertainty classes are broken into acceptable, moderate, and unacceptable.
Again, the dataset itself is unclassed, and the user defines these--all the building
blocks are provided in the output from sgCS.
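Because the classes are defined by the user, the classification step can be kept generic, as in this sketch. The abundance breaks follow the categories given in the text; the uncertainty breaks, the treatment of values below 1%ba/acre as absent, and the cell values are illustrative assumptions.

```python
from bisect import bisect_left

def classify(value, breaks, labels):
    """Return the label of the class that value falls in.

    breaks are ascending, inclusive upper bounds for all but the last class.
    """
    return labels[bisect_left(breaks, value)]

ABUNDANCE = ([20.0, 40.0], ["minor", "moderate", "major"])
UNCERTAINTY = ([10.0, 40.0], ["acceptable", "moderate", "unacceptable"])  # illustrative breaks

# Hypothetical (estimate, uncertainty) pairs in %ba/acre for four cells.
cells = [(0.4, 3.0), (12.0, 8.0), (33.0, 25.0), (55.0, 60.0)]

classed = [
    ("absent" if est < 1.0 else classify(est, *ABUNDANCE), classify(unc, *UNCERTAINTY))
    for est, unc in cells
]
```

Changing the break values reclassifies the same unclassed estimates without rerunning the simulation.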
Alternatively, the same %ba/acre dataset(s) can be used to create a forest cover
type 'map.' For this purpose, the summary statistic (whether mean, median, or
another of the percentiles of each cell's simulated distribution) can be chosen
specifically for the intended purpose in the same way the different levels of
probability could be chosen when mapping species presence/absence from
indicator kriging estimates. If the objective is to reduce the error of commission
(i.e., classifying areas as sugar maple/beech (SM/B) in the estimated map that
really are not SM/B), then using a percentile at the lower end of each cell's
distribution would be more desirable. If, however, the objective is to reduce the
error of omission (i.e., missing areas that do contain SM/B), then using a higher
percentile from the distribution, such as the 75th, would be more desirable.
The %ba/acre estimates of three species were further refined. Two species,
white pine and yellow birch, nearly disappeared in the initial sgCS estimate even
though they showed up as present in the probability of occurrence estimate from
the indicator kriging. This was considered suspicious, so the dataset for each of
these two species was divided into several different populations by region, and the
process of variogram modeling and interpolation was repeated for each region.
The regionalized variogram was substantially different from the average
variogram, and when sgCS was performed using the locally tuned models, it
revealed an entirely different and much more credible picture of white pine
distribution in Pennsylvania. The third species, hemlock, was refined based
primarily on the suspicion that small and large stand-size classes may have
different spatial distribution patterns. When variograms were calculated
separately for these two populations (using a cutoff of 45 years), they were indeed
substantially different in shape, sill, nugget, and range.
CONCLUSIONS
The estimated datasets output by indicator kriging and sgCS have the potential
to be very useful. Every species examined, with the exception of red maple,
exhibited substantial spatial dependence in the variograms of the normal-scored
data, suggesting that there is considerable potential for the estimation of such
datasets from FIA data. Each species exhibited some variety in spatial patterns
and spatial dependence and may require different levels of additional fine-tuning,
depending upon the objectives of the specific analysis and the time and expertise
available.
Characteristics that make the datasets useful
These techniques make explicit the uncertainties associated with an estimate in
a form that can be incorporated when the data are used. This feature adds
considerable utility and flexibility in the use of the resulting estimates, as the risk
of errors of commission or omission can be specifically determined and
manipulated to suit the current objectives.
Maintaining individual species information separately allows considerable
flexibility in the use of species distribution data. Instead of being limited to
previously defined fixed classes, forest cover types can be uniquely defined to
capture more accurately the habitat required for a particular study. The potential
also exists to use one or more of the species datasets as a decision layer in the
interpretation of satellite imagery. The two datasets offer complementary
information about the species composition that really exists on the ground.
The techniques used in this study are not extremely time-consuming nor
difficult to process, and could be easily extended to additional species and states.
There is a high level of variance associated with these estimates of %ba/acre--in
many locations this variance can be as much as the estimate itself. However, the
dataset nevertheless provides a very descriptive picture of species distribution at
the state level. In comparison to previous depictions of current species
distribution from FIA data by summarizing at the county level, this method
provides a much more detailed picture of species occurrence and distribution.
Although estimates could probably be improved and variances diminished by
additional investigation into each species, the current estimates are informative
and provide a useful basis from which to proceed.
Clues to refining the interpolation and improving the estimates
As was observed with white pine, yellow birch, and hemlock, it may be possible
to significantly improve the estimates of %ba/acre by refining the analysis. When
subpopulations of a species have a significantly different pattern of spatial
distribution, treating the populations separately in the interpolation will improve
the final estimates. These populations may be described by regional land features
or by some other defining characteristic (e.g., stand age for hemlock). The results
suggest several clues to determine when this additional effort is necessary. The first
is when the results of indicator kriging show a species as present but sgCS does not.
The second is when the spatial dependence in the variogram is noticeably less than
expected. This may suggest that there are different populations being lumped
together that should be separated. For white pine, dividing the state into several
broad geographic regions made a significant difference in the calculated
variogram and thus in the final estimates. Another important clue is previous
knowledge about the species: different ecological regions may have caused
distinct spatial distribution patterns, or different size-class
or age populations may have different spatial distribution patterns over the
landscape. Hemlock is an example of the latter. As a result of past management
practices that involved heavy harvesting of large hemlock for the tanning
industry, today there are often relics of large individuals among a relatively wider
distribution of smaller, younger trees that have grown up in the interim (Hough
and Forbes 1943, Powell and Considine 1982).
These datasets of individual species distribution do not contain any of the fine-scale
forest/nonforest detail. If such information is desired, more detailed datasets
describing the forest/nonforest land cover in Pennsylvania would have to be
derived from a more intense point sample or the continuous but averaged data
available from satellite imagery. Such detailed datasets could be used as a 'mask'
and overlaid on any of the datasets of species distribution.
There are more possibilities for applying geostatistical techniques than have
been investigated here. For example, some species may exhibit some correlation
with a particular soil, climate, topography, or reflectance data from satellite
imagery. Indicator kriging and sgCS, in particular, allow the incorporation of such
ancillary, 'soft' information to contribute to the estimation process.
REFERENCES
Alerich, C.L. 1993. Forest Statistics for Pennsylvania--1978 and 1989. Resource
Bulletin NE-126. USDA Forest Service, Northeastern Forest Experiment
Station. Radnor, PA. 244p.
Deutsch, C.V. and A.G. Journel. 1992. GSLIB: Geostatistical Software Library
and User's Guide. Oxford University Press, New York.
Hansen, M.H., T. Frieswyk, J.F. Glover, and J.F. Kelly. 1992. The Eastwide Forest
Inventory Data Base: Users Manual. General Technical Report NC-151. USDA
Forest Service, North Central Experiment Station. St. Paul, MN.
Hough, A.F. and R.D. Forbes. 1943. The ecology and silvics of forests in the
High Plateaus of Pennsylvania. Ecological Monographs. 13:299-320.
Isaaks, E.H. and R.M. Srivastava. 1989. An Introduction to Applied Geostatistics.
Oxford University Press, New York.
Powell, D.S. and T.J. Considine. 1982. An analysis of Pennsylvania's forest
resources. Resource Bulletin NE-69, USDA Forest Service, Northeastern Forest
Experiment Station. Broomall, PA. 97p.
Rossi, R.E., P.W. Borth, and J.J. Tollefson. 1993. Stochastic simulation for
characterizing ecological spatial patterns and appraising risk. Ecological
Applications. 3(4):719-735.
BIOGRAPHICAL SKETCH
Rachel Riemann Hershey is a forester/geographer with the Forest Inventory and
Analysis Unit, Northeastern Forest Experiment Station. She received a B.A. in
ecology from Middlebury College, an M.S. in forestry from the University of NH
and an M.Phil. in geography from the London School of Economics.