Effect of changing grain size on the predictive performance and ecological drivers of invasive
plant habitat suitability models
Melissa E. Bridges
Introduction
Spatially explicit habitat suitability models for non-indigenous or invasive plant
species (NIS) across a landscape of interest can be a powerful decision support tool for land
managers charged with reducing the spread of such weeds. Habitat suitability models (HSM)
can be used for monitoring new populations for early detection and rapid response (Stohlgren
and Schnase 2006). These models could be used to prioritize management (Rew et al. 2007)
assuming more suitable habitats foster stronger invasive plant source populations (Maxwell et al.
unpub; Hanski 1994). There are several published and tested modeling methods used to generate
habitat suitability models for a variety of species (Elith et al. 2006). Most methods utilize
presence and absence data for a species of interest as the response variable and a set of
environmental variables as predictors.
Habitat suitability models for several NIS in the Northern Range of Yellowstone National Park
(NRYNP) have been generated using generalized linear models with logit link functions (i.e., logistic
regression) (Rew et al. 2005). The predicted response of a logistic regression, like that of most
habitat suitability modeling methods, is bounded between 0 and 1. Typically, the
expected value of the response is interpreted as a probability of occurrence. Because NIS
presence and absence data were collected at a 10m scale in NRYNP and most predictor variables
were available at a 10m resolution (i.e., grain size), initial HSM predictions were made at a 10m
scale (Rew et al. 2005). However, little is known about how predictions would differ at coarser grain
sizes or how the drivers of the ecological processes giving rise to the spatial pattern of invasive plant
habitats change when predictions are made at different spatial scales (but see Guisan et al. 2007
for some evaluations of predictive performance and changing grain size).
The objectives of this study were to i) quantify how changing grain size (resolution) of the
environmental predictor variables and, thus, of the prediction (response) surface from 30m
resolution to a 1km resolution affected the predictive performance of HSMs for two invasive
plant species within the NRYNP and ii) characterize how changing the prediction scale for
habitat suitability affected which ecological drivers (i.e., environmental variables) were
significant in the resulting habitat suitability models.
Methods
NIS Survey
Surveys were conducted within the NRYNP for the presence and absence of non-indigenous
plant species (NIS) at a 10m scale between 2001 and 2003 (Fig. 1). The survey was
designed as a stratified random sample in which 2km continuous transects were randomly assigned
to start from roads or trails and end at least 2km from any other road or trail
(Rew et al. 2005). Mapping crews used Trimble GeoXT GPS units to record the presence and
absence of several non-indigenous plant species every 10m. This survey method has been shown
to be efficient and adequate for characterizing NIS (Rew et al. 2006). The surveys conducted
between 2001 and 2003 resulted in a total sample size of n = 52,631.
NIS Species
Cirsium arvense (CIAR) and Bromus tectorum (BRTE) were selected for analyzing the effect of
changes in grain size on the predictive performance and the variables explaining their spatial
distributions. These two species were selected because they represent different life histories.
CIAR is a perennial species thought to have been introduced to North America in the 1600s
(Morishita 1999) and reproduces both sexually and asexually via adventitious root buds (Donald
1994). Conversely, BRTE is an annual grass species that reproduces by seed and was first
introduced to North America between 1875 and 1889 (Mack 1981). Both species are considered
invasive and are pests of both rangeland and cropland.
Environmental Variables
Environmental variables were assembled based on both their availability for the spatial extent of the
study area and their potential relevance to describing the habitat suitability of CIAR and BRTE
(Tables 1 and 2) (Evangelista et al. 2008). Several of the variables were derived from a 10m
digital elevation model acquired from the U.S. Geological Survey (http://seamless.usgs.gov).
The normalized difference vegetation index (NDVI) was calculated using 2002 scenes from
Landsat TM bands 3 and 4 (Eq. 1). Landsat imagery was acquired from GLOVIS
(http://glovis.usgs.gov/). The binary variable, wildfire, depicts the presence and absence of
wildfire in the NRYNP and was provided by Park staff.
NDVI = (NIR - RED) / (NIR + RED)    (Eq. 1)
where NIR = near-infrared reflectance band (Landsat TM band 4)
and RED = red reflectance band (Landsat TM band 3)
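As a minimal illustration of Eq. 1, the per-pixel calculation can be sketched in Python. This is not the workflow used in the study; the function name `ndvi` and the reflectance values are hypothetical and chosen only to show how the index behaves.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel (Eq. 1)."""
    total = nir + red
    if total == 0:
        # Undefined when both bands are zero (e.g., no-data pixels).
        return float("nan")
    return (nir - red) / total


# Hypothetical reflectance values, not taken from the study's Landsat scenes.
vegetated = ndvi(nir=0.6, red=0.2)   # strong NIR reflectance relative to red
bare_soil = ndvi(nir=0.3, red=0.25)  # similar reflectance in both bands
print(vegetated, bare_soil)
```

Because the difference is divided by the sum, the index is bounded between -1 and 1, which agrees with the NDVI ranges reported in Tables 1 and 2.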
All continuous variables were re-sampled to 30m and 1km using bilinear interpolation within
functions in ESRI ArcGIS 9.3. Categorical variables were re-sampled to 30m and 1km using
nearest neighbor assignment.
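The reason for treating continuous and categorical rasters differently can be shown with a small Python sketch. It is not the ArcGIS implementation; the functions `bilinear` and `nearest_neighbor`, the 2 x 2 grids, and the convention that positions are given in cell-centre index units inside the grid are all assumptions made for illustration.

```python
import math


def nearest_neighbor(grid, row, col):
    """Return the value of the cell whose centre is closest to (row, col)."""
    r = min(max(math.floor(row + 0.5), 0), len(grid) - 1)
    c = min(max(math.floor(col + 0.5), 0), len(grid[0]) - 1)
    return grid[r][c]


def bilinear(grid, row, col):
    """Distance-weighted average of the four cell centres surrounding (row, col)."""
    r0 = min(max(math.floor(row), 0), len(grid) - 2)
    c0 = min(max(math.floor(col), 0), len(grid[0]) - 2)
    dr, dc = row - r0, col - c0
    top = grid[r0][c0] * (1 - dc) + grid[r0][c0 + 1] * dc
    bottom = grid[r0 + 1][c0] * (1 - dc) + grid[r0 + 1][c0 + 1] * dc
    return top * (1 - dr) + bottom * dr


# Hypothetical 2 x 2 rasters: a continuous variable and a binary one.
elevation = [[100.0, 200.0], [300.0, 400.0]]
wildfire = [[0, 1], [1, 1]]

smoothed_elevation = bilinear(elevation, 0.5, 0.5)       # blends the four cells
assigned_wildfire = nearest_neighbor(wildfire, 0.4, 0.6)  # stays a valid class
blended_wildfire = bilinear(wildfire, 0.5, 0.5)           # not a valid class
print(smoothed_elevation, assigned_wildfire, blended_wildfire)
```

Bilinear interpolation averages neighbouring values, which is reasonable for elevation but would turn a 0/1 wildfire code into a fractional value with no categorical meaning. Nearest neighbor assignment avoids this by copying an existing cell value.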
Habitat Suitability Model Building
A subset of data were randomly selected from the total dataset of presences and absences to
represent the model building data (n = 47,367). The remaining data were used for model
validation (n = 5,264). A popular method for generating HSMs and the method employed in this
study is through the use of logistic regression (Eq. 2) (Keating and Cherry 2004; Rew et al. 2005;
Morisette et al. 2006).
$$p(y = 1 \mid x) = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}$$
Eq. 2
The response variable, y, is a binary response equal to 1 if the sample or trial resulted in a
success or equal to 0 if the sample was not a success. As the logistic regression applies to
HSMs, y is equal to 1 if the species is present or equal to 0 if the species is absent. The vector, x,
represents continuous or categorical environmental variables. The vector, β, represents the
parameters to be estimated. The predicted value is intrinsically bounded between 0 and 1 (Keating and
Cherry 2004) and can be interpreted as a probability of species occurrence.
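Eq. 2 can be evaluated for a single site with a few lines of Python. The sketch below is illustrative only: the function name `occurrence_probability`, the coefficients, and the predictor values are hypothetical and are not the fitted values reported in the tables.

```python
import math


def occurrence_probability(intercept, coefficients, predictors):
    """Evaluate Eq. 2: the logistic transform of the linear predictor."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    # Two algebraically equivalent forms keep exp() from overflowing.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Hypothetical coefficients for two predictors (e.g., elevation in m, slope in degrees).
beta_0 = 2.0
betas = [-0.002, 0.05]
site = [2000.0, 12.0]
print(occurrence_probability(beta_0, betas, site))
```

Whatever the sign or size of the linear predictor, the result stays between 0 and 1, which is why the output can be read as a probability of occurrence.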
Model Selection
Logistic regression was performed in R (www.r-project.org) using the glm function (family =
binomial). Final models were selected using a stepwise procedure designed to select the
model with the lowest Akaike’s Information Criterion (AIC) value (step function in R).
Coefficients for final models were used in ESRI Spatial Analyst and ArcGIS 9.3 to create maps
depicting the probabilities of occurrence for each modeled species.
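The criterion that drives the stepwise procedure can be sketched in Python. AIC is computed as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch does not reproduce R's step function; the observations, fitted probabilities, and parameter counts are hypothetical.

```python
import math


def bernoulli_log_likelihood(observed, predicted):
    """Log-likelihood of 0/1 observations given predicted probabilities of presence."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(observed, predicted)
    )


def aic(log_likelihood, n_parameters):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical observations and fitted probabilities from two candidate models.
observed = [1, 0, 1, 1, 0, 0]
candidates = {
    "full": {"fitted": [0.8, 0.3, 0.7, 0.9, 0.2, 0.4], "n_parameters": 4},
    "reduced": {"fitted": [0.7, 0.3, 0.7, 0.8, 0.3, 0.4], "n_parameters": 2},
}

scores = {
    name: aic(bernoulli_log_likelihood(observed, m["fitted"]), m["n_parameters"])
    for name, m in candidates.items()
}
best = min(scores, key=scores.get)  # the model with the lowest AIC is preferred
print(scores, best)
```

In this toy case the full model fits slightly better but pays a larger penalty for its extra parameters, so the reduced model has the lower AIC. A stepwise search repeats this comparison as terms are added or dropped.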
Model Validation
The subset validation datasets for each species were used to test the predictive performance of
each model. A commonly used measure for assessing the predictive ability of a logistic
regression model is calculating the area under the receiver operator characteristic curve (AUC of
the ROC curve) (Fielding and Bell 1997). The AUC is a probability threshold independent
measure of model performance unlike Kappa statistics, which require a probability decision rule
at which the classification of the model is assessed (Fielding and Bell 1997). A ROC curve is
derived by varying the decision or probability threshold incrementally from 0 to 1.0. At each
increment, the accuracy of the classification is assessed by calculating the true positive and
false positive rates of classification (Pearce and Ferrier 2000). A ROC curve for
each BRTE and CIAR model was constructed by plotting the false positive rate of classification
(1-specificity) versus the true positive rate of classification (sensitivity). The areas under each
curve were calculated to yield one probability threshold independent measure of model
performance. An AUC value of 0.90 means that, given a randomly chosen site where a species was
present and a randomly chosen site where it was absent, the model assigns the higher predicted
probability to the presence site 90% of the time (Pearce and Ferrier 2000).
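The threshold sweep described above can be sketched in Python. This is an illustration rather than the ROCR or verification code used in the study; the function names `roc_points` and `auc` and the six validation records are hypothetical.

```python
def roc_points(observed, predicted):
    """Sweep the threshold over every predicted value, from highest to lowest.

    Returns (false positive rate, true positive rate) pairs, starting at (0, 0).
    Assumes the data contain at least one presence and one absence.
    """
    n_pos = sum(observed)
    n_neg = len(observed) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(predicted), reverse=True):
        tp = sum(1 for y, p in zip(observed, predicted) if p >= threshold and y == 1)
        fp = sum(1 for y, p in zip(observed, predicted) if p >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical validation records: 1 = present, 0 = absent.
observed = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(auc(roc_points(observed, predicted)))
```

In this toy data set, eight of the nine possible presence/absence pairs are ranked correctly, so the area is 8/9. No single threshold has to be chosen to obtain it, which is the sense in which AUC is threshold independent.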
Receiver operator characteristic curves were constructed and AUC values were calculated for each
of the final models selected by AIC in R (ROCR and verification packages).
Results
Environmental explanatory variables
Mean and range values of each of the environmental variables fluctuated slightly as grain size
increased from 30m to 1km (Tables 1 and 2). In general, the variability in the data for each of
the environmental variables decreased as grain size increased (Tables 1 and 2). The variance of
the distance from roads variable remained approximately unchanged from the 30m to the 1km
resolution.
Final HSMs
Final models describing the habitat suitability of both Bromus tectorum (BRTE) and Cirsium
arvense (CIAR) at both the 30m and 1km scales were selected using a stepwise AIC model
selection procedure. The models for BRTE at both the 30m and 1km scales utilized similar
environmental variables; however, neighborhood variables such as distance from roads, streams,
and trails were more important at the 30m versus the 1km scale (Tables 3 and 4). The
relationship between the presence of BRTE and aspect switched directions from a positive
relationship at the 30m to a negative relationship at the 1km scale (Tables 3 and 4).
Furthermore, the presence of wildfire increased the odds of a presence of BRTE only at the 1km
scale (Table 4).
The models for CIAR at both the 30m and 1km scales utilized all the same environmental
variables and final models selected at both scales did not eliminate any of the variables from the
full model (Tables 5 and 6). Relationships of both distance from trails and slope with the
presence of CIAR changed as the scale of the prediction increased from 30m to 1km. Distance
from trails and slope had negative relationships with CIAR presence at the 30m prediction scale;
whereas they had positive relationships at the 1km scale (Tables 5 and 6).
Final logistic regression models were plotted onto geographic space to create maps of the
probability of occurrence, interpreted as a relative ranking of habitat suitability, of each species
at each scale (Figs. 2-5). When predictions were made at a coarser grain size, detail in the HSM
was lost. Although different sets of explanatory variables were used to derive the maps for
BRTE at 30m versus 1km, the pattern of prediction for both models appears similar; however,
the 1km prediction was unable to predict some of the mid-range probabilities of occurrence in
the southern portion of the study area that the 30m prediction was able to model (Figs. 2 and 3).
The spatial patterns of habitat suitability for CIAR at both scales of prediction were also very
similar, which was not too surprising considering both models used the same set of predictor
variables (Figs. 4 and 5).
HSM validation
Receiver operator characteristic (ROC) curves (Fig. 6) were constructed for each model and the
area under each ROC curve (AUC) was calculated to determine model performance. AUC
values increased with increased grain size for both species (Table 7). The AUC for the 30m
BRTE model was 0.838 compared to an AUC of 0.889 for the 1km BRTE model. Likewise, the
30m CIAR model yielded an AUC of 0.718 and the 1km CIAR model had an AUC of
0.733.
Discussion
HSM performance and scale
Understanding the relationship between scale and habitat suitability of invasive species is
important both for increasing our understanding of the potential processes driving the spatial
distribution of an invasion and for land managers as they prioritize how and where to monitor for
invasive plant populations. Model performance at different resolutions can indicate the
scale(s) at which the relationship between the environment and a species’ distribution is likely to
dominate over competition processes (Wiens 1989). In the case of both BRTE and
CIAR, all four models yielded AUC values that were better than random chance (i.e., AUC >
0.5), suggesting that the environmental variables, and thus habitat, contributed to the spatial pattern
of each species across the study area at both the 30m and 1km scales. A model yielding an
AUC value no better than random chance would suggest that there were ecological drivers (e.g.,
community assemblage, measures of competition) not included in the model that could have
better explained the spatial distribution of species occurrences across the landscape than did the
variables used to describe habitat.
Both models for BRTE resulted in higher AUC values than the models for CIAR, which
suggests that the models discriminated BRTE habitat better than CIAR habitat. I initially
hypothesized that CIAR habitat would be more distinguishable and result in higher AUC values
than BRTE models because CIAR populations have been in North America perhaps 200 years
longer than BRTE populations, thus allowing them to reach equilibrium with
available habitat. HSMs are often plagued by the difficulty of discriminating suitable
habitat because data on presence and absence of an introduced species can usually only model
the realized niche and not the fundamental niche of that species (Barry and Elith 2006). The
difference between the realized and fundamental niches describes the difference between suitable
habitat that is occupied and suitable habitat that is not occupied at the time of the survey and
represents a source of uncertainty in HSMs (Barry and Elith 2006).
Another interesting result from evaluating model performance at two scales was that for both
BRTE and CIAR, AUC values were slightly higher for the 1km models. This result is consistent
with what Guisan et al. (2007) reported after evaluating the performance of several HSMs for
different plant species in different regions of the world.
Ecological driver response to changing scale
A different set of environmental variables was used to make predictions for BRTE at the 30m
scale versus the 1km scale. There were no differences in the set of variables selected by AIC for
CIAR HSMs. The importance of the neighborhood variables (i.e., proximity variables)
decreased with increasing grain size for BRTE; however, the spatial patterns of probabilities of
occurrence remained very similar. This result for BRTE could suggest that different processes at
different spatial scales were yielding the same (or similar) geographic distribution for this
species. This is an interesting but not unexpected result if the models were discriminating habitat
better than random chance. Examining how the relationships between environmental
variables and a species’ presence change as the grain size of the prediction coarsens
shows how the odds of a presence are either reduced
or increased depending on the scale.
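The change in odds can be read directly from the fitted coefficients, because in a logistic regression the exponential of a coefficient is the multiplicative change in the odds of presence per unit increase in the predictor. The Python sketch below applies this to the slope coefficients for CIAR reported in Tables 5 and 6; the function name `odds_ratio` and the 10-degree increment are choices made here for illustration.

```python
import math


def odds_ratio(coefficient, change=1.0):
    """Multiplicative change in the odds of presence for a given change in a predictor."""
    return math.exp(coefficient * change)


# Slope coefficients for CIAR taken from Tables 5 (30m) and 6 (1km).
slope_coefficient = {"CIAR 30m": -6.763e-03, "CIAR 1km": 8.475e-02}

for model, beta in slope_coefficient.items():
    # Odds ratio for a 10-degree increase in slope, holding other variables fixed.
    print(model, round(odds_ratio(beta, change=10.0), 3))
```

A ratio below 1 means the odds of presence fall as slope increases, and a ratio above 1 means they rise. The two models therefore imply opposite effects of slope at the two grain sizes.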
Implications for management
Land managers could benefit from knowing the scale at which an HSM performs best, and this scale could be
used to design monitoring protocols. Monitoring is often a low priority for weed managers
because of limited resources. Producing HSMs can help managers prioritize what locations
should be monitored for early detection of new unwanted plant populations (Rew et al. 2005,
Stohlgren and Schnase 2006). Likewise, determining the appropriate scale for making
predictions about habitat suitability of an invasive species can aid managers in developing
principles for management and can guide the design of future NIS surveys.
References
Barry S, Elith J (2006) Error and uncertainty in habitat models. J Applied Ecol 43:413-423
Donald WW (1994) The biology of Canada thistle (Cirsium arvense). Rev Weed Sci 6:77-101
Elith J, Graham CH, Anderson RP, Dudik M, Ferrier S, Guisan A, et al (2006) Novel methods
improve prediction of species’ distributions from occurrence data. Ecography 29:129-151
Evangelista PH, Kumar S, Stohlgren TJ, Jarnevich CS, Crall AW, Norman III JB, Barnett DT
(2008) Modelling invasion for a habitat generalist and a specialist plant species. Diversity
Distrib 14:808-817
Fielding AH, Bell JF (1997) A review of methods for the assessment of prediction errors in
conservation presence/absence models. Environmental Conservation 24:38-49
Guisan A, Graham CH, Elith J, Huettmann F, NCEAS Species Distribution Modelling
Group. (2007) Sensitivity of predictive species distribution models to change in grain size.
Diversity Distrib 13:332-340
Hanski I (1994) A practical model of metapopulation dynamics. J Animal Ecol 63:151-162
Keating K, Cherry S (2004) Use and interpretation of logistic regression in habitat-selection
studies. J Wildlife Manage 68:774-789
Lehnhoff EA, Rew LJ, Maxwell BD, Taper ML (2008) Quantifying invasiveness of
plants: a test case with yellow toadflax (Linaria vulgaris). Invasive Plant Sci Manage 1:319-325
Mack RN (1981) Invasion of Bromus tectorum L. into western North America: an ecological
chronicle. Agro-Ecosystems 7:145-165
Maxwell BD, et al (unpub) Modified incidence function for modeling metapopulation
dynamics.
Morishita DW (1999) Canada thistle. In: Sheley RL, Petroff JK (eds) Biology
and management of noxious rangeland weeds. Oregon State University Press, Corvallis, OR,
pp 162-174
Morisette JT, Jarnevich CS, Ullah A, Cai WJ, Pedelty JA, Gentle JE, Stohlgren TJ, Schnase JL
(2006) A tamarisk habitat suitability map for the continental United States. Frontiers in
Ecology and the Environment 4:11-17
Pearce J, Ferrier S (2000) Evaluating the predictive performance of habitat models developed
using logistic regression. Ecol Modelling 133:225-245
Rew LJ, Maxwell BD, Aspinall R (2005) Predicting the occurrence of non-indigenous
species using environmental and remotely sensed data. Weed Sci 53:236-241
Rew LJ, Maxwell BD, Dougher FL, Aspinall R (2006) Searching for a needle in a
haystack: evaluating survey methods for non-indigenous plant species. Biological Invasions
8:523-539
Rew LJ, Lehnhoff EA, Maxwell BD (2007) Non-indigenous species management using
a population prioritization framework. Can. J. Plant Sci 87:1029-1036
Stohlgren TJ, Schnase JL (2006) Risk analysis for biological hazards: what we need to know
about invasive species. Risk Anal 26:163-173
Wiens JA (1989) Spatial scaling in ecology. Functional Ecology 3:385-397
Tables and Figures
Table 1. Summary statistics of the 30m environmental predictor variables used in the habitat
suitability models
| Variable | Mean | Range | Std. Deviation |
|---|---|---|---|
| Solar insolation (Wh m^-2) | 0.719 | 0.196 to 1.058 | 0.119 |
| Aspect (degrees) | 176.365 | 0.000 to 359.396 | 108.552 |
| Cos of aspect | 0.040 | -1.000 to 1.000 | 0.711 |
| Distance from roads (m) | 2441.627 | 0.000 to 11231.580 | 2221.876 |
| Distance from streams (m) | 590.935 | 0.000 to 2173.500 | 453.514 |
| Distance from trails (m) | 1272.126 | 0.000 to 5937.272 | 1161.851 |
| Elevation (m) | 2163.139 | 1588.000 to 2830.000 | 249.524 |
| NDVI | 0.283 | -0.250 to 0.698 | 0.123 |
| Sin of aspect | 0.000 | -1.000 to 1.000 | 0.702 |
| Slope (degrees) | 10.932 | 0.000 to 59.533 | 7.962 |
| Wildfire (binary; categorical) | N/A | 0 or 1 | N/A |
Table 2. Summary statistics of the 1km environmental predictor variables used in the habitat
suitability models
| Variable | Mean | Range | Std. Deviation |
|---|---|---|---|
| Solar insolation (Wh m^-2) | 0.713 | 0.467 to 0.898 | 0.069 |
| Aspect (degrees) | 171.939 | 54.130 to 297.355 | 50.060 |
| Cos of aspect | 0.027 | -0.189 to 0.328 | 0.052 |
| Distance from roads (m) | 2514.636 | 139.764 to 12546.196 | 2223.887 |
| Distance from streams (m) | 598.176 | 109.974 to 2346.596 | 333.028 |
| Distance from trails (m) | 1414.340 | 113.795 to 5629.797 | 1139.618 |
| Elevation (m) | 2182.818 | 1622.000 to 2838.000 | 224.718 |
| NDVI | 0.283 | 0.008 to 0.484 | 0.083 |
| Sin of aspect | 0.013 | -0.275 to 0.234 | 0.038 |
| Slope (degrees) | 11.681 | 1.045 to 31.075 | 5.319 |
| Wildfire (binary; categorical) | N/A | 0 or 1 | N/A |
Table 3. Logistic regression summary for the Bromus tectorum (BRTE) habitat suitability model
using environmental predictor variables with a 30m resolution
| Variable | Estimate | Std. Error | z-value | p-value |
|---|---|---|---|---|
| Intercept | 4.652e+00 | 2.551e-01 | 18.234 | < 2e-16 *** |
| Solar insolation | 4.551e+00 | 1.782e-01 | 25.545 | < 2e-16 *** |
| Aspect | 6.726e-04 | 2.167e-04 | 3.104 | 0.00191 ** |
| Distance from roads | -1.598e-04 | 1.380e-05 | -11.577 | < 2e-16 *** |
| Distance from streams | 7.476e-04 | 5.714e-05 | 13.082 | < 2e-16 *** |
| Distance from trails | -4.649e-05 | 1.812e-05 | -2.566 | 0.01028 * |
| Elevation | -5.595e-03 | 1.290e-04 | -43.364 | < 2e-16 *** |
| Sin of aspect | 9.470e-02 | 2.974e-02 | 3.184 | 0.00145 ** |
| Slope | 6.132e-02 | 2.516e-03 | 24.374 | < 2e-16 *** |
| NDVI | -8.900e-01 | 1.753e-01 | -5.078 | 3.82e-07 *** |
AIC = 16,953
Table 4. Logistic regression summary for the Bromus tectorum (BRTE) habitat suitability model
using environmental predictor variables with a 1km resolution
| Variable | Estimate | Std. Error | z-value | p-value |
|---|---|---|---|---|
| Intercept | 8.943e+00 | 3.760e-01 | 23.782 | < 2e-16 *** |
| Solar insolation | 4.321e+00 | 3.101e-01 | 13.936 | < 2e-16 *** |
| Aspect | -3.291e-03 | 5.234e-04 | -6.288 | 3.22e-10 *** |
| Cos of aspect | -6.486e+00 | 7.145e-01 | -9.078 | < 2e-16 *** |
| Distance from roads | -4.186e-04 | 1.911e-05 | -21.909 | < 2e-16 *** |
| Elevation | -7.360e-03 | 1.647e-04 | -44.679 | < 2e-16 *** |
| Sin of aspect | -1.474e+00 | 8.976e-01 | -1.643 | 0.10046 |
| Slope | 1.363e-01 | 4.349e-03 | 31.338 | < 2e-16 *** |
| Wildfire (binary) | 5.369e-01 | 5.391e-02 | 9.958 | < 2e-16 *** |
| NDVI | -8.757e-01 | 2.902e-01 | -3.018 | 0.00254 ** |
AIC = 15,415
Table 5. Logistic regression summary for the Cirsium arvense (CIAR) habitat suitability model
using environmental predictor variables with a 30m resolution
| Variable | Estimate | Std. Error | z-value | p-value |
|---|---|---|---|---|
| Intercept | -5.642e+00 | 2.521e-01 | -22.379 | < 2e-16 *** |
| Solar insolation | 2.085e+00 | 2.033e-01 | 10.257 | < 2e-16 *** |
| Aspect | -4.740e-03 | 2.517e-04 | -18.829 | < 2e-16 *** |
| Cos of aspect | 5.668e-02 | 3.134e-02 | 1.808 | 0.07056 . |
| Distance from roads | 2.023e-04 | 9.336e-06 | 21.668 | < 2e-16 *** |
| Distance from streams | 5.235e-04 | 4.822e-05 | 10.857 | < 2e-16 *** |
| Distance from trails | -4.561e-05 | 2.515e-05 | -1.813 | 0.06979 . |
| Elevation | 2.961e-04 | 1.030e-04 | 2.876 | 0.00403 ** |
| Sin of aspect | -5.028e-02 | 3.159e-02 | -1.591 | 0.11152 |
| Slope | -6.763e-03 | 2.972e-03 | -2.276 | 0.02285 * |
| Wildfire (binary) | 4.509e-01 | 4.734e-02 | 9.525 | < 2e-16 *** |
| NDVI | 7.152e-01 | 1.821e-01 | 3.928 | 8.56e-05 *** |
AIC = 16,541
Table 6. Logistic regression summary for the Cirsium arvense (CIAR) habitat suitability model
using environmental predictor variables with a 1km resolution
| Variable | Estimate | Std. Error | z-value | p-value |
|---|---|---|---|---|
| Intercept | -7.907e+00 | 3.661e-01 | -21.599 | < 2e-16 *** |
| Solar insolation | 2.495e+00 | 3.299e-01 | 7.563 | 3.93e-14 *** |
| Aspect | -1.106e-02 | 4.999e-04 | -22.122 | < 2e-16 *** |
| Cos of aspect | 2.409e+00 | 6.029e-01 | 3.996 | 6.43e-05 *** |
| Distance from roads | 1.383e-04 | 9.407e-06 | 14.698 | < 2e-16 *** |
| Distance from streams | 3.298e-04 | 6.768e-05 | 4.872 | 1.10e-06 *** |
| Distance from trails | 4.173e-05 | 2.708e-05 | 1.541 | 0.1233 |
| Elevation | 7.639e-04 | 1.249e-04 | 6.118 | 9.50e-10 *** |
| Sin of aspect | -1.353e+00 | 6.836e-01 | -1.979 | 0.0478 * |
| Slope | 8.475e-02 | 4.848e-03 | 17.480 | < 2e-16 *** |
| Wildfire (binary) | 1.254e+00 | 6.595e-02 | 19.014 | < 2e-16 *** |
| NDVI | 1.851e+00 | 2.984e-01 | 6.204 | 5.51e-10 *** |
AIC = 15,796
Table 7. Areas under the receiver operator characteristic curves (AUC) for each model.
| Model | AUC |
|---|---|
| BRTE 30m | 0.838 |
| BRTE 1km | 0.889 |
| CIAR 30m | 0.718 |
| CIAR 1km | 0.733 |
BRTE: Bromus tectorum
CIAR: Cirsium arvense
Fig. 1. Northern Range of Yellowstone National Park and non-indigenous plant species sample
points.
Fig. 2. Probability of Bromus tectorum (BRTE) occurrence at the 30m scale in the Northern
Range of Yellowstone National Park.
Fig. 3. Probability of Bromus tectorum (BRTE) occurrence at the 1km scale in the Northern
Range of Yellowstone National Park.
Fig. 4. Probability of Cirsium arvense (CIAR) occurrence at the 30m scale in the Northern
Range of Yellowstone National Park.
Fig. 5. Probability of Cirsium arvense (CIAR) occurrence at the 1km scale in the Northern
Range of Yellowstone National Park.
Fig. 6. Receiver operator characteristic curves for each habitat suitability model (HSM). Top
plots correspond to the Bromus tectorum (BRTE) models. Bottom plots correspond to the
Cirsium arvense (CIAR) models.