Effect of changing grain size on the predictive performance and ecological drivers of invasive plant habitat suitability models

Melissa E. Bridges

Introduction

Spatially explicit habitat suitability models for non-indigenous or invasive plant species (NIS) across a landscape of interest can be a powerful decision support tool for land managers charged with reducing the spread of such weeds. Habitat suitability models (HSMs) can be used to guide monitoring for new populations in support of early detection and rapid response (Stohlgren and Schnase 2006). These models could also be used to prioritize management (Rew et al. 2007), assuming that more suitable habitats foster stronger invasive plant source populations (Maxwell et al. unpub; Hanski 1994). There are several published and tested modeling methods for generating habitat suitability models for a variety of species (Elith et al. 2006). Most methods use presence and absence data for a species of interest as the response variable and a set of environmental variables as predictors. Habitat suitability models for several NIS in the Northern Range of Yellowstone National Park (NRYNP) have been generated using generalized linear models with logit link functions (i.e., logistic regression) (Rew et al. 2005). The predicted response of a logistic regression, like that of most habitat suitability modeling methods, is bounded between 0 and 1. Typically, the expected value of the response is interpreted as a probability of occurrence.

Because NIS presence and absence data were collected at a 10m scale in the NRYNP and most predictor variables were available at a 10m resolution (i.e., grain size), initial HSM predictions were made at a 10m scale (Rew et al. 2005). However, little is known about how predictions would differ at coarser grain sizes, or about how the drivers of the ecological processes giving rise to the spatial pattern of invasive plant habitats change when predictions are made at different spatial scales (but see Guisan et al. 2007 for some evaluations of predictive performance and changing grain size). The objectives of this study were to i) quantify how changing the grain size (resolution) of the environmental predictor variables and, thus, of the prediction (response) surface from a 30m resolution to a 1km resolution affected the predictive performance of HSMs for two invasive plant species within the NRYNP, and ii) characterize how changing the prediction scale for habitat suitability affected which ecological drivers (i.e., environmental variables) were significant in the resulting habitat suitability models.

Methods

NIS Survey

Surveys were conducted within the NRYNP for the presence and absence of NIS at a 10m scale between 2001 and 2003 (Fig. 1). The survey was designed as a stratified random sample in which 2km continuous transects were randomly assigned to start from roads or trails and to end at least 2km from any other road or trail (Rew et al. 2005). Mapping crews used Trimble GeoXT GPS units to record the presence and absence of several NIS every 10m. This survey method has been shown to be efficient and adequate for characterizing NIS (Rew et al. 2006). The surveys conducted between 2001 and 2003 resulted in a total sample size of n = 52,631.

NIS Species

Cirsium arvense (CIAR) and Bromus tectorum (BRTE) were selected for analyzing the effect of changes in grain size on predictive performance and on the variables explaining their spatial distributions. These two species were selected because they represent different life histories. CIAR is a perennial species thought to have been introduced to North America in the 1600s (Morishita 1999) that reproduces both sexually and asexually via adventitious root buds (Donald 1994). Conversely, BRTE is an annual grass species that reproduces by seed and was first introduced to North America between 1875 and 1889 (Mack 1981).
Both species are considered invasive and are pests of both rangeland and cropland.

Environmental Variables

Environmental variables were assembled based on their availability for the spatial extent of the study area and their potential relevance to describing the habitat suitability of CIAR and BRTE (Tables 1 and 2) (Evangelista et al. 2008). Several of the variables were derived from a 10m digital elevation model acquired from the U.S. Geological Survey (http://seamless.usgs.gov). The normalized difference vegetation index (NDVI) was calculated from 2002 Landsat TM scenes using bands 3 and 4 (Eq. 1). Landsat imagery was acquired from GLOVIS (http://glovis.usgs.gov/). The binary variable, wildfire, depicts the presence and absence of wildfire in the NRYNP and was provided by Park staff.

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \qquad \text{(Eq. 1)}$$

where NIR = near infrared reflectance band (Landsat band 4) and RED = red reflectance band (Landsat band 3).

All continuous variables were re-sampled to 30m and 1km using bilinear interpolation in ESRI ArcGIS 9.3. Categorical variables were re-sampled to 30m and 1km using nearest neighbor assignment.

Habitat Suitability Model Building

A subset of the data was randomly selected from the total dataset of presences and absences to serve as the model building data (n = 47,367). The remaining data were used for model validation (n = 5,264). A popular method for generating HSMs, and the method employed in this study, is logistic regression (Eq. 2) (Keating and Cherry 2004; Rew et al. 2005; Morisette et al. 2006).

$$p(y = 1 \mid \mathbf{x}) = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)} \qquad \text{(Eq. 2)}$$

The response variable, y, is binary: it equals 1 if the sample or trial resulted in a success and 0 if it did not. As logistic regression applies to HSMs, y equals 1 if the species is present and 0 if the species is absent. The vector, x, represents continuous or categorical environmental variables.
The vector, β, represents the parameters to be estimated. The predicted response is intrinsically bounded between 0 and 1 (Keating and Cherry 2004) and can be interpreted as a probability of species occurrence.

Model Selection

Logistic regression was performed in R (www.r-project.org) using the glm function (family = binomial). Final models were selected using a stepwise procedure designed to select the model with the lowest Akaike’s Information Criterion (AIC) value (step function in R). Coefficients from the final models were used in ESRI Spatial Analyst and ArcGIS 9.3 to create maps depicting the probability of occurrence for each modeled species.

Model Validation

The validation datasets for each species were used to test the predictive performance of each model. A commonly used measure of the predictive ability of a logistic regression model is the area under the receiver operating characteristic curve (AUC of the ROC curve) (Fielding and Bell 1997). Unlike the Kappa statistic, which requires a probability decision rule at which the classification of the model is assessed, the AUC is a threshold-independent measure of model performance (Fielding and Bell 1997). A ROC curve is derived by varying the decision (probability) threshold incrementally from 0 to 1.0. At each increment, the true positive and false positive rates of classification are calculated (Pearce and Ferrier 2000). A ROC curve for each BRTE and CIAR model was constructed by plotting the true positive rate of classification (sensitivity) against the false positive rate of classification (1 - specificity). The area under each curve was calculated to yield a single threshold-independent measure of model performance.
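The threshold sweep described above can be illustrated with a short sketch. The study itself ran these calculations in R; the Python below is only an illustration under stated assumptions, and the function names (roc_points, auc) and the eight toy observations are hypothetical rather than drawn from the NRYNP survey.

```python
def roc_points(observed, predicted, n_steps=100):
    """Sweep the decision threshold from 0 to 1 and collect ROC points.

    observed  : list of 0/1 values (absence/presence); hypothetical toy data
    predicted : list of predicted probabilities of occurrence
    Returns a sorted list of (false positive rate, true positive rate) pairs.
    """
    n_pos = sum(1 for y in observed if y == 1)
    n_neg = len(observed) - n_pos
    # Start with the point for a threshold above every prediction.
    points = {(0.0, 0.0)}
    for i in range(n_steps + 1):
        threshold = i / n_steps
        # A site is classified as "present" when its probability meets the threshold.
        tp = sum(1 for y, p in zip(observed, predicted) if y == 1 and p >= threshold)
        fp = sum(1 for y, p in zip(observed, predicted) if y == 0 and p >= threshold)
        points.add((fp / n_neg, tp / n_pos))
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy validation set: four presences and four absences (illustrative only).
observed = [1, 1, 1, 0, 1, 0, 0, 0]
predicted = [0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]

curve = roc_points(observed, predicted)
print(f"AUC = {auc(curve):.4f}")
```

For these toy values the sketch gives an AUC of 0.9375, which equals the share of presence/absence pairs (15 of 16) in which the presence site received the higher predicted probability.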
An AUC value of 0.90 means that the model correctly discriminated between sites where a species was present and sites where a species was absent 90% of the time (Pearce and Ferrier 2000); that is, a randomly chosen presence site received a higher predicted probability than a randomly chosen absence site 90% of the time. ROC curves and AUC values for each of the final models selected by AIC were computed in R (ROCR and verification packages).

Results

Environmental explanatory variables

The mean and range of each environmental variable changed slightly as grain size increased from 30m to 1km (Tables 1 and 2). In general, the variability of each environmental variable decreased as grain size increased (Tables 1 and 2). An exception was distance from roads, whose variance remained approximately unchanged from the 30m to the 1km resolution.

Final HSMs

Final models describing the habitat suitability of both Bromus tectorum (BRTE) and Cirsium arvense (CIAR) at the 30m and 1km scales were selected using a stepwise AIC model selection procedure. The models for BRTE at the two scales used similar environmental variables; however, neighborhood variables such as distance from roads, streams, and trails were more important at the 30m scale than at the 1km scale (Tables 3 and 4), with distance from streams and distance from trails retained only in the 30m model. The relationship between the presence of BRTE and aspect switched from positive at the 30m scale to negative at the 1km scale (Tables 3 and 4). Furthermore, wildfire was retained only in the 1km model, where its presence increased the odds of BRTE presence (Table 4). The models for CIAR at both scales used the same environmental variables; the final models selected at both scales did not eliminate any of the variables from the full model (Tables 5 and 6). The relationships of both distance from trails and slope with the presence of CIAR changed as the scale of the prediction increased from 30m to 1km.
Distance from trails and slope had negative relationships with CIAR presence at the 30m prediction scale, whereas they had positive relationships at the 1km scale (Tables 5 and 6). Final logistic regression models were mapped in geographic space to create maps of the probability of occurrence, interpreted as a relative ranking of habitat suitability, for each species at each scale (Figs. 2-5). When predictions were made at a coarser grain size, detail in the HSM was lost. Although different sets of explanatory variables were used to derive the maps for BRTE at 30m versus 1km, the patterns of prediction for the two models appear similar; however, the 1km prediction did not reproduce some of the mid-range probabilities of occurrence in the southern portion of the study area that the 30m prediction captured (Figs. 2 and 3). The spatial patterns of habitat suitability for CIAR at both scales of prediction were also very similar, which is not surprising considering that both models used the same set of predictor variables (Figs. 4 and 5).

HSM validation

Receiver operating characteristic (ROC) curves (Fig. 6) were constructed for each model, and the areas under the ROC curves (AUC) were calculated to determine model performance. AUC values increased with increased grain size for both species (Table 7). The AUC for the 30m BRTE model was 0.838, compared with 0.889 for the 1km BRTE model. Likewise, the 30m CIAR model yielded an AUC of 0.718 and the 1km CIAR model an AUC of 0.733.

Discussion

HSM performance and scale

Understanding the relationship between scale and habitat suitability of invasive species is important both for increasing our understanding of the potential processes driving the spatial distribution of an invasion and for land managers as they prioritize how and where to monitor for invasive plant populations.
Model performance at different resolutions can indicate at what scale(s) the relationship between the environment and a species’ distribution is likely the dominant process, as opposed to competition processes (Wiens 1989). In the case of both BRTE and CIAR, all four models yielded AUC values that were better than random chance (i.e., AUC > 0.5), suggesting that the environmental variables, and thus habitat, contributed to the spatial pattern of each species across the study area at both the 30m and 1km scales. A model yielding an AUC value no better than random chance would suggest that ecological drivers not included in the model (e.g., community assemblage, measures of competition) could have better explained the spatial distribution of species occurrences across the landscape than did the variables used to describe habitat.

Both models for BRTE resulted in higher AUC values than the models for CIAR, which suggests that I was better able to discriminate BRTE habitat than CIAR habitat. I initially hypothesized that CIAR habitat would be more distinguishable and result in higher AUC values than BRTE models because CIAR has been in North America perhaps 200 years longer than BRTE, thus allowing its populations to reach equilibrium with available habitat. HSMs are often plagued by the difficulty of discriminating suitable habitat because presence and absence data for an introduced species can usually only describe the realized niche and not the fundamental niche of that species (Barry and Elith 2006). The difference between the realized and fundamental niches corresponds to the difference between suitable habitat that is occupied and suitable habitat that is not occupied at the time of the survey, and it represents a source of uncertainty in HSMs (Barry and Elith 2006).
Another interesting result from evaluating model performance at two scales was that, for both BRTE and CIAR, AUC values were slightly higher for the 1km models. This result is consistent with what Guisan et al. (2007) reported after evaluating the performance of several HSMs for different plant species in different regions of the world.

Ecological driver response to changing scale

A different set of environmental variables was used to make predictions for BRTE at the 30m scale than at the 1km scale. There were no differences in the set of variables selected by AIC for the CIAR HSMs. The importance of the neighborhood variables (i.e., proximity variables) decreased with increasing grain size for BRTE; however, the spatial patterns of probabilities of occurrence remained very similar. This result for BRTE could suggest that different processes at different spatial scales were yielding the same (or similar) geographic distribution for this species. This is an interesting but not unexpected result if the models were discriminating habitat better than random chance. Examining how the relationships between environmental variables and a species’ presence change as the grain size of the prediction coarsens showed how the odds of a presence are either reduced or increased depending on the scale.

Implications for management

Land managers could benefit from knowing the scale at which an HSM performs best, and this scale could be used to design monitoring protocols. Monitoring is often a low priority for weed managers because of limited resources. HSMs can help managers prioritize which locations should be monitored for early detection of new unwanted plant populations (Rew et al. 2005; Stohlgren and Schnase 2006).
Likewise, determining the appropriate scale for making predictions about habitat suitability of an invasive species can aid managers in developing principles for management and can guide the design of future NIS surveys.

References

Barry S, Elith J (2006) Error and uncertainty in habitat models. J Applied Ecol 43:413-423
Donald WW (1994) The biology of Canada thistle (Cirsium arvense). Rev Weed Sci 6:77-101
Elith J, Graham CH, Anderson RP, Dudik M, Ferrier S, Guisan A, et al (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129-151
Evangelista PH, Kumar S, Stohlgren TJ, Jarnevich CS, Crall AW, Norman III JB, Barnett DT (2008) Modelling invasion for a habitat generalist and a specialist plant species. Diversity Distrib 14:808-817
Fielding AH, Bell JF (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49
Guisan A, Graham CH, Elith J, Huettmann F, NCEAS Species Distribution Modelling Group (2007) Sensitivity of predictive species distribution models to change in grain size. Diversity Distrib 13:332-340
Hanski I (1994) A practical model of metapopulation dynamics. J Animal Ecol 63:151-162
Keating K, Cherry S (2004) Use and interpretation of logistic regression in habitat-selection studies. J Wildlife Manage 68:774-789
Lehnhoff EA, Rew LJ, Maxwell BD, Taper ML (2008) Quantifying invasiveness of plants: a test case with yellow toadflax (Linaria vulgaris). Invasive Plant Sci Manage 1:319-325
Mack RN (1981) Invasion of Bromus tectorum L. into western North America: an ecological chronicle. Agro-Ecosystems 7:145-165
Maxwell BD, et al (unpub) Modified incidence function for modeling metapopulation dynamics.
Morishita DW (1999) Canada thistle. In: Sheley RL, Petroff JK (eds) Biology and management of noxious rangeland weeds.
Corvallis, OR: Oregon State University Press: 162-174
Morisette JT, Jarnevich CS, Ullah A, Cai WJ, Pedelty JA, Gentle JE, Stohlgren TJ, Schnase JL (2006) A tamarisk habitat suitability map for the continental United States. Frontiers in Ecology and the Environment 4:11-17
Pearce J, Ferrier S (2000) Evaluating the predictive performance of habitat models developed using logistic regression. Ecol Modelling 133:225-245
Rew LJ, Maxwell BD, Aspinall R (2005) Predicting the occurrence of non-indigenous species using environmental and remotely sensed data. Weed Sci 53:236-241
Rew LJ, Maxwell BD, Dougher FL, Aspinall R (2006) Searching for a needle in a haystack: evaluating survey methods for non-indigenous plant species. Biological Invasions 8:523-539
Rew LJ, Lehnhoff EA, Maxwell BD (2007) Non-indigenous species management using a population prioritization framework. Can J Plant Sci 87:1029-1036
Stohlgren TJ, Schnase JL (2006) Risk analysis for biological hazards: what we need to know about invasive species. Risk Anal 26:163-173
Wiens JA (1989) Spatial scaling in ecology. Functional Ecology 3:385-397

Tables and Figures

Table 1. Summary statistics of the 30m environmental predictor variables used in the habitat suitability models

Variable                          Mean        Range                      Std. Deviation
Solar insolation (Wh m-2)         0.719       0.196 to 1.058             0.119
Aspect (degrees)                  176.365     0.000 to 359.396           108.552
Cos of aspect                     0.040       -1.000 to 1.000            0.711
Distance from roads (m)           2441.627    0.000 to 11231.580         2221.876
Distance from streams (m)         590.935     0.000 to 2173.500          453.514
Distance from trails (m)          1272.126    0.000 to 5937.272          1161.851
Elevation (m)                     2163.139    1588.000 to 2830.000       249.524
NDVI                              0.283       -0.250 to 0.698            0.123
Sin of aspect                     0.000       -1.000 to 1.000            0.702
Slope (degrees)                   10.932      0.000 to 59.533            7.962
Wildfire (binary; categorical)    N/A         0 or 1                     N/A

Table 2. Summary statistics of the 1km environmental predictor variables used in the habitat suitability models

Variable                          Mean        Range                      Std. Deviation
Solar insolation (Wh m-2)         0.713       0.467 to 0.898             0.069
Aspect (degrees)                  171.939     54.130 to 297.355          50.060
Cos of aspect                     0.027       -0.189 to 0.328            0.052
Distance from roads (m)           2514.636    139.764 to 12546.196       2223.887
Distance from streams (m)         598.176     109.974 to 2346.596        333.028
Distance from trails (m)          1414.340    113.795 to 5629.797        1139.618
Elevation (m)                     2182.818    1622.000 to 2838.000       224.718
NDVI                              0.283       0.008 to 0.484             0.083
Sin of aspect                     0.013       -0.275 to 0.234            0.038
Slope (degrees)                   11.681      1.045 to 31.075            5.319
Wildfire (binary; categorical)    N/A         0 or 1                     N/A

Table 3. Logistic regression summary for the Bromus tectorum (BRTE) habitat suitability model using environmental predictor variables with a 30m resolution

Variable                 Estimate      Std. Error    z-value     p-value     Sig.
Intercept                 4.652e+00    2.551e-01      18.234     < 2e-16     ***
Solar insolation          4.551e+00    1.782e-01      25.545     < 2e-16     ***
Aspect                    6.726e-04    2.167e-04       3.104     0.00191     **
Distance from roads      -1.598e-04    1.380e-05     -11.577     < 2e-16     ***
Distance from streams     7.476e-04    5.714e-05      13.082     < 2e-16     ***
Distance from trails     -4.649e-05    1.812e-05      -2.566     0.01028     *
Elevation                -5.595e-03    1.290e-04     -43.364     < 2e-16     ***
Sin of aspect             9.470e-02    2.974e-02       3.184     0.00145     **
Slope                     6.132e-02    2.516e-03      24.374     < 2e-16     ***
NDVI                     -8.900e-01    1.753e-01      -5.078     3.82e-07    ***
AIC = 16,953

Table 4. Logistic regression summary for the Bromus tectorum (BRTE) habitat suitability model using environmental predictor variables with a 1km resolution

Variable                 Estimate      Std. Error    z-value     p-value     Sig.
Intercept                 8.943e+00    3.760e-01      23.782     < 2e-16     ***
Solar insolation          4.321e+00    3.101e-01      13.936     < 2e-16     ***
Aspect                   -3.291e-03    5.234e-04      -6.288     3.22e-10    ***
Cos of aspect            -6.486e+00    7.145e-01      -9.078     < 2e-16     ***
Distance from roads      -4.186e-04    1.911e-05     -21.909     < 2e-16     ***
Elevation                -7.360e-03    1.647e-04     -44.679     < 2e-16     ***
Sin of aspect            -1.474e+00    8.976e-01      -1.643     0.10046
Slope                     1.363e-01    4.349e-03      31.338     < 2e-16     ***
Wildfire (binary)         5.369e-01    5.391e-02       9.958     < 2e-16     ***
NDVI                     -8.757e-01    2.902e-01      -3.018     0.00254     **
AIC = 15,415

Table 5. Logistic regression summary for the Cirsium arvense (CIAR) habitat suitability model using environmental predictor variables with a 30m resolution

Variable                 Estimate      Std. Error    z-value     p-value     Sig.
Intercept                -5.642e+00    2.521e-01     -22.379     < 2e-16     ***
Solar insolation          2.085e+00    2.033e-01      10.257     < 2e-16     ***
Aspect                   -4.740e-03    2.517e-04     -18.829     < 2e-16     ***
Cos of aspect             5.668e-02    3.134e-02       1.808     0.07056     .
Distance from roads       2.023e-04    9.336e-06      21.668     < 2e-16     ***
Distance from streams     5.235e-04    4.822e-05      10.857     < 2e-16     ***
Distance from trails     -4.561e-05    2.515e-05      -1.813     0.06979     .
Elevation                 2.961e-04    1.030e-04       2.876     0.00403     **
Sin of aspect            -5.028e-02    3.159e-02      -1.591     0.11152
Slope                    -6.763e-03    2.972e-03      -2.276     0.02285     *
Wildfire (binary)         4.509e-01    4.734e-02       9.525     < 2e-16     ***
NDVI                      7.152e-01    1.821e-01       3.928     8.56e-05    ***
AIC = 16,541

Table 6. Logistic regression summary for the Cirsium arvense (CIAR) habitat suitability model using environmental predictor variables with a 1km resolution

Variable                 Estimate      Std. Error    z-value     p-value     Sig.
Intercept                -7.907e+00    3.661e-01     -21.599     < 2e-16     ***
Solar insolation          2.495e+00    3.299e-01       7.563     3.93e-14    ***
Aspect                   -1.106e-02    4.999e-04     -22.122     < 2e-16     ***
Cos of aspect             2.409e+00    6.029e-01       3.996     6.43e-05    ***
Distance from roads       1.383e-04    9.407e-06      14.698     < 2e-16     ***
Distance from streams     3.298e-04    6.768e-05       4.872     1.10e-06    ***
Distance from trails      4.173e-05    2.708e-05       1.541     0.1233
Elevation                 7.639e-04    1.249e-04       6.118     9.50e-10    ***
Sin of aspect            -1.353e+00    6.836e-01      -1.979     0.0478      *
Slope                     8.475e-02    4.848e-03      17.480     < 2e-16     ***
Wildfire (binary)         1.254e+00    6.595e-02      19.014     < 2e-16     ***
NDVI                      1.851e+00    2.984e-01       6.204     5.51e-10    ***
AIC = 15,796

Significance codes in Tables 3-6: *** p < 0.001; ** p < 0.01; * p < 0.05; . p < 0.1.

Table 7. Areas under the receiver operating characteristic curves (AUC) for each model. BRTE: Bromus tectorum; CIAR: Cirsium arvense.

Model       AUC
BRTE 30m    0.838
BRTE 1km    0.889
CIAR 30m    0.718
CIAR 1km    0.733

Fig. 1. Northern Range of Yellowstone National Park and non-indigenous plant species sample points.
Fig. 2. Probability of Bromus tectorum (BRTE) occurrence at the 30m scale in the Northern Range of Yellowstone National Park.
Fig. 3. Probability of Bromus tectorum (BRTE) occurrence at the 1km scale in the Northern Range of Yellowstone National Park.
Fig. 4. Probability of Cirsium arvense (CIAR) occurrence at the 30m scale in the Northern Range of Yellowstone National Park.
Fig. 5. Probability of Cirsium arvense (CIAR) occurrence at the 1km scale in the Northern Range of Yellowstone National Park.
Fig. 6. Receiver operating characteristic curves for each habitat suitability model (HSM). Top plots correspond to the Bromus tectorum (BRTE) models. Bottom plots correspond to the Cirsium arvense (CIAR) models.
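
Note on reading the coefficient tables: the estimates in Tables 3 to 6 are on the log-odds scale, so exponentiating an estimate gives the multiplicative change in the odds of presence for a one-unit increase in that predictor, with the other predictors held constant. The Python sketch below is only an illustration (the study itself used R); the helper name odds_ratio is hypothetical, and the coefficients are copied from the wildfire rows of Tables 4, 5 and 6 and the elevation row of Table 3.

```python
import math


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds of presence for a `delta`-unit
    increase in a predictor, given its logistic regression coefficient."""
    return math.exp(coefficient * delta)


# Wildfire estimates (binary predictor) copied from Tables 4, 5 and 6.
wildfire_estimates = {
    "BRTE 1km (Table 4)": 5.369e-01,
    "CIAR 30m (Table 5)": 4.509e-01,
    "CIAR 1km (Table 6)": 1.254e+00,
}

for model, beta in wildfire_estimates.items():
    print(f"{model}: odds ratio for wildfire = {odds_ratio(beta):.2f}")

# For continuous predictors a larger step is often easier to interpret,
# e.g. a 100 m gain in elevation in the BRTE 30m model (Table 3).
elevation_brte_30m = -5.595e-03
print(f"BRTE 30m: odds ratio per 100 m elevation = "
      f"{odds_ratio(elevation_brte_30m, delta=100.0):.2f}")
```

Read this way, burned cells have roughly 1.7 times the odds of BRTE presence in the 1km model, and roughly 1.6 and 3.5 times the odds of CIAR presence in the 30m and 1km models, respectively.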