Journal of Biogeography Supporting Information Invasion ratcheting

Journal of Biogeography
SUPPORTING INFORMATION
Invasion ratcheting in the zebra mussel (Dreissena polymorpha) and the ability of native
and invaded ranges to predict its global distribution
Belinda Gallardo, Philine S. E. zu Ermgassen and David C. Aldridge
Appendix S2 Optimization of ecological niche models (ENMs) for the zebra mussel.
Regularization
According to the MAXENT user manual (available at: http://www.cs.princeton.edu/~schapire/maxent/tutorial/tutorial.doc), the ‘regularization multiplier’ parameter affects how focused or
closely fitted the output predicted distribution is. For instance, a value smaller than the default
of 1.0 will result in a more localized output distribution that is a closer fit to the given presence
records, but can result in overfitted predictions. A larger regularization multiplier will give a
more spread out, less localized prediction (Phillips & Dudík, 2008).
For the zebra mussel, regularization multipliers of 1 to 4 were tested and the resulting models
compared using ENMTools (Warren et al., 2010). The Akaike information criterion corrected for sample
size (AICc) was used to select the best regularization option, as recommended by Warren &
Seifert (2011).
We conclude that, while the default regularization multiplier of 1 is appropriate to model
the distribution of the zebra mussel based on its European and North American ranges, an increased value of 4 is needed when using the Ponto-Caspian partial data set (Table S1).
Table S1 Results of ecological niche models (ENM) for the zebra mussel (Dreissena polymorpha), performed with a regularization multiplier of 1 to 4. Model: occurrence data used for calibration corresponding to Europe (EU), North America (NA) and the Ponto-Caspian region (PC); sample size: number of data points used to calibrate the model; AICc: Akaike information criterion corrected for sample size; AUC: accuracy of the model. Best model (lowest AICc) listed first for each data set.

Model  Regularization  Log-likelihood  Parameters  Sample size  AICc     AUC
EU     1.0             −9817.0         66          910          19776.5  0.954
EU     2.0             −9895.9         47          910          19891.0  0.953
EU     3.0             −9938.2         41          910          19962.3  0.953
EU     4.0             −9973.5         32          910          20013.5  0.953
NA     1.0             −18370.0        111         1642         36978.1  0.923
NA     2.0             −18596.6        78          1642         37357.1  0.922
NA     3.0             −18747.9        63          1642         37626.9  0.921
NA     4.0             −18853.4        52          1642         37814.1  0.920
PC     4.0             −1001.7         16          95           2042.5   0.990
PC     2.0             −979.8          29          95           2044.4   0.990
PC     1.0             −958.6          38          95           2046.2   0.989
PC     3.0             −992.8          27          95           2062.2   0.989
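The AICc values in Table S1 can be checked from the log-likelihood, the number of parameters (k) and the sample size (n). The sketch below assumes the standard small-sample formula, AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1), which reproduces the tabulated values to within the rounding of the log-likelihoods. The function and variable names are illustrative and are not part of ENMTools.

```python
def aicc(log_likelihood, n_params, n_samples):
    """Akaike information criterion corrected for sample size."""
    k, n = n_params, n_samples
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# (regularization multiplier, log-likelihood, parameters) for the
# Ponto-Caspian models in Table S1; the sample size is 95.
pc_models = [(1.0, -958.6, 38), (2.0, -979.8, 29),
             (3.0, -992.8, 27), (4.0, -1001.7, 16)]
scores = {reg: aicc(ll, k, 95) for reg, ll, k in pc_models}

# The preferred multiplier is the one with the lowest AICc.
best_reg = min(scores, key=scores.get)
print(best_reg, round(scores[best_reg], 1))
```

With these inputs the lowest AICc is obtained for a multiplier of 4, in line with Table S1. Small differences in the last decimal are expected because the tabulated log-likelihoods are rounded.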
Sampling bias
Zebra mussel occurrence points are clustered around the Netherlands, England and the Great
Lakes of North America, which can bias model predictions. To compensate for this potential source of bias, we created a sampling
effort map (e.g. Heibl & Renner, 2012; Torres et al., 2012). This involves retrieving from GBIF the global occurrences of a ‘target group’ of species
that are likely to be sampled using the same methods as the study species (Phillips &
Dudík, 2008). In this case, we chose the whole Dreissenidae family, and assumed that the number of Dreissenidae occurrence records per pixel is an indirect indicator of the sampling effort
invested. We calculated Dreissenidae occurrence density at a 5-arcminute resolution with ARCVIEW 10.0 (ESRI, Redlands, CA, USA) and log10-transformed the map to reduce numeric disparities. The resulting raster map had pixel values ranging from 0 (lowest sampling effort) to 6 (highest sampling effort, corresponding to the Netherlands, England and the
Great Lakes of North America). This map was used in ENM to weight occurrences, i.e. a weight
inversely related to sampling effort was applied to each occurrence, reducing the importance of oversampled areas (the ‘bias file’ option; Phillips et al., 2011). In this test, the
bias file did not seem to improve predictions significantly (Table S2); samples were therefore not weighted
in the final models.
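The construction of the sampling effort surface described above can be sketched as follows. This is a minimal illustration, not the ARCVIEW workflow used in the study: the grid indexing, the function names and the weighting formula 1 / (1 + effort) are assumptions made for the example, and cells without target-group records are given an effort of 0.

```python
import math
from collections import Counter

CELLS_PER_DEGREE = 12  # 5 arc-minutes = 1/12 of a degree


def cell_index(lon, lat):
    """Column and row of the 5-arcminute cell containing a point."""
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((lat + 90.0) * CELLS_PER_DEGREE)
    return col, row


def sampling_effort(target_group_points):
    """log10 of the number of target-group records in each occupied cell."""
    counts = Counter(cell_index(lon, lat) for lon, lat in target_group_points)
    return {cell: math.log10(n) for cell, n in counts.items()}


def inverse_effort_weights(occurrences, effort):
    """Assumed weighting 1 / (1 + effort): heavily sampled cells count less."""
    return [1.0 / (1.0 + effort.get(cell_index(lon, lat), 0.0))
            for lon, lat in occurrences]


# Toy data: 100 target-group records in one cell, a single record in another.
target_group = [(5.01, 52.01)] * 100 + [(30.51, 46.51)]
effort = sampling_effort(target_group)
weights = inverse_effort_weights([(5.01, 52.01), (30.51, 46.51)], effort)
```

In this toy case the occurrence in the densely sampled cell (effort 2 on the log10 scale) receives one third of the weight of the occurrence in the cell with a single record.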
Species geographical attributes
ENMs are affected by the geographical attributes of species, including sample size, prevalence
(i.e. the ratio between presence and absence data), clustering (i.e. spatial autocorrelation) and
rarity (Marmion et al., 2009). Species with a large number of occurrences have a higher prevalence but are likely to produce overfitted predictions, whereas a low number of occurrences
often results in very general and loose models. In this study, we noted considerable clustering of the data
that could not be adequately compensated for using the ‘bias file’ option described
above. In addition, the different sample sizes for the Ponto-Caspian, European and North American
regions may prevent a meaningful comparison of predictions.
Table S2 Results from ecological niche models (ENM) performed with and without using a ‘bias file’. Model: occurrence data used for calibration corresponding to Europe (EU), North America (NA) and the Ponto-Caspian region (PC). Sample size: number of data points used to calibrate the model. AICc: Akaike information criterion corrected for sample size. AUC: accuracy of the model. Best option highlighted in bold.

Model  Bias file  Log-likelihood  Parameters  Sample size  AICc     AUC
EU     no         −9803.2         67          910          19751.2  0.954
EU     yes        −11970.8        77          910          24110.0  0.911
NA     no         −18362.6        103         1642         36945.1  0.923
NA     yes        −21183.2        87          1642         42550.2  0.876
PC     no         −956.6          40          95           2054.1   0.990
PC     yes        −955.7          37          95           2034.8   0.990

Table S3 Results from ecological niche models (ENM) performed using equal- and unequal-size subsets of data for the European (EU) and North American (NA) ranges of the zebra mussel. AUC: accuracy of the model. Prevalence: ratio between presence and background data. Range: minimum and maximum predicted scores. Fractional area: fraction of the area predicted suitable for the species according to the threshold maximizing the sum of training sensitivity and specificity of the model (maxTSS).

Model            EU       EU       NA       NA
Sample size      Equal    Unequal  Equal    Unequal
AUC              0.99     0.95     0.99     0.92
Entropy          5.51     6.90     5.71     7.63
Prevalence       0.01     0.05     0.01     0.09
Range            0–0.74   0–0.64   0–0.81   0–0.84
Fractional area  0.040    0.100    0.026    0.149

To compensate for unequal sample sizes in the three zebra mussel ranges, we created a subset of 100 randomly selected points in each of the invaded European and North American ranges, thus
comparable to the sample size in the native range (n = 98). This procedure was repeated 10 times
to reduce the possibility of sample subselection influencing the model results. Although we
did not specifically correct for spatial autocorrelation, the degree of clustering of each subset
was notably reduced.
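The subsampling procedure (10 replicate subsets of 100 randomly selected records per invaded range) can be sketched as below. The function name, the fixed seed and the placeholder record identifiers are assumptions for illustration; the original analysis may have drawn the subsets with a different tool.

```python
import random


def equal_size_subsets(records, subset_size=100, n_replicates=10, seed=0):
    """Draw replicate random subsets (without replacement) of a fixed size."""
    if subset_size > len(records):
        raise ValueError("subset_size exceeds the number of records")
    rng = random.Random(seed)  # fixed seed so the draws can be repeated
    return [rng.sample(records, subset_size) for _ in range(n_replicates)]


# Placeholder identifiers standing in for the 910 European records.
eu_records = list(range(910))
subsets = equal_size_subsets(eu_records)
```

Each replicate is sampled independently from the full set of records, so a record may appear in several replicates but never twice within the same one.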
We ran models using the default modelling options of MAXENT and report the average of the 10 replicate models calibrated for Europe and North America, respectively (i.e. equal-size models). In this case it was not appropriate to use AICc values to compare
the performance of equal-size and unequal-size models, since the lower number of degrees of freedom in the former would inevitably lead to lower AICc. We therefore used alternative metrics to
evaluate the two options: the AUC of the model, the range of suitability scores and the ecological
plausibility of predictions.
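As a reminder of what the AUC measures, it can be read as the probability that a randomly chosen presence record receives a higher suitability score than a randomly chosen background point. The sketch below computes it directly from that definition; the function name and the toy scores are assumptions, and MAXENT's own implementation may differ in detail.

```python
def auc(presence_scores, background_scores):
    """Probability that a presence outscores a background point (ties count 0.5)."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Toy suitability scores, not taken from the study.
presence = [0.9, 0.8, 0.4]
background = [0.1, 0.3, 0.4, 0.7]
print(auc(presence, background))
```

Because the statistic depends only on the ranking of scores, a model can reach a very high AUC while predicting a small suitable area, which is why it is considered here alongside the spatial predictions.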
Although model accuracy (AUC) was higher in equal-size than in unequal-size models,
differences between the two options were most prominent in the spatial predictions. The area predicted to be suitable for the zebra mussel was 2.5 to 6 times smaller when using equal-size subsets (Table S3),
notably underestimating its current and presumably potential range of distribution (Fig. S1). We
hereafter show results from ENMs calibrated with all available data, an option recommended by
Beaumont et al. (2009) and Broennimann & Guisan (2008).
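The reduction in predicted area follows directly from the fractional areas reported in Table S3, as the short calculation below shows. The dictionary layout and function name are illustrative only.

```python
# Fractional area predicted suitable (Table S3).
fractional_area = {
    "EU": {"equal": 0.040, "unequal": 0.100},
    "NA": {"equal": 0.026, "unequal": 0.149},
}


def shrink_factor(areas):
    """How many times smaller the equal-size prediction is."""
    return areas["unequal"] / areas["equal"]


factors = {region: round(shrink_factor(a), 1)
           for region, a in fractional_area.items()}
print(factors)
```

The factors are about 2.5 for Europe and 5.7 for North America, the latter corresponding to the upper figure of roughly 6 quoted in the text.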
Figure S1 Results from ecological niche models (ENM) calibrated with zebra mussel occurrence
data from Europe (A) and North America (B). In dark grey, the areas where the zebra mussel is
predicted to be present according to both the equal-size and unequal-size models. In light grey, the
additional areas where the species is predicted to be present according to the unequal-size model
(using all available occurrence data).
Variable inclusion
Because it has been suggested that the choice and number of predictors affect the accuracy and
transferability of ENMs (Rödder & Lötters, 2010), we sequentially removed one variable at a
time and selected the option with the lowest AICc (i.e. backward elimination; Guyon & Elisseeff,
2003).
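A generic form of this procedure is sketched below. It is a greedy loop that repeatedly drops the single variable whose removal gives the lowest score and stops when no removal improves on the current model. The `score` callable stands in for fitting an ENM and returning its AICc; it, the function name and the toy benefit values are hypothetical and do not reproduce the MAXENT and ENMTools runs of the study.

```python
def backward_elimination(variables, score):
    """Greedy backward elimination.

    `score` is a caller-supplied function mapping a tuple of variable names
    to a value to minimize (for example the AICc of a model fitted with them).
    """
    current = tuple(variables)
    best = score(current)
    while len(current) > 1:
        # Score every model obtained by dropping exactly one variable.
        candidates = [tuple(v for v in current if v != drop) for drop in current]
        scored = [(score(c), c) for c in candidates]
        cand_score, cand = min(scored, key=lambda item: item[0])
        if cand_score >= best:
            break  # no single removal lowers the score
        current, best = cand, cand_score
    return current, best


# Toy score (hypothetical numbers): each variable improves the fit by a
# fixed amount, and every variable carries a complexity penalty of 2.
benefit = {"Tann": 10.0, "Tmin": 6.0, "PPann": 1.0, "Elevation": 0.5}


def toy_score(subset):
    return 100.0 - sum(benefit[v] for v in subset) + 2.0 * len(subset)


selected, final_score = backward_elimination(list(benefit), toy_score)
```

In the toy example the two weakly informative variables are dropped in turn, after which removing either remaining variable would raise the score and the loop stops.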
Table S4 Results from backward elimination of variables in ENM calibrated with the zebra mussel native (PC), European (EU) and North American (NA) ranges. Parameters: number of parameters in the model after linear, quadratic, product, threshold and hinge features have been automatically optimized (‘auto features’ default option). Sample size: number of occurrence records used to calibrate the model. AICc: Akaike information criterion corrected for sample size.

Model  Variable excluded           Log-likelihood  Parameters  Sample size  AICc
EU     PPdriest                    −9703.1         63          910          19541.7
EU     None                        −9706.4         66          910          19555.3
EU     PPseason                    −9714.6         65          910          19569.4
EU     Tmax                        −9720.7         63          910          19576.9
EU     Tseason                     −9724.3         62          910          19581.7
EU     Tann                        −9736.5         67          910          19617.8
EU     Tmin                        −9742.5         65          910          19625.1
EU     PPann                       −9748.1         64          910          19634.1
EU     Elevation                   −9754.6         72          910          19665.7
EU     Geology                     −9817.0         66          910          19776.5
NA     None                        −18229.1        101         1642         36673.5
NA     Tann                        −18250.8        95          1642         36703.4
NA     Tmin                        −18274.6        101         1642         36764.5
NA     Elevation                   −18295.1        95          1642         36791.9
NA     PPdriest                    −18284.5        106         1642         36795.7
NA     Tseason                     −18304.0        94          1642         36807.5
NA     Tmax                        −18296.0        104         1642         36814.2
NA     PPseason                    −18360.2        97          1642         36926.8
NA     Geology                     −18370.0        111         1642         36978.2
NA     PPann                       −18548.6        111         1642         37335.5
PC     Elevation, PPseason, Tann*  −987.5          19          95           2023.1
PC     Elevation                   −986.4          24          95           2037.9
PC     PPdriest                    −977.2          29          95           2039.2
PC     PPann                       −990.4          24          95           2045.9
PC     Geology                     −958.7          38          95           2046.3
PC     PPseason                    −978.0          31          95           2049.6
PC     None                        −976.4          32          95           2050.9
PC     Tann                        −976.8          32          95           2051.8
PC     Tmax                        −986.1          36          95           2090.2
PC     Tseason                     −988.0          39          95           2110.7
PC     Tmin                        −981.3          45          95           2137.1

* Intermediate backward elimination steps not shown.
Variable optimization suggested eliminating a number of factors from models calibrated
with the zebra mussel’s native and European ranges, whereas models based on its North American range should utilize all available environmental predictors.
REFERENCES
Beaumont, L.J., Gallagher, R.V., Thuiller, W., Downey, P.O., Leishman, M.R. & Hughes, L. (2009) Different climatic envelopes among invasive populations may lead to underestimations of current and future biological invasions. Diversity and Distributions, 15, 409–420.
Broennimann, O. & Guisan, A. (2008) Predicting current and future biological invasions: both native and invaded ranges matter. Biology Letters, 4, 585–589.
Guyon, I. & Elisseeff, A. (2003) An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
Heibl, C. & Renner, S.S. (2012) Distribution models and a dated phylogeny for Chilean Oxalis species reveal occupation of new habitats by different lineages, not rapid adaptive radiation. Systematic Biology, 61, 823–834.
Phillips, S., Dudík, M. & Schapire, R. (2011) A brief tutorial on MaxEnt. Princeton University, Princeton, NJ.
Phillips, S.J. & Dudík, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31, 161–175.
Rödder, D. & Lötters, S. (2009) Niche shift versus niche conservatism? Climatic characteristics of the native and invasive ranges of the Mediterranean house gecko (Hemidactylus turcicus). Global Ecology and Biogeography, 18, 674–687.
Torres, R., Jayat, J.P. & Pacheco, S. (2012) Modelling potential impacts of climate change on the bioclimatic envelope and conservation of the Maned Wolf (Chrysocyon brachyurus). Mammalian Biology, 78, 41–49.
Warren, D.L. & Seifert, S.N. (2011) Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21, 335–342.
Warren, D.L., Glor, R.E. & Turelli, M. (2010) ENMTools: a toolbox for comparative studies of environmental niche models. Ecography, 33, 607–611.