Electronic Supplementary Material – Seo et al

Species distribution models (SDMs) have a number of advantages and disadvantages, and using
them correctly requires a thorough understanding of how they operate (Heikkinen et al.
2006). To date, SDM assessments have focused on comparing model performance
criteria (Fielding & Bell 1997). Widely used metrics in such assessments include the area
under the Receiver Operating Characteristic curve (AUC) (Pearce & Ferrier 2000) and
the Kappa statistic (Monserud & Leemans 1992; Elith et al. 2006). However, these
indices can be uninformative: with sufficient samples, many SDMs indicate
similar levels of performance (Stockwell & Peterson 2002) even though they may
weight predictor variables differently.
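To make the two metrics concrete, the Python sketch below computes AUC as the probability that a randomly chosen presence receives a higher predicted score than a randomly chosen absence (ties counted as one half), and Cohen's Kappa from a 2x2 confusion matrix of observed versus predicted presence and absence. It is a minimal illustration with hypothetical function names (`auc`, `cohen_kappa`) and toy numbers, not the software used in this study.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    Counts, over all presence/absence pairs, how often the presence site
    gets the higher predicted score; ties count as one half.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa for a binary map: agreement beyond that expected by chance.

    tp/fn: observed presences predicted present/absent;
    fp/tn: observed absences predicted present/absent.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals of the confusion matrix
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected)


print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))          # 5 of 6 pairs ordered correctly, about 0.83
print(cohen_kappa(tp=40, fp=10, fn=10, tn=40))   # about 0.6
```

The sketch also shows why such indices can hide differences between models: AUC depends only on how sites are ranked by their scores, so two models that rank sites the same way receive the same AUC regardless of which predictors produced the ranking.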
There is a critical need to quantify the bias introduced by coarse-scale predictor variables
(such as global climate model outputs), because these may cause SDMs to over-predict the
area of species’ ranges even while performance metrics such as AUC are high. In turn,
over-predicted ranges could be problematic for conservation planners, because
areas predicted as suitable under climate change may not be. Landscape ecologists
grappling with scale effects have called for a multi-criteria approach to assessing
how scale affects pattern and process (Qi & Wu 1996). In species distribution modeling,
there is a need to identify which scales best capture the actual distribution of a
species, which could guide the choice of scale at which to run SDMs in the first place
(Meyer & Thuiller 2006).
This study tested the effect of scale on species distribution models using a framework in
which model outputs could be compared across scales. We assumed that models with
smaller grid sizes would produce range predictions with greater fidelity to a species’
actual range, and that accuracy decays in a quantifiable way as model grid size
increases. Figure 1 (a) shows the methods used in developing the framework. SDM
response variables (species presence and absence) were resampled as shown along the
top of the boxes, with the original survey points used to define a species’ presence or
absence at each of seven spatial scales. Predictor variables were selected from the
WorldClim data set, which has a native resolution of about 1 km2 (30 arc-seconds). These
data were resampled, as shown by the arrows along the bottom of the boxes in Figure 1 (a),
to the other six spatial scales used in the models.
Table 1 shows, for each species, the number of presence and absence points and the
number of presence and absence grid cells used for running the SDMs at each grid size.
The presence and absence points used here represent the best available data for our
study system, the contributed efforts of multiple government agencies, herbaria and
scientific researchers. The maps of species presence and absence (Figure 2) illustrate the
extensive geographic coverage available in California. It is debatable whether the use of
real absences is better or worse than the use of the randomly selected absences that some
algorithms employ (e.g. Maxent, GARP). We considered the use of recorded absence
points an advance over studies that have used only presence data, such as herbarium
records, with randomly assigned absences for model development. However, this study
was not able to test that assumption.
We used a simple resampling approach to test the scale sensitivity of SDMs developed
with climate data. WorldClim predictor variables were resampled to larger grid sizes by
averaging the smaller grid cells found within each larger grid cell (following Guisan et al.
2007). The original WorldClim data are at 30 arc-second resolution; the other three
WorldClim resolutions (2.5, 5 and 10 arc-minutes) were derived from them by aggregation
(http://www.worldclim.org/format.htm). Predictor variables were resampled to each of
the six other grid sizes by taking the mean of the 1 km2 grid cells falling within each
coarser cell (e.g. 4x4 km), producing one predictor value per grid cell at each grid size.
Resampling was conducted in ArcGIS (ESRI 2007) following the methods outlined in
Guisan et al. (2007), in which continuous data were resampled to coarser grid sizes with
the “aggregate” command using the mean function.
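The averaging step amounts to a block mean: each coarse cell takes the mean of the 1 km cells it covers, so a 4x4 km cell averages 16 of them. The Python sketch below is a stand-in for the ArcGIS aggregate operation, not a reproduction of it; the function names `aggregate_mean` and `resample_all`, the use of `None` for NoData cells, and dropping partial blocks at the raster edge are assumptions for illustration.

```python
GRID_SIZES_KM = [1, 2, 4, 8, 16, 32, 64]  # the seven operational scales


def aggregate_mean(grid, factor):
    """Coarsen a 2-D grid (list of rows) by averaging factor x factor blocks.

    Assumptions: cells holding None are NoData and are skipped; a block with
    no valid cells becomes None; rows/columns at the edge that do not fill a
    whole block are dropped.
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    coarse = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            values = [
                grid[r][c]
                for r in range(i * factor, (i + 1) * factor)
                for c in range(j * factor, (j + 1) * factor)
                if grid[r][c] is not None
            ]
            out_row.append(sum(values) / len(values) if values else None)
        coarse.append(out_row)
    return coarse


def resample_all(grid_1km):
    """Aggregate directly from the 1 km grid to every grid size."""
    return {size: aggregate_mean(grid_1km, size) for size in GRID_SIZES_KM}


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(aggregate_mean(grid, 2))  # [[3.5, 5.5], [11.5, 13.5]]
print(aggregate_mean(grid, 4))  # [[8.5]]
```

Averaging directly from the 1 km grid, as described above, gives the same result as chaining 1 to 2 to 4 km steps when no cells are NoData, because each coarse block is a union of equal-sized finer blocks; the two approaches differ only where NoData cells make block counts unequal.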
The SDM results are meant to be comparative between operational scales, and would not
differ greatly if a different set of predictor variables were used. We considered WorldClim
appropriate because several thousand meteorological stations in California were
available for developing its climate surfaces (Figure 3). Note that Hijmans et al. (2005)
reported that, at the time of publication, WorldClim offered a resolution about 400 times
higher than other climate surfaces.
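A factor of roughly 400 is consistent with a simple cell-area calculation, assuming the comparison is against global surfaces at 10 arc-minute resolution (an assumption here; the baseline is not named above): the linear resolution ratio is 20, and cell area scales with its square.

```latex
\left(\frac{10\ \text{arc-min}}{30\ \text{arc-sec}}\right)^{2}
  = \left(\frac{600''}{30''}\right)^{2}
  = 20^{2}
  = 400
```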
Three analyses of the SDM results were conducted (described in the methods section): a
comparison of model performance statistics; a cross-scale locational accuracy assessment,
termed a congruence analysis, that used derived AUC values (Figure 1 (b)); and a
comparison of the total area selected as range between operational scales.
Figure 1. (a) The flow of data resampling used to derive the grid sizes for SDM
modeling in the paper, with response variable resampling along the top
and predictor variable resampling along the bottom; (b) the spatial congruence
analysis of AUC values, which compared SDM spatial outputs between
all the scales analyzed.
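The congruence and area comparisons can be sketched as follows, under assumptions that go beyond what is stated here: the coarse-scale probability map is expanded onto the 1x1 km grid (each coarse cell copied to the 1 km cells it covers) and scored with AUC against a 1x1 km binary range map used as the reference, and area selected is the number of cells predicted present times the cell area. The reference choice, the expansion rule and the names `expand`, `congruence_auc` and `area_selected_km2` are hypothetical; the article's methods section defines the actual procedure.

```python
def expand(coarse, factor):
    """Copy each coarse cell onto the factor x factor fine cells it covers."""
    fine = []
    for row in coarse:
        fine_row = [value for value in row for _ in range(factor)]
        for _ in range(factor):
            fine.append(list(fine_row))
    return fine


def auc(presence_scores, absence_scores):
    """Pairwise (Mann-Whitney) AUC; ties count one half."""
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0
               for p in presence_scores for a in absence_scores)
    return wins / (len(presence_scores) * len(absence_scores))


def congruence_auc(coarse_prob, factor, fine_binary):
    """AUC of an expanded coarse probability map against a fine binary
    reference map (hypothetical reference: the 1x1 km binary range map)."""
    fine_prob = expand(coarse_prob, factor)
    presence, absence = [], []
    for prob_row, ref_row in zip(fine_prob, fine_binary):
        for p, ref in zip(prob_row, ref_row):
            (presence if ref == 1 else absence).append(p)
    return auc(presence, absence)


def area_selected_km2(binary_map, cell_km):
    """Total area predicted as range: selected cells times cell area."""
    return sum(sum(row) for row in binary_map) * cell_km ** 2


coarse = [[0.9, 0.2]]                 # one row of two 2x2 km cells
reference_1km = [[1, 1, 0, 0],
                 [1, 0, 0, 0]]
print(congruence_auc(coarse, 2, reference_1km))  # about 0.9
print(area_selected_km2([[1, 0], [1, 1]], 4))    # 3 cells x 16 km2 = 48
```

In the toy case the 2x2 km cell with probability 0.9 also covers one 1 km cell that is absent in the reference, the kind of spatial mismatch a coarse grid can introduce.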
We selected nine vascular plant species for modeling, all of them endemic or near-endemic
tree species of California. These species were classed into three range-size categories
using species range maps developed from independent data (Viers et al. 2006). The
modeled trees were narrow- (<20,000 km2), intermediate- (20,000-90,000 km2) or
broad-range-size (>90,000 km2) species. Figure 2 shows the distribution of presences
and absences in the original data for each species used in the modeling.
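The class boundaries amount to a simple rule on range area. In this Python sketch, the class assigned to a range of exactly 20,000 or 90,000 km2 is an assumption (the boundaries above do not say which side they fall on), as is the function name `range_size_class`.

```python
def range_size_class(range_area_km2):
    """Assign a range-size class from range area in km2.

    Boundary handling is assumed: exactly 20,000 km2 and exactly
    90,000 km2 both count as intermediate.
    """
    if range_area_km2 < 20_000:
        return "narrow"
    if range_area_km2 <= 90_000:
        return "intermediate"
    return "broad"


for area in (12_500, 45_000, 150_000):
    print(area, range_size_class(area))
```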
The tree species we selected are all large, easily recognized species. Therefore, if one of
these species was not reported in a vegetation plot, it was listed as absent within the
boundaries of that plot. Presence and absence records from the vegetation plot data were
sampled into the 1 km2 and the other grid sizes (Article Table 1). The methods used here for
determining presence and absence on the landscape are standard in many SDM papers
(see the list at the end of these supplementary materials).
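One way to sample point records into the seven grid sizes is sketched below in Python. It assumes projected coordinates in kilometres and a rule that is not stated in the text: a cell is a presence if any survey point inside it records the species, and an absence otherwise. The names `cell_index`, `points_to_cells` and `count_cells` are hypothetical.

```python
from collections import defaultdict

GRID_SIZES_KM = [1, 2, 4, 8, 16, 32, 64]  # the seven operational scales


def cell_index(x_km, y_km, size_km):
    """Row/column index of the size_km x size_km cell containing a point.
    Assumes projected coordinates in km; floor division handles negatives."""
    return (int(y_km // size_km), int(x_km // size_km))


def points_to_cells(points, size_km):
    """Map (x_km, y_km, present) survey points to cells.
    Assumed rule: a cell is a presence if any point in it records the species."""
    cells = defaultdict(bool)
    for x, y, present in points:
        key = cell_index(x, y, size_km)
        cells[key] = cells[key] or present
    return dict(cells)


def count_cells(points):
    """Number of (presence, absence) cells at each grid size."""
    counts = {}
    for size in GRID_SIZES_KM:
        cells = points_to_cells(points, size)
        n_presence = sum(cells.values())
        counts[size] = (n_presence, len(cells) - n_presence)
    return counts


survey = [(0.2, 0.3, True), (0.8, 0.1, False), (1.5, 0.5, False), (3.9, 3.9, False)]
for size, (n_p, n_a) in count_cells(survey).items():
    print(f"{size}x{size} km: {n_p} presence, {n_a} absence")
```

Under this rule, absence points that share a cell with a presence point are absorbed into a presence cell as the grid coarsens, which is one way a coarse grid can enlarge the apparent range.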
All values are presence/absence counts. "Original" is the number of presence and absence
records from plot and herbarium data; the grid-size columns give the number of presence
and absence grid cells.

Species                      Range class   Original      1x1 km       2x2 km       4x4 km       8x8 km       16x16 km     32x32 km     64x64 km
Pinus sabiniana              Broad         1,089/31,464  702/1,404    545/1,090    422/844      320/640      202/404      97/194       43/56
Quercus douglasii            Broad         1,228/31,325  810/1,620    676/1,352    568/1,136    428/856      238/476      110/220      45/54
Quercus wislizenii           Broad         1,499/31,054  1,119/2,238  944/1,888    742/1,484    534/1,068    318/636      125/251      61/38
Abies magnifica              Intermediate  2,097/30,456  1,666/3,332  1,293/2,586  852/1,704    449/898      207/414      84/168       32/64
Pinus coulteri               Intermediate  317/32,236    204/408      152/304      114/228      81/162       54/108       37/74        22/44
Quercus agrifolia            Intermediate  1,389/31,164  827/1,654    669/1,338    497/994      347/694      198/396      83/166       32/64
Juglans californica hindsii  Narrow        79/32,474     55/110       42/84        33/66        27/54        19/38        13/26        11/22
Quercus engelmannii          Narrow        123/32,430    112/224      101/202      87/174       58/116       35/70        16/32        8/16
Sequoiadendron giganteum     Narrow        105/32,448    87/174       75/150       57/114       42/84        28/56        18/36        10/20

Herbarium records for the narrow-range species: Juglans californica hindsii 14, Quercus
engelmannii 23, Sequoiadendron giganteum 13; values for the six broad- and
intermediate-range species: 1,057, 39, 18, 0, 192 and 162.

Table 1. Number of presence and absence values used for species distribution model development at each grid size. The
vegetation plots used to identify presence and absence of species are a compilation of the majority of surveys conducted in
California, totaling over 32,000 survey points.
Figure 2. The distribution of presence (black) and absence (grey) points for the nine
species used to test the bias of operational scale in species distribution models. Broad-range-size
species are in the first row, intermediate-range-size species in the second row, and
narrow-range-size species in the bottom row.
To derive binary range maps from the probability maps for each model run, we used the
AUC cutoff value at which sensitivity and specificity were maximized. Table 2
presents the cutoff values used.
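A common way to operationalize "maximum sensitivity and specificity" is to pick the cutoff that maximizes their sum (equivalently Youden's J, sensitivity + specificity - 1). The Python sketch below assumes that criterion, tries every observed score as a candidate cutoff, and treats scores at or above the cutoff as predicted presence; the function names `best_threshold` and `to_binary` are hypothetical, and the software used in the study may have searched candidate cutoffs differently.

```python
def best_threshold(scores, labels):
    """Cutoff maximizing sensitivity + specificity (Youden's J).

    labels: 1 = observed presence, 0 = observed absence.
    Scores >= cutoff are treated as predicted presence.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict: keeps the lowest cutoff among ties
            best_t, best_j = t, j
    return best_t


def to_binary(prob_map, cutoff):
    """Convert a probability map (list of rows) into a 0/1 range map."""
    return [[1 if p >= cutoff else 0 for p in row] for row in prob_map]


cutoff = best_threshold([0.1, 0.3, 0.35, 0.6, 0.8], [0, 0, 1, 1, 1])
print(cutoff)                                      # 0.35
print(to_binary([[0.2, 0.5], [0.35, 0.1]], cutoff))  # [[0, 1], [1, 0]]
```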
Operational scale in km (columns):

Species                      Model   1x1     2x2     4x4     8x8     16x16   32x32   64x64
Pinus sabiniana              GLM     0.366   0.353   0.378   0.394   0.388   0.373   0.489
                             GAM     0.374   0.346   0.385   0.393   0.374   0.374   0.531
                             CTA     0.384   0.377   0.235   0.777   0.399   0.242   0.400
                             ANN     0.405   0.347   0.393   0.330   0.412   0.373   0.658
Quercus douglasii            GLM     0.414   0.395   0.404   0.428   0.453   0.595   0.445
                             GAM     0.445   0.443   0.432   0.465   0.435   0.440   0.598
                             CTA     0.399   0.222   0.384   0.389   0.571   0.250   0.714
                             ANN     0.454   0.434   0.395   0.503   0.526   0.319   0.714
Quercus wislizenii           GLM     0.386   0.373   0.391   0.396   0.444   0.521   0.738
                             GAM     0.397   0.388   0.361   0.403   0.431   0.520   0.640
                             CTA     0.531   0.613   0.356   0.391   0.463   0.587   0.919
                             ANN     0.470   0.448   0.393   0.443   0.432   0.622   0.802
Abies magnifica              GLM     0.419   0.411   0.409   0.390   0.434   0.417   0.363
                             GAM     0.492   0.459   0.463   0.449   0.490   0.450   0.404
                             CTA     0.552   0.368   0.489   0.423   0.299   0.499   0.599
                             ANN     0.480   0.450   0.464   0.466   0.432   0.432   0.742
Pinus coulteri               GLM     0.415   0.410   0.502   0.368   0.453   0.400   0.299
                             GAM     0.470   0.453   0.563   0.463   0.538   0.622   0.398
                             CTA     0.249   0.466   0.499   0.787   0.800   0.893   0.333
                             ANN     0.479   0.526   0.540   0.459   0.545   0.814   0.636
Quercus agrifolia            GLM     0.436   0.433   0.423   0.425   0.368   0.346   0.407
                             GAM     0.454   0.433   0.458   0.517   0.419   0.395   0.453
                             CTA     0.566   0.374   0.667   0.669   0.399   0.857   0.699
                             ANN     0.503   0.500   0.517   0.531   0.433   0.521   0.285
Juglans californica hindsii  GLM     0.602   0.369   0.512   0.574   0.420   0.425   0.307
                             GAM     0.535   0.559   0.729   0.672   0.764   0.490   0.462
                             CTA     0.950   0.666   0.941   0.600   0.731   0.519   0.500
                             ANN     0.522   0.351   0.627   0.331   0.750   0.679   0.665
Quercus engelmannii          GLM     0.583   0.553   0.658   0.542   0.368   0.482   0.999
                             GAM     0.536   0.622   0.643   0.600   0.862   0.997   0.859
                             CTA     0.285   0.787   0.666   0.722   0.599   0.599   0.999
                             ANN     0.715   0.549   0.593   0.537   0.729   0.955   0.928
Sequoiadendron giganteum     GLM     0.436   0.355   0.453   0.494   0.582   0.316   0.441
                             GAM     0.373   0.445   0.509   0.394   0.602   0.637   0.999
                             CTA     0.250   0.333   0.333   0.700   0.806   0.817   0.692
                             ANN     0.418   0.449   0.433   0.431   0.547   0.866   0.458

Table 2. AUC cutoff values used in creating binary range maps from continuous
probability SDM outputs, by species, model type and operational scale.
We conducted a five-fold cross-validation test of model performance in which each data
set was partitioned into five subsets, with one subset used as the test data set in each of
five model iterations, to derive individual model performance (Guisan et al. 2006;
Mathy et al. 2006). The resulting AUC values ranged from 0.56 to 0.97, and Kappa
statistics from 0.23 to 0.86. Lower Kappa scores were recorded at the larger grid sizes
and with classification tree (CTA) models. Models built at large grid sizes had lower
scores in this study because of the small number of presence and absence points
available within the bounds of the study area at the coarser grid sizes.
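The cross-validation loop can be sketched in Python as below. Two choices are assumptions rather than details given in the text: folds are stratified (presences and absences are shuffled separately and dealt round-robin, so every test fold contains both classes whenever each class has at least five records, which keeps AUC defined), and `fit`, `predict` and `metric` are placeholder callables standing in for whichever SDM algorithm and statistic is being evaluated. The toy model simply uses its single predictor as the score.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split record indices into k folds, dealing presences and absences
    round-robin after shuffling so each fold gets both classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def cross_validate(X, labels, fit, predict, metric, k=5, seed=0):
    """Hold out each fold in turn, train on the rest, score the held-out fold."""
    results = []
    for test in stratified_folds(labels, k, seed):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = fit([X[i] for i in train], [labels[i] for i in train])
        scores = [predict(model, X[i]) for i in test]
        results.append(metric(scores, [labels[i] for i in test]))
    return results


def auc_from_labels(scores, labels):
    """Pairwise AUC from scores and 0/1 labels; ties count one half."""
    pres = [s for s, y in zip(scores, labels) if y == 1]
    absn = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0 for p in pres for a in absn)
    return wins / (len(pres) * len(absn))


# Toy data: one predictor that perfectly separates absences (0) from presences (1)
X = [0.1 * i for i in range(20)]
y = [0] * 10 + [1] * 10
fold_aucs = cross_validate(
    X, y,
    fit=lambda X_train, y_train: None,  # placeholder: nothing to train
    predict=lambda model, x: x,         # score = predictor value
    metric=auc_from_labels,
)
print(fold_aucs)  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

With few presence cells per fold, as at the coarsest grid sizes, each held-out AUC rests on only a handful of presence/absence pairs, which is consistent with the lower and more variable scores reported above.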
Complete results from the SDM analyses are included in the attached Excel spreadsheet
supplemental table. The spreadsheet contains the AUC values derived from the five-fold
cross-validation; the Kappa statistics from the five-fold cross-validation; the spatial
congruence AUC values; and the area selected as range for each model/species
combination. All results for each model type are in a single row, with the different model
types listed one below the other and identified as GAM (Generalized Additive Model), GLM
(Generalized Linear Model), CTA (Classification Tree Analysis) and ANN (Artificial Neural
Network).
Finally, we developed a summary chart (Figure 4) that shows the summed cross-validation
AUC, spatial congruence AUC and area selected across all models. The threshold in the
trade-off between increasing grid size and model fidelity to the 1x1 km grid size lies at
around 8x8 km.
Our paper addresses one aspect of the bias introduced by using large grid sizes; other
analyses could also be done. The main finding of this study is that large cells (coarse grid
size) can produce spatial output that differs from that of small cells (fine grid size) and
tend to overestimate predicted area. It is possible that this result indicates that the
populations of the selected species were sampled nearly sufficiently for full geographic
representation.
Figure 3. Networked weather stations likely used for creation of WorldClim in
California.
Figure 4. Cross-validation AUC (CV-AUC), spatial congruence AUC (SC-AUC) and
area selected (Area) for all model types, ranked across grid sizes. As grid size increases,
area selected increases while spatial accuracy and performance statistics decline.
REFERENCES
Elith, J., C. H. Graham, R. P. Anderson, M. Dudik, S. Ferrier, A. Guisan, R. J. Hijmans,
F. Huettmann, J. R. Leathwick, A. Lehmann, et al. 2006. Novel methods improve
prediction of species’ distributions from occurrence data. Ecography 29, 129-151.
ESRI. 2007. ArcGIS software. Redlands, CA.
Fielding, A. H. & J. F. Bell. 1997. A review of methods for the assessment of prediction
errors in conservation presence/absence models. Environmental Conservation 24, 38-49.
Guisan, A., Graham, C.H., Elith, J., Huettmann, F., & NCEAS Species Distribution
Modelling Group. 2007. Sensitivity of predictive species distribution models to change in
grain size. Diversity and Distributions 13: 332-340.
Guisan, A., O. Broennimann, R. Engler, M. Vust, N. G. Yoccoz, A. Lehmann, & N. E.
Zimmermann. 2006. Using niche-based models to improve sampling of rare species.
Conservation Biology 20: 501-511.
Heikkinen, R.K., Luoto, M., Araújo, M.B., Virkkala, R., Thuiller, W., & Sykes, M.T.
2006. Methods and uncertainties in bioclimatic envelope modeling under climate
change. Progress in Physical Geography, 30, 751-777.
Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G. & Jarvis, A. 2005. Very high
resolution interpolated climate surfaces for global land areas. International Journal of
Climatology 25, 1965-1978.
Mathy, L., N. E. Zimmermann, N. Zbinden, & W. Suter. 2006. Identifying habitat
suitability for hazel grouse Bonasa bonasia at the landscape scale. Wildlife Biology 12:
357-366.
Meyer, C. B. & W. Thuiller. 2006. Accuracy of resource selection functions across
spatial scales. Diversity and Distributions 12, 288-297.
Monserud, R.A. & Leemans, R. 1992. Comparing global vegetation maps with
the Kappa statistic. Ecological Modelling, 62, 275-293.
Pearce, J. L. & S. Ferrier. 2000. Evaluating the predictive performance of habitat models
developed using logistic regression. Ecological Modelling 133: 225-245.
Qi, Y. & Wu, J. 1996. Effects of changing spatial resolution on the results of landscape
pattern analysis using spatial autocorrelational indices. Landscape Ecology 11, 39-49.
Stockwell, D.R.B. & Peterson, A.T. 2002. Effects of sample size on accuracy of species
distribution models. Ecological Modelling 148, 1-13.
A partial list of studies that have used Species Distribution Model analyses
Bakkenes, M. et al. 2002. Assessing effects of forecasted climate change on the diversity
and distribution of European higher plants for 2050. — Global Change Biology 8: 390-407.
Bakkenes, M. et al. 2006. Impacts of different climate stabilisation scenarios on plant
species in Europe. — Global Environmental Change 16: 19-28.
Bomhard, B. et al. 2005. Potential impacts of future land use and climate change on the
Red List status of the Proteaceae in the Cape Floristic Region, South Africa. — Global
Change Biology 11: 1452-1468.
Broennimann, O. et al. 2006. Do geographic distribution, niche property and life form
explain plants’ vulnerability to global change? — Global Change Biology 12: 1079-1093.
Broennimann, O. et al. 2007. Evidence of climatic niche shift during biological invasion.
— Ecology Letters 10: 701-709.
Elith, J. et al. 2006. Novel methods improve prediction of species' distributions from
occurrence data. — Ecography 29: 129-151.
Erasmus, B. F. N. et al. 2002. Vulnerability of South African animal taxa to climate
change. — Global Change Biology 8: 679-693.
Huntley, B. et al. 1995. Modelling present and potential future ranges of some European
higher plants using climate response surfaces. — Journal of Biogeography 22: 967-1001.
Huntley, B. et al. 2006. Potential impacts of climatic change upon geographical
distributions of birds. — Ibis 148: 8-28.
Huntley, B. et al. 2004. The performance of models relating species geographical
distributions to climate is independent of trophic level. — Ecology Letters 7: 417-426.
Iverson, L. R. and Prasad, A. 2002. Potential redistribution of tree species habitat under
five climate change scenarios in the eastern US. — Forest Ecology and Management 155:
205-222.
Iverson, L. R. et al. 1999. Modelling potential future individual tree-species distributions
in the Eastern United States under climate change scenario: a case study with Pinus
virginiana. — Ecological Modelling 115: 77-93.
Lawler, J. J. et al. 2006. Predicting climate-induced range shifts: model differences and
model reliability. — Global Change Biology 12: 1568-1584.
Luoto, M. and Hjort, J. 2005. Evaluation of current statistical approaches for predictive
geomorphological mapping. — Geomorphology 67: 299-315.
Luoto, M. and Hjort, J. 2006. Scale matters–A multi-resolution study of the determinants
of patterned ground activity in subarctic Finland. — Geomorphology, in press.
Midgley, G. F. et al. 2002. Assessing the vulnerability of species richness to
anthropogenic climate change in a biodiversity hotspot. — Global Ecology &
Biogeography 11: 445-451.
Midgley, G. F. et al. 2003. Developing regional and species-level assessments of climate
change impacts on biodiversity in the Cape Floristic Region. — Biological Conservation
112: 87-97.
Midgley, G. F. et al. 2006. Migration rate limitations on climate change induced range
shifts in Cape Proteaceae. — Diversity and Distributions 12: 555–562.
Pearson, R. G. et al. 2002. SPECIES: A Spatial Evaluation of Climate Impact on the
Envelope of Species. — Ecological Modelling 154: 289-300.
Pearson, R. G. et al. 2004. Modelling species distributions in Britain: a hierarchical
integration of climate and land-cover data. — Ecography 27: 285-298.
Rouget, M. et al. 2004. Mapping the potential ranges of major plant invaders in South
Africa, Lesotho and Swaziland using climatic suitability. — Diversity and Distributions
10: 475-484.
Thuiller, W. 2003. BIOMOD: Optimizing predictions of species distributions and
projecting potential future shifts under global change. — Global Change Biology 9:
1353-1362.
Thuiller, W. et al. 2004. Do we need land-cover data to model species distributions in
Europe? — Journal of Biogeography 31: 353-361.
Thuiller, W. et al. 2006. Vulnerability of African mammals to anthropogenic climate
change under conservative land transformation assumptions. — Global Change Biology
12: 424-440.
Thuiller, W. et al. 2005. Niche properties and geographical extent as predictors of species
sensitivity to climate change. — Global Ecology and Biogeography 14: 347-357.
Thuiller, W. et al. 2005. Climate change threats to plant diversity in Europe. —
Proceedings of the National Academy of Sciences, USA 102: 8245-8250.
Thuiller, W. et al. 2006. Using niche-based modelling to assess the impact of climate
change on tree functional diversity in Europe. — Diversity and Distributions 12: 49-60.
Thuiller, W. et al. 2006. Endemic species and ecosystem vulnerability to climate change
in Namibia. — Global Change Biology 12: 759–776.
Thuiller, W. et al. 2005. Niche-based modelling as a tool for predicting the risk of alien
plant invasions at a global scale. — Global Change Biology 11: 2234–2250.
Walther, B. A. et al. 2007. Modelling the winter distribution of a rare and endangered
migrant, the Aquatic Warbler Acrocephalus paludicola. — Ibis 149: 701–714.