
SUPPORTING INFORMATION
The role of ecology in the geographical separation of blood parasites infecting an insular bird
Josselin Cornuault, Aurélie Khimoun, Ryan J. Harrigan, Yann X.C. Bourgeois, Borja Milá, Christophe Thébaud and Philipp Heeb
Journal of Biogeography
Appendix S1 Details of species distribution models and associated statistical tests.
Construction of SDMs
GLMs and GAMs
Models were fitted with a binomial error distribution. We used the functions glm() for generalized linear models (GLMs) and gam() (R package GAM) for generalized additive models (GAMs). Model selection was carried out with the functions stepAIC() (R package MASS) for GLMs and step.gam() for GAMs. Interaction terms were not considered.
MARS
We used the function earth() of the R package EARTH for multivariate adaptive regression splines (MARS). The pmethod argument was set to ‘none’, so the terms selected in the forward pass were not pruned.
SVM
We used the function ksvm() of the R package KERNLAB for support vector machines (SVM). We fitted an epsilon-support vector regression with a polynomial kernel function. Default tuning parameters were kept unchanged, except for the value of epsilon in the insensitive-loss function, which was set to 0.01.
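The role of epsilon can be illustrated with the textbook definition of the epsilon-insensitive loss used in epsilon-support vector regression: residuals whose absolute value is smaller than epsilon incur no loss, and larger residuals are penalized linearly beyond that band. The Python sketch below illustrates that definition only; it is not KERNLAB code, and the function name is hypothetical. Lowering epsilon from the KERNLAB default of 0.1 to 0.01 narrows the zero-loss band, so smaller deviations in predicted prevalence contribute to the fit.

```python
def eps_insensitive_loss(observed, predicted, epsilon=0.01):
    """Epsilon-insensitive loss (hypothetical helper name): zero inside a band
    of half-width epsilon, linear in the absolute residual outside it."""
    return max(0.0, abs(observed - predicted) - epsilon)


# Residuals of 0.005 and 0.05 in prevalence, with epsilon = 0.01 as in the text
small = eps_insensitive_loss(0.300, 0.295)   # inside the band -> 0.0
large = eps_insensitive_loss(0.300, 0.250)   # outside the band -> about 0.04
# The same large residual under the KERNLAB default epsilon of 0.1
large_default = eps_insensitive_loss(0.300, 0.250, epsilon=0.1)  # -> 0.0
```

With epsilon = 0.1, a prevalence error of 0.05 would be ignored entirely; with 0.01 it is penalized.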
Random forests
We used the function randomForest() of the R package RANDOMFOREST. Default parameters were kept, except for the number of trees in the forest, which was set to 5000.
Average model
Once the five elementary models were calibrated, each model was used to predict prevalence for every pixel of the study area. The five predicted prevalence values obtained for each pixel were then averaged.
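A minimal Python sketch of the averaging step, assuming an unweighted arithmetic mean and predictions stored as one list per model in a common pixel order. The function name and model labels are hypothetical, and the prevalence values are made up for illustration.

```python
def average_predictions(predictions_by_model):
    """Unweighted per-pixel mean of predicted prevalence across models.

    predictions_by_model maps a model label to a list of predicted prevalence
    values, one per pixel, with all lists sharing the same pixel order.
    """
    per_model = list(predictions_by_model.values())
    n_pixels = len(per_model[0])
    if any(len(p) != n_pixels for p in per_model):
        raise ValueError("every model must predict the same set of pixels")
    return [sum(p[i] for p in per_model) / len(per_model) for i in range(n_pixels)]


# Toy predictions for two pixels (illustrative values, not study results)
ensemble = average_predictions({
    "GLM": [0.10, 0.50],
    "GAM": [0.20, 0.40],
    "MARS": [0.15, 0.45],
    "SVM": [0.05, 0.55],
    "RF": [0.00, 0.60],
})
# ensemble is approximately [0.1, 0.5]
```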
Niche identity test
The procedure for this test is described for presence-only data in Warren et al. (2008), where it is named the niche equivalency test, and has been implemented in ENMTOOLS (Warren et al., 2010). It required a slight adaptation to continuous data: each locality cannot be scored as 1 (presence of Lineage A) or 0 (presence of Lineage B), but instead receives a pair of continuous prevalence values, one per lineage. We proceeded as follows.
1. The 2n prevalence values (n for Lineage A + n for Lineage B) were pooled in a vector X.
2. An empty vector Y of the same length as X (2n) was created, each of its elements corresponding to one of the 2n possible locality–lineage associations.
3. The vector Y was filled with 2n random draws from X, without replacement. Y thus represents a randomly drawn pseudo-data set.
4. One SDM was built for each lineage, using the corresponding n elements of Y as input data. The procedure for building these SDMs was identical to that used with the real data, including model selection where appropriate.
5. From the predictions of the two SDMs, values of the I and D statistics were calculated.
6. Steps 1–5 were repeated 1000 times, providing null distributions of I and D to be compared with the observed values of the same statistics.
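The randomization in steps 1–6 can be sketched in Python as follows. Several assumptions are labelled in the code: the SDM pipeline (five models plus averaging) is replaced by a hypothetical placeholder, toy_sdm; D is Schoener's D and I is written in the bounded form I = 1 − ½ Σ(√p − √q)², with both prediction surfaces rescaled to sum to 1 as in Warren et al. (2008); and the first n elements of Y are assumed to belong to Lineage A's localities and the last n to Lineage B's. The prevalence values are invented.

```python
import math
import random


def schoener_d(p, q):
    """Schoener's D between two prediction surfaces, each rescaled to sum to 1."""
    sp, sq = sum(p), sum(q)
    return 1.0 - 0.5 * sum(abs(a / sp - b / sq) for a, b in zip(p, q))


def hellinger_i(p, q):
    """I statistic in the assumed form 1 - 0.5 * sum((sqrt(p) - sqrt(q))**2)."""
    sp, sq = sum(p), sum(q)
    return 1.0 - 0.5 * sum((math.sqrt(a / sp) - math.sqrt(b / sq)) ** 2
                           for a, b in zip(p, q))


def identity_null(prev_a, prev_b, fit_and_predict, n_reps=1000, seed=1):
    """Null distributions of D and I for the adapted niche identity test.

    prev_a, prev_b: prevalence of Lineage A and Lineage B at the same n localities.
    fit_and_predict: hypothetical stand-in for the full SDM pipeline; it takes n
    prevalence values and returns predicted values over all pixels.
    """
    n = len(prev_a)
    rng = random.Random(seed)
    x = list(prev_a) + list(prev_b)              # step 1: pool the 2n values in X
    null_d, null_i = [], []
    for _ in range(n_reps):                      # step 6: repeat (pooling is identical each time)
        y = rng.sample(x, len(x))                # steps 2-3: 2n draws without replacement
        pred_a = fit_and_predict(y[:n])          # step 4: pseudo-data for Lineage A
        pred_b = fit_and_predict(y[n:])          #         and for Lineage B
        null_d.append(schoener_d(pred_a, pred_b))   # step 5
        null_i.append(hellinger_i(pred_a, pred_b))
    return null_d, null_i


def toy_sdm(prevalence):
    # Hypothetical placeholder for the SDM ensemble: returns the input values at
    # the sampled localities plus a small floor so that each surface sums to > 0.
    return [v + 0.01 for v in prevalence]


prev_a = [0.6, 0.4, 0.5, 0.0, 0.0, 0.0]   # invented data
prev_b = [0.0, 0.0, 0.1, 0.3, 0.5, 0.4]
obs_d = schoener_d(toy_sdm(prev_a), toy_sdm(prev_b))
obs_i = hellinger_i(toy_sdm(prev_a), toy_sdm(prev_b))
null_d, null_i = identity_null(prev_a, prev_b, toy_sdm)
# Share of null D values at or below the observed D (one possible summary)
frac_d_below = sum(d <= obs_d for d in null_d) / len(null_d)
```

Both indices equal 1 for proportional surfaces and 0 for surfaces with no overlap, which is a quick sanity check when adapting the sketch.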
Note that this procedure randomizes lineage identities across the whole area, not within localities. We could have chosen to randomize lineage identities within localities, but we believe that such a randomization procedure would not properly reflect the null situation of no difference between niches. Consider the present data set as an example: among the 33 localities, 24 are occupied by only one of the two lineages. A large part of the data set thus consists of localities with large among-lineage differences in prevalence. Such a feature is particularly expected if the lineages do not share the same niche, and it should therefore not be constrained in a null model of random differences between niches.
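The contrast between the two randomization schemes can be made concrete with a short Python sketch on invented data; the function names are hypothetical. Swapping lineage labels within localities leaves the absolute among-lineage difference at every locality unchanged, whereas permuting the pooled values across the whole area, as in the procedure used here, lets those differences vary.

```python
import random


def within_locality_swap(prev_a, prev_b, rng):
    """Alternative scheme: swap the two lineages' values at each locality with probability 0.5."""
    out_a, out_b = [], []
    for a, b in zip(prev_a, prev_b):
        if rng.random() < 0.5:
            a, b = b, a
        out_a.append(a)
        out_b.append(b)
    return out_a, out_b


def whole_area_permutation(prev_a, prev_b, rng):
    """Scheme used here: shuffle the pooled 2n values across all locality-lineage slots."""
    pooled = list(prev_a) + list(prev_b)
    rng.shuffle(pooled)
    n = len(prev_a)
    return pooled[:n], pooled[n:]


def mean_abs_difference(a, b):
    """Mean absolute difference in prevalence between lineages, per locality."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


prev_a = [0.6, 0.4, 0.5, 0.0, 0.0, 0.0]   # invented data: 5 of 6 localities
prev_b = [0.0, 0.0, 0.1, 0.3, 0.5, 0.4]   # host a single lineage
rng = random.Random(42)
observed = mean_abs_difference(prev_a, prev_b)
within = [mean_abs_difference(*within_locality_swap(prev_a, prev_b, rng)) for _ in range(2000)]
whole = [mean_abs_difference(*whole_area_permutation(prev_a, prev_b, rng)) for _ in range(2000)]
# Every within-locality replicate equals the observed value exactly; the
# whole-area replicates average well below it for these data.
```

Under within-locality swapping the statistic cannot move at all, which is the constraint the text argues a null model should not impose.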
Niche background test
This test consists of randomizing the geographical coordinates of one lineage’s occurrences (i.e. non-zero-prevalence points) within its background, while zero-prevalence points are left unchanged. This generates a null expectation for the environmental conditions in which the lineage occurs, drawn from among the conditions that are actually geographically available to it. Thus, 1000 random SDMs were constructed and projected over the whole island. The I and D similarity indices were used as statistics in a two-tailed test: niche conservatism is supported if the observed value of an index is greater than 97.5% of the values in its null distribution, and niche divergence is supported if it is smaller than 97.5% of those values. This test requires the lineage backgrounds to be delimited. One of the most widely used methods for defining a species’ background is to draw the minimal convex polygon that includes all occurrence points (Warren et al., 2010). We used this method here, but also tried two alternative methods for defining the distribution area of Lineage A, because its semicircular distribution (Fig. 1) is poorly represented by a convex polygon. Results were qualitatively similar for the three methods (see below).
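The two-tailed decision rule can be written as a small Python function (hypothetical name). It assumes the null distribution of one index is available as a list of simulated values, and it uses strict inequalities, so null values tied with the observed value count toward neither tail; that tie handling is an assumption.

```python
def background_test_outcome(observed, null_values, level=0.975):
    """Two-tailed reading of the background test for one index (I or D)."""
    n = len(null_values)
    share_below = sum(v < observed for v in null_values) / n   # null values smaller than observed
    share_above = sum(v > observed for v in null_values) / n   # null values larger than observed
    if share_below > level:
        return "niche conservatism"   # observed similarity exceeds > 97.5% of the null
    if share_above > level:
        return "niche divergence"     # observed similarity is below > 97.5% of the null
    return "no significant departure from the null"


null = [i / 1000 for i in range(1000)]       # toy null distribution on [0, 0.999]
print(background_test_outcome(0.999, null))  # niche conservatism
print(background_test_outcome(0.001, null))  # niche divergence
print(background_test_outcome(0.5, null))    # no significant departure from the null
```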
The niche background test was conducted as follows:
1. Let m be the number of localities where Lineage A is present, and let B, the background of Lineage A (see below), be the area that geographically circumscribes them. Let n be the number of sampled localities where Lineage A was absent.
2. A new set of m geographical coordinates is drawn at random within B, providing a pseudo-data set of non-zero-prevalence points with randomized geographical coordinates and, therefore, randomized environmental values. The n absences are left unchanged.
3. A new SDM is then built for Lineage A from this pseudo-data set. The SDM for Lineage B is kept unchanged (observed SDM).
4. Values of the I and D statistics are calculated between the pseudo-SDM for Lineage A and the observed SDM for Lineage B.
5. Steps 1–4 are repeated 1000 times, providing null distributions of I and D to be compared with the observed values of the same statistics.
6. Steps 1–5 are repeated with the roles of the two lineages exchanged, i.e. randomizing the occurrences of Lineage B within its background while keeping the observed SDM for Lineage A.
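Steps 2–5 for one lineage can be sketched in Python. Several pieces are assumptions: coordinates are treated as planar, the background is given as a list of polygon vertices, random points are drawn uniformly by rejection sampling from the polygon's bounding box, and rebuilding the SDM (including extracting environmental values at the new points and attaching prevalence values to them) is delegated to a hypothetical placeholder callable. In practice the similarity function would be D or I; here both the SDM and the similarity are toy placeholders so the sketch runs on its own.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does a horizontal ray from (x, y) towards +x cross the edge (j, i)?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def random_points_in_polygon(polygon, m, rng):
    """Draw m points uniformly inside the polygon by rejection sampling."""
    xs = [vx for vx, _ in polygon]
    ys = [vy for _, vy in polygon]
    points = []
    while len(points) < m:
        x, y = rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


def background_null(background, m, absences, build_pseudo_sdm, observed_sdm_other,
                    similarity, n_reps=1000, seed=1):
    """Null distribution of one similarity index for the background test."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_reps):                                          # step 5
        presences = random_points_in_polygon(background, m, rng)     # step 2
        pseudo_sdm = build_pseudo_sdm(presences, absences)           # step 3 (hypothetical)
        null.append(similarity(pseudo_sdm, observed_sdm_other))      # step 4
    return null


def toy_pseudo_sdm(presences, absences):
    # Hypothetical placeholder: summarizes the pseudo-presences by their mean x-coordinate.
    return sum(px for px, _ in presences) / len(presences)


def toy_similarity(s, t):
    # Hypothetical placeholder for D or I: 1 minus the absolute difference.
    return 1.0 - abs(s - t)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
pts = random_points_in_polygon(square, 5, random.Random(0))
null = background_null(square, 5, [], toy_pseudo_sdm, 0.5, toy_similarity, n_reps=200)
```

Rejection sampling is simple and exact for uniform draws, but it becomes slow when the polygon fills only a small part of its bounding box, as a thin semicircular background might.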
Background definition
Minimal convex polygons circumscribing the occurrences of Lineage A and Lineage B were
drawn with DIVA-GIS software.
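The minimal convex polygon is the convex hull of the occurrence points, which can also be computed outside DIVA-GIS, for example with Andrew's monotone chain algorithm. The Python sketch below is an independent illustration under the assumption of planar (projected) coordinates, not the DIVA-GIS implementation; the function name and the points are hypothetical.

```python
def convex_hull(points):
    """Minimal convex polygon (convex hull) of 2-D points, Andrew's monotone chain.

    Returns the hull vertices in counter-clockwise order, without repeating the first.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:                      # build the lower chain from left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):            # build the upper chain from right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # drop each chain's last point (shared endpoints)


occurrences = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5)]
hull = convex_hull(occurrences)
print(hull)  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Interior points such as (1, 1) are discarded, which is why a convex polygon can include large unoccupied areas when occurrences follow a curved, semicircular pattern.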
For Lineage A, we tried two other backgrounds, which we call minimal and maximal. The minimal background was drawn by hand around the occurrences of Lineage A, and the maximal background is the minimal background extended westward as far as the ocean. The maximal background was also tried because the absence of Lineage A west of the minimal background is most probably due to a lack of sampling rather than to true absences.
The three backgrounds are illustrated below and yield similar results for the background
test:
Type of background for Lineage A    I       P            D       P
minimal convex polygon              0.65    <0.001***    0.45    <0.001***
minimal                             0.65    <0.001***    0.45    <0.001***
maximal                             0.65    <0.001***    0.45    <0.001***
Convex polygon. Blue points: occurrence points of Lineage A. Yellow area: background.
Minimal background.
Maximal background.
REFERENCES
Warren, D.L., Glor, R.E. & Turelli, M. (2008) Environmental niche equivalency versus
conservatism: quantitative approaches to niche evolution. Evolution, 62, 2868–2883.
Warren, D.L., Glor, R.E. & Turelli, M. (2010) ENMTools: a toolbox for comparative studies of
environmental niche models. Ecography, 33, 607–611.