SUPPORTING INFORMATION

The role of ecology in the geographical separation of blood parasites infecting an insular bird

Josselin Cornuault, Aurélie Khimoun, Ryan J. Harrigan, Yann X.C. Bourgeois, Borja Milá, Christophe Thébaud and Philipp Heeb

Journal of Biogeography

Appendix S1 Details of species distribution models and associated statistical tests.

Construction of SDMs

GLMs and GAMs
Models were constructed with binomial error. We used the functions glm() for generalized linear models (GLMs) and gam() (R package gam) for generalized additive models (GAMs). Model selection was carried out with the functions stepAIC() for GLMs and step.gam() for GAMs. Interactions were not considered.

MARS
We used the function earth() of the R package earth for multivariate adaptive regression splines (MARS). The pmethod argument was set to ‘none’.

SVM
We used the function ksvm() of the R package kernlab for support vector machines (SVM). The type of model was epsilon regression with a polynomial kernel function. Default tuning parameters were kept unchanged, except for the value of epsilon in the insensitive-loss function, which was set to 0.01.

Random forests
We used the function randomForest() of the R package randomForest. Default parameters were kept, except for the number of trees in the forest, which was set to 5000.

Average model
Once the five elementary models were calibrated, each was used to predict prevalence for every pixel of the study area. The five predicted prevalence values obtained for each pixel were then averaged.

Niche identity test

The procedure for this test is described for presence-only data in Warren et al. (2008), where it is called the niche equivalency test, and has been implemented in ENMTools (Warren et al., 2010). It required a slight adaptation to continuous data, because a locality cannot be scored as 1 (presence of lineage 1) or 0 (presence of lineage 2) but instead receives a pair of continuous prevalence values. We proceeded as follows.

1. The 2n prevalence values (n for Lineage A + n for Lineage B) were pooled in a vector X.
2. An empty vector Y of the same length as X (2n) was created, each of its elements corresponding to one of the 2n possible locality–lineage associations.
3. The vector Y was filled with 2n random draws from X, without replacement. Y thus represents a randomly drawn pseudo-dataset.
4. One SDM was built for each lineage with vector Y as input data. These SDMs were built in exactly the same way as with the real data, including model selection when appropriate.
5. From the predictions given by the two SDMs, values of the I and D statistics were calculated.
6. Steps 1–5 were repeated 1000 times, providing a null distribution of the I and D statistics to be compared with the observed values of the same statistics.

Note that this procedure randomizes lineage identities across the whole area, not within localities. We could have chosen to randomize lineage identities within localities, but we believe that such a procedure would not properly reflect the null situation of no difference between niches. Take this study as an example. Among the 33 localities, 24 are occupied by only one of the two lineages. A large part of the data set thus consists of localities with large among-lineage differences in prevalence. Such a feature is particularly expected if lineages do not share the same niche, and it should not be constrained in a null model of random differences between niches.

Niche background test

This test consists of randomizing the geographical coordinates of one lineage’s occurrences (i.e. non-zero-prevalence points) within its background. Zero-prevalence points are left unchanged. This simulates a random expectation for the environmental conditions in which the lineage preferentially occurs, among the conditions that are actually geographically available to it.
A total of 1000 random SDMs were thus constructed and projected over the whole island. The I and D similarity indices were used as statistics in a two-tailed test, in which niche conservatism or niche divergence is evidenced if the observed values of I and D are, respectively, greater than or less than 97.5% of the values in the null distribution of each index.

This test requires that lineage backgrounds be delimited. One of the most widely used methods for defining a species background is to draw the minimal convex polygon that includes all occurrence points (Warren et al., 2010). We used this method here, but also tried two alternative methods for defining the distribution area of Lineage A, because its semicircular distribution (Fig. 1) is poorly represented by a convex polygon. Results were qualitatively similar for the three methods (see below).

The niche background test was conducted as follows:

1. Let m be the number of localities where Lineage A is present, geographically circumscribed by an area B, denoted the background of Lineage A (see below). Let n be the number of sampled localities where Lineage A was absent.
2. A new set of m geographical coordinates is drawn at random within B, providing a pseudo-data set for non-zero prevalence, with randomized geographical coordinates and environmental values. The n absences are left unchanged.
3. A new SDM is then built for Lineage A with this pseudo-data set. The SDM for Lineage B is kept unchanged (observed SDM).
4. Values of the I and D statistics are calculated between the pseudo-SDM for Lineage A and the observed SDM for Lineage B.
5. Steps 1–4 are repeated 1000 times, providing a null distribution of the I and D statistics to be compared with the observed values of the same statistics.
6. Steps 1–5 are repeated with randomization of Lineage B occurrences.

Background definition

Minimal convex polygons circumscribing the occurrences of Lineage A and Lineage B were drawn with DIVA-GIS software.
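As a rough illustration only, the geometric and statistical ingredients of the background test can be sketched in Python. None of this code comes from the study, which relied on R and DIVA-GIS. All function names are hypothetical, coordinates are treated as planar, the toy points are invented, and SDM fitting is left out entirely. The D and I functions use the forms D = 1 - 0.5 Σ|p - q| and I = 1 - 0.5 Σ(√p - √q)² on predictions rescaled to sum to 1; that the authors' computation matches these forms exactly is an assumption.

```python
import math
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Minimal convex polygon (Andrew's monotone chain), vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, p):
    """True if p is inside or on the edge of a counter-clockwise convex polygon."""
    k = len(hull)
    return all(cross(hull[i], hull[(i + 1) % k], p) >= 0 for i in range(k))


def random_points_in_background(occurrences, m, rng):
    """Step 2: draw m points uniformly within the hull by rejection sampling."""
    hull = convex_hull(occurrences)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear occurrence points")
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    drawn = []
    while len(drawn) < m:
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if inside(hull, p):
            drawn.append(p)
    return hull, drawn


def normalise(pred):
    """Rescale per-pixel predictions so that they sum to 1."""
    total = sum(pred)
    return [v / total for v in pred]


def schoener_d(pred_a, pred_b):
    """D statistic between two sets of per-pixel predictions (assumed form)."""
    p, q = normalise(pred_a), normalise(pred_b)
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def warren_i(pred_a, pred_b):
    """I statistic between two sets of per-pixel predictions (assumed form)."""
    p, q = normalise(pred_a), normalise(pred_b)
    return 1.0 - 0.5 * sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q))


def two_tailed_verdict(observed, null_values):
    """Compare an observed statistic with its null distribution (97.5% rule)."""
    n = len(null_values)
    if sum(v < observed for v in null_values) / n > 0.975:
        return "conservatism"
    if sum(v > observed for v in null_values) / n > 0.975:
        return "divergence"
    return "not significant"


# Invented toy occurrences: four corners of a rectangle plus one interior point.
rng = random.Random(1)
occurrences = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 1.0)]
hull, pseudo_occurrences = random_points_in_background(occurrences, len(occurrences), rng)
```

In a full run, each set of pseudo-occurrences would be combined with the unchanged absences to fit a new SDM, and the resulting predictions would be compared with those of the other lineage through schoener_d and warren_i, 1000 times over. Rejection sampling from the bounding box is used here only because it is simple; it would work for any background shape for which a point-in-polygon test is available, not only convex ones.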
For Lineage A, we tried two other backgrounds, which we call ‘minimal’ and ‘maximal’: ‘minimal’ was drawn by hand around the occurrences of Lineage A, and ‘maximal’ is ‘minimal’ extended westward as far as the ocean. We also tried ‘maximal’ because the lack of Lineage A occurrences west of ‘minimal’ is most probably due to a lack of sampling rather than to real absences. The three backgrounds are illustrated below and yield similar results for the background test:

Type of background for Lineage A    I       P            D       P
minimal convex polygon              0.65    <0.001***    0.45    <0.001***
minimal                             0.65    <0.001***    0.45    <0.001***
maximal                             0.65    <0.001***    0.45    <0.001***

Convex polygon. Blue points: occurrence points of Lineage A. Yellow area: background.

Minimal background.

Maximal background.

REFERENCES

Warren, D.L., Glor, R.E. & Turelli, M. (2008) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62, 2868–2883.

Warren, D.L., Glor, R.E. & Turelli, M. (2010) ENMTools: a toolbox for comparative studies of environmental niche models. Ecography, 33, 607–611.