JANE_1967_sm_SuppInfRegressions

Supporting Information: Regressions on distance matrices
To generate the response matrix N, the nestedness metric for any pair of hosts ‘i’ and ‘j’ is
calculated in the following manner: (i) suppose ‘i’ and ‘j’ are characterized by vulnerabilities (numbers of
parasite species) ki and kj respectively, and that ki>kj, (ii) the nestedness measure (Npaired) for this host pair
is given by the percentage of parasite species interacting with ‘j’ that are shared with ‘i’ (Almeida-Neto
et al. 2008). As we previously sorted the hosts in interaction matrix M in decreasing order of
vulnerability, the Npaired is zero only when species ‘i’ and ‘j’ have the same vulnerability or share no
parasite in common (see Almeida-Neto et al. 2008 for other possibilities).
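The pairwise calculation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name n_paired and the small binary matrix M are hypothetical, and the rows (hosts) are assumed to be already sorted by decreasing vulnerability.

```python
import numpy as np

def n_paired(M):
    """Pairwise nestedness between hosts of a binary host x parasite matrix.

    Assumes rows (hosts) are sorted by decreasing vulnerability (row sum).
    Entry [i, j] with i < j holds the percentage of the parasites of host j
    that are also found on host i; it stays 0 when k_i == k_j.
    """
    M = np.asarray(M, dtype=int)
    k = M.sum(axis=1)                 # vulnerabilities k_i
    H = M.shape[0]
    N = np.zeros((H, H))
    for i in range(H):
        for j in range(i + 1, H):
            if k[i] > k[j] and k[j] > 0:
                shared = np.sum(M[i] & M[j])   # parasites common to i and j
                N[i, j] = 100.0 * shared / k[j]
    return N

# Hypothetical 4 hosts x 4 parasites matrix, vulnerabilities 4, 2, 2, 1
M = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
])
N = n_paired(M)
```

In this toy matrix the second and third hosts have the same vulnerability, so their Npaired is zero, and the third and fourth hosts share no parasite, so their Npaired is also zero.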
The taxonomic distance matrix T was calculated by assigning semi-metric distances between host
species. Thus, tij values were assumed to be equal to 1, 2, 3, 4 and 5 if species i and j belong, respectively,
to the same genus, to the same family (but not the same genus), to the same order (but different families),
to the same class (but different orders), and to different classes. The abundance, body size, biomass, and
sampling effort distance matrices were calculated by simple subtractions between hosts, incorporating an
ordering component consistent with the concept of nestedness implied by Npaired. Consider again the two
host species ‘i’ and ‘j’, with abundances ai and aj, and vulnerabilities ki > kj. As the parasite composition
of the species with lower vulnerability is expected to be a nested subset of the composition of the species
with higher vulnerability (and not the opposite), we also expect that ai>aj if abundance is a factor driving
nestedness (i.e. more abundant hosts are expected to have richer parasite compositions as they are easier
targets for infection and have a larger capacity to sustain parasite populations). So the abundance distance
aij was calculated as ai – aj. A matrix A with these distances for all pairs i and j with ki>kj was then
built and used as the explanatory matrix representing the influence of abundance on nestedness. The
same procedure was used for biomass, body size (fish standard length), and sampling effort (number of
individual fishes analyzed for parasites), generating the distance matrices B, L, and E, respectively. For
species pairs with the same vulnerability (ki= kj), the Npaired is equal to zero and there is no expected
direction for the difference between their abundance, biomass, body size, or sampling effort, so no value
was calculated for these pairs in the explanatory distance matrices.
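A minimal sketch of how the explanatory matrices described in this paragraph could be built is given below. The function names, the dictionary layout for the taxonomy, and all example values are assumptions made for illustration. Genus names are assumed to be unique across families, and pairs without a defined value are marked with NaN.

```python
import numpy as np

RANKS = ("genus", "family", "order", "class")   # finest to coarsest rank

def taxonomic_distance(tax_i, tax_j):
    """Return 1 (same genus), 2 (same family), 3 (same order),
    4 (same class) or 5 (different classes)."""
    for d, rank in enumerate(RANKS, start=1):
        if tax_i[rank] == tax_j[rank]:
            return d
    return len(RANKS) + 1

def signed_difference(x, k):
    """Matrix of x_i - x_j, kept only where k_i > k_j.

    Pairs with equal vulnerability (and the mirrored pairs with k_i < k_j)
    are set to NaN, since no direction is expected for them.
    """
    x = np.asarray(x, dtype=float)
    k = np.asarray(k)
    D = x[:, None] - x[None, :]
    D[k[:, None] <= k[None, :]] = np.nan
    return D

# Placeholder taxonomy for four hypothetical hosts
hosts = [
    {"genus": "G1", "family": "F1", "order": "O1", "class": "C1"},
    {"genus": "G1", "family": "F1", "order": "O1", "class": "C1"},
    {"genus": "G2", "family": "F1", "order": "O1", "class": "C1"},
    {"genus": "G3", "family": "F2", "order": "O2", "class": "C2"},
]
t_01 = taxonomic_distance(hosts[0], hosts[1])
t_02 = taxonomic_distance(hosts[0], hosts[2])
t_03 = taxonomic_distance(hosts[0], hosts[3])

# Made-up abundances and vulnerabilities for three hosts (sorted by k)
A = signed_difference([50, 20, 30], [5, 3, 3])
```

The same signed_difference helper would serve for biomass, body size, and sampling effort, since only the input vector changes.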
All matrices were unfolded to generate vectors n, a, b, l, e, and t, representing nestedness,
abundance, biomass, body size, sampling effort, and taxonomy, respectively. These vectors contain the
H(H-1)/2 unique host pairs comprising the elements above the main diagonal of original matrices (see
Figure 10.21 in Legendre & Legendre 1998), where H is the total number of host species with data
available for all variables (H = 69). The usual procedure is then to carry out a linear regression of n
against the explanatory vectors a, b, l, e, and t (Legendre & Legendre 1998). However, as there is
substantial correlation among some of these explanatory vectors, we performed a principal component
regression (Graham 2003 and references therein). This consists of replacing the original explanatory vectors
with the factors derived from a Principal Component Analysis (PCA). These factors are linear
combinations of the original vectors, and are orthogonal, which means that the estimation of regression
coefficients is not biased by collinearity (Graham 2003). As suggested by Graham (2003), all PCA factors
were used for regression. Prior to PCA, the variables were standardized to control for differences in scale.
After the regression, the coefficients representing the effects of the original (standardized) variables were
back-calculated by multiplying a matrix containing the PCA eigenvectors by a vector containing the regression
coefficients of PCA factors (Legendre & Legendre 1998).
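The unfolding and principal component regression steps can be sketched as follows. This is an illustration using NumPy only, not the code used for the analysis; the function names and the simulated data are assumptions. Because all PCA factors are kept, the back-calculated coefficients coincide with ordinary least-squares coefficients on the standardized variables, which the example checks.

```python
import numpy as np

def upper_triangle(D):
    """Unfold the elements above the main diagonal into a vector."""
    D = np.asarray(D, dtype=float)
    return D[np.triu_indices(D.shape[0], k=1)]

def pc_regression(X, y):
    """Principal component regression keeping all PCA factors.

    X: cases x variables, y: response vector (valid cases only).
    Returns coefficients for the standardized original variables.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize
    _, U = np.linalg.eigh(np.cov(Z, rowvar=False))      # eigenvectors (columns)
    F = Z @ U                                           # PCA factor scores
    design = np.column_stack([np.ones(len(y)), F])
    gamma = np.linalg.lstsq(design, y, rcond=None)[0]   # intercept + factor coefs
    return U @ gamma[1:]                                # back-calculation

# Simulated, deliberately correlated explanatory vectors (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] += X[:, 0]
y = 1.0 + X @ np.array([0.5, -0.2, 0.3]) + rng.normal(scale=0.1, size=50)
beta = pc_regression(X, y)

# Reference: ordinary least squares on the standardized variables
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
ols = np.linalg.lstsq(np.column_stack([np.ones(50), Z]), y, rcond=None)[0][1:]

v = upper_triangle([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
```

In practice, rows corresponding to host pairs without a defined value (for example, those marked NaN in the explanatory vectors) would be dropped from X and y before calling pc_regression.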
The species pairs with the same vulnerability did not enter this analysis for the reason outlined
above. So, from a total of 69(69-1)/2 = 2346 original pairs, only 2116 were valid cases. The significance
of the regression coefficients was assessed by a randomization procedure, as in a Mantel test (Legendre
& Legendre 1998). It consisted of randomly permuting the order of species in matrix N, recalculating n,
excluding from n the cases corresponding to invalid cases in a, b, l, e, and t (i.e. host pairs with the same
vulnerability), and performing a new regression. The procedure was repeated 9999 times, producing a
distribution of 10000 coefficient estimates (including the observed value) for each explanatory variable.
The p-values were calculated as the proportion of coefficients in these distributions with magnitudes
greater than or equal to that of the observed coefficient.
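The randomization procedure can be sketched as below. Several details are assumptions rather than statements from the text: the response matrix is mirrored across the diagonal before permuting so that each host pair keeps one value, magnitudes are compared as absolute values, and the function names, the plain least-squares fit, and the simulated data are all hypothetical. The example uses 199 permutations only to keep it fast.

```python
import numpy as np

def ols_slopes(X, y):
    """Least-squares slopes (intercept fitted but not returned)."""
    design = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0][1:]

def permutation_pvalues(N, X, valid, fit, n_perm=9999, seed=0):
    """Mantel-style test: permute species in N, keep X and valid fixed.

    N: H x H response matrix (values above the diagonal are used).
    X: pairs x variables, rows ordered as the upper triangle of N.
    valid: boolean mask over pairs (False for equal-vulnerability pairs).
    fit: function (X, y) -> coefficient vector.
    """
    rng = np.random.default_rng(seed)
    H = N.shape[0]
    iu = np.triu_indices(H, k=1)
    upper = np.triu(N, k=1)
    Nsym = upper + upper.T                      # assumption: mirror each pair
    observed = fit(X[valid], Nsym[iu][valid])
    count = np.ones_like(observed)              # observed value counts once
    for _ in range(n_perm):
        p = rng.permutation(H)
        Np = Nsym[np.ix_(p, p)]                 # same order for rows and columns
        coef = fit(X[valid], Np[iu][valid])
        count += np.abs(coef) >= np.abs(observed)
    return observed, count / (n_perm + 1)

# Simulated example with a strong effect (illustration only)
rng = np.random.default_rng(42)
H = 12
iu = np.triu_indices(H, k=1)
x = rng.normal(size=iu[0].size)
N = np.zeros((H, H))
N[iu] = 2.0 * x + rng.normal(scale=0.1, size=x.size)
valid = np.ones(x.size, dtype=bool)
observed, pvals = permutation_pvalues(N, x[:, None], valid, ols_slopes, n_perm=199)
```

With n_perm permutations the smallest attainable p-value is 1/(n_perm + 1), which is 0.0001 for the 9999 permutations reported above.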
The same steps were used to test for influences of the explanatory variables on the response
matrix C, with three exceptions: (i) we used logistic instead of linear regression; (ii) the distance matrices
of abundance, biomass, body length and sampling effort were calculated as absolute differences,
because the ordering of species by vulnerability does not matter for network modules as it does for
nestedness; and (iii) as a consequence, we could use all 2346 species pairs for this regression.
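As a small illustration of exceptions (ii) and (iii), the sketch below builds an absolute-difference matrix and unfolds it. The abundance values are randomly generated placeholders and the function name is hypothetical. Absolute differences are symmetric and defined for every pair, so with H = 69 all H(H-1)/2 = 2346 pairs are retained.

```python
import numpy as np

def absolute_difference(x):
    """Matrix of |x_i - x_j| for all host pairs."""
    x = np.asarray(x, dtype=float)
    return np.abs(x[:, None] - x[None, :])

H = 69
rng = np.random.default_rng(1)
abundance = rng.integers(1, 500, size=H)      # made-up values, illustration only
A_abs = absolute_difference(abundance)
a = A_abs[np.triu_indices(H, k=1)]            # unfolded vector of unique pairs
n_pairs = a.size
```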
References
Almeida-Neto, M., Guimarães, P., Guimarães Jr, P.R., Loyola, R.D. & Ulrich, W. (2008) A consistent
metric for nestedness analysis in ecological systems: reconciling concept and measurement. Oikos, 117,
1227-1239.
Graham, M.H. (2003) Confronting multicollinearity in ecological multiple regression. Ecology, 84, 2809-2815.
Legendre, P. & Legendre, L. (1998) Numerical Ecology. Elsevier Science.