ele1809-sup-0002-FigureS1-9-Legends

Supplementary Figure S1. The hierarchical theory. (a) Metacommunity level; (b)
Community level. Illustrations of the data generation showing three hypothetical species across
three hypothetical communities within the study area (the US). At a metacommunity level (a),
the data points are species (highlighted in red) and the relationships among their average body
size, niche breadth, maximum population abundance across the study area, and distribution
within the US are explored as outlined in Fig. 1a, c. Note that in (a) only communities 1 and 3
contain the largest population of a species. Although not all communities are represented in the
metacommunity dataset, its geographic range is equivalent to that of the community dataset,
described in (b). Furthermore, species’ ln-transformed maximum and average population
abundance across all sites of detection are highly correlated (between 90 and 93% in the three
studied groups), indicating that the metacommunity analyses are not influenced by the choice of
a summary abundance metric. At a community level (b), the data points are properties of the
community (highlighted in red), e.g. species richness as well as slopes b and d, which are derived
from analyses similar to those at the metacommunity level but using species body size (measured at
the site or averaged across all collections depending on the dataset), local abundance, and
distribution within the US. Then, the two slopes are treated as a response in the pathways,
depicted in Fig. 1b, d. Note that in (b) communities 1-3 are included.
Supplementary Figure S2. Diatoms. (a) Relationship between two ln-transformed
measures of distribution, namely number of occurrences in 720 localities (x) and geographic
range (y), calculated as the sum of the maximum latitudinal and maximum longitudinal span in
km. Species with a single occurrence have a geographic range of zero. The relationship is fit with
a second order formation function (TableCurve 2D 5.01, SYSTAT Software, Inc. 2002),
indicating that half of the maximum geographic range is reached when a species has only 1.5
occurrences (x50 = 1/ab = 0.41; e^0.41 = 1.5, where a and b = regression parameters). In other
words, geographic range increases at a much faster rate than occurrence, whereby species with
only a few occurrences exhibit broad geographic spans (see also (b)). Therefore, species
occurrence outperforms geographic range as a measure of distribution because species with
broad geographic ranges can be present in a handful of localities, contributing only marginally to
the regional colonist pool. The regression model and parameters are given in the figure; p <
0.00001 for both parameters. The same pattern is observed in invertebrates (2078 species across
1866 localities, R2 = 0.94, x50 = 1.6) and fish (561 species across 1105 localities, R2 = 0.90, x50 =
1.8). (b) Distribution maps of the diatoms Achnanthidium minutissimum (Kützing) Czarnecki
(top panel) and Stauroneis phoenicenteron (Nitzsch) Ehrenberg (bottom panel), showing the
localities where these species are found, i.e. 593 localities for A. minutissimum and 12 localities
for S. phoenicenteron. Maximum latitudinal and maximum longitudinal span (shown as arrows)
are calculated as the range between the minimum and maximum latitude or longitude of
detection, respectively, converted to Great Circle distances in km. The inserts show micrographs
of the two species (courtesy of Chad Larson). Note that although the two diatoms have
continental ranges, their total occurrences are drastically different.
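
The geographic range calculation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the haversine formula, the mean Earth radius of 6371 km, and the choice to measure each span along the midpoint of the other coordinate are all assumptions, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not stated in the legend


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geographic_range_km(localities):
    """Sum of the maximum latitudinal and longitudinal spans of (lat, lon) records."""
    lats = [lat for lat, _ in localities]
    lons = [lon for _, lon in localities]
    mid_lat = (min(lats) + max(lats)) / 2  # assumed reference latitude for the longitudinal span
    mid_lon = (min(lons) + max(lons)) / 2  # assumed reference meridian for the latitudinal span
    lat_span = great_circle_km(min(lats), mid_lon, max(lats), mid_lon)
    lon_span = great_circle_km(mid_lat, min(lons), mid_lat, max(lons))
    return lat_span + lon_span
```

A species recorded at a single locality gets a range of zero, as stated in the legend. The sketch does not handle ranges that cross the 180° meridian.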
Supplementary Figure S3. Testing the metacommunity model. (a) Relationship of
diatom niche breadth (NB) vs. proportion (P) of occurrences in streams with common conditions
fit with a Gaussian model, which is given in the figure (p < 0.00001 for all parameters). In
diatoms as well as invertebrates and fish (discussed below), NB is measured as the species’ root
mean square standard deviation across the first four axes of canonical correspondence analysis of
species and environmental data (the RMSTOL metric in CANOCO). The value of P (Pu)
yielding maximum niche breadth (NBmax) is equal to parameter b and the standard deviation (SD)
about NBmax = parameter c. In invertebrates (N = 1739 taxa), the Gaussian model generates the
following regression statistics: R2 = 0.46, a = 92.25, b = 0.54, c = 0.32 (p < 0.00001 for all
parameters), while in fish (N = 488 species), the respective statistics are: R2 = 0.36, a = 78.47, b
= 0.56, c = 0.34 (p < 0.00001 for all parameters). (b)–(d) ANOVA least squares means (±
standard error) compared with Tukey post-hoc tests, testing the hypothesis of no difference in
occurrence among species of varying environmental preference, including preference for
common conditions (c), rare conditions (r), and no preference (n). Species with no environmental
preference are defined as having P = Pu ± 1SD, species with a preference for rare conditions, P <
Pu − 1SD, and species with a preference for common conditions, P > Pu + 1SD. In diatoms (b),
invertebrates (c), and fish (d), n species exhibit the following ranges of P: 0.28-0.88, 0.22-0.87,
and 0.23-0.90, respectively. These ranges include the proportion of common sites in each dataset,
i.e. 0.70 in the 703 streams sampled for diatoms, 0.69 in the 636 streams sampled for
invertebrates, and 0.71 in the 417 streams sampled for fish. Therefore, species without
environmental preference are found in common conditions statistically as frequently as these
conditions occur continentally. In all analyses, n species have significantly greater occurrence
than c and r species (p = 0.000002). Only in diatoms does the occurrence of c species
significantly exceed that of r species (p = 0.00002), while in the remaining two groups, they are
statistically equivalent (p > 0.8). These results do not change statistically when distribution is
measured as geographic range. There are 474 c, 568 n, and 185 r diatoms in (b); 540 c, 778 n,
and 421 r invertebrates in (c); and 194 c, 218 n, and 76 r fish in (d). Different letters in each
panel indicate significant differences in means (p < 0.05).
Thus, in diatoms there is a tendency for species with a preference for common conditions
to exceed in distribution those with a preference for rare conditions, while in invertebrates and
fish this trend disappears.
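
The grouping of species by environmental preference can be illustrated with a short sketch. The legend does not spell out the Gaussian model (it is given in the figure), so the standard three-parameter form NB = a·exp(−0.5((P − b)/c)^2) is assumed here, with a taken as the peak height; the function names are hypothetical.

```python
import math


def gaussian_nb(p, a, b, c):
    """Assumed Gaussian form: NB = a * exp(-0.5 * ((P - b) / c) ** 2)."""
    return a * math.exp(-0.5 * ((p - b) / c) ** 2)


def preference_class(p, p_u, sd):
    """Classify a species by its proportion P of occurrences in common conditions.

    'r' = preference for rare conditions   (P < Pu - 1SD)
    'c' = preference for common conditions (P > Pu + 1SD)
    'n' = no preference                    (P within Pu +/- 1SD)
    """
    if p < p_u - sd:
        return "r"
    if p > p_u + sd:
        return "c"
    return "n"


# Invertebrate parameters reported in the legend: a = 92.25, b = Pu = 0.54, c = SD = 0.32
print(preference_class(0.10, 0.54, 0.32))  # below Pu - 1SD
print(preference_class(0.50, 0.54, 0.32))  # within Pu +/- 1SD
print(preference_class(0.95, 0.54, 0.32))  # above Pu + 1SD
```

Under the assumed form, the fitted niche breadth at P = Pu equals parameter a, and it falls to about 61% of a at P = Pu ± 1SD.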
Supplementary Figure S4. Invertebrates, testing the metacommunity model.
Regressions of distribution (D) when measured as ln number of occurrences in all 1866 streams,
against (a) ln maximum population density (Nmax): ln D = –0.21 + 0.54ln Nmax (R2 = 0.61, p <
0.000001) and (b) niche breadth (NB) (the RMSTOL metric): ln D = 0.34 + 0.04NB (R2 = 0.60, p
< 0.000001). Species maximum population density is represented by the maximum number of
individuals per m2 in 3719 samples from 1866 stream localities, while NB is derived from a
subset of 636 streams with environmental data, shown in Fig. 2. When distribution is measured
as geographic range, Nmax and NB produce the following models: ln D = 2.88 + 0.75ln Nmax (R2 =
0.36, p < 0.000001) and ln D = 2.45 + 0.64NB^0.5 (R2 = 0.63, p < 0.000001 for both parameters).
Therefore, the positive patterns persist with both measures of distribution but since occurrence is
more sensitive, as shown in Suppl. Fig. S2, it is employed in structural equation modeling. (c) A
structural equation model showing the paths (p < 0.05 for all) with corresponding standardized
regression coefficients and, in parentheses, coefficients of non-determination (1 – R2) for each
response variable. A sample discrepancy function of –2.46e−16 indicates an excellent model fit. E1-E2 = error terms. Number of taxa = 1739.
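
As a worked illustration of regression (a), the ln-ln model can be back-transformed to a power law, D = e^(−0.21)·Nmax^0.54. The sketch below uses the coefficients reported in the legend; the function names are hypothetical, and the back-transformation ignores any bias correction for the log scale.

```python
import math


def predicted_occurrences(n_max, intercept=-0.21, slope=0.54):
    """Back-transform ln D = intercept + slope * ln Nmax to the original scale."""
    return math.exp(intercept + slope * math.log(n_max))


def non_determination(r_squared):
    """Coefficient of non-determination, 1 - R2, as shown in parentheses in panel (c)."""
    return 1.0 - r_squared


# Doubling maximum density multiplies predicted occurrence by 2 ** 0.54
ratio = predicted_occurrences(200.0) / predicted_occurrences(100.0)
print(round(ratio, 3))
print(round(non_determination(0.61), 2))
```

Because the slope is below 1, occurrence rises less than proportionally with maximum population density.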
Supplementary Figure S5. Fish, testing the metacommunity model. Regressions of ln
distribution (D), calculated as the number of occurrences in all 1105 streams, against (a) ln
maximum relative abundance (Nmax): ln D = 3.88 + 0.68ln Nmax (R2 = 0.52, p < 0.000001); (b) ln
body weight (M): ln D = 1.35 + 0.86ln M – 0.11(ln M)2 (R2 = 0.15, p < 0.00001 for all
parameters); and (c) niche breadth (NB) (the RMSTOL metric in CANOCO): ln D = 0.55 +
0.04NB (R2 = 0.63, p < 0.000001). When distribution is measured as geographic range,
maximum relative abundance, body weight, and NB produce the following responses: ln D =
8.36 + 1.01ln Nmax (R2 = 0.39, p < 0.000001), ln D = 4.51 + 1.32ln M – 0.16(ln M)2 (R2 = 0.12, p
< 0.00001 for all parameters), and ln D = 2.50 + 0.61NB^0.5 (R2 = 0.63, p < 0.00001). Therefore,
as in diatoms and invertebrates, the two measures of distribution display very similar
behaviours but further analyses are performed with occurrence due to its greater discriminating
capacity (see Suppl. Fig. S2) and a linear response to NB. Maximum relative abundance and
body weight for each species are measured as the maximum proportional abundance and the
average body weight, respectively, observed in 2383 samples from 1105 stream localities.
Although arcsine square root transformation is recommended for proportional data, ln-transformation of Nmax improves normality to a greater extent and is adopted. NB is derived from
a subset of 417 streams with environmental data, shown in Fig. 2. (d) A structural equation
model showing only the significant paths (p < 0.05) with corresponding standardized regression
coefficients and, in parentheses, coefficients of non-determination (1 – R2) for each response
variable. A root mean square error of approximation of less than 0.00001 indicates an excellent
model fit. E1-E3 = error terms. Number of species = 488.
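
The quadratic regression in (b) implies a unimodal response, and the body weight at which predicted occurrence peaks follows from setting the derivative to zero: ln M* = −b1/(2·b2). The sketch below applies this to the coefficients reported in the legend; the function name is hypothetical, and the units of M are those of the source dataset, which the legend does not state.

```python
import math


def quadratic_peak(b0, b1, b2):
    """Vertex of y = b0 + b1*x + b2*x**2; returns (x_peak, y_peak)."""
    if b2 == 0:
        raise ValueError("b2 must be non-zero for a quadratic model")
    x_peak = -b1 / (2.0 * b2)
    y_peak = b0 + b1 * x_peak + b2 * x_peak ** 2
    return x_peak, y_peak


# Occurrence model from the legend: ln D = 1.35 + 0.86 ln M - 0.11 (ln M)^2
ln_m_peak, ln_d_peak = quadratic_peak(1.35, 0.86, -0.11)
print(round(ln_m_peak, 2), round(ln_d_peak, 2))
print(round(math.exp(ln_m_peak), 1))  # body weight at the peak, in the dataset's units
```

The same function applies to the geographic range model and to the quadratic regressions of Nmax and NB against ln M.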
Supplementary Figure S6. Fish, testing the metacommunity model. Quadratic
regressions of ln body weight (M) and (a) ln maximum relative abundance (Nmax): ln Nmax = –
3.27 + 0.95ln M – 0.13(ln M)2, R2 = 0.16 (p < 0.00001 for all parameters); and (b) niche breadth
(NB): NB = 26.34 + 16.58ln M – 2.01(ln M)2, R2 = 0.14 (p < 0.00001 for all parameters). Both
Nmax and NB are measured as defined in Suppl. Fig. S5. Number of species = 488.
Supplementary Figure S7. Invertebrates, testing the community model. (a)
Relationships of species richness (S) with significant slope d values (p < 0.1) from linear
regressions of local population density (N) against regional distribution (occurrences) (D): ln N =
d0 + dln D, where d0 = intercept. A higher p-level than the conventional 5% is adopted to ensure
that relationships, slightly affected by single outliers with a tendency to inflate the p-values, are
included. The best fit and regression parameters are given in the figure (p < 0.000001). (b)
Partitioning of the variance of slope d in the subset with environmental data, as outlined in
Legendre & Legendre (1998). The variance of slope d and species richness, explained by the
environment, is shown next to the solid red arrows, while the variance of slope d explained by
richness, is given next to the solid black arrow. The pure environmental effect (next to the dotted
red arrow), the pure richness effect (next to the dotted black arrow), and the covariance effect of
richness and environment (in the red triangle) are also shown. The negative predictors of slope d
include fluoride concentration and wetland area, while the positive predictors encompass human
population density as well as sodium and nitrate concentrations, both concentrations associated
with more extensive agriculture. The relationships of invertebrate richness with all but one of
these variables (F−) are opposite. Basic statistics of the environmental variables are given in
Suppl. Table S1. +/ – = positive/negative correlation with slope d. № = number of communities.
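
The variance partitioning in (b) can be sketched from three coefficients of determination, in the general manner of Legendre & Legendre (1998): the R2 of slope d regressed on the environment, on richness, and on both together. This is an illustration under that assumption, not a reproduction of the authors' calculation; the function name is hypothetical and the input values below are made up for the example, since the actual fractions appear only in the figure.

```python
def partition_variance(r2_env, r2_rich, r2_both):
    """Split explained variance into pure and shared fractions.

    r2_env  : R2 of the response regressed on environmental predictors only
    r2_rich : R2 of the response regressed on richness only
    r2_both : R2 of the response regressed on both sets together
    """
    return {
        "pure_environment": r2_both - r2_rich,
        "pure_richness": r2_both - r2_env,
        "shared": r2_env + r2_rich - r2_both,
        "unexplained": 1.0 - r2_both,
    }


# Hypothetical R2 values, for illustration only
fractions = partition_variance(r2_env=0.40, r2_rich=0.30, r2_both=0.50)
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

The four fractions sum to 1 by construction, which is a useful check on the input R2 values.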
Reference
Legendre P. & Legendre L. (1998). Numerical Ecology. Second English Edition. Elsevier
Science B.V., Amsterdam, The Netherlands.
Supplementary Figure S8. Fish, testing the community model. Relationships of
species richness (S) with regression parameters of proportion relative abundance (N): (a)
significant parameter b2 values (p < 0.1) from the quadratic regressions of N against body size
(M): ln N = b0 + b1ln M + b2(ln M)2 (10 out of the 287 significant b2 values are identified as
outliers and removed prior to regression) and (b) significant slope d values (p < 0.1) from the
linear regressions of N against species distribution (occurrences) (D): ln N = d0 + dln D, where b0
and d0 = intercepts, and b1 = parameter. The fit and regression parameters are given in the
figures. p = 0.00003 in the first regression and p < 0.000001 in the second regression. № =
number of communities.
Supplementary Figure S9. Fish, testing the community model. Partitioning of the
variance of parameter b2 from the quadratic regressions of proportion relative abundance (N)
against body size (M): ln N = b0 + b1ln M + b2(ln M)2 (a) and slope d from the regressions of N
against occurrence (D): ln N = d0 + dln D (b) in the fish subset with environmental data. The
variance of parameter b2, slope d, and species richness explained by the environment, is shown
next to the solid red arrows, while the variance of parameter b2 and slope d explained by
richness, is given next to the solid black arrows. The pure environmental effect (next to the
dotted red arrows), the pure richness effect (next to the dotted black arrows), and the covariance
effect of richness and environment (in the red triangles) are also shown. To linearize the
relationships of richness (S) with parameter b2 and slope d, richness is transformed as S^−1. This
transformation also increases the R2 of the multiple regressions of richness against the
environmental predictors in (a) and (b). Low richness communities with variable but generally
positive values of parameter b2 and negative values of slope d are found in streams of lower
temperature and comparatively pristine watersheds, covered by forests and shrublands. High
richness communities with negative b2 and positive slope d values are detected in streams of
higher temperature and more extensive agriculture (percent clay in the soil, which is a negative
predictor of parameter b2 and a positive predictor of richness, is highly positively correlated with
agriculture). As expected from the relationships in Suppl. Fig. S8, richness and parameter b2
show opposing responses to the environment, while richness and slope d behave in a similar
fashion. Basic statistics of the environmental variables are provided in Suppl. Table S1. +/ – =
positive/negative correlation with parameter b2 or slope d. № = number of communities.