Appendix: Effects of Equiprobable Marginal Distributions on the PP Algorithm

The PP algorithm contains a bias that has previously been overlooked in null model analysis. It is related to the larger variance of the marginal total distributions of the null matrices relative to that of the observed matrix. The effect is most severe in matrices with similar numbers of occurrences across rows or columns, which can arise from equiprobable random sampling. To see this, consider a matrix in which every row has 50 occurrences. The variance of the row marginal distribution is then zero. Any resampling proportional to the observed totals that retains the grand total will return marginal totals that deviate to some degree from the observed ones; the observed equality is preserved only in the average over a set of random matrices. The variance of the marginal total distribution of each single null matrix will therefore be larger than zero. Such differences in numbers of occurrences between rows (and columns) generate patterns of pairwise species co-occurrence that systematically differ from the observed ones. Metrics that rely on patterns of pairwise co-occurrence, such as the C-score, the Soerensen index, or NODF, then identify the observed pattern as non-random.

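The variance argument can be illustrated with a small simulation. The sketch below is not the PP algorithm itself, whose details are not given in this appendix; it assumes a simplified stand-in in which the grand total of presences is placed one at a time into empty cells with probability proportional to the product of the observed row and column totals. The function name `proportional_null`, the matrix dimensions, and the number of replicates are all hypothetical choices for illustration.

```python
import random
from statistics import pvariance


def proportional_null(row_totals, col_totals, rng):
    """Hypothetical stand-in for proportional resampling (not the PP algorithm).

    Places sum(row_totals) presences, one at a time, into empty cells chosen
    with probability proportional to row_total * col_total. The grand total is
    kept fixed, but individual row and column totals are free to vary.
    """
    n_rows, n_cols = len(row_totals), len(col_totals)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [row_totals[i] * col_totals[j] for i, j in cells]
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for _ in range(sum(row_totals)):
        # Draw one still-empty cell; removing it keeps the matrix binary.
        k = rng.choices(range(len(cells)), weights=weights)[0]
        i, j = cells.pop(k)
        weights.pop(k)
        matrix[i][j] = 1
    return matrix


rng = random.Random(42)
rows = [10] * 10  # every row has 10 occurrences: observed row variance is zero
cols = [5] * 20   # 10 x 20 matrix, grand total 100 in both margins

null_variances = []
for _ in range(100):
    null = proportional_null(rows, cols, rng)
    null_variances.append(pvariance([sum(r) for r in null]))

print("observed row variance:", pvariance(rows))
print("mean null row variance:", sum(null_variances) / len(null_variances))
```

Because the grand total is fixed, the mean row total of every null matrix is exactly 10, yet the row totals of each single null matrix scatter around that mean, so their variance is positive in essentially every replicate.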
To understand this behavior, consider a pair of species with l and m occurrences, respectively, distributed over c sites. The probability of k co-occurrences is given by (Connor and Simberloff 1978)

$$p(k) = \frac{\binom{c}{k}\binom{c-k}{l-k}\binom{c-l}{m-k}}{\binom{c}{l}\binom{c}{m}} \tag{1}$$

with mean μ = lm/c. The expected number of checkerboards Ch under random placement for any pair of species is therefore given by (Stone and Roberts 1990)

$$Ch = \left(l - \frac{lm}{c}\right)\left(m - \frac{lm}{c}\right) \tag{2}$$

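Equations (1) and (2) can be checked numerically. This is a minimal sketch in exact rational arithmetic; the function names are illustrative, and p(k) is set to zero outside the feasible range max(0, l + m − c) ≤ k ≤ min(l, m).

```python
from fractions import Fraction
from math import comb


def p_cooccurrence(k, c, l, m):
    """Probability of k shared sites for species with l and m occurrences on c sites, Eq. (1)."""
    if not max(0, l + m - c) <= k <= min(l, m):
        return Fraction(0)
    num = comb(c, k) * comb(c - k, l - k) * comb(c - l, m - k)
    return Fraction(num, comb(c, l) * comb(c, m))


def expected_checkerboards(c, l, m):
    """Ch from Eq. (2): (l - lm/c)(m - lm/c)."""
    mu = Fraction(l * m, c)
    return (l - mu) * (m - mu)


c, l, m = 10, 4, 6
ks = range(min(l, m) + 1)
probs = [p_cooccurrence(k, c, l, m) for k in ks]
print(sum(probs))                             # 1
print(sum(k * p for k, p in zip(ks, probs)))  # 12/5, i.e. lm/c
print(expected_checkerboards(c, l, m))        # 144/25
```

The probabilities sum to one and their mean equals lm/c, as stated for Eq. (1).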
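Using the inequality a² > (a + b)(a − b), which holds for any b ≠ 0, it is now straightforward to show that, for a fixed total l + m, Ch is always largest if l = m. Write l = a − b and m = a + b with b > 0 (without loss of generality l < m), so that l + m = 2a, and let n = c denote the number of sites. For l = m = a, Eq. (2) gives Ch = (a − a²/n)², so we have to show that

$$\left(a-\frac{a^2}{n}\right)^2 > \left((a-b)-\frac{(a-b)(a+b)}{n}\right)\left((a+b)-\frac{(a-b)(a+b)}{n}\right) \tag{3}$$

Rearrangement of the right-hand term gives

$$\begin{aligned}\left((a-b)-\frac{(a-b)(a+b)}{n}\right)\left((a+b)-\frac{(a-b)(a+b)}{n}\right) &= \left(a-\frac{a^2-b^2}{n}\right)^2-b^2 = \left(a-\frac{a^2}{n}+\frac{b^2}{n}\right)^2-b^2\\ &= \left(a-\frac{a^2}{n}\right)^2+2\left(a-\frac{a^2}{n}\right)\frac{b^2}{n}+\frac{b^4}{n^2}-b^2\end{aligned} \tag{4}$$

Thus (3) becomes

$$\left(a-\frac{a^2}{n}\right)^2 > \left(a-\frac{a^2}{n}\right)^2+2\left(a-\frac{a^2}{n}\right)\frac{b^2}{n}+\frac{b^4}{n^2}-b^2 \tag{5}$$

and, after cancelling and dividing by b² > 0,

$$1 > \frac{2}{n}\left(a-\frac{a^2}{n}\right)+\frac{b^2}{n^2} \tag{6}$$

Since l = a − b ≥ 0 and m = a + b ≤ n, we have b ≤ a and b ≤ n − a, hence b² ≤ a(n − a). Because also a(n − a) ≤ n²/4,

$$\frac{2}{n}\left(a-\frac{a^2}{n}\right)+\frac{b^2}{n^2} \le \frac{3a(n-a)}{n^2} \le \frac{3}{4} < 1 \tag{7}$$

Thus the left-hand term in (6), and therefore also that in (3), is always larger than the respective right-hand term.
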
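The inequality can also be checked by brute force. This sketch, with illustrative function names, evaluates Eq. (2) in exact arithmetic (with n = c sites) for every pair l ≠ m on up to 30 sites and compares it with the value at l = m = (l + m)/2; when l + m is odd this midpoint is not an integer and serves only as the bound in (3).

```python
from fractions import Fraction
from itertools import product


def ch(l, m, n):
    """Expected checkerboards for occurrence counts l and m on n sites, Eq. (2)."""
    mu = Fraction(l) * m / n
    return (l - mu) * (m - mu)


def violations(max_sites=30):
    """List (l, m, n) with l != m where Ch(l, m) is not below Ch at l = m = (l + m)/2."""
    bad = []
    for n in range(1, max_sites + 1):
        for l, m in product(range(n + 1), repeat=2):
            a = Fraction(l + m, 2)  # equal split of the same total l + m
            if l != m and not ch(l, m, n) < ch(a, a, n):
                bad.append((l, m, n))
    return bad


print(violations())  # []
```

An empty list means that, within this range, unequal occurrence numbers never reach the expected checkerboard count of the equal split.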
When summed over all species pairs, it follows that, for a fixed grand total of occurrences, a matrix with equal numbers of species occurrences will always have a larger expected number of checkerboards than a matrix with unequal numbers of occurrences. As a result, the C-score and related metrics (Stone and Roberts, 1992; Gotelli, 2001; Baselga, 2010; Presley et al., 2010; Ulrich and Gotelli, 2012) will tend to identify matrices with unequal marginal total distributions as aggregated when compared with similarly structured matrices with more equal marginal total distributions. Conversely, matrices with more equal marginal total distributions will be identified as segregated.

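Summing Eq. (2) over all species pairs makes the comparison concrete. The sketch below uses two hypothetical sets of row totals with the same grand total of 20 occurrences on c = 10 sites; it illustrates the claim for this single example and is not a general proof.

```python
from fractions import Fraction
from itertools import combinations


def ch(l, m, c):
    """Expected checkerboards for one species pair, Eq. (2)."""
    mu = Fraction(l * m, c)
    return (l - mu) * (m - mu)


def total_expected_checkerboards(row_totals, c):
    """Sum of Eq. (2) over all unordered species pairs."""
    return sum(ch(l, m, c) for l, m in combinations(row_totals, 2))


c = 10
equal = [5, 5, 5, 5]    # grand total 20, all species equally common
unequal = [2, 4, 6, 8]  # same grand total, unequal occurrences
print(total_expected_checkerboards(equal, c))    # 75/2
print(total_expected_checkerboards(unequal, c))  # 592/25
```

With equal totals the summed expectation is 37.5 checkerboards, against 23.68 for the unequal set.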
An empirical evaluation showed that the Z-scores of NODF and the C-score are indeed connected to the total variance of the row and column marginal distributions (Appendix Figure 1A, C). Both metrics correctly identified the Mprop matrices, which have pronounced variances (σ² > 10), as random. In turn, they pointed to segregation (anti-nestedness) in the case of the Mequi matrices, which had very similar numbers of occurrences across rows and columns (σ² < 10). Z-scores of NODF and the C-score were within the 95% confidence limits of the null distributions if the variances of the row/column marginal distributions of the null matrices were less than two times the observed variances (Appendix Figure 1B, D). The correlations of Z-scores for the Mequi matrices with variance (Fig. 5A, C) result from the previously described matrix size effect: the larger a matrix was, the more equal were its row and column totals, and thus the more the C-scores and NODF scores deviated from the PP random expectation. In the case of the Mprop and Mrand matrices, neither metric was correlated with matrix size or fill (not shown).

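The Z-scores referred to above compare an observed metric with its null distribution. As a minimal sketch, assuming the common definitions Z = (observed − mean of null values) / standard deviation of null values and the Stone and Roberts C-score as the mean of (r_i − S)(r_j − S) over all species pairs (r = row totals, S = number of shared sites), the following computes both. The helper names and the normal-approximation check |Z| < 1.96 for the 95% limits are assumptions, not necessarily the exact procedure behind Appendix Figure 1.

```python
from itertools import combinations
from statistics import mean, stdev


def c_score(matrix):
    """Mean number of checkerboard units (r_i - S)(r_j - S) over all row pairs.

    Rows are species, columns are sites, entries are 0/1; S is the number
    of sites shared by the two species.
    """
    units = []
    for a, b in combinations(matrix, 2):
        shared = sum(x * y for x, y in zip(a, b))
        units.append((sum(a) - shared) * (sum(b) - shared))
    return mean(units)


def z_score(observed, null_values):
    """Standardized effect size of an observed metric against null-model values."""
    return (observed - mean(null_values)) / stdev(null_values)


def within_95_limits(z):
    """Assumed normal approximation: |Z| < 1.96 is treated as inside the 95% limits."""
    return abs(z) < 1.96


segregated = [[1, 1, 0, 0], [0, 0, 1, 1]]
aggregated = [[1, 1, 0, 0], [1, 1, 0, 0]]
print(c_score(segregated), c_score(aggregated))  # 4 0
z = z_score(4.0, [1.0, 2.0, 3.0])
print(z, within_95_limits(z))                    # 2.0 False
```

A positive Z for the C-score points to segregation; in practice the null values would come from many randomized matrices rather than the three placeholder numbers used here.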
The identification of equiprobable presence–absence matrices as segregated by the PP algorithm raises the question of whether this is an undesired artifact or whether it reflects ecological realism. Traditional null model approaches have focused on placement probabilities and treated marginal distributions as side effects. In the case of the EE null model, this means that occurrences within the matrix are placed equiprobably at random while the marginal total distribution is more even than expected just by chance, particularly for larger matrices. The PP algorithm incorporates both the effects of internal matrix structure and those of the marginal total distributions. Used with metrics that quantify patterns of co-occurrence, it will identify matrices that deviate from random expectation in internal structure and/or in marginal distributions. Thus matrices with overly equal marginal total distributions will be identified as segregated because such an equal spread is improbable. In our view this is a desirable property that requires us to consider all aspects of matrix structure in null model analysis.

We note that the problem of marginal totals is not restricted to our null model. It applies to all null and neutral models that alter marginal distributions during randomization. For example, Moore and Swihart (2007) called for null models based on meta-community abundance distributions to generate presence–absence null matrices. Such randomization algorithms will inevitably influence marginal total distributions, and our co-occurrence metrics might then too often identify observed matrix structures as non-random. The same consideration applies to neutral model simulations (Gotelli and McGill, 2006; Gotelli and Ulrich, 2012) used to obtain expectations of matrix structure under ecological drift. Neutral model simulations tend to produce marginal total distributions that differ from the observed ones and might also point too frequently to non-randomness (Ulrich, 2004).

References

Baselga, A., 2010. Partitioning the turnover and nestedness components of beta diversity. Global Ecology and Biogeography 19, 134-143.
Connor, E.F., Simberloff, D., 1978. Species number and compositional similarity of the Galapagos flora and avifauna. Ecological Monographs 48, 219-248.
Gotelli, N.J., 2001. Research frontiers in null model analysis. Global Ecology and Biogeography 10, 337-343.
Gotelli, N.J., McGill, B.J., 2006. Null versus neutral models: what's the difference? Ecography 29, 793-800.
Moore, J.E., Swihart, R.K., 2007. Toward ecologically explicit null models of nestedness. Oecologia 152, 763-777.
Patterson, B.D., Atmar, W., 1986. Nested subsets and the structure of insular mammalian faunas and archipelagos. Biological Journal of the Linnean Society 28, 65-82.
Stone, L., Roberts, A., 1992. Competitive exclusion, or species aggregation? An aid in deciding. Oecologia 91, 419-424.
Ulrich, W., 2004. Species co-occurrences and neutral models: reassessing J. M. Diamond's assembly rules. Oikos 107, 603-609.

Appendix Figure 1: Z-scores for the C-score (A, B) and NODF (C, D) as a function of the total observed variance in row and column occurrences (A, C) and of the quotient of the average total predicted variance and the total observed variance in row and column occurrences (B, D). Data from 100 Mequi matrices (open dots) and 100 Mprop matrices (black dots).
[Figure: panels A to D; y-axis: Z-scores; x-axis: σ²obs (A, C) and σ²model / σ²obs (B, D).]