Appendix: Effects of Equiprobable Marginal Distributions on the PP Algorithm

The PP algorithm contains a bias that has previously been overlooked in null model analysis. It arises because the marginal totals of the null matrices have a larger variance than those of the observed matrix. The effect is most severe in matrices with similar numbers of occurrences within rows or columns, which can arise from equiprobable random sampling. To see this, consider a matrix that has 50 occurrences in each row. The variance of the row marginal distribution is therefore zero. Any resampling proportional to the observed totals that retains the grand total will return marginal totals that deviate to some degree from the observed ones, although the observed equality is preserved in the averages over a set of random matrices. The variance of the marginal totals within each single null matrix will be larger than zero. Such differences in numbers of occurrences between rows (and columns) generate patterns of pairwise species co-occurrences that differ systematically from the observed ones. Metrics that rely on patterns of pairwise co-occurrence, such as the C-score, the Soerensen index, or NODF, then identify the observed pattern as non-random.

To understand this behavior, consider a pair of species with $l$ and $m$ occurrences, respectively, among $c$ sites. The probability of $k$ co-occurrences is given by (Connor and Simberloff 1978)

$$p_k = \frac{\binom{c}{k}\binom{c-k}{l-k}\binom{c-l}{m-k}}{\binom{c}{l}\binom{c}{m}} \tag{1}$$

with mean $lm/c$.
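Equation (1) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `cooccurrence_pmf` and the values of `c`, `l`, and `m` are illustrative assumptions. It confirms that the probabilities sum to one and that the mean number of co-occurrences equals $lm/c$.

```python
from math import comb

def cooccurrence_pmf(c, l, m):
    """Probabilities p_k of k shared sites for two species with l and m
    occurrences among c sites, following equation (1).
    (Hypothetical helper name, introduced for illustration.)"""
    denom = comb(c, l) * comb(c, m)
    k_min = max(0, l + m - c)   # fewer shared sites are impossible
    k_max = min(l, m)           # cannot share more sites than the rarer species occupies
    return {
        k: comb(c, k) * comb(c - k, l - k) * comb(c - l, m - k) / denom
        for k in range(k_min, k_max + 1)
    }

# Illustrative values only (not taken from the text).
c, l, m = 20, 8, 5
pmf = cooccurrence_pmf(c, l, m)
total = sum(pmf.values())                     # should equal 1
mean = sum(k * p for k, p in pmf.items())     # should equal l*m/c

print(f"sum of p_k = {total:.6f}")
print(f"mean co-occurrences = {mean:.6f}, lm/c = {l * m / c:.6f}")
```

Exact integer binomials from `math.comb` are used so that rounding enters only in the final division.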
The expected number of checkerboards $Ch$ under random placement for any pair of species is therefore given by (Stone and Roberts 1990)

$$Ch = \left(l - \frac{lm}{c}\right)\left(m - \frac{lm}{c}\right) \tag{2}$$

Using the inequality $a^2 > (a+b)(a-b)$, it is straightforward to show that, for a fixed sum $l + m$, $Ch$ is always largest if $l = m$. Writing $l = a - b$, $m = a + b$ (with $0 < b < a$), and $c = n$, we have to show that

$$\left(a - \frac{a^2}{n}\right)^2 > \left((a-b) - \frac{(a-b)(a+b)}{n}\right)\left((a+b) - \frac{(a-b)(a+b)}{n}\right) \tag{3}$$

Rearrangement of the right-hand term gives

$$\left((a-b) - \frac{(a-b)(a+b)}{n}\right)\left((a+b) - \frac{(a-b)(a+b)}{n}\right) = \left(a - \frac{a^2 - b^2}{n}\right)^2 - b^2 = \left(a - \frac{a^2}{n} + \frac{b^2}{n}\right)^2 - b^2 = \left(a - \frac{a^2}{n}\right)^2 + 2\left(a - \frac{a^2}{n}\right)\frac{b^2}{n} + \frac{b^4}{n^2} - b^2 \tag{4}$$

Thus

$$\left(a - \frac{a^2}{n}\right)^2 > \left(a - \frac{a^2}{n}\right)^2 + 2\left(a - \frac{a^2}{n}\right)\frac{b^2}{n} + \frac{b^4}{n^2} - b^2 \tag{5}$$

and, after cancelling the common term and dividing by $b^2$,

$$1 > \frac{2}{n}\left(a - \frac{a^2}{n}\right) + \frac{b^2}{n^2} \tag{6}$$

Multiplying by $n^2$ and collecting terms gives

$$(n - a)^2 + \left(a^2 - b^2\right) > 0 \tag{7}$$

Since $a > b$ (because $l = a - b > 0$), the second term in (7) is positive, and the first term is never negative. Thus (7) always holds, and therefore the left-hand term in (3) is always larger than the right-hand term.

Summed over all species pairs, it follows that any matrix with equal numbers of species occurrences will always have a larger potential number of checkerboards than a matrix with unequal numbers of occurrences if the grand total number of occurrences is fixed. As a result, the C-score and related metrics (Stone and Roberts, 1992; Gotelli, 2001; Baselga, 2010; Presley et al., 2010; Ulrich and Gotelli, 2012) will tend to identify matrices with unequal marginal total distributions as aggregated when compared to similarly structured matrices with more equal marginal total distributions. In turn, matrices with more equal marginal total distributions will be identified as segregated.

An empirical evaluation showed that the Z-scores of NODF and the C-score are indeed connected to the total variance in the row and column marginal distributions (Appendix Figure 1A, C). Both metrics correctly identified the Mprop matrices, which have pronounced variances ($\sigma^2 > 10$), as random.
In turn, they pointed to segregation (anti-nestedness) in the case of the Mequi matrices, which had very similar numbers of occurrences across rows and columns ($\sigma^2 < 10$). Z-scores of NODF and the C-score were within the 95% confidence limits of the null distributions if the variances of the row and column marginal distributions of the null matrices were less than twice the observed variances (Appendix Figure 1B, D). The correlations of Z-scores for the Mequi matrices with variance (Fig. 5A, C) result from the previously described matrix size effect. The larger a matrix was, the more equal its row and column totals were, and thus the more the C-scores and NODF scores deviated from the PP random expectation. In the case of the Mprop and Mrand matrices, neither metric was correlated with matrix size or fill (not shown).

The identification of equiprobable presence-absence matrices as segregated by the PP algorithm raises the question of whether this is an undesired artifact or whether it reflects ecological realism. Traditional null model approaches have focused on placement probabilities and treated marginal distributions as side effects. In the case of the EE null model, this means that occurrences within the matrix are equiprobably random while the marginal total distribution is more even than expected by chance, particularly for larger matrices. The PP algorithm incorporates the effects of both internal matrix structure and the marginal total distributions. Used with metrics that quantify patterns of co-occurrence, it will identify matrices that deviate from random expectation in internal structure, in marginal distributions, or in both. Thus matrices with overly equal marginal total distributions will be identified as segregated because this equal spread is improbable. In our view this is a desirable property that requires us to consider all aspects of matrix structure in null model analysis.
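The inflation of marginal variance under proportional resampling, and its consequence for the potential number of checkerboards, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions and is not the PP algorithm itself: `proportional_null` is a hypothetical stand-in that places the observed number of occurrences in distinct cells with weights proportional to row total times column total, and the matrix size (20 x 20 with 10 occurrences per row) is chosen only for speed. The potential number of checkerboards is taken as equation (2) summed over all pairs of rows.

```python
import numpy as np

def potential_checkerboards(totals, c):
    """Equation (2) summed over all pairs of rows.
    Each pair term factorizes: (l - lm/c)(m - lm/c) = f(l) * f(m),
    with f(x) = x * (1 - x/c)."""
    totals = np.asarray(totals, dtype=float)
    f = totals * (1.0 - totals / c)
    # Sum over unordered pairs i < j of f_i * f_j.
    return (f.sum() ** 2 - (f ** 2).sum()) / 2.0

def proportional_null(matrix, rng):
    """Hypothetical simplification of proportional resampling (an assumption,
    not the published algorithm): fill the same number of cells, drawn without
    replacement, with probability proportional to row total x column total."""
    weights = np.outer(matrix.sum(axis=1), matrix.sum(axis=0)).astype(float).ravel()
    cells = rng.choice(matrix.size, size=int(matrix.sum()),
                       replace=False, p=weights / weights.sum())
    null = np.zeros(matrix.size, dtype=int)
    null[cells] = 1
    return null.reshape(matrix.shape)

rng = np.random.default_rng(1)
n_rows, n_cols, per_row = 20, 20, 10

# Observed matrix: every row has exactly `per_row` occurrences.
observed = np.zeros((n_rows, n_cols), dtype=int)
for i in range(n_rows):
    observed[i, rng.choice(n_cols, size=per_row, replace=False)] = 1

nulls = [proportional_null(observed, rng) for _ in range(200)]

obs_var = observed.sum(axis=1).var()
null_vars = np.array([n.sum(axis=1).var() for n in nulls])
obs_pot = potential_checkerboards(observed.sum(axis=1), n_cols)
null_pots = np.array([potential_checkerboards(n.sum(axis=1), n_cols) for n in nulls])

print(f"row-total variance: observed = {obs_var:.2f}, null mean = {null_vars.mean():.2f}")
print(f"potential checkerboards: observed = {obs_pot:.1f}, null max = {null_pots.max():.1f}")
```

Under these assumptions the observed row-total variance is zero and the observed potential is 4750, whereas the null matrices should show positive row-total variance and, because their grand total is the same, a potential that never exceeds the observed value.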
We note that the problem of marginal totals is not restricted to our null model. It applies to all null and neutral models that alter marginal distributions during randomization. For example, Moore and Swihart (2007) called for null models that are based on meta-community abundance distributions to generate presence-absence null matrices. Such randomization algorithms will inevitably influence marginal total distributions, and our co-occurrence metrics might too often identify observed matrix structures as non-random. The same consideration applies to neutral model simulations (Gotelli and McGill, 2006; Gotelli and Ulrich, 2012) used to obtain expectations of matrix structure under ecological drift. Neutral model simulations tend to have marginal total distributions that differ from the observed ones and might also point too frequently to non-randomness (Ulrich, 2004).

References

Baselga, A., 2010. Partitioning the turnover and nestedness components of beta diversity. Global Ecology and Biogeography 19, 134-143.
Connor, E.F., Simberloff, D., 1978. Species number and compositional similarity of the Galapagos flora and avifauna. Ecological Monographs 48, 219-248.
Gotelli, N.J., 2001. Research frontiers in null model analysis. Global Ecology and Biogeography 10, 337-343.
Gotelli, N.J., McGill, B.J., 2006. Null versus neutral models: what’s the difference? Ecography 29, 793-800.
Moore, J.E., Swihart, R.K., 2007. Toward ecologically explicit null models of nestedness. Oecologia 152, 763-777.
Patterson, B.D., Atmar, W., 1986. Nested subsets and the structure of insular mammalian faunas and archipelagos. Biological Journal of the Linnean Society 28, 65-82.
Stone, L., Roberts, A., 1992. Competitive exclusion, or species aggregation? An aid in deciding. Oecologia 91, 419-24.
Ulrich, W., 2004. Species co-occurrences and neutral models: reassessing J. M. Diamond's assembly rules.
Oikos 107, 603-609.

Appendix Figure 1: Z-scores for the C-score (A, B) and NODF (C, D) in dependence of the total observed variance in row and column occurrences (A, C) and the quotient of average total predicted variances and total observed variances in row and column occurrences (B, D). Data from 100 Mequi matrices (open dots) and 100 Mprop matrices (black dots).

[Figure: four panels, A to D. Y-axes: Z-scores. X-axes: $\sigma^2_{obs}$ (A, C; ticks at 1, 10, 100, 1000) and $\sigma^2_{model} / \sigma^2_{obs}$ (B, D; ticks from 0 to 10).]