An improved method to set significance thresholds for β diversity testing in microbial community comparisons

Arda Gülay¹ and Barth F. Smets¹*

¹ Department of Environmental Engineering, Technical University of Denmark, Building 113, Miljøvej, 2800 Kgs Lyngby, Denmark. Phone: +45 45251600. Fax: +45 45932850. Email: bfsm@env.dtu.dk, argl@env.dtu.dk

Supplementary Information

Fig. S1 PCoA analysis of a sample (7900 individuals) and its subsamples (5000 individuals) using the Bray-Curtis dissimilarity measure. The jackknife technique was also applied to compare the interquartile range and the observed distance between subsamples.

Fig. S2 All β1AA' and β2AA' (intra) beta diversities between an in-silico OTU library and its copies using seven dissimilarity measures. OTU libraries were randomly subsampled at equal sample sizes from 13000 to 3000 individuals. Ten subsets were created for each subsampling depth from all samples. All ordination plots were created using PCoA (Principal Coordinate Analysis), and the percentage of total variance explained by each principal coordinate is shown under the associated axis. Each data point in the ordination plots represents a subsample from the in-silico samples. Blue, green and red colors represent each sample and its subsamples that were being compared.

Fig. S3 Rank-abundance curves of in-silico microbial communities that were created using different distribution models (Table 1 in the article): log-normal, log-normal trimmed (singletons and doubletons removed), uniform and chi-squared (degrees of freedom: 9). We used the commands “rlnorm (1000, meanlog=1.79, sdlog=1)”, “runif (1000, min = 1, max = 19.1)” and “rchisq (1000, 9.2, ncp = 1)” to create the log-normal, uniform and chi-squared distributed communities, respectively.

Fig. S4 Lorenz curves, used as a measure of evenness, of in-silico microbial communities that were created using different distribution models: log-normal, uniform and chi-squared (degrees of freedom: 9).

Fig. S5 In-silico OTU libraries following a log-normal distribution with the same richness but different evenness. Nine communities were chosen among many simulations in R. β diversity values between these communities were used to assess the effect of evenness on corrected β diversity values.

Fig. S6 β diversities between in-silico OTU libraries (see Fig. S2) with the same richness but different evenness, calculated with Bray-Curtis indices at different sampling scales. Delta values represent the level of evenness difference in terms of the Gini coefficient. Black points indicate observed β diversities and green points indicate corrected β diversities, both as a function of subsampling depth.

Fig. S7 β diversity analysis of eight in-silico OTU libraries following a log-normal distribution with the same richness but different evenness (0.36, 0.42, 0.44, 0.64, 0.73, 0.75, 0.80, and 0.90), calculated with Bray-Curtis indices. OTU0.03 libraries were chosen from a large number of simulations whose comparisons resulted in observed β diversities of 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7. Red dots represent the βMC of the species pool obtained from the compared communities.

Fig. S8 Rank-abundance curves of triplicate 16S rRNA tag sequence libraries from three different rapid sand filters. Separate panels are shown for the dominant (>1%) and rare (<1%) OTU0.03s. Sequence abundance per OTU is shown in different colors for the replicates. Shared and unique OTU0.03s are also shown in Venn diagrams.

Fig. S9 Schematic explanation of the β diversity significance assessment technique using the meta-community concept. Ten subsamples are recommended for β1AA' and β2AA' assessment. In addition, a p-value can be calculated by comparing the observed βA'B' to the null distribution (i.e. n=999) of β1AA' values obtained from the meta-community.

Fig. S10 PCoA plots of standard techniques (subsampling and the jackknife technique) that were applied to replicates of three different rapid sand filters in order to measure the significance of β diversity. For the subsampling technique, each of the three clusters in a PCoA plot represents a replicate from the same filter, and the two colours within each cluster distinguish the equalized original sample from its subsamples with fewer individuals. For the jackknife technique, the ellipses represent the interquartile range (IQR; Lozupone et al., 2007) on each axis for the 100 jackknife replicates.

Table S1 β diversity analysis of three log-normally distributed in-silico OTU libraries with different numbers of individuals (4500, 7500 and 13500) using the Raup-Crick method implemented in vegan after 999 simulations. Raup-Crick outputs show the number of simulated values (shared species) that are smaller than or equal to the observed value.

Comparisons         Raup-Crick values
4700 vs. 7500       0.114
7500 vs. 13500      0.001
4700 vs. 13500      0.001

Calculations were implemented using model r1 as the method in the “oecosimu” function, which uses the column marginal frequencies as probabilities.

Table S2 Weighted UniFrac distances from replicates that were randomly subsampled 10 times at the minimum sampling depth. The effect of rare OTUs was calculated by subtracting the β diversity of dominant OTUs from the β diversity of total OTUs.

                      Filter 2                      Filter 7                      Filter 12
                      Rep1 vs.  Rep2 vs.  Rep1 vs.  Rep1 vs.  Rep2 vs.  Rep1 vs.  Rep1 vs.  Rep2 vs.  Rep1 vs.
                      Rep2      Rep3      Rep3      Rep2      Rep3      Rep3      Rep2      Rep3      Rep3
Total distance¹       0.066     0.041     0.042     0.057     0.156     0.161     0.104     0.065     0.096
Effect of dominant²   0.030     0.020     0.014     0.028     0.124     0.126     0.073     0.024     0.075
Effect of rare        0.036     0.021     0.028     0.028     0.032     0.035     0.031     0.040     0.021

¹ All dissimilarities were calculated using the Weighted UniFrac algorithm.
² Dissimilarities were calculated from OTU tables without rare OTUs.
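
The p-value step described in the Fig. S9 caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the meta-community is formed by pooling the OTU counts of the two compared samples, that null values are Bray-Curtis dissimilarities between pairs of equal-depth subsamples drawn from that pool, and that the p-value follows the common (r + 1)/(n + 1) convention. The function names bray_curtis, subsample and null_p_value are hypothetical, and Python is used here for illustration even though the simulations in this supplement were run in R.

```python
import random
from collections import Counter


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two OTU count vectors of equal length."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def subsample(counts, depth, rng):
    """Draw `depth` individuals without replacement and return per-OTU counts."""
    # Expand the count vector into one entry per individual, labelled by OTU index.
    individuals = [otu for otu, n in enumerate(counts) for _ in range(n)]
    drawn = Counter(rng.sample(individuals, depth))
    return [drawn.get(otu, 0) for otu in range(len(counts))]


def null_p_value(a, b, depth, n_null=999, seed=0):
    """Compare the observed dissimilarity to a null built from a pooled community.

    Assumption (not taken from the source): the meta-community is the
    element-wise sum of the two count vectors.
    """
    rng = random.Random(seed)
    observed = bray_curtis(subsample(a, depth, rng), subsample(b, depth, rng))
    pool = [x + y for x, y in zip(a, b)]
    null = [
        bray_curtis(subsample(pool, depth, rng), subsample(pool, depth, rng))
        for _ in range(n_null)
    ]
    # Assumed convention: count null values at least as large as the observed one.
    exceed = sum(1 for d in null if d >= observed)
    return observed, (exceed + 1) / (n_null + 1)


# Usage: two hypothetical four-OTU libraries that share no OTUs.
observed, p = null_p_value([50, 50, 0, 0], [0, 0, 50, 50], depth=40, n_null=99)
```

Subsampling without replacement is used so that a subsample can never contain more individuals of an OTU than the original library, in line with the equal-depth subsampling described for Fig. S2.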