SUPPORTING INFORMATION FOR ONLINE PUBLICATION: Bacterial taxa-area and distance-decay relationships in marine environments

L. Zinger¹,², A. Boetius¹, A. Ramette¹

¹ HGF-MPG Joint Research Group on Deep Sea Ecology and Technology, Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, D-28359 Bremen, Germany
² CNRS & Université Paul Sabatier, UMR 5174 Evolution et Diversité Biologique, bâtiment 4R1, 118 route de Narbonne, 31062 Toulouse, France

SUPPORTING TABLES

Table S1: Sample names and associated characteristics are provided separately in a .txt file. Longitudes and latitudes are given in decimal degrees and sample depth in meters. OTU average relative occupancy corresponds to the average proportion of sites occupied by each OTU.

Table S2: Dataset characteristics per ecosystem type and realm. Values are given for the entire dataset, with the per-sample mean ± SD in brackets.

Ecosystem | Nb of samples | Overall nb sequences | Nb OTUs | % Singletons | % Abundant OTUs¹ | OTU average relative occupancy² ± SD
Raw data
Coastal waters | 175 | 2491405 (14237 ± 6955) | 23105 (539 ± 289) | 54 (46 ± 30) | 6.89 (7.05 ± 3.34) | 0.023 ± 0.067
Surface waters⁴ | 70 | 1566743 (22382 ± 9897) | 9982 (619 ± 237) | 49 (38 ± 16) | 10.03 (9.53 ± 3.23) | 0.062 ± 0.124
Deep waters⁵ | 61 | 1190636 (19519 ± 7611) | 12132 (966 ± 262) | 48 (38 ± 16) | 10.07 (5.28 ± 1.45) | 0.079 ± 0.150
Total Pelagic | 306 | 5248784 (17152 ± 8569) | 36283 (643 ± 318) | 56 (42 ± 24) | 8.08 (7.15 ± 3.11) | 0.017 ± 0.054
Coastal sediments | 72 | 1725682 (23968 ± 11833) | 71586 (2753 ± 1378) | 48 (51 ± 28) | 4.56 (2.14 ± 1.12) | 0.038 ± 0.068
Deep-sea sediments | 60 | 1221543 (20359 ± 7518) | 45300 (2797 ± 1800) | 48 (49 ± 39) | 5.64 (2.15 ± 1.00) | 0.061 ± 0.110
Total Benthic | 132 | 2947225 (22327 ± 10227) | 104097 (2773 ± 1578) | 49 (50 ± 33) | 5.14 (2.12 ± 1.04) | 0.026 ± 0.056
Standardized data³
Coastal waters | 175 | 875000 (5000 ± 0) | 54880 (313 ± 141) | 10 (38 ± 22) | 1.61 (0.03 ± 7e-5) | 0.025 ± 0.072
Surface waters⁴ | 70 | 350000 (5000 ± 0) | 22887 (326 ± 111) | 8 (37 ± 15) | 2.00 (5.82 ± 0.92) | 0.067 ± 0.129
Deep waters⁵ | 61 | 305000 (5000 ± 0) | 31019 (508 ± 75) | 8 (43 ± 10) | 1.60 (3.35 ± 0.39) | 0.081 ± 0.146
Total Pelagic | 306 | 1530000 (5000 ± 0) | 108786 (355 ± 145) | 7 (39 ± 21) | 1.34 (5.07 ± 0.84) | 0.019 ± 0.057
Coastal sediments | 72 | 360000 (5000 ± 0) | 81772 (1135 ± 523) | 18 (54 ± 27) | 1.15 (0.88 ± 0.35) | 0.034 ± 0.059
Deep-sea sediments | 60 | 300000 (5000 ± 0) | 70872 (1181 ± 485) | 13 (52 ± 29) | 1.11 (0.72 ± 0.25) | 0.057 ± 0.105
Total Benthic | 132 | 660000 (5000 ± 0) | 152644 (1156 ± 505) | 14 (53 ± 28) | 1.08 (0.87 ± 0.26) | 0.024 ± 0.051

¹ OTUs > 50 sequences; ² Average proportion of sites occupied by each OTU; ³ After random resampling of 5000 sequences per sample; ⁴ Samples ≤ 200 m water depth; ⁵ Samples > 200 m water depth.

SUPPORTING FIGURES

Figure S1: Schematic representation of the analytical pipeline used to calculate TAR and DDR slope coefficients.

Figure S2: Effect of the removal of rare taxa on dataset characteristics in surface waters (green squares), deep-sea waters (blue circles) and coastal sediments (orange triangles): a) number of non-empty, remaining samples; b-d) pairwise geographic distances; e-g) average richness and percentage of remaining OTUs and sequences per sample; h) average OTU relative occupancy (average proportion of sites occupied by each OTU); i) proportion of OTUs detected in the smallest area; j-l) pairwise similarities between close communities (geographic distances < 2000 km, left part of boxplots) and distant communities (geographic distances > 12,000 km, right part of boxplots). Data used here are the same as in Fig. 2.

Figure S3: Effect of the removal of rare taxa on TAR (a) and DDR (b) slope coefficients and intercepts. Raw intercepts are displayed as obtained by linear regression on the data used in Fig. 2. Green squares, blue circles and orange triangles represent parameters obtained with the raw dataset in surface waters, deep-sea waters and coastal sediments, respectively.
Symbols with backgrounds of decreasing grey intensity correspond to data obtained after removing taxa of increasing abundance.

Figure S4: Relationship between z and β per realm/ecosystem type, obtained from 1,000 random resamplings of 40 samples in the standardized presence/absence community tables. The colour gradient (cyan to dark red) represents point density. The corresponding Kendall correlation coefficients and their significance are indicated in each panel. Significance codes for Holm-corrected p-values: ***: P < 0.001, **: P < 0.01, n.s.: P > 0.05.

Figure S5: Distribution of Kendall correlation coefficients between z and β pairs obtained for each ecosystem type at each resampling step (green bars), and between z and permuted β values (grey bars), the latter defining a null distribution. n = 1,000 in each case.

FIGURE S1
[Schematic. Standardized OTU table (samples S1 to Sn, OTU1 to OTUk, Nseq = 5000 per sample). Random resampling x 1000 (nb. samples = 40). DDR branch: Sørensen similarity matrix; log community similarity (Sørensen) against log geographic distances. TAR branch: reference selection, each sample considered successively as the reference sample Sr; samples ranked by distance to the reference; richness per area calculation for areas A1, A2, A3, ..., AT (nb. richness values per area = 40; nb. areas = 10); richness observed per area for Sr; averaging of the 40 richness values obtained per area; log average richness against log area.]

FIGURE S2
[Panels a-l. x axis in all panels: maximum abundance of OTUs removed. y axes: a) nb of remaining samples; b-d) pairwise geographic distances (km); e) log10 avg OTU richness obs; f) avg % remaining OTUs; g) avg % remaining sequences; h) avg OTU relative occupancy; i) log avg S:S tot ratio in min area; j-l) log avg pairwise similarity.]

FIGURE S3
[Panel a) TAR: TAR's intercept against z. Panel b) DDR: DDR's intercept against |β|.]

FIGURE S4
[Panels: a) Pelagic, b) Benthic, c) Coastal waters, d) Surface waters, e) Deep-sea waters, f) Coastal sediments, g) Deep-sea sediments. Each panel plots z against |β| and is annotated with its Kendall τ. Annotated values, in extraction order, which does not preserve the assignment to panels: 0.15 ***, 0.38 ***, 0.08 **, 0.15 ***, 0.29 ***, 0.09 ***, 0.02 n.s.]

FIGURE S5
[Histogram. x axis: Kendall's τ (-1.0 to 1.0); y axis: frequency.]
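The DDR branch of the pipeline in Figure S1 (standardization to 5000 sequences per sample, Sørensen similarity, regression of log similarity on log geographic distance) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`rarefy`, `sorensen_similarity`, `great_circle_km`, `ddr_fit`) are hypothetical, and the sketch assumes base-10 logarithms, great-circle distances computed from decimal-degree coordinates on a sphere of radius 6371 km, and an ordinary least-squares fit over all sample pairs with non-zero similarity and distance. The toy data at the end are random and carry no biological meaning.

```python
import numpy as np

def rarefy(counts, depth, rng):
    """Randomly draw `depth` sequences without replacement from one sample."""
    counts = np.asarray(counts, dtype=int)
    pool = np.repeat(np.arange(counts.size), counts)  # one entry per sequence
    picked = rng.choice(pool, size=depth, replace=False)
    return np.bincount(picked, minlength=counts.size)

def sorensen_similarity(presence):
    """Pairwise Sørensen similarity, 2 * shared / (S_a + S_b), between rows.

    Assumes every sample (row) contains at least one OTU.
    """
    pa = np.asarray(presence, dtype=int)
    shared = pa @ pa.T                # number of OTUs shared by each pair
    richness = pa.sum(axis=1)         # number of OTUs per sample
    return 2 * shared / (richness[:, None] + richness[None, :])

def great_circle_km(lat_deg, lon_deg, radius_km=6371.0):
    """Pairwise haversine distances (km); the spherical Earth is an assumption."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def ddr_fit(similarity, distance_km):
    """Slope and intercept of log10(similarity) against log10(distance)."""
    sim = np.asarray(similarity, dtype=float)
    dist = np.asarray(distance_km, dtype=float)
    i, j = np.triu_indices(sim.shape[0], k=1)   # each sample pair once
    s, d = sim[i, j], dist[i, j]
    keep = (s > 0) & (d > 0)                    # logs need positive values
    slope, intercept = np.polyfit(np.log10(d[keep]), np.log10(s[keep]), 1)
    return slope, intercept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.integers(0, 50, size=(6, 30))     # toy table: 6 samples x 30 OTUs
    # Toy depth of 200; the source standardizes to 5000 sequences per sample.
    standardized = np.array([rarefy(row, 200, rng) for row in raw])
    similarity = sorensen_similarity(standardized > 0)
    lat = rng.uniform(-60, 60, size=6)
    lon = rng.uniform(-180, 180, size=6)
    slope, intercept = ddr_fit(similarity, great_circle_km(lat, lon))
    print("DDR slope:", slope, "intercept:", intercept)
```

The same log-log regression step could be applied to averaged richness and area values to obtain a TAR slope z, but how areas are delimited from the distance-ranked samples is not specified in the figure residue, so that step is left out of the sketch.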