Supplementary files for

Phylogenetic beta diversity in bacterial assemblages across ecosystems: deterministic versus stochastic processes

Jianjun Wang1, 2, *, Ji Shen1, Yucheng Wu3, Chen Tu4, Janne Soininen5, James C. Stegen6, Jizheng He2, Xingqi Liu1, Lu Zhang1, Enlou Zhang1

1 State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, 210008 China
2 State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085 China
3 State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing, 210008 China
4 Key Laboratory of Coastal Zone Environmental Processes, Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai, 264003 China
5 Department of Geosciences and Geography, University of Helsinki, FIN-00014, Finland
6 Fundamental and Computational Sciences Directorate, Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA

* Correspondence: Jianjun Wang; E-mail: JJWang@niglas.ac.cn

Material and methods

For subsurface sediment (below 1 cm) from Kusai Lake on the Qinghai-Tibetan Plateau (KS) and Lugu Lake in Yunnan Province (LG), and subsurface soil (below 5 cm) from a park in Nanjing City (NJ), sediment/soil cores were collected with sterilized polyvinyl chloride tubes using a platform drilling rig or a hydraulic pressure-driven coring device, and contamination-free samples along the depth profile were subsampled using sterile cut-off syringes (Wang et al 2008). The total depths of these three cores were 7.5, 11.7 and 25.5 m, respectively.
For stream biofilm bacteria from Laojun Mountain in Yunnan Province (STR), we scraped biofilm subsamples from predefined areas of benthic stones using sterilized sponges (Wang et al 2011, Wang et al 2012). For free-living bacteria from Taihu Lake in Jiangsu Province (ThW), we sampled water from the surface layer (top 50 cm) and collected the bacteria on 0.2-μm pore size Isopore filters after prefiltering through 5-μm pore size Isopore filters. For surface sediment samples from Kuilei Lake (KL1 and KL7, sampled in January and July, respectively) and Taihu Lake in Jiangsu Province (ThS), and from the lakes in Sichuan Province (SC), three short sediment cores (< 50 cm) per site were retrieved using a Kajak-Brinkhurst sampler, and the three 1-cm surface sediment subsamples were pooled. For surface soil samples from Zhejiang Province (ZJ) or Hoh Xil on the Qinghai-Tibetan Plateau (HX), we took three top 4-cm soil subsamples per site, which were subsequently pooled. For the sample groups ThS, ThW, KS, LG, NJ, STR, and HX, we obtained the samples along transects. To compare the bacterial communities in lake water and sediments, we simultaneously examined the bacterial communities in lake water (ThW) and sediment (ThS) from Taihu Lake at the same locations during the same cruise. Among these lakes, Kusai Lake is a saline lake with a salinity of ~17‰ and the others are freshwater lakes; the two surface sediments from Kusai Lake were grouped as KSS. All samples were transported to the laboratory at -18 °C and then stored at -80 °C for subsequent genetic analyses.

Details of DNA sequencing and analyses can be found in Wang et al (2012). Briefly, genomic DNA was extracted from freeze-dried samples (soil, sediment and biofilm) or filters using the phenol-chloroform method.
Bacterial 16S rRNA genes were amplified using the 27F primer with the 454 Life Sciences “A” sequencing adapter, and the 519R primer with an 8-bp barcode sequence and the 454 Life Sciences “B” sequencing adapter. PCR amplifications were performed in three replicates per sample, and the replicates were then mixed. The purified, barcoded amplicons were pooled at equimolar ratios to a final concentration of 100 ng μL⁻¹ and then sequenced using a Roche 454 FLX pyrosequencer. The sequences were deposited in the MG-RAST database with sequence numbers 4508181.3-4508379.3 under the condition of “Data will be publicly accessible eventually”.

Sequences generated from pyrosequencing of bacterial 16S rRNA gene amplicons were processed using the QIIME pipeline (Quantitative Insights Into Microbial Ecology, v1.2) (Caporaso et al 2010b, Wang et al 2012). Briefly, the sequences were denoised with the Denoiser algorithm (Reeder and Knight 2010) and clustered into OTUs at 97% pairwise identity with the seed-based uclust algorithm (Edgar 2010). After chimeras were removed via ChimeraSlayer (Haas et al 2011), representative sequences from each OTU were aligned to the Greengenes imputed core reference alignment (DeSantis et al 2006) using PyNAST (Caporaso et al 2010a). After gaps and hypervariable regions were removed using a Lane mask, the alignments were used to construct an approximately maximum-likelihood phylogenetic tree with Jukes-Cantor distance using FastTree (Price et al 2010). The taxonomic identity of each representative sequence was determined using the RDP Classifier (Wang et al 2007), and chloroplast or archaeal sequences were removed.

References for supplementary materials

Caporaso JG, Bittinger K, Bushman FD, DeSantis TZ, Andersen GL, Knight R (2010a). PyNAST: a flexible tool for aligning sequences to a template alignment.
Bioinformatics 26: 266-267.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK et al (2010b). QIIME allows analysis of high-throughput community sequencing data. Nat Meth 7: 335-336.
DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K et al (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72: 5069-5072.
Edgar RC (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460-2461.
Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G et al (2011). Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 21: 494-504.
Price MN, Dehal PS, Arkin AP (2010). FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One 5: e9490.
Reeder J, Knight R (2010). Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Meth 7: 668-669.
Stegen JC, Lin X, Konopka AE, Fredrickson JK (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. ISME J 6: 1653-1664.
Wang J, Wu Y, Jiang H, Li C, Dong H, Wu Q et al (2008). High beta diversity of bacteria in the shallow terrestrial subsurface. Environ Microbiol 10: 2537-2549.
Wang J, Soininen J, Zhang Y, Wang B, Yang X, Shen J (2011). Contrasting patterns in elevational diversity between microorganisms and macroorganisms. J Biogeogr 38: 595-603.
Wang J, Soininen J, He J, Shen J (2012). Phylogenetic clustering increases with elevation for microbes. Environ Microbiol Rep 4: 217-226.
Wang Q, Garrity GM, Tiedje JM, Cole JR (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261-5267.
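
The Mantel tests reported in Tables S1 and S2 correlate a phylogenetic dissimilarity matrix with an explanatory distance matrix using Spearman's rho and assess significance by permutation. The Python sketch below illustrates a simple (not partial) Mantel test of that kind. It is not the authors' code: the function names, the one-sided p-value, the random seed, and the toy site coordinates are all assumptions made for illustration.

```python
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mantel_spearman(dist_a, dist_b, permutations=9999, seed=0):
    """Simple Mantel test (hypothetical helper, one-sided p-value).

    Rows and columns of dist_a are permuted together, so the pairwise
    structure within each matrix is preserved under the null.
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_b = upper_triangle(dist_b)
    observed = spearman(upper_triangle(dist_a), flat_b)
    hits = 0
    idx = list(range(n))
    for _ in range(permutations):
        rng.shuffle(idx)
        permuted = [[dist_a[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if spearman(upper_triangle(permuted), flat_b) >= observed:
            hits += 1
    # The observed arrangement counts as one of the possible permutations.
    return observed, (hits + 1) / (permutations + 1)


# Toy example (made-up data): five sites on a line and two distance
# matrices that are monotonically related to each other.
coords = [0, 1, 3, 6, 10]
dist_abs = [[abs(a - b) for b in coords] for a in coords]
dist_sq = [[(a - b) ** 2 for b in coords] for a in coords]
rho, p_value = mantel_spearman(dist_abs, dist_sq, permutations=999)
print(rho, p_value)
```

A partial Mantel test would additionally remove the effect of a third distance matrix before correlating the two of interest; that step is not shown here.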

Table S1 Mantel and partial Mantel tests for the correlation between unweighted UniFrac and the explanatory distances (elevational, geographic and environmental distance) using Spearman’s rho for different habitat types and spatial scales.

Unifrac (row labels: Effects of, Controlling for)
Local: Surface Sediments. Regional: Lake Water, Subsurface Sediments, Stream Biofilm, Surface Sediments.

Effects of     KL1     KL7     ThS       ThW       LG        KS        STR       SC
Geographic     0.093   0.112   0.699***  0.476***  0.578***  0.520***  0.521***  0.267***
Environmental  0.226*  0.476*  0.511***  0.519***  0.582***  0.453***  0.436***  0.329*

0.827*** 0.170 0.407*** 0.175 Elevational -0.248*** 0.242* Environmental 0.781*** 0.089 Geographic 0.769*** 0.125* 0.268*** 0.263 -0.065*** 0.298* Elevational Geographic Elevational Environmental Environmental Geographic 0.037 0.210 0.025 0.466* 0.563*** 0.115 Elevational 0.215** 0.315*** 0.375*** 0.382** 0.458*** 0.373**

Significance was tested using 10,000 permutations. *** P < 0.001; ** P < 0.01; * P < 0.05. For the HX group, neither the geographic nor the environmental distance was significantly related to bacterial phylogenetic turnover (P > 0.05). For the ZJ and NJ groups, Mantel analyses were not conducted because of their small sample numbers (7 and 5, respectively).

Table S2 Mantel and partial Mantel tests for the correlation between betaMNTD and the explanatory distances (elevational, geographic and environmental distance) using Spearman’s rho for different habitat types and spatial scales.

BetaMNTD (row labels: Effects of, Controlling for)
Local: Surface Sediments. Regional: Lake Water, Subsurface Sediments, Stream Biofilm, Surface Sediments.

Effects of     KL1     KL7     ThS       ThW       LG        KS        STR       SC
Geographic     0.099   0.202   0.435***  0.275**   0.422***  0.486***  0.494***  0.143
Environmental  0.296*  0.404*  0.497***  0.485***  0.571***  0.422**   0.447***  0.377**

0.827*** 0.129 0.372** 0.030 Elevational -0.367** 0.127** Environmental 0.779** 0.030 Geographic 0.799** 0.110** 0.296** 0.354* -0.043** 0.358** Elevational Geographic Elevational Environmental Environmental Geographic 0.032 0.282* 0.140 0.380* 0.226** 0.344** Elevational 0.056 0.419*** 0.152 0.446*** 0.422*** 0.338**

Significance was tested using 10,000 permutations. *** P < 0.001; ** P < 0.01; * P < 0.05. For the HX group, neither the geographic nor the environmental distance was significantly related to bacterial phylogenetic turnover (P > 0.05). For the ZJ and NJ groups, Mantel analyses were not conducted because of their small sample numbers (7 and 5, respectively).

Figure S1 The dissimilarity among bacterial communities between different habitat types. (A) The dissimilarity between the communities of the surface sediments and lake water in Taihu Lake. The box-plots show the unweighted UniFrac or betaMNTD dissimilarity between each surface sediment sample and all 27 lake water samples (between ThS01 and ThW01-ThW27, for instance). The communities differed greatly between surface sediment and lake water for both UniFrac and betaMNTD metrics, with average values of 0.94 ± 0.01 and 0.44 ± 0.02, respectively. Sites 01 and 18 are near river mouths, site 10 is ~1 km away from a man-made island, and sites 6 and 14 are ~1 km away from the waterway channels (solid square). The arrows from site 1 to site 5 and from site 18 to site 15 indicate increasing distance from these sites to the corresponding river mouth.
The communities of the surface sediments near river mouths (or near islands or waterway channels) were more similar to the lake water communities than those at other locations in Taihu Lake, with dissimilarity increasing from the river mouth to the lake center. (B) The unweighted UniFrac or betaMNTD dissimilarity between each surface (KS01 and KS02 from sample group KSS) or subsurface (KS03-KS30 from sample group KS) sediment from Kusai Lake and all of the soils (HX) from the Kusai Lake region. Communities differed notably between soils and lake sediments for both UniFrac and betaMNTD metrics, with average values of 0.95 ± 0.01 and 0.49 ± 0.05, respectively.

Figure S2 The relationships between betaMNTD and spatial change for each sample group. The regression slopes of the linear relationships based on a Gaussian generalized linear model are shown with solid or dashed (statistically non-significant, ranked Mantel test, 9999 permutations, P > 0.05) lines. The significant slope (unweighted betaMNTD per 10³ km) is shown at the bottom-right corner of each sample group panel.

Figure S3 Barplot of the proportions of ses.betaMNTD values that differ significantly from random phylogenetic turnover (P < 0.05). Ses.betaMNTD values < -2 indicate less than expected turnover; values > +2 indicate greater than expected turnover. The mean values of betaMNTD are shown below the labels of the corresponding sample groups and were significantly different from the expected value of zero for random data (P < 0.001, t-test), except for sample groups KL1 and KL7 (P > 0.05).

Figure S4 The relationships between the standardized effect size of betaMNTD (ses.betaMNTD) and spatial change for each sample group. The regression slopes of the linear relationships based on a Gaussian generalized linear model are shown with solid or dashed (statistically non-significant, Mantel test, 9999 permutations, P > 0.05) lines.
The dotted red lines (at +2 and -2) show the 95% confidence interval around the expectation under a null model of random shuffling of taxa across the tips of the phylogenetic tree including all species in the studied sample group.

Figure S5 Median habitat differences between pairs of OTUs as a function of between-OTU phylogenetic distances, following the methods of Stegen et al (2012). Habitat difference was calculated from the abundance-weighted mean of all the environmental variables of a given OTU, and medians were taken within phylogenetic distance bins. Curves were fitted with locally weighted polynomial regression. Across short phylogenetic distances, strong positive relationships between phylogenetic distance and OTU niche differences were observed, indicating significant phylogenetic signal.
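
The standardized effect size used in Figures S3 and S4 compares an observed betaMNTD with the values obtained when taxa are shuffled across the tips of the phylogeny. The Python sketch below illustrates that calculation under stated assumptions: an unweighted (presence/absence) betaMNTD, a precomputed pairwise phylogenetic distance matrix, and 999 randomizations. The function names and the toy two-clade phylogeny are hypothetical and are not taken from the authors' analysis.

```python
import random
import statistics


def beta_mntd(dist, comm_a, comm_b):
    """Unweighted betaMNTD between two communities.

    dist is a square matrix of phylogenetic distances between taxa;
    comm_a and comm_b are lists of taxon indices present in each community.
    Each taxon is matched to its nearest taxon in the other community and
    the mean nearest-taxon distances of both directions are averaged.
    """
    a_to_b = [min(dist[i][j] for j in comm_b) for i in comm_a]
    b_to_a = [min(dist[j][i] for i in comm_a) for j in comm_b]
    return 0.5 * (sum(a_to_b) / len(a_to_b) + sum(b_to_a) / len(b_to_a))


def ses_beta_mntd(dist, comm_a, comm_b, randomizations=999, seed=0):
    """Standardized effect size of betaMNTD under a tip-shuffling null.

    Taxon labels are shuffled across all tips of the phylogeny, which is
    equivalent to relabelling the taxa in both communities at random.
    Assumes the null values are not all identical (non-zero spread).
    """
    rng = random.Random(seed)
    observed = beta_mntd(dist, comm_a, comm_b)
    tips = list(range(len(dist)))
    null_values = []
    for _ in range(randomizations):
        shuffled = tips[:]
        rng.shuffle(shuffled)
        null_a = [shuffled[i] for i in comm_a]
        null_b = [shuffled[j] for j in comm_b]
        null_values.append(beta_mntd(dist, null_a, null_b))
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)


# Toy phylogeny (made-up data): eight taxa in two clades, with distance 0.1
# within a clade and 1.0 between clades.
clade = [0, 0, 0, 0, 1, 1, 1, 1]
dist = [[0.0 if i == j else (0.1 if clade[i] == clade[j] else 1.0)
         for j in range(8)] for i in range(8)]

same_clade = ses_beta_mntd(dist, [0, 1], [2, 3])    # communities share a clade
other_clade = ses_beta_mntd(dist, [0, 1], [4, 5])   # communities in different clades
print(same_clade, other_clade)
```

In this toy case the same-clade pair yields a negative effect size (less turnover than the null expectation) and the different-clade pair a positive one, which matches the reading of values below -2 and above +2 given in the Figure S3 caption.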