Quantifying Community Assembly Processes and Identifying Features that Impose Them

James C. Stegen*, Xueju Lin, Jim K. Fredrickson, Xingyuan Chen, David W. Kennedy, Christopher J. Murray, Mark L. Rockhold, and Allan E. Konopka

Fundamental and Computational Sciences Directorate, Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99352, USA

*Corresponding author, phone: 509-371-6763, Email: James.Stegen@pnnl.gov

Supplementary Methods

Field sampling, DNA processing, and environmental data

We study a bacterial meta-community associated with subsurface sediments within an unconfined aquifer ~250 m (horizontal distance) from the Columbia River in Richland, WA. The sampled locations are within the Hanford Integrated Field Research Challenge (IFRC) site (http://ifchanford.pnnl.gov/) (Bjornstad et al 2009). Sediment samples were taken during the drilling of 26 wells as described in (Bjornstad et al 2009) (see Table S1 for summarized metadata). DNA was extracted as in (Lin et al 2012a) and sequenced as in (Lin et al 2012b). Using QIIME (Caporaso et al 2010), sequences were pre-processed by removing any sequence that was < 200 or > 300 nucleotides long, had a mean quality score < 25 (Huse et al 2007), contained ambiguous characters, contained a homopolymer longer than 8 nucleotides, was missing the primer sequence, or contained an uncorrectable barcode. Samples were not used if they had < 500 sequences or > 5% relative abundance of Propionibacteria (a sign of contamination). Operational taxonomic units (OTUs) were created using cd-hit (Li & Godzik 2006) with a prefix pre-filter length of 200 nucleotides, a minimum coverage of 99%, and a minimum similarity of 97%. The most abundant sequence within each OTU was taken to represent the OTU.
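The per-sequence filters listed above can be illustrated with a minimal Python sketch. This is not the QIIME implementation the authors used; the function name `passes_filters` and its arguments are hypothetical, and the primer and barcode checks are omitted because the text does not specify those sequences.

```python
import re

# Any character other than A, C, G or T is treated as ambiguous (assumption).
AMBIGUOUS = re.compile(r"[^ACGT]")
# One base followed by 8 or more copies of itself: a homopolymer longer than 8.
LONG_HOMOPOLYMER = re.compile(r"(.)\1{8,}")


def passes_filters(seq, quals, min_len=200, max_len=300, min_mean_q=25):
    """Return True if a read survives the length, quality, ambiguity and
    homopolymer filters described in the text (primer/barcode checks omitted)."""
    seq = seq.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    if not quals or sum(quals) / len(quals) < min_mean_q:
        return False
    if AMBIGUOUS.search(seq):
        return False
    if LONG_HOMOPOLYMER.search(seq):
        return False
    return True
```

A read is kept only if every check passes, so the order of the checks affects speed but not the outcome.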
For simplicity, below we refer to OTUs as ‘species.’ Prior to further analyses, each sample was rarefied to 500 sequences (to control between-sample heterogeneity) and only the 1000 most abundant species were retained to reduce the influences of sequencing errors that may produce singletons. This step is not likely to greatly influence our results because all community-level analyses account for relative abundances; patterns are driven primarily by the most abundant species. Representative sequences were aligned against the SILVA database (Pruesse et al 2007) using PyNAST within QIIME (Caporaso et al 2010) with a minimum alignment length of 150 and a minimum identity of 75%. Sequences that failed to align were dropped. FastTree (Price et al 2009) was used to infer a phylogenetic tree within QIIME (Caporaso et al 2010). All community-level analyses (described below) were carried out in the R statistical language (http://cran.r-project.org/).

Measured environmental data for each community included its elevation, its horizontal distance from the Columbia River shoreline, the elevation of the top of the Ringold formation at its geographic location (see Bjornstad et al 2009), and the composition of its associated sediments measured as percent mud.
These variables provide indirect characterization of environmental conditions that may influence microbial communities; elevation is related to vertical gradients in redox and potentially the availability of electron donors and acceptors; horizontal distance from the Columbia River is related to the magnitude of river intrusion; the Ringold elevation likely influences how the geochemical environment changes with depth such that two communities at the same depth but in locations with different Ringold elevations may experience different geochemical environments; and percent mud influences numerous physical and geochemical aspects of the local environment.

To estimate percent mud, grain-size analyses were performed on selected sediment samples collected during drilling at the IFRC site using a combination of sieve, hydrometer, and laser diffraction methods (Gee & Or 2002). The mud content (silt- and clay-sized particles) in each of the samples was estimated from the mass fraction less than 0.063 mm (Folk 1980). Geophysical logging was also performed in the IFRC wells using a spectral gamma logging system. The primary natural gamma-emitting radionuclides are 40K, 232Th, and 238U. Percent mud was found to be positively correlated with both 40K and 232Th, with slightly stronger correlations for 232Th. Therefore the 232Th log data were interpolated to the sediment sample measurement locations, and a correlation function was used to estimate percent mud from the interpolated 232Th values. One sample did not have a 232Th value (well C6207 within the Ringold formation) from which to calculate percent mud.
This sample was approximately a meter within fine-grained Ringold material (see Appendix A in Bjornstad et al 2009); to estimate its percent mud we used the median percent mud across samples taken from deeper than a meter below the top of the Ringold.

Testing phylogenetic signal

To test for phylogenetic signal we first estimated two dimensions of each OTU’s ecological niche. Each dimension is estimated as the habitat conditions under which a given OTU is most abundant (Andersson et al 2010, Pei et al 2011, Stegen et al 2012). Across our system the most substantial shifts in environmental conditions are likely associated with subsurface depth and sediment composition; at greater depths the sediment shifts from coarse-grained Hanford material to fine-grained Ringold material and redox conditions shift from oxidizing to reducing (Bjornstad et al 2009). Our sampled communities are primarily from oxidizing conditions (Table S1), so we cannot rigorously evaluate redox-state niches. Instead, the environmental niche of each OTU was characterized as the subsurface elevation where it was most abundant and the sediment composition (percent mud) where it was most abundant (similar to Pei et al 2011, Stegen et al 2012). We relate between-OTU niche differences to between-OTU phylogenetic distances using Mantel correlograms with permutation-based significance tests for each of 50 phylogenetic distance classes (similar to Diniz-Filho et al 2010). Significance tests were based on 999 permutations using the R function ‘mantel.correlog’ (package ‘vegan’) with a progressive Bonferroni correction (Legendre & Legendre 1998) and no distance class cutoff.

Turnover in phylogenetic community composition

A given value of βMNTD could be less than, greater than, or equal to the degree of turnover expected when Selection does not influence turnover in community composition.
Less than expected phylogenetic turnover should result from environmental conditions constraining community composition by imposing Selection on species’ ecological niches. Greater than expected phylogenetic turnover should be due to divergent environmental conditions causing each community to be composed of an ecologically distinct set of species. Note that these conditions assume at least a minor amount of exchange of organisms among local communities through deep evolutionary time (so that individual communities do not evolve evolutionarily distinct assemblages in situ). This assumption is likely upheld in our system, which is within a single unconfined aquifer (maximum of 54 m separating any two communities) through which groundwater regularly flows and into which the Columbia River annually intrudes (McKinley et al in revision). As such, the degree to which βMNTD deviates from a null model expectation measures the degree to which community composition is limited by Selection on species’ ecological niches.

To quantify the degree to which βMNTD deviates from a null model expectation we used a randomization procedure that shuffled species names and abundances across the tips of the phylogeny. After shuffling, βMNTD was recalculated to provide a null value, and repeating the randomization 999 times provided a null distribution. The difference between observed βMNTD and the mean of the null distribution is measured in units of standard deviations (of the null distribution) and is referred to as the β-Nearest Taxon Index (βNTI). βNTI values less than -2 or greater than +2 indicate that observed βMNTD deviates by more than two standard deviations from the null model expectation.
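The randomization behind βNTI can be sketched in Python as follows. This is an illustrative sketch, not the authors' R code. βMNTD is not defined in this supplement, so the function `beta_mntd` assumes the abundance-weighted mean nearest taxon distance between two communities; the function names, the list-based inputs, and the use of the sample standard deviation are likewise assumptions.

```python
import random
import statistics


def beta_mntd(dist, abund_a, abund_b):
    """Abundance-weighted mean nearest taxon distance between two communities
    (assumed form). dist[i][j] is the phylogenetic distance between species
    i and j; abund_a and abund_b give per-species abundances."""
    def one_way(src, dst):
        total = sum(src)
        present = [j for j, n in enumerate(dst) if n > 0]
        return sum((n / total) * min(dist[i][j] for j in present)
                   for i, n in enumerate(src) if n > 0)
    return 0.5 * (one_way(abund_a, abund_b) + one_way(abund_b, abund_a))


def beta_nti(dist, abund_a, abund_b, reps=999, seed=0):
    """Standardized deviation of observed beta-MNTD from a null distribution
    built by shuffling species labels across the tips of the phylogeny."""
    rng = random.Random(seed)
    n = len(dist)
    observed = beta_mntd(dist, abund_a, abund_b)
    null = []
    for _ in range(reps):
        perm = list(range(n))
        rng.shuffle(perm)  # species i now sits on tip perm[i]
        shuffled = [[dist[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        null.append(beta_mntd(shuffled, abund_a, abund_b))
    sd = statistics.stdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - statistics.mean(null)) / sd
```

Shuffling labels once per replicate moves each species, together with its abundances in both communities, to a random tip, which is how the text describes the null model.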
For a given pairwise community comparison, βNTI < -2 or > +2 therefore indicates significantly less than or greater than expected phylogenetic turnover, respectively.

Our randomization procedure was chosen by considering the results of Hardy (Hardy 2008), which show that randomization outcomes are influenced by phylogenetic signal in species abundances. We find little to no evidence for phylogenetic signal in species abundances, using Mantel correlograms as above (Fig. S1). In this case Hardy (Hardy 2008) showed that our randomization procedure provides very close to an exact test for a range of phylogenetic turnover metrics. We therefore suggest that our specific null model provides robust statistical and ecological inferences.

Turnover in species composition

Most metrics of turnover in species composition provide no information on whether the observed degree of turnover is less than, greater than, or similar to the degree of turnover expected if community assembly were governed primarily by Drift. One exception is Raup-Crick (Chase et al 2011), which generates an expected degree of turnover using a randomization procedure where species are probabilistically drawn into each local community until empirically observed local richness is reached. In the randomization procedure the probability of drawing a given species from the meta-community species pool is proportional to the number of local sites occupied by that species (Chase et al 2011). In its current form Raup-Crick does not account for species’ relative abundances (Chase et al 2011), which can carry information useful for understanding community assembly processes (Anderson et al 2011). In order to take full advantage of this information we extend Raup-Crick to consider species’ relative abundances. Practically, this requires a minor addition to the procedure developed by Chase et al. (Chase et al 2011).
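The occupancy-weighted species draw summarized above, together with the abundance-weighted individual draw and Bray-Curtis comparison that this section goes on to describe, can be sketched in Python. This is a hedged illustration, not the authors' R code: all function names are hypothetical, and two details are assumptions. First, individuals are drawn until the community reaches its observed total count. Second, the probability is computed from null values below the observed value, the reading consistent with the stated interpretation (the probability that Drift yields less turnover than observed).

```python
import random


def bray_curtis(a, b):
    """Standard Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


def draw_species(rng, occupancy, richness):
    """Draw `richness` distinct species, each draw weighted by occupancy."""
    pool = [i for i, w in enumerate(occupancy) if w > 0]
    chosen = []
    for _ in range(richness):
        weights = [occupancy[i] for i in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))
    return chosen


def assemble(rng, occupancy, meta_abund, richness, n_individuals):
    """Randomly assemble one community: one individual per drawn species,
    then remaining individuals weighted by meta-community abundance."""
    species = draw_species(rng, occupancy, richness)
    counts = [0] * len(occupancy)
    for s in species:
        counts[s] = 1
    extra = n_individuals - richness  # assumption: match observed total
    if extra > 0:
        weights = [meta_abund[s] for s in species]
        for s in rng.choices(species, weights=weights, k=extra):
            counts[s] += 1
    return counts


def rc_bray(comm_a, comm_b, communities, reps=999, seed=0):
    """RCbray for one pair; `communities` is the full meta-community
    (it must include comm_a and comm_b)."""
    rng = random.Random(seed)
    n_sp = len(comm_a)
    occupancy = [sum(1 for c in communities if c[i] > 0) for i in range(n_sp)]
    meta_abund = [sum(c[i] for c in communities) for i in range(n_sp)]
    observed = bray_curtis(comm_a, comm_b)
    less = ties = 0
    for _ in range(reps):
        null_a = assemble(rng, occupancy, meta_abund,
                          sum(1 for x in comm_a if x > 0), sum(comm_a))
        null_b = assemble(rng, occupancy, meta_abund,
                          sum(1 for x in comm_b if x > 0), sum(comm_b))
        null = bray_curtis(null_a, null_b)
        if null < observed:
            less += 1
        elif null == observed:
            ties += 1
    prob = (less + 0.5 * ties) / reps
    return (prob - 0.5) * 2  # rescale from [0, 1] to [-1, +1]
```

Under this convention, values near +1 mean the observed pair is more dissimilar than almost all randomly assembled pairs, and values near -1 mean it is less dissimilar.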
In the randomization procedure we use their method (summarized above) to draw species into a given local community until the empirical species richness is reached. At that point each species in the randomly assembled community is represented by one individual. To model stochastic (Drift-based) recruitment, individuals are drawn into the community, but only into those species assembled in the previous step. The probability of drawing an individual into a given species is proportional to that species’ abundance in the meta-community. The randomization procedure therefore assumes no influence of Selection or Dispersal Limitation.

For a pair of communities being compared, both were randomly assembled following the above procedure. Compositional turnover between the communities was then quantified using standard Bray-Curtis dissimilarity, which accounts for species’ relative abundances. For each pair of communities the randomization was run 999 times, providing a null distribution of expected Bray-Curtis values. The empirically observed Bray-Curtis value for the pair of communities was compared to this null distribution following the procedure developed by Chase et al. (Chase et al 2011). Specifically, the number of randomizations in which the Bray-Curtis value between randomly assembled communities is less than the empirical Bray-Curtis value is added to half the number of ties; ties occur when the observed Bray-Curtis value equals the value based on randomly assembled communities. Dividing the resulting sum by the number of randomizations gives the probability that stochastic community assembly (Drift) results in less turnover than empirically observed. As in Chase et al. (Chase et al 2011), we standardize this probability to vary between -1 and +1 by subtracting 0.5 and then multiplying by 2, and refer to the resulting metric as RCbray. Similar to Chase et al.
(Chase et al 2011), we interpret RCbray values greater than +0.95 or less than -0.95 as indicating that Selection or Dispersal significantly influences turnover in community composition. In turn, RCbray values between -0.95 and +0.95 are consistent with a dominant role of Drift.

Combining spatial eigenvectors and measured environmental variables with model-selection

To provide the most complete description of spatial and environmental relationships among local communities we combined spatial eigenvector analysis with measured abiotic variables. The spatial eigenvectors describe spatial relationships among communities across a range of spatial scales; the first eigenvector breaks sampling locations into broadly distributed clusters and subsequent eigenvectors characterize spatial relationships at increasingly fine scales (Borcard & Legendre 2002, Borcard et al 2011, Heino et al 2011). To carry out spatial eigenvector analyses we used the R function ‘pcnm’ within the package ‘vegan’. The ‘pcnm’ function takes a (spatial) distance matrix as input. For analyses within the Ringold and Hanford formations we used the geographical locations (Eastings and Northings, Table S1) of each well to build the distance matrix. Within each formation spatial eigenvectors therefore describe spatial relationships in two dimensions. The resulting eigenvectors are referred to as ‘PCNM’ axes in Tables S2-S4. Note that this method was originally referred to as ‘principal coordinates of neighbor matrices’ (Borcard & Legendre 2002), but is now referred to as ‘Moran’s eigenvector maps’ (Borcard et al 2011). Eigenvectors were described using two-dimensional spatial relationships within formations because the horizontal distance separating communities was much larger than the vertical distance separating communities.
Communities retained for analyses within each formation are therefore distributed across an approximately two-dimensional space. For analyses across the full system (which includes both formations) we described spatial distances in three dimensions due to the larger vertical distances separating communities. These three-dimensional Euclidean distances were then used to define spatial eigenvectors. Note that spatial eigenvector analysis is robust in one, two or three dimensions (Borcard & Legendre 2002).

Spatial eigenvectors only describe spatial relationships among sampling locations. As such, some eigenvectors may describe the spatial scale(s) at which dispersal operates, while others may be more related to the spatial structure of environmental variables (Legendre et al 2009). In addition to spatial relationships, we measured four abiotic variables that may influence community composition. However, these measured variables may also simply describe spatial relationships among communities. For example, horizontal distance from the Columbia River may reflect spatial relationships or may reflect different environmental conditions related to spatially structured river water intrusion (McKinley et al in revision). In addition, measured abiotic variables may co-vary with each other and/or with spatial eigenvectors. To combine all variables and minimize co-variation we combined measured abiotic variables with spatial eigenvectors using principal components analysis (PCA). The resulting PCA axes (Tables S2-S4 provide loadings) were used as independent variables in a model-selection procedure with either βNTI or RCbray as the dependent variable.
Note that three separate sets of PCA axes were characterized: one for the Hanford formation, one for the Ringold formation, and one for the full system. As such, labels associated with Hanford formation PCA axes have no relationship to, for example, labels of Ringold formation axes.

To fit statistical models for βNTI and RCbray we used distance-based redundancy analysis (Legendre & Anderson 1999) (R function ‘capscale’ within package ‘vegan’) combined with a model-selection procedure. We used forward model-selection (Blanchet et al 2008) where the significance of independent variables (α = 0.05) was evaluated step-wise and the order of variable evaluation was based on improvement in the model’s adjusted R². Model-selection proceeded until the next independent variable was non-significant as determined by 1000 permutations (R function ‘ordiR2step’ within package ‘vegan’). Separate model-selection procedures were carried out for the Hanford, the Ringold and the full system, and βNTI and RCbray were further evaluated separately. Distance-based redundancy analysis takes positive, pairwise community distances as input such that βNTI and RCbray were each normalized to vary between 0 and 1 prior to model-selection; for each, the absolute magnitude of the minimum (negative) value was added to all values (making all values ≥ 0) and the resulting values were then divided by their maximum (so that all values lie between 0 and 1).

As discussed above, the magnitude of βNTI is governed by the influence of Selection relative to the influences of Dispersal and Drift. Any PCA axes that explain a significant fraction of variation in βNTI must therefore reflect one or more environmental variables that impose Selection. This is true even if a significant PCA axis is unrelated to measured abiotic variables; the degree to which PCA axes are related to measured abiotic variables was evaluated by examining PCA axis loadings (Tables S2-S4).
If a given PCA axis is significant for βNTI but measured abiotic variables do not load onto it, we consider this PCA axis to be an unmeasured, yet influential and spatially structured environmental variable. If measured abiotic variables load heavily onto a significant PCA axis, we consider the axis to be a measured, influential environmental variable. Furthermore, all PCA axes non-significant for βNTI were considered to primarily characterize spatial relationships among communities. This is true even if measured abiotic variables load heavily; measuring a given abiotic variable does not necessarily indicate that variable imposes Selection. Prior to RCbray model-selection we used the βNTI model-selection results to characterize each PCA axis as an unmeasured environmental, a measured environmental, or a spatial variable. Following RCbray model-selection, these variable designations were used (in conjunction with PCA loadings) to interpret the factors imposing Selection or Dispersal Limitation.

Comparison of inferences to those gleaned from pre-existing approaches

We compared the insights derived from our novel analytical framework to those derived from a pre-existing approach (similar to e.g., Heino et al 2011, Legendre et al 2009). To achieve a direct comparison to our approach we used the same PCA axes with the same model-selection procedure, but with Bray-Curtis dissimilarity as the dependent variable. However, in the ‘pre-existing approach’ one must rely on PCA loadings to identify PCA axes as environmental or spatial; PCA axes with heavy loadings from measured abiotic variables are considered environmental and all others are considered spatial. The key assumption is that any influence of measured abiotic variables is through Selection (i.e., measured variables do not impose Dispersal Limitation). As noted above, this is not necessarily the case.
In fact, which abiotic variables impose Selection and which impose Dispersal Limitation is an empirical question that requires an answer informed by the ecology of the system rather than PCA axis loadings (which by themselves carry no ecological information).

A large number of studies have combined model-selection with variation partitioning (Legendre & Legendre 1998) to infer influences of community assembly processes (e.g., Legendre et al 2009). Variation partitioning has, however, recently been shown to provide invalid inferences (Gilbert & Bennett 2010, Smith & Lundholm 2010), especially with respect to how much stronger one process is than another (Stegen & Hurlbert 2011). As such, we do not use variation partitioning. Instead, if environmental or spatial variables explain a non-zero fraction of variation in Bray-Curtis, we consider this evidence for some (non-zero) influence of Selection or Dispersal Limitation, respectively. How influential each process is, however, cannot be estimated, nor can the influence of Drift (Anderson et al 2011, Legendre et al 2009). Model-selection results for Bray-Curtis are provided in Table 1 and a comparison of inferences drawn using our approach versus the standard approach is provided in Figure 4.

Supplemental Figures

Figure S1. Phylogenetic Mantel correlogram showing generally non-significant phylogenetic signal for species abundances. Solid and open symbols denote significant and non-significant correlations, respectively, relating between-species abundance differences to between-species phylogenetic distances at a given phylogenetic distance lag. Abundance for each species was found across the entire meta-community (both formations, all communities) after rarefaction (to 500 sequences per community).
Significant negative correlations indicate that closely related species have different abundances, while more distantly related species have more similar abundances. Importantly, there are no significant positive correlations; abundance is not phylogenetically ‘conserved’ (sensu Losos 2008). Lack of phylogenetic conservatism in abundance allows for robust statistical performance of our specific randomization used in conjunction with phylogenetic turnover (Hardy 2008). Further, there are only two phylogenetic distances across which there is a significant negative correlation. This rather weak pattern of ‘species abundance phylogenetic overdispersion’ (sensu Hardy 2008) is also consistent with robust statistical performance of our specific randomization procedure (Hardy 2008).

Supplemental Tables

Table S1. Metadata for sampled communities. Well ID can be associated with well names provided in Bjornstad et al. (Bjornstad et al 2009). Communities at the Hanford-Ringold contact are designated as such. Core Elevation is the elevation at which a given community was sampled. Ringold Elevation is the elevation at the top of the Ringold formation. The water table has a minimum elevation near 104.3 m (Bjornstad et al 2009).

Table S2. Loadings of measured environmental variables and spatial eigenvectors (‘PCNM’) on PCA axes within the Ringold formation. Any loadings weaker than 0.001 are listed as null. A standard approach to identify PCA axes as either environmental or spatial is based on the loadings of measured abiotic variables. In this case, PCA axes with heavy loadings from measured abiotic variables are considered environmental (grey fill across the ‘Loading-Based Designation’ row). All other PCA axes were considered spatial (black fill).
The approach we employ uses model-selection for βNTI; PCA axes retained during model-selection for βNTI reflect environmental variables, whether or not measured abiotic variables load on them. If measured abiotic variables do not load onto a retained PCA axis, it is considered an unmeasured environmental variable. PCA7 is such a variable within the Ringold; PCA7 was the only axis retained during the βNTI model-selection procedure (Table 1), yet no measured abiotic variables load onto PCA7. This indicates that Selection in the Ringold is imposed, in part, by an unmeasured environmental variable with an estimated spatial structure depicted in Fig. 5.

Table S3. As for Table S2, but within the Hanford formation. Within the Hanford, PCA1 and PCA3 were retained during model-selection for βNTI (Table 1). Those axes are therefore considered environmental variables that impose Selection, while all other PCA axes were considered to primarily reflect spatial relationships among communities (as opposed to environmental differences among communities). Distance-to-the-river and elevation (of communities) are the measured variables loading most heavily on PCA1 and PCA3, respectively. RCbray model-selection retained PCA axes 1, 3 and 7 (Table 1). No measured variables load onto PCA7. Note that PCA axes are specific to the formation being studied; PCA7 for the Hanford is unrelated to PCA7 for the Ringold. The loading patterns suggest that Selection results from vertically and horizontally structured environmental variables, while Dispersal Limitation results from isolation of communities due to spatially structured, but unmeasured aspects of the system, such as complex hydrologic flow paths.

Table S4. As for Table S2, but across the Hanford and Ringold formations (the ‘full’ system).
Model-selection for βNTI across the full system retained PCA axes 1 and 32 (Table 1), and measured abiotic variables load heavily on both. Elevation (of communities) was the measured variable loading most heavily on both PCA1 and PCA32, although the percent-mud loading on PCA1 was nearly as strong (0.576 vs. -0.554). PCA1 and PCA32 were therefore considered measured environmental variables. PCA19 was also selected, but no measured abiotic variables loaded on this axis, suggesting that PCA19 reflects an unmeasured environmental variable that imposes Selection. All other PCA axes were considered to primarily reflect spatial relationships. These patterns suggest that unmeasured factors and factors associated with elevation, such as percent mud, impose Selection across the full system. Model-selection for RCbray also retained PCA1 and PCA32, but retained 11 other axes as well (Table 1). These additional 11 axes were identified as spatial variables by the βNTI model-selection, and are therefore considered to primarily reflect factors that impose Dispersal Limitation. Given the large number of selected variables and their spatial complexity, we hypothesize that the degree of organismal exchange among local communities is governed by spatially complex hydrologic flow paths.

Supplemental References

Anderson MJ, Crist TO, Chase JM, Vellend M, Inouye BD, Freestone AL et al (2011). Navigating the multiple meanings of β diversity: a roadmap for the practicing ecologist. Ecol Lett 14: 19-28.
Andersson AF, Riemann L, Bertilsson S (2010). Pyrosequencing reveals contrasting seasonal dynamics of taxa within Baltic Sea bacterioplankton communities. ISME 4: 171-181.
Bjornstad BN, Horner JA, Vermeul VR, Lanigan DC, Thorne PD (2009). Borehole completion and conceptual hydrogeologic model for the IFRC Well Field, 300 Area, Hanford Site. PNNL-18340, Pacific Northwest National Laboratory, Richland, WA.
Blanchet FG, Legendre P, Borcard D (2008). Forward selection of explanatory variables. Ecology 89: 2623-2632.
Borcard D, Legendre P (2002). All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecol Model 153: 51-68.
Borcard D, Gillet F, Legendre P (2011). Numerical Ecology with R. Springer: New York, NY.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK et al (2010). QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335-336.
Chase JM, Kraft NJB, Smith KG, Vellend M, Inouye BD (2011). Using null models to disentangle variation in community dissimilarity from variation in α-diversity. Ecosphere 2: art24.
Diniz-Filho JAF, Terribile LC, da Cruz MJR, Vieira LCG (2010). Hidden patterns of phylogenetic nonstationarity overwhelm comparative analyses of niche conservatism and divergence. Global Ecol Biogeogr 19: 916-926.
Folk RL (1980). Petrology of Sedimentary Rocks. Hemphill Publishing Co.: Austin, Texas.
Gee GW, Or D (2002). Particle-size analysis. In: Dane JH, Topp GC (eds). Methods of Soil Analysis, Part 4. Physical Methods. Soil Science Society of America: Madison, Wisconsin. pp 255-293.
Gilbert B, Bennett JR (2010). Partitioning variation in ecological communities: do the numbers add up? Journal of Applied Ecology 47: 1071-1082.
Hardy OJ (2008). Testing the spatial phylogenetic structure of local communities: statistical performances of different null models and test statistics on a locally neutral community. J Ecol 96: 914-926.
Heino J, Grönroos M, Soininen J, Virtanen R, Muotka T (2011). Context dependency and metacommunity structuring in boreal headwater streams. Oikos: no-no.
Huse SM, Huber JA, Morrison HG, Sogin ML, Mark Welch D (2007). Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8.
Legendre P, Legendre L (1998). Numerical Ecology. Elsevier Science: Amsterdam, The Netherlands.
Legendre P, Anderson MJ (1999). Distance-based redundancy analysis: Testing multispecies responses in multifactorial ecological experiments. Ecol Monogr 69: 1-24.
Legendre P, Mi XC, Ren HB, Ma KP, Yu MJ, Sun IF et al (2009). Partitioning beta diversity in a subtropical broad-leaved forest of China. Ecology 90: 663-674.
Li WZ, Godzik A (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658-1659.
Lin X, Kennedy D, Peacock A, McKinley J, Resch CT, Fredrickson J et al (2012a). Distribution of Microbial Biomass and Potential for Anaerobic Respiration in Hanford Site 300 Area Subsurface Sediment. Appl Environ Microb 78: 759-767.
Lin X, Mckinley J, Resch CT, Lauber C, Fredrickson J, Konopka AE (2012b). Spatial and temporal dynamics of microbial community in the Hanford unconfined aquifer. ISME 6: 1665-1676.
Losos JB (2008). Phylogenetic niche conservatism, phylogenetic signal and the relationship between phylogenetic relatedness and ecological similarity among species. Ecol Lett 11: 995-1003.
McKinley JP, Zachara JM, Resch CT, Kaluzny RM, Miller MD, Vermeul VR et al (in revision). River water intrusion and contaminant uranium contributions from the vadose zone to groundwater during the annual Spring rise in Columbia River stage at the Hanford Site, Washington.
Pei NC, Lian JY, Erickson DL, Swenson NG, Kress WJ, Ye WH et al (2011). Exploring Tree-Habitat Associations in a Chinese Subtropical Forest Plot Using a Molecular Phylogeny Generated from DNA Barcode Loci. Plos One 6.
Price MN, Dehal PS, Arkin AP (2009). FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix. Mol Biol Evol 26: 1641-1650.
Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig WG, Peplies J et al (2007). SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188-7196.
Smith TW, Lundholm JT (2010). Variation partitioning as a tool to distinguish between niche and neutral processes. Ecography 33: 648-655.
Stegen JC, Hurlbert AH (2011). Inferring Ecological Processes from Taxonomic, Phylogenetic and Functional Trait β-Diversity. Plos One 6: e20906.
Stegen JC, Lin X, Konopka AE, Fredrickson JK (2012). Stochastic and deterministic assembly processes in subsurface microbial communities. ISME 6: 1653-1664.