Supplementary Information

SI Material and Methods

Statistical analyses

The Mantel test determines the significance of the correlation between dissimilarity matrices. The Procrustes test determines whether two ordinations are significantly correlated (Peres-Neto and Jackson, 2001). It is based on symmetric Procrustes rotation, which uses the reduced (ordination) spaces rather than the original, complete dissimilarity matrices used in the Mantel test (Legendre and Legendre, 1998; Ramette, 2007).

The ARISA and 454 MPTS tables were standardized by Hellinger transformation to make them more suitable for multivariate linear analyses (Legendre and Gallagher, 2001). Environmental parameters were log- or square-root-transformed when necessary to normalize their distribution (Ramette, 2007). Environmental data tables were standardized (z-score transformation; x scaled to zero mean and unit variance) prior to analyses to remove the influence of magnitude differences between scales and units.

Variation partitioning and path analysis both test for effects of specific parameters while controlling for the effects of other parameters. In variation partitioning, the model is defined a priori so as to explain the variation in the response table as a function of the explanatory sets of environmental parameters, whereas path analysis determines the most likely causal modelling scenarios between several variables or categories of variables. In that respect, path analysis enables various competing causal models to be assessed for their goodness-of-fit with the data at hand (Legendre and Legendre, 1998). In our study, it enabled a joint analysis of both community structure and enzyme activity (both potential response variables) as a function of their environmental context in the same model.
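As an illustration of the two standardizations described in the statistical analyses above, the following minimal Python sketch applies a Hellinger transformation to a small abundance table and a z-score transformation to one environmental variable. It is not the code used in the study. The function names, the example values, and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration only.

```python
import math


def hellinger(table):
    """Hellinger-transform a samples x OTUs abundance table (list of rows).

    Each value is divided by its row (sample) total and then square-rooted.
    """
    transformed = []
    for row in table:
        total = sum(row)
        if total == 0:
            # An empty sample carries no relative abundances; keep zeros.
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(x / total) for x in row])
    return transformed


def zscore(values):
    """Scale one environmental variable to zero mean and unit variance.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    The variable must not be constant, otherwise the division fails.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical example data: two samples, three OTUs.
counts = [[4, 0, 12], [1, 1, 2]]
print(hellinger(counts))

# Hypothetical environmental variable measured in three samples.
print(zscore([2.0, 4.0, 6.0]))
```

After the Hellinger transformation, the squared values of each non-empty sample sum to one, so that Euclidean distances between transformed samples correspond to Hellinger distances between the original profiles.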
Taxonomy based on 454 MPTS

454 massively parallel tag sequencing (454 MPTS) is a cost-effective, high-throughput technique that allows a much higher sequencing depth (sampling effort) than traditional sequencing of the full-length 16S rRNA gene (Margulies et al, 2005; Sogin et al, 2006). Taxonomic assignments of V6-hypervariable region tags were obtained through comparisons with a reference database of rRNA sequences using the Global Alignment for Sequence Taxonomy tool (Sogin et al, 2006; Huse et al, 2008). The retrieved taxonomy was shown to be highly consistent with results based on full-length rRNA sequences (Huse et al, 2008).

Technical considerations

Recent studies using 454 MPTS on Arctic water samples have explored the ecology of rare tags in 454 MPTS datasets (Galand et al, 2009; Kirchman et al, 2010), while other studies have critically discussed the existence of the rare biosphere and suggested that diversity estimates may be inflated by sequencing errors in massively parallel tag sequencing approaches (Kunin et al, 2009; Quince et al, 2009). In our study we neither focused our analysis on rare tags nor emphasized richness estimates in comparison with other studies, but rather explored relative differences between samples (Reeder and Knight, 2010). To test the impact of rare tag sequences on ecological interpretations of 454 MPTS datasets, Gobet et al (2010) compared datasets with and without removal of pyrosequencing noise by the PyroNoise tool (Quince et al, 2009) and demonstrated that the observed variation in profiles was mostly due to non-technical fluctuations in the data, i.e. to real structural and ecological characteristics of the studied datasets. This result and the consistency of our 454 MPTS data with the ARISA data make us confident that the patterns we describe reflect true ecological variations between communities and not sequencing artifacts.
SI Results

ARISA and 454 MPTS datasets

A total of 42 samples covering three sediment horizons (0-1 cm, 1-2 cm, 4-5 cm) were analyzed by the molecular community fingerprinting technique ARISA, and 106-230 OTUs were obtained for each sample after binning (binning was done to account for technical imprecision in the OTU definition). A subset of ten samples was selected for 454 MPTS. The total number of sequences in this dataset was 225,744; the number of sequences per sample ranged from 7,613 to 45,891, with 1,275 to 4,844 unique OTUs at 97% sequence similarity. The proportion of singletons, i.e. sequences that occurred only once in the study, was 65% relative to the number of OTUs defined as unique sequences and 11% relative to the total number of sequences in the dataset. Sequences are deposited in the GenBank Sequence Read Archive (www.ncbi.nlm.nih.gov) under submission number SRA046414.1, with experiment accession numbers SRX099942.3, SRX099943.3, SRX099944.3, SRX099945.3, SRX099946.3, SRX099947.3, SRX099948.2, SRX100175.1, SRX100176.1 and SRX100177.1.

Comparability between ARISA and 454 MPTS

To demonstrate comparability of the ecological patterns extracted with ARISA and 454 MPTS, a Mantel test with Spearman correlation was used to compare dissimilarity matrices. Correlations were highly significant for comparisons at all taxonomic levels, legitimizing a combination of the two techniques for the interpretation of bacterial ecological patterns in our study (Table S1). To keep analyses consistent across taxonomic levels, we used a subset of the 454 MPTS dataset for further analysis, in which only sequences with a complete assignment down to genus level were retained. A high Spearman correlation between the dissimilarity matrices of the reduced dataset (20% of the original) and the original dataset confirmed that ecological patterns were consistent in both datasets (Table S1).
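The kind of comparison described above, a Mantel test with Spearman correlation between two dissimilarity matrices, can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the implementation used in the study: the function names, the one-sided permutation scheme, the number of permutations, and the toy matrices are all hypothetical.

```python
import math
import random


def ranks(values):
    """Average ranks (tied values share the mean rank), as used by Spearman's rho."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lower_triangle(matrix, order):
    """Lower-triangle entries after reordering rows and columns by `order`."""
    return [matrix[order[i]][order[j]] for i in range(len(order)) for j in range(i)]


def mantel_spearman(d1, d2, permutations=999, seed=0):
    """Return (Spearman Mantel statistic, one-sided permutation p-value).

    Assumption: significance is assessed by jointly permuting the rows and
    columns of d1 and counting statistics at least as large as the observed one.
    """
    identity = list(range(len(d1)))
    y_ranks = ranks(lower_triangle(d2, identity))
    observed = pearson(ranks(lower_triangle(d1, identity)), y_ranks)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(ranks(lower_triangle(d1, perm)), y_ranks) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


def distance_matrix(positions):
    """Toy dissimilarity matrix: absolute differences between 1-D positions."""
    return [[abs(a - b) for b in positions] for a in positions]


# Hypothetical example: two dissimilarity matrices for the same five samples.
d_a = distance_matrix([0, 1, 3, 6, 10])
d_b = distance_matrix([0, 2, 3, 7, 12])
print(mantel_spearman(d_a, d_b))
```

Rows and columns are permuted together because the entries of a dissimilarity matrix are not independent observations; shuffling sample labels preserves the internal structure of each matrix while breaking the correspondence between the two.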
SI References

Galand PE, Casamayor EO, Kirchman DL, Lovejoy C. (2009). Ecology of the rare microbial biosphere of the Arctic Ocean. Proc Natl Acad Sci U S A 106:22427-22432.
Gobet A, Quince C, Ramette A. (2010). Multivariate cutoff level analysis (MultiCoLA) of large community data sets. Nucleic Acids Res 38:e155.
Huse SM, Dethlefsen L, Huber JA, Welch DM, Relman DA, Sogin ML. (2008). Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. PLoS Genet 4:e1000255.
Jakobsson M, Macnab R, Mayer L, Anderson R, Edwards M, Hatzky J et al. (2008). An improved bathymetric portrayal of the Arctic Ocean: Implications for ocean modeling and geological, geophysical and oceanographic analyses. Geophys Res Lett 35:L07602, 5pp.
Kirchman DL, Cottrell MT, Lovejoy C. (2010). The structure of bacterial communities in the western Arctic Ocean as revealed by pyrosequencing of 16S rRNA genes. Environ Microbiol 12:1132-1143.
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. (2009). Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118-123.
Legendre P, Legendre L. (1998). Numerical Ecology. (Elsevier Science, Amsterdam). 853pp.
Legendre P, Gallagher ED. (2001). Ecologically meaningful transformations for ordination of species data. Oecologia 129:271-280.
Margulies M et al. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-380.
Peres-Neto PR, Jackson DA. (2001). How well do multivariate data sets match? The advantage of a Procrustean superimposition approach over the Mantel test. Oecologia 129:169-178.
Quince C et al. (2009). Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6:639-641.
Ramette A. (2007). Multivariate analyses in microbial ecology. FEMS Microbiol Ecol 62:142-160.
Reeder J, Knight R. (2010). Rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions. Nat Methods 7:668-669.
Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR et al. (2006). Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proc Natl Acad Sci U S A 103:12115-12120.