Detailed Material and Methods

Data retrieval.
We analyzed 16S rDNA sequences from the 0.1-0.8 µm size fraction for each of forty-five seawater samples collected on the Sorcerer II as part of the Global Ocean Sampling expedition[1]. As described in that study, the samples were collected between May 2003 and March 2004. Detailed procedures for DNA sequencing are available elsewhere[1, 2]. We collected a total of 4,125 16S rRNA gene sequences with corresponding environmental data (25 samples) from the CAMERA website[3]. A subset of 3,228 sequences remained after discarding those from poorly sampled localities or from a different study[2], e.g., "GS00a" (Table S1). We acknowledge that environmental shotgun sequencing only discloses the most abundant phylotypes of local surface communities, which are most likely involved in the main biogeochemical processes at sampling time[4]. Hence, unless they belong to known taxonomic lineages, the rare members of the marine microbial biosphere may not follow a similar pattern of phylogenetic turnover (PT).

DNA alignment and phylogenetic assessment.
To reduce alignment errors due to implausible insertion-deletion event histories, all 16S rDNA sequences were aligned using the PRANK software[5]. A maximum likelihood (ML) tree was then inferred from 1,285 nucleotide sites using RAxML[6] under a GTR + Gamma + Invariable model of sequence evolution. A three-step quality control was then conducted: (i) sequences generating excessively long branches (>0.5 substitutions per site) were removed from subsequent analyses; (ii) sites exhibiting more than 75% gaps or missing data were discarded; and (iii) overly fragmentary sequences (covering less than 10% of the final alignment length) were also eliminated. The final alignment (3,228 sequences and 1,285 sites) was then subjected to a new RAxML analysis followed by a PAUP[7] refinement based on subtree pruning and regrafting (SPR) branch swapping.
The resulting ML phylogram was rendered ultrametric by the non-parametric rate smoothing procedure of R8S[8]. Patristic distances were then computed and trees drawn using the APE package for R[9].

Taxonomic assignment.
The taxonomy of each 16S rDNA sequence was inferred using a local BLAST[10] search against the SILVA database version 100 from August 2009[11], which contained nearly 1,200,000 SSU/LSU sequences. The 100 best BLAST hits were then processed using a local Perl script to parse out relevant taxonomic information, and a 2/3 majority consensus was used to infer taxonomy. Relevant subgroups (i.e., Alphaproteobacteria and Gammaproteobacteria) were then selected from the overall dataset. Because OTUs belonging to other taxonomic groups were often scarce, we could not disclose their patterns of PT.

Distance matrices.
Geographic distances between pairs of samples were calculated in R[12] from latitudinal and longitudinal coordinates. Phylogenetic distances were derived from the tree made ultrametric by the non-parametric rate smoothing procedure of R8S[8], using the picante package for R. Then, the amount of phylogenetic turnover between communities was calculated using the Phylosor index[13], which quantifies the fraction of branch length that is unique to (not shared between) each of the two microbial communities. The environmental distance matrix was computed using the Gower distance implemented in the cluster package for R[12].

Disentangling geographic versus environmental influences on phylogenetic turnover.
We analyzed the respective effects of geography and environment on phylogenetic turnover between all pairs of microbial communities using multiple regressions on distance matrices (MRM; see[14, 15] for details).
In brief, MRM is an extension of partial Mantel analysis that is used to investigate relationships between a multivariate response distance matrix and any number of explanatory distance matrices[15]. We implemented additional partial multiple regressions on distance matrices to estimate the "pure" effect of each explanatory matrix[16]. The significance of regression coefficients was tested using 9,999 permutations. Analyses were performed using the "ecodist" library[17] implemented in R[12].

References

1 Rusch, D. B., Halpern, A. L., Sutton, G., Heidelberg, K. B., Williamson, S., et al. 2007 The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLOS Biol. 5, 0398.
2 Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., et al. 2004 Environmental genome shotgun sequencing of the Sargasso Sea. Science. 304, 66-74.
3 Seshadri, R., Kravitz, S., Smarr, L., Gilna, P., Frazier, M. 2007 CAMERA: A Community Resource for Metagenomics. PLOS Biol. 5, e75 doi:10.1371/journal.pbio.0050075.
4 Pedrós-Alió, C. 2006 Marine microbial diversity: can it be determined? Trends Microbiol. 14, 257-263.
5 Loytynoja, A., Goldman, N. 2008 Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis. Science. 320, 1632-1635.
6 Stamatakis, A. 2006 RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22, 2688-2690.
7 Swofford, D. L. 2002 PAUP*: Phylogenetic Analysis Using Parsimony (*And Other Methods). 4 ed. Sunderland, Massachusetts: Sinauer Associates.
8 Sanderson, M. J. 2003 r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 19, 301-302.
9 Paradis, E., Claude, J., Strimmer, K. 2004 APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics. 20, 289-290.
10 Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., et al. 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
11 Pruesse, E., Quast, C., Knittel, K., Fuchs, B., Ludwig, W., et al. 2007 SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35, 7188-7196.
12 R Development Core Team. 2008 A language and environment for statistical computing.
13 Bryant, J. A., Lamanna, C., Morlon, H., Kerkhoff, A. J., Enquist, B. J., et al. 2008 Microbes on mountainsides: contrasting elevational patterns of bacterial and plant diversity. Proc. Natl. Acad. Sci. U. S. A. 105 Suppl 1, 11505-11511.
14 Manly, B. F. J. 1986 Randomization and regression methods for testing for associations with geographical, environmental and biological distances between populations. Researches on population ecology. 28, 201-218.
15 Lichstein, J. W. 2007 Multiple regression on distance matrices: a multivariate spatial analysis tool. Plant ecology. 188, 117-131.
16 Legendre, P., Legendre, L. 1998 Numerical Ecology. Elsevier Science Publ. Co.
17 Goslee, S. C., Urban, D. L. 2007 The ecodist package for dissimilarity-based analysis of ecological data. Journal of statistical software. 22, 1-19.

Table S1. Samples considered for the analyses

Sample  Latitude    Longitude    Sample date  Sample depth (m)  Chlorophyll density  Salinity  Temperature (°C)
GS01a   32°10'00"N  64°30'00"W   15 May 03    5                 0.1                  36.7      22.9
GS01c   32°10'00"N  64°30'00"W   15 May 03    5                 0.1                  36.7      22.9
GS02    42°30'11"N  67°14'24"W   21 Aug. 03   1                 1.4                  29.2      18.2
GS03    42°51'10"N  66°13'2"W    21 Aug. 03   1                 1.4                  29.9      11.7
GS04    44°8'14"N   63°38'40"W   22 Aug. 03   2                 0.4                  28.3      17.3
GS05    44°41'25"N  63°38'14"W   23 Aug. 03   1                 6                    30.2      15
GS07    43°37'56"N  66°50'50"W   25 Aug. 03   1                 1.4                  31.7      17.9
GS08    41°29'9"N   71°21'4"W    17 Nov. 03   1                 2.2                  26.5      9.4
GS09    41°5'28"N   71°36'8"W    17 Nov. 03   1                 4                    31        11
GS10    38°56'24"N  74°41'6"W    18 Nov. 03   1                 2                    31        12
GS12    38°56'49"N  76°25'2"W    18 Dec. 03   13.2              21                   3.5       1
GS15    24°29'18"N  83°4'12"W    8 Jan. 04    1.7               0.2                  36        25
GS16    24°10'29"N  84°20'40"W   8 Jan. 04    2                 0.16                 35.8      26.4
GS17    20°31'21"N  85°24'49"W   9 Jan. 04    2                 0.13                 35.8      27
GS18    18°2'12"N   83°47'5"W    10 Jan. 04   1.7               0.14                 35.4      27.4
GS19    10°42'59"N  80°15'16"W   12 Jan. 04   1.7               0.23                 35.4      27.7
GS21    8°7'45"N    79°41'28"W   20 Jan. 04   1.6               0.5                  30.7      27.6
GS22    6°29'34"N   82°54'14"W   21 Jan. 04   2                 0.33                 32.3      29.3
GS23    5°38'24"N   86°33'55"W   22 Jan. 04   2                 0.07                 32.6      28.7
GS26    1°15'51"N   90°17'42"W   2 Feb. 04    2                 0.22                 32.6      27.8
GS27    1°12'58"S   90°25'22"W   4 Feb. 04    2.2               0.4                  34.9      25.5
GS29    0°12'0"S    90°50'7"W    9 Feb. 04    2.1               0.4                  34.5      26.2
GS35    1°23'21"N   91°49'1"W    3 Feb. 04    1.7               0.28                 34.5      21.8
GS36    0°1'15"S    91°11'52"W   2 Mar. 04    2.1               0.65                 34.6      25.8
GS47    10°7'53"S   135°26'58"W  29 Mar. 04   30                0.12                 37.3      28.6
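
The geographic distance matrix described under "Distance matrices" was computed in R from coordinates such as those in Table S1, but the exact distance formula is not stated. The following is a minimal sketch only, assuming great-circle (haversine) distances on a spherical Earth of radius 6,371 km. Python is used purely for illustration, and the names `dms_to_decimal`, `haversine_km` and `pairwise_distances` are hypothetical helpers, not part of the original analysis. Only three samples from Table S1 are included to keep the example short.

```python
import math
import re
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model

# Three rows copied from Table S1 (sample, latitude, longitude).
TABLE_S1_EXCERPT = """
GS02 42°30'11"N 67°14'24"W
GS15 24°29'18"N 83°4'12"W
GS27 1°12'58"S 90°25'22"W
"""


def dms_to_decimal(dms: str) -> float:
    """Convert a degrees-minutes-seconds string to signed decimal degrees."""
    match = re.fullmatch(r"""(\d+)°(\d+)'(\d+)"([NSEW])""", dms.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {dms!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
    # South and West are negative by convention.
    return -value if hemisphere in "SW" else value


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairwise_distances(coords: dict) -> dict:
    """Return {(sample_a, sample_b): distance_km} for every unordered pair."""
    return {
        (a, b): haversine_km(*coords[a], *coords[b])
        for a, b in combinations(sorted(coords), 2)
    }


# Parse the excerpt into {sample: (latitude, longitude)} in decimal degrees.
coords = {}
for line in TABLE_S1_EXCERPT.splitlines():
    if line.strip():
        name, lat, lon = line.split()
        coords[name] = (dms_to_decimal(lat), dms_to_decimal(lon))

distances = pairwise_distances(coords)
for (a, b), km in distances.items():
    print(f"{a} - {b}: {km:.0f} km")
```

The resulting dictionary holds one distance per pair of samples and could be reshaped into a square matrix for use alongside the phylogenetic and environmental distance matrices. A spherical model is a simplification; an ellipsoidal formula would give slightly different values.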