Supplementary material

Supplementary results and discussion

Testing of prophage identification tools. Although many software packages have been developed for prophage identification (e.g. Phage_Finder (Fouts, 2006), Prophinder (Lima-Mendez et al., 2008) and PhiSpy (Akhter et al., 2012)), for the detection of pathogenicity islands (e.g. PIPS (Soares et al., 2012)) and for the detection of horizontal gene transfer events (e.g. Alien Hunter (Vernikos & Parkhill, 2006)), they are poorly suited to analyzing novel viruses in uncultured microorganisms, because they rely mostly on GenBank sequences obtained primarily from viral isolates. We compared the efficiency of our viral sequence detection method with that of two of these tools, PhiSpy (Akhter et al., 2012) and Prophinder (Lima-Mendez et al., 2008). PhiSpy found viral sequences in only five SAGs (Roseobacter AAA076-E06, Bacteroidetes AAA160-P02, and Verrucomicrobia AAA164-A21, AAA164-M04 and AAA164-P11), and Prophinder found none, whereas our analysis detected viral sequences in 20 SAGs. This is not surprising, since PhiSpy and Prophinder were designed to identify prophages in complete genomes and appear less effective when applied to the incomplete and fragmented genome assemblies that are typical of SAGs. Moreover, our metagenomic fragment recruitment and tetramer frequency criteria make the viral search less reliant on the existing, limited databases of well-characterized viral genomes.

References

Akhter S, Aziz RK, Edwards RA. (2012). PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res 40:e126.

Brussaard C. (2004). Optimization of procedures for counting viruses by flow cytometry. Appl Environ Microbiol 70.

Fouts DE. (2006). Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res 34:5839–5851.

Lima-Mendez G, Van Helden J, Toussaint A, Leplae R. (2008). Prophinder: a computational tool for prophage prediction in prokaryotic genomes. Bioinformatics 24:863–865.

Sieracki ME, Poulton NJ, Crosbie N. (2005). Automated isolation techniques for microalgae. In: Andersen RA (ed). Algal culturing techniques. Elsevier Academic Press: New York, pp. 101–116.

Soares SC, Abreu VAC, Ramos RTJ, Cerdeira L, Silva A, Baumbach J, et al. (2012). PIPS: pathogenicity island prediction software. PLoS One 7:e30848.

Suttle CA. (2005). Viruses in the sea. Nature 437:356–361.

Vernikos GS, Parkhill J. (2006). Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands. Bioinformatics 22:2196–2203.

Supplementary Figure 1: A) Alignments of Roseobacter phage AAA300-J04 and Verrucomicrobia phage AAA164-B23 to the isolate Cellulophaga phage 3:2. B) Alignment of Verrucomicrobia phage AAA164-M04 to Cellulophaga phage 40:1. C) Alignment of Pelagibacter phage HTVC010P to Verrucomicrobia phage AAA168-E21. Each arrow represents a gene; red arrows represent structural genes. Color scale indicates amino acid identity.

Supplementary Figure 2: Alignments of the cyanophage genomes P-SSM2 (infecting Prochlorococcus) and S-SKS1 (infecting Synechococcus) with the Roseobacter phages AAA076-E06 and AAA160-J18. Each arrow represents a gene; genes highlighted in red encode a tail protein that could be associated with host recognition. Color scale indicates amino acid identity.

Supplementary Figure 3: Alignments of Verrucomicrobia virus AAA164-N20 with similar Phycodnaviridae viruses infecting prasinophytes. Each arrow represents a gene.
Color scale indicates amino acid identity.

Supplementary Table 1: Results of SAG de novo assemblies, performed using a combination of Velvet and Allpaths (VA) or SPAdes (S), with (+k) or without k-mer pre-normalization, and co-assembly with PacBio data (+PB). For each SAG, the assemblies with the largest N50 value are marked with an asterisk (*).

             Assembly    Contig     Largest     Assembly        N50
SAG          type         count contig (bp)    size (bp)       (bp)
AAA076-E06   VA              15      34,374      213,430     25,679
             S+k              7      96,895      210,055     55,325*
             S                9      88,313      217,375     48,877
             S+k+PB           8      96,895      212,799     55,325*
             S+PB            10      88,313      220,394     48,877
AAA160-C11   VA              29     190,175      931,699     51,612
             S+k             19     367,558      949,786    187,353*
             S               22     456,898      981,557    126,510
             S+k+PB          19     367,558      949,786    187,353*
             S+PB            22     456,898      981,557    126,510
AAA160-J20   VA              34      97,772      496,591     26,603
             S+k             30     123,152      532,593     28,562
             S               39     107,683      594,717     32,621*
             S+k+PB          30     123,152      532,593     28,562
             S+PB            39     107,683      594,717     32,621*
AAA164-A08   VA               4      59,851       82,090     59,851*
             S+k              5      51,013       85,975     51,013
             S                6      43,343       81,237     43,343
             S+k+PB           5      51,013       85,975     51,013
             S+PB             7      43,343       85,100     43,343
AAA164-A21   VA              94      79,991      845,225     13,668
             S+k             87     103,650      996,353     24,729
             S              123     104,071    1,300,161     24,863*
             S+k+PB          87     103,650      996,353     24,729
             S+PB           123     104,071    1,300,161     24,863*
AAA164-B23   VA               7      51,439       98,281     51,439*
             S+k              8      53,860      113,061     15,234
             S               32      34,561      183,572      9,499
             S+k+PB           9      53,860      115,096     15,234
             S+PB            34      34,561      188,948      8,191
AAA164-I21   VA             118      59,566      892,874     11,799
             S+k            100      74,587    1,021,063     19,396
             S              114      77,344    1,300,473     24,939
             S+k+PB         100      74,587    1,021,063     19,396
             S+PB           117      77,344    1,309,634     24,951*
AAA164-M04   VA             107     176,415    2,315,820     42,672
             S+k             96     344,229    2,416,169     87,527
             S              116     345,067    2,597,583     68,071
             S+k+PB          86     344,229    2,429,768    157,473*
             S+PB           105     345,067    2,609,478     85,371
AAA164-P11   VA              18      49,973      255,061     35,515
             S+k             25      96,777      293,927     36,496*
             S               44      86,187      396,023     30,370
             S+k+PB          25      96,810      299,814     36,496*
             S+PB            40     131,700      401,629     30,370
AAA168-E21   VA             153     102,949    2,180,223     27,344
             S+k            107     339,100    2,269,188     65,789*
             S              131     527,868    2,522,632     51,168
             S+k+PB         107     339,100    2,518,811     65,789*
             S+PB           132     257,868    2,518,811     51,168

Supplementary Table 2: Sequencing effort for SAGs that were sequenced with both Illumina and PacBio technologies. Illumina sequencing was performed using 2 × 150 bp sequencing of fragments with an average length of 240 bp. Sequencing effort is the total number of sequenced bases.

                PacBio  PacBio mean        PacBio     Illumina        Illumina
SAG              reads  length (bp)   effort (bp)        reads     effort (bp)
AAA076-E06     128,027        1,506   192,929,588   19,715,412   2,957,311,800
AAA160-C11     188,335        1,549   291,738,368   26,781,308   4,017,196,200
AAA160-J20     202,869        1,573   319,287,097   24,006,694   3,601,004,100
AAA164-A08      94,132        1,620   152,567,589   27,914,728   4,187,209,200
AAA164-A21     171,216        1,574   269,582,113   16,261,738   2,439,260,700
AAA164-B23     176,761        1,684   297,776,729   34,876,442   5,231,466,300
AAA164-I21     197,592        1,498   296,055,774   26,314,206   3,947,130,900
AAA164-M04      27,011        1,732    46,783,943   28,032,742   4,204,911,300
AAA164-P11      16,126        1,933    31,181,127   28,576,768   4,286,515,200
AAA168-E21     116,413        1,589   185,040,555   18,092,664   2,713,899,600

Supplementary Table 3: Putative sequence cross-contamination.

                            Contig    Depth of
SAG          Contig    length (bp)    coverage   Notes
AAA164-B23   00001          51,439      77,585   Putative contamination source
AAA015-D07   00020           3,541           9   Identical to AAA164-B23 contig 00001
AAA015-M09   00023           2,393           6   Identical to AAA164-B23 contig 00001
AAA160-J20   00011          11,805          23   Identical to AAA164-B23 contig 00001
AAA160-P02   00061           3,577          10   Identical to AAA164-B23 contig 00001
AAA288-N07   00041           3,729           9   Identical to AAA164-B23 contig 00001
AAA536-G18   00067           2,694           7   Identical to AAA164-B23 contig 00001
AAA164-A08   00003           2,465          10   Identical to AAA164-B23 contig 00001
AAA076-E06   00005          23,533      21,598   Putative contamination source
AAA015-O19   00045           7,017          14   Identical to AAA076-E06 contig 00005

Supplementary Table 4: List of the genes that were found in the newly sequenced Myoviridae phages Verrucomicrobia phage AAA164-P11, Thaumarchaeota phage AAA160-J20 and Roseobacter phages AAA076-E06 and AAA160-J18.
Only the genes that were conserved among all three phages and the isolates were kept for whole-genome phylogeny.

Gene name Function AAA164P11 AAA160J20 AAA076E06 contig00005_28 contig00005_9 contig00001_7 contig00005_8 contig00001_89 contig00001_68 contig00001_46 contig00001_67 contig00005_19 contig00001_1 contig00001_3 contig00001_5 contig00005_2 contig00006_15 contig00001_63 contig00001_61 contig00001_8 contig00001_10 gp32 gp41 gp43 gp44 gp61 Exonuclease A terminase large subunit tail sheath monomer portal vertex protein prohead core scaffold and protease major head subunit single-stranded DNA binding protein DNA primase/helicase DNA polymerase sliding clamp loader DNA primase contig00001_23 contig00002_1 contig00002_11 contig00005_29 contig00001_32 contig00001_29 contig00022_8 contig00002_3 contig00001_90 contig00005_15 contig00001_47 contig00006_1 contig00001_29 contig00006_16
gp62 clamp loader contig00002_13 contig00002_5 contig00001_39
NrdA ribonucleotide-diphosphate reductase subunit alpha contig00001_25 contig00001_27 contig00006_17 contig00006_17 Absent in RM378 and phiM12.
NrdB ribonucleotide-diphosphate reductase subunit beta contig00001_24 contig00001_28 contig00006_18 contig00006_18 Absent in RM378 and phiM12.
DexA gp17 gp18 gp20 gp21 gp23 AAA160J18 Comments Absent in RM378 and vB_CsaM_GAP32. contig00001_5 contig00001_8 Absent in RM378. contig00001_47 Absent in RM378. Absent in RM378. Absent in RM378 and vB_CsaM_GAP32.

Supplementary Table 5: Taxonomic classification of viral sequences found in each SAG, based on the number of best BLAST hits (e-value < 10^-5) to each viral family. The highest number of hits for each SAG is marked with an asterisk (*).

               Total                         Other
Phage          genes  Myo  Phycodna  Podo  viruses  Sipho
AAA300-J04        34    2         2    7*       2      4
AAA160-D02        48    3         1   16*       4      3
AAA160-C11       102   11*        0   11*       6      3
AAA160-P02        89    8         0   19*       4      1
AAA164-I21        78    8*        0    8*       2      2
AAA164-M04       102    3         1   22*       8      2
AAA168-E21        63    0         0   32*       0      2
AAA164-O14        47    3         0    6*       3      4
AAA164-A21        71    6         0   19*       1      3
AAA164-B23        82    2         0   15*       3      2
AAA164-P11       205   29*        3    5        6      6
AAA076-E06       289   60         1    0      193*    13
AAA160-J18       111   75*        0    0       11     10
AAA160-I06        13    3*        0    0        0      2
AAA168-P09        23   14*        0    0        0      0
AAA160-J20       208   34*        2    4        9      3
AAA160-J14        66    6         0    5       22*     6
AAA164-A08       103   15         0    3       30*     3
AAA164-L15        53    2         0    1        4*     1
AAA164-N20        91    0        73*   0        1      7

Notes:
Global homology of this phage to AAA164-B23 suggests that this phage belongs to Podoviridae.
Global homology to other Podoviridae, like Puniceispirillum phage HMO-2011, as well as to phage AAA160-D02, suggests that this phage belongs to Podoviridae.
Global homology to Pelagibacter phage HTVC011P and Puniceispirillum phage HMO-2011 suggests that this phage belongs to Podoviridae.
This phage is most similar to the cyanophage S-SKS1, which has the morphology of a Siphoviridae but is most similar to Myoviridae in its genome composition.

Supplementary Table 6: List of viral isolates that were used in genomic alignments.

Phage                                  Accession number
Myoviridae
  Bacillus phage SP10                  NC_019487
  Cellulophaga phage phiST             KC821604
  Cyanophage Syn30                     NC_021072
  Pelagibacter phage HTVC008M          NC_020484
  Rhodothermus phage RM378             NC_004735
  Synechococcus phage S-SSM4           NC_020875
  Cyanophage P-RSM6                    NC_020855
  Phormidium phage Pf-WMP3             NC_009551
  Prochlorococcus phage P-HM1          NC_015280
  Prochlorococcus phage P-SSM2         GU071092
  Aeromonas phage 65                   NC_015251
  Bacillus phage SP10                  NC_019487
  Campylobacter phage CPt10            FN667789
  Cellulophaga phage phiSM             NC_020860
  Cellulophaga phage phiST             KC821604
  Cyanophage S-TIM5                    NC_019516
  Rhodothermus phage RM378             NC_004735
  Synechococcus phage S-PM2            NC_006820
Podoviridae
  Salinivibrio CW02                    NC_019540
  Roseobacter SIO1                     NC_002519
  Pelagibacter HTVC019P                NC_020483
  Pelagibacter HTVC011P                NC_020482
  Enterobacteriophage T7               NC_001604
  Enterobacteriophage K1F              NC_007456
  Cyanophage Syn5                      NC_009531
  Cyanophage P-SSP2                    NC_016656
  Celeribacter P12053L                 NC_018280
  Cellulophaga phage phi40             KC821612
Phycodnaviridae
  Bathycoccus sp. RCC1105 virus BpV1   NC_014765
  Ostreococcus tauri virus 1           FN386611
  Ostreococcus virus OsV5              NC_010191
  Micromonas pusilla virus 12T         NC_020864

Supplementary Table 7: List of cyanophage isolates that were used to experimentally infect Roseobacter strains.

Cyanophage lysates used for the Roseobacter assay:

Syn5, 8102-8, 8017-1, Syn9, P-SSM2, Med4-8a, 9515-11a, Syn2,
9303-10a, 8109-2, S-WHM1, 9302-1a, S-PM2, 9515-10a, Syn1, 8018-8,
6501-1, SS120-1, Natl-2A-3, Natl-2A-14, Natl-2A-30, Natl-2A-39, Syn33a, 6501-5,
6501-9, Syn19, 8102-4, 8102-12, 8109-3, P-SS2, P-SSP7, P-HM1,
P-HM2, P-SSM4, 9211-16, 9215-3a, 9215-6a, 9303-2, 9303-2a,
9515-15, Natl 1A-23, MBARI C-9, MBARI C-107, 9515-14, Natl 1A-24, MBARI C-10, MBARI C-108,
9515-13, Natl 1A-25, MBARI C-11, MBARI C-109, 9515-12, Natl 1A-26, MBARI C-12, MBARI US-1,
9515-11, Natl 1A-27, MBARI C-13, MBARI US-2, 9515-10, Natl 1A-28, MBARI C-14, MBARI US-3,
9515-9, Natl 1A-29, MBARI C-15, MBARI US-4, 9515-8, Natl 1A-31, MBARI C-16, MBARI US-5,
9515-7, Natl 1A-32, MBARI C-17, MBARI US-7, 9515-6, Natl 1A-33, MBARI C-19, MBARI US-13,
9515-5, Natl 1A-34, MBARI C-21, MBARI US-17, 9515-4, Natl 1A-35, MBARI C-22, MBARI US-18,
9515-3, Natl 1A-36, MBARI C-24, MBARI US-19, 9515-2, Natl 1A-37, MBARI C-25, MBARI US-20,
9515-1, Natl 1A-38, MBARI C-26, MBARI US-23, Natl 1A-1, Natl 1A-39, MBARI C-28, MBARI US-24,
Natl 1A-2, Natl 1A-42, MBARI C-29, MBARI US-26, Natl 1A-3, Natl 1A-42, MBARI C-31, MBARI US-30,
Natl 1A-4, Natl 1A-44, MBARI C-33, MBARI US-33, Natl 1A-5, Natl 1A-45, MBARI C-34, MBARI US-34,
Natl 1A-6, Natl 1A-46, MBARI C-35, MBARI US-36, Natl 1A-7, Med4-1, MBARI C-36, MBARI US-37,
Natl 1A-8, Med4-2, MBARI C-37, MBARI US-39, Natl 1A-9, Med4-3, MBARI C-38, MBARI US-40,
Natl 1A-10, Med4-5, MBARI C-39, MBARI US-42, Natl 1A-11, Med4-6, MBARI C-40, MBARI US-43,
Natl 1A-12, Med4-7, MBARI C-42, MBARI US-44, Natl 1A-13,
Med4-8, MBARI C-43, MBARI US-46, Natl 1A-14, Med4-9, MBARI C-45, MBARI US-47,
Natl 1A-15, Med4-10, MBARI C-46, MBARI US-49, 9303-1, Med4-11, MBARI C-47, MBARI US-50,
9303-4, Med4-12, MBARI C-48, MBARI US-52, 9303-5, Med4-13, MBARI C-49, MBARI US-53,
9303-6, Med4-14, MBARI C-53, MBARI US-54, 9303-7, Med4-15, MBARI C-54, MBARI US-56,
9303-8, Med4-16, MBARI C-55, MBARI US-57, 9303-9, Med4-17, MBARI C-56, MBARI US-59,
9303-10, Med4-18, MBARI C-57, MBARI US-60, 9211-1, Med4-19, MBARI C-58, MBARI US-61,
9313-2, SS120-6a, Natl-2A-3a, Natl-2A-19, Natl-2A-31, Natl-2A-40, Natl-2A-41, Natl-2A-53,
Natl-2A-79a, Med4-52, Med4-56, Med4-42, Med4-55, Med4-53, Med4-54, Med4-48,
Med4-44, 7803-6, 8018-1, 8018-2, 8018-5, 8018-4, 7803-1, 7803-8,
7803-7, 7803-2, 7803-3, 7803-4, 7803-5, 9515-12, Med4-51, 9515-13,
9515-14, 9515-18, 9515-17, 9515-16, 9211-2, 9211-3, 9211-4, 9211-5,
9211-6, 9211-7, 9211-8, 9211-10, 9211-11, 9211-12, 9211-13, 9211-14,
9211-15, 9303-8, SS120-2, SS120-3, SS120-4, SS120-5, Natl 1A-1, Natl 1A-2,
Natl 1A-4, Natl 1A-5, Natl 1A-6, Natl 1A-7, Natl 1A-8, Natl 1A-9, Natl 1A-10, Natl 1A-11,
Natl 1A-12, Natl 1A-13, Natl 1A-15, Natl 1A-16, Natl 1A-17, Natl 1A-18, Natl 1A-21, Natl 1A-22,
Med4-20, Med4-21, Med4-22, Med4-23, Med4-24, Med4-25, Med4-26, Med4-27,
Med4-28, Med4-29, Med4-30, Med4-31, Med4-32, Med4-33, Med4-34, Med4-35,
Med4-36, Med4-37, Med4-38, Med4-39, Med4-40, Med4-41, Med4-43, Med4-44,
Med4-45, Med4-46, Med4-47, Med4-48, Med4-49, Med4-50, MBARI C-1, MBARI C-2,
MBARI C-5, MBARI C-6, MBARI C-7, MBARI C-8, MBARI C-59, MBARI C-60, MBARI C-61, MBARI C-66,
MBARI C-67, MBARI C-68, MBARI C-69, MBARI C-72, MBARI C-73, MBARI C-75, MBARI C-76, MBARI C-77,
MBARI C-78, MBARI C-79, MBARI C-80, MBARI C-83, MBARI C-84, MBARI C-85, MBARI C-86, MBARI C-88,
MBARI C-89, MBARI C-90, MBARI C-91, MBARI C-92, MBARI C-93, MBARI C-94, MBARI C-96, MBARI C-97,
MBARI C-98, MBARI C-99, MBARI C-100, MBARI C-101, MBARI C-102, MBARI C-103, MBARI C-104, MBARI C-105,
MBARI US-62, MBARI US-63, MBARI US-64, MBARI US-65, MBARI US-71, MBARI US-74, MBARI US-78, MBARI US-79,
MBARI US-80, MBARI US-82, MBARI US-83, MBARI US-85, MBARI US-88, MBARI US-89, MBARI US-94, MBARI US-95,
MBARI US-101, MBARI US-102, MBARI US-103, MBARI US-104, MBARI US-105, MBARI US-106, MBARI US-108, MBARI US-109,
MBARI US-110, MBARI US-111, MBARI US-112, MBARI US-113, MBARI US-114, MBARI US-115, MBARI US-116, MBARI US-117,
MBARI US-120, MBARI US-122, MBARI US-123.

Supplementary Table 8: List of Roseobacter strains tested against cyanophage isolates. Growth is given as the optical density at 575 nm (OD575) reached after the indicated growth time (h), for each medium and temperature.

Isolate     Growth time: OD575

1/2 YTSS, 25 °C
502         22 h: 0.168; 42 h: 0.380
2597        22 h: 0.200; 42 h: 0.334
566         42 h: 0.490
434         42 h: 0.623
ISM         42 h: 0.400
563         42 h: 0.445
GAI101      22 h: 0.154; 42 h: 0.483
CCS2        42 h: 0.440
TM1040      63 h: 0.656
EE36        22 h: 0.163; 42 h: 0.462
2601        63 h: 0.322
EPP04       42 h: 0.206
R11         43 h: 0.087
E37         42 h: 0.296
2654        42 h: 0.284

Zobell, 21 °C
Y41         42 h: 0.507
458         42 h: 0.209
445         42 h: 0.558
474         42 h: 0.241
2.1         42 h: 0.426
NAS14.1     42 h: 0.229
DTUF        42 h: 0.368
AW10        42 h: 0.440
DSS-3       63 h: 0.211
DP14-09
465         63 h: 0.103
457         63 h: 0.246
DP1-21      43 h: 0.205
2516        115 h: 0.486; 51 h: 0.158
SIO67       43 h: 0.126
DP14-28     63 h: 0.316; 51 h: 0.205
DP1-11      51 h: 0.524
K2          63 h: 0.525
CCS1        115 h: 0.256; 51 h: 0.142
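The supplementary results cite tetramer frequency as one of the criteria that make the viral search less dependent on reference databases. The exact criterion used in the study is described in the main text and is not reproduced here; the Python sketch below only illustrates the general idea. It assumes that each sequence is summarized by the relative frequencies of its 256 tetranucleotides and that two sequences are compared by the Euclidean distance between these profiles. The function names and the choice of distance are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 4**4 = 256 tetranucleotides in a fixed order, so profiles are comparable.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def tetramer_profile(seq: str) -> dict[str, float]:
    """Relative frequency of each tetranucleotide on one strand of seq.

    Overlapping 4 bp windows are counted; windows containing any character
    other than A, C, G or T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set(BASES)
    )
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in TETRAMERS}
    return {t: counts[t] / total for t in TETRAMERS}


def profile_distance(seq_a: str, seq_b: str) -> float:
    """Euclidean distance between the tetramer profiles of two sequences."""
    pa, pb = tetramer_profile(seq_a), tetramer_profile(seq_b)
    return sum((pa[t] - pb[t]) ** 2 for t in TETRAMERS) ** 0.5


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from this study.
    gc_rich = "ATGCGCGATCGCGGCTAGCGCGCATGCGC"
    at_rich = "ATATTTAAATATTATAAATTTATAATAAT"
    print(profile_distance(gc_rich, gc_rich))  # prints 0.0
    print(profile_distance(gc_rich, at_rich))  # a clearly larger distance
```

In such a scheme, a contig whose profile lies far from that of the rest of its SAG assembly could be flagged for closer inspection. Many composition-based tools also merge each tetramer with its reverse complement (136 canonical words); this sketch deliberately does not.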
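Supplementary Table 1 compares assemblies by their N50. As a reminder of how this statistic is commonly defined, the sketch below sorts contigs from longest to shortest and returns the length of the contig at which the running total first reaches half of the assembly size. Assemblers and QC tools differ slightly in edge-case conventions, so this is an illustration of the definition, not the exact code used to produce the table.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 (bp) of an assembly given its contig lengths (bp).

    Contigs are sorted from longest to shortest and their lengths summed;
    the N50 is the length of the contig at which the running total first
    reaches at least half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("an assembly needs at least one contig")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


if __name__ == "__main__":
    # Toy assembly of five contigs (illustrative values, not from Table 1).
    print(n50([2_000, 3_000, 4_000, 5_000, 6_000]))  # prints 5000
```

One consequence of this definition is visible in Table 1: whenever the largest contig holds at least half of the assembly, the N50 equals the largest contig size, as in the VA assemblies of AAA164-A08 (59,851 of 82,090 bp) and AAA164-B23 (51,439 of 98,281 bp).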
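The figures in Supplementary Table 2 can be cross-checked with simple arithmetic. Assuming that every Illumina read counted in the table is a single 150 bp read from the 2 × 150 bp runs, the Illumina effort should equal reads × 150; for PacBio, the effort divided by the number of reads should give the mean read length. The short Python check below encodes both relations for all ten rows. In every row, the reported PacBio mean read length matches the integer part of effort / reads, which suggests (but does not prove) that the means were truncated rather than rounded.

```python
# Assumption: each counted Illumina read is one 150 bp read of a 2 x 150 bp pair.
ILLUMINA_READ_LENGTH = 150

# Columns: SAG, PacBio reads, PacBio mean read length (bp), PacBio effort (bp),
#          Illumina reads, Illumina effort (bp); values copied from Table 2.
TABLE_2 = [
    ("AAA076-E06", 128_027, 1_506, 192_929_588, 19_715_412, 2_957_311_800),
    ("AAA160-C11", 188_335, 1_549, 291_738_368, 26_781_308, 4_017_196_200),
    ("AAA160-J20", 202_869, 1_573, 319_287_097, 24_006_694, 3_601_004_100),
    ("AAA164-A08", 94_132, 1_620, 152_567_589, 27_914_728, 4_187_209_200),
    ("AAA164-A21", 171_216, 1_574, 269_582_113, 16_261_738, 2_439_260_700),
    ("AAA164-B23", 176_761, 1_684, 297_776_729, 34_876_442, 5_231_466_300),
    ("AAA164-I21", 197_592, 1_498, 296_055_774, 26_314_206, 3_947_130_900),
    ("AAA164-M04", 27_011, 1_732, 46_783_943, 28_032_742, 4_204_911_300),
    ("AAA164-P11", 16_126, 1_933, 31_181_127, 28_576_768, 4_286_515_200),
    ("AAA168-E21", 116_413, 1_589, 185_040_555, 18_092_664, 2_713_899_600),
]


def check_row(row: tuple[str, int, int, int, int, int]) -> tuple[bool, bool]:
    """Return (illumina_consistent, pacbio_consistent) for one table row."""
    _sag, pb_reads, pb_mean, pb_effort, il_reads, il_effort = row
    illumina_ok = il_reads * ILLUMINA_READ_LENGTH == il_effort
    # Integer (floor) division: compares against a truncated mean read length.
    pacbio_ok = pb_effort // pb_reads == pb_mean
    return illumina_ok, pacbio_ok


if __name__ == "__main__":
    for row in TABLE_2:
        print(row[0], check_row(row))
```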
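Supplementary Table 5 classifies the viral sequences in each SAG from the number of best BLAST hits (e-value < 10^-5) to each viral family. The sketch below shows one way such counts could be tallied from BLAST+ tabular output (`-outfmt 6` with its 12 default columns, where column 11 is the e-value and column 12 the bit score). It assumes that the best hit of a gene is the hit with the highest bit score and that a separate, hypothetical lookup table maps each subject sequence ID to a viral family; subjects missing from that table are counted as "unassigned". None of this is taken from the authors' scripts.

```python
import csv
from collections import Counter
from io import StringIO

# Table 5 only counts hits with an e-value below 10^-5.
E_VALUE_CUTOFF = 1e-5


def best_hit_family_counts(blast_tab: str, subject_family: dict[str, str]) -> Counter:
    """Tally, per viral family, the genes whose best BLAST hit falls in that family.

    blast_tab holds BLAST+ tabular output (-outfmt 6, 12 default columns).
    subject_family is a hypothetical mapping from subject ID to viral family.
    """
    best: dict[str, tuple[float, str]] = {}  # query gene -> (bit score, subject ID)
    for row in csv.reader(StringIO(blast_tab), delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        query, subject = row[0], row[1]
        e_value, bit_score = float(row[10]), float(row[11])
        if e_value >= E_VALUE_CUTOFF:
            continue  # not significant under the Table 5 threshold
        if query not in best or bit_score > best[query][0]:
            best[query] = (bit_score, subject)
    # Subjects absent from the mapping are tallied as "unassigned".
    return Counter(subject_family.get(subject, "unassigned") for _, subject in best.values())


if __name__ == "__main__":
    # Toy hits for four genes of one SAG (made-up IDs and scores, not real data).
    example = "\n".join([
        "g1\tphageA\t90\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
        "g1\tphageB\t80\t100\t20\t0\t1\t100\t1\t100\t1e-20\t150",
        "g2\tphageB\t70\t100\t30\t0\t1\t100\t1\t100\t1e-10\t90",
        "g3\tphageA\t60\t50\t20\t0\t1\t50\t1\t50\t0.01\t40",
        "g4\tphageB\t75\t120\t30\t0\t1\t120\t1\t120\t1e-25\t180",
    ])
    families = {"phageA": "Myoviridae", "phageB": "Podoviridae"}
    counts = best_hit_family_counts(example, families)
    print(counts.most_common(1))  # prints [('Podoviridae', 2)]
```

The family returned by `most_common(1)` corresponds to the highest count marked in Table 5; when two families tie, as for AAA160-C11 and AAA164-I21, `Counter.most_common` simply keeps the first family encountered, so ties need to be resolved by other evidence, such as the global homology described in the table notes.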