Supplementary material
Supplementary results and discussion
Testing of prophage identification tools. Many software packages have been developed for prophage identification (e.g. Phage_Finder (Fouts, 2006), Prophinder (Lima-Mendez et al., 2008) and PhiSpy (Akhter et al., 2012)), for detection of pathogenicity islands (e.g. PIPS (Soares et al., 2012)) and for detection of horizontal gene transfer events (e.g. Alien Hunter (Vernikos & Parkhill, 2006)). However, they are poorly suited to analyzing novel viruses in uncultured microorganisms, as they mostly rely on sequences present in the GenBank database, which were obtained primarily from viral isolates. We compared the efficiency of our method for viral sequence detection to that of two other tools: PhiSpy (Akhter et al., 2012) and Prophinder (Lima-Mendez et al., 2008). PhiSpy found viral sequences in only five SAGs (Roseobacter AAA076-E06, Bacteroidetes AAA160-P02, Verrucomicrobia AAA164-A21, AAA164-M04 and AAA164-P11) and Prophinder in none of the SAGs, as compared to 20 SAGs detected with our analysis. This is not surprising, since PhiSpy and Prophinder were designed to identify prophages in complete genomes and appear less effective when applied to the incomplete and fragmented genome assemblies that are typical of SAGs. Moreover, our metagenomic fragment recruitment and tetramer frequency criteria make the viral search less reliant on existing, limited databases of well-characterized viral genomes.
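
To illustrate what a tetramer frequency criterion works with, the following is a minimal Python sketch that computes the relative frequency of all 256 tetranucleotides in a contig and a simple distance between two such profiles. It is not the procedure used in this study: the function names, the single-strand counting and the Euclidean distance are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetramer_frequencies(seq):
    """Return the relative frequency of each tetramer in a DNA sequence.

    Windows containing characters other than A, C, G or T (e.g. N) are
    not counted, so the frequencies sum to 1 over unambiguous windows.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        return {k: 0.0 for k in TETRAMERS}
    return {k: counts[k] / total for k in TETRAMERS}


def profile_distance(freq_a, freq_b):
    """Euclidean distance between two tetramer profiles (an assumed metric)."""
    return sum((freq_a[k] - freq_b[k]) ** 2 for k in TETRAMERS) ** 0.5


if __name__ == "__main__":
    # Toy sequences, not taken from any SAG.
    host_like = tetramer_frequencies("ACGTACGTACGTACGT")
    other = tetramer_frequencies("AAAATTTTAAAATTTT")
    print(round(profile_distance(host_like, other), 3))
```

A contig whose profile lies far from that of the rest of the assembly would be a candidate for closer inspection; any threshold would have to be chosen empirically.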
References

Akhter S, Aziz RK, Edwards RA. (2012). PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucleic Acids Res 40:e126.

Brussaard C. (2004). Optimization of Procedures for Counting Viruses by Flow Cytometry. Appl Environ Microbiol 70.

Fouts DE. (2006). Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res 34:5839–5851.

Lima-Mendez G, Van Helden J, Toussaint A, Leplae R. (2008). Prophinder: a computational tool for prophage prediction in prokaryotic genomes. Bioinformatics 24:863–865.

Sieracki ME, Poulton NJ, Crosbie N. (2005). Automated isolation techniques for microalgae. In: Algal culturing techniques, Anderson, RA (ed), Elsevier Academic Press: New York, pp. 101–116.

Soares SC, Abreu VAC, Ramos RTJ, Cerdeira L, Silva A, Baumbach J, et al. (2012). PIPS: pathogenicity island prediction software. PLoS One 7:e30848.

Suttle CA. (2005). Viruses in the sea. Nature 437:356–361.

Vernikos GS, Parkhill J. (2006). Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands. Bioinformatics 22:2196–2203.

Supplementary Figure 1: A) Alignments of Roseobacter phage AAA300-J04 and Verrucomicrobia phage AAA164-B23 to the isolate Cellulophaga phage 3:2. B) Alignment of the Verrucomicrobia phage AAA164-M04 to Cellulophaga phage 40:1. C) Alignment of Pelagibacter phage HTVC010P to Verrucomicrobia phage AAA168-E21. Each arrow represents a gene, with red arrows representing structural genes. Color scale indicates amino acid identity.

Supplementary Figure 2: Alignments of the cyanophage genomes P-SSM2 (infecting Prochlorococcus) and S-SKS1 (infecting Synechococcus) with the Roseobacter phages AAA076-E06 and AAA160-J18. Each arrow represents a gene, with the ones highlighted in red representing a tail protein that could be associated with host recognition. Color scale indicates amino acid identity.

Supplementary Figure 3: Alignments of Verrucomicrobia virus AAA164-N20 with similar Phycodnaviridae viruses infecting prasinophytes. Each arrow represents a gene. Color scale indicates amino acid identity.

Supplementary Table 1: Results of SAG de novo assemblies, performed using a combination of Velvet and Allpaths (VA) or SPAdes (S), with (+k) or without k-mer pre-normalization, and co-assembly with PacBio data (+PB). Assemblies with the largest N50 value are highlighted in bold.

SAG          Assembly type   Contig count   Largest contig (bp)   Assembly size (bp)   N50 (bp)
AAA076-E06   VA              15             34,374                213,430              25,679
AAA076-E06   S+k             7              96,895                210,055              55,325
AAA076-E06   S               9              88,313                217,375              48,877
AAA076-E06   S+k+PB          8              96,895                212,799              55,325
AAA076-E06   S+PB            10             88,313                220,394              48,877
AAA160-C11   VA              29             190,175               931,699              51,612
AAA160-C11   S+k             19             367,558               949,786              187,353
AAA160-C11   S               22             456,898               981,557              126,510
AAA160-C11   S+k+PB          19             367,558               949,786              187,353
AAA160-C11   S+PB            22             456,898               981,557              126,510
AAA160-J20   VA              34             97,772                496,591              26,603
AAA160-J20   S+k             30             123,152               532,593              28,562
AAA160-J20   S               39             107,683               594,717              32,621
AAA160-J20   S+k+PB          30             123,152               532,593              28,562
AAA160-J20   S+PB            39             107,683               594,717              32,621
AAA164-A08   VA              4              59,851                82,090               59,851
AAA164-A08   S+k             5              51,013                85,975               51,013
AAA164-A08   S               6              43,343                81,237               43,343
AAA164-A08   S+k+PB          5              51,013                85,975               51,013
AAA164-A08   S+PB            7              43,343                85,100               43,343
AAA164-A21   VA              94             79,991                845,225              13,668
AAA164-A21   S+k             87             103,650               996,353              24,729
AAA164-A21   S               123            104,071               1,300,161            24,863
AAA164-A21   S+k+PB          87             103,650               996,353              24,729
AAA164-A21   S+PB            123            104,071               1,300,161            24,863
AAA164-B23   VA              7              51,439                98,281               51,439
AAA164-B23   S+k             8              53,860                113,061              15,234
AAA164-B23   S               32             34,561                183,572              9,499
AAA164-B23   S+k+PB          9              53,860                115,096              15,234
AAA164-B23   S+PB            34             34,561                188,948              8,191
AAA164-I21   VA              118            59,566                892,874              11,799
AAA164-I21   S+k             100            74,587                1,021,063            19,396
AAA164-I21   S               114            77,344                1,300,473            24,939
AAA164-I21   S+k+PB          100            74,587                1,021,063            19,396
AAA164-I21   S+PB            117            77,344                1,309,634            24,951
AAA164-M04   VA              107            176,415               2,315,820            42,672
AAA164-M04   S+k             96             344,229               2,416,169            87,527
AAA164-M04   S               116            345,067               2,597,583            68,071
AAA164-M04   S+k+PB          86             344,229               2,429,768            157,473
AAA164-M04   S+PB            105            345,067               2,609,478            85,371
AAA164-P11   VA              18             49,973                255,061              35,515
AAA164-P11   S+k             25             96,777                293,927              36,496
AAA164-P11   S               44             86,187                396,023              30,370
AAA164-P11   S+k+PB          25             96,810                299,814              36,496
AAA164-P11   S+PB            40             131,700               401,629              30,370
AAA168-E21   VA              153            102,949               2,180,223            27,344
AAA168-E21   S+k             107            339,100               2,269,188            65,789
AAA168-E21   S               131            527,868               2,522,632            51,168
AAA168-E21   S+k+PB          107            339,100               2,518,811            65,789
AAA168-E21   S+PB            132            257,868               2,518,811            51,168

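Supplementary Table 1 compares assemblies by their N50 value. As a reminder of what that statistic measures, here is a minimal Python sketch of the standard N50 definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The function name and the toy contig lengths are illustrative and do not come from the assemblers used here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that brings the running sum to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy example: total = 28, half = 14; 10 + 8 = 18 >= 14, so N50 = 8.
    print(n50([10, 8, 5, 3, 2]))
```

When a single contig makes up at least half of the assembly, the N50 equals the largest contig. This is the pattern seen for AAA164-A08 (VA), where the largest contig (59,851 bp) exceeds half of the 82,090 bp assembly.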
Supplementary Table 2: Sequencing effort for SAGs that were sequenced with both Illumina and PacBio technologies. Illumina sequencing was performed using 2x150 bp sequencing of 240 bp average length fragments.

             PacBio                                                      Illumina
SAG          Nb reads   Mean read length (bp)   Sequencing effort (bp)   Nb reads     Sequencing effort (bp)
AAA076-E06   128,027    1,506                   192,929,588              19,715,412   2,957,311,800
AAA160-C11   188,335    1,549                   291,738,368              26,781,308   4,017,196,200
AAA160-J20   202,869    1,573                   319,287,097              24,006,694   3,601,004,100
AAA164-A08   94,132     1,620                   152,567,589              27,914,728   4,187,209,200
AAA164-A21   171,216    1,574                   269,582,113              16,261,738   2,439,260,700
AAA164-B23   176,761    1,684                   297,776,729              34,876,442   5,231,466,300
AAA164-I21   197,592    1,498                   296,055,774              26,314,206   3,947,130,900
AAA164-M04   27,011     1,732                   46,783,943               28,032,742   4,204,911,300
AAA164-P11   16,126     1,933                   31,181,127               28,576,768   4,286,515,200
AAA168-E21   116,413    1,589                   185,040,555              18,092,664   2,713,899,600

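The Illumina sequencing effort values in Supplementary Table 2 equal the number of reads multiplied by the 150 bp read length, and the PacBio values are close to the number of reads multiplied by the mean read length (the small difference is presumably due to rounding of the mean). The following Python sketch reproduces that arithmetic for AAA076-E06. The function names are illustrative, and the assumption that every Illumina read is exactly 150 bp is inferred from the tabulated values rather than stated in the text.

```python
ILLUMINA_READ_LENGTH_BP = 150  # from the 2x150 bp run described in the caption


def illumina_effort_bp(n_reads, read_length_bp=ILLUMINA_READ_LENGTH_BP):
    """Total sequenced bases, assuming every read has the same length."""
    return n_reads * read_length_bp


def approx_pacbio_effort_bp(n_reads, mean_read_length_bp):
    """Approximate total bases from a read count and a rounded mean length."""
    return n_reads * mean_read_length_bp


if __name__ == "__main__":
    # AAA076-E06, values taken from Supplementary Table 2.
    print(illumina_effort_bp(19_715_412))          # tabulated: 2,957,311,800
    print(approx_pacbio_effort_bp(128_027, 1_506))  # tabulated: 192,929,588
```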
Supplementary Table 3: Putative sequence cross-contamination.

SAG          Contig   Contig length (bp)   Depth of coverage   Notes
AAA164-B23   00001    51,439               77,585              Putative contamination source
AAA015-D07   00020    3,541                9                   Identical to AAA164-B23 contig 00001
AAA015-M09   00023    2,393                6                   Identical to AAA164-B23 contig 00001
AAA160-J20   00011    11,805               23                  Identical to AAA164-B23 contig 00001
AAA160-P02   00061    3,577                10                  Identical to AAA164-B23 contig 00001
AAA288-N07   00041    3,729                9                   Identical to AAA164-B23 contig 00001
AAA536-G18   00067    2,694                7                   Identical to AAA164-B23 contig 00001
AAA164-A08   00003    2,465                10                  Identical to AAA164-B23 contig 00001
AAA076-E06   00005    23,533               21,598              Putative contamination source
AAA015-O19   00045    7,017                14                  Identical to AAA076-E06 contig 00005

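In Supplementary Table 3, the contig labeled as the putative contamination source has a depth of coverage several orders of magnitude higher than the identical contigs found in other SAGs. The text does not state how the source was designated. One plausible heuristic, shown below as a hedged Python sketch, is to pick the copy with the highest depth of coverage among contigs that share an identical sequence. The function name and the tuple layout are assumptions made for illustration.

```python
def putative_source(identical_contigs):
    """Pick the most deeply covered copy among identical contigs.

    identical_contigs: list of (sag, contig_id, depth_of_coverage) tuples
    for contigs that were found to share the same sequence.
    """
    if not identical_contigs:
        raise ValueError("at least one contig record is required")
    return max(identical_contigs, key=lambda record: record[2])


if __name__ == "__main__":
    # Three of the records listed in Supplementary Table 3.
    group = [
        ("AAA164-B23", "00001", 77585),
        ("AAA015-D07", "00020", 9),
        ("AAA015-M09", "00023", 6),
    ]
    print(putative_source(group)[0])
```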
Supplementary Table 4: List of the genes that were found in the newly sequenced Myoviridae phages Verrucomicrobia phage AAA164-P11, Thaumarchaeota phage AAA160-J20 and Roseobacter phages AAA076-E06 and AAA160-J18. Only the genes that were conserved among all three phages and the isolates were kept for whole genome phylogeny.
Gene
name
Function
AAA164P11
AAA160J20
AAA076E06
contig00005_28
contig00005_9
contig00001_7
contig00005_8
contig00001_89
contig00001_68
contig00001_46
contig00001_67
contig00005_19
contig00001_1
contig00001_3
contig00001_5
contig00005_2
contig00006_15
contig00001_63
contig00001_61
contig00001_8
contig00001_10
gp32
gp41
gp43
gp44
gp61
Exonuclease A
terminase large subunit
tail sheath monomer
portal vertex protein
prohead core scaffold and
protease
major head subunit
single-stranded DNA binding
protein
DNA primase/helicase
DNA polymerase
sliding clamp loader
DNA primase
contig00001_23
contig00002_1
contig00002_11
contig00005_29
contig00001_32
contig00001_29
contig00022_8
contig00002_3
contig00001_90
contig00005_15
contig00001_47
contig00006_1
contig00001_29
contig00006_16
gp62
clamp loader
contig00002_13
contig00002_5
contig00001_39
NrdA
ribonucleotide-diphosphate
reductase subunit alpha
contig00001_25
contig00001_27
contig00006_17
contig00006_17
Absent in RM378 and phiM12.
NrdB
ribonucleotide-diphosphate
reductase subunit beta
contig00001_24
contig00001_28
contig00006_18
Contig00006_18
Absent in RM378 and phiM12.
DexA
gp17
gp18
gp20
gp21
gp23
AAA160J18
Comments
Absent
in
RM378
vB_CsaM_GAP32.
and
contig00001_5
contig00001_8
Absent in RM378.
contig00001_47
Absent in RM378.
Absent in RM378.
Absent
in
RM378
vB_CsaM_GAP32.
and
Supplementary Table 5: Taxonomic classification of viral sequences found in each SAG, based on the number of best BLAST hits (e-value < 10^-5) for each viral family. The highest number of hits for each SAG is indicated in bold.

Phage        Total genes   Myo   Phycodna   Podo   Other viruses   Sipho
AAA300-J04   34            2     2          7      2               4
AAA160-D02   48            3     1          16     4               3
AAA160-C11   102           11    0          11     6               3
AAA160-P02   89            8     0          19     4               1
AAA164-I21   78            8     0          8      2               2
AAA164-M04   102           3     1          22     8               2
AAA168-E21   63            0     0          32     0               2
AAA164-O14   47            3     0          6      3               4
AAA164-A21   71            6     0          19     1               3
AAA164-B23   82            2     0          15     3               2
AAA164-P11   205           29    3          5      6               6
AAA076-E06   289           60    1          0      193             13
AAA160-J18   111           75    0          0      11              10
AAA160-I06   13            3     0          0      0               2
AAA168-P09   23            14    0          0      0               0
AAA160-J20   208           34    2          4      9               3
AAA160-J14   66            6     0          5      22              6
AAA164-A08   103           15    0          3      30              3
AAA164-L15   53            2     0          1      4               1
AAA164-N20   91            0     73         0      1               7

Note

Global homology of this phage to AAA164-B23 suggests that this phage belongs to Podoviridae.

Global homology to other Podoviridae, like Puniceispirillum phage HMO2011, as well as to phage AAA160-D02, suggests that this phage belongs to Podoviridae.

Global homology to Pelagibacter phage HTVC011P and Puniceispirillum phage HMO-2011 suggests that this phage belongs to Podoviridae.

This phage is most similar to Siphoviridae cyanophage S-SKS1, which has the morphology of a Siphoviridae, but is most similar to Myoviridae in its genome composition.

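The classification in Supplementary Table 5 counts, for each viral family, the genes whose best BLAST hit falls in that family, using an e-value cutoff of 10^-5. A minimal Python sketch of such a tally is given below. Only the cutoff comes from the caption. The input layout (the default 12-column BLAST+ tabular format, with e-value and bit score in the last two columns), the use of bit score to choose the best hit, the subject-to-family lookup and all names are assumptions for illustration.

```python
from collections import Counter

EVALUE_CUTOFF = 1e-5  # threshold stated in the table caption


def best_hit_family_counts(blast_rows, subject_to_family):
    """Count best hits per viral family.

    blast_rows: iterable of tab-separated lines in the default BLAST+
        tabular layout (qseqid, sseqid, ..., evalue, bitscore).
    subject_to_family: hypothetical lookup from subject ID to family;
        subjects missing from it are counted as "Other viruses".
    """
    best = {}  # query ID -> (bit score, subject ID) of its best hit so far
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= EVALUE_CUTOFF:
            continue  # hit does not pass the e-value cutoff
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return Counter(
        subject_to_family.get(subject, "Other viruses")
        for _, subject in best.values()
    )


if __name__ == "__main__":
    # Made-up hits for three genes; not real search results.
    rows = [
        "gene1\tphageA\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-20\t80.0",
        "gene1\tphageB\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-30\t120.0",
        "gene2\tphageA\t30.0\t50\t35\t0\t1\t50\t1\t50\t1e-3\t30.0",
        "gene3\tphageC\t45.0\t90\t49\t0\t1\t90\t1\t90\t1e-10\t60.0",
    ]
    families = {"phageA": "Myo", "phageB": "Podo"}
    print(best_hit_family_counts(rows, families))
```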
Supplementary Table 6: List of viral isolates that were used in genomic alignments.

Phage                                Accession Number
Myoviridae
Bacillus phage SP10                  NC_019487
Cellulophaga phage phiST             KC821604
Cyanophage Syn30                     NC_021072
Pelagibacter phage HTVC008M          NC_020484
Rhodothermus phage RM378             NC_004735
Synechococcus phage S-SSM4           NC_020875
Cyanophage P-RSM6                    NC_020855
Phormidium phage Pf-WMP3             NC_009551
Prochlorococcus phage P-HM1          NC_015280
Prochlorococcus phage P-SSM2         GU071092
Aeromonas phage 65                   NC_015251
Bacillus phage SP10                  NC_019487
Campylobacter phage CPt10            FN667789
Cellulophaga phage phiSM             NC_020860
Cellulophaga phage phiST             KC821604
Cyanophage S-TIM5                    NC_019516
Rhodothermus phage RM378             NC_004735
Synechococcus phage S-PM2            NC_006820
Podoviridae
Salinivibrio CW02                    NC_019540
Roseobacter SIO1                     NC_002519
Pelagibacter HTVC019P                NC_020483
Pelagibacter HTVC011P                NC_020482
Enterobacteriophage T7               NC_001604
Enterobacteriophage K1F              NC_007456
Cyanophage Syn5                      NC_009531
Cyanophage P-SSP2                    NC_016656
Celeribacter P12053L                 NC_018280
Cellulophaga phage phi40             KC821612
Phycodnaviridae
Bathycoccus sp. RCC1105 virus BpV1   NC_014765
Ostreococcus tauri virus 1           FN386611
Ostreococcus virus OsV5              NC_010191
Micromonas pusilla virus 12T         NC_020864

Supplementary Table 7: List of cyanophage isolates that were used to experimentally infect Roseobacter strains.
Syn5
8102-8
8017-1
Syn9
P-SSM2
Med4-8a
9515-11a
Syn2
9303-10a
8109-2
S-WHM1
9302-1a
S-PM2
9515-10a
Syn1
8018-8
6501-1
SS120-1
Natl-2A-3
Natl-2A-14
Natl-2A-30
Natl-2A-39
Syn33a
6501-5
6501-9
Syn19
8102-4
8102-12
8109-3
P-SS2
P-SSP7
P-HM1
P-HM2
P-SSM4
9211-16
9215-3a
9215-6a
9303-2
9303-2a
CYANOPHAGE LYSATE USED FOR ROSEOBACTER ASSAY
9515-15
Natl 1A-23
MBARI C-9
MBARI C-107
9515-14
Natl 1A-24
MBARI C-10
MBARI C-108
9515-13
Natl 1A-25
MBARI C-11
MBARI C-109
9515-12
Natl 1A-26
MBARI C-12
MBARI US-1
9515-11
Natl 1A-27
MBARI C-13
MBARI US-2
9515-10
Natl 1A-28
MBARI C-14
MBARI US-3
9515-9
Natl 1A-29
MBARI C-15
MBARI US-4
9515-8
Natl 1A-31
MBARI C-16
MBARI US-5
9515-7
Natl 1A-32
MBARI C-17
MBARI US-7
9515-6
Natl 1A-33
MBARI C-19
MBARI US-13
9515-5
Natl 1A-34
MBARI C-21
MBARI US-17
9515-4
Natl 1A-35
MBARI C-22
MBARI US-18
9515-3
Natl 1A-36
MBARI C-24
MBARI US-19
9515-2
Natl 1A-37
MBARI C-25
MBARI US-20
9515-1
Natl 1A-38
MBARI C-26
MBARI US-23
Natl 1A-1
Natl 1A-39
MBARI C-28
MBARI US-24
Natl 1A-2
Natl 1A-42
MBARI C-29
MBARI US-26
Natl 1A-3
Natl 1A-42
MBARI C-31
MBARI US-30
Natl 1A-4
Natl 1A-44
MBARI C-33
MBARI US-33
Natl 1A-5
Natl 1A-45
MBARI C-34
MBARI US-34
Natl 1A-6
Natl 1A-46
MBARI C-35
MBARI US-36
Natl 1A-7
Med4-1
MBARI C-36
MBARI US-37
Natl 1A-8
Med4-2
MBARI C-37
MBARI US-39
Natl 1A-9
Med4-3
MBARI C-38
MBARI US-40
Natl 1A-10
Med4-5
MBARI C-39
MBARI US-42
Natl 1A-11
Med4-6
MBARI C-40
MBARI US-43
Natl 1A-12
Med4-7
MBARI C-42
MBARI US-44
Natl 1A-13
Med4-8
MBARI C-43
MBARI US-46
Natl 1A-14
Med4-9
MBARI C-45
MBARI US-47
Natl 1A-15
Med4-10
MBARI C-46
MBARI US-49
9303-1
Med4-11
MBARI C-47
MBARI US-50
9303-4
Med4-12
MBARI C-48
MBARI US-52
9303-5
Med4-13
MBARI C-49
MBARI US-53
9303-6
Med4-14
MBARI C-53
MBARI US-54
9303-7
Med4-15
MBARI C-54
MBARI US-56
9303-8
Med4-16
MBARI C-55
MBARI US-57
9303-9
Med4-17
MBARI C-56
MBARI US-59
9303-10
Med4-18
MBARI C-57
MBARI US-60
9211-1
Med4-19
MBARI C-58
MBARI US-61
9313-2
SS120-6a
Natl-2A-3a
Natl-2A-19
Natl-2A-31
Natl-2A-40
Natl-2A-41
Natl-2A-53
Natl-2A-79a
Med4-52
Med4-56
Med4-42
Med4-55
Med4-53
Med4-54
Med4-48
Med4-44
7803-6
8018-1
8018-2
8018-5
8018-4
7803-1
7803-8
7803-7
7803-2
7803-3
7803-4
7803-5
9515-12
Med4-51
9515-13
9515-14
9515-18
9515-17
9515-16
9211-2
9211-3
9211-4
9211-5
9211-6
9211-7
9211-8
9211-10
9211-11
9211-12
9211-13
9211-14
9211-15
9303-8
SS120-2
SS120-3
SS120-4
SS120-5
Natl 1A-1
Natl 1A-2
Natl 1A-4
Natl 1A-5
Natl 1A-6
Natl 1A-7
Natl 1A-8
Natl 1A-9
Natl 1A-10
Natl 1A-11
Natl 1A-12
Natl 1A-13
Natl 1A-15
Natl 1A-16
Natl 1A-17
Natl 1A-18
Natl 1A-21
Natl 1A-22
Med4-20
Med4-21
Med4-22
Med4-23
Med4-24
Med4-25
Med4-26
Med4-27
Med4-28
Med4-29
Med4-30
Med4-31
Med4-32
Med4-33
Med4-34
Med4-35
Med4-36
Med4-37
Med4-38
Med4-39
Med4-40
Med4-41
Med4-43
Med4-44
Med4-45
Med4-46
Med4-47
Med4-48
Med4-49
Med4-50
MBARI C-1
MBARI C-2
MBARI C-5
MBARI C-6
MBARI C-7
MBARI C-8
MBARI C-59
MBARI C-60
MBARI C-61
MBARI C-66
MBARI C-67
MBARI C-68
MBARI C-69
MBARI C-72
MBARI C-73
MBARI C-75
MBARI C-76
MBARI C-77
MBARI C-78
MBARI C-79
MBARI C-80
MBARI C-83
MBARI C-84
MBARI C-85
MBARI C-86
MBARI C-88
MBARI C-89
MBARI C-90
MBARI C-91
MBARI C-92
MBARI C-93
MBARI C-94
MBARI C-96
MBARI C-97
MBARI C-98
MBARI C-99
MBARI C-100
MBARI C-101
MBARI C-102
MBARI C-103
MBARI C-104
MBARI C-105
MBARI US-62
MBARI US-63
MBARI US-64
MBARI US-65
MBARI US-71
MBARI US-74
MBARI US-78
MBARI US-79
MBARI US-80
MBARI US-82
MBARI US-83
MBARI US-85
MBARI US-88
MBARI US-89
MBARI US-94
MBARI US-95
MBARI US-101
MBARI US-102
MBARI US-103
MBARI US-104
MBARI US-105
MBARI US-106
MBARI US-108
MBARI US-109
MBARI US-110
MBARI US-111
MBARI US-112
MBARI US-113
MBARI US-114
MBARI US-115
MBARI US-116
MBARI US-117
MBARI US-120
MBARI US-122
MBARI US-123
Supplementary Table 8: List of Roseobacter strains tested against cyanophage isolates.

Roseobacter isolate   Growth (h)   OD 575 nm   Growth (h)   OD 575 nm
1/2YTSS - 25C
502                   22           0.168       42           0.380
2597                  22           0.200       42           0.334
566                   42           0.490
434                   42           0.623
ISM                   42           0.400
563                   42           0.445
GAI101                22           0.154       42           0.483
CCS2                  42           0.440
TM1040                63           0.656
EE36                  22           0.163       42           0.462
2601                  63           0.322
EPP04                 42           0.206
R11                   43           0.087
E37                   42           0.296
2654                  42           0.284
Zobell - 21C
Y41                   42           0.507
458                   42           0.209
445                   42           0.558
474                   42           0.241
2.1                   42           0.426
NAS14.1               42           0.229
DTUF                  42           0.368
AW10                  42           0.440
DSS-3                 63           0.211
DP14-09
465                   63           0.103
457                   63           0.246
DP1-21                43           0.205
2516                  115          0.486       51           0.158
SIO67                 43           0.126
DP14-28               63           0.316       51           0.205
DP1-11                51           0.524
K2                    63           0.525
CCS1                  115          0.256       51           0.142