mec13321-sup-0001-SuppInfo

Supporting Information
TITLE:
Ecological connectivity shapes quasispecies structure of RNA virus in an Antarctic lake
AUTHORS:
Alberto López-Bueno1, Alberto Rastrojo1, Ramón Peiró1, Miguel Arenas1 & Antonio Alcamí1*
FULL POSTAL ADDRESS:
1 Department of Virology and Microbiology, Centro de Biología Molecular “Severo Ochoa”
(Consejo Superior de Investigaciones Científicas-Universidad Autónoma de Madrid), Nicolás
Cabrera 1, Cantoblanco, 28049 Madrid, Spain
CORRESPONDING AUTHOR:
* Antonio Alcamí. Tel. +34 911964560, Fax +34 911964420, Email aalcami@cbm.csic.es
Supporting Fig. 1. Taxonomy overview of reads contained in contigs from Lake Limnopolar
viromes. BLASTx comparison (E-value < 0.001) of contigs obtained with the Newbler assembler against a
dataset of reference viral genomes. (A) Breakdown of metagenomic reads into assembled and unassembled
(singleton) reads. (B) Taxonomic profile of reads in contigs assigned to viral families.
Supporting Fig. 2. Genomic organization of four APLVs. ORFs with >100 codons are drawn as grey
arrows. Blocks within ORFs represent different domains: green for RNA_helicase (pfam00910), yellow
for 3C_cysteine_proteinase (pfam00548), light blue for RNA_dependent_RNA_polymerase (pfam00680),
red for Rhv_like capsid domain (pfam00073), blue for Dicistroviridae VP4 protein (pfam11492) and light
green for CRPV-Capsid domain (pfam08762). Final lengths include 20 consecutive adenines at the 3´-end.
Supporting Fig. 3. Principal Component Analysis (PCA) of the dinucleotide bias of APLV14 and a comprehensive set of viruses from the order Picornavirales. Blue, red and green dots
represent values for individual viruses known to infect plants, invertebrates and microorganisms
respectively. Ellipses contain 67% of the total variance of virus groups with the same host.
Supporting Fig. 4. Genome stability of APLVs. (A) Nucleotide and amino acid differences
affecting the consensus sequence of APLV1, 2 and 3 among water and cyanobacterial mat
samples. (B) Shared SNVs among quasispecies of APLV1 considering the 50 most abundant
SNVs. (C) Ranking of the eight most abundant SNVs of APLV1 with colours indicating identical
SNVs among samples.
Supporting Fig. 5. Statistical analysis of complexity indexes. Statistical comparisons were performed
using the unpaired Mann-Whitney test in the GraphPad software package. A P-value < 0.05 was considered
significant. (A) Average SNV frequencies obtained with CLC are shown. (B) Entropy was calculated for
each position using the formula $H = -\sum_{i \in \{A,C,G,T\}} P_i \log_2 P_i$, where $P_i$ represents the probability of the ith base.
Average entropy is represented. (C) Diversity was defined as the proportion of different nucleotide states
for each position along the genome and was calculated with the formula $D = 1 - \sum_{i \in \{A,C,G,T\}} P_i^2$.
Average diversity is represented. (D) dN-dS for each codon was calculated as described in Materials and
Methods using an in-house script. Horizontal lines inside boxes represent the median and the whiskers
represent minimum and maximum values. Error bars represent standard deviation in A, B and C.
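The per-position entropy and diversity defined in panels B and C can be illustrated with a minimal sketch. It assumes that the base probabilities are estimated from the read counts of A, C, G and T at one genome position; the function names are hypothetical and this is not the authors' script.

```python
import math

BASES = "ACGT"


def base_probabilities(counts):
    """Convert per-base read counts at one position into probabilities."""
    total = sum(counts.get(b, 0) for b in BASES)
    if total == 0:
        raise ValueError("position has no coverage")
    return [counts.get(b, 0) / total for b in BASES]


def entropy(counts):
    """H = -sum(P_i * log2(P_i)) over A, C, G, T (zero-probability terms skipped)."""
    return -sum(p * math.log2(p) for p in base_probabilities(counts) if p > 0)


def diversity(counts):
    """D = 1 - sum(P_i ** 2) over A, C, G, T."""
    return 1.0 - sum(p ** 2 for p in base_probabilities(counts))


def mean_over_positions(per_position_counts, index):
    """Average an index (entropy or diversity) along the genome."""
    values = [index(c) for c in per_position_counts]
    return sum(values) / len(values)
```

For a position covered by 50 reads with A and 50 reads with G, these functions give H = 1 bit and D = 0.5; a position with a single base state gives 0 for both.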
Supporting Fig. 6. SNV calling by four alternative methods. (A) Distribution of SNVs
along the APLV1 and APLV2 genomes. As indicated, each coloured dot represents the method used to
validate SNVs forming the quasispecies of APLV1 in 2006 and APLV2 in 2007. Genome organization
schemes are drawn above the graphs to locate the SNVs. (B) The ten most abundant SNVs
called by each of the four alternative methods are shown.
Supporting Table 1. Details of sample collection.
Virome   Sample                                  Physical factors       Collecting date   Season
L2006    Freshwater (4 m depth)                  Ice-covered            27th Nov 2006     Spring
L2007    Freshwater (4 m depth)                  Open lake              22nd Jan 2007     Summer
L2010    Freshwater (4 m depth)                  Open lake              1st Feb 2010      Summer
CyaMat   Cyanobacterial mat (pool of 5 samples)  Non ice-covered mats   2nd Feb 2008      Summer
Supporting Table 2. Oligonucleotides used for scaffolding and 3´ end sequencing.
Virus   Region    Name            Sequence (5´-3´)
APLV1   3´-end*   7739F-rS9B1     GAGGTATAGCTTCAACGC
APLV1   3´-end*   7977F-rS9B1     TGATGCACGTTGTTTCGC
APLV2   Inner     4504F-rS9A3     GAGCAAGGTCAAATTTGGA
APLV2   Inner     5237R-rS9A3     CATCTTGCTCTTAGGCCC
APLV2   Inner     5493F-rS9A3     TGGTACGCTTGCTATCTG
APLV2   Inner     6556R-rS9A3     CAAGCCTACAATGCCAAG
APLV2   Inner     7114F-rS9A3     CTAGACCCTACTGAATCAC
APLV2   Inner     7839R-rS9A3     ATTGAGCCTCGCCAATAC
APLV2   3´-end*   8630F-rS9A3     TTACTCCGGCTTTCGCAG
APLV2   3´-end*   8986F-rS9A3     TTCTCAGTGATGAAGGCG
APLV3   Inner     4065F-rS9A17    GCTCGACCATATCTACGG
APLV3   Inner     4904R-rS9A17    GCCTACACGTAATACCAC
APLV3   3´-end*   7892F-rS9A17    GGAAGTGTTGGGAGAAGG
APLV3   3´-end*   8304F-rS9A17    GATACGCTTAGTGTAAGGG
APLV4   3´-end*   8088F-rS9Z121   CGACTCAAAGTGTTGAACC
APLV4   3´-end*   8360F-rS9Z121   ACGTAAGTGTACTTTTGCG
* A nested PCR with oligo(dT) was used to sequence the 3´-end region of the four APLV genomes.
Supporting Table 3. Metagenomic reads statistics and accession numbers.
                                         L2006           L2007           L2010
Input
  Reads                                  42,653          65,525          95,425
  Average length                         374 bp          325 bp          361 bp
  Mbp                                    16.0            21.3            34.4
Primer clipping and quality filtering
  Reads (%)                              37,956 (89.0)   54,727 (83.5)   83,379 (87.4)
  Average length                         359 bp          316 bp          347 bp
  Mbp (%)                                13.7 (85.9)     17.5 (82)       29.1 (84.4)
Repository
  SRA accession no.                      SRS668074       SRS668081       SRS668082
  MG-RAST id.                            4572579.3       4572623.3       4572643.3
  Metavir id.                            4488            4489            4490
Metagenomic read statistics before and after primer clipping and quality filtering. Accession numbers of
the RNA viromes in three repositories of metagenomic reads. Sequence Read Archive (SRA) numbers
correspond to Biosamples and they are included in the Bioproject SRP044919.
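The percentages of reads retained after primer clipping and quality filtering follow directly from the read counts in the table. A short sketch (the variable and function names are illustrative) reproduces them:

```python
# Read counts before and after primer clipping and quality filtering (Supporting Table 3).
reads_input = {"L2006": 42653, "L2007": 65525, "L2010": 95425}
reads_filtered = {"L2006": 37956, "L2007": 54727, "L2010": 83379}


def percent_retained(before, after):
    """Percentage of the input reads that survive filtering."""
    return 100.0 * after / before


retained = {
    virome: round(percent_retained(reads_input[virome], reads_filtered[virome]), 1)
    for virome in reads_input
}
```

The rounded results match the values given in parentheses in the table (89.0, 83.5 and 87.4).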
Supporting Table 4. Taxonomic profile of contigs.
Genome   Family or group                                L2006        L2007         L2010
         Number of contigs (% reads in contigs)         79 (72.8%)   222 (70.7%)   201 (72.4%)
         Number of viral contigs (% reads in contigs)   39 (66.6%)   60 (41.8%)    96 (60.7%)
dsDNA    Myoviridae                                     0            3             0
dsDNA    Podoviridae                                    1            0             1
dsDNA    Siphoviridae                                   1            7             0
dsDNA    Mimiviridae                                    0            2             1
dsDNA    Phycodnaviridae                                0            2             0
dsDNA    Poxviridae                                     2            1             0
         Satellites                                     1            0             0
ssDNA    Circoviridae                                   3            0             0
ssDNA    Inoviridae                                     3            1             0
ssDNA    Microviridae                                   12           3             0
ssDNA    Nanoviridae                                    1            0             0
         Unclassified                                   3            0             0
dsRNA    Totiviridae                                    0            1             3
dsRNA    Reoviridae                                     0            0             1
ssRNA    Alphatetraviridae                              0            0             1
ssRNA    Bromoviridae                                   0            0             1
ssRNA    Sobemovirus                                    0            0             1
ssRNA    Tombusviridae                                  1            0             8
ssRNA    Virgaviridae                                   0            1             0
ssRNA    Dicistroviridae                                4            4             18
ssRNA    Secoviridae                                    0            1             2
ssRNA    Picornaviridae                                 0            1             1
ssRNA    Bacillarnavirus                                4            10            15
ssRNA    Iflaviridae                                    0            1             0
ssRNA    Labyrnavirus                                   0            1             3
ssRNA    Marnaviridae                                   1            1             5
ssRNA    Environmental Picornavirales                   2            18            33
ssRNA    Unclassified ssRNA viruses                     0            2             2
Reads were assembled with Newbler 2.5.3 using 97% identity over a minimum overlap of 90%. Binning
was performed by comparison with the NCBI reference viral protein dataset using BLASTx (E-value
threshold of 10^-3), and contigs were ascribed to viral families or groups based on the best hit.
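The best-hit binning described above can be sketched as follows. The example assumes that the BLASTx results are available in the standard 12-column tabular format (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score), that the best hit is the one with the highest bit score, and that a lookup table from subject identifiers to viral families is supplied by the user. The function name and the lookup table are hypothetical; the original pipeline is not described at this level of detail.

```python
def bin_contigs(blast_lines, subject_to_family, max_evalue=1e-3):
    """Assign each contig to the family of its best BLASTx hit.

    blast_lines: iterable of tab-separated lines in 12-column tabular format.
    subject_to_family: dict mapping subject ids to a family or group name
        (hypothetical lookup table, e.g. built from the reference dataset).
    Hits with an E-value >= max_evalue are discarded.
    """
    best = {}  # contig id -> (subject id, bit score)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments and malformed lines
        contig, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (subject, bitscore)
    return {
        contig: subject_to_family.get(subject, "Unclassified")
        for contig, (subject, _) in best.items()
    }
```

Counting the values of the returned dictionary per family would give a contig tally of the kind shown in the table; contigs without any hit below the threshold are simply absent from the result.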
Supporting Table 5. BLASTp against GenBank nr of the APLVs-ORFs.
ORF            E-Value   Query   Identity   Family            Species
APLV1(ORF-1)   3E-164    27%     58%        Dicistroviridae   Nedicistrovirus TFN-2012
APLV1(ORF-2)   5E-155    99%     37%        Dicistroviridae   Nedicistrovirus TFN-2012
APLV2(ORF-1)   0         84%     40%        Environmental     JP-A
APLV2(ORF-2)   2E-118    98%     32%        Bacillarnavirus   AglaRNAV
APLV3(ORF-1)   6E-59     60%     24%        Bacillarnavirus   CtenRNAV
APLV4(ORF-1)   3E-95     69%     30%        Environmental     JP-A
APLV4(ORF-1)   1E-99     97%     33%        Bacillarnavirus   AglaRNAV
Query indicates the percentage of amino acids in the alignment region and Identity the percentage of amino
acid identity. JP-A: Marine RNA virus JP-A, AglaRNAV: Asterionellopsis glacialis RNA virus, CtenRNAV:
Chaetoceros tenuissimus RNA virus.
Supporting Table 6. Percentage of amino acid identity among conserved domains of APLVs.
Rhv domains (pfam00073)*
          APLV1-1   APLV1-2   APLV2-1   APLV2-2   APLV4-1   APLV4-2
APLV1-1   100       20.2      19.6      14.4      23.1      12.6
APLV1-2   20.2      100       17.0      25.2      20.3      23.5
APLV2-1   19.6      17.0      100       18.5      40.6      14.9
APLV2-2   14.4      25.2      18.5      100       16.3      39.4
APLV4-1   23.1      20.3      40.6      16.3      100       16.4
APLV4-2   12.6      23.5      14.9      39.4      16.4      100
RdRp domain (pfam00680)
          APLV1     APLV2     APLV3     APLV4
APLV1     100       23.6      22.4      23.0
APLV2     23.6      100       32.9      34.4
APLV3     22.4      32.9      100       29.9
APLV4     23.0      34.4      29.9      100
* There are two Rhv domains in the structural gene of APLV1, 2 and 4.
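The table does not state how the percentage of amino acid identity was derived from the domain alignments. One common convention, assumed here, is the number of identical residues divided by the number of alignment columns in which neither sequence has a gap. The sketch below implements that convention for a pair of pre-aligned sequences; the function name is hypothetical, and other denominators (for example the full alignment length) would give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are excluded
    from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

Because the measure is symmetric in its two arguments, applying it to every pair of domains yields a symmetric matrix with 100 on the diagonal, as in the table.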
Supporting Table 7. Complexity indexes of APLV1 and APLV2 viral quasispecies.
                  APLV1-L2006                                            APLV2-L2007
Variant calling   CLC      MPileup   MPileup    MPileup    CLC      MPileup   MPileup    MPileup
method                               +Realig.   +Shorah                       +Realig.   +Shorah
No. SNVs          134      266       153        118        475      453       455        255
Coverage          831      572       621        932        447      300       254        293
Quality           34.6     ND        ND         ND         35.4     ND        ND         ND
Length studied    8,333    8,315     8,315      8,150      8,877    8,600     8,600      8,397
Mean diversity    1.61     3.20      1.84       1.45       5.35     5.27      5.29       3.04
SNV frequency     2.58     2.12      2.19       3.23       11.89    9.54      7.29       14.33
Ti/Tv             11.18    2.97      7.50       0.62       4.22     4.46      7.77       4.20
Sn                20.8     37.3      21.5       22.5       208.8    163.9     135.5      123.1
Sn norm           0.0012   0.0022    0.0013     0.0014     0.0118   0.0095    0.0079     0.0073
π (%)             0.077    0.129     0.076      0.087      0.964    0.758     0.614      0.620
Complexity indexes of two viral quasispecies obtained following four distinct SNV-calling strategies. No.
SNVs: number of SNVs >1%; Coverage: average coverage for SNV sites; Length studied: number of bp with
coverage >25×; Mean diversity: % SNV sites; SNV frequency: average percentage of SNVs; Ti/Tv: ratio of
nucleotide transitions to transversions; Sn and Sn norm: Shannon index and normalized Shannon index,
respectively; π: nucleotide diversity; ND: not determined.
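Two of these indexes can be illustrated with a short sketch. It assumes that mean diversity is the number of SNV sites divided by the length studied, expressed as a percentage (this reproduces the tabulated values, for example 134 SNVs over 8,333 bp gives 1.61), and that Ti/Tv is the count of transitions (A<->G, C<->T) divided by the count of transversions. Function names are hypothetical; this is not the authors' script.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    return ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions for (ref, alt) base pairs."""
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("no transversions: ratio is undefined")
    return transitions / transversions


def mean_diversity(n_snv_sites, length_bp):
    """Percentage of the studied positions that carry an SNV."""
    return 100.0 * n_snv_sites / length_bp
```

For instance, `round(mean_diversity(455, 8600), 2)` gives 5.29, the value listed for APLV2-L2007 with MPileup+Realig.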
Supporting Table 8. Cyanobacterial Mat RNA viral contigs in Lake Limnopolar.
                                Reads
Contigs/Genomes   Length (bp)   CyaMat   L2006   L2007   L2010
C-26              3,874         6,086    0       0       0
APLV3             8,644         4,151    1       2,918   749
APLV2             9,357         2,776    13      4,219   634
C-79              2,207         2,071    0       0       1
C-130             1,628         1,165    0       0       2
C-1               8,251         676      0       8       16
C-2               7,561         485      0       44      68
ORF Taxonomy, Non-structural (in the order listed): JP-A, CtenRNAV, JP-A, CsfrRNAV, JP-A, RsRNAV
ORF Taxonomy, Structural (in the order listed): AglaRNAV, AglaRNAV, CtenRNAV, AglaRNAV
Subsamples of 25,000 reads from each virome were aligned to full-length RNA viral genomes or contigs
previously assembled from the Cyanobacterial Mat virome. ORF taxonomy was obtained by BLASTp against
the GenBank nr database. JP-A: Marine RNA virus JP-A, CtenRNAV: Chaetoceros tenuissimus RNA virus,
AglaRNAV: Asterionellopsis glacialis RNA virus, CsfrRNAV: Chaetoceros socialis f. radians RNA
virus, RsRNAV: Rhizosolenia setigera RNA virus.
Supporting Table 9. Alternative methods for global dN/dS rates estimation.
KaKs_Calculator
Virus   Sample   ORF    NG (JC69)   YN (HKY)   MYN (TN93)   MS       MA
APLV1   L2006    ORF1   0.2607      0.3567     0.4001       0.4096   0.3994
APLV1   L2006    ORF2   0.1405      0.1118     0.1120       0.1459   0.1376
APLV1   L2007    ORF1   0.1865      0.2443     0.2930       0.2998   0.2928
APLV1   L2007    ORF2   0.1759      0.1204     0.1296       0.1823   0.1766
APLV1   L2010    ORF1   0.2004      0.2624     0.3520       0.3463   0.3457
APLV1   L2010    ORF2   0.1285      0.0896     0.0935       0.1061   0.0947
APLV2   CyaMat   ORF1   0.2312      0.3445     0.3611       0.3249   0.3385
APLV2   CyaMat   ORF2   0.1705      0.2061     0.2033       0.2248   0.2228
APLV2   L2007    ORF1   0.0344      0.0406     0.0484       0.0512   0.0512
APLV2   L2007    ORF2   0.1138      0.1229     0.1206       0.1423   0.1415
APLV2   L2010    ORF1   0.0536      0.0627     0.0706       0.0735   0.0735
APLV2   L2010    ORF2   0.1160      0.1283     0.1267       0.1355   0.1382
APLV3   CyaMat   ORF1   0.1723      0.2115     0.2101       0.2343   0.2338
APLV3   L2007    ORF1   0.0601      0.0707     0.0715       0.0778   0.0773
APLV3   L2010    ORF1   0.0978      0.1088     0.1189       0.1214   0.1216
APLV4   L2010    ORF1   0.1637      0.1659     0.2062       0.1813   0.1813
APLV4   L2010    ORF2   0.2812      0.2462     0.2818       0.2641   0.2607
HyPhy
Virus   Sample   ORF    Average (HKY)   SD (HKY)   Average (TN93)   SD (TN93)   Average (F81)   SD (F81)   Average (GTR)   SD (GTR)
APLV1   L2006    ORF1   0.4847          0.0240     0.6271           0.0322      0.2850          0.0162     0.6191          0.0316
APLV1   L2006    ORF2   0.2102          0.0260     0.2537           0.0312      0.1556          0.0251     0.2595          0.0313
APLV1   L2007    ORF1   0.4785          0.0395     0.6668           0.0507      0.2835          0.0249     0.6612          0.0491
APLV1   L2007    ORF2   0.1009          0.0177     0.1443           0.0253      0.0729          0.0161     0.1445          0.0346
APLV1   L2010    ORF1   0.4672          0.0325     0.6756           0.0434      0.2769          0.0177     0.6743          0.0470
APLV1   L2010    ORF2   0.1377          0.0198     0.1825           0.0271      0.1129          0.0215     0.1871          0.0257
APLV2   CyaMat   ORF1   0.3187          0.0147     0.3908           0.0172      0.1930          0.0080     0.3946          0.0160
APLV2   CyaMat   ORF2   0.3444          0.0346     0.3546           0.0385      0.2689          0.0285     0.3538          0.0368
APLV2   L2007    ORF1   0.0347          0.0003     0.0485           0.0005      0.0215          0.0003     0.0491          0.0005
APLV2   L2007    ORF2   0.1600          0.0015     0.1756           0.0017      0.1263          0.0014     0.1783          0.0016
APLV2   L2010    ORF1   0.0535          0.0004     0.0702           0.0005      0.0340          0.0002     0.0710          0.0005
APLV2   L2010    ORF2   0.1412          0.0014     0.1523           0.0013      0.1095          0.0011     0.1542          0.0015
APLV3   CyaMat   ORF1   0.1173          0.0019     0.1337           0.0021      0.0803          0.0012     0.1336          0.0043
APLV3   L2007    ORF1   0.0683          0.0006     0.0788           0.0008      0.0482          0.0004     0.0790          0.0007
APLV3   L2010    ORF1   0.0839          0.0007     0.0977           0.0007      0.0593          0.0004     0.0978          0.0009
APLV4   L2010    ORF1   0.0963          0.0014     0.1152           0.0019      0.0703          0.0011     0.1169          0.0015
APLV4   L2010    ORF2   0.2207          0.0114     0.2630           0.0147      0.1847          0.0107     0.2755          0.0152
In-house script
Virus   Sample   ORF    JC69     HKY      F81      TN93     GTR
APLV1   L2006    ORF1   0.2942   0.2941   0.2941   0.2925   0.2929
APLV1   L2006    ORF2   0.1517   0.1518   0.1518   0.1518   0.1518
APLV1   L2007    ORF1   0.2162   0.2162   0.2162   0.2149   0.2151
APLV1   L2007    ORF2   0.1719   0.1719   0.1719   0.1719   0.1719
APLV1   L2010    ORF1   0.2304   0.2304   0.2304   0.2304   0.2304
APLV1   L2010    ORF2   0.1260   0.1260   0.1260   0.1260   0.1260
APLV2   CyaMat   ORF1   0.2513   0.2513   0.2513   0.2513   0.2513
APLV2   CyaMat   ORF2   0.1792   0.1792   0.1792   0.1792   0.1792
APLV2   L2007    ORF1   0.0472   0.0470   0.0472   0.0468   0.0468
APLV2   L2007    ORF2   0.1256   0.1241   0.1256   0.1236   0.1228
APLV2   L2010    ORF1   0.0691   0.0688   0.0691   0.0689   0.0687
APLV2   L2010    ORF2   0.1295   0.1287   0.1294   0.1279   0.1276
APLV3   CyaMat   ORF1   0.1936   0.1936   0.1936   0.1936   0.1936
APLV3   L2007    ORF1   0.0713   0.0712   0.0714   0.0711   0.0711
APLV3   L2010    ORF1   0.1150   0.1153   0.1150   0.1152   0.1155
APLV4   L2010    ORF1   0.1853   0.1845   0.1855   0.1847   0.1840
APLV4   L2010    ORF2   0.3151   0.3199   0.3139   0.3191   0.3232
Global dN/dS rates estimated by three alternative methods that implement several nucleotide substitution models. Information regarding the
KaKs_Calculator substitution matrices can be retrieved from http://services.cbu.uib.no/tools/kaks. dN/dS values estimated with the HyPhy
package from simulated reads are the average of 100 replicates; the standard deviation is also provided.
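The HyPhy rows report an average and a standard deviation over 100 replicates. As a minimal sketch of that summary step, the function below takes a list of per-replicate dN/dS estimates and returns both statistics. It assumes the sample standard deviation (n - 1 denominator); the source does not say whether the sample or the population formula was used, and the function name is hypothetical.

```python
import statistics


def summarize_replicates(dnds_values):
    """Return (mean, sample standard deviation) of per-replicate dN/dS values.

    Assumption: the standard deviation uses the n - 1 denominator.
    """
    if len(dnds_values) < 2:
        raise ValueError("at least two replicates are needed")
    return statistics.mean(dnds_values), statistics.stdev(dnds_values)
```

With `statistics.pstdev` in place of `statistics.stdev`, the same function would return the population standard deviation instead.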