Additional File 3

Additional file 3: Sequence differences between the B. thailandensis E264 and
ATCC700388 genomes
We compared the genome sequences of two distinct Bt isolates: strain E264, sequenced at
The Institute for Genomic Research (TIGR), and strain ATCC700388, sequenced at the
Broad Institute (BI). These two genome sequences are referred to as the TIGR and BI
sequences, respectively. As the BI sequence was not assembled to closure, it contains
44 gaps (29 in Chr 1, 15 in Chr 2). Our comparison is confined to regions that could be
confidently matched between the two genomes; regions falling within these sequence gaps
were therefore deliberately excluded.
1. Large-scale differences between the genomes.
Using methods described in the Main Text, we aligned the TIGR and BI sequences and
visualized the alignment in the form of a dot-matrix diagram.
Genomic alignments of Chr 1 (left) and Chr 2 (right). The x-axis depicts the TIGR sequence, while the y-axis depicts the BI sequence.
The dot-matrix shows that a large-scale inversion of about 2 million bp has occurred in
Chr 1. Using the TIGR sequence as a reference, the inversion stretches from position
12442442 (BTH_I1099) to position 3328461 (BTH_I2895).
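The alignment procedure itself is described in the Main Text and is not reproduced here. As a rough illustration of why an inversion appears as an anti-diagonal segment in a dot-matrix, the following Python sketch computes the dots from exact shared k-mers on a toy sequence. The function names (`revcomp`, `invert`, `kmer_dots`), the word size `k`, and the toy sequence are assumptions made for illustration only; they are unrelated to the software or parameters used for the TIGR and BI comparison.

```python
from collections import defaultdict

_COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(_COMP[base] for base in reversed(seq))


def invert(seq, a, b):
    """Copy of seq with the segment seq[a:b] replaced by its reverse complement."""
    return seq[:a] + revcomp(seq[a:b]) + seq[b:]


def kmer_dots(ref, qry, k):
    """Return (forward, reverse) sets of (x, y) dots for exact shared k-mers.

    x is a 0-based position in ref, y a 0-based position in qry.
    Forward dots mark identical words; reverse dots mark words in qry
    whose reverse complement occurs in ref.
    """
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    forward, reverse = set(), set()
    for j in range(len(qry) - k + 1):
        word = qry[j:j + k]
        forward.update((i, j) for i in index.get(word, ()))
        reverse.update((i, j) for i in index.get(revcomp(word), ()))
    return forward, reverse


# Toy example (hypothetical sequence): invert positions 10..29 of the reference.
ref = "ATGGCGTACCTTGACGGATCCATTGCAAGTCCGGATAACG"
a, b, k = 10, 30, 6
qry = invert(ref, a, b)
forward, reverse = kmer_dots(ref, qry, k)

# Unchanged flanks give forward dots on the main diagonal (x == y).
# The inverted block gives reverse dots on an anti-diagonal where
# x + y is constant (a + b - k).
anti_diagonal = sorted(d for d in reverse if d[0] + d[1] == a + b - k)
print(len(forward), len(reverse), len(anti_diagonal))
```

Plotting the forward and reverse dots in two colours reproduces the familiar picture: the flanks follow the main diagonal and the inverted block runs perpendicular to it. Exact k-mer matching is only a stand-in here; whole-genome comparisons normally rely on a dedicated aligner.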
2. Comparison at the CD level
All 5645 CDs in the TIGR sequence (3282 in chromosome 1 and 2363 in chromosome 2)
were compared to the BI sequence. We identified 4 CDs that had no clear match in the
BI sequence and that did not lie within or close to sequence gaps.
Notably, BTH_I1485 and BTH_I1486 encode components of a Type-II
oligopolysaccharide biosynthesis gene cluster. We experimentally confirmed the absence
of these two genes in the BI sequence using PCR assays (P.T., data not shown).
3. Comparison at the nucleotide level
We identified a total of 218 sequence polymorphisms between the TIGR and BI
sequences, covering both single nucleotide polymorphisms (SNPs) and small
insertions/deletions (indels). Of these, 138 were on Chr 1 and 80 on Chr 2, and 80
were predicted to alter the protein sequence. To confirm these potential sequence
differences, we selected 66 polymorphisms within 37 CDs for confirmatory resequencing
(33 polymorphisms from 15 ORFs in chromosome 1 and 33 polymorphisms from 22 ORFs in
chromosome 2). Resequencing positively confirmed only 1 polymorphism; another 9
appeared to be genuine but require a further round of resequencing for confirmation.
The remaining 56 polymorphisms were either false (no such polymorphism was found,
indicating a sequencing error) or could not be confirmed owing to poor sequence
quality. These results are summarized in the following table.
Type of polymorphism       Number of polymorphisms
                           Chromosome 1   Chromosome 2
Nucleotide substitution    21             18
    Confirmed              1              0
    Likely*                7              2
    False                  9              4
    Poor data quality      4              12
Insertion                  11             14
    Confirmed              0              0
    Likely*                0              0
    False                  11             13
    Poor data quality      0              1
Deletion                   1              1
    Confirmed              0              0
    Likely*                0              0
    False                  1              1
    Poor data quality      0              0
Total                      33             33
*: “Likely” denotes that the polymorphism is likely to be true but needs further confirmation.
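The counts in the table can be cross-checked against the figures quoted in the text (33 resequenced polymorphisms per chromosome, 1 confirmed, 9 likely). The Python sketch below does this arithmetic; it assumes the per-chromosome cell assignment written out in the `table` dictionary, and the names `table`, `totals_by_type` and `totals_by_outcome` are introduced here for illustration only.

```python
# Resequencing outcomes as (Chromosome 1, Chromosome 2) counts.
# Assumption: cell assignment as read from the summary table.
OUTCOMES = ("Confirmed", "Likely", "False", "Poor data quality")

table = {
    "Nucleotide substitution": {
        "Confirmed": (1, 0), "Likely": (7, 2),
        "False": (9, 4), "Poor data quality": (4, 12),
    },
    "Insertion": {
        "Confirmed": (0, 0), "Likely": (0, 0),
        "False": (11, 13), "Poor data quality": (0, 1),
    },
    "Deletion": {
        "Confirmed": (0, 0), "Likely": (0, 0),
        "False": (1, 1), "Poor data quality": (0, 0),
    },
}


def totals_by_type(table):
    """Sum the four outcomes for each polymorphism type, per chromosome."""
    return {
        kind: tuple(sum(rows[o][c] for o in OUTCOMES) for c in (0, 1))
        for kind, rows in table.items()
    }


def totals_by_outcome(table):
    """Sum each outcome over all polymorphism types and both chromosomes."""
    return {o: sum(sum(rows[o]) for rows in table.values()) for o in OUTCOMES}


by_type = totals_by_type(table)
by_outcome = totals_by_outcome(table)
per_chromosome = tuple(sum(t[c] for t in by_type.values()) for c in (0, 1))

print(by_type)         # per-type totals for (Chr 1, Chr 2)
print(by_outcome)      # outcome totals over both chromosomes
print(per_chromosome)  # grand totals for (Chr 1, Chr 2)
```

Under this reading, the "False" and "Poor data quality" outcomes together account for the 56 polymorphisms that were neither confirmed nor classed as likely.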
Genes Absent in ATCC700388 but Present in E264
Gene Name   Annotation                                    Start    End      Strand  Length  COG-Long Desc
BTH_I1485   UDP-glucose 4-epimerase                       1683400  1684422  -1      1023    UDP-glucose 4-epimerase
BTH_I1486   glycosyl transferase, group 1 family protein  1684437  1685573  -1      1137    Glycosyltransferase
BTH_I1593   hypothetical protein                          1792124  1792396  -1      273     NA
BTH_II2367  NADPH-dependent FMN reductase domain protein  2905638  2906195  -1      558     Predicted flavoprotein
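The Length column is consistent with inclusive coordinates, that is, Length = End - Start + 1. The short Python check below illustrates this for the four genes listed above. That the coordinates are 1-based is an assumption (a common convention that this file does not state), and `span_length` is a helper name introduced here for illustration.

```python
# (gene, start, end, strand, length) as listed for the genes absent in ATCC700388.
genes = [
    ("BTH_I1485", 1683400, 1684422, -1, 1023),
    ("BTH_I1486", 1684437, 1685573, -1, 1137),
    ("BTH_I1593", 1792124, 1792396, -1, 273),
    ("BTH_II2367", 2905638, 2906195, -1, 558),
]


def span_length(start, end):
    """Length of a feature whose start and end coordinates are both inclusive."""
    return end - start + 1


for name, start, end, strand, length in genes:
    # A complete coding sequence should span a whole number of codons.
    codons, remainder = divmod(span_length(start, end), 3)
    print(name, span_length(start, end) == length, codons, remainder)
```

The same relation holds for the coordinates in the table of genes with putative protein-altering polymorphisms.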
Genes with Putative Protein Altering Polymorphisms between ATCC700388 and E264
Gene Name   Resequenced?  Start    End      Strand  Gene Length  Conclusion from resequencing     Annotation
BTH_II1674  No            2011510  2028396  -1      16887        NA                               polyketide synthase
BTH_I1838   No            2065457  2069599  1       4143         NA                               ATP-dependent helicase HrpA
BTH_II1664  No            1953925  1960173  -1      6249         NA                               polyketide synthase, putative
BTH_I1948   Yes           2195291  2197174  1       1884         Sequencing error in ATCC700388   outer membrane autotransporter domain protein
BTH_II1080  No            1257179  1258840  1       1662         NA                               RND efflux system, outer membrane lipoprotein, NodT family
BTH_II1144  Yes           1332457  1334064  1       1608         Sequencing error in E264         threonine ammonia-lyase, biosynthetic
BTH_II1876  No            2273440  2275101  -1      1662         NA                               RND efflux system, outer membrane lipoprotein, NodT family
BTH_II1286  No            1530903  1532534  -1      1632         NA                               RND efflux system, outer membrane lipoprotein, NodT family
BTH_II1291  Yes           1539483  1540916  1       1434         Sequencing error in E264         HlyD family secretion protein
BTH_II0112  No            127466   129949   1       2484         NA                               Hep_Hag family
BTH_II0834  Yes           975515   976726   1       1212         Sequencing error in ATCC700388   BsaU protein
BTH_II1933  No            2352051  2353157  -1      1107         NA                               conserved hypothetical protein
BTH_I1057   No            1200939  1201958  1       1020         NA                               oxidoreductase, zinc-binding dehydrogenase family protein
BTH_I3118   No            3556186  3557346  1       1161         NA                               ISBma1, transposase
BTH_II0682  Yes           798103   798975   1       873          Sequencing error in ATCC700388   acetyl-CoA carboxylase, carboxyl transferase, beta subunit
BTH_I2788   No            3204365  3205171  -1      807          NA                               hydroxyacylglutathione hydrolase
BTH_II0241  No            292278   293852   -1      1575         NA                               major facilitator family transporter
BTH_II1409  Yes           1661008  1661721  -1      714          Sequencing error in ATCC700388   histidine ABC transporter, permease protein
BTH_I2892   Yes           3326258  3327526  1       1269         Sequencing error in ATCC700388   transposase, Mutator family
BTH_II0214  No            256014   257702   -1      1689         NA                               ATP-dependent RNA helicase DbpA
BTH_II0626  No            731844   733061   -1      1218         NA                               Acyltransferase family, putative
BTH_I2962   No            3409765  3410268  -1      504          NA                               unnamed protein product; Similar to Hcp protein
BTH_II2289  Yes           2805466  2805993  1       528          Weak signal in chromatogram      conserved hypothetical protein
BTH_I0897   No            1023583  1024923  1       1341         NA                               ABC transporter, ATP-binding protein
BTH_I3110   No            3545624  3547894  -1      2271         NA                               NADP-dependent malic enzyme
BTH_I3244   No            3697762  3699030  -1      1269         NA                               transposase, Mutator family
BTH_I1724   No            1932194  1932796  1       603          NA                               sigma factor algU regulatory protein MucA, putative
BTH_I0505   No            559553   561472   -1      1920         NA                               transposase
BTH_I2580   No            2942786  2943994  1       1209         NA                               lipoprotein, putative
BTH_I2335   Yes           2628295  2630208  1       1914         Sequencing error in ATCC700388   alcohol dehydrogenase, iron-containing
BTH_I2663   No            3038953  3039759  1       807          NA                               lipoprotein, putative
BTH_II1545  No            1815390  1816652  1       1263         NA                               ISBma3, transposase, truncation
BTH_I3271   No            3730425  3731204  1       780          NA                               conserved hypothetical protein
BTH_II2167  No            2664332  2665588  -1      1257         NA                               conserved hypothetical protein
BTH_I0770   No            885342   888179   -1      2838         NA                               isoleucyl-tRNA synthetase
BTH_I0733   No            842829   843296   1       468          NA                               conserved hypothetical protein
BTH_I0079   No            85638    85793    1       156          NA                               lipoprotein, putative
BTH_II2048  No            2496382  2496927  -1      546          NA                               conserved hypothetical protein
BTH_I1885   No            2130301  2131995  -1      1695         NA                               single-stranded-DNA-specific exonuclease RecJ
BTH_II2268  No            2781243  2783282  1       2040         NA                               Bacterial type II and III secretion system protein domain protein
BTH_II0721  No            845217   846821   -1      1605         NA                               glutamyl-tRNA, putative
BTH_I0188   No            219016   219486   1       471          NA                               conserved domain protein
BTH_I3064   No            3498799  3499074  -1      276          NA                               ribosomal protein S19
BTH_I0887   No            1013646  1014413  -1      768          NA                               amino acid ABC transporter, ATP-binding protein
BTH_I1928   No            2171510  2171935  -1      426          NA                               PAAR motif family
BTH_II1123  No            1306459  1307160  1       702          NA                               O-methyltransferase family protein
BTH_II0725  Yes           852095   854446   1       2352         Sequencing error in ATCC700388   cytochrome P450
BTH_I0628   Yes           722822   724762   -1      1941         Sequencing error in ATCC700388   Bacterial protein of unknown function (DUF879) superfamily
BTH_I1198   No            1345311  1345781  -1      471          NA                               cyanate hydratase
BTH_II0995  Yes           1180738  1182210  1       1473         Sequencing error in ATCC700388   L-serine ammonia-lyase
BTH_I0792   No            909071   910102   -1      1032         NA                               CDP-6-deoxy-delta-3,4-glucoseen reductase, putative
BTH_I1594   No            1792799  1793002  1       204          NA                               cold-shock domain family protein-related protein
BTH_I2162   No            2440075  2441139  1       1065         NA                               glycosyl transferase, group 1 family protein
BTH_II0730  No            858567   860987   -1      2421         NA                               lectin repeat domain protein
BTH_I1886   No            2132011  2133153  -1      1143         NA                               conserved hypothetical protein
BTH_I1675   No            1883709  1885598  -1      1890         NA                               sigma-54 dependent DNA-binding transcriptional regulator
BTH_II1934  No            2353157  2354038  -1      882          NA                               conserved hypothetical protein
BTH_I0164   No            195371   196714   -1      1344         NA                               heat shock protein HslVU, ATPase subunit HslU
BTH_I1237   No            1385359  1386063  1       705          NA                               glutathione S-transferase, putative
BTH_I2413   No            2747430  2748002  1       573          NA                               conserved hypothetical protein
BTH_I1853   No            2086546  2088099  -1      1554         NA                               nitrate reductase, beta subunit
BTH_I2951   No            3394070  3395035  -1      966          NA                               hypothetical protein
BTH_I2842   No            3259631  3260546  -1      916          NA                               conserved hypothetical protein, frameshift
BTH_II1010  Yes           1196765  1197961  -1      1197         Sequencing error in ATCC700388   major facilitator family transporter
BTH_II1645  No            1929535  1929744  -1      210          NA                               hypothetical protein
BTH_II2273  No            2785938  2787146  1       1209         NA                               PUTATIVE PILUS ASSEMBLY PROTEIN
BTH_I2750   No            3161084  3161914  -1      831          NA                               2.7.1.66
BTH_II0894  No            1048640  1049548  -1      909          NA                               transcriptional regulator, LysR family