Additional file 3: Sequence differences between the B. thailandensis E264 and ATCC700388 genomes

We compared the genome sequences of two distinct isolates: B. thailandensis (Bt) strain E264, sequenced at The Institute for Genomic Research (TIGR), and Bt strain ATCC700388, sequenced at the Broad Institute (BI). These two genome sequences are referred to as the TIGR and BI sequences. Because the BI sequence was not assembled to closure, it contains 44 gaps (29 in Chr 1, 15 in Chr 2). Our comparison is confined to regions that could be confidently matched between the two genomes; regions falling within these sequence gaps were therefore deliberately ignored.

1. Large-scale differences between the genomes.

Using methods described in the Main Text, we aligned the TIGR and BI sequences and visualized the alignment in the form of a dot-matrix diagram.

Genomic alignments of Chr 1 (left) and Chr 2 (right). The x-axis depicts the TIGR sequence, while the y-axis depicts the BI sequence.

The alignment shows that a large-scale inversion of about 2 million bp has occurred in Chr 1. Using the TIGR sequence as a reference, the inversion stretches from position 12442442 (BTH_I1099) to position 3328461 (BTH_I2895).

2. Comparison at the CD level

All 5645 CDs in the TIGR sequence (3282 in chromosome 1 and 2363 in chromosome 2) were compared to the BI sequence. We identified 4 CDs that had no clear match in the BI sequence and that did not lie within or close to sequence gaps. Notably, BTH_I1485 and BTH_I1486 encode components of a Type-II oligopolysaccharide biosynthesis gene cluster. We experimentally confirmed the absence of these two genes in the BI sequence using PCR assays (P.T., data not shown).

3. Comparison at the nucleotide level

We identified a total of 218 sequence polymorphisms between the TIGR and BI sequences, covering both single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). Of these polymorphisms, 138 were on Chr 1 and 80 on Chr 2.
Eighty of these polymorphisms were predicted to cause alterations in protein sequence. To confirm these potential sequence differences, we selected 66 polymorphisms within 37 CDs for confirmatory resequencing (33 polymorphisms from 15 ORFs in chromosome 1 and 33 polymorphisms from 22 ORFs in chromosome 2). Resequencing positively confirmed only 1 polymorphism; another 9 polymorphisms appeared to be potentially genuine, but they require another round of confirmation by resequencing. The remaining polymorphisms were either false (no such polymorphism, i.e. a sequencing error) or could not be confirmed because of poor sequence quality. These results are summarized in the following table.

Type of polymorphism        Chromosome 1  Chromosome 2
Nucleotide substitution     21            18
  Confirmed                 1             0
  Likely*                   7             2
  False                     9             4
  Poor data quality         4             12
Insertion                   11            14
  Confirmed                 0             0
  Likely*                   0             0
  False                     11            13
  Poor data quality         0             1
Deletion                    1             1
  Confirmed                 0             0
  Likely*                   0             0
  False                     1             1
  Poor data quality         0             0
Total                       33            33
Number of ORF               15            22

*: "Likely" denotes that the polymorphism is likely to be true but needs further confirmation.
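The arithmetic behind the resequencing summary can be checked with a short script. The following is a minimal sketch, not part of the original analysis: the dictionary layout and the names `OUTCOMES`, `counts`, `totals_by_outcome` and `total_per_chromosome` are illustrative assumptions, and the counts are our reading of the summary table (per-chromosome outcomes for each polymorphism type).

```python
# Outcome categories, in the order used for each tuple below.
OUTCOMES = ("confirmed", "likely", "false", "poor_quality")

# Counts as read from the resequencing summary table (assumed row assignment).
counts = {
    "Chr 1": {
        "substitution": (1, 7, 9, 4),
        "insertion": (0, 0, 11, 0),
        "deletion": (0, 0, 1, 0),
    },
    "Chr 2": {
        "substitution": (0, 2, 4, 12),
        "insertion": (0, 0, 13, 1),
        "deletion": (0, 0, 1, 0),
    },
}


def totals_by_outcome(table):
    """Sum each outcome category over all chromosomes and polymorphism types."""
    sums = dict.fromkeys(OUTCOMES, 0)
    for by_type in table.values():
        for row in by_type.values():
            for outcome, n in zip(OUTCOMES, row):
                sums[outcome] += n
    return sums


def total_per_chromosome(table):
    """Total number of resequenced polymorphisms on each chromosome."""
    return {
        chrom: sum(sum(row) for row in by_type.values())
        for chrom, by_type in table.items()
    }


summary = totals_by_outcome(counts)
per_chrom = total_per_chromosome(counts)
print(summary)
print(per_chrom)
```

Under this reading, the per-chromosome totals come to 33 each (66 overall), with 1 confirmed and 9 likely polymorphisms, which agrees with the figures quoted in the text.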
Genes Absent in ATCC700388 but Present in E264

Gene Name   Annotation                                    Start    End      Strand  Length  COG-Long Desc
BTH_I1485   UDP-glucose 4-epimerase                       1683400  1684422  -1      1023    UDP-glucose 4-epimerase
BTH_I1486   glycosyl transferase, group 1 family protein  1684437  1685573  -1      1137    Glycosyltransferase
BTH_I1593   hypothetical protein                          1792124  1792396  -1      273     NA
BTH_II2367  NADPH-dependent FMN reductase domain protein  2905638  2906195  -1      558     Predicted flavoprotein

Genes with Putative Protein Altering Polymorphisms between ATCC700388 and E264

Gene Name   Resequenced?  Start    End      Strand  Gene Length  Conclusion from resequencing    Annotation
BTH_II1674  No            2011510  2028396  -1      16887        NA                              polyketide synthase
BTH_I1838   No            2065457  2069599  1       4143         NA                              ATP-dependent helicase HrpA
BTH_II1664  No            1953925  1960173  -1      6249         NA                              polyketide synthase, putative
BTH_I1948   Yes           2195291  2197174  1       1884         Sequencing error in ATCC700388  outer membrane autotransporter domain protein
BTH_II1080  No            1257179  1258840  1       1662         NA                              RND efflux system, outer membrane lipoprotein, NodT family
BTH_II1144  Yes           1332457  1334064  1       1608         Sequencing error in E264        threonine ammonia-lyase, biosynthetic
BTH_II1876  No            2273440  2275101  -1      1662         NA                              RND efflux system, outer membrane lipoprotein, NodT family
BTH_II1286  No            1530903  1532534  -1      1632         NA                              RND efflux system, outer membrane lipoprotein, NodT family
BTH_II1291  Yes           1539483  1540916  1       1434         Sequencing error in E264        HlyD family secretion protein
BTH_II0112  No            127466   129949   1       2484         NA                              Hep_Hag family
BTH_II0834  Yes           975515   976726   1       1212         Sequencing error in ATCC700388  BsaU protein
BTH_II1933  No            2352051  2353157  -1      1107         NA                              conserved hypothetical protein
BTH_I1057   No            1200939  1201958  1       1020         NA                              oxidoreductase, zinc-binding dehydrogenase family protein
BTH_I3118   No            3556186  3557346  1       1161         NA                              ISBma1, transposase
BTH_II0682  Yes           798103   798975   1       873          Sequencing error in ATCC700388  acetyl-CoA carboxylase, carboxyl transferase, beta subunit
BTH_I2788   No            3204365  3205171  -1      807          NA                              hydroxyacylglutathione hydrolase
BTH_II0241  No            292278   293852   -1      1575         NA                              major facilitator family transporter
BTH_II1409  Yes           1661008  1661721  -1      714          Sequencing error in ATCC700388  histidine ABC transporter, permease protein
BTH_I2892   Yes           3326258  3327526  1       1269         Sequencing error in ATCC700388  transposase, Mutator family
BTH_II0214  No            256014   257702   -1      1689         NA                              ATP-dependent RNA helicase DbpA
BTH_II0626  No            731844   733061   -1      1218         NA                              Acyltransferase family, putative
BTH_I2962   No            3409765  3410268  -1      504          NA                              unnamed protein product; Similar to Hcp protein
BTH_II2289  Yes           2805466  2805993  1       528          Weak signal in chromatogram     conserved hypothetical protein
BTH_I0897   No            1023583  1024923  1       1341         NA                              ABC transporter, ATP-binding protein
BTH_I3110   No            3545624  3547894  -1      2271         NA                              NADP-dependent malic enzyme
BTH_I3244   No            3697762  3699030  -1      1269         NA                              transposase, Mutator family protein
BTH_I1724   No            1932194  1932796  1       603          NA                              sigma factor algU regulatory MucA, putative
BTH_I0505   No            559553   561472   -1      1920         NA                              transposase
BTH_I2580   No            2942786  2943994  1       1209         NA                              lipoprotein, putative
BTH_I2335   Yes           2628295  2630208  1       1914         Sequencing error in ATCC700388  alcohol dehydrogenase, iron-containing
BTH_I2663   No            3038953  3039759  1       807          NA                              lipoprotein, putative
BTH_II1545  No            1815390  1816652  1       1263         NA                              ISBma3, transposase, truncation
BTH_I3271   No            3730425  3731204  1       780          NA                              conserved hypothetical protein
BTH_II2167  No            2664332  2665588  -1      1257         NA                              conserved hypothetical protein
BTH_I0770   No            885342   888179   -1      2838         NA                              isoleucyl-tRNA synthetase
BTH_I0733   No            842829   843296   1       468          NA                              conserved hypothetical protein
BTH_I0079   No            85638    85793    1       156          NA                              lipoprotein, putative
BTH_II2048  No            2496382  2496927  -1      546          NA                              conserved hypothetical protein
BTH_I1885   No            2130301  2131995  -1      1695         NA                              single-stranded-DNA-specific exonuclease RecJ
BTH_II2268  No            2781243  2783282  1       2040         NA                              Bacterial type II and III secretion system protein domain protein
BTH_II0721  No            845217   846821   -1      1605         NA                              glutamyl-tRNA, putative
BTH_I0188   No            219016   219486   1       471          NA                              conserved domain protein
BTH_I3064   No            3498799  3499074  -1      276          NA                              ribosomal protein S19
BTH_I0887   No            1013646  1014413  -1      768          NA                              amino acid ABC transporter, ATP-binding protein
BTH_I1928   No            2171510  2171935  -1      426          NA                              PAAR motif family
BTH_II1123  No            1306459  1307160  1       702          NA                              O-methyltransferase family protein
BTH_II0725  Yes           852095   854446   1       2352         Sequencing error in ATCC700388  cytochrome P450
BTH_I0628   Yes           722822   724762   -1      1941         Sequencing error in ATCC700388  Bacterial protein of unknown function (DUF879) superfamily
BTH_I1198   No            1345311  1345781  -1      471          NA                              cyanate hydratase
BTH_II0995  Yes           1180738  1182210  1       1473         Sequencing error in ATCC700388  L-serine ammonia-lyase
BTH_I0792   No            909071   910102   -1      1032         NA                              CDP-6-deoxy-delta-3,4-glucoseen reductase, putative
BTH_I1594   No            1792799  1793002  1       204          NA                              cold-shock domain family protein-related protein
BTH_I2162   No            2440075  2441139  1       1065         NA                              glycosyl transferase, group 1 family protein
BTH_II0730  No            858567   860987   -1      2421         NA                              lectin repeat domain protein
BTH_I1886   No            2132011  2133153  -1      1143         NA                              conserved hypothetical protein
BTH_I1675   No            1883709  1885598  -1      1890         NA                              sigma-54 dependent DNA-binding transcriptional regulator
BTH_II1934  No            2353157  2354038  -1      882          NA                              conserved hypothetical protein
BTH_I0164   No            195371   196714   -1      1344         NA                              heat shock protein HslVU, ATPase subunit HslU
BTH_I1237   No            1385359  1386063  1       705          NA                              glutathione S-transferase, putative
BTH_I2413   No            2747430  2748002  1       573          NA                              conserved hypothetical protein
BTH_I1853   No            2086546  2088099  -1      1554         NA                              nitrate reductase, beta subunit
BTH_I2951   No            3394070  3395035  -1      966          NA                              hypothetical protein
BTH_I2842   No            3259631  3260546  -1      916          NA                              conserved hypothetical protein, frameshift
BTH_II1010  Yes           1196765  1197961  -1      1197         Sequencing error in ATCC700388  major facilitator family transporter
BTH_II1645  No            1929535  1929744  -1      210          NA                              hypothetical protein
BTH_II2273  No            2785938  2787146  1       1209         NA                              PUTATIVE PILUS ASSEMBLY PROTEIN
BTH_I2750   No            3161084  3161914  -1      831          NA                              2.7.1.66
BTH_II0894  No            1048640  1049548  -1      909          NA                              transcriptional regulator, LysR family
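
The Start, End and Gene Length columns of the gene tables can be cross-checked against each other. The sketch below is illustrative only: it assumes 1-based inclusive coordinates (so length = End - Start + 1), and the `Gene` tuple, the `span` and `check` helpers, and the choice of three example rows are our own. The pairing of BTH_I2842 with its coordinates assumes the rows follow the order in which the gene names are listed.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    length: int  # value reported in the "Length" / "Gene Length" column


# Three example rows taken from the tables (assumed row pairing).
genes = [
    Gene("BTH_I1485", 1683400, 1684422, 1023),
    Gene("BTH_II1674", 2011510, 2028396, 16887),
    Gene("BTH_I2842", 3259631, 3260546, 916),
]


def span(gene):
    """Length implied by the coordinates, assuming inclusive 1-based positions."""
    return gene.end - gene.start + 1


def check(gene_list):
    """Report whether each length matches its coordinates and is a whole number of codons."""
    return {
        g.name: {
            "length_matches": span(g) == g.length,
            "whole_codons": g.length % 3 == 0,
        }
        for g in gene_list
    }


report = check(genes)
for name, result in report.items():
    print(name, result)
```

For these three rows the reported lengths agree with the coordinates. Only BTH_I2842 (916 bp) is not a multiple of three, which is consistent with its "frameshift" annotation.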