Genomes integrated  Replicon  Size    Predicted      Automatic      Manual expert   Artefacts (c)
in the ColiScope              (Mb)    protein genes  annotation     annotation (b)
                                                     transfer (a)
E. coli S88         Chr.      5.032   5086           2909           1950            227
                    Plas.     0.134   157            -              144             13
E. coli UMN026      Chr.      5.202   5048           3534           1384            130
                    Plas1     0.122   160            -              149             11
                    Plas2     0.034   50             -              49              1
E. coli IAI1        Chr.      4.701   4627           3658           833             136
E. coli ED1a        Chr.      5.209   5275           3833           1296            146
                    Plas.     0.120   153            -              150             3
E. coli 55989       Chr.      5.155   5065           3841           1128            96
                    Plas.     0.072   106            -              100             6
E. coli IAI39       Chr.      5.132   4935           4225           681             29
E. fergusonii       Chr.      4.589   4501           2481           1855            165
                    Plas.     0.055   58             -              54              4
TOTAL ANNOTATION EFFORT:                                            9776
Supplementary Table 2A. Number of predicted protein-encoding genes in the genomes of the newly sequenced strains of Escherichia coli and E. fergusonii.
Genes were (a) functionally annotated by automatic annotation transfer from K-12 MG1655 orthologs or from other manually annotated ColiScope orthologs, (b) manually annotated using the MaGe web-based graphical interface, or (c) considered false-positive gene predictions.
Chr.: chromosome
Plas.: plasmid
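As a quick consistency check on Table 2A, the three annotation categories appear to partition the predicted genes of each replicon: predicted = automatic transfer + manual annotation + artefacts. The Python sketch below re-enters the counts from the table and tests that identity. Treating the empty automatic-transfer cells of the plasmid rows as zero is an assumption, and `partition_mismatches` is an illustrative name, not part of MaGe or ColiScope.

```python
# Counts copied from Supplementary Table 2A:
# (genome, replicon, predicted, automatic, manual, artefacts).
# Plasmid rows have no automatic-transfer value in the table;
# they are entered as 0 here (assumption).
ROWS_2A = [
    ("E. coli S88", "Chr.", 5086, 2909, 1950, 227),
    ("E. coli S88", "Plas.", 157, 0, 144, 13),
    ("E. coli UMN026", "Chr.", 5048, 3534, 1384, 130),
    ("E. coli UMN026", "Plas1", 160, 0, 149, 11),
    ("E. coli UMN026", "Plas2", 50, 0, 49, 1),
    ("E. coli IAI1", "Chr.", 4627, 3658, 833, 136),
    ("E. coli ED1a", "Chr.", 5275, 3833, 1296, 146),
    ("E. coli ED1a", "Plas.", 153, 0, 150, 3),
    ("E. coli 55989", "Chr.", 5065, 3841, 1128, 96),
    ("E. coli 55989", "Plas.", 106, 0, 100, 6),
    ("E. coli IAI39", "Chr.", 4935, 4225, 681, 29),
    ("E. fergusonii", "Chr.", 4501, 2481, 1855, 165),
    ("E. fergusonii", "Plas.", 58, 0, 54, 4),
]


def partition_mismatches(rows):
    """Return the rows where predicted != automatic + manual + artefacts."""
    return [
        row for row in rows
        if row[2] != row[3] + row[4] + row[5]
    ]


if __name__ == "__main__":
    print(partition_mismatches(ROWS_2A))  # prints []
```

An empty list means every chromosome and plasmid row is internally consistent under that reading of the columns.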
Genomes integrated in   Size    Original data                Re-annotation      Annotation transfer (b)                  Specific
the ColiScope database  (Mb)                                 process (a)                                                 genes (c)
                                Date     RefSeq     Genes    ‘New’    ‘Wrong’   from E. coli from ColiScope   Artefacts
                                                    (nb)     status   status    K12          E. coli strains
E. coli O157:H7 EDL     5.53    2001-01  NC_002655  5374     74       94        3739         921              25         669
E. coli O157:H7 Sakai   5.50    2001-02  NC_002695  5269     164      102       3757         911              26         637
E. coli CFT073          5.23    2002-12  NC_004431  5443     72       525       3568         1079             56         287
E. coli W3110           4.64    2006-03  NC_000091  4352     5        0         4199         54               3          101
E. coli UTI89           5.06    2006-04  NC_007946  5029     88       354       3554         1016             34         159
E. coli 536             4.94    2006-07  NC_008253  4668     40       24        3603         744              17         320
E. coli APECO1          5.08    2006-10  NC_008563  4461     389      126       3564         1009             37         114
E. coli HS              4.64    2007-09  NC_009800  4577     109      221       3788         478              13         186
S. flexneri 301         4.61    2002-10  NC_004337  4656     136      11        3753         490              53         485
S. flexneri 2457T       4.60    2003-04  NC_004741  4668     178      56        3785         472              53         480
S. boydii Sb227         4.52    2005-11  NC_007613  4542     213      27        3612         486              47         583
S. sonnei Ss046         4.82    2005-11  NC_007384  4585     379      14        3762         533              57         598
S. dysenteriae Sd197    4.37    2005-11  NC_007606  4649     187      117       3592         478              52         597
S. flexneri 5b 8401     4.57    2006-07  NC_008258  4522     153      16        3688         491              48         432
Supplementary Table 2B. Publicly available Escherichia and Shigella genomes included in the ColiScope database.
(a) Inaccurate (‘Wrong’ status) or missed (‘New’ status) gene annotations were identified using our MICheck procedure. For the 14 analyzed genomes, the list of newly predicted genes is given in Supplementary Table 3.
(b) Automatic functional annotation transfer between orthologous genes (85% identity over at least 80% of the length of the smallest protein) began with similarity results obtained with E. coli K-12 MG1655, then continued with the new genomes of the ColiScope project. False gene predictions (i.e., artefacts) were those defined in the course of the expert annotation of the ColiScope sequences.
(c) ‘Specific genes’ are genes that have no ortholog in E. coli K-12 MG1655 or in any of the newly sequenced and annotated genomes.
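The transfer rule in notes (b) and (c) can be illustrated with a minimal Python sketch. It encodes only the stated thresholds (85% identity over at least 80% of the shortest protein) and the stated order (K-12 MG1655 first, then the ColiScope genomes, otherwise ‘specific’). The `Hit` record, the function names, and the way aligned length is measured are assumptions made for illustration. The sketch does not reproduce how orthologs were actually detected, and it does not model the artefact category.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_IDENTITY = 85.0  # percent identity, from note (b)
MIN_COVERAGE = 0.80  # fraction of the shortest protein, from note (b)


@dataclass(frozen=True)
class Hit:
    """Hypothetical summary of one pairwise protein alignment."""
    identity: float   # percent identical residues in the aligned region
    aligned_len: int  # number of aligned residues
    query_len: int    # length of the gene product being annotated
    subject_len: int  # length of the candidate ortholog


def passes_threshold(hit: Hit) -> bool:
    """True if the hit meets both the identity and the coverage cut-off."""
    shortest = min(hit.query_len, hit.subject_len)
    return (hit.identity >= MIN_IDENTITY
            and hit.aligned_len >= MIN_COVERAGE * shortest)


def annotation_source(k12_hits: Iterable[Hit],
                      coliscope_hits: Iterable[Hit]) -> str:
    """Pick the transfer source in the order described in note (b)."""
    if any(passes_threshold(h) for h in k12_hits):
        return "K12"
    if any(passes_threshold(h) for h in coliscope_hits):
        return "ColiScope"
    return "specific"  # no qualifying ortholog, as in note (c)
```

For example, a 300-residue protein aligned over 280 residues at 90% identity to a K-12 protein would be labelled "K12", while the same protein with only a 200-residue alignment would fall through to the ColiScope genomes.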