| Strain | Replicon | Size (Mb) | Predicted protein genes | Automatic annotation transfer (a) | Manual expert annotation (b) | Artefacts (c) |
|---|---|---|---|---|---|---|
| E. coli S88 | Chr. | 5.032 | 5086 | 2909 | 1950 | 227 |
| E. coli S88 | Plas. | 0.134 | 157 | | 144 | 13 |
| E. coli UMN026 | Chr. | 5.202 | 5048 | 3534 | 1384 | 130 |
| E. coli UMN026 | Plas1 | 0.122 | 160 | | 149 | 11 |
| E. coli UMN026 | Plas2 | 0.034 | 50 | | 49 | 1 |
| E. coli IAI1 | Chr. | 4.701 | 4627 | 3658 | 833 | 136 |
| E. coli ED1a | Chr. | 5.209 | 5275 | 3833 | 1296 | 146 |
| E. coli ED1a | Plas. | 0.120 | 153 | | 150 | 3 |
| E. coli 55989 | Chr. | 5.155 | 5065 | 3841 | 1128 | 96 |
| E. coli 55989 | Plas. | 0.072 | 106 | | 100 | 6 |
| E. coli IAI39 | Chr. | 5.132 | 4935 | 4225 | 681 | 29 |
| E. fergusonii | Chr. | 4.589 | 4501 | 2481 | 1855 | 165 |
| E. fergusonii | Plas. | 0.055 | 58 | | 54 | 4 |
| TOTAL ANNOTATION EFFORT | | | | | 9776 | |

Supplementary Table 2A. Number of predicted protein-encoding genes in the genomes of the newly sequenced strains of Escherichia coli and E. fergusonii. Genes were (a) functionally annotated using automatic annotation transfer from K-12 MG1655 orthologs or other ColiScope manually annotated orthologous genes, (b) manually annotated using the MaGe web-based graphical interface, or (c) considered to be false-positive gene predictions. Chr: chromosome. Plas: plasmid.

| Genomes integrated in the ColiScope database | Size (Mb) | Original data: Date | Original data: RefSeq | Original data: Genes (nb) | Re-annotation process (a): ‘New’ status | Re-annotation process (a): ‘Wrong’ status | Annotation transfer (b): from E. coli K12 | Annotation transfer (b): from E. coli strains of ColiScope | Annotation transfer (b): Artefacts | Specific genes (c) |
|---|---|---|---|---|---|---|---|---|---|---|
| E. coli O157:H7 EDL | 5.53 | 2001-01 | NC_002655 | 5374 | 74 | 94 | 3739 | 921 | 25 | 669 |
| E. coli O157:H7 Sakai | 5.50 | 2001-02 | NC_002695 | 5269 | 164 | 102 | 3757 | 911 | 26 | 637 |
| E. coli CFT073 | 5.23 | 2002-12 | NC_004431 | 5443 | 72 | 525 | 3568 | 1079 | 56 | 287 |
| E. coli W3110 | 4.64 | 2006-03 | NC_000091 | 4352 | 5 | 0 | 4199 | 54 | 3 | 101 |
| E. coli UTI89 | 5.06 | 2006-04 | NC_007946 | 5029 | 88 | 354 | 3554 | 1016 | 34 | 159 |
| E. coli 536 | 4.94 | 2006-07 | NC_008253 | 4668 | 40 | 24 | 3603 | 744 | 17 | 320 |
| E. coli APECO1 | 5.08 | 2006-10 | NC_008563 | 4461 | 389 | 126 | 3564 | 1009 | 37 | 114 |
| E. coli HS | 4.64 | 2007-09 | NC_009800 | 4577 | 109 | 221 | 3788 | 478 | 13 | 186 |
| S. flexneri 301 | 4.61 | 2002-10 | NC_004337 | 4656 | 136 | 11 | 3753 | 490 | 53 | 485 |
| S. flexneri 2457T | 4.60 | 2003-04 | NC_004741 | 4668 | 178 | 56 | 3785 | 472 | 53 | 480 |
| S. boydii Sb227 | 4.52 | 2005-11 | NC_007613 | 4542 | 213 | 27 | 3612 | 486 | 47 | 583 |
| S. sonnei Ss046 | 4.82 | 2005-11 | NC_007384 | 4585 | 379 | 14 | 3762 | 533 | 57 | 598 |
| S. dysenteriae Sd197 | 4.37 | 2005-11 | NC_007606 | 4649 | 187 | 117 | 3592 | 478 | 52 | 597 |
| S. flexneri 5b 8401 | 4.57 | 2006-07 | NC_008258 | 4522 | 153 | 16 | 3688 | 491 | 48 | 432 |

Supplementary Table 2B. Publicly available Escherichia and Shigella genomes included in the ColiScope database. (a) Inaccurate gene annotations (‘Wrong’ status) and missed gene annotations (‘New’ status) were identified using our MICheck procedure. For the 14 analyzed genomes, the list of newly predicted genes is given in Supplementary Table 3. (b) Automatic functional annotation transfer between orthologous genes (85% identity over at least 80% of the length of the smaller protein) began with similarity results obtained with E. coli K-12 MG1655, then continued with the new genomes of the ColiScope project. False gene predictions (i.e., artefacts) were those defined in the course of the expert annotation of the ColiScope sequences. (c) ‘Specific genes’ are genes that have no ortholog in E. coli K-12 MG1655 or in any of the newly sequenced and annotated genomes.
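The orthology threshold quoted in footnote (b) of Supplementary Table 2B (85% identity over at least 80% of the length of the smaller protein) can be illustrated with a minimal sketch. The `Alignment` record, its field names, and the function `passes_transfer_threshold` are hypothetical and are not part of the ColiScope or MaGe software. The sketch also assumes that coverage is computed as aligned length divided by the length of the shorter protein, which the footnote does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one pairwise protein alignment."""
    identity_pct: float  # percent identical residues over the aligned region
    aligned_len: int     # number of aligned residues
    query_len: int       # length of the first protein (residues)
    subject_len: int     # length of the second protein (residues)


def passes_transfer_threshold(aln: Alignment,
                              min_identity: float = 85.0,
                              min_coverage: float = 0.80) -> bool:
    """Return True if the pair meets the thresholds stated in footnote (b).

    Assumption: coverage = aligned length / length of the shorter protein.
    """
    shorter = min(aln.query_len, aln.subject_len)
    if shorter <= 0:
        return False
    coverage = aln.aligned_len / shorter
    return aln.identity_pct >= min_identity and coverage >= min_coverage


# Illustrative values only; these are not taken from the tables.
candidate = Alignment(identity_pct=92.0, aligned_len=270,
                      query_len=300, subject_len=320)
print(passes_transfer_threshold(candidate))
```

Under these assumptions, a pair that falls below either threshold would not qualify for automatic transfer.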