The E. coli Extended Genome: Ecological and Clinical Implications.
Fernando Baquero.
Department of Microbiology, Ramón y Cajal University Hospital, and Laboratory for Microbial Evolution, Centre for Astrobiology, INTA-CSIC, Madrid, 28034
fbaquero.hrc@salud.madrid.org

Complete genome sequences are becoming available for a number of different E. coli strains (K12-MG1655, K12-W3110, O157:H7-(GIRC), O157:H7-EDL933, E2348/69, CFT073, O42, HS, E24377A), and the same is true for the closely related organism Shigella flexneri (strains SF-301, 2457T). A key and immediate result of this analysis is that no single strain can represent all strains. The number of genes contained in these genomes ranges from 4,289 in E. coli MG1655 (genome size, 4.63 Mb) to 5,379 in E. coli CFT073 (5.23 Mb). Nevertheless, all these genomes contain a backbone sequence of about 3.8 Mb that is extremely conserved among strains (97% identity between MG1655 and O157:H7), with only limited rearrangements in gene order (conserved synteny). This backbone can be considered the highly conserved “signature” of E. coli physiology, containing the basic set of functional protein families. It evolves slowly through vertically acquired sequence changes and probably still remains quite similar to the hypothetical common ancestor of all E. coli strains (5 million years ago?). In contrast, this backbone diverges deeply from the corresponding one in other Enterobacteriaceae, such as Salmonella enterica. The differences between E. coli strains are based on the insertion into this backbone of strain-specific loops of different sizes (in CFT073 there are 60 segments of >4 kb). Apart from chromosomal DNA segments, many of these loops may correspond to phages, pathogenicity (or adaptive) islands, plasmids, or transposable elements. Strain-specific loops account for a significant proportion of the entire genome (20% in K12, 33% in O157:H7).
Obviously, strains also differ in sequences contained in extra-chromosomal elements, frequently plasmids. We might consider that the sum of: i) all the genetic sequences of the common backbone in all different strains, plus ii) all genetic sequences of all the strain-specific loops, plus iii) all extra-chromosomal genetic sequences persistently associated with E. coli strains constitutes a virtual extended genome of E. coli. In reality, the image of a species should be based on the full complement of genes enabling the group to be a living, self-regulating, self-reproducing entity with a flexible adaptive response to environmental changes, a real evolutionary unit. Such unity requires the compatibility of the interacting sequences: the extended genome reflects the unity of various interactive networks, and probably the opposition to potentially disturbing networks that might gain access to E. coli. Unfortunately, we will need many more sequences of E. coli strains to determine the approximate size of the extended genome.

Clinical microbiologists are impressed with the overwhelming diversity of E. coli clones associated with humans. A possible explanation of such diversity is the collective exploitation by different clones of variable environments, including different warm-blooded animals (“primary habitat”), and among them human individuals differently influenced by variable environments (such as hospitals). Moreover, the importance of the non-host, free environment (which may contain about half of the total population of E. coli) in E. coli biology should not be underestimated. This “secondary habitat” ensures the access of the extended E. coli genome to a number of free-living organisms (such as Xylella, Ralstonia, Caulobacter, Agrobacterium) able to supply sequences included in a gene-exchange community. From the clinical point of view, the “strain-specific loops” are the critical ones in determining pathogenicity and antibiotic resistance.
Finally, the comparative analysis of genes in E. coli (orthologs or paralogs) that might be represented in human cells could provide some clues to understanding host-bacteria interactions. In summary, the extended genome represents a common space for genetic variability and evolvability in E. coli.
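
The three-part definition of the extended genome given above (common backbone, plus strain-specific loops, plus persistently associated extra-chromosomal sequences) can be read as a set union. The following is only a minimal illustrative sketch under strong assumptions: genomes are reduced to sets of gene labels, the backbone is approximated as the genes shared by every strain in the sample, and a "loop" is any chromosomal gene outside that backbone. The class `Strain`, the function names, and the toy gene labels are all hypothetical and do not correspond to real annotations; the percentages quoted in the text refer to sequence length, whereas this sketch counts genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    """Toy model of one strain (hypothetical structure, not real data)."""
    name: str
    chromosomal: frozenset                    # gene labels on the chromosome
    extrachromosomal: frozenset = frozenset()  # e.g. plasmid-borne gene labels


def backbone(strains):
    """Approximate the common backbone as genes present in every chromosome."""
    if not strains:
        raise ValueError("at least one strain is required")
    return frozenset.intersection(*(s.chromosomal for s in strains))


def strain_specific_loops(strain, core):
    """Chromosomal genes of one strain that lie outside the backbone."""
    return strain.chromosomal - core


def extended_genome(strains):
    """Union of i) backbone, ii) all loops, iii) all extra-chromosomal genes."""
    core = backbone(strains)
    extended = set(core)
    for s in strains:
        extended |= strain_specific_loops(s, core)
        extended |= s.extrachromosomal
    return frozenset(extended)


def loop_fraction(strain, core):
    """Share of a strain's chromosomal genes found in strain-specific loops."""
    return len(strain_specific_loops(strain, core)) / len(strain.chromosomal)


# Toy example with made-up gene labels.
toy_strains = [
    Strain("strain_1", frozenset({"core_1", "core_2", "core_3", "loop_a"})),
    Strain(
        "strain_2",
        frozenset({"core_1", "core_2", "core_3", "loop_b", "loop_c"}),
        frozenset({"plasmid_x"}),
    ),
]
toy_core = backbone(toy_strains)
toy_extended = extended_genome(toy_strains)
```

In this toy case the backbone holds three genes while the extended genome holds seven, which illustrates why the estimated size of the extended genome keeps growing as more strains are added to the sample.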