The E. coli Extended Genome: Ecological and Clinical

The E. coli Extended Genome: Ecological and Clinical Implications.
Fernando Baquero. Department of Microbiology, Ramón y Cajal University Hospital, and
Laboratory for Microbial Evolution, Centre for Astrobiology, INTA-CSIC, Madrid, 28034
fbaquero.hrc@salud.madrid.org
Whole-genome analyses of a number of different E. coli strains are becoming available (K12-MG1655, K12-W3110, O157:H7-(GIRC), O157:H7-EDL933, E2348/69, CFT073, O42, HS,
E24377A), and the same is true for the closely related organism Shigella flexneri (strains SF-301,
2457T). A key and immediate result of these analyses is that no single strain can represent all
the strains. The number of genes contained in these genomes ranges from 4,289 in E. coli MG1655
(genome size, 4.63 Mb) to 5,379 in E. coli CFT073 (5.23 Mb). Nevertheless, all these genomes
contain a backbone sequence of about 3.8 Mb that is extremely conserved among strains
(97% identity between MG1655 and O157:H7), with only limited rearrangements in the order of
genes (conserved synteny). This backbone can be considered the highly conserved “signature” of E.
coli physiology, containing the basic set of functional protein families. It evolves slowly through
vertically acquired sequence changes and probably still remains quite similar to that of the hypothetical
common ancestor of all E. coli strains (5 million years ago?). In contrast, this backbone
diverges deeply from the corresponding one in other Enterobacteriaceae, such as Salmonella enterica.
The differences between E. coli strains arise from the insertion into this backbone of
strain-specific loops of different sizes (in CFT073 there are 60 segments of >4 kb). Apart from
chromosomal DNA segments, many of these loops may correspond to phages, pathogenicity (or
adaptive) islands, plasmids, or transposable elements. Strain-specific loops account for a
significant proportion of the entire genome (20% in K12, 33% in O157:H7). Obviously, strains also
differ in sequences contained in extra-chromosomal elements, frequently plasmids. We might
consider that the sum of: i) all the genetic sequences of the common backbone in all different
strains, plus ii) all genetic sequences of all the strain-specific loops, plus iii) all extra-chromosomal
genetic sequences persistently associated with E. coli strains constitutes a virtual extended genome of
E. coli. In reality, the image of a species should be based on the full complement of genes enabling
the group to be a living, self-regulating, self-reproducing entity with flexible adaptive response to
environmental changes, a real evolutionary unity. Such unity requires the compatibility of the
interacting sequences: the extended genome reflects the unity of various interactive networks, and
probably the opposition to potentially disturbing networks that might gain access to E. coli.
Unfortunately, we will need many more E. coli strain sequences to determine the approximate
size of the extended genome. Clinical microbiologists are impressed with the overwhelming
diversity of E. coli clones associated with humans. A possible explanation of such diversity is the
collective exploitation by different clones of variable environments, including different warm-blooded animals (“primary habitat”), and among them human individuals differently influenced by
variable environments (such as hospitals). Moreover, the importance of the non-host free environment
(that may contain about half of the total population of E. coli) in E. coli biology should not be
underestimated. This “secondary habitat” ensures the access of the extended E. coli genome to a
number of free-living organisms (such as Xylella, Ralstonia, Caulobacter, Agrobacterium) able to supply
sequences included in a gene-exchange community. From the clinical point of view, the “strain-specific loops” are the critical ones in determining pathogenicity and antibiotic resistance. Finally,
the comparative analysis of genes in E. coli (orthologs or paralogs) that might be represented in
human cells could provide some clues to understanding host-bacteria interactions. In summary, the
extended genome represents a common space for genetic variability and evolvability in E. coli.
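
The three-part definition of the extended genome given above (common backbone, plus strain-specific loops, plus extra-chromosomal sequences) can be read as simple set operations. The following is only a minimal sketch under strong assumptions: the strain names and gene identifiers are hypothetical toy data, not real annotations; each genome is reduced to a set of gene identifiers; a "loop" gene is taken to be any chromosomal gene absent from at least one strain; and the loop fraction is computed per gene count rather than per sequence length as in the figures quoted in the text.

```python
from functools import reduce

# Hypothetical toy data: gene identifiers per strain (not real annotations).
chromosomal = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5", "g6"},
    "strain_C": {"g1", "g2", "g3", "g7"},
}
# Hypothetical plasmid-borne genes persistently associated with each strain.
extrachromosomal = {
    "strain_A": set(),
    "strain_B": {"p1"},
    "strain_C": {"p1", "p2"},
}


def backbone(chrom):
    """Genes present in every strain (the common backbone)."""
    return reduce(set.intersection, chrom.values())


def strain_specific(chrom):
    """Chromosomal genes of each strain that lie outside the backbone."""
    core = backbone(chrom)
    return {name: genes - core for name, genes in chrom.items()}


def extended_genome(chrom, extra):
    """Union of backbone, all strain-specific loops, and extra-chromosomal genes."""
    loops = set().union(*strain_specific(chrom).values())
    plasmid_genes = set().union(*extra.values())
    return backbone(chrom) | loops | plasmid_genes


def loop_fraction(chrom):
    """Fraction of each strain's chromosomal genes that are outside the backbone."""
    core = backbone(chrom)
    return {name: len(genes - core) / len(genes) for name, genes in chrom.items()}


if __name__ == "__main__":
    print("backbone:", sorted(backbone(chromosomal)))
    print("loops:", {k: sorted(v) for k, v in strain_specific(chromosomal).items()})
    print("extended genome size:", len(extended_genome(chromosomal, extrachromosomal)))
    print("loop fraction:", loop_fraction(chromosomal))
```

In this toy example the backbone shrinks or stays constant as strains are added, while the extended genome can only grow, which is one way to see why many more strain sequences would be needed before its approximate size could be estimated.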