The E. coli Extended Genome
Fernando Baquero
Dept. Microbiology, Ramón y Cajal University Hospital, and Laboratory for Microbial Evolution, CAB (INTA-CSIC)
Madrid, Spain

The Species E. coli
Roles of the concept of “species”
• Units of taxonomic classification: units in the general reference system that microbiologists use to order the isolates
• Units of generalization: kinds of microorganisms over which explanatory-predictive generalizations can be made
• Units of evolution: bacterial entities that participate in evolutionary processes and undergo evolutionary change
(Modified from T.A.C. Reydon, Ph.D. Dissertation, Leiden University, 2005)

The Species E. coli: Classic way and New way
[Figure: the three roles above, contrasted between the classic way and the new way]

Diversity at all hierarchical levels
• Strain: mutation. Some strains are more mutable than others.
• Population: clonalization. Do some populations tend to produce more clones?
• Community: speciation. Do some bacterial groups tend to produce more species?
At any level, the origin of diversity is probably stochastic.

Adaptation complexity
• Mutation: single adaptive event
• Clonalization: multiple adaptive events
• Speciation: very complex adaptive events

Clonalization
[Figure: allopatric and sympatric clonalization; host defenses]

Clonalization: ExPEC* and non-ExPEC
[Figure: allopatric and sympatric clonalization]
* From James R. “Linneus” Johnson

The elimination of intermediates
Impossibility of being a businessman and a little mermaid

Species-Environment Concerted Evolution
[Figure: phylogenetic groups, core genome and basic reproductive environment; species evolution and environmental evolution]

Co-evolution: Trees within Trees
[Figure: host tree and tree of the bacteria or bacterial consortium]

The clues of E. coli genetic diversity
• Errors in DNA replication and repair
• Horizontal genetic transfer from other organisms
• Creation of mosaic genes from parts of other genes
• Duplication and divergence of pre-existing genes
• De novo invention of genes from DNA that was previously non-coding
(Modified from Wolfe and Li, Nat. Genet. 33, 2003)

Not a single strain represents the whole species
• K12-MG1655 (4,289 ORFs)
• K12-W3110 (4,390 ORFs)
• O157:H7 (Sakai) (5,361 ORFs)
• O157:H7-EDL933 (5,349 ORFs)
• E2348/69
• CFT073 (UPEC) (5,379 ORFs)
• O42 (EAEC), HS, E24377A (ETEC), Nissle (PBEC)
• Shigella flexneri SF-301 and 2457T (4,084 ORFs)

E. coli genomes: 1,000 genes of difference!
(http://colibase.bham.ac.uk)

Loops in a common core backbone
[Figure: an A-strain and a B-strain sharing a common core backbone, each with its own loops (A-loops/A-islands, B-loops/B-islands)]
[Figure: E. coli Sakai and E. coli K12 share a backbone (BB) of 3,730 kb; Sakai carries 296 loops (S-loops) and K12 carries 325 loops (K-loops); loop totals shown: 1,393 kb and 537 kb]

Loop sizes
• Large loops arise from horizontal transfer events.
• Small loops may arise from replication errors (small deletions or insertions), or correspond to highly polymorphic regions.
(Chiapello et al., BMC Bioinformatics, 6:171, 2005)

The core backbone is not the minimal genome
• The “core backbone” is not the “minimal E. coli genome”, because of the high level of gene redundancy.
• Many genes are members of gene families (2-30 copies) whose members are similar enough to be assigned similar functions (paralogs).
• Such redundancy involves 20-40% of the E. coli coding sequences (more in the largest genomes).
• “In silico” metabolic phenotypes that include all basic functions predict a minimal genome of about 700 genes (Blattner et al., Science 1997; Edwards and Palsson, PNAS 2000).

[Figure: Gogarten and Townsend, Nature Rev. Microbiol., 2005]
The blue gene, unexpected in species “C”, might have arisen: i) by horizontal gene transfer; or ii) by an ancient gene duplication followed by differential gene loss.
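The core backbone and loop idea can be pictured with plain set operations: the core is what every strain shares, each strain's loops are what it carries beyond the core, and the union of everything is the species-level gene pool. This is a minimal sketch that assumes genes have already been matched across strains by a shared identifier; the strain and gene names are hypothetical placeholders. Real loops are defined by their position within the aligned backbone, which unordered sets ignore.

```python
# Minimal sketch: core backbone vs. strain-specific loops as set operations.
# Assumption: genes are already matched across strains by a shared identifier;
# the strain and gene names used below are hypothetical.


def core_and_loops(
    strains: dict[str, set[str]],
) -> tuple[set[str], dict[str, set[str]], set[str]]:
    """Return (core, loops per strain, pan-genome) for strain -> gene-set data."""
    gene_sets = list(strains.values())
    core = set.intersection(*gene_sets)  # genes present in every strain
    pan = set.union(*gene_sets)  # genes present in at least one strain
    # Strain-specific extras: everything a strain carries beyond the core.
    loops = {name: genes - core for name, genes in strains.items()}
    return core, loops, pan


if __name__ == "__main__":
    toy = {
        "A-strain": {"geneA", "geneB", "geneC", "islandA1", "islandA2"},
        "B-strain": {"geneA", "geneB", "geneC", "islandB1"},
    }
    core, loops, pan = core_and_loops(toy)
    print("core:", sorted(core))
    print("A-loops:", sorted(loops["A-strain"]))
    print("B-loops:", sorted(loops["B-strain"]))
    print("pan size:", len(pan))
```

Adding strains can only keep or shrink the intersection and keep or grow the union, which is one way to read the point that no single strain represents the whole species.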
The loops
• The backbone evolves by vertical transfer.
• Large loops are probably acquired by horizontal gene transfer, but they also evolve by vertical transfer: PAIs, islets, phages, plasmids, transposable and repetitive elements...
• Loops tend to have a different codon usage and a higher AT% than the backbone.
• Loops more frequently contain operational genes (actions) than informational genes (complex regulation) (R. Jain, 1999).

[Figure, elaborated from Jain et al.: alien nodes joining a random-scale sub-network (loop); operational genes are more easily accepted]
[Figure, elaborated from Jain et al.: scale-free core network (nodes vs. number of links, log scale); informational genes are less easily accepted, except when an alien sub-network replaces an entire sub-network]

Predicted functional modules in E. coli
3,256 E. coli genes are connected by 113,894 links (von Mering et al., PNAS 100:15428, 2003).

Loops as E. coli R&D laboratories
[Figure: proteins expressed (red bars) and positions of K-loops (blue bars)]
The genes in the loops express proteins in only 10% of cases (M. Taoka et al., Mol. Cell. Proteomics, 2004).

Gene flux
[Figure: acquisition, excision, loss, duplication and modification of genes]
(Daubin et al., Genome Biol., 4:R57, 2003; Ochman and Jones, EMBO J., 19:6637, 2000)
• More loss in sequences of recent acquisition
• Insertions and deletions occur more frequently in loops
• Overall, less loss than acquisition?

Gene flux: a constant random gene influx?
[Figure: acquisition, excision, loss, duplication and modification of genes]
As in the case of random mutation, there might be a blind, random uptake and loss of available foreign genetic sequences; environmental selection and random drift determine the fate of these constructions.

E. coli: where do alien genes come from?
• Enterobacteriaceae (56%) (Klebsiella, Salmonella, Serratia, Yersinia); Aeromonas, Xylella, Ralstonia, Caulobacter, Agrobacterium
• Plasmids (28%): about 250 plasmids have been identified in E. coli
• Phages (10%)
• Plus many ORFan genes (64 specific to MG1655)
(Modified from Dufraigne et al., NAR 33, 2005, and Daubin and Ochman, Genome Research, 2004)

The E. coli “Gene Exchange Community” should be better identified!

E. coli Recipient Barriers for Horizontal Gene Transfer
• Ecological separation from donor
• DNA sequence divergence
• Low numbers
• Inadequate phage receptors
• Inadequate pilus specificity for mating
• Contact-killing or inhibition
• Surface exclusion
• Restriction* (no anti-restriction mechanisms), gene inactivation
• Absence of replication of the foreign gene, incompatibility
• Absence of integration of the foreign gene in specific sites
• No recombination with the host genome (AT/CG), MMR system
• Decrease in fitness of the recipient after DNA acquisition
• No more room for new DNA: headroom (maximal genome?)
* 200 enzymes!

Sequence divergence reduces the acquisition of foreign DNA
• If the acquisition produces neutral events, tolerance increases.
• Deleterious events are frequent at high divergence, but occasional beneficial events are rare at low divergence rates.
(Modified from Gogarten and Townsend, Nature Rev. Microbiol., 2005)

Genome size in E. coli strains
[Figure: genome size (kb) of ECOR strains by phylogenetic group (A, B1, B2, D), with the K12 level marked]
(Data: Bergthorsson and Ochman, Mol. Biol. Evol. 15:6-16, 1998)

Phylogenetic groups: clinical associations
[Figure: percentage of isolates in phylogenetic groups A, B1, B2 and D among clinical, cystitis, febrile UTI (FUTI), rectal (FUTI), faecal healthy volunteers France (HV-Fr) and faecal healthy volunteers Spain (HV-Sp) collections]
(Clinical: Johnson et al., EID 11:141, 2005; Cystitis: Johnson et al., AAC 49:26, 2005; FUTI and rectal FUTI: Johnson et al., JCM 43:3895, 2005; Faecal Fr/Cr/Ma: Duriez et al., Microbiology 147:1671, 2001; Faecal HV Spain: Machado et al., AAC 49, 2005)
• Groups B2 and D are the most frequently found in E. coli bacteremia (Hilali et al., Infect. Immun. 68:3983, 2000; Johnson et al., JID 15:2121, 2004; Bingen, yesterday).
• But: many SxT-R strains among “epidemic extraintestinal” UTI strains in the US, Israel and France (Johnson et al., EID 11:141, 2005).

Distribution of E. coli isolates from hospitalized patients and from healthy volunteers among the four phylogenetic groups
[Figure: % of strains in groups A, B1, B2 and D]
(Machado, Cantón, Baquero et al., AAC 49, 2005)
• ESBL producers (red) predominate among strains of group D.
• Pathogenic non-ESBL strains predominate in group B2.
• Commensal non-ESBL strains predominate in group A.

Antimicrobial resistance in phylogenetic groups
[Figure: % of SxT-R, ESBL-producing, Cipro-R(1) and Cipro-R(2) isolates in groups A, B1, B2 and D]
(SxT-R and Cipro-R(1): Johnson et al., AAC 49:26, 2005; ESBL: Machado et al., AAC 49, 2005; Cipro-R(2): Kuntaman et al., EID 11:1363, 2005 (Indonesia))
Does phylogenetic group B2, the most pathogenic one, tend to be the least resistant?
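The remark under “The loops” that loops tend to have a higher AT% than the backbone suggests a crude compositional screen: slide a window along a sequence and flag windows whose AT fraction exceeds the sequence-wide mean by some margin. This is a minimal sketch, not the method of any study cited in these slides; the window size, step and `excess` threshold are arbitrary illustrative parameters, and published island-detection approaches usually combine several compositional signals (codon usage, oligonucleotide signatures) rather than relying on AT% alone.

```python
# Minimal sketch: flag sliding windows whose AT fraction exceeds the
# sequence-wide mean. Window, step and threshold values are illustrative.


def at_fraction(seq: str) -> float:
    """Fraction of A/T bases in seq (case-insensitive); 0.0 for an empty string."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("A") + s.count("T")) / len(s)


def high_at_windows(
    genome: str, window: int, step: int, excess: float
) -> list[tuple[int, float]]:
    """Return (start, AT fraction) for each window whose AT fraction is more
    than `excess` above the whole-sequence mean."""
    mean_at = at_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        frac = at_fraction(genome[start:start + window])
        if frac - mean_at > excess:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    # Toy "backbone" of pure GC with one AT-rich insertion in the middle.
    toy = "GC" * 50 + "AT" * 25 + "GC" * 50
    print(high_at_windows(toy, window=50, step=50, excess=0.3))
```

On the toy sequence, the only flagged window is the one starting at position 100, which is the AT-rich insertion. On a real genome, the window size and threshold would need tuning to that genome's own compositional variance.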
Species-Environment Concerted Evolution: ecotypes
[Figure: ecotypes, core genome and basic reproductive environment; species evolution and environmental evolution]

Models for Multiple Ecotypes (Gevers et al., Nature Rev. Microbiol. 3:733, 2005)

Clonalization: patients with different ESBL clones, Ramón y Cajal Hospital, Madrid
[Figure: number of patients per clone by year, 1988-1999]
(Baquero, Coque and Cantón, Lancet Infect. Dis. 2:591, 2002)

Mutation: intra-clonal diversity
[Figure: % of E. coli strains (faecal, urine, blood, ESBL producers) in each mutation-frequency class: hypo, normo, weak, strong]
(Baquero et al., AAC 2004 and Nov. 2005)

Clonal Ensembles: Metastability through Intermittent Fixation
[Figure: clone frequencies over time, with the line of best fit]
Different clones peak in frequency at different times, according to the best-fit clone in each epoch* of a changing environment.
* Epochal evolution
The maintenance of clonal ensembles is favored by the asymmetry of fitness of different clones in different epochs.
[Figure: clonal ensemble]

Shared Environments and Maintenance of Diversity
A regional polyclonal community structure
[Figure: patches labelled 1 and 2]
Alternative stable equilibria and the coexistence of variant organisms
On this topic: geographic mosaic theory of coevolution (Forde et al., Nature, 2004)

Maintenance of diversity: a regional polyclonal community structure
[Figure: patches 1 and 2 connected by local migration and local gene flow]

Diversity: Collapse and Resurrection
Kin effects in open systems
[Figure: selection]

Maintenance of diversity: a regional polyclonal community structure
Environmental gradients are composed of a multiplicity of patches that may act as discrete selective points for bacterial variants.

Maintenance of diversity: a regional polyclonal community structure
Gradients and concentration-dependent selection (F. Baquero and C. Negri, Bioessays, 1997)

Maintenance of Diversity by the Scissors, Rock, Paper Model
(B. Kerr et al., Local dispersal promotes biodiversity in a real-life game of rock-paper-scissors. Nature 418:171, 2002)

Rock, Paper, Scissors Model
1. If the stones reduce their attack against scissors...
2. Scissors increase their power against paper...
3. And less paper means more stones...

In60-like integrons
[Figure, kindly provided by Teresa Coque et al., 2005: In60-like class 1 integrons combining int1, qacEΔ1-sul1 and orf513 with gene cassettes and linked genes such as aadB, aadA1, aadA2, aadA22, aac(6'), aacA4, dfrA10, dfrA16, dfrA18, blaOXA-1, blaOXA-2, blaOXA-30, catA2, catB3, aar-3, ampC, ampR-blaDHA, qnr, blaCTX-M-9 and blaCTX-M-2, together with IS3000 and IS6100]

Extensive “Macfarlane Burnet” Model and Evolution of Bacterial Pathogenicity
• Every evolutionary element (clones, chromosomal sequences, plasmids, transposons, islands, recombinases, insertion sequences...) is independently subjected to apparently random spontaneous variation.
• Combinations of the variant elements are constantly constructed, apparently at random.
• Eventually a given combination is selected and enriched by an unexpected advantage (colonization, pathogenicity) or fixed by drift.
Pre-pathogens are probably constantly constructed; many of them are eliminated by immunity and the normal microbiota.

The opportunity of meeting interesting people: E. coli in the environment
• It has been suggested that one-half of the E. coli population resides in primary habitats (warm-blooded hosts) and one-half in soil or water.
• Tropical waters harbor natural populations of E. coli (Carrillo et al., AEM 50:468, 1985).
• In nutrient-rich soils, particularly with cyclic periods of wet and dry weather, E. coli is a member of the normal microflora (Winfield and Groisman, AEM 69:3687, 2003).

E. coli in the environment
• Land disposal of sewage and of the sewage sludges that result from wastewater treatment.
• More than 3 million gallons of sewage effluent from more than 3,000 land treatment sites and 15 million septic tanks were applied to land every day in 1984 (Keswick, B.H., 1984).
• More than 7 million dry tons of sewage sludge are produced annually, and 54% of this is applied to soil (Environmental Protection Agency, http://www.epa.gov/oigearth, 2002; Santamaría and Toranzos, Int. Microbiol. 6:5-9, 2003).

E. coli in the environment
• EPA Class A biosolids: less than 10^3 thermotolerant coliforms/g; for lawns, home gardens, and as commercial fertilizer.
• EPA Class B biosolids: less than 10^6 thermotolerant coliforms/g; for land application, forest lands and reclamation sites. For a period, access by the public and livestock is restricted.
(Environmental Protection Agency)

Temperature fitness profiles
[Figure: absolute fitness of E. coli and K. pneumoniae vs. temperature (ºC, 10-50)]
(Modified from Okada and Gordon, Mol. Ecol. 10:2499, 2001)

CTX-M-10 linked to Kluyvera and phage sequences
[Figure: map of the blaCTX-M-10 region (BamHI and EcoRI sites) showing a Tn1000-like transposase, IS5, IS432, IS4321, a Tn5708 fragment, a phage-related invertible region with an invertase, several ORFs, and a region with 90% homology to Kluyvera cryocrescens]
(Oliver, Coque, Alonso, Valverde, Baquero, Cantón. AAC 2005; 1567-1571)
• Present in different clones at Ramón y Cajal Hospital
• Variability in the sequence among different clones
• Probably linked to the same plasmid structure

The Extended Genome
A genetic space composed of the sum of:
• the sequences corresponding to the maximal core genome of all clones (orthologs, paralogs), plus
• the sequences of all loops that have been inserted into such a core in the different natural (successful at one time) clones or lineages: ecotypes, geotypes, pathotypes..., plus
• the sequences of all extra-chromosomal elements stably associated with any clone.

Extended Genome: a Genetic Space
[Figure: core, loops and peripheral elements]

Extended Genome: Core Gravity
Foreign sequences of different base composition tend to “ameliorate” to resemble the features of the resident genome*.
[Figure: core, loops and peripheral elements]
*Ochman and Jones, EMBO J., 19:6637, 2000

Extended Genome: a Genetic Space Filling the Carrying Capacity of the Environment for the Species
[Figure: genetic space and complex genetic space]

The Extended E. coli Genome
• Research to increase our interpretative, predictive and preventive capability regarding Escherichia coli evolutionary biology.
• Catalog of sequences of all evolutionarily relevant pieces* in E. coli.
• Network of all interactions between pieces.
• Modeling of combinations that might emerge under particular environmental or clinical conditions.
*F. Baquero, From Pieces to Patterns, Nature Reviews 2004

A lot of work, a lot of fun. Particular thanks to some of my friends in the lab...
• Rafael Cantón
• Teresa Coque
• Juan-Carlos Galán
• José-Luis Martínez (CNB, CSIC)
Gerdes SY et al., JB 2003
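The “Core Gravity” slide says that foreign sequences of different base composition tend to “ameliorate” toward the resident genome. This is a minimal sketch of that idea, assuming a simple two-rate mutation model that the slides do not specify. Each generation, a fraction u of AT sites mutates to GC and a fraction v of GC sites mutates to AT. The GC content of an inserted loop then approaches the equilibrium u/(u + v), which stands in for the host's composition. The rates are hypothetical and far higher than real per-site mutation rates, chosen only so that the toy converges in a few hundred steps.

```python
# Minimal sketch of amelioration under an assumed two-rate mutation model.
# u: per-generation probability that an AT site becomes GC (hypothetical)
# v: per-generation probability that a GC site becomes AT (hypothetical)
# Recurrence: g[t+1] = g[t] + u * (1 - g[t]) - v * g[t]
# Closed form: g[t] = g* + (g[0] - g*) * (1 - u - v) ** t, with g* = u / (u + v)


def equilibrium_gc(u: float, v: float) -> float:
    """GC fraction at which AT->GC and GC->AT changes balance."""
    return u / (u + v)


def ameliorate(gc0: float, u: float, v: float, generations: int) -> float:
    """Iterate the recurrence from an initial GC fraction gc0."""
    gc = gc0
    for _ in range(generations):
        gc += u * (1.0 - gc) - v * gc
    return gc


def ameliorate_closed_form(gc0: float, u: float, v: float, generations: int) -> float:
    """Same quantity from the closed-form solution of the linear recurrence."""
    eq = equilibrium_gc(u, v)
    return eq + (gc0 - eq) * (1.0 - u - v) ** generations


if __name__ == "__main__":
    # Hypothetical AT-rich insert (GC = 0.35) in a host whose equilibrium is 0.50.
    u, v = 0.01, 0.01
    for t in (0, 50, 100, 200, 400):
        print(t, round(ameliorate(0.35, u, v, t), 4))
```

The gap to equilibrium shrinks by a factor (1 - u - v) per generation, so the time scale of amelioration is roughly 1/(u + v) generations. With realistic mutation rates this time scale is very long, which is consistent with recently acquired loops still being recognizable by their composition.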