Distribution of Chromosome Length Variation in Natural Isolates of Escherichia coli

Ulfar Bergthorsson and Howard Ochman
Department of Biology, University of Rochester

Large-scale variation in chromosome size was analyzed in 35 natural isolates of Escherichia coli by physical mapping with a restriction enzyme whose sites are restricted to rDNA operons. Although the genetic maps and chromosome lengths of the laboratory strains E. coli K12 and Salmonella enterica sv. Typhimurium LT2 are highly congruent, chromosome lengths among natural strains of E. coli can differ by as much as 1 Mb, ranging from 4.5 to 5.5 Mb. This variation has been generated by multiple changes dispersed throughout the genome, and these alterations are correlated; i.e., additions to one portion of the chromosome are often accompanied by additions to other chromosomal regions. This pattern of variation is most probably the result of selection acting to maintain equal distances between the replication origin and terminus on each side of the circular chromosome. There is a large phylogenetic component to the observed size variation: natural isolates from certain subgroups of E. coli have consistently larger chromosomes, suggesting that much of the additional DNA in larger chromosomes is shared through common ancestry. There is no significant correlation between genome size and growth rate, which counters the view that the streamlining of bacterial genomes is a response to selection for faster growth rates in natural populations.

Abbreviations: PFGE, pulsed-field gel electrophoresis; LEE, locus of enterocyte effacement; PCR, polymerase chain reaction; sv., serovar.

Key words: chromosome size variation, Escherichia coli, genome evolution, physical mapping, pulsed-field gel electrophoresis.

Address for correspondence and reprints: Howard Ochman, Department of Biology, University of Rochester, Rochester, New York 14627. E-mail: ochman@ho.biology.rochester.edu.

Mol. Biol. Evol. 15(1):6–16. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction
Escherichia coli and Salmonella enterica are closely related species of enteric bacteria that diverged an estimated 120 to 160 MYA (Ochman and Wilson 1987). Comparisons of the genetic maps of the best-characterized representatives of these species, E. coli K12 and S. enterica sv. Typhimurium LT2, reveal extensive conservation in the order and spacing of mapped loci (Riley and Krawiec 1987). Moreover, the chromosome sizes of these two strains, as estimated by physical mapping, are very similar: the E. coli K12 chromosome is 4.6 Mb in length (Kohara, Akiyama, and Isono 1987; Smith et al. 1987), whereas that of Typhimurium LT2 is 4.8 Mb (Liu and Sanderson 1992). Although these findings suggest that the structure of bacterial genomes is evolutionarily conserved, the range of variation in genome size among natural isolates of E. coli greatly exceeds that observed between E. coli K12 and Typhimurium LT2 (Brenner et al. 1972; Bergthorsson and Ochman 1995). Genome sizes of natural isolates of E. coli can vary by as much as 650 kb (Bergthorsson and Ochman 1995), and among serovars of S. enterica (Enteritidis, Paratyphi, Typhi, and Typhimurium), chromosome sizes can differ by 300 kb (Liu, Hessel, and Sanderson 1993).

Despite the overall correspondence in the size and organization of the E. coli K12 and Typhimurium LT2 chromosomes, alignments of their physical and genetic maps have revealed several large regions confined to only one species. These regions, termed "chromosomal loops" (Riley and Krawiec 1987), are distributed throughout the chromosome and cumulatively account for more than 10% of the DNA in each species. Analysis of the base composition of sequenced genes from E. coli suggests that as much as 15%, or 700 kb, of the E.
coli K12 genome may have been acquired by transfer from foreign sources and that perhaps 30 kb of foreign DNA is acquired every million years (Lawrence and Ochman 1997).

Although the evolution of E. coli and S. enterica has been marked by the acquisition and deletion of large regions of DNA, the chromosomes of these species remain symmetric in the sense that approximately equal distances are maintained between the replication origin and terminus on each side of the circular chromosome. Selection to maintain chromosome symmetry is thought to preserve gene order by limiting rearrangements, and there is some support for this notion from experimental populations of E. coli in which the deleterious effect of an inversion is related to the resulting asymmetry in the distance between the origin and terminus (François et al. 1990; Hill, Harvey, and Gray 1990; Riley and Sanderson 1990). Furthermore, in two naturally occurring isolates of S. enterica, inversions have apparently compensated for asymmetries introduced by large chromosomal insertions (Liu and Sanderson 1995a, 1995b). Thus, it appears that patterns of change in bacterial genomes are affected by natural selection to maintain chromosome symmetry.

In this paper, we address several issues concerning the large degree of variation in genome size observed among naturally occurring strains of E. coli:

1. How much of the variation in total genome size can be attributed to chromosomal DNA? The previously published estimates of genome size in natural isolates of E. coli were based on the cumulative lengths of all restriction fragments resolved on pulsed-field gels, and in that study, the assignment of length variation to particular chromosomal regions, or even to extrachromosomal elements, was impeded by the highly variable restriction fragment patterns present in natural populations.

2. Is size variation distributed randomly over the chromosome, or are specific regions prone to acquire and delete DNA? Laboratory isolates of E. coli exhibit more structural variation close to the replication terminus (Perkins et al. 1993), perhaps due to higher recombination rates in this region. Based on these studies, we might also expect the terminus region to be more variable in natural isolates; however, in S. enterica, a region near the replication origin accounts for most of the variation in chromosome length (Liu, Hessel, and Sanderson 1993).

3. What does the pattern of chromosome variation in nature reveal about selective constraints acting on chromosome organization? For example, are differences in chromosome length among strains symmetrically distributed around the replication origin, as expected if natural selection acts to keep equal distances between the origin and terminus of replication?

4. What is the absolute rate of chromosome evolution in enteric bacteria? The original comparisons of E. coli and Typhimurium suggest that chromosome size is stable over long evolutionary periods.

Using the homing endonuclease I-CeuI (Marshall and Lemieux 1992), whose restriction sites occur only in rDNA operons, we have investigated the distribution of length variation over the entire chromosome in natural isolates of E. coli. In this study, we establish that the range of variation in chromosome length is in fact larger than the range of overall genome size previously detected among natural isolates of E. coli, and that these size differences are distributed symmetrically with respect to the replication origin and terminus.

Materials and Methods

Bacterial Strains
We selected 35 strains of E. coli from the ECOR reference collection (Ochman and Selander 1984), which includes natural isolates from a wide variety of hosts and geographic regions.
Phylogenetic relationships among these strains have been inferred from variation at 38 polymorphic loci as detected by enzyme electrophoresis (Herzer et al. 1990).

Restriction Endonuclease Digestion
Agarose plugs containing intact genomic DNA were prepared as previously described (Bergthorsson and Ochman 1995). Approximately 50 µl of each agarose plug (containing 10 ng/µl DNA) was digested overnight at 37°C with 0.3 U of I-CeuI (NEB) in 50 µl of restriction enzyme buffer. Partial digests were generated by overnight digestion with 0.05 U of enzyme.

PFGE and Physical Mapping
Approximately 15 µl of an agarose plug was inserted into a 0.9% agarose gel and subjected to electrophoresis in 0.5× TBE at 14°C in a CHEF-DR II pulsed-field gel box (Bio-Rad Laboratories, Richmond, Calif.). To separate fragments smaller than 1,000 kb, electrophoresis proceeded for 24 h at 180 V with pulse times varying according to the intended range of resolution (Bergthorsson and Ochman 1995). To resolve the largest I-CeuI restriction fragment, which is typically over 2,500 kb in length, samples were electrophoresed in a 0.7% agarose gel for 120 h at 60 V with pulse times ramped from 10 to 16 min over the course of the run. Gels were stained in 0.01% ethidium bromide and photographed under UV light. Lambda ladder, low range PFG marker (NEB), and chromosomes from Saccharomyces cerevisiae and Hansenula wingei (Bio-Rad) were used as molecular size markers. Size estimates of H. wingei chromosomes, as provided by the supplier, were used with one exception: our comparisons of H. wingei chromosomes with I-CeuI fragment A of E. coli K12 (which is 2,450 kb in length) yielded a size of 2.5 Mb for H. wingei chromosome VI, rather than the 2.7 Mb reported by the supplier.
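The text does not say how band sizes were read from the size markers. As a minimal sketch, assuming simple piecewise-linear interpolation between the two markers that bracket a band (reasonable only where mobility is roughly linear in size over that interval), the calculation could look like this; the function name `interpolate_size` and the marker values are hypothetical, not taken from this study.

```python
from typing import List, Tuple


def interpolate_size(band_mm: float, markers: List[Tuple[float, float]]) -> float:
    """Estimate a band's size (kb) from its migration distance (mm).

    Hypothetical helper: `markers` holds (migration_mm, size_kb) pairs for
    size standards run on the same gel. Size is interpolated linearly
    between the two markers that bracket the band; no extrapolation.
    """
    pts = sorted(markers)  # order markers by migration distance
    for (d1, s1), (d2, s2) in zip(pts, pts[1:]):
        if d1 <= band_mm <= d2:
            if d2 == d1:  # co-migrating markers; avoid dividing by zero
                return s1
            frac = (band_mm - d1) / (d2 - d1)
            return s1 + frac * (s2 - s1)
    raise ValueError("band lies outside the range spanned by the markers")


# Illustrative marker values only (not measurements from this study).
markers = [(10.0, 1000.0), (20.0, 500.0)]
print(interpolate_size(15.0, markers))  # 750.0
```

Because every band is sized relative to the markers, an error in a marker's assigned size carries directly into the bands it brackets, which is why the corrected size of H. wingei chromosome VI matters for estimates of fragment A.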
In cases where we did not observe consistent differences in the sizes of particular I-CeuI fragments across multiple runs, fragment size estimates were considered to be equal and were pooled in the calculation of average size and its standard error. (This pooling was not applied to fragment A, whose sizes and standard errors were based on one to four runs for each isolate.) The relative chromosomal position of each I-CeuI fragment was established by partial digests, and the identity of fragments was determined from Southern blots of I-CeuI-digested DNA probed with genes whose locations on the E. coli K12 chromosome are known. Loci used to identify I-CeuI fragments by Southern hybridization were: dif at 34.3 min for fragment A; rpoS at 61.8 min for fragment B; oriC at 84.6 min for fragment C; uvrD at 86.2 min for fragment D; glnLG at 87.3 min for fragment E; and ileS at 0.5 min for fragment G. When referring to a particular I-CeuI fragment, we denote the pulsed-field gel band that hybridized to the corresponding probe. Map positions of these probes are based on the nucleotide sequence of the E. coli K12 chromosome (Blattner et al. 1997, GenBank accession number U00094). To confirm fragment identity, PCR assays were performed on DNA recovered from I-CeuI fragments originally recognized as B, C, and G. Primers for genes at the opposite ends of fragments B, C, and G were designed from the nucleotide sequence of the E. coli K12 chromosome (Blattner et al. 1997). The assayed genes were as follows: clpB at 58.9 min and yhdZ at 72.7 min for fragment B; aroE at 73.9 min for fragment C; and metA at 90.8 min and proS at 4.7 min for fragment G. Small samples of agarose containing the appropriate I-CeuI restriction fragment were removed from a gel with a Pasteur pipette and melted in 30 µl ddH2O, and 1 µl of this preparation was used as template in the PCR.
Amplification reactions proceeded for 25 cycles of 94°C for 1 min, 58°C for 1 min, and 72°C for 1 min.

Plasmid Analysis
Plasmid DNA was isolated by the method of Kado and Liu (1981), digested with a rare-cutting restriction enzyme (Bln I, Not I, Sfi I, or Xba I), and resolved by PFGE as described above.

Statistical Analysis
Differences in average chromosome size among subspecific groups of E. coli were tested by a single-classification analysis of variance (Sokal and Rohlf 1981, p. 210). Chromosome symmetry and the distribution of length variation with respect to the replication origin and terminus were examined by comparing the cumulative lengths of fragments flanking each side of the replication origin. To eliminate nonindependence of data points due to the common ancestry of isolates, we applied the method of phylogenetically independent contrasts proposed by Felsenstein (1985). The analysis was performed with programs written by Garland et al. (1993) and Martins (1996) and applied to data extracted from the neighbor-joining tree of the genetic relationships among strains (Herzer et al. 1990). Because the absolute values of standardized contrasts are inversely related to the lengths of branches on the tree, the branch lengths were log-transformed prior to standardization and subsequent calculations (Garland, Harvey, and Ives 1992). To test whether the correlations were sensitive to uncertainties in the phylogenetic relationships among strains, phylogenetic contrasts were performed on 1,000 random trees (Martins 1996) and on an alternative tree for the ECOR collection based on random amplified DNA sequences (Desjardins et al. 1995). To test whether regions of the chromosome differ in degree of variability, we compared the variances in the lengths of different chromosomal regions based on I-CeuI fragments.
The variance in the length of a given region is expected to increase linearly with fragment size, assuming that the number of insertions, duplications, and deletions per fragment increases with fragment length. Therefore, the variance in the size of each I-CeuI fragment was standardized by dividing it by the average size of that fragment before the values were subjected to pairwise F-tests of the equality of variances (Sokal and Rohlf 1981, p. 185). Critical significance values of the pairwise F-tests were adjusted with a sequentially rejective Bonferroni procedure for multiple comparisons (Holm 1979; Rice 1989). Copy numbers of IS elements in ECOR strains were taken from Sawyer et al. (1987), Hall et al. (1989), and Lawrence, Ochman, and Hartl (1992), and growth rates were taken from Mikkola and Kurland (1991).

PCR Assay for Pathogenicity Islands
The chromosomes of certain strains of E. coli have integrated large regions called pathogenicity islands. The frequency of a 35-kb pathogenicity island, the locus of enterocyte effacement (LEE), was assessed by PCR using primer sequences published in McDaniel et al. (1995). Primers K255 and K260 flank the right junction of the LEE (K260 is outside the locus), and K295 and K296 flank the left junction (K295 is outside). PCR products from reactions using the two primer pairs, K255 and K260, and K295 and K296, denote the presence of the LEE at this site, whereas a reaction product of 527 bp from the flanking primers (K260 and K261) indicates that this site is not interrupted by the LEE or any other piece of DNA. Amplification reactions were performed for 25 cycles at 94°C for 1 min, 50°C for 1 min, and 72°C for 1 min.

Results

Variation in Chromosome Size Among Natural Isolates of E. coli
All 35 ECOR isolates produced seven fragments after digestion with I-CeuI, indicating that the number of rrn operons is conserved among strains of E. coli. Following the convention of Liu, Hessel, and Sanderson (1993), these fragments were designated A through G (fig. 1). Based on the cumulative sizes of these fragments for each strain, natural isolates of E. coli can differ by over 1 Mb in the lengths of their chromosomes, with sizes ranging from 4,500 to 5,520 kb (table 1 and fig. 2).

FIG. 1. Locations of I-CeuI recognition sites on the E. coli K12 chromosome. I-CeuI cleaves at the seven rrn genes, whose map positions are indicated. The resulting restriction fragments are designated A through G.

Differences Between Subgroups of E. coli
The phylogenetic tree of Herzer et al. (1990) displays five major subspecific groups within E. coli. We detected significant differences in chromosome size among these subgroups (F = 6.8, P < 0.001). The laboratory strain E. coli K12 has a chromosome size of only 4.6 Mb and is most closely related to strains from subgroup A, which contains the strains with the smallest chromosomes. Strains with the largest chromosomes are found in subgroups B2, D, and E of the ECOR collection.

Changes in Different Chromosomal Regions Are Correlated
The most striking feature of the size variation in I-CeuI restriction fragments is that the sizes of fragments to the left of the replication origin and those to the right are strongly related (fig. 2). The correlation between the cumulative size of the BC region, which proceeds counterclockwise from the replication origin to the terminus, and that of the DEFG region, which proceeds clockwise, is highly significant (r = 0.83, P < 0.001), denoting symmetry around the origin of replication (fig. 3). The corresponding correlation between the region surrounding the origin of replication (fragments B through G) and the region flanking the terminus (fragment A) is much lower but still statistically significant (r = 0.52, P < 0.01).

Because the strains used in this analysis are phylogenetically related, the significance of the correlation between the lengths on either side of the origin may be inflated by changes in a few ancestral strains. Using phylogenetically independent contrasts, the association between changes on both sides of the replication origin remains significant (r = 0.58, P < 0.01; fig. 4), and the 99% confidence limits on the distribution of correlation coefficients generated from 1,000 random phylogenetic trees do not include zero. The correlation between the lengths of the region that spans the replication origin and the region that spans the terminus is also significant after the phylogenetic correction (r = 0.39, P < 0.05). These correlations would not be expected if chromosomal insertions and deletions were independent events.

Table 1
Sizes of I-CeuI Restriction Fragments and Total Chromosome Size (kb) of 35 Natural Isolates of Escherichia coli

                                     I-CeuI fragments^b
Subgroup^a  Strain   A^d            B           C          D          E          F         G          Size^c
Group A     ECOR 4   2,585 (35.0)   707 (3.4)   527 (2.0)  90 (0.6)   166 (1.2)  38 (0.3)  608 (3.3)  4,720
            ECOR 5   2,940 (10.0)   743 (3.9)   515 (2.5)  90 (0.6)   128 (1.2)  38 (0.3)  699 (3.3)  5,150
            ECOR 11  2,750*         824 (5.7)   556 (4.0)  90 (0.6)   128 (1.2)  38 (0.3)  735 (4.5)  5,120
            ECOR 13  2,485 (12.0)   680 (3.4)   515 (2.5)  90 (0.6)   128 (1.2)  38 (0.3)  639 (2.1)  4,580
            ECOR 14  2,645 (45.0)   735 (4.5)   608 (3.3)  90 (0.6)   128 (1.2)  38 (0.3)  707 (3.4)  4,950
            ECOR 15  2,690 (55.0)   735 (4.5)   575 (6.1)  90 (0.6)   138 (0.6)  38 (0.3)  639 (2.1)  4,910
            ECOR 18  2,510 (15.3)   699 (3.8)   515 (2.5)  90 (0.6)   122 (0.5)  38 (0.3)  608 (3.3)  4,580
            ECOR 19  2,480 (11.6)   699 (3.8)   527 (2.0)  90 (0.6)   122 (0.5)  38 (0.3)  639 (2.1)  4,600
            ECOR 20  2,505 (25.0)   654 (8.8)   480 (0)    90 (0.6)   122 (0.5)  38 (0.3)  608 (3.3)  4,500
            ECOR 21  2,505 (25.0)   654 (8.8)   480 (0)    90 (0.6)   122 (0.5)  38 (0.3)  608 (3.3)  4,500
            ECOR 23  2,675 (5.0)    807 (3.8)   532 (3.0)  90 (0.6)   138 (0.6)  38 (0.3)  680 (3.4)  4,960
Group B1    ECOR 27  2,600 (33.3)   707 (3.4)   515 (2.5)  90 (0.6)   143 (0.9)  38 (0.3)  616 (1.8)  4,710
            ECOR 28  2,620 (27.5)   743 (3.9)   527 (2.0)  94 (0.7)   128 (1.2)  38 (0.3)  639 (2.1)  4,790
            ECOR 29  2,610 (60.0)   787 (2.6)   527 (2.0)  94 (0.7)   138 (0.6)  38 (0.3)  639 (2.1)  4,830
            ECOR 34  2,500*         790 (5.6)   515 (2.5)  94 (0.7)   138 (0.6)  38 (0.3)  680 (3.4)  4,760
            ECOR 58  2,700 (25.3)   743 (3.9)   515 (2.5)  94 (0.7)   136 (1.8)  38 (0.3)  639 (2.1)  4,870
            ECOR 68  2,745 (25.0)   843 (5.5)   532 (3.0)  94 (0.7)   138 (0.6)  38 (0.3)  807 (3.8)  5,200
            ECOR 71  2,650*         771 (8.0)   547 (5.4)  90 (0.6)   138 (0.6)  38 (0.3)  654 (8.8)  4,890
            ECOR 72  2,635 (65.0)   771 (8.0)   532 (3.0)  94 (0.7)   138 (0.6)  38 (0.3)  680 (3.4)  4,890
Group B2    ECOR 51  2,750 (50.0)   810 (5.6)   550 (2.9)  112 (1.1)  138 (0.6)  38 (0.3)  810 (5.6)  5,210
            ECOR 56  2,590 (54.9)   824 (5.7)   550 (2.9)  112 (1.1)  138 (0.6)  38 (0.3)  707 (3.4)  4,960
            ECOR 57  2,860 (17.6)   810 (5.6)   550 (2.9)  112 (1.1)  138 (0.6)  38 (0.3)  810 (5.6)  5,320
            ECOR 60  2,580 (8.8)    790 (5.6)   608 (3.3)  112 (1.1)  138 (0.6)  38 (0.3)  824 (5.7)  5,090
            ECOR 61  2,505 (15.0)   776 (12.4)  527 (2.0)  112 (1.1)  166 (1.2)  38 (0.3)  776 (8.3)  4,900
            ECOR 62  2,585 (15.0)   843 (5.5)   527 (2.0)  112 (1.1)  166 (1.2)  38 (0.3)  790 (5.6)  5,060
            ECOR 63  2,700 (25.0)   873 (7.3)   550 (2.9)  112 (1.1)  138 (0.6)  38 (0.3)  807 (3.8)  5,220
            ECOR 64  2,775 (165.0)  810 (5.6)   581 (2.1)  112 (1.1)  175 (1.2)  38 (0.3)  743 (2.7)  5,230
            ECOR 65  2,500 (0.0)    787 (2.6)   527 (2.0)  104 (1.0)  138 (0.6)  38 (0.3)  707 (3.4)  4,800
Group D     ECOR 36  2,710 (26.5)   824 (5.7)   556 (4.0)  104 (1.0)  138 (0.6)  38 (0.3)  743 (2.7)  5,110
            ECOR 38  2,800*         807 (3.8)   616 (1.8)  104 (1.0)  143 (0.9)  43 (0.9)  807 (3.8)  5,320
            ECOR 39  2,780 (80.0)   787 (2.6)   581 (2.1)  104 (1.0)  143 (0.9)  38 (0.3)  713 (6.3)  5,150
            ECOR 40  2,845 (35.0)   807 (3.8)   616 (1.8)  104 (1.0)  143 (0.9)  43 (0.9)  787 (2.6)  5,350
Group E     ECOR 31  2,775 (35.0)   743 (2.7)   547 (5.4)  94 (0.7)   138 (0.6)  38 (0.3)  735 (4.5)  5,070
            ECOR 37  3,100 (0.0)    787 (2.6)   581 (2.1)  94 (0.7)   175 (1.2)  38 (0.3)  743 (2.7)  5,520
            ECOR 42  2,735 (35.0)   743 (2.7)   616 (1.8)  94 (0.7)   143 (0.9)  38 (0.3)  699 (3.3)  5,070

^a The E. coli subgroup designations are derived from the neighbor-joining tree of Herzer et al. (1990). The phylogenetic positions of subgroups are shown in figure 2.
^b See figure 1 for positions of I-CeuI fragments. All fragment lengths are given in kb, and standard errors are given in parentheses after each fragment size estimate.
^c Chromosome size is computed as the sum of all I-CeuI fragment lengths, rounded to the nearest 10 kb.
^d An asterisk indicates that the size of the fragment was measured only once, so there is no estimate of standard error.

FIG. 2. Linear representation of I-CeuI maps for 19 natural isolates of E. coli and E. coli K12. Letters A through G along the bottom correspond to I-CeuI fragments, and the megabase scale at the top shows chromosome lengths. Genetic relationships are inferred from variation at 38 enzyme loci (Herzer et al. 1990). Numbers in this tree represent ECOR strain designations, and letters represent major subgroups within E. coli. The shaded box depicts the range of variation in chromosome size between E. coli K12 and the ECOR strain with the largest chromosome.

FIG. 3. Relationship between lengths of chromosomal regions flanking the replication origin in natural isolates of E. coli. The region left of the origin comprises I-CeuI fragments B and C, located between 56.2 and 84.6 min on the E. coli K12 chromosome. (Although the replication origin lies within fragment C, >90% of this fragment is located counterclockwise to the origin.) The region right of the origin comprises fragments D, E, F, and G, which map between 84.6 and 5.1 min on the E. coli K12 chromosome (r = 0.83, P < 0.001, df = 33).

Variation Among Chromosomal Regions
With the exception of the smallest fragment (F), all I-CeuI fragments show considerable length variation among natural isolates of E. coli (table 1, fig. 2).
For example, I-CeuI fragment G ranges from 608 to 824 kb in length, overlapping in size with fragments B (654 to 873 kb) and C (480 to 616 kb). Therefore, the identity of each fragment was determined by Southern hybridization to genes of known position on the E. coli K12 chromosome. Probes to oriC and dif were used to verify that the replication origin and terminus were located on fragments C and A, respectively. PCR assays of clpB, yhdZ, aroE, metA, and proS confirmed the identity and map position of the I-CeuI fragments. Although these genes were generally recovered from the expected fragments (based on size and Southern hybridizations), there were a few exceptions: proS was amplified from fragment B instead of fragment G of ECOR 57, and yhdZ was detected on fragment G but not on fragment B in ECOR 61. These cases indicate the occurrence of either large-scale chromosomal inversions or translocations in these strains. There were also instances in which amplification from the isolated fragments failed, although PCR assays on total genomic DNA indicated that the gene was present in these strains. We did not detect metA in fragments B, C, or G in ECOR 11 from subgroup A or in ECOR 51, 56, 57, 61, 62, and 65 from subgroup B2. Furthermore, proS was not detected in these same fragments in ECOR 61, 65, and 72.

FIG. 4. Bivariate plot of phylogenetically independent contrasts of lengths of regions flanking the replication origin. The neighbor-joining tree of Herzer et al. (1990) is used to remove covariation due to common ancestry of isolates.

FIG. 5. Relationship between the variances of different I-CeuI fragments (A through G), standardized by their average size, and the average size of the corresponding fragments. Standardization should remove the linear relationship between the variance and the average fragment size; however, a significant relationship still exists. Fragment G exhibits larger variation than expected for a fragment of this size.

To test whether certain regions of the chromosome exhibit different levels of length variation, we compared the standardized variances of the fragments across strains in pairwise F-tests. There were no significant differences in degree of variation among the larger fragments A, B, and G; however, fragments G and A are significantly more variable than fragment C (whereas B is not). The variance in the size of the smallest I-CeuI fragment (F) is significantly less than that of all other I-CeuI fragments, with its length ranging from 38 to 43 kb among the natural isolates (table 1). There is a significant correlation between I-CeuI fragment size and the standardized variance of each fragment (r = 0.85, P < 0.05, df = 5; fig. 5). This relationship could, in principle, be explained by a positive correlation between chromosomal changes occurring within each region, which would increase the expected variance of each fragment. (If, for example, each I-CeuI fragment consists of two smaller parts, the expected variance of the whole fragment equals the sum of the variances of the parts plus twice their covariance. Therefore, standardized variances that are higher than expected could result from correlated changes within a fragment.) Alternatively, the correlation between I-CeuI fragment size and its standardized variance could result from the positions of particular I-CeuI fragments around the chromosome. Smaller I-CeuI fragments are clustered close to the replication origin, and if chromosome organization is more conserved closer to the replication origin, we might expect the standardized variance in length to be related to fragment length. Indeed, there is a significant correlation between the standardized variance of a fragment and its distance from the replication origin.
However, the variability of the half of the chromosome flanking the replication origin (the sum of fragments B through G) does not differ significantly from that of the half flanking the terminus (fragment A); a difference would be expected if constraints on chromosome organization decreased from the origin to the terminus. In sum, this analysis indicates that the relationship between standardized variance and average fragment size is due to correlated size changes over the entire chromosome rather than to regional conservation closer to the origin of replication.

Among serovars of S. enterica, I-CeuI fragment G, corresponding to 90.5 to 5.1 min on the E. coli K12 chromosome, accounts for a majority of the total difference in chromosome length between the largest and the smallest chromosomes (Liu, Hessel, and Sanderson 1993). This same chromosomal region also appears to be unusually variable in E. coli: in figure 5, the point for fragment G lies well above the regression line of standardized variance on average fragment size.

Chromosome Size and Repetitive DNA
Repetitive sequences, such as IS elements, are widespread in the E. coli genome, and their copy numbers are highly variable among strains (Sawyer et al. 1987). In addition to contributing directly to genome size variation, repetitive DNAs can affect the rate of large-scale genome rearrangements by providing regions of homology for ectopic recombination, which can lead to duplications, deletions, inversions, translocations, and the integration of DNA (Umeda and Ohtsubo 1989; Deonier 1996). IS elements increase the rate of duplications and deletions, but because duplications are very unstable and frequently revert, whereas deletions are virtually irreversible, high numbers of IS elements might act to reduce chromosome sizes. Alternatively, IS elements could act as sites for integration of plasmids and fragments of foreign DNA into the chromosome, thereby increasing genome size.
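The argument that IS-mediated duplications and deletions should, on balance, shrink chromosomes can be made concrete with a toy expected-value model. This is a hypothetical illustration, not a model used in the paper: duplications of `event_kb` arise at rate `dup_rate` per step and each persisting duplication reverts with probability `revert_prob` per step, whereas deletions of the same size arise at rate `del_rate` and never revert. All function names, parameter names, and values are assumptions.

```python
def expected_length(l0_kb: float, steps: int, dup_rate: float, del_rate: float,
                    revert_prob: float, event_kb: float) -> float:
    """Expected chromosome length after `steps` steps in a toy dup/del model.

    Hypothetical model: the expected kb held in unstable duplications decays
    by a factor (1 - revert_prob) each step and gains dup_rate * event_kb,
    while the expected kb removed by irreversible deletions grows by
    del_rate * event_kb every step.
    """
    dup_kb = 0.0      # expected kb currently present as duplications
    deleted_kb = 0.0  # expected kb lost to deletions (never restored)
    for _ in range(steps):
        dup_kb = dup_kb * (1.0 - revert_prob) + dup_rate * event_kb
        deleted_kb += del_rate * event_kb
    return l0_kb + dup_kb - deleted_kb


# Equal event rates: duplications saturate near dup_rate * event_kb / revert_prob,
# while deletions accumulate linearly, so the expected length keeps falling
# (here it ends roughly 100 kb below the starting 4,600 kb).
print(expected_length(4600.0, 1000, 0.01, 0.01, 0.5, 10.0))
```

Setting `revert_prob` to zero makes duplications as stable as deletions, and the expected length then stays at its starting value; the net shrinkage in this sketch comes entirely from the asymmetry in reversibility described in the text.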
The number of chromosomal copies of IS1, the most numerous IS element in strains from the ECOR collection, is negatively correlated with chromosome size (r = −0.47, P < 0.01). However, when the independent-contrasts method is used to correct for covariation due to shared ancestry, the negative association between IS1 copy number and genome size disappears. No other IS elements show a significant correlation with genome size.

Chromosome Size Versus Genome Size
There is reasonably good agreement between the chromosome sizes based on I-CeuI fragment lengths and the overall genome sizes, which include chromosomal and extrachromosomal DNA, as estimated from Not I and Bln I digests of 14 ECOR isolates (Bergthorsson and Ochman 1995). There is, however, one notable exception: ECOR 37 is now estimated to have a 5.5-Mb chromosome, whereas previous estimates yielded a total genome size of approximately 4.9 Mb. This discrepancy results from at least two previously unresolved comigrating Not I and Bln I restriction fragments, which led to an underestimation of genome size.

Table 2
Chromosome Size and the Sizes of Large Plasmids in 14 Natural Isolates of Escherichia coli

Strain    Chromosome Length (kb)
ECOR 4    4,720
ECOR 13   4,580
ECOR 14   4,950
ECOR 15   4,910
ECOR 28   4,790
ECOR 29   4,830
ECOR 37   5,520
ECOR 38   5,320
ECOR 40   5,350
ECOR 51   5,210
ECOR 62   5,060
ECOR 63   5,220
ECOR 68   5,200
ECOR 71   4,890

Sizes of Large Plasmids (kb): 92, 87; 40; 102; 170; 49, 60, 60, 104; 69; 65, 85; 40; 130, 110; 90

To determine the effect of large plasmids on previous estimates of genome size using Not I and Bln I, we sampled 14 ECOR strains for the presence of large plasmids. Consistent with an earlier survey of plasmid distribution among ECOR strains (Hartl et al. 1986), approximately 70% (10 of 14) of the isolates contained plasmids over 40 kb in length (table 2). Digestion of plasmids with Bln I and Not I revealed that in 7 of the 14 isolates (ECOR 13, 14, 15, 28, 29, 62, and 71), plasmids were included in the original estimates of genome size (Bergthorsson and Ochman 1995). We detected no significant relationship between chromosome size and the cumulative sizes of large plasmids for these 14 strains (r = −0.09), indicating that total DNA content in enteric bacteria is not under strong stabilizing selection, despite the fact that E. coli K12 and Typhimurium LT2 have very similar chromosome lengths.

In some instances, the present analysis of plasmids within these strains resolved inconsistencies between the estimates of genome size derived from digestions with different restriction enzymes. For example, estimates of genome size for ECOR 29 based on Bln I and Not I digests were 5,121 and 4,763 kb, respectively (Bergthorsson and Ochman 1995). However, this strain is now known to harbor a 170-kb plasmid with two Bln I restriction sites and no Not I sites. Because only linearized plasmids are resolved under our electrophoretic conditions (Beverly 1988), the original genome size estimate for this strain calculated from Bln I digests, which included plasmid fragments, was larger than that estimated from Not I digests, which did not include any plasmid-associated fragments. Moreover, because of plasmid copy number, these Bln I fragments were scored as doublets in the original digests, resulting in an overestimation of some 340 kb. Once these plasmid fragments are eliminated, the chromosome size estimates for ECOR 29 based on the fragments produced by Bln I and Not I digests differ by less than 1% and agree well with values obtained from I-CeuI digests.
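The ECOR 29 reconciliation can be checked with simple arithmetic, assuming (as the text implies) that the two Bln I fragments of the 170-kb plasmid together span the whole plasmid and that each was scored as a doublet:

```latex
\begin{aligned}
\text{plasmid contribution to the Bln I total} &\approx 2 \times 170\ \text{kb} = 340\ \text{kb} \\
\text{corrected Bln I estimate} &\approx 5{,}121 - 340 = 4{,}781\ \text{kb} \\
\frac{4{,}781 - 4{,}763}{4{,}763} &\approx 0.0038 \approx 0.4\%
\end{aligned}
```

This is consistent with the statement that the corrected Bln I and Not I estimates differ by less than 1%; both also lie within about 70 kb of the I-CeuI-based estimate of 4,830 kb for ECOR 29 in Table 1.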
Growth Rates

If the rate of cell division is largely limited by replication rate and cell size, larger genomes are expected to reduce growth rate (Stouthamer and Kooijman 1993). Mikkola and Kurland (1991) measured growth rates in minimal medium for several ECOR strains; when their estimates are combined with the chromosome size data presented here, chromosome size and growth rate show a significant negative correlation (r = −0.44, n = 28, P < 0.05; Kendall’s τ = −0.42, P < 0.05). When the phylogenetic relationships of the ECOR strains are taken into account, the correlation remains significant (r = −0.44, n = 26, P < 0.05); however, it depends strongly on a single data point (the contrast between ECOR 56 and 57) and is sensitive to transformations of the branch lengths. Furthermore, a nonparametric test of association between the phylogenetic contrasts in chromosome size and growth rate is not significant (Kendall’s τ = −0.18, P > 0.1).

Pathogenicity Islands

Several large clusters of virulence genes have been discovered in pathogenic strains of E. coli (Hacker et al. 1990; McDaniel et al. 1995). The LEE pathogenicity island promotes attaching and effacing lesions and is inserted at the selC locus of enteropathogenic E. coli (McDaniel et al. 1995). A PCR-based assay showed that only ECOR 37 contains a LEE island at the selC locus. There may be an additional insertion at selC in this strain, because the PCR product spanning the right-hand LEE junction is approximately 6 kb long, compared with 418 bp in LEE-containing enteropathogenic strains of E. coli.

Discussion

Despite the high degree of similarity in the size and organization of the E. coli K12 and S. enterica sv. Typhimurium LT2 chromosomes, natural isolates of E. coli display a wide range of chromosome lengths, with some strains differing by more than 1 Mb. In E.
coli, the average gene is almost 1 kb long, and some 85% of the chromosome consists of coding sequences (Burland et al. 1993, 1995). Therefore, the difference between the E. coli strains with the smallest and largest chromosomes (about 1 Mb, of which roughly 850 kb would be coding sequence) might involve more than 800 genes. Although the nature of most of this genomic variation is presently unknown, three general sources could contribute to differences in chromosome length: (1) duplications and deletions; (2) the acquisition of foreign DNA, including the integration of plasmids and phages; and (3) the accumulation of repetitive DNA, such as insertion sequences and transposons.

The Sources of Chromosome Size Variation

Riley and Labedan (1996) reported that more than half of the coding sequences in E. coli K12 share regions of similarity with other genes in the genome, suggesting that the evolution of the E. coli chromosome involved the duplication and subsequent divergence of ancestral genes. Despite this conclusion, large-scale duplications are probably not an important factor in the chromosome size variation observed within E. coli. Although duplications are relatively common, most are unstable and are maintained only under strong selection (Sonti and Roth 1989). Moreover, we did not observe any additional I-CeuI sites, suggesting that little of the chromosome size variation in natural populations of E. coli originated through duplications. In some cases, E. coli K12 has more than one gene for the same function; for example, argI and argF both encode ornithine carbamoyltransferase and show 78% nucleotide sequence similarity (Van Vliet, Boyen, and Glansdorff 1988). However, argF lies in a region flanked by IS1 elements and has an unusually high GC content, suggesting that it was gained by horizontal transfer rather than through the duplication of an existing gene (York and Stodolsky 1981; Van Vliet, Boyen, and Glansdorff 1988).
Therefore, much of the similarity among E. coli K12 genes detected by Riley and Labedan (1996) could also have arisen through the transfer or fusion of orthologous genes (i.e., genes that diverged after speciation but were later incorporated into the same genome) rather than through gene duplication.

Horizontal transfer is likely to be the major source of chromosome size variation among natural strains. Analyses of sequenced E. coli K12 genes suggest that at least 6% (Whittam and Ake 1993), and perhaps as much as 17% (Médigue et al. 1991; Lawrence and Ochman 1997), of the E. coli K12 genome has atypical GC content or codon usage, suggesting that these genes were acquired through horizontal transfer. Moreover, extrapolating from the number of recently acquired genes, Lawrence and Ochman (1997) estimate that the E. coli K12 lineage has gained and lost more than 3,000 kb since its divergence from S. enterica.

Whereas size variation in eukaryotic genomes is largely attributable to repetitive DNA, prokaryotic genomes consist primarily of single-copy sequences. We expect no simple relationship between genome size and the amount of multicopy DNA in bacteria, because transposable sequences such as IS elements promote rearrangements that can either decrease or increase genome size. The number of IS elements is highest in the E. coli strains with the smallest chromosomes; however, we detected no significant correlation between the number of insertion sequences and chromosome size when the phylogenetic relationships of the strains were taken into account.

Distribution of Length Variation in Bacterial Chromosomes

In some species of bacteria, alterations in chromosome structure map to specific portions of the chromosome.
For example, in Bacillus cereus, where chromosome sizes range from 2.4 to 5.3 Mb (Carlson and Kolstø 1994), a single highly variable region is present only in strains with larger chromosomes. In Pseudomonas aeruginosa, the chromosome can be subdivided into two segments: a stable, auxotroph-rich region encompassing the origin of replication and a variable, auxotroph-poor region flanking the terminus (Römling, Greipel, and Tümmler 1995). Comparisons of physical maps of laboratory strains descended from E. coli K12 reveal a disproportionately large number of chromosome rearrangements near the replication terminus, presumably due to higher rates of recombination in this region (Perkins et al. 1993). This pattern is also consistent with the notion that the terminus region contains fewer essential genes (Henson and Kuempel 1985) and can therefore tolerate the accumulation of alterations. However, the density of genes near the terminus appears to be similar to that of the rest of the chromosome (Moir et al. 1992), and in natural isolates of E. coli, we found that the region surrounding the replication terminus was no more variable in size than the region surrounding the replication origin. In fact, strains with larger genomes display an increase in the sizes of most of their I-CeuI restriction fragments, suggesting that the acquisition and deletion events underlying the variation among natural isolates of E. coli are distributed, and correlated, throughout the chromosome.

In Salmonella, most variation in chromosome length is associated with a single region (I-CeuI fragment G), which corresponds to a 670-kb region spanning 90 to 5 min on the E. coli K12 physical map (Liu, Hessel, and Sanderson 1993). The same chromosomal region is also highly variable in natural isolates of E. coli, perhaps due to the presence of conserved sequences that act as sites for recombination.
For example, this region contains several tRNA genes, such as the leucyl-tRNA gene at 97 min, which can serve as insertion sites for phages and pathogenicity islands (Cheetham and Katz 1995).

Chromosome Symmetry

The replication origin and terminus of the E. coli chromosome are diametrically opposed, and in experimental populations of E. coli, inversions that disrupt this arrangement reduce growth rates (Riley and Sanderson 1990; Hill, Harvey, and Gray 1990; François et al. 1990). Moreover, in two serovars of S. enterica, Typhi and Paratyphi A, large chromosomal insertions have been accompanied by inversions that restore the symmetry between the replication origin and terminus (Liu and Sanderson 1995a, 1995b). Among natural isolates of E. coli, there is a highly significant association between the cumulative sizes of the I-CeuI fragments clockwise (fragments D through G) and counterclockwise (fragments B and C) of the replication origin. This symmetrical distribution of length variation supports the notion that the large-scale organization of bacterial chromosomes is governed, in part, by stabilizing selection that acts to maintain equal distances between the origin and terminus on both sides of the chromosome. Strains that have incurred an insertion on one side of the replication origin are more likely to have changes that enlarge the opposite side. Although these multiple events may not fully restore chromosome symmetry, they can serve to compensate for the original change. There is also a correlation between the region flanking the origin of replication (fragments B through G) and fragment A, which contains the replication terminus. This correlation is also attributable to insertions that maintain symmetry between the origin and terminus, because the effect of a large insertion on one side of the origin of replication could be offset by an insertion anywhere on the opposite side.
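The symmetry test described above amounts to correlating, across strains, the summed length of the I-CeuI fragments on one side of the origin with the summed length on the other. The sketch below shows that computation with a plain Pearson correlation and the usual t statistic for r (n − 2 degrees of freedom). The fragment groupings follow the text; the fragment sizes in the usage example are hypothetical placeholders, not measurements from this study, and the sketch does not apply the independent-contrasts correction used elsewhere in the analysis.

```python
import math
from typing import Dict, List, Sequence, Tuple

# I-CeuI fragment groupings relative to the replication origin, as in the text.
CLOCKWISE = ("D", "E", "F", "G")
COUNTERCLOCKWISE = ("B", "C")


def arm_length(fragments_kb: Dict[str, float], letters: Sequence[str]) -> float:
    """Total length (kb) of the named I-CeuI fragments in one strain."""
    return sum(fragments_kb[letter] for letter in letters)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), compared against t with n - 2 df."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def symmetry_test(strains: List[Dict[str, float]]) -> Tuple[float, float]:
    """Correlate clockwise and counterclockwise arm lengths across strains."""
    cw = [arm_length(s, CLOCKWISE) for s in strains]
    ccw = [arm_length(s, COUNTERCLOCKWISE) for s in strains]
    r = pearson_r(cw, ccw)
    return r, t_statistic(r, len(strains))


# Hypothetical fragment sizes (kb) for four strains, for illustration only.
example = [
    {"B": 600, "C": 500, "D": 300, "E": 300, "F": 200, "G": 300},
    {"B": 650, "C": 520, "D": 320, "E": 310, "F": 210, "G": 330},
    {"B": 700, "C": 560, "D": 330, "E": 330, "F": 220, "G": 360},
    {"B": 640, "C": 500, "D": 300, "E": 320, "F": 200, "G": 310},
]
r, t = symmetry_test(example)
print(f"r = {r:.3f}, t = {t:.2f} with {len(example) - 2} df")
```

In a real analysis, the strains would be the natural isolates and the arm lengths their measured I-CeuI fragment sums; a positive r is the signature of compensating changes on opposite sides of the origin.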
In principle, any disruption of chromosome symmetry caused by a large insertion on either side of the chromosome could also be counterbalanced by a similarly large deletion on the same side, a compensation that would produce no correlation between the lengths of the two arms between the replication origin and the terminus. Presumably, most chromosome size variation among strains is due to insertions rather than deletions, because a large deletion would usually remove essential genes and would be deleterious. Although we attribute the variation in sizes of I-CeuI fragments to the cumulative effects of insertions and deletions, it is possible that inversions or translocations have caused some of the observed variation. However, such rearrangements do not alter overall chromosome size, and they would produce negative, not positive, correlations between the lengths of different chromosomal regions, because DNA moved into one fragment is lost from another. The maintenance of chromosome symmetry may not be the only reason why the lengths of different portions of the chromosome are correlated. Differences among strains in their intrinsic rates of insertion or deletion could also introduce correlations in length between chromosomal regions, and could account for the correlation between the standardized variance in I-CeuI fragment length and the average fragment length.

Genome Size and Growth Rate

Mikkola and Kurland (1991) determined that most of the variation in growth rates among natural isolates of E. coli is due to differences in translational efficiency. We find no robust association between the total DNA content of natural isolates and the growth rate estimates of Mikkola and Kurland (1991): the correlation reported in the Results depends on a single phylogenetic contrast and is not supported by a nonparametric test. Therefore, the variation in growth rates resulting from ribosomal kinetics appears to overwhelm any effects due to DNA content. This counters the view that bacterial genomes respond to selection for faster growth rates by streamlining the sizes of their chromosomes.
If small genomes were important for faster growth in nature, we would expect some association between ribosomal kinetics and genome size. However, E. coli in nature undergoes only one to two generations per day, far from the maximal growth rate (a 20-min doubling time) of laboratory strains, and the constraints on growth under natural conditions are very different from those in culture. In fact, Mikkola and Kurland (1992) found that natural isolates of E. coli converge on the growth rate characteristic of laboratory isolates when allowed to adapt to laboratory conditions for a few hundred generations.

The Ages of E. coli Subgroups and the Rate of Chromosome Size Divergence

By scaling the genetic divergence among strains from different subgroups of E. coli to that between E. coli and S. enterica, it is possible to estimate the rate of chromosome size evolution within natural populations. Between isolates of E. coli from subgroup A and those from subgroups B2 and D (fig. 2), the average divergence at synonymous sites at six loci is 0.044 for mdh, 0.153 for trpB, 0.114 for trpC, 0.133 for putP, 0.016 for gapA, and 0.061 for crr (Nelson, Whittam, and Selander 1991; Hall and Sharp 1992; Nelson and Selander 1992; Boyd et al. 1994; Milkman 1996). At the same loci, divergence between E. coli and S. enterica ranges from 0.252 for gapA to 1.39 for trpC. Assuming that E. coli and S. enterica diverged 140 MYA and that the rate of molecular evolution has been uniform within lineages, the divergence time between these E. coli subgroups (for each locus, 140 Myr multiplied by the ratio of the divergence between E. coli subgroups to that between E. coli and S. enterica) ranges from 8 to 22 MYA. This estimate is considerably more recent than the date of 80 MYA proposed by Tominaga et al. (1994) for the divergence of Shigella and E. coli, which was based on sequence variation in fliC genes. Because Shigella spp. are a polyphyletic group of strains that arose from within E. coli, one expects the time of divergence between Shigella and E.
coli to be the same as, or less than, that between subgroups of E. coli. However, fliC of Shigella has undergone extensive recombination with the flagellin genes of Salmonella and is under diversifying selection (Tominaga et al. 1994; Li et al. 1994); both processes increase its level of sequence polymorphism and lead to an overestimate of divergence time. Applying these divergence times to the average difference in chromosome size between isolates from subgroup A and those from subgroups B2 and D yields a rate of chromosome size divergence of 16–44 kb/Myr. Based on comparisons between E. coli K12 and S. enterica sv. Typhimurium LT2, chromosome size appears to be well conserved, despite an estimated rate of additions and deletions of 30 kb/Myr (Lawrence and Ochman 1997). Although chromosome sizes within E. coli are highly variable, there is a large phylogenetic component to this variation, indicating that chromosome sizes also remain relatively stable in natural populations of E. coli over long evolutionary periods. This stability reflects some constraints on absolute chromosome size in addition to those imposed on chromosome symmetry.

Acknowledgment

This work was supported by NIH grant GM56120.

LITERATURE CITED

BERGTHORSSON, U., and H. OCHMAN. 1995. Heterogeneity of genome sizes among natural isolates of Escherichia coli. J. Bacteriol. 177:5784–5789.
BEVERLEY, S. M. 1988. Characterization of the ‘unusual’ mobility of large circular DNAs in pulsed field-gradient electrophoresis. Nucleic Acids Res. 16:925–939.
BLATTNER, F. R., G. PLUNKETT III, C. A. BLOCH et al. (13 coauthors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1462.
BOYD, E. F., K. NELSON, F.-S. WANG, T. S. WHITTAM, and R. K. SELANDER. 1994. Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. Proc. Natl. Acad. Sci. USA 91:1280–1284.
BRENNER, D. J., G. R. FANNING, F. J. SKERMAN, and S. FALKOW. 1972. Polynucleotide sequence divergence among strains of Escherichia coli and closely related organisms. J. Bacteriol. 109:953–965.
BURLAND, V., G. PLUNKETT III, D. L. DANIELS, and F. R. BLATTNER. 1993. DNA sequence and analysis of 136 kilobases of the Escherichia coli genome: organizational symmetry around the origin of replication. Genomics 16:551–561.
BURLAND, V., G. PLUNKETT III, H. J. SOFIA, D. L. DANIELS, and F. R. BLATTNER. 1995. Analysis of the Escherichia coli genome VI: DNA sequence of the region from 92.8 through 100 minutes. Nucleic Acids Res. 23:2105–2119.
CARLSON, C. R., and A.-B. KOLSTØ. 1994. A small (2.4 Mb) Bacillus cereus chromosome corresponds to a conserved region of a larger (5.3 Mb) Bacillus cereus chromosome. Mol. Microbiol. 13:161–169.
CHEETHAM, B. F., and M. E. KATZ. 1995. A role for bacteriophages in the evolution and transfer of bacterial virulence determinants. Mol. Microbiol. 18:201–208.
DEONIER, R. C. 1996. Native insertion sequence elements: locations, distributions, and sequence relationships. Pp. 2000–2011 in F. C. NEIDHARDT, R. CURTISS III, J. L. INGRAHAM, E. C. C. LIN, K. B. LOW, B. MAGASANIK, W. S. REZNIKOFF, M. RILEY, M. SCHAECHTER, and H. E. UMBARGER, eds. Escherichia coli and Salmonella typhimurium. Cellular and molecular biology. 2nd edition. ASM Press, Washington, D.C.
DESJARDINS, P., B. PICARD, B. KALTENBÖCK, J. ELION, and E. DENAMUR. 1995. Sex in Escherichia coli does not disrupt the clonal structure of the population: evidence from random amplified polymorphic DNA and restriction-fragment-length polymorphism. J. Mol. Evol. 41:440–448.
DIMRI, G. P., K. E. RUDD, M. K. MORGAN, H. BAYAT, and G. F.-L. AMES. 1992. Physical mapping of repetitive extragenic palindromic sequences in Escherichia coli and phylogenetic distribution among Escherichia coli strains and other enteric bacteria. J. Bacteriol. 174:4583–4593.
FELSENSTEIN, J. 1985. Phylogenies and the comparative method. Am. Nat. 125:1–15.
FRANÇOIS, V., J. LOUARN, J.-E. REBOLLO, and J.-M. LOUARN. 1990. Replication termination, nondivisible zones, and structure of the Escherichia coli chromosome. Pp. 351–359 in K. DRLICA and M. RILEY, eds. The bacterial chromosome. ASM Press, Washington, D.C.
GARLAND, T. JR., A. W. DICKERMAN, C. M. JANIS, and J. A. JONES. 1993. Phylogenetic analysis of covariance by computer simulation. Syst. Biol. 42:265–292.
GARLAND, T. JR., P. H. HARVEY, and A. R. IVES. 1992. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol. 41:18–32.
HACKER, J., L. BENDER, M. OTT, J. WINGENDER, B. LUND, R. MARRE, and W. GOEBEL. 1990. Deletions of chromosomal regions coding for fimbriae and hemolysins occur in vitro and in vivo in various extraintestinal Escherichia coli isolates. Microb. Pathog. 8:213–225.
HALL, B. G., L. L. PARKER, P. W. BETTS, R. F. DUBOSE, S. A. SAWYER, and D. L. HARTL. 1989. IS103, a new insertion element in Escherichia coli: characterization and distribution in natural populations. Genetics 121:423–431.
HALL, B. G., and P. M. SHARP. 1992. Molecular population genetics of Escherichia coli: DNA sequence diversity at the celC, crr, and gutB loci of natural isolates. Mol. Biol. Evol. 9:654–665.
HARTL, D. L., M. MEDHORA, L. GREEN, and D. E. DYKHUIZEN. 1986. The evolution of DNA sequences in Escherichia coli. Phil. Trans. R. Soc. Lond. B Biol. Sci. 312:191–204.
HENSON, J. M., and P. L. KUEMPEL. 1985. Deletion of the terminus region (340 kilobase pairs of DNA) from the chromosome of Escherichia coli. Proc. Natl. Acad. Sci. USA 82:3766–3770.
HERZER, P. J., S. INOUYE, M. INOUYE, and T. S. WHITTAM. 1990. Phylogenetic distribution of branched RNA-linked multicopy single-stranded DNA among natural isolates of Escherichia coli. J. Bacteriol. 172:6175–6181.
HILL, C. W., S. HARVEY, and J. A. GRAY. 1990. Recombination between rRNA genes in Escherichia coli and Salmonella typhimurium. Pp. 335–340 in K. DRLICA and M. RILEY, eds. The bacterial chromosome. ASM Press, Washington, D.C.
HOLM, S. 1979. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6:65–70.
KADO, C. I., and S.-T. LIU. 1981. Rapid procedure for detection and isolation of large and small plasmids. J. Bacteriol. 145:1365–1373.
KOHARA, Y., K. AKIYAMA, and K. ISONO. 1987. The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell 50:495–508.
LAWRENCE, J. G., and H. OCHMAN. 1997. Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol. 44:383–397.
LAWRENCE, J. G., H. OCHMAN, and D. L. HARTL. 1992. The evolution of insertion sequences within enteric bacteria. Genetics 131:9–20.
LI, J., K. NELSON, A. C. MCWHORTER, T. S. WHITTAM, and R. K. SELANDER. 1994. Recombinational basis of serovar diversity in Salmonella enterica. Proc. Natl. Acad. Sci. USA 91:2552–2556.
LIU, S.-L., and K. E. SANDERSON. 1992. A physical map of the Salmonella typhimurium LT2 genome made by using XbaI analysis. J. Bacteriol. 174:1662–1672.
LIU, S.-L., and K. E. SANDERSON. 1995a. Rearrangements in the genome of the bacterium Salmonella typhi. Proc. Natl. Acad. Sci. USA 92:1018–1022.
LIU, S.-L., and K. E. SANDERSON. 1995b. I-CeuI reveals conservation of the genome of independent strains of Salmonella typhimurium. J. Bacteriol. 177:3355–3357.
LIU, S.-L., A. HESSEL, and K. E. SANDERSON. 1993. Genomic mapping with I-CeuI, an intron-encoded endonuclease specific for genes for ribosomal RNA, in Salmonella spp., Escherichia coli, and other bacteria. Proc. Natl. Acad. Sci. USA 90:6874–6878.
MCDANIEL, T. K., K. G. JARVIS, M. S. DONNENBERG, and J. B. KAPER. 1995. A genetic locus of enterocyte effacement conserved among diverse enterobacterial pathogens. Proc. Natl. Acad. Sci. USA 92:1664–1668.
MARSHALL, P., and C. LEMIEUX. 1992. The I-CeuI endonuclease recognizes a sequence of 19 base pairs and preferentially cleaves the coding strand of the Chlamydomonas moewusii chloroplast large subunit rRNA gene. Nucleic Acids Res. 20:6401–6407.
MARTINS, E. P. 1996. Conducting phylogenetic comparative studies when the phylogeny is not known. Evolution 50:12–22.
MÉDIGUE, C., T. ROUXEL, P. VIGIER, A. HÉNAUT, and A. DANCHIN. 1991. Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. 222:851–856.
MIKKOLA, R., and C. G. KURLAND. 1991. Is there a unique ribosome phenotype for naturally occurring Escherichia coli? Biochimie 73:1061–1066.
MIKKOLA, R., and C. G. KURLAND. 1992. Selection of laboratory wild-type phenotype from natural isolates of Escherichia coli in chemostats. Mol. Biol. Evol. 9:394–402.
MILKMAN, R. 1996. Recombinational exchange among clonal populations. Pp. 2663–2684 in F. C. NEIDHARDT, R. CURTISS III, J. L. INGRAHAM, E. C. C. LIN, K. B. LOW, B. MAGASANIK, W. S. REZNIKOFF, M. RILEY, M. SCHAECHTER, and H. E. UMBARGER, eds. Escherichia coli and Salmonella typhimurium. Cellular and molecular biology. 2nd edition. ASM Press, Washington, D.C.
MOIR, P. D., R. SPIEGELBERG, I. R. OLIVER, J. H. PRINGLE, and M. MASTERS. 1992. Proteins encoded by the Escherichia coli replication terminus region. J. Bacteriol. 174:2102–2110.
NELSON, K., and R. K. SELANDER. 1992. Evolutionary genetics of the proline permease gene (putP) and the control region of the proline utilization operon in populations of Salmonella and Escherichia coli. J. Bacteriol. 174:6886–6895.
NELSON, K., T. S. WHITTAM, and R. K. SELANDER. 1991. Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli. Proc. Natl. Acad. Sci. USA 88:6667–6671.
OCHMAN, H., and R. K. SELANDER. 1984. Standard reference strains of Escherichia coli from natural populations. J. Bacteriol. 157:690–693.
OCHMAN, H., and A. C. WILSON. 1987. Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J. Mol. Evol. 26:74–86.
PERKINS, J. D., J. D. HEATH, B. R. SHARMA, and G. M. WEINSTOCK. 1993. XbaI and BlnI genomic cleavage maps of Escherichia coli K-12 strain MG1655 and comparative analysis of other strains. J. Mol. Biol. 232:419–445.
RICE, W. R. 1989. Analyzing tables of statistical tests. Evolution 43:223–225.
RILEY, M., and S. KRAWIEC. 1987. Genome organization. Pp. 967–981 in F. C. NEIDHARDT, J. L. INGRAHAM, K. B. LOW, B. MAGASANIK, M. SCHAECHTER, and H. E. UMBARGER, eds. Escherichia coli and Salmonella typhimurium: cellular and molecular biology. ASM Press, Washington, D.C.
RILEY, M., and B. LABEDAN. 1996. Escherichia coli gene products: physiological functions and common ancestries. Pp. 2118–2202 in F. C. NEIDHARDT, R. CURTISS III, J. L. INGRAHAM, E. C. C. LIN, K. B. LOW, B. MAGASANIK, W. S. REZNIKOFF, M. RILEY, M. SCHAECHTER, and H. E. UMBARGER, eds. Escherichia coli and Salmonella typhimurium. Cellular and molecular biology. 2nd edition. ASM Press, Washington, D.C.
RILEY, M., and K. E. SANDERSON. 1990. Comparative genetics of Escherichia coli and Salmonella typhimurium. Pp. 85–95 in K. DRLICA and M. RILEY, eds. The bacterial chromosome. ASM Press, Washington, D.C.
RÖMLING, U., J. GREIPEL, and B. TÜMMLER. 1995. Gradient of genomic diversity in the Pseudomonas aeruginosa chromosome. Mol. Microbiol. 17:323–332.
SAWYER, S. A., D. E. DYKHUIZEN, R. F. DUBOSE, L. GREEN, T. MUTANGADURA-MHLANGA, D. F. WOLCZYK, and D. L. HARTL. 1987. Distribution and abundance of insertion sequences among natural isolates of Escherichia coli. Genetics 115:51–63.
SHARP, P. M. 1991. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol. 33:23–33.
SMITH, C. L., J. G. ECONOME, A. SCHUTT, S. KLCO, and C. R. CANTOR. 1987. A physical map of the Escherichia coli K12 genome. Science 236:1448–1453.
SOKAL, R. R., and F. J. ROHLF. 1981. Biometry. Freeman, San Francisco.
SONTI, R. V., and J. R. ROTH. 1989. Role of gene duplications in the adaptation of Salmonella typhimurium to growth on limiting carbon sources. Genetics 123:19–28.
STOUTHAMER, A. H., and S. A. L. M. KOOIJMAN. 1993. Why it pays for bacteria to delete disused DNA and to maintain megaplasmids. Antonie van Leeuwenhoek 63:39–43.
TOMINAGA, A., M. A.-H. MAHMOUD, T. MUKAIHARA, and M. ENOMOTO. 1994. Molecular characterization of intact, but cryptic, flagellin genes in the genus Shigella. Mol. Microbiol. 12:277–285.
UMEDA, M., and E. OHTSUBO. 1989. Mapping of insertion elements IS1, IS2, and IS3 on the Escherichia coli K-12 chromosome. Role of the insertion elements in formation of Hfrs and F′ factors and in rearrangements of bacterial chromosomes. J. Mol. Biol. 208:601–614.
VAN VLIET, F., A. BOYEN, and N. GLANSDORFF. 1988. On interspecies gene transfer: the case of the argF gene of Escherichia coli. Ann. Inst. Pasteur/Microbiol. 139:493–496.
WHITTAM, T. S., and S. E. AKE. 1993. Genetic polymorphisms and recombination in natural populations of Escherichia coli. Pp. 223–245 in N. TAKAHATA and A. G. CLARK, eds. Mechanisms of molecular evolution. Japan Scientific Societies Press, Tokyo.
YORK, M. K., and M. STODOLSKY. 1981. Characterization of P1argF derivatives from Escherichia coli K12 transduction. I. IS1 elements flank the argF gene segment. Mol. Gen. Genet. 181:230–240.

JULIAN P. ADAMS, reviewing editor
Accepted September 29, 1997