2. Genome Anatomies

2.1. An Overview of Genome Anatomies
2.2. The Anatomy of the Eukaryotic Genome
2.3. The Anatomy of the Prokaryotic Genome
2.4. The Repetitive DNA Content of Genomes

LEARNING OUTCOMES

When you have read Chapter 2, you should be able to:

1. Draw diagrams illustrating the major differences between the genetic organizations of the genomes of humans, plants, insects, yeast and bacteria, and give an explanation for the C-value paradox
2. Describe the DNA-protein interactions that give rise to the chromatosome and the 30 nm chromatin fiber
3. State the functions of centromeres and telomeres and list their special structural features
4. Explain why chromosome banding patterns and the isochore model suggest that genes are not evenly distributed in eukaryotic chromosomes
5. Outline the differences between the gene contents of different eukaryotic genomes and explain, with examples, what is meant by ‘multigene family’
6. Describe the physical features and gene contents of mitochondrial and chloroplast genomes and discuss the current hypothesis concerning the origins of organelle genomes
7. Describe the physical structure of the Escherichia coli genome and indicate the ways in which this structure is and is not typical of other prokaryotes
8. Define, with examples, the term ‘operon’
9. Explain why prokaryotic genome sequences have complicated the species concept
10. Speculate on the content of the minimal prokaryotic genome and on the identity of distinctiveness genes
11. Define the term ‘satellite DNA’ and distinguish between satellite, minisatellite and microsatellite DNA
12. Give examples of the various types of RNA and DNA transposons, and outline their transposition pathways

2.1. An Overview of Genome Anatomies

Figure 2.1. Cells of eukaryotes (left) and prokaryotes (right). The top part of the figure shows a typical human cell and a typical bacterium drawn to scale.
The human cell is 10 μm in diameter and the bacterium is rod-shaped with dimensions of 1 × 2 μm. The lower drawings show the internal structures of eukaryotic and prokaryotic cells. Eukaryotic cells are characterized by their membrane-bound compartments, which are absent from prokaryotes. The bacterial DNA is contained in the structure called the nucleoid.

Figure 2.2. Comparison of the genomes of humans, yeast, fruit flies, maize and Escherichia coli. (A) is the 50-kb segment of the human β T-cell receptor locus shown in Figure 1.14. This is compared with 50-kb segments from the genomes of (B) Saccharomyces cerevisiae (chromosome III; redrawn from Oliver et al., 1992); (C) Drosophila melanogaster (redrawn from Adams et al., 2000); (D) maize (redrawn from SanMiguel et al., 1996) and (E) E. coli K12 (redrawn from Blattner et al., 1997). See the text for more details.

Figure 2.3. Plasmids are small circular DNA molecules that are found inside some prokaryotic cells.

2.2. The Anatomy of the Eukaryotic Genome

Figure 2.4. Nuclease protection analysis of chromatin from human nuclei. Chromatin is gently purified from nuclei and treated with a nuclease enzyme. On the left, the nuclease treatment is carried out under limiting conditions so that the DNA is cut, on average, just once in each of the linker regions between the bound proteins. After removal of the protein, the DNA fragments are analyzed by agarose gel electrophoresis (see Technical Note 2.1) and found to be 200 bp in length, or multiples thereof. On the right, the nuclease treatment proceeds to completion, so all the DNA in the linker regions is digested. The remaining DNA fragments are all 146 bp in length. The results show that in this form of chromatin, protein complexes are spaced along the DNA at regular intervals, one for each 200 bp, with 146 bp of DNA closely attached to each protein complex.

Figure 2.5. Nucleosomes.
(A) Electron micrograph of a purified chromatin strand showing the ‘beads-on-a-string’ structure. (Courtesy of Dr Barbara Hamkalo, University of California, Irvine.) (B) The model for the ‘beads-on-a-string’ structure in which each bead is a barrel-shaped nucleosome with the DNA wound twice around the outside. Each nucleosome is made up of eight proteins: a central tetramer of two histone H3 and two histone H4 subunits, plus a pair of H2A-H2B dimers, one above and one below the central tetramer (see Figure 8.9). (C) The precise position of the linker histone relative to the nucleosome is not known but, as shown here, the linker histone may act as a clamp, preventing the DNA from detaching from the outside of the nucleosome.

Figure 2.6. The solenoid model for the 30 nm chromatin fiber. In this model, the ‘beads-on-a-string’ structure of chromatin is condensed by winding the nucleosomes into a helix with six nucleosomes per turn. Higher levels of chromatin packaging are described in Section 8.1.2.

Figure 2.7. The typical appearance of a metaphase chromosome. Metaphase chromosomes are formed after DNA replication has taken place, so each one is, in effect, two chromosomes linked together at the centromere. The arms are called the chromatids. A telomere is the extreme end of a chromatid.

Figure 2.8. The human karyogram. The chromosomes are shown with the G-banding pattern obtained after Giemsa staining. Chromosome numbers are given below each structure and the band numbers to the left. ‘rDNA’ is a region containing a cluster of repeat units for the ribosomal RNA genes, which specify a type of noncoding RNA (Section 3.2.1). ‘Constitutive heterochromatin’ is very compact chromatin which has few or no genes (Section 8.1.2). Redrawn from Strachan and Read (1999).

Figure 2.9. The role of the kinetochores during nuclear division.
During the anaphase period of nuclear division (see Figures 5.14 and 5.15), individual chromosomes are drawn apart by the contraction of microtubules attached to the kinetochores.

Figure 2.10. Telomeres. The sequence at the end of a human telomere. The length of the 3′ extension is different in each telomere. See Section 13.2.4 for more details about telomeric DNA.

Figure 2.11. Gene density along the largest of the five Arabidopsis thaliana chromosomes. Chromosome 1, which is 29.1 Mb in length, is illustrated with the sequenced portions shown in red and the centromere and telomeres in blue. The gene map below the chromosome gives gene density in pseudocolor, from deep blue (low density) to red (high density). The density varies from 1 to 38 genes per 100 kb. Reprinted with permission from AGI (The Arabidopsis Genome Initiative), Nature, 408, 797–815. Copyright 2000 Macmillan Magazines Limited.

Figure 2.12. Comparison of the gene catalogs of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, fruit fly and humans. Genes are categorized according to their function, as deduced from the protein domains specified by each gene. Redrawn from IHGSC (2001).

Figure 2.13. Relationship between the human gene catalog and the catalogs of other groups of organism. The pie chart categorizes the human gene catalog according to the distribution of individual genes in other organisms. The chart shows, for example, that 22% of the human gene catalog is made up of genes that are specific to vertebrates, and that another 24% comprises genes specific to vertebrates and other animals. Genes are categorized according to their function, as deduced from the protein domains specified by each gene. Redrawn from IHGSC (2001).

Figure 2.14. The human α- and β-globin gene clusters. The α-globin cluster is located on chromosome 16 and the β-cluster on chromosome 11.
Both clusters contain genes that are expressed at different developmental stages and each includes at least one pseudogene. Note that expression of the α-type gene ζ2 begins in the embryo and continues during the fetal stage; there is no fetal-specific α-type globin. The θ pseudogene is expressed but its protein product is inactive. None of the other pseudogenes is expressed. For more information on the developmental regulation of the β-globin genes, see Section 8.1.2.

Figure 2.15. The Saccharomyces cerevisiae mitochondrial genome. Because of their relatively small sizes, many mitochondrial genomes have been completely sequenced. In the yeast genome, the genes are more spaced out than in the human mitochondrial genome (Figure 1.22) and some of the genes have introns. This type of organization is typical of many lower eukaryotes and plants. The yeast genome contains five additional open reading frames (not shown on this map) that have not yet been shown to code for functional gene products, and there are also several genes located within the introns of the discontinuous genes. Most of the latter code for maturase proteins involved in splicing the introns from the transcripts of these genes (Section 10.2.3). Abbreviations: ATP6, ATP8, ATP9, genes for ATPase subunits 6, 8 and 9, respectively; COI, COII, COIII, genes for cytochrome c oxidase subunits I, II and III, respectively; Cytb, gene for apocytochrome b; var 1, gene for a ribosome-associated protein. Ribosomal RNA and transfer RNA are two types of noncoding RNA (Section 3.2.1). The 9S RNA gene specifies the RNA component of the enzyme ribonuclease P (Section 10.2.2).

Figure 2.16. The rice chloroplast genome. Only those genes with known functions are shown. A number of the genes contain introns which are not indicated on this map. These discontinuous genes include several of those for tRNAs, which is why the tRNA genes are of different lengths even though the tRNAs that they specify are all of similar size.

2.3.
The Anatomy of the Prokaryotic Genome

Figure 2.17. Supercoiling. The diagram shows how underwinding a circular double-stranded DNA molecule results in negative supercoiling.

Figure 2.18. A model for the structure of the Escherichia coli nucleoid. Between 40 and 50 supercoiled loops of DNA radiate from the central protein core. One of the loops is shown in circular form, indicating that a break has occurred in this segment of DNA, resulting in a loss of the supercoiling.

Figure 2.19. The genome of Escherichia coli K12. The map is shown with the origin of replication (Section 13.2.1) positioned at the top. Genes on the outside of the circle are transcribed in the clockwise direction and those on the inside are transcribed in the anticlockwise direction. Image supplied courtesy of Dr FR Blattner, Laboratory of Genetics, University of Wisconsin-Madison. Reproduced with permission.

Figure 2.20. Two operons of Escherichia coli. (A) The lactose operon. The three genes are called lacZ, lacY and lacA, the first two separated by 52 bp and the second two by 64 bp. All three genes are expressed together, lacY coding for the lactose permease which transports lactose into the cell, and lacZ and lacA coding for enzymes that split lactose into its component sugars, galactose and glucose. (B) The tryptophan operon, which contains five genes coding for enzymes involved in the multistep biochemical pathway that converts chorismic acid into the amino acid tryptophan. The genes in the tryptophan operon are closer together than those in the lactose operon: trpE and trpD overlap by 1 bp, as do trpB and trpA; trpD and trpC are separated by 4 bp, and trpC and trpB by 12 bp. For more details on the regulation of these operons, see Sections 9.3.1 and 12.1.1.

Figure 2.21. A typical operon in the genome of Aquifex aeolicus.
The genes code for the following proteins: gatC, glutamyl-tRNA aminotransferase subunit C, which plays a role in protein synthesis (Section 11.2); recA, recombination protein RecA; pilU, twitching motility protein; cmk, cytidylate kinase, required for synthesis of cytidine nucleotides; pgsA, phosphatidylglycerophosphate synthase, an enzyme involved in lipid biosynthesis; recJ, single-strand-specific endonuclease RecJ, which is another recombination protein (Section 14.3).

Figure 2.22. The impact of lateral gene transfer on the content of prokaryotic genomes. The chart shows the DNA that is unique to a particular species in blue and the DNA that has been acquired by lateral gene transfer in red. The number at the end of each bar indicates the percentage of the genome that derives from lateral transfer. Note that intergenic regions are omitted from this analysis. Redrawn from Ochman et al. (2000).

Figure 2.23. Lateral gene transfer obscures the evolutionary relationships between species. In (A) a group of eight modern species has evolved from an ancestor without lateral gene transfer. The evolutionary relationships between the species can be inferred by comparisons between their DNA sequences, using the molecular phylogenetics techniques described in Chapter 16. In (B) extensive lateral gene transfer has occurred. The evolutionary histories of the modern species cannot now be inferred by standard molecular phylogenetics because one or more of the species may have acquired the sequences that are being compared by lateral gene transfer rather than by inheritance from a direct ancestor.

Figure 2.24. Satellite DNA from the human genome. Human DNA has an average GC content of 40.3% and average buoyant density of 1.701 g cm⁻³. Fragments made up mainly of single-copy DNA have a GC content close to this average and are contained in the main band in the density gradient. The satellite bands at 1.687, 1.693 and 1.697 g cm⁻³ consist of fragments containing repetitive DNA.
The GC contents of these fragments depend on their repeat motif sequences and are different from the genome average, meaning that these fragments have different buoyant densities to single-copy DNA and migrate to different positions in the density gradient.

2.4. The Repetitive DNA Content of Genomes

Figure 2.25. The use of microsatellite analysis in genetic profiling. In this example, microsatellites located on the short arm of chromosome 6 have been amplified by the polymerase chain reaction (PCR; Section 4.3). The PCR products are labeled with a blue or green fluorescent marker and run in a polyacrylamide gel (see Technical Note 6.1), each lane showing the genetic profile of a different individual. No two individuals have the same genetic profile because each person has a different set of microsatellite length variants, the variants giving rise to bands of different sizes after PCR. The red bands are DNA size markers. Image supplied courtesy of PE Biosystems, Warrington, UK, and reproduced with permission.

Figure 2.26. Retrotransposition. Compare with Figure 1.19 (page 22), and note that the events are essentially the same as those that result in a processed pseudogene.

Figure 2.27. Retroelements. A comparison of the structures of four types of retroelement. Retroviruses and retrotransposons are LTR elements that possess long terminal repeats at each end. The gag gene codes for a series of proteins located in the virus core; pol codes for the reverse transcriptase and other enzymes involved in replication of the element; env codes for coat proteins. LINEs and SINEs are non-LTR retroelements or retroposons. Both have a poly(A) region (a long series of A nucleotides) at one end.

Figure 2.28. Two mechanisms of transposition used by DNA transposons. For more details see Section 14.3.3.

Figure 2.29. DNA transposons of prokaryotes. Four types are shown.
Insertion sequences, Tn3-type transposons and transposable phages are flanked by short (< 50 bp) inverted terminal repeat (ITR) sequences. The resolvase gene of the Tn3-type transposon codes for a protein involved in the transposition process.
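
The inverted terminal repeats mentioned in the caption to Figure 2.29 can be illustrated with a short sketch. Two ends form an inverted repeat when the sequence at one end equals the reverse complement of the sequence at the other end. The Python code below is a minimal illustration under stated assumptions, not a method taken from the text: the function names, the 50 bp cap (taken from the "< 50 bp" in the caption) and the example sequence are all hypothetical, and only perfect repeats are detected, whereas real ITRs may contain mismatches.

```python
# Minimal sketch: measure a perfect inverted terminal repeat (ITR).
# Function names and the example sequence are hypothetical illustrations.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_itr(element: str, max_len: int = 50) -> int:
    """Length of the longest perfect ITR flanking `element`.

    The left end is compared, base by base, with the reverse complement
    of the right end. `max_len` caps the search (assumed 50 bp here).
    """
    element = element.upper()
    # The two repeats cannot overlap, so stop at half the element length.
    limit = min(max_len, len(element) // 2)
    rc = reverse_complement(element)  # begins with the revcomp of the right end
    n = 0
    while n < limit and element[n] == rc[n]:
        n += 1
    return n


if __name__ == "__main__":
    left = "GGTCTA"
    internal = "CCATGAAGTC"  # stands in for the genes carried by the element
    element = left + internal + reverse_complement(left)
    print(longest_itr(element))  # → 6
```

A practical search would normally tolerate a few mismatches and would first have to locate the element boundaries within a longer genome sequence. Both steps are omitted here.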