Chapter 12 Genomes

Key Concepts
• 12.1 There Are Powerful Methods for Sequencing Genomes and Analyzing Gene Products
• 12.2 Prokaryotic Genomes Are Relatively Small and Compact
• 12.3 Eukaryotic Genomes Are Large and Complex
• 12.4 The Human Genome Sequence Has Many Applications

Chapter 12 Opening Question
What does genome sequencing reveal about dogs and other animals?

Concept 12.1 There Are Powerful Methods for Sequencing Genomes and Analyzing Gene Products

The Human Genome Project was proposed in 1986 to determine the normal sequence of all human DNA. The publicly funded effort was aided and complemented by privately funded groups. The methods used were first developed to sequence prokaryotes and simple eukaryotes.

A key to determining long DNA sequences is to break the DNA of a given chromosome into fragments and to sequence many fragments simultaneously. The fragment sequences are then pieced together by aligning overlapping fragments. Next-generation DNA sequencing uses DNA replication and the polymerase chain reaction (PCR).

One approach to next-generation DNA sequencing:
• DNA is cut into 100 bp fragments.
• DNA is denatured by heat, and each single strand then acts as a template for synthesis.
• Each fragment is attached to adapter sequences and then to supports.
• Fragments are then amplified by PCR.

Amplified DNA attached to a solid substrate is ready for sequencing:
• Fragments are denatured, and primers, DNA polymerase, and fluorescently labeled nucleotides are added.
• DNA is replicated by adding one nucleotide at a time.
• The fluorescent color of each nucleotide is detected as it is added, indicating the sequence of the DNA.
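The readout step described in the bullets above can be sketched in a few lines of Python. This is a toy simulation, not a model of any real instrument: the dye colors in `DYE_COLOR` and the function names are assumptions made for illustration, and the template is written in 3'-to-5' order so that the new strand comes out 5'-to-3'.

```python
# Toy simulation of sequencing by synthesis.
# The colour assigned to each nucleotide is hypothetical.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize_and_record(template):
    """Add one complementary nucleotide per template base and record
    the fluorescent colour detected at each step."""
    return [DYE_COLOR[COMPLEMENT[base]] for base in template]


def call_bases(colors):
    """Translate the recorded colours into the newly synthesized strand."""
    return "".join(COLOR_TO_BASE[color] for color in colors)


template = "TACGGATGAC"                 # single-stranded template, 3' to 5'
colors = synthesize_and_record(template)
new_strand = call_bases(colors)         # complementary strand, 5' to 3'
print(colors)
print(new_strand)
```

Because each color corresponds to exactly one nucleotide, the order of the detected colors is enough to recover the sequence of the new strand, and from it the sequence of the template.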
The power of this method derives from the fact that:
• It is fully automated and miniaturized.
• Millions of different fragments are sequenced at the same time. This is called massively parallel sequencing.
• It is an inexpensive way to sequence large genomes.

Figure 12.1 DNA Sequencing (Part 1)
Figure 12.1 DNA Sequencing (Part 2)

Determining sequences is possible because the original DNA fragments overlap.
Example: A 10 bp fragment cut three different ways yields
• TG, ATG, and CCTAC
• AT, GCC, and TACTG
• CTG, CTA, and ATGC
The correct sequence is ATGCCTACTG.

For genome sequencing the fragments are called “reads.” The field of bioinformatics was developed to analyze DNA sequences using complex mathematics and computer programs.

Figure 12.2 Arranging DNA Sequences

In functional genomics, sequence information is used to identify the functions of various parts of the genome:
• Open reading frames: the coding regions of the genes, recognized by start and stop codons for translation and by sequences indicating the location of introns
• Amino acid sequences of proteins
• Regulatory sequences: promoters and terminators for transcription
• RNA genes, including rRNA, tRNA, small nuclear RNA, and microRNA genes
• Other noncoding sequences in various categories

Figure 12.3 The Genomic Book of Life

Comparative genomics compares a newly sequenced genome with sequences from other organisms. It provides information about the function of sequences and can trace evolutionary relationships.
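The 10 bp overlapping-fragment example given earlier in this concept can be checked with a short brute-force sketch. It tries every ordering of the fragments from each cut and keeps only the sequence that all three cuts can produce. This is an illustration under simplifying assumptions (error-free fragments, a tiny sequence); real assembly software works on millions of reads and uses far more efficient overlap-based methods. The function names are hypothetical.

```python
from itertools import permutations


def candidate_sequences(fragments):
    """All sequences obtainable by joining the fragments in some order."""
    return {"".join(order) for order in permutations(fragments)}


def reconstruct(cuts):
    """Keep only the sequences consistent with every set of fragments."""
    candidates = None
    for fragments in cuts:
        options = candidate_sequences(fragments)
        candidates = options if candidates is None else candidates & options
    return candidates


# The three ways the 10 bp fragment was cut in the example.
cuts = [
    ["TG", "ATG", "CCTAC"],
    ["AT", "GCC", "TACTG"],
    ["CTG", "CTA", "ATGC"],
]
print(reconstruct(cuts))  # {'ATGCCTACTG'}
```

A single cut is ambiguous, since its three fragments can be joined in six orders. Only one of those orders is also consistent with the other two cuts, which is why overlapping fragments make the sequence recoverable.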
Genetic determinism is the concept that a person's phenotype is determined solely by his or her genotype.

Many genes encode more than one protein, through alternative splicing and posttranslational modifications. The proteome is the total of the proteins produced by an organism, and it is more complex than the genome.

Figure 12.4 Proteomics (Part 1)

Two techniques are used to analyze proteins and the proteome:
• Two-dimensional gel electrophoresis separates proteins based on size and electric charge.
• Mass spectrometry identifies proteins by their atomic masses.
Proteomics seeks to identify and characterize all of the expressed proteins.

Figure 12.4 Proteomics (Part 2)

The metabolome is the description of all of the metabolites of a cell or organism:
• Primary metabolites are involved in normal processes, such as pathways like glycolysis. They also include hormones and other signaling molecules.
• Secondary metabolites are often unique to particular organisms or groups. Examples: antibiotics made by microbes, and chemicals made by plants for defense.

Metabolomics aims to describe the metabolome of a tissue or organism under particular environmental conditions. Analytical instruments can separate molecules with different chemical properties, and other techniques can identify them. Measurements can be related to physiological states.
Figure 12.5 Genomics, Proteomics, and Metabolomics

Concept 12.2 Prokaryotic Genomes Are Relatively Small and Compact

Features of bacterial and archaeal genomes:
• Relatively small, with a single, circular chromosome
• Compact: mostly protein-coding regions
• Most do not contain introns
• Often carry plasmids, smaller circular DNA molecules

Functional genomics assigns functions to the products of genes. The H. influenzae chromosome has 1,727 open reading frames. When it was first sequenced, only 58 percent coded for proteins with known functions. Since then, the roles of almost all other proteins have been identified. More genes are involved in each function in the larger E. coli genome.

Table 12.1 Gene Functions in Three Bacteria

Next, the study of the smallest known genome (M. genitalium) was completed. Comparative genomics showed that M. genitalium lacks many enzymes and must obtain what they would produce from its environment. It also has very few genes for regulatory proteins, so its flexibility is limited by its lack of control over gene expression.

Transposons (or transposable elements) are DNA segments that can move from place to place in the genome. They can move from one piece of DNA (such as a chromosome) to another (such as a plasmid). If a transposon inserts into the middle of a gene, it is transcribed along with the gene and results in an abnormal protein.

Figure 12.6 DNA Sequences That Move (Part 1)
Figure 12.6 DNA Sequences That Move (Part 2)

Prokaryotes can be identified by their growth in culture, but DNA can also be isolated directly from environmental samples. In metagenomics, genetic diversity is explored without isolating intact organisms. DNA can be cloned for “libraries” or amplified and sequenced to detect known and unknown organisms.
Figure 12.7 Metagenomics

Comparing genomes of prokaryotes and eukaryotes: Certain genes are present in all organisms (universal genes), and some universal gene segments are present in many organisms. This suggests that a minimal set of DNA sequences is common to all cells.

Efforts to define a minimal genome involve computer analysis of genomes, the study of the smallest known genome (M. genitalium), and using transposons as mutagens. Transposons can insert into genes at random; the mutated bacteria are tested for growth and survival, and DNA is sequenced.

Figure 12.8 Using Transposon Mutagenesis to Determine the Minimal Genome (Part 1)

Concept 12.3 Eukaryotic Genomes Are Large and Complex

There are major differences between eukaryotic and prokaryotic genomes:
• Eukaryotic genomes are larger and have more protein-coding genes.
• Eukaryotic genomes have more regulatory sequences. Greater complexity requires more regulation.
• Much of eukaryotic DNA is noncoding, including introns, gene control sequences, and repeated sequences.

Several model organisms have been studied extensively. Model organisms are easy to grow and study in a laboratory, their genetics are well studied, and their characteristics represent a larger group of organisms.

Table 12.2 Representative Sequenced Genomes

The yeast, Saccharomyces cerevisiae: Yeasts are single-celled eukaryotes. Yeasts and E. coli appear to use about the same number of genes to perform basic functions. However, the compartmentalization of the eukaryotic yeast cell requires it to have many more genes to target proteins to organelles.
The nematode, Caenorhabditis elegans: A millimeter-long soil roundworm made up of about 1,000 cells, it nonetheless has complex organ systems. Its genome is eight times larger than that of yeast, and it has about 3.5 times as many protein-coding genes. Many of the additional genes function in cell differentiation, intercellular communication, and forming tissues from cells.

The fruit fly, Drosophila melanogaster: The fruit fly has ten times more cells than C. elegans and is more complex, undergoing more developmental stages. It has a larger genome with many genes encoding transcription factors needed for development.

Figure 12.9 Functions of the Eukaryotic Genome

The thale cress, Arabidopsis thaliana: The genomes of some plants are huge, but A. thaliana has a much smaller genome. Many of the genes found in fruit flies and nematodes have orthologs (genes with very similar sequences) in plants, suggesting a common ancestor.

Arabidopsis has some genes related to functions unique to plants: photosynthesis, water transport, assembly of the cell wall, and making molecules for defense against microbes and herbivores. The basic plant genome may be determined by comparing different plant genomes for common sequences.

Figure 12.10 Plant Genomes

Eukaryotes have groups of closely related genes called gene families. These arose over evolutionary time when different copies of genes underwent separate mutations. For example, the genes encoding the globin proteins in hemoglobin and myoglobin all arose from a single common ancestral gene.

During development, different members of the globin gene family are expressed at different times and in different tissues.
Hemoglobin of the human fetus contains γ-globin, which binds O2 more tightly than adult hemoglobin does. Hemoglobins with different affinities are provided at different stages of development.

Figure 12.11 The Globin Gene Family

Many gene families include nonfunctional pseudogenes (Ψ), resulting from mutations that cause a loss of function rather than a new one. A pseudogene may simply lack a promoter, and thus fail to be transcribed, or lack a recognition site needed for the removal of an intron.

Eukaryotic genomes have repetitive DNA sequences:
• Highly repetitive sequences: short sequences (< 100 bp) repeated thousands of times in tandem; not transcribed
• Short tandem repeats (STRs) of 1–5 bp are scattered around the genome and can be used in DNA fingerprinting.

Moderately repetitive sequences are repeated 10–1,000 times.
• They include the genes for tRNAs and rRNAs.
• Single copies of the tRNA and rRNA genes are inadequate to supply the large amounts of these molecules needed by cells, so the genome has multiple copies in clusters.
Most moderately repeated sequences are transposons.

Transposons are of two main types in eukaryotes.

Retrotransposons (Class I) make RNA copies of themselves, which are copied back into DNA and inserted in the genome.
• LTR retrotransposons have long terminal repeats of DNA sequences.
• Non-LTR retrotransposons do not have LTR sequences; SINEs and LINEs are types of non-LTR retrotransposons.

DNA transposons (Class II) do not use RNA intermediates. They are excised from the original location and inserted at a new location without being replicated.
Table 12.3 Types of Sequences in Eukaryotic Genomes

Concept 12.4 The Human Genome Sequence Has Many Applications

By 2010, complete haploid genome sequences had been determined for more than ten individuals. Soon, a human genome will be sequenced for less than $1,000.

Some interesting facts about the human genome:
• There are about 24,000 protein-coding genes; protein-coding sequences make up less than 2 percent of the 3.2 billion base pair human genome.
• The number of genes is smaller than the observed number of human proteins, so the average gene must code for several proteins, and posttranscriptional mechanisms (e.g., alternative splicing) must account for the difference.
• An average gene has 27,000 base pairs, but gene size varies greatly, as does the size of the proteins.
• Nearly all human genes have many introns.
• 3.5 percent of the genome is functional but noncoding; these sequences have roles in gene regulation (microRNAs) or chromosome structure.
• Over 50 percent of the genome is transposons and other repetitive sequences.
• Most of the genome (97 percent) is the same in all people.
• Chimpanzees share 95 percent of the human genome.

Figure 12.12 Evolution of the Genome

Rapid genotyping technologies are being used to understand the complex genetic basis of diseases such as diabetes, heart disease, and Alzheimer’s disease. “Haplotype maps” are based on single nucleotide polymorphisms (SNPs), which are DNA sequence variations that involve single nucleotides. SNPs are point mutations in a DNA sequence.

SNPs that differ are not all inherited as independent alleles. A set of SNPs that are close together on a chromosome is inherited as a linked unit. A piece of chromosome with a set of linked SNPs is called a haplotype.
Analyses of human haplotypes have shown that there are, at most, 500,000 common variations.

Technologies to analyze SNPs in an individual genome include next-generation sequencing methods and DNA microarrays. A DNA microarray detects DNA or RNA sequences that are complementary to and hybridize with an oligonucleotide probe. The aim is to find out which SNPs are associated with specific diseases and to identify alleles that contribute to disease.

Figure 12.13 SNP Genotyping and Disease

Genetic variation can affect an individual’s response to a particular drug. A variation could make a drug more or less active in an individual. Pharmacogenomics studies how the genome affects the response to drugs. This makes it possible to predict whether a drug will be effective, with the objective of personalizing drug treatments.

Figure 12.14 Pharmacogenomics

Comparisons of the proteomes of humans and other eukaryotes have revealed categories of proteins. The human proteome includes a set of 1,300 proteins (also present in yeasts, nematodes, and fruit flies) that carry out the basic metabolic functions of the cell.

Proteomics can be useful in the diagnosis of diseases by studying the pattern of proteins made in a particular tissue at a particular time. Metabolomics may also be able to aid in diagnostics when patterns of metabolites can be associated with physiology.

DNA fingerprinting refers to a group of techniques used to identify individuals by their DNA. Short tandem repeat (STR) analysis is the most common. When several different STR loci are analyzed, a unique pattern becomes apparent.
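The idea behind STR analysis can be sketched as counting how many times a short motif is repeated in tandem at each locus and comparing the resulting profiles. Everything specific below is made up for illustration: the locus names, motifs, and sequences are hypothetical, and the sketch assumes a single allele per locus, whereas a real profile records two alleles per locus.

```python
import re


def longest_repeat_run(sequence, motif):
    """Largest number of back-to-back copies of motif found in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def str_profile(sample, loci):
    """Repeat count at each locus: {locus name: number of tandem repeats}."""
    return {name: longest_repeat_run(sample[name], motif)
            for name, motif in loci.items()}


# Hypothetical loci and their repeated motifs.
loci = {"locus_1": "GATA", "locus_2": "CA"}

# Hypothetical samples: flanking bases around a block of tandem repeats.
evidence = {"locus_1": "CC" + "GATA" * 7 + "TT", "locus_2": "G" + "CA" * 12 + "T"}
person_a = {"locus_1": "CC" + "GATA" * 7 + "TT", "locus_2": "G" + "CA" * 12 + "T"}
person_b = {"locus_1": "A" + "GATA" * 9 + "C", "locus_2": "TT" + "CA" * 12 + "GG"}

print(str_profile(evidence, loci))  # {'locus_1': 7, 'locus_2': 12}
print(str_profile(person_a, loci) == str_profile(evidence, loci))  # True
print(str_profile(person_b, loci) == str_profile(evidence, loci))  # False
```

In this made-up data, person_b matches the evidence at locus_2 but not at locus_1, which shows why several loci are analyzed together: one shared repeat count is not enough to identify an individual.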
DNA fingerprinting can be used for questions of paternity and in criminal investigations.

Figure 12.15 DNA Fingerprinting (Part 1)
Figure 12.15 DNA Fingerprinting (Part 2)

Answer to Opening Question
• Genome sequencing in dogs led to the identification of an SNP in the IGF-1 gene that is important in determining size.
• Large and small breeds have different alleles of the gene.
• A mutation in another gene is associated with differences in musculature in both dogs and cattle.

Figure 12.16 Muscular Gene (Part 1)
Figure 12.16 Muscular Gene (Part 2)