Prescott’s Microbiology, 9th Edition

18 Microbial Genomics

CHAPTER OVERVIEW

This chapter introduces genomics, a revolutionary new discipline in the biological sciences. Techniques important to the study of genomes are discussed. Bioinformatics, functional genomics, and comparative genomics are detailed. Proteomics theory and techniques are discussed. The chapter then gives numerous examples of the types of patterns already being discerned in the analysis of the microbial genomes sequenced thus far. Finally, metagenomic analysis of environmental communities is introduced.

LEARNING OUTCOMES

After reading this chapter you should be able to:

• explain how DNA is sequenced by the Sanger chain termination method
• contrast and compare the advantages and disadvantages of the Sanger method with massively parallel sequencing methods
• list the steps used in whole-genome shotgun cloning
• describe the multiple displacement amplification method and how this technique is used
• explain how a potential protein-coding gene is recognized within a genome sequence
• compare the meaning of the terms orthologue and paralogue
• differentiate between a conserved hypothetical protein and a putative protein of unknown function
• describe the construction of a physical genome map
• explain how genome annotation can be used to graphically represent the metabolism, transport, motility, and other key features of a microbe
• contrast and compare microarray analysis with RNA-seq in the study of transcriptomes
• explain how 2-D gel electrophoresis is able to resolve two proteins of identical molecular weight
• summarize the importance of mass spectrometry in analyzing protein structure
• explain why DNA-protein interactions are of interest and how they can be experimentally identified
• discern the general relationship between genome size and organism complexity
• describe the genomic differences that distinguish intracellular parasites from free-living microbes
• differentiate between the construction
  and screening of a genomic library (chapter 17) and a metagenomic library
• list two applications of metagenomics in any field of microbial biology
• define the human microbiome and explain the role metagenomics plays in its investigation

© 2014 by McGraw-Hill Education. This is proprietary material solely for authorized instructor use. Not authorized for sale or distribution in any manner. This document may not be copied, scanned, duplicated, forwarded, distributed, or posted on a website, in whole or part.

CHAPTER OUTLINE

I. Determining DNA Sequences
   A. Sanger DNA sequencing
      1. Uses dideoxynucleoside triphosphates (ddNTPs) in DNA synthesis; these lack a 3′-hydroxyl group and therefore terminate DNA synthesis
      2. Single strands of DNA are mixed with a primer, DNA polymerase I, the four deoxynucleoside triphosphates (one of which is labeled), and a small amount of one of the ddNTPs; DNA synthesis begins at the primer but terminates each time a ddNTP is added to the chain
      3. Four reactions are run, each with a different ddNTP; these reactions generate DNA fragments of different lengths because the site at which the ddNTP is inserted is random
      4. Newly synthesized DNA fragments are separated electrophoretically on a polyacrylamide gel or by capillary electrophoresis, often using an automated system; the gel can be autoradiographed if radioactive ddNTPs were used or monitored with a laser if fluorescent ddNTPs were used; the sequence is then read from the autoradiogram or chromatographic trace
   B. Next generation DNA sequencing
      1. Newer sequencing technologies do not require the construction of genomic clone libraries; these methods attach DNA to solid substrates, amplify sequences by PCR, and separate DNA fragments
      2. Three approaches are available: pyrosequencing (454 Life Sciences), SOLEXA, and SOLiD technology (sequencing by ligation)
II. Genome Sequencing
   A. Whole-genome shotgun sequencing
      1.
         Library construction: chromosomes are broken into gene-sized fragments, inserted into plasmids, and transformed into special E. coli strains
      2. Random sequencing: the cloned fragments are sequenced, typically several times to ensure full coverage
      3. Fragment alignment and gap closure: DNA fragments are clustered and assembled into longer stretches of sequence called contigs (contiguous sequences) by comparing nucleotide sequence overlaps between fragments; the contigs are aligned in the proper order to form the completed genome sequence; gaps in the sequence are filled
      4. Editing: the sequence is proofread to resolve any ambiguities
   B. Single-cell genomic sequencing uses DNA polymerase from bacteriophage phi29 to randomly amplify many genomic DNA fragments by multiple displacement amplification (MDA)
III. Bioinformatics
   A. The field concerned with the management and analysis of biological data using computers
   B. Genome annotation is done once the sequence is obtained; annotation involves identifying open reading frames (ORFs), determining potential amino acid sequences, and comparing them to known protein and DNA sequences (using alignments and BLAST)
   C. These comparisons allow tentative assignment of gene function as well as identification of transposable elements, operons, and repeat sequences, and the detection of various metabolic pathways
   D. Genes within the genome of a single organism that arose through duplication of a common ancestral gene are called paralogues; genes in the genomes of different organisms that descend from a common ancestral gene are called orthologues
IV. Functional Genomics
   A. Functional genomics is focused on how genes and genomes operate; physical maps of genomes are useful in annotation
   B. Metabolic pathways and physiological features can be modeled using annotated genomes in which potential functional proteins have been defined
   C. Transcriptome analysis
      1.
         DNA microarrays: solid supports (e.g., glass) that have DNA attached in highly organized arrays of spots; in commercial chips, the array may consist of many expressed sequence tags (ESTs; partial gene sequences derived from cDNA) covering every ORF of an organism
      2. The mRNA (transcriptome) or cDNA to be analyzed (target mixture) is isolated, labeled with fluorescent reporter groups, and incubated with the DNA chip; fluorescence at an address on the chip indicates that the DNA probe at that address is bound to an mRNA or cDNA in the target mixture; analysis of the hybridization pattern shows which genes are being transcribed
      3. Using this procedure, the characteristic expression of whole sets of genes during differentiation or in response to environmental changes can be observed; patterns of gene expression can be detected using hierarchical cluster analysis, and functions can be tentatively assigned based on expression
V. Proteomics
   A. Study of genome function at the level of translation
      1. Proteome: the entire collection of proteins that an organism produces; proteomics is the study of the proteome
      2. Functional proteomics determines the function of proteins, how they interact with each other, and how they are regulated
         a. Two-dimensional electrophoresis is used to resolve thousands of proteins in a mixture; proteins are separated first by charge and then by size
         b. Mass spectrometry is used to tentatively identify the proteins isolated by two-dimensional electrophoresis; N-terminal amino acid sequencing can be used to identify the corresponding ORFs when the genome sequence is available
      3.
         Structural proteomics attempts to directly determine the three-dimensional structures of many proteins and then uses that information to predict the structures of other proteins and protein complexes based on their amino acid sequence (protein modeling)
   B. Similar studies can be performed using lipidomics (lipid profiles), glycomics (carbohydrate profiles), and metabolomics (small molecule profiles)
   C. Probing DNA-protein interactions
      1. Electrophoretic mobility shift assays examine DNA-protein interactions by observing changes in the migration of DNA fragments when bound to target proteins
      2. Chromatin immunoprecipitation (ChIP) assays examine DNA-protein complexes that are fixed in vivo and then captured by antibody precipitation; the captured DNA molecules can be detected using microarray analysis (ChIP-chip)
VI. Systems Biology
   A. Seeks to integrate the molecular interactions among many chemical components of a cell into a theoretical framework that broadly describes living systems
   B. Production of predictable networks allows new testable hypotheses to be formed
VII. Comparative Genomics
   A. Comparisons of genomes and their functional genes lead to new insights in microbial biology and the development of vaccines (reverse vaccinology)
   B. Genome sizes vary among domains and among organisms with varied ecological roles
   C. The core genome (essential backbone of genes) is the set of genes that all organisms within a monophyletic group share; the pan-genome (flexible gene pool) is the collection of all genes within a given group
   D. Horizontal gene transfer (HGT) is important for the exchange of genetic material between organisms; mobile elements integrated into the genome (genomic islands) can confer virulence (pathogenicity islands)
   E. Synteny is used to compare the order in which genes appear in different phylogenetic groups
VIII. Metagenomics
   A.
      Environmental genomics, or metagenomics, is being used to study microbial diversity in natural systems; fewer than 1% of the microbes in the environment can grow in the laboratory, so genetic techniques are used to directly detect and enumerate microbial populations
   B. The genomes of entire microbial communities can be sequenced and assembled, giving a picture of their species composition and functionality; new species (phylotypes) are detected, unique genes catalogued, and new functions ascribed to taxa

CRITICAL THINKING

1. In order for computers to identify open reading frames (ORFs) and other features of a genome, they must be programmed to do so. What features of a nucleotide sequence would be important for identifying ORFs? Explain your choices. Would the features be the same for both eukaryotic and prokaryotic organisms? Explain.
2. Molecular microbial ecology uses genetic techniques to describe microbial communities in the environment. If you were asked to describe the diversity of the microbes in a lake rich in Epsom salts (magnesium sulfate), what research plan would you pursue? Would you include a cultivation campaign? Why or why not? Which molecular techniques would you apply and what might be their limitations?
3. Comparison of DNA and protein sequences to known sequences of previously sequenced and assigned organisms greatly accelerates the process of gene identification in DNA and functional domain identification in proteins. Explain how this facilitates these processes.
4.
   How do the lack of introns in most eubacterial organisms and the organization of genes into functional operons help accelerate the proteomic and metabolomic designation of genes found in newly sequenced organisms?

CONCEPT MAPPING CHALLENGE

Use the words listed below to construct a concept map that describes the ways in which a genome might be sequenced and analyzed. Provide your own linking words.

Whole-genome shotgun sequencing (Celera), Sanger DNA sequencing, Pyrosequencing, Paralogue, SOLiD, DNA microarray, Open reading frame (ORF), Orthologue, Post-Sanger DNA sequencing, Genome annotation, SOLEXA, BLAST
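
SUPPLEMENTARY SKETCH: SCANNING FOR ORFs

As a possible discussion aid for outline item III.B and Critical Thinking question 1, the Python sketch below shows one simple way a program might flag candidate ORFs: it scans all six reading frames for a start codon that is followed, in the same frame, by a stop codon. It is an illustration only, not the method used by any particular annotation tool. It assumes ATG as the only start codon, the three standard stop codons, and an arbitrary minimum length; the names reverse_complement, find_orfs, and min_codons are hypothetical.

```python
# Illustrative sketch only: a naive six-frame ORF scan.
# Assumptions: ATG is the only start codon; TAA, TAG, TGA are the stops.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end) tuples for candidate ORFs.

    start and end are 0-based, end-exclusive positions on the strand that
    was scanned (for strand "-", they refer to the reverse complement).
    min_codons counts the start codon but not the stop codon.
    """
    forward = seq.upper()
    orfs = []
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    # Made-up demonstration sequence, not real genomic data.
    for orf in find_orfs("CCATGAAATTTGGGTAACC"):
        print(orf)
```

A scan like this relies on genes being uninterrupted stretches of codons, which is why it is a better fit for prokaryotic sequences than for eukaryotic genes containing introns; students can be asked what additional sequence features a more realistic program would need to check.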