16
Microbial Genomics
CHAPTER OVERVIEW
This chapter introduces genomics, a revolutionary new discipline in the biological sciences. Techniques
important to the study of genomes are discussed. Bioinformatics, functional genomics, and comparative
genomics are detailed. Proteomics theory and techniques are discussed. The chapter then gives numerous
examples of the types of patterns already being discerned in the analysis of the microbial genomes thus far
sequenced. Finally, metagenomic analysis of environmental communities is introduced.
CHAPTER OBJECTIVES
After reading this chapter you should be able to:
• define genomics and bioinformatics
• compare and contrast structural genomics, functional genomics, and comparative genomics
• describe methods of sequencing DNA and the whole-genome shotgun method for sequencing a genome
• describe the types of analyses done for functional genomics and proteomics
• discuss some of the insights gained thus far by the analysis of microbial genomes
• discuss metagenomics
CHAPTER OUTLINE
I. Introduction
A. Genomics is the study of the molecular organization of genomes, their information content, and the
gene products they encode
B. Genomics will enable scientists to get a holistic view of microbial genetics, gene expression
patterns, microbial communities, and evolutionary relationships
II. Determining DNA Sequences
A. Sanger DNA sequencing
1. Uses dideoxynucleoside triphosphates (ddNTPs) in DNA synthesis; these lack a 3′-hydroxyl
and terminate DNA synthesis
2. Single strands of DNA are mixed with a primer, DNA polymerase I, four deoxynucleoside
triphosphates (one is labeled), and a small amount of one of the ddNTPs; DNA synthesis
begins with primer but terminates each time a ddNTP is added to the chain
3. Four reactions are run, each with a different ddNTP; these reactions generate DNA fragments
of different length because the site at which the ddNTP is inserted is random
4. Newly synthesized DNA fragments are separated electrophoretically on a polyacrylamide gel
or with capillary electrophoresis, often using an automated system; the gel can be
autoradiographed if radioactive ddNTPs were used or monitored with a laser if fluorescent
ddNTPs were used; the sequence is then read from the autoradiogram or chromatographic
trace
B. Post-Sanger DNA sequencing
1. Newer sequencing technologies do not require the construction of genomic clone libraries;
these methods attach DNA to solid substrates, PCR amplify sequences, and separate DNA
fragments.
2. Three approaches are available: pyrosequencing (454 Life Sciences), SOLEXA, and SOLiD
technology (sequencing by ligation)
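The chain-termination logic of Sanger sequencing (II.A) can be sketched in a few lines of Python. This is an idealized illustration, not a model of real chemistry: it assumes every possible termination product forms, ignores the primer length, and the function name `sanger_read` is made up for this example.

```python
def sanger_read(template: str) -> str:
    """Idealized chain-termination read of a single-stranded template (given 5'->3').

    Returns the newly synthesized strand, 5'->3', as it would be read off the gel.
    """
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    # The polymerase copies the template 3'->5', so the new strand is the
    # reverse complement of the template.
    new_strand = "".join(complement[base] for base in reversed(template))

    # Four reactions, one per ddNTP: a fragment ends wherever that base was
    # incorporated, so each lane holds the fragment lengths for one base.
    lanes = {
        base: [i + 1 for i, b in enumerate(new_strand) if b == base]
        for base in "ACGT"
    }

    # Electrophoresis orders fragments by length; reading from the shortest
    # to the longest fragment gives the sequence.
    by_length = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in by_length)


print(sanger_read("ATGC"))
```

Each position in the new strand appears in exactly one lane, which is why sorting all fragments by length recovers the full sequence.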
III. Genome Sequencing
A. Sequencing a genome by the whole-genome shotgun approach is a multi-step process
1. Library construction—chromosomes are broken into gene-sized fragments, inserted into
plasmids, and transformed into special E. coli strains
2. Random sequencing—the cloned fragments are sequenced, typically several times to assure
full coverage
3. Fragment alignment and gap closure—DNA fragments are clustered and assembled into longer
stretches of sequence by comparing nucleotide sequence overlaps between fragments
producing contigs (contiguous sequences); the contigs are aligned in the proper order to form
the completed genome sequence; gaps in the sequence are filled
4. Editing—sequence is proofread to resolve any ambiguities
B. Single-cell genomic sequencing uses DNA polymerase from bacteriophage phi29 to randomly
amplify many genomic DNA fragments using a multiple displacement amplification (MDA) scheme
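The fragment alignment step of the whole-genome shotgun approach (III.A.3) can be illustrated with a toy greedy assembler that repeatedly merges the two sequences sharing the longest suffix-to-prefix overlap. This is a minimal sketch under strong assumptions: error-free reads, a single strand, no repeats, and no reads buried inside other reads. The function names and the `min_len` parameter are chosen for this example; real assemblers are far more elaborate.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads into contigs by always joining the best-overlapping pair first."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no remaining overlaps: what is left are the contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

When the reads do not overlap enough to join, the function returns more than one contig, which corresponds to the gaps that must be closed in the final stage of a sequencing project.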
IV. Bioinformatics
A. The field concerned with the management and analysis of biological data using computers
B. Genome annotation is done once the sequence is obtained; annotation involves identifying open
reading frames (ORFs), determining potential amino acid sequences, and comparison to known
protein and DNA sequences (using alignments and BLAST)
C. These comparisons allow tentative assignment of gene function as well as identification of
transposable elements, operons, and repeat sequences, and the detection of various metabolic
pathways
D. Two or more genes in the genome of a single organism that arise through duplication of a common
ancestral gene are called paralogues; homologous genes found in the genomes of different organisms
are called orthologues
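ORF identification during annotation (IV.B) can be sketched as a scan of each reading frame for a start codon followed by an uninterrupted run of codons up to a stop codon. The sketch below makes several assumptions that the outline does not state: it requires an ATG start codon, uses the standard stop codons, and scans only the forward strand (a fuller search would also scan the reverse complement). The name `find_orfs` and the `min_codons` parameter are illustrative; the default of 100 codons follows the size threshold used in this chapter's definition of an ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs.

    start is the index of the ATG; end is the index just past the stop codon.
    min_codons counts the codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# A tiny made-up sequence, so the length threshold is lowered for the demo.
print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```

The candidate ORFs found this way would then be translated and compared against known sequences, as described for annotation above.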
V. Functional Genomics
A. Functional genomics is focused on how genes and genomes operate; physical maps of genomes are
useful in annotation
B. Metabolic pathways and physiological features can be modeled using annotated genomes where
potential functional proteins have been defined
C. Microarray analysis
1. DNA microarrays—solid supports (e.g., glass) that have DNA attached in highly organized
arrays of spots; in commercial chips, the array may consist of many expressed sequence tags
(ESTs; an expressed gene product made from cDNA) covering every ORF of an organism
2. The mRNA (transcriptome) or cDNA to be analyzed (target mixture) is isolated, labeled with
fluorescent reporter groups, and incubated with the DNA chip; fluorescence at an address on
the chip indicates that the DNA probe on the chip is bound to a mRNA or cDNA in the target
mixture; analysis of the hybridization pattern shows which genes are being transcribed
3. Using this procedure, the characteristic expression of whole sets of genes during
differentiation or in response to environmental changes can be observed; patterns of gene
expression can be detected using hierarchical cluster analysis and functions can be tentatively
assigned based on expression
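The hierarchical cluster analysis mentioned for microarray data (V.C.3) can be sketched as follows: genes whose expression profiles rise and fall together across conditions are grouped step by step. The choices here are assumptions for illustration only: Pearson correlation as the similarity measure, average linkage, a stopping threshold of 0.8, already-normalized expression values, and hypothetical gene names. Profiles with no variation would need separate handling, since the correlation is undefined for them.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_genes(profiles: dict[str, list[float]], threshold: float = 0.8) -> list[list[str]]:
    """Average-linkage agglomerative clustering of genes by expression profile.

    Clusters are merged until no pair has a mean correlation of at least threshold.
    """
    clusters = [[gene] for gene in profiles]

    def similarity(c1: list[str], c2: list[str]) -> float:
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, i, j = max(
            (similarity(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression values across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.0],
}
print(cluster_genes(profiles))
```

Genes that end up in the same cluster are candidates for shared regulation, which is the basis for tentatively assigning function from expression patterns.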
VI. Proteomics
A. Study of genome function at the level of translation
1. Proteome—entire collection of proteins that an organism produces; proteomics is the study of
the proteome
2. Functional proteomics determines the function of proteins, how they interact with each other,
and how they are regulated
a. Two-dimensional electrophoresis is used to resolve thousands of proteins in a mixture;
proteins are first separated based on charge qualities and then by size
b. Mass spectrometry is used to tentatively identify the proteins isolated by two-dimensional
electrophoresis; N-terminal amino acid sequencing can be used to determine ORFs when
the genome sequence is available
3. Structural proteomics attempts to directly determine the three-dimensional structures of many
proteins and then uses that information to predict the structures of other proteins and protein
complexes based on their amino acid sequence (protein modeling)
B. Similar studies can be performed using lipidomics (lipid profiles), glycomics (carbohydrate
profiles), and metabolomics (small molecule profiles)
C. DNA-protein interactions are important for gene regulation and in understanding transcription and
replication
1. Electrophoretic mobility shift assays examine DNA-protein interactions by observing changes
in the migration of DNA fragments when bound to target proteins
2. Chromatin immunoprecipitation (ChIP) assays examine DNA-protein complexes fixed in vivo
and then detected by antibody precipitation; the captured DNA molecules can be detected
using microarray analysis (ChIP-chip)
VII. Systems Biology seeks to integrate the molecular interactions among the many chemical components of a
cell into a theoretical framework that broadly describes living systems
VIII. Comparative Genomics
A. Comparisons of genomes and their functional genes lead to new insights in microbial biology and
the development of vaccines (reverse vaccinology)
B. Genome sizes vary among domains and organisms with varied ecological roles
C. The core genome (essential backbone of genes) is a set of genes that all organisms within a
monophyletic group share; the pan-genome (flexible gene pool) is the collection of all genes within
a given group
D. Horizontal gene transfer (HGT) is important for the exchange of genetic material between
organisms; mobile elements integrated into the genome (genomic islands) can confer virulence
(pathogenicity islands)
E. Synteny is used to compare the order in which genes appear in different phylogenetic groups
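The core genome and pan-genome definitions (VIII.C) map directly onto set operations: the core genome is the intersection of the gene sets of all members of a group, and the pan-genome is their union. The sketch below assumes genes have already been assigned to shared families and are represented by names; the strain and gene names are hypothetical, and at least one genome must be supplied.

```python
def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core genome, pan-genome) for a group of genomes.

    genomes maps a strain name to the set of gene names found in that strain.
    """
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)  # genes shared by every member
    pan = set.union(*gene_sets)          # every gene seen in any member
    return core, pan


# Hypothetical strains and gene content.
strains = {
    "strain1": {"dnaA", "gyrB", "toxA"},
    "strain2": {"dnaA", "gyrB", "lacZ"},
    "strain3": {"dnaA", "gyrB"},
}
core, pan = core_and_pan(strains)
print(sorted(core))
print(sorted(pan))
```

Genes in the pan-genome but not in the core, such as those on genomic islands acquired by horizontal gene transfer, make up the flexible gene pool.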
IX. Metagenomics
A. Environmental genomics, or metagenomics, is being used to study microbial diversity in natural
systems; fewer than 1% of the microbes in the environment can grow in the laboratory, so genetic
techniques are used to directly detect and enumerate microbial populations
B. The genomes of entire microbial communities can be sequenced and assembled, giving a picture of
their species composition and functionality; new species (phylotypes) are detected, unique genes
catalogued, and new functions ascribed to taxa
TERMS AND DEFINITIONS
____ 1. The study of the molecular organization of genomes, their information content, and the gene
products they encode
____ 2. The study of the physical nature of genomes
____ 3. The study of the way genomes function
____ 4. The comparison of genomes from different organisms to discern patterns of gene function and
regulation and microbial evolution
____ 5. Identification and localization of genes in a genome, and the determination of their function
by comparison to gene sequences in databases
____ 6. A reading frame sequence that is not interrupted by a stop codon; if larger than 100 codons,
it is thought to encode a protein
____ 7. The field concerned with the management and analysis of biological data using computers
____ 8. The entire collection of proteins that an organism produces
____ 9. The study of the array of proteins an organism can produce
____ 10. The study of the function of different proteins, how they interact with each other, and how
they are regulated
____ 11. The process of determining the structure of various proteins and then using that information
to predict the structure of other proteins and protein complexes based on their amino acid sequence
____ 12. A technique used to evaluate gene expression where DNA is attached to a solid support
____ 13. The flexible genome that is a collection of all genes within a given group
____ 14. The totality of all the mRNA in an organism
____ 15. A taxon that is characterized only by its nucleic acid sequence
____ 16. Essential set of genes present in all organisms of a monophyletic group
____ 17. A technique used to compare the order of genes in the genomes of different organisms
____ 18. A section of the genome containing genes involved in virulence
a. annotation
b. bioinformatics
c. core genome
d. comparative genomics
e. DNA microarray
f. functional genomics
g. functional proteomics
h. genomics
i. open reading frame (ORF)
j. pathogenicity island
k. pan-genome
l. phylotype
m. proteome
n. proteomics
o. structural genomics
p. structural proteomics
q. synteny
r. transcriptome
FILL IN THE BLANK
1. The most widely used sequencing technique was developed by Frederick Sanger. It uses
dideoxynucleotides and is called the __________ DNA sequencing method. Post-Sanger sequencing
techniques include 454 sequencing, also called __________.
2. Analysis of vast amounts of genome data requires sophisticated computers and computer software; these
analytical procedures are part of the field of __________.
3. The study of the way a genome functions is called __________. It begins with __________ of the
genome, which identifies genes and tentatively assigns functions to them. One important aspect of
understanding the function of a genome is to determine under what conditions each gene in the genome
is expressed. One of the best ways to evaluate gene expression is through the use of __________, which
are highly organized arrays of DNA on a solid support (e.g., glass or silicon). Commercially made arrays
often use short sequences (~25 base pairs in length) that are unique to a gene, rather than the entire gene
sequence. These short sequences are called __________, and they are derived
from cDNA molecules.
4. The proteome is often analyzed by __________, followed in many cases by mass
spectrometry. Antibodies are often used in __________ assays to determine protein-DNA
interactions.
5. The use of genomics to study microbial diversity in natural systems is called __________. The
genomes of entire communities can be __________ and then __________ to give a picture of the
functionality of the entire community.
6. The essential set of genes in a taxon is called the __________ and this is supplemented by a wider
flexible set of genes, called the __________, that are specific to individual members of that group.
MULTIPLE CHOICE
For each of the questions below select the one best answer.
1. Which of the following is NOT a step in whole-genome shotgun sequencing?
a. library construction
b. sequencing of randomly produced fragments
c. fragment alignment and closure
d. editing
e. All of the above are steps in whole-genome shotgun sequencing.
2. Which of the following is a general pattern of genome organization discerned by comparisons of
genomes?
a. There is very little variation in genome organization in bacteria and archaea.
b. There has been considerable horizontal gene transfer, especially of housekeeping or operational
genes.
c. Most parasitic organisms have more genes than do free-living organisms.
d. All of the above patterns have been observed.
3. A complex mixture of proteins can be separated using two-dimensional electrophoresis. What is the
basis for the separation?
a. charge differences (isoelectric focusing)
b. size differences
c. both (a) and (b)
d. neither (a) nor (b)
4. Which type of genomic analysis provides information about microbial evolution?
a. structural genomics
b. functional genomics
c. comparative genomics
d. none of the above
5. Translated amino acid sequences can be analyzed for motifs. What do these represent?
a. functional units
b. transcriptional controls
c. paralogues
d. orthologues
6. Microarray analysis is NOT appropriate for which of the following?
a. monitoring individual gene expression
b. tentatively assigning gene functions
c. observing patterns of gene expression
d. determining phylogenetic relationships
7. What percentage of environmental microbes grow in the laboratory?
a. 1%
b. 20%
c. 60%
d. nearly 100%
TRUE/FALSE
____ 1. The genome of M. genitalium is one of the smallest of any free-living organism.
____ 2. There has been a great deal of horizontal gene transfer between genomes in both Bacteria and
Archaea.
____ 3. One of the ultimate goals of genomic analysis is to model a cell on a computer and make predictions
about how it would respond to environmental changes.
____ 4. It is unlikely that genomic analysis will provide any information useful for understanding
pathogenicity or for developing treatments for infectious disease.
____ 5. Vaccine development can only be done using killed or weakened viruses.
____ 6. Open reading frames are known to be functional genes.
CRITICAL THINKING
1. In order for computers to identify open reading frames (ORFs) and other features of a genome, they must
be programmed to do so. What features of a nucleotide sequence would be important for identifying
ORFs? Explain your choices. Would the features be the same for both eukaryotic and prokaryotic
organisms? Explain.
2. Molecular microbial ecology uses genetic techniques to describe microbial communities in the
environment. If you were asked to describe the diversity of the microbes in a lake rich in Epsom salts,
what research plan would you pursue? Would you include a cultivation campaign? Why or why not?
Which molecular techniques would you apply and what might be their limitations?
ANSWER KEY
Terms and Definitions
1. h, 2. o, 3. f, 4. d, 5. a, 6. i, 7. b, 8. m, 9. n, 10. g, 11. p, 12. e, 13. k, 14. r, 15. l, 16. c, 17. q, 18. j
Fill in the Blank
1. chain-termination; pyrosequencing 2. bioinformatics 3. functional genomics; annotation; DNA microarrays
(chips); expressed sequence tags 4. two-dimensional electrophoresis; chromatin immunoprecipitation (ChIP) 5.
metagenomics; sequenced; assembled 6. core genome; pan-genome
Multiple Choice
1. e, 2. b, 3. c, 4. c, 5. a, 6. d, 7. a
True/False
1. T, 2. T, 3. T, 4. F, 5. F, 6. F