Genomics Chapter 18

Mapping Genomes
Maps of genomes can be divided into two types
-Genetic maps
-Abstract maps that place the relative
location of genes on chromosomes
based on recombination frequency
-Physical maps
-Use landmarks within DNA sequences,
ranging from restriction sites to the
actual DNA sequence
Physical Maps
Distances between “landmarks” are measured
in base-pairs
-1000 basepairs (bp) = 1 kilobase (kb)
Knowledge of DNA sequence is not necessary
There are three main types of physical maps
-Restriction maps
-Cytological maps
-Radiation hybrid maps
Physical Maps
Restriction maps
-The first physical maps
-Based on distances between restriction
sites
-Overlap between smaller segments can be
used to assemble them into a contig
-Continuous segment of the genome
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Figure: Constructing a restriction map
1. Multiple copies of a segment of DNA are cut with restriction enzymes.
2. The fragments produced by enzyme A only, by enzyme B only, and by
enzymes A and B together are run side-by-side on a gel, which separates
them according to size. A molecular weight marker is run alongside.
3. The fragments are arranged so that the smaller ones produced by the
simultaneous cut can be grouped to generate the larger ones produced by
the individual enzymes.
4. A physical map is constructed.
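The grouping logic in steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration: the 19 kb length and the cut positions for enzymes A and B are assumed values chosen for the example, and `digest` is a hypothetical helper name, not part of any library.

```python
def digest(length_kb, cut_sites_kb):
    """Return sorted fragment sizes after cutting a linear DNA segment.

    length_kb and cut_sites_kb are illustrative inputs (assumptions),
    measured in kilobases from one end of the segment.
    """
    bounds = [0] + sorted(cut_sites_kb) + [length_kb]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))

LENGTH = 19          # assumed total length in kb
SITES_A = [2, 10]    # assumed cut positions for enzyme A
SITES_B = [5]        # assumed cut position for enzyme B

only_a = digest(LENGTH, SITES_A)
only_b = digest(LENGTH, SITES_B)
both = digest(LENGTH, SITES_A + SITES_B)

print("A only:", only_a)
print("B only:", only_b)
print("A + B :", both)
```

Every digest must add up to the same total length, and each single-enzyme fragment must be the sum of adjacent double-digest fragments. Those two constraints are what allow the map to be reconstructed from gel data.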
Physical Maps
Cytological maps
-Employ stains that generate reproducible
patterns of bands on the chromosomes
-Divide chromosomes into subregions
-Provide a map of the whole genome, but
at low resolution
-Cloned DNA is correlated with map using
fluorescent in situ hybridization (FISH)
Physical Maps
Radiation hybrid maps
-Use radiation to fragment chromosomes
randomly
-Fragments are then recovered by fusing
irradiated cell to another cell
-Usually a rodent cell
-Fragments can be identified based on
banding patterns or FISH
Physical Maps
Sequence-tagged sites
-An STS is a small stretch of DNA that is
unique in the genome
-Only 200-500 bp
-Boundary is defined by PCR primers
-Identified using any DNA as a template
-STSs essentially provide a scaffold for
assembling genome sequences
Figure: Building a contig from sequence-tagged sites
1. The location of 4 STSs in the genome is shown. PCR is used to
amplify each STS from different clones in a library. Amplifying each
STS by PCR generates a unique fragment that can be identified.
2. The products of the PCR reactions (run with four clones, A to D) are
separated by gel electrophoresis, producing a different size fragment
for each STS.
3. The presence or absence of each STS in the clones
identifies regions of overlap. The final result is a
contiguous sequence (contig) of overlapping clones.
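Step 3 amounts to comparing sets of STSs. The sketch below is a rough illustration under stated assumptions: the STS content assigned to each clone is hypothetical, the STSs are assumed to be numbered in genome order, and `shared_sts` and `order_clones` are made-up helper names.

```python
from itertools import combinations

# Hypothetical PCR results: which STSs were amplified from each clone.
CLONES = {
    "A": {1, 2},
    "B": {2, 3},
    "C": {2, 3, 4},
    "D": {3, 4},
}

def shared_sts(clones):
    """Map each pair of clones to the STSs they share (overlapping pairs only)."""
    pairs = {}
    for a, b in combinations(sorted(clones), 2):
        common = clones[a] & clones[b]
        if common:
            pairs[(a, b)] = common
    return pairs

def order_clones(clones):
    """Order clones left to right, assuming STS numbers follow genome order."""
    return sorted(clones, key=lambda c: (min(clones[c]), max(clones[c])))

overlaps = shared_sts(CLONES)
contig_order = order_clones(CLONES)
print(overlaps)
print(contig_order)
```

Clones that share no STS (A and D in this example) are joined only through intermediate clones, which is why a dense set of STSs is needed to close a contig.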
Genetic Maps
Genetic maps are measured in centimorgans
-1 cM = 1% recombination frequency
Linkage mapping can be done without
knowing the DNA sequence of a gene
-Limitations:
1. Genetic distance does not directly
correspond to actual physical distance
2. Not all genes have obvious
phenotypes
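The centimorgan definition above is simple arithmetic. A minimal sketch, assuming offspring counts from a hypothetical cross (`map_distance_cm` is an illustrative name):

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Genetic distance in cM, using 1 cM = 1% recombination frequency.

    This direct conversion is a reasonable approximation only for
    closely linked loci; it ignores multiple crossovers.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring

# Hypothetical cross: 18 recombinant offspring out of 200 scored.
distance = map_distance_cm(18, 200)
print(distance, "cM")
```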
Genetic Maps
The most common markers are short repeat
sequences called short tandem repeats,
or STR loci
-Differ in repeat length between individuals
-13 STR loci form the basis of modern DNA
fingerprinting developed by the FBI
-Cataloged in the CODIS database to
identify criminal offenders
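Because STR alleles differ in repeat length, comparing two alleles comes down to counting repeat units. The sketch below assumes a made-up repeat unit (GATA) and made-up flanking sequence; it does not reproduce any real CODIS locus.

```python
import re

def longest_repeat_run(sequence, unit):
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical alleles at one STR locus, differing only in repeat count.
allele_1 = "CCTA" + "GATA" * 7 + "TTGC"
allele_2 = "CCTA" + "GATA" * 9 + "TTGC"

print(longest_repeat_run(allele_1, "GATA"))
print(longest_repeat_run(allele_2, "GATA"))
```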
Genetic Maps
Genetic and physical maps can be correlated
-Any cloned gene can be placed within the
genome and can also be mapped genetically
Genetic Maps
All of these different kinds of maps are stored
in databases
-The National Center for Biotechnology
Information (NCBI) serves as the US
repository for these data and more
-Similar databases exist in Europe and
Japan
Whole Genome Sequencing
The ultimate physical map is the base-pair
sequence of the entire genome
-Requires use of high-throughput automated
sequencing and computer analysis
Whole Genome Sequencing
Sequencers provide accurate sequences for
DNA segments up to 800 bp long
-To reduce errors, 5-10 copies of a genome
are sequenced and compared
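The read length and copy number above imply a very large number of sequencing reads. A back-of-the-envelope sketch, assuming a 3.2-billion-bp genome and ignoring uneven coverage and the extra reads needed for overlaps (`reads_needed` is an illustrative name):

```python
def reads_needed(genome_bp, read_bp, coverage):
    """Minimum number of reads so that total bases = coverage x genome size."""
    total_bases = genome_bp * coverage
    return (total_bases + read_bp - 1) // read_bp  # integer ceiling division

GENOME_BP = 3_200_000_000   # assumed genome size
READ_BP = 800               # maximum accurate read length from the text

low = reads_needed(GENOME_BP, READ_BP, 5)
high = reads_needed(GENOME_BP, READ_BP, 10)
print(low, high)
```

Under these assumptions, 5 to 10 copies correspond to roughly 20 to 40 million reads, which is why automation and computer analysis are required.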
Vectors used to clone large pieces of DNA:
-Yeast artificial chromosomes (YACs)
-Bacterial artificial chromosomes (BACs)
-Human artificial chromosomes (HACs)
-Are circular, at present
Whole Genome Sequencing
Clone-by-clone sequencing
-Overlapping regions between BAC clones
are identified by restriction mapping or STS
analysis
Shotgun sequencing
-DNA is randomly cut into smaller fragments,
cloned and then sequenced
-Computers put together the overlaps
-Sequence is not tied to other information
Figure: Two approaches to genome sequencing
a. Clone-by-Clone Method
1. Large DNA clones are first isolated. These are arranged into
contiguous sequences based on overlapping tagged sites.
2. Large clones are fragmented into smaller clones for sequencing.
3. The entire sequence is assembled from the overlapping larger clones.
b. Shotgun Method
1. Cut DNA of entire chromosome into small fragments and clone.
2. Sequence each segment and arrange based on overlapping
nucleotide sequences.
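Arranging segments by overlapping nucleotide sequences can be illustrated with two short reads. This is a toy sketch, not a real assembler: the reads are invented, sequencing errors and repeats are ignored, and `overlap_length` and `merge_reads` are hypothetical helper names.

```python
def overlap_length(left, right, min_len=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def merge_reads(left, right, min_len=3):
    """Join two reads at their overlap, or return None if they do not overlap."""
    k = overlap_length(left, right, min_len)
    return left + right[k:] if k else None

# Invented reads that share the four bases GTAC.
read_1 = "ATGCGTAC"
read_2 = "GTACCTGA"

merged = merge_reads(read_1, read_2)
print(merged)
```

The `min_len` threshold is an assumption that guards against joining reads on a chance match of one or two bases. Real assemblers require much longer overlaps.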
The Human Genome Project
Launched in 1990 by the International Human
Genome Sequencing Consortium
Craig Venter formed a private company, and
entered the “race” in May, 1998
In 2001, both groups published a draft
sequence
-Contained numerous gaps
The Human Genome Project
In 2004, the “finished” sequence was published
as the reference sequence (REF-SEQ) in
databases
-3.2 gigabasepairs
-1 Gb = 1 billion basepairs
-Contains a 400-fold reduction in gaps
-99% of euchromatic sequence
-Error rate = 1 per 100,000 bases
Characterizing Genomes
The Human Genome Project found fewer
genes than expected
-Initial estimate was 100,000 genes
-Number now appears to be about 25,000!
In general, eukaryotic genomes are larger and
have more genes than those of prokaryotes
-However, the complexity of an organism is
not necessarily related to its gene number
Finding Genes
Genes are identified by open reading frames
-An ORF begins with a start codon and
contains no stop codon for a distance long
enough to encode a protein
Sequence annotation
-The addition of information, such as ORFs,
to the basic sequence information
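The ORF definition above translates directly into a scan of each reading frame. The sketch below is a simplified illustration: it checks only the three forward frames, assumes ATG as the start codon and TAA, TAG and TGA as stop codons, and uses an arbitrary `min_codons` threshold. Real gene finders use far longer thresholds and additional evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(sequence, min_codons=3):
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon (end is
    exclusive and includes the stop). min_codons counts the start codon
    but not the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return orfs

# Invented sequence containing one short ORF.
dna = "CCATGAAATTTGGGTAACC"
orfs = find_orfs(dna)
print(orfs, [dna[s:e] for s, e in orfs])
```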
Finding Genes
BLAST
-A search algorithm used to search NCBI
databases for homologous sequences
-Permits researchers to infer functions for
isolated molecular clones
Bioinformatics
-Use of computer programs to search for
genes, and to assemble and compare
genomes
Genome Organization
Genomes consist of two main regions
-Coding DNA
-Contains genes that encode proteins
-Noncoding DNA
-Regions that do not encode proteins
Coding DNA in Eukaryotes
Four different classes are found:
-Single-copy genes : Includes most genes
-Segmental duplications : Blocks of genes
copied from one chromosome to another
-Multigene families : Groups of related but
distinctly different genes
-Tandem clusters : Identical copies of genes
occurring together in clusters
-Also include rRNA genes
Noncoding DNA in Eukaryotes
Each cell in our bodies has about 6 feet of
DNA stuffed into it
-However, less than one inch is devoted to
genes!
Six major types of noncoding human DNA
have been described
Noncoding DNA in Eukaryotes
Noncoding DNA within genes
-Protein-encoding exons are embedded
within much larger noncoding introns
Structural DNA
-Called constitutive heterochromatin
-Localized to centromeres and telomeres
Simple sequence repeats (SSRs)
-One- to six-nucleotide sequences repeated
thousands of times
Noncoding DNA in Eukaryotes
Segmental duplications
-Consist of 10,000 to 300,000 bp that have
duplicated and moved
Pseudogenes
-Inactive genes
Noncoding DNA in Eukaryotes
Transposable elements (transposons)
-Mobile genetic elements
-Four types:
-Long interspersed elements (LINEs)
-Short interspersed elements (SINEs)
-Long terminal repeats (LTRs)
-Dead transposons
Expressed Sequence Tags
ESTs can identify genes that are expressed
-They are generated by sequencing the
ends of randomly selected cDNAs
ESTs have identified 87,000 cDNAs in
different human tissues
-But how can 25,000 human genes encode
three to four times as many proteins?
-Alternative splicing yields different
proteins with different functions
Alternative Splicing
Figure: A primary RNA transcript with a 5´ cap, exons and introns
(segments numbered 1 to 13), and a 3´ poly-A tail. mRNA splicing yields
different processed RNAs, and therefore different mature mRNAs, in
brain and in muscle.
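The idea that one transcript yields different mRNAs can be sketched as selecting different exon subsets. Everything in this example is hypothetical: the exon sequences, the exon numbers kept in each tissue, and the helper name `splice` are placeholders and are not read from the figure.

```python
# Hypothetical exon sequences keyed by exon number.
EXONS = {1: "ATGGCC", 2: "GATTAC", 3: "CCGGAA", 4: "TTTAAA"}

def splice(exons, keep):
    """Join the retained exons in transcript order; everything else is removed."""
    return "".join(exons[number] for number in sorted(keep))

# Assumed tissue-specific exon choices, for illustration only.
brain_mrna = splice(EXONS, {1, 2, 4})
muscle_mrna = splice(EXONS, {1, 3, 4})

print(brain_mrna)
print(muscle_mrna)
```

Both products share the first and last exon but differ in the middle, so they would encode related proteins that are not identical.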
Variation in the Human Genome
Single-nucleotide polymorphisms (SNPs)
are sites where individuals differ by only one
nucleotide
-Must be found in at least 1% of population
Haplotypes are regions of the chromosome
that are not exchanged by recombination
-Tendency for genes not to be randomized is
called linkage disequilibrium
-Can be used to map genes
Figure: SNPs and haplotypes
a. SNPs (three DNA segments per chromosome, each containing one SNP)
Chromosome 1  AACACGCCA  TTCGGGGTC  AGTCGACCG
Chromosome 2  AACACGCCA  TTCGAGGTC  AGTCAACCG
Chromosome 3  AACATGCCA  TTCGGGGTC  AGTCAACCG
Chromosome 4  AACACGCCA  TTCGGGGTC  AGTCGACCG
b. Haplotypes
Haplotype 1  C T C A A A G T A C G G T T C A G G C A
Haplotype 2  T T G A T T G C G C A A C A G T A A T A
Haplotype 3  C C C G A T C T G T G A T A C T G G T G
Haplotype 4  T C G A T T C C G C G G T T C A G A C A
Diagnostic SNPs (A/G, T/C, C/G)
Haplotype 1  A T C
Haplotype 2  A C G
Haplotype 3  G T C
Haplotype 4  A C G
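Finding candidate SNP sites is a column-by-column comparison of aligned sequences. The sketch below uses the four chromosome sequences from panel a with the spacing removed and the three segments joined. That joining is an assumption made for the example, and `variable_sites` is an illustrative name. The code only flags positions that vary; it does not apply the 1% population-frequency criterion.

```python
def variable_sites(aligned_sequences):
    """Return 0-based positions where aligned sequences do not all agree."""
    return [
        index
        for index, column in enumerate(zip(*aligned_sequences))
        if len(set(column)) > 1
    ]

# Panel a sequences, three segments per chromosome joined end to end.
chromosomes = [
    "AACACGCCA" + "TTCGGGGTC" + "AGTCGACCG",  # Chromosome 1
    "AACACGCCA" + "TTCGAGGTC" + "AGTCAACCG",  # Chromosome 2
    "AACATGCCA" + "TTCGGGGTC" + "AGTCAACCG",  # Chromosome 3
    "AACACGCCA" + "TTCGGGGTC" + "AGTCGACCG",  # Chromosome 4
]

sites = variable_sites(chromosomes)
alleles = [sorted({seq[i] for seq in chromosomes}) for i in sites]
print(sites, alleles)
```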
Genomics
Comparative genomics, the study of whole
genome maps of organisms, has revealed
similarities among them
-For example, over half of Drosophila genes
have human counterparts
Synteny refers to the conserved arrangements
of DNA segments in related genomes
-Allows comparisons of unsequenced
genomes
Figure: Genomic alignment (segment rearrangement) of rice, sugarcane,
corn, and wheat
Genomics
Organellar genomes
-Mitochondria and chloroplasts are
descendants of ancient endosymbiotic
bacterial cells
-Over time, their genomes exchanged
genes with the nuclear genome
-Both organelles contain polypeptides
encoded by the nucleus
Genomics
Functional genomics is the study of the
function of genes and their products
DNA microarrays (“gene chips”) enable
the analysis of gene expression at the
whole-genome level
-DNA fragments are deposited on a slide
-Probed with labeled mRNA from
different sources
-Active/inactive genes are identified
Figure: Preparing and probing a DNA microarray
1. Unique, PCR-amplified Arabidopsis genome fragments (1, 2, 3, 4...)
are contained in each well of a plate.
2. DNA is printed onto a microscope slide.
3. Samples of mRNA are obtained from two different tissues. Probes for each
sample are prepared using a different fluorescent nucleotide for each sample.
Flower-specific mRNA (sample 1) and leaf-specific mRNA (sample 2) are each
copied by reverse transcriptase into a labeled cDNA probe (probe 1 and probe 2).
4. The two probes are mixed and hybridized with the microarray. Fluorescent
signals on the microarray are analyzed. A spot may show a strong or weak
signal from one probe, or similar signals from both probes.
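Analyzing the signals in step 4 usually means comparing the two fluorescent channels spot by spot. The following is a sketch under stated assumptions: the intensity values are invented, the log2 ratio with a twofold cutoff is a common convention rather than something stated in the slides, and `classify_spot` is a hypothetical helper name.

```python
import math

def classify_spot(probe1_signal, probe2_signal, threshold=1.0):
    """Compare two channel intensities using a log2 ratio.

    threshold=1.0 means a twofold difference (an assumed cutoff).
    Both intensities must be positive.
    """
    ratio = math.log2(probe1_signal / probe2_signal)
    if ratio >= threshold:
        return "higher in sample 1"
    if ratio <= -threshold:
        return "higher in sample 2"
    return "similar"

# Invented intensities for three spots: (probe 1, probe 2).
spots = {"fragment_1": (1200, 150), "fragment_2": (300, 310), "fragment_3": (90, 800)}

calls = {name: classify_spot(p1, p2) for name, (p1, p2) in spots.items()}
print(calls)
```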
Genomics
Transgenics is the creation of organisms
containing genes from other species
(transgenic organisms)
-Can be used to determine whether:
-A gene identified by an annotation
program is really functional in vivo
-Homologous genes from different
species have the same function
Proteomics
Proteomics is the study of the proteome
-All the proteins encoded by the genome
The transcriptome consists of all the RNA
that is present in a cell or tissue
Proteomics
Proteins are much more difficult to study
than DNA because of:
-Post-translational modifications
-Alternative splicing
However, databases containing the known
protein structural motifs exist
-These can be searched to predict the
structure and function of gene sequences
Proteomics
Protein microarrays are being used to study
large numbers of proteins simultaneously
-Can be probed using:
-Antibodies to specific proteins
-Specific proteins
-Small molecules
The yeast two-hybrid system has generated
large-scale maps of interacting proteins
Applications of Genomics
The genomics revolution will have a lasting
effect on how we think about living systems
The immediate impact of genomics is being
seen in diagnostics
-Identifying genetic abnormalities
-Identifying victims by their remains
-Distinguishing between naturally occurring
and intentional outbreaks of infections
Applications of Genomics
Genomics has also helped in agriculture
-Improvement in the
yield and nutritional
quality of rice
-Doubling of world grain production in last
50 years, with only a 1% cropland increase
Applications of Genomics
Genome science is also a source of ethical
challenges and dilemmas
-Gene patents
-Should the sequence/use of genes be
freely available or can it be patented?
-Privacy concerns
-Could one be discriminated against
because their SNP profile indicates
susceptibility to a disease?