PowerPoint to accompany
Genetics: From Genes to Genomes
Third Edition
Hartwell ● Hood ● Goldberg ● Reynolds ● Silver ● Veres
Chapter 10
Prepared by Malcolm Schug
University of North Carolina Greensboro
Copyright © The McGraw-Hill Companies, Inc. Permission required to reproduce or display
10-1
Reconstructing the Genome Through Genetic and Molecular Analysis
10-2
Outline of Chapter 10
Challenges and strategies of genome analysis
  Genome size
  Features to be analyzed
  Problems with DNA polymorphisms
  Development of whole-genome maps
Insights emerging from complete genome sequencing
  Number and type of genes
  Extent of repeated sequences
  Genome organization and structure
  Evolution by lateral gene transfer
High-throughput tools for analyzing genomes and their protein products
  DNA sequencers
  DNA arrays
  Mass spectrometers
Two paradigm changes propelled by whole-genome sequences and new tools of genome analysis
  Systems biology
  Predictive and preventive medicine
10-3
The genomes of living organisms vary enormously in size.
10-4
Genomicists look at two basic features of genomes: sequence and polymorphism.
Major challenges in determining the sequence of each chromosome in a genome and identifying its many polymorphisms:
  How does one sequence a 500 Mb chromosome 600 bp at a time?
  How accurate should a genome sequence be?
  How does one distinguish sequencing errors from polymorphisms?
    DNA sequencing error rate is about 1% per base in a ~600 bp read.
    Rate of polymorphism in the diploid human genome is about 1 in 500 bp.
  Repeat sequences may be hard to place.
  Unclonable DNA cannot be sequenced.
    Up to 30% of a genome may be heterochromatic DNA that cannot be cloned.
10-5
Divide-and-conquer strategy meets most challenges.
  Chromosomes are broken into small overlapping pieces, which are cloned.
  The ends of the clones are sequenced, and the reads are reassembled into the original chromosome sequence.
  Each piece is sequenced multiple times to reduce the error rate.
    10-fold sequence coverage achieves an error rate of less than 1 in 10,000.
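The claim that 10-fold coverage pushes the error rate below 1 in 10,000 can be checked with a binomial calculation. This is a minimal sketch, assuming a 1% per-base error rate in each read, errors that are independent across reads and (worst case) all agree on the same wrong base, and a consensus called by simple majority vote with ties counted as errors. Real assembly software weighs per-base quality values rather than counting votes; the function name `consensus_error` is hypothetical.

```python
from math import comb


def consensus_error(coverage, per_read_error=0.01):
    """Worst-case chance that a majority vote over `coverage` reads calls the wrong base.

    Assumes independent errors that all agree on the same wrong base,
    and counts a tie as an error (conservative).
    """
    min_wrong = (coverage + 1) // 2  # fewest wrong reads that tie or win the vote
    return sum(
        comb(coverage, k) * per_read_error**k * (1 - per_read_error) ** (coverage - k)
        for k in range(min_wrong, coverage + 1)
    )


for depth in (1, 3, 10):
    print(f"{depth:>2}x coverage: consensus error ~ {consensus_error(depth):.1e}")
```

Under these simplified assumptions the 10-fold consensus error falls far below 1 in 10,000, so the slide's figure reads as a conservative bound.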
10-6
Fig. 10.2
10-7
Techniques for mapping and cloning
  Cloning
    Library of DNA fragments of 500 – 1,000,000 bp
    Inserted into one of a variety of vectors
  Hybridization
    Location of a particular DNA sequence within the library of fragments
  DNA sequencing
    Automated DNA sequencer using the Sanger method determines sequences 600 bp at a time.
  PCR amplification
    Direct amplification of a particular region of DNA ranging from 1 bp to > 20 kb
  Computational tools
    Programs for identifying matches between a particular sequence and a large population of previously sequenced fragments
    Programs for identifying overlaps of DNA fragments
    Programs for estimating error rates
    Programs for identifying genes in chromosomal sequences
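The first kind of program, matching one sequence against a large collection of fragments, is often built on an index of short words (k-mers). The sketch below is a simplified illustration of that idea, not BLAST or any other specific tool; the function names, the fragment names, k = 5, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(fragments, k):
    """Map every k-mer (word of length k) to the names of the fragments containing it."""
    index = defaultdict(set)
    for name, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def find_matches(query, index, k):
    """Count how many of the query's k-mers each indexed fragment shares."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] += 1
    return dict(hits)


# Hypothetical "previously sequenced" fragments.
fragments = {"frag1": "ACGTACGTTAGC", "frag2": "TTTTGGGGCCCC", "frag3": "GTTAGCAAAT"}
index = build_kmer_index(fragments, k=5)
print(find_matches("CGTTAGCAA", index, k=5))  # {'frag1': 3, 'frag3': 4}
```

Fragments with many shared k-mers are the candidates worth aligning in detail; fragments with none (here `frag2`) are skipped cheaply.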
10-8
Making a large-scale linkage map
Types of DNA polymorphisms used for large-scale mapping:
  Single nucleotide polymorphisms (SNPs) – 1 per 500 – 1,000 bp across the genome
  Simple sequence repeats (SSRs) – 1 per 20 – 40 kb across the genome
    A unit of 2 – 5 nucleotides is repeated 4 – 50 or more times.
Most SNPs and SSRs have little or no effect on the organism.
  They serve as DNA markers across the chromosomes.
  They must be rapidly identified and assayed in populations of 100s to 1000s of individuals.
Fig. 10.3
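Because an SSR is a short unit of 2 – 5 nucleotides repeated several times in tandem, candidate SSRs can be found in a sequence with a regular expression. This is a minimal sketch, assuming a cut-off of at least 4 tandem copies (the low end of the slide's range); the function name `find_ssrs` is hypothetical, and homopolymer runs would be reported as 2-base units.

```python
import re

# Group 2 captures a 2-5 nt unit (shortest first); \2{3,} requires at least
# 3 more exact copies, i.e. 4 or more copies in total.
SSR_PATTERN = re.compile(r"(([ACGT]{2,5}?)\2{3,})")


def find_ssrs(seq):
    """Return (start, repeat_unit, copy_number) for each SSR found in `seq`."""
    results = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        unit = match.group(2)
        copies = len(match.group(1)) // len(unit)
        results.append((match.start(), unit, copies))
    return results


print(find_ssrs("GGTCACACACACACAGGT"))  # [(3, 'CA', 6)]
```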
10-9
Genome-wide identification of genetic markers
  Initial genetic maps used SSRs, which are highly polymorphic.
    Identified by screening DNA libraries with SSR probes
    Amplified by PCR, and length differences assayed
  SNPs – millions more recently identified by comparing corresponding regions of cDNA clones from different individuals
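Once corresponding regions from two individuals are aligned, candidate SNPs are the positions where single bases differ. This is a minimal sketch, assuming the two sequences are already aligned to equal length and contain no sequencing errors; in practice a difference must be confirmed (for example, seen in more than one read) to separate a true SNP from a sequencing error. The function name `find_snps` is hypothetical.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each single-base difference.

    Both sequences must be aligned to the same length; alignment gaps ('-')
    are skipped because insertions/deletions are not SNPs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and "-" not in (a, b)
    ]


# Hypothetical aligned sequences from two individuals.
print(find_snps("ACGTTAGC", "ACGCTAGC"))  # [(3, 'T', 'C')]
```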
10-10




Homologous – genes with enough sequence similarity to indicate that they share a common ancestor somewhere in evolutionary history
Orthologous – genes in two different species that arose from the same gene in the two species' common ancestor
Paralogous – genes that arose by duplication within the same species
Orthologous genes are always homologous, but homologous genes are not always orthologous.
10-11
SNPs and SSRs for genome coverage
Until recently, maps were constructed from about 500 SSRs evenly spaced across the genome (1 SSR every 6 Mb).
SNPs provide more than 500,000 DNA markers across the genome.
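The spacing figures follow from dividing genome length by the number of markers. A small worked example, assuming a haploid human genome of about 3,000 Mb (the value implied by 500 SSRs at one per 6 Mb); the names are hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate haploid human genome (assumption)


def mean_spacing_bp(n_markers, genome_size_bp=GENOME_SIZE_BP):
    """Average distance between evenly spaced markers."""
    return genome_size_bp / n_markers


print(mean_spacing_bp(500) / 1e6, "Mb between SSRs")      # 6.0 Mb between SSRs
print(mean_spacing_bp(500_000) / 1e3, "kb between SNPs")  # 6.0 kb between SNPs
```

Going from 500 SSRs to 500,000 SNPs shrinks the average gap a thousandfold, from megabases to kilobases.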
10-12
Genome-wide typing of genetic markers
Two-stage assay for simple sequence repeats:
  PCR amplification
  Size separation
Fig. 10.4
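In the size-separation stage, each allele's PCR product size reveals its number of repeat units, because the primers sit at fixed positions in the unique flanking DNA. This is a minimal sketch, assuming a dinucleotide (2 bp) repeat and a hypothetical 100 bp non-repeat portion of the product (primers plus flanking DNA); real markers each have their own flank length. The function names are hypothetical.

```python
def repeat_copies(product_bp, flank_bp=100, unit_bp=2):
    """Number of repeat units in one SSR allele, from its PCR product size.

    flank_bp: combined length of the non-repeat part of the product
    (primers plus unique flanking DNA); hypothetical per-marker value.
    unit_bp: length of the repeat unit (2 for a CA repeat).
    """
    repeat_bp = product_bp - flank_bp
    if repeat_bp <= 0 or repeat_bp % unit_bp:
        raise ValueError(f"{product_bp} bp is not a whole number of {unit_bp} bp units")
    return repeat_bp // unit_bp


def genotype(band_sizes_bp, flank_bp=100, unit_bp=2):
    """Diploid genotype from the one or two bands seen after size separation."""
    copies = sorted(repeat_copies(size, flank_bp, unit_bp) for size in band_sizes_bp)
    if len(copies) == 1:  # a single band: both alleles are the same size
        copies *= 2
    return tuple(copies)


print(genotype([130, 124]))  # heterozygote: (12, 15)
print(genotype([126]))       # homozygote: (13, 13)
```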
10-13
Long-range physical maps: karyotypes and genomic libraries position markers on chromosomes.
Physical map
  Overlapping DNA fragments, ordered and oriented, that span each of the chromosomes
  Based on direct analysis of DNA rather than on recombination, on which linkage maps are based
  Charts the actual number of bp, kb, or Mb that separate a locus from its neighbors
Linkage vs. physical maps
  1 cM ≈ 1 Mb in humans
  1 cM ≈ 2 Mb in mice
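The slide's ratios give a rough way to convert genetic distance into physical distance. This is a minimal sketch using only those genome-wide averages; local recombination rates vary along chromosomes, so the actual Mb per cM in any given region can differ considerably. The dictionary and function names are hypothetical.

```python
# Genome-wide averages from the slide; local values vary with recombination rate.
MB_PER_CM = {"human": 1.0, "mouse": 2.0}


def cm_to_mb(distance_cm, species):
    """Rough physical distance (Mb) corresponding to a genetic distance (cM)."""
    return distance_cm * MB_PER_CM[species]


def mb_to_cm(distance_mb, species):
    """Rough genetic distance (cM) corresponding to a physical distance (Mb)."""
    return distance_mb / MB_PER_CM[species]


print(cm_to_mb(5, "human"), cm_to_mb(5, "mouse"))  # 5.0 10.0
```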
10-14
Vectors used to clone large inserts for physical mapping
YACs (yeast artificial chromosomes)
  Insert size 100 – 1,000 kb
BACs (bacterial artificial chromosomes)
  Insert size 50 – 300 kb
  More stable and easier to purify from host DNA than YACs

10-15
How to determine the order of clones across the genome
Overlapping inserts help align cloned fragments.
Bottom-up approach – overlapping sequences of tens of thousands of clones are determined by restriction site analysis or sequence-tagged sites (STSs).
Top-down approach – each insert is hybridized against a karyotype of the entire genome.

10-16
Identifying and isolating a set of overlapping fragments from a library
Two approaches:
  Linkage maps used to derive a physical map
    Set of markers less than 1 cM apart
    Use markers to retrieve fragments from the library by hybridization.
    Construct contigs – two or more partially overlapping cloned fragments.
    Chromosome walk by using the ends of unconnected contigs to probe the library for fragments in unmapped regions
  Physical mapping techniques
    Direct analysis of DNA
    Overlapping clones aligned by restriction mapping
    Sequence-tagged sites (STSs)
10-17
Physical mapping by analysis of STSs
Bottom-up approach
Fig. 10.5
Each STS represents a unique segment of the
genome amplified by PCR.
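In STS content mapping, two clones that both contain the same STS must overlap, so clones can be grouped into contigs by following shared STSs. This is a minimal sketch using a union-find structure, assuming each clone has already been scored by PCR for the STSs it contains; the clone and STS names are hypothetical, and ordering the clones within each contig is a further step not shown.

```python
def sts_contigs(clone_sts):
    """Group clones into contigs: clones sharing any STS are placed together."""
    parent = {clone: clone for clone in clone_sts}

    def find(clone):
        # Follow parent links to the group's representative (with path halving).
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    first_clone_with = {}  # STS -> first clone seen containing it
    for clone, markers in clone_sts.items():
        for sts in markers:
            if sts in first_clone_with:
                parent[find(clone)] = find(first_clone_with[sts])  # merge groups
            else:
                first_clone_with[sts] = clone

    groups = {}
    for clone in clone_sts:
        groups.setdefault(find(clone), set()).add(clone)
    return list(groups.values())


# Hypothetical STS content of four BAC clones.
clones = {
    "BAC1": {"sts1", "sts2"},
    "BAC2": {"sts2", "sts3"},
    "BAC3": {"sts3", "sts4"},
    "BAC4": {"sts9"},
}
print(sorted(sorted(contig) for contig in sts_contigs(clones)))
# [['BAC1', 'BAC2', 'BAC3'], ['BAC4']]
```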
10-18
Human Karyotype
Fig. 10.6a, b
(a) The complete set of human chromosomes stained with Giemsa dye shows bands.
(b) Ideograms show the idealized banding pattern.
10-19
Chromosome 7 at three levels of resolution
Fig. 10.6c
10-20
FISH protocol for top-down approach
Fig. 10.8
10-21
Sequence maps show the order of nucleotides in a cloned piece of DNA.
Two strategies for sequencing the human genome:
  Hierarchical shotgun approach
  Whole-genome shotgun approach
Shotgun – randomly generated, overlapping insert fragments:
  Fragments from BACs
  Fragments from shearing the whole genome
  Fragments produced by:
    Shearing DNA with sonication
    Partial digestion with restriction enzymes

10-22
Hierarchical shotgun strategy
Used in the publicly funded effort to sequence the human genome
Fig. 10.9
  Shear each 200 kb BAC clone into ~2 kb fragments.
  Sequence the ends of the fragments to about 10-fold coverage.
  Need about 1,700 plasmid inserts per BAC and about 20,000 BACs to cover the genome.
  Data from linkage and physical maps are used to assemble sequence maps of chromosomes.
  Significant work is needed to create libraries of each BAC and to physically map the BAC clones.
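The figure of about 1,700 plasmid inserts per BAC can be reproduced with simple arithmetic. A worked sketch, assuming "sequence ends 10 times" means enough end reads to cover each BAC about 10-fold, reads of ~600 bp (the length cited earlier in the chapter), and two reads, one from each end, per ~2 kb plasmid insert:

```python
BAC_INSERT_BP = 200_000   # BAC size from the slide
FOLD_COVERAGE = 10        # "sequence ends 10 times", read here as 10-fold coverage (assumption)
READ_LENGTH_BP = 600      # bases per Sanger read, as cited earlier in the chapter
READS_PER_PLASMID = 2     # one read from each end of a ~2 kb plasmid insert

reads_needed = BAC_INSERT_BP * FOLD_COVERAGE / READ_LENGTH_BP
plasmids_needed = reads_needed / READS_PER_PLASMID
print(f"~{reads_needed:.0f} reads from ~{plasmids_needed:.0f} plasmid inserts per BAC")
```

The result (about 1,667 plasmids) rounds to the slide's "about 1,700".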
10-23
Whole-genome shotgun sequencing
Used by the private company Celera to sequence the whole human genome.
Fig. 10.10
  Whole genome randomly sheared three times to construct three libraries:
    Plasmid library with ~2 kb inserts
    Plasmid library with ~10 kb inserts
    BAC library with ~200 kb inserts
  Computer program assembles sequences into chromosomes.
  No physical map construction
  Only one BAC library
  Overcomes problems of repeat sequences
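The assembly step can be illustrated with a greedy algorithm that repeatedly merges the two fragments with the longest suffix-prefix overlap. This is a textbook simplification, not Celera's assembler: it assumes error-free reads from one strand, does not merge reads contained inside others, and can misassemble repeats, which production assemblers handle with additional information such as paired reads from the two ends of each insert. The function names and the minimum overlap of 3 bases are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of sequences until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical shotgun reads from one short region.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGAT"]))  # ['ATGGCGTACCGAT']
```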
10-24
Limitations of whole-genome sequencing
Some DNA cannot be cloned.
  e.g., heterochromatin
Some sequences rearrange or sustain deletions when cloned.
Future large-genome sequencing will use both shotgun approaches.
10-25
Sequencing of the human genome
Most of the draft sequence was produced during the last year of the project.
  Instrument improvements – 500,000,000 bp/day
  Automated, factory-like production lines generated enough DNA to supply the sequencers on a daily basis.
  Large sequencing centers with 100 – 300 instruments – 150,000,000 bp/day
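A rough calculation shows why this throughput made a draft possible within about a year. A worked sketch, assuming a genome of ~3,000 Mb and 10-fold coverage (the figure discussed earlier in the chapter), and ignoring failed reads and finishing work:

```python
GENOME_BP = 3_000_000_000        # approximate human genome size (assumption)
FOLD_COVERAGE = 10               # coverage discussed earlier in the chapter
CENTER_BP_PER_DAY = 150_000_000  # large-center throughput from the slide

days = GENOME_BP * FOLD_COVERAGE / CENTER_BP_PER_DAY
print(f"{days:.0f} days of sequencing at one large center")  # 200 days of sequencing at one large center
```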

10-26
Integration of linkage, physical, and sequence maps
Provides a check on the order of each map against the other two
SSR and SNP DNA linkage markers are readily integrated into the physical map by PCR analysis across the insert clones of the physical map.
SSR and SNP markers (linkage maps) and STS markers (physical maps) have unique sequences of 20 bp or more, allowing their placement on the sequence map.
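Placing a marker on the sequence map amounts to finding its unique sequence in the assembled chromosome, on either strand. This is a minimal sketch, assuming exact matches and a fully assembled chromosome string; the function names and the marker and chromosome sequences are hypothetical, and a marker found more than once is reported as unplaceable (`None`).

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def all_positions(pattern, text):
    """Every start position of `pattern` in `text`, including overlapping hits."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def place_marker(marker, chromosome) -> Optional[Tuple[int, str]]:
    """(position, strand) of a marker's single occurrence, or None if absent or repeated."""
    hits = [(p, "+") for p in all_positions(marker, chromosome)]
    hits += [(p, "-") for p in all_positions(reverse_complement(marker), chromosome)]
    return hits[0] if len(hits) == 1 else None


marker = "GATTACAGGCTTAACGTCAT"          # hypothetical 20 bp STS
chromosome = "CCCCC" + marker + "GGGGG"  # hypothetical assembled sequence
print(place_marker(marker, chromosome))  # (5, '+')
```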
10-27
Changes in biology, genetics, and genomics from the human genome sequence
Genetics parts list
Speeds gene-finding and gene-function analysis
  Sequence identification in a second organism through homology
  Gene function in one organism helps in understanding function in another, for orthologous and paralogous genes.
Genes often encode one or more protein domains.
  Allows a guess at the function of a new protein by comparing its sequence with databases of all known domains
Ready access to identification of known human polymorphisms
Speeds mapping of new organisms by comparison
  e.g., mouse and human have high similarity in gene content and order.
10-28
Major insights from human and model organism sequences
Approximately 25,000 human genes
Genes encode noncoding RNAs or proteins.
Repeat sequences make up > 50% of the genome.
Distinct types of gene organization:
  Gene families
  Gene-rich regions
Combinatorial strategies amplify genetic information and increase diversity.
Evolution by lateral transfer of genes from one organism to another
Males have a twofold higher mutation rate than females.
Human races have very few unique distinguishing genes.
All living organisms evolved from a common ancestor.
10-29
Conserved segments of syntenic blocks in human and mouse genomes
Fig. 10.12
10-30
Noncoding RNA genes
Transfer RNAs (tRNAs) – adaptors that translate the triplet code of RNA into the amino acid sequence of proteins
Ribosomal RNAs (rRNAs) – components of ribosomes
Small nucleolar RNAs (snoRNAs) – RNA processing and base modification in the nucleolus
Small nuclear RNAs (snRNAs) – components of spliceosomes
10-31
Protein-coding genes generate the proteome.
Proteome – collective translation of the protein-coding genes into proteins
Complexity of the proteome increases from yeast to humans through:
  More genes
  Shuffling, increase, or decrease of functional modules
  More paralogs
  Alternative RNA splicing – humans exhibit significantly more
  More extensive chemical modification of proteins in humans
10-32
Protein coding genes generate the proteome
How transcription factor protein domains have expanded
in specific lineages
Fig. 10.11
10-33
Examples of domain accretions in chromatin proteins
Fig. 10.13
10-34
Number of distinct domain architectures in
four eukaryotic genomes
Fig. 10.14
10-35
Repeat sequences fall into five classes.
  Transposon-derived repeats
  Processed pseudogenes
  SSRs
  Segmental duplications of 10 – 300 kb
  Blocks of repeated sequences at centromeres, telomeres, and other chromosomal features
10-36
Repeat sequences constitute more
than 50% of the genome.
Fig. 10.15
10-37
Gene organization of the genome
Gene families
  Closely related genes clustered or dispersed
  Functional or chance events?
Gene-rich regions
Gene deserts
  Span 144 Mb, or 3% of the genome
  Contain regions that are difficult to identify?
    e.g., big genes – the nuclear transcript spans 500 kb or more, with very large introns (exons < 1% of the DNA)
10-38
Genome has a distinct organization.
Gene family – olfactory receptor gene family
10-39
Class II region of the human major histocompatibility complex contains 60 genes in 700 kb
Fig. 10.17
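The gene density of this region is easy to compare with the genome average. A worked sketch, assuming a ~3,000 Mb genome and the ~25,000 genes cited earlier in the chapter:

```python
MHC_GENES, MHC_KB = 60, 700                  # class II region, from the slide
GENOME_KB, GENOME_GENES = 3_000_000, 25_000  # ~3,000 Mb genome (assumption); ~25,000 genes

mhc_kb_per_gene = MHC_KB / MHC_GENES
genome_kb_per_gene = GENOME_KB / GENOME_GENES
print(f"MHC class II: 1 gene per ~{mhc_kb_per_gene:.0f} kb; "
      f"genome average: 1 gene per ~{genome_kb_per_gene:.0f} kb")
```

Under these assumptions the class II region packs roughly ten times more genes per kb than the genome average (about 12 kb vs. 120 kb per gene).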
10-40
Combinatorial strategies
At the DNA level – T-cell receptor genes are encoded by a multiplicity of gene segments.
Fig. 10.18
At the RNA level – splicing of exons in different combinations
Fig. 10.19a
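Combinatorial joining multiplies diversity: the number of distinct products is the product of the number of choices at each step. This is a minimal sketch with small hypothetical segment pools (real T-cell receptor loci have different and larger numbers); the same arithmetic applies at the RNA level when one exon is chosen from each of several alternative sets.

```python
from itertools import product
from math import prod

# Hypothetical gene-segment pools; real loci contain different numbers of segments.
segments = {"V": ["V1", "V2", "V3"], "D": ["D1", "D2"], "J": ["J1", "J2", "J3", "J4"]}


def count_combinations(pools):
    """Number of distinct joined products: one segment chosen from each pool."""
    return prod(len(pool) for pool in pools.values())


def enumerate_products(pools):
    """Every possible joined product, written as e.g. 'V1-D1-J1'."""
    return ["-".join(combo) for combo in product(*pools.values())]


n = count_combinations(segments)  # 3 * 2 * 4 = 24
receptors = enumerate_products(segments)
print(n, receptors[:3])  # 24 ['V1-D1-J1', 'V1-D1-J2', 'V1-D1-J3']
```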
10-41
Lateral transfer of genes
> 200 human genes may have arisen by transfer from organisms such as bacteria.
Lateral transfer is the direct transfer of genes from one species into the germ line of another.
10-42
Twofold higher mutation rate in males
Comparison of X and Y chromosomes
  The same may be true for autosomes, but it is difficult to measure.
The majority of human mutations arise in males.
Males give rise to more defects, but also to more diversity.
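Comparing X and Y works because the Y chromosome is always carried by males, whereas an X chromosome spends on average two-thirds of its generations in females. If the male-to-female mutation ratio is α, the expected Y/X ratio of neutral substitution rates is 3α/(2 + α). This is a minimal sketch of that relationship, assuming neutral sites and equal generation times in the two sexes; the function names are hypothetical.

```python
def y_over_x(alpha):
    """Expected Y/X ratio of neutral substitution rates for male:female mutation ratio alpha."""
    return 3 * alpha / (2 + alpha)


def alpha_from_y_over_x(ratio):
    """Invert y_over_x: estimate the male:female mutation ratio from an observed Y/X ratio."""
    if not 0 < ratio < 3:
        raise ValueError("Y/X ratio must lie strictly between 0 and 3")
    return 2 * ratio / (3 - ratio)


print(y_over_x(2.0))             # 1.5
print(alpha_from_y_over_x(1.5))  # 2.0
```

With α = 2, the twofold figure on the slide, the Y chromosome is expected to accumulate neutral changes 1.5 times as fast as the X.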
10-43
Human races have similar genes.
Genome sequencing centers have sequenced significant portions of the genomes of at least three races.
The range of polymorphisms within a race can be much greater than the range of differences between any two individuals of different races.
Very few genes are race-specific.
Genetically, humans are a single race.
10-44
All living organisms are a single
race.



All living organisms have remarkably similar genetic components.
Life evolved once, and we are descendants of that event.
Analysis of appropriate biological systems in model organisms provides fundamental insight into corresponding human systems.
10-45
In the future, other features of chromosomes will become increasingly important.
Chemical modification of bases
  DNA methylation is understood now.
  Other modifications may be discovered.
Interaction of various proteins with the chromosome
Three-dimensional structure of proteins in the nucleus
  May determine interactions of chromosomal regions with regions of the nuclear envelope
More effective tools need to be developed to examine chromosome features.
10-46
10-47
High-throughput instruments
DNA sequencer
Fig. 10.20
10-48
High-throughput
instruments
e.g., microarrays
Fig. 10.21
10-49
Two-color DNA microarray
Fig. 10.22
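Each spot on a two-color array is read in both fluorescence channels, and a common summary is the log2 ratio of the two intensities. This is a minimal sketch, assuming background-subtracted intensities for hypothetical genes and no normalization between channels (real analyses normalize first); the function name is hypothetical.

```python
import math


def log2_ratios(red, green):
    """log2(red / green) for each spot; > 0 means higher signal in the red-labeled sample."""
    return {gene: math.log2(red[gene] / green[gene]) for gene in red}


# Hypothetical background-subtracted intensities for three spots.
red = {"geneA": 800.0, "geneB": 100.0, "geneC": 400.0}
green = {"geneA": 200.0, "geneB": 400.0, "geneC": 400.0}
print(log2_ratios(red, green))  # {'geneA': 2.0, 'geneB': -2.0, 'geneC': 0.0}
```

The log scale makes a fourfold increase (+2) and a fourfold decrease (-2) symmetric around 0, which marks no change.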
10-50
Analysis of genomic and RNA sequences
Quantitative analysis of mRNA levels
  Serial analysis of gene expression (SAGE)
    Small cDNA tags of 15 bp from the 3' ends of mRNAs are linked and sequenced.
  Massively parallel signature sequencing (MPSS)
    MPSS allows identification of most of a cell's rarely expressed mRNAs.
Transcriptome – population of mRNAs expressed in a single cell or cell type
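Because each SAGE tag is short but usually enough to identify its mRNA, counting tags in the sequenced concatemers estimates relative mRNA abundance. This is a minimal sketch, assuming fixed-length tags joined end to end; real SAGE concatemers are built from ditags separated by restriction-site sequences, which this sketch ignores. The function name and the 4-base example tags are hypothetical.

```python
from collections import Counter


def count_tags(concatemer, tag_length=15):
    """Split a concatemer of fixed-length tags and count how often each tag occurs."""
    if len(concatemer) % tag_length:
        raise ValueError("concatemer length is not a multiple of the tag length")
    return Counter(concatemer[i:i + tag_length] for i in range(0, len(concatemer), tag_length))


# Hypothetical 4-base tags, kept short for readability.
print(count_tags("ACGTACGTTTGA", tag_length=4).most_common())  # [('ACGT', 2), ('TTGA', 1)]
```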

10-51
Lynx Therapeutics sequencing strategy for MPSS
Fig. 10.24
10-52
Systems Biology – the global study of multiple components of biological systems and their interactions
This new approach to studying biological systems has been made possible by:
  Sequencing genomes
  High-throughput platform development
  Development of powerful computational tools
  The use of model organisms
  Comparative genomics

10-53
The Human Genome Project has changed the potential for predictive/preventive medicine.
Provided access to the DNA polymorphisms underlying human variability
  Makes possible the identification of genes predisposing to disease
  Understanding of defective genes in the context of biological systems
  Circumventing the limitations of defective genes through:
    Novel drugs
    Environmental controls
    Approaches such as stem-cell transplants or gene therapy
10-54
Social, ethical, and legal issues
Privacy of genetic information
Limitations on genetic testing
Patenting of DNA sequences
Society's view of older people
Training of physicians
Human genetic engineering
  Somatic gene therapy – inserting replacement genes
  Germ-line therapy – modifications of the human germ line
10-55