BioSci D145 Lecture #3
• Bruce Blumberg (blumberg@uci.edu)
– 4103 Nat Sci 2 - office hours Tu, Th 3:30-5:00 (or by appointment)
– phone 824-8573
• TA – Ron Leavitt (rleavitt@uci.edu)
– 4351 Nat Sci 2, 824-6873 – office hours M 2:30-3:30 in 4206 Nat Sci 2
• Check e-mail and the noteboard daily for announcements, etc.
– Please use the course noteboard for discussions of the material
• Updated lectures will be posted on web pages after lecture
– http://blumberg.bio.uci.edu/biod145-w2016
– http://blumberg-lab.bio.uci.edu/biod145-w2016
• Last year’s midterm is posted.
BioSci D145 lecture 1
page 1
©copyright
Bruce Blumberg 2014. All rights reserved
Genome mapping
• The problem – genomes are large, workable fragments are small
– How to figure out where everything is?
– How to track mutations in families or lineages?
• analogy to roadmaps
– The most useful maps do not have too much detail but have major
features and landmarks that everything can be related to
• Allows genetic markers to be related to physical markers
• What sorts of maps are useful for genomes?
– Restriction maps of various sorts (most often of large insert libraries)
• RFLPs, fingerprints
– Recombination maps – how often do traits segregate together?
– Physical maps – which genes occur on same chunks of DNA
Genome mapping (contd)
• How are maps made?
– Restriction digestion and ordering of fragments to build contigs
• Fingerprinting
– Location of marker sequences onto larger chunks
– Hybridization of markers to larger chunks
– Calculation of recombination frequencies between loci
• What do we map these days?
– BACs are most common target for mapping of new genomes
– Radiation hybrid panels still in wide use
– Goal is always to map markers onto ordered large fragments and infer
location of genes relative to each other.
– HAPPY mapping becoming widely used again
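The "calculation of recombination frequencies between loci" step can be sketched in a few lines. This is a minimal illustration, not a mapping package. It assumes offspring have already been scored as recombinant or parental for two loci, and it converts the recombination fraction r to map distance with Haldane's mapping function, d = -50 ln(1 - 2r) cM, which assumes no crossover interference. The slides do not name a mapping function, and the counts below are hypothetical.

```python
import math

def recombination_fraction(recombinant: int, total: int) -> float:
    """Fraction of scored offspring in which crossover separated the two loci."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinant / total

def haldane_cm(r: float) -> float:
    """Map distance in centimorgans from a recombination fraction 0 <= r < 0.5,
    using Haldane's function d = -50 * ln(1 - 2r) (assumes no interference)."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

# Hypothetical cross: 18 recombinant offspring out of 200 scored
r = recombination_fraction(18, 200)
print(f"r = {r:.2f}, Haldane distance = {haldane_cm(r):.1f} cM")
```

Kosambi's function, which allows for interference, is a common alternative. With either function, distances become unreliable as r approaches 0.5, because unlinked loci also recombine in about half of the offspring.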
Genome mapping (contd) (stopped here)
• Useful markers
– STS – sequence tagged sites
• Short, randomly acquired sequences
• PCR-amplify each sequence, then prove by hybridization that only a single sequence per genome is amplified
– VERY tedious and slow
• Validated STSs are mapped back to RH panels
• Orders sequences on large chunks of DNA
– STC – sequence tagged connectors
• Array BAC libraries to 15x coverage of the genome
• Sequence BAC ends
• Combine with genomic maps and fingerprints to link clones
– Average about 1 tag/5 kb
• Most useful as preparation for sequencing
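The "about 1 tag/5 kb" figure follows from simple arithmetic. A sketch, assuming an average BAC insert of ~150 kb and a ~3 Gb (human-sized) genome, neither of which is stated on the slide: 15x coverage means 15 x genome / insert clones, and each clone contributes two end sequences.

```python
def bac_end_spacing(genome_bp: float, insert_bp: float, coverage: float) -> float:
    """Average spacing (bp) between BAC-end sequence tags.

    clones = coverage * genome / insert size; each clone yields 2 end tags.
    """
    clones = coverage * genome_bp / insert_bp
    ends = 2 * clones
    return genome_bp / ends

# Assumed values: ~3 Gb genome, ~150 kb average BAC insert, 15x coverage
spacing = bac_end_spacing(3e9, 150e3, 15)
print(f"about one tag every {spacing / 1000:.0f} kb")
```

Genome size cancels out, so under these assumptions spacing = insert size / (2 x coverage) = 150 kb / 30 = 5 kb for any genome.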
Genome mapping (contd)
• Useful markers (contd)
– ESTs – expressed sequence tags
• randomly acquired cDNA sequences
• Lots of value in ESTs
– Info about diversity of genes expressed
– Quick way to get expressed genes
• Better than STS because ESTs are expressed genes
• Can be mapped to
– chromosomes by FISH
– RH panels
– BAC contigs
– Polymorphic STS – STS with variable lengths
• Often due to microsatellite differences
• Useful for determining relationships
• Also widely used for forensic analysis
– OJ, Kobe, etc
Genome mapping (contd)
• Useful markers (contd)
– SNPs – single nucleotide polymorphisms
• Extraordinarily useful - ~1/1000 bp in humans
• Much of the variation among us is in SNPs
• SNPs that change restriction sites cause RFLPs (restriction fragment length polymorphisms)
• Detected in various ways
– Hybridization to high density arrays (Affymetrix)
– Sequencing
– Denaturing electrophoresis or HPLC
– Invasive cleavage
• Tony Long in E&E Biology has a method for ligation-mediated SNP detection that his lab uses for evolutionary analyses
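To see how one SNP becomes an RFLP, the sketch below digests two hypothetical alleles in silico with EcoRI (G^AATTC). EcoRI is only a familiar example and is not named on the slide, and the sequences are made up. A single base change destroys the site, so that allele gives one band instead of two.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; cuts between G and AATTC

def site_positions(seq: str, site: str = ECORI_SITE) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]

def fragment_lengths(seq: str, site: str = ECORI_SITE) -> list[int]:
    """Fragment lengths after cutting one base into each site (G^AATTC)."""
    cuts = [0] + [p + 1 for p in site_positions(seq, site)] + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]

# Hypothetical alleles differing by one SNP (T -> A inside the site)
ref = "CCCC" + "GAATTC" + "C" * 10
alt = ref[:8] + "A" + ref[9:]
print("ref bands:", fragment_lengths(ref))
print("alt bands:", fragment_lengths(alt))
```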
Genome mapping (contd)
• Useful markers (contd)
– RAPDs – randomly amplified polymorphic DNA
• Amplify genomic DNA with short, arbitrary primers
• Some fraction will amplify fragments that differ among individuals
• These can be mapped like STS
• Issues with PCR amplification
• Benefit – no sequence information required for target
– AFLPs – amplified fragment length polymorphisms
• Cut with two enzymes (a 6-cutter and a 4-cutter) that yield a variety of small fragments (< 1 kb)
• Ligate adapters to the ends and amplify by PCR
• Generates a fingerprint
– Controlled by how frequently enzymes cut
• Often correspond to unique regions of genome
– Can be mapped
• Benefit – no sequence required.
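The point that the fingerprint is "controlled by how frequently enzymes cut" can be made quantitative. In random sequence with equal base frequencies, a 6-bp site is expected about once every 4^6 = 4,096 bp and a 4-bp site about once every 4^4 = 256 bp. This sketch assumes EcoRI (GAATTC) and MseI (TTAA) as the enzyme pair, a common choice that the slide does not name. It uses a random toy genome instead of real DNA and ignores the selective bases that real AFLP primers use to thin out the fragment set.

```python
import random

def cut_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]

def aflp_fragments(seq: str, rare: str = "GAATTC", frequent: str = "TTAA",
                   max_len: int = 1000) -> list[int]:
    """Lengths of fragments with one rare-cutter end and one frequent-cutter end.

    Simplification: lengths are measured between site start positions.
    """
    sites = sorted([(p, "rare") for p in cut_sites(seq, rare)] +
                   [(p, "freq") for p in cut_sites(seq, frequent)])
    lengths = []
    for (p1, k1), (p2, k2) in zip(sites, sites[1:]):
        if k1 != k2 and 0 < p2 - p1 < max_len:
            lengths.append(p2 - p1)
    return lengths

rng = random.Random(1)  # fixed seed so the toy genome is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(200_000))
print("expected spacing: 6-cutter", 4 ** 6, "bp, 4-cutter", 4 ** 4, "bp")
print("observed 6-cutter sites:", len(cut_sites(genome, "GAATTC")))
print("rare/frequent fragments < 1 kb:", len(aflp_fragments(genome)))
```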
Genome mapping (contd)
• Fingerprinting
– Array and spot libraries
– Probe with short oligos (10-mers)
• Repeat
– Build up a “fingerprint” for each clone
– Can tell which ones share sequences
• tedious
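Oligo-probe fingerprinting amounts to recording which probes light up for each clone and then comparing those sets. A toy sketch: "hybridization" is modelled as an exact substring match on one strand (real hybridization tolerates mismatches and involves both strands), similarity is the fraction of shared probes (a Jaccard index, chosen here for illustration), and the genome, clones and 5-mer probes are all hypothetical.

```python
from itertools import combinations

def fingerprint(clone_seq: str, probes: list[str]) -> frozenset[str]:
    """Set of probes that 'hybridize' to a clone (exact one-strand match)."""
    return frozenset(p for p in probes if p in clone_seq)

def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Shared-probe similarity |A & B| / |A | B| (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical toy genome and three clones drawn from it
genome = "ATGCGTACGTTAGCCGATCGATGGCTAACGTTAGGCATCGAT"
clones = {"c1": genome[0:20], "c2": genome[12:32], "c3": genome[26:42]}
probes = ["GCGTA", "TTAGC", "GATCG", "GGCTA", "ACGTT", "CATCG"]

prints = {name: fingerprint(seq, probes) for name, seq in clones.items()}
for x, y in combinations(sorted(prints), 2):
    print(x, y, round(jaccard(prints[x], prints[y]), 2))
```

Clones c1 and c3 do not overlap, yet they share the probe ACGTT because that sequence occurs twice in the toy genome. This is a small-scale version of why repeated sequences confound overlap calls from fingerprints.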
Genome mapping (contd)
• Mapping by walking/hybridization
– Start with a seed clone then walk along the chromosome
– Takes a LOOONNNNGGG time
– Benefit – can easily jump repetitive sequences
Genome mapping (contd)
• Mapping by hybridization
– Array library – pick a “seed clone”
– See where it hybridizes, pick new seed and repeat
– Product
Genome mapping (contd) Restriction mapping of large insert clones
• Mapping by restriction digest fingerprinting
– Order clones by comparing patterns from restriction enzyme digestion
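Comparing restriction patterns boils down to counting how many bands two clones share, allowing for sizing error on the gel. A minimal sketch, assuming a 2% size tolerance and hypothetical band sizes. Dedicated fingerprint-assembly software instead scores how likely it is that shared bands are coincidental.

```python
def shared_bands(a: list[int], b: list[int], tol: float = 0.02) -> int:
    """Count bands (fragment sizes in bp) shared by two digests.

    Two bands match if their sizes differ by at most `tol` (fractional) of
    the larger size. Each band is matched at most once (greedy sorted scan).
    """
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches

# Hypothetical digests of three clones
clone_a = [1200, 3400, 5100, 7800, 9900]
clone_b = [3380, 5150, 7800, 12000]
clone_c = [2000, 4500, 6100]
print("a vs b:", shared_bands(clone_a, clone_b))  # many shared bands: likely overlap
print("a vs c:", shared_bands(clone_a, clone_c))
```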
Genome mapping (contd)
• FISH - Fluorescent in situ hybridization – can detect chromosomes or genes
– Can localize probes to chromosomes
– Can determine the relationship of markers to each other
– Requires much knowledge of genome being mapped
– Chromosome painting
Genome mapping (contd)
• Radiation hybrid mapping
– Old but very useful technique (Geisler paper)
• Lethally irradiate cells with X-rays
• Fuse with cells of another species, e.g., blast human cells, then fuse them with hamster cells
– Chunks of human DNA will remain in the hamster cells
• Expand colonies of cells to get a collection of cell lines, each containing a different random set of chunks of human DNA
• Collection = RH panel
– Now map markers onto these RH panels
• Can identify which markers (of any type) map together
– STS, EST (very commonly used), etc
• Can then map others by linkage to the ones you have mapped
– Compare RH panel with other maps
• Utility – great for closing gaps in other maps
• HAPPY Mapping –
– PCR-based method – see Ron’s presentation
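Mapping markers on an RH panel means scoring each marker as present (1) or absent (0) in every hybrid, then asking how often two markers are retained together. A sketch under a simple textbook-style model, which is an assumption here: each fragment is retained with the same frequency r, and two markers are separated by a break with probability theta. A hybrid is then discordant (retains exactly one marker) with probability 2r(1 - r)theta, which gives a direct estimate of theta. Dedicated RH mapping software fits likelihood models across many markers at once. The panel scores below are hypothetical.

```python
def rh_breakage(marker_a: str, marker_b: str) -> float:
    """Estimate breakage probability theta between two markers on an RH panel.

    Each string scores the panel: '1' = marker retained in that hybrid, '0' = absent.
    Model: P(hybrid retains exactly one marker) = 2 * r * (1 - r) * theta, with a
    single retention frequency r estimated from both markers.
    """
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("scores must be non-empty and the same length")
    n = len(marker_a)
    r = (marker_a.count("1") + marker_b.count("1")) / (2 * n)
    discordant = sum(x != y for x, y in zip(marker_a, marker_b))
    denom = 2 * r * (1 - r)
    if denom == 0:
        raise ValueError("retention frequency of 0 or 1 is uninformative")
    return min(1.0, discordant / (n * denom))

# Hypothetical 10-hybrid panel: markers differ in only one hybrid, so they are close
print(round(rh_breakage("1101001011", "1101001001"), 3))
```

Small theta means the two markers rarely get separated by the radiation breaks, i.e., they "map together"; theta near 1 means they behave as unlinked.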
Genome mapping (contd)
• How should maps be made with current knowledge?
– All methods have strengths and weaknesses – must integrate data for
useful map
• e.g., RH panel, BAC maps, STS, ESTs
– Size and complexity of genome is important
• More complex genomes require more markers and time mapping
– Breakpoints and markers are mapped relative to each other
– Maps need to be defined by markers (cities, lakes, roads in analogy)
– Key part of making a finely detailed map is construction of genomic
libraries and cell lines for common use
• Efforts by many groups increase resolution and utility of maps
• Current strategies
– BAC end sequencing
– Whole genome shotgun sequencing
– EST sequencing
– HAPPY mapping
– Mapping of above to RH panels
– Fancier techniques (Dovetail, Chicago reads, Hi-C assemblies)
DNA sequence analysis (first gen sequencing)
• DNA sequencing = determining the nucleotide sequence of DNA
– Two main methods
– shared Nobel prize in 1980
• Chemical cleavage – Maxam and Gilbert
• Enzymatic sequencing (based on polymerization reaction)
Nobel Prize in Chemistry 1980: Walter Gilbert (Harvard) & Frederick Sanger (MRC Labs)
(Sanger also won the Nobel Prize in 1958 for protein sequencing)
DNA sequence analysis
• Maxam and Gilbert
– One of the first reasonable sequencing methods
– Very popular in late 70s and early 80s
– VERY TEDIOUS!!
• Totally superseded by dideoxy sequencing now
DNA sequence analysis (contd)
• Dideoxy sequencing – Sanger 1977
– Virtually all sequencing is done this way now
– Requires a modified nucleotide
• 2’,3’-dideoxy NTP (ddNTP)
– DNA polymerase incorporates the ddNTP and chain elongation terminates, because the ddNTP lacks the 3’-OH needed to add the next nucleotide
– Original method used 4 separate elongation reactions
– Products separated by denaturing PAGE and visualized by autoradiography
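The original four-reaction method can be simulated. Each lane holds chains ending at every position where the new strand incorporates that lane's ddNTP, and reading bands from shortest to longest across the four lanes spells out the new strand 5'->3'. A toy sketch: the template is hypothetical, it is assumed to be written in the order the polymerase copies it (3'->5' on the template), and band intensity, mobility anomalies and primer length are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def sanger_lanes(template: str) -> dict[str, list[int]]:
    """Chain lengths (nt added after the primer) found in each ddNTP lane.

    The i-th synthesized base is the complement of template[i]; a chain that
    stops by incorporating ddX there has length i + 1 and runs in lane X.
    """
    synthesized = template.upper().translate(COMPLEMENT)
    lanes = {b: [] for b in "ACGT"}
    for i, base in enumerate(synthesized):
        lanes[base].append(i + 1)
    return lanes

def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest (bottom of gel) to longest: new strand 5'->3'."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

lanes = sanger_lanes("TACGGA")  # hypothetical template, 3'->5'
print(lanes)
print(read_gel(lanes))
```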
DNA sequence analysis (contd)
• Dideoxy sequencing (contd) – Sanger 1977
– Dideoxy NTPs present at ~1% of [dNTP]
– Each reaction’s products end in a known base (the ddNTP in that tube)
– In principle, all possible chain lengths are represented
• The length distribution varies with [dNTPs], [ddNTPs], [primer], [template] and their ratios
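Why the ddNTP:dNTP ratio sets the length distribution can be seen with a simple probability model. A sketch assuming, unrealistically, that the polymerase incorporates ddNTP and dNTP equally well (real enzymes discriminate, so the effective ratio differs from the mixed ratio). At each template position that calls for the terminating base, a chain stops with probability p = [ddNTP] / ([ddNTP] + [dNTP]), so the fraction still growing after k such positions is (1 - p)^k.

```python
def termination_prob(dd_conc: float, d_conc: float) -> float:
    """Chance a chain stops at a position calling for this base,
    assuming equal incorporation of ddNTP and dNTP."""
    return dd_conc / (dd_conc + d_conc)

def fraction_surviving(p: float, k: int) -> float:
    """Fraction of chains still growing after k positions that call for the terminating base."""
    return (1 - p) ** k

p = termination_prob(1.0, 100.0)        # ddNTP at ~1% of dNTP
for k in (10, 50, 100, 250):             # roughly 1 in 4 template positions match
    print(f"past {k} matching sites (~{4 * k} nt): {fraction_surviving(p, k):.2f}")
```

Raising the ddNTP fraction shifts products toward short chains (strong signal near the primer, little far away); lowering it does the opposite.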
DNA sequence analysis (contd)
[Figure: dideoxy sequencing gel; lanes labeled A, C, G, T]
Automated DNA sequence analysis
• How to improve throughput of sequencing?
– Incorporate fluorescent ddNTPs, separate products by PAGE
• Base calling and lane calling issues
– Key advance was capillary sequencers
• Separate DNA in a thin capillary instead of gel
• Very accurate, no tracking errors, much more automation friendly
1. Trace files (dye signals) are analyzed and bases called to create chromatograms.
2. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.
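The two steps in the trace-processing caption (call bases from the four dye signals, then reconcile reads from opposite strands) can be sketched as a toy. Assumptions, all hypothetical: each peak has already been reduced to four channel intensities in A, C, G, T order; a call is "N" unless the brightest channel is at least 1.5x the runner-up; and the reverse read covers exactly the same region. Production base callers such as phred also model peak spacing and shape and assign quality values.

```python
DYES = "ACGT"  # assumed order of the four dye channels

def call_bases(peaks: list[tuple[float, float, float, float]], min_ratio: float = 1.5) -> str:
    """One call per peak: the brightest channel wins, or 'N' if it is not
    at least `min_ratio` times the second-brightest channel."""
    calls = []
    for signal in peaks:
        ranked = sorted(zip(signal, DYES), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        calls.append(base if top >= min_ratio * second else "N")
    return "".join(calls)

def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def reconcile(forward: str, reverse_read: str) -> str:
    """Merge a forward read with an opposite-strand read of the same region:
    fill an 'N' from the other read; conflicting calls become 'N'."""
    other = reverse_complement(reverse_read)
    merged = []
    for f, r in zip(forward, other):
        if f == r or r == "N":
            merged.append(f)
        elif f == "N":
            merged.append(r)
        else:
            merged.append("N")
    return "".join(merged)

# Hypothetical peaks: the third peak is ambiguous between A and G
fwd_peaks = [(900, 50, 40, 30), (60, 800, 70, 20), (400, 30, 350, 10), (20, 30, 40, 950)]
fwd = call_bases(fwd_peaks)
print(fwd, "->", reconcile(fwd, "ATGT"))
```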
Automated DNA sequence analysis
• Capillaries vs gels
– Capillaries much faster – higher field strength possible
– Fully automated = higher throughput
PCR – polymerase chain reaction amplification of DNA
• PCR is the most routinely used method to amplify DNA
– Exponential amplification of DNA by polymerases – Saiki et al, 1985
• 2^n-fold amplification, n = # cycles
– 35 cycles = 2^35 ≈ 3.4 x 10^10-fold
• Originally used DNA polymerase I
– Needed to add fresh enzyme at every cycle because heat denaturation of the template killed the enzyme
– Not widely used – too painful to do manually
– Nobel Prize to Kary Mullis in 1993 for deciding to use Taq DNA polymerase for PCR
• He was middle author on paper!
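The 2^n arithmetic generalizes to (1 + E)^n, where E is the per-cycle efficiency (E = 1 means perfect doubling). A minimal sketch. The 90% efficiency is a hypothetical value, and real reactions also plateau in late cycles as primers and nucleotides run out, which this formula ignores.

```python
def pcr_fold(cycles: int, efficiency: float = 1.0) -> float:
    """Fold amplification after `cycles` at a constant per-cycle efficiency."""
    return (1.0 + efficiency) ** cycles

print(f"{pcr_fold(35):.2e}")        # perfect doubling, 35 cycles
print(f"{pcr_fold(35, 0.9):.2e}")   # hypothetical 90% efficiency
```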
PCR – polymerase chain reaction amplification of DNA (contd)
Hot water bacteria:
Thermus aquaticus
Taq DNA polymerase
Life at High Temperatures by Thomas D. Brock
Biotechnology in Yellowstone
© 1994 Yellowstone Association for Natural Science
http://www.bact.wisc.edu/Bact303/b27
Cycle sequencing – fusion of PCR and fluorescent ddNTP sequencing
• http://www.dnalc.org/ddnalc/resources/animations.html
• Combine PCR amplification with dideoxy sequencing – cycle sequencing
– Linear amplification of template in the presence of fluorescent ddNTPs
– When the nucleotides are used up, the reaction is over
– Separate on a capillary electrophoresis instrument
– Advantages
• Fast, single-tube reaction
• Works with small amounts of starting material
– Disadvantages
• Still need to prepare high-quality template to sequence
• Cost and time
– Many sequencing centers spend time, $$ on template prep
– Automation requirements
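Cycle sequencing amplifies linearly because it uses a single primer: each cycle copies only the original template strands, and the products never serve as templates. PCR uses two primers, so every product becomes a template in the next cycle. An idealized sketch, assuming every template is copied in every cycle.

```python
def copies_after(cycles: int, start: int, primers: int) -> int:
    """Idealized strand count after `cycles` (original templates included).

    primers=1 (cycle sequencing): `start` new strands per cycle (linear).
    primers=2 (PCR): every strand is copied each cycle (doubling).
    """
    if primers == 1:
        return start + start * cycles
    if primers == 2:
        return start * 2 ** cycles
    raise ValueError("primers must be 1 or 2")

print("linear, 25 cycles:", copies_after(25, 1000, 1))
print("exponential, 25 cycles:", copies_after(25, 1000, 2))
```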
Isothermal amplification – the solution to template preparation
• How to make template preparation faster, easier and more reliable?
– Eliminate automation requirement, amplify starting material in some
other way
– Φ29 DNA polymerase (aka TempliPhi)
– http://www.gelifesciences.com/aptrix/upp01077.nsf/content/sample_preparation~product_selection_category~rolling_circle_amplification
– Enzyme has high processivity and strand displacement activity
• Isothermal reaction produces huge quantities of DNA from tiny
amount of input
• More efficient than PCR (no temp change, no machine, no cleanup)
Modern DNA sequence analysis
• Cycle sequencing
– Virtually all commercial DNA sequencing today is done by cycle
sequencing with fluorescent ddNTPs
• ABI BigDye chemistry
– Template preparation still tedious for small scale
• TempliPhi used in genome centers (obviated need for most automation)
– Capillary sequencers predominant form of technology in use
• But next-generation sequencing is already coming online and will rapidly displace old technology in genome centers.
– 454 sequencing (Roche)
– Solexa (Illumina)
– SOLiD (Applied Biosystems)
• 3rd generation sequencing (individual DNA molecule) now available
– e.g., Pacific Biosciences (sequence reads of 1,000-10K bases)
Other sequencing technologies
• Sequencing by hybridization
– Construct a high-density microchip with all possible combinations of a short oligonucleotide
• Up to 25-mers
• By photolithography
– Synthesized on chip directly
– Label and hybridize fragment to be sequenced
– Wash stringently
– Read fluorescent spots
– Reconstruct sequence by computer
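The "reconstruct sequence by computer" step can be sketched as a spectrum-assembly problem. The set of probes that light up is the target's k-mer spectrum: treat each k-mer as an edge between its two (k-1)-mers and walk every edge once (an Eulerian path, found here with Hierholzer's algorithm). This is a toy that assumes an error-free spectrum in which each k-mer occurs once. With repeated (k-1)-mers, several sequences can fit the same spectrum, one reason SBH is rarely used de novo. The 3-mer spectrum is hypothetical.

```python
from collections import defaultdict

def reconstruct(spectrum: list[str]) -> str:
    """Rebuild a sequence from its k-mer spectrum via an Eulerian path."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for kmer in spectrum:
        a, b = kmer[:-1], kmer[1:]          # edge from prefix to suffix (k-1)-mer
        graph[a].append(b)
        outdeg[a] += 1
        indeg[b] += 1
    nodes = set(outdeg) | set(indeg)
    # A path starts at the node with one more outgoing than incoming edge
    start = next((n for n in nodes if outdeg[n] - indeg[n] == 1), None)
    if start is None:
        start = spectrum[0][:-1]
    stack, path = [start], []
    while stack:                            # Hierholzer's algorithm
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(spectrum) + 1:
        raise ValueError("spectrum does not form a single path")
    return path[0] + "".join(node[-1] for node in path[1:])

# Hypothetical spectrum (order scrambled, as spots on a chip carry no order)
print(reconstruct(["CGT", "ATG", "TAC", "GCG", "GTA", "TGC"]))
```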
Other sequencing technologies (contd)
• Sequencing by hybridization rarely used for de novo sequencing
– Extremely fast and useful for sequencing something whose sequence you already know, to identify mutations – resequencing
– Disease-causing changes
• e.g., in mitochondrial DNA
– SNP discovery
– Works best for examining sequence of <10 kb
Other sequencing technologies (contd)
• http://www.affymetrix.com/products/arrays/index.affx
• SNP discovery
– Photo shows mitochondrial chip
– Right panel shows pairs of normal (top) vs disease (bottom) (Leber’s Hereditary Optic Neuropathy)
• Top 3: disease mutations
• Bottom: control with no change
Other sequencing technologies – Next Generation sequencing
• 2nd generation = high throughput, short sequences
• 3rd generation = single molecule sequencing
• Small number of sequence templates (thousands) but very long reads (~10^5 bp)
• What is the immediate implication of this technology for genome
assembly?
We should now be able to completely sequence large insert clones
directly and avoid fragmentation by repetitive elements!
• Key review is Metzker, M.L. (2010) Sequencing technologies – the next generation, Nature Reviews Genetics 11, 31-46.
3rd generation
Other sequencing technologies (contd)
• Pyrosequencing –
– http://www.454.com
– Based on synthesis of complementary strand to a template (like Sanger)
– Detection of elongation with chemiluminescence
• Fragment genome to appropriate size (depends on application)
• Add adapters to each end
• Isolate fragments with a different adapter on each end
• PCR to amplify
Other sequencing technologies (contd)
• Pyrosequencing (contd)
– PCR – capture template on microbeads such that each bead gets 1 molecule of DNA – how? Use a large ratio of beads to DNA
– Emulsify in water/oil microreactors
– Amplify DNA
– Break the emulsion and recover DNA-containing beads
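Why a large excess of beads gives one molecule per bead follows from Poisson statistics, the usual model for random capture (applied here as an assumption; droplet effects are ignored). If lambda is the average number of DNA molecules per bead, the fraction of beads with exactly one molecule is lambda e^-lambda, and among beads carrying any DNA the clonal fraction rises as lambda falls.

```python
import math

def bead_loading(molecules_per_bead: float) -> dict[str, float]:
    """Poisson model of template capture: fractions of beads that are empty,
    clonal (exactly 1 molecule), or mixed (2 or more molecules)."""
    lam = molecules_per_bead
    if lam <= 0:
        raise ValueError("molecules_per_bead must be positive")
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return {"empty": p0, "clonal": p1, "mixed": 1.0 - p0 - p1,
            "clonal_among_loaded": p1 / (1.0 - p0)}

for lam in (0.1, 0.3, 1.0, 3.0):
    f = bead_loading(lam)
    print(f"lambda={lam}: empty {f['empty']:.2f}, clonal {f['clonal']:.2f}, "
          f"mixed {f['mixed']:.2f}, clonal/loaded {f['clonal_among_loaded']:.2f}")
```

The trade-off is that a low lambda leaves most beads empty. That is acceptable because the empty beads are simply not sequenced, whereas mixed beads give uninterpretable signal.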
Other sequencing technologies (contd)
• Pyrosequencing (contd)
– Sequencing – load beads into picotiter wells
• Add enzymes (ATP sulfurylase and luciferase)
• Run reaction – flow nucleotide/buffer solution across wells, one nucleotide at a time
• Complementary nucleotide addition leads to light output
– Light output is proportional to # consecutive nucleotides
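The "light output is proportional to # consecutive nucleotides" rule defines an ideal flowgram: one value per nucleotide flow, equal to the length of the matching homopolymer run (0 if the next base does not match). A sketch on the strand being synthesized, assuming a cyclic flow order of TACG (treat the order as an assumption) and perfectly proportional signal.

```python
def flowgram(seq: str, flow_order: str = "TACG") -> list[int]:
    """Ideal signal per flow: length of the run of that base at the current position."""
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals, i = [], 0
    while i < len(seq):
        for base in flow_order:
            run = 0
            while i < len(seq) and seq[i] == base:
                run += 1
                i += 1
            signals.append(run)
            if i >= len(seq):
                break
    return signals

def decode(signals: list[int], flow_order: str = "TACG") -> str:
    """Invert an ideal flowgram: repeat each flow's base by its signal value."""
    return "".join(flow_order[k % len(flow_order)] * n for k, n in enumerate(signals))

sig = flowgram("TTACGGGA")  # hypothetical read
print(sig, decode(sig))
```

In practice the signal is only roughly proportional, so distinguishing, say, 6 from 7 identical bases is hard, which is why long homopolymer runs are the classic source of pyrosequencing errors.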
Other sequencing technologies (contd)
• Pyrosequencing (contd)
– What is the point?
• Can generate 400,000 reads in parallel (FLX)
• Or > 1,000,000 (FLX Titanium)
• Each read is 200-400 bp (FLX), or 400-600 bp (FLX Titanium)
• So you can get
– 8 x 10^7 bp per run! (FLX)
– 4-6 x 10^8 bp/run (FLX Titanium)
• What is massively parallel sequencing good for?
– Rapid sequencing of genomes, or resequencing of known sequences
– Ancient DNA (even dinosaurs? – Svante Pääbo says ~200K years is limit)
– ChIP-sequencing (week 6)
– Sequencing ESTs or other tags
– Determining microbial diversity in field samples
– Transcriptome sequencing
– Identifying variations in
• Viral populations
• Gene sequences in mixed populations
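The per-run yields quoted for FLX and FLX Titanium are just reads x read length. A short check using the slide's read counts and read-length ranges:

```python
def bases_per_run(reads: float, read_len_bp: float) -> float:
    """Total sequence per run = number of reads x average read length."""
    return reads * read_len_bp

# Figures from the slide: FLX ~400,000 reads of 200-400 bp;
# FLX Titanium >1,000,000 reads of 400-600 bp
flx_low, flx_high = bases_per_run(4e5, 200), bases_per_run(4e5, 400)
ti_low, ti_high = bases_per_run(1e6, 400), bases_per_run(1e6, 600)
print(f"FLX: {flx_low:.1e}-{flx_high:.1e} bp")
print(f"Titanium: {ti_low:.1e}-{ti_high:.1e} bp")
```

The slide's 8 x 10^7 bp figure for FLX corresponds to the 200 bp end of the read-length range; at 400 bp per read the same run would give 1.6 x 10^8 bp.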
Amplicon sequencing
• Idea is to sequence many copies of the same thing
– Gene sequence
– mRNA transcript
Amplicon sequencing (contd)
• What is amplicon sequencing good for?
– Discovery of rare somatic mutations in complex samples (e.g., cancerous
tumors - mixed with germline DNA) based on ultra-deep sequencing of
amplicons
– Sequencing collections of exons from populations of individuals to
identify diversity
– Sequencing collections of human exons from populations of individuals
for the identification of rare alleles associated with disease
– Analysis of viral quasispecies present within infected populations in the
context of epidemiological studies
– Evolutionary biology in populations
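Why rare somatic mutations need ultra-deep amplicon sequencing can be estimated with a binomial model. Each read independently carries the variant with probability equal to the variant allele fraction, and a call requires some minimum number of supporting reads. A sketch under those assumptions: sequencing errors are ignored, and the 1% allele fraction and 5-read threshold are hypothetical stand-ins for real variant-calling filters.

```python
from math import comb

def detection_prob(depth: int, allele_fraction: float, min_reads: int) -> float:
    """Chance of seeing at least `min_reads` variant reads at a given depth
    (binomial sampling of reads; sequencing errors ignored)."""
    miss = sum(comb(depth, i) * allele_fraction ** i * (1 - allele_fraction) ** (depth - i)
               for i in range(min_reads))
    return 1.0 - miss

# Hypothetical: mutation present in 1% of molecules, 5 supporting reads required
for depth in (100, 500, 1000, 5000):
    print(depth, round(detection_prob(depth, 0.01, 5), 3))
```

At ordinary depths such a variant is almost always missed; only at depths in the thousands does detection become routine, which is what ultra-deep sequencing of amplicons provides.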