BioSci D145 Lecture #3
• Bruce Blumberg (blumberg@uci.edu)
  – 4103 Nat Sci 2
  – office hours Tu, Th 3:30-5:00 (or by appointment)
  – phone 824-8573
• TA – Ron Leavitt (rleavitt@uci.edu)
  – 4351 Nat Sci 2, 824-6873
  – office hours M 2:30-3:30, 4206 Nat Sci 2
• Check e-mail and noteboard daily for announcements, etc.
  – Please use the course noteboard for discussions of the material
• Updated lectures will be posted on web pages after lecture
  – http://blumberg.bio.uci.edu/biod145-w2016
  – http://blumberg-lab.bio.uci.edu/biod145-w2016
• Last year’s midterm is posted.
BioSci D145 lecture 1 page 1 ©copyright Bruce Blumberg 2014. All rights reserved

Genome mapping
• The problem – genomes are large, workable fragments are small
  – How to figure out where everything is?
  – How to track mutations in families or lineages?
• Analogy to roadmaps
  – The most useful maps do not have too much detail but have major features and landmarks that everything can be related to
    • Allows genetic markers to be related to physical markers
• What sorts of maps are useful for genomes?
  – Restriction maps of various sorts (most often of large insert libraries)
    • RFLPs, fingerprints
  – Recombination maps – how often do traits segregate together?
  – Physical maps – which genes occur on the same chunks of DNA

Genome mapping (contd)
• How are maps made?
  – Restriction digestion and ordering of fragments to build contigs
    • Fingerprinting
  – Location of marker sequences onto larger chunks
  – Hybridization of markers to larger chunks
  – Calculation of recombination frequencies between loci
• What do we map these days?
  – BACs are the most common target for mapping of new genomes
  – Radiation hybrid panels still in wide use
  – Goal is always to map markers onto ordered large fragments and infer location of genes relative to each other
  – HAPPY mapping becoming widely used again
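
The "calculation of recombination frequencies between loci" step can be illustrated with a small worked example. This is only a sketch: the offspring counts are invented, the function name is hypothetical, and the conversion of 1% recombination to roughly 1 centimorgan is a standard approximation that only holds for closely linked loci.

```python
def recombination_frequency(parental, recombinant):
    """Fraction of offspring carrying a recombinant combination of two loci."""
    total = parental + recombinant
    return recombinant / total

# Hypothetical cross: 830 parental-type and 170 recombinant-type offspring
rf = recombination_frequency(parental=830, recombinant=170)

# For short distances, 1% recombination is treated as about 1 cM
map_distance_cM = rf * 100
print(f"recombination frequency = {rf:.2f}, approx. {map_distance_cM:.0f} cM")
```

Loci that recombine less often are placed closer together on the map; frequencies approaching 50% indicate loci that are unlinked or too far apart to order this way.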
Genome mapping (contd) (stopped here)
• Useful markers
  – STS – sequence tagged sites
    • Short randomly acquired sequences
    • PCR the sequences, then prove by hybridization that only a single sequence is amplified per genome
      – VERY tedious and slow
    • Validated ones mapped back to RH panels
    • Orders sequences on large chunks of DNA
  – STC – sequence tagged connectors
    • Array BAC libraries to 15x coverage of genome
    • Sequence BAC ends
    • Combine with genomic maps and fingerprints to link clones
      – Average about 1 tag/5 kb
    • Most useful preparatory to sequencing

Genome mapping (contd)
• Useful markers (contd)
  – ESTs – expressed sequence tags
    • Randomly acquired cDNA sequences
    • Lots of value in ESTs
      – Info about diversity of genes expressed
      – Quick way to get expressed genes
    • Better than STS because ESTs are expressed genes
    • Can be mapped to
      – chromosomes by FISH
      – RH panels
      – BAC contigs
  – Polymorphic STS – STS with variable lengths
    • Often due to microsatellite differences
    • Useful for determining relationships
    • Also widely used for forensic analysis – OJ, Kobe, etc.

Genome mapping (contd)
• Useful markers (contd)
  – SNPs – single nucleotide polymorphisms
    • Extraordinarily useful – ~1/1000 bp in humans
    • Many of the differences among us are in SNPs
    • SNPs that change restriction sites cause RFLPs (restriction fragment length polymorphisms)
    • Detected in various ways
      – Hybridization to high density arrays (Affymetrix)
      – Sequencing
      – Denaturing electrophoresis or HPLC
      – Invasive cleavage
    • Tony Long in E&E Biology has a method for ligation-mediated SNP detection that they use for evolutionary analyses
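
To make the point that a SNP in a restriction site produces an RFLP concrete, here is a minimal sketch. The two allele sequences are invented for illustration and the `digest` helper is hypothetical; the only real-world detail used is that EcoRI recognizes GAATTC and cuts after the first G.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position within the site where the enzyme cuts
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments

# Two made-up alleles that differ by a single base inside the EcoRI site
allele_a = "ATGC" * 5 + "GAATTC" + "TTAG" * 10  # site intact
allele_b = "ATGC" * 5 + "GAGTTC" + "TTAG" * 10  # SNP (A to G) destroys the site

print(digest(allele_a))  # two fragments
print(digest(allele_b))  # one uncut fragment
```

On a gel the first allele would run as two bands and the second as one longer band, which is the length polymorphism that gets scored.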
Genome mapping (contd)
• Useful markers (contd)
  – RAPDs – randomly amplified polymorphic DNA
    • Amplify genomic DNA with short, arbitrary primers
    • Some fraction will amplify fragments that differ among individuals
    • These can be mapped like STS
    • Issues with PCR amplification
    • Benefit – no sequence information required for target
  – AFLPs – amplified fragment length polymorphisms
    • Cut with enzymes (6 and 4 cutter) that yield a variety of small fragments (< 1 kb)
    • Ligate sequences to ends and amplify by PCR
    • Generates a fingerprint
      – Controlled by how frequently enzymes cut
    • Often correspond to unique regions of genome
      – Can be mapped
    • Benefit – no sequence required

Genome mapping (contd)
• Fingerprinting
  – Array and spot libraries
  – Probe with short oligos (10-mers)
    • Repeat
  – Build up a “fingerprint” for each clone
  – Can tell which ones share sequences
    • Tedious

Genome mapping (contd)
• Mapping by walking/hybridization
  – Start with a seed clone, then walk along the chromosome
  – Takes a LOOONNNNGGG time
  – Benefit – can easily jump repetitive sequences

Genome mapping (contd)
• Mapping by hybridization
  – Array library
  – Pick a “seed clone”
  – See where it hybridizes, pick new seed and repeat
  – Product

Genome mapping (contd)
Restriction mapping of large insert clones
• Mapping by restriction digest fingerprinting
  – Order clones by comparing patterns from restriction enzyme digestion
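
Ordering clones by comparing digestion patterns can be sketched as counting shared fragment sizes between every pair of clones. Everything below is illustrative: the clone names, fragment sizes, and the 2% size tolerance are assumptions, and real fingerprinting software uses statistical scoring rather than a simple count.

```python
from itertools import combinations

def shared_fragments(a, b, tol=0.02):
    """Count fragments in a that match a not-yet-used fragment in b.

    Two fragments match when their sizes differ by at most tol
    (relative to the larger fragment), mimicking gel sizing error.
    """
    remaining = sorted(b)
    count = 0
    for size in sorted(a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tol * max(size, other):
                count += 1
                del remaining[i]
                break
    return count

# Hypothetical fragment sizes (bp) from digesting three BAC clones
clones = {
    "BAC1": [1200, 3400, 5100, 800],
    "BAC2": [5100, 800, 2300, 4100],
    "BAC3": [2300, 4100, 950, 6000],
}

overlaps = {
    (x, y): shared_fragments(clones[x], clones[y])
    for x, y in combinations(sorted(clones), 2)
}
for pair, n in overlaps.items():
    print(pair, n)
```

In this toy data BAC1 and BAC3 each share fragments with BAC2 but not with each other, which suggests the contig order BAC1, BAC2, BAC3.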
Genome mapping (contd)
• FISH – fluorescent in situ hybridization
  – Can detect chromosomes or genes
  – Can localize probes to chromosomes and determine the relationship of markers to each other
  – Requires much knowledge of genome being mapped
  – Chromosome painting
[Figure: marker detection]

Genome mapping (contd)
• Radiation hybrid mapping
  – Old but very useful technique (Geisler paper)
    • Lethally irradiate cells with X-rays
    • Fuse with cells of another species, e.g., irradiate human cells, then fuse with hamster cells
      – Chunks of human DNA will remain in hamster cells
    • Expand colonies of cells to get a collection of cell lines, each containing a single chunk of human DNA
    • Collection = RH panel
  – Now map markers onto these RH panels
    • Can identify which of any type of markers map together
      – STS, EST (very commonly used), etc.
    • Can then map others by linkage to the ones you have mapped
  – Compare RH panel with other maps
    • Utility – great for closing gaps in other maps
• HAPPY mapping
  – PCR-based method – see Ron’s presentation

Genome mapping (contd)
• How should maps be made with current knowledge?
  – All methods have strengths and weaknesses – must integrate data for useful map
    • e.g., RH panel, BAC maps, STS, ESTs
  – Size and complexity of genome is important
    • More complex genomes require more markers and more time mapping
  – Breakpoints and markers are mapped relative to each other
  – Maps need to be defined by markers (cities, lakes, roads in analogy)
  – Key part of making a finely detailed map is construction of genomic libraries and cell lines for common use
    • Efforts by many groups increase resolution and utility of maps
• Current strategies
  – BAC end sequencing
  – Whole genome shotgun sequencing
  – EST sequencing
  – HAPPY mapping
  – Mapping of above to RH panels
  – Fancier techniques (Dovetail, Chicago reads, Hi-C assemblies)

DNA sequence analysis (first gen sequencing)
• DNA sequencing = determining the nucleotide sequence of DNA
  – Two main methods – shared Nobel prize in 1980
    • Chemical cleavage – Maxam and Gilbert
    • Enzymatic sequencing (based on polymerization reaction) – Sanger
Nobel Prize in Chemistry 1980: Walter Gilbert (Harvard) & Frederick Sanger (MRC Labs)
(Sanger also won Nobel in 1958 for protein sequencing)

DNA sequence analysis
• Maxam and Gilbert
  – One of the first reasonable sequencing methods
  – Very popular in late 70s and early 80s
  – VERY TEDIOUS!!
    • Totally superseded by dideoxy sequencing now
DNA sequence analysis (contd)
• Dideoxy sequencing – Sanger 1977
  – Virtually all sequencing is done this way now
  – Requires modified nucleotide
    • 2’,3’-dideoxy NTP (ddNTP)
  – DNA polymerase incorporates the ddNTP and chain elongation terminates
  – Original method used 4 separate elongation reactions
  – Products separated by denaturing PAGE and visualized by autoradiography

DNA sequence analysis (contd)
• Dideoxy sequencing (contd) – Sanger 1977
  – Dideoxy NTPs present at ~1% of [dNTP]
  – Each reaction has an identified (known) end
  – In principle, all possible chain lengths are represented
    • Varies by [dNTPs], [ddNTPs], [primer] and [template] and their ratios

DNA sequence analysis (contd)
[Figure: lanes labeled A, C, G, T]

Automated DNA sequence analysis
• How to improve throughput of sequencing?
  – Incorporate fluorescent ddNTPs, separate products by PAGE
    • Base calling and lane calling issues
  – Key advance was capillary sequencers
    • Separate DNA in a thin capillary instead of gel
    • Very accurate, no tracking errors, much more automation friendly
1. Trace files (dye signals) are analyzed and bases called to create chromatograms.
2. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.

Automated DNA sequence analysis
• Capillaries vs gels
  – Capillaries much faster – higher field strength possible
  – Fully automated = higher throughput
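
The chain-termination logic of the original four-reaction dideoxy method can be sketched in a few lines. This is an idealized model, assuming every possible chain length is represented exactly once (as the slide says holds "in principle"); the function names and the example strand are hypothetical.

```python
def sanger_fragments(new_strand, primer_len=0):
    """For each ddNTP reaction, list the lengths of terminated products.

    A chain ends wherever the ddNTP for that base is incorporated, so the
    ddA reaction yields one product length per A in the new strand, etc.
    """
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand, start=1):
        lanes[base].append(primer_len + i)
    return lanes

def read_gel(lanes):
    """Read the sequence from the shortest fragment to the longest."""
    by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(by_length[length] for length in sorted(by_length))

strand = "GATTACA"  # made-up newly synthesized strand
lanes = sanger_fragments(strand)
for base in "ACGT":
    print(base, lanes[base])
print(read_gel(lanes))
```

Reading the four lanes from the bottom of the gel (shortest products) to the top recovers the synthesized strand, which is the complement of the template.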
PCR – polymerase chain reaction amplification of DNA
• PCR is the most routinely used method to amplify DNA
  – Exponential amplification of DNA by polymerases
  – Saiki et al., 1985
    • 2^n-fold amplification, n = # cycles
      – 35 cycles = 2^35 = 3.4 x 10^10-fold
    • Originally used DNA polymerase I
      – Needed to add fresh enzyme at every cycle because heat denaturation of template killed the enzyme
      – Not widely used – too painful to do manually
  – Nobel Prize to Kary Mullis in 1993 for deciding to use Taq DNA polymerase for PCR
    • He was middle author on paper!

PCR – polymerase chain reaction amplification of DNA (contd)
Hot water bacteria: Thermus aquaticus
Taq DNA polymerase
Life at High Temperatures by Thomas D. Brock
Biotechnology in Yellowstone © 1994 Yellowstone Association for Natural Science
http://www.bact.wisc.edu/Bact303/b27

Cycle sequencing – fusion of PCR and fluorescent ddNTP sequencing
• http://www.dnalc.org/ddnalc/resources/animations.html
• Combine PCR amplification with dideoxy sequencing – cycle sequencing
  – Linear amplification of template in the presence of fluorescent ddNTPs
  – When nucleotides are used up, reaction is over
  – Separate on capillary electrophoresis instrument
  – Advantages
    • Fast, single tube reaction
    • Works with small amounts of starting material
  – Disadvantages
    • Still need to prepare high quality template to sequence
    • Cost and time
      – Many sequencing centers spend time, $$ on template prep
      – Automation requirements

Isothermal amplification – the solution to template preparation
• How to make template preparation faster, easier and more reliable?
  – Eliminate automation requirement, amplify starting material in some other way
  – Φ29 DNA polymerase (aka TempliPhi)
  – http://www.gelifesciences.com/aptrix/upp01077.nsf/content/sample_preparation~product_selection_category~rolling_circle_amplification
  – Enzyme has high processivity and strand displacement activity
    • Isothermal reaction produces huge quantities of DNA from tiny amount of input
    • More efficient than PCR (no temp change, no machine, no cleanup)

Modern DNA sequence analysis
• Cycle sequencing
  – Virtually all commercial DNA sequencing today is done by cycle sequencing with fluorescent ddNTPs
    • ABI Big Dye chemistry
  – Template preparation still tedious for small scale
    • TempliPhi used in genome centers (obviated need for most automation)
  – Capillary sequencers predominant form of technology in use
    • But next generation sequencing is already coming online and will rapidly displace old technology in genome centers
      – 454 sequencing (Roche)
      – Solexa (Illumina)
      – SOLiD (Applied Biosystems)
    • 3rd generation sequencing (individual DNA molecule) now available
      – e.g., Pacific Biosciences (sequence reads of 1,000-10K bases)

Other sequencing technologies
• Sequencing by hybridization
  – Construct a high-density microchip with all possible combinations of a short oligonucleotide
    • Up to 25-mers
    • By photolithography
      – Synthesized on chip directly
  – Label and hybridize fragment to be sequenced
  – Wash stringently
  – Read fluorescent spots
  – Reconstruct sequence by computer
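
The "reconstruct sequence by computer" step can be illustrated with a toy example. This is a minimal sketch under strong assumptions: every k-mer in the target hybridizes perfectly, no k-mer occurs twice, and no repeat of length k-1 or longer exists. The function names and the 8-base target are made up; real arrays need far more robust algorithms.

```python
def spectrum(seq, k):
    """The set of all k-mers in seq, i.e. the probes that would light up."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reconstruct(kmers):
    """Chain k-mers by (k-1)-base overlaps; fail loudly when ambiguous."""
    kmers = set(kmers)
    suffixes = {kmer[1:] for kmer in kmers}
    # The first k-mer is the one whose prefix is not the suffix of any other
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("ambiguous or circular spectrum")
    current = starts[0]
    seq = current
    remaining = kmers - {current}
    while remaining:
        nxt = [kmer for kmer in remaining if kmer[:-1] == current[1:]]
        if len(nxt) != 1:
            raise ValueError("ambiguous extension (repeat in target?)")
        current = nxt[0]
        seq += current[-1]
        remaining.remove(current)
    return seq

target = "ATGCGTCA"            # hypothetical fragment to be sequenced
probes = spectrum(target, 4)   # idealized set of positive spots on the chip
print(reconstruct(probes))
```

The explicit failure on ambiguous extensions hints at why repeats make this approach hard to use on sequences that are not already known.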
Other sequencing technologies (contd)
• Sequencing by hybridization rarely used for de novo sequencing
  – Extremely fast and useful for resequencing: you already know the sequence but want to identify mutations
  – Disease causing changes
    • e.g., in mitochondrial DNA
  – SNP discovery
  – Works best for examining sequence of < 10 kb

Other sequencing technologies (contd)
• http://www.affymetrix.com/products/arrays/index.affx
• SNP discovery
  – Photo shows mitochondrial chip
  – Right panel shows pairs of normal (top) vs disease (bottom) (Leber’s Hereditary Optic Neuropathy)
    • Top 3: disease mutations
    • Bottom: control with no change

Other sequencing technologies – Next Generation sequencing
• 2nd generation = high throughput, short sequences
• 3rd generation = single molecule sequencing
  • Small number of sequence templates (thousands) but very long reads (~10^5 bp)
• What is the immediate implication of this technology for genome assembly? We should now be able to completely sequence large insert clones directly and avoid fragmentation by repetitive elements!
• Key review is Metzker, M.L. (2010) Sequencing technologies – the next generation, Nature Reviews Genetics 11, 31-46.

Other sequencing technologies (contd)
• Pyrosequencing – http://www.454.com
  – Based on synthesis of complementary strand to a template (like Sanger)
  – Detection of elongation with chemiluminescence
    • Fragment genome to appropriate size (depends on application)
    • Add adapters to each end
    • Isolate those with different adapters on each end
    • PCR to amplify
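
As a rough illustration of detecting elongation by light output, the sketch below simulates an idealized flowgram. It assumes nucleotides are flowed one at a time in a fixed repeating order (the order `TACG` is an assumption here), that the light signal equals the number of bases incorporated in that flow, and that there is no noise. Function names and the example strand are hypothetical.

```python
def flowgram(new_strand, flow_order="TACG", n_cycles=4):
    """Return (nucleotide, signal) for each flow.

    new_strand is the strand being synthesized (complement of the template).
    A flow incorporates as many consecutive matching bases as are next in
    line, so homopolymer runs give proportionally stronger signals.
    """
    pos = 0
    flows = []
    for _ in range(n_cycles):
        for nt in flow_order:
            run = 0
            while pos < len(new_strand) and new_strand[pos] == nt:
                run += 1
                pos += 1
            flows.append((nt, run))
    return flows

def call_bases(flows):
    """Convert flow signals back into a base sequence."""
    return "".join(nt * signal for nt, signal in flows)

flows = flowgram("TTAGGC", n_cycles=2)
print(flows)
print(call_bases(flows))
```

Because the base call depends on telling a signal of 2 from a signal of 3 and so on, long homopolymer runs are the hardest regions to read accurately with this kind of chemistry.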
Other sequencing technologies (contd)
• Pyrosequencing (contd)
  – PCR – capture template on micro beads such that each bead gets 1 molecule of DNA
    – How? Use a large ratio of beads to DNA
  – Emulsify in water/oil microreactors
  – Amplify DNA
  – Break and recover DNA-containing beads

Other sequencing technologies (contd)
• Pyrosequencing (contd)
  – Sequencing – load beads into picotiter wells
    • Add enzymes (sulfurylase and luciferase)
    • Run reaction – flow nucleotide/buffer solution across wells one at a time
    • Complementary nucleotide addition leads to light output
      – Light output is proportional to # consecutive nucleotides

Other sequencing technologies (contd)
• Pyrosequencing (contd)
  – What is the point?
    • Can generate 400,000 reads in parallel (FLX)
    • Or > 1,000,000 (FLX Titanium)
    • Each read is 200-400 bp (FLX), or 400-600 bp (FLX Titanium)
    • So you can get
      – 8 x 10^7 bp per run! (FLX)
      – 4-6 x 10^8 bp/run (FLX Titanium)
• What is massively parallel sequencing good for?
  – Rapid sequencing of genomes, or resequencing of known sequences
  – Ancient DNA (even dinosaurs? – Svante Pääbo says ~200K years is limit)
  – ChIP-sequencing (week 6)
  – Sequencing ESTs or other tags
  – Determining microbial diversity in field samples
  – Transcriptome sequencing
  – Identifying variations in
    • Viral populations
    • Gene sequences in mixed populations

Amplicon sequencing
• Idea is to sequence many copies of the same thing
  – Gene sequence
  – mRNA transcript

Amplicon sequencing (contd)
• What is amplicon sequencing good for?
  – Discovery of rare somatic mutations in complex samples (e.g., cancerous tumors mixed with germline DNA) based on ultra-deep sequencing of amplicons
  – Sequencing collections of exons from populations of individuals to identify diversity
  – Sequencing collections of human exons from populations of individuals for the identification of rare alleles associated with disease
  – Analysis of viral quasispecies present within infected populations in the context of epidemiological studies
  – Evolutionary biology in populations
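
The idea behind finding rare mutations by ultra-deep amplicon sequencing can be sketched as counting bases at one position across many reads of the same amplicon. The reads below are invented (990 reference reads and 10 carrying a variant), the function name is hypothetical, and the sketch ignores alignment and sequencing error, both of which matter a great deal in practice.

```python
from collections import Counter

def allele_fractions(reads, position):
    """Fraction of reads carrying each base at a 0-based position."""
    counts = Counter(read[position] for read in reads if len(read) > position)
    depth = sum(counts.values())
    return {base: n / depth for base, n in counts.items()}, depth

# Hypothetical amplicon reads: a variant present in 1% of molecules
reads = ["ACGTAC"] * 990 + ["ACGAAC"] * 10

fractions, depth = allele_fractions(reads, position=3)
print(f"depth = {depth}")
for base, frac in sorted(fractions.items()):
    print(f"{base}: {frac:.3f}")
```

A variant at 1% is only distinguishable from noise when the depth is high and the per-base error rate is well below the variant frequency, which is why such experiments rely on very deep coverage of a small target.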