March 2012
Third Generation Sequencing
Barbara Hutter
Division of Theoretical Bioinformatics (B080), Computational Oncology group

The Next Next Generation
● http://seqanswers.com/forums/showthread.php?t=6263
  • When does "current generation" become "last generation", and "next generation" become "current generation", and can "third generation" become "next generation"?
  • Is Sanger still "current generation"?
  • Will the GA/454/Solid always be "next generation"?
  • Should we stop saying "current" and "next" and start saying "first" and "second"?
  • Maybe we should take a tip from Star Trek. After Next generation comes Deep Space Nine, Voyager, and Enterprise.
  • "Next Generation" does have a nicer ring to it than "Markedly Faster than Last Year (but wait until next year)"
  • I like the idea of always having the "next gen" name.

Page 2 Barbara Hutter 3rd Generation Sequencing

Why Another Generation of Sequencers?
● Next (2nd) Generation Sequencing weaknesses:
  • PCR steps introduce bias (duplicates, PCR errors)
  • alternating phases of nucleotide incorporation and signal detection
  • dephasing, quality decreasing towards the end
  • short reads
  • big expensive machines, expensive chemicals
  • long run times
● Make things bigger, better, faster, cheaper:
  • less input DNA, simpler library preparation, fewer reagents (no "washing")
  • smaller, less expensive machines
  • reduced run time
  • longer reads
  • better quality
  • higher throughput

Third Generation Sequencing in the Strict Sense
● 3rd Gen Seq = real time sequencing of single DNA (or RNA) molecules
  • by synthesis
      • nucleotide incorporation and signal detection occur continuously
      • as fast as the polymerase incorporates nucleotides (750 nt/sec)
      • SMRT (Pacific Biosciences)
  • without synthesis or ligation
      • Oxford Nanopore
● Between second and third generation: still use "wash-and-scan" technology
  • Ion Torrent (Life Technologies)
      • non-optical sequencing
  • Helicos Genetic Analysis System
      • single molecule sequencing

Advantages of Single Molecule Sequencing
● No PCR
  • no PCR amplification bias
  • simplified library construction
● No synchronization needed => no dephasing
● Consensus read
  • sequence the same template molecule more than once
  • multiple alignment of all the sequences from each template molecule
  • construct a consensus read
  • => reduce stochastic errors in the single-molecule sequence, gain greater accuracy than that of raw reads
● Direct RNA sequencing
  • replacing DNA polymerase with a reverse transcriptase or other RNA-dependent polymerase

Ion Torrent I
● Very similar to 454 sequencing but with semiconductor technology
● Hydrogen ions (H3O+) released by DNA synthesis => changes in pH that can be measured by an ion-sensitive field-effect transistor (ISFET)
● Microwells on a semiconductor chip (ion-sensitive layer) below which is an ISFET ion sensor
  • each microwell contains one single-stranded template DNA molecule to be sequenced and one DNA polymerase
Source: http://en.wikipedia.org/wiki/Ion_semiconductor_sequencing

Ion Torrent II
● Has been purchased by Life Technologies for $375 million
● No need for labeled nucleotides and (expensive and bulky) laser equipment
  • small and comparably cheap machines
● Limited read length (max. 200 bp) and throughput, but very fast
  • ~100 million bases in 2 hours
● Homopolymer error similar to 454
  • a homopolymer stretch yields a proportionally greater electronic signal
● Already widely applied for:
  • targeted and amplicon sequencing, e.g. for validation of variants detected with other sequencing platforms
  • sequencing of bacterial genomes, e.g. enterohemorrhagic Escherichia coli (EHEC) in June 2011

Helicos Genetic Analysis System
● Close to 3rd Gen boundary
● First DNA-sequencing instrument to operate by imaging individual DNA molecules
● Individual DNA molecules fixed to a surface
● Proprietary Virtual Terminator nucleotides allow for step-wise sequencing
● ~1 billion molecules sequenced in ~8 days
● High raw error rate (over 5%) improved by consensus sequencing
● Reads only ~32 nucleotides
● Higher costs than 2nd Gen sequencing
● Direct RNA sequencing possible
● Helicos BioSciences have re-focused on molecular diagnostics
Source: http://en.wikipedia.org/wiki/Single_molecule_fluorescent_sequencing

SMRT I
● Single-Molecule Real Time sequencing
● DNA polymerase anchored to the bottom of zero-mode waveguides (ZMW; some 10 nm diameter) with biotin-streptavidin
● Laser light "bends" only 30 nm into the ZMW
● Dye is attached to the phosphate and naturally cleaved off
● Each incorporated nucleotide emits a "flash"
[Figure labels: metal film with holes = ZMWs; DNA polymerase; glass slide]
Source: Schadt EE et al. A window into third-generation sequencing. Hum. Mol. Genet. (2010) 19(R2):R227

SMRT II
● Incorporation takes milliseconds, about 3 orders of magnitude slower than diffusion => strong signal of the incorporated nucleotide overcomes the noise of diffusing ones
● Minimal amount of reagents, only added once
● dsDNA is circularized => can "read" the same sequence (both strands) several times
● Read size limited by bleaching and redox reactions
● Standard reads: 1 pass = 1 read, 1-2 kb
● Short reads: 6 passes, 250 bp, circular consensus sequence (=> high quality)
● Strobes: 1 pass, very large (6-10 kb) fragments ("multiple paired end reads")
  • alternating light pulses (signal) and letting the polymerase work in the dark (gaps)
  • structural variants, transcripts, haplotyping, scaffolding in hybrid assembly

SMRT III
● Strand-specific
● Modified nucleotides on template (e.g. methylated C) can be detected by altered kinetics (light impulse duration, height and width of peaks, distance between peaks)
  • actual detection is complicated
● Frequent random errors and insertions/deletions
  • special mapping and assembly software
  • ~50 Mb mappable reads per run
● Speed limit imposed by the imaging equipment
● Can use other molecules instead of DNA polymerase
  • RNA polymerase => watch transcription
  • ribosome (labeled tRNAs) => watch translation
  • molecule that binds drugs
● Already on the market but not high throughput
● Applications:
  • targeted sequencing
  • hybrid assembly (patch gaps in scaffolds of EHEC genome)

Nanopore Sequencing I
● Oxford Nanopore
● Nanopore sequencing is a method under development since 1995
● Porous transmembrane cellular protein, diameter ~1 nm
  • modified alpha hemolysin (αHL) or Mycobacterium smegmatis porin A (MspA)
● Nanopore sits in a synthetic lipid bilayer immersed in a conducting fluid
● Potential (voltage) applied across the bilayer (by salt gradient)
● Conduction of ions through the nanopore => electric current
● Passage of bases disrupts the current
Source: Schadt EE et al. 2010

Nanopore Sequencing II
● Characteristic change in the magnitude of the current through the nanopore
  • single nucleotides cleaved off by exonuclease
  • threading the whole ssDNA through the nanopore
      • no single-nucleotide resolution because the DNA strand moves too rapidly through the nanopore (1-5 μs per base)
      • engineered nanopores, dsDNA stretches to slow down
  • modified bases recognized directly
● Pore records each base irrespective of what comes before or after
  • homopolymer stretches are resolved correctly
● Can read the same DNA molecule several times => consensus sequence
● Strand-specific, average read length > 1 kb
● Also applicable to other molecules (RNA, amino acids, ...)
● No laser equipment, no need for high-speed CCD camera, no chemicals

Nanopore Sequencing III
● Oxford Nanopore ready for the market in 2012
● http://www.nanoporetech.com/news/press-releases/view/39
● Oxford Nanopore's GridION system consists of scalable instruments (nodes) used with consumable cartridges that contain proprietary array chips for multi-nanopore sensing.
● Each GridION node and cartridge is initially designed to deliver tens of Gb of sequence data per 24 hour period, with the user choosing whether to run for minutes or days according to the experiment.
● Oxford Nanopore has also miniaturised these devices to develop the MinION; a disposable DNA sequencing device the size of a USB memory stick [...]. A single MinION is expected to retail at less than $900.
● Each cartridge is initially designed for real-time sequencing by 2,000 individual nanopores at any one time. Alternative configurations with more processing cores will become available in early 2013 containing over 8,000 nanopores.
● Nodes may be clustered in a similar way to computing devices, allowing users to increase the number of nanopore experiments being conducted at any one time if a faster time-to-result is required. For example, a 20-node installation using an 8,000 nanopore configuration would be expected to deliver a complete human genome in 15 minutes.
● Each GridION node contains all the computing hardware and control software required for primary analysis of data as it is streamed from each nanopore, resulting in full length real-time delivery of complete reads [...].

Other Approaches
● Optical multipore detection
  • two different fluorescently labeled molecular beacons hybridized to the DNA
  • the beacons are sequentially unzipped from the DNA molecules as they are translocated through a nanopore
  • each unzipping event unquenches a new fluorophore
● Direct imaging of DNA
  • transmission electron microscopy (TEM)
  • scanning tunneling microscope (STM) tips
  • DNA molecule has to be stretched and fixed on a surface
  • no publications for proof of principle yet
● Transistor-mediated DNA sequencing (developed by IBM)
  • individual bases of ssDNA molecules pass through nanometer-sized pores => unique electronic signature
  • surface of the pores consists of axially stratified, alternating layers of metal and dielectric material (like a transistor; see figure)
  • control the motion of the DNA through the pores by modulating the current in the electrodes of the transistor
Source: Schadt EE et al. 2010

Summary Third Generation Sequencing
● No "wash-and-scan" technology
● "Real time" - really fast
● No synchronization required => no dephasing problem
● Single molecule sequencing
  • no PCR => no bias, simpler library preparation
  • strand-specific
  • direct detection of modified bases
  • also RNA
● Improved read quality by consensus sequence from multiple passes
● Challenges
  • not yet at high throughput
  • deletions and insertions are the most frequent errors
  • need special mapping and analysis programs

Literature
● Schadt EE et al. A window into third-generation sequencing. Hum. Mol. Genet. (2010) 19(R2):R227
● Ion Torrent
  • http://www.iontorrent.com/publications/
● SMRT
  • http://www.pacificbiosciences.com/news_and_events/publications
  • http://www.aacc.org/events/meeting_proceeding/2011/Documents/OakRidge_Turner_Slides.pdf
● Nanopore Sequencing
  • http://www.nanoporetech.com/technology/publications
● Sequencing Wars - The Third Generation
  • http://stocks.investopedia.com/stock-analysis/2010/Sequencing-Wars---The-Third-Generation-ILMN-LIFE-A-CALP-GE0610.aspx#ixzz1op4LPPE8

The Next Generation (Sequencing Experts) at DKFZ
http://www.dkfz.de
● Theoretical Bioinformatics (http://ibios.dkfz.de/tbi/)
  Computational Oncology Group (Benedikt Brors)
  • Prakash Balasubramanian – ChIP-Seq
  • Lars Feuerbach – integrative analyses
  • Michael Heinold – WGS pipeline
  • Barbara Hutter – WGS, SOLiD, RNA-Seq, ChIP-Seq
  • Natalie Jäger – WGS, SNVs
  • Dilafruz Juraeva – GWAS, pathways
  • Rolf Kabbe – best cluster administrator ever!
  • Nora Rieber – SOLiD, Complete Genomics
  • Matthias Schlesner – structural variants, assembly
  • Qi Wang – indels
● Network Modelling Group (Rainer König)
  • Moritz Aschoff – RNA-Seq
  • Volker Ast – RNA-Seq
  • Rosario Piro – interactions, pathways
● Molecular Genetics
  • Volker Hovestadt – miRNA-Seq, WG bisulfite-Seq, methods development
  • Marc Zapatka – assembly
● Core Facility Genome Sequencing
  • Bärbel Lasitschka – WGS pipeline
And you?!
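
Appendix: consensus read sketch
The consensus-read idea from the "Advantages of Single Molecule Sequencing" slide (sequence the same template several times, align the passes, build a consensus) could be illustrated roughly as follows. This is a minimal sketch, not how any vendor's software works: it assumes the passes have already been multiply aligned to equal length with "-" as the gap character, and it uses a plain column-wise majority vote. The function name `consensus` and the example passes are made up for illustration.

```python
from collections import Counter


def consensus(aligned_passes, gap="-"):
    """Column-wise majority vote over pre-aligned passes of one template.

    Assumption: the multiple alignment has been done elsewhere, so every
    pass has the same length and uses `gap` for alignment gaps.
    """
    if not aligned_passes:
        raise ValueError("need at least one pass")
    width = len(aligned_passes[0])
    if any(len(p) != width for p in aligned_passes):
        raise ValueError("passes must be aligned to equal length")

    out = []
    for column in zip(*aligned_passes):
        # most_common(1) returns the most frequent symbol in this column;
        # on a tie, the symbol seen first among the passes wins.
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:  # a majority gap means a spurious insertion: drop it
            out.append(base)
    return "".join(out)


# Hypothetical example: five noisy passes over the same template,
# each carrying one random substitution, insertion, or deletion.
passes = [
    "ACGT-ACGTTAG",
    "ACGTAACG-TAG",  # one insertion and one deletion
    "ACCT-ACGTTAG",  # one substitution
    "ACGT-ACGTTAC",  # one substitution
    "ACGT-AGGTTAG",  # one substitution
]
print(consensus(passes))  # ACGTACGTTAG
```

Because the errors are random and fall at different positions in each pass, the majority at every column is the true base, which is why a consensus read can be more accurate than any single raw read.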