The Past, Present, and Future of DNA Sequencing

Craig A. Praul
Co-Director, Genomics Core Facility
Huck Institutes of the Life Sciences
Penn State University

A very short history of DNA sequencing

"I started from the conviction that, if different DNA species exhibited different biological activities, there should also exist chemically demonstrable differences between deoxyribonucleic acids."
Erwin Chargaff

Milestones
• First isolation of DNA : 1869 (Friedrich Miescher)
• Composition of nucleic acids; tetranucleotide theory : 1909 - 1940 (Phoebus Levene)
• G=C and A=T; however, the G/C and A/T content of different organisms varies : 1950 (Erwin Chargaff)
• G/C content measured by annealing : 1968 (Mandel and Marmur)
• Maxam-Gilbert and Sanger sequencing : 1977
• Next-generation sequencing : 2005

Genomes Sequenced
• Virus – 3222 (Bacteriophage phi X 174, 5386 nt – 1977)
• Bacteria – 2289 (Haemophilus influenzae, 1.8 x 10^6 nt – 1995)
• Eukarya – 168 (S. cerevisiae, 1.2 x 10^7 nt – 1996; H. sapiens, 3 x 10^9 nt – 2001)
• Archaea – 152 (Methanococcus jannaschii, 1.7 x 10^6 nt – 1996)

Next-Generation Sequencing
Source : Liu et al. Journal of Biomedicine and Biotechnology Volume 2012 (2012), Article ID 251364, 11 pages. doi:10.1155/2012/251364

Changes in instrument capacity*
Source : ER Mardis. Nature 470, 198-203 (2011). doi:10.1038/nature09796

Sequencing Cost

Date      Cost per Mb    Cost per Genome
Sep-01    $5,292.39      $95,263,072
Sep-02    $3,413.80      $61,448,422
Oct-03    $2,230.98      $40,157,554
Oct-04    $1,028.85      $18,519,312
Oct-05    $766.73        $13,801,124
Oct-06    $581.92        $10,474,556
Oct-07    $397.09        $7,147,571
Oct-08    $3.81          $342,502
Oct-09    $0.78          $70,333
Oct-10    $0.32          $29,092
Oct-11    $0.09          $7,743
Oct-12    $0.07          $6,618
Jan-13    $0.06          $5,671

Source - NHGRI : http://www.genome.gov/sequencingcosts/

Central Dogma of Molecular Biology
James Watson version - 1965
DNA → RNA → Protein

So once we have the genomic DNA sequence of a species, we have all of the information there is? Really?
• No, not really.

Illumina HiSeq and MiSeq
• Massively parallel
  – HiSeq : 150 or 180 million reads per lane
  – MiSeq : 15 million reads per run
• Intermediate read length
  – HiSeq : 100 nt or 150 nt
  – MiSeq : 250 nt
• High total output per run
  – HiSeq : 90 Gb or 288 Gb
  – MiSeq : 8 Gb

Sequencing Types
• Single read
• Paired-end read
• Mate-pair read

Library Types
• Many different library preps : DNA, mate-pair, mRNA, miRNA, ChIP
• Fragmentation
  – DNA : 300 – 500 nt
  – RNA : 150 – 200 nt
• Attachment of appropriate adapters
  – Complex : flow cell binding, F & R sequencing, barcode (BC)
  – Custom : Avoid if possible
• Removal of dimers/small inserts
• Amplification (or not)

Applications
• De novo sequencing (genomes, transcriptomes)
• Resequencing (genomes, exomes, custom sequence capture)
• RNA-seq (mRNA, miRNA, degradome)
• ChIP-seq
• Methyl-seq
• RIP-seq
• Amplicon

De Novo Experimental Design
• Estimate of genome size
• Coverage (30 x – 100 x)
• Sequencing type (paired-end or mate-pair)
• Example : 100 Mb genome, 100 x 100 nt paired-end reads (two 100 nt reads per pair)
  – (100 Mb) x (30 x coverage) = 3 Gb
  – 3 Gb / (200 nt for each pair of paired-end reads) = 15 million read pairs
• Replicates

Resequencing : Sequence Capture

RNA-seq Experimental Design
• Estimate of transcriptome size (1-5% of genome ?)
• Coverage (30 x ?)
  – mRNA or rRNA-depleted RNA
  – Relative abundance of transcripts you are interested in
• Sequencing type (single read or paired-end)
  – Simple transcriptome vs. complex transcriptome
  – Splice variants
• Example : 3 Gb genome, 100 nt single reads
  – (3 Gb genome) x (5% transcriptome) = 150 Mb transcriptome
  – (150 Mb transcriptome) x (30 x coverage) = 4.5 Gb total sequence
  – 4.5 Gb / (100 nt for each read) = 45 million reads
• Replicates : Yes!!!!
  – Biological, not technical

ChIP-Seq
Source : http://www.nature.com/nmeth/journal/v4/n8/images/nmeth0807-613-F1.gif

RIP-seq
Source : http://openi.nlm.nih.gov/imgs/rescaled512/3269675_ijms-13-00097f6.png

Methyl-seq
20 different types of base modifications in DNA are known, and there are perhaps 200 modifications of RNA.

Experimental Space: Next-Gen Platform
• PacBio : 0.075 x 10^6 reads/sample, 1000 – 3000 nt
  – Whole transcript
• Roche 454 FLX+ : 0.5 – 1 x 10^6 reads/sample, 800 – 1000 nt
  – Small – medium genome de novo sequencing
  – Long amplicon
  – Transcriptome
• PGM : 1 – 2 x 10^6 reads per sample, 400 nt
  – Small genome de novo
  – Medium amplicon
• MiSeq : 1 – 2 x 10^6 reads per sample, 50 – 250 nt
  – Small genome de novo
  – Small amplicon
• HiSeq : 10 – 100 x 10^6 reads per sample, 50 – 150 nt
  – Counting applications : RNA-seq, ChIP-seq, RIP-seq, Methyl-seq
  – Large genome de novo and resequencing

Experimental Space: The Relevancy of "Classic" Techniques

Differential Gene Expression
• Northern blotting (1977) : 1 probe – 20 samples
• Dot blots (1987) : 100s of probes – 1 sample
• RT-PCR (1992) : 100s of probes – 10 – 100 samples
• Microarrays (1995) : 100,000s of probes – 1 sample
• Next-gen sequencing (2005) : 10 – 100 x 10^6 reads – 1 sample

The Future
• More reads
• Longer reads
• Faster sequencing
• Cheaper sequencing
• New applications
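
Worked example : read-count arithmetic
The read-count estimates on the de novo and RNA-seq experimental design slides follow one pattern: (bases to cover) x (coverage) / (bases per read or read pair). The sketch below restates that arithmetic in Python. The function name `reads_needed` and its parameters are illustrative assumptions, not part of the original slides, and the calculation ignores real-world losses such as duplicates, adapter reads, and uneven coverage.

```python
def reads_needed(target_bases: int, coverage: int, bases_per_unit: int) -> int:
    """Estimate how many reads (or read pairs) give the requested coverage.

    target_bases   -- size of the genome or transcriptome, in bases
    coverage       -- desired average depth (e.g. 30 for 30 x)
    bases_per_unit -- bases delivered per read, or per pair for paired-end
    (Hypothetical helper for illustration; names are assumptions.)
    """
    total_bases = target_bases * coverage
    # Integer ceiling division, so a partial read counts as a whole read.
    return -(-total_bases // bases_per_unit)


# De novo example: 100 megabase genome, 30 x coverage,
# paired-end reads of 100 nt each (200 nt per pair).
de_novo_pairs = reads_needed(100_000_000, 30, 200)

# RNA-seq example: 3 gigabase genome, 5% assumed to be transcribed,
# 30 x coverage, 100 nt single reads.
transcriptome_bases = 3_000_000_000 * 5 // 100   # 150 megabases
rna_seq_reads = reads_needed(transcriptome_bases, 30, 100)

print(f"De novo : {de_novo_pairs:,} read pairs")   # De novo : 15,000,000 read pairs
print(f"RNA-seq : {rna_seq_reads:,} reads")        # RNA-seq : 45,000,000 reads
```

Dividing the result by the reads per lane or per run quoted for the instrument gives a rough estimate of how many lanes or runs an experiment needs, before multiplying by the number of biological replicates.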
