Document 11311572

Department of Electrical and Computer Engineering
Fall 2014 Seminar Series

Seminar Title: RNA-seq expression estimates need not take longer than a cup of coffee
Time: 3:00-4:00 PM, Friday, Oct 17, 2014
Location: ECE 101 Lankford Lab
Speaker: Carl Kingsford, Carnegie Mellon University

Abstract: The quantification of isoform abundance is a fundamental step in many transcriptome analysis tasks, such as determining differential expression between biological samples. Yet estimating isoform abundance from a large set of RNA-seq reads has been a computationally intensive task, owing in large part to the necessity of read mapping. To address this problem, we developed Sailfish, a software tool that implements a novel, alignment-free algorithm for estimating isoform abundances directly from a set of reference sequences and RNA-seq reads. Rather than working at the read level, Sailfish uses the k-mer as its fundamental unit of transcript coverage. This lightweight approach allows Sailfish to dispense with many of the complexities of read mapping while remaining robust to sequencing errors. Sailfish can quantify isoform abundance in about 15 minutes for a set of 150 million reads, where previous tools took over 6 hours. This increase in speed is obtained without sacrificing accuracy. Sailfish implements an efficient, accelerated expectation-maximization algorithm for quantifying isoform abundance that produces high-quality results and is capable of correcting numerous types of systematic bias that are known to occur in RNA-seq experiments. We demonstrate that, on both real and synthetic data, Sailfish is as accurate as existing read-mapping-based tools such as eXpress and Cufflinks.
As the size and quantity of RNA-seq experiments continue to grow, and as such experiments become more relevant in high-throughput and clinical settings, Sailfish and similar alignment-free approaches will become crucial to allowing computational analysis to keep up with data acquisition. An open-source and highly optimized implementation of our algorithm, licensed under GPL v3, is available at http://www.cs.cmu.edu/~ckingsf/software/sailfish (short URL http://ongen.us/SFish). This is joint work with Rob Patro and Stephen M. Mount.

Speaker Bio: Carl Kingsford is an Associate Professor of Computational Biology in Carnegie Mellon's School of Computer Science. His group works on creating efficient computational methods to extract biological insight from high-throughput data, including analysis of molecular interaction networks and large-scale genomics. He is the author of many widely used open-source software packages, including JELLYFISH, Sailfish, TransTermHP, GIRAF, Armatus, CORAL, and others. He is the recipient of an Alfred P. Sloan Research Fellowship in computational and evolutionary molecular biology and an NSF CAREER award, and was recently selected by the Gordon and Betty Moore Foundation as one of 14 Data Driven Discovery Investigators.
