Department of Electrical and Computer Engineering
Fall 2014 Seminar Series
Seminar Title: RNA-seq expression estimates need not take longer than a cup of coffee
Time: 3:00-4:00 PM, Friday, Oct 17, 2014
Location: ECE 101 Lankford Lab
Speaker:
Carl Kingsford
Carnegie Mellon University
Abstract:
The quantification of isoform abundance is a fundamental step in many transcriptome analysis tasks, such as determining differential expression between biological samples. Yet estimating isoform abundance from a large set of RNA-seq reads has been a computationally intensive task, owing in large part to the necessity of read mapping. To address this problem, we developed Sailfish, a software tool that implements a novel, alignment-free algorithm for estimating isoform abundances directly from a set of reference sequences and RNA-seq reads. Rather than working at the read level, Sailfish uses the k-mer as its fundamental unit of transcript coverage. This lightweight approach allows Sailfish to dispense with many of the complexities of read mapping while remaining robust to sequencing errors. Sailfish is able to quantify isoform abundance in about 15 minutes for a set of 150 million reads, where previous tools took over 6 hours. This increase in speed is obtained without sacrificing accuracy. Sailfish implements an efficient, accelerated expectation-maximization algorithm for quantifying isoform abundance that produces high-quality results and is capable of correcting numerous types of systematic bias that are known to occur in RNA-seq experiments. We demonstrate that, on both real and synthetic data, Sailfish is as accurate as existing read mapping-based tools such as eXpress and Cufflinks. As the size and quantity of RNA-seq experiments continue to grow, and as such experiments become more relevant in high-throughput and clinical settings, Sailfish and similar alignment-free approaches will become crucial to allowing computational analysis to keep up with data acquisition. An open-source and highly optimized implementation of our algorithm, licensed under GPL v3, is available at http://www.cs.cmu.edu/~ckingsf/software/sailfish (short URL http://ongen.us/SFish). This is joint work with Rob Patro and Stephen M. Mount.
Speaker Bio:
Carl Kingsford is an Associate Professor of Computational Biology in Carnegie Mellon’s School of Computer Science. His group works on creating efficient computational methods to extract biological insight from high-throughput data, including analysis of molecular interaction networks and large-scale genomics. He is the author of many open-source, widely used software packages including JELLYFISH, Sailfish, TransTermHP, GIRAF, Armatus, CORAL, and others. He is the recipient of an Alfred P. Sloan Research Fellowship in computational and evolutionary molecular biology and an NSF CAREER award, and was recently selected by the Gordon and Betty Moore Foundation as one of 14 Data Driven Discovery Investigators.