Genome and Transcriptome-Free Analysis of RNA-Seq

Cloud Implementation of GT-FAR
(Genome and Transcriptome-Free Analysis of RNA-Seq)
University of Southern California
GT-FAR Pipeline
GT-FAR Components
1. Read Quality Control and Adaptor Trimming for Input Read File
2. Sequential Ungapped Mapping to Reference Gene Models/Genome*
3. Gapped Alignment to Reference Gene Models/Genome to Facilitate Splice Variant Prediction*
4. Sample Quantification
a) A reference-based quantification of gene/junction/exon/pre-mRNA expression*
b) A reference-free quantification of read/k-mer sequences
5. Output
a) Quantification data, visualization, and an alignment SAM file for further analysis
b) Capable of including >99% in reference-based output in high-quality human samples
* When a reference genome and GTF file are available. If one is not available, only a sequence/k-mer based analysis (4b) is performed.
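As a rough illustration of what a reference-free, k-mer based quantification (component 4b) involves, the sketch below counts every k-mer across a set of reads. It is not GT-FAR's implementation: the function name `count_kmers`, the choice to skip k-mers containing `N`, and the value of `k` are assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # assumption: ignore k-mers with ambiguous bases
                counts[kmer] += 1
    return counts


# Toy input; real reads would come from the trimmed FastQ file.
reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, k=3)
for kmer, n in sorted(kmer_counts.items()):
    print(kmer, n)
```

Because no genome or GTF file is consulted, counts of this kind can be produced for any sample, which is why this path remains available when no reference exists.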
Pegasus WMS on the Cloud
• Allows scientists to design an analysis at a high level without worrying about
how to invoke or execute it
• Provides Python, Java, and Perl APIs for workflow creation
• Automatically executes computations on computational resources available
to the community or individual
• When failures occur, it tries to recover from them using a variety of
mechanisms
• Records provenance
• Used in a number of domains: astronomy, bioinformatics, earthquake
science, helioseismology, gravitational-wave physics, seismology, etc.
• Detailed documentation on workflow design and execution at
http://pegasus.isi.edu
• Pegasus tutorial on Amazon AWS
http://pegasus.isi.edu/wms/docs/latest/vm_amazon.php
• User support available at pegasus-users@isi.edu
GT-FAR Cloud Based Pipeline
Capabilities
• GT-FAR pipeline is available as a cloud-based solution hosted on Amazon EC2 (http://genomics.isi.edu)
• The pipeline is executed on distributed resources using the Pegasus Workflow Management System (http://pegasus.isi.edu)
• Investigators can start an EC2 instance with a GUI/GTFAR
• Users can upload input files (FastQ file in gzip format) using a web browser
• Tracks running workflows
• Users are able to download the outputs to their local laptops
• Outputs are also made available in Amazon S3
• Allows for error reporting and debugging
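Since the pipeline takes a gzip-compressed FastQ file as input, it can be useful to sanity-check a file locally before uploading it. The following is a minimal sketch using only the Python standard library; the function name `read_fastq_gz` is hypothetical, and it assumes the common four-line-per-record FastQ layout.

```python
import gzip


def read_fastq_gz(path):
    """Yield (read_id, sequence, quality) tuples from a gzip-compressed FastQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:  # end of file (or a blank line) stops the iteration
                break
            sequence = handle.readline().rstrip()
            handle.readline()  # the '+' separator line is not needed
            quality = handle.readline().rstrip()
            # Basic structural checks on each record.
            if not header.startswith("@") or len(sequence) != len(quality):
                raise ValueError(f"malformed FastQ record near {header!r}")
            yield header[1:], sequence, quality


def count_reads(path):
    """Return the number of records in a gzip-compressed FastQ file."""
    return sum(1 for _ in read_fastq_gz(path))
```

Calling `count_reads` on the file before upload gives a quick check that it decompresses cleanly and that every record is well formed.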
GTFAR Success Email
GTFAR Failure Email
Expression of APOL1
• APOL1 has moderate expression
– All of it comes from a few exons and their matching junctions
– Hence, it is driven by a single transcript.
RNA-seq Analysis Workflows
• GT-FAR (Read-based RNA-seq Analysis)
– New Functions: Novel Splice Junctions, Reference-free analysis
– Pegasus WMS: http://pegasus.isi.edu
– Pegasus GT-FAR (genome and transcriptome free analysis of RNA):
http://genomics.isi.edu/gtfar
– Pegasus tutorial on Amazon AWS
http://pegasus.isi.edu/wms/docs/latest/vm_amazon.php
– GitHub: https://github.com/pegasus-isi/pegasus-gtfar
• RseqFlow (Standard RNA-seq Analysis)
– Command line based
– Functions: RPKM, Differential Expression, Variants
– Google: https://code.google.com/p/rseqflow/
– GitHub: https://github.com/herstein/RseqFlow
– SourceForge: http://sourceforge.net/projects/rseqflow
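For readers unfamiliar with the RPKM measure listed among the RseqFlow functions, the sketch below applies the standard definition (reads per kilobase of transcript per million mapped reads). It illustrates the formula only and is not taken from RseqFlow; the function name and the example numbers are assumptions.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 10^9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads, 2,000 bp long, in a library of 10 million mapped reads.
value = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(f"RPKM = {value:.2f}")
```

Dividing by gene length and library size makes values comparable across genes of different lengths and across samples sequenced to different depths.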