Cain_Genome_Informatics_2011

GMOD in the Cloud
Scott Cain
GMOD Project Coordinator
Ontario Institute for Cancer Research
scott@scottcain.net
Genome Informatics
November 3, 2011
Introduction: GMOD is …
• A set of interoperable open-source software
components for visualizing, annotating, and
managing biological data.
• An active community of developers and
users who ask diverse questions of their biological data
and face common challenges with it.
Who uses GMOD?
Plus hundreds of others
GMOD in the Cloud
What GMOD in the cloud isn't:
• Clouds
• A guy getting blown up
• Garry's Mod (aka gmod.com)
Several GMOD Cloud Projects
• Galaxy - Web-based platform for data-intensive biomedical research
• CloVR - Automated and portable sequence analysis
• GBrowse2 - Web-based, scalable genome browser
• cloud.gmod.org - Several integrated GMOD tools
http://gmod.org/wiki/Cloud
Galaxy CloudMan

• Get Galaxy without the data or usage limitations.
• Combine with Cloud BioLinux to have access to MANY tools.
• Create an analysis cluster in minutes.
• Use autoscaling to get good performance at low cost.
http://wiki.g2.bx.psu.edu/Admin/Cloud
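The autoscaling idea can be illustrated with a minimal policy sketch. This is not CloudMan's implementation: the `ScalingPolicy` class, its `jobs_per_node` ratio, and the `target_nodes` rule are hypothetical, chosen only to show how a cluster can grow with the job queue and shrink as it drains, always within fixed bounds. The default bounds mirror the 1 to 16 node range used in the autoscaling comparison.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical autoscaling bounds; not CloudMan's actual settings."""
    min_nodes: int = 1
    max_nodes: int = 16
    jobs_per_node: int = 4  # assumed number of queued jobs one worker can absorb


def target_nodes(queued_jobs: int, policy: ScalingPolicy) -> int:
    """Return how many worker nodes the cluster should run for a given queue length."""
    # Ceiling division: enough nodes to cover every queued job.
    wanted = -(-queued_jobs // policy.jobs_per_node)
    # Clamp to the bounds so the cluster never disappears or grows without limit.
    return max(policy.min_nodes, min(policy.max_nodes, wanted))


def scaling_action(current: int, queued_jobs: int, policy: ScalingPolicy) -> int:
    """Positive: nodes to add. Negative: nodes to remove. Zero: leave the cluster alone."""
    return target_nodes(queued_jobs, policy) - current


if __name__ == "__main__":
    policy = ScalingPolicy()
    for queue in (0, 3, 30, 200):
        print(queue, "queued ->", target_nodes(queue, policy), "nodes")
```

Clamping to a minimum of one node keeps the head of the cluster available for new jobs, while the maximum caps spending when the queue spikes.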
Deploying Galaxy cluster on AWS
[Figure: four-step deployment walkthrough]
Exercising elasticity with autoscaling
Cluster size   Nodes      Computation time   Computation cost
Fixed          5          9 hrs              $20
Fixed          20         6 hrs              $50
Dynamic        1 to 16    6 hrs              $20
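Dynamic sizing can match the large fixed cluster's run time at a lower cost because you pay per node-hour: each node is billed for each hour it runs. The sketch below assumes a hypothetical flat price per node-hour and a made-up dynamic schedule; neither is taken from the measurements in the table.

```python
def node_hours(schedule):
    """Total node-hours for a schedule of (hours, nodes) segments."""
    return sum(hours * nodes for hours, nodes in schedule)


def run_cost(schedule, price_per_node_hour):
    """Cost of a schedule at a flat (hypothetical) price per node-hour."""
    return node_hours(schedule) * price_per_node_hour


# Both runs take 6 hours of wall-clock time.
FIXED_20 = [(6, 20)]
# Hypothetical autoscaled run: many nodes while the queue is long, few as it drains.
DYNAMIC = [(1, 16), (2, 8), (3, 2)]

if __name__ == "__main__":
    price = 0.40  # assumed USD per node-hour, for illustration only
    for label, schedule in (("fixed, 20 nodes", FIXED_20), ("dynamic", DYNAMIC)):
        print(f"{label}: {node_hours(schedule)} node-hours, ${run_cost(schedule, price):.2f}")
```

The fixed cluster pays for idle nodes once the work thins out; the dynamic schedule releases them, so the same wall-clock time costs fewer node-hours.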
CloVR

• Cloud Virtual Resource.
• Automated pipeline for sequence analysis.
• Uses two GMOD tools: Workflow and Ergatis.
• Use a virtual machine locally to interact with resources in the cloud.
http://clovr.org/
CloVR Architecture
Why the virtual machine?
The pipeline is driven from the local machine, while the heavy lifting is done on the cloud or cluster.
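The split between a local controller and remote workers can be sketched in a few lines. This is not CloVR's code: a local `ThreadPoolExecutor` stands in for cloud or cluster nodes, and `gc_content` is a hypothetical placeholder for the expensive per-chunk work. What the sketch shows is the shape of the arrangement: the local side only hands out chunks and collects small results.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(read: str) -> float:
    """Placeholder for heavy per-chunk work that would run on a cloud node."""
    if not read:
        return 0.0
    return (read.count("G") + read.count("C")) / len(read)


def run_pipeline(chunks, max_workers=4):
    """Local controller: hand chunks to workers and gather their small results in order."""
    # In this sketch a local thread pool stands in for remote cloud/cluster nodes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(gc_content, chunks))


if __name__ == "__main__":
    print(run_pipeline(["GGCC", "ATAT", "GATC"]))
```

Because the controller only exchanges inputs and summaries, the local virtual machine can stay small while the worker pool grows.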
GBrowse2

• A recent release of GBrowse2, installed and configured.
• Tools for automatically adding rendering servers.
• The ability to add standard data sets.
http://gmod.org/wiki/GBrowse
GBrowse2 in the Cloud
[Figure: GBrowse2 master server and render slaves, with yeast, fly, worm, and human data sets]
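One way a master could spread track rendering across render slaves is simple round-robin assignment, sketched below. The policy, the `assign_tracks` function, and the slave URLs are hypothetical illustrations; GBrowse2's own render-farm configuration is not shown here.

```python
from itertools import cycle


def assign_tracks(tracks, slaves):
    """Map each track name to a render slave, cycling through the slaves in order."""
    if not slaves:
        raise ValueError("need at least one render slave")
    # zip stops when the tracks run out, so cycle() never loops forever here.
    return dict(zip(tracks, cycle(slaves)))


if __name__ == "__main__":
    # Hypothetical render slaves and track names.
    slaves = ["http://render1.example.org", "http://render2.example.org"]
    tracks = ["genes", "ESTs", "SNPs", "RNA-seq"]
    for track, slave in assign_tracks(tracks, slaves).items():
        print(track, "->", slave)
```

Under this scheme, adding a rendering server amounts to appending its URL to the list and reassigning tracks, which is roughly what automatically adding rendering servers has to accomplish.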
Amazon Snapshots
cloud.gmod.org
GMOD tools preinstalled:
• Tripal: Drupal-based web frontend
• Chado: generic organism database schema
• GBrowse: venerable genome browser
• JBrowse: fast AJAX genome browser
Sample data:
• Saccharomyces cerevisiae
Can run on a micro instance (albeit slowly)
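Chado's genericity comes from storing every kind of feature in a single `feature` table and typing each one through a controlled-vocabulary term. The sketch below uses SQLite and a heavily simplified subset of the `organism`, `cvterm`, and `feature` tables (real Chado runs on PostgreSQL and has many more columns and constraints); the yeast rows are examples, not the sample data installed on cloud.gmod.org.

```python
import sqlite3

# Heavily simplified subset of three Chado tables, for illustration only.
SCHEMA = """
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE cvterm   (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature  (feature_id INTEGER PRIMARY KEY,
                       organism_id INTEGER REFERENCES organism (organism_id),
                       type_id INTEGER REFERENCES cvterm (cvterm_id),
                       name TEXT,
                       uniquename TEXT);
"""


def demo_connection():
    """Return an in-memory database holding a few example yeast features."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO organism VALUES (1, 'Saccharomyces', 'cerevisiae')")
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "gene"), (2, "chromosome")])
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 2, "chrI", "chrI"),
            (2, 1, 1, "TFC3", "YAL001C"),
            (3, 1, 1, "VPS8", "YAL002W"),
        ],
    )
    return conn


def features_of_type(conn, genus, species, type_name):
    """List (uniquename, name) for one organism's features of a given CV type."""
    return conn.execute(
        """
        SELECT f.uniquename, f.name
          FROM feature f
          JOIN organism o ON o.organism_id = f.organism_id
          JOIN cvterm t ON t.cvterm_id = f.type_id
         WHERE o.genus = ? AND o.species = ? AND t.name = ?
         ORDER BY f.uniquename
        """,
        (genus, species, type_name),
    ).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    print(features_of_type(conn, "Saccharomyces", "cerevisiae", "gene"))
```

Genes and chromosomes share one table; only the `type_id` join distinguishes them, which is what lets the same schema serve any organism.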
A little more on Tripal
• Based on the popular CMS Drupal.
• Several modules written to serve as an interface for Chado:
  - Controlled Vocabularies
  - Features
  - Analyses
  - Libraries
  - Stocks
• Integrated job management
Potential use case for Cloud GMOD
Community annotation:
• Just add a Java Web Start Apollo and set the security group to allow it to connect to the database.
• When WebApollo is ready, it will be even easier: WebApollo is an add-on to JBrowse that allows collaborative editing.
• Tripal and Drupal allow editing of most Chado data types, plus blog-style commenting on pages.
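Opening the database to Apollo clients is a security group change. The sketch below builds the `IpPermissions` structure accepted by boto3's EC2 `authorize_security_group_ingress` call, assuming Chado is served by PostgreSQL on its default port 5432; the group ID and the CIDR block (a documentation-only address range) are placeholders, and the helper names are hypothetical.

```python
import ipaddress


def postgres_ingress(cidr: str, port: int = 5432) -> list:
    """Build an EC2 IpPermissions list opening the PostgreSQL port to one CIDR block."""
    # Validate early: ip_network raises ValueError on malformed or host-bit-set CIDRs.
    network = ipaddress.ip_network(cidr)
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": str(network), "Description": "Apollo clients to Chado"}],
    }]


def open_chado_to_annotators(ec2_client, group_id: str, cidr: str) -> None:
    """Apply the rule with a boto3 EC2 client, e.g. boto3.client("ec2")."""
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=postgres_ingress(cidr)
    )


if __name__ == "__main__":
    print(postgres_ingress("203.0.113.0/24"))
    # With AWS credentials configured, something like:
    #   import boto3
    #   open_chado_to_annotators(boto3.client("ec2"), "sg-0123456789abcdef0", "203.0.113.0/24")
```

Restricting `cidr` to the annotators' network, rather than `0.0.0.0/0`, keeps the database closed to the rest of the Internet.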
Why use the cloud?
• Avoid installation-related issues (saves you time and frustration!)
• Save money (how much, of course, depends on your usage)
• Availability of common genomic data sets (several projects already make these available on AWS)
Future work
• Make the GBrowse2 AMI public (very soon)
• Add Apollo to cloud.gmod.org (relatively soon)
• Add WebApollo to cloud.gmod.org (as soon as it's released)
Conclusion
http://gmod.org/wiki/Cloud for more information
on GMOD work in the cloud.
http://cloud.gmod.org/ for a running example of
cloud.gmod.org.
http://clovr.org/ for more info on CloVR and to
download the client VM.
http://getgalaxy.org/ for more information on
getting CloudMan.
Acknowledgements
• Funding agencies: NIH, USDA ARS, NSF,
Ontario Ministry of Economic Development and
Innovation
• Lincoln Stein, Chris Vandevelde
• Enis Afgan and the Galaxy Team
• Sam Angiuoli et al. at the University of Maryland School of Medicine
• Stephen Ficklin and the Tripal group
• Mitch Skinner and JBrowse developers
• The rest of the GMOD community