GMOD in the Cloud
Scott Cain
GMOD Project Coordinator
Ontario Institute for Cancer Research
scott@scottcain.net
Genome Informatics, November 3, 2011

Introduction: GMOD is …
• A set of interoperable open-source software components for visualizing, annotating, and managing biological data.
• An active community of developers and users asking diverse questions, and facing common challenges, with their biological data.

Who uses GMOD?
Plus hundreds of others

GMOD in the Cloud
What GMOD in the cloud isn't:
• Clouds
• Guy getting blown up
• Garry's MOD (aka gmod.com)

Several GMOD Cloud Projects
• Galaxy - Web-based platform for data intensive biomedical research
• CloVR - Automated and portable sequence analysis
• GBrowse2 - Web-based, scalable genome browser
• cloud.gmod.org - Several integrated GMOD tools
http://gmod.org/wiki/Cloud

Galaxy Cloudman
• Get Galaxy without the data or usage limitations.
• Combine with Cloud BioLinux to have access to MANY tools.
• Create an analysis cluster in minutes.
• Use autoscaling to get good performance at low cost.
http://wiki.g2.bx.psu.edu/Admin/Cloud

Deploying Galaxy cluster on AWS
[Figure: four numbered deployment steps (1 to 4); step content not preserved]

Exercising elasticity with autoscaling
Cluster size               Computation time   Computation cost
Fixed, 5 nodes             9 hrs              $20
Fixed, 20 nodes            6 hrs              $50
Dynamic, 1 to 16 nodes     6 hrs              $20

CloVR
• Cloud Virtual Resource.
• Automated pipeline for sequence analysis.
• Uses 2 GMOD tools: Workflow and Ergatis.
• Use a virtual machine locally to interact with resources in the cloud.
http://clovr.org/

CloVR Architecture
[Figure: CloVR architecture]

Why the virtual machine?
• Running the pipeline happens on the local machine, while the heavy lifting is done on the cloud/cluster.

GBrowse2
• Installed and configured recent release of GBrowse2.
• Tools to allow automatically adding rendering servers.
• Ability to add standard data sets.
http://gmod.org/wiki/GBrowse

GBrowse2
[Figure: GBrowse2 in the Cloud. Labels: Master; Render Slaves; Yeast, Fly, Worm, Human; Amazon Snapshots]

cloud.gmod.org
GMOD tools preinstalled:
• Tripal - Drupal-based web frontend
• Chado - Generic organism DB schema
• GBrowse - Venerable genome browser
• JBrowse - Fast, AJAX genome browser
Sample data:
• Saccharomyces cerevisiae
Can be run as a micro machine (albeit slowly)

A little more on Tripal
• Based on the popular CMS Drupal.
• Several modules written to serve as an interface for Chado: Controlled Vocabularies, Features, Analyses, Libraries, Stocks
• Integrated job management

Potential use case for Cloud GMOD
Community annotation:
• Just add a web-start Apollo and set the security group to allow it to connect to the database.
• When WebApollo is ready, it's even easier: WebApollo is an add-on to JBrowse that allows collaborative editing.
• Tripal and Drupal allow editing of most data types in Chado, and commenting on pages similar to a blog.

Why use the cloud?
• Avoid installation-related issues (saves you time and frustration!)
• Save money (how much, of course, depends)
• Availability of common genomic data sets (several projects already make these available at AWS)

Future work
• Get GBrowse2 AMI public (very soon)
• Add Apollo to cloud.gmod.org (relatively soon)
• Add WebApollo to cloud.gmod.org (as soon as it's released)

Conclusion
• http://gmod.org/wiki/Cloud for more information on GMOD work in the cloud.
• http://cloud.gmod.org/ for a running example of cloud.gmod.org.
• http://clovr.org/ for more info on CloVR and to download the client VM.
• http://getgalaxy.org/ for more information on getting Cloudman.

Acknowledgements
• Funding agencies: NIH, USDA ARS, NSF, Ontario Ministry of Economic Development and Innovation
• Lincoln Stein, Chris Vandevelde
• Enis Afgan and the Galaxy Team
• Sam Angiuoli et al. at UofM SOM
• Stephen Ficklin and the Tripal group
• Mitch Skinner and JBrowse developers
• The rest of the GMOD community