decWorkshopSlidedeck - iPlant Pods

The iPlant Collaborative
Community Cyberinfrastructure for Life Science
Jason Williams
Cold Spring Harbor Laboratory, iPlant
www.iPlantCollaborative.org
The iPlant Collaborative
Vision
Enable life science researchers and educators to use
and extend cyberinfrastructure to understand and
ultimately predict the complexity of biological systems
The iPlant Collaborative
Vision
How can we prepare for science we can’t anticipate?
The iPlant Collaborative
What is cyberinfrastructure?
iPlant makes computation, data storage, cloud services, and
software tools easily available to informaticians and researchers,
leveraging existing CI investments.
Cyberinfrastructure consists of computing systems, data storage systems, instruments
and data repositories, visualization environments, and people, linked together by
software and networks to improve research productivity and enable breakthroughs not
otherwise possible. --Craig Stewart
Biological Cyberinfrastructure
The Problem of Big Data in Biology
The iPlant Collaborative
Where iPlant is today and where we are going
• Initial funding in 2008
• Almost 2 years of community input
gathering – software development starts
in 2009
• Major CI components appear late 2010
• Finished 5th year
• Recommended for second 5 year term
• > 9000 users
• > 20K analysis jobs in 2012 (> 10K HPC jobs)
• 500 terabytes of user data
Image from: http://adammclane.com/2011/12/06/bottlenecks/
The iPlant Collaborative
Where iPlant is today and where we are going
iPlant Renewed by NSF
September 2013 begins next 5 year period
Scientific Advisory Board
Focus on Genotype-Phenotype science
NSF Recommended expansion of scope beyond plants
The iPlant Collaborative
What we have to offer you
• Data Management & Storage Resources
• Access to High Performance Computing Resources
• Tool Integration System
• Application Programming Interfaces (APIs)
• Cloud Computing Resources
• Genotype To Phenotype Science Enablement Portfolio
• Tree of Life Science Enablement Portfolio
• Image Analysis Platform
• Support for Molecular Breeding Platform (IBP)
• Support for AgMIP
How iPlant CI Enables Discovery
Solution: Discovery Environment
An extensible platform for
science
• High-powered computing
• Data sharing/collaboration
• Easy to use interface
• Virtually limitless apps
• Analysis history (provenance)
How iPlant CI Enables Discovery
Solution: Atmosphere
On-demand computing resource built on
a cloud infrastructure
• Virtual Machine pre-configured with:
– Software
– Memory requirements
– Processing power
• Integrated with iPlant authentication, storage, and
HPC capabilities
• Build custom images/appliances and
share with community
• Cross-platform desktop access to GUI
applications in the cloud (using VNC)
How iPlant CI Enables Discovery
Solution: iPlant Data Store
All data within the same platform:
speed and accessibility
• Access your data from multiple iPlant services
• Automatic, redundant data backup between
University of Arizona and University of Texas
(NSF Data management plan)
• Multiple ways to share data with collaborators
• Multi-threaded high speed transfers
• Default 100GB allocation. >1TB allocations
available with justification
Source               Time (s)
CD                   320
External Drive       36*
USB2.0 Flash         30
iPlant Data Store    18*
My Computer          15
Berkeley Server      150
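To make the timing comparison easier to read, the short sketch below ranks the sources and expresses each time relative to the iPlant Data Store. It only uses the figures listed on this slide. The slide does not say what operation or data size was timed, and the meaning of the asterisks is not given, so the numbers are treated here as plain seconds for one identical task; that is an assumption. The names `times_s` and `relative_to_baseline` are illustrative, not part of any iPlant tool.

```python
# Times in seconds, copied from the slide; the asterisks (unexplained
# on the slide) are dropped, which is an assumption of this sketch.
times_s = {
    "CD": 320,
    "External Drive": 36,
    "USB2.0 Flash": 30,
    "iPlant Data Store": 18,
    "My Computer": 15,
    "Berkeley Server": 150,
}

BASELINE = "iPlant Data Store"


def relative_to_baseline(times, baseline):
    """Return each source's time divided by the baseline source's time."""
    base = times[baseline]
    return {source: t / base for source, t in times.items()}


# Fastest source first.
ranked = sorted(times_s.items(), key=lambda item: item[1])
relative = relative_to_baseline(times_s, BASELINE)

if __name__ == "__main__":
    for source, t in ranked:
        print(f"{source:<18} {t:>4} s  {relative[source]:.2f}x")
```

Read this way, only the local machine is faster than the Data Store in the slide's figures, while the remote server and the CD are several times slower.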
Highlighted Objectives and Deliverables
Community identified priorities
• Increased interoperability with other data providers – e.g.
BioMarts, CoGe, MaizeGDB
• Data discovery through interaction with trait repositories
(trait/plant ontologies)
• Workflows for variant discovery – SNP detection pipelines
• Scalable Genome Assembly Workflows – expanded
capabilities with MAKER, InterProScan
• iPlant Data Commons – Resources for storage, data
conversion, and metadata
The iPlant Collaborative
Leadership Team
Steve Goff - UA
Dan Stanzione – TACC
Matthew Vaughn - TACC
Nirav Merchant - UA
Doreen Ware – CSHL
Michael Schatz – CSHL
David Micklos – CSHL
Ann Stapleton – UNC Wilmington
Ron Vetter – UNC Wilmington
Faculty Advisors & Collaborators:
Ali Akoglu
Kobus Barnard
Timothy Clausner
Brian Enquist
Damian Gessler
Ruth Grene
John Hartman
Matthew Hudson
David Lowenthal
B.S. Manjunath
David Neale
Brian O’Meara
Sudha Ram
David Salt
Mark Schildhauer
Doug Soltis
Pam Soltis
Edgar Spalding
Alexis Stamatakis
Steve Welch
Your colleagues
Postdocs:
Students:
Barbara Banbury
Christos Noutsos
Solon Pissis
Brad Ruhfel
Peter Bailey
Jeremy Beaulieu
Devi Bhattacharya
Storme Briscoe
YaDi Chen
David Choi
Barbara Dobrin
Staff:
Steve Gregory
Matthew Hanlon
Natalie Henriques
Uwe Hilgert
Nicole Hopkins
EunSook Jeong
Logan Johnson
Chris Jordan
Kathleen Kennedy
Mohammed Khalfan
David Knapp
Lars Koesterke
Sangeeta Kuchimanchi
Kristian Kvilekval
Sue Lauter
Tina Lee
Andrew Lenards
Monica Lent
Greg Abram
Sonali Aditya
Ritu Arora
Roger Barthelson
Rob Bovill
Brad Boyle
Gordon Burleigh
John Cazes
Mike Conway
Victor Cordero
Rion Dooley
Aaron Dubrow
Andy Edmonds
Dmitry Fedorov
Melyssa Fratkin
Michael Gatto
Utkarsh Gaur
Cornel Ghiban
John Donoghue
Yekatarina Khartianova
Chris La Rose
Amgad Madkour
Aniruddha Marathe
Andre Mercer
Kurt Michaels
Zack Pierce
Andrew Predoehl
Sathee Ravindranath
Kyle Simek
Gregory Striemer
Jason Vandeventer
Nicholas Woodward
Kuan Yang
Zhenyuan Lu
Eric Lyons
Aaron Marcuse-Kubitz
Naim Matasci
Sheldon McKay
Robert McLay
Nathan Miller
Steve Mock
Martha Narro
Shannon Oliver
Benoit Parmentier
Jmatt Peterson
Dennis Roberts
Paul Sarando
Jerry Schneider
Bruce Schumaker
Edwin Skidmore
Brandon Smith
Mary Margaret Sprinkle
Sriram Srinivasan
Josh Stein
Lisa Stillwell
Jonathan Strootman
Peter Van Buren
Hans Vasquez-Gross
Rebeka Villarreal
Ramona Walls
Liya Wang
Anton Westveld
Jason Williams
John Wregglesworth
Weijia Xu
Overview of the iPlant Discovery Environment
Scalable platform for
powerful computing, data, and application resources
Overview of the iPlant Discovery Environment
The evolution of cyberinfrastructure, from the bench biologist’s point of view
?
Image From: http://www.wired.com/wired/archive/17.01/ff_mac_viewer.html
Overview of the iPlant Discovery Environment
Through the Discovery
Environment you have:
• High-powered computing
• iPlant data store
• Easy to use interface
• Virtually limitless apps
• Analysis history (provenance)
Overview of the iPlant Discovery Environment
Key DE Features in the 1.6-1.8
releases
• Enhanced ability to share data
with colleagues/collaborators
• Visual workflow creation
• WYSIWYG Tool Integration
Discovery Environment
What’s Next for the DE?
• Support for batch submission
• Extensive support for metadata, ontologies, and tagging
• Search data based on metadata and ontologies
• Integrate with external data sources like BioMart
Overview of Atmosphere
Cloud Computing on Demand
• On-demand computing resource built on a
cloud infrastructure
• Virtual Machine pre-configured with:
– Software
– Memory requirements
– Processing power
Overview of Atmosphere
Cloud Computing on Demand
• Fully integrated into iPlant authentication and
storage and HPC capabilities
• Enables users to build custom images/appliances
and share with community
• Cross-platform desktop access to GUI applications
in the cloud (using VNC)
• Provide easy web based access to resources
Overview of Atmosphere
Cloud Computing on Demand
• API-compatible implementation of Amazon EC2/S3 interfaces
• Virtualize the execution environment for applications and services
• Up to 16 core / 32 GB instances
• Access to Cloud Storage + EBS
Overview of Atmosphere
Multiple Ways to Access
• VNC client
• Command line tools (e.g. SSH)
Atmosphere
What’s Next?
• Increased capacity to meet demand
• Transition to OpenStack
• Pausing and Resizing Images
• Private Federated Clouds – a little further ahead
Where to go for help
iPlant user forums
ask.iplantcollaborative.org