iPlant Atmosphere - iPlant Pods

Cloud Computing for Education and Research
Customized cloud platform for computing on your terms !
• Nirav Merchant (nirav@email.arizona.edu)
Topic Coverage
• Introduction to cloud computing
• Challenges and Features
• iPlant Atmosphere
• Designing customized infrastructure for
research, course work and training material
• Using Atmosphere (hands on) for collaborative
data analysis
• Explore use of these resources on your own
and ask questions !
The iPlant Collaborative
Cyberinfrastructure for the Plant Sciences
[Figure: spectrum of users, from typical end users to computational users (Teragrid, XSEDE)]
Cloud Computing
• Not a singular technology component
• Not a black box or alien technology
• Not an “elixir of scalability”, “panacea for Big Data”
etc.
• It cannot keep growing and scaling without
planning (and architecting your application)
• Unfortunate victim of marketing hype
• Further complicated by use of jargon, TLA, private
cloud, community cloud, hybrid cloud …
What is cloud computing ?
http://geekandpoke.typepad.com/geekandpoke/2009/03/let-the-clouds-make-your-life-easier.html
Cloud Computing
• Amazingly flexible technology
• It’s a platform that comprises many uniquely
flexible components (more later)
• Allows us to create “purpose built appliances”
• Allows us to finally “script our infrastructure”
• Allows mixing and matching of components that
you need to do your science
• Opens up many new avenues and approaches for
teaching topics that require complex (pre
configured) software tools and data
Often overheard
I do my analysis using the
“cloud”
It’s the close equivalent of saying:
I do my research using “science”
Cloud Computing Zen
• Don’t get frustrated…
– This is cutting (bleeding) edge technology
– There will be plenty of WTF#$@ moments
• Be patient…
– Instructions/infrastructure keep changing (s/w version)
• Be flexible…
– There will be unanticipated issues along the way
• Be constructive…
– Use wiki, forums and share knowledge
– Make everyone’s experience better
• Be creative…
– There is more than one way to do it (TIMTOWTDI)
iPlant URLs you should know
• Wiki.iplantcollaborative.org
• Forum.iplantcollaborative.org
• ask.iplantcollaborative.org
• www.iplantcollaborative.org
Impromptu survey
• How many of you use the command line ?
• How many of you are Windows, Mac, Linux
users ?
• How many of you use HPC ? (or know what
HPC is)
• What resources do you use to teach
computing based workshops/training/courses ?
Atmosphere: motivation
• Standalone GUI-based applications are frequently
required for analysis
• GUI apps are not easy to transform into web apps
• Need to handle complex software dependencies
(e.g. specific BioPerl version and R modules)
• Users needing full control of their software stack
(occasional sudo access)
• Need to share desktop/applications for
collaborative analysis (remote collaborators)
• Availability of Next Gen map-reduce based
algorithms (currently we have limited support)
As a Service models
Cyberinfrastructure is “Research as a Service”
• SaaS: Software as a Service
(e.g. Clustering/Assembly is a service)
• PaaS: Platform as a Service
IaaS plus core software capabilities on which you build SaaS
(e.g. Hadoop/MapReduce is a Platform)
• IaaS: Infrastructure as a Service
(get computer time with a credit card and with a Web interface like EC2)
[Figure labels: Productivity; More Pain; More Flexibility]
http://salsahpc.indiana.edu
But where do I start ?
• Searching for “cloud computing” related terms is
not very helpful (you will most likely get
bombarded by commercials and advertisements
in the first few hits !)
• NIST: National Institute of Standards and
Technology
Cloud Computing Synopsis and Recommendations
(Special Publication 800-146 : May 2012)
http://www.nist.gov/customcf/get_pdf.cfm?pub_id=911075
What it is
Challenges of existing platforms
• Amazon Web Services (AWS)
http://aws.amazon.com/
• Flexible and scalable
• High level of expertise required for
configurations
• Fairly challenging for biologists to master all
steps
• Limited lifecycle management (cost, time)
Steps to get started !
What is Atmosphere ?
• Self-service cloud infrastructure
• Designed to make the underlying cloud infrastructure easy
to use for novice users
• Built on open source Eucalyptus (OpenStack)
• Fully integrated into iPlant authentication and storage
and HPC capabilities
• Enables users to build custom images/appliances and
share with community
• Cross-platform desktop access to GUI applications in
the cloud (using VNC)
• Start and stop your analysis (without losing state),
much like your laptop (hibernate)
• Profile your application usage patterns
• Provide easy web based access to remote resources
(compute+data+s/w)
Who is this tutorial designed for ?
• Users wanting to launch configured images in
Atmosphere (like an app store)
• Developers for application distribution
• Prototyping/Testing new software/modules
• Tailored software training setups (custom
workshops/laboratory courses etc)
• Distribute tasks in the cloud
• Collaborate and share screen/application
• Extend compute capabilities of existing
applications, i.e. utilize the iPlant API
Terms and jargon for cloud you should
know about
• Virtual Machine (aka VM)
• Image (aka VM-image)
• Instance (running VM)
• Amazon EC2 (Elastic Compute Cloud)
• Amazon EBS (Elastic Block Store)
• Amazon S3 (Simple Storage Service)
The iPlant Collaborative
Project Atmosphere™: Custom Cloud Computing
• API-compatible implementation of
Amazon EC2/S3 interfaces
• Virtualize the execution environment for
applications and services
• Up to 12 core / 48 GB instances
• Access to Cloud Storage + EBS
• Run servers, CloudBurst desktop use
cases. Big data and the desktop are co-located again!
>60 hosted applications in
Atmosphere today, including
users from USDA, Forest
Service, database providers,
etc.
(30 more for postdocs and
grad students for training
classes)
Atmosphere: Collaboration
iPlant Data Store
Lifecycle
Working together
• How often do you wish you could show your
desktop to the person on the phone/Skype ?
• Let them navigate the application for you ?
• Continue your work while you are away ?
• Give you a judgment call ?
• Very doable if you
– Buy screen sharing software
– Log into a different application
Distributing Tasks (scaling)
• You have a large collection (aka BoT: Bag of
Tasks) e.g. many FASTA sequences
• You build an “appliance” and now want to
distribute that work among many appliances
• Works well for 1 but how do you feed many ?
• You REALLY want to add more appliances to
finish faster
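One way to picture feeding many appliances from a single bag of tasks is a pull model: each worker takes the next unclaimed task until the bag is empty, so adding workers shortens the total run. The sketch below is only an illustration using local threads in place of appliances; `run_bag_of_tasks` and the `run_task` callback are hypothetical names, not part of Atmosphere or Makeflow.

```python
import queue
import threading


def run_bag_of_tasks(tasks, worker_count, run_task):
    """Let `worker_count` workers pull tasks from a shared bag until it is empty.

    `tasks` is a list of hashable task names (e.g. input file names) and
    `run_task` is any function that processes one task and returns a result.
    Local threads stand in for appliances in this illustration.
    """
    bag = queue.Queue()
    for task in tasks:
        bag.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = bag.get_nowait()   # claim the next unprocessed task
            except queue.Empty:
                return                    # bag is empty: this worker is done
            outcome = run_task(task)
            with lock:                    # protect the shared results dict
                results[task] = outcome

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    inputs = ["in-10.fasta", "in-20.fasta", "in-30.fasta"]
    # Stand-in for real work: just derive the output name for each input.
    print(run_bag_of_tasks(inputs, 2, lambda name: name.replace("in-", "out-")))
```

Because workers pull rather than being assigned a fixed share, a slow task on one worker does not hold up the others.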
Makeflow to the Rescue
• Simple way to distribute and manage your
workflow/analysis among many computing
platforms (appliances)
• Keeps track of progress, deals with failures
and resumes where it left off (no repeating
completed tasks)
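The "no repeating completed tasks" idea can be sketched as a filter that keeps only tasks whose output file does not exist yet. This is a deliberate simplification for illustration: the slides do not describe how Makeflow records progress internally, and `pending_tasks` is a hypothetical helper, not a Makeflow function.

```python
import os


def pending_tasks(tasks, exists=os.path.exists):
    """Return the (output_file, command) pairs that still need to run.

    A task counts as completed here when its output file already exists.
    `exists` is injectable so the logic can be tried without touching disk.
    """
    return [(output, command) for output, command in tasks if not exists(output)]


if __name__ == "__main__":
    tasks = [
        ("out-10-align.fasta", "align.exe -p 10 -i in-10.fasta -o out-10-align.fasta"),
        ("out-20-align.fasta", "align.exe -p 10 -i in-20.fasta -o out-20-align.fasta"),
    ]
    for output, command in pending_tasks(tasks):
        print("still to run:", command)
```

Rerunning the same workflow after an interruption would then only schedule the tasks returned by this filter.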
Why another workflow system ?
• Emphasis on simplicity
• Very easy to integrate with cloud and HPC resources
• Does not support complex workflows, but handles
dependencies between tasks very elegantly
• Light weight and portable
• Even works on local machine and makes full use of
multiple cores !
• Can run certain tasks locally (important for data
intensive apps)
• Workflow system is VERY extensible using various
scripting languages (if you choose)
How does it work ?
Your complex task
(needs software X, Y, Z)
Someone built you a
script/program
[Figure: the script/program and DATA must somehow (?) reach many Atmosphere Image/Appliance instances]
Makeflow instructions
out-10-align.fasta : in-10.fasta align.exe
	align.exe -p 10 -i in-10.fasta -o out-10-align.fasta
out-20-align.fasta : in-20.fasta align.exe
	align.exe -p 10 -i in-20.fasta -o out-20-align.fasta
out-30-align.fasta : in-30.fasta align.exe
	align.exe -p 10 -i in-30.fasta -o out-30-align.fasta
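The rules above all follow one pattern (output, then its input and program, then the command on an indented line), so for a large bag of tasks they can be generated rather than typed. The sketch below assumes the `in-N.fasta` / `out-N-align.fasta` naming and the `-p`, `-i`, `-o` options exactly as shown on the slide; `makeflow_rule` and `build_makeflow` are hypothetical helper names.

```python
def makeflow_rule(input_name, program="align.exe", threads=10):
    """Return one rule: 'output : input program' plus a tab-indented command."""
    stem = input_name.rsplit(".", 1)[0]                     # "in-10.fasta" -> "in-10"
    label = stem[3:] if stem.startswith("in-") else stem    # "in-10" -> "10"
    output_name = f"out-{label}-align.fasta"
    header = f"{output_name} : {input_name} {program}"
    command = f"\t{program} -p {threads} -i {input_name} -o {output_name}"
    return header + "\n" + command


def build_makeflow(input_names):
    """Join one rule per input file into the text of a Makeflow file."""
    return "\n\n".join(makeflow_rule(name) for name in input_names) + "\n"


if __name__ == "__main__":
    print(build_makeflow(["in-10.fasta", "in-20.fasta", "in-30.fasta"]))
```

Saving the printed text to a file gives the input for the "Running it" step.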
Running it
• Take the makeflow file (previous slide)
• Run makeflow -f <filename>
• Launch workers
• Profit
What happens ?
Makeflow instructions + your
program (align.exe) + data
[Figure: tasks (in-10.fasta → out-10-align.fasta, in-20.fasta → out-20-align.fasta) and DATA are handed out to workers running in Atmosphere images/appliances]
Example
When not to use cloud !
• When you need “bare metal” performance
– CPU speed
– Network
– Data I/O
• When your application uses MPI across a large
number of compute nodes (> 2)
• When applications need large memory
(> 64 GB)
Users of Atmosphere for teaching
• Workshops:
– Frontiers and Techniques in Plant Sciences
CSHL 2011,2012
– Genotyping by Sequencing
Cornell Computational Biology
• Graduate/undergraduate course work:
– BCB 660 Volker Brendel and Amy Toth
Fall 2011, Iowa State University
– ISTA 420/520 Nirav Merchant & Eric Lyons
Fall 2012, Univ. of Arizona
– Intro. Bioinformatics, Anne Lorraine
Fall 2012, Univ. of North Carolina
• Popular community contributed images:
– PhytoMorph (Nate Miller, U. Wisconsin)
– Twig2Genome (Haibao Tang, JCVI)
– Julin Maloof, UC Davis*
Recap on key concepts
• Purpose built appliances
• Scriptable infrastructure
• Scaling multiple self contained tasks
• Collaborative analysis
Discussion
• What would you want to build with your
custom infrastructure ?
Courses Using Atmosphere
Asian Wild Rice Distribution
The Research
• Genetic studies documented
geographic subdivision of Asian wild
rice (Oryza rufipogon), the progenitor
of cultivated Asian rice.
– Cause unknown.
• Use species distribution modeling
(SDM) to examine environmental
factors associated with the spatial and
temporal distribution of O. rufipogon.
• Compare estimated distribution during
Last Glacial Maximum (LGM) to genetic
data.
Problem
• Analysis requires large datasets
iPlant Workshop at BSA, July 2011
• Pu Huang (Washington U.) attended.
• Learned about Atmosphere, iPlant’s
cloud computing platform.
Results
• Present distribution of O. rufipogon
(Fig. A).
• Projected paleodistribution at LGM was
separated into disconnected east and
west ranges (Fig. B).
– Consistent with current geographic
pattern of genetic variation, with
two genetic groups that intergrade
(Fig. D).
• Annual precipitation contributes most
to SDM estimates.
• SDM projections for year 2080 indicate
an increasing probability of presence
and range expansion (Fig. C).
– Indicates global warming is less of a
threat to this endangered species
than other human-mediated factors.
Scalable science
• 325 records of O. rufipogon sample
locations from two sources.
iPlant enabled Huang and Schaal to
successfully pursue this research.
P Huang and B.A. Schaal, Am. J. Botany 99(11). 2012.
(A) present, (B) Last Glacial Maximum,
(C) Future 2080, (D) Genetic variation.
Hands On Lab
Atmosphere Login
• Visit http://www.iplantcollaborative.org/
• Next click on the Atmosphere Login Image
(should be about mid page)
Click the Login button and enter your iPlant username and password
Atmosphere Intro screen
Click the Launch New Instance Button
1. Search for NGS Viewers v3 08/20/2012 (an instance type) and select the purple icon.
2. Give it a name and select the instance size (choose m1.small).
• By selecting different sizes you will notice project resources change.
3. When ready, press the Launch Instance button.
Understanding Instance Metrics
• After an image has launched, you can view information about it.
• Resource Usage Metrics
– My Resource Usage at the top of the screen shows how much of your
quota in CPUs and GB of memory is being used by your running
instances.
– Instance Details
• The Instance Details tab displays important information about the instance,
including the ID assigned to the instance when it was launched, the name of the image it
is using, its unique EMI ID, the instance size, the date you launched the image, and the
IP address, which you will need when logging in to the instance.
– Instance Metrics
• Instance Metrics allow you to drill down into the resource usage of the running
instance.
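The quota figure shown under My Resource Usage is, in essence, the sum of CPUs and memory over your running instances compared with your allocation. The sketch below illustrates that arithmetic only; the size table is hypothetical (the slides name m1.small but do not give its CPU or memory values), and `resource_usage` is not an Atmosphere function.

```python
# Hypothetical size table: name -> (CPUs, memory in GB).
# Real Atmosphere instance sizes may differ; these values are placeholders.
INSTANCE_SIZES = {
    "m1.small": (1, 2),
    "m1.medium": (2, 4),
    "m1.large": (4, 8),
}


def resource_usage(running_sizes, cpu_quota, memory_quota_gb):
    """Sum CPUs and memory of running instances and compare them to the quota."""
    cpus_used = sum(INSTANCE_SIZES[size][0] for size in running_sizes)
    memory_used = sum(INSTANCE_SIZES[size][1] for size in running_sizes)
    return {
        "cpus_used": cpus_used,
        "cpu_fraction": cpus_used / cpu_quota,
        "memory_gb_used": memory_used,
        "memory_fraction": memory_used / memory_quota_gb,
    }


if __name__ == "__main__":
    print(resource_usage(["m1.small", "m1.medium"], cpu_quota=8, memory_quota_gb=16))
```

Choosing a larger size for a new instance raises both sums, which is why the displayed resources change as you pick different sizes in the launch dialog.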
Logging into an Instance
• Via SSH: if the Shell tab is disabled, you can log
into your instance using the SSH client for your operating
system.
• In your terminal window type:
$ ssh your_iplant_username@instance_ip_address
For example, mine would look like:
$ ssh amercer@128.196.142.48
Enter your iPlant password and you should be logged into your instance.
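If you log in to several instances, the command above can be assembled from the username and the IP address shown in the Instance Details tab. A minimal sketch, assuming a standard `ssh` client is installed; `ssh_command` is a hypothetical helper, and the address check uses Python's standard `ipaddress` module.

```python
import ipaddress


def ssh_command(username, instance_ip):
    """Build the argument list for: ssh username@instance_ip."""
    ipaddress.ip_address(instance_ip)   # raises ValueError if not a valid IP address
    return ["ssh", f"{username}@{instance_ip}"]


if __name__ == "__main__":
    # Values taken from the example on the slide.
    print(" ".join(ssh_command("amercer", "128.196.142.48")))
```

The list form can be passed directly to `subprocess.run` if you want to start the session from a script.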
Terminating an Instance
• Click the instance to terminate in the My Instances
list.
• Either
– Click the Terminate Instance icon in your My Instances
list or
– Click the Terminate Instance button on the Data tab.
• Click OK to the warning message.
Requesting More Resources
• Enter the amount of resources you are
requesting.
• Enter the justification for the request.
• Click the Request Resources button (right side
of page).
– Your request will be reviewed and you will receive
a response within 2 working days.
Reporting an Instance Problem
• Select the instance which you are having
problems with.
• Click Report Instance.
• Fill out the Instance Error form.
• When finished, press the Report this Instance
button.
Dealing with technical challenges
(Firewall issues)
Logging in via VNC
• Airport VNC runs a built-in Java VNC viewer from
a web browser within the Atmosphere Airport
interface and requires Java. This is the more
common use.
• Select the VNC tab
– If prompted, allow the Java applet to run
• In the VNC Server field, enter the IP address for
your instance, appending :1 after the IP address
(should be auto-populated already). Press
connect.
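The ":1" suffix is a VNC display number. By the usual VNC convention, display N listens on TCP port 5900 + N, so display 1 corresponds to port 5901, which is the port to check when a firewall blocks the connection. The sketch below only illustrates that convention; the helper names are hypothetical and the IP address is the one from the SSH example.

```python
VNC_BASE_PORT = 5900  # conventional base port; display N uses 5900 + N


def vnc_server_field(instance_ip, display=1):
    """Return the value for the VNC Server field, e.g. '128.196.142.48:1'."""
    return f"{instance_ip}:{display}"


def vnc_tcp_port(display=1):
    """Return the TCP port a VNC server conventionally uses for this display."""
    return VNC_BASE_PORT + display


if __name__ == "__main__":
    print(vnc_server_field("128.196.142.48"), "uses TCP port", vnc_tcp_port())
```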
Enter your username and password
Here you have successfully logged in via VNC.
Terminating a VNC session
• You can terminate a VNC Viewer session either
from the VNC tab in Airport or from the VNC
Viewer application window.
• To terminate the session from Airport: Click
the 'X' from the My Instances list or from the
VNC tab:
Hands on exercise
• Launching an instance (one per team)
• Connecting to it (VNC and SSH) using the web
browser and VNC client software
• Launching an application (Flapjack/Tablet)
• Installing a new application (optional)
• Collaborating with other users (sharing your
session)
• Terminating the instance when you are done