Science Cloud Paul Watson Newcastle University, UK

Science Cloud
Paul Watson
Newcastle University, UK
paul.watson@ncl.ac.uk
Research Challenge
Understanding the brain is the greatest
informatics challenge
• Enormous implications for science:
• Medicine
• Biology
• Computer Science
Collecting the Evidence
100,000 neuroscientists generate huge quantities of data
– molecular (genomic/proteomic)
– neurophysiological (time-series activity)
– anatomical (spatial)
– behavioural
Neuroinformatics Problems
• Data is:
• expensive to collect but rarely shared
• in proprietary formats & locally described
• The result is:
• a shortage of analysis techniques that can be applied
across neuronal systems
• limited interaction between research centres with
complementary expertise
Data in Science
• Bowker’s “Standard Scientific Model”
1. Collect data
2. Publish papers
3. Gradually lose the original data
The New Knowledge Economy & Science & Technology Policy, G.C. Bowker
• Problems:
– papers often draw conclusions from data that is not
published
– inability to replicate experiments
– data cannot be re-used
Codes in Science
• Three stages for codes
1. Write code and apply to data
2. Publish papers
3. Gradually lose the original codes
• Problems:
– papers often draw conclusions from codes that are
not published
– inability to replicate experiments
– codes cannot be re-used
Plan
• Neuroinformatics - a challenging e-science application
• CARMEN – addressing the challenges
• Cloud Computing for e-science
– Lessons we’ve Learnt
• The Promise of Commercial Clouds
Focus on Neural Activity
raw voltage signal data typically collected
using single or multi-electrode array
recording
neurone 1
neurone 2
neurone 3
cracking the neural code
Epilepsy Exemplar
Data analysis guides surgeon during operation
Further analysis provides evidence
WARNING!
The next 2 Slides show an exposed human brain
CARMEN
enables sharing and
collaborative exploitation of
data, analysis code and
expertise that are not
physically collocated
CARMEN Project
UK EPSRC e-Science Pilot
$7M (2006-10)
20 Investigators
Stirling
St. Andrews
Newcastle
Manchester
York
Sheffield
Leicester
Warwick
Cambridge
Plymouth
Imperial
Industry & Associates
CARMEN e-Science Requirements
• Store
– very large quantities of data (100TB+)
• Analyse
– suite of neuroinformatics services
– support data intensive analysis
• Automate
– workflow
• Share
– under user-control
Background: North East Regional e-Science Centre
• 25 Research Projects across many domains:
• Bioinformatics, Ageing & Health, Neuroscience, Chemical
Engineering, Transport, Geomatics, Video Archives, Artistic
Performance Analysis, Computer Performance Analysis,....
• Same key needs:
Share
Automate
Analyse
Store
Result: e-Science Central
• Integrated Store-Analyse-Automate-Share infrastructure
• Web-based
• Generic
– CARMEN neuroinformatics & chemistry as pilots
Science Cloud Architecture
[Diagram labels]
• Access over Internet (typically via browser)
• Upload data & services
• Run analyses
• Data storage and analysis
Cloud Services Continuum (based on Robert Anderson)
http://et.cairene.net/2008/07/03/cloud-services-continuum/
• Software (SaaS): Google Apps, Salesforce.com
• Platform (PaaS): Google AppEngine, Microsoft Azure
• Infrastructure (IaaS): Amazon EC2 & S3
Science Cloud Options
[Diagram: two alternative stacks, with Users and Service Developers shown as actors]
• Science App 1 .... Science App n, directly on Cloud Infrastructure: Storage & Compute
• Science App 1 .... Science App n, on a Science Platform, on Cloud Infrastructure: Storage & Compute
CARMEN Cloud
[Diagram: Browsers & Rich Clients access the following components]
• Security
• Workflow
• Filestore with Pattern Search
• Database
• Workflow Enactment
• Metadata
• Service Repository
• Processing
Editing and Running a Workflow on the Web
Workflow
Result File
Viewing the output of Workflow Runs
Viewing results
Blogs and links
Communicating Results
Linking to
results & workflows
What we learnt: Moving into a Cloud
• Moving existing technologies into a cloud can be difficult
– some can’t run in a Cloud at all
Raw Data Exploration with Signal Data Explorer
What we learnt : Scalability
• Clouds offer the potential for scalability
– grab compute power only when needed
• But developers have to write scalable code
– for Infrastructure as a Service Clouds
Dynasoar: Dynamic Deployment
[Diagram: Consumer (C), Web Service Provider (WSP), Service Repository (SR), and a Host Provider with node 1 (s2, s5), node 2, … node n (s2)]
A request to s4:
1: request (req)
2: service fetch & deploy
3: response (res)
The deployed service remains in place and can be re-used
- unlike job scheduling
Dynasoar
[Diagram: Consumer (C), Web Service Provider (WSP), and a Host Provider with node 1 (s2, s5), node 2, … node n (s2); request (req) and response (res) flows]
A request for s2 is routed to an existing deployment of the service
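The two Dynasoar cases above (fetch and deploy on the first request, then route later requests to the existing deployment) can be sketched roughly as follows. This is only an illustration, not Dynasoar's actual implementation: the `HostProvider` class, its `handle` method, and the "least-loaded node" placement rule are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class HostProvider:
    """Hypothetical stand-in for a Host Provider managing a pool of nodes."""
    repository: dict                           # service name -> callable (plays the Service Repository)
    nodes: dict = field(default_factory=dict)  # node name -> {service name: callable}
    deployments: int = 0                       # number of fetch-and-deploy operations so far

    def handle(self, service, payload):
        """Route a request; deploy the service first if no node hosts it yet."""
        # Case 1: the service is already deployed, so route to that deployment.
        for node, deployed in self.nodes.items():
            if service in deployed:
                return node, deployed[service](payload)
        # Case 2: fetch the service from the repository and deploy it.
        # Assumption: pick the node currently hosting the fewest services.
        node = min(self.nodes, key=lambda n: len(self.nodes[n]))
        self.nodes[node][service] = self.repository[service]
        self.deployments += 1
        # The deployment stays in place, so later requests take Case 1.
        return node, self.nodes[node][service](payload)


# Toy services standing in for s2, s4 and s5 from the slides.
repo = {"s2": lambda x: x * 2, "s4": lambda x: x + 4, "s5": lambda x: x - 5}
provider = HostProvider(
    repository=repo,
    nodes={
        "node 1": {"s2": repo["s2"], "s5": repo["s5"]},
        "node 2": {},
        "node n": {"s2": repo["s2"]},
    },
)
first = provider.handle("s4", 1)    # triggers fetch & deploy
second = provider.handle("s4", 2)   # re-uses the existing deployment
```

The contrast with job scheduling is visible in `deployments`: it increases only on the first request for a service, not on every request.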
Adaptive Dynamic Deployment with Dynasoar
[Chart: response time (seconds) and processors in pool, plotted against arrival rate (messages per second)]
Chart annotations:
• Adding processors as you need them optimises resources and saves money in pay-as-you-go clouds
• Commercial pay-as-you-go clouds would allow us to avoid this limit
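A minimal sketch of the adaptive idea, assuming a simple threshold rule: add a processor when the observed response time exceeds a target, and release one when it falls well below. The function name `adjust_pool`, the thresholds, and the release rule are assumptions made for illustration; the slides do not state the policy Dynasoar actually uses. Passing `max_size=None` models a pay-as-you-go cloud with no fixed pool limit.

```python
def adjust_pool(pool_size, response_time, target, min_size=1, max_size=None):
    """Return the new pool size after one observation of response time.

    pool_size      current number of processors in the pool
    response_time  latest observed response time (seconds)
    target         response time we are trying to stay under (seconds)
    max_size       fixed pool limit, or None for an unbounded (pay-as-you-go) pool
    """
    can_grow = max_size is None or pool_size < max_size
    if response_time > target and can_grow:
        return pool_size + 1          # too slow: grab one more processor
    if response_time < target / 2 and pool_size > min_size:
        return pool_size - 1          # comfortably fast: release a processor
    return pool_size                  # otherwise leave the pool unchanged


# With a fixed pool the controller stalls at the limit;
# without a limit it keeps adding processors while responses are slow.
capped = adjust_pool(4, response_time=30.0, target=20.0, max_size=4)
uncapped = adjust_pool(4, response_time=30.0, target=20.0)
```

Releasing processors when load drops is what would save money under pay-as-you-go pricing, since idle processors are no longer paid for.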
Hot Off the Press..
• Recent experiments with Microsoft Azure Cloud
– running Chemical analyses
– Silverlight UI
Thanks to:
- Paul Appleby & Team at the Microsoft Technology Centre, Reading
- & MS e-Science Group
Microsoft Azure Cloud for e-Science Demo
Why are Commercial Clouds Important: Before
Research
1. Have good idea
2. Write proposal
3. Wait 6 months
4. If successful, wait 3 months
5. Install Computers
6. Start Work
Science Start-ups
1. Have good idea
2. Write Business Plan
3. Ask VCs to fund
4. If successful..
5. Install Computers
6. Start Work
Why Use Commercial Clouds:
1. Have good idea
2. Grab nodes from Cloud provider
3. Start Work
4. Pay for what you used
• also scalability, cost, sustainability
Commercial Clouds to the Rescue?
• Focus currently on infrastructure as a service
• But, this is only part of the stack
• Can we have pay-as-you-go Science Cloud Platforms?
A Sustainable Science Cloud?
Problem: delivering the e-science platform
e-Science Central
www.inkspotscience.com
[Diagram: layered stack, top to bottom]
• Science App 1 .... Science App n
• Science Platform as a Service?
• Cloud Infrastructure: Storage & Compute (Commercial Clouds)
Summary: e-Science Central & CARMEN
e-Science Central / CARMEN combines:
• Software as a Service
– Web based
– Works anywhere
• Cloud Computing
– Dynamic Resource Allocation
– Pay-as-you-Go*
• Social Networking
– Controlled Sharing
– Collaboration
– Communities
Summary
• e-Science Central
– Store-Analyse-Automate-Share e-science platform
– Adding content from a range of domains
• CARMEN is piloting this approach for neuroinformatics
• Cloud computing can revolutionise e-science
– reduce time from idea to realisation