Ian Foster www.ci.anl.gov
www.ci.uchicago.edu
The data deluge
Genomic sequencing output x2 every 9 months
>300 public centers
MACHO et al.: 1 TB
Palomar: 3 TB
2MASS: 10 TB
GALEX: 30 TB
Sloan: 40 TB
Pan-STARRS: 40,000 TB
100,000 TB
1330 molec. bio databases
Nucleic Acids Research (96 in Jan 2001)
2004: 36 TB
2012: 2,300 TB
Climate model intercomparison project (CMIP) of the IPCC
Big science has achieved big successes
LIGO: 1 PB data in last science run, distributed worldwide
Robust production solutions
Substantial teams and expense
Sustained, multi-year effort
Application-specific solutions, built on common technology
OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs
But small science is struggling
More data, more complex data
Ad-hoc solutions
Inadequate software, hardware
Data plan mandates
Medium-scale science struggles too!
Dark Energy Survey receives 100,000 files each night in Illinois
They transmit files to Texas for analysis … then move results back to Illinois
Process must be reliable, routine, and efficient
The cyberinfrastructure team is not large
Blanco 4m on Cerro Tololo (Image credit: Roger Smith/NOAO/AURA/NSF)
The challenge of staying competitive
"Well, in our country," said Alice …
"you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing.”
"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
Current approaches are unsustainable
• Small laboratories
  – PI, postdoc, technician, grad students
  – Estimate 5,000 across US university community
  – Average ill-spent/unmet need of 0.5 FTE/lab?
• Medium-scale projects
  – Multiple PIs, a few software engineers
  – Estimate 500 across US university community
  – Average ill-spent/unmet need of 3 FTE/project?
• Total 4000 FTE: at ~$100K/FTE => $400M/yr
• Plus computers, storage, opportunity costs, …
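The estimate above is simple arithmetic: 5,000 labs × 0.5 FTE plus 500 projects × 3 FTE gives 2,500 + 1,500 = 4,000 FTE, and 4,000 FTE × ~$100K/FTE gives ~$400M/yr. A minimal sketch that reproduces it, using only the slide's own (admittedly rough) estimates; the function names are illustrative.

```python
# Back-of-envelope estimate of ill-spent/unmet research IT effort.
# All inputs are the slide's own rough estimates, not measured data.
SMALL_LABS = 5_000
FTE_PER_LAB = 0.5
MEDIUM_PROJECTS = 500
FTE_PER_PROJECT = 3
COST_PER_FTE = 100_000  # US dollars per FTE per year (approximate)


def total_fte() -> float:
    """Unmet or ill-spent effort summed over labs and projects."""
    return SMALL_LABS * FTE_PER_LAB + MEDIUM_PROJECTS * FTE_PER_PROJECT


def annual_cost() -> float:
    """Dollar cost per year of that effort."""
    return total_fte() * COST_PER_FTE


if __name__ == "__main__":
    print(f"{total_fte():,.0f} FTE, ${annual_cost() / 1e6:,.0f}M/yr")
```

Changing any input (for example, a different FTE cost) shows how sensitive the headline $400M/yr figure is to these assumptions.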
And don’t forget administrative costs
42% of the time spent by an average PI on a federally funded research project was reported to be expended on administrative tasks related to that project rather than on research
(Federal Demonstration Partnership faculty burden survey, 2007)
You can run a company from a coffee shop
Because businesses outsource their IT
Web presence
Email (hosted Exchange)
Calendar
Telephony (hosted VOIP)
Human resources and payroll
Accounting
Customer relationship mgmt
And often their large-scale computing too
Web presence
Email (hosted Exchange)
Calendar
Telephony (hosted VOIP)
Human resources and payroll
Accounting
Customer relationship mgmt
Data analytics
Content distribution
Infrastructure as a Service (IaaS)
Let’s rethink how we provide research IT
Accelerate discovery and innovation worldwide by providing research IT as a service
• Leverage software-as-a-service to provide millions of researchers with unprecedented access to powerful tools
• Enable a massive shortening of cycle times in time-consuming research processes
• Reduce research IT costs dramatically via economies of scale
Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …
Data movement can be surprisingly difficult
Discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, determine required credentials, configure protocols, detect and respond to failures, determine expected performance, determine actual performance, identify, diagnose, and correct network misconfigurations, integrate with file systems, …
It took 2 weeks and much help from many people to move 10 TB between California and Tennessee. (2007 BES report)
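One item on that list, "detect and respond to failures", is commonly handled by retrying a failed operation after a growing delay. The sketch below shows exponential backoff under stated assumptions: transient failures surface as Python's OSError, and `with_retries` and `flaky_copy` are hypothetical names for illustration, not Globus Online code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); after each transient failure (OSError), wait base_delay * 2**attempt
    seconds and try again. Re-raise the last error once max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except OSError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Usage: a hypothetical copy step that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_copy() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")  # ConnectionError subclasses OSError
    return "done"


delays: list = []
result = with_retries(flaky_copy, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Injecting `sleep` keeps the example fast and testable; a real transfer tool would also need to resume partially copied files rather than restart them, which this sketch does not attempt.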
Globus Online’s SaaS/Web 2.0 architecture
Web interface
HTTP REST interface
POST https://transfer.api.globusonline.org/v0.10/transfer <transfer-doc>
Fire-and-forget data movement
Automatic fault recovery
High performance
No client software install
Across multiple security domains
Command line interface:
    ls alcf#dtn:/
    scp alcf#dtn:/myfile \
        nersc#dtn:/myfile
OpenID
OAuth
Shibboleth
(Hosted on)
GridFTP servers
FTP servers
Other protocols:
HTTP, WebDAV, SRM, …
Globus Connect on local computers
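To make the two interfaces concrete, the sketch below parses the CLI-style `endpoint:/path` arguments (such as `alcf#dtn:/myfile`) and wraps them in a POST to the REST URL shown above. It is a hedged illustration only: the JSON field names are hypothetical and not the real v0.10 transfer-document schema, authentication is omitted, and the request is built but never sent.

```python
import json
import urllib.request
from typing import NamedTuple

# URL taken from the slide; the request body below is hypothetical.
API_URL = "https://transfer.api.globusonline.org/v0.10/transfer"


class EndpointPath(NamedTuple):
    endpoint: str  # e.g. "alcf#dtn"
    path: str      # e.g. "/myfile"


def parse_endpoint_path(spec: str) -> EndpointPath:
    """Split an 'owner#name:/path' string at the first colon (assumes no colon in the endpoint name)."""
    endpoint, sep, path = spec.partition(":")
    if not sep or "#" not in endpoint or not path.startswith("/"):
        raise ValueError(f"expected owner#name:/path, got {spec!r}")
    return EndpointPath(endpoint, path)


def build_transfer_request(src: str, dst: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a hypothetical transfer document.
    Field names are illustrative, not the real v0.10 schema; authentication is omitted."""
    s, d = parse_endpoint_path(src), parse_endpoint_path(dst)
    doc = {"items": [{
        "source_endpoint": s.endpoint, "source_path": s.path,
        "destination_endpoint": d.endpoint, "destination_path": d.path,
    }]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_transfer_request("alcf#dtn:/myfile", "nersc#dtn:/myfile")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually submit it; the fire-and-forget model means the client would then poll or be notified of completion rather than babysit the copy.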
Example application: UC sequencing facility
Mac using Globus Connect
Delivery of data to customer
Mount drive iBi File Server
Sequencing instrument
iBi general-purpose compute cluster
Sequencing-specific compute cluster
Statistics and user feedback
• Launched November 2010
• >1700 users registered
• >500 TB user data moved
• >30 million user files moved
• >150 endpoints registered
• Widely used on TeraGrid/XSEDE; other centers & facilities; internationally
• >20x faster than SCP
• Faster than hand-tuned
“Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.”
“I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.”
“Transferred 28 MB in 20 minutes instead of 61 hours. Makes these global climate simulations manageable.”
Moving 586 Terabytes in two weeks
Monitoring provides deep visibility
[Figure: monitoring plot on a kilobyte-to-terabyte scale, annotated "20 Terabytes in less than one day" and "20 Gigabytes in more than two days"]
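The volumes and durations quoted here and on the "586 Terabytes in two weeks" slide translate into average rates as follows, assuming decimal units (1 TB = 10^12 bytes). 586 TB in 14 days averages roughly 484 MB/s (about 3.9 Gb/s); 20 TB in under a day implies at least about 231 MB/s; 20 GB in over two days implies at most about 116 KB/s, a gap of at least 2,000x between the two annotated transfers. A minimal sketch of that arithmetic:

```python
# Average-throughput arithmetic for the transfer figures quoted on these slides.
# Assumes decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
TB = 10**12
GB = 10**9
DAY = 86_400  # seconds


def average_rate(num_bytes: float, seconds: float) -> float:
    """Average throughput in bytes per second."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return num_bytes / seconds


rates = {
    "586 TB in two weeks": average_rate(586 * TB, 14 * DAY),
    "20 TB in one day": average_rate(20 * TB, DAY),       # slide says "less than", so a lower bound
    "20 GB in two days": average_rate(20 * GB, 2 * DAY),  # slide says "more than", so an upper bound
}

for label, rate in rates.items():
    print(f"{label}: {rate / 1e6:,.3f} MB/s ({rate * 8 / 1e9:.3f} Gb/s)")
```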
Common research data management steps
• Dark Energy Survey
• Galaxy genomics
• LIGO observatory
• SBGrid structural biology consortium
• NCAR climate data applications
• Land use change; economics
We have choices of where to compute
• Campus systems
  – First target for many researchers
• XSEDE supercomputers
  – 220,000 cores, peer-reviewed awards
  – Optimized for scientific computing
• Open Science Grid
  – 60,000 cores; high throughput
• Commercial cloud providers
  – Instant access for small tasks
  – Expensive for big projects
Users insist that they need everything connected
Towards “research IT as a service”
Scientific data management as a service: GO-User, GO-Transfer, GO-Team, GO-Collaborate, GO-Store, GO-Catalog, GO-Compute, GO-Galaxy
Research data management as a service
Today
• GO-User
  – Credentials and other profile information
• GO-Transfer
  – Data movement
• GO-Team
  – Group membership
Fall
• GO-Collaborate
  – Connect to collaborative tools: Jira, Confluence, …
• GO-Store (prototype)
  – Access to campus, cloud, XSEDE storage
• GO-Catalog
  – On-demand metadata catalogs
• GO-Compute
  – Access to computers
• GO-Galaxy
  – Share, create, run workflows
SaaS services in action: The XSEDE vision
[Figure: academic institution, XSEDE service provider, commercial provider, data provider, and Open Science Grid, each linked through a standard interface to User, Team, Catalog, Transfer, Compute, … services; InCommon]
Data analysis as a service: Early steps
Securely and reliably:
1. Assemble code
2. Find computers
3. Deploy code
4. Run program
5. Access data
6. Store data
7. Record workflow
8. Reuse workflow
We have built such systems for biological, environmental, and economics researchers
[Figure: VM image, app code, workflow, Galaxy, Condor, and data store, annotated with steps [1, 2], [3, 4], [5, 6], and [7, 8]]
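Steps 7 and 8 (record workflow, reuse workflow) are what turn a one-off analysis into a repeatable one. A minimal sketch, assuming a workflow can be captured as an ordered list of named steps with string parameters serialized to JSON; this is not how Galaxy actually stores workflows, and the step names (`deploy_code`, `run_program`) and the VM image and path values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str               # hypothetical step name, e.g. "deploy_code"
    params: Dict[str, str]  # string parameters passed to the step's action


@dataclass
class WorkflowRecord:
    steps: List[Step] = field(default_factory=list)

    def record(self, name: str, **params: str) -> None:
        """Step 7: append a step as it is performed."""
        self.steps.append(Step(name, params))

    def to_json(self) -> str:
        return json.dumps([asdict(s) for s in self.steps])

    @classmethod
    def from_json(cls, text: str) -> "WorkflowRecord":
        return cls([Step(d["name"], d["params"]) for d in json.loads(text)])

    def replay(self, actions: Dict[str, Callable[..., None]]) -> None:
        """Step 8: re-run each recorded step, in order, via a caller-supplied action table."""
        for step in self.steps:
            actions[step.name](**step.params)


wf = WorkflowRecord()
wf.record("deploy_code", image="analysis-vm")
wf.record("run_program", path="/data/sample1")
saved = wf.to_json()

log: List[str] = []
WorkflowRecord.from_json(saved).replay({
    "deploy_code": lambda image: log.append(f"deploy {image}"),
    "run_program": lambda path: log.append(f"run {path}"),
})
print(log)
```

Keeping the actions outside the record means the same saved workflow can be replayed against different computers or data stores, which is the point of reuse.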
SaaS economics: A quick tutorial
• Lower per-user cost (x10?) via aggregation onto common infrastructure
  – $400M/yr → $40M/yr?
• Initial “cost trough” due to fixed costs
• Per-user revenue permits positive return to scale
• Further reduce per-user cost over time
[Figure: $ versus time, starting from 0]
X10 reduction in per-user cost:
  – $50K → $5K/yr per lab
  – $300K → $30K/yr per project
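The "cost trough" follows from fixed costs: a service loses money until enough users pay more than their marginal cost. A minimal single-period sketch, assuming a flat per-user price and per-user operating cost; the $5K price echoes the slide's per-lab target, while the $10M fixed cost and $2K per-user cost are invented for illustration.

```python
def net_position(users: int, fixed_cost: int, price: int, unit_cost: int) -> int:
    """Net result for one period: per-user margin times users, minus fixed cost.
    A negative value means the service is still in its 'cost trough'."""
    return users * (price - unit_cost) - fixed_cost


def break_even_users(fixed_cost: int, price: int, unit_cost: int) -> int:
    """Smallest user count with a non-negative net position."""
    margin = price - unit_cost
    if margin <= 0:
        raise ValueError("price must exceed per-user cost")
    return -(-fixed_cost // margin)  # integer ceiling division


# Hypothetical numbers: $5K/yr per lab echoes the slide's target; the fixed and
# per-user costs are invented for illustration.
FIXED_COST, PRICE, UNIT_COST = 10_000_000, 5_000, 2_000
n = break_even_users(FIXED_COST, PRICE, UNIT_COST)
print(n, net_position(n - 1, FIXED_COST, PRICE, UNIT_COST), net_position(n, FIXED_COST, PRICE, UNIT_COST))
```

Past break-even, each additional user adds a positive margin, which is the "positive return to scale" on the slide; lowering `unit_cost` over time both deepens that margin and moves break-even earlier.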
A national cyberinfrastructure strategy?
• To provide more capability for more people at less cost …
• Create infrastructure
  – Robust and universal
  – Economies of scale
  – Positive returns to scale
• Via the creative use of
  – Aggregation (“cloud”)
  – Federation (“grid”)
[Figure: small and medium laboratories (L) and projects (P) sharing research data management; collaboration, computation; and research administration, delivered as a service (aaS)]
Acknowledgments
• Colleagues at UChicago and Argonne
  – Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik, and others listed at www.globusonline.org/about/goteam/
• Carl Kesselman and other colleagues at other institutions
• Participants in the recent ICiS workshop on “Human-Computer Symbiosis: 50 Years On”
• NSF OCI and MPS; DOE ASCR; and NIH for support
For more information
• www.globusonline.org; @globusonline on Twitter
• Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June):70-73, 2011.
• Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K. and Tuecke, S. Globus Online: Radical Simplification of Data Movement via SaaS. Communications of the ACM, 2011.
Thank you!
foster@uchicago.edu
www.globusonline.org
@globusonline