Distributed Computing Economics

Distributed Computing
Economics
Jim Gray
Microsoft Research
gray@microsoft.com
Presentation To Microsoft Venture
Capital Summit
28 April 2004
Distributed Computing
Economics
Why is Seti@Home a great idea?
Why is Napster a great deal?
Why is the Computational Grid uneconomic?
When does computing on demand work?
What is the “right” level of abstraction?
Is the Access Grid the real killer app?
Based on: Distributed Computing Economics,
Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24
http://research.microsoft.com/research/pubs/view.aspx?tr_id=655
Computing Is Free
Computers cost 1k$ (if you shop right)
(yes, there are 1μ$ to 1M$ computers, but..)
So 1 cpu day = 1$ (computers last 3 years)
If you pay the phone bill, internet bandwidth
costs 50…500$/Mbps/month (not including
routers and management)
So 1GB costs 1$ to send and 1$ to receive
Caveat: All numbers rounded to nearest factor of 3.
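The two unit prices on this slide can be rederived with a few lines of arithmetic. The sketch below is only a back-of-envelope check, not the author's calculation: the 30-day month and the fully used link are assumptions, and the function names are illustrative.

```python
# Back-of-envelope check of the slide's unit costs.
# Assumptions (not in the slides): a 30-day month, and a link that is
# kept completely busy for the whole month.
SECONDS_PER_DAY = 86_400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY


def cpu_day_cost(computer_price_usd=1_000.0, lifetime_years=3.0):
    """Dollars per CPU day when the purchase price is spread over the lifetime."""
    return computer_price_usd / (lifetime_years * 365)


def wan_cost_per_gb(usd_per_mbps_month):
    """Dollars per GB moved over a 1 Mbps link that is saturated all month."""
    bytes_per_month = 1e6 / 8 * SECONDS_PER_MONTH  # 1 Mbps = 125,000 bytes/s
    return usd_per_mbps_month / (bytes_per_month / 1e9)


print(round(cpu_day_cost(), 2))          # about 0.91 $/cpu day
print(round(wan_cost_per_gb(50), 2))     # about 0.15 $/GB at 50$/Mbps/month
print(round(wan_cost_per_gb(500), 2))    # about 1.54 $/GB at 500$/Mbps/month
```

A real link is rarely saturated, which pushes the per-GB price up, so the slide's 1$/GB sits within its stated factor-of-3 rounding.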
Why Is Seti@Home A
Good Deal?
Send 300 KB:
Costs 3e-4$
User computes for ½ day: Benefit 5e-1$
ROI: 1500:1
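The ratio follows directly from the unit prices given earlier (1$ per GB sent, 1$ per cpu day). A minimal sketch of the arithmetic, with illustrative names:

```python
# Unit prices taken from the slides.
USD_PER_GB_SENT = 1.0
USD_PER_CPU_DAY = 1.0


def seti_roi(work_unit_bytes=300e3, cpu_days=0.5):
    """Return (cost, benefit, benefit/cost) for one Seti@Home work unit."""
    cost = work_unit_bytes / 1e9 * USD_PER_GB_SENT   # shipping the data
    benefit = cpu_days * USD_PER_CPU_DAY             # donated compute
    return cost, benefit, benefit / cost


cost, benefit, roi = seti_roi()
print(f"cost={cost:.0e}$ benefit={benefit}$ roi={roi:.0f}:1")
```

The computed ratio is about 1,667:1, which the slide rounds to 1500:1.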
Seti@Home
The world's most powerful computer
67 TF is sum of top 4 of Top 500
67 TF is 9x the number 2 system
67 TF more than the sum of systems 2...10
Seti@Home
http://setiathome.ssl.berkeley.edu/totals.html
26 April 2004
                             Total                           Last 24 Hours
Users                        5 M                             1,138
Results received             1.3 B                           1.5 M
Total CPU time               1.5 M years                     1,199 years
Floating Point Operations    5 E+21 flops (5 zetta flops)    6 E+18 FLOPS/day (67 TeraFLOPs)
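The 24-hour operation count and the 67 TeraFLOPs rate are the same quantity in different units. A quick conversion, as a sketch:

```python
SECONDS_PER_DAY = 86_400


def flops_per_day_to_teraflops(flops_per_day):
    """Convert floating point operations per day to TeraFLOPs (1e12 ops/s)."""
    return flops_per_day / SECONDS_PER_DAY / 1e12


print(round(flops_per_day_to_teraflops(6e18), 1))
```

The input 6 E+18 is itself a rounded figure, so the result comes out near 69 TeraFLOPs rather than exactly 67.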
Why Was Napster A
Good Deal?
Send 5 MB
costs 5e-3$
½ a penny per song
Both sender and receiver can afford it
Same logic powers web sites (Yahoo!...)
1e-3$/page view advertising revenue
1e-5$/page view cost of serving web page
100:1 ROI
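Both numbers on this slide come from the same unit prices. The sketch below reproduces them; the helper name is illustrative and the 1$/GB price is the slide deck's own figure.

```python
def transfer_cost_usd(num_bytes, usd_per_gb=1.0):
    """Cost of moving num_bytes at the slides' price of 1$ per GB."""
    return num_bytes / 1e9 * usd_per_gb


song_cost = transfer_cost_usd(5e6)   # one 5 MB song
page_roi = 1e-3 / 1e-5               # ad revenue per view / serving cost per view

print(f"song: {song_cost * 100:.1f} cents, page view ROI: {page_roi:.0f}:1")
```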
Computing Equivalents
1$ buys
1 day of cpu time
4 GB (fast) ram for a day
1 GB of network bandwidth
1 GB of disk storage for 3 years
10 M database accesses
10 TB of disk access (sequential)
10 TB of LAN bandwidth (bulk)
10 KWhrs == 4 days of computer time
Depreciating over 3 years, and there are about 1k days in 3 years.
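The list above works as a price table: divide each resource a job uses by what 1$ buys and add up the results. A minimal sketch, assuming the prices are additive; the dictionary keys and the sample job are hypothetical.

```python
# How much of each resource 1$ buys, copied from the slide.
UNITS_PER_DOLLAR = {
    "cpu_days": 1,
    "ram_gb_days": 4,
    "wan_gb": 1,
    "disk_gb_3yr": 1,
    "db_accesses": 10e6,
    "disk_seq_tb": 10,
    "lan_tb": 10,
    "kwh": 10,
}


def estimate_cost_usd(**usage):
    """Sum the dollar-equivalents of the resources a job consumes."""
    return sum(amount / UNITS_PER_DOLLAR[name] for name, amount in usage.items())


# Hypothetical job: 2 cpu days, 3 GB over the WAN, 20 M database accesses.
print(estimate_cost_usd(cpu_days=2, wan_gb=3, db_accesses=20e6))  # 7.0
```

Passing a resource name that is not in the table raises a `KeyError`, which keeps typos from silently costing nothing.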
Some Consequences
Beowulf networking is 10,000x cheaper than
WAN networking: factors of 10^5 matter
The cheapest and fastest way to move
Terabytes cross country is sneakernet
24 hours = 4 MB/s
50$ shipping vs 1,000$ wan cost
Sending 10PB CERN data via network is silly:
buy disk bricks in Geneva, fill them, ship them
TeraScale SneakerNet: Using Inexpensive Disks for Backup,
Archiving, and Data Exchange
Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg
Microsoft Technical Report, May 2002, MSR-TR-2002-54
http://research.microsoft.com/research/pubs/view.aspx?tr_id=569
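The 1,000$ WAN figure is the 1$/GB price applied to a terabyte, and the same arithmetic shows why the 10 PB CERN case looks silly. A sketch using decimal units (1 TB = 1e12 bytes), which is an assumption:

```python
TB = 1e12
PB = 1e15


def wan_cost_usd(num_bytes, usd_per_gb=1.0):
    """Network cost at the slides' price of 1$ per GB."""
    return num_bytes / 1e9 * usd_per_gb


print(f"1 TB over the WAN:  {wan_cost_usd(TB):,.0f}$ (vs 50$ shipping)")
print(f"10 PB over the WAN: {wan_cost_usd(10 * PB):,.0f}$")
```

At these prices the 10 PB transfer costs on the order of 10 M$ in bandwidth alone.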
Computational Grid
Economics
To the extent that the computational grid is like
Seti@Home or ZetaNet or Folding@home or… it is a
great thing
To the extent that the computational grid is MPI or data
analysis, it fails on economic grounds: move the
programs to the data, not the data to the programs
The Internet is not the cpu backplane
An alternate reality: Nearly free networking
Telcos go bankrupt and price=cost=0
Taxpayers pay your phone bill so price=0 and telcos receive
a BIG government subsidy
When To Export A Task
IF instruction density >
100,000 instructions/byte
AND remote computer is free (costs you nothing)
THEN ROI > 0
ELSE ROI < 0
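The rule above translates directly into a predicate. The sketch below uses the slide's threshold as stated; the function name, the argument names, and the 1e9 instructions/second rate in the example are assumptions made for illustration.

```python
# Threshold stated on the slide.
BREAK_EVEN_INSTRUCTIONS_PER_BYTE = 100_000


def should_export(instructions, bytes_moved, remote_cost_usd):
    """True when the slide's rule predicts ROI > 0 for shipping the task out."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    density = instructions / bytes_moved
    return density > BREAK_EVEN_INSTRUCTIONS_PER_BYTE and remote_cost_usd == 0


# Seti@Home-like task: half a day of CPU on 300 KB of data.
# Assumed rate (hypothetical): 1e9 instructions per second.
seti_instructions = 0.5 * 86_400 * 1e9
print(should_export(seti_instructions, 300e3, remote_cost_usd=0.0))   # True

# Data-analysis scan: about 1,000 instructions per byte over 1 GB.
print(should_export(1_000 * 1e9, 1e9, remote_cost_usd=0.0))           # False
```

The second case is the "move the programs to the data" situation: the work per byte is too low to pay for the network.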
Computing On Demand
Was called outsourcing/service bureaus in my youth.
CSC and IBM did it
It is not a new way of doing things: think payroll.
Payroll is a standard outsourced service
Now Hotmail, Salesforce.com, Oracle.com,…
Works for standard apps
COD works for commoditized services
Airlines outsource reservations. Banks
outsource ATMs
But Amazon, Amex, Wal-Mart, eTrade, eBay... Can’t
outsource their core competence
What’s The Right Abstraction Level For
Internet Scale Distributed Computing?
Disk block? No, too low
File? No, too low
Database? No, too low
Application? Yes, of course
Blast search
Google search
Send/Get eMail
Portals that federate astronomy archives
(http://skyQuery.Net/)
Web Services (.NET, EJB, OGSA) give this
abstraction level
Access Grid
Q: What comes after the telephone?
A: eMail?
A: Instant messaging?
Both seem retro: text & emoticons
Access Grid could revolutionize human
communication
But, it needs a new idea
Supercomputers You Use
Hotmail, Yahoo!, Google: ~10k servers
Amazon, Barnes&Noble
Expedia, Orbitz
Dell, HP,…
Service-oriented architectures
Not computing on demand, but
information on demand!
Poll
Is there a market for Supercomputers?
Yes, Google, Expedia, Hotmail,…
Is Computing On Demand a high-margin business?
I think not
Do you know the equivalent high-margin business?
Information on demand
Take Aways
Computing on demand is a service
business; probably not high margin;
questionable economics; think
LoudCloud
Distributed computing is coming,
but it is probably via Service Oriented
Architecture (SOA)
Web Services is the way to do SOA
Outline
Overview of Microsoft Research
Distributed Computing Economics
Q&A
The Cost Of Computing
Computers are NOT free!
IBM, HP, Dell make $billions
Capital Cost of a TpcC system
is mostly storage and
storage software (database)
IBM 32 cpu, 512 GB ram
2,500 disks, 43 TB
TpcC Cost Components DB2/AIX
(680,613 tpmC @ 11.13 $/tpmc available 11/08/03)
http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf
storage 61%
cpu/mem 29%
software 10%
A 7.5M$ super-computer
Total Data Center Cost:
40% capital & facilities 60% staff
(includes app development)
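The 7.5M$ figure is the throughput multiplied by the price per tpmC, and the percentage shares split it into dollar amounts. A small sketch of that arithmetic; the shares are the slide's rounded percentages, so the component amounts are approximate.

```python
TPMC = 680_613          # throughput from the TPC-C result
USD_PER_TPMC = 11.13    # price/performance from the same result

# Component shares as given on the slide (rounded percentages).
SHARES = {"storage": 0.61, "cpu/mem": 0.29, "software": 0.10}

total_usd = TPMC * USD_PER_TPMC
breakdown = {name: total_usd * share for name, share in SHARES.items()}

print(f"total     {total_usd / 1e6:5.2f} M$")
for name, usd in breakdown.items():
    print(f"{name:9s} {usd / 1e6:5.2f} M$")
```

The total comes to roughly 7.58 M$, with storage alone accounting for about 4.6 M$.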