Information Centric
Super Computing
Jim Gray
Microsoft Research
gray@microsoft.com
Talk at http://research.microsoft.com/~gray/talks
20 May 2003
Presentation to
Committee on the Future of Supercomputing
of the National Research Council's
Computer Science and Telecommunications Board
1
Committee Goal
… assess the status of supercomputing in the United States,
including the characteristics of relevant systems and architecture
research in government, industry, and academia and the
characteristics of the relevant market. The committee will
examine key elements of context--the history of supercomputing,
the erosion of research investment, the needs of government
agencies for supercomputing capabilities--and assess options for
progress. Key historical or causal factors will be identified. The
committee will examine the changing nature of problems
demanding supercomputing (e.g., weapons design, molecule
modeling and simulation, cryptanalysis, bioinformatics, climate
modeling) and the implications for systems design. It will seek to
understand the role of national security in the supercomputer
market and the long-term federal interest in supercomputing.
2
Summary: It’s the Software…
• Computing is Information centric
• Scientific computing is Beowulf computing
• Scientific computing is becoming Info-centric
• Adequate investment in files/OS/networking
• Underinvestment in Scientific Information management and visualization tools
• Computation Grid moves too much data; DataGrid (or App Grid) is the right concept
3
Thesis
• Most new information is digital
(and old information is being digitized)
• A Computer Science Grand Challenge:
  – Capture
  – Organize
  – Summarize
  – Visualize
  this information
• Optimize Human Attention as a resource
• Improve information quality
4
Information Avalanche
• The Situation
– We can record everything
– Everything is a LOT!
• The Good news
– Changes science, education, medicine, entertainment,….
– Shrinks time and space
– Can augment human intelligence
• The Bad News
– The end of privacy
– Cyber Crime / Cyber Terrorism
– Monoculture
• The Technical Challenges
– Amplify human intellect
– Organize, summarize and prioritize information
– Make programming easy
5
Super Computers
• You and others use every day:
  – Google, Inktomi, …
  – AOL, MSN, Yahoo!
  – Hotmail, MSN, …
  – eBay, Amazon.com, …
• IntraNets:
  – Wal-Mart
  – Federal Reserve
  – Amex
• All are more than 1 TFlops, more than 10 Tops
• All are more than 1 PB
They are ALL Information Centric
6
Q: How can I recognize a SuperComputer?
A: Costs 10M$
Gordon Bell's Seven Price Tiers
  10$:          wrist watch computers (sensors)
  100$:         pocket / palm computers (phone/camera)
  1,000$:       portable computers (tablet)
  10,000$:      personal computers (workstation)
  100,000$:     departmental computers (closet)
  1,000,000$:   site computers (glass house)
  10,000,000$:  regional computers (glass castle SC)
Super Computer / "Mainframe": costs more than 1M$
Must be an array of processors, disks, comm ports
7
Computing is Information Centric
that’s why they call it IT
• Programs capture, organize, abstract, filter,
present Information to people.
• Networks carry Information.
• The file is the wrong abstraction:
  Information is typed / schematized:
  words, pictures, sounds, arrays, lists, …
• Notice that none of the examples on the previous slide
  serve files – they serve typed information.
• Recommendation:
Increase Research investments
ABOVE the OS level
Information Management/Visualization
8
Summary: It’s the Software…
•
•
•
•
•
Computing is Information centric
Scientific computing is Beowulf computing
Scientific computing becoming Info-centric.
Adequate investment in files/OS/networking
Underinvestment in Scientific Information
management and visualization tools.
• Computation Grid moves too much data,
DataGrid (or App Grid) is right concept
9
Anecdotal Evidence,
Everywhere I go I see Beowulfs
• Clusters of PCs (or high-slice-price micros)
• True: I have not visited the Earth Simulator,
  but… Google, MSN, Hotmail, Yahoo!, NCBI, FNAL,
  Los Alamos, Cal Tech, MIT, Berkeley, NARO,
  Smithsonian, Wisconsin, eBay, Amazon.com,
  Schwab, Citicorp, Beijing, CERN, BaBar, NCSA,
  Cornell, UCSD, and of course NASA and Cal Tech
10
skip
Super Computing
The Top 10 of Top 500
Adapted from Top500 Nov 2002
Rank  Hardware         TeraFlops  Site            Cumulative TF
 1    NEC Earth-Sim       35.9    Earth Sim Ctr        36
 2    HP ASCI Q            7.7    LLNL                 44
 3    HP ASCI Q            7.7    LLNL                 51
 4    IBM ASCI White       7.2    LLNL                 59
 5    Intel/NetworX        5.7    LLNL                 64
 6    HP Alpha             4.5    PSC                  69
 7    HP Alpha             4.0    CEA                  73
 8    Intel/HPTi           3.3    NOAA                 76
 9    IBM SP2              3.2    HPCx                 79
10    IBM SP2              3.2    NCAR                 82
11
skip
Seti@Home
The world's most powerful computer
• 61 TF is the sum of the top 4 of the Top 500.
• 61 TF is 9x the number 2 system.
• 61 TF is more than the sum of systems 2..10.
Seti@Home
http://setiathome.ssl.berkeley.edu/totals.html
20 May 2003
                              Total           Last 24 Hours
Users                         4,493,731       1,900
Results received              886 M           1.4 M
Total CPU time                1.5 M years     1,514 years
Floating Point Operations     3 E+21 ops      5 E+18 FLOPS/day
                              (3 zetta ops)   = 61.3 TeraFLOPs
12
skip
And…
• Google:
– 10k cpus, 2PB,… as of 2 years ago
– 40 Tops
• AOL, MSN, Hotmail, Yahoo!, …
  – all ~10K cpus
  – all have ~1 PB … 10 PB storage
• Wal-Mart is a PB poster child
• Clusters / Beowulf everywhere you go.
13
Scientific == Beowulf (clusters)
• Scientific / Beowulf / Grid computing is 70's-style computing:
  – process / file / socket
  – byte arrays, no data schema or semantics
  – batch job scheduling
  – manual parallelism (MPI)
  – poor / no Information management support
  – poor / no Information visualization toolkits
• Recommendation:
Increase investment in Info-Management
Increase investment in Info-Visualization
14
Summary: It’s the Software…
•
•
•
•
•
Computing is Information centric
Scientific computing is Beowulf computing
Scientific computing becoming Info-centric.
Adequate investment in files/OS/networking
Underinvestment in Scientific Information
management and visualization tools.
• Computation Grid moves too much data,
DataGrid (or App Grid) is right concept
15
The Evolution of Science
• Observational Science
– Scientist gathers data by direct observation
– Scientist analyzes Information
• Analytical Science
– Scientist builds analytical model
– Makes predictions.
• Computational Science
– Simulate analytical model
– Validate model and make predictions
• Science-Informatics: Information Exploration Science
  – Information captured by instruments
    or generated by simulators
  – Processed by software
  – Placed in a database / files
  – Scientist analyzes database / files
16
How Are Discoveries Made?
Adapted from slide by George Djorgovski
• Conceptual Discoveries: e.g., Relativity, QM, Brane World, Inflation …
  Theoretical, may be inspired by observations
• Phenomenological Discoveries: e.g., Dark Matter, QSOs, GRBs,
  CMBR, Extrasolar Planets, Obscured Universe …
  Empirical, inspire theories, can be motivated by them
(Diagram: New Technical Capabilities feed Observational / Phenomenological Discoveries and Theory)
• Phenomenological discoveries:
  explore parameter space,
  make new connections (e.g., multi-)
• Understanding of complex phenomena requires
  complex, information-rich data (and simulations?)
17
The Information Avalanche
both comp-X and X-info generating petabytes
• Comp-Science generating an Information avalanche:
  comp-chem, comp-physics, comp-bio, comp-astro,
  comp-linguistics, comp-music, comp-entertainment, comp-warfare
• Science-Info generating an Information avalanche:
  bio-info, astro-info, text-info, …
18
Information Avalanche Stories
• Turbulence: 100 TB simulation
then mine the Information
• BaBar: Grows 1TB/day
2/3 simulation Information
1/3 observational Information
• CERN: LHC will generate 1GB/s
10 PB/y
• VLBA (NRAO) generates 1GB/s today
• NCBI: "only ½ TB" but doubling each year,
  a very rich dataset.
• Pixar: 100 TB/Movie
19
Astro-Info
World Wide Telescope
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
• Premise: Most data is (or could be) online
• The Internet is the world's best telescope:
  – It has data on every part of the sky
  – In every measured spectral band: optical, x-ray, radio, …
  – As deep as the best instruments (of 2 years ago)
  – It is up when you are up.
    The "seeing" is always great
    (no working at night, no clouds, no moons, no …)
  – It's a smart telescope:
    links objects and data to the literature on them.
20
Why Astronomy Data?
(Sky survey images shown: IRAS 25µm, IRAS 100µm, 2MASS 2µm, DSS Optical, WENSS 92cm, NVSS 20cm, ROSAT ~keV, GB 6cm)
• It has no commercial value
  – No privacy concerns
  – Can freely share results with others
    (across companies, with university researchers)
  – Great for experimenting with algorithms
• It is real and well documented
  – High-dimensional data (with confidence intervals)
  – Spatial data
  – Temporal data
• Many different instruments from
  many different places and
  many different times
• But it's the same universe,
  so comparisons make sense and are interesting.
• Federation is a goal
• There is a lot of it (petabytes)
• Great sandbox for data mining algorithms
• Great way to teach both
  Astronomy and Computational Science
21
Summary: It’s the Software…
•
•
•
•
•
Computing is Information centric
Scientific computing is Beowulf computing
Scientific computing becoming Info-centric.
Adequate investment in files/OS/networking
Underinvestment in Scientific Information
management and visualization tools.
• Computation Grid moves too much data,
DataGrid (or App Grid) is right concept
22
What X-info Needs from us (cs)
(Diagram, not drawn to scale: Scientists and Miners work with Science Data & Questions,
Data Mining Algorithms, Question & Answer interfaces, and Visualization Tools;
Plumbers below provide the Database to store data and execute queries.)
23
Data Access is hitting a wall
FTP and GREP are not adequate
• You can GREP 1 MB in a second
• You can GREP 1 GB in a minute
• You can GREP 1 TB in 2 days
• You can GREP 1 PB in 3 years
• You can FTP 1 MB in 1 second
• You can FTP 1 GB in a minute (≈ 1 $/GB)
• … 2 days and 1K$ for a TB
• … 3 years and 1M$ for a PB
• Oh, and 1 PB is ~5,000 disks
• At some point you need
  – indices to limit search
  – parallel data search and analysis
• This is where databases can help (see the sketch below)
24
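A rough sketch of why parallel search and indices become mandatory at this scale. The 5,000-disks-per-petabyte figure is from the slide above; the 50 MB/s per-disk streaming rate is an assumed round number, not from the talk:

    # Petabyte scan times under simple assumptions.
    # Assumed: ~50 MB/s sequential read per disk (illustrative), ~5,000 disks per PB (slide above).
    PB = 1e15

    def scan_seconds(nbytes, disks, mb_per_sec_per_disk=50):
        return nbytes / (disks * mb_per_sec_per_disk * 1e6)

    print(f"1 PB on 1 disk:      ~{scan_seconds(PB, 1) / 86400 / 365:.1f} years")
    print(f"1 PB on 5,000 disks: ~{scan_seconds(PB, 5000) / 3600:.1f} hours")

    # An index avoids scanning altogether: a B-tree lookup touches O(log N) pages,
    # so finding one record among 10^12 rows costs only tens of page reads.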
Next-Generation Data Analysis
• Looking for
– Needles in haystacks – the Higgs particle
– Haystacks: Dark matter, Dark energy
• Needles are easier than haystacks
• Global statistics have poor scaling
  – Correlation functions are N², likelihood techniques N³
• As data and processing grow at the same rate,
  we can only keep up with N log N (illustrated below)
• A way out?
– Discard notion of optimal (data is fuzzy, answers are approximate)
– Don’t assume infinite computational resources or memory
• Requires combination of statistics & computer science
• Recommendation:
invest in data mining research
both general and domain-specific.
25
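A toy numeric illustration of the scaling argument above (arbitrary units; the point is the growth rates, not the absolute costs):

    import math

    # Relative work for N logN, N^2, and N^3 algorithms as the dataset grows 10x at a time.
    for n in (1e6, 1e7, 1e8, 1e9):
        print(f"N={n:.0e}:  N*logN={n * math.log2(n):.1e}   N^2={n ** 2:.1e}   N^3={n ** 3:.1e}")

    # If compute grows roughly in step with the data (10x more data, ~10x more cycles),
    # that covers N logN, falls 10x short for N^2, and 100x short for N^3.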
Analysis and Databases
• Statistical analysis deals with
  – Creating uniform samples
  – Data filtering & censoring bad data
  – Assembling subsets
  – Estimating completeness
  – Counting and building histograms
  – Generating Monte-Carlo subsets
  – Likelihood calculations
  – Hypothesis testing
• Traditionally these are performed on files
• Most of these tasks are much better done inside a database
close to the data.
• Move Mohamed to the mountain,
not the mountain to Mohamed.
• Recommendation: Invest in database research:
extensible databases: text, temporal, spatial, …
data interchange, parallelism,
indexing, query optimization
26
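A minimal sketch of "move the computation to the data": a histogram computed by a GROUP BY inside the database, so only (bin, count) pairs cross the interface instead of every row. The table and column names here are made up for illustration; any SQL engine would do.

    import sqlite3

    # Hypothetical object catalog (illustrative schema, not from the talk).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, magnitude REAL)")
    db.executemany("INSERT INTO objects (magnitude) VALUES (?)",
                   [(14.0 + 0.01 * i,) for i in range(1000)])

    # Histogram of magnitudes in 0.5-mag bins, computed where the data lives.
    rows = db.execute("""
        SELECT CAST(magnitude / 0.5 AS INTEGER) * 0.5 AS bin, COUNT(*) AS n
        FROM objects
        GROUP BY bin
        ORDER BY bin
    """).fetchall()

    for bin_start, count in rows:
        print(f"{bin_start:5.1f} .. {bin_start + 0.5:5.1f}: {count}")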
Goal:
Easy Data Publication & Access
• Augment FTP with data query:
Return intelligent data subsets
• Make it easy to
– Publish: Record structured data
– Find:
• Find data anywhere in the network
• Get the subset you need
– Explore datasets interactively
• Realistic goal:
– Make it as easy as
publishing/reading web sites today.
27
Data Federations of Web Services
• Massive datasets live near their owners:
  – Near the instrument's software pipeline
  – Near the applications
  – Near data knowledge and curation
  – Super Computer centers become Super Data Centers
• Each Archive publishes a web service
– Schema: documents the data
– Methods on objects (queries)
• Scientists get “personalized” extracts
• Uniform access to multiple Archives → Federation
– A common global schema
28
Web Services: The Key?
• Web SERVER:
  – Given a URL + parameters
  – Returns a web page (often dynamic)
  (Diagram: your program talking to a web server)
• Web SERVICE:
  – Given an XML document (SOAP msg)
  – Returns an XML document
  – Tools make this look like an RPC:
    F(x,y,z) returns (u, v, w)
  – Distributed objects for the web
  – + naming, discovery, security, …
  (Diagram: your program, with data in your address space, talking to a web service)
• Internet-scale distributed computing
29
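A minimal sketch of the web-service pattern described above: POST an XML document to a service URL, get an XML document back, and wrap the exchange so it reads like a local function call. The endpoint and message shape are invented for illustration; a real deployment would use a published SOAP/WSDL or comparable interface.

    import urllib.request
    import xml.etree.ElementTree as ET

    SERVICE_URL = "http://example.org/skyservice"   # hypothetical endpoint

    def cone_search(ra, dec, radius):
        """Looks like F(ra, dec, radius) -> objects, but sends XML and parses XML."""
        request_doc = (f"<ConeSearch><ra>{ra}</ra><dec>{dec}</dec>"
                       f"<radius>{radius}</radius></ConeSearch>")
        req = urllib.request.Request(SERVICE_URL,
                                     data=request_doc.encode("utf-8"),
                                     headers={"Content-Type": "text/xml"})
        with urllib.request.urlopen(req) as resp:
            reply = ET.fromstring(resp.read())
        return [obj.attrib for obj in reply.findall(".//object")]

    # Usage (needs a live service at SERVICE_URL):
    # objects = cone_search(ra=180.0, dec=2.5, radius=0.1)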
The Challenge
• This has failed several times before;
  understand why.
• Develop
– Common data models (schemas),
– Common interfaces (class/method)
• Build useful prototypes (nodes and portals)
• Create a community
that uses the prototypes and
evolves the prototypes.
30
Grid and Web Services Synergy
• I believe the Grid will be many web services
• IETF standards provide
– Naming
– Authorization / Security / Privacy
– Distributed Objects
Discovery, Definition, Invocation, Object Model
– Higher level services: workflow, transactions, DB,..
• Synergy: commercial Internet & Grid tools
31
Summary: It’s the Software…
• Computing is Information centric
• Scientific computing is Beowulf computing
• Scientific computing is becoming Info-centric
• Adequate investment in files/OS/networking
• Underinvestment in Scientific Information management and visualization tools
• Computation Grid moves too much data; DataGrid (or App Grid) is the right concept
32
Recommendations
• Increase Research investments
ABOVE the OS level
Information Management/Visualization
• Invest in database research:
extensible databases: text, temporal, spatial, …
data interchange, parallelism,
indexing, query optimization
• Invest in data mining research,
  both general and domain-specific
33
Stop Here
• Bonus slides on Distributed Computing
Economics
34
Distributed Computing Economics
• Why is Seti@Home a great idea?
• Why is Napster a great deal?
• Why is the Computational Grid uneconomic?
• When does computing on demand work?
• What is the "right" level of abstraction?
• Is the Access Grid the real killer app?
Based on: Distributed Computing Economics,
Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24
http://research.microsoft.com/research/pubs/view.aspx?tr_id=655
35
Computing is Free
• Computers cost 1k$ (if you shop right)
• So 1 cpu day == 1$
• If you pay the phone bill (and I do),
  Internet bandwidth costs 50 … 500 $/Mbps/month
  (not including routers and management).
• So 1 GB costs ~1$ to send and ~1$ to receive
36
Why is Seti@Home a Good Deal?
• Send 300 KB:               costs 3e-4$
• User computes for ½ day:   benefit 5e-1$
• ROI: 1500:1
37
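The arithmetic behind that ROI, using the rules of thumb from the "Computing is Free" slide (1 $ per CPU-day, ~1 $ per GB sent); a sketch, not an exact accounting:

    DOLLARS_PER_CPU_DAY = 1.0
    DOLLARS_PER_GB_SENT = 1.0

    cost = 0.0003 * DOLLARS_PER_GB_SENT      # ship ~300 KB, about 3e-4 GB
    benefit = 0.5 * DOLLARS_PER_CPU_DAY      # half a CPU-day of donated work
    print(f"ROI ~ {benefit / cost:,.0f}:1")  # ~1,700:1, roughly the 1500:1 quoted above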
Why is Napster a Good Deal?
• Send 5 MB
costs 5e-3$
• ½ a penny per song
• Both sender and receiver can afford it.
• Same logic powers web sites (Yahoo!...):
– 1e-3$/page view advertising revenue
– 1e-5$/page view cost of serving web page
– 100:1 ROI
38
The Cost of Computing:
Computers are NOT free!
• Capital cost of a TpcC system is mostly
  storage and storage software (database)
• IBM 32 cpu, 512 GB ram,
  2,500 disks, 43 TB
  (680,613 tpmC @ 11.13 $/tpmC, available 11/08/03)
  http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf
  (TpcC cost components, DB2/AIX: storage 61%, cpu/mem 29%, software 10%)
• A 7.5M$ super-computer
• Total Data Center Cost:
40% capital & facilities,
60% staff
(includes app development)
39
Computing Equivalents
1 $ buys
• 1 day of cpu time
• 4 GB ram for a day
• 1 GB of network bandwidth
• 1 GB of disk storage
• 10 M database accesses
• 10 TB of disk access (sequential)
• 10 TB of LAN bandwidth (bulk)
40
Some consequences
• Beowulf networking is
  10,000x cheaper than WAN networking:
  factors of 10⁵ matter.
• The cheapest and fastest way to move a
  Terabyte cross-country is sneakernet:
  24 hours = 4 MB/s,
  50$ shipping vs 1,000$ WAN cost.
• Sending 10PB CERN data via network
is silly:
buy disk bricks in Geneva,
fill them,
ship them.
TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange
Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg
Microsoft Technical Report, May 2002, MSR-TR-2002-54
http://research.microsoft.com/research/pubs/view.aspx?tr_id=569
41
How Do You Move A Terabyte?
Context      Speed (Mbps)  Rent ($/month)   $/Mbps   $/TB Sent   Time/TB
Home phone        0.04             40        1,000      3,086    6 years
Home DSL          0.6              70          117        360    5 months
T1                1.5           1,200          800      2,469    2 months
T3               43             28,000         651      2,010    2 days
OC3             155             49,000         316        976    14 hours
OC192         9,600          1,920,000         200        617    14 minutes
100 Mbps        100                  -            -          -   1 day
Gbps          1,000                  -            -          -   2.2 hours
Source: TeraScale Sneakernet, Microsoft Research, Jim Gray et al.
42
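The table's last two columns follow mechanically from a link's speed and monthly rent; a small sketch of that cost model (the rents are the table's own figures, the 30-day month is an assumption):

    # Time and rent-cost to push 1 TB (8e12 bits) through a leased link.
    def days_per_tb(mbps):
        return 8e12 / (mbps * 1e6) / 86_400

    def dollars_per_tb(mbps, rent_per_month):
        return rent_per_month * days_per_tb(mbps) / 30

    for name, mbps, rent in [("T1", 1.5, 1_200), ("T3", 43, 28_000), ("OC3", 155, 49_000)]:
        print(f"{name:4s}: ~{days_per_tb(mbps):5.1f} days/TB, ~${dollars_per_tb(mbps, rent):,.0f}/TB")
    # T1 ~2 months and ~$2,469/TB; T3 ~2 days and ~$2,010/TB; OC3 ~14 hours and ~$976/TB.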
Computational Grid Economics
• To the extent that the computational grid is like
  Seti@Home or ZetaNet or Folding@home
  or …, it is a great thing.
• To the extent that the computational grid is MPI
  or data analysis, it fails on economic grounds:
  move the programs to the data, not the data to
  the programs.
• The Internet is NOT the cpu backplane.
• The USG should not hide this economic fact
from the academic/scientific research
community.
43
Computing on Demand
• Was called outsourcing / service bureaus
in my youth. CSC and IBM did it.
• It is not a new way of doing things: think payroll.
Payroll is standard outsource.
• Now we have Hotmail, Salesforce.com,
Oracle.com,….
• Works for standard apps.
• Airlines outsource reservations.
Banks outsource ATMs.
• But Amazon, Amex, Wal-Mart, ...
Can’t outsource their core competence.
• So, COD works for commoditized services.
44
What’s the right abstraction level for
Internet Scale Distributed Computing?
• Disk block?    No: too low.
• File?          No: too low.
• Database?      No: too low.
• Application?   Yes, of course.
– Blast search
– Google search
– Send/Get eMail
– Portals that federate astronomy archives
(http://skyQuery.Net/)
• Web Services (.NET, EJB, OGSA)
give this abstraction level.
45
Access Grid
• Q: What comes after the telephone?
• A: eMail?
• A: Instant messaging?
• Both seem retro technology: text & emoticons.
• The Access Grid
  could revolutionize human communication.
• But, it needs a new idea.
• Q: What comes after the telephone?
46
Distributed Computing Economics
• Why is Seti@Home a great idea?
• Why is Napster a great deal?
• Why is the Computational Grid uneconomic?
• When does computing on demand work?
• What is the "right" level of abstraction?
• Is the Access Grid the real killer app?
Based on: Distributed Computing Economics,
Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24
http://research.microsoft.com/research/pubs/view.aspx?tr_id=655
47
Turbulence, an old problem
Observational:
  described five centuries ago by Leonardo.
Theoretical:
  the best minds have tried and … "moved on":
  • Lamb: … "When I die and go to heaven…"
  • Heisenberg, von Weizsäcker: … some attempts
  • Partial successes: Kolmogorov, Onsager
  • Feynman: "…the last unsolved problem of classical physics"
Adapted from ASCI ASCP gallery
http://www.cacr.caltech.edu/~slombey/asci/fluids/turbulence-volren.med.jpg
48
Simulation: Comp-Physics
• How does the turbulent energy cascade work?
• Direct numerical simulation of "turbulence in a box"
• Pushing comp-limits along specific directions:
  – 8192², but only two-dimensional (Ref: Chen & Kraichnan)
  – Three-dimensional (512³ - 4,096³), but only static information (Ref: Cao, Chen et al.)
Slide courtesy of Charles Meneveau @ JHU
49
Data-Exploration: Physics-Info
We can now "put it all together":
• Large scale range, scale ratio O(1,000)
• Three-dimensional in space
• Time evolution and Lagrangian approach (follow the flow)
Turbulence database:
• Create a 100 TB database of
  O(2,000) consecutive snapshots
  of a 1,024³ turbulence simulation.
• Mine the database
  to understand flows in detail.
50
Slide courtesy of Charles Meneveau, Alex Szalay @ JHU
Following 18 slides from 1997
• Bell & Gray Computer Industry “laws”
• Rules of thumb
• Still relevant
51
Computer Industry Laws (rules of thumb)
• Metcalf's Law
• Moore's First Law
• Bell's Computer Classes (7 price tiers)
• Bell's Platform Evolution
• Bell's Platform Economics
• Bill's Law
• Software Economics
• Grove's Law
• Moore's Second Law
• Is Info-Demand Infinite?
• The Death of Grosch's Law
52
Metcalf’s Law
Network Utility = Users²
• How many connections can it make?
– 1 user: no utility
– 1K users: a few contacts
– 1M users: many on net
– 1B users: everyone on net
• That is why the Internet is so “hot”
– Exponential benefit
53
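One way to see the quadratic scaling (a standard counting argument, not from the slide): with n users the number of possible pairwise connections is

    \binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{n^2}{2} \quad \text{for large } n,

so doubling the user base roughly quadruples the number of possible interactions.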
Moore’s First Law
• XXX doubles every 18 months: 60% increase per year
  – Microprocessor speeds
  – Chip density (1-chip memory size, 2 MB to 32 MB)
  – Magnetic disk density
  – Communications bandwidth
    (WAN bandwidth approaching LANs)
  (Chart: 1-chip memory size, 8 KB to 1 GB, 1970-2000;
   bits per chip: 1K 4K 16K 64K 256K 1M 4M 16M 64M 256M)
• Exponential growth:
  – The past does not matter
  – 10x here, 10x there, soon you're talking REAL change.
• PC costs decline faster than any other platform
  – Volume & learning curves
  – PCs will be the building bricks of all future systems
54
Bumps in the Moore’s Law Road
• DRAM ($/MB):
  – 1988: US Anti-Dumping rules
  – 1993-1995: ?? price flat
• Magnetic Disk ($/MB):
  – 1965-1989: 10x/decade
  – 1989-2002: 7x/3 years! (1,000x/decade)
(Charts: $/MB of DRAM and $/MB of disk, 1970-2000)
55
Gordon Bell’s 1975 VAX planning model...
He didn’t believe it!
System Price (K$) = 5 × 3 × 0.04 × memory size / 1.26^(t-1972)
  5x: memory is 20% of cost
  3x: DEC markup
  0.04$: price per byte
He didn't believe the projection of a 500$ machine.
He couldn't comprehend the implications.
(Chart: price, 0.01 K$ to 100,000 K$, vs year 1960-2000,
 for memory sizes 16 KB, 64 KB, 256 KB, 1 MB, … 8 MB)
56
Gordon Bell's "Processing, Memories, & Comm: 100 Years"
(Chart, 1947-2047, log scale 1 to 10^18:
 Processing, Pri. Mem, Sec. Mem., POTS (bps), Backbone)
57
Gordon Bell’s Seven Price Tiers
• 10$:          wrist watch computers
• 100$:         pocket / palm computers
• 1,000$:       portable computers
• 10,000$:      personal computers (desktop)
• 100,000$:     departmental computers (closet)
• 1,000,000$:   site computers (glass house)
• 10,000,000$:  regional computers (glass castle)
SuperServer: costs more than 100,000$
"Mainframe": costs more than 1M$
Must be an array of processors, disks, tapes, comm ports
58
Bell’s Evolution of Computer Classes
Technology enables two evolutionary paths:
1. constant performance, decreasing cost
2. constant price, increasing performance
(Chart: log price vs time for Mainframes (central), Minis (dep't.), WSs, PCs (personals), ??)
1.26 = 2x/3 yrs = 10x/decade; 1/1.26 = 0.8
1.6 = 4x/3 yrs = 100x/decade; 1/1.6 = 0.62
59
Gordon Bell’s Platform Economics
• Traditional computers: custom or semi-custom;
  high-tech and high-touch
• New computers:
  high-tech and no-touch
(Chart: price (K$), volume (K), and application price
 for Mainframe, WS, and Browser computer types)
60
Software Economics
CIRCA 1997
• An engineer costs about 150 k$/year
• R&D gets [5% … 15%] of budget
• Need [1M$ … 3M$] revenue per engineer
(Pie charts of revenue breakdown: R&D, SG&A, Product & Service, Tax, Profit,
 for Microsoft: 9 B$, Intel: 16 B$, Oracle: 3 B$, IBM: 72 B$)
61
Software Economics: Bill’s Law
Price ≈ Fixed_Cost / Units + Marginal_Cost
• Bill Joy’s law (Sun):
Don’t write software for less than 100,000 platforms.
@10M$ engineering expense, 1,000$ price
• Bill Gates' law:
Don’t write software for less than 1,000,000 platforms.
@10M$ engineering expense, 100$ price
• Examples:
– UNIX vs NT: 3,500$ vs 500$
– Oracle vs SQL-Server: 100,000$ vs 6,000$
– No Spreadsheet or Presentation pack on UNIX/VMS/...
• Commoditization of base Software & Hardware
62
Grove's Law
The New Computer Industry
• Horizontal integration
is new structure
• Each layer picks best
from lower layer.
• Desktop (C/S) market
– 1991: 50%
– 1995: 75%
Function           Example
Operation          AT&T
Integration        EDS
Applications       SAP
Middleware         Oracle
Baseware           Microsoft
Systems            Compaq
Silicon & Oxide    Intel & Seagate
63
Moore’s Second Law
• The cost of fab lines doubles every generation (3 years)
• Physical limit: quantum effects
  – at 0.25 micron now; 0.05 micron seems hard
  – 12 years, 3 generations
• Money limit: hard to imagine a
  10 B$ line, 20 B$ line, 40 B$ line
• Lithography: need X-ray below 0.13 micron
(Chart: M$ per fab line, $1 to $10,000, 1960-2000)
64
Constant Dollars vs Constant Work
• Constant Work:
– One SuperServer can do all the world’s computations.
• Constant Dollars:
– The world spends 10% on information processing
– Computers are moving from 5% penetration to 50%
• 300 B$ to 3T$
• We have the patent on the byte and algorithm
65
Crossing the Chasm
(2x2 diagram: Old/New Technology vs Old/New Market.
 Old technology, old market: boring, competitive, slow growth.
 New technology, old market: customers find product (hard).
 Old technology, new market: product finds customers (hard).
 New technology, new market: no product, no customers.)
66
Billions of Clients Need Millions of Servers
• All clients are networked to servers;
  clients may be nomadic or on-demand.
• Fast clients want faster servers.
• Servers provide data, control, coordination, communication.
• Super Servers: large databases, high-traffic shared data.
(Diagram: mobile clients and fixed clients connect to servers and super servers)
67
The Parallel Law of Computing
• Grosch's Law: 2x $ is 4x performance
  (1 MIPS for 1$, but 1,000 MIPS for 32$, i.e. .03 $/MIPS)
• Parallel Law: 2x $ is 2x performance
  (1 MIPS for 1$; 1,000 MIPS for 1,000$)
• Needs linear speedup and linear scaleup.
  Not always possible.
68
Our Biggest Problem
What is the trend line?
(Pie chart: maintenance / care & feeding 75%, hardware 20%, software 5%)
This wasn't a problem when
MIPS cost 100 k$ and
disks cost 1 k$/MB.
69