Information Centric Super Computing
Jim Gray, Microsoft Research, gray@microsoft.com
Talk at http://research.microsoft.com/~gray/talks
20 May 2003
Presentation to the Committee on the Future of Supercomputing of the National Research Council's Computer Science and Telecommunications Board

Committee Goal
… assess the status of supercomputing in the United States, including the characteristics of relevant systems and architecture research in government, industry, and academia, and the characteristics of the relevant market. The committee will examine key elements of context (the history of supercomputing, the erosion of research investment, the needs of government agencies for supercomputing capabilities) and assess options for progress. Key historical or causal factors will be identified. The committee will examine the changing nature of problems demanding supercomputing (e.g., weapons design, molecule modeling and simulation, cryptanalysis, bioinformatics, climate modeling) and the implications for systems design. It will seek to understand the role of national security in the supercomputer market and the long-term federal interest in supercomputing.

Summary: It's the Software…
• Computing is Information centric.
• Scientific computing is Beowulf computing.
• Scientific computing is becoming Info-centric.
• Adequate investment in files/OS/networking.
• Underinvestment in scientific Information management and visualization tools.
• The Computation Grid moves too much data; the DataGrid (or App Grid) is the right concept.

Thesis
• Most new information is digital (and old information is being digitized).
• A Computer Science Grand Challenge: capture, organize, summarize, and visualize this information.
• Optimize human attention as a resource.
• Improve information quality.

Information Avalanche
• The Situation
  – We can record everything.
  – Everything is a LOT!
• The Good News
  – Changes science, education, medicine, entertainment, …
  – Shrinks time and space.
  – Can augment human intelligence.
• The Bad News
  – The end of privacy.
  – Cyber crime / cyber terrorism.
  – Monoculture.
• The Technical Challenges
  – Amplify human intellect.
  – Organize, summarize, and prioritize information.
  – Make programming easy.

Super Computers You and Others Use Every Day
• Internet services: Google, Inktomi, …; AOL, MSN, Yahoo!; Hotmail, MSN, …; eBay, Amazon.com, …
• IntraNets: Wal-Mart, Federal Reserve, Amex.
• All are more than 1 Tflops.
• All are more than 10 Tops.
• All are more than 1 PB.
• They are ALL Information Centric.

Q: How can I recognize a SuperComputer? A: It costs 10 M$
Gordon Bell's Seven Price Tiers:
• 10 $: wrist watch computers (sensors)
• 100 $: pocket / palm computers (phone/camera)
• 1,000 $: portable computers (tablet)
• 10,000 $: personal computers (workstation)
• 100,000 $: departmental computers (closet)
• 1,000,000 $: site computers (glass house)
• 10,000,000 $: regional computers (glass castle SC)
Super Computer / "Mainframe": costs more than 1 M$; must be an array of processors, disks, comm ports.

Computing is Information Centric (that's why they call it IT)
• Programs capture, organize, abstract, filter, and present Information to people.
• Networks carry Information.
• The file is the wrong abstraction: Information is typed / schematized (words, pictures, sounds, arrays, lists, …).
• Notice that none of the examples on the previous slide serve files: they serve typed information.
• Recommendation: increase research investments ABOVE the OS level, in Information management and visualization.
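As an aside on the "typed information versus files" point: the sketch below is illustrative only (the record type and field names are hypothetical, not from the talk). It contrasts an uninterpreted byte array with a schematized record that a database or web service can index, filter, and join on.

```python
from dataclasses import dataclass

# "File" view: just bytes, with no schema and no semantics.
raw = bytes([0x2A, 0x00, 0x3F, 0x80])   # an uninterpreted byte array

# "Information" view: a typed, schematized record (hypothetical example).
@dataclass
class Observation:
    object_id: int
    ra_deg: float      # right ascension
    dec_deg: float     # declination
    magnitude: float

obs = Observation(object_id=42, ra_deg=180.0, dec_deg=-1.2, magnitude=19.5)
print(obs.magnitude)   # fields can be named, typed, indexed, and queried
```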
Summary: It's the Software…
• Computing is Information centric.
• Scientific computing is Beowulf computing.
• Scientific computing is becoming Info-centric.
• Adequate investment in files/OS/networking.
• Underinvestment in scientific Information management and visualization tools.
• The Computation Grid moves too much data; the DataGrid (or App Grid) is the right concept.

Anecdotal Evidence: Everywhere I Go I See Beowulfs
• Clusters of PCs (or high-slice-price micros).
• True, I have not visited the Earth Simulator, but…
• Google, MSN, Hotmail, Yahoo!, NCBI, FNAL, Los Alamos, Cal Tech, MIT, Berkeley, NARO, Smithsonian, Wisconsin, eBay, Amazon.com, Schwab, Citicorp, Beijing, CERN, BaBar, NCSA, Cornell, UCSD, and of course NASA and Cal Tech.

Super Computing: The Top 10 of the Top 500 (skip)
Adapted from Top500, Nov 2002.

Rank  System           TF    Site           Cumulative TF
1     NEC Earth-Sim    35.9  Earth Sim Ctr  36
2     HP ASCI Q        7.7   LLNL           44
3     HP ASCI Q        7.7   LLNL           51
4     IBM ASCI White   7.2   LLNL           59
5     Intel/NetworX    5.7   LLNL           64
6     HP Alpha         4.5   PSC            69
7     HP Alpha         4.0   CEA            73
8     Intel/HPTi       3.3   NOAA           76
9     IBM SP2          3.2   HPCx           79
10    IBM SP2          3.2   NCAR           82

Seti@Home: The World's Most Powerful Computer (skip)
• 61 TF is the sum of the top 4 of the Top 500.
• 61 TF is 9x the number 2 system.
• 61 TF is more than the sum of systems 2..10.

Seti@Home totals, 20 May 2003 (http://setiathome.ssl.berkeley.edu/totals.html):
                            Total                      Last 24 Hours
Users                       4,493,731                  1,900
Results received            886 M                      1.4 M
Total CPU time              1.5 M years                1,514 years
Floating-point operations   3 E+21 ops (3 zetta-ops)   5 E+18 ops/day (61.3 TeraFLOPS)

And… (skip)
• Google: 10k cpus, 2 PB, … as of 2 years ago; 40 Tops.
• AOL, MSN, Hotmail, Yahoo!, …: all ~10K cpus; all have ~1 PB … 10 PB of storage.
• Wal-Mart is a PB poster child.
• Clusters / Beowulf everywhere you go.

Scientific == Beowulf (Clusters)
• Scientific / Beowulf / Grid computing is 70's-style computing:
  – process / file / socket
  – byte arrays, no data schema or semantics
  – batch job scheduling
  – manual parallelism (MPI)
  – poor / no Information management support
  – poor / no Information visualization toolkits
• Recommendation: increase investment in Info-Management; increase investment in Info-Visualization.

Summary: It's the Software…
• Computing is Information centric.
• Scientific computing is Beowulf computing.
• Scientific computing is becoming Info-centric.
• Adequate investment in files/OS/networking.
• Underinvestment in scientific Information management and visualization tools.
• The Computation Grid moves too much data; the DataGrid (or App Grid) is the right concept.

The Evolution of Science
• Observational Science: the scientist gathers data by direct observation and analyzes the Information.
• Analytical Science: the scientist builds an analytical model and makes predictions.
• Computational Science: simulate the analytical model; validate the model and make predictions.
• Science-Informatics (Information-Exploration Science): Information is captured by instruments or generated by simulators, processed by software, and placed in a database / files; the scientist analyzes the database / files.
How Are Discoveries Made?
Adapted from a slide by George Djorgovski.
• Conceptual discoveries: e.g., Relativity, QM, Brane World, Inflation … Theoretical; may be inspired by observations.
• Phenomenological discoveries: e.g., Dark Matter, QSOs, GRBs, CMBR, Extrasolar Planets, the Obscured Universe … Empirical; inspire theories, and can be motivated by them.
• (Diagram: new technical capabilities lead to observational discoveries, which feed both theory and phenomenological discoveries.)
• Phenomenological discoveries: explore parameter space; make new connections (e.g., multi-…).
• Understanding of complex phenomena requires complex, information-rich data (and simulations?).

The Information Avalanche: both comp-X and X-info are generating Petabytes
• Comp-science is generating an Information avalanche: comp-chem, comp-physics, comp-bio, comp-astro, comp-linguistics, comp-music, comp-entertainment, comp-warfare.
• Science-info is generating an Information avalanche: bio-info, astro-info, text-info, …

Information Avalanche Stories
• Turbulence: a 100 TB simulation, then mine the Information.
• BaBar: grows 1 TB/day; 2/3 simulation Information, 1/3 observational Information.
• CERN: the LHC will generate 1 GB/s, 10 PB/y.
• VLBA (NRAO) generates 1 GB/s today.
• NCBI: "only ½ TB", but doubling each year; a very rich dataset.
• Pixar: 100 TB per movie.

Astro-Info: the World Wide Telescope
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
• Premise: most data is (or could be) online.
• The Internet is the world's best telescope:
  – It has data on every part of the sky.
  – In every measured spectral band: optical, x-ray, radio, …
  – As deep as the best instruments (of 2 years ago).
  – It is up when you are up; the "seeing" is always great (no working at night, no clouds, no moon, no …).
  – It's a smart telescope: it links objects and data to the literature on them.

Why Astronomy Data?
(Image strip: sky images in multiple bands: IRAS 25 µm, 2MASS 2 µm, DSS optical, IRAS 100 µm, WENSS 92 cm, NVSS 20 cm, ROSAT ~keV, GB 6 cm.)
• It has no commercial value: no privacy concerns; results can be freely shared; great for experimenting with algorithms.
• It is real and well documented: high-dimensional data (with confidence intervals); spatial data; temporal data.
• Many different instruments from many different places and many different times.
• But it is the same universe, so comparisons make sense and are interesting.
• Federation is a goal.
• There is a lot of it (petabytes).
• A great sandbox for data mining algorithms: can share across companies; university researchers.
• A great way to teach both astronomy and computational science.

Summary: It's the Software…
• Computing is Information centric.
• Scientific computing is Beowulf computing.
• Scientific computing is becoming Info-centric.
• Adequate investment in files/OS/networking.
• Underinvestment in scientific Information management and visualization tools.
• The Computation Grid moves too much data; the DataGrid (or App Grid) is the right concept.

What X-info Needs from Us (CS) (not drawn to scale)
(Diagram: Scientists bring science data and questions; Miners bring data-mining algorithms; Plumbers bring the database that stores the data and executes queries; the results are questions and answers plus visualization tools.)

Data Access is Hitting a Wall: FTP and GREP are Not Adequate
• You can GREP 1 MB in a second.
• You can GREP 1 GB in a minute.
• You can GREP 1 TB in 2 days.
• You can GREP 1 PB in 3 years.
• You can FTP 1 MB in 1 second.
• You can FTP 1 GB in a minute (at roughly 1 $/GB).
• … 1 TB takes 2 days and 1 K$.
• … 1 PB takes 3 years and 1 M$.
• Oh, and 1 PB is about 5,000 disks.
• At some point you need indices to limit the search, and parallel data search and analysis.
• This is where databases can help.
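For a sense of where these round numbers come from, here is a small back-of-the-envelope check (my own arithmetic, not part of the talk). It assumes a single-disk sequential scan rate of roughly 10 MB/s and a WAN transfer rate of about 1 GB per minute at about 1 $/GB, which is what the slide's figures imply for 2003-era hardware.

```python
SCAN_MB_PER_S = 10.0      # assumed grep/scan rate (single disk, sequential)
FTP_GB_PER_MIN = 1.0      # assumed WAN transfer rate
DOLLARS_PER_GB = 1.0      # assumed WAN cost

def days(seconds):
    return seconds / 86_400

for label, nbytes in [("1 TB", 1e12), ("1 PB", 1e15)]:
    scan_s = nbytes / (SCAN_MB_PER_S * 1e6)          # time to scan it once
    ftp_s = nbytes / (FTP_GB_PER_MIN * 1e9) * 60     # time to move it over the WAN
    cost = nbytes / 1e9 * DOLLARS_PER_GB             # WAN cost
    print(f"{label}: scan ~{days(scan_s):,.0f} days, "
          f"ftp ~{days(ftp_s):,.0f} days, ftp cost ~${cost:,.0f}")

# 1 TB: about a day of scanning and ~1,000 $ of WAN transfer.
# 1 PB: about 1,200 days (~3 years) of scanning and ~1 M$ of WAN transfer,
# which is the wall the slide describes.
```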
Next-Generation Data Analysis
• Looking for:
  – Needles in haystacks: the Higgs particle.
  – Haystacks: dark matter, dark energy.
• Needles are easier than haystacks.
• Global statistics have poor scaling: correlation functions are N², likelihood techniques N³.
• As data and processing grow at the same rate, we can only keep up with N log N.
• A way out?
  – Discard the notion of optimal (data is fuzzy, answers are approximate).
  – Don't assume infinite computational resources or memory.
• Requires a combination of statistics and computer science.
• Recommendation: invest in data mining research, both general and domain-specific.

Analysis and Databases
• Statistical analysis deals with:
  – creating uniform samples
  – data filtering and censoring bad data
  – assembling subsets
  – estimating completeness
  – counting and building histograms
  – generating Monte Carlo subsets
  – likelihood calculations
  – hypothesis testing
• Traditionally these are performed on files.
• Most of these tasks are much better done inside a database, close to the data.
• Move Mohamed to the mountain, not the mountain to Mohamed.
• Recommendation: invest in database research: extensible databases (text, temporal, spatial, …), data interchange, parallelism, indexing, query optimization.

Goal: Easy Data Publication & Access
• Augment FTP with data query: return intelligent data subsets.
• Make it easy to:
  – Publish: record structured data.
  – Find: find data anywhere in the network, and get the subset you need.
  – Explore datasets interactively.
• Realistic goal: make it as easy as publishing and reading web sites today.

Data Federations of Web Services
• Massive datasets live near their owners:
  – near the instrument's software pipeline
  – near the applications
  – near data knowledge and curation
  – Super Computer centers become Super Data Centers.
• Each archive publishes a web service:
  – a schema documents the data
  – methods on objects (queries)
• Scientists get "personalized" extracts.
• Uniform access to multiple archives means federation: a common global schema.

Web Services: The Key?
• Web SERVER: given a URL plus parameters, it returns a web page (often dynamic). (Diagram: your program calls a web server.)
• Web SERVICE: given an XML document (a SOAP message), it returns an XML document; tools make this look like an RPC.
  – F(x,y,z) returns (u, v, w).
  – Distributed objects for the web, plus naming, discovery, security, …
  – Internet-scale distributed computing. (Diagram: your program, with the data in your address space, calls a web service.)
  – (A minimal code sketch of this request/response shape follows after the Grid and Web Services Synergy slide below.)

The Challenge
• This has failed several times before; understand why.
• Develop common data models (schemas) and common interfaces (class/method).
• Build useful prototypes (nodes and portals).
• Create a community that uses the prototypes and evolves them.

Grid and Web Services Synergy
• I believe the Grid will be many web services.
• IETF standards provide:
  – naming
  – authorization / security / privacy
  – distributed objects: discovery, definition, invocation, object model
  – higher-level services: workflow, transactions, DB, …
• Synergy: commercial Internet and Grid tools.
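To make the web-service idea on the preceding slides concrete, here is a minimal sketch. It uses Python's standard-library XML-RPC purely for illustration (the slides have SOAP/WSDL/OGSA toolkits in mind); the archive method name and its cone-search parameters are hypothetical, not from the talk. The point is the shape: a typed query method published over HTTP, XML on the wire, and a call that looks local to the client.

```python
from xmlrpc.server import SimpleXMLRPCServer

def cone_search(ra_deg, dec_deg, radius_deg):
    """Return objects near (ra, dec): a stand-in for an archive query method."""
    # A real archive would run a database query here and return typed rows.
    return [{"object_id": 42, "ra_deg": ra_deg, "dec_deg": dec_deg, "mag": 19.5}]

server = SimpleXMLRPCServer(("localhost", 8080), allow_none=True)
server.register_function(cone_search, "cone_search")
# server.serve_forever()   # uncomment to publish the service

# Client side (in another process):
#   from xmlrpc.client import ServerProxy
#   archive = ServerProxy("http://localhost:8080")
#   rows = archive.cone_search(180.0, -1.2, 0.05)   # feels like F(x, y, z) -> rows
```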
Summary: It's the Software…
• Computing is Information centric.
• Scientific computing is Beowulf computing.
• Scientific computing is becoming Info-centric.
• Adequate investment in files/OS/networking.
• Underinvestment in scientific Information management and visualization tools.
• The Computation Grid moves too much data; the DataGrid (or App Grid) is the right concept.

Recommendations
• Increase research investments ABOVE the OS level: Information management and visualization.
• Invest in database research: extensible databases (text, temporal, spatial, …), data interchange, parallelism, indexing, query optimization.
• Invest in data mining research, both general and domain-specific.

Stop Here
• Bonus slides on Distributed Computing Economics follow.

Distributed Computing Economics
• Why is Seti@Home a great idea?
• Why is Napster a great deal?
• Why is the Computational Grid uneconomic?
• When does computing on demand work?
• What is the "right" level of abstraction?
• Is the Access Grid the real killer app?
Based on: Distributed Computing Economics, Jim Gray, Microsoft Technical Report MSR-TR-2003-24, March 2003. http://research.microsoft.com/research/pubs/view.aspx?tr_id=655

Computing is Free
• Computers cost 1 k$ (if you shop right), so 1 cpu-day costs about 1 $ (1 k$ amortized over roughly three years).
• If you pay the phone bill (and I do), Internet bandwidth costs 50 … 500 $ per Mbps per month (not including routers and management).
• So 1 GB costs about 1 $ to send and 1 $ to receive.

Why is Seti@Home a Good Deal?
• Sending 300 KB costs 3e-4 $.
• The user computes for half a day: benefit 5e-1 $ (half a cpu-day at 1 $ per cpu-day).
• ROI: 1500:1.

Why is Napster a Good Deal?
• Sending 5 MB costs 5e-3 $: half a penny per song.
• Both sender and receiver can afford it.
• The same logic powers web sites (Yahoo!, …): 1e-3 $ per page view of advertising revenue against 1e-5 $ per page view to serve the page, a 100:1 ROI.

The Cost of Computing: Computers are NOT Free!
• The capital cost of a TpcC system is mostly storage and storage software (the database).
• IBM DB2/AIX: 32 cpus, 512 GB RAM, 2,500 disks, 43 TB.
  – TpcC cost components: storage 61%, cpu/memory 29%, software 10%.
  – 680,613 tpmC at 11.13 $/tpmC, available 11/08/03.
  – http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf
• A 7.5 M$ super-computer.
• Total data center cost: 40% capital and facilities, 60% staff (including application development).

Computing Equivalents: 1 $ buys
• 1 day of cpu time
• 4 GB of RAM for a day
• 1 GB of network bandwidth
• 1 GB of disk storage
• 10 M database accesses
• 10 TB of disk access (sequential)
• 10 TB of LAN bandwidth (bulk)

Some Consequences
• Beowulf networking is 10,000x cheaper than WAN networking; factors of 10^5 matter.
• The cheapest and fastest way to move a terabyte cross-country is sneakernet: 24 hours = 4 MB/s; 50 $ of shipping vs 1,000 $ of WAN cost.
• Sending 10 PB of CERN data via the network is silly: buy disk bricks in Geneva, fill them, and ship them.
TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange. Jim Gray, Wyman Chong, Tom Barclay, Alex Szalay, Jan vandenBerg. Microsoft Technical Report MSR-TR-2002-54, May 2002. http://research.microsoft.com/research/pubs/view.aspx?tr_id=569

How Do You Move A Terabyte?

Context      Speed (Mbps)   Rent ($/month)   $/Mbps   $/TB sent   Time/TB
Home phone   0.04           40               1,000    3,086       6 years
Home DSL     0.6            70               117      360         5 months
T1           1.5            1,200            800      2,469       2 months
T3           43             28,000           651      2,010       2 days
OC3          155            49,000           316      976         14 hours
OC192        9,600          1,920,000        200      617         14 minutes
100 Mbps     100            -                -        -           1 day
Gbps         1,000          -                -        -           2.2 hours

Source: TeraScale Sneakernet, Microsoft Research, Jim Gray et al.
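The table rows can be reproduced with simple arithmetic. The sketch below is my own (assuming a 30-day month and a fully utilized link, which is how the table appears to be computed); it recomputes a few rows and compares them with the sneakernet numbers from the previous slide.

```python
SECONDS_PER_MONTH = 30 * 86_400
BITS_PER_TB = 8e12

def link_economics(name, speed_mbps, rent_per_month):
    tb_per_month = speed_mbps * 1e6 * SECONDS_PER_MONTH / BITS_PER_TB
    dollars_per_tb = rent_per_month / tb_per_month
    days_per_tb = BITS_PER_TB / (speed_mbps * 1e6) / 86_400
    print(f"{name:12s} {rent_per_month / speed_mbps:8,.0f} $/Mbps "
          f"{dollars_per_tb:8,.0f} $/TB {days_per_tb:10,.2f} days/TB")

link_economics("Home DSL", 0.6, 70)        # ~117 $/Mbps, ~360 $/TB, ~5 months
link_economics("T1",       1.5, 1_200)     # ~800 $/Mbps, ~2,469 $/TB, ~2 months
link_economics("OC3",      155, 49_000)    # ~316 $/Mbps, ~976 $/TB, ~14 hours

# Sneakernet comparison from the previous slide: ~50 $ of shipping and ~24 hours
# door to door beats every row of the table on price, and all but the fastest
# links on time.
```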
Computational Grid Economics
• To the extent that the computational grid is like Seti@Home or ZetaNet or Folding@home or …, it is a great thing.
• To the extent that the computational grid is MPI or data analysis, it fails on economic grounds: move the programs to the data, not the data to the programs.
• The Internet is NOT the cpu backplane.
• The USG should not hide this economic fact from the academic/scientific research community.

Computing on Demand
• Was called outsourcing / service bureaus in my youth; CSC and IBM did it.
• It is not a new way of doing things: think payroll. Payroll is a standard outsource.
• Now we have Hotmail, Salesforce.com, Oracle.com, …
• It works for standard apps.
• Airlines outsource reservations. Banks outsource ATMs.
• But Amazon, Amex, Wal-Mart, … can't outsource their core competence.
• So, computing on demand works for commoditized services.

What's the Right Abstraction Level for Internet-Scale Distributed Computing?
• Disk block? No, too low.
• File? No, too low.
• Database? No, too low.
• Application? Yes, of course:
  – Blast search
  – Google search
  – Send / get eMail
  – Portals that federate astronomy archives (http://skyQuery.Net/)
• Web Services (.NET, EJB, OGSA) give this abstraction level.

Access Grid
• Q: What comes after the telephone?
• A: eMail? A: Instant messaging?
• Both seem like retro technology: text and emoticons.
• The Access Grid could revolutionize human communication.
• But it needs a new idea.
• Q: What comes after the telephone?

Distributed Computing Economics
• Why is Seti@Home a great idea?
• Why is Napster a great deal?
• Why is the Computational Grid uneconomic?
• When does computing on demand work?
• What is the "right" level of abstraction?
• Is the Access Grid the real killer app?
Based on: Distributed Computing Economics, Jim Gray, Microsoft Technical Report MSR-TR-2003-24, March 2003. http://research.microsoft.com/research/pubs/view.aspx?tr_id=655

Turbulence, an Old Problem
• Observational: described five centuries ago by Leonardo.
• Theoretical: the best minds have tried and … "moved on":
  – Lamb: … "When I die and go to heaven …"
  – Heisenberg, von Weizsäcker: … some attempts.
  – Partial successes: Kolmogorov, Onsager.
  – Feynman: "… the last unsolved problem of classical physics."
Adapted from the ASCI ASCP gallery: http://www.cacr.caltech.edu/~slombey/asci/fluids/turbulence-volren.med.jpg

Simulation: Comp-Physics
Slide courtesy of Charles Meneveau @ JHU.
• How does the turbulent energy cascade work?
• Direct numerical simulation of "turbulence in a box".
• Pushing computational limits along specific directions:
  – 8192², but only two-dimensional (ref: Chen & Kraichnan).
  – Three-dimensional (512³ to 4,096³), but only static information (ref: Cao, Chen et al.).

Data-Exploration: Physics-Info
Slide courtesy of Charles Meneveau and Alex Szalay @ JHU.
• We can now "put it all together": a large scale range (scale ratio O(1,000)); three-dimensional in space; time evolution and a Lagrangian approach (follow the flow).
• Turbulence database:
  – Create a 100 TB database of O(2,000) consecutive snapshots of a 1,024³ turbulence simulation.
  – Mine the database to understand flows in detail.
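As a rough, order-of-magnitude check on that database size (my own assumptions, not from the slide): if each grid point stores three velocity components and pressure as 4-byte floats, the raw snapshots alone come to tens of terabytes, the same ballpark as the quoted 100 TB once extra fields, indexes, and overhead are included.

```python
GRID_POINTS = 1_024 ** 3        # points per snapshot
BYTES_PER_POINT = 4 * 4         # u, v, w, p as float32 (an assumption)
SNAPSHOTS = 2_000

snapshot_gb = GRID_POINTS * BYTES_PER_POINT / 1e9
total_tb = snapshot_gb * SNAPSHOTS / 1e3
print(f"~{snapshot_gb:.0f} GB per snapshot, ~{total_tb:.0f} TB raw for {SNAPSHOTS} snapshots")
# -> roughly 17 GB per snapshot and ~34 TB raw: within a small factor of 100 TB.
```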
Following 18 Slides from 1997
• Bell & Gray computer industry "laws" (rules of thumb).
• Still relevant.

Computer Industry Laws (rules of thumb)
• Metcalfe's law
• Moore's First Law
• Bell's computer classes (seven price tiers)
• Bell's platform evolution
• Bell's platform economics
• Bill's law
• Software economics
• Grove's law
• Moore's second law
• Is info-demand infinite?
• The death of Grosch's law

Metcalfe's Law: Network Utility = Users²
• How many connections can it make?
  – 1 user: no utility
  – 1K users: a few contacts
  – 1M users: many on the net
  – 1B users: everyone on the net
• That is why the Internet is so "hot": exponential benefit.

Moore's First Law
• XXX doubles every 18 months (a 60% increase per year):
  – microprocessor speeds
  – chip density (one-chip memory size, 2 MB to 32 MB)
  – magnetic disk density
  – communications bandwidth (WAN bandwidth approaching LANs)
• (Chart: one-chip memory size from 8 KB to 1 GB, 1970 to 2000; DRAM generations of 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M bits.)
• Exponential growth: the past does not matter; 10x here, 10x there, and soon you're talking REAL change.
• PC costs decline faster than any other platform (volume and learning curves); PCs will be the building bricks of all future systems.

Bumps in the Moore's Law Road
• DRAM: 1988, US anti-dumping rules; 1993-1995, price flat (??). (Chart: $/MB of DRAM, 1970 to 2000.)
• Magnetic disk: 1965-1989, 10x per decade; 1989-2002, 7x per 3 years, i.e. 1,000x per decade. (Chart: $/MB of disk, 1970 to 2000.)

Gordon Bell's 1975 VAX Planning Model… He Didn't Believe It!
• System price = 5 × 3 × 0.04 $/byte × memory size / 1.26^(t-1972)
  – 5x: memory is 20% of cost
  – 3x: DEC markup
  – 0.04 $: price per byte
• He didn't believe the projection of a 500 $ machine.
• He couldn't comprehend the implications.
• (Chart: projected system price, 0.01 K$ to 100,000 K$, versus year 1960 to 2000, for memory sizes from 16 KB to 856 MB.)

Gordon Bell's Processing, Memories, & Comm: 100 Years
(Chart: growth from 1947 to 2047, spanning 1 to 10^18, for processing, primary memory, secondary memory, POTS (bps), and backbone bandwidth.)

Gordon Bell's Seven Price Tiers
• 10 $: wrist watch computers
• 100 $: pocket / palm computers
• 1,000 $: portable computers
• 10,000 $: personal computers (desktop)
• 100,000 $: departmental computers (closet)
• 1,000,000 $: site computers (glass house)
• 10,000,000 $: regional computers (glass castle)
• SuperServer: costs more than 100,000 $. "Mainframe": costs more than 1 M$; must be an array of processors, disks, tapes, comm ports.

Bell's Evolution of Computer Classes
• Technology enables two evolutionary paths: (1) constant performance, decreasing cost; (2) constant price, increasing performance.
• (Chart: log price versus time for mainframes (central), minis (departmental), workstations, PCs (personal), and ??.)
• 1.26 = 2x per 3 years = 10x per decade; 1/1.26 = 0.8.
• 1.6 = 4x per 3 years = 100x per decade; 1/1.6 = 0.62.

Gordon Bell's Platform Economics
• Traditional computers: custom or semi-custom, high-tech and high-touch units.
• New computers: high-tech and no-touch.
• (Chart: price (K$), volume (K), and application price for mainframe, workstation, and browser-class computers.)

Software Economics, circa 1997
• An engineer costs about 150 k$/year.
• R&D gets [5% … 15%] of the budget, so each engineer must generate [3 M$ … 1 M$] of revenue.
• (Pie charts: revenue split among R&D, SG&A, product & service, profit, and tax for Microsoft 9 B$, Intel 16 B$, Oracle 3 B$, and IBM 72 B$.)

Software Economics: Bill's Law
• Price = Fixed_Cost / Units + Marginal_Cost
• Bill Joy's law (Sun): don't write software for less than 100,000 platforms; at a 10 M$ engineering expense, a 1,000 $ price.
• Bill Gates's law: don't write software for less than 1,000,000 platforms; at a 10 M$ engineering expense, a 100 $ price.
• Examples: UNIX vs NT, 3,500 $ vs 500 $; Oracle vs SQL Server, 100,000 $ vs 6,000 $; no spreadsheet or presentation pack on UNIX/VMS/…
• Commoditization of base software and hardware.
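Worked numbers for Bill's law (my own arithmetic): the amortized engineering charge per unit falls tenfold when the platform count grows tenfold. The split between amortized engineering, marginal cost, and margin is illustrative, not from the slide.

```python
def unit_price(fixed_cost, units, marginal_cost=0.0):
    # Price = Fixed_Cost / Units + Marginal_Cost
    return fixed_cost / units + marginal_cost

engineering = 10e6   # 10 M$ engineering expense, as on the slide
for units in (100_000, 1_000_000):
    print(f"{units:>9,} platforms: {unit_price(engineering, units):6.0f} $ of engineering per unit")

# 100,000 platforms -> 100 $/unit; 1,000,000 platforms -> 10 $/unit.
# With marginal cost, distribution, and margin layered on top, these amortized
# costs are consistent with the ~1,000 $ and ~100 $ price points quoted above.
```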
Grove's Law: The New Computer Industry
• Horizontal integration is the new structure.
• Each layer picks the best from the lower layer.
• Desktop (client/server) market share: 1991, 50%; 1995, 75%.

Function           Example
Operation          AT&T
Integration        EDS
Applications       SAP
Middleware         Oracle
Baseware           Microsoft
Systems            Compaq
Silicon & Oxide    Intel & Seagate

Moore's Second Law
• The cost of fab lines doubles every generation (3 years).
• Physical limit: quantum effects at 0.25 micron now; 0.05 micron seems hard (12 years, 3 generations away).
• Lithography: need X-ray below 0.13 micron.
• Money limit: hard to imagine a 10 B$ line, a 20 B$ line, a 40 B$ line.
• (Chart: M$ per fab line, $1 to $10,000 M$, 1960 to 2000.)

Constant Dollars vs Constant Work
• Constant work: one SuperServer can do all the world's computations.
• Constant dollars: the world spends 10% on information processing, and computers are moving from 5% penetration to 50%: 300 B$ to 3 T$.
• We have the patent on the byte and the algorithm.

Crossing the Chasm
(2x2 diagram: market newness versus technology newness. Old market with old technology: boring, competitive, slow growth. New market with old technology: hard, the product must find customers. Old market with new technology: hard, customers must find the product. New market with new technology: no product, no customers.)

Billions of Clients Need Millions of Servers
• All clients are networked to servers; clients may be nomadic or on-demand.
• Fast clients want faster servers.
• Servers provide data, control, coordination, and communication.
• Super servers: large databases, high traffic, shared data.
(Diagram: mobile and fixed clients connected to servers and a super server.)

The Parallel Law of Computing
• Grosch's Law: 2x the dollars buys 4x the performance (1 MIPS for 1 $; 1,000 MIPS for 32 $, about 0.03 $/MIPS).
• The Parallel Law: 2x the dollars buys 2x the performance (1 MIPS for 1 $; 1,000 MIPS for 1,000 $); it needs linear speedup and linear scaleup, which is not always possible.
• (A small numeric sketch contrasting the two laws appears at the end of the deck.)

Our Biggest Problem
• Maintenance (care and feeding): 75%; hardware: 20%; software: 5%.
• What is the trend line?
• This wasn't a problem when MIPS cost 100 k$ and disks cost 1 k$/MB.
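A small numeric sketch (my own framing) of the two laws on the Parallel Law slide: under Grosch's law performance grows as the square of price, so 1,000 MIPS costs about 32 $ if 1 MIPS costs 1 $; under the parallel, commodity-cluster law performance is linear in price, so 1,000 MIPS costs about 1,000 $, and only if linear speedup and scaleup hold.

```python
def grosch_price(mips, base_price=1.0):
    # Grosch's law: performance ~ price**2, so price ~ sqrt(performance).
    return base_price * mips ** 0.5

def parallel_price(mips, price_per_mips=1.0):
    # Parallel law: performance ~ price (assuming linear speedup and scaleup).
    return price_per_mips * mips

for mips in (1, 1_000):
    print(f"{mips:>5} MIPS: Grosch ~{grosch_price(mips):,.0f} $, "
          f"parallel ~{parallel_price(mips):,.0f} $")
# 1,000 MIPS: Grosch ~32 $, parallel ~1,000 $ -- matching the slide's numbers.
```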