A Personal View of Microsoft Research and its Impact
Jim Gray
Microsoft Research
Gray@Microsoft.com
http://Research.Microsoft.com/~Gray/Talks

Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How it affects Microsoft Products and others
• Q&A

What I Am Doing
• Came to Microsoft to work on scalable computing: build computers by the slice
  – TerraServer
  – Billion transactions per day
  – Fault tolerant clusters
  – Transactions in the OS (DTC, MTS, COM+,..)
  – Cyberbricks
• Social service
  – PITAC: expand government funding of IT research and training.
  – Library of Congress study
  – Putting all technical literature online
  – Turing lecture

PITAC Report
Presidential IT Advisory Committee
• Findings: http://www.ccic.gov/ac/report/
  – Software construction is a mess: needs breakthroughs.
  – We do not know how to scale the Internet 100x
    • Security, manageability, services, terabit per second issues.
  – USG needs high-performance computing (Simulation) but market is not providing vector-supers – just providing processor arrays.
  – Trained people are in very short supply.
• Recommendations:
  – Lewis & Clark expeditions to 21st century.
  – Increase long-term research funding by 1.4B$/y.
  – Re-invigorate university research & teaching.
  – Facilitate immigration of technical experts.

Why Can’t Industry Fund IT Research?
• It does: IBM (5.8%), Intel (13%), Lucent (12%), Microsoft (14%), Sun (12%), ...
  – R&D is ~5%-15% (50 B$ of 500 B$)
    • AD is 10% of that (5 B$)
  – Long-Range Research is 10% of that (500 M$): 2,500 researchers and university support
  – Compaq: 4.8% R&D (1.3 B$ of 27.3 B$). AOL: 3.7% D, ?R (96 M$ of 2.6 B$)
  – Dell: 1.6% R&D (204 M$ of 12.6 B$), EDS, MCI-WorldCom, …
• To be competitive, some companies cannot make large long-term research investments.
  The Xerox/PARC story: created Mac, Adobe, 3Com…

Cyberspace is a New World
• We have discovered a “new continent”.
• It is changing how we learn, work, and play.
  – 1 T$/y industry
  – 1 T$ new wealth since 1993
  – 30% of US economic growth since 1993
• There is a gold rush to stake out territory.
• But we also need explorers:
  – Lewis & Clark expeditions
  – Universities to teach the next generation(s)
• Governments, industry, and philanthropists should fund long-term research.
[Image: THE LONG BOOM]

ACM 1998 Turing Lecture
http://research.microsoft.com/~gray/
• Organized around three seminal visionaries
  – Babbage: Computers
  – Bush: Automatic Information storage & access
  – Turing: Intelligent Machines
• A dozen LONG RANGE research goals
  – Vision, speech, …, Intelligent machines
  – Understand information
  – Automatic programming, dependable computing
  – Most are AI complete.

How Much Information Is there?
• Soon everything can be recorded and indexed
• Most data will never be seen by humans
• Precious Resource: Human attention
  Auto-Summarization and Auto-Search are key technologies.
[Figure: scale of information from Kilo through Mega, Giga, Tera, Peta, Exa, and Zetta to Yotta, marking A Book, A Photo, A Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded. Source: http://www.lesk.com/mlesk/ksg97/ksg.html]
Small prefixes (negative powers of ten): 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli

Putting All Information Online
• Library of Congress study
  – NRC group looking at IT future of LC.
  – World-library is evolving
• CoRR
  – Put all scientific literature online (and free)
  – http://xxx.lanl.gov/archive/cs
• TerraServer
  – An example of the new digital library
• Sloan Digital Sky Survey (and an object catalog) http://www.sdss.org/

1.2 Billion Transactions Per Day
• 1 B tpd ran for 24 hrs.
• Out-of-the-box software
• Off-the-shelf hardware
• AMAZING!
• Sized for 30 days
• Linear growth
• 5 micro-dollars per transaction

How Much Is 1 Billion Tpd?
[Chart: Millions of Transactions Per Day (Mtpd)]
• 1 billion tpd = 11,574 tps ~ 700,000 tpm (transactions/minute)
• ATT
  – 185 million calls per peak day (worldwide)
• Visa ~20 million tpd
  – 400 million customers
  – 250K ATMs worldwide
  – 7 billion transactions (card+cheque) in 1994
• New York Stock Exchange
  – 600,000 tpd
• Bank of America
  – 20 million tpd checks cleared (more than any other bank)
  – 1.4 million tpd ATM transactions
• Worldwide Airlines Reservations: 250 Mtpd
[Chart categories: 1 Btpd, Visa, ATT, BofA, NYSE]

NCSA Super Cluster
http://access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html
• National Center for Supercomputing Applications, University of Illinois @ Urbana
• 512 Pentium II cpus, 2,096 disks, SAN
• Compaq + HP + Myricom + WindowsNT
• A Super Computer for 3M$
• Classic Fortran/MPI programming
• COM+ programming model

Transactions in the OS
• Transaction Processing Monitors (TP)
• Web Servers
• Database Systems (stored procedures)
• Object Request Brokers (ORB)
• Remote Procedure Call (RPC)
Programming model: requests invoke objects: COM+ or EJB
  – Quickly dispatch many requests
  – Authorize on the fly
  – Load balance across many servers
  – Provide transactional services (mgmt, restart/recovery, log, …)
  – Protocol translation (HTTP<->RPC or RPC<->LU6 or..)
All moving to core OS.
All converging on a common architecture.
Web Servers Embrace&Extend TPmonitor techniques
[Chart: SPS, servlets per second (ASPs served per second by IIS, 1,000 statement VBscript), for 1P, 2P, 4P, and 8P under NT4 InProc, W2K RC2 InProc, NT4 OOP, and W2K RC2 OOP. Shift from 4x200 MHz to 8x450 MHz.]

CyberBricks: Data Gravity
Processing Moves to Transducers
• Move Processing to data sources
• Move to where the power (and sheet metal) is
• Processor in
  – Modem
  – Display
  – Microphones (speech recognition) & cameras (vision)
  – Storage: Data storage and analysis

It’s Already True of Printers
Peripheral = CyberBrick
• You buy a printer
• You get
  – several network interfaces
  – A Postscript engine
    • cpu,
    • memory,
    • software,
    • a spooler (soon)
  – and… a print engine.

Remember Your Roots

Year 2002 Disks
• Big disk (5 $/GB)
  – 3”
  – 150 GB
  – 150 kaps (k accesses per second)
  – 30 MBps sequential
• Small disk (50 $/GB)
  – 1”
  – 1 GB
  – 100 kaps
  – 10 MBps sequential
• Both running Windows NT™ 7.0? (see below for why)

How Do They Talk to Each Other?
• Each node has an OS
• Each node has local resources: A federation.
• Each node does not completely trust the others.
• Nodes use RPC to talk to each other
  – CORBA? DCOM? IIOP? RMI?
  – One or all of the above.
• Huge leverage in high-level interfaces.
• Same old distributed system story.
[Diagram: two nodes, each a stack of Applications over RPC (?), streams, and datagrams over VIAL/VIPL, joined by Wire(s)]

Technology Drivers: The Promise of SAN/VIA: 10x in 2 years
http://www.ViArch.org/
• Today:
  – wires are 10 MBps (100 Mbps Ethernet)
  – ~20 MBps tcp/ip saturates 2 cpus
  – round-trip latency is ~300 us
• In the lab
  – Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet,…
  – Fast user-level communication
    • tcp/ip ~100 MBps using 10% of each processor
    • round-trip latency is 15 us

SAN: System Area Networks
Standard Interconnect
  Gbps Ethernet: 110 MBps
  PCI: 70 MBps
  UW Scsi: 40 MBps
  FW scsi: 20 MBps
  scsi: 5 MBps
• LAN faster than memory bus?
• 1 GBps links in lab.
• 100$ port cost soon
• Port is computer

Plug & Play Software
• RPC is standardizing: (DCOM, IIOP, HTTP)
  – Gives huge TOOL LEVERAGE
  – Solves the hard problems for you:
    • naming,
    • security,
    • directory service,
    • operations,...
• Commoditized programming environments
  – FreeBSD, Linux, Solaris,… + tools
  – NetWare + tools
  – WinCE, Win2K,… + tools
  – JavaOS + tools
• Apps gravitate to data.
• General purpose OS on controller runs apps.
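
As a rough illustration of the SAN/VIA figures on the Technology Drivers slide, the sketch below estimates how long one message takes today versus in the lab. The cost model (time = latency + size / bandwidth) and the use of the quoted round-trip latency as the fixed per-message cost are assumptions made here for illustration; only the four numbers come from the slide.

```python
# Rough cost model (an assumption, not from the slides): time = latency + size / bandwidth.
# 1 MBps equals 1 byte per microsecond, so size_bytes / MBps gives microseconds.
TODAY = {"latency_us": 300.0, "bandwidth_MBps": 10.0}   # 100 Mbps Ethernet figures
LAB = {"latency_us": 15.0, "bandwidth_MBps": 100.0}     # SAN/VIA lab figures


def transfer_time_us(size_bytes, link):
    """Estimated microseconds to move size_bytes over the given link."""
    return link["latency_us"] + size_bytes / link["bandwidth_MBps"]


for size in (100, 10_000, 1_000_000):
    t_today = transfer_time_us(size, TODAY)
    t_lab = transfer_time_us(size, LAB)
    print(f"{size:>9} bytes: today {t_today:>8.0f} us, lab {t_lab:>6.0f} us, "
          f"speedup {t_today / t_lab:.1f}x")
```

Under this model small messages gain the most (latency dominates), while large transfers approach the 10x bandwidth ratio.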

Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How it affects Microsoft Products and others
• Q&A

Scale Up and Scale Out
• Grow Up with SMP: 4xP6 is now standard
• Grow Out with Cluster: Cluster has inexpensive parts
[Diagram: Personal System, Departmental Server, SMP Super Server, Cluster of PCs]

Microsoft TerraServer: Scaleup to Big Databases
http://terraserver.Microsoft.com/
• Build a multi-TB SQL Server database
• Data must be
  – 1 TB
  – Unencumbered
  – Interesting to everyone everywhere
  – And not offensive to anyone anywhere
• Loaded
  – 1.5 M place names from Encarta World Atlas
  – 4 M Sq Km from USGS (1 meter resolution)
  – 1 M Sq Km from Russian Space agency (2 m)
• On the web (world’s largest atlas)
• Sell images with commerce server.

Microsoft TerraServer Background
• Earth is 500 Tera-meters square
  – USA is 10 tm2
• 100 tm2 land in 70°N to 70°S
• We have pictures of 6% of it
  – 3 tsm from USGS
  – 2 tsm from Russian Space Agency
• Someday
  – multi-spectral image
  – of everywhere
  – once a day / hour
• Compress 5:1 (JPEG) to 1.5 TB.
• Slice into 10 KB chunks
• Store chunks in DB
• Navigate with
  – Encarta™ Atlas
    • globe
    • gazetteer
  – StreetsPlus™ in the USA
[Figure: image pyramid: .2x.2 km2 tile, .4x.4 km2 image, .8x.8 km2 image, 1.6x1.6 km2 image]

USGS Digital Ortho Quads (DOQ)
• US Geological Survey
• 4 Tera Bytes
• Most data not yet published
• Based on a CRADA
  – Microsoft TerraServer makes data available.
[Image: USGS “DOQ”, 1x1 meter, 4 TB, Continental US, New Data Coming]

Russian Space Agency (SovInformSputnik) SPIN-2
(Aerial Images is Worldwide Distributor)
• 1.5 Meter Geo Rectified imagery of (almost) anywhere
• Almost equal-area projection
• De-classified satellite photos (from 200 KM)
• More data coming (1 m)
• Selling imagery on Internet.
• Putting 2 tm2 onto Microsoft TerraServer.
[Image: SPIN-2]

Demo
http://www.TerraServer.Microsoft.com/
[Images: Microsoft BackOffice, SPIN-2]

Hardware
[Diagram: Internet, DS3, 100 Mbps Ethernet Switch, Map Server, Site Servers, SPIN-2, Web Servers]
• 1 TB Database Server: AlphaServer 8400 4x400, 10 GB RAM
• 324 StorageWorks disks
• 10 drive tape library (STC Timber Wolf DLT7000)

Software
[Diagram of software components: Web Client (Java Viewer, browser, HTML), The Internet, TerraServer Web Site, Internet Information Server 4.0, Active Server Pages, MTS, Image Server, Microsoft Site Server EE, TerraServer DB, SQL Server 7, Terra-Server Stored Procedures, Automap Server, Microsoft Automap ActiveX Server, Image Provider Site(s), Image Delivery Application]

System Management & Maintenance
• Backup and Recovery
  – STK 9710 Tape robot
  – Legato NetWorker™
  – SQL Server 7 Backup & Restore
  – Clocked at 80 MBps (peak) (~ 200 GB/hr)
• SQL Server Enterprise Mgr
  – DBA Maintenance
  – SQL Performance Monitor

After a Year:
• 1 TB of data, 750 M records
• 2.3 billion Hits
• 2.0 billion DB Queries
• 1.7 billion Images sent
• 368 million Page Views
• 99.93% DB Availability
• 3rd design now Online
• Built and operated by team of 4 people
• In late July 99 Operations missed 32 hr outage (!)
[Chart: TerraServer Daily Traffic, Jun 22, 1998 thru June 22, 1999: Sessions, Hit, Page View, DB Query, Image (count axis 0 to 30M)]
[Chart: Total Time (Hours) up vs. Down Time (Hours:minutes) by cause: Operations, Scheduled, HW+Software]

TerraServer What Next
• Integrated with Encarta Online (a classic technology transfer story)
• Adding USGS Topographic maps (4 TB more)
• Potential European coverage (?)
• Adding multi-layer maps (with UC Berkeley)
• Thinking about Geo-Spatial extension to SQL Server

Automatic Testing
• 60% of Microsoft R&D is testing.
• What can research do to help?
  – beyond joining the 500,000 Win2K beta testers
• Test generation robot:
  – Make up SQL queries
  – Send them to SQL Server, Oracle, DB2, Informix,…
  – If the answers are the same, great; if not, there is a problem
[Table: results per test case for systems W, X, Y, and Z. All four agree 84%. W, X, and Y agree 95%. Error: problem with intermediate table.]
• Also good for stress tests
• Found MANY bugs in our products (all fixed).
• Found MANY bugs in others’ products.
• Very valuable tool.
• MSR-TR-98-21 Massive Stochastic Testing of SQL, Slutz, Don
  http://research.microsoft.com/scripts/pubDB/pubsasp.asp?RecordID=175

Gordon Bell on Tele Presentations
http://research.microsoft.com/barc/GBell/

Motivation: Telepresentations
• Presenter and/or audience telepresent
• NOT: meeting or collaboration settings
• Avoids the nasty social issues!
• Mostly one-way

Telepresentation Elements
• Slides
• Audio
• Video
• Script, text comments, hyperlinks, etc.

Telepresentations: The Essentials
• Slide and audio a must
• Add some video (low quality) to make us feel good
• Storage and transmission costs low

Telepresentations: The Killer App
• Increased attendance & lower travel costs
• Practical and low-cost NOW
• e.g. ACM97 - 2,000 visitors in real space, 20,000 visitors on Internet
  http://research.microsoft.com/acm97

Today’s Experiment
• Would you like to pause, rewind, browse?
• Do you wish you could have seen this
  – At home?
  – At another time?
• How much does a present speaker add? How much would you pay for real presence?

University Lectures Online
• Research lectures on-line & on-demand
• http://murl.microsoft.com/
• Will get UVC content
• Available to anyone anywhere
  – T1 good, 28.8 OK
• Generated by CMU, MIT, MSR, Stanford, UW, Xerox
• Hosted by MSR

Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How it affects Microsoft Products and others
• Q&A

Microsoft Research
• 450 Scientists
• University research model:
  – Open publication
  – Collaboration
  – Many visitors
• Many research areas
• Major focus on break-throughs in human-computer interfaces
• Mostly Redmond, Washington, USA
• Also Labs in Beijing, China; Cambridge, UK; San Francisco, CA, USA

Flows

Making “Flows” a Reality
• Computer Graphics
  – Creating realistic looking environments, people
• Computer Vision
  – Analyzing posture, gaze, gestures
• Speech input/output
• Natural Language
  – Analysis, IR
• Implicit requests for information

Building life-like human characters

Generating life-like speech from textual data
• Data-driven stochastic speech
  – Natural sounding
  – Rapid, automatic customizability
• Examples
  – Synthetic voice w/ transplanted speech contours

Artificial singing
• AT&T Voder, 1962, by Homer Dudley
  – Daisy (Inspiration for HAL’s voice in 2001)
• Microsoft Research Whistler, 1997
  – Scarborough Fair

Understanding language: MindNet
• Ten year investment
• A huge language knowledge base
• Automatically created from dictionaries
• Words (nodes) linked by relationships
• Millions of links
• Recently added (Encarta) encyclopedia knowledge; working on web knowledge

Changing balance between user & software systems
• Yesterday:
  – Applications were single programs running in isolation
  – Users used to (more or less) understand systems that they used
• Today:
  – Componentized applications operate in concert
  – Sophisticated users understand only a small percentage of the systems they use

Examples of user agents & implicit actions
• Lumiere (Office 97)
  – Monitoring user and program events to provide user help and assistance
• Implicit queries
  – Inferring information needs from browsing
• Lookout SpamKiller
  – Monitoring mail activity to auto-categorize it

Tomorrow’s Systems and Applications
• Users will not be able to predict
  – where computations will be performed,
  – when they will be performed or
  – by what software components
• Gap between system capabilities and user understanding will grow to the point that the only way users will be able to use a system is through assisting agents

Millennium
• Long-term research project to eliminate distinction between distributed and local computing.
• Raise the Level of Abstraction
• Maintain single system image.
• Transparent invocation, migration, and recovery.
• Individual computers, file systems, and networks become unimportant to application developers.
• Systems auto-configure, auto-monitor, auto-tune
[Diagram: Applications over Millennium, over COM+ and NT on each of several machines]

Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How Research affects Products
• Q&A

Analyzing language
• Language recognition shipped in Word 97
• General purpose text-critiquing, summarization, Japanese word-breaking

Inside The Office Grammar Checker

Microsoft ClearType
• 200% - 300% increase in resolution
• S/W solution that works on existing color LCD displays

SQL 7 Tuning Wizard
• Automate physical database design
• Analyzes actual server usage history
• Makes recommendations to improve performance

SQL 7 Index Wizard is Good but will get better
• On a complex query set wizard is 90% of best expert.
• Extending to other aspects of DB design

Data Mining
• Find interesting structure (patterns, relationships) in data
  – Prediction
  – Segmentation (clustering)
  – Dependency modeling (find distribution)
  – Summarization
  – Trend and change detection and modeling
• Allow user to state the query in terms of the business logic
  – User does not speak statistics or SQL
• Use data to build predictors
  – regression, classification, segmentation etc.
• Generate summaries and reports for insight
  – find “easy to describe” segments in data automatically
  – find segments not known to analyst

Example Embedded Feature: Microsoft SiteServer Commerce 3.0
• Intelligent Cross-sell
• Based on:
  – Historical sales baskets in stores
  – Contents of current shopper basket
  – Browsing behavior of shopper
• Predict: ranking of products in store likely to be most interesting to shopper.
Http://www.holtoutlet.com/outlet4

[Chart: % captured of true targets vs. % mailed. Mail to 25% and capture 40%. 400% improved response! Real data drawn from a Microsoft marketing example.]

How do people use www.microsoft.com?
• 70M hits per day
• 10M users/week
[Diagram: User browsing data, Data Mining (Clustering) Engine, X segments, Cluster Visualizer, Wizard]

Windows 2000 IntelliMirror™
• Extends CMU Coda File System ideas
• Files and settings mirrored on client and server
• Great for disconnected users
• Facilitates roaming
• Easy to replace PCs
• Optimizes network performance
• Grew out of a research prototype

WinSock Direct Path
  – 70 us latency
  – 86 MBps bandwidth
  – USER LEVEL (application to application)!
  – On workstation-class PCs
• In Windows2000.
• Sockets is the “standard” interface for programs.
• Internet apps all use sockets.
• So, can we make sockets fast?
[Chart: latency in microseconds vs. data size (0 to 8192 bytes) for TCP, LPC, VIA-copy, and VIA-direct]
[Chart: bandwidth (MBps) vs. data size (0 to 65536 bytes) for VIA-direct, VIA-copy, and TCP]
COM+ over SAN: High-Performance Distributed Objects over a System Area Network
Li, Li; Forin, Alessandro; Hunt, Galen; Wang, Yi-Min, December 1998, MSR-TR-98-68
http://research.microsoft.com/scripts/pubDB/pubsasp.asp?RecordID=214

Outline
• What I am doing
• What my group is doing
• What Microsoft Research is doing
• How it affects Microsoft Products and others
• Q&A

Scaleable Servers
Jim Gray
Microsoft Research
Gray@Microsoft.com
http://Research.Microsoft.com/~Gray/Talks

Exponential Growth
• Some enterprises growing well: 15%/year, business as usual
• Some enterprises growing fast: 50%/year, can ride Moore’s law
• Some enterprises are exploding: 500%/year, huge demand for services
  Example Hotmail: 300%/year
• New demand
  – As customers go online
  – As online allows global consolidation

Unpredictable Growth
• The TerraServer Story:
  – We expected 5 M hits per day
  – We got 50 M hits on day 1
  – We peak at 15-20 M hpd on a “hot” day
  – Average 5 M hpd after 1 year
• Most of us cannot predict demand
  – Must be able to deal with NO demand
  – Must be able to deal with HUGE demand

An Architecture for X?
• Need to be able to add capacity
  – New processing
  – New storage
  – New networking
• Need continuous service
  – Online change of all components (hardware and software)
  – Multiple service sites
  – Multiple network providers
• Need great development tools
  – Change the application several times per year.
  – Add new services several times per year.

Premise: Each Site is a Farm
• Buy computing by the slice:
  – Rack of servers + disks.
• Grow by adding slices
  – Spread data and computation to new slices
• Two styles:
  – Clones: anonymous servers
  – Mobs+Packs: Partitions fail over within a pack
• In both cases, remote site for disaster recovery

Clones: Availability+Scalability
• Some applications are
  – read-mostly
  – low in consistency requirements
  – modest in storage requirements (less than 1 TB)
• Examples:
  – HTML web servers (IP sprayer/sieve + replication)
  – LDAP servers (replication via gossip)
• Replicate whole app at all nodes (clones)
• Spray requests across nodes.
• Grow by adding clones
• Fault tolerance: stop sending requests to a failed clone.

What Clones Need
• Automatic replication
  – Applications (and system software)
  – Data
• Automatic request routing
  – Spray or sieve
• Management:
  – Who is up?
  – Update management & propagation
  – Application monitoring.

Mobs for Scalability
• Clones do not always work.
• Some applications are
  – stateful
  – with fairly high update rate
• Examples
  – Email
  – Databases
• Partition state among servers (mob)
• Scalability:
  – Partition split/merge
  – Partitioning must be transparent to client.

Packs for Availability
• Each partition may fail (independent of others)
• Partitions migrate to new node via fail-over
  – Failover in seconds
• Pack: the nodes supporting a partition
• Mobs typically grow in packs.

What Mobs+Packs Need
• Automatic partitioning (in dbms, mail, files,…)
  – Location transparent
  – Partition split/merge
• Simple failover model
  – Partition migration is transparent
  – MSCS-like model for services
• Application-centric request routing
• Management:
  – Who is up?
  – Automatic partition management (split/merge)
  – Application monitoring.
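
The two routing styles above can be sketched in a few lines. This is a minimal illustration, not how any Microsoft product routes requests: the class names, the round-robin spray, and the hash-based choice of partition are all assumptions made here, since the slides only say that clones are sprayed and that mob partitions fail over within a pack.

```python
import hashlib


class CloneFarm:
    """Clones: every node holds the whole app, so requests are sprayed round-robin."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()      # clones currently marked as failed
        self._counter = 0

    def route(self):
        live = [n for n in self.nodes if n not in self.down]
        if not live:
            raise RuntimeError("no clone is up")
        node = live[self._counter % len(live)]   # skip failed clones entirely
        self._counter += 1
        return node


class MobOfPacks:
    """Mob: state is partitioned; each partition is served by a pack of nodes."""

    def __init__(self, packs):
        # packs[i] lists the nodes able to serve partition i, in failover order.
        self.packs = [list(p) for p in packs]
        self.down = set()

    def partition_of(self, key):
        # Hash partitioning is an illustrative assumption, not from the slides.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.packs)

    def route(self, key):
        for node in self.packs[self.partition_of(key)]:
            if node not in self.down:
                return node    # first live node in the pack owns the partition
        raise RuntimeError("whole pack is down")


farm = CloneFarm(["web1", "web2", "web3"])
farm.down.add("web2")
print([farm.route() for _ in range(4)])

mob = MobOfPacks([["mailA1", "mailA2"], ["mailB1", "mailB2"]])
owner = mob.route("alice@example.com")
mob.down.add(owner)            # simulate a node failure
print(owner, "fails over to", mob.route("alice@example.com"))
```

The client calls `route` the same way before and after the failure, which is the sense in which partitioning and failover stay transparent to it.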

AlwaysUP: site pairs
• Tape-based backup/restore too slow for online
• Keep online copy of data at second site
  – Clone or transaction log
• Failover to second site in case of disaster
• Masks many
  – Environmental faults
  – Operations faults
  – Some software faults
• Also eases many operations problems

MAS
• Manageability
• Availability
• Scalability

Manageability
• Manage growth & change of
  – Applications
  – Nodes/servers
  – Data
  – Sites
• Automate standard tasks
• Operations deals with exceptions
  – A few per hour?
  – If load grows 10x then a few per minute?
  – A few events per year is the safe zone!

System Management Must be Automatic
• Self operating
  – propagate changes to all members of cluster
• Self tuning
  – load balance, design,…
• Self repair
  – Failover
  – Software upgrades
  – Call-home for replacement parts.

Availability
• We monitor most large web sites.
• They deliver worse than 99% availability.
• There is a lot of “hype” about 6-9’s (99.9999% = ½ minute/year)
• Those people are not counting scheduled downtime!
• MCI scheduled 36 hours of downtime a few weekends ago!
• Let’s talk end-to-end availability.

SMP+NUMA

The Largest TPC-C Benchmark
Sun+Oracle: 115,395.73 tpmC @ 105.63 $/tpmC == 12.2 M$, available 8/22/99
• 27 Sun StorEdge A3500 with 1778 disks (15.6 TB)
• Sun E10000 + Solaris + Oracle8i, 64x400 MHz cpus + 64 GB RAM (8,430 k$ hardware + 1,890 k$ software)
• Tuxedo® on 32 Sun Ultra 10 333 MHz workstations, Wyse terminals (429 k$ hardware, 168 k$ software)

Commodity TPC-C Benchmark
Compaq + Microsoft: 40,266.4 tpmC @ 18.70 $/tpmC == 0.8 M$, available 12/31/99.
• Compaq SmartArray RAID5: 477 disks, 5 TB, 418 k$
• Compaq Proliant 8000, Microsoft NT4, SQL 7: 8x550 MHz Intel Xeon, 4 GB RAM (101 k$ hardware, 49 k$ software)
• IIS web server on 5 Compaq Proliant 2x500 MHz workstations (35 k$ hardware, 4 k$ software)

Unix Per Unit Costs Much Higher
• Disk $/MB: 3.3x more
• CPU/trans: 11x more
• Software/tran: 27x more
• Network/tran: ~ same
• Compaq:
  – 3 vs 30 cabinets
  – 5 vs 33 nodes
  – 18 vs 96 cpus
  – Pure COM+
[Chart: $/tpmC by component (cpu+mem, disks, Network, software), Sun+Oracle vs. Compaq+Microsoft, $0 to $50]

Oracle 8i on Sun Starfire 64x: 115396 tpmC @ 105 $/tpmC
Component     Cost (k$)   %      Per tpmC
cpu+memory    3871        31%    $34
disks         5171        41%    $45
Network       438         3%     $4
software      3058        24%    $27
SUM           12538       100%   $109
Ram $/MB: $7.12   Disk $/MB: $0.33

Compaq: 40,013 tpmC @ 18.86 $/tpmC
Component     Cost (k$)   %      Per tpmC
cpu+memory    127         17%    $3
disks         439         58%    $11
Network       141         19%    $4
software      49          6%     $1
SUM           756         100%   $19
Ram $/MB: $3.87   Disk $/MB: $0.09

What Conclusions?
• Sun+Oracle has impressive performance but…
• 8x more cpus give 2.6x more throughput
• The first 40 ktpmC cost 0.8 M$
• The next 80 ktpmC cost 11.4 M$
• Why not buy 3 “small” systems and spread the load?
  – Save 9.8 M$
  – Get fault tolerance (failover to other server)
  – Would require partitioning the database and app.
• Scaleout with commodity components.
• Clone whole system at remote site for disaster protection

How much is 100 kTpmC?
• 100,000 users
• Each submitting 2.3 transactions/minute
• About 300 M transactions/day
• A huge number!
• About 1/3 of Yahoo! load

IIS Performance
[Chart: Pages/Sec serving 1,000 statement ASPs on UP, 2P, 4P, and 8P machines, In-Proc and OOP, for Win NT4 sp5 and Win 2000 (RC1; beta build 2111), showing the new COM+]

NCSA
• Super-computing performance at mail-order prices.
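
The arithmetic behind the “What Conclusions?” and “How much is 100 kTpmC?” slides can be checked with a short sketch. The throughput and price figures are the ones quoted on the TPC-C slides; the helper names and the assumption that load spreads perfectly across small systems are illustrative only (the slide itself notes that this would require partitioning the database and app).

```python
import math

SUN_TPMC, SUN_PRICE_M = 115_395.73, 12.2       # Sun+Oracle result and price (M$)
COMPAQ_TPMC, COMPAQ_PRICE_M = 40_266.4, 0.8    # Compaq+Microsoft result and price (M$)


def scale_out(target_tpmc, unit_tpmc, unit_price_m):
    """Number of small systems needed to reach target_tpmc and their total price (M$).

    Assumes the load partitions perfectly across systems.
    """
    count = math.ceil(target_tpmc / unit_tpmc)
    return count, count * unit_price_m


def transactions_per_day(users, tx_per_user_per_minute):
    """Daily transaction volume for a steady per-user submission rate."""
    return users * tx_per_user_per_minute * 60 * 24


count, price = scale_out(SUN_TPMC, COMPAQ_TPMC, COMPAQ_PRICE_M)
print(f"{count} small systems cost {price:.1f} M$, saving {SUN_PRICE_M - price:.1f} M$")
print(f"{transactions_per_day(100_000, 2.3):,.0f} transactions/day")
```

The daily total comes out a little above the “about 300 M” quoted on the slide, which rounds down.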

Cornell

Beowulf

ASCII

www.Microsoft.com: a typical web cluster server
[Diagram: The Microsoft.Com Site. Building 11 (internal WWW, staging servers, log processing, live SQL servers, SQL consolidators, SQL reporting; all servers in Building 11 are accessible from corpnet), MOSWest, the European Data Center, and the Japan Data Center, connected by FDDI rings (MIS1 to MIS4), switched Ethernet, primary and secondary Gigaswitches, and routers to the Internet over 13 DS3 (45 Mb/Sec each), 2 OC3, and 2 Ethernet (100 Mb/Sec each) links. Server groups include www, home, premium, register, support, search, activex, cdm, and FTP.microsoft.com, msid.msn.com and register.msn.com, plus FTP and HTTP download servers. Average configurations are 4xP6 (some 4xP5) with 256 RAM to 1 GB RAM and 12 GB to 180 GB HD, at an average cost of $24K to $128K per server, each group with an FY98 forecast count.]

TerraServer as an Example

TerraServer Manageability

TerraServer Availability

TerraServer Scalability
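
To make the availability figures in this talk concrete, the sketch below converts an availability percentage into downtime per year. It assumes a 365-day year and that every kind of outage, scheduled or not, counts against availability; the function name is hypothetical. The 99.93% input is the TerraServer DB availability reported earlier, and the others are the “nines” discussed on the Availability slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumes a 365-day year


def downtime_seconds_per_year(availability_percent):
    """Yearly downtime implied by an end-to-end availability percentage."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for pct in (99.0, 99.93, 99.9999, 99.99999):
    secs = downtime_seconds_per_year(pct)
    print(f"{pct}% available: {secs / 3600:8.2f} hours = {secs:10.1f} seconds down per year")
```

By this measure 99% availability allows more than three and a half days of downtime a year, 99.93% allows about six hours, and six nines allows about half a minute.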