Microsoft Research Directions
Jim Gray, Senior Researcher
Microsoft Corporation
gray@microsoft.com
http://www.Research.Microsoft.com/~Gray

Microsoft Research
• Goal: pursue strategic technologies for Microsoft
• Founded in 1991
• 200 researchers in 12 areas
• Redmond, San Francisco, Cambridge England
• Growing to 600 by 2001
• Internationally recognized research teams
  - Many publications, conference presentations
  - Leadership roles in professional societies, journals, conferences
• Direct involvement with product and service groups at Microsoft

Microsoft Research Themes
• Programming tools, methodologies and techniques
  - Basic block tool, program analysis, IP
• Advanced interactivity and intelligence
  - Speech, natural language, vision
  - Decision theory, 3D graphics, UI
• Systems and architecture
  - OS, databases, scalable servers

Advanced Development Tools
• Analysis of executables
  - Dynamic analysis driven by user scenarios
  - Instrumented code
• Automatic reorganization of executables
  - Reduction of code working set size
  - Branch straightening
  - Boot ordering for boot-time reduction

Initial Results
• Reduced code working sets by up to 50%
• Improved throughput by 10%
• Delivered to ~35 clients
[Figure: Windows NT working set size (pages referenced, 0 to 500) over 100 seconds, original vs. optimized]

Speech Technology
• Speech recognition
  - Dictation: speaker-independent, large vocabulary, discrete and continuous speech
  - Command and control: speaker-independent
• Trainable speech synthesis
  - Prosody and concatenative speech units learned from corpus
• Download from the MS Research web site

Natural Language
• Broad-coverage syntax analyzes unrestricted text
• Dictionary-based semantic network provides a growing knowledge base
• Flexible underlying system for multiple languages
[Figure: NL processing layers (morphology, initial syntax, revised syntax, logical form, sense choosing, concept normalizing, discourse/pragmatics) and the applications built on them (find and replace, phrase spacing, syntactic and semantic critiques, auto indexing, improved IR and SR, NL query, enhanced help, interactive games and movies)]
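The "automatic reorganization of executables" listed under Advanced Development Tools can be illustrated with a toy model. A scenario trace touches only a few functions, so packing those functions next to each other means fewer code pages get touched. This is a hedged sketch and not the Microsoft Research tool. The 4 KB page size, the function sizes, the trace, and the first-use ordering heuristic are all hypothetical; the slides do not say which heuristic the tool used.

```python
# Toy model of code working-set reduction by reordering functions.
# Hypothetical assumptions (not from the slides): 4 KB pages, known
# function sizes, and a scenario trace listing function calls in order.
from typing import Dict, List

PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(layout: List[str], sizes: Dict[str, int], trace: List[str]) -> int:
    """Count distinct code pages referenced when `trace` runs against `layout`."""
    offsets: Dict[str, int] = {}
    pos = 0
    for name in layout:  # pack functions back to back in layout order
        offsets[name] = pos
        pos += sizes[name]
    touched = set()
    for name in trace:
        first = offsets[name] // PAGE_SIZE
        last = (offsets[name] + sizes[name] - 1) // PAGE_SIZE
        touched.update(range(first, last + 1))  # every page the body spans
    return len(touched)


def reorder_by_first_use(layout: List[str], trace: List[str]) -> List[str]:
    """Place functions the scenario uses first, in first-use order; the rest follow."""
    hot = list(dict.fromkeys(trace))  # de-duplicate, keeping first-use order
    cold = [name for name in layout if name not in hot]
    return hot + cold


# Example: sixteen 2 KB functions; the scenario uses every fourth one.
sizes = {f"f{i}": 2048 for i in range(16)}
original = [f"f{i}" for i in range(16)]
trace = ["f0", "f4", "f8", "f12", "f0", "f4"]
optimized = reorder_by_first_use(original, trace)
print(pages_touched(original, sizes, trace))   # 4
print(pages_touched(optimized, sizes, trace))  # 2
```

In this toy case, the hot functions are spread over four pages in the original layout and fit in two after reordering. A real tool works from measured traces and may reorder at a finer grain, such as basic blocks.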
Levels Of Writing Critiques
• We scheduled the next meeting for noon.
• Each of the products are designed to help.
• I saw the Grand Canyon flying to Arizona.
• Ladies are requested not to have children in the bar. (From a sign in a Norwegian cocktail lounge)

Comic Chat
• Comic panels based on chat input
• Users control their characters' emotions
• Comic strip acts as a compelling record of the conversation
• Automated:
  - Character placement
  - Balloon construction
  - Balloon layout
  - Camera zoom
  - Panel breaks
  - Etc.

3D Graphics Research
• Bring very high-performance, high-quality graphics to PCs
• Modeling
• Interactive
• Uniform treatment of multimedia
• Representation of 3D models
• Automatic simplification
• Animation

Simplification Problem
[Figure: one 3D model shown at successive levels of simplification, with counts ranging from 70,000 down to 2,300]
• Competing goals: accuracy and conciseness

Vision Projects
• 3D reconstruction from video and images
  - Motion analysis for video compression
  - Model acquisition for rendering
• Visual human/computer interaction
  - Communication by gestures and expressions
  - Multimodal speech/vision interfaces

Motion Analysis
• Convert masked images into a background sprite for content-based coding
[Figure: masked frames combined into a background sprite ("Scrunch")]
• Working with Softimage on motion tracking

Video-Based 3D Modeling
• Convert a video sequence into a solid 3D model based on object silhouettes
• Being used in the Lumigraph project

Systems Research Areas
• Scalable, fault-tolerant servers and services
  - Most of this talk is about scalable servers
• Other OS projects
  - Video & audio servers: NetShow
  - Real-time OS for set-top boxes
  - WindowsCE grew from an AT project
  - High-performance distributed computing
  - Zero Admin Windows
  - IPv6

1987: 256 tps Benchmark
• $14 million computer (Tandem)
• A dozen people: admin expert, hardware experts, auditor, network expert, performance expert, manager, DB expert, OS expert
• False floor, two rooms of machines
• A 32-node processor array
• A 40 GB disk array (80 drives)
• Simulate 25,600 clients

1997: 10 Years Later
• One person and one box = 1,250 tps
  - One breadbox ~ 5x the 1987 machine room
  - 23 GB is hand-held
• One person does all the work: hardware, OS, net, DB, and app expert
• Cost/tps is 1,000x less: 1 micro-dollar per transaction
• The box: 4x200 MHz CPU, 1/2 GB DRAM, 12x4 GB disk, 3x7x4 GB disk arrays

Thesis: Many Little Beat Few Big
• Price classes: $1 million mainframe, $100,000 mini, $10,000 micro, nano, pico processor
• Disk form factors shrink from 14" through 9", 5.25", 3.5", and 2.5" to 1.8"
• Pico processor (1 mm³), a "smoking, hairy golf ball":
  - 1 MB 10 picosecond RAM
  - 100 MB 10 nanosecond RAM
  - 10 GB 10 microsecond RAM
  - 1 TB 10 millisecond disc
  - 100 TB 10 second tape archive
  - 1 M SPECmarks, 1 TFLOP
  - 10^6 clocks to bulk RAM
  - Event horizon on chip
  - VM reincarnated
  - Multiprogram cache, on-chip SMP
• How to connect the many little parts?
• How to program the many little parts?
• Fault tolerance?

Future Super Server: 4T Machine
• Array of 1,000 4B machines
  - 1 bips processors
  - 1 BB DRAM
  - 10 BB disks
  - 1 Bbps comm lines
  - 1 TB tape robot
• A few megabucks
• Challenge: manageability, programmability, security, availability, scalability, affordability
• As easy as a single system
• Cyber brick: a 4B machine (CPU, 5 GB RAM, 50 GB disc)
• Future servers are CLUSTERS of processors and discs
• Distributed database techniques make clusters work

The Hardware Is In Place
• And then a miracle occurs?

SNAP: Scalable Network And Platforms
• Commodity-distributed OS built on:
  - Commodity platforms
  - Commodity network interconnect
• Enables parallel applications

Scalable Computers: BOTH SMP and Cluster
• Grow up with SMP: personal system → departmental server → SMP super server
  - 4xP6 is now standard
• Grow out with cluster: cluster of PCs
  - Cluster has inexpensive parts

What TPC Benchmarks Say
• PC technology 2.5x cheaper than high-end SMPs
• PC performance is 1/4 of high-end SMPs
• 4xP6 vs. 24x UltraSPARC
  - 9.1 ktpmC @ $49/tpmC vs. 31 ktpmC @ $109/tpmC
  - 6x more CPUs, 3.5x more throughput
  - NT 2.3 ktpmC/CPU vs. Solaris 1.3 ktpmC/CPU
• Still, UltraSPARC performance IS impressive
• Commodity solutions will come

[Figures: MS SQL Server tpmC vs. time, 5/95 to 8/97 (200%/year growth); MS SQL Server $/tpmC vs. time, 5/95 to 8/97 (200%/year better)]

HP’s New TPC-C Result
[Figure: TPC price/tpmC broken down by processor, disk, software, net, and total/10, for Oracle on UltraSPARC (31 ktpmC) vs. Microsoft on HP (9.1 ktpmC)]

How Big Are Windows NT™ SQL Servers?
• Study found:
  - Several at 50 GB to 100 GB nodes
  - A few multi-node, up to one TB
  - None beyond 100 GB per node
  - http://131.107.1.182/research/barc/gray/SQL Server Scaleability.doc
• A survey shows relatively few operational DBs beyond 1 TB (1 TB ~ $500K of disk!)
  - http://www.wintercorp.com/topten.html
• Want to “pioneer” large DBs on Windows NT

Goal
• Build a 1 TB SQL Server database
• Demo it on the Internet
• Show off Windows NT and SQL Server scalability
• Stress test the product
• WWW accessible by anyone
• So the data must be:
  - 1 TB
  - Unencumbered
  - Interesting to everyone everywhere
  - And not offensive to anyone anywhere

The Hardware
• DEC Alpha + 324 StorageWorks drives (1.4 TB)
• SQL Server 7.0

Image Data Sources
• USGS data: DOQ, 300 GB (source: USGS and UCSB; UCSB is missing some DOQs)
• Russian space data: SPIN-2, two-meter resolution images, 500 GB, worldwide
• LOB app
• New data coming

Demo

Cluster Advantages
• Clients and servers made from the same stuff
• Fault tolerance: spare modules mask failures
• Modular growth: grow by adding small modules
• Inexpensive: built with commodity components
• Parallel data search: use multiple processors and disks

Cluster: Shared What?
• Shared memory multiprocessor
  - Multiple processors, one memory
  - All devices are local
  - DEC, SG, Sun, Sequent; 16..64 nodes
  - Easy to program, not commodity
• Shared disk cluster
  - An array of nodes
  - All share common disks
  - VAXcluster + Oracle
• Shared nothing cluster
  - Each device local to a node
  - Ownership may change
  - Tandem, SP2, “Wolfpack”

Clusters Being Built
• Teradata: 500 nodes (50K$/slice)
• Tandem, VMScluster: 150 nodes (100K$/slice)
• Intel: 9,000 nodes @ $55 million (6K$/slice)
• IBM: 512 nodes @ $100 million (200K$/slice)
• Teradata, Tandem, DEC moving to Windows NT + low slice price
• PC clusters (bare handed) at dozens of nodes
  - Web servers (msn, PointCast, ...), DB servers
• Key technology is the applications
  - Applications distribute data
  - Applications distribute execution
  - “It’s the applications, STUPID!”

Billion Transactions per Day Project
• Built a 45-node Windows NT cluster (with help from Intel & Compaq)
  - ~900 disks
  - All off-the-shelf parts
• Using SQL Server & DTC distributed transactions
• DebitCredit transaction
  - Each node has 1/20th of the DB
  - Each node does 1/20th of the work
  - 15% of the transactions are “distributed”

Billion Transactions Per Day Hardware
• 45 nodes (Compaq ProLiant)
• Clustered with 100 Mbps switched Ethernet
• 140 CPUs, 13 GB DRAM, 3 TB disk

Type                                 Nodes x model              CPUs   DRAM        Ctlrs   Disks                       RAID space
Workflow MTS                         20 x Compaq ProLiant 2500  20x2   20x128 MB   20x1    20x1                        20x2 GB
SQL Server                           20 x Compaq ProLiant 5000  20x4   20x512 MB   20x4    20x(36x4.2 GB + 7x9.1 GB)   20x130 GB
Distributed Transaction Coordinator  5 x Compaq ProLiant 5000   5x4    5x256 MB    5x1     5x3                         5x8 GB
TOTAL                                45                         140    13 GB       105     895                         3 TB

Results
• 1.2 B tpd
• 1 B tpd ran for 24 hours
• Sized for 30 days
• Linear growth
• 5 micro-dollars per transaction
• Out-of-the-box software
• Off-the-shelf hardware
• AMAZING!

How Much Is 1 Billion Tpd?
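The question can be answered with a few lines of arithmetic. Convert the daily rate to per-second and per-minute rates, then split it across the 20 SQL Server nodes described above. This is a minimal sketch that assumes load is spread evenly over the day and across nodes. Real traffic has peaks, so these are averages only. The variable names are illustrative.

```python
# Back-of-envelope check of the billion-transactions-per-day numbers.
# Assumption: load is spread evenly over 24 hours and across the 20
# SQL Server nodes; the results are averages, not peak rates.
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440

TPD = 1_000_000_000              # 1 billion transactions per day
SQL_NODES = 20                   # each node holds 1/20th of the database
DISTRIBUTED_FRACTION = 0.15      # 15% of transactions are distributed
DAYS_SIZED = 30                  # the system was sized for 30 days

tps = TPD / SECONDS_PER_DAY                      # average per second
tpm = TPD / MINUTES_PER_DAY                      # average per minute
tps_per_node = tps / SQL_NODES                   # each node does 1/20th
distributed_per_day = TPD * DISTRIBUTED_FRACTION
transactions_in_30_days = TPD * DAYS_SIZED

print(f"{tps:,.0f} tps")                                 # 11,574 tps
print(f"{tpm:,.0f} tpm")                                 # 694,444 tpm
print(f"{tps_per_node:,.0f} tps per SQL Server node")    # 579 tps per SQL Server node
print(f"{distributed_per_day:,.0f} distributed per day") # 150,000,000 distributed per day
print(f"{transactions_in_30_days:,} in 30 days")         # 30,000,000,000 in 30 days
```

Each SQL Server node only needs to sustain a few hundred transactions per second on average. That is why a cluster of commodity nodes can reach a figure that sounds enormous as a daily total.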
• 1 billion tpd = 11,574 tps (transactions per second) ~ 700,000 tpm (transactions/minute)
• ATT: 185 million calls per peak day (worldwide)
• Visa: ~20 million tpd
  - 400 million customers
  - 250K ATMs worldwide
  - 7 billion transactions (card + cheque) in 1994
• New York Stock Exchange: 600,000 tpd
• Bank of America:
  - 20 million tpd checks cleared (more than any other bank)
  - 1.4 million tpd ATM transactions
• Worldwide airlines reservations: 250 Mtpd
[Figure: millions of transactions per day, on linear and log scales, for 1 Btpd, Visa, ATT, BofA, and NYSE]

Clusters (Plumbing)
• Single-system image
  - Naming
  - Protection/security
  - Management/load balance
• Fault tolerance
  - “Wolfpack” demo
• Hot pluggable hardware and software

So, What’s New?
• When slices cost $50,000, you buy 10 or 20
• When slices cost $5,000, you buy 100 or 200
• Manageability, programmability, usability become key issues (total cost of ownership)
• PCs are much easier to use and program

MPP Vicious Cycle
• No customers!
[Figure: each new MPP brings a new OS and new apps, over and over]

CP/Commodity Virtuous Cycle
• Standards allow progress and investment protection
[Figure: standard OS and hardware → apps → customers]

Windows NT® Server Clustering
• High availability on standard hardware
• Standard API for clusters on many platforms
• No special hardware required
• Resource group is the unit of failover
• Typical resources:
  - Shared disk, printer, ...
  - IP address, NetName
  - Service (Web, SQL, File, Print, Mail, MTS, …)
• 2-node cluster in beta test now
• Available H1 ’97
• >2 node is next
• SQL Server and Oracle demo on it today

Key Concepts
• System: a node
• Cluster: systems working together
• Resource: hardware/software module
• Resource dependency: a resource needs another
• Resource group: fails over as a unit
• Dependencies: do not cross group boundaries
• API to define resource groups, dependencies, and resources
• GUI administrative interface
• A consortium of 60 HW and SW vendors (everybody who is anybody)

Where We Are Today
• Clusters moving fast
  - OLTP
  - “Wolfpack”
• Technology ahead of schedule
  - CPUs, disks, tapes, wires...
• OR databases are evolving
• Parallel DBMSs are evolving
• HSM still immature

Metcalfe’s Law
• Network utility = Users²
• How many connections can it make?
  - One user: no utility
  - 100,000 users: a few contacts
  - 1 million users: many on Net
  - 1 billion users: everyone on Net
• That is why the Internet is so “hot”
  - Exponential benefit

Moore’s First Law
• XXX doubles every 18 months (60% increase per year)
  - Microprocessor speeds
  - Chip density
  - Magnetic disk density
  - Communications bandwidth (WAN bandwidth approaching LAN speeds)
• Exponential growth:
  - The past does not matter
  - 10x here, 10x there, soon you’re talking REAL change
• PC costs decline faster than any other platform
  - Volume and learning curves
  - PCs will be the building bricks of all future systems
[Figure: one-chip memory size (2 MB to 32 MB), 1970 to 2000; chip capacity in bits from 1K through 4K, 16K, 64K, 256K, 1M, 4M, 16M, and 64M to 256M]

Bumps In The Moore’s Law Road
• DRAM:
  - 1988: United States antidumping rules
  - 1993-1995: price flat
• Magnetic disk:
  - 1965-1989: 10x/decade
  - 1989-1996: 4x/3 years!
[Figure: $/MB of DRAM and $/MB of disk, 1970 to 2000, log scale; disk improving 100x/decade]
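Metcalfe's law and Moore's law can both be checked with simple arithmetic. The slides state Metcalfe's law only as utility ~ Users². Counting distinct user pairs, n(n-1)/2, is one common way to make "how many connections can it make?" concrete; it is an interpretation, not a formula from the slides. The Moore's law helper turns a doubling period into yearly and ten-year growth factors. The function names are illustrative.

```python
# Two growth rules from the slides as plain arithmetic.
# Counting distinct user pairs, n*(n-1)/2, is one common reading of
# "how many connections can it make?"; it grows like Users**2.


def pairwise_connections(users: int) -> int:
    """Number of distinct user-to-user links among `users` people."""
    return users * (users - 1) // 2


def annual_factor(doubling_months: float) -> float:
    """Yearly growth factor for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 / doubling_months)


def decade_factor(doubling_months: float) -> float:
    """Growth over ten years at the same doubling period."""
    return annual_factor(doubling_months) ** 10


print(pairwise_connections(1))        # 0: one user, no utility
print(pairwise_connections(100_000))  # 4999950000
print(f"{annual_factor(18):.2f}")     # 1.59, i.e. about 60% per year
print(f"{decade_factor(18):.0f}")     # 102, i.e. roughly 100x per decade
```

Doubling every 18 months works out to about 1.59x per year, which matches the slide's "60% increase per year". Doubling every 36 months instead gives annual_factor(36) ≈ 1.26, about 10x per decade.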
Gordon Bell’s Seven Price Tiers
• $10: wrist watch computers
• $100: pocket/palm computers
• $1,000: portable computers
• $10,000: personal computers (desktop)
• $100,000: departmental computers (closet)
• $1,000,000: site computers (glass house)
• $10,000,000: regional computers (glass castle)
• Super server: costs more than $100,000
• “Mainframe”: costs more than $1 million
• Must be an array of processors, disks, tapes, comm ports

Bell’s Evolution Of Computer Classes
• Technology enables two evolutionary paths:
  1. Constant performance, decreasing cost
  2. Constant price, increasing performance
[Figure: log price vs. time for mainframes (central), minis (dept.), workstations, PCs (personals), and a next class marked "??"]
• 1.26 = 2x/3 yrs = 10x/decade; 1/1.26 = 0.8
• 1.6 = 4x/3 yrs = 100x/decade; 1/1.6 = 0.62

Software Economics
• An engineer costs about $150,000/year
• R&D gets [5%...15%] of budget
• Need [$3 million...$1 million] revenue per engineer

Company     Revenue   Profit  Tax   R&D   SG&A   P&S
Intel       $16 B     22%     12%   8%    11%    47%
Microsoft   $9 B      24%     13%   16%   34%    13%
IBM         $72 B     6%      5%    8%    22%    59%
Oracle      $3 B      15%     7%    9%    43%    26%
(P&S = product and service)

Software Economics: Bill’s Law
• Price = Fixed_Cost / Units + Marginal_Cost
• Bill Joy’s law (Sun): don’t write software for less than 100,000 platforms (@ $10 million engineering expense, $1,000 price)
• Bill Gates’s law: don’t write software for less than 1,000,000 platforms (@ $10 million engineering expense, $100 price)
• Examples:
  - UNIX versus Windows NT: $3,500 versus $500
  - Oracle versus SQL Server: $100,000 versus $6,000
  - No spreadsheet or presentation pack on UNIX/VMS/...
• Commoditization of base software and hardware

Gordon Bell’s Platform Economics
• Traditional computers: custom or semi-custom, high-tech and high-touch
• New computers: high-tech and no-touch
[Figure: price (K$), volume (K), and application price on a log scale (0.01 to 100,000) for mainframe, workstation, and browser computer types]

Grove’s Law: The New Computer Industry
• Horizontal integration is the new structure
• Each layer picks the best from the lower layer
• Desktop (C/S) market: 1991: 50%; 1995: 75%

Function            Example
Operation           AT&T
Integration         EDS
Applications        SAP
Middleware          Oracle
Baseware            Microsoft
Systems             Compaq
Silicon and Oxide   Intel and Seagate
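The two Software Economics slides reduce to two formulas. The first is Bill's Law, Price = Fixed_Cost / Units + Marginal_Cost. The second is the revenue an engineer must bring in when R&D is a fixed share of the budget. This is a minimal sketch. Marginal cost is set to 0 as an assumption, to isolate the amortized engineering cost, because the slides give no marginal costs. The $10 million engineering figure is stated for Bill Joy. The extracted text of the Bill Gates line reads "$10 engineering expense", and that is taken here to mean the same $10 million.

```python
# Bill's Law and the revenue-per-engineer rule from the Software Economics slides.
# Assumption: marginal cost is 0 here to isolate amortized engineering cost;
# the slides do not give marginal costs.


def unit_price(fixed_cost: float, units: int, marginal_cost: float) -> float:
    """Bill's Law: Price = Fixed_Cost / Units + Marginal_Cost."""
    return fixed_cost / units + marginal_cost


def revenue_per_engineer(engineer_cost: float, rd_share: float) -> float:
    """Revenue needed per engineer when R&D is `rd_share` of revenue."""
    return engineer_cost / rd_share


ENGINEERING = 10_000_000  # $10 million engineering expense (assumed for both laws)

joy = unit_price(ENGINEERING, 100_000, 0)      # Bill Joy: 100,000 platforms
gates = unit_price(ENGINEERING, 1_000_000, 0)  # Bill Gates: 1,000,000 platforms
print(f"${joy:,.0f} vs ${gates:,.0f} of engineering per copy")  # $100 vs $10 of engineering per copy

for share in (0.05, 0.15):
    print(f"R&D at {share:.0%}: ${revenue_per_engineer(150_000, share):,.0f} per engineer")
# R&D at 5%: $3,000,000 per engineer
# R&D at 15%: $1,000,000 per engineer
```

At these volumes, each of the slide's prices ($1,000 and $100) is ten times the amortized engineering cost per copy. The second calculation reproduces the slide's range of $3 million to $1 million of revenue per engineer.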