The Future of Distributed Systems
Jim Gray, Researcher, Microsoft Corp.
Gray@Microsoft.com

Outline
- Global forces
  - Moore’s, Metcalfe’s, Bell’s, Bill’s, Andy’s laws
  - Micro dollars per transaction
  - Cyber-content is key value because distribution costs go to zero
- Distributed Systems
  - Concepts and terms
  - Key software technologies: objects, transactions

Metcalfe’s Law
- Network Utility = Users^2
- How many connections can it make?
  - 1 user: no utility
  - 100,000 users: a few contacts
  - 1 million users: many on Net
  - 1 billion users: everyone on Net
- That is why the Internet is so “hot”
- Exponential benefit

Moore’s First Law
- XXX doubles every 18 months: 60% increase per year
- Exponential growth:
  - Microprocessor speeds
  - Chip density
  - Magnetic disk density
  - Communications bandwidth
- WAN bandwidth approaching LAN speeds
[Figure: one-chip memory size (2 MB to 32 MB), 1970 to 2000; chip capacity in bits from 1K to 256M; memory size axis from 8 KB to 1 GB]
- The past does not matter
- 10x here, 10x there, soon you’re talking REAL change
- PC costs decline faster than any other platform
  - Volume and learning curves
- PCs will be the building bricks of all future systems

Bumps In The Moore’s Law Road
- DRAM:
  - 1988: United States anti-dumping rules
  - 1993-1995: ?price flat
- Magnetic disk:
  - 1965-1989: 10x/decade
  - 1989-1996: 4x/3 years! 100x/decade
[Figures: $/MB of DRAM and $/MB of disk, log scale, 1970 to 2000]

Gordon Bell’s Seven Price Tiers
  $10:          wrist watch computers
  $100:         pocket/palm computers
  $1,000:       portable computers
  $10,000:      personal computers (desktop)
  $100,000:     departmental computers (closet)
  $1,000,000:   site computers (glass house)
  $10,000,000:  regional computers (glass castle)
- Super server: costs more than $100,000
- “Mainframe”: costs more than $1 million
- Must be an array of processors, disks, tapes, comm ports

Bell’s Evolution Of Computer Classes
Technology enables two evolutionary paths:
  1. constant performance, decreasing cost
  2. constant price, increasing performance
[Figure: log price versus time for Mainframes (central), Minis (dep’t.), WSs, PCs (personals)]
- 1.26x per year = 2x/3 yrs = 10x/decade; 1/1.26 = .8
- 1.6x per year = 4x/3 yrs = 100x/decade; 1/1.6 = .62

Software Economics
- An engineer costs about $150,000/year
- R&D gets [5%…15%] of budget
- Need [$3 million…$1 million] revenue per engineer

  Company    Revenue      Profit  R&D  SG&A  Tax  P&S
  Intel      $16 billion  22%     8%   11%   12%  47%
  Microsoft  $9 billion   24%     16%  34%   13%  13%
  IBM        $72 billion  6%      8%   22%   5%   59%
  Oracle     $3 billion   15%     9%   43%   7%   26%
  (P&S = product and service)

Software Economics: Bill’s Law
  Price = Fixed_Cost / Units + Marginal_Cost
- Bill Joy’s law (Sun): don’t write software for less than 100,000 platforms @ $10 million engineering expense, $1,000 price
- Bill Gates’s law: don’t write software for less than 1,000,000 platforms @ $10 million engineering expense, $100 price
- Examples:
  - UNIX versus Windows NT: $3,500 versus $500
  - Oracle versus SQL-Server: $100,000 versus $6,000
  - No spreadsheet or presentation pack on UNIX/VMS/...
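As a rough illustration of Bill’s Law, the sketch below computes the per-unit share of fixed cost at the two volumes quoted above. It assumes a $10 million engineering expense in both cases and a marginal cost of zero, since the slide gives no marginal cost; the function name `unit_price` is hypothetical.

```python
def unit_price(fixed_cost: float, units: int, marginal_cost: float = 0.0) -> float:
    """Bill's Law: price = fixed_cost / units + marginal_cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return fixed_cost / units + marginal_cost


# Assumed inputs: $10 million engineering expense, zero marginal cost.
ENGINEERING_EXPENSE = 10_000_000.0

# Engineering cost carried by each copy at the two volumes on the slide.
joy_share = unit_price(ENGINEERING_EXPENSE, 100_000)      # per copy at 100,000 platforms
gates_share = unit_price(ENGINEERING_EXPENSE, 1_000_000)  # per copy at 1,000,000 platforms
```

Under these assumptions the engineering share is $100 of a $1,000 price in the first case and $10 of a $100 price in the second, so the tenfold volume difference accounts for the tenfold price difference.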
- Commoditization of base software and hardware

Gordon Bell’s Platform Economics
- Traditional computers: custom or semi-custom, high-tech and high-touch
- New computers: high-tech and no-touch
[Figure: price (K$), volume (K), and application price by computer type (Mainframe, WS, Browser), log scale from 0.01 to 100,000]

Grove’s Law: The New Computer Industry
- Horizontal integration is new structure
- Each layer picks best from lower layer
- Desktop (C/S) market: 1991: 50%; 1995: 75%

  Function         Example
  Operation        AT&T
  Integration      EDS
  Applications     SAP
  Middleware       Oracle
  Baseware         Microsoft
  Systems          Compaq
  Silicon & Oxide  Intel & Seagate

Outline
- Global forces
  - Moore’s, Metcalfe’s, Bell’s, Bill’s, Andy’s laws
  - Micro dollars per transaction
  - Cyber-content is key value because distribution costs go to zero
- Distributed Systems
  - Concepts and terms
  - Key software technologies: objects, transactions

1987: 256 tps Benchmark
- 14 M$ computer (Tandem)
- A dozen people: Admin expert, Hardware experts, Network expert, Manager, Performance expert, DB expert, Auditor, OS expert
- False floor, 2 rooms of machines
- A 32 node processor array
- A 40 GB disk array (80 drives)
- Simulate 25,600 clients

1997: 10 years later
1 Person and 1 box = 1250 tps
- 1 Breadbox ~ 5x 1987 machine room
- 23 GB is hand-held
- One person does all the work: Hardware expert, OS expert, Net expert, DB expert, App expert
- Cost/tps is 1,000x less: 1 micro dollar per transaction
- Hardware: 4 x 200 MHz cpu, 1/2 GB DRAM, 12 x 4 GB disk, 3 x 7 x 4 GB disk arrays

What Happened?
- Moore’s law: Things get 4x better every 3 years (applies to computers, storage, and networks)
- New Economics: Commodity

  class          price/mips ($/mips)  software (k$/year)
  mainframe      10,000               100
  minicomputer   100                  10
  microcomputer  10                   1

- GUI: Human - computer tradeoff: optimize for people, not computers

What Happens Next
- Last 10 years: 1000x improvement
- Next 10 years: ????
- Today: text and image servers are free
  - 1 µ$/hit cost
  - 70,000 µ$/hit advertising revenue
  - Advertising pays for them
  - Content is only “real” expense
- “You ain’t seen nothing yet!”
[Figure: timeline 1985, 1995, 2005]

Kinds Of Information Processing

                Point-to-point       Broadcast
  Immediate     Conversation, Money  Lecture, Concert  (Network)
  Time-shifted  Mail                 Book, Newspaper   (Database)

- It’s ALL going electronic
- Immediate is being stored for analysis (so ALL database)
- Analysis and automatic processing are being added

Why Put Everything In Cyberspace?
- Low rent: min $/byte
- Shrinks time: now or later
- Shrinks space: here or there
- Automate processing: knowbots
[Figure: Point-to-point OR broadcast; Immediate OR time-delayed; Network; Database; Locate, Process, Analyze, Summarize]

Billions Of Clients
- Every device will be “intelligent”
- Doors, rooms, cars…
- Computing will be ubiquitous

Billions Of Clients Need Millions Of Servers
- All clients networked to servers
  - May be nomadic or on-demand
- Fast clients want faster servers
- Servers provide
  - Shared Data
  - Control
  - Coordination
  - Communication
[Figure: Clients (mobile clients, fixed clients) and Servers (server, super server)]

Thesis: Many little beat few big
[Figure: processor classes Mainframe, Mini, Micro, Nano, Pico Processor (1 mm^3) on a price axis of $1 million, $100 K, $10 K; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"; storage hierarchy: 1 MB of 10 pico-second ram, 100 MB of 10 nano-second ram, 10 GB of 10 microsecond ram, 1 TB of 10 millisecond disc, 100 TB of 10 second tape archive]
- Smoking, hairy golf ball
- How to connect the many little parts?
- How to program the many little parts?
- Fault tolerance?
- 1 M SPECmarks, 1 TFLOP
- 10^6 clocks to bulk ram
- Event-horizon on chip
- VM reincarnated
- Multiprogram cache, On-Chip SMP

Future Super Server: 4T Machine
- Array of 1,000 4B machines
  - 1 bps processors
  - 1 BB DRAM
  - 10 BB disks
  - 1 Bbps comm lines
  - 1 TB tape robot
- A few megabucks
- Challenge:
  - Manageability
  - Programmability
  - Security
  - Availability
  - Scalability
  - Affordability
- As easy as a single system
[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]
- Future servers are CLUSTERS of processors, discs
- Distributed database techniques make clusters work

The Hardware Is In Place… And then a miracle occurs?
- SNAP: scalable network and platforms
- Commodity-distributed OS built on:
  - Commodity platforms
  - Commodity network interconnect
- Enables parallel applications

Outline
- Global forces
  - Moore’s, Metcalfe’s, Bell’s, Bill’s, Andy’s laws
  - Micro dollars per transaction
  - Cyber-content is key value because distribution costs go to zero
- Distributed Systems
  - Concepts and terms
  - Key software technologies: objects, transactions

Outline
- Concepts and Terminology
- Why Distributed
- Distributed data & objects
- Distributed execution
- Three tier architectures
- Transaction concepts

What’s a Distributed System?
- Centralized: everything in one place
  - stand-alone PC or Mainframe
- Distributed: some parts remote
  - distributed users
  - distributed execution
  - distributed data

Why Distribute?
- No best organization
- Companies constantly swing between
  - Centralized: focus, control, economy
  - Decentralized: adaptive, responsive, competitive
- Why distribute?
  - reflect organization or application structure
  - empower users / producers
  - improve service (response / availability)
  - distributed load
  - use PC technology (economics)

What Should Be Distributed?
- Users and User Interface: thin client
- Processing: trim client
- Data: fat client
- Will discuss tradeoffs later
[Figure: Presentation, workflow, Business Objects, Database]

Transparency in Distributed Systems
- Make distributed system as easy to use and manage as a centralized system
- Give a Single-System Image
- Location transparency:
  - hide fact that object is remote
  - hide fact that object has moved
  - hide fact that object is partitioned or replicated
- Name doesn’t change if object is replicated, partitioned or moved.

Outline
- Concepts and Terminology
- Why Distribute
- Distributed data & objects
  - Partitioned
  - Replicated
- Distributed execution
  - remote procedure call
  - queues
- Three tier architectures
- Transaction concepts

Distributed Execution: Threads and Messages
- Thread is Execution unit (software analog of cpu+memory)
- Threads execute at a node
- Threads communicate via
  - Shared memory (local)
  - Messages (local and remote)
[Figure: threads communicating through shared memory and messages]

Peer-to-Peer or Client-Server
- Peer-to-Peer is symmetric: Either side can send
- Client-server
  - client sends requests
  - server sends responses
  - simple subset of peer-to-peer

Remote Procedure Call: The key to transparency
    y = pObj->f(x);
- Object may be local or remote
- Methods on object work wherever it is.
- Local invocation
[Figure: local invocation: x is passed to f(); f() executes “return val;”; the caller gets y = val]

Remote Procedure Call: The key to transparency
- Remote invocation
    y = pObj->f(x);
[Figure: remote invocation: the proxy checks “Obj Local?”, marshals x, and sends it to the stub; the stub unmarshals x and calls pObj->f(x); f() returns val; the stub marshals val; the proxy unmarshals val and returns it, so y = val]
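The proxy and stub roles in remote invocation can be sketched in a few lines. This is a minimal in-process illustration, not the deck’s implementation: the class names `Obj`, `Proxy`, and `Stub` are hypothetical, JSON stands in for the marshalling format, and a plain function call stands in for the network.

```python
import json


class Obj:
    """The real object; f() mirrors the method named on the slide."""

    def f(self, x):
        return x * 2  # arbitrary body, chosen only for the demo


class Stub:
    """Server side: unmarshal the request, call the object, marshal the reply."""

    def __init__(self, target):
        self.target = target

    def handle(self, request: bytes) -> bytes:
        msg = json.loads(request.decode("utf-8"))          # unmarshal x
        method = getattr(self.target, msg["method"])
        val = method(*msg["args"])                         # pObj->f(x)
        return json.dumps({"val": val}).encode("utf-8")    # marshal val


class Proxy:
    """Client side: same call syntax whether the object is local or remote."""

    def __init__(self, local_obj=None, transport=None):
        self.local_obj = local_obj
        self.transport = transport  # callable bytes -> bytes; stands in for the network

    def f(self, x):
        if self.local_obj is not None:                     # "Obj local?" yes: direct call
            return self.local_obj.f(x)
        request = json.dumps({"method": "f", "args": [x]}).encode("utf-8")  # marshal x
        reply = self.transport(request)                    # send request, wait for reply
        return json.loads(reply.decode("utf-8"))["val"]    # unmarshal val


local = Proxy(local_obj=Obj())
remote = Proxy(transport=Stub(Obj()).handle)
y_local = local.f(21)
y_remote = remote.f(21)
```

The caller writes `proxy.f(x)` in both cases and gets the same result; only the proxy knows whether the call crossed a (here simulated) network boundary.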
Object Request Broker (ORB)
- Orchestrates RPC
- Registers Servers
- Manages pools of servers
- Connects clients to servers
- Does Naming, request-level authorization
- Provides transaction coordination (new feature)
- Old names: Transaction Processing Monitor, Web server, NetWare
[Figure: Transaction, Object-Request Broker]

History and Alphabet Soup
[Figure: timeline 1985, 1990, 1995: UNIX International, X/Open, Open Software Foundation (OSF), Object Management Group (OMG), Open Group; technologies: Solaris, CORBA, OSF DCE, NT, COM]
- Microsoft DCOM based on OSF-DCE technology
- DCOM and ActiveX extend it

ActiveX and COM
- COM is Microsoft model, engine inside OLE
- ALL Microsoft software is based on COM (ActiveX)
- CORBA + OpenDoc is equivalent
- Heated debate over which is best
- Both share same key goals:
  - Encapsulation: hide implementation
  - Polymorphism: generic operations, key to GUI and reuse
  - Versioning: allow upgrades
  - Transparency: local/remote
  - Security: invocation can be remote
  - Shrink-wrap: minimal inheritance
  - Automation: easy
- COM now managed by the Open Group

Linking And Embedding
- Objects are data modules; transactions are execution modules
- Link: pointer to object somewhere else
  - Think URL in Internet
- Embed: bytes are here
- Objects may be active; can callback to subscribers

Bottom Line Re ORBs
- Microsoft promises Cairo: distributed objects, secure, transparent, fast invocation
- Netscape promises the CORBA
- Both will deliver
- Customers can pick the best one

Outline
- Concepts and Terminology
- Why Distributed
- Distributed data & objects
- Distributed execution
  - remote procedure call
  - queues
- Three tier architectures
  - what
  - why
- Transaction concepts

Work Distribution Spectrum
- Presentation and plug-ins
- Workflow manages session & invokes objects
- Business objects
- Database
[Figure: spectrum from thin to fat clients across Presentation, workflow, Business Objects, Database]

Transaction Processing Evolution to Three Tier
Intelligence migrated to clients
- Mainframe batch processing (centralized)
- Dumb terminals & Remote Job Entry
- Intelligent terminals, database backends
- Workflow Systems, Object Request Brokers, Application Generators
[Figure: Mainframe, cards, green screen 3270, TP Monitor, Server, ORB, Active]

Web Evolution to Three Tier
Intelligence migrated to clients (like TP)
- Character-mode clients, smart servers
- GUI Browsers - Web file servers
- GUI Plugins - Web dispatchers - CGI
- Smart clients - Web dispatcher (ORB)
  - pools of app servers (ISAPI, Viper)
  - workflow scripts at client & server
[Figure: Web Server, WAIS, archie, gopher, green screen, Mosaic, NS & IE, Active]

PC Evolution to Three Tier
Intelligence migrated to server
- Stand-alone PC (centralized)
- PC + File & print server: message per I/O
- PC + Database server: message per SQL statement
- PC + App server: message per transaction
- ActiveX Client, ORB, ActiveX server, Xscript
[Figure: IO request/reply, disk I/O, SQL Statement, Transaction]

The Pattern: Three Tier Computing
- Clients do presentation, gather input
- Clients do some workflow (Xscript)
- Clients send high-level requests to ORB (Object Request Broker)
- ORB dispatches workflows and business objects: proxies for client, orchestrate flows & queues
- Server-side workflow scripts call on distributed business objects to execute task
[Figure: Presentation, workflow, Business Objects, Database]

The Three Tiers
[Figure: Web Client (HTML; VB and Java plug-ins; VBScript, JavaScript); Internet (HTTP + DCOM); Middleware (ORB, TP Monitor, Web Server...; VB or Java Script Engine; VB or Java Virt Machine; Object server Pool); DCOM (oleDB, ODBC,...); Object & Data server; Legacy Gateways (IBM)]

Why Did Everyone Go To Three-Tier?
- Manageability
  - Business rules must be with data
  - Middleware operations tools
- Performance (scalability)
  - Server resources are precious
  - ORB dispatches requests to server pools
- Technology & Physics
  - Put UI processing near user
  - Put shared data processing near shared data
[Figure: Presentation, workflow, Business Objects, Database]

Why Put Business Objects at Server?
DAD’s Raw Data
- Customer comes to store
- Takes what he wants
- Fills out invoice
- Leaves money for goods
- Easy to build
- No clerks

MOM’s Business Objects
- Customer comes to store with list
- Gives list to clerk
- Clerk gets goods, makes invoice
- Customer pays clerk, gets goods
- Easy to manage
- Clerks control access
- Encapsulation

What Middleware Does
ORB, TP Monitor, Workflow Mgr, Web Server
- Registers transaction programs: workflow and business objects (DLLs)
- Pre-allocates server pools
- Provides server execution environment
- Dynamically checks authority (request-level security)
- Does parameter binding
- Dispatches requests to servers
  - parameter binding
  - load balancing
- Provides Queues
- Operator interface

Server Side Objects: Easy Server-Side Execution
- A Server ORB gives simple execution environment
- Object gets
  - start
  - invoke
  - shutdown
- Everything else is automatic
- Drag & Drop Business Objects
[Figure: server execution environment: Network, Receiver, Queue, Connections, Context, Security, Thread Pool, Configuration, Management, Service logic, Synchronization, Shared Data]

A new programming paradigm
- Develop object on the desktop
- Better yet: download them from the Net
- Script work flows as method invocations
- All on desktop
- Then, move work flows and objects to server(s)
- Gives desktop development, three-tier deployment
- Software Cyberbricks

Why Server Pools?
- Server resources are precious. Clients have 100x more power than server.
- Pre-allocate everything on server
  - preallocate memory
  - pre-open files
  - pre-allocate threads
  - N clients x N Servers x F files = N x N x F file opens!!!
  - pre-open and authenticate clients
- Keep high duty-cycle on objects (re-use them)
- Pool threads, not one per client
- Classic example: TPC-C benchmark
  - 2 processes
  - everything pre-allocated
[Figure: 7,000 clients (IE) connect over HTTP to IIS, which uses a pool of DBC links to SQL]

Classic Three-Tier Example: TPC-C
- Transaction Processing Performance Council (TPC): standard performance benchmarks
- 5 transaction types
  - order entry, payment, status (oltp)
  - delivery (mini-batch)
  - restock (mini-DSS)
- Metrics: Throughput, Price/Performance
- Shows best practices:
  - everyone three tier
  - 2 processes at server
  - everything pre-allocated
[Figure: 7,000 Web clients connect over HTTP to IIS (= Web), which uses a pool of DBC links (ODBC) to SQL]

Outline
- Laws & micro$/transaction
- Distributed Systems
  - Why Distributed
  - Distributed data & objects
  - Distributed execution
  - Three tier architectures
    - why: manageability & performance
    - what: server side workflows & objects
  - Transaction concepts
    - Why transactions?
    - Using transactions

Thesis
- Transactions are key to structuring distributed applications
- ACID properties ease exception handling
  - Atomic: all or nothing
  - Consistent: state transformation
  - Isolated: no concurrency anomalies
  - Durable: committed transaction effects persist

What Is A Transaction?
- Programmer’s view: Bracket a collection of actions
- A simple failure model
- Only two outcomes:
  - Success!  Begin(), action, action, action, action, Commit()
  - Failure!  Begin(), action, action, action, Rollback()
  - Failure!  Begin(), action, action, action, Fail!, Rollback()

Why ACID For Client/Server And Distributed
- ACID is important for centralized systems
- Failures in centralized systems are simpler
- In distributed systems:
  - More and more-independent failures
  - ACID is harder to implement
- That makes it even MORE IMPORTANT
  - Simple failure model
  - Simple repair model

Outline
- Why Distributed
- Distributed data & objects
- Distributed execution
- Three tier architectures
- Transaction concepts
  - Why transactions?
  - Using transactions
    - programming
    - workflow

References
- Essential Client/Server Survival Guide, 2nd ed., Orfali, Harkey & Edwards, J. Wiley, 1996
- Client/Server Programming with Java and CORBA, Orfali, Harkey, J. Wiley, 1997
- Principles of Transaction Processing, Bernstein & Newcomer, Morgan Kaufmann, 1997
- Transaction Processing Concepts and Techniques, Gray & Reuter, Morgan Kaufmann, 1993
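The Begin() / Commit() / Rollback() bracket from “What Is A Transaction?” can be sketched with Python’s standard `sqlite3` module. This is only an illustration under stated assumptions: the `account` table, the `transfer` function, and the overdraft rule are hypothetical and do not come from the talk.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True if it committed."""
    conn.execute("BEGIN")                                   # Begin()
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))                         # action
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))                         # action
        (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")          # Fail!
        conn.execute("COMMIT")                              # Commit(): success
        return True
    except Exception:
        conn.execute("ROLLBACK")                            # Rollback(): undo every action
        return False


# isolation_level=None turns off implicit transactions so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])

ok = transfer(conn, "a", "b", 60)        # enough funds: commits
failed = transfer(conn, "a", "b", 60)    # would overdraw: rolls back both updates
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second call fails after both updates have run, yet the balances are unchanged afterwards. The caller handles one outcome (committed or not) instead of reasoning about partially applied actions, which is the “simple failure model” the slides refer to.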