INSTITUTE OF COMPUTING TECHNOLOGY

Computing for the Masses
为人民计算

Zhiwei Xu 徐志伟
Information Science Advisory Committee, NSFC
Institute of Computing Technology (ICT)
Chinese Academy of Sciences
zxu@ict.ac.cn

Contents
• Background
• Goals
• Problems and Approaches

Demand: China Computer Market Grows

Year   GDP              Computer Market   Internet Users   Client Devices
       (US$ trillion)   (US$ billion)     (million)        (million)
1995   0.69               7.4
2000   1.08              25.9              22.5               8.9
2005   2.30              59.0             111 (+80)          49.5
2007   3.38              93.1             210                78
2010   3.00             115.6             233               106
2015   4.75             217.3             411               191
2020   7.07             403.9             662               308

• Still has big room for growth: China in 10 years = USA today?
• Internet penetration (2007): World 19%, China 16%, USA 70%
Sources: China NBS, CCID, CNNIC, Goldman Sachs

Supply: China computer industry is weak
(Forbes 2000 for year 2007, US$ billion)

Rank   Company     Country   Sales    Profits   Assets   Market Value   Profit/Sales   P/E
Application Services
 709   Ebay        USA         7.67     0.35     15.37      35.00          4.56%       100
 740   Amazon      USA        14.84     0.48      6.49      26.87          3.23%        56
1565   Expedia     USA         2.67     0.30      8.30       6.55         11.24%        22
1863   Alibaba     China       0.17     0.03      0.23      12.25         17.65%       408
Software & Services
  37   IBM         USA        98.79    10.42    120.43     157.62         10.55%        15
  63   Microsoft   USA        57.90    16.96     67.34     253.15         29.29%        15
 213   Google      USA        16.59     4.20     25.34     147.66         25.32%        35
 319   SAP         Germany    14.96     2.81     14.93      57.77         18.78%        21
1040   Infosys     India       3.21     0.89      3.08      22.09         27.73%        25
1905   Tencent     China       0.36     0.14      0.58      11.37         38.89%        81
Hardware
  53   HP          USA       107.67     7.85     88.57     122.04          7.29%        16
 178   Apple       USA        26.50     4.07     30.04     109.88         15.36%        27
 192   Dell        USA        61.13     2.95     27.56      44.60          4.83%        15
1053   RIM         Canada      2.95     0.61      3.08      58.73         20.68%        96
1338   Lenovo      China      14.53     0.16      5.35       6.47          1.10%        40

Challenges to Academia
• C4M supply is seriously lacking
  – Lags behind demand
  – Lags behind industry
  – Lags behind international peers
• Too much short-term “mission”

Computing for the Masses (C4M)
• Research and applications of computer science for mass adoption
  – Directly benefit the masses
    • Billions of people ≠ scientific computing or business computing
  – Including
    • Parallelism for the masses
    • Net computing for the masses
    • Social computing for the masses

Contents
• Background
• Goals
  – Mass adoption: billions of people
    • No dumbing down: Value = Ω(Adoption)
  – Sustainability: Value↑, resource→
• Problems and Approaches

Sustainability
[Figure: Value & Resource versus Time, 1960 to 2040. Curves: total IT value; resource consumption and environment impact. Energy/operation (2000-2007) scale from μJ to pJ, with labels Systems, Circuits, Physics.]
• Servers electricity bill: $1.9 billion (IDC, 2007, China)

C4M is not dumbing down
[Figure: Value-Augmenting Adoption. Value levels (Basic, Commodity, Ubiquity, Expertise, Personalized) versus Adoption (computer users in China, 0.1 to 1.2 billion), with markers at 2010, 2020, 2030, 2040.]

PC: example of C4M
• Reached more users
• More value than mainframe/mini
• More innovations with big ideas
  – Frame buffer
  – GUI
  – OO programming
  – Ethernet
[Figure: Value levels (Basic, Commodity, Ubiquity, Expertise, Personalized) versus Adoption (computer users in China, 0.1 to 1.2 billion), with markers at 2010, 2020, 2030, 2040.]

What’s different now?
• The Net = Three Worlds
• Man-Machine Symbiosis; Man-Machine Society

Computing for the Masses
Computer users in China: 210 million (2007); 800 million (2020)
[Figure: layered stack. Layers: Business Value; Business App & Svc; Business Infrastructure; IT Infrastructure; IT Components. Labels: Utility; Devices; Godson CPU; Lenovo PC/Laptop; Dawning Servers; BlueWhale Storage; Vega Grid; Servers, Networks, Storages, Data; Sensor Networks, CPS.]

Contents
• Background
• Goals
• Problems and Approaches
  – A Science of Three Worlds
  – Architectural Characteristics
  – Personal Net

A Science of Net Computing
• Computation as a unifying theme in a new science of three worlds
• Enrich our beautiful algorithmic computing theory
• Karp: computational lens
• Hurwicz: mechanism design
• Science 2020

Google-Like Computing
2006 data ($ billion): Revenue: 10.6; Profit: 3; Cost: 7
[Figure: Google stack, continued in the layers listed next.]
Google Value (Visible): Search, AdWords, Map, Earth, News, Froogle, etc.
Google application software and data: sorting, machine learning, graph computing, etc.
R&D: 1.2
Google utilities: MapReduce & BigTable
Google system software: Google filesystem, resource management, fault tolerance, etc.
Hosting environment: LAMP
Resource: 2.4
Datacenters; Servers; Data, Metadata; Code
Linux servers and other resources distributed in wide area
O = F(I); Value = F(Resource)

Enrich Algorithmic Computing
• Traditional algorithmic computing
  – Turing machine decision problem
  – Input, output, a procedure of mechanical steps
  – Time complexity, space complexity
• What are “algorithms” in the tri-world?
  – What is the “decision problem”?
    • What is Web computable?
    • What is Wiki computable?
  – How to quantify “value” and “resource”?
  – What is a “step”? What is termination?
  – What are the complexity metrics?

System Characteristics
• C4M Workload Analytics
  – Time, space, information
  – Interaction, energy, effort
• Basic “Laws” revisited
  – Moore’s law
  – Network effect (Metcalfe, Brown)
  – Viral market
  – Internet principles (E2E, REST)
• New Phenomena and Abstractions

Architecture Characterization
• Patterson & Hennessy
  Performance = Program/Time = 1/(#Instructions × CPI × CycleTime)
• Need reexamination
  – Program = C4M workloads
  – Time = ?
  – Instruction = ?
• Other important metrics
  – Task/Energy = ?
  – Task/Effort = ?
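As a rough illustration of the Patterson & Hennessy relation on this slide, the Python sketch below evaluates Performance = 1/(#Instructions × CPI × CycleTime) for a single workload. The function name and every numeric value are hypothetical, chosen only to make the arithmetic visible; they are not measurements from the talk.

```python
def programs_per_second(instructions: float, cpi: float, cycle_time_s: float) -> float:
    """Performance = Program/Time = 1 / (#Instructions x CPI x CycleTime)."""
    # Time to run one program: instructions x cycles per instruction x seconds per cycle.
    execution_time_s = instructions * cpi * cycle_time_s
    return 1.0 / execution_time_s


# Hypothetical workload: 2 billion instructions, CPI of 1.5, 2 GHz clock (0.5 ns cycle).
perf = programs_per_second(instructions=2e9, cpi=1.5, cycle_time_s=0.5e-9)

# Same workload with CPI halved: performance doubles under this model.
perf_half_cpi = programs_per_second(instructions=2e9, cpi=0.75, cycle_time_s=0.5e-9)

print(f"{perf:.3f} programs/s, {perf_half_cpi:.3f} programs/s with half the CPI")
```

Under this model each factor scales performance inversely. The open questions on the slide are what should stand in for "instruction" and "time" when the program is a C4M workload.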
Architecture Characterization
[Figure: systems placed by Control (admin, knowledge, naming, coding, contribution; centralized to decentralized) versus Number of Execution Sites (datacenters, machines; single to multiple). Regions: Distributed Systems; Decentralized Systems. Examples: virtual hosts, virtual machines, Salesforce.com, many web sites, WWW, clouds, PNC environment, Google, Amazon, Teragrid.]

Personal Net (PN)
• A PN for each member of the masses
• A general-purpose, personal, net computing platform
  – a dynamic, virtualized set of assets from the Net (cyberinfrastructure, community, physical world)
  – appearing to be dedicated to a personal owner’s use and control
• People share the Net personally

The Net Now
• Offering
  – Traditional network services: email, ftp, BBS, messaging
  – Consumer web: Amazon, eBay
  – Business web: salesforce.com
  – Community web: Wiki, MySpace, Facebook
  – Grid services: Nanohub
  – Platform: Teragrid, Amazon S3 and EC2 (clouds)
• Characteristics
  – Institutional, not personal
  – Special-purpose solutions, not a general-purpose platform
• The Net now is like the mainframe in the 1960s

Personal Net Computing
[Figure: Individuals; Pocket Web accessing devices; the Net; Personal Grids (PG); PG platform providers, each holding asset bundles (A, D; O, P; C, N, S); resource providers: DISC, Grids, Clouds, ASP, DSP, CSP.]
Assets
A: applications
D: data
O: operating systems
P: policies
C: computing
N: networking
S: storage
SSP, NSP, ……, ASP (SP: service provider for resources)
• Resources are “raw” assets (capabilities)
• Pocket Web: battery life > 2 weeks; assets on demand; the Net in your hand

Characterizing Emergence
[Figure: resources R1, R2, …, Rn and queues Q1, Q2, …, Qn.]
• Dual-objective optimization
  – Fairness: a standard deviation taken over per-provider terms built from sums of P and C
  – Success rate: a per-job indicator that is 1 or 0 depending on how T_i^C compares with T_i^D

Emergences do appear
• Xiao et al., “Incentive-based Scheduling for Market-like Computational Grids”, IEEE Transactions on Parallel and Distributed Systems, 2008
• Workloads (over 1 million jobs simulated)
  – Synthetic workloads
  – Real workloads: www.cs.huji.ac.il/labs/parallel/workload/
• Value
  – For consumers: high job success rate
  – For providers: fair revenue and utilization

Clash of the Computer and the Network Approaches
• Fetching 10 bytes of data from a blog server: 162 ms, 52 context switches at the server side
• How much is needed to host 100 million PGs?
  – Response time < 0.25 s
• Sustained = 5% of peak?
[Figure: three stacks compared. TCP/IP stack: 4 Application; 3 Transport; 2 Inter Network; 1 Network Access. Web/Web service stacks: HTML, HTTP, GSML, BPEL, WSRF, WSDL, SOAP, XML. Computer stacks: App, Thread, VMM, HW (core); App, Database, OS, VMM, HW system.]

Summary
• CS research lags behind demand
• C4M is potentially a transformative opportunity for CS research
• C4M means sustainability and value-augmenting mass adoption
• C4M research agenda
  – Establish a science of three worlds
  – Characterize net computing architectures
  – Create personal net

Thank you!
zxu@ict.ac.cn