NSF Visit
Gordon Bell, Microsoft Research
www.research.microsoft.com/~gbell
4 October 2002

Topics
• How much have things changed since CISE was formed in 1986, and how much remains the same?
• A 10-year base case against CRA's Grand Challenges
  http://www.google.com/search?sourceid=navclient&q=cra+grand+challenges
• GB MyLifeBits: storing one's entire life for recall, home media, etc.
• Clusters, Grids, and Centers… the challenge is apps
• Supercomputing directions

Messages…
• The Grand Challenge for CISE is to work on applications in science, engineering, and bio/medicine/health care (e.g. NIH).
• Databases versus grepping. A revolution is needed: performance from software >= Moore's Law.
• The big challenge moving forward will come from trying to manage and exploit all the storage.
• Supercomputing: Cray. Gresham's Law.
• Build on industry standards and efforts. Grid and "web services" must cooperate.
• Whatever happened to the first Grand Challenges?
• Minimize grant overhead… site visits.

IBM Sets Up Biotech Research Center
U.S.-based IBM recently set up a biotechnology research and development center in Taiwan, the IBM Life Sciences Center of Excellence, the company's first in the Asia Pacific region. The center will provide computation solutions and services from an integrated bio-information database linked to resources around the world. Local research institutes working in cooperation with the center include Academia Sinica, the Institute for Information Industry, and National Yang Ming University.
From HPCWire, 30 September 2002

Retrospective: CISE formed in 1986
• CISE spent about $100 million on research in 1987.
• Q: What areas of software research do you think will be the most vital in the next decade?
• A: Methods to design and build large programs and databases in a distributed environment are central.
• Q: What software research areas are funded?
• A: We fund what the community considers to be important … object-oriented languages, databases, and human interfaces; semantics; formal methods of design and construction; connectionism; and data and knowledge bases, including concurrency. We aren't funding applications.

Software Productivity c1986
• I believe the big gains in software will come from eliminating the old style of programming, by moving to a new paradigm, rather than from magic tools or techniques to make the programming process better. VisiCalc and Lotus 1-2-3 are good examples of a dramatic improvement in programming productivity. In essence, programming is eliminated and the work is put in the hands of the users.
• These breakthroughs are unlikely to come from the software research community, because they aren't involved in real applications. Most likely they will come from people trained in another discipline who understand enough about software to carry out the basic work, which is ultimately turned over to the software engineers to maintain and evolve.

Software Productivity c1986
• Q: The recent Software Engineering Conference featured a division of opinion on mechanized programming. … developing a programming system to write programs can automate much of the mundane tasks…
• A: Mechanized programming is recreated and renamed every few years. In the beginning, it meant a compiler. The last time, it was called automatic programming. A few years ago it was program generators and the programmer's workbench. The better it gets, the more programming you do!

Parallelism c1986
• To show my commitment to parallel processing, for the next 10 years I will offer two $1,000 annual awards for the best operational scientific or engineering program with the most speedup ...
• Q: What … do you expect from parallelism in the next decade?
• A: Our goal is obtaining a factor of 100 … within the decade, and a factor of 10 within five years.
A factor of 10 will be easy because it is inherently in most applications right now. The hardware will clearly be there if the software can support it or the users can use it.
• Many researchers think this goal is aiming too low. They think it should be a factor of 1 million within 15 years. However, I am skeptical that anything more than our goal will be achieved.

[Chart: "Goodness" vs. time, 2000 to 2012: three possible trajectories for the next decade of systems: "Grand Challengeland"; industry's evolutionary path ("¿Que sera sera?", no challenge); and "Death and Doldrums".]

Computing Research Association Grand Challenges
Gordon Bell, Microsoft Research
26 June 2002

In a decade, the evolution: We can count on:
• Moore's Law provides ≈50-100x performance at constant $; a 20% $ decrease/year => ½ in ≈3 years
• Terabyte personal stores => personal db managers
• Astronomical-sized (by current standards) databases!
• Paper-quality screens on watches, tablets… walls
• DSL wired, 3-4G/802.11j nets (>10 Mbps) access
• Network services: finally, computers can use|access the web. "It's the Internet, Stupid."
  – Enabler of intra-, extra-, inter-net commerce
  – Finally, EDI/exchanges/markets
• Ubiquity rivaling the telephone.
  – Challenge: an instrument to supplant the phone?
  – Challenge: affordability for everyone on the planet, <$1500/year
• Personal authentication to access anything of value
• Murphy's Law continues with larger and more complex systems, requiring better fundamental understanding. An opportunity and need for "Autonomic Computing"

In a decade, the evolution: We are likely to "have":
• 120M computers/yr. Worldwide computer population >1B.
  – increasing with decreasing price: 2x / -50%
  – X% are discarded. The result is 1 billion.
• Smaller personal devices with phones… video at PDA $
• Almost-adequate speech communication for commands, limited dictation, note taking, segmenting/indexing video
• Vision capable of tracking each individual in a relatively large crowd. With identity, everybody's location is known, everywhere, anytime.
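The compounding in the "We can count on" list above can be sanity-checked. A minimal sketch (the 50-100x/decade and 20%/year figures are the slide's own; note that at a 20%/year decline, the price halving time works out to roughly 3 years):

```python
# Sanity-check the compounding behind "Moore's Law ~50-100x per decade"
# and a 20%/year price decline. All input figures are the slide's assumptions.
import math

def decade_factor(annual_gain: float) -> float:
    """Total performance factor after 10 years of compound annual gain."""
    return annual_gain ** 10

def years_to_halve(annual_decline: float) -> float:
    """Years for price to halve at a compound annual decline (e.g. 0.20)."""
    return math.log(0.5) / math.log(1.0 - annual_decline)

# ~48%/yr compounds to ~50x per decade; ~58.5%/yr to ~100x.
print(round(decade_factor(1.48)))    # 50
print(round(decade_factor(1.585)))   # 100
# At -20%/year, price halves in about 3.1 years.
print(round(years_to_halve(0.20), 1))  # 3.1
```

So "50-100x per decade" corresponds to roughly 48-58% improvement per year, and terabyte-scale personal stores follow from a few hundred GB today plus a few halvings of $/GB.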
Inevitable wireless nets… body, home, …x-area nets will create new opportunities
• Need to construct these environments of platforms, networking protocols, and programming environments for each kind
• Each net has to research its own sensor/effector structure as f(application), e.g. body, outdoor, building
• The taxonomy includes these alternative dimensions:
  – network function: master|slave vs. distributed… currently peripheral nets
  – permanent|dynamic
  – indoor|outdoor
  – size and spatial diameter
  – bandwidth and performance
  – sensor/effector types
  – security and noise immunity

New environments can support a wide range of new apps
• Continued evolution of personal monitoring and assistance for health and personal care at all ages
• Personal platforms that provide "total recall" to assist (25% of the population) in solving problems
• Platforms for changing education will be available. Limiters: authoring tools & standards; content
• Transforming the scientific infrastructure is needed!
  – petabyte databases, petaflops performance
  – shared data notebooks across instruments and labs
  – new ways of performing experiments
  – new ways of programming/visualizing and storing data
• Serendipity: something really new, like we get every decade but didn't predict, will occur.

R&D Challenges
• Engineering, evolutionary construction, and non-trivial maintenance of billion-node, fractal nets ranging from space, continent, campus, local, … to in-body nets
• Increasing information flows & a vast sea of data
  – Large disks everywhere! Personal to large servers, across all apps
  – Akin to the vast tape libraries that are never read (bit rot)
• A modern healthcare system that each of us would be happy, or at least unafraid, to be admitted into. Cf. today's islands of automation and instruments (incompatible systems) floating on a sea of paper moved around by people who maintain a bloated and inefficient "services" industry/economy.
MyLifeBits: The Challenge of a 0.001-1 Petabyte Lifetime PC
Cyberizing everything I've written, said, presented (incl. video), photos of physical objects, and a few things I've read, heard, seen, and might "want to see" on TV.
"The PC is going to be the place where you store the information … really the center of control" (BillG, 1/7/2001)

MyLifeBits is an ongoing project, following CyberAll, to "cyberize" all of one's personal bits!
► Memory recall of books, CDs, communication, papers, photos, video
► Photos of physical object collections
► Elimination of all physical stores & objects
► Content source for home media: ambiance, entertainment, communication, interaction; Freestyle for CDs, photos, TV content, videos
Goal: to understand the 1-TByte PC: need, utility, cost, feasibility, challenges & tools.

Storing all we've read, heard, & seen (© 2002)
Human data-type                /hr     /day (≈ GB/4 yr)   /lifetime
read text, few pictures        200 K   2-10 M             60-300 G
speech text @120 wpm           43 K    0.5 M              15 G
speech @1 KBps                 3.6 M   40 M               1.2 T
stills w/ voice @100 KB        200 K   2 M                60 G
video-like: 50 Kb/s POTS       22 M    0.25 G             25 T
video-like: 200 Kb/s VHS-lite  90 M    1 G                100 T
video: 4.3 Mb/s HDTV/DVD       1.8 G   20 G               1 P

Scenes from Media Center: a "killer app" for the Terabyte, Lifetime PC?
MyLifeBits demonstrates the need for lifetime memory!
► MODI (Microsoft Office Document Imaging)! The most significant Office™ addition since HTML.
► Technology to support the vision:
1. Guarantee that data will live forever!
2. A single index that includes mail, conversations, web accesses, and books!
3. E-books…e-magazines reach critical mass!
4. Telephony and audio capture are needed
5. Photo & video "index serving"
6. More meta-information … Office, photos
7. Lots of GUIs to improve ease-of-use

The Clusters - GRID Era
CCGSC 2002, Lyon, France, September 2002
Copyright Gordon Bell, Clusters & Grids

Clusters & Grids: same observations as in 2000
The GRID was/is an exciting concept …
They can/must work within a community, organization, or project. Apps need to drive.
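The lifetime column in the "Storing all we've read, heard, & seen" table is simple rate-times-time arithmetic. A minimal sketch (the ~80-year recording span is an assumption; units are decimal, 1 GB = 1000 MB, as in the slide):

```python
# Rough lifetime-storage arithmetic behind the MyLifeBits table:
# a steady MB/day rate accumulated over an assumed ~80-year span.

DAYS_PER_YEAR = 365
LIFETIME_YEARS = 80  # assumed recording span

def lifetime_gb(mb_per_day: float, years: int = LIFETIME_YEARS) -> float:
    """Gigabytes accumulated at a steady MB/day rate over `years` years."""
    return mb_per_day * DAYS_PER_YEAR * years / 1000.0

print(round(lifetime_gb(2)))       # 58 GB  (table: 60-300 G for read text)
print(round(lifetime_gb(0.5)))     # 15 GB  (table: 15 G for speech text)
print(round(lifetime_gb(40)))      # 1168 GB ~ 1.2 T (table: recorded speech)
print(round(lifetime_gb(20_000)))  # 584000 GB ~ 0.6 PB (slide rounds HDTV to ~1 P)
```

The same arithmetic gives the slide's rule of thumb that 1 MB/day is roughly 1 GB per 3-4 years.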
• "Necessity is the mother of invention."
• A taxonomy… interesting vs. necessity:
  – Web services
  – Cycle scavenging and object evaluation (e.g. seti@home, QCD)
  – File distribution/sharing for IP theft: Napster
  – Databases &/or programs for a community (astronomy, bioinformatics, CERN, NCAR)

Grid, n. An arbitrary, distributed cluster platform. A geographical and multi-organizational collection of diverse computers, dynamically configured as cluster platforms responding to arbitrary, ill-defined jobs "thrown" at it.
• Costs are not necessarily favorable, e.g. disks are less expensive than the cost to transfer the data.
• Latency and bandwidth are non-deterministic, thereby changing cluster characteristics.
• Once a large body of data exists for a job, it is inherently bound to (set into) fixed resources. Large datasets & I/O-bound programs need to be with their data, or be database accesses… But are there resources there to share?

Bright spots… near term, user focus: a lesson for Grid suppliers
• Tony Hey, head of UK scientific computing
  – apps-based funding versus tools-based funding
  – web-services-based Grid & data orientation
• David Abramson - Nimrod
  – Parameter scans… other low-hanging fruit
  – Encapsulate apps! "Excel"-style language/control mgmt.
• Andrew Grimshaw - Avaki
  – "Legacy apps are programs that users just want, and there's no time or resources to modify the code … independent of age, author, or language, e.g. Java."
  – Making the Legion vision real. A reality check.
• Gray et al. - SkyService and TerraService
  – 4 pairs of "web services"-based apps
  – Goal: providing a web service must be as easy as publishing a web page… and it will occur!!!

SkyServer: delivering a web service to the astronomy community. A prototype for other sciences?
Gray, Szalay, et al.
First paper on the SkyServer:
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_2001_77_Virtual_Observatory.doc
Later, more detailed paper for the database community:
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.pdf
http://research.microsoft.com/~gray/Papers/MSR_TR_01_104_SkyServer_V1.doc

What can be learned from SkyServer?
• It's about data, not about harvesting flops
• 1-2 hr. query programs versus 1 wk. programs based on grep
• 10-minute runs versus 3-day computes & searches
• Database viewpoint: 100x speed-ups
  – Avoid costly re-computation and searches
  – Use indices and PARALLEL I/O. Read/Write >> 1.
  – Parallelism is automatic and transparent; it just depends on the number of disks.

Some science is hitting a wall: FTP and GREP are not adequate (Jim Gray)
• You can GREP 1 GB in a minute. You can GREP 1 TB in 2 days. You can GREP 1 PB in 3 years.
• You can FTP 1 MB in 1 sec; 1 GB in a minute; 1 TB in 2 days for ≈$1K; 1 PB in 3 years for ≈$1M.
• 1 PB ≈ 10,000 disks >> 1,000
• At some point you need indices to limit search: parallel data search and analysis
• Goal, using databases: make it easy to
  – Publish: record structured data
  – Find data anywhere in the network; get the subset you need!
  – Explore datasets interactively
• The database becomes the file system!!!

Network concerns
• Very high cost
  – $(1 + 1) / GByte to send on the net; FedEx and 160-GByte shipments are cheaper
  – Disks cost $1/GByte (less than $2/GByte) to purchase!!!
  – DSL at home is $0.15-$0.30/GByte
• Low availability of fast links (the last-mile problem)
  – Labs & universities have DS3 links at most, and they are very expensive
  – 1-10 Mbps at home; very poor communication links
• Traffic: instant messaging, music stealing
• Performance at the desktop is poor
• Manage: trade in fast links for cheap links!!
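Gray's GREP figures follow from sequential-scan arithmetic. A minimal sketch (the ~10 MB/s effective single-disk scan rate is an assumption on my part, chosen because it roughly reproduces the slide's numbers):

```python
# Sequential-scan ("grep") time at different data scales, assuming a
# single disk streaming at ~10 MB/s effective (a c2002-era figure).

SCAN_MB_PER_S = 10.0  # assumed effective scan rate

def scan_seconds(data_gb: float) -> float:
    """Seconds to scan data_gb gigabytes at SCAN_MB_PER_S."""
    return data_gb * 1000.0 / SCAN_MB_PER_S

print(round(scan_seconds(1) / 60, 1))                 # 1.7 min  (slide: "a minute")
print(round(scan_seconds(1_000) / 86400, 1))          # 1.2 days (slide: 2 days)
print(round(scan_seconds(1_000_000) / (86400 * 365), 1))  # 3.2 years (slide: 3 years)
```

The scan is O(n) in the data size, which is the slide's point: at TB-PB scale, only indices (an O(log n) lookup per query) and parallel I/O keep query times in minutes.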
Gray's $2.4K, 1-TByte Sneakernet, aka the Disk Brick
• We now ship NTFS/SQL disks. Not a good format for Linux.
• Ship NFS/CIFS/ODBC servers (not bare disks).
• Plug the "disk" into the LAN; DHCP, then file- or DB-serve…
• Web Service in the Bay Area long term
Courtesy of Jim Gray, Microsoft Bay Area Research

Cost, time, and speed to move a Terabyte
Context     Speed (Mbps)  Rent $/month  $/Mbps   Raw $/TB sent  Time/TB
home phone  0.04          40            1,000    3,086          6 years
home DSL    0.6           70            117      360            5 months
T1          1.5           1,200         800      2,469          2 months
T3          43            28,000        651      2,010          2 days
OC3         155           49,000        316      976            14 hours
100 Mbps    100                                                 1 day
Gbps        1000                                                2.2 hours
OC192       9600          1,920,000     200      617            14 minutes
Courtesy of Jim Gray, Microsoft Bay Area Research

Cost and time of Sneaker-net vs. the alternatives
Media                 CD       DVD      Tape     DiskBrick
Units per TB          1500     200      25       7
Robot/drive $         2x800    2x8K     2x15K    1K
Media $ (per TB)      240      400      1,000    1,400
TB read+write time    60 hrs   60 hrs   92 hrs   19 hrs
Ship time             24 hrs   24 hrs   24 hrs   24 hrs
Total time/TB         6 days   6 days   5 days   2 days
Effective Mbps        28       28       18       52
Cost (10 TB)          $2 K     $2.6 K   $20 K    $31 K
$/TB shipped          $208     $260     $2,000   $3,100
Courtesy of Jim Gray, Microsoft Bay Area Research

Grids: real and "personal". Two carrots, one downside. A bet.
• Bell will match any Gordon Bell Prize (parallelism, performance, or performance/cost) winner's prize that is based on "Grid platform technology".
• I will bet any individual or set of individuals in the Grid research community up to $5,000 that a Grid application will not win the above by SC2005.

Technical computing: observations on an ever-changing, occasionally repetitious environment
Copyright Gordon Bell, LANL 5/17/2002

A brief, simplified history of HPC
1. Sequential & data parallelism using shared memory: Cray's Fortran computers, '60-'02 (US: '90)
2. 1978: VAXen threaten general-purpose centers…
3. NSF response: form many centers, 1988-present
4. SCI: the search for parallelism to exploit micros, '85-'95
5. Scalability: "bet the farm" on clusters.
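The crossover in Gray's tables above can be recomputed directly. A small sketch (link speeds, the DiskBrick's 19-hour read/write time, and the 24-hour overnight shipment are the tables' own figures):

```python
# Network transfer vs. sneakernet: days to move 1 TB over a link, and the
# effective bandwidth of writing disks, shipping overnight, and reading.

TB_BITS = 8e12  # 1 TB in bits, decimal units

def link_days(mbps: float) -> float:
    """Days to push 1 TB through a link of the given Mbps."""
    return TB_BITS / (mbps * 1e6) / 86400

def sneakernet_mbps(rw_hours: float, ship_hours: float = 24) -> float:
    """Effective Mbps of write + overnight-ship + read for 1 TB."""
    return TB_BITS / ((rw_hours + ship_hours) * 3600) / 1e6

print(round(link_days(0.6)))       # 154 days over home DSL (~5 months)
print(round(link_days(43), 1))     # 2.2 days over a T3
print(round(sneakernet_mbps(19)))  # 52 Mbps for the DiskBrick, beating a T3
```

So a $2.4K box of disks sent by overnight courier outruns every link in the table short of OC3, at a fraction of the rent, which is the argument for "trade in fast links for cheap links."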
6. Users "adapt" to clusters, aka multicomputers, with an LCD programming model: MPI. >'95
7. Beowulf: clusters adopt standardized hardware and Linux software to create a standard! >1995
8. "Do-it-yourself" Beowulfs impede new structures and threaten g.p. centers. >2000
9. 1997-2002: Let's tell NEC they aren't "in step". High-speed networking enables peer-to-peer computing and the Grid. Will this really work?

What Is the System Architecture? (GB c1990)
[Taxonomy tree, reconstructed from the original diagram; branches marked "X" in the original (e.g. SIMD) denoted dead ends:]
• MIMD, single address space, shared-memory computation: multiprocessors
  – Distributed-memory multiprocessors (scalable)
    · Dynamic binding of addresses to processors: KSR
    · Static binding, ring multi: IEEE SCI proposal
    · Static binding, caching: Alliant, DASH
    · Static run-time binding: research machines
  – Central-memory multiprocessors (not scalable)
    · Cross-point or multi-stage: Cray, Fujitsu, Hitachi, IBM, NEC, Tera
    · Simple ring multi, bus-multi replacement; bus multis: DEC, Encore, NCR, Sequent, SGI, Sun
• MIMD, multiple address space, message-passing computation: multicomputers
  – Distributed multicomputers (scalable)
    · Mesh-connected: Intel
    · Butterfly/fat tree/cubes: CM5, NCUBE
    · Switch-connected: IBM
  – Fast LANs for high-availability and high-capacity clusters: DEC, Tandem
  – LANs for distributed processing: workstations, PCs; the GRID

Processor architectures? VECTORS OR VECTORS
• CS view: MISC >> CISC >> language-directed >> RISC >> super-scalar >> extra-long instruction word
• SC designer's view: RISC >> VCISC (vectors) >> massively parallel (SIMD) (multiple pipelines)
• Caches mostly alleviate the need for memory B/W. Memory B/W = performance.

Results from DARPA's SCI c1983
• Many research and construction efforts … virtually all new hardware efforts failed, except Intel's and Cray's.
• DARPA-directed purchases screwed up the market, including the many VC-funded efforts.
• No software funding!
• Users responded to the massive power potential with LCD software: clusters, clusters, clusters using MPI. Beowulf!

It's not scalar vs. vector; it's memory bandwidth!
• 6-10 scalar processors = 1 vector unit
• 16-64 scalars = a 2-6 processor SMP

Dead Supercomputer Society
ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics

What a difference 25 years AND spending >10x makes!
• ESRDC Earth Simulator: 40 Tflops, 640 nodes (8 x 8-GFlops vector processors per node)
• vs. the LLNL 150-Mflops machine room, c1978

Japanese Earth Simulator
• Spectacular results for $400M.
  – A year-to-year gain of 10x: the greatest gain since the first (1987) Gordon Bell Prize.
  – Performance is 10x the nearest entrant.
  – Performance/cost is 3x the nearest entrant.
  – RAP (real application performance) is >60% of peak; other machines are typically 10% of peak.
  – Programming was done in HPF (Fortran), which the US research community abandoned.
• NCAR was right in wanting to purchase an NEC super.

Computer types, by connectivity (WAN/LAN · SAN · DSM · SM):
• Networked supers (WAN/LAN): GRID, Legion & P2P, Condor
• "Old World": mainframes; VPPuni (SAN); NEC super, NEC mP, Cray X…T, all mPv (SM)
• Clusters & multis: T3E, SP2 (mP), NT clusters (SAN); SGI DSM (DSM)
• Beowulf, NOW, WSs, PCs (WAN/LAN)

The Challenge leading to Beowulf
• The NASA HPCC Program, begun in 1992, comprised Computational Aero-Science and Earth and Space Science (ESS).
• Driven by the need for post-processing data manipulation and visualization of large data sets.
• Conventional techniques imposed long user response times and shared-resource contention.
• Cost had to be low enough for a dedicated, single-user platform.
• Requirement: 1 Gflops peak, 10 GByte, < $50K.
• Commercial systems: $1000/Mflops, i.e. $1M/Gflops.

The Virtuous Economic Cycle that drives the PC industry… & Beowulf
Standards attract suppliers; greater availability at lower cost attracts users; users create apps, tools, and training; which attracts more suppliers… and the cycle repeats.

Lessons from Beowulf
• An experiment in parallel computing systems
• Established a vision: low-cost, high-end computing
• Demonstrated the effectiveness of PC clusters for some (not all) classes of applications
• Provided networking software
• Provided cluster management tools
• Conveyed findings to the broad community via tutorials and the book
• Provided a design standard to rally the community!
• Standards beget books, trained people, software … a virtuous cycle that allowed apps to form
• An industry begins to form beyond a research project
Courtesy of Thomas Sterling, Caltech.
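The requirement in "The Challenge leading to Beowulf" (1 Gflops peak for under $50K, against commercial systems at $1000/Mflops) implies roughly a 20x price/performance gap. Checking the arithmetic:

```python
# Beowulf target vs. commercial c1992 price/performance.
# Both input figures come from the slide above.

COMMERCIAL_PER_MFLOPS = 1000.0   # dollars per Mflops, commercial systems
TARGET_GFLOPS = 1.0              # required peak
TARGET_BUDGET = 50_000.0         # dollars

commercial_cost = COMMERCIAL_PER_MFLOPS * TARGET_GFLOPS * 1000  # $ per Gflops
gap = commercial_cost / TARGET_BUDGET

print(int(commercial_cost))  # 1000000: $1M for 1 Gflops, commercially
print(int(gap))              # 20: the target had to be ~20x cheaper
```

Commodity PC parts plus the virtuous-cycle economics described above were what closed that 20x gap.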
Clusters: Next Steps
• Scalability… they can exist at all levels: personal, group, … centers
• Clusters challenge centers… given that smaller users get small clusters

Computing in small spaces @ LANL (an RLX cluster in a building with NO A/C)
• 240 processors @ 2/3 GFlops each
• Filling the 4 racks gives a Teraflops

Internet II concerns, given its $0.5B cost
• Very high cost
  – $(1 + 1) / GByte to send on the net; FedEx and 160-GByte shipments are cheaper
  – Disks cost $1/GByte to purchase!
  – DSL at home is $0.15-$0.30/GByte
• Low availability of fast links (the last-mile problem)
  – Labs & universities have DS3 links at most, and they are very expensive
  – 1-10 Mbps at home; very poor communication links
• Traffic: instant messaging, music stealing
• Performance at the desktop is poor

Scalable computing: the effects
• Clusters come in all sizes, with incremental growth
  – 10 or 100 to 10,000 (100X for most users)
  – debug vs. run; problem growth
• Allows compatibility heretofore impossible
  – 1978: VAX chose Cray Fortran
  – 1987: the NSF centers went to UNIX
  – Users chose a sensible environment
• The role of general-purpose centers (e.g. NSF, state) is unclear. Are they a necessity for support?
  – Acquisition and operational costs & environments
  – Cost to use, as measured by the user's time
  – Scientific data for a given community… community programs and data; managing GRID discipline
• Are clusters ≈ Gresham's Law? They drive out the alternatives.

The end