Crays, Clusters, Centers and Grids
Gordon Bell (gbell@microsoft.com)
Bay Area Research Center, Microsoft Corporation
Copyright Gordon Bell

Summary
– Sequential and data parallelism using shared-memory, Fortran computers (1960-90)
– Search for parallelism to exploit micros (1985-95)
– Users adapted to clusters, aka multicomputers, via the lowest-common-denominator programming model, MPI (>1995)
– Beowulf standardized clusters of standard hardware and software (>1998)
– "Do-it-yourself" Beowulfs impede new structures and threaten centers (>2000)
– High-speed networks kicking in to enable the Grid

Outline
– Retracing scientific computing evolution: Cray, DARPA SCI and "killer micros", clusters kick in
– Current taxonomy: cluster flavors
– Deja vu: the rise of commodity computing; Beowulfs are a replay of VAXen c1980
– Centers
– Role of the Grid and peer-to-peer
– Will commodities drive out new ideas?

DARPA Scalable Computing Initiative c1985-1995; ASCI
Motivated by:
– The Japanese 5th Generation project
– Realization that "killer micros" were coming
– Custom VLSI and its potential
– Lots of ideas to build various high-performance computers
– Threat and potential sale to the military

[Photo: Steve Squires and G. Bell at our "Cray" at the start of DARPA's SCI.]

Dead Supercomputer Society
ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics

DARPA Results
– Many research and construction efforts … virtually all failed.
– DARPA-directed purchases … screwed up the market, including the many VC-funded efforts.
– No software funding.
– Users responded to the massive power potential with lowest-common-denominator software: clusters, clusters, clusters using MPI.
– It's not scalar vs. vector, it's memory bandwidth! (A small sketch after the timeline figure below illustrates the point.)
  – 6-10 scalar processors = 1 vector unit
  – 16-64 scalars = a 2-6 processor SMP

[Figure: "The evolution of vector supercomputers" (also shown as "The evolution of Cray Inc."), a 1960-2000 timeline tracing CDC (1604, 6600, 7600, Star, 205, ETA 10), Cray Research vector SMPs (Cray 1, XMP, YMP, 2, C, T, SVs), MPPs on DEC/Compaq Alpha, SGI MIPS SMP and scalable SMP lines with the buy and sell of Cray Research (SVs sold to SUN), Cray Computer (Cray 3, 4), Tera Computer's multi-threaded architecture (HEP at Denelcor, MTA 1, 2) becoming Cray Inc., SRC Company (SRC1, an Intel-based shared-memory multiprocessor), the Fujitsu (VP 100 …), Hitachi (810 …), and NEC (SX1 … SX5) vector lines, IBM vector processing (2938 vector processor, 3090 VF), other parallel machines (Illiac IV, TI ASC), and Intel microprocessors (8008, 8086/8, 286, 386, 486, Pentium, Itanium).]
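The memory-bandwidth point above can be made concrete with a STREAM-style triad, sketched below. This is an illustration, not material from the talk: the loop performs only two floating-point operations per element but moves 24 bytes, so on a scalar microprocessor the measured rate is set almost entirely by the memory system, the very resource a vector unit's memory pipes were designed to keep busy. The array size and timing method are arbitrary choices for the illustration.

```c
/* A sketch (not from the talk) of the "it's memory bandwidth" point:
 * a STREAM-style triad does 2 flops per element but moves 24 bytes,
 * so a scalar microprocessor spends most of its time waiting on memory. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)                      /* 4M doubles, ~32 MB per array */

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = seconds();
    for (size_t i = 0; i < N; i++)       /* triad: 2 flops, 24 bytes per element */
        a[i] = b[i] + 3.0 * c[i];
    double t = seconds() - t0;

    printf("triad: %.1f Mflop/s, %.1f MB/s (check a[0]=%.1f)\n",
           2.0 * N / t / 1e6, 24.0 * N / t / 1e6, a[0]);

    free(a); free(b); free(c);
    return 0;
}
```

Compiled with optimization (e.g. cc -O2 triad.c), the reported MB/s typically sits near the machine's sustainable memory bandwidth, far below its peak flop rate.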
Top500 taxonomy … everything is a cluster, aka multicomputer
– Clusters are the ONLY scalable structure.
  – Cluster: n inter-connected computer nodes operating as one system. Nodes: uni-processor or SMP. Processor types: scalar or vector.
  – MPP = miscellaneous, not massive (>1000), SIMD, or something we couldn't name.
– Cluster types (implied message passing):
  – Constellations = clusters of >=16-processor SMPs
  – Commodity clusters of uni-processor or <=4-processor SMPs
  – DSM: NUMA (and COMA) SMPs and constellations
  – DMA (direct memory access) clusters vs. message passing
  – Uni- and SMP-vector clusters: vector clusters and vector constellations

The Challenge leading to Beowulf
– NASA HPCC Program begun in 1992
– Comprised Computational Aero-Science and Earth and Space Science (ESS)
– Driven by the need for post-processing data manipulation and visualization of large data sets
– Conventional techniques imposed long user response times and shared-resource contention
– Cost low enough for a dedicated single-user platform
– Requirement: 1 Gflops peak, 10 GByte, < $50K
– Commercial systems: $1000/Mflops, or $1M/Gflops

Linux - a web phenomenon
– Linus Torvalds, a bored Finnish graduate student, writes a news reader for his PC, using the Unix model
– Puts it on the Internet for others to play with
– Others add to it, contributing to open-source software
– Beowulf adopts early Linux
– Beowulf adds Ethernet drivers for essentially all NICs
– Beowulf adds channel bonding to the kernel
– Red Hat distributes Linux with Beowulf software
– Low-level Beowulf cluster-management tools added
(Courtesy of Dr. Thomas Sterling, Caltech)

The Virtuous Economic Cycle drives the PC industry … and Beowulf
[Diagram: a cycle: standards attract users; users create apps, tools, and training; these attract suppliers; greater availability at lower cost reinforces the standards.]

BEOWULF-CLASS SYSTEMS
– Cluster of PCs: Intel x86, DEC Alpha, Mac Power PC
– Pure M2COTS (mass-market commodity off-the-shelf)
– Unix-like OS with source: Linux, BSD, Solaris
– Message-passing programming model: PVM, MPI, BSP, homebrew remedies (a minimal MPI sketch follows this group of slides)
– Single-user environments
– Large science and engineering applications

Interesting "cluster" in a cabinet
– 366 servers per 44U cabinet
  – Single processor each
  – 2 x 30 GB disks per computer (24 TBytes per cabinet)
  – 2 x 100 Mbps Ethernets per computer
– ~10x perf*, power, disk, and I/O per cabinet
– ~3x price/performance
– Network services … Linux based
(*vs. 42 servers, 2 processors each, 84 Ethernets, 3 TBytes)

Lessons from Beowulf
– An experiment in parallel computing systems
– Established a vision: low-cost, high-end computing
– Demonstrated the effectiveness of PC clusters for some (not all) classes of applications
– Provided networking software
– Provided cluster-management tools
– Conveyed findings to a broad community via tutorials and the book
– Provided a design standard to rally the community!
– Standards beget books, trained people, software … a virtuous cycle that allowed apps to form
– An industry begins to form beyond a research project
(Courtesy, Thomas Sterling, Caltech)

Direction and concerns
– Commodity clusters are evolving to be mainline supers
– The Beowulf do-it-yourself effect is like VAXen … clusters have taken a long time.
– Will they drive out or undermine centers? Or is computing so complex as to require a center to manage and support the complexity?
– Centers:
  – Data warehouses
  – Community centers, e.g. weather
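As a concrete illustration of the lowest-common-denominator message-passing model the Beowulf slides describe, here is a minimal SPMD program in C with MPI. It is a generic sketch, not an application from the talk: every node runs the same binary, computes a private partial result, and the only interaction is an explicit MPI_Reduce. The work being summed (a numerical integration that yields pi) is arbitrary filler.

```c
/* A minimal sketch of the "lowest common denominator" message-passing model:
 * the same SPMD program runs on every node of a Beowulf-style cluster, and
 * all interaction is explicit MPI communication. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each node integrates its own slice of f(x) = 4/(1+x^2) over [0,1]. */
    const long steps = 10000000L;
    const double h = 1.0 / steps;
    double partial = 0.0;
    for (long i = rank; i < steps; i += nprocs) {
        double x = (i + 0.5) * h;
        partial += 4.0 / (1.0 + x * x);
    }
    partial *= h;

    double pi = 0.0;
    MPI_Reduce(&partial, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi ~= %.12f using %d processes\n", pi, nprocs);

    MPI_Finalize();
    return 0;
}
```

Built and launched in the usual way (e.g. mpicc pi.c -o pi; mpirun -np 16 ./pi), the same source runs unchanged on anything from a do-it-yourself Beowulf to a large commodity cluster, which is precisely why MPI became the common denominator.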
– Will they drive out a diversity of ideas? Assuming there are some?

Grids: Why now?

The virtuous cycle of bandwidth supply and demand
[Diagram: a cycle: increased demand leads to increased capacity (circuits and bandwidth); capacity plus standards creates new services (Telnet & FTP, email, WWW, audio, voice!, video); new services lower response time and increase demand again.]

[Map: Gray Bell Prize results for single-thread, single-stream TCP/IP, desktop-to-desktop, Windows 2000 out-of-the-box performance; routes include Redmond/Seattle, WA to New York via 7 hops, and Arlington, VA to San Francisco, CA (5626 km, 10 hops).]

The Promise of SAN/VIA: 10x in 2 years (http://www.ViArch.org/)
– Yesterday:
  – 10 MBps (100 Mbps Ethernet)
  – ~20 MBps TCP/IP saturates 2 CPUs
  – round-trip latency ~250 µs
– Now:
  – Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, …
  – Fast user-level communication: TCP/IP ~100 MBps at 10% CPU
  – round-trip latency is 15 µs
  – 1.6 Gbps demoed on a WAN
(A small cost-model sketch based on these figures follows this group of slides.)
[Chart: time (µs) to send 1 KB, split into transmit, sender-CPU, and receiver-CPU components, for 100 Mbps Ethernet, Gbps Ethernet, and SAN.]

SNAP … c1995: Scalable Network And Platforms, a View of Computing in 2000+
We all missed the impact of the WWW!
(Platform: Gordon Bell; Network: Jim Gray)

How Will Future Computers Be Built?
Thesis: SNAP, Scalable Networks and Platforms
– Upsize from the desktop to a world-scale computer
– Based on a few standard components: platform and network
Because:
– Moore's law: exponential progress
– Standardization and commoditization
– Stratification and competition
When: sooner than you think!
– Massive standardization gives massive use
– Economic forces are enormous

Computing SNAP built entirely from PCs
[Diagram: a wide-area global network with mobile nets plus wide- and local-area networks connecting terminals, PCs, workstations, and servers; person servers (PCs); TC = TV + PC in the home (CATV or ATM or satellite); portables; legacy mainframe and minicomputer servers and terminals; scalable computers built from PCs; centralized and departmental uni- and mP (UNIX & NT) servers built from PCs. A space, time (bandwidth), and generation scalable environment.]

SNAP Architecture

GB plumbing from the baroque: evolving from the 2 dance-hall model
  Mp --- S --- Pc
               |--- S.fiber ch. --- Ms
               |--- S.Cluster
               |--- S.WAN
vs.
  MpPcMs --- S.LAN/Cluster/WAN --- …

Grids: Why?
– The problem or community dictates a Grid
– Economics … thief or scavenger
– Research funding … that's where the problems are

The Grid … including P2P
– GRID was/is an exciting concept …
  – They can/must work within a community, organization, or project. What binds it?
  – "Necessity is the mother of invention."
– Taxonomy … interesting vs. necessity
  – Cycle scavenging and object evaluation (e.g. seti@home, QCD, factoring)
  – File distribution/sharing, aka IP theft (e.g. Napster, Gnutella)
  – Databases and/or programs and experiments (astronomy, genome, NCAR, CERN)
  – Workbenches: web workflow for chemistry, biology, …
  – Single, large problem pipelines, e.g. NASA
  – Exchanges … many sites operating together
  – Transparent web access, aka load balancing
  – Facilities-managed PCs operating as a cluster!
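The SAN/VIA figures above can be read through the usual first-order communication model, time(n) ≈ round-trip latency + n/bandwidth. The model and the program below are illustrative assumptions, not part of the talk; only the latency and bandwidth numbers come from the slide. The point: for kilobyte-sized messages the transfer is almost pure latency, so cutting the software path with user-level communication buys nearly the whole 10x.

```c
/* A first-order cost model (an assumption, not from the slides):
 * time(n) ~= round_trip_latency + n / bandwidth.
 * The latency/bandwidth pairs are the slide's figures: ~250 us and ~20 MB/s
 * for TCP over 100 Mbps Ethernet "yesterday"; ~15 us and ~100 MB/s "now". */
#include <stdio.h>

static double xfer_us(double bytes, double rtt_us, double mb_per_s)
{
    return rtt_us + bytes / mb_per_s;    /* bytes / (MB/s) = microseconds */
}

int main(void)
{
    const double sizes[]  = { 1024.0, 1024.0 * 1024.0 };
    const char  *labels[] = { "1 KB", "1 MB" };

    for (int i = 0; i < 2; i++) {
        printf("%s: yesterday %9.1f us, now %9.1f us\n", labels[i],
               xfer_us(sizes[i], 250.0, 20.0),
               xfer_us(sizes[i], 15.0, 100.0));
    }
    return 0;
}
```

For 1 KB the model gives roughly 301 µs versus 25 µs; for 1 MB, where bandwidth dominates, the gap shrinks to about 5x.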
Some observations
– Clusters are purchased, managed, and used as a single, one-room facility.
– Clusters are the "new" computers. They present unique, interesting, and critical problems … then Grids can exploit them.
– Clusters and Grids have little to do with one another … Grids use clusters!
– Clusters should be a good simulation of tomorrow's Grid.
– Distributed PCs: Grids or clusters?
– Perhaps some clusterable problems can be solved on a Grid … but it's unlikely:
  – Lack of understanding of clusters and their variants
  – Socio-political and economic issues with respect to the Grid

Deja vu
– ARPAnet c1969: to use remote programs and data. Got FTP and mail. Machines and people overloaded.
– NREN c1988: bandwidth => faster FTP for images and data; latency => got http://www…; tomorrow => Gbit communication bandwidth and latency.
– <1990: mainframes, minis, PCs/workstations. >1990: very large, departmental, and personal clusters.
– VAX c1979: one computer per scientist. Beowulf c1995: one cluster (∑ PCs) per scientist.
– 1960s batch: optimize use: allocate, schedule, $. 2000s GRID: optimize use: allocate, schedule, $ (… security, management, etc.)

The end

Modern scalable switches … also hide a supercomputer
– Scale from <1 to 120 Tbps
– 1 Gbps Ethernet switches scale to 10s of Gbps, scaling upward
– SP2 scales from 1.2 …

CMOS Technology Projections
– 2001: logic 0.15 µm, 38 Mtr, 1.4 GHz; memory 1.7 Gbits, 1.18 access
– 2005: logic 0.10 µm, 250 Mtr, 2.0 GHz; memory 17.2 Gbits, 1.45 access
– 2008: logic 0.07 µm, 500 Mtr, 2.5 GHz; memory 68.7 Gbits, 1.63 access
– 2011: logic 0.05 µm, 1300 Mtr, 3.0 GHz; memory 275 Gbits, 1.85 access
(A small sketch at the end of this group of slides turns these into implied annual growth rates.)

Future Technology Enablers
– SOCs: systems-on-a-chip
– GHz processor clock rates
– VLIW, 64-bit processors: scientific/engineering application address spaces
– Gbit DRAMs
– Micro-disks on a board
– Optical fiber and wave-division-multiplexing communications (free space?)

The End

How can GRIDs become a non-ad hoc computer structure? Get yourself an application community!

[Chart: performance vs. price. Volume drives platforms to simple, standard prices: stand-alone desktops and PCs; distributed workstations; clustered computers with a high-speed interconnect; 1-4 processor mPs; 1-20 processor mPs; MPPs.]
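To make the projection table concrete, the sketch below computes the compound annual growth rates implied by its 2001 and 2011 rows (the year-to-row pairing is as reconstructed in the table above, an assumption). It shows the familiar imbalance: logic transistor counts and memory bits grow at roughly 1.4x-1.7x per year while clock rate grows at barely 1.1x per year, the same memory and bandwidth pressure argued earlier in the talk.

```c
/* A small sketch (not part of the talk) turning the CMOS projection table
 * into implied compound annual growth rates, assuming the 2001 and 2011
 * rows as reconstructed above. */
#include <stdio.h>
#include <math.h>

static double cagr(double start, double end, double years)
{
    return pow(end / start, 1.0 / years);
}

int main(void)
{
    const double years = 10.0;                                            /* 2001 -> 2011 */
    printf("logic transistors: %.2fx/yr\n", cagr(38.0, 1300.0, years));   /* Mtr  */
    printf("clock rate:        %.2fx/yr\n", cagr(1.4, 3.0, years));       /* GHz  */
    printf("memory bits:       %.2fx/yr\n", cagr(1.7, 275.0, years));     /* Gbit */
    return 0;
}
```

Link with the math library, e.g. cc cagr.c -lm; the output is roughly 1.42x, 1.08x, and 1.66x per year.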
In 5-10 years we can/will have:
– More powerful personal computers
  – Processing 10-100x; multiprocessors-on-a-chip
  – 4x resolution (2K x 2K) displays to impact paper
  – Large, wall-sized and watch-sized displays
  – Low-cost storage of one terabyte for personal use
– Adequate networking? PCs now operate at 1 Gbps
  – Ubiquitous access = today's fast LANs
  – Competitive wireless networking
– One-chip, networked platforms, e.g. light bulbs, cameras
– Some well-defined platforms that compete with the PC for mind (time) and market share: watch, pocket, body implant, home (media, set-top)
– Inevitable, continued cyberization … the challenge … interfacing platforms and people.

Linus's and Stallman's Law: Linux everywhere (aka the Torvalds stranglehold)
– Software is or should be free
– All source code is "open"
– Everyone is a tester
– Everything proceeds a lot faster when everyone works on one code base
– Anyone can support and market the code for any price
– Zero-cost software attracts users!
– All the developers write code

ISTORE Hardware Vision
– System-on-a-chip enables computer and memory without significantly increasing the size of the disk
– 5-7 year target:
  – MicroDrive, 1.7" x 1.4" x 0.2"
    – 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
    – 2006: 9 GB, 50 MB/s? (1.6x/yr capacity, 1.4x/yr bandwidth; a quick check follows at the end)
  – Integrated IRAM processor, 2x height: 16 Mbytes; 1.6 Gflops; 6.4 Gops; connected via a crossbar switch growing like Moore's law
  – 10,000+ nodes in one rack! 100 per board = 1 TB; 0.16 Tflops

The Disk Farm? or a System On a Card?
– The 14", 500 GB disc card: an array of discs
– Can be used as 100 discs, 1 striped disc, 50 fault-tolerant discs, … etc.
– LOTS of accesses/second and of bandwidth
– A few disks are replaced by 10s of GBytes of RAM and a processor to run apps!!
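As a closing arithmetic check (an illustration, not from the talk itself), the ISTORE slide's 2006 MicroDrive target follows directly from its stated growth rates: 1.6x/yr capacity and 1.4x/yr bandwidth applied to the 1999 part over seven years.

```c
/* A quick check that the ISTORE slide's 2006 MicroDrive targets follow from
 * its stated growth rates, applied to the 1999 part (340 MB, 5 MB/s). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double years = 7.0;                      /* 1999 -> 2006 */
    double capacity_gb = 0.340 * pow(1.6, years);  /* 1.6x/yr capacity */
    double bandwidth   = 5.0 * pow(1.4, years);    /* 1.4x/yr bandwidth, MB/s */

    printf("2006 projection: %.1f GB, %.0f MB/s\n", capacity_gb, bandwidth);
    return 0;
}
```

The output, roughly 9.1 GB and 53 MB/s, matches the slide's "9 GB, 50 MB/s ?" projection.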