Technical computing: Observations on an ever changing, occasionally repetitious, environment
Los Alamos National Laboratory, 17 May 2002
Copyright Gordon Bell, LANL, 5/17/2002

A brief, simplified history of HPC
1. Sequential & data parallelism using shared memory: Cray’s Fortran computers, 1960-2002 (US: 1990)
2. 1978: VAXen threaten general-purpose centers…
3. NSF response: form many centers, 1988-present
4. SCI: the search for parallelism to exploit micros, 1985-95
5. Scalability: “bet the farm” on clusters. Users “adapt” to clusters, aka multicomputers, with the LCD programming model, MPI. >1995
6. Beowulf clusters adopt standardized hardware and Linus’s Linux software to create a standard! >1995
7. “Do-it-yourself” Beowulfs impede new structures and threaten general-purpose centers. >2000
8. 1997-2002: let’s tell NEC they aren’t “in step”.
9. High-speed networking enables peer-to-peer computing and the Grid. Will this really work?

Outline
– Retracing scientific computing evolution: Cray, SCI & “killer micros”, ASCI, & clusters kick in
– Current taxonomy: cluster flavors
– Déjà vu: the rise of commodity computing; Beowulfs are a replay of VAXen c1978
– Centers: 2+1/2 at NSF; BRC on CyberInfrastructure urges $650M/year
– Role of the Grid and peer-to-peer
– Will commodities drive out or enable new ideas?

DARPA SCI: c1985-1995; prelude to DOE’s ASCI
– Motivated by the Japanese 5th Generation … note the creation of MCC
– Realization that “killer micros” were coming
– Custom VLSI and its potential
– Lots of ideas to build various high-performance computers
– Threat and potential sale to the military

[Photo: Steve Squires & G. Bell at our “Cray” at the start of DARPA’s SCI, c1984.]

What is the system architecture? (GB c1990)
– MIMD, single address space, shared-memory computation: multiprocessors
  – Distributed-memory multiprocessors (scalable)
    – Dynamic binding of addresses to processors: KSR
    – Static binding, ring multi: IEEE SCI proposal
    – Static binding, caching: Alliant, DASH
    – Static, run-time binding: research machines
  – Central-memory multiprocessors (not scalable)
    – Cross-point or multi-stage: Cray, Fujitsu, Hitachi, IBM, NEC, Tera
    – Simple ring multi … bus multi replacement
    – Bus multis: DEC, Encore, NCR, … Sequent, SGI, Sun
– MIMD, multiple address space, message-passing computation: multicomputers
  – Distributed multicomputers (scalable)
    – Mesh connected: Intel
    – Butterfly/fat tree/cubes: CM5, NCUBE
    – Switch connected: IBM
  – Fast LANs for high-availability and high-capacity clusters: DEC, Tandem
  – LANs for distributed processing: workstations, PCs; GRID
– SIMD

Processor architectures? Vectors or …?
– The CS view: MISC >> CISC >> RISC >> VCISC; language directed (vectors) >> RISC >> super-scalar >> extra-long instruction word; massively parallel (SIMD), multiple pipelines. Caches mostly alleviate the need for memory B/W.
– The supercomputer designer’s view: vectors. Memory B/W = perf.

The Bell-Hillis bet, c1991: massive (>1000) parallelism in 1995
– (Chart comparing TMC with world-wide supers on applications, petaflops per month, and revenue.)

Results from DARPA’s SCI, c1983
– Many research and construction efforts … virtually all new hardware efforts failed except Intel and Cray.
– DARPA-directed purchases screwed up the market, including the many VC-funded efforts.
– No software funding! Users responded to the massive power potential with LCD software: clusters, clusters, clusters using MPI.
– It’s not scalar vs. vector, it’s memory bandwidth! (See the sketch after this list.)
  – 6-10 scalar processors = 1 vector unit
  – 16-64 scalars = a 2-6 processor SMP
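The memory-bandwidth point can be made concrete with a tiny microbenchmark. The following is an illustrative sketch only, not from the talk; the array size, repetition count, and timing method are my own assumptions. It runs a STREAM-style triad, whose delivered flop rate on most machines tracks memory bandwidth rather than the processor’s peak rate, which is the sense in which “memory B/W = perf.”

```c
/* Illustrative memory-bandwidth sketch (not from the talk): a STREAM-style
 * triad, a[i] = b[i] + s*c[i]. The loop moves 24 bytes per 2 flops, so on a
 * cache-based micro the flop rate is set by memory bandwidth, not peak. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 23)   /* 8M doubles per array, far larger than any cache */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    const double s = 3.0;
    const int reps = 10;
    clock_t t0 = clock();
    for (int rep = 0; rep < reps; rep++)
        for (long i = 0; i < N; i++)
            a[i] = b[i] + s * c[i];   /* 2 flops, 24 bytes moved per element */
    double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;

    double gbytes = (double)reps * 3 * N * sizeof(double) / 1e9;
    double gflops = (double)reps * 2 * N / 1e9;
    printf("triad: %.2f GB/s, %.2f Gflop/s (flops track bandwidth)\n",
           gbytes / sec, gflops / sec);

    fprintf(stderr, "%g\n", a[N / 2]);   /* keep the result live */
    free(a); free(b); free(c);
    return 0;
}
```

Built with optimization (e.g. cc -O2), the reported Gflop/s is typically a small fraction of a micro’s peak, while a vector machine with matching memory bandwidth sustains far more of its peak; that is roughly where the “6-10 scalars = 1 vector unit” rule of thumb comes from.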
Dead Supercomputer Society
ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics

What a difference 25 years AND spending >10x makes!
– ESRDC (Earth Simulator): 40 Tflops, 640 nodes (8 × 8 Gflops vector processors per node)
– vs. the LLNL 150 Mflops machine room, c1978

Computer types, arranged by connectivity (figure)
– WAN/LAN: networked supers … GRID, Legion & P2P, Condor
– SAN: clusters: T3E, SP2 (mP), Beowulf, NOW, NT clusters, VPPuni
– DSM: SGI DSM
– SM (shared memory), the old world: NEC supers, NEC mP, Cray X…T (all mPv), mainframes, multis, workstations, PCs

Top500 taxonomy … everything is a cluster, aka multicomputer
– Clusters are the ONLY scalable structure.
  – Cluster: n inter-connected computer nodes operating as one system. Nodes: uni or SMP. Processor types: scalar or vector.
– MPP = miscellaneous, not massive (>1000), SIMD, or something we couldn’t name.
– Cluster types (message passing implied; a minimal MPI sketch follows the Beowulf lessons below):
  – Constellations = clusters of SMP nodes with >=16 processors
  – Commodity clusters of uni- or <=4-processor SMP nodes
  – DSM: NUMA (and COMA) SMPs and constellations
  – DMA (direct memory access) clusters vs. message-passing clusters
  – Uni- and SMP-vector clusters: vector clusters and vector constellations

Linux: a web phenomenon
– Linus Torvalds writes a news reader for his PC
– Puts it on the Internet for others to play with
– Others add to it, contributing to open-source software
– Beowulf adopts early Linux
– Beowulf adds Ethernet drivers for essentially all NICs
– Beowulf adds channel bonding to the kernel
– Red Hat distributes Linux with Beowulf software
– Low-level Beowulf cluster management tools added

The challenge leading to Beowulf
– NASA HPCC Program begun in 1992, comprising Computational Aero-Science and Earth and Space Science (ESS)
– Driven by the need for post-processing, data manipulation, and visualization of large data sets
– Conventional techniques imposed long user response times and shared-resource contention
– Cost low enough for a dedicated single-user platform
– Requirement: 1 Gflops peak, 10 Gbyte, < $50K
– Commercial systems: $1,000/Mflops, or $1M/Gflops

The virtuous economic cycle that drives the PC industry … & Beowulf
– Standards attract suppliers; greater availability at lower cost attracts users, who create apps, tools, and training, which in turn reinforce the standards.

Lessons from Beowulf
– An experiment in parallel computing systems
– Established the vision: low-cost, high-end computing
– Demonstrated the effectiveness of PC clusters for some (not all) classes of applications
– Provided networking software
– Provided cluster management tools
– Conveyed findings to the broad community (tutorials and the book)
– Provided a design standard to rally the community!
– Standards beget books, trained people, software … the virtuous cycle that allowed apps to form
– An industry begins to form beyond a research project
(Courtesy of Thomas Sterling, Caltech.)
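Since MPI is named throughout as the lowest-common-denominator programming model that all of these cluster flavors share, here is a minimal sketch of what that model looks like. It is not code from the talk; it uses only standard MPI-1 calls, and the ring-sum computation is an arbitrary example of my own.

```c
/* Minimal sketch of the LCD cluster programming model: MPI message passing.
 * Each rank passes a token around a ring and sums every rank's value. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* neighbor to send to      */
    int left  = (rank - 1 + size) % size;   /* neighbor to receive from */

    int token = rank, incoming, sum = 0;
    for (int step = 0; step < size; step++) {
        /* one hop around the ring; combined send/receive avoids deadlock */
        MPI_Sendrecv(&token, 1, MPI_INT, right, 0,
                     &incoming, 1, MPI_INT, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        sum += incoming;
        token = incoming;
    }

    printf("rank %d of %d: ring sum = %d\n", rank, size, sum);
    MPI_Finalize();
    return 0;
}
```

With an MPI implementation of the period such as MPICH or LAM it would be compiled with mpicc and launched with mpirun -np <nodes>; the same source runs unchanged on a constellation, a commodity cluster, or a vector cluster, which is exactly the LCD appeal (and the limitation) the deck points to.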
Clusters: next steps
– Scalability … they can exist at all levels: personal, group, … centers
– Clusters challenge centers … given that smaller users get small clusters

Disk evolution
– Capacity: 100x in 10 years; a 1 TB 3.5" drive in 2005; 20 TB? in 2012?!
– System on a chip
– High-speed SAN
– Disk replacing tape
– Disk is the supercomputer!

Intermediate step: shared logic
– Snap brick with 8-12 disk drives, 200 MIPS per arm (or more), 2 × Gbps Ethernet, a general-purpose OS
– $10K/TB to $100K/TB
– Shared: sheet metal, power, support/config, security, network ports
– Examples: ~1 TB (12 × 80 GB) NAS, NetApp; ~0.5 TB (8 × 70 GB) NAS, Maxtor; ~2 TB (12 × 160 GB) NAS, IBM TotalStorage; ~360 GB (10 × 36 GB) NAS
– These bricks could run applications, e.g. SQL, mail …

SNAP architecture (diagram)

RLX “cluster” in a cabinet
– 366 servers per 44U cabinet: single processor, 2 × 30 GB disks per computer (24 TBytes), 2 × 100 Mbps Ethernets
– ~10x the performance*, power, disk, and I/O per cabinet; ~3x the price/performance
– Network services … Linux based
* vs. 42 two-processor servers, 84 Ethernets, 3 TBytes per cabinet

Computing in small spaces @ LANL (RLX cluster in a building with NO A/C)
– 240 processors at 2/3 Gflops each
– Filling the 4 racks gives a Teraflops Beowulf

Beowulf clusters: space
– (Chart: performance/space ratio, Mflops per square foot, ASCI White vs. a bladed Beowulf.)

Beowulf clusters: power
– (Chart: performance/power ratio, Mflops per watt, Beowulf vs. a bladed Beowulf.)

“The network becomes the system.” - Bell, 2/10/82, at the Ethernet announcement with Noyce (Intel) and Liddle (Xerox)
“The network becomes the computer.” - Sun slogan, >1982
“The network becomes the system.” - GRID mantra, c1999

Computing SNAP built entirely from PCs (figure)
– A wide-area global network, mobile nets, and wide & local area networks for terminals, PCs, workstations, & servers
– Person servers (PCs), portables, and TC = TV + PC at home … (CATV or ATM or satellite)
– Legacy mainframes & minicomputers, their servers & terminals
– Scalable computers built from PCs; centralized & departmental uni- & mP servers (UNIX & NT) built from PCs
– A space, time (bandwidth), & generation scalable environment

The virtuous cycle of bandwidth supply and demand
– Standards create new services (telnet & FTP, email, WWW, audio, voice!, video); new services increase demand; increased demand drives increased capacity (circuits & bandwidth) and lower response time, enabling still newer services.

Internet II concerns, given the $0.5B cost
– Very high cost
  – Disks cost $1/GByte to purchase! $(1 + 1)/GByte to send on the net; FedEx and 160 GByte shipments are cheaper (see the sketch below)
  – DSL at home is $0.15 - $0.30
– Low availability of fast links (the last-mile problem)
  – Labs & universities have DS3 links at most, and they are very expensive
– Traffic: instant messaging, music stealing
– Performance at the desktop is poor: 1-10 Mbps; very poor communication links
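The “FedEx is cheaper” claim can be checked with a back-of-the-envelope sketch. It uses the slide’s own figures ($1/GByte to buy disk, roughly $2/GByte to send on the net, a 160 GByte shipment, and 1-10 Mbps effective desktop bandwidth); the 24-hour overnight-delivery time is my assumption.

```c
/* Back-of-the-envelope check of "FedEx and 160 GByte shipments are cheaper".
 * Disk and network prices and the 10 Mbps desktop rate are from the slide;
 * the 24-hour overnight-delivery time is an assumption. */
#include <stdio.h>

int main(void)
{
    const double gbytes       = 160.0;  /* shipment size                    */
    const double buy_per_gb   = 1.0;    /* $/GByte to purchase disk         */
    const double send_per_gb  = 2.0;    /* "$(1 + 1)/GByte" to send on net  */
    const double desktop_mbps = 10.0;   /* optimistic end of 1-10 Mbps      */
    const double fedex_hours  = 24.0;   /* assumed overnight delivery       */

    double net_hours = gbytes * 8e9 / (desktop_mbps * 1e6) / 3600.0;
    double ship_mbps = gbytes * 8e9 / (fedex_hours * 3600.0) / 1e6;

    printf("Send %.0f GB on the net:  ~%.0f hours at %.0f Mbps, ~$%.0f\n",
           gbytes, net_hours, desktop_mbps, gbytes * send_per_gb);
    printf("Ship the disks overnight: ~%.0f hours (~%.0f Mbps effective), "
           "~$%.0f of disk plus the courier fee\n",
           fedex_hours, ship_mbps, gbytes * buy_per_gb);
    return 0;
}
```

At the optimistic 10 Mbps end, the network needs roughly a day and a half and costs about twice the price of the disks themselves, which is the slide’s point.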
Scalable computing: the effects
– They come in all sizes, with incremental growth: 10 or 100 up to 10,000 (100X for most users); debug vs. run; problem growth
– Allows compatibility heretofore impossible
  – 1978: VAX chose Cray Fortran
  – 1987: the NSF centers went to UNIX
  – Users chose a sensible environment
– The role of general-purpose centers (e.g. NSF, statex) is unclear. Necessity for support?
  – Acquisition and operational costs & environments
  – Cost to use, as measured by the user’s time
  – Scientific data for a given community … community programs and data
  – Manage GRID discipline
– Are clusters ≈ Gresham’s Law? Do they drive out the alternatives?

The end