Prof. Thomas Sterling
Department of Computer Science
Louisiana State University
January 18, 2011
HIGH PERFORMANCE COMPUTING: MODELS, METHODS, & MEANS
AN INTRODUCTION
CSC 7600 Lecture 1 : Introduction Spring 2011

Aerial & Satellite of Hurricane Katrina
CSC 7600 Lecture 1 : Introduction Spring 2011 2

Devastation from Hurricane Katrina
CSC 7600 Lecture 1 : Introduction Spring 2011 3

Simulating Katrina
CSC 7600 Lecture 1 : Introduction Spring 2011

Evolution of HPC
[Figure: timeline of machines against a performance scale of One OPS, KiloOPS (10^3), MegaOPS (10^6), GigaOPS (10^9), TeraOPS (10^12), and PetaOPS (10^15). Machines shown: 1823 Babbage Difference Engine; 1943 Harvard Mark 1; 1949 Edsac 1; 1951 Univac 1; 1959 IBM 7094; 1964 CDC 6600; 1976 Cray 1; 1982 Cray XMP; 1988 Cray YMP; 1991 Intel Delta; 1996 T3E; 1997, 2001, 2006: ASCI Red, Earth Simulator, BlueGene/L; 2003 Cray X1; 2009 Cray XT5.]
CSC 7600 Lecture 1 : Introduction Spring 2011 5

New Fastest Computer in the World
CSC 7600 Lecture 1 : Introduction Spring 2011 6

2nd Fastest Computer in the World
Jaguar (Cray XT5-HE)
• Owned by Oak Ridge National Laboratory
• Breaks Petaflops processing barrier (1.759e+15 flops)
• Contains 224,162 cores on six-core AMD x86_64 Opteron chips at 2600 MHz
CSC 7600 Lecture 1 : Introduction Spring 2011 7

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
CSC 7600 Lecture 1 : Introduction Spring 2011 8

Synergy Drives Supercomputing Evolution
• Technology
  – Enables digital technology
  – Defines balance of capabilities
  – Establishes relationship of relative costs
• Architecture
  – Creates interface between computation and technology
  – Determines structures of technology-based components
  – Establishes low-level semantics of operation
  – Provides low-cost mechanisms
• Model of Computation
  – Paradigm by
which computation is manifest
  – Provides governing principles of architecture operation
  – Implies programming model and languages
CSC 7600 Lecture 1 : Introduction Spring 2011 9

Where Does Performance Come From?
• Device Technology
  – Logic switching speed and device density
  – Memory capacity and access time
  – Communications bandwidth and latency
• Computer Architecture
  – Instruction issue rate
    • Execution pipelining
    • Reservation stations
    • Branch prediction
    • Cache management
  – Parallelism
    • Parallelism – number of operations per cycle per processor
      – Instruction level parallelism (ILP)
      – Vector processing
    • Parallelism – number of processors per node
    • Parallelism – number of nodes in a system
CSC 7600 Lecture 1 : Introduction Spring 2011 10

Major Technology Generations (dates approximate)
• Electromechanical – 19th century through 1st half of 20th century
• Digital electronic with vacuum tubes – 1940s
• Core memory – 1950
• Transistors – 1947
• SSI & MSI RTL/DTL/TTL semiconductor – 1970
• DRAM – 1970s
• CMOS VLSI – 1990
• Multicore – 2006
CSC 7600 Lecture 1 : Introduction Spring 2011 11

The SIA ITRS Roadmap
[Figure: log-scale plot (1 to 100,000) of MB per DRAM chip, logic transistors per chip (M), and uP clock (MHz) against year of technology availability, 1997 to 2012.]
CSC 7600 Lecture 1 : Introduction Spring 2011 12

Classical DRAM
• Memory mats: ~ 1 Mbit each
• Row Decoders
• Primary Sense Amps
• Secondary sense amps & “page” multiplexing
• Timing, BIST, Interface
• Kerf
[Figure: Gbits per chip by year, 1970 to 2020; series: Historical, ITRS @ Production, ITRS @ Introduction.] Density/Chip has dropped below 4X/3yrs
[Figure: % chip overhead by year, 1970 to 2020; series: Historical, SIA Production, SIA Introduction.] And 45% of Die is Non-Memory
CSC 7600 Lecture 1 : Introduction Spring 2011 13

Peak Logic Clock Rates
[Figure: clock (MHz) by year, 1975 to 2020, and clock (MHz) by feature size; series: Historical, ITRS Max Clock Rate (12 inverters); the 3 GHz level is marked.]
2005 projection was for 5.2 GHz – and we didn’t make it in production. Further, we’re still stuck at 3+ GHz in production.
CSC 7600 Lecture 1 : Introduction Spring 2011 14

Classes of Architecture for High Performance Computers
• Parallel Vector Processors (PVP)
  – NEC Earth Simulator, SX-6
  – Cray-1, 2, XMP, YMP, C90, T90, X1
  – Fujitsu 5000 series
• Massively Parallel Processors (MPP)
  – Intel Touchstone Delta & Paragon
  – TMC CM-5
  – IBM SP-2 & 3, Blue Gene/Light
  – Cray T3D, T3E, Red Storm/Strider
• Distributed Shared Memory (DSM)
  – SGI Origin
  – HP Superdome
• Single Instruction stream Multiple Data stream (SIMD)
  – Goodyear MPP, MasPar 1 & 2, TMC CM-2
• Commodity Clusters
  – Beowulf-class PC/Linux clusters
  – Constellations
  – HP Compaq SC, Linux NetworX MCR
CSC 7600 Lecture 1 : Introduction Spring 2011 15

Top 500 : System Architecture
CSC 7600 Lecture 1 : Introduction Spring 2011 16

Driving Issues/Trends
• Multicore
  – Now: 8, AMD Opterons, Intel Xeon
  – possibly 100’s
  – will be million-way parallelism
• Heterogeneity
  – GPGPU
  – Clearspeed
  – Cell SPE
• Component I/O Pins
  – Off chip bandwidth not increasing with demand
    • Limited number of pins
    • Limited bandwidth per pin (pair)
  – Cache size per core may decline
  – Shared cache fragmentation
• System Interconnect
  – Node bandwidth not increasing proportionally to core demand
• Power
  – Megawatts at the high end = millions of $s per year
CSC 7600 Lecture 1 : Introduction Spring 2011 17

Multi-Core
• Motivation for Multi-Core
  – Exploits improved feature-size and density
  – Increases functional units per chip (spatial efficiency)
  – Limits energy consumption per operation
  – Constrains growth in processor complexity
• Challenges resulting from multi-core
  – Relies on effective exploitation of multiple-thread parallelism
    • Need for parallel
computing model and parallel programming model
  – Aggravates memory wall
    • Memory bandwidth
      – Way to get data out of memory banks
      – Way to get data into multi-core processor array
    • Memory latency
    • Fragments L3 cache
  – Pins become strangle point
    • Rate of pin growth projected to slow and flatten
    • Rate of bandwidth per pin (pair) projected to grow slowly
  – Requires mechanisms for efficient inter-processor coordination
    • Synchronization
    • Mutual exclusion
    • Context switching
CSC 7600 Lecture 1 : Introduction Spring 2011 18

Heterogeneous Multicore Architecture
• Combines different types of processors
  – Each optimized for a different operational modality
    • Performance > nX better than other n processor types
  – Synthesis favors superior performance
    • For complex computation exhibiting distinct modalities
• Conventional co-processors
  – Graphical processing units (GPU)
  – Network controllers (NIC)
  – Efforts underway to apply existing special purpose components to general applications
• Purpose-designed accelerators
  – Integrated to significantly speed up some critical aspect of one or more important classes of computation
  – IBM Cell architecture
  – ClearSpeed SIMD attached array processor
CSC 7600 Lecture 1 : Introduction Spring 2011 19

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
CSC 7600 Lecture 1 : Introduction Spring 2011 20

Definitions: “supercomputer”
Supercomputer: A computing system exhibiting high-end performance capabilities and resource capacities within practical constraints of technology, cost, power, and reliability.
Thomas Sterling, 2007
Supercomputer: a large very fast mainframe used especially for scientific computations.
Merriam-Webster Online
Supercomputer: any of a class of extremely powerful computers. The term is commonly applied to the fastest high-performance systems available at any given time. Such computers are used primarily for scientific and engineering work requiring exceedingly high-speed computations.
Encyclopedia Britannica Online
CSC 7600 Lecture 1 : Introduction Spring 2011 21

Moore’s Law
Moore's Law describes a long-term trend in the history of computing hardware, in which the number of transistors that can be placed inexpensively on an integrated circuit has doubled approximately every two years.
CSC 7600 Lecture 1 : Introduction Spring 2011 22

Top 500 List
CSC 7600 Lecture 1 : Introduction Spring 2011 23

Performance
• Performance:
  – A quantifiable measure of rate of doing (computational) work
  – Multiple such measures of performance
    • Delineated at the level of the basic operation
      – ops: operations per second
      – ips: instructions per second
      – flops: floating-point operations per second
    • Rate at which a benchmark program executes
      – A carefully crafted and controlled code used to compare systems
      – Linpack Rmax (Linpack flops)
      – gups (billion updates per second)
      – others
• Two perspectives on performance
  – Peak performance
    • Maximum theoretical performance possible for a system
  – Sustained performance
    • Observed performance for a particular workload and run
    • Varies across workloads and possibly between runs
CSC 7600 Lecture 1 : Introduction Spring 2011 24

Scalability
• The ability to deliver proportionally greater sustained performance through increased system resources
• Strong Scaling
  – Fixed size application problem
  – Application size remains constant with increase in system size
• Weak Scaling
  – Variable size application problem
  – Application size scales proportionally with system size
• Capability computing
  – In its purest form: strong scaling
  – Marketing claims tend toward this class
• Capacity computing
  – Throughput computing
    • Includes job-stream workloads
  – In its simplest form: weak scaling
• Cooperative computing
  – Interacting and coordinating concurrent processes
  – Not a widely used term
  – Also: coordinated computing
CSC 7600 Lecture 1 : Introduction Spring 2011 25

Machine Parameters affecting Performance
• Peak floating point performance
• Main memory capacity
• Bi-section bandwidth
• I/O bandwidth
• Secondary storage capacity
• Organization
  – Class of system
  – # nodes
  – # processors per node
  – Accelerators
  – Network topology
• Control strategy
  – MIMD
  – Vector, PVP
  – SIMD
  – SPMD
CSC 7600 Lecture 1 : Introduction Spring 2011 26

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
CSC 7600 Lecture 1 : Introduction Spring 2011 27

A Brief History of Supercomputing
• Mechanical Computing
  – Babbage, Hollerith, Aiken
• Electronic Digital Calculating
  – Atanasoff, Eckert, Mauchly
• von Neumann Architecture
  – Turing, von Neumann, Eckert, Mauchly, Forrester, Wilkes
• Semiconductor Technologies
• Birth of the Supercomputer
  – Cray, Watanabe
• The Golden Age
  – Batcher, Dennis, S. Chen, Hillis, Dally, Blank, B. Smith
• Common Era of Killer Micros
  – Scott, Culler, Sterling/Becker, Goodhue, A. Chen, Tomkins
• Petaflops
  – Messina, Sterling, Stevens, P.
Smith,
CSC 7600 Lecture 1 : Introduction Spring 2011 28

Practical Constraints and Limitations
• Cost
  – Deployment
  – Operational support
• Power
  – Energy required to run the computer
  – Energy for support facilities
  – Energy for cooling (remove heat from machine)
• Size
  – Floor space
  – Access way for power and signal cabling
• Reliability
  – One factor of availability
• Generality
  – How good is it across a range of problems
• Usability
  – How hard is it to program and manage
CSC 7600 Lecture 1 : Introduction Spring 2011 29

Historical Machines
• Leibniz Stepped Reckoner
• Babbage Difference Engine
• Hollerith Tabulator
• Harvard Mark 1
• Un. of Pennsylvania Eniac
• Cambridge Edsac
• MIT Whirlwind
• Cray 1
• TMC CM-2
• Intel Touchstone Delta
• Beowulf
• IBM Blue Gene/L
CSC 7600 Lecture 1 : Introduction Spring 2011 30

Golden Age of Parallel Architecture
• 1975 – 1992
• Vector
  – Cray-1&2, NEC SX, Fujitsu VPP
• SIMD
  – Maspar, CM-2
• Systolic
  – Warp
• Dataflow
  – Manchester, Sigma, Monsoon
• Multithreaded
  – HEP, MTA
• Actor-based
  – J-Machine
[Image: 1976 Cray 1]
CSC 7600 Lecture 1 : Introduction Spring 2011 31

Dark Ages of Parallel Computing
Technology drivers
• 1992 to present
• Killer Micro and mass market PCs
• High density DRAM
• High cost of fab lines
• CSP
  – Message passing
• Economy of scale S-curve
• MPP
• Weak scaling
  – Gustafson et al
• Beowulf, NOW Clusters
• MPI
• Ethernet, Myrinet
• Linux
CSC 7600 Lecture 1 : Introduction Spring 2011 32

Supercomputer Points of Transition
• Automated calculating – 17th century
• Stored program digital electronic – 1948
• Vector – 1975
• SIMD – 1980s
• MPPs – 1991
• Commodity Clusters – 1993/4
• Multicore – 2006
CSC 7600 Lecture 1 : Introduction Spring 2011 33

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
CSC 7600 Lecture 1 : Introduction Spring 2011 34

Driving Factors for HPC
• Technology trends
  – Multicore components
  – Heterogeneous structures and accelerators
• The 4 Horsemen of the Apocalypse (SLOW)
  – Starvation (sufficient parallelism and load balancing)
  – Latency (idle time due to round trip delays)
  – Overhead (critical path support mechanisms)
  – Waiting for contention (inadequate bandwidth)
• Reliability
  – Single point failure modes cannot be tolerated
  – Reduced feature size and increased component count
• Power consumption
  – Just too much!
  – Dominating practical growth in mission critical domains
• Changing application workload characteristics
  – Data (meta-data) intensive for sparse numerics and symbolics
• Programmability & ease of use
  – System complexity, scale and dynamics defy optimization by hand
CSC 7600 Lecture 1 : Introduction Spring 2011 35

Sources of Performance Degradation (SLOW)
• Starvation
  – Not enough work to do due to insufficient parallelism or poor load balancing among distributed resources
• Latency
  – Waiting for access to memory or other parts of the system
• Overhead
  – Extra work that has to be done to manage program concurrency and parallel resources, in addition to the real work you want to perform
• Waiting for Contention
  – Delays due to fighting over what task gets to use a shared resource next. Network bandwidth is a major constraint.
CSC 7600 Lecture 1 : Introduction Spring 2011 36

The Memory Wall
[Figure: memory access time and CPU time (ns), together with the memory-to-CPU ratio, plotted by year; the widening gap is labeled THE WALL.]
CSC 7600 Lecture 1 : Introduction Spring 2011 37

Microprocessors no longer realize the full potential of VLSI technology
[Figure: Perf (ps/Inst) and Linear (ps/Inst) on a log scale from 1e-4 to 1e+7, 1980 to 2020; the gaps between the curves are annotated 30:1, 1,000:1, and 30,000:1.]
CSC 7600 Lecture 1 : Introduction Spring 2011 38

Amdahl’s Law
[Figure: start-to-end timelines for the original run (T_O, containing the portion T_F) and for the accelerated run (T_A, with T_F reduced to T_F/g).]
$T_O \equiv$ time for non-accelerated computation
$T_A \equiv$ time for accelerated computation
$T_F \equiv$ time of portion of computation that can be accelerated
$g \equiv$ peak performance gain for accelerated portion of computation
$f \equiv$ fraction of non-accelerated computation to be accelerated
$S \equiv$ speed up of computation with acceleration applied

$$S = \frac{T_O}{T_A}, \qquad f = \frac{T_F}{T_O}$$

$$T_A = (1 - f) \times T_O + \left(\frac{f}{g}\right) \times T_O$$

$$S = \frac{T_O}{(1 - f) \times T_O + \left(\frac{f}{g}\right) \times T_O}$$

$$S = \frac{1}{1 - f + \left(\frac{f}{g}\right)}$$
CSC 7600 Lecture 1 : Introduction Spring 2011 39

Amdahl’s Law with Overhead
[Figure: the original timeline T_O contains n accelerable segments of length t_F; in the accelerated timeline T_A each of those segments takes v + t_F/g.]
$v \equiv$ overhead of accelerated work segment
$V \equiv$ total overhead for accelerated work

$$T_F = \sum_{i=1}^{n} t_{F_i}, \qquad V = \sum_{i=1}^{n} v_i$$

$$T_A = (1 - f) \times T_O + \frac{f}{g} \times T_O + n \times v$$

$$S = \frac{T_O}{T_A} = \frac{T_O}{(1 - f) \times T_O + \frac{f}{g} \times T_O + n \times v}$$

$$S = \frac{1}{(1 - f) + \frac{f}{g} + \frac{n \times v}{T_O}}$$
CSC 7600 Lecture 1 : Introduction Spring 2011 40

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
CSC 7600 Lecture 1 : Introduction Spring 2011 41

Supercomputing System Stack
• Device technologies
  – Enabling technologies for logic, memory, & communication
  – Circuit design
• Computer architecture
  – semantics and structures
• Models of computation
  – governing principles
• Operating systems
  – Manages resources and provides virtual machine
• Compilers and runtime software
  – Maps application program to system resources,
mechanisms, and semantics
• Programming
  – languages, tools, & environments
• Algorithms
  – Numerical techniques
  – Means of exposing parallelism
• Applications
  – End user problems, often in sciences and technology
CSC 7600 Lecture 1 : Introduction Spring 2011 42

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview – Goals & Content
• Course Administration
• Summary Materials for Test
CSC 7600 Lecture 1 : Introduction Spring 2011 43

Addressing the Big Questions
• How to integrate technology into computing engines?
• How to push the performance to extremes?
  – What are the enabling conditions?
  – What are the inhibiting factors?
• How to manage supercomputer resources to deliver useful computing capabilities?
  – What are the hardware mechanisms?
  – What are the software policies?
• How do users program such systems?
  – What languages and in what environments?
  – What are the semantics and strategies?
• What grand challenge applications demand these capabilities?
• What are the computational models and algorithms that can map the innate application properties to the physical medium of the machine?
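
One way to make the "inhibiting factors" question concrete is to evaluate the speedup expressions from the Amdahl's Law slides (39 and 40) numerically. The following is a minimal Python sketch and is not part of the original lecture: the function name `amdahl_speedup` and the sample values of f, g, n, v, and T_O are illustrative assumptions.

```python
def amdahl_speedup(f, g, n=0, v=0.0, t_o=1.0):
    """Speedup S = 1 / ((1 - f) + f/g + n*v/T_O).

    f   : fraction of the original run time that can be accelerated (0..1)
    g   : peak performance gain on the accelerated portion
    n   : number of accelerated work segments (assumed 0 if overhead is ignored)
    v   : overhead per accelerated segment, in the same time unit as t_o
    t_o : run time of the non-accelerated computation (T_O)
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if g <= 0 or t_o <= 0:
        raise ValueError("g and t_o must be positive")
    return 1.0 / ((1.0 - f) + f / g + n * v / t_o)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time can be accelerated.
    f = 0.9
    for g in (2, 10, 100, 1000):
        ideal = amdahl_speedup(f, g)
        # Same workload split into 100 segments, each paying 0.1% of T_O.
        with_overhead = amdahl_speedup(f, g, n=100, v=0.001, t_o=1.0)
        print(f"g={g:5d}  S={ideal:7.3f}  S_with_overhead={with_overhead:7.3f}")
```

With n = 0 or v = 0 the function reduces to the basic form of Amdahl's Law, and as g grows the speedup approaches 1/(1 - f). Under the assumed numbers, the per-segment overhead term n·v/T_O caps the speedup even further, which is the point of slide 40.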
CSC 7600 Lecture 1 : Introduction Spring 2011 44

Goals of the Course
• A first overview of the entire field of HPC
• Basic concepts that govern the capability and effectiveness of supercomputers
• Techniques and methods for applying HPC systems
• Tools and environments that facilitate effective application of supercomputers
• Hands-on experience with widely used systems and software
• Performance measurement methods, benchmarks, and metrics
• Practical real-world knowledge about the HPC community
• Access by students outside the HPC mainstream
CSC 7600 Lecture 1 : Introduction Spring 2011 45

Student Objectives
• Computational Scientist
• HPC researcher
• System Administrators
• Design Engineers
CSC 7600 Lecture 1 : Introduction Spring 2011 46

Course Overview: Multiple Segments
• Introduction
  – An Overview
  – Parallel Computer Architecture
  – Commodity Clusters
  – Benchmarking
  – Throughput Computing
• Distributed Memory - MPI
  – Communicating sequential processes (CSP)
  – Enabling Technologies - Networks
  – MPI programming
  – Performance measurement (2)
• Shared Memory – OpenMP
  – Single Node Architecture
  – Enabling Technologies – Memory, Core Architectures,..
  – Parallel thread computing
  – OpenMP programming
  – Performance factors and measurement (1)
• System Software
  – Operating Systems
  – Schedulers and Middleware
  – Parallel file I/O
• Advanced Techniques
  – Visualization
  – Parallel Algorithms
  – HPC Libraries
• Conclusions
  – What’s beyond the scope of this course
  – What form will the future of HPC take
CSC 7600 Lecture 1 : Introduction Spring 2011 47

Introduction & Throughput Computing
January
  Tu 18  Introduction
  Th 20  Parallel Computer Architecture, Quiz1
  Tu 25  Commodity Cluster
  Th 27  Benchmarking, Quiz2
February
  Tu 1   Throughput Computing
*Project walkthroughs will be held during office hours.
CSC 7600 Lecture 1 : Introduction Spring 2011 48

Distributed Memory & MPI
  Th 3   CSP / Parallelism, Quiz3
  Tu 8   MPI 1
  Th 10  MPI 2 / Performance Measurement (TAU), Quiz4
  Tu 15  Shared Memory / Parallelization, Sample Project Overview
*Project walkthroughs will be held during office hours.
CSC 7600 Lecture 1 : Introduction Spring 2011 49

Shared Memory & OpenMP
  Th 17  Enabling Technologies (memory, architecture, multicore, cache coherence), Quiz5
  Tu 22  Pthreads
  Th 24  OpenMP, Quiz6
March
  Tu 1   Performance Measurement (PAPI…)
  Th 3   Visualization, Quiz7, Project Abstract Due
  Tu 8   Mardi Gras Holidays
  Th 10  Parallel Algorithms 1, Quiz8
*Project walkthroughs will be held during office hours.
CSC 7600 Lecture 1 : Introduction Spring 2011 50

Advanced Techniques
  Th 17  Parallel Algorithms 2, Quiz9
  Tu 22  Parallel Algorithms 3, Project Walkthroughs*
  Th 24  Parallel Algorithms 4, Project Walkthroughs*, Quiz10
  Tu 29  Libraries 1
  Th 31  Libraries 2, Quiz11
April
  Tu 5   Parallel File I/O 1
  Th 7   Parallel File I/O 2, Quiz12
  Tu 12  Operating Systems 1
  Th 14  Operating Systems 2, Quiz13
*Project walkthroughs will be held during office hours.
CSC 7600 Lecture 1 : Introduction Spring 2011 51

System Software
  Tu 19  Spring Break
  Th 21  Spring Break
  Tu 26  Scheduling / Workload Management Systems
  Th 28  Checkpointing/System Administration, Project Due, Quiz14
May
  Tu 3   Beyond and Beyond
  Th 5   Class Summary / Final Exam Review
  Th 12  FINAL EXAM (7:30 – 9:30 AM)
*Project walkthroughs will be held during office hours.
CSC 7600 Lecture 1 : Introduction Spring 2011 52

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Demo 1 : Performance Scalability
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
CSC 7600 Lecture 1 : Introduction Spring 2011 53

Course Website
• HPC Course Website can be accessed at: http://www.cct.lsu.edu/csc7600
• Course Info:
  – Syllabus
  – Schedule
• Contact information (People section): email, IM, phone, etc.
• All course announcements will be made via email and the website.
• Lecture slides will be made available on the course website (Course Material section).
• Videos of lectures will be made available on the course website (Course Material section) after every lecture.
CSC 7600 Lecture 1 : Introduction Spring 2011 54

Contact Information
Prof. Thomas Sterling
tron@cct.lsu.edu
(225) 578-8982 (CCT Office)
Johnston Hall 320, (225) 578-3320
Office Hours: Tu (1:00 – 3:00 PM) & Th (9:00 – 10:00 AM)
Teaching Assistant: Daniel Kogler
dkogler@eatel.net
Office Hours: Johnston 318, Tuesday 1:40 – 3:00 PM, Thursday 9:00 – 10:00 AM
Course Secretary: Ms. Terrie Bordelon
tbordelon@cct.lsu.edu
302 Johnston Hall
(225) 578-5979
CSC 7600 Lecture 1 : Introduction Spring 2011 55

Grading Policy
Grading Policy for Graduate Students:
• Midterm – 20 %
• Final – 30 %
• Problem Sets – 25 %
• Quizzes – 5 %
• Project – 20 %
Grading Policy for Under-Graduate Students:
• Midterm – 30 %
• Final – 35 %
• Problem Sets – 30 %
• Quizzes – 5 %
CSC 7600 Lecture 1 : Introduction Spring 2011 56

Assignments
• There will be adequately portioned assignments during this course.
  – Assignments should be turned in as PRINTOUTS to the TA the following TUESDAY BEFORE CLASS.
  – Assignments should be prepared in WORD format / PDF format. NO handwritten assignments will be accepted.
  – Assignments involving programming problems should have source code printed and attached, and all solution-relevant materials (e.g. PBS scripts, commands used for performance measurement, etc.) must be well documented and attached.
  – Source code and all relevant files for assignments involving programming need to be submitted according to the guidelines given in each problem set and are due at the same time as the assignment (the late policy for source code submissions is the same as that for assignments).
CSC 7600 Lecture 1 : Introduction Spring 2011 57

Assignments
• LATE POLICY:
  – All assignments should be turned in on the due date BEFORE the CLASS.
  – Assignments turned in on the same day by 5 PM (Central) will incur a penalty of 30% of the assignment grade.
  – Assignments turned in BEYOND 5 PM (Central) of the due date will receive 0 points irrespective of the work quality.
• IMPORTANT:
  – Most of the assignments will need to be run on local supercomputing resources that are shared among several users.
  – Jobs that you submit WILL get stuck in a queue.
  – “Queue ate my homework” is NOT an acceptable excuse for not turning homework in.
  – You are strongly encouraged to start working on assignments as soon as they are assigned to avoid inevitable queue wait times.
CSC 7600 Lecture 1 : Introduction Spring 2011 58

Graduate Student Projects
• Term projects are required for Graduate Students
• Sample Topics
  – Parallel Image Processing
  – Application performance measurement
  – Advanced visualization techniques
  – Parallel Programming
• LATE POLICY:
  – Abstracts turned in later than the assigned date will incur an overall project penalty of 5%
  – Walkthroughs done later than the assigned date will incur an overall project penalty of 15%
  – Projects turned in later than the assigned date will NOT be considered for grading and will have an automatic score of 0.
CSC 7600 Lecture 1 : Introduction Spring 2011 59

Graduate Student Project Topics
• Application Scaling: detailed analysis & performance profiling of application(s) based on parameters such as number of processors, application performance bottlenecks, etc.
• Application Development: design and develop new parallel applications with simple performance profiling analysis.
• Architecture Comparative Studies: alternative networks, processors, accelerators
CSC 7600 Lecture 1 : Introduction Spring 2011 60

Reference Material
• No Required Textbook
• Lecture notes (slides), required reading lists (URLs) provided at the end of lectures, some additional notes (on web site), and assignments will be the primary sources of material for exams.
• Students are strongly encouraged to pursue additional reading material available on the internet (and as part of projects).
CSC 7600 Lecture 1 : Introduction Spring 2011 61

DEMO: Computing Resources Overview
presented by Adam Yates
CSC 7600 Lecture 1 : Introduction Spring 2011 62

Computing Resources
Arete [arete.cct.lsu.edu]
• 64 compute nodes x 8 cores
• Quad-core AMD Opteron Processor @ 2.4 GHz
• 8 GB RAM per node
• 24 TB of shared storage
• 1 Gb Ethernet network interface
• 10 Gb InfiniBand interconnect
CSC 7600 Lecture 1 : Introduction Spring 2011 63

Plagiarism
• The LSU Code of Student Conduct defines plagiarism in Section 5.1.16:
  – "Plagiarism is defined as the unacknowledged inclusion of someone else's words, structure, ideas, or data. When a student submits work as his/her own that includes the words, structure, ideas, or data of others, the source of this information must be acknowledged through complete, accurate, and specific references, and, if verbatim statements are included, through quotation marks as well. Failure to identify any source (including interviews, surveys, etc.), published in any medium (including on the internet) or unpublished, from which words, structure, ideas, or data have been taken, constitutes plagiarism;"
• Plagiarism will not be tolerated and will be dealt with in accordance with and as outlined by the LSU Code of Student Conduct: http://appl003.lsu.edu/slas/dos.nsf/$Content/Code+of+Conduct?OpenDocument
CSC 7600 Lecture 1 : Introduction Spring 2011 64

Topics
• HPC Applications
• Supercomputing : An Enabler
• Architecture, Technologies, Programming Models
• Performance oriented theme
• Demo 1 : Performance Scalability
• Brief History of HPC
• Sources of Performance Degradation
• Supercomputer System Stack
• Course Overview - Goals & Content
• Course Administration
• Summary Materials for Test
CSC 7600 Lecture 1 : Introduction Spring 2011 65

Summary Materials for Test
• Defining Supercomputer – slide 21
• Performance Issues in HPC – slide 24
• Scalability – slide 25
• Machine parameters affecting performance – slide 26
• Driving factors for HPC – slide 35
• Sources of performance degradation – slide 36
• Supercomputing system stack – slide 42
CSC 7600 Lecture 1 : Introduction Spring 2011 66

CSC 7600 Lecture 1 : Introduction Spring 2011

ENIAC (Electronic Numerical Integrator and Computer)
• Eckert and Mauchly, 1946.
• Vacuum tubes.
• Numerical solutions to problems in fields such as atomic energy and ballistic trajectories.
CSC 7600 Lecture 1 : Introduction Spring 2011 68

EDSAC (Electronic Delay Storage Automatic Calculator)
• Maurice Wilkes, 1949.
• Mercury delay lines for memory and vacuum tubes for logic.
• Used one of the first assemblers called Initial Orders.
• Calculation of prime numbers, solutions of algebraic equations, etc.
CSC 7600 Lecture 1 : Introduction Spring 2011 69

MIT Whirlwind
• Jay Forrester, 1949.
• Fastest computer.
• First computer to use magnetic core memory.
• Displayed real-time text and graphics on a large oscilloscope screen.
CSC 7600 Lecture 1 : Introduction Spring 2011 70

CRAY-1
• Cray Research, 1976.
• Pipelined vector arithmetic units.
• Unique C-shape to help increase the signal speeds from one end to the other.
CSC 7600 Lecture 1 : Introduction Spring 2011 71

CM-2
• Thinking Machines Corporation, 1987.
• Hypercube architecture with 65,536 processors.
• SIMD.
• Performance in the range of GFLOPS.
CSC 7600 Lecture 1 : Introduction Spring 2011 72

INTEL Touchstone Delta
• INTEL, 1990.
• MIMD, 2D mesh interconnect.
• LINPACK rating of 13.9 GFLOPS.
• Enough computing power for applications like real-time processing of satellite images and molecular models for AIDS research.
CSC 7600 Lecture 1 : Introduction Spring 2011 73

Beowulf
• Thomas Sterling and Donald Becker, 1994.
• Cluster formed of one head node and one or more compute nodes.
• Nodes and network dedicated to the Beowulf.
• Compute nodes are mass-produced commodities.
• Uses open source software including Linux.
CSC 7600 Lecture 1 : Introduction Spring 2011 74

Earth Simulator
• Japan, 1997.
• Fastest supercomputer from 2002-2004: 35.86 TFLOPS.
• 640 nodes with eight vector processors and 16 gigabytes of computer memory at each node.
CSC 7600 Lecture 1 : Introduction Spring 2011 75

BlueGene/L
• IBM, 2004.
• First supercomputer ever to run over 100 TFLOPS sustained on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD).
CSC 7600 Lecture 1 : Introduction Spring 2011 76

CSC 7600 Lecture 1 : Introduction Spring 2011 77