Introduction: HPC goes mainstream Chokchai Box Leangsuksun Associate Professor, Computer Science Louisiana Tech University box@latech.edu 1 Outline • Why HPC is critical technology ? • Conclusion 13 April 2015 2 Why HPC? • High Performance Computing – Parallel , Supercomputing – Enabled by multiple high speed CPUs, networking, software etc – fastest possible solution – Technologies that help solving non-trivial tasks including scientific, engineering, medical, business entertainment and etc. • Time to insights, Time to discovery, Times to markets • BTW, HPC is not GRID!!!. 13 April 2015 3 HPC Applications and Major Industries • Finite Element Modeling – Auto/Aero • Fluid Dynamics – Auto/Aero, Consumer Packaged Goods Mfgs, Process Mfg, Disaster Preparedness (tsunami) • Imaging – Seismic & Medical • Finance – Banks, Brokerage Houses (Regression Analysis, Risk, Options Pricing, What if, …) • Molecular Modeling – Biotech and Pharmaceuticals Complex Problems, Large Datasets, Long Runs This slide is from Intel presentation “Technologies for Delivering Peak Performance on HPC and Grid Applications” 13 April 2015 4 Life Science Problem – an example of Protein Folding • Take a computing year (in serial mode) to do molecular dynamics simulation for a protein folding problem •Excerpted from IBM David Klepacki’s The future of HPC 13 April 2015 •Petaflop = a thousand trillion floating point operations per second 5 Disaster Preparedness - example • Project LEAD – Severe Weather prediction (Tornado) – OU leads. • HPC & Dynamically adaptation to weather forecast • Professor Seidel’s LSU CCT – Hurricane Route Prediction – Emergency Preparedness – Show Movie – HPC-enabled Simulation 13 April 2015 6 Did you know that Playstation 3 is a HPC/Supercomputer? • • 9 cores/CPUs in one chip. Future gaming software is no longer graphic or multimedia only • This diagram is from an article from IBM Cell processor & compiler challenge 13 April 2015 7 No Free Lunch (mainstream CPUs) • CPU speed – plateaus 3-4 Ghz • More cores in a single chip – Dual core is now – Multicore is imminent 3-4 Ghz cap • Traditional Applications won’t get a free rides • Conversion to parallel computing (HPC, MT) This diagram is from “no free lunch article in DDJ 13 April 2015 8 Cancer Gene-mining • • Unsuccessful on a uni-processor Our approach – Novel parallel gene-mining algorithms – Input from microarray – Retain accuracy – Significantly speed up (superlinear) • IBM P5 supercomputer (128 node PPC). Time to run the algorithm, keeping number of nodes fixed M es othelioma Time taken(in secs) 1200 Breas t 80 60 Renal 1000 Bladder 100 L eukemia 40 800 20 P ros tate 600 0 L ung 400 P anc reas 200 0 C olorec tal O vary 13 39 65 91 L ymphoma M elanoma Number of processors O vaM arker bas ed Selec tion 13 April 2015 G eneSetM ine bas ed Selec tion 9 Significant indicators – why HPC now? • • • “I propose to double the commitment to the No Free lunch in CPU speed up (Intelfederal or AMD) mostatcritical – In past 1-2 years, CPU speed was flatten 3+ Ghzbasic research programschips in the physical – More CPUs in one chip – Dual core, multi-core – Traditional software won’t take advantage of these sciences overnew theprocessors next 10 – Personal/Desktop Supercomputing. years. This funding will support Many real problems are highly computational the work intensive. of America's most – NSA uses supercomputing to do datacreative mining minds as they explore – DOE – fusion, plasma, energy relatedpromising (including areas weaponry). such as – Help solving many other important areas (nanotech, life science etc.) nanotechnology, Giants recently sneeze out HPC supercomputing, and – Bush’s state of union speech – 3 mainalternative S&T focusenergy of which Supercomputing is one sources.” of them – Bill Gates’ keynote speech at SC05 – MS goes after HPC • • • Gorge W. Bush, 2005 Google search engine - 100,000 nodes Playstation 3 is a personal supercomputing platform Hollywood (Entertainment) is HPC-bound (Pixar – more than 3000 CPUs to render animation) 13 April 2015 10 HPC preparedness • Build work forces that understand HPC paradigm & its applications – HPC/Grid Curriculum in IT/CS/CE/ICT – Offer HPC-enabling tracks to other disciplinary (engineering, life science, physic, computational chem, business etc..) – Training business community (e.g. HPC for enterprise ; Fluent certification, HA SLA certification) – Bring awareness to public • . 13 April 2015 11 Introduction to Parallel computing • Need more computing power – Improve the operating speed of processors & other components • constrained by the speed of light, thermodynamic laws, & the high financial costs for processor fabrication – Connect multiple processors together & coordinate their computational efforts • parallel computers • allow the sharing of a computational task among multiple processors 13 April 2015 12 How to Run Applications Faster ? • There are 3 ways to improve performance: – Work Harder – Work Smarter – Get Help • Computer Analogy – Using faster hardware – Optimized algorithms and techniques used to solve computational tasks – Multiple computers to solve a particular task 13 April 2015 13 Era of Computing – Rapid technical advances • the recent advances in VLSI technology • software technology – OS, PL, development methodologies, & tools • grand challenge applications have become the main driving force – Parallel computing • one of the best ways to overcome the speed bottleneck of a single processor • good price/performance ratio of a small cluster-based parallel computer 13 April 2015 14 HPC Level-setting Definitions • High performance computing is: – Computing that demands more than a single highmarket-volume workstation or server can deliver • HPC is based on concurrency: – Concurrency: computing in which multiple tasks are active at the same time • Parallel computing occurs when you use concurrency to: – Solve bigger problems – Solve a fixed-size problem in less time 13 April 2015 15 HPC Level-setting Hardware for Parallel Computing Parallel Computers Single Instruction Multiple Data (SIMD)§ Shared Address Space Symmetric Multiprocessor (SMP) Non-uniform Memory Architecture (NUMA) Multiple Instruction Multiple Data (MIMD) Disjoint Address Space Massively Parallel Processor (MPP) Commodit y Cluster Distributed Computing §SIMD has failed as a way to organize large-scale computers with multiple processors. It has succeeded, however, as a mechanism to increase instruction-level parallelism in modern microprocessors (in Intel® MMX™ technology). 13 April 2015 16 Scalable Parallel Computer Architectures • MPP – A large parallel processing system with a shared-nothing architecture – Consist of several hundred nodes with a high-speed interconnection network/switch – Each node consists of a main memory & one or more processors • Runs a separate copy of the OS • SMP – – – – 13 April 2015 2-64 processors today Shared-everything architecture All processors share all the global resources available Single copy of the OS runs on these systems 17 Scalable Parallel Computer Architectures • CC-NUMA – a scalable multiprocessor system having a cache-coherent nonuniform memory access architecture – every processor has a global view of all of the memory • Distributed systems – considered conventional networks of independent computers – have multiple system images as each node runs its own OS – the individual machines could be combinations of MPPs, SMPs, clusters, & individual computers • Clusters – a collection of workstations of PCs that are interconnected by a highspeed network – work as an integrated collection of resources – have a single system image spanning all its nodes 13 April 2015 18 Cluster Computer and its Architecture • A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource • A node – – – – – 13 April 2015 a single or multiprocessor system with memory, I/O facilities, & OS generally 2 or more computers (nodes) connected together in a single cabinet, or physically separated & connected via a LAN appear as a single system to users and applications provide a cost-effective way to gain features and benefits 19 Cluster Computer Architecture 13 April 2015 20 Beowulf Head Node •Login •Compile •Submit job Compute nodes 13 April 2015 •Run tasks 21 Prominent Components of Cluster Computers (I) • Multiple High Performance Computers – PCs – Workstations – SMPs (CLUMPS) – Distributed HPC Systems leading to Metacomputing 13 April 2015 22 Prominent Components of Cluster Computers (II) • State of the art Operating Systems – – – – – – – Linux (Beowulf) Microsoft NT (Illinois HPVM) SUN Solaris (Berkeley NOW) IBM AIX(IBM SP2) HP UX (Illinois - PANDA) Mach (Microkernel based OS) (CMU) Cluster Operating Systems (Solaris MC, SCO Unixware, MOSIX (academic project) – OS gluing layers (Berkeley Glunix) 13 April 2015 23 Prominent Components of Cluster Computers (III) • High Performance Networks/Switches – Ethernet (10Mbps), Fast Ethernet (100Mbps), – InfiniteBand (1-8 Gbps) – Gigabit Ethernet (1Gbps) – SCI (Dolphin - MPI- 12micro-sec latency) – ATM – Myrinet (1.2Gbps) – Digital Memory Channel – FDDI 13 April 2015 24 Prominent Components of Cluster Computers (IV) • Network Interface Card – Myrinet has NIC – InfiniteBand (HBA) – User-level access support 13 April 2015 25 Prominent Components of Cluster Computers (VI) • Cluster Middleware – Single System Image (SSI) – System Availability (SA) Infrastructure • Hardware – DEC Memory Channel, DSM (Alewife, DASH), SMP Techniques • Operating System Kernel/Gluing Layers – Solaris MC, Unixware, GLUnix • Applications and Subsystems – – – Applications (system management and electronic forms) Runtime systems (software DSM, PFS etc.) Resource management and scheduling software (RMS) • 13 April 2015 CODINE, LSF, PBS, NQS, etc. 26 Prominent Components of Cluster Computers (VII) • Parallel Programming Environments and Tools – Threads (PCs, SMPs, NOW..) • POSIX Threads • Java Threads – MPI • Linux, NT, on many Supercomputers – PVM – Software DSMs (Shmem) – Compilers • C/C++/Java • Parallel programming with C++ (MIT Press book) – RAD (rapid application development tools) • GUI based tools for PP modeling – Debuggers – Performance Analysis Tools – Visualization Tools 13 April 2015 27 Prominent Components of Cluster Computers (VIII) • Applications – Sequential – Parallel / Distributed (Cluster-aware app.) • Grand Challenging applications – – – – – Weather Forecasting Quantum Chemistry Molecular Biology Modeling Engineering Analysis (CAD/CAM) ………………. • PDBs, web servers,data-mining 13 April 2015 28 Key Operational Benefits of Clustering • • • • High Performance Expandability and Scalability High Throughput High Availability 13 April 2015 29 Divide and Conquer • Says 1 CPU – 1,000,000 elements – Numerical processing for 1 element = .1 secs – One computer will take 100,000 secs = 27.7 hrs • Says 100 CPUs – .27 hr ~ 16 mins 13 April 2015 30 Parallel Computing • A big application is divided into Multiple tasks • Total computation time – Computing time – Communication time 13 April 2015 31 Summary • HPC helps accelerates Time to insights, time to discovery and time to Market for challenging problems • Divide and Conquer – Computing vs communication time • Cluster computing is a predominant HPC system 13 April 2015 32