High Performance Computing: Driving Innovation and Capability
Ian Wardrope, EMEA Sales Director, High Performance Computing and Fabrics
Intel Confidential — Do Not Forward

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO THIS INFORMATION, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product roadmaps.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804.

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel, Intel Xeon, Intel Xeon Phi, Intel Hadoop Distribution, Intel Cluster Ready, Intel OpenMP, Intel Cilk Plus, Intel Threading Building Blocks, Intel Cluster Studio, Intel Parallel Studio, Intel Coarray Fortran, Intel Math Kernel Library, Intel Enterprise Edition for Lustre Software, Intel Composer, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document.
Intel encourages all of its customers to visit the referenced Web sites, or others where similar performance benchmark data are reported, and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase. Other names, brands, and images may be claimed as the property of others. Copyright © 2013, Intel Corporation. All rights reserved.

Exascale Problem Statement
Achieve 1 ExaFLOP of performance by 2020 within a 20 MW power limit.

Intel in HPC
• Processors: Intel® Xeon® processor
• Coprocessor: Intel® Many Integrated Core architecture – Intel® Xeon Phi™
• Fabric: Intel® True Scale technology
• Storage
• Software & Services

Timeline of Many-Core at Intel
[Timeline graphic, 2004–2012:]
• 2004–2005: "Era of Tera" CTO keynote and "The Power Wall"
• 2006: Teraflops Research Processor (Polaris); tera-scale technology R&D agenda (80+ projects)
• 2007–2008: Larrabee development; many-core computing research; strategic and business-unit planning
• 2008–2009: Single-chip Cloud Computer (Rock Creek); Universal Parallel Computing Research Centers; workloads, simulators, software and insights from Intel Labs
• 2009–2010: Aubrey Isle and the Intel® MIC architecture; 1 teraflops SGEMM on Larrabee at SC'09¹; many-core applications research community
• 2012: Intel® Xeon Phi™ coprocessor enters the Top500 at #150 (pre-launch)²
1. Source: Intel, measured/demonstrated at SC'09, Nov. 2009.
2. Source: www.top500.org, June 2012. For more information go to http://www.intel.com/performance

Intel® Xeon Phi™ Coprocessor (codenamed Knights Corner)
• Significant improvement in FLOPS/watt
• 60 cores at 1.053 GHz, 240 threads
• 8 GB memory and up to 320 GB/s memory bandwidth
• 512-bit SIMD vectors
• Works synergistically with Intel® Xeon® processors
Source: Intel® Xeon Phi™ coprocessor 5110P key specifications

Intel® ASCI Red (1997): 9,298 Intel CPUs = 1 TFLOPS performance, in 76 server cabinets.
Intel® Xeon Phi™ coprocessor (2013): >1 TFLOPS performance, in 1 PCIe slot.

Many-core Execution Models
One source code, built with common compilers, libraries and runtime systems, can be deployed in four ways:
• Multicore only: main() runs entirely on the Intel® Xeon® processor – suited to serial and moderately parallel code.
• Multicore hosted with manycore offload: main() runs on the Xeon processor and offloads highly parallel code to the Intel® Xeon Phi™ coprocessor.
• Symmetric: main() runs on both the Xeon processor and the Xeon Phi coprocessor.
• Manycore only (native): main() runs entirely on the Xeon Phi coprocessor.
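As an illustration of the offload model above, here is a minimal sketch of how a host-resident main() can ship a highly parallel kernel to the coprocessor using the Intel compiler's offload pragmas. It is not taken from the deck: the kernel, the array names and the problem size are invented for illustration.

    /* Offload-model sketch (hypothetical kernel and data sizes).
     * The host (Xeon) keeps the serial work; the parallel kernel and its
     * data are shipped to the Xeon Phi coprocessor (mic:0).
     * Build with the Intel C compiler with OpenMP enabled. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000000

    /* Mark code that must also exist on the coprocessor. */
    __attribute__((target(mic)))
    static void scale(const float *x, float *y, float a, int n)
    {
        #pragma omp parallel for   /* spread work over the ~240 hardware threads */
        for (int i = 0; i < n; i++)
            y[i] = a * x[i];
    }

    int main(void)
    {
        float *x = malloc(N * sizeof *x);
        float *y = malloc(N * sizeof *y);
        for (int i = 0; i < N; i++)
            x[i] = (float)i;

        /* Copy x in, run scale() on the coprocessor, copy y back out. */
        #pragma offload target(mic:0) in(x:length(N)) out(y:length(N))
        scale(x, y, 2.0f, N);

        printf("y[42] = %f\n", y[42]);
        free(x);
        free(y);
        return 0;
    }

Built for the coprocessor instead, essentially the same code can run entirely on the Xeon Phi, which corresponds to the "native" model above.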
Intel® Xeon Phi™ Coprocessor: Application Performance Examples
[Chart: measured speed-up grows with the percentage of SIMD/vector code in the application.]
• The Intel® Xeon Phi™ coprocessor accelerates highly parallel and vectorizable applications.

Customer            Application                              Performance increase¹ vs. 2S Xeon*
Los Alamos          Molecular Dynamics                       Up to 2.52x
Acceleware          8th-order isotropic variable velocity    Up to 2.05x
Jefferson Labs      Lattice QCD                              Up to 2.27x
Financial Services  BlackScholes SP                          Up to 7x
Financial Services  Monte Carlo SP                           Up to 10.75x
Sinopec             Seismic Imaging                          Up to 2.53x²
Sandia Labs         miniFE (Finite Element Solver)           Up to 2x³
Intel Labs          Ray Tracing (incoherent rays)            Up to 1.88x⁴

* Xeon = Intel® Xeon® processor; Xeon Phi = Intel® Xeon Phi™ coprocessor
Notes:
1. 2S Xeon* vs. 1 Xeon Phi* (preproduction HW/SW; application running 100% on the coprocessor unless otherwise noted)
2. 2S Xeon* vs. 2S Xeon* + 2 Xeon Phi* (offload)
3. 8-node cluster, each node with 2S Xeon* (comparison is cluster performance with and without 1 Xeon Phi* per node) (hetero)
4. Intel measured, Oct. 2012
Source: customer-measured results as of October 22, 2012. Configuration details: please reference the slide speaker notes. For more information go to http://www.intel.com/performance

Next Intel® Xeon Phi™ Product Family (codenamed Knights Landing)
• Available in Intel's cutting-edge 14 nanometer process
• Stand-alone CPU or PCIe coprocessor – not bound by 'offloading' bottlenecks
• Integrated memory – balances compute with bandwidth
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. Note that the code name above is not the product name.

Heterogeneous Computing
• THEN: multiple source, multiple binary – a multicore CPU offloading to an accelerator (GPU, FPGA, DSP, ASIC).
• NOW: single source, dual binary – a multicore CPU with a manycore coprocessor, offload and native execution.
• NEXT: single source, single binary – one CPU family spanning multicore and manycore, native execution.

Intel Parallel Computing Centers (IPCC)
• World-leading universities, institutions, and research labs (including Konrad-Zuse-Zentrum für Informationstechnik Berlin)
• Focused on modernizing applications to increase parallelism and scalability
• Optimizations that leverage the cores, caches, threads, and vector capabilities of microprocessors and coprocessors – see the sketch below
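The modernization work the IPCC program focuses on is, at its core, about restructuring loops and data so that compilers can use all the cores, threads and 512-bit vector units described earlier. A minimal sketch of what that can look like – the triad kernel and the names are invented for illustration, not taken from the deck – using standard OpenMP 4.0 directives supported by the Intel compilers:

    #include <stddef.h>

    /* Before: a scalar-looking loop. If the compiler cannot prove that
     * a, b and c do not alias, it may neither vectorize nor parallelize it. */
    void triad_plain(double *a, const double *b, const double *c,
                     double s, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }

    /* After: restrict removes the aliasing ambiguity, "omp parallel for"
     * spreads iterations across cores/threads, and "simd" asks for each
     * thread's chunk to be mapped onto the wide vector units. */
    void triad_modernized(double *restrict a, const double *restrict b,
                          const double *restrict c, double s, size_t n)
    {
        #pragma omp parallel for simd
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
    }

The same source can run on an Intel® Xeon® processor or, recompiled, natively on an Intel® Xeon Phi™ coprocessor; only the vector width and thread count that the compiler and runtime target differ.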
Archer at EPCC

HPC as a differentiator in UK industry and academia

UK HPC Investment
• The Minister for Science announces a new £158M capital investment in e-infrastructure, including £43M for ARCHER, the next national HPC facility for academic research. (UK Government, 2012)
• David Willetts unveils £73 million of new funding to help the public and academics unlock the potential of big data. (UK Government, 2014)
• EPCC and Scottish Enterprise launch the £1.2M Supercomputing Scotland programme: Scottish businesses to benefit from a three-year investment focused on the energy, life science and finance sectors. (Supercomputing Scotland, 2013)
• HPC Wales is part-funded by some £25 million through the Welsh Government, including over £19.5m from the European Regional Development Fund. (Welsh Government, 2014)

HPC is no longer an Optional Investment
To compete you must compute: energy exploration, the computational race, financial analyses, medical imaging, climate and weather modelling, digital content creation, CAE/CAD manufacturing, scientific research, security.

Providing a Competitive Advantage
"It costs £500,000 to do each physical test of a car crash, and it's not repeatable. It costs £12 to run a virtual simulation of a car crash, and it's fully repeatable, so it can be used to optimise the design of a vehicle."
– Andy Searle, Head of Computer Aided Engineering, Jaguar Land Rover

Disruptive Changes
We are now entering an era of personalised medicine where the sequencing of a patient's genome costs $1,000.
Opportunities:
• Provides the ability to tailor individual treatments
• Allows improved insights into population health trends
• Enables decoding and curing of complex diseases
Challenges:
• Sequencing cost is diverging from (falling faster than) Moore's Law
• Compute and analytics performance needs to keep pace with sequencing
• Lower cost will lead to higher demand and therefore higher volume
Source: National Human Genome Research Institute

Product Innovation
Procter & Gamble use HPC capability to design the optimum shape of Pringles potato chips:
"Fluid flow interactions with the steam and oil as the chips are being cooked and seasoned [ensure even cooking and flavouring]."
"We make them fast enough so that in their transport, the aerodynamics are relevant.
If we make them too fast, they fly where we don't want them to…"
Source: "The Aerodynamics of Pringles" – Tom Lange, Director of Modelling & Simulation, Procter & Gamble

Convergence with High Performance Data Analytics

Today: Islands of Resources and Capabilities
[Diagram: separate islands – central IT, HPC, and each line of business – each with its own data acquisition, preprocessing, analytics and archiving; data is replicated between the islands, at a significant cost of data movement.]

Tomorrow: Integrated into Workflow
[Diagram: a single workflow shared by the lines of business and central IT – acquisition, filter/preprocessing, computation/simulation, post-processing/analytics, results.]

HPC meets Big Data
Modelling and simulation meet anthropological and social data, weather and climate, real-time monitoring and sensor input, historical trends, and bioinformatics.

Current Systems and Future Trends

Current Cluster Architecture
[Diagram: nodes, each with multicore CPUs, memory and I/O plus PCIe-attached manycore coprocessors, connected through a fabric to shared storage.]

Future Trends
[Diagram: simplified nodes connected to storage.]
• Fabric controller integrated with the CPU; fabric performance scales with the CPU
• Highly parallel, wide-vector CPUs
• Increased cores/threads
• Increased memory capacity and bandwidth
• Simplified node architecture

Future Software Development
                   Today              Tomorrow
Threading          10s of threads     100s–1000s of threads
Vectors            256-bit            512-bit
Communication      MPI                MPI, SHMEM, PGAS
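A minimal sketch of the hybrid style implied by the "tomorrow" column above: MPI between nodes combined with OpenMP threading and SIMD vectorization within each node. It is illustrative only; the reduction kernel, the array size and the names are not taken from the deck.

    /* Hybrid MPI + OpenMP sketch: one MPI rank per node, many threads and
     * wide SIMD lanes inside each rank. Build with an MPI compiler wrapper
     * (for example mpicc) with OpenMP enabled. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N (1 << 20)

    static double x[N];   /* per-rank working set (illustrative size) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double local = 0.0, global = 0.0;

        /* Node-level parallelism: hundreds of threads and SIMD lanes. */
        #pragma omp parallel for simd reduction(+:local)
        for (int i = 0; i < N; i++) {
            x[i] = (double)(rank + i);
            local += x[i];
        }

        /* Cluster-level communication: combine the partial sums. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks x %d threads = %e\n",
                   nranks, omp_get_max_threads(), global);

        MPI_Finalize();
        return 0;
    }

Run with, for example, one rank per node and OMP_NUM_THREADS set to the node's core or hardware-thread count, the same source scales from tens to hundreds of threads per rank. The SHMEM and PGAS models mentioned above address the communication side differently, through one-sided access to remote memory rather than explicit message passing.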
The Path to Exascale
Roughly 10–12 years have separated each FLOPS barrier. The plan is to break the exaFLOPS barrier by 2020 – but this time within a 20 MW power limit.
[Chart: peak performance over time on a log scale – gigaFLOPS: Cray-2, 1985; teraFLOPS: Intel ASCI Red, 1996; petaFLOPS: IBM Roadrunner, 2008; Cray Titan, 2012; NUDT/Intel Tianhe-2, 2013; exaFLOPS: ???, 2020.]

#1 on the Top500 list: Tianhe-2 ("Milky Way 2")
National University of Defense Technology / Sun Yat-sen University, Guangzhou, China
• 33.8 PetaFLOPS
• 32,000 Intel® Xeon® processors and 48,000 Intel® Xeon Phi™ coprocessors
• 3,120,000 compute cores
• 1.4 PB RAM
• 12.4 PB global parallel storage
• 24 MW of power and cooling
• ~520 MW per ExaFLOP at this level of efficiency

Performance/Power Challenges
2008  IBM Roadrunner   1.042 PF   2.35 MW
2011  Fujitsu K        10.51 PF   12.6 MW
2013  Tianhe-2         33.8 PF    17.6 MW
Target: 1 EF within 20 MW – roughly 960x Roadrunner's performance for only ~8.5x its power consumption.

Exascale Requirements
The current #1 machine is capable of 33 PFLOPS while consuming 17.6 MW of power (24 MW including HVAC). An exascale system therefore needs to deliver 25–30x the performance while consuming only a little over 10% more power; in energy-efficiency terms, that means moving from roughly 2 GFLOPS/W (33.8 PF / 17.6 MW) to about 50 GFLOPS/W (1 EF / 20 MW), a ~25x improvement. Moore's Law will get us part of the way, but a fundamental change is required:
• Lower power consumption (per core, per node, per cluster)
• Reduced physical size through improved integration
• Improved component and system reliability
• New programming languages and methods
• Increased parallelism and threading
• Better insight into debugging and performance profiling

Intel Research Areas
• Many-core computing – teraflops of computing power
• High-bandwidth memory – terabytes of memory bandwidth
• Silicon photonics – terabits of I/O throughput
Future vision; does not represent real products.

Driving Innovation and Integration
[Diagram: capabilities integrated today and coming tomorrow.] System-level benefits in cost, power, density, scalability and performance.

Intel Exascale Labs – Europe
A strong commitment to advance computing at the leading edge: Intel collaborating with the HPC community and European researchers. Four labs in Europe, with exascale computing as the central topic:
• ExaScale Computing Research Lab, Paris – performance and scalability of exascale applications; tools for performance characterization
• ExaCluster Lab, Jülich – exascale cluster scalability and reliability
• ExaScience Life Lab, Leuven – HPC for life science: genomics, biostatistics
• Intel and BSC Exascale Lab, Barcelona – scalable runtime systems and tools; new algorithms
www.exascale-labs.eu

Exascale Challenges
Exploiting massive parallelism
§ How will existing applications scale?
§ Will there be new applications or models using new algorithms?
§ Data transfer (memory, interconnect) will become relatively more expensive
§ Requirements on (hierarchical) programming models, schedulers, languages, …
Reducing power requirements
§ Must reduce the power requirement by a factor of at least 100
§ This is also a challenge for software (middleware and applications)
§ Optimize for both performance and power
Coping with run-time errors
§ The frequency of errors will increase, and identification and correction will become more difficult
§ HPC middleware has to include resiliency
§ Redesign applications to embed resiliency?

Career Opportunities at Intel
Intel jobs website: http://jobs.intel.com
Intel's Exascale Labs are recruiting – currently 2 open vacancies.
Email: karl.solchenbach@intel.com