High Performance Computing Driving Innovation and Capability
Ian Wardrope
EMEA Sales Director
High Performance Computing and Fabrics
Intel Confidential — Do Not Forward
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO THIS INFORMATION, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel's current plan of record product roadmaps.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804.

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. Go to: http://www.intel.com/products/processor_number

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel, Intel Xeon, Intel Xeon Phi, Intel Hadoop Distribution, Intel Cluster Ready, Intel OpenMP, Intel Cilk Plus, Intel Threading Building Blocks, Intel Cluster Studio, Intel Parallel Studio, Intel Coarray Fortran, Intel Math Kernel Library, Intel Enterprise Edition for Lustre Software, Intel Composer, the Intel Xeon Phi logo, the Intel Xeon logo and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.

Other names, brands, and images may be claimed as the property of others.

Copyright © 2013, Intel Corporation. All rights reserved.
Exascale Problem Statement
Achieve 1 ExaFLOP of performance by 2020 within
a 20MW power limit
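At that power budget the system must deliver roughly 10^18 FLOPS / 2x10^7 W = 50 GFLOPS per watt; for comparison, the 2013 #1 system (Tianhe-2, discussed later in this deck) delivers about 2 GFLOPS per watt (33.8 PFLOPS at 17.6 MW).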
Intel in HPC
• Processors: Intel® Xeon® processors
• Coprocessors: Intel® Many Integrated Core architecture (Intel® Xeon Phi™)
• Fabric: Intel® True Scale technology
• Storage
• Software & Services
Timeline of Many-Core at Intel
2004–2005: "Era of Tera" CTO keynote & "The Power Wall"; many-core technology strategic & business-unit planning
2006–2007: Tera-scale computing research agenda (80+ projects); Teraflops Research Processor (Polaris); Larrabee development
2008–2009: Single-chip Cloud Computer (Rock Creek); workloads, simulators, software & insights from Intel Labs; Universal Parallel Computing Research Centers; 1 teraflops SGEMM on Larrabee @ SC'09 (1)
2010–2011: Aubrey Isle & Intel® MIC Architecture; many-core applications research community
2012: Intel® Xeon Phi™ Coprocessor enters the Top500 at #150 (pre-launch) (2)
1. Source: Intel, measured/demonstrated at SC '09, Nov. 2009. 2. Source: www.top500.org, June 2012.
Intel® Xeon Phi™ Coprocessor
(Codenamed Knights Corner)
• Significant improvement in FLOPS/Watt
• 60 cores at 1.053 GHz, 240 threads
• 8 GB memory and up to 320 GB/s memory bandwidth
• 512-bit SIMD vectors
• Works synergistically with Intel® Xeon® processors
Source: Intel® Xeon Phi™ Coprocessor 5110P key specifications
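As a rough illustration of what 512-bit vectors mean for application code, the sketch below (a generic SAXPY loop, illustrative only and not taken from this deck) shows the kind of inner loop the compiler can map onto 16 single-precision lanes per vector instruction; "#pragma omp simd" is one common way to request vectorization (the Knights Corner era Intel compiler also accepted its own "#pragma simd").

    #include <stddef.h>

    /* Illustrative single-precision AXPY: y = alpha*x + y.
       With 512-bit SIMD, one vector instruction covers 16 float lanes. */
    void saxpy(size_t n, float alpha, const float *x, float *restrict y)
    {
        #pragma omp simd            /* ask the compiler to vectorize this loop */
        for (size_t i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }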
Intel® ASCI Red (1997)
9,298 Intel CPUs = 1 TFLOPS performance
76 Server Cabinets
Intel® Xeon Phi™ Co-Processor (2013)
>1 TFLOPS performance
1 PCIe Slot
Many-core Execution Models
A single source code, spanning serial and moderately parallel code through highly parallel code, is built by compilers, libraries and runtime systems into one of four execution models:
• Multicore Only: main() runs on the Xeon® processor only
• Multicore Hosted with Manycore Offload: main() runs on the Xeon® processor and offloads highly parallel code to the Xeon Phi™ coprocessor
• Symmetric: main() runs on both the Xeon® processor and the Xeon Phi™ coprocessor
• Manycore Only (Native): main() runs entirely on the Xeon Phi™ coprocessor
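To make the offload model concrete, here is a minimal sketch (illustrative only, not from the deck) using the Intel compiler's language extensions for offload: the marked region is compiled for both the host and the coprocessor, and the pragma directs it to run on the Xeon Phi card.

    #include <stdio.h>

    #define N 1000000

    /* Data marked for the coprocessor as well as the host. */
    __attribute__((target(mic))) static float a[N], b[N], c[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* Offload the highly parallel region to the coprocessor,
           copying a and b in and c back out. */
        #pragma offload target(mic) in(a, b) out(c)
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[42] = %f\n", c[42]);
        return 0;
    }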
Intel® Xeon Phi™ Co-processor:
Application Performance Examples
(Chart above: speed-up versus the percentage of SIMD/vector code.)
•  Intel® Xeon Phi™ coprocessor accelerates highly parallel & vectorizable applications (graph above).

Customer             Application                              Performance Increase (1) vs. 2S Xeon*
Los Alamos           Molecular Dynamics                       Up to 2.52x
Acceleware           8th order isotropic variable velocity    Up to 2.05x
Jefferson Labs       Lattice QCD                              Up to 2.27x
Financial Services   BlackScholes SP                          Up to 7x
                     Monte Carlo SP                           Up to 10.75x
Sinopec              Seismic Imaging                          Up to 2.53x (2)
Sandia Labs          miniFE (Finite Element Solver)           Up to 2x (3)
Intel Labs           Ray Tracing (incoherent rays)            Up to 1.88x (4)

* Xeon = Intel® Xeon® processor; Xeon Phi = Intel® Xeon Phi™ coprocessor
Notes:
1. 2S Xeon* vs. 1 Xeon Phi* (preproduction HW/SW & application running 100% on coprocessor unless otherwise noted)
2. 2S Xeon* vs. 2S Xeon* + 2 Xeon Phi* (offload)
3. 8 node cluster, each node with 2S Xeon* (comparison is cluster performance with and without 1 Xeon Phi* per node) (Hetero)
4. Intel Measured Oct. 2012
Source: Customer Measured results as of October 22, 2012. Configuration details: please reference slide speaker notes. For more information go to http://www.intel.com/performance
Next Intel® Xeon Phi™ Product Family
(Codenamed Knights Landing)
• Built on Intel's cutting-edge 14 nanometer process
• Standalone CPU or PCIe coprocessor – not bound by 'offloading' bottlenecks
• Integrated memory – balances compute with bandwidth
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Note: the code name above is not the product name.
Heterogeneous Computing
• THEN: Multiple Source, Multiple Binary; CPU (Multicore) + Accelerator (GPU, FPGA, DSP, ASIC); OFFLOAD
• NOW: Single Source, Dual Binary; CPU (Multicore) + Co-processor (Manycore); OFFLOAD & NATIVE
• NEXT: Single Source, Single Binary; CPU (Multi & Manycore); NATIVE
Intel Parallel Computing Centers (IPCC)
•  World leading universities, institutions, and research labs
•  Focused on modernizing applications to increase parallelism and scalability
•  Optimizations that leverage cores, caches, threads, and vector capabilities of
microprocessors and coprocessors
KONRAD-ZUSE-ZENTRUM FÜR INFORMATIONSTECHNIK BERLIN
Archer at EPCC
HPC as a differentiator in UK industry and academia
UK HPC Investment
• "David Willetts unveils £73 million of new funding to help the public and academics unlock the potential of big data" (UK Government, 2014)
• Minister for Science announces a new £158M capital investment in e-infrastructure, including £43M for ARCHER, the next national HPC facility for academic research (UK Government, 2012)
• EPCC and Scottish Enterprise launch the £1.2M Supercomputing Scotland programme: Scottish businesses to benefit from a 3-year investment focused on the energy, life science and finance sectors (Supercomputing Scotland, 2013)
• HPC Wales is part-funded by some £25 million through the Welsh Government, including over £19.5m from the European Regional Development Fund (Welsh Government, 2014)
HPC is no longer an Optional Investment
To Compete You Must Compute
Energy exploration, the computational race, financial analyses, medical imaging, climate & weather modeling, digital content creation, CAE/CAD manufacturing, scientific research, security.
Providing a Competitive Advantage
"It costs £500,000 to do each physical test of a car crash, and
it's not repeatable.
It costs £12 to run a virtual simulation of a car crash, and it’s
fully repeatable, so it can be used to optimise the design of a
vehicle."
Andy Searle,
Head of Computer Aided Engineering,
Jaguar Land Rover.
Disruptive Changes
We are now entering an era of personalised medicine where the sequencing of a patient's genome costs $1,000.
Opportunities
• Provides the ability to tailor individual treatments
• Allows improved insights into population health trends
• Enables the decoding and curing of complex diseases
Challenges
• Sequencing cost is falling faster than Moore's Law
• Compute and analytics performance needs to keep pace with sequencing
• Lower cost will lead to higher demand and therefore higher volume
Source: National Human Genome Research Institute
Product Innovation
Proctor & Gamble use HPC capability to design the optimum shape of Pringles potato
chips
“Fluid flow interactions with the
steam and oil as the chips are
being cooked and seasoned
[ensures even cooking and
flavouring]”
“We make them fast enough so
that in their transport, the
aerodynamics are relevant. If we
make them too fast, they fly where
we don't want them to….”
Source: The Aerodynamics of Pringles – Tom Lange, Director of Modelling & Simulation, Procter & Gamble
Convergence with High Performance Data Analytics
Today: Islands of Resources and Capabilities
(Diagram: separate islands for Central IT, HPC and each Line of Business, each with its own acquisition, preprocessing, archiving and analytics, leading to replication of data and a high cost of data movement.)
Tomorrow: Integrated into Workflow
(Diagram: Lines of Business and Central IT share one workflow: acquisition, filter/preprocessing, computation/simulation, postprocessing/analytics, results.)
HPC meets Big Data
Modelling & simulation increasingly combines with big data sources: anthropological & social data, weather & climate, real-time monitoring & sensor input, historical trends, bioinformatics.
Current Systems and Future Trends
Current Cluster Architecture
(Diagram: each node pairs multicore CPUs and memory with discrete I/O and one or more PCIe coprocessors, each coprocessor with its own many cores and memory; nodes connect to shared storage.)
Future Trends
(Diagram: many-core nodes with I/O integrated on the CPU, connected to shared storage.)
• Fabric controller integrated with the CPU; fabric performance scales with the CPU
• Highly parallel, wide-vector CPUs
• Increased cores/threads
• Increased memory capacity and bandwidth
• Simplified node architecture
Future Software Development
                 Today            Tomorrow
Threading        10s of threads   100s-1000s of threads
Vectors          256 bit          512 bit
Communication    MPI              MPI, SHMEM, PGAS
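A minimal hybrid sketch of that direction (illustrative, not from the deck): MPI ranks across nodes, with OpenMP providing the hundreds of threads per rank that many-core devices expect.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Ask for an MPI library that tolerates threaded callers. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            #pragma omp master
            printf("rank %d running %d threads\n", rank, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }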
The Path to Exascale
Roughly 10-12 years separate each FLOPS barrier. The plan is to break the ExaFLOPS barrier by 2020, but this time within a 20MW power limit.
(Chart: leading systems at each barrier: GigaFLOP, Cray-2, 1985; TeraFLOP, Intel ASCI Red, 1996; PetaFLOP, IBM Roadrunner, 2008; Cray Titan, 2012; NUDT/Intel Tianhe-2, 2013; ExaFLOP, 2020, ???)
#1 on the Top500 list: Tianhe-2 ("Milky Way 2")
National University of Defense Technology/Sun Yat-sen University, Guangzhou, China
• 33.8 PetaFLOPS
• 32,000 Intel® Xeon® processors and 48,000 Intel® Xeon Phi™ coprocessors
• 3,120,000 compute cores
• 1.4 PB RAM
• 12.4 PB global parallel storage
• 24 MW of power and cooling
• ~520 MW/ExaFLOP at this level of efficiency
Performance/Power Challenges
                       Performance    Power
2008  IBM Roadrunner   1.042 PF       2.35 MW
2011  Fujitsu K        10.51 PF       12.6 MW
2013  Tianhe-2         33.8 PF        17.6 MW
2020  Exascale target  1 EF           20 MW
From Roadrunner to the exascale target: ~960x the performance for ~8.5x the power consumption.
Exascale Requirements
Current #1 machine is capable of 33 PFLOPS while consuming 17.6 MW of power (24 MW
including HVAC)
So an ExaScale system needs to provide 25-30x the performance whilst consuming a little
over 10% more power
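(In round numbers: 1 EF / 33.8 PF is roughly 30x the performance, while 20 MW / 17.6 MW is roughly 1.14x the power.)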
Moore’s law will get us part of the way, but a fundamental change is required
• Lower power consumption (per core, per node, per cluster)
• Reduced physical size through improved integration
• Improved component and system reliability
• New programming languages and methods
• Increased parallelism and threading
• Better insight into debugging and performance profiling
Intel Research Areas
• Many-core computing: teraflops of computing power
• High bandwidth memory: terabytes of memory bandwidth
• Silicon photonics: terabits of I/O throughput
Future vision, does not represent real products.
Driving Innovation and Integration
(Diagram: capabilities integrated into the platform today and coming tomorrow.)
System-level benefits in cost, power, density, scalability & performance.
Intel Exascale Labs — Europe
Strong commitment to advancing the computing leading edge: Intel collaborating with the HPC community & European researchers. Four labs in Europe with exascale computing as the central topic:
• ExaScale Computing Research Lab, Paris: performance and scalability of exascale applications; tools for performance characterization
• ExaCluster Lab, Jülich: exascale cluster scalability and reliability
• ExaScience Life Lab, Leuven: HPC for life science, genomics, biostatistics; new algorithms
• Intel and BSC Exascale Lab, Barcelona: scalable runtime system and tools
www.exascale-labs.eu
Exascale Challenges
Exploiting massive parallelism
§  How will existing applications scale?
§  Will there be new apps or models using new algorithms?
§  Data transfer (memory, interconnect) will become relatively more expensive
§  Requirements on (hierarchical) programming models, schedulers, languages, …
Reducing power requirements
§  Must reduce the power requirement by a factor of at least 100
§  This is also a challenge for software (middleware and applications)
§  Optimize for performance and power
Coping with run-time errors
§  Frequency of errors will increase; identification and correction will become more difficult
§  HPC middleware has to include resiliency
§  Redesign applications to embed resiliency?
Career Opportunities at Intel
Intel Jobs Website: http://jobs.intel.com
Intel’s Exascale Labs are recruiting - Currently 2 open vacancies
Email: karl.solchenbach@intel.com
Intel Confidential — Do Not Forward