Porting Scientific Applications to OpenPOWER
Dirk Pleiter
Forschungszentrum Jülich / JSC
#OpenPOWERSummit
Join the conversation at #OpenPOWERSummit
JSC’s HPC Strategy
▪ Highly Scalable System
  • IBM Blue Gene/L: JUBL, 45 TFlop/s
  • IBM Blue Gene/P: JUGENE, 1 PFlop/s
  • IBM Blue Gene/Q: JUQUEEN, 5.9 PFlop/s
▪ General-Purpose Cluster
  • IBM Power 6: JUMP, 9 TFlop/s
  • Intel Nehalem: JUROPA, 300 TFlop/s
  • JURECA: ~2 PFlop/s, + Booster: ~10 PFlop/s
▪ File Server
  • Lustre, GPFS
Achieving Scalability
Need for
▪ Research on architectures and technologies
▪ Research on applications and algorithms
▪ Ingredients for HPC co-design
Provide incentives to users: High-Q Club
▪ Showcase for codes that can utilize a 28-rack Blue Gene/Q at JSC
▪ Selected club members
  • dynQCD: simulation of particle theories
  • KKRnano: DFT-based condensed matter
  • PEPC: tree-based N-body code
  • ...
Why OpenPOWER?
Answer from a customer point of view
▪ An increasing share of Top500 systems is based on CPUs from a single vendor
  • Pure market observation, no statement about technology
▪ Lack of competition in processor technologies
  • Usually higher prices
  • Less incentive for innovation
▪ Need for promoting alternative technologies
  • OpenPOWER
Why OpenPOWER?
Answer from an architectural point of view
▪ Tight integration of high-performance processor and low-clocked, highly parallel compute devices
  • Enable drastic improvement of power efficiency
  • Preserve usability at tremendously increased level of parallelism
  • Opportunity to improve overall balance of system
▪ Integration of non-volatile memory into fat compute nodes
  • Increased reliability through reduced number of components and support of resilience
▪ Addresses exascale challenges
POWER Acceleration and Design Center
PADC is a collaboration between
▪ IBM R&D Labs in Böblingen and Zürich
▪ Forschungszentrum Jülich
▪ NVIDIA Europe
Mission statement
▪ Support scientists and engineers to target the grand challenges facing society using OpenPOWER technologies
▪ Grand challenges
  • Energy and environment, e.g. plasma physics
  • Information, e.g. condensed matter physics
  • Healthcare, e.g. brain research
http://www.fz-juelich.de/ias/jsc/padc
Applications
PADC takes an application-driven approach
▪ Builds on previous work in NVIDIA Application Lab
▪ Previously targeted applications
  • Regional Flood Model
  • B-CALM
  • PANDA
  • ...
▪ Ongoing and future applications
  • KKRnano
  • BigBrain
  • ...
Performance Analysis
▪ Performance analysis of the POWER8 memory hierarchy
▪ Performance analysis of GPU-GPU data transport
Performance Characterization
Characterization of applications on given hardware
▪ Methodology
  • Identification of performance-critical kernels
  • Optimization of each kernel at best effort within given constraints
  • Performance characterization
  • Measurement of extensive performance metrics
  • Architectural analysis
▪ Questions addressed in architectural analysis
  • How does performance change with clock speed?
  • How does it depend on memory hierarchy?
  • …
Performance Characterization
Example: Regional Flood Model
▪ Key kernel: Solver for Saint-Venant equations
  • Compute particle flow in 2 dimensions
▪ Selected performance metrics (on K20x)
  • Arithmetic intensity $\mathrm{AI}_{\mathrm{acc}}(T) = 0.5$
  • Memory read/write bandwidth = 80/86 GByte/s
  • Warp execution efficiency $\varepsilon_{\mathrm{warp}} = 80\%$
▪ Example analysis for changing boost clock
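One way to read metrics like the arithmetic intensity and memory bandwidth above is a roofline-style estimate, in which attainable throughput is the smaller of the compute peak and arithmetic intensity times memory bandwidth. The sketch below is illustrative only and is not the analysis from the talk. It assumes the arithmetic intensity is given in Flop/Byte relative to memory traffic. The function names are hypothetical, and the peak values in the usage (1310 GFlop/s double precision, 250 GByte/s) are nominal vendor figures for a K20X, not numbers from the slides.

```python
def roofline_bound(ai_flop_per_byte, peak_gflops, mem_bw_gbytes):
    """Upper bound on throughput (GFlop/s) in a simple roofline model."""
    return min(peak_gflops, ai_flop_per_byte * mem_bw_gbytes)


def implied_throughput(ai_flop_per_byte, read_bw_gbytes, write_bw_gbytes):
    """Throughput (GFlop/s) implied by a measured AI and measured read+write traffic."""
    return ai_flop_per_byte * (read_bw_gbytes + write_bw_gbytes)


if __name__ == "__main__":
    ai = 0.5  # assumed unit: Flop/Byte
    # Measured read/write bandwidth from the slide: 80/86 GByte/s
    print("implied GFlop/s:", implied_throughput(ai, 80.0, 86.0))
    # Nominal K20X peaks (assumption, not from the slides)
    print("roofline bound GFlop/s:", roofline_bound(ai, 1310.0, 250.0))
```

With an intensity this low the bound is set by memory bandwidth rather than by the compute peak, so a model of this kind would predict only a weak dependence on the core clock. Whether that holds for the actual kernel is a question for measurement.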
Performance Modelling
Semi-empirical performance modelling methodology
▪ Methodology
  • On the basis of prior knowledge, formulate scaling formulae describing the dependence of execution time $\Delta t(W)$ on the workload $W$
  • Measure $\Delta t(W)$ for different $W$ and fit the scaling formulae to the results
  • Check fitted parameters for plausibility
▪ Considered example: B-CALM
  • 1-dimensionally parallelized Finite Difference Time Domain approach for electromagnetic simulations
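The measure, fit and check steps of the methodology can be sketched in a few lines. The example assumes the simplest possible scaling formula, a linear ansatz Δt(W) = a + b·W, which is an illustration and not a formula taken from the talk. The function names and the measurement values are hypothetical; the data are synthetic.

```python
def fit_linear_scaling(workloads, times):
    """Least-squares fit of dt(W) = a + b * W; returns (a, b)."""
    n = len(workloads)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (W, dt) pairs of equal length")
    mean_w = sum(workloads) / n
    mean_t = sum(times) / n
    var_w = sum((w - mean_w) ** 2 for w in workloads)
    if var_w == 0:
        raise ValueError("workloads must not all be identical")
    cov = sum((w - mean_w) * (t - mean_t) for w, t in zip(workloads, times))
    b = cov / var_w          # cost per unit of work
    a = mean_t - b * mean_w  # workload-independent overhead
    return a, b


def plausible(a, b):
    """Plausibility check: overhead must not be negative, cost per unit work must be positive."""
    return a >= 0.0 and b > 0.0


if __name__ == "__main__":
    # Synthetic measurements generated from dt = 2.0 + 0.5 * W (made-up numbers)
    W = [1.0, 2.0, 4.0, 8.0]
    dt = [2.5, 3.0, 4.0, 6.0]
    a, b = fit_linear_scaling(W, dt)
    print(a, b, plausible(a, b))
```

A negative fitted overhead or slope would indicate that the ansatz does not describe the measurements, which is the point of the plausibility check.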
Performance Modelling
Semi-empirical performance modelling for B-CALM
▪ Model ansatz
  • Calculation of boundary sites: $\Delta t_{\mathrm{bnd}} \sim N_x N_y$
  • Calculation of bulk sites: $\Delta t_{\mathrm{bulk}} \sim N_x N_y (N_z / P)$
  • Communication of boundary: $\Delta t_{\mathrm{net}} \sim N_x N_y$
  • Overlapping calculations and communications: $\Delta t = \Delta t_{\mathrm{bnd}} + \max(\Delta t_{\mathrm{bulk}}, \Delta t_{\mathrm{net}})$
▪ Weak scaling measured for fixed $N_x N_y$ using $P=2$ GPUs attached to single processor
  • Non-optimized MPI
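The model ansatz can be written down directly as a function. In this sketch the proportionality constants `c_bnd`, `c_bulk` and `c_net` are hypothetical fit parameters in arbitrary time units; the slides give only the proportionalities, not their values, and the function names are likewise assumptions.

```python
def bcalm_step_time(nx, ny, nz, p, c_bnd, c_bulk, c_net):
    """dt = dt_bnd + max(dt_bulk, dt_net), using the proportionalities of the ansatz."""
    area = nx * ny
    dt_bnd = c_bnd * area               # boundary sites
    dt_bulk = c_bulk * area * (nz / p)  # bulk sites, split over p devices
    dt_net = c_net * area               # boundary communication
    return dt_bnd + max(dt_bulk, dt_net)


def communication_hidden(nz, p, c_bulk, c_net):
    """True if the bulk calculation is long enough to overlap the communication."""
    return c_bulk * (nz / p) >= c_net


if __name__ == "__main__":
    # Made-up constants, arbitrary units
    print(bcalm_step_time(100, 100, 64, 2, c_bnd=1.0, c_bulk=1.0, c_net=8.0))
    print(bcalm_step_time(100, 100, 8, 2, c_bnd=1.0, c_bulk=1.0, c_net=8.0))
    print(communication_hidden(64, 2, 1.0, 8.0), communication_hidden(8, 2, 1.0, 8.0))
```

Because both the bulk and the communication term scale with Nx Ny, the model says that whether communication is hidden depends only on Nz / P and the ratio of the two constants.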
Future opportunities

▪ Challenging applications with large memory capacity and high bandwidth requirements
  • High bandwidth, smaller capacity memory attached to GPU
  • Large capacity, smaller bandwidth memory attached to CPU
▪ Example: BigBrain project at FZ Jülich
  • Goal: 3D brain model reconstructed from 2D slices
  • Computational challenge: image registration
  • Compute-intensive computation of mutual information metric
  • Large capacity required for storing high-resolution images
▪ Significant slow-down found using host memory on today’s architectures [A. Adinets et al., HeteroPar 2013]
▪ Large benefit expected from NVLink
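For readers unfamiliar with the mutual information metric used in image registration, the textbook definition can be sketched as follows. This is not the BigBrain implementation: it assumes two equally sized images that have already been flattened to lists of binned integer intensities, and the function name is hypothetical. A production code would work on large high-resolution images on the GPU.

```python
from collections import Counter
from math import log2


def mutual_information(img_a, img_b):
    """Mutual information (in bits) of two equally sized, flattened images."""
    if len(img_a) != len(img_b) or len(img_a) == 0:
        raise ValueError("images must be non-empty and of equal size")
    n = len(img_a)
    joint = Counter(zip(img_a, img_b))  # joint histogram of intensity pairs
    hist_a = Counter(img_a)             # marginal histogram of image A
    hist_b = Counter(img_b)             # marginal histogram of image B
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (count / n) * log2(count * n / (hist_a[a] * hist_b[b]))
    return mi


if __name__ == "__main__":
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # identical images
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent
```

Registration searches for the alignment that maximizes this value, so the histogram pass has to be repeated for every candidate transformation, which is why both compute throughput and fast access to the full images matter.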
Conclusions
▪ OpenPOWER opens important opportunities for HPC infrastructure providers
  • Exascale challenges are addressed
▪ No problems porting GPU-enabled applications to OpenPOWER
  • Room for optimizations
▪ Support for porting more applications required
  • Based on performance characterization and modelling
  • Optimization and code restructuring within PADC