Optimization of Coupled Systems on Emerging Architectures
Gerhard Theurich
NRL/SAIC
ESMF Executive Board / Interagency Working Group Meeting
June 12, 2014
Coupling challenges
• Applications are trending toward larger numbers of components:
  – Coupling with multiple time-scales
  – Explicit, semi-implicit, and fully implicit schemes
  – High resolution, adaptive, unstructured grids
  – Hierarchical versus flat component architectures
  – Ensembles: multi-instance, multi-model, concurrent versus sequential
• To make matters worse: the growing complexity of coupled systems is compounded by the increasing levels of explicit parallelism on emerging computing architectures.
HPC hardware is a changing element
• 1980s – Vector machines
• 1990s – Parallel machines
  – Coarse grain parallelism
• Early 2000s – Massively parallel machines (some parallel vector)
  – Distributed memory
  – Distributed shared memory
  – MPI as the standard to implement coarse grain parallelism
• Early 2010s – Massively parallel machines with multi-core CPUs
  – SIMD type parallelism of serial code
  – Hybrid coarse and fine grain parallelism: MPI + threads (e.g. OpenMP); a minimal sketch follows this list
• Today – Heterogeneous systems
  – Heterogeneous HW: multi-core CPUs + GPUs + MICs; not every node is the same: different numbers of CPU sockets, memory sizes, and devices
  – Heterogeneous SW: MPI + OpenMP + CUDA / PGI-ACC / Cray-ACC / OpenACC / Intel-MIC-Directives / Intel-MIC-Native / OpenCL
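A minimal sketch of the hybrid MPI + threads model, in C; the array and the work loop are hypothetical stand-ins for a model kernel (compile e.g. with mpicc -fopenmp):

  /* Coarse grain parallelism: MPI ranks across nodes.
     Fine grain parallelism: OpenMP threads within each rank. */
  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  #define N 1000000          /* hypothetical rank-local problem size */
  static double a[N];

  int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Threads (and, with suitable compiler flags, SIMD) work on the
       rank-local portion of the data. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
      a[i] = rank + 0.5 * i;

    printf("rank %d of %d ran with up to %d threads\n",
           rank, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
  }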
Modern supercomputer architecture
[Figure: schematic of a system of O(10k) nodes. Each node is a multi-core CPU host with RAM and attached accelerator devices (GPUs or MICs). Fine grain parallelism (threading and SIMD on CPU cores and device cores) increases within a node; coarse grain parallelism (decomposition and distribution) increases across nodes.]
Earth system models on accelerators
• Focus has been on specific models.
• Optimization of the computationally intensive kernels to take advantage of the extra level of parallelism offered by accelerators (a generic offload sketch follows this slide).
• Examples:
  – WRF Single Moment 5-tracer (WSM5) on Nvidia GPU (Michalakes et al.)
  – NEMS/NMMB on Intel MIC (Michalakes)
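As a generic illustration of this kind of kernel offload (not code from WSM5 or NEMS/NMMB; the function name, loop, and constants are hypothetical stand-ins), an OpenACC loop in C might look like:

  /* Offload a compute-intensive loop to an accelerator with OpenACC. */
  void moist_kernel(int n, double *restrict t, const double *restrict q) {
    /* copy q to the device, compute t there, copy t back to the host */
    #pragma acc parallel loop copyin(q[0:n]) copyout(t[0:n])
    for (int i = 0; i < n; i++)
      t[i] = 2.5e6 * q[i] / 1004.0;   /* illustrative arithmetic only */
  }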
Coupling challenges on modern architectures
• Components are distributed across O(10k) nodes.
• Field and grid data is stored and processed on different hardware throughout a run, both within a component and between components.
• Efficient coupling requires data locality between the components to reduce the cost of data movements.
• A limited number of each type of processing hardware (and memory) is available and must be shared between the components (a rank-to-device mapping sketch follows this list).
• Efficient use of the available hardware requires some oversubscription, but suffers from too much.
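One common way to share a node's limited set of devices among the ranks placed on it is to derive a node-local rank via an MPI-3 shared-memory communicator and map it onto a device index; a minimal C sketch (the device count and the round-robin policy are assumptions):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Communicator containing only the ranks on this node (MPI-3). */
    MPI_Comm nodecomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodecomm);
    int noderank;
    MPI_Comm_rank(nodecomm, &noderank);

    int ndevices = 2;                 /* assumed: e.g. two GPUs per node */
    int mydevice = noderank % ndevices;
    printf("node-local rank %d -> device %d\n", noderank, mydevice);
    /* a component would now bind its offload work to mydevice,
       e.g. through the OpenACC or OpenCL runtime */

    MPI_Comm_free(&nodecomm);
    MPI_Finalize();
    return 0;
  }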
ESMF and the NUOPC Layer offer key elements to address the coupling challenges
• Support for a wide range of multi-model application architectures.
• Data types that can represent a large range of structured and unstructured grids.
• Data types that can represent data decompositions and their distribution across the underlying hardware.
• Methods to move data efficiently between decompositions/distributions.
• A well defined set of initialization sequences with multi-way negotiation between the components.
• Grids and decomposition information can be transferred between components during initialization.
Example of interleaved components
[Figure: components Comp-A, Comp-B, and Comp-C interleaved across the same set of nodes; each node is a multi-core CPU host with an attached GPU, and the components share the node's cores and GPU.]
ESMF/NUOPC accelerator projects
• ESMF team efforts are funded through ONR/Earth System Prediction Capability (ESPC).
• Target is a suite of coupled models for next generation Naval prediction.
• 1 year ONR seed project: Optimized Infrastructure for the Earth System Prediction Capability, includes prototype accelerator support (began May 2013).
• 3 year ONR project: An Integration and Evaluation Framework for ESPC Coupled Models, includes delivery of capability for coupled systems.
• Specific projects we interact with under ONR include:
  – Accelerated Prediction of the Polar Ice and Global Ocean (APPIGO)
  – NPS-NRL-Rice-UIUC Collaboration on Navy Atmosphere-Ocean Coupled Models on Many-Core Computer Architectures
• Our part is to look into accelerators with ESMF/NUOPC specifically for coupled systems.
Initial questions and considerations
• Can components that use different programming models (OpenCL, OpenACC, Intel-MIC-Directives, …) run under the same single ESMF executable?
• Do the different programming models provide enough control for a component to decide at run-time whether or not to use a specific accelerator device?
• Is it possible to uniquely identify the available devices? Across programming models? Across the distributed parts of a component? Across components? (A device enumeration sketch follows below.)
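On the device-identification question, the OpenCL runtime can at least enumerate every platform and device it sees on a node; a minimal C sketch (error checking omitted, fixed-size buffers are an assumption):

  /* Enumerate OpenCL platforms and devices visible on this node. */
  #include <CL/cl.h>
  #include <stdio.h>

  int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);
    for (cl_uint p = 0; p < nplat; p++) {
      cl_device_id devices[16];
      cl_uint ndev = 0;
      clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &ndev);
      for (cl_uint d = 0; d < ndev; d++) {
        char name[256];
        clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
        printf("platform %u device %u: %s\n", p, d, name);
      }
    }
    return 0;
  }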
Early prototyping
Jayesh Krishna (ANL) has prototyped ESMF components with OpenCL, OpenACC, and Intel-MIC-Directives.
Feature                             | OpenCL                              | OpenACC                         | Intel-MIC-Directives
------------------------------------+-------------------------------------+---------------------------------+--------------------------------------
Can be combined in ESMF application | YES                                 | YES                             | YES
Control offloading during runtime   | YES                                 | YES                             | NO?
Identify the available resources    | YES (but: GPUs+MICs, within OpenCL) | YES (but: GPUs, within OpenACC) | YES (but: MICs, within MIC-Directives)
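For the "control offloading during runtime" row, a component using OpenACC can combine the runtime API with the if clause; a minimal C sketch (acc_device_nvidia is a vendor-specific device type, and the decision logic is a placeholder):

  #include <openacc.h>

  void step(int n, double *restrict x, int want_gpu) {
    /* Only offload if the caller asked for it and a GPU is present. */
    int use_gpu = want_gpu && acc_get_num_devices(acc_device_nvidia) > 0;
    if (use_gpu)
      acc_set_device_num(0, acc_device_nvidia);   /* assumed: first GPU */

    /* if(use_gpu) keeps the loop on the host when offloading is off */
    #pragma acc parallel loop if(use_gpu) copy(x[0:n])
    for (int i = 0; i < n; i++)
      x[i] *= 0.5;
  }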
Next steps
• Offer access to device information through ESMF: enough to guide a driver component to do component placement (interleaved components).
• Support data references for the most efficient exchange between sequential components that are placed on the same compute resources.
• Prototype the inter-component negotiation of distributions by looking at the optimization problem of model grid distribution within the mediator component.
• Explore the possibility of automated construction of interleaved components based on the discovered resources and hints provided by the components during the initialization negotiation.
Thank you!
Project page on Earth System CoG:
https://earthsystemcog.org/projects/couplingtestbed/acceleratorplans