A node-level programming model framework for exascale computing*

By Chunhua (Leo) Liao, Stephen Guzik, Dan Quinlan
Lawrence Livermore National Laboratory
LLNL-PRES-539073

* Proposed for LDRD FY’12, initially funded by ASC/FRIC and now being moved back to LDRD

We are building a framework for creating node-level parallel programming models for exascale

Problem:
• Exascale machines: more challenges to programming models
• Parallel programming models: important but increasingly lag behind node-level architectures
Goal:
• Speed up designing/evolving/adopting programming models for exascale
Approach:
• Identify and implement common building blocks in node-level programming models so both researchers and developers can quickly construct or customize their own models
Deliverables:
• A node-level programming model framework (PMF) with building blocks at language, compiler, and library levels
• Example programming models built using the PMF

Programming models bridge algorithms and machines and are implemented through components of the software stack


[Figure: Application / Algorithm → (express) → Programming Model → Abstract Machine; Software Stack (Language, Compiler, …) → Real Machine]

Measures of success:
• Expressiveness
• Performance
• Programmability
• Portability
• Efficiency


Parallel programming models are built on top of sequential ones and use a combination of language/compiler/library support

Programming model | Abstract machine (overly simplified) | Software stack: 1. Language | 2. Compiler | 3. Library
Sequential | Memory, CPU | General-purpose languages (GPL): C/C++/Fortran | Sequential compiler | Optional seq. libs
Parallel, shared memory (e.g. OpenMP) | Shared memory, CPU … CPU | GPL + directives | Seq. compiler + OpenMP support | OpenMP runtime lib
Parallel, distributed memory (e.g. MPI) | Interconnect, (memory, CPU) … (memory, CPU) | GPL + calls to MPI libs | Seq. compiler | MPI library
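As a minimal sketch of the "GPL + directives" approach, assuming a C++ compiler with optional OpenMP support, the same dot-product loop can be written sequentially and then parallelized by adding one directive. The function names `dot_sequential` and `dot_openmp` are illustrative and do not come from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model: plain C++, sequential compiler, no extra library.
double dot_sequential(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Shared-memory model: the same loop plus one OpenMP directive.
// A compiler without OpenMP support ignores the pragma and runs the loop sequentially.
double dot_openmp(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    const long n = static_cast<long>(a.size());  // signed index for older OpenMP versions
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += a[k] * b[k];
    }
    return sum;
}
```

With GCC or Clang, building with `-fopenmp` enables the directive and links the OpenMP runtime library; without the flag the code still compiles and runs sequentially.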


Problem: programming models will become a limiting factor for exascale computing if no drastic measures are taken

Future exascale architectures:
• Clusters of many-core nodes, abundant threads
• Deep memory hierarchy, CPU+GPU, …
• Power and resilience constraints, …
(Node-level) programming models:
• Increasingly complex design space
• Conflicting goals: performance, power, productivity, expressiveness
Current situation:
• Programming model researchers: struggle to design/build individual models to find the right one in the huge design space
• Application developers: stuck with stale models: insufficient high-level models and tedious low-level ones

Solution: we are building a programming model framework (PMF) to address exascale challenges

A three-level, open framework to facilitate building node-level programming models for exascale architectures

[Figure: three levels of building blocks that programming models reuse and customize]
• Level 1: Language Extensions (Directive 1 … Directive n)
• Level 2: Compiler Support (ROSE) (Tool 1 … Tool n)
• Level 3: Runtime Library (Function 1 … Function n)
• Programming model 1: Language Ext., Compiler Sup., Runtime Lib.
• Programming model 2: Compiler Sup., Runtime Lib.
• … Programming model n: Runtime Lib.
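The "reuse and customize" idea can be sketched as selecting a subset of building blocks from each level. The following is only an illustration under stated assumptions: the `Framework` and `Model` types and the `compose` function are hypothetical names, not part of the PMF or ROSE.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// The three levels of the framework.
enum class Level { Language, Compiler, Runtime };

// Hypothetical registry: building blocks the framework offers at each level.
using Framework = std::map<Level, std::set<std::string>>;

// A programming model is the subset of blocks it reuses at each level.
// A model may leave a level out entirely.
using Model = std::map<Level, std::vector<std::string>>;

// Build a model from requested blocks; reject any block the framework lacks.
Model compose(const Framework& pmf, const Model& wanted) {
    Model result;
    for (const auto& entry : wanted) {
        const auto offered = pmf.find(entry.first);
        for (const auto& name : entry.second) {
            if (offered == pmf.end() || offered->second.count(name) == 0) {
                throw std::invalid_argument("unknown building block: " + name);
            }
            result[entry.first].push_back(name);
        }
    }
    return result;
}
```

A model that needs only runtime support would request blocks under `Level::Runtime` and nothing else.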


We will serve both researchers and developers, engage lab applications, and target heterogeneous architectures

Users:
• Programming model researchers: explore design space
• Experienced application developers: build custom models targeting current and future machines
Scope of this project:
• DOE/LLNL applications
• Heterogeneous architectures: CPUs + GPUs
• Example building blocks: parallelism, heterogeneity, data locality, power efficiency, thread scheduling, etc.
• Two major example programming models built using PMF

The programming model framework vastly increases the flexibility in how the HPC stack can be used for application development.

Example 1: researchers use the programming model framework to extend a higher-level model (OpenMP) to support GPUs

OpenMP: a high-level, popular node-level programming model for shared memory programming
• High demand for GPU support (within a node)
PMF: provides a set of selectable, customizable building blocks
• Language: directives, like #acc_region, #data_region, #acc_loop, #data_copy, #device, etc.
• Compiler: parser builder, outliner, loop tiling, loop collapsing, dependence analysis, etc., based on ROSE
• Runtime: thread management, task scheduling, data transferring, load balancing, etc.

Using PMF to extend OpenMP for GPUs

[Figure: the programming model framework (Level 1: Language Extensions, Directive 1 … Directive n; Level 2: Compiler Support (ROSE), Tool 1 … Tool n; Level 3: Runtime Library, Function 1 … Function n) is reused and customized to build "OpenMP Extended for GPUs"]

OpenMP Extended for GPUs:
• Directives: #pragma omp acc_region, #pragma omp acc_loop, #pragma omp acc_region_loop
• Compiler and runtime building blocks: Pragma_parsing(), Outlining_for_GPU(), Insert_runtime_call(), Optimize_memory(), Dispatch_tasks(), Balancing_load(), Transfer_data()
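To make the pragma-parsing building block more concrete, here is a minimal sketch that classifies one source line by its directive name. It is an assumption-laden illustration: the `AccDirective` enum and `parse_pragma` function are hypothetical, the sketch does not use ROSE, and it ignores any clauses that follow the directive name.

```cpp
#include <sstream>
#include <string>

// Directive names as listed on the slide; the enum itself is illustrative.
enum class AccDirective { None, Region, Loop, RegionLoop };

// Classify one source line. Anything that is not "#pragma omp acc_*" yields None.
// Leading whitespace is skipped; the "# pragma" spelling with a space is not handled.
AccDirective parse_pragma(const std::string& line) {
    std::istringstream in(line);
    std::string hash_pragma, omp, name;
    in >> hash_pragma >> omp >> name;  // missing tokens stay empty
    if (hash_pragma != "#pragma" || omp != "omp") {
        return AccDirective::None;
    }
    if (name == "acc_region") return AccDirective::Region;
    if (name == "acc_loop") return AccDirective::Loop;
    if (name == "acc_region_loop") return AccDirective::RegionLoop;
    return AccDirective::None;
}
```

In a real translator the recognized directive would then drive later steps such as outlining the annotated region and inserting runtime calls.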

Example 2: application developers use PMF to explore a lower-level, domain-specific programming model

Target lab application:
• Lattice-Boltzmann algorithm with adaptive-mesh refinement for direct numerical simulation studies on how wall-roughness affects turbulence transition
• Stencil operations on structured arrays
Requirements:
• Concurrent, balanced execution on CPU & GPU
• Users do not like translating OpenMP to GPU
• Want to have the power to express lower-level details like data decomposition
• Exploit domain features: a box-based approach for describing data-layout and regions for numerical solvers
• Target current and future architectures

Using the PMF to implement the domain-specific programming model (ongoing work with many unknown details)

[Figure: C++ (main algorithm infrastructure), pragmas (gluing and supplemental semantics), and CUDA (describe kernels) → compiler support building blocks → source code for Architecture A / Architecture B that can be compiled using native compilers → executable]

Language feature:
• Use a sequential language, CUDA, and pragmas to describe algorithms
Compiler (first compilation):
• Generate code to help with chores
• Custom code generation for multiple architectures
Final compilation using native compilers, linking with a runtime library:
• Scheduling among CPUs and GPUs
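One simple way a runtime could balance work between CPU and GPU is a static split of the boxes in proportion to each device's throughput. The sketch below assumes that approach purely for illustration; the talk describes the design as ongoing work, and `BoxRange`, `Split`, `split_boxes`, and the rate parameters are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of box indices assigned to one device.
struct BoxRange {
    std::size_t begin;
    std::size_t end;
};

struct Split {
    BoxRange cpu;
    BoxRange gpu;
};

// Static split: the GPU gets a share proportional to its relative throughput.
// cpu_rate and gpu_rate are hypothetical measured rates (e.g. boxes per second).
Split split_boxes(std::size_t num_boxes, double cpu_rate, double gpu_rate) {
    assert(cpu_rate >= 0.0 && gpu_rate >= 0.0 && cpu_rate + gpu_rate > 0.0);
    const double gpu_share = gpu_rate / (cpu_rate + gpu_rate);
    // Truncation rounds the GPU count down, so any remainder goes to the CPU.
    const std::size_t gpu_count =
        static_cast<std::size_t>(gpu_share * static_cast<double>(num_boxes));
    const std::size_t cpu_count = num_boxes - gpu_count;
    return Split{BoxRange{0, cpu_count}, BoxRange{cpu_count, num_boxes}};
}
```

For example, with 100 boxes and a GPU three times faster than the CPU, the CPU receives boxes [0, 25) and the GPU receives [25, 100). A dynamic scheduler could instead hand out boxes on demand, which adapts better when the per-box cost varies.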


We are building a framework instead of a single programming model for exascale node architectures
• Building blocks: language, compiler, runtime
• Two major example programming models
Programming model researchers
• Quickly design and implement solutions to exascale challenges
• E.g., explore OpenMP extensions for GPUs
Experienced application developers
• Ability to directly change the software stack
• E.g., compose domain-specific programming models

Thank you!