SUPER Project:
Oak Ridge National
Laboratory Team
Update: 9/18/2013
Eduardo D’Azevedo
Philip C. Roth
Sarat Sreepathi
Patrick H. Worley (Site Contact)
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Oak Ridge, TN USA
SUPER All Hands Meeting
Oakland, CA
September 18, 2013
ORNL Update Overview
•  The ORNL SUPER team is involved in several activities
–  Engagement
–  End-to-end/Integration
–  Tool Development
–  I/O Analysis and Optimization
–  Communication Analysis and Optimization
•  Personnel
–  Lost Gabriel Marin to UT-K in February 2013
–  Added Sarat Sreepathi, Ed D’Azevedo
2 SUPER All-Hands Meeting, September 18-19, 2013
Engagement
•  Managed SUPER engagement with Science
Application Partnership (SAP) projects
•  Participated in 4 SAP projects (with SAP funding)
•  FY13 SAP engagement activities and highlights
–  FES: Plasma Surface Interactions (PSI)
•  Performance analysis of SOLPS, an existing plasma
surface interactions code; began optimization by porting
parts to the accelerator
•  Provided guidance on design for lightweight performance
data collection infrastructure within XOLOTL, a plasma
surface interaction code being developed as part of the
PSI project
Engagement
•  FY13 SAP engagement activities and highlights
–  FES: Center for Edge Physics Simulation (EPSi)
•  Contributed to 4X performance improvement on Titan
(and to smaller improvements on other systems)
•  Documented performance for INCITE, ERCAP,
highlights, ... on Edison, Mira, and Titan. Results will be
presented in an SC13 poster.
•  Continued development and maintenance of
infrastructure for automatic performance data capture
•  Developed "performance variability" fault tolerance
infrastructure
•  Packaged and maintained benchmark version of code
for SUPER, plus ongoing support
•  Ongoing evaluation of new code / new problem sizes;
contributing to planning
EPSi Highlights
Engagement
•  FY13 SAP engagement activities and highlights
–  BER: Predicting Ice Sheet and Climate Evolution at
Extreme Scales (PISCEES)
•  Evaluated performance of the "old" model, identifying how
to more than double performance on Titan
•  Evaluated performance of one of the "new" models,
identifying need for more work in new solver (ML)
•  Developed infrastructure for collecting performance data
from Trilinos that is compatible with the existing
performance data infrastructure
•  Ongoing development of infrastructure for including
performance tests as part of V&V test suite
PISCEES Highlights
[Figures (titles, axes, and legends recovered from extracted text):
- "SEACISM Performance: 5km GIS (No I/O)", Cray XK7 (one sixteen-core processor per node) using all cores per node: timesteps per day vs. processor cores, for GNU (overlap 0 and 1) and PGI (overlap 1) with evolving temperature, PGI (overlap 1) with constant temperature, and a June 2012 PGI (overlap 1) baseline.
- "SEACISM Performance: 5km GIS with evolving temperature": timesteps per day vs. allocated processor cores.
- Cray XE6 (two 12-core AMD processors per node): comparison of ILU/ML, ILU/Belos, and ILU/AztecOO preconditioners.
- "FELIX Ice Sheet Model Dynamical Core Performance: Steady State Solution on 2km Greenland Ice Sheet Grid", Cray XK7 (one sixteen-core processor per node) using every other core per node and an overlap of 0: seconds for nonlinear system solution vs. processor cores, for No-I/O runs with GNU (more opt.), GNU, and PGI.]
Engagement
•  FY13 SAP engagement activities and highlights
–  BER: Applying Computationally Efficient Schemes for
BioGeochemical Cycles (ACES4BGC)
•  Continued development and maintenance of
infrastructure for automatic performance data capture for
Community Earth System Model (many versions; moving
target)
•  1.6X performance improvement for a suite of production
runs
•  2X performance improvement for a (different) suite of
production runs
•  Packaged and maintained MOAB kernel (from Tim
Tautges) for SUPER
Engagement
•  FY13 SAP engagement activities and highlights
–  BER: Multiscale Methods for Accurate, Efficient, and Scale-Aware Models of the Earth System (no SAP funding)
•  Made the infrastructure for collecting performance
data from Trilinos available to the group developing
implicit discretizations of the atmosphere model
•  Contributing to ongoing performance evaluation and
optimization
End-to-End/Integration
•  Continued development of data collection infrastructure and
“marketed” to SAP projects, primarily in FES and BER so far
•  Collaborated with LLNL and Univ. of Oregon on analyzing
data being collected (see LLNL slides)
•  Leveraged infrastructure in developing performance variability
“aware” simulation control logic for EPSi, and currently
designing similar capability for ACES4BGC
•  Extending infrastructure to support I/O analysis and
optimization activity (new activity)
•  Extending infrastructure to support communication analysis
and optimization activity (new activity)
Application Characterization (mpiP)
•  Oxbow application characterization activity
•  Enhanced mpiP with the ability to capture communication topology for
point-to-point and collective communication operations
•  Paper describing Oxbow application characterization submitted to
Performance Modeling, Benchmarking, and Simulation of High
Performance Computer Systems (PMBS13) (in review)
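As a rough illustration of what this topology capture records (a plain-Python sketch over a hypothetical event log, not mpiP's actual implementation), the following builds a sender-by-receiver message-count matrix and a log2-binned message-size histogram of the kind plotted on this slide:

```python
import math
from collections import Counter

def characterize(events, nranks):
    """Build a point-to-point communication topology matrix and a
    histogram of message sizes binned by log2(size), from a list of
    (src_rank, dst_rank, nbytes) send events."""
    topo = [[0] * nranks for _ in range(nranks)]   # topo[src][dst] = message count
    size_bins = Counter()                          # log2 bin upper bound (exclusive) -> count
    for src, dst, nbytes in events:
        topo[src][dst] += 1
        # smallest b with nbytes < 2**b, i.e. the exclusive upper bound of the bin
        b = max(1, math.ceil(math.log2(nbytes + 1)))
        size_bins[b] += 1
    return topo, size_bins

# Hypothetical event log: rank 0 sends twice to rank 1, once to rank 2
events = [(0, 1, 1024), (0, 1, 1500), (0, 2, 8)]
topo, bins = characterize(events, nranks=3)
```

The dense matrix is only for illustration; at scale, a tool would use a sparse representation.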
[Figure: "Message sizes, HPCC, selected phases" — number of messages (log scale, 10 to 1e+06) vs. log_2 of message size bin upper bound (exclusive, 3 to 34), for the MPIRandomAccess, PTRANS, MPIFFT, and HPL phases.]
[Figure: HPCC MPI Random Access (!) communication topology, visualized using VisIt.]
Application Analysis: Value Influence Tracking
•  Understanding how values are propagated through time and space can help us
recognize
–  Inefficient/unnecessary computation (e.g., cut-off distance)
–  Incorrect computation (e.g., this value should have been accessed)
–  Values for which high reliability is needed
•  Value influence tracking is a direct, on-line approach for tracking how a value
contributes to later computation (its influence) in multithreaded and MPI programs
•  VIT tool, based on Intel Pin and PMPI profiling interface, implements this approach
–  Dynamic instrumentation associated
with individual instructions propagates
influence data
–  PMPI versions of functions propagate
influence data between address
spaces
•  P.C. Roth, “Tracking a Value’s Influence on
Later Computation,” 2013 Workshop on
Productivity and Performance
(PROPER 2013), Aachen, Germany, August
2013.
[Figure: influence propagation example. Left: for the operation u + v with the "average" influence operator, inputs u (influence 0.3) and v (influence 0.5) produce dest (influence 0.4). Right: Task 0's main thread executes MPI_Send(A,…) while Task 1's main thread executes MPI_Recv(B,…); PMPI wrappers send A along with its influence data (IA[0], IA[1], IA[2], …) so that B[0], B[1], B[2], … inherit influences 0.3, 0.7, 0.4, ….]
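A minimal sketch of the averaging influence operator shown in the figure (plain Python, not the actual Pin/PMPI-based VIT tool; the `Tracked` class is a hypothetical stand-in for per-value shadow state): each value carries an influence score, and a binary operation gives its result the mean of its operands' influences.

```python
class Tracked:
    """A value paired with an influence score in [0, 1]."""
    def __init__(self, value, influence):
        self.value = value
        self.influence = influence

    def __add__(self, other):
        # "average" influence operator: the result's influence is the
        # mean of the operands' influences
        return Tracked(self.value + other.value,
                       (self.influence + other.influence) / 2)

# Matches the figure: u has influence 0.3, v has influence 0.5
u = Tracked(2.0, 0.3)
v = Tracked(3.0, 0.5)
dest = u + v   # dest carries influence (0.3 + 0.5) / 2 = 0.4
```

In the real tool this shadow state is maintained by dynamic binary instrumentation rather than by wrapping values in objects.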
Value Influence Tracking: Example
•  2D heat transfer application (5-point stencil)
[Figure: value influence propagation, starting with a value on the left boundary; cells are colored according to the magnitude of the influence associated with that data point.]
I/O Characterization and Optimization
•  New activity (sort of)
–  Builds on work from SciDAC 2 Petascale Data Storage
Institute, and work with mpiP
–  Leverages end-to-end data collection infrastructure
–  Motivated by a number of different engagement activities
•  Personnel: Roth (lead), Sreepathi, Worley
•  Research directions
–  Techniques for monitoring application I/O behavior and
correlating it with system activity to diagnose causes of
observed I/O performance variability
–  I/O scheduling techniques to reduce conflicts
–  On-line auto-tuning of I/O-related parameters
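One way the on-line auto-tuning direction could work, as an illustrative sketch only (the greedy strategy and the stripe-count-like parameter below are assumptions, not the project's actual design): measure each candidate setting on early I/O phases, then commit to the fastest one observed.

```python
def online_tune(candidates, run_write):
    """Greedy online tuner: measure each candidate setting once,
    then keep using the best one.  run_write(setting) performs one
    I/O phase with that setting and returns its elapsed time."""
    best, best_time = None, float("inf")
    for setting in candidates:
        t = run_write(setting)
        if t < best_time:
            best, best_time = setting, t
    return best

# Hypothetical: tune a stripe-count-like parameter using a fake timer
fake_times = {4: 2.0, 8: 1.2, 16: 1.5}
best = online_tune([4, 8, 16], lambda s: fake_times[s])
```

A production tuner would have to account for run-to-run I/O variability (the first research direction above) rather than trust a single measurement per setting.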
Communication Optimization
•  New activity
–  Also leverages end-to-end data collection infrastructure
–  Also motivated by a number of different engagement
activities
•  Personnel: D’Azevedo (lead), Roth, Worley
•  Research directions
–  Runtime techniques for process placement that minimize
communication overhead based on (initially) offline
communication characterization and online information
about allocated nodes and network topology, with special
attention to
•  applications with multiple communication phases, each with different
mapping preferences
•  coupled models in which mapping must take into account
dependencies between components but for which multiple
components can be mapped to the same nodes
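The placement problem above can be sketched with a toy cost model (plain Python; the nearest-neighbor communication pattern and node sizes are illustrative assumptions, not the actual runtime technique): count nearest-neighbor links that cross node boundaries under two candidate orderings of a logical 2D process grid.

```python
def offnode_pairs(rows, cols, per_node, col_major=False):
    """Count nearest-neighbor pairs in a rows x cols logical process grid
    whose endpoints land on different nodes, when consecutive ranks
    (row-major or column-major order) are packed per_node to a node."""
    def rank(r, c):
        return c * rows + r if col_major else r * cols + c
    def node(r, c):
        return rank(r, c) // per_node
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows and node(r, c) != node(r + 1, c):
                cost += 1      # vertical neighbor crosses a node boundary
            if c + 1 < cols and node(r, c) != node(r, c + 1):
                cost += 1      # horizontal neighbor crosses a node boundary
    return cost

# 4x16 logical grid, 8 processes per node: column-major packs two whole
# columns per node, so no vertical link leaves a node (28 vs. 52 crossings)
row_cost = offnode_pairs(4, 16, 8)
col_cost = offnode_pairs(4, 16, 8, col_major=True)
```

The real problem is harder: multiple communication phases with conflicting preferences, and node allocations that vary from run to run, as the XGC1 example on the next slide shows.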
Communication Optimization:
3mm DIII-D and 5.5mm ITER on Titan
•  XGC1 uses a logical 2D processor grid. Different phases of the code prefer different mappings
of the process grid to physical nodes and cores, and the importance increases when multiple
processes are assigned to a single node. These “trends” are also sensitive to the nodes
allocated for a given run, which is not currently controllable on Titan (or Hopper or Edison).
•  The optimal mapping differs for the 5.5mm ITER grid and for DIII-D. Note the odd non-power-of-two behavior for Dimension 2 major ordering.
Summary
•  The ORNL SUPER team is actively involved in
–  Engagement
–  End-to-end/Integration
–  Application Characterization/Analysis/Optimization and
development of enabling techniques and tools
For more information, contact
•  Phil Roth at rothpc@ornl.gov
•  Pat Worley at worleyph@ornl.gov