HPCMP Benchmarking and Performance Analysis
Mark Cowan
USACE ERDC ITL in support of DoD HPCMP
Tuesday, April 17, 2012
What is the HPCMP?
• Initiated in 1992
• Congressional mandate to modernize DoD’s HPC capabilities
• Assembled from a collection of HPC departments across Army, Air Force, and Navy labs and test centers
What is the HPCMP?
FOCUS
• Solve military and security problems using HPC hardware and software
• Assess technical and management risks
  – Performance
  – Time
  – Available resources
  – Cost
  – Schedule
• Support DoD objectives through research, development, test and evaluation
Where we benchmark
Migrate to a 2-year acquisition cycle
• Why the radical change?
  – Entice more vendors into the competition
  – Vendor feedback: remove or alleviate disincentives
• Review the entirety of the TI acquisition process
  – Line-by-line justification of the benchmarking rules document
  – Address both HPC community and vendor concerns
• Comprehensive reevaluation of how we benchmark
  – Analyze the codes
  – Justify the test cases
Migrate to a 2-year acquisition cycle
• Dangers?
  – Time the milestones poorly on the calendar and miss out on the release of cutting-edge technology
• Difficult problem
  – How to schedule activities to maximize the likelihood of hitting publicly available products months in advance, while being blind to the intricacies of chip fabrication schedules and unforeseen recalls?
Codes considered for TI-11/12
ABAQUS, ABINIT, ACES, ADCIRC, ADH, ALE3D, ALEGRA, AMR, AVUS, CFD++, CFDSHIP-IOWA, COAMPS, COBALT, CP2K, CPMD, CTH, ETA, FDTD, FLAPW, FLUENT, GAMESS, GASP, GAUSSIAN, HYCOM, ICEPIC, LAMMPS, LS-DYNA, MATLAB, OOCORE, OVERFLOW, SHAMRC, SIERRA, STAR CCM+, VASP, WRF, XPATCH
TI-11/12 benchmarking applications
• ADCIRC – Coastal Circulation and Storm Surge model
  – 100% Fortran, MPI
  – Uses METIS library (C)
  – 205K LOC
• ALEGRA – Hydrodynamic and solid dynamics plus magnetic field and thermal transport
  – 96% C, 4% Fortran, MPI
  – 978K LOC
• AVUS (Cobalt-60) – Turbulent flow CFD code
  – Fortran, MPI, 29K LOC
• CTH – Shock physics code
  – ~58% Fortran/~42% C, MPI, 900K LOC
• GAMESS – Quantum chemistry code
  – Fortran, MPI, 330K LOC
• HYCOM – Ocean circulation modeling code
  – Fortran, MPI, 31K LOC
• ICEPIC – Particle-in-cell magnetohydrodynamics code
  – C, MPI, 350K LOC
• LAMMPS – Molecular dynamics code
  – C++, MPI, 45K LOC
[Figure legend: █ Predicted, █ Benchmarked]
Components of testing packages
• Applications tested on representative input sets
CODE     CASE        Distinguished   Time (sec)    Core Counts
                     Core Count      on DIAMOND
ADCIRC   baroclinic  1024            8959          512, 768, 1024, 1280, 1536, 1792, 2048
ADCIRC   hurricane   1280            2082          512, 768, 1024, 1280, 1536, 1792, 2048
ALEGRA   obliqueImp  1536            1640          1024, 1280, 1536, 1792, 2048
ALEGRA   explWire    256             944           256, 384, 512, 768, 1024
AVUS     waverider   1024            941           384, 512, 768, 1024, 1536
AVUS     turret-td   1280            1332          768, 1024, 1280, 1536, 2048
CTH      fixed-grid  1280            3399          768, 1024, 1280, 1536, 2048
CTH      amr         1280            2535          768, 1024, 1280, 1536, 2048
GAMESS   DFT-grad    256             4701          128, 192, 256, 384, 512
GAMESS   MP2-grad    512             2536          128, 256, 512, 768, 1024
GAMESS   CC-energy   1024            3658          512, 768, 1024, 1536, 2048
HYCOM    lrg         1353            3020          1001, 1353, 1516, 1770, 2045
ICEPIC   magnetron   384             2559          256, 384, 512, 768, 1024
ICEPIC   gyrotron    2048            3639          1536, 1792, 2048, 2304, 2560
LAMMPS   Au          1024            3182          128, 256, 384, 512, 1024, 1280, 1536, 2048
Some components of HPC procurement cycle
• Acquire new versions of codes
• Port codes to various machines
• Acquire test cases
• Develop or acquire accuracy checks
• Test codes, get times to compare
• Assemble package for vendors
Some components of HPC procurement cycle
• Run codes with test cases on installed DSRC machines
• Optimize! How fast can we go?
Some components of HPC procurement cycle
• We review the vendor submittal
• Anything suspicious?
• How do vendor times compare to ours? How did vendors optimize?
• How risky is the vendor’s proposal?
• Present our results
Components of testing packages (continued)
• Timers measure the elapsed running times
• Accuracy checks ensure the validity of output files
  – Often requires determination of acceptable error bounds (a minimal sketch follows)
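As a rough illustration only (not the actual HPCMP test harness), a wall-clock timer plus an accuracy check against a reference run might look like the sketch below. The function names, the reference-output format, and the 1e-6 tolerance are illustrative assumptions.

```python
import time

def run_and_time(run_case):
    """Return (result, elapsed seconds) for one benchmark case."""
    start = time.perf_counter()            # wall-clock timer
    result = run_case()
    elapsed = time.perf_counter() - start
    return result, elapsed

def within_error_bounds(computed, reference, rel_tol=1e-6):
    """Accuracy check: every output value must match the reference
    run to within an agreed relative error bound."""
    for c, r in zip(computed, reference):
        denom = max(abs(r), 1e-30)         # guard against divide-by-zero
        if abs(c - r) / denom > rel_tol:
            return False
    return True
```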
How the test packages are used
• Run all test cases on 5 different DSRC machines to acquire times
• Debug test packages
• Quantify variation across/within machines (see the sketch below)
• Compare times to proposed systems
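One simple way to express within-machine and across-machine variation is a coefficient of variation over repeated runs, as sketched below. The timings here are made-up numbers, not measured data.

```python
from statistics import mean, stdev

# Hypothetical repeated timings (seconds) of one test case on two machines;
# real data would come from the five DSRC systems listed on the next slide.
times_by_machine = {
    "Diamond": [8959.0, 8990.2, 8941.7],
    "Mana":    [9120.5, 9087.3, 9150.1],
}

for machine, times in times_by_machine.items():
    cv = stdev(times) / mean(times)        # within-machine variation
    print(f"{machine}: mean={mean(times):.1f}s  CV={cv:.3%}")

# Across-machine variation: spread of the per-machine mean times.
means = [mean(t) for t in times_by_machine.values()]
print(f"across machines: CV={stdev(means)/mean(means):.3%}")
```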
Machine attributes
Architectures Used in Study
DSRC   Name      Make  Model           Chip Set            Processor    Interconnect     Number    Cores     Operating
                                                           Speed (GHz)                   of Cores  per Node  System
ERDC   Diamond   SGI   Altix ICE       Intel Xeon QC       2.8          DDR4 InfiniBand  15360     8         SUSE Linux
MHPCC  Mana      Dell  PowerEdge M610  Intel Xeon QC       2.8          DDR InfiniBand   9216      8         Linux
NAVY   DaVinci   IBM   Power6          IBM P6 DC           4.7          DDR InfiniBand   4800      32        AIX
NAVY   Einstein  Cray  XT5             Opteron QC          2.3          Cray SeaStar2+   12736     8         CNL
ERDC   Garnet    Cray  XE6             AMD Opteron 64-bit  2.4          Cray Gemini      20224     16        CLE
RESULTS! Graphs of runtimes
Risk Assessment: Major Areas Assessed
• Compliance assessment
– Ability to follow benchmark rules
– Number of test case results provided
– Results within accuracy criteria
• Assessment of risk in meeting proposed times in acceptance tests
– Differences between benchmarked and proposed system
• Processor, interconnect, and I/O system differences
– Quality of estimation procedure
• Quality of explanation and soundness of estimation procedure
– Aggressiveness of final estimate
• Comparison with measured benchmark system times
• Comparison with predicted times
• Assessment of likelihood of users and/or developers using proposed code modifications
– Acceptability of proposed code modifications
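A rough sketch of one way the "aggressiveness" comparison could be expressed is shown below. The 0.7 threshold and the numbers are illustrative assumptions, not HPCMP evaluation criteria.

```python
# Illustrative only: flag vendor estimates that undercut both our measured
# benchmark time and our model-predicted time by a large margin.
def aggressiveness(proposed, measured, predicted, threshold=0.7):
    ratio_measured = proposed / measured     # proposed vs. measured benchmark time
    ratio_predicted = proposed / predicted   # proposed vs. predicted time
    aggressive = ratio_measured < threshold and ratio_predicted < threshold
    return ratio_measured, ratio_predicted, aggressive

r_m, r_p, flag = aggressiveness(proposed=1800.0, measured=3020.0, predicted=2750.0)
print(f"proposed/measured={r_m:.2f}, proposed/predicted={r_p:.2f}, aggressive={flag}")
```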
Benchmarking website
URL: http://www.benchmarking.hpc.mil/
Benchmarking website (continued)
• Narrative of website purpose, codes tested
• Heatmap of systems best suited for applications
Benchmarking website (continued)
• Brief description of application
• Brief description of test cases
Benchmarking website (continued)
• An example of how we made the heatmap for allocation choices (a rough sketch of the idea follows)
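The general idea, sketched below with made-up data, is to normalize each application's runtime by the fastest system so that a value of 1.0 marks the best-suited machine. This is an illustration of the concept, not the published methodology or the measured TI times.

```python
# Illustrative sketch: per-application suitability scores across systems.
runtimes = {                      # seconds, per (application, system) -- made-up values
    "ADCIRC": {"Diamond": 8959, "Garnet": 7400, "Einstein": 10100},
    "HYCOM":  {"Diamond": 3020, "Garnet": 2600, "Einstein": 3500},
}

for app, per_system in runtimes.items():
    best = min(per_system.values())
    row = {sys: round(best / t, 2) for sys, t in per_system.items()}
    print(app, row)               # 1.0 = best suited, smaller = slower
```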
Benchmarking website (continued)
Got a question? Want to suggest an improvement? Contact us.
Performance Team Members
• Mark Cowan – ERDC – Chair
• Larry Davis – HPCMPO
• Lloyd Slonaker – AFRL
• Tim Sell – AFRL
• Laura Brown – ERDC
• Mahbubur Rashid – ERDC
• Christine Cuicchi – NAVO
• Matt Grismer – AFRL
• Jerry Boatz – AFRL
Performance Team Advisors
• William Ward – HPCMPO
• Steve Finn – DTRA
• Carrie Leach – ERDC
• Paul Bennett – ERDC
• Tom Oppe – ERDC
• Henry Newman – Instrumental
• Michael Laurenzano – SDSC
• Bronis de Supinski – LLNL
• Joseph Swartz – LM
• Allan Snavely – SDSC
• Laura Carrington – SDSC
• Robert Pennington – NSF
• Nick Wright – NERSC
• James Ianni – ARL
Questions?
Contact me…
Mark Cowan
USACE ERDC ITL
Computational Analysis Branch
3909 Halls Ferry Road
Building 8000, Room 1255
Vicksburg, MS 39180
(601) 634-2665
Mark.A.Cowan@usace.army.mil
ADDENDA
AVUS: Code description
• CFD code, formerly COBALT_60
• Simulates 3-D turbulent viscous flow over irregular geometries
• Grid-based, reads a large grid file
• AVUS: 29K lines of Fortran 90 code
• Uses ParMETIS: 12K lines of C code
• Parallelism via MPI, no OpenMP
• Runs on Cray XT, IBM Power, SGI Altix, Linux clusters
CTH: Code description
• CTA: CSM (Computational Structural Mechanics)
• Shock Physics
• A two-step, 2nd-order accurate Eulerian algorithm is used to solve the mass, momentum, and energy conservation equations
• An explicit approach that does not require solving a linear system (see the sketch below)
• Has both static and adaptive mesh capabilities
• Parallelism via MPI
• 900K LOC, 58% FORTRAN and 42% C
• Uses NetCDF, supplied with the distribution
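To illustrate what "explicit, no linear system" means in general, the toy 1D upwind advection update below advances the new state directly from the old state with no matrix solve. It is not CTH's actual two-step, 2nd-order Eulerian scheme.

```python
# Toy explicit conservative update for du/dt + a*du/dx = 0 (a > 0, periodic).
# The new cell values follow directly from the old ones: no linear solve.
def explicit_step(u, a, dx, dt):
    n = len(u)
    return [u[i] - a * dt / dx * (u[i] - u[i - 1]) for i in range(n)]
```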
GAMESS: Code description
• CTA: CCM (Computational Chemistry, Biology, and Materials Science)
• Ab initio quantum chemistry
• Computes many energy integrals with molecular data in the form of atom positions and electron orbitals
• Communication depends on platform
  – LAPI, Sockets, SHMEM, MPI
• Code composition: 99% FORTRAN, 1% C
HYCOM: Code description
• CTA: Climate/Weather/Ocean Modeling and Simulation (CWO)
• A primitive equation ocean general circulation model
• Communication is MPI (MPI-2 is available)
• 100% FORTRAN
• Version 2.2.27
HYCOM: MPI-2 details
• HYCOM may be run with MPI or MPI-2
• MPI-2 is MPI with additional features such as parallel I/O, dynamic process management, and remote memory operations
• HYCOM utilizes the parallel I/O feature (a generic sketch follows)
• Parallel I/O times required starting with TI-10
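The sketch below shows generic MPI-2 parallel I/O, where every rank writes its own slice of an array into one shared file. It uses Python with mpi4py purely for illustration; HYCOM's actual I/O layer is Fortran and more involved.

```python
# Generic MPI-2 parallel I/O sketch (mpi4py), not HYCOM's Fortran I/O layer.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.full(4, rank, dtype=np.float64)            # this rank's slice of the field
fh = MPI.File.Open(comm, "field.dat",
                   MPI.MODE_WRONLY | MPI.MODE_CREATE)
offset = rank * local.nbytes                          # byte offset owned by this rank
fh.Write_at_all(offset, local)                        # collective parallel write
fh.Close()
```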
ICEPIC: Code description
• CTA: Computational Electromagnetics and Acoustics (CEA)
• Particle-in-cell plasma physics code
• Ions and electrons move under the influence of electromagnetic fields
• Particles are updated in a grid-free manner; grouped in cells which are periodically adjusted to preserve load balance
• Fields calculated on a structured, static grid and dual grid according to Maxwell's equations
• Can simulate plasmas contained in complex geometries
• Used in electromagnetic device design
• ~350K lines of code, C++/C
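The toy sketch below shows only the particle-push stage of a particle-in-cell code: look up the field at each particle's cell of a static grid and advance the particle. The full PIC cycle also deposits charge/current back to the grid and updates Maxwell's equations; this is not ICEPIC's electromagnetic solver.

```python
# Toy 1D electrostatic particle push on a periodic domain (illustrative only).
def pic_push(x, v, E_grid, dx, dt, q_over_m):
    length = len(E_grid) * dx
    new_x, new_v = [], []
    for xi, vi in zip(x, v):
        cell = int(xi / dx) % len(E_grid)   # locate the particle's cell
        vi = vi + q_over_m * E_grid[cell] * dt   # accelerate by the local field
        new_v.append(vi)
        new_x.append((xi + vi * dt) % length)    # move, wrap periodically
    return new_x, new_v
```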
LAMMPS: Code description
• CTA: CCM (Comp Chemistry, Biology, & Material Science)
• Classical molecular dynamics code that models particles in a liquid, solid, or gaseous state
• Calculates atomic velocities, positions, system energy, and temperature (see the sketch below)
  – After equilibration: surface tension, radial pressure, and phase change
  – Post-processing: pair-correlation function and diffusion coefficients
• All actions occur within a box (usually orthogonal)
• Distributed-memory message-passing parallelism (MPI)
• Highly portable C++
• Libraries needed: MPI and single-processor FFT
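As a minimal illustration of what a classical MD code computes each time step, the sketch below performs a velocity-Verlet update for a 1D set of particles. The forces() callable and the data layout are assumptions; LAMMPS's actual integrators, force fields, and domain decomposition are far richer.

```python
# Minimal velocity-Verlet time step for particles in 1D (illustrative only).
def verlet_step(x, v, f, mass, dt, forces):
    x_new = [xi + vi * dt + 0.5 * fi / mass * dt * dt
             for xi, vi, fi in zip(x, v, f)]
    f_new = forces(x_new)                     # recompute forces at new positions
    v_new = [vi + 0.5 * (fi + fni) / mass * dt
             for vi, fi, fni in zip(v, f, f_new)]
    return x_new, v_new, f_new
```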
ADCIRC: Code description
ADCIRC Coastal Circulation and Storm Surge Model
Solves time-dependent, free-surface circulation and transport problems in 2 and 3 dimensions. Uses the finite element method in space, which permits highly flexible, unstructured grids.
Typical ADCIRC applications have included:
• Modeling tides and wind-driven circulation,
• Analysis of hurricane storm surge and flooding,
• Dredging feasibility and material disposal studies,
• Larval transport studies, and
• Near-shore marine operations
“BASE” ALEGRA: Code description
ALE (Arbitrary Lagrangian-Eulerian) code: provides flexibility, accuracy, and reduced numerical dissipation over a pure Eulerian code; modern remeshing technology allows for robust mesh smoothing and control.
Hydrodynamics and solid dynamics
Models large distortions and strong shock propagation in multiple materials
Finite element code; descendent of PRONTO and uses some CTH Eulerian technology
Energy deposition and explosive burn models
Geometry: 2D/3D Cartesian, 2D cylindrical
Material models in ALEGRA:
• Equations of state
• Elastic-plastic models
• Fracture models
[Figure: pressure and temperature during formation of a jet from a shaped charge]
ALEGRA_MHD: Code description
All hydrodynamics/solid dynamics modules of "base" ALEGRA plus magnetic field and thermal transport effects
Lorentz forces, Joule heating, thermal transport, and simple models for radiating excess energy
2D and 3D versions
2D modeling with the magnetic flux density vector components in or out of the plane, with the corresponding current density out of or in the plane, respectively
3D uses a magnetic diffusion solution based on edge and face elements, which maintains the discrete divergence-free property of the flux during the magnetic solve and constrained transport remap stages
Lumped-element coupled circuit equations
Magnetic and thermal conduction
Advanced models for thermal and electrical conductivity
Emission model radiates excess energy when the medium is optically thin while accounting for reabsorption