Application Performance Tools for Linux

Advanced Computing Technology Center
The IBM High Performance Computing Toolkit
Guojing Cong
© 2005 IBM Corporation
IBM High Performance Computing Toolkit (HPCT)
 One consolidated package
 Components:
– Hardware Performance Monitor (HPM)
– Simulation Guided Memory Analyzer (SiGMA)
– MPI Profiler (MP_profiler)
– OpenMP Profiler (PompProf)
– Modular I/O Performance Tool (MIO)
– Xprofiler
– GUI integration tool with source code traceback (PeekPerf)
– Watson Sparse Matrix Package (WSMP) included
Our Vision
 A toolkit that spans various aspects of high performance
computing
– CPU profiling, memory behavior analysis, communication profiling,
I/O analysis and optimization
 Integrated performance monitoring and profiling environment
– a single consistent interface for all components
– enhanced functionality
• Binary instrumentation (without source code modification)
• Dynamic instrumentation
 Available on IBM Platforms
– AIX, Linux on Power (LoP), and BlueGene
Support Matrix
HPMCount & HPMlib
MPprofiler & MP-tracer
Xprofiler
SHMEM & SHMEMprofiler
MIO
PompProfiler
AIX Power
today (AIX 5L 5.1, 5.3)
today (AIX 4.3.3+)
today (AIX 5L 5.1)
today (AIX 5L 5.1)
today (AIX 5L 5.1)
today (AIX 5L 5.1)
Linux Power
Aug/05 (Linux 2.4 & 2.6)
May/05 (Linux 2.6)
Aug-Sep/05 (Linux 2.6)
N/A
TBT (Linux 2.6)
Linux JS20
Aug/05 (Linux 2.4 & 2.6)
May/05 (Linux 2.6)
Aug-Sep/05 (Linux 2.6)
Linux BG/L
Aug/05
today
Aug/05
PeekPerf
Watson Sparse Matrix Package
today (AIX 4.3.3+)
today (AIX 4.3.3+)
today (AIX 5L 5.1)
N/A
Aug-Sep/05 (Linux 2.6)
TBT
TBT (Linux 2.6)
N/A
TBT (Linux 2.6)
N/A
Aug-Sep/05 (Linux 2.6)
TBT
TBT (Linux 2.6)
N/A
TBT
N/A
N/A
today
N/A
SiGMA
Outline
 Xprofiler
 HPM
 MP Profiler
 OpenMP Profiler
 MIO
Xprofiler
 CPU profiling tool similar to gprof
 Profiles both serial and parallel applications
 Uses procedure-profiling information to construct a graphical display of the functions within an application
 Provides quick access to the profiled data and helps users identify the most CPU-intensive functions
 Based on sampling (with support from both compiler and kernel)
 Charges execution time to source lines and shows disassembly code
Xprofiler: Main Display
 Width of a bar: time including called routines
 Height of a bar: time excluding called routines
 Call arrows labeled with number of calls
 Overview window for easy navigation (View → Overview)
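The two bar dimensions correspond to what profilers usually call inclusive and exclusive time. The following is a minimal sketch of how the two relate, assuming a simple call tree in which every routine has one caller and there is no recursion. The struct and function names are hypothetical and are not part of Xprofiler.

```c
#include <stddef.h>

/* Hypothetical call-tree node for illustration; not an Xprofiler type. */
struct func_node {
    const char *name;
    double self_ticks;                 /* time excluding called routines (bar height) */
    const struct func_node **callees;  /* routines called by this function */
    size_t n_callees;
};

/* Time including called routines (bar width): the function's own time
   plus the inclusive time of everything it calls.
   Assumes a tree: one caller per routine, no recursion. */
double inclusive_ticks(const struct func_node *f) {
    double total = f->self_ticks;
    for (size_t i = 0; i < f->n_callees; i++) {
        total += inclusive_ticks(f->callees[i]);
    }
    return total;
}
```

For a leaf routine the two values are equal, so a wide, short bar indicates a routine whose cost lies mostly in the routines it calls.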
Xprofiler: Source Code Window
 Source code window displays source code with time profile (in ticks = 0.01 sec)
 Access
– Select function in main display → context menu
– Select function in flat profile → Code Display → Show Source Code
Xprofiler - Disassembler Code
HPM
 Provides comprehensive reports of hardware events that are
critical to performance
– Accurate and low overhead
– Comprehensive
• E.g., number of floating-point instructions executed, cache
misses, TLB misses
 Derived metrics
– correlate the behavior of the application to one or more of the
hardware components
 Thread-level support
 Includes
– hpmcount, libhpm, hpmstat
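Derived metrics are ratios computed from raw event counts. The sketch below shows three common ones under stated assumptions: the struct, its field names, and the choice of events are hypothetical and do not reproduce the HPM report format or the libhpm API.

```c
/* Hypothetical raw event counts for one instrumented region.
   Field names are illustrative; a real HPM report lists many more events. */
struct sample_counts {
    unsigned long long cycles;
    unsigned long long instructions;
    unsigned long long fp_ops;        /* floating-point instructions executed */
    unsigned long long loads;
    unsigned long long load_misses;   /* loads that missed in the cache */
    double wall_seconds;
};

/* Instructions completed per cycle. */
double instructions_per_cycle(const struct sample_counts *c) {
    return c->cycles ? (double)c->instructions / (double)c->cycles : 0.0;
}

/* Millions of floating-point operations per second of wall-clock time. */
double mflops(const struct sample_counts *c) {
    return c->wall_seconds > 0.0 ? (double)c->fp_ops / c->wall_seconds / 1e6 : 0.0;
}

/* Fraction of loads that missed in the cache. */
double load_miss_ratio(const struct sample_counts *c) {
    return c->loads ? (double)c->load_misses / (double)c->loads : 0.0;
}
```

Each ratio ties a raw count to one hardware component (pipeline, floating-point unit, cache), which is how derived metrics relate application behavior to the hardware.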
HPM Visualization Using PeekPerf
MP_profiler
 A set of libraries that collect profiling data for MPI and
TurboSHMEM applications
– Implements wrappers using the PMPI interface
 Reports performance metrics, e.g.,
– time spent in MPI function calls
– message sizes
 Visualization tools help users identify performance
bottlenecks
– peekperf maps performance metrics back to the source code
– peekview gives a visual representation of the overall
computation and communication pattern of the system
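The MPI standard exposes every routine under a second name with a PMPI_ prefix, so a profiling library can define MPI_Send itself, record its measurements, and forward to PMPI_Send. The sketch below shows that wrapper pattern without requiring an MPI installation: real_send stands in for PMPI_Send and profiled_send for the library's MPI_Send. All names here are hypothetical, and this is not MP_profiler's implementation.

```c
#include <stddef.h>
#include <time.h>

/* Stand-in for the underlying entry point (PMPI_Send in an MPI build).
   It does nothing here so the sketch stays self-contained. */
int real_send(const void *buf, size_t bytes, int dest) {
    (void)buf; (void)bytes; (void)dest;
    return 0;
}

/* Metrics accumulated per wrapped routine: call count, bytes, time. */
struct send_stats {
    unsigned long calls;
    unsigned long long bytes;
    double seconds;
};
struct send_stats g_send_stats = {0, 0, 0.0};

/* clock() measures CPU time; a real MPI wrapper would more likely use MPI_Wtime(). */
static double now_seconds(void) {
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Wrapper the application ends up calling (MPI_Send in an MPI build). */
int profiled_send(const void *buf, size_t bytes, int dest) {
    double t0 = now_seconds();
    int rc = real_send(buf, bytes, dest);   /* forward to the real routine */
    g_send_stats.seconds += now_seconds() - t0;
    g_send_stats.calls += 1;
    g_send_stats.bytes += bytes;
    return rc;
}
```

Because the interception happens at link time, the application source does not change; it only needs to be linked against the wrapper library.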
MP_Profiler Visualization Using PeekPerf
MP_Tracer Visualization Using PeekPerf
POMP Profiler (PompProf)
 Generates a detailed profile describing overheads and time
spent by each thread in three key regions of the parallel
application:
– Parallel regions
– OpenMP loops inside a parallel region
– User-defined functions
 Profile data is presented in the form of an XML file that can
be visualized with PeekPerf
DPOMP
 Dynamically instruments OpenMP applications
 Can add performance instrumentation to binaries without
requiring access to source code or recompilation
 Based on dynamic probes using DPCL
PompProf Visualization Using PeekPerf
Modular I/O Performance Tool (MIO)
 I/O Analysis
– Trace module
– Summary of File I/O Activity + Binary Events File
– Low CPU overhead
 I/O Performance Enhancement Library
– Prefetch module (optimizes asynchronous prefetch and write-behind)
– System Buffer Bypass capability
– User-controlled pages (size and number)
 Recoverable Error Handling
– Recover module (monitors return values and errno + reissues failed requests)
 Remote Data Server
– Remote module (simple socket protocol for moving data)
 Shared object library for AIX
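The Recover module's behavior (watch the return value and errno, then reissue a failed request) can be sketched as a small retry loop. This is an illustration only: the function names, the choice of EINTR and EAGAIN as retryable errors, and the retry limit are assumptions, not MIO's actual policy or API.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Any read-like request: returns bytes transferred, or -1 with errno set. */
typedef ssize_t (*io_request_fn)(int fd, void *buf, size_t n);

/* Issue a request and reissue it while it fails with a transient errno.
   At most 1 + max_retries attempts are made; *attempts reports the count. */
ssize_t reissue_on_error(io_request_fn request, int fd, void *buf, size_t n,
                         int max_retries, int *attempts) {
    ssize_t result;
    int tries = 0;
    do {
        errno = 0;
        result = request(fd, buf, n);
        tries++;
    } while (result < 0 && (errno == EINTR || errno == EAGAIN) &&
             tries <= max_retries);
    if (attempts != NULL) {
        *attempts = tries;
    }
    return result;
}

/* Test double: fails with EINTR a set number of times, then succeeds. */
int flaky_failures_left = 2;
ssize_t flaky_request(int fd, void *buf, size_t n) {
    (void)fd;
    if (flaky_failures_left > 0) {
        flaky_failures_left--;
        errno = EINTR;
        return -1;
    }
    memset(buf, 'x', n);
    return (ssize_t)n;
}
```

Passing the request as a function pointer keeps the retry policy separate from the I/O call, so the same loop can wrap read, write, or a test double such as flaky_request.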
MIO User Code Interface
#define open64(a,b,c) MIO_open64(a,b,c,0)
#define read MIO_read
#define write MIO_write
#define close MIO_close
#define lseek64 MIO_lseek64
#define fcntl MIO_fcntl
#define ftruncate64 MIO_ftruncate64
#define fstat64 MIO_fstat64
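These macros redirect the application's I/O calls at compile time, so the calling code itself stays unchanged. The sketch below shows the same preprocessor technique with a locally defined stand-in, demo_read, which is hypothetical and is not the MIO_read function. The open64 mapping on the slide additionally passes a fourth argument of 0, which this sketch does not reproduce.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Counters kept by the stand-in wrapper. */
long demo_read_calls = 0;
long demo_bytes_read = 0;

/* Hypothetical stand-in for an interposed read. It is defined BEFORE the
   macro below, so the read() inside it still reaches the system call. */
ssize_t demo_read(int fd, void *buf, size_t n) {
    ssize_t r = read(fd, buf, n);
    demo_read_calls++;
    if (r > 0) {
        demo_bytes_read += r;
    }
    return r;
}

/* From here on, every read(...) in the source expands to demo_read(...). */
#define read demo_read

/* Unmodified application code: it still says read(), but is now redirected. */
ssize_t app_read_header(int fd, char *hdr, size_t n) {
    return read(fd, hdr, n);
}
```

The order matters: the wrapper must be compiled before the macro takes effect, otherwise the wrapper would call itself.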
MIO Trace Module (sample partial text output)
Trace close : program <-> pf : /bmwfs/cdh108.T20536_13.SCR300 : (281946/2162.61)=130.37 mbytes/s
current size=0 max_size=16277
mode =0777 sector size=4096 oflags =0x302=RDWR CREAT TRUNC
open         1     0.01
write   478193   462.10   59774   59774  131072
read   1777376  1700.48  222172  222172  131072
seek    911572     2.83
fcntl        3     0.00
trunc       16     0.40
close        1     0.03
size    127787  131072  131072
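The rate in the trace header is the quotient of the two numbers in parentheses. The sketch below reproduces that arithmetic. It reads 59774 and 222172 as the megabytes moved by write and read, which is an interpretation of the unlabeled columns (the two values do sum to 281946); the function name is hypothetical.

```c
/* Average transfer rate: megabytes moved divided by elapsed seconds.
   Returns 0 when no time was recorded, to avoid dividing by zero. */
double mbytes_per_second(double mbytes, double seconds) {
    return seconds > 0.0 ? mbytes / seconds : 0.0;
}
```

Calling mbytes_per_second(59774.0 + 222172.0, 2162.61) gives roughly 130.37, matching the figure printed in the header.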
MSC.Nastran V2001
[Bar chart: Time (seconds), axis 0 to 60,000; series: Elapsed, CPU time; runs: no MIO, with MIO]
Benchmark: SOL 111, 1.7M DOF, 1578 modes, 146 frequencies, residual flexibility and acoustics. 120 GB of disk space.
Machine: 4-way, 1.3 GHz p655, 32 GB with 16 GB large pages, JFS striped on 16 SCSI disks.
MSC.Nastran: V2001.0.9 with large pages, dmp=2 parallel=2 mem=700mb. The run with MIO used mio=1000mb.
6.8 TB of I/O in 26666 seconds is an average of about 250 MB/sec
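The average rate follows directly from the two quoted figures. As a worked check, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes):

```latex
\[
\frac{6.8 \times 10^{12}\ \text{B}}{26666\ \text{s}}
  \approx 2.55 \times 10^{8}\ \text{B/s}
  \approx 255\ \text{MB/s}
\]
```

This is consistent with the rounded figure of about 250 MB/sec.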
Problems that we are considering
 Performance profiling and monitoring for scientific
applications on large systems
– Selective generation and reporting of profiling data
– Management and analysis of large amounts of performance data
 Composite profiling and presentation
– CPU profiling
– Hardware Performance Counter profiling
– Communication profiling