Interactive and semiautomatic performance evaluation

W. Funika, B. Baliś, M. Bubak, R. Wismueller
Outline
Motivation
Tools Environment Architecture
Tools Extensions for GRID
Semiautomatic Analysis
Prediction model for Grid execution
Summary
Motivation
Large number of tools, but mainly off-line and non-Grid oriented ones
Highly dynamic character of Grid-bound performance data
Tool development needs a monitoring system
  – accessible via a well-defined interface
  – with a comprehensive range of possibilities
  – not only to observe but also to control
Recent initiatives (DAMS – no performance analysis, PARMON – no message passing, OMIS)
Re-usability of existing tools
Enhancing the functionality to support new programming models
Interoperability of tools to support each other
When interactive tools are difficult or impossible to apply, (semi)automatic ones are of help
Component Structure of Environment
Task 2.4 - Workflow and Interfaces
[Workplan chart: three development phases of tasks 2.1-2.5, each followed by internal integration, testing and refinement; requirements and feedback come from WP1, WP3 and WP4, mainly from the local Grid testbed and later from the full Grid testbed; milestones M2.1-M2.4 and deliverables D2.1-D2.7 (state-of-the-art report, performance data model and interface to Grid monitoring services, design of interfaces between tools, design of the performance analysis tool, 1st and 2nd prototypes with reports, internal progress reports, internal final version, final demo with report) are spread over project months 3-36]
Application analysis
Basic blocks of all applications: dataflow for input and output
CPU-intensive cores
Parallel tasks / threads
Communication
Basic structures of the (Cross-)Grid
Flow charts, diagrams, basic blocks from the applications
Optional information on application’s design patterns: e.g. SPMD, master/worker, pipeline, divide & conquer
Categories of performance evaluation tools
Interactive, manual performance analysis
  – Off-line tools
    • trace based (combined with visualization)
    • profile based (no time reference)
    • problem: strong intrusion when measurements are fine grained
  – On-line tools
    • possible definition (restriction) of the measurements at run-time
    • suitable for cyclic programs: new measurements based on the previous results => automation of the bottleneck search is possible
Semi-automatic and automatic tools
  • batch-oriented use of the computational environment (e.g. Grid)
  • basis: a search model that enables refining of measurements
Defining new functionality of performance tool
Types of measurements
Types of presentation
Levels of measurement granularity
Measurement scopes:
  – Program
  – Procedure
  – Loop
  – Function call
  – Statement
  • Code region identification
  • Object types to be handled within an application
Definition and design work
architecture of the tools, based on their functional description
hierarchy and naming policy of objects to be monitored
the tool/monitor interface, based on expressing measurement requests in terms of the standard services of the monitoring specification (a sketch of such a request follows this list)
the filtering and grouping policy for the tools
functions for handling the measurement requests and the modes of their operation
granularity of measurement representation and visualization modes
the modes of delivering performance data for particular measurements
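A minimal sketch of what such a measurement request could look like on the tool side. Every type and field name below is hypothetical; it only illustrates the kind of information (measurement type, scope, monitored objects, delivery mode) the tool/monitor interface has to carry before it is translated into the monitoring system's standard services.

    #include <stdio.h>

    /* Hypothetical measurement request as a tool could hand it to the
     * monitoring layer; not the actual tool/monitor interface. */
    typedef enum { MEAS_CPU_TIME, MEAS_COMM_DELAY, MEAS_DATA_VOLUME } MeasType;
    typedef enum { SCOPE_PROGRAM, SCOPE_PROCEDURE, SCOPE_LOOP,
                   SCOPE_FUNCTION_CALL, SCOPE_STATEMENT } MeasScope;
    typedef enum { DELIVER_ON_REQUEST, DELIVER_PERIODIC, DELIVER_ON_EVENT } DeliveryMode;

    typedef struct {
        MeasType     type;        /* what is measured                        */
        MeasScope    scope;       /* granularity of the measured code region */
        const char  *objects;     /* tokens of the monitored objects         */
        DeliveryMode delivery;    /* how results are delivered to the tool   */
        double       interval_s;  /* sampling interval for the periodic mode */
    } MeasurementRequest;

    int main(void)
    {
        /* e.g. CPU time of one procedure in two processes, sampled every 2 s */
        MeasurementRequest req = { MEAS_CPU_TIME, SCOPE_PROCEDURE,
                                   "[p_1,p_2]", DELIVER_PERIODIC, 2.0 };
        printf("request: type=%d scope=%d objects=%s every %.1f s\n",
               req.type, req.scope, req.objects, req.interval_s);
        return 0;
    }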
Modes of delivering performance data
Interoperability of tools
"Capability to run multiple tools concurrently and apply them to the same application"
Motivation:
- concurrent use of tools for different tasks
- combined use can lead to additional benefits
- enhanced modularity
Problems:
- Structural conflicts: due to incompatible monitoring modules
- Logical conflicts: e.g. a tool modifies the state of an object while another tool still keeps outdated information about it
Semiautomatic Analysis
Why (semi-)automatic on-line performance evaluation?
  – Grid: exact performance characteristics of computing resources and network often unknown to the user
  – ease of use: guide programmers to performance problems
tool should assess actual performance w.r.t. achievable performance
interactive applications not well suited for tracing
  – applications run 'all the time'
  – detailed trace files would be too large
  – on-line analysis can focus on specific execution phases
  – detailed information via selective refinement
The APART approach
object oriented performance data model
  – available performance data
  – different kinds and sources, e.g. profiles, traces, ...
  – make use of existing monitoring tools
formal specification of performance properties
  – possible bottlenecks in an application
  – specific to programming paradigm
  – APART specification language (ASL)
specification of automatic analysis process
APART specification language
specification of a performance property has three parts (illustrated in the sketch after this list):
  – CONDITION: when does a property hold?
  – CONFIDENCE: how sure are we? (depends on data source) (0-1)
  – SEVERITY: how important is the property?
    → basis for determining the most important performance problems
specification can combine different types of performance data
  – data from different hosts => global properties, e.g. load imbalance
templates for simplified specification of related properties
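Purely as an illustration of the three parts, written in C rather than ASL; the data structure and the communication-cost example are assumptions, not taken from the APART specification.

    #include <stdio.h>

    /* Hypothetical per-region summary, e.g. aggregated from a profile. */
    typedef struct {
        double exec_time;    /* total execution time of the region [s] */
        double comm_time;    /* time spent in communication calls [s]  */
    } RegionProfile;

    /* Result of evaluating one performance property. */
    typedef struct {
        int    holds;        /* CONDITION: does the property hold?       */
        double confidence;   /* CONFIDENCE: 0..1, depends on data source */
        double severity;     /* SEVERITY: importance, used for ranking   */
    } PropertyResult;

    /* "Communication cost" property: holds if any time is spent in
     * communication; severity = fraction of execution time lost there. */
    static PropertyResult communication_cost(const RegionProfile *p)
    {
        PropertyResult r;
        r.holds      = p->comm_time > 0.0;
        r.confidence = 1.0;   /* exact measurement, not an estimate */
        r.severity   = r.holds ? p->comm_time / p->exec_time : 0.0;
        return r;
    }

    int main(void)
    {
        RegionProfile prof = { 120.0, 42.0 };   /* made-up numbers */
        PropertyResult res = communication_cost(&prof);
        if (res.holds)
            printf("communication_cost: severity %.2f, confidence %.1f\n",
                   res.severity, res.confidence);
        return 0;
    }

Ranking all evaluated properties by SEVERITY is what lets the tool point the user at the most important problems first.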
Supporting different performance analysis goals
performance analysis tool may be used to
  – optimize an application (independent of execution platform)
  – find out how well it runs on a particular Grid configuration
can be supported via different definitions of SEVERITY
e.g.: communication cost (both variants are written out below)
  – relative amount of execution time spent for communication
  – relative amount of available bandwidth used for communication
also provides hints why there is a performance problem (resources not well used vs. resources exhausted)
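Written out, with t_comm, t_exec, B_used and B_avail as assumed notation for communication time, total execution time, used bandwidth and available bandwidth:

    \mathrm{severity}_{\mathrm{time}} = \frac{t_{\mathrm{comm}}}{t_{\mathrm{exec}}}
    \qquad
    \mathrm{severity}_{\mathrm{bw}} = \frac{B_{\mathrm{used}}}{B_{\mathrm{avail}}}

Comparing the two gives the hint mentioned above: much communication time with low bandwidth utilisation points to resources not being well used, while utilisation close to the available bandwidth points to exhausted resources.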
Analytical model for predicting performance on GRID
Extract the relationship between the application and execution features, and the actual execution time.
Focus on the relevant kernels in the applications included in WP1.
Assuming message-passing paradigm (in particular MPI).
Taking features into a model
HW features:
  – Network speeds
  – CPU speeds
  – Memory bandwidth
Application features:
  – Matrix and vector sizes
  – Number of the required communications
  – Size of these communications
  – Memory access patterns
Building a model
Through statistical analysis, a model to predict the influence of several aspects on the execution of the kernels will be extracted.
Then a particular model for each aspect will be obtained; a linear combination of them will be used to predict the whole execution time (see the formula below).
Every particular model will be a function of the above features.
Aspects to be included in the model:
  – computation time as a function of the above features
  – memory access time as a function of the features
  – communication time as a function of the features
  – synchronization time as a function of the features
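A compact way of writing this decomposition; the symbols are assumed notation, with f standing for the vector of HW and application features listed above and the alpha coefficients fitted by the statistical analysis:

    T_{\mathrm{exec}}(f) \;\approx\;
      \alpha_{\mathrm{comp}}\,T_{\mathrm{comp}}(f)
    + \alpha_{\mathrm{mem}}\,T_{\mathrm{mem}}(f)
    + \alpha_{\mathrm{comm}}\,T_{\mathrm{comm}}(f)
    + \alpha_{\mathrm{sync}}\,T_{\mathrm{sync}}(f)

where each term is one of the particular models (computation, memory access, communication, synchronization time).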
WP2.4 Tools w.r.t. DataGrid WP3

Requirement                    GRM                  PATOP/OMIS
1 Scalability (#u, #r, #e)     no, no, yes          no, no, yes
2 Intrusiveness                low (how much?)      low (0-10 %)
3 Portability                  no                   yes
4 Extendibility:
    new mon. modules           possible             yes
    new data types             yes (ev. def.)       yes
5 Communication                push                 query/response
6 Metrics                      application only     comprehensive
7 Archive handling             no                   possible (TATOO)
Summary
New requirements for performance tools in Grid
Adaptation of the interactive performance evaluation tool to GRID
  – New measurements
  – New dialogue window
  – New presentations
  – New objects
Need for semiautomatic performance analysis
  – Performance properties
  – APART specification language
  – Search strategy
Prediction model construction
Performance Measurements with PATOP
Possible Types of Measurement:
  – CPU time
  – Delay in Remote Procedure Calls (system calls executed on the front-end)
  – Delay in send and receive calls
  – Amount of data sent and received
  – Time in marked areas (code regions; see the sketch below)
  – Number of executions of a specific point in the source code
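To make "marked areas" and "specific points" concrete, a minimal sketch at the source level; the marker macros are hypothetical placeholders, since in PATOP the actual data are collected by the monitoring system (the OCM) rather than by code inserted by the user.

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical source-level markers, for illustration only. */
    static clock_t region_start;
    static long    point_count;

    #define REGION_BEGIN()    (region_start = clock())
    #define REGION_END(name)  printf("%s: %.3f s in marked area\n", (name), \
                                     (double)(clock() - region_start) / CLOCKS_PER_SEC)
    #define COUNT_POINT()     (point_count++)

    int main(void)
    {
        REGION_BEGIN();                    /* start of a marked code region       */
        for (long i = 0; i < 1000000; i++)
            COUNT_POINT();                 /* a specific point in the source code */
        REGION_END("loop");
        printf("point executed %ld times\n", point_count);
        return 0;
    }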
Scope of Measurement
System Related:
  – Whole computing system
  – Individual nodes
  – Individual threads
  – Pairs of nodes (communication partners, for send/receive)
  – Set of nodes specified by a performance condition
Program Related:
  – Whole program
  – Individual functions
PATOP
Performance evaluation tools on top of the OCM
On-line Monitoring Interface Specification
The interface should provide the following properties:
  – support for interoperable tools
  – efficiency (minimal intrusion, scalability)
  – support for on-line monitoring (new objects, control)
  – platform-independence (HW, OS, programming library)
  – usability for any kind of run-time tool (observing/manipulating, interactive/automatic, centralized/distributed)
Object based approach to monitoring
observed system is a hierarchical set of objects:
  1. classes: nodes, processes, threads, messages, and message queues
  2. node/process model suitable for DMPs, NOWs, SMPs, and SMP clusters
access via abstract identifiers (tokens)
services observe and manipulate objects (see the example requests below):
  1. OMIS core services: platform independent
  2. others: platform (HW, OS, environment) specific extensions
tools define their own view of the observed system
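As an illustration of how such services are addressed through tokens, two requests in the general OMIS style (an optional event part, a colon, then a list of actions). The service names follow the OMIS 2.0 naming scheme from memory and should be checked against the specification, so treat them as assumptions.

    #include <stdio.h>

    int main(void)
    {
        /* unconditional request: obtain information about two nodes */
        const char *info_req  = ": node_get_info([n_1,n_2], 0x1)";

        /* conditional request: whenever a thread of process p_1 enters
         * MPI_Send, stop it (observation combined with control) */
        const char *event_req =
            "thread_has_started_lib_call([p_1], \"MPI_Send\"): "
            "thread_stop([p_1])";

        /* a real tool would pass these strings to the monitoring system
         * (the OCM); here they are only printed */
        printf("%s\n%s\n", info_req, event_req);
        return 0;
    }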
Classification of overheads
Synchronisation (e.g. barriers and locks)
  – coordination of accessing data, maintaining consistency
Control of parallelism (e.g. fork/join operations and loop scheduling)
  – control and manage parallelism of a program (user, compiler)
Loss of parallelism – imperfect parallelisation
  – un- or partially parallelised code, replicated code
Additional computation – changes to sequential code to increase parallelism or data locality
  – e.g. eliminating data dependences
Data movement
  – any data transfer within a process or between processes
Interoperability of PATOP and DETOP
PATOP provides high-level performance measurement and visualisation
DETOP provides source-code level debugging
Possible scenarios:
  – Erroneous behaviour observed via PATOP
    → suspend the application with DETOP, examine the source code
  – Measurement of execution phases
    → start/stop a measurement at a breakpoint
  – Measurement on dynamic objects
    → start a measurement at a breakpoint when the object is created