Interactive and semiautomatic performance evaluation
W. Funika, B. Baliś, M. Bubak, R. Wismueller

Outline
– Motivation
– Tools environment architecture
– Tools extensions for the Grid
– Semiautomatic analysis
– Prediction model for Grid execution
– Summary

Motivation
– Large number of tools, but mainly off-line and non-Grid-oriented ones
– Highly dynamic character of Grid-bound performance data
– Tool development needs a monitoring system that is
  • accessible via a well-defined interface
  • equipped with a comprehensive range of possibilities
  • able not only to observe but also to control
– Recent initiatives (DAMS – no performance analysis, PARMON – no message passing, OMIS)
– Re-usability of existing tools
– Enhancing the functionality to support new programming models
– Interoperability of tools, so that they can support each other
– When interactive tools are difficult or impossible to apply, (semi)automatic ones are of help

Component Structure of Environment
(architecture diagram)

Task 2.4 – Workflow and Interfaces
(timeline diagram: three development cycles with internal integration, testing and refinement between months 3 and 36; milestones M2.1–M2.4 and deliverables D2.1–D2.7 covering the state of the art, the interface to Grid monitoring services and the performance data model, the design of the performance analysis tool and of the inter-tool interfaces, the 1st and 2nd prototypes with reports, internal progress reports, the final version and the final demo with report; requirements come from WP1, feedback mainly from the local and the full Grid testbeds and from WP1, WP3 and WP4)

Application analysis
– Basic blocks of all applications:
  • dataflow for input and output
  • CPU-intensive cores
  • parallel tasks / threads
  • communication
– Basic structures of the (Cross-)Grid
– Flow charts, diagrams, basic blocks from the applications
– Optional information on the application's design patterns, e.g. SPMD, master/worker, pipeline, divide & conquer

Categories of performance evaluation tools
– Interactive, manual performance analysis
  • Off-line tools
    - trace based (combined with visualization)
    - profile based (no time reference)
    - problem: strong perturbation of the application when measurements are fine-grained
  • On-line tools
    - measurements can be defined (and restricted) at run-time
    - suitable for cyclic programs: new measurements are based on the previous results
    => automation of the bottleneck search is possible
– Semi-automatic and automatic tools
  • batch-oriented use of the computational environment (e.g. the Grid)
  • basis: a search model that enables successive refinement of measurements (see the sketch below)
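The search-model idea can be illustrated with a minimal, purely hypothetical sketch: coarse measurements are requested first, and only the scopes that look expensive are instrumented in more detail in the next iteration. The monitor interface and all names below are invented for illustration, not an existing API.

    # Hypothetical sketch of a semi-automatic bottleneck search by measurement
    # refinement; "monitor" stands for some on-line monitoring interface.
    def search_bottlenecks(monitor, root_scopes, threshold=0.2, max_depth=3):
        """Start with coarse scopes (whole program, nodes) and refine the
        measurement only where a scope consumes a suspicious share of time."""
        suspects, frontier = [], list(root_scopes)
        for _ in range(max_depth):
            requests = [monitor.request(s, metric="time_share") for s in frontier]
            results = monitor.collect(requests)      # wait for the next sampling interval
            frontier = []
            for scope, share in results.items():
                if share < threshold:
                    continue                         # cheap scope: stop refining it
                children = scope.children()          # nodes -> processes -> functions ...
                if children:
                    frontier.extend(children)        # refine measurements here
                else:
                    suspects.append((scope, share))  # leaf scope: report as a bottleneck
        return sorted(suspects, key=lambda s: -s[1])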
Defining new functionality of the performance tool
– Types of measurements
– Types of presentation
– Levels of measurement granularity
– Measurement scopes: program, procedure, loop, function call, statement
– Code region identification
– Object types to be handled within an application

Definition and design work
– architecture of the tools, based on their functional description
– hierarchy and naming policy of the objects to be monitored
– the tool/monitor interface, based on expressing measurement requests in terms of the monitoring specification's standard services
– the filtering and grouping policy for the tools
– functions for handling the measurement requests and the modes of their operation
– granularity of measurement representation and visualization modes
– the modes of delivering performance data for particular measurements

Modes of delivering performance data
(diagram)

Interoperability of tools
"Capability to run multiple tools concurrently and apply them to the same application"
Motivation:
– concurrent use of tools for different tasks
– combined use can lead to additional benefits
– enhanced modularity
Problems:
– structural conflicts, due to incompatible monitoring modules
– logical conflicts, e.g. a tool modifies the state of an object while another tool still keeps outdated information about it

Semiautomatic Analysis
Why (semi-)automatic on-line performance evaluation?
– Grid: exact performance characteristics of computing resources and network are often unknown to the user
– ease of use: guide programmers to performance problems
  • the tool should assess actual performance w.r.t. achievable performance
– interactive applications are not well suited for tracing
  • applications run 'all the time'
  • detailed trace files would be too large
– on-line analysis can focus on specific execution phases
  • detailed information via selective refinement

The APART approach
– object-oriented performance data model
  • available performance data
  • different kinds and sources, e.g. profiles, traces, ...
  • makes use of existing monitoring tools
– formal specification of performance properties
  • possible bottlenecks in an application
  • specific to the programming paradigm
  • APART specification language (ASL)
– specification of the automatic analysis process

APART specification language
– the specification of a performance property has three parts:
  • CONDITION: when does the property hold?
  • CONFIDENCE: how sure are we? (depends on the data source) (0–1)
  • SEVERITY: how important is the property?
    - basis for determining the most important performance problems
– a specification can combine different types of performance data
  • data from different hosts => global properties, e.g. load imbalance
– templates for simplified specification of related properties

Supporting different performance analysis goals
– a performance analysis tool may be used to
  • optimize an application (independently of the execution platform)
  • find out how well it runs on a particular Grid configuration
– both goals can be supported via different definitions of SEVERITY, e.g. for communication cost:
  • relative amount of execution time spent on communication
  • relative amount of the available bandwidth used for communication
– this also provides hints why there is a performance problem (resources not well used vs. resources exhausted); see the sketch below
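The three-part property structure and the two SEVERITY variants can be made concrete with an illustrative rendering in plain Python (not actual ASL syntax; the summary record and its fields are assumptions, not the APART data model):

    from dataclasses import dataclass

    @dataclass
    class RegionSummary:
        """Assumed per-region summary data for one process (illustrative only)."""
        exec_time: float        # total time spent in the region [s]
        comm_time: float        # time spent in communication calls [s]
        bytes_sent: float       # data volume sent from the region [B]
        avail_bandwidth: float  # achievable network bandwidth [B/s]
        from_trace: bool        # True if derived from a trace, False if from a profile

    def communication_cost(r: RegionSummary, threshold: float = 0.1):
        """'Communication cost' property in the CONDITION/CONFIDENCE/SEVERITY style."""
        # CONDITION: does the property hold at all?
        condition = r.exec_time > 0 and r.comm_time / r.exec_time > threshold
        # CONFIDENCE: traces give exact timings, profiles only aggregated ones.
        confidence = 1.0 if r.from_trace else 0.8
        # SEVERITY, definition 1 (optimize the application):
        # relative amount of execution time spent on communication.
        severity_app = r.comm_time / r.exec_time if r.exec_time > 0 else 0.0
        # SEVERITY, definition 2 (how well does it run on this Grid configuration?):
        # relative amount of the available bandwidth actually used.
        used_bw = r.bytes_sent / r.comm_time if r.comm_time > 0 else 0.0
        severity_grid = used_bw / r.avail_bandwidth
        return condition, confidence, {"application": severity_app, "grid": severity_grid}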
Analytical model for predicting performance on the Grid
– Extract the relationship between the application and execution features and the actual execution time.
– Focus on the relevant kernels in the applications included in WP1.
– Assume a message-passing paradigm (in particular MPI).

Taking features into a model
– Hardware features:
  • network speeds
  • CPU speeds
  • memory bandwidth
– Application features:
  • matrix and vector sizes
  • number of the required communications
  • size of these communications
  • memory access patterns

Building a model
– Through statistical analysis, a model predicting the influence of several aspects on the execution of the kernels will be extracted.
– A particular model will be obtained for each aspect; a linear combination of them will be used to predict the whole execution time.
– Every particular model will be a function of the above features.
– Aspects to be included in the model (see the sketch below):
  • computation time as a function of the above features
  • memory access time as a function of the features
  • communication time as a function of the features
  • synchronization time as a function of the features
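A minimal numerical sketch of this approach, under the assumption that each per-aspect model has already been expressed as a function of the hardware and application features (the functional forms and feature names below are illustrative, not the project's actual model); the coefficients of the linear combination are then fitted to measured kernel run times by least squares:

    import numpy as np

    # Illustrative per-aspect component models; each maps the hardware/application
    # features of one kernel run to an estimated time contribution.
    def t_comp(f): return f["flops"] / f["cpu_speed"]
    def t_mem(f):  return f["bytes_accessed"] / f["mem_bandwidth"]
    def t_comm(f): return f["n_comms"] * f["latency"] + f["comm_bytes"] / f["net_bandwidth"]
    def t_sync(f): return f["n_syncs"] * f["sync_cost"]

    ASPECTS = (t_comp, t_mem, t_comm, t_sync)

    def fit_model(runs, measured_times):
        """Fit T ~ sum_i a_i * t_i(features) to measured kernel times (least squares)."""
        X = np.array([[t(f) for t in ASPECTS] for f in runs])
        coeffs, *_ = np.linalg.lstsq(X, np.asarray(measured_times), rcond=None)
        return coeffs

    def predict(coeffs, features):
        """Predict the whole execution time of a kernel on a given configuration."""
        return float(np.dot(coeffs, [t(features) for t in ASPECTS]))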
WP2.4 Tools w.r.t. DataGrid WP3

  Requirement                      GRM                PATOP/OMIS
  1 Scalability (#u, #r, #e)       no, no, yes        no, no, yes
  2 Intrusiveness                  low (how much?)    low (0-10 %)
  3 Portability                    no                 yes
  4 Extendibility
    - new monitoring modules       possible           yes
    - new data types               yes (ev. def.)     yes
  5 Communication                  push               query/response
  6 Metrics                        application only   comprehensive
  7 Archive handling               no                 possible (TATOO)

Summary
– New requirements for performance tools in the Grid
– Adaptation of the interactive performance evaluation tool to the Grid:
  • new measurements
  • new dialogue window
  • new presentations
  • new objects
– Need for semiautomatic performance analysis:
  • performance properties
  • APART specification language
  • search strategy
– Prediction model construction

Performance Measurements with PATOP
Possible types of measurement:
– CPU time
– delay in remote procedure calls (system calls executed on the front-end)
– delay in send and receive calls
– amount of data sent and received
– time in marked areas (code regions)
– number of executions of a specific point in the source code

Scope of Measurement
– System related:
  • whole computing system
  • individual nodes
  • individual threads
  • pairs of nodes (communication partners, for send/receive)
  • set of nodes specified by a performance condition
– Program related:
  • whole program
  • individual functions

PATOP: performance evaluation tools on top of the OCM
(architecture diagram)

On-line Monitoring Interface Specification (OMIS)
The interface should provide the following properties:
– support for interoperable tools
– efficiency (minimal intrusion, scalability)
– support for on-line monitoring (new objects, control)
– platform independence (hardware, OS, programming library)
– usability for any kind of run-time tool (observing/manipulating, interactive/automatic, centralized/distributed)

Object based approach to monitoring
– the observed system is a hierarchical set of objects:
  1. classes: nodes, processes, threads, messages, and message queues
  2. access via abstract identifiers (tokens)
– the node/process model is suitable for DMPs, NOWs, SMPs, and SMP clusters
– services observe and manipulate objects:
  1. OMIS core services: platform independent
  2. others: platform-specific (hardware, OS, environment) extensions
– tools define their own view of the observed system

Classification of overheads
– Synchronisation (e.g. barriers and locks)
  • coordination of accesses to data, maintaining consistency
– Control of parallelism (e.g. fork/join operations and loop scheduling)
  • controlling and managing the parallelism of a program (user, compiler)
– Additional computation: changes to the sequential code made to increase parallelism or data locality
  • e.g. eliminating data dependences
– Loss of parallelism: imperfect parallelisation
  • un- or partially parallelised code, replicated code
– Data movement: any data transfer within a process or between processes

Interoperability of PATOP and DETOP
– PATOP provides high-level performance measurement and visualisation
– DETOP provides source-code-level debugging
– Possible scenarios:
  • erroneous behaviour observed via PATOP: suspend the application with DETOP and examine the source code
  • measurement of execution phases: start/stop a measurement at a breakpoint
  • measurement on dynamic objects: start a measurement at a breakpoint when the object is created
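The last two scenarios amount to coupling debugger-side breakpoint events with performance-side measurements through a common monitoring layer. The sketch below is purely conceptual; the class, method and breakpoint names are invented and do not reflect the real PATOP/DETOP or OMIS interfaces.

    import time

    class SharedMonitor:
        """Conceptual stand-in for a monitoring layer shared by both tools."""
        def __init__(self):
            self.actions = {}    # breakpoint location -> list of callbacks
            self.active = {}     # measurement name -> start timestamp
            self.results = {}    # measurement name -> accumulated time [s]

        def on_breakpoint(self, location, action):
            self.actions.setdefault(location, []).append(action)

        def breakpoint_hit(self, location):          # called by the debugger side
            for action in self.actions.get(location, []):
                action()

        def start_measurement(self, name):           # called by the performance side
            self.active[name] = time.perf_counter()

        def stop_measurement(self, name):
            started = self.active.pop(name, None)
            if started is not None:
                self.results[name] = (self.results.get(name, 0.0)
                                      + time.perf_counter() - started)

    # "Measurement of execution phases": start the measurement at the phase-entry
    # breakpoint and stop it at the phase-exit breakpoint (locations hypothetical).
    mon = SharedMonitor()
    mon.on_breakpoint("solver_entry", lambda: mon.start_measurement("solver_phase"))
    mon.on_breakpoint("solver_exit",  lambda: mon.stop_measurement("solver_phase"))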