Advances in the TAU Performance System
Allen D. Malony, Sameer Shende
{malony,shende}@cs.uoregon.edu
Department of Computer and Information Science
Computational Science Institute
University of Oregon
Outline
• Complexity and performance technology
• What is TAU?
• Problems currently being investigated
  • Instrumentation control: selective instrumentation
  • Performance mapping: callpath profiling
  • Performance data interaction and steering: online performance analysis and visualization
• Performance analysis for component software
• Concluding remarks

Dagstuhl, August 2002
Advances in the TAU Performance System
Complexity in Parallel and Distributed Systems

• Complexity in computing system architecture
  • Diverse parallel and distributed system architectures: shared / distributed memory, cluster, hybrid, NOW, Grid, …
  • Sophisticated processor / memory / network architectures
• Complexity in parallel software environment
  • Diverse parallel programming paradigms
  • Optimizing compilers and sophisticated runtime systems
  • Advanced numerical libraries and application frameworks
  • Hierarchical, multi-level software architectures
  • Multi-component, coupled simulation models
Complexity Determines Performance Requirements

• Performance observability requirements
  • Multiple levels of software and hardware
  • Different types and detail of performance data
  • Alternative performance problem solving methods
  • Multiple targets of software and system application
• Performance technology requirements
  • Broad scope of performance observation
  • Flexible and configurable mechanisms
  • Technology integration and extension
  • Cross-platform portability
  • Open, layered, and modular framework architecture
Complexity Challenges for Performance Tools

• Computing system environment complexity
  • Observation integration and optimization
  • Access, accuracy, and granularity constraints
  • Diverse/specialized observation capabilities/technology
  • Restricted modes limit performance problem solving
• Sophisticated software development environments
  • Programming paradigms and performance models
  • Performance data mapping to software abstractions
  • Uniformity of performance abstraction across platforms
  • Rich observation capabilities and flexible configuration
  • Common performance problem solving methods
General Problems (Performance Technology)
How do we create robust and ubiquitous
performance technology for the analysis and tuning
of parallel and distributed software and systems in
the presence of (evolving) complexity challenges?

How do we apply performance technology effectively
for the variety and diversity of performance
problems that arise in the context of complex
parallel and distributed computer systems?
TAU Performance System Framework
Tuning and Analysis Utilities (aka Tools Are Us)
• Performance system framework for scalable parallel and distributed high-performance computing
• Targets a general complex system computation model
  • nodes / contexts / threads
  • Multi-level: system / software / parallelism
  • Measurement and analysis abstraction
• Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  • Portable performance profiling/tracing facility
  • Open software approach
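To make the node / context / thread model concrete, the following C++ sketch keeps one profile per (node, context, thread) triple and prints a header in the same style as the profile listings in this talk ("NODE 0;CONTEXT 0;THREAD 0:"). The names ThreadId, ThreadProfile, and ProfileSet are hypothetical; this illustrates the computation model, not TAU's internal data structures.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical identifier for one thread of execution in the
// node / context / thread model (interpretations are illustrative).
struct ThreadId {
    int node;     // e.g. a physical node or process rank
    int context;  // e.g. an address space on that node
    int thread;   // a thread within that context
    bool operator<(const ThreadId& o) const {
        return std::tie(node, context, thread) <
               std::tie(o.node, o.context, o.thread);
    }
};

// Per-thread profile: event name -> accumulated time (msec).
using ThreadProfile = std::map<std::string, double>;

// Each (node, context, thread) triple owns a separate profile,
// so measurements from different threads never mix.
class ProfileSet {
public:
    void add(const ThreadId& id, const std::string& event, double msec) {
        profiles_[id][event] += msec;
    }
    // Header in the style of the profile listings, e.g. "NODE 0;CONTEXT 0;THREAD 0:".
    static std::string header(const ThreadId& id) {
        std::ostringstream os;
        os << "NODE " << id.node << ";CONTEXT " << id.context
           << ";THREAD " << id.thread << ":";
        return os.str();
    }
    double total(const ThreadId& id, const std::string& event) const {
        auto p = profiles_.find(id);
        if (p == profiles_.end()) return 0.0;
        auto e = p->second.find(event);
        return e == p->second.end() ? 0.0 : e->second;
    }
    std::size_t threadCount() const { return profiles_.size(); }
private:
    std::map<ThreadId, ThreadProfile> profiles_;
};
```

Keying by the full triple, rather than by a flat thread number, keeps the same code valid for pure message passing (one thread per context), pure threading (one node, one context), and hybrid configurations.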
TAU Performance System Architecture
[Figure: TAU performance system architecture diagram; labels include Paraver and EPILOG]
Instrumentation Control

• Selection of which performance events to observe
  • Could depend on scope, type, level of interest
  • Could depend on instrumentation overhead
• How is selection supported in the instrumentation system?
  • No choice
  • Include / exclude lists (TAU)
  • Environment variables
  • Static vs. dynamic
• Problem: Controlling instrumentation of small routines
  • High relative measurement overhead
  • Significant intrusion and possible perturbation
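A minimal sketch of include / exclude list selection, assuming a simple policy: a non-empty include list restricts instrumentation to the listed routines, and otherwise every routine not on the exclude list is instrumented. InstrumentationFilter is a hypothetical name, and this does not reproduce TAU's selective instrumentation file format.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical selective-instrumentation filter (assumed policy:
// include list, if present, takes precedence over the exclude list).
class InstrumentationFilter {
public:
    void include(const std::string& routine) { include_.insert(routine); }
    void exclude(const std::string& routine) { exclude_.insert(routine); }

    // Decide whether a routine should carry measurement probes.
    bool shouldInstrument(const std::string& routine) const {
        if (!include_.empty())
            return include_.count(routine) > 0;
        return exclude_.count(routine) == 0;
    }
private:
    std::set<std::string> include_;
    std::set<std::string> exclude_;
};
```

The same check can be applied statically (when instrumentation is inserted, so excluded routines carry no probes at all) or dynamically (when an event is first triggered, which still pays a small cost per call); this is the static vs. dynamic distinction in the list above.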
Rule-Based Overhead Analysis (N. Trebon, UO)
• Analyze the performance data to determine events with high (relative) overhead performance measurements
• Create a select list for excluding those events
• Rule grammar (used in TAUreduce tool)
      [GroupName:] Field Operator Number
  • GroupName indicates that the rule applies to events in that group
  • Field is an event metric attribute (from profile statistics): numcalls, numsubs, percent, usec, cumusec, totalcount, stdev, usecs/call, counts/call
  • Operator is one of >, <, or =
  • Number is any number
  • Compound rules are possible using & between simple rules
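The rule grammar can be sketched as a small parser and evaluator. This is an illustrative C++ version, not the TAUreduce implementation: SimpleRule, EventStats, parseRule, and matches are hypothetical names, and two behaviors are assumptions: an event outside the named group does not match, and a field missing from the event's statistics makes the rule fail.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One simple rule: [GroupName:] Field Operator Number
struct SimpleRule {
    std::string group;  // empty: applies to events in any group
    std::string field;  // e.g. "numcalls", "usecs/call", "percent"
    char op;            // '>', '<' or '='
    double number;
};

// Statistics for one event, as read from a profile (field -> value).
struct EventStats {
    std::string group;
    std::map<std::string, double> fields;
};

static std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parse "[Group:] Field Op Number"; std::stod throws if Number is missing.
SimpleRule parseSimple(const std::string& text) {
    std::size_t opPos = text.find_first_of("<>=");
    if (opPos == std::string::npos)
        throw std::invalid_argument("rule has no operator: " + text);
    SimpleRule r;
    r.op = text[opPos];
    r.number = std::stod(text.substr(opPos + 1));
    std::string lhs = trim(text.substr(0, opPos));
    std::size_t colon = lhs.find(':');
    if (colon != std::string::npos) {
        r.group = trim(lhs.substr(0, colon));
        lhs = trim(lhs.substr(colon + 1));
    }
    r.field = lhs;
    return r;
}

// A compound rule is a list of simple rules joined by '&'.
std::vector<SimpleRule> parseRule(const std::string& text) {
    std::vector<SimpleRule> rules;
    std::istringstream in(text);
    std::string part;
    while (std::getline(in, part, '&'))
        rules.push_back(parseSimple(part));
    return rules;
}

// True if every simple rule holds for the event (logical AND).
bool matches(const std::vector<SimpleRule>& rule, const EventStats& ev) {
    for (const SimpleRule& r : rule) {
        if (!r.group.empty() && r.group != ev.group) return false;
        auto it = ev.fields.find(r.field);
        if (it == ev.fields.end()) return false;
        double v = it->second;
        bool ok = r.op == '>' ? v > r.number
                : r.op == '<' ? v < r.number
                              : v == r.number;
        if (!ok) return false;
    }
    return true;
}
```

Events for which matches() returns true would go on the select list for exclusion.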
TAUReduce Example
• tau_reduce implements overhead reduction in TAU
• Consider the klargest example
  • Find the kth largest element among N elements
  • Compare two methods: quicksort, select_kth_largest
  • i = 2324, N = 1000000 (uninstrumented)
      quicksort: (wall clock) = 0.188511 secs
      select_kth_largest: (wall clock) = 0.149594 secs
      Total: (P3/1.2GHz time) = 0.340u 0.020s 0:00.37
• Execution with all routines instrumented
• Execution with rule-based selective instrumentation
      usec>1000 & numcalls>400000 & usecs/call<30 & percent>25
Simple sorting example on one processor
Before selective instrumentation reduction
NODE 0;CONTEXT 0;THREAD 0:
--------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
--------------------------------------------------------------------------------------
100.0           13        4,982           1           4    4982030 int main
 93.5        3,223        4,659 4.20241E+06 1.40268E+07          1 void quicksort
 62.9      0.00481        3,134           5           5     626839 int kth_largest_qs
 36.4          137        1,813          28      450057      64769 int select_kth_largest
 33.6          150        1,675      449978      449978          4 void sort_5elements
 28.8        1,435        1,435 1.02744E+07           0          0 void interchange
  0.4           20           20           1           0      20668 void setup
  0.0       0.0118       0.0118          49           0          0 int ceil

After selective instrumentation reduction
NODE 0;CONTEXT 0;THREAD 0:
--------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
--------------------------------------------------------------------------------------
100.0           14          383           1           4     383333 int main
 50.9          195          195           5           0      39017 int kth_largest_qs
 40.0          153          153          28          79       5478 int select_kth_largest
  5.4           20           20           1           0      20611 void setup
  0.0         0.02         0.02          49           0          0 int ceil
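Applying the rule from the klargest example to the "before" profile shows which routines it selects. The sketch below hard-codes that profile and assumes a mapping from profile columns to rule fields (usec = exclusive msec * 1000, numcalls = #Call, usecs/call = Inclusive usec/call, percent = %Time); the mapping is an assumption, not taken from the tau_reduce source, and Row, kBefore, and excludeList are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the "before" profile.
// Assumed mapping to rule fields (not taken from tau_reduce itself):
//   usec = exclusive msec * 1000, numcalls = #Call,
//   usecs/call = Inclusive usec/call, percent = %Time.
struct Row {
    std::string name;
    double percent;
    double exclusiveMsec;
    double numcalls;
    double usecsPerCall;
};

const std::vector<Row> kBefore = {
    {"int main",               100.0, 13.0,    1.0,       4982030.0},
    {"void quicksort",          93.5, 3223.0,  4.20241e6, 1.0},
    {"int kth_largest_qs",      62.9, 0.00481, 5.0,       626839.0},
    {"int select_kth_largest",  36.4, 137.0,   28.0,      64769.0},
    {"void sort_5elements",     33.6, 150.0,   449978.0,  4.0},
    {"void interchange",        28.8, 1435.0,  1.02744e7, 0.0},
    {"void setup",               0.4, 20.0,    1.0,       20668.0},
    {"int ceil",                 0.0, 0.0118,  49.0,      0.0},
};

// usec>1000 & numcalls>400000 & usecs/call<30 & percent>25
bool selectedForExclusion(const Row& r) {
    double usec = r.exclusiveMsec * 1000.0;
    return usec > 1000 && r.numcalls > 400000 &&
           r.usecsPerCall < 30 && r.percent > 25;
}

// Names of routines the rule would place on the exclude list.
std::vector<std::string> excludeList(const std::vector<Row>& rows) {
    std::vector<std::string> out;
    for (const Row& r : rows)
        if (selectedForExclusion(r)) out.push_back(r.name);
    return out;
}
```

Under this mapping, excludeList(kBefore) returns void quicksort, void sort_5elements, and void interchange: small, very frequently called routines, and exactly the three absent from the "after" profile. The numcalls clause is what spares main, setup, and the kth-largest drivers, which are called at most 28 times.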
Performance Mapping
• Associate performance with “significant” entities (events)
• Source code points are important
  • Functions, regions, control flow events, user events
• Execution process and thread entities are important
• Some entities are more abstract and harder to measure
• Consider callgraph (callpath) profiling
  • Measure time (metric) along an edge (path) of the callgraph
  • Incident edge gives parent / child view
  • Edge sequence (path) gives parent / descendant view
• Problem: Callpath profiling when the callgraph is unknown
  • Determine the callgraph dynamically at runtime
  • Map performance measurement to dynamic call path state
1-Level Callpath Implementation in TAU
• TAU maintains a performance event (routine) callstack
• A profiled routine (child) looks in the callstack for its parent
  • The previously profiled performance event is the parent
  • A callpath profile structure is created the first time the parent calls the child
  • TAU records the parent in a callgraph map for the child
  • A string representing the 1-level callpath is used as its key
      “a() => b()”: name for the time spent in “b” when called by “a”
  • The map returns a pointer to the callpath profile structure
• The 1-level callpath is profiled using this profiling data
• Builds upon TAU’s performance mapping technology
• Measurement is independent of instrumentation
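The 1-level callpath scheme can be sketched in C++ as follows. CallpathProfiler and Profile are hypothetical names, timing uses std::chrono::steady_clock rather than TAU's measurement layer, and the root event (which has no parent) is recorded under its own name; this illustrates the callstack-plus-map idea, not TAU's actual code.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Hypothetical profile record for one 1-level callpath.
struct Profile {
    long calls = 0;
    double inclusiveUsec = 0.0;
};

// Sketch: a runtime callstack of active events. The event on top of the
// stack is the parent; "parent => child" keys the callpath record.
class CallpathProfiler {
public:
    void enter(const std::string& name) {
        std::string key = stack_.empty()
            ? name
            : stack_.back().name + " => " + name;
        // operator[] creates the record the first time parent calls child.
        Profile* p = &callpaths_[key];
        p->calls++;
        stack_.push_back({name, p, Clock::now()});
    }
    void exit() {
        assert(!stack_.empty());
        Frame f = stack_.back();
        stack_.pop_back();
        std::chrono::duration<double, std::micro> dt = Clock::now() - f.start;
        f.profile->inclusiveUsec += dt.count();
    }
    const std::map<std::string, Profile>& callpaths() const { return callpaths_; }
private:
    using Clock = std::chrono::steady_clock;
    struct Frame {
        std::string name;
        Profile* profile;  // points into callpaths_
        Clock::time_point start;
    };
    std::vector<Frame> stack_;
    std::map<std::string, Profile> callpaths_;
};
```

Because std::map never invalidates pointers to existing elements when new keys are inserted, each stack frame can safely hold a pointer to its callpath record, which plays the role of the pointer returned by the callgraph map lookup.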

Performance Monitoring and Steering

• Desirable to monitor performance during execution
  • Long-running applications
  • Steering computations for improved performance
• Large-scale parallel applications complicate solutions
  • More parallel threads of execution producing data
  • Relatively large amount of performance data to access
  • Analysis and visualization more difficult
• Problem: Online performance data access and analysis
  • Incremental profile sampling (based on files)
  • Integration in computational steering system
  • Dynamic performance measurement and access
Online Performance Analysis (K. Li, UO)
[Figure: online performance analysis architecture in SCIRun (Univ. of Utah). Components: Application; TAU Performance System; performance data output to the file system (accumulated samples); Performance Data Reader (sample sequencing, reader synchronization); Performance Data Integrator; Performance Analyzer; performance data streams; Performance Visualizer; Performance Steering]
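The reader's sample sequencing can be sketched as follows, assuming each sample file holds a sequence number and cumulative per-event times (a hypothetical format). The integrator rejects stale or duplicate samples, for example a file read again before the writer has produced the next one, and differences consecutive accumulated samples to obtain per-interval values. Sample and SampleIntegrator are illustrative names, not part of TAU or SCIRun.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One accumulated profile sample as a reader might load it from a file:
// a sequence number plus cumulative time per event (msec).
struct Sample {
    int seq;
    std::map<std::string, double> cumulativeMsec;
};

// Sequences samples and turns cumulative values into per-interval deltas.
class SampleIntegrator {
public:
    // Returns false (and ignores the sample) if it is stale or a duplicate.
    bool accept(const Sample& s) {
        if (s.seq <= lastSeq_) return false;
        std::map<std::string, double> delta;
        for (const auto& [event, total] : s.cumulativeMsec) {
            auto prev = last_.find(event);
            delta[event] = total - (prev == last_.end() ? 0.0 : prev->second);
        }
        last_ = s.cumulativeMsec;
        lastSeq_ = s.seq;
        intervals_.push_back(delta);
        return true;
    }
    const std::vector<std::map<std::string, double>>& intervals() const {
        return intervals_;
    }
private:
    int lastSeq_ = -1;
    std::map<std::string, double> last_;
    std::vector<std::map<std::string, double>> intervals_;
};
```

Because the samples are accumulated rather than per-interval, a skipped sample only merges two intervals into one delta; no measured time is lost.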
2D Field Performance Visualization in SCIRun
[Figure: SCIRun program]
Uintah Computational Framework (UCF)
University of Utah
• UCF analysis
  • Scheduling
  • MPI library
  • Components
  • 500 processes
• Use for online and offline visualization
• Apply SCIRun steering

Performance Analysis of Component Software
• Complexity in scientific problem solving is addressed by advances in software development environments and rich, layered software middleware and libraries
• Increases complexity in performance problem solving
• Integration barriers for performance technology
  • Incompatible with advanced software technology
  • Inconsistent with software engineering process
• Problem: Performance engineering for component systems
  • Respect software development methodology
  • Leverage software implementation technology
  • Look for opportunities for synergy and optimization
Focus on Component Technology and CCA
• Emerging component technology for HPC and Grid
  • Component: software object embedding functionality
  • Component architecture (CA): how components connect
  • Component framework: implements a CA
• Common Component Architecture (CCA)
  • Standard foundation for scientific component architecture
  • Component descriptions
    • Scientific Interface Description Language (SIDL)
  • CCA ports for component interactions (provides and uses)
  • CCA services: directory, registry, connection, event
  • High-performance components and interactions
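To illustrate the provides / uses idea behind CCA ports, here is a deliberately simplified C++ sketch. ToyFramework, Port, FunctionPort, and LinearFunction are hypothetical stand-ins, not the gov::cca interfaces; the connect method takes its arguments in the same order as the connect commands used with CCAFFEINE (user, uses port, provider, provides port).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified port model (not the gov::cca classes).
struct Port { virtual ~Port() = default; };

struct FunctionPort : Port {
    virtual double evaluate(double x) = 0;
};

class ToyFramework {
public:
    // Provider side: component `inst` offers `impl` as port `port`.
    void provide(const std::string& inst, const std::string& port,
                 std::shared_ptr<Port> impl) {
        provided_[{inst, port}] = std::move(impl);
    }
    // Wire a user's uses port to a provider's provides port.
    void connect(const std::string& user, const std::string& usesPort,
                 const std::string& provider, const std::string& providesPort) {
        auto it = provided_.find({provider, providesPort});
        if (it == provided_.end())
            throw std::runtime_error("no such provides port: " + providesPort);
        connections_[{user, usesPort}] = it->second;
    }
    // User side: fetch whatever is connected to the uses port (or null).
    std::shared_ptr<Port> getPort(const std::string& user,
                                  const std::string& usesPort) const {
        auto it = connections_.find({user, usesPort});
        return it == connections_.end() ? nullptr : it->second;
    }
private:
    using Key = std::pair<std::string, std::string>;
    std::map<Key, std::shared_ptr<Port>> provided_;
    std::map<Key, std::shared_ptr<Port>> connections_;
};

// Example provider component implementing FunctionPort.
struct LinearFunction : FunctionPort {
    double evaluate(double x) override { return 2.0 * x + 1.0; }
};
```

The user component only knows the port interface it asked for; which implementation sits behind it is decided when the framework connects the ports, which is what lets a performance component be swapped in without changing application code.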
Extended Component Design
[Figure: generic component]
• POC and PKC are compliant with the component architecture
• Component composition performance engineering
• Utilize technology and services of the component framework

Architecture of a Performance Component


• Each component advertises its services
• Performance component:
  • Ports
    • Timer (start/stop)
    • Event (trigger)
    • Query (timers…)
    • Knowledge (component performance model)
[Figure: Performance Component with Timer, Event, Query, and Knowledge ports]
• Prototype implementation of timer
  • CCAFFEINE reference framework
      http://www.cca-forum.org/café.html
  • SIDL
  • Instantiate with TAU functionality
TimerPort Interface Declaration in CCAFFEINE

Create Timer port abstraction
namespace performance {
namespace ccaports {
/**
 * This abstract class declares the Timer interface.
 * Inherit from this class to provide functionality.
 */
class Timer :                         /* declaration of the port interface */
    public virtual gov::cca::Port {   /* inherits from the CCA port spec */
public:
  virtual ~Timer() { }
  /**
   * Start the Timer. Implement this function in
   * a derived class to provide required functionality.
   */
  virtual void start(void) = 0;      /* pure virtual methods: no */
  virtual void stop(void) = 0;       /* implementation in this class */
  ...
};
} // namespace ccaports
} // namespace performance
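A derived class supplies the actual behavior. The sketch below is a self-contained stand-in: PortBase replaces gov::cca::Port so that it compiles without the framework, the setName signature is assumed from its use in the client code, and ChronoTimer is a hypothetical implementation that accumulates wall-clock time with std::chrono instead of calling TAU. It shows the pattern, not the TAU-backed component.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Stand-in for the framework's port base class (gov::cca::Port in
// CCAFFEINE), defined here only so the sketch compiles on its own.
struct PortBase { virtual ~PortBase() {} };

// Timer interface mirroring the declaration, plus setName()
// (its exact signature is an assumption).
class Timer : public virtual PortBase {
public:
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
    virtual void setName(const std::string& name) = 0;
};

// Hypothetical tool-specific implementation using std::chrono.
class ChronoTimer : public Timer {
public:
    void start() override {
        assert(!running_);
        running_ = true;
        begin_ = Clock::now();
    }
    void stop() override {
        assert(running_);
        running_ = false;
        elapsed_ += Clock::now() - begin_;  // accumulate across start/stop pairs
        ++count_;
    }
    void setName(const std::string& name) override { name_ = name; }
    const std::string& name() const { return name_; }
    long count() const { return count_; }
    double seconds() const { return elapsed_.count(); }
private:
    using Clock = std::chrono::steady_clock;
    Clock::time_point begin_;
    std::chrono::duration<double> elapsed_{0.0};
    bool running_ = false;
    long count_ = 0;
    std::string name_;
};
```

Client code that holds only a Timer pointer calls start() and stop() without knowing which measurement tool is behind the port, which is the tool independence the interface is designed for.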
Using Performance Component Timer


• Component uses framework services to get the TimerPort
• Use of this TimerPort interface is independent of TAU
// Get the Timer port from the CCAFFEINE framework services
port = frameworkServices->getPort("TimerPort");
if (port == 0) {
  cerr << "TimerPort is not connected" << endl;
  return -1;
}
timer_m = dynamic_cast<performance::ccaports::Timer *>(port);
if (timer_m == 0) {
  cerr << "Connected to something, not a Timer port" << endl;
  return -1;
}
string s = "IntegrateTimer";   // give name for timer
timer_m->setName(s);           // assign name to timer
timer_m->start();              // start timer (independent of tool)
for (int i = 0; i < count; i++) {
  double x = random_m->getRandomNumber();
  sum = sum + function_m->evaluate(x);
}
timer_m->stop();               // stop timer
Using TAU Component in CCAFFEINE
/* get TAU component from repository */
repository get TauTimer
/* get application components */
repository get Driver
repository get MidpointIntegrator
repository get MonteCarloIntegrator
repository get RandomGenerator
repository get LinearFunction
repository get NonlinearFunction
repository get PiFunction
/* create component instances */
create LinearFunction lin_func
create NonlinearFunction nonlin_func
create PiFunction pi_func
create MonteCarloIntegrator mc_integrator
create RandomGenerator rand
/* create TAU component instance */
create TauTimer tau
/* connecting components and running */
connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
connect mc_integrator FunctionPort nonlin_func FunctionPort
connect mc_integrator TimerPort tau TimerPort
create Driver driver
connect driver IntegratorPort mc_integrator IntegratorPort
go driver Go
quit
Concluding Remarks
• Complex software and parallel computing systems pose challenging performance analysis problems that require robust methodologies and tools
• To build more sophisticated performance tools, existing proven performance technology must be utilized
• Performance tools must be integrated with software and systems models and technology
  • Performance engineered software
  • Function consistently and coherently in software and system environments
• TAU performance system offers robust performance technology that can be broadly integrated