Briefing on Tool Evaluations
Professor Alan D. George, Principal Investigator
Mr. Hung-Hsun Su, Sr. Research Assistant
Mr. Adam Leko, Sr. Research Assistant
Mr. Bryan Golden, Research Assistant
Mr. Hans Sherburne, Research Assistant
PAT
11 July 2005
HCS Research Laboratory, University of Florida

Purpose & Methodology

Purpose of Evaluations
- Investigate performance analysis methods used in existing tools
- Determine what features are necessary for tools to be effective
- Examine usability of tools
- Find out what performance factors existing tools focus on
- Create a standardized evaluation strategy and apply it to popular existing tools
- Gather information about tool extensibility
- Generate a list of reusable components from existing tools
- Identify any tools that may serve as a basis for our SHMEM and UPC performance tool
- Take the best candidates for extension and gain experience modifying them to support new features

Evaluation Methodology
- Generate a list of desirable characteristics for performance tools
- Categorize each characteristic based on which aspect of a tool it influences:
  - Usability/productivity
  - Portability
  - Scalability
  - Miscellaneous
- Assign an importance rating to each:
  - Minor (not really important)
  - Average (nice to have)
  - Important (should include)
  - Critical (absolutely needed)
- Formulate a scoring strategy for each
  - Give numerical scores: 1-5 (5 best), 0 = not applicable
  - Create objective scoring criteria where possible
  - Use relative scores for subjective categories
- List of characteristics and actual scores presented in later slides

Performance Tool Test Suite
- Used to ensure subjective scores are consistent across tools, and to determine the effectiveness of each performance tool
- Includes:
  - Suite of C MPI microbenchmarks with specific performance problems: PPerfMark [1,2], based on Grindstone [3]
  - Large-scale program: NAS NPB LU benchmark [4]
  - "Control" program with good parallel efficiency to test for false positives: CAMEL cryptanalysis C MPI implementation (HCS Lab)
- For each program in the test suite, assign:
  - FAIL: tool was unable to provide information to identify the bottleneck
  - TOSS-UP: tool indicated a bottleneck was occurring, but the user must be clever to find and fix it
  - PASS: tool clearly showed where the bottleneck was occurring and gave enough information for a competent user to fix it

Performance Tool Test Suite (2)
What should the performance tool tell us?
- CAMEL
  - No communication bottlenecks, CPU-bound code
  - Performance could be improved by using non-blocking MPI calls
- LU
  - Large number of small messages
  - Dependence on network bandwidth and latency
  - Identify which routines take the most time

Performance Tool Test Suite (3)
Microbenchmark bottleneck patterns (a sketch of one such kernel follows this list):
- Big message: several large messages sent; dependence on network bandwidth
- Small messages: many small messages; overall execution time dependent on network latency
- Wrong way: point-to-point messages sent in the wrong order
- Intensive server: one node is bombarded with lots of messages
- Ping-pong: first node overloaded with work
- Random barrier: one node holds up a barrier; one procedure is responsible for the slow node's behavior
- Diffuse procedure: similar to random barrier (one node holds up the barrier), but the time for the slow procedure is "diffused" across several nodes in round-robin fashion
- System time: most time spent in system calls
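For concreteness, the sketch below shows the shape of one such microbenchmark in the spirit of the random-barrier/diffuse-procedure tests. It is a made-up kernel, not the actual PPerfMark or Grindstone code: one rank delays before each barrier, so a tool should attribute the resulting wait time to the other ranks' MPI_Barrier calls.

    /* Illustrative kernel only, not the test-suite source: the "slow" rank
     * rotates round-robin (diffuse-procedure style) and holds up the barrier. */
    #include <mpi.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int iter = 0; iter < 10; iter++) {
            if (rank == iter % size)   /* this iteration's slow rank */
                sleep(1);              /* stand-in for an unbalanced workload */
            MPI_Barrier(MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

A tool passes this case if it clearly shows that the barrier wait time is caused by one (rotating) rank rather than by the barrier itself.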
Overview of Tools Evaluated

List of Tools Evaluated
- Profiling tools
  - TAU (Univ. of Oregon)
  - mpiP (ORNL, LLNL)
  - HPCToolkit (Rice Univ.)
  - SvPablo (Univ. of Illinois, Urbana-Champaign)
  - DynaProf (Univ. of Tennessee, Knoxville)
- Tracing tools
  - Intel Cluster Tools (Intel)
  - MPE/Jumpshot (ANL)
  - Dimemas & Paraver (European Ctr. for Parallelism of Barcelona)
  - MPICL/ParaGraph (Univ. of Illinois, Univ. of Tennessee, ORNL)

List of Tools Evaluated (2)
- Other tools
  - KOJAK (Forschungszentrum Jülich, ICL @ UTK)
  - Paradyn (Univ. of Wisconsin, Madison)
- Also quickly reviewed
  - CrayPat/Apprentice2 (Cray)
  - DynTG (LLNL)
  - AIMS (NASA)
  - Eclipse Parallel Tools Platform (LANL)
  - Open/Speedshop (SGI)

Profiling Tools

Tuning and Analysis Utilities (TAU)
- Developer: University of Oregon
- Current versions: TAU 2.14.4; Program Database Toolkit 3.3.1
- Website: http://www.cs.uoregon.edu/research/paracomp/tau/tautools/
- Contact: Sameer Shende: sameer@cs.uoregon.edu

TAU Overview
- Measurement mechanisms
  - Source (manual)
  - Source (automatic via PDToolkit)
  - Binary (DynInst)
- Key features
  - Supports both profiling and tracing
  - No built-in trace viewer; generic export utility for trace files (.vtf, .slog2, .alog)
  - Many supported architectures
  - Many supported languages: C, C++, Fortran, Python, Java, SHMEM (TurboSHMEM and Cray SHMEM), OpenMP, MPI, Charm
  - Hardware counter support via PAPI

TAU Visualizations

mpiP
- Developer: ORNL, LLNL
- Current version: mpiP v2.8
- Website: http://www.llnl.gov/CASC/mpip/
- Contacts: Jeffrey Vetter: vetterjs@ornl.gov; Chris Chambreau: chcham@llnl.gov

mpiP Overview
- Measurement mechanism
  - Profiling via the MPI profiling interface
- Key features
  - Simple, lightweight profiling
  - Gives profile information for MPI callsites
  - Source code correlation (facilitated by mpipview)
  - Uses the PMPI interface with extra libraries (libelf, libdwarf, libunwind) to do source correlation (a minimal PMPI wrapper sketch follows below)

mpiP Source Code Browser
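mpiP, and most of the tracing tools later in this deck, hook into MPI through the standard MPI profiling (PMPI) interface. As background, the sketch below shows the shape of such a wrapper; it is illustrative only, not mpiP's code, and uses the MPI-1/2 MPI_Send prototype current in 2005.

    /* Minimal PMPI-style wrapper sketch (not any tool's actual code).
     * The linker resolves the application's MPI_Send to this wrapper;
     * the real implementation remains reachable as PMPI_Send. */
    #include <mpi.h>
    #include <stdio.h>

    static double send_time  = 0.0;   /* accumulated time inside MPI_Send */
    static long   send_calls = 0;

    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = PMPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        send_time += PMPI_Wtime() - t0;
        send_calls++;
        return rc;
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d: %ld MPI_Send calls, %.3f s total\n",
               rank, send_calls, send_time);
        return PMPI_Finalize();
    }

The same mechanism underlies callsite profiling: the wrapper can walk the call stack (mpiP uses libunwind plus libelf/libdwarf) to attribute each intercepted call to a source location.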
HPCToolkit
- Developer: Rice University
- Current version: HPCToolkit v1.1
- Website: http://www.hipersoft.rice.edu/hpctoolkit/
- Contacts: John Mellor-Crummey: johnmc@cs.rice.edu; Rob Fowler: rjf@cs.rice.edu

HPCToolkit Overview
- Measurement mechanism
  - Hardware counters (requires PAPI on Linux)
- Key features
  - Creates hardware counter profiles for any executable via sampling; no instrumentation necessary
  - Relies on PAPI overflow events and program counter values to relate PAPI metrics to source code
  - Source code correlation of performance data, even for optimized code
  - Navigation pane in the viewer assists in locating resource-consuming functions

HPCToolkit Source Browser

SvPablo
- Developer: University of Illinois
- Current versions: SvPablo 6.0; SDDF component 5.5; Trace Library component 5.1.4
- Website: http://www.renci.unc.edu/Software/Pablo/pablo.htm
- Contact: ?

SvPablo Overview
- Measurement mechanism
  - Profiling via source code instrumentation
- Key features
  - Single GUI integrates instrumentation and performance data display
  - Assisted source code instrumentation
  - Management of multiple instances of instrumented source code and corresponding performance data
  - Simplified scalability analysis of performance data from multiple runs

SvPablo Visualization

DynaProf
- Developer: Philip Mucci (UTK)
- Current versions: DynaProf CVS as of 2/21/2005; DynInst API v4.1.1 (dependency); PAPI v3.0.7 (dependency)
- Website: http://www.cs.utk.edu/~mucci/dynaprof/
- Contact: Philip Mucci: mucci@cs.utk.edu

DynaProf Overview
- Measurement mechanism
  - Profiling via PAPI and DynInst
- Key features
  - Simple, gdb-like command-line interface
  - No instrumentation step needed; binary instrumentation happens at runtime
  - Produces simple text-based profile output, similar to gprof, for:
    - PAPI metrics
    - Wallclock time
    - CPU time (getrusage)

Tracing Tools

Intel Trace Collector/Analyzer
- Developer: Intel
- Current versions: Intel Trace Collector 5.0.1.0; Intel Trace Analyzer 4.0.3.1
- Website: http://www.intel.com/software/products/cluster
- Contact: http://premier.intel.com

Intel Trace Collector/Analyzer Overview
- Measurement mechanism
  - MPI profiling interface for MPI programs
  - Static binary instrumentation (proprietary method)
- Key features
  - Simple, straightforward operation
  - Comprehensive set of visualizations
  - Source code correlation pop-up dialogs
  - Views are linked, allowing analysis of specific portions/phases of the execution trace

Intel Trace Analyzer Visualizations

MPE/Jumpshot
- Developer: Argonne National Laboratory
- Current versions: MPE 1.26; Jumpshot-4
- Website: http://www-unix.mcs.anl.gov/perfvis/
- Contacts: Anthony Chan: chan@mcs.anl.gov; David Ashton: ashton@mcs.anl.gov; Rusty Lusk: lusk@mcs.anl.gov; William Gropp: gropp@mcs.anl.gov

MPE/Jumpshot Overview
- Measurement mechanism
  - MPI profiling interface for MPI programs
- Key features
  - Distributed with MPICH
  - Easy to generate traces of MPI programs: compile with mpicc -mpilog
  - Scalable logfile format for efficient visualization
  - Java-based timeline viewer with extensive scrolling and zooming support

Jumpshot Visualization

CEPBA Tools (Dimemas, Paraver)
- Developer: European Center for Parallelism of Barcelona
- Current versions: MPITrace 1.1; Paraver 3.3; Dimemas 2.3
- Website: http://www.cepba.upc.es/tools_i.htm
- Contact: Judit Gimenez: judit@cepba.upc.edu

Dimemas/Paraver Overview
- Measurement mechanism
  - MPI profiling interface
- Key features
  - Paraver
    - Sophisticated trace file viewer; uses a "tape" metaphor
    - Supports displaying hardware counter metrics alongside the trace visualization
    - Modular software architecture; very customizable
  - Dimemas
    - Trace-driven simulator
    - Uses simple models for real hardware (see the cost-model sketch after the visualization slides)
    - Generates "predictive traces" that can be viewed in Paraver

Paraver Visualizations
Paraver Visualizations (2)
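For context on Dimemas's "simple models for real hardware": trace-driven simulators of this kind typically charge each point-to-point message a linear latency/bandwidth cost. As an illustration of the approach (not necessarily Dimemas's exact parameterization), the predicted transfer time of an m-byte message is

    T_{msg}(m) \approx L + \frac{m}{B}

where L is the network latency and B the effective bandwidth, with additional contention terms when shared resources (e.g., a limited number of links or buses) are modeled. Replaying a measured trace against different (L, B) values is what produces the "predictive traces" mentioned above.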
MPICL/ParaGraph
- Developers: ParaGraph: University of Illinois, University of Tennessee; MPICL: ORNL
- Current versions: ParaGraph (no version number; last available update 1999); MPICL 2.0
- Websites: ParaGraph: http://www.csar.uiuc.edu/software/paragraph/ ; MPICL: http://www.csm.ornl.gov/picl/
- Contacts: ParaGraph: Michael Heath: heath@cs.uiuc.edu; Jennifer Finger. MPICL: Patrick Worley: worleyph@ornl.gov

MPICL/ParaGraph Overview
- Measurement mechanism
  - MPI profiling interface
  - Other wrapper libraries for obsolete vendor-specific message-passing libraries
- Key features
  - Large number of different visualizations (about 27)
  - Several types:
    - Utilization visualizations
    - Communication visualizations
    - "Task" visualizations
    - Other visualizations

ParaGraph Visualizations: Utilization
ParaGraph Visualizations: Communication

Other Tools

KOJAK
- Developer: Forschungszentrum Jülich, ICL @ UTK
- Current versions: Stable: KOJAK v2.0; Development: KOJAK v2.1b1
- Websites: http://icl.cs.utk.edu/kojak/ ; http://www.fz-juelich.de/zam/kojak/
- Contacts: Felix Wolf: fwolf@cs.utk.edu; Bernd Mohr: b.mohr@fz-juelich.de; generic email: kojak@cs.utk.edu

KOJAK Overview
- Measurement mechanism
  - MPI profiling interface
  - Binary instrumentation on a few platforms
- Key features
  - Generates and analyzes trace files
  - Automatic classification of bottlenecks
  - Simple, scalable profile viewer with source correlation
  - Exports traces to Vampir format

KOJAK Visualization

Paradyn
- Developer: University of Wisconsin, Madison
- Current versions: Paradyn 4.1.1; DynInst 4.1.1; KernInst 2.0.1
- Website: http://www.paradyn.org/index.html
- Contact: Matthew Legendre: legendre@cs.wisc.edu

Paradyn Overview
- Measurement mechanism
  - Dynamic binary instrumentation
- Key features
  - Dynamic instrumentation at runtime; no separate instrumentation phase
  - Visualizes user-selectable metrics while the program is running
  - Automatic performance bottleneck detection via the Performance Consultant
  - Users can define their own metrics using a Tcl-like language
  - All analysis happens while the program is running

Paradyn Visualizations
Paradyn Performance Consultant

Evaluation Ratings

Scoring System
- Scores given for each category:
  - Usability/productivity
  - Portability
  - Scalability
  - Miscellaneous
- Each category score is a weighted sum of the scores of its characteristics, weighted by each characteristic's importance:

  Category score = \sum_{i=1}^{n} \text{Characteristic}_i \times \text{ImportanceMultiplier}_i

- Importance multipliers used:
  - Critical: 1.0
  - Important: 0.75
  - Average: 0.5
  - Minor: 0.25
- Overall score is the sum of the four category scores (a worked example follows)
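As a concrete illustration of the category-score formula above, the sketch below computes the weighted sum for a hypothetical category; the characteristic scores and importances are invented for illustration only.

    #include <stdio.h>

    /* Importance multipliers from the slide above. */
    #define CRITICAL  1.00
    #define IMPORTANT 0.75
    #define AVERAGE   0.50
    #define MINOR     0.25

    /* Weighted sum of per-characteristic scores (1-5, 0 = not applicable). */
    static double category_score(const double score[], const double mult[], int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += score[i] * mult[i];   /* characteristic score x importance */
        return sum;
    }

    int main(void)
    {
        /* Hypothetical category: characteristics scored 4, 5, and 3 with
         * importances Critical, Critical, and Minor. */
        double score[] = { 4.0, 5.0, 3.0 };
        double mult[]  = { CRITICAL, CRITICAL, MINOR };
        printf("category score = %.2f\n", category_score(score, mult, 3)); /* 9.75 */
        return 0;
    }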
Characteristics: Portability, Miscellaneous, Scalability Categories
- Portability
  - Extensibility: Critical
  - Hardware support: Critical
  - Heterogeneity support: Minor
  - Software support: Important
- Scalability
  - Filtering and aggregation: Critical
  - Multiple executions: Critical
  - Performance bottleneck identification: Critical
  - Searching: Minor
- Miscellaneous
  - Cost: Important
  - Interoperability: Important
- Note: see the appendix for details on how scores were assigned for each characteristic.

Characteristics: Usability/Productivity Category
- Usability/productivity
  - Available metrics: Critical
  - Documentation: Important
  - Installation: Minor
  - Learning curve: Critical
  - Manual (user) overhead: Important
  - Measurement accuracy: Important
  - Multiple analyses/views: Critical
  - Profiling/tracing support: Critical
  - Response time: Average
  - Source code correlation: Critical
  - Stability: Important
  - Technical support: Average

Usability/Productivity Scores (ICT = Intel Cluster Tools; DiP = Dimemas/Paraver)
- HPCToolkit 28.625, MPICL 20.25, SvPablo 28.875, ICT 37.75, KOJAK 34.25, DynaProf 25.5, mpiP 28.75, DiP 25, MPE/Jumpshot 26.25, Paradyn 32.25, TAU 32.75

Portability Scores
- HPCToolkit 8.5, MPICL 9.25, SvPablo 7.25, ICT 5.625, KOJAK 11.625, DynaProf 9.875, mpiP 9.25, DiP 5.625, MPE/Jumpshot 9.75, Paradyn 7.5, TAU 12.75

Scalability Scores
- HPCToolkit 7.25, MPICL 5, SvPablo 8, ICT 9.5, KOJAK 9.5, DynaProf 3, mpiP 4.5, DiP 10.5, MPE/Jumpshot 7.875, Paradyn 6.5, TAU 9

Miscellaneous Scores
- HPCToolkit 6, MPICL 3.75, SvPablo 4.5, ICT 3, KOJAK 5.25, DynaProf 4.125, mpiP 4.5, DiP 1.5, MPE/Jumpshot 4.125, Paradyn 4.5, TAU 7.5

Overall Scores
- HPCToolkit 50.375, MPICL 38.25, SvPablo 48.625, ICT 55.875, KOJAK 60.625, DynaProf 42.5, mpiP 47, DiP 42.625, MPE/Jumpshot 48, Paradyn 50.75, TAU 62

Extensibility Study & Demo
- Question: should we write a new tool from scratch, or reuse an existing tool?
- To help answer this, we added preliminary support for GPSHMEM to two tools:
  - Picked the top candidate tools for extension, KOJAK and TAU (based on portability scores)
  - Added weak binding support for GPSHMEM (GCC only)
  - Created simple GPSHMEM wrapper libraries for KOJAK and TAU (see the sketch after this list)
  - Will study creating comparable components from scratch in the near future
- Notes/caveats
  - No advanced analyses for one-sided memory operations are available in either TAU or KOJAK
  - Only simple support was added; analyzing one-sided operations is difficult
  - GPSHMEM requires source patches for weak binding support and currently works only with GCC compilers
  - Adding UPC support to these tools would require orders of magnitude more work
- (Demo)
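The weak-binding approach used in the extensibility study is GCC-specific and is sketched below with hypothetical function names; the real GPSHMEM entry points and the actual KOJAK/TAU wrappers differ.

    /* lib_put.c -- illustrative "library side" of a weak-binding patch.
     * The library defines its routine under a strong internal name and
     * exposes the public name as a weak alias, so a tool can override it. */
    void example_pshmem_put(void *dst, const void *src, long nbytes, int pe)
    {
        /* ... real one-sided data movement would happen here ... */
    }

    /* Weak default: uninstrumented applications resolve example_shmem_put
     * directly to the library implementation (GCC attribute syntax). */
    void example_shmem_put(void *dst, const void *src, long nbytes, int pe)
        __attribute__((weak, alias("example_pshmem_put")));

    /* wrapper.c -- illustrative "tool side": a strong definition overrides the
     * weak symbol at link time, records an event, and forwards the call. */
    #include <stdio.h>

    void example_pshmem_put(void *dst, const void *src, long nbytes, int pe);

    void example_shmem_put(void *dst, const void *src, long nbytes, int pe)
    {
        fprintf(stderr, "put: %ld bytes to PE %d\n", nbytes, pe); /* profile/trace hook */
        example_pshmem_put(dst, src, nbytes, pe);                 /* forward to library */
    }

This is the same pattern the MPI profiling interface standardizes (MPI_Xxx vs. PMPI_Xxx); the GPSHMEM source patches mentioned above were needed because the library did not already provide such shifted entry points.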
Q&A

References
[1] Kathryn Mohror and Karen L. Karavanic, "Performance Tool Support for MPI-2 on Linux," SC2004, Pittsburgh, PA, November 2004.
[2] Kathryn Mohror and Karen L. Karavanic, "Performance Tool Support for MPI-2 on Linux," PSU CS Department Technical Report, April 2004.
[3] Jeffrey K. Hollingsworth and Michael Steele, "Grindstone: A Test Suite for Parallel Performance Tools," Technical Report CS-TR-3703, University of Maryland, October 1996.
[4] David Bailey, Tim Harris, William Saphir, Rob van der Wijngaart, Alex Woo, and Maurice Yarrow, "The NAS Parallel Benchmarks 2.0," Technical Report NAS-95-020, NASA, December 1995.

Appendix: Tool Characteristics Used in Evaluations

Usability/Productivity Characteristics

Available Metrics
- Description
  - Depth of metrics provided by the tool
  - Examples: communication statistics or events, hardware counters
- Importance rating
  - Critical: users must be able to obtain representative performance data to debug performance problems
- Rating strategy
  - Scored using relative ratings (subjective characteristic)
  - Compare the tool's available metrics with the metrics provided by other tools

Documentation Quality
- Description
  - Quality of documentation provided, including user's manuals, READMEs, and "quick start" guides
- Importance rating
  - Important: can have a large effect on overall usability
- Rating strategy
  - Scored using relative ratings (subjective characteristic)
  - Correlated to how long it takes to decipher the documentation enough to use the tool
  - Tools with quick-start guides or clear, concise high-level documentation receive higher scores

Installation
- Description
  - Measure of the time needed for installation
  - Also incorporates the level of expertise necessary to perform the installation
- Importance rating
  - Minor: installation only needs to be done once and may not even be done by the end user
- Rating strategy
  - Scored using relative ratings based on the mean installation time for all tools
  - All tools were installed by a single person with significant system administration experience

Learning Curve
- Description
  - Difficulty level associated with learning to use the tool effectively
- Importance rating
  - Critical: tools that users perceive as too difficult to operate will be avoided
- Rating strategy
  - Scored using relative ratings (subjective characteristic)
  - Based on the time necessary to get acquainted with all features needed for day-to-day operation of the tool

Manual Overhead
- Description
  - Amount of user effort needed to instrument their code (a generic hand-instrumentation sketch follows this slide)
- Importance rating
  - Important: the tool must not cause more work for the user in the end (instead, it should reduce time!)
- Rating strategy
  - Use a hypothetical test case: an MPI program, ~2.5 kloc in 20 .c files with 50 user functions
  - Score one point for each of the following actions that can be completed on a fresh copy of the source code in 10 minutes (estimated):
    - Instrument all MPI calls
    - Instrument all functions
    - Instrument five arbitrary functions
    - Instrument all loops, or a subset of loops
    - Instrument all function callsites, or a subset of callsites (about 35)
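To make "manual overhead" concrete, hand-instrumenting even one routine means edits like the following at every region of interest; this is a generic sketch using plain MPI_Wtime, not any particular tool's macros or API. Repeating it across 50 user functions and ~35 callsites is exactly the cost this characteristic scores.

    #include <mpi.h>
    #include <stdio.h>

    static double solve_time = 0.0;      /* accumulated wallclock time in solve_step() */

    static void solve_step(void)
    {
        double t0 = MPI_Wtime();         /* timer added by hand at entry ...           */
        /* ... original computation ... */
        solve_time += MPI_Wtime() - t0;  /* ... and again at every exit point          */
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        for (int i = 0; i < 100; i++)
            solve_step();
        printf("time in solve_step: %.6f s\n", solve_time);
        MPI_Finalize();
        return 0;
    }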
Measurement Accuracy
- Description
  - How much runtime instrumentation overhead the tool imposes
- Importance rating
  - Important: inaccurate data may lead to an incorrect diagnosis, which creates more work for the user with no benefit
- Rating strategy
  - Use a standard application: the CAMEL MPI program
  - Score based on the runtime overhead of the instrumented executable (wallclock time):
    - 0-4%: five points
    - 5-9%: four points
    - 10-14%: three points
    - 15-19%: two points
    - 20% or greater: one point

Multiple Analyses/Views
- Description
  - Different ways the tool presents data to the user
  - Different analyses available from within the tool
- Importance rating
  - Critical: tools must provide enough ways of looking at data so that users can track down performance problems
- Rating strategy
  - Score based on the relative number of views and analyses provided by each tool
  - Approximately one point for each different view and analysis provided by the tool

Profiling/Tracing Support
- Description
  - Low-overhead profile mode offered by the tool
  - Comprehensive event trace offered by the tool
- Importance rating
  - Critical: a profile mode is useful for quick analysis, and a trace mode is necessary for examining what really happens during execution
- Rating strategy
  - Two points if a profiling mode is available
  - Two points if a tracing mode is available
  - One extra point if the trace file size is within a few percent of the best trace file size across all tools

Response Time
- Description
  - How much time is needed to get data from the tool
- Importance rating
  - Average: the user should not have to wait an extremely long time for data, but high-quality information should always be the first goal of a tool
- Rating strategy
  - Score based on the relative time taken to get performance data from the tool
  - Tools that perform complicated post-mortem analyses or bottleneck detection receive lower scores
  - Tools that provide data while the program is running receive five points

Source Code Correlation
- Description
  - How well the tool relates performance data back to the original source code
- Importance rating
  - Critical: necessary to see which statements and regions of code are causing performance problems
- Rating strategy
  - Four to five points if the tool supports source correlation to the function or line level
  - One to three points if the tool supports an indirect method of attributing data to functions or source lines
  - Zero points if the tool does not provide enough data to map performance metrics back to source code

Stability
- Description
  - How likely the tool is to crash while in use
- Importance rating
  - Important: unstable tools will frustrate users and decrease productivity
- Rating strategy
  - Scored using relative ratings (subjective characteristic)
  - Score takes into account the number of crashes experienced during evaluation, the severity of those crashes, and the number of bugs encountered

Technical Support
- Description
  - How quickly responses are received from tool developers or support departments
  - Quality of information and helpfulness of responses
- Importance rating
  - Average: important for users during installation and initial use of the tool, but becomes less important as time goes on
- Rating strategy
  - Relative rating based on personal communication with our contacts for each tool (subjective characteristic)
  - Timely, informative responses result in four or more points
Portability Characteristics

Extensibility
- Description
  - How easily the tool may be extended to support UPC and SHMEM
- Importance rating
  - Critical: tools that cannot be extended for UPC and SHMEM are almost useless for us
- Rating strategy
  - Commercial tools receive zero points, regardless of whether export or import functionality is available (interoperability is covered by another characteristic)
  - Subjective score based on the functionality provided by the tool
  - Also incorporates the quality of the code (after a quick review)

Hardware Support
- Description
  - Number and depth of hardware platforms supported
- Importance rating
  - Critical: essential for portability
- Rating strategy
  - Based on our estimate of important architectures for UPC and SHMEM
  - Award one point for support of each of the following architectures:
    - IBM SP (AIX)
    - IBM BlueGene/L
    - AlphaServer (Tru64)
    - Cray X1/X1E (UnicOS)
    - Cray XD1 (Linux with Cray proprietary interconnect)
    - SGI Altix (Linux with NUMALink)
    - Generic 64-bit Opteron/Itanium Linux cluster

Heterogeneity
- Description
  - Tool support for running programs across different architectures within a single run
- Importance rating
  - Minor: not very useful on shared-memory machines
- Rating strategy
  - Five points if heterogeneity is supported
  - Zero points if heterogeneity is not supported

Software Support
- Description
  - Number of languages, libraries, and compilers supported
- Importance rating
  - Important: should support many compilers and not hinder library support, but hardware support and extensibility are more important
- Rating strategy
  - Score based on the relative number of languages, libraries, and compilers supported compared with other tools
  - Tools that instrument or record data for existing closed-source libraries receive an extra point (up to a maximum of five points)

Scalability Characteristics

Filtering and Aggregation
- Description
  - How well the tool provides users with means to simplify and summarize the data being displayed
- Importance rating
  - Critical: necessary for users to work effectively with the large data sets generated by performance tools
- Rating strategy
  - Scored using relative ratings (slightly subjective characteristic)
  - Tools that provide many different ways of filtering and aggregating data receive higher scores

Multiple Executions
- Description
  - Support for relating and comparing performance information from different runs
  - Examples: automated display of speedup charts; differences between the time taken by methods using different algorithms or variants of a single algorithm
- Importance rating
  - Critical: important for doing scalability analysis
- Rating strategy
  - Five points if the tool supports relating data from different runs
  - Zero points if not

Performance Bottleneck Detection
- Description
  - How well the tool identifies each known (and unknown) bottleneck in our test suite
- Importance rating
  - Critical: bottleneck detection is the most important function of a performance tool
- Rating strategy
  - Score proportional to the number of PASS ratings given for the test suite programs
  - Slightly subjective characteristic; we have to judge whether the user would be able to determine the bottleneck from the data provided by the tool

Searching
- Description
  - Ability of the tool to search for particular information or events
- Importance rating
  - Minor: can be useful, but it is difficult to provide users with a powerful search that is also user-friendly
- Rating strategy
  - Five points if searching is supported
  - Points deducted if only a simple search is available
  - Zero points if there is no search functionality
Miscellaneous Characteristics

Cost
- Description
  - How much (per seat) the tool costs to use
- Importance rating
  - Important: tools that are prohibitively expensive reduce the overall availability of the tool
- Rating strategy
  - Scale based on per-seat cost:
    - Free: five points
    - $1.00 to $499.99: four points
    - $500.00 to $999.99: three points
    - $1,000.00 to $1,999.99: two points
    - $2,000.00 or more: one point

Interoperability
- Description
  - How well the tool works and integrates with other performance tools
- Importance rating
  - Important: tools lacking in areas like trace visualization can make up for it by exporting data that other tools can understand (also helpful for getting data from third-party sources)
- Rating strategy
  - Zero points if data cannot be imported into or exported from the tool
  - One point for export of data in a simple ASCII format
  - Additional points (up to five) for each format the tool can export and import