A Case Study of HPC Metrics Collection and Analysis
Philip Johnson and Michael Paulding, University of Hawaii, Honolulu, Hawaii

Goals of the case study

• Provide a complete implementation of one Purpose Based Benchmark problem definition, called Optimal Truss Design.
• Implement the Optimal Truss Design system in C++ using MPI on a 240-node Linux cluster at the University of Hawaii.
• Develop and evaluate automated support for HPC process and product measurement using Hackystat.
• Assess the utility of the metrics for understanding HPC development.

Metrics Collected

• Size (number of files; total SLOC; "parallel" SLOC containing an MPI directive; "serial" SLOC not containing an MPI directive; test code). A line-classifier sketch appears at the end of this poster.
• Active Time (amount of time spent editing Optimal Truss Design files)
• Performance (wall-clock time on 1, 2, 4, 8, 16, and 32 processors). A timing sketch appears at the end of this poster.
• Milestone Tests (indicates functional completeness)
• Command Line Invocations

Results: Basic Process and Product Measures

  Total Source Lines of Code               3,320 LOC
  Total Test Lines of Code                 901 LOC
  Total MPI Lines of Code                  1,032 LOC
  Total Days (calendar time)               1 year
  Total Days (with development activity)   88 days
  Total Active Time                        152 hours
  Total Distinct MPI Directives            60 directives
  Total Files                              56 files
  Total Sequential Files (no MPI)          51 files
  Total Parallel Files (containing MPI)    5 files
  Execution Time                           126 sec (1 processor)
                                           66 sec (2 processors)
                                           33 sec (4 processors)
                                           27 sec (8 processors)
                                           39 sec (16 processors)
                                           43 sec (32 processors)

Results: Derived Process and Product Measures

  Derived Metric                   Definition                                     Value
  Productivity Proxy               LOC / Active Time                              22 LOC/hour
  Average Daily Active Time        Total Active Time / Total Days                 1.73 hours/day
  Test Code Density Percentage     Total Test LOC / Total LOC                     27%
  MPI Code Density Percentage      Total MPI LOC / Total LOC                      31%
  MPI File Density Percentage      Total MPI Files / Total Files                  9%
  MPI Directive Frequency Ratio    Total MPI Directives : Total MPI LOC           1 directive : 17 LOC
  Speedup                          Exec. Time (1 proc.) / Exec. Time (n procs.)   1.0 (1 processor)
                                                                                  1.9 (2 processors)
                                                                                  3.7 (4 processors)
                                                                                  4.5 (8 processors)
                                                                                  3.2 (16 processors)
                                                                                  2.9 (32 processors)

(A sketch reproducing the speedup computation appears at the end of this poster.)

Results: Process and Product Telemetry Charts

[Telemetry charts from the original poster; not reproduced here.]

Results: Daily Diary with CLI and Most Active File

[Daily diary chart from the original poster; not reproduced here.]

Insights and Lessons Learned

• Productivity (22 LOC/hour) and test code density (27%) seem in line with traditional software engineering metrics.
• Speedup data indicates almost linear speedup up to 4 processors, then falls off sharply, indicating that the current solution is not scalable.
• Parallel and serial LOC were equal at the start of the project; most effort was then devoted to serial code, with some final enhancements to parallel code at the end of the project.
• Performance data was not comparable over the course of the project (only final numbers available; no telemetry).
• Hackystat provides effective infrastructure for collection of process and product metrics.
• This case study provides useful baseline data to compare with future studies.

Future research:
• Compare to an OpenMP or JavaParty implementation.
• Gather metrics while improving the scalability of the system.
• Compare metrics against other application types.
• Analyze CLI data for patterns and bottlenecks.

For More Information

• "Understanding HPCS development through automated process and product measurement with Hackystat," Philip M. Johnson and Michael G. Paulding, Proceedings of the Second Workshop on Productivity and Performance in High-End Computing.

Thanks to our sponsors.
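Illustrative Code Sketches

The performance metric above was gathered as wall-clock time on 1 to 32 processors. Below is a minimal sketch of how such timings are commonly collected with MPI_Wtime; it is not the actual Optimal Truss Design code, and truss_solve() is a hypothetical placeholder for the real computation.

    // timing_sketch.cpp -- minimal MPI wall-clock timing harness.
    // Illustrative only: truss_solve() is a hypothetical stand-in
    // for the actual Optimal Truss Design solver.
    #include <mpi.h>
    #include <cstdio>

    static void truss_solve() {
        // Placeholder for the parallel computation being timed.
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Barrier(MPI_COMM_WORLD);           // synchronize before timing
        double start = MPI_Wtime();
        truss_solve();
        MPI_Barrier(MPI_COMM_WORLD);           // wait until all ranks finish
        double elapsed = MPI_Wtime() - start;

        if (rank == 0) {
            std::printf("%d processors: %.1f sec\n", size, elapsed);
        }
        MPI_Finalize();
        return 0;
    }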
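The speedup values in the derived-measures table are defined as Exec. Time (1 processor) / Exec. Time (n processors). The sketch below recomputes them from the execution times reported in the basic-measures table; because those timings are rounded to whole seconds, the results differ slightly from the poster's values at 4 and 8 processors.

    // speedup_sketch.cpp -- recomputes speedup from the reported
    // execution times, using the table's definition T(1) / T(n).
    #include <cstdio>

    int main() {
        const int    procs[] = {1, 2, 4, 8, 16, 32};
        const double secs[]  = {126.0, 66.0, 33.0, 27.0, 39.0, 43.0};

        for (int i = 0; i < 6; ++i) {
            double speedup = secs[0] / secs[i];  // T(1) / T(n)
            std::printf("%2d processors: %6.1f sec, speedup %.1f\n",
                        procs[i], secs[i], speedup);
        }
        return 0;
    }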
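The size metric distinguishes "parallel" SLOC (lines containing an MPI directive) from "serial" SLOC. A hypothetical classifier for this distinction is sketched below; it simply flags any non-blank line containing the MPI_ prefix, does not exclude comment lines, and is not necessarily how Hackystat's sensors perform the classification.

    // sloc_sketch.cpp -- classifies source lines as "parallel"
    // (containing an MPI directive) or "serial". A hypothetical
    // sketch; Hackystat's actual sensors may use another heuristic.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main(int argc, char** argv) {
        if (argc < 2) {
            std::cerr << "usage: sloc_sketch <source-file>\n";
            return 1;
        }
        std::ifstream in(argv[1]);
        std::string line;
        long parallel = 0, serial = 0;
        while (std::getline(in, line)) {
            if (line.find_first_not_of(" \t") == std::string::npos)
                continue;                              // skip blank lines
            if (line.find("MPI_") != std::string::npos)
                ++parallel;                            // line uses MPI
            else
                ++serial;
        }
        std::cout << "parallel SLOC: " << parallel
                  << ", serial SLOC: " << serial << "\n";
        return 0;
    }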