Deep Start: A Hybrid Strategy for Automated Performance Problem Searches

Philip C. Roth
pcroth@cs.wisc.edu
Computer Sciences Department
University of Wisconsin
1210 W. Dayton St.
Madison, WI 53706-1685 USA

Paradyn/Condor Week (12 March 2001, Madison, WI)

Performance Consultant
• Paradyn’s automated bottleneck diagnosis component
• Search-based
  • General to very specific experiments
• Experimental data collected using dynamic instrumentation
• Automates process that experienced programmer would use

Deep Start Search Strategy
• Goal: more scalable automated searches
• Idea: search “closer” to actual bottlenecks
• Hybrid approach
  • Automated search using dynamic instrumentation
  • Stack sampling
• Benefits
  • Efficiency: find bottlenecks more quickly
  • Effectiveness: find bottlenecks hidden from current search strategy

Performance Consultant
• Hypotheses: reasons why application may be performing poorly
  TopLevelHypothesis
  ExcessiveSyncWaitingTime
  ExcessiveIOBlockingTime
  CPUbound
  TooManySmallIOOps

Performance Consultant
• Resources: locations where application may be performing poorly

Performance Consultant
• Focus: tuple of resources, one from each hierarchy
• Names a set of application resources
• </Code/setup.c,/Machine/lc05.cs.wisc.edu,/SyncObject>

Current Search Strategy
• First determine why application is performing poorly
• Search through hypotheses at whole program focus
• Then find where application is performing poorly
• Refine focus as much as possible
• Code follows call graph
• Others follow resource hierarchy structure
• One step at a time
• Prune search path when experiment metric is below threshold
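
The "where" phase of the current strategy (refine along the call graph one step at a time, prune when the metric is below threshold) can be sketched as follows. This is only an illustration, not Paradyn's implementation: the breadth-first order, the function names (`main`, `setup`, `solve`, and so on), the metric values, and the threshold of 0.2 are all assumptions made up for the example, and a real search would measure each focus with dynamic instrumentation instead of looking it up in a table.

```python
from collections import deque

def top_down_search(root, children, metric, threshold):
    """Test each focus; refine into its children only if its metric reaches the threshold."""
    bottlenecks = []   # foci whose metric reached the threshold
    tested = []        # every focus an experiment was run for, in order
    queue = deque([root])
    seen = {root}
    while queue:
        focus = queue.popleft()
        tested.append(focus)
        if metric(focus) < threshold:
            continue   # prune: nothing below this focus is examined
        bottlenecks.append(focus)
        for child in children.get(focus, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return bottlenecks, tested

# Hypothetical call graph and hypothetical fractions of execution time.
call_graph = {
    "main": ["setup", "solve"],
    "setup": ["read_input"],
    "solve": ["exchange", "compute"],
}
time_fraction = {
    "main": 1.0, "setup": 0.05, "read_input": 0.04,
    "solve": 0.9, "exchange": 0.15, "compute": 0.7,
}

bottlenecks, tested = top_down_search(
    "main", call_graph, time_fraction.__getitem__, threshold=0.2
)
print(bottlenecks)
print(tested)
```

In this sketch `read_input` is never tested because its caller `setup` was pruned, while `compute` is reached only after `main` and `solve` have both been tested. That step-by-step descent is the cost Deep Start aims to reduce.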

Deep Start Search Strategy
• Goal: start searching “closer” to actual bottlenecks
• Approach: Use stack samples gathered as side-effect of dynamic instrumentation as hints
• Sampling augments current strategy
  • Won’t miss bottlenecks due to sampling

Deep Start Example
[Figure: example search; annotations: “Deep starter”, “Adding deep starter was worthwhile!”]

Stack Sampling
• Paradyn daemons perform stack walk whenever they insert dynamic instrumentation
• Daemons now save stack samples
• Samples delivered in batches to Performance Consultant

Choosing Deep Starters
• Function count graph
  • ABCD
  • AECD
  • AFD
  • AFG
[Figure: function count graph with node counts A:4, B:1, C:2, D:3, E:1, F:2, G:1]

Choosing Deep Starters
• Consider functions whose count is above threshold
• Percentage of total samples seen
• Choose deepest function in above-threshold subgraphs
[Figure: the same function count graph, with Threshold = 3]

Adding Deep Starters
• Look for deep starters each time PC refines along search path that has already searched Code hierarchy
• Search from deep starter at high priority
  • Focuses attention near likely bottlenecks

Results
• Applications
  • Sequential circuit layout
    • Sun Ultra 10 (SPARC)
    • Solaris 7
  • Parallel global circulation simulation
  • Parallel quantum chromodynamics simulation
    • Eight-node x86 cluster, 100 Mb/s Ethernet switch
    • Linux 2.2.17, high-resolution timer patch
    • MPICH 1.2
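
Returning to the Choosing Deep Starters slides, the selection step can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the Performance Consultant's code: it reads ABCD, AECD, AFD, and AFG as four sampled call stacks (outermost function first), treats "above threshold" as a count greater than or equal to the threshold, and interprets "deepest function in above-threshold subgraphs" as an above-threshold function with no above-threshold callee. The helper names `function_count_graph` and `choose_deep_starters` are hypothetical.

```python
from collections import Counter

def function_count_graph(samples):
    """Build per-function sample counts and caller-to-callee edges from stack samples."""
    counts = Counter()
    edges = {}
    for stack in samples:
        for fn in set(stack):              # count a function once per sample
            counts[fn] += 1
        for caller, callee in zip(stack, stack[1:]):
            edges.setdefault(caller, set()).add(callee)
    return counts, edges

def choose_deep_starters(samples, threshold):
    """Return hot functions that have no hot callee (assumed reading of 'deepest')."""
    counts, edges = function_count_graph(samples)
    hot = {fn for fn, n in counts.items() if n >= threshold}   # assumption: >=
    return sorted(fn for fn in hot if not (edges.get(fn, set()) & hot))

# The four samples from the slides, outermost frame first.
samples = [list("ABCD"), list("AECD"), list("AFD"), list("AFG")]

counts, _ = function_count_graph(samples)
print(dict(counts))
print(choose_deep_starters(samples, threshold=3))
print(choose_deep_starters(samples, threshold=2))
```

The counts reproduce the slide's graph (A:4, B:1, C:2, D:3, E:1, F:2, G:1). Under these assumptions a threshold of 3 selects both A and D, because no direct callee of A is above threshold, so A forms its own one-node subgraph. A threshold of 2 connects A, F, C, and D into a single subgraph and selects only D. Whether the original algorithm would also report A in the first case is not stated in the slides.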

Bubba
• sequential circuit layout
[Figure: bottlenecks found (0% to 100%) versus time (0 to 250 sec); series: Deep Start, Call Graph]

Su3_rmd
• SU(3) lattice gauge theory simulation
[Figure: bottlenecks found (0% to 100%) versus time (0 to 300 sec); series: Deep Start, Call Graph]

Om3
• global ocean general circulation model
[Figure: bottlenecks found (0% to 100%) versus time (0 to 300 sec); series: Deep Start, Call Graph]

Future Work
• Bidirectional Deep Start
  • Search upward from deep starter as well as downward
  • Takes further advantage of stack samples as “hot paths”
• “Priming the pump”
  • Sampling period at the start of a Performance Consultant search
  • Avoids making early deep starter decisions based on too few samples

Future Work
• Minimize cost of finding deep starters
  • With probability p at each refinement
  • Every nth refinement
  • Every nth sample
  • Push model
• Make context-sensitive decisions to support deep starters in non-Code hierarchies
  • Identify functions in stack samples

Conclusions
• Deep Start strategy is more efficient and effective than current Performance Consultant search strategy
• Hybrid strategy takes advantage of strengths of dynamic instrumentation and sampling
• Technique applicable to other types of hints
• Better scalability for automated searches

Demo Wednesday!