Software Visualization CS 4460 - Information Visualization Many slides courtesy John Stasko, with edits by J. Foley Software Visualization • Definition “The use of the crafts of typography, graphic design, animation, and cinematography with modern humancomputer interaction and computer graphics technology to facilitate both the human understanding and effective use of computer software.” Price, Baecker and Small, ‘98 CS 4460/7450 2 Challenge • Unlike much information visualization, however, software is often dynamic, thus requiring our visualizations reflect the time dimension History views Animation ... CS 4460/7450 3 Subdomains • Two main subareas of software visualization Program visualization Use of visualization to help programmers, coders, developers Software engineering focus Algorithm visualization Use of visualization to help teach algorithms and data structures Pedagogy focus CS 4460/7450 4 Caveat • This is a HUGH area • Presentation goal: provide flavor of kinds of techniques and systems that have been created Lots of screen shots Some videos CS 4460/7450 5 Program Visualization • Can be as simple as enhanced views of program source • Can be as complex as views of the execution of a highly parallel program, its data structures, run-time heap, etc. CS 4460/7450 6 PV is a big research Area CS 4460/7450 7 PV is a Big Product Category • Do Google searches on Program visualization Code visualization Software visualization • Lots of companies/products CS 4460/7450 8 What Can We Visualize? • ?????? What Can We Visualize? • Structure Static code structures • Behavior Dynamic (execution) code behavior Test suite results Bug issues/relations • Evolution Code repository structure/evolution Program team communications patterns CS 4460/7450 10 Static Code Structures • • • • Call graphs Object class hierarchy Scope of variables Code module sizes CS 4460/7450 11 Execution Data Summaries, such as Total running time Number of times a method was called Amount of time CPU was idle Bytes read/written Memory high-water mark Etc etc etc CS 4460/7450 Details, such as Memory allocations System calls Cache misses Page faults Pipeline flushes Process scheduling Completion of disk reads or writes Message receipt Application phases Etc etc etc 12 Test Results • Hundreds, maybe thousands of tests • For each test: Purpose Result (pass or fail) Could be per-configuration or per-version Relevant parts of the code CS 4460/7450 13 Really Detailed Execution Data • Logging virtual machines can capture everything Enough data to replay program execution and recreate the entire machine state at any point in time Allows “time-traveling” For long running systems, data could span months • Uses: Debugging Understanding attacks CS 4460/7450 14 SoftVis 2010 Program • • • • • • • • • • • • • • • • • • • • • • • • • • • • • An Interactive Ambient Visualization for Code Smells CodePad: Interactive Spaces for Maintaining Concentration in Programming Environments User Evaluation of Polymetric Views Using a Large Visualization Wall Software Evolution Storylines AllocRay: Memory Allocation Visualization for Unmanaged Languages Heapviz: Interactive Heap Visualization for Program Understanding and Debugging A Map of the Heap: Revealing Design Abstractions in Runtime Structures Trevis: A Context Tree Visualization & Analysis Framework and Its Use for Classifying Performance Failure Reports Exploring the Inventor's Paradox: Applying Jigsaw to Software Visualization Dependence Cluster Visualization Towards Anomaly Comprehension: Using Structural Compression to Navigate Profiling Call-Trees Embedding Spatial Software Visualization in the IDE: An Exploratory Study 3D Kiviat Diagrams for the Interactive Analysis of Software Metric Trends Graph Works - Pilot Graph Theory Visualization Tool Visualizing Software Entities Using a Matrix Layout ImpactViz: Visualizing Class Dependencies and the Impact of Changes in Software Revisions VIPERS: Visual Prototyping Environment for Real-Time Imaging Systems Towards Automated Analysis and Visualization of Distributed Software Systems TIE: An Interactive Visualization of Thread Interleavings GEM: Graphical Explorer of MPI Programs Fault Forest Visualization Visualizing Windows System Traces Understanding Complex Multithreaded Software Systems by Using Trace Visualization Zinsight: A Visual and Analytic Environment for Exploring Large Event Traces Jype - A Program Visualization and Programming Exercise Tool for Python Off-Screen Visualization Techniques for Class Diagrams An Automatic Layout Algorithm for BPEL Processes Visual Comparison of Software Architectures Representing Development History in Software Cities Frank xDIVA: Automatic Animation Between Debugging Break Points Understanding Relaxed Memory Consistency Through Interactive Visualization the Execution of Object Orientated Concurrent Programs CS 4460/7450 15 Commercial System Screen Shots CS 4460/7450 16 Dependency Graph Ndepend commercial product; http://www.ndepend.com/SampleReports/OnDb4o/NDependReport.html#/?screen=Main CS 4460/7450 17 Graph and Matrix Rep’n NDepend CS 4460/7450 18 Code Hierarchy Treemap; Lines of Code NDepend CS 4460/7450 19 Open Source System Screen Shots • From The Source-NavigatorTM IDE http://sourcenav.sourceforge.net/index.html CS 4460/7450 20 Hierarchy Browser Window CS 4460/7450 21 Cross-Reference Browser Window CS 4460/7450 22 Cross-Reference Browser Window CS 4460/7450 23 Open Source System Screen Shots • From jBixbe http://www.jbixbe.com/index.html CS 4460/7450 24 Structure Diagram CS 4460/7450 25 Message Exchanges CS 4460/7450 26 Multi-threading CS 4460/7450 27 Research Examples • The following are discussions of a variety of research prototypes • Lots and lots of papers in Software Visualization Conference Proceedings CS 4460/7450 28 Pretty Printing – Common Example CS 4460/7450 29 More Sophisticated • But not so common to apply more sophisticated graphic design • Baecker & Marcus, Design Principles for the Enhanced Presentation of Computer Program Source Text, Proceedings CHI ’86 • No user testing • Looks nice CS 4460/7450 30 Baecker&Marcus Pretty Printing CS 4460/7450 31 SeeSoft System • Pulled-back, far away view of source code • Map one line of source to one line of pixels Maintain indentation & length Color code lines in meaningful way • Like taping your source code to the wall, walking far away, then looking back at it CS 4460/7450 Stephen G. Eick, Joseph L. Steffen and Eric E. Sumner, Jr. “SeeSoft – A Tool for Visualizing Line-Oriented Software Statistics.” IEEE Transactions on Software Engineering, 18(11):957-968, November 1992. 32 SeeSoft View Selected code 15,000 lines of code in 52 files Heat map for colorcoded characteristic CS 4460/7450 Details of selected code 33 SeeSoft • Code tracking (typically means mapping a data attribute to line color) Code modification (when, by whom) Location of bug fixes for specific bug report Location of code to implement a specific feature Code coverage or hotspots • Interactive Change color mappings Link from heat map to code overview Brush views – back to source code CS 4460/7450 34 Tarantula • Developed at GT • Utilizes SeeSoft code view methodology • Takes results of test suite run and helps developer find program faults • Key is the clever color mapping Eagan, Harrold, Jones & Stasko InfoVis ’01 Jones, Harrold & Stasko ICSE ‘02 CS 4460/7450 35 Tarantula View Red – code executed by failed tests Green executed by passed tests Yellow executed by both passed and failed CS 4460/7450 36 Tarantula – Selective Display Failed tests only CS 4460/7450 37 Tarantula: Continuous Colour Mapping • Extend discrete colour mapping by Interpolating between red and green Adjusting brightness according to number of tests • Possibilities: Number of passed or failed tests Ratio of passed to failed tests Ratio of % passed to % failed CS 4460/7450 38 Tarantula: Continuous Colour Mapping • For each line L Let p and f be the percentages of passed and failed tests that executed L If p = f = 0, colour L grey Else, colour L according to Hue: p / ( p + f ), where 0 is red and 1 is green Brightness: max( p, f ) CS 4460/7450 39 CS 4460/7450 40 Tarantula: Future Work • Does it help find bugs? Seems like it should • Link red lines to the tests Static Dynamic CS 4460/7450 41 Gammatella - Visualization of ProgramExecution Data for Deployed Software Orso, Jones and Harrold, Proc. Of ACM Symp. on Software Visualization, June 2003, pp. 67-76. Gammatella: Tri-Level Representation • System level Treemap of package/class hierarchy Size – code lines Smaller areas colored differently according as code lines colored differently • File level: SeeSoft-like view of code • Statement level: Source code (colored text) CS 4460/7450 43 Gammatella screen shot Multiple linked views CS 4460/7450 44 Gammatella • Displays program execution data From data base of executions • Visualization of data from many executions Code coverage and profiling data Execution properties OS Java version Errors Etc etc etc CS 4460/7450 45 Color Coding • Two variables Hue: Red to yellow to green Intensity: Dark to light • Various attributes can be mapped to color Execution profile: red frequent, green infrequent; intensity not used Other mappings not discussed What could be done??? CS 4460/7450 46 System-Level View Gray: Never executed code CS 4460/7450 Execution Bar Colored subdivisions within a module block indicate amounts of code with various usages 47 Execution Bar • Each small vertical bar represents one program execution (test run) Color code – results of test run • May be hundreds of test runs • Treemap shows information for selected test run(s) Select one or range of runs by pointing Select multiple non-contiguous runs via data base query on test run properties For multiple executions, color codings are averages CS 4460/7450 48 Gammatella: Critique • Complete system – not just a visualization • Nicely links code to structure • Trial usage discovered useful but highlevel information Mainly relied on system view • Needs more usage to determine utility • Complex color coding – color and hue – hard to decode CS 4460/7450 49 CVSscan: Visualization of Code Evolution “CVSscan: Visualization of Code Evolution”, Voinea, Telea, and van Wijk, SoftVis 2005 Original presentation by Summer Adams CS 4460/7450 50 Overview • Objective - understand code evolution over multiple versions - support maintenance • Want to answer questions such as What code lines were added, removed, or altered? When? By whom? Which parts of code are unstable? How are changes correlated? How are development tasks distributed? Context in which code segment appeared? CS 4460/7450 51 Screen Shot Versions View of code, line by line and version by version Portion of code from selected version CS 4460/7450 52 Approach • Takes information from CVS (Concurrent Versioning System) source code repository • Uses UNIX’s diff command to compare versions • Visualizes each version as column Lines of code represented as in SeeSoft As thin horizontal color-coded strip CS 4460/7450 53 Two Code Visualization Formats • File-based • Line-based Local: like SeeSoft, just current lines CS 4460/7450 Global: includes future inserts 54 Code Visualization - Examples • 65 software versions • Color code Green – constant Yellow – modified Red - modified by deletion Light blue - modified by insertion Light gray in global view - inserted and deleted lines CS 4460/7450 Code cleanup stage Local: like SeeSoft, just current lines Global: includes future inserts 55 Metrics Per code line metric • Metrics for each version Code size • Metrics for each line of code Author Lifetime at current line • Other ideas for metrics?? CS 4460/7450 Per version metric 56 Overall View Preset zoom levels CS 4460/7450 57 Use Scenario Status Construct Author The same code, viewed with three different attributes being colorcoded. The modified piece of code (yellow status) is a comment (green status) and was modified by programmer whose color is purple (would be nice to have color key ) CS 4460/7450 58 CVScan Evaluation • Modest – two users, each on different code repository Perl Script, 65 versions, 457 lines C program, 60 versions, 2900 lines • 15 minutes training, 15 minutes use • Assessment General usability- think out loud Effectiveness –code understandings after use CS 4460/7450 59 CVScan Evaluation Results • Perl (65 versions, 457 lines) Familiar with overall organization of the file Focus of each developer Areas of significant modifications What modifications referred to • C (60 versions, 2900 lines) Did not have clear image of code’s evolution Did conclude that was legacy code adapted by mainly two users to support IPv6 network protocol Pointed out major modification to memory manager CS 4460/7450 60 CVScan Critique • Very modest evaluation – inconclusive • Use of metrics not very well developed • Nicely leverages SeeSoft to visualizing multiple versions of same code CS 4460/7450 61 Stepping Up • Next step in program visualization is to help debugging and performance optimization by visualizing program executions Data, data structures, run-time heap, memory, control flow, data flow, hot spots, ... CS 4460/7450 62 Graph Visualization • Area that we’ve already studied which is very important to SV • Graphs pop up everywhere in software • Common use: Visualize a call graph, visualize a flow chart, ... CS 4460/7450 63 Sample CallGraph View CS 4460/7450 64 Software Projects • Come up a level, to group of developers working on a code base • What might you want to understand? For instance, in an open source project? CS 4460/7450 65 Stargate: An Author-Centric Approach to Software Project Visualization Ogawas and Ma, Stargate: An Author-Centric Approach to Software Project Visualization Proceedings of IEEE PacificVis 2008 March, 2008 CS 4460/7450 66 Stargate – Start with Modified Sunburst CS 4460/7450 67 Stargate – Add Developers • Dots are developers Placed close to files they worked on Size => amount of activity • File colors => File type CS 4460/7450 68 Stargate – Add Communications • Email communications among developers CS 4460/7450 69 Stargate – Add File History CS 4460/7450 70 Stargate Video • http://vidi.cs.ucdavis.edu/research/videos /stargate • (Show from SV folder) CS 4460/7450 71 SV Takeaways • Multiple dimensions to SV Static code/program relationships Dynamic code/program relationships Testing/coverge Project evolution • Many methods we have studied are applicable, such as ways to depict Time-varying behaviors Object relationships CS 4460/7450 72 The End CS 4460/7450 73 FIELD • Program development and analysis environment with a wide assortment of different program views Integrated a variety of UNIX tools Utilized central message server architecture in which tools communicated through message passing Reiss Software Pract & Exp ‘90 CS 4460/7450 74 FIELD Interface CS 4460/7450 75 Dynamic Call Graph View On call stack CS 4460/7450 Currently active 76 Class Browser CS 4460/7450 77 Heap View Color can be When allocated Block size Where allocated CS 4460/7450 78 3D Call Graph Selected file Groups of calls Collapsed file CS 4460/7450 79 Visualizing Application Behavior on Superscalar Processors Chris Stolte, Robert Bosch, Pat Hanrahan, and Mendel Rosenblum Proc. InfoVis 1999 Superscalar Processors: Quick Overview • Pipeline • Multiple Functional Units Instruction-Level Parallelism (ILP) • Instruction Reordering • Branch Prediction and Speculation • Reorder Buffer Instructions wait to graduate (exit pipeline) CS 4460/7450 81 CS 4460/7450 82 CS 4460/7450 83 CS 4460/7450 84 CS 4460/7450 85 CS 4460/7450 86 CS 4460/7450 87 CS 4460/7450 88 CS 4460/7450 89 Critique • Most code doesn’t need this level of optimization, but The visualization is effective, and would be useful for code that does May reduce the expertise needed to perform low level optimzation • Might be effective as a teaching tool • Bad color scheme: black/purple/brown • Does it scale with processor complexity? CS 4460/7450 90 Concurrent Programs • Understanding parallel programs is even more difficult than serial • Visualization and animation seem naturals for illustrating concurrency • Temporal mapping of program execution to animation becomes critical Kraemer & Stasko JPDC ‘93 CS 4460/7450 91 POLKA • We developed a system, POLKA, that was designed to help people build visualizations of concurrent programs Used for both program and algorithm visualization Used for different programming models: message passing, shared memory threads, ... Stasko & Kraemer JPDC ‘93 CS 4460/7450 92 Message Passing Systems PVM/Conch Topol, Stasko, Sunderam IJPDSN ‘98 CS 4460/7450 93 Shared Memory Threads CS 4460/7450 Pthreads Zhao & Stasko TR ‘95 94 StrataVarious: MultiLayer Visualization of Dynamics in Software System Behavior Doug Kimelman, Bryan Rosenburg, Tova Roth Proc. Fifth IEEE Conf. Visualization ’94, IEEE Computer Society Press, Los Alamitos, Calif., 1994, pp. 172–178. StrataVarious • Trace-driven program visualization • Trace: sequence of <time, event> pairs • Events captured from all layers (strata, hence the name): Hardware Operating System Application • Replay execution history • Coordinate navigation of event views CS 4460/7450 96 StrataVarious: Why Multi-Strata? • Debugging and tuning requires simultaneously analyzing behaviour at multiple layers of the system CS 4460/7450 97 Accumulated process times Call tree Kernel performance stats CS 4460/7450 Process run history 98 CS 4460/7450 99 Goals • Facilitate debugging by providing execution time information • Faster assimilation of execution data • Facilitate visual correlations, trends and anomalies • Several layers including user-level libraries, OS and hardware CS 4460/7450 100 PV • Trace-driven • Trace: Time ordered sequence of events of interest • Synchronous and asynchronous configurations • Multiple concurrent views representing the different layers CS 4460/7450 101 Views • Process scheduling and System activity Scheduling view Activity view CS 4460/7450 102 Views • Memory Activity and Application Progress Memory size view Source of memory allocation State of the page of the data segment CS 4460/7450 103 Memory view Code views CS 4460/7450 104 Views • Hardware Activity and Source Progress Active loop view Hardware performance statistics CS 4460/7450 105 Conclusion • A more effective and comprehensive visualization of software • Worthwhile for developers facing performance issues • Room for improvement in visualization • More direct traversal • Presentation of derived data CS 4460/7450 106 StrataVarious: Critique • Examples demonstrate usefulness • Fundamentally, a good idea Increasing importance as multi-core machines become standard • Many windows Titles not meaningful • Dubious claim that tracing does not alter behavior CS 4460/7450 107 What is Useful? 1991 Survey • Very useful 14,1,18,32,2, 19,5 • Useless 22,33,34 • All rest useful but not essential Software Visualization Tools: Survey and Analysis Sarita Bassil and Rudolf K. Keller, 1991 CS 4460/7450 108 What is Useful? • Very useful 14,1,18,32,2, 19,5 • Useless 22,33,34 • All rest useful but not essential CS 4460/7450 109