SeeSys: Space-Filling Software Visualization
Marla J. Baker
Stephen G. Eick
AT&T Bell Labs

Description of the paper
•A concrete application of TreeMaps
•Influenced by Dynamic Queries as well
•Provides a high-level visualization of a large-scale software engineering project
•A fast way of getting an overview from a CVS repository?
•In other words: CVS for management
•This is a “real-world” application!

What questions do we want to answer visually?
1. Which subsystems are the largest? Where is new development activity?
2. Where are the large directories? How are the directories changing?
3. What proportion of work on a subsystem relates to fixing bugs as opposed to adding new functionality?
4. What components are candidates for code restructuring, based on their histories of required bug fixes?
5. How are the subsystems changing between releases?

What assumptions must we make about the information to be visualized?
The data must be:
•Quantitative. Why? So we can easily compare different elements and display them with graphs.
•Additive. Why? So we can break a whole up into parts, and know that the sum of these parts will reproduce the whole.
TreeMaps seem to require these two notions.

Any hidden assumptions?
Yes! We’re also assuming that these types of metrics are easily kept by every programmer on the project and are readily available to SeeSys. Basically, we assume that something like CVS is running underneath SeeSys.
Is this a valid assumption? Of course! But I wanted to make this explicit.

What can we measure about our target system?
•Non-comment source lines (NCSL)
•Software complexity metrics
  •How do we get these?
•Number and scope of modifications
•Number of programmers making modifications
•Number and type of bugs (a subcategory of modifications)

1) Subsystem Information
Question: Which subsystems are largest?
•Check the area of a bounding box
•Check the color (redundant coloring)
Question: Where is new development taking place?
•Check the gray-fill area of a bounding box
•It is perpendicular to the division of the bounding boxes

Subsystem Information cont’d

2) Directory Information
Question: Where are the large directories?
•Find the thickest slices of a subsystem’s bounding box
•They are parallel to the division of the subsystems
•Larger slices are brighter in hue
•Newly added code is gray-filled

Directory Information cont’d

Directory Information cont’d: Zoom In

3) Error-Prone Code
Question: Which subsystems and directories have the most bugs?
•No more redundant coloring (why?)
•Area of each subsystem represents new NCSL
•Area of gray fill is NCSL dedicated to bug fixes
•Blue spikes represent “directory bug fixing NCSL detail”

Error-Prone Code cont’d

4) Recurring Problems
Question: What subsystems would make good candidates for code restructuring?
•Area of each subsystem represents number of bugs
•Gray-fill area represents fix-on-fix bugs
•Blue spikes again represent directory detail
•Why can’t I zoom in here? Ahh!

Recurring Problems cont’d

5) System Evolution
Question: How have subsystems changed between versions?
•Area of rectangle represents largest size ever
•Colored fill area represents size under current version
•Can animate from one version to subsequent versions to get a picture over time
•Could this be improved?

System Evolution cont’d

Favorite Sentence
“When applied to production-sized systems, routines for producing flow-charts, function call-graphs, and structure diagrams often break because the display is too complicated. Or they produce displays that contain too much information and are illegible.”
I translate this to read: Unlike the other tools I’ve cited, which are all nice little toys if you’re locked away in your ivory tower, this is not a toy application! This is a real-world application!
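
To make the geometry of the five views concrete, here is a minimal sketch of the idea as these slides describe it, not the SeeSys implementation. It assumes a single row of subsystems whose widths are proportional to NCSL, directory slices cut parallel to the subsystem divisions, and a fill level drawn perpendicular to them. The names `Rect`, `slice_side_by_side`, `fill_from_bottom`, `layout`, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rect:
    """Axis-aligned rectangle in screen coordinates (y grows downward)."""
    x: float
    y: float
    w: float
    h: float


def slice_side_by_side(bounds: Rect, sizes: List[float]) -> List[Rect]:
    """Cut bounds into vertical strips whose widths are proportional to sizes.

    All strips share the same height, so strip area is proportional to size.
    """
    total = sum(sizes)
    rects = []
    x = bounds.x
    for size in sizes:
        w = bounds.w * size / total if total > 0 else 0.0
        rects.append(Rect(x, bounds.y, w, bounds.h))
        x += w
    return rects


def fill_from_bottom(bounds: Rect, fraction: float) -> Rect:
    """Return the bottom part of bounds that covers `fraction` of its area."""
    fraction = max(0.0, min(1.0, fraction))
    h = bounds.h * fraction
    return Rect(bounds.x, bounds.y + bounds.h - h, bounds.w, h)


# Assumed input shape: subsystem -> directory -> (total NCSL, new NCSL).
System = Dict[str, Dict[str, Tuple[int, int]]]


def layout(system: System, screen: Rect) -> Dict[str, dict]:
    """Compute a bounding box, a new-code fill, and directory slices per subsystem."""
    names = list(system)
    totals = [sum(ncsl for ncsl, _ in system[n].values()) for n in names]
    views = {}
    for name, total, box in zip(names, totals, slice_side_by_side(screen, totals)):
        dirs = system[name]
        new = sum(new_ncsl for _, new_ncsl in dirs.values())
        dir_boxes = slice_side_by_side(box, [ncsl for ncsl, _ in dirs.values()])
        views[name] = {
            "box": box,
            "new_fill": fill_from_bottom(box, new / total if total else 0.0),
            "dirs": dict(zip(dirs, dir_boxes)),
        }
    return views


# Made-up numbers, for illustration only.
example = {
    "subsystem_a": {"dir1": (300, 30), "dir2": (100, 50)},
    "subsystem_b": {"dir3": (400, 0)},
}
views = layout(example, Rect(0, 0, 800, 100))
```

The same two helpers would cover the other views by swapping the statistic: bug-fix NCSL over new NCSL for error-prone code, fix-on-fix bugs over all bugs for recurring problems, and current size over largest size ever for system evolution.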
Contributions
•Concrete, useful, large-scale, real-world application of TreeMaps
•Proof of concept that TreeMaps can handle extremely large datasets in a sensible way
•Showcases effective use of redundant coloring
•Highlights hierarchical zooming, from subsystem to directory to file, between versions

Contributions cont’d
Still, much better than this:

Index: BundleDownloader.java
===================================================================
RCS file: /fs/savoir/pugh/p/cvs/java/daveho/cl/BundleDownloader.java,v
retrieving revision 1.11
diff -u -r1.11 BundleDownloader.java
--- BundleDownloader.java 2000/08/29 15:31:49
+++ BundleDownloader.java 2001/02/07 20:15:53 1.11
@@ -56,10 +56,11 @@
     if ( tag == BundleProtocol.NOTFOUND ) {
         String name = m_input.readUTF();
         callback.notFound( name );
+        continue;
     }
     if ( tag != BundleProtocol.BUNDLE )
-        throw new IllegalArgumentException("Invalid tag byte");
+        throw new IllegalArgumentException("Invalid tag byte: " + tag);
     int length = m_input.readInt();

Notes on the references
•Cites the TreeMaps paper (Johnson and Shneiderman, 1991)
•Cites work on visualizing line-oriented data, such as an individual source file. This looks cool! But it’s not relevant.
•Cites lots of other papers, too… But I think the TreeMaps paper is the major influence on this work.

Critique - weaknesses
•I really want zoomed-in pictures of directories with error-prone code and recurring problems!
•I’m not sure the techniques used to display error-prone code and recurring problems were all that great anyway…
•The screenshots, even in the original paper, are terrible.
•No measurements! I want to see “4 out of 5 middle managers prefer SeeSys to the leading brand.”

Critique - strengths
•Short, simple, elegant paper
•Attacks a real problem
•Doesn’t try to do too much: SeeSys provides high-level visualizations of large, complex software systems. Nothing more.

What has happened to this topic?
My web search uncovered that:
•A Google search shows that SeeSys is someone’s domain name and also an obscure MATLAB command.
•This paper has been cited 4 times, though 3 of those citations come from a single paper that showed up in a couple of places.
•A couple of software engineering classes study this tool.
•I was not able to find any commercial organizations that use this particular tool, though I’m not sure that information would be public anyway.

Conclusions
•Realistically, what else was left to do?
•SeeSys provides useful, general, effective high-level visualizations of any quantitative, additive measurements.
•Further useful visualizations would likely be too specific to a given system for this generalized framework
•Or would need to be done per line of source code. This is a very different type of visualization problem, for which TreeMaps are probably not the best medium.
•More sophisticated Q&A measurements are more difficult to obtain; at some point, software engineering becomes an art

My main question:
Has this been written in Java? Does GNU have a free version? Can I build this type of TreeMap interface on top of a CVS distribution, like jCVS?
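
As a starting point for that last question, here is a small sketch of the data preparation such an interface would need. It relies only on the additivity assumption from earlier in these slides: per-file NCSL counts roll up into directory totals and then into subsystem totals. The input format (a mapping from `subsystem/directory/file` paths to NCSL) and the function name `roll_up` are assumptions for illustration. How the counts would be extracted from CVS or jCVS is not covered here.

```python
from collections import defaultdict
from typing import Dict, Tuple


def roll_up(file_ncsl: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Sum per-file NCSL into per-subsystem and per-directory totals.

    Assumes every path has the hypothetical form "subsystem/directory/file".
    """
    by_subsystem: Dict[str, int] = defaultdict(int)
    by_directory: Dict[str, int] = defaultdict(int)
    for path, ncsl in file_ncsl.items():
        subsystem, directory, _file = path.split("/", 2)
        by_subsystem[subsystem] += ncsl
        by_directory[subsystem + "/" + directory] += ncsl
    return dict(by_subsystem), dict(by_directory)


# Made-up counts, for illustration only.
files = {
    "net/tcp/conn.c": 120,
    "net/tcp/timer.c": 80,
    "net/udp/sock.c": 50,
    "ui/widgets/button.c": 200,
}
subsystems, directories = roll_up(files)

# Additivity: the parts reproduce the whole at every level.
assert sum(directories.values()) == sum(subsystems.values()) == sum(files.values())
```

Because the totals are additive, the subsystem and directory numbers can be fed directly into an area-proportional layout, and any other additive statistic (bug counts, number of modifications) could be rolled up the same way.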