Using Set Operations on Code Coverage Data to Discover Program Properties

by Nick Rutar
Motivation

Many Programs already have code coverage data
 Various Code Coverage Tools Available
 Widely Explored Area of Research
 Regression tests with coverage data becoming more common
Code coverage data contains wealth of information about the program
 Data usually limited to how program reports it
 Want to milk the data for all it is worth
 Possibly useful for finding errors in the program
Code Coverage

Three Main Types
 Statement
  Every line of code
 Conditional
  Every decision in program (if/else)
 Path
  Every path in the program
Program usually Instrumented
 Dynamic or Static
Usually presented as a composite of separate tests
Using Set Operations

Why use set operations?
 Most developers familiar with sets
 Data for statement coverage maps nicely onto sets
 Possible to manipulate data easily and give glimpses of properties of the code
 Most code coverage tools implicitly use sets anyway
Set Operations

Union
 Traditional Coverage
Intersection
 Lines run in all tests
Difference
 Potential for Locating Errors
 Probably the biggest stretch from how the data is currently used
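The three operations can be sketched in a few lines of Java. This is a minimal illustration, not the tool's actual code: the class name `CoverageSetOps`, the method names, and the sample line numbers are all assumptions made for the example. Each test run is modeled as the set of line numbers it executed.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: each test run is a set of executed line numbers (hypothetical data). */
class CoverageSetOps {
    /** Lines executed by at least one run, i.e. traditional composite coverage. */
    static Set<Integer> union(List<Set<Integer>> runs) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> run : runs) {
            result.addAll(run);
        }
        return result;
    }

    /** Lines executed by every run. */
    static Set<Integer> intersection(List<Set<Integer>> runs) {
        if (runs.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> result = new TreeSet<>(runs.get(0));
        for (Set<Integer> run : runs) {
            result.retainAll(run);
        }
        return result;
    }

    /** Lines executed by run a but not by run b. */
    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> runA = Set.of(1, 2, 3, 5);   // made-up line numbers
        Set<Integer> runB = Set.of(1, 2, 4, 5);
        List<Set<Integer>> runs = List.of(runA, runB);
        System.out.println(union(runs));            // [1, 2, 3, 4, 5]
        System.out.println(intersection(runs));     // [1, 2, 5]
        System.out.println(difference(runA, runB)); // [3]
    }
}
```

A sorted `TreeSet` is used here only so the printed line numbers come out in order; any `Set` implementation would do.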
Set Operations At Work
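#include <stdlib.h>  /* atoi */

int main(int argc, char *argv[])
{
    int x, y, z;
    x = y = z = 0;
    if (argc == 2)
        x = atoi(argv[1]);  /* only reached when an input is given */
    if (x == 1)
        y = 3;
    else if (x == 2)
        y = 4;
    if (y > 0)
        z = 5;              /* reached for inputs 1 and 2 */
    else
        z = -2;             /* reached when y is still 0, e.g. no input */
    return z;
}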
Inputs: No input, 1, 2
Sets shown: Union, Intersection, Difference
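To make the example concrete, the sketch below applies the three operations to the example program for the three inputs. The per-input statement sets were traced by hand from the program's control flow, so they are an assumption rather than output from a coverage tool, and a real tool may count lines such as `else if` differently. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hand-traced statement coverage of the example program (assumed, not tool output). */
class ExampleCoverage {
    static Set<String> run(String... statements) {
        return new LinkedHashSet<>(Arrays.asList(statements));
    }

    static final Set<String> NO_INPUT = run(
        "x = y = z = 0", "if (argc == 2)", "if (x == 1)", "else if (x == 2)",
        "if (y > 0)", "z = -2", "return z");
    static final Set<String> INPUT_1 = run(
        "x = y = z = 0", "if (argc == 2)", "x = atoi(argv[1])", "if (x == 1)",
        "y = 3", "if (y > 0)", "z = 5", "return z");
    static final Set<String> INPUT_2 = run(
        "x = y = z = 0", "if (argc == 2)", "x = atoi(argv[1])", "if (x == 1)",
        "else if (x == 2)", "y = 4", "if (y > 0)", "z = 5", "return z");

    /** Statements executed by at least one of the three runs. */
    static Set<String> union() {
        Set<String> result = new LinkedHashSet<>(NO_INPUT);
        result.addAll(INPUT_1);
        result.addAll(INPUT_2);
        return result;
    }

    /** Statements executed by all three runs. */
    static Set<String> intersection() {
        Set<String> result = new LinkedHashSet<>(NO_INPUT);
        result.retainAll(INPUT_1);
        result.retainAll(INPUT_2);
        return result;
    }

    /** Statements executed only by the no-input run. */
    static Set<String> onlyNoInput() {
        Set<String> result = new LinkedHashSet<>(NO_INPUT);
        result.removeAll(INPUT_1);
        result.removeAll(INPUT_2);
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Union:        " + union());
        System.out.println("Intersection: " + intersection());
        System.out.println("Difference:   " + onlyNoInput()); // [z = -2]
    }
}
```

Under these assumptions the union covers every statement, the intersection is the unconditional skeleton of the program, and the difference isolates `z = -2` as the one statement that only the no-input run reaches.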
Off the Beaten Path Sets

Notation: Diff (-), Union (U), Intersection (I)
U/I Bad Sets - U Good Sets
 Sometimes gives a better basis for finding bad code
 Closest example of prior work only dealt with one bad run at a time
Any given test - itself
 Gives you the empty set
U (I of Sets & (U/I Bad Sets - U Good Sets))
 Gives you a very rough slice of program that went bad
Manipulate data as seen fit for what you are looking for …
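One possible reading of these combinations, sketched in Java. The class `SuspectLines`, its method names, and the sample runs are hypothetical, and the "&" in the third expression is interpreted here as taking the union of the two operands; the slides do not spell that out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of the "bad sets minus good sets" combinations (names are hypothetical). */
class SuspectLines {
    static Set<Integer> unionAll(List<Set<Integer>> runs) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> run : runs) {
            result.addAll(run);
        }
        return result;
    }

    static Set<Integer> intersectAll(List<Set<Integer>> runs) {
        if (runs.isEmpty()) {
            return new TreeSet<>();
        }
        Set<Integer> result = new TreeSet<>(runs.get(0));
        for (Set<Integer> run : runs) {
            result.retainAll(run);
        }
        return result;
    }

    /** (U or I of failing runs) - U of passing runs. */
    static Set<Integer> suspects(List<Set<Integer>> failing,
                                 List<Set<Integer>> passing,
                                 boolean inEveryFailingRun) {
        Set<Integer> result = inEveryFailingRun ? intersectAll(failing) : unionAll(failing);
        result.removeAll(unionAll(passing));
        return result;
    }

    /** U (I of all runs, suspects): the common core plus the suspect lines. */
    static Set<Integer> roughSlice(List<Set<Integer>> failing, List<Set<Integer>> passing) {
        List<Set<Integer>> all = new ArrayList<>(failing);
        all.addAll(passing);
        Set<Integer> slice = intersectAll(all);
        slice.addAll(suspects(failing, passing, false));
        return slice;
    }

    public static void main(String[] args) {
        List<Set<Integer>> passing = List.of(Set.of(1, 2, 3, 6), Set.of(1, 2, 4, 6));
        List<Set<Integer>> failing = List.of(Set.of(1, 2, 5, 6, 7), Set.of(1, 2, 5, 6, 8));
        System.out.println(suspects(failing, passing, false)); // [5, 7, 8]
        System.out.println(suspects(failing, passing, true));  // [5]
        System.out.println(roughSlice(failing, passing));      // [1, 2, 5, 6, 7, 8]
    }
}
```

Intersecting the failing runs first gives a tighter suspect set (only lines every failing run hit), while the union variant is more forgiving when the failures have different causes.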
Other Code Coverage Info

Pareto principle
 Better known as 80-20 rule
 Pareto noticed 80% of the land in Italy owned by 20% of people
 Shows up in all kinds of domains
  Nick’s high school - 80% of girls dated 20% of the boys
 Software 80-20 rule
  20% of the lines of code account for 80% of the runtime of the software
Code Coverage often has frequency information
 Use that information to find performance bottlenecks
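A rough sketch of how frequency information could point at bottlenecks: sort lines by execution count and keep the hottest ones until they account for a chosen share of all executions. Execution count is only a proxy for runtime, and the class name `HotLines`, the method `hottest`, and the sample counts are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: find the few lines that account for most executions (hypothetical names). */
class HotLines {
    /**
     * Returns line numbers, hottest first, until their combined execution
     * count reaches the given share (e.g. 0.8) of all executions.
     */
    static List<Integer> hottest(Map<Integer, Long> hitsPerLine, double share) {
        long total = 0;
        for (long count : hitsPerLine.values()) {
            total += count;
        }
        List<Map.Entry<Integer, Long>> entries = new ArrayList<>(hitsPerLine.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        List<Integer> hot = new ArrayList<>();
        long running = 0;
        for (Map.Entry<Integer, Long> entry : entries) {
            if (running >= share * total) {
                break;
            }
            hot.add(entry.getKey());
            running += entry.getValue();
        }
        return hot;
    }

    public static void main(String[] args) {
        // Made-up execution counts: line number -> times executed.
        Map<Integer, Long> hits = Map.of(10, 1L, 11, 1L, 12, 900L, 13, 50L, 14, 48L);
        System.out.println(hottest(hits, 0.8)); // [12]
    }
}
```

In this made-up data one line out of five carries 90% of the executions, which is the kind of skew the 80-20 rule predicts.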
Implementation

Create tool that can use the set information
Implementation details
 Created in Java
 Based on the output format of the LCOV coverage tool
 Takes in pre-generated coverage information as input
 Supports Union, Difference, and Intersection
 Supports Frequency Information
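The input step might look roughly like the sketch below, which reads the `SF:` (source file), `DA:<line>,<count>` (line data), and `end_of_record` entries of an LCOV tracefile and ignores everything else. It is not the tool's actual parser; the class `LcovReader` and its methods are hypothetical, and the sample file name is made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of reading LCOV line data into per-file execution counts (hypothetical names). */
class LcovReader {
    /** Parses SF:/DA: records into file -> (line number -> execution count). */
    static Map<String, Map<Integer, Long>> parse(List<String> lines) {
        Map<String, Map<Integer, Long>> byFile = new LinkedHashMap<>();
        Map<Integer, Long> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("SF:")) {
                current = byFile.computeIfAbsent(line.substring(3), k -> new LinkedHashMap<>());
            } else if (line.startsWith("DA:") && current != null) {
                String[] parts = line.substring(3).split(",");
                int lineNo = Integer.parseInt(parts[0]);
                long count = Long.parseLong(parts[1]);
                current.merge(lineNo, count, Long::sum); // add up repeated records
            } else if (line.equals("end_of_record")) {
                current = null;
            }
        }
        return byFile;
    }

    /** The set view used by the set operations: lines with a nonzero count. */
    static Set<Integer> executedLines(Map<Integer, Long> counts) {
        Set<Integer> executed = new TreeSet<>();
        for (Map.Entry<Integer, Long> entry : counts.entrySet()) {
            if (entry.getValue() > 0) {
                executed.add(entry.getKey());
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "TN:", "SF:example.c", "DA:6,1", "DA:7,0", "DA:8,3", "end_of_record");
        Map<String, Map<Integer, Long>> data = parse(sample);
        System.out.println(executedLines(data.get("example.c"))); // [6, 8]
    }
}
```

Keeping the counts alongside the executed-line sets lets the same parsed data feed both the set operations and the frequency view.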
Demo
Evaluation

Test Large Program against its regression tests
 Use Dyninst for evaluation
  C++ program that does binary instrumentation
  100+ Source Files
  ~30,000 LOC instrumented to create coverage data
  Nightly build already has coverage capability with regression tests
 Verify Union matches coverage data given by tool
 Use Difference to try to find errors
  Series of tests with various inputs
  See which inputs cause failure and locate lines to discover error
Future Work

For the Tool
 Create Template for Insertion into program
  This program doesn’t care what language you are using
  Just needs input format to generate initial sets
  Specify format in text file, program uses it to input data
 Better Visualization to specify points of interest
  Highlight source code that still has active lines
 Usability
  Right now more of a proof of concept than a battle-hardened tool
In General
 More evaluation of using Diff for finding errors in the program
 Evaluation of software bottlenecks
 IDE integration
Questions???