Dynamically Discovering
Likely Program Invariants to
Support Program Evolution
Michael D. Ernst, Jake Cockrell,
William G. Griswold, David Notkin
Presented by: Nick Rutar
Program Invariants
Useful in software development
Protect programmers from making errant changes
Verify properties of a program
Can be explicitly stated in programs
Programmers can annotate code with invariants
This can take time and effort
Many important invariants will be missed
Daikon - Dynamic Invariant Detector
Dynamic -- From Program Executions
Step 1: Instrument Source Program
Trace Variables of Interest
Step 2: Run Instrumented Program Over Test Suite
Step 3: Infer Invariants from
Instrumented Variables
Derived Variables
Example Program
(taken from “The Science of Programming”)
i = 0;
s = 0;
do i ≠ n →
s = s + b[i];
i = i + 1
od
Precondition:
n≥ 0
Postcondition:
s = (Σ j : 0 ≤ j < n : b[j])
Loop Invariant:
0 ≤ i ≤ n and
s = (Σ j : 0 ≤ j < i : b[j])
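The conditions above can be turned into runtime checks. The following is a minimal Python sketch, not the original guarded-command program; the function name `array_sum` and the extra bound `n <= len(b)` are assumptions added for illustration.

```python
def array_sum(b, n):
    """Sum the first n elements of b, asserting the stated conditions."""
    # Precondition n >= 0; the upper bound on n is an added assumption.
    assert 0 <= n <= len(b)
    i, s = 0, 0
    while i != n:
        # Loop invariant, checked at the loop head.
        assert 0 <= i <= n and s == sum(b[:i])
        s = s + b[i]
        i = i + 1
    # Postcondition: s is the sum of b[0 .. n-1].
    assert s == sum(b[:n])
    return s
```

These are the properties a programmer would have to write by hand; the point of the slides that follow is that a detector can propose them from executions instead.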
Daikon results from the program
(100 randomly generated input arrays of length 7 to 13)
ENTER
N = size(B)
N in [7 … 13]
B - All elements ≥ -100
EXIT
N = I = orig(N) = size(B)
B = orig(B)
S = sum(B)
N in [7 … 13]
B - All elements ≥ -100
LOOP
N = size(B)
S = sum(B[0 … I -1])
N in [7 … 13]
I in [0 … 13]
I≤N
B - all elements in [-100 … 100]
sum(B) in [-556 … 539]
B[0] nonzero in [-99 … 96]
B[-1] in [-88 … 99]
N != B[-1] (negative)
B[0] != B[-1] (negative)
Instrumentation
Insert instrumentation points
Writes to a file values for
Procedure Entry
Procedure Exit
Loop Heads
All variables in scope
Global Variables
Procedure arguments
Local Variables
Procedure’s return value
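A rough idea of what an instrumentation point records can be sketched in Python. This is not Daikon's instrumenter: the decorator `instrument`, the in-memory list `TRACE` (standing in for the trace file), and the sample function `total` are all hypothetical, and the sketch only covers procedure entry and exit, not loop heads, globals, or locals.

```python
import copy
import functools
import inspect

TRACE = []  # hypothetical stand-in for the trace file


def instrument(func):
    """Record argument values at entry and exit, plus the return value."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        # Copy the values so later mutation does not change the entry record.
        TRACE.append((func.__name__, "ENTER", copy.deepcopy(dict(bound.arguments))))
        result = func(*args, **kwargs)
        at_exit = copy.deepcopy(dict(bound.arguments))
        at_exit["return"] = result
        TRACE.append((func.__name__, "EXIT", at_exit))
        return result

    return wrapper


@instrument
def total(b, n):
    """Sample procedure: sum of the first n elements of b."""
    s = 0
    for i in range(n):
        s += b[i]
    return s
```

Each call to `total` appends one ENTER record and one EXIT record to `TRACE`; running a test suite over the instrumented function builds up the samples that inference works from.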
Available for Platforms
LISP
C/C++
Java (from Daikon website)
Eclipse plug-in available
Perl (from Daikon website)
Inferring invariants
System checks for the following (x,y,z variables; a,b,c computed constants):
Any variable
constant or small number of values
Numeric variable
range (a ≤ x ≤ b)
modulus & nonmodulus
Multiple numbers
linear relationship (such as x = ay + bz + c)
functions (all those in standard lib, e.g. x = abs(y))
comparisons (x < y, x ≥ y, x == y)
invariants over x + y and x -y
Sequence:
sortedness
invariants over all elements (e.g., every element < 100)
Multiple sequences
subsequence & lexicographic relationship
Sequence and scalar
membership
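To make the checking concrete, here is a minimal Python sketch covering only three of the forms listed above (constant, range, and pairwise comparison). The function `infer_invariants` and the sample format (one dict of variable name to number per sample) are assumptions for illustration, not Daikon's data structures. A candidate is dropped at its first counterexample.

```python
import itertools
import operator

COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def infer_invariants(samples):
    """Return the candidate invariants that hold over every sample."""
    names = sorted(samples[0])
    # Start with every pairwise comparison as a live candidate.
    alive = {
        (x, op, y)
        for x, y in itertools.combinations(names, 2)
        for op in COMPARISONS
    }
    lo = dict(samples[0])
    hi = dict(samples[0])
    for sample in samples:
        for name in names:
            lo[name] = min(lo[name], sample[name])
            hi[name] = max(hi[name], sample[name])
        # Keep only candidates this sample does not falsify.
        alive = {
            (x, op, y)
            for (x, op, y) in alive
            if COMPARISONS[op](sample[x], sample[y])
        }
    result = []
    for name in names:
        if lo[name] == hi[name]:
            result.append(f"{name} == {lo[name]}")
        else:
            result.append(f"{name} in [{lo[name]} ... {hi[name]}]")
    # Implied comparisons (e.g. both < and <=) are not filtered in this sketch.
    result.extend(f"{x} {op} {y}" for x, op, y in sorted(alive))
    return result
```

For samples where `i` takes the values 0, 3, 7 and `n` is always 7, the sketch returns a range for `i`, a constant for `n`, and `i <= n`.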
Inferring invariants (continued)
Each potential invariant is tested
Once an invariant is falsified by a sample, it is not tested again
Negative Invariants
Relationships that would be expected to show up by chance but never occur in the input (e.g., a value that is never 0)
Reported only if the probability that this happened by chance is below a set limit
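The probability test can be illustrated for the "nonzero" case. The sketch below assumes values are uniformly distributed over the observed range, so the chance that every one of `num_samples` values misses 0 is (1 - 1/r)^num_samples for a range of r values. The function names and the default limit of 0.01 are assumptions for illustration.

```python
def chance_probability(range_size, num_samples):
    """Probability that num_samples uniform draws all miss one given value."""
    return (1.0 - 1.0 / range_size) ** num_samples


def report_nonzero(values, limit=0.01):
    """Decide whether 'x is nonzero' is worth reporting for these values."""
    lo, hi = min(values), max(values)
    # Only meaningful when 0 lies strictly inside the range but never occurs.
    if 0 in values or not (lo < 0 < hi):
        return False
    range_size = hi - lo + 1
    return chance_probability(range_size, len(values)) < limit
```

With few samples over a wide range, the absence of 0 is easily a coincidence and nothing is reported; with many samples over a narrow range, the absence becomes significant.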
Derived Variables
Expressions treated same as regular variables
Include:
From any array: first and last elements, length
From numeric array: sum, min, max
From array and scalar: element at that index(a[i]), subarray up to, and
subarray beyond, that index
From function invocation: number of calls so far
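A minimal sketch of how derived variables might be computed for one array and one scalar index follows. The function `derive`, the naming of the derived expressions, and the exact slicing convention for "subarray beyond that index" are assumptions; the call-count variable is left out.

```python
def derive(name, seq, index_name=None, index=None):
    """Build derived variables for array `seq` and optional scalar `index`."""
    derived = {f"size({name})": len(seq)}
    if seq:
        derived[f"{name}[0]"] = seq[0]
        derived[f"{name}[-1]"] = seq[-1]
        # Numeric summaries; assumes seq holds numbers.
        derived[f"sum({name})"] = sum(seq)
        derived[f"min({name})"] = min(seq)
        derived[f"max({name})"] = max(seq)
    if index is not None and 0 <= index <= len(seq):
        derived[f"{name}[0..{index_name}-1]"] = seq[:index]
        derived[f"{name}[{index_name}..]"] = seq[index:]
        if index < len(seq):
            derived[f"{name}[{index_name}]"] = seq[index]
    return derived
```

Each entry of the returned dict is then fed to inference like any traced variable, which is how a relation such as S = sum(B[0 … I-1]) can be found without S ever being compared to an expression written by the programmer.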
Using Invariants
Modified Siemens replace (~500 LOC) program
Takes in regular expression and replacement string as input
Copies input stream to output stream replacing matched strings
Task: add the + operator to input patterns, treating <pat>+ as <pat><pat>*
Use invariants for glimpse on how program runs
Found occurrences where initial belief was contradicted
Prevented introducing bugs based on flawed knowledge of code
Found instance of unreported array bounds error
Using invariants (continued)
Everything learned from “replace” could have been learned
by combination of
Reading the code
Static Analyses
Selected Program Instrumentation
Invariants give benefits that other approaches do not
Inferred invariants are abstraction of larger amount of data
Flags raised with unexpected invariants or expected invariants not appearing
Queries against database build intuition about source of invariant
Inferred invariants provide basis for programmer inferences
Invariants provide beneficial degree of serendipity
Results - Time
Ran tests with 500 to 3000 test inputs for replace
Inferred ~71 variables per inst point in replace
6 original, 65 derived, 52 scalars, 19 sequences
On average, 10 derived for every original
1000 test cases
Produce 10,120 samples per instrumentation point
System takes 220 seconds to infer invariants
3000 test cases
33,801 samples
Processing takes 540 seconds
Invariant detection time grows quadratically with the
number of variables over which invariants are checked
Time grows linearly with test suite size
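One way to see where quadratic growth can come from: a binary invariant relates a pair of variables, and the number of pairs grows with the square of the number of variables. The helper below is a small illustration of that count only, not a model of the measured running times.

```python
def candidate_pairs(num_variables):
    """Number of unordered variable pairs a binary invariant could relate."""
    return num_variables * (num_variables - 1) // 2
```

With the roughly 71 variables per instrumentation point reported above, `candidate_pairs(71)` is 2485; doubling to 142 variables gives 10011, about four times as many pairs.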
Invariant Stability
Relationship between test suite size and invariants
Across test suites
Identical - invariant same between two test suites
Missing - invariant is present in one test suite, but not other
Different - invariant is different between two test suites
Interesting - Worthy of further study to determine relevance
Uninteresting - Peculiarity in the data
S1 in [ 0 … 98 ] (99 values)
S1 >= 0 (96 values)
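The identical / missing / different split can be sketched as a comparison of two invariant sets. The representation here is an assumption: each invariant is keyed by a hypothetical (variable, kind) pair, and the value is its reported text. Deciding whether a difference is interesting still takes human judgment and is not modeled.

```python
def compare(inv_a, inv_b):
    """Classify invariants from two test suites by key."""
    identical, different, missing = [], [], []
    for key in sorted(set(inv_a) | set(inv_b)):
        if key not in inv_a or key not in inv_b:
            missing.append(key)      # present for one suite only
        elif inv_a[key] == inv_b[key]:
            identical.append(key)    # same invariant in both suites
        else:
            different.append(key)    # same variables, different detail
    return identical, different, missing
```

Under this keying, the S1 example above (a bounded range from one suite, only a lower bound from the other) lands in the "different" bucket.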
Invariant differences (2500-element test suite)
Invariant type \ Test cases    500    1000    1500    2000
Identical unary               2129    2419    2553    2612
Missing unary                  125      47      27      14
Different unary                442     230     117      73
  Interesting                   57      18      10       8
  Uninteresting                385     212     107      65
Identical binary              5296    9102   12515   14089
Missing binary                4089    1921    1206     732
Different binary               109      45      24      19
  Interesting                   22      21      15      13
  Uninteresting                 87      24       9       6
Invariants and Program
Correctness
Compare invariants detected across programs
Correct versions of programs have more invariants than
incorrect ones
Examination of 424 intro C programs from U of Washington
Given # of students, amount of money, # of pizzas, calculates whether the
students can afford the pizzas.
Chose eight relevant invariants
people – [1…50]
pizzas – [1…10]
pizza_price – {9,11}
excess_money – [0...40]
slices = 8 * pizzas
slices = 0 (mod 8)
slices_per – {0,1,2,3}
slices_left ≤ people - 1
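The eight goal invariants can be written as predicates and checked against recorded runs. This sketch only tests whether each property holds over every run, which is weaker than a detector actually reporting it; the record format, the function `holding_invariants`, and the reading of the last invariant as slices_left ≤ people - 1 are assumptions.

```python
# Goal invariants from the slide, as predicates over one run's values.
GOAL_INVARIANTS = {
    "people in [1..50]": lambda r: 1 <= r["people"] <= 50,
    "pizzas in [1..10]": lambda r: 1 <= r["pizzas"] <= 10,
    "pizza_price in {9, 11}": lambda r: r["pizza_price"] in (9, 11),
    "excess_money in [0..40]": lambda r: 0 <= r["excess_money"] <= 40,
    "slices = 8 * pizzas": lambda r: r["slices"] == 8 * r["pizzas"],
    "slices = 0 (mod 8)": lambda r: r["slices"] % 8 == 0,
    "slices_per in {0, 1, 2, 3}": lambda r: r["slices_per"] in (0, 1, 2, 3),
    "slices_left <= people - 1": lambda r: r["slices_left"] <= r["people"] - 1,
}


def holding_invariants(runs):
    """Names of the goal invariants that hold over every recorded run."""
    return [
        name
        for name, check in GOAL_INVARIANTS.items()
        if all(check(run) for run in runs)
    ]
```

A program whose runs satisfy more of these properties scores higher, which is the quantity the grade table relates to.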
Relationship of Grade and Goal Invariants
Number of programs, by grade and number of goal invariants detected
         Invariants detected
Grade     2     3     4     5     6
12        4     2     0     0     0
14        9     2     5     2     0
15       15    23    27    11     3
16       33    40    42    19     9
17       13    10    23    27     7
18       16     5    29    27    21
Future Work (from 2001 paper)
Increasing Relevance
Invariant is relevant if it assists programmer
Suppress invariants logically implied by others
Viewing and Managing Invariants
Overwhelming for a programmer to sort through
Various tools for selective reporting of invariants
Improving Performance
Balance between invariant quality and runtime
Number of Derived Variables used
Richer Invariants
Invariants over pointer-based data structures
Computing Conditional Invariants
Resources
Daikon website
http://pag.csail.mit.edu/daikon/
Contains links to
Papers
Source Code
User Manual
Developers Manual
Questions???