Dynamically Discovering Likely Program Invariants to Support Program Evolution

Michael D. Ernst, Jake Cockrell,
William G. Griswold, David Notkin
Presented by: Nick Rutar
Program Invariants

Useful in software development
Protect programmers from making errant changes
 Verify properties of a program


Can be explicitly stated in programs
Programmers can annotate code with invariants
 This can take time and effort
 Many important invariants will be missed

Could there be a way to dynamically discover program invariants?
Daikon: An Invariant Detector




Pick a source program (Daikon is language independent)
Instrument source program to trace variables of interest
Run instrumented program over test cases
Infer invariants over
Instrumented variables (variables present in source)
Derived variables
Created variables that might be of interest
Derived Variables

From any Sequence s
Length: size(s)
 Extremal elements: s[0], s[1], s[-1], s[-2]


From a numeric sequence


sum(s), min(s), max(s)
From any sequence s and numeric variable i
Element at index: s[i], s[i-1]
 Subsequences: s[0…i], s[0…i-1]


From Function Invocations:

Number of calls so far
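The derivations listed above can be sketched in a few lines of Python. This is only an illustration: the function name `derived_variables`, its parameters, and the naming of the resulting keys are assumptions made for this example, not Daikon's own code or output format.

```python
def derived_variables(name, seq, index_name=None, index=None):
    """Return derived variables for one observed value of sequence `seq`.

    `name` labels the sequence; `index_name`/`index` optionally describe a
    numeric variable that is in scope together with the sequence.
    """
    derived = {f"size({name})": len(seq)}

    # Extremal elements: first two and last two, when they exist.
    for pos in (0, 1, -1, -2):
        if -len(seq) <= pos < len(seq):
            derived[f"{name}[{pos}]"] = seq[pos]

    # Derivations that only make sense for a non-empty numeric sequence.
    if seq and all(isinstance(v, (int, float)) for v in seq):
        derived[f"sum({name})"] = sum(seq)
        derived[f"min({name})"] = min(seq)
        derived[f"max({name})"] = max(seq)

    # Derivations that pair the sequence with a numeric variable.
    if index is not None:
        if 0 <= index < len(seq):
            derived[f"{name}[{index_name}]"] = seq[index]
            derived[f"{name}[0..{index_name}]"] = seq[: index + 1]
        if 1 <= index <= len(seq):
            derived[f"{name}[{index_name}-1]"] = seq[index - 1]
        if 0 <= index <= len(seq):
            derived[f"{name}[0..{index_name}-1]"] = seq[:index]
    return derived


example = derived_variables("B", [3, -1, 4], "I", 2)
```

Each derived variable is then treated like any other variable when invariants are inferred, which is how a relation such as S = sum(B[0..I-1]) can be reported even though the subsequence never appears in the source.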
Example Program
(taken from “The Science of Programming”)
i, s := 0, 0;
do i ≠ n →
    i, s := i + 1, s + b[i]
od
Precondition: n ≥ 0
Postcondition: s = (Σ j : 0 ≤ j < n : b[j])
Loop invariant: 0 ≤ i ≤ n and s = (Σ j : 0 ≤ j < i : b[j])
Daikon results from the program
(100 randomly generated input arrays of length 7–13)
ENTER
N = size(B)
N in [7 … 13]
B - all elements ≥ -100
EXIT
N = I = orig(N) = size(B)
B = orig(B)
S = sum(B)
N in [7 … 13]
B - all elements ≥ -100
LOOP
N = size(B)
S = sum(B[0 … I-1])
N in [7 … 13]
I in [0 … 13]
I ≤ N
B - all elements in [-100 … 100]
sum(B) in [-556 … 539]
B[0] nonzero in [-99 … 96]
B[-1] in [-88 … 99]
N != B[-1]
B[0] != B[-1]
*boxes indicate generated invariants that match expected ones
Architecture of the Daikon tool
Original Program → Instrument → Instrumented Program → Run (with Test Suite) → Data Trace → Detect Invariants → Invariants
Original Program → Instrument → Instrumented Program
Daikon has instrumenters for Java, C, and Lisp
Source to Source Translation
Determines which variables are in scope
Inserts code to dump the variables into an output file
Creates a declaration file
Variables being instrumented
 Types in the original program
 Representations in the trace file
 Sets of variables that may be sensibly compared


Operates only on scalar numbers and arrays of numbers.
Scalar numbers include characters and booleans
 Any other type is converted to one of these forms

Instrumented Program → Run → Data Trace
At each program point of interest
Instrumented program writes to a data trace file
All variables in scope
Global variables
Procedure arguments
Local variables
Return values (at procedure exits)
Modification bit
Whether a value has been set since last time
For small programs runtime may be I/O bound
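A rough sketch of what an instrumented program does at run time is shown below. The `Tracer` class, the tab-separated record layout, and the program point names are all hypothetical and chosen for illustration; they are not Daikon's actual trace format. The modification bit is also approximated here by comparing against the value seen at the previous visit, whereas a real instrumenter would track assignments.

```python
import copy
import io


class Tracer:
    """Writes one record per visit to a program point (illustrative format)."""

    def __init__(self, out):
        self.out = out
        self.previous = {}  # program point -> variables seen at the last visit

    def record(self, point, variables):
        before = self.previous.get(point, {})
        self.out.write(point + "\n")
        for name, value in variables.items():
            # Approximate modification bit: 1 if new or changed since last visit.
            modified = int(name not in before or before[name] != value)
            self.out.write(f"{name}\t{value!r}\t{modified}\n")
        self.out.write("\n")
        self.previous[point] = copy.deepcopy(variables)


def traced_sum(b, n, tracer):
    """The array-sum loop with trace calls at entry, loop head, and exit."""
    tracer.record("sum:::ENTER", {"b": b, "n": n})
    i, s = 0, 0
    while i != n:
        tracer.record("sum:::LOOP", {"b": b, "n": n, "i": i, "s": s})
        i, s = i + 1, s + b[i]
    tracer.record("sum:::EXIT", {"b": b, "n": n, "i": i, "s": s, "return": s})
    return s


buffer = io.StringIO()
total = traced_sum([1, 2], 2, Tracer(buffer))
```

Because every in-scope variable is written at every visit, the volume of output grows quickly, which is consistent with the observation that small programs can become I/O bound.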
Data Trace → Detect Invariants → Invariants
Single variable invariants (numeric or sequence)
Constant value: x = a (variable is a constant)
 Uninitialized: x = uninit (variable is never set)
 Modulus: x ≡ a mod b (x mod b = a always holds)


Multiple variables up to 3 (numeric or sequence)
Linear relationship: y = ax + b.
 Reversal: x is the reverse of y
 Invariants over x - y, x + y


These are just a few


Complete list can be found in the paper
Domain-Specific invariants can easily be coded in
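The detection step can be thought of as proposing candidate invariants and discarding each one as soon as a sample contradicts it. The sketch below assumes integer-valued variables and uses only three candidate forms (constant value, equality, and ordering between two variables); the function name `detect_invariants` and the sample layout are assumptions for this example, and the real tool checks many more forms.

```python
from itertools import combinations


def detect_invariants(samples):
    """Return descriptions of candidates that hold on every sample.

    `samples` is a non-empty list of dicts mapping variable name -> int,
    all collected at the same program point.
    """
    names = sorted(samples[0])
    candidates = []

    # Unary candidates: each variable equals the value seen in the first sample.
    for x in names:
        a = samples[0][x]
        candidates.append((f"{x} = {a}", lambda s, x=x, a=a: s[x] == a))

    # Binary candidates over every pair of variables.
    for x, y in combinations(names, 2):
        candidates.append((f"{x} = {y}", lambda s, x=x, y=y: s[x] == s[y]))
        candidates.append((f"{x} <= {y}", lambda s, x=x, y=y: s[x] <= s[y]))
        candidates.append((f"{x} >= {y}", lambda s, x=x, y=y: s[x] >= s[y]))

    # Falsification: a candidate is dropped at the first sample that violates it.
    for sample in samples:
        candidates = [(text, holds) for text, holds in candidates if holds(sample)]
    return [text for text, _ in candidates]


loop_samples = [
    {"i": 0, "n": 7, "k": 3},
    {"i": 3, "n": 7, "k": 3},
    {"i": 7, "n": 7, "k": 3},
]
found = detect_invariants(loop_samples)
```

A new, domain-specific invariant would be one more entry in the candidate list, which is one way to read the remark that such invariants are easy to add.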
Run Time of Daikon

Informally, can be characterized as

Time = O((vars³ × falsetime + trueinvs × testsuite) × program)

vars is the number of variables at a program point (in scope)
Potentially cubic because invariants involve at most 3 variables
falsetime is the (small constant) time to falsify a potential invariant
Most invariants are falsified quickly
trueinvs is the (small) number of true invariants at a program point
Only true invariants are checked for the entire run
testsuite is the size of the test suite
Must balance accuracy versus runtime
program is the number of instrumented program points


The default is proportional to the size of the program
Users can control the extent of instrumentation
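The cubic term comes from counting the ways to pick one, two, or three variables at a program point. The sketch below makes that count and the informal cost expression concrete; the function names and the unit-free cost values are assumptions for illustration only, since the formula describes growth rather than measured times.

```python
from math import comb


def candidate_tuples(num_vars):
    """Number of ways to choose 1, 2, or 3 variables out of num_vars."""
    return sum(comb(num_vars, k) for k in (1, 2, 3))


def estimated_cost(num_vars, falsetime, trueinvs, testsuite, program):
    """Evaluate (vars^3 * falsetime + trueinvs * testsuite) * program."""
    return (num_vars ** 3 * falsetime + trueinvs * testsuite) * program
```

For example, doubling the number of in-scope variables from 10 to 20 raises the number of variable tuples from 175 to 1350, while doubling the test suite only doubles the second term.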
Invariant Stability

Size of Test Suite
Too small
Small number of invariants
More false invariants
Too large
Increases runtime linearly
Interesting vs. Uninteresting
Test suites of different sizes will yield more or fewer invariants
Uninteresting
Difference in a bound on a variable's range
Different small set of possible values
Interesting: everything else
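One way to picture the comparison between two test suites is as a diff over their reported invariants. The sketch below assumes each invariant is stored under a (variables, kind) key with its details as the value; this representation, the kind names, and the function `compare_invariants` are hypothetical and only serve to illustrate the identical / missing / different categories.

```python
# Kinds whose differences are treated as uninteresting in this sketch:
# a changed range bound or a changed small set of possible values.
UNINTERESTING_KINDS = {"range", "value_set"}


def compare_invariants(reference, other):
    """Classify each invariant of `reference` against the set `other`."""
    report = {
        "identical": 0,
        "missing": 0,
        "diff_interesting": 0,
        "diff_uninteresting": 0,
    }
    for key, detail in reference.items():
        kind = key[1]
        if key not in other:
            report["missing"] += 1
        elif other[key] == detail:
            report["identical"] += 1
        elif kind in UNINTERESTING_KINDS:
            report["diff_uninteresting"] += 1
        else:
            report["diff_interesting"] += 1
    return report


full_suite = {
    ("n", "range"): (7, 13),
    ("i,n", "order"): "<=",
    ("s", "modulus"): (0, 2),
}
small_suite = {
    ("n", "range"): (7, 12),
    ("i,n", "order"): "<=",
}
summary = compare_invariants(full_suite, small_suite)
```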
Invariant differences (2500-element test suite)

Invariant Type / Test Cases        500    1000    1500    2000
Identical Unary                   2129    2419    2553    2612
Missing Unary                      125      47      27      14
Diff Unary                         442     230     117      73
  Interesting                       57      18      10       8
  Uninteresting                    385     212     107      65
Identical Binary                  5296    9102   12515   14089
Missing Binary                    4089    1921    1206     732
Diff Binary                        109      45      24      19
  Interesting                       22      21      15      13
  Uninteresting                     87      24       9       6
Invariants and Program Correctness



Compare invariants detected across programs
Correct versions of programs have more invariants than incorrect ones
Examination of 424 intro C programs from U of Washington


Given the number of students, the amount of money, and the number of pizzas, the program calculates whether the students can afford the pizzas.
Chose eight relevant invariants
people in [1 … 50]
pizzas in [1 … 10]
pizza_price in {9, 11}
excess_money in [0 … 40]
slices = 8 * pizzas
slices = 0 (mod 8)
slices_per in {0, 1, 2, 3}
slices_left ≤ people - 1
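The eight goal invariants can be written as predicates and checked against recorded variable values, which gives a feel for how "number of goal invariants detected" could be counted per program. The dictionary layout, the function `holding_invariants`, and the sample values below are assumptions made for this sketch and are not data from the study.

```python
GOAL_INVARIANTS = {
    "people in [1..50]": lambda v: 1 <= v["people"] <= 50,
    "pizzas in [1..10]": lambda v: 1 <= v["pizzas"] <= 10,
    "pizza_price in {9, 11}": lambda v: v["pizza_price"] in (9, 11),
    "excess_money in [0..40]": lambda v: 0 <= v["excess_money"] <= 40,
    "slices = 8 * pizzas": lambda v: v["slices"] == 8 * v["pizzas"],
    "slices = 0 (mod 8)": lambda v: v["slices"] % 8 == 0,
    "slices_per in {0, 1, 2, 3}": lambda v: v["slices_per"] in (0, 1, 2, 3),
    "slices_left <= people - 1": lambda v: v["slices_left"] <= v["people"] - 1,
}


def holding_invariants(samples):
    """Names of goal invariants that hold on every sample."""
    return [
        name
        for name, check in GOAL_INVARIANTS.items()
        if all(check(sample) for sample in samples)
    ]


# Made-up samples: the second one breaks both slice-count invariants.
good = {"people": 10, "pizzas": 3, "pizza_price": 9, "excess_money": 5,
        "slices": 24, "slices_per": 2, "slices_left": 4}
bad = dict(good, slices=25)
```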
Relationship of Grade and Goal Invariants

        Invariants Detected
Grade        2     3     4     5     6
12           4     2     0     0     0
14           9     2     5     2     0
15          15    23    27    11     3
16          33    40    42    19     9
17          13    10    23    27     7
18          16     5    29    27    21
Other Applications of Invariants


Inserted as assert statements for testing
Double-check existing documentation





Check against existing assert statements
Useful when program self-checks are ineffective
Discovering Bugs
Generate test cases or validate existing test suites
Could possibly direct a correctness proof
Ongoing and Future Work

Increasing Relevance
Invariant is relevant if it assists programmer
 Suppress invariants logically implied by others
 Unrelated variables don’t need to be compared
 Ignore variables not assigned since last time


Viewing and Managing Invariants
Overwhelming for a programmer to sort through
 Various tools for selective reporting of invariants




Ordering by category
Retrieval of invariants based on a supplied property
List of invariants by program point
More Ongoing Work

Improving Performance



Balance between invariant quality and runtime
Number of Derived Variables used
Richer Invariants


Invariants over pointer-based data structures
Computing Conditional Invariants
Resources

Daikon website


http://pag.lcs.mit.edu/daikon/download/
Contains links to
Papers
 Source Code
 User Manual
 Developers Manual

Questions???