Dynamically Discovering Likely Program Invariants to Support Program Evolution
Michael D. Ernst, Jake Cockrell, William G. Griswold, David Notkin
Presented by: Nick Rutar

Program Invariants
- Useful in software development
  - Protect programmers from making errant changes
  - Verify properties of a program
- Can be explicitly stated in programs
  - Programmers can annotate code with invariants
  - This can take time and effort
  - Many important invariants will be missed
- Could there be a way to dynamically discover program invariants?

Daikon: An Invariant Detector
- Pick a source program (Daikon is language independent)
- Instrument the source program to trace variables of interest
- Run the instrumented program over test cases
- Infer invariants over:
  - Instrumented variables (variables present in the source)
  - Derived variables (created variables that might be of interest)

Derived Variables
- From any sequence s:
  - Length: size(s)
  - Extremal elements: s[0], s[1], s[-1], s[-2]
- From a numeric sequence s:
  - sum(s), min(s), max(s)
- From any sequence s and numeric variable i:
  - Element at index: s[i], s[i-1]
  - Subsequences: s[0 … i], s[0 … i-1]
- From function invocations:
  - Number of calls so far

Example Program (taken from "The Science of Programming")
- i, s := 0, 0; do i ≠ n → i, s := i + 1, s + b[i] od
- Precondition: $n \ge 0$
- Postcondition: $s = \sum_{j=0}^{n-1} b[j]$
- Loop invariant: $0 \le i \le n$ and $s = \sum_{j=0}^{i-1} b[j]$

Daikon results for the program (100 randomly generated input arrays of length 7 to 13)
ENTER
- N = size(B)
- N in [7 … 13]
- B: all elements ≥ -100
EXIT
- N = I = orig(N) = size(B)
- B = orig(B)
- S = sum(B)
- N in [7 … 13]
- B: all elements ≥ -100
LOOP
- N = size(B)
- S = sum(B[0 … I-1])
- N in [7 … 13]
- I in [0 … 13]
- I ≤ N
- B: all elements in [-100 … 100]
- sum(B) in [-556 … 539]
- B[0] nonzero in [-99 … 96]
- B[-1] in [-88 … 99]
- N ≠ B[-1]
- B[0] ≠ B[-1]
*On the original slide, boxes mark the generated invariants that match the expected ones.

Architecture of the Daikon tool
Original Program → Instrument → Instrumented Program → Run (with Test Suite) → Data Trace → Detect Invariants → Invariants
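To make the pipeline concrete, here is a toy Python sketch that walks the sum program through the same stages. The program is instrumented by hand to record N, B, I, and S at the loop head, a few derived variables are computed, and any candidate invariant that some sample contradicts is discarded. This is not Daikon's implementation. The helper names (`instrumented_sum`, `derived`, `detect`), the five hand-picked candidates, and the element range of the random arrays are assumptions made for illustration; Daikon checks a much larger set of invariant forms.

```python
import random

def instrumented_sum(b, trace):
    """The example sum program, instrumented by hand (hypothetical helper).

    At the loop head (the LOOP program point) it appends the current
    values of N, B, I, and S to `trace`, standing in for the data trace file.
    """
    n = len(b)
    i, s = 0, 0
    while True:
        trace.append({"N": n, "B": list(b), "I": i, "S": s})
        if i == n:  # guard of: do i != n -> ... od
            break
        i, s = i + 1, s + b[i]  # simultaneous assignment, as in the original
    return s

def derived(sample):
    """Add a few derived variables (size, sum, prefix sum) to one sample."""
    b, i = sample["B"], sample["I"]
    v = dict(sample)
    v["size(B)"] = len(b)
    v["sum(B)"] = sum(b)
    v["sum(B[0..I-1])"] = sum(b[:i])
    return v

# Hypothetical, hand-picked candidate invariants over one augmented sample.
CANDIDATES = {
    "N = size(B)": lambda v: v["N"] == v["size(B)"],
    "S = sum(B[0..I-1])": lambda v: v["S"] == v["sum(B[0..I-1])"],
    "I <= N": lambda v: v["I"] <= v["N"],
    "S = sum(B)": lambda v: v["S"] == v["sum(B)"],
    "I = 0": lambda v: v["I"] == 0,
}

def detect(trace):
    """Return the names of candidates that no sample falsifies."""
    survivors = dict(CANDIDATES)
    for sample in trace:
        v = derived(sample)
        falsified = [name for name, holds in survivors.items() if not holds(v)]
        for name in falsified:
            del survivors[name]
    return sorted(survivors)

# Test suite: 100 random arrays of length 7..13 (element range is an assumption).
rng = random.Random(0)
trace = []
for _ in range(100):
    arr = [rng.randint(-100, 100) for _ in range(rng.randint(7, 13))]
    instrumented_sum(arr, trace)

print(detect(trace))
# ['I <= N', 'N = size(B)', 'S = sum(B[0..I-1])']
```

Of the five candidates, the three that survive are the ones that also appear in Daikon's LOOP output. "S = sum(B)" only holds at loop exit, and "I = 0" only on entry, so the trace falsifies both almost immediately.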
Instrument → Instrumented Program
- Daikon has instrumenters for Java, C, and Lisp
- Source-to-source translation:
  - Determines which variables are in scope
  - Inserts code to dump the variables into an output file
- Creates a declaration file listing:
  - The variables being instrumented
  - Their types in the original program
  - Their representations in the trace file
  - Sets of variables that may sensibly be compared
- Operates only on scalar numbers and arrays of numbers
  - Scalar numbers include characters and booleans
  - Any other type is converted to one of these forms

Instrumented Program → Run → Data Trace
- At each program point of interest, the instrumented program writes to a data trace file:
  - All variables in scope: global variables, procedure arguments, local variables, and return values (at procedure exits)
  - A modification bit: whether a value has been set since the last time
- For small programs, runtime may be I/O bound

Data Trace → Detect Invariants → Invariants
- Single-variable invariants (numeric or sequence):
  - Constant value: x = a (the variable is a constant)
  - Uninitialized: x = uninit (the variable is never set)
  - Modulus: x ≡ a (mod b) (x mod b = a always holds)
- Invariants over multiple variables, up to 3 (numeric or sequence):
  - Linear relationship: y = ax + b
  - Reversal: x is the reverse of y
  - Invariants over x - y and x + y
- These are just a few; the complete list can be found in the paper
- Domain-specific invariants can easily be added

Run Time of Daikon
- Informally, it can be characterized as
  $\mathit{Time} = O\big((\mathit{vars}^3 \times \mathit{falsetime} + \mathit{trueinvs} \times \mathit{testsuite}) \times \mathit{program}\big)$
  where:
  - vars is the number of variables in scope at a program point
  - falsetime is the (small, constant) time to falsify a potential invariant
  - trueinvs is the (small) number of true invariants at a program point
  - testsuite is the size of the test suite
  - program is the number of instrumented program points
- Most invariants are falsified quickly; only true invariants are checked for the entire run
- Potentially cubic in vars, because invariants involve at most 3 variables
- Must balance accuracy against runtime
- By default, the number of instrumented program points is proportional to the size of the program; users can control the extent of instrumentation

Invariant Stability
- Size of the test suite:
  - Too small: a small number of invariants, and more false invariants
  - Too large: runtime increases linearly
- Interesting vs. uninteresting differences:
  - Test suites of different sizes yield more or fewer invariants
  - Uninteresting: a difference in a bound on a variable's range, or a different small set of possible values
  - Interesting: everything else

Invariant differences relative to a 2500-element test suite (Diff = Interesting + Uninteresting)

| Test cases | Unary: Identical | Unary: Missing | Unary: Diff | Unary: Interesting | Unary: Uninteresting | Binary: Identical | Binary: Missing | Binary: Diff | Binary: Interesting | Binary: Uninteresting |
|---|---|---|---|---|---|---|---|---|---|---|
| 500 | 2129 | 125 | 442 | 57 | 385 | 5296 | 4089 | 109 | 22 | 87 |
| 1000 | 2419 | 47 | 230 | 18 | 212 | 9102 | 1921 | 45 | 21 | 24 |
| 1500 | 2553 | 27 | 117 | 10 | 107 | 12515 | 1206 | 24 | 15 | 9 |
| 2000 | 2612 | 14 | 73 | 8 | 65 | 14089 | 732 | 19 | 13 | 6 |

Invariants and Program Correctness
- Compare invariants detected across programs
  - Correct versions of programs have more invariants than incorrect ones
- Examination of 424 introductory C programs from the University of Washington
  - Given the number of students, the amount of money, and the number of pizzas, each program calculates whether the students can afford the pizzas
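The Run Time discussion says that most candidate invariants are falsified quickly and only true invariants are checked for the whole run. That behavior can be illustrated with the linear-relationship form y = ax + b from the Detect Invariants list. In this simplified Python sketch, each ordered pair of variables gets one candidate line, fitted from the first sample and the first later sample whose x value differs. A candidate is dropped the first time a sample contradicts it. This is an assumed stand-in, not Daikon's actual algorithm. The helpers `fit_line` and `detect_linear` are hypothetical, and the variables x, y, z and their values are made-up toy data.

```python
import random
from fractions import Fraction
from itertools import permutations

def fit_line(samples, x, y):
    """Propose y = a*x + b from the first sample and the next one whose x differs."""
    first = samples[0]
    for s in samples[1:]:
        if s[x] != first[x]:
            a = Fraction(s[y] - first[y], s[x] - first[x])
            return a, first[y] - a * first[x]
    return None  # x never varies, so no line can be fitted

def detect_linear(samples, names):
    """Return surviving y = a*x + b candidates and the number of checks made."""
    live = {}
    for x, y in permutations(names, 2):  # every ordered pair of variables
        fit = fit_line(samples, x, y)
        if fit is not None:
            live[(x, y)] = fit
    checks = 0
    for s in samples:
        for (x, y), (a, b) in list(live.items()):
            checks += 1
            if s[y] != a * s[x] + b:
                del live[(x, y)]  # falsified: never checked again
    return live, checks

# Made-up toy trace: y is a linear function of x; z is unrelated noise.
rng = random.Random(1)
samples = []
for _ in range(1000):
    xv = rng.randint(0, 50)
    samples.append({"x": xv, "y": 3 * xv + 2, "z": rng.randint(0, 50)})

live, checks = detect_linear(samples, ["x", "y", "z"])
for (x, y), (a, b) in live.items():
    print(f"{y} = ({a})*{x} + ({b})")
# y = (3)*x + (2)
# x = (1/3)*y + (-2/3)
print(f"{checks} checks, versus {6 * len(samples)} if nothing were ever falsified")
```

Only the two true relationships are checked on every sample. The four candidates involving the unrelated z are discarded within the first few samples, so the check count stays close to 2 × 1000 rather than 6 × 1000. This sketch only forms pairs. Adding three-variable forms is what brings in the vars³ term of the runtime estimate.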
Chose eight relevant invariants:
- people in [1 … 50]
- pizzas in [1 … 10]
- pizza_price in {9, 11}
- excess_money in [0 … 40]
- slices = 8 × pizzas
- slices ≡ 0 (mod 8)
- slices_per in {0, 1, 2, 3}
- slices_left ≤ people - 1

Relationship of Grade and Goal Invariants
Number of programs with each grade (rows) in which a given number of goal invariants was detected (columns). The counts sum to the 424 programs examined.

| Grade | 2 detected | 3 detected | 4 detected | 5 detected | 6 detected |
|---|---|---|---|---|---|
| 12 | 4 | 2 | 0 | 0 | 0 |
| 14 | 9 | 2 | 5 | 2 | 0 |
| 15 | 15 | 23 | 27 | 11 | 3 |
| 16 | 33 | 40 | 42 | 19 | 9 |
| 17 | 13 | 10 | 23 | 27 | 7 |
| 18 | 16 | 5 | 29 | 27 | 21 |

Other Applications of Invariants
- Inserted as assert statements for testing
- Double-check existing documentation
- Check against existing assert statements
  - Useful when program self-checks are ineffective
- Discovering bugs
- Generate test cases or validate existing test suites
- Could possibly direct a correctness proof

Ongoing and Future Work
- Increasing relevance
  - An invariant is relevant if it assists the programmer
  - Suppress invariants logically implied by others
  - Unrelated variables don't need to be compared
  - Ignore variables not assigned since the last time
- Viewing and managing invariants
  - Overwhelming for a programmer to sort through
  - Various tools for selective reporting of invariants:
    - Ordering by category
    - Retrieving invariants based on a supplied property
    - Listing invariants by program point

More Ongoing Work
- Improving performance
  - Balance between invariant quality and runtime
  - Number of derived variables used
- Richer invariants
  - Invariants over pointer-based data structures
  - Computing conditional invariants

Resources
- Daikon website: http://pag.lcs.mit.edu/daikon/download/
  - Contains links to papers, source code, the user manual, and the developer's manual

Questions?