Dynamically Discovering Likely Program Invariants to Support Program Evolution
Michael Ernst, Jake Cockrell, William Griswold, David Notkin
Presented by Charles Song

What Are Invariants?
    “An invariant is a condition that does not change, or should not, if the system is working correctly.” – Wikipedia

Invariant Example
    int getDayOfMonth() { … }
    (0 < returned value <= 31)

Invariant Example
    a = x;
    y = 0;
    (y = x - a)
    while (a != 0) {
        y = y + 1;
        a = a - 1;
    }
    (y = x - a)

    If x = 5, the values at the loop head are:
        y = 0; x = 5; a = 5
        y = 1; x = 5; a = 4
        y = 2; x = 5; a = 3
        y = 3; x = 5; a = 2
        y = 4; x = 5; a = 1
        y = 5; x = 5; a = 0

Invariants & Software Evolution
    Specify correct behavior of programs (Axiomatic Approach)
    Protect programmers from making changes that violate correct behavior

Explicit Invariants
    Invariants are great; where do we get some?
        Have programmers annotate code
        Automatically infer invariants

Technique Overview
    Dynamic Discovery of Invariants
        Execute a program on a collection of inputs
        Extract variable values
        Infer invariants

Invariant Detection Engine: Instrumentation
    Select program points at which to insert instrumentation
        Procedure entry and exit points
        Loop heads
    Select variables to examine at selected points
        All variables in scope

Invariant Detection Engine: Selecting/Running Test Suites
    Requires repeated execution of instrumentation points
    Accuracy of inferred invariants depends on quality of inputs

Invariant Detection Engine: Inferring Invariants
    Use outputs of instrumented programs
    List invariants detected at each instrumented point

Invariants Checked
    Constants/small number of values
    Range (a < x < b), modulus
    Linear relationship (x = ay + bz + c)
    Comparisons (x < y)
    Functions (z = max(x, y))
    Sequences (elements < 100, membership)

Other Invariants
    Negative invariants
        Relationships that were expected but never observed
        Determined by probability
    Derived variables
        Array: first & last element, length, subarray
        Numeric array: sum, min, max
        Function invocations

Staged Derivation & Inference
    Derived variables are not introduced until invariants have been computed for the variables they derive from
    If j >= len(A), then do not derive A[j]

Evaluations
    The Science of Programming
        Programs with formal pre- & postconditions and loop invariants
        Detected the stated properties and more
    Search/Replace C Program
        Undocumented code
        Most invariants remained unchanged after modification
        Changed invariants verified the modifications

Performance Factors
    Number of variables in scope
        Largest effect on run time (quadratic)
        Plot different sets of variables at the same instrumentation point
        10 derived variables for 1 original one
    Number of test cases
        Smaller effect on run time (linear)

Invariant Stability
    500, 1000, … 2500, 3000 test cases
    Compare unary and binary invariants
    Knee somewhere between 500 and 1000
    Problems with pointers and uninitialized arrays

Performance Improvements
    Select only the parts of the program of interest
    Use fewer test cases, at the risk of less precise output
    Check fewer invariants

Conclusions
    Automatically detect invariants in programs
    Encourage programmers to think in terms of invariants
    Not useful to programmers who know exactly what they seek
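
Illustrative sketch (not from the slides)
    The three steps from the Technique Overview (execute on a collection of inputs, extract variable values, infer invariants) can be sketched in a few lines of Python, applied to the loop from the second Invariant Example. This is a minimal illustration under stated assumptions, not the authors' detection engine: the names traced_loop and infer_invariants are hypothetical, the instrumentation is inserted by hand at the loop head, and only a tiny subset of candidates is checked (constant, range, pairwise comparison, and a sum relation with unit coefficients).

```python
from itertools import combinations


def traced_loop(x, trace):
    """The slide's loop, hand-instrumented at the loop head (hypothetical helper)."""
    a, y = x, 0
    while True:
        # Instrumentation point: record all variables in scope.
        trace.append({"x": x, "y": y, "a": a})
        if a == 0:
            break
        y += 1
        a -= 1
    return y


def infer_invariants(samples):
    """Return the candidate invariants that hold on every recorded sample."""
    names = sorted(samples[0])
    found = []

    # Unary candidates: a constant value, otherwise the observed range.
    for v in names:
        values = [s[v] for s in samples]
        lo, hi = min(values), max(values)
        found.append(f"{v} == {lo}" if lo == hi else f"{lo} <= {v} <= {hi}")

    # Binary candidates: ordering between each pair of variables.
    for u, v in combinations(names, 2):
        if all(s[u] <= s[v] for s in samples):
            found.append(f"{u} <= {v}")
        if all(s[u] >= s[v] for s in samples):
            found.append(f"{u} >= {v}")

    # Ternary candidates: p == q + r (a linear relation with unit coefficients).
    for p in names:
        others = [n for n in names if n != p]
        for q, r in combinations(others, 2):
            if all(s[p] == s[q] + s[r] for s in samples):
                found.append(f"{p} == {q} + {r}")

    return found


# "Test suite": a small, arbitrary collection of inputs (an assumption).
samples = []
for x in [0, 3, 5, 8]:
    traced_loop(x, samples)

invariants = infer_invariants(samples)
for inv in invariants:
    print(inv)
```

    On these inputs the sketch reports x == a + y, which is the slide's invariant y = x - a rearranged. It also reports ranges such as 0 <= x <= 8 that merely reflect the chosen inputs, which illustrates why the accuracy of inferred invariants depends on the quality of the test suite.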