50.530: Software Engineering
Sun Jun
SUTD
Week 10: Invariant Generation
Problem
{pre} while B do program {post} holds if there exists an invariant inv
such that the following are satisfied:
(1) pre => inv
(2) {inv && B} program {inv}
(3) inv && !B => post
and the loop terminates.
How do we find inv so as to complete the proof?
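The three proof obligations can be tried out on a small example. The following is a minimal sketch, not a proof: it checks the obligations by brute force over a bounded set of states, for a hypothetical loop `x = 0; while (x < n) x++;` with post-condition `x == n`. The function name `check_invariant` and the chosen bounds are assumptions made for illustration.

```python
from itertools import product


def check_invariant(pre, inv, guard, body, post, states):
    """Check the three proof obligations on a finite set of states."""
    init_ok = all(inv(s) for s in states if pre(s))                     # (1) pre => inv
    step_ok = all(inv(body(s)) for s in states if inv(s) and guard(s))  # (2) {inv && B} program {inv}
    exit_ok = all(post(s) for s in states if inv(s) and not guard(s))   # (3) inv && !B => post
    return init_ok, step_ok, exit_ok


# Hypothetical loop: x = 0; while (x < n) x++;   states are pairs (x, n).
states = list(product(range(-5, 6), repeat=2))
pre = lambda s: s[0] == 0 and s[1] >= 0
guard = lambda s: s[0] < s[1]
body = lambda s: (s[0] + 1, s[1])
post = lambda s: s[0] == s[1]

strong = lambda s: 0 <= s[0] <= s[1]   # candidate invariant 0 <= x <= n
weak = lambda s: s[0] >= 0             # candidate invariant x >= 0

print(check_invariant(pre, strong, guard, body, post, states))  # (True, True, True)
print(check_invariant(pre, weak, guard, body, post, states))    # (True, True, False)
```

The weaker candidate is preserved by the loop but fails obligation (3), which is why the choice of inv matters for completing the proof.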
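Big View
[Figure: regions pre and inv of the state space]
pre => inv
Big View
[Figure: regions B, !B, post, pre, and inv of the state space]
inv && !B => post
Big View
[Figure: regions B, !B, post, pre, and inv of the state space, with an arrow for one iteration]
{inv && B} program {inv}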
Static/Dynamic Analysis
• Static analysis: infer (loop) invariants based on
source code without executing the program
(treating programs as mathematical formulas)
• Dynamic analysis: infer (loop) invariants based
on testing results.
– It’s about learning something about the invariants
and making guesses!
Exercise 1
x = 0.1; y = 0;
while (x < 2) {
k = 4 - x*x;
y = sqrt(4-k);
x += 0.001;
}
if (y < 0) {
error();
}
Show that the error is not occurring.
Ernst et al. IEEE Transactions on Software Engineering 2001
DYNAMICALLY DISCOVERING LIKELY
PROGRAM INVARIANTS TO SUPPORT
PROGRAM EVOLUTION
The Approach
Seem familiar?
Instrumentation
• Instrument at the beginning/end of each
method and the start of loops.
• Daikon only supports two forms of data: scalar
numbers (including characters and Booleans)
and sequences of scalars.
• Convert other values into one of these forms.
Example: Instrumentation
Original:
public int sumUp (int[] B, int N) {
  int i = 0;
  int s = 0;
  while (i != N) {
    s = s + B[i];
    i = i + 1;
  }
  return s;
}
Instrumented:
public int sumUp (int[] B, int N) {
  //add code to output values
  int i = 0;
  int s = 0;
  while (i != N) {
    //add code to output values
    s = s + B[i];
    i = i + 1;
  }
  //add code to output values
  return s;
}
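As an illustration of what "add code to output values" could look like, here is a small Python sketch of the same method that records variable values at method entry, at the start of each loop iteration, and at method exit. The `trace` parameter and the record layout are assumptions for this example; they are not Daikon's actual trace format.

```python
def sum_up(b, n, trace):
    """Sum the first n elements of b, logging samples at three program points."""
    trace.append(("enter", {"n": n, "size_b": len(b)}))
    i, s = 0, 0
    while i != n:
        trace.append(("loop", {"i": i, "s": s, "n": n}))
        s += b[i]
        i += 1
    trace.append(("exit", {"s": s, "n": n}))
    return s


trace = []
print(sum_up([1, 2, 3], 3, trace))  # 6
print(len(trace))                   # 5: one entry, three loop samples, one exit
```

Each recorded dictionary is one sample for the program point named in the first component.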
Example: Testing
100 randomly-generated arrays of length 7 to 13, in
which each element was a random number in the range
of -100 to 100.
The following is the learned pre-condition.
The following is the learned post-condition.
The following loop invariants are learned.
Discussion
What invariants should we infer?
What Invariants to Infer?
• Invariants over any variables
– Constant value, e.g., x = a;
– Uninitialized, e.g., x = uninit;
• Invariants over a single numeric variable
– Range limit, e.g., x >= a, x <= b, a <= x <= b
– Nonzero, e.g., x != 0
– Modulus, e.g., x mod b = a
– Non-modulus, e.g., x mod b != a
What Invariants to Infer?
• Invariants over two numeric variables
– Linear relationship, e.g., y = ax+b
– Ordering comparison: x < y, x <= y, x > y, x >= y, x = y, x != y
– Functions, e.g., y = fn(x) or x = fn(y) where fn is one of
Python’s built-in unary functions like absolute values,
negation, etc.
– Invariants over x+y: any invariant from the list of
invariants over a single numeric variable, such as (x+y)
mod b = a
– Invariants over x-y: as for x+y;
What Invariants to Infer?
• Invariants over three numeric variables
– Linear relationship, e.g., z = ax+by+c
– Functions, e.g., z = fn(x, y) or x = fn(y, z) where fn is
one of Python’s built-in binary functions like min,
max, GCD, and, or, etc.
How about four variables and more?
What Invariants to Infer?
• Invariants over a single sequence variable
– Range: minimum and maximum sequence values,
ordered lexicographically; for instance, this can
indicate the range of string or array values
– Element ordering: whether the elements of each
sequence are non-decreasing, non-increasing, or
equal
– Invariants over all the sequence elements (treated
as a single large collection)
What Invariants to Infer?
• Invariants over two sequence variables
– Linear relationship: y = ax + b, element-wise
– Comparison: x < y, x <= y, x > y, x >= y, x = y, x != y,
performed lexicographically
– Subsequence relationship: x is a subsequence of y or
vice versa
– Reversal: x is the reverse of y
• Invariants over a sequence and a numeric
variable
– Membership: i in s
What Invariants to Infer?
• Derived variables
– Derived from any sequence s
• Length: size(s)
• Extremal elements: s[0], s[1], s[size(s)-1], s[size(s)-2]
– Derived from any numeric sequence s
• sum: sum(s)
• Minimum elements: min(s)
• Maximum elements: max(s)
– Derived from any sequence s and any numeric variable i
• Element at the index: s[i], s[i-1]
• Subsequences: s[0..i], s[0..i-1]
– Derived from function invocations: number of calls so far
Algorithm
• Collect samples at a program point (through
instrumentation and testing)
• For all variables, test every potential invariant
(defined above)
• Remove an invariant if it is violated by a
sample.
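The three steps above can be sketched in a few lines. This is a toy version under stated assumptions: the template set in `candidate_invariants` is a tiny hypothetical subset of the templates listed earlier, and the samples are hand-written values for a loop head where `i` counts from 0 up to `n = 3`.

```python
def candidate_invariants(variables, constants=(0,)):
    """Instantiate a few hypothetical templates over the given variables."""
    cands = {}
    for v in variables:
        cands[f"{v} != 0"] = lambda s, v=v: s[v] != 0
        for c in constants:
            cands[f"{v} >= {c}"] = lambda s, v=v, c=c: s[v] >= c
            cands[f"{v} == {c}"] = lambda s, v=v, c=c: s[v] == c
    for i, a in enumerate(variables):
        for b in variables[i + 1:]:
            cands[f"{a} <= {b}"] = lambda s, a=a, b=b: s[a] <= s[b]
            cands[f"{a} == {b}"] = lambda s, a=a, b=b: s[a] == s[b]
    return cands


def infer(samples, variables, constants=(0,)):
    """Keep only the candidates that no sample violates."""
    cands = candidate_invariants(variables, constants)
    for sample in samples:
        cands = {name: p for name, p in cands.items() if p(sample)}
    return sorted(cands)


samples = [{"i": i, "n": 3} for i in range(4)]
print(infer(samples, ["i", "n"]))
# ['i <= n', 'i >= 0', 'n != 0', 'n >= 0']
```

Note that `n != 0` survives only because every sample happened to use `n = 3`; candidates like this are what the filtering step is meant to address.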
Exercise 2
int inc(int *x, int y) {
*x += y;
return *x;
}
Given the program and the
collected data, what are the
invariants?
Filtering Invariants
• Too many potential invariants could
discourage programmers from looking
through them.
• A better test suite could help.
• Daikon filters invariants by computing an
invariant confidence: assuming random input,
what is the chance that the invariant would
appear to hold by coincidence?
Invariant Confidence: Example
• A numeric range like x in [32..126]
is reported only if the limits appear to be
non-coincidental: that is, if several values near the
extremes all appear about as often as would
be expected (assuming uniform distribution).
Invariant Confidence: Example
• Suppose the reported values for variable x fall
in a range of size r that includes 0
• Suppose that x != 0 holds for all test cases
• The probability that x != 0 holds by chance in
all samples is (1-1/r)^n, where n is the number
of samples
• If the probability is less than a user-defined
confidence threshold, then x != 0 is reported.
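A quick numeric sketch of this rule follows. The function names and the default threshold of 0.01 are assumptions chosen for illustration, not Daikon's actual settings.

```python
def nonzero_chance(r, n):
    """Chance that n uniform samples from a range of size r all miss 0."""
    return (1 - 1 / r) ** n


def report_nonzero(r, n, threshold=0.01):
    """Report x != 0 only if seeing no zero by coincidence is unlikely enough."""
    return nonzero_chance(r, n) < threshold


print(report_nonzero(10, 3))    # False: 0.9^3 = 0.729, easily a coincidence
print(report_nonzero(10, 100))  # True: 0.9^100 is far below the threshold
```

With few samples the absence of 0 says little, so the invariant is suppressed; with many samples it becomes worth reporting.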
Scalability
Daikon’s invariant detection time is
• Potentially cubic in the number of variables in
scope at a program point (not the total
number of variables in the program)
• Linear in the number of samples (the number
of times a program point is executed)
• Linear in the number of instrumented
program points.
Case Study: Invariant Stability
Warning:
One program!
Case Study: Invariant Stability
Conclusion:
Stable?
More Invariants, Better Programs?
• Experiment setup
– 424 student programs from a single assignment
for CSE 142 at University of Washington
– The quality of the programs is measured by their
scores.
– Invariant detection was performed over 200
executions of each program, resulting in 3 to 28
invariants per program.
• Conclusion: No correlation
Discussion
For invariant generation, shall we use
random test case generation or systematic
test case generation?
How do we measure the usefulness of the
generated invariants?
How do we test whether a generated
invariant is really a loop invariant?
How do we identify the useful templates for
invariants?
Can we discover disjunctive invariants?
Jaffar et al. RV’11
UNBOUNDED SYMBOLIC EXECUTION
FOR PROGRAM VERIFICATION
Motivation
• Symbolic execution doesn’t handle loops well:
path explosion
• Loop invariants are essential to handle loops.
• Idea: learn loop invariant through symbolic
execution
Iterative Deepening
Step 1: execute path L0,1,4,5 symbolically
L0 x = 0;
L1 while (x < n) {
L2   x++;
L3 }
L4 if (x < 0) {
L5   error();
L6 }
x = 0 &&    //from L0
x >= n &&   //from L1
x < 0       //from L4
Interpolant at L4: x >= 0
Iterative Deepening
L0 x = 0;
L1 while (x < n) {
L2   x++;
L3 }
L4 if (x < 0) {
L5   error();
L6 }
Step 2: check if x >= 0 is a loop
invariant by checking whether
the following is satisfiable.
x >= 0 && x < n &&
x1 = x+1 && x1 < 0
No, it is unsatisfiable. Thus x >= 0 is preserved by
the loop body and, since it holds on entry (x = 0), it is
a loop invariant. Complete the proof with Hoare logic rules.
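The satisfiability question can be imitated by a bounded search. The sketch below is an assumption-laden stand-in for a real solver: it enumerates small integer values of x and n and looks for a state where the candidate holds together with x < n, yet fails after x++.

```python
def find_counterexample(inv, lo=-50, hi=50):
    """Search for (x, n, x1) with inv(x) && x < n && x1 = x+1 && !inv(x1)."""
    for x in range(lo, hi):
        for n in range(lo, hi):
            if inv(x) and x < n:
                x1 = x + 1
                if not inv(x1):
                    return (x, n, x1)
    return None


print(find_counterexample(lambda x: x >= 0))  # None: no violating state found
print(find_counterexample(lambda x: x <= 0))  # (0, 1, 1): x <= 0 is not preserved
```

A bounded search can only refute a candidate; an SMT solver would be needed to establish unsatisfiability over all integers.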
Another Look
Initially,
L0 x = 0;
L1 while (x < n) {
L2   x++;
L3 }
L4 if (x < 0) {
L5   error();
L6 }
[Figure: control-flow graph. L0 -> L1; L1 -> L2 on x<n; L2 -> L3 -> L1; L1 -> L4 on x>=n; L4 -> error on x<0]
Another Look
With the loop invariant,
L0 x = 0;
L1 while (x < n) {
L2   x++;
L3 }
L4 if (x < 0) {
L5   error();
L6 }
[Figure: the same control-flow graph with L1 labeled x>=0. L1 -> L2 on x<n; L1 -> L4 on x>=n; L4 -> error on x<0]
This serves as a proof that error is not reachable.
Finding a loop invariant amounts to finding such a label at the loop head!
Iterative Deepening
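L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
[Figure: control-flow graph. L0 -> L1; L1 -> L2 on new!=old; L2 -> L3; L3 -> L4 -> L5; L3 -> L5; L5 -> L1; L1 -> L6 on new=old; L6 -> error on lock=0]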
Is error happening?
What label shall we generate at L1?
Iterative Deepening
Step 1: execute path L0,1,6,7 symbolically
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
lock=0&&new=old+1&&   //from L0
new==old &&           //from L1
lock==0               //from L6
Interpolant at L6: lock!=0
Is lock!=0 an invariant during the
loop?
Iterative Deepening
Step 1: execute path L0,1,6,7 symbolically
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
lock=0&&new=old+1&&   //from L0
new==old &&           //from L1
lock==0               //from L6
What is the interpolant at L1?
That is,
• A is lock=0&&new=old+1
• B is new=old&&lock=0
Ideal Case
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
The interpolant at L1 is
new!=old || lock != 0
Exercise 3: Is this a loop invariant
strong enough to prove that error is
not possible?
Recall existing techniques only return conjunctive interpolants.
The interpolant at L1 thus may be either new!=old or lock!=0,
neither of which is a loop invariant.
Iterative Deepening
Step 2: execute path L0,1,2,3,5,1,6,7 symbolically
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
lock=0&&new=old+1&&    //from L0
new!=old &&            //from L1
lock1=1&&old1=new &&   //from L2
new=old1 &&            //from L1
lock1==0               //from L6
Interpolant at L1?
Iterative Deepening
Step 2: execute path L0,1,2,3,4,5,1,6,7 symbolically
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
It doesn’t help to execute more iterations
lock=0&&new=old+1&&      //from L0
new!=old &&              //from L1
lock1=1&&old1=new &&     //from L2
lock2=0&&new1=new+1 &&   //from L4
new1=old1 &&             //from L1
lock2==0                 //from L6
Interpolant at L1?
Alternative Approach
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
Assume there is a label Inv at L1 which is a loop invariant. Then the
following must hold.
lock=0&&new=old+1 => Inv
lock=1&&old=new => Inv
[Figure: execution tree. L0 -> L1 (lock=0&&new=old+1) -> L2 -> L3 -> L5 -> L1’ (lock=1&&old=new)]
Alternative Approach
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
Ideally, we let Inv be
(lock=0&&new=old+1) ||
(lock=1&&old=new) ||
(lock=0&&new=old+1)
Exercise: check if Inv is indeed a loop invariant.
[Figure: execution tree. L0 -> L1 (lock=0&&new=old+1) -> L2 -> L3; L3 -> L5 -> L1’ (lock=1&&old=new); L3 -> L4 -> L5 -> L1’ (lock=0&&new=old+1)]
Invariant Validation
[Figure: control-flow graph with L1 labeled (lock=0&&new=old+1) || (lock=1&&old=new). L1 -> L2 on new!=old; L1 -> L6 on new=old; L6 -> error on lock=0]
Since it is a loop invariant,
we can label L1 now. Is it
strong enough?
An Ideal Algorithm
• Identify paths which end at the loop head for
the first time.
• Test if the disjunction of the path conditions is
a loop invariant strong enough for the proof
• If positive, terminate
• Otherwise, identify paths which end at the
loop head for the second time.
• …
Discussion
int i = 0;
while (i < 1000) {
i++;
}
First time: i = 0;
Second time: i = 1;
Third time: i = 2;
…
How do we make the jump to i <= 1000?
Another Look at Daikon
[Figure: control-flow graph of the lock program. Samples collected at L1: {(lock=0,old=*, new=*+1), (lock=1,old=*+1, new=*+1), …}. Pre-defined abstraction: lock=0, new=old, new=old+1]
Can Daikon find the right invariant in this case?
New Approach: USE
Step 1: execute symbolically
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
[Figure: explored path L0 -> L1 -> L6 -> L7]
New Approach: USE
Step 2: Compute interpolant
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
[Figure: explored path L0 -> L1 -> L6 -> L7, with L6 labeled lock!=0]
New Approach: USE
Step 3: Label loop head
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
[Figure: explored path L0 -> L1 -> L6 -> L7, with L1 labeled {lock=0, new=old+1} and L6 labeled lock!=0]
New Approach: USE
Step 4: abstract loop head labels based on the new condition.
• The loop head L1 is visited again along a different path, with a
new condition.
• Abstract the labels on L1 so that they are implied by the new
condition.
[Figure: L0 -> L1 (lock=0&&new=old+1) -> L2 -> L3 -> L5 -> L1’ (lock=1&&old=new)]
New Approach: USE
Step 4: abstract loop head labels based on the new condition.
• Remove labels at L1 until the conjunction of the remaining labels
is implied by the new condition
[Figure: L0 -> L1 (lock=0&&new=old+1 weakened to true) -> L2 -> L3 -> L5 -> L1’ (lock=1&&old=new)]
Do we need to continue from L1’ given now it is stronger than an
ancestor L1?
New Approach: USE
Step 5: execute symbolically
Since lock=0&&new=old+1 (at L1’) implies true (at L1), we stop.
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
[Figure: L0 -> L1 (true) -> L2 -> L3 -> L4 -> L5 -> L1’ (lock=0&&new=old+1)]
USE: First Abstraction
[Figure: abstract control-flow graph with L1 labeled true. L1 -> L2 on new!=old; L1 -> L6 on new=old; L6 -> error on lock=0]
Is this abstraction safe or not?
It is safe iff the following holds: if error is not reachable based on
this abstraction, then error is not reachable in the program.
Is error reachable or not based on this abstraction?
USE: Checking
[Figure: abstract control-flow graph with L1 labeled true. L1 -> L2 on new!=old; L1 -> L6 on new=old; L6 -> error on lock=0]
Running a DFS/BFS algorithm on this graph shows that error is reachable.
L0 -> L1 -> L6 -> error
A counterexample based on the abstraction might not be a real
counterexample!
USE: Spuriousness Checking
[Figure: abstract control-flow graph with L1 labeled true. L1 -> L2 on new!=old; L1 -> L6 on new=old; L6 -> error on lock=0]
Running a DFS/BFS algorithm on this graph shows that error is reachable.
L0 -> L1 -> L6 -> error
Symbolically execute the above path and conclude that it is spurious.
Why is it spurious?
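Both halves of this check can be sketched briefly. The code below first runs a plain DFS on a hand-encoded version of the abstract graph (the adjacency dictionary is an assumption transcribed from the program's control flow), then tests feasibility of the path L0 -> L1 -> L6 -> error by enumerating small integer values instead of calling a symbolic executor.

```python
def dfs_path(graph, start, target):
    """Return one path from start to target in the abstract graph, or None."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == target:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return None


# Abstract graph with L1 labeled true: edge conditions are ignored.
graph = {"L0": ["L1"], "L1": ["L2", "L6"], "L2": ["L3"], "L3": ["L4", "L5"],
         "L4": ["L5"], "L5": ["L1"], "L6": ["error"]}


def path_feasible(bound=20):
    """Is lock=0 && new=old+1 && new=old && lock=0 satisfiable (bounded)?"""
    for old in range(-bound, bound):
        for new in range(-bound, bound):
            lock = 0                                   # from L0
            if new == old + 1 and new == old and lock == 0:
                return True
    return False


print(dfs_path(graph, "L0", "error"))  # ['L0', 'L1', 'L6', 'error']
print(path_feasible())                 # False
```

The path exists in the graph but its condition requires new=old+1 and new=old at once, so no concrete execution follows it.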
USE: Refinement
[Figure: abstract control-flow graph with L1 labeled true. L1 -> L2 on new!=old; L1 -> L6 on new=old; L6 -> error on lock=0]
• The path L0,L1,L6,error is spurious.
• One (or more) loop head in this path must be too abstract.
• Find an interpolant at the loop head (L1)
Path condition: lock=0&&new=old+1&& new=old && lock=0
Assume the interpolant found at L1 is: new!=old
USE: Refinement
[Figure: abstract control-flow graph with L1 now labeled new!=old. L1 -> L6 on new=old; L6 -> error on lock=0]
• The path L0,L1,L6,error is spurious.
• One (or more) loop head in this path must be too abstract.
• Find an interpolant at the loop head (L1)
USE: Re-explore
Since the label at L1 has changed, we need to re-explore.
This time, we can’t remove the label at L1.
We continue instead.
[Figure: L0 -> L1 (new!=old) -> L2 -> L3 -> L5 -> L1’ (lock=1&&old=new)]
USE: Re-explore
Continue with L6; symbolic execution proves that error is not
reachable from there.
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
[Figure: L0 -> L1 (new!=old) -> L2 -> L3 -> L5 -> L1’ (lock=1&&old=new) -> L6]
USE: Re-explore
Continue with L6; symbolic execution proves that error is not
reachable from there.
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
We can’t go further since lock==1.
[Figure: L0 -> L1 (new!=old) -> L2 -> L3 -> L5 -> L1’ (lock=1&&old=new) -> L6]
USE: Re-explore
Backtrack to L1’ and continue with L2; symbolic execution shows
that this is not feasible.
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
We can’t go further since old==new.
[Figure: L0 -> L1 (new!=old) -> L2 -> L3 -> L5 -> L1’ (lock=1&&old=new) -> L2’]
USE: Re-Explore
Backtrack to L3, continue with L4,L5,L1. We can stop at L1’
because lock=0&&new=old+1 implies new!=old.
L0 lock=0;new=old+1
L1 while (new!=old) {
L2   lock=1;old=new;
L3   if (*) {
L4     lock=0;new++;}
L5 };
L6 if (lock==0)
L7   error();
[Figure: L0 -> L1 (new!=old) -> L2 -> L3 -> L4 -> L5 -> L1’ (lock=0&&new=old+1)]
Recap: the USE Approach
[Figure: full exploration tree rooted at L0 with L1 labeled new!=old; the revisit of L1 via L4,L5 is subsumed by new!=old]
Recap: the USE Approach
• This approach acknowledges the difficulty in
finding (disjunctive) loop invariants and
compensates for it with a combination of state
space exploration and abstraction refinement.
Case Study
[Table: experimental comparison of Iterative Deepening and the New Approach]
Exercise 4
[Figure: abstract control-flow graph. L1 -> L2 on new!=old; L1 -> L6 on new=old; L6 -> error on lock=0]
• The path L0,L1,L6,error is spurious.
• One (or more) loop head in this path must be too abstract.
• Find an interpolant at the loop head (L1)
Path condition: lock=0&&new=old+1&& new=old && lock=0
What if the interpolant at L1 is: new=old+1?