Slide - Microsoft Research

Symbolic Execution
A quest for nails
Willem Visser
Stellenbosch University
Overview
• Yesterday
– Classic symbolic execution
– Enhanced to use concrete results
• Today
– String domain
– Infinite loops
• Tomorrow
– Probabilities
How do we get there?
How did we get here?
void test(int x, int y) {
  if (x > 0) {
    if (y == hash(x))
      S0;
    else
      S1;
    if (x > 3 && y > 10)
      S3;
    else
      S4;
  }
}
How do we obtain
Statement Coverage?
Random inputs might work, if you are moderately lucky.
But there is a better way, where you don't need to win the lottery!
int hash(int x) {
  if (0 <= x && x <= 10) return x * 10;
  else return 0;
}
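To put a number on the lottery: S0 runs only when y == hash(x), i.e. either 1 <= x <= 10 and y == 10*x, or x > 10 and y == 0. For uniformly random 32-bit ints that is a chance of roughly 2^-33 (about 1.2e-10) per test, so ten million random tests are expected to hit S0 only about 0.001 times. The harness below is our own minimal sketch, not code from the talk; the class and method names are hypothetical.

```java
import java.util.Random;

// Hypothetical random-testing harness: how often do random int pairs reach S0?
class RandomTestingSketch {
    // Same hash as on the slide, with the chained comparison written out.
    static int hash(int x) {
        if (0 <= x && x <= 10) return x * 10;
        else return 0;
    }

    // True iff test(x, y) would execute S0.
    static boolean reachesS0(int x, int y) {
        return x > 0 && y == hash(x);
    }

    // Counts S0 hits over `trials` uniformly random input pairs (fixed seed).
    static long countS0Hits(long trials, long seed) {
        Random rnd = new Random(seed);
        long hits = 0;
        for (long i = 0; i < trials; i++) {
            if (reachesS0(rnd.nextInt(), rnd.nextInt())) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        long trials = 10_000_000L;
        System.out.println("S0 hits in " + trials + " random runs: " + countS0Hits(trials, 42L));
    }
}
```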
Symbolic Execution
test(X,Y)
  [ X>0 ]
  [ X>0 ] hash(X)
    [ 0<X<=10 ] ret X*10
      [ 0<X<=10 & Y=X*10 ] S0
        [ 3<X<=10 & 10<Y=X*10 ] S3
        [ 0<X<=10 & Y=X*10 & !(X>3 & Y>10) ] S4
      [ 0<X<=10 & Y!=X*10 ] S1
        [ 3<X<=10 & 10<Y!=X*10 ] S3
        [ 0<X<=10 & Y!=X*10 & !(X>3 & Y>10) ] S4
    [ X>10 ] ret 0
      [ X>10 & … ] …
Solving the path conditions gives concrete inputs that together cover every statement:
[ 0<X<=10 & Y=X*10 ] S0: X=1,Y=10
[ 0<X<=10 & Y!=X*10 ] S1: X=1,Y=0
[ 3<X<=10 & 10<Y!=X*10 ] S3: X=4,Y=11
[ 0<X<=10 & Y=X*10 & !(X>3 & Y>10) ] S4: X=1,Y=10
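A quick way to check such solutions is to replay them on an instrumented copy of test that records which statements execute. The sketch below is illustrative only: trace and coverage are hypothetical helpers, and S0..S4 are recorded as strings instead of being real statements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: replay concrete solutions on an instrumented copy of test(x, y).
class ReplaySolutions {
    static int hash(int x) {
        if (0 <= x && x <= 10) return x * 10;
        else return 0;
    }

    // Instrumented test(x, y): returns the executed statements in order.
    static List<String> trace(int x, int y) {
        List<String> t = new ArrayList<>();
        if (x > 0) {
            if (y == hash(x)) t.add("S0"); else t.add("S1");
            if (x > 3 && y > 10) t.add("S3"); else t.add("S4");
        }
        return t;
    }

    // Union of the statements covered by a set of inputs.
    static Set<String> coverage(int[][] inputs) {
        Set<String> covered = new TreeSet<>();
        for (int[] in : inputs) covered.addAll(trace(in[0], in[1]));
        return covered;
    }

    public static void main(String[] args) {
        int[][] inputs = {{1, 10}, {1, 0}, {4, 11}};
        for (int[] in : inputs)
            System.out.println("test(" + in[0] + "," + in[1] + ") -> " + String.join(";", trace(in[0], in[1])));
        System.out.println("covered: " + coverage(inputs)); // covered: [S0, S1, S3, S4]
    }
}
```

Running main prints test(1,10) -> S0;S4, test(1,0) -> S1;S4 and test(4,11) -> S1;S3, so these three inputs reach all four statements.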
One of the basic blocks in the Binomial Heap
implementation required a minimum
sequence of 13 API calls to be covered
private void merge(BinomialHeapNode binHeap) {
  BinomialHeapNode temp1 = Nodes, temp2 = binHeap;
  while ((temp1 != null) && (temp2 != null)) {
    if (temp1.degree == temp2.degree) {
      BinomialHeapNode tmp = temp2;
      temp2 = temp2.sibling;
      tmp.sibling = temp1.sibling;
      temp1.sibling = tmp;
      temp1 = tmp.sibling;
    } else {
      if (temp1.degree < temp2.degree) {
        if ((temp1.sibling == null) ||
            (temp1.sibling.degree > temp2.degree)) {
          // HERE!
          …
X4(1) >= X8(1) && X10(2) > X8(1) &&
X10(2) <= X11(2) && X11(2) > 0 &&
X10(2) > 0 && X8(1) <= X9(1) &&
X9(1) > 0 && X8(1) > 0 && X4(1) <= X2(1) &&
X6(2) > X4(1) && X6(2) <= X7(2) &&
X7(2) > 0 && X6(2) > 0 && X4(1) <= X5(1) &&
X5(1) > 0 && X4(1) > 0 && X2(1) <= X0(1) &&
X2(1) <= X3(1) && X3(1) > 0 && X2(1) > 0 &&
X0(1) <= X1(1) && X1(1) > 0 && X0(1) > 0
insert(X0);insert(X1);insert(X2);insert(X3);insert(X4);
insert(X5);insert(X6);insert(X7);insert(X8);insert(X9);
insert(X10);insert(X11);extractMin();
Symbolic Execution is not the best thing since sliced bread; it has a few serious weaknesses, namely:
• It is inherently white-box
• It is only as good as the decision procedures
native int hash(int x);
Code is not available, so no symbolic execution is possible.
OR
int hash(int x) {
  return x * x % 1023;
}
Assuming we only have a decision procedure (DP) for linear integer arithmetic, we cannot handle the non-linearity here.
Concolic Execution
or
Directed Automated Random Testing (DART)
Godefroid, Klarlund and Sen 2005
A novel combination of concrete and symbolic execution to overcome the two weaknesses of classic symbolic execution.
It executes the program concretely but collects the path condition (PC); after a run it negates constraints on the PC and executes again with the newly found solutions.
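The run/negate/re-run loop can be sketched on test itself. This is a toy, not DART's actual implementation: the constraint representation, the brute-force search over [-20, 50]^2 standing in for a decision procedure, and all names (run, solve, explore) are our assumptions. What it does show is that hash is only executed, and its concrete result is baked into the recorded constraint (e.g. Y!=10 after a run with x=1).

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy concolic (DART-style) exploration of test(x, y). Assumptions are ours:
// constraints are Java predicates over (X, Y), hash() is a black box whose
// concrete result is baked into the constraint, and brute force replaces a solver.
class ConcolicSketch {
    interface Pred { boolean test(int x, int y); }

    static final class Constraint {
        final String text;
        final Pred pred;
        Constraint(String text, Pred pred) { this.text = text; this.pred = pred; }
        Constraint negate() { return new Constraint("!(" + text + ")", (a, b) -> !pred.test(a, b)); }
    }

    // "Native" hash: executed concretely, never analysed symbolically.
    static int hash(int x) { return (0 <= x && x <= 10) ? x * 10 : 0; }

    // Concrete run of test(x, y) that records the path condition and covered statements.
    static List<Constraint> run(int x, int y, Set<String> covered) {
        List<Constraint> pc = new ArrayList<>();
        if (x > 0) {
            pc.add(new Constraint("X>0", (a, b) -> a > 0));
            final int h = hash(x); // only the concrete value is known
            if (y == h) { pc.add(new Constraint("Y=" + h, (a, b) -> b == h)); covered.add("S0"); }
            else        { pc.add(new Constraint("Y!=" + h, (a, b) -> b != h)); covered.add("S1"); }
            if (x > 3) { // && short-circuits, so each operand is its own branch
                pc.add(new Constraint("X>3", (a, b) -> a > 3));
                if (y > 10) { pc.add(new Constraint("Y>10", (a, b) -> b > 10)); covered.add("S3"); }
                else        { pc.add(new Constraint("Y<=10", (a, b) -> b <= 10)); covered.add("S4"); }
            } else {
                pc.add(new Constraint("X<=3", (a, b) -> a <= 3));
                covered.add("S4");
            }
        } else {
            pc.add(new Constraint("X<=0", (a, b) -> a <= 0));
        }
        return pc;
    }

    // Brute-force stand-in for a decision procedure: first model in [-20, 50]^2.
    static int[] solve(List<Constraint> pc) {
        for (int a = -20; a <= 50; a++)
            for (int b = -20; b <= 50; b++) {
                boolean ok = true;
                for (Constraint c : pc) if (!c.pred.test(a, b)) { ok = false; break; }
                if (ok) return new int[] {a, b};
            }
        return null;
    }

    // Run, negate each branch of the PC in turn (keeping its prefix), solve, repeat.
    static Set<String> explore(int x0, int y0, int maxRuns) {
        Set<String> covered = new TreeSet<>();
        Set<String> tried = new HashSet<>();
        Deque<int[]> work = new ArrayDeque<>();
        work.add(new int[] {x0, y0});
        for (int runs = 0; runs < maxRuns && !work.isEmpty(); runs++) {
            int[] in = work.poll();
            List<Constraint> pc = run(in[0], in[1], covered);
            for (int i = 0; i < pc.size(); i++) {
                List<Constraint> target = new ArrayList<>(pc.subList(0, i));
                target.add(pc.get(i).negate());
                String key = target.stream().map(c -> c.text).collect(Collectors.joining(" & "));
                if (!tried.add(key)) continue; // this target was already attempted
                int[] sol = solve(target);
                if (sol != null) work.add(sol);
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        System.out.println("covered from test(1,0): " + explore(1, 0, 50)); // [S0, S1, S3, S4]
    }
}
```

Starting from test(1,0) this covers S0, S1, S3 and S4. With this search order some targets diverge: negating Y!=40 after the run on (4,-20) gives [X>0 & Y=40], which the toy solver answers with (1,40), where hash(1)=10.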
native int hash(int x) {   // native: runs concretely, but its code is not visible to the analysis
  if (0 <= x && x <= 10) return x * 10;
  else return 0;
}
test(1,0)
  [ X>0 ]
  [ X>0 & Y!=10 ] S1
  [ X>0 & Y!=10 & X<=3 ] S4
Negate X<=3: [ X>0 & Y!=10 & X>3 ]
test(4,0)
  [ X>0 ]
  [ X>0 & Y!=40 ] S1
  [ X>0 & Y!=40 & X>3 & Y<=10 ] S4
Negate Y<=10: [ X>0 & Y!=40 & X>3 & Y>10 ]
test(4,11)
  [ X>0 ]
  [ X>0 & Y!=40 ] S1
  [ X>0 & Y!=40 & X>3 & Y>10 ] S3
Negate Y!=40: [ X>0 & Y=40 & X>3 & Y>10 ]
test(4,40)
  [ X>0 ]
  [ X>0 & Y=40 ] S0
  [ X>0 & Y=40 & X>3 & Y>10 ] S3
For test(4,40), concolic execution effectively sees hash(x) as the constant 40 it observed:
void test(int x, int y) {
  if (x > 0) {
    if (y == 40)
      S0;
    else
      S1;
    if (x > 3 && y > 10)
      S3;
    else
      S4;
  }
}
Negate X>3: [ X>0 & Y=40 & X<=3 & Y>10 ]
test(1,40)
  [ X>0 ]
  [ X>0 & Y!=10 ] S1   Divergence! hash(1)=10, not 40
  [ X>0 & Y!=10 & X<=3 & Y>10 ] S4   ASSERT not via S0
Aimed to get S0;S4, but reached S1;S4.
Symbolic Execution with Mixed Concrete-Symbolic Solving
Pasareanu, Rungta, Visser 2011
Symbolic execution that falls back onto concrete values when it doesn't have access to the code or when the decision procedures don't work.
SymCrete = Symbolic + Concrete
vs
Concolic = Concrete + Symbolic
Symbolic Execution
With hash native, symbolic execution of test(X,Y) cannot continue past the call:
test(X,Y)
  [ X>0 ]
  [ X>0 ] hash(X)
SymCrete 3 Steps
1. Split the PC into two parts:
   a. the part you can solve ("easy")
   b. the part you cannot solve ("hard")
2. Solve the easy part and evaluate the hard part with the solutions
3. Replace the hard part with the evaluated results and check SAT
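The three steps can be sketched for path conditions of the shape easy(X,Y) & rel(Y, f(X)), where f is the part the decision procedure cannot handle. This is a minimal illustration under our own assumptions (a brute-force search over a small box instead of a real solver, and only this PC shape); it is not SPF's implementation. Step 3 leaves X free, exactly as the steps are stated here.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Sketch of the three SymCrete steps for PCs of the shape easy(X, Y) & rel(Y, f(X)).
// Assumption: brute-force search over a small box stands in for the solver.
class SymCreteSteps {
    interface Pred { boolean test(int a, int b); }

    static final int LO = -20, HI = 50; // assumed search box

    static int hash(int x) { return (0 <= x && x <= 10) ? x * 10 : 0; }

    // Toy decision procedure: first (X, Y) in [LO, HI]^2 satisfying p, else null.
    static int[] solve(Pred p) {
        for (int x = LO; x <= HI; x++)
            for (int y = LO; y <= HI; y++)
                if (p.test(x, y)) return new int[] {x, y};
        return null;
    }

    // Returns a model of the step-3 formula, or null if it is UNSAT.
    static int[] symcrete(Pred easy, IntUnaryOperator f, Pred rel) {
        // Step 1: the caller passes the PC already split into easy and hard parts.
        // Step 2: solve the easy part, then evaluate the hard function at that X.
        int[] m = solve(easy);
        if (m == null) return null;
        int v = f.applyAsInt(m[0]);
        // Step 3: replace f(X) by v and check SAT (X left free, as the steps are stated).
        return solve((x, y) -> easy.test(x, y) && rel.test(y, v));
    }

    public static void main(String[] args) {
        Pred eq = (y, v) -> y == v, ne = (y, v) -> y != v;
        System.out.println("S0: " + Arrays.toString(symcrete((x, y) -> x > 0, SymCreteSteps::hash, eq)));
        System.out.println("S1: " + Arrays.toString(symcrete((x, y) -> x > 0, SymCreteSteps::hash, ne)));
        System.out.println("S3: " + Arrays.toString(symcrete((x, y) -> x > 3 && y > 10, SymCreteSteps::hash, eq)));
    }
}
```

main prints S0: [1, 10], S1: [1, -20] and S3: [4, 40]; the toy solver happens to pick a different, equally valid model for S1 than the slides do.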
SymCrete Execution
test(X,Y)
  [ X>0 ]
  [ X>0 ] hash(X)
  [ X>0 & Y=hash(X) ] S0
    1. easy: X>0; hard: Y=hash(X)
    2. X=1, so Y=hash(1)=10
    3. X>0 & Y=10 is SAT
  [ X>0 & Y!=hash(X) ] S1
    2. Y=hash(1)=10
    3. X>0 & Y!=10 is SAT
test(X,Y)
  [ X>0 ]
  [ X>0 ] hash(X)
  [ X>0 & Y=hash(X) ] S0
    [ X>3 & Y=hash(X) & Y>10 ] S3
      1. easy: X>3 & Y>10; hard: Y=hash(X)
      2. X=4 & Y=11, so Y=hash(4)=40
      3. X>3 & Y=40 & Y>10 is SAT
    [ 3>=X>0 & Y=hash(X) ] S4
      1. easy: 3>=X>0; hard: Y=hash(X)
      2. X=1, so Y=hash(1)=10
      3. 3>=X>0 & Y=10 is SAT
test(X,Y)
  [ X>0 ]
  [ X>0 ] hash(X)
  [ X>0 & Y=hash(X) ] S0: x=1,y=10
  [ X>0 & Y!=hash(X) ] S1: x=1,y=0
  [ X>3 & Y=hash(X) & Y>10 ] S3: x=4,y=40
  [ X>3 & Y!=hash(X) & Y>10 ] S3: x=4,y=11
  [ 3>=X>0 & Y=hash(X) ] S4: x=1,y=10
  [ 3>=X>0 & Y!=hash(X) ] S4: x=1,y=0
The Risk of Unsoundness
void test(int x, int y) {
  if (x >= 0 && x > y && y == x*x)
    S0;   // Not reachable: for integers x >= 0, x*x >= x
  else
    S1;
}
[ X>=0 & X>Y & Y=X*X ] S0
  1. easy: X>=0 & X>Y; hard: Y=X*X
  2. X=0, Y=-1, so Y=0*0=0
  3. X>=0 & X>Y & Y=0 is SAT, which implies S0 is reachable
Must add constraints on the solutions from Step 2 in Step 3:
  X>=0 & X>Y & Y=0 & X=0 is NOT SAT
Concolic will diverge instead.
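The fix can be seen in a few lines, using the same kind of toy brute-force solver as a stand-in (our assumption) and treating X*X as the hard part. step3Sat(false) follows the unsound version; step3Sat(true) adds X = x0, the step-2 solution, to the step-3 query. All names are hypothetical.

```java
// Sketch of the unsoundness and its fix for PC = (X >= 0 & X > Y) & Y = X*X.
// Assumption: brute-force search over a small box stands in for the solver.
class SymCreteSoundness {
    interface Pred { boolean test(int a, int b); }

    static final int LO = -20, HI = 50;

    static int[] solve(Pred p) {
        for (int x = LO; x <= HI; x++)
            for (int y = LO; y <= HI; y++)
                if (p.test(x, y)) return new int[] {x, y};
        return null;
    }

    // Easy part of the path condition for S0.
    static boolean easy(int x, int y) { return x >= 0 && x > y; }

    // Steps 2 and 3; pinX adds the step-2 solution X = x0 to the step-3 query.
    static boolean step3Sat(boolean pinX) {
        int[] m = solve(SymCreteSoundness::easy); // step 2: the toy solver returns X=0 (Y=-20)
        if (m == null) return false;
        int x0 = m[0];
        int v = x0 * x0;                          // evaluate the hard part: Y = 0
        return solve((x, y) -> easy(x, y) && y == v && (!pinX || x == x0)) != null;
    }

    public static void main(String[] args) {
        System.out.println("without X=x0: " + step3Sat(false)); // true: wrongly claims S0 reachable
        System.out.println("with X=x0:    " + step3Sat(true));  // false: S0 correctly pruned
        // Ground truth on the box: nothing satisfies the full path condition.
        System.out.println("real model: " + (solve((x, y) -> easy(x, y) && y == x * x) != null)); // false
    }
}
```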
3 More Enhancements
• Incremental Solving
• User Annotations
• Random Solving
Problem for Concolic
void test(int x, int y) {
  if (x > 0) {
    if (y == hash(x))
      S0;
    else
      S1;
    if (y > 10)
      S3;
    else
      S4;
  }
}
test(1,0)
  [ X>0 ]
  [ X>0 & Y!=10 ] S1
  [ X>0 & Y!=10 & Y<=10 ] S4
Negate Y<=10: [ X>0 & Y!=10 & Y>10 ]
test(1,11)
  [ X>0 ]
  [ X>0 & Y!=10 ] S1
  [ X>0 & Y!=10 & Y>10 ] S3
After negation concolic is stuck: [ X>0 & Y=10 & Y>10 ] is UNSAT.
SymCrete Execution
test(X,Y)
  [ X>0 ]
  [ X>0 ] hash(X)
  [ X>0 & Y=hash(X) ] S0
  [ X>0 & Y=hash(X) & Y>10 ] S3
    1. easy: X>0 & Y>10; hard: Y=hash(X)
    2. X=1, so Y=hash(1)=10
    3. X>0 & Y>10 & Y=10 & X=1 is UNSAT
    Get another solution!
    2. X=2, so Y=hash(2)=20
    3. X>0 & Y>10 & Y=20 & X=2 is SAT
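One way to "get another solution" is to block the X value that failed and ask the solver again. The sketch below assumes that blocking scheme (adding X != x0) and a brute-force toy solver; neither is taken from the talk's implementation, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: retry step 2 with previously failed X values blocked.
// Assumptions: blocking by X != x0, brute-force solver over a small box.
class AnotherSolution {
    interface Pred { boolean test(int a, int b); }

    static final int LO = -20, HI = 50;

    static int hash(int x) { return (0 <= x && x <= 10) ? x * 10 : 0; }

    static int[] solve(Pred p) {
        for (int x = LO; x <= HI; x++)
            for (int y = LO; y <= HI; y++)
                if (p.test(x, y)) return new int[] {x, y};
        return null;
    }

    // PC = easy(X, Y) & Y = hash(X); returns a model, or null after maxTries attempts.
    static int[] solveWithRetries(Pred easy, int maxTries) {
        List<Integer> blocked = new ArrayList<>();
        for (int t = 0; t < maxTries; t++) {
            int[] m = solve((x, y) -> easy.test(x, y) && !blocked.contains(x)); // step 2
            if (m == null) return null;
            int x0 = m[0], v = hash(x0);
            int[] s = solve((x, y) -> easy.test(x, y) && y == v && x == x0);   // step 3, X pinned
            if (s != null) return s;
            blocked.add(x0); // UNSAT: get another solution
        }
        return null;
    }

    public static void main(String[] args) {
        // Path to S3 in the modified test: X>0 & Y>10 & Y=hash(X)
        System.out.println(Arrays.toString(solveWithRetries((x, y) -> x > 0 && y > 10, 10))); // [2, 20]
    }
}
```

For X>0 & Y>10 & Y=hash(X) it first tries X=1 (UNSAT, since hash(1)=10) and then X=2, returning [2, 20] as on the slide.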
SymCrete Execution
@Partition({"x>3", "x<=3"})
void test(int x, int y) {
  if (x > 0) {
    if (y == hash(x))
      S0;
    else
      S1;
    if (y > 10)
      S3;
    else
      S4;
  }
}
test(X,Y)
  [ X>0 ]
  [ X>0 ] hash(X)
  [ X>0 & Y=hash(X) ] S0
  [ X>0 & Y=hash(X) & Y>10 ] S3
    1. easy: X>0 & Y>10; hard: Y=hash(X)
    2. X=1, so Y=hash(1)=10
    3. X>0 & Y>10 & Y=10 & X=1 is UNSAT
    Add user partitions one at a time: X>0 & Y>10 & X>3
    2. X=4, so Y=hash(4)=40
    3. X>3 & Y>10 & Y=40 & X=4 is SAT
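A sketch of the partition retry, under the same toy assumptions: the @Partition strings are hand-translated into predicates (no annotation processing is modelled), and each partition is conjoined with the easy part in turn until step 3 succeeds. Names are hypothetical.

```java
import java.util.Arrays;

// Sketch: if step 3 fails, refine the easy part with user partitions one at a time.
// Assumption: brute-force solver over a small box; partitions given as predicates.
class PartitionSketch {
    interface Pred { boolean test(int a, int b); }

    static final int LO = -20, HI = 50;

    static int hash(int x) { return (0 <= x && x <= 10) ? x * 10 : 0; }

    static int[] solve(Pred p) {
        for (int x = LO; x <= HI; x++)
            for (int y = LO; y <= HI; y++)
                if (p.test(x, y)) return new int[] {x, y};
        return null;
    }

    // Steps 2 and 3 for PC = easy(X, Y) & Y = hash(X), with X pinned in step 3.
    static int[] attempt(Pred easy) {
        int[] m = solve(easy);
        if (m == null) return null;
        int x0 = m[0], v = hash(x0);
        return solve((x, y) -> easy.test(x, y) && y == v && x == x0);
    }

    static int[] withPartitions(Pred easy, Pred[] partitions) {
        int[] s = attempt(easy);
        for (int i = 0; s == null && i < partitions.length; i++) {
            Pred p = partitions[i];
            s = attempt((x, y) -> easy.test(x, y) && p.test(x, y));
        }
        return s;
    }

    public static void main(String[] args) {
        Pred[] parts = { (x, y) -> x > 3, (x, y) -> x <= 3 }; // stands in for @Partition({"x>3", "x<=3"})
        System.out.println(Arrays.toString(withPartitions((x, y) -> x > 0 && y > 10, parts))); // [4, 40]
    }
}
```

Here the unrefined attempt fails at X=1, and the partition x>3 leads to X=4, Y=hash(4)=40.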
Random Solving
Pick solutions randomly from the solution space.
The current implementation only picks randomly if the solution space is completely unconstrained.
(Not all solvers support the general feature.)
• JavaPathFinder (JPF): model checker for Java, open source, http://babelfish.arc.nasa.gov/trac/jpf
• Symbolic PathFinder (SPF): symbolic execution extension for JPF, called jpf-symbc
• SymCrete: custom listeners on SPF
public String preserveTags(String body)
{…}
Infinite loops are the worst kind of error: since they are input driven, they can reappear frequently, in fact infinitely often!
Symbolic String Analysis
• (Almost) all Java String operations covered
• Mixed integer and string constraints
• Automata and SMT (bitvector) back-ends
• Part of Symbolic PathFinder
• M.Sc. by Gideon Redelinghuys
• Collaborators
  – Jaco Geldenhuys (Stellenbosch)
Infinite Loop?
while (x > 0)
(x,y) = (x+y+2,-x);
Try (1,-2)
We only consider affine transformations on loop variables and simple loop conditions, such as x>0 and x>=0.
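Trying (1,-2) by hand, the states are (1,-2), (1,-1), (2,-1), (3,-2), (3,-3), (2,-3) and then (1,-2) again, so the loop never exits. The sketch below (our own, with hypothetical names) simply executes the loop and reports a revisited state, which proves non-termination for a deterministic loop; it cannot recognise non-termination where the state keeps changing forever, which it reports as UNKNOWN.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: execute while (x > 0) (x, y) = (x + y + 2, -x); concretely and report
// whether the loop exits, revisits a state (proof of non-termination for a
// deterministic loop), or neither within the step budget.
class AffineLoopSketch {
    enum Outcome { TERMINATES, REVISITS_STATE, UNKNOWN }

    static Outcome run(long x, long y, int maxSteps) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < maxSteps; i++) {
            if (!(x > 0)) return Outcome.TERMINATES;
            if (!seen.add(x + "," + y)) return Outcome.REVISITS_STATE;
            long nx = x + y + 2, ny = -x; // simultaneous assignment
            x = nx;
            y = ny;
        }
        return Outcome.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println("(1,-2):  " + run(1, -2, 100));  // REVISITS_STATE
        System.out.println("(1,-10): " + run(1, -10, 100)); // TERMINATES
    }
}
```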
Infinite Loop?
x,y are inputs
while (x >= 0) {
  x := x - y;
}
Ranking functions
x,y are inputs
while (x >= 0) {
  assert('x > x);
  x := x - y;
}
Use ranking functions for non-termination!
If 'x <= x holds on every iteration, x never decreases, so x >= 0 stays true and the loop never exits:
'x <= x
'x <= x
'x <= x
…
{c /\ wp(s,'x <= x)}
s
{c /\ wp(s,'x <= x)}
Inductive?
{x >= 0 /\ wp(x:=x-y,'x <= x)}
x := x - y
{x >= 0 /\ wp(x:=x-y,'x <= x)}
wp(x:=x-y,'x<=x) = {x <= x-y} = {y <= 0}
{x >= 0 /\ y <= 0}
x := x - y
{x >= 0 /\ y <= 0}
Yes: if y <= 0 then x - y >= x >= 0, and y is unchanged.
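A finite-box check of the claims for x := x - y, as a sketch: inv is the candidate x >= 0 /\ y <= 0, and 'x is modelled as the value of x before the body. Passing on a box is evidence, not a proof; the class and method names are hypothetical.

```java
// Sketch: box check that x >= 0 && y <= 0 is preserved by x := x - y,
// and that "'x <= x after the body" holds exactly when y <= 0.
class InductiveCheck {
    static boolean inv(long x, long y) { return x >= 0 && y <= 0; }

    // Every state in the box satisfying inv still satisfies it after one iteration.
    static boolean preservedOnBox(int bound) {
        for (long x = -bound; x <= bound; x++)
            for (long y = -bound; y <= bound; y++)
                if (inv(x, y) && !inv(x - y, y)) return false;
        return true;
    }

    // 'x (value before the body) <= x (value after) iff y <= 0.
    static boolean wpIsYLeqZero(int bound) {
        for (long x = -bound; x <= bound; x++)
            for (long y = -bound; y <= bound; y++) {
                long before = x, after = x - y;
                if ((before <= after) != (y <= 0)) return false;
            }
        return true;
    }

    // Without the wp conjunct, x >= 0 alone is not preserved (e.g. x=0, y=1).
    static boolean xNonNegAlonePreserved(int bound) {
        for (long x = 0; x <= bound; x++)
            for (long y = -bound; y <= bound; y++)
                if (x - y < 0) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println("x >= 0 && y <= 0 preserved: " + preservedOnBox(40));
        System.out.println("wp is y <= 0: " + wpIsYLeqZero(40));
        System.out.println("x >= 0 alone preserved: " + xNonNegAlonePreserved(40));
    }
}
```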
So how about just…
while (c) {
s;
}
{c /\ wp(s,!rr)}
s
{c /\ wp(s,!rr)}
x,y are inputs
while (x >= 0) {
  assert('x > x);
  x := x + y;
  y := 1 - y;
}
{x >= 0 /\ wp(x:=x+y;y:=1-y,'x <= x)}
x := x + y; y := 1 - y;
{x >= 0 /\ wp(x:=x+y;y:=1-y,'x <= x)}
wp(x:=x+y;y:=1-y,'x<=x) = wp(x:=x+y,'x<=x) = {x <= x+y} = {y >= 0}
{x >= 0 /\ y >= 0}
x:=x+y;y:=1-y;
{x >= 0 /\ y >= 0}
Not inductive: x=0, y=2 gives x=2, y=-1.
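For this body a single iteration can decrease x (whenever y < 0), yet two iterations always map (x, y) to (x + 1, y), because x + y + (1 - y) = x + 1 and 1 - (1 - y) = y. That is the situation the s^n formulation on the next slide is aimed at. The sketch checks both facts on a finite box only; names are hypothetical.

```java
// Sketch for the body s = (x := x + y; y := 1 - y):
// one iteration can decrease x, but s;s always maps (x, y) to (x + 1, y).
class TwoStepCheck {
    static long[] body(long x, long y) {
        long nx = x + y; // x := x + y
        long ny = 1 - y; // y := 1 - y (y itself was not changed by the first assignment)
        return new long[] {nx, ny};
    }

    // Some state with x >= 0 where one iteration makes x smaller.
    static boolean oneStepCanDecrease(int bound) {
        for (long x = 0; x <= bound; x++)
            for (long y = -bound; y <= bound; y++)
                if (body(x, y)[0] < x) return true;
        return false;
    }

    // Two iterations always give (x + 1, y), so 'x <= x holds every second iteration.
    static boolean twoStepsShiftByOne(int bound) {
        for (long x = -bound; x <= bound; x++)
            for (long y = -bound; y <= bound; y++) {
                long[] r1 = body(x, y);
                long[] r2 = body(r1[0], r1[1]);
                if (r2[0] != x + 1 || r2[1] != y) return false;
            }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("one step can decrease x: " + oneStepCanDecrease(30));
        System.out.println("two steps give (x+1, y): " + twoStepsShiftByOne(30));
    }
}
```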
while (c) {
  s;
}
{c /\ wp(s^n,!rr)}
s^n
{c /\ wp(s^n,!rr)}
'x <= x  …  'x <= x  …  'x <= x   (every N iterations instead of every iteration)
while (x0 > 0) {
  x := f(x);   // f(x) = Ax + b
}
We conjecture that if there is an infinite loop, then there exists an n such that the loop runs forever for all x satisfying the following (f^k is f applied k times):
$$x_0 > 0 \wedge f^{1}(x) > 0 \wedge \dots \wedge f^{2n-1}(x) > 0 \wedge x_0 \le f^{n}(x) \Rightarrow f^{n}(x) \le f^{2n}(x)$$
Can we derive n from the number of variables in x?
For 1 variable n = 2
For 2 variables n >= 6
• AffineLoopListener: custom listener on SPF; tries n = 0..6