Symbolic Execution: A quest for nails
Willem Visser
Stellenbosch University

Overview
• Yesterday
  – Classic symbolic execution
  – Enhanced to use concrete results
• Today
  – String domain
  – Infinite loops
• Tomorrow
  – Probabilities

How do we get there? How did we get here?

How do we obtain Statement Coverage?

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x))
                S0;
            else
                S1;
            if (x > 3 && y > 10)
                S3;
            else
                S4;
        }
    }

    int hash(int x) {
        if (0 <= x && x <= 10)
            return x * 10;
        else
            return 0;
    }

Random inputs might work, if you are moderately lucky. But there is a better way, where you don't need to win the lottery!

Symbolic Execution

    test(X,Y)
    [ X>0 ]  hash(X)
        [ 0<X<=10 ]  ret X*10
        [ X>10 ]     ret 0
    [ X>10 & … ]  …
    [ 0<X<=10 & Y=X*10 ]                     S0    X=1,Y=10
    [ 0<X<=10 & Y!=X*10 ]                    S1    X=1,Y=0
    [ 3<X<=10 & 10<Y=X*10 ]                  S3
    [ 3<X<=10 & 10<Y!=X*10 ]                 S3    X=4,Y=11
    [ 0<X<=10 & Y=X*10 & !(X>3 & Y>10) ]     S4    X=1,Y=10
    [ 0<X<=10 & Y!=X*10 & !(X>3 & Y>10) ]    S4

One of the basic blocks in the Binomial Heap implementation required a minimum sequence of 13 API calls to be covered:

    private void merge(BinomialHeapNode binHeap) {
        BinomialHeapNode temp1 = Nodes, temp2 = binHeap;
        while ((temp1 != null) && (temp2 != null)) {
            if (temp1.degree == temp2.degree) {
                BinomialHeapNode tmp = temp2;
                temp2 = temp2.sibling;
                tmp.sibling = temp1.sibling;
                temp1.sibling = tmp;
                temp1 = tmp.sibling;
            } else {
                if (temp1.degree < temp2.degree) {
                    if ((temp1.sibling == null) || (temp1.sibling.degree > temp2.degree)) {
                        // HERE!
                        …

Path condition (a satisfying value for each input in parentheses):

    X4(1) >= X8(1) && X10(2) > X8(1) && X10(2) <= X11(2) && X11(2) > 0 &&
    X10(2) > 0 && X8(1) <= X9(1) && X9(1) > 0 && X8(1) > 0 &&
    X4(1) <= X2(1) && X6(2) > X4(1) && X6(2) <= X7(2) && X7(2) > 0 &&
    X6(2) > 0 && X4(1) <= X5(1) && X5(1) > 0 && X4(1) > 0 &&
    X2(1) <= X0(1) && X2(1) <= X3(1) && X3(1) > 0 && X2(1) > 0 &&
    X0(1) <= X1(1) && X1(1) > 0 && X0(1) > 0

Test sequence:

    insert(X0); insert(X1); insert(X2); insert(X3); insert(X4); insert(X5);
    insert(X6); insert(X7); insert(X8); insert(X9); insert(X10); insert(X11);
    extractMin();

Symbolic execution is not the best thing since sliced bread: it has a few serious weaknesses, namely:
• It is inherently white-box
• It is only as good as the decision procedures

With test as above, either:

    native int hash(int x);

The code is not available, so no symbolic execution is possible.

OR:

    int hash(int x) {
        return x * x % 1023;
    }

Assuming we only have a decision procedure (DP) for linear integer arithmetic, we cannot handle the non-linearity here.

Concolic Execution, or Directed Automated Random Testing (DART)
Godefroid, Klarlund and Sen 2005
A novel combination of concrete and symbolic execution designed to overcome the two weaknesses of classic symbolic execution. It executes the program concretely, but collects the path condition (PC); after a run it negates constraints on the PC and executes again with the newly found solutions.
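The leaves of the symbolic execution tree for test and hash can be turned into tests mechanically. The Java sketch below writes the path condition of each complete path through test (with X > 0, split on the two branches inside hash) as a predicate over (X, Y) and, as a stand-in for a real decision procedure, searches a bounded range for a model; each model is then replayed concretely to confirm it covers the intended statements. The class and method names are hypothetical, the bounded search is an assumption made only to keep the example self-contained, and `record` needs Java 16 or later.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical demo class: path conditions from the tree, solved by bounded search.
public class PathConditionDemo {
    // hash(x) from the slides, with the range test written as valid Java.
    static int hash(int x) {
        return (0 <= x && x <= 10) ? x * 10 : 0;
    }

    // Concrete execution of test(x, y): returns the statements it covers.
    static String run(int x, int y) {
        if (!(x > 0)) return "";
        String first = (y == hash(x)) ? "S0" : "S1";
        String second = (x > 3 && y > 10) ? "S3" : "S4";
        return first + ";" + second;
    }

    // One leaf of the symbolic execution tree: covered statements plus path condition.
    record Path(String statements, BiPredicate<Integer, Integer> pc) {}

    static List<Path> paths() {
        return List.of(
            // hash(X) returned X*10 (0 < X <= 10)
            new Path("S0;S3", (x, y) -> 0 < x && x <= 10 && y == x * 10 && x > 3 && y > 10),
            new Path("S0;S4", (x, y) -> 0 < x && x <= 10 && y == x * 10 && !(x > 3 && y > 10)),
            new Path("S1;S3", (x, y) -> 0 < x && x <= 10 && y != x * 10 && x > 3 && y > 10),
            new Path("S1;S4", (x, y) -> 0 < x && x <= 10 && y != x * 10 && !(x > 3 && y > 10)),
            // hash(X) returned 0 (X > 10)
            new Path("S0;S3", (x, y) -> x > 10 && y == 0 && x > 3 && y > 10),
            new Path("S0;S4", (x, y) -> x > 10 && y == 0 && !(x > 3 && y > 10)),
            new Path("S1;S3", (x, y) -> x > 10 && y != 0 && x > 3 && y > 10),
            new Path("S1;S4", (x, y) -> x > 10 && y != 0 && !(x > 3 && y > 10)));
    }

    // Assumption: bounded search stands in for a decision procedure.
    static int[] solve(BiPredicate<Integer, Integer> pc, int bound) {
        for (int x = -bound; x <= bound; x++)
            for (int y = -bound; y <= bound; y++)
                if (pc.test(x, y)) return new int[] {x, y};
        return null; // no model within the bound
    }

    public static void main(String[] args) {
        for (Path p : paths()) {
            int[] m = solve(p.pc(), 100);
            if (m == null) {
                System.out.println(p.statements() + ": no model within bound");
            } else {
                // Replay the generated input concretely to confirm the path.
                System.out.println(p.statements() + ": test(" + m[0] + "," + m[1]
                        + ") covers " + run(m[0], m[1]));
            }
        }
    }
}
```

With a bound of 100 the search finds test(4,40), test(1,10) and test(4,11) for the first three paths, and no model for S0;S3 through the X>10 branch of hash, whose path condition requires Y = 0 and Y > 10.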
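The DART loop described above can be sketched for test with hash treated as native. Each concrete run records its branch outcomes as constraints over (X, Y), using the concrete return value of hash in place of the call; every recorded branch is then negated in turn (keeping the constraints before it), and a bounded search, standing in for the decision procedure, proposes the next input. This is a hypothetical illustration, not DART's implementation; the names are made up and `record` needs Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiPredicate;

// Hypothetical demo class: DART-style concolic exploration of test(x, y).
public class ConcolicDemo {
    // Stands in for the native hash: called concretely, never analysed.
    static int hash(int x) {
        return (0 <= x && x <= 10) ? x * 10 : 0;
    }

    // A recorded branch outcome: printable form plus constraint over (X, Y).
    record Branch(String text, BiPredicate<Integer, Integer> holds) {
        Branch negate() {
            return new Branch("!(" + text + ")", holds.negate());
        }
    }

    // Appends the outcome of one branch to the path condition and returns it.
    static boolean branch(List<Branch> pc, boolean taken, String text,
                          BiPredicate<Integer, Integer> cond) {
        Branch b = new Branch(text, cond);
        pc.add(taken ? b : b.negate());
        return taken;
    }

    // Concrete run of test(x, y) that also collects the path condition.
    static List<Branch> run(int x, int y, Set<String> covered) {
        List<Branch> pc = new ArrayList<>();
        if (branch(pc, x > 0, "X>0", (X, Y) -> X > 0)) {
            int h = hash(x); // the constraint uses the concrete result h
            boolean eq = branch(pc, y == h, "Y=" + h, (X, Y) -> Y == h);
            covered.add(eq ? "S0" : "S1");
            // && short-circuits, so Y>10 is only recorded when x > 3
            boolean both = branch(pc, x > 3, "X>3", (X, Y) -> X > 3)
                    && branch(pc, y > 10, "Y>10", (X, Y) -> Y > 10);
            covered.add(both ? "S3" : "S4");
        }
        return pc;
    }

    // Assumption: bounded search stands in for a decision procedure.
    static int[] solve(List<Branch> pc, int bound) {
        for (int x = -bound; x <= bound; x++)
            for (int y = -bound; y <= bound; y++) {
                boolean sat = true;
                for (Branch b : pc) sat &= b.holds().test(x, y);
                if (sat) return new int[] {x, y};
            }
        return null;
    }

    // Run, negate each branch in turn (keeping its prefix), solve, repeat.
    static Set<String> explore(int x0, int y0, int bound, int maxRuns) {
        Set<String> covered = new TreeSet<>();
        Set<String> tried = new HashSet<>(); // negated prefixes already solved
        Deque<int[]> work = new ArrayDeque<>();
        work.push(new int[] {x0, y0});
        for (int runs = 0; runs < maxRuns && !work.isEmpty(); runs++) {
            int[] in = work.pop();
            List<Branch> pc = run(in[0], in[1], covered);
            for (int i = 0; i < pc.size(); i++) {
                List<Branch> target = new ArrayList<>(pc.subList(0, i));
                target.add(pc.get(i).negate());
                StringBuilder key = new StringBuilder();
                for (Branch b : target) key.append(b.text()).append(" & ");
                if (!tried.add(key.toString())) continue;
                int[] next = solve(target, bound);
                if (next != null) work.push(next); // the next run may still diverge
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        System.out.println(explore(1, 0, 50, 50)); // statements covered from test(1,0)
    }
}
```

Starting from test(1,0), the sketch covers S0, S1, S3 and S4. It also diverges along the way: the target X>0 & Y=40, recorded during a run with x = 4, is solved with x = 1, where hash returns 10, so that run takes S1 instead of S0.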
Concolic Execution

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x))
                S0;
            else
                S1;
            if (x > 3 && y > 10)
                S3;
            else
                S4;
        }
    }

    // body hidden from the analysis: if (0 <= x && x <= 10) return x * 10; else return 0;
    native int hash(int x);

test(1,0)
    [ X>0 ]
    [ X>0 & Y!=10 ]                   S1
    [ X>0 & Y!=10 & X<=3 ]            S4
    negate the last constraint: [ X>0 & Y!=10 & X>3 ], giving test(4,0)

test(4,0)
    [ X>0 ]
    [ X>0 & Y!=40 ]                   S1
    [ X>0 & Y!=40 & X>3 & Y<=10 ]     S4
    negate: [ X>0 & Y!=40 & X>3 & Y>10 ], giving test(4,11)

test(4,11)
    [ X>0 & Y!=40 ]                   S1
    [ X>0 & Y!=40 & X>3 & Y>10 ]      S3
    negate: [ X>0 & Y=40 & X>3 & Y>10 ], giving test(4,40)

test(4,40)
    [ X>0 & Y=40 ]                    S0
    [ X>0 & Y=40 & X>3 & Y>10 ]       S3
    In this run the call hash(x) is replaced by its concrete result, so the path condition is that of:
        if (y == 40) S0; else S1;
    negate: [ X>0 & Y=40 & X<=3 & Y>10 ], giving test(1,40)

test(1,40): Divergence!
    [ X>0 & Y!=10 ]                   S1
    [ X>0 & Y!=10 & X<=3 & Y>10 ]     S4
    Aimed to get S0;S4, but reached S1;S4, because hash(1) = 10, not 40. S4 is reached, but not via S0.

Symbolic Execution with Mixed Concrete-Symbolic Solving
Pasareanu, Rungta, Visser 2011
Symbolic execution that falls back onto concrete values when it doesn't have access to the code or when the decision procedures don't work.
SymCrete = Symbolic + Concrete vs Concolic = Concrete + Symbolic

Symbolic Execution, with hash native

    test(X,Y)
    [ X>0 ]  hash(X)    stuck: the body of hash is not available

SymCrete: 3 Steps
1. Split the PC into two parts:
   1. the part you can solve (easy)
   2. the part you cannot solve (hard)
2. Solve the easy part and evaluate the hard part with the solutions.
3. Replace the hard part with the evaluated results and check SAT.

SymCrete Execution (same test, hash native)

    test(X,Y)
    [ X>0 ]  hash(X)
    [ X>0 & Y=hash(X) ]  S0
        1. easy: X>0              hard: Y=hash(X)
        2. X=1                    Y=hash(1)=10
        3. X>0 & Y=10 is SAT
    [ X>0 & Y!=hash(X) ]  S1
        Y=hash(1)=10;  X>0 & Y!=10 is SAT
    [ X>3 & Y=hash(X) & Y>10 ]  S3
        1. easy: X>3 & Y>10       hard: Y=hash(X)
        2. X=4 & Y=11             Y=hash(4)=40
        3. X>3 & Y=40 & Y>10 is SAT
    [ 3>=X>0 & Y=hash(X) ]  S4
        1. easy: 3>=X>0           hard: Y=hash(X)
        2. X=1                    Y=hash(1)=10
        3. 3>=X>0 & Y=10 is SAT

Generated tests:

    [ X>0 & Y=hash(X) ]             S0    x=1,y=10
    [ X>0 & Y!=hash(X) ]            S1    x=1,y=0
    [ X>3 & Y=hash(X) & Y>10 ]      S3    x=4,y=40
    [ X>3 & Y!=hash(X) & Y>10 ]     S3    x=4,y=11
    [ 3>=X>0 & Y=hash(X) ]          S4    x=1,y=10
    [ 3>=X>0 & Y!=hash(X) ]         S4    x=1,y=0

The Risk of Unsoundness

    void test(int x, int y) {
        if (x >= 0 && x > y && y == x * x)
            S0;   // not reachable
        else
            S1;
    }

    [ X>=0 & X>Y & Y=X*X ]  S0
        1. easy: X>=0 & X>Y       hard: Y=X*X
        2. X=0, Y=-1              Y=0*0=0
        3. X>=0 & X>Y & Y=0 is SAT, which would imply that S0 is reachable

In step 3 we must also add constraints on the solutions from step 2:

    X>=0 & X>Y & Y=0 & X=0 is NOT SAT

Concolic will diverge instead.

3 More Enhancements
• Incremental solving
• User annotations
• Random solving

Problem for Concolic

    void test(int x, int y) {
        if (x > 0) {
            if (y == hash(x))
                S0;
            else
                S1;
            if (y > 10)
                S3;
            else
                S4;
        }
    }

    test(1,0):   [ X>0 ]   [ X>0 & Y!=10 ]   [ X>0 & Y!=10 & Y<=10 ]
    test(1,11):  [ X>0 ]   [ X>0 & Y!=10 ]   [ X>0 & Y!=10 & Y>10 ]

After negation, [ X>0 & Y=10 & Y>10 ] is UNSAT (10 is the concrete value of hash(1)), so concolic is stuck.

SymCrete Execution (same program)

    [ X>0 & Y=hash(X) ]  S0
    [ X>0 & Y=hash(X) & Y>10 ]  S3
        easy: X>0 & Y>10          hard: Y=hash(X)
        X=1, Y=hash(1)=10:  X>0 & Y>10 & Y=10 & X=1 is UNSAT
        Get another solution!
        X=2, Y=hash(2)=20:  X>0 & Y>10 & Y=20 & X=2 is SAT

SymCrete Execution with User Annotations

    @Partition({"x>3", "x<=3"})
    void test(int x, int y) { … }   // same body as above

    [ X>0 & Y=hash(X) & Y>10 ]  S3
        easy: X>0 & Y>10          hard: Y=hash(X)
        X=1, Y=hash(1)=10:  X>0 & Y>10 & Y=10 & X=1 is UNSAT
        Add the user partitions one at a time:  X>0 & Y>10 & X>3
        X=4, Y=hash(4)=40:  X>3 & Y>10 & Y=40 & X=4 is SAT

Random Solving
Pick solutions randomly from the solution space.
• The current implementation only picks randomly if the solution space is completely unconstrained
• Not all solvers support the general feature

JavaPathFinder
• SymCrete: custom listeners on SPF
• Symbolic PathFinder (SPF): the symbolic execution extension for JPF, called jpf-symbc
• JPF: model checker for Java, open source, http://babelfish.arc.nasa.gov/trac/jpf

    public String preserveTags(String body) {…}

Infinite loops are the worst kind of error, since they are input driven and can therefore reappear frequently, in fact infinitely often!

Symbolic String Analysis
• (Almost) all Java String operations covered
• Mixed integer and string constraints
• Automata and SMT (bitvector) back-ends
• Part of Symbolic PathFinder
• M.Sc. by Gideon Redelinghuys
• Collaborators – Jaco Geldenhuys (Stellenbosch)

Infinite Loop?

    while (x > 0)
        (x,y) = (x+y+2, -x);

Try (1,-2).
We only consider affine transformations on loop variables and simple loop conditions such as x>0 and x>=0.

Infinite Loop? (x, y are inputs)

    while (x >= 0) {
        x := x - y;
    }

Ranking functions (x, y are inputs; 'x is the value of x one iteration earlier)

    while (x >= 0) {
        assert('x > x);
        x := x - y;
    }

Use ranking functions for non-termination! If the negation 'x <= x holds on every iteration ('x <= x, 'x <= x, 'x <= x, …), x never decreases. Look for a set the loop cannot leave:

    {c /\ wp(s,'x <= x)} s {c /\ wp(s,'x <= x)}

Inductive?

    {x >= 0 /\ wp(x:=x-y,'x <= x)} x := x - y {x >= 0 /\ wp(x:=x-y,'x <= x)}
    wp(x:=x-y,'x <= x) = {x <= x-y}, i.e. {y <= 0}
    {x >= 0 /\ y <= 0} x := x - y {x >= 0 /\ y <= 0}

The triple holds, so every input with x >= 0 and y <= 0 loops forever.

So how about just…

    while (c) { s; }
    {c /\ wp(s,!rr)} s {c /\ wp(s,!rr)}

where rr is the ranking relation (here 'x > x) and !rr its negation.

    while (x >= 0) {
        assert('x > x);
        x := x + y;
        y := 1 - y;
    }

    {x >= 0 /\ wp(x:=x+y;y:=1-y,'x <= x)} x := x + y; y := 1 - y; {x >= 0 /\ wp(x:=x+y;y:=1-y,'x <= x)}
    wp(x:=x+y;y:=1-y,'x <= x) = {x <= x+y}, i.e. {y >= 0}
    {x >= 0 /\ y >= 0} x:=x+y; y:=1-y; {x >= 0 /\ y >= 0} does not hold: y = 2 becomes y = -1

Because y alternates, 'x <= x need not hold on every iteration, only every n iterations. Unroll the body n times:

    while (c) { s; }
    {c /\ wp(s^n,!rr)} s^n {c /\ wp(s^n,!rr)}

For this loop, s^2 maps (x, y) to (x+1, y), so over two iterations x never decreases.

    while (x0 > 0) { x := Ax + b; }     writing f(x) = Ax + b

We conjecture that if there is an infinite loop, then there exists an n such that, for all x for which the following is true, the loop runs infinitely:

$$x_0 > 0 \;\wedge\; f^{1}(x) > 0 \;\wedge\; \dots \;\wedge\; f^{2n-1}(x) > 0 \;\wedge\; x_0 \le f^{n}(x) \;\Rightarrow\; f^{n}(x) \le f^{2n}(x)$$

Can we derive n from the number of variables in x?
For 1 variable: n = 2
For 2 variables: n >= 6

JavaPathFinder
• AffineLoopListener: custom listener on SPF, tries n = 0..6
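The three SymCrete steps, together with the "get another solution" enhancement, can be sketched for path conditions of the form easy(X, Y) & Y = hard(X). In this hypothetical sketch the split (step 1) is made by the caller, a bounded search stands in for the decision procedure on the easy part, and moving on to the next X models incremental solving. The `pinX` flag switches off the fix from the unsoundness slide, so the spurious result can be reproduced. It is not the SPF listener implementation, and it only handles hard parts of the form Y = hard(X).

```java
import java.util.function.BiPredicate;
import java.util.function.IntUnaryOperator;

// Hypothetical demo class for the three SymCrete steps (not SPF code).
public class SymCreteDemo {
    // Stands in for the native hash: the analysis may call it but not read it.
    static int hash(int x) {
        return (0 <= x && x <= 10) ? x * 10 : 0;
    }

    /**
     * Checks a path condition of the form  easy(X, Y) & Y = hard(X).
     * Step 1 (the split) is done by the caller: easy is what the decision
     * procedure handles, hard is the native or non-linear part.
     * With pinX = false, step 3 forgets the X used in step 2 (unsound variant).
     * Assumption: bounded search stands in for the decision procedure.
     */
    static int[] solve(BiPredicate<Integer, Integer> easy, IntUnaryOperator hard,
                       int bound, boolean pinX) {
        for (int x = -bound; x <= bound; x++) {
            for (int y = -bound; y <= bound; y++) {
                if (!easy.test(x, y)) continue;
                // Step 2: (x, y) is a model of the easy part; evaluate the hard part.
                int v = hard.applyAsInt(x);
                // Step 3: replace Y = hard(X) by Y = v and check SAT again.
                if (pinX) {
                    if (easy.test(x, v)) return new int[] {x, v}; // easy & Y = v & X = x
                } else {
                    for (int x2 = -bound; x2 <= bound; x2++)      // easy & Y = v, X free
                        if (easy.test(x2, v)) return new int[] {x2, v};
                }
                break; // incremental solving: try a model with the next X
            }
        }
        return null; // nothing found within the bound
    }

    public static void main(String[] args) {
        // [X>0 & Y=hash(X) & Y>10]: X=1 gives Y=10 (UNSAT), X=2 gives Y=20 (SAT)
        int[] s3 = solve((x, y) -> x > 0 && y > 10, SymCreteDemo::hash, 100, true);
        System.out.println("S3: test(" + s3[0] + "," + s3[1] + ")");
        // User partition x>3 added to the easy part
        int[] part = solve((x, y) -> x > 0 && y > 10 && x > 3, SymCreteDemo::hash, 100, true);
        System.out.println("S3 with x>3: test(" + part[0] + "," + part[1] + ")");
        // [X>=0 & X>Y & Y=X*X] is unreachable; only the unsound variant reports SAT
        System.out.println("sound:   " + (solve((x, y) -> x >= 0 && x > y, x -> x * x, 100, true) != null));
        System.out.println("unsound: " + (solve((x, y) -> x >= 0 && x > y, x -> x * x, 100, false) != null));
    }
}
```

Run as is, it reports test(2,20) for [X>0 & Y=hash(X) & Y>10] (X = 1 gives Y = 10 and fails), test(4,40) once the user partition x>3 is added, and, for [X>=0 & X>Y & Y=X*X], no model with pinX = true but a spurious one (X = 1, Y = 0) with pinX = false.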
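The suggestion to try (1,-2) on the first "Infinite Loop?" slide can be checked by running the loop concretely and detecting a repeated state. The helper below is hypothetical; it packs (x, y) into a long key and reports a cycle length, 0 if the loop exits, or -1 if the step budget runs out. A repeated state only proves non-termination for that particular input.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: concrete run of the affine loop with cycle detection.
public class AffineLoopRun {
    /**
     * Runs  while (x > 0) (x, y) = (x + y + 2, -x);  from (x, y).
     * Returns the cycle length if a state repeats (the loop never exits),
     * 0 if the loop exits, or -1 if maxSteps iterations pass without either.
     */
    static int cycleLength(int x, int y, int maxSteps) {
        Map<Long, Integer> firstSeen = new HashMap<>(); // state -> iteration index
        for (int step = 0; step < maxSteps; step++) {
            if (!(x > 0)) return 0;
            long state = ((long) x << 32) | (y & 0xffffffffL);
            Integer earlier = firstSeen.putIfAbsent(state, step);
            if (earlier != null) return step - earlier;
            int nx = x + y + 2; // simultaneous assignment: both parts use the old x
            y = -x;
            x = nx;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(cycleLength(1, -2, 100)); // the input suggested on the slide
        System.out.println(cycleLength(1, 0, 100));
    }
}
```

cycleLength(1, -2, 100) returns 6: the run visits (1,-2), (1,-1), (2,-1), (3,-2), (3,-3), (2,-3) and is back at (1,-2), with x > 0 throughout. cycleLength(1, 0, 100) returns 0, since x reaches 0 after five iterations. For this loop the body is x := Ax + b with A = [[1, 1], [-1, 0]] and b = (2, 0); because A^3 = -I, six iterations map every state back to itself.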
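The search over n can be approximated concretely. This is not the AffineLoopListener, which works inside SPF; it is a hypothetical, bounded stand-in. For each n it builds a candidate set S_n: the states whose first n iterates satisfy the guard (mirroring the f^1 … f^(2n-1) guard terms in the conjecture) and whose x has not decreased after n iterations (the negated ranking relation 'x <= x for s^n). It reports the smallest n for which S_n is non-empty and closed under s^n, which means every state in S_n loops forever; closure is only checked on a finite grid, so the result is evidence, not a proof. The sketch starts at n = 1 because, in this formulation, n = 0 would make every state a candidate.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical helper (not the AffineLoopListener). State = {x, y}.
public class RecurrentSetSearch {
    static final Predicate<int[]> X_GE_0 = s -> s[0] >= 0;
    static final Predicate<int[]> X_GT_0 = s -> s[0] > 0;
    // while (x >= 0) { x := x - y; }
    static final UnaryOperator<int[]> SUB = s -> new int[] {s[0] - s[1], s[1]};
    // while (x >= 0) { x := x + y; y := 1 - y; }
    static final UnaryOperator<int[]> ALT = s -> new int[] {s[0] + s[1], 1 - s[1]};
    // while (x > 0) (x, y) = (x + y + 2, -x);
    static final UnaryOperator<int[]> ROT = s -> new int[] {s[0] + s[1] + 2, -s[0]};

    static int[] iterate(UnaryOperator<int[]> body, int[] s, int n) {
        int[] cur = s;
        for (int i = 0; i < n; i++) cur = body.apply(cur);
        return cur;
    }

    // S_n: the guard holds on s, f(s), ..., f^(n-1)(s) and x has not
    // decreased after n iterations ('x <= x for s^n).
    static boolean inCandidate(Predicate<int[]> guard, UnaryOperator<int[]> body,
                               int[] s, int n) {
        int[] cur = s;
        for (int i = 0; i < n; i++) {
            if (!guard.test(cur)) return false;
            cur = body.apply(cur);
        }
        return s[0] <= cur[0];
    }

    // Smallest n in 1..maxN such that S_n is non-empty and closed under s^n,
    // checked only on the grid [-bound, bound]^2; -1 if none is found.
    static int findN(Predicate<int[]> guard, UnaryOperator<int[]> body, int maxN, int bound) {
        for (int n = 1; n <= maxN; n++) {
            boolean nonEmpty = false, closed = true;
            for (int x = -bound; x <= bound && closed; x++) {
                for (int y = -bound; y <= bound && closed; y++) {
                    int[] s = {x, y};
                    if (!inCandidate(guard, body, s, n)) continue;
                    nonEmpty = true;
                    closed = inCandidate(guard, body, iterate(body, s, n), n);
                }
            }
            if (nonEmpty && closed) return n;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findN(X_GE_0, SUB, 6, 10)); // x := x - y
        System.out.println(findN(X_GE_0, ALT, 6, 10)); // x := x + y; y := 1 - y
        System.out.println(findN(X_GT_0, ROT, 6, 10)); // (x, y) = (x + y + 2, -x)
    }
}
```

Under these assumptions the three loops from the slides give n = 1 (x := x - y), n = 2 (x := x + y; y := 1 - y) and n = 6 ((x,y) = (x+y+2,-x)). The intermediate guards matter: without them, for n = 2 the state (0,-5) would look like a candidate, although the loop exits after one iteration.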