DART and CUTE: Concolic Testing

Koushik Sen
University of California, Berkeley
Joint work with Gul Agha, Patrice Godefroid, Nils Klarlund, Rupak Majumdar, Darko Marinov
7/17/2016

Today, QA is mostly testing
• “50% of my company employees are testers, and the rest spends 50% of their time testing!” (Bill Gates, 1995)

Software Reliability Importance
• Software is pervading every aspect of life
  • Software everywhere
• Software failures were estimated to cost the US economy about $60 billion annually [NIST 2002]
  • Improvements in software testing infrastructure may save one-third of this cost
• Testing accounts for an estimated 50%-80% of the cost of software development [Beizer 1990]

Big Picture
• Safe Programming Languages and Type Systems
• Static Program Analysis
• Model Checking and Theorem Proving
• Testing
• Runtime Monitoring
• Dynamic Program Analysis
• Model Based Software Development and Analysis

A Familiar Program: QuickSort

    void quicksort (int[] a, int lo, int hi) {
      int i=lo, j=hi, h;
      int x=a[(lo+hi)/2];

      // partition
      do {
        while (a[i]<x) i++;
        while (a[j]>x) j--;
        if (i<=j) {
          h=a[i];
          a[i]=a[j];
          a[j]=h;
          i++;
          j--;
        }
      } while (i<=j);

      // recursion
      if (lo<j) quicksort(a, lo, j);
      if (i<hi) quicksort(a, i, hi);
    }

Test QuickSort
• Create an array
• Initialize the elements of the array
• Execute the program on this array
• How much confidence do I have in this testing method?
• Is my test suite *Complete*?
• Can someone generate a small and *Complete* test suite for me?
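The manual recipe from "Test QuickSort" (create an array, initialize it, run the program) might look like the C sketch below. It is an illustration under assumptions, not part of the original slides: the quicksort is a C transliteration of the Java-style code shown on the slide, and `is_sorted`, `run_random_tests`, the value range and the array-size limit are hypothetical choices.

```c
#include <stdlib.h>

/* C transliteration of the slide's quicksort (same partition scheme). */
void quicksort(int a[], int lo, int hi) {
    int i = lo, j = hi, h;
    int x = a[(lo + hi) / 2];
    do {
        while (a[i] < x) i++;
        while (a[j] > x) j--;
        if (i <= j) {
            h = a[i]; a[i] = a[j]; a[j] = h;
            i++; j--;
        }
    } while (i <= j);
    if (lo < j) quicksort(a, lo, j);
    if (i < hi) quicksort(a, i, hi);
}

/* Hypothetical oracle: returns 1 if a[0..n-1] is in non-decreasing order. */
int is_sorted(const int a[], int n) {
    for (int k = 1; k < n; k++)
        if (a[k - 1] > a[k]) return 0;
    return 1;
}

/* Hypothetical driver: runs `rounds` random tests on arrays of length n
   and returns the number of failed tests, or -1 if n is out of range. */
int run_random_tests(int rounds, int n, unsigned seed) {
    int a[64];
    int failures = 0;
    if (n < 1 || n > 64) return -1;
    srand(seed);
    for (int r = 0; r < rounds; r++) {
        for (int k = 0; k < n; k++) a[k] = rand() % 100;  /* initialize */
        quicksort(a, 0, n - 1);                           /* execute */
        if (!is_sorted(a, n)) failures++;                 /* check */
    }
    return failures;
}
```

A result of zero failures only says that these particular inputs passed. It gives no evidence that the suite is complete, which is exactly the question the slide raises.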
Automated Test Generation
• Studied since the 70's
  • King 76, Myers 79
• 30 years have passed, and yet no effective solution
• What happened?
  • Program-analysis techniques were expensive
  • Automated theorem proving and constraint solving techniques were not efficient
• In recent years we have seen remarkable progress in static program analysis and constraint solving
  • SLAM, BLAST, ESP, Bandera, Saturn, MAGIC
• Question: Can we use similar techniques in Automated Testing?

Systematic Automated Testing
• Static Analysis
• Model-Checking and Formal Methods
• Testing
• Automated Theorem Proving
• Constraint Solving

CUTE and DART
• Combine random testing (concrete execution) and symbolic testing (symbolic execution) [PLDI'05, FSE'05, FASE'06, CAV'06, ISSTA'07, ICSE'07]
• Concrete + Symbolic = Concolic

Goal
• Automated Unit Testing of real-world C and Java Programs
  • Generate test inputs
  • Execute unit under test on generated test inputs
    • so that all reachable statements are executed
    • any assertion violation gets caught
• Our Approach:
  • Explore all execution paths of a unit for all possible inputs
  • Exploring all execution paths ensures that all reachable statements are executed

Execution Paths of a Program
• Can be seen as a binary tree with possibly infinite depth
  • Computation tree
• Each node represents the execution of an "if then else" statement
• Each edge represents the execution of a sequence of non-conditional statements
• Each path in the tree represents an equivalence class of inputs
[Figure: binary computation tree with edges labeled 0 and 1]

Example of Computation Tree

    void test_me(int x, int y) {
      if (2*x == y) {
        if (x != y+10) {
          printf("I am fine here");
        } else {
          printf("I should not reach here");
          ERROR;
        }
      }
    }

[Figure: computation tree for test_me with decision nodes 2*x==y and x!=y+10, edges labeled Y and N, and an ERROR leaf]

Concolic Testing: Finding Security and Safety Bugs
• Key: Add Checks Automatically and Perform Concolic Testing
• Divide by 0 Error
    x = 3 / i;
  becomes
    if (i != 0) x = 3 / i; else ERROR;
• Buffer Overflow
    a[i] = 4;
  becomes
    if (0 <= i && i < a.length) a[i] = 4; else ERROR;

Existing Approach I
• Random testing
  • generate random inputs
  • execute the program on generated inputs
• Probability of reaching an error can be astronomically small

    test_me(int x) {
      if (x == 94389) {
        ERROR;
      }
    }

• Probability of hitting ERROR = 1/2^32

Existing Approach II
• Symbolic Execution
  • use symbolic values for input variables
  • execute the program symbolically on symbolic input values
  • collect symbolic path constraints
  • use theorem prover to check if a branch can be taken
• Does not scale for large programs

    test_me(int x) {
      if ((x%10)*4 != 17) {
        ERROR;
      } else {
        ERROR;
      }
    }

• Symbolic execution will say both branches are reachable: false positive
• Variant with a function call in the condition, with the same verdict:

    test_me(int x) {
      if (bbox(x) != 17) {
        ERROR;
      } else {
        ERROR;
      }
    }
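Why "both branches are reachable" is a false positive for the condition (x%10)*4 != 17 can be checked by brute force: in C99 and later, x % 10 only takes the values -9..9, so it is enough to try each remainder. The sketch below is illustrative only, and the helper names `else_branch_reachable` and `takes_then_branch` are hypothetical.

```c
/* Returns 1 if some remainder r = x % 10 makes r * 4 == 17 true,
   i.e. if the else branch of the slide's test_me could ever run.
   In C99 and later, x % 10 lies in -9..9 for every int x. */
int else_branch_reachable(void) {
    for (int r = -9; r <= 9; r++) {
        if (r * 4 == 17) return 1;
    }
    return 0;
}

/* The slide's branch condition, evaluated concretely for one input. */
int takes_then_branch(int x) {
    return (x % 10) * 4 != 17;
}
```

Since 17 is not a multiple of 4, no remainder satisfies the equality and only the first branch can execute. A prover that cannot reason about % has to report both branches conservatively.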
Concolic Testing Approach

    int double (int v) {
      return 2*v;
    }

    void testme (int x, int y) {
      z = double (y);
      if (z == x) {
        if (x > y+10) {
          ERROR;
        }
      }
    }

• Random Test Driver: random values for x and y
• Probability of reaching ERROR is extremely low

Concrete execution and symbolic execution proceed side by side:

Run 1
• Start: concrete state x = 22, y = 7; symbolic state x = x0, y = y0
• After z = double (y): concrete state x = 22, y = 7, z = 14; symbolic state x = x0, y = y0, z = 2*y0
• Branch (z == x) is not taken: path condition 2*y0 != x0
• Solve: 2*y0 == x0
• Solution: x0 = 2, y0 = 1

Run 2
• Start: concrete state x = 2, y = 1; symbolic state x = x0, y = y0
• After z = double (y): concrete state x = 2, y = 1, z = 2; symbolic state x = x0, y = y0, z = 2*y0
• Branch (z == x) is taken: path condition 2*y0 == x0
• Branch (x > y+10) is not taken: path condition x0 ≤ y0+10
• Solve: (2*y0 == x0) ∧ (x0 > y0+10)
• Solution: x0 = 30, y0 = 15

Run 3
• Start: concrete state x = 30, y = 15; symbolic state x = x0, y = y0
• Branch (z == x) is taken: path condition 2*y0 == x0
• Branch (x > y+10) is taken: path condition x0 > y0+10
• ERROR is reached: Program Error

Explicit Path (not State) Model Checking
• Traverse all execution paths one by one to detect errors
  • assertion violations
  • program crash
  • uncaught exceptions
• combine with valgrind to discover memory errors
[Figure: computation tree with edges labeled T and F; the paths are traversed one at a time]
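The three runs of testme can be reproduced with a toy directed search. This is a minimal sketch under stated assumptions, not DART or CUTE themselves: the branch conditions are hand-written predicates instead of recorded symbolic expressions, the "solver" is a brute-force scan over a small window, and only the deepest branch is flipped, without the bookkeeping a full depth-first traversal needs. All names (`twice`, `run_testme`, `solve`, `concolic_search`) are hypothetical; `twice` replaces the slide's `double`, which is a keyword in C.

```c
#define NBRANCH 2      /* testme has two nested branches */
#define LO (-50)       /* search window of the toy solver (assumption) */
#define HI 50

/* Branch conditions of testme as predicates over the inputs. */
static int twice(int v) { return 2 * v; }
static int c1(int x, int y) { return twice(y) == x; }
static int c2(int x, int y) { return x > y + 10; }
static int (*const cond[NBRANCH])(int, int) = { c1, c2 };

/* Runs testme on (x, y), records every branch outcome in taken[],
   stores the number of branches executed in *depth,
   and returns 1 if ERROR was reached. */
static int run_testme(int x, int y, int taken[NBRANCH], int *depth) {
    taken[0] = c1(x, y);
    *depth = 1;
    if (!taken[0]) return 0;
    taken[1] = c2(x, y);
    *depth = 2;
    return taken[1];
}

/* Toy solver: finds inputs that keep the outcomes taken[0..k-1]
   and flip the outcome of branch k. Returns 1 on success. */
static int solve(const int taken[NBRANCH], int k, int *x, int *y) {
    for (int cx = LO; cx <= HI; cx++) {
        for (int cy = LO; cy <= HI; cy++) {
            int ok = (cond[k](cx, cy) != taken[k]);
            for (int i = 0; ok && i < k; i++)
                ok = (cond[i](cx, cy) == taken[i]);
            if (ok) { *x = cx; *y = cy; return 1; }
        }
    }
    return 0;
}

/* Starts from (x, y), negates the last branch of each path to get the
   next input, and returns the number of runs needed to reach ERROR
   (or -1 if it gives up). The failing input is stored in *err_x, *err_y. */
int concolic_search(int x, int y, int *err_x, int *err_y) {
    int taken[NBRANCH], depth;
    for (int runs = 1; runs <= 10; runs++) {
        if (run_testme(x, y, taken, &depth)) {
            *err_x = x;
            *err_y = y;
            return runs;
        }
        if (!solve(taken, depth - 1, &x, &y)) return -1;
    }
    return -1;
}
```

Starting from x = 22, y = 7, the search needs three runs, like the slides. The intermediate inputs differ from the slides' (2, 1) and (30, 15) because the toy solver simply returns the first solution it finds in its scan order.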
Novelty: Simultaneous Concrete and Symbolic Execution

    int foo (int v) {
      return (v*v) % 50;
    }

    void testme (int x, int y) {
      z = foo (y);
      if (z == x) {
        if (x > y+10) {
          ERROR;
        }
      }
    }

Run 1
• Start: concrete state x = 22, y = 7; symbolic state x = x0, y = y0
• After z = foo (y): concrete state x = 22, y = 7, z = 49; symbolic state x = x0, y = y0, z = (y0*y0)%50
• Branch (z == x) is not taken: path condition (y0*y0)%50 != x0
• Solve: (y0*y0)%50 == x0
  • Don't know how to solve! Stuck?
  • The same happens if foo is left uninterpreted: symbolic state z = foo (y0), path condition foo (y0) != x0, solve foo (y0) == x0
• Not stuck! Use the concrete state
  • Replace y0 by 7 (sound)
  • Symbolic state becomes x = x0, y = y0, z = 49; path condition becomes 49 != x0
• Solve: 49 == x0
• Solution: x0 = 49, y0 = 7

Run 2
• Start: concrete state x = 49, y = 7; symbolic state x = x0, y = y0
• After z = foo (y): concrete state x = 49, y = 7, z = 49; symbolic state x = x0, y = y0, z = 49
• Branch (z == x) is taken: path condition 49 == x0
• Branch (x > y+10) is taken: path condition x0 > y0+10
• ERROR is reached: Program Error

Concolic Testing: A Middle Approach

    Random Testing         Concolic Testing          Symbolic Testing
    + Complex programs     + Complex programs        - Simple programs
    + Efficient            +/- Somewhat efficient    - Not efficient
    - Less coverage        + High coverage           + High coverage
    + No false positive    + No false positive       - False positive

Implementations
• DART and CUTE for C programs
• jCUTE for Java programs
• Go to http://srl.cs.berkeley.edu/~ksen/ for CUTE and jCUTE binaries
• MSR has four implementations
  • SAGE, PEX, YOGI, Vigilante
• Similar tool: EXE at Stanford
• Easiest way to use and to develop on top of CUTE
  • Implement concolic testing yourself

Testing Data Structures
(joint work with Darko Marinov and Gul Agha)

Example

    typedef struct cell {
      int v;
      struct cell *next;
    } cell;

    int f(int v) {
      return 2*v + 1;
    }

    int testme(cell *p, int x) {
      if (x > 0)
        if (p != NULL)
          if (f(x) == p->v)
            if (p->next == p)
              abort();
      return 0;
    }

• Random Test Driver:
  • random memory graph reachable from p
  • random value for x
• Probability of reaching abort() is extremely low

CUTE Approach

Run 1
• Start: concrete state p = NULL, x = 236; symbolic state p = p0, x = x0
• Branch (x > 0) is taken: constraint x0 > 0
• Branch (p != NULL) is not taken: constraint !(p0 != NULL), that is, p0 = NULL
• Solve: x0 > 0 and p0 ≠ NULL
• Solution: x0 = 236, p0 points to a cell with v = 634 and next = NULL

Run 2
• Start: concrete state p -> {v = 634, next = NULL}, x = 236; symbolic state p = p0, x = x0, p->v = v0, p->next = n0
• Branch (x > 0) is taken: constraint x0 > 0
• Branch (p != NULL) is taken: constraint p0 ≠ NULL
• Branch (f(x) == p->v) is not taken: constraint 2x0+1 ≠ v0
• Solve: x0 > 0 and p0 ≠ NULL and 2x0+1 = v0
• Solution: x0 = 1, p0 points to a cell with v = 3 and next = NULL

Run 3
• Start: concrete state p -> {v = 3, next = NULL}, x = 1; symbolic state p = p0, x = x0, p->v = v0, p->next = n0
• Branch (x > 0) is taken: constraint x0 > 0
• Branch (p != NULL) is taken: constraint p0 ≠ NULL
• Branch (f(x) == p->v) is taken: constraint 2x0+1 = v0
• Branch (p->next == p) is not taken: constraint n0 ≠ p0
• Solve: x0 > 0 and p0 ≠ NULL and 2x0+1 = v0 and n0 = p0
• Solution: x0 = 1, p0 points to a cell with v = 3 whose next field points back to the cell itself

Run 4
• Start: concrete state p -> {v = 3, next = p}, x = 1; symbolic state p = p0, x = x0, p->v = v0, p->next = n0
• All four branches are taken: constraints x0 > 0, p0 ≠ NULL, 2x0+1 = v0, n0 = p0
• abort() is reached: Program Error

CUTE in a Nutshell
• Generate concrete inputs one by one
  • each input leads program along a different path
• On each input execute program both concretely and symbolically
• Both cooperate with each other
  • concrete execution guides the symbolic execution
  • concrete execution enables symbolic execution to overcome incompleteness of theorem prover
    • replace symbolic expressions by concrete values if symbolic expressions become complex
    • resolve aliases for pointers using concrete values
    • handle arrays naturally
  • symbolic execution helps to generate concrete input for next execution
    • increases coverage

Handling Pointer Casting and Arithmetic

    struct foo { int i; char c; };

    void * memset(void *s, char c, size_t n) {
      for (int i=0; i<n; i++) {
        ((char *)s)[i] = c;
      }
      return s;
    }

    bar(struct foo *a) {
      if (a && a->c==1) {
        memset(a, 0, sizeof(struct foo));
        if (a->c != 1) {
          ERROR;
        }
      }
    }

• Reasoning about dynamic data is easy
• Due to the limitations of alias analysis, practical "static analyzers" cannot determine that "a->c" has been rewritten
  • BLAST would infer that the program is safe
• CUTE finds the error

Library Functions With Side-effects

    struct foo { int i; char c; };

    bar(struct foo *a) {
      if (a && a->c==1) {
        memset(a, 0, sizeof(struct foo));
        if (a->c == 1) {
          ERROR;
        }
      }
    }

• Reasoning about functions with side-effects
• A sound static tool will give a false alarm
• CUTE will not report the error because it cannot generate a concrete input that will make the program hit ERROR

Underlying Random Testing Helps

     1 foobar(int x, int y){
     2   if (x*x*x > 0){
     3     if (x>0 && y==10){
     4       ERROR;
     5     }
     6   } else {
     7     if (x>0 && y==20){
     8       ERROR;
     9     }
    10   }
    11 }

• Static-analysis-based model checkers would consider both branches
  • both ERROR statements are reachable
  • false alarm
• Symbolic execution
  • gets stuck at line number 2
  • or warns that both ERRORs are reachable
• CUTE finds the only error

Data-structure Testing: Solving Data-structure Invariants

    int isSortedSlist(slist * head) {
      slist * cur, *tmp;
      int i,j;
      if (head == 0) return 1;
      i=j=0;
      for (cur = head; cur!=0; cur = cur->next){
        i++;
        j=1;
        for (tmp = head; j<i; tmp = tmp->next){
          j++;
          if(cur==tmp) return 0;
        }
      }
      for (cur = head; cur->next!=0; cur = cur->next){
        if(cur->i > cur->next->i) return 0;
      }
      return 1;
    }

    testme(slist *L, slist *e){
      CUTE_assume(isSortedSlist(L));
      sglib_slist_add(&L,e);
      CUTE_assert(isSortedSlist(L));
    }

Data-structure Testing: Generating Call Sequence

    for (i=1; i<10; i++) {
      CU_input(toss);
      CU_input(e);
      switch(toss){
        case 2: sglib_hashed_ilist_add_if_not_member(htab,e,&m); break;
        case 3: sglib_hashed_ilist_delete_if_member(htab,e,&m); break;
        case 4: sglib_hashed_ilist_delete(htab,e); break;
        case 5: sglib_hashed_ilist_is_member(htab,e); break;
        case 6: sglib_hashed_ilist_find_member(htab,e); break;
      }
    }

Instrumentation

Instrumented Program
• Maintains a symbolic state
  • maps each memory location used by the program to symbolic expressions over symbolic input variables
  • Arithmetic expressions: a1x1 + … + anxn + c
  • Pointer expressions: x
• Maintains a path constraint
  • a sequence of constraints C1, C2, …, Ck such that
    • the Ci are symbolic constraints over symbolic input variables
    • C1 ∧ C2 ∧ … ∧ Ck holds over the path currently executed by the program
  • Arithmetic constraints: a1x1 + … + anxn ≥ 0
  • Pointer constraints: x = y, x ≠ y, x = NULL, x ≠ NULL

Constraint Solving
• Path constraint: C1 ∧ C2 ∧ … ∧ Ck ∧ … ∧ Cn
• Solve C1 ∧ C2 ∧ … ∧ ¬Ck to generate next input
• If Ck is a pointer constraint then invoke the pointer solver
  • Ci1 ∧ Ci2 ∧ … ∧ ¬Ck where ij ∈ [1..k-1] and Cij is a pointer constraint
• If Ck is an arithmetic constraint then invoke the linear solver
  • Ci1 ∧ Ci2 ∧ … ∧ ¬Ck where ij ∈ [1..k-1] and Cij is an arithmetic constraint
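The symbolic state described under "Instrumented Program" keeps arithmetic values as linear expressions a1x1 + … + anxn + c, and the "CUTE in a Nutshell" slide says that symbolic expressions are replaced by concrete values when they become complex. One way this could be sketched in C is shown below. The `Sym` type and the `sym_*` functions are hypothetical names, the slides do not show CUTE's own data structures, and the fallback policy (concretize any product of two symbolic values and any remainder) is a simplifying assumption.

```c
#include <assert.h>

#define NVARS 2   /* symbolic inputs x0 and y0 (assumption for this sketch) */

/* Linear symbolic expression a[0]*x0 + a[1]*y0 + c, paired with the
   concrete value it has in the current run. */
typedef struct {
    int a[NVARS];
    int c;
    int concrete;
} Sym;

/* Symbolic input variable number i with concrete value v. */
Sym sym_input(int i, int v) {
    Sym s = { {0}, 0, v };
    assert(i >= 0 && i < NVARS);
    s.a[i] = 1;
    return s;
}

/* Constant expression: no symbolic part. */
Sym sym_const(int v) {
    Sym s = { {0}, v, v };
    return s;
}

/* Returns 1 if the expression mentions no symbolic input. */
int sym_is_const(Sym s) {
    for (int i = 0; i < NVARS; i++)
        if (s.a[i] != 0) return 0;
    return 1;
}

/* Addition keeps the expression linear. */
Sym sym_add(Sym p, Sym q) {
    Sym r;
    for (int i = 0; i < NVARS; i++) r.a[i] = p.a[i] + q.a[i];
    r.c = p.c + q.c;
    r.concrete = p.concrete + q.concrete;
    return r;
}

/* Multiplication stays linear only if one operand is a constant.
   Otherwise the result falls back to its concrete value. */
Sym sym_mul(Sym p, Sym q) {
    Sym r;
    if (!sym_is_const(p) && !sym_is_const(q))
        return sym_const(p.concrete * q.concrete);
    if (sym_is_const(p)) { Sym t = p; p = q; q = t; }  /* make q the constant */
    for (int i = 0; i < NVARS; i++) r.a[i] = p.a[i] * q.c;
    r.c = p.c * q.c;
    r.concrete = p.concrete * q.concrete;
    return r;
}

/* Remainder is not linear, so this sketch always concretizes it. */
Sym sym_mod(Sym p, int m) {
    assert(m != 0);
    return sym_const(p.concrete % m);
}
```

With y0 = 7, evaluating (y*y) % 50 through `sym_mul` and `sym_mod` yields the constant 49, which matches the "replace y0 by 7" step of the foo example, while 2*y stays symbolic as 2*y0.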
Optimizations
• Fast un-satisfiability check of C1 ∧ C2 ∧ … ∧ ¬Ck
• Eliminate same constraints
  • (y>2) ∧ (y>2) ∧ ¬(y==4) ≡ (y>2) ∧ ¬(y==4)
  • that is, C1 ∧ C2 ∧ … Ci … Cj … ∧ ¬Ck = C1 ∧ C2 ∧ … Ci … ∧ ¬Ck if Ci = Cj
• Eliminate non-dependent constraints
  • (x==1) ∧ (y>2) ∧ ¬(y==4) ≡ (y>2) ∧ ¬(y==4)
  • only constraints that share variables with ¬Ck are re-solved; the other inputs keep their previous values

SGLIB: popular library for C data-structures
• Used in Xrefactory, a commercial tool for refactoring C/C++ programs
• Found two bugs in sglib 1.0.1
  • reported them to the authors
  • fixed in sglib 1.0.2
• Bug 1: doubly-linked list library
  • segmentation fault occurs when a non-zero length list is concatenated with a zero-length list
  • discovered in 140 iterations (< 1 second)
• Bug 2: hash-table
  • an infinite loop in the hash table is-member function
  • 193 iterations (1 second)

SGLIB: popular library for C data-structures

    Name                 Run time (s)   # Iterations   # Branches explored   % Branch coverage   # Functions tested   # Bugs found
    Array Quick Sort                2            732                    43               97.73                    2              0
    Array Heap Sort                 4           1764                    36              100.00                    2              0
    Linked List                     2            570                   100               96.15                   12              0
    Sorted List                     2           1020                   110               96.49                   11              0
    Doubly Linked List              3           1317                   224               99.12                   17              1
    Hash Table                      1            193                    46               85.19                    8              1
    Red Black Tree               2629      1,000,000                   242               71.18                   17              0

Experiments: Needham-Schroeder Protocol
• Tested a C implementation of a security protocol (Needham-Schroeder) with a known attack
  • 600 lines of code
• Took less than 13 seconds on a machine with a 1.8GHz Xeon processor and 2 GB RAM to discover the man-in-the-middle attack
• In contrast, a software model-checker (VeriSoft) took 8 hours

Testing Data-structures of CUTE itself
• Unit tested several non-standard data-structures implemented for the CUTE tool
  • cu_depend (used to determine dependency during constraint solving using a graph algorithm)
  • cu_linear (linear symbolic expressions)
  • cu_pointer (pointer symbolic expressions)
• Discovered a few memory leaks and a couple of segmentation faults
  • these errors did not show up in other uses of CUTE
  • for memory leaks we used CUTE in conjunction with Valgrind

Limitations
• Path Space of a Large Program is Huge
  • Path Explosion Problem
[Figure: the entire computation tree, with the small part explored by concolic testing marked]

Hybrid Concolic Testing
(joint work with Rupak Majumdar)

Limitations: A Comparative View
• Concolic: Broad, shallow
• Random: Narrow, deep

Hybrid Concolic Testing
• Interleave Random Testing and Concolic Testing to increase coverage
• Motivated by similar idea used in VLSI design validation: Ganai et al. 1999, Ho et al. 2000

    while (not required coverage) {
      while (not saturation)
        perform random testing;
      Checkpoint;
      while (not increase in coverage)
        perform concolic testing;
      Restore;
    }

• Hybrid Search: deep, broad search

Hybrid Concolic Testing
• 4x more coverage than random testing
• 2x more coverage than concolic testing

Results
[Figures: experimental results]

Summary
• Static Analysis
• Program Verification
• Model-Checking and Formal Methods
• Testing
• Automated Theorem Proving
• Constraint Solving

References
1. "DART: Directed Automated Random Testing," Patrice Godefroid, Nils Klarlund, and Koushik Sen, PLDI 2005.
2. "CUTE: A Concolic Unit Testing Engine for C," Koushik Sen, Darko Marinov, and Gul Agha, ESEC/FSE 2005.
3. "Dynamic Test Input Generation for Database Applications," Michael Emmi, Rupak Majumdar, and Koushik Sen, International Symposium on Software Testing and Analysis (ISSTA'07), ACM, 2007.
4. "Hybrid Concolic Testing," Rupak Majumdar and Koushik Sen, 29th International Conference on Software Engineering (ICSE'07), IEEE, 2007.
5. "A Race-Detection and Flipping Algorithm for Automated Testing of Multi-Threaded Programs," Koushik Sen and Gul Agha, Haifa Verification Conference 2006 (HVC'06), Lecture Notes in Computer Science, Springer, 2006.
6. "CUTE and jCUTE: Concolic Unit Testing and Explicit Path Model-Checking Tools," Koushik Sen and Gul Agha, 18th International Conference on Computer Aided Verification (CAV'06), Lecture Notes in Computer Science, Springer, 2006. (Tool Paper)
7. "Automated Systematic Testing of Open Distributed Programs," Koushik Sen and Gul Agha, International Conference on Fundamental Approaches to Software Engineering (FASE'06) (ETAPS'06 conference), Lecture Notes in Computer Science, Springer, 2006.
8. "EXE: Automatically Generating Inputs of Death," Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler, 13th ACM Conference on Computer and Communications Security, 2006.
9. "Automatically generating malicious disks using symbolic execution," Junfeng Yang, Can Sar, Paul Twohey, Cristian Cadar, and Dawson Engler, IEEE Security and Privacy, 2006.