Intelligent automatic test pattern generation for C-based HW/SW co-design descriptions through combined use of concrete and symbolic simulations
Masahiro Fujita, Yoshihisa Kojima
University of Tokyo
May 2, 2008

Background
- In high-level SoC design, system behavior can be described in C-like programming languages
  - Targets both hardware and software
- Tool support is not sufficient
  - Difficulties compared with RTL or lower-level design descriptions:
    - Many wide-bit, word-level signals (large exploration space)
    - Complicated control flow (many paths)
    - Difficulty in modeling various descriptions
      - SW: pointers, pointer arithmetic, casting, dynamic allocation, recursive calls…
      - HW: concurrency, synchronization, throughput, latency…
- Our goal is to assist test-case generation for system-level descriptions in C-like languages
  - Automatic input pattern generation
    - For assertion-based verification, to find bugs
    - For higher code coverage, which results in higher confidence

Most important issues in debugging
- Generally speaking, counterexamples generated by simulation/emulation are very "long"
  - They can be billions of cycles
  - It is not easy at all to understand why the error occurs
- Much shorter counterexamples are needed just to understand why the bug happens
  - Are those long sequences really necessary?
[Figure: paths through the state space from the initial state to the bug state; there can be a more direct path, and loops can be skipped.]
- Bounded model checking is based on assertions with "constraints"
  - Bounds cannot be large
- Can we derive good constraints from the counterexamples found in simulation/emulation?
Target language
SpecC = ANSI-C + mechanisms for HW:
- Structural hierarchy
- Parallelism
- Behavior
- Ports
- Synchronization
[Figure: a behavior B with ports p1 and p2, channel c1, interfaces, variable (wire) v1, and child behaviors b1 and b2.]
Languages discussed here:
- C language
- Some additional features

Outline
- Background
- Problem definitions for input pattern generation
- Preliminaries: concrete/symbolic hybrid simulation
  - Branch / path / coverage definitions
  - Concrete simulation, symbolic simulation
  - Hybrid simulation
- Proposed method for branch coverage
- Implementation
- Experimental results
- Conclusion and future work

Requirements for input pattern generation (1)
For assertion failure detection:
- Given a design description annotated with
  - Input variable definitions
  - Assumptions on the input variables, as predicates
  - Assertion predicates
- Possible results:
  - Assertion violation (with the violating input value assignments)
  - Assertion holds for all possible input values
  - Unknown

    int func(int x, int y) {
        int r = 0;
        if (x - y > 0)
            r = x - y;
        else
            r = y - x;
        return r;
    }

    int x, y;
    FL_INPUT(x);
    FL_INPUT(y);
    FL_ASSUME(x >= 0);
    FL_ASSUME(y >= 0);
    FL_ASSERT(func(x, y) > 0);

Assertion failure: counterexamples exist, e.g. (x = 0, y = 0), (x = 3, y = 3), ...
Requirements for input pattern generation (2)
For branch coverage:
- Given a design description with annotations and a target branch coverage
- Generate a set of test cases (input value assignments) that covers the branches
  - I.e., tell how to activate as many code fragments as possible (over multiple runs)

    int x, y;
    FL_INPUT(x);
    FL_INPUT(y);
    if (x > 2) { }
    if (y > 2) { }

The two test cases (1) (x = 0, y = 0) and (2) (x = 3, y = 3) achieve 100% branch coverage.

Branch / path definitions
- A (pair of) conditional branch(es):
  - Associated with if, do-while, for, switch-case, and while statements
  - A branch is covered when the associated condition has been evaluated as true (or false) at least once (over multiple runs)
  - For if (cond): the THEN branch has the branch condition BC = cond, and the ELSE branch has BC = !cond
- A path is a sequence of branches taken
- A path condition is the conjunction of all the branch conditions taken along the path
- A false (infeasible) path is a path for which no value assignment satisfies the path condition

    void func(int x, int y) {
        if (x > 2) {
        } else {
        }
        if (y > 2) {
        } else {
        }
    }

There are 4 paths. For example, the path that takes THEN at the first if and ELSE at the second has the path condition (x > 2) AND NOT(y > 2).

    void func(int x, int y) {
        if (x > 2) {
        }
        if (x < 2) {
        }
    }

There appear to be 4 paths. However, the path that takes both THEN branches has the path condition (x > 2) AND (x < 2), which is INFEASIBLE!

Branch / path coverage definitions
- Branch coverage: # of branches covered / # of all branches
- Path coverage: # of paths covered / # of all (or all feasible) paths
  - Difficult to use in practice because:
    - The number of feasible paths is not easy to know
    - The number of possible paths can be huge: exponential w.r.t.
      (# of if-statements) * (loop iterations)
[Figure: two sequential if-statements exercised by 2 runs.]
With these 2 runs: branch coverage = 4 / (2 + 2) (100%); path coverage = 2 / (2 * 2) (50%)

Traditional (concrete) simulation approach
- Create test cases (input values) by hand, or generate them randomly
  - By hand: very simple, but how long does it take to hit the failure?
  - Randomly: automated, but it may be difficult to activate the corner cases
  - In system-level descriptions, the search space can be huge (e.g., 32-bit word-level signals)
- Run simulation
- Incomplete: cannot prove that the assertion ALWAYS holds
  - That would require exercising all possible values (not practically possible)
- Confidence (quality of tests) is given by coverage metrics, e.g., branch coverage

    Try (x = 3, y = 100)  => r = 97 > 0  OK
    Try (x = 1, y = 20)   => r = 19 > 0  OK
    ...
    Try (x = 10, y = 10)  => r = 0 > 0   NG!
Such a failing input may eventually be hit, but only rarely.

Formal approach
- Build formal expressions and solve the constraints mathematically
  - Precise and complete
  - Computationally expensive
- Word-level approach: symbolic simulation
  - Evaluates expressions symbolically instead of with concrete values

Symbolic simulation
- Needs to enumerate all the paths, including infeasible ones
  - Some paths may be infeasible (false-path problem)

    int func(int x, int y) {
        int r = 0;
        if (x - y > 0)      // path 1: THEN, path 2: ELSE
            r = x - y;
        else
            r = y - x;
        return r;
    }

    Path 1: (r_1 = 0) AND (x - y > 0) AND (r_2 = x - y) AND (x >= 0) AND (y >= 0) -> (r_2 > 0)    VALID for all x, y
    Path 2: (r_1 = 0) AND NOT(x - y > 0) AND (r_2 = y - x) AND (x >= 0) AND (y >= 0) -> (r_2 > 0)    INVALID
    Counterexample: y - x = 0 (some of its instances may be reported)

Symbolic simulation (cont'd)
- Employs an SMT (satisfiability modulo theories) solver
  - To solve path conditions
  - To evaluate assertions
- One symbolic simulation of a path corresponds to concrete simulations of all possible values on that path
- Limitations:
  - # of paths (including false paths)
  - Size of the symbolic expressions
  - Solver capability (non-linear arithmetic)
  - How to model complicated descriptions
- May not be applicable straightforwardly to complex / large descriptions

Concrete-symbolic hybrid approach
- Combines concrete simulation and symbolic simulation (originally proposed by Larson [5])
- CUTE [11] was proposed for unit testing
  - Exhaustive traversal of all paths
- A concrete run guides the path for symbolic simulation (initially a random simulation)
- A symbolic run on that path derives the path condition
  - Concrete values are used as an approximation if the constraints cannot be processed (e.g.
non-linear arithmetic)
- Solve the constraints to guide the next run to another path
  - Negate some path-condition term to take another branch

Concolic simulation (1st run, initially random)
Goal: find inputs that reach reach_me().

    void test(int x, int y, int z) {
        if (x > 3)                  // B1
            if (y > 11)             // B2
                if (z == y*y)       // B3
                    if (x < 5)      // B4
                        reach_me();
    }

Concrete states: x = 0, y = 0, z = 0; (0 > 3)? -> no!
Symbolic states: x = i1, y = i2, z = i3; (i1 > 3)?
Path condition: (i1 <= 3)
Negate this condition and solve to take the THEN branch at B1.

Concolic simulation (2nd run)
Concrete states: x = 10, y = 0, z = 0; (10 > 3), (0 > 11)? -> no!
Symbolic states: x = i1, y = i2, z = i3; (x > 3), (y <= 11)
Path condition: (i1 > 3) AND (i2 <= 11)
Negate the last term and solve to take the THEN branch at B2.

Concolic simulation (3rd run)
Concrete states: x = 10, y = 20, z = 0; (10 > 3), (20 > 11), (0 == 400)? -> no!
Symbolic states: x = i1, y = i2, z = i3; (x > 3), (y > 11), (z == y*y)
Path condition: (i1 > 3) AND (i2 > 11) AND (i3 != 400)
The non-linear term i2*i2 is replaced by its concrete value 400.
Negate the last term and solve to take the THEN branch at B3.

Concolic simulation (4th run)
Concrete states: x = 10, y = 20, z = 400; (10 > 3), (20 > 11), (400 == 400), (10 < 5)? -> no!
Symbolic states: x = i1, y = i2, z = i3; (x > 3), (y > 11), (z == 400), (x >= 5)
Path condition: (i1 > 3) AND (i2 > 11) AND (i3 == 400) AND (i1 >= 5)
Negate the last term and solve to take the THEN branch at B4.

Concolic simulation (5th run)
Concrete states: x = 4, y = 20, z = 400; (4 > 3), (20 > 11), (400 == 400), (4 < 5)
Symbolic states: x = i1, y = i2, z = i3; (x > 3), (y > 11), (z == 400), (x < 5)
Path condition: (i1 > 3) AND (i2 > 11) AND (i3 == 400) AND (i1 < 5)
reach_me() is reached successfully!

Concolic approach
- Good:
  - Can be applied to work around non-linear constraints
  - Can be used to enumerate the paths
  - Can be used to guide the path
- But CUTE does not consider which path should be tried next
  - It targets path coverage, so CUTE's strategy is exhaustive
  - It may not terminate if the # of paths is huge

Proposed method
- Flip a branch condition on a path only when the opposite branch is not covered yet
  - Gives priority in path enumeration
  - Terminates when the target coverage is achieved
  - Tries to avoid enumerating all the paths
    - Skips the uncovered paths that do not contribute to branch coverage
- Not guaranteed to cover all possible branches
  - Derived alternative paths may not be feasible
  - Worst case: all paths need to be enumerated
  - Also limited by the solver's capability (i.e.
the path condition may not be solvable)

Our implementation
- Implemented on FLEC (our C equivalence checker)
  - Used as the SpecC [3] frontend
  - Control/data/communication/… dependencies have already been extracted
- AST interpreter
  - Evaluates AST nodes (expressions / statements) one by one
  - Concrete simulator: evaluates with concrete values
  - Symbolic simulator: evaluates with symbolic expressions
  - We can start from any point in the program! (cf. CUTE: instrument & compile)
- Branch/path coverage profiler
- Input pattern generator
- SMT solver: CVC3 [12] (cf. CUTE: lp_solve)
  - For alternative paths: to generate input patterns
  - For assertion failure: to evaluate assertions

Experimental results (1/3): simple example

    int func(int x, int y) {
        int r = 0;
        if (x - y > 0)
            r = x - y;
        else
            r = y - x;
        return r;
    }
    void main() {
        int x, y;
        FL_INPUT(x);
        FL_INPUT(y);
        FL_ASSUME(x >= 0);
        FL_ASSUME(y >= 0);
        FL_ASSERT(func(x, y) > 0);
    }

- Achieved 2 / 2 (100%) branch coverage with 2 runs
- Detected the assertion failure with (x = 0, y = 0)

Experimental results (2/3): factorial computed with two implementations (recursive function calls and a for-loop)

    unsigned int fact_rec(unsigned int s) {
        if (s <= 1) {
            return 1;
        } else {
            unsigned int t;
            t = s * fact_rec(s - 1);
            return t;
        }
    }
    unsigned int fact_for(unsigned int s) {
        unsigned int i;
        unsigned int p;
        p = 1;
        for (i = 1; i <= s; i++) {
            p *= i;
        }
        return p;
    }
    void main() {
        int i, o1, o2;
        FL_INPUT(i);
        FL_ASSUME(i <= 10);
        o1 = fact_for(i);
        o2 = fact_rec(i);
        FL_ASSERT(o1 == o2);
    }

- Validated for one path (i = 8)
- Achieved 4/4
(100%) branch coverage with 1 run

Experimental results (3/3)

    void f(int x, int y, int z) {
        int p;
        if (x + y + z == 6)
            if (2*x + 7*y + 3*z == 25)
                if (-4*x - 2*y + 2*z == -2)
                    FL_ASSERT(0);
        for (p = 0; p < 100; p++) {
            if (p == z) {
            }
        }
    }
    void main() {
        int x, y, z;
        FL_INPUT(x);
        FL_INPUT(y);
        FL_INPUT(z);
        f(x, y, z);
    }

- # of branches: 10
- # of paths: 4 * 2^100
- Achieved 10 / 10 (100%) branch coverage with 5 runs
- Detected the assertion failure with (x = 1, y = 2, z = 3)
- CUTE got stuck due to too many paths

Elevator controller profile
- Elevator controller (abstracted model)
  - Cycle-based behavior
  - Simple, but designed by a real engineer
  - Contains an unintended bug
- 3 floors (1F, 2F, 3F), 1 cabin
- Inputs:
  - 3 buttons for floor stop requests
  - 2 buttons for door open / close
  - Up request buttons on 1F and 2F
  - Down request buttons on 2F and 3F
- Outputs:
  - Up / down request status
  - Floor stop request status
  - Door open / close
  - Cabin vertical speed (0: stopped, +1: up, -1: down)
  - Cabin position (on 1F, b/w 1F and 2F, on 2F, b/w 2F and 3F, on 3F)
  - Service direction (0: none, +1: up, -1: down)

Elevator controller profile (cont'd)
- State variables:
  - Up / down request status (2 + 2)
  - Floor stop request status (3)
  - Door status (1)
  - Cabin position (on 1F, b/w 1F and 2F, on 2F, b/w 2F and 3F, on 3F)
  - Cabin speed (0: stopped, +1: up, -1: down)
  - Service direction (0: none, +1: up, -1: down)
  - 2^8 * 5 * 3 * 3 = 11.5k states (including infeasible ones)
- Initially stopped on 1F, door closed, no request active
- Original code: 396 lines in SpecC
  - 145 million paths (including infeasible ones)
- Replaced if-then-else and switch-case statements with conditional expressions (cond ?
true : false)
  - To handle multiple paths at once
  - Simple control flow (straight line), but very complex data flow
  - Reduced to 155 lines

Elevator controller profile (cont'd): property examples
- The elevator must be on or between 1F and 3F:
    ASSERT((out_position >= 0) && (out_position <= 4));
- The door opens only when the elevator is stopped at 1F, 2F, or 3F:
    ASSERT(!out_door || ((out_speed == 0) && ((out_position == 0) || (out_position == 2) || (out_position == 4))));

Symbolic simulation result
- The symbolic expressions explode within 3-4 cycles of symbolic simulation, even
  - With constant propagation/substitution
  - With simplifications for ITE, AND, OR, and other operators
- These runs were done
  - Without concrete-value substitution (approximation)
  - Without common sub-expression sharing
- The # of cycles of symbolic simulation must be tightly bounded!
[Figure: expression size in nodes (log scale, 10^0 to 10^6) vs. cycle (1 to 5) from the beginning of symbolic simulation, after the reset sequence, for a typical signal and for all signals; 300k nodes and more!]

User-guided simulation
- Starts symbolic simulation from a state specified by the user
  - Explores the states of the user's interest
- Some of the states are (proved to be) reachable by concrete (random) simulation
- Jump into states (which may or may not be feasible)
  - Their feasibility will need to be checked later
- The number of cycles is bounded
[Figure: state space; concrete simulation runs from the initial states, and symbolic simulation starts from a jumped-into state whose incoming paths are unknown and which might be infeasible.]

User-guided result (1)
- Try to generate an input pattern that produces a situation where the cabin is
  - located on 2F, and
  - moving down (speed = -1),
  - i.e., violate ASSERT(!((out_speed == -1) && (out_position == 2))) (not a bug)
- This state is out of bound from the initial state (stopped on 1F)
  - It takes more than 3 cycles for the elevator to accept a request on 1F, start moving, go up at least to 2F, and go down…

User-guided result (1) (cont'd)
- So let's jump into one of the feasible states
- Found an input pattern that violates the assertion @ cycle 5 (3rd cycle of symbolic sim.)
  - Starting state: state_position = 4, state_door = false, state_speed = 0 …
    - Known a priori to be reachable by random simulation
  - Input pattern found:
    - Up request on 1F @ cycle 1 = true
    - Up request on 2F @ cycle 1 = false
    - Down request on 2F @ cycle 1 = false
    - Stop on 1F request @ cycle 1 = false
    - Stop on 2F request @ cycle 1 = false

User-guided result (2)
- Try to violate the assertion "the elevator must be on or between 1F and 3F":
    ASSERT((out_position >= 0) && (out_position <= 4));
- Let's jump into the state state_position = 4 (on 3F), state_speed = +1 (up)
  - The next state has out_position = 5 (higher than 3F!)
  - This violates the assertion!
- However, the state (state_position = 4, state_speed = +1) is actually infeasible
  - A wrong assumption may lead to a wrong conclusion
  - The feasibility of the originating state should be verified in some way

Conclusion & future work
- Conclusion
  - Implemented a concrete/symbolic hybrid simulator based on an AST interpreter
  - Proposed a method of input pattern generation for branch coverage
  - Experimental results demonstrate input pattern generation
    - For assertion failure detection
    - For better branch coverage
- Future work
  - Capability to cover a specified target branch
  - Handling of concurrent executions
  - Tuning of the hybrid simulation heuristics
  - Efficient management of symbolic expressions

References
[3] D. D. Gajski, J. Zhu, R. Dömer, A. Gerstlauer, and S. Zhao. SpecC: Specification Language and Methodology. Kluwer Academic Publishers, 2000.
[5] E. Larson and T. Austin. High coverage detection of input-related security faults. In SSYM'03: Proc. of the 12th Conf. on USENIX Security Symposium, 2003.
[11] K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In Proc. of ESEC/SIGSOFT FSE-13, 2005.
[12] A. Stump, C. Barrett, and D. Dill. CVC: a cooperating validity checker.
In Proc. of the 14th Int'l Conf. on Computer-Aided Verification, 2002.

Difficulty compared with RTL or lower
- In the traditional methodology for RTL or gate level:
  - Word signals are converted into bit vectors
  - They are then solved with Boolean algebra
  - Efficient algorithms are available: SAT, BDDs…
- In system-level descriptions:
  - Too many word signals, and words that are too wide (32 bit / 64 bit)
    - The space to explore is too large
  - Complicated control flow
    - Data flow changes dynamically depending on the path
    - Control conditions are complex
    - Too many paths

Difficulty compared with RTL or lower (cont'd)
- In system-level descriptions:
  - To model software: recursive calls, pointers, pointer arithmetic, typecasting, dynamic allocation…
  - To model hardware: concurrency, synchronization, throughput, latency…
- SMT solvers can be employed as word-level solvers, but with limited capability
  - Usually only up to linear arithmetic
  - Approximations / workarounds are needed; otherwise it would not work!