Realization of solver based techniques for Dynamic Software Verification
Andreas S Scherbakov
Intel Corporation
andreas.s.scherbakov@intel.com

What's program testing here?
• The problem: to test a program means to
  – find at least one set of input values such that
    – a crash/an illegal operation occurs, or
    – some user-defined property is violated (unexpected results/behavior), or
  – prove correctness of the program
    – or at least demonstrate that it is correct with some high probability

SW testing: basic approaches
• Random testing
  -> You execute your program repeatedly with random input values...
  + covers a lot of unpredictable cases
  ─ too many redundant iterations -> out of resources
• "Traditional" testing: custom test suites
  -> You know your code and can therefore create the examples necessary to test it...
  + targets known critical points
  ─ misses most unusual use cases
  ─ large effort, requires intimate knowledge of the code
• Directed testing
  -> Try to get a significantly different run on each attempt...
  + explores execution alternatives rapidly
  + effective for mixed white-box/black-box code
  ─ usually needs some collateral code
  ─ takes large resources if poorly optimized

SW testing: basic approaches - 2
• Static Analysis
  – Commercial tools: Coverity, Klocwork, …
  ─ finds dumb bugs, not application logic errors
  ─ finds some "false positive" bugs, misses many real bugs
  + good performance
  + little expertise required
• Model Checking
  – Academic and competitor tools: BLAST, CBMC, SLAM/SDV
  + finds application logic errors
  ─ finds some "false positive" bugs, but doesn't miss any real ones
  ─ significant user expertise required
• Formal Verification
  – Academic tools: HOL, Isabelle, …
  + ultimate guarantee: proves conformance with the specification
  ─ the scaling constraint is human effort, not machine time
  ─ ultimate user expertise required: multiple FV PhDs

Directed Testing: as few runs as possible
• executes the program with two test cases: i=0 and i=5
• 100% branch coverage

DART: Directed Automated Random Testing
• The main idea was proposed in Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed Automated Random Testing. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2005), pp. 213-223.
• Depends on a Satisfiability Modulo Theories (SMT) solver
  -> SMT solvers are tools that decide whether a set of constraints (equations) is satisfiable and, if it is, produce a solution. A theory here means the solving methods for a particular set of allowed data types/operations.

What does it check?
• It does not verify the correctness of the program UNLESS you express the meaning of CORRECTNESS in the form of ASSERTION CHECKERS
  – it cannot infer what the 'correct' behavior of the program is
• What it does check
  – it allows users to add assumptions that limit the search space, and assertions ('ensure') that define the expected behavior
  – assertions are treated as ('bad') branches, so the test process will try to reach them, or formally verify that reaching them is impossible
  – 'built-in' checks for crashes, division by 0, memory corruption
• It requires some familiarity with the software under test to be effective.

Looking for a Snark in a Forest
Looking for a Bug in a Program
• A bug is like a snark
• A program is like a forest with many paths
• Source code is like a map of the forest

    Just the place for a Snark! I have said it twice:
    That alone should encourage the crew.
    Just the place for a Snark! I have said it thrice:
    What I tell you three times is true.
        (The Hunting of the Snark, Lewis Carroll)

Proof Rather than Snark Hunting
  Forest searching can be a very effective way to show the presence of snarks, but is hopelessly inadequate for showing their absence.

The Humble Snark Hunter
• How can we prove there are no snarks in the forest?
  – Get a map of the forest
  – Find the area between trees
  – Assume a safe minimum diameter of a snark
  – If the minimum snark diameter > the minimum tree separation, there are no snarks in the forest
• The gold standard, but:
  – You need a formal model of the forest
  – A mathematician
  – Substantial effort
  – It is only as good as your model of forests and snarks (are snarks really spherical?)

Snark Hunting Via Random Testing
• REPEAT
  – Walk through the forest with a coin.
  – On encountering a fork, toss the coin:
    • heads, go left
    • tails, go right
• UNTIL snark found or exhausted
• Easy to do: you don't even need a map!
• But:
  – Very low probability of finding a snark

Traditional Snark Hunting
• Study the forest map and use your experience to choose the places where snarks are likely to hide.
• For each likely hiding place, write a sequence of "turn left", "turn right" instructions that will take you there.
• REPEAT
  – Choose an unused instruction sequence
  – Walk through the forest following the instructions
• UNTIL snark found or all instructions used
• But…
  – Snarks are notoriously good at hiding where you don't expect them

Snark Hunting Via Static Coverage Analysis
• Get a map of the forest
• Have a computer calculate instruction sequences that go through all locations in the forest.
• REPEAT
  – Choose an unused instruction sequence
  – Walk through the forest following the instructions
• UNTIL snark found or enough of the forest covered
• But…
  – It takes a lot of computing power to calculate the paths
  – There will be a lot of paths

Effective Snark Hunting Without A Map
• Start with a blank map

    He had bought a large map representing the sea,
    Without the least vestige of land:
    And the crew were much pleased when they found it to be
    A map they could all understand.

• REPEAT
  – REPEAT
    • Walk through the forest with
      – a map (initially blank)
      – a sequence of instructions (initially blank)
    • Add each fork that you haven't seen before to your map.
    • When encountering a fork:
      – If there is an unused instruction, follow it
      – Otherwise, toss a coin as in random testing
  – UNTIL you exit the forest
  – If there is a fork on your map with a branch not taken
    • Write a sequence of instructions that leads down such a branch
• UNTIL snark found, no untaken branches remain on the map, or you're tired

Comparison of alternatives
[Chart: Accuracy vs. Expertise/Effort for Formal Verification, Model checking, DART, Traditional testing, and Static analysis]

How it Works
• f(x,y) run 1
  – Arbitrary inputs:
    • x = 0
    • y = 9
  – Trace: x > y is false; x1 = x - 1 = -1; x1 > y is false

    void f (int x, int y) {
        if (x > y) {
            x = x + y;
            y = x - y - 3;
            x = x - y;
        }
        x = x - 1;
        if (x > y) {
            abort ();
        }
        return;
    }

How it Works
• f(x,y) run 2
  – choose x, y so that
    – (x > y) = false
    – x1 = x - 1
    – (x1 > y) = true
  – no such x, y!
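The "no such x, y!" claim can be checked by hand, treating int as mathematical integers (that is, ignoring overflow, which a bit-vector theory would have to model). The first line below is the path condition of the request above; the second and third work out, for the case where the first branch is taken, the condition that the following runs need in order to reach abort():

```latex
\begin{aligned}
&\text{Run 2, first request:}\quad (x \le y) \wedge (x_1 = x - 1) \wedge (x_1 > y)
  \;\Rightarrow\; x > y + 1 > y, \text{ contradicting } x \le y. \\[4pt]
&\text{First branch taken:}\quad x_1 = x + y,\quad y_1 = x_1 - y - 3 = x - 3,\quad
  x_2 = x_1 - y_1 = y + 3,\quad x_3 = x_2 - 1 = y + 2, \\
&(x > y) \wedge (x_3 > y_1) \;\Longleftrightarrow\; (x > y) \wedge (y + 2 > x - 3)
  \;\Longleftrightarrow\; y < x < y + 5.
\end{aligned}
```

For example, x = 9, y = 0 violates x < y + 5 (so that run misses abort()), while x = 1, y = 0 satisfies it.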
• f(x,y) run 2 (retry)
  – choose x, y so that
    • (x > y) = true

How it Works
• f(x,y) run 2
  – Inputs:
    • x = 9
    • y = 0
  – Trace:
      x > y              is true
      x1 = x + y         = 9
      y1 = x1 - y - 3    = 6
      x2 = x1 - y1       = 3
      x3 = x2 - 1        = 2
      x3 > y1            is false

• f(x,y) run 3
  – choose x, y so that
    – (x > y) = true
    – x1 = x + y
    – y1 = x1 - y - 3
    – x2 = x1 - y1
    – x3 = x2 - 1
    – (x3 > y1) = true
  – Inputs: x = 1, y = 0

How it Works
• f(x,y) run 3
  – Inputs: x = 1, y = 0
  – Trace:
      x > y              is true
      x1 = x + y         = 1
      y1 = x1 - y - 3    = -2
      x2 = x1 - y1       = 3
      x3 = x2 - 1        = 2
      x3 > y1            is true -> abort

A Simple Test Harness
The program:

    #include <stdlib.h>   /* abort */

    void snarky (int x, int y) {
        if (x > y) {
            x = x + y;
            y = x - y - 3;
            x = x - y;
        }
        x = x - 1;
        if (x > y) {
            abort ();
        }
    }

    int main () {
        const int x = choose_int ("x");   /* instrumentation library routine */
        const int y = choose_int ("y");
        snarky (x, y);
        return 0;
    }

Quick Example

    void string_copy (const char *s, char *t) {
        int i;
        /* bug: the terminating '\0' is never copied */
        for (i = 0; s[i] != '\0'; ++i) {
            t[i] = s[i];
        }
    }

    int string_equal (const char *s, const char *t) {
        int i = 0;
        while (s[i] != '\0' && s[i] == t[i]) {
            ++i;
        }
        return s[i] == t[i];
    }

    int main () {
        const size_t source_length = choose_size_atmost (…);
        const char *source = choose_valid_string (…);
        const size_t target_size = choose_size_atleast (…);
        char *target = choose_valid_char_array (…);
        string_copy (source, target);
        ensure (string_equal (source, target));
        return 0;
    }

Quick example: Bug found
Bug found with the parameters:
    target_size = 1
    target[0] = 1
    source_length = 0
    (Killed by signal)
With an empty source, string_copy copies nothing, so target[0] keeps the value 1 and the ensure check fails.

Overall Design
• Harness Library
  – Supply specified values for inputs, or arbitrary values
  – Check required/ensured constraints
• Instrumentation
  – Modify a C program to produce an execution trace
    along with the required execution
• Observed Execution
  – Observe the path taken by a run and calculate a predicate describing a new path
• Constraint Solver
  – Solver used to discover, for a specified path condition,
    • whether the path is feasible
    • inputs that would cause it to be executed

Testing Time
[Chart: Testing Time]
• Don't expect to test all paths for realistically sized data
• You can, however, run many useful tests quickly

You Provide The Controllability
• For each "unit" you write
  – A harness to call the unit's functions
  – Stubs for functions the unit calls
• The harness library provides functions to generate values
  – For harnesses to call with
  – For stubs to return with
  – Declarative specification of constraints on the values
• This provides
  – A model of the unit's environment
  – Controllability over the unit
[Diagram: Harness code, Code under test, Stub code]

Front End: Instrumentation
Why do we track symbolic data?
We want to be able to choose another branch on the next run:

    if (x == y + 3) {
        /* branch A */
    } else {
        /* branch B */
    }

To choose a given branch, we need to solve: (x == y + 3) == false/true.
To pass it to the solver, we need the expression x == y + 3 in symbolic form at the if.
To know it at this point, we must track the assignments of its constituent components.

Tracing symbolic data
• Solution: add special tracing statements to the source statements

CIL
• "CIL (C Intermediate Language) is a high-level representation along with a set of tools that permit easy analysis and source-to-source transformation of C programs." http://www.cs.berkeley.edu/~necula/cil/
• CIL enables a user application to explore and refactor various types of C source constructs (functions, blocks, statements, instructions, expressions, variables, etc.) in a convenient way while keeping the remaining code structure.
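The run / observe / solve / re-run cycle from the "How it Works" slides can be sketched in plain C. This is a minimal sketch under loud assumptions: f() is instrumented by hand so that it records branch outcomes directly (instead of symbolic expressions), abort() is replaced by a return value, and a brute-force search over a small input box stands in for the SMT solver. The names f_traced, solve_prefix and directed_search are hypothetical, not the tool's API.

```c
#include <stdbool.h>

#define MAX_BRANCHES 8
#define SEARCH_RANGE 20 /* the stand-in "solver" searches x, y in [-20, 20] */

/* f() from the slides, instrumented by hand: every branch outcome is
   appended to path[], and the call to abort() is replaced by returning true. */
static bool f_traced(int x, int y, bool path[], int *len) {
    *len = 0;
    bool taken = x > y;
    path[(*len)++] = taken;
    if (taken) {
        x = x + y;
        y = x - y - 3;
        x = x - y;
    }
    x = x - 1;
    taken = x > y;
    path[(*len)++] = taken;
    return taken; /* true: the original f() would call abort() here */
}

/* Stand-in for the SMT solver: brute-force search for inputs whose
   execution path starts with prefix[0..n-1]. */
static bool solve_prefix(const bool prefix[], int n, int *x_out, int *y_out) {
    bool path[MAX_BRANCHES];
    int len;
    for (int x = -SEARCH_RANGE; x <= SEARCH_RANGE; x++) {
        for (int y = -SEARCH_RANGE; y <= SEARCH_RANGE; y++) {
            f_traced(x, y, path, &len);
            bool match = len >= n;
            for (int i = 0; match && i < n; i++) {
                match = (path[i] == prefix[i]);
            }
            if (match) {
                *x_out = x;
                *y_out = y;
                return true;
            }
        }
    }
    return false; /* infeasible, at least inside the search box */
}

/* Directed search: run, flip the deepest untried branch, solve, re-run.
   Returns the number of runs; *found tells whether abort() was reached. */
static int directed_search(int x, int y, bool *found, int *abort_x, int *abort_y) {
    bool path[MAX_BRANCHES];
    bool done[MAX_BRANCHES]; /* done[i]: the other side of branch i was handled */
    int len, forced = 0, runs = 0;
    *found = false;
    for (;;) {
        runs++;
        if (f_traced(x, y, path, &len)) {
            *found = true;
            *abort_x = x;
            *abort_y = y;
            return runs;
        }
        for (int i = forced; i < len; i++) {
            done[i] = false; /* branches below the forced prefix are new */
        }
        bool next_run = false;
        for (int i = len - 1; i >= 0 && !next_run; i--) {
            if (done[i]) {
                continue;
            }
            done[i] = true;
            bool target[MAX_BRANCHES];
            for (int k = 0; k < i; k++) {
                target[k] = path[k]; /* keep the prefix ... */
            }
            target[i] = !path[i];    /* ... and negate branch i */
            if (solve_prefix(target, i + 1, &x, &y)) {
                forced = i + 1;
                next_run = true;
            }
        }
        if (!next_run) {
            return runs; /* every alternative explored or infeasible */
        }
    }
}
```

Starting from x = 0, y = 9, the second branch cannot be flipped on its own (the infeasible request of run 2), so the search flips the first one. Because the stand-in solver scans in a fixed order it picks x = -19, y = -20 rather than the slides' x = 9, y = 0, and that input already lies in y < x < y + 5, so abort() is reached on the second run. A real SMT solver returns whatever model it finds, and, unlike the brute-force search, proves infeasibility for all inputs rather than only inside the box.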
Tool Framework
[Diagram: User Input (user-written harness, software under test); Frontend (CIL, instrumentation); Backend (instrumented program run, scoreboard tracking coverage, input generator, SMT solver)]
• Problem: the CIL-based frontend does not support C++
• Solution: replace the CIL-based frontend with LLVM to support C++

How CIL simplifies handling the code
• Automatically rewrites C expressions with side effects:
      a = b += --c   --->   c = c - 1; b = b + c; a = b;
• Uniformly represents memory references: (base + offset)
• Converts do, for and while loops to
      while (1) {
          if (cond1) break;      /* if needed */
          if (cond2) continue;   /* if needed */
          body;
      }
• Traces control flow

What is LLVM?
• LLVM: Low Level Virtual Machine
• Modular and reusable collection of libraries
• Developed at UIUC and at Apple®
• The LLVM Intermediate Representation (IR) is well designed and documented
• Has a production-quality C++ frontend that is compatible with GCC
• Open source with an industry-friendly license
• More info at www.llvm.org

LLVM frontend
[Diagram: LLVM Based Frontend: user-written harness and software under test, Clang C/C++ parser, LLVM IR, compiler pass, rest of compile, instrumented program, backend]
• LLVM provides modular libraries and tool infrastructure for developing compiler passes

Using C++ overloads
• Idea: redefine operators in such a way that they output trace data:

      my_int operator + (my_int x, my_int y) {
          symbolic s = trace_addition (x.symbol (), y.symbol ());
          int c = x.val () + y.val ();
          return my_int (s, c);
      }

• Instrumentation is still needed (control tracing, types, ...)

Reducing branches
• This 2-branch control:
      if (x && y) action1; else action2;
  really produces 3 branches in C/C++:
      if (x) {
          if (y) action1; else action2;
      } else action2;
• x && y is not really a logical AND
  – we cannot simply supply (x && y) to an SMT solver...

Reducing branches: solution
• But...
  sometimes it IS a logical AND
  – namely, if y can be safely evaluated when x == false, or if whether y can be safely evaluated does not depend on x, which means
    – y has no side effects, and
    – y's crash conditions don't depend on x
• If we can prove this statically, use the form:
      if (logical_and (x, y)) action1; else action2;
• Otherwise, use the 3-branch form

Solver Theories
• Different solver theories
  – Linear Integer Arithmetic: (a*x + b*y + …) {<, =, >} C
  – Linear Real Arithmetic
  – BitVector Arithmetic
• Most conditions in C source code fit one of them, but some mixed/complex ones don't
  – alas, sometimes we have to fall back to random alternation
  – luckily, theories are being developed actively
• We need to recognize theory patterns for better performance
  -> sometimes the supported scope is wider than the declared theory scope

Path exploration strategy
• Usually we explore all paths in Depth First Search (DFS) mode:
  – alternate deeper branches first
  – when complete, return one level up and try again
• But the execution path count may turn out to be too high to explore all paths

Path exploration strategy - 2
• If we don't have the resources to explore all paths, DFS is not the best strategy: some nodes are never visited while others are explored very thoroughly
  – low coverage
  – most dumb bugs may be missed
[Diagram: explored vs. unexplored nodes of the execution tree]
• Good strategy principle: first visit new nodes, then explore new paths
  – details are subject to research

An optimization: Get function properties
• Idea: take advantage of the code hierarchy by using I/O properties of a function/procedure call
  -> try to proceed with the assumptions only, rather than descending into the subroutine body
Example:
      y = string_copy (x)
        require  valid_pointer (x)
        property valid_pointer (y)       /* assuming we still have memory */
        property length (x) == length (y)
        property i < length (x)  =>  y[i] == x[i]
• For black-box (external library) code, the assumptions should be supplied as collaterals
• For available source code, they can also be extracted automatically
  -> but what to extract is an open question

      if (length (s) > 2) {
          p = string_copy (s);
          if (length (s) > 1) {
          } else {
              do_something ();
          }
      }

      if (length (s) > 2) {
          p = ???;
          assume length (p) == length (s);
          if (length (p) > 1) {
          } else {
              /* length(p) <= 1 && length(p) == length(s) && length(s) > 2
                 ---- infeasible */
          }
      }

An optimization: Separate independent alternations

      if (z == 2) {
          x = b;
          do_something1 ();
      }
      if (y == x) {
          do_something2 ();
      }

Dependent choices. We should try 2*2 combinations:
• z = 2, y = b
• z = 2, y ≠ b
• z ≠ 2, y = x
• z ≠ 2, y ≠ x
(all variables are sampled at the beginning of the code piece presented)

Separate independent alternations - 2

      if (z == 2) {
          q = b;
      }
      if (y == x) {
          p = c;
      }

Independent choices. We can try only 2 combinations, for example:
• z = 2, y = x
• z ≠ 2, y ≠ x
(provided that the effects of do_something1() and do_something2() don't interdepend)

Separate independent alternations - 3

      if (z == 2) {
          q = b;
      }
      if (y == x) {
          p = c;
      }
      if (q == p) …

Dependent choices again!

An optimization: re-using unsatisfied conditions

      if (a && b && c) { … }             /* suppose we've proved that we cannot get here */
      if (a && b && c && d && e) { … }   /* then we can be sure that we cannot get here either */

No need to call a solver again.

Handling Black Boxed Code

Contents
• Motivation
• Losing control with black boxes
• Return Value Representation
• Randomization
• Characterizing
• Learning
• Stubbing/Wrapping
• Example: The encryption problem
• Selective/Dynamic Black-Boxing
• Embedded White-Boxes
• Afterword

Motivation
• Testing a portion of code within a large system, e.g.:
  – code over infrastructure/library functions
  – firmware over a hardware API/virtual platform
  – binary infrastructure
• Hiding code the solver can't cope with
  – non-linear arithmetic (a*b = C)
  – assembly
• Handling deep paths/recursion

Losing Control with Black Boxes
• Black boxes impair our controllability when program paths are influenced by black-box outputs.

      int a = choose_int ("a");
      int b = bb (a);
      if (b > 10) { … } else { … }

• We have no information to pick "a" such that it drives (b > 10) in both directions.
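Returning to the "re-using unsatisfied conditions" optimization: it amounts to a cache of conjunctions already proven infeasible, consulted before the solver is called. A minimal sketch, assuming each branch condition on a path is numbered and a path condition is represented as a bit set of its conjuncts; this is only valid while the numbered conditions denote the same symbolic expressions (as with a, b, c above, which are not modified between the two ifs). The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CORES 64

/* Branch conditions on a path are numbered; bit k set means "condition k
   is part of this conjunction", e.g. a = bit 0, b = bit 1, c = bit 2. */
typedef uint64_t cond_set;

typedef struct {
    cond_set unsat[MAX_CORES]; /* conjunctions already proven infeasible */
    int count;
} unsat_cache;

/* Record that the conjunction 'conds' was proven unsatisfiable. */
static void remember_unsat(unsat_cache *cache, cond_set conds) {
    if (cache->count < MAX_CORES) {
        cache->unsat[cache->count++] = conds;
    }
}

/* Adding conjuncts can never make an unsatisfiable conjunction satisfiable,
   so any superset of a remembered set is infeasible without a solver call. */
static bool known_unsat(const unsat_cache *cache, cond_set conds) {
    for (int i = 0; i < cache->count; i++) {
        if ((conds & cache->unsat[i]) == cache->unsat[i]) {
            return true;
        }
    }
    return false;
}
```

After remember_unsat(&cache, a | b | c), the query for a | b | c | d | e answers "infeasible" from the cache, while a | b alone still has to go to the solver.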
Return Value Representation
• The flow treats the return value of an uninstrumented function as concrete only (not symbolic).
• But it can be explicitly assigned a fresh symbolic variable with fresh_*:

      int a = blackboxed_func (…);   // a is concrete
      fresh_integer (a, "a");        // a is symbolic

• The reverse can be done as well, with concrete_* (see later).

Example: The Encryption Problem

      ulong x = choose_uint ("x%d", count);
      ulong y = choose_uint ("y%d", count);
      if (y == encrypt (x)) <…>;
      else <…>;

• There is practically no way to guess x and y beforehand such that y == encrypt(x). We cope with it by:
  – running once with y = x = 0; the condition fails
  – "seeing": (y == <concrete encrypt(0)>)
  – choosing x = 0, y = encrypt(0) for the 2nd run

Randomization
• We can increase our chances of gaining coverage by adding randomization:

      int a = choose_random_int ("a");
      int b = bb (a);
      if (b > 10) { … } else { … }

Characterizing: assert
• Our first step to regain control is having the user tell us something about the function.
• A new construct is added: assert(<cond>)
Reminder:
  – require(<cond>): assume <cond> holds. If it doesn't, ignore the current path and move on. This is actually a branch equivalent to
        if (!cond) exit(0);
  – ensure(<cond>): make sure <cond> holds. If it doesn't, stop execution and report it. If it does, try to make it fail. This is actually a branch equivalent to
        if (!cond) abort();

assert - 2
• E.g.: a black-boxed function known to satisfy bb(a) > a:

      int a = choose_int ("a");
      int b = bb (a);
      fresh_integer (b, "b");
      assert (b > a);
      if (b > 10) { … } else { … }

• assert(<cond>): assume <cond> holds. If it doesn't, stop execution and report. But don't try to make it fail.
• Must be used with fresh_*
• Full characterization: solves the problem, but is impractical.
• Partial characterization:
  – may help the solver, depending on its internal heuristics
  – the more assertions, the better

Learning
• Learn from concrete inputs and outputs of a black-boxed function, and use them in future runs.
• But what to learn? A function is not always deterministic: it has implicit "inputs" and "outputs"/internal state.
• Instead of learning functions, we learn a "subject" in many lessons. Each lesson can have multiple inputs and outputs.

Learning - 2

      int a = choose_int ("a");
      lesson l = begin_lesson ();
      learn_integer (l, a, LEARN_INPUT);
      int b = bb (a);
      fresh_integer (b, "b");
      learn_integer (l, b, LEARN_OUTPUT);
      end_lesson (l);
      if (b > 10) { … } else { … }

Learning - 3
• When the tool misses a path, it can retry it several times, learning new concrete values on each try.
• Previous learning adds constraints to the solver, so previous inputs will not be reused when trying to get different outputs.

Learning - 4
• The "subject" is supposed to be common to all invocations of a black-boxed function, but differs from function to function.
• A convenience: begin_lesson() implicitly creates a unique subject from the code location.
• Problem: the function is invoked from different places in the code. We want to write the learning code once.
• Solution: we shall see later how to easily write wrappers that divert all calls to one place.

Stubbing / Wrapping
• The user writes a wrapper that adds characterization and learning sugar to all invocations of a function:

      int bb_wrap (int a) {
          lesson l = begin_lesson ();
          learn_integer (l, a, LEARN_INPUT);
          int b = bb (a);
          fresh_integer (b, "b");
          learn_integer (l, b, LEARN_OUTPUT);
          end_lesson (l);
          assert (b > a);
          return b;   /* return the black box's output, not its input */
      }

Stubbing / Wrapping - 2
• Now we want to call the wrapper bb_wrap instead of bb.
• But we don't want to manually change all invocations in our code.
• instrument does it for us:
      instrument -stub bb:bb_wrap -stub … -stub …
• Conveniently, it won't replace calls within the stub code itself.
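The learning and wrapping ideas can be illustrated together in a few lines of C. This is a minimal sketch, not the tool's mechanism: the real flow turns lessons into solver constraints, whereas here a linear lookup over recorded (input, output) pairs stands in for that. The black box bb (a hypothetical modular formula chosen only for illustration), the subject/lesson structs, bb_wrap(subject *, int) and find_input_for are all assumptions of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LESSONS 128

/* Hypothetical black box: only its concrete results are observable. */
static int bb(int a) { return (a * 7) % 13; }

/* One lesson: a concrete input and the output it produced. */
typedef struct { int input; int output; } lesson;

/* A "subject": everything learned about one black-boxed function. */
typedef struct {
    lesson lessons[MAX_LESSONS];
    size_t count;
} subject;

static bool already_tried(const subject *s, int input) {
    for (size_t i = 0; i < s->count; i++) {
        if (s->lessons[i].input == input) return true;
    }
    return false;
}

/* Wrapper in the spirit of bb_wrap: every call goes through one place
   and is recorded as a lesson. */
static int bb_wrap(subject *s, int a) {
    int b = bb(a);
    if (s->count < MAX_LESSONS && !already_tried(s, a)) {
        s->lessons[s->count].input = a;
        s->lessons[s->count].output = b;
        s->count++;
    }
    return b;
}

/* Find an input whose output satisfies want(): first recall earlier
   lessons, then try fresh inputs only (never re-using a tried one). */
static bool find_input_for(subject *s, bool (*want)(int), int max_tries,
                           int *input_out) {
    for (size_t i = 0; i < s->count; i++) {
        if (want(s->lessons[i].output)) {
            *input_out = s->lessons[i].input;
            return true;
        }
    }
    int candidate = 0;
    for (int tries = 0; tries < max_tries; candidate++) {
        if (already_tried(s, candidate)) continue; /* learned: skip it */
        tries++;
        if (want(bb_wrap(s, candidate))) {
            *input_out = candidate;
            return true;
        }
    }
    return false;
}
```

With one earlier call bb_wrap(s, 0) (output 0), asking for an output above 10 skips input 0, tries 1 through 9 and returns 9 (output 11); asking again, or asking for an output of at most 10, is answered from the lessons without calling the black box.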
Selective Blackboxing
• We can selectively blackbox instrumented code:

      begin_blackbox (true);
      int x = 10;
      end_box ();
      concrete_integer (x);

• We must use fresh_* or concrete_* on values that were defined/modified inside the black box and are visible outside of it.
• Otherwise the tool might think they are uninitialized, or miscalculate paths.

Dynamic Blackboxing - 1
• Sometimes we just have too many paths:

      char* s = choose_valid_string ("str", 100);
      int count_a = 0;
      for (int i = 0; i < 100; i++) {
          if (s[i] == 'a') {
              count_a++;
          }
      }

• If we can't change the string length, the tool will see 2^100 paths…

Dynamic Black-boxing - 2
• We can dynamically hide code:

      char* s = choose_valid_string ("str", 100);
      int count_a = 0;
      for (int i = 0; i < 100; i++) {
          begin_blackbox (i > 2);
          if (s[i] == 'a') {
              count_a++;
          }
          end_box ();
      }

• Now XXX sees only 2^3 paths (the branches at i = 0, 1, 2 remain visible).

Embedded White-boxes
• What if we want to look into some code that is called from black-boxed code? E.g. a callback:

      void callback (int x) { … }   // we want to see this code
      int main () {
          int x = choose_uint ("x");
          bb_foo (x, callback);     // bb_foo is blackboxed
          return 0;
      }

Embedded White-boxes - 2
• On the "inside" we whitebox it explicitly, and freshen (or concretize) the inputs:

      void callback (int x) {       // we want to see this code
          static int count = 0;
          begin_whitebox (true);
          fresh_integer (x, "new_x%d", count);
          end_box ();
          ++count;
      }

Afterword
• We have established a "Swiss army knife" of features to support future black-box challenges.
• This helps overcome some simple synthetic examples.
• Since loss of controllability is generally a hard problem, we expect to refine this set of features as we hit real-life test cases.
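The path counts quoted for the dynamic black-boxing example can be checked by brute force. A minimal sketch, assuming that a hidden branch contributes nothing to the path the tool sees, and shrinking the string to n characters over {'a', 'b'} so that every input can be enumerated; count_visible_paths is a hypothetical helper, not part of the tool.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LEN 16

/* Enumerate every string of length n over {'a', 'b'} and count the distinct
   paths made of *visible* branch outcomes of the count_a loop. Iteration i
   is hidden when i > blackbox_after, mirroring begin_blackbox(i > 2) on the
   slide when blackbox_after == 2. */
static unsigned count_visible_paths(int n, int blackbox_after) {
    static bool seen[1u << MAX_LEN];
    if (n < 0 || n > MAX_LEN) {
        return 0;
    }
    memset(seen, 0, sizeof seen);
    unsigned distinct = 0;
    char s[MAX_LEN];
    for (unsigned bits = 0; bits < (1u << n); bits++) {
        for (int i = 0; i < n; i++) {
            s[i] = ((bits >> i) & 1u) ? 'a' : 'b';
        }
        unsigned signature = 0; /* outcomes of the visible branches only */
        for (int i = 0; i < n; i++) {
            bool hidden = i > blackbox_after;
            if (!hidden && s[i] == 'a') {
                signature |= 1u << i;
            }
        }
        if (!seen[signature]) {
            seen[signature] = true;
            distinct++;
        }
    }
    return distinct;
}
```

With n = 8 and the slide's condition (blackbox_after = 2), it reports 8 = 2^3 visible paths, and the result stays 8 for longer strings; with nothing hidden (blackbox_after = 7) it reports 2^8 = 256.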