Basic Definitions: Testing

What is software testing?
• Running a program
• In order to find faults
  • a.k.a. defects
  • a.k.a. errors
  • a.k.a. flaws
  • a.k.a. faults
  • a.k.a. BUGS

Hrm... that’s a lot of “a.k.a.”s
• Let’s refine this terminology a bit

Faults, Errors, and Failures

Fault: a static flaw in a program
• What we usually think of as “a bug”
Error: a bad program state that results from a fault
• Not every fault always produces an error
Failure: an observable incorrect behavior of a program as a result of an error
• Not every error ever becomes visible

To Expose a Fault with a Test

Reachability: the test must actually reach and execute the location of the fault
Infection: the fault must actually corrupt the program state (produce an error)
Propagation: the error must persist and cause an incorrect output – a failure

An Example

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Find the fault

An Example

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Here’s a test case: a = {}, n = 0, x = 2
• Does not even reach the fault

An Example

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Here’s another: a = {3, 9, 4}, n = 3, x = 2
• Reaches the fault
• Infects state with error
• But no failure

An Example

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

And finally: a = {2, 9, 4}, n = 3, x = 2
• Reaches the fault
• Infects state with error
• And fails – returns -1 instead of 0

Controllability and Observability

Goals for a test case:
• Reach a fault
• Produce an error
• Make the error visible as a failure

In order to make this easy the program must be controllable and observable
• Controllability:
  • How easy it is to drive the program where we want to go
• Observability:
  • How easy it is to tell what the program is doing

Design for Testability

If a program is not designed to be controllable and observable, it generally won’t be
We have to start preparing for testing before we write any code
• Testing as an after-the-fact, ad hoc exercise is often limited by earlier design choices

Test-Driven Development

One way to design for testability is to write the test cases before the code
• Idea arising from Extreme Programming and agile development
  • Write automated test cases first
  • Then write the code to satisfy tests
• Helps focus attention on making software well-specified
• Forces observability and controllability: you have to be able to handle the test cases you’ve already written (before deciding they were impractical)
• Reduces temptation to tailor tests to idiosyncratic behaviors of implementation

Controllability: Simulation and Stubbing

A key to controllable code is effective simulation and stubbing
• Simulation of low-level hardware devices through a clean driver interface
  • Real hardware may be slow
  • May be impossible/expensive to induce some hardware failure modes on real hardware
  • Real hardware may be a limited resource
• Stubbing for other routines and code
  • Other code/modules may not be complete
  • May be slow and irrelevant to test
  • May need to simulate failure of other modules

Simulation and Stubbing: JPL Example

When testing JPL flash storage modules we rely on software simulation of flash devices
• Real flash devices are slow
  • Can’t do aggressive random testing
• Real flash devices are expensive
  • JPL only has a few boards – constant competition to test on these
  • Running hundreds of thousands of tests will wear the flash hardware out
• Enables us to introduce rare hardware failures
  • System resets, spontaneous bad blocks and write failures, etc.

Controllability: Downwards Scalability

Another important aspect of controllability is to make code “downwards scalable”
• Many faults cause an error only in a corner case due to a resource limit
• An effective strategy for finding errors is to reduce the resource limits
  • Test a version of the program with very tight bounds
  • Finding corner cases is easier if the corners are close together
• Too many programs hard-code resource limits or make assumptions about resources unconnected to defined limits
  • E.g., not checking the result of malloc

Downwards Scalability: JPL Example

Flight flash hardware is usually a 1-4 GB device
• E.g., 64 blocks of 32 pages of 8192 bytes
We primarily test with much smaller “devices” (using software simulation)
• 6 blocks of 4 pages of 64 bytes
• Forces flash file system to compact storage more often
• Tests assumptions about how space is used on flash
• Forces more multi-page writes and directory entries over multiple pages

Downwards Scalability: JPL Example

Easier to explore various combinations of states of blocks/pages of the device

[Figure: map of the simulated device; legend: used page, free page, dirty page, bad block]

Controllability

Other important themes for controllability
• Network/file access
  • If program reads from the network or from remote files, this is hard to control
  • Again, simulation and stubbing are key
• System calls
  • Similarly, reading the time from the operating system can be hard to control
  • Simulation and stubbing – Operating System Abstraction Layer etc.
• GUI control
  • Allow scripted control of GUI elements so tests can be automated

Observability: Assertions

Assertions improve observability by making (some) errors into failures
• Even if the effect of a fault doesn’t propagate, it may be visible if an assertion checks the state at the right time
Assertions also improve observability by making the error, rather than failure, visible
• Know how the state was corrupted directly, not just eventual effect

Observability: Invariant Checkers

Can extend the idea of assertions to writing “full” invariant checkers
• Do a crawl of code’s basic data structures
• Check various invariants that would be too expensive to check at runtime
• Invariant checker can be written to be easy-to-use: recursion, memory allocation, etc.
  • Won’t run on actual system
• But be careful! If your invariant checker has a bug and changes the system state...

Observability

Other important themes for observability
• Logging
  • Especially critical for GUI interfaces, to mirror GUI events in ordered parseable messages
• Network/file access
  • If program writes to the network or to remote files, this is hard to observe

Controllability & Observability: Memory Allocation

More extreme case: embedded code for mission or safety critical systems
• May be running without memory protection
• Dynamic allocation often forbidden
Design module to accept a static block allocated elsewhere, and only access this memory
• Controllability: allows us to introduce memory faults, simulate warm reboots
• Observability: allows us to easily instrument code with low-overhead checks to find memory safety violations during testing

Coverage

Literature of software testing is primarily concerned with various notions of coverage
Ammann and Offutt identify four basic kinds of coverage:
• Graph coverage
• Logic coverage
• Input space partitioning
• Syntax-based coverage

Graph Coverage

Cover all the nodes, edges, or paths of some graph related to the program
Examples:
• Statement coverage
• Branch coverage
• Path coverage
• Data flow (def-use) coverage
• Model-based testing coverage
• Many more – most common kind of coverage, by far

Graph Coverage

Most FSM testing algorithms can be seen as graph coverage
• Consider VC – computing a spanning tree to nodes is standard graph exploration
• Beizer: “find a graph and cover it”

Statement/Basic Block Coverage

    if (x < y) {
      y = 0;
      x = x + 1;
    } else {
      x = y;
    }

[Graph: node 1 branches on x < y to node 2 (y = 0; x = x + 1) and on x >= y to node 3 (x = y); nodes 2 and 3 both lead to node 4]

    if (x < y) {
      y = 0;
      x = x + 1;
    }

[Graph: node 1 branches on x < y to node 2 (y = 0; x = x + 1), which leads to node 3; on x >= y node 1 leads directly to node 3]

Statement coverage: cover every node of these graphs
• Treat y = 0; x = x + 1 as one node because if one statement executes the other must also execute (code is a basic block)

Branch Coverage

    if (x < y) {
      y = 0;
      x = x + 1;
    } else {
      x = y;
    }

[Graph: same four-node if-then-else graph as on the previous slide]

Branch coverage vs. statement coverage: same for if-then-else

    if (x < y) {
      y = 0;
      x = x + 1;
    }

[Graph: same three-node if-then graph as on the previous slide]

But consider this if-then structure. For branch coverage we can’t just cover all nodes, but must cover all edges – get to node 3 both after 2 and without executing 2!

Path Coverage

    if (x < y) {
      y = 0;
      x = x + 1;
    } else {
      x = y;
    }
    if (x < y) {
      y = 0;
      x = x + 1;
    }

[Graph: node 1 branches on x < y to node 2 (y = 0; x = x + 1) and on x >= y to node 3 (x = y); nodes 2 and 3 both lead to node 4; node 4 branches on x < y to node 5 (y = 0; x = x + 1) and on x >= y directly to node 6; node 5 leads to node 6]

How many paths through this code are there? Need one test case for each to get path coverage
To get statement and branch coverage, we only need two test cases: 1 2 4 5 6 and 1 3 4 6
Path coverage needs two more: 1 2 4 6 and 1 3 4 5 6
In general: exponential in the number of conditional branches!
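The path enumeration for the two consecutive conditionals can be checked by instrumenting the code so that it records which nodes it visits. The following is a minimal sketch, assuming the node numbering 1 to 6 of the slide’s graph; `trace_path` is a hypothetical helper name, and the visited nodes are simply packed into a decimal number (path 1 2 4 5 6 becomes 12456).

```c
/* Instrumented copy of the two consecutive conditionals.
   Node numbers follow the slide's graph; trace_path is a
   hypothetical helper, not part of the original material. */
int trace_path(int x, int y)
{
    int path = 1;                 /* node 1: first test */
    if (x < y) {
        path = path * 10 + 2;     /* node 2 */
        y = 0;
        x = x + 1;
    } else {
        path = path * 10 + 3;     /* node 3 */
        x = y;
    }
    path = path * 10 + 4;         /* node 4: second test */
    if (x < y) {
        path = path * 10 + 5;     /* node 5 */
        y = 0;
        x = x + 1;
    }
    path = path * 10 + 6;         /* node 6: exit */
    (void)x;                      /* final values are not needed here */
    (void)y;
    return path;
}
```

For instance, `trace_path(-5, 0)` follows 1 2 4 5 6, `trace_path(5, 0)` follows 1 3 4 6, and `trace_path(1, 2)` follows 1 2 4 6. For this particular code the fourth path, 1 3 4 5 6, cannot be executed: node 3 sets x equal to y, so the second test x < y is false whenever node 3 was taken. Paths in the graph are not always feasible in the program.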
Data Flow Coverage

    x = 3;
    y = 3;
    if (w) {
      x = y + 2;
    }
    if (z) {
      y = x - 2;
    }
    n = x + y;

[Graph: node 1: x = 3, Def(x); node 2: y = 3, Def(y); node 3 branches on w to node 4 (x = y + 2, Def(x), Use(y)) and on !w to node 5; node 4 leads to node 5; node 5 branches on z to node 6 (y = x - 2, Def(y), Use(x)) and on !z to node 7; node 6 leads to node 7 (n = x + y, Use(x), Use(y))]

Annotate program with locations where variables are defined and used (very basic static analysis)
Def-use pair coverage requires executing all possible pairs of nodes where a variable is first defined and then used, without any intervening re-definitions
• May be many pairs, some not actually executable
E.g., this path covers the pair where x is defined at 1 and used at 7: 1 2 3 5 6 7
But this path does NOT: 1 2 3 4 5 6 7

Logic Coverage

What if, instead of:

    if (x < y) {
      y = 0;
      x = x + 1;
    }

we have:

    if (((a > b) || G) && (x < y)) {
      y = 0;
      x = x + 1;
    }

[Graph: node 1 leads to node 2 (y = 0; x = x + 1) on ((a > b) || G) && (x < y), and directly to node 3 on ((a <= b) && !G) || (x >= y); node 2 leads to node 3]

Now, branch coverage will guarantee that we cover all the edges, but does not guarantee we will do so for all the different logical reasons
We want to test the logic of the guard of the if statement

Active Clause Coverage

((a > b) or G) and (x < y)

    Row  (a > b)  G  (x < y)  Predicate
    1    T        F  T        T
    2    F        F  T        F
    3    F        T  T        T
    4    F        F  T        F   (duplicate of row 2)
    5    T        T  T        T
    6    T        T  F        F

• Rows 1-2: with these values for G and (x < y), (a > b) determines the value of the predicate
• Rows 3-4: with these values for (a > b) and (x < y), G determines the value of the predicate
• Rows 5-6: with these values for (a > b) and G, (x < y) determines the value of the predicate

Input Domain Partitioning

Partition scheme q of domain D
The partition q defines a set of blocks, $B_q = b_1, b_2, \ldots, b_Q$
The partition must satisfy two properties:
1. blocks must be pairwise disjoint (no overlap): $b_i \cap b_j = \emptyset,\ \forall i \neq j,\ b_i, b_j \in B_q$
2. together the blocks cover the domain D (complete): $\bigcup_{b \in B_q} b = D$

[Figure: domain D divided into blocks b1, b2, b3]

Coverage then means using at least one input from each of b1, b2, b3, ...

Input Domain Partitioning

Some subtleties here…
What’s wrong with this partition of file contents?
{
• b1: Sorted ascending file
• b2: Sorted descending file
• b3: Neither sorted ascending nor sorted descending
}

[Figure: domain D divided into blocks b1, b2, b3]

$b_i \cap b_j = \emptyset,\ \forall i \neq j,\ b_i, b_j \in B_q$
$\bigcup_{b \in B_q} b = D$

Syntax-Based Coverage

Based on mutation testing (a pet topic of Ammann and Offutt, who are heavily into this research area)
A bit of a different kind of creature than the other coverages we’ve looked at
Idea: generate many syntactic mutants of the original program
Coverage: how many mutants does a test suite kill (detect)?

Mutating Our Buggy Program

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Mutant #1

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n; i > 0; i--) {      // mutated: was i = n-1
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Mutant #2

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return 0;                      // mutated: was return -1
    }

Mutant #3

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] != x)               // mutated: was a[i] == x
          return i;
      }
      return -1;
    }

Mutant #4

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == n)               // mutated: was a[i] == x
          return i;
      }
      return -1;
    }

Mutant #5: Wait, this one’s the fix!

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i >= 0; i--) {   // mutated: was i > 0
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Syntax-Based Coverage

[Figure: program P surrounded by the mutants of P]

100% coverage means you kill all the mutants with your test suite

Generation vs. Recognition

Generation of tests based on coverage means producing a test suite to achieve a certain level of coverage
• As you can imagine, generally very hard
• Consider: generating a suite for 100% statement coverage easily reaches “solving the halting problem” level
• Obviously hard for, say, mutant-killing
Recognition means seeing what level of coverage an existing test suite reaches

Coverage and Subsumption

Sometimes one coverage approach subsumes another
• If you achieve 100% coverage of criterion A, you are guaranteed to satisfy B as well
• For example, consider node and edge coverage
  • (there’s a subtlety here, actually – can you spot it?)
What does this mean?
• Unfortunately, not a great deal
• If test suite X satisfies “stronger” criterion A and test suite Y satisfies “weaker” criterion B
  • Y may still reveal bugs that X does not!
  • For example, consider our running example and statement vs. branch coverage
• It means we should take coverage with a grain of salt, for one thing

Testing “for” Coverage

Never seek to improve coverage just for the sake of increasing coverage
• Well, unless it’s a command from on high
Coverage is not the goal
• Finding failures that expose faults is the goal
• No amount of coverage will prove that the program cannot fail

“Program testing can be used to show the presence of bugs, but never to show their absence!” – E. Dijkstra, Notes On Structured Programming

The Purpose of Testing

“Program testing can be used to show the presence of bugs, but never to show their absence!” – E. Dijkstra, Notes On Structured Programming

Dijkstra meant this as a criticism of testing and an argument in favor of more disciplined and total approaches (proving programs correct)
But he also points out what testing is good for: exposing errors
Coverage is valuable if and only if test sets with higher coverage are more likely to expose failures

The Purpose of Testing

“Program testing can be used to show the presence of bugs”
When we first start “testing,” we often want to “see that the program works”
• Try out some scenarios and watch the program “do its stuff”
• Surprised (annoyed) when (if) the program fails
• This is not really testing: testing is not the same as a demonstration
• Aim to break (your) code, if it can be broken

Levels of Testing

Adapted from Beizer, by Ammann and Offutt
• Level 0: Testing is debugging
• Level 1: Testing is to show the program works
• Level 2: Testing is to show the program doesn’t work
• Level 3: Testing is not to prove anything specific, but to reduce risk of using program
• Level 4: Testing is a mental discipline that helps develop higher quality software

What’s So Good About Coverage?

Consider a fault that causes failure every time the code is executed
Don’t execute the code: cannot possibly find the fault!

    int findLast (int a[], int n, int x) {
      // Returns index of last element
      // in a equal to x, or -1 if no
      // such. n is length of a
      int i;
      for (i = n-1; i >= 0; i--) {
        if (a[i] == x)
          return i;
      }
      return 0;
    }

That’s a pretty good argument for statement coverage

What’s So Good About Coverage?

We should have an argument for any kind of coverage:
• “If I don’t cover this, then there is more chance I’ll miss a fault like that”
• Backed with empirical data, preferably!

    int findLast (int a[], int n, int x) {
      // Returns index of last element
      // in a equal to x, or -1 if no
      // such.
      // n is length of a
      int i;
      for (i = n-1; i >= 0; i--) {
        if (a[i] == x)
          return i;
      }
      return 0;
    }

Return to Our Example

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

Let’s write a tester for this version of the program (back to the first off-by-one bug)
Forget for a moment that we know what the bug is!

Return to Our Example

    int findLast (int a[], int n, int x) {
      // Returns index of last element in a
      // equal to x, or -1 if no such.
      // n is length of a
      int i;
      for (i = n-1; i > 0; i--) {
        if (a[i] == x)
          return i;
      }
      return -1;
    }

What kind of coverage might we want to think about when testing this code?

Return to Our Example

    #include <assert.h>
    #include <stdio.h>

    #define N 5 // 5 is "big enough"?

    // random_assign and print_array are helpers not shown on the slide
    int testFind () {
      int a[N];
      int p, i;
      for (p = 0; p < N; p++) {
        random_assign(a, N);
        a[p] = 3;
        // remove any later 3 so that p is the last index holding a 3
        for (i = p + 1; i < N; i++) {
          if (a[i] == 3)
            a[i] = a[i] - 1;
        }
        printf ("TEST: findLast({");
        print_array(a, N);
        printf ("}, %d, 3)", N);
        assert (findLast(a, N, 3) == p);
      }
      return 0;
    }

What kind of coverage does this tester exploit?
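The tester idea above can be made self-contained so that it can be run against both the faulty and the repaired findLast. This is a minimal sketch under stated assumptions: `find_last_buggy`, `find_last_fixed`, and `count_failures` are hypothetical names, `random_assign` is a stand-in for the helper the slide leaves undefined, and failures are counted instead of aborting on the first failed assertion.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 5

/* Faulty version from the slides: the loop stops before index 0. */
int find_last_buggy(const int a[], int n, int x)
{
    for (int i = n - 1; i > 0; i--)
        if (a[i] == x)
            return i;
    return -1;
}

/* Repaired version: the loop also examines index 0. */
int find_last_fixed(const int a[], int n, int x)
{
    for (int i = n - 1; i >= 0; i--)
        if (a[i] == x)
            return i;
    return -1;
}

/* Assumed stand-in for the slide's random_assign helper. */
void random_assign(int a[], int n)
{
    for (int i = 0; i < n; i++)
        a[i] = rand() % 10;
}

/* Places the last 3 at each position p in turn and returns how many
   positions made the given implementation return a wrong index. */
int count_failures(int (*find)(const int[], int, int))
{
    int a[N];
    int failures = 0;
    for (int p = 0; p < N; p++) {
        random_assign(a, N);
        a[p] = 3;
        for (int i = p + 1; i < N; i++)   /* no 3 may follow index p */
            if (a[i] == 3)
                a[i] = a[i] - 1;
        int got = find(a, N, 3);
        if (got != p) {
            printf("FAIL: p=%d, findLast returned %d\n", p, got);
            failures++;
        }
    }
    return failures;
}
```

Under these assumptions, `count_failures(find_last_fixed)` returns 0 and `count_failures(find_last_buggy)` returns 1: only p = 0 fails, because the faulty loop never examines index 0. The random fill does not affect that count, since the tester forces the last 3 to sit exactly at p.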