Dependable Software Systems
Topics in Data-Flow Testing
Material drawn from [Beizer, Mancoridis, Vokolos]
© SERG

Data-Flow Testing
• Data-flow testing uses the control flowgraph to explore the unreasonable things that can happen to data (i.e., anomalies).
• Consideration of data-flow anomalies leads to test path selection strategies that fill the gaps between complete path testing and branch or statement testing.

Data-Flow Testing (Cont’d)
• Data-flow testing is the name given to a family of test strategies based on selecting paths through the program’s control flow in order to explore sequences of events related to the status of data objects.
• E.g., pick enough paths to assure that:
  – Every data object has been initialized prior to its use.
  – All defined objects have been used at least once.

Data Object Categories
• (d) Defined, Created, Initialized
• (k) Killed, Undefined, Released
• (u) Used:
  – (c) Used in a calculation
  – (p) Used in a predicate

(d) Defined Objects
• An object (e.g., variable) is defined when it:
  – appears in a data declaration
  – is assigned a new value
  – is a file that has been opened
  – is dynamically allocated
  – ...

(k) Killed Objects
• An object is killed when it is:
  – released (e.g., free) or otherwise made unavailable (e.g., out of scope)
  – a loop control variable when the loop exits
  – a file that has been closed
  – ...

(u) Used Objects
• An object is used when it is part of a computation or a predicate.
• A variable is used for a computation (c) when it appears on the RHS (sometimes even the LHS, in the case of array indices) of an assignment statement.
• A variable is used in a predicate (p) when it appears directly in that predicate.

Example: Definition and Uses
What are the definitions and uses for the program below?
  1. read (x, y);
  2. z = x + 2;
  3. if (z < y)
  4.     w = x + 1;
     else
  5.     y = y + 1;
  6. print (x, y, w, z);

Example: Definition and Uses
  Stmt  Code                    Def     C-use         P-use
  1     read (x, y);            x, y
  2     z = x + 2;              z       x
  3     if (z < y)                                    z, y
  4         w = x + 1;          w       x
  5     else y = y + 1;         y       y
  6     print (x, y, w, z);             x, y, w, z

Data-Flow Anomalies
• A data-flow anomaly is denoted by a two-character sequence of actions. E.g.,
  – ku: Means that an object is killed and then used.
  – dd: Means that an object is defined twice without an intervening usage.

Example
• E.g., a valid (not anomalous) scenario in which the action sequence for variable A is dpd:
    A = C + D;
    if (A > 0) X = 1; else X = -1;
    A = B + C;

Two Letter Combinations for dku
• dd: Probably harmless, but suspicious.
• dk: Probably a bug.
• du: Normal situation.
• kd: Normal situation.
• kk: Harmless, but probably a bug.
• ku: Definitely a bug.
• ud: Normal situation (reassignment).
• uk: Normal situation.
• uu: Normal situation.

Single Letter Situations
• A leading dash means that nothing of interest (d, k, u) occurs prior to the action noted along the entry-exit path of interest.
• A trailing dash means that nothing of interest happens after the point of action until the exit.

Single Letter Situations (Cont’d)
• -k: Possibly anomalous:
  – Killing a variable that does not exist.
  – Killing a variable that is global.
• -d: Normal situation.
• -u: Possibly anomalous, unless variable is global.
• k-: Normal situation.
• d-: Possibly anomalous, unless variable is global.
• u-: Normal situation.

Data-Flow Anomaly State Graph
[Figure: state graph. Nodes are the variable states K, D, and U plus the anomalous state A; edges are labeled with the actions d, k, and u.]

Data-Flow Anomaly State Graph with Variable Redemption
[Figure: state graph with states K, D, U, DD, DK, and KU; edges are labeled with the actions d, k, and u.]

Static vs Dynamic Anomaly Detection
• Static Analysis is analysis done on source code without actually executing it.
• E.g., Syntax errors are caught by static analysis.

Static vs Dynamic Anomaly Detection (Cont’d)
• Dynamic Analysis is analysis done as a program is executing and is based on intermediate values that result from the program’s execution.
• E.g., A division by 0 error is caught by dynamic analysis.
• If a data-flow anomaly can be detected by static analysis then the anomaly does not concern testing. (It should be handled by the compiler.)

Anomaly Detection Using Compilers
• Compilers are able to detect several data-flow anomalies using static analysis.
• E.g., By forcing declaration before use, a compiler can detect anomalies such as:
  – -u
  – ku
• Optimizing compilers are able to detect some dead variables.

Is Static Analysis Sufficient?
• Questions:
  – Why isn’t static analysis enough?
  – Why is testing required?
  – Could a good compiler detect all data-flow anomalies?
• Answer: No. Detecting all data-flow anomalies is provably unsolvable.

Static Analysis Deficiencies
• Current static analysis methods are inadequate for:
  – Dead Variables: Detecting unreachable variables is unsolvable in the general case.
  – Arrays: Dynamically allocated arrays contain garbage unless they are initialized explicitly. (-u anomalies are possible.)

Static Analysis Deficiencies (Cont’d)
  – Pointers: Impossible to verify pointer values at compile time.
  – False Anomalies: Even an obvious bug (e.g., ku) may not be a bug if the path along which the anomaly exists is unachievable. (Determining whether a path is or is not achievable is unsolvable.)

Data-Flow Modeling
• Data-flow modeling is based on the control flowgraph.
• Each link is annotated with symbols (e.g., d, k, u, c, p) or sequences of symbols (e.g., dd, du, ddd) that denote the sequence of data operations on that link with respect to the variable of interest.

Control Flowgraph Annotated for X and Y Data Flows
    INPUT X,Y
    Z:= X+Y
    Y:= X-Y
    IF Z>=0 GOTO SAM
    JOE: Z:=Z-1
    SAM: Z:=Z+V
    U:=0
    LOOP
    B(U),Q(V):=(Z+V)*U
    IF B(U)=0 GOTO JOE
    Z:=Z-1
    IF Z=0 GOTO ELL
    U:=U+1
    UNTIL U=Z
    B(U-1):=B(U+1)+Q(V-1)
    ELL: B(U+Q(V)):=U+V
    IF U=V GOTO JOE
    IF U>V THEN U:=Z
    YY:Z:=U
    END
[Figure: control flowgraph of the program above, with nodes 1 to 13 and the labels JOE, SAM, LOOP, ELL, YY, and END; links are annotated with the X and Y data-flow actions (e.g., dcc).]

Control Flowgraph Annotated for Z Data Flows
[Figure: the same program and flowgraph (nodes 1 to 13), with links annotated with the Z data-flow actions d, c, p, and cd.]

Definition-Clear Path Segments
• A Definition-clear Path Segment (w.r.t. variable X) is a connected sequence of links such that X is defined on the first link and not redefined or killed on any subsequent link of that path segment.

Definition-Clear Path Segments for Variable Z (Cont’d)
[Figure: the Z-annotated flowgraph with the definition-clear path segments for Z highlighted.]

Non Definition-Clear Path Segments for Variable Z (Cont’d)
[Figure: the Z-annotated flowgraph with the path segments that are not definition-clear for Z highlighted.]

Simple Path Segments
• A Simple Path Segment is a path segment in which at most one node is visited twice.
  – E.g., (7,4,5,6,7) is simple.
• Therefore, a simple path may or may not be loop-free.

Loop-free Path Segments
• A Loop-free Path Segment is a path segment for which every node is visited at most once.
  – E.g., (4,5,6,7,8,10) is loop-free.
  – Path (10,11,4,5,6,7,8,10,11,12) is not loop-free because nodes 10 and 11 are visited twice.

du Path Segments
• A du Path is a path segment such that if the last link has a use of X, then the path is simple and definition-clear.

def-use Associations
• A def-use association is a triple (x, d, u), where:
  – x is a variable,
  – d is a node containing a definition of x,
  – u is either a statement or predicate node containing a use of x,
  – and there is a sub-path in the flow graph from d to u with no other definition of x between d and u.
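
The definition of a def-use association can be made concrete with a small Python sketch. It encodes the six-statement program from the Definition and Uses example and searches forward from each definition until the variable is redefined. The tables DEFS, C_USES, P_USES, and SUCC and the function def_use_associations are hypothetical names introduced only for this illustration; p-uses are attached to the outgoing edges of the predicate node, written (node, "t") or (node, "f").

```python
from collections import deque

# Hypothetical encoding of the six-statement example program.
DEFS = {1: {"x", "y"}, 2: {"z"}, 4: {"w"}, 5: {"y"}}
C_USES = {2: {"x"}, 4: {"x"}, 5: {"y"}, 6: {"x", "y", "w", "z"}}
P_USES = {3: {"z", "y"}}
# Successors as (node, edge label); only the predicate node labels its edges.
SUCC = {
    1: [(2, None)],
    2: [(3, None)],
    3: [(4, "t"), (5, "f")],
    4: [(6, None)],
    5: [(6, None)],
    6: [],
}


def def_use_associations():
    """Return the set of triples (variable, def node, use node or use edge)."""
    assoc = set()
    for d, variables in DEFS.items():
        for x in variables:
            seen = set()
            work = deque(n for n, _ in SUCC[d])
            while work:
                n = work.popleft()
                if n in seen:
                    continue
                seen.add(n)
                if x in C_USES.get(n, ()):
                    assoc.add((x, d, n))
                if x in P_USES.get(n, ()):
                    for _, label in SUCC[n]:
                        assoc.add((x, d, (n, label)))
                if x in DEFS.get(n, ()):
                    continue  # x is redefined here; do not search past n
                work.extend(m for m, _ in SUCC[n])
    return assoc


if __name__ == "__main__":
    for triple in sorted(def_use_associations(), key=str):
        print(triple)
```

Uses inside a node are checked before its definitions, so a statement such as y = y + 1 counts as a use of the earlier definition of y and then blocks that definition from reaching further nodes. For this program the sketch reports twelve associations.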

Example: Def-Use Associations
[Figure: flowgraph of the six-statement example. Node 1: read (x, y); node 2: z = x + 2; node 3: predicate z < y, with the T branch to node 4 (w = x + 1) and the F branch to node 5 (y = y + 1); node 6: print (x, y, w, z).]
Some Def-Use Associations:
  (x, 1, 2), (x, 1, 4), …
  (z, 2, (3,t)), ...
  (y, 1, (3,t)), (y, 1, (3,f)), (y, 1, 5), …

Example: Def-Use Associations
What are all the def-use associations for the program below?
    read (z)
    x = 0
    y = 0
    if (z ≥ 0) {
        x = sqrt (z)
        if (0 ≤ x && x ≤ 5)
            y = f (x)
        else
            y = h (z)
    }
    y = g (x, y)
    print (y)

Example: Def-Use Associations (Cont’d)
[Figures: three slides repeat the program above and mark, in turn, the def-use associations for variable z, for variable x, and for variable y.]

Definition-Clear Paths
• A path (i, n1, ..., nm, j) is called a definition-clear path with respect to x from node i to node j if it contains no definitions of variable x in nodes (n1, ..., nm, j).
• The family of data flow criteria requires that the test data execute definition-clear paths from each node containing a definition of a variable to specified nodes containing c-uses and edges containing p-uses of that variable.

Data-Flow Testing Strategies
• All du Paths (ADUP)
• All Uses (AU)
• All p-uses/some c-uses (APU+C)
• All c-uses/some p-uses (ACU+P)
• All Definitions (AD)
• All p-uses (APU)
• All c-uses (ACU)

All du Paths Strategy (ADUP)
• ADUP is one of the strongest data-flow testing strategies.
• ADUP requires that every du path from every definition of every variable to every use of that definition be exercised under some test.
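
To see what "every du path" asks for, the following Python sketch enumerates the paths from one definition to one use in the six-statement example program. It is an illustration under stated assumptions, not a general tool: SUCC, DEFS, and du_paths are hypothetical names, and the search is restricted to loop-free paths, which is sufficient here only because the example flowgraph has no loops.

```python
# Hypothetical encoding of the six-statement example (node -> successors).
SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: []}
DEFS = {1: {"x", "y"}, 2: {"z"}, 4: {"w"}, 5: {"y"}}


def du_paths(var, d, u):
    """Return all loop-free paths from d to u with no redefinition of var after d."""
    found = []

    def extend(path):
        node = path[-1]
        if node == u and len(path) > 1:
            found.append(tuple(path))
            return
        for nxt in SUCC[node]:
            if nxt in path:
                continue  # would revisit a node, so the path is not loop-free
            if nxt != u and var in DEFS.get(nxt, ()):
                continue  # var is redefined before the use is reached
            extend(path + [nxt])

    extend([d])
    return found


if __name__ == "__main__":
    print(du_paths("x", 1, 6))
    print(du_paths("y", 1, 6))
```

For x defined at node 1 and used at node 6, the sketch finds two paths, (1, 2, 3, 4, 6) and (1, 2, 3, 5, 6), and ADUP asks for a test that exercises each of them. For y it finds only (1, 2, 3, 4, 6), because the other route passes through the redefinition of y at node 5.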

Example: pow(x,y)
    /* pow(x,y)
       This program computes x to the power of y, where x and y are integers.
       INPUT: The x and y values.
       OUTPUT: x raised to the power of y is printed to stdout. */
     1  void pow (int x, int y)
     2  {
     3    float z;
     4    int p;
     5    if (y < 0)
     6      p = 0 - y;
     7    else p = y;
     8    z = 1.0;
     9    while (p != 0)
    10    {
    11      z = z * x;
    12      p = p - 1;
    13    }
    14    if (y < 0)
    15      z = 1.0 / z;
    16    printf("%f", z);
    17  }
[Figure: flowgraph of pow with nodes 1, 5, 8, 9, 14, 16, and 17 and links labeled a through i.]

Example: pow(x,y) du-Path for Variable x
[Figures: two slides repeat the listing and flowgraph above, each highlighting a du-path for x.]

Example: pow(x,y) du-Path for Variable y
[Figures: three slides repeat the listing and flowgraph above, each highlighting a du-path for y.]

Example: Using du-Path Testing to Test Program COUNT
• Consider the following program:
    /* COUNT
       This program counts the number of characters and lines in a text file.
       INPUT: Text File
       OUTPUT: Number of characters and number of lines. */
        #include <stdio.h>
     1  int main(int argc, char *argv[])
     2  {
     3    int numChars = 0;
     4    int numLines = 0;
     5    char chr;
     6    FILE *fp = NULL;
     7

Program COUNT (Cont’d)
     8    if (argc < 2)
     9    {
    10      printf("\nUsage: %s <filename>", argv[0]);
    11      return (-1);
    12    }
    13    fp = fopen(argv[1], "r");
    14    if (fp == NULL)
    15    {
    16      perror(argv[1]);      /* display error message */
    17      return (-2);
    18    }

Program COUNT (Cont’d)
    19    while (!feof(fp))
    20    {
    21      chr = getc(fp);       /* read character */
    22      if (chr == '\n')      /* if newline */
    23        ++numLines;
    24      else
    25        ++numChars;
    26    }
    27    printf("\nNumber of characters = %d", numChars);
    28    printf("\nNumber of lines = %d", numLines);
    29  }

The Flowgraph for COUNT
[Figure: flowgraph of COUNT with nodes 1, 8, 11, 14, 17, 19, 22, 23, 24, 26, and 29.]
• The junctions at lines 12 and 18 are not needed, because if you are at these lines then you must also be at lines 14 and 19, respectively.

du-Path for argc
[Figures: two slides show the COUNT flowgraph annotated for argc (actions d and p; predicate "argc?"), each highlighting a du-path.]

du-Path for argv[]
[Figures: two slides show the COUNT flowgraph annotated for argv[] (actions d and c), each highlighting a du-path.]

du-Path for numChars
[Figures: three slides show the COUNT flowgraph annotated for numChars (actions d, c, and cd), each highlighting a du-path.]

du-Path for numLines
[Figures: three slides show the COUNT flowgraph annotated for numLines (actions d, c, and cd), each highlighting a du-path.]

du-Path for chr
[Figures: two slides show the COUNT flowgraph annotated for chr (actions d and p), each highlighting a du-path.]

du-Path for fp
[Figures: three slides show the COUNT flowgraph annotated for fp (actions d, p, and pc), each highlighting a du-path.]

All Uses Strategy (AU)
• AU requires that at least one path from every definition of every variable to every use of that definition be exercised under some test.
• Hence, at least one definition-clear path from every definition of every variable to every use of that definition must be exercised under some test.
• Clearly, AU < ADUP.

All p-uses/Some c-uses Strategy (APU+C)
• APU+C requires that, for every variable and every definition of that variable, the tests include at least one definition-free path from the definition to every predicate use.
• If there are definitions of the variable that are not covered by the above prescription, then add computational-use test cases to cover every definition.

All c-uses/Some p-uses Strategy (ACU+P)
• ACU+P requires that, for every variable and every definition of that variable, the tests include at least one definition-free path from the definition to every computational use.
• If there are definitions of the variable that are not covered by the above prescription, then add predicate-use test cases to cover every definition.

All Definitions Strategy (AD)
• AD requires that, for every variable and every definition of that variable, the tests include at least one definition-free path from the definition to a computational or predicate use.
• AD < ACU+P and AD < APU+C.

All p-uses (APU) All c-uses (ACU)
• APU is the same as APU+C without the C requirement.
• APU < APU+C.
• ACU is the same as ACU+P without the P requirement.
• ACU < ACU+P.

Relationship among DF criteria
[Figure: hierarchy diagram relating the criteria ALL-PATHS, ALL-DU-PATHS, ALL-USES, ALL-P-USES/SOME-C-USES, ALL-C-USES/SOME-P-USES, ALL-C-USES, ALL-P-USES, ALL-DEFS, ALL-EDGES, and ALL-NODES.]

Feasible Data Flow Criteria
• What happens if we eliminate all un-executable associations from consideration?
• If we eliminate all un-executable associations from consideration, then there are significant differences between the Data Flow criteria and the Feasible Data Flow criteria.
• For a large class of “well behaved programs”, the Feasible DF criteria All-p-uses, All-p-uses/some-c-uses, and All-uses bridge the gap between All-edges and All-paths.
• However, for certain programs with anomalies there are tests which satisfy All-p-uses without satisfying All-edges.

An Example
[Figure: flowgraph with nodes 1 to 9. Node 1: read (x); node 2: x = sqrt(x); node 3: predicate x < 0 with T and F branches; a read (y) node; node 6: predicate y ≠ 0 with T and F branches.]
• Node 4 is un-executable; y is un-initialized.
• Essentially, the only def-use association of concern is (x, 2, 3).
• The outcome of node 6 can be either T or F.
• All-p-uses is satisfied, but not All-edges.

Effectiveness of Strategies
• Ntafos compared Random, Branch, and All Uses testing strategies on 14 Kernighan and Plauger programs.
• Kernighan and Plauger programs are a set of mathematical programs with known bugs that are often used to evaluate test strategies.
• Ntafos conducted two experiments:

Results of 2 of the 14 Ntafos Experiments
  Strategy    Mean Number of Test Cases    Percentage of Bugs Found
  Random      35                           93.7
  Branch      3.8                          91.6
  All Uses    11.3                         96.3

  Strategy    Mean Number of Test Cases    Percentage of Bugs Found
  Random      100                          79.5
  Branch      34                           85.5
  All Uses    84                           90.0

Data-Flow Testing Tips
• Resolve all data-flow anomalies.
• Try to do all data-flow operations on a variable within the same routine (i.e., avoid integration problems).
• Use strong typing and user-defined types when possible.

Data-Flow Testing Tips (Cont’d)
• Use explicit (rather than implicit) declarations of data when possible.
• Put data declarations at the top of the routine and return data objects at the bottom of the routine.

Summary
• Data are as important as code.
• Define what you consider to be a data-flow anomaly.
• Data-flow testing strategies span the gap between all paths and branch testing.

Summary (Cont’d)
• AU has the best payoff for the money. It seems to require no more than twice the number of test cases needed for branch testing, but the results are much better.
• Path testing with Branch Coverage and Data-flow testing with AU is a very good combination.
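
One way to act on the advice to define what counts as a data-flow anomaly is to write the two-letter table down as data and scan action sequences against it. The Python sketch below does this for the suspicious combinations dd, dk, kk, and ku from the two-letter table; SUSPECT and anomalies are hypothetical names, c and p are treated as uses (u), and the single-letter situations with leading or trailing dashes are deliberately left out.

```python
# Suspicious two-letter combinations, with the verdicts from the dku table.
SUSPECT = {
    "dd": "probably harmless, but suspicious",
    "dk": "probably a bug",
    "kk": "harmless, but probably a bug",
    "ku": "definitely a bug",
}


def anomalies(actions):
    """Return (position, pair, verdict) for each suspicious adjacent pair.

    `actions` is the sequence of actions on one variable along one path,
    e.g. "dpd"; computational (c) and predicate (p) uses both count as u.
    """
    normalized = actions.replace("c", "u").replace("p", "u")
    hits = []
    for i in range(len(normalized) - 1):
        pair = normalized[i:i + 2]
        if pair in SUSPECT:
            hits.append((i, pair, SUSPECT[pair]))
    return hits


if __name__ == "__main__":
    print(anomalies("dpd"))   # the valid scenario for variable A
    print(anomalies("ddku"))  # a sequence with several suspicious pairs
```

The sequence dpd for variable A in the earlier example yields no hits, while a sequence such as ddku is flagged three times (dd, dk, and ku). A project that treats different combinations as anomalous would change only the SUSPECT table.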