Summarizing “Structural” Testing
• Now that we have learned to create test cases through both:
  – a) Functional (black-box) and
  – b) Structural (white-box) testing methodologies
  we might ask how much is “enough,” or “when should we stop testing?”

Some potential answers to “when is enough” Testing
We stop testing:
1. When we run out of time
2. When no more failures are encountered during testing
3. When no more defects are revealed by testing (similar to 2)
4. When we have executed all the designed test cases
5. When we cannot think of any more test cases to run
6. When we reach a point of “diminishing” return
7. When all faults are discovered (undecidable), or when the preset % of “fault seeds” are found (see last slide)

Explanation of “when to stop” testing
1. Unfortunately, “when we run out of time” is an often-used criterion to stop testing! (Think of the following):
  – Customer satisfaction
  – Increased customer support cost and fix cost
2. Some quality-conscious organizations use reliability theory and stop testing when no more, or only a “few,” failures or defects can be revealed. (Hard to do.)
3. “When we have executed all the designed test cases” is fine if the designed test cases provide good coverage; otherwise, it is just a convenient statement to meet the schedule.
4. “When we cannot think of any more test cases,” after properly analyzing the test case coverage, would be another acceptable solution.
5. “When we reach a point of diminishing return” is a good management solution, similar to the reliability-theory criterion of no longer revealing new defects or failures. (Otherwise, “diminishing return” needs to be defined.)
6. “When all faults are discovered” is not possible theoretically, especially for large systems.

Diminishing Return
[Figure: total number of bugs found plotted against time or total test cases run, with two points marked on the curve: “start considering terminating testing” and “terminate testing.”]

Test Case Coverage
• For us, test case coverage is a key issue in determining when to stop testing.
• We stop testing when our tests have covered all that we want to cover. Ask:
  – Are there gaps and redundancies?
  – Have we covered all the relevant situations?
• We will use the Triangle Problem as an example to look at these questions.

Previous Sample Triangle Pseudo-code
 1. Program Triangle
 2. Declare a, b, c as Integer
 3. Declare IsTriangle as Boolean
 4. Output (“enter a, b, and c integers”)
 5. Input (a, b, c)
 6. Output (“side 1 is”, a)
 7. Output (“side 2 is”, b)
 8. Output (“side 3 is”, c)
 9. If (a<b+c) AND (b<a+c) AND (c<b+a)
10.   then IsTriangle = True
11.   else IsTriangle = False
12. endif
13. If IsTriangle
14.   then if (a=b) AND (b=c)
15.          then Output (“equilateral”)
16.          else if (a NE b) AND (a NE c) AND (b NE c)
17.                 then Output (“Scalene”)
18.                 else Output (“Isosceles”)
19.               endif
20.        endif
21.   else Output (“not a triangle”)
22. endif
23. end Triangle

Condensation Graph from pseudo code
[Figure: condensation graph. Nodes: First (statements 1-8), 9, 10 (IsTriangle = True), 11 (IsTriangle = False), 12, 13, 14, 15 (equilateral), 16, 17 (scalene), 18 (isosceles), 19, 20, 21 (not triangle), 22 (Last). Edges: First→9; 9→10; 9→11; 10→12; 11→12; 12→13; 13→14 (Triangle); 13→21 (~Triangle); 14→15; 14→16; 16→17; 16→18; 17→19; 18→19; 15→20; 19→20; 20→22; 21→22.]

Coverage criterion                 Number of paths
Statement coverage                 4 paths
Branch (DD-path) coverage          4 paths
Cyclomatic # = 4 + 1 = 5           5 linearly independent paths
All combinations                   8 paths

All Combination paths?
• Let’s look at all 8 combination paths:
  1. P1: <8,9,10,12,13,14,15,20,22> (Equilateral)
  2. P2: <8,9,10,12,13,14,16,17,19,20,22> (Scalene)
  3. P3: <8,9,10,12,13,14,16,18,19,20,22> (Isosceles)
  4. P4: <8,9,10,12,13,21,22> (not possible)
  5. P5: <8,9,11,12,13,14,15,20,22> (not possible)
  6. P6: <8,9,11,12,13,14,16,17,19,20,22> (not possible)
  7. P7: <8,9,11,12,13,14,16,18,19,20,22> (not possible)
  8. P8: <8,9,11,12,13,21,22> (Not a triangle)
- So, there are 4 decision-to-decision (dd) paths (branch testing) that make sense.
- These are P1, P2, P3, and P8.
- We should at least test these four paths.
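The four feasible paths can be exercised with a small runnable version of the pseudo-code. The following is only a sketch: the function name `classify_triangle`, the returned path labels, and the sample side lengths are illustrative assumptions, not part of the original slides. The scalene check compares all three pairs of sides.

```python
def classify_triangle(a, b, c):
    """Return (output, path) for integer sides a, b, c.

    The path label (P1, P2, P3, P8) names the feasible path from the
    slides that the input exercises. Hypothetical helper for illustration.
    """
    # Pseudo-code lines 9-12: decide whether the sides form a triangle.
    is_triangle = (a < b + c) and (b < a + c) and (c < a + b)

    # Pseudo-code lines 13-22: classify and tag the path that was taken.
    if is_triangle:
        if a == b and b == c:
            return ("equilateral", "P1")
        elif a != b and a != c and b != c:
            return ("scalene", "P2")
        else:
            return ("isosceles", "P3")
    else:
        return ("not a triangle", "P8")


# One test case per feasible path; the side lengths are illustrative.
branch_tests = {
    "P1": (100, 100, 100),
    "P2": (3, 4, 5),
    "P3": (100, 100, 1),
    "P8": (100, 100, 200),
}

for expected_path, sides in branch_tests.items():
    output, path = classify_triangle(*sides)
    assert path == expected_path
    print(sides, output, path)
```

Because the second decision tests the same IsTriangle value that the first decision sets, no input can reach P4 through P7, which is why only four of the eight combinations need test cases.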
Compare against Boundary Value Test (15 test cases for Triangle problem)
Remember the boundary: 1 ≤ TriangleSide ≤ 200

Test case   a     b     c     Expected output   Path
1           100   100   1     Isosceles         P3
2           100   100   2     Isosceles         P3
3           100   100   100   Equilateral       P1
4           100   100   199   Isosceles         P3
5           100   100   200   Not Triangle      P8
6           100   1     100   Isosceles         P3
7           100   2     100   Isosceles         P3
8           100   100   100   Equilateral       P1
9           100   199   100   Isosceles         P3
10          100   200   100   Not Triangle      P8
11          1     100   100   Isosceles         P3
12          2     100   100   Isosceles         P3
13          100   100   100   Equilateral       P1
14          199   100   100   Isosceles         P3
15          200   100   100   Not Triangle      P8

Let’s analyze this table in more detail on the next chart.

Comparison Summary
• A potential “gap” exists in the Boundary Value Test. When we look at the equivalence classes (or logic table) of the outputs, we see that the Scalene triangle is not covered.
  – Path P2 is not covered by the 15 Boundary Value test cases!
• There are, however, many “duplications”:
  – P3 is covered 9 times (Isosceles triangle)
  – P1 is covered 3 times (Equilateral)
  – P8 is covered 3 times (Not Triangle)
Clearly, boundary value (functional) testing is not enough here. Is it possible that it is also not as efficient?

Comparison Metrics of Functional vs. Structural Test Effectiveness
• Assume:
  1. Functional Test M generates m test cases.
  2. Structural Test S generates s structural elements (structural elements = the chosen paths for the S test).
  3. When all of the m test cases are executed, n of the s structural elements are traversed or covered, where n ≤ s.
• Then three metrics for evaluating the testing “effectiveness” of functional testing with respect to structural testing are:
  – Coverage of M with respect to S: C(M,S) = n/s
  – Redundancy of M with respect to S: R(M,S) = m/s
  – Net redundancy of M with respect to S: NR(M,S) = m/n

Comparison for the Triangle Example
• The Boundary Value Test, M, generated 15 test cases; so m = 15.
• The dd-path (or Branch) Test, S, generated 4 paths for test cases; so s = 4.
• The 15 M test cases cover 3 of the 4 paths from the S test; so n = 3.
The three comparisons of the effectiveness of M relative to S show:
  – Coverage(M,S) = n/s = 3/4: 75% coverage effectiveness
  – Redundancy(M,S) = m/s = 15/4: 375% redundancy
  – NetRed(M,S) = m/n = 15/3: 500% net redundancy
Note the penalty here.

Relative Efforts (Test complexity) Comparison within Structural Test Methodologies
[Figure: effort to identify test coverage elements plotted against sophistication in methodology, for dd-path (branch), basis, d-u path, and slice testing.]

Should we consider Structural Test Complexity when Designing?
• If so:
  – Since program slice testing takes more effort, should we have fewer program slices in our programs?
  – If we do have program slices, should the size of each slice (# of statements) be small?

What was that “fault seeding” stop criterion?
• Fault seeding is a technique for
  – i) determining when to stop and/or
  – ii) projecting “escaped” bugs.
• Fault seeding technique:
  – Develop a number of bugs (e.g. 20 bugs) and seed them into the product, without letting the testers know.
  – Pick a % (e.g. 90%) of discovery of the seeded faults by the test team to be considered as the stopping criterion.
  – Run the tests and see if 0.9 x 20 = 18 of the seeded bugs are found. Stop testing only if 90% is reached.
  – If the total number of unique problems found is Z (e.g. 45, NOT including the 18 seeded faults), then we may roughly project the total number of non-seeded problems, Y, and how many of them remain:
    - Z/Y = 18/20, that is, 45/Y = 18/20
    - Y = 50
    - remaining non-seeded problems = Y - Z = 50 - 45 = 5
    - so we project that there are 5 more undetected problems remaining
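The fault seeding arithmetic can be sketched in a few lines. This is an illustration only: the function name `project_remaining_faults`, its parameters, and the dictionary keys are hypothetical, and the projection assumes, as the slide does, that testers find non-seeded problems at the same rate as seeded ones.

```python
def project_remaining_faults(seeded_total, seeded_found, real_found,
                             threshold_percent=90):
    """Apply the fault seeding stop rule and project escaped problems.

    seeded_total: number of bugs deliberately seeded (e.g. 20)
    seeded_found: number of seeded bugs the testers found (e.g. 18)
    real_found:   unique non-seeded problems found, Z (e.g. 45)
    """
    if seeded_total <= 0 or seeded_found <= 0:
        raise ValueError("need at least one seeded fault planted and found")

    # Stop only if the preset percentage of seeded faults was discovered.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    stop = seeded_found * 100 >= threshold_percent * seeded_total

    # Z / Y = seeded_found / seeded_total, so Y = Z * seeded_total / seeded_found.
    estimated_total = real_found * seeded_total / seeded_found

    return {
        "stop": stop,
        "estimated_total": estimated_total,
        "estimated_remaining": estimated_total - real_found,
    }


# The example from the slide: 20 seeded, 18 found, 45 non-seeded found.
result = project_remaining_faults(seeded_total=20, seeded_found=18, real_found=45)
print(result)
```

With the slide's numbers the rule says testing may stop, the projected total of non-seeded problems is 50, and 5 are projected to remain undetected.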