Improving Logic-Based Testing Jeff Offutt Professor, Software Engineering George Mason University Fairfax,VA USA www.cs.gmu.edu/~offutt/ offutt@gmu.edu Joint research with Gary Kaminski and Paul Ammann Improving Logic-Based Testing, invited to Journal of Software Systems of 33 Outline 1. Motivation 2. Logic-Based Testing 3. Making Mutation Cheaper 4. Strengthening MCDC 5. Making MCDC Obsolete 6. Summary Better Logic-Testing © Kaminski, Ammann, Offutt 2 of 33 Software is a Skin that Surrounds Our Civilization Quote due to Dr. Mark Harman Linköping, January 2011 © Jeff Offutt 3 of 33 Costly Software Failures • “The Economic Impacts of Inadequate Infrastructure for Software Testing” – Inadequate software testing costs the US alone between $22 and $59 billion USD annually – Better testing could cut this amount in half • 2006 : Amazon’s BOGO offer became a double discount • 2007 : Symantec says that most security vulnerabilities are now due to faulty software – And more than half are in web applications • Huge losses due to web application failures – Financial services : $6.5 million per hour (just in USA!) – Credit card sales applications : $2.4 million per hour (in USA) World-wide monetary loss is staggering Better Logic-Testing © Kaminski, Ammann, Offutt 4 of 33 Cost Of Late Testing 60 50 40 30 20 10 0 Fault origin (%) Fault detection (%) Unit cost (X) Software Engineering Institute; Carnegie Mellon University; Handbook CMU/SEI-96-HB-002 Linköping, January 2011 © Jeff Offutt 5 of 33 Outline 1. Motivation 2. Logic-Based Testing 3. Making Mutation Cheaper 4. Strengthening MCDC 5. Making MCDC Obsolete 6. Summary Better Logic-Testing © Kaminski, Ammann, Offutt 6 of 33 Covering Logic Expressions • Logic expressions show up in many situations • They are essential to defining software behavior • Covering logic expressions is required by the US Federal Aviation Administration for safety critical software • Logical expressions can come from many sources – – – – Decisions in programs UML : FSMs and statecharts, activity diagrams Requirements SQL queries • Test designs are subsets of expressions’ truth assignments Better Logic-Testing © Kaminski, Ammann, Offutt 7 of 33 Logic Predicates and Clauses • A predicate is an expression that evaluates to a boolean value • Predicates can contain – boolean variables – non-boolean variables that contain >, <, ==, >=, <=, != – boolean function calls • Internal structure is created by logical operators – ¬ – the negation operator – – the and operator – – the or operator – – the implication operator – – the exclusive or operator – – the equivalence operator • A clause is a predicate with no logical operators Better Logic-Testing © Kaminski, Ammann, Offutt 8 of 33 Power of Logic Testing • Logic expressions encode the behavior of software • Logic expressions define the domain of values for which the software behaves in a certain way • Logic expressions are often – Complicated – Subtle – Easy to get wrong, both in design and implementation Testing logic predicates is a cost-effective way to find many subtle software faults Better Logic-Testing © Kaminski, Ammann, Offutt 9 of 33 Problems Addressed • This (mostly) theoretical talk presents results on three problems with logic predicate testing : 1. Redundant mutation operators for predicate testing 2. Weakness of major logic testing criterion : MCDC 3. A stronger logic test criterion, minimal-MUMCUT • Solutions based on theoretical analysis • Solutions can be immediately used to create better tools and stronger criteria, with very slight additional cost Better Logic-Testing © Kaminski, Ammann, Offutt 10 of 33 Outline 1. Motivation 2. Logic-Based Testing 3. Making Mutation Cheaper 4. Strengthening MCDC 5. Making MCDC Obsolete 6. Summary Better Logic-Testing © Kaminski, Ammann, Offutt 11 of 33 Mutation Testing • Mutation helps testers design tests directly to find common mistakes 1. Modify the software in small, syntactic, ways (mutants) • Replace a variable, replace an operator, delete statements, … 2. Design or find a test to cause each mutant to result in incorrect behavior (killing mutants) • The resulting tests are very strong—will detect most mistakes in the software Fundamental Premise : If the software contains a fault, there will usually be mutants that can only be killed by a test that also detects that fault Better Logic-Testing © Kaminski, Ammann, Offutt 12 of 33 Redundancy in Mutation • Mutation is widely considered to be “expensive” • This expense is largely based on the high number of test requirements—mutants • But Li et al. found that mutation needed fewer tests ! Number of Test Requirements 29 Java Classes 4000 3000 2000 1000 0 Number of Tests 29 Java Classes 500 400 300 200 100 0 Li, Praphamontripong, Offutt, An experimental comparison of four unit test criteria, Mutation 2009 Better Logic-Testing © Kaminski, Ammann, Offutt 13 of 33 Eliminating Redundancy • This is strong evidence that mutation tools use many redundant operators • A more clever mutation system should have less redundancy • Fewer mutants means less work for the tester … cheaper! Better Logic-Testing © Kaminski, Ammann, Offutt 14 of 33 Mutation Predicate Testing • Traditional ROR operator : Each occurrence of a relational operator (<, >, <=, >=, =, !=) is replaced by each other operator, and the expression is replaced by True and False. • Example: – Original predicate: a > b – – – – – – – Mutant 1 : a < b Mutant 2 : a <= b Mutant 3 : a >= b Mutant 4 : a == b Mutant 5 : a != b Mutant 6 : true Mutant 7 : false Better Logic-Testing © Kaminski, Ammann, Offutt 15 of 33 Mutation Predicate Testing A fault hierarchy establishes theoretical dominance relations among faults: If fault A dominates fault B, then any test that detects fault A will by definition detect fault B detects LIF Lau and Yu’s logic fault hierarchy LRF TOF LOF LNF ORF+ TNF ORF. ENF Better Logic-Testing © Kaminski, Ammann, Offutt 16 of 33 ROR Mutant Hierarchy If mutant A dominates mutant B, then any test that detects mutant A will by definition detect mutant B Mutants for a < b false a <= b a != b a == b a>b true Mutants for a >= b true a>b a == b a != b a >= b Better Logic-Testing a <= b false a<b © Kaminski, Ammann, Offutt 17 of 33 A Cheaper ROR Operator Each occurrence of a relational operator (<, >, <=, >=, =, !=) is replaced by operators as follows: • < : <=, !=, False • > : >=, !=, False • <= : <, ==, True • >= : >, ==, True • == : <=, >=, False • != : <, >, True Saves four mutants for each relational operator Better Logic-Testing © Kaminski, Ammann, Offutt 18 of 33 Outline 1. Motivation 2. Logic-Based Testing 3. Making Mutation Cheaper 4. Strengthening MCDC 5. Making MCDC Obsolete 6. Summary Better Logic-Testing © Kaminski, Ammann, Offutt 19 of 33 MCDC • Multiple Condition - Decision Coverage – Required by the USA Federal Aviation Administration to test safety critical software • Each clause (condition) in a predicate (decision) is required to be tested with true and false when the clause “matters” – Changing the value of the clause changes the predicate’s value • Example : p = a (b c) Test 1 for a : a=true, (b c)=false Test 2 for a : a=false, (b c)=false • MCDC considered to thoroughly probe predicates • Most useful when predicate has more than 4 clauses – Otherwise, we can test all 2N truth assignments Better Logic-Testing © Kaminski, Ammann, Offutt 20 of 33 Weakness of MCDC • MCDC was invented in the early 1990s • Research community has invented additional logic criteria since – MCDC is weaker than ROR-mutation • MCDC works at the clause level • ROR works at the relational operator level Solution : Extend MCDC to the relational operator level Better Logic-Testing © Kaminski, Ammann, Offutt 21 of 33 Stronger MCDC • MCDC can be extended to include requirements to kill ROR mutants • Method : – MCDC requires clause c = x op y to have two values, True and False – Cheaper-ROR requires c to have three values : x < y, x == y, x > y – The two MCDC values will always satisfy at least two of the cheaper-ROR requirements – Add one additional test to cover the third Better Logic-Testing © Kaminski, Ammann, Offutt 22 of 33 Cost is Minor • MCDC on a predicate with N clauses requires N+1 .. 2N tests • MCDC + ROR requires N more (2N+1 ... 3N tests) • Algorithm and proof in paper Better Logic-Testing © Kaminski, Ammann, Offutt 23 of 33 Example p=abc a = (a1 < a2), b = (b1 <= b2), c = (c1 == c2) The following test set satisfies MCDC : T = { t1, t2, t3, t4} = { ttf, tft, tff, ftf } Which can be refined with the following value assignments : Test Value R O R t e s t s Better Logic-Testing a1 a2 b1 b2 c1 c2 a b < < t1 TTF 5 6 10 11 21 22 t2 TFT 5 6 11 10 21 21 t3 TFF 5 6 11 10 21 22 t4 FTF 6 5 10 11 21 22 > New (t1) 5 5 10 11 21 22 == New (t1) 5 6 10 10 21 22 New (t2) 5 6 11 10 22 21 © Kaminski, Ammann, Offutt c == > < == > 24 of 33 Outline 1. Motivation 2. Logic-Based Testing 3. Making Mutation Cheaper 4. Strengthening MCDC 5. Making MCDC Obsolete 6. Summary Better Logic-Testing © Kaminski, Ammann, Offutt 25 of 33 Faults in Logic Expressions • Types of common and possible faults in logic expressions have been categorized – LIF : Literal Insertion Fault … An extraneous literal, or clause, is in the predicate – LRF : Literal Replacement Fault … The wrong literal, or clause, is used in the predicate – LOF : Literal Omission Fault … A literal, or clause, that should have been in the predicate was omitted – TNF : Term Negation Fault … A term, or collection of clauses, that should have been negated was not (or vice versa) • Lau and Yu defined nine types of logic faults and explored their relationships Better Logic-Testing © Kaminski, Ammann, Offutt 26 of 33 Lau and Yu’s Fault Hierarchy A fault hierarchy establishes theoretical dominance relations among faults: If fault A dominates fault B, then any test that detects fault A will by definition detect fault B minimal- detects MUMCUT LIF LRF TOF LOF LNF ORF+ TNF ORF. MCDC ENF Better Logic-Testing © Kaminski, Ammann, Offutt 27 of 33 Empirical Studies • The fault hierarchy result is theoretical – That is, MCDC is only guaranteed to detect TNF and ENF faults – But it could detect others by serendipity • Two studies on different sets of logic expressions 1. Software from 5 airplane “black boxes” (Line Replaceable Units) 2. Air traffic collision avoidance system (TCAS) Better Logic-Testing © Kaminski, Ammann, Offutt 28 of 33 Empirical Results • Black boxes – – – – – 20,256 logic expressions from 5 different “black boxes” Expressions originally studied by Chilenski 132 expressions with at least 5 unique literals 125 were simple, so that MCDC can find all faults, leaving 7 MCDCfound found81% 81%ofofthe thepossible possiblefaults faults(140 (140ofof173) MCDC 173) • TCAS – – – 19 predicates (originally studied by Chen, Lau and Yu) Larger predicates, several with 25 clauses MCDC MCDC found found only only 35% 35% of of the the possible possible faults faults (205 (205 of of 580) 580) Better Logic-Testing © Kaminski, Ammann, Offutt 29 of 33 Size Matters Faults found grouped by number of unique literals Unique Literals Better Logic-Testing Total Faults Faults Found Percent 5 71 61 86% 7 523 328 63% 8 316 152 48% 9 1630 650 40% 10 1120 432 39% 11 956 312 33% 12 4870 1421 29% 13 1623 535 33% © Kaminski, Ammann, Offutt 30 of 33 Minimal-MUMCUT vs. MCDC • Minimal-MUMCUT finds significantly more faults than MCDC – Especially in large, complicated, logic expressions – Which is precisely when engineers are most likely to make mistakes! – Also very hard to debug when the software fails • Requires up to four times as many tests • Suggested approach 1. Use all combinations on predicates with less than 5 clauses 2. Use MCDC with 5 to 10 clauses 3. Use minimal-MUMCUT with above 10 clauses Better Logic-Testing © Kaminski, Ammann, Offutt 31 of 33 Outline 1. Motivation 2. Logic-Based Testing 3. Making Mutation Cheaper 4. Strengthening MCDC 5. Making MCDC Obsolete 6. Summary Better Logic-Testing © Kaminski, Ammann, Offutt 32 of 33 Recommendations 1. Replace ROR with cheaper-ROR in mutation tools – No loss in strength – Saves 4 test requirements (mutants) for each relational operator – We are currently implementing in the muJava mutation tool 2. Extend logic criteria such as MCDC with ROR – Logic testing should apply to the relational operator level – Small increase in the number of tests – Large increase in the testing strength 3. Replace MCDC – – – – Better: Replace MCDC with Minimal-MUMCUT + ROR RTCA-DO-178B has been in effect for almost 20 years MCDC was a brilliant idea … in 1992 MCDC was adopted and required without scientific validation • Little theoretical analysis and no experimental studies – We now know that MCDC is poor at discovering crucial faults Better Logic-Testing © Kaminski, Ammann, Offutt 33 of 33