Software Testing Research Group (STRG)

An Evaluation of MC/DC Coverage for Pair-wise Test Cases
By David Anderson

Background
• Software is becoming larger and more complicated, so the cost and time associated with testing are increasing. According to a National Institute of Standards and Technology report, software bugs cost the U.S. economy an estimated $59.5 billion annually.
• The same report indicated that about one third of that amount, or $22.2 billion, could be saved by improving testing infrastructure.
• New research is needed to find more cost-effective ways to test software.

The Proposal
• This project proposes integrating pair-wise testing and MC/DC into a new framework that helps software developers test their products more cost-effectively.
• This part of the project is concerned primarily with measuring the MC/DC coverage achieved by test cases generated with pair-wise testing.
• Future parts include research into how to improve the MC/DC coverage of pair-wise test suites and the development of tools that integrate these two testing techniques into one framework.

Definitions
• Pair-wise testing: a testing technique that exercises interactions between variables by using a small number of tests to cover all possible pairs of parameter values.
• Modified Condition/Decision Coverage (MC/DC): a code coverage criterion that requires every point of entry and exit in a program to be executed at least once, every condition in a decision to take on all possible outcomes at least once, and each condition to be shown to affect that decision’s outcome independently.

Pair-wise example
Consider the Boolean equation: d = (A ∧ B) ∨ C
The following are acceptable test cases for full pair-wise coverage.

      A  B  C
t1    1  1  0
t2    1  0  1
t3    0  1  1
t4    0  0  0

Pair-wise facts
• Pair-wise is a powerful black-box testing technique. Extensive research has been conducted on this technique with outstanding results.
• The number of test cases is significantly smaller than for exhaustive testing. The larger the system being tested, the greater this reduction.

MC/DC example
Consider the Boolean equation: d = (A ∧ B) ∨ C
The following are acceptable test cases for full MC/DC coverage.

      A  B  C  d
t1    1  1  0  1
t2    1  0  1  1
t3    1  0  0  0
t4    0  1  0  0

MC/DC facts
• MC/DC is a white-box testing technique that ensures adequate coverage of decisions in software.
• MC/DC is used in standards DO-178B and DO-178C to ensure adequate testing of safety-critical software. In particular, the FAA has adopted this technique for the testing of airborne software.
• Given an expression with N conditions, on average N+1 test cases are needed to satisfy MC/DC coverage. For comparison, exhaustive testing requires 2^N test cases.

Why combine MC/DC and pair-wise?
Reason 1: Effectiveness of testing Boolean expressions
Pair-wise - weak
MC/DC - strong
Pair-wise testing is not very effective at testing Boolean expressions. This was demonstrated in the paper “Effectiveness of Pair-wise Testing for Software with Boolean Inputs” by W. Ballance, S. Vilkomir, and W. Jenkins. In that study, pair-wise testing was only slightly more effective than random testing.
MC/DC is designed for testing complex Boolean expressions. Many studies of the effectiveness of MC/DC have reported very positive results. In avionics software it is not uncommon to have Boolean expressions with 6+ variables. MC/DC was created specifically to test this kind of complex logic adequately.

Why combine MC/DC and pair-wise?
Reason 2: Cost of implementation
Pair-wise - relatively inexpensive
MC/DC - expensive
Pair-wise testing, and combinatorial testing in general, is relatively cheap to implement. This comes from the black-box nature of the technique. A relatively small set of input data is needed for full pair-wise coverage.
Since MC/DC is a white-box technique, testing of the underlying code is necessary.
In particular, each Boolean expression must have an individual set of test data to achieve full MC/DC coverage. This makes implementing MC/DC very time-consuming and expensive.

Tools used
• Automated Combinatorial Testing for Software (ACTS): A tool developed by NIST that generates combinatorial (in this case pair-wise) test cases for specified input variables.
• CodeCover: An Eclipse plugin developed at the University of Stuttgart that measures various code coverage metrics, including MC/DC. This was the main tool used for measuring coverage.
• CTC++: A commercial tool by Verifysoft for measuring coverage of C/C++ programs. This tool was used to verify the correctness of the data from CodeCover.

Demonstration
For this demonstration, consider the Boolean expression: (A ∧ B) ∨ (C ∨ D)
Part 1: Generating pair-wise test cases with ACTS
Part 2: Measuring MC/DC coverage with CodeCover
Simple program

Note
While the previous example obtained 87.5% MC/DC coverage, the results are not always this good…

The Experiment

Two categories of expressions
• Boolean expressions were categorized as either “Simple” or “Complex”.
• Simple expressions were defined as expressions in which no variable is repeated, while complex expressions contain repeated variables.
• For example:
  Simple: A ∧ (B ∨ C)
  Complex: (A ∧ B) ∨ (¬A ∧ C) ∨ (¬B ∧ ¬C)
• The reasoning behind this was that complex expressions add more points of measurement. In complex expressions, each occurrence of a variable has to be covered, while in simple expressions each variable has only one point to be covered.

Comparison with random test cases
• For each size of expression, one set of pair-wise test cases and three sets of random test cases were generated.
• Random test cases were generated by using a random number generator and converting each number into binary.
• Each set of random test cases had the same number of cases as the pair-wise set for that expression size.
• The goal was to see if pair-wise test cases obtain better levels of MC/DC than randomly generated test cases.

Experiment design
Simple Expressions

Number of   Number of     Pair-wise   Pair-wise    Random   Random
Variables   Expressions   sets        test cases   sets     test cases
3           6             1           4            3        12
4           12            1           6            3        18
5           10            1           6            3        18
6           10            1           7            3        21
7           10            1           7            3        21
8           10            1           8            3        24
total       58            6           38           18       114

Experiment design
Complex Expressions

Number of   Number of     Pair-wise   Pair-wise    Random   Random
Variables   Expressions   sets        test cases   sets     test cases
3           6             1           4            3        12
4           10            1           6            3        18
5           10            1           6            3        18
6           10            1           7            3        21
7           10            1           7            3        21
8           10            1           8            3        24
total       56            6           38           18       114

Results

Comparison based on size
[Charts: MC/DC Coverage (%) by expression size (3-var to 8-var), Pair-wise vs. Random; one panel for Simple Expressions and one for Complex Expressions]

Comparison based on complexity
[Chart: MC/DC Coverage (%) by expression size (3-var to 8-var), Simple vs. Complex]

Summary of Results

           Simple                Complex               Both
           Pair-wise   Random    Pair-wise   Random    Pair-wise   Random
3-var      77.8        75.9      77.7        74.2      77.7        75
4-var      76.0        73.3      80.3        71.8      78          72.6
5-var      70.0        64.7      73.4        66.4      71.7        65.4
6-var      64.2        60.3      62.7        66.6      63.4        63.4
7-var      59.3        50.2      60.4        57.8      59.9        54.0
8-var      57.5        53.9      49.4        52.9      53.5        53.4
Average    67.1        62.5      66.5        64.3      66.8        63.4

Analysis
• The data found in this experiment suggests that pair-wise test cases obtain only slightly better coverage than randomly generated test cases.
• The data between simple and complex expressions did not seem to be significantly different.
• With larger expressions, coverage appeared to slowly decrease.
• Coverage appeared to be highly dependent on the structure of individual expressions, with high variance within sets of data.

Stability of Results
• It should be noted that the range of the results was very high.
• Note the chart to the right.
This is a sample from one set of 4-variable expressions. There is a wide range of coverage levels for both pair-wise and random tests.
[Chart: MC/DC Coverage (%) for items 1-12 of the sample, Pair-wise vs. Random]

What does this mean?
• Because the range and variance of MC/DC coverage levels are so high, these data give a meaningful average only when many expressions of different sizes, complexities, and structures are measured together.
• This average would not be suitable as a predictor of coverage for individual expressions or for software from industry.

Current Work

Analyzing coverage for one large set of test data
• In the previous experiment, each size of expression had different test data.
• In this experiment, one set of test data for 10 Boolean variables is used. Expressions of different sizes, containing different subsets of these 10 variables, are tested and coverage is measured.
• This approach better matches the structure of industry software by using one set of test data for many expressions of different sizes.

Industry Software
• Since the long-term goal of this project is to create a framework that lets developers test their code more effectively, applying this approach to software from industry is important.
• Repositories such as the Software-Artifact Infrastructure Repository contain many examples of software intended for experiments such as this one.
• The results could be very different from those obtained by measuring coverage of individual expressions.

Methods for Improving Coverage
• Now that we have some coverage data, a next step is to look for methods to improve this coverage.
• Possible methods include increasing the interaction strength (3-wise, 4-wise, etc.) or adding test cases to the pair-wise sets based on some criteria.

Any Questions?
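
Backup: checking the two example test sets in code
The pair-wise and MC/DC examples for d = (A ∧ B) ∨ C can be checked with a short script. The following is a minimal sketch, not the tooling used in the experiment: the function names are hypothetical, and it assumes the strict "unique-cause" reading of MC/DC (two tests that differ in exactly one condition and produce different outcomes), evaluated without short-circuiting. The coverage figures in this deck come from CodeCover, whose measure may differ from this strict check, so the numbers are not directly comparable.

```python
from itertools import combinations, product


def decision(a, b, c):
    """The example decision from the slides: d = (A and B) or C."""
    return (a and b) or c


def is_pairwise_complete(tests):
    """True if every pair of parameters takes all four 0/1 value combinations."""
    n = len(tests[0])
    all_combos = set(product((0, 1), repeat=2))
    for i, j in combinations(range(n), 2):
        seen = {(t[i], t[j]) for t in tests}
        if seen != all_combos:
            return False
    return True


def unique_cause_covered(tests, func):
    """Indices of conditions that have a unique-cause independence pair.

    Assumption: a pair of tests qualifies for condition k when the tests
    differ only in position k and the decision outcome differs.
    """
    covered = set()
    for t1, t2 in combinations(tests, 2):
        diff = [k for k in range(len(t1)) if t1[k] != t2[k]]
        if len(diff) == 1 and bool(func(*t1)) != bool(func(*t2)):
            covered.add(diff[0])
    return covered


# Test sets copied from the "Pair-wise example" and "MC/DC example" slides.
pairwise_set = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]
mcdc_set = [(1, 1, 0), (1, 0, 1), (1, 0, 0), (0, 1, 0)]

for name, tests in (("pair-wise", pairwise_set), ("MC/DC", mcdc_set)):
    covered = unique_cause_covered(tests, decision)
    print(name, is_pairwise_complete(tests), sorted(covered))
```

Under this strict check, the MC/DC example set covers all three conditions but is not pair-wise complete (the combination A=0, B=0 never occurs). The pair-wise example set covers every value pair but contains no unique-cause pair at all, because any two of its tests differ in exactly two positions.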