David Pryor
Mutation-Based Testing

- Same basic goal as Code Coverage
  - Evaluate the tests
  - Determine “how much” of the code is exercised
- Mutation testing goes beyond checking which lines of code were executed
  - Goal: distinguish statements that are simply executed from those that are fully tested
- History
  - Theory published in 1971 by a student
  - Computationally infeasible at the time
  - Technological advances: 1990s to present
  - Still not mainstream, but gaining popularity

Basic Procedure

- Requirements
  - Complete working program
  - Tests written and passing
- Make a small change (mutation) to the source code of the program
- Run the tests
- If a test fails:
  - The test “killed” this mutant
- If all tests still pass, either:
  - The mutated code is redundant/unnecessary, or
  - The tests are incomplete

Example test functions

- testAbs1 achieves 100% code coverage
- testAbs1 will never fail
  - Even if abs(-3) == -3
- Mutation testing can detect this
  - testAbs1 will initially pass
  - It will still pass on mutated code
  - It failed to “kill” the mutant
  - The test is inadequate

Example test functions

- testAbs2 passes initially
- On the mutated function, it fails
  - abs(-5) != 5
- The test is supposed to fail here
  - It shows the test is robust enough to catch errors such as this mutation
  - It killed the mutant

Mutation Operators

- Types of mutations to be applied
  - Types of “changes” to make to the code
  - Defined by the testing framework
  - Chosen by the user
- Goals
  - Introduce errors
  - Simulate common coding bugs
  - Ensure the testing of all possible circumstances
- Traditional Mutation Operators
  - Simple
  - Common
  - Included in most mutation frameworks

Dropped Statement

- Removes a single statement from the program
- Finds unnecessary code
- Needs to be selective
  - Many possible mutants

Arithmetic Operator Replacement

- Swaps arithmetic operators
  - +, -, *, /, %
- Some frameworks have set translations
  - + always becomes *
- Others allow any/random swaps

Boolean Relation Replacement

- Swaps boolean relations
  - ==, !=, <, <=, >, >=
- Tightly constrained form
  - Only mutates to/from similar relations
  - < to <=
  - > to >=
  - == to !=

Boolean Expression Replacement

- Replaces an entire expression with true or false
- Finds:
  - Unnecessary code
  - Code paths that aren’t sufficiently tested

Variable Replacement

- Replaces a variable reference with a different variable reference
- The new variable must be accessible and defined in the same scope
  - Not trying to create compiler errors
  - False positives
- Unnecessary/duplicate variables

Non-Traditional Operators

- Lots of operators out there
  - Bit operation / binary operators
  - Shift/rotate replacement
  - Increments / decrements
  - Invert negatives
- Should be able to define your own for customized testing
- Ideally:
  - Minimize false positives
  - Reasonable number of created mutants

Replace inline constants

- Replaces an inline constant
  - Numeric or string
- Non-deterministic
- Tests for responses more than behavior
- Requires many tests that may not be helpful
- Used in Heckle (Ruby)

Object-Oriented Operators

- Encapsulation
  - Changing access modifiers
- Inheritance
  - Hiding variables
  - Method overriding
  - Parent / super actions
- Polymorphism
- Lots of operators here
- Basic idea: change something between the parent and the child in the usage of an object

Concurrency Operators

- Modify sleep/wait calls
- Mutual exclusion and semaphores
  - Change boundaries of the critical section
  - Change synchronization details
- Switch concurrent objects
- Others
- Goal is the same
  - Evaluate the adequacy of the tests

Computation Time

- The biggest problem/roadblock with mutation testing
- Theory has existed for 40+ years
  - Computationally infeasible for use in industry for a long time
- Frameworks can automate almost everything, but:
  - Every mutant has to be run against the entire test suite
  - Many mutants and a large test suite cause immense testing times

Computation Time Estimation

- T = time to run the test suite against the code base
- M = number of mutation operators in use (3 to 20+)
- N = number of mutants per operator (depends on code base size)
- Mutation testing causes time to increase from T to T*M*N
- Minimum time increase of a factor of 30
  - For a small code base and few operators
- Total time increases very quickly
  - This counts only the time needed to run tests, not compilation
- If T = 1 minute, T*M*N can become hours or days
- If T = 1 hour, T*M*N can become weeks or longer

Addressing Computation Time

- Need to spend less time on mutation testing
- Variety of methods that fall into three categories
- Do Less
  - Test fewer mutants and mutation operators
  - Need to be careful: fewer may result in poor tests slipping through
- Do Faster
  - Increase the speed of execution
- Do Smarter
  - Eliminate mutants and mutation operators that do not provide meaningful results

Source code or Byte code?

- Can perform mutations on the source code itself, but:
  - Large code bases result in lots of slow disk reads
  - Have to compile EVERY mutant
- Instead, compile the original source once
  - Mutate the compiled byte code
  - Much faster
  - Can be difficult to back-trace the byte code to the source code to show the mutants that were created

Weak vs. Strong Testing

- Two conditions for killing a mutant
  1. Testing data should cause a different program state for the mutant than for the original
     - For example, a test results in: valid = false, done = true
  2. The difference in state should result in a difference in output and be checked by the test
     - In this case: the test should check ‘result’
- Weak Testing: only satisfy the first condition
- Strong Testing: both conditions

Weak vs. Strong Testing

- Weak assures that the tests cause a difference
  - Not assured that they check the difference
  - Not as thorough
- Strong is ideal
  - Computationally expensive
  - Must always run to the end
  - Weak can stop as soon as it detects a difference in state

Incremental Analysis

- Currently experimental
- Most useful for long-term projects with large code bases, which use mutation testing over and over
- Basic idea: save state and results of tests and code
  - Only re-run those tests and mutants for which relevant code has changed
  - Decide which to skip based on changes made
- Not perfected yet: can be “tricked” by odd behavior

Selective Mutation

- Goal: eliminate some mutants or operators that are not necessary
- Mutants
  - Remove duplicates caused by multiple operators
  - Remove “likely” duplicates: those that will probably cause duplicate results
  - Some amount of error here
- Mutation Operators
  - Some pairs of operators might produce many of the same mutants
  - An operator might produce a subset of mutants from a different operator
  - Detection can be difficult

Coverage Based Test Selection

- Typical mutant only affects a single statement / line
- Typical test suite has many tests that do not execute this line
  - No need to run these tests on the mutant
- Only run tests that exercise the mutated code
- Optimize the running order of tests

Other Problems and Design Considerations

- Equivalent Mutants
  - The mutant is functionally equivalent to the original
  - No test that calls this code could ever distinguish the two
  - “Some mutants can’t be killed”
- Can sometimes be detected automatically and filtered out
  - Not all can be detected
  - Requires human effort to determine if equivalent
  - Not always an easy task

Other Problems and Design Considerations

- Mutant Infinite Loops
  - Some mutations can cause infinite loops in the mutants
  - Example: statement deletion removes the only way out of a loop
- Solution: time the un-mutated code
  - If the mutant takes significantly longer than this time, it is probably in an infinite loop
  - Time out after the un-mutated time, plus some padding

Other Problems and Design Considerations

- Complex Bugs
  - Mutation only makes small changes
  - What if the test cases miss a large, complex error?
  - Mutation doesn’t create complex mutants
- Coupling Effect Hypothesis: tests that detect simple errors are sensitive enough to detect more complex errors
  - Supported by empirical data
  - Testing for “simple” errors helps to find the more complex ones

Fuzz Testing / Fuzzing

- Completely unrelated to mutation testing
  - Often confused with mutation
- Involves generating random data to use as input to a program
  - Test security/vulnerability
  - See if the program crashes
- Fuzzing: modifies input
  - Checks program behavior
- Mutation: modifies source code
  - Checks test case results
  - Deterministic

Tools and Environments

- Java: MuJava, Bacterio, Javalanche, Jumble, PIT, Jester
- C/C++: Insure++
- Fortran: Mothra
- PHP: Mutagenesis
- Ruby: Heckle, Mutant
- C#: Nester

Using Mutation Testing in Industry

- Use mutation from the beginning of a project
- Don’t use it with “dangerous” methods
- Look for high quality tools/environments
  - Speed optimizations
  - Reporting/coverage information
  - Configurable operators

Benefits of Mutation testing

- Evaluation of tests / test suite
  - More than code coverage: are the tests adequate?
  - Mutation Score: % of non-equivalent mutants killed
- Evaluation of code
  - Find unreachable or redundant code
  - Find bugs that were hidden through inadequate tests
- Future: Automatic Test Generation
  - Create dummy tests, use mutation to revise
  - Repeat until all or most non-equivalent mutants are killed
  - Still experimental, but promising

Questions
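Arithmetic Operator Replacement with a set translation (+ always becomes *) can be sketched with Python's standard ast module. The sample function total and the helper load_function are hypothetical names chosen for illustration. Unlike a real framework, this sketch mutates every + in the source at once instead of producing one mutant per location.

```python
import ast


class AddToMult(ast.NodeTransformer):
    """Fixed-translation operator: every binary + becomes *."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # handle nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


SOURCE = "def total(a, b):\n    return a + b\n"


def load_function(source, name, mutate=False):
    """Compile `source` (optionally mutated) and return the function `name`."""
    tree = ast.parse(source)
    if mutate:
        tree = ast.fix_missing_locations(AddToMult().visit(tree))
    namespace = {}
    exec(compile(tree, "<mutation-sketch>", "exec"), namespace)
    return namespace[name]


original = load_function(SOURCE, "total")
mutant = load_function(SOURCE, "total", mutate=True)

print(original(2, 3), mutant(2, 3))  # 5 6: a test using these inputs kills the mutant
print(original(2, 2), mutant(2, 2))  # 4 4: a test using only these inputs cannot
```

The second pair of inputs shows why a passing test does not imply an adequate one: 2 + 2 and 2 * 2 agree, so the mutant survives.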
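The T*M*N estimate from the “Computation Time Estimation” slide can be worked through with a few scenarios. The values of M and N below are illustrative assumptions; only the formula and the range of M come from the slides.

```python
def mutation_minutes(t, m, n):
    """Estimated total test time T*M*N in minutes (test runs only, no compilation)."""
    return t * m * n


# (label, T in minutes, M operators, N mutants per operator); M and N are assumed.
scenarios = [
    ("small code base, few operators", 1, 3, 10),
    ("1-minute suite, many operators", 1, 20, 100),
    ("1-hour suite, many operators", 60, 20, 100),
]

for label, t, m, n in scenarios:
    total = mutation_minutes(t, m, n)
    print(f"{label}: {total} min = {total / 60:.1f} h = {total / 1440:.1f} days")
```

With these assumed inputs the first scenario reproduces the factor of 30, the second runs for roughly a day and a half, and the third for about 83 days.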
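The mutation score from the “Benefits of Mutation testing” slide, the percentage of non-equivalent mutants killed, can be computed as below. The function name and the sample counts are assumptions for illustration.

```python
def mutation_score(killed, total, equivalent):
    """Percentage of non-equivalent mutants that were killed."""
    non_equivalent = total - equivalent
    if non_equivalent <= 0:
        raise ValueError("no non-equivalent mutants to score")
    return 100.0 * killed / non_equivalent


# Assumed run: 60 mutants generated, 10 judged equivalent, 45 killed.
print(mutation_score(killed=45, total=60, equivalent=10))  # 90.0
```

Equivalent mutants are excluded from the denominator because no test could ever kill them.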