Applications of Metamorphic Testing
Chris Murphy
University of Pennsylvania
cdmurphy@cis.upenn.edu
November 17, 2011

About Me
• Lecturer, University of Pennsylvania
• PhD, Computer Science, Columbia Univ, 2010 (Advisor: Prof. Gail Kaiser)
• Research: software testing, CS education
• Seven years experience in software development industry

Problem: When testing a piece of software, how can we know that we’ve created enough test cases?

Problem: When testing a piece of software, how can we create more test cases?

Solution: We use properties of the software to create new test cases from existing ones (particularly those that have not failed).

Result: This approach, known as metamorphic testing, is more effective at testing certain types of software than other approaches.

Today's Talk
• What is metamorphic testing?
• How is metamorphic testing used to find bugs in software?
• How can metamorphic testing be applied to applications that do not have test oracles?
• What are the open research questions related to metamorphic testing?

UPenn CIS 573 Software Engineering
• Graduate-level software engineering course
• Just over 100 students
• Focuses on software maintenance issues:
– Testing
– Formal verification
– Debugging
– Fault Localization
– Refactoring

It's your first day of work as a software engineer at BloobleSoft. Your boss gives you 6,000 lines of code and a specification and says “find the bugs”. Where would you start?

[Diagram labels: Specification; Code; Test Case Generation Strategy; Test Cases]

You've created 4,837 test cases. All of them pass. How do you know when you're done creating test cases?

[Diagram labels: Testing Requirements; Measurable Adequacy Criteria; Test Cases; Desired Adequacy Level; Coverage Level; Acceptable?]

Your test cases are achieving 100% coverage. None of them have found any bugs. Are those test cases useful?

Maybe those test cases can be used to create new test cases. The more test cases, the better. Right?

This is the idea behind “metamorphic testing”.
[Chen et al., HKUST TR CS-98-01, 1998]

A simple example (really, really, really, really)

Let's say you're testing a cosine function. (I know, I know...)

You have a test case {45°, 0.7071}, i.e. cos(45°) = 0.7071

How could we use this test case to create new test cases?

We know that the cosine function exhibits certain properties. That is, if we make certain changes to the input, we can predict the effect on the output. These are referred to as “metamorphic properties”.

What are the metamorphic properties of the cosine function?
• cos(x + 360°) = cos(x). That is, if we add 360 to the input, the output should not change.
• cos(x - 360°) = cos(x)
• cos(x + 180°) = -1 * cos(x)

Given our original test case {45°, 0.7071}, we can create three follow-on test cases.
• Property: cos(x + 360°) = cos(x). Input: 45° + 360° = 405°. Output: cos(45°) = 0.7071
• Property: cos(x - 360°) = cos(x). Input: 45° - 360° = -315°. Output: cos(45°) = 0.7071
• Property: cos(x + 180°) = -1 * cos(x). Input: 45° + 180° = 225°. Output: -1 * cos(45°) = -0.7071

[Diagram: the initial test case {x, f(x)} applies f to x; the transformation t maps x to t(x), and g maps f(x) to g(f(x)); the follow-on test case {t(x), f(t(x))} checks that f(t(x)) = g(f(x))]

A metamorphic property of a function f is a pair of functions (t, g) such that f(t(x)) = g(f(x)) for all inputs x

but wait, isn’t that the same as…

Program invariants: -1 ≤ cos(x) ≤ 1
Describe legal ranges/values of a function's output, but not how the output should react when the input is changed.

Algebraic properties: cos²(x) = 1 - sin²(x)
Describe the relationships between multiple functions, but not the behavior of a single function.
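The three cosine properties above can be checked mechanically. The following is a minimal sketch in Java, assuming the function under test is a degree-based wrapper around Math.cos; the class name, the helper names, and the tolerance EPS are hypothetical choices for illustration and do not come from the talk.

```java
/** Sketch: check three metamorphic properties of cosine (assumed setup, not from the talk). */
class CosineMetamorphicCheck {
    // Tolerance for floating-point comparison (an assumed value for this sketch).
    static final double EPS = 1e-9;

    // Function under test: cosine of an angle given in degrees.
    static double cosDeg(double degrees) {
        return Math.cos(Math.toRadians(degrees));
    }

    // Floating-point results are compared within EPS rather than with ==.
    static boolean approxEqual(double a, double b) {
        return Math.abs(a - b) <= EPS;
    }

    // Returns true if all three properties hold for the initial input x.
    static boolean propertiesHold(double x) {
        double original = cosDeg(x);                                    // f(x)
        boolean plus360 = approxEqual(cosDeg(x + 360.0), original);     // cos(x + 360) = cos(x)
        boolean minus360 = approxEqual(cosDeg(x - 360.0), original);    // cos(x - 360) = cos(x)
        boolean plus180 = approxEqual(cosDeg(x + 180.0), -original);    // cos(x + 180) = -cos(x)
        return plus360 && minus360 && plus180;
    }

    public static void main(String[] args) {
        // Initial test case from the example: x = 45 degrees.
        System.out.println("Properties hold for 45 degrees: " + propertiesHold(45.0));
    }
}
```

Each follow-on input is t(x) and each expected value is g(f(x)), so the check follows the (t, g) definition directly. Comparing within a tolerance is a design choice of this sketch, since follow-on outputs may differ from the original in the last few floating-point digits.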
simple categories of properties

Initial test case: a b c d e f → sum → s
Permute: c e b a f d → sum → s
Add: a+2 b+2 c+2 d+2 e+2 f+2 → sum → s+12
Multiply: 2a 2b 2c 2d 2e 2f → sum → 2s
Include: a b c d e f g → sum → s+g
Exclude: a b c d e → sum → s-f

Initial test case #1: a b c d e f → sum → s
Initial test case #2: g h i j k l → sum → t
Compose: a b c d e f g h i j k l → sum → s+t
Combination of properties: 2h 2d 2a 2k 2e 2g 2i 2c 2l 2f 2b 2j → sum → 2s+2t

Common Metamorphic Properties
• Additive: Increase (or decrease) numerical values by a constant
• Multiplicative: Multiply numerical values by a constant
• Permutative: Randomly permute the order of elements in a set
• Invertive: Create the “opposite” of a set
• Inclusive: Add a new element to a set
• Exclusive: Remove an element from a set
• Compositional: Compose a set
[Murphy et al., SEKE’08]

Other Types of Properties
• Noise-based: include input values that will not affect the output
• Semantically Equivalent: create inputs that have the same “meaning” as the original
• Heuristic: create inputs that are “close” to the original
• Statistical: create inputs that exhibit the same statistical properties

one more example

Consider a function that takes a set of Points (x-y coordinates) and calculates the total distance from the first to the last, via the rest. What are that function’s metamorphic properties?

Okay, I think I get it. But does it really work?!?!

In order to find bugs…
1. The original test case must pass, even though there is a bug.
2. The follow-on test case must fail.

But how can this be?!?!

    /* Return the smallest value in the array */
    int findMin(int A[]) {
        int min = A[0];
        for (int i = 1; i < A.length-1; i++) {
            if (A[i] < min) min = A[i];
        }
        return min;
    }

Test case { {2, 1, 4, 3}, 1}
100% statement coverage! 100% branch coverage! PASS!
Test case { {2, 1, 4, 3}, 1}
Metamorphic property: If we permute the input, the output remains the same.
Follow-on test case: { {4, 2, 3, 1}, 1}
FAIL! (The loop condition i < A.length-1 never examines the last element, so findMin returns 2 instead of 1.)

metamorphic testing in the real world
• Bioinformatics [Chen et al., BMC Bioinf., 2009]
• Machine Learning [Xie et al., JSS, 2011]
• Network Simulation [Chen et al., FTDS, 2009]
• Computer Graphics [Guderlei et al., QSIC, 2007]

what types of applications is metamorphic testing good for?
• Applications that deal primarily with numerical input and numerical output.
• Applications that use graph-based algorithms.
• Compilers. [Zhou et al., ISFST’04]
• Applications that do not have test oracles.

[Diagram labels: Program; Test Input; Actual Output; Specification; Expected Output; Oracle]

what if there is no oracle?
• Machine Learning
• Discrete Event Simulation

[Figure: Length of Stay versus Utilization. Axes: number of beds; units of time; percent utilization. Series: LOS, Doctor Utilization, Nurse Utilization, Triage Utilization, Clerk Utilization]

[Diagram: the actual output f(t(x)) is compared against the expected output g(f(x))]

If f(t(x)) = g(f(x)), that does not mean that the output is correct.
But if f(t(x)) != g(f(x)), then one (or both) of the outputs must be incorrect.

example: RapidMiner

RapidMiner is a suite of machine learning algorithms implemented in Java.

In its NaïveBayes implementation, a confidence level c is reported whenever it classifies an example e using a model M created from a training data set T. That is:
c = Classify(M(T), e)

We expect that if we modify T to include an extra instance of e, then the confidence level should double, since we are twice as certain about the classification. That is:
Classify(M(T+e), e) = 2 * Classify(M(T), e)

Our testing detected violations of this property, thus revealing a bug.
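Returning to the findMin example, the comparison of f(t(x)) against g(f(x)) can be written as a short Java sketch. findMin is reproduced from the slides with its bug left in place; the class name and the helper permutationPropertyHolds are hypothetical and not part of the talk.

```java
/** Sketch: permutative metamorphic check for the findMin example (hypothetical harness). */
class FindMinMetamorphicTest {
    // Function under test, copied from the slides with the bug intact:
    // the bound A.length - 1 means the last element is never examined.
    static int findMin(int[] A) {
        int min = A[0];
        for (int i = 1; i < A.length - 1; i++) {
            if (A[i] < min) min = A[i];
        }
        return min;
    }

    // Permutative property: here t permutes the input and g is the identity,
    // so the two outputs should be equal. No expected value is consulted.
    static boolean permutationPropertyHolds(int[] original, int[] permuted) {
        return findMin(original) == findMin(permuted);
    }

    public static void main(String[] args) {
        int[] original = {2, 1, 4, 3}; // initial test case from the slides
        int[] permuted = {4, 2, 3, 1}; // follow-on test case from the slides

        if (permutationPropertyHolds(original, permuted)) {
            System.out.println("Property holds (this does not prove correctness)");
        } else {
            System.out.println("Property violated: findMin(original) = " + findMin(original)
                    + ", findMin(permuted) = " + findMin(permuted));
        }
    }
}
```

The check only compares two actual outputs with each other, which is why the same pattern remains usable when no oracle is available. A violation shows that at least one of the two outputs is wrong without saying which.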
[Murphy et al., ICST’09]

empirical study

Goal: Show that metamorphic testing is more effective than other techniques at finding bugs in applications without test oracles.

Approach: Use mutation analysis to insert faults into the applications, and see how many are detected using various techniques.

Application domains investigated:
1. Machine Learning (C4.5, MartiRank, Support Vector Machines, PAYL)
2. Discrete Event Simulation (JSim)
3. Information Retrieval (Lucene)
4. Optimization (gaffitter)

Techniques investigated:
1. Metamorphic Testing
2. Runtime Assertion Checking
3. Partial Oracle

Experimental Results
[Chart: % of Mutants Killed by Partial Oracle, Runtime Assertion Checking, and Metamorphic Testing, for C4.5, MartiRank, SVM, PAYL, JSim, Lucene, gaffitter, and TOTAL]
[Murphy et al., ISSTA’09]

can we do better?

That experiment used application-level metamorphic properties. What if we test at the function level, too? And continuously conduct those tests while the software is running?

This is known as Metamorphic Runtime Checking. [Murphy et al., TR CUCS-042-09, 2009]

Experimental Results
[Chart: Partial Oracle, Runtime Assertion Checking, Metamorphic Testing, and MT + MRC, for C4.5, MartiRank, SVM, PAYL, JSim, Lucene, gaffitter, and TOTAL]

research directions

When I run my test, I see that the metamorphic property is violated. Does that mean there's a bug? Well, not necessarily....
How can we know whether the metamorphic properties are sound?

I've used the guidelines to identify as many metamorphic properties as I could. Does that mean that's all of them? Well, not necessarily....
How can we know whether the set of metamorphic properties is complete?
Could we detect (likely) metamorphic properties automatically?

I have a function for which I expect that, if I double the input, the output should be doubled.
Could I verify that property without actually executing the code? Well, probably....
Can metamorphic properties be verified statically?

summary

Metamorphic testing is a method of creating new test cases from existing ones.

It depends heavily on the software’s metamorphic properties, which are often numerical.

Metamorphic testing is particularly effective at finding bugs in applications that do not have test oracles.

thanks!