Application of Metamorphic Testing to Supervised Classifiers

Xiaoyuan Xie, Tsong Yueh Chen
Swinburne University of Technology
Christian Murphy, Gail Kaiser
Columbia University
Joshua Ho
University of Sydney
Baowen Xu
Nanjing University

Background
Many applications in the field of scientific computing depend on machine learning (ML) algorithms
ML applications often do not have test oracles that indicate whether the output is correct for arbitrary input
Applications without test oracles are called “non-testable programs”

Problem Statement
Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques
However, it is difficult to detect subtle (computational) errors for arbitrary inputs

Testing ML Applications
There has been much research into applying ML techniques to software testing, but not the other way around
Reusable real-world data sets and frameworks are available for checking that an ML algorithm predicts well, but not for checking that an implementation works correctly

Observation
If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output
However, it may be possible to know relationships between a set of inputs and the corresponding set of outputs
“Metamorphic Testing” [Chen et al. ’98] is such an approach

Metamorphic Testing
An approach for creating follow-on test cases based on previous test cases
If input x produces output f(x), then the function’s “metamorphic properties” are used to guide a transformation function t, which is applied to produce a new test case input, t(x)
We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution

Metamorphic Testing without an Oracle
When a test oracle exists, we can know whether f(t(x)) is correct
  – Because we have an oracle for f(x)
  – So if f(t(x)) is as expected, then it is correct
When there is no test oracle, f(x) acts as a “pseudo-oracle” for f(t(x))
  – If f(t(x)) is as expected, it is not necessarily correct
  – However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong

Metamorphic Testing Example
Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages
If we permute the values in the text file, the results should stay the same
If we multiply each score by 10, the final results should all be multiplied by 10 as well
These metamorphic properties can be used to create a “pseudo-oracle” for the application

Approach
To apply Metamorphic Testing to such ML applications, we first enumerate the metamorphic relations based on the expected behaviors of a given machine learning algorithm
We then utilize these relations to conduct metamorphic testing on the implementation

Verification & Validation
The scope of which metamorphic properties are necessary may differ between various problems in the domain
Properties that are necessary can be used for verification: “Is the implementation of the algorithm correct?”
Other properties can be used for validation: “Is the algorithm appropriate for solving this problem?”

Research Questions
What are the metamorphic properties of supervised ML classification algorithms?
  – Which can be used for verification?
  – Which can be used for validation?
Can metamorphic testing detect defects in real-world ML applications?

Machine Learning Fundamentals
Data sets consist of a number of samples, each of which has attributes and a label
In the first phase (“training”), a model is generated that attempts to generalize how attributes relate to the label
In the second phase, the model is applied to a previously-unseen data set (“testing” data) with unknown labels to produce a classification of each sample

Algorithms Investigated
k-Nearest Neighbors (kNN)
  – Samples in the testing data are classified by using Euclidean distance to find the k nearest samples in the training data
  – Classification is then done by majority rule
Naïve Bayes Classifier (NBC)
  – For a given sample in the testing data, computes the probability of that sample belonging to each class, assuming conditional independence between the attributes
  – Chooses the class that is most likely

Metamorphic Relations
We identified 11 properties that we would expect all classification algorithms to have
Affine transformation of attributes
Permutation of labels or attributes
Addition of informative or uninformative attributes
Addition of classes by duplicating or re-labeling samples
Removal of classes or samples

Experimental Setup
Applied the approach to implementations in the Weka 3.5.7 toolkit
Initial test cases:
  – Randomly generated values
  – Four attributes (“columns”)
  – 20-50 samples (“rows”)
Metamorphic relations were applied to create 20-300 follow-on test cases

Results

Property    k Nearest Neighbors    Naïve Bayes Classifier
            % violated             % violated
0           0                      7.4
1.1         15.9                   0.3
1.2         0                      0
2.1         0                      0.6
2.2         4.1                    0
3.1         0                      0
3.2         0                      0
4.1         25.3                   0
4.2         0                      3.9
5.1         5.9                    5.6
5.2         2.8                    2.8

[The “Necessary?” column entries for each algorithm are not recoverable here.]

Analysis: kNN
No necessary properties were violated
Issues related to validation:
  – Labels that are non-existent in the training data have a non-zero chance of being selected in classification
  – If two labels are equally likely, the “first” one that is listed is chosen

Analysis: Naïve Bayes
Four necessary properties were violated, indicating defects in the implementation
  – Loss of precision related to use of the “double” datatype in Java
  – Laplace Accuracy used to determine probabilities; thus, labels that did not appear in training data have non-zero probability

Suggestions
We suggest using the “BigDecimal” class instead of the “double” datatype
Laplace Accuracy is appropriate for the attributes but not for the labels
Use of Laplace Accuracy should be set as an option

Future Work
Apply the testing approach to other domains that depend on ML, such as scientific computing
Further investigation of testing “non-testable programs”
Measure the effectiveness of the approach in empirical studies

Summary
Metamorphic testing is easy to implement and automate
We were able to devise fault-revealing properties even with just a basic understanding of the ML algorithms
Metamorphic testing can be used for both verification and validation

Related Work
Applying MT to non-testable programs in other domains
General properties for use in MT
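
The test-score example from the “Metamorphic Testing Example” slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`summarize`, `check_relations`), the in-memory list standing in for the text file, and the use of the population standard deviation are all choices made here, not details given in the slides.

```python
import math
import random
import statistics


def summarize(scores_by_student):
    """Return (per-student averages, standard deviation of those averages).

    Assumption: the population standard deviation is used; the slides do not
    say which variant the program computes.
    """
    averages = [statistics.fmean(scores) for scores in scores_by_student]
    return averages, statistics.pstdev(averages)


def _all_close(xs, ys):
    """True when both lists have the same length and matching values."""
    return len(xs) == len(ys) and all(
        math.isclose(x, y, rel_tol=1e-9) for x, y in zip(xs, ys)
    )


def check_relations(scores_by_student, rng):
    """Run the program on follow-on inputs and compare with the original run."""
    base_avgs, base_sd = summarize(scores_by_student)

    # Relation 1: permuting the students, and the scores within each student,
    # should leave the set of averages and their deviation unchanged.
    shuffled = [list(scores) for scores in scores_by_student]
    rng.shuffle(shuffled)
    for scores in shuffled:
        rng.shuffle(scores)
    perm_avgs, perm_sd = summarize(shuffled)
    permutation_ok = _all_close(sorted(base_avgs), sorted(perm_avgs)) and (
        math.isclose(base_sd, perm_sd, rel_tol=1e-9)
    )

    # Relation 2: multiplying every score by 10 should multiply every
    # result by 10 as well.
    scaled = [[10 * s for s in scores] for scores in scores_by_student]
    scaled_avgs, scaled_sd = summarize(scaled)
    scaling_ok = _all_close([10 * a for a in base_avgs], scaled_avgs) and (
        math.isclose(10 * base_sd, scaled_sd, rel_tol=1e-9)
    )

    return {"permutation": permutation_ok, "scaling": scaling_ok}
```

Neither check needs to know the correct average or deviation for the original input. The first run serves as the pseudo-oracle for the follow-on runs, so a `False` entry means at least one of the two runs is wrong.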
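
Two of the relations listed on the “Metamorphic Relations” slide, permutation of attributes and affine transformation of attributes, can be tried against a toy kNN classifier. The sketch below is not Weka's implementation; the classifier, the helper names, and the reading of “affine transformation” as one positive scale and shift applied to every attribute in both training and testing data are all assumptions made for illustration. Integer data keeps the distance comparisons exact, so any mismatch would point to the classifier rather than to rounding.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, test_x, k=3):
    """Classify each test sample by majority vote of its k nearest neighbors."""
    predictions = []
    for sample in test_x:
        # Squared Euclidean distance orders neighbors the same way as
        # Euclidean distance and stays exact for integer inputs.
        order = sorted(
            range(len(train_x)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], sample)),
        )
        votes = Counter(train_y[i] for i in order[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def permute_attributes(rows, order):
    """Reorder the attribute columns of every sample."""
    return [[row[j] for j in order] for row in rows]


def affine(rows, scale, shift):
    """Apply scale * value + shift to every attribute value."""
    return [[scale * value + shift for value in row] for row in rows]


def check_knn_relations(seed=0):
    """Build a random initial test case and compare follow-on predictions."""
    rng = random.Random(seed)
    # Four attributes and a few dozen samples, loosely mirroring the
    # experimental setup described in the slides.
    train_x = [[rng.randint(0, 100) for _ in range(4)] for _ in range(30)]
    train_y = [rng.choice(["A", "B", "C"]) for _ in range(30)]
    test_x = [[rng.randint(0, 100) for _ in range(4)] for _ in range(10)]
    base = knn_predict(train_x, train_y, test_x)

    order = [2, 0, 3, 1]
    permuted = knn_predict(
        permute_attributes(train_x, order), train_y, permute_attributes(test_x, order)
    )
    transformed = knn_predict(affine(train_x, 3, 7), train_y, affine(test_x, 3, 7))

    return {
        "attribute_permutation": permuted == base,
        "affine_transformation": transformed == base,
    }
```

Relations that change the order of labels or samples are deliberately left out here: as the kNN analysis slide notes, tie-breaking by listing order can make such relations fail even when the implementation matches the algorithm.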