Application of
Metamorphic Testing to
Supervised Classifiers
Xiaoyuan Xie, Tsong Yueh Chen
Swinburne University of Technology
Christian Murphy, Gail Kaiser
Columbia University
Joshua Ho
University of Sydney
Baowen Xu
Nanjing University
1
Background

Many applications in the field of scientific
computing depend on machine learning
(ML) algorithms

ML applications often do not have test
oracles that indicate whether the output is
correct for arbitrary input

Applications without test oracles are called
“non-testable programs”
2
Problem Statement

Oracles may exist for a limited subset of
the input domain, and gross errors (e.g.
crashes) can be detected with certain
inputs or techniques

However, it is difficult to detect subtle
(computational) errors for arbitrary
inputs
3
Testing ML Applications

There has been much research into
applying ML techniques to software
testing, but not the other way around

Reusable real-world data sets and
frameworks are available for checking that
an ML algorithm predicts well, but not
for checking that an implementation
works correctly
4
Observation
If there is no oracle in the general case,
we cannot know the expected relationship
between a particular input and its output

However, it may be possible to know
relationships between a set of inputs and
the corresponding set of outputs

“Metamorphic Testing” [Chen et al. ’98] is
such an approach
5
Metamorphic Testing

An approach for creating follow-on test
cases based on previous test cases

If input x produces output f(x), then the
function’s “metamorphic properties” are used
to guide a transformation function t, which is
applied to produce a new test case input, t(x)

We can then predict the expected value of
f(t(x)) based on the value of f(x) obtained
from the actual execution
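
A minimal sketch of this idea in Python, assuming nothing beyond the definitions above. The helper name `check_metamorphic_relation` and the sine example are illustrative and do not come from the slides; the example relies on the identity sin(x) = sin(π − x), so the predicted value of f(t(x)) is simply f(x).

```python
import math

def check_metamorphic_relation(f, t, expected_from, x, close=math.isclose):
    """Run f on x and on the follow-on input t(x), then compare
    f(t(x)) with the value predicted from the observed f(x)."""
    source_output = f(x)                      # f(x), from actual execution
    follow_on_output = f(t(x))                # f(t(x)), the follow-on test case
    predicted = expected_from(source_output)  # prediction based on f(x)
    return close(follow_on_output, predicted)

# Hypothetical example: sin(x) = sin(pi - x), so f(t(x)) should equal f(x).
holds = check_metamorphic_relation(
    f=math.sin,
    t=lambda x: math.pi - x,
    expected_from=lambda y: y,
    x=0.3,
)

# A deliberately faulty "sine" breaks the relation and is detected.
faulty_holds = check_metamorphic_relation(
    f=lambda x: math.sin(x) + 0.01 * x,
    t=lambda x: math.pi - x,
    expected_from=lambda y: y,
    x=0.3,
)
print(holds, faulty_holds)
```

Note that the check never needs the true value of sin(0.3); it only compares two executions against each other.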
6
Metamorphic Testing without an Oracle

When a test oracle exists, we can know whether
f(t(x)) is correct
– Because we have an oracle for f(x)
– So if f(t(x)) is as expected, then it is correct

When there is no test oracle, f(x) acts as a
“pseudo-oracle” for f(t(x))
– If f(t(x)) is as expected, it is not necessarily
correct
– However, if f(t(x)) is not as expected, either f(x)
or f(t(x)) (or both) is wrong
7
Metamorphic Testing Example

Consider a program that reads a text file of test
scores for students in a class, and computes the
averages and the standard deviation of the
averages

If we permute the values in the text file, the
results should stay the same

If we multiply each score by 10, the final
results should all be multiplied by 10 as well

These metamorphic properties can be used to
create a “pseudo-oracle” for the application
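
The two properties above can be sketched in Python as follows. The function `class_statistics`, the in-memory list of rows (standing in for the text file), and the sample scores are assumptions made for illustration; comparisons use a tolerance because floating-point results may differ in the last digits.

```python
import math
import random
import statistics

def class_statistics(rows):
    """rows: one list of scores per student.
    Returns (per-student averages, standard deviation of those averages)."""
    averages = [statistics.fmean(row) for row in rows]
    return averages, statistics.pstdev(averages)

def close_lists(a, b):
    """True if both lists have equal length and pairwise-close values."""
    return len(a) == len(b) and all(math.isclose(x, y) for x, y in zip(a, b))

# Hypothetical source test case.
rows = [[70, 85, 90], [55, 60, 65], [88, 92, 79], [40, 75, 58]]
averages, spread = class_statistics(rows)

# Property 1: permuting the students should not change the results.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)
perm_averages, perm_spread = class_statistics(shuffled)
permutation_holds = (
    close_lists(sorted(averages), sorted(perm_averages))
    and math.isclose(spread, perm_spread)
)

# Property 2: multiplying every score by 10 should multiply all results by 10.
scaled = [[10 * score for score in row] for row in rows]
scaled_averages, scaled_spread = class_statistics(scaled)
scaling_holds = (
    close_lists([10 * a for a in averages], scaled_averages)
    and math.isclose(10 * spread, scaled_spread)
)
print(permutation_holds, scaling_holds)
```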
8
Approach

To apply Metamorphic Testing to such ML
applications, we first enumerate the
metamorphic relations based on the
expected behaviors of a given machine
learning algorithm

We then utilize these relations to conduct
metamorphic testing on the
implementation
9
Verification & Validation

Which metamorphic properties are
necessary may differ from problem to
problem within the domain

Properties that are necessary can be used
for verification: “Is the implementation
of the algorithm correct?”

Other properties can be used for
validation: “Is the algorithm appropriate
for solving this problem?”
10
Research Questions

What are the metamorphic properties of
supervised ML classification algorithms?
– Which can be used for verification?
– Which can be used for validation?

Can metamorphic testing detect defects in
real-world ML applications?
11
Machine Learning Fundamentals

Data sets consist of a number of samples, each
of which has attributes and a label

In the first phase (“training”), a model is
generated that attempts to generalize how
attributes relate to the label

In the second phase, the model is applied to a
previously-unseen data set (“testing” data) with
unknown labels to produce a classification of
each sample
12
Algorithms Investigated

k-Nearest Neighbors (kNN)
– Samples in the testing data are classified by
using Euclidean distance to find the k nearest
samples in the training data
– Classification is then done by majority rule

Naïve Bayes Classifier (NBC)
– For a given sample in the testing data,
computes the probability of that sample
belonging to each class, assuming conditional
independence between the attributes
– Chooses the class that is most likely
13
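
As a rough illustration of the kNN description above, here is a minimal Python sketch. It is not Weka's implementation; the function name `knn_classify`, the default k = 3, and the toy data are assumptions chosen for the example.

```python
import math
from collections import Counter

def knn_classify(training, sample, k=3):
    """training: list of (attributes, label) pairs.
    Finds the k training samples nearest to `sample` by Euclidean
    distance and returns the majority label among them."""
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated clusters.
training = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
near_origin = knn_classify(training, (0.5, 0.5))
near_five = knn_classify(training, (5.5, 5.5))
print(near_origin, near_five)
```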
Metamorphic Relations

We identified 11 properties that we would
expect all classification algorithms to have

– Affine transformation of attributes
– Permutation of labels or attributes
– Addition of informative or uninformative attributes
– Addition of classes by duplicating or re-labeling samples
– Removal of classes or samples
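
Two of these relations can be sketched against a toy kNN classifier in Python. Everything here is an assumption for illustration: the classifier, the random data, and in particular the choice to apply the same affine map (2v + 3) to every attribute, which keeps the distance ordering intact for a classifier that does not normalize its attributes.

```python
import math
import random
from collections import Counter

def knn_classify(training, sample, k=3):
    """Majority label among the k nearest training samples (Euclidean)."""
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], sample))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def predict_all(training, testing, k=3):
    return [knn_classify(training, sample, k) for sample in testing]

def transform_data(training, testing, transform):
    """Apply the same attribute transformation to training and testing data."""
    return ([(transform(a), y) for a, y in training],
            [transform(s) for s in testing])

rng = random.Random(1)
training = [([rng.uniform(0, 10) for _ in range(4)], rng.choice("AB"))
            for _ in range(30)]
testing = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(10)]
baseline = predict_all(training, testing)

# Permutation of attributes: reorder the columns identically everywhere.
def permute(attrs, order=(2, 0, 3, 1)):
    return [attrs[i] for i in order]

# Affine transformation: the same map v -> 2v + 3 on every attribute.
def affine(attrs):
    return [2.0 * v + 3.0 for v in attrs]

permutation_holds = predict_all(*transform_data(training, testing, permute)) == baseline
affine_holds = predict_all(*transform_data(training, testing, affine)) == baseline
print(permutation_holds, affine_holds)
```

With two labels and k = 3 there are no voting ties, so any mismatch with the baseline would point at the classifier rather than at tie-breaking.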




14
Experimental Setup

Applied the approach to implementations
in the Weka 3.5.7 toolkit

Initial test cases:
– Randomly generated values
– Four attributes (“columns”)
– 20-50 samples (“rows”)

Metamorphic relations were applied to
create 20-300 follow-on test cases
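
A minimal sketch of how an initial test case of this shape might be generated in Python. The function name `random_dataset`, the value range, and the label set are assumptions; the slides specify only random values, four attributes, and 20-50 samples.

```python
import random

def random_dataset(rng, n_attributes=4, min_rows=20, max_rows=50,
                   labels=("A", "B", "C")):
    """Return a list of (attributes, label) rows with random contents.
    The number of rows is drawn uniformly from [min_rows, max_rows]."""
    n_rows = rng.randint(min_rows, max_rows)
    return [([rng.random() for _ in range(n_attributes)], rng.choice(labels))
            for _ in range(n_rows)]

# A fixed seed keeps the generated test case reproducible.
dataset = random_dataset(random.Random(0))
print(len(dataset), len(dataset[0][0]))
```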
15
Results

            k Nearest Neighbors       Naïve Bayes Classifier
Property    Necessary?  % violated    Necessary?  % violated
0                       0                         7.4
1.1                     15.9                      0.3
1.2                     0                         0
2.1                     0                         0.6
2.2                     4.1                       0
3.1                     0                         0
3.2                     0                         0
4.1                     25.3                      0
4.2                     0                         3.9
5.1                     5.9                       5.6
5.2                     2.8                       2.8
16
Analysis: kNN

No necessary properties were violated

Issues related to validation:
– Labels that are non-existent in the training data
have a non-zero chance of being selected in
classification
– If two labels are equally likely, the “first” one
that is listed is chosen
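
The tie-breaking issue can be illustrated with a small Python sketch. This is an assumed model of "first listed wins", not Weka's code; `majority_label` and the two-neighbor example are hypothetical. It shows why swapping the class labels can fail to swap the prediction.

```python
from collections import Counter

def majority_label(neighbor_labels, label_order):
    """Most frequent label among the neighbors; on a tie, the label
    that appears first in label_order wins (max keeps the first maximum)."""
    votes = Counter(neighbor_labels)
    return max(label_order, key=lambda label: votes[label])

label_order = ["A", "B"]
neighbors = ["A", "B"]          # one vote each: a tie
swap = {"A": "B", "B": "A"}     # permutation of the class labels

source = majority_label(neighbors, label_order)
follow_on = majority_label([swap[l] for l in neighbors], label_order)

# If the relation held, the follow-on prediction would be the swapped label.
expected = swap[source]
relation_holds = follow_on == expected
print(source, follow_on, expected, relation_holds)
```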
17
Analysis: Naïve Bayes

Four necessary properties were violated,
indicating defects in the implementation
– Loss of precision related to use of the
“double” datatype in Java
– Laplace Accuracy used to determine
probabilities; thus, labels that did not appear
in training data have non-zero probability
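
To see why such smoothing gives unseen labels a non-zero probability, consider this Python sketch of the standard add-one (Laplace) estimate applied to class priors. The formula (count + 1) / (N + K) is the textbook estimator and is assumed here; the slides do not show the exact computation used in Weka, and `laplace_priors` is a hypothetical name.

```python
from collections import Counter
from fractions import Fraction

def laplace_priors(training_labels, all_labels):
    """Add-one estimate of each class prior:
    (count of label + 1) / (number of samples + number of labels)."""
    counts = Counter(training_labels)
    total = len(training_labels) + len(all_labels)
    return {label: Fraction(counts[label] + 1, total) for label in all_labels}

# Label "C" is declared but never appears in the training data.
priors = laplace_priors(["A", "A", "B"], ["A", "B", "C"])
print(priors["C"])
```

Because "C" receives a prior of 1/6 rather than 0, it can still be selected for a testing sample.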
18
Suggestions

We suggest using the “BigDecimal” class
instead of the “double” datatype

Laplace Accuracy is appropriate for the
attributes but not for the labels

Use of Laplace Accuracy should be set as
an option
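
The precision suggestion can be illustrated by analogy in Python, where `float` plays the role of Java's “double” and `decimal.Decimal` plays the role of “BigDecimal”. The values and the `accumulate` helper are assumptions for the sketch; it shows that merely reordering the same terms changes a binary floating-point result but not a decimal one.

```python
from decimal import Decimal
from functools import reduce
from operator import add

def accumulate(values):
    """Add the values strictly left to right."""
    return reduce(add, values)

terms = ["0.1", "0.2", "0.3"]

float_forward = accumulate([float(t) for t in terms])
float_reversed = accumulate([float(t) for t in reversed(terms)])

decimal_forward = accumulate([Decimal(t) for t in terms])
decimal_reversed = accumulate([Decimal(t) for t in reversed(terms)])

# Binary floating point: the order of the terms changes the result.
float_order_matters = float_forward != float_reversed
# Decimal arithmetic: both orders give exactly 0.6.
decimal_order_matters = decimal_forward != decimal_reversed
print(float_order_matters, decimal_order_matters)
```

A permutation-based metamorphic relation would flag the first case as a violation even though the underlying algorithm is unchanged.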

19
Future Work

Apply the testing approach to other
domains that depend on ML, such as
scientific computing

Further investigation of testing “non-testable
programs”

Measure the effectiveness of the approach
in empirical studies
20
Summary
Metamorphic testing is easy to implement
and automate

We were able to devise fault-revealing
properties even with just a basic
understanding of the ML algorithms

Metamorphic testing can be used for both
verification and validation
21
Related Work

Applying MT to non-testable programs in
other domains

General properties for use in MT
23