Automatic System Testing of
Programs without Test Oracles
Christian Murphy, Kuang Shen, Gail Kaiser
Columbia University
Problem Statement

- Some applications (e.g. machine learning, simulation) do not have test oracles that indicate whether the output is correct for arbitrary input
- Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques
- However, it is difficult to detect subtle (computational) errors for arbitrary inputs in such “non-testable programs”

Observation
- If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output
- However, it may be possible to know relationships between sets of inputs and the corresponding set of outputs
- “Metamorphic Testing” [Chen et al. ’98] is such an approach

Metamorphic Testing
- An approach for creating follow-up test cases based on previous test cases
- If input x produces output f(x), then the function’s “metamorphic properties” are used to guide a transformation function t, which is applied to produce a new test case input, t(x)
- We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution (a minimal sketch follows)

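A minimal sketch of this idea (the function sin and the property sin(π − x) = sin(x) are illustrative examples chosen here, not taken from the slides):

```python
import math

def f(x):
    return math.sin(x)       # the program under test

def t(x):
    return math.pi - x       # transformation guided by the property sin(pi - x) == sin(x)

def metamorphic_test(x):
    original = f(x)          # output from the actual execution
    followup = f(t(x))       # output on the transformed input
    # The metamorphic property predicts that the two outputs are equal.
    assert math.isclose(original, followup), "metamorphic property violated"

metamorphic_test(0.7)
```
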
Metamorphic Testing without an Oracle

- When a test oracle exists, we can know whether f(t(x)) is correct
  - Because we have an oracle for f(x)
  - So if f(t(x)) is as expected, then it is correct
- When there is no test oracle, f(x) acts as a “pseudo-oracle” for f(t(x))
  - If f(t(x)) is as expected, it is not necessarily correct
  - However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong

Metamorphic Testing Example
- Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages
- If we permute the values in the text file, the results should stay the same
- If we multiply each score by 10, the final results should all be multiplied by 10 as well
- These metamorphic properties can be used to create a “pseudo-oracle” for the application (sketched below)

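A small sketch of these two properties in code; the program is simplified to a function over in-memory scores, and the data values are illustrative:

```python
import math
import random
import statistics

def grade_stats(scores_by_student):
    """Compute each student's average and the standard deviation of the averages."""
    averages = [sum(s) / len(s) for s in scores_by_student]
    return averages, statistics.pstdev(averages)

scores = [[90, 80, 70], [60, 85, 95], [75, 75, 75]]
avgs, sd = grade_stats(scores)

# Property 1: permuting the input should leave the results unchanged.
avgs_p, sd_p = grade_stats(random.sample(scores, len(scores)))
assert sorted(avgs_p) == sorted(avgs) and math.isclose(sd_p, sd)

# Property 2: multiplying every score by 10 should multiply all results by 10.
avgs_m, sd_m = grade_stats([[10 * x for x in s] for s in scores])
assert all(math.isclose(m, 10 * a) for m, a in zip(avgs_m, avgs))
assert math.isclose(sd_m, 10 * sd)
```
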
Limitations of Metamorphic Testing
- Manual transformation of the input data or comparison of output can be laborious and error-prone
- Comparison of outputs is not always possible with tools like diff when they are not expected to be “exactly” the same

Our Solution
- Automated Metamorphic System Testing
- Tester needs to:
  - Specify the application’s metamorphic properties
  - Configure the testing framework
  - Run the application with its test input
- Framework takes care of automatically:
  - Transforming program input data
  - Executing multiple instances of the application with different transformed inputs in parallel
  - Comparing outputs of the executions (see the sketch below)

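A rough sketch of what such a framework automates, assuming a command-line application and hypothetical property objects; this is not the actual Amsterdam API:

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor

def run_app(input_path):
    """Run the application under test on one input file and capture its output."""
    return subprocess.run(["./app", input_path], capture_output=True, text=True).stdout

def metamorphic_system_test(input_path, properties):
    baseline = run_app(input_path)
    # 1. Transform the program's input data once per metamorphic property.
    transformed = [prop.transform_input(input_path) for prop in properties]
    # 2. Execute the additional invocations in parallel.
    with ProcessPoolExecutor() as pool:
        outputs = list(pool.map(run_app, transformed))
    # 3. Compare each output against the baseline using the property's comparison.
    return [prop.compare(baseline, out) for prop, out in zip(properties, outputs)]
```
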
Model
Amsterdam: Automated Metamorphic System Testing Framework
- Metamorphic properties are specified in XML
  - Input transformation
  - Runtime options
  - Output comparison
- Framework provides out-of-the-box support for numerous transformation and comparison functions but is extensible to support custom operations
- Additional invocations are executed in parallel in separate sandboxes that have their own virtual execution environment [Osman et al. OSDI’02]

Empirical Studies
- To measure the effectiveness of the approach, we selected three real-world applications from the domain of supervised machine learning
  - Support Vector Machines (SVM): vector-based classifier
  - C4.5: decision tree classifier
  - MartiRank: ranking application

Methodology (1)
- Mutation testing was used to seed defects into each application
  - Comparison operators were reversed
  - Math operators were changed
  - Off-by-one errors were introduced (examples sketched below)
- For each program, we created multiple variants, each with exactly one mutation
- Weak mutants (that did not affect the final output) were discarded, as were those that caused outputs that were obviously wrong

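For illustration only (these are not the actual mutants from the study), the three mutation operators might look like this on a simple function:

```python
def total_positive(xs):
    s = 0
    for i in range(len(xs)):        # correct loop bounds
        if xs[i] > 0:               # correct comparison
            s = s + xs[i]           # correct math operator
    return s

# One-change-each mutants corresponding to the operators above:
#   comparison reversed:   if xs[i] < 0:
#   math operator changed: s = s - xs[i]
#   off-by-one introduced: for i in range(len(xs) - 1):
```
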
Methodology (2)
- Each variant (containing one mutation) acted as a pseudo-oracle for itself:
  - Program was run to produce an output with the original input dataset
  - Metamorphic properties applied to create new input datasets
  - Program run on new inputs to create new outputs
  - If outputs not as expected, the mutant had been killed (i.e. the defect had been detected)

Metamorphic Properties
- Each application had four metamorphic properties specified, based on:
  - Permuting the order of the elements in the input data set
  - Multiplying the elements by a positive constant
  - Adding a constant to the elements
  - Negating the values of the elements in the input data (these transformations are sketched below)
- Testing was conducted using our implementation of the Amsterdam framework

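The four transformations can be sketched as simple functions over a list of numeric elements; the constants here are arbitrary choices for illustration:

```python
import random

def permute(xs):
    return random.sample(xs, len(xs))    # permute the order of the elements

def multiply(xs, k=10):
    return [k * x for x in xs]           # multiply by a positive constant

def add(xs, c=5):
    return [x + c for x in xs]           # add a constant

def negate(xs):
    return [-x for x in xs]              # negate the values
```
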
SVM Results
- Permuting the input was very effective at killing off-by-one mutants
- Many functions in SVM perform calculations on a set of numbers
- Off-by-one mutants caused some element of the set to be omitted
- By permuting, a different number would be omitted
- The results of the calculations would be different, revealing the defect (illustrated below)

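A toy version of this effect (not SVM itself): an off-by-one mutant drops one element from the set it sums over, and permuting the input changes which element is dropped, so the two outputs disagree.

```python
def mutant_sum(xs):
    return sum(xs[:-1])              # off-by-one mutant: omits one element of the set

data = [3, 1, 4, 1, 5]
permuted = [5, 1, 4, 1, 3]           # same set, different order

# The property predicts equal sums; the mutant omits 5 in one run and 3 in the
# other, so the results differ and the defect is revealed.
assert mutant_sum(data) != mutant_sum(permuted)
```
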
C4.5 Results
- Negating the input was very effective
- C4.5 creates a decision tree in which nodes contain clauses like “if attr_n > α then class = C”
- If the data set is negated, those nodes should change to “if attr_n ≤ -α then class = C”, i.e. both the operator and the sign of α
- In most cases, only one of the changes occurred (a sketch of this check follows)

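A sketch of checking that property on a single decision node, with nodes modeled as (operator, threshold) pairs; this is a simplification of C4.5’s actual tree representation:

```python
def satisfies_negation_property(original_node, negated_node):
    """Negating the input data should flip the operator and negate the threshold."""
    op, alpha = original_node
    op_neg, alpha_neg = negated_node
    return op_neg == {">": "<=", "<=": ">"}[op] and alpha_neg == -alpha

assert satisfies_negation_property((">", 2.5), ("<=", -2.5))       # both changes made
assert not satisfies_negation_property((">", 2.5), (">", -2.5))    # only one change: defect exposed
```
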
MartiRank Results
- Permuting and negating were effective at killing comparison operator mutants
- MartiRank depends heavily on sorting
- Permuting and negating change which numbers get compared and what the result should be, thus inducing differences in the final sorted list

Summary of Results
- 143 mutants killed out of 182 (78%)
- Permuting or negating the inputs proved to be effective techniques for killing mutants because of the mathematical nature of the applications
- Multiplying and adding were not effective, possibly because of the nature of the mutants we inserted

Benefits of Automation
- For SVM, all of the metamorphic properties called for the outputs to be the same as the original
- But in practice we knew they wouldn’t be exactly the same
  - Partly due to floating point calculations
  - Partly due to approximations in the implementation
- We could use Heuristic Metamorphic Testing to allow for outputs that were considered “close enough” (either semantically or to within some tolerance), as sketched below

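A minimal sketch of such a “close enough” comparison for numeric outputs, using a relative tolerance; the thresholds actually used in the studies are not specified here:

```python
import math

def close_enough(expected, actual, rel_tol=1e-6):
    """Compare two numeric output vectors within a tolerance instead of exactly."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol) for e, a in zip(expected, actual))

# Exact comparison (e.g. diff) would report a spurious failure here; the
# heuristic comparison does not.
assert close_enough([0.333333333, 1.0], [0.333333334, 1.0])
```
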
Effect on Testing Time
- Without parallelism, metamorphic testing introduces at least 100% overhead since the application must be run at least twice
- In our experiments on a multi-core machine, the only overhead came from creating the “sandbox” and comparing the results
  - Less than one second for a 10MB input file

Limitations and Future Work
- Framework Implementation
  - The “sandbox” only includes in-process memory and the file system, but not anything external to the system
  - The framework does not yet address fault localization
- Approach
  - The approach requires some knowledge of the application to determine the metamorphic properties in the first place
  - Need to investigate applicability to other domains
  - Further applicability of Heuristic Metamorphic Testing to non-deterministic applications

Contributions
- A testing technique called Automated Metamorphic System Testing that facilitates testing of non-testable programs
- An implementation called Amsterdam
- Empirical studies demonstrating the effectiveness of the approach

Automatic System Testing of
Programs without Test Oracles
Chris Murphy
cmurphy@cs.columbia.edu
http://psl.cs.columbia.edu/metamorphic
Related Work
- Pseudo-oracles [Davis & Weyuker ACM’81]
- Testing non-testable programs [Weyuker TCJ’82]
- Overview of approaches [Baresi and Young ’01]
  - Embedded assertion languages
  - Extrinsic interface contracts
  - Pure specification languages
  - Trace checking & log file analysis
- Using metamorphic testing [Chen et al. JIST’02; others]

Related Work
- Applying Metamorphic Testing to “non-testable programs”
  - Chen et al. ISSTA’02 (among others)
- Automating metamorphic testing
  - Gotlieb & Botella COMPSAC’03

Categories of Metamorphic Properties
- Additive: Increase (or decrease) numerical values by a constant
- Multiplicative: Multiply numerical values by a constant
- Permutative: Randomly permute the order of elements in a set
- Invertive: Reverse the order of elements in a set
- Inclusive: Add a new element to a set
- Exclusive: Remove an element from a set
- Others…
- ML apps such as ranking, classification, and anomaly detection exhibit these properties [Murphy SEKE’08]

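The categories not sketched earlier (invertive, inclusive, exclusive) can be illustrated the same way; the element and index choices below are arbitrary:

```python
def invert(xs):
    return list(reversed(xs))            # invertive: reverse the order of the elements

def include(xs, new_element):
    return xs + [new_element]            # inclusive: add a new element to the set

def exclude(xs, index=0):
    return xs[:index] + xs[index + 1:]   # exclusive: remove an element from the set
```
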
Specifying Metamorphic Properties
Further Testing
- For each app, additional data sets were used to see if more mutants could be killed
  - SVM: 18 of the remaining 19 were killed
  - MartiRank: 6 of the remaining 19 were killed
  - C4.5: one remaining mutant was killed

Heuristic Metamorphic Testing
- Specifying metamorphic properties in which the results may be “similar” but not necessarily exactly the same as predicted
- Reducing false positives by checking against a difference threshold when comparing floating point numbers
- Addressing non-determinism by specifying heuristics for what is considered “close”

Download