50.530: Software Engineering
Sun Jun
SUTD
Week 2: Automatic Testing
A Big View: Testing
[Figure: the initial state, the behaviors we wanted, and the behaviors we have, with regions A, B and C]
A Big View: Testing
[Figure: the initial state, the behaviors we wanted, and the behaviors we have, with regions A and C, plus a test which shows a bug]
Testing
• Methods: white-box testing, black-box testing,
grey-box testing
• Levels: unit testing, integration testing, system
testing, etc.
• Types: installation testing, compatibility testing,
smoke and sanity testing, regression testing,
acceptance testing, alpha testing, beta testing,
functional/non-functional testing, combinatorial
testing, performance testing, security testing, etc.
Research Question
Isn’t JUnit good enough?
How do we automatically generate test cases so as to reveal bugs?
A Big View: Systematic Testing
[Figure: the initial state, the behaviors we wanted, and the behaviors we have, with regions A, B and C]
A Big View: Random Testing
[Figure: the initial state, the behaviors we wanted, and the behaviors we have, with regions A and C, plus a test which shows a bug]
Boyapati et al., ISSTA 2002, ACM SIGSOFT Distinguished Paper Award
KORAT: AUTOMATED TESTING
BASED ON JAVA PREDICATES
Motivation
• It is important to be able to generate test
cases automatically.
• It is important to generate test cases which
are representative.
• Korat is just one example of systematic test case
generation; however, it is similar in spirit to many
systematic testing techniques (e.g., combinatorial
testing, parameterized testing).
Example
public class BinaryTree {
    public static class Node {
        Node left; Node right;
    }
    private Node root;
    private int size;
    public void remove(Node n) {
        // some code
    }
    …
}
How do we test remove(Node n)?
Example
• How do we test remove(Node n)?
– We need a valid BinaryTree object bt.
– We need a valid Node object nd.
– We need to know what is expected after executing
bt.remove(nd)
Vocabulary
Class invariant:
• an invariant that defines which objects of the
class are valid
• e.g., size == 0 if root == null, and size equals
the number of nodes in the tree
Vocabulary
Pre-condition (of a method)
• a condition which must be
true prior to the execution
of the method
• e.g., n must not be null.
The class invariant is always
part of the pre-condition.
Vocabulary
Post-condition (of a method)
• a condition which must be
true after the execution of
the method
• e.g., after remove, size is
decremented by 1.
The class invariant is always
part of the post-condition.
Korat: Assumption
A class invariant is encoded as a method repOk(), which returns true if and only if the object is in a state that satisfies the class invariant.
// inside class BinaryTree; needs java.util.Set, java.util.HashSet, java.util.LinkedList
public boolean repOk() {
    // an empty tree is valid only if size is 0
    if (root == null)
        return size == 0;
    // breadth-first traversal from the root; visited detects sharing and cycles
    Set<Node> visited = new HashSet<Node>();
    visited.add(root);
    LinkedList<Node> workList = new LinkedList<Node>();
    workList.add(root);
    while (!workList.isEmpty()) {
        Node current = workList.removeFirst();
        if (current.left != null) {
            if (!visited.add(current.left))
                return false; // node reached twice: not a tree
            workList.add(current.left);
        }
        if (current.right != null) {
            if (!visited.add(current.right))
                return false; // node reached twice: not a tree
            workList.add(current.right);
        }
    }
    // size must equal the number of reachable nodes
    return visited.size() == size;
}
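To see repOk() in action, here is a minimal, self-contained sketch. It assumes a stripped-down BinaryTree with only the fields shown on the slides; the helpers validTree() and sharedNode() are hypothetical and simply build one valid and one invalid object.

```java
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

// Stripped-down BinaryTree with the slide's fields and repOk().
class BinaryTree {
    static class Node {
        Node left, right;
    }

    Node root;
    int size;

    boolean repOk() {
        if (root == null) return size == 0;            // empty tree must have size 0
        Set<Node> visited = new HashSet<>();
        visited.add(root);
        LinkedList<Node> workList = new LinkedList<>();
        workList.add(root);
        while (!workList.isEmpty()) {
            Node current = workList.removeFirst();
            Node[] children = {current.left, current.right};
            for (Node child : children) {
                if (child == null) continue;
                if (!visited.add(child)) return false; // seen before: sharing or cycle
                workList.add(child);
            }
        }
        return visited.size() == size;                  // size must equal the node count
    }

    // Hypothetical helper: a root with two distinct children (valid, size 3).
    static BinaryTree validTree() {
        BinaryTree bt = new BinaryTree();
        bt.root = new Node();
        bt.root.left = new Node();
        bt.root.right = new Node();
        bt.size = 3;
        return bt;
    }

    // Hypothetical helper: one node used as both children (invalid).
    static BinaryTree sharedNode() {
        BinaryTree bt = new BinaryTree();
        bt.root = new Node();
        Node shared = new Node();
        bt.root.left = shared;
        bt.root.right = shared;
        bt.size = 2;
        return bt;
    }

    public static void main(String[] args) {
        System.out.println(validTree().repOk());      // true
        System.out.println(sharedNode().repOk());     // false
        System.out.println(new BinaryTree().repOk()); // true: empty tree, size 0
    }
}
```

Node does not override equals(), so the HashSet compares nodes by identity, which is exactly what sharing detection needs.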
Korat: Assumption
• Pre-conditions and post-conditions are encoded
in the Java Modeling Language (JML)
//@ public invariant repOk(); // class invariant
// for BinaryTree
/*@ public normal_behavior // specification for remove
@ requires has(n); // precondition
@ ensures !has(n); // postcondition
@*/
public void remove(Node n) {
// ... method body
}
Is this pre-condition perhaps too strong?
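Korat: Approach
1. Generate a BinaryTree bt and a Node n.
2. If repOk() and the pre-condition are true, execute bt.remove(n); otherwise, discard the candidate and generate the next one.
3. If the post-condition is true after the execution, the test passes; otherwise, it reveals a bug.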
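The approach above boils down to a per-candidate check. The sketch below is an illustration under assumptions, not Korat's implementation: the class KoratStep, its check() method, and the list-based stand-in for bt.remove(n) (spec: requires has(1), ensures !has(1)) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch of the per-candidate step: filter by pre-condition (incl. repOk),
// run the method under test, then judge the result by the post-condition.
class KoratStep {
    enum Outcome { SKIPPED, PASSED, FAILED }

    static <T> Outcome check(T candidate, Predicate<T> pre,
                             Consumer<T> methodUnderTest, Predicate<T> post) {
        if (!pre.test(candidate)) return Outcome.SKIPPED; // illegal input: discard
        methodUnderTest.accept(candidate);
        return post.test(candidate) ? Outcome.PASSED : Outcome.FAILED;
    }

    // Hypothetical stand-in for bt.remove(n): remove the value 1 from a list,
    // with spec "requires has(1); ensures !has(1)".
    static Outcome removeOne(List<Integer> values) {
        List<Integer> candidate = new ArrayList<>(values);
        return check(candidate,
                l -> l.contains(1),
                l -> l.remove(Integer.valueOf(1)),
                l -> !l.contains(1));
    }

    public static void main(String[] args) {
        System.out.println(removeOne(List.of(2, 3))); // SKIPPED
        System.out.println(removeOne(List.of(1, 2))); // PASSED
        System.out.println(removeOne(List.of(1, 1))); // FAILED
    }
}
```

The last candidate fails because List.remove(Object) removes only the first occurrence, so the post-condition !has(1) does not hold: the kind of mismatch a post-condition check is meant to expose.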
Finitization
• There are infinitely many candidates for bt and n.
– For each field in the class, define a finite domain.
[Figure: the interesting bt form a small subset of all possible bt]
Finitization
public static Finitization finBinaryTree(int NUM_Node) {
    Finitization f = new Finitization(BinaryTree.class);
    // NUM_Node objects of class Node, plus null
    ObjSet nodes = f.createObjSet("Node", NUM_Node);
    nodes.add(null);
    // root, left and right all range over {null, N0, ..., N(NUM_Node-1)}
    f.set("root", nodes);
    f.set("Node.left", nodes);
    f.set("Node.right", nodes);
    return f;
}
public class BinaryTree {
    public static class Node {
        Node left; Node right;
    }
    private Node root;
    private int size;
    …
}
Finitization
Translation of finBinaryTree(3):
nodes = {null, N0, N1, N2}
BinaryTree.root is a member of nodes
Node.left is a member of nodes
Node.right is a member of nodes
Example Trees
With finBinaryTree(3), there are 4 objects: one
BinaryTree object and three Node objects, which
could be set up as follows.
Finitization: the Space
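• How many bt are there with finBinaryTree(3),
assuming that bt.size is always set to the right value?
– 4^7 = 16,384: root, plus left and right for each of the 3 nodes, gives 7 fields, each with 4 possible values (null, N0, N1, N2)
• How many bt are there with finBinaryTree(n)?
– (n+1)^(2n+1): 2n+1 fields, each with n+1 possible values
[Figure: the interesting bt form a small subset of all possible bt]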
Filtering 1
• For each candidate bt and n, check the pre-condition
of remove (which includes repOk()). If the pre-condition
is not satisfied, ignore that candidate.
[Figure: all possible bt, split into invalid bt and interesting bt]
Is the following bt valid?
Korat: Search Algorithm
1. Order all the elements in every class domain and every field domain.
   a. Node class ordering: <null, N0, N1, N2>
   b. Assume the domain of size is <3>.
2. Generate a candidate as a vector of field-domain indices, e.g., [1,0,2,2,0,0,0,0], where the fields are ordered root, size, N0.left, N0.right, N1.left, N1.right, N2.left, N2.right.
Korat: Search Algorithm
3. Invoke repOk() to check whether the candidate is valid, e.g., [1,0,2,2,0,0,0,0] is invalid because N1 is both the left and the right child of N0.
4. Backtrack to generate the next candidate in line, e.g., [1,0,2,2,0,0,0,1].
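Steps 1 to 4 without any optimization amount to counting through every candidate vector like an odometer. A minimal sketch, assuming the vector layout [root, size, N0.left, N0.right, N1.left, ...] with 0 for null and i+1 for Ni, and a size domain of {n}; KoratNaive and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Naive Korat-style search for finBinaryTree(n): enumerate every candidate vector
// [root, size, N0.left, N0.right, N1.left, ...] and keep those that pass repOk.
// Node-valued entries: 0 = null, i + 1 = Ni. size has the single-value domain {n}.
class KoratNaive {
    static boolean repOk(int[] c, int n) {
        int size = n;                                 // index 0 of the domain {n}
        if (c[0] == 0) return size == 0;
        boolean[] visited = new boolean[n + 1];
        ArrayDeque<Integer> work = new ArrayDeque<>();
        visited[c[0]] = true;
        work.add(c[0]);
        int count = 1;
        while (!work.isEmpty()) {
            int node = work.removeFirst() - 1;        // Ni is stored as i + 1
            for (int f = 2 + 2 * node; f <= 3 + 2 * node; f++) { // left, then right
                int child = c[f];
                if (child == 0) continue;
                if (visited[child]) return false;     // sharing or cycle
                visited[child] = true;
                count++;
                work.add(child);
            }
        }
        return count == size;
    }

    // Returns {number of candidates tried, number of valid candidates}.
    static int[] enumerate(int n) {
        int len = 2 + 2 * n;
        int[] domain = new int[len];
        Arrays.fill(domain, n + 1);                   // {null, N0, ..., N(n-1)}
        domain[1] = 1;                                // size field: {n}
        int[] c = new int[len];
        int tried = 0, valid = 0;
        while (true) {
            tried++;
            if (repOk(c, n)) valid++;
            // Next candidate in line: increment the last position, carrying left.
            int i = len - 1;
            while (i >= 0 && c[i] == domain[i] - 1) c[i--] = 0;
            if (i < 0) return new int[] {tried, valid};
            c[i]++;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(enumerate(3))); // [16384, 30]
    }
}
```

For three nodes this calls repOk() on all 16,384 candidates and finds 30 valid ones: 5 tree shapes times 3! ways to assign N0, N1 and N2 to the positions. The two optimizations that follow attack exactly these two numbers.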
Optimization 1
• During the execution of repOk(), Korat monitors the fields that repOk() accesses.
– e.g., [0, 2, 3] (root, N0.left, N0.right) for [1,0,2,2,0,0,0,0]
• If repOk() returns false, backtrack on the accessed fields only, skipping every candidate that differs only in fields repOk() did not access.
– e.g., try [1,0,2,3,0,0,0,0] after [1,0,2,2,0,0,0,0]
Is this justified?
Theory
• For non-deterministic repOk() methods:
– All candidates for which repOk() always returns true are generated.
– Candidates for which repOk() always returns false are never generated.
– Candidates for which repOk() sometimes returns true and sometimes false may or may not be generated.
Optimization 2
• If we have generated [1, 0, 2, 3, 0, 0, 0, 0], we may not want to also generate [1, 0, 3, 2, 0, 0, 0, 0]: the two trees differ only in that N1 and N2 are swapped.
Is this justified?
Vocabulary
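• Object graph: the objects of a candidate (e.g., N1 and N2) and the references between them.
• Isomorphic: two object graphs C and C’ are isomorphic iff there is a permutation per of the objects (within each class) such that per(C) = C’; the inverse of per then maps C’ back to C.
– e.g., per = {N1->N2, N2->N1}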
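To make the definition concrete, here is a small sketch that checks isomorphism by brute force over all permutations of the Node objects, keeping null fixed. It assumes the vector encoding [root, size, N0.left, N0.right, ...] with 0 = null and i+1 = Ni; Isomorphism and its methods are illustrative names, not part of Korat.

```java
import java.util.Arrays;

class Isomorphism {
    // Apply a permutation of node ids to a candidate vector (null, encoded as 0, is fixed).
    static int[] apply(int[] perm, int[] c) {
        int[] d = new int[c.length];
        d[0] = rename(perm, c[0]);                           // root
        d[1] = c[1];                                         // size is not an object
        for (int i = 0; i < perm.length; i++) {
            d[2 + 2 * perm[i]] = rename(perm, c[2 + 2 * i]); // per(Ni).left = per(Ni.left)
            d[3 + 2 * perm[i]] = rename(perm, c[3 + 2 * i]); // per(Ni).right = per(Ni.right)
        }
        return d;
    }

    static int rename(int[] perm, int v) {
        return v == 0 ? 0 : perm[v - 1] + 1;
    }

    // C and C' are isomorphic iff some permutation maps C onto C'.
    static boolean isomorphic(int[] c1, int[] c2, int n) {
        return tryAll(new int[n], new boolean[n], 0, c1, c2);
    }

    private static boolean tryAll(int[] perm, boolean[] used, int k, int[] c1, int[] c2) {
        if (k == perm.length) return Arrays.equals(apply(perm, c1), c2);
        for (int j = 0; j < perm.length; j++) {
            if (used[j]) continue;
            used[j] = true;
            perm[k] = j;
            boolean found = tryAll(perm, used, k + 1, c1, c2);
            used[j] = false;
            if (found) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[] a = {1, 0, 2, 3, 0, 0, 0, 0};       // N0 with children N1 (left) and N2 (right)
        int[] b = {1, 0, 3, 2, 0, 0, 0, 0};       // same shape, N1 and N2 swapped
        int[] chain = {1, 0, 2, 0, 3, 0, 0, 0};   // left chain N0 -> N1 -> N2
        int[] rotated = {2, 0, 0, 0, 3, 0, 1, 0}; // left chain N1 -> N2 -> N0
        System.out.println(isomorphic(a, b, 3));           // true
        System.out.println(isomorphic(a, chain, 3));       // false
        System.out.println(isomorphic(chain, rotated, 3)); // true, via a 3-cycle
    }
}
```

The last pair is related only by the 3-cycle N0->N1, N1->N2, N2->N0, which is not its own inverse; mapping C onto C’ is what matters.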
Optimization 2
[Figure: the interesting bt partitioned into regions; all candidates in the same region are isomorphic, and one representative is kept per region]
Representative
Given the two isomorphic graphs [1, 0, 2, 3, 0, 0, 0, 0] and [1, 0, 3, 2, 0, 0, 0, 0],
Korat takes the former as the representative, as it is lexicographically
“smaller” and comes first in the search order.
Implementation: Op 2
• When backtracking from [a, b, c, …, k, …] on a field of some class type,
– Korat tries [a, b, c, …, k+1, …] only if k+1 is at most one greater than the largest index used by the previously accessed fields of the same type;
– otherwise, Korat resets that field and backtracks to the previously accessed field.
Example
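When backtracking from [1, 0, 2, 2, 0, 0, 0, 0], Korat tries [1, 0, 2, 3, 0, 0, 0, 0],
since 3 is one greater than 2, the largest Node index used so far. Korat never tries
[1, 0, 3, 0, 0, 0, 0, 0]: setting N0.left to N2 while N1 is still unused only yields
trees that are isomorphic to ones using N1 instead.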
Result
Only 5 bt are generated, one per shape of a binary tree with 3 nodes,
assuming size is always set to 3.
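Both optimizations fit into a short sketch: repOk() records the fields it reads, backtracking only revisits those fields (last accessed first), and, optionally, a Node-valued field may only move to a node at most one index above the largest node used by the fields accessed before it. This is a simplified re-creation assuming the vector [root, size, N0.left, N0.right, ...] with 0 = null and i+1 = Ni, not the Korat tool itself; KoratSearch and its members are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Simplified Korat search for finBinaryTree(n) with both optimizations.
class KoratSearch {
    static int n;
    static int[] c;
    static final List<Integer> accessed = new ArrayList<>(); // fields in order of first access

    static int get(int f) {                    // monitored field read
        if (!accessed.contains(f)) accessed.add(f);
        return c[f];
    }

    static int size() {
        get(1);                                // record the access; the value is always n
        return n;
    }

    static boolean repOk() {
        int root = get(0);
        if (root == 0) return size() == 0;
        boolean[] visited = new boolean[n + 1];
        ArrayDeque<Integer> work = new ArrayDeque<>();
        visited[root] = true;
        work.add(root);
        int count = 1;
        while (!work.isEmpty()) {
            int node = work.removeFirst() - 1;
            for (int f = 2 + 2 * node; f <= 3 + 2 * node; f++) {
                int child = get(f);
                if (child == 0) continue;
                if (visited[child]) return false;
                visited[child] = true;
                count++;
                work.add(child);
            }
        }
        return count == size();
    }

    // Returns {repOk calls, valid candidates found}.
    static int[] search(int nodes, boolean breakSymmetry) {
        n = nodes;
        c = new int[2 + 2 * n];
        int calls = 0, found = 0;
        while (true) {
            accessed.clear();
            calls++;
            if (repOk()) found++;
            // Optimization 1: backtrack only over fields repOk actually read, last one first.
            boolean advanced = false;
            while (!accessed.isEmpty() && !advanced) {
                int f = accessed.remove(accessed.size() - 1);
                int limit = (f == 1) ? 0 : n;  // largest index in the field's domain
                if (breakSymmetry && f != 1) {
                    // Optimization 2: at most one above the largest node used by earlier fields.
                    int max = 0;
                    for (int g : accessed) if (g != 1) max = Math.max(max, c[g]);
                    limit = Math.min(limit, max + 1);
                }
                if (c[f] < limit) {
                    c[f]++;
                    advanced = true;
                } else {
                    c[f] = 0;
                }
            }
            if (!advanced) return new int[] {calls, found};
        }
    }

    public static void main(String[] args) {
        int[] plain = search(3, false);
        int[] reduced = search(3, true);
        System.out.println(plain[1] + " valid, " + plain[0] + " repOk calls");
        System.out.println(reduced[1] + " valid, " + reduced[0] + " repOk calls");
    }
}
```

With three nodes, search(3, false) still finds all 30 valid candidates but calls repOk() far fewer than 16,384 times, and search(3, true) finds exactly 5, one per tree shape.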
Evaluation
Is this biased?
Experiment I
Experiment II
Experiment III
Conclusion
• Korat generates test cases from a specified
domain and correctness specification.
• Korat reduces test cases based on
– pre-condition
– a simple form of learning (monitoring the fields accessed by repOk())
– symmetry reduction
Exercise 1
Apply Korat to java.util.Stack by answering the
following questions.
• What is the repOk()?
• What is the pre-condition and post-condition
of method push and pop?
• How would you track which fields are
accessed in repOk()?
• When are two stack objects isomorphic?
Discussion
Any thought on Korat?
Pacheco et al., ICSE 2007, cited 440+
FEEDBACK-DIRECTED RANDOM
TEST GENERATION
Random Testing
• Easy to implement
• Yields lots of test cases
• Finds errors
– 1990: Unix utilities
– 1998: OS services
– 2000: GUI applications
– 2000: functional programs
– 2005: object-oriented programs
– 2007: flash memory, file systems
Perhaps simply got lucky?
Research Question
Which one is better: systematic testing or random testing?
Random vs Systematic
• Theoretical work suggests that random testing is
as effective as more systematic input generation
techniques
– Duran et al. 1984 and Hamlet et al. 1990
• Some empirical studies suggest that systematic testing is
more effective than random testing
– Ferguson et al. 1996: vs. chaining
– Marinov et al. 2003: vs. bounded exhaustive
– Visser et al. 2006: vs. model checking and
symbolic execution
(small benchmarks; no measurement of error-revealing effectiveness)
Contributions
• Propose feedback-directed random test generation
– Randomized creation of new test inputs is guided by
feedback about the execution of previous inputs
– Goal is to avoid redundant and illegal inputs
• Empirical evaluation
– Evaluate coverage and error-detection ability on a large
number of widely-used, well-tested libraries (780KLOC)
– Compare against systematic input generation
– Compare against undirected random input generation
Sample Test Case
public static void test1 () {
LinkedList l1 = new LinkedList();
Object o1 = new Object();
l1.addFirst(o1);
TreeSet t1 = new TreeSet(l1);
Set s1 = Collections.unmodifiableSet(t1);
Assert.assertTrue(s1.equals(s1));
}
Randoop
• Input: a class with multiple public methods.
• Output: a set of test cases (sequences of
method calls)
• Main idea:
– Build test inputs incrementally: New test inputs
extend previous ones
– As soon as a test input is created, execute it
– Use execution results to guide generation
The Oracle Problem
If we are to do automatic testing, we must
know what the correct results are, but how?
Specification
How can we get a better specification in general?
Algorithm
“There are two ways of constructing a software design: One way is
to make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies.”
C. A. R. Hoare, 1980
Randoop Example
Date s = new Date(2006, 2, 14);
Assert specification
assertTrue(s.equals(s));
How do we randomly generate constructor parameter values like 2006, 2, 14?
Randoop Example
HashSet s = new HashSet();
Randomly pick a public method
s.add("");
Assert specification
assertTrue(s.equals(s));
The default String value "" is used since there is no other String in the system.
Randoop Example
HashSet s = new HashSet();
Randomly pick a public method
s.add("");
Randomly pick a public method
s.isEmpty();
Assert specification
assertTrue(s.equals(s));
A method is probably an observer method if it has no parameters, is
public and non-static, returns a primitive value, and its name is size,
count, length, toString, or begins with get or is.
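The observer heuristic translates directly into reflection checks. A sketch under two assumptions, both flagged in the comments: String is accepted as a return type so that toString() can qualify, and void is excluded explicitly because Class.isPrimitive() returns true for void.class. ObserverHeuristic is a hypothetical name, not Randoop's API.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashSet;

class ObserverHeuristic {
    // Heuristic from the slide: no parameters, public, non-static, returns a value,
    // and a name like size/count/length/toString or get*/is*.
    // Assumption: String counts as a value so toString() can qualify;
    // void is excluded because void.class.isPrimitive() is true.
    static boolean isLikelyObserver(Method m) {
        int mods = m.getModifiers();
        Class<?> ret = m.getReturnType();
        boolean valueType = (ret.isPrimitive() && ret != void.class) || ret == String.class;
        String name = m.getName();
        boolean observerName = name.equals("size") || name.equals("count")
                || name.equals("length") || name.equals("toString")
                || name.startsWith("get") || name.startsWith("is");
        return m.getParameterCount() == 0 && Modifier.isPublic(mods)
                && !Modifier.isStatic(mods) && valueType && observerName;
    }

    // Convenience wrapper: look up a public method and apply the heuristic.
    static boolean check(Class<?> type, String name, Class<?>... params) {
        try {
            return isLikelyObserver(type.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(check(HashSet.class, "isEmpty"));             // true
        System.out.println(check(HashSet.class, "size"));                // true
        System.out.println(check(HashSet.class, "add", Object.class));   // false
        System.out.println(check(HashSet.class, "clear"));               // false
    }
}
```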
Randoop Example
Date d = new Date(2006, 2, 14);
Randomly pick a public method
d.setMonth(-1); // pre: argument >= 0
A sequence of method calls that results in an exception is added to errSeqs.
Randoop Example
Date d = new Date(2006, 2, 14);
Randomly pick a public method
d.setMonth(-1); // pre: argument >= 0
d.setDay(5);
Assert specification
assertTrue(d.equals(d));
Classifying a sequence
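1. Start: execute the sequence and check the contracts.
2. Contract violated? If yes, minimize the sequence and output it as a contract-violating test case.
3. If no: is the sequence redundant? If yes, discard it; if no, add it to the components (the sequences that new sequences are built from).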
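A toy version of this classification loop, for illustration only. TinyStack (capacity 2, with a deliberately planted bug in equals()) and MiniRandoop are hypothetical; sequences are strings such as "push:1" and "pop", only the o.equals(o) contract is checked, minimization is omitted, redundancy is judged with equals(), and sequences that throw are set aside and never extended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical class under test: a stack of capacity 2 with a planted bug in equals().
final class TinyStack {
    private final int[] items = new int[2];
    private int size;

    void push(int v) {
        if (size == items.length) throw new IllegalStateException("full");
        items[size++] = v;
    }

    int pop() {
        if (size == 0) throw new IllegalStateException("empty");
        return items[--size];
    }

    @Override public boolean equals(Object o) {
        if (size == items.length) return false;  // planted bug: a full stack breaks o.equals(o)
        if (!(o instanceof TinyStack)) return false;
        TinyStack t = (TinyStack) o;
        if (size != t.size) return false;
        for (int i = 0; i < size; i++) if (items[i] != t.items[i]) return false;
        return true;
    }

    @Override public int hashCode() {
        int h = size;
        for (int i = 0; i < size; i++) h = 31 * h + items[i];
        return h;
    }
}

// Toy feedback-directed generator: extend known-good sequences one random call at a time.
class MiniRandoop {
    final List<List<String>> components = new ArrayList<>();  // sequences worth extending
    final List<TinyStack> objects = new ArrayList<>();        // objects they produced
    final List<List<String>> violations = new ArrayList<>();  // contract-violating tests
    final List<List<String>> illegal = new ArrayList<>();     // threw: never extended

    MiniRandoop() {
        components.add(new ArrayList<>());                    // start from the empty sequence
        objects.add(new TinyStack());
    }

    // Executes a sequence such as ["push:1", "pop"] on a fresh stack.
    static TinyStack run(List<String> seq) {
        TinyStack s = new TinyStack();
        for (String call : seq) {
            if (call.equals("pop")) s.pop();
            else s.push(Integer.parseInt(call.substring("push:".length())));
        }
        return s;
    }

    void generate(int steps, long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < steps; i++) {
            List<String> seq = new ArrayList<>(components.get(rnd.nextInt(components.size())));
            seq.add(rnd.nextBoolean() ? "pop" : "push:" + rnd.nextInt(3));
            TinyStack result;
            try {
                result = run(seq);
            } catch (IllegalStateException e) {
                illegal.add(seq);
                continue;
            }
            if (!result.equals(result)) {
                violations.add(seq);                           // contract violated: report
            } else if (!objects.contains(result)) {
                components.add(seq);                           // new object: keep extending
                objects.add(result);
            }                                                  // otherwise redundant: discard
        }
    }

    public static void main(String[] args) {
        MiniRandoop r = new MiniRandoop();
        r.generate(200, 42L);
        System.out.println("components: " + r.components);
        System.out.println("violations: " + r.violations.size());
    }
}
```

Any sequence that fills the stack ends up in violations, and at most four components survive (the empty stack and the three one-element stacks), because every other result is illegal, contract-violating, or redundant.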
Redundancy Checking
• Randoop maintains a set of objects for each
type.
• A sequence (of method calls) is redundant if
the objects created during its execution are
already members of that set.
– Use equals() to compare
– Or a more sophisticated user-defined check
Some Randoop options
• Avoid use of null
– statically…
Object o = new Object();
LinkedList l = new LinkedList();
l.add(null);
– …and dynamically
Object o = returnNull();
LinkedList l = new LinkedList();
l.add(o);
• Biased random selection
– Favor smaller sequences
– Favor methods that have been less covered
– Use constants mined from source code
Research Question
How effective would Randoop be?
How do we judge whether one set of random test
cases is better than another?
Coverage
• Code block coverage: a set of random test
cases is better if it covers more code blocks.
– For instance, consider each branch as a block.
• Predicate coverage: given a set of predicates, a
set of random test cases is better if it covers
more valuations of the predicates.
– For instance, consider the predicates to be the
propositions in the program.
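Predicate coverage can be computed by collecting the distinct truth-value vectors that the tests produce. A simplified sketch: the predicates here are evaluated directly on integer inputs, and the two predicates (x > 0 and "x is even") as well as the class PredicateCoverage are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Predicate coverage: how many distinct valuations of the predicates the inputs reach.
class PredicateCoverage {
    static int coverage(List<IntPredicate> predicates, int[] inputs) {
        Set<List<Boolean>> valuations = new HashSet<>();
        for (int x : inputs) {
            List<Boolean> valuation = new ArrayList<>();
            for (IntPredicate p : predicates) valuation.add(p.test(x));
            valuations.add(valuation);                // duplicates collapse in the set
        }
        return valuations.size();
    }

    // Hypothetical predicates; two predicates allow at most 4 valuations.
    static final IntPredicate POSITIVE = x -> x > 0;
    static final IntPredicate EVEN = x -> x % 2 == 0;
    static final List<IntPredicate> PREDICATES = List.of(POSITIVE, EVEN);

    public static void main(String[] args) {
        System.out.println(coverage(PREDICATES, new int[] {1, 3, 5}));      // 1
        System.out.println(coverage(PREDICATES, new int[] {-2, -1, 1, 2})); // 4
    }
}
```

Three tests can be worse than four: {1, 3, 5} only ever reaches (true, false), while {-2, -1, 1, 2} reaches all four valuations.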
Coverage Achieved by Randoop
Data structure             Time (s)   Branch cov.
Bounded stack (30 LOC)     1          100%
Unbounded stack (59 LOC)   1          100%
BS Tree (91 LOC)           1          96%
Binomial heap (309 LOC)    1          84%
Linked list (253 LOC)      1          100%
Tree map (370 LOC)         1          81%
Heap array (71 LOC)        1          100%
Is this representative?
Predicate Coverage
[Figure: predicate coverage vs. time (seconds) for Binary tree, Binomial heap, Fibonacci heap and Tree map, comparing feedback-directed, best systematic and undirected random generation]
Bug Detection
Library                                              LOC     Classes
JDK (2 libraries: java.util, javax.xml)              53K     272
Apache commons (5 libraries: logging, primitives,    114K    974
  chain jelly, math, collections)
.Net framework (5 libraries)                         582K    3330
How would Korat perform on these examples?
Methodology
• Ran Randoop on each library
– Used the default time limit (2 minutes)
• Contracts:
– o.equals(o) == true
– o.equals(o) throws no exception
– o.hashCode() throws no exception
– o.toString() throws no exception
– No null inputs and:
• Java: no NullPointerExceptions
• .NET: no NPEs, out-of-bounds, or illegal state exceptions
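The object contracts listed above can be checked on any object a sequence produces. A minimal sketch: Contracts.check() and the buggy class Unset (modelled on constructors that leave fields unset) are hypothetical, and only the four object contracts are checked, not the null-input rules.

```java
import java.util.ArrayList;
import java.util.List;

class Contracts {
    // Checks the object contracts; returns a description of each violation.
    static List<String> check(Object o) {
        List<String> violations = new ArrayList<>();
        try {
            if (!o.equals(o)) violations.add("o.equals(o) returned false");
        } catch (RuntimeException e) {
            violations.add("o.equals(o) threw " + e.getClass().getSimpleName());
        }
        try {
            o.hashCode();
        } catch (RuntimeException e) {
            violations.add("o.hashCode() threw " + e.getClass().getSimpleName());
        }
        try {
            o.toString();
        } catch (RuntimeException e) {
            violations.add("o.toString() threw " + e.getClass().getSimpleName());
        }
        return violations;
    }

    // Hypothetical buggy class: its constructor leaves the field name unset (null).
    static final class Unset {
        String name;
        @Override public boolean equals(Object other) {
            return other instanceof Unset && name.equals(((Unset) other).name);
        }
        @Override public int hashCode() { return name.hashCode(); }
        @Override public String toString() { return name.toString(); }
    }

    public static void main(String[] args) {
        System.out.println(check("abc"));       // []
        System.out.println(check(new Unset())); // one violation per method, all NPEs
    }
}
```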
Results
Library          Test cases output   Error-revealing test cases   Distinct errors
JDK              32                  29                           8
Apache commons   187                 29                           6
.Net framework   192                 192                          192
Total            411                 250                          206
Errors found: examples
• JDK Collections classes have 4 methods that create objects violating the o.equals(o) contract
• javax.xml creates objects that cause hashCode and toString to crash, even though the objects are well-formed XML constructs
• Apache libraries have constructors that leave fields unset, leading to NPEs on calls of equals, hashCode and toString (this counts as only one bug)
• Many Apache classes require a call to an init() method before an object is legal, which led to many false positives
• .Net framework has at least 175 methods that throw an exception forbidden by the library specification (NPE, out-of-bounds, or illegal state exception)
• .Net framework has 8 methods that violate o.equals(o)
• .Net framework loops forever on a legal but unexpected input
Regression testing
• Randoop can create regression oracles
• Generated test cases using JDK 1.5
– Randoop generated 41K regression test cases
• Ran the resulting test cases on
– JDK 1.6 Beta
• 25 test cases failed
– Sun’s implementation of the JDK
• 73 test cases failed
– Failing test cases pointed to 12 distinct errors
– These errors were not found by the extensive compliance
test suite that Sun provides to JDK developers
Example regression test:
Object o = new Object();
LinkedList l = new LinkedList();
l.addFirst(o);
l.add(o);
assertEquals(2, l.size());        // expected to pass
assertEquals(false, l.isEmpty()); // expected to pass
Evaluation: summary
• Feedback-directed random test generation:
– Is effective at finding errors
• Discovered several errors in real code (e.g. JDK, .NET
framework core libraries)
– Can outperform systematic input generation
• On previous benchmarks and metrics (coverage), and
• On a new, larger corpus of subjects, measuring error
detection
– Can outperform undirected random test generation
Conclusion
• Feedback-directed random test generation
– Finds errors in widely-used, well-tested libraries
– Can outperform systematic test generation
– Can outperform undirected test generation
• Randoop:
– Easy to use: just point it at a set of classes
– Has real clients: used by product groups at Microsoft
• A mid-point in the systematic-random space of
input generation techniques
Exercise 2
Apply Randoop, manually, to OrderSet.java
• Create 2 valid tests, one redundant test and
one illegal sequence.
• Create a test case to expose the bug.
Research Question
How do we improve Korat or Randoop?