Cross-Entropy Based Testing

Hana Chockler, Benny Godlin, Eitan Farchi, Sergey Novikov
IBM Research, Haifa Labs
Haifa, Israel

The problem: How to test for rare problems in large programs?

- Testing involves running the program many times, hoping to find the problem.
  o If a problem appears only in a small fraction of the runs, it is unlikely to be found during random executions.
  (Like searching for a needle in a haystack.)

The main idea: Use the cross-entropy method!

The cross-entropy method is a widely used approach to estimating probabilities of rare events (Rubinstein).

The cross-entropy method - motivation

- The problem:
  o There is a probability space S with a probability distribution f and a performance function P defined on it.
  o A rare event e is that P(s) > r, for s ∈ S and some threshold r, and this happens very rarely under f.
  o How can we estimate the probability of e?
(Figure: the space S, with a point s marked as an input in which the rare event e occurs.)

The naïve idea

Generate a big enough sample (a huge sample from the probability space) and compute the probability of the rare event from the inputs in the sample.

This won't work: for very rare events, even a very large sample does not reflect the probability correctly.
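A minimal sketch of why this fails (not from the slides; the event, its probability of 1e-7, and the sample size are illustrative assumptions): plain sampling of a very rare event almost always returns an estimate of 0.

    import java.util.Random;
    import java.util.function.Predicate;

    // Naive Monte-Carlo estimate of P[rare event] by direct sampling.
    public class NaiveEstimate {
        static double estimate(Random rnd, Predicate<Double> rareEvent, int samples) {
            int hits = 0;
            for (int i = 0; i < samples; i++) {
                if (rareEvent.test(rnd.nextDouble())) hits++;
            }
            return (double) hits / samples;
        }

        public static void main(String[] args) {
            // Rare event: a uniform draw falls below 1e-7 (true probability 1e-7).
            Predicate<Double> rare = x -> x < 1e-7;
            // Even with a million samples, the printed estimate is almost always 0.0.
            System.out.println(estimate(new Random(), rare, 1_000_000));
        }
    }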

The cross-entropy method

Wishful thinking: if we had a distribution that gives the good inputs (w.r.t. the performance function) probability 1, then we would be all set … but we don't have such a distribution.

- So we try to approximate it in iterations, every time trying to come a little closer (see the sketch below):
  o In each iteration, we generate a sample of some (large) size.
  o We update the parameters (the probability distribution) so that we get a better sample in the next iteration.
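The following is a high-level sketch of one such iteration, not the authors' code; the generic sampler, scorer, updater, and the elite-fraction parameter are assumptions that stand in for whatever concrete representation is used.

    import java.util.Comparator;
    import java.util.List;
    import java.util.function.BiFunction;
    import java.util.function.Function;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // Generic shape of one cross-entropy iteration: sample, select the elite, update.
    public class CrossEntropyStep {
        static <D, S> D iterate(D dist,
                                Function<D, S> sampler,              // draws one input under dist
                                Function<S, Double> score,           // the performance function
                                BiFunction<D, List<S>, D> updater,   // re-fits dist to the elite
                                int sampleSize, double eliteFraction) {
            // 1. Generate a (large) sample under the current distribution.
            List<S> sample = Stream.generate(() -> sampler.apply(dist))
                                   .limit(sampleSize)
                                   .collect(Collectors.toList());
            // 2. Keep the best-scoring part of the sample (the elite).
            int eliteSize = Math.max(1, (int) (sampleSize * eliteFraction));
            List<S> elite = sample.stream()
                                  .sorted(Comparator.comparing(score).reversed())
                                  .limit(eliteSize)
                                  .collect(Collectors.toList());
            // 3. Update the distribution so that the next sample looks more like the elite.
            return updater.apply(dist, elite);
        }
    }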

Formal definition of cross-entropy

- In information theory, the cross entropy, or the Kullback-Leibler "distance" (not really a distance, because it is not symmetric), between two probability distributions p and q measures the average number of bits needed to identify an event from a set of possibilities, if a coding scheme is used based on a given probability distribution q, rather than the "true" distribution p.
- The cross entropy for two distributions p and q over the same discrete probability space is defined as follows:

  H(p,q) = - Σ_x p(x) log(q(x))
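A small illustrative snippet (not from the slides) that computes H(p,q) exactly as in the formula above, for two discrete distributions given as arrays over the same finite space:

    // Cross entropy H(p,q) = - sum_x p(x) * log(q(x)).
    public class CrossEntropy {
        static double crossEntropy(double[] p, double[] q) {
            double h = 0.0;
            for (int x = 0; x < p.length; x++) {
                if (p[x] > 0.0) {                  // terms with p(x) = 0 contribute nothing
                    h -= p[x] * Math.log(q[x]);    // diverges if q(x) = 0 where p(x) > 0
                }
            }
            return h;
        }

        public static void main(String[] args) {
            double[] p = {0.5, 0.5};
            double[] q = {0.9, 0.1};
            // Not symmetric (hence not a true distance), and H(p,q) >= H(p,p).
            System.out.println(crossEntropy(p, q) + " vs " + crossEntropy(p, p));
        }
    }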

The cross-entropy method for optimization problems [Rubinstein]

- In optimization problems, we are looking for inputs that maximize the performance function.
- The main problem is that this maximum is unknown beforehand.
- The stopping point is when the sample has a small relative standard deviation (see the sketch after this list).
- The method was successfully applied to a variety of graph optimization problems:
  o MAX-CUT
  o Traveling salesman
  o …
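A minimal sketch of the stopping criterion (my own naming; the threshold values are only indicative): stop once the standard deviation of the sampled performance values, relative to their mean, is small.

    // Relative standard deviation of a sample of performance values: stddev / |mean|.
    // The iterations stop once this drops below a small threshold (assumes a nonzero mean).
    public class StoppingCriterion {
        static double relativeStdDev(double[] scores) {
            double mean = 0.0;
            for (double s : scores) mean += s;
            mean /= scores.length;

            double var = 0.0;
            for (double s : scores) var += (s - mean) * (s - mean);
            var /= scores.length;

            return Math.sqrt(var) / Math.abs(mean);
        }

        static boolean shouldStop(double[] scores, double threshold) {
            return relativeStdDev(scores) < threshold;   // e.g. threshold between 0.01 and 0.05
        }
    }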

Illustration

(Figure: a performance function over the space, shown with the initial uniform distribution and the updated distribution; the starting point is marked.)

The setting in graphs

- In graph problems, we have the following:
  o The space is all paths in the graph G.
  o A performance function f gives each path a value.
  o We are looking for a path that maximizes f.
- In each iteration, we choose the best part Q of the sample.
- The probability update formula for an edge e=(v,w) is

  f'(e) = (#paths in Q that use e) / (#paths in Q that go via v)
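A sketch of this update (the path and edge representations are my own, not ConCEnter's): for every edge e=(v,w), the new probability is the fraction of elite paths through v that continue along e. The graph is assumed to be a DAG (as noted on the implementation slide), so each path visits v at most once and counting occurrences equals counting paths.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Edge probability update: f'(e) = |{paths in Q using e}| / |{paths in Q via v}|.
    // A path is a list of visited node ids; an edge is a consecutive pair (v, w), keyed "v->w".
    public class EdgeUpdate {
        static Map<String, Double> update(List<List<String>> elitePaths) {
            Map<String, Integer> edgeCount = new HashMap<>();  // paths that use the edge
            Map<String, Integer> nodeCount = new HashMap<>();  // paths that go via v

            for (List<String> path : elitePaths) {
                for (int i = 0; i + 1 < path.size(); i++) {
                    String v = path.get(i), w = path.get(i + 1);
                    edgeCount.merge(v + "->" + w, 1, Integer::sum);
                    nodeCount.merge(v, 1, Integer::sum);
                }
            }

            Map<String, Double> updated = new HashMap<>();
            edgeCount.forEach((edge, uses) -> {
                String v = edge.substring(0, edge.indexOf("->"));
                updated.put(edge, (double) uses / nodeCount.get(v));
            });
            return updated;
        }
    }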

Cross-entropy for testing

- A program is viewed as a graph.
- Each decision point is a node in the graph.
- Decision points can result from any non-deterministic or other not predetermined decisions:
  o concurrency
  o inputs
  o coin tossing
- The performance function is defined according to the bug that we want to find.
  o More on that later …

Our implementation

- We focus on concurrent programs.
- A program under test is represented as a graph, with nodes being the synchronization points (this works only if there is a correct locking policy).
- Edges are possible transitions between nodes.
- The graph is assumed to be a DAG – all loops are unwound.
- The graph is constructed on-the-fly during the executions.
- The initial probability distribution is uniform among edges.
- We collect a sample of several hundred executions.
- We adjust the probabilities of edges according to the formula.
- We repeat the process until the sample has a very small relative standard deviation (1–5%).

Dealing with loops

- Unwinding all loops creates a huge graph.
- Problems with huge graphs:
  o Takes more space to represent
  o Takes more time to converge
- We assume that most of the time, we are doing the same thing on subsequent iterations of the loop.
- We introduce a modulo parameter. For instance, modulo 2 creates two nodes for each location inside the loop – one for even and one for odd iterations:

  for i=1 to 100 do
    sync node;    // becomes two graph nodes, "even" and "odd", chosen by i mod 2
  end for

- The modulo parameter reduces the size of the graph dramatically, but also loses information.
- There is a balance between a too-small and a too-large modulo parameter that is found empirically.
(A sketch of node keying with a modulo parameter follows.)
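The sketch below (class and field names are illustrative, not ConCEnter's) shows one way to fold loop iterations with a modulo parameter: the graph node for a synchronization point is keyed by the program location together with the iteration count modulo k, so a loop with 100 iterations contributes only k nodes per location.

    import java.util.Objects;

    // Graph-node key for a synchronization point inside a loop: instead of one node
    // per iteration (full unwinding), iterations are folded by (iteration % modulo).
    public class SyncNodeKey {
        final String location;      // e.g. source file and line of the sync point
        final int iterationClass;   // iteration % modulo

        SyncNodeKey(String location, int iteration, int modulo) {
            this.location = location;
            this.iterationClass = iteration % modulo;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof SyncNodeKey)) return false;
            SyncNodeKey k = (SyncNodeKey) o;
            return iterationClass == k.iterationClass && location.equals(k.location);
        }

        @Override public int hashCode() {
            return Objects.hash(location, iterationClass);
        }

        public static void main(String[] args) {
            // With modulo = 2, iterations 1..100 of the same sync point map to just 2 nodes.
            System.out.println(new SyncNodeKey("Foo.java:42", 7, 2)
                    .equals(new SyncNodeKey("Foo.java:42", 99, 2)));   // true: both odd
        }
    }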

Bugs and performance functions

  bug                  | performance function
  ---------------------+-------------------------------------
  buffer overflow      | number of elements in the buffer
  deadlock             | number of locks
  data race            | number of accessed shared resources
  testing error paths  | number of error paths taken

Note that we can also test for patterns, not necessarily bugs.
(A sketch of one such performance function follows.)
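A sketch of how a bug-specific performance function from the table might look (the interface and class names here are my own, not ConCEnter's API), using buffer overflow as the example: the score of one execution is the largest number of elements observed in the buffer.

    import java.util.List;

    // A performance function maps one execution (here: a trace of observed buffer sizes)
    // to a score; higher scores mean "closer to the bug we are hunting".
    interface PerformanceFunction<E> {
        double score(E execution);
    }

    // Buffer-overflow hunting: reward executions in which the buffer got fuller.
    class BufferFillScore implements PerformanceFunction<List<Integer>> {
        @Override
        public double score(List<Integer> bufferSizesDuringRun) {
            return bufferSizesDuringRun.stream().mapToInt(Integer::intValue).max().orElse(0);
        }
    }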

Implementation – in Java for Java

(Architecture diagram: Instrumentation of the program under test; Decider, Stopper, Evaluator, and Updater components; and a probability distribution table, kept on disk.)
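As a hedged illustration of the diagram (the component name comes from the slide, but this code and its interface are assumptions, not ConCEnter's implementation), a Decider-like component could pick the next transition at a decision node by sampling from the probability distribution table:

    import java.util.Map;
    import java.util.Random;

    // Illustrative only: choose the next outgoing edge at a decision node by sampling
    // from the (normalized) probabilities stored for that node in the distribution table.
    public class Decider {
        private final Random rnd = new Random();

        // outgoingProbabilities: edge target -> probability, assumed to sum to 1 for this node
        String chooseNext(Map<String, Double> outgoingProbabilities) {
            double u = rnd.nextDouble();
            double acc = 0.0;
            String last = null;
            for (Map.Entry<String, Double> e : outgoingProbabilities.entrySet()) {
                acc += e.getValue();
                last = e.getKey();
                if (u < acc) return e.getKey();
            }
            return last;   // guard against floating-point rounding
        }
    }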

Experimental results

- We ran ConCEnter on several examples with buffer overflow and with deadlocks.
- The bugs were very rare and did not manifest themselves in random testing.
- ConCEnter found the bugs successfully.
- The method requires significant tuning: the modulo parameter, the smoothing parameter, correct definition of the performance function, etc.

Example: A-B-push-pop

  myName = A;   // or B – there are two types of threads (thread A and thread B)
  loop:                                   // x10
    if (top_of_stack = myName) pop;
    else push(myName);
  end loop;

The probability of stack overflow is exponentially small.
(Figure: the shared stack with interleaved entries A, B, A, …)
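A runnable sketch of this example in Java (the thread count, the per-thread iteration count, and the overflow threshold are my guesses; the slide only gives the pseudocode): each thread pops when its own mark is on top of the shared stack and pushes its mark otherwise, so the stack only grows under a long, very unlikely alternation of the two threads.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // A-B-push-pop: a thread pops when its own name is on top of the shared stack,
    // and pushes its name otherwise. The stack grows only while the two thread types
    // keep strictly alternating, so reaching the overflow threshold is very rare.
    public class ABPushPop {
        static final Deque<Character> stack = new ArrayDeque<>();
        static final int LIMIT = 10;   // "overflow" threshold for this sketch

        static void run(char myName, int iterations) {
            for (int i = 0; i < iterations; i++) {
                synchronized (stack) {
                    if (!stack.isEmpty() && stack.peek() == myName) {
                        stack.pop();
                    } else {
                        stack.push(myName);
                        if (stack.size() >= LIMIT) {
                            System.out.println("stack overflow reached by " + myName);
                        }
                    }
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread a = new Thread(() -> run('A', 1000));
            Thread b = new Thread(() -> run('B', 1000));
            a.start(); b.start();
            a.join();  b.join();
            System.out.println("final stack size: " + stack.size());
        }
    }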

Future work

- Automatic tuning.
- Making ConCEnter plug-and-play for some predefined bugs.
- Replay: can we use distance from a predefined execution as a performance function? (This works already.)
- Second best: what if there are several areas in the graph where the maximum is reached?
- What are the restrictions on the performance function in order for this method to work properly? (It seems that the function should be smooth enough.)

Related work

- Testing (nothing specifically targeted to rare bugs):
  o Random testing
  o Stress testing
  o Noise makers
  o Coverage estimation
  o Bug-specific heuristics
  o Genetic algorithms
  o …
- Cross-entropy applications (cross-entropy is useful in many areas):
  o Buffer allocation, neural computation, DNA sequence alignment, scheduling, graph problems, …