Cross-Entropy Based Testing
Hana Chockler, Benny Godlin, Eitan Farchi, Sergey Novikov
IBM Research, Haifa Labs, Haifa, Israel
© 2007 IBM Corporation

The problem: how to test for rare problems in large programs?
Testing involves running the program many times, hoping to find the problem.
o If a problem appears only in a small fraction of the runs, it is unlikely to be found by random executions.
(Searching for a needle in a haystack.)

The main idea: use the cross-entropy method!
The cross-entropy method is a widely used approach to estimating probabilities of rare events (Rubinstein).

The cross-entropy method - motivation
The problem:
o There is a probability space S with a probability distribution f, and a performance function P defined on it.
o A rare event e is the event P(s) > r, for some s ∈ S and some threshold r; under f, it happens very rarely.
o How can we estimate the probability of e?
(Figure: the space S, with one input s in which the rare event e occurs.)

The naïve idea
Generate a big enough sample and compute the probability of the rare event from the inputs in the sample.
(Figure: a huge sample from the probability space.)
This will not work: for very rare events, even a very large sample does not reflect the probability correctly.

The cross-entropy method
Wishful thinking: if we had a distribution that gives the good inputs (w.r.t. the performance function) probability 1, we would be all set...
But we don't have such a distribution.
So we approximate it in iterations, each time trying to come a little closer:
o In each iteration, we generate a sample of some (large) size.
o We update the parameters (the probability distribution) so that we get a better sample in the next iteration.

Formal definition of cross-entropy
In information theory, the cross entropy, or Kullback-Leibler "distance", between two probability distributions p and q measures the average number of bits needed to identify an event from a set of possibilities, if the coding scheme is based on the distribution q rather than on the "true" distribution p.
(It is not really a distance, because it is not symmetric.)
The cross entropy of two distributions p and q over the same discrete probability space is defined as follows:

    H(p,q) = - Σ_x p(x) log q(x)

The cross-entropy method for optimization problems [Rubinstein]
In optimization problems, we are looking for inputs that maximize the performance function.
The main difficulty is that this maximum is unknown beforehand.
The stopping point is reached when the sample has a small relative standard deviation.
The method was successfully applied to a variety of graph optimization problems:
o MAX-CUT
o Traveling salesman
o ...

Illustration
(Figure: a performance function, sampled first under the uniform distribution and then under the updated distribution, which concentrates around the maximum.)

The setting in graphs
In graph problems, we have the following:
o The space is all paths in the graph G.
o A performance function f gives each path a value.
o We are looking for a path that maximizes f.
In each iteration, we choose the best part Q of the sample, and the probability update formula for an edge e = (v,w) is

    f'(e) = (#paths in Q that use e) / (#paths in Q that go via v)

One iteration of this scheme is sketched in the code after this slide.
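The deck itself contains no code, so the following is a minimal, hypothetical Java sketch of one cross-entropy iteration over graph paths, implementing the edge-update formula above. The class and method names, the toy diamond graph, and the parameters n (sample size) and rho (elite fraction) are all illustrative assumptions, not part of the original tool.

    import java.util.*;
    import java.util.function.ToDoubleFunction;

    // Sketch of one cross-entropy iteration over paths in a graph, using the
    // update f'(e) = |{paths in Q using e}| / |{paths in Q via v}|.
    public class CrossEntropyIteration {

        // Edge probabilities: probs.get(v) maps each successor w of v to P(v -> w).
        static Map<Integer, Map<Integer, Double>> probs = new HashMap<>();
        static Random rnd = new Random();

        // Draw one path from source to sink under the current distribution.
        static List<Integer> samplePath(int source, int sink) {
            List<Integer> path = new ArrayList<>();
            int v = source;
            path.add(v);
            while (v != sink) {
                double u = rnd.nextDouble(), acc = 0.0;
                int next = -1;
                for (Map.Entry<Integer, Double> e : probs.get(v).entrySet()) {
                    next = e.getKey();        // default to the last successor
                    acc += e.getValue();
                    if (u <= acc) break;
                }
                v = next;
                path.add(v);
            }
            return path;
        }

        // One iteration: sample n paths, keep the elite fraction rho with the
        // highest performance, and re-estimate edge probabilities from the elite set Q.
        static void iterate(int source, int sink, int n, double rho,
                            ToDoubleFunction<List<Integer>> perf) {
            List<List<Integer>> sample = new ArrayList<>();
            for (int i = 0; i < n; i++) sample.add(samplePath(source, sink));
            sample.sort(Comparator.comparingDouble(perf).reversed());
            List<List<Integer>> elite = sample.subList(0, (int) Math.ceil(rho * n));

            Map<Integer, Integer> viaV = new HashMap<>();        // #elite paths through v
            Map<List<Integer>, Integer> useE = new HashMap<>();  // #elite paths using e=(v,w)
            for (List<Integer> path : elite)
                for (int i = 0; i + 1 < path.size(); i++) {
                    viaV.merge(path.get(i), 1, Integer::sum);
                    useE.merge(List.of(path.get(i), path.get(i + 1)), 1, Integer::sum);
                }
            // NB: edges can drop to probability 0 and never be sampled again;
            // the "smoothing parameter" mentioned later addresses exactly this.
            for (int v : probs.keySet())
                if (viaV.containsKey(v))
                    for (int w : probs.get(v).keySet())
                        probs.get(v).put(w,
                            useE.getOrDefault(List.of(v, w), 0) / (double) viaV.get(v));
        }

        public static void main(String[] args) {
            // Toy diamond graph 0 -> {1,2} -> 3; performance favors paths through node 1.
            probs.put(0, new HashMap<>(Map.of(1, 0.5, 2, 0.5)));
            probs.put(1, new HashMap<>(Map.of(3, 1.0)));
            probs.put(2, new HashMap<>(Map.of(3, 1.0)));
            for (int it = 0; it < 5; it++)
                iterate(0, 3, 100, 0.1, p -> p.contains(1) ? 1.0 : 0.0);
            System.out.println(probs.get(0)); // mass shifts toward the edge (0,1)
        }
    }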
Cross-entropy for testing
A program is viewed as a graph:
o Each decision point is a node in the graph.
o Decision points can result from any non-deterministic or otherwise not predetermined decisions: concurrency, inputs, coin tossing.
The performance function is defined according to the bug that we want to find.
o More on that later...

Our implementation
We focus on concurrent programs.
A program under test is represented as a graph, with the nodes being the synchronization points.
(This works only if there is a correct locking policy.)
o Edges are possible transitions between nodes.
o The graph is assumed to be a DAG - all loops are unwound.
o The graph is constructed on-the-fly during the executions.
The initial probability distribution is uniform among edges.
We collect a sample of several hundred executions.
We adjust the probabilities of edges according to the formula.
We repeat the process until the sample has a very small relative standard deviation (1-5%).

Dealing with loops

    for i = 1 to 100 do
        sync node;
    end for

Unwinding all loops creates a huge graph. Problems with huge graphs:
o They take more space to represent.
o They take more time to converge.
We assume that most of the time, we are doing the same thing on subsequent iterations of the loop.
So we introduce a modulo parameter: loop iterations are identified by i mod m. For instance, modulo 2 creates two nodes for each location inside the loop - one for even and one for odd iterations.
This reduces the size of the graph dramatically, but also loses information.
There is a balance between a too-small and a too-large modulo parameter, which is found empirically.

Bugs and performance functions

    bug                  | performance function
    ---------------------+--------------------------------------
    buffer overflow      | number of elements in the buffer
    deadlock             | number of locks
    data race            | number of accessed shared resources
    testing error paths  | number of error paths taken

Note that we can also test for patterns, not necessarily bugs.

Implementation - in Java, for Java
(Architecture diagram: Instrumentation, Decider, Evaluator, Updater, and Stopper components around the program under test; the probability distribution table is kept on disk.)

Experimental results
We ran ConCEnter on several examples with buffer overflows and with deadlocks.
The bugs were very rare and did not manifest themselves in random testing.
ConCEnter found the bugs successfully.
The method requires significant tuning: the modulo parameter, the smoothing parameter, the correct definition of the performance function, etc.

Example: A-B-push-pop
There are two types of threads, A and B (x10 of each on the slide), all working on a shared stack:

    myName = A  // or B - there are two types
    loop:
        if (top_of_stack == myName) pop;
        else push(myName);
    end loop

A stack overflow requires a long run of consecutive pushes, so under random scheduling the probability of stack overflow is exponentially small.
A hedged Java rendering of this example follows.
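The slide gives only pseudocode, so here is a hypothetical, self-contained Java version of the A-B-push-pop example. The thread count, iteration bound, the LIMIT threshold, and the maxDepth performance function (the "number of elements in the buffer" from the table above) are illustrative assumptions.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // A-B-push-pop: each thread pops when its own name is on top of the shared
    // stack and pushes its name otherwise. An overflow needs a long run of
    // consecutive pushes, which random scheduling makes exponentially unlikely -
    // exactly the kind of rare bug the cross-entropy method targets.
    public class ABPushPop {
        static final Deque<Character> stack = new ArrayDeque<>();
        static final int LIMIT = 50;        // "overflow" threshold (assumption)
        static volatile int maxDepth = 0;   // performance function: max stack depth

        static void run(char myName, int steps) {
            for (int i = 0; i < steps; i++) {
                synchronized (stack) {      // the synchronization point = a graph node
                    if (!stack.isEmpty() && stack.peek() == myName) stack.pop();
                    else stack.push(myName);
                    maxDepth = Math.max(maxDepth, stack.size());
                    if (stack.size() >= LIMIT)          // signal the rare bug
                        throw new IllegalStateException("stack overflow reached");
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread[] threads = new Thread[20];  // ten threads of each type
            for (int i = 0; i < 20; i++) {
                char name = i < 10 ? 'A' : 'B';
                threads[i] = new Thread(() -> run(name, 10_000));
                threads[i].start();
            }
            for (Thread t : threads) t.join();
            // Under random scheduling maxDepth stays small; a cross-entropy-guided
            // scheduler would bias choices at the synchronized block to maximize it.
            System.out.println("max stack depth observed: " + maxDepth);
        }
    }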
Future work
o Automatic tuning.
o Making ConCEnter plug-and-play for some predefined bugs.
o Replay: can we use the distance from a predefined execution as a performance function? (This works already.)
o Second best: what if there are several areas in the graph where the maximum is reached?
o What are the restrictions on the performance function that make this method work properly? (It seems that the function should be smooth enough.)

Related work
Testing:
o Random testing
o Stress testing
o Noise makers
o Coverage estimation
o Bug-specific heuristics
o Genetic algorithms
o ...
(None of these is specifically targeted at rare bugs.)
Cross-entropy applications:
o Buffer allocation, neural computation, DNA sequence alignment, scheduling, graph problems, ...
(Cross-entropy is useful in many areas.)
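The replay item above suggests a concrete performance function: the (negated) distance between the current execution trace and a recorded target trace. The deck gives no details, so this is a speculative sketch assuming traces are recorded as lists of synchronization-point IDs; editDistance, performance, and the sample traces are all hypothetical names.

    import java.util.List;

    // Hypothetical performance function for the "replay" idea: score an execution
    // by how close its trace of synchronization points is to a target trace.
    // Higher is better, so we negate the edit distance.
    public class ReplayPerformance {

        // Classic dynamic-programming edit distance between two traces.
        static int editDistance(List<Integer> a, List<Integer> b) {
            int[][] d = new int[a.size() + 1][b.size() + 1];
            for (int i = 0; i <= a.size(); i++) d[i][0] = i;
            for (int j = 0; j <= b.size(); j++) d[0][j] = j;
            for (int i = 1; i <= a.size(); i++)
                for (int j = 1; j <= b.size(); j++) {
                    int subst = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + subst);
                }
            return d[a.size()][b.size()];
        }

        // Performance relative to the target trace: maximal (0) when they match.
        static double performance(List<Integer> trace, List<Integer> target) {
            return -editDistance(trace, target);
        }

        public static void main(String[] args) {
            List<Integer> target = List.of(1, 2, 3, 4);
            System.out.println(performance(List.of(1, 2, 4), target));    // -1.0
            System.out.println(performance(List.of(1, 2, 3, 4), target)); // 0.0
        }
    }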