Delta Debugging

advertisement
Simplifying and Isolating
Failure-Inducing Input
Andreas Zeller
Ralf Hildebrandt
Saarland University,
Germany
DeTeLine – Deutsche
Telekom, Germany
Presented by Nir Peer
University of Maryland
Introduction
Overview
 Given some test case, a program fails.
 What is the minimal test case that still produces
the failure?
 Also, what is the difference between a passing
and a failing test case?
 or in other words…
3
How do we go from this
<td align=left valign=top>
<SELECT NAME="op sys" MULTIPLE SIZE=7>
<OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1<OPTION VALUE="Windows 95">Windows
95<OPTION VALUE="Windows 98">Windows 98<OPTION VALUE="Windows ME">Windows ME<OPTION VALUE="Windows
2000">Windows 2000<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="Mac System 7">Mac System
7<OPTION VALUE="Mac System 7.5">Mac System 7.5<OPTION VALUE="Mac System 7.6.1">Mac System
7.6.1<OPTION VALUE="Mac System 8.0">Mac System 8.0<OPTION VALUE="Mac System 8.5">Mac System
8.5<OPTION VALUE="Mac System 8.6">Mac System 8.6<OPTION VALUE="Mac System 9.x">Mac System 9.x<OPTION
VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux<OPTION VALUE="BSDI">BSDI<OPTION
VALUE="FreeBSD">FreeBSD<OPTION VALUE="NetBSD">NetBSD<OPTION VALUE="OpenBSD">OpenBSD<OPTION
VALUE="AIX">AIX<OPTION VALUE="BeOS">BeOS<OPTION VALUE="HP-UX">HP-UX<OPTION VALUE="IRIX">IRIX<OPTION
VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS<OPTION VALUE="OS/2">OS/2<OPTION
VALUE="OSF/1">OSF/1<OPTION VALUE="Solaris">Solaris<OPTION VALUE="SunOS">SunOS<OPTION
VALUE="other">other</SELECT>
</td>
<td align=left valign=top>
<SELECT NAME="priority" MULTIPLE SIZE=7>
<OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2<OPTION VALUE="P3">P3<OPTION
VALUE="P4">P4<OPTION VALUE="P5">P5</SELECT>
</td>
<td align=left valign=top>
<SELECT NAME="bug severity" MULTIPLE SIZE=7>
<OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical<OPTION VALUE="major">major<OPTION
VALUE="normal">normal<OPTION VALUE="minor">minor<OPTION VALUE="trivial">trivial<OPTION
VALUE="enhancement">enhancement</SELECT>
</tr>
</table>
File
Print
Segmentation Fault
4
into this
<SELECT>
File
Print
Segmentation Fault
5
Motivation
 The Mozilla open-source web browser project receives
several dozens bug reports a day.
 Each bug report has to be simplified
 Eliminate all details irrelevant to producing the failure
 To facilitate debugging
 To make sure it does not replicate a similar bug report
 In July 1999, Bugzilla listed more than 370 open bug
reports for Mozilla.
 These were not even simplified
 Mozilla engineers were overwhelmed with work
 They created the Mozilla BugAThon: a call for volunteers to process bug
reports
6
Motivation
 Simplifying meant turning bug reports into
minimal test cases
 where every part of the input would be significant in
reproducing the failure
 What we want is the simplest HTML page that
still produces the fault.
 Decomposing specific bug reports into simple
test case is of general interest…
 Let’s automate this task!
7
Simplification of test cases
 The minimizing delta debugging algorithm
ddmin
 Takes a failing test case
 Simplifies it by successive testing
 Stops when a minimal test case is reached
 where removing any single input entity will cause the failure to
disappear
8
How to minimize a test case?
 Test subsets with
removed characters
(shown in grey)
 A given test case
 Fails () if Mozilla crashes on
it
 Passes () otherwise
9
How to minimize a test case?
Original failing input
1
<SELECTNAME="priority"MULTIPLESIZE=7>

2
<SELECTNAME="priority"MULTIPLESIZE=7>

3
<SELECTNAME="priority"MULTIPLESIZE=7>

Try removing half…
Now everything passes, we’ve lost the error inducing input!
Try removing a quarter… ok found something!
10
How to minimize a test case?
Try removing a quarter
instead
OK, we’ve got
something!
So keep it, and continue…
Good, carry on
Lost it!
Try removing an
eighth instead…
1
<SELECTNAME="priority"MULTIPLESIZE=7>

2
<SELECTNAME="priority"MULTIPLESIZE=7>

3
<SELECTNAME="priority"MULTIPLESIZE=7>

4
<SELECTNAME="priority"MULTIPLESIZE=7>

5
<SELECTNAME="priority"MULTIPLESIZE=7>

6
<SELECTNAME="priority"MULTIPLESIZE=7>

7
<SELECTNAME="priority"MULTIPLESIZE=7>

11
How to minimize a test case?
6
<SELECTNAME="priority"MULTIPLESIZE=7>

Removing an eighth
7
<SELECTNAME="priority"MULTIPLESIZE=7>

Good, keep it!
8
<SELECTNAME="priority"MULTIPLESIZE=7>

9
<SELECTNAME="priority"MULTIPLESIZE=7>

10
<SELECTNAME="priority"MULTIPLESIZE=7>

11
<SELECTNAME="priority"MULTIPLESIZE=7>

12
<SELECTNAME="priority"MULTIPLESIZE=7>

13
<SELECTNAME="priority"MULTIPLESIZE=7>

14
<SELECTNAME="priority"MULTIPLESIZE=7>

15
<SELECTNAME="priority"MULTIPLESIZE=7>

16
<SELECTNAME="priority"MULTIPLESIZE=7>

17
<SELECTNAME="priority"MULTIPLESIZE=7>

18
<SELECTNAME="priority"MULTIPLESIZE=7>

Lost it!
Try removing a
sixteenth instead…
Great! we’re
making progress
OK, now let’s see if removing single characters helps us reduce it even more
12
How to minimize a test case?
Removing a single
character
Reached a minimal
test case!
Therefore, this
should be our
test case
17
<SELECTNAME="priority"MULTIPLESIZE=7>

18
<SELECTNAME="priority"MULTIPLESIZE=7>

19
<SELECTNAME="priority"MULTIPLESIZE=7>

20
<SELECTNAME="priority"MULTIPLESIZE=7>

21
<SELECTNAME="priority"MULTIPLESIZE=7>

22
<SELECTNAME="priority"MULTIPLESIZE=7>

23
<SELECTNAME="priority"MULTIPLESIZE=7>

24
<SELECTNAME="priority"MULTIPLESIZE=7>

25
<SELECTNAME="priority"MULTIPLESIZE=7>

26
<SELECTNAME="priority"MULTIPLESIZE=7>

26
<SELECTNAME="priority"MULTIPLESIZE=7>

13
Formalization
Testing for Change
 The execution of a program is determined by a a number
of circumstances





The program code
Data from storage or input devices
The program’s environment
The specific hardware
and so on
 We’re only interested in the changeable circumstances
 Those whose change may cause a different program behavior
15
The change that Causes a Failure
 Denote the set of possible configurations of
circumstances by R.
 Each rR determines a specific program run.
 This r could be
 a failing run, denoted by r
 a passing run, denoted by r
 Given a specific r
 We focus on the difference between r and some rR that works
 This difference is the change which causes the failure
 The smaller this change, the better it qualifies as a failure cause
16
The change that Causes a Failure
 Formally, the difference between r and r is expressed
as a mapping  which changes the circumstances of a
program run
Definition 1 (Change).
A change  is a mapping  : RR.
The set of changes is C  R  R.
The relevant change between two runs r,rR is
a change C s.t. (r)  r.
 The exact definition of  is problem specific
 In the Mozilla example, applying  means to expand a trivial (empty)
HTML input to the full failure-inducing HTML page.
17
Decomposing Changes
 We assume that the relevant change  can be
decomposed into a number of elementary
changes 1,..., n.
 In general, this can be an atomic decomposition
 Changes that can no further be decomposed
Definition 2 (Composition of changes).
The change composition : C  C  C is defined as
(i  j)(r)  i(j(r))
18
Test Cases and Tests
 According to the POSIX 1003.3 standard for testing
frameworks, we distinguish three test outcomes:
 The test succeeds (PASS, written here as )
 The test has produced the failure it was intended to capture (FAIL,
written here as )
 The test produced indeterminate results (UNRESOLVED, written as ?)
Definition 3 (rtest).
The function rtest : R  {,,?} determines for a program run rR
whether some specific failure occurs () or not () or whether the test is
unresolved (?).
Axiom 4 (Passing and failing run).
rtest(r)   and rtest(r)   hold.
19
Test Cases and Tests
 We identify each run by the set of changes being
applied to r
 We define c as the empty set  which identifies r (no
changes applied)
 The set of all changes c  {1,2,...,n} identifies
r  (12...n)(r)
Definition 5 (Test case).
A subset c  c is called a test case.
20
Test Cases and Tests
Definition 6 (test).
The function test : 2  {,,?} is defined as follows:
Let c  c be a test case with c  {1,2,...,n}. Then
test(c)  rtest((12...n)(r))
holds.
Corollary 7 (Passing and failing test cases).
The following holds:
test(c)  test()  
(“passing test case”)
test(c)  test({1,2,...,n})  
(“failing test case”)
21
Minimizing Test Cases
Minimal Test Cases
 If a test case c  c is a minimum, no other smaller
subset of c causes a failure
Definition 8 (Global minimum).
A set c  c is called the global minimum of c if:
c'  c  (|c'| < |c|  test(c')  )
holds.
 But we don't want to have to test all 2|c| of c
 So we'll settle for a local minimum
 A test case is minimal if none of its subsets causes a failure
Definition 9 (Local minimum).
A test case c  c is a local minimum of c or minimal if:
c'  c  (test(c')  )
holds.
23
Minimal Test Cases
 Thus, if a test case c is minimal
 It is not necessarily the smallest test case (there may be a
different global minimum)
 But each element of c is relevant in producing the failure
 Nothing can be removed without making the failure disappear
 However, determining that c is minimal still
requires 2|c| tests
 We can use an approximation instead
 It is possible that removing several changes at once might make a test
case smaller
 But we'll only check if this is so when we remove up to n changes
24
Minimal Test Cases
 We define n-minimality: removing any combination of up
to n changes, causes the failure to disappear
 We're actually most interested in 1-minimal test cases
 When removing any single change causes the failure to disappear
 Removing two or more changes at once may result in an even smaller,
still failing test case
 But every single change on its own is significant in reproducing the failure
Definition 10 (n-minimal test case).
A test case c  c is n-minimal if:
c'  c  (|c| - |c'|  n  test(c')  ) holds.
Consequently, c is 1-minimal if i  c  (test(c - {i})  ) holds.
25
The Delta Debugging Algorithm
 We partition a given test case c into subsets
 Suppose we have n subsets D1,...,Dn
 We test
 each Di and
 its complement i  c - Di
26
The Delta Debugging Algorithm
 Testing each Di and its complement, we have four
possible outcomes
 Reduce to subset
 If testing any Di fails, it will be a smaller test case
 Continue reducing Di with n  2 subsets
 Reduce to complement
 If testing any i  c - Di fails, it will be a smaller test case
 Continue reducing i with n - 1 subsets
– Why n - 1 subsets and not n  2 subsets? (Maintain granularity!)
 Double the granularity
 Done
27
The Delta Debugging Algorithm
28
Example
 Consider the following minimal test case which consists
of the changes 1, 7, and 8
 Any test case that includes only a subset of these changes results in an
unresolved test outcome
 A test case that includes none of these changes passes the test
 We first partition the set of changes in two halves
 none of them passes the test
1 D1  2 1 2 3 4
2 D2  1








?
5 6 7 8 ?
29
Example
 We continue with granularity increased to four subsets
3
D1
1 2






?
4
D2


3 4





5
D3




5 6



6
D4





7 8 ?

 When testing the complements, the set 2 fails, thus
removing changes 3 and 4
7
1


8
2
1 2
3 4 5 6 7 8 ?


5 6 7 8 
 We continue with splitting 2 into three subsets
30
Example
 Steps 9 to 11 have already been carried out and need
not be repeated (marked with *)
9
D1
1 2






10
D2




5 6

 *
11
D3





7 8 *

?*
 When testing 2, changed 5 and 6 can be eliminated
12
1




5 6 7 8 ?
13
2
1 2




7 8 
 We reduce to 2 and continue with two subsets
31
Example
14 D1  2 1 2
15 D2  1












7 8 ?*
?*
 We increase granularity to four subsets and test each
16
D1
1







?
17
D2

2







18
D3






7

?
19
D4







8 ?
 Testing the complements shows the we can eliminate 2
20
1

2




7 8 ?
21
2
1





7 8 
32
Example
 The next steps show that none of the remaining changes
1, 7, and 8 can be eliminated
22
D1
1







?*
23
D2






7

?*
24
D3







8 ?*
25
1






7 8 ?
26
2
1






8 ?
27
3
1





7

?
 To minimize this test case, a total of 19 different tests
was required
33
Case Studies
The GNU C Compiler
#define SIZE 20
 This program (bug.c) causes
GCC 2.95.2 to crash when
optimization is enabled
 We would like to minimize this
program in order to file a bug
report
 In the case of GCC, a passing
program run is the empty input
 For the sake of simplicity, we
model change as the insertion of
a single character
 r is running GCC with an empty input
 r means running GCC with bug.c
 each change i inserts the ith
character of bug.c
double mult(double z[], int n)
{
int i, j;
i = 0;
for (j = 0; j < n; j++) {
i = i + j + 1;
z[i] = z[i] * (z[0] + 1.0);
}
return z[n];
}
void copy(double to[], double from[], int count)
{
int n = (count + 7) / 8;
switch (count % 8) do {
case 0: *to++ = *from++;
case 7: *to++ = *from++;
case 6: *to++ = *from++;
case 5: *to++ = *from++;
case 4: *to++ = *from++;
case 3: *to++ = *from++;
case 2: *to++ = *from++;
case 1: *to++ = *from++;
} while (--n > 0);
return mult(to, 2);
}
int main(int argc, char *argv[])
{
double x[SIZE], y[SIZE];
double *px = x;
while (px < x + SIZE)
*px++ = (px – x) * (SIZE + 1.0);
return copy(y, x, SIZE);
}
35
The GNU C Compiler
 The test procedure would
 create the appropriate subset of bug.c
 feed it to GCC
 return  iff GCC had crashed, and  otherwise
77
755
377
188
36
The GNU C Compiler
 The minimized code is
t(double z[],int n){int i,j;for(;;){i=i+j+1;z[i]=z[i]*(z[0]+0);}return z[n];}
 The test case is 1-minimal




No single character can be removed without removing the failure
Even every superfluous whitespace has been removed
The function name has shrunk from mult to a single t
This program actually has a semantic error (infinite loop), but GCC still
isn't supposed to crash
 So where could the bug be?
 We already know it is related to optimization
 If we remove the –O option to turn off optimization, the failure
disappears
37
The GNU C Compiler
 The GCC documentation lists 31 options to control
optimization on Linux
–ffloat-store
–fforce-mem
–fno-inline
–fkeep-static-consts
–fstrength-reduce
–fcse-skip-blocks
–fgcse
–fschedule-insns2
–fcaller-saves
–fmove-all-movables
–fstrict-aliasing
–fno-default-inline
–fforce-addr
–finline-functions
–fno-function-cse
–fthread-jumps
–frerun-cse-after-loop
–fexpensive-optimizations
–ffunction-sections
–funroll-loops
–freduce-all-givs
–fno-defer-pop
–fomit-frame-pointer
–fkeep-inline-functions
–ffast-math
–fcse-follow-jumps
–frerun-loop-opt
–fschedule-insns
–fdata-sections
–funroll-all-loops
–fno-peephole
 It turns out that applying all of these options causes the
failure to disappear
 Some option(s) prevent the failure
38
The GNU C Compiler
 We can use test case minimization in order to find the
preventing option(s)
 Each i stands for removing a GCC option
 Having all i applied means to run GCC with no option (failing)
 Having no i applied means to run GCC with all options (passing)
 After seven tests, the single option -ffast-math is
found which prevents the failure
 Unfortunately, it is a bad candidate for a workaround because it may
alter the semantics of the program
 Thus, we remove -ffast-math from the list of options and make
another run
 Again after seven tests, it turn out that -fforce-addr also
prevents the failure
 Further examination shows that no other option prevents the failure
39
The GNU C Compiler
 So, this is what we can send to the GCC
maintainers
 The minimal test case
 “The failure only occurs with optimization”
 “-ffast-math and -fforce-addr prevent the
failure”
40
Minimizing Fuzz
 In a classical experiment by Miller et al. several UNIX utilities were
fed with fuzz input – a large number of random characters
 The studies showed that in the worst case 40% of the basic programs crashed
or went into infinite loops
 We would like to use the ddmin algorithm to minimize the fuzz input
sequences
 We examine the following six UNIX utilities
 NROFF (format documents for display)
 TROFF (format documents for typesetter)
 FLEX (fast lexical analyzer generator)
 CRTPLOT (graphics filter for various plotters)
 UL (underlining filter)
 UNITS (convert quantities)
41
Minimizing Fuzz
 The following table summarizes the characteristics of the
different fuzz inputs used
Name
t1
t2
t3
t4
t5
t6
t7
t8
t9
t10 t11 t12 t13 t14 t15 t16
Character range
all
printable
all
printable
NUL characters
yes
yes
no
no
Input size
103 104 105 106 103 104 105 106 103 104 105 106 103 104 105 106
Name
t1
t2
t3
t4
t5
t6
NROFF
S S S S
–
A A A S S S S
–
–
–
–
TROFF
–
S S S
–
A A S
–
–
S S
–
–
–
–
FLEX
–
–
–
–
–
–
–
–
–
–
–
–
–
S S S
CRTPLOT
S
–
–
–
S
–
–
–
S
–
–
–
S S S S
UL
–
–
–
–
S S S S
–
–
–
–
S S S S
UNITS
–
S S S
–
–
–
S S
–
t7
–
t8
–
t9
S  segmentation fault, A  arithmetic exception
t10 t11 t12 t13 t14 t15 t16
–
–
–
–
42
Minimizing Fuzz
 Out of the 6  16  96 test runs, the utilities
crashed 42 times (43%)
 We apply our algorithm to minimize the failureinducing fuzz input
 Our test function returns  if the input made the program
crash
 or  otherwise
43
Download