Automated Whitebox Fuzz Testing
(NDSS 2008)
Patrice Godefroid, Microsoft Research, pg@microsoft.com
Michael Y. Levin, Microsoft (CSE), mlevin@microsoft.com
David Molnar, UC Berkeley, dmolnar@eecs.berkeley.edu
Presented by:
Edmund Warner
University of Central Florida
April 7, 2011
Acknowledgments
Figures are taken directly from the paper or the original presentation slides
Some slides are reused from the original presentation
Overview
Definition of Whitebox Fuzz Testing
The Search Algorithm
SAGE (Scalable, Automated, Guided Execution)
Test Findings
Conclusions
What is Whitebox Fuzz Testing?
Fuzz testing is a form of blackbox random testing
It can be remarkably effective, but there are limitations
Given the branch statement
if (x == 10) then ...
the then branch has only a 1 in 2^32 chance of being executed if x is a random 32-bit input
Random testing can therefore yield low code coverage (a toy illustration follows below)
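As an illustration of this point (my example, not the paper's), even a million random 32-bit inputs will almost never take the then branch:

```python
import random

def program(x):
    # The then branch is taken with probability 1 in 2**32 for random x.
    if x == 10:
        return "then"
    return "else"

# Count how often a million random inputs hit the then branch.
hits = sum(program(random.getrandbits(32)) == "then" for _ in range(10**6))
print(hits)  # almost certainly prints 0
```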
Whitebox Fuzz Testing
Combine fuzz testing with dynamic test generation (a minimal sketch of the loop follows below):
Run the code with its input
Collect constraints on inputs with symbolic execution
Generate new constraints
Solve the constraints with a constraint solver
Synthesize new inputs
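A minimal sketch of this loop, with the symbolic-execution and solving pieces passed in by the caller (run_symbolic, negate, solve, apply_solution, and check are hypothetical stand-ins, not SAGE's actual API; inputs are assumed hashable, e.g. bytes):

```python
def whitebox_fuzz(seed, check, run_symbolic, negate, solve, apply_solution):
    """Dynamic test generation loop: run, collect, negate, solve, repeat."""
    worklist, seen = [seed], {seed}
    while worklist:
        inp = worklist.pop()
        check(inp)              # run the code with its input, watch for bugs
        pc = run_symbolic(inp)  # collect constraints via symbolic execution
        for j, pred in enumerate(pc):
            alt = list(pc[:j]) + [negate(pred)]  # generate a new constraint system
            model = solve(alt)                   # solve with the constraint solver
            if model is not None:
                new = apply_solution(inp, model)  # synthesize a new input
                if new not in seen:
                    seen.add(new)
                    worklist.append(new)
```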
Whitebox Fuzz Testing
In theory, this approach can lead to full program path coverage
In practice, it will fall short and the search will be incomplete:
The number of execution paths in the program is huge
Symbolic execution, constraint generation, and constraint solving are necessarily imprecise
The Search Algorithm
With blackbox fuzzing, the error in the paper's running example is unlikely to be caught (only 5 values out of the 2^(8*4) possible 4-byte inputs)
For dynamic test generation, however, this is rather simple (the example program is reconstructed below)
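For reference, the paper's running example is a small function that errors only when at least three of the four input bytes spell "bad!"; a reconstruction in Python (the original is C code):

```python
def top(inp: bytes):
    """Reconstruction of the paper's running example program."""
    cnt = 0
    if inp[0] == ord('b'): cnt += 1
    if inp[1] == ord('a'): cnt += 1
    if inp[2] == ord('d'): cnt += 1
    if inp[3] == ord('!'): cnt += 1
    if cnt >= 3:
        raise AssertionError("error")  # reached by e.g. b"bad!" or b"gad!"
```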
Dynamic Test Generation
For instance, we run the input "good" on the program
From the conditional statements executed, we collect the path constraint:
<i0 != 'b', i1 != 'a', i2 != 'd', i3 != '!'>
Negating the last constraint creates a new path constraint:
<i0 != 'b', i1 != 'a', i2 != 'd', i3 = '!'>
which is satisfied by the new input "goo!" (see the expansion sketch below)
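Concretely, negating each of the four predicates in turn yields four new inputs, matching the paper's search tree; a toy sketch of this expansion for top() above (helper name is mine):

```python
GOAL = b"bad!"

def expand_good(inp: bytes):
    """Negate each predicate of the path constraint that top() produces."""
    children = []
    for k in range(4):
        child = bytearray(inp)
        if inp[k] == GOAL[k]:
            child[k] = ord('x')   # negate "byte matched": force any mismatch
        else:
            child[k] = GOAL[k]    # negate "byte mismatched": force a match
        children.append(bytes(child))
    return children

print(expand_good(b"good"))  # [b'bood', b'gaod', b'godd', b'goo!']
```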
Limitations
Path explosion:
Does not scale to large, realistic programs
Can be alleviated with different methods in the search algorithm
Imperfect symbolic execution:
Complex program statements (e.g., pointer manipulation)
OS and library functions (modeling them is costly)
The Search Algorithm
Solution: generational search (sketched below)
Places the initial input in a workList
Runs the program on the initial input, checking for bugs in this first execution
The workList is processed by selecting an element and expanding it
The program is run on each resulting child input
Each child input is assigned a score
Child inputs are added to the workList
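A sketch of this top-level loop under those assumptions (run_and_check, expand_execution, and score are hypothetical stand-ins for SAGE's tester, symbolic executor, and coverage-based scoring; inputs carry a mutable bound attribute):

```python
import heapq
import itertools

_tiebreak = itertools.count()  # breaks score ties in the priority queue

def search(seed, run_and_check, expand_execution, score):
    """Generational search: always expand the highest-scoring input next."""
    seed.bound = 0
    run_and_check(seed)                      # look for bugs in the first execution
    worklist = [(0, next(_tiebreak), seed)]
    while worklist:
        _, _, inp = heapq.heappop(worklist)  # select an element to expand
        for child in expand_execution(inp):  # one generation of child inputs
            run_and_check(child)             # run the program on the child input
            heapq.heappush(worklist, (-score(child), next(_tiebreak), child))
```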
The Search Algorithm
More on ExpandExecution:
Symbolically executes the program on the input
Generates the path constraint (PC)
Attempts to expand the constraints in the PC
Each expansion that can be solved is saved as a child input for later execution
The Search Algorithm
What does this mean?
Given an input with path constraint PC, ExpandExecution attempts to expand all constraints in the PC:
Instead of just the last one, as in a depth-first search
Or just the first one, as in a breadth-first search
A bound parameter is used to limit backtracking through parent nodes
End result: the largest search space is covered in the shortest amount of time (see the sketch below)
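Under the same assumptions, ExpandExecution might look like this (compute_pc, negate, solve, and apply_solution are hypothetical helpers; the bound keeps a child from re-flipping constraints its ancestors already expanded):

```python
def expand_execution(inp, compute_pc, negate, solve, apply_solution):
    """Expand every constraint in the path constraint, from inp.bound onward."""
    children = []
    pc = compute_pc(inp)                 # path constraint from symbolic replay
    for j in range(inp.bound, len(pc)):  # all constraints, not just last/first
        alt = list(pc[:j]) + [negate(pc[j])]  # keep the prefix, flip constraint j
        model = solve(alt)
        if model is not None:
            child = apply_solution(inp, model)
            child.bound = j              # child starts expanding at position j
            children.append(child)
    return children
```

In the search sketch above, these helpers would be bound in by the caller, e.g. with functools.partial.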
SAGE
Scalable, Automated, Guided Execution
Can test any file-reading program running on Windows
by treating bytes read from files as symbolic input.
SAGE Architecture
Instead of being source-based, SAGE uses machine-code-based instrumentation
Multitude of languages and build processes:
No need for specific source, compiler, and build operations
Slower to start, but encompasses much more
Compiler and post-build transformations:
By performing symbolic execution on the binary code that actually ships, SAGE can also detect bugs introduced by compilation and post-processing tools
Unavailability of source:
Source-based instrumentation may be difficult for self-modifying or JITed code
SAGE does not rely on data types or structures that are not visible at the machine-code level
Constraint Generation
SAGE is trace-based
It replays the recorded execution trace to update the concrete and symbolic stores
This allows constraints to be built over input values
For conditional jumps, it uses bitvector tags to track the EFLAGS used by the jump (see the toy sketch below)
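A toy sketch of the idea (the names and structures here are illustrative, not SAGE's actual implementation): the symbolic store maps locations to expressions over input bytes, and a conditional jump whose flags were computed from a tagged value contributes one predicate to the path constraint.

```python
symbolic_store = {}        # location -> expression over input bytes
path_constraint = []       # predicates collected along the replayed trace
last_flags = (None, None)  # what the current EFLAGS were computed from

def on_read_input(dest, offset):
    symbolic_store[dest] = ("input", offset)  # tag destination with an input byte

def on_cmp(loc, const):
    global last_flags
    last_flags = (symbolic_store.get(loc), const)  # record the EFLAGS' origin

def on_conditional_jump(taken):
    expr, const = last_flags
    if expr is not None:   # the flags depend on input: emit one predicate
        path_constraint.append((expr, "==" if taken else "!=", const))
```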
Constraint Optimization
SAGE employs a number of optimization techniques to improve speed and decrease memory consumption:
Tag caching
Unrelated constraint elimination
Local constraint caching
Flip count limit
Concretization
Constraint subsumption
Constraint subsumption checks whether a newly created constraint implies, or is implied by, an existing one, and drops the redundant constraint (sketched below)
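A hedged sketch of subsumption for one simple case, assuming strict upper-bound constraints of the form tag < k emitted at the same branch (the names and representation are mine, not SAGE's):

```python
from collections import defaultdict

bounds = defaultdict(set)  # (branch, tag) -> recorded upper bounds k

def add_upper_bound(branch, tag, k):
    """Record (tag < k) unless an existing constraint already implies it."""
    ks = bounds[(branch, tag)]
    if any(k2 <= k for k2 in ks):        # (tag < k2) implies (tag < k)
        return False                     # new constraint is subsumed: drop it
    ks -= {k2 for k2 in ks if k2 > k}    # drop constraints the new one implies
    ks.add(k)
    return True
```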
Findings
Generational search vs. depth-first search:
On the Media 1, 2, and 3 applications tested, DFS terminated in ~11 hours with nothing; the generational search ran slightly longer and found 15 crashes in 4 buckets in Media 3
Bogus (malformed) seed files find few bugs
Divergences are common: ~60%
Most bugs are shallow
Impact of the block-coverage heuristic:
Adding 10,407 blocks instead of 10,633; not very effective in most cases
Conclusions
Most unique bugs are found on well-formed input, and within few generations
The sample size may be limited, but the success in finding previously missed bugs suggests a promising new search strategy
SAGE still needs enhancement in precision and power
Contributions
A critical vulnerability was found in the MS07-017 ANI, which has
been missed by extensive blackbox testing and static analysis
A new search algorithm was introduced for systematic test
generation, which has been optimized for large applications
Introduction and implementation of SAGE, which can scale to
programs with hundreds of millions of instructions
Weaknesses
The paper itself is hard to understand in certain areas
The program's coverage sometimes shows nondeterminism:
Same input, same program, same machine, different coverage
Improvements
Paper – more figures explaining the heuristics and rules
Nondeterminism – export input coverage results to a database and check against it so that nothing is repeated