Automated Whitebox Fuzz Testing

Patrice Godefroid (Microsoft Research)
Michael Y. Levin (Microsoft Center for Software Excellence)
David Molnar (UC-Berkeley, visiting MSR)
Presented by Yuyan Bao
Acknowledgement
• This presentation is extended and modified from:
– "Automated Whitebox Fuzz Testing", a presentation by David Molnar
– "Symbolic Execution for Model Checking and Testing", a presentation by Corina Păsăreanu (Kestrel)
Agenda
• Fuzz Testing
• Symbolic Execution
• Dynamic Test Generation
• Generational Search
• SAGE Architecture
• Conclusion
• Contribution
• Weakness
• Improvement
Fuzz Testing
• An effective technique for finding
security vulnerabilities in software
• Apply invalid, unexpected, or
random data to the inputs of a program.
• If the program fails(crashing, access violation
exception), the defects can be noted.
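To make the blackbox picture concrete, here is a minimal random-mutation fuzzer sketch in Python; the target binary ./target and the seed file seed.ani are hypothetical placeholders:

# Minimal blackbox fuzzer sketch: mutate a seed file at random and
# watch the target for abnormal exits. Purely illustrative.
import random
import subprocess

def mutate(data: bytes, n_flips: int = 8) -> bytes:
    """Overwrite a few randomly chosen bytes with random values."""
    buf = bytearray(data)
    for _ in range(n_flips):
        buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

seed = open("seed.ani", "rb").read()      # hypothetical well-formed seed
for i in range(10_000):
    with open("fuzzed.ani", "wb") as f:
        f.write(mutate(seed))
    # A crash or access violation shows up as an abnormal exit code.
    result = subprocess.run(["./target", "fuzzed.ani"])
    if result.returncode != 0:
        print(f"iteration {i}: target failed with exit code {result.returncode}")
        break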
Whitebox Fuzzing
• This paper presents an alternative, whitebox fuzz-testing approach inspired by recent advances in symbolic execution and dynamic test generation:
– Run the code with some initial well-formed input
– Collect constraints on the input with symbolic execution
– Generate new constraints (by negating the collected constraints one by one)
– Solve the new constraints with a constraint solver
– Synthesize new inputs from the solutions
Symbolic Execution
• A method for deriving test cases that satisfy a given path
• Performed as a simulation of the computation on the path
• Initially: path function f = identity, path condition = true
• Each vertex on the path induces a symbolic composition on the path function and a logical constraint on the path condition:
– If an assignment X := g(X) was made: f ← g ∘ f
– If a conditional decision was made: path condition ← path condition ∧ branch condition[X ← f(X)]
• Output: the path function and path condition for the given path
Slide from "White Box Testing and Symbolic Execution" by Michael Beder
Symbolic Execution
• "Simulate" the code using symbolic values instead of program numeric data (PC = "path condition")

Code:
int x, y;
if (x > y) {
  x = x + y;
  y = x - y;
  x = x - y;
  if (x - y > 0)
    assert (false);
}

Symbolic execution tree:
x: X, y: Y, PC: true
├─ true branch:  x: X, y: Y, PC: X > Y
│  after x = x + y:  x: X+Y, y: Y,  PC: X > Y
│  after y = x - y:  x: X+Y, y: X,  PC: X > Y
│  after x = x - y:  x: Y,   y: X,  PC: X > Y
│  ├─ (x - y > 0) true:  x: Y, y: X, PC: X > Y ∧ Y - X > 0  → FALSE! Not reachable (checked below)
│  └─ (x - y > 0) false: x: Y, y: X, PC: X > Y ∧ Y - X <= 0
└─ false branch: x: X, y: Y, PC: X <= Y
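The "not reachable" leaf can be confirmed mechanically: the path condition X > Y ∧ Y - X > 0 has no solution. A minimal check, assuming the z3-solver Python bindings and modeling x and y as unbounded integers, as the tree above does:

# Confirm that the path condition of the assert(false) branch is
# unsatisfiable, i.e. no concrete input reaches the assertion.
from z3 import Ints, Solver, unsat

X, Y = Ints("X Y")
s = Solver()
# After the three assignments, x holds Y and y holds X, so the inner
# condition x - y > 0 becomes Y - X > 0 over the original inputs.
s.add(X > Y, Y - X > 0)
assert s.check() == unsat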
Dynamic Test Generation
• Execute the program starting with some initial inputs
• Perform a dynamic symbolic execution to collect constraints on the inputs
• Use a constraint solver to infer variants of the previous inputs that steer the next executions
Dynamic Test Generation
input = "good"

void top(char input[4])
{
  int cnt = 0;
  if (input[0] == 'b') cnt++;
  if (input[1] == 'a') cnt++;
  if (input[2] == 'd') cnt++;
  if (input[3] == '!') cnt++;
  if (cnt >= 3) crash();
}

• Running the program with random values for the 4 input bytes is unlikely to discover the error: the probability is about 1/2^30.
• It is difficult for traditional fuzz testing to generate input values that drive the program through all of its possible execution paths (see the sketch below).
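A minimal sketch of one round of dynamic test generation on top(), assuming the z3-solver Python bindings; traced_run is a hypothetical, hand-instrumented stand-in for "run the program and record one constraint per branch", and solve perturbs the well-formed seed as little as possible:

# One round of dynamic test generation for top() (illustrative sketch,
# not SAGE itself). Requires the z3-solver package.
from z3 import BitVec, Not, Solver, sat

SYM = [BitVec(f"i{k}", 8) for k in range(4)]   # one symbolic variable per input byte

def traced_run(data: bytes):
    """Hypothetical instrumentation: path constraints of top() on `data`."""
    path = []
    for k, ch in enumerate(b"bad!"):
        path.append(SYM[k] == ch if data[k] == ch else SYM[k] != ch)
    return path

def solve(constraints, seed: bytes):
    """Satisfy the constraints while changing the seed input minimally."""
    s = Solver()
    s.add(*constraints)
    if s.check() != sat:
        return None
    out = bytearray(seed)
    for k, v in enumerate(SYM):
        s.push()
        s.add(v == seed[k])        # try to keep the seed's byte
        if s.check() == sat:
            continue               # byte kept; the equality pins it for later bytes
        s.pop()                    # the seed's byte conflicts: take the solver's value
        s.check()
        out[k] = s.model()[v].as_long()
    return bytes(out)

path = traced_run(b"good")         # [i0 != 'b', i1 != 'a', i2 != 'd', i3 != '!']
for j in range(len(path)):         # negate each constraint, keeping the prefix
    print(solve(path[:j] + [Not(path[j])], b"good"))
# prints b'bood', b'gaod', b'godd', b'goo!'

Negating each of the four constraints of the run on "good" yields bood, gaod, godd, and goo!, exactly the "generation 1" test cases of the generational-search slides below.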
Depth-First Search
Input: good
Path constraints collected:
I0 != 'b'
I1 != 'a'
I2 != 'd'
I3 != '!'

void top(char input[4])
{
  int cnt = 0;
  if (input[0] == 'b') cnt++;
  if (input[1] == 'a') cnt++;
  if (input[2] == 'd') cnt++;
  if (input[3] == '!') cnt++;
  if (cnt >= 3) crash();
}
Depth-First Search
Inputs: good → goo!
Negating the deepest constraint (I3 != '!' becomes I3 == '!') yields the new input goo!:
I0 != 'b'
I1 != 'a'
I2 != 'd'
I3 == '!'

void top(char input[4])
{
  int cnt = 0;
  if (input[0] == 'b') cnt++;
  if (input[1] == 'a') cnt++;
  if (input[2] == 'd') cnt++;
  if (input[3] == '!') cnt++;
  if (cnt >= 3) crash();
}
Practical Limitation
Inputs: good → godd
Backtracking to negate the third constraint (I2 != 'd' becomes I2 == 'd') yields godd:
I0 != 'b'
I1 != 'a'
I2 == 'd'
I3 != '!'

void top(char input[4])
{
  int cnt = 0;
  if (input[0] == 'b') cnt++;
  if (input[1] == 'a') cnt++;
  if (input[2] == 'd') cnt++;
  if (input[3] == '!') cnt++;
  if (cnt >= 3) crash();
}

Each run flips only a single branch constraint, so the search makes slow progress: low efficiency!
Generational search
• BFS with a heuristic to maximize block coverage
• Score returns the number of new blocks covered
Slide from “Symbolic Execution” by Kevin Wallace, CSE504 2010-04-28
Generational Search
Key Idea: One Trace, Many Tests

void top(char input[4])
{
  int cnt = 0;
  if (input[0] == 'b') cnt++;
  if (input[1] == 'a') cnt++;
  if (input[2] == 'd') cnt++;
  if (input[3] == '!') cnt++;
  if (cnt >= 3) crash();
}

From the single trace of good, negating each constraint in turn gives the four "Generation 1" test cases: bood, gaod, godd, goo! (flipped constraints: I0 == 'b', I1 == 'a', I2 == 'd', I3 == '!').
The Generational Search Space

void top(char input[4])
{
  int cnt = 0;
  if (input[0] == 'b') cnt++;
  if (input[1] == 'a') cnt++;
  if (input[2] == 'd') cnt++;
  if (input[3] == '!') cnt++;
  if (cnt >= 3) crash();
}

[Search-tree figure: each test case is annotated with its generation number: 0 for the initial input good, 1 for its direct variants, 2 for their variants, and so on.]

All 16 inputs reachable from good (a sketch of the search follows):
good goo! godd god! gaod gao! gadd gad! bood boo! bodd bod! baod bao! badd bad!
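A self-contained sketch of the generational search on this example (an illustration, not SAGE's implementation): because every branch of top() tests a single byte, constraint negation is specialized here to direct byte flips, and Score counts newly covered blocks:

# Generational search sketch on the toy top() example. A child negates
# one constraint at an index >= its parent's bound; children are
# prioritized by how many not-yet-covered blocks they reach.
from heapq import heappush, heappop

GEN0 = b"good"
MAGIC = b"bad!"

def run(data):
    """Toy model of top(): returns (covered blocks, crashed?)."""
    blocks, cnt = {"entry"}, 0
    for k in range(4):
        if data[k] == MAGIC[k]:
            blocks.add(f"then{k}")
            cnt += 1
    if cnt >= 3:
        blocks.add("crash")
    return blocks, cnt >= 3

def children(data, bound):
    """Negate each branch constraint at index >= bound (one child each)."""
    for j in range(bound, 4):
        child = bytearray(data)
        # Flipping branch j toggles byte j between the magic byte and the
        # generation-0 byte (a stand-in for calling the constraint solver).
        child[j] = MAGIC[j] if data[j] != MAGIC[j] else GEN0[j]
        yield bytes(child), j

covered = set()
heap = [(0, GEN0, 0)]                     # (negated score, input, bound)
while heap:
    _, data, bound = heappop(heap)
    blocks, crashed = run(data)
    if crashed:
        print("crash found:", data)       # a 3-of-4 match such as b'badd'
        break
    for child, j in children(data, bound):
        new_blocks, _ = run(child)        # Score: how many new blocks?
        score = len(new_blocks - covered)
        covered |= new_blocks
        heappush(heap, (-score, child, j + 1))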
SAGE Architecture
(Scalable Automated Guided Execution)

Input0
→ Tester task: check for crashes (AppVerifier)
→ Tracer task: trace the program (Nirvana), producing a trace file
→ CoverageCollector task: gather constraints (Truscan)
→ SymbolicExecutor task: solve constraints (Disolver)
→ Input1, Input2, …, InputN (fed back to the Tester task; the control flow is sketched below)
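The orchestration of this pipeline can be sketched as follows; every task function is a trivial stub standing in for the real component (AppVerifier, Nirvana, Truscan, Disolver), so only the control flow is meaningful:

# Schematic of SAGE's task pipeline (illustrative stubs only).
from collections import deque

def tester(inp):            return False   # stub for AppVerifier: did inp crash?
def tracer(inp):            return inp     # stub for Nirvana: record a trace file
def coverage_collector(tr): return set()   # stub: blocks covered by the trace
def symbolic_executor(tr):  return []      # stub for Truscan: path constraints
def solve_constraints(cs):  return []      # stub for Disolver: new inputs

def sage_pipeline(input0, budget=100):
    queue, crashes = deque([input0]), []
    while queue and budget:
        budget -= 1
        inp = queue.popleft()
        if tester(inp):                          # Task: Tester
            crashes.append(inp)
            continue
        trace = tracer(inp)                      # Task: Tracer
        coverage_collector(trace)                # Task: CoverageCollector
        constraints = symbolic_executor(trace)   # Task: SymbolicExecutor
        queue.extend(solve_constraints(constraints))  # Input1 ... InputN
    return crashes

sage_pipeline(b"Input0")   # runs (vacuously) with the stubs above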
Initial Experiences with SAGE
• Since the 1st MS-internal release in April '07: dozens of new security bugs found (most missed by blackbox fuzzers and static analysis)
• Apps: image processors, media players, file decoders, …
• Many bugs found were rated "security critical, severity 1, priority 1"
• Now used regularly by several teams as part of their QA process
Experiments
ANI Parsing - MS07-017

Initial input (341 branch constraints, 1,279,939 total instructions):
RIFF...ACONLIST
B...INFOINAM....
3D Blue Alternat
e v1.1..IART....
................
1996..anih$...$.
................
................
..rate..........
..........seq ..
................
..LIST....framic
on......... ..

Crashing test case (only a 1 in 2^32 chance at random!), found after 7,706 test cases in 7 hrs 36 min:
RIFF...ACONB
B...INFOINAM....
3D Blue Alternat
e v1.1..IART....
................
1996..anih$...$.
................
................
..rate..........
..........seq ..
................
..anih....framic
on......... ..
Different Crashes From Different Initial Input Files

[Table: distinct crash buckets (Bug IDs) versus seven initial seed files (seed1 through seed7); an X marks that the bug was found starting from that seed. Bug IDs: 1867196225, 2031962117, 612334691, 1061959981, 1212954973, 1011628381, 842674295, 1246509355, 1527393075, 1277839407, 1951025690. Different seeds, including one consisting of zero bytes, expose different, partially overlapping sets of bugs.]

Media 1: 60 machine-hours, 44,598 total tests, 357 crashes, 12 bugs
Conclusion
Blackbox vs. Whitebox Fuzzing
• Cost/precision tradeoffs
– Blackbox is lightweight, easy, and fast, but has poor coverage
– Whitebox is smarter, but complex and slower
– Recent "semi-whitebox" approaches are less smart but more lightweight: Flayer (taint-flow analysis, may generate false alarms), Bunny-the-fuzzer (taint-flow, source-based, heuristics to fuzz based on input usage), autodafe, etc.
• Which is more effective at finding bugs? It depends…
– Many apps are so buggy that any form of fuzzing finds bugs!
– Once the low-hanging bugs are gone, fuzzing must become smarter: use whitebox techniques and/or user-provided guidance (grammars, etc.)
• Bottom line: in practice, use both!
Contribution
• A novel search algorithm with a coverage-maximizing heuristic
• Performs symbolic execution of program traces at the x86 binary level
• Tests larger applications than previously done in dynamic test generation
Weakness
• Gathering initial inputs: the class of inputs characterized by each symbolic execution is determined by how the program's control flow depends on its inputs
• Not a general solution:
– Only for x86 Windows applications
– Focuses only on file-reading applications
– Nirvana simulates a given processor's architecture
Improvement
• Portability
– Translate collected traces into an intermediate language
– Reproduce bugs on different platforms without re-simulating the program
• A better way to find initial inputs efficiently
Thank you!
Questions?