Automated Whitebox Fuzz Testing
Network and Distributed System Security
(NDSS) 2008
by Patrice Godefroid, Michael Y. Levin, and David Molnar
Presented by
Diego Velasquez
Acknowledgments

Figures are copied from the paper [1].

Some slides were taken from the original presentation given by the authors [2].
Outline

Summary
 Goals
 Motivations
 Methods
 Experiments
 Results
 Conclusions

Review
 Strengths
 Weaknesses
 Extensions

References
Goals

Propose a novel methodology that performs fuzz testing efficiently.

Introduce a new search algorithm for
systematic test generation.

Showcase their system SAGE (Scalable, Automated, Guided Execution)
Methods

Fuzz testing feeds random data to the inputs of an application in order to find defects in a software system. It is heavily used in security testing.


Pros: cost effective and can find many bugs
Cons: it has limitations with certain types of branches. For example, in project 2, in order to find bug #10 we need to execute the if statement below.
if (address == 613 && value >= 128 && value < 255)  // Bug #10
    printf("BUG 10 TRIGGERED");

This branch has only about a (1 in 5000) * (127 in 2^32) chance of being executed, given that there are only 5000 possible addresses and value is a random 32-bit input.
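As a rough illustration (my own sketch, not from the paper or the project), the hypothetical blackbox fuzzing harness below drives the check above with random inputs and therefore almost never reaches the guarded branch:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical blackbox fuzzing harness: address and value are drawn at
 * random, so the guarded branch is reached with probability of only about
 * (1/5000) * (127 / 2^32) per attempt. */
static void check(unsigned address, unsigned value) {
    if (address == 613 && value >= 128 && value < 255)  /* Bug #10 */
        printf("BUG 10 TRIGGERED\n");
}

int main(void) {
    srand(42);
    for (long i = 0; i < 10000000L; i++) {
        unsigned address = (unsigned)(rand() % 5000);                  /* one of 5000 addresses */
        unsigned value = ((unsigned)rand() << 16) ^ (unsigned)rand();  /* combine two rand() calls to cover more of the 32-bit range */
        check(address, value);
    }
    return 0;
}

Even after ten million random attempts, the bug is very unlikely to be triggered, which is the limitation whitebox fuzzing addresses.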
Methods Cont.

Whitebox Fuzz Testing

Combine fuzz testing with dynamic test generation [2]
 Run the code with some initial input
 Collect constraints on the inputs with symbolic execution
 Generate new constraints by negating branch conditions one at a time
 Solve the new constraints with a constraint solver
 Synthesize new inputs
Methods Cont.

The Search Algorithm
Figure 1 from [1]


Blackbox fuzzing will do poorly in this case
Dynamic test generation can do better
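For concreteness, the example in Figure 1 of [1] is roughly the small function below (reconstructed from memory, so details may differ slightly). The seed input is the 4-byte string "good", and the error is reached only when at least three of the four byte comparisons succeed, which random blackbox fuzzing is very unlikely to achieve.

#include <stdlib.h>

/* Approximate reconstruction of the running example in Figure 1 of [1]. */
void top(char input[4]) {
    int cnt = 0;
    if (input[0] == 'b') cnt++;
    if (input[1] == 'a') cnt++;
    if (input[2] == 'd') cnt++;
    if (input[3] == '!') cnt++;
    if (cnt >= 3) abort();   /* error */
}

int main(void) {
    char seed[4] = { 'g', 'o', 'o', 'd' };
    top(seed);               /* the well-formed seed input does not reach the error */
    return 0;
}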
Methods Cont.

Dynamic Approach



Use the input ‘good’ as the running example
Collect the path constraints from the execution trace
Negate constraints to create new path constraints
Figure 2 from [1]
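As a minimal sketch (my own illustration, not the paper's implementation): running top() on the seed "good" yields the path constraint i0 != 'b' && i1 != 'a' && i2 != 'd' && i3 != '!'. Negating one conjunct at a time and solving it (trivially here, by forcing that byte to the compared constant) produces the next generation of inputs.

#include <stdio.h>
#include <string.h>

/* Illustrative sketch only: for the top() example, each conjunct of the path
 * constraint compares one input byte against a constant, so "solving" a
 * negated conjunct just means setting that byte to the constant. */
int main(void) {
    const char seed[5] = "good";
    const char expected[4] = { 'b', 'a', 'd', '!' };  /* constants from the branches */

    for (int k = 0; k < 4; k++) {
        char next[5];
        memcpy(next, seed, 5);
        next[k] = expected[k];                 /* satisfy the negated k-th constraint */
        printf("new input: %s\n", next);       /* bood, gaod, godd, goo! */
    }
    return 0;
}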
Methods Cont.

Limitations of Dynamic Testing

Path Explosion



The number of paths does not scale for large, realistic programs
Can be mitigated by modifying the search algorithm
Imperfect Symbolic Execution


Can be imprecise due to complex program statements (arithmetic, pointer manipulation)
Calls to the OS are expensive to model precisely
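As a hedged illustration of imperfect symbolic execution (my own example, not from the paper): when a branch condition involves arithmetic the constraint solver cannot reason about, the symbolic layer typically falls back to the concrete value and loses precision on that path.

#include <stdio.h>

/* Illustrative only: a branch whose condition involves non-linear arithmetic.
 * If the constraint solver handles only linear constraints over the inputs,
 * it cannot invert x * x, so a whitebox fuzzer would typically replace the
 * expression with its concrete value and miss the other side of the branch. */
static void parse(unsigned x) {
    if (x * x == 0x12345678u)   /* hard for the symbolic layer */
        printf("rare path\n");
    else
        printf("common path\n");
}

int main(void) {
    parse(7u);   /* concrete run: the condition degenerates to a constant */
    return 0;
}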
Methods Cont.

New Generational Search Algorithm
Figures 3 and 4 from [1]


A variant of breadth-first search with a heuristic to generate more input test cases
The score of a test case is the number of new code blocks it covers
Methods Cont.

Summary of the Generational Search Algorithm






Push the initial input onto the worklist
Run&Check(input) runs the program and checks for bugs on that input
Traverse the worklist, selecting the entry with the highest score
Expand child paths and add them to the child list
For each child, Run&Check it, assign it a score, and add it to the worklist
ExpandExecution



Generates the path constraint
Attempts to expand (negate) the path constraints and saves the resulting inputs
input.bound is used to limit the backtracking of each sub-search above the branch (see the sketch below)
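To make the steps above concrete, here is a minimal sketch of a generational search in the spirit of Figures 3 and 4 of [1], specialized to the 4-byte top() example. A real implementation performs symbolic execution of a trace and calls a constraint solver; here each "constraint" is simply whether byte k equals the constant it is compared against, so negating a constraint means flipping that byte, and the score is a crude stand-in for the paper's block-coverage heuristic. The names (TestCase, run_and_check, score_of, and the bound handling) are illustrative, not taken verbatim from the paper.

#include <stdio.h>

#define NBYTES 4
static const char expected[NBYTES] = { 'b', 'a', 'd', '!' };

typedef struct { char data[NBYTES + 1]; int bound; int score; } TestCase;

static int seen[1 << NBYTES];    /* "coverage": which branch-outcome patterns were observed */

/* Run the program on an input, report the bug, and return the branch outcomes. */
static int run_and_check(const char *in) {
    int cnt = 0, mask = 0;
    for (int k = 0; k < NBYTES; k++)
        if (in[k] == expected[k]) { cnt++; mask |= 1 << k; }
    if (cnt >= 3) printf("bug triggered by input \"%s\"\n", in);
    return mask;
}

/* Crude coverage heuristic: 1 if this outcome pattern is new, else 0. */
static int score_of(int mask) {
    int fresh = seen[mask] ? 0 : 1;
    seen[mask] = 1;
    return fresh;
}

int main(void) {
    TestCase work[64];
    int n = 0;

    TestCase seed = { "good", 0, 0 };
    seed.score = score_of(run_and_check(seed.data));
    work[n++] = seed;

    while (n > 0) {
        /* Pick the highest-scoring test case from the worklist. */
        int best = 0;
        for (int i = 1; i < n; i++)
            if (work[i].score > work[best].score) best = i;
        TestCase cur = work[best];
        work[best] = work[--n];

        /* ExpandExecution: negate each constraint at a position >= bound; the
         * child's bound keeps it from re-negating earlier constraints. */
        for (int j = cur.bound; j < NBYTES && n < 64; j++) {
            TestCase child = cur;
            child.bound = j + 1;
            child.data[j] = (child.data[j] == expected[j]) ? 'x' : expected[j];
            child.score = score_of(run_and_check(child.data));
            work[n++] = child;
        }
    }
    return 0;
}

Starting from "good", this toy search enumerates the reachable inputs and triggers the bug with inputs such as "badd" and "bad!".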
Experiments

SAGE can test any file-reading program running on Windows by treating the bytes read from files as symbolic input.

Another key novelty of SAGE is that it performs symbolic execution of program traces at the x86 binary level.
Figure from [2]
Experiments Cont.

SAGE advantages
 Not source-based: SAGE is machine-code based, so it can handle programs written in different languages
 Expensive to build at the beginning, but less expensive over time
 Can test software after shipping


Since it is based on symbolic execution of binary code, SAGE can detect bugs after the production phase
No source code is needed, unlike in other systems

SAGE does not even need data types or structures, which are not easily visible in machine code
Experiments Cont.




MS07-017: Vulnerabilities in Graphics Device Interface
(GDI) Could Allow Remote Code Execution.
Tested on different applications such as image processors, media players, and file decoders [2]
Many bugs found were rated “security critical, severity 1, priority 1” [2]
Now used regularly by several teams as part of the QA process [2]
Experiments Cont.

More on MS07-017: the listing below is from [2]; the first block is the seed input, the second is the crashing test case.

Seed input:
RIFF...ACONLIST
B...INFOINAM....
3D Blue Alternat
e v1.1..IART....
................
1996..anih$...$.
................
................
..rate..........
..........seq ..
................
..LIST....framic
on......... ..

Crashing test case:
RIFF...ACONB
B...INFOINAM....
3D Blue Alternat
e v1.1..IART....
................
1996..anih$...$.
................
................
..rate..........
..........seq ..
................
..anih....framic
on......... ..
Only 1 in 2^32 chance at random!
Results

Statistics from 10-hour searches on seven test applications, each seeded with a well-formed input file.
Results


Focused on the Media 1 and Media 2 parsers.
Ran a SAGE search for the Media 1 parser with five
“well-formed” media files, and five bogus files.
Figure 7 from [1]
Results

Compared with the depth-first search (DFS) method

DFS ran for 10 hours on Media 2 with wff-2 and wff-3 and did not find anything; the generational search found 15 crashes

Symbolic Execution is slow

Well-formed inputs are better than bogus files

Non-determinism in Coverage Results.

The coverage heuristic did not have much impact

Divergences are common
Results

Most bugs found are “shallow”
[Figure from [2]: bar chart of “# Unique First-Found Bugs” (0–3.5) across bars labeled 1–7]
Conclusions

Blackbox vs. Whitebox Fuzzing

Cost/precision tradeoffs



Blackbox is lightweight, easy, and fast, but has poor coverage
Whitebox is smarter, but complex and slower
Recent “semi-whitebox” approaches
 Less smart but more lightweight: Flayer (taint-flow analysis, may generate false alarms), Bunny-the-fuzzer (taint-flow, source-based, heuristics to fuzz based on input usage), autodafe, etc.

Which is more effective at finding bugs? It depends…
 Many apps are so buggy, any form of fuzzing finds bugs!
 Once low-hanging bugs are gone, fuzzing must become smarter: use whitebox and/or user-provided guidance (grammars, etc.)

Bottom line: in practice, use both!
(Slide from [2])
Strengths

Novel approach to fuzz testing


Can be applied to the target program as a black box



No source code is needed
Symbolic execution of the program at the x86 binary level
Presents results compared against previous approaches


Introduced a new search algorithm that uses a code-coverage-maximizing heuristic
Tested large applications that had been previously tested and found more bugs
Introduced a full system and applied the novel ideas of the paper
Weaknesses

The results were non-deterministic
 Same input, same program, and same approach can give different results

Only focuses on specific areas
 x86 Windows applications
 File-manipulation applications

Relying on well-formed seed inputs is still a form of regular fuzz testing
SAGE needs help from other tools
In my opinion the paper spends too much space on the implementation of SAGE, and the system could be too specific to Microsoft
Extensions

Make SAGE more general
 Make it easier to port to other architectures
 Use it for other types of applications
 Linux-based applications

Better ways to create input files
 Perhaps by using a grammar

Make the system deterministic
 Having different results makes me think that it may not be reliable
References



[1] P. Godefroid, M. Y. Levin, and D. Molnar. Automated Whitebox Fuzz Testing. NDSS, 2008.
[2] Original presentation slides: www.truststc.org/pubs/366/15%20%20Molnar.ppt
[3] Wikipedia, Fuzz testing: http://en.wikipedia.org/wiki/Fuzz_testing

Questions, Comments or
Suggestions?