Automated Vulnerability Analysis: Leveraging Control Flow for Evolutionary Input Crafting

Sherri Sparks, Shawn Embleton,
Ryan Cunningham, and Cliff Zou
School of Electrical Engineering and Computer Science
University of Central Florida
December, 2007
ACSAC
Vulnerability Analysis

Involves discovering a subset of a program's input
space with which a malicious user can exploit logic
errors to drive the program into an insecure state

Complexity of modern software makes complete
program state space exploration an intractable
problem
Motivation

Oftentimes, security researchers/hackers have analyzed
and located a potentially vulnerable location in a system
(software/hardware)
• C programs have well-known potentially vulnerable API functions
(e.g., strcpy()).
• A critical hardware component dealing with user inputs
Exploitability implies reachability
• In order to determine if a potential vulnerability is exploitable one
must prove that …
1. It is reachable on the runtime execution path
2. It is dependent on / influenceable by user-supplied input
Testing: Intelligent input generation to improve code
coverage
An Input Crafting Problem

What does the input have to look like to exercise the code path
between input node (recv) & the potentially vulnerable node
(strcpy) ?
Figure: Control Flow Graph (CFG) showing recv, the parsing & validation
logic on the path between recv and strcpy, and strcpy

Testing: intelligently generate inputs that can reach a code region
for intense testing
TFTP Control Flow Graph
Basic Idea of Our Approach

Some inputs are better than others:
• They increase coverage by reaching previously unexplored
areas of the CFG
• They are on a path to a basic block where some potentially
vulnerable APIs are being used
Find new, improved inputs by Genetic Algorithm (GA)
• “Mate” the best of the inputs found so far to
generate a new generation of inputs
• Propose “Dynamic Markov Model” for input
measurement
• Apply “Grammatical Evolution” to shrink input search
space
Short Review ―
Genetic Algorithms


A stochastic optimization algorithm that
mimics evolution
Requires two things

A representation

What should a solution look like


Binary string, ASCII string, integer…
A fitness function

Tells how good or bad each solution is
Short Review ―
Genetic Algorithms

It works like this:
1. Start out with a population (set) of random solutions
2. Find each solution’s fitness
3. Select solutions with high fitness values
4. Generate new solutions through mutations and crossover on selected
solutions
5. GOTO 2 (the next generation)
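The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the prototype's code: the byte-string representation, the toy fitness function count_a, and all parameter values (population size, keep fraction, mutation rate) are assumptions chosen only to make the loop runnable.

```python
import random

def evolve(fitness, genome_len=16, pop_size=30, generations=40,
           keep_fraction=0.3, mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over fixed-length byte strings."""
    rng = random.Random(seed)
    # 1. Start out with a population of random solutions.
    population = [bytes(rng.randrange(256) for _ in range(genome_len))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Find each solution's fitness; 3. select the fittest ones.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, int(pop_size * keep_fraction))]
        # 4. Generate new solutions through crossover and mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)      # single-point crossover
            child = bytearray(mom[:cut] + dad[cut:])
            for i in range(genome_len):
                if rng.random() < mutation_rate:    # per-byte mutation
                    child[i] = rng.randrange(256)
            children.append(bytes(child))
        # 5. GOTO 2 with the next generation (parents survive unchanged).
        population = parents + children
    return max(population, key=fitness)

def count_a(genome):
    """Hypothetical fitness: number of 'a' bytes in the input."""
    return genome.count(ord("a"))

best = evolve(count_a)
```

In the fuzzing setting the toy fitness function would be replaced by a score derived from the execution path each input produces.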
Grammatical Evolution
in Generating Inputs



Efficiently reduce search space
Flexible in utilizing partial knowledge of
inputs (user-specified context-free grammar)
Not used in any previous approaches
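Figure: example grammar and derivation
S → sAs | xBx | m (choices 0, 1, 2)
A → bBb | B
B → aAa | C
C → c | d | e
(the figure also shows a further alternative "| AB")
Choice string 10011: S → xBx → xaAax → xabBbax → xabCbax → xabdbax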
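The derivation in the figure can be reproduced with a short sketch of the genotype-to-input mapping. The grammar below is as best read from the flattened figure (the stray "| AB" alternative is left out), and selecting a production by codon modulo the number of alternatives is the usual grammatical-evolution convention, assumed here rather than taken from the slides.

```python
# Grammar as read from the slide figure; upper-case letters are nonterminals.
GRAMMAR = {
    "S": ["sAs", "xBx", "m"],
    "A": ["bBb", "B"],
    "B": ["aAa", "C"],
    "C": ["c", "d", "e"],
}

def derive(codons, start="S", max_steps=50):
    """Expand the leftmost nonterminal once per codon.

    Returns (sentence, steps); sentence is None if the codons run out
    before every nonterminal has been expanded.
    """
    sentence, steps = start, [start]
    remaining = list(codons)
    for _ in range(max_steps):
        pos = next((i for i, ch in enumerate(sentence) if ch in GRAMMAR), None)
        if pos is None:
            return sentence, steps          # only terminals left: done
        if not remaining:
            return None, steps              # incomplete derivation
        options = GRAMMAR[sentence[pos]]
        choice = options[remaining.pop(0) % len(options)]
        sentence = sentence[:pos] + choice + sentence[pos + 1:]
        steps.append(sentence)
    return None, steps

sentence, steps = derive([1, 0, 0, 1, 1])
print(" -> ".join(steps))
```

Because every genome maps to a string in the grammar's language, mutation and crossover operate on the codon list while the generated inputs stay within the user-specified format.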
Fitness Function ―
Dynamic Markov Model



Treat the control flow graph
as a Markov Chain
The probability on each
conditional transition edge
is updated as the search
proceeds, based on
previously tested inputs
Edge transition probability is
calculated by:
(# of inputs that traversed the edge) / (# of inputs that reached the conditional block)
Figure: Control Flow Graph (CFG) with basic blocks A through N; each edge
is labeled with its current transition probability
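A minimal sketch of how these edge probabilities could be maintained as inputs are tested. The class and method names are hypothetical, and counting each block and edge at most once per input is an assumption made to match the "# of inputs" wording above.

```python
from collections import defaultdict

class DynamicMarkovModel:
    """Running estimate of CFG edge transition probabilities."""

    def __init__(self):
        self.block_hits = defaultdict(int)  # inputs that reached each block
        self.edge_hits = defaultdict(int)   # inputs that traversed each edge

    def record_path(self, path):
        """Update the counts with one input's execution path (list of blocks)."""
        for block in set(path):
            self.block_hits[block] += 1
        for edge in set(zip(path, path[1:])):
            self.edge_hits[edge] += 1

    def probability(self, src, dst):
        """(# inputs that traversed src->dst) / (# inputs that reached src)."""
        reached = self.block_hits.get(src, 0)
        if reached == 0:
            return 0.0
        return self.edge_hits.get((src, dst), 0) / reached

model = DynamicMarkovModel()
for path in (["A", "B", "D"], ["A", "C", "E"], ["A", "C", "E"], ["A", "C", "F"]):
    model.record_path(path)
print(model.probability("A", "C"), model.probability("A", "B"))
```

With these four hypothetical paths, three of the four inputs that reached A continued to C, which gives the .75 / .25 split shown on the edges leaving A.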
Fitness of An Input

Fitness of an input: inverse of the product of transition
probabilities of all edges along the execution path
Larger fitness is better
• Explore unobserved states
• Explore rarely observed states
• Increase coverage
Better than previous methods
• Explore less observed states
• Utilize information of all previously searched paths
Figure: Control Flow Graph (CFG) with basic blocks A through N and a
transition probability on each edge
Execution Path = A, C, E, D, G, M
Fitness = 1/(.75 x .9 x .5 x .67 x .8) ≈ 5.53
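The worked example can be checked with a short sketch. The mapping of the five probabilities onto the edges of the path A, C, E, D, G, M is assumed to follow the order in which they appear in the formula, and the floor value for edges with no recorded probability is an assumption; the slides do not say how a zero probability is handled.

```python
# Edge probabilities assumed to follow the order of the slide's formula.
EDGE_PROBABILITY = {
    ("A", "C"): 0.75,
    ("C", "E"): 0.9,
    ("E", "D"): 0.5,
    ("D", "G"): 0.67,
    ("G", "M"): 0.8,
}

def fitness(path, edge_probability, floor=1e-6):
    """Inverse of the product of transition probabilities along the path.

    `floor` (an assumption) keeps the result finite for unseen edges.
    """
    product = 1.0
    for edge in zip(path, path[1:]):
        product *= max(edge_probability.get(edge, 0.0), floor)
    return 1.0 / product

score = fitness(["A", "C", "E", "D", "G", "M"], EDGE_PROBABILITY)
print(round(score, 2))
```

Paths through rarely taken edges multiply small probabilities together, so their inverse is large and the inputs that produced them are favored in selection.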
Prototype ― An Intelligent
Fuzz Testing Tool (1)
Fuzzers – Black box analysis tools
that inject randomly generated inputs
into a program and then monitor it for
crashes
Pros: Simple, automated, test unthinkable
inputs
Cons: non-intelligent, hard to achieve good
code coverage
Prototype ― An Intelligent
Fuzz Testing Tool(2)
We seek to provide the following desirable qualities
(many existing tools lack one or more)

Intelligence


The ability to learn something useful from the inputs that have been tried in
the past and use that knowledge to guide the selection of future inputs.
Targeted Code Coverage


The ability to focus testing upon selective regions of interest in the code.
Targeted Execution Control


The ability to drive program execution through parse code to “drill down” to a
specific node in the control flow graph (which is suspected to contain a
vulnerability)
Source Code Independence


Ability to work on compiled binaries without source code availability
Extensibility and Configurability


The ability to fuzz multiple protocols with a single tool
Prototype ― An Intelligent
Fuzz Testing Tool(3)

Implementation:
Use PAIMEI framework to build a prototype fuzz testing
tool
• PAIMEI is a reverse engineering framework
• Written in the Python scripting language
• Has been used by the security community to build various
fuzzing, code coverage, and data flow tracking tools
Use IDA Pro plugin SDK to construct the control flow
graph
Successfully tested on a TFTP binary program
System Overview
Extract program control flow graph (CFG)
Extract focusing subgraph (source, destination)
Set breakpoints and register breakpoint handlers
Initialize the set of random inputs
Inject inputs one by one
Record an input’s execution path via breakpoint handlers
Update dynamic Markov model parameters of CFG
Calculate fitness
Select a fraction of best inputs
Build the new set of inputs via mutation and crossover
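The slides do not say how the focusing subgraph is computed. One plausible approach, sketched below under that assumption, keeps the basic blocks that are both reachable from the source and able to reach the destination. The adjacency-dict CFG and the block names are hypothetical.

```python
from collections import deque

def reachable(graph, start):
    """Breadth-first search: every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def focus_subgraph(cfg, source, destination):
    """Keep blocks lying on some path from source to destination."""
    reverse = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    keep = reachable(cfg, source) & reachable(reverse, destination)
    return {n: [d for d in cfg.get(n, ()) if d in keep] for n in keep}

# Hypothetical CFG: adjacency lists keyed by basic-block name.
cfg = {
    "recv": ["parse", "log"],
    "parse": ["check", "reject"],
    "check": ["strcpy", "reject"],
    "log": ["exit"],
    "reject": ["exit"],
    "strcpy": ["exit"],
}
sub = focus_subgraph(cfg, "recv", "strcpy")
print(sorted(sub))
```

Breakpoints would then only be needed on the blocks of the subgraph, which keeps path recording cheap on large binaries.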
Evaluation

Target Application
• We used the tftpd.exe Windows server program for our
initial experiments and validation of our approach
GA Parameters
• Mutation Rate = 90%
• Crossover Rate = 75%
• Elitism
• Selective Breeding
• Dynamic Mutation
Context Free Grammar
• Hex bytes 0-255
• Strings “netascii”, “octet”, and “mail”
TFTP Control Flow Graph
Experiment # 1:
Targeted Execution Control

Tested the ability of the GA fuzzer to drive
execution through parse logic to 2 embedded,
vulnerable strcpy() calls.

Compared against fuzzing with random input

1st strcpy() reached in:



GA: 224 generations
Random: 2294 generations
2nd strcpy() reached in:


GA: 224 generations
Random: 9106 generations
GA vs. Random Search
Figure: Comparison between GA-driven and random
search of tftp packet parsing logic. The node
addresses correspond to basic block virtual
addresses on paths from the beginning to the
end of the packet parsing logic.
Figure: Comparison of the standard deviations of the
# of generations (in 50 runs) between the GA-driven
and random search of tftp packet
parsing logic.
Random fuzzing ran for around 1 hour over 10,000 generations (and may still
not reach the target node), while our approach took around 10 minutes to
reach the target node
Experiment # 2
Code Coverage Selectivity


Tested the ability of our GA to achieve code
coverage of the tftp parser logic
Compared against random input selection

Better code coverage
• Average over 3000 generations
GA: 84.81% coverage
Random: 49.54% coverage
• Random approach: running for an additional 7000
generations only increased its coverage to 54.51%
Achieves deeper code coverage quicker
• Able to leverage what it has learned from past inputs!
Experiment # 3
CFG Penetration Depth
Figure: Time (generations, 0 to 10000) against Depth (1 to 15) for
GA Search and Random Search
Experiment #4:
Learning Input Formats

Programs assume that input will comply with
published standards


As a result, protocol parsing bugs abound!!!
We test the ability of our prototype to explore the
boundaries of the TFTP packet parsing logic by
attempting to have it learn a valid packet format

We set the destination node as the basic block
corresponding to an accepted packet
Evolving A TFTP Packet
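For reference, the kind of well-formed packet the GA has to approximate can be written down directly. The sketch below builds a TFTP read request as laid out in RFC 1350 (2-byte opcode, filename, zero byte, mode, zero byte). The filename is a made-up example, and the slides do not state which request type the accepted-packet block corresponds to.

```python
import struct

OPCODE_RRQ = 1  # read request (RFC 1350)
OPCODE_WRQ = 2  # write request (RFC 1350)

def build_request(opcode, filename, mode="octet"):
    """Build a TFTP RRQ/WRQ: opcode | filename | 0 | mode | 0."""
    return (struct.pack("!H", opcode)          # 2-byte big-endian opcode
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")

# "boot.img" is a hypothetical filename used only for illustration.
packet = build_request(OPCODE_RRQ, "boot.img", "netascii")
print(packet)
```

The mode field is where the grammar's string terminals (“netascii”, “octet”, “mail”) come in: evolving those byte sequences one mutation at a time would be far less likely than selecting them as whole tokens.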
Major Contributions

Practical implementation
• Finished initial prototype
• Analysis on binary code
Novelty in methodology
• Dynamic Markov model as fitness
• Grammatical evolution for input generation
Security focused
• Previous related work focuses on software testing
Targeted code coverage
• Efficiently test mission-critical or susceptible parts
Advantages of Our Approach

We apply knowledge gained from past experience to drive our
choice for future inputs
• Maximizes code coverage within specific portions of a program
graph
• Minimal knowledge of input structure required
Well suited to parser code, which has a
rich control flow structure for the GA to learn from
• GA can learn to approximate input format during
execution
Once a target location has been reached, the algorithm
continues to exploit weaknesses in the CFG to produce
additional, different inputs capable of reaching it
Limitations

Difficult to extract some parts of the CFG statically
• Thread Creation
• Call tables
Dependent upon Control Flow Graph structure
• Program must have enough information embedded within its
structure for the GA to be able to “learn from”
• Assumes dependency between graph structure and user supplied
input (an example would be parser code)
• Not useful for programs that have a ‘flat’ CFG structure
Finding all paths has high complexity O() and takes a long time
on large program graphs
We can prove reachability by getting to a potentially vulnerable
target state, but failure to get there does not mean the location
is unreachable!
Conclusions

Shows how genetic algorithms can be applied to the external
input crafting process to maximize exploration of program state
space and intelligently drive a program into potentially vulnerable
states.

Automated approach: treats the internal structure of each
node in the CFG as a black box.

Needs testing on more complex programs
• Our work is theoretical and still at the prototype stage