Semi-Auto Vulnerability Research
Black Hat 2010
Richard Johnson, Sourcefire Inc
Lurene Grenier, Sourcefire Inc
Vulnerability Research Workflow
Assisted Triage Process
• What’s Fuzzing?
What’s Fuzzing?
Input fuzzing is a technique in which data is
programmatically generated and provided to a
program in an effort to exercise available code
paths and expose memory corruption flaws.
For an attacker, it represents the highest return
on investment with regard to time and effort.
➢ Introduction
➢ Vulnerability Research Workflow
• Workflow Overview
• Fuzzer Engine Selection
• Attack Surface Analysis
• Input Population and Selection
• Process Setup
• Data Collection
➢ Assisted Triage Process
➢ Conclusion
Attacker Types
• Someone who has a target in mind for long-term
and consistent infiltration.
• Freelance researcher whose goal is to reliably
discover exploitable vulnerabilities, and
weaponize them for sale.
That is, the balance between human effort
and CPU time is crucial.
The choice of target is outside the attacker’s
control; it is instead dictated by the
environment or by market pressure.
Workflow Overview
➢ Select a target
➢ Select a fuzzer
• If the fuzzer is a mutation fuzzer, select input data.
• If the fuzzer is a generation fuzzer, create the
necessary templates.
➢ Run the fuzzer
➢ Evaluate output files by hand for exploitability
• 1st pass: remove certainly unexploitable bugs.
• 2nd pass: select probably exploitable bugs.
• 3rd pass: select a single bug worth putting significant
triage and exploit development time into.
Fuzzer Engine Selection
Mutation fuzzer
• The attacker needs little or no knowledge of the format.
• It is hard to exercise the full set of control paths.
Generation fuzzer
• In general, it finds more bugs.
• However, it requires a good deal of time to create
the original template for each format or protocol.
Charlie Miller: the bug sets discovered by the
two fuzzer types can often be disjoint.
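As an illustration of the mutation approach, the following is a minimal sketch that overwrites a few randomly chosen bytes of a seed input. The function name `mutate`, its parameters, and the sample seed are assumptions made for this example; they do not come from the authors' tooling, and real mutation fuzzers use richer strategies.

```python
import random
from typing import Optional


def mutate(seed: bytes, n_mutations: int = 4,
           rng: Optional[random.Random] = None) -> bytes:
    """Return a copy of `seed` with up to `n_mutations` bytes overwritten.

    Hypothetical helper: the mutated positions and values are chosen
    uniformly at random, with no knowledge of the file format.
    """
    rng = rng or random.Random()
    data = bytearray(seed)
    if not data:
        return bytes(data)  # nothing to mutate in an empty input
    for _ in range(n_mutations):
        pos = rng.randrange(len(data))   # which byte to corrupt
        data[pos] = rng.randrange(256)   # replacement value, 0..255
    return bytes(data)


if __name__ == "__main__":
    # Illustrative seed only; a real run would read a valid sample file.
    sample = b"%PDF-1.4 hypothetical seed contents"
    print(mutate(sample, rng=random.Random(1234)))
```

Passing a seeded `random.Random` makes a test case reproducible, which matters later when a crash has to be replayed during triage.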
Attack Surface Analysis
Attack surface refers to the program code that
interacts with untrusted data.
The initial phases of fuzzing involve
understanding the target program and the
input data as thoroughly as possible.
The potential attack vectors include traditional
points of interest, such as untrusted data entry
points, as well as ancillary data such as code age.
Attack Surface Analysis (cont.)
Fig. 1: Untrusted data comes from files, registry keys, network packets, and other
input sources. The analysis is accomplished by enumerating the
possible data entry points and analyzing the call graph. Entry points can be
enumerated with static analysis that detects I/O-related function calls at the
NTAPI level rather than at the level of wrapper functions.
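The call graph step can be sketched as a reverse reachability search: start from the I/O functions and walk backwards to every function that can reach them. This is a minimal sketch, assuming the call graph has already been extracted by some static analysis tool into a plain dictionary. The NTAPI names in `IO_SINKS` are real, but the list is illustrative rather than exhaustive, and `functions_reaching_io` is a hypothetical helper.

```python
from collections import deque

# Illustrative subset of NTAPI I/O entry points (assumption: not exhaustive).
IO_SINKS = {
    "NtCreateFile", "NtReadFile", "NtOpenKey",
    "NtQueryValueKey", "NtDeviceIoControlFile",
}


def functions_reaching_io(call_graph, sinks=IO_SINKS):
    """Return every function that can reach an I/O sink.

    `call_graph` maps a caller name to an iterable of callee names.
    """
    # Reverse the edges so we can walk from callee back to caller.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    reached = set()
    queue = deque(s for s in sinks if s in callers)
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in reached:
                reached.add(caller)
                queue.append(caller)
    return reached


if __name__ == "__main__":
    # Hypothetical call graph for illustration.
    graph = {
        "main": ["parse_file", "draw"],
        "parse_file": ["read_chunk"],
        "read_chunk": ["NtReadFile"],
        "draw": ["blit"],
    }
    print(sorted(functions_reaching_io(graph)))
```

The functions returned are candidates for the attack surface; `draw` and `blit` drop out because they never touch untrusted input in this toy graph.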
Input Population and Selection
➢ Code path coverage should be distinguished from
simple code block coverage.
➢ It’s also useful to constrain research to a specific
area of a common program in order to reduce the
likelihood that others perform research in that
same area.
➢ For example, for the JBIG2 vulnerability in PDF, one could
run a code path discovery program over PDF
files gathered from Google to find how they differ, using
PIN, a dynamic binary instrumentation
framework, to facilitate efficient program tracing.
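Once traces have been collected for each candidate file, one way to pick a small input population is a greedy set cover over the observed coverage. This is a sketch of that general technique, not the authors' selection method; `select_inputs` and the file names are hypothetical, and the coverage sets are assumed to come from a tracer such as a PIN tool.

```python
def select_inputs(coverage):
    """Greedily choose files until every observed path is covered.

    `coverage` maps a file name to the set of code paths (or blocks)
    that the target executed while processing that file.
    """
    remaining = set().union(*coverage.values())
    chosen = []
    while remaining:
        # Pick the file that covers the most still-uncovered paths;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(coverage),
                   key=lambda name: len(coverage[name] & remaining))
        chosen.append(best)
        remaining -= coverage[best]
    return chosen


if __name__ == "__main__":
    # Hypothetical trace results: path identifiers per sample file.
    traces = {
        "a.pdf": {1, 2, 3},
        "b.pdf": {3, 4},
        "c.pdf": {1, 2},
    }
    print(select_inputs(traces))
```

Here `c.pdf` is never selected because it exercises nothing that `a.pdf` does not, so fuzzing it would mostly repeat work.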
Process Setup
➢ Ensure the process to be tested is in a state that
will facilitate simple and accurate collection of
data when a crash occurs.
➢ Setting up the debugging environment
• Post-mortem debugging is to be avoided.
• In many cases, the process will throw an exception
▪ that will be handled properly, but not for your debugging purposes, or
▪ that the process is unable to handle.
➢ The authors’ solution is a custom wrapper around the
Windows debugging API, in which the
process is set up and launched.
Data Collection
➢ The first issue of concern is: when may a test be considered
complete?
• Manually decide on a static time.
• Baseline the process in an idle state, then monitor CPU and
memory usage for a return to this baseline once the input is
loaded.
• In the authors’ debugging wrapper, CPU monitoring is performed
with the Windows WBEM interface.
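The baseline approach can be sketched as a polling loop that declares the test finished once CPU usage stays near the idle baseline for several consecutive samples. This is a minimal sketch under stated assumptions: `wait_for_idle` and all of its parameters are hypothetical, and `sample_cpu` stands in for whatever actually reads the target's CPU usage (the authors used WBEM; no WBEM call is shown here).

```python
import time


def wait_for_idle(sample_cpu, baseline, tolerance=2.0, settle_samples=3,
                  interval=0.5, timeout=30.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Return True once the target has returned to its idle baseline.

    `sample_cpu` is a caller-supplied function returning the current CPU
    usage in percent. The test is treated as complete after
    `settle_samples` consecutive readings within `tolerance` of
    `baseline`; False is returned if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    calm = 0
    while clock() < deadline:
        if abs(sample_cpu() - baseline) <= tolerance:
            calm += 1
            if calm >= settle_samples:
                return True
        else:
            calm = 0  # a busy reading resets the streak
        sleep(interval)
    return False


if __name__ == "__main__":
    # Simulated readings: busy while parsing, then back to idle.
    readings = iter([80.0, 60.0, 5.0, 4.0, 5.0, 5.0])
    print(wait_for_idle(lambda: next(readings), baseline=4.0,
                        sleep=lambda seconds: None))
```

Requiring several calm samples in a row avoids ending a test during a brief pause in parsing, while the timeout keeps a hung process from stalling the whole run.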
➢ Another issue is: what should be collected?
• It depends on the time and space necessary to capture the data, and the
likelihood of its usefulness.
• Some information, such as the stack trace and the instruction at which the
program failed, is always necessary and should be collected
regardless of cost.
▪ This data is needed to differentiate individual crashes (that is, to bucket them).
Data Collection (cont.)
➢ There are a few goals we’d like to achieve with the data
we collect:
• Determine roughly what class the bug falls into.
• Know the probability of exploitability (a first pass).
• Get enough information to separate bugs into “buckets” as
mentioned before.
➢ There are also a few goals we’d like to achieve in our
data storage:
• Store information about all crashes indefinitely.
• Ability to search for a set of crashes with arbitrary
criteria.
• Ability to change the criteria of our searches and the data
we store as we learn more about differentiation of crashes.
• We don’t want to destroy too many hard drives.
Data Collection (cont.)
Why a Database?
• Attention paid to changes in software over time
can provide valuable insight into the prospective
shelf life of a crash.
• We don’t need to rerun all tests.
• For developers: when a crash is chosen for further testing,
its bucket can be retested by hand to confirm that
it has been patched.
BinDiff may also be used to aid this process.
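A crash store along these lines can be sketched with SQLite from the Python standard library. The schema, column names, and helper functions below are assumptions made for illustration, not the authors' design. Free-form details are kept as JSON so that the stored data can grow as more is learned about differentiating crashes.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a crash database with a hypothetical schema."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS crashes (
            id             INTEGER PRIMARY KEY,
            fingerprint    TEXT NOT NULL,   -- bucket key
            target_version TEXT NOT NULL,   -- tracks software changes over time
            input_file     TEXT NOT NULL,
            crash_address  TEXT,
            details        TEXT NOT NULL    -- JSON blob of everything else
        )""")
    db.execute("CREATE INDEX IF NOT EXISTS idx_fingerprint "
               "ON crashes(fingerprint)")
    return db


def record_crash(db, fingerprint, target_version, input_file,
                 crash_address, **details):
    """Store one crash; extra keyword arguments go into the JSON column."""
    db.execute(
        "INSERT INTO crashes (fingerprint, target_version, input_file, "
        "crash_address, details) VALUES (?, ?, ?, ?, ?)",
        (fingerprint, target_version, input_file, crash_address,
         json.dumps(details)))
    db.commit()


def buckets(db):
    """Return (fingerprint, crash count, versions seen) for each bucket."""
    return db.execute(
        "SELECT fingerprint, COUNT(*), GROUP_CONCAT(DISTINCT target_version) "
        "FROM crashes GROUP BY fingerprint "
        "ORDER BY COUNT(*) DESC, fingerprint").fetchall()


if __name__ == "__main__":
    db = open_store()
    # All values below are made up for illustration.
    record_crash(db, "aaa", "9.3.0", "t1.pdf", "0x41414141", registers={"eax": 0})
    record_crash(db, "aaa", "9.3.1", "t2.pdf", "0x41414141")
    record_crash(db, "bbb", "9.3.1", "t3.pdf", "0x00000000")
    for row in buckets(db):
        print(row)
```

Recording the target version with every crash is what allows a query to show whether a bucket still reproduces in newer builds without rerunning every test.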
Data Collection (cont.)
➢ Data collection should begin when the program throws an
exception that isn’t ignored by the debugger.
➢ Records include
• A unique fingerprint for the crash (used in bucketing to see
whether it has been seen before or is a variation).
• Stack trace.
• Crashing address.
• Entire block of assembly containing the crashing instruction.
• Registers and crash-specific metadata.
• Attackers should also weigh considerations such as ease of
discovery, ease of exploitation, and weight of target
➢ This information can then be used to generate a dataflow graph.
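One common way to build a crash fingerprint is to hash the top few stack frames together with the faulting instruction. The sketch below assumes frames are strings of the form `module!function+offset`; that format, the `depth` of five frames, and the helper name `crash_fingerprint` are all assumptions for illustration rather than the authors' scheme.

```python
import hashlib


def crash_fingerprint(stack_trace, crashing_instruction, depth=5):
    """Return a short hex digest identifying the crash's bucket.

    `stack_trace` is a list of frame strings, innermost first. Offsets
    after '+' are dropped so that small variations inside the same
    function still land in the same bucket.
    """
    frames = [frame.split("+")[0] for frame in stack_trace[:depth]]
    material = "|".join(frames + [crashing_instruction])
    return hashlib.sha1(material.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    # Hypothetical traces: same call chain, different offsets.
    first = ["viewer!decode_segment+0x41", "viewer!load_page+0x10"]
    second = ["viewer!decode_segment+0x99", "viewer!load_page+0x22"]
    insn = "mov eax, [ecx+4]"
    print(crash_fingerprint(first, insn) == crash_fingerprint(second, insn))
```

How many frames to include is a trade-off: too few merges unrelated bugs into one bucket, too many splits one bug across several.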
Vulnerability Research Workflow
Assisted Triage Process
• Triage Process
• Determine Exploitability
• Determine Root Cause or Triggering Condition
Triage Process
Finally, once an input has been generated that
results in a program exception, the triage
process can begin.
The goal of triage is to determine
• Exploitability
• Root cause or triggering condition
Determine Exploitability
Exploitability can be determined by identifying
tainted data in the context of the crash.
Analyze the surrounding code:
• Tainted data is referenced by an instruction prior to
the crash.
• Control instructions prior to the crash are not
affected by tainted data.
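Identifying tainted data at the crash can be sketched as forward taint propagation over the instructions leading up to it. This is a deliberately simplified model, assuming each instruction has already been reduced to a destination and a list of sources; the tuple format, the location names, and both helper functions are hypothetical, and a real analysis would also need to model memory and flags.

```python
def propagate_taint(trace, tainted):
    """Propagate taint through `trace` and return the tainted locations.

    `trace` is a list of (destination, [sources]) tuples in execution
    order. `tainted` is the set of locations holding attacker input.
    """
    live = set(tainted)
    for dest, sources in trace:
        if any(src in live for src in sources):
            live.add(dest)       # destination now derives from input
        else:
            live.discard(dest)   # overwritten with untainted data
    return live


def tainted_crash_operands(trace, tainted, crash_operands):
    """Return the operands of the crashing instruction that are tainted."""
    live = propagate_taint(trace, tainted)
    return sorted(op for op in crash_operands if op in live)


if __name__ == "__main__":
    # Hypothetical instructions executed just before the crash.
    trace = [
        ("eax", ["input_buf"]),   # eax loaded from the fuzzed file
        ("ecx", ["eax"]),         # ecx derived from eax
        ("edx", ["const"]),       # edx loaded with a constant
    ]
    print(tainted_crash_operands(trace, {"input_buf"}, ["ecx", "edx"]))
```

A crash whose operands are tainted suggests the attacker influences the faulting access, which is a first-pass signal of exploitability rather than proof of it.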
Determine Root Cause
or Triggering Condition
➢ Root cause analysis is especially important to defenders,
so that attempts to fix vulnerabilities are made at the
original source of the problem. Otherwise, attackers
need only modify the input slightly.
➢ In the case of mutation fuzzing, the trigger conditions
are known.
➢ If the crash happens on a seemingly impossible path, analyze the call
stack leading to the crash.
➢ Another approach to automating root cause analysis is
graph analysis. Comparing the graphs of two or more
similar crashing executions is likely to reveal the same tainted
source or file location.
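The graph comparison can be sketched by collecting, for each crash, every node that feeds data into the crashing instruction, then intersecting those sets across crashes. This is a minimal sketch assuming each dataflow graph is a dictionary from a node to the nodes it derives data from; the node labels and both helper functions are hypothetical.

```python
def ancestors(graph, node):
    """Return every node that `node` transitively derives data from.

    `graph` maps a node to the list of nodes it takes data from.
    """
    seen, stack = set(), [node]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def common_sources(crashes):
    """Return the nodes that feed every crash in `crashes`.

    `crashes` is a list of (dataflow_graph, crash_node) pairs.
    """
    shared = None
    for graph, crash_node in crashes:
        found = ancestors(graph, crash_node)
        shared = found if shared is None else shared & found
    return shared or set()


if __name__ == "__main__":
    # Two hypothetical crashes that take different register routes
    # but both draw on the same file offset.
    g1 = {"crash": ["ecx"], "ecx": ["file[0x40]"]}
    g2 = {"crash": ["eax"], "eax": ["file[0x40]", "file[0x80]"]}
    print(common_sources([(g1, "crash"), (g2, "crash")]))
```

Any file offset that survives the intersection is a candidate triggering location, which narrows where to look by hand.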
Vulnerability Research Workflow
Assisted Triage Process
• Fuzzing Process Schema
• Contribution
Fuzzing Process Schema
To sum up:
Choose a target based on environment or market pressure.
Analyze the target using static analysis or auxiliary information.
Choose a kind of fuzzer, and generate inputs.
Set up the debugging environment.
Set the target to the state to be tested.
Run the fuzzer.
Manually identify the crash using previous records, the
call graph, the control path graph, and the dataflow graph.
• Store the outcome in a database for further use.
Contribution
This paper focuses on alleviating some of the
effort involved in these determinations, and enumerates the
considerations to take into account in
these situations.
The important factor in the continued
discovery of exploitable vulnerabilities is
not so much the exhaustive nature of the fuzzer
as the post-processing.
The End