BINHUNT

BinHunt:
Automatically Finding Semantic
Differences in Binary Programs
Debin Gao
Michael K. Reiter
Dawn Song
ICICS 2008: 10th International Conference on
Information and Communications Security
Conference
ICICS:
A bi-annual International Conference on Information,
Communications and Signal Processing. The conference
covers areas in Information Engineering, Communication
Systems, Signal Processing, Multimedia Processing and
Applications.
Papers
Session V: Software security
 BinHunt: Automatically Finding Semantic Differences in Binary
Programs
Debin Gao (a), Mike Reiter (b) and Dawn Song (c)
 Enhancing Java ME Security Support with Resource Usage
Monitoring
Paolo Mori, Fabio Martinelli, Alessandro Castrucci and Francesco
Roperti
IIT-CNR, Italy
 Pseudo-randomness Inside Web Browsers
Guan Zhi, Zhang Long, Zhong Chen and Nan Xianghao
Peking University, China
Author

Debin Gao

Michael K. Reiter

Dawn Song
Debin Gao



Assistant Professor
School of Information Systems
Singapore Management University

Automatically Adapting a Trained Anomaly Detector to
Software Patches
Peng Li, Debin Gao and Michael K. Reiter
In Proceedings of the 12th International Symposium on
Recent Advances in Intrusion Detection (RAID 2009)
Bridging the Gap between Data-flow and Control-flow
Analysis for Anomaly Detection
Peng Li, Hyundo Park, Debin Gao and Jianming Fu
In Proceedings of the 24th Annual Computer Security
Applications Conference (ACSAC 2008)
Gray-Box Extraction of Execution Graphs for Anomaly
Detection
Debin Gao, Michael K. Reiter and Dawn Song
In Proceedings of the 11th ACM Conference on
Computer and Communications Security (CCS 2004)
On Gray-Box Program Tracking for Anomaly Detection
Debin Gao, Michael K. Reiter and Dawn Song
In Proceedings of the 13th USENIX Security
Symposium (USENIX Security 2004)
Michael K. Reiter



Automatically adapting a trained anomaly detector to
software patches
P. Li, D. Gao and M. K. Reiter
In Recent Advances in Intrusion Detection, 12th
International Symposium, RAID 2009
Fast and black-box exploit detection and signature
generation for commodity software
X. Wang, Z. Li, J. Y. Choi, J. Xu, M. K. Reiter and C. Kil
ACM Transactions on Information and System Security
12(2)
On gray-box program tracking for anomaly detection
D. Gao, M. K. Reiter and D. Song
In Proceedings of the 13th USENIX Security
Symposium
Lawrence M. Slifkin Distinguished Professor
Department of Computer Science
University of North Carolina at Chapel Hill
Dawn Song

Research Projects
BitBlaze: Binary analysis for COTS protection and
malicious code defense


Associate Professor
Computer Science Division
University of California, Berkeley

Binary Code Extraction and Interface Identification for
Security Applications. Juan Caballero, Noah M.
Johnson, Stephen McCamant, and Dawn Song. In
Proceedings of the 17th Annual Network and
Distributed System Security Symposium, February
2010.
Loop-Extended Symbolic Execution on Binary
Programs. Prateek Saxena, Pongsin Poosankam,
Stephen McCamant, and Dawn Song. In Proceedings of
the ACM/SIGSOFT International Symposium on
Software Testing and Analysis (ISSTA), July 2009.
BitBlaze: A New Approach to Computer Security via
Binary Analysis. Dawn Song, David Brumley, Heng Yin,
Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai
Liang, James Newsome, Pongsin Poosankam, and
Prateek Saxena. In Proceedings of the 4th International
Conference on Information Systems Security
Introduction
BinHunt:
Finds semantic differences in binary programs by analyzing their
control flow, using a new graph isomorphism technique together
with symbolic execution and theorem proving.
Semantic differences:
changes in the program's functionality
Syntactic differences:
changes that leave functionality intact, e.g., different register allocation and basic block re-ordering
Challenge


A small change in the source code may cause
the compiler to use a different register allocation
in other parts of the program in which the
corresponding source code remains the same
A small change in the source code may change
the size of a small number of basic blocks, which
further triggers the compiler to re-order many
other basic blocks in the binary file
Idea

The control flow of a program is much more
resistant to “superficial” changes like different
register allocations and basic block re-ordering,
and therefore is a more attractive feature for
finding semantic differences
Assumption


the source code of the binary files is not available
function names extracted from the binary files are
unreliable for binary difference analysis, since they
can easily be changed
System Overview(1)
Input: two binary files
Output: a matching between functions in the two binary files
a matching between basic blocks in two matched functions
a matching strength for each match of functions or basic blocks
System Overview(2)
Decision:
The matchings together with the matching strengths tell us
where the semantic differences are. Unmatched functions and
unmatched basic blocks, as well as matched functions and
matched basic blocks with low matching strengths, constitute
the semantic differences found between the two binary files.
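The decision step can be sketched as a small helper, under a few assumptions. Matches are given as a dictionary that maps (item in file A, item in file B) pairs to matching strengths. The items may be functions or basic blocks. The helper reports unmatched items on each side plus weak matches. `semantic_differences` and the cutoff `LOW_STRENGTH` are hypothetical, because the slides do not say what counts as a "low" matching strength.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cutoff for a "low" matching strength (not given in the slides).
LOW_STRENGTH = 0.8

def semantic_differences(
    items_a: Set[str],
    items_b: Set[str],
    matches: Dict[Tuple[str, str], float],
    threshold: float = LOW_STRENGTH,
) -> Dict[str, list]:
    """Collect unmatched items and low-strength matches.

    Items may be functions or basic blocks, identified by any label
    (e.g. an address), since function names are assumed unreliable.
    """
    matched_a = {a for a, _ in matches}
    matched_b = {b for _, b in matches}
    return {
        "unmatched_a": sorted(items_a - matched_a),
        "unmatched_b": sorted(items_b - matched_b),
        "weak": sorted(pair for pair, s in matches.items() if s < threshold),
    }

report = semantic_differences(
    {"f1", "f2", "f3"},
    {"g1", "g2", "g4"},
    {("f1", "g1"): 1.0, ("f2", "g2"): 0.55},
)
print(report)
```

Here `f3` and `g4` have no counterpart, and the `f2`/`g2` match is flagged because its strength falls below the assumed cutoff.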
Disassembler

parse each binary file

locate the code segment
Realization:
Implemented as a plug-in for IDA Pro
IR Converter

IR: a dozen different statements, which are type-checked
and free of side effects
Easy: our symbolic execution and theorem proving are applied to a much
simpler set of instructions
Reliable: reduces the variation in how the same functionality can be expressed
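The sketch below shows why a side-effect-free IR simplifies later analysis. It lowers a single x86 `add dst, src` into explicit IR assignments. The `Assign` class, the temporary name `t0`, and the expression syntax (`<u` for unsigned less-than) are hypothetical. They are not the IR BinHunt actually uses, and only the CF and ZF flags are modelled.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Assign:
    """One side-effect-free IR statement: dest := expr (hypothetical syntax)."""
    dest: str
    expr: str

def lower_add(dst: str, src: str) -> List[Assign]:
    """Lower x86 `add dst, src` into IR statements.

    The x86 instruction implicitly writes dst and several flags at once;
    in the IR every effect becomes a separate, explicit assignment.
    """
    return [
        Assign("t0", f"{dst} + {src}"),  # the sum, computed once
        Assign("CF", f"t0 <u {dst}"),    # unsigned carry: the sum wrapped around
        Assign("ZF", "t0 == 0"),         # zero flag
        Assign(dst, "t0"),               # dst is overwritten last
    ]

for stmt in lower_add("eax", "ebx"):
    print(f"{stmt.dest} := {stmt.expr}")
```

`dst` is written only after the flags are computed. As a result, the carry test still compares against the original value of `dst`, exactly as the hardware instruction does.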
CFG Constructor


CFG (control flow graph): a set of nodes, each representing a basic block, and a set
of directed edges representing the control flow among the
basic blocks
CG (call graph): the set of nodes corresponding to the functions in the file
and the set of directed edges representing calls among the
functions
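The following is a minimal sketch of CFG construction. It assumes a toy instruction format of `(address, mnemonic, target)` tuples rather than real IDA Pro output. The code first finds basic block leaders: the first instruction, branch targets, and any instruction that follows a branch or return. It then adds branch and fall-through edges. In this sketch `call` does not end a block, and call targets are ignored here; they would instead become edges in the CG. `build_cfg` and the small jump set are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy instruction: (address, mnemonic, branch/call target or None).
Insn = Tuple[int, str, Optional[int]]

COND_JUMPS = {"je", "jne", "jl", "jg"}  # illustrative subset of x86 jcc
ENDS_BLOCK = COND_JUMPS | {"jmp", "ret"}

def build_cfg(insns: List[Insn]) -> Dict[int, Set[int]]:
    """Return the CFG as {block start address: successor block starts}."""
    # Pass 1: find leaders (the first instruction of each basic block).
    leaders = {insns[0][0]}
    for i, (_addr, op, target) in enumerate(insns):
        if op in ENDS_BLOCK:
            if target is not None:
                leaders.add(target)
            if i + 1 < len(insns):
                leaders.add(insns[i + 1][0])
    # Pass 2: walk the blocks and add edges from each block's last instruction.
    cfg: Dict[int, Set[int]] = {}
    block = insns[0][0]
    for i, (addr, op, target) in enumerate(insns):
        if addr in leaders:
            block = addr
            cfg[block] = set()
        is_last = i + 1 == len(insns) or insns[i + 1][0] in leaders
        if not is_last:
            continue
        if target is not None and (op in COND_JUMPS or op == "jmp"):
            cfg[block].add(target)           # taken-branch edge
        if op not in ("jmp", "ret") and i + 1 < len(insns):
            cfg[block].add(insns[i + 1][0])  # fall-through edge
    return cfg

code = [
    (0, "cmp", None),
    (1, "je", 4),
    (2, "mov", None),
    (3, "jmp", 5),
    (4, "mov", None),
    (5, "ret", None),
]
print(build_cfg(code))
```

The example function has four basic blocks, starting at addresses 0, 2, 4 and 5. The conditional jump gives block 0 two successors.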
Graph Isomorphism Engine

Basic Block Comparison
Symbolic Execution and Theorem Proving

Maximum common subgraph isomorphism problem
Backtracking Algorithm
Symbolic Execution

Definition
represent the values of program variables with symbolic values instead of
concrete (initialized) data, and manipulate expressions involving those symbolic
values

Procedure
Step1:
find all the input and output registers and variables
Step2:
use symbolic execution to represent the final values of the output registers
and variables
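The two steps can be sketched on a toy three-address IR. Each statement is assumed to be a `(dest, op, src1, src2)` tuple over register names and integer constants, a hypothetical format far simpler than BinHunt's IR. Step 1 treats registers read before being written as inputs and every written register as an output. Step 2 rewrites each output as an expression over the inputs. The two example blocks differ only in register allocation.

```python
from typing import Dict, List, Set, Tuple, Union

Operand = Union[str, int]
Stmt = Tuple[str, str, Operand, Operand]  # dest := src1 <op> src2

def inputs_outputs(block: List[Stmt]) -> Tuple[Set[str], Set[str]]:
    """Step 1: inputs are registers read before being written in the block;
    outputs are all registers the block writes."""
    inputs: Set[str] = set()
    written: Set[str] = set()
    for dest, _op, a, b in block:
        for src in (a, b):
            if isinstance(src, str) and src not in written:
                inputs.add(src)
        written.add(dest)
    return inputs, written

def symbolic_execute(block: List[Stmt]) -> Dict[str, str]:
    """Step 2: express each output's final value in terms of the inputs."""
    state: Dict[str, str] = {}

    def value(src: Operand) -> str:
        if isinstance(src, int):
            return str(src)
        return state.get(src, src)  # not yet written: the input symbol itself

    for dest, op, a, b in block:
        state[dest] = f"({value(a)} {op} {value(b)})"
    return state

# Same computation, different temporary register (ecx vs edx).
block_a = [("ecx", "+", "ebx", 4), ("eax", "*", "ecx", "ecx")]
block_b = [("edx", "+", "ebx", 4), ("eax", "*", "edx", "edx")]
print(inputs_outputs(block_a), symbolic_execute(block_a)["eax"])
```

Comparing the resulting strings works here only because both blocks build the expression in the same order. In general, two symbolic expressions must be compared semantically, which is the job of the theorem prover.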
Theorem Proving

Realization
STP: a decision procedure for the satisfiability of quantifier-free formulas in
the theory of bit-vectors and arrays

Procedure
pick the symbolic representation of one register/variable from each basic
block and use STP to test if they are equivalent, assuming that the inputs to
the basic blocks share the same values
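STP decides equivalence for full-width bit-vectors. The sketch below swaps in a stand-in that needs no solver. It checks two output expressions by exhaustive enumeration over a small bit width (8 bits by default). Both expressions receive the same input values, as the procedure above requires. This is only a testing substitute, not a decision procedure: agreement at 8 bits does not prove equivalence for 32-bit registers. The function name `equivalent` is hypothetical.

```python
import itertools
from typing import Callable, Dict, List

# An output expression, evaluated on concrete input values.
Expr = Callable[[Dict[str, int]], int]

def equivalent(e1: Expr, e2: Expr, inputs: List[str], bits: int = 8) -> bool:
    """Exhaustively compare e1 and e2 on every assignment of the shared
    inputs, with results wrapped to `bits` bits (a stand-in for STP)."""
    mask = (1 << bits) - 1
    for values in itertools.product(range(1 << bits), repeat=len(inputs)):
        env = dict(zip(inputs, values))  # same inputs for both blocks
        if (e1(env) & mask) != (e2(env) & mask):
            return False
    return True

def out_a(env: Dict[str, int]) -> int:
    return (env["x"] + env["y"]) * 2             # block A: add, then multiply

def out_b(env: Dict[str, int]) -> int:
    return (env["x"] << 1) + (env["y"] << 1)     # block B: shifts instead

def out_c(env: Dict[str, int]) -> int:
    return env["x"] + env["y"] * 2               # block C: a real change

print(equivalent(out_a, out_b, ["x", "y"]))  # True
print(equivalent(out_a, out_c, ["x", "y"]))  # False
```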

Assurance
if two basic blocks are found to be different by our technique of symbolic
execution and theorem proving, then they are guaranteed not to be functionally equivalent
This property holds even if the two binary files are compiled using
different compilers or compiler options.
Matching Strength

Basic Block

1.0: functionally equivalent and registers used are the same

0.9: functionally equivalent while registers used are different

lower: scored on how functionally equivalent they are

Function

1.0: instructions (x86 or IR) of the two functions are the same

otherwise: the subgraph measurement divided by the number of nodes in the
smaller CFG, where the subgraph measurement is the sum of the matching
strengths of the matched nodes (basic blocks)
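Both scoring rules can be written down directly. In this sketch, `block_strength` encodes the 1.0 and 0.9 cases. It takes the score for non-equivalent blocks as a caller-supplied argument, because the slides do not say how that lower score is computed. `function_strength` implements the subgraph-measurement formula. Both names and signatures are hypothetical.

```python
from typing import Dict, Tuple

def block_strength(equivalent: bool, same_registers: bool,
                   lower_score: float = 0.0) -> float:
    """Basic-block matching strength following the rules above."""
    if equivalent:
        return 1.0 if same_registers else 0.9
    return lower_score  # scoring for non-equivalent blocks is not specified

def function_strength(
    matched: Dict[Tuple[int, int], float],
    nodes_a: int,
    nodes_b: int,
    identical: bool = False,
) -> float:
    """Function matching strength: 1.0 for identical instructions, otherwise
    the subgraph measurement over the node count of the smaller CFG."""
    if identical:
        return 1.0
    subgraph_measurement = sum(matched.values())
    return subgraph_measurement / min(nodes_a, nodes_b)

pairs = {
    (0, 0): block_strength(True, True),    # 1.0
    (1, 1): block_strength(True, False),   # 0.9
    (2, 3): block_strength(False, False, lower_score=0.5),
}
print(function_strength(pairs, nodes_a=4, nodes_b=5))  # roughly 0.6 (2.4 / 4)
```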
Backtracking Algorithm

D:
contains all possible pairs
of nodes that might still be
matched (initially V × M)

M:
contains matched node
pairs (initially empty)
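The search over D and M can be sketched as a branch-and-bound backtracking procedure. This is an illustrative version under stated assumptions, not BinHunt's exact algorithm:

- Graphs are successor dictionaries.
- D is given as a dictionary from candidate node pairs to basic-block matching strengths.
- A pair may join M only if edges among matched nodes correspond in both graphs.
- The objective is the subgraph measurement, i.e., the sum of strengths of the matched pairs.

`consistent` and `max_common_subgraph` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # node -> set of successor nodes
Pair = Tuple[int, int]

def consistent(g1: Graph, g2: Graph, m: List[Pair], u: int, v: int) -> bool:
    """Can (u, v) join M while edges among matched nodes still correspond?"""
    if (u in g1[u]) != (v in g2[v]):  # self-loops must agree
        return False
    for a, b in m:
        if (a in g1[u]) != (b in g2[v]) or (u in g1[a]) != (v in g2[b]):
            return False
    return True

def max_common_subgraph(g1: Graph, g2: Graph,
                        strength: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Backtracking search for the matching M with the largest total strength."""
    best: Tuple[float, List[Pair]] = (0.0, [])

    def search(d: List[Pair], m: List[Pair], score: float) -> None:
        nonlocal best
        if score > best[0]:
            best = (score, list(m))
        # Bound: even matching every remaining candidate cannot beat the best.
        if not d or score + sum(strength[p] for p in d) <= best[0]:
            return
        (u, v), rest = d[0], d[1:]
        if consistent(g1, g2, m, u, v):
            # Branch 1: match u with v and drop candidates reusing u or v.
            m.append((u, v))
            search([p for p in rest if p[0] != u and p[1] != v], m,
                   score + strength[(u, v)])
            m.pop()
        # Branch 2: leave (u, v) unmatched and try the remaining candidates.
        search(rest, m, score)

    search(sorted(strength), [], 0.0)  # D starts as all candidate pairs; M empty
    return best

g1 = {0: {1}, 1: {2}, 2: set()}        # chain 0 -> 1 -> 2
g2 = {10: {11}, 11: {12}, 12: set()}   # chain 10 -> 11 -> 12
d0 = {(a, b): 1.0 for a in g1 for b in g2}
print(max_common_subgraph(g1, g2, d0))
```

The search is exponential in the worst case. The bound only prunes branches that provably cannot improve on the best matching found so far.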
Case Study: gzip
Case Study: tar (1)
Case Study: tar (2)
Case Study: tar (3)
Related Work & Conclusion

BinDiff/BindView
construct a maximal subgraph isomorphism between the sets of
functions in two versions of the same executable file



BinHunt:
contributes a more thorough (backtracking) technique
for identifying the maximum common subgraph isomorphism
uses a novel technique for basic block comparison based on
symbolic execution and theorem proving
Thank you!