BinHunt: Automatically Finding Semantic Differences in Binary Programs
Debin Gao, Michael K. Reiter, Dawn Song
ICICS 2008: 10th International Conference on Information and Communications Security

Conference
ICICS: A bi-annual International Conference on Information, Communications and Signal Processing.
The conference covers areas in Information Engineering, Communication Systems, Signal Processing, Multimedia Processing and Applications.

Papers
Session V: Software security
  BinHunt: Automatically Finding Semantic Differences in Binary Programs
    Debin Gao (a), Mike Reiter (b) and Dawn Song (c)
  Enhancing Java ME Security Support with Resource Usage Monitoring
    Paolo Mori, Fabio Martinelli, Alessandro Castrucci and Francesco Roperti
    IIT-CNR, Italy
  Pseudo-randomness Inside Web Browsers
    Guan Zhi, Zhang Long, Zhong Chen and Nan Xianghao
    Peking University, China

Author
  Debin Gao
  Michael K. Reiter
  Dawn Song

Debin Gao
Assistant Professor, School of Information Systems, Singapore Management University
  Automatically Adapting a Trained Anomaly Detector to Software Patches
    Peng Li, Debin Gao and Michael K. Reiter
    In Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection (RAID 2009)
  Bridging the Gap between Data-flow and Control-flow Analysis for Anomaly Detection
    Peng Li, Hyundo Park, Debin Gao and Jianming Fu
    In Proceedings of the 24th Annual Computer Security Applications Conference (ACSAC 2008)
  Gray-Box Extraction of Execution Graphs for Anomaly Detection
    Debin Gao, Michael K. Reiter and Dawn Song
    In Proceedings of the 11th ACM Conference on Computer and Communications Security (CCS 2004)
  On Gray-Box Program Tracking for Anomaly Detection
    Debin Gao, Michael K. Reiter and Dawn Song
    In Proceedings of the 13th USENIX Security Symposium (USENIX Security 2004)

Michael K. Reiter
Lawrence M. Slifkin Distinguished Professor, Department of Computer Science, University of North Carolina at Chapel Hill
  Automatically adapting a trained anomaly detector to software patches
    P. Li, D. Gao and M. K. Reiter
    In Recent Advances in Intrusion Detection, 12th International Symposium, RAID 2009
  Fast and black-box exploit detection and signature generation for commodity software
    X. Wang, Z. Li, J. Y. Choi, J. Xu, M. K. Reiter and C. Kil
    ACM Transactions on Information and System Security 12(2)
  On gray-box program tracking for anomaly detection
    D. Gao, M. K. Reiter and D. Song
    In Proceedings of the 13th USENIX Security Symposium

Dawn Song
Associate Professor, Computer Science Division, University of California, Berkeley
Research Projects
  BitBlaze: Binary analysis for COTS protection and malicious code defense
Publications
  Binary Code Extraction and Interface Identification for Security Applications.
    Juan Caballero, Noah M. Johnson, Stephen McCamant, and Dawn Song.
    In Proceedings of the 17th Annual Network and Distributed System Security Symposium, February 2010.
  Loop-Extended Symbolic Execution on Binary Programs.
    Prateek Saxena, Pongsin Poosankam, Stephen McCamant, and Dawn Song.
    In Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), July 2009.
  BitBlaze: A New Approach to Computer Security via Binary Analysis.
    Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena.
    In Proceedings of the 4th International Conference on Information Systems Security

Introduction
  BinHunt: finds semantic differences in binary programs. It bases its analysis on the control flow of the programs, using a new graph isomorphism technique, symbolic execution, and theorem proving.
  Semantic differences: changes in the program functionality
  Syntactic differences: e.g. different register allocation and basic block re-ordering

Challenge
  A small change in the source code may cause the compiler to use a different register allocation in other parts of the program, where the corresponding source code remains the same.
  A small change in the source code may change the size of a small number of basic blocks, which in turn triggers the compiler to re-order many other basic blocks in the binary file.

Idea
  The control flow of a program is much more resistant to “superficial” changes like different register allocations and basic block re-ordering, and is therefore a more attractive feature for finding semantic differences.

Assumption
  The source code of the binary files is not available.
  Function names extracted from these binary files are unreliable for the purpose of binary difference analysis, since they can be changed easily.

System Overview (1)
  Input: two binary files
  Output:
    a matching between functions in the two binary files
    a matching between basic blocks in two matched functions
    a matching strength for each match of functions or basic blocks

System Overview (2)
  Decision: the matchings, together with the matching strengths, tell us where the semantic differences are. Unmatched functions and unmatched basic blocks, as well as matched functions and matched basic blocks with low matching strengths, constitute the semantic differences found between the two binary files.
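
The decision rule on the System Overview (2) slide can be sketched in a few lines of Python. This is only an illustration, not BinHunt's implementation: the function name `semantic_differences`, the dictionary layout, and the cut-off `threshold=0.9` are assumptions made here, since the slides say only that "low" matching strengths count as differences.

```python
def semantic_differences(items_a, items_b, matches, threshold=0.9):
    """Report where two binaries differ, given a matching.

    items_a, items_b: names of functions (or basic blocks) in each file.
    matches: dict mapping (item_a, item_b) -> matching strength in [0, 1].
    threshold: hypothetical cut-off below which a match counts as "low".
    """
    matched_a = {a for a, _ in matches}
    matched_b = {b for _, b in matches}
    return {
        # Items with no partner in the other file.
        "unmatched_a": sorted(set(items_a) - matched_a),
        "unmatched_b": sorted(set(items_b) - matched_b),
        # Matched pairs whose strength falls below the cut-off.
        "weak_matches": sorted(p for p, s in matches.items() if s < threshold),
    }


# Made-up example data: three functions per file, two of them matched.
report = semantic_differences(
    ["f_open", "f_parse", "f_old"],
    ["g_open", "g_parse", "g_new"],
    {("f_open", "g_open"): 1.0, ("f_parse", "g_parse"): 0.6},
)
print(report)
```

The same helper applies at both levels of the output: once over the functions of the two files, and once over the basic blocks of each matched function pair.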
Disassembler
  Parses each binary file and locates the code segment
  Realization: implemented as a plug-in to IDA Pro

IR Converter
  IR: a dozen different statements, which are type-checked and free of side effects
  Easy: our symbolic execution and theorem proving are applied on a much simpler set of instructions
  Reliable: reduces the language variation in performing the same functionality

CFG Constructor
  CFG: a set of nodes, each representing a basic block, and a set of directed edges representing the control flow among the basic blocks
  CG: the set of nodes corresponding to the functions in the file and the set of directed edges representing calls among the functions

Graph Isomorphism Engine
  Basic Block Comparison
    Symbolic Execution and Theorem Proving
  Maximum common subgraph isomorphism problem
    Backtracking Algorithm

Symbolic Execution
  Definition: represent the values of program variables with symbolic values instead of concrete (initialized) data, and manipulate expressions involving symbolic values
  Procedure
    Step 1: find all the input and output registers and variables
    Step 2: use symbolic execution to represent the final values of the output registers and variables

Theorem Proving
  Realization: STP, a decision procedure for the satisfiability of quantifier-free formulas in the theory of bit-vectors and arrays
  Procedure: pick the symbolic representation of one register/variable from each basic block and use STP to test whether they are equivalent, assuming that the inputs to the basic blocks share the same values
  Assurance: if two basic blocks are found to be different by our technique of symbolic execution and theorem proving, then they must not be functionally equivalent. This property holds even if the two binary files are compiled using different compilers or compiler options.
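
The basic block comparison described on the Symbolic Execution and Theorem Proving slides can be illustrated with a small self-contained Python sketch. Several things here are assumptions and not part of BinHunt: the toy three-address instruction format, the helper names (`sym_exec`, `evaluate`, `equivalent`), and the 8-bit register width. The sketch also does not call STP. Instead it enumerates every input value, which decides equivalence exactly for 8-bit values but would not scale to real 32-bit code, where a solver such as STP is needed.

```python
from itertools import product

MASK = 0xFF  # assumption: 8-bit registers, so every input can be enumerated

OPS = {
    "add": lambda x, y: (x + y) & MASK,
    "mul": lambda x, y: (x * y) & MASK,
    "shl": lambda x, y: (x << y) & MASK,
    "xor": lambda x, y: (x ^ y) & MASK,
}


def operand(state, x):
    """Integers become constants; registers map to their current symbolic value."""
    if isinstance(x, int):
        return ("const", x & MASK)
    return state.get(x, ("var", x))


def sym_exec(block, inputs):
    """Symbolically execute a straight-line block of (dest, op, a, b) tuples.

    Returns a dict mapping each register to an expression tree over the inputs.
    """
    state = {reg: ("var", reg) for reg in inputs}
    for dest, op, a, b in block:
        state[dest] = (op, operand(state, a), operand(state, b))
    return state


def evaluate(expr, env):
    """Evaluate an expression tree under a concrete assignment of the inputs."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](evaluate(expr[1], env), evaluate(expr[2], env))


def equivalent(expr_a, expr_b, inputs):
    """Stand-in for the STP query: compare the expressions on every 8-bit input."""
    for values in product(range(MASK + 1), repeat=len(inputs)):
        env = dict(zip(inputs, values))
        if evaluate(expr_a, env) != evaluate(expr_b, env):
            return False
    return True


INPUTS = ["eax", "ebx"]  # Step 1 (inputs) is given by hand in this sketch
block_a = [("eax", "add", "eax", "eax"), ("eax", "add", "eax", "ebx")]
block_b = [("ecx", "shl", "eax", 1), ("ecx", "add", "ecx", "ebx")]  # other register
block_c = [("eax", "mul", "eax", 3), ("eax", "add", "eax", "ebx")]  # changed logic

# Step 2: final symbolic values of the chosen output registers.
out_a = sym_exec(block_a, INPUTS)["eax"]
out_b = sym_exec(block_b, INPUTS)["ecx"]
out_c = sym_exec(block_c, INPUTS)["eax"]

print(equivalent(out_a, out_b, INPUTS))  # True: 2*eax + ebx in both blocks
print(equivalent(out_a, out_c, INPUTS))  # False: 3*eax + ebx differs
```

Blocks `block_a` and `block_b` use different instructions and a different output register, yet compute the same value. This is the kind of syntactic difference the comparison is meant to see through.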
Matching Strength
  Basic Block
    1.0: functionally equivalent and registers used are the same
    0.9: functionally equivalent while registers used are different
    lower: scored on how functionally equivalent they are
  Function
    1.0: instructions (x86 or IR) of the two functions are the same
    others: subgraph measurement divided by the number of nodes in the CFG that has fewer nodes, where subgraph measurement is defined as the summation of matching strengths of matched nodes (basic blocks)

Backtracking Algorithm
  D: contains all possible pairs of nodes that might still be matched (initially V × M)
  M: contains matched node pairs (initially empty)

Case Study: gzip

Case Study: tar (1)

Case Study: tar (2)

Case Study: tar (3)

Related Work & Conclusion
  BinDiff/BindView: construct a maximal subgraph isomorphism between the sets of functions in two versions of the same executable file
  BinHunt:
    contributes a more thorough technique (backtracking) for identifying the maximum common subgraph isomorphism
    uses a novel technique for basic block comparison based on symbolic execution and theorem proving

Reference

Thank you!
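
Supplementary sketch: the function-level score defined on the Matching Strength slide reduces to a short calculation. The function name, the argument layout, and the result of 0.0 for an empty CFG are assumptions made for this illustration. Only the two rules themselves (1.0 for identical instructions, otherwise the sum of matched block strengths divided by the node count of the smaller CFG) come from the slide.

```python
def function_matching_strength(block_matches, nodes_a, nodes_b, identical=False):
    """Score a pair of matched functions.

    block_matches: dict mapping (block_a, block_b) -> basic block matching strength.
    nodes_a, nodes_b: number of basic blocks in each function's CFG.
    identical: True when the instructions of the two functions are the same.
    """
    if identical:
        return 1.0
    fewer = min(nodes_a, nodes_b)
    if fewer == 0:
        return 0.0  # assumption: an empty CFG matches nothing
    # Subgraph measurement: summation of strengths of the matched nodes.
    return sum(block_matches.values()) / fewer


# Made-up example: three matched blocks, CFGs with 4 and 6 nodes.
example = {("a0", "b0"): 1.0, ("a1", "b2"): 0.9, ("a2", "b1"): 0.5}
print(function_matching_strength(example, 4, 6))  # roughly 0.6
```

Dividing by the smaller CFG means a function that gained new blocks can still score highly if all of its original blocks are matched, while blocks left unmatched in the smaller CFG pull the score down.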