Anti-malware Security Projects 236349

Contact Persons:
Primary: Tomer Brand, TomerB@Microsoft.com, 052-6119010
Secondary: Daniel Radu, Daniel.Radu@microsoft.com; Marian Radu, Marian.Radu@microsoft.com

General comments:
1) The following projects will be guided by Microsoft security researchers based in Munich, Germany. Students who choose these projects will gain:
   o Hands-on experience with anti-malware / anti-cyber challenges
   o Initial experience of working in a global environment with experts from around the world (there will also be a point of contact in Israel)
2) The projects are stack-ranked by difficulty level (from easiest to hardest).

Project 1: Automatic function identification in binary code

General Description
Finding functions in a binary is a fundamental problem in reverse engineering because there is no distinction between code and data on Intel processors. Current methods rely on pattern matching against compiler-generated code (usually function prologues and epilogues) and are therefore not accurate enough: they break when the compiler changes, when the code is obfuscated, or when the prologue and epilogue of a function are non-standard. This function-finding capability would be integrated into automated processes that classify and identify malware files and tools used in cyber-attacks.

Goals
   o Students will be required to implement a tool that performs static analysis of a portable executable program and finds all internal functions (not including imports).
   o The ‘input’ will be stripped binary files (without any debug info), together with a pointer to the program entry point.

Prerequisites
   o Compilation course
   o Computer Security course
   o Basic Cryptography course

Recommended Reading
   o IDA F.L.I.R.T. Technology: In-Depth
   o BYTEWEIGHT: Learning to Recognize Functions in Binary Code

Project 2: Improving de-compilation using symbolic execution (SMT solvers, abstract interpretation)

General Description
IDA is a de facto standard tool used by researchers throughout the anti-malware industry. IDA has a plug-in that decompiles x86 code back to C. Because IDA performs all of its analysis statically, de-compilation fails if the disassembly it encounters contains:
   o Data embedded between instructions
   o Indirect branch instructions
   o Obfuscated code that does not follow compiler-generated “style”
   o Etc.

Goals
The goal is to leverage symbolic execution to retrieve additional information and embed it back into IDA in order to improve de-compilation results. More specifically, we would like to produce SMT equations for compiler functions.

Prerequisites
   o Compilation course
   o Computer Security course
   o Basic Cryptography course

Recommended Reading
   o Disassembly Challenges

Project 3: Using symbolic execution and SMT solvers to reason about a loop’s exit criteria and reduction of complexity

General Description
When the anti-malware engine scans a file that is about to be launched, it emulates the file (executes it in a local sandbox) in an attempt to identify malicious behavior. During emulation, loops stand out because they are resource intensive and lead to early termination of the emulation in a significant number of cases. Malware (ab)uses loops to hide its behavior from the malware scanner’s emulator. Being able to determine the following about a loop would help the emulation process and increase the anti-malware engine’s ability to detect malware before it is actually executed by the OS:
   o Whether it will terminate
   o What kind of computation it is performing
   o Whether it is inefficient and can be optimized

Goals
Identify loops and their intent in order to replace an expensive loop with a less expensive, non-iterative piece of code with similar side effects, and then continue emulation.
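
As a rough illustration of the Project 3 goal, the sketch below replaces a simple counting loop with a non-iterative closed form that has the same effect on an accumulator. Everything in it is an assumption made for illustration: the loop shape (i starts at `start`, runs while `i < limit`, adds `addend` to an accumulator and `stride` to `i`), and the names `CountingLoop`, `emulate`, `trip_count` and `summarize`. A real solution would recover the loop from binary code and would use symbolic execution and an SMT solver to derive and verify the summary; here a direct step-by-step run stands in for that verification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CountingLoop:
    """Hypothetical loop model: i = start; while i < limit: acc += addend; i += stride."""
    start: int
    limit: int
    stride: int
    addend: int


def emulate(loop: CountingLoop, acc: int = 0) -> Tuple[int, int]:
    """Run the loop one iteration at a time (the expensive path).

    Returns (final accumulator, number of iterations). Only call this on
    loops that terminate, i.e. stride > 0 or start >= limit.
    """
    i, steps = loop.start, 0
    while i < loop.limit:
        acc += loop.addend
        i += loop.stride
        steps += 1
    return acc, steps


def trip_count(loop: CountingLoop) -> Optional[int]:
    """Closed-form iteration count, or None if the loop never exits."""
    if loop.start >= loop.limit:
        return 0                      # exit condition already holds
    if loop.stride <= 0:
        return None                   # i never reaches limit: non-terminating
    distance = loop.limit - loop.start
    return -(-distance // loop.stride)  # ceil(distance / stride)


def summarize(loop: CountingLoop, acc: int = 0) -> Optional[int]:
    """Non-iterative replacement: final accumulator without running the loop."""
    n = trip_count(loop)
    if n is None:
        return None
    return acc + n * loop.addend


loop = CountingLoop(start=0, limit=10, stride=3, addend=5)
print(emulate(loop), summarize(loop))  # (20, 4) 20
```

The closed form answers two of the questions listed above for this restricted loop shape: whether the loop terminates (`trip_count` returns `None` when it does not) and what it computes. Loops whose bodies have richer side effects would need a correspondingly richer summary.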

Prerequisites
   o Compilation course
   o Computer Security course
   o Basic Cryptography course

Project 4: Function matching using code semantics

General Description
Syntax is highly fluid: one can produce many different implementations, in terms of code structure, that ultimately perform the same task. This gives an attacker a lot of power to hide their real intent and evade security products. This project aims to statically identify functions based on semantics rather than syntax, making identification resilient to obfuscation, compiler changes, etc. This could be useful in recognizing:
   o Crypto algorithms
   o Standard library functions (atoi, printf, etc.)
   o Malicious functions
In the general case this problem is undecidable (semantic equivalence of arbitrary programs cannot be decided algorithmically), but it can be reduced to restricted scenarios in which it is doable.

Goals
   o The goal is to define a language for describing the semantic level / intent of a function, and to build a tool that analyzes programs and describes them using the defined language.
   o The ‘input’ would be a binary and a breakdown of its internal functions in decompiled or assembly form.
   o The expected output would be:
      o Identification of crypto algorithms (and even distinguishing between encrypt and decrypt routines)
      o Identification of the authentication method

Prerequisites
   o Compilation course
   o Computer Security course
   o Basic Cryptography course

Recommended Reading
   o Fast location of similar code fragments using semantic “juice”
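
To make the Project 4 idea of matching on semantics rather than syntax concrete, here is a minimal sketch that treats a function's observable behavior on a fixed set of probe inputs as its fingerprint. This is a deliberately simple stand-in and not the technique of the recommended reading: it works on Python callables that model 32-bit routines, whereas the project targets binary code and static analysis. The names `PROBES`, `fingerprint`, `identify` and the sample functions are hypothetical, and equal fingerprints only suggest equivalence; they do not prove it.

```python
MASK32 = 0xFFFFFFFF

# Probe inputs chosen to include boundary values of a 32-bit register.
PROBES = (0, 1, 2, 7, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0xDEADBEEF)


def fingerprint(func):
    """Input/output behavior of func on the probe set, truncated to 32 bits."""
    return tuple(func(x) & MASK32 for x in PROBES)


# Three syntactically different implementations of the same task (doubling).
def double_add(x):
    return (x + x) & MASK32


def double_shift(x):
    return (x << 1) & MASK32


def double_obfuscated(x):
    return ((x * 3) - x) & MASK32


# A function with different semantics, for contrast.
def triple(x):
    return (x * 3) & MASK32


# Hypothetical database of known functions, keyed by behavior.
KNOWN = {fingerprint(double_add): "double"}


def identify(func):
    """Label func by behavior, or 'unknown' if no known fingerprint matches."""
    return KNOWN.get(fingerprint(func), "unknown")


for f in (double_shift, double_obfuscated, triple):
    print(f.__name__, "->", identify(f))
```

All three doubling variants share one fingerprint even though their code differs, while `triple` does not match. A static approach would aim for the same insensitivity to syntax without executing the code.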