EECS 354 Network Security
Reverse Engineering

Reverse Engineering
- Introduction
- Preventing Reverse Engineering
- Reversing High Level Languages
- Reversing an ELF Executable

Anything is possible
- There is no computer system in existence that cannot be reverse engineered
- Most important limiting factors
  - Complexity
  - Time

Reversing by Language
- Ruby, JavaScript, HTML, etc.
  - Not compiled
- Python, Java, C#, VB.NET, etc.
  - Byte compiled
  - Easier to decompile/inspect
  - Many symbols still exist in bytecode
- C, C++
  - Compiled into machine code
  - Much harder to decompile
  - Still possible to reverse engineer with debugger and disassembler

Scalability of techniques
- Basic reversing techniques work for small code bases
  - It’s possible to determine what assembly code does for a 100 line C program without too much difficulty
- Not used heavily by hackers
  - When trying to hack an application, crashes and error messages are better hints

Windows
- Is it possible to reverse engineer Windows?
  - How many lines of code does it have?
  - How long would it take?

Wine’s reverse engineering
- The Wine project attempts to implement the Windows API
  - Project began in 1993, still unstable and incomplete
  - Has over 1.4 million lines of code (written by 700 contributors)
  - Does not cover all of Windows (core OS, windowing, etc.)
- On the other hand, Samba (reverse engineering Windows file sharing) has been pretty successful

Why Reverse Engineering?
- Defense
  - Security companies often reverse malware binaries
  - Protocol reversing for botnet analysis
- Working with proprietary APIs or protocols
- Hacking
  - Finding vulnerabilities is easier with the code

Preventing reverse engineering
- Obfuscation
  - Translate code into something unreadable or unnatural
  - Must trick a human reader without tricking the machine interpreter/loader
- Reverse engineering, besides in the most basic form, is combating software obfuscation

Obfuscation Techniques
- Renaming functions/variables
- Adding bogus code with no side-effects
- Removing whitespace
- Encoding strings/numbers as hex values
- Using “dynamic” code
  - JavaScript: eval
  - Java: GetName, GetAttribute
  - Python: getattr, setattr
- Most of these are reversible
  - Except function/variable names can’t be recovered

Obfuscation Techniques (continued)
- Packing
  - Storing an executable as a string (or otherwise) within an executable
  - Can make use of compression and encryption to hide contents
  - Decompression or decryption code must be packed in the executable as well
  - Complex packers exist for most languages

JavaScript Obfuscation
  <script>eval(unescape('%3C%64%69%76%20%73%74'))</script>

  <script>a = 't'; b = 'er'; c = 'a'; d = eval; e = '\"XSS\"'; d(c+'l'+b+a+'('+e+')');</script>

Reversing High Level Languages

What is byte code?
- Byte code is compiled code that cannot be executed by the processor
  - Distinct from machine code
  - Architecture independent
  - Executed by a software interpreter: a VM, a JIT compiler, etc.
- Byte code is often dynamic
  - Symbols can be referenced at runtime
  - This means the program structure still exists, can be rebuilt

Decompilers
- Decompilers reverse the steps taken by a compiler
  - Opcode translation
  - Abstract Syntax Tree construction
- Python
  - Uncompyle2, decompyle, unpyc
- Java
  - Jad, JD

Reversing an ELF Executable

Executables
- Machine code is changed significantly from the original source code
  - Variables have been allocated to registers or somewhere in memory
  - Optimization steps have changed the program structure
  - No way to decompile this back to the original source
- Machine instructions translate directly to assembly code
  - Disassembly analysis can be effective

Reversing Executables
- We will be focusing on x86 32-bit LSB ELF executables
  - Contains ELF header, program header, section table, and data
  - May also contain a symbol table

Reversing Executables (continued)
- ELF header contains program entry point, basic identifying information
- Program header describes memory segments (e.g. where in memory will segments be loaded? what parts of memory are r/w/x?)
  - Used at program load time
- Section table describes section layout (e.g. where’s the .rodata? .text? .bss?)
  - Used at link time

x86 Assembly
- mov
- add, sub
- shl, shr, sar, mul, div
- and, or, xor
- jmp, je, jne, jl, jg, jle, jge
- cmp, test
- call, push, pop, ret, nop
- 0x8(%esp), -0xc(%ebp)

Reversing Basics
- Basic tools:
  - file
  - strings
  - strace (and ltrace)
  - nm
  - objdump or readelf
  - tcpdump
  - gdb
- You can reverse anything with a good debugger, but…

Reversing Frameworks
- For more advanced reversing, it may help to have more than just a debugger
  - IDA
  - Radare

ELF Obfuscation
- There are some additional techniques for obfuscating executable formats:
  - Storing data in unusual sections: .ctors, .dtors, .init, etc.
  - “Corrupting” the ELF header
  - Stripping the symbol table
  - Checking ptrace to prevent debuggers
- Packing
  - Code is unpacked dynamically during execution

Malware Examples
- Demo...
- Source: http://crackmes.de/users/synamics/xrockmr/
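
Example: reading the ELF header by hand
The ELF header fields mentioned under Reversing Executables (program entry point, basic identifying information) sit in the first 52 bytes of a 32-bit ELF file. The following is a minimal Python sketch, not part of the original slides. The function name parse_elf32_header and the hand-built sample header are hypothetical, and the field layout is assumed from the System V ABI ELF specification. In practice, readelf -h reports the same fields.

```python
import struct

# Fields that follow the 16-byte e_ident block in a 32-bit ELF header
# (layout assumed from the System V ABI): e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx.
ELF32_FIELDS = "HHIIIIIHHHHHH"


def parse_elf32_header(data: bytes) -> dict:
    """Decode a few identifying fields of a 32-bit ELF header (hypothetical helper)."""
    if len(data) < 52 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        raise ValueError("only 32-bit ELF is handled in this sketch")
    # EI_DATA: 1 = little-endian (LSB), 2 = big-endian (MSB)
    endian = "<" if data[5] == 1 else ">"
    fields = struct.unpack_from(endian + ELF32_FIELDS, data, 16)
    return {
        "machine": fields[1],  # 3 = Intel 80386 (x86)
        "entry": fields[3],    # virtual address where execution starts
        "phoff": fields[4],    # file offset of the program header table
        "shoff": fields[5],    # file offset of the section table
        "phnum": fields[9],    # number of program header entries
        "shnum": fields[11],   # number of section table entries
    }


# Hand-built 52-byte header for demonstration only; the values are made up.
sample = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack(
    "<" + ELF32_FIELDS, 2, 3, 1, 0x08048000, 52, 0, 0, 52, 32, 1, 40, 0, 0
)
info = parse_elf32_header(sample)
print(hex(info["entry"]))  # prints 0x8048000
```

To inspect a real binary, pass the first 52 bytes of the file (opened in binary mode) to the function. A header that fails these checks, or whose section table offset and count are zero, can be a hint that the file has been stripped or deliberately “corrupted” as described under ELF Obfuscation.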