Reverse Engineering - Network Penetration and Security

EECS 354
Network Security
Reverse Engineering
Introduction
Preventing Reverse Engineering
Reversing High Level Languages
Reversing an ELF Executable
Anything is possible
There is no computer system in existence that
cannot be reverse engineered
Most important limiting factors
Complexity
Time
Reversing by Language
Ruby, JavaScript, HTML, etc.
Not compiled
Python, Java, C#, VB.NET, etc
Byte compiled
Easier to decompile/inspect
Many symbols still exist in bytecode
C, C++
Compiled into machine code
Much harder to decompile
Still possible to reverse engineer with debugger
and disassembler
Scalability of techniques
Basic reversing techniques work for small
code bases
It’s possible to determine what assembly code
does for a 100 line C program without too much
difficulty
Not used heavily by hackers
When trying to hack an application, crashes and
error messages are better hints
Windows
Is it possible to reverse engineer Windows?
How many lines of code does it have?
How long would it take?
Wine’s reverse engineering
The Wine project attempts to implement the
Windows API
Project began in 1993, still unstable and
incomplete
Has over 1.4 million lines of code (written by 700
contributors)
Does not cover all of Windows (core OS,
windowing, etc)
On the other hand, Samba (reverse
engineering Windows file sharing) has been
pretty successful
Why Reverse Engineering?
Defense
Security companies often reverse malware
binaries
Protocol reversing for botnet analysis
Working with proprietary APIs or protocols
Hacking
Finding vulnerabilities is easier with the code
Preventing reverse
engineering
Obfuscation
Translate code into something unreadable or
unnatural
Must trick a human reader without tricking the
machine interpreter/loader
Beyond its most basic form, reverse engineering
is largely a matter of defeating software obfuscation
Obfuscation Techniques
Renaming functions/variables
Adding bogus code with no side-effects
Removing whitespace
Encoding strings/numbers as hex values
Using “dynamic” code
Javascript: eval
Java: GetName, GetAttribute
Python: getattr, setattr
Most of these are reversible
Except function/variable names can’t be
recovered
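A minimal Python sketch of several of the techniques above: renamed identifiers, a hex-encoded string, bogus arithmetic, and a dynamic lookup through getattr. The function names (area_plain, f0) are hypothetical and exist only for this illustration.

```python
import math

# Readable version.
def area_plain(radius):
    return math.pi * radius ** 2

# Obfuscated version: meaningless names, a hex-encoded string,
# bogus arithmetic with no side effects, and a dynamic attribute lookup.
def f0(a0):
    a1 = bytes.fromhex("7069").decode()   # hex encoding of the string "pi"
    a2 = (a0 * 0x0) + 0x2                 # bogus arithmetic; always equals 2
    return getattr(math, a1) * a0 ** a2   # same as math.pi * a0 ** 2

# Undoing the string encoding is mechanical.
decoded_name = bytes.fromhex("7069").decode()
print(decoded_name, f0(3) == area_plain(3))
```

Everything here can be undone by a patient reader except the original names: nothing in f0 records that it was once called area_plain.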
Obfuscation Techniques
Packing
Storing an executable as a string (or otherwise)
within an executable
Can make use of compression and encryption to
hide contents
Decompression or decryption code must be
packed in the executable as well
Complex packers exist for most languages
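A toy packer can be sketched in a few lines of Python. This is only an illustration of the idea, assuming zlib compression and base64 encoding; the helper names pack and unpack are hypothetical, and real packers are considerably more elaborate.

```python
import base64
import zlib

def pack(source: str) -> str:
    """Return a stub program that carries `source` as a compressed string."""
    blob = base64.b64encode(zlib.compress(source.encode())).decode()
    return (
        "import base64, zlib\n"
        f"exec(zlib.decompress(base64.b64decode('{blob}')).decode())\n"
    )

def unpack(packed: str) -> str:
    """Recover the payload by running only the decode steps, never exec."""
    blob = packed.split("'")[1]           # base64 text never contains a quote
    return zlib.decompress(base64.b64decode(blob)).decode()

original = "result = sum(range(10))\n"
packed = pack(original)

namespace = {}
exec(packed, namespace)                   # the stub unpacks itself at run time
print(namespace["result"])
print(unpack(packed) == original)
```

Because the unpacking code has to ship with the payload, an analyst can reuse it: replacing the final exec with a print is often enough to recover the hidden source.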
Javascript Obfuscation
<script>eval(unescape('%3C%64%69%76%20%73%74'))</script>
<script>a = 't'; b = 'er'; c = 'a'; d = eval; e = '\"XSS\"'; d(c+'l'+b+a+'('+e+')');</script>
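Both snippets can be undone without running them in a browser. The sketch below uses Python's standard urllib.parse.unquote for the percent-encoded payload and plain string concatenation for the second example; the variable names are chosen here for illustration.

```python
from urllib.parse import unquote

# First example: percent-encoded payload passed to unescape().
encoded = "%3C%64%69%76%20%73%74"
decoded = unquote(encoded)
print(decoded)        # <div st   (the slide shows only the start of the payload)

# Second example: the call is assembled from string fragments.
a, b, c = "t", "er", "a"
e = '"XSS"'
rebuilt = c + "l" + b + a + "(" + e + ")"
print(rebuilt)        # alert("XSS")
```

The general approach is to evaluate everything up to, but not including, the final eval, and read the string that would have been executed.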
Reversing High Level Languages
What is byte code?
Byte code is compiled code that cannot be
executed directly by the processor
Distinct from machine code
Architecture independent
Executed by a software interpreter: a VM, a JIT
compiler, etc
Byte code is often dynamic
Symbols can be referenced at runtime
This means the program structure still exists,
can be rebuilt
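Python's standard dis module makes this easy to see. The sketch below compiles a small function (check_password and its string constant are hypothetical examples) and then reads names and constants straight out of the compiled code object.

```python
import dis

def check_password(guess):
    secret = "hunter2"          # hypothetical value, for illustration only
    return guess.strip() == secret

code = check_password.__code__

# Names survive compilation because they may be looked up at run time.
local_names = code.co_varnames  # arguments and local variables
attr_names = code.co_names      # attribute and global names referenced
constants = code.co_consts      # literal values embedded in the bytecode

# Human-readable listing of the bytecode instructions.
listing = dis.Bytecode(check_password).dis()

print(local_names, attr_names, constants)
print(listing)
```

The exact opcodes in the listing vary between Python versions, but the variable names, the method name strip, and the string constant all appear in it.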
Decompilers
Decompilers reverse the steps taken by a
compiler
Opcode translation
Abstract Syntax Tree construction
Python
Uncompyle2, decompyle, unpyc
Java
Jad, JD
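The two stages named above can be illustrated with Python's standard library, though this is not a working decompiler. The sketch assumes Python 3.9 or later for ast.unparse; it shows the opcode stream a decompiler starts from and the AST-to-source step it ends with, and skips the hard part in between (rebuilding the AST from the opcodes).

```python
import ast
import dis

source = "def add(a, b):\n    return a + b\n"

# Compiler direction: source -> AST -> bytecode.
tree = ast.parse(source)
module_code = compile(tree, "<example>", "exec")

# What a decompiler starts from: a stream of opcodes.
namespace = {}
exec(module_code, namespace)
opnames = [ins.opname for ins in dis.get_instructions(namespace["add"])]

# What a decompiler ends with: source text generated from an AST.
regenerated = ast.unparse(tree)

print(opnames)
print(regenerated)
```

Tools such as those listed above do the middle step: they map opcode patterns back onto AST nodes, then print the tree as source.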
Reversing an ELF Executable
Executables
Machine code is changed significantly from the
original source code
Variables have been allocated to registers or
somewhere in memory
Optimization steps have changed the program
structure
No way to decompile this back to the original
source
Machine instructions translate directly to
assembly code
Disassembly analysis can be effective
Reversing Executables
We will be focusing on x86 32-bit LSB ELF
executables
Contains ELF header, program header table, section
header table, and data
May also contain a symbol table
Reversing Executables
ELF Header contains program entry point, basic
identifying information
Program header describes memory segments (e.g.
where in memory will segments be loaded? what parts
of memory are r/w/x?)
Used at program load time
Section table describes section layout (e.g. where’s
the .rodata? .text? .bss?)
Used at link time
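The 32-bit ELF header is a fixed 52-byte structure, so it can be read with Python's struct module. The sketch below is a minimal parser for the little-endian 32-bit case only; the function name parse_elf32_header and the sample header values (including the entry point) are made up for illustration.

```python
import struct

# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")

def parse_elf32_header(data: bytes) -> dict:
    fields = ELF32_HEADER.unpack_from(data)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class_32bit": ident[4] == 1,     # EI_CLASS: 1 = 32-bit
        "little_endian": ident[5] == 1,   # EI_DATA: 1 = LSB
        "type": fields[1],                # 2 = executable
        "machine": fields[2],             # 3 = Intel 80386
        "entry": fields[4],               # program entry point
        "phoff": fields[5],               # file offset of program header table
        "shoff": fields[6],               # file offset of section header table
        "phnum": fields[10],
        "shnum": fields[12],
    }

# Hand-built sample header (hypothetical values) so the sketch runs anywhere.
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
sample = ELF32_HEADER.pack(ident, 2, 3, 1, 0x08048340, 52, 0, 0, 52, 32, 2, 40, 0, 0)

info = parse_elf32_header(sample)
print(hex(info["entry"]), info["machine"], info["phnum"])
```

On a real binary, passing the first 52 bytes of the file to parse_elf32_header should agree with what readelf -h reports.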
X86 Assembly
mov
add, sub, shl, shr, sar, mul, div
and, or, xor
jmp, je, jne, jl, jg, jle, jge
cmp, test
call, push, pop, ret, nop
0x8(%esp), -0xc(%ebp)
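The last line shows AT&T-syntax memory operands of the form displacement(%register): the address is the register value plus the displacement. A small Python sketch of that calculation, with hypothetical register values and a helper name (effective_address) invented for this example:

```python
import re

# Matches operands such as 0x8(%esp), -0xc(%ebp), or (%eax).
OPERAND = re.compile(r"^(-?0x[0-9a-fA-F]+)?\(%(\w+)\)$")

def effective_address(operand: str, registers: dict) -> int:
    match = OPERAND.match(operand)
    if match is None:
        raise ValueError(f"unsupported operand: {operand}")
    displacement = int(match.group(1), 16) if match.group(1) else 0
    # Wrap to 32 bits, as on x86-32.
    return (registers[match.group(2)] + displacement) & 0xFFFFFFFF

# Hypothetical register contents, for illustration only.
registers = {"esp": 0xFFFFD000, "ebp": 0xFFFFD028}

print(hex(effective_address("0x8(%esp)", registers)))    # 0xffffd008
print(hex(effective_address("-0xc(%ebp)", registers)))   # 0xffffd01c
```

In compiler output, negative offsets from %ebp usually refer to local variables, and positive offsets from %ebp to function arguments.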
Reversing Basics
Basic tools:
file
strings
strace (and ltrace)
nm
objdump or readelf
tcpdump
gdb
You can reverse anything with a good
debugger, but…
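Several of these tools are conceptually simple. As an illustration, here is a rough Python approximation of what strings does: scan a byte buffer for runs of printable ASCII at least four characters long. The function name extract_strings and the sample bytes are invented for this sketch; the real tool has more options.

```python
import re

def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of printable ASCII characters of at least min_length."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up bytes resembling fragments of a binary.
sample = (
    b"\x7fELF\x01\x01\x00\x00/lib/ld-linux.so.2\x00"
    b"\x03\x00puts\x00ab\x00Enter password: \x00"
)

print(extract_strings(sample))
# ['/lib/ld-linux.so.2', 'puts', 'Enter password: ']
```

Even this crude scan reveals the dynamic loader path, an imported function name, and a prompt string, which is often enough to decide where to look first in a debugger.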
Reversing Frameworks
For more advanced reversing, it may help to
have more than just a debugger
IDA
Radare
ELF Obfuscation
There are some additional techniques for
obfuscating executable formats:
Storing data in unusual sections: .ctors, .dtors,
.init, etc
“Corrupting” the ELF header
Stripping the symbol table
Calling ptrace to detect an attached debugger
Packing
Code is unpacked dynamically during execution
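Debugger detection is usually done in C with a ptrace call. The Python sketch below shows a related but different Linux check: the TracerPid field of /proc/self/status is non-zero while a tracer such as gdb or strace is attached. The function names are hypothetical, and on systems without /proc the check simply reports False.

```python
def tracer_pid(status_text: str) -> int:
    """Extract the TracerPid value from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1])
    return 0

def being_traced(status_path: str = "/proc/self/status") -> bool:
    """Return True if the current process appears to have a tracer attached."""
    try:
        with open(status_path) as handle:
            return tracer_pid(handle.read()) != 0
    except OSError:
        return False          # no /proc available; assume not traced

# Made-up status text, for illustration only.
sample_status = "Name:\tcrackme\nState:\tR (running)\nTracerPid:\t1234\n"
print(tracer_pid(sample_status))
print(being_traced())
```

When reversing, a check like this is typically defeated by patching the comparison or by changing the value the program reads.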
Malware Examples
Demo...
Source: http://crackmes.de/users/synamics/xrockmr/