Intro to Reverse Engineering

~ intropy ~
Intro
Why do we reverse engineer?
• Closed source software
– Vulnerability Research
– Product verification
• Proprietary formats
– Interoperability
• SMB on UNIX
• Word compatible editors
• Virus research
Why should you give a fuck?
• Basis of computing
– Reverse engineering teaches the inner workings
of any processor
– Learning how the processor handles data helps in
understanding many other aspects of computer
security
• All the cool kids are doing it (not really)
Real Time RCE (Debugging)
• Debuggers that disassemble
– OllyDbg
– WinDbg
– SoftIce
• Code actually runs
– The application actually executes all instructions as if it
were run normally
• Uses interrupts to control execution of the program
– Swaps out the instruction at a breakpoint address with an interrupt
instruction (INT 3, opcode 0xCC, on x86)
– Swaps the original instruction back when execution is continued
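The swap described above can be sketched in a few lines. This is a toy model that patches a byte buffer rather than a live process; the class name BreakpointTable and the sample bytes are illustrative assumptions, not how any particular debugger is written.

```python
INT3 = 0xCC  # one-byte x86 breakpoint (INT 3) opcode


class BreakpointTable:
    """Toy model of software breakpoints over a buffer of code bytes."""

    def __init__(self, code: bytearray):
        self.code = code
        self.saved = {}  # offset -> original byte that INT 3 replaced

    def set(self, offset: int) -> None:
        """Remember the original byte, then overwrite it with INT 3."""
        if offset in self.saved:
            return  # already set; do not save 0xCC as the "original"
        self.saved[offset] = self.code[offset]
        self.code[offset] = INT3

    def clear(self, offset: int) -> None:
        """Put the original byte back so execution can continue."""
        self.code[offset] = self.saved.pop(offset)


# 55 = push ebp; 8B EC = mov ebp, esp; 5D = pop ebp; C3 = ret
code = bytearray(b"\x55\x8b\xec\x5d\xc3")
table = BreakpointTable(code)
table.set(1)
print(code.hex())   # first byte of "mov ebp, esp" is now cc
table.clear(1)
print(code.hex())   # original bytes are back
```

A real debugger does the same thing through the operating system's process memory APIs and also has to back the instruction pointer up by one byte after the interrupt fires.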
Static Analysis (Dead Listing)
• Traditional disassemblers
– IDA Pro
– W32Dasm
– objdump
• Code does not execute
– The disassembler parses the file format and related code sections
– Good disassemblers do deep recursive analysis to ensure proper
instruction disassembly
• Allows the user the ability to look at what code will do without
actually running it
• Does not allow the ease of live disassembly/debugging
– Viewing registers
– Inspecting the contents of memory
File Formats
What are file formats?
• Files that adhere to a specific, defined layout; executable
formats are the ones an operating system can load and run
• Executable files are created from source code
and libraries by a compiler
• Data files can be created by anything from a
text editor to an mp3 encoder
Executable Contents
• Machine code
– Instructions the program will run
– Memory locations
• code addresses
• function addresses
• Program data
– Static variables
– Strings
• Loader data
– Imports
– Exports
Sections
• Allows the loader to find various information
• Not a fixed set; executables can have user-defined
sections
Executable Formats
• ELF – Executable and Linkable Format
– History
Originally published by UNIX System Laboratories as a dynamic,
linkable format to be used on various UNIX platforms
– What uses ELF
• Linux
• Solaris
• Most modern BSD-based Unixes
– Dissection
• Header
• Sections
ELF Header
• The header contains various information the operating system loader
needs
e_ident – Contains various identification fields including endianness, ELF
version, operating system
e_type – Identifies the object file type including relocatable, executable,
or core file
e_machine – Contains the processor type including Intel 80386, HPPA,
PowerPC
e_version – Contains the file version information
e_entry – Contains the entry point for the executable
e_phoff – Contains the program header table offset in bytes
e_shoff – Contains the section header table offset in bytes
e_flags – Contains the processor specific flags
e_ehsize – Contains the ELF header size in bytes
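As a rough illustration of the fields above, the following sketch unpacks a 32-bit ELF header with Python's struct module. It assumes the standard 52-byte ELF32 header layout; the function name parse_elf32_header and the hand-built sample header are made up for the example, and 64-bit files are deliberately rejected.

```python
import struct

ELF32_FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry",
    "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
    "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf32_header(data: bytes) -> dict:
    """Return the ELF32 header fields of `data` as a dict."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:  # e_ident[EI_CLASS]: 1 = 32-bit, 2 = 64-bit
        raise ValueError("only 32-bit ELF is handled in this sketch")
    # e_ident[EI_DATA]: 1 = little endian, 2 = big endian
    endian = "<" if data[5] == 1 else ">"
    values = struct.unpack_from(endian + "16sHHIIIIIHHHHHH", data)
    return dict(zip(ELF32_FIELDS, values))


# Hand-built header for demonstration only (not taken from a real binary):
# 32-bit, little endian, e_type 2 (executable), e_machine 3 (Intel 80386).
ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
sample = struct.pack("<16sHHIIIIIHHHHHH", ident,
                     2, 3, 1, 0x08048000, 52, 0, 0, 52, 32, 1, 40, 0, 0)
header = parse_elf32_header(sample)
print(hex(header["e_entry"]), header["e_ehsize"])
```

On a real file the same function can be fed the first 52 bytes read in binary mode.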
ELF Sections
• Each section of an ELF executable contains various information
needed to execute
.bss - This section holds uninitialized data that contributes to the program's
memory image. By definition, the system initializes the data with zeros
when the program begins to run.
.comment - This section holds version control information.
.ctors - This section holds initialized pointers to the C++ constructor functions.
.data - This section holds initialized data that contribute to the program's
memory image.
.data1 - This section holds initialized data that contribute to the program's
memory image.
.debug - This section holds information for symbolic debugging. The contents are
unspecified.
.dtors - This section holds initialized pointers to the C++ destructor functions.
.dynamic - This section holds dynamic linking information.
ELF Sections Cont…
.dynstr - This section holds strings needed for dynamic linking, most commonly the
strings that represent the names associated with symbol table entries.
.dynsym - This section holds the dynamic linking symbol table.
.fini - This section holds executable instructions that contribute to the process
termination code. When a program exits normally the system arranges to
execute the code in this section.
.got - This section holds the global offset table.
.hash - This section holds a symbol hash table.
.init - This section holds executable instructions that contribute to the process
initialization code. When a program starts to run the system arranges to
execute the code in this section before calling the main program entry
point.
.interp - This section holds the pathname of a program interpreter. If the file has a
loadable segment that includes the section, the section's attributes will
include the SHF_ALLOC bit. Otherwise, that bit will be off.
.line - This section holds line number information for symbolic debugging, which
describes the correspondence between the program source and the
machine code. The contents are unspecified.
ELF Sections Cont…
.note - This section holds information in the ``Note Section'' format defined
in the ELF specification.
.plt - This section holds the procedure linkage table.
.relNAME - This section holds relocation information. By convention, ``NAME'' is
supplied by the section to which the relocations apply. Thus a relocation
section for .text normally would have the name .rel.text
.rodata - This section holds read-only data that typically contributes to a nonwritable segment in the process image.
.rodata1 - This section holds read-only data that typically contributes to a nonwritable segment in the process image.
.shstrtab - This section holds section names.
.strtab - This section holds strings, most commonly the strings that represent the
names associated with symbol table entries.
.symtab - This section holds a symbol table. If the file has a loadable segment that
includes the symbol table, the section's attributes will include the
SHF_ALLOC bit. Otherwise the bit will be off.
.text - This section holds the ``text'' or executable instructions, of a program.
Executable Formats Cont…
• PE – Portable Executable
– History
Microsoft migrated to the PE format with the introduction of the Windows NT 3.1
operating system. It is based on a modified form of the UNIX COFF format
– What uses PE
• Windows NT
• Windows 2000
• Windows XP
• Windows 2003
• Windows CE
– Dissection
• DOS Stub
– The DOS stub contains a message that the executable will not run in DOS mode
• Optional Header (not actually optional)
• RVA
– Relative virtual addressing
• Sections
Optional Header
• The optional header in a PE executable contains various information regarding the
executable contents needed for the OS loader
SizeOfCode – Size of the code (text) section, or the sum of all code sections
if there are multiple sections
AddressOfEntryPoint – RVA of the entry function to start execution from
BaseOfCode – RVA of the start of the code, relative to the base address
BaseOfData – RVA of the start of the data, relative to the base address
SectionAlignment – Alignment of sections when loaded into memory
FileAlignment – Alignment of sections on disk
SizeOfImage – Size, in bytes, of the image, including all headers; must be a
multiple of SectionAlignment
SizeOfHeaders – Combined size of the MS-DOS stub, PE header, and section
headers, rounded up to a multiple of FileAlignment
NumberOfRvaAndSizes – Number of data-directory entries in the remainder of the
optional header. Each describes a location and size.
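Two bits of arithmetic sit behind these fields: rounding a size up to an alignment, and turning an RVA into a virtual address by adding the base address. A small sketch, where the helper names and the sample values (0x400000 base, 0x200 and 0x1000 alignments) are assumptions chosen only for illustration:

```python
def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


def rva_to_va(image_base: int, rva: int) -> int:
    """An RVA is an offset from the address the image is loaded at."""
    return image_base + rva


# Assumed example values, not read from a real executable.
file_alignment = 0x200
section_alignment = 0x1000
image_base = 0x400000

# Raw header size 0x2A8 rounded up to FileAlignment, as for SizeOfHeaders.
print(hex(align_up(0x2A8, file_alignment)))      # 0x400
# Image size 0x3200 rounded up to SectionAlignment, as for SizeOfImage.
print(hex(align_up(0x3200, section_alignment)))  # 0x4000
# AddressOfEntryPoint 0x1000 as a virtual address.
print(hex(rva_to_va(image_base, 0x1000)))        # 0x401000
```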
Sections
• The sections in a PE file contain various pieces of the
executable needed to run including various RVAs and offsets
.text – Contains all executable code
.idata – Contains import information such as DLL names and imported function addresses
.edata – Contains any exported data
.data – Contains initialized data like global variables and string
literals
.bss – Contains un-initialized data
.rsrc – Contains all module resources
.reloc – Contains relocation data for the OS loader
Data Formats
• Different from executable formats
– Don't usually contain machine code
– Has structure but not always defined sections
• A reverser often needs to reverse how a file format
functions
– Proprietary formats are not always published
– Reversing allows compatibility (e.g. Microsoft .doc)
• Digital rights management
– Often the only way to get what you pay for is to take action
Assembly Language
What is it
• Lowest level of programming (besides
microcode)
• Direct processor register access utilizing
architecture defined instructions
• Output of most compilers
How is it used
• Directly using an assembler
– NASM
– ml
– as
• Output by a high level compiler
– GCC
– cl
What does it look like
• Depends on the instruction set
– IA32
• mov eax, 0x1
– PA-RISC
• copy %r14,%r25
– ARM
• LDR r0,[r8]
Instruction Sets
• The mnemonics for the opcodes handled by
the processor
• Minimal set of “commands” that achieve a
programming goal
Different Instruction Set Architectures
• RISC - Reduced Instruction Set Computing
– Fixed length 32 bit instructions
– 32 general purpose registers
– Vendors
• IBM (PowerPC)
• HP (PA-RISC)
• Apple (PowerPC)
• CISC - Complex Instruction Set Computing
– Multibyte instructions
– Multiple synonymous opcodes
– 16 registers
– Vendors
• Intel (IA-32)
• DEC (PDP-11)
• Motorola (m68K)
Registers and the Stack
Overview
• Purpose
– Registers are used to store temporary data
• Pointers
• Computations
– The stack is used to manage data
• Variables
• Data
Stack Layout
• The stack is dynamic: it grows and shrinks as the program runs
• It starts at a higher address and grows toward
lower addresses
• On IA32 the stack is generally allocated in 4 byte chunks
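The downward growth is easier to see with numbers. Below is a minimal sketch of a 32-bit stack modeled as a dictionary from address to value; the class name Stack32 and the starting address 0x1000 are arbitrary choices for the example.

```python
class Stack32:
    """Toy IA32-style stack: grows toward lower addresses, 4 bytes per slot."""

    def __init__(self, top: int = 0x1000):
        self.esp = top      # stack pointer
        self.memory = {}    # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4                         # make room first (address goes down)
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.esp)     # read the slot ESP points at
        self.esp += 4                         # release it (address goes up)
        return value


stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0xff8: two pushes moved ESP down by 8
print(hex(stack.pop()))  # 0x22222222: last value pushed comes off first
print(hex(stack.esp))    # 0xffc
```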
Register sizes
• Register sizes depend on the supported
architecture
– 32 bit
– 64 bit
• IA32
– 16 basic registers; the general purpose registers are 32 bits (4 bytes) each
• RISC
– 32 general purpose registers, 64 bits (8 bytes)
each on 64 bit implementations
IA32 Registers
• EBP – Stack frame base pointer
– Points to the start of the function's stack frame
• ESP – Stack pointer
– Points to the current (top) location on the stack
• EIP – Instruction pointer
– Points to the next instruction to execute
IA32 Registers Cont…
• General Purpose registers
– Used in general computation and control flow
– EAX – Accumulator register
– EBX – General data register
– ECX – Counter register
– EDX – General data register
– ESI – Source index register
– EDI – Destination index register
• Segment registers
– Used to segment memory and compute addresses
– CS – Code segment register
– SS – Stack segment register
– DS – Data segment register
– ES – Extra (more data) segment register
– FS – Third data segment register
– GS – Fourth data segment register
• EFLAGS
– CF – Carry Flag
– SF – Sign Flag
– ZF – Zero Flag
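To make the three flags concrete, here is a sketch of how a 32-bit compare (a subtraction whose result is thrown away) would set them. The function name cmp_flags is made up for the example, and only the three flags listed above are modeled; the real EFLAGS register holds more.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a: int, b: int) -> dict:
    """Flags after computing a - b on 32-bit values, as a compare does."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "CF": int(a < b),        # borrow: unsigned a is smaller than b
        "SF": result >> 31,      # copy of the result's top (sign) bit
        "ZF": int(result == 0),  # result is zero, so the operands are equal
    }


print(cmp_flags(5, 5))  # {'CF': 0, 'SF': 0, 'ZF': 1}
print(cmp_flags(1, 2))  # {'CF': 1, 'SF': 1, 'ZF': 0}
```

The processor does not know whether the operands are signed or unsigned; it sets all the flags every time, and the conditional jump that follows picks the ones that match the interpretation the programmer wants.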
Overview of IA-32 Instruction Set
• mov – Moves source to destination
• lea – Loads effective address
• jmp – Jump
– jne – Jump if not equal
– jg – Jump if greater than
• call – Unconditional function call
• ret – Returns from a function to the caller
• add – Adds two values
• sub – Subtracts two values
• xor – XORs two values
• cmp – Compares two operands
Calling conventions
Calling conventions define how the caller's data is arranged on the stack and who cleans it up
• cdecl
– Most common calling convention
– Supports a variable number of parameters
– Caller unwinds the stack, so the function ends with a plain ret
• pop ebp
• ret
• fastcall
– Higher performance
– First two parameters are passed in registers
• stdcall
– Common in Windows
– Parameters are pushed in reverse order (right to left)
– Function unwinds the stack
• ret 0x10 (returns and removes 16 bytes of parameters)
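The difference in who unwinds the stack can be followed by tracking ESP through a call. The sketch below only does the pointer arithmetic; the function name trace_call and the starting ESP of 0x1000 are assumptions for the example, and 4-byte parameters are assumed throughout.

```python
WORD = 4  # bytes per stack slot on IA-32


def trace_call(convention: str, n_args: int, esp: int = 0x1000) -> list:
    """Return (step, esp) pairs for one call with `n_args` stack parameters."""
    steps = []
    esp -= WORD * n_args
    steps.append(("caller pushes parameters", esp))
    esp -= WORD
    steps.append(("call pushes return address", esp))
    if convention == "stdcall":
        # ret N pops the return address and then N bytes of parameters.
        esp += WORD + WORD * n_args
        steps.append((f"callee: ret {WORD * n_args:#x}", esp))
    elif convention == "cdecl":
        esp += WORD
        steps.append(("callee: ret", esp))
        esp += WORD * n_args
        steps.append((f"caller: add esp, {WORD * n_args:#x}", esp))
    else:
        raise ValueError(f"unknown convention: {convention}")
    return steps


for name in ("cdecl", "stdcall"):
    for step, esp in trace_call(name, 4):
        print(f"{name:8} {step:28} esp={esp:#x}")
```

Both traces end with ESP back at its starting value; with cdecl the last adjustment happens in the caller, with stdcall it is folded into the function's ret.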
Example
PUSH EBP                       ; Pushes the contents of EBP onto the stack
MOV EBP, ESP                   ; Copies the value of ESP into EBP
CMP DWORD PTR [EBP+C], 111     ; Compares the value at EBP+12 with 0x111 (a subtraction that only sets the flags)
JNZ 00401054                   ; If the compared values are not equal, jump to 00401054
MOV EAX, DWORD PTR [EBP+10]    ; Moves the value at EBP+16 into EAX
CMP AX, 64                     ; Compares the low 16 bits of EAX with 0x64
JNZ 00401068                   ; If the compared values are not equal, jump to 00401068
POP EBP                        ; Pops the value on top of the stack into EBP
RET                            ; Returns to the caller
OllyDbg
Overview
• Purpose
– OllyDbg is a general purpose win32 user land debugger.
The great thing about it is the intuitive UI and powerful
disassembler
• Licensing
– OllyDbg is free (shareware), but it is not open source
and the source code is not available
• Extensibility
– OllyDbg has defined a plugin architecture allowing
extensibility via powerful plugins
Window Layouts
• Window layouts are the various parts of the UI
that contain pertinent information
– Code window – Displays the executable machine
code
– Register window – Allows the user to watch the
contents of each register during execution
– Memory window – Allows the user to view the
contents of various memory locations
– Stack window – Displays the stack, including
memory addresses and values
Working in OllyDbg
• Navigation
– Moving
– Searching
• Commenting
– Can be entered in the code window with the ; or : keys
• Listing Names
– The names window displays all functions or imported functions used
in the program
– Listing them is easy via the shortcut Ctrl + N
• Showing Memory
– Displaying memory can be useful when looking for strings or other
important data
– Displaying the memory map window can be achieved via Alt + M
Working in OllyDbg Cont…
• Breakpoints
– Breakpoints allow the debugger to stop at a specified
address or instruction
– There are two types of breakpoints in general
• Software breakpoints
– Implemented with an interrupt instruction; the operating system delivers the resulting exception to the debugger
– Set by navigating to the specified address and hitting F2
• Hardware breakpoints
– Handled by the processor's debug registers
– Set by finding the place in memory you want to break on access to,
right clicking, and selecting the proper option
– Olly also provides a way to view and turn on and off
breakpoints via the breakpoints window with Alt + B
Working in OllyDbg Cont…
• Controlling Execution
– Starting the process
• Once the target program is either loaded or attached in Olly you can start
execution. This will actually set up an initial breakpoint at the application
entry point
– There are several ways you can proceed from the entry point
• Single stepping
– Executes one instruction at a time and can be achieved by hitting F7
– Steps into every function
– Tedious as fuck
• Execute until return
– Executes until the ret instruction is encountered, which can be achieved by
hitting Ctrl + F9
– Executes all instructions in the current function
– Faster than single stepping but not as comprehensive
Working in OllyDbg Cont…
• Watching execution
– Registers
• Handled in the register window
• Red highlighting indicates a register has changed
– Stack
• Handled in the stack window
• Display can be address or relative address from ebp
• Call stack
– Displays the functions the current function has been
called from
– Can be displayed with the shortcut Alt + K
OllyDbg Case Study*
(smarty word for demo)
• Example
– Program displays a popup box
– Goal is to make the proper box show and exit
• Patching
– Allows us to modify the executable assembly code
and save it to a new file with the changes
OllyDbg Plugins
• OllyDbg provides a downloadable PDK for
plugin development
• Several plugins exist that provide extra
usability
– Heap Vis
– Breakpoint manager
– Ollyscript
IDA Pro
Overview
• IDA Pro was originally designed as a powerful
disassembler
• Supports 30+ processors
• It has since been broadened to include a built in
debugger
• Designed for reverse engineers with quickness and
robustness in mind
– This sometimes makes the learning curve steep
• Extensible plugin architecture and scripting
language
Window Layouts
• Customizing window layouts
– Each saved session will store any customized
layouts
– A default layout can also be saved
– Customized layouts are provided to help the user
with workflow and can consist of any combination
or number of windows
Navigation
• Shortcuts
– Most actions have equivalent shortcuts associated with them
– Some of the most used
• [Enter] – Jumps into the function under the cursor
• [Esc] – Returns to the previous cursor position
• Jumping
– IDA allows the user to jump to various parts of a binary file easily
– Some of the jumps
• Entry point – Jumps to the entry point of the binary
• By name – Allows the user to jump to a specific function or string in the binary
• By address – Allows the user to jump to a specific address
• Markers
– Markers can be used to tag locations in the binary for future reference
– Markers are set using Alt + M and giving the marker a name
– Jumping to a marker is easily achieved with Ctrl + M
Editing
• Comments
– Comments allow you to organize and document important
parts of the binary
– Comments can be entered using the shortcut keys ; or :
• Functions can be renamed to something more
descriptive
– Often symbols are not available for the binary, and
naming each function allows you to understand and track
your work
– Functions can be renamed using the shortcut Alt + P
Windows
• IDA View
– Displays the disassembled binary
• Hex View
– Displays the hex view of the current cursor position
• Names
– The names window displays textual names and addresses in the binary
• Strings
– The strings window contains any ASCII strings present in the executable
• Imports
– The imports window contains the imported functions from DLLs
• Functions
– The functions window allows you to view all functions and their addresses
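What a strings window shows can be approximated by scanning the raw bytes for runs of printable ASCII. This is a minimal sketch of that idea, not IDA's actual algorithm; the function name ascii_strings, the minimum length of 4, and the sample bytes are assumptions for the example.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least `min_len` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes standing in for part of an executable's data section.
blob = b"\x00\x01Hello, world\x00ab\x00MessageBoxA\xff"
print(ascii_strings(blob))  # ['Hello, world', 'MessageBoxA']
```

Short runs such as "ab" are dropped because random code and data bytes produce them constantly.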
Graphing
• IDA Pro has a powerful graphing engine that
allows a user to visualize call graphs and
xrefs
– Flow chart graphs display the current functions
machine code and any branches
– Function call graph will display the call flow of all
the functions in the executable (Can be large)
– Xref graphs display the to and from xrefs with
machine code
SDK/Plugins
• The SDK allows the user to develop plugins for use in IDA Pro
• Plugins are generally written in C/C++ and compiled against
the SDK libraries and headers
• Using the SDK you can write
– processor modules
– input processing modules
– plugin modules
• Some good plugins
– x86emu – Allows IDA to do runtime emulation
– IDAPython – Access the IDA API in Python
– Process Stalker – Allows visualization and run time tracing
FLIRT
• Fast Library Identification and Recognition
Technology
• FLIRT is a means for IDA Pro to identify library
functions and compilers by matching against
a database of known signatures
• This greatly speeds up analysis by
automatically naming discovered functions
• Only works with C/C++ functions
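The general idea of matching code against known signatures can be sketched as a byte pattern with wildcards for the bytes that vary between builds (such as call targets). This is not FLIRT's real signature format; the pattern, its name, and the helper functions are all hypothetical.

```python
# None marks a wildcard byte that may differ from build to build.
SIGNATURES = {
    # push ebp; mov ebp, esp; call <any 4-byte target>
    "hypothetical_library_function": [0x55, 0x8B, 0xEC, 0xE8, None, None, None, None],
}


def matches(code: bytes, pattern: list) -> bool:
    """True if the start of `code` fits `pattern`, ignoring wildcards."""
    if len(code) < len(pattern):
        return False
    return all(p is None or p == b for p, b in zip(pattern, code))


def identify(code: bytes):
    """Return the name of the first signature that matches, or None."""
    for name, pattern in SIGNATURES.items():
        if matches(code, pattern):
            return name
    return None


print(identify(bytes([0x55, 0x8B, 0xEC, 0xE8, 0x10, 0x20, 0x30, 0x40, 0xC3])))
print(identify(b"\x90\x90\x90"))  # None: nothing in the database fits
```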
IDC Scripting
• The IDC scripting engine allows the user to
automate small tasks from within IDA Pro
• IDC resembles C and has many helpful
functions built in
– PatchByte
– Comment
– FindCode
Decompiling
Overview
• Decompiling differs from disassembling in that
it tries to reconstruct readable (and ultimately
compilable) source code from machine code
– Native compiled code is difficult to reconstruct because of
the compiler's behavior when optimizing the produced
code
– Virtual machine code is much easier to turn back into readable
code because of its nature. It is compiled into an
intermediate language that carries all the information the
target platform may need to run it
• .Net
• Java
.Net
• .Net is compiled down into MSIL (Microsoft
intermediate language) and is a good
example of decompiling
• .Net must provide the operating system with a
wealth of information including symbol
names, and data structures
Native code
• Native code is a language that has been
compiled down into machine language
• Often, because of optimization, a
compiler inadvertently obfuscates the higher
level source code
• Decompiling is not quite to the point of
producing a good representation of the
original source code
Decompilers
• .Net
– ILDasm
– Remotesoft Salamander
– Reflector for .Net
• Java
– JODE
– JAD (Disappeared)
• Native
– Boomerang
Decompilation Demo
Thanks fend3r!
Conclusion
• Reverse engineering is a vast and complex
world
• With a lot of practice though it becomes much
easier
• A good reverser knows their tools inside and
out
• Workflow and organization are the keys to
reversing
Shirt Quiz
• Name the IA-32 registers
• What does .Net assemble into
• In OllyDbg how do you list the Names
• What is the IA-32 instruction to Compare two integers
• How does the IA-32 processor handle signedness
• What does the IDC scripting language resemble
• How many processors does IDA support (roughly)
• In IDA how do you quickly follow a CALL
References
• Reversing - http://www.wiley.com/WileyCDA/WileyTitle/productCd0764574817.html
• ELF File format - http://www.skyfree.org/linux/references/ELF_Format.pdf
• PE File Format - http://msdn.microsoft.com/library/default.asp?url=/library/enus/dndebug/html/msdn_peeringpe.asp
• http://lsd-pl.net/references.html
• OllyDbg - http://ollydbg.de/
• OllyDbg Plugins - http://ollydbg.win32asmcommunity.net/stuph/
• IDA Pro - http://www.datarescue.com/
• IDC - http://www.datarescue.com/idadoc/707.htm
• IDA Plugins - http://home.arcor.de/idapalace/
• Reflector - http://www.aisto.com/roeder/dotnet/
• JODE - http://jode.sourceforge.net/
• Boomerang - http://boomerang.sourceforge.net/
• Crackmes.de - http://www.crackmes.de/
Fucking done.
Questions?