Intro to Reverse Engineering
~ intropy ~

Intro

Why do we reverse engineer?
• Closed source software
  – Vulnerability research
  – Product verification
• Proprietary formats
  – Interoperability
    • SMB on UNIX
    • Word compatible editors
• Virus research

Why should you give a fuck?
• Basis of computing
  – Reverse engineering teaches the inner workings of the processor
  – Learning how the processor handles data helps in understanding many other aspects of computer security
• All the cool kids are doing it (not really)

Real Time RCE (Debugging)
• Debuggers that disassemble
  – OllyDbg
  – WinDbg
  – SoftICE
• Code actually runs
  – The application executes all instructions as if it were run normally
• Uses interrupts to control execution of the program
  – Swaps out the instruction at the breakpoint address with an interrupt instruction (INT 3, opcode 0xCC, on IA-32)
  – Swaps the original instruction back when execution is continued

Static Analysis (Dead Listing)
• Traditional disassemblers
  – IDA Pro
  – W32Dasm
  – objdump
• Code does not execute
  – The disassembler parses the file format and related code sections
  – Good disassemblers do deep recursive analysis to ensure proper instruction disassembly
• Lets the user see what code will do without actually running it
• Lacks the conveniences of live disassembly/debugging
  – Viewing registers
  – Inspecting the contents of memory

File Formats

What are file formats?
• Files that adhere to a specific format, often one the operating system can execute
• Executable files are created from source code and libraries by a compiler and linker
• Data files can be created by anything from a text editor to an mp3 encoder

Executable Contents
• Machine code
  – Instructions the program will run
  – Memory locations
    • Code addresses
    • Function addresses
• Program data
  – Static variables
  – Strings
• Loader data
  – Imports
  – Exports

Sections
• Allow the loader to find various information
• Not a fixed set; executables can have user defined sections

Executable Formats
• ELF – Executable and Linkable Format
  – History
    Originally published by UNIX System Laboratories as a dynamic, linkable format to be used on various UNIX platforms
  – What uses ELF
    • Linux
    • Solaris
    • Most modern BSD-based UNIXes
  – Dissection
    • Header
    • Sections

ELF Header
• The header contains information the operating system's loader needs
e_ident – Contains various identification fields including endianness, ELF version, and operating system
e_type – Identifies the object file type, including relocatable, executable, or core file
e_machine – Contains the processor type, including Intel 80386, HPPA, PowerPC
e_version – Contains the file version information
e_entry – Contains the entry point for the executable
e_phoff – Contains the program header table's file offset in bytes
e_shoff – Contains the section header table's file offset in bytes
e_flags – Contains the processor specific flags
e_ehsize – Contains the ELF header size in bytes

ELF Sections
• Each section of an ELF executable contains information needed to link or execute it
.bss - This section holds uninitialized data that contributes to the program's memory image. By definition, the system initializes the data with zeros when the program begins to run.
.comment - This section holds version control information.
.ctors - This section holds initialized pointers to the C++ constructor functions.
.data - This section holds initialized data that contributes to the program's memory image.
.data1 - This section holds initialized data that contributes to the program's memory image.
.debug - This section holds information for symbolic debugging. The contents are unspecified.
.dtors - This section holds initialized pointers to the C++ destructor functions.
.dynamic - This section holds dynamic linking information.

ELF Sections Cont…
.dynstr - This section holds strings needed for dynamic linking, most commonly the strings that represent the names associated with symbol table entries.
.dynsym - This section holds the dynamic linking symbol table.
.fini - This section holds executable instructions that contribute to the process termination code. When a program exits normally the system arranges to execute the code in this section.
.got - This section holds the global offset table.
.hash - This section holds a symbol hash table.
.init - This section holds executable instructions that contribute to the process initialization code. When a program starts to run the system arranges to execute the code in this section before calling the main program entry point.
.interp - This section holds the pathname of a program interpreter. If the file has a loadable segment that includes the section, the section's attributes will include the SHF_ALLOC bit. Otherwise, that bit will be off.
.line - This section holds line number information for symbolic debugging, which describes the correspondence between the program source and the machine code. The contents are unspecified.

ELF Sections Cont…
.note - This section holds information in the ``Note Section'' format defined in the ELF specification.
.plt - This section holds the procedure linkage table.
.relNAME - This section holds relocation information. By convention, ``NAME'' is supplied by the section to which the relocations apply.
Thus a relocation section for .text normally would have the name .rel.text
.rodata - This section holds read-only data that typically contributes to a nonwritable segment in the process image.
.rodata1 - This section holds read-only data that typically contributes to a nonwritable segment in the process image.
.shstrtab - This section holds section names.
.strtab - This section holds strings, most commonly the strings that represent the names associated with symbol table entries.
.symtab - This section holds a symbol table. If the file has a loadable segment that includes the symbol table, the section's attributes will include the SHF_ALLOC bit. Otherwise the bit will be off.
.text - This section holds the ``text'', or executable instructions, of a program.

Executable Formats Cont…
• PE – Portable Executable
  – History
    Microsoft migrated to the PE format with the introduction of the Windows NT 3.1 operating system. It is based on a modified form of the UNIX COFF format
  – What uses PE
    • Windows NT
    • Windows 2000
    • Windows XP
    • Windows 2003
    • Windows CE
  – Dissection
    • DOS Stub – Contains a message that the executable will not run in DOS mode
    • Optional Header (not optional)
    • RVA – Relative virtual address (an offset from the image base once loaded)
    • Sections

Optional Header
• The optional header in a PE executable contains information about the executable's contents that the OS loader needs
SizeOfCode - Size of the code (text) section, or the sum of all code sections if there are multiple sections.
AddressOfEntryPoint – RVA of the entry function where execution starts
BaseOfCode - RVA of the start of the code relative to the base address
BaseOfData – RVA of the start of the data relative to the base address
SectionAlignment – Alignment of sections when loaded into memory
FileAlignment – Alignment of sections on disk
SizeOfImage - Size, in bytes, of the image, including all headers; must be a multiple of SectionAlignment
SizeOfHeaders - Combined size of the MS-DOS stub, PE header, and section headers, rounded up to a multiple of FileAlignment.
NumberOfRvaAndSizes - Number of data-directory entries in the remainder of the Optional Header. Each describes a location and size.

Sections
• The sections in a PE file contain the various pieces of the executable needed to run, including various RVAs and offsets
.text – Contains all executable code
.idata – Contains import data such as DLL function addresses
.edata – Contains any export data
.data – Contains initialized data like global variables and string literals
.bss – Contains uninitialized data
.rsrc – Contains all module resources
.reloc – Contains relocation data for the OS loader

Data Formats
• Different from executable formats
  – Don't usually contain machine code
  – Have structure but not always defined sections
• A reverser often needs to reverse how a file format functions
  – Proprietary formats are not always published
  – Reversing allows compatibility (e.g.
Microsoft .doc)
• Digital rights management
  – Often the only way to get what you pay for is to take action

Assembly Language

What is it
• Lowest level of programming (besides microcode)
• Direct processor register access utilizing architecture defined instructions
• Output of most compilers

How is it used
• Directly using an assembler
  – NASM
  – ml (MASM)
  – as (GNU assembler)
• Output by a high level compiler
  – GCC
  – cl

What does it look like
• Depends on the instruction set
  – IA32
    • mov eax, 0x1
  – PA-RISC
    • copy %r14,%r25
  – ARM
    • LDR r0,[r8]

Instruction Sets
• The mnemonics for the opcodes handled by the processor
• Minimal set of “commands” that achieve a programming goal

Different Instruction Set Architectures
• RISC - Reduced Instruction Set Computing
  – Fixed length 32 bit instructions
  – 32 general purpose registers
  – Vendors
    • IBM (PowerPC)
    • HP (PA-RISC)
    • Apple (PowerPC)
• CISC - Complex Instruction Set Computing
  – Multibyte (variable length) instructions
  – Multiple synonymous opcodes
  – 16 registers
  – Vendors
    • Intel (IA-32)
    • DEC (PDP-11)
    • Motorola (m68K)

Registers and the Stack

Overview
• Purpose
  – Registers are used to store temporary data
    • Pointers
    • Computations
  – The stack is used to manage data
    • Variables
    • Data

Stack Layout
• The stack is dynamic and grows as data is pushed onto it
• It starts at a higher address and grows toward lower addresses
• The stack is generally allocated in 4 byte chunks

Register sizes
• Register sizes depend on the supported architecture
  – 32 bit
  – 64 bit
• IA32
  – 16 registers, 32 bits (4 bytes) each
• RISC
  – 32 general purpose registers, 64 bits (8 bytes) each

IA32 Registers
• EBP – Stack frame base pointer
  – Points to the start of the function's stack frame
• ESP – Stack pointer
  – Points to the current (top) location on the stack
• EIP – Instruction pointer
  – Points to the next instruction to execute

IA32 Registers Cont…
• Covers the general purpose registers and the segment registers
• General purpose registers are used in general computation and control flow
  – EAX – Accumulator register
  – EBX –
General data register
  – ECX – Counter register
  – EDX – General data register
  – ESI – Source index register
  – EDI – Destination index register
• Segment registers are used to segment memory and compute addresses
  – CS – Code segment register
  – SS – Stack segment register
  – DS – Data segment register
  – ES – Extra (more data) segment register
  – FS – Third data segment register
  – GS – Fourth data segment register
• EFLAGS
  – CF – Carry Flag
  – SF – Sign Flag
  – ZF – Zero Flag

Overview of IA-32 Instruction Set
• mov – Moves (copies) source to destination
• lea – Loads effective address
• jmp – Jump
  – jne – Jump if not equal
  – jg – Jump if greater than
• call – Unconditional function call
• ret – Returns from a function to the caller
• add – Adds two values
• sub – Subtracts two values
• xor – XORs two values
• cmp – Compares two operands (subtracts them and sets EFLAGS without storing the result)

Calling conventions
Calling conventions define how the caller's data is arranged on the stack
• cdecl
  – Most common calling convention
  – Dynamic (variable length) parameter lists
  – Caller unwinds stack
    • pop ebp
    • ret
• fastcall
  – Higher performance
  – First two parameters are passed in registers
• stdcall
  – Common in Windows
  – Parameters are pushed in reverse order (right to left)
  – Function unwinds stack
    • ret 0x10 (the callee pops 16 bytes of arguments)

Example
PUSH EBP                      ; Push the contents of EBP onto the stack
MOV  EBP, ESP                 ; Copy the value of ESP (the stack pointer) into EBP
CMP  DWORD PTR [EBP+C], 111   ; Compare what is at EBP+12 with 0x111 (subtracts, only sets flags)
JNZ  00401054                 ; If the previous compare is not zero (not equal), jump to 00401054
MOV  EAX, DWORD PTR [EBP+10]  ; Move what is at EBP+16 into EAX
CMP  AX, 64                   ; Compare the low 16 bits of what we moved to EAX with 0x64
JNZ  00401068                 ; If the comparison does not equal 0, jump to 00401068
POP  EBP                      ; Pop the value on top of the stack back into EBP
RET                           ; Return to the caller

OllyDbg

Overview
• Purpose
  – OllyDbg is a general purpose Win32 userland debugger.
The great thing about it is the intuitive UI and powerful disassembler
• Licensing
  – OllyDbg is free (shareware); however, it is not open source and the source code is not available
• Extensibility
  – OllyDbg defines a plugin architecture allowing extensibility via powerful plugins

Window Layouts
• Window layouts are the various parts of the UI that contain pertinent information
  – Code window – Displays the executable machine code
  – Register window – Allows the user to watch the contents of each register during execution
  – Memory window – Allows the user to view the contents of various memory locations
  – Stack window – Displays the stack, including memory addresses and values

Working in OllyDbg
• Navigation
  – Moving
  – Searching
• Commenting
  – Can be entered in the code window with the ; or : keys
• Listing Names
  – The names window displays all functions or imported functions used in the program
  – Listing them is easy via the shortcut Ctrl + N
• Showing Memory
  – Displaying memory can be useful when looking for strings or other important data
  – The memory map window can be displayed via Alt + M

Working in OllyDbg Cont…
• Breakpoints
  – Breakpoints allow the debugger to stop at a specified address or instruction
  – There are two types of breakpoints in general
    • Software breakpoints
      – Handled by the operating system
      – Set by navigating to the specified address and hitting F2
    • Hardware breakpoints
      – Handled by the processor
      – Set by finding the place in memory you want to break on access to, right clicking, and selecting the proper option
  – Olly also provides a way to view breakpoints and turn them on and off via the breakpoints window with Alt + B

Working in OllyDbg Cont…
• Controlling Execution
  – Starting the process
    • Once the target program is either loaded or attached in Olly you can start execution.
This will actually set up an initial breakpoint at the application entry point
  – There are several ways you can proceed from the entry point
    • Single stepping
      – Executes one instruction at a time and can be achieved by hitting F7
      – Steps into every function
      – Tedious as fuck
    • Execute until return
      – Executes until the ret instruction is encountered, which can be achieved by hitting Ctrl + F9
      – Executes all instructions in the current function
      – Faster than single stepping but not as comprehensive

Working in OllyDbg Cont…
• Watching execution
  – Registers
    • Handled in the register window
    • Red highlighting indicates a register has changed
  – Stack
    • Handled in the stack window
    • Display can be by address or by relative address from ebp
• Call stack
  – Displays the functions the current function has been called from
  – Can be displayed with the shortcut Alt + K

OllyDbg Case Study* (smarty word for demo)
• Example
  – Program displays a popup box
  – Goal is to make the proper box show and exit
• Patching
  – Allows us to modify the executable assembly code and save it to a new file with the changes

OllyDbg Plugins
• OllyDbg provides a downloadable PDK for plugin development
• Several plugins exist that provide extra usability
  – Heap Vis
  – Breakpoint manager
  – Ollyscript

IDA Pro

Overview
• IDA Pro was originally designed as a powerful disassembler
• Supports 30+ processors
• It has since been broadened to include a built in debugger
• Designed for reverse engineers with quickness and robustness in mind
  – This sometimes makes the learning curve steep
• Extensible plugin architecture and scripting language

Window Layouts
• Customizing window layouts
  – Each saved session will store any customized layouts
  – A default layout can also be saved
  – Customized layouts are provided to help the user with workflow and can consist of any combination or number of windows

Navigation
• Shortcuts
  – Most actions have equivalent shortcuts associated with them
  – Some of the most used
    • [Enter] – Jumps into
the function under the cursor
    • [Esc] – Returns to the previous cursor position
• Jumping
  – IDA allows the user to jump to various parts of a binary file easily
  – Some of the jumps
    • Entry point – Jumps to the entry point of the binary
    • By name – Allows the user to jump to a specific function or string in the binary
    • By address – Allows the user to jump to a specific address
• Markers
  – Markers can be used to tag locations in the binary for future reference
  – Markers are set using Alt + M and giving them a name
  – Jumping to a marker is easily achieved with Ctrl + M

Editing
• Comments
  – Comments allow you to organize and document important parts of the binary
  – Comments can be entered using the shortcut keys ; or :
• Functions can be renamed to something more descriptive
  – Often symbols are not available for the binary, and naming each function allows you to understand and track your work
  – Functions can be renamed using the shortcut Alt + P

Windows
• IDA View – Displays the disassembled binary
• Hex View – Displays the hex view of the current cursor position
• Names – The names window displays textual names and addresses in the binary
• Strings – The strings window contains any ASCII strings present in the executable
• Imports – The imports window contains the functions imported from DLLs
• Functions – The functions window allows you to view all functions and their addresses

Graphing
• IDA Pro has a powerful graphing engine that allows a user to visualize call graphs and xrefs
  – Flow chart graphs display the current function's machine code and any branches
  – Function call graphs display the call flow of all the functions in the executable (can be large)
  – Xref graphs display the xrefs to and from a location, with machine code

SDK/Plugins
• The SDK allows the user to develop plugins for use in IDA Pro
• Plugins are generally written in C/C++ and compiled against the SDK libraries and headers
• Using the SDK you can write
  – processor modules
  – input processing modules
  –
plugin modules
• Some good plugins
  – x86emu – Allows IDA to do runtime emulation
  – IDAPython – Access the IDA API in Python
  – Process Stalker – Allows visualization and run time tracing

Flirt
• Fast Library Identification and Recognition Technology
• Flirt is a means for IDA Pro to identify library functions and compilers by matching against a database of known signatures
• This greatly speeds up analysis by automatically naming discovered functions
• Only works with C/C++ functions

IDC Scripting
• The IDC scripting engine allows the user to automate small tasks with scripts
• IDC resembles C and has many helpful functions built in
  – PatchByte
  – Comment
  – FindCode

Decompiling

Overview
• Decompiling differs from disassembling in that it tries to reconstruct readable (and ultimately compilable) source code from machine code
  – Native compiled code is difficult to reconstruct because of the compiler's behavior when optimizing the produced code
  – Virtual machine code is much easier to turn back into readable code because of its nature. It is compiled into an intermediate language that carries all the information the target platform may need to run it
    • .Net
    • Java

.Net
• .Net is compiled down into MSIL (Microsoft Intermediate Language) and is a good example for decompiling
• .Net must provide the operating system with a wealth of information, including symbol names and data structures

Native code
• Native code is source code that has been compiled down into machine language
• Often, because of optimization, a compiler inadvertently obfuscates the higher level source code
• Native decompiling is not yet at the point of producing a good representation of the original source code

Decompilers
• .Net
  – ILDasm
  – Remotesoft Salamander
  – Reflector for .Net
• Java
  – JODE
  – JAD (Disappeared)
• Native
  – Boomerang

Decompilation Demo
Thanks fend3r!
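
A rough sketch of why VM code is friendlier to decompile than native code. Python is not part of this talk; its bytecode is used here only as a conveniently runnable stand-in for VM code like MSIL or Java bytecode, and check_serial and recover_metadata are made-up names for illustration. What it tries to show: the compiled code object still carries the function, argument, and local names, which is the kind of information an optimized native binary usually drops.

```python
import dis


def check_serial(serial, expected):
    """Toy function standing in for code we later want to reverse."""
    total = 0
    for ch in serial:
        total += ord(ch)
    return total == expected


def recover_metadata(func):
    """Collect the names that the compiled VM code object still carries."""
    code = func.__code__
    return {
        "function": code.co_name,
        "arguments": code.co_varnames[:code.co_argcount],
        "locals": code.co_varnames[code.co_argcount:],
        "referenced_names": code.co_names,
        # Opcode mnemonics differ between Python versions; treat them as illustrative.
        "opcodes": [ins.opname for ins in dis.get_instructions(func)],
    }


if __name__ == "__main__":
    # Print everything a "decompiler" could read straight out of the code object.
    for key, value in recover_metadata(check_serial).items():
        print(f"{key}: {value}")
```

Running it prints the recovered names next to the opcode mnemonics. A native binary stripped of symbols gives you none of the names, only addresses and registers.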

Conclusion
• Reverse engineering is a vast and complex world
• With a lot of practice, though, it becomes much easier
• A good reverser knows their tools inside and out
• Workflow and organization are the keys to reversing

Shirt Quiz
• Name the IA-32 registers
• What does .Net assemble into?
• In OllyDbg, how do you list the Names?
• What is the IA-32 instruction to compare two integers?
• How does the IA-32 processor handle signedness?
• What does the IDC scripting language resemble?
• How many processors does IDA support (roughly)?
• In IDA, how do you quickly follow a CALL?

References
• Reversing - http://www.wiley.com/WileyCDA/WileyTitle/productCd-0764574817.html
• ELF File format - http://www.skyfree.org/linux/references/ELF_Format.pdf
• PE File Format - http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndebug/html/msdn_peeringpe.asp
• http://lsd-pl.net/references.html
• OllyDbg - http://ollydbg.de/
• OllyDbg Plugins - http://ollydbg.win32asmcommunity.net/stuph/
• IDA Pro - http://www.datarescue.com/
• IDC - http://www.datarescue.com/idadoc/707.htm
• IDA Plugins - http://home.arcor.de/idapalace/
• Reflector - http://www.aisto.com/roeder/dotnet/
• JODE - http://jode.sourceforge.net/
• Boomerang - http://boomerang.sourceforge.net/
• Crackmes.de - http://www.crackmes.de/

Fucking done. Questions?