Reverse Engineering
Grant Curell

What is reverse engineering?
Reverse engineering is the process of discovering the technological principles of a device, object, or system through analysis of its structure, function, and operation.
Why does the military care and, more specifically, why should you?

General reversing
• What is the x86 architecture?
• What is a computer?
• What is the general structure of a computer?
• What differentiates malicious instructions from legitimate instructions?

Registers
• EAX - Accumulator Register
• EBX - Base Register
• ECX - Counter Register
• EDX - Data Register
• ESI - Source Index
• EDI - Destination Index
• EBP - Base Pointer
• ESP - Stack Pointer

• EAX - Most major calculations take place in EAX, making it similar to a dedicated accumulator register.
• EDX - The data register is an extension to the accumulator. It is most useful for storing data related to the accumulator's current calculation.
• ECX - Like the variable i in high-level languages, the count register is the universal loop counter.
• EDI - Every loop must store its result somewhere, and the destination index points to that place. With a single-byte STOS instruction to write data out of the accumulator, this register makes data operations much more size-efficient.
• ESI - In loops that process data, the source index holds the location of the input data stream. Like the destination index, ESI has a convenient one-byte instruction (LODS) for loading data out of memory into the accumulator.
• ESP - ESP is the sacred stack pointer. With the important PUSH, POP, CALL, and RET instructions requiring its value, there is never a good reason to use the stack pointer for anything else.
• EBP - In functions that store parameters or variables on the stack, the base pointer holds the location of the current stack frame. In other situations, however, EBP is a free data-storage register.
• EBX - In 16-bit mode, the base register was useful as a pointer.
Now it is completely free for extra storage space.

Commands
command operand1, operand2
command reg1, op1/reg2
command op1/reg1
Where an operand can be a fixed number or a register

Commands – The basics
• sub dest, src - The source is subtracted from the destination and the result is stored in the destination (dest = dest - src).
• add dest, src - Adds "src" to "dest", replacing the original contents of "dest". Both operands are binary.
• mov dest, src - Copies a byte, word, or doubleword from the source operand to the destination operand.

Commands – push and pop
• push src - Decrements SP by the size of the operand (two or four; byte values are sign extended) and transfers the source to the stack top (SS:SP).
• pop dest - Transfers the word at the current stack top (SS:SP) to the destination, then increments SP by two (four for a 32-bit operand) to point to the new stack top. CS is not a valid destination.

Commands – comparison operators
• xor dest, src - Performs a bitwise exclusive OR of the operands and returns the result in the destination.
• test dest, src - Performs a logical AND of the two operands, updating the flags register without saving the result.
• cmp dest, src - Subtracts source from destination and updates the flags but does not save the result. Flags can subsequently be checked for conditions.

Commands – call
• call procname - Pushes Instruction Pointer (and Code Segment for far calls) onto the stack and loads Instruction Pointer with the address of procname. Code continues with execution at CS:IP.

Commands – leave and ret
• Leave –

Commands – jumps
Details on Jumps

Commands – pointers and lea
• lea dest, src - Transfers the offset address of "src" to the destination register.

Question?
Is the value of ESI the same or different after each of these instructions? What is its value in each case?
MOV ESI, [EBX + 8*EAX + 4]
and
LEA ESI, [EBX + 8*EAX + 4]

Example 1
We’ll talk about these later.
The original code...
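For the MOV/LEA question above, a minimal Python sketch may help. It models memory as a dictionary and uses made-up register values (EBX = 0x1000, EAX = 2) and a made-up memory value; these are assumptions for illustration, not values from the slides. Both instructions compute the same effective address. LEA stores that address in ESI, while MOV stores the 32-bit value found at that address.

```python
# Toy model of MOV vs. LEA with the operand [EBX + 8*EAX + 4].
# Register values and memory contents are hypothetical.

def effective_address(ebx: int, eax: int) -> int:
    """Compute EBX + 8*EAX + 4, wrapped to 32 bits as the CPU would."""
    return (ebx + 8 * eax + 4) & 0xFFFFFFFF

def lea(ebx: int, eax: int) -> int:
    """LEA ESI, [...]: ESI receives the address itself; memory is not read."""
    return effective_address(ebx, eax)

def mov(memory: dict, ebx: int, eax: int) -> int:
    """MOV ESI, [...]: ESI receives the dword stored at that address."""
    return memory[effective_address(ebx, eax)]

ebx, eax = 0x1000, 2
memory = {0x1014: 0xDEADBEEF}  # pretend dword stored at 0x1000 + 8*2 + 4

esi_after_lea = lea(ebx, eax)
esi_after_mov = mov(memory, ebx, eax)

print(hex(esi_after_lea))  # 0x1014
print(hex(esi_after_mov))  # 0xdeadbeef
```

In this toy setup ESI ends up different after the two instructions, because one holds a pointer and the other holds the data it points to.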
Calling Conventions
The C convention pushes arguments onto the stack from right to left (i.e., the first argument of the function is placed on the stack last, and thus appears on top). Deleting arguments from the stack is entrusted not to the function, but to the code calling the function.
The Pascal convention pushes arguments onto the stack from left to right (i.e., the first argument of the function is placed on the stack first, and thus appears on the bottom). The deletion of function arguments is entrusted to the function itself.

Example 1 – What’s happening here?

Example 2
Meow

Example 2 – Where was our string stored?

Example 3: Now you know enough to be dangerous
Hint: You’ll need this ASCII TABLE

Some review – Buffer Overflows
Where is data stored?
• Generally: All local variables are stored on the stack.
• Dynamically allocated variables are generally placed in the heap.

Example 4 - There are two flaws in this program. Can you spot them?
What is an access violation and why did it occur? (Pay close attention because the answer will help you with your assignment.)
As compiled with Dev C++
A quick comparison with what it looked like when compiled with Visual Studio
What’s happening here?
What is this for?
What do we expect the stack to look like when we reach this point?
What is this?
Stack after the sub esp, 68
Can you identify what is on the stack right now?
Assuming NO access violation occurred, what do you think happened?
So now what? We want control. How do you think we should do it?

Time for some math.
0x0022FF60 - 0x0022FF10 = 0x50, or 80 bytes: the first 76 bytes are buffer and the last 4 overwrite the return address.
Start of our buffer
Return address
What it looks like after input – notice the return address overwritten by B instead of A.
Return address
So that’s great, but where are we gonna put our shell code?
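The arithmetic above can be checked with a short Python sketch. The two addresses come from the slide; the variable names and the A/B marker pattern are illustrative assumptions that mirror the "overwritten by B instead of A" description, and the sketch contains no shellcode.

```python
# Offset arithmetic from the "Time for some math" slide.
BUFFER_START = 0x0022FF10  # start of our buffer (slide value)
UPPER_BOUND = 0x0022FF60   # upper address used in the slide's subtraction
POINTER_SIZE = 4           # a 32-bit return address occupies 4 bytes

total = UPPER_BOUND - BUFFER_START  # bytes from buffer start to the upper bound
filler_len = total - POINTER_SIZE   # bytes that come before the return address

# Inferred from the slide's numbers: where the saved return address begins.
return_address_slot = BUFFER_START + filler_len

# Mark the two regions with different letters so they are easy to tell
# apart in a debugger: A for the buffer, B for the return address.
pattern = b"A" * filler_len + b"B" * POINTER_SIZE

print(hex(total), total)         # 0x50 80
print(filler_len)                # 76
print(hex(return_address_slot))  # 0x22ff5c
print(pattern[-POINTER_SIZE:])   # b'BBBB'
```

Using two distinct marker letters is a common way to confirm the offset: if the overwritten return address reads 0x42424242 (the ASCII code for B, repeated), the 76-byte figure is right.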
Registers just before the return
Stack just before the return
Finding a suitable jmp ecx
Now we write some shell code…
Let’s take a look… right before gets
Right after gets… looks good
It didn’t work… why?
This still won’t work. Why?
So we need a new exploitation technique.

Example 5
SEH… What is that?
Let’s get a better look…
What it looks like in Ollydbg… your target
This is the unfiltered exception handler. It is called if no other EH can handle the problem.
So here’s generally what to do…
See later slides for why step 4 is different.

What your malicious payload should look like…
NOPs
Exploit Code
Short JMP
Address of Pop Pop Ret
16 NOPs
Long Jump To NOP Sled
Large amount of garbage

One difference conceptually…
• Because we do not have space after the SEH, we must put our shellcode before. This means we will have to jump to the nopsled above and let it “slide” down.
• The encoding you will want to use is:
• \xE9\x14\xFD\xFF\xFF

Generating a Payload Using Backtrack
1) Open a console
2) Type “msfconsole”
3) Type “use payload/windows/messagebox”
4) Type “show options” to see exploit options. Mess with these as you wish.
5) Type “generate” to generate the payload
Note: My payload code is in the notes if you wish to use that.

Let’s take a pictorial look

Questions?
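The long-jump encoding given earlier, \xE9\x14\xFD\xFF\xFF, can be decoded by hand or with a short Python sketch. Opcode E9 is a near JMP followed by a 32-bit little-endian signed displacement, measured from the end of the 5-byte instruction. The example address in the sketch is hypothetical and is not taken from the slides.

```python
import struct

# The 5-byte encoding from the slide: opcode E9 plus a rel32 displacement.
encoding = b"\xE9\x14\xFD\xFF\xFF"

opcode = encoding[0]
# "<i" means little-endian signed 32-bit integer.
(displacement,) = struct.unpack("<i", encoding[1:5])

def jump_target(jmp_address: int) -> int:
    """Return where the jump lands when the JMP is located at jmp_address.

    The displacement is relative to the address of the next instruction,
    i.e. jmp_address plus the length of the JMP itself.
    """
    return (jmp_address + len(encoding) + displacement) & 0xFFFFFFFF

print(hex(opcode))    # 0xe9
print(displacement)   # -748

example_address = 0x0022FF70  # hypothetical location of the JMP
print(hex(jump_target(example_address)))  # 0x22fc89
```

A negative displacement means the jump goes backward, toward lower addresses, which matches the slide's point that the shellcode and NOP sled sit before the SEH record.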