On the Effectiveness of Address-Space Randomization
CS6V81 - 005
Brian Ricks and Vasundhara Chimmad

Overview
● ASLR: Address Space Layout Randomization
  – Certain attacks can be thwarted by re-randomizing the address-space layout each time the program is restarted.
  – The attacker must either craft a specific exploit for each instance of a randomized program or perform brute force attacks to guess the address-space layout.

Overview – PaX ASLR
● PaX applies ASLR to binaries and dynamic libraries.
  – For the purposes of ASLR, a process's user address space consists of three areas, called the executable, mapped, and stack areas.
  – ASLR randomizes these three areas separately, adding to the base address of each one an offset variable randomly chosen when the process is created.
● We will focus on the mapped area, which includes the heap and dynamic libraries.

Overview – PaX ASLR
● PaX ASLR provides the following randomness:
  – 16 bits for addresses in the executable area
  – 16 bits for addresses in the mapped area
  – 24 bits for addresses in the stack area
● We will focus on the mapped data offset, which we call delta_mmap
  – Limited to 16 bits of randomness:
    ● Altering bits 28-31 would limit the ability of mmap() to handle large memory mappings
    ● Altering bits 0-11 would cause memory mapped pages not to be aligned on page boundaries

Breaking PaX ASLR
● Overview
  – Target: the Apache web server
    ● No known buffer overflows, so one will be replicated in the Apache source
  – Exploit using return-to-libc technique
    ● The stack addresses are randomized using 24 bits
      – Makes guessing them by brute force not feasible
    ● Instead, we use knowledge of the stack layout, as the layout does not change.
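
The 16-bit limit on delta_mmap from the PaX ASLR overview can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: 32-bit addresses, 4 KiB pages (so bits 0-11 are the page offset), and bits 28-31 left untouched. The names `PAGE_SHIFT`, `TOP_FIXED_BITS`, `RANDOM_MASK`, and `apply_delta` are hypothetical and do not come from PaX.

```python
ADDRESS_BITS = 32     # assumed 32-bit x86 address space
PAGE_SHIFT = 12       # bits 0-11: offset within a page, must stay fixed
TOP_FIXED_BITS = 4    # bits 28-31: left alone so large mappings still fit

# Whatever is left in the middle is all that can be randomized.
RANDOM_BITS = ADDRESS_BITS - PAGE_SHIFT - TOP_FIXED_BITS
RANDOM_MASK = ((1 << RANDOM_BITS) - 1) << PAGE_SHIFT


def apply_delta(base, delta):
    """Shift a page-granular delta into bits 12-27 and add it to base.

    Hypothetical helper: it models the randomized bits as a 16-bit value
    placed directly above the page offset.
    """
    if not 0 <= delta < (1 << RANDOM_BITS):
        raise ValueError("delta must fit in the randomized bits")
    return base + (delta << PAGE_SHIFT)


if __name__ == "__main__":
    print(RANDOM_BITS)                       # number of randomizable bits
    print(hex(RANDOM_MASK))                  # which address bits they occupy
    print(hex(apply_delta(0x40000000, 1)))   # base moved up by one page
```

Under these assumptions every randomized address stays page aligned and stays inside the same 256 MB region, which is why the search space is only 2^16 values.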
Breaking PaX ASLR
● Overview
  – Determine the value of delta_mmap
    ● Brute force attack that pinpoints an address in libc
  – Once delta_mmap is obtained, mount a return-to-libc attack to spawn a shell
    ● We assume that the stack is non-executable, in that we cannot execute shellcode directly on the stack
    ● We instead call a predefined function from the libc library, which is linked by default.
      – One such function, system(), can execute programs, such as a shell.

Precomputing libc Addresses
● In the libc library, determine the address offsets of the functions system(), usleep(), and a return (ret) instruction.
  – We can obtain these offsets by using the standard objdump tool, which displays information from object files (such as the libc library).

Precomputing libc Addresses
● Once these offsets are obtained, we can calculate the correct virtual addresses of system() and ret as follows: address = 0x40000000 + offset + delta_mmap.
  – 0x40000000: This is the standard base address for memory obtained using the mmap() function
    ● Already known
  – offset: The offset from the standard base address
    ● Obtained from objdump
  – delta_mmap: The PaX ASLR offset
    ● We need to figure this out!!

Obtaining the value of delta_mmap
● Obtain the value of delta_mmap
● What about usleep()?
  – We use this function to help us determine delta_mmap.
  – delta_mmap comprises the 'missing' 16 bits in the address for usleep() (we already know the others: they comprise the base address and the offset)
  – We try to guess the address for usleep() by guessing the value of delta_mmap:
    ● Only 2^16 = 65,536 possible values
    ● Possible by brute force

Obtaining the value of delta_mmap
● Why usleep()?
  – Gives clearly distinguishable behavior in Apache for a successful guess vs a failed guess
    ● Failed guess: child process crashes, Apache spawns a new process (forks)
      – Connection closes immediately
      – The new child process uses the same delta_mmap value as the crashed one!
      – Can keep guessing, knowing that delta_mmap will not change
    ● Successful guess: child process hangs for 16 seconds
      – We can infer from this 16 second delay that we found the correct address for usleep()
      – The guess for delta_mmap is the correct value

Obtaining the value of delta_mmap
● How do we do this?
  – Iterate over all possible values for delta_mmap starting from 0 and ending at 65535.
  – For each value of delta_mmap, compute the guess for the randomized virtual address of usleep() from its offset and base address.
  – Create the attack buffer and send it to the Apache web server (buffer overflow exploit).
  – If the connection closes immediately, continue with the next value of delta_mmap. If the connection hangs for 16 seconds, then the current guess for delta_mmap is correct.

Obtaining the value of delta_mmap
● Why does the child process hang for 16 seconds on a successful guess?
  – We send to usleep() an argument of 16,843,009 microseconds, which corresponds to roughly 16 seconds that the process will sleep for.
  – This value is represented in the attack buffer as 0x01010101
    ● Notice that if we want a number any lower than this, we will end up with a '00' somewhere in the hex representation.
    ● A '00' will be interpreted by strcpy() as a null terminator, and thus strcpy() will stop copying before overflowing the entire buffer.

Obtaining the value of delta_mmap
● What does the attack buffer look like?
  – Top figure: the stack before probing
  – Bottom figure: the stack after one probe
    ● The buffer is toward the bottom in the figure, and the overflow spreads upward, as denoted by the arrow

Obtaining the value of delta_mmap
● Iteration of one probe
  – Enter ap_getline()
    ● This function is modified to include a 64 char buffer (which the attack buffer is written to) and a strcpy() call, which causes the overflow
  – The return address (saved EIP) in the stack frame is overwritten with the guessed address for usleep()
    ● When ap_getline() returns, control is redirected to the guessed address
  – The saved frame pointer (EBP) is overwritten with 0xDEADBEEF (must be overwritten to reach the saved EIP)

Obtaining the value of delta_mmap
● Iteration of one probe
● When ap_getline() returns:
  – If the guess is correct, the address 0xDEADBEEF (above the saved EIP) will be interpreted as the return address for usleep()
    ● This will cause a crash on return from usleep(), but the purpose here is to enter the function
    ● This address will make you a 1337 h4x0r
  – If the guess is correct, the value 0x01010101 will be interpreted as the argument for usleep()
    ● Hex for 16,843,009 decimal, or about 16 seconds.

Obtaining the value of delta_mmap
● Iteration of one probe
● When ap_getline() returns:
  – If the guess is incorrect, the child process will segfault.
    ● This will cause Apache to fork() a new child process.
    ● However, this new process will have the same randomization as the old one (PaX randomization occurs when the parent process starts). Thus, we just guess again.

After obtaining delta_mmap
● We can now compute the addresses in libc of all other functions with certainty
● Use the same buffer overflow exploit (the one used to obtain delta_mmap) to conduct a return-to-libc attack.
● We initially start in the stack frame for the ap_getline() function.
● The overflow causes the ap_getline() function to return to a sequence of ret instructions, the address of which can be any ret instruction found in libc

After obtaining delta_mmap
● Sequence of events:
  – The 64 byte buffer in ap_getline() is overflowed by using strcpy() to copy the attack buffer into it.
  – The saved EIP in ap_getline()'s (current) stack frame is overwritten (due to the overflow) with the address of a ret instruction from libc.
  – When ap_getline() returns, the address in the saved EIP is a pointer to a ret instruction!
    ● Remember, when ret is executed for the ap_getline() function, the saved EIP is popped off the stack and into the EIP register
    ● This results in a 32-bit word (address) being popped off the stack (from the saved EIP location)

After obtaining delta_mmap
● Sequence of events:
  – When the saved EIP is popped off the stack, execution jumps to the address now contained in the EIP register (what was popped)
    ● In our case though, this address is a pointer to a ret instruction in libc!
  – Thus, the ret instruction pops the next word off the stack into EIP, and again, this address is a pointer to a ret instruction!
  – What we are doing is essentially advancing the stack pointer one word at a time until we hit the address of system() (part of the attack buffer)

After obtaining delta_mmap
● Sequence of events:
  – When we have popped enough of the stack to reach system(), then we know that the pointer to the 64 byte buffer will be in position to serve as the argument to system().
    ● Why? Because we know the stack layout doesn't change, and thus we can figure out exactly how many ret addresses to put in the attack buffer so that system()'s address will be exactly two words down in the stack from the 64 byte buffer pointer
  – The pointer to the 64 byte buffer can be found in the stack frame for ap_getline()'s calling function, and thus we overflow all stack frames with ret addresses until we hit the stack frame for ap_getline()'s calling function.
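
The word-by-word popping described above can be modelled with a toy, purely symbolic stack. This is a sketch for intuition only: it uses string labels instead of addresses, and the names `run_ret_chain`, `"RET"`, `"SYSTEM"`, and `"PTR_TO_BUFFER"` are hypothetical. It assumes the 32-bit x86 calling convention in which, on function entry, the word at the top of the stack is the return address and the next word is the first argument.

```python
def run_ret_chain(stack):
    """Walk a symbolic stack the way a chain of ret instructions would.

    stack[0] is the word sitting in the saved-return-address slot; higher
    indices are higher stack addresses. Each "RET" word stands for the
    address of a ret instruction, so landing on it pops one more word.

    Returns (function, return_address, first_argument, words_popped).
    """
    sp = 0          # index of the next word to pop
    popped = 0
    while True:
        word = stack[sp]    # ret: pop one word into the instruction pointer
        sp += 1
        popped += 1
        if word != "RET":
            # Control has reached a function entry. The next word is what
            # that function sees as its return address, and the word after
            # that is what it sees as its first argument.
            return word, stack[sp], stack[sp + 1], popped


if __name__ == "__main__":
    # Three ret addresses, then the function, a dummy return address, and
    # the pointer that was already on the stack and is left untouched.
    layout = ["RET", "RET", "RET", "SYSTEM", "0xDEADBEEF", "PTR_TO_BUFFER"]
    print(run_ret_chain(layout))
```

The number of `"RET"` entries is the only thing that has to match the real frame sizes, which is why knowing the fixed stack layout is enough and no stack address ever has to be guessed.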
After obtaining delta_mmap
● What does the attack buffer look like?
  – First 64 bytes: the shell command that we want system() to execute
  – This is followed by a series of ret instructions
    ● These are pointers to any ret instruction found in libc
    ● We already know the addresses of ret instructions in libc

After obtaining delta_mmap
● What does the attack buffer look like?
  – Above the last ret instruction is the address of system()
  – We have just enough ret instructions to 'eat up' the stack such that the pointer to the 64 byte buffer is in position to be the argument for system()

After obtaining delta_mmap
● What does the attack buffer look like?
  – Again we use 0xDEADBEEF to overwrite the saved EBP of the current stack frame and to serve as the return address of system()
  – The pointer into the 64 byte buffer is not overwritten!!
    ● We need this for our arg to system()!!

After obtaining delta_mmap
● Sequence of events:
  – Thus, when system() is called, the pointer to the 64 byte buffer (which contains, say, "/bin/sh") is passed as an argument to system()

After obtaining delta_mmap
● Why do we need to use pointers to ret instructions? Couldn't we replace the ret addresses with 0xDEADBEEF (or some other 1337 address) and simply overwrite the saved EIP of ap_getline()'s stack frame with the stack address where we wrote the system() address? Wouldn't this allow us to jump directly to the correct place in the stack without having to pop words to get there?
  – Sure, but how are we going to get that stack address?
  – PaX ASLR randomizes 24 bits for stack addresses
  – That would require 2^24 = 16,777,216 guesses of the offset alone in the worst case to figure out this stack address!! Not feasible to simply jump to this address.
  – But the stack layout is not randomized (as mentioned)

Experimental Platform
● 2.4 GHz Pentium 4 client attacking a 1.8 GHz Athlon server.
  – Connected over a 100 Mbps network
● Each probe resulted in about 200 bytes of network traffic
  – Total of 12.8 MB in the worst case
  – Total of 6.4 MB in the average case

Experimental Results
● 10 trials
● Total # of Apache child processes spawned concurrently: 150
● Results (of 10 trials):
  – Slowest time: 810 seconds
  – Average time: 216 seconds
  – Quickest time: 29 seconds

Improvements
• Attacks exploited the low entropy of 16 bits
• Address space layouts are randomized only at program loading

64 bit architectures
• 16 bits of address space randomization can be defeated by brute force
• 64 bits is good, as 40 address bits are available for randomization
• An online brute force attack of that magnitude won't go unnoticed

Randomization frequency
• Proposal: re-randomize the address space more frequently
• Re-randomizing adds no more than 1 bit of security against brute force.

Randomization granularity
• Finer granularity increases randomness
• Randomize function and variable addresses within memory segments, in addition to randomizing base addresses

Randomizing at Compile time
• Compiler and linker can be modified to randomize variable and function addresses within their segments
• Introduction of random padding
• Placing entry points in random order within a library gives an additional 10-12 bits of entropy

Randomizing at runtime
• Randomize more than 16 bits while preventing fragmentation of the virtual address space.
• Function re-ordering within shared libraries
• Effective against return-to-libc attacks.
• By modifying the compiler and linker, relative jumps can be eliminated at compile time.
• Defer resolution of offsets until runtime dynamic linking
• Allows functions to be ordered arbitrarily, loaded from different libraries, and placed in non-contiguous portions of virtual memory
• Library pages differ between processes
• Clustering functions into page-size groups and shuffling groups instead of individual functions.
• Code that needs to call these functions must be able to locate them effectively
• The Global Offset Table (GOT) and the array of pointers initialized by the runtime dynamic linker need to be fixed.
• Difficult with all the constraints.
• A linking architecture that facilitates function shuffling in shared code pages effectively and securely is needed. Further research in this area is needed.

Monitoring and catching errors
• Crash detection and reaction mechanism called Watcher.
• An attacker's incorrect guesses will trigger segmentation violations.
• But the actions available to the crash watcher are limited
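
The statement under "Randomization frequency" that re-randomizing adds no more than 1 bit of security can be illustrated with a short expected-value calculation. This is a simplified model, not the analysis from the original paper: it assumes the secret offset is uniform over 2^bits values, that a brute-force attacker against a fixed layout never repeats a guess, and that re-randomization draws a fresh independent value after every failed guess. The function names are hypothetical.

```python
import math
from fractions import Fraction


def expected_probes_fixed(bits):
    """Expected guesses when the layout stays fixed across probes.

    Guessing without repetition finds a uniform secret among n values
    after (n + 1) / 2 tries on average.
    """
    n = 2 ** bits
    return Fraction(n + 1, 2)


def expected_probes_rerandomized(bits):
    """Expected guesses when the layout is redrawn after every failure.

    Each guess succeeds independently with probability 1/n, so the count
    is geometric with mean n.
    """
    n = 2 ** bits
    return Fraction(n)


def extra_bits_from_rerandomizing(bits):
    """Gain from re-randomization, measured in bits of attacker work."""
    ratio = expected_probes_rerandomized(bits) / expected_probes_fixed(bits)
    return math.log2(ratio)


if __name__ == "__main__":
    print(float(expected_probes_fixed(16)))         # about 2^15 probes
    print(float(expected_probes_rerandomized(16)))  # 2^16 probes
    print(extra_bits_from_rerandomizing(16))        # slightly under 1 bit
```

In this model the average attack against a fixed layout takes about half as many probes as the worst case, which is consistent with the average-case traffic on the Experimental Platform slide being half the worst-case figure.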