On the Effectiveness of Address-Space Randomization CS6V81 - 005

Brian Ricks and Vasundhara Chimmad
Overview
● ASLR: Address Space Layout Randomization
  – Attacks that depend on knowing where code and data reside in memory (for example, buffer-overflow exploits) can be thwarted by re-randomizing the address-space layout each time the program is restarted.
  – The attacker must either craft a specific exploit for each instance of a randomized program or perform brute-force attacks to guess the address-space layout.
Overview – PaX ASLR
● PaX applies ASLR to binaries and dynamic libraries.
  – For the purposes of ASLR, a process's user address space consists of three areas, called the executable, mapped, and stack areas.
  – ASLR randomizes these three areas separately, adding to the base address of each one an offset variable randomly chosen when the process is created.
● We will focus on the mapped area, which includes the heap and dynamic libraries.
Overview – PaX ASLR
● PaX ASLR provides the following randomness:
  – 16 bits for addresses in the executable area
  – 16 bits for addresses in the mapped area
  – 24 bits for addresses in the stack area
● We will focus on the mapped-area offset, which we call delta_mmap
  – Limited to 16 bits of randomness (bits 12-27 of a 32-bit address):
    ● Altering bits 28-31 would interfere with mmap()'s handling of large memory mappings
    ● Altering bits 0-11 would cause memory-mapped pages not to be aligned on page boundaries
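Excluding bits 0-11 and 28-31 from a 32-bit address leaves exactly bits 12-27 for delta_mmap. Here is a quick check of that arithmetic. The constant names are ours and are not taken from PaX.

```python
# Bits PaX leaves alone in the mapped area (per the slide):
#   bits 0-11  -> keep mappings page-aligned (4 KiB pages)
#   bits 28-31 -> keep mmap() able to handle large mappings
LOW_FIXED = (1 << 12) - 1   # 0x00000FFF
HIGH_FIXED = 0xF << 28      # 0xF0000000

# Whatever remains of a 32-bit address is available for delta_mmap.
DELTA_MMAP_MASK = 0xFFFFFFFF & ~(LOW_FIXED | HIGH_FIXED)

def randomizable_bits(mask: int) -> list[int]:
    """Return the bit positions that are set in a 32-bit mask."""
    return [i for i in range(32) if mask >> i & 1]

print(hex(DELTA_MMAP_MASK))                     # 0xffff000
print(len(randomizable_bits(DELTA_MMAP_MASK)))  # 16
```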
Breaking PaX ASLR
● Overview
  – Target: the Apache web server
    ● It has no known buffer overflows, so one is artificially introduced into the Apache source
  – Exploit using the return-to-libc technique
● The stack addresses are randomized using 24 bits
  – This makes guessing them by brute force infeasible
  – Instead, we use knowledge of the stack layout, since the layout does not change.
Breaking PaX ASLR
● Overview
  – Determine the value of delta_mmap
    ● Brute-force attack that pinpoints an address in libc
  – Once delta_mmap is obtained, mount a return-to-libc attack to spawn a shell
● We assume that the stack is non-executable: it can be written, but we cannot execute shellcode directly on the stack
● We instead call a predefined function from the libc library, which is linked by default.
  – One such function, system(), can execute programs, such as a shell.
Precomputing libc Addresses
● In the libc library, determine the address offsets of the functions system(), usleep(), and a return (ret) instruction.
  – We can obtain these offsets using the standard objdump tool, which displays information from object files (such as the libc library).
Precomputing libc Addresses
● Once these offsets are obtained, we can calculate the correct virtual addresses of system(), usleep(), and ret as follows:
  address = 0x40000000 + offset + delta_mmap
  – 0x40000000: the standard base address for memory obtained using the mmap() function
    ● Already known
  – offset: the offset from the standard base address
    ● Obtained from objdump
  – delta_mmap: the PaX ASLR offset
    ● We need to figure this out!!
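Below is a minimal sketch of the address formula. It assumes that delta_mmap is the 16-bit random value and that PaX applies it in bits 12-27 (that is, shifted left by 12), which matches the bit ranges on the earlier PaX ASLR slide. The usleep() offset is a hypothetical placeholder, not a real objdump result.

```python
MMAP_BASE = 0x40000000  # standard mmap() base address (from the slide)

def libc_address(offset: int, delta_mmap: int) -> int:
    """address = 0x40000000 + offset + delta_mmap.

    Assumption: delta_mmap is the 16-bit random value, applied in
    bits 12-27 of the address (shifted left by 12 bits).
    """
    if not 0 <= delta_mmap <= 0xFFFF:
        raise ValueError("delta_mmap must fit in 16 bits")
    return (MMAP_BASE + offset + (delta_mmap << 12)) & 0xFFFFFFFF

# Hypothetical objdump offset, for illustration only.
USLEEP_OFFSET = 0x000A1B20
print(hex(libc_address(USLEEP_OFFSET, 0)))       # 0x400a1b20
print(hex(libc_address(USLEEP_OFFSET, 0xABCD)))  # 0x4ac6eb20
```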
Obtaining the value of delta_mmap
● Obtain the value of delta_mmap
● What about usleep()?
  – We use this function to help us determine delta_mmap.
  – delta_mmap comprises the 'missing' 16 bits in the address of usleep() (we already know the rest: the base address and the offset)
  – We try to guess the address of usleep() by guessing the value of delta_mmap:
    ● Only 2^16 = 65,536 possible values
    ● Feasible by brute force
Obtaining the value of delta_mmap
● Why usleep()?
  – It gives distinguishable, deterministic behavior in Apache for a successful guess vs. a failed guess
    ● Failed guess: the child process crashes, and Apache spawns (forks) a new process
      – The connection closes immediately
      – The new child process uses the same delta_mmap value as the crashed one!
      – We can keep guessing, knowing that delta_mmap will not change
    ● Successful guess: the child process hangs for about 16 seconds
      – From this 16-second delay we can infer that we found the correct address of usleep()
      – The guess for delta_mmap is the correct value
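This turns the oracle into a timing test. Here is a minimal sketch of that decision, assuming the client measures how long each connection stays open. The names and the 10-second threshold are hypothetical. The threshold was chosen only to fall between an immediate close and the roughly 16.8-second sleep.

```python
from enum import Enum

class ProbeResult(Enum):
    MISS = "miss"  # child crashed, connection closed right away
    HIT = "hit"    # child is sleeping inside usleep()

# Hypothetical cutoff: well above normal request latency,
# well below the ~16.8 s sleep triggered by a correct guess.
HANG_THRESHOLD_S = 10.0

def classify_probe(seconds_until_close: float,
                   threshold: float = HANG_THRESHOLD_S) -> ProbeResult:
    """Map how long the connection stayed open to a probe outcome."""
    if seconds_until_close < 0:
        raise ValueError("elapsed time cannot be negative")
    if seconds_until_close >= threshold:
        return ProbeResult.HIT
    return ProbeResult.MISS
```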
Obtaining the value of delta_mmap
● How do we do this?
  – Iterate over all possible values of delta_mmap, starting from 0 and ending at 65535.
  – For each value of delta_mmap, compute the guess for the randomized virtual address of usleep() from its offset and the base address.
  – Create the attack buffer and send it to the Apache web server (buffer overflow exploit).
  – If the connection closes immediately, continue with the next value of delta_mmap. If the connection hangs for 16 seconds, then the current guess for delta_mmap is correct.
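The search loop can be simulated offline. In this sketch a plain callback, `probe_hangs`, stands in for sending one probe and timing the connection. There is no networking here. The secret value, the usleep() offset, and the bits 12-27 placement of delta_mmap are all illustrative assumptions.

```python
from typing import Callable, Optional

MMAP_BASE = 0x40000000

def guess_address(offset: int, delta_mmap: int) -> int:
    # Assumption: the 16-bit delta_mmap is applied in bits 12-27.
    return MMAP_BASE + offset + (delta_mmap << 12)

def find_delta_mmap(usleep_offset: int,
                    probe_hangs: Callable[[int], bool]) -> tuple[Optional[int], int]:
    """Try every delta_mmap from 0 to 65535, in order.

    probe_hangs(address) models one probe: True if the server hangs
    (usleep() was reached), False if the connection closes at once.
    Returns (delta_mmap or None, number of probes sent).
    """
    for probes, delta in enumerate(range(1 << 16), start=1):
        if probe_hangs(guess_address(usleep_offset, delta)):
            return delta, probes
    return None, 1 << 16

# Simulated target. The secret stays fixed across probes because
# forked children inherit the parent's randomization.
SECRET_DELTA = 0x1234
USLEEP_OFFSET = 0x000A1B20  # hypothetical
true_usleep = guess_address(USLEEP_OFFSET, SECRET_DELTA)
delta, probes = find_delta_mmap(USLEEP_OFFSET, lambda addr: addr == true_usleep)
print(hex(delta), probes)  # 0x1234 4661
```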
Obtaining the value of delta_mmap
● Why does the child process hang for about 16 seconds on a successful guess?
  – We pass usleep() an argument of 16,843,009. Since usleep() takes microseconds, the process sleeps for roughly 16.8 seconds.
  – This value is represented in the attack buffer as 0x01010101
    ● Notice that any smaller value would contain a 0x00 byte somewhere in its hex representation.
    ● strcpy() interprets a 0x00 byte as the null terminator, so it would stop copying before overflowing the entire buffer.
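The null-byte constraint is easy to check mechanically. Every byte of a 32-bit word must be at least 0x01, so the smallest word that strcpy() will copy whole is 0x01010101. This sketch also converts that value to seconds of sleep. The helper name is ours.

```python
def contains_null_byte(word: int) -> bool:
    """True if any byte of a 32-bit word is 0x00 (strcpy() stops there)."""
    return any(b == 0 for b in word.to_bytes(4, "little"))

# Each of the four bytes must be at least 0x01.
SMALLEST_NUL_FREE = int.from_bytes(bytes([1, 1, 1, 1]), "little")

SLEEP_ARG = 0x01010101
print(SLEEP_ARG)                          # 16843009
print(SLEEP_ARG / 1_000_000)              # 16.843009 (usleep() takes microseconds)
print(contains_null_byte(SLEEP_ARG))      # False
print(contains_null_byte(SLEEP_ARG - 1))  # True: 0x01010100 ends in 0x00
```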
Obtaining the value of delta_mmap
● What does the attack buffer look like?
  – Top figure: the stack before probing
  – Bottom figure: the stack after one probe
● The buffer is toward the bottom of the figure, and the overflow spreads upward, as denoted by the arrow
Obtaining the value of delta_mmap
● Iteration of one probe
  – Enter ap_getline()
    ● This function is modified to include a 64-char buffer (into which the attack buffer is written) and a strcpy() call that causes the overflow
  – The saved return address (saved EIP) in the stack frame is overwritten with the guessed address of usleep()
    ● When ap_getline() returns, control is redirected to the guessed address
  – The saved frame pointer (EBP) is overwritten with 0xDEADBEEF (it must be overwritten to reach the saved EIP)
Obtaining the value of delta_mmap
● Iteration of one probe
● When ap_getline() returns:
  – If the guess is correct, the word 0xDEADBEEF (above the saved EIP) will be interpreted as the return address for usleep()
    ● This will cause a crash on return from usleep(), but the purpose here is only to enter the function
    ● This address will make you a 1337 h4x0r
  – If the guess is correct, the value 0x01010101 will be interpreted as the argument for usleep()
    ● 0x01010101 is hex for 16,843,009 decimal: about 16.8 seconds of sleep, since usleep() takes microseconds.
Obtaining the value of delta_mmap
● Iteration of one probe
● When ap_getline() returns:
  – If the guess is incorrect, the child process will segfault.
    ● This causes Apache to fork() a new child process. However, the new process has the same randomization as the old one (PaX randomization occurs when the parent process starts, and forked children inherit it).
    ● Thus, we just guess again
After obtaining delta_mmap
● We can now compute the addresses in libc of all other functions with certainty
● Use the same buffer overflow exploit (the one used to obtain delta_mmap) to conduct a return-to-libc attack.
● We initially start in the stack frame of the ap_getline() function.
● The overflow causes ap_getline() to return into a sequence of ret instructions; the address used can be that of any ret instruction found in libc
After obtaining delta_mmap
● Sequence of events:
  – The 64-byte buffer in ap_getline() is overflowed by using strcpy() to copy the attack buffer into it.
  – The saved EIP in ap_getline()'s (current) stack frame is overwritten (due to the overflow) with the address of a ret instruction in libc.
  – When ap_getline() returns, the saved EIP is a pointer to a ret instruction!
    ● Remember, when ap_getline() executes its ret instruction, the saved EIP is popped off the stack into the EIP register
    ● This pops one 32-bit word (an address) off the stack, from the saved EIP location
After obtaining delta_mmap
● Sequence of events:
  – When the saved EIP is popped off the stack, execution jumps to the address now in the EIP register (the word that was popped)
    ● In our case, though, this address points to a ret instruction in libc!
  – Thus, that ret instruction pops another word off the stack into EIP, and again this word is the address of a ret instruction!
  – Each ret advances the stack pointer by one word (4 bytes, toward higher addresses), walking through the overflowed region one address at a time until we reach the address of system() (part of the attack buffer)
After obtaining delta_mmap
● Sequence of events:
  – When we have popped enough of the stack to reach system(), the pointer to the 64-byte buffer is in position to serve as the argument to system().
    ● Why? The stack layout doesn't change, so we can work out exactly how many ret addresses to put in the attack buffer. That number places system()'s address exactly two words below the 64-byte buffer pointer on the stack.
  – The pointer to the 64-byte buffer is found in the stack frame of ap_getline()'s calling function. We therefore overwrite every stack frame with ret addresses until we reach that caller's frame.
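The ret chain can be modeled on a symbolic stack. This is a simplified sketch: words are labels rather than real addresses, and the count of ret words is hypothetical. In the attack, that count is dictated by the fixed stack layout. The model assumes the usual x86 cdecl view at function entry, with the return address at [ESP] and the first argument at [ESP+4].

```python
from collections import deque

def run_ret_chain(stack_words: list[str]) -> tuple[str, str, str, int]:
    """Simulate a chain of x86 `ret` instructions over a symbolic stack.

    stack_words[0] is the word at ESP when ap_getline() executes its own
    ret (the overwritten saved EIP). Each ret pops one word into EIP.
    When EIP is no longer a ret gadget, report what the entered function
    sees: its return address at [ESP] and first argument at [ESP+4].
    Returns (function entered, return address, first argument, gadgets run).
    """
    stack = deque(stack_words)
    eip = stack.popleft()        # ap_getline()'s own ret
    gadgets = 0
    while eip == "ret":          # each libc ret pops the next word
        gadgets += 1
        eip = stack.popleft()
    return_address = stack[0]    # [ESP]
    first_arg = stack[1]         # [ESP+4]
    return eip, return_address, first_arg, gadgets

N_RETS = 5  # hypothetical; the real count follows from the stack layout
stack = ["ret"] * N_RETS + ["system", "0xDEADBEEF", "&buf"]
print(run_ret_chain(stack))  # ('system', '0xDEADBEEF', '&buf', 5)
```

In this model, system()'s address sits exactly two words below the buffer pointer, which matches the slide's condition.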
After obtaining delta_mmap
● What does the attack buffer look like?
  – First 64 bytes: the shell command that we want system() to execute
  – This is followed by a series of ret addresses
    ● These are pointers to any ret instruction found in libc
    ● We already know the addresses of ret instructions in libc
After obtaining delta_mmap
● What does the attack buffer look like?
  – Above the last ret address is the address of system()
  – We have just enough ret addresses to 'eat up' the stack, so that the pointer to the 64-byte buffer ends up in position to be the argument for system()
After obtaining delta_mmap
● What does the attack buffer look like?
  – Again we use 0xDEADBEEF, both to overwrite the saved EBP of the current stack frame and as the return address of system()
  – The pointer into the 64-byte buffer is not overwritten!!
    ● We need it as our argument to system()!!
After obtaining delta_mmap
● Sequence of events:
  – Thus, when system() is called, the pointer to the 64-byte buffer (which contains, say, "/bin/sh") is passed as an argument to system()
After obtaining delta_mmap
● Why do we need pointers to ret instructions? Couldn't we replace the ret addresses with 0xDEADBEEF (or some other 1337 address)? We could then overwrite the saved EIP of ap_getline()'s stack frame with the stack address where we placed system()'s address. Wouldn't this let us jump directly to the correct place in the stack without having to pop words to get there?
  – Sure, but how are we going to get that stack address?
  – PaX ASLR randomizes 24 bits of stack addresses
  – In the worst case, finding this stack address would take 2^24 = 16,777,216 guesses of the offset alone!! Simply jumping to this address is not feasible.
  – But the stack layout is not randomized (as mentioned)
Experimental Platform
● 2.4 GHz Pentium 4 client attacking a 1.8 GHz Athlon server
  – Connected over a 100 Mbps network
● Each probe resulted in about 200 bytes of network traffic
  – Total of 12.8 MB in the worst case
  – Total of 6.4 MB in the average case
Experimental Results
● 10 trials
● Total number of Apache child processes spawned concurrently: 150
● Results (of 10 trials):
  – Slowest time: 810 seconds
  – Average time: 216 seconds
  – Quickest time: 29 seconds
Improvements
• The attacks exploited the low entropy of 16 bits
• Address-space layouts are randomized only at program loading
64-bit architectures
• 16 bits of address-space randomization can be defeated by brute force
• 64-bit architectures are better, since about 40 address bits are available for randomization
• An online brute-force attack against that many bits won't go unnoticed
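Here is a rough sense of scale: a back-of-the-envelope extrapolation, not a measurement from the paper. It assumes the probe rate seen in the 16-bit experiments stays constant and that the expected number of probes grows in proportion to 2^bits. Under those assumptions, the 216-second average scales to over a century at 40 bits.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def extrapolate_seconds(measured_seconds: float,
                        measured_bits: int, target_bits: int) -> float:
    """Scale a brute-force time by the growth of the search space.

    Assumes a constant probe rate and an expected probe count
    proportional to 2**bits.
    """
    return measured_seconds * 2 ** (target_bits - measured_bits)

t = extrapolate_seconds(216, 16, 40)  # 216 s: average time from the results slide
print(f"{t:.3e} s, about {t / SECONDS_PER_YEAR:.0f} years")  # 3.624e+09 s, about 115 years
```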
Randomization frequency
• Increase the frequency of randomization, rather than randomizing only at program loading
• However, re-randomizing adds no more than 1 bit of security against brute force.
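The 1-bit bound follows from comparing two simple models, assuming the secret is uniform over n values. If the layout stays fixed, the attacker never repeats a guess and needs (n + 1)/2 probes on average. If the layout is re-randomized after every probe, each probe independently succeeds with probability 1/n, so n probes are expected. The ratio stays below 2, which is less than one bit.

```python
import math

def expected_probes_fixed(n: int) -> float:
    """Fixed layout: guesses are never repeated, so the secret is
    found after (n + 1) / 2 probes on average."""
    return (n + 1) / 2

def expected_probes_rerandomized(n: int) -> float:
    """Re-randomized after every probe: each probe succeeds with
    probability 1/n independently (geometric), so n probes expected."""
    return float(n)

n = 2 ** 16
fixed, rerand = expected_probes_fixed(n), expected_probes_rerandomized(n)
print(fixed, rerand)              # 32768.5 65536.0
print(math.log2(rerand / fixed))  # just under 1 bit
```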
Randomization granularity
• Finer granularity, to increase randomness
• Randomize function and variable addresses within memory segments
• In addition to randomizing segment base addresses
Randomizing at Compile time
• The compiler and linker can be modified to randomize variable and function addresses within their segments
• Introduce random padding
• Placing entry points in random order within a library adds roughly 10-12 bits of entropy
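Suppose a target function is equally likely to land in any of N positions. An attacker then faces log2(N) bits of uncertainty about where it is. The 10-12 bit figure therefore corresponds to roughly 2^10 to 2^12 possible positions. The slot counts below are illustrative and are not a count of libc's entry points.

```python
import math

def reorder_entropy_bits(num_slots: int) -> float:
    """Bits of uncertainty about one function's position when it is
    placed uniformly at random among num_slots positions."""
    if num_slots < 1:
        raise ValueError("need at least one slot")
    return math.log2(num_slots)

for slots in (1024, 2048, 4096):
    print(slots, reorder_entropy_bits(slots))  # 10.0, 11.0, 12.0 bits
```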
Randomizing at runtime
• Randomize more than 16 bits while preventing fragmentation of the virtual address space.
• Function reordering within a shared library
• Effective against return-to-libc attacks.
• By modifying the compiler and linker, relative jumps can be eliminated at compile time.
• Defer resolution of offsets until runtime dynamic linking
• This allows functions to be ordered arbitrarily, and even loaded from different, non-contiguous portions of virtual memory
• Library pages then differ between processes
• Cluster functions into page-size groups and shuffle the groups instead of individual functions.
• Code that needs to call these functions must be able to locate them efficiently
• The Global Offset Table (GOT), an array of pointers initialized by the runtime dynamic linker, needs to be fixed up.
• This is difficult with all these constraints.
• A linking architecture that supports function shuffling in shared code pages efficiently and securely is needed; further research in this area is required.
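The idea of clustering functions into page-size groups and shuffling the groups can be shown with a toy model. It assumes 4 KiB pages, hypothetical function sizes, and functions no larger than a page. It ignores the GOT and call-site fix-ups raised in the bullets. Functions keep their relative order inside a page, while the order of pages is randomized.

```python
import random

PAGE_SIZE = 4096

def cluster_into_pages(funcs: list[tuple[str, int]]) -> list[list[tuple[str, int]]]:
    """Greedily pack (name, size) pairs, in order, into page-size groups.
    Assumes every function fits within one page."""
    groups: list[list[tuple[str, int]]] = []
    used = PAGE_SIZE  # forces a new group for the first function
    for name, size in funcs:
        if size > PAGE_SIZE:
            raise ValueError(f"{name} is larger than a page")
        if used + size > PAGE_SIZE:
            groups.append([])
            used = 0
        groups[-1].append((name, size))
        used += size
    return groups

def shuffled_layout(funcs: list[tuple[str, int]], base: int,
                    rng: random.Random) -> dict[str, int]:
    """Shuffle whole page groups, then lay each group out on its own page."""
    groups = cluster_into_pages(funcs)
    rng.shuffle(groups)
    layout: dict[str, int] = {}
    for page_index, group in enumerate(groups):
        addr = base + page_index * PAGE_SIZE
        for name, size in group:
            layout[name] = addr
            addr += size
    return layout

funcs = [(f"f{i}", 700) for i in range(20)]  # hypothetical: 20 functions of 700 bytes
layout = shuffled_layout(funcs, 0x40000000, random.Random(1))
print(len(cluster_into_pages(funcs)))  # 4 page groups of 5 functions each
```

Shuffling pages rather than individual functions keeps each group page-aligned. The trade-off is less entropy than a per-function shuffle.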
Monitoring and catching errors
• A crash detection and reaction mechanism, called a watcher.
• An attacker's incorrect guesses will trigger segmentation violations.
• But the actions the watcher can take in response are limited
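The slide does not say how the watcher decides that an attack is under way. Below is one possible policy, sketched with a sliding window: flag an attack when too many segmentation faults arrive within a short period. The class name, the thresholds, and the window length are all hypothetical. The sketch only detects; what the watcher should then do is the limitation noted above.

```python
from collections import deque

class CrashWatcher:
    """Hypothetical crash-rate watcher: flags a likely brute-force attack
    when more than max_crashes segfaults occur within window_s seconds."""

    def __init__(self, max_crashes: int = 10, window_s: float = 60.0):
        self.max_crashes = max_crashes
        self.window_s = window_s
        self._times: deque = deque()

    def record_crash(self, t: float) -> bool:
        """Record a crash at time t (seconds, non-decreasing).
        Return True if the recent crash count exceeds the limit."""
        self._times.append(t)
        while self._times and t - self._times[0] > self.window_s:
            self._times.popleft()  # drop crashes outside the window
        return len(self._times) > self.max_crashes
```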