Format Strings 101
Nathan Beard
2/21/2011
REMINDER:
Disable ASLR on your Linux machine before continuing:
echo 0 > /proc/sys/kernel/randomize_va_space
Above is a program I wrote that echoes a string back to the user and prints an error message if the number
variable is not equal to 1000. At first glance it may seem impossible to change this, because we never
interact with that variable directly, but the magic of format string exploits will come to the rescue!
Compile the program with gcc and ignore the warning messages.
What is a format string?
The second line in the image above shows a format string. A function like printf that uses a format string
evaluates the text and performs a special action any time it encounters a percent sign followed by its
corresponding identifier. After the format string itself come the variables whose values fill in the format
parameters. It is very important that the number of variables listed on the right corresponds exactly to the
number of format parameters. While the second line looks perfectly acceptable, the top line presents a
potential vulnerability if we are able to control the contents of buffer.
Step 1 – Source Auditing
The first thing to do when trying to exploit a program similar to the one used in this exercise is to read the
source code, if it is available. Although format string bugs are becoming rare, they are relatively easy to
spot because the vulnerable call passes user data where an explicit format string belongs.
The above illustrates a classic example of a clear format string vulnerability. Since we are passing in the
string pointed to by argv[1], which is the first command line argument to the program, we are able to
completely control how the buffer is formed. The biggest problem, however, comes from the fact that the
printf function is used in an improper manner. Since we have not specified an exact format string, printf
treats the contents of buffer as the format string and acts on any format parameters it finds there.
The above image displays a much safer way to display input. The %s format parameter indicates that printf
is to display a string, which is a pointer to a character in the C language. By always supplying our own
format string, we make sure that user input is treated purely as data and is never interpreted as format
parameters.
Step 2 – Verify Vulnerability
The above shows the program running normally. A user can input a string as the first argument, and the
program will echo it back as the first line. The second line prints the integer held by the number variable,
which is set to 16 by default. This is followed by the address of the number variable (a pointer to the
variable), and finally a string that reads “nope...”, indicating that we have not set our integer equal to 1000.
In order to test for simple format string vulnerabilities, you can simply pass in a string to the program that
contains format parameters like %x. Here are a few of the more common parameters:
1. %s → string
2. %x → Hexadecimal number
3. %p → Pointer
4. %d → Integer
Let's attempt to pass in a “%x”:
This time, our program printed a hexadecimal number like we asked it to, which confirms the vulnerability.
printf has no way of knowing how many arguments it was actually given, so when a format parameter has no
matching variable it simply reads whatever sits where the next argument would be, which here is the stack
itself.
Step 3 – Map the Stack
Since we now know that we are able to construct a format string that will pull arbitrary values from the
stack, we can start to do some investigation. It's easy to enumerate a stack by simply stringing together
%08x statements. It also helps to insert another character like a period to help view the results more clearly.
Notice that the 7th element we pulled from the stack is the value of our number variable (16 is 0x10 in hex).
This one is easy to recognize because we know the number exists. It is just as easy to view the contents of a
memory address found on the stack by using the “%s” parameter, which treats the stack value as a pointer and
prints the string stored at that address, so it is important to understand how it is used. Let's get the value
of what is stored at the 1st address pulled:
This works because the value being dereferenced was a valid memory address. Let's see what happens if you
attempt to get the value of an invalid memory address:
The segmentation fault indicates that the program attempted to access a memory location that it did not have
access to. This makes it very easy to crash programs that have format string vulnerabilities, because you
simply need to pass in a long string of “%s” format parameters.
Change Stack Values
We can clearly see that we are pulling information from higher up in the stack, but how can we change those
values? It turns out that the printf family of functions has a special %n format parameter that stores the
number of bytes written so far in a variable through the use of a pointer.
The screenshot below illustrates this new format parameter in action:
You can see that we are able to change the value of number from 0 to 10 in the previous example. Notice
that %n stores the number of bytes written so far at the address of the number variable. This is important to
note, because it means that we are able to write to arbitrary memory addresses by using only printf!
Step 4 – Pass in Vulnerable Address
After running the formatstr program with “test” as the first argument, the number variable is reported to be
at address 0xbffff69c. Since this address can be written to using the %n format parameter, it needs to be put
on the stack so it can be accessed. In the second example, I have placed this address at the front of the
format string, and followed it with a simple stack enumeration string. Notice, however, that the address of
the number variable changes during the second execution. The reason lies in how the stack is laid out and in
the length of the format string passed to the program: a longer argument takes up more space on the stack, so
number ends up at a lower address in memory.
I also need to mention that I had to pass in the address of number backwards. This is due to the “little
endian” nature of x86 Intel processors, which store each word of data (4 bytes) with its least significant
byte first. In any case, the address we pass in must match the one the program reports for the number variable
if we wish to change it successfully; the corrected run does exactly that.
Step 5 – Modify Stack Values
In order to write to a memory address that is present on the stack, we will need to make sure that the %n
format parameter aligns with the vulnerable address. To do this, we adjust the number of “%x” format
parameters so that they consume every stack value up to, but not including, the desired address, which leaves
%n to line up with the address itself (below):
An easier way of modifying values is to use something called Direct Parameter Access. This syntax allows
you to reference certain format string parameters without having to pad the initial ones with “%x”. An
example is given below:
In “%7\$d”, the 7 selects the parameter to access and the d prints the integer value of that specific
parameter; the backslash only keeps the shell from expanding $d as a variable. You can use any format
identifier here (it does not have to be d), so it is indeed very useful.
Now let's try to write a new value to the number variable by storing the number of bytes written at the
address on the stack. Since we will be using direct parameter access, and only have the vulnerable address in
the format string, number should end up equal to 4, the count of bytes in the address.
Success! We have changed the value, but how are we going to get a number as high as 1000? It turns out
that we can put a decimal field width between the percent sign and the identifier (for example “%100x”) to
pad the output of the format string. This allows us to greatly increase the amount we can write to number
(below):
The “%100x” parameter pads the printed value with spaces to a total width of 100 characters, while “%.100x”
pads it with 0's to 100 digits. To write the correct number, however, we need 996 bytes of padding on top of
the 4 address bytes already printed:
SUCCESS! The number is now 1000, and we have bypassed the program's protection mechanism!
Step 6 – Overwrite GOT Values
Although we were able to bypass the protection mechanism in place for this simple program, this doesn't get
us much of anything besides a congratulatory message. In order to get something more useful like a root
shell, we will need to control the program's execution to a greater degree. Since a program may need to use a
function from a shared library many times throughout the course of execution, it is helpful to have a table
that maps these functions to their addresses. This section is called the Procedure Linkage Table (PLT), and
any time your program calls a function from a shared library, the call goes through this location. Since this
section is read-only, however, we will need to look for another location to attack. This is where the Global
Offset Table (GOT) comes into play. This table stores the actual addresses where the shared library functions
are located, while each PLT entry only jumps through the matching pointer in the GOT. Along with this, the
GOT is writable in a binary like this one, which makes our job as attackers much easier. In order to read
these different sections of our binary, we will use objdump.
The above shows some entries from the PLT of the vulnerable program. The following switches are used:
1. -d → Disassemble and get assembly code
2. -j → Restrict the output to the named section.
You can see that the strncpy function is present, and that its entry jumps through the pointer stored at the
0x804a004 address. Now let's find how this is represented in the GOT by using the -R switch, which lists the
dynamic relocation entries:
The above output shows that the offset listed for the strncpy function is equal to the address that was
present in the PLT. Now that we know a location where function addresses can be overwritten, we will need to
build our shellcode.
Step 7 – Create and Store Shellcode
The above screenshot illustrates how we can store created shellcode in an environment variable. The
msfpayload command is great for generating all types of shellcode to use with different exploits, but I have
written my own and it is smaller than the default cmd payload. We are storing the shellcode in an
environment variable so that we can pinpoint its exact location. The following program and demonstration
show how to accomplish this:
Now that we know the exact location of our shellcode, we can attempt to place this address in the GOT in
place of the address of a function that the program calls. Looking back at our program, we can see that the
exit function is called even on failure. This makes it a great target for an overwrite.
Step 8 – Performing the Exploit
We can recall from earlier that 0x0804a018 is the GOT address our exit function points to, so we will need to
place this address on the stack in order to overwrite what it holds. Once we get it on the stack, we'll also
need to locate the exact parameter positions that we will need to access. We can do this by dumping a few
stack values and looking for the addresses (below):
We can see that we are pushing two addresses, two bytes apart, that together cover the location our exit
function points to. The two address values are located at the 8th and 9th parameter locations. We can use this
knowledge for the next example. Let's try to write some values to the GOT:
It appears that we were successful in writing a new address because we got a Segmentation Fault error. This
means that although we wrote a value, it was not a valid memory address and the program crashed. In order
to find the correct values to write, we can use gdb to print the decimal representation of hex values.
We need to subtract 8 from the first value because we are passing 8 bytes of addresses on the stack before
performing our write operations. We also need to subtract the first write value from the next write in order
to compensate for the number of bytes already written. Since we have the correct values to pad with, we can
now construct our exploit:
It appears that our exploit was successful since a shell was returned. We can now completely control
execution and jump out of our process!