Format Strings 101
Nathan Beard
2/21/2011

REMINDER: Disable ASLR on your Linux machine before continuing (this requires root):

    echo 0 > /proc/sys/kernel/randomize_va_space

Above is a program I wrote that echoes a string back to the user and prints an error message if the number variable is not equal to 1000. At first glance it may seem impossible to change this variable, because we never interact with it directly, but the magic of format string exploits will come to the rescue! Compile the program with gcc and ignore the warning messages.

What is a format string?

The second line in the image above shows a format string. A function like printf that takes a format string copies the text to the output and performs a special action whenever it encounters a percent sign followed by its corresponding identifier. After the format string itself come the arguments that supply the values for the format parameters. It is very important that the number of arguments listed on the right corresponds exactly to the number of format parameters. While the second line looks perfectly acceptable, the top line presents a potential vulnerability if we are able to control the contents of buffer.

Step 1 – Source Auditing

The first thing to do when attempting to exploit a program like the one in this exercise is to read the source code, if it is available. Although format string bugs are becoming rare, they are relatively easy to spot because the offending call has no explicit format string. The above illustrates a classic example of a clear format string vulnerability. Since the buffer is filled from the string pointed to by argv[1], which is the first command-line argument to the program, we completely control its contents. The biggest problem, however, is that the printf function is used improperly. Since no format string is specified, printf treats the contents of buffer as its format string.
The above image displays a much safer way to display input. The %s format parameter tells printf to display a string, which in C is a pointer to a character. Because the program supplies its own format string, malicious users cannot tamper with how the input is interpreted.

Step 2 – Verify Vulnerability

The above shows the program running normally. The user passes a string as the first argument, and the program echoes it back on the first line. The second line prints the integer held by the number variable, which is set to 16 by default. This is followed by the address of the number variable (a pointer to the variable), and finally a string that reads “nope...” indicating that number is not equal to 1000.

To test for simple format string vulnerabilities, pass the program a string that contains format parameters such as %x. Here are a few of the more common parameters:

1. %s → String
2. %x → Hexadecimal number
3. %p → Pointer
4. %d → Integer

Let's attempt to pass in a “%x”:

This time, our program returned a hexadecimal number, just as we asked it to, which indicates a vulnerability. It turns out that if printf cannot find an argument associated with a format parameter, it simply starts pulling values from the stack itself.

Step 3 – Map the Stack

Since we now know that we can construct a format string that pulls values from the stack, we can start investigating. It is easy to enumerate the stack by stringing together %08x parameters. It also helps to insert a separator character such as a period to make the results easier to read. Notice that the 7th element we pulled from the stack is the value of our number variable (16, which is 10 in hexadecimal). This one is easy to recognize because we already know the value of number, but we can also view the contents of memory addresses found on the stack by using the “%s” parameter.
The %s parameter fetches the data stored at the memory address it is given, so it is important to understand how it is used. Let's get the value stored at the 1st address pulled:

This works because the value being dereferenced was a valid memory address. Let's see what happens if we attempt to read from an invalid memory address:

The segmentation fault indicates that the program attempted to access a memory location that it did not have access to. This makes it very easy to crash programs that have format string vulnerabilities: you simply need to pass in a long string of “%s” format parameters.

Change Stack Values

We can clearly see that we are pulling information from higher up in the stack, but how can we change those values? It turns out that the printf family of functions has a special %n format parameter that stores the number of bytes written so far in a variable through the use of a pointer. The screenshot below illustrates this format parameter in action:

You can see that we were able to change the value of number from 0 to 10 in the previous example. Notice that %n writes the number of bytes printed so far to the address of the number variable. This is important to note, because it means that we can write to arbitrary memory addresses using only printf!

Step 4 – Pass in Vulnerable Address

After running the formatstr program with “test” as the first argument, the number variable is reported to be at address 0xbffff69c. Since this address can be written to using the %n format parameter, it needs to be placed on the stack where printf can reach it. In the second example, I have placed this address at the front of the format string and followed it with a simple stack enumeration string. Notice, however, that the address of the number variable changes during the second execution. The reason lies in how the stack is laid out and in the length of the format string passed to the program.
A longer format string needs more space on the stack, so the address of number ends up lower in memory. I also need to mention that I had to pass in the address of number backwards. This is due to the “little endian” nature of Intel x86 processors, which store the bytes of a word (4 bytes) least significant byte first. In any case, we need to change the address we pass in to match the one the program displays for the number variable if we wish to change it successfully. We will do this below:

Step 5 – Modify Stack Values

In order to write to a memory address that is present on the stack, we need to make sure that the %n format parameter lines up with the vulnerable address. To do this, we can adjust the number of hexadecimal format parameters so that they stop one short of the desired address, leaving it for %n (below):

An easier way of modifying values is to use something called Direct Parameter Access. This syntax lets you reference a specific format string parameter without having to pad the preceding ones with “%x”. An example is given below:

The 7 in %7\$d selects the parameter to access, and the d tells printf to print that parameter as an integer. The backslash is there only to keep the shell from expanding $d as a variable. You can use any format identifier here (it does not have to be d), so this syntax is very useful. Now let's try to write a new value to the number variable by storing the number of bytes printed so far at the address on the stack. Since we are using direct parameter access and the format string contains only the 4-byte vulnerable address before %n, number should end up being 4.

Success! We have changed the value, but how are we going to get a number as high as 1000? It turns out that we can put a field width in front of the x, as in “%100x”, to pad the output with blank spaces. This lets us greatly increase the value we can write to number (below):

The “%100x” parameter pads its output to 100 characters with spaces, while “%.100x” pads it to 100 digits with 0's.
In order to use this to write the correct number, however, we will need to pad with 996 bytes, because the 4 address bytes already count toward the total:

SUCCESS! The number is now 1000, and we have bypassed the program's protection mechanism!

Step 6 – Overwrite GOT Values

Although we were able to bypass the protection mechanism in this simple program, this doesn't get us much besides a congratulatory message. To get something more useful, like a root shell, we need to control the program's execution to a greater degree. Since a program may call a function from a shared library many times during execution, it is helpful to have a table that maps these addresses. This section is called the Procedure Linkage Table (PLT), and any time your program calls a function from a shared library, the address is looked up through this table. Since this section is read-only, however, we need to look for another location to attack. This is where the Global Offset Table (GOT) comes into play. This table stores the actual addresses of the shared library functions, while the PLT only refers to those entries through pointers. In addition, the GOT is writable, which makes our job as attackers much easier. To read these sections of our binary, we will use objdump.

The above shows some entries from the PLT of the vulnerable program. The following switches are used:

1. -d → Disassemble and show the assembly code
2. -j → Specify the section to disassemble

You can see that the strncpy function is present and is mapped to a pointer at address 0x804a004. Now let's find how this is represented in the GOT by using the -R switch, which lists the dynamic relocation entries:

The above output shows that the offset of the strncpy entry is equal to the address we saw in the PLT. Now that we know a location where we can overwrite function addresses, we need to build our shellcode.

Step 7 – Create and Store Shellcode

The above screenshot illustrates how we can store shellcode in an environment variable.
The msfpayload command is great for generating all types of shellcode to use with different exploits, but I have written my own, which is smaller than the default cmd payload. We store the shellcode in an environment variable so that we can pinpoint its exact location. The following program and demonstration show how to accomplish this:

Now that we know the exact location of our shellcode, we can attempt to place its address in the GOT in place of the address of a function that the program calls. Looking back at our program, we can see that the exit function is called even on failure. This makes it a great target for an overwrite.

Step 8 – Performing the Exploit

We can recall from earlier that 0x0804a018 is the address our exit function points to, so we need to get this address onto the stack in order to overwrite what it holds. Once it is on the stack, we also need to find the exact parameter positions to access. We can do this by dumping a few stack values and looking for the addresses (below):

We can see that we are pushing two addresses, two bytes apart, that cover the location our exit function points to. The two addresses sit at the 8th and 9th parameter positions. We can use this knowledge for the next example. Let's try to write some values to the GOT:

It appears that we were successful in writing a new address, because we got a Segmentation Fault. This means that although we wrote a value, it was not a valid memory address, and the program crashed. To find the correct values to write, we can use gdb to print the decimal representation of hex values. We need to subtract 8 from the first hex value because the two addresses at the front of the format string account for 8 bytes of output before the first write. We also need to subtract the first write value from the second one to compensate for the number of bytes already written.
Since we have the correct values to pad with, we can now construct our exploit:

It appears that our exploit was successful, since a shell was returned. We can now completely control execution and jump out of our process!