Defeating Memory Corruption Attacks via Pointer Taintedness Detection

Shuo Chen†, Jun Xu‡, Nithin Nakka†, Zbigniew Kalbarczyk† and Ravi K. Iyer†
† Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign, U.S.A.
‡ Department of Computer Science, North Carolina State University, U.S.A.

IEEE International Conference on Dependable Systems and Networks
Yokohama, Japan, June 30, 2005


Introduction

- Memory corruption attacks
  - A major threat on the Internet
  - Current dominant form: control data attack
- Our contributions
  - Non-control data attacks are realistic
  - A more general observation: pointer taintedness
  - A new architecture for detection


Outline

- Non-control Data Attacks
- The Concept of Tainted Pointers
- Processor Architecture for Pointer Taintedness Detection
- Experimental Evaluation
- Conclusion


Control Data Attack

- Control data attack
  - a.k.a. control hijacking or code-injection attack
  - Dominant form of memory corruption attacks [CERT and Microsoft Security Bulletin]
- Control data (code pointers)
  - Data used as targets of call, return and jump
  - Widely understood as security-critical data
- Many existing defenses enforce security via control data integrity


Control Data Attack – An Example

WU-FTPD format string attack

Server flow:
    FTP_service()
        Authentication; x = user ID
        seteuid(x)
        repeat
            get an FTP command
            SITE_EXEC(fmt)
                printf(fmt, …)

Attack steps:
    1. Embed malicious contents in input
    2. Overwrite a return address
    3. Execute malicious code: seteuid(0); exec("/bin/sh")


Non-Control-Data Attack: A Realistic Threat

- Non-control data: anything that is not control data (code pointers); these attacks corrupt application-specific data
  - Has not been seriously considered
- We constructed non-control-data attacks against a number of real-world applications
  - Security compromise equivalent to that of control data attacks
  - Root privilege on HTTP, SSH, Telnet and FTP servers
  - Corrupting user identity data, configuration data, user input data, and decision-making data
- Will appear in USENIX Security Symposium, Aug 2005


Non-Control Data Attack – An Example

WU-FTPD format string attack

Server flow:
    FTP_service()
        Authentication; x = user ID
        seteuid(x)
        repeat
            get an FTP command
            SITE_EXEC(fmt)
                printf(fmt, …)

    getdatasock( ... ) {
        seteuid(0);
        setsockopt( ... );
        seteuid(x);
    }

Attack steps:
    1. Embed malicious contents in input
    2. Overwrite x (saved user ID)


More Non-Control-Data Attacks

- Against NULL HTTP server
  - Corrupt the configuration string of the CGI-BIN path
  - Run /bin/sh as a CGI program
- Against SSH Communications SSH server
  - Corrupt a Boolean
  - Log in as root with an arbitrary password
- Against GazTek HTTP server
  - Corrupt user URL input
  - Run /bin/sh as a CGI program
- A new threat calling for a new defense
- How can we defeat both control-data and non-control-data attacks?


Pointer Taintedness Detection

- Tainted pointers: code or data pointers derived from malicious user input
  - Root cause of a large class of memory corruption attacks (control-data or non-control-data)
- Detection of tainted pointers
  - Defeats a large class of real-world memory attacks, e.g., stack smashing, format string, heap corruption, integer overflow


Internals of Stack Buffer Overflow Attacks

Vulnerable code:
    char buf[100];
    strcpy(buf, user_input);

Stack layout, from high to low addresses (diagram label: "Stack growth"):
    Return addr
    Frame pointer
    buf[99]
    …
    buf[1]
    buf[0]        <- buf, filled from user_input

Frame pointer or return address can be tainted.


Runtime Pointer Taintedness Detection

- A processor-architecture-level mechanism to detect pointer taintedness
- Implemented a taintedness-aware memory system
  - One-bit extension for each byte to indicate the taintedness of the byte
- Taintedness initialization
  - Tag every byte of data received from external input sources
- Taintedness tracking
  - Taintedness is propagated by ALU instructions
- Attack detection
  - An alert is raised when a tainted value is dereferenced (i.e., used as a pointer)
- Implemented on the SimpleScalar processor simulator


[Figure: processor pipeline extended for pointer taintedness detection. Recoverable labels: opcode; register file; ID/EX, EX/MEM and MEM/WB stages; ALU (32 bits); data memory with load path and store path (36 bits); each 8-bit byte carries one taintedness bit, so a 32-bit word carries 4 taintedness bits (36 bits in total); ALU taintedness tracking logic (bitwise OR, plus shift-, AND-, XOR- and compare-specific logic); jump pointer taintedness detector (jr?) and data pointer taintedness detector (load/store?), each raising an alert.]


Related Work on Taintedness

- Perl security
- Shankar and Wagner (2001)
  - Static analysis to uncover format string vulnerabilities
- Our previous work on pointer taintedness (Aug. 2004)
  - A source code analysis technique to uncover pointer taintedness vulnerabilities
  - Reasoning about taintedness at machine code level, relying on an extended memory model
- More recent work: Secure Program Execution (MIT), Minos (UC-Davis) and TaintCheck (CMU) (late 2004 and early 2005)
  - Similar memory model
  - Taintedness of control data
- Pointer taintedness vs. control-data taintedness
  - Cause vs. result of memory corruption


Evaluation

- Attack detection effectiveness
  - Synthetic vulnerable programs
  - Real-world network applications
- Evaluation of false positives
  - Real-world network applications
  - SPEC 2000 benchmarks
- Potential false negative scenarios


Attack Detection Effectiveness

- First, test on synthetic vulnerable programs
- All attacks (control/non-control data) are detected

Stack Buffer Overflow
    Vulnerable program:
        void exp1() {
            char buf[10];
            scanf("%s", buf);
        }
    Input data (network/console): aaaaaaaaaaaaaaaaaa
    Violating instruction: 400a38: JR $31
    Tainted data: $31 = 0x61616161

Heap Corruption Attack
    Vulnerable program:
        void exp2() {
            char *buf;
            buf = malloc(8);
            scanf("%s", buf);
            free(buf);
        }
    Input data (network/console): aaaaaaaaaaaaa
    Violating instruction: 401dc0: LW $3,0($3)
    Tainted data: $3 = 0x61616161

Format String Attack
    Vulnerable program:
        void exp3(int s) {
            char buf[100];
            recv(s, buf, 100, 0);
            printf(buf);
        }
    Input data (network/console): abcd%x%x%x%n
    Violating instruction: 402d60: SW $21,0($3)
    Tainted data: $3 = 0x64636261


Attack Detection Effectiveness (cont.)
- Evaluation on real-world network applications
- All attacks are detected
- No difference between control-data attacks and non-control-data attacks from the viewpoint of pointer taintedness

    Application          Attack                         Corrupted data                              Result
    WU-FTP server        Format string attack           User identity data (non-control-data)       detected
    GazTek HTTP server   Stack buffer overflow attack   User input data (non-control-data)          detected
    NULL HTTP server     Heap corruption attack         Configuration data (non-control-data)       detected
    traceroute           Double free                    Function pointer (control-data)             detected


Transparency and False Positive

- No need for re-compilation; runs existing binary executables
- Results from network applications: no false positives
- Results from SPEC benchmarks
  - 15 billion instructions without any false positive
- Conclusion: no known false positive

                                   BZIP2     GCC      GZIP     MCF      PARSER    VPR     Total
    Program size                   321KB     4184KB   485KB    304KB    595KB     697KB   6586KB
    Total number of input bytes    1048KB    77.7KB   282KB    39.2KB   743.0KB   6.4KB   2186KB
    Total number of instructions   5,951M    110M     6,926M   1,653M   389M      108M    15,139M
    Alert generated?               No        No       No       No       No        No      No


Potential False Negative Scenarios

- Incorrect array index boundary check
  - Determining the correct array size requires source code analysis, which is very hard at the binary level
- Buffer overflow within the local frame
  - If no pointer is tainted, no alert is raised
  - Unlikely to cause severe security damage because the attacker-controllable location is very limited
- Format string attack causing information leak
  - Allows inspection of some memory data words
  - Causes a security compromise if these words contain security-critical secrets, e.g., keys and passwords


Integer Overflow Induced Array Index Out of Bound

    void foo(unsigned int ui) {
        int i = ui;
        if (i >= ArraySize)
            i = ArraySize - 1;
        array[i] = 1;
    }


Buffer overflow causing critical flags to be corrupted

    void bar() {
        int auth;
        char buf[100];
        auth = do_auth();
        scanf("%s", buf);
        if (auth) grant_access();
    }


Format string attack causing information leak

    void leak() {
        int secret_key;
        char buf[12];
        recv(s, buf, 12, 0);
        printf(buf);        /* attacker input: "%x%x%x%x" */
    }


Conclusions

- Contributions:
  - Non-control-data attack is a realistic threat
  - Memory corruption attacks, including control-data attacks and non-control-data attacks, are due to pointer taintedness
  - Proposed a runtime pointer taintedness detection architecture: a substantial improvement in security coverage
- Evaluation
  - Transparent to existing applications
  - A near-zero false positive rate
- We plan to implement this approach in the hardware framework for detection and recovery


Questions?


Another Motivating Example

NULL-HTTPD heap corruption attack

Server flow (labels as they appear in the diagram):
    HTTP_service()
        repeat
            p = malloc(…)
            process HTTP header
            HTTP_POST()
                recv(p, …)
            free(p)
            *foo()

Attack steps:
    1. Corrupt heap structure
    2. Overwrite function pointer foo
    3. Execute malicious code: seteuid(0); exec("/bin/sh")


Non-Control-Data Attack against WU-FTP Server

- Overwrite an integer representing the user ID
- Obtain the root privilege of the server

    int x;

    site_exec() {
        /* a format string vulnerability */
    }

    getdatasock( ... ) {
        seteuid(0);
        setsockopt( ... );
        seteuid(x);
    }


Internals of Format String Attack

Vulnerable code:
    recv(buf);
    printf(buf);    /* should be printf("%s", buf) */

Input: \xdd \xcc \xbb \xaa %d %d %d %n

Stack layout, from high to low addresses (diagram label: "Stack growth"):
    …
    %n
    %d %d %d
    0xaabbccdd
    fmt: format string pointer
    ap: argument pointer

In vfprintf():
    if (fmt points to "%n") then **ap = (character count)

*ap is a tainted value.


Future Directions

- Combination of static code analysis and architecture support
  - To automatically derive predicates to be checked by the processor at runtime
- Reliability and security support for embedded systems
  - Migrate our current techniques to embedded systems
  - New topics: cell phone viruses, reduced power consumption, tamper-resistant hardware, crypto and authentication hardware/software


[Pie chart] Buffer Overflow 44%, Other 33%, Heap Corruption 8%, Format String 7%, Integer Overflow 6%, Globbing 2%
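

The three steps on the "Runtime Pointer Taintedness Detection" slide (initialization, tracking, detection) can be illustrated with a small software model. This is a minimal sketch, not the SimpleScalar implementation described in the talk: every type and function name here (tword, tmem, from_input, alu_add, demo_stack_smash, and so on) is hypothetical, the 64-byte memory and the frame layout are assumptions chosen for brevity, and only the default bitwise-OR propagation rule is modeled. The shift-, AND-, XOR- and compare-specific logic named in the architecture figure is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 32-bit register value with 4 taintedness bits,
 * one per byte (bit i covers byte i, byte 0 being least significant). */
typedef struct {
    uint32_t value;
    uint8_t  taint;
} tword;

#define MEM_SIZE 64

/* Hypothetical byte-addressed memory with a one-bit tag per byte. */
typedef struct {
    uint8_t data[MEM_SIZE];
    uint8_t taint[MEM_SIZE];   /* 1 = byte is tainted */
} tmem;

/* Values produced by the program itself start untainted. */
tword from_program(uint32_t v) { tword w = { v, 0x0 }; return w; }

/* Taintedness initialization: values read from external input are tainted. */
tword from_input(uint32_t v) { tword w = { v, 0xF }; return w; }

/* Taintedness tracking: default ALU rule, bitwise OR of the operand tags. */
tword alu_add(tword a, tword b) {
    tword r = { a.value + b.value, (uint8_t)((a.taint | b.taint) & 0xF) };
    return r;
}

/* Attack detection: using a word with any tainted byte as an address
 * (load/store base or jump target) raises an alert. */
bool deref_raises_alert(tword addr) { return addr.taint != 0; }

/* Models an unchecked input copy (scanf/recv/strcpy of user data):
 * every byte written is tagged as tainted. */
void mem_write_input(tmem *m, size_t addr, const uint8_t *src, size_t n) {
    assert(addr + n <= MEM_SIZE);
    for (size_t i = 0; i < n; i++) {
        m->data[addr + i]  = src[i];
        m->taint[addr + i] = 1;
    }
}

/* Store path: the value and its per-byte tags travel together. */
void mem_store_word(tmem *m, size_t addr, tword w) {
    assert(addr + 4 <= MEM_SIZE);
    for (size_t i = 0; i < 4; i++) {
        m->data[addr + i]  = (uint8_t)(w.value >> (8 * i));
        m->taint[addr + i] = (uint8_t)((w.taint >> i) & 1);
    }
}

/* Load path: reassemble the little-endian word and its 4 tag bits. */
tword mem_load_word(const tmem *m, size_t addr) {
    assert(addr + 4 <= MEM_SIZE);
    tword w = { 0, 0 };
    for (size_t i = 0; i < 4; i++) {
        w.value |= (uint32_t)m->data[addr + i] << (8 * i);
        w.taint |= (uint8_t)(m->taint[addr + i] << i);
    }
    return w;
}

/* Assumed frame layout: buffer at offset 0, return address at offset 12.
 * Returns true if the simulated "jr" on the return address raises an alert. */
bool demo_stack_smash(size_t input_len) {
    enum { BUF_ADDR = 0, RET_ADDR = 12 };
    uint8_t input[MEM_SIZE];
    tmem m;

    memset(&m, 0, sizeof m);
    memset(input, 'a', sizeof input);
    mem_store_word(&m, RET_ADDR, from_program(0x00400a38u));
    mem_write_input(&m, BUF_ADDR, input, input_len);
    return deref_raises_alert(mem_load_word(&m, RET_ADDR));
}
```

In this model, demo_stack_smash(18) reports an alert because the 18 input bytes run past the buffer and taint the saved return address, while demo_stack_smash(8) does not, since the input stays inside the buffer. The second case mirrors the "buffer overflow within the local frame" false-negative scenario: when no pointer becomes tainted, no alert is raised.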