Enhancing Security of Real-World Systems with a Better Understanding of Threats

Shuo Chen
Ph.D. Candidate in Computer Science
Center for Reliable and High Performance Computing
Coordinated Science Laboratories
University of Illinois at Urbana-Champaign

My Dissertation

Security Threat Analysis and Mitigations in Real-World Systems
  – How do errors in hardware and software impose security threats on real-world systems? (Any common characteristics?)
  – How effective are current defense techniques? (Any substantial deficiencies?)
  – How to build better defenses?
Analysis-centric research approach
  – Impact of emulated hardware memory errors on system security (DSN'01, DSN'02)
  – Software vulnerabilities reported in Bugtraq and CERT databases (DSN'03)
  – Current attack methods and defense techniques (in submission)
  – Analysis results motivate the development of new defense techniques (IFIP SEC'04, DSN'05)
Many related areas

Analyzing and Identifying Security Threats on Real-World Systems

Threat of Hardware Memory Errors

[Figure: Attacker -> Network server (FTP and SSH). Due to hardware memory errors, users can log in with arbitrary passwords.]
[Figure: Attacker -> Firewall (IPChains and Netfilter) -> Target host. Due to hardware memory errors, packets can penetrate firewalls.]
Emulate random hardware memory errors
Use a stochastic model to estimate the threats in real environments
Motivate other researchers to conduct physical fault injection experiments
  – Java type system subverted due to memory errors.

Threat of Software Vulnerabilities

[Pie chart: Buffer Overflow 44%, Heap Corruption 8%, Format String 7%, Integer Overflow 6%, Globbing 2%, Other 33%]
CERT Advisories: 66% of vulnerabilities are low-level memory errors in software.
Widely exploited by attackers, worms and viruses.

State Machine Model: WU-FTP Server Attack

[Figure: state machine diagram; labels recovered from the slide]
  Server states: FTP_service(); Authentication; x = user ID; seteuid(x); get an FTP command (repeat); SITE_EXEC(fn); printf(fn,…)
  Attack steps: Embed malicious contents in input; Overwrite a return address; Execute malicious code: seteuid(0), exec("/bin/sh")

State Machine Model: NULL-HTTP Server Attack

[Figure: state machine diagram; labels recovered from the slide]
  Server states: HTTP_service(); p=malloc(…); process HTTP header; HTTP_POST(); recv(p,…); free(p); *foo() (repeat)
  Attack steps: Corrupt heap structure; Overwrite function pointer foo; Execute malicious code: seteuid(0), exec("/bin/sh")

Control Data Attack: Well-Known, Dominant

Control data:
  – data used as targets of call, return and jump.
  – widely understood as security-critical elements.
Control data attack: the most dominant form of memory corruption attack [CERT and Microsoft Security Bulletin]
Many current defense techniques enforce program control flow integrity to provide security.
Non-control-data attacks
  – Currently very rare in reality
  – One instance suggested by Young and McHugh in 1987.
  – How applicable are such attacks against real-world software?

An Important Question

Are attackers in general incapable of mounting non-control-data attacks against many real systems?
  – Probably not.
  – Random hardware memory errors can subvert the security of real-world systems with a non-negligible probability.
  – Software vulnerabilities are more deterministic and more amenable to attacks.
  – Each attack exploiting software vulnerabilities is composed of multiple primitive components, which allows potentially polymorphic attacks. Dangerous.

Our Claim: General Applicability of Non-control-data Attacks

We claim:
  – Many real-world software applications are susceptible to non-control-data attacks.
  – The severity of the attack consequences is equivalent to that of control data attacks.
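To make the distinction between control data and non-control data concrete, here is a minimal C sketch. The Session type, its field names and all three helper functions are hypothetical and are not taken from any of the servers discussed in these slides. The unchecked copy is clamped to the size of the whole record only so that the demonstration itself stays within one object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical session record (an assumption for illustration only). */
typedef struct {
    char name[8];        /* input buffer */
    int  authenticated;  /* non-control data: a decision flag, never a jump target */
} Session;

/* Buggy copy: forgets to bound len by sizeof name. It is clamped to the
 * size of the whole record only so that this demo never writes outside
 * the Session object. */
static void set_name_unchecked(Session *s, const unsigned char *input, size_t len)
{
    if (len > sizeof *s)
        len = sizeof *s;
    memcpy(s, input, len);   /* name is the first member, so the copy starts at name */
}

/* Fixed copy: never writes past the name buffer and always terminates it. */
static void set_name_checked(Session *s, const unsigned char *input, size_t len)
{
    if (len > sizeof s->name - 1)
        len = sizeof s->name - 1;
    memcpy(s->name, input, len);
    s->name[len] = '\0';
}

/* Build an over-long input whose bytes at the flag's offset encode the int 1. */
static void build_overlong_input(unsigned char out[sizeof(Session)])
{
    int one = 1;
    memset(out, 'A', sizeof(Session));
    memcpy(out + offsetof(Session, authenticated), &one, sizeof one);
}
```

With the unchecked copy, the flag ends up set although no return address or function pointer was touched; with the checked copy, the same input leaves the flag unchanged.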
Validate the claim by constructing non-control-data attacks that obtain the root privilege on major network servers: FTP, HTTP, SSH and Telnet servers
  – Over 1/3 of vulnerabilities in CERT advisories
Non-control-data attacks are realistic threats.

Non-control-data attack on WU-FTP Server (via a format string bug)

    int x;
    FTP_service(...) {
        authenticate();                  /* x uninitialized, run as EUID 0 */
        x = user ID of the authenticated user;   /* x=109, run as EUID 0 */
        seteuid(x);                      /* x=109, run as EUID 109. Lose the root privilege! */
        while (1) {
            get_FTP_command(...);        /* vulnerable */
            if (a data command?)
                getdatasock(...);
        }
    }
    getdatasock( ... ) {
        seteuid(0);                      /* x=0, run as EUID 0 */
        setsockopt( ... );
        seteuid(x);                      /* x=0, run as EUID 0 */
    }

Attack steps:
  – Get a special SITE EXEC command. Exploit a format string vulnerability. x=0, still run as EUID 109.
  – Get a data command (e.g., PUT).
  – When returning to the service loop, the server still runs as EUID 0 (root).
  – Allow me to upload /etc/passwd. I can grant myself the root privilege!
Only corrupt an integer, not a control data attack.

Non-control-hijacking attack on NULL-HTTP Server (via a heap overflow bug)

Attack the configuration string of CGI-BIN path.
Mechanism of CGI
  – Suppose server name = www.foo.com, CGI-BIN = /usr/local/httpd/exe
  – Requested URL = http://www.foo.com/cgi-bin/bar
  – The server executes /usr/local/httpd/exe/bar
Our attack
  – Exploit the vulnerability to overwrite CGI-BIN to /bin
  – Request URL http://www.foo.com/cgi-bin/sh
  – The server executes /bin/sh
The server gives me a root shell!
Only overwrite four characters in the CGI-BIN string. Not a control data attack.

Non-control-hijacking attack on SSH Communications SSH Server (via an integer overflow bug)

    void do_authentication(char *user, ...) {
        int auth = 0;                         /* auth = 0 */
        ...
        while (!auth) {                       /* auth = 0 */
            /* Get a packet from the client */
            type = packet_read();
            switch (type) {                   /* auth = 1 */
                ...
                case SSH_CMSG_AUTH_PASSWORD:
                    if (auth_password(user, password))   /* Password incorrect, but auth = 1 */
                        auth = 1;
                case ...
            }
            if (auth) break;                  /* auth = 1 */
        }
        /* Perform session preparation. */
        do_authenticated(...);                /* Logged in without correct password */
    }

More non-control-hijacking attacks

Against NetKit Telnet server (default Telnet server of Redhat Linux)
  – Exploit a heap overflow bug
  – Overwrite two strings:
        /bin/login -h foo.com -p   (normal scenario)
        /bin/sh -h -p -p           (attack scenario)
  – The server runs /bin/sh when it tries to authenticate the user.
Against GazTek HTTP server
  – Exploit a stack buffer overflow bug
  – Send a legitimate URL http://www.foo.com/cgi-bin/bar
  – The server checks that "/.." is not embedded in the URL
  – Exploit the bug to change the URL to http://www.foo.com/cgi-bin/../../../../bin/sh
  – The server executes /bin/sh

Implications of Non-Control-Data Attacks

Control flow integrity is not a sufficiently accurate approximation of software security.
Many types of non-control data are critical to security.
Once attackers have the incentive, they are likely to succeed in non-control-data attacks.

Re-Examining Current Defense Techniques

Many of them are based on control flow integrity
  – Monitor system call sequence
  – Protect control data
  – Non-executable stack and heap
Pointer encryption
Address space randomization
StackGuard, Libsafe and FormatGuard
Building a generic and secure defense technique: still an open problem.

Pointer Taintedness Detection: Towards a Better Security Protection for Real-World Systems

Pointer Taintedness

Pointer taintedness: a pointer value, including a return address, is derived from user input.
Most memory corruption attacks are due to pointer taintedness.
Pointer taintedness: a unifying perspective for reasoning about security vulnerabilities.

Most Memory Corruption Attacks are Due to Pointer Taintedness

Format string attack
  – Taint an argument pointer of functions such as printf, fprintf, sprintf and syslog.
Stack buffer overflow (stack smashing)
  – Taint a frame pointer or a return address.
Heap corruption
  – Taint the free-chunk doubly-linked list maintaining the heap structure.
Globbing attack
  – User input resides in a location that is used as a pointer by the parent function of glob().

Internals of Stack Buffer Overflow Attacks

Vulnerable code:
    char buf[100];
    strcpy(buf,user_input);

[Figure: stack layout with the stack-growth direction marked. From high to low addresses: Return addr, Frame pointer, buf[99], …, buf[1], buf[0]. user_input is copied into buf.]
Frame pointer or return address can be tainted.

Internals of Format String Attacks

Vulnerable code:
    recv(buf);
    printf(buf);    /* should be printf("%s",buf) */

[Figure: stack layout with the stack-growth direction marked, from high to low addresses. buf holds the bytes \xdd \xcc \xbb \xaa (the value 0xaabbccdd) followed by %d %d %d %n. fmt: format string pointer. ap: argument pointer.]
In vfprintf(), if (fmt points to "%n") then **ap = (character count).
*ap is a tainted value.

Internals of Heap Corruption Attacks

Vulnerable code:
    buf = malloc(1000);
    recv(sock,buf,1024);
    free(buf);

[Figure: heap layout: Free chunk A, Allocated buffer buf (receives the user input), Free chunk B (fd=A, bk=C), Free chunk C.]
In free():
    B->fd->bk=B->bk;
    B->bk->fd=B->fd;
When B->fd and B->bk are tainted, the effect of free() is to write a user-specified value to a user-specified address.

Building Defense Techniques based on Pointer Taintedness

Static code analysis: analyze the source code to extract the conditions under which the possibility of pointer taintedness exists.
  – To uncover potential vulnerabilities
Runtime detection: monitor at runtime whether a tainted value is dereferenced as a pointer.
  – To defeat memory corruption attacks

Static Analysis about Pointer Taintedness: To Extract Security Specifications of Library Functions
IFIP International Information Security Conference 2004

Library function specifications are crucial to secure programming

Library function specifications are specified empirically
  – printf(fmt,…): fmt cannot be a user-specified string
  – strcpy(d,s): the length of string s should not exceed the size of buffer d, and d and s cannot overlap.
  – free(p): p must be a pointer obtained from a previous malloc; p cannot have been freed before.
  – glob(), strtok(), savestr(), ….
A unified reason why these specifications are required
  – They are required to eliminate the possibility of pointer taintedness.
Extraction of security specifications of a function is reduced to a theorem proving problem:
  – Under which conditions can a function eliminate the possibility of pointer taintedness?

Semantics of Pointer Taintedness

Formal definition of program semantics is required for theorem proving.
Taintedness-aware memory model
  – The logic framework defines operations to fetch the content and test the taintedness (true/false) of each memory location.
Incorporate pointer taintedness into program semantics
  – Define program semantics at the assembly level to reason about memory layout.
  – Load/Store/ALU instructions: propagate taintedness from source data to destination data.
  – Input functions (scanf, recv and recvfrom)
    Axiom: The memory locations in the receiving buffer are tainted immediately after these function calls.

Extracting Function Specifications by Theorem Prover

C source code of a library function
  -> Automatically translated to formal semantic representation
Formal semantic representation
  -> Theorem generation: for each pointer dereference in an assignment, generate a theorem stating that the pointer is not tainted
  -> Theorem proving
A set of sufficient conditions that imply the validity of the theorems. They are the security specifications of the analyzed function.
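The taintedness-aware memory model and the propagation rules above can be illustrated with a small executable C sketch. This is only an emulation under stated assumptions, not the equational-logic formalization used for theorem proving: the Store and Word types, the 64-word store size and every function name are hypothetical, memory is word-addressed, and only an add operation stands in for the ALU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 64u   /* hypothetical store size */

/* A value together with its taintedness bit. */
typedef struct { uint32_t value; bool tainted; } Word;

/* Hypothetical store: every location has a content and a taintedness bit. */
typedef struct {
    uint32_t content[MEM_WORDS];
    bool     tainted[MEM_WORDS];
} Store;

typedef enum { DEREF_OK, DEREF_TAINTED, DEREF_OUT_OF_RANGE } DerefStatus;

static void store_init(Store *s)
{
    for (size_t i = 0; i < MEM_WORDS; i++) {
        s->content[i] = 0;
        s->tainted[i] = false;
    }
}

/* Load: content and taintedness of a location travel together. */
static Word load_word(const Store *s, uint32_t addr)
{
    Word w = { 0, false };
    if (addr < MEM_WORDS) {
        w.value = s->content[addr];
        w.tainted = s->tainted[addr];
    }
    return w;
}

/* Store: the destination inherits the taintedness of the source. */
static void store_word(Store *s, uint32_t addr, Word w)
{
    if (addr < MEM_WORDS) {
        s->content[addr] = w.value;
        s->tainted[addr] = w.tainted;
    }
}

/* ALU: the result is tainted if either operand is tainted. */
static Word alu_add(Word a, Word b)
{
    Word r = { a.value + b.value, a.tainted || b.tainted };
    return r;
}

/* Input function: the receiving buffer is tainted right after the call. */
static void recv_input(Store *s, uint32_t addr, const uint32_t *data, size_t n)
{
    for (size_t i = 0; i < n && addr + i < MEM_WORDS; i++) {
        s->content[addr + i] = data[i];
        s->tainted[addr + i] = true;
    }
}

/* Dereference: refuse to use a tainted value as a pointer. */
static DerefStatus deref(const Store *s, Word ptr, Word *out)
{
    if (ptr.tainted)
        return DEREF_TAINTED;
    if (ptr.value >= MEM_WORDS)
        return DEREF_OUT_OF_RANGE;
    *out = load_word(s, ptr.value);
    return DEREF_OK;
}
```

In this sketch, a value computed from received input stays tainted through loads, stores and additions, and deref reports DEREF_TAINTED when such a value is used as an address. An untainted pointer may still legitimately read tainted data.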

Example: vfprintf()

    int vfprintf (FILE *s, const char *format, va_list ap) {
        const char *p; char *q;
        int done,data,n,state;
        char buf[10];
        p=format; done=0;
        if (p==NULL) return 0;
        state=NO_PENDING;
        while (*p != 0) {
            if (state==NO_PENDING) {
                if (*p=='%') state=PENDING;
                else outchar(s,*p);
            } else {
                switch (*p) {
                case '%':
                    outchar(s,'%');
                    break;
                case 'd':
                    data=va_arg (ap, int);
                    if (data<0) { outchar(s,'-'); data=-data; }
                    n=0;
                    while (data>0 && n<10) {
                        buf[n]=data%10+'0';   /* Theorem1: buf+n should not be a tainted value */
                        data/=10; n++;
                    }
                    while (n>0) { n--; outchar(s,buf[n]); }
                    break;
                case 's':
                    q=va_arg (ap, char *);
                    if (q==NULL) break;
                    while (*q!=0) { outchar(s,*q); q++; }
                    break;
                case 'n':
                    q=va_arg (ap, void *);
                    *(int*) q = done;         /* Theorem2: q should not be a tainted value */
                    break;
                default:
                    outchar(s,*p);
                }
                state=NO_PENDING;
            }
            p++;
        }
        return done;
    }

Extracting the Specifications of vfprintf()

Iterate:
  – Try to prove the two theorems.
  – The theorem prover cannot complete the proof initially: the theorems are only valid under certain preconditions.
  – Add these preconditions as axioms to the theorem prover.
  – Repeat until all theorems are proved.
Finally, the following four preconditions are added, which are the specifications of vfprintf (FILE *s, const char *format, va_list ap)
  – ap never points to any location within the current function frame.
  – *ap never points to the location of variable ap, i.e., *ap ≠ &ap
  – Suppose the memory segment that ap sweeps over is called ap_activity_range, then *ap never points to any location within ap_activity_range.
  – No locations within ap_activity_range are tainted before vfprintf() is called.
These preconditions suggest the scenario of a format string vulnerability.

Other Studied Examples

Function strcpy()
  – Four security specifications indicating buffer overflow, buffer overlapping and buffer underflow scenarios causing pointer taintedness.
Function free() of a heap management system
  – Seven security specifications are extracted, including several specifications indicating heap corruption vulnerabilities.
Socket read functions of Apache HTTP Server and NULL HTTP Server
  – The Apache function is proven to be free of pointer taintedness.
  – Two (known) vulnerabilities are exposed in the theorem proving process of the NULL HTTP Server function.

Runtime Pointer Taintedness Detection: To Defeat Memory Corruption Attacks
To appear in IEEE Conference on Dependable Systems and Networks, 2005.

The Technique

A processor architectural level mechanism to detect pointer taintedness
  – Implemented on the SimpleScalar simulator
Extend the memory system to be taintedness-aware
Enhance load, store and ALU instructions to propagate taintedness bits in memory
Untaint data that are checked by compare instructions
Enhance input system calls to initialize taintedness
Detect security attacks when tainted data are dereferenced, and stop the process.

Evaluations on Real-World Software

Evaluation
  – Effectiveness of detection
  – No false alarm in any application evaluated
  – Transparent to applications
  – A small number of potential attack scenarios undetected.
Pointer taintedness detection is a technique that can be applied to the whole program of real software,
  – offers a substantial improvement in security protection.

Conclusions

Many real-world software programs can be compromised by corrupting non-control data.
  – It is insufficient to rely on control flow integrity for software security.
Pointer taintedness is a unifying perspective to reason about most memory corruption vulnerabilities/attacks.
Reasoning about pointer taintedness is a promising direction to enhance security on real-world systems
  – A theorem proving based code analysis approach
  – A runtime pointer taintedness detection mechanism

Future Directions

Short term goals
  – Provide a higher degree of automation for the theorem proving technique.
  – Reduce the intrusiveness of the runtime pointer taintedness detection technique
      Combine with the theorem proving technique.
      The processor only checks function preconditions.
Long term goals
  – Extract programming styles susceptible to security attacks.
      e.g., long lifetime of security critical data is a big problem.
      Can compilers detect bad programming styles?
  – Identify a broader range of non-traditional security threats.
  – Study historical data about how security vulnerabilities were discovered, reported and patched.
  – Decompose the behaviors of viruses, worms and rootkits into a number of basic building blocks.

Summary of My Research Methodology

Analysis-centric approach
  – A significant amount of effort in my dissertation is on analysis.
I like doing analysis on real data and incidents
  – Tedious? Sometimes, but it is a step toward a lot of fun.
  – Rewarding? Definitely. Especially important for systems research.
  – Goal: strongly motivate research topics that solve real problems.

Backup Slides

Static and Dynamic Approaches

Static approaches (avoid producing memory vulnerabilities in programs)
  – Writing code with a type-safe language
  – Compiler techniques to uncover memory vulnerabilities
  – Compiler instruments source code according to program annotations.
  – Challenges: legacy code and low level code, compatibility and performance.
  – Fact: Memory vulnerabilities are still constantly discovered and exploited.
Intrusion detection techniques (defeat attacks, given the existence of vulnerabilities)
  – Specialized techniques
      Defeat stack buffer overflow and format string attacks.
  – Generic defense techniques
      Most techniques are designed to defeat control-hijacking attacks.
      Host intrusion detection system and control flow integrity protection techniques: a very active research area.
      Others have constraints and difficulties in their deployments
      (pointer encryption and address randomization).

One-Slide Intro to Equational Logic

Use term rewriting to establish proofs of theorems.
Natural number addition expressed in the Maude system.

    0 : Natural .
    s_ : Natural -> Natural .
    _+_ : Natural Natural -> Natural .
    vars N M : Natural
    Axiom: N + 0 = N .
    Axiom: N + s M = s (N + M) .

    (s s s 0) + (s s 0)
      = s ((s s s 0) + (s 0))
      = s (s ((s s s 0) + 0))
      = s (s (s s s 0))
      = s s s s s 0

Intuitively, this is a proof of "3 + 2 = 5" in natural number algebra.

Taintedness-Aware Memory Model

• A store represents a snapshot of the memory state at a point in the program execution.
• For each memory location, we can evaluate two properties: content and taintedness (true/false).
• Operations on memory locations:
    • The fetch operation Ftch(S,A) gives the content of the memory address A in store S
    • The location-taintedness operation LocT(S,A) gives the taintedness of the location A in store S
• Operations on expressions:
    • The evaluation operation Eval(S,E) evaluates expression E in store S
    • The expression-taintedness operation ExpT(S,E) computes the taintedness of expression E in store S.

Axioms of Eval and ExpT operations

    Eval(S, I) = I                                  // I is an integer constant
    Eval(S, ^ E1) = Ftch(S, Eval(S,E1))
    Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
    Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
    ……
    ExpT(S, I) = false
    ExpT(S, ^ E1) = LocT(S, Eval(S,E1))
    ExpT(S, E1 + E2) = ExpT(S,E1) or ExpT(S,E2)
    ExpT(S, E1 - E2) = ExpT(S,E1) or ExpT(S,E2)
    ……

E.g., is the expression (^100)-2 tainted in store S?
    ExpT(S, (^100)-2)
      = ExpT(S, (^100)) or ExpT(S, 2)
      = LocT(S,100) or false
      = LocT(S,100)

Note: ^ is the dereference operator; ^100 gives the content of location 100.

Semantics of My Assembly Language

The following instructions are defined:
  – mov [Exp1] <- Exp2
  – branch (Condition) Label
  – call FuncName(Exp1,Exp2,…)
Axioms defining mov instruction semantics
  – Specify the effects of applying the mov instruction on a store
  – Allow taintedness to propagate from Exp2 to [Exp1].

    Ftch((S ; mov [E1] <- E2),X1) = Eval(S,E2) if (Eval(S,E1) is X1) .
    Ftch((S ; mov [E1] <- E2),X1) = Ftch(S,X1) if not (Eval(S,E1) is X1) .
    LocT((S ; mov [E1] <- E2),X1) = ExpT(S,E2) if (Eval(S,E1) is X1) .
    LocT((S ; mov [E1] <- E2),X1) = LocT(S,X1) if not (Eval(S,E1) is X1) .

Axioms defining the semantics of recv (similarly, scanf, recvfrom: user input functions)
  – Specify the memory locations tainted by the recv call.

Example: strcpy()

C source code (the lines labeled 0:, 1: and 2: are referred to below as L0, L1 and L2):

    char * strcpy (char * dst, char * src) {
        char * res;
    0:  res =dst;
        while (*src!=0) {
    1:      *dst=*src;
            dst++; src++;
        }
    2:  *dst=0;
        return res;
    }

Translate to formal semantics:

    0: mov [res] <- ^ dst
    lbl(#while#6)
       branch (^ ^ src is 0) #ex#while#6
    1: mov [^ dst] <- ^ ^ src
       mov [dst] <- (^ dst) + 1
       mov [src] <- (^ src) + 1
       branch true #while#6
    lbl(#ex#while#6)
    2: mov [^ dst] <- 0
       mov [ret] <- ^ res

Theorem generation:
  a) Suppose S1 is the store before Line L1, then LocT(S1,dst) = false
  b) If S0 is the store before Line L0, and S2 is the store after Line L1, then
     I < Eval(S0, ^dst) or Eval(S0, ^dst+dstsize) ≤ I => LocT(S2,I) = LocT(S0,I)
  c) Suppose S3 is the store before Line L2, then LocT(S3,dst) = false
Theorem proving

Specifications Extracted

Suppose that when function strcpy() is called, the size of the destination buffer (dst) is dstsize and the length of the user input string (src) is srclen.
Specifications extracted by the theorem proving approach, some of which are documented in the Linux man page:
  – srclen <= dstsize
  – The buffers src and dst do not overlap in such a way that the buffer dst covers the string
    terminator of the src string.
  – The buffers dst and src do not cover the function frame of strcpy.
  – Initially, dst is not tainted.
The first two specifications are documented in the Linux man page; the last two are not documented.

Internships in Industrial Labs

Summer'01, Avaya Labs, Basking Ridge, NJ
  – Libsafe is a software package originally invented for Linux to detect stack buffer overflow attacks. I implemented it on Windows NT/2000.
Summer'02, Bell Labs, Holmdel, NJ
  – Mitigate network congestive denial of service attacks by detecting TCP-unfriendly flows
Summer'03, Microsoft Research, Redmond, WA
  – Audit-enhanced authentication in Kerberos
Summer'04, Microsoft Research, Redmond, WA
  – A tracing technique to identify the dependencies of Windows applications on Administrator privileges