Ben Livshits

Based in part on Stanford class slides from http://www.stanford.edu/class/cs295/ and on slides from Ben Zorn’s talk

• Lecture 1: Introduction to static analysis
• Lecture 2: Introduction to runtime analysis
• Lecture 3: Applications of static and runtime analysis
  • Reliability & bug finding
  • Performance
  • Security

• Purify for finding memory errors [1992]
• Detecting memory leaks with Purify and GC [1992]
• Detecting dangling pointers & buffer overruns with DieHard [2006]
• Detecting buffer overruns with StackGuard [1997]

• Dangling pointers: If the program mistakenly frees a live object, the allocator may overwrite its contents with a new object or heap metadata.
• Buffer overflows: Out-of-bound writes can corrupt the contents of live objects on the heap.
• Heap metadata overwrites: If heap metadata is stored near heap objects, an out-of-bound write can corrupt it.
• Uninitialized reads: Reading values from newly-allocated or unallocated memory leads to undefined behavior.
• Invalid frees: Passing illegal addresses to free can corrupt the heap or lead to undefined behavior.
• Double frees: Repeated calls to free on objects that have already been freed can cause freelist-based allocators to fail.

FINDING MEMORY ERRORS IN C/C++ PROGRAMS

• C/C++ are not memory-safe
• Neither the compiler nor the runtime system enforces type abstractions
• What is memory-safe vs. type-safe?
• Possible to read or write outside of your intended data structure
• Among other bad behaviors
• What else is possible that you can’t do in Java or Scheme or ML or F#?

• Each byte of memory is in one of 3 states:
  • Unallocated: cannot be read or written
  • Allocated but uninitialized: cannot be read
  • Allocated and initialized: anything goes

• Check the state of each byte on each access
• Binary instrumentation
  • Add code before each load and store
• Represent states as a giant array
  • 2 bits per byte of memory
• What is the memory overhead?
  • 25%!!
• Catches byte-level errors
• Won’t catch bit-level errors

• We can only detect bad accesses if they are to unallocated or uninitialized memory
• Try to make all bad accesses be of those two forms
• We can make this part of our custom memory allocator

• Red Zones
  • Leave buffer space between allocated objects that is never allocated
  • Guarantees that walking off the end of an array accesses unallocated memory
• Aging Freed Memory
  • When memory is freed, do not reallocate it immediately
  • Helps catch dangling pointer errors

• Purify was one of the first commercially successful runtime tools
• Was an independent company that was bought by IBM Rational
• Overhead can vary from 25% to 40x

• This is where buffer overruns come from!
• Why can’t we catch them?

PROGRESS & OPEN PROBLEMS

• Memory leaks are at least as serious as memory corruption errors
• Also very difficult to find
  • Manifest only over hours, days, weeks
  • Often persist in production code
• Managed languages such as Java and C# don’t really help

• We can find many memory leaks using techniques borrowed from garbage collection
• Any memory with no pointers to it is leaked
  • There is no way to free this memory
• Run a garbage collector
  • But don’t free any garbage
  • Just detect the garbage
  • Any inaccessible memory is leaked memory
• Can we do this in C/C++ at all? Sort of…

• It is sometimes hard to tell what is accessible in a C/C++ program
• Cases
  • No pointers to a malloc’d block: definitely garbage
  • No pointers to the head of a malloc’d block: maybe garbage
  • Pointers to the head of a malloc’d block: not garbage by the usual definition

• From time to time, run a garbage collector
  • Use mark and sweep
• Report areas of memory that are definitely or probably garbage
• No type safety ==> no memory safety
• Is this as easy as in Java?
• Bookkeeping
  • Need to report who malloc’d the blocks originally
  • Store this information in the red zone between objects
  • Used in Purify, but watch out for memory overhead

• A Limitation
  • Only finds leaks of unreachable objects
  • Doesn’t cover leaks in languages with GC
    • Retaining data structures longer than needed
  • In practice, also a serious source of leaks, especially in Java . . .

• Look for objects not accessed for a “long time”. For each object:
  • Track it from the moment it is allocated
  • Record the time of the last access (read or write)
  • Discard information when the object is de-allocated
• Periodically
  • Scan all objects
  • Warn about objects unused for a “long time”

TOLERATING MEMORY ERRORS AT RUNTIME

• Buffer overflow
    char *c = malloc(100);
    c[101] = 'a';
• Dangling reference
    char *p1 = malloc(100);
    char *p2 = p1;
    free(p1);
    p2[0] = 'x';

• Increase robustness of installed code base
  • Potentially improve millions of lines of code
  • Minimize effort: ideally no source mods, no recompilation
• Reduce requirement to patch
  • Patches are expensive (detect, write, deploy)
  • Patches may introduce new errors
• Trade resources for robustness
  • E.g., more memory implies higher reliability
• Make deployment easy
  • Change the allocator DLL, no changes to code needed
• Make existing programs more fault tolerant
  • Define semantics of programs with errors
  • Programs complete with correct result despite errors

• Emery D. Berger and Benjamin G. Zorn, "DieHard: Probabilistic Memory Safety for Unsafe Languages", PLDI’06
• DieHard: correct execution in the face of errors with high probability
• Plug-compatible replacement for malloc/free in the C library
• Define “infinite heap semantics”
  • Programs execute as if each object is allocated with unbounded memory
  • All frees ignored
• Approximating infinite heaps: 3 key ideas
  • Overprovisioning
  • Randomization
  • Replication
• Allows analytic/probabilistic reasoning about safety

• Expand size requests by a factor of M (e.g., M=2)
  • Pr(write corrupts) = ½ ?
• Randomize object placement
  • Pr(write corrupts) = ½ !
[Figure: heap layout of objects 1–5, first packed, then expanded, then randomly placed]

• Replicate process with different randomization seeds
• Broadcast input to all replicas
• Voter: compare outputs of replicas, kill a replica when it disagrees
[Figure: replicas P1, P2, P3, each with a different random heap layout, receiving the same input and feeding a voter]

• Allocation
  • Segregate objects by size (log2), bitmap allocator
  • Within a size class, place objects randomly in the address space
• De-allocation
  • Expansion factor => frees deferred
  • Extra checks for illegal free
• Separate metadata from user data
• Fill objects with random values, for detecting uninitialized reads

[Figure: Runtime on Windows. Normalized runtime of malloc vs. DieHard on cfrac, espresso, lindsay, p2c, roboop, and their geometric mean]

• Synthetic:
  • Tolerates a high rate of synthetically injected errors in SPEC programs
• SPEC benchmarks:
  • Detected two previously unreported benign bugs (197.parser and espresso)
• Avoiding real errors:
  • Successfully hides a buffer overflow error in the Squid web cache server (v 2.3s5)
  • Avoids a dangling pointer error in Mozilla
  • DoS in glibc & Windows

AVOIDING SECURITY EXPLOITS

• “Smashing the Stack for Fun and Profit”
  • Aleph One (AKA Elias Levy), Phrack 49, August 1996
• It is a cookbook for how to create exploits for “stack smashing” attacks
• Prior to this paper, buffer overflow attacks were known, but not widely exploited
  • “Validate all input parameters” is a security principle going back to the 1960s
• After this paper, attacks became rampant
  • Stack smashing vulnerabilities are massively common, easy to discover, and easy to exploit

• Buffer overflow:
  • Program accepts string input, placing it in a buffer
  • Program fails to correctly check the length of the input
  • Attacker gets to overwrite adjacent state, corrupting it
• Stack Smash:
  • Special case of a buffer overflow
    that corrupts the activation record

• Return address
  • Overflow changes it to point somewhere else
• “Shell Code”
  • Point to exploit code that was encoded as CPU instructions in the attacker’s string
  • That code does exec(“/bin/sh”), hence “shell code”

• Why are we so vulnerable to something so trivial?
  • Because C chose to represent strings as null-terminated instead of (base, bound) tuples
  • Because strings grow up and stacks grow down
  • Because we use Von Neumann architectures that store code and data in the same memory
• But these things are hard to change … mostly
• Try to move away from the Von Neumann architecture by making key regions of memory non-executable
• Problem: x86 memory architecture does not distinguish between “readable” and “executable” per page

• “Solar Designer” introduces the Linux non-executable stack patch
  • Fun with x86 segmentation registers maps the stack differently from the heap and static data
  • Results in a non-executable stack
• Effective against naïve Stack Smash attacks
• Bypassable:
  • Inject your shell code into the heap (still executable)
  • Point the return address at your shell code in the heap

• Compile in integrity checks for activation records
• Insert a “canary word” (after the Welsh miner’s canary)
• If the canary word is damaged, then your stack is corrupted
• Instead of jumping to attacker code, abort the program

• Written in a few days by one intern
• Less than 100 lines of code patch to GCC
  • Helped a lot that the GCC function preamble and function postamble code generator routines were nicely isolated
• First canary was hardcoded 0xDEADBEEF
  • Easily spoofable, but worked for proof of concept

• The random canary:
  • Pull a random integer from the OS /dev/random at process startup time
  • Simple in concept, but in practice it is very painful to make reading from /dev/random work while still inside crt0.o
  • Made it work, but motivated us to seek something simpler
• “Terminator” canary:
  • CR, LF, 00, -1: the
    symbols that terminate various string library functions
  • Rationale: will cause all the standard string mashers to terminate while trying to write the canary, so an attacker cannot both spoof the canary and successfully write beyond it
  • Still vulnerable to attacks against poorly used memcpy() code, but such overflows were thought to be rare

• 1999, “Emsi” creates the frame pointer attack
  • The frame pointer, stored below the canary, is corruptible
  • Change FP to point to a fake activation record constructed on the heap
  • Function return code will believe FP, interpret the fake activation record, and jump to shell code
• Bypasses both Terminator and Random Canaries
• XOR Random Canary
  • XOR the correct return address with the random canary
  • Integrity check must match both the random number and the correct return address

• Focus on malware detection and prevention
• Nozzle
  • Runtime detector for heap spraying attacks
  • False positive rates: 106
  • Much harder to “fix” than stack-based buffer overruns
  • Externally exploitable bugs
  • Overhead: 5-10%
• Zozzle
  • Static/statistical detector
  • False positive rates: 106
  • Overhead: very small
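
The XOR random canary check from the stack-protection slides above can be sketched in C. This is only an illustrative model, not StackGuard's actual code: the struct frame layout and the names frame_enter and frame_check are hypothetical, and a real compiler emits this logic in the function prologue and epilogue on the machine stack rather than operating on a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one activation record (assumption for illustration). */
struct frame {
    uintptr_t canary;       /* stored word: random value XOR return address */
    uintptr_t return_addr;  /* address the function will return to */
};

/* Prologue: record the return address and derive the canary from the
   per-process random value chosen at startup. */
void frame_enter(struct frame *f, uintptr_t random_value, uintptr_t return_addr)
{
    assert(f != NULL);
    f->return_addr = return_addr;
    f->canary = random_value ^ return_addr;
}

/* Epilogue: the frame is intact only if the stored canary still matches
   both the random value and the current return address. A caller would
   abort the program instead of returning when this yields false. */
bool frame_check(const struct frame *f, uintptr_t random_value)
{
    assert(f != NULL);
    return (f->canary ^ f->return_addr) == random_value;
}
```

In this model, overwriting return_addr alone makes frame_check fail, and overwriting both words succeeds only if the attacker already knows the random value.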