Enhancing Availability and Security Through Failure-Oblivious Computing
Martin Rinard, Cristian Cadar, Daniel Dumitran, Daniel Roy, and William Beebee, Jr.

Introduction
  Memory errors are a common source of program failures
  ML and Java use dynamic checks to eliminate such errors
  Assumption: after an invalid memory access, it is unsafe to continue the execution

Failure-Oblivious Computing
  Instead of throwing an exception or terminating
    Ignores memory access errors and continues
  Read (an out-of-bounds array element)
    Just read a manufactured value
  Write (an out-of-bounds array element)
    Discard the value

Wrong Results?
  Many programs can continue to run
    As long as errors do not corrupt the program’s address space or data structures
  Failure-oblivious computing can improve the availability, robustness, and security of such programs

Shouldn’t We Stop at the First Error?
  Debugging may not be an option
    No source code
    Not enough time
  Failure-oblivious computing can still provide acceptable service
    Better than no service

Servers and Buffer-overrun Attacks
  When a program allocates a fixed-size buffer
    Then fails to check whether the input string fits in the buffer
  A long input string containing executable code can overwrite the stack contents
    Can coerce the server into running arbitrary code
  Failure-oblivious computing discards the excess characters, preserving the integrity of the stack
    Server detects an invalid request and returns an error
    Converts a dangerous attack into an invalid input

Multiple Items or Outputs
  Many programs (e.g.
    mail readers) process multiple items
  Some applications generate multiple outputs
    Some outputs are more important than others
  Without failure-oblivious computing
    Failure to process one item can prevent the program from processing the rest

Benefits and Drawbacks
  + Increased resilience
      Degrades gracefully and continues to operate successfully on most of its inputs
  + Increased security
      Can survive stack overruns
  + Reduced development costs
      Less pressure to find and eliminate all disruptive bugs
  + Reduced administration overhead
      Reduces the success rate of attacks
  + Safer integration
      Lowers the risk of using foreign components
  - May generate unacceptable results
      Inevitable consequence of better resilience
      Need to convert unanticipated states into anticipated error states

Scope
  Interactive computing environments
    Mailers
    Servers
    System administration tools
    Operating systems
    Document processing systems
  Mission-critical applications
    Halting is not an option
  Less appropriate for
    Programs whose output is hard to check for correctness
    Safety-critical applications
      Safer to terminate the computation

Example
  A Mutt procedure
    Takes an input string
    Returns an encoded output string
    Fails to allocate sufficient space
  With standard compilers
    Writes succeed, corrupt the address space, and the program segfaults
  With safe-C compilers
    Mutt exits before presenting the GUI
  With the failure-oblivious compiler
    The returned string is incorrect
    Server responds with an error
  Failure-oblivious approach works for mostly correct programs with subtle errors

Implementation
  Failure-oblivious compiler
    Generates two kinds of additional code
  Checking code
    Discards erroneous writes
    Manufactures values for erroneous reads
  Continuation code
    Executes when the checking code detects an attempt to perform an illegal access

Checking Code
  Jones and Kelly’s scheme
    Track the locations of structs, arrays, and variables
    Each data item is padded with an extra byte
    Initialized to ILLEGAL
    Check the status of each
    pointer before dereferencing it

Continuation Code
  Write continuation code
    Discards the value
  Read continuation code
    Redirects the read to a preallocated buffer of values
    Iterates through all small integers
      Increases the chance of exiting loops, to avoid nontermination
    Mostly 0s and 1s
  Optional logging
    Can be used to track down errors
  Failure-oblivious computing
    Can also reduce the incentive to eliminate errors

Case Studies
  Recompiled widely used open-source programs with known memory errors
    Pine (mail user agent)
    Midnight Commander (file manager)
    Sendmail (mail transfer agent)
    Mutt (mail user agent)
    Samba (file server)
    WsMp3 (mp3 server)
    Apache (http server)

Methodology
  Compare each program compiled three ways
    By a standard C compiler
    By the CRED safe-C compiler
    By the failure-oblivious compiler
  Workloads
    Contain inputs that exploit known security vulnerabilities

Pine 4.44
  Fails to parse certain legal From fields
    Possible to execute arbitrary code
  Standard version: crashed
  Safe version: terminated with an error
  Failure-oblivious version: continued to run
    Was able to read and forward the message with the problematic From field

Midnight Commander
  Problems with symbolic links in tgz files
  Standard version: segfaulted
  Safe version: terminated with an error message
  Failure-oblivious version: continued to run

Sendmail 8.11.6
  Allows an attacker to execute arbitrary code with root privileges on the machine running the Sendmail server
  Standard version: vulnerable to an attack that gains a root shell
  Safe version: exited with an error message
  Failure-oblivious version: not vulnerable to the attack

Mutt 1.4
  Memory error in the conversion from UTF-8 to UTF-7 string formats
  Standard version: crashed
  Safe version: exited with an error message
  Failure-oblivious version: continued to run
    6x slowdown
    Took about 1 second to load 3,000 messages

Samba 2.2.5
  Memory corruption error
  Standard version: vulnerable to an attack that gains a root shell
  Safe version: functional until
    the attack
    A remote user can obtain the root shell
    The child process exited
  Failure-oblivious version: continued to run
    Similar performance compared to the safe version

WsMp3 0.0.5
  Memory-error vulnerability
  Standard version: segfaulted
  Safe version: crashed the entire server
    Single threaded
  Failure-oblivious version: survived the attack

Apache 2.0.47
  mod_alias contains a memory-error vulnerability
  Standard version: child process segfaulted
  Safe version: child process exited properly
  Failure-oblivious version: child process redirected the attacking request to a nonexistent URL
    The child process stayed alive and processed subsequent requests correctly

Gzip 1.2.4a
  Memory error in its file name processing code
  Standard version: segfaulted
    An attacker can run arbitrary code
    Remaining files were not processed
  Safe version: exited at the problematic file
  Failure-oblivious version: printed an error message for the problematic files
    Proceeded to process all remaining files
    10x slowdown (1.2 MB/sec)

Discussion
  Failure-oblivious versions survived all memory-corruption attempts
  Works well for this class of applications
    One input has a minimal effect on the next input
    Unless it corrupts the data structures or address space
  Little performance degradation for interactive programs
  Safe versions are prone to DoS attacks
    Tend to terminate prematurely

Related Work
  Any safe-C compiler can be modified to implement a failure-oblivious compiler
    Discard writes
    Manufacture values for unsafe reads
  Typically < 2x slowdown
    Occasionally 8x slowdown
    Does not perceptibly degrade the response times of interactive programs
    Or of I/O-bound programs

Safe Languages
  Java and ML
    Modify the exception handling code
    Discard illegal writes
    Return manufactured values for illegal reads

Traditional Error Recovery
  Traditional approaches
    Reboot
    Checkpointing
    Partial system restarts
    Hardware redundancy
  Failure-oblivious computing reduces downtime and vulnerability to persistent errors
    Restarting Pine will
    not solve the problem

Other Approaches
  Data structure repair
    Failure-oblivious approach is preventive
  Static detection of all buffer-overrun errors
    May conservatively reject almost-working code
  Buffer-overrun detection tools
    Detect overwriting of the return address
    Detect overwriting of function pointers
    Failure-oblivious approach prevents the attack from corrupting the address space

Conclusion
  Failure-oblivious computation enhances availability, resilience, and security
  Converts dangerous unknown system states to known error cases
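
Sketch: Checking and Continuation Code
  The checking code and continuation code from the Implementation slides can be illustrated with a small hand-written C sketch. This is not the compiler's generated code: the `fo_buf` struct and the `fo_make`, `fo_read`, `fo_write`, and `fo_manufacture` names are hypothetical, bounds are carried explicitly in a struct rather than tracked by a Jones and Kelly style object table, and the exact order of manufactured values (0, 1, 2, 0, 1, 3, ...) is an assumption chosen to match "iterates through all small integers, mostly 0s and 1s".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds-carrying buffer. A real failure-oblivious compiler
   tracks object bounds itself; this struct only makes the checks visible. */
typedef struct {
    unsigned char *data;      /* backing storage */
    size_t len;               /* number of valid bytes */
    unsigned long bad_reads;  /* optional logging: out-of-bounds reads seen */
    unsigned long bad_writes; /* optional logging: out-of-bounds writes seen */
    unsigned tick;            /* position in the manufactured-value sequence */
    unsigned char small;      /* next "other" small integer to hand out */
} fo_buf;

fo_buf fo_make(unsigned char *data, size_t len) {
    assert(data != NULL);
    fo_buf b = { data, len, 0, 0, 0, 2 };
    return b;
}

/* Read continuation code: manufacture a value instead of failing.
   Assumed sequence: 0, 1, 2, 0, 1, 3, 0, 1, 4, ... so that 0 and 1
   dominate but every small integer eventually appears, which raises
   the chance that a loop waiting for a particular value terminates. */
unsigned char fo_manufacture(fo_buf *b) {
    unsigned phase = b->tick++ % 3;
    if (phase == 0) return 0;
    if (phase == 1) return 1;
    unsigned char v = b->small;
    b->small = (b->small == 255) ? 2 : (unsigned char)(b->small + 1);
    return v;
}

/* Checking code for reads: in bounds -> real value,
   out of bounds -> log and run the read continuation. */
unsigned char fo_read(fo_buf *b, size_t i) {
    if (i < b->len) return b->data[i];
    b->bad_reads++;
    return fo_manufacture(b);
}

/* Checking code for writes: in bounds -> store,
   out of bounds -> log and discard the value (write continuation). */
void fo_write(fo_buf *b, size_t i, unsigned char v) {
    if (i < b->len) {
        b->data[i] = v;
        return;
    }
    b->bad_writes++;
}
```

  Copying an 8-byte input into a 4-byte `fo_buf` through `fo_write` stores the first 4 bytes and silently drops the rest, which mirrors the buffer-overrun slide: the excess characters are discarded instead of overwriting adjacent memory. The two counters stand in for the optional logging mentioned under Continuation Code.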