24th ACM SOSP (November 2013), Best Paper
Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior
Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, Armando Solar-Lezama
MIT CSAIL
A Seminar at Advanced Defense Lab, 2013/11/26

OUTLINE
Introduction
Model for Unstable Code
Design & Implementation
Evaluation

INTRODUCTION
The specifications of C-family languages designate certain code fragments as having undefined behavior, giving compilers the freedom to generate instructions that behave in arbitrary ways in those cases.
Because they target systems programming, the specifications choose to trust programmers and assume that their code will never invoke undefined behavior.

UNDEFINED BEHAVIOR IN C
Notation: p, q, p': n-bit pointers; x, y: n-bit integers; a: array

COMPILER OPTIMIZATION
One way in which compilers exploit undefined behavior is to optimize a program under the assumption that the program NEVER invokes undefined behavior.
Consequence: original program ≠ optimized program
Code that the optimizer discards or rewrites under this assumption is called optimization-unstable code, or just unstable code for short.

UNSTABLE CODE EXAMPLE
Vulnerability Note VU#162289 (US-CERT)
=> The compiler concludes that the check is always false.

UNSTABLE CODE EXAMPLE (CONT.)
CVE-2009-1897, Linux kernel 2.6.30
The programmer put the check in the wrong place, but the code can still work...
=> The compiler concludes that the check is always false.

Is this the programmers' fault?
Poor understanding of unstable code is a major obstacle to reasoning about system behavior.
However, these bugs are quite subtle, and understanding them requires detailed knowledge of the language specification.

Is this the compilers' fault?
A story: GCC bug #30475 (2007/01/15)
“This will create MAJOR SECURITY ISSUES in ALL MANNER OF CODE.
I don’t care if your language lawyers tell you gcc is right. . . . FIX THIS! NOW!” (a GCC user)
“I am not joking, the C standard explicitly says signed integer overflow is undefined behavior. . . . GCC is not going to change.” (a GCC developer)

UNSTABLE CODE TEST
The default optimization level for release builds is -O2.

MODEL FOR UNSTABLE CODE
Definition: Unstable code
A code fragment e in program P is unstable w.r.t. language specifications C and C* iff there exists a fragment e' such that the transformation $P \rightsquigarrow P[e/e']$ is legal under C but not under C*.
C*: a C dialect that assigns well-defined semantics to code fragments that have undefined behavior in C
P: program
e: expression or code fragment
P[e/e']: P with e replaced by e'

APPROACH FOR IDENTIFYING UNSTABLE CODE
Stack identifies unstable code using a two-phase scheme:
1. Run optimizer O without taking advantage of undefined behavior, which resembles optimizations under C*.
2. Run optimizer O again, this time taking advantage of undefined behavior, which resembles (more aggressive) optimizations under C.
Code that is eliminated or simplified only in the second phase is reported as unstable.

WELL-DEFINED PROGRAM ASSUMPTION
A code fragment e is well-defined on an input x iff executing e never triggers undefined behavior at e:
$R_e(x) \rightarrow \neg U_e(x)$
A program P is well-defined on an input x iff every fragment of the program is well-defined on that input, denoted as Δ:
$\Delta(x) = \bigwedge_{e \in P} \left( R_e(x) \rightarrow \neg U_e(x) \right)$
x: input
$R_e(x)$: reachability condition (under input x, will e be reached?)
$U_e(x)$, or UB: undefined behavior condition (under input x, will e exhibit undefined behavior in C?)
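As a concrete reading of these definitions, the following minimal sketch evaluates the reachability condition, the undefined behavior condition, and the resulting assumption for a one-fragment toy program. It assumes 8-bit signed inputs (held in an int so that the sketch itself never overflows) and models the program `if (x > 0) y = x + 100;`. The function names reach_cond, ub_cond, well_defined, and count_well_defined are hypothetical and are not taken from the paper or from Stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (an assumption of this sketch): inputs are 8-bit signed values,
 * and the modelled program is:  if (x > 0) y = x + 100;
 * where the addition is an 8-bit signed addition. */

/* R_e(x): is the fragment e = "x + 100" reached on input x? */
bool reach_cond(int x) {
    assert(x >= INT8_MIN && x <= INT8_MAX);
    return x > 0;
}

/* U_e(x): would executing e overflow the 8-bit signed range? */
bool ub_cond(int x) {
    assert(x >= INT8_MIN && x <= INT8_MAX);
    return x + 100 > INT8_MAX;
}

/* R_e(x) -> not U_e(x). With a single fragment this is also Delta(x). */
bool well_defined(int x) {
    return !reach_cond(x) || !ub_cond(x);
}

/* Counts the 8-bit inputs on which the modelled program is well-defined. */
int count_well_defined(void) {
    int count = 0;
    for (int x = INT8_MIN; x <= INT8_MAX; x++) {
        if (well_defined(x))
            count++;
    }
    return count;
}
```

In this model, 156 of the 256 possible inputs satisfy the assumption. The remaining 100 inputs (x from 28 to 127) both reach the addition and overflow it, so an optimizer that relies on the assumption is free to ignore them.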
Definition: Well-defined program assumption

ELIMINATING UNREACHABLE CODE
Theorem: Elimination
In a well-defined program P, an optimizer can eliminate code fragment e if there is no input x that both reaches e and satisfies the well-defined program assumption Δ(x):
$\nexists x : R_e(x) \wedge \Delta(x)$

SIMPLIFYING UNNECESSARY COMPUTATION
Theorem: Simplification
In a well-defined program P, an optimizer can simplify expression e to e' if there is no input x that evaluates e and e' to different values while both reaching e and satisfying Δ(x):
$\exists e', \nexists x : e(x) \neq e'(x) \wedge R_e(x) \wedge \Delta(x)$

SIMPLIFICATION ORACLE
Boolean oracle: proposes true and false in turn for a boolean expression, enumerating the possible values.
Algebra oracle: proposes eliminating common terms on both sides of a comparison if one side is a subexpression of the other, e.g. x + y < x => y < 0.

LIMITATION
A compiler can also exploit the well-defined program assumption in forms other than elimination and simplification, which this approach does not cover.

DESIGN & IMPLEMENTATION
Implemented with LLVM and the Boolector solver.

COMPILER FRONTEND
To reduce false warnings, Stack tracks code origins and ignores compiler-generated code, at the cost of missing possible bugs.

UB CONDITION INSERTION
Stack inserts a call to a special function, void bug_on(bool expr), into the IR at the corresponding instruction, where expr is that instruction's undefined behavior condition.

SOLVER-BASED ALGORITHM
To implement these algorithms, Stack consults the Boolector solver to decide satisfiability for elimination and simplification queries.
However, computing the queries precisely is practically infeasible for large programs.
To address this challenge, Stack computes approximate queries by limiting the computation to a single function, using Tu and Padua's algorithm.
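The simplification query with the boolean oracle can be tried on a toy example. The sketch below is not how Stack is implemented: it replaces the Boolector solver with exhaustive enumeration over 8-bit signed inputs, takes the reachability condition to be true for every input, and asks whether the overflow check `x + 100 < x` can be replaced by `false`, first without and then with the well-defined program assumption. The names ub_cond, check_wrapped, counterexample_exists, and is_unstable, as well as the constant 100, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy 8-bit model of the check "x + 100 < x"; every name is hypothetical. */

/* U_e(x): the signed 8-bit addition x + 100 would overflow. */
bool ub_cond(int x) {
    assert(x >= INT8_MIN && x <= INT8_MAX);
    return x + 100 > INT8_MAX;
}

/* e(x) evaluated with wraparound semantics, as a dialect that defines
 * signed overflow might do. Arithmetic is done in int, so the sketch
 * itself never overflows. */
bool check_wrapped(int x) {
    int sum = x + 100;
    if (sum > INT8_MAX)
        sum -= 256; /* wrap into the 8-bit signed range */
    return sum < x;
}

/* Boolean-oracle query: is there an input x with e(x) != proposal?
 * When assume_well_defined is set, inputs that violate
 * Delta(x) = !U_e(x) are skipped. */
bool counterexample_exists(bool proposal, bool assume_well_defined) {
    for (int x = INT8_MIN; x <= INT8_MAX; x++) {
        if (assume_well_defined && ub_cond(x))
            continue;
        if (check_wrapped(x) != proposal)
            return true;
    }
    return false;
}

/* Two-phase scheme: report the check if it simplifies to false
 * only when undefined behavior is assumed away. */
bool is_unstable(void) {
    bool simplifies_without_ub = !counterexample_exists(false, false);
    bool simplifies_with_ub = !counterexample_exists(false, true);
    return !simplifies_without_ub && simplifies_with_ub;
}
```

Here is_unstable() returns true. Without the assumption, inputs such as x = 28 make the wrapped check true, so replacing it with false is not justified. With the assumption, every remaining input (x up to 27) evaluates the check to false, so the replacement goes through and the check counts as unstable code. Enumeration only works because the domain is tiny, which is why a solver is needed for real bit widths.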
EVALUATION
New bugs: 160 (July 2012 to March 2013)

ANALYSIS OF BUG REPORTS
Non-optimization bugs
Urgent optimization bugs
Time bombs
Redundant code (false alarm)

ANALYSIS OF BUG REPORTS (CONT.)
Non-optimization bugs
Example: PostgreSQL

Time bomb!!

PRECISION
Kerberos: Stack produced 11 warnings
Developers accepted every patch
False warning rate: 0/11
Postgres: Stack produced 68 warnings
9 patches accepted
29 patches in discussion: developers blamed compilers
26 time bombs
4 false warnings

PERFORMANCE
64-bit Ubuntu (Linux), Intel Core i7-980 3.3 GHz, 24 GB memory
Solver timeout: 5 s

PREVALENCE OF UNSTABLE CODE
All packages in the Debian Wheezy archive: 17,432
Containing C/C++ code: 8,575
Containing unstable code: 3,471 (40%)
150 CPU-days to analyze

PREVALENCE OF UNSTABLE CODE (CONT.)

COMPLETENESS
We analyze what kind of unstable code Stack misses.
A total of ten tests from real systems
Result: Stack identifies 7/10
It is difficult to know precisely how much unstable code Stack would miss in general.

Q&A