Analyzing the Impact of Undefined Behavior

24th ACM SOSP (November 2013), Best Paper
Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior
Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, Armando Solar-Lezama
MIT CSAIL
A Seminar at Advanced Defense Lab
2013/11/26

OUTLINE
Introduction
Model for Unstable Code
Design & Implementation
Evaluation

INTRODUCTION
The specifications of C-family languages designate certain code fragments
as having undefined behavior, giving compilers the freedom to generate
instructions that behave in arbitrary ways in those cases.

Aiming at systems programming, the specifications choose to trust
programmers and assume that their code will never invoke undefined behavior.

UNDEFINED BEHAVIOR IN C
[Table of undefined-behavior constructs in C not reproduced]
Notation: p, q, p': n-bit pointers; x, y: n-bit integers; a: array.

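To give a feel for the kind of conditions such a table lists, the sketch below writes three well-known undefined-behavior conditions for `int` as ordinary C predicates. The function names are hypothetical and the selection is illustrative; it is not a transcription of the slide's table.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper names. Each returns true when evaluating the
 * expression named in its comment would be undefined behavior in C. */

/* x + y: the mathematical sum does not fit in int. */
bool ub_signed_add(int x, int y)
{
    return (y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y);
}

/* x / y: division by zero, or INT_MIN / -1 (quotient not representable). */
bool ub_signed_div(int x, int y)
{
    return y == 0 || (x == INT_MIN && y == -1);
}

/* x << y or x >> y: the shift amount is negative or at least the width
 * of int. (Signed left shifts have further conditions not modeled here.) */
bool ub_shift_amount(int y)
{
    return y < 0 || y >= (int)(sizeof(int) * CHAR_BIT);
}
```

Each predicate is itself written so that it never triggers the behavior it describes; for example, `ub_signed_add` compares against `INT_MAX - y` instead of computing `x + y`.
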
COMPILER OPTIMIZATION
One way in which compilers exploit undefined behavior is to optimize a
program under the assumption that the program NEVER invokes undefined
behavior.

Consequence:
Original program ≠ optimized program: code that relies on undefined
behavior may be changed or discarded by the optimizer.
We call such code optimization-unstable code, or just unstable code for
short.

UNSTABLE CODE EXAMPLE
Vulnerability Note VU#162289 (US-CERT)
Example: a bounds check that tests for pointer wraparound.
=> The compiler concludes that the check is always false.

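The code shown on this slide is not reproduced here. The following is a minimal sketch of the kind of wraparound check that VU#162289 warns about, with hypothetical function names. The first function tests for overflow by comparing `buf + len` against `buf`, which a compiler may treat as always false because pointer overflow is undefined. The second expresses the same bounds check using only the space left in the buffer, so no out-of-range pointer is ever formed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Unstable pattern (illustrative only): if buf + len overflows, the
 * behavior is undefined, so a compiler may assume buf + len >= buf
 * and drop the second test. */
bool len_rejected_unstable(const char *buf, const char *buf_end, size_t len)
{
    if (buf + len >= buf_end)
        return true;   /* len too large */
    if (buf + len < buf)
        return true;   /* intended wraparound check; may be optimized away */
    return false;
}

/* Stable alternative: compare len against the space left in the buffer.
 * Assumes buf <= buf_end and that both point into the same object. */
bool len_rejected_stable(const char *buf, const char *buf_end, size_t len)
{
    return len >= (size_t)(buf_end - buf);
}
```

For in-range lengths the two functions agree; they differ only for lengths large enough to make `buf + len` undefined, which is exactly the case the original check was meant to catch.
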
UNSTABLE CODE EXAMPLE (CONT.)
CVE-2009-1897
Linux kernel 2.6.30
The programmer placed the check in the wrong position, after the pointer
had already been dereferenced. Without optimization the check can still
work...
=> The compiler concludes that the check is always false.

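The kernel code from this slide is not reproduced here. A minimal sketch of the pattern behind CVE-2009-1897 follows, assuming simplified stand-in structures and a placeholder error constant (`DEMO_POLLERR`); none of these are the kernel's real definitions, and the function names are hypothetical.

```c
#include <stddef.h>

#define DEMO_POLLERR 0x0008u   /* placeholder error value for this sketch */

struct sock { int state; };               /* simplified stand-in */
struct tun_struct { struct sock *sk; };   /* simplified stand-in */

/* Unstable pattern: tun is dereferenced before the null check, so a
 * compiler may conclude tun != NULL and delete the check. */
unsigned int poll_unstable(struct tun_struct *tun)
{
    struct sock *sk = tun->sk;   /* undefined behavior if tun == NULL */
    if (!tun)
        return DEMO_POLLERR;     /* may be removed as "always false" */
    return sk ? 0u : DEMO_POLLERR;
}

/* Stable ordering: check first, dereference afterwards. */
unsigned int poll_stable(struct tun_struct *tun)
{
    if (!tun)
        return DEMO_POLLERR;
    struct sock *sk = tun->sk;
    return sk ? 0u : DEMO_POLLERR;
}
```

Only `poll_stable` may safely be called with a null pointer; calling `poll_unstable(NULL)` is the undefined case the slide describes.
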
Is this the programmers' fault?
Poor understanding of unstable code is a major obstacle to reasoning
about system behavior.

However, these bugs are quite subtle, and understanding them requires
detailed knowledge of the language specification.

Is this the compilers' fault?
A story: GCC bug #30475 (2007/01/15)

“This will create MAJOR SECURITY ISSUES in ALL MANNER OF CODE. I don’t
care if your language lawyers tell you gcc is right. . . . FIX THIS! NOW!”
(a GCC user)

“I am not joking, the C standard explicitly says signed integer overflow
is undefined behavior. . . . GCC is not going to change.”
(a GCC developer)

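The dispute is about overflow tests that are written in terms of the overflowing addition itself. A minimal sketch of that shape, with hypothetical function names and 100 as an example constant, next to a version that decides before adding:

```c
#include <limits.h>
#include <stdbool.h>

/* Unstable: when a + 100 overflows the behavior is undefined, so a
 * compiler may assume a + 100 > a always holds and fold this to true. */
bool no_overflow_unstable(int a)
{
    return a + 100 > a;
}

/* Stable: decide before adding, using only in-range arithmetic. */
bool no_overflow_stable(int a)
{
    return a <= INT_MAX - 100;
}
```

The stable version never evaluates an overflowing expression, so its result does not depend on the optimization level.
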
UNSTABLE CODE TEST
[Table of compiler test results not reproduced]
The default optimization level for release builds is -O2.

MODEL FOR UNSTABLE CODE
Notation:
C*: a C dialect that assigns well-defined semantics to code fragments
that have undefined behavior in C.
P: program
e: expression or code fragment
P[e/e']: the program obtained by replacing e in P with e'

Definition (unstable code):
A code fragment e in program P is unstable w.r.t. language specifications
C and C* iff there exists a fragment e' such that the transformation
$P \rightsquigarrow P[e/e']$ is legal under C but not under C*.

APPROACH FOR IDENTIFYING UNSTABLE CODE
Stack does this using a two-phase scheme:
1. Run optimizer O without taking advantage of undefined behavior, which
resembles optimizations under C*.
2. Run optimizer O again, this time taking advantage of undefined
behavior, which resembles (more aggressive) optimizations under C.
Code that is optimized away only in the second phase is reported as
unstable.

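The two-phase idea can be tried out on a toy model. The sketch below assumes an 8-bit signed integer (so every input can be enumerated) and studies the check `x + 100 < x`. Phase 1 gives overflow wraparound semantics, standing in for C*; phase 2 ignores inputs that overflow, standing in for C. Brute-force enumeration replaces a real optimizer here, and all function names are hypothetical.

```c
#include <stdbool.h>

/* x + 100 on 8-bit signed values with wraparound, computed in int. */
static int add100_wrap(int x)
{
    int v = x + 100;
    return v > 127 ? v - 256 : v;
}

/* The check under study: x + 100 < x. */
static bool check(int x)
{
    return add100_wrap(x) < x;
}

/* True when x + 100 overflows an 8-bit signed integer. */
static bool overflows(int x)
{
    return x + 100 > 127;
}

/* Can the check be folded to a constant? With assume_no_ub set, inputs
 * that trigger overflow are ignored, as an optimizer under C may do. */
static bool foldable(bool assume_no_ub)
{
    bool seen_true = false, seen_false = false;
    for (int x = -128; x <= 127; x++) {
        if (assume_no_ub && overflows(x))
            continue;
        if (check(x))
            seen_true = true;
        else
            seen_false = true;
    }
    return !(seen_true && seen_false);
}

/* Two-phase verdict: unstable iff only phase 2 manages to fold it. */
bool check_is_unstable(void)
{
    bool phase1 = foldable(false);  /* resembles optimizing under C* */
    bool phase2 = foldable(true);   /* resembles optimizing under C  */
    return !phase1 && phase2;
}
```

In this model the check is true exactly for the overflowing inputs, so phase 1 cannot fold it while phase 2 folds it to false, and the check is classified as unstable.
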
WELL-DEFINED PROGRAM ASSUMPTION
Notation:
x: input
Re(x): reachability condition. Under input x, will e be reached?
Ue(x), or UB condition: undefined behavior condition. Under input x,
will e exhibit undefined behavior in C?

A code fragment e is well-defined on an input x iff executing e never
triggers undefined behavior at e:
$R_e(x) \rightarrow \neg U_e(x)$

A program P is well-defined on an input x iff every fragment of the
program is well-defined on that input, denoted as Δ(x).

Definition (well-defined program assumption):
$\Delta(x) = \bigwedge_{e \in P} \left( R_e(x) \rightarrow \neg U_e(x) \right)$

ELIMINATING UNREACHABLE CODE
Theorem (Elimination):
In a well-defined program P, an optimizer can eliminate code fragment e
if there is no input x that both reaches e and satisfies the well-defined
program assumption Δ(x):
$\nexists x : R_e(x) \wedge \Delta(x)$

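A small worked instance of the elimination query, as a sketch: the model below abstracts a function that dereferences a pointer `tun` and only afterwards tests `if (!tun)`. The input x stands for the pointer value, with 0 playing the role of NULL, and a loop over a small domain stands in for the solver. The function names and the 256-value domain are assumptions made for illustration.

```c
#include <stdbool.h>

/* Toy model of:   sk = tun->sk;  if (!tun) return ERR;
 * The input x stands for the value of tun; 0 plays the role of NULL. */

/* UB condition of the dereference, which every input reaches. */
static bool deref_ub(unsigned x)
{
    return x == 0;
}

/* Well-defined program assumption. The dereference is always reached,
 * so R -> !U reduces to !U for this single fragment. */
static bool delta(unsigned x)
{
    return !deref_ub(x);
}

/* Reachability of the fragment under test, the body of `if (!tun)`. */
static bool reach_error_return(unsigned x)
{
    return x == 0;
}

/* Elimination query: is there an input with R_e(x) && Delta(x)?
 * If none exists, the fragment may be eliminated. */
bool can_eliminate_error_return(void)
{
    for (unsigned x = 0; x < 256; x++)
        if (reach_error_return(x) && delta(x))
            return false;   /* witness found: the fragment is needed */
    return true;
}

/* Same query without Delta, i.e. without assuming well-definedness. */
bool can_eliminate_without_delta(void)
{
    for (unsigned x = 0; x < 256; x++)
        if (reach_error_return(x))
            return false;
    return true;
}
```

With Δ the query has no witness, so the error return can be eliminated; without Δ the input 0 reaches it. That difference is what marks the fragment as unstable.
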
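SIMPLIFYING UNNECESSARY COMPUTATION
Theorem (Simplification):
In a well-defined program P, an optimizer can simplify expression e to
another expression e' if there is no input x that evaluates e(x) and
e'(x) to different values while both reaching e and satisfying the
well-defined program assumption Δ(x):
$\exists e' \, \nexists x : e(x) \neq e'(x) \wedge R_e(x) \wedge \Delta(x)$
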
SIMPLIFICATION ORACLE
Algebra oracle: proposes eliminating common terms on both sides of a
comparison if one side is a subexpression of the other.
Example: x + y < x => y < 0

Boolean oracle: proposes true and false in turn for a boolean expression,
enumerating the possible values.

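The algebra oracle's example can be checked by brute force on a toy model. The sketch below assumes 8-bit signed integers, takes reachability to be true, and asks whether the proposal `y < 0` ever disagrees with `x + y < x`, first assuming no overflow and then allowing wraparound. Enumeration stands in for the solver, and the function names are hypothetical.

```c
#include <stdbool.h>

/* 8-bit signed addition with wraparound, computed in int. */
static int add8_wrap(int x, int y)
{
    int v = x + y;
    if (v > 127)
        v -= 256;
    if (v < -128)
        v += 256;
    return v;
}

/* Delta for this fragment: x + y does not overflow 8 bits. */
static bool no_overflow(int x, int y)
{
    int v = x + y;
    return v >= -128 && v <= 127;
}

/* Original expression e: x + y < x. */
static bool e(int x, int y)
{
    return add8_wrap(x, y) < x;
}

/* Proposed simplification e': y < 0. */
static bool e_prime(int x, int y)
{
    (void)x;
    return y < 0;
}

/* Simplification query: returns true when no input (satisfying Delta,
 * if assumed) makes e and e' disagree. */
bool proposal_holds(bool assume_delta)
{
    for (int x = -128; x <= 127; x++)
        for (int y = -128; y <= 127; y++) {
            if (assume_delta && !no_overflow(x, y))
                continue;
            if (e(x, y) != e_prime(x, y))
                return false;
        }
    return true;
}
```

The proposal holds only when overflow is assumed away; with wraparound, x = 127 and y = 1 is a counterexample. An expression that can be simplified only under Δ is the kind the simplification algorithm reports.
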
LIMITATION
Elimination and simplification are not the only options: it is possible
to exploit the well-defined program assumption in other forms, and
unstable code that arises that way is not covered by these two
algorithms.

DESIGN & IMPLEMENTATION
Implemented with LLVM and the Boolector solver.

COMPILER FRONTEND
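Code produced by the compiler itself, for example through macro expansion
or function inlining, can look unstable even though the programmer did
not write it that way.
To reduce false warnings, Stack ignores such compiler-generated code by
tracking code origins, at the cost of missing possible bugs.
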
UB CONDITION INSERTION
For each undefined-behavior condition, Stack inserts a special function
call into the IR at the corresponding instruction:
void bug_on(bool expr)

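A source-level sketch of what such an annotation expresses, using signed division as the example. Stack works on the IR, so this C rendering is only an analogy. Here `bug_on` is an ordinary function that records the condition so the example can run; `ub_was_seen` and `div_annotated` are hypothetical names.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the marker call: it only records that a UB condition
 * held. This runnable version is an assumption made for the sketch. */
static bool ub_seen = false;

void bug_on(bool expr)
{
    if (expr)
        ub_seen = true;
}

bool ub_was_seen(void)
{
    return ub_seen;
}

/* x / y with its UB condition made explicit just before the operation. */
int div_annotated(int x, int y)
{
    bool ub = (y == 0) || (x == INT_MIN && y == -1);
    bug_on(ub);
    if (ub)
        return 0;   /* skip the division so the sketch itself stays defined */
    return x / y;
}
```

The argument of `bug_on` is the UB condition for the instruction that follows it, which is the information the later queries need.
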
SOLVER-BASED ALGORITHM
To implement these algorithms, Stack consults the Boolector solver to
decide satisfiability for elimination and simplification queries.
But it is practically infeasible to compute the conditions in these
queries (Re(x) and Δ(x)) precisely for large programs.
To address this challenge, Stack computes approximate queries by limiting
the computation to a single function.
Reachability conditions are computed with Tu and Padua's algorithm.

EVALUATION
New bugs: 160 (July 2012 to March 2013)

ANALYSIS OF BUG REPORTS
Non-optimization bugs
Urgent optimization bugs
Time bombs
Redundant code (false alarm)

ANALYSIS OF BUG REPORTS (CONT.)
Non-optimization bugs
Example: PostgreSQL

Time bomb!!
PRECISION
Kerberos: Stack produced 11 warnings
  Developers accepted every patch
  False warning rate: 0/11

Postgres: Stack produced 68 warnings
  9 patches accepted
  29 patches in discussion: developers blamed compilers
  26 time bombs
  4 false warnings

PERFORMANCE
64-bit Ubuntu (Linux)
Intel Core i7-980 3.3GHz
24GB memory
Solver timeout: 5s

PREVALENCE OF UNSTABLE CODE
All packages in the Debian Wheezy archive: 17,432
Containing C/C++ code: 8,575
Containing unstable code: 3,471 (40% of the C/C++ packages)
150 CPU-days to analyze

PREVALENCE OF UNSTABLE CODE (CONT.)
COMPLETENESS

It is difficult to know precisely how much unstable code Stack would
miss in general.
Instead, we analyze what kind of unstable code Stack misses.
A total of ten tests from real systems
Result: 7/10

Q&A