Safety Checking of Machine Code Zhichen Xu

Safety Checking of Machine Code
Zhichen Xu
Zhichen@hpl.hp.com
Internet Systems and Storage Laboratory
HP Labs, Palo Alto
(This is the Dissertation Work done at UW-Madison
Thesis Advisors: Bart Miller and Tom Reps)
©2000-2003 Zhichen Xu
UC Davis 5/2003
Motivation
• Dynamic extensibility
– Operating systems: custom policies, general
functionality, performance
• Extensible OS: exokernel, VINO, SPIN, synthetix...
• Commodity OS: SLIC, kerninst, ...
– Databases: type-based extensions
• Illustra, informix, paradise, ...
– Web browsers: plug-ins
– Performance tools: measurement code
• Kerninst, paradyn, ...
– Other
• Dyninst, ...
2
Motivation
• Component-based software (Java, COM)
– Software components from different vendors are
combined to construct a complete application
– Code from several sources with no mutual trust
Safety is crucial in these domains!
3
Big Picture and Concepts
• Is it safe for untrusted foreign code to be loaded into a
trusted host system?
Code producer
Untrusted
Code
Data
Code
Host
Code consumer
4
Safety Properties We Enforce
• Default collection of safety conditions
– No type violations
– No out-of-bounds array accesses
– No misaligned loads/stores
– No uses of uninitialized variables
– No invalid pointer dereferences
– No unsafe interaction with the host
Type safety
• Precise and flexible host access policy
– Customizable
5
Outline
• Introduction
– Motivation
– Related work
– High-level characteristics
• Safety properties and policies
• Safety-checking analysis
• Experimental evaluation
• Conclusions and future research
6
Related Work
• Dynamic Techniques:
– Hardware-enforced address spaces, SFI, interpretation, etc.
• Hybrid Techniques
– Safe languages: Java, ML, Modula 3, etc.
• Static Techniques
– Proof-Carrying Code [Necula, Lee]
– Certifying Compiler [Necula, Lee] [Colby, Lee, Necula, Blau]
– Typed-Assembly Language [Morrisett, Walker, Crary, Glew, ...]
Runtime cost
Potential recovery problem
7
Related Work vs Our Approach
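[Diagram comparing two pipelines. Related work: source in a safe language (Safe C, Java, ML) → certifying compiler → machine code + proof → proof checker → Yes/No. Our approach: C, C++, assembly, ... (pointer arithmetic, ...) → off-the-shelf cc, gcc, as → ordinary binary; the ordinary binary plus the safety policy and annotated initial inputs → safety checker → Yes/No.]
8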
Related Work vs Our Approach
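[Diagram, refined view of the same comparison. Related work: Safe C, Java, ML → certifying compiler → machine code + proof → proof checker → Yes/No. Our approach: C, C++, assembly, ... (pointer arithmetic, ...) → off-the-shelf cc, gcc, as → ordinary binary; the ordinary binary plus the safety policy and initial inputs → proof generator → proof; the ordinary binary + proof → proof checker → Yes/No.]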
9
High-Level Characteristics
• Perform safety checking on ordinary binary,
mechanically synthesize (and verify) a safety
proof
• Extend the host at a very fine-grained level
(allow the untrusted code to manipulate the
internal data structures of the host directly)
• Enforce host-specified access policy + type
safety
10
Outline
• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusion and Future Research
11
Safety Properties
• Default collection of safety conditions
– No type violations
– No out-of-bounds array accesses
– No misaligned loads/stores
– No uses of uninitialized variables
– No invalid pointer dereferences
– No unsafe interaction with the host
(Fine grained memory protection and data abstraction)
• Precise and flexible host access policy
– Customizable
12
Host-Specified Access Policy
• Classify locations into regions
As big as the entire address space, as small as a variable
• [Region : Category : Access]
– Category: Types, fields
– Access:
• Locations: readable (r), writable (w)
• Values: followable (f), executable (e), operable (o)
(e.g., to “copy”, to “examine”)
13
Protections Provided by Access Policy
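Initial inputs to the untrusted code
[Figure: host data structures (Tree, List) and host Methods handed to the untrusted code, with nodes and edges labeled by the granted permissions (rf, rw, call).]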
14
Principle of “Least Privilege”
• Kernel page-replacement extension
– Pick a cold page from global LRU list.
typedef struct _page_list {
    int page;                  // read access
    ...
    struct _page_list * next;  // follow access
} page_list;
[Host : page_list.page : ro]
[Host : page_list.next, page_list ptr : rfo]
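The two policy entries above can be read as a lookup table from (region, category) to a set of permission letters. The following is a minimal sketch under that reading; the names POLICY and allowed are hypothetical and are not taken from the checker itself.

```python
# Hypothetical encoding of the [Region : Category : Access] triples above.
POLICY = {
    ("Host", "page_list.page"): "ro",
    ("Host", "page_list.next"): "rfo",
    ("Host", "page_list ptr"): "rfo",
}

def allowed(region, category, needed):
    """True if every permission letter in `needed` is granted by the policy."""
    granted = POLICY.get((region, category), "")  # unknown entries grant nothing
    return set(needed) <= set(granted)

if __name__ == "__main__":
    print(allowed("Host", "page_list.page", "r"))   # reading the page number
    print(allowed("Host", "page_list.page", "w"))   # overwriting it
    print(allowed("Host", "page_list.next", "f"))   # following the next pointer
```

Under this encoding the extension may read page numbers and walk the list, but any write to the list is rejected, which is the "least privilege" point of the slide.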
15
Outline
• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
– An abstract storage model
– Safety checking technique
– An example
• Experimental Evaluation
• Conclusion and Future Research
16
Abstract Storage Model
• Abstract Store : Abstract location → Typestate
– Abstract locations:
• summarize one or more memory locations
• readable, writable, name, size, align, ...
– Typestate : properties of the values
• <type, state, access>
• Forms a lattice
• Linear Constraints
– Linear equalities, linear inequalities
– Array bounds checks, alignment checks, etc.
e.g., ptr != 0; 0 <= index < Length; addr mod 4 = 0
17
Example and Notations
• Abstract location
[Figure: ptr points to a linked list with nodes n1, n2, n3, n4; a single abstract location n summarizes the nodes.]
• Abstract location → typestate
– n.page: <page_list.page, i, ro>
– n.next: <page_list.next, {n}, rfo>
– ptr: <page_list ptr, {n}, rfo>
“i” for initialized, “u” for uninitialized, {n} for reference to n
18
Inputs to Our Safety Checker
• Untrusted: ordinary binary, produced from C, C++, Java source by an ordinary compiler (cc...)
• Trusted:
– Access policy: [H: ....page: ro]
– Typestate spec
– Invocation spec (bindings), e.g., the initial contents of %o0
• Safety Checker: answers Yes or No (why)
19
Host-Typestate Specification
• Data aspect:
– Types and states of host data before the invocation of
untrusted code
• Control aspect
– Safety pre- and post- conditions
• Precondition describes the obligations that the actual
parameters must meet
• Postcondition provides a guarantee on the resulting
state
20
Summarizing Function Calls
• Placeholders: size, access permission, and typestate provide
the safety requirements for actual parameters
int gettimeofday (struct timeval* tp);
Safety Precondition
%o0: <struct timeval ptr, {null, t}, fo>
t: <struct timeval, [u, u], w>
Safety Postcondition
t: <struct timeval, [<int, i, o>, <int, i, o>], o>
%o0: <int, i, o>
other registers: 
21
Checking Function Calls
• A binding process:
– Check whether the actual abstract locations can meet the obligations specified by placeholders
• An update process:
– Update the typestates of all actual locations that are represented by the placeholders according to the safety postcondition
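The binding and update steps can be illustrated with the gettimeofday summary from the previous slide. This is only a sketch: the dictionary layout and the function name check_gettimeofday are assumptions made for illustration, and the sketch refuses calls whose pointer may refer to more than one location instead of modelling how the real checker updates several candidates.

```python
def check_gettimeofday(store):
    """Bind %o0 against the precondition, then apply the postcondition in place."""
    ptr = store["%o0"]
    # Binding: %o0 must be a followable, operable pointer to struct timeval.
    if ptr["type"] != "struct timeval ptr" or not set("fo") <= set(ptr["access"]):
        return False
    targets = [t for t in ptr["state"] if t != "null"]   # null is permitted
    if len(targets) > 1:
        return False   # sketch only: no treatment of multiple possible targets
    for name in targets:
        loc = store[name]
        # The placeholder t must be a writable struct timeval.
        if loc["type"] != "struct timeval" or "w" not in loc["access"]:
            return False
    # Update: both fields of the actual location become initialized,
    # and %o0 now holds the integer return value.
    for name in targets:
        store[name]["state"] = ["i", "i"]
    store["%o0"] = {"type": "int", "state": "i", "access": "o"}
    return True

if __name__ == "__main__":
    store = {
        "%o0": {"type": "struct timeval ptr", "state": ["null", "tv"], "access": "rwfo"},
        "tv": {"type": "struct timeval", "state": ["u", "u"], "access": "rw"},
    }
    print(check_gettimeofday(store), store["tv"]["state"])
```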
22
Safety Checking: Phases
1: Preparation
2: Typestate Propagation
3: Annotation
4: Local Verification
5: Global Verification
23
Safety Checking: Phases
1: Preparation
2: Typestate Propagation (What does each instruction do?)
3: Annotation
4: Local Verification (Is each instruction safe?)
5: Global Verification
24
Safety Checking: Phases
Phase 1: Preparation
Inputs: untrusted code, access policy, host typestate, invocation
Output: initial annotation (AbsLoc -> typestate, linear constraints)
25
Safety Checking: Phases
Phase 2: Typestate Propagation
Inputs: untrusted code, initial annotation (from phase 1)
Output: approximation of memory contents at each program point
26
Safety Checking: Phases
Phase 3: Annotation
Inputs: untrusted code, approximation of memory (from phase 2)
Outputs: local safety conditions, global safety conditions, facts
27
Safety Checking: Phases
Phase 4: Local Verification
Inputs: approximation of memory (from phase 2), local safety conditions (from phase 3)
Output: True/False
28
Safety Checking: Phases
Phase 5: Global Verification
5a. Range Analysis
5b. Induction Iteration
Inputs: untrusted code, global safety conditions (from phase 3), facts (from phases 3 and 5a)
Output: True/False
29
A Bounds Checking Example
– Host Typestate:
a_p: <int[n], {a}, _>, {n ≥ 1}
a: <int, i, _>
– Invocation
%o0 <-- a_p; %o1 <-- n
– Access Policy
[a_p : int[n] : rfo] [a: int : ro]
[Figure: %o0 holds a_p, which refers to the array elements summarized by a; %o1 = n]
1: MOV 0,%o2
2: CMP %o2, %o1
3: BGE 11
4: NOP
5: SLL %o2,2,%g2
6: LD [%o0+%g2],%g3
7: ADD %o2,1,%o2
8: CMP %o2, %o1
9: BL 5
10: ADD %o3, %g3, %o3
11: RET
12: MOV %o3, %o0
30
Phase 1: Preparation
– Host Typestate:
a_p: <int[n], {a}, _>, {n ≥ 1}
a: <int, i, _>
– Invocation
%o0 <-- a_p; %o1 <-- n
– Access Policy
[a_p : int[n] : rfo] [a: int : ro]
– Initial Annotation
a: <int, i, ro>
%o0: <int[n], {a}, rwfo>
%o1: <int, i, rwo>
{%o1 = n ∧ n ≥ 1}
31
Phase 2: Typestate Propagation
Finds the typestate of each absLoc at each program point
Initial annotation: a: <int, i, ro>, %o0: <int[n], {a}, rwfo>, %o1: <int, i, rwo>, {%o1 = n ∧ n ≥ 1}
Typestates at program points 1 to 6 (entries as shown on the slide):
Point   %o0               %o2              %g2
1       <int[n], {a}, >   <type, [4], >    <type, [4], >
2       <int[n], {a}, >   <int, i, >       <type, [4], >
3       <int[n], {a}, >   <int, i, >       <type, [4], >
4       <int[n], {a}, >   <int, i, >       <type, [4], >
5       <int[n], {a}, >   <int, i, >       <type, [4], >
6       <int[n], {a}, >   <int, i, >       <int, i, rwo>
At instruction 6:
a: <int, i, ro>
%o0: <int[n], {a}, rwfo>
%g2: <int, i, rwo>
32
Phase 3: Annotation
• Find out safety requirements and facts
At instruction 6 (LD [%o0+%g2],%g3):
a: <int, i, ro>
%o0: <int[n], {a}, rwfo>
%g2: <int, i, rwo>
Facts: %o0 mod 4 = 0
Local Safety Conditions:
a: readable, operable, initialized, ...
%g3: writable
Global Safety Conditions:
(%o0 + %g2) mod 4 = 0
0 <= %g2 < 4n
33
Phase 4: Local Verification
• Verify Local Safety Requirements
At instruction 6 (LD [%o0+%g2],%g3):
a: <int, i, ro>
%o0: <int[n], {a}, rwfo>
%g2: <int, i, rwo>
%g3: < type, [4], w>
Local Safety Conditions:
a: readable, operable, initialized
%g3: writable
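As a rough illustration of what the local check for the load at instruction 6 amounts to, the sketch below tests the listed conditions against typestate triples. The tuple encoding, the function name local_check_load, and the "unknown" placeholders for the type and state of %g3 are assumptions made here, not the checker's data structures.

```python
def local_check_load(src, dst):
    """Return each local safety condition of a load with its verdict.

    src and dst are (type, state, access) triples for the loaded location
    and the destination register.
    """
    _src_type, src_state, src_access = src
    _dst_type, _dst_state, dst_access = dst
    return {
        "source readable": "r" in src_access,
        "source operable": "o" in src_access,
        "source initialized": src_state == "i",
        "destination writable": "w" in dst_access,
    }

if __name__ == "__main__":
    a = ("int", "i", "ro")             # a: <int, i, ro>
    g3 = ("unknown", "unknown", "w")   # only the access field of %g3 matters here
    for condition, ok in local_check_load(a, g3).items():
        print(condition, ok)
```

Each condition depends only on the typestates at this one instruction, which is why this phase needs no loop reasoning; the bounds and alignment conditions are left to global verification.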
34
Phase 5: Global Verification
Synergy of
• A symbolic range analysis
• A program-verification technique
– Induction-Iteration Method to synthesize loop
invariants [Suzuki & Ishihata 1977]
35
Phase 5: Range Analysis
• The ranges of the registers at each program point
– register ∈ [ax+by+c, a’x’+b’y’+c’]
– x, y, x’ and y’ are either array base or length
– e.g., index: [0, length-1]
– e.g., pointer: [base, base+length-1]
• Worklist-based algorithm
• Immediate widening for quick convergence
• Strategy for sharpening analysis
– Pick the right spots in a loop to perform widening
– Consider correlation among register values
36
Phase 5: Range Analysis : Example
Assume n ≥ 1
i=0;
...a[i]...
i=i+1
i<n (branch back to ...a[i]...)
37
Phase 5: Range Analysis : Example
i=0;
  loop entry: [0,0]
  loop head: [0,0]
...a[i]...
i=i+1
  after increment: [1,1]
i<n
  back edge: [1,1]
38
Phase 5: Range Analysis : Example
i=0;
  loop entry: [0,0]
  loop head: [0,0] ∪ [1,1] = [0,1]
...a[i]...
i=i+1
  after increment (widening): [1,1] ∇ [1,2] = [1,∞]
i<n
  back edge: [1,1]
39
Phase 5: Range Analysis : Example
i=0;
  loop entry: [0,0]
  loop head: [0,0] ∪ [1,1] = [0,1]
...a[i]...
i=i+1
  after increment (widening): [1,1] ∇ [1,2] = [1,∞]
i<n
  back edge: [1,n-1]
40
Phase 5: Range Analysis : Example
i=0;
  loop entry: [0,0]
  loop head: [0,0] ∪ [1,n-1] = [0,n-1]
...a[i]...
i=i+1
  after increment (widening): [1,∞] ∇ [1,n] = [1,∞]
i<n
  back edge: [1,n-1]
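The iteration shown on these slides can be replayed with a small interval analysis. The sketch below is an illustration under stated assumptions, not the checker's algorithm: it uses a concrete n where the slides use symbolic bounds, it widens at the increment as in the example, and the names join, widen, and analyze are made up for this sketch.

```python
import math

def join(a, b):
    """Smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    """Immediate widening: any bound that moved jumps straight to infinity."""
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)

def analyze(n):
    """Ranges of i at the loop head and after i = i + 1, for a concrete n >= 1."""
    entry = (0, 0)                       # i = 0
    head, after_inc = entry, None
    while True:
        inc = (head[0] + 1, head[1] + 1)                 # effect of i = i + 1
        new_after = inc if after_inc is None else widen(after_inc, inc)
        back = (new_after[0], min(new_after[1], n - 1))  # back edge taken when i < n
        # An empty back-edge interval means the loop body runs only once.
        new_head = join(entry, back) if back[0] <= back[1] else entry
        if (new_head, new_after) == (head, after_inc):
            return head, after_inc
        head, after_inc = new_head, new_after

if __name__ == "__main__":
    print(analyze(10))
```

The branch condition i < n is what recovers the finite upper bound n-1 at the loop head after widening has pushed the post-increment range to infinity.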
41
Phase 5: Result of Range Analysis
%g2: [0, 4n-4]
42
Program Verification
R = wlp(S, Q)
R is the weakest condition such that, if S terminates, then Q is true
Verification Condition (VC) Generation:
{y+1>0}
z=1
{y+z>0}
x=y+z
{x>0}
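For straight-line assignments, wlp can be computed by substituting the right-hand side for the assigned variable, working backwards from the postcondition. The sketch below reproduces the example above for conditions of the form "linear expression > 0"; the dictionary encoding (with the key "1" standing for the constant term) and the names subst and wlp are assumptions made for illustration.

```python
def subst(expr, var, repl):
    """Replace `var` in the linear expression `expr` by the expression `repl`.

    Expressions map variable names to coefficients; key "1" is the constant.
    """
    out = {k: v for k, v in expr.items() if k != var}
    coef = expr.get(var, 0)
    for k, v in repl.items():
        out[k] = out.get(k, 0) + coef * v
    return {k: v for k, v in out.items() if v != 0}

def wlp(assignments, post):
    """wlp of a sequence of assignments w.r.t. the condition `post > 0`."""
    for var, rhs in reversed(assignments):
        post = subst(post, var, rhs)
    return post

if __name__ == "__main__":
    program = [
        ("z", {"1": 1}),            # z = 1
        ("x", {"y": 1, "z": 1}),    # x = y + z
    ]
    # Postcondition x > 0; the result encodes y + 1 > 0.
    print(wlp(program, {"x": 1}))
```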
43
Program Verification
[Diagram: a loop with invariant Inv at its head, body S, and postcondition Q]
Loop invariant:
(i) Inv is true on entry
(ii) Inv ⇒ wlp(S, Inv)
(iii) Inv ⇒ wlp(S, Q)
44
Induction-Iteration Method
W(0) = wlp(S, Q)
W(i+1) = wlp(S, W(i))
W(0) ∧ W(1) ∧ ... ∧ W(j) ⇒ W(j+1)
[Diagram: a loop with body S and postcondition Q]
45
Phase 5: Result of Induction Iteration
%o2 < n ∧ %o1 <= n
5: %o2 < n
6: %g2 < 4n
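The reported formula can be sanity-checked against the loop-invariant conditions by bounded enumeration. The sketch below is not the induction-iteration method and proves nothing beyond the enumerated range; it only tests, for small integers, that the formula holds on loop entry, is preserved along the back edge of instructions 5 to 9, and implies the upper bound on %g2. The names step, strong, and is_inductive are hypothetical.

```python
def step(o1, o2):
    """One pass through instructions 5-9: returns (new %o2, back edge taken?)."""
    o2 = o2 + 1              # 7: ADD %o2,1,%o2
    return o2, o2 < o1       # 8-9: CMP %o2,%o1 ; BL 5

def strong(o1, o2, n):
    """The formula from the slide: %o2 < n and %o1 <= n."""
    return o2 < n and o1 <= n

def is_inductive(inv, limit=12):
    for n in range(1, limit):
        if not inv(n, 0, n):             # entry: %o2 = 0, %o1 = n, n >= 1
            return False
        for o1 in range(-limit, limit):
            for o2 in range(-limit, limit):
                if not inv(o1, o2, n):
                    continue
                if not 4 * o2 < 4 * n:   # upper bound for %g2 = 4 * %o2
                    return False
                new_o2, taken = step(o1, o2)
                if taken and not inv(o1, new_o2, n):
                    return False         # not preserved along the back edge
    return True

if __name__ == "__main__":
    print(is_inductive(strong))
    print(is_inductive(lambda o1, o2, n: o2 < n))
```

The second call shows why the conjunct %o1 <= n is needed: %o2 < n on its own is not preserved by the loop body when %o1 may exceed n.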
46
Enhancements to Induction-Iteration
Method
• Handle loads/stores
• Handle multiple loops
• Handle procedure calls
• Introduce several strategies to speed up the induction-iteration method
47
Outline
• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusions and Future Research
48
Experimental Evaluation
• Test cases
Array sum, start/stop timer, b-tree, kernel paging policy,
hash, bubble sort, heap sort, stack-smashing, MD5, jPVM,
/dev/kerninst (symbol, loggedWrites)
• Summary of Results
– Found safety violations in kernel policy, stack-smashing,
/dev/kerninst
– Verified all conditions, except for some calls in MD5,
jPVM (precision lost due to inability to detect that a loop
‘kills’ all elements of an array)
– Checking times vary from 0.1 to 30 seconds
49
Characteristics of Test Cases
Test cases (15): Btree, Btree2, Bubble Sort, /dev/kerninst/loggedWrites, /dev/kerninst/symbol, Hash, Heap Sort, Heap Sort 2, jPVM, Md5, Paging Policy, Stack-smashing, Start Timer, Stop Timer, Sum
Per-test-case values, in the table's column order:
Instructions: 13, 20, 22, 25, 25, 36, 41, 51, 71, 95, 309, 315, 339, 358, 883
Branches: 2, 5, 1, 4, 5, 3, 11, 11, 9, 16, 89, 16, 45, 36, 11
Loops (Inner): 1, 2(1), 0, 1, 2(1), 0, 2(1), 2(1), 4(2), 4(2), 7(1), 6(4), 6, 5(2), 6
Source Language: C (12 test cases), C in C++ style (1), C++ (2)
[The column positions of the test-case names, and of the values in the two rows below, did not survive in this copy of the table.]
Procedure Calls (Trusted): 3, 0, 0, 1(1), 1, 0, 2(2), 0, 4(4), 3, 4, 9(2), 35(14), 0, 2
Global Safety Conditions (Bounds Checks): 40(40), 36(25), 48(12), 13, 15(2), 16(8), 17, 39(14), 56(26), 84(42), 100(74), 99(18), 116(42), 192(40), 121(30)
50
Timing (Seconds)
Per-test-case times, in the table's column order:
Typestate Propagation: 0.02, 0.05, 0.02, 0.04, 0.04, 0.03, 0.09, 0.11, 0.17, 0.15, 0.69, 3.05, 4.88, 15.4, 5.92
Annotation: 0.003, 0.005, 0.005, 0.006, 0.005, 0.007, 0.008, 0.01, 0.015, 0.015, 0.03, 0.069, 0.068, 0.26, 0.082
Range Analysis: 0.01, 0, 0, 0.01, 0.03, 0, 0.03, 0.04, 0.08, 0.12, 0.54, 0.24, 0.68, 0.95, 1.24
Induction-Iteration: 0.08, 0.18, 0.13, 0.40, 0.18, 0.14, 0.40, 0.035, 1.15, 2.46, 12.74, 1.55, 8.60, 12.33, 3.41
TOTAL: 0.1, 0.23, 0.16, 0.46, 0.26, 0.18, 0.53, 0.51, 1.42, 2.75, 14.0, 4.91, 14.2, 28.94, 10.65
51
Timing
[Figure: bar chart of checking time per test case (axis from 0 to 20 seconds), split into Typestate Propagation and Global Verification.]
52
Some Benefits of Typestate
System and Range Analysis
• Summarizing function calls allows the analysis to use summaries of several host and library functions
• Symbolic range analysis allows the system to identify the boundaries of an array within a structure
• Symbolic range analysis speeds up global verification by up to 53% (with a median of 29%)
53
Speedup due to Range Analysis
[Figure: bar chart with series "Range Analysis (normalized)" and "Induction-Iteration (normalized)", y-axis from 0 to 1.2, for the test cases Sum, Hash, Bubble Sort, Btree, Btree2, Heap Sort, Heap Sort 2, Stack-smashing, jPVM, and MD5.]
54
Outline
• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusions and Future Research
55
Conclusion
• A technique that works on ordinary machine code,
and mechanically synthesizes (and verifies) a
safety proof
– Requiring only the initial inputs to the untrusted code be
annotated
• Extensible:
– host-specified access policy
– naturally extends to the checking of security properties
• Experience promising
56
Limitations
• Can only ensure safety properties that can be
expressed using typestates + linear constraints
– e.g., cannot handle nonlinear array subscripts
• Induction iteration method is incomplete
– e.g., generalization capability is limited
• Limitations in handling of arrays
– Lost precision
• Inherited limitations of static techniques
– Must reject code that cannot be checked statically
– Otherwise, there is the recovery problem
57
Directions for Future Research
• Improving the Precision of the Analyses
– Developing better algorithms
– Employing both static and run-time checks
• Improving the Scalability of the Analysis
– Employing modular checking
– Employing analyses that are unsound
– Producing proof-carrying code
58
Directions for Future Research
• Extending the Techniques beyond Safety Checking
– Checking security properties (information flow)
• The access-control policy we enforce is discretionary: once data in the host has been read, there is no control over what the untrusted code can do with that data
– Allows reverse engineering
• Use by a run-time optimizer, a performance tool, etc.
• We can forsake the most expensive part of our
analysis
59
Contributions
1 Opens up the possibility of certifying code produced by off-the-shelf compilers
2 The technique is extensible
3 Extended the notion of typestate in several ways
4 Handle inheritance polymorphism implemented via physical
subtyping (a new method for coping with subtyping in the
presence of mutable pointers)
5 A mechanism for summarizing the effect of function calls
6 A technique to infer information about the sizes and types
of stack-allocated arrays
7 A symbolic range analysis for array bounds checks
8 A prototype implementation and experimental study
60