Safety Checking of Machine Code Zhichen Xu Zhichen@hpl.hp.com Internet Systems and Storage Laboratory HP Labs, Palo Alto (This is the Dissertation Work done at UW-Madison Thesis Advisors: Bart Miller and Tom Reps) ©2000-2003 Zhichen Xu UC Davis 5/2003 Motivation • Dynamic extensibility – Operating systems: custom policies, general functionality, performance • Extensible OS: exokernel, VINO, SPIN, synthetix... • Commodity OS: SLIC, kerninst, ... – Databases: type-based extensions • Illustra, informix, paradise, ... – Web browsers: plug-ins – Performance tools: measurement code • Kerninst, paradyn, ... – Other • Dyninst, ... 2 Motivation • Component-based software (Java, COM) – Software components from different vendors are combined to construct a complete application – Code from several sources with no mutual trust Safety is crucial in these domains! 3 Big Picture and Concepts • Is it safe for untrusted foreign code to be loaded into a trusted host system? Code producer Untrusted Code Data Code Host Code consumer 4 Safety Properties We Enforce • Default collection of safety conditions – – – – – – No type violations No out-of-bounds array accesses, No misaligned loads/stores, No uses of uninitialized variables, No invalid pointer dereferences, No unsafe interaction with the host Type safety • Precise and flexible host access policy – Customizable 5 Outline • Introduction – Motivation – Related work – High-level characteristics • • • • Safety-properties and policies Safety-checking analysis Experimental evaluation Conclusions and future research 6 Related Work • Dynamic Techniques: – Hardware enforced address spaces, SFI, interpretation, etc. • Hybrid Techniques – Safe languages: Java, ML, Modula 3, etc. • Static Techniques – Proof-Carrying Code [Necula, Lee] – Certifying Compiler, Typed-Assembly Language [Necula, Lee] [Morrisett, Walker, Crary, Glew, ...] [Colby, Lee, Necula, Blau] Runtime cost Potential recovery problem 7 Related Work vs Our Approach Pointer Arithmetic,... Safe C, Java, ML C, C++, Assembly, ... C, C++ Certifying Compiler Machine Code Off-the-Shelf cc, gcc, as Ordinary Binary Proof Proof Checker << Yes/No Safety Policy, Annotated Initial initial inputs Safety Checker 8 Yes/No Related Work vs Our Approach C, C++, Assembly, ... Pointer Arithmetic,... Safe C, Java, ML C, C++ Safety Policy, Initial inputs Ordinary Binary Certifying Compiler Machine Code Off-the-Shelf cc, gcc, as Proof generator Proof Ordinary Binary Proof Checker Proof Proof Checker Yes/No Yes/No 9 High-Level Characteristics • Perform safety checking on ordinary binary, mechanically synthesize (and verify) a safety proof • Extend the host at a very fine-grained level (allow the untrusted code to manipulate the internal data structures of the host directly) • Enforce host-specified access policy + type safety 10 Outline • • • • • Introduction Safety Properties and Policies Safety-Checking Analysis Experimental Evaluation Conclusion and Future Research 11 Safety Properties • Default collection of safety conditions – – – – – – No type violations No out-of-bounds array accesses, No misaligned loads/stores, No uses of uninitialized variables, No invalid pointer dereferences, No unsafe interaction with the host (Fine grained memory protection and data abstraction) • Precise and flexible host access policy – Customizable 12 Host-Specified Access Policy • Classify locations into regions As big as the entire address space, as small as a variable • [Region : Category : Access] – Category: Types, fields – Access: readable (r), Locations writable (w), followable (f), executable (e), Values operable (o) (e.g., to “copy”, to “examine”) 13 Protections Provided by Access Policy Initial inputs to the untrusted code Tree List rf rf rw call rw call rw call rw Host x Methods rw 14 Principle of “Least Privilege” • Kernel page-replacement extension – Pick a cold page from global LRU list. typedef struct _page_list { int page; ... struct _page_list * next; // read access // follow access } page_list; [Host : page_list.page : ro] [Host : page_list.next, page_list ptr : rfo] 15 Outline • Introduction • Safety Properties and Policies • Safety-Checking Analysis – An abstract storage model – Safety checking technique – An example • Experimental Evaluation • Conclusion and Future Research 16 Abstract Storage Model • Abstract Store : Abstract location Typestate – Abstract locations: • summarize one or more memory locations • readable, writable, name, size, align, ... – Typestate : properties of the values • <type, state, access> • Forms a lattice • Linear Constraints – Linear equalities, linear inequalities – Array bounds checks, alignment checks, etc. e.g., ptr != 0; 0 <=index < Length; addr mod 4=0; 17 Example and Notations • Abstract Location ptr n1 n2 ptr n3 n4 n • Abstract location typestate – n.page:<page_list.page, i, ro> – n.next: <page_list.next, {n}, rfo> – ptr:<page_list ptr, {n}, rfo> “i” for initialized, “u” for uninitialized, {n} for reference to n 18 Inputs to Our Safety Checker •Access policy: [H: ....page: ro] C, C++, Java •Typestate spec cc... init %o0 Ordinary binary init •Invocation spec (bindings) Safety Checker Yes Untrusted Trusted No (why) 19 Host-Typestate Specification • Data aspect: – Types and states of host data before the invocation of untrusted code • Control aspect – Safety pre- and post- conditions • Precondition describes the obligations that the actual parameters must meet • Postcondition provides a guarantee on the resulting state 20 Summarizing Function Calls • Placeholders: size, access permission, and typestate provide the safety requirements for actual parameters int gettimeofday (struct timeval* tp); Safety Precondition %o0: <struct timeval ptr, {null, t}, fo> t: <struct timeval, [u, u], w> Safety Postcondition t: <struct timeval, [<int, i, o>, <int, i, o>], o> %o0: <int, i, o> other registers: 21 Checking Function Calls • A binding process: – Check whether the actual abstract locations can meet the obligations specified by placeholders • An update process: – Updates the typestates of all actual locations that are represented by the placeholders according to the safety postcondition 22 Safety Checking: Phases 1: Preparation 2: Typestate Propagation 3: Annotation 4: Local Verification 5: Global Verification 23 Safety Checking: Phases 1: Preparation 2: Typestate What does each instruction do? Propagation 3: Annotation 4: Local Verification Is each instruction safe? 5: Global Verification 24 Safety Checking: Phases 1: Preparation 2: Typestate Untrusted Code Access policy Host typestate Invocation Propagation 3: Annotation 4: Local Verification 5: Global Verification 1: Preparation 1. Initial Annotation: AbsLoc->typestate, Linear Constraints 25 Safety Checking: Phases 1: Preparation 2: Typestate Untrusted Code 1. Initial Annotation Propagation 3: Annotation 4: Local Verification 5: Global Verification 2: Typestate Propagation 2. Approximation of memory contents at each program point 26 Safety Checking: Phases 1: Preparation 2: Typestate Untrusted Code 2. Approximation of memory Propagation 3. Annotation 3: Annotation 4: Local Verification 5: Global 3. Local Safety Conditions 3. Global Safety Conditions 3. Facts Verification 27 Safety Checking: Phases 1: Preparation 2: Typestate 2. Approximation of memory 3. Local Safety Conditions Propagation 3: Annotation 4: Local Verification 5: Global 4. Local Verification True/False Verification 28 Safety Checking: Phases 1: Preparation 2: Typestate Untrusted Code 3. Global Safety Conditions Propagation 3: Annotation 5a. Range Analysis 4: Local Verification 5b. Induction Iteration 5: Global Verification 3, 5a Facts True/False 29 A Bounds Checking Example – Host Typestate: a_p a_p: <int[n], {a}, _>, {n1} a: <int, i, _> – Invocation %o0 <-- a_p; %o1 <--- n – Access Policy [a_p : int[n] : rfo] [a: int : ro] a %o0 %o1 =n 1: MOV 0,%o2 2: CMP %o2, %o1 3: BGE 11 4: NOP 5: SLL %o2,2,%g2 6: LD [%o0+%g2],%g3 7: ADD %o2,1,%o2 8: CMP %o2, %o1 9: BL 5 10: ADD %o3, %g3, %o3 11: RET 12: MOV %o3, %o0 30 Phase 1: Preparation – Host Typestate: a_p a_p: <int[n], {a}, _>, {n1} a: <int, i, _> – Invocation %o0 <-- a_p; %o1 <--- n – Access Policy [a_p : int[n] : rfo] [a: int : ro] – Initial Annotation a: <int, i, or>, %o0: <int [n], {a}, rwfo>, %o1:<int, i, rwo> {%o1=n n1} a %o0 %o1 =n 1: MOV 0,%o2 2: CMP %o2, %o1 3: BGE 11 4: NOP 5: SLL %o2,2,%g2 6: LD [%o0+%g2],%g3 7: ADD %o2,1,%o2 8: CMP %o2, %o1 9: BL 5 10: ADD %o3, %g3, %o3 11: RET 12: MOV %o3, %o0 31 Phase 2: Typestate Propagation Finds out typestate of each absLoc at each program point a: <int, i, ro>, %o0: <int [n], {a}, rwfo>, %o1:<int, i, rw> {%o1 = n n 1} %o0 <int[n], {a}, <int[n], {a}, <int[n], {a}, <int[n], {a}, <int[n], {a}, <int[n], {a}, > > > > > > %o2 %g2 <type, [4], > <int, i, > <int, i, > <int, i, > <int, i, > <int, i, > < type, [4], > < type, [4], > < type, [4], > < type, [4], > < type, [4], > < int, i, rwo > 6: a : <int, i, ro> %o0: <int[n], {a}, rwo> %g2 : <int, i, rwo> 1: MOV 0,%o2 2: CMP %o2, %o1 3: BGE 11 4: NOP 5: SLL %o2,2,%g2 6: LD [%o0+%g2],%g3 7: ADD %o2,1,%o2 8: CMP %o2, %o1 9: BL 5 10: ADD %o3, %g3, %o3 11: RET 12: MOV %o3, %o0 32 Phase 3: Annotation • Find out safety requirements and facts 6: a: <int, i, ro> %o0: <int[n], {a}, rwof> %g2 : <int, i, rwo> Facts: %o0 mod 4=0 Local Safety Conditions: a: readable, operable, init,... %g3: writable Global Safety Conditions: (%o0+%g2) mod 4=0 0 <=%g2 < 4n 1: MOV 0,%o2 2: CMP %o2, %o1 3: BGE 11 4: NOP 5: SLL %o2,2,%g2 6: LD [%o0+%g2],%g3 7: ADD %o2,1,%o2 8: CMP %o2, %o1 9: BL 5 10: ADD %o3, %g3, %o3 11: RET 12: MOV %o3, %o0 33 Phase 4: Local Verification • Verify Local Safety Requirements 6: a: <int, i, ro> %o0: <int[n], {a}, rwof> %g2 : <int, i, rwo> %g3 : < type, [4], w> Local Safety Conditions: a: readable, operable, initialized %g3: writable 1: MOV 0,%o2 2: CMP %o2, %o1 3: BGE 11 4: NOP 5: SLL %o2,2,%g2 6: LD [%o0+%g2],%g3 7: ADD %o2,1,%o2 8: CMP %o2, %o1 9: BL 5 10: ADD %o3, %g3, %o3 11: RET 12: MOV %o3, %o0 34 Phase 5: Global Verification Synergy of • A symbolic range analysis • A program-Verification Technique – Induction-Iteration Method to synthesize loop invariants [Suzuki & Ishihata 1977] 35 Phase 5: Range Analysis • The ranges of the registers at each program point – – – – register [ax+by+c, a’x’+b’y’+c’] x, y, x’ and y’ are either array base or length e.g., index: [0, length-1], e.g., pointer: [base, base+length-1] • Worklist-based algorithm • Immediate widening for quick convergence • Strategy for sharpening analysis – Pick the right spots in a loop to perform widening – Consider correlation among register values 36 Phase 5: Range Analysis : Example i=0; ...a[i]... i=i+1 i< n n1 37 Phase 5: Range Analysis : Example i=0; [0,0] [0,0] ...a[i]... [1,1] i=i+1 [1,1] i< n 38 Phase 5: Range Analysis : Example i=0; [0,0] [0,0] U [1,1]=[0,1] ...a[i]... [1,1] i=i+1 [1,1][1,2]=[1,] i< n 39 Phase 5: Range Analysis : Example i=0; [0,0] [0,0] U [1,1]=[0,1] ...a[i]... [1,n-1] i=i+1 [1,1][1,2]=[1,] i< n 40 Phase 5: Range Analysis : Example i=0; [0,0] [0,0] [1,n-1]=[0,n-1] ...a[i]... [1,n-1] i=i+1 [1, ][1,n]=[1,] i< n 41 Phase 5: Result of Range Analysis %g2: [0, 4n-4] 1: MOV 0,%o2 2: CMP %o2, %o1 3: BGE 11 4: NOP 5: SLL %o2,2,%g2 6: LD [%o0+%g2],%g3 7: ADD %o2,1,%o2 8: CMP %o2, %o1 9: BL 5 10: ADD %o3, %g3, %o3 11: RET 12: MOV %o3, %o0 42 Program Verification R=wlp(S, Q) {y+1>0} S z=1 Q R is the weakest condition that if S terminates then Q is true {y+z>0} x=y+z {x>0} Verification Condition (VC) Generation 43 Program Verification Inv S Q Loop invariant: (i) Inv is true on entry (ii) Inv wlp(S, Inv) (iii) Inv wlp(S, Q) 44 Induction-Iteration Method W(0) W(1), ..., W(j) W(j+1) S Q W(0) = wlp(S, Q) W(i+1) = wlp(S, W(i)) 45 Phase 5: Result of Induction Iteration %o2<n %o1<=n 5: %o2 < n 6: %g2 < 4n 1: MOV 0,%o2 2: CMP %o2, %o1 3: BGE 11 4: NOP 5: SLL %o2,2,%g2 6: LD [%o0+%g2],%g3 7: ADD %o2,1,%o2 8: CMP %o2, %o1 9: BL 5 10: ADD %o3, %g3, %o3 11: RET 12: MOV %o3, %o0 46 Enhancements to Induction-Iteration Method • • • • Handle load/stores Handle multiple loops Handle procedure calls Introduce several strategies to speedup the induction-iteration method 47 Outline • • • • • Introduction Safety Properties and Policies Safety-Checking Analysis Experimental Evaluation Conclusions and Future Research 48 Experimental Evaluation • Test cases Array sum, start/stop timer, b-tree, kernel paging policy, hash, bubble sort, heap sort, stack-smashing, MD5, jPVM, /dev/kerninst (symbol, loggedWrites) • Summary of Results – Found safety violations in kernel policy, stack-smashing, /dev/kerninst – Verified all conditions, except for some calls in MD5, jPVM (precision lost due to inability to detect that a loop ‘kills’ all elements of an array) – Checking times vary from 0.1 to 30 seconds 49 71 95 309 315 339 358 883 16 89 16 45 36 11 6(4) 6 5 (2) 6 Btree Hash /dev/kerninst /symbol 51 jPVM Md5 Stack-smashing 41 Heap Sort 36 Heap Sort 2 25 Btree2 25 Stop Timer 22 Bubble Sort 20 Start Timer /dev/kerninst /loggedWrites Instructions 13 Paging Policy Sum Characteristics of Test Cases Branches 2 5 1 4 5 3 11 11 9 Loops (Inner) 1 2(1) 0 1 2(1) 0 2(1) 2(1) 4(2) 4(2) 7(1) 3 0 0 1(1) 1 0 2(2) 0 4 (4) 3 40 36 (40) (25) 48 (12) 13 15 (2) 16 (8) 17 39 (14) 56 84 100 (26) (42) (74) 99 (18) 116 (42) 192 121 (40) (30) C C in C++ style C++ C++ Procedure Calls (Trusted) Global Safety Conditions (Bounds Checks) Source Language 4 9 (2) C C C C C C 35 (14) C C 0 C 2 C 50 C Heap Sort /dev/kerninst /symbol /dev/kerninst /loggedWrites 0.04 0.03 0.09 0.11 0.17 0.15 0.69 3.05 4.88 15.4 5.92 Annotation 0.003 0.005 0.005 0.006 0.005 0.007 0.008 0.01 0.015 0.015 0.03 0.069 0.068 0.26 0.082 Range Analysis 0.01 0 0 0.01 0.03 0 0.03 0.04 0.08 0.12 0.54 0.24 0.68 0.95 1.24 InductionIteration 0.08 0.18 0.13 0.40 0.18 0.14 0.40 0.035 1.15 2.46 12.74 1.55 8.60 12.33 3..41 TOTAL 0.1 0.23 0.16 0.46 0.26 0.18 0.53 0.51 1.42 2.75 14.0 4.91 14.2 28.94 10.65 51 Md5 Heap Sort 2 0.04 jPVM Btree 0.02 Btree2 Stop Timer 0.05 Bubble Sort Hash 0.02 Paging Policy Typestate Propagation Sum Start Timer Stack-smashing Timing (Seconds) gi ng Po Su m St l ar i cy tT im er Ha Bu sh bb le So St rt op Ti me r Bt re e Bt re He e2 ap So He rt 2 St ap ac S ksm ort /d as /d ev hi ev ng /k /k e er rn jP ni V ns i nst /s M t/ log y ge mbo l dW ri t es M d5 Pa Timing 20 10 0 Typestate Propagation Global Verification 52 Some Benefits of Typestate System and Range Analysis • Summarizing function calls allow the analysis to use summaries of several host and library functions • Symbolic range analysis allow the system to identify boundaries of array in structure • Symbolic range analysis speeds up global verification up to 53% (with a median of 29%) 53 Speedup due to Range Analysis Range Analysis (normalized) 1.2 1 0.8 0.6 Iduction-teration (normalized) D5 M jP VM rt ksm as hi ng So St ac He ap rt 2 So 2 Bt re e rt So Bt re e He ap Bu bb le Ha sh Su m 0.4 0.2 0 54 Outline • • • • • Introduction Safety-Properties and Policies Safety-Checking Analysis Experimental Evaluation Conclusions and Future Research 55 Conclusion • A technique that works on ordinary machine code, and mechanically synthesizes (and verifies) a safety proof – Requiring only the initial inputs to the untrusted code be annotated • Extensible: – host-specified access policy – naturally extends to the checking of security properties • Experience promising 56 Limitations • Can only ensure safety properties that can be expressed using typestates + linear constraints – e.g., cannot handle nonlinear array subscripts • Induction iteration method is incomplete – e.g., generalization capability is limited • Limitations in handling of arrays – Lost precision • Inherited limitations of static techniques – Must reject code that cannot be checked statically – Otherwise, there is the recovery problem 57 Directions for Future Research • Improving the Precision of the Analyses – Developing better algorithms – Employing both static and run-time checks • Improving the Scalability of the Analysis – Employing modular checking – Employing analyses that are unsound – Producing proof-carrying code 58 Directions for Future Research • Extending the Techniques beyond Safety Checking – Checking security properties (information flow) • The access control policies we enforce is discretionary, once the data in the host is read, there is no control as to what the untrusted code can do with the data – Allows reverse engineering • Use by a run-time optimizer, a performance tool, etc. • We can forsake the most expensive part of our analysis 59 Contributions 1 Opens up the possibility of certifying code produced by offthe-shelf compilers 2 The technique is extensible 3 Extended the notion of typestate in several ways 4 Handle inheritance polymorphism implemented via physical subtyping (a new method for coping with subtyping in the presence of mutable pointers) 5 A mechanism for summarizing the effect of function calls 6 A technique to infer information about the sizes and types of stack-allocated arrays 7 A symbolic range analysis for array bounds checks 8 A prototype implementation and experimental study 60