A Trustworthy Proof Checker
Andrew W. Appel, Neophytos G. Michael, Roberto Virga (Princeton University)
Aaron Stump (Stanford University)
A trustworthy proof checker for proofs of properties of machine-code programs.
FCS & VERIFY, July 2002
6/21/2016

Trusted Computing Base
[Figure: two stacks, each resting on its trusted base.]
- Theorem: $a^n + b^n \neq c^n$, supported by a Proof, resting on Axioms
- Operating System: gcc, emacs, netscape, rogomatic, make, resting on the Kernel

The problem: Mobile Code Security
Code Producer: Source Program -> Compiler -> Code
Code Consumer: Execute
    load  r3, 4(r2)
    add   r2,r4,r1
    store 1, 0(r7)
    store r1, 4(r7)
    add   r7,0,r3
    add   r7,8,r7
    beq   r3, .-20
Resources in question (?): private files, network access, launch control, etc.

Existing Practice: Hardware VM protection
Code Producer: Source Program -> Compiler -> Machine Code (same listing as above)
Code Consumer: Execute, with operating-system virtual memory guarding the protected resources
Disadvantages:
- Large trusted code base of the O.S.
- Clumsy, slow interfaces between trusted & untrusted code

Existing Practice: Bytecode Verification
Code Producer: Java Program -> Compiler -> ByteCode
Code Consumer: Bytecode Verifier -> OK -> Just-in-time Compiler -> Native code -> Execute
Advantage: clean, fast, O-O interface between trusted & untrusted code
Disadvantage: huge trusted computing base (the JIT)

Foundational Proof-Carrying Code
Code Producer: Source Program -> Compiler -> Native Code (same listing as above) + Hints
               Hints + Machine Spec + Policy -> Prover -> Safety Proof (a nested term of inference-rule applications)
Code Consumer: Native Code + Safety Proof -> Checker -> OK -> Execute
Trusted Computing Base: Machine Spec + Policy, Checker

Trusted Computing Base
The minimal set of code that must be trusted.
Our goal: make the TCB as small as possible.
The TCB consists of two pieces:
- The safety policy (a predicate in Higher-Order Logic that characterizes whether a program is safe to execute)
- The proof checker (a small C program that checks safety proofs)

Trusted Computing Base (cont.)
1. Safety Policy
   a) Choose a logical framework (programming language for logic): we choose LF
   b) Choose an object logic (axioms, inference rules): we choose Higher-Order Logic
   c) Represent our theorem in the object logic: we will explain...
2. Proof Checking
   Build a proof checker for the logical framework: we use Twelf to prove theorems, but for checking we want something smaller and simpler

LF, Twelf, and Higher-Order Logic
What is LF? (Harper et al. 1993)
- A Logical Framework for defining and presenting logics
- Based on a general treatment of syntax, rules, and proofs by means of a typed first-order λ-calculus
- Its type system has three levels of terms:
  - Objects
  - Types, which classify objects
  - Kinds, which classify families of types
- Equality is taken as βη-conversion
- The judgments-as-types principle
We use the Twelf implementation of LF (Pfenning et al. 99).
We implement a standard HOL with arithmetic.

Programming in Twelf
Define formula constructors (an LF signature):
    num  : type.
    form : type.
    imp  : form -> form -> form.
    ...
Define proof constructors (axioms):
    pf    : form -> type.
    imp_i : (pf A -> pf B) -> pf (A imp B).
    imp_e : pf (A imp B) -> pf A -> pf B.
    ...

Theorems, proof checking in HOL
Proof of logical transitivity:
    imp_trans : pf (A imp B) -> pf (B imp C) -> pf (A imp C) =
      [p1 : pf (A imp B)] [p2 : pf (B imp C)]
        imp_i [p3 : pf A] imp_e p2 (imp_e p1 p3).
This shows the general form of a Twelf definition:
    name : type = exp.
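To make the idea of checking a proof term concrete, here is a minimal Python sketch of a checker for the two rules imp_i and imp_e. It is an illustration only, not the authors' checker and not Twelf: the tuple encoding of formulas and proofs, the explicit antecedent carried by imp_i, and the names `imp`, `check`, and `hyp` are all assumptions made for this example.

```python
# Formulas: an atom is a string; an implication is ("imp", A, B).
# Proof terms (hypothetical encoding):
#   ("hyp", name)             use a hypothesis from the context
#   ("imp_i", name, A, body)  assume name : pf A, prove B, conclude A imp B
#   ("imp_e", p, q)           from p : pf (A imp B) and q : pf A conclude B

def imp(a, b):
    return ("imp", a, b)

def check(proof, ctx=None):
    """Return the formula established by `proof`, or raise ValueError."""
    ctx = ctx or {}
    tag = proof[0]
    if tag == "hyp":
        _, name = proof
        if name not in ctx:
            raise ValueError(f"unbound hypothesis {name}")
        return ctx[name]
    if tag == "imp_i":
        _, name, a, body = proof
        b = check(body, {**ctx, name: a})   # check the body under the new assumption
        return imp(a, b)
    if tag == "imp_e":
        _, p, q = proof
        f = check(p, ctx)
        a = check(q, ctx)
        if not (isinstance(f, tuple) and f[0] == "imp" and f[1] == a):
            raise ValueError("imp_e: argument does not match the antecedent")
        return f[2]
    raise ValueError(f"unknown rule {tag}")

# The body of imp_trans, with p1 and p2 supplied as hypotheses.
imp_trans = ("imp_i", "p3", "A",
             ("imp_e", ("hyp", "p2"),
                       ("imp_e", ("hyp", "p1"), ("hyp", "p3"))))
ctx = {"p1": imp("A", "B"), "p2": imp("B", "C")}
print(check(imp_trans, ctx))   # ('imp', 'A', 'C')
```

The checker never searches for a proof; it only confirms that each rule application fits its premises, which is why it can stay small.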

The safety policy
“This program accesses memory only in range 0-1000”
“This program never executes an illegal instruction.”
Step I: define access predicates
    $\mathrm{readable}(x) = 0 \le x \le 1000$
    $\mathrm{writable}(x) = 0 \le x \le 1000$
Step II: define legal instructions ...

Machine states, step relation
Machine State = Register bank + memory
$(r,m) \mapsto (r',m')$: the step relation is a map between machine states
[Figure: state (r, m) with registers 0, 1, 2, 3, psr, pc and memory m, next to its successor (r', m'); pc advances from 7 to 8.]

Machine instruction = step relation
add r1 := r2 + r3
    $m' = m$,
    $r'(1) = r(2) + r(3)$,
    $r'(\mathrm{pc}) = 1 + r(\mathrm{pc})$,
    $\forall i.\ i \neq 1 \land i \neq \mathrm{pc} \Rightarrow r'(i) = r(i)$
[Figure: register banks r and r' before and after the add; pc advances from 7 to 8.]

Instruction decoding; memory policy
load ri := m(rj + k)
Instruction word w, fields | op | d | s1 | s2 | = | 3 | i | j | k |, stored at m(r(pc))
$(r,m) \mapsto (r',m') \equiv$
    $\exists w,i,j,k.\ m(r(\mathrm{pc})) = w$
    $\land\ w = 3 \cdot 2^{12} + i \cdot 2^{8} + j \cdot 2^{4} + k$
    $\land\ m' = m$
    $\land\ \mathrm{readable}(r(j) + k)$
    $\land\ r'(i) = m(r(j) + k)$
    $\land\ r'(\mathrm{pc}) = 1 + r(\mathrm{pc})$
    $\land\ \forall x.\ x \neq i \land x \neq \mathrm{pc} \Rightarrow r'(x) = r(x)$
    $\lor\ (\ldots) \lor (\ldots) \lor \ldots$
Each further disjunct describes one more legal instruction.

Making the specification concise & trustworthy
Described in [Michael & Appel 2000]:
- Separate syntax from semantics
- Factor the semantics
- Use the “New Jersey Machine-Code Toolkit” (NJMCT) to describe syntax
- Automatically translate NJMCT descriptions into concise and readable higher-order logic

Specifying safe execution
The step relation $\mapsto$ includes only the legal instructions.
Safety means, “no matter how many instructions you execute, the next instruction is legal.”
The program is meant to be loaded at some start address:
    $\mathrm{loaded}(m, start, prog) = \forall i \in \mathrm{dom}(prog).\ m(start + i) = prog(i)$
Example: loaded(m, 100, (9017;4214;8099;4010;6231;1008))
[Figure: memory holding 9017, 4214, 8099, 4010, 6231, 1008 starting at address 100.]

Safety theorem
    $\mathrm{safe}(prog) = \forall r,m,start.\ \mathrm{loaded}(m,start,prog) \land r(\mathrm{pc}) = start \Rightarrow$
        $\forall r',m'.\ (r,m) \mapsto^{*} (r',m') \Rightarrow \exists r'',m''.\ (r',m') \mapsto (r'',m'')$
Theorem to be proved: safe(9017;4214;8099;4010;6231;1008)
[Figure: registers r with pc = start; memory m holding the program from start onward, remaining cells unknown (?).]
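The decoding formula, the step relation, and the loaded predicate can be sketched in executable form. The Python below is a toy model under stated assumptions, not the Sparc specification from the talk: it models only the load instruction, assumes 4-bit fields (suggested by the powers of two in the encoding), treats unmapped memory as 0, and replaces the unbounded safety quantifier by a bounded check. The names `encode_load`, `decode`, `step`, and `safe_for` are hypothetical.

```python
PC = "pc"

def readable(x):
    return 0 <= x <= 1000

def encode_load(i, j, k):
    """w = 3*2^12 + i*2^8 + j*2^4 + k, the encoding of load ri := m(rj + k)."""
    return 3 * 2**12 + i * 2**8 + j * 2**4 + k

def decode(w):
    """Split a word into (op, i, j, k), assuming 4-bit i, j, k fields."""
    return w >> 12, (w >> 8) & 0xF, (w >> 4) & 0xF, w & 0xF

def step(r, m):
    """Return the successor state (r', m'), or None if no legal instruction applies."""
    op, i, j, k = decode(m.get(r[PC], 0))
    if op == 3:                          # load ri := m(rj + k)
        addr = r[j] + k
        if not readable(addr):
            return None                  # the memory policy forbids this access
        r2 = dict(r)                     # all other registers are unchanged
        r2[i] = m.get(addr, 0)
        r2[PC] = r[PC] + 1
        return r2, m                     # m' = m
    return None                          # every other word is illegal in this toy model

def loaded(m, start, prog):
    return all(m.get(start + i) == w for i, w in enumerate(prog))

def safe_for(r, m, n):
    """Bounded stand-in for safety: the first n steps all find a legal instruction."""
    for _ in range(n):
        nxt = step(r, m)
        if nxt is None:
            return False
        r, m = nxt
    return True

prog = [encode_load(1, 2, 0), encode_load(3, 1, 4)]
m = {100: prog[0], 101: prog[1], 7: 20, 24: 99}
r = {n: 0 for n in range(16)}
r[2] = 7
r[PC] = 100
print(loaded(m, 100, prog))   # True
print(safe_for(r, m, 2))      # True
print(safe_for(r, m, 3))      # False: the third fetch finds no legal instruction
```

The last line shows why a straight-line program that simply runs off its end is not safe in the sense of the theorem: safety demands a legal next instruction after every reachable state.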

Size of Safety Specification (Sparc)
    Safety Specification   Lines   Definitions   Share of lines
    Logic                    135            61               7%
    Arithmetic               160            94               9%
    Machine Syntax           460           334              25%
    Machine Semantics       1005           692              53%
    Safety Predicate         105            25               6%
    Total                   1865          1206

Representation Issues in the Specification
- Eliminating Redundancy in LF terms
- Dealing with Arithmetic
- Representation of Axioms and Trusted Definitions:
  - Encoding Higher-Order Logic in LF
  - Polymorphic programming in Twelf
  - Explicit versus implicit programming in Twelf
  - Avoiding term reconstruction

Eliminating Redundancy
LF signatures contain lots of redundant information:
    imp_i : {A: form}{B: form} (pf A -> pf B) -> pf (A imp B).
Twelf’s answer: parameters can be “declared” implicit:
    imp_i : (pf A -> pf B) -> pf (A imp B).
Implicit parameters in the TCB would mean type reconstruction in the checker:
- The algorithm is large and complex
- It relies on higher-order unification, which is undecidable (some valid proofs may fail)

Eliminating Redundancy (cont.)
On the TCB side:
- We write axioms & trusted definitions in fully explicit style
On the proving side:
- Implicit versus explicit LF term sizes
- Other approaches to this problem: Necula’s LFi, oracle-based checking
- We represent proofs as DAGs with structure sharing of common sub-expressions:
  - Proof-size blowup is avoided
  - The checker does not need to parse proofs
  - But the constant factor is not so good
A tradeoff: TCB size versus proof size

Term Reconstruction in the Prover
- Twelf’s term reconstruction algorithm (a.k.a. “type inference”) is extremely useful in writing proofs
- Outside the TCB, write “compatibility lemmas” to interface with proofs that are written in implicit style
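The effect of structure sharing on explicit terms can be illustrated with a small hash-consing sketch in Python. This is an assumption-laden toy, not the representation used by the authors' tools: terms are nested tuples, and `tree_size` and `to_dag` are names invented for this example. It compares the size of a term written out as a tree with the number of distinct nodes once equal sub-expressions are shared.

```python
def tree_size(t):
    """Number of nodes when the term is written out in full as a tree."""
    if isinstance(t, str):
        return 1
    return 1 + sum(tree_size(c) for c in t[1:])

def to_dag(t, table):
    """Intern t bottom-up and return its node index.

    `table` maps (operator, child indices...) to a node index, so two
    structurally equal sub-terms always receive the same index.
    """
    if isinstance(t, str):
        key = (t,)
    else:
        key = (t[0],) + tuple(to_dag(c, table) for c in t[1:])
    if key not in table:
        table[key] = len(table)
    return table[key]

# A term in which every level repeats the level below it twice,
# much as fully explicit LF terms repeat their type arguments.
term = "A"
for _ in range(10):
    term = ("imp", term, term)

table = {}
to_dag(term, table)
print(tree_size(term), len(table))   # 2047 11
```

The tree form doubles with each level, while the shared form grows by one node per level, which is the blowup the DAG representation is meant to avoid.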

The Proof Checker
- A small C program (~803 lines, 1/3 of the TCB)
- Type checks explicit LF proofs, and loads and executes only safe programs
- Makes no use of libraries except read and _exit
    Component          Lines   Share
    Error Messaging       14      2%
    Input/Output          29      4%
    Parser               428     52%
    DAG creation         111     14%
    Type checking        167     21%
    Main program          54      7%
    Total                803

Why do we need a parser?
- Not for proofs: they are transmitted to the checker in DAG form
- For axioms! Humans can’t read axioms and trusted definitions in DAG form, and therefore can’t trust them (see Pollack ‘98, “How to believe a machine-checked proof”)

DAG representation of proofs & types
Each DAG node is 5 words:
    op      opcode
    arg1    left child
    arg2    right child
    type    computed type
    match   weak head normal form
The entire DAG is transmitted as a single block.

Proof-checking measurements
- In the paper, we report a time of 74 seconds to check a benchmark proof (~6,000 lines)
- We have improved this to 0.48 seconds:
  - The checker marks closed terms
  - It avoids traversing closed terms during substitutions
  - This adds 20 lines to the proof checker
DAG node with the added closed flag: op | cl | arg1 | arg2 | type | match

Smallest possible TCB
[Figure: bar chart of TCB size in 1000s of lines (axis 0 to 140), split into “Compiler or Checker” and “Core Runtime”, for FPCC, SpecialJ, BulletTrain, and Kaffe.]

Future Work
- Machine descriptions for other CPUs (Mips, Sparc so far)
- The TCB is really small but proof sizes are large. Work on finding the right tradeoff between TCB size and proof size:
  - Compress the DAG in some way
  - Use another compressed form of the LF syntactic notation
  - Add a simple Prolog interpreter to the TCB that “rediscovers” the proof based on the sequence of TAL instructions given to the checker
    - The TCB is no longer minimal, but proof sizes are greatly reduced
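The closed-term optimization from the measurements slide can be sketched as follows. This Python model is a guess at the idea, not the checker's C code: it assumes de Bruijn indices for bound variables, stores a `depth` field (one more than the largest free index) from which the closed flag is derived, and performs plain substitution of a closed term rather than full β-reduction. The class `Node`, the function `subst`, and the `stats` counter are hypothetical.

```python
class Node:
    """A term node: ("var", n), ("const", name), ("lam", body) or ("app", f, a)."""

    def __init__(self, op, arg1=None, arg2=None):
        self.op, self.arg1, self.arg2 = op, arg1, arg2
        # depth = 1 + largest free de Bruijn index, or 0 if there is none
        if op == "var":
            self.depth = arg1 + 1
        elif op == "const":
            self.depth = 0
        elif op == "lam":
            self.depth = max(arg1.depth - 1, 0)   # the binder captures index 0
        else:                                     # "app"
            self.depth = max(arg1.depth, arg2.depth)

    @property
    def closed(self):
        return self.depth == 0

def subst(t, k, s, stats):
    """Replace variable k in t by the closed term s, skipping closed sub-terms."""
    if t.closed:
        return t                      # no free variables: nothing to do inside
    stats["visited"] += 1
    if t.op == "var":
        return s if t.arg1 == k else t
    if t.op == "lam":
        return Node("lam", subst(t.arg1, k + 1, s, stats))
    return Node("app", subst(t.arg1, k, s, stats), subst(t.arg2, k, s, stats))

# A large closed term (2047 nodes as a tree) applied to the free variable 0.
big = Node("const", "c")
for _ in range(10):
    big = Node("app", big, big)
t = Node("app", big, Node("var", 0))

stats = {"visited": 0}
result = subst(t, 0, Node("const", "d"), stats)
print(stats["visited"], result.arg1 is big)   # 2 True
```

Only the root and the variable are visited; the large closed sub-term is returned unchanged and stays shared, which is the kind of saving the closed flag is meant to buy.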