CS711: Overview of PCC
Greg Morrisett, Cornell University
Thanks to G. Necula & P. Lee
7/12/2016 Lang. Based Security

Papers for this Lecture
G. Necula. "Proof-Carrying Code." PoPL'97.
G. Necula and P. Lee. "Safe Kernel Extensions Without Run-Time Checking." OSDI'96.
G. Necula and P. Lee. "The Design and Implementation of a Certifying Compiler." PLDI'98, June 1998.
I also highly recommend Necula's PhD thesis (CMU).

Ideally:
[Diagram: your favorite language is compiled to a low-level IL, optimized, and emitted as machine code. A verifier checks the resulting system binary against the security policy, and only the verifier sits in the trusted computing base.]

Idea #1: Theorem Prover!
[Diagram: the same pipeline, with the verifier replaced by the NuPRL theorem prover in the trusted computing base.]

Unfortunately...
[Diagram: the trusted computing base now contains all of NuPRL.]

Observation
Finding a proof is hard, but verifying a proof is easy.

PCC:
[Diagram: the untrusted optimizer and prover (the prover "could be you") produce a "certified binary" containing machine code, invariants, and a proof. Only the verifier, which checks the certified binary against the security policy, is in the trusted computing base.]

Making "Proof" Rigorous:
Specify machine-code semantics and security policy using axiomatic semantics:
{Pre} ld r2, i(r1) {Post}
Given:
– a security policy (i.e., an axiomatic semantics and an associated logic for assertions), and
– untrusted code, annotated with invariant assertions,
it's possible to calculate a verification condition:
– an assertion A such that
– if A is true, then the code respects the policy.

The Client
The client takes its code & the policy and:
– constructs some loop invariants;
– constructs the verification condition A from the code, policy, and loop invariants;
– constructs a proof that A is true.
[Diagram: the "certified binary" bundles the code, invariants, and proof.]

Verification
The verifier (~4-6 pages of C code):
– takes the code, loop invariants, and policy;
– calculates the verification condition A;
– checks that the proof is a valid proof of A:
  • fails if some step doesn't follow from an axiom or inference rule;
  • fails if the proof is valid, but is not a proof of A.
[Diagram: the verifier consumes the "certified binary" (code, invariants, proof).]

Advantages of PCC
In principle:
• Simple, small, and fast TCB.
• No external authentication or cryptography.
• No additional run-time checks.
• "Tamper-proof": altering the code or the proof either makes checking fail or still yields code that respects the policy.
• Precise and expressive specification of code safety policies.

An Experiment: Packet Filters
• Safety policy:
  – given a packet, return yes/no;
  – the packet is read-only, plus a small scratchpad;
  – no loops.
• Compare:
  – Berkeley Packet Filter interpreter
  – Modula-3 (but turn off type-checking)
  – Software Fault Isolation (sandboxing)
  – PCC (hand-optimized, proved)

Results: PCC wins:
[Plot: running time (ms) vs. thousands of packets processed, for PCC, SFI, M3, and BPF.]

Is PCC the answer?
PCC seems to offer everything we need:
– a small, simple trusted computing base;
– optimize all you want, any language, any security policy, etc.
But how do we make it scale to real programs?

Scaling Problem #1: How to generate proofs?
• Manual construction is too painful for real programs.
• Interactive theorem provers are really only feasible for a relatively small fraction of the code.
• We need something that's fully automatic most of the time.

One Approach
Restrict the safety policy to type safety.
• Necessary for most policies anyway:
  – you cannot execute code or access data for which you do not have a capability;
  – type systems are a meta-policy that allows programmers to define fine-grained notions of "capability" and "access" (abstract types, interfaces, static scope, etc.).
• Start with a well-typed, high-level program:
  – you have a proof for the high-level code;
  – preserve the proof as you compile.
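The verifier's two failure modes (a step that follows from no axiom or rule, and a valid proof of the wrong assertion) can be illustrated with a toy proof checker. This is a minimal sketch, assuming a Hilbert-style propositional logic with only implication, the K and S axiom schemes, and modus ponens. The proof format (a list of steps) and every name here (`Var`, `Imp`, `is_axiom`, `check_proof`) are hypothetical. It is not the encoding used by Necula's verifier, which represents proofs in LF.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Imp:  # lhs => rhs
    lhs: "Formula"
    rhs: "Formula"


Formula = Union[Var, Imp]


def is_axiom(f: Formula) -> bool:
    """True if f is an instance of axiom scheme K or S."""
    if not isinstance(f, Imp):
        return False
    # K: A => (B => A)
    if isinstance(f.rhs, Imp) and f.rhs.rhs == f.lhs:
        return True
    # S: (A => (B => C)) => ((A => B) => (A => C))
    if isinstance(f.lhs, Imp) and isinstance(f.lhs.rhs, Imp):
        a, b, c = f.lhs.lhs, f.lhs.rhs.lhs, f.lhs.rhs.rhs
        return f.rhs == Imp(Imp(a, b), Imp(a, c))
    return False


def check_proof(steps, goal: Formula) -> bool:
    """Each step is ("axiom", formula) or ("mp", i, j): from earlier
    step i (A) and earlier step j (A => B), conclude B."""
    derived = []
    for step in steps:
        if step[0] == "axiom":
            f = step[1]
            if not is_axiom(f):
                return False  # step doesn't follow from an axiom
        elif step[0] == "mp":
            i, j = step[1], step[2]
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False  # may only cite earlier steps
            a, imp = derived[i], derived[j]
            if not (isinstance(imp, Imp) and imp.lhs == a):
                return False  # step doesn't follow by modus ponens
            f = imp.rhs
        else:
            return False  # unknown inference rule
        derived.append(f)
    # A valid proof must also be a proof of *this* goal.
    return bool(derived) and derived[-1] == goal


# Usage: the classic proof of A => A from S and K.
A = Var("A")
AA = Imp(A, A)
proof = [
    ("axiom", Imp(A, Imp(AA, A))),                            # 0: K
    ("axiom", Imp(Imp(A, Imp(AA, A)), Imp(Imp(A, AA), AA))),  # 1: S
    ("mp", 0, 1),                                             # 2: (A => (A => A)) => (A => A)
    ("axiom", Imp(A, AA)),                                    # 3: K
    ("mp", 3, 2),                                             # 4: A => A
]
print(check_proof(proof, AA))  # True
print(check_proof(proof, A))   # False: valid proof, wrong goal
```

Each check is a local syntactic comparison, so checking is linear in the proof size. That is the asymmetry the Observation slide relies on: finding the five steps is the hard part.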

Type-Preserving Compilation
[Diagram: source code -> type-checker -> optimizer -> code generator -> binary, with a proof of type safety carried along through the pipeline.]

Touchstone [Necula]
• Compiles a type-safe subset of C to certified binaries for the DEC Alpha.
• The security policy is type safety:
  – parameters of the right type to functions;
  – values of the right type in arrays and structs;
  – array indices in bounds.
• Highly optimizing:
  – competitive with GCC and DEC cc;
  – eliminates array bounds checks when possible.

Touchstone Performance
Speedup vs. "GNU gcc -O0":

Benchmark  GNU gcc -O4  DEC cc -O4  Touchstone
blur       2.33         2.92        2.64
sharpen    3.82         3.68        3.89
qsort      3.51         3.52        3.52
simplex    2.97         2.79        3.86
kmp        2.44         2.44        1.93
unpack     2.62         2.76        2.20
bcopy      5.50         6.88        4.00
edge       2.92         11.52       9.16
GMEAN      3.17         3.92        3.48

Even though the C compilers do not insert array bounds checks at all, Touchstone is competitive.

Touchstone Compilation Time
Time per phase (ms):

Benchmark  Proof checking  Proving  VC generation  Code generation
blur       5.9             81.0     6.9            271.0
sharpen    21.0            257.0    22.3           818.0
qsort      16.1            127.0    12.0           560.0
simplex    108.7           1272.0   73.9           4340.0
kmp        9.6             108.0    8.4            348.0
unpack     50.0            1912.0   55.9           1885.0
bcopy      1.9             25.0     3.3            136.0
edge       9.3             143.0    9.7            697.0

• Geometric means of each phase's share of the total time:
  – code generation (compilation): 75%
  – proving: 21%
  – VC generation: 2%
  – proof checking: 2%

JVM vs. Touchstone
JVM:
– portable
– $$$
Touchstone:
– extremely good performance
– extremely small TCB
– fast verification

However...
• Touchstone's type system suits only one very simple language:
  – no abstract data types, objects, etc.;
  – no threads.
• Proof size was an issue:
  – proofs were 1-3x the size of the code, just for a really simple notion of type safety;
  – but recent work by Necula shows that this can be compressed down to a tiny overhead (e.g., 10%).

PCC Binary Size
Touchstone's proof size relative to code and invariant annotations (bytes):

Benchmark  Proof  Invariants  Code
blur       718    162         320
sharpen    2774   342         1248
qsort      1778   272         560
simplex    11810  825         3584
kmp        1246   132         624
unpack     5636   804         2496
bcopy      250    36          64
edge       1122   102         640

Summary thus far...
• Proof-carrying code is great in principle.
  – It's the right general framework.
  – For special-purpose applications, it can't be beat.
• But for general-purpose extensions:
  – we need some way to get the proof automatically (limit the policy to type safety);
  – engineering proof size is an issue;
  – compiling high-level languages is an issue.

Design Details
[Diagram: the untrusted client side (complex, slow) runs a certifying compiler from source to code plus invariants, and a theorem prover that produces the proof. The trusted server side (simple, fast) runs a VC generator over the code and invariants using the safety policy, and a proof checker that checks the proof of the resulting VC against the logic.]

Abstract Machine
• Instructions (from the DEC Alpha):
  – ADD/SUB rs, Op, rd   (Op ::= n | r)
  – LD rd, n(rs) and ST rs, n(rd)
  – BEQ/BNE rs, n and RET
  – INV(P)
• States: (R, pc)
  – R[r] is a 64-bit integer
  – R[mem] is memory: Int64 -> Int64
  – pc is the current program counter
• Expressions:
  – e ::= n | r | e1 + e2 | e1 - e2 | sel(m, e)
  – m ::= mem | upd(m, e1, e2)

Semantics:
• (R, pc) -> (R', pc'), relative to a fixed instruction sequence S.
• Rewriting rules:
  – R' = R[rd := R(rs) + R(rt)], pc' = pc+1, if S(pc) = ADD rs,rt,rd
  – R' = R[rd := sel(R(mem), R(rs)+n)], pc' = pc+1, if S(pc) = LD rd,n(rs) and readable(R,rs,n)
  – R' = R[mem := upd(R(mem), R(rd)+n, R(rs))], pc' = pc+1, if S(pc) = ST rs,n(rd) and writeable(R,rd,n)
  – R' = R, pc' = pc+n+1, if S(pc) = BEQ rs,n and R(rs) = 0 (and pc+n+1 is in 0..S.size-1)
  – R' = R, pc' = pc+1, if S(pc) = BEQ rs,n and R(rs) != 0

Predicates
• P ::= true | false | P1 & P2 | P1 => P2 | All x.P | e1 = e2 | e1 != e2 | e : T
• T ::= RO | RW
  – quantifiers range over numbers and are meant to hold in every state;
  – e:T is a predicate asserting that e has type T.
• Example pre-condition:
  – r0:RO & (r0+8):RO & (sel(m,r0) != 0) => (r0+8):RW

Axioms and Proof Rules
• The usual ones for predicate logic.
• Some rules for reasoning about 64-bit arithmetic values.
• Rules for reasoning about memory:
  – sel(upd(m,e1,e2),e3) = e2 when e1 = e3
  – sel(upd(m,e1,e2),e3) = sel(m,e3) when e1 != e3
  – upd(upd(m,e1,e2),e3,e4) = upd(upd(m,e3,e4),e1,e2) when e1 != e3
  – Note: aliasing strikes again!
• Rules for reasoning about types:
  – e:RW => e:RO

Notes on Axioms
• When you scale PCC up:
  – you still need a rich type system to specify interfaces (i.e., pre-conditions);
  – you still have to prove the consistency and soundness of your axioms w.r.t. the machine;
  – i.e., you still have to write down a TAL and prove its soundness;
  – you'll tend to use the same type-invariance tricks to ensure soundness.

Verification Conditions
VC(i) = [rs+rt / rd] VC(i+1)                       if S(i) = ADD rs,rt,rd
VC(i) = (rs+n):RO & [sel(m,rs+n) / rd] VC(i+1)     if S(i) = LD rd,n(rs)
VC(i) = (rd+n):RW & [upd(m,rd+n,rs) / m] VC(i+1)   if S(i) = ST rs,n(rd)

VC continued
VC(i) = (rs = 0 => VC(i+n+1)) & (rs != 0 => VC(i+1))   if S(i) = BEQ rs,n
VC(i) = PostCondition                                  if S(i) = RET
VC(i) = P                                              if S(i) = INV(P)

Notes on VCGen
• VCGen computes the weakest pre-condition of the program if you start from the post-condition at the RET(s) and work backwards.
  – Cycles (back-edges in the CFG) must be cut with INV nodes, i.e., loop invariants.
  – Note that INV isn't trusted: it is assumed for the continuation, but verified whenever control reaches it.
  – This is accomplished by adding P => VC(i+1), universally quantified over the registers, to the final safety predicate for each S(i) = INV(P).
• Now all you need is a proof that the pre-condition implies the VC computed by VCGen, i.e., Pre => VC(0).
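The VC rules above can be run mechanically as one backwards pass over the instructions. This is a minimal sketch under several stated assumptions, and it is not Touchstone's VC generator. Terms use a tuple encoding, and memory is treated as a register named `m`. `vcgen`, `subst`, and `show` are hypothetical names. SUB and BNE are omitted. INV obligations are returned unquantified, whereas a real checker would quantify them over the registers.

```python
# Hypothetical tuple encoding of terms:
#   expressions: ("num", n) | ("reg", r) | ("add", e1, e2)
#                | ("sel", m, e) | ("upd", m, e1, e2)
#   predicates:  ("true",) | ("and", p, q) | ("imp", p, q)
#                | ("eq", e1, e2) | ("ne", e1, e2) | ("type", e, "RO" or "RW")
# Memory is modeled as the register named "m".

def reg(r): return ("reg", r)
def num(n): return ("num", n)
def add(a, b): return ("add", a, b)

TRUE = ("true",)
MEM = reg("m")

def subst(x, by, t):
    """[by / x] t: replace register x with expression `by` throughout t."""
    if t == reg(x):
        return by
    if t[0] in ("num", "reg", "true"):
        return t
    if t[0] == "type":
        return ("type", subst(x, by, t[1]), t[2])
    return (t[0],) + tuple(subst(x, by, s) for s in t[1:])

def vcgen(prog, post):
    """Return (VC(0), obligations), with one obligation P => VC(i+1) per
    INV(P) at index i. Assumes every loop passes through an INV."""
    memo, obligations = {}, []

    def vc(i):
        if i in memo:
            return memo[i]
        ins = prog[i]
        op = ins[0]
        if op == "ADD":  # ADD rs, Op, rd  (Op is a register name or an int)
            _, rs, o, rd = ins
            src = num(o) if isinstance(o, int) else reg(o)
            p = subst(rd, add(reg(rs), src), vc(i + 1))
        elif op == "LD":  # LD rd, n(rs)
            _, rd, n, rs = ins
            a = add(reg(rs), num(n))
            p = ("and", ("type", a, "RO"), subst(rd, ("sel", MEM, a), vc(i + 1)))
        elif op == "ST":  # ST rs, n(rd)
            _, rs, n, rd = ins
            a = add(reg(rd), num(n))
            p = ("and", ("type", a, "RW"),
                 subst("m", ("upd", MEM, a, reg(rs)), vc(i + 1)))
        elif op == "BEQ":  # BEQ rs, n
            _, rs, n = ins
            p = ("and", ("imp", ("eq", reg(rs), num(0)), vc(i + n + 1)),
                 ("imp", ("ne", reg(rs), num(0)), vc(i + 1)))
        elif op == "RET":
            p = post
        elif op == "INV":  # assume P for predecessors, then check P => VC(i+1)
            p = ins[1]
            memo[i] = p  # record first, so a back-edge to i stops here
            obligations.append(("imp", p, vc(i + 1)))
        else:
            raise ValueError(f"unsupported instruction {ins!r}")
        memo[i] = p
        return p

    return vc(0), obligations

def show(t):
    """Render a term in the slides' notation."""
    tag = t[0]
    if tag == "num":
        return str(t[1])
    if tag == "reg":
        return t[1]
    if tag == "add":
        return f"{show(t[1])}+{show(t[2])}"
    if tag in ("sel", "upd"):
        return f"{tag}({','.join(show(s) for s in t[1:])})"
    if tag == "true":
        return "true"
    if tag == "and":
        return f"{show(t[1])} & {show(t[2])}"
    if tag == "imp":
        return f"({show(t[1])} => {show(t[2])})"
    if tag in ("eq", "ne"):
        return f"{show(t[1])} {'=' if tag == 'eq' else '!='} {show(t[2])}"
    return f"({show(t[1])}):{t[2]}"  # "type"

# Usage: the three-instruction example, with post-condition {true}.
prog = [("ADD", "r2", 5, "r2"), ("LD", "r1", 3, "r2"), ("ST", "r5", 1, "r1"), ("RET",)]
vc0, obligations = vcgen(prog, TRUE)
print(show(vc0))  # (r2+5+3):RO & (sel(m,r2+5+3)+1):RW & true
```

For the INV case, the design choice is to record P as VC(i) before visiting the successor. A back-edge to the loop head then returns P instead of recursing forever. This mirrors how the invariant is assumed for the continuation and checked separately.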

Example (VC computed backwards from the post-condition {true}):
add r2, 5, r2   ; (r2+5+3):RO & (sel(m,r2+5+3)+1):RW
ld r1, 3(r2)    ; (r2+3):RO & (sel(m,r2+3)+1):RW
st r5, 1(r1)    ; (r1+1):RW
{true}
Each annotation is the VC at that instruction. The ST requires r1+1 to be writeable. The LD requires r2+3 to be readable and substitutes sel(m,r2+3) for r1. The ADD substitutes r2+5 for r2.

Proof Representation
Use a variant of LF to represent assertions and proofs:
– write down the assertion language;
– write down the inference rules for the logic;
– proof checking becomes LF type checking;
– this decouples the logic and assertion language from the verifier;
– of course, you still have to establish the soundness and consistency of the logic that you encode within LF;
– and some logics (e.g., linear, temporal, or modal) do not encode so nicely into LF (see Twelf).

Representing LF Proofs
In practice, LF proof objects are HUGE. Recent work on proof oracles compresses them down to almost nothing [Necula & Rahul, PoPL'2001]:
– assume you can match the goal against the conclusions of the proof rules (e.g., with first-order unification); if you can't match this way, force the representation to contain more information;
– only some (small) subset of the rules will apply (say k of them);
– so you only need to emit ceil(lg k) bits to indicate which rule is actually used in the proof;
– the matching then establishes the sub-goals that need to be proven.

Where PCC stands
• Cedilla has built a certifying compiler for Java:
  – generates optimized x86 code;
  – but you can write your own code too!
  – uses a Nelson-Oppen-style prover.
• The proof checker is actually machine-independent:
  – map object code up to a machine-independent IL (Secure Assembly Language, SAL);
  – proofs are with respect to the SAL code;
  – retargeting to another machine just involves writing a (correct) mapping from the machine code to SAL.
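The oracle idea from the Representing LF Proofs slide can be sketched on a toy logic. In this logic, the checker itself matches each goal against the rules and reads only ceil(lg k) bits to choose among the k that apply. This is a minimal sketch, assuming propositional goals built from atoms, "and", and "or", with atoms proved by Horn clauses. Exact syntactic matching stands in for first-order unification. `applicable`, `width`, `check`, and `prove` are hypothetical names, and this is not the LF-based encoding from the paper.

```python
# Toy logic (hypothetical): goals are ("atom", name), ("and", p, q), ("or", p, q);
# atoms are proved by Horn clauses written as (head, [body atoms]).

def applicable(goal, clauses):
    """Deterministically list the rules whose conclusion matches `goal`,
    each paired with the sub-goals it leaves."""
    if goal[0] == "and":
        return [("and_intro", [goal[1], goal[2]])]
    if goal[0] == "or":
        return [("or_left", [goal[1]]), ("or_right", [goal[2]])]
    return [("clause", body) for head, body in clauses if head == goal]


def width(k):
    """ceil(lg k): bits needed to pick one of k >= 1 rules."""
    return (k - 1).bit_length()


def check(goal, clauses, oracle, fuel=10_000):
    """Trusted checker: replay the proof, reading rule choices from `oracle`
    (a string of '0'/'1'). True iff every goal is discharged and every
    oracle bit is consumed."""
    pos, stack = 0, [goal]
    while stack:
        fuel -= 1
        if fuel < 0:
            return False  # guard against non-terminating clause sets
        rules = applicable(stack.pop(), clauses)
        if not rules:
            return False  # no rule matches: not a proof
        w = width(len(rules))
        chunk = oracle[pos:pos + w]
        if len(chunk) < w:
            return False  # ran out of oracle bits
        pos += w
        idx = int(chunk, 2) if w else 0
        if idx >= len(rules):
            return False
        stack.extend(reversed(rules[idx][1]))  # first sub-goal on top
    return pos == len(oracle)


def prove(goals, clauses, depth=50):
    """Untrusted prover: depth-first search that emits only the rule
    choices, as an oracle string (None if no proof is found)."""
    if not goals:
        return ""
    if depth == 0:
        return None
    rules = applicable(goals[0], clauses)
    w = width(len(rules))
    for idx, (_, subgoals) in enumerate(rules):
        rest = prove(list(subgoals) + goals[1:], clauses, depth - 1)
        if rest is not None:
            return (format(idx, f"0{w}b") if w else "") + rest
    return None


# Usage: q has two clauses, so proving it costs one bit; and-goals and
# single-clause atoms cost zero bits.
P, Q, R, S = (("atom", x) for x in "pqrs")
clauses = [(Q, [P]), (Q, [R]), (R, []), (S, [Q, R])]
goal = ("or", P, ("and", S, R))
oracle = prove([goal], clauses)
print(oracle)                        # 11
print(check(goal, clauses, oracle))  # True
```

The whole proof shrinks to two bits because the checker, not the proof, does the matching. The oracle only resolves the choices that matching leaves open, at the "or" and at the two clauses for q.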

Foundational PCC [Appel, Felty]
Eliminate more trust from PCC. Standard PCC uses a logic encoded into LF together with an implicit machine semantics (built into the VC generator and axioms). Rather, encode things from the machine semantics up:
– you prove, w.r.t. the semantics, that {Pre}C{Post} is valid.
Interesting observation:
– to do any reasonable proof, you start introducing "types" or invariants that look suspiciously like TAL;
– except that you have a semantic encoding of what the TAL types mean w.r.t. the machine.