CS711 Overview of PCC Greg Morrisett Cornell University

Thanks to G.Necula & P.Lee
Papers for this Lecture
G. Necula. “Proof-Carrying Code.” POPL ’97.
G. Necula and P. Lee. “Safe Kernel Extensions Without Run-Time Checking.” OSDI ’96.
G. Necula and P. Lee. “The Design and Implementation of a Certifying Compiler.” PLDI ’98, June 1998. [pldi98.ps]
I also highly recommend Necula’s PhD thesis (CMU).
7/12/2016
Lang. Based Security
2
Ideally:
[Diagram. Labels: your favorite language, low-level IL, optimizer, machine code, verifier, security policy, system binary; the trusted computing base is marked.]
Idea #1: Theorem Prover!
[Diagram. Labels: your favorite language, low-level IL, optimizer, machine code, NuPRL, security policy, system binary; the trusted computing base is marked.]
Unfortunately...
[Diagram. Labels: trusted computing base, NuPRL.]
Observation
Finding a proof is hard, but verifying a proof is easy.
PCC:
[Diagram. Labels: optimizer, machine code, prover (could be you), security policy, “certified binary” (code, invariants, proof), verifier, system binary; the trusted computing base is marked.]
Making “Proof” Rigorous:
Specify machine-code semantics and security
policy using axiomatic semantics.
{Pre} ld r2,r1(i) {Post}
Given:
– security policy (i.e., axiomatic semantics and
associated logic for assertions)
– untrusted code
– annotated with invariant assertions
it’s possible to calculate a verification condition:
– an assertion A such that
– if A is true then the code respects the policy.
The Client
The client takes its code & the policy:
– constructs some loop invariants.
– constructs the verification condition A from
the code, policy, and loop invariants.
– constructs a proof that A is true.
Result: a “certified binary” (code, invariants, proof).
Verification
The Verifier (~ 4-6 pages of C code):
– takes code, loop invariants, and policy
– calculates the verification condition A.
– checks that the proof is a valid proof of A:
• fails if some step doesn’t follow from an axiom
or inference rule
• fails if the proof is valid, but not a proof of A
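The two failure modes above can be sketched with a toy checker. This is only an illustration, not the actual PCC verifier: the formula encoding (strings for atoms, tuples for "=>" and "&"), the step format, and the name check_proof are all assumptions made for the example.

```python
def check_proof(proof, axioms, goal):
    """Return True iff `proof` is a valid derivation whose last line is `goal`.

    Formulas: an atom is a string; ("=>", A, B) is implication; ("&", A, B)
    is conjunction. Steps (hypothetical format):
      ("axiom", F)  : F must be one of the axioms
      ("mp", i, j)  : line i proves A, line j proves A => B, conclude B
      ("and", i, j) : conclude (line i) & (line j)
    """
    proved = []

    def valid(i):
        return isinstance(i, int) and 0 <= i < len(proved)

    for step in proof:
        kind = step[0]
        if kind == "axiom":
            formula = step[1]
            if formula not in axioms:
                return False          # step does not follow from an axiom
        elif kind == "mp":
            _, i, j = step
            if not (valid(i) and valid(j)):
                return False
            imp = proved[j]
            if not (isinstance(imp, tuple) and len(imp) == 3
                    and imp[0] == "=>" and imp[1] == proved[i]):
                return False          # inference rule does not apply
            formula = imp[2]
        elif kind == "and":
            _, i, j = step
            if not (valid(i) and valid(j)):
                return False
            formula = ("&", proved[i], proved[j])
        else:
            return False              # unknown rule
        proved.append(formula)
    # A valid proof of something else is still rejected.
    return bool(proved) and proved[-1] == goal


axioms = {"p", ("=>", "p", "q")}
proof = [("axiom", "p"), ("axiom", ("=>", "p", "q")), ("mp", 0, 1)]
print(check_proof(proof, axioms, "q"))
print(check_proof(proof, axioms, "p"))
```

The checker only replays steps, so its cost is linear in the proof length; all the search effort stays with whoever produced the proof.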
Advantages of PCC
In Principle:
– Simple, small, and fast TCB.
– No external authentication or cryptography.
– No additional run-time checks.
– “Tamper-proof”.
– Precise and expressive specification of code safety policies.
An Experiment: Packet Filters
• Safety Policy:
– given a packet, returns yes/no
– packet is read-only, small scratchpad
– no loops
• Compare:
– Berkeley Packet Filter Interpreter
– Modula-3 (but turn off type-checking)
– Software Fault Isolation (sandboxing)
– PCC (hand-optimized, proved)
Results:
PCC wins.
[Chart: time in ms (0 to 15) against thousands of packets (0 to 50) for PCC, SFI, M3, and BPF.]
Is PCC the answer?
PCC seems to offer everything we need:
– small, simple trusted computing base
– optimize all you want, any language, any
security policy, etc.
But how do we make it scale to real
programs?
Scaling Problem #1:
How to generate proofs?
• Manual construction is too painful for real
programs.
• Interactive theorem provers are really only
feasible for a relatively small fraction of the
code.
• We need something that’s fully automatic
most of the time.
One Approach
Restrict the safety policy to type safety.
• Necessary for most policies anyway:
– cannot execute code or access data for which you
do not have a capability.
– type systems are a meta-policy that allows
programmers to define fine-grained notions of
“capability” and “access”.
• abstract types, interfaces, static scope, etc.
• Start with a well-typed, high-level program
– you have a proof for the high-level code
– preserve the proof as you compile
Type-Preserving Compilation
[Diagram. Labels: Source code, Type-checker, Optimizer, Code generator, binary; a proof of type-safety appears at two stages.]
Touchstone [Necula]
• Compiles type-safe subset of C to
certified binaries for the DEC Alpha.
• Security policy is type-safety:
– parameters of the right type to functions
– values of the right type in arrays, structs
– array indices in bounds
• Highly-optimizing
– competitive with GCC, DEC cc
– eliminates array bound checks when
possible
Touchstone Performance
Speedup vs. "GNU gcc -O0"

Benchmark   GNU gcc -O4   DEC cc -O4   Touchstone
blur        2.33          2.92         2.64
sharpen     3.82          3.68         3.89
qsort       3.51          3.52         3.52
simplex     2.97          2.79         3.86
kmp         2.44          2.44         1.93
unpack      2.62          2.76         2.20
bcopy       5.50          6.88         4.00
edge        2.92          11.52        9.16
GMEAN       3.17          3.92         3.48

In spite of the fact that C compilers do not
insert array bound checks, Touchstone is
competitive.
Touchstone Compilation Time
Time (ms)

Benchmark   Proof Checking   Proving   VC Generation   Code Generation
blur        5.9              81.0      6.9             271.0
sharpen     21.0             257.0     22.3            818.0
qsort       16.1             127.0     12.0            560.0
simplex     108.7            1272.0    73.9            4340.0
kmp         9.6              108.0     8.4             348.0
unpack      50.0             1912.0    55.9            1885.0
bcopy       1.9              25.0      3.3             136.0
edge        9.3              143.0     9.7             697.0

• Geometric means:
– compilation 75%
– VC generation 2%
– proving 21%
– proof checking 2%
JVM vs. Touchstone
JVM:
– portable
– $$$
Touchstone:
– extremely good performance
– extremely small TCB
– fast verification
However...
• Touchstone’s type system suits only one
very simple language:
– no abstract data types, objects, etc.
– no threads
• Proof size was an issue:
– proofs were 1-3x the size of the code, just for
a really simple notion of type-safety.
– but recent work by Necula shows that this can
be compressed down to tiny overhead (e.g.,
10%)
Touchstone proof size: PCC binary size (bytes)

Benchmark   Proof   Invar   Code
blur        718     162     320
sharpen     2774    342     1248
qsort       1778    272     560
simplex     11810   825     3584
kmp         1246    132     624
unpack      5636    804     2496
bcopy       250     36      64
edge        1122    102     640

Touchstone’s proof size relative to code and
invariant annotations.
Summary thus far...
• Proof-carrying code is great in principle.
– It’s the right general framework.
– For special-purpose applications, can’t be
beat.
• But for general-purpose extensions:
– Need some way to get the proof
automatically (limit policy to type-safety).
– Engineering proof size is an issue.
– Compiling high-level languages is an issue.
Design Details
[Diagram. Labels: Server, Client, Safety policy, Logic, Source, Certifying Compiler, Code, Invar, VC Generator, VC, Theorem Prover, Proof, Proof Checker. Legend: Untrusted vs. Trusted, Complex vs. Simple, Slow vs. Fast.]
Abstract Machine
• Instructions (from DEC Alpha):
– ADD/SUB rs, Op, rd (Op ::= n | r)
– LD rd, n(rs), ST rs, n(rd)
– BEQ/NE rs, n, RET
– INV(P)
INV(P)
• States: (R,pc)
– R[r] is a 64-bit integer
– R[mem] is memory: Int64->Int64
– pc is current program counter
• Expressions:
– e ::= n | r | e1 + e2 | e1 - e2 | sel(m,e)
– m ::= mem | upd(m,e1,e2)
Semantics:
• (R,pc) -> (R',pc')
– relative to fixed instruction sequence S
• Rewriting rules:
– R' = R[rd := R(rs) + R(rt)], pc' = pc+1
if S(pc) = ADD rs,rt,rd
– R' = R[rd := sel(R(m),R(rs)+n)], pc' = pc+1
if S(pc) = LD rd,n(rs) and readable(R,rs,n)
– R' = R[m := upd(R(m),R(rd)+n,R(rs))], pc' = pc+1
if S(pc) = ST rs,n(rd) and writeable(R,rd,n)
– R' = R, pc' = pc+n+1
if S(pc) = BEQ rs,n and R(rs) = 0 (and pc+n+1 in 0..S.size-1)
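The rewriting rules can be read as a one-step interpreter. The sketch below is one possible rendering, with several assumptions that are not on the slides: instructions are Python tuples, `readable` and `writeable` are plain sets of addresses standing in for the policy predicates, unwritten memory reads as 0, and a BEQ whose register is non-zero falls through to pc+1.

```python
MASK = (1 << 64) - 1  # registers hold 64-bit integers


def step(S, R, pc, readable, writeable):
    """One transition (R, pc) -> (R', pc'); None means no rule applies (stuck)."""
    if not (0 <= pc < len(S)):
        return None
    ins = S[pc]
    op = ins[0]
    R2 = dict(R)  # copy, so the old state is left intact
    if op == "ADD":
        _, rs, o, rd = ins                     # ADD rs, Op, rd
        val = o if isinstance(o, int) else R[o]
        R2[rd] = (R[rs] + val) & MASK
        return R2, pc + 1
    if op == "LD":
        _, rd, n, rs = ins                     # LD rd, n(rs)
        addr = (R[rs] + n) & MASK
        if addr not in readable:
            return None                        # policy forbids the read
        R2[rd] = R["mem"].get(addr, 0)         # sel(mem, rs+n)
        return R2, pc + 1
    if op == "ST":
        _, rs, n, rd = ins                     # ST rs, n(rd)
        addr = (R[rd] + n) & MASK
        if addr not in writeable:
            return None                        # policy forbids the write
        mem = dict(R["mem"])
        mem[addr] = R[rs]                      # upd(mem, rd+n, rs)
        R2["mem"] = mem
        return R2, pc + 1
    if op == "BEQ":
        _, rs, n = ins                         # BEQ rs, n
        target = pc + n + 1 if R[rs] == 0 else pc + 1
        if not (0 <= target < len(S)):
            return None
        return R2, target
    return None                                # RET and anything else: no step


S = [("ADD", "r1", "r2", "r3"), ("LD", "r4", 8, "r3"), ("ST", "r4", 0, "r3"), ("RET",)]
R = {"r1": 10, "r2": 6, "r3": 0, "r4": 0, "mem": {24: 7}}
print(step(S, R, 0, {24}, {16}))
```

A program is safe under the policy exactly when execution never gets stuck on a LD or ST; PCC aims to establish that statically, so the checks in this interpreter never need to run.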
Predicates
• P ::= true | false | P1 & P2 | P1 => P2 |
All x.P | e1 = e2 | e1 != e2 | e : T
• T ::= RO | RW
– quantifiers range over numbers and are
meant to hold in every state.
– e:T predicate asserting that e has type T
• Example pre-condition:
– r0:RO & (r0+8):RO &
(sel(m,r0) != 0) => (r0+8):RW
Axioms and Proof Rules
• The usual ones for predicate logic
• Some rules for reasoning about 64-bit
arithmetic values
• Rules for reasoning about memory:
– sel(upd(m,e1,e2),e3) = e2 when e1 = e3
– sel(upd(m,e1,e2),e3) = sel(m,e3)
when e1 != e3.
– upd(upd(m,e1,e2),e3,e4) =
upd(upd(m,e3,e4),e1,e2) when e1 != e3
– Note: aliasing strikes again!
• Rules for reasoning about types:
– e:RW => e:RO
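The memory rules can be sanity-checked against a concrete model. The sketch below assumes a memory is a Python dict, that `upd` returns a fresh dict rather than mutating, and that unwritten addresses read as 0; none of these choices come from the slides.

```python
def sel(m, a):
    """Read address a from memory m (unwritten cells read as 0 here)."""
    return m.get(a, 0)


def upd(m, a, v):
    """Return a new memory equal to m except that address a holds v."""
    m2 = dict(m)
    m2[a] = v
    return m2


m = {100: 1}

# sel(upd(m,e1,e2),e3) = e2 when e1 = e3
print(sel(upd(m, 8, 42), 8) == 42)

# sel(upd(m,e1,e2),e3) = sel(m,e3) when e1 != e3
print(sel(upd(m, 8, 42), 100) == sel(m, 100))

# Two updates commute only when the addresses differ ...
print(upd(upd(m, 8, 1), 16, 2) == upd(upd(m, 16, 2), 8, 1))

# ... and not when they alias: the later write wins.
print(upd(upd(m, 8, 1), 8, 2) == upd(upd(m, 8, 2), 8, 1))
```

The last comparison is the aliasing case: a prover has to establish e1 != e3 before it may reorder or skip over a store.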
Notes on Axioms
• When you scale PCC up:
– you still need a rich type system to specify
interfaces (i.e., pre-conditions)
– you still have to prove the consistency and
soundness of your axioms w.r.t. the
machine
– i.e., you still have to write down a TAL and
prove its soundness
– you'll tend to use the same type invariance
tricks to ensure soundness
Verification Conditions
VC(i) =
[rs+rt / rd] VC(i+1) if S(i) = ADD rs,rt,rd
(rs+n):RO & [sel(m,rs+n)/rd]VC(i+1)
if S(i) = LD rd,n(rs)
(rd+n):RW & [upd(m,rd+n,rs)/m]VC(i+1)
if S(i) = ST rs,n(rd)
VC continued
VC(i) =
(rs = 0 => VC(i+n+1)) &
(rs != 0 => VC(i+1)) when
S(i) = BEQ rs,n
PostCondition when S(i) = RET
P when S(i) = INV(P)
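The VC equations translate almost line for line into a recursive function. The sketch below is an assumption-laden illustration, not Necula's VCGen: predicates and expressions are nested tuples, registers and the memory variable "m" are strings, and the tuple tags ("and", "imp", "eq", "ne", "type", "+", "sel", "upd") are names invented for the example.

```python
def subst(t, var, e):
    """Return t with every occurrence of the variable `var` replaced by `e`."""
    if t == var:
        return e
    if isinstance(t, tuple):
        # t[0] is a tag such as "and" or "sel"; only the arguments are rewritten
        return (t[0],) + tuple(subst(x, var, e) for x in t[1:])
    return t


def vc(S, i, post):
    """Verification condition for the code starting at instruction i."""
    ins = S[i]
    op = ins[0]
    if op == "ADD":                      # [rs+rt / rd] VC(i+1)
        _, rs, o, rd = ins
        return subst(vc(S, i + 1, post), rd, ("+", rs, o))
    if op == "LD":                       # (rs+n):RO & [sel(m,rs+n)/rd] VC(i+1)
        _, rd, n, rs = ins
        addr = ("+", rs, n)
        return ("and", ("type", addr, "RO"),
                subst(vc(S, i + 1, post), rd, ("sel", "m", addr)))
    if op == "ST":                       # (rd+n):RW & [upd(m,rd+n,rs)/m] VC(i+1)
        _, rs, n, rd = ins
        addr = ("+", rd, n)
        return ("and", ("type", addr, "RW"),
                subst(vc(S, i + 1, post), "m", ("upd", "m", addr, rs)))
    if op == "BEQ":
        _, rs, n = ins
        return ("and",
                ("imp", ("eq", rs, 0), vc(S, i + n + 1, post)),
                ("imp", ("ne", rs, 0), vc(S, i + 1, post)))
    if op == "RET":
        return post
    if op == "INV":                      # INV(P): the invariant itself
        return ins[1]
    raise ValueError("unknown instruction: %r" % (ins,))


# ADD r2,5,r2 ; LD r1,3(r2) ; ST r5,1(r1) ; RET   with post-condition true
S = [("ADD", "r2", 5, "r2"), ("LD", "r1", 3, "r2"), ("ST", "r5", 1, "r1"), ("RET",)]
print(vc(S, 0, ("true",)))
```

Because `subst` rewrites the already-computed VC of the continuation, the function works backwards from RET, which is why loops have to be cut by INV instructions before it terminates.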
Notes on VCGen
• Computes the weakest pre-condition of the
program if you start from the post-condition at
the RET(s) and work back.
– Need to cut cycles (back-edges in CFG) with INV
nodes or more properly.
– Note that INV isn't trusted – it's assumed for the
continuation, but verified if you ever get to it.
– Accomplished by adding INV => VC(i+1) to the
final safety predicate.
• Now all you need is a proof that the VC computed
by VCGen is implied by the pre-condition.
Example:
add r2,5,r2
ld r1,3(r2)
st r5,1(r1)
{true}
Example:
add r2,5,r2
ld r1,3(r2)
st r5,1(r1) ; (r1+1):RW & true
{true}
Example
add r2,5,r2
ld r1,3(r2) ; (r2+3):RO &
(sel(m,r2+3)+1):RW
st r5,1(r1) ; (r1+1):RW
{true}
Example
add r2,5,r2 ; (r2+5+3):RO &
(sel(m,r2+5+3)+1):RW
ld r1,3(r2) ; (r2+3):RO &
(sel(m,r2+3)+1):RW
st r5,1(r1) ; (r1+1):RW
{true}
Proof Representation
Use a variant of LF to represent assertions and
proofs.
– write down assertion language
– write down inference rules for the logic
– proof-checking becomes LF type-checking
– decouples the logic and assertion language from
the verifier.
– of course, you still have to establish the
soundness and consistency of the logic that you
encode within LF.
– and some logics (e.g., linear or temporal or modal)
do not encode so nicely into LF (see Twelf)
Representing LF Proofs
In practice LF proof objects are HUGE.
Recent work on proof oracles compresses this
down to nothing [PoPL’2001?]
– assume you can match the goal against the
conclusions of the proof rules (e.g., 1st-order
unification.) If you can’t match with this, then force
the representation to contain more information
– only some (small) subset of the rules will apply
(say k of them.)
– so you only need to spit out lg(k) bits to indicate
which rule is actually used in the proof.
– the matching lets you then establish sub-goals
that need to be proven.
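A toy version of the oracle idea, to make the lg(k) counting concrete. This is a sketch under strong simplifications that are not in the slides: goals are ground atoms, so "matching" is plain equality rather than unification; rules are (conclusion, premises) pairs; the rule set is acyclic; and the rule names in the example are made up.

```python
def check(goal, rules, bits):
    """Check `goal`, letting the oracle bit-stream pick among matching rules.

    rules: list of (conclusion, [premises]); bits: an iterator of 0/1 values.
    """
    candidates = [prems for concl, prems in rules if concl == goal]
    k = len(candidates)
    if k == 0:
        return False                 # nothing matches the goal
    width = (k - 1).bit_length()     # ceil(lg k) bits; 0 when only one rule matches
    index = 0
    for _ in range(width):
        bit = next(bits, None)
        if bit is None:
            return False             # oracle ran out of bits
        index = 2 * index + bit
    if index >= k:
        return False                 # oracle named a rule that does not exist
    # The chosen rule's premises become the sub-goals, checked left to right.
    return all(check(p, rules, bits) for p in candidates[index])


rules = [
    ("safe",  ["rd_ok", "wr_ok"]),
    ("rd_ok", ["ro"]),               # read allowed because the address is RO
    ("rd_ok", ["rw"]),               # read allowed because RW implies RO
    ("wr_ok", ["rw"]),
    ("rw",    []),                   # the only fact we actually have
]
print(check("safe", rules, iter([1])))   # one bit picks the second rd_ok rule
print(check("safe", rules, iter([0])))   # the wrong choice leads to a dead end
```

Here the whole proof of "safe" is encoded in a single bit, because every other goal matches exactly one rule and so costs nothing.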
Where PCC stands
• Cedilla has built a certifying compiler for Java.
– generates optimized x86 code
– but you can write your own code too!
– uses a Nelson-Oppen-style prover
• The proof checker is actually machine independent
– map object code up to a machine-independent IL (Secure
Assembly Language)
– proofs are with respect to the SAL code
– retargeting the prover to another machine just involves
writing a (correct) mapping from the machine code to SAL.
Foundational PCC [Appel, Felty]
Eliminate more trust from PCC:
– logic encoded into LF
– implicit machine semantics
Rather, encode things from the machine
semantics up.
– you prove w.r.t. the semantics that {Pre}C{Post} is
valid.
Interesting observation:
– to do any reasonable proof, you start introducing
“types” or invariants that look suspiciously like TAL
– except that you have a semantic encoding as to
what the TAL types mean w.r.t. the machine.