Preventing Information Leaks with Jeeves Jean Yang, MIT April 8, 2015 Data, Data Everywhere Electronic health records Social media Online courses Jean Yang / Jeeves Wearable devices 2 All Kinds of People Are Writing All Kinds of Code Open source lines of code Medical researchers Social scientists Journalists Children Jean Yang / Jeeves 3 Even Trained Developers Leak Information Jean Yang / Jeeves 4 Why Aren’t Existing Approaches Enough? Encrypting Data Defensive protection Exploit Patch But people are still showing the data wrong. But leaves system builders a step behind. Jean Yang / Jeeves 5 My Approach: Privacy by Construction Factor out privacy to reduce opportunity for leaks. • Programmer specifies high-level policies about how sensitive data can be used. • Rest of program is policy-agnostic. • System manages policies automatically. Jean Yang / Jeeves 6 Social Calendar Example Alice and Bob throw a surprise party for Carol. Jean Yang / Jeeves 7 Problem: Even Seemingly Simple Policies Have Subtleties Surprise party for Carol at Chuck E. Cheese. Guests Private event at Chuck E. Cheese. Pizza with Alice/Bob. Strangers Carol Policy: Must be guest. Policy: Only visible to hosts until finalized. Policies can depend on sensitive values and other policies. Jean Yang / Jeeves 8 Enforcing Policies Can Leak Information! Surprise party at Chuck E. Cheese. Policy: Must be guest. Guests 2 1 Policy: Only visible to hosts until finalized. Guests can’t see event • Subtle mistake: check for policy 1 neglects dependency on policy 2. • Problem arises when programmers trusted to get dependencies right. Guests can see event Guest list finalized Jean Yang / Jeeves 9 Problem: Policies Are Intertwined Across the Code “What is the most popular location among friends 7pm Tuesday?” Update to event subscribers • Track information flow through derived values. • Track where derived values flow. Jean Yang / Jeeves 10 “Policy Spaghetti” in Real Systems Code from HotCRP conference management system Highlighted: conditional permissions checks everywhere. Jean Yang / Jeeves 11 Automatic Enforcement with Jeeves The well-intentioned programmer writes same code no matter what policies are. Programming model. Mathematical guarantees. Jean Yang / Jeeves Implementations, web framework, and case studies. 12 Jeeves Factors Out Policies • Centralized policies. • Policy-agnostic program. • Runtime differentiates behavior. Model View Jean Yang / Jeeves Controller 13 This Talk Background. Jeeves programming model. [POPL ’12] How Jeeves works. [PLAS ’13] Theoretical guarantees. [PLAS ’13] Implementation & evaluation. Jean Yang / Jeeves [draft] 14 Automatically Managing Information Flow Jeeves supports more expressive policies than prior work. 1. Specifying policies. 2. Specifying differentiated behavior. 3. Factoring out policies from other code. 4. Enforcing policies that depend on sensitive values. Only Jeeves supports 3 and 4. Jean Yang / Jeeves 15 Related Work: Checking Information Flow Must explicitly define and tag alternate behavior. Programmer tags data. if viewer == : elif viewer != == System prevents leaks by raising error. == true error false : Information flow controls check that only allowed flows occur [HiStar, Flume, Resin, LIO/Hails, Jif/Sif, flow locks, F*, and more]. Jean Yang / Jeeves 16 My previous work. Related Work: Computing with Multiple Values Must encode policies as tags. == • “Policy spaghetti.” • Policies can leak information. true false Multi-execution approaches differentiate behavior based on viewer [secure multi-execution (Devriese et al); faceted execution (Austin and Flanagan)]. Jean Yang / Jeeves 17 This Talk Background. Jeeves programming model model. How Jeeves works. Challenges 1 Theoretical guarantees. Factoring out information flow. Supporting expressive policies that can depend on sensitive values. 2 Implementation & evaluation. Jean Yang / Jeeves 18 Policy-Agnostic Programming in Jeeves policies Policies describe rules for how values may flow to output contexts. Sensitive values encapsulate multiple behaviors. Application Code Separate from policies. Jean Yang / Jeeves 19 Jeeves Supports Expressive Policies A policy is an arbitrary function that takes the output context and returns a Boolean value. def isNotCarol(oc): return oc != Output context can be of arbitrary type. Policies can depend on sensitive values. def isGuest(oc): return oc in .guests .guests Jean Yang / Jeeves [] 20 Jeeves Programming Model Programmer specifies policies and facets. 1 == Runtime produces differentiated output based on the viewer. 4 print { Programmer writes policyagnostic programs. 2 Runtime propagates values and policies. 3 true false } print { true } false Jean Yang / Jeeves 21 The Jeeves Programming Model • Well-defined runtime semantics for policy-agnostic programming with information flow policies. • Can be implemented standalone or embedded as a library. • Has been adapted across runtimes in web frameworks. Jean Yang / Jeeves 22 This Talk Background. Jeeves programming model How Jeeves works. Theoretical guarantees. Challenges Handling policies with dependencies. 1 Implementation & evaluation. Enforcing policies when state changes. 2 Jean Yang / Jeeves 23 Jeeves Execution Model Runtime simulates simultaneous multiple executions. x = 0 Runtime propagates values and policies. if print { } print { 1 0 1 Runtime : solves for values to show based on policies and viewer. 2 == x += 1 return x Jean Yang / Jeeves } 24 Using Policies to Produce Outputs Jeeves uses policies to defacet appropriately. policy( def isNotCarol(oc): return oc != 0 1 print { } ) ( != ) ? 1 : 0 0 Jean Yang / Jeeves 25 But What About Dependencies? def isMaybeCarol(oc): return oc == print { policy( ) ( == ) ? : } Need to find a fixed point! Possible solutions: ( == ) ? : ( == ) ? : Jean Yang / Jeeves Jeeves runtime will pick the secret value if allowed. 26 Using Constraints to Handle Dependencies Evaluated with respect to state at time of output. policy( a 1 Label def isGuest(oc): return oc in .guests a 0 Policy π ∈ {π πππππ‘, ππ’ππππ} print { ) π = π πππππ‘ ⇒ in β’ π = π πππππ‘ ⇒ ππππ π β’ ¬(π = π πππππ‘) 0 Jean Yang / Jeeves } .guests • Constraints contain only Boolean variables. • Always a consistent assignment. 27 Handling Indirect Flows a if == : x += 1 if a true false : x += 1 Labels follow values through all computations, including conditionals and assignments. a x = xold+1 xold Jean Yang / Jeeves 28 Jeeves Runtime Semantics: Outputting to Sinks Statement evaluation Σ, π ⇓ ππ , ππ: π Expression evaluation Σ, πΈ ⇓ππ Σ ′ , π Evaluate output context Σ, πΈππ ⇓∅ Σππ , πππ and expression to print. Σππ , πΈπ ⇓∅ Σπ , ππ π1 … ππ = ππππ ππΎ(ππππππ πΈππ ∪ ππππππ (πΈπ ), Σ2 ) Retrieve labels πΈπ = ππ₯. π‘ππ’π ∧π … ∧π Σ2 (ππ ) and policies. policies applied Σπ , (πΈπ πππ ) ⇓∅ Σπ , ππ Evaluate to the output context. pick ππ such that ππ πππ = ππ, ππ ππ = π , ππ ππ = π‘ππ’π Σ, print πΈππ πΈπ ⇓ ππ , ππ: π Policies Output Result context Jean Yang / Jeeves Defacet using satisfying policy assignment. 29 This Talk Background. Jeeves programming model Challenges Defining what it means to enforce policies. 1 How Jeeves works. Theoretical guarantees. Implementation & evaluation. Jean Yang / Jeeves Proving enforcement with respect to our semantics. 2 30 Non-Interference Secret values should not affect public output. a if == if : x += 1 a == : x += 1 a 1 0 print { 0 Jean Yang / Jeeves Jeeves Challenge: Compute labels from program—may have dependencies } on secret values! 31 Jeeves Non-Interference and Policy Compliance a if == if : x += 1 a == : x += 1 a 1 0 print { 0 Jean Yang / Jeeves Jeeves Theorem: All executions where a must be public produce equivalent } outputs. Can’t tell apart secret values that require a to be public. 32 Jeeves Policy Compliance Theorem Suppose we have π π π1 = πππππ‘ π πΆ[ πΈ1 πΈπ ] π2 = πππππ‘ π πΆ[ πΈ2 πΈπ ] Where π must be public. Then the set of values π1 evaluates to is equivalent to the set of values π2 evaluates to. π: π ∃π. ∅, π1 ⇓ π, π: π } = {π: π | ∃π. ∅, π2 ⇓ π, π: π } Need to account for policies that depend on sensitive values! Jean Yang / Jeeves 33 This Talk Background. Jeeves programming model Challenges How Jeeves works. Adapting the approach for web applications. Theoretical guarantees. 2 1 Scaling the approach. evaluation. Implementation & and evaluation. Jean Yang / Jeeves 34 Facets in the Application and Database Application Queries select * from Users where location = Application All data select * from Users SQL Database Database queries can leak information! SQL Database Impractical and potentially slow! Solution: Facet the database. Implemented as an objectrelational mapping supporting uniform facet representation. Jean Yang / Jeeves 35 Jacqueline Web Framework Jeeves runtime Application Frontend Attach policies. Viewer Database @jeeves Programmer is responsible Framework is responsible Jean Yang / Jeeves 36 Django-like data schema for describing fields. Helper functions for policy include queries. Policy for ‘location’ field. Public value for ‘location’ field. Jean Yang / Jeeves 37 Implemented in Jeeves Conference management system Course manager Medical records (HIPAA) Questions 1. How does faceted execution scale? 2. How does faceting the database affect queries? Jean Yang / Jeeves 38 Demo Jean Yang / Jeeves 39 CMS Running Times Show All Papers 0.25 0.2 0.15 0.1 0.05 0 0 200 400 Number of papers in DB 512 users Time to load page(s) Time to load page (s) Show Single Paper 50 40 30 20 10 0 200 400 Number of papers in DB 512 users 1024 users Database size does not affect query speed. 0 1024 users Showing sensitive data grows linearly. Tests from Amazon AWS machine via HTTP requests from another machine. Jean Yang / Jeeves 40 Evaluation Lessons • Faceted execution is expensive. • Policy-agnostic programming does not always need faceted execution. • Many opportunities for optimization in web frameworks. Jean Yang / Jeeves 41 Policy-Agnostic Programming in Jeeves Sensitive values Policies == Other functionality Design of a policyagnostic programming language [POPL ‘12] Semantics and guarantees [PLAS ’13] Jean Yang / Jeeves Implementation, web framework, and case studies [in submission] 42 My Interests in Provable Guarantees Typechecked Verified F*: type-checking Verve: a provably security properties. type-safe OS. [PLDI ’10 (Best Paper); [ICFP ’11; JFP ‘13] CACM ‘10] Ask Reeves: verified privacy policies. [patent filed] Lessons • Difficult to write secure and private programs. • Even more difficult to prove them correct. • Easier if we can get privacy by construction. Jean Yang / Jeeves 43 The Future of PolicyAgnostic Programming • Factor out policies from other code. • Rely on compiler and runtime to differentiate behavior. Jean Yang / Jeeves 44 Making Policy-Agnostic Programming Work • Programmability. – Languages for different kinds of policies. – Tools for verifying and visualizing policies. • Scaling. – Policy-agnostic semantics with minimal faceting. – Building foundations with policy management in mind. Web frameworks Databases Web browsers Jean Yang / Jeeves 45 Parting Thoughts By reducing opportunity for programmer error, we can eliminate whole classes of privacy leaks. Jean Yang / Jeeves 46