Software Model Checking for Confidentiality
Rajeev Alur
University of Pennsylvania
Joint work with Pavol Cerny

Confidentiality
“Data Leaks Abound And No One Is Safe” (Feb 9th)
“Indian Foreign Ministry hit by spyware” (Feb 15th)
“Cell Phones a Much Bigger Privacy Risk Than Facebook” (Feb 20th)
[Figure: downloaded programs, online stores, banking, health records, email]

Confidentiality
How do data leaks happen?
“Unauthorized application use: … the use of unauthorized programs resulted in as many as half of their companies' data loss incidents.” (“Data leakage worldwide, …”, Cisco, 2008)
Focus of our case study: J2ME midlets for mobile devices
• Spyware can be bought (flexispy.com, ..)
• “A malicious signed application could read all the PIM data and send it to an attacker using the variety of transport mechanisms outlined in this document.” (Symantec, 2007)

J2ME midlets
EventSharingMidlet:
    void sendEvent(…) {
        …
        // accesses the phone's native data
        contactList = (ContactList) PIM.getInstance().openPIMList(
            PIM.CONTACT_LIST, PIM.READ_ONLY, listname);
        …
        // sends something
        conn.send(message);
        …
    }
How do we know that information does not leak?

How can information be leaked?
Variant 1 (the secret is sent directly):
    public void sendEvent() {
        doUsefulWork();
        ...
        conn.send(secret_message);
    }
Variant 2 (the secret influences a value that is sent):
    public void sendEvent(…) {
        …
        doUsefulWork();
        low = 0;
        if (phoneBook.contains("555-55")) {
            low = 1;
        }
        conn.send(low);
    }
Model: the attacker
a) knows the program
b) observes all external communication
Information is leaked due to malicious (or buggy) code.
Confidentiality is not a property of a single trace.
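The second sendEvent variant can be turned into a small runnable sketch that makes the last point concrete. The class name ImplicitFlowDemo, the use of a returned list to stand in for conn.send, and the sample phone numbers are assumptions made for illustration only. Each run on its own sends an innocuous 0 or 1; the leak shows up only when two runs with different phone books are compared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the midlet: the returned list models everything
// an attacker who watches the connection would observe.
final class ImplicitFlowDemo {

    static List<Integer> sendEvent(List<String> phoneBook) {
        List<Integer> sent = new ArrayList<>();
        int low = 0;
        // The secret (is "555-55" in the phone book?) decides the value of low.
        if (phoneBook.contains("555-55")) {
            low = 1;
        }
        sent.add(low); // models conn.send(low)
        return sent;
    }

    public static void main(String[] args) {
        List<Integer> withNumber = sendEvent(List.of("555-55", "123-45"));
        List<Integer> withoutNumber = sendEvent(List.of("123-45"));
        // Different observations for different secrets: the attacker,
        // who knows the program, can infer the secret from the output.
        System.out.println("leaks: " + !withNumber.equals(withoutNumber));
    }
}
```

Because the two observations differ, no execution with the other value of the secret can produce the same output, which is exactly the situation the definition of confidentiality is meant to rule out.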

Checking Confidentiality
createEvent Midlet:
    // get the phone number
    number = phoneBook.elementAt(selected);
    // test if the number is valid
    if ((number == null) || number.equals("")) {
        // output error
    } else {
        String message = inputMessage();
        // send a message to the receiver
        sendMessage(number, message);
    }
• Taint analysis is too strict
• Language-based approaches would require annotations for downgrading

Software Model Checking
[Diagram: Program P (source code) and Specification φ → Abstraction → Software model checker → Yes / No (counterexample)]
Example specifications:
• Is every acquired lock eventually released?
• Is the system deadlock free?
Successful and widely used, e.g. SLAM → SDV.
Not directly applicable to specifying and verifying confidentiality:
1. Confidentiality is not a property of a single execution (thus not specifiable in LTL, and in fact not specifiable in μ-calculus).
2. Both over- and under-approximation are needed.
3. The main strength of software model checking is finding bugs in control-oriented programs.

Goal
What we need:
• Specification framework
• Analysis method
[Diagram: specification and program → Confidentiality analysis tool → Yes / No]

Reachability
                          Reachability
Temporal specifications   LTL, CTL, μ-calculus
Finite-state systems      NL-complete
Programs (Java methods)   Undecidable; over-approximation for sound analysis (of unreachability)

Talk Overview
                          Reachability            “Confidentiality”
Temporal specifications   LTL, CTL, μ-calculus    ??
Finite-state systems      NL-complete             ??
Programs (Java methods)   Undecidable; over-approximation for sound analysis (of unreachability)    ??

Defining Confidentiality
Secret: property to be kept confidential; typically a predicate over state variables
Observation h of an execution: what can the attacker observe?
    Two executions with the same observation are equivalent
    Examples: outputs; sequence of messages sent
    More generally, each state is labeled with observable propositions, and the observation of an execution is the sequence of observable propositions of its states
Executions of interest are specified by a condition cond:
    • Terminating executions
    • Executions where the input satisfies some constraint

Conditional Confidentiality
Given a notion of observation, a property secret, and a condition cond of interesting executions, a program P satisfies conditional confidentiality iff
for every execution r satisfying cond, there exists an execution r’ such that
1. r and r’ have the same observation
2. r and r’ differ on the value of secret

Temporal Logics for Confidentiality
Motivation: in multi-agent systems and for protocols, how to specify requirements concerning the order in which secrets are revealed
Classical model of systems/programs: trees
Existing branching-time logics are not adequate
• Thm: Confidentiality cannot be expressed in μ-calculus
• Cannot capture “equivalence” of executions

Labeled Trees
[Figure: tree whose nodes are labeled with the propositions p and q]
Agent a observes proposition p, b observes q

Labeled Trees with Equivalence Edges
[Figure: the labeled tree with a- and b-labeled edges between nodes]
Agent a observes proposition p, b observes q
a-labeled edge between nodes: a considers them equivalent

The logic CTL≈
f ::= p | ¬f | f1 or f2 | EX f | f1 EU f2 | EG f | EI_a f
EI_a f: f holds in some world considered plausible by a
[Figure: illustration of EX f and EI_a g on a tree]
• Confidentiality: AG (EI_a α and EI_a ¬α)
• Agent a does not reveal x before agent b reveals y: A[(EI_a x and EI_a ¬x) U (AI_b y or AI_b ¬y)]
Analogous extension of μ-calculus: μ≈

Model Checking
Does a finite-state system satisfy a temporal logic formula?
Nesting-free fragments:
• CTL≈: PSPACE-complete
• μ≈-calculus: EXPTIME-complete
In general: nonelementary (resp.
undecidable)
Good news: typical confidentiality properties are captured in the nesting-free fragments

Talk Overview
                          Reachability           Conditional Confidentiality
Temporal logics           CTL, μ-calculus        CTL≈, μ≈-calculus
Finite-state systems      NL-complete            PSPACE-complete
Programs (Java methods)   Undecidable; over-approximation for sound analysis (of unreachability)    ??

Confidentiality for programs
    res = -1; i = 0;
    while (i < n) {
        if (A[i] == key) {
            res = A[i];
        }
        i++;
    }
    send res;
• secret: Does A contain 7?
• Observer sees the value of res
• cond: key is not 7
For all observations h, if h is valid (consistent with the condition cond), then h leads to a state where secret holds, and h leads to a state where the secret does not hold.
Example: suppose the observer sees 3 (that is, res = 3):
• There exists a state: A = [7,3]; key = 3 (observation valid)
• There exists a state: A = [7,3]; key = 3 (secret holds)
• There exists a state: A = [1,3]; key = 3 (secret does not hold)

Confidentiality for programs
    res = -1; i = 0;
    while (i < n) {
        if (A[i] == key) {
            res = A[i];
        }
        i++;
    }
    send res;
• secret: Does A contain 7?
• Observer sees the value of res.
• cond: key is not 7.
Confidentiality (R is the set of reachable states):
For all possible observations h,
    if h is valid (consistent with the condition cond):
        there exists s: s in R and cond(s) and s[res] = h
    then h leads to a state where secret holds:
        there exists s: s in R and secret(s) and s[res] = h
    and h leads to a state where the secret does not hold:
        there exists s: s in R and ¬secret(s) and s[res] = h

Over- / under- approximation
Computing reachable states exactly is impractical.
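Before turning to approximations, the definition above can be made concrete by brute force when the state space is artificially tiny. The following is only an illustrative sketch under stated assumptions, not the analysis performed by the tool: data values are restricted to the hypothetical domain {1, 3, 7}, array length is bounded by maxLen, and the class and method names are invented here. It enumerates every state, runs the search routine, and checks that each valid observation has both a witness where the secret holds and one where it does not.

```java
import java.util.HashSet;
import java.util.Set;

// Brute-force check of the confidentiality definition on a tiny, bounded
// state space. Domain and bounds are assumptions made for illustration.
final class BoundedConfidentialityCheck {
    static final int[] DOMAIN = {1, 3, 7};

    // The search routine from the slide; the return value models "send res".
    static int find(int[] a, int key) {
        int res = -1;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) {
                res = a[i];
            }
        }
        return res;
    }

    // secret: does A contain 7?
    static boolean secret(int[] a) {
        for (int x : a) {
            if (x == 7) return true;
        }
        return false;
    }

    static boolean holds(int maxLen) {
        Set<Integer> valid = new HashSet<>();         // observations consistent with cond
        Set<Integer> withSecret = new HashSet<>();    // observations reachable with secret
        Set<Integer> withoutSecret = new HashSet<>(); // observations reachable without it
        for (int n = 0; n <= maxLen; n++) {
            int count = 1;
            for (int i = 0; i < n; i++) count *= DOMAIN.length;
            for (int code = 0; code < count; code++) {
                int[] a = new int[n];
                int c = code;
                for (int i = 0; i < n; i++) { // decode code into an array over DOMAIN
                    a[i] = DOMAIN[c % DOMAIN.length];
                    c /= DOMAIN.length;
                }
                for (int key : DOMAIN) {
                    int res = find(a, key);
                    if (key != 7) valid.add(res); // cond: key is not 7
                    if (secret(a)) withSecret.add(res); else withoutSecret.add(res);
                }
            }
        }
        return withSecret.containsAll(valid) && withoutSecret.containsAll(valid);
    }

    public static void main(String[] args) {
        System.out.println("maxLen=1: " + holds(1));
        System.out.println("maxLen=2: " + holds(2));
    }
}
```

With arrays of length at most 1 the check fails: observation 3 can only come from A = [3], so no witness contains 7. With length at most 2 it succeeds, using A = [7,3] and A = [1,3] as in the example above. Enumeration like this does not scale, which is why the approach relies on approximations of the reachable states instead.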
Approximation: R+ (an over-approximation, R ⊆ R+), R- (an under-approximation, R- ⊆ R)
[Figure: nested sets R- inside R inside R+]
Confidentiality:
For all possible observations h,
    if h is valid (consistent with the condition cond):
        there exists s: s in R+ and cond(s) and s[res] = h
    then h leads to a state where secret holds:
        there exists s: s in R- and secret(s) and s[res] = h
    and h leads to a state where the secret does not hold:
        there exists s: s in R- and ¬secret(s) and s[res] = h
Lemma: The approximate formula implies confidentiality.

Over- / under- approximation
Computing the over-approximation R+: invariants (user-supplied or computed)
Example:
    res = -1; i = 0;
    while (i < n) {
        if (A[i] == key) {
            res = A[i];
        }
        i++;
    }
    send res;
Invariant: (res == key) or (res == -1)

Over- / under- approximation
Computing the under-approximation R-: loop unrolling, bounding the data structure size
Original program:
    res = -1; i = 0;
    while (i < n) {
        if (A[i] == key) {
            res = A[i];
        }
        i++;
    }
    send res;
Loop unrolled twice:
    res = -1; i = 0;
    if (i < n) {
        if (A[i] == key) {
            res = A[i];
        }
        i++;
    }
    if (i < n) {
        if (A[i] == key) {
            res = A[i];
        }
        i++;
    }
    assume(i >= n);
    send res;

Confidentiality as a logical formula
for all h:
    if there exist pv: inv(pv) and cond(pv) and res = h
    implies
        there exist pv: WP(P’, (secret and res = h))
        and there exist pv: WP(P’, (¬secret and res = h))
where pv are the program variables, inv is the invariant, WP is the weakest precondition, and P’ is the program with unrolled loops.
Confidentiality holds if:
$$\forall h : \big(\exists pv : inv(pv) \land cond(pv) \land hist = h\big) \Rightarrow \big(\exists pv : WP(P', secret(pv) \land hist = h)\big) \land \big(\exists pv : WP(P', \lnot secret(pv) \land hist = h)\big)$$

Deciding validity of confidentiality formula
Problem: quantifier alternation. The complexity of decision procedures (QBF, Presburger arithmetic) is high, and the tools are not well engineered.
Question: Could we use SMT solvers?
Idea: Restrict the expression language to contain only equality (order).
Rationale: Many programs do not perform arithmetic on the data, only tasks like searching, inserting, deleting, (sorting).
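One way to build intuition for the equality-only restriction is that a program which only compares data values cannot tell apart two inputs that differ by a consistent renaming of those values. The sketch below is an assumption-laden illustration, not part of the original analysis: the class EqualityOnlyDemo and the renaming that swaps 3 and 5 (leaving the constants -1 and 7 fixed) are hypothetical. It checks that running the search routine on renamed inputs gives the renamed output.

```java
import java.util.Arrays;

// Illustrates that a search routine using only equality on data commutes
// with a renaming of data values that fixes the program's constants.
final class EqualityOnlyDemo {

    // The search routine from the slides; the return value models "send res".
    static int find(int[] a, int key) {
        int res = -1;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) {
                res = a[i];
            }
        }
        return res;
    }

    // Hypothetical renaming: swaps 3 and 5, leaves every other value
    // (including the constants -1 and 7) unchanged.
    static int rename(int v) {
        if (v == 3) return 5;
        if (v == 5) return 3;
        return v;
    }

    // True when find(rename(A), rename(key)) equals rename(find(A, key)).
    static boolean commutesWithRenaming(int[] a, int key) {
        int[] renamed = Arrays.stream(a).map(EqualityOnlyDemo::rename).toArray();
        return find(renamed, rename(key)) == rename(find(a, key));
    }

    public static void main(String[] args) {
        System.out.println(commutesWithRenaming(new int[]{7, 3}, 3));
        System.out.println(commutesWithRenaming(new int[]{1, 5}, 5));
        System.out.println(commutesWithRenaming(new int[]{1, 3}, 9));
    }
}
```

A routine that did arithmetic on the data (for example, returning A[i] + 1) would not have this property in general, which is why the restriction matters: values other than the program's constants behave interchangeably, so a few representatives suffice.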
    res = -1; i = 0;
    while (i < n) {
        if (A[i] == key) {
            res = A[i];
        }
        i++;
    }
    send res;

Deciding validity of confidentiality formula
Result: If the universal quantifier is over a domain with only equality, we can replace it by checking the formula at a fixed number of specific values.
$$\forall h : \big(\exists res, key : ((res = -1) \lor (res = key)) \land (key \neq 7) \land res = h\big) \Rightarrow (\exists pv : \varphi_1) \land (\exists pv : \varphi_2)$$
    res = -1; i = 0;
    while (i < n) {
        if (A[i] == key) {
            res = A[i];
        }
        i++;
    }
    send res;
Values 7, -1, and one other (e.g. 1) need to be checked.
Thus, an SMT solver can be used (checking three formulas per constant).

ConAn (CONfidentiality ANalysis)
Inputs: Java bytecode, Secret, Cond, Invariant, N_unroll, N_array
WALA: processes bytecode to produce an intermediate representation of SSA instructions organized in a control-flow graph.
Yices: performs SMT solving.
Output: Valid / Unsat

Applications
• Case study: J2ME Java methods from third-party programs, accessing PIM information (managing contacts, calendars, to-do lists) and sending messages
• Other Java methods:
    methods from other PIM managing programs (chat clients, calendars, ..)
    data structure accessing methods from the Java standard library

Experimental results
#   Project/Class          Method name             # of lines   unroll   running time (s)   result
1   Java.lang/Vector       elementAt               6            1        0.18               valid
2   EventSharing           sendEvent               122          2        1.83               valid
3   EventSharing           sendEvent (bug)         126          2        1.80               unsat
4                          find                    9            1        0.31               unsat
5                          find                    9            2        0.34               valid
6   Funambol/Contact       getContact              13           2        0.32               valid
7   Blackchat/ICQContact   getContactByReference   23           2        0.24               valid
8                          password check          9            2        0.22               valid

Conclusions
Algorithmic, specification-driven analysis is an effective way of establishing that programs do not leak confidential information.