Online Auditing How may Auditors Inadvertently Compromise Your Privacy Kobbi Nissim Microsoft With Nina Mishra HP/Stanford Work in progress The Setting q = (f ,i1,…,ik) f (di1,…,dik) Statistical database • Dataset: d={d1,…,dn} – Entries di: Real, Integer, Boolean • Query: q = (f ,i1,…,ik) – f : Min, Max, Median, Sum, Average, Count… • Bad users will try to breach the privacy of individuals 2 The Data Privacy Game: an Information-Privacy Tradeoff • Private functions: i f f f – Want to hide i(d)=di • Information functions: – Want to reveal query answers f(di1,…,dik) • Major question: what may be computed over d (and given to users) without breaching privacy? • Confidentiality control methods – Perturbation methods: give `noisy’ answers – Query restriction methods: limit the queries users may post, usually imposing some structure (e.g. size/overlap restrictions) 3 Auditing • [AW89] classify auditing as a query restriction method: – “Auditing of an SDB involves keeping up-to-date logs of all queries made by each user (not the data involved) and constantly checking for possible compromise whenever a new query is issued” • Partial motivation: May allow for more queries to be posed, if no privacy threat occurs • Early work: Hofmann 1977, Schlorer 1976, Chin, Ozsoyoglu 1981, 1986 • Recent interest: Kleinberg, Papadimitriou, Raghavan 2000, Li, Wang, Wang, Jajodia 2002, Jonsson, Krokhin 2003 4 Auditing Here’s the answer OR Here’s a new query: qi+1 Query denied (as the answer would cause privacy loss) Auditor Query log q1,…,qi Statistical database 5 Design choices in Prior Work (1) 1. Privacy definition: – – Privacy breached (only) when a database entry may be deduced fully, or within some accuracy These privacy guarantees do not generally suffice: • Should take into account: Adversary’s computational power, prior knowledge, access to other databases… 2. Exact answers given – Auditors viewed as a way to give `quality’ answers??? 6 Design choices in Prior Work (2) 3. Which information is taken into account in the auditor decision procedure: – • Decision made based on queries q1,…,qi, qi+1 and their answers a1,…,ai, ai+1 Denials ignored 4. Offline vs. Online: • Offline auditing: queries and answers checked for compromise at the end of the day • • Only detect breaches Online auditing: answer/deny queries on the fly • Prevent breaches just before they happen 7 Example 1: Sum/Max auditing di real, sum/max queries, privacy breached if some di learned q1 = sum(d1,d2,d3) sum(d1,d2,d3) = 15 q2 = max(d1,d2,d3) Denied (the answer would cause privacy loss) Oh well… Auditor 8 Some Prior Work on Auditors Sum/Max [Chin] Boolean [KPR00] Data real Queries Breach Sum/max di learned Complexity NP-hard 0/1 Sum --”-- NP-hard* Max [KPR00] Real Max --”-- PTIME Interval based [LWWJ02] Generalized results [JK03] di [a,b] sum di within accuracy PTIME NP-hard / PTIME * Approx version in PTIME 9 Can we use the offline version for online auditing? … After Two Minutes … di real, sum/max queries, privacy breached if some di learned q1 = sum(d1,d2,d3) sum(d1,d2,d3) = 15 q2 = max(d1,d2,d3) There must beiffa q2 is denied Ohreason well… for the d1=d2=d3 =5 denial… I win! Denied (the answer would cause privacy loss) Auditor 10 Example 2: Interval Based Auditing di [0,100], sum queries, =1 (PTIME) q1 = sum(d1,d2) Sorry, denied q2 = sum(d2,d3) sum(d2,d3) = 50 d1,d2 [0,1] Denial d3 [49,50] d1,d2[0,1] or [99,100] Auditor 11 Sounds Familiar? Colonel Oliver North, on the Iran-Contra Arms Deal: On the advice of my counsel I respectfully and regretfully decline to answer the question based on my constitutional rights. David Duncan, Former auditor for Enron and partner in Andersen: Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United States. 12 Max Auditing d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn di real q1 = max(d1,d2,d3,d4) M1234 q2 = max(d1,d2,d3) If denied: d4=M1234 M123 / denied q2 = max(d1,d2) If denied: d3=M123 M12 / denied Auditor 13 Adversary’s Success q1 = max(d1,d2,d3,d4) If denied: d4=M1234 Denied with probability 1/4 q2 = max(d1,d2) If denied: d3=M123 Denied with probability 1/3 Success probability: 1/4 + (1- 1/4)·1/3 = 1/2 q2 = max(d1,d2,d3) Recover 1/8 of the database! Auditor 14 Boolean Auditing? d1 d2 d3 d4 d5 d6 d7 d8 … dn-1 dn q1 = sum(d1,d2) di Boolean 1 / denied q2=sum(d2,d3) … 1 / denied qi denied iff di = di+1 learn database/complement Let di,dj,dk not all equal, where qi-1, qi, qj-1, qj, qk-1, qk all denied q2=sum(di,dj,dk) 1/2 Recover the entire database! Auditor 15 Two Problems • Obvious problem: denied queries ignored – Algorithmic problem: not clear how to incorporate denials in the decision • Subtle problem: – Query denials leak (potentially sensitive) information • Users cannot decide denials by themselves Possible assignments to {d1,…,dn} Assignments consistent with (q1,…qi, a1,…,ai) qi+1 denied 16 A Spectrum of Auditors “Safe” “Safe” Size overlap restriction Algebraic structure q1,…,qi, qi+1 < utility “Unsafe” q1,…,qi, qi+1 a1,…,ai > q1,…,qi, qi+1 a1,…,ai, ai+1 privacy *Note: can work in “unsafe” region, but need to prove denials do not leak crucial information 17 Simulatable Auditing* An auditor is simulatable if a simulator exists s.t.: qi+1 q1,…,qi Statistical database Auditor Deny/answer qi+1 q1,…,qi a1,…,ai Simulator Deny/answer Simulation denials do not leak information * `self auditors’ in [DN03] 18 Why Simulatable Auditors do not Leak Information? Possible assignments to {d1,…,dn} Assignments consistent with (q1,…qi, a1,…,ai ) qi+1 denied/allowed 19 Summary • Improper usage of auditors may lead to privacy breaches, due to information leakage in the decision procedure. – Cell suppression / some k-anonymity methods should be checked similarly – Should make sure offline auditors do not leak information in decision • Simulatable auditors provably don’t leak information – Give best utility while still “safe” – A launching point for further research on auditors • Further research: – Auditors with more reasonable privacy guarantees 20