Online Auditing

advertisement

Online Auditing

Kobbi Nissim

Microsoft

Based on a position paper with Nina Mishra

The Setting

q = (f ,i

1

,…,i k

) f (d i1

,…,d ik

)

Statistical database

• Dataset: {d

1

,…,d n

}

– Entries d i

: Real, Integer, Boolean

• Query: q = (f ,i

1

,…,i k

)

– f : Min, Max, Median, Sum, Average, Count…

• Some users are bad…

2

Here’s a new query: q i+1

Auditing

Here’s the answer

OR

Query denied (as the answer would cause privacy loss)

Auditor

Query log q

1

,…,q i

Statistical database

3

Auditing

• [Adam, Wortmann 89] classify auditing as a query restriction method

– Such methods limit the queries users may post, usually imposing some structure (e.g. combinatorial, algebraic)

– “Auditing of an SDB involves keeping up-to-date logs of all queries made by each user (not the data involved) and constantly checking for possible compromise whenever a new query is issued”

• Partial motivation:

May allow for more queries to be posed, if no privacy threat occurs

• Early work: Hofmann 1977, Schlorer 1976, Chin,

Ozsoyoglu 1981, 1986

• Recent interest:

Kleinberg, Papadimitriou, Raghavan 2000,

Li, Wang, Wang, Jajodia 2002, Jonsson, Krokhin 2003

4

Design choices in Prior Work

• Out of the scope for this talk (but important):

– Very weak privacy guarantee: Privacy breached

(only) when a database entry may be uniquely deduced

– Exact answers given

• Important for this talk:

– Data taken into account in decision procedure

• Answers to q

1

,…,q i

• Denials ignored and q i+1 taken into account

5

Some Prior Work on Auditors

Sum/Max

[Chin]

Boolean

[KPR00]

Max [KPR00]

Interval based

[LWWJ02]

Generalized results [JK03]

Data Queries real Sum/max d i

0/1 Sum

Breach Complexity learned

--”--

NP-hard

NP-hard

Real Max --”-d i

[a,b] sum d i within accuracy

.

PTIME

PTIME

NP-hard /

PTIME

6

Example 1: Sum/Max auditing

d i real, sum/max queries q1 = sum(d1,d2,d3) sum(d1,d2,d3) = 15 q2 = max(d1,d2,d3)

Denied (the answer would cause privacy loss)

Oh well… d1=d2=d3 = 5

I win!

Auditor

7

Example 2: Interval Based Auditing

d i

[0,100], sum queries,

=1 (PTIME) q1 = sum(d1,d2)

Sorry, denied q2 = sum(d2,d3) sum(d2,d3) = 50 d1,d2

[0,1] d3

[49,50]

Auditor

8

Sounds Familiar?

Colonel Oliver North, on the Iran-Contra Arms Deal:

On the advice of my counsel I respectfully and regretfully decline to answer the question based on my constitutional rights.

David Duncan, Former auditor for Enron and partner in Andersen:

Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United

States.

9

d1 d2

What about Max Auditing?

d3 d4 d5 d6 d7 d8 … d n-1 d n d i real q1 = max(d1,d2,d3,d4)

M

1234 q2 = max(d1,d2,d3)

If denied: d4=M

1234 q2 = max(d1,d2)

M

123

/ denied

M

12

/ denied

If denied: d3=M

123

Recover 1/8 of the database!

Auditor

10

What about Boolean Auditing?

d1 d2 d3 d4 d5 d6 d7 d8 … d n-1 d n d i

Boolean q1 = sum(d1,d2)

1 / denied q i q2=sum(d2,d3) denied iff d i

1 / denied

= d i+1

 learn database/complement

Let d i

,d j

,d k not all equal, where q i-1

, q i

, q j-1

, q j

, q k-1

, q k all denied q2=sum(d i

,d j

,d k

)

1 / 2

Recover the entire database!

Auditor

11

What are the Problems?

• Obvious problem: denied queries ignored

– Algorithmic problem: not clear how to incorporate denials in the deicion

• Subtle problem:

– Query denials leak (potentially sensitive) information

• Users cannot decide denials by themselves

Possible assignments to {d

1

,…,d n

}

Assignments consistent with (q

1

,…q i

) q i+1 denied

12

A Spectrum of Auditors

Decision data q

1

,…,q i

, q i+1 i i

, q i i

, a i+1

Examples

• Size overlap restriction

• Algebraic structure

• Sum/Max, Interval based,

Boolean, Max

• Cell suppression

• k-anonimity

“safe”

“unsafe”

*Note: can work in “unsafe” region, but need to prove denials do not

13 leak crucial information

Simulatable Auditing*

An auditor is simulatable if a simulator exists s.t.: q

1

,…,q i q i+1 q i+1

Statistical database

Auditor q

1

,…,q i a

1

,…,a i

Simulator

Deny/answer

Deny/answer

Simulation

 denials do not leak information

* `self auditors’ in [DN03]

14

Summary

• Subtleties in current definition of auditors allow for information leakage, and potentially, privacy breaches

– Denials are not taken into account

– Auditor uses information not available to user

• Simulatable auditors provably don’t leak information in decision

– New starting point for research on auditors

15

Download