A model for data revelation
Poorvi Vora
Dept. of Computer Science
George Washington University
“Security” frameworks
Binary
• Divide the world into trusted and untrusted
parties
• Provide complete revelation of information
or complete protection
E.g. multiparty computation, encrypted data
6/30/2016
Poorvi Vora/CS/GWU
2
Even a statistic or aggregate reveals
“private” information
Secure multiparty computation reveals
f(x1, x2, ..., xn)
And nothing more.
Yet, this reveals information about all xi
Thus, typical security assurances not enough
What is privacy
• Control over information
• Extent of information revelation
Tensions between:
Access to aggregate information for community
Vs.
Individual control
reputation vs. prejudice
Individual control requires more than binary
security of personal information
Information is often given up for
something in return
– Safeway card
– Monthly charge to be kept out of phone books
– Information for community statistics:
• Health statistics
• Collaborative filtering/personalization in virtual
communities
A model: introduce uncertainty
maximum uncertainty (i.e. secrecy)
corresponds to crypto protocols
• Alice and Bob determine:
– a binary data point from Alice’s personal information, x
– a probability of truth, p
– a return, y
• Alice reveals a variable z = x with probability p
• Bob provides, in return, y
• z exists in the ether as Alice’s value x with probability p
This is not mutually exclusive with cryptographic protection (p=0.5
is cryptographic)
Used in the public health community for twenty-odd years
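A minimal sketch of the revelation step on this slide, assuming that when Alice does not reveal x she reveals the flipped bit (the slide only states that z = x with probability p). The function names `reveal` and `estimate_fraction` are hypothetical, and the aggregate estimator is the standard inversion of the expected noisy answer, added here only to illustrate how community statistics can still be recovered.

```python
import random


def reveal(x: int, p: float, rng: random.Random) -> int:
    """Return z = x with probability p, otherwise the flipped bit (assumption)."""
    return x if rng.random() < p else 1 - x


def estimate_fraction(zs, p: float) -> float:
    """Estimate the fraction of individuals with x = 1 from noisy answers zs.

    Uses E[z] = p*f + (1 - p)*(1 - f), solved for f.
    At p = 0.5 the answers carry no information, so no estimate exists.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 reveals nothing; the fraction cannot be estimated")
    observed = sum(zs) / len(zs)
    return (observed - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(1)
    truth = [1 if rng.random() < 0.3 else 0 for _ in range(10000)]
    answers = [reveal(x, 0.8, rng) for x in truth]
    print(estimate_fraction(answers, 0.8))
```

Each individual answer stays deniable, while the estimate over many individuals approaches the true fraction.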
Outcome
Protocol is a mathematical game between Alice and
Bob
Optimal situation not when no information is revealed,
but when Alice gets maximum benefit for her
information
Think about this: should women in Africa test for HIV
when they will certainly not obtain any treatment for
it?
An analogy
• The protocol is a communication channel
• The sender is Alice, the receiver (malicious?)
Bob
• The probability of error is the probability of a
lie
Security properties of randomization
• Repeated queries
Error → 0 as n → ∞
And n → ∞ as Error → 0
• Cost to attacker increases without bound if error not
bounded above zero
• This is a repetition code over channel
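The repeated-query attack can be sketched as a majority vote over n independent noisy answers. This assumes each answer is truthful with probability p, independently of the others, and that Bob decodes by majority; the function name `majority_error` is hypothetical.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that a majority vote over n independent answers is wrong.

    Each answer is truthful with probability p. n must be odd so that
    the vote cannot tie.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    q = 1 - p  # probability of a lie
    # The vote is wrong when more than half of the answers are lies.
    return sum(comb(n, k) * q**k * p**(n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 11, 51, 101):
        print(n, majority_error(0.75, n))
```

For p above 0.5 the error shrinks as n grows, but it reaches zero only in the limit; at p = 0.5 it stays at one half for every n.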
Other attacks
Query 1: Graying?
Query 2: Balding?
Query 3: Weight?
Query 4: Sports?
Really asking about age and gender
How does one characterize all such attacks?
What can one say about security wrt such attacks?
An analogy
• The protocol is a communication channel
• The sender is Alice, the receiver (malicious?) Bob
• The probability of error is the probability of a lie
• The attributes that Bob wants to determine
form the message
A simple attack
• Query 1: Female?
• Query 2: Over 40?
• Query 3: Losing Calcium?
Query 3 checks answers to Query 1 and 2
It is a parity-check bit
An analogy
• All attacks are communication over channel
• Good attacks are codes
• What Bob queries is a codeword bit
• What he receives is the transmitted codeword that he
decodes
Shannon’s theorems apply
In fact, assuming
any functions of Alice’s data points as queries (adaptive,
related queries)
and error probability → 0 as n → ∞
The number of queries required per bit of entropy
is asymptotically tightly bounded below by the inverse of
the channel capacity
Above this bound, error tends exponentially to 0
Below it, it increases exponentially with n
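As a worked illustration of the bound, the sketch below assumes the protocol behaves as a binary symmetric channel in which each answer is flipped with probability 1 - p. Under that assumption the capacity is 1 - H(p), where H is the binary entropy function. The function names are hypothetical.

```python
from math import inf, log2


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def capacity(p: float) -> float:
    """Capacity in bits per query of a binary symmetric channel (assumption)."""
    return 1.0 - binary_entropy(p)


def min_queries_per_bit(p: float) -> float:
    """Asymptotic lower bound on queries per bit of entropy: 1 / capacity."""
    c = capacity(p)
    return inf if c == 0 else 1.0 / c


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.75, 0.9, 1.0):
        print(p, capacity(p), min_queries_per_bit(p))
```

At p = 0.5 the capacity is zero and no number of queries suffices, which matches the earlier remark that p = 0.5 is cryptographic.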
Questions
• How does one determine the entropy of a particular
data set, or a general data set?
• What kinds of attacks are computationally feasible?
• This was a very powerful attacker. What are
reasonable limits on the attacker’s abilities?
• Result in itself, independent of model.
• Partly published at Int. Symp. Info. Theory, 2003
• Journal paper in review, at website
Value-free model
• Human rights aspects covered through crypto
protocols
• Necessary health information and community
information can be gathered
• Consumer behaviour treated through this game
• Criticism: very adversarial model
Another application: anonymous delivery
Crowds: Reiter and Rubin/Lucent and AT&T
At node i+1: node i more likely than any other
[Figure: Crowds network of nodes A, B, C, D, E. Receiver: node i+1. Message: the sending node. Received symbol: node i. N nodes; pf is the probability of forwarding.]
Channel characteristic:
Probability that true sender is Node i,
Probability that other nodes are senders
Traffic analysis/data mining: correlations among
senders (communication across channel, less
efficient than some error-correcting code)
An example of model use to measure the value
of information
with Yu-An Sun and Sumit Joshi
• Auction bids reveal much about an individual’s profile
• Consider the Vickrey – sealed second highest bid –
auction
– Optimal strategy: to bid one’s valuation
– Bids (and hence valuations) can be protected with secure
multiparty computation
– But, bids allow determination of market demand (efficient
markets)
– Need for an aggregate value, not well-defined at the moment
of the auction
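For reference, a minimal sketch of the standard sealed-bid second-price rule: the highest bidder wins and pays the second-highest bid. The function name `vickrey` and the tie-breaking rule (the earliest-listed bidder wins a tie) are assumptions made for illustration, and no privacy protection is modelled here.

```python
def vickrey(bids: dict[str, float]) -> tuple[str, float]:
    """Return (winner, price) for a sealed-bid second-price auction.

    bids maps each bidder's name to her sealed bid. Ties go to the
    bidder listed first (an assumption made for this sketch).
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # sorted() is stable, so equal bids keep their original order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the second-highest bid
    return winner, price


if __name__ == "__main__":
    print(vickrey({"alice": 10.0, "bob": 7.0, "carol": 4.0}))
```

Because the price does not depend on the winner's own bid, bidding one's true valuation is optimal, which is why the bids reveal so much about each bidder.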
Variably Private Vickrey – Bidding Round
Introduce uncertainty
• The seller announces a minimum sale price and a
maximum randomization setting.
• Each bidder submits a sealed interval containing her
bid. The size of the interval is her choice.
• A bidder is in the running with the high end of her interval, but is committed to the low end
Variably Private Vickrey – Revealing
Round
• Bidders not in the running will reveal no more
information on their valuations.
• The highest of the remaining bidders will reveal which half of
their interval contains their valuation
Sale Price
[Figure: price diagram with the labels "Seller gets" and "Buyer pays"; a bracketed portion is marked "Divided among all bidders proportional to the interval width".]
Properties?
• Provides various demand statistics
• In general, accuracy of future bid estimation lower for
more uncertainty
• Allows for bidder to vary uncertainty, and pay for it
• Allows seller to obtain more than regular Vickrey,
depending on how much information is valued
• Bidder with highest valuation still wins auction as long
as she can tolerate revealing her valuation to the
extent required.
Summary
A model that we hope will:
– Provide choices not currently typically available to users
– Extend the security framework to include problems like those
in statistical databases
– Provide a means of measuring uncertainty in situations
where there is some uncertainty, rather than none or complete uncertainty
– Include other leakage from security-related protocols such
as anonymous delivery and ciphers
– Be useful for measuring the economic value of information