A model for data revelation
Poorvi Vora
Dept. of Computer Science
George Washington University

“Security” frameworks
Binary:
• Divide the world into trusted and untrusted parties
• Provide either complete revelation of information or complete protection, e.g. multiparty computation, encrypted data
6/30/2016 Poorvi Vora/CS/GWU 2

Even a statistic or aggregate reveals “private” information
Secure multiparty computation reveals f(x1, x2, ..., xn) and nothing more.
Yet this reveals information about every xi.
Thus, typical security assurances are not enough.

What is privacy?
• Control over information
• Extent of information revelation
Tensions between:
– Access to aggregate information for the community vs. individual control
– Reputation vs. prejudice

Individual control requires more than binary security of personal information
Information is often given up for something in return:
– Safeway card
– Monthly charge to be kept off phone books
– Information for community statistics:
  • Health statistics
  • Collaborative filtering/personalization in virtual communities

A model: introduce uncertainty
Maximum uncertainty (i.e. secrecy) corresponds to crypto protocols.
• Alice and Bob determine:
  – a binary data point from Alice’s personal information, x
  – a probability of truth, p
  – a return, y
• Alice reveals a variable z, where z = x with probability p and z = 1 - x otherwise
• Bob provides, in return, y
• z exists in the ether as Alice’s value x with probability p
This is not mutually exclusive with cryptographic protection (p = 0.5 is cryptographic: z is then independent of x).
Used in the public health community for twenty-odd years.

Outcome
• The protocol is a mathematical game between Alice and Bob
• The optimal situation is not when no information is revealed, but when Alice gets maximum benefit for her information
Think about this: should women in Africa test for HIV when they will certainly not obtain any treatment for it?
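A minimal Python sketch of the protocol, assuming Alice’s data point is a 0/1 bit and each reply is an independent coin flip; the function names and the 30% population are hypothetical. It also shows why the scheme still supports community statistics: Bob cannot trust any single z, yet over many respondents the expected fraction of 1s is (1 - p) + π(2p - 1), where π is the true fraction of 1s, so π can be estimated whenever p ≠ 0.5 (the classic Warner randomized-response estimator).

```python
import random


def randomized_response(x: int, p: float, rng: random.Random) -> int:
    """Alice's reply: her true bit x with probability p, its complement otherwise."""
    return x if rng.random() < p else 1 - x


def estimate_fraction(responses: list[int], p: float) -> float:
    """Estimate the true fraction of 1s from randomized replies.

    E[observed fraction of 1s] = (1 - p) + pi * (2p - 1); solve for pi.
    At p = 0.5 the replies carry no information about x, so no estimate exists.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes z independent of x")
    observed = sum(responses) / len(responses)
    return (observed - (1 - p)) / (2 * p - 1)


# Hypothetical population: 30% of respondents have x = 1.
rng = random.Random(2016)
true_bits = [1] * 30_000 + [0] * 70_000
replies = [randomized_response(x, 0.75, rng) for x in true_bits]
print(f"estimated fraction: {estimate_fraction(replies, 0.75):.3f}")  # close to 0.300
```

Each individual reply is a lie with probability 0.25, but the aggregate estimate stays close to the true 30%; choosing p closer to 0.5 protects Alice more and makes the estimate noisier.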

An analogy
• The protocol is a communication channel
• The sender is Alice; the receiver is (a possibly malicious) Bob
• The probability of error is the probability of a lie

Security properties of randomization
• Repeated queries: error → 0 as n → ∞, and n → ∞ as error → 0
• Cost to the attacker increases without bound if the error is not bounded above zero
• This is a repetition code over the channel

Other attacks
Query 1: Graying?
Query 2: Balding?
Query 3: Weight?
Query 4: Sports?
Really asking about age and gender.
How does one characterize all such attacks? What can one say about security with respect to such attacks?

An analogy
• The protocol is a communication channel
• The sender is Alice; the receiver is (a possibly malicious) Bob
• The probability of error is the probability of a lie
• The attributes that Bob wants to determine form the message

A simple attack
• Query 1: Female?
• Query 2: Over 40?
• Query 3: Losing calcium?
Query 3 checks the answers to Queries 1 and 2: it is a parity check.

An analogy
• All attacks are communication over the channel
• Good attacks are codes
• What Bob queries is a codeword bit
• What he receives is the codeword as corrupted by the channel (Alice’s lies), which he decodes

Shannon’s theorems apply
In fact, allowing any functions of Alice’s data points as queries (adaptive, related queries) and requiring error probability → 0 as n → ∞:
• The number of queries required per bit of entropy is asymptotically tightly bounded below by the inverse of the channel capacity, 1/C
• Above this bound, error tends to 0 exponentially fast in n
• Below it, error tends to 1 exponentially fast in n

Questions
• How does one determine the entropy of a particular data set, or of a general data set?
• What kinds of attacks are computationally feasible?
• This was a very powerful attacker. What are reasonable limits on the attacker’s abilities?
• The result stands in itself, independent of the model.
• Partly published at the International Symposium on Information Theory (ISIT), 2003
• Journal paper under review, available on the website

Value-free model
• Human rights aspects covered through crypto protocols
• Necessary health information and community information can be gathered
• Consumer behaviour treated through this game
• Criticism: very adversarial model

Another application: anonymous delivery
Crowds: Reiter and Rubin (Lucent and AT&T)
[Figure: Crowds network with nodes A, B, C, D, E; N nodes; pf = probability of forwarding]
• At node i+1, node i is more likely than any other node to be the true sender
• Receiver: node i+1
• Message: the sending node
• Received symbol: node i
• Channel characteristic: the probability that the true sender is node i, and the probabilities that the other nodes are the senders
• Traffic analysis/data mining: correlations among senders (communication across the channel, less efficient than some error-correcting code)

An example of using the model to measure the value of information (with Yu-An Sun and Sumit Joshi)
• Auction bids reveal much about an individual’s profile
• Consider the Vickrey (sealed-bid, second-highest-price) auction
  – Optimal strategy: bid one’s valuation
  – Bids (and hence valuations) can be protected with secure multiparty computation
  – But bids allow determination of market demand (efficient markets)
  – Need for an aggregate value, not well defined at the moment of the auction

Variably Private Vickrey – Bidding Round
Introduce uncertainty:
• The seller announces a minimum sale price and a maximum randomization setting.
• Each bidder submits a sealed interval containing her bid. The size of the interval is her choice.
• In the running with the high end of the interval; committed to the low end.

Variably Private Vickrey – Revealing Round
• Bidders not in the running will reveal no more information on their valuations.
• The largest of the others will reveal which half of their interval contains their valuation.

Sale price
[Figure: sale price diagram with labels “Seller gets”, “Buyer pays”, and “Divided among all bidders proportional to the interval width”]

Properties?
• Provides various demand statistics
• In general, the accuracy of future bid estimation is lower with more uncertainty
• Allows a bidder to vary her uncertainty, and to pay for it
• Allows the seller to obtain more than in a regular Vickrey auction, depending on how much information is valued
• The bidder with the highest valuation still wins the auction, as long as she can tolerate revealing her valuation to the extent required.

Summary
A model that we hope will:
– Provide choices not typically available to users today
– Extend the security framework to include problems like those in statistical databases
– Provide a means of measuring uncertainty in situations where there is some uncertainty, rather than none or complete
– Include other leakage from security-related protocols, such as anonymous delivery and ciphers
– Be useful for measuring the economic value of information
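The capacity bound under “Shannon’s theorems apply” and the repetition-code remark under “Security properties of randomization” can be made concrete with a short Python sketch. It assumes, as a simplification, that every query is answered truthfully with probability p independently of all other queries, so the protocol acts as a binary symmetric channel with lie probability 1 - p and capacity C = 1 - H(1 - p), where H is the binary entropy function. The helper names are hypothetical.

```python
import math


def binary_entropy(q: float) -> float:
    """Binary entropy H(q) in bits, with H(0) = H(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def capacity(p_truth: float) -> float:
    """Capacity (bits per query) of a binary symmetric channel
    whose crossover (lie) probability is 1 - p_truth."""
    return 1.0 - binary_entropy(1.0 - p_truth)


def min_queries_per_bit(p_truth: float) -> float:
    """Asymptotic lower bound 1/C on queries needed per bit of entropy."""
    c = capacity(p_truth)
    return math.inf if c == 0.0 else 1.0 / c


def repetition_error(p_truth: float, n: int) -> float:
    """Probability that a majority vote over n (odd) independent answers
    to the same query decodes Alice's bit wrongly."""
    if n % 2 == 0:
        raise ValueError("use an odd number of repetitions")
    q = 1.0 - p_truth
    # Majority is wrong when more than half of the n answers are lies.
    return sum(math.comb(n, k) * q**k * (1 - q) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 5, 7, 9):
    print(n, round(repetition_error(0.9, n), 5))
print(round(min_queries_per_bit(0.9), 3))  # 1.883
```

For p = 0.9, C is about 0.531 bits per query, so at least about 1.88 queries are needed per bit of entropy. Repetition with majority voting drives the error toward 0 only by spending n queries on one bit (rate 1/n), which is why the attacker’s cost grows without bound. At p = 0.5, C = 0 and the bound is infinite, matching the remark that p = 0.5 is cryptographic.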