Privacy Issues in Disclosing Averages
Susmit Sarkar (CMU)

Non-Interference
  Non-interference: observable actions of programs are not influenced by sensitive data
  Too restrictive in practice!
  Think of password security

Safe Relaxation of Non-Interference
  Passwords are sensitive data
  Checking passwords violates non-interference
  This is still okay [Volpano] if passwords are chosen randomly
  The interaction is carefully controlled

Generalizing to Averages
  Idea: restrict access so that we can still answer interesting queries
  Also, we can measure information loss
  We want to calculate averages on private data
  Generalize the notion of averages

Content Host’s Problem
  A content host serves multiple content providers
  The number of hits is sensitive information
  Often, clients ask for the average hits of specified clients

Example: Sports Site
  You want to know how the redesign of your sports portal worked
  Complications: it happens to be Super Bowl Sunday
  We want averages over all sports sites
  What if there are only 2 sports sites?

Formal Model
  Data: D1 D2 D3 D4 D5
  0 1 0 1
  Query 1 := d1 + d3 + d5 = ?
  Problem: what about the queries 1 0 1 1 0 and 1 0 1 1 1? Their difference is 0 0 0 0 1, which reveals d5

Query Model
  Solution: maintain history
  Idea: add the current query to the set, then decide whether “bad” vectors are derivable
  We restrict attention to weighted sums

Issues Ignored in Model
  Answers of queries (right-hand sides)
  Data values
  Extraneous information: correlation between data
  Some of these are left to future work

Characterizing Bad Vectors
  (0 1 0 0 0 0 0 0 0 0 0)
  (1 10^6 1 1 1 1 1 1 1 1 1)
  We want a measure that indicates when all entries are of similar magnitude

Idea: Entropy
  We use the entropy function: $-\sum_i p_i \lg p_i$
  Normalize entries so that magnitudes sum to one
  Then treat the magnitudes as probabilities in the entropy definition
  Entropy is low when data is skewed

Formal Problem Statement
  m query vectors $Q_i = (q_{i1}, q_{i2}, \ldots, q_{in})$
  Unknown linear combination $U = c_1 Q_1 + c_2 Q_2 + \cdots + c_m Q_m$
  Variables $u_i = \sum_j c_j q_{ji}$
  Variables $u'_i \ge u_i$ and $u'_i \ge -u_i$, so $u'_i \ge |u_i|$

Calculating Entropy
  Entropy: $-\sum_i \frac{u'_i}{\sum_j u'_j} \lg \frac{u'_i}{\sum_j u'_j} \ge T$
  Minimize: $\sum_i u'_i$
  Notice that this is a convex program

Convex Programming
  [Vempala] allows us to do convex programming efficiently
  The algorithm allows us to solve our problem in polynomial time

Future Work
  Extend our measure to take into account the right-hand sides
  Change the model to maximize the number of queries we can answer

Bibliography
  [Volpano] “Verifying Secrets and Relative Secrecy”, Volpano and Smith, POPL ’00
  [Vempala] “Solving Convex Programs by Random Walks”, Vempala and Bertsimas, STOC ’02
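
Appendix: Entropy Measure Sketch
  The entropy measure from the "Idea: Entropy" slide can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the implementation behind the talk: the function name normalized_entropy and the variable names are hypothetical, and the 10^6 entry follows the reading of the second example vector as (1 10^6 1 ... 1). It normalizes entry magnitudes to sum to one, treats them as probabilities, and shows that both "bad" vectors score far lower than a vector whose entries are all of similar magnitude. It also checks that the difference of the two problem queries isolates a single data item.

```python
import math


def normalized_entropy(vector):
    """Base-2 entropy of the entry magnitudes, normalized to sum to one.

    Hypothetical helper: the name and error handling are assumptions.
    """
    magnitudes = [abs(x) for x in vector]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("the zero vector has no defined entropy")
    probs = [m / total for m in magnitudes]
    # Entries with p = 0 contribute nothing, by the usual convention.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# The two "bad" vectors from the slides, plus an evenly weighted one.
single = [0, 1] + [0] * 9        # reveals exactly one data item
skewed = [1, 10**6] + [1] * 9    # dominated by one data item
uniform = [1] * 11               # an honest average over 11 items

# The two queries from the "Problem" line of the formal model.
q_a = [1, 0, 1, 1, 0]
q_b = [1, 0, 1, 1, 1]
difference = [b - a for a, b in zip(q_a, q_b)]

if __name__ == "__main__":
    for name, vec in [("single", single), ("skewed", skewed),
                      ("uniform", uniform), ("difference", difference)]:
        print(name, normalized_entropy(vec))
```

  A threshold T between the skewed and uniform scores would reject the first two vectors and accept the third. Choosing T is outside the scope of this sketch.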