Fairness, Privacy, and Social Norms
Omer Reingold, MSR-SVC
"Fairness through awareness" with Cynthia Dwork, Moritz Hardt, Toni Pitassi, Rich Zemel
+ Musings with Cynthia Dwork, Guy Rothblum and Salil Vadhan

In This Talk
• Fairness in classification (an individual-based notion)
 – Connection between fairness and privacy
 – DP beyond Hamming distance
• A notion of privacy beyond the DB setting.
• Empowering society to make choices on privacy.

Fairness in Classification
• Examples: health care, advertising, financial aid.

Concern: Discrimination
• Population includes minorities
 – Ethnic, religious, medical, geographic
 – Protected by law, policy, ethics
• A catalog of evils: redlining, reverse tokenism, self-fulfilling prophecy, …
• Discrimination may be subtle!

Credit Application (WSJ 8/4/10)
• User visits capitalone.com.
• Capital One uses tracking information provided by the tracking network [x+1] to personalize offers.
• Concern: steering minorities into higher rates (illegal)*

Here: A CS Perspective
• An individual-based notion of fairness – fairness through awareness.
• A versatile framework for obtaining and understanding fairness.
• Lots of open problems/directions
 – Fairness vs. privacy

Other notions of "fairness" in CS
• Fair scheduling
• Distributed computing
• Envy-freeness
• Cake cutting
• Stable matching
• More closely related notions outside of CS (Economics, Political Studies, …)
 – Rawls, Roemer, Fleurbaey, Young, Calsamiglia

Fairness and Privacy (1)
• [Dwork & Mulligan 2012]: objections to online behavioral targeting are often expressed in terms of privacy; in many cases the underlying concern is better described in terms of fairness (e.g., price discrimination, being mistreated).
 – Other major concern: feeling of "ickiness" [Tene]
• Privacy does not imply fairness
 – Definitions and techniques are still useful.
 – Can fairness imply privacy (beyond the DB setting)?

[Diagram: individuals x ∈ V are mapped by the ad network (x+1) to outcomes M(x) ∈ O, which the vendor (Capital One) turns into actions in A.]
Our goal: achieve fairness in the first step (the mapping).

[Diagram: individuals x ∈ V are mapped to outcomes M(x) ∈ O; assume an unknown, untrusted, un-auditable vendor downstream.]

First attempt… Fairness through Blindness

Fairness through Blindness
• Ignore all irrelevant/protected attributes
 – e.g., Facebook "sex" & "interested in men/women"
• Point of failure: redundant encodings
 – Machine learning: you don't need to see the label to be able to predict it
 – E.g., redlining

Second attempt… Group Fairness (Statistical Parity)

Group Fairness (Statistical Parity)
• Equalize minority S with the general population T at the level of outcomes
 – Pr[outcome o | S] = Pr[outcome o | T]
• Insufficient as a notion of fairness
 – Has some merit, but can be abused
 – Example: advertise a burger joint to carnivores in T and vegans in S.
 – Example: self-fulfilling prophecy
 – Example: multiculturalism …

Lesson: Fairness is task-specific
• Fairness requires understanding of the classification task (this is where utility and fairness are in accord)
 – Cultural understanding of protected groups
 – Awareness!

Our approach… Individual Fairness
• Treat similar individuals similarly: individuals who are similar for the purpose of the classification task should receive similar distributions over outcomes (fairness in the classification task).
• (Both parity and individual fairness are stated formally right after these slides.)

Metric – Who Decides?
• Assume a task-specific similarity metric
 – The extent to which two individuals are similar w.r.t. the classification task at hand
• Possibly captures some ground truth or society's best approximation
 – Or instead: society's norms
• Open to public discussion, refinement
• Our framework is agnostic to the choice of metric
• User control?
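The slides state the two fairness notions informally; for reference, here is one way to write them down, following the Lipschitz formulation of Dwork, Hardt, Pitassi, Reingold, and Zemel. Taking total variation as the distance D between output distributions is just one concrete choice (the paper also works with a relative l∞ metric).

```latex
% Group fairness (statistical parity up to bias eps), for protected S ⊆ V and T = V \ S:
\forall o \in O:\qquad
  \bigl|\,\Pr_{x \in S}[\,M(x)=o\,] \;-\; \Pr_{x \in T}[\,M(x)=o\,]\,\bigr| \;\le\; \varepsilon

% Individual fairness (Lipschitz condition), for M : V \to \Delta(O) and metric d : V \times V \to \mathbb{R}:
\forall x, y \in V:\qquad
  D\bigl(M(x),\,M(y)\bigr) \;\le\; d(x,\,y)

% One concrete choice for D is the total variation distance:
  D_{\mathrm{tv}}(\mu,\nu) \;=\; \tfrac{1}{2} \sum_{o \in O} \bigl|\,\mu(o)-\nu(o)\,\bigr|
```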
Metric – Starting Points
• Financial/insurance risk metrics
 – Already widely used (though secret)
• IBM's AALIM health-care metric
 – A health metric for treating similar patients similarly
• Roemer's relative-effort metric
 – A well-known approach in Economics/Political theory
• Machine learning
• Maybe not so much science fiction after all…

Randomized Mapping
• [Diagram: classification as a randomized mapping taking each individual x ∈ V to an outcome M(x) ∈ O.]

Towards Formal Definition
• Close individuals according to a metric d: V × V → R are mapped to close distributions.
• [Diagram: individuals x and y that are close under d are mapped to nearby distributions M(x), M(y) over outcomes O.]

Fairness and D-Privacy (2)
• The same picture gives differential privacy: replace individuals by databases, the task metric d: V × V → R by the Hamming distance, and outcomes by sanitizations – close databases must be mapped to close distributions over sanitizations.

Key elements of our approach… Efficiency (with utility maximization)
• Given a loss function L: V × O → R and a metric d: V × V → R, an efficient procedure constructs a d-fair mapping M from individuals to outcomes.
• Minimize the vendor's expected loss subject to the fairness condition (a sketch of this optimization as a linear program follows these slides).

More Specific Questions We Address
• How to efficiently construct the mapping M: V → Δ(O)?
• When does individual fairness imply group fairness (statistical parity)?
 – For a specific metric, which sub-communities are treated similarly?
• A framework for achieving "fair affirmative action" (ensuring minimal violation of the fairness condition).

Fairness vs. Privacy
• Privacy does not imply fairness.
• Can (our definition of) fairness imply privacy?
• Differential privacy [Dwork-McSherry-Nissim-Smith '06]: privacy for individuals whose information is part of a database.

Privacy on the Web?
• No longer protected by the data of others – my traces can be used directly to compromise my privacy.
• Can fairness be viewed as a measure of privacy?
 – Can fairness "blend me in with the (surrounding) crowd"?

Relation to K-Anonymity
• Critique of k-anonymity: blending in with others that have the same sensitive property X is small consolation.
• "Our" notion of privacy is as good as the metric!
• If your surroundings are "normative", this may imply meaningful protection (and substantiate users' currently unjustified sense of security).

Simple Observation: Who Are You, Mr. Reingold?
• If all new information on me obeys our fairness definition, with metrics under which the two possible Omers are very close, then your confidence won't increase by much…

Do We Like It?
Challenge – accumulated leakage:
• Different applications require different metrics.
• Less of an issue for fairness…

D-Privacy with Other Metrics
• This work gives additional motivation to study differential privacy beyond the Hamming distance.
• Well motivated even in the context of database privacy (there since the original paper).
• Example: privacy of social networks [Kifer-Machanavajjhala SIGMOD '11]
 – Privacy depends on context
• Privacy is a matter of social norms.
• Our burden: give tools to decision makers.

What is the Privacy in DP?
• The original motivation is mainly given in terms of opt-out/opt-in incentives: worry about an individual deciding whether to participate.
• A different point of view: a committee that needs to approve a proposed study in the first place.
 – Does the study incur only a tolerable amount of privacy loss for any particular individual?
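The "Efficiency" slide above obtains the fair mapping by minimizing the vendor's expected loss subject to the fairness condition; with total variation as the distance between output distributions, that optimization can be written as a linear program. Below is a minimal sketch under assumptions that are mine rather than the talk's: a small synthetic loss matrix and dissimilarity matrix standing in for L and d, total variation as D, and cvxpy as the modeling interface.

```python
# Minimal sketch: fair classification as a linear program.
# Assumptions (not from the talk): total variation as the distance between output
# distributions, synthetic loss/dissimilarity data, cvxpy as the solver interface.
import numpy as np
import cvxpy as cp

n, k = 5, 3                          # |V| individuals, |O| outcomes
rng = np.random.default_rng(0)
loss = rng.random((n, k))            # L[x, o]: vendor's loss for giving x outcome o
d = rng.random((n, n))               # toy symmetric dissimilarity standing in for the task metric
d = (d + d.T) / 2
np.fill_diagonal(d, 0.0)

mu = cp.Variable((n, k), nonneg=True)          # mu[x, o] = Pr[M(x) = o]
constraints = [cp.sum(mu, axis=1) == 1]        # each row is a distribution over outcomes
for x in range(n):                             # Lipschitz / fairness condition:
    for y in range(x + 1, n):                  #   D_tv(M(x), M(y)) <= d(x, y)
        constraints.append(0.5 * cp.sum(cp.abs(mu[x] - mu[y])) <= d[x, y])

objective = cp.Minimize(cp.sum(cp.multiply(mu, loss)) / n)   # vendor's expected loss
problem = cp.Problem(objective, constraints)
problem.solve()
print("optimal expected loss:", problem.value)
print("fair mapping (rows are distributions over outcomes):")
print(mu.value.round(3))
```

With total variation, the objective and the Lipschitz constraints are linear (after the standard reformulation of the absolute values), so the program has size polynomial in the number of individuals and outcomes.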
On Correlations and Priors
• Assume that rows are selected independently and that there is no prior information on the database:
 – DP protects the privacy of each individual.
• But in the presence of prior information, privacy can be grossly violated [Dwork-Naor '10].
• Pufferfish [Kifer-Machanavajjhala]: a semantic approach to the privacy of correlated data
 – Protect privacy in the presence of pre-specified adversaries
 – The interesting case may be when there is a conflict between privacy and utility

Individual-Oriented Sanitization
• Assume you only care about the privacy of Alice.
• Further assume that Alice's data is correlated with the data of at most 10 others.
• Then it is enough to erase these 11 rows from the database.
• Even if her data is correlated with more, expunging more than 11 rows may exceed the (society-defined) legitimate expectation of privacy (e.g., in a health study).
• Differential privacy simultaneously gives a "comparable" level of privacy to everyone.

Other variants of DP
• This suggests and interprets other variants of DP – defined by the sanitization we allow individuals.
• For example, in social networks, what is the reasonable expectation of privacy for an individual?
 – Erase your neighborhood?
 – Erase information originating from you?
• Another variant: change a few entries in each column.

Objections
• Adam Smith: this informal interpretation may lose too much; for example, the distance in the definition of DP is subtle.
• Jonathan Katz: how do you set epsilon?
• Omer Reingold: how do you incorporate input from machine learning into the decision process of policy makers?

Lots of open problems/directions
• Metric
 – Social aspects: who will define them?
 – How to generate a metric (semi-)automatically; a metric oracle?
• Connection to Econ literature/problems
 – Rawls, Roemer, Fleurbaey, Young, Calsamiglia
 – Local vs. global distributive fairness? Composition?
• Case study (e.g., in health care)
 – Start from AALIM?
• Quantitative trade-offs in concrete settings

Lots of open problems/directions
• Further explore the connection and implications to privacy.
• Additional study of DP with other metrics.
• Completely different definitions of privacy?
• …

Thank you. Questions?