Four years a SPY – Lessons learned in the interdisciplinary project SPION (Security and Privacy in Online Social Networks)
Bettina Berendt, Department of Computer Science, KU Leuven, Belgium
www.berendt.de , www.spion.me

Thanks to (in more or less chronological order)
• Sarah Spiekermann
• Seda Gürses
• Sören Preibusch
• Bo Gao
• Ralf De Wolf
• Brendan Van Alsenoy
• Rula Sayaf
• Thomas Peetz
• Ellen Vanderhoven
• my other SPION colleagues and many others
– co-authors and collaborators!
[All references for these slides are at the end of the slide set.]

Overview
1. What can data mining do for privacy?
2. Beyond privacy: discrimination/fairness, democracy
3. Towards sustainable solutions

1. What can data mining do for privacy?

The Siren (AD 2000)
1. DM can detect privacy phenomena
2. DM can cause privacy violations
3. DM can be modified to avoid privacy violations

3. DM can be modified to avoid privacy violations – Is that sufficient?

... because: What is privacy?
• Privacy is not only hiding information:
▫ "dynamic boundary regulation processes […] a selective control of access to the self or to one's group" (Altman/Petronio)
▫ Different research traditions are relevant to CS.
• AND: Privacy vis-à-vis whom? Social privacy, institutional privacy, freedom from surveillance.

... because: What is privacy? ... and what is data mining?

Goal (AD ~2008): From the simple view ... towards a more comprehensive view

4. DM can affect our perception of reality – also enhancing awareness & reflection?!

Privacy feedback and awareness tools – complementary approaches:
• encrypted content, unobservable communication
• selectivity by access control
• offline communities: social identities, social requirements
• identification of information flows
• profiling
• feedback & awareness tools
• educational materials
• cognitive biases, communication design, and nudging interventions
• legal aspects

Complementary technical approaches in SPION
• DTAI is 1 of the technical partners (with COSIC and DistriNet)
• Developing a software tool for Privacy Feedback and Awareness
• Collaborating with other partners (general interdisciplinary questions, requirements, evaluation)

What is Privacy Feedback and Awareness? Examples of user questions:
• "Only these friends should see it."
• "Nobody else should even know I communicated with them."
• "Who are (groups of) recipients in this network anyway?"
• "What happens with my data?"
• "What can I do about this?"

1. What can data mining do for privacy? – Case study FreeBu: a tool that uses community-detection algorithms to help users perform audience management on Facebook

An F&A tool for audience management – FreeBu's visualisations: (1) circle, (2) circle, (3) map, (4) column, (5) rank

FreeBu is interactive, but does it give a good starting point? Testing against 3 ground-truth groupings to find "the best" community-detection algorithm (a minimal sketch of such a grouping follows below).

FreeBu: better than Facebook Smart Lists for access control
• User experiment, n=16
• 2 groups, same interface (circle); algorithm: hierarchical modularity maximisation vs. Facebook Smart Lists
• Task: think of 3 posts that you wouldn't want everybody to see; select from the given groups those who should see it
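To make the "starting point" idea concrete, here is a minimal sketch of grouping an ego network's friends by greedy (hierarchical, agglomerative) modularity maximisation. It assumes the networkx library and an invented friendship graph; it is an illustration of the general technique, not FreeBu's actual implementation.

```python
# Sketch: suggest audience groups for an ego network via modularity maximisation.
# The friendship edges are invented; a real tool would read them from the Facebook API.
import networkx as nx
from networkx.algorithms import community

G = nx.Graph()
G.add_edges_from([
    ("ann", "bob"), ("bob", "carol"), ("ann", "carol"),      # one tightly knit group
    ("dave", "erin"), ("erin", "frank"), ("dave", "frank"),  # another tightly knit group
    ("carol", "dave"),                                       # a single weak tie between them
])

# Greedy modularity maximisation (hierarchical, agglomerative community detection)
groups = community.greedy_modularity_communities(G)
for i, members in enumerate(groups, 1):
    print(f"Suggested audience group {i}: {sorted(members)}")
```

In FreeBu, such automatically detected communities are only a starting point; the user then inspects and adjusts them interactively.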
FreeBu: What do users think?
• Two user studies with a total of 12 / 147 participants
• Method: exploratory, mixed methods (interview, questionnaire, log analysis)
• Results:
▫ Affordances: grouping for access control, reflection/overview, (unfriending)
▫ Visual effects on attention – examples: the "map" and "rank" visualisations

More observations
• No relationship between tool appreciation and privacy concerns
• "Don't tell my friends I am using your tool to spy on them."
• "Don't give these data to your colleague."
• "How can you show these photos [in an internal presentation] without getting your friends' consent first?"
• Trust in Facebook > trust in researchers & colleagues?
• Or: machines / abstract people vs. concrete people?
• Recognition of privacy interdependencies? (cf. the discussion of "choice" earlier today)
• Feedback tools are themselves spying tools ...

Lessons learned
• Social privacy trumps institutional privacy
• Change in attitudes or behaviour takes time
• No graceful degradation w.r.t. usability:
▫ Tools that are <100% usable are NOT used AT ALL.
• What is GOOD? What is BETTER?

2. Beyond privacy: discrimination/fairness

"Privacy is not the problem"
• Privacy, social justice, and democracy
• View 1: Privacy is a problem (partly) because its violation may lead to discrimination.

"Data mining IS discrimination"

"Privacy is not the problem"
• Privacy, social justice, and democracy
• View 1: Privacy is a problem (partly) because its violation may lead to discrimination.
• View 2: Privacy is one of a set of social issues.

Discrimination-aware data mining (Pedreschi, Ruggieri, & Turini, 2008, + many since then)
• PD and PND items: potentially (not) discriminatory
• Goal: detect & block mined rules such as purpose=new_car & gender=female → credit=no
• Measures of the discriminatory power of a rule include elift(B & A → C) = conf(B & A → C) / conf(B → C), where A is a PD item and B a PND item (a small computational sketch follows below)
• Note: there are 2 uses/tasks of data mining here:
▫ Descriptive: "In the past, women who got a loan for a new car often defaulted on it."
▫ Prescriptive: (Therefore) "Women who want a new car should not get a loan."
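A minimal computational sketch of the elift measure defined above, on an invented toy dataset (the column names and values are assumptions for illustration, not any study dataset):

```python
# elift of the rule B & A -> C, where A is a PD item and B a PND context item.
import pandas as pd

df = pd.DataFrame({
    "gender":  ["female", "female", "male", "male", "female", "male"],
    "purpose": ["new_car"] * 6,
    "credit":  ["no", "no", "yes", "no", "yes", "yes"],
})

def conf(data, premise, conclusion):
    """Confidence of premise -> conclusion, i.e. P(conclusion | premise)."""
    covered = data
    for col, val in premise.items():
        covered = covered[covered[col] == val]
    if len(covered) == 0:
        return 0.0
    col, val = conclusion
    return (covered[col] == val).mean()

B = {"purpose": "new_car"}    # PND context item
A = {"gender": "female"}      # PD item
C = ("credit", "no")

elift = conf(df, {**B, **A}, C) / conf(df, B, C)
print(f"elift = {elift:.2f}")
```

An elift value above 1 means that adding the PD item A to the context B increases the confidence of the negative outcome; DADM methods block or flag rules whose measure exceeds a chosen threshold.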
Limitations of classical DADM
• Detection
▫ Constraint-oriented DADM: can only detect discrimination via pre-defined features / constraints – e.g. PD(female), PND(haschildren), but what about discrimination of mothers?
▫ Exploratory DADM: exploratory data analysis supports feature construction and new feature analyses
• Avoidance of creation
▫ Constraint-oriented DADM: fully automatic decision making cannot implement the legal concept of "treat equal things equally and different things differently" (AI-hard); sanitized rules → sanitized minds?
▫ Exploratory DADM: semi-automated decision support; salience, awareness, reflection → better decisions?

Exploratory DADM: DCUBE-GUI
• Left: rule count (size) vs. PD/non-PD (colour)
• Right: rule count (size) vs. AD-measure (rainbow-colours scale)

Evaluation: Comparing cDADM & eDADM
• The same contrast as above, now with:
▫ Constraint-oriented DADM: "hiding bad patterns" – a black box
▫ Exploratory DADM: "highlighting bad patterns" – a white box

Online experiment with 215 US mTurkers
• Framing: prevention (bank) vs. detection (agency); $6.00 show-up fee
• Tasks: 3 exercise tasks, 6 assessed tasks; $0.25 performance bonus per assessed task
• Questionnaire: demographics, quant/bank job, experience with discrimination

Example vignette: "Dabiku is a Kenyan national. She is single and has no children. She has been employed as a manager for the past 10 years. She now asks for a loan of $10,000 for 24 months to set up her own business. She has $100 in her checking account and no other debts. There have been some delays in paying back past loans."

Decision-making scenario – task structure
• Vignette, describing applicant and application
• Rules: positive/negative risks, flagged
• Decision and motivation, optional comment

Required competencies
• Discard discrimination-indexed rules
• Aggregate rule certainties
• Justify the decision by categorising risk factors

Rule visualisation by treatment (a minimal sketch follows below)
• Constrained DADM: bad features hidden (prevention scenario)
• Exploratory DADM: bad features flagged (detection scenario)
• (not discrimination-aware) DM: neither flagged nor hidden

Results: actionability and decision quality
• Decisions and motivations (DM versus DADM): more correct decisions in DADM; more correct motivations in DADM; no performance impact
• Biases: discrimination persistent in cDADM
• Relative merits: constrained DADM better for prevention; exploratory DADM better for detection
• "I dropped the -.67 number a little bit because it included her being a female as a reason."

Berendt & Preibusch (2014). Better decision support through exploratory discrimination-aware data mining. Artificial Intelligence and Law.
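As referenced above, a minimal sketch (not the study's actual software) of how the three treatments could present the same mined rules to a participant. The Rule fields, rule texts, and certainty values are invented for illustration.

```python
# Sketch of the three experimental treatments: plain DM, cDADM (hide), eDADM (flag).
from dataclasses import dataclass

@dataclass
class Rule:
    text: str             # verbal description of the mined rule
    certainty: float      # negative values argue against granting the loan
    discriminatory: bool  # indexed as potentially discriminatory (PD)

def present(rules, treatment):
    """Return the rules shown to a participant, each paired with an optional flag."""
    if treatment == "cDADM":   # prevention framing: PD rules are hidden
        return [(r, "") for r in rules if not r.discriminatory]
    if treatment == "eDADM":   # detection framing: PD rules stay visible but flagged
        return [(r, "FLAGGED" if r.discriminatory else "") for r in rules]
    return [(r, "") for r in rules]   # plain DM: neither hidden nor flagged

def aggregate(visible):
    """Naive aggregation of the visible rules' certainties into one risk score."""
    return sum(r.certainty for r, _ in visible)

rules = [
    Rule("gender=female & delays in past repayments", -0.67, True),
    Rule("no other debts, $100 in checking account",  +0.20, False),
    Rule("employed as a manager for 10 years",        +0.50, False),
]
for treatment in ("DM", "cDADM", "eDADM"):
    shown = present(rules, treatment)
    print(treatment, round(aggregate(shown), 2), [(r.text, flag) for r, flag in shown])
```

Under cDADM the discrimination-indexed rule simply disappears from the score; under eDADM it stays visible but marked, leaving the decision and its justification to the human.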
"Privacy is not the problem"
• Privacy, social justice, and democracy
• View 1: Privacy is a problem (partly) because its violation may lead to discrimination.
• View 2: Privacy is one of a set of social issues.
• View 3: Heightened privacy concerns are just a symptom of something more general being wrong. (e.g. discrimination – the underlying definition of fairness – who gets to decide?)

Discrimination-aware data mining (Pedreschi, Ruggieri, & Turini, 2008, + many since then)
• 2 uses/tasks of data mining:
▫ Descriptive: "In the past, women who got a loan for a new car often defaulted on it."
▫ Prescriptive: (Therefore) "Women who want a new car should not get a loan."
• Goal: detect the first AND/OR block the second (= push it below a threshold)

What we did
• an interactive tool, DCUBE-GUI
• a conceptual analysis of
▫ (anti-)discrimination as modelled in data mining ("DADM")
▫ unlawful discrimination as modelled in law
• a framework: constraint-oriented vs. exploratory DADM
• two user studies (n=20, 215) with DADM as decision support, which showed that
▫ DADM can help make better decisions & motivations
▫ cDADM and eDADM are better for different settings
▫ sanitized patterns are not sufficient to make sanitized minds

"Privacy is not the problem"
• View 1: Privacy is a problem (partly) because its violation may lead to discrimination.
• View 2: Privacy is one of a set of social issues.
• View 3: Heightened privacy concerns are just a symptom of something more general being wrong. (e.g. discrimination – the underlying definition of fairness – who gets to decide?)

Lessons learned – Privacy by design?!
• A systems approach is needed

"Multi-stakeholder information systems" [diagram]
• Diverse stakeholders: experts, users – designing with no people leads to "solutionism"
• Interactive systems (e.g. exploratory analysis), value-sensitive design, algorithms, software development, HCI
• Fields involved: sociology, AI / data mining, information systems, science, politics, law, education

3. Towards sustainable solutions

Effectiveness of "ethical apps"?
Hudson et al. (2013): What makes people buy a fair-trade product?
• An informational film shown before the buying decision?
▫ NO
• Having to make the decision in public?
▫ NO
• Some prior familiarity with the goals and activities of fair-trade campaigns, as well as a broader understanding of national and global political issues that are only peripherally related to fair trade?
▫ YES
• Rather: long-term educational campaigns

Morozov (2013):
• "[W]hile latest technologies allow us to do plenty of easy things on the cheap, those easy things are not necessarily the ones that matter. Perhaps it's not even technology that is at fault here.
• Rather, it's a choice between stand-alone apps that seek to change our behavior on the fly and sophisticated, content-rich apps—integrated into a broader educational strategy—that might deepen our knowledge about a given subject in the long term.
• And while there are plenty of news apps, having citizens actually engage with the long-form content that those apps provide—let alone understand the causes of the greenhouse effect or the intricacies of world trade—is a task that might require a different, app-free strategy."

Where to get a captive audience for that?
• Schools, (universities)
• Schools: lots of materials, little knowledge about effects
• Where there was an evaluation, no big effects
▫ (notable exception: the SPION Privacy Manual, in Dutch)
• Mostly short-term interventions
• With limited scope and often unclear concepts

We developed our own lesson series spanning 10 double lessons (and carried it out)
• Informatics: trackers; profile and behavioural data; the basic structure of data mining models (correlations in "Big Data" instead of causality); application of descriptive models for prediction; Ex. 1: association rule learning with Apriori; Ex. 2: regression analysis for prediction (minimal sketches of both exercises follow after this overview)
• Economics: use of data by Facebook for third-party advertising (business models and customer loyalty); customer segmentation and "weblining" (use of data mining by third parties) → access to loans, insurance, ...; usage contexts of other third parties → access to education, work, ...?
• Society and politics: TIDAP (total intransparency of data analysis and processing); cf. View 3: heightened privacy concerns are just a symptom of something more general being wrong (e.g. notions of fairness, control, freedom of speech); the fundamental right of informational self-determination and threats to it: chilling effects created by panopticism and TIDAP; plurality of opinions as a characteristic of democracy and threats to it: "weblining" via TIDAP; freedom of contract vs. other fundamental rights of participation that the state has to protect actively
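The two informatics exercises in the lesson series can be illustrated with very small sketches. These are not the classroom materials themselves; the toy data, item names, and library choices (mlxtend, scikit-learn) are assumptions for illustration.

```python
# Ex. 1 sketch: frequent itemsets with Apriori (mlxtend) on invented browsing sessions,
# then the confidence of one candidate rule computed by hand.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

visits = [
    ["news_site", "tracker_A", "tracker_B"],
    ["news_site", "tracker_A"],
    ["shop_site", "tracker_A", "tracker_B"],
    ["shop_site", "tracker_B"],
]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(visits).transform(visits), columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)
print(frequent)  # itemsets occurring in at least half of the sessions

# Confidence of the rule {news_site} -> {tracker_A}: P(tracker_A | news_site)
conf = (onehot["news_site"] & onehot["tracker_A"]).sum() / onehot["news_site"].sum()
print("conf(news_site -> tracker_A) =", conf)
```

```python
# Ex. 2 sketch: regression for prediction from behavioural data (all numbers invented).
import numpy as np
from sklearn.linear_model import LinearRegression

likes = np.array([[5], [20], [40], [80]])     # observed behavioural feature per user
spend = np.array([12.0, 25.0, 41.0, 78.0])    # some target an advertiser wants to predict
model = LinearRegression().fit(likes, spend)
print(model.predict(np.array([[60]])))        # prediction for an unseen user
```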
Plan for schools (utopian?!): Course & curriculum overview (ENISA Report 2014)
• Goals: knowledge, reflection/attitudes, action orientation
• Module 1: Different notions of "security": safety, e-safety, security, cybersecurity, security and privacy, IT security and national security, "good and bad hackers", …
• Modules 2-6: "security" in the sense of ...
2. protection against inappropriate content and undesired audiences & contacts
3. protection of personal data and privacy
4. IT security
5. protection of fundamental rights and democracy
6. protection against procrastination
• Duration proposal: 1-2 days to 1 year
• Stakeholders: ECDL Foundation, ISC, SANS, European Schoolnet / eTwinning, national and regional teacher (training) associations

Plan for university teaching (concrete): Interlinking two courses
Knowledge and the Web
• Data interoperability and semantics – …
• Data heterogeneity and combining data – …
• <some topics mandatory only for 6p>
• From data to knowledge – …
• Data in context
Privacy and Big Data
• Legal and ethical issues of Big Data – ...
• Data and database security – ...
• Privacy techniques – …
▫ Data publishing/mining and privacy
▫ Data publishing/mining and discrimination
• Consultancy on privacy issues in their projects

Lessons learned
• How can we measure that privacy (privacy awareness, knowledge, behaviour, outcomes ...) has become BETTER?
• In doing that, how can we avoid another iteration of undue "reification of data" (Kitchin)?
• We need to enlist computer scientists / take part as computer scientists in addressing "Big Data"'s problems – but:
• The really hard part is asking computer scientists to depart from their favourite basic assumption, which comes in different flavours:
▫ If there is a problem, it's because someone has too little information.
▫ Problems can be fixed.
▫ There is a right and a wrong.

Summary
1. What can data mining do for privacy?
2. Beyond privacy: fairness, democracy
3. Towards sustainable solutions

Many thanks!
[Image: Banksy, Marble Arch, London, 2005]

References
pp. 5-6: Berendt, B., Günther, O., & Spiekermann, S. (2005). Privacy in e-commerce: Stated preferences vs. actual behavior. Communications of the ACM, 48(4), 101-106. http://warhol.wiwi.hu-berlin.de/~berendt/Papers/p101-berendt.pdf
p. 10:
• Altman, I. (1976). Privacy: A conceptual analysis. Environment and Behaviour, 8(1), 7-29.
• Petronio, S. (2002). Boundaries of Privacy: Dialectics of Disclosure. Albany, NY, USA: SUNY.
• Gürses, S.F. & Berendt, B. (2010). The Social Web and privacy. In E. Ferrari & F. Bonchi (Eds.), Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques. Boca Raton, FL: Chapman & Hall/CRC Press, Data Mining and Knowledge Discovery Series. http://www.cosic.esat.kuleuven.be/publications/article-1304.pdf
p. 13: Berendt, B. (2012). More than modelling and hiding: Towards a comprehensive view of Web mining and privacy. Data Mining and Knowledge Discovery, 24(3), 697-737. http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_2012_DAMI.pdf
p. 15: Berendt, B. (2012). Data mining for information literacy. In D.E. Holmes & L.C. Jain (Eds.), Data Mining: Foundations and Intelligent Paradigms. Springer. http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_2012_DM4IL.pdf
pp. 19ff.: Gao, B. & Berendt, B. (2013). Circles, posts and privacy in egocentric social networks: An exploratory visualization approach. In 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Niagara Falls, Canada, 25-28 August 2013 (pp. 792-796). IEEE. https://lirias.kuleuven.be/bitstream/123456789/424074/1/gao_berendt_2013.pdf
p. 25: Berendt, B. & Gao, B. (2014). Friends and circles – A design study for contact management in egocentric online social networks. In Online Social Media Analysis and Visualization. Springer.
pp. 36ff.: Berendt, B. & Preibusch, S. (2014). Better decision support through exploratory discrimination-aware data mining: Foundations and empirical evidence. Artificial Intelligence and Law, 22(2), 175-209.
http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_preibusch_2014.pdf
p. 37: Gao, B. & Berendt, B. (2011). Visual data mining for higher-level patterns: Discrimination-aware data mining and beyond. In Benelearn 2011: Proceedings of the Twentieth Belgian-Dutch Conference on Machine Learning, The Hague, 20 May 2011 (pp. 45-52). http://www.liacs.nl/~putten/benelearn2011/Benelearn2011_Proceedings.pdf
p. 50: Hudson, M., Hudson, I., & Edgerton, J.D. (2013). Political consumerism in context: An experiment on status and information in ethical consumption decisions. American Journal of Economics and Sociology, 72(4), 1009-1037. http://dx.doi.org/10.1111/ajes.12033
p. 51: Morozov, E. (2013). Hey, big fair-trade spender. Apps promote ethical purchases, but do they inspire deeper learning? Slate. http://www.slate.com/articles/technology/future_tense/2013/09/goodguide_fairphone_ethical_shopping_apps_miss_the_point.html
pp. 52f.: Berendt, B., Dettmar, G., Demir, C., & Peetz, T. (2014). Kostenlos ist nicht kostenfrei. LOG IN, 178/179, 41-56. Links to teaching materials and an English summary at http://people.cs.kuleuven.be/~bettina.berendt/Privacy-education/
p. 53: Berendt, B., De Paoli, S., Laing, C., Fischer-Hübner, S., Catalui, D., & Tirtea, R. (in press). Roadmap for NIS education programmes in Europe. ENISA.
p. 54: http://people.cs.kuleuven.be/~bettina.berendt/teaching/2014-15-1stsemester/kaw/

Backup
(Source of the following slides: Berendt, Advanced databases, 2012, http://www.cs.kuleuven.be/~berendt/teaching)

Transparency

A legal view of "knowledge is power"
• Privacy: opacity of the individual as a data subject
• Data protection: transparency and accountability obligations of data controllers

Article 8 of the European Convention on Human Rights – a protected sphere in which one is "let alone" (mostly)
Article 8 – Right to respect for private and family life
1. Everyone has the right to respect for his private and family life, his home and his correspondence.
2. There shall be no interference by a public authority with the exercise of this right except such as is in accordance with the law and is necessary in a democratic society in the interests of national security, public safety or the economic well-being of the country, for the prevention of disorder or crime, for the protection of health or morals, or for the protection of the rights and freedoms of others.

Privacy as control – Data protection (1)
OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data (aka Fair Information Practices) – similarly encoded in the EU directives relating to privacy
• Collection limitation: Data collectors should only collect information that is necessary, and should do so by lawful and fair means, i.e., with the knowledge or consent of the data subject.
• Data quality: The collected data should be kept up-to-date and stored only as long as it is relevant.
• Purpose specification: The purpose for which data is collected should be specified (and announced) ahead of the data collection.
• Use limitation: Personal data should only be used for the stated purpose, except with the data subject's consent or as required by law.

Privacy as control – Data protection (2)
• Security safeguards: Reasonable security safeguards should protect collected data from unauthorised access, use, modification, or disclosure.
• Openness: It should be possible for data subjects to learn about the data controller's identity, and how to get in touch with him.
• Individual participation: A data subject should be able to obtain from a data controller confirmation of whether or not the controller has data relating to him, to obtain such data, to challenge data relating to him and, if the challenge is successful, to have the data erased, rectified, completed or amended.
• Accountability: Data controllers should be accountable for complying with these principles.

Contract freedom?

Teaching materials for the lesson series (Informatics / Economics / Society and politics):
• Text (written for the SPION Privacy Manual) + software tools for protection against data collection
• Text (website for a general audience)
• Text (quality newspaper)
• Text (for a seminar; Facebook's Data Use Policy)
• Text (quality newspaper)
• (Text: see left)
• Role play
• Web API (Facebook) + data mining algorithm
• Online data mining tool (Preference Tool: "Predicting personality from Facebook Likes")
• Documentation of the tool; scientific article (psychology)
• Texts (court ruling; scientific article, law)
• Text (scientific article, law)
• Role play