Four years a SPY - Lessons learned in the interdisciplinary project SPION
(Security and Privacy in Online Social Networks)
Bettina Berendt
Department of Computer Science,
KU Leuven, Belgium
1. What can data mining do for privacy?
2. Beyond privacy: discrimination/fairness,
3. Towards sustainable solutions
1. What can data mining do for privacy?
The Siren (AD 2000)
1. DM can detect privacy phenomena
2. DM can cause privacy violations
3. DM can be modified to avoid privacy violations
Is that
... because: What is privacy?
• Privacy is not only hiding information:
▫ “dynamic boundary regulation processes […] a selective control of access
to the self or to one's group“ (Altman/Petronio)
▫ Different research traditions relevant to CS:
AND: Privacy vis-à-vis whom?
Social privacy, institutional privacy, freedom from
... and what is data mining?
Goal (AD ~ 2008): From the simple view ...
towards a more comprehensive view
4. DM can affect our perception of reality –
also enhancing awareness & reflection?!
Privacy feedback and awareness tools
[Diagram labels: encrypted content, unobservable communication; selectivity by; offline communities: social identities, social requirements of information flows; feedback & awareness tools; educational materials; cognitive biases and nudging interventions; and communication design; legal aspects]
Complementary technical
approaches in SPION
• DTAI is 1 of the technical partners (with COSIC and DistriNet)
• Developing software tool for Privacy Feedback and Awareness
• Collaborating with other partners (general interdisciplinary
questions, requirements, evaluation)
• What is Privacy Feedback and Awareness? Examples ...
Only these friends should see it
Nobody else should even know I communicated with them
Who are (groups of) recipients in this network anyway?
What happens with my data? What can I do about this?
1. What can data mining do for privacy?
Case study FreeBu: a tool that uses
community-detection algorithms
for helping users perform audience
management on Facebook
An F&A tool for audience management
FreeBu (1): circle
FreeBu (2): circle
FreeBu (3): map
FreeBu (4): column
FreeBu (5): rank
FreeBu is interactive, but does it give a good starting
point? Testing against 3 ground-truth groupings and finding
“the best“ community-detection algorithm
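A minimal sketch of how such a comparison could be scored, assuming non-overlapping groups and using the pairwise Rand index as the agreement measure. The measure, the friend names, and the two algorithm outputs are illustrative assumptions, not the evaluation actually used for FreeBu.

```python
from itertools import combinations


def labels_from_groups(groups):
    """Map each member to the index of the (single) group that contains it."""
    labels = {}
    for idx, group in enumerate(groups):
        for member in group:
            labels[member] = idx
    return labels


def rand_index(groups_a, groups_b):
    """Fraction of member pairs on which two partitions agree:
    both place the pair in the same group, or both keep it apart."""
    a = labels_from_groups(groups_a)
    b = labels_from_groups(groups_b)
    members = sorted(set(a) & set(b))
    pairs = list(combinations(members, 2))
    if not pairs:
        return 1.0
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


# Hypothetical ground truth: one user's own grouping of six friends.
ground_truth = [{"ann", "bob", "cem"}, {"dee", "eli", "fay"}]

# Hypothetical outputs of two community-detection algorithms.
candidates = {
    "algo_x": [{"ann", "bob", "cem"}, {"dee", "eli", "fay"}],
    "algo_y": [{"ann", "bob"}, {"cem", "dee"}, {"eli", "fay"}],
}

scores = {name: rand_index(groups, ground_truth)
          for name, groups in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

With several ground-truth groupings, the same score could be averaged per algorithm before picking the best one. Overlapping groups would need a different measure.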
FreeBu: better than Facebook Smart
Lists for access control
• User experiment, n=16
• 2 groups, same interface (circle), algo: hierarchical vs. Facebook Smart Lists
• Task: think of 3 posts that you wouldn’t want everybody to see, select from the given groups those who should see it
[Figure: Result]
FreeBu: What do users think?
• Two user studies with a total of 12 / 147 participants
• Method: exploratory, mixed methods (interview, questionnaire, log
• Results:
▫ Affordances: grouping for access control, reflection/overview,
▫ Visual effects on attention – examples “map“ & “rank“ vis.s:
More observations
• No relationship of tool appreciation with privacy
• “don’t tell my friends I am using your tool to spy on
• “don’t give these data to your colleague”
• “how can you show these photos [in an internal
presentation] without getting your friends’ consent
• Trust in Facebook > trust in researchers & colleagues?
• Or: machines / abstract people vs. concrete people?
• Recognition of privacy interdependencies? (discussion of “choice” earlier today)
• Feedback tools are themselves spying tools ...
Lessons learned
• Social privacy trumps institutional privacy
• Change in attitudes or behaviour takes time
• No graceful degradation w.r.t. usability:
▫ Tools that are <100% usable are NOT used AT ALL.
• What is GOOD? What is BETTER?
2. Beyond privacy:
“Privacy is not the problem“
• Privacy, social justice, and democracy
• View 1: Privacy is a problem (partly) because its
violation may lead to discrimination.
“Data mining IS discrimination“
Discrimination-aware data mining
(Pedreschi, Ruggieri, & Turini, 2008,
+ many since then)
• PD and PND items: potentially (not) discriminatory
– goal: want to detect & block mined rules such as
purpose=new_car & gender=female → credit=no
– measures of discriminatory power of a rule include
elift(B&A → C) = conf(B&A → C) / conf(B → C),
where A is a PD item and B a PND item
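The elift measure above can be computed directly from rule confidences. The following is a small sketch on invented toy data (eight hypothetical new-car loan applications). The helper names `confidence`, `elift`, and `application` are illustrative and not taken from any DADM library.

```python
def confidence(transactions, antecedent, consequent):
    """conf(X -> Y): share of transactions containing X that also contain Y."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        raise ValueError("antecedent never occurs in the data")
    return sum(1 for t in covered if consequent <= t) / len(covered)


def elift(transactions, context_b, pd_item_a, consequent_c):
    """elift(B&A -> C) = conf(B&A -> C) / conf(B -> C)."""
    base = confidence(transactions, context_b, consequent_c)
    if base == 0:
        raise ValueError("conf(B -> C) is zero, elift is undefined")
    return confidence(transactions, context_b | pd_item_a, consequent_c) / base


def application(gender, credit, purpose="new_car"):
    """Encode one (hypothetical) loan application as a set of items."""
    return frozenset({f"purpose={purpose}", f"gender={gender}", f"credit={credit}"})


# Toy data, invented for illustration: 4 women (3 denied), 4 men (1 denied).
transactions = (
    [application("female", "no")] * 3 + [application("female", "yes")]
    + [application("male", "no")] + [application("male", "yes")] * 3
)

B = frozenset({"purpose=new_car"})   # PND context
A = frozenset({"gender=female"})     # PD item
C = frozenset({"credit=no"})         # consequent

value = elift(transactions, B, A, C)
print(value)  # conf(B&A -> C) = 3/4, conf(B -> C) = 4/8, ratio = 1.5
```

A value above 1 means that adding the PD item to the context raises the confidence of the negative decision. Which threshold counts as discriminatory is a separate choice.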
Note: 2 uses/tasks of data mining here:
• Descriptive
▫ “In the past, women who got a loan for a new car often defaulted on it.”
• Prescriptive
▫ (Therefore) “Women who want a new car should not get a loan.”
Limitations of classical DADM
Constraint-oriented DADM
• Can only detect discrimination by pre-defined features /
• Ex.: PD(female), PND(haschildren), but discrimination of mothers
• Avoidance of creation
• Fully automatic decision making: cannot implement the legal concept of “treat equal things equally and different things differently” (AI-hard)
Exploratory DADM
• Exploratory data analysis supports feature construction, new feature analyses
• Semi-automated decision support: sanitized rules → sanitized
• Salience, awareness, reflection → better decisions?
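One way to read the “discrimination of mothers” example is as a feature-construction problem: with only PD(female) declared, a rule check can come out clean, while a constructed item for female applicants with children exposes the disparity. The sketch below uses invented toy data and hypothetical helper names, and it treats the constructed item `mother` as a PD item. It illustrates the idea only and does not reproduce DCUBE-GUI or any other tool.

```python
def confidence(records, antecedent, consequent):
    """conf(X -> Y): share of records containing X that also contain Y."""
    covered = [r for r in records if antecedent <= r]
    if not covered:
        raise ValueError("antecedent never occurs in the data")
    return sum(1 for r in covered if consequent <= r) / len(covered)


def elift(records, context_b, pd_item_a, consequent_c):
    """elift(B&A -> C) = conf(B&A -> C) / conf(B -> C)."""
    base = confidence(records, context_b, consequent_c)
    return confidence(records, context_b | pd_item_a, consequent_c) / base


def record(gender, has_children, credit):
    """Encode one (hypothetical) new-car loan application as a set of items."""
    items = {"purpose=new_car", f"gender={gender}", f"credit={credit}"}
    if has_children:
        items.add("haschildren")
    return frozenset(items)


def with_mother_feature(rec):
    """Feature construction: add a derived item for women with children."""
    if {"gender=female", "haschildren"} <= rec:
        return rec | {"mother"}
    return rec


# Toy data, invented for illustration: only mothers are always denied.
records = (
    [record("female", True, "no")] * 4
    + [record("female", False, "yes")] * 4
    + [record("male", True, "no")] * 2 + [record("male", True, "yes")] * 2
    + [record("male", False, "no")] * 2 + [record("male", False, "yes")] * 2
)

B = frozenset({"purpose=new_car"})
C = frozenset({"credit=no"})

# Pre-defined PD item only: women overall are denied as often as everyone.
elift_female = elift(records, B, frozenset({"gender=female"}), C)

# Constructed item: mothers are denied twice as often as the baseline.
enriched = [with_mother_feature(r) for r in records]
elift_mother = elift(enriched, B, frozenset({"mother"}), C)

print(elift_female, elift_mother)
```

In this toy set the check on `gender=female` alone yields 1.0, so nothing would be blocked, while the constructed `mother` item yields 2.0.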
Exploratory DADM: DCUBE-GUI
Left: rule count (size) vs. PD/non-PD (colour)
Right: rule count (size) vs. AD-measure (rainbow-colours scale)
Evaluation: Comparing c & eDADM
Constraint-oriented DADM
• Can only detect discrimination by pre-defined features /
• Ex.: PD(female), PND(haschildren), but discrimination of mothers
• Avoidance of creation
• Fully automatic decision making: “hiding bad patterns”, black box; cannot implement the legal concept of “treat equal things equally and different things differently” (AI-hard)
Exploratory DADM
• Exploratory data analysis supports feature construction, new feature analyses
• Semi-automated decision support: “highlighting bad patterns”, white box; sanitized rules → sanitized
• Salience, awareness, reflection → better decisions?
Online experiment with 215 US mTurkers
• Prevention: bank
• Detection: agency
• $6.00 show-up fee
• 3 Exercise tasks
• 6 Assessed tasks
• $0.25 performance bonus per AT
• Demographics
• Quant/bank job
• Experience with
Dabiku is a Kenyan national. She is single and has no children. She has been
employed as a manager for the past 10 years. She now asks for a loan of $10,000
for 24 months to set up her own business. She has $100 in her checking account
and no other debts. There have been some delays in paying back past loans.
Decision-making scenario
Task structure
• Vignette, describing applicant and application
• Rules: positive/negative risks, flagged
• Decision and motivation, optional comment
Required competencies
• Discard discrimination-indexed rules
• Aggregate rule certainties
• Justify decision by categorising risk factors
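The first two competencies can be sketched as a filter followed by an aggregation. The slides do not say how participants were expected to aggregate certainties, so the plain sum below is an assumption. The sign convention (negative = speaks against granting the loan) and the rule values are also assumed for illustration. Only the -.67 figure echoes a participant comment quoted in the results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    description: str
    certainty: float      # assumed convention: negative speaks against the loan
    discriminatory: bool  # True if the rule is flagged as discrimination-indexed


def assess(rules):
    """Discard flagged rules, then aggregate the rest (here: a plain sum)."""
    usable = [r for r in rules if not r.discriminatory]
    score = sum(r.certainty for r in usable)
    return usable, score


# Hypothetical rule set for one vignette.
rules = [
    Rule("delays in paying back past loans", -0.50, False),
    Rule("employed for 10 years", 0.25, False),
    Rule("applicant is female", -0.67, True),
]

usable, score = assess(rules)
print([r.description for r in usable], score)
```

In the exploratory setting the flagged rule stays visible to the decision maker, and the filtering step is what the participant does. In the constraint-oriented setting the rule would not be shown at all.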
Rule visualisation by treatment
(not DA)
• Hide bad features
• Prevention scenario
• Flag bad features
• Detection scenario
• Neither flagged nor hidden
Results: Actionability and decision quality
Decisions and Motivations
• DA versus DADM
• More correct decisions in DADM
• More correct motivations in DADM
• No performance impact
• Discrimination persistent in cDADM
• Relative merits
• Constrained DADM better for prevention
• Exploratory DADM better for detection
• “I dropped the -.67 number a little bit because it included her being a female as a reason.”
Berendt & Preibusch. Better decision support through exploratory discrimination-aware data mining. in: ARTI, 2014
What we did
• an interactive tool DCUBE-GUI
• a conceptual analysis of
▫ (anti-)discrimination as modelled in data
mining (“DADM“)
▫ unlawful discrimination as modelled in law
• framework: constraint-oriented vs.
exploratory DADM
• two user studies (n=20, 215) with DADM as
decision support that showed
▫ DADM can help make better decisions &
▫ cDADM / eDADM better for different settings
▫ Sanitized patterns are not sufficient to make
sanitized minds
Lessons learned
Privacy by
• A systems approach is needed
“Multi-stakeholder information systems“
Information systems
Interactive systems (e.g. Exploratory
No people; “solutionism“
AI / Data mining
IS Science,
3. Towards sustainable solutions
Effectiveness of “ethical apps“?
Hudson et al. (2013):
• What makes people buy a fair-trade product?
• Informational film shown before buying decision?
▫ NO
• Having to make the decision in public?
▫ NO
• Some prior familiarity with the goals and activities
of fair-trade campaigns as well as broader
understanding of national and global political issues
that are only peripherally related to fair trade?
Rather: long-term educational campaigns
• “[W]hile latest technologies allow us to do plenty of easy
things on the cheap, those easy things are not necessarily the
ones that matter. Perhaps it's not even technology that is at
fault here.
• Rather, it's a choice between stand-alone apps that seek to
change our behavior on the fly and sophisticated, content-rich
apps—integrated into a broader educational strategy—that
might deepen our knowledge about a given subject in the long
• And while there are plenty of news apps, having citizens
actually engage with the long-form content that those apps
provide—let alone understand the causes of the greenhouse
effect or the intricacies of world trade—is a task that might
require a different, app-free strategy.”
Morozov (2013)
Where to get a captive
audience for that?
• Schools, (universities)
• Schools: lots of materials, little
knowledge about effects
• Where there was evaluation, no
big effects
▫ (notable exception: SPION
Privacy Manual, in Dutch)
• Mostly short-term interventions
• With limited scope and often
unclear concepts
→ We developed our own lesson series spanning 10 double hours (and carried it out)
Society and politics
Profile and behavioural data
Basic structure of data mining models (correlations in “Big Data” instead of
Use of data by Facebook for third parties (business models and customer loyalty) → advertising
Application of descriptive models for
→ TIDAP (total intransparency of data analysis and processing)
Customer segmentation and “weblining” (use of data mining by third parties) → access to loans, insurance, ...
Ex. 1: Association rule learning with
Ex. 2: Regression analysis for
Usage contexts of other third parties → access to education, work, ...?
Cf. View 3: Heightened privacy concerns are just a symptom of something more general being wrong. (e.g. notions of fairness, control, freedom of speech)
The fundamental right of informational self-determination and threats to it: Chilling effects created by panoptism and TIDAP
Plurality of opinions as a characteristic of democracy and threats to it: “Weblining” via TIDAP
Freedom of contract vs. other fundamental rights of participation that the state has to protect actively
Plan for schools (utopian?!):
Course & curriculum overview (ENISA Report 2014)
• Goals: Knowledge, reflection/attitudes, action orientation
• Module 1: Different notions of “security”: safety, e-safety, security,
cybersecurity, security and privacy, IT security and national security,
“good and bad hackers”, …
• Modules 2-6: “Security” in the sense of ...
2. protection against inappropriate content and undesired audiences
& contacts
3. protection of personal data and privacy
4. IT Security
5. protection of fundamental rights and democracy
6. protection against procrastination
• Duration proposal: 1-2 days – 1 year
• Stakeholders: ECDL Foundation, ISC, SANS, European Schoolnet /
etwinning, national and regional teacher (training) associations
Plan for university teaching (concrete):
Interlinking two courses
Knowledge and the Web
• Data interoperability and
– …
• Data heterogeneity and
combining data
– … <some topics mandatory only for 6p>
• From data to knowledge
– …
• Data in context
Privacy and Big Data
• Legal and ethical issues
of Big Data
– ...
• Data and database
– ...
• Privacy techniques
– …
– Data publishing /mining and privacy
– Data publishing /mining and
Consultancy on privacy issues in their projects
Lessons learned
• How to measure that privacy (privacy awareness,
knowledge, behaviour, outcome ...) has become
• In doing that, how can we avoid another iteration of
undue “reification of data” (Kitchin)?
• We need to enlist the computer scientists / take part as
CSers in addressing the problems of “Big Data” – but:
• The really hard part is to ask CSers to depart from their
favourite basic assumption, which comes in different
▫ If there is a problem, it’s because someone has too little
▫ Problems can be fixed.
▫ There is a right and a wrong.
Many thanks!
Banksy, Marble Arch, London, 2005
